Springer Series in Reliability Engineering
Series Editor Professor Hoang Pham Department of Industrial and Systems Engineering Rutgers, The State University of New Jersey 96 Frelinghuysen Road Piscataway, NJ 08854-8018 USA
Other titles in this series The Universal Generating Function in Reliability Analysis and Optimization Gregory Levitin
Human Reliability and Error in Transportation Systems B.S. Dhillon
Warranty Management and Product Manufacture D.N.P. Murthy and Wallace R. Blischke
Complex System Maintenance Handbook D.N.P. Murthy and Khairy A.H. Kobbacy
Maintenance Theory of Reliability Toshio Nakagawa
Recent Advances in Reliability and Quality in Design Hoang Pham
System Software Reliability Hoang Pham Reliability and Optimal Maintenance Hongzhou Wang and Hoang Pham Applied Reliability and Quality B.S. Dhillon Shock and Damage Models in Reliability Theory Toshio Nakagawa Risk Management Terje Aven and Jan Erik Vinnem Satisfying Safety Goals by Probabilistic Risk Assessment Hiromitsu Kumamoto
Product Reliability D.N.P. Murthy, Marvin Rausand and Trond Østerås Mining Equipment Reliability, Maintainability, and Safety B.S. Dhillon Advanced Reliability Models and Maintenance Policies Toshio Nakagawa Justifying the Dependability of Computerbased Systems Pierre-Jacques Courtois
Offshore Risk Assessment (2nd Edition) Jan Erik Vinnem
Reliability and Risk Issues in Large Scale Safety-critical Digital Control Systems Poong Hyun Seong
The Maintenance Management Framework Adolfo Crespo Márquez
Risks in Technological Systems Torbjörn Thedéen and Göran Grimvall
Riccardo Manzini · Alberto Regattieri Hoang Pham · Emilio Ferrari
Maintenance for Industrial Systems With 504 figures and 174 tables
123
Prof. Riccardo Manzini Università di Bologna Dipartimento Ingegneria delle Costruzioni Meccaniche, Nucleari, Aeronautiche e di Metallurgia (DIEM) Viale Risorgimento, 2 40136 Bologna Italy
[email protected]
Prof. Hoang Pham Rutgers University Department of Industrial and Systems Engineering 96 Frelinghuysen Road Piscataway NJ 08854-8018 USA
[email protected]
Prof. Alberto Regattieri Università di Bologna Dipartimento Ingegneria delle Costruzioni Meccaniche, Nucleari, Aeronautiche e di Metallurgia (DIEM) Viale Risorgimento, 2 40136 Bologna Italy
[email protected]
Prof. Emilio Ferrari Università di Bologna Dipartimento Ingegneria delle Costruzioni Meccaniche, Nucleari, Aeronautiche e di Metallurgia (DIEM) Viale Risorgimento, 2 40136 Bologna Italy
[email protected]
ISSN 1614-7839 ISBN 978-1-84882-574-1 e-ISBN 978-1-84882-575-8 DOI 10.1007/978-1-84882-575-8 Springer Dordrecht Heidelberg London New York British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library Library of Congress Control Number: 2009937576 © Springer-Verlag London Limited 2010 MaintiMizer™ is a trademark of Ashcom Technologies, Inc., 3917 Research Park Drive, Suite B4, Ann Arbor, MI 48108, USA, http://www.ashcomtech.com MATLAB® and Simulink® are registered trademarks of The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098, USA, http://www.mathworks.com Portions of the input and output contained in this publication/book are printed with permission of Minitab Inc. All material remains the exclusive property and copyright of Minitab Inc. All rights reserved. MINITAB® and all other trademarks and logos for the Company’s products and services are the exclusive property of Minitab Inc. All other marks referenced remain the property of their respective owners. See minitab.com for more information. Relex® is a registered trademark of Relex Software Corporation, 540 Pellis Road, Greensburg, PA 15601, USA, http://www.relex.com ReliaSoft® is a registered trademark of ReliaSoft Corporation, 1450 S. Eastside Loop, Tucson, Arizona 85710-6703, USA, http://www.reliasoft.com Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be repro-duced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licenses issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers. The use of registered names, trademarks, etc., in this publication does not imply, even in the ab-sence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use. The publisher makes no representation, express or implied, with regard to the accuracy of the informa-tion contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made. Cover design: deblik, Berlin, Germany Typesetting and production: le-tex publishing services GmbH, Leipzig, Germany Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
to Sara and Marta
Preface
Billions of dollars are currently spent producing high-technology products and services in a variety of production systems operating in different manufacturing and service sectors (e. g., aviation, automotive industry, software development, banks and financial companies, health care). Most of these products are very complex and sophisticated owing to the number of functions and components. As a result, the production process that realizes these products can be very complicated. A significant example is the largest passenger airliner in the world, the Airbus A380, also known as the “Superjumbo,” with an operating range of approximately 15,200 km, sufficient to fly directly from New York City to Hong Kong. The failure and repair behaviors of the generic part of this system can be directly or indirectly associated with thousands of different safety implications and/or quality expectations and performance measurements, which simultaneously deal with passengers, buildings, the environment, safety, and communities of people. What is the role of maintenance in the design and management of such a product, process, or system? Proper maintenance definitely helps to minimize problems, reduce risk, increase productivity, improve quality, and minimize production costs. This is true both for industrial and for infrastructure assets, from private to government industries producing and supplying products as well as services. We do not need to think about complex production systems, e. g., nuclear power plants, aerospace applications, aircraft, and hospital monitoring control systems, to understand the strategic role of maintenance for the continuous functioning of production systems and equipment. Concepts such as safety, risk, and reliability are universally widespread and maybe abused, because daily we make our choices on the basis of them, willingly or not. That is why we prefer a safer or a more reliable car, or why we travel with a safer airline instead of saving money with an ill-famed company. The acquisition of a safer, or high-quality, article is a great comfort to us even if we pay more. The strategic role of maintenance grows in importance as society grows in complexity, global competition increases, and technological research finds new applications. Consequently the necessity for maintenance actions will continue to increase in the future as will the necessity to further reduce production costs, i. e., increase efficiency, and improve the safety and quality of products and processes. In particular, during the last few decades the so-called reliability and maintenance engineering
vii
viii
discipline has grown considerably in both universities and industry as well as in government. The activities of planning, design, management, control, and optimization of maintenance issues are very critical topics of reliability and maintenance engineering. These are the focus of this book, whose aim is to introduce practitioners and researchers to the main problems and issues in reliability engineering and maintenance planning and optimization. Several supporting decision models and methods are introduced and applied: the book is full of numerical examples, case studies, figures, and tables in order to quickly introduce the reader to very complicated engineering problems. Basic theory and fundamentals are continuously combined with practical experience and exercises useful to practitioners but also to students of undergraduate and graduate schools of engineering, science, and management. The most important keywords used in this book are as follows: product, process, production system, productivity, reliability, availability, maintainability, risk, safety, failure modes and criticality analyses (failure modes and effects analysis and failure mode, effects, and criticality analysis), prediction and evaluation, assessment, preventive maintenance, inspection maintenance, optimization, cost minimization, spare parts fulfillment and management, computerized maintenance management system, total productive maintenance, overall equipment effectiveness, fault tree analysis, Markov chains, Monte Carlo simulation, numerical example, and case study. The book consists of 12 chapters organized as introduced briefly below. Chapter 1 identifies and illustrates the most critical issues concerning the planning activity, the design, the management, and the control of modern production systems, both producing goods (manufacturing systems in industrial sectors) and/or supplying services (e. g., hospital, university, bank). This chapter identifies the role of maintenance in a production system and the capability of guaranteeing a high level of safety, quality, and productivity in a proper way. Chapter 2 introduces quality assessment, presents statistical quality control models and methods, and finally Six Sigma theory and applications. A brief illustration and discussion of European standards and specifications for quality assessment is also presented. Chapter 3 introduces the reader to the actual methodology for the implementation of a risk evaluation capable of reducing risk exposure and guaranteeing the desired level of safety. Chapter 4 examines the fundamental definitions concerning maintenance, and discusses the maintenance question in product manufacturing companies and service suppliers. The most important maintenance engineering frameworks, e. g., reliability-centered maintenance and total productive maintenance, are presented. Chapter 5 introduces the reader to the definition, measurement, management, and control of the main reliability parameters that form the basis for modeling and evaluating activities in complex production systems. In particular, the basic maintenance terminology and nomenclature related to a generic item as a part, component, device, subsystem, functional unit, piece of equipment, or system that can be considered individually are introduced. Chapter 6 deals with reliability evaluation and prediction. It also discusses the elementary reliability configurations of a system in order to introduce the reader to the basic tools used to evaluate complex production systems.
Preface
Preface
ix
Chapter 7 discusses about the strategic role of the maintenance information system and computerized maintenance management systems in reliability engineering. Failure rate prediction models are also illustrated and applied. Chapter 8 introduces models and methods supporting the production system designer and the safety and/or maintenance manager to identify how subsystems and components could fail and what the corresponding effects on the whole system are, and to quantify the reliability parameters for complex systems. In particular models, methods, and tools (failure modes and effects analysis and failure mode, effects, and criticality analysis, fault tree analysis, Markov chains, Monte Carlo dynamic simulation) for the evaluation of reliability in complex production systems are illustrated and applied to numerical examples and case studies. Chapter 9 presents basic and effective models and methods to plan and conduct maintenance actions in accordance with corrective, preventive, and inspection strategies and rules. Several numerical examples and applications are illustrated. Chapter 10 discusses advanced models and methods, including the block replacements, age replacements, and inspection policies for maintenance management. Chapter 11 presents and applies models and tools for supporting the activities of fulfillment and management of spare parts. Chapter 12 presents two significant case studies on reliability and maintenance engineering. In particular, several models and methods introduced and exemplified in previous chapters are applied and compared. We would like to thank our colleagues and students, particularly those who deal with reliability engineering and maintenance every day, and all professionals from industry and service companies who supported our research and activities, Springer for its professional help and cooperation, and finally our families, who encouraged us to write this book. Bologna (Italy) and Piscataway (NJ, USA) Autumn 2008
Riccardo Manzini Alberto Regattieri Hoang Pham Emilio Ferrari
Contents
1
A New Framework for Productivity in Production Systems . . . . . . . .
1
1.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1
1.2
A Multiobjective Scenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2
1.2.1
Product Variety . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3
1.2.2
Product Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3
1.3
Production System Design Framework . . . . . . . . . . . . . . . . . . . . . .
4
1.4
Models, Methods, and Technologies for Industrial Management
5
1.4.1
The Product and Its Main Features . . . . . . . . . . . . . . . . . . . . . . . . .
5
1.4.2
Reduction of Unremunerated Complexity: The Case of Southwest Airlines . . . . . . . . . . . . . . . . . . . . . . . . . . .
6
1.4.3
The Production Process and Its Main Features . . . . . . . . . . . . . . .
7
1.4.4
The Choice of Production Plant . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7
1.5
Design, Management, and Control of Production Systems . . . . .
10
1.5.1
Demand Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10
1.5.2
Product Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10
1.5.3
Process and System Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
10
1.5.4
Role of Maintenance in the Design of a Production System . . . .
11
1.5.5
Material Handling Device Design . . . . . . . . . . . . . . . . . . . . . . . . . .
11
1.5.6
System Validation and Profit Evaluation . . . . . . . . . . . . . . . . . . . .
11
1.5.7
Project Planning and Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . .
11
1.5.8
New Versus Existing Production Systems . . . . . . . . . . . . . . . . . . .
11
1.6
Production System Management Processes for Productivity . . . .
13
1.6.1
Inventory and Purchasing Management . . . . . . . . . . . . . . . . . . . . .
14
1.6.2
Production Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
14
1.6.3
Distribution Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
14
1.7
Research into Productivity and Maintenance Systems . . . . . . . . .
14 xi
xii
2
3
Contents
Quality Management Systems and Statistical Quality Control . . . . . . 2.1 Introduction to Quality Management Systems . . . . . . . . . . . . . . . . 2.2 International Standards and Specifications . . . . . . . . . . . . . . . . . . . 2.3 ISO Standards for Quality Management and Assessment . . . . . . . 2.3.1 Quality Audit, Conformity, and Certification . . . . . . . . . . . . . . . . . 2.3.2 Environmental Standards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4 Introduction to Statistical Methods for Quality Control . . . . . . . . 2.4.1 The Central Limit Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.4.2 Terms and Definition in Statistical Quality Control . . . . . . . . . . . 2.5 Histograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6 Control Charts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.7 Control Charts for Means . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.7.1 The R-Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.7.2 Numerical Example, R-Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.7.3 The x-Chart N ............................................ 2.7.4 Numerical Example, x-Chart N .............................. 2.7.5 The s-Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.7.6 Numerical Example, s-Chart and x-Chart N ................... 2.8 Control Charts for Attribute Data . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8.1 The p-Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8.2 Numerical Example, p-Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8.3 The np-Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8.4 Numerical Example, np-Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8.5 The c-Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8.6 Numerical Example, c-Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8.7 The u-Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8.8 Numerical Example, u-Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.9 Capability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.9.1 Numerical Example, Capability Analysis and Normal Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.9.2 Numerical Examples, Capability Analysis and Nonnormal Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.10 Six Sigma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.10.1 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.10.2 Six Sigma in the Service Sector. Thermal Water Treatments for Health and Fitness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
17 17 19 19 19 21 23 23 24 25 25 26 26 29 29 30 30 33 33 35 36 37 37 37 39 40 40 40
Safety and Risk Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 Introduction to Safety Management . . . . . . . . . . . . . . . . . . . . . . . . 3.2 Terms and Definitions. Hazard Versus Risk . . . . . . . . . . . . . . . . . . 3.3 Risk Assessment and Risk Reduction . . . . . . . . . . . . . . . . . . . . . . . 3.4 Classification of Risks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5 Protective and Preventive Actions . . . . . . . . . . . . . . . . . . . . . . . . . . 3.6 Risk Assessment, Risk Reduction, and Maintenance . . . . . . . . . . 3.7 Standards and Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
53 53 54 57 58 60 63 63
42 46 48 51 51
Contents
xiii
4
Introduction to Maintenance in Production Systems . . . . . . . . . . . . . . 4.1 Maintenance and Maintenance Management . . . . . . . . . . . . . . . . . 4.2 The Production Process and the Maintenance Process . . . . . . . . . 4.3 Maintenance and Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4 Maintenance Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5 Maintenance Engineering Frameworks . . . . . . . . . . . . . . . . . . . . . . 4.6 Reliability-Centered Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . 4.7 Total Productive Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.7.1 Introduction to TPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.7.2 The Concept of TPM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.7.3 TPM Operating Instruments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.7.4 From Tradition to TPM: A Difficult Transition . . . . . . . . . . . . . . . 4.8 Maintenance Status Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.9 Maintenance Outsourcing and Contracts . . . . . . . . . . . . . . . . . . . .
65 65 66 69 70 70 72 73 73 74 75 76 80 83
5
Basic Statistics and Introduction to Reliability . . . . . . . . . . . . . . . . . . . . 5.1 Introduction to Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2 Components and Systems in Reliability . . . . . . . . . . . . . . . . . . . . . 5.3 Basic Statistics in Reliability Engineering . . . . . . . . . . . . . . . . . . . 5.4 Time to Failure and Time to Repair . . . . . . . . . . . . . . . . . . . . . . . . 5.5 Probability Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . . . 5.6 Repairable and Nonrepairable Systems . . . . . . . . . . . . . . . . . . . . . 5.7 The Reliability Function – R(t) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.8 Hazard Rate Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.8.1 Hazard Rate Profiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.8.2 Mean Time to Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.9 Stochastic Repair Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.10 Parametric Probability Density Functions . . . . . . . . . . . . . . . . . . . 5.10.1 Constant Failure Rate Model: The Exponential Distribution . . . . 5.10.2 Exponential Distribution. Numerical example . . . . . . . . . . . . . . . . 5.10.3 The Normal and Lognormal Distributions . . . . . . . . . . . . . . . . . . . 5.10.4 Normal and Lognormal Distributions. Numerical example . . . . . 5.10.5 The Weibull Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.10.6 Weibull Distribution. Numerical Example . . . . . . . . . . . . . . . . . . . 5.11 Repairable Components/Systems: The Renewal Process and Availability A(t) . . . . . . . . . . . . . . . . . . 5.12 Applications and Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.12.1 Application 1 – Nonrepairable Components . . . . . . . . . . . . . . . . . 5.12.2 Application 2 – Repairable System . . . . . . . . . . . . . . . . . . . . . . . . .
87 88 88 89 90 90 91 91 92 94 95 95 97 97 99 103 106 110 112
Reliability Evaluation and Reliability Prediction Models . . . . . . . . . . . 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2 Data Collection and Evaluation of Reliability Parameters . . . . . . 6.2.1 Empirical Functions Direct to Data . . . . . . . . . . . . . . . . . . . . . . . . . 6.2.2 Theoretical Distribution Research . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3 Introduction to Reliability Block Diagrams . . . . . . . . . . . . . . . . . . 6.4 Serial Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.4.1 Numerical Example – Serial Configuration . . . . . . . . . . . . . . . . . . 6.5 Parallel Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
133 133 134 135 145 152 153 154 161
6
113 117 117 122
xiv
Contents
6.5.1 6.6 6.7 6.8 6.8.1 6.9 6.9.1
Numerical Example – Parallel Configuration . . . . . . . . . . . . . . . . . Combined Series–Parallel Systems . . . . . . . . . . . . . . . . . . . . . . . . . Combined Parallel–Series Systems . . . . . . . . . . . . . . . . . . . . . . . . . k-out-of-n Redundancy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Numerical Examples, k-out-of-n Redundancy . . . . . . . . . . . . . . . . Simple Standby System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Numerical Example – Time-Dependent Analysis: Standby System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.10 Production System Efficiency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.10.1 Water Supplier System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.10.2 Continuous Dryer System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
163 168 170 170 171 174
7
Maintenance Information System and Failure Rate Prediction . . . . . 7.1 The Role of a Maintenance Information System . . . . . . . . . . . . . . 7.2 Maintenance Information System Framework . . . . . . . . . . . . . . . . 7.2.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.2.2 Maintenance Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.2.3 Interventions and Workload Analysis . . . . . . . . . . . . . . . . . . . . . . . 7.2.4 Spare Parts and Equipment Management . . . . . . . . . . . . . . . . . . . . 7.3 Computer Maintenance Management Software . . . . . . . . . . . . . . . 7.4 CMMS Implementation: Procedure and Experimental Evidence 7.4.1 System Configuration and Integration . . . . . . . . . . . . . . . . . . . . . . 7.4.2 Training and Data Entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.4.3 Go Live . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.4.4 Postimplementation Phase and Closing . . . . . . . . . . . . . . . . . . . . . 7.4.5 Experimental Evidence Concerning CMMS Implementation . . . 7.5 Failure Rate Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5.1 Accelerated Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5.2 Failure Data Prediction Using a Database . . . . . . . . . . . . . . . . . . . 7.6 Remote Maintenance/Telemaintenance . . . . . . . . . . . . . . . . . . . . . 7.6.1 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
189 189 190 190 192 194 195 196 199 199 200 200 200 200 204 204 206 214 216
8
Effects Analysis and Reliability Modeling of Complex Production Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.1 Introduction to Failure Modes Analysis and Reliability Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2 Failure Modes and Effects Analysis . . . . . . . . . . . . . . . . . . . . . . . . 8.2.1 Product Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2.2 Failure Mode, Effects, and Causes Analysis . . . . . . . . . . . . . . . . . 8.2.3 Risk Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2.4 Corrective Action Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2.5 FMEA Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.3 Failure Mode, Effects, and Criticality Analysis . . . . . . . . . . . . . . . 8.3.1 Qualitative FMECA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.3.2 Quantitative FMECA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.3.3 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.4 Introduction to Fault Tree Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 8.5 Qualitative FTA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.5.1 Fault Tree Construction Guidelines . . . . . . . . . . . . . . . . . . . . . . . . .
180 183 185 187
219 220 220 221 222 222 225 229 229 231 231 232 236 239 239
Contents
xv
8.5.2 8.5.3 8.5.4 8.6 8.6.1 8.6.2 8.6.3
Numerical Example 1. Fault Tree Construction . . . . . . . . . . . . . . . Boolean Algebra and Application to FTA . . . . . . . . . . . . . . . . . . . Qualitative FTA: A Numerical Example . . . . . . . . . . . . . . . . . . . . . Quantitative FTA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Quantitative FTA, Numerical Example 1 . . . . . . . . . . . . . . . . . . . . Quantitative FTA, Numerical Example 2 . . . . . . . . . . . . . . . . . . . . Numerical Example. Quantitative Analysis in the Presence of a Mix of Statistical Distributions . . . . . . . . . . . . . . . . . . . . . . . . 8.7 Application 1 – FTA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.7.1 Fault Tree Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.7.2 Qualitative FTA and Standards-Based Reliability Prediction . . . . 8.7.3 Quantitative FTA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.8 Application 2 – FTA in a Waste to Energy System . . . . . . . . . . . . 8.8.1 Introduction to Waste Treatment . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.8.2 Case study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.8.3 Emissions and Externalities: Literature Review . . . . . . . . . . . . . . 8.8.4 SNCR Plant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.8.5 SNCR Plant. Reliability Prediction and Evaluation Model . . . . . 8.8.6 Qualitative FTA Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.8.7 NOx Emissions: Quantitative FTA Evaluation . . . . . . . . . . . . . . . . 8.8.8 Criticality Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.8.9 Spare Parts Availability, What-If Analysis . . . . . . . . . . . . . . . . . . . 8.8.10 System Modifications for ENF Reduction and Effects Analysis . 8.9 Markov Analysis and Time-Dependent Components/Systems . . . 8.9.1 Redundant Parallel Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.9.2 Parallel System with Repairable Components . . . . . . . . . . . . . . . . 8.9.3 Standby Parallel Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.10 Common Mode Failures and Common Causes . . . . . . . . . . . . . . . 8.10.1 Unavailability of a System Subject to Common Causes . . . . . . . . 8.10.2 Numerical Example, Dependent Event . . . . . . . . . . . . . . . . . . . . . . 9
Basic Models and Methods for Maintenance of Production Systems . 9.1 Introduction to Analytical Models for Maintenance of Production Systems . . . . . . . . . . . . . . . . . . . . . 9.1.1 Inspection Versus Monitoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.2 Maintenance Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.3 Introduction to Preventive Maintenance Models . . . . . . . . . . . . . . 9.4 Component Replacement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.4.1 Time-Related Terms and Life Cycle Management . . . . . . . . . . . . 9.4.2 Numerical Example. Preventive Replacement and Cost Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.5 Time-Based Preventive Replacement – Type I Replacement Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.5.1 Numerical Example. Type I Replacement Model . . . . . . . . . . . . . 9.5.2 Numerical Example. Type I Model and Exponential Distribution of ttf . . . . . . . . . . . . . . . . . . . . . . . . . 9.5.3 Type I Replacement Model for Weibull distribution of ttf . . . . . . 9.5.4 The Golden Section Search Method . . . . . . . . . . . . . . . . . . . . . . . .
240 241 242 244 248 252 254 263 264 266 269 277 277 278 279 280 281 283 287 292 295 300 301 302 304 306 309 310 311 313 314 315 315 318 319 319 320 323 324 325 326 326
xvi
Contents
9.5.5 9.6 9.6.1 9.6.2 9.6.3 9.6.4 9.7 9.7.1 9.7.2 9.7.3 9.7.4 9.7.5 9.7.6 9.7.7 9.8 9.8.1 9.9 9.9.1 9.9.2 9.10 9.11 9.11.1 9.11.2 9.12 9.12.1 9.12.2 9.13 9.14 9.15 9.15.1 9.15.2 9.16 9.17 9.18 9.18.1 9.18.2 9.18.3 9.19
Numerical Example. Type I Model and the Golden Section Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Time-Based Preventive Replacement Including Duration of Replacements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Numerical Example 1: Type I Replacement Model Including Durations Tp and Tf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Type I Model with Duration of Replacement for Weibull Distribution of ttf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Numerical Example 2: Type I Model with Durations Tp and Tf . . Practical Shortcut to tp Determination . . . . . . . . . . . . . . . . . . . . . . Block Replacement Strategy – Type II . . . . . . . . . . . . . . . . . . . . . . Renewal Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Laplace Transformation: W(t) and w(t) . . . . . . . . . . . . . . . . . . . . . . Renewal Process and W(t) Determination, Numerical Example . Numerical Example, Type II Model . . . . . . . . . . . . . . . . . . . . . . . . Discrete Approach to W(t) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Practical Shortcut to W(t) and tp Determination . . . . . . . . . . . . . . Maintenance Performance Measurement in Preventive Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Numerical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Minimum Total Downtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Type I – Minimum Downtime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Type II – Downtime Minimization . . . . . . . . . . . . . . . . . . . . . . . . . Group Replacement: The Lamp Replacement Problem . . . . . . . . Preventive Maintenance Policies for Repairable Systems . . . . . . . Type I Policy for Repairable Systems . . . . . . . . . . . . . . . . . . . . . . . Type II Policy for Repairable Systems . . . . . . . . . . . . . . . . . . . . . . Replacement of Capital Equipment . . . . . . . . . . . . . . . . . . . . . . . . . Minimization of Total Cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Numerical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Literature Discussion on Preventive Maintenance Strategies . . . . Inspection Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Single Machine Inspection Model Based on a Constant Value of Conditional Probability Failure . . . . . . . . Numerical Example 1, Elementary Inspection Model . . . . . . . . . . Numerical Example 2, Elementary Inspection Model . . . . . . . . . . Inspection Frequency Determination and Profit per Unit Time Maximization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Inspection Frequency Determination and Downtime Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Inspection Cycle Determination and Profit per Unit Time Maximization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Exponential Distribution of ttf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Weibull Distribution of ttf . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Numerical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Single Machine Inspection Model Based on Total Cost per Unit Time Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
328 333 333 335 335 335 339 340 341 341 343 348 349 352 353 354 355 355 357 358 359 360 370 372 372 372 372 373 375 376 377 378 380 381 381 382 382 383
Contents
xvii
9.20
Single Machine Inspection Model Based on Minimal Repair and Cost Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.21 Inspection Model Based on Expected Availability per Unit Time Maximization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.22 Group of Machines Inspection Model . . . . . . . . . . . . . . . . . . . . . . . 9.23 A Note on Inspection Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.24 Imperfect Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.24.1 Imperfect Preventive Maintenance p – q . . . . . . . . . . . . . . . . . . . . . 9.25 Maintenance-Free Operating Period . . . . . . . . . . . . . . . . . . . . . . . . 9.25.1 Numerical Example (Kumar et al. 1999) . . . . . . . . . . . . . . . . . . . . 9.25.2 MFOPS and Weibull Distribution of ttf . . . . . . . . . . . . . . . . . . . . . 9.26 Opportunistic Maintenance Strategy . . . . . . . . . . . . . . . . . . . . . . . .
384 385 386 387 388 388 390 391 392 393
10 Advanced Maintenance Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.2 Maintenance Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.2.1 Age Replacement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.2.2 Block Replacement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.3 Modeling of Nonrepairable Degraded Systems . . . . . . . . . . . . . . . 10.4 Modeling of Inspection-Maintenance Repairable Degraded Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.4.1 Calculate EŒNI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.4.2 Calculate Pp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.4.3 Expected Cycle Length Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.4.4 Optimization of Maintenance Cost Rate Policy . . . . . . . . . . . . . . . 10.4.5 Numerical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.5 Warranty Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
397 397 398 398 399 399 402 403 404 405 405 406 406 408
11 Spare Parts Forecasting and Management . . . . . . . . . . . . . . . . . . . . . . . 11.1 Spare Parts Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Spare Parts Characterization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.3 Forecasting Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.4 Croston Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.5 Poisson Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.6 Binomial Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.6.1 Numerical Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.7 Spare Parts Forecasting Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . 11.8 Spare Parts Forecasting Methods: Application and Case Studies 11.8.1 Case Study 1: Spare Parts Forecasting for an Aircraft . . . . . . . . . 11.8.2 Case Study 2: Spare Parts Forecasting in a Steel Company . . . . . 11.9 Methods of Spare Parts Management . . . . . . . . . . . . . . . . . . . . . . . 11.9.1 Spare Parts Management: Qualitative Methods . . . . . . . . . . . . . . . 11.9.2 Spare Parts Management: Quantitative Methods . . . . . . . . . . . . . .
409 409 410 411 412 413 414 415 416 417 417 418 422 423 426
12 Applications and Case Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.1 Preventive Maintenance Strategy Applied to a Waste to Energy Plant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.1.1 Motor System Reliability Evaluation . . . . . . . . . . . . . . . . . . . . . . .
433 433 434
xviii
Contents
12.1.2 12.1.3 12.1.4 12.1.5 12.1.6 12.1.7 12.1.8 12.1.9 12.2
Bucket Reliability Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Motor System. Determination of Maintenance Costs . . . . . . . . . . Time-Based Preventive Replacement for the Motor System . . . . Time-Based Preventive Replacement for the Bucket Component Time-Based Preventive Replacement with Durations Tp and Tf . Downtime Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Monte Carlo Dynamic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . Monte Carlo Analysis of the System . . . . . . . . . . . . . . . . . . . . . . . . Reliability, Availability, and Maintainability Analysis in a Plastic Closures Production System for Beverages . . . . . . . . RBD construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rotating Hydraulic Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Data Collection and Reliability Evaluation of Components . . . . . Reliability Evaluation, Nonrepairable Components/Systems . . . . Data on Repairs and Maintenance Strategies . . . . . . . . . . . . . . . . . Monte Carlo Analysis of the Repairable System . . . . . . . . . . . . . . Alternative Scenarios and System Optimization . . . . . . . . . . . . . . Conclusions and Call for New Contributions . . . . . . . . . . . . . . . . .
436 437 439 439 441 442 442 446
Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A.1 Standardized Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . A.2 Control Chart Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A.3 Critical Values of Student’s Distribution with Degree of Freedom . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
463 463 464
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
467
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
475
12.2.1 12.2.2 12.2.3 12.2.4 12.2.5 12.2.6 12.2.7 12.3 A
446 448 449 449 454 456 456 460 462
465
1
A New Framework for Productivity in Production Systems
Contents 1.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1
1.2
A Multiobjective Scenario . . . . . . . . . . . . . . . . . . . . . 1.2.1 Product Variety . . . . . . . . . . . . . . . . . . . . . . . . . 1.2.2 Product Quality . . . . . . . . . . . . . . . . . . . . . . . . .
2 3 3
1.3
Production System Design Framework . . . . . . . . .
4
1.4
Models, Methods, and Technologies for Industrial Management . . . . . . . . . . . . . . . . . . . . 1.4.1 The Product and Its Main Features . . . . . . . . . 1.4.2 Reduction of Unremunerated Complexity: The Case of Southwest Airlines . . . . . . . . . . . 1.4.3 The Production Process and Its Main Features 1.4.4 The Choice of Production Plant . . . . . . . . . . .
1.5
1.6
1.7
Design, Management, and Control of Production Systems . . . . . . . . . . . . . . . . . . . . . . . . 1.5.1 Demand Analysis . . . . . . . . . . . . . . . . . . . . . . . 1.5.2 Product Design . . . . . . . . . . . . . . . . . . . . . . . . . 1.5.3 Process and System Design . . . . . . . . . . . . . . . 1.5.4 Role of Maintenance in the Design of a Production System . . . . . . . . . . . . . . . . . . 1.5.5 Material Handling Device Design . . . . . . . . . 1.5.6 System Validation and Profit Evaluation . . . . 1.5.7 Project Planning and Scheduling . . . . . . . . . . 1.5.8 New Versus Existing Production Systems . . . Production System Management Processes for Productivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.6.1 Inventory and Purchasing Management . . . . . 1.6.2 Production Planning . . . . . . . . . . . . . . . . . . . . . 1.6.3 Distribution Management . . . . . . . . . . . . . . . .
5 5 6 7 7 10 10 10 10 11 11 11 11 11 13 14 14 14
Research into Productivity and Maintenance Systems . . . . . . . . . . . . . . . . . . . . . 14
The pressure of the global market ... we all face increased competition for share. The fundamental key is the productivity of the system. All players in the indus-
try are in the same race to become low cost producers, including manufacturers, our suppliers, and their suppliers, too. And each of us must do it while improving quality, because consumers require it (Alain Batty, CEO, Ford Motor Company of Canada, 2004). High levels of product personalization and quality standardization are essential requirements in current market conditions, in which prices are falling, and in which a new production paradigm for a production system has come into existence. The planning, management, and control of a production system are crucial activities requiring an integrated approach examining the internal features of available production resources and guiding their rational exploitation. Maintenance techniques play a major role in supporting research into productivity, and these very effective tools must be adopted by modern companies.
1.1 Introduction In this book explicitly devoted to maintenance, the first chapter aims to identify and to illustrate the most critical issues concerning the planning activity, the design, the management, and the control of modern production systems, both producing goods (manufacturing systems in industrial sectors) and/or supplying services (e. g., hospital, university, bank). By this discussion it is possible to identify the role of maintenance in a production system and the capability of guaranteeing a high level of safety, quality, and productivity in a proper way. In particular, the expression
R. Manzini, A. Regattieri, H. Pham, E. Ferrari, Maintenance for Industrial Systems © Springer 2010
1
2
“research for productivity” frequently animates the sections of this chapter. The following section introduces the uncertain operating scenario that modern companies have to face to compete in a globalized market. Section 1.3 illustrates a meta-framework for the design of a production system with an enterprise perspective. The aim is to underline the most important tasks and decisional steps affecting the performance of the system with particular attention being given to the business and corporate strategies of the enterprise and its related companies. Section 1.4 briefly discusses the models, methods, and technologies currently available to support the decision-making process dealing with production systems. Section 1.5 presents a conceptual framework, proposed by the authors, for the integration of the design, management, and control of a production system.
1.2 A Multiobjective Scenario Vaughn et al. (2002) identified the most critical factors affecting the performance of a production system as part of an enterprise system. The enterprise does not have complete control over these factors: • Market uncertainty. This is defined as the demand fluctuations for the product, including both shortterm random variability and long-term step/cyclical variability. The uncertainty of demand can create overcapacity or undercapacity, generating customer dissatisfaction. • Production volume, i. e., the number of products to be manufactured over a time period. Market uncertainty and production volume are tightly coupled. Production volume determines the production system capacity and most of the factory physical design, e. g., floor space needed, machine selection, layout, and number of workers. • Product mix. This is the number of different products to be manufactured. The production system has to be capable of producing various versions of a product, or different products simultaneously in the same plant in order to fulfill the market need with the best exploitation of the resources. Product mix and product volume are closely related (Manzini et al. 2004).
1 A New Framework for Productivity in Production Systems
• Frequency of changes. This is the number of engineering changes per time period. The changes can be either structural or upgrades to existing systems. It is not possible to foresee all the changes that might be introduced into a product in the future. For example, the frequency of changes is a very critical issue for the electronic control systems of packaging machines. A packaging system can be used by a generic customer for a few decades: the electronic technologies change very quickly and the customer could need to replace failed parts with new, different spare parts. • Complexity. There are several ways to measure product, process, or system complexity. A few examples are the number of parts, the number of process steps, and the number of subsystems. Complexity deals with the level of difficulty to design, manufacture, assemble, move, etc. a part, and it is affected by the available process capability (see Chap. 2). • Process capability, as the ability to make something repeatedly with minimal interventions. This factor deals with the quality of the process, product, and production systems, as properly illustrated in Chap. 2. • Type of organization and in particular the innovation of the workforce participating in product, process, and system improvements. • Worker skill level, i. e., the availability of high-level employee skills. This factor is strongly linked to the necessary and/or available level of automation. • Investment, as the amount of financial resources required. This is one of the most critical constraints in the production system design, management, and control. • Time to first part. This is another very critical constraint and represents the time from the initial system design to the full rate of production. • Available/existing resources (financial, technological, human, etc.). Current markets have changed a great deal from those of a few years ago. Mass production (large quantities of a limited range of products) has declined in several production systems and been replaced by customeroriented production. Sales and quantities have essentially remained constant, but the related product mix is growing ever larger. Companies are attempting to spread risk over a wider range of base products and
1.2 A Multiobjective Scenario
3
meet (or anticipate) customer needs and desires. This trend is intensified by global competition: different players throughout the world are supplying “similar” products to the same markets. This situation has produced significant changes in production systems (which either produce products or supply services): production batches are very small, production lead times are kept very short, product life cycle is also brief, and consequently product time to market is very compressed. In conclusion, production systems must possess two important features: flexibility and elasticity. Flexibility deals with the ability of the production system to evolve continuously and manufacture wide ranges of products. On the other hand, elasticity allows great variation in production volumes without a significant change in the production system configuration (i. e., without needing time-consuming and expensive work). The literature also names these concepts “capability flexibility” and “capacity flexibility.”
1.2.1 Product Variety The great increase in product variety is easily verified in several case studies. It is sufficient to investigate a single product in order to see how many different versions are now offered in comparison with 10 years ago. Some significant results from the research conducted by Thonemann and Bradley (2002) on product variety analysis are reported below. Table 1.1 shows the increase of product mix in different industrial sectors in the decade 1990–2000. The smallest increase of a little over 50% occurred in commodities. Table 1.1 Product variety increase in various industrial sectors Sector
Percentage variety increase (1990–2000)
Commodities Telecommunications Information technology Automotive Defense
52 57 77 80 81
Table 1.2 Increase in variety of different products Product
1970
2001
Car models Newspapers TV sizes Breakfast cereals Types of milk Running shoes Brands of sparkling waters Pantyhose
140 339 5 160 4 5 16 5
260 790 15 340 19 285 50 90
The change in several product mix ratios is relevant and, as Table 1.2 illustrates, these have more than doubled in some cases.
1.2.2 Product Quality In addition to the range of the product mix, another feature has also greatly increased in significance and is a growing trend: product quality. Consumers have developed great sensitivity and their perception of the quality of products and services is increasing. Consequently, companies must not only produce but also supply products and services to very high quality standards, meaning stand-alone quality is no longer a marginal success factor. In addition to these observations of “new market trends,” industrial and service companies also need their industrial investments to be remunerated. This field is also significantly affected by global competition: with prices falling, companies are forced to reduce production costs. Therefore, modern companies must expand their product mix, increase the quality of the product and the process, and reduce costs: a very stimulating challenge! Moreover, companies are striving to improve the productivity and quality of their production systems, with the most relevant targets in this multiscenario decision-making process including: • a great degree of flexibility and elasticity in the production system; • short lead times; • high-quality products and production processes; • short time to market; • control of production costs.
4
1 A New Framework for Productivity in Production Systems
1.3 Production System Design Framework
INFRASTRUCTURE
This section presents a conceptual framework for supporting the design of a production system with an enterprise perspective. It takes inspiration from the study by Fernandes (2001) in the aerospace industry and lean production. The illustration of this framework is very useful for identification of the operating context of modern production systems and for justification of the introduction of an integrated quality-, safety-, and reliability-based approach to support the design, management, and control of a complex system. In particular, maintenance models and methods reveal themselves as very effective tools to conduct this process. Figure 1.1 presents the meta-framework which also contain other tools, methods, and processes applicable to the design process of production systems operating in different industrial and service sectors, such as auto-
motive, food, health care, pharmaceutical, education, and public administration. The proposed framework is made of three main and distinct elements: 1. Infrastructure, as a result of the enterprise strategy formulation which defines important and critical attributes of the system as operating policy, organizational structure, location, and environment (see the top portion of Fig. 1.1). This strategy is the result of long-term objectives and programs, and is focused on creating operating capabilities. The corporate-level strategy balances the conflicting needs of the numerous stakeholders (e. g., customers, employees, and owners) facing the overall enterprise the production system belongs to, by the formulation of a corporate strategy which is transferred to the business units throughout the corporation. 2. Structure (see the bottom portion Fig. 1.1). It is the physical manifestation of the detailed produc-
Corporate level
Strategy formulation
Stakeholders
Corporate level
Corporate business strategy
Business unit
Suppliers Make/buy
Product Production Marketing design system DFA, DFM, conCustomer needs current engineering technical feasibility
Product strategy
Manufacturing
Physical implications
STRUCTURE
Requirements/considerations/constraints
Manufacturing system design/selection Analytical & simulation tools Implement Fine tune
Trial & error Evaluate/ validate
Finalized product design
RATE PRODUCTION
Fig. 1.1 Production system design framework. DFA design for assembly, DFM manufacturing. (Fernandes 2001)
System design
Marketing
PRODUCT STRATEGY
Suppliers
Product design
1.4 Models, Methods, and Technologies for Industrial Management
tion system design and is the result of the factory layout, number and configuration of machines, and production methods and processes. 3. Product strategy. congruence between the corporate-level business strategy and the functional strategies. It involves functional elements such as marketing, product design, supplier, and manufacturing (see the concurrent engineering overlapping of functions in Fig. 1.1) This meta-framework gives the concurrent engineering approach a great and strategic importance and provides enlightenment on the validation analysis, and the continuous improvement (see the so-called modification loop in Fig. 1.1).
1.4 Models, Methods, and Technologies for Industrial Management Which resources are capable of supporting companies in meeting the challenge introduced in the previous section? First of all, it is important to state that only resources relating to products (or services) and to production processes (i. e., manufacturing and assembly activities in industrial companies) are considered in this chapter. It is not the authors’ purpose to take into account some other factors associated with advertising, marketing, or administrative areas. In brief, research supports productivity via three fundamental and interrelated drivers: the product, the process, and the production system.
1.4.1 The Product and Its Main Features Products are usually designed with reference to their performance (i. e., the ability to satisfy customer needs) and to the aesthetic appearance required by the market. Requirements derived from the production system are sometimes neglected, thus having a negative effect on final production costs. As a consequence, during the last few decades several strategies or techniques for product design, such as design for manufacturing (DFM) and design for assembly (DFA), which, respectively, consider manufacturing and assembly requirements during the design process, have
5
been proposed in the literature and applied in modern production systems. They provide a valid support to the effective management of total production costs. In recent years, the matter of reuse and/or recycling of products has become extremely pressing worldwide, and many countries are facing problems relating to waste evaluation and disposal. The significance of these topics is demonstrated by the wide diffusion of product life cycle management, as the process of managing the entire life cycle of a product from its conception, through design and manufacture, to service and disposal. Figure 1.2 presents a conceptual model of the product life cycle, including the design, production, support, and ultimate disposal activities. Maintenance of production facilities and recovery of products explicitly play a strategic role in product life cycle management. As a consequence, a product design process that also considers product disassembly problems at the end of the product life cycle has become a success factor in modern production systems. This approach to the design process is known as “design for disassembly” (DFD). In several supply chains (e. g., tires and batteries) the manufacturer is burdened with the reuse or final disposal of the product, and DFD is a particularly effective tool for the reduction of production costs. Section 1.2 discusses the advantages and disadvantages associated with the production of a wide variety of products: wide ranges of product mix are an effective strategy in meeting customer expectations, but companies must reach this goal with the minimum number of components and parts. In particular, any part or function not directly perceived by the customer implies both an unnecessary and a harmful addition of complexity because it is not remunerated. Research and trials examining this special kind of complexity lead to the formulation of the following production strategy: what is visible over the skin of the product is based on a very high degree of modularity under the skin. The so-called product platforms are a good solution to support product variability, and so have been adopted in modern production systems. Several families of similar products are developed on the same platform using identical basic production guidelines for all “derivative” products. A well-known example is the “spheroid platform” developed by Piaggio (the Italian manufacturer of the famous Vespa scooter): the products named Zip, Storm, Typhoon, Energy, Skip-
6
1 A New Framework for Productivity in Production Systems
Fig. 1.2 Product life cycle model
per, Quartz, and Free are all derived from the same underlying fundamental design of the scooter called “Sphere” (hence the spheroid platform). Another significant example is the standardization of car speed indicators in the automotive sector: the manufacturers tend to use the same component in every product mix regardless of the speed attainable by each individual car model. As a result of this strategy, the range of the product mix is reduced and the management of parts is simplified without affecting product performance. Every remark or comment about the techniques and strategies cited is also effective both in production systems and in supply services such as hospitals, banks, and consultants.
1.4.2 Reduction of Unremunerated Complexity: The Case of Southwest Airlines Southwest Airlines has developed several interesting ideas for reducing complexity in the service sector.
Figure 1.3 shows the cost per passenger for each mile traveled with the main US airlines. Two fundamental facts can be observed in Fig. 1.3: since 2004 the cost per passenger for each mile traveled (extrapolated from available seat miles) for Southwest Airlines has been lower than for its competitors, clearly competing in the same market and over the same routes. Moreover, the available seat mile costs of Southwest Airlines have continued to decrease since 11 September 2001, in contrast to those of its competitors. Moreover, these costs have significantly increased owing to the increase in the cost of petroleum and owing to the introduction of new safety and security standards. How can this be explained? The answer lies in the efforts of Southwest Airlines, since 1996, to reduce the variety and complexity of services offered to its customers but not directly perceived by them. A significant analysis of the fleet configurations of major American airlines is reported in Table 1.3. The average number of different models of airplane used by the major USA airlines is 14, but Southwest Airlines employs only Boeing 737 airplanes. In fact,
1.4 Models, Methods, and Technologies for Industrial Management
7
Table 1.3 Number of different models of airplane used by USA airlines (June 2008)
No. of different models of airplane in fleet
United Airlines
Delta Airlines
American Airlines
Average for USA airlines
Southwest Airlines
13
9
6
7
1
Boeing 737
very effectively. Among a great many original approaches proposed during the last two decades for the reduction of complexity in a production system, the well-known Variety Reduction Program (VRP) developed by Koudate and Suzue (1990) is worthy of mention.
1.4.3 The Production Process and Its Main Features
Fig. 1.3 Cost per passenger for each mile traveled. ASM available seat miles. (United States Securities and Exchange Commission 2000)
in June 2008, Southwest Airlines owned 535 airplanes of this particular type but using various internal configurations, ranging from 122 to 137 seats. Variety based on the type of airplane is completely irrelevant to customers. Furthermore, when a passenger buys a ticket, the airline companies do not communicate the model of airplane for that flight. However, reducing the number of different models of airplane in the fleet directly results in a significant saving for the airline company: only one simulator for pilot training is required, only one training course for technicians and maintenance staff, spare parts management and control activities are optimized, “on ground” equipment such as systems for towing and refueling are standard, etc. In spite of critical safety problems and high fuel costs, Southwest Airlines has been able to compete
Production processes in several industrial sectors have recently been forced to undergo significant transformations in order to ensure both cost reductions and high quality. A good example from the wood sector is the nonstop pressing process used to simplify the assembly process by using small flaps, gluing, and other techniques instead of screw junctions. Every process innovation capable of consuming too many production resources such as energy, manpower, and raw materials is a very useful motivating factor driving research into productivity. Consequently, when a new production investment is being made in a manufacturing or service sector, a benchmark investigation is required in order to check the state of the art of the production processes. In addition to this, from an economic or technical point of view, scouting for alternative processes that could be more effective is also recommended.
1.4.4 The Choice of Production Plant An effective production process is a basic condition in making an entire production system effective. Thorough analysis of the specific characteristics of production factors, e. g., resources and equipment required by the available processes, is one of the most important aspects of research into productivity.
8
It is possible to have two different production plants carrying out the same process with their own specifications and production lead times to get the same results, but at different costs. A great deal of effort in innovation of the plant equipment has taken place in recent years, but innovation in the production process is a very difficult problem to solve, often involving contributions from various industrial disciplines (e. g., electronics, robotics, industrial instrumentation, mechanical technology). One of the most significant trends in equipment innovation developments is represented by flexible automation, which provides the impetus for a production system to achieve high levels of productivity. Presently, industrial equipment and resources are highly automated. However, flexible automation is required so that a wide mix of different products and services is achieved without long and expensive setups. One of the best expressions of this concept, i. e., the simultaneous need for automation and flexibility, is the so-called flexible manufacturing system (FMS). A flexible manufacturing system is
1 A New Framework for Productivity in Production Systems
a melting pot where several automatic and flexible machines (e. g., computer numerical control (CNC) lathes or milling machines) are grouped and linked together using an automatic and flexible material handling system. The system can operate all job sequences, distinguish between different raw materials by their codes, download the correct part program from the logic controller, and send each part to the corresponding machine. This basic example of the integration of different parts shows how successful productivity in a modern production system can be. The potential offered by flexible automation can only be exploited effectively if every element of the integrated system is capable of sharing information quickly and easily. The information technology in flexible systems provides the connectivity between machines, tool storage systems, material feeding systems, and each part of the integrated system in general. Figure 1.4 presents a brief classification, proposed by Black and Hunter (2003), of the main manufac-
Fig. 1.4 Different kinds of manufacturing systems (Black and Hunter 2003)
1.4 Models, Methods, and Technologies for Industrial Management
turing systems in an industrial production context by comparing different methodologies based on production rates and flexibility, i. e., the number of different parts the generic system can handle. In conclusion, the required system integration means developing data exchange and sharing of information, and the development of production systems in the future will be based on this critical challenge. The current advanced information technology solutions (such as local area networks, the Internet, wire-
9
less connectivity, and radio-frequency identification (RFID)) represent a valid support in the effective integration of production activities. Figure 1.5 is extracted from a previous study by the authors and briefly summarizes the productivity paradigm discussed in this chapter. This figure was proposed for the first time by Rampersad (1995). Research into productivity also requires technical, human, and economic resources. Consequently, before a generic production initiative is embarked upon, it is
EXTERNAL DEVELOPMENTS RESOURCES MARKET DEVELOPMENTS PRODUCT DEVELOPMENTS
SELLING MARKET • • • • • • •
•
New design strategies (DFM, DFA, DFD,..) • New materials
International competition Shorter product life cycle Increasing product diversity Decreasing product quantity Shorter delivery times Higher delivery reliability Higher quality requirements
PROCESS DEVELOPMENTS • • •
LABOR MARKET • •
Increasing labour costs Lack of well-motivated and qualified personnel
Innovative processes New process strategies New joining methods
SYSTEM DEVELOPMENTS • • •
Flexible automation Integration Information technology
COMPANY COMPANY
POLICY
COMPANY OBJECTIVES
ACTIVITIES
•
High flexibility
•
Effective system design
•
Constant and high product quality
•
Effective system management
•
Short throughput times
•
Low production costs
Fig. 1.5 The new productivity paradigm for a production system. DFM design for manufacturing, DFA design for assembly, DFD design for disassembly. (Rampersad 1995)
10
essential to carry out a feasibility study and an appraisal of the economic impact. At the design stage of a product or service, a multidecision approach is often required before the production start-up is initiated. Moreover, as it involves a broad spectrum of enterprise roles and functions, an integrated management approach is achieved because brilliant design solutions can be compromised by bad management. The following section deals with the design, management, and control of a production system in accordance with a new productivity paradigm proposed by the authors.
1.5 Design, Management, and Control of Production Systems A systematic and integrated approach to the management and control of a production system is essential for rational and effective use of the above-mentioned resources and equipment. In other words, productivity must be designed and managed correctly, otherwise the enterprise will risk not being appropriately remunerated for its investment. In both the manufacturing and the service sectors, every new industrial initiative at its start-up needs a complete design process taking the following critical aspects into consideration: market demand analysis, design activity, validation of design, and sequencing and scheduling of project activities. Once the production system has been designed and installed, modern management and optimization techniques and tools need to be applied. Because of this complex scenario, the productivity goal for a complex production system can be effectively achieved by using the integrated and systematic approach shown in Fig. 1.6 (Manzini et al. 2006a). This approach summarizes the complete design procedure for a generic production system according to the current state of the art supporting decision-making techniques and methods
1.5.1 Demand Analysis The starting point of the proposed method is the product (or service) market analysis, based on up-to-date statistical forecasting methods (e. g., time series, exponential smoothing, moving average) for the extrapolation of the future demand from the current one.
1 A New Framework for Productivity in Production Systems
The logical sequence of events is therefore the design phase, and only after its approval is it possible to move on to process design, and lastly the production plant can be designed. Once system optimization has been carried out, the product can be launched on the market.
1.5.2 Product Design The product design phase involves the very important strategies and methodologies of DFM, DFA, and DFD which support management decision making in manufacturing and service companies. These two strategies take manufacturing and assembly problems, respectively, into consideration during the product design activity. The results bring about a drastic reduction in the number of redesign cases, a significant improvement in production system performance, and a noteworthy compression of product time to market. Another supporting decision-making technique is the previously mentioned VRP, which focuses on reduction of complexity. All these supporting design strategies are implemented by using several computerized system solutions: the well-known design automation tools, particularly computer-aided design and computer-aided manufacturing. The design of a new product (or service) is generally based on an interactive loop that verifies and modifies the project by the execution of several fine-tuning iterations.
1.5.3 Process and System Design The product specification forms the input data used in the production process design, which is therefore strictly dependent on the product or service to be supplied. A benchmarking analysis is fundamental to effective process design because it analyses the state of the art in process technologies. The detailed definition of the production process immediately outlines the system structure (i. e., plant, production resources, and equipment), thus choosing the right number and type of machines, tools, operators, etc., and defining the corresponding facility layout design. The plant layout problem can be solved
1.5 Design, Management, and Control of Production Systems
11
using a dedicated software platform (Ferrari et al. 2003; Gamberi et al. 2009).
impossible to experiment on a reliable prototype, performance analysis and system validation are usually conducted by using simulation (e. g., visual interactive simulation, Monte Carlo simulation, what-if analysis). This ex ante evaluation checks the formal congruity of the whole design process, supporting the final choice of system configuration and the fine-tuning of the solution adopted. The technical analysis of the configuration examined is not a guarantee of a rapid return on the industrial investment: the economic evaluation, in terms of total amount of money over time, is the most important deciding factor. For an investment analysis methods such as the well-known net present value, payback analysis, and discounted cash flow rate of return are very frequently used. The best solution results from this double-check, both technical and economic, and forms the foundation for the following phase related to execution of the activities, i. e., project planning and activity scheduling.
1.5.4 Role of Maintenance in the Design of a Production System The maintenance function is a strategic resource during the preliminary design process of a production system. The analysis and forecasting of the reliability performance of a piece of equipment significantly improve the effectiveness of the design of the whole production system. It is very important to foresee future maintenance operations and costs both in the resources/facilities and in the plant layout design so as to avoid lengthy downtimes due to, e. g., the incorrect location of machines, or to a bad assignment and scheduling of manufacturing tasks to resources and workload. The role of maintenance has been increasing in importance, thus leading to a new conceptual framework: the so-called design for maintenance directly embodies maintenance principles in the design process.
1.5.5 Material Handling Device Design In order to complete the illustration of the design process of a production system, the material handling device design has to be considered. Several decisionmaking models and methods have been developed to support this critical issue (Gamberi et al. 2009), in particular in logistics and in operations research, e. g., vehicle routing algorithms and traveling scheduling procedures.
1.5.6 System Validation and Profit Evaluation Each design activity, for product, process, material handling device, etc., is very complex. As a whole they form a set of interlaced tasks whose global solution is not the sum of individual optimizations. An integrated approach generates a set of suitable solutions to be investigated in depth from an economic and technical point of view. In conclusion, the final design must be fully validated. As the production system does not exist during the design process, and it is often almost
1.5.7 Project Planning and Scheduling The effective planning and control of each task in a generic project is crucial in avoiding any delay. To respect the project deadline means to save money, especially when several activities must be performed simultaneously or according to several precedence constraints. A great many project scheduling models and methods are presented in the literature, such as the well-known program evaluation and review technique (PERT), the critical path method (CPM), and Gantt analysis. Figure 1.6 presents a nonexhaustive list of supporting techniques and tools for the execution of the design tasks previously illustrated in general. Most of them have already been mentioned and briefly described or are discussed in the following sections.
1.5.8 New Versus Existing Production Systems Some previous considerations concern research into productivity from the design process of a new production system. But what are the requirements for a production system that has already been set up and is working?
12
1 A New Framework for Productivity in Production Systems
Obviously the challenge of productivity also involves existing production systems. The techniques previously discussed are illustrated in Fig. 1.6 and also represent a useful benchmark in the process of rationalization and optimization of existing production systems.
PROCESSES
SUPPORTING TECHNIQUES AND TOOLS
DEMAND ANALYSIS PRODUCT VARIANTS ANALYSIS
An existing production system must follow a continuous improvement process based on the multitarget scenario, as described in Sect. 1.2. First of all, the company must analyze the structure of the product mix in the production system, seeking to rationalize it, e. g., by applying some effective supporting
NEW PRODUCTS DATA MINING & DATA WAREHOUSING HISTORICAL & TREND ANALYSIS MARKET INVESTIGATION
DEMAND & FORECASTS
PRODUCTION QUANTITIES-VARIANCE ANALYSIS
PRODUCT-PROCESS-MHD INTEGRATED DESIGN
PROCESS DESIGN
PRODUCT DESIGN
MHD DESIGN
ALTERNATIVE SOLUTIONS PRODUCT DEFINITION MANUFACTURING PROCESS DEFINITION MANUFACTURING SYSTEM DEFINITION PLANT LAYOUT DEFINITION MATERIAL HANDLING SYSTEM DEFINITION
DESIGN FOR MANUFACTURING - DFM DESIGN FOR ASSEMBLY - DFA DESIGN FOR DISASSEMBLY - DFD MODULARITY AND STANDARDIZATION VARIETY REDUCTION PROGRAM - VRP DESIGN AUTOMATION TOOL (CAD/CAE, CAPP...) RESOURCES DETERMINATION LAYOUT DESIGN MATERIAL HANDLING DEVICE DESIGN VEHICLE ROUTING OPTIMIZATION RELIABILITY & MAINTAINABILITY ANALYSIS
VISUAL INTERACTIVE SIMULATION - VIS MONTE CARLO SIMULATION WHAT-IF ANALYSIS
SYSTEM VALIDATION
PROFIT ANALYSIS SYSTEM SELECTION PROJECT PLANNING & SCHEDULING
NET PRESENT VALUE - NPV ECONOMIC VALUE ADDED - EVA DISCOUNTED CASH FLOW RATE OF RETURN PAY BACK ANALYSIS DECISION TREE ANALYSIS MONTE CARLO SIMULATION PROJECT SCHEDULING ALGORITHMS PROGRAM EVALUATION & REVIEW TECHNIQUE PERT SCHEDULING CRITICAL PATH METHOD - CPM GANTT ANALYSIS
PROJECT EXECUTION
Fig. 1.6 Production system: a complete design framework. MHD material handling device. (Manzini et al. 2006a)
1.6 Production System Management Processes for Productivity
decision-making techniques such as DFM, DFA, and VRP. Modern companies must put continuous monitoring and evaluation of the degree of innovation of their processes into operation. Consequently, process innovation is an important key factor in company success. In recent years, flexible automation has become a valid reference point in process innovation. Any production plant needs some revision during its life cycle, including partial or total substitution of resources, upgrades, and plant layout reengineering. Consequently, planning and execution of prior decisions are also important for a company already on-thejob. In conclusion, the general framework in Fig. 1.6 is also valid for existing production systems. The most important question remains how to choose the most convenient strategy and effective supporting decision methods from the very large collection of solutions available in the literature. The generic case study has its specific peculiarities making it different from all the others. That is why, at a first
13
glance, it is not easy to detect a suitable tool from the wide set of models and methods that can be used to support management decision making.
1.6 Production System Management Processes for Productivity This book discusses a set of effective management procedures, models, methods, and techniques, directly affecting the productivity performance of a production system. Even though they mainly deal with maintenance, safety, and quality assessments, we now illustrate a conceptual framework which classifies the most important management activities into three macro classes: materials and inventory management, production planning, and product/service distribution management (Fig. 1.7). All these activities have to be managed and optimized by whoever in a business unit, in a production system, or in an enterprise is concerned with research for productivity.
Distribution management Production planning Inventory and purchasing management Production system management
Inventory and purchasing management Economic order quantity Safety stock Just in time Comakership with suppliers
Production planning
Aggregate programming Material requirement planning Manufacturing resource planning
Distribution management Distribution resource planning Location allocation problem Transporation
Scheduling Vehicle routing
Fig. 1.7 Production system management activities
14
This book can effectively support the managers, analysts, and practitioners in a generic production system in making the best decisions regarding products, processes, and production plants, in accordance with customer’s expectations of quality and minimizing production costs with particular attention to the reduction of the production system downtimes and to the reliability/availability of products, processes, and equipment. The focus of this work is coherent with the definition of maintenance as “the combination of all technical, administrative and managerial actions during the life cycle of an item intended to retain it in, or restore it to, a state in which it can perform the required function” (European standard EN 13306:2001 – Maintenance terminology), and with the definition of quality management as the system which assists in enhancing customer satisfaction (European standard EN ISO 9000:2006 Quality management systems – Fundamentals and vocabulary). Consequently, the main keywords of this book are as follows: productivity, quality and safety by reliability engineering, maintenance, quality, and safety assessment.
1.6.1 Inventory and Purchasing Management A generic production system needs a fulfillment system for the continuous supply of raw materials and therefore has to cope with material management. In modern companies the traditional economic order quantity (EOQ) and safety stock methods are combined with a great many effective techniques based on pull logics, such as just-in-time strategy. Other eligible methodologies, such as consignment stock, electronic data interchange, comakership, business to business, and e-marketplaces, provide for very close cooperation between customers (service clients) and manufacturers (service providers).
1.6.2 Production Planning Production planning is a second management macroarea with significant impact on productivity. The aim of a preliminary definition of production planning is to provide a fundamental prerequisite for resource requirement planning. These programs are scheduled
1 A New Framework for Productivity in Production Systems
with reference to different time fences, or planning periods, with an increasing degree of detail: from a wide and outermost time fence, related to aggregate programming, to a narrow and very close time fence, related to detailed programming. After the aggregated programming phase, material and resource requirements need to be quantified. Techniques such as the well-known material requirement planning and manufacturing resource planning are usually suitable for this purpose, but the literature also contains several models and methods for so-called advanced planning: advanced planning systems (APS). Lastly, the final step requires the direct “load” of machines and assignment of workload. This is shortterm scheduling. The goal is to define the priorities of different jobs on different items of equipment and machines.
1.6.3 Distribution Management The third important management problem relates to the final distribution of products and services. The main problems are the following: the planning of shipments, generally issued as distribution resource planning; the location–allocation problem along the distributive network, i. e., the simultaneous location of equipment and logistic resources such as distribution centers and warehousing systems; the allocation of customer demand to the available set of resources; the optimal selection of transportation systems; the vehicle routing; and, finally, the execution of distribution activities.
1.7 Research into Productivity and Maintenance Systems The frameworks for the design and management of a production system, illustrated in Figs. 1.1, 1.5, and 1.6, underline how important the contributions of reliability, availability, and quality of resources (equipment, employees, and production plants) are to the production of products or services. In particular, there is a very strong positive link between maintenance and productivity. For example, the availability of a production plant is an absolute necessity for the scheduling of work orders, and spare parts forecasting
1.7 Research into Productivity and Maintenance Systems
is a fundamental part of the planning and design processes (see Chap. 11). A very important factor in purchasing is the quality control of raw materials, and the new design techniques, such as DFM and DFA, must guarantee quality levels set as targets. Modern companies must consider maintenance strategies, rules, procedures, and actions to be some of the most important issues and factors in their success. In other words, the effective design and management of a production system requires the effective design and management of the correlated maintenance process and system. A maintenance system requires strategic planning, dedicated budgets, relevant investments in terms of money and human resources, equipment, and spare parts too. In particular, the availability and commitment of personnel at all levels of an organization also includes the application of the maintenance process. An effective maintenance system provides supporting decision-making techniques, models, and methodologies, and enables maintenance personnel to apply them in order to set the global production costs at a minimum and to ensure high levels of customer service. To achieve this purpose in a production system, those elements such as the ability, skill, and knowledge required by the whole organization and in particular by product designers, production managers, and people who directly operate in the production plants, are crucial. In conclusion, as illustrated in Fig. 1.8, maintenance techniques, including also quality and safety assessment tools and procedures, represent very effective instruments for research into productivity, safety, and quality as modern companies are now forced to pursue them relentlessly. This issue will be demonstrated and supported in detail in the following chapters. The following chapters are organized as follows: • Chapter 2 introduces quality assessment and presents statistical quality control models and methods and Six Sigma theory and applications. A brief illustration and discussion of European standards and specifications for quality assessment is also presented. • Chapter 3 deals with safety assessment and risk assessment with particular attention being given to
15
Fig. 1.8 Maintenance engineering, safety assessment, and quality assessment
•
•
•
•
•
risk analysis and risk reduction procedures. Some exemplifying standards and specifications are illustrated. Chapter 4 introduces maintenance and maintenance management in production systems. An illustration of total productive maintenance production philosophy is also presented. Chapter 5 introduces the main reliability and maintenance terminology and nomenclature. It presents and applies basic statistics and reliability models for the evaluation of failure (and repair) activities in repairable (and nonrepairable) elementary components. Chapter 6 illustrates some effective statistics-based models and methods for the evaluation and prediction of reliability. This chapter also discusses the elementary reliability configurations of a production system, the so-called reliability block diagrams. Chapter 7 discusses the maintenance information systems and their strategic role in maintenance management. A discussion on computer maintenance management software (CMMS) is also presented. Finally, failure rate prediction models are illustrated and applied. Chapter 8 presents and applies models for the analysis and evaluation of failure mode, effects, and criticality in modern production systems. Then models, methods, and tools (failure modes and effects analysis and failure mode, effects, and criti-
16
cality analysis, fault tree analysis, Markov chains, Monte Carlo dynamic simulation) for the evaluation of reliability in complex production systems are illustrated and applied to numerical examples and case studies. • Chapter 9 presents several models and methods to plan and conduct maintenance actions in accordance with corrective, preventive, and inspection
1 A New Framework for Productivity in Production Systems
strategies and rules. Several numerical examples and applications are illustrated. • Chapter 10 illustrates advanced models and methods for maintenance management. • Chapter 11 discusses spare parts management and fulfillment models and tools. • Chapter 12 presents and discusses significant case studies on reliability and maintenance engineering.
2
Quality Management Systems and Statistical Quality Control
Contents 2.1
Introduction to Quality Management Systems . . . 17
2.2
International Standards and Specifications . . . . . . 19
2.3
ISO Standards for Quality Management and Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.3.1 Quality Audit, Conformity, and Certification 19 2.3.2 Environmental Standards . . . . . . . . . . . . . . . . . 21
2.4
Introduction to Statistical Methods for Quality Control . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.4.1 The Central Limit Theorem . . . . . . . . . . . . . . . 23 2.4.2 Terms and Definition in Statistical Quality Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.5
Histograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.6
Control Charts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.7
Control Charts for Means . . . . . . . . . . . . . . . . . . . . . 2.7.1 The R-Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.7.2 Numerical Example, R-Chart . . . . . . . . . . . . . 2.7.3 The x-Chart N ........................... 2.7.4 Numerical Example, x-Chart N ............. 2.7.5 The s-Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.7.6 Numerical Example, s-Chart and x-Chart N ...
26 26 29 29 30 30 33
2.8
Control Charts for Attribute Data . . . . . . . . . . . . . . 2.8.1 The p-Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8.2 Numerical Example, p-Chart . . . . . . . . . . . . . . 2.8.3 The np-Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8.4 Numerical Example, np-Chart . . . . . . . . . . . . 2.8.5 The c-Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8.6 Numerical Example, c-Chart . . . . . . . . . . . . . . 2.8.7 The u-Chart . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.8.8 Numerical Example, u-Chart . . . . . . . . . . . . . .
33 35 36 37 37 37 39 40 40
2.9
Capability Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 2.9.1 Numerical Example, Capability Analysis and Normal Probability . . . . . . . . . . . . . . . . . . 42 2.9.2 Numerical Examples, Capability Analysis and Nonnormal Probability . . . . . . . . . . . . . . . 46
2.10 Six Sigma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 2.10.1 Numerical Examples . . . . . . . . . . . . . . . . . . . . 51 2.10.2 Six Sigma in the Service Sector. Thermal Water Treatments for Health and Fitness . . . . 51
Organizations depend on their customers and therefore should understand current and future customer needs, should meet customer requirements and strive to exceed customer expectations... Identifying, understanding and managing interrelated processes as a system contributes to the organization’s effectiveness and efficiency in achieving its objectives (EN ISO 9000:2006 Quality management systems – fundamentals and vocabulary). Nowadays, user and consumer assume their own choices regarding very important competitive factors such as quality of product, production process, and production system. Users and consumers start making their choices when they feel they are able to value and compare firms with high quality standards by themselves. This chapter introduces the reader to the main problems concerning management and control of a quality system and also the main supporting decision measures and tools for so-called statistical quality control (SQC) and Six Sigma.
2.1 Introduction to Quality Management Systems The standard EN ISO 8402:1995, replaced by EN ISO 9000:2005, defines “quality” as “the totality of characteristics of an entity that bear on its ability to satisfy stated and implied needs,” and “product” as
R. Manzini, A. Regattieri, H. Pham, E. Ferrari, Maintenance for Industrial Systems © Springer 2010
17
18
2 Quality Management Systems and Statistical Quality Control
“the result of activities or processes and can be tangible or intangible, or a combination thereof.” Consequently, these definitions refer to production systems both in industrial sectors, such as insurance, banking, and transport, and service sectors, in accordance with the conceptual framework introduced in Chap. 1. Another synthetic definition of quality is the “degree to which a set of inherent characteristics fulfills requirements” (ISO 9000:2005). A requirement is an expectation; it is generally related to the organization, customers, or other interested, or involved, parties. We choose to name all these entities, i. e., the stakeholders of the organization, as customers and, consequently, the basic keyword in quality management is customer satisfaction. Another basic term is capability as the ability of the organization, system, or process to realize a product fulfilling the requirements. A quality management system is a particular management system driving the organization with regard to quality. In other words, it assists companies and organizations in enhancing customer satisfaction. This is the result of products capable of satisfying the everchanging customer needs and expectations that consequently require the continuous improvement of products, processes, and production systems. Quality management is a responsibility at all levels of management and involves all members of an organi-
zation. For this reason, in the 1980s total quality management (TQM) as a business management strategy aimed at embedding awareness of quality in all organizational processes found very great success. According to the International Organization for Standardization (ISO) standards (ISO 9000:2006), the basic steps for developing and implementing a quality management system are: • determination of needs and expectations of customers and other involved parties; • definition of the organization’s quality policy and quality objectives; • determination of processes and responsibilities for quality assessment; • identification and choice of production resources necessary to attain the quality objectives; • determination and application of methods to measure the effectiveness and efficiency of each process within the production system; • prevention of nonconformities and deletion of the related causes; • definition and application of a process for continuous improvement of the quality management system. Figure 2.1 presents the model of a process-based quality management system, as proposed by the ISO standards. Continual improvement of the quality management system
Management responsability
Interested parties
Requirements
Fig. 2.1 Process-based quality management system (ISO 9000:2005)
Mesaurement analysis & improvement
Resource management realization
Interested parties
IN
Value adding activities Information flow
Product realization
Product
Satisfaction
OUT
2.3 ISO Standards for Quality Management and Assessment
2.2 International Standards and Specifications According to European Directive 98/34/EC of 22 June 1998, a “standard” is a technical specification for repeated or continuous application approved, without a compulsory compliance, by one of the following recognized standardization bodies: • ISO; • European standard (EN); • national standard (e. g., in Italy UNI). Standards are therefore documents defining the characteristics (dimensional, performance, environmental, safety, organizational, etc.) of a product, process, or service, in accordance with the state of the art, and they are the result of input received from thousands of experts working in the European Union and elsewhere in the world. Standards have the following distinctive characteristics: • Consensuality: They must be approved with the consensus of the participants in the works of preparation and confirmed by the result of a public enquiry. • Democracy: All the interested economic/social parties can participate in the works and, above all, have the opportunity to make observations during the procedure prior to final and public approval. • Transparency: UNI specifies the basic milestones of the approval procedure for a draft standard, placing the draft documents at the disposal of the interested parties for consultation. • Voluntary nature: Standards are a source of reference that the interested parties agree to apply freely on a noncompulsory basis. In particular CEN, the European Committee for Standardization founded in 1961 by the national standards bodies in the European Economic Community and EFTA countries, is contributing to the objectives of the European Union and European Economic Area with voluntary technical standards promoting free trade, safety of workers and consumers, interoperability of networks, environmental protection, exploitation of research and development programs, and public procurement. CEN works closely with the European Committee for Electrotechnical Standardization (CENELEC), the European Telecommunications Standards Institute
19
(ETSI), and the ISO. CEN is a multisectorial organization serving several sectors in different ways, as illustrated in the next sections and chapters dealing with safety assessment.
2.3 ISO Standards for Quality Management and Assessment The main issues developed by the technical committee for the area of quality are: 1. CEN/CLC/TC 1 – criteria for conformity assessment bodies; 2. CEN/SS F20 – quality assurance. Table 2.1 reports the list of standards belonging to the first technical committee since 2008. Similarly, Table 2.2 reports the list of standards belonging to the technical committee CEN/SS F20 since 2008, while Table 2.3 shows the list of standards currently under development. Quality issues are discussed in several standards that belong to other technical groups. For example, there is a list of standards of the aerospace series dealing with quality, as reported in Table 2.4. Table 2.5 presents a list of standards for quality management systems in health care services. Similarly, there are other sets of standards for specific sectors, businesses, or products.
2.3.1 Quality Audit, Conformity, and Certification Quality audit is the systematic examination of a quality system carried out by an internal or external quality auditor, or an audit team. It is an independent and documented process to obtain audit evidence and to allow its objective evaluation, in order to verify the extent of the fulfillment of the audit criteria. In particular, third-party audits are conducted by external organizations providing certification/registration of conformity to a standard or a set of standards, e. g., ISO 9001 or ISO 14001. The audit process is the basis for the declaration of conformity. The audit process is conducted by an auditor, or an audit team, i. e., a person or a team, with competence
20
2 Quality Management Systems and Statistical Quality Control
Table 2.1 CEN/CLC/TC 1 criteria for conformity assessment bodies, standards published since 2008 Standard
Title
EN 45011:1998
General requirements for bodies operating product certification systems (ISO/IEC Guide 65:1996) Attestation Standard for the assessment of contract award procedures of entities operating in the water, energy, transport and telecommunications sectors Conformity assessment – Vocabulary and general principles (ISO/IEC 17000:2004) Conformity assessment – General requirements for accreditation bodies accrediting conformity assessment bodies (ISO/IEC 17011:2004) General criteria for the operation of various types of bodies performing inspection (ISO/IEC 17020:1998) Conformity assessment – Requirements for bodies providing audit and certification of management systems (ISO/IEC 17021:2006) Conformity assessment – General requirements for bodies operating certification of persons (ISO/IEC 17024:2003) General requirements for the competence of testing and calibration laboratories (ISO/IEC 17025:2005) General requirements for the competence of testing and calibration laboratories (ISO/IEC 17025:2005/Cor.1:2006) Conformity assessment – General requirements for peer assessment of conformity assessment bodies and accreditation bodies (ISO/IEC 17040:2005) Conformity assessment – Supplier’s declaration of conformity – Part 1: General requirements (ISO/IEC 17050-1:2004) Conformity assessment – Supplier’s declaration of conformity – Part 2: Supporting documentation (ISO/IEC 17050-2:2004)
EN 45503:1996 EN ISO/IEC 17000:2004 EN ISO/IEC 17011:2004 EN ISO/IEC 17020:2004 EN ISO/IEC 17021:2006 EN ISO/IEC 17024:2003 EN ISO/IEC 17025:2005 EN ISO/IEC 17025:2005/AC:2006 EN ISO/IEC 17040:2005 EN ISO/IEC 17050-1:2004 EN ISO/IEC 17050-2:2004
Table 2.2 CEN/SS F20 quality assurance, standards published since 2008 Standard
Title
EN 45020:2006 EN ISO 10012:2003
Standardization and related activities – General vocabulary (ISO/IEC Guide 2:2004) Measurement management systems – Requirements for measurement processes and measuring equipment (ISO 10012:2003) Primary packaging materials for medicinal products – Particular requirements for the application of ISO 9001:2000, with reference to good manufacturing practice (GMP) (ISO 15378:2006) Guidelines for quality and/or environmental management systems auditing (ISO 19011:2002) Quality management systems – Fundamentals and vocabulary (ISO 9000:2005) Quality management systems – Requirements (ISO 9001:2000) Quality management systems – Guidelines for performance improvements (ISO 9004:2000)
EN ISO 15378:2007
EN ISO 19011:2002 EN ISO 9000:2005 EN ISO 9001:2000 EN ISO 9004:2000
Table 2.3 CEN/SS F20 quality assurance, standards under development as of October 2008 Standard
Title
ISO 15161:2001
Guidelines on the application of ISO 9001:2000 for the food and drink industry (ISO 15161:2001) Quality management systems – Requirements (ISO/FDIS 9001:2008) Guidelines for auditing management systems Managing for the sustained success of an organization – A quality management approach (ISO/DIS 9004:2008)
prEN ISO 9001 prEN ISO 19011 rev prEN ISO 9004
2.3 ISO Standards for Quality Management and Assessment
21
Table 2.4 Aerospace series, quality standards Standard
Title
EN 9102:2006 EN 9103:2005
Aerospace series – Quality systems – First article inspection Aerospace series – Quality management systems – Variation management of key characteristics Aerospace series – Quality systems – Model for quality assurance applicable to maintenance organizations Aerospace series – Quality management systems –Requirements for stockist distributors (based on ISO 9001:2000) Aerospace series – Quality management systems –Requirements for Aerospace Quality Management System Certification/Registrations Programs Aerospace series – Quality management systems – Assessment applicable to maintenance organizations (based on ISO 9001:2000) Aerospace series – Quality management systems – Assessment applicable to stockist distributors (based on ISO 9001:2000) Aerospace series – Quality management systems – Data Matrix Quality Requirements for Parts Marking Aerospace series – Qualification and approval of personnel for nondestructive testing Aerospace series – Recommended practices for standardizing company standards Aerospace series – Quality management systems – Assessment (based on ISO 9001:2000) Aerospace series – Quality management systems – Part 002: Requirements for Oversight of Aerospace Quality Management System Certification/Registrations Programs
EN 9110:2005 EN 9120:2005 EN 9104:2006 EN 9111:2005 EN 9121:2005 EN 9132:2006 EN 4179:2005 EN 4617:2006 EN 9101:2008 EN 9104-002:2008
Table 2.5 CEN/TC 362, health care services, quality management systems Standard
Title
CEN/TR 15592:200
Health services – Quality management systems – Guide for the use of EN ISO 9004:2000 in health services for performance improvement Health services – Quality management systems – Guide for the use of EN ISO 9001:2000
CEN/TS 15224:2005
to conduct an audit, in accordance with an audit program consisting of a set of one or more audits planned for a specific time frame. Audit findings are used to assess the effectiveness of the quality management system and to identify opportunities for improvement. Guidance on auditing is provided by ISO 19011:2002 (Guidelines for quality and/or environmental management systems auditing). The main advantages arising from certification are: • • • •
improvement of the company image; increase of productivity and company profit; rise of contractual power; quality guarantee of the product for the client.
In the process of auditing and certification, the documentation plays a very important role, enabling communication of intent and consistency of action. Several types of documents are generated in quality management systems.
2.3.2 Environmental Standards Every standard, even if related to product, service, or process, has an environmental impact. For a product this can vary according to the different stages of the product life cycle, such as production, distribution, use, and end-of-life. To this purpose, CEN has recently been playing a major role in reducing environmental impacts by influencing the choices that are made in connection with the design of products and processes. CEN has in place an organizational structure to respond to the challenges posed by the developments within the various sectors, as well as by the evolution of the legislation within the European Community. The main bodies within CEN are: 1. The Strategic Advisory Body on the Environment (SABE) – an advisory body for the CEN Technical Board on issues related to environment. Stakeholders identify environmental issues of importance
22
2 Quality Management Systems and Statistical Quality Control
to the standardization system and suggest corresponding solutions. 2. The CEN Environmental Helpdesk provides support and services to CEN Technical Bodies on how to address environmental aspects in standards. 3. Sectors – some sectors established a dedicated body to address environmental matters associated with their specific needs, such as the Construction Sector Network Project for the Environment (CSNPE). 4. Associates – two CEN associate members provide a particular focus on the environment within standardization: • European Environmental Citizens Organization for Standardization (ECOS); • European Association for the Coordination of Consumer Representation in Standardization (ANEC).
Table 2.6 reports the list of technical committees on the environment. There are several standards on environmental management. To exemplify this, Table 2.7 reports the list of standards grouped in accordance with the committee CEN/SS S26 – environmental management. ISO 14000 is a family of standards supporting the organizations on the containment of the polluting effects on air, water, or land derived by their operations, in compliance with applicable laws and regulations. In particular, ISO 14001 is the international specification for an environmental management system (EMS). It specifies requirements for establishing an environmental policy, determining environmental aspects and impacts of products/activities/services, planning environmental objectives and measurable targets, implementation and operation of programs to meet objectives and targets, checking and corrective action, and management review.
Table 2.6 Technical committees on the environment Technical commitee
Title
CEN/TC 223 CEN/TC 230 CEN/TC 264 CEN/TC 292 CEN/TC 308 CEN/TC 345 CEN/TC 351
Soil improvers and growing media Water analysis Air quality Characterization of waste Characterization of sludges Characterization of soils Construction Products – Assessment of release of dangerous substances
Table 2.7 Committee CEN/SS S26 – environmental management Standard
Title
EN ISO 14031:1999
Environmental management – Environmental performance evaluation – Guidelines (ISO 14031:1999) Environmental management systems – Requirements with guidance for use (ISO 14001:2004) Environmental labels and declarations – Type I environmental labeling – Principles and procedures (ISO 14024:1999) Environmental labels and declarations – Self-declared environmental claims (Type II environmental labelling) (ISO 14021:1999) Environmental labels and declarations – General principles (ISO 14020:2000) Environmental management – Life cycle assessment – Principles and framework (ISO 14040:2006) Environmental management – Life cycle assessment – Requirements and guidelines (ISO 14044:2006) Environmental management systems – Guidelines for a staged implementation of an environmental management system, including the use of environmental performance evaluation
EN ISO 14001:2004 EN ISO 14024:2000 EN ISO 14021:2001 EN ISO 14020:2001 EN ISO 14040:2006 EN ISO 14044:2006 prEN ISO 14005
2.4 Introduction to Statistical Methods for Quality Control
uniform
μ
23
n=2
x
n = 10
x
μ
x
μ
exponential
n = 10 n=2
μ
x
x
μ
μ
Gauss
n = 10 n=2
x
x μ
μ
μ
Fig. 2.2 Central limit theorem, examples
2.4 Introduction to Statistical Methods for Quality Control The aim of the remainder of this chapter is the introduction and exemplification of effective models and methods for statistical quality control. These tools are very diffuse and can be used to guarantee also the reliability,1 productivity and safety of a generic production system in accordance with the purpose of this book, as illustrated in Chap. 1.
2.4.1 The Central Limit Theorem This section briefly summarizes the basic result obtained by this famous theorem. Given a population or process, a random variable x, with mean and standard deviation , let xN be the mean of a random sample made of n elements x1 ; x2 ; : : : ; xn extracted from this population: when the sample size n is sufficiently large, the sampling distribution of the random vari1
Reliability, properly defined in Chap. 5, can be also defined as “quality in use.”
able xN can be approximated by a normal distribution. The larger the value of n, the better the approximation. This theorem holds irrespective of the shape of the population, i. e., of the density function of the variable x. The analytic translation of the theorem is given by the following equations: M.x/ N D xN D ; O O .x/ N Dp ; n
(2.1) (2.2)
where O is the estimation of and O is the estimation of . Figure 2.2 graphically and qualitatively demonstrates these results representing the basis for the development and discussion of the methods illustrated and applied below. In the presence of a normal distribution of population, the variable xN is normal too for each value of size n. Figure 2.3 quantitatively demonstrates the central limit theorem starting from a set of random values distributed in accordance with a uniform distribution Œ0; 10: the variable xN is a normally distributed variable when the number of items used for the calculus of mean xN i is sufficiently large. In detail, in Fig. 2.3 the size n is assumed be 2, 5, and 20.
24
2 Quality Management Systems and Statistical Quality Control
Histogram of DATA (n=1) and means (n>1) DATA (n=1)
Mean (n=2) 0.20
0.16 0.15
0.12
0.10
Density
0.08
0.05
0.04 0.00
0.00 0.0
1.5
3.0
4.5
6.0
7.5
9.0
Mean (n=5)
1.5
3.0
4.5
6.0
7.5
9.0
Mean (n=20)
0.8
0.3 0.6 0.2
0.4
0.1
0.2
0.0
0.0 1
2
3
4
5
6
7
8
3.6
4.2
4.8
5.4
6.0
6.6
Fig. 2.3 Central limit theorem, histogram of xN for n D f1; 2; 5; 20g. Uniform distribution of variable x
2.4.2 Terms and Definition in Statistical Quality Control Quality control is a part of quality management (ISO 9000:2005) focused on the fulfillment of quality requirements. It is a systematic process to monitor and improve the quality of a product, e. g., a manufactured article, or service by achieving the quality of the production process and the production plant. A list of basic terms and definitions in accordance with the ISO standards follows: • Process, set of interrelated activities turning input into output. It is a sequence of steps that results in an outcome. • Product, result of a process. • Defect, not fulfillment of a requirement related to an intended or specified use. • Measurement process, set of operations to determine the value of a quantity. • Key characteristic, a quality characteristic the product or service should have to fulfill customer requirements and expectations. • Value of a key characteristic. For several products a single value is the desired quality level for a characteristic. • Nominal or target value. It is the expected value for the key characteristic. It is almost impossible to make each unit of product or service identical to the
next; consequently it is nonsense to ask for identical items having a key characteristic value exactly equal to the target value. This need for flexibility is supported by the introduction of limits and tolerances. • Specification limit, or tolerance, conformance boundary, range, specified for a characteristic. The lower specification limit (LSL) is the lower conformance boundary, the upper specification limit (USL) is the upper conformance boundary. The following equation summarizes the relationship among these terms: Specification limits D (nominal value) ˙ tolerance: (2.3) • One-sided tolerance. It relates to characteristics with only one specification limit. • Two-sided tolerance. It refers to characteristics with both USLs and LSLs. • Nonconformity. It is a nonfulfillment of a requirement. It is generally associated with a unit: a nonconformity unit, i. e., a unit that does not meet the specifications. • Nonconforming product or service. A product or service with one or more nonconformities. A nonconforming product is not necessary defective, i. e., no longer fit for use.
2.6 Control Charts
25
2.5 Histograms Histograms are effective and simple graphic tools for the comprehension and analysis of a process behavior with regards to the target value and the specification limits. The histograms illustrate the frequency distribution of variable data: the values assumed by the variable are reported on the abscissa, while the vertical axis reports the absolute or relative frequency values. The specification limits are generally included in the graph and give warnings of possible process problems. Figure 2.4 exemplifies a few histogram shapes. The control charts illustrated in the next section represent a more effective tool for the analysis of a production process.
2.6 Control Charts Control charts, introduced by W.A. Shewhart in 1924, are effective tools for the analysis of the variation of repetitive processes. They are able to identify possible sources of process variation in order to control and eventually eliminate them. In a generic process, two different kinds of variations can be distinguished:
configurations
1. Common causes variations. They are the noise of a production system and are uncontrollable variations. 2. Assignable (or special) causes variations. They can be properly identified and controlled. Some examples are turnover in workman load, breakdowns, machine or tool wear out, and tool change. Control charts are a family of tools for detecting the existence of special causes variations in order to avoid them, i. e., eliminate all anomalous controllable patterns, and bring the process into a state called “of statistical control,” or simply “in control,” whose random behavior is justified by the existence of common causes variations. The “in control” state is necessary to obtain conforming products, as properly discussed in the following sections on capability analysis and Six Sigma. Control charts can be constructed by extracting successive samples from the variable output of the process. These samples, also called “subgroups,” all have size n and have to be taken at regular intervals of time. For each group a summary statistic is calculated and plotted as illustrated in Fig. 2.5. Typical statistical measures calculated for each subgroup are reported in Table 2.8, where the related statistical distribution is cited together with the values of
reasons
configurations
reasons
"Special (assignable) causes" of variation, i.e. errors of measurement or in the activity of data collection LSL
process variation too large for the specification limits LSL
USL
USL
process shifted to the "right" or measurements are out of calibration LSL
"truncated" data LSL
USL
USL
process shifted to the "left" or measurements are out of calibration LSL
USL
granularity, i.e. "granular process" LSL
USL
mix of two different processes, e.g. data from two operators, two machines, or collected at different points in time LSL
USL
stable process within specifications LSL
USL
Fig. 2.4 Exemplifying histograms shapes. LSL lower specification limit, USL upper specification limit
2 Quality Management Systems and Statistical Quality Control
Subgroup statistic (e.g. R, p, u)
26
UCL centerline LCL
1
2
3
4
5 …
Subgroup number
Fig. 2.5 Control chart
the centerline and control limits, as properly defined in the next subsections. A control chart is made of three basic lines as illustrated in Fig. 2.5: 1. Centerline. It is the mean of the statistic quantified for each subgroup, the so-called subgroup statistic (e. g., mean, range, standard deviation). 2. Control limits. These limits on a control chart delimit that region where a data point falls outside, thus alerting one to special causes of variation. This region is normally extended three standard deviations on either side of the mean. The control limits are: • upper control limit (UCL), above the mean; • lower control limit (LCL), below the mean. The generic point of the chart in Fig. 2.5 may represent a subgroup, a sample, or a statistic. k different samples are associated with k different points whose temporal sequence is reported on the chart. Control limits are conventionally set at a distance of three standards errors, i. e., three deviations of the subgroup statistic, from the centerline, because the distribution of samples closely approximates a normal statistical distribution by the central limit theorem. Consequently, the analyst expects that about 99.73% of samples lie within three standard deviations of the mean. This corresponds to a probability of 0.27% that a control chart point falls outside one of the previously defined control limits when no assignable causes are present. In some countries, such as in the UK, the adopted convention of ˙ three standard deviations is different. Figure 2.6 presents eight different anomalous patterns of statistic subgroups tested by Minitab® Statis-
tical Software to find reliable conditions for the in, or out, control state of the process. A process is said to be “in control” when all subgroups on a control chart lie within the control limits and no anomalous patterns are in the sequence of points representing the subgroups. Otherwise, the process is said to be “out of control,” i. e., it is not random because there are special causes variations affecting the output obtained. What happens in the presence of special causes? It is necessary to identify and eliminate them. Consequently, if a chart shows the possible existence of special causes by one of the anomalous behaviors illustrated in Fig. 2.6, the analyst and the person responsible for the process have to repeat the analysis by eliminating the anomalous subgroups. Now, if all the tests are not verified, the process has been conducted to the state of statistical control.
2.7 Control Charts for Means These charts refer to continuous measurement data, also called “variable data” (see Table 2.8), because there are an infinite number of data between two generic ones.
2.7.1 The R-Chart This is a chart for subgroup ranges. The range is the difference between the maximum and the minimum values within a sample of size n: Ri D max fxij g min fxij g; j D1;:::;n
j D1;:::;n
(2.4)
where i is a generic sample and xij is the j th value in the sample i . Consequently, the centerline is R D RN D
k 1X Ri : k
(2.5)
i D1
This value is a good estimation of the mean value of variable Ri , called “R .” We also define the statistic measure of variability of the variable Ri , the standard deviation R . By the central limit theorem, the distribution of values Ri is normal. As a consequence, the
Variable
Variable
Variable
Attribute
Attribute
Attribute
Attribute
R-chart
s-chart
x-chart N
p-chart
np-chart
c-chart
u-chart
Discrete – Poisson distribution
Discrete – Poisson distribution
Discrete – binomial distribution
Discrete – binomial distribution
Continuous – normal distribution
Continuous – normal distribution
Continuous – normal distribution
Statistical distribution
Nonconformities per unit u
Number of nonconformities
Number of nonconformities np
Nonconforming proportion p
Mean xN
Standard deviation s
Range R
Statistic measure
iD1
iD1
k 1 X ui k
uN D
iD1
k 1 X ci k
cN D
npN
x1 C x2 C C xk1 C xk n1 C n2 C C nk1 C nk
pN D
iD1
k 1 X pi k
pN D
iD1
k 1 X N Xi O D XNN D k
O S D sN D
k 1 X si k
sN =c4 O UCL D O C 3 p XNN C 3 p D XNN C A3 sN n n s N =c O 4 N LCL D O 3 p XN 3 p D XNN A3 sN n n s p.1 N p/ N UCL D pN C 3 n s p.1 N p/ N LCL D pN 3 n s p.1 N p/ N UCLi D pN C 3 ni s p.1 N p/ N LCLi D pN 3 ni p N p/ N UCL D npN C 3 np.1 p LCL D npN 3 np.1 N p/ N p UCL D cN C 3 cN p LCL D cN 3 cN s uN UCLi D uN C 3 ni s uN LCLi D uN 3 ni
N 2 R=d O UCL D O C 3 p XNN C 3 p D XNN C A2 RN n n N R=d O 2 LCL D O 3 p XNN 3 p D XNN A2 RN n n
UCL D B4 sN LCL D B3 sN
UCL D R C 3R Š D4 RN LCL D R 3R Š D3 RN
k 1 X RN D Ri k iD1
Control limits
Centerline
UCL upper control limit, LCL lower control limit, UCLi upper control limit for sample i , LCLi lower control limit for sample i
Type of data
Name
Table 2.8 Statistical measures and control chart classification
O D
O D
sN c4
RN d2
Standard deviation
2.7 Control Charts for Means 27
28
2 Quality Management Systems and Statistical Quality Control
Test 1 – 1 point beyond 3 std.dev. (zone A)
Test 2 – 9 points in a row on same side of the center line
Test 3 – 6 points in a row all increasing (or decreasing)
Test 4 – 14 points in a row alternang up and down
Test 5 – 2/3 points in a row more than 2 std.dev.
Test 6 – 4 out of 5 points more than 1 std.dev.
Test 7 – 15 points in a row within 1 std.dev. (either side)
Test 8 – 8 points in a row more than 1 std.dev. (either side)
Fig. 2.6 Eight tests for special causes investigation, Minitab® Statistical Software
control limits are defined as N UCLR D R C 3R Š D4 R; N LCLR D R 3R Š D3 R;
(2.6)
where R is the standard deviation of the variable R and D4 is a constant value depending on the size of
the generic subgroup. The values are reported in Appendix A.2. The following equation represents an estimation of the standard deviation of the variable and continuous data xij : RN (2.7) O D d2
2.7 Control Charts for Means
29
2.7.2 Numerical Example, R-Chart
2.7.3 The x-Chart N
This application refers to the assembly process of an automotive engine. The process variable is a distance, D, measured in tenths of millimeters, between two characteristic axes in the drive shafts and heads. Table 2.9 presents the data collected over 25 days of observation and grouped in samples of size n D 5. By the application of Eqs. 2.5 and 2.6, we have
This is a chart for subgroup means. In the x-chart, N also called “x-chart,” the problem is the estimation of the standard deviation of the population of values. In Sect. 2.7.1, Eq. 2.7 is an effective estimation. Consequently, this chart is generally constructed after the creation of the R-chart and reveals the process to be in the state of statistical control. The centerline of the statistic variable xN is the average of the subgroup means: n k k X X 1X O D . O x/ N D xN D xN i D xij : (2.8) k
k 1X 1 .R1 C C R25 / Š 6:50; RN D Ri D k 25 i D1
UCL D R C 3R Š D4 RN LCL D R 3R Š D3 RN
D
D4 .nD5/D2:114
i D1
13:74;
The control limits are D
D4 .nD5/D0
0:
The R-chart obtained is reported in Fig. 2.7. Previously introduced tests for anomalous behaviors are not verified. As a consequence, the process seems to be random and “coherent with itself” and its characteristic noise and variance. There are no special causes of variation.
N 2 O R=d N UCLxN D O C 3 p xN C 3 p D xN C A2 R; n n N 2 O R=d N LCLxN D O 3 p xNN 3 p D xNN A2 R; n n (2.9) where d2 and A2 are constant values as reported in Appendix A.2.
Table 2.9 Data – 25 subgroups, numerical example Sample
Month
Day
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
25 26 27 28 29 30 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
i D1 j D1
D (mm=10) 0:387 4.251 2:727 6.980 3.978 3.424 4:285 1:756 4.184 3:577 2:467 1.199 4.312 3.282 2.000 3.268 3.356 0:240 4:524 0.837 1:016 4.547 0.159 0.842 4.435
5.192 3.333 2:806 3.280 3.479 1.758 2:369 1:434 1.005 1:684 2:752 0.817 1.127 3.319 3:364 1.519 3:335 3:811 0:091 4:536 2.023 0.262 3.786 3:550 1.730
1.839 4.398 4.655 3.372 7.034 0.009 2:666 1.887 0.825 1.800 4:029 0:213 2.534 3:564 1:996 2.704 3:358 1:615 1.945 4.249 4.539 4:108 1:951 1:805 0:185
0.088 6.082 0.494 1:914 4.388 0:216 2.639 1:678 6:427 4.339 2:798 0:737 1.618 3.430 1:830 0.138 4:302 3:510 4.515 0.114 0.075 1:881 6.315 2:731 0.242
1.774 1.706 2:807 2.478 1:790 1.832 3.081 7.060 4:598 0.027 2:152 1:757 0:665 1.556 0.015 0:050 2:856 4:377 1:667 0:087 2:724 0:004 5.161 1:610 4:689
30
2 Quality Management Systems and Statistical Quality Control
R Chart of D 14
UCL=13.74
Sample Range
12 10 8 _ R=6.50
6 4 2
LCL=0
0 Sample 1 Month 7 Day 25
3 7 27
5 7 29
7 7 31
9 8 2
11 8 4
13 8 6
15 8 8
17 8 10
19 8 12
21 8 14
23 8 16
25 8 18
Fig. 2.7 R-chart, numerical example. Minitab® Statistical Software. UCL upper control limit, LCL lower control limit
2.7.4 Numerical Example, x-Chart N Consider the application introduced in Sect. 2.7.2. By Eqs. 2.8 and 2.9, O D xN D
k 1X 1 .xN 1 C C xN 25 / D 0:389; xN i D k 25 i D1
N 2 O R=d UCL D O C 3 p xNN C 3 p D xNN C A2 RN n n D 0:389 C 0:577 6:5 Š 4:139; A2 D0:577
N 2 O R=d LCL D O 3 p xNN 3 p D xNN A2 RN n n D 0:389 0:577 6:5 Š 3:361: A2 D0:577
The chart obtained is reported in Fig. 2.8. Test 6 for anomalous behaviors is verified in sample 5, month 7, and day 29, i. e., there are four of five points in zone B or beyond. As a consequence, the process seems to be “out of control.” There is in fact a very scarce probability of having a sample in those points when the process is “in control.” We assume we are able to properly identify this special cause of variation and to eliminate it. Figure 2.9 presents the charts obtained from the pool of samples without the anomalous subgroup 5. The chart shows another potential anomalous behav-
ior regarding subgroup 4. In this way, assuming we identify and eliminate new special causes, we obtain Figs. 2.10 and 2.11. In particular, Fig. 2.11 presents a process in the state of statistical control: subgroups 2, 4, and 5 have been eliminated.
2.7.5 The s-Chart This chart for subgroup standard deviation can be used to support the construction of the x-chart by the estimation of the standard deviation of the continuous variable xij . In particular, the control limits of the xchart use the centerline of the s-chart. The average of standard deviation of subgroups, sO , is the centerline of the s-chart: O S D .s O i / D sN D
k 1X si ; k
(2.10)
i D1
where .s O i / is the estimation of the mean of the variable si , the standard deviation of a subgroup. The control limits are .s O i/ O i / C 3 p D B4 sN ; UCLs D .s n .s O i/ LCLs D .s O i / 3 p D B3 sN ; n
(2.11)
2.7 Control Charts for Means
31
Xbar-R Chart of D Sample Mean
4
UCL=4.136
6
2 _ _ X=0.389
0 -2
LCL=-3.359 -4 Sample Month Day
1 7 25
3 7 27
5 7 29
7 7 31
9 8 2
11 8 4
13 8 6
15 8 8
17 8 10
19 8 12
21 8 14
23 8 16
25 8 18
15
Sample Range
UCL=13.74 10 _ R=6.50 5
LCL=0
0 Sample Month Day
1 7 25
3 7 27
5 7 29
7 7 31
9 8 2
11 8 4
13 8 6
15 8 8
17 8 10
19 8 12
21 8 14
23 8 16
25 8 18
Fig. 2.8 x-chart from R. Numerical example (25 samples). Minitab® Statistical Software
Xbar-R Chart [rif.no sub.5] 4
UCL=3.984
Sample Mean
5
2 _ _ X=0.262
0 -2
-4 Sample [24] Month [24] Day [24]
LCL=-3.459 1 7 25
3 7 27
6 7 30
8 8 1
10 8 3
12 8 5
14 8 7
16 8 9
18 8 11
20 8 13
22 8 15
24 8 17
15
Sample Range
UCL=13.64 10 _ R=6.45 5
0 Sample [24] Month [24] Day [24]
LCL=0 1 7 25
3 7 27
6 7 30
8 8 1
10 8 3
12 8 5
14 8 7
16 8 9
18 8 11
Fig. 2.9 x-chart from R. Numerical example (24 samples). Minitab® Statistical Software
20 8 13
22 8 15
24 8 17
32
2 Quality Management Systems and Statistical Quality Control
Xbar-R Chart [no sub 4 and 5] 1
Sample Mean
4
UCL=3.847
2 _ _ X=0.150
0 -2
-4 Sample [23] Month [23] Day [23]
LCL=-3.546 1 7 25
3 7 27
7 7 31
9 8 2
11 8 4
13 8 6
15 8 8
17 8 10
19 8 12
21 8 14
23 8 16
25 8 18
15
Sample Range
UCL=13.55 10 _ R=6.41 5
LCL=0
0 Sample [23] Month [23] Day [23]
1 7 25
3 7 27
7 7 31
9 8 2
11 8 4
13 8 6
15 8 8
17 8 10
19 8 12
21 8 14
23 8 16
25 8 18
Fig. 2.10 x-chart from R. Numerical example (23 samples). Minitab® Statistical Software
Xbar-R Chart [no sub 2, 4 and 5] Sample Mean
4
UCL=3.729
2 _ _ X=-0.022
0 -2
-4 Sample [22] Month [22] Day [22]
LCL=-3.774 1 7 25
6 7 30
8 8 1
10 8 3
12 8 5
14 8 7
16 8 9
18 8 11
20 8 13
22 8 15
24 8 17
15
Sample Range
UCL=13.75 10 _ R=6.50 5
0 Sample [22] Month [22] Day [22]
LCL=0 1 7 25
6 7 30
8 8 1
10 8 3
12 8 5
14 8 7
16 8 9
18 8 11
Fig. 2.11 x-chart from R. Numerical example (22 samples). Minitab® Statistical Software
20 8 13
22 8 15
24 8 17
2.8 Control Charts for Attribute Data
where .s O i / is the estimation of the standard deviation of the variable si , the standard deviation of a subgroup, and B3 and B4 are constant values reported in Appendix A.2. The standard deviation of the process measurement is sN O D . O xN i / D : (2.12) c4
33
The x-chart is now created assuming the centerline of the R-chart and in accordance with Eqs. 2.8 and 2.9: UCLx from R Š xNN C A2 RN D 0:009237 C 0:577 0:004155 LCLx from R
D 0:009237 0:577 0:004155
As a consequence, the control limits of the x-chart are sN =c4 O UCLxN D O C 3 p xN C 3 p D xN C A3 sN; n n O s N =c 4 LCLxN D O 3 p xNN 3 p D xNN A3 sN; n n (2.13) where A3 is a constant value reported in Appendix A.2.
D 0:0116; Š xN A2 RN D 0:0068:
The upper section of Fig. 2.12 presents the x-chart where some subgroups verify a few tests, as illustrated also in Fig. 2.13. Consequently, the process is not in a state of control. Similarly, by the application of Eqs. 2.10, 2.11, and 2.13, O S D sN D
k 1X si Š 0:00170: k i D1
2.7.6 Numerical Example, s-Chart and x-Chart N Table 2.10 reports a set of measurement data made for 20 samples of size n D 5. They are the output of a manufacturing process in the automotive industry. The last three columns report some statistics useful for the construction of the control charts and for verification of the status of the control of the process. With use of the values of the constant parameters in Appendix A.2, the following control limits and centerlines have been obtained. Firstly, we propose the results related to the Rchart. By Eq. 2.5 the centerline is k 1X N Ri Š 0:004155: R D R D k
UCLs Š B4 sN D 2:089 0:00170 D 0:00355; LCLs Š B3 sN D 0 0:00170 D 0; UCLx from s xNN C A3 sN D 0:009237 C 1:427 0:00170 LCLx from s
D 0:01166; xNN A3 sN D 0:009237 1:427 0:00170 D 0:00681:
All these values are also reported in Fig. 2.14, showing that the process is not in the state of statistical control. Consequently, a survey for the identification and deletion of special causes of variations, and the subsequent repetition of the control analysis, is required.
i D1
By Eq. 2.6 UCLR D4 RN D 2:114 0:004155 Š 0:008784: LCLR D3 RN D 0 0:004155 D 0: These results are very close to those proposed by the R-chart, as constructed by the tool Minitab® Statistical Software (Fig. 2.12). From the R-chart the process seems to be in the state of statistical control.
2.8 Control Charts for Attribute Data These charts refer to counted data, also called “attribute data.” They support the activities of monitoring and analysis of production processes whose products possess, or do not possess, a specified characteristic or attribute. Attributes measurement is frequently obtained as the result of human judgements.
34
2 Quality Management Systems and Statistical Quality Control
Table 2.10 Measurement data and subgroup statistics. Numerical example ID sample – i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Measure – X 0.0073 0.0106 0.0096 0.0080 0.0104 0.0071 0.0078 0.0087 0.0074 0.0081 0.0078 0.0089 0.0087 0.0084 0.0074 0.0069 0.0077 0.0076 0.0069 0.0063
0.0101 0.0083 0.0080 0.0076 0.0084 0.0052 0.0089 0.0094 0.0081 0.0065 0.0098 0.0090 0.0075 0.0083 0.0091 0.0093 0.0089 0.0069 0.0077 0.0071
0.0091 0.0076 0.0132 0.0090 0.0123 0.0101 0.0122 0.0120 0.0120 0.0105 0.0113 0.0111 0.0125 0.0101 0.0116 0.0090 0.0091 0.0062 0.0073 0.0078
0.0091 0.0074 0.0105 0.0099 0.0132 0.0123 0.0091 0.0102 0.0116 0.0125 0.0087 0.0122 0.0106 0.0140 0.0109 0.0084 0.0068 0.0077 0.0074 0.0063
M.xi /
Ri
si
0.0053 0.0059 0.0098 0.0123 0.0120 0.0073 0.0095 0.0099 0.0122 0.0136 0.0118 0.0126 0.0113 0.0097 0.0108 0.0090 0.0094 0.0067 0.0074 0.0088
0.0082 0.0080 0.0102 0.0094 0.0113 0.0084 0.0095 0.0101 0.0103 0.0102 0.0099 0.0107 0.0101 0.0101 0.0100 0.0085 0.0084 0.0070 0.0073 0.0073
0.0048 0.0047 0.0052 0.0047 0.0048 0.0071 0.0044 0.0033 0.0048 0.0071 0.0040 0.0037 0.0050 0.0057 0.0042 0.0024 0.0026 0.0015 0.0008 0.0025
0.0019 0.0017 0.0019 0.0019 0.0019 0.0028 0.0016 0.0012 0.0023 0.0029 0.0017 0.0017 0.0020 0.0023 0.0017 0.0010 0.0011 0.0006 0.0003 0.0011
Mean
0.009237
0.004155
0.0016832
Xbar-R Chart
Sample Mean
UCL=0.011666 0.011
6 6
0.010
6
2
_ _ X=0.009237
0.009 0.008 0.007
3
1
3
5
7
9
11 Sample
13
15
17
5
5
LCL=0.006808
19
UCL=0.008905
Sample Range
0.008 0.006 _ R=0.004211
0.004 0.002
LCL=0
0.000 1
3
5
7
9
11 Sample
13
15
Fig. 2.12 R-chart and x-chart from R. Numerical example. Minitab® Statistical Software
17
19
2.8 Control Charts for Attribute Data
35
Test Results for Xbar Chart TEST 2. 9 points in a row on same side of center line. Test Failed at points: 15 TEST 3. 6 points in a row all increasing or all decreasing. Test Failed at points: 18 TEST 5. 2 out of 3 points more than 2 standard deviations from center line (on one side of CL). Test Failed at points: 19; 20 TEST 6. 4 out of 5 points more than 1 standard deviation from center line (on one side of CL). Test Failed at points: 12; 13; 14; 20
Fig. 2.13 x-chart from R, test results. Numerical example. Minitab® Statistical Software
Xbar-S Chart
Sample Mean
UCL=0.011666 0.011
6 6
0.010
6
2
_ _ X=0.009237
0.009 0.008 5
0.007
3
1
3
5
7
9
11 Sample
13
15
17
5
LCL=0.006808
19
0.004
Sample StDev
UCL=0.003555 0.003 0.002
_ S=0.001702
0.001 0.000
LCL=0 1
3
5
7
9
11 Sample
13
15
17
19
Fig. 2.14 s-chart and x-chart from s. Numerical example. Minitab® Statistical Software
2.8.1 The p-Chart The p-chart is a control chart for monitoring the proportion of nonconforming items in successive subgroups of size n. An item of a generic subgroup is said to be nonconforming if it possesses a specified characteristic. Given p1 ; p2 ; : : : ; pk , the subgroups’ proportions of nonconforming items, the sampling random
variable pi for the generic sample i has a mean and a standard deviation: p D ; r p D
.1 / ; n
(2.14)
where is the true proportion of nonconforming items of the process, i. e., the population of items.
36
2 Quality Management Systems and Statistical Quality Control
The equations in Eq. 2.14 result from the binomial discrete distribution of the variable number of nonconformities x. This distribution function is defined as ! n x (2.15) p.x/ D .1 /nx ; x where x is the number of nonconformities and is the probability the generic item has the attribute. The mean value of the standard deviation of this discrete random variable is X xp.x/ D n; D x
s D
X
.x /2 p.x/ D n.1 /:
(2.16)
Table 2.11 Rejects versus tested items. Numerical example Day
Rejects
Tested
32 25 21 23 13 14 15 17 19 21 15 16 21 9 25
286 304 304 324 289 299 322 316 293 287 307 328 304 296 317
21=10 22=10 23=10 24=10 25=10 26=10 27=10 28=10 29=10 30=10 31=10 1=11 2=11 3=11 4=11
Day 5=11 6=11 7=11 8=11 9=11 10=11 11=11 12=11 13=11 14=11 15=11 16=11 17=11 18=11 19=11
Rejects
Tested
21 14 13 21 23 13 23 15 14 15 19 22 23 24 27
281 310 313 293 305 317 323 304 304 324 289 299 318 313 302
x
By the central limit theorem, the centerline, as the estimated value of , and the control limits of the p-chart are O i / D pN D O p D .p r UCLp D pN C 3 r LCLp D pN 3
k 1X pi ; k
(2.17)
i D1
p.1 N p/ N ; n p.1 N p/ N : n
(2.18)
If the number of items for a subgroup is not constant, the centerline and the control limits are quantified by the following equations: pN D
x1 C x2 C C xk1 C xk ; n1 C n2 C C nk1 C nk
UCLp;i D pN C 3 s LCLp;i D pN 3
p.1 N p/ N ; ni
Table 2.11 reports the data related to the number of electric parts rejected by a control process considering 30 samples of different size. By the application of Eqs. 2.19 and 2.20,
(2.20)
p.1 N p/ N ; ni
where UCLi is the UCL for sample i and LCLi is the LCL for sample i .
573 x1 C x2 C C xk1 C xk D n1 C n2 C C nk1 C nk 9171 Š 0:0625; s p.1 N p/ N D pN C 3 ni s 0:0625.1 0:0625/ Š 0:0625 C 3 ; ni s p.1 N p/ N D pN 3 ni s 0:0625.1 0:0625/ Š 0:0625 3 : ni
pN D
UCLp;i
(2.19)
where xi is the number of nonconforming items in sample i and ni is the number of items within the subgroup i , and s
2.8.2 Numerical Example, p-Chart
LCLp;i
Figure 2.15 presents the p-chart generated by Minitab® Statistical Software and shows that test 1 (one point beyond three standard deviations) occurs for the first sample. This chart also presents the noncontinuous trend of the control limits in accordance with the equations in Eq. 2.20.
2.8 Control Charts for Attribute Data
37
P Chart of Rejects 1
0.11 UCL=0.1043
0.10
Proportion
0.09 0.08 0.07
_ P=0.0625
0.06 0.05 0.04 0.03
LCL=0.0207
0.02 21/10 24/10 27/10 30/10
2/11
5/11
8/11
11/11 14/11 17/11
Day Tests performed with unequal sample sizes
Fig. 2.15 p-chart with unequal sample sizes. Numerical example. Minitab® Statistical Software
Table 2.12 Rejected items. Numerical example
2.8.3 The np-Chart
Day
This is a control chart for monitoring the number of nonconforming items in subgroups having the same size. The centerline and control limits are N O np D np; UCLnp LCLnp
p D npN C 3 np.1 N p/; N p D npN 3 np.1 N p/: N
(2.21) (2.22)
2.8.4 Numerical Example, np-Chart The data reported in Table 2.12 relate to a production process similar to that illustrated in a previous application, see Sect. 2.8.2. The size of the subgroups is now constant and equal to 280 items. Figure 2.16 presents the np-chart generated by Minitab® Statistical Software: test 1 is verified by two consecutive samples (collected on 12 and 13 November). The analyst has to find the special causes, then he/she must eliminate them and regenerate the chart, as in Fig. 2.17. This second chart presents another anomalous subgroup: 11=11. Similarly, it is necessary to eliminate this sample and regenerate the chart.
21=10 22=10 23=10 24=10 25=10 26=10 27=10 28=10 29=10 30=10 31=10 1=11 2=11 3=11 4=11
Rejects 19 24 21 23 13 32 15 17 19 21 15 16 21 12 25
Day 5=11 6=11 7=11 8=11 9=11 10=11 11=11 12=11 13=11 14=11 15=11 16=11 17=11 18=11 19=11
Rejects 21 14 13 21 23 13 34 35 36 15 19 22 23 24 27
2.8.5 The c-Chart The c-chart is a control chart used to track the number of nonconformities in special subgroups, called “inspection units.” In general, an item can have any number of nonconformities. This is an inspection unit, as a unit of output sampled and monitored for determination of nonconformities. The classic example is a single printed circuit board. An inspection unit can be a batch, a collection, of items. The monitoring activity of the inspection unit is useful in a continuous pro-
38
2 Quality Management Systems and Statistical Quality Control
NP Chart of Rejects 1
1
35
UCL=34.35
Sample Count
30 25 __ NP=21.1
20 15 10
LCL=7.85 21/10 24/10 27/10 30/10 2/11
5/11 Day
8/11 11/11 14/11 17/11
Fig. 2.16 np-chart, equal sample sizes. Numerical example. Minitab® Statistical Software
NP Chart of Rejects (no 12 & 13 /11) 35
1
UCL=33.02
Sample Count
30
25 __ NP=20.07
20
15
10 LCL=7.12 21/10 24/10 27/10 30/10 2/11 5/11 8/11 Day (no 12 & 13 /11)
11/11 16/11 19/11
Fig. 2.17 np-chart, equal sample sizes. Numerical example. Minitab® Statistical Software
duction process. The number of nonconformities per inspection unit is called c. The centerline of the c-chart has the following average value: k 1X O i / D cN D ci : O c D .c k
(2.23)
i D1
The control limits are p UCLc D cN C 3 c; N p LCLc D cN 3 c: N
The mean and the variance of the Poisson distribution, defined for the random variable number of nonconformities units counted in an inspection unit, are N .ci / D .ci / D c:
The density function of this very important discrete probability distribution is e x ; xŠ where x is the random variable. f .x/ D
(2.24)
(2.25)
(2.26)
2.8 Control Charts for Attribute Data
39 Table 2.13 Errors in inspection unit of 6,000 digits. Numerical example
2.8.6 Numerical Example, c-Chart Table 2.13 reports the number of coding errors made by a typist in a page of 6,000 digits. Figure 2.18 shows the c-chart obtained by the sequence of subgroups and the following reference measures: k 1X ci D 6:8; cN D k i D1
p p UCLc D cN C 3 cN D 6:8 C 3 6:8 Š 14:62; p p LCLc D cN 3 cN D maxf6:8 3 6:8; 0g Š 0; where ci is the number of nonconformities in an inspection unit. From Fig. 2.18 there are no anomalous behaviors suggesting the existence of special causes of variations in the process, thus resulting in a state of statistical control. A significant remark can be made: why does this numerical example adopt the c-chart and not the pchart? If a generic digit can be, or cannot be, an object of an error, it is in fact possible to consider a binomial process where the probability of finding a digit with an
Day 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Errors 10 11 6 9 12 12 14 9 5 0 1 2 1 11 9
Day
Errors
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
error is pi D
8 7 1 2 3 5 1 11 9 14 1 9 1 8 12
ci ci D ; n 6;000
where n is the number of digits identifying the inspection unit. The corresponding p-chart, generated by Minitab® Statistical Software and shown in Fig. 2.19, is very similar to the c-chart in Fig. 2.18.
C Chart of errors 16 UCL=14.62
14
Sample Count
12 10 8
_ C=6.8
6 4 2 LCL=0
0 1
4
7
10
13
16 Day
19
22
25
28
Fig. 2.18 c-chart. Inspection unit equal to 6,000 digits. Numerical example. Minitab® Statistical Software
40
2 Quality Management Systems and Statistical Quality Control
P Chart of errors 0.0025
UCL=0.002436
Proportion
0.0020
0.0015 _ P=0.001133 0.0010
0.0005
0.0000
LCL=0 1
4
7
10
13
16 Day
19
22
25
28
Fig. 2.19 p-chart. Inspection unit equal to 6,000 digits. Numerical example. Minitab® Statistical Software
2.8.7 The u-Chart
ure 2.21 shows the chart obtained by the elimination of those samples. A new sample, i D 30, is “irregular.”
If the subgroup does not coincide with the inspection unit and subgroups are made of different numbers of inspection units, the number of nonconformities per unit, ui , is ci (2.27) ui D : n The centerline and the control limits of the so-called u-chart are O u D .u O i / D uN D s UCLu;i D uN C 3 s LCLu;i D uN 3
k 1X ui ; k i D1
uN ; ni
(2.28)
uN : ni
2.8.8 Numerical Example, u-Chart Table 2.14 reports the number of nonconformities as defects on ceramic tiles of different sizes, expressed in feet squared. Figure 2.20 presents the u-chart obtained; five different subgroups reveal themselves as anomalous. Fig-
2.9 Capability Analysis A production process is said to be capable when it is in state of statistical control and products meet the specification limits, i. e., the customers’ requirements. In other words, the process is capable when it produces “good” products. This is the first time the lower and upper specifications are explicitly considered in the analysis of the process variations. Nonconformity rates are the proportions of process measurements above, or below, the USL, or LSL. This proportion can be quantified in parts per million (PPM), as USL O ; PPM > USL D P .x > USL/ P z > O (2.29) LSL O ; PPM < LSL D P .x < LSL/ P z < O (2.30) where x is a normal random variable and z is a standard normal variable (see Appendix A.1).
2.9 Capability Analysis
41
Table 2.14 Errors/defects in ceramic tiles. Numerical example Sample i
ci [nonconform. number]
Size [ft2 ]
ui
Sample i
ci [nonconform. number]
Size [ft2 ]
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
14 47 21 6 16 27 21 22 43 17 32 14 9 16 19
7.1 3.3 5.9 5.2 5.6 8 8.9 5.6 6.1 4.2 8.4 6.8 4.4 5.2 7.8
1.972 14.242 3.559 1.154 2.857 3.375 2.360 3.929 7.049 4.048 3.810 2.059 2.045 3.077 2.436
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
25 32 41 13 0 14 16 17 18 26 14 23 35 42 31
9.8 8.8 7.1 3.3 6.8 4.4 5.6 8 8.9 5.3 3.1 6.2 4.8 13.5 5.9
ui 2.551 3.636 5.775 3.939 0.000 3.182 2.857 2.125 2.022 4.906 4.516 3.710 7.292 3.111 5.254
U Chart of number of nonconformities 16
Sample Count Per Unit
1
14 12 10 8
1
1 1
6
UCL=5.76 _ U=3.46
4 2
LCL=1.16 0
1
1
4
7
10
13
16 Sample
19
22
25
28
Tests performed with unequal sample sizes
Fig. 2.20 u-chart, tile industry numerical example – chart 1. Minitab® Statistical Software
Consequently, by the application of the central limit theorem, Eqs. 2.29 and 2.30 can be applied to the mean value of the random variable x, x, N assuming the generic statistical probability density function when the size n of the generic sample is over a threshold and critical value. From Eqs. 2.29 and 2.30 it is necessary to estimate and , i. e., quantify O and O . In particular, in the presence of a normal distribution of values x, in order to quantify O it can be useful to use Eq. 2.7 or 2.12. In general, for a generic statistical distribution of the random variable, i. e., the process characteristic x,
there are two different kinds of standard deviations, called “within” and “overall”: the first relates to the within-subgroup variation, while the second relates to the between-subgroup variation. In particular, the “overall” standard deviation is a standard deviation of all the measurements and it is an estimate of the overall process variation, while the “within” standard deviation is a measure of the variations of the items within the same group. In a “in control” process these standard deviation measures are very close to each other. In the following, an in-depth illustration of the statistical models
42
2 Quality Management Systems and Statistical Quality Control
U Chart of nonconformities
Sample Count Per Unit
6 1
UCL=5.199
5 4
_ U=3.044
3 2 1
LCL=0.889
0 1
4
6
8
11
13 15 17 sample ID
21
23
25
27
30
Tests performed with unequal sample sizes
Fig. 2.21 u-chart, tile industry numerical example – chart 2. Minitab® Statistical Software
related to capability analysis is substituted by a few significant numerical examples created with the support of a statistical tool such as Minitab® Statistical Software. For this purpose, it is necessary to introduce the following process capability indexes, specifically designed for normally distributed data, i. e., measurements: USL LSL ; 6O USL O ; D 3O O LSL D 3O USL O O LSL I : D min 3O 3O
Cp D CP U CPL Cpk
(2.31) (2.32) (2.33) (2.34)
When Cp < 1 the process is said to be “noncapable,” otherwise it is “capable” because the quality control variability, represented by 6, can be included by the specification limits LSL and USL, i. e., the production process can meet the customer requirements. The 6 variation is also called “process spread,” while USLLSL is called “specification spread.” A capable process is able to produce products or services that meet specifications. Nevertheless, this index measures the capability only from a potential point of view, because Cp does not tell us if the range of values ˙3 above and below the mean value, called “centerline” in the
control charts, is really included in the specification range, i. e., in other words it does not tell the analyst if the process is centered on the target value. For this purpose, the index Cpk is preferable to Cp because, if we assume values greater than 1, it guarantees the process is centered on the target value, thus telling the analyst what capability the process could achieve if centered, while Cp does not consider the location of the process mean. Finally, the CP U and CPL indexes relate the process spread, the 3 variation, to a single-sided specification spread: -LSL O or USL-, O respectively. A conventionally accepted minimum value for these indexes is 1.33, corresponding to the so-called four sigma production process, as defined in Sect. 2.9. The performance of an in-control process is predictable. Therefore, the capability analysis following the “in-control analysis” can assess the ability of the production process to produce units that are “in spec” and predict the number of parts “out-of-spec.”
2.9.1 Numerical Example, Capability Analysis and Normal Probability Table 2.15 reports the measurements, in millimeters, obtained on 100 products produced by a manufacturing process of cutting metal bars when the expected
2.9 Capability Analysis
43
Table 2.15 Measurement data – process 1, numerical example Sample 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Data – process 1 600.3333 600.2929 599.8586 599.2491 600.4454 599.4055 600.1634 600.3021 600.1666 600.9336 600.3714 599.7379 599.797 600.2411 599.4932 600.6162 599.1419 600.5005 600.7689 599.9661
600.8494 598.789 599.706 599.537 599.9179 599.5074 599.5934 600.3307 599.8434 600.5842 601.2756 601.112 599.9101 599.643 599.6578 599.3922 599.8016 599.3184 599.1993 598.7038
600.693 599.8655 599.8773 599.848 599.5341 599.5099 599.9918 600.6115 600.612 599.7249 599.7404 600.5713 599.1727 599.6155 599.9164 600.6494 600.4682 599.424 599.8779 600.4608
values of the target and specification limits are 600, 601, and 599 mm. Consequently, the tolerances are ˙1 mm. First of all, it is useful to conduct the variability analysis by generating the control chart: Figure 2.22 reports the x-chart based on the s-chart. There are no anomalous behaviors of the sequence of subgroups. It is now possible to quantify the capability indexes and the nonconformity rates by adopting both the overall and the within standard deviations. Figure 2.23 is a report generated by Minitab® Statistical Software for the analysis of the capability of the production process. The Cp value obtained is 0.55, i. e., the process is not potentially capable, both considering the within capability analysis and the overall capability analysis. Figure 2.23 quantifies also the PPM over and under the specifications by Eqs. 2.29 and 2.30, distinguishing: • “Observed performance.” They are related to the observed frequency distribution of data (see the histogram in Fig. 2.23). • “Expected within performance.”2 They relate to the parametric distribution, and in particular to the nor2
Minitab® Statistical Software calls the performance indices Pp and Ppk in the “overall capability” analysis to distinguish them from Cp and Cpk defined by Eqs. 2.31–2.34 for the “within analysis” (see Fig. 2.23).
599.2493 599.3179 600.8859 600.0593 600.3004 599.9597 600.2792 599.0412 600.7174 599.5842 601.0146 600.287 600.8716 600.2896 600.6215 599.6583 599.3786 600.7875 600.7521 599.3556
Mean value
Range
600.6724 599.4127 600.3385 599.2632 598.8681 599.2939 599.41 599.4191 599.9917 599.8445 600.3568 599.922 600.1579 598.6065 599.3805 599.216 600.4624 600.2031 599.9077 601.4034
600.35948 599.5356 600.13326 599.59132 599.81318 599.53528 599.88756 599.94092 600.26622 600.13428 600.55176 600.32604 599.98186 599.67914 599.81388 599.90642 599.85054 600.0467 600.10118 599.97794
1.6001 1.5039 1.1799 0.8102 1.5773 0.6658 0.8692 1.5703 0.874 1.3494 1.5352 1.3741 1.6989 1.6831 1.241 1.4334 1.3263 1.4691 1.5696 2.6996
Average
599.971628
1.40152
mal distribution, obtained by a best-fitting statistical evaluation conducted with the within standard deviation. • “Expected overall performance.” They relate to the parametric distribution obtained by a best-fitting evaluation conducted with the overall standard deviation. In particular, the maximum expected value of PPM is about 96,620. The so-called six-pack capability analysis, illustrated in Fig. 2.24, summarizes the main results presented in Figs. 2.22 and 2.23 and concerning the variability of the process analyzed. The normal probability plot verifies that data are distributed as a normal density function: for this purpose the Anderson–Darling index and the P value are properly quantified. Similarly to the s-chart reported in Fig. 2.22, the R-chart is proposed to support the generation of the x-chart. The standard deviations and capability indexes are hence quantified both in “overall” and “within” hypotheses. Finally, the so-called capability plot illustrates and compares the previously defined process spread and specification spread. The analyst decides to improve the performance of the production process in order to meet the customer specifications and to minimize the process variations.
44
2 Quality Management Systems and Statistical Quality Control
Xbar-S Chart of manufacturing measurements 601.0
Sample Mean
UCL=600.778 600.5 _ _ X=599.972
600.0 599.5
LCL=599.165 599.0 1
3
5
7
9
11 Sample
13
15
17
19
Sample StDev
1.2
UCL=1.181
0.9 _ S=0.565
0.6 0.3
LCL=0
0.0 1
3
5
7
9
11 Sample
13
15
17
19
Fig. 2.22 x-chart and s-chart – process 1, numerical example. Minitab® Statistical Software
LSL
Target
USL
Process Data LSL 599 Target 600 USL 601 Sample Mean 599.972 Sample N 100 StDev(Within) 0.60121 StDev(Overall) 0.603415
Within Overall Potential (Within) Capability Cp 0.55 CPL 0.54 CPU 0.57 Cpk 0.54 Overall Capability Pp PPL PPU Ppk Cpm
0.55 0.54 0.57 0.54 0.55
598.5 599.0 599.5 600.0 600.5 601.0 601.5 Observed Performance PPM < LSL 40000.00 PPM > USL 40000.00 PPM Total 80000.00
Exp. Within Performance PPM < LSL 53034.23 PPM > USL 43586.49 PPM Total 96620.72
Exp. Overall Performance PPM < LSL 53675.43 PPM > USL 44166.87 PPM Total 97842.30
Fig. 2.23 Capability analysis – process 1, numerical example. Minitab® Statistical Software
Table 2.16 reports the process data as a result of the process improvement made for a new set of k D 20 samples with n D 5 measurements each. Figure 2.25 presents the report generated by the six-pack analysis.
It demonstrates that the process is still in statistical control, centered on the target value, 600 mm, and with a Cpk value equal to 3.31. Consequently, the negligible expected number of PPM outside the specification
2.9 Capability Analysis
45
Process Capability Sixpack of DATA Sample Mean
Xbar Chart
Capability Histogram
601
LSL
600
Specifications LSL 599 Target 600 USL 601
LCL=599.165 1
3
5
7
9
11
13
15
17
19
598.5 599.0 599.5 600.0 600.5 601.0 601.5
R Chart Sample Range
USL
_ _ X=599.972
599
Normal Prob Plot
3.0
UCL=2.957
1.5
_ R=1.398
0.0
AD: 0.481, P: 0.228
LCL=0 1
3
5
7
9
11
13
15
17
19
598
600
602
Capability Plot
Last 20 Subgroups Values
Target
UCL=600.778
Within StDev 0.60121 Cp 0.55 Cpk 0.54
601 600
Within
Overall
599
Overall StDev 0.603415 Pp 0.55 Ppk 0.54 Cpm 0.55
Specs 5
10 Sample
15
20
Fig. 2.24 Six-pack analysis – process 1, numerical example. Minitab® Statistical Software
Table 2.16 Measurement data – process 2, numerical example Sample 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 2.10 2.11 2.12 2.13 2.14 2.15 2.16 2.17 2.18 2.19 2.20
Data – process 2 600.041 599.8219 600.0089 600.1896 600.1819 599.675 600.0521 600.0002 600.02 600.1571 600.0934 599.8668 599.9859 599.9456 600.0487 599.8959 600.1891 600.0002 599.9228 599.7843
600.0938 599.9173 600.075 600.1723 600.0538 599.9778 600.1707 600.0831 599.9963 600.0212 599.9554 599.8757 599.9269 600.0405 600.0569 599.979 600.1168 600.1121 600.092 599.9597
600.1039 600.0308 600.0148 599.8368 599.9957 599.9633 599.9446 599.9298 599.9278 599.9061 599.7975 600.0414 599.8124 600.0576 599.9321 600.1418 600.1106 599.93 599.9225 600.011
600.0911 600.07 599.9714 600.0947 600.0995 599.9895 599.8487 599.9329 599.9793 599.9786 600.0221 599.7939 600.0288 599.7819 599.9164 600.1157 599.9148 599.9924 600.1062 600.0409
Mean value
Range
600.1096 600.0732 600.0271 599.9781 599.9639 599.8853 600.012 599.9142 600.0456 600.0626 599.8821 600.1153 600.0261 600.0603 599.9984 599.9525 600.0013 600.0458 600.1794 600.0436
600.08788 599.98264 600.01944 600.0543 600.05896 599.89818 600.00562 599.97204 599.9938 600.02512 599.9501 599.93862 599.95602 599.97718 599.9905 600.01698 600.06652 600.0161 600.04458 599.9679
0.0686 0.2513 0.1036 0.3528 0.218 0.3145 0.322 0.1689 0.1178 0.251 0.2959 0.3214 0.2164 0.2784 0.1405 0.2459 0.2743 0.1821 0.2569 0.2593
Average
600.001124
0.23198
46
2 Quality Management Systems and Statistical Quality Control
Process Capability Sixpack of DATA2 Sample Mean
Xbar Chart
Capability Histogram UCL=600.1362
LSL
Target
USL
600.1
Specifications LSL 599 Target 600 USL 601
_ _ X=600.0011
600.0 599.9
LCL=599.8660 1
3
5
7
9
11
13
15
17
19
599.1 599.4
599.7
Sample Range
R Chart
600.6
600.9
Normal Prob Plot AD: 0.408, P: 0.340
UCL=0.4952 0.4 _ R=0.2342
0.2
0.0
LCL=0 1
3
5
7
9
11
13
15
17
19
599.6
600.0
Last 20 Subgroups
600.4
Capability Plot
600.2
Values
600.0 600.3
Within StDev 0.100682 Cp 3.31 Cpk 3.31
600.0 599.8
Within
Overall
Overall StDev 0.101659 Pp 3.28 Ppk 3.28 Cpm 3.28
Specs 5
10 Sample
15
20
Fig. 2.25 Six-pack analysis – process 2, numerical example. Minitab® Statistical Software
limits is quantified as ˇ ˇ USL O Total PPM D ˇˇP z > O ˇ LSL O ˇˇ CP z < ˇ O
2.9.2 Numerical Examples, Capability Analysis and Nonnormal Probability O D0:101659 NN D O xD600:0011
Š 0:
These numerical examples refer to data nondistributed in accordance with a normal density function. Consequently, different parametric statistical functions have to be adopted.
Table 2.17 Measurement data (mm=10), nonnormal distribution. Numerical example Sample 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Measurement data 1.246057 0.432057 3.289106 4.740917 1.03499 4.864409 3.045406 0.936205 4.55721 5.635049 4.693689 1.063906 2.902382 4.24421 1.667182
0.493869 1.573958 4.26632 1.38156 6.639968 1.546174 3.160609 0.940518 1.902965 1.851431 1.903728 0.821599 2.769513 4.099892 0.717635
2.662834 2.361707 3.597959 1.618083 6.071461 3.875799 2.901201 3.15243 4.462141 5.076608 6.866619 1.658612 4.439952 0.813895 1.420329
5.917727 0.178515 1.511217 5.597763 1.552255 1.098431 6.760744 4.550744 3.509317 1.630322 3.064651 5.847757 0.912794 4.460482 2.365193
3.020594 1.945173 3.783617 3.05798 0.151038 5.50208 6.04942 1.732531 1.995514 2.673297 0.565978 4.024718 3.192323 3.007995 2.011729
3.233249 3.891315 0.323979 2.404994 1.659891 1.281942 1.39276 5.629206 4.803485 0.777941 2.093118 3.41589 0.774273 3.84575 4.629
0.890597 2.222251 5.367135 1.409824 3.580737 0.921708 3.495365 0.397718 1.95335 7.998625 5.058873 2.196106 3.936241 3.755018 1.934723
1.107955 3.295799 0.429597 1.266203 6.482635 4.884044 2.494509 6.539783 2.53267 0.864797 4.96973 2.153251 2.605119 3.018857 1.844031
1.732582 2.521666 2.179387 3.864219 2.282011 3.054542 3.865445 4.46137 4.884973 5.338903 4.40998 1.59855 6.360237 2.535924 6.976545
2.963924 2.398454 1.945532 0.735855 3.062937 3.225921 1.390489 2.886115 0.882012 6.03149 1.459153 3.074742 5.220038 3.867536 1.01383
2.9 Capability Analysis
47
Process Capability of data Calculations Based on Weibull Distribution Model Target
USL
Process Data LSL * Target 0 USL 10 Sample Mean 3.09791 Sample N 150 Shape 1.70519 Scale 3.47777
Overall Capability Pp * PPL * PPU 0.93 Ppk 0.93 Exp. Overall Performance PPM < LSL * PPM > USL 2344.33 PPM Total 2344.33
Observed Performance PPM < LSL * PPM > USL 6666.67 PPM Total 6666.67
0.0
1.5
3.0
4.5
6.0
7.5
9.0
Fig. 2.26 Capability analysis – Weibull distribution, numerical example. Minitab® Statistical Software
Process Capability Sixpack - Weibull distribution Capability Histogram
Sample Mean
Xbar Chart 5
UCL=4.926
3
_ _ X=3.098
Specifications USL 10
LCL=1.270
1 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0.0
1.5
3.0
4.5
6.0
7.5
9.0
Weibull Prob Plot
S Chart Sample StDev
USL
AD: 0.248, P: > 0.250
UCL=3.217
3
_ S=1.874
2 1
LCL=0.532 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
0.01
Last 15 Subgroups
1.00
10.00
Capability Plot
10
Values
0.10
Overall Shape 1.70519 Scale 3.47777 Pp * Ppk 0.93
5
Overall
Specs
0 5
10
15
Sample
Fig. 2.27 Six-pack analysis – Weibull distribution, numerical example. Minitab® Statistical Software
2.9.2.1 Weibull Distribution Table 2.17 reports data regarding the output of manufacturing process of tile production in the ceramics industry. This measurement refers to the planarity of the tile surface as the maximum vertical distance of cou-
ples of two generic points on the surface, assuming as the USL a maximum admissible value of 1 mm. Figures 2.26 and 2.27 present the report generated by Minitab® Statistical Software for the capability analysis. The production process generates products, i. e., output, that are “well fitted” by a Weibull statisti-
48
2 Quality Management Systems and Statistical Quality Control Table 2.18 Number of calls and “no answer”, numerical example
cal distribution, shape parameter ˇ D 1:71 and scale parameter D 3:48. The process is therefore “in statistical control” but it does not meet customer requirements in terms of an admissible USL. In other words, the process is “predictable” but “not capable.” In particular, the number of expected items over the USL is about 6,667 PPM.
2.9.2.2 Binomial Distribution This application deals with a call center. Table 2.18 reports the number of calls received in 1 h, between 3 and 4 p.m., and the number of calls that were not answered by the operators. The measurement data can be modeled by assuming a binomial distribution of values. Figure 2.28 presents the results of the capability analysis conducted on this set of values, called “data set 1.” The process is not in statistical control because sample 15 is over the UCL. As a consequence, it is not correct to quantify the production process capability. This figure nevertheless shows that the process is difficultly capable, also in the absence of sample 15. In order to meet the demand of customers properly it is useful to increase the number of operators in the call center.
Sample No answer Calls
Sample No answer Calls
day 1 day 2 day 3 day 4 day 5 day 6 day 7 day 8 day 9 day 10
day 11 day 12 day 13 day 14 day 15 day 16 day 17 day 18 day 19 day 20
421 392 456 436 446 429 470 455 427 424
1935 1945 1934 1888 1894 1941 1868 1894 1938 1854
“Six Sigma” stands for six standard deviations and can be defined as a business management strategy, originally developed by Motorola, that enjoys widespread application in many sectors of industry and services. Six Sigma was originally developed as a set of practices designed to improve manufacturing processes and eliminate defects. This chapter presents a synthetic recall of the basic purpose of Six Sigma, assuming that a large number of the models and methods illustrated here and in the following can properly
Rate of Defectives
1
26
UCL=0.25684
0.24
% Defective
Proportion
P Chart
_ P=0.22769
0.22 0.20
LCL=0.19853 1
3
5
7
9 11 13 Sample
15
17
24 22 20 1840
19
1920 Sample Size
2000
Tests performed with unequal sample sizes Cumulative % Defective
Histogram Tar 8
Summary Stats
% Defective
22.5 22.0 21.5 21.0 5
10 Sample
15
20
%Defective: Lower CI: Upper CI: Target: PPM Def: Lower CI:
22.77 22.35 23.19 0.20 227686 223487
Upper CI: Process Z: Lower CI: Upper CI:
231926 0.7465 0.7325 0.7605
Frequency
(95,0% confidence)
23.0
1937 1838 2025 1888 1894 1941 1868 1894 1933 1862
2.10 Six Sigma
Binomial Process Capability Analysis of no answer 0.26
410 386 436 424 497 459 433 424 425 441
6 4 2 0
0
4
8 12 16 20 % Defective
Fig. 2.28 Binomial process capability, numerical example. Minitab® Statistical Software
24
2.10 Six Sigma
49
support it. Nevertheless, there are a lot of ad hoc tools and models specifically designed by the theorists and practitioners of this decisional and systematic approach, as properly illustrated in the survey by Black and Hunter (2003). Six Sigma is a standard and represents a measure of variability and repeatability in a production process. In particular, the 6 specifications, also known as Six Sigma capabilities, ask a process variability to be capable of producing a very high proportion of output
Lower specification limit
within specification. The “process spread” has to be included twice in the “specification spread” and centered on the target value. Figure 2.29 presents the results generated by a process capability conducted on an “in control” process in accordance with the Six Sigma philosophy. Configuration c identifies a capable process, as previously defined, whose variability meets the Six Sigma requirements. In other words, in a Six Sigma process there is a number of defects lower than two parts per billion,
Upper specification limit
The process is very capable with 6σ’
a
x capability 6σ’
The process is very capable, but the mean has drifted on center. Corrective action is needed to center the process. Out of specification
x
b
capability 6σ’
The process is just barely capable with 6σ’=USL-LSL centered but the variability should be decreased
c
x
The process is centered and stable but is not capable. Corrective action on the basic process is needed.
Outside lower specification
Out of specification
x
d
Fig. 2.29 Process capability and Six Sigma
50
2 Quality Management Systems and Statistical Quality Control
i. e., 0.002 PPM:
of the output of the process (process characteristic) – x is assumed to be normally distributed – and ˚ is a cumulative function of the standard normal distribution. Figure 2.30, proposed by Black and Hunter (2003), compares the performance of a capable process with Cp D Cpk D 1:33, known also as “four sigma capability,” and a process with Cp D Cpk D 2, which guarantees “Six Sigma capability.”
C6 Z
1
f .x/ dx D 2Œ1 ˚.z D 6/ 6
Š 0:00000000198024 Š 2 109 ;
(2.35)
where is the standard deviation of the process, f .x/ is the density function of the variable x, a measure
Cp=1.33
Cp=1.33
LSL
U SL -1.5σ
1.5σ
defect
defect
5
5
10 5
10
1510
-6σ -5σ -4σ -3σ -2σ -1σ 0
15
15
1σ 2σ 3σ 4σ 5σ 6σ
± four sigma design specification width mean LSL
USL Cp= Cpk= 2
Cp=2 Cpk=1.5
5
Cp=2 Cpk=1.5
5 10
5 10 15
-6σ -5σ -4σ -3σ -2σ -1σ 0 Fig. 2.30 Four sigma (Cp D Cpk D 1:33) versus Six Sigma (Cp D Cpk D 2)
10 15
15
1σ 2σ 3σ 4σ 5σ 6σ
± six sigma design specification width
virtually zero defects
2.10 Six Sigma
51
2.10.1 Numerical Examples
grouped in three divisions, each with a proper booking office and dedicated employees: hotel, wellness, and thermal services. Employees are nominated to have contact with the costumers, to identify their requirements, to accept the requests, and to finalize the booking process. Customers can have contact via telephone, e-mail, Web site, or by presenting themselves at the reception. Every kind of service has its own booking procedure, depending on the customer request. Before the application of Six Sigma methodologies the process was divided into the following five subroutines, depending on the service:
Among the previously illustrated numerical examples only the one discussed in Sect. 2.9.1 (process 2) verifies the Six Sigma hypotheses, because Cp D Cpk D 3:31.
2.10.2 Six Sigma in the Service Sector. Thermal Water Treatments for Health and Fitness In this subsection we present the results obtained by the application of the Six Sigma philosophy to the health service sector of thermal water treatments. This instance demonstrates how this methodological approach is effective also for the optimization of service processes. In particular, in this case study several health and fitness treatments are offered and they are
• • • • •
single thermal booking; group thermal booking; single hotel booking; tour operator hotel booking; wellness booking.
Customer contact
Customer
Legend:
Event
Activity
Note
Fax delivered
Call
Decision
Process
E-mail delivered
Subprocess
Remote request
Local request
Missed request
Missed booking
no
Booking/Reception
Single customer?
no
Request accepted?
Being out to book?
yes
yes
Retrieve Customer Master Data
yes
Consistent request?
Acquaint customer
yes
Recognize customer request
Phone request?
Forward to a collegue
no
yes
yes
Single wellness reservation
Single thermal reservation
Single hotel reservation
Group hotel reservation
Customer in Master Data? no
no
Request no for thermal treat- no ment?
yes no
Take customer request
Request for wellness?
Input Customer Data
yes Request for wellness?
no
Request for thermal treatment?
yes
yes
Group wellness reservation
Group thermal reservation
no
Is it a group?
no
yes
yes
Booking
Unoccupied collegue?
Tour operator reservation
no
Apply phone number for recall
no
Lucky customer?
Fig. 2.31 Booking procedure
yes
Call customer back
52
Once the booking procedure has been completed the staff will wait for the customer. On his/her arrival, the related booking data are recalled from the system and the customer is sent to the so-called “welcome process,” which is common to hotel, wellness, and thermal services. By the next check-in stage the customer is accepted and can access the required service. There is a specific check-in stage with its own dedicated rules and procedures for every kind of service. Once the customer has enjoyed the service, he/she will leave the system and go to the checkout stage, with its own procedures too. The whole process, from the admittance to the exit of the customer, can be displayed as a flowchart; Fig. 2.31 exemplifies the detail of the booking procedure.
2 Quality Management Systems and Statistical Quality Control
The analysis of the whole process has emphasized the existence of significant improvement margins, related to costs and time. For example, a particular service, e. g., thermal mud, may need a medical visit before the customer is allowed to access the treatment. By the Six Sigma analysis it was possible to reduce the lead time of the customer during the visit, through the optimization of the work tasks and processes. Sometimes this can be performed by very simple tricks and expedients. For example, the aural test can be invalidated by the presence of a plug of ear wax in a patient. Teaching the technician how to recognize and remove this obstruction reduces the probability of null tests, and consequently there is a reduction of costs and lead times.
3
Safety and Risk Assessment
Contents 3.1
Introduction to Safety Management . . . . . . . . . . . . 53
3.2
Terms and Definitions. Hazard Versus Risk . . . . . 54
3.3
Risk Assessment and Risk Reduction . . . . . . . . . . . 57
3.4
Classification of Risks . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.5
Protective and Preventive Actions . . . . . . . . . . . . . . 60
3.6
Risk Assessment, Risk Reduction, and Maintenance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.7
Standards and Specifications . . . . . . . . . . . . . . . . . . 63
Labour is not a commodity and markets must serve people. Nearly 90 years ago the protection of workers’ lives and health was set out as a key objective in the founding charter of the Organization. Today, rapid technological change and a fast-paced and globalized economy bring new challenges and pressures for all areas of the world of work... Millions of work related accidents, injury and disease annually take their toll on human lives, businesses, the economy and the environment. Each year, for some two million women and men, the ultimate cost is loss of life. In economic terms it is estimated that roughly four percent of the annual global Gross Domestic Product, or US$ 1.25 trillion, is siphoned off by direct and indirect costs of occupational accidents and diseases such as lost working time, workers’ compensation, the interruption of production and medical expenses... This year we focus on managing risk in the work environment. We know that by assessing risks and hazards, combating them at source and promoting a culture of prevention we can significantly reduce workplace illness and in-
juries. Employers, workers and governments all play key roles in making this happen (message from Juan Somavia, Director-General of the International Labor Organization, on the occasion of World Day for Safety and Health at Work – 28 April 2008). Every five seconds, an EU worker is involved in a work-related accident, and every two hours one worker dies in an accident at work (OSHA, European Agency for Safety and Health at Work, 2008) Safety must be designed and build into airplanes just as are performance, stability, and structural integrity. A safety group must be just as important a part of a manufacturer’s organization as a stress, aerodynamics, or a weights group (“Engineering for Safety,” Institute of Aeronautical Sciences 1947).
3.1 Introduction to Safety Management The concept of safety is universally widespread and maybe one of the most abused because daily we make our choices on the basis of it, willingly or not. That is why we prefer a safer car, or we travel with a safer airline instead of saving money with a ill-famed company. The acquisition of a safer article is a great comfort to us even if we pay more. In recent years great technological progress has reduced exposure to risks, sometimes even unknown, for people; anyway, the concurrent growing complexity of production systems can cause a lot of hazards for the operators, the community, and the environment. Both the scientific and the legislative communities aim to keep updating safety standards as a reference for production systems.
R. Manzini, A. Regattieri, H. Pham, E. Ferrari, Maintenance for Industrial Systems © Springer 2010
53
54
Safety involves every kind of production system and discipline, such as medicine, natural sciences, informatics, and engineering, in a specific way; for this reason a wide range of competences, from organization to management, from medicine to law, are required. It is not possible to detail this very complex topic in a single chapter because of the great variety and number of risks for people, goods, and the environment. Safety is moreover a fast-developing issue, and whatever attempt in arranging the multitude of laws, guidelines, technical regulations, plant solutions, medical studies, damage examinations, etc., is made, it is a never-ending task. This chapter aims to introduce the reader to the actual methodology for the implementation of a risk evaluation capable of reducing the risk exposure and of guaranteeing the desired level of safety. Several technical books and technical regulations are focused on the in-depth treatment of specific risks, such as electric, explosion, fire, vibrations, and informatics. The most significant keywords are as follows: safety, safety engineering, risk, danger, risk analysis, risk evaluation, accident, magnitude, protection, and prevention. Safety engineering is a subject whose purpose is a systematic definition and application of tools and techniques for a level of safety in whatever operative conditions, but especially in very complex production systems. Safety is placed at the center of a set of completing subjects, such as medicine, natural sciences, law, economics, and engineering, going on and on in the contribution for better knowledge. In order to demonstrate the level of criticality and the requirement of safety, it is impressive to summarize some data on the number of accidents worldwide collected by the International Labor Organization (2006): each day, an average of 6,000 people die as a result of work-related accidents or diseases, totaling more than 2.2 million work-related deaths a year. Of these, about 350,000 deaths (about 74,000 in China) are from workplace accidents and more than 1.7 million are from work-related diseases. In addition, commuting accidents increase the burden with another 158,000 fatal accidents. This situation generates a cost for the community of about US$ 1,250 billion, that is about 4% of gross domestic product. In particular, the number of mortal accidents per 100,000 workers in the European Community decreased from 2.9 in 2003 to 2.5 in 2005 (Health and Safety Executive 2005).
3 Safety and Risk Assessment
The objective of safety engineering is to establish a state such that people live and work under conditions where hazards are known and controlled in accordance with an acceptable level of potential injury for the community and potential damage to the environment. An integrated management of safety conditions is the most effective approach in order to achieve high safety standards and, at the same time, with the minimum global cost. This is the same rule of this book, whose content aims at an integrated approach for the improvement of productivity, quality, and safety in production systems.
3.2 Terms and Definitions. Hazard Versus Risk Every human activity has an unavoidable degree of uncertainty somehow capable of jeopardizing the achievement of the desired goals. The risk is the measure of this uncertainty. This definition underlines the probabilistic character of risk as the probability value of the event: for this purpose it is possible to distinguish between “accepted risk” with 100% probability of occurrence, and “unaccepted risk” having a probability lower than 100%. The following basic terms and definitions take inspiration from ISO 12100-1:2003 (Safety of machinery – basic concepts, general principles for design – part 1: basic terminology, methodology) and ISO 14121-1:2007 (Safety of machinery – risk assessment – part 1: principles). ISO 12100 and ISO 14121 are two type A standards1 because they give basic concepts, design principles, and general aspects for risk assessment, i. e., to meet the risk reduction objectives established by laws, specifications, and standards. Harm is a physical injury or damage to health. What about the difference between hazard and risk? Hazard is defined as the potential source of harm. It is also generally qualified according to its origin (e. g., mechanical hazard, electrical hazard, thermal hazard) and/or according to the nature of the poten1
The International Organization for Standardization (ISO) classifies safety standards in three types: type A standards dealing with basic concepts and principles, type B standards dealing with one safety aspect, and type C standards dealing with detailed safety requirements for a particular item (e. g., a machine or a group of machines).
3.2 Terms and Definitions. Hazard Versus Risk
tial harm. The hazard can be permanently present or appear unexpectedly. Examples of mechanical hazards are crushing, cutting, impact, friction, high pressure fluid injection, etc. Hazards can be generated
55
by noise, vibration, radiation, fire, explosive materials, etc. Table 3.1 exemplifies several hazards in accordance with the type, origin, and potential consequences (ISO 14121-1:2007).
Table 3.1 Hazards examples from ISO 14121-1:2007 Type or group
Origin
Potential consequences
1 Mechanical hazards
– Acceleration, deceleration (kinetic energy) – Angular parts – Approach of a moving element to a fixed part – Cutting parts – Elastic elements – Falling objects – Gravity (stored energy) – Height from the ground – High pressure – Machinery mobility – Moving elements – Rotating elements – Rough, slippery surface – Sharp edges – Stability – Vacuum
– Being run over – Being thrown – Crushing – Cutting or severing – Drawing-in or trapping – Entanglement – Friction or abrasion – Impact – Injection – Shearing – Slipping, tripping and falling – Stabbing or puncture – Suffocation
2 Electrical hazards
– Arc – Electromagnetic phenomena – Electrostatic phenomena – Live parts – Not enough distance to live parts under high voltage – Overload – Parts which have become live under fault conditions – Short-circuit – Thermal radiation
– Burn – Chemical effects – Effects on medical implants – Electrocution – Falling, being thrown – Fire – Projection of molten particles – Shock
3 Thermal hazards
– Explosion – Flame – Objects or materials with a high or low temperature – Radiation from heat sources
– Burn – Dehydration – Discomfort – Frostbite – Injuries by the radiation of heat sources – Scald
4 Noise hazards
– Cavitation phenomena – Exhausting system – Gas leaking at high speed – Manufacturing process (stamping, cutting, etc.) – Moving parts – Scraping surfaces – Unbalanced rotating parts – Whistling pneumatics – Worn parts
– Discomfort – Loss of awareness – Loss of balance – Permanent hearing loss – Stress – Tinnitus – Tiredness
5 Vibration hazards
– Cavitation phenomena – Misalignment of moving parts – Mobile equipment – Scraping surfaces – Unbalanced rotating parts – Vibrating equipment – Worn parts
– Discomfort – Low-back morbidity – Neurologic disorder – Osteo-articular disorder – Trauma of the spine – Vascular disorder
56
3 Safety and Risk Assessment
Table 3.1 (continued) Type or group
Origin
Potential consequences
6 Radiation hazards
– Ionising radiation source – Low frequency electromagnetic radiation – Optical radiation (infrared, visible and ultraviolet), including laser – Radio frequency electromagnetic radiation
– Burn – Damage to eyes and skin – Effects on reproductive capability – Genetic mutation – Headache, insomnia, etc.
7 Material/substance hazards
– Aerosol – Biological and microbiological (viral or bacterial) agent – Combustible – Dust – Explosive – Fibre – Flammable – Fluid – Fume – Gas – Mist – Oxidizer
– Breathing difficulties, suffocation – Cancer – Corrosion – Effects on reproductive capability – Explosion – Fire – Infection – Mutation – Poisoning – Sensitization
8 Ergonomic hazards
– Access – Design or location of indicators and visual displays units – Design, location or identification of control devices – Effort – Flicker, dazzling, shadow, stroboscopic effect – Local lighting – Mental overload/underload – Posture – Repetitive activity – Visibility
– Discomfort – Fatigue – Musculoskeletal disorder – Stress – Any other (e. g. mechanical, electrical) as a consequence of human error
A situation or a circumstance can be defined as hazardous if a person, a community, and/or the environment are exposed to one or more hazards. A zone is hazardous if a person near that zone can be exposed to one or more hazards. Examples of hazardous situations are contact of a person with thermal radiation, unsuitable insulation, etc. Risk can be defined as the combination of the probability of occurrence of harm, i. e., the likelihood of occurrence of possible adverse consequence(s), and the severity of that harm, i. e., the magnitude of the consequence(s). The severity depends upon the extent of harm to one or several persons and the level of injuries or damage to health and to the environment. In particular, a residual risk is a risk remaining after protective measures have been adopted. Figure 3.1 summarizes the most important elements of risk. They are the severity of harm and the probability of occurrence as a function of three important factors: 1. The exposure of a person (or persons, i. e., the community) to the hazard.
2. The occurrence of the event. For the determination of failure/damage probability see the analytical models and methods introduced, illustrated, and exemplified in Chaps. 5, 6, and 8. 3. The technical and human possibilities (e. g., reflex, agility) of avoiding or limiting the harm. These factors are taken into account by two important techniques for identification and analysis of failure modes (see failure modes and effects analysis and failures mode, effects, and criticality analysis in Chap. 8) and reliability evaluation of complex production systems (see fault tree analysis in Chap. 8). Risk assessment is the science of risks and their likelihood and evaluation. It is a very complex decision process in engineering planning, design, management, and control of a complex engineered technological entity, the so-called production system.2 Risk assessment is a systematic and comprehensive methodology to 2
See the definition introduced in Chap. 1.
3.3 Risk Assessment and Risk Reduction Risk
considered
related to the hazard considered
57 Severity of harm
is a function of
that can result from the hazard
Probability of occurrence Exposure of person(s) to the hazard and
Occurrence of a hazardous event Possibility of avoiding or limiting the harm
Fig. 3.1 Elements of risk
evaluate risks and includes risk analysis and risk evaluation. The ISO 12100-1 standard defines them as: 1. Risk analysis, combination of the specification of the limits of the machine, hazard identification and risk estimation based on the definition of the severity of harm and probability of its occurrence. 2. Risk evaluation, judgment of whatever the risk reduction objectives have been achieved. Other important definitions are: • Failure to danger. Any malfunction that increases risk. • Emergency situation. Hazardous situation that needs to be urgently ended and averted. From these definitions it clearly emerges that safety management is effectively supported by risk assessment methodology, statistics-based supporting decisions models and methods, and it deals with failure events and probabilistic evaluation of risk. As a consequence, safety is strongly linked to reliability and maintenance too as properly introduced in Chap. 1.
3.3 Risk Assessment and Risk Reduction Figure 3.2 presents the iterative process for risk reduction as proposed by the standard ISO 14121-1:2007. This process is made up of the following steps: 1. Determination of limits of production resources, e. g., equipment, parts/components and tools, and human resources, during their life cycle. Limits typologies are use limits (operator training, exposure of persons, etc.), space limits (e. g., range of movement, space requirement), time limits (e. g., rec-
ommended service intervals), environmental limits (temperature, sunlight exposure, etc.), etc. 2. hazard identification, i. e., identification of hazardous circumstances and events by the limits of the system during setting, testing, start-up, different modes of operation, stopping, emergency, and other tasks that can be identified during all life cycle phases. 3. Risk estimation, see Fig. 3.1. 4. Risk evaluation. The aim is to determine if risk elimination or risk reduction is required and possible. For this purpose, it is necessary to face separately or simultaneously each of these two elements determining the risk: • Severity of harm by so-called protective actions. • Probability of occurrence of harm by a socalled preventive action. For this purpose see also maintenance strategies, rules, and actions illustrated and exemplified in Chap. 9. These results can be achieved by the so called threesteps method according to the standard ISO 12100-1 (2003): 1. Introduce inherently safe design measure, e. g., substitution of materials with less hazardous materials and application of ergonomic principles; 2. Introduce guards (i. e., physical barriers to provide protection) and protective devices; 3. Introduce information for use about the residual risk. Important aspects to be considered during the risk assessment are personnel exposure, type, frequency, and duration of exposure, relationship between exposure and effects (see also Chap. 8), human factors (e. g., er-
3 Safety and Risk Assessment
P - probability
58 Start
Determination of the limits of the machinery
conventional risks
Risk assessment
Risk analysis
Hazard identification
catastrophic risks
M - magnitude Fig. 3.3 Classification of risks
(generation of minor injury and no damage to the system), and negligible. Another classification can be based on the attributes severe, major, and minor. Similarly, it is possible to classify hazards in accordance with the occurrence’s probability measure by adopting a qualitative probability ranking – frequent, probable, remote, improbable, and impossible – or a quantitative probability ranking p > 0:75, p 2 Œ0:5; 0:75, p 2 Œ0:25; 0:5, etc. The following categories of risk can be conventionally adopted:
Risk estimation
Risk evaluation
Has the risk been adequately reduced?
specific/ minor risks
Yes
End
No
Risk reduction
Fig. 3.2 Iterative process for reducing risk (ISO 14121-1:2007)
gonomics aspects), availability and suitability of protective measures, information for use, etc.
3.4 Classification of Risks MIL-STD-882 identified four main categories of hazard severity: catastrophic (e. g., generation of death and loss of production), critical (generation of severe injury and major damage to the system), marginal
• Specific risks. This category has small values of magnitude M , assumed as a measure of the outcomes, and high likelihood of occurrence P , as typically for a continuative exposure. These risks are referred to in laws and technical regulations concerning health and safety at work, risk of noise, vibrations, thermal discomfort, etc. • Conventional risks. In comparison with the previous category there are slightly greater values of M and lower values of P . • Great risks, or potentially relevant accidents. In this case we have a very high level of magnitude M regardless of the value of P , e. g., in the case of risk of fire or explosion in a production plant. In Fig. 3.3 all these occurrences are placed on the M –P diagram. Depending on the position in the M –P diagram, the quantification of the risk expressed by the parameter R is carried out in three different ways: 1. Qualitative approach. Both M and P are ranked according to explanatory denominations quite similar to verbal expressions (e. g., high, low; strong,
3.4 Classification of Risks
59 Start
Time Series analysis on-field historical data collection
Check Lists
Hazard Operability HazOp
Hazard identification
Failure Modes & Effects Analysis FMEA
Failure modes and events identification
Failure Modes & Criticality Analysis FMECA
Reliability evaluation models and methods (see Chapter 5, 6, 7 and 8)
Probability of occurance estimation
Risk assessment
Risk analysis
Surveys and Questionnaire for workers
Determination of the limits of the production system (e.g. equipment , machinery)
Severity of harm estimation
Risk estimation
R=PxM
Risk evaluation
Yes Has the risk been adequately reduced ?
End
R=PxM
No
Maintenace strategies, models and methods (see Chapter 9)
Protective measures taken by the designer
Step 2 Safeguarding and complementary protective measures Step 3 Information for use
Protective measures taken by the users
Organization Additional safeguards Use of personal protective equipment Training
Risk
Risk reduction
Step 1 Inherently safe design measures
Level of Risk
Laws, Standards & specs
Residual risk after designer’s protective measures
Residual risk after designer and user’s protective measures
Fig. 3.4 Flowchart of risk assessment in a production system
weak; negligible, catastrophic). Risks are classified in a descending order of criticality, i. e., according to the level of emergency associated with the intervention of the safety manager or employer.
2. Semiqualitative approach. Both M and P are now ranked into categories according to prearranged scales of values (e. g., from 0 to 9). In this case too the safety manager can determine the priority of the intervention according to this scale.
60
3 Safety and Risk Assessment
Table 3.2 Technical Committee, occupational health and safety area Technical Committee
Title
CEN/TC 70 CEN/TC 93 CEN/TC 122 CEN/TC 126 CEN/TC 137 CEN/TC 191 CEN/TC 192 CEN/TC 211 CEN/TC 231
Manual means of fire fighting equipment Ladders Ergonomics Acoustic properties of building products and of buildings Assessment of workplace exposure Fixed firefighting systems Fire service equipment Acoustics Mechanical vibration and shock
Table 3.3 Technical Committee, personal and protective equipment area Technical Committee
Title
CEN/TC 79 CEN/TC 85 CEN/TC 158 CEN/TC 159 CEN/TC 160 CEN/TC 161 CEN/TC 162
Respiratory protective devices Eye protective equipment Head protection Hearing protectors Protection against falls from height including working belts Foot and leg protectors Protective clothing including hand and arm protection and lifejackets
3. Quantitative approach. For M several mathematical models are applied in order to quantify the outcomes of events such as explosion, fire, and leakage of pollutants, while P is evaluated by reliability models and techniques as described in Chaps. 5, 6, and 8.
a part of it, e. g., by adopting more reliable components, or operating on its connections, or modifying the operative conditions, or planning a different exploitation of the system. For this purpose, maintenance plays a fundamental role for the support of planning, execution, and control activities. 2. Protection strategy. It aims to reduce M , mainly by interventions on the system in order to protect any exposed subject and to reduce the outcomes of the event. In the case of individual protection, the employer must provide some protective devices, such as earphones, gloves, shoes, and overalls, capable of protecting the individual operator from specific hazards. Some devices are, of course, capable of reducing M for a group of people, or a community, in the same environment: e. g., acoustic baffles for noise reduction, fire-extinguisher devices3 such as hydrants, fire doors, and every solution to create compartments.4 In exchange, in such a situation it is possible to have some operators deliberately without individual protection. 3. Mixed strategy. A combination of the previous strategies.
It is worth noting that these three approaches are quite different in objectivity, accuracy, and, last but not least, cost. The last one is particularly expensive and time-consuming with regard to applicable results. In general, for the safety manager both qualitative and semiqualitative approaches, essentially by means of a checklist, are likely in the case of conventional or specific risks, while the quantitative approach cannot be rejected in the case of great risks having catastrophic effects on goods, people, and the environment (as the explosion of a nuclear reactor).
3.5 Protective and Preventive Actions According to the previous definition of R as a combination of M and P , three alternative strategies are applicable to reduce the risk: 1. Prevention strategy. It aims to reduce P , mainly by changing the configuration of the system or
3 4
“Active” devices “Passive” devices
3.5 Protective and Preventive Actions
61
Table 3.4 CEN/TC Ergonomics, standards published since 2008. Part 1
Standard
Title
Standard
Title
EN 1005-1:2001
Safety of machinery - Human physical performance - Part 1: Terms and definions
EN ISO 13406-2:2001
EN 1005-2:2003
Safety of machinery - Human physical performance - Part 2: Manual handling of machinery and component parts of machinery
Erg. requirements for work with visual displays based on flat panels - Part 2: Erg. requirements for flat panel displays (ISO 13406-2:2001)
EN ISO 13407:1999
Human-centred design processes for interacve systems (ISO 13407:1999)
EN ISO 13731:2001
Erg.s of the thermal environment Vocabulary and symbols (ISO 13731:2001)
EN ISO 13732-1:2008
Erg.s of the thermal environment Methods for the assessment of human responses to contact with surfaces - Part 1: Hot surfaces (ISO 13732-1:2006)
EN ISO 13732-3:2008
Erg.s of the thermal environment Methods for the assessment of human responses to contact with surfaces - Part 3: Cold surfaces (ISO 13732-3:2005)
EN ISO 14505-2:2006
Erg.s of the thermal environment Evaluaon of thermal environments in vehicles - Part 2: Determinaon of equivalent temperature (ISO 14505-2:2006)
EN ISO 14505-3:2006
Erg.s of the thermal environment Evaluaon of the thermal environment in vehicles - Part 3: Evaluaon of thermal comfort using human subjects (ISO 145053:2006)
EN ISO 14738:2008
Safety of machinery - Anthropometric requirements for the design of workstaons at machinery (ISO 14738:2002, including Cor 1:2003 and Cor 2:2005)
EN ISO 14915-1:2002
Soware Erg.s for mulmedia user interfaces - Part 1: Design principles and framework (ISO 14915-1:2002)
EN 1005-3:2002
Safety of machinery - Human physical performance - Part 3: Recommended force limits for machinery operaon
EN 1005-4:2005
Safety of machinery - Human physical performance - Part 4: Evaluaon of working postures and movements in relaon to machinery
EN 1005-5:2007
Safety of machinery - Human physical performance - Part 5: Risk assessment for repeve handling at high frequency
EN 13861:2002
Safety of machinery - Guidance for the applicaon of Erg.s standards in the design of machinery
EN 13921:2007
Personal protecve equipment - Erg. principles
EN 27243:1993
Hot environments - Esmaon of the heat stress on working man, based on the WBGTindex (wet bulb globe temperature) (ISO 7243:1989)
EN 547-1:1996+A1:2008
Safety of machinery - Human body measurements - Part 1: Principles for determining the dimensions required for openings for whole body access into machinery
EN 981:1996+A1:2008
Safety of machinery - System of auditory and visual danger and informaon signals
EN ISO 10075-1:2000
Erg. principles related to mental work-load Part 1: General terms and definions (ISO 10075:1991)
EN ISO 20685:2005
3-D scanning methodologies for internaonally compable anthropometric databases (ISO 20685:2005)
EN ISO 10075-2:2000
Erg. principles related to mental workload Part 2: Design principles (ISO 100752:1996)
EN ISO 6385:2004
Erg. principles in the design of work systems (ISO 6385:2004)
EN ISO 7250:1997
EN ISO 10075-3:2004
Erg. principles related to mental workload Part 3: Principles and requirements concerning methods for measuring and assessing mental workload (ISO 100753:2004)
Basic human body measurements for technological design (ISO 7250:1996)
EN ISO 7726:2001
Erg.s of the thermal environment Instruments for measuring physical quanes (ISO 7726:1998)
EN ISO 7730:2005
Erg.s of the thermal environment Analycal determinaon and interpretaon of thermal comfort using calculaon of the PMV and PPD indices and local thermal comfort criteria (ISO 7730:2005)
EN ISO 7731:2008
Erg.s - Danger signals for public and work areas - Auditory danger signals (ISO 7731:2003)
EN ISO 7933:2004
Erg.s of the thermal environment Analycal determinaon and interpretaon of heat stress using calculaon of the predicted heat strain (ISO 7933:2004)
EN ISO 8996:2004
Erg.s of the thermal environment Determinaon of metabolic rate (ISO 8996:2004)
EN ISO 9241-110:2006
Erg.s of human-system interacon - Part 110: Dialogue principles (ISO 9241110:2006)
EN ISO 10551:2001
Erg.s of the thermal environment Assessment of the influence of the thermal environment using subjecve judgement scales (ISO 10551:1995)
EN ISO 11064-1:2000
Erg. design of control centres - Part 1: Principles for the design of control centres (ISO 11064-1:2000)
EN ISO 11064-2:2000
Erg. design of control centres - Part 2: Principles for the arrangement of control suites (ISO 11064-2:2000)
EN ISO 11064-3:1999
Erg. design of control centres - Part 3: Control room layout (ISO 11064-3:1999)
EN ISO 11064-3:1999/AC:2002 Erg. design of control centres - Part 3: Control room layout (ISO 110643:1999/Cor.1:2002)
62
3 Safety and Risk Assessment
Table 3.5 CEN/TC Ergonomics, standards published since 2008. Part 2
Standard
Title
Standard
Title
EN 547-2:1996+A1:2008
Safety of machinery - Human body measurements - Part 2: Principles for determining the dimensions required for access openings Safety of machinery - Human body measurements - Part 3: Anthropometric data Safety of machinery - Erg. design principles Part 1: Terminology and general principles
EN ISO 14915-2:2003
Soware Erg.s for mulmedia user interfaces - Part 2: Mulmedia navigaon and control (ISO 14915-2:2003)
EN ISO 14915-3:2002
Soware Erg.s for mulmedia user interfaces - Part 3: Media selecon and combinaon (ISO 14915-3:2002) Erg.s of the thermal environment - Risk assessment strategy for the prevenon of stress or discomfort in thermal working condions (ISO 15265:2004)
EN 547-3:1996+A1:2008
EN 614-1:2006
EN ISO 15265:2004
EN 614-2:2000+A1:2008
Safety of machinery - Erg. design principles Part 2: Interacons between the design of machinery and work tasks
EN ISO 15535:2006
General requirements for establishing anthropometric databases (ISO 15535:2006)
EN 842:1996+A1:2008
Safety of machinery - Visual danger signals General requirements, design and tesng
EN ISO 15536-1:2008
Erg.s - Computer manikins and body templates - Part 1: General requirements (ISO 15536-1:2005)
EN 894-1:1997
Safety of machinery - Erg.s requirements for the design of displays and control actuators - Part 1: General principles for human interacons with displays and control actuators
EN ISO 15536-2:2007
Erg.s - Computer manikins and body templates - Part 2: Verificaon of funcons and validaon of dimensions for computer manikin systems (ISO 15536-2:2007)
EN 894-2:1997
Safety of machinery - Erg.s requirements for the design of displays and control actuators - Part 2: Displays
EN ISO 15537:2004
EN 894-3:2000
Safety of machinery - Erg.s requirements for the design of displays and control actuators - Part 3: Control actuators
EN ISO 15743:2008
Principles for selecng and using test persons for tesng anthropometric aspects of industrial products and designs (ISO 15537:2004) Erg.s of the thermal environment - Cold workplaces - Risk assessment and management (ISO 15743:2008)
EN ISO 11064-4:2004
Erg. design of control centres - Part 4: Layout and dimensions of workstaons (ISO 11064-4:2004)
EN ISO 9241-151:2008
Erg.s of human-system interacon - Part 151: Guidance on World Wide Web user interfaces (ISO 9241-151:2008)
EN ISO 11064-5:2008
Erg. design of control centres - Part 5: Displays and controls (ISO 11064-5:2008)
EN ISO 9241-171:2008
EN ISO 11064-6:2005
Erg. design of control centres - Part 6: Environmental requirements for control centres (ISO 11064-6:2005)
EN ISO 9241-400:2007
Erg.s of human-system interacon - Part 171: Guidance on soware accessibility (ISO 9241-171:2008) Erg.s of human-system interacon - Part 400: Principles and requirements for physical input devices (ISO 9241-400:2007)
EN ISO 11064-7:2006
Erg. design of control centres - Part 7: Principles for the evaluaon of control centres (ISO 11064-7:2006)
EN ISO 9241-410:2008
Erg.s of human-system interacon - Part 410: Design criteria for physical input devices (ISO 9241-410:2008)
EN ISO 11079:2007
Erg.s of the thermal environment Determinaon and interpretaon of cold stress when using required clothing insulaon (IREQ) and local cooling effects (ISO 11079:2007) Erg.s of the thermal environment Principles and applicaon of relevant Internaonal Standards (ISO 11399:1995)
EN ISO 9886:2004
Erg.s - Evaluaon of thermal strain by physiological measurements (ISO 9886:2004)
EN ISO 9920:2007
Erg.s of the thermal environment Esmaon of thermal insulaon and water vapour resistance of a clothing ensemble (ISO 9920:2007)
EN ISO 9921:2003
Erg.s - Assessment of speech communicaon (ISO 9921:2003)
EN ISO 11399:2000
EN ISO 12894:2001
EN ISO 13406-1:1999
Erg.s of the thermal environment - Medical supervision of individuals exposed to extreme hot or cold environments (ISO 12894:2001) Erg. requirements for work with visual display based on flat panels - Part 1: Introducon (ISO 13406-1:1999)
3.7 Standards and Specifications
Every solution adopted for reducing R has its own cost to be evaluated in conjunction with the effectiveness of the technical solutions, and to be compared with the currently available budget. In detail, laws and technical regulations for safety in production systems always suggest performing activities with special attention to the budget and according to the following policies: • • • •
Removal of hazard and risk. Preventive interventions for the reduction of P . Preventive interventions for the community. Individual preventive interventions. These solutions are not too expensive for the employer and can be applied immediately.
3.6 Risk Assessment, Risk Reduction, and Maintenance In conclusion, the most important steps of the procedure for risk assessment are summarized in Fig. 3.4 in the form of a self-explanatory flowchart. In particular, the role of models and methods for reliability evaluation and maintenance is clearly emphasized. On the importance of an integrated approach to health and safety management, risk assessment, and maintenance planning and execution, see the research report by Wintle et al. (2001). This study, commissioned by the Health and Safety Executive, proposes a plant integrity management based on riskbased inspection. This is an integrated approach to risk
63
assessment and maintenance planning, as discussed at the end of Chap. 9. Ad hoc rules for planning inspections to reduce risks of failures and improve safety and health, reduce costs by repair or replacement of deteriorating equipment in the best time and eliminating ineffective inspections.
3.7 Standards and Specifications The sector of interest in safety engineering is called “health and safety” and mainly operates in two different areas: 1. Occupational health and safety. It is linked with a large number of standardization fields such as machinery, pressure equipment, personal protective equipment, transport, and electrotechnical matters. 2. Personal protective equipment. The aim of this area is to meet the health and safety requirements of the directive for personal protective equipment (89/686/EEC). The main issues developed by the technical committee for the first area are reported in Table 3.2. Similarly, Table 3.3 presents the list of the main issues for the second area. Tables 3.4 and 3.5 report the list of standards belonging to the Technical Committee CEN/TC 122 Ergonomics and published since 2008.
4
Introduction to Maintenance in Production Systems
Contents 4.1
Maintenance and Maintenance Management . . . . 65
4.2
The Production Process and the Maintenance Process . . . . . . . . . . . . . . . . . . 66
4.3
Maintenance and Integration . . . . . . . . . . . . . . . . . . 69
4.4
Maintenance Workflow . . . . . . . . . . . . . . . . . . . . . . . 70
4.5
Maintenance Engineering Frameworks . . . . . . . . . 70
4.6
Reliability-Centered Maintenance . . . . . . . . . . . . . . 72
4.7
Total Productive Maintenance . . . . . . . . . . . . . . . . . 4.7.1 Introduction to TPM . . . . . . . . . . . . . . . . . . . . . 4.7.2 The Concept of TPM . . . . . . . . . . . . . . . . . . . . 4.7.3 TPM Operating Instruments . . . . . . . . . . . . . . 4.7.4 From Tradition to TPM: A Difficult Transition
4.8
Maintenance Status Survey . . . . . . . . . . . . . . . . . . . . 80
4.9
Maintenance Outsourcing and Contracts . . . . . . . 83
73 73 74 75 76
“Maintenance is the combination of all technical, administrative and managerial actions during the life cycle of an item intended to retain it in, or restore it to, a state in which it can perform the required function” (EN 13306:2001 Maintenance terminology). This chapter examines the fundamental definitions concerning maintenance, and discusses the maintenance question in product manufacturing companies or service suppliers. Emphasis is placed on integrating maintenance with the other activities of a company (e. g., production, R&D, quality assurance, purchasing). In conclusion, a survey on the status of maintenance in industrial companies and several observations about maintenance outsourcing are discussed.
4.1 Maintenance and Maintenance Management The life cycle of a generic component in a production system is firstly characterized by periods of uptime when the element is working correctly, i. e., in nominal conditions, secondly by periods of time when it is working but not as expected in the conditions, and thirdly by periods when it stops working altogether owing to a breakdown occurring and the subsequent repair work still having to be completed. Figure 4.1 shows this behavior. In general, the item1 (plant, component, system, equipment, etc.) is supposed to be subject to failures, and to a time-dependent process of degradation. The item can also be repaired by a restoration activity. Both failure and repair times are random variables. Nevertheless, there are different types of failures, repairs, and components/systems; in particular, Chap. 5 distinguishes repairable from nonrepairable items. Maintenance is the function that monitors and keeps plant, equipment, and facilities working. It must design, organize, carry out, and check the work to guarantee nominal functioning of the item during working times “T i ” (uptimes) and to minimize stopping intervals (downtimes) caused by breakdowns or by the resulting repairs. Maintenance management, as illustrated by the framework shown in Fig. 4.2, is made up of all activi1 The standard EN 13306:2001 defines the item as any part, component, device, subsystem, functional unit, equipment, or system that can be considered individually.
R. Manzini, A. Regattieri, H. Pham, E. Ferrari, Maintenance for Industrial Systems © Springer 2010
65
66
4 Introduction to Maintenance in Production Systems
0
preventive maintenance, condition-based maintenance, and corrective maintenance as discussed in Chap. 9, where several analytical models and methods are applied and compared. Maintenance planning is the activity of planning maintenance actions, e. g., inspection, replacement, overhaul, and repair, as properly defined in Chap. 9. In particular, maintenance planning schedules interventions over time, and identifies and allocates necessary resources for the implementation of strategies. Obviously, planning is followed by the execution of maintenance actions and also by the control and supervision of the production systems: on-site, i. e., at the location where the item is used, on-line, i. e., during the time that the item is used, remotely, i. e., without physical access to the item, etc. Maintenance strategies and planning can be properly updated on the basis of the feedback data extracted from the item performances. All these activities have to be properly supported by a maintenance support system made up of resources, services, and management.4 The configuration of such a support system depends on many factors, such as the complexity of maintenance tasks, the skill of the personnel, and availability of the facilities, and is therefore a very critical issue in maintenance management.
T1
τ1
T2
τ2
time
Ti : working time in nominal conditions (uptime) τi: failure time or not nominal working time or reparation time
Fig. 4.1 Life cycle of a component in a production system
ties that determine the maintenance objectives, strategies, and responsibilities2 and implement them by: • maintenance planning; • maintenance control and supervision; • improvement of methods in the organization. The objectives assigned for the maintenance activities can include key performance indicators3 such as reliability, availability, mean time to repair, number of failures, and maintenance costs, properly defined in the following chapters. Consequently, some exemplifying objectives are as follows: improve availability, retain health, safety and environmental preservation, and reduce maintenance costs. Four main classes of objectives are distinguished in the literature (Cheunusa et al. 2004): 1. Loss of production, as an indirect cost. A few examples are the minimization of breakdowns, downtime, rework, inventory, spare parts, overtime, and accidents. 2. Maintenance direct cost. Cost reduction by extension of the useful life of the assets. 3. Volume. This class mainly deals with the following objectives: • Improve reliability and availability; • Improve plant performance; • Support new market opportunities. 4. Price by the product quality increase. The first two classes reduce costs, while the remaining two increase revenues. All classes contribute to maximizing the profit. Maintenance strategies are different types of tasks including actions, procedures, resources, and time. These activities have to be carried out in accordance with established time schedules to guarantee maintenance targets. Some examples are represented by 2
See footnote 4. See also the European Standard EN 15341:2007 Maintenance – maintenance key performance indicators.
3
4.2 The Production Process and the Maintenance Process In modern production systems, the product, or the service, and the maintenance requirements are major outputs: that is to say, in parallel with the production process is the maintenance process. Maintenance is a system whose activities are carried out in synergy with those of the production systems. Figure 4.3 shows the relationship between different objectives relating to these processes. Production systems usually convert inputs (raw materials, energy, workload, etc.) into a product that satisfies customer needs. The mainte4
The European Standard CEN/TR 15628:2007 Maintenance – qualification of maintenance personnel classifies three different categories of maintenance personnel: the European Maintenance Technician, the European Maintenance Supervisor, and the European Maintenance Manager. All categories are characterized in terms of competences and responsibilities.
4.2 The Production Process and the Maintenance Process
67
Maintenance support system Planning
Objectives
Execution
Strategies
Services
Management
Control & supervision
Feedback
Resources
Responsabilities
Input
Improvements and revisions
Fig. 4.2 Maintenance management
INPUT
PRODUCTION PROCESS
OUTPUT
FEEDBACK
Fig. 4.3 Production and maintenance processes. (Duffuaa et al. 1999)
PRODUCTION CAPACITY
nance system, as a mix of know-how, labor, and spare parts, together with other resources aims to maintain equipment in a good working order, i. e., able to provide the appropriate level of production capacity. In a maintenance system, feedback control, planning, and organization activities are very critical and strategic issues. The first of these deals with the production system and control of maintenance activity (e. g., workload emission, spare parts management). Consequently, various actions must be taken to control production and maintenance activities and to resolve breakdowns. Moreover, these activities must be planned in advance whenever possible. Clearly the first aim of maintenance action in downtime periods, during an unexpected breakdown, is to put the plant back into working order: the planning phase is skipped and
MAINTENANCE PROCESS
MAINTENANCE NEED
the maintenance work is carried out as soon as possible. This is breakdown/corrective maintenance. In this situation the maintenance work must be completed quickly, or must be postponed until the next stop, simply leaving the system to run till the next scheduled recondition. In this second case, the definitive maintenance work is scheduled in a previously planned stop period. Maintenance activities are so numerous and complex that they require effective management and wellstructured organization. The starting point is the synchronized control of the production system that not only involves monitoring equipment but also maintenance control, planning, and organization, with a lot of subactivities. This is illustrated in Fig. 4.4 and summarized as follows:
68
4 Introduction to Maintenance in Production Systems PLANNING MAINTENANCE PHILOSOPHY MAINTENANCE LOAD FORECASTING MAINTENANCE CAPACITY MAINTENANCE ORGANIZATION
INPUT FACILITIES LABOR EQUIPMENT SPARES MANAGEMENT
ORGANIZATION JOB DESIGN TIME STANDARDS WORK MEASUREMENT PROJECT MANAGEMENT
MAINTENANCE PROCESS
OUTPUT OPERATIONAL MACHINES EQUIPMENT
FEEDBACK CONTROL
MONITORING PLANT CONTROL WORK CONTROL INVENTORY CONTROL COST CONTROL QUALITY CONTROL
Fig. 4.4 Characteristics of the maintenance process. (Duffuaa et al. 1999)
• Plant control. Control of system performance reliability and collection of on-field data for breakdowns and repair processes by the application of sensors or human checks. • Work control. The maintenance workload is influenced by the maintenance strategy adopted and is supported by well-designed control of the workload based on an effective reporting system. • Inventory control. This activity deals with spare parts management and with all the tools and equipment used in maintenance work. • Cost control. Maintenance usually consumes large amounts of money. There are two fundamental cost factors: the direct cost of investment, i. e., investment in production resources (e. g., plant, equipment, employees), and indirect costs caused by lack of production. It is extremely important to have an effective and continuous cost control process. • Quality control. The main aim of quality assurance of a process or a product is to measure several variables representing a range of specifications, as stated by the Six Sigma quality strategies, for example, and policies applied to production/logistic system management and optimization. The check and control process of the production system generates a large amount of useful data for planning the maintenance work. In particular, during the maintenance planning process it is necessary to assume some decisions involving:
• Maintenance philosophy. Several maintenance policies have been developed by practitioners and are discussed in the literature (see Chap. 9). Since no strategy is significantly more effective than the others, this problem usually deals with the identification of the best mix of strategies and policies in order to obtain the best global result (e. g., minimization of production costs). • Maintenance load forecasting and capacity. Maintenance requires the simultaneous use of several resources (e. g., manpower, spare parts, equipment). Consequently, the load forecasting process is essential to obtain the desired level of maintenance system performance. Critical aspects of maintenance capacity include the identification of the optimum number of craftsmen and their skills and the maintenance of the required tools. After the control and planning of maintenance processes has been carried out, the next step is to design the maintenance system correctly. This requires the integration of several aspects: • Job design. A variety of complicated tasks, called “jobs,” are usually required to maintain a production system. Each job must be designed correctly. The most important instrument for job design and management is the maintenance work order (illustrated in detail in Chap. 7): it contains all the details of the work required, e. g., its location, and all the skills and tools required. The work order is the
4.3 Maintenance and Integration
69
main instrument used in monitoring, planning, and reporting all maintenance activities. Moreover, in maintenance job design the evaluation of the duration of a generic activity is an extremely critical issue. To measure and estimate this duration, method time measurement and the Maynard operation sequence technique are two examples of effective decision-supporting tools. • Work measurement. Each maintenance job requires various resources and generates costs. The target of the workload analysis is to evaluate and control these costs. The ultimate aim of the maintenance process is to minimize the total cost of the production system. • Project management. Maintenance activities are frequently part of a general development plan for the production system. Project management techniques are very useful in supporting the maintenance planning activities and effecting maintenance work (Gantt charts, critical path methods, program evaluation review technique, heuristics for project scheduling and sequencing).
Maintenance activities can provide a significant contribution to meeting the set of the productivity targets for a system, as illustrated in Chap. 1. However, maintenance requires a great deal of time, considerable knowledge, and it also consumes a great deal of money. Consequently, the choice of the “best maintenance level” contains to be a hidden trade-off problem. Since performance maximization of the entire production system is the final goal of a maintenance system, the right approach and the most appropriate working instruments depend on the characteristics of the particular real-world instance examined. Before evaluating this trade-off, one needs to understand that companies often run the risk of underestimating the importance of maintenance, thus highlighting how important the introduction of a new managerial and organizational culture taking care of this issue is. Maintenance activities produce good results only if they are integrated with the other corporate functions, and particularly with the following activities (Chuenusa et al. 2004):
In conclusion, the monitoring phase is the starting point of all maintenance activities. In particular, the performance measurement of a production system can be effectively supported by reliability and availability theory and evaluation.
• • • • •
4.3 Maintenance and Integration In addition to performing maintenance work, “maintenance” must have a place in the design activity and in supporting the management decision-making in the company. For example, this applies in spare parts fulfillment and management, knowledge management, and other areas. Maintenance procedures affect different organizational levels in a company, and have several particularly important implications: • Financial. Production plants lock up a great deal of capital, and the related investment must be repaid. • Technological. Process and product (or service) quality are directly related to the state of the plant and production system maintenance. • Economic. Failures and defects reduce profits. • Social and legal. In poor conditions equipment and facilities can produce pollution and cause both accidents and safety problems.
strategic planning design; production planning; workload management; quality assurance and control; material purchasing management and material management; • human resource allocation and management; • administration and cost accounting; • information technology management. In particular, production planning, quality assurance and control, material purchasing and management, and human resource management influence maintenance the most. Production planning. A continuous flow of material guarantees that production systems will perform excellently, but this goal can only be achieved by the perfect integration of maintenance services and production planning: the shared aim is to make sure that the production system is always available. Quality assurance and control. High levels of process and product quality reduce the scrap rate and improve the customer service level. Furthermore, quality products do not need reworking activity or continuous measurement of the production processes. In conclusion, quality outcomes are the result of an effective integration of both maintenance and quality functions of
70
production systems that make products and/or supply services. Material purchasing. Equipment availability and continuous operability of the production system strongly depend on the availability of spare parts. As a result, the spare parts forecasting question is very critical in production system management and optimization (see Chap. 11). Firstly, maintenance must define the specifications of the spare parts required for functioning of the production system, then the purchasing department of the company must buy the spare parts under the best financial terms and conditions available, and finally maintenance personnel must check and either accept or reject incoming deliveries of materials. Human resources. Great care must be taken in appointing maintenance personnel since human resource skills and knowledge play a fundamental role in developing an effective maintenance division and in minimizing production costs. Two fundamental activities to apply the most appropriate maintenance policies are data collection and management. Consequently, the link between maintenance and the information technology system is one of the most important targets of a production system.
4.4 Maintenance Workflow The maintenance of a production system is strongly related to a set of activities and procedures to cope with for an effective management. The European standard EN 13460 proposes the maintenance workflow with its main activities and documents as reported in Fig. 4.5. The maintenance planning and execution system (Fig. 4.2) is supported by a maintenance information system, properly illustrated in Chap. 7. The main areas of information systems require the following information modules: • Work list and inventory, containing all technical and functional data of parts, components, plants, and resources in general. Also data regarding methods, costs, and times are collected and managed in this area. • Maintenance planning, dealing with frequency, procedure, and technical specifications of each item. • Scheduling and resource management. • Requests of interventions.
4 Introduction to Maintenance in Production Systems
• Work orders, i. e., authorizations and instructions for intervention. • Spare parts monitoring and management. • Cost reporting and controlling. • Inspection record and periodic inspections. • Reliability evaluation tools. The workflow presented is strongly based on a series of tools, approaches, and methodologies (e. g., reliability theory, maintenance policy models, spare parts management) that are properly discussed in the next chapters. For example, the control and supervision phase requires a continuous calculation of key performance indexes for a robust analysis of the status and above all the design of optimizing policies such as preventive interventions, inspections, or the optimal management of spare parts. The planning and scheduling and the execution phases are devoted to applying these policies in practice.
4.5 Maintenance Engineering Frameworks The previously introduced workflow is an output of the evolution of the maintenance concept since the end of the Second World War. As failure is a not eliminable occurrence, the first maintenance activity developed, called “breakdown or reactive maintenance,” was clearly devoted to the restoring of equipment. From the 1950s plant managers were encouraged to develop programs to prevent damage, according to the new trend of “preventive maintenance.” Although it helped to reduce the downtime, it was an expensive alternative. Parts were replaced on a time basis, while they could have lasted longer; a lot of unnecessary man-hours were also spent, thus resulting in an excess of activities, resulting in an increase of total costs in many cases. The problem is still therefore the determination of the optimal level of preventive activities. The monitoring of the real conditions of equipment can permit a calibration of the deadlines for preventive interventions: this is the main strategy of the “on condition monitoring policy,” introduced in the 1990s as the natural evolution of the preventive one. Currently, many companies are still coping with this evolution, from the extensive use of corrective maintenance to the introduction of significant preventive and on condition activities. The optimal mix of policies is strongly
4.5 Maintenance Engineering Frameworks
71
Planning & scheduling Input documents
Execution
Procedures, technical data, operational manual, component list, etc.
Work order, tools, spare parts, manpower resources, operational procedures
Control & supervision
History records, control procedures, laws and regulations
Preparatory phase: item conception, designing, manufacturing, assembly, etc.
Carry out the work
Work planning Production of reports
Closure of the work order
Work scheduling Reports analysis
Work order release & assignment
Output documents
Work order, planning sheet, scheduling sheet,etc.
History records, KPI
Reports, feedback documents and improment proposal
Fig. 4.5 The maintenance workflow. KPI key performance index
dependent on the real application, but several studies in the literature stated the “20-40-40” rule: 20% of corrective strategy and 40% both for preventive and on condition strategies. Anyway, the relevance of the maintenance question requires a systematic approach and a wide perspective involving not only the best mix of maintenance policies but every factor that has an impact on the global cost of a production system. For example, the human contribution, the maintenance information system, and the spare parts management are several important features to be managed in order to achieve excellence. During the past few years several conceptual frameworks for maintenance modeling and management following this “total approach” were developed. In particular, some fundamentals about reliability-centered
maintenance (RCM) and total productive maintenance (TPM) are briefly discussed. RCM is a systematic engineering process to determine what to do in order to ensure that the physical assets continue to behave as users wish. In other words, RCM supports the definition of a complete maintenance regime. The main tools and models traditionally related to the RCM approach are illustrated in the following chapters in this book. They regard maintenance as the way to maintain the functions of the machinery a user may require in a defined operating context. It enables the machinery stakeholders to monitor, assess, predict, and generally know how their physical assets work. TPM, firstly a Japanese idea, is a proactive and systematic engineering approach that essentially aims to prevent any kind of slack before occurrence. It em-
72
phasizes the importance of people, a “can do” and “continuous improvement” philosophy, and the importance of production and maintenance staff working together. The following sections discuss the main topics, problems, models, and methods dealing with maintenance in general. The authors do not have a preferred philosophy or a preferred approach to maintenance. The models, methods, numerical examples, and applications can support the manager or the practitioner of modern production systems to implement the “approach of the moment” when he/she knows the main pillars which define it. That is why we pick RAMS engineering, whose “reliability,” “availability,” “maintainability,” and “safety” are the basic keywords describing the content of this book, especially if we think of quality as part of them (see Chap. 1).
4.6 Reliability-Centered Maintenance The RCM process is known as a “reliability by design” based approach and is reliability-centered because its programs aim to achieve the inherent safety and reliability capabilities of a piece of equipment at a minimum cost. The fundamental goal of RCM is to give the equipment the opportunity to reach the maximum level of reliability that is consistent with the safety, environmental, operational, and profit goals of the organization. This is allowed by addressing the basic causes of system failures and ensuring that there are organizational activities designed to prevent them, predict them, or mitigate the business impact of the functional failures associated with them. The RCM approach is based on several basic steps for each asset: 1. Identification of the expected functions of the equipment to be used. Every facility is designed and built to produce some desired outputs. To achieve this goal the equipment operates some functions, usually grouped in two categories. The main, or primary, functions, e. g., velocity, quality, and safety, are necessary for the correct operation of the equipment, and therefore are strictly related to the reason why the asset has been installed. The second category includes the support functions expressing desirable conditions. The loss of these functions usually does not compromise the output,
4 Introduction to Maintenance in Production Systems
e. g., comfort, effectiveness, and noise, but only the way to get it. 2. Identification of the components of the system with their related failure modes. It is important to note that for the RCM approach any unsatisfactory condition is equivalent to a failure. By this definition it is possible to fix the concept of a failed but still working piece of equipment. Many programs for condition monitoring do not achieve their desired output because the people involved in the program often do not identify a failure as soon as an unsatisfactory condition has been detected. 3. Failure causes analysis. Identification and classification of faults and failures. The goal of this step is the determination of the causes for each functional failure. The cause may be the failure of a piece of equipment or a part of it, or sometimes a failure in some human activity as well. Improper operation and improper maintenance are likely to be the causes of failures. An effective tool to develop this analysis is fault tree analysis, discussed in Chap. 8. 4. Failure effects and consequences analysis. Failure effects analysis is a step-by-step approach devoted to studying the consequences of each failure. When a failure occurs, many different things resulting in different impacts on the equipment, hence on the company business, can happen. Every company fixes its targets for profitability, safety performance, environmental performance, and operational performance. Each failure has a different impact on the business performance, and for the RCM team it is important to evaluate the corresponding consequences, from lack of or minor effects to the total collapse of the business or, in extreme cases, the loss of lives. Failure modes and effects analysis and failure mode, effects, and criticality analysis, as discussed in Chap. 8, are two very interesting tools for an easy approach to this step. After this first phase devoted to “knowledge,” RCM provides some actions, divided into two categories too, dealing with failures. In particular: 5. Proactive tasks, i. e., preventive and/or predictive, i. e., on condition, maintenance tasks. Especially in the case of relevant consequences, something must be done to prevent or predict the failures, or at least to reduce their impact. The proactive tasks are practically the aforementioned preventive and
4.7 Total Productive Maintenance
on condition maintenance policies. It can be stated that the RCM framework, scheduling restoration, discard, and on-condition tasks, is based on the same fundamental concept expressed in Sect. 4.5. Scheduled restoration involves the remanufacturing of a component or an assembly at or before a specified age limit, regardless of its condition at that time. Similarly, scheduled discard implies rejecting an item at or before a specified life limit, regardless of its condition at that time. According to the on condition tasks, items keep providing their service since they meet the desired performance standards. An action is inspired by a requirement, whose evaluation can result in a big deal. Most failures provide warnings about their imminent occurrence. These warnings, or potential failures, are defined as recognizable physical conditions suggesting that a functional failure is about to occur or is in progress. The analysis of these warnings and the correlation with the probability of failure is still a current and significant problem. Chapter 9 deals with the techniques supporting the optimization of proactive tasks. 6. Default actions, i. e., failure-finding, redesign, and run-to-failure, when it is not possible to identify a proactive task. The appropriate default action can be decided according to the consequence of the failure. If there are no proactive tasks capable of reducing the operational consequences, the first default decision can be considered as “do nothing,” i. e., running until the failure occurs for successive corrective interventions. If the restoration cost is too high, a redesign might of course be required. If proactive tasks to improve safety or to reduce environmental risks to an acceptable level cannot be found, the equipment must be redesigned or the process/system where it is employed must be modified. In conclusion, the RCM method provides the last step principally devoted to the monitoring of implementation. 7. Implement and refine the maintenance plan. The RCM approach requires continuous monitoring of its procedure. The maintenance plan must be constantly reviewed taking into account how pieces of equipment evolve and react. The RCM maintenance plan properly requires a cross-functional team constituted of maintenance, operations, and
73
engineering personnel having a thorough understanding of the asset and a clear identification of the risks and profits of the company. Several models and methods useful for implementation of the previously mentioned decision steps are discussed in the following chapters. In particular, Chap. 8 introduces failure modes and effects analysis and failure mode, effects, and criticality analysis techniques for the identification of failure events and the criticality analysis, while some analytical planning models for preventive maintenance actions and inspections are discussed and applied in Chap. 9.
4.7 Total Productive Maintenance A few sections of this book are devoted to this conceptual maintenance framework, currently a reference for a lot of companies. TPM is a people-centered methodology, generally considered as a critical add-on to the “lean manufacturing” production philosophy.
4.7.1 Introduction to TPM The importance of the maintenance function has increased because it has a fundamental role in keeping and increasing the availability, product quality, safety requirements, and plant cost-effectiveness levels. Maintenance costs constitute an important part of the operating budget of manufacturing firms. During the 1960s the concept of TPM was developed in Japan in response to this problem. TPM is a manufacturing program designed primarily to maximize the effectiveness of equipment throughout its entire life by the participation and motivation of the entire workforce (Nakajima 1988). This approach provides a synergistic relationship among all the company’s functions, but particularly between production and maintenance, for continuous improvement of product quality, operational efficiency, capacity assurance, and safety. According to this vision, the word “total” in TPM may assume three meanings:
74
1. TPM pursues the total effectiveness such as economic efficiency and profitability. 2. TPM provides a total maintenance approach mainly including corrective, preventive, and on condition policies and other techniques. 3. TPM needs the total participation of all employees and involves every level and function in the organization, from the top executive to the production operator on the floor. There is a lot of documentation about the benefits arising from the adoption of TPM. Many papers, such as Koelsch (1993), Ferrari et al. (2001, 2002), Eti et al. (2004), Chan et. al (2005), and Gosavi (2006), told of similar success stories of companies that reduced breakdown labor rates, setup times, and production losses very significantly by TPM, thus avoiding costs per maintenance unit. TPM implementation presents several opportunities but also some threats, as discussed in the following sections together with some operative suggestions.
4.7.2 The Concept of TPM TPM is an evolution of the “preventive maintenance approach.” In the early 1960s in some Japanese companies (e. g., the famous Nippondenso) maintenance became a problem as soon as the demand for personnel dedicated to maintenance increased. The management decided to assign the routine maintenance of equipment directly to the operators, thus creating one of the pillars of TPM: the concept of autonomous maintenance. The maintenance personnel took up only important or difficult maintenance interventions, and at the same time suggested some solutions to improve reliability. This approach was completed over the years. At the moment, TPM is universally defined as a productive maintenance technique that is made up of a set of activities to be performed by every operator in order to get zero defects. From a general point, the main targets of TPM are: • maximum efficiency of the plant; • an accurate definition of the plan for preventive maintenance; • a diffusion of relevance of maintenance; • diffusion of workers’ participation, at any level; • organization of small groups of people for enhanced management of problems.
4 Introduction to Maintenance in Production Systems
TPM is based on several fundamental steps, generally called “pillars of TPM,” hereafter discussed briefly. (i) Deletion of causes of losses in productivity. Usually six fundamental causes are expected: 1. Time losses due to: (a) Breakdowns: failures of components require corrective interventions or restoration activities with eventual utilization of spare parts. (b) Setup activities: setting up means a series of operations such as attachment, adjustment, trial processing, readjustment, measurement, production, and finally the ability to produce excellent products. A large amount of time is spent in product-change adjustments until the production of the new item is completely satisfactory. 2. Speed losses due to: (a) Micro-stops: minor and idling stops, usually very short and difficult to trace, when production is interrupted by a temporary malfunction or when a machine is idling. (b) Speed reduction from nominal value: This is due to a misalignment between expected and actual speed or, less frequently, to inadequate technological standards. Sometimes the speed is reduced because of quality or mechanical problems, but there are also cases where the standard speed is not used because it will shorten the service life of the equipment. 3. Defects due to: (a) Equipment starting: some start-up phases (e. g., after periodic repairs, long-time stoppage, holidays, or lunch breaks) may have problems resulting in loss of time, production volume, and costs. (b) Quality defects: volume losses due to defects and reworks, and time losses arising from the time required to repair defective products to turn them into excellent products. (ii) Creation of a program of autonomous maintenance (AM) (maintenance by workers). Operators perform simple maintenance tasks, while more value added activities and technical repairs are performed by skilled maintenance people. Operators
4.7 Total Productive Maintenance
are responsible for upkeep of their equipment to prevent it from degradation. (iii) Plans of preventive and on condition maintenance for maintenance division (on staff position). The maintenance personnel plays a new role in performing only the nonconventional interventions and, above all, in developing activities, e. g., preventive activities, on condition monitoring systems, and plant design modifications, to increase the equipment reliability and safety. (iv) Advance in workers’ capability to provide maintenance. Training plays a crucial role in TPM application. It aims to have multiskilled and wellmotivated people eager to come to work and perform all the required functions effectively and independently. The goal is to create a factory full of experts. Education is continuously provided to operators and maintenance workers, in order to upgrade their skill. Employees should be trained to achieve the four phases of the educational process: do not know, know the theory but cannot do, can do but cannot teach, can do and also teach. (v) Plant/equipment management system. Equipment must be managed considering several aspects: the phase in and the warm-up phase, the normal operating time, and the phase out. Spare parts, design modifications, and continuous improvement are to be pursued with determination. Production and maintenance departments are engaged to develop policies and systematic approaches to achieve these targets. In conclusion, the core of the TPM approach deals with the new role of operators and maintenance workers. Operators and maintenance personnel must reach mutual understanding and share responsibility for equipment. A cooperative effort is required: operators develop the routine maintenance activities, and in particular the following:
75
The maintenance personnel is instead focused on tasks mostly requiring technical expertise and more sophisticated techniques for advanced manufacturing. In particular: • providing technical support for the AM activities; • restoring deterioration thoroughly and accurately, using inspections, condition monitoring, and overhaul; • clarifying operating standards by tracing design weaknesses and making appropriate improvements; • enhancing maintenance skills for checkups, condition monitoring, inspections, and overhaul. TPM introduces a vision significantly different from that of the preventive maintenance approach. The goal of TPM is the improvement of production efficiency to its maximum extent. Its purpose is to maximize the efficiency of production systems in an overall manner, also involving the human factor. In contrast, the preventive maintenance approach is centered on equipment, the target is the maximum efficiency. The preventive maintenance approach considers the fundamental role of the maintenance department and its activities, whereas TPM consists of small-group activities where all members, usually including managers, participate and work jointly on a self-discipline basis.
4.7.3 TPM Operating Instruments In addition to the well-known reliability theory, based on reliability, maintainability, and availability, TPM introduces a rather extended vision of a new synthetic indicator of analysis called “overall equipment effectiveness” (OEE), taking into account availability, quality, and performance efficiency. In particular, OEE D availability production efficiency rate of quality D A PE RQ;
• maintaining basic equipment conditions (cleaning, lubrication, bolting); • maintaining operating conditions (proper operation and visual inspection); • discovering deterioration, mainly by visual inspection and early identification of signs of abnormalities during operation; • enhancing skills such as equipment operation, setup, and adjustment, as well as visual inspection.
where uptime ; uptime C downtime theoretical cycle time ; PE D actual cycle time total products defectives RQ D : total products AD
76
Any improvement process requires the measurement of performance. The choice of the appropriate metrics is a relevant purpose. OEE is a combination of operation maintenance, equipment management, and available resources expressing the “global” approach of TPM best. The goal of TPM is to maximize equipment effectiveness and the OEE is used as a measure of this parameter. Factors affecting OEE are not equally important in every situation and different weights should be assigned according to the specific application, as stated by several authors (Dal et al. 2000; Ferrari et al. 2001). The fine-tuning process of OEE can vary across different business sectors and industries. Generally speaking, a world-class OEE is 0.80–0.85, roughly multiplying an availability rate of about 0.92–0.94, a production efficiency rate of about 0.90–0.92, and a quality rate of about 0.98–0.99. By this new parameter the contributions of the most relevant causes of production losses, in terms of time losses, speed losses, and defects, can be seen: that is why OEE appears as a profitable instrument for TPM implementation.
4.7.4 From Tradition to TPM: A Difficult Transition The new vision introduced as TPM, with its concepts such as autonomous maintenance and instruments such as OEE, is certainly a big opportunity for a global consideration of maintenance but, at the same time, it has some threats. In spite of the continuous improvement observed over recent years, the tradition is still strong and therefore there is not a great disposition for those techniques that directly involve the workers. The principal difficulties are encountered in the area of the organizational change involving people. A cultural shortage can spread the misunderstanding that the TPM method requires production employees to work more, thus reducing the number of maintenance people. However, there are no binding elements for TPM application, but a tenable method for its gradual and smooth application must be found. The proposition of an implementation methodology for TPM, firstly as a new philosophy and successively as a new operational system, is extremely important.
4 Introduction to Maintenance in Production Systems
4.7.4.1 The Proposed Method Workers from any level in the factory have to be gradually but constantly involved in the implementation of TPM, basically made of five main steps: 1. Knowledge diffusion and creation of a structure for project management. For good application of TPM, “top-down” involvement is fundamental, especially in order to get the required change in mentality. For this reason it is necessary to carry on the training and education, both by theoretical sessions and practical simulation, before the on-field implementation. It is furthermore necessary to create a unit dedicated to TPM in order to pursue design activities and development control of the project. 2. Pilot line choice. The TPM technique represents a set of general prescriptions but it could require big changes and adaptations, especially in the western world. The selection of a pilot plant, or a line, to test the TPM approach with and to bring about some adjustments could be the right move for maximum limitation of problems and for better “calibration” of the system to the real situation. 3. Analysis of the de facto situation. At the starting phase and before continuing the TPM application, it is absolutely necessary to recover both technical and economic information, related to the performance parameters and to the costs of the maintenance system respectively, about the pilot line. In this phase it is useful to apply the reliability theory (i. e., mean time before failure, mean time to repair, and failure rate – see Chap. 5) and the synthetic parameter OEE. 4. Criticality determination and proposition for improvements. The analysis of the starting situation allows one to underline criticalities, suggesting some possible improvements and solutions for the next steps. Obviously, the management procedures must be “lined up” with TPM feeling and consequently must be based on autonomous maintenance, small groups, and increase of workers’ competence. In this phase it is very important to keep the personnel continuously informed about the developing status, e. g., by explanation panels. 5. Economic evaluation of proposed developments and extension of analysis. Generally, the previous steps lead to some modifications, both technical and managerial, each of them to be valued by a cost– benefit balance before the application in practice.
4.7 Total Productive Maintenance
The real application of this method to the pilot line requires a warm-up period but after the following transitory period the methodology can be extended to other lines or plants of the factory. The proposed method is applied to an important company, a world leader in its business sectors, with very encouraging results, as presented in the following case study.
77
-
Targets definition Intervention priority choice Advance status check
TPM
-
OEE analysis Big losses analysis Solutions elaboration
Work
-
Applicative solutions
Project
4.7.4.2 Alfa Spa Case History The proposed procedure has been applied in the factory of a world leader, Alfa, in the manufacturing of plants for the metallurgical sector. Before the TPM project, Alfa approached maintenance in a conventional way based on a corrective system with some agreements linked to productive maintenance. The most significant points of the general procedure can be briefly traced as follows: Knowledge diffusion and creation of a structure for project management. For the right application and a consistent result of the project, it appears very important to spread the knowledge and the participation among workers, at any level in the factory. That is why the prime activity consisted in training and educational courses, with different levels and targets, and theoretical lessons about TPM targets and methods, fundamentally for top managers, and “operative” lessons and workshops for direct workers were both organized. After this alignment of knowledge, the creation of a structure for TPM management is important. In the case of Alfa, this organization is made up of three levels and three different teams; in particular: 1. Project team, with: • • • • •
plant director (team leader); workshop manager; manufacturing manager; maintenance manager; quality director.
2. TPM team, with: • • • • •
manufacturing manager (team leader); maintenance manager; workshop delegate; manufacturing delegate; quality control delegate.
3. Work team, with workers and maintenance people, and past members of the TPM team.
Fig. 4.6 Total productive maintenance team responsibilities. OEE overall equipment effectiveness
The corresponding responsibilities for each team are briefly reported in Fig. 4.6. Choice of the pilot line. A key factor for TPM success is the gradual application of the project. The implementation must start from a pilot line, from which it is possible to evidence the specific problems and specialties and, as a consequence, to adjust the TPM concepts and methods ahead of a global application. A boring unit made up of four machines, briefly from mac_1 to mac_4, a very capital intensive device with very big problems concerning maintenance, is the pilot line for the Alfa case. Analysis of the de facto situation. A deep analysis of the real situation is an inalienable starting point. It is very important to trace the situation of maintenance activities from both technical and economic aspects. In Alfa maintenance, especially for the pilot line, was centered on corrective and preventive policies performed by a maintenance division, eventually integrated with external suppliers. Figures 4.7 and 4.8 report for each machine the time per year dedicated to maintenance activities divided into internal and external interventions. For example, in 2007 mac_1 required 876 h for maintenance activities, of these 68.6% in corrective interventions with a significant contribution by external suppliers (40.0% of the number of hours). In parallel, some typical parameters for reliability evaluation are extracted from the maintenance database under the hypothesis of constant failure rates (Table 4.1). In TPM the OEE index enables one to express some different managerial aspects of the plant simultaneously. Still from the maintenance database of the factory, whose relevance is discussed in Chap. 7, the OEE
78
4 Introduction to Maintenance in Production Systems
Fig. 4.9 OEE performance – mac_2 Fig. 4.7 Distribution of maintenance activities (preventive– corrective)
Fig. 4.8 external)
Distribution of maintenance activities (internal–
values are calculated weekly. Figure 4.9 shows an extraction of the OEE index for mac_2 in the period from 1 February 2008 to 8 April 2008. This OEE index can be partitioned into its elements, such as availability, production efficiency, and rate of quality (Fig. 4.10). In particular, Fig. 4.11 aims to focus the setup and start-up times for mac_2 in the same period. Figure 4.12 shows a report concerning the different maintenance policies applied to mac_2. Criticality determination and proposition for improvements. The OEE parameter with its factors enables one to focus on the most significant causes of production losses. In particular for Alfa, for fundamental causes are underlined: setups, maintenance in-
terventions, management problems, i. e., absence of workers and shortage of materials, and technical problems, such as nonconformity of tool and materials. These criticalities assume a different relevance for mac_2: as reported in Fig. 4.13, setups and maintenance interventions represent the major important causes of production losses. Some remedial activities must follow the previous analysis in order for us to delete or to reduce constraints and distortions. The fundamental principles are automaintenance, small group activities, and participation of workers, but more in detail the proposed solution is as follows: a different management of setup activities, some modifications of the plant for the reduction of the failure rate, a total revision of preventive and predictive maintenance planning, and a remanagement of the staff of the maintenance division. It is very important to make all the workforce aware of the current situation. An informative panel, placed in the middle of the pilot line, reporting the OEE trend together with criticalities detected, proposed solutions, and final goals is very useful for the diffusion of knowledge. Economic evaluation of proposed developments and extension of the analysis. Before the application of the solutions picked out in the previous steps an economic survey is absolutely prescribed. Each solution has to be subjected to a cost–benefit estimation for a payback period analysis of investment. For example, the evaluation of the economic impact of a new procedure for the work cycle and tool management (June 2008 euro–dollar exchange rate) is briefly reported:
Table 4.1 Reliability parameters for 2007
mac_1 mac_2 mac_3 mac_4
MTTF (days)
MTTR (h)
(days1 )
5.35 3.07 5.92 4.51
7.45 4.76 6.34 9.34
0.19 0.33 0.17 0.22
MTTF mean time to failure, MTTR mean time to repair
– – – –
Starting investment US$ 62,750; Annual investment US$ 3,750; Annual savings US$ 67,300; Payback period around 11 months.
The investment is mainly concentrated on personnel training and, for a minor fraction, on equipment use-
4.7 Total Productive Maintenance
Fig. 4.10 OEE factors – mac_2
Fig. 4.11 Setup and start-up activities (in hours) – mac_2
Fig. 4.12 Maintenance activities (in hours) – mac_2
79
80
4 Introduction to Maintenance in Production Systems 0
0.4
0.55
0.75 0.85
1.00
World class Excellence Proacve Emerging Reacve
Fig. 4.14 Assessment score
Fig. 4.13 Impact of mac_2 criticalities (period from 1 February 2008 to 8 April 2008)
ful to facilitate the operators in their automaintenance activity. The annual investment includes training owing to personnel turnover and spare parts for the TPM equipment. Savings are fundamentally due to the increase in production time, hence in revenue, and in product quality, i. e., defect reduction. The job satisfaction concerning a TPM project is very strictly related to the direct participation of workers, and that is why it is very important to plan a good and serious educational program at any level in the factory. Moreover, as previously stated, TPM aims at a gradual improvement by small, but continuous, steps: Alfa decided to extend the TPM system to all the other production lines.
4.8 Maintenance Status Survey Several studies devoted to the assessment of maintenance organization and strategies implemented by companies around the world have been reported in the literature. Smith (2003) developed a benchmark study of more than 170 assessments over a broad spectrum of plant and facility types. The study investigated the situation of maintenance in the companies in three different areas: the organization of maintenance, maintenance process support, and finally the support in the operative procedures, including maintenance engineering techniques and work planning and control. Each factor was evaluated according to an assessment scale from 0.00 to 1.00, as reported in Fig. 4.14. Tables 4.2–4.4 summarize the results. With reference to the first area “organization”, the diffusion of the maintenance principles and the level
of the target clarification were further singled out by the author. The presence of a master plan, with its own budget controlled by the management, related to the maintenance question, was another important feature investigated. Figures 4.15–4.17 summarize the results of the analysis: companies have insight into the importance of maintenance in a sufficient way but often face this question without a formal master plan and a systematic approach. As discussed in Chap. 1, an effective maintenance process has to be supported with scheduling and supervision of the designed subprocesses. Training of personnel, dedicated software, and, in general, information technology are important resources. Smith states a significant use of information technology, e. g., CMMS discussed in Chap. 7. Moreover, the training of personnel is sufficiently implemented, whereas scheduling and the required coordination of support are insufficient. This is further evidence of the organizational deficiency usually found in companies facing the maintenance question. Table 4.6 and Fig. 4.19 report as a whole how companies evaluate their preventive maintenance system by themselves. The last group of factors explored by Smith is the implementation of procedures, techniques, and methods for the application of the maintenance principles. On average, the situation is not positive. All the factors have a score in the reactive zone, and in particular work measurement and work planning are very critical. An interesting paper by the maintenance provider Corrigo (2007) included 142 assessments in companies from different sectors. The survey focused on the application of the preventive maintenance solutions and related factors. The inadequate situation is above all due to poor reporting after the interventions and consequently to information supporting the preventive
4.8 Maintenance Status Survey
81
Table 4.2 Assessment score: maintenance organization Scores
Governing principles
Objective clarification
Master plan
Budgetary control
Management control
lowest average highest median
0.000 0.468 0.925 0.475
0.040 0.388 0.880 0.360
0.000 0.279 0.960 0.160
0.100 0.526 1.000 0.500
0.033 0.471 0.900 0.433
Table 4.3 Assessment score: maintenance process support Scores
Training
supervision
Scheduling and coordination
Computer support
lowest average highest median
0.000 0.468 0.925 0.475
0.040 0.388 0.880 0.360
0.000 0.279 0.960 0.160
0.100 0.526 1.000 0.500
Fig. 4.15 Survey results: maintenance organization
Fig. 4.16 Survey results: maintenance process support
maintenance scheduling. Table 4.5 and Fig. 4.18 indicate that the preventive maintenance activities are usually scheduled and documented with significant support from automated system, but at the same time
interventions appear to be found mainly on an experience basis, with a very poor contribution from historical and reliability data not properly traced and stored in the database.
82
4 Introduction to Maintenance in Production Systems
Table 4.4 Assessment score: maintenance procedures Scores lowest average highest median
Maintenance engineering Prev/pred maintenance Work planning Work measurement Material support and control 0.1 0.4 0.9 0.4
0.1 0.5 0.9 0.5
0.0 0.4 0.9 0.3
0.0 0.3 0.7 0.2
0.1 0.6 0.9 0.6
Table 4.5 Preventive maintenance factors benchmark
Maintenance tasks scheduled and documented PM scheduling supported by an automated system Asset condition and history available before PM scheduling Full reporting of PM tasks executed
yes
no
59% 53% 34% 43%
41% 47% 66% 57%
PM preventive maintenance
Fig. 4.17 Survey results: maintenance procedures
Fig. 4.18 Preventive maintenance factors benchmark. PM preventive maintenance
Most of the companies had a lack of perception about preventive tasks, and only for 9% of the sample was the preventive policy optimal.
These surveys are clearly restricted to limitations in the sample size, industrial sectors, and geographical areas, but anyway a significant conclusion can
4.9 Maintenance Outsourcing and Contracts
83
Table 4.6 Overall self-rating of preventive maintenance situation Optimal
Sufficient
In place but insufficient
Non existent
9%
36%
36%
19%
Global evaluation of PM management system
Fig. 4.19 Overall self-rating of preventive maintenance
be drawn: maintenance practitioners apply good practices, although without full comprehension of the corresponding benefits. The importance of the maintenance management facility in manufacturing systems is increasing rapidly as many organizations aim to become world-class companies. Companies must respond to global competitive pressure by seeking to increase their productivity also by pursing an effective and efficient maintenance program. The crucial involvement of the management is fundamental to give guidance and direction to the maintenance function.
4.9 Maintenance Outsourcing and Contracts In the past few years many companies opted to outsource their “noncore” business activities, thus creating a discussion about what is “core” and what is “noncore.” This is a highly subjective process, often ending when a personal opinion has the upper hand over another personal opinion. For companies such as several service suppliers, e. g., airlines, railways, and amusement parks, maintenance is a primary business area, but in general, and above all for manufacturers, maintenance can be considered a noncore business aspect. In spite of this, the outsourcing of maintenance activities has strongly increased in the last few years. This is not a trivial choice, first of all in fixing what has to be outsourced. The maintenance management
process discussed in the following chapter involves, in general, three macro-activities: data collection, analysis and application of maintenance engineering techniques, and the execution of interventions. Companies often prefer to outsource the executive phase, while developing the remaining steps in-house. This is typical, e. g., when the external contractors support the in-house workforce during work-intensive periods, or during major shutdowns or overhauls. This can be considered as a minimalist approach. As an alternative, companies can outsource the planning in addition to the executive phase. In this case, only for preventive and on condition tasks of course, the external contractor decides how and when, but the outsourcing organization retains control over what is to be done. The global approach is to outsource all the activities. In this instance, every part of the agreement must be structured around the achievement of desired outcomes in terms of equipment performance. In other words, companies “buy” the performance reliability levels. In every situation there are advantages and disadvantages, and the most appropriate approach will depend on the particular case. Manufacturers using external maintenance providers can reduce the cost of the maintenance division, or at least they turn fixed costs into variable costs. The providers offer their services to many clients at a very convenient price, thus exploiting the scale effect, and the clients can find more competences in the external personnel than in their own operators, with better performance as a consequence. In conclusion, an effective provider can raise the technical performance of the equipment, paying continuous attention to costs, usually with a slight reduction. The rating process of the provider is a very complex task, because only few actors are well skilled and organized to provide a systematic and effective contribution. This remark is less significant when only the executive phase is outsourced, but in contrast is absolutely fundamental when manufacturers assign all their maintenance to an external provider.
84
Another limiting factor for maintenance outsourcing deals with the competences: to externalize completely the maintenance activities means to lose every related technical and organizational competence in a short time. This can result in some difficulties in the relations with the provider, or mainly in recovering this competence in the future. The challenge in maintenance outsourcing is that manufacturers and providers, also referred to as “contractors,” are independent and usually make decisions based on their own economic interests. Without coordination, their policies may not be compatible or may not lead to optimal system performance. An effective maintenance contract represents an instrument to ensure that manufacturers and contractors have the common target of system efficiency, in terms of performance and costs. The recent European standard EN 13269:2006 presents a useful guideline for the preparation of the maintenance contract. In particular, on the side of the contractor the standards are: • supplying the resources of personnel, material, and equipment; • Preparing a work program and carry out the work; • providing the management required to control the program and the workforce at every stage; • submitting claims for payment; • management of possible contract changes. On the side of the company the standard actions are: • budgeting and validation of the maintenance contractor’s claims for payment; • agreeing with any variation to the contract; • quality assurance requirement and overall management; • verifying that the maintenance performed complies with the requirement of the contract. This book can properly support the reader also in acquiring the basic knowledge for preparing a contract. The third approach mentioned at the beginning of this section, usually called “maintenance global service,” requires a very accurate definition of the cooperation between contractor and client. They have contrasting attitudes: providers are usually involved in limiting their costs and manufacturers are more concerned with the uptimes of the equipment. Anyway, success comes only when strong partnering arrange-
4 Introduction to Maintenance in Production Systems
contract fee
bonus area
0
technical performance
malus area 0 Fig. 4.20 Bonus–malus concept in maintenance contract
ments and cooperative relationships between contractor and client exist. Experimental evidence has demonstrated that an incentive-based contract improves the maintenance operations: usually a mix of equipment uptime, or availability, target levels, and a bonus–malus percentage on the extra profit eventually generated are fixed. Figure 4.20 shows a typical bonus–malus solution: when the providers generate the targeted technical performance for the equipment, the contracted fee is paid. An extra fee is paid in the case of better performance, a penalty is due in case of worse results. Through some recent diffusion of maintenance outsourcing, the outsourcing organization has to address many critical issues in the transition to the new arrangements. Among these are matters such as: • The personnel. Which will be retained by the organization, which will be employed by the contractor, which will be let go? • The drawings. Who is responsible for ensuring that drawings are kept up to date, who will be the custodian of site drawings? • The computer systems. Will the contractor have access to the client’s computerized maintenance management system (see Chap. 7)? Will the contractor maintain its own computerized maintenance records? Who is responsible for ensuring that all the data in the computerized maintenance management systems are accurate? • Materials management (spare parts and tools). Will the contractor provide his own materials, or will the client provide these? Another critical issue to be addressed before the contract is concluded, is how to manage the rescission of
4.9 Maintenance Outsourcing and Contracts
an existing contract. In particular, an agreement has to be reached regarding the duties and obligations of the outgoing contractor in handing over to the incoming contractor (or the client organization, should it decide to bring maintenance back in-house).
85
In conclusion, it is not worth taking the decision to outsource the maintenance activity with a light heart. The potential advantages are very significant and interesting, but a careful consideration of all major issues is vital for a good final result.
5
Basic Statistics and Introduction to Reliability
Contents 5.1
Introduction to Reliability . . . . . . . . . . . . . . . . . . . . 88
5.2
Components and Systems in Reliability . . . . . . . . . 88
5.3
Basic Statistics in Reliability Engineering . . . . . . . 89
5.4
Time to Failure and Time to Repair . . . . . . . . . . . . 90
5.5
Probability Distribution Function . . . . . . . . . . . . . . 90
5.6
Repairable and Nonrepairable Systems . . . . . . . . . 91
5.7
The Reliability Function – R(t) . . . . . . . . . . . . . . . . . 91
5.8
Hazard Rate Function . . . . . . . . . . . . . . . . . . . . . . . . 92 5.8.1 Hazard Rate Profiles . . . . . . . . . . . . . . . . . . . . . 94 5.8.2 Mean Time to Failure . . . . . . . . . . . . . . . . . . . . 95
5.9
Stochastic Repair Process . . . . . . . . . . . . . . . . . . . . . 95
5.10 Parametric Probability Density Functions . . . . . . . 97 5.10.1 Constant Failure Rate Model: The Exponential Distribution . . . . . . . . . . . . . 97 5.10.2 Exponential Distribution. Numerical example 99 5.10.3 The Normal and Lognormal Distributions . . . 103 5.10.4 Normal and Lognormal Distributions. Numerical example . . . . . . . . . . . . . . . . . . . . . 106 5.10.5 The Weibull Distribution . . . . . . . . . . . . . . . . . 110 5.10.6 Weibull Distribution. Numerical Example . . . 112 5.11 Repairable Components/Systems: The Renewal Process and Availability A(t) . . . . . . 113 5.12 Applications and Case Studies . . . . . . . . . . . . . . . . . 117 5.12.1 Application 1 – Nonrepairable Components . 117 5.12.2 Application 2 – Repairable System . . . . . . . . 122
Billions of dollars are currently spent producing high-technology products and services in a variety of production systems operating in different manufacturing and service sectors (e. g., aviation, automotive industry, software development, banks and financial
companies, health care). Most of these products are very complex and sophisticated owing to the number of functions and components (many systems are made of millions of parts). A good example is the largest passenger airliner in the world, the Airbus A380, also known as the “Superjumbo,” with an operating range of approximately 15;200 km, sufficient to fly directly from New York City to Hong Kong. The generic part of this very complex product can be characterized by life cycle and failure behavior, but also by repair behavior in case of failure detection, and in the presence/absence of a maintenance strategy, e. g., based on replacement and/or inspection or preventive action. Moreover, the failure and repair behavior of the generic part of the system can be directly or indirectly associated with thousands of different safety implications and/or quality expectations and performance measurements, which simultaneously deal with passengers, buildings, environment, and communities of people. In particular, reliability can be defined as the probability that a component (or system) will perform a required function for a given period of time when used under specific operating conditions. Another important basic definition is that of availability, which is the probability that a component (system) is performing its required function at a given point in time when used under specific operating conditions. Finally, maintainability is the probability that a failed component (system) will be restored (or repaired) to a specified condition within a period of time when maintenance is carried out in accordance with prescribed procedures. These definitions mean that the improvement, measurement, and control of software reliability and avail-
R. Manzini, A. Regattieri, H. Pham, E. Ferrari, Maintenance for Industrial Systems © Springer 2010
87
88
ability to support the operability of production systems are very important issues. In fact, most system outages and machine crashes are generated by malfunction of the software management system. The aim of this chapter is to introduce the reader to the definition, measurement, management, and control of the main reliability parameters that form the bases for modeling and evaluating activities in complex production systems.
5.1 Introduction to Reliability Reliability has become a very frequently used term during the last 10 years, not only used by engineers and practitioners but also by shop and superstore assistants who justify the price and performance of a product by stressing quality, reliability, warranty, and customer service if failures occur, etc. In particular, this term is implicit in the thought processes of modern society, from the housewife choosing a model of washing machine to the engineers who design the product and guarantee its performance. In doing this, engineers also consider the implications of the warranty and repair costs, a significant proportion of which is composed of the spare parts management costs (i. e., fulfillment, inventory management, replacement, etc.). As briefly introduced in Chap. 3, the importance of measuring reliability is closely related to risk determination and control: the generic risk event is related to the quantification of a probability, i. e., the reliability, and simultaneously the magnitude of the consequences. The importance of reliability also finds justification in the continuous quality control and improvement of the products/services, process, and production systems, and safety requirements and expectations: the more complex the product is, the larger the number of laws and regulations the product must comply with. For example, the previously mentioned Airbus A380 must meet an extremely large number of standards and obtain certification, mainly from the Federal Aviation Administration in the USA and the European Aviation Safety Agency. Reliability, quality, safety, warranty, etc. are very important keywords often used without respecting the original and correct meaning. Consequently, the main aim of this book is to provide the reader with the abil-
5 Basic Statistics and Introduction to Reliability
ity to marry correct notation with a set of definitions, appropriately supported by a set of effective decisionmaking methods and models. The identification of a universal notation used by most users, producers, designers, and practitioners would represent a revolution in customer and consumer expectations of products and services, guaranteeing benefits for all actors in the supply chain. When expectations are clearly defined, ambitious, and also shared by a group of people, all advantages can be shared with costs consequently reduced, and the performance of the production system simultaneously improved. Reliability management can be considered the fuel and energy of the most pure, natural, and valued face of competition providing significant incentives for self-improvement. This chapter explains reliability evaluation and management, which are then discussed in more detail in Chaps. 6–8. It introduces the basic statistical definitions, measurements, and models. It is organized as follows. Section 5.2 discusses the difference between the concept of components and systems in reliability engineering. Sections 5.3–5.10 present the fundamentals of the statistical inference and estimation with particular emphasis on the standard probability distribution functions and stochastic process evaluation. In particular, Sect. 5.10 presents several parametric statistical distributions and numerical examples. Section 5.11 introduces availability for repairable components. Finally, Sect. 5.12 presents two significant applications in which the basic reliability parameters are determined using the models and methods illustrated in this chapter.
5.2 Components and Systems in Reliability The aim of reliability theory is to study the failure behavior of components, such as parts of a production system, and the failure behavior of complex systems in order to guarantee that they function correctly during a period when they are in operation. In general, the production system analyzed is made of more than one part, which is in turn composed of several components that perform various functions. From the point of view of reliability, a component is a generic entity (e. g., a tool, a machine, an item of equipment, a part of the equipment) whose failure behavior (and eventu-
5.3 Basic Statistics in Reliability Engineering
ally repair behavior) is known and can be modeled accurately by evaluating a pool of statistical parameters. These are generally time-based and evaluated by ad hoc investigation of failure and repair events in different operating conditions (reliability evaluation models are properly illustrated and applied in Chap. 6). The system is an entity composed of more than one component, whose failure behavior can be evaluated using knowledge of the failure and repair behavior of its basic components. In other words, reliability evaluation of a system can be based on an analysis of the behavior of its components and their logical and physical connections. This analysis is supported by the effective models and methods presented in Chap. 8. In particular, the approach to the evaluation proposed in Chap. 8 attempts to bypass direct quantification of the system’s statistical parameters by implementing an ad hoc investigation that is very expensive in terms of time and money. In fact, the so-called ad hoc investigation is sometimes a destructive task requiring simultaneous analysis of a large and statistically significant number of equal entities (i. e., systems) operating under common conditions. In conclusion, a reliability system is an entity whose failure and/or repair behaviors are not known and whose complexity usually requires one to adopt effective models to support production system reliability evaluation to be based on the basic reliability and maintainability parameters of the components in the system. Finally, a part of a production system is a component when its reliability parameterization is well known, but it is a system when a reliability evaluation and prediction analysis has to be conducted with its components’ basic failure and repair behaviors and parameters.
89
follows:
nA ; (5.1) n where nA is the number of occurrences (chances) of event A in a period of time T and n is the number of occurrences (chances) in T . In other words, event A is a set of outcomes (a subset) to which a probability p.A/ is assigned. The following equations represent two main properties of random events: p.A/ D
N D 1; p.A/ C p.A/
(5.2)
p.;/ D 0;
(5.3)
where AN is the negation of event A and ; is an event without outcomes, i. e., a set without elements. In particular, the failure event is a random occurrence characterized by a probability function that measures the chance of the event occurring in accordance with a specific set of operating conditions. Similarly, repair activity can be modeled by a probability function measurement of the occurrence of the random repair process. A random process, sometimes called a “stochastic process,” is the counterpart in probability theory to a deterministic process and deterministic system. Reliability theory mainly refers to stochastic processes and to the basic statistics briefly introduced and discussed in the current section and in the following chapters to demonstrate the proposed and applied reliability and maintenance analytical models, which are the subject of this book. The conditional probability is the probability of an event A occurring given the occurrence of another event B, as follows: p.A=B/ D
5.3 Basic Statistics in Reliability Engineering In terms of reliability engineering, a failure or a repair can be described as a random event. A random event A can be characterized by the probability of the event occurring. The probability p.A/ is the likelihood or chance that A is either the case or will happen in the future. It is represented by a real number ranging from 0 to 1. p.A/ generally refers to a period of time T as
p.A \ B/ ; p.B/
(5.4)
where A \ B is the intersection of events A and B. Consequently, p.A \ B/ D p.A=B/ p.B/:
(5.5)
A and B are statistically independent in the case where p.A=B/ D p.A/; (5.6) p.A \ B/ D p.A/ p.B/:
90
5 Basic Statistics and Introduction to Reliability
Considering three statistically independent events, Y p.i /: p.A \ B \ C / D p.A/ p.B/ p.C / D i DA;B;C
(5.7) Two events are mutually (or statistically) exclusive in the case of p.A \ B/ D 0; A \ B D ;:
(5.8)
Another useful property in probability analysis and reliability evaluation is the probability of the union of events: p.A [ B/ D p.A/ C p.B/ p.A \ B/;
(5.9)
where A [ B is the union of events A and B. Now considering three independent events A, B, and C , p.A [ B [ C / D p.A/ C p.B/ C p.C /
The underlying general hypothesis is that the generic component is subject to time cycles composed of a functioning period followed by a nonfunctioning period. These periods are separated by the stochastic failure event.
5.5 Probability Distribution Function These random events can be related to probability distributions that describe the values and the probabilities of these events occurring. The values must cover all possible outcomes of the event, while the total amount of the probabilities must sum to 1 exactly. The probability density function represents a probability distribution in terms of an integral. In particular, a probability distribution has density f , where f is a nonnegative integrable function R ! R, so the probability of the interval [a, b] is given by Zb
p.A/ p.B/ p.A/ p.C / p.B/ p.C /
P .a X b/ D
(5.12)
a
C p.A/ p.B/ p.C /: (5.10) In the case where the events are mutually exclusive, [ X p Ai D p.Ai /; (5.11) i
f .x/ dx
i
where Ai is a generic random event.
for any two numbers a and b, where X is a generic random variable (e. g., ttf and ttr). The following is a very important property common to every probability density function and all random variables (i. e., probability distributions): Z1 f .x/ dx D 1:
(5.13)
1
5.4 Time to Failure and Time to Repair Failure of a product or component (system) is a stochastic process. Consequently, the so-called time to failure (ttf1 ), i. e., the time between the starting instant of time (the functioning starting time) of a component (system) and the failure instant of time, is a random variable often attributed to the “useful life.” The value of this variable is closely related to the component (system) operating conditions. The variable of time between failure occurring and the component (system) being returned to service is another random variable known as time to repair (ttr2 ). 1 2
Sometimes abbreviated as TTF Sometimes abbreviated as TTR
The definition of the cumulative distribution function F .y/ is Zy F .y/ D P .X y/ D
f .x/ dx:
(5.14)
1
A probability distribution has a density function if and only if its cumulative distribution function is absolutecontinuous. In this case F is differentiable almost everywhere, and its derivative can be used as a probability density: dF .x/ : (5.15) f .x/ D dx A probability distribution is called “continuous” if its cumulative distribution function is a continuous
5.7 The Reliability Function – R(t)
91
Functioning period
Functioning period
Up Repair instant of time
Down Failure event
t
Fig. 5.1 Component (system) subject to failure and repair events
function. If the distribution of variable X is continuous, then X is called a “continuous random variable,” where pŒX D a D 0; (5.16) where a is a real number. A probability distribution is called discrete if it is characterized by a probability mass function, which is a function that provides the probability that a discrete random variable is exactly equal to a value. Thus, the distribution of a random variable X is discrete, and X is then called a discrete random variable if X p.X D u/ D 1; (5.17) u
where u is a feasible generic value of X . The distributions of discrete random variables do not have a density function.
5.6 Repairable and Nonrepairable Systems Reliability theory distinguishes nonrepairable from repairable entities (i. e., systems or components). When a failure occurs, an entity is nonrepairable if it is not possible to bring it back into service (i. e., function), which is to say its ttr is infinite. When a failure occurs, a component is repairable if it can be made to function again, as illustrated in Fig. 5.1. Nonrepairable equipment is a special class of repairable entities with infinite ttr. Different models are used to evaluate the reliability of repairable and nonrepairable systems. In particular, the reliability R.T /,
defined as the ability of a system or component to perform its required functions under stated conditions for a specified period of time T , is a probability function appropriate for nonrepairable entities. The equivalent quantity defined for repairable components or systems is the availability A.t/, which is a measure of the degree to which an item of equipment is operable in a generic instant of time t. In other words, the availability is the probability that the system is operating at a specified time t. Sections 5.7 and 5.8 examine the basic models and properties of nonrepairable components and systems, while the stochastic repair process is introduced in Sect. 5.9. The diagram in Fig. 5.2 illustrates a simplified failed nonrepairable component/system (the repair activity is forbidden). This is the two-state diagram of a nonrepairable component/system. The hypotheses adopted to model and manage this class of production system are that: • There are only two states for the generic component/system: “in order” (state 0) and “out of order” (state 1). Consequently, no “gray” conditions of functioning exist, i. e., different configurations of the system which differ from the “white” state 0 (the system is functioning perfectly) and the “black” state 1 (the system is not working at all). • The transition from state 0 to state 1 is instantaneous.
5.7 The Reliability Function – R(t) The ttf of a production component or system is generally a random variable due to several factors, most
92
5 Basic Statistics and Introduction to Reliability
of the period of time T :
Failure
R.T / D 1 F .T /
Out of order
In order
ZT f .x/ dx D 1
D1 Fig. 5.2 Two-state diagram of a nonrepairable component/system
of which are not controllable. In the case of a continuous ttf and in the presence of a probability density function representing the distribution of the random values, identifying the parametric and statistical functions (e. g., exponential, lognormal, normal, logistic, loglogistic) which best fit the values could be useful. Equation 5.14 is the cumulative distribution of the random variable x, where f .x/ is the probability density function. This function is also known as the not conditional failure rate, i. e., a measurement of the failure rate assuming the component (system) is functioning at the instant of time t0 D 0. Formally, f .t/ is defined as f .t/ dt D P .t ttf t C dt/: (5.18) Equation 5.18 can also be directly obtained from Eq. 5.15. Equation 5.14 defines the so-called cumulative function of a generic random variable. This function is called the “failure probability function” in the case of a ttf random variable and is defined by a component (or a system) working under stated operating conditions through a related period of time T , called “mission time.” This period of time is the time horizon during which the component/system’s probabilistic failure behavior is quantified. Also called “survival function,” reliability can be defined as the probability that a component (or system) will perform a required function for a given period of time T (i. e., over a period of time) if used under stated operating conditions. It is formally defined as Z1 R.T / D P .ttf T / D
f .x/ dx;
(5.19)
T
where f .t/ is the probability density function of the ttf random variable and T is the mission time. In other words, it measures the probability that the component/system will not fail before the conclusion
ZT
1
f .x/ dx;
(5.20)
0
where F .T / is the failure probability function and ttf is the failure random variable which belongs to the range Œ0; C1/. The reliability function of a component/system usually refers to t (i. e., the independent variable) as a generic instant of time that clearly identifies the mission time as (5.21) T D t t0 assuming the component/system is functioning at the starting operating time t0 , generally equal to 0.
5.8 Hazard Rate Function The failure rate or hazard rate function (t) is an instantaneous rate of failure, and as a conditional probability referring to a point in time t is defined as follows: .t/t D P .t ttf t C t ncomponent-system functioning in t/ D P .t ttf t C t=ttf t/:
(5.22)
Figure 5.3 illustrates the difference between the reliability function and the hazard rate in relation to t and T D t t0 . What is the difference between f .t/ and (t)? As a “nonconditional failure rate,” f .t/ refers to the component/system being in function at point t0 D 0, and is a measurement of failure velocity. As a “conditional failure rate,” (t) differs from f .t/ because it refers to the functioning of the component/system at point t and is another failure velocity, assuming that the component/system is functioning in t. Equation 5.22 can be rewritten as follows: .t/t D P .t ttf t C tnttf t/ D
R.t/ R.t C t/ : R.t/
(5.23)
5.8 Hazard Rate Function
93
λ(t) T
t + Δt τ
t0= 0
Fig. 5.3 Reliability R.T /, failure rate (t ), point in time t , and time mission T
t R(T) • Nh .t/ is the number of “healthy” components at time point t.
From Eq. 5.23, R.t/ R.t C t/ : R.t/t
.t/ D
(5.24)
By these assumptions, Nh .t/ D N Nf .t/; Nf .t/ D 1: lim t !1 N
In more detail, R.t/ R.t C t/ t !0 R.t/t f .t/ 1 dR.t/ D : D R.t/ dt R.t/
.t/ D lim
(5.25)
R.t Z /
.t/ dt D 0
dR.t/ : R.t/
(5.26)
R.0/D1
0 R.t/ D exp @
Zt
1 .x/ dx A ;
F .t/ D 1 exp @
Zt
Nh .t/ N Nf .t/ D N N
(5.31)
and N Nh .t/ Nf .t/ D D 1 R.t/: N N
(5.32)
Nf .t C t/ Nf .t/ t !0 Nt NF .t C t/ NF .t/ D lim t !0 Nt dF .t/ dR.t/ D D : (5.33) dt dt
(5.27)
0
0
R.t/ D
F .t/ D
Then,
(5.30)
The expressions of the reliability and probability function are, respectively,
Consequently, a hazard function can be written as Zt
(5.29)
1 .x/ dx A ;
(5.28)
0
which are, respectively, the general expression of the reliability function and the probability distribution function defined for the period of time T D t 0. Now a simplified model3 of the reliability function based on the following assumptions is introduced: • N is number of identical and nonrepairable components start operating in t0 D 0, i. e., assuming the components are functioning (i. e., state 0, “up” in Fig. 5.1); • Nf .t/ is the number of “failed” components at time point t;
f .t/ D lim
Equation 5.33 is a well known property in statistics but assumes a special value in reliability theory because it links the R.t/ to the density function, f .t/, of the ttf random variable. Similarly, Nf .t C t/ Nf .t/ .t/ D lim t !0 Nh .t/t f .t/ NF .t C t/ NF .t/ D lim D t !0 NR.t/t R.t/ dR.t/ 1 D ; dt R.t/ (5.34)
3
Reliability evaluation models based on statistics are properly illustrated in Chap. 6.
which is identical to the previously given Eq. 5.25.
94
5 Basic Statistics and Introduction to Reliability λ(t)
λ(τ)
t
Fig. 5.5 Linear bathtub curve
t Fig. 5.4 Bathtub curve of the hazard rate function
5.8.1 Hazard Rate Profiles Figure 5.4 presents the well-known bathtub-curve hazard rate. It is a parametric rate function that identifies the failure behaviors of components/systems subject to a running-in period and a stress/strain period, as is typical, e. g., in parts production for mechanical applications. In particular, Fig. 5.4 reveals three different periods during the life cycle of a generic component/system: 1. Running in period (also called “run-in” or “infant mortality”). During the period of time the hazard function generally decreases while the operating time is running. 2. Service life period (also called “design life”). This is the lifetime expected, or the acceptable period of time in use. The hazard function is sometimes assumed to be constant during this period of time. 3. Subject to wear period (also called “wear out”). Degradation of the component/system accelerates, consequently the probability of failure occurring increases. The analytical model of a parametric and linear bathtub curve is introduced to model the random failure behavior of a production component/system as follows: 8 c0 ˆ 0t c0 c1 t C ; ˆ ˆ c1 < c0 (5.35) .t/ D ; < t t0 ˆ c1 ˆ ˆ : c2 .t t0 / C ; t0 < t:
From Eq. 5.35 the expression of the reliability R.t/ is 8 c0 t2 ˆ ˆexp .c0 C /t c1 ; 0t ˆ ˆ 2 c1 ˆ ˆ ˆ ˆ ˆ 2 ˆ c0 ˆ <exp t C c0 ; < t t0 2c1 c1 R.t/ D ˆ h c ˆ ˆ 2 ˆ ˆ .t t0 /2 exp ˆ ˆ 2 ˆ ˆ t0 < t: ˆ c 2 i ˆ : Ct C 0 ; 2c1 (5.36) where c0 , c1 , c2 , and t0 are parameters affecting the profile of the hazard rate (Fig. 5.5). Other typical profiles of the hazard function are reported in Fig. 5.6. The profile in Fig. 5.6a relates to a component whose conditional failure rate is progressively increasing, i. e., the longer the running time, the more the strain and velocity to fail intensifies. This is typical of parts subject to slow wear with a constant trend (e. g., equipment for insulation) where wear out can be a loss or deformation of material. The European standard EN 13306 (Maintenance terminology) defines wear-out failure as “failure whose probability of occurrence increases with the operating time or the number of operations of the item or its applied stresses.” The profile in Fig. 5.6b relates to equipment becoming obsolete quickly, typical of several electronic and electrotechnical parts and components. The conditional failure rate for these items is assumed to be constant, i. e., the instantaneous velocity to failure does not depend on the use of the item: this equipment is “without memory” or “memoryless” and the failure time is random, i. e., accidental. The profiles in Fig. 5.6c and d relate, respectively, to items with a low and a high infant rate at the beginning of their life and a lower increasing hazard rate during a running-in period. Appropriate similar simplified models of the failure rate can also be introduced
5.9 Stochastic Repair Process
95 λ(t)
λ(t)
t
t
(a)
(b)
λ (t)
λ(t)
Fig. 5.6 Examples of simplified profiles of the hazard function, a increasing failure rate, b constant failure rate, c low infant rate, d high infant rate
t
t
(c)
in the evaluation of production component system reliability.
(d)
In the special case of a constant hazard rate (t) (as illustrated in Fig. 5.6b), Z1 R.t/ dt D
MTTF D
5.8.2 Mean Time to Failure
0
This is the statistical mean value of the random variable ttf. Consequently, it is defined as Z1 MTTF D
tf .t/ dt D
tf .t/ dt
0
ˇ1 ˇ ˇ ˇ 1 1 D ˇˇ et ˇˇ D : t
(5.39)
5.9 Stochastic Repair Process
0
Z1 t
dR.t/ dt: dt
(5.37)
0
Using the integration by parts technique, we obtain 8 Z1 ˆ ˆ 1 ˆ MTTF D C R.t/ dt jtR.t/j0 ˆ ˆ ˆ ˆ ˆ 0 ˆ ˆ ˆ Z1 ˆ < D R.t/ dt since ˆ ˆ ˆ 0 ˆ 13 2 0 t ˆ ˆ Z ˆ ˆ ˆ ˆ A5 D 0: 4 @ ˆ ˆ : lim ŒtR.t/ D lim t exp .x/ dx t !1
et dt
Z1
1
D
Z1
t !1
0
(5.38)
The analytical definitions and models previously illustrated mainly refer to the random failure process of a production/component system operating under certain conditions. When maintenance is performed in accordance with prescribed procedures, the repair (i. e., restoration) process for a specific condition of a given failed component or system is stochastic. In addition to the previously discussed assumptions (Sect. 5.6), this section briefly describes this process and introduces several new properties of reliability based on the following hypotheses: • The repair activity is admissible. • The transaction from one state to another is instantaneous.
96
5 Basic Statistics and Introduction to Reliability
.t/t D P .t ttr t C t
Failure
ncomponent-system nonfunctioning in t/: Out of order
In order
Like the hazard rate function, (t) is defined in relation to time point t, while M.t/ is defined in relation to the period of time T D t t0 ; where T is equal to t when t0 D 0. Equation 5.43 can be rewritten as follows:
Repair
Fig. 5.7 Two-state diagram for a repairable component
• Two transactions cannot be executed in a period of time t, i. e., in the infinitesimal dt. • The component/system is assumed to be generally “as good as new” at the end of the repair activity; but this is not always admissible as explained in the following chapters. The diagram shown in Fig. 5.7 shows the assumed states of a part subject to failure and a random repair process. This component/system is called “repairable.” The basic random variable is called “time to repair” (ttr). The probability density function that represents the distribution of values assumed by ttr is g.t/. As with f .t/, g.t/ is a nonconditional rate: nonconditional repair rate. In particular, it is a nonnegative integrable function R ! R, so the repair probability of the interval [a, b] is given by
.t/t D P .t ttr t C tnttr t/ G.t C t/ G.t/ ; (5.44) D 1 G.t/ where .t/ D
G.t C t/ G.t/ : Œ1 G.t/t
g.x/ dx:
(5.40)
1 dG.t/ G.t C t/ G.t/ D : Œ1 G.t/t 1 G.t/ dt (5.46) Consequently, a repairable hazard function and distribution function (or maintainability function) can be written as follows: .t/ D lim
t !0
Zt
G.t Z /
dG.t/ ; 1 G.t/
(5.47)
G.0/D0
G.t/ D 1 exp @
Zt
1 .t/ dt A :
(5.48)
0
Maintainability, i. e., the probability a failed entity will be repaired or restored can be formally defined as M.T / D P .ttr T / D
The MTTR is the statistic mean value of the random variable ttr, which is defined as follows: Z1
ZT g.x/ dx:
(5.41)
0
where g.t/ is the probability density function of the random variable ttr and T is the mission time. There are two known measurements in the stochastic repair process, called “mean time to repair” (MTTR) and the “repair rate function” (t) defined, respectively, as the mean value of the variable ttr and the conditional repair rate: Z1 xg.x/ dx; 0
.t/ dt D 0
a
MTTR D
(5.45)
Then as t ! 0
0
Zb P .a ttr b/ D
(5.43)
(5.42)
MTTR D
Z1 tg.t/ dt D
1
Z1 tg.t/ dt D
0
t
dG.t/ dt: dt
0
(5.49) In the special case in which the hazard rate (t) is constant, Z1 MTTR D
Z1 Œ1 G.t/ dt D
0
ˇ ˇ ˇ 1 t ˇ1 1 ˇ D ˇ e ˇˇ D : 0
et dt
0
(5.50)
The notation corresponding to the generic random failure and to repair processes is summarized in Table 5.1.
5.10 Parametric Probability Density Functions Table 5.1 Stochastic failure and repair processes Failure process
Repair process
F .t /
G.t /
f .t /
g.t /
MTTF
MTTR
(t )
(t )
MTTF mean time to failure, MTTR mean time to repair
In particular, defined to identify the failure process, the failure probability function F .t/ corresponds to the maintainability function G.t/ in the repair process. The first columns in Tables 5.2 and 5.3, respectively, report the main definitions and properties of reliability quantities for a process randomly deteriorating to failure and for a random repair process concerning a failed component/system. The reliability engineering of repairable components/systems introduces the time between failures as the time duration between two consecutive failures of an item. As a consequence, it is possible to quantify the so-called mean time between failures, which is the mean value of the random time between failures.
97
ticular, the failures in the stochastic failure process, which is known as the exponential reliability function, are due to completely random or chance events, which is often the case during the useful life of an electronic or electrotechnical component/system. For a given generic continuous random variable x, the exponential probability density function is defined as follows: f .x/ D ex ;
x > 0:
(5.51)
The cumulative function and the mean function are quantified, respectively, as follows: Zx F .x/ D
f .x/ dx D 1 ex ;
(5.52)
1
4.8 4.4 4 3.6 3.2 2.8 λ = 0.25 λ= 1 λ= 5
5.10 Parametric Probability Density Functions
f(x)
2.4 2
1.6 1.2 0.8 0.4 0
0
0.5
1
1.5
2
2.5
3
x
Fig. 5.8 Exponential distribution, density function M.x/ D f4I 1I 0:2g 1 0.9 0.8 0.7 λ = 0.25 λ= 1 λ= 5
0.6
F(x)
This section presents a set of probability density functions presented in the literature that are used to determine the probability of failure and repair events occurring. These are parametric functions based on a small number of parameters whose values unequivocally identify a probability function and the stochastic behavior of a random event. There are several effective statistical methods of identifying the best parameterization of a generic density function in order to model a stochastic process. Some of these evaluating models and methods are presented and applied in the next chapter, and are supported by several commercial tools developed for both statistical and reliability evaluation.
0.5 0.4 0.3
5.10.1 Constant Failure Rate Model: The Exponential Distribution The models discussed in this section are based on the so-called exponential probability distribution. In par-
0.2 0.1 0
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
x
Fig. 5.9 Exponential distribution, cumulated function
5
98
5 Basic Statistics and Introduction to Reliability
Table 5.2 Stochastic failure process. Main definitions and properties of nonrepairable components Hazard function (t ) x and t are the random variable ttf t >0
Constant hazard rate function .t / D
R.t / C F .t / D 1 R.0/ D 1
F .0/ D 0
R.1/ D 0
F .1/ D 1
f .t / D
dF .t / dt
f .t / dt D F .t C dt / F .t / Zt F .t / D
f .x/ dx 0
R.t / D
Z1 f .x/ dx t
f .t / f .t / D .1 F .t // R.t / 1 0 Zt f .t / D .t / exp @ .x/ dx A
.t / D
0
0 F .t / D 1 exp @ 0 R.t / D exp @
Zt
f .t / D et
1 .x/ dx A
0
Zt
.t / D
F .t / D 1 et
1
.x/ dx A
R.t / D et
0
Z1 Z1 MTTF D xf .x/ dx D R.t / dt 0
MTTF D
1
0
ttf time to failure
C1 Z 1 M.x/ D Œxf .x/ dx D :
(5.53)
1
From Eq. 5.53 the mean value, i. e., the expected value, is constant. Consequently, in the case of a random failure process and an exponential distribution of values, the MTTF is constant and equal to the inverse of the constant hazard function. Figure 5.8 illustrates the trend of the exponential density function f .x/ for different values of constant hazard rate . Similarly, Fig. 5.9 shows the trend of the cumulative function F .x/.
When the distribution of failures is exponential, the following equations are obtained for reliability R.t/, failure probability F .t/, and nonconditional failure rate f .t/: R.t/ D et ; F .t/ D 1 e f .t/ D
t
(5.54) ;
(5.55)
dF .t/ D et : dt
(5.56)
Similarly, for a random repair process in which ttr is exponentially distributed, maintainability G.t/ and not
5.10 Parametric Probability Density Functions
99
Table 5.3 Stochastic repair process. Main definitions and properties of repairable components Constant repair rate function .t / D
Repair rate function (t ) x and t are the random variable ttr t >0 G.0/ D 0 G.1/ D 1 dG.t / dt g.t / dt D G.t C dt / G.t / g.t / D
Zt G.t / D
g.x/ dx 0
.t / D
g.t / Œ1 G.t / 0
.t / D
g.t / D .t / exp @ 0 G.t / D 1 exp @
Zt 0
Zt
1 .x/ dx A
g.t / D et
1 .x/ dx A
G.t / D 1 et
0
Z1 MTTR D xg.x/ dx
MTTR D
1
0
ttr time to repair Table 5.4 Time to failure (ttf) in minutes of an electronic component 12,571.02 16,566.82 18,433.96 18,741.88 11,35.786 19,025.89 19,556.63 22,477.93 27,838.93 32,185.33
52,492.86 53,197.55 56,094.05 56,539.05 32,290.36 56,788.96 56,878.74 57,106.58 57,541.64 58,470.93
76,739.5 77,284.16 77,656.09 82,304.53 63,034.87 82,733.7 83,145.33 83,336.68 92,298.63 97,400.47
141,107.7 142,527.9 145,527.7 148,483.6 97,443.35 150,747.2 151,409.6 152,489 154,131.8 155,809.6
221,538.8 246,367.7 257,147.7 257,335.3 158,096.7 278,000.5 279,977 285,308.8 290,657 295,666.3
conditional repair rate g.t/ are defined as G.t/ D 1 et ; g.t/ D
dG.t/ D et : dt
(5.57)
2,321.06 6,340.624 7,007.418 10,591.91 10,743.09 11,695.93 7,201.37 7,433.18 8,352.128 9,512.557
36,523.39 36,727.35 38,415.69 48,893.78 49,081.61 51,812.46 41,429.79 42,878.09 44,267.55 44,415.77
64,559.04 65,590.31 67,692.19 73,302.27 74,263.19 76,394.68 68,527.89 69,292.39 69,720.86 71,725.63
97,914.57 101,450.9 104,813.9 134,817.2 134,993.7 138,521.1 106,475.2 109,851.8 120,703.4 128,467.2
159,237.6 161,166.7 163,365.4 192,251.1 198,138.9 216,529.9 164,287.1 165,079.1 180,107.8 189,962.4
Table 5.3 presents the summarizing analytical models for repairable components/systems in both the absence (first column) and the presence (second column) of an exponential distribution of ttr.
(5.58)
Table 5.2 reports the main definitions and properties of the stochastic failure process of nonrepairable components/systems for both the generic item (first column) and for items whose density function is assumed to be exponential (second column).
5.10.2 Exponential Distribution. Numerical example Table 5.4 presents the ttf of a sample of 100 electronic components produced by a company in the USA. The
100
5 Basic Statistics and Introduction to Reliability
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
Failure Timeline FS Timeline Failure
0.000
60000.000
120000.000
180000.000
240000.000
300000.000
Time, (t) λ = 1.1340Ε−5, ρ = −0.9062
Fig. 5.10 Failure timeline. ReliaSoft® software
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
TTF Histogram 8.000E-6
FS Histogram Pdf Line Failures
6.400E-6
Value
4.800E-6
3.200E-6
1.600E-6
0.000
0-29453.... 29453.05... 58906.10... 88359.15... 117812.2... 147265.2... 176718.3... 206171.3... 235624.4... 265077.4... 294530.5...
Period λ = 1.1340Ε −5, ρ = −0.9062
Fig. 5.11 ttf histogram. ReliaSoft® software
5.10 Parametric Probability Density Functions
101
ReliaSoft Weibull++ 7-www.ReliaSoft.com
Probability - Exponential
99.900
Probability-Exponential Folio2\Data 1 Exponential-1P RRX SRM MED FM F=100/S=0 Data Points Probability Line
Reliability, R(t)
50.000
10.000 5.000
1.000 0.500
0.100 0.000
60000.000
120000.000
180000.000
240000.000
300000.000
Time, (t) λ = 1.1340Ε−5, ρ = −0.9062
Fig. 5.12 Probability plot, exponential distribution. ReliaSoft® software
generic variable time relates to the use of the component and is expressed in minutes. Figure 5.10 presents the failure timeline, i. e., the graphical collection and representation of failures according to the available ttf, while Fig. 5.11 shows the related histogram from which it is possible to identify a possible parametric distribution of the random values. Figure 5.12 presents the so-called probability plot, which is a graphical technique for assessing whether or not a data set follows a given distribution. In particular, the data are plotted against a theoretical (in other words a parametric) distribution so that the points approximate a straight line. Departure from this straight line indicates departure from the specified distribution. Furthermore, conducted with the support of ReliaSoft® reliability software and illustrated in Fig. 5.12, the proposed analysis assesses whether or not the ttf values follow an exponential distribution. The following chapter discusses the ability of a generic parametric distribution to best fit an available set of stochastic data in order to develop the reliability evaluation models and methods useful to practitioners. In fact, the probability plots can be
generated for different competing parametric distributions to identify which provides the best fit, and the probability plot generating the highest correlation coefficient is the best choice since it generates the straightest probability plot. The plot illustrated in Fig. 5.12 shows that there seems to be good correlation between the available ttf and an exponential distribution, which is supported by the estimate of the cumulative distribution function F .t/, i. e., the failure probability function, i. e., unreliability, as reported in Fig. 5.13. Similarly, Fig. 5.14 presents the estimated reliability function, i. e., the survival function R.t/. The estimated value of failure rate is .t/ D 1:13 105 min1 : Figure 5.15 presents the related trend of the estimated probability density function f .t/ and the constant failure rate (t). Figure 5.16 completes the illustration of this numerical application. It is the result of a nonparametric reliability evaluation based on the estimation of a set of lower and upper bounds for the reliability R.t/. This analysis is illustrated and discussed in the next chapter.
102
5 Basic Statistics and Introduction to Reliability
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
Unreliability vs Time Plot
1.000
Unreliability Folio2\Data 1 Exponential-1P RRX SRM MED FM F=100/S=0 Data Points Unreliability Line
Unreliability, F(t)=1-R(t)
0.800
0.600
0.400
0.200
0.000 0.000
140000.000
280000.000
420000.000
560000.000
700000.000
Time, (t) λ = 1.1340Ε−5, ρ = −0.9062
Fig. 5.13 Unreliability, exponential distribution. ReliaSoft® software
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
Reliability vs Time Plot 1.000
Reliability Folio2\Data 1 Exponential-1P RRX SRM MED FM F=100/S=0 Data Points Reliability Line
Reliability, R(t)=1-F(t)
0.800
0.600
0.400
0.200
0.000 0.000
140000.000
280000.000
420000.000
560000.000
Time, (t) λ = 1.1340Ε−5, ρ = −0.9062
Fig. 5.14 Reliability, exponential distribution. ReliaSoft® software
700000.000
5.10 Parametric Probability Density Functions
103 ReliaSoft Weibull++ 7 - www.ReliaSoft.com
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
Probability Density Function Pdf
1.600E-5
Folio2\Data 1 Exponential-1P RRX SRM MED FM F=100/S=0 Pdf Line
f(t)
1.200E-5
8.000E-6
2.000E-5
Failure Rate
1.600E-5
Folio2\Data 1 Exponential-1P RRX SRM MED FM F=100/S=0 Failure Rate Line
1.200E-5
8.000E-6
4.000E-6
4.000E-6
0.000 0.000
Failure Rate, f(t)/R(t)
Failure Rate vs Time Plot
2.000E-5
60000.000
120000.000
180000.000
240000.000
300000.000
0.000 0.000
60000.000
120000.000
Time, (t)
180000.000
240000.000
300000.000
Time, (t) λ = 1.1340Ε−5, ρ = −0.9062
λ = 1.1340Ε−5, ρ = −0.9062
Fig. 5.15 Probability density function and failure rate. ReliaSoft® software
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
Reliability vs Time 1.000
Kaplan-Meier Reliability Lower Bound Upper Bound
0.800
Reliability
0.600
0.400
0.200
0.000 0.000
60000.000
120000.000
180000.000
240000.000
300000.000
Time, (t)
Fig. 5.16 Nonparametric evaluation. ReliaSoft® software
5.10.3 The Normal and Lognormal Distributions Two useful time-dependent statistical models are frequently applied in reliability theory. The normal probability density function is a continuous and parametric
distribution defined as follows: .x /2 1 exp ; f .x/ D p 2 2 2
(5.59)
where and are two parameters, respectively, equal to the mean and the standard deviation of the random variable x.
104
5 Basic Statistics and Introduction to Reliability 1.1
The following models quantify the cumulative function and the mean function: F .x/ D 1
0.9
x 1 ; 1 C erf f .x/ dx D p 2 2
0.7
F(x)
Zx
(5.60)
C1 Z M.x/ D Œxf .x/ dx D ;
μ=1, σ =0.2 0.5
μ=1, σ =0.4 μ=1, σ =0.8
(5.61)
0.3
1
The integral in Eq. 5.62 cannot be evaluated in closed form in terms of an elementary function (differential algebra), but it can be evaluated by expanding the integrand in a Taylor series as follows: ! 1 n x Y x 2 2 X : (5.63) erf.x/ D p i nD0 2n C 1 i D1
0.1 -2
-1
-0.1
0
1
2
3
x
Fig. 5.18 Normal distribution, cumulative function. D 1, D f0:2; 0:4; 0:8g 40 35 30
f(x)/[1-F(x)]
where erf(x) is the error function (also called the “Gauss error function”). Erf(x) is a nonelementary function because it is not built from a finite number of exponential functions, logarithms, constants, one variable, and root (mathematics) of equations by function composition and combinations using the four arithmetic operations (+ – /. In particular, it is defined as 8 2 Rx t 2 ˆ ˆ erf.x/ D p e dt ˆ ˆ < 0 (5.62) ˆ ˆ 2 x 2 ˆd ˆ : erf.x/ D p e : dx
25 μ=1, σ=0.2
20
μ=1, σ=0.4 μ=1, σ=0.8
15 10 5
-1.5
-1
-0.5
0
0
0.5
1
1.5
2
2.5
x
Fig. 5.19 Normal distribution, rate (x). D 1, D f0:2; 0:4; 0:8g
2.2
1.7
1.2
f(x)
μ=1, σ=0.2 μ=1, σ=0.4 μ=1, σ=0.8 0.7
0.2
-1
0
1
2
3
-0.3
x
Fig. 5.17 Normal distribution, density function. D 1, D f0:2; 0:4; 0:8g
Figure 5.17 illustrates the trend of the normal density function f .x/ for different values of the standard deviation , assuming D 1. Similarly, Fig. 5.18 presents the trend of the cumulative function F .x/, which is the failure probability function in the case where the variable x is the ttf. Figure 5.19 presents the values of (x) obtained by applying Eq. 5.25. Figure 5.20 presents the trend of f .x/ and F .x/ for different values of and . The lognormal distribution is the probability distribution of a random variable whose logarithm is a normal distribution. The probability density function is
5.10 Parametric Probability Density Functions
105
2.2
1.1
0.9
1.7
0.7
1.2
f(x)
F(x)
μ=1, σ=0.2 μ=2, σ=0.4
0.5
μ=3, σ=0.8
0.7 0.3
0.2
0.1
0
1
2
-0.3
3
4
5
6
7
0
1
2
3
-0.1
x
4
5
6
7
x
Fig. 5.20 Normal distribution, density function and cumulative function
defined as follows: ! 8 ˆ 1 Œln.x/ 2
0:
0.8 0.7 0.6
(5.64)
0.5 μ=1, σ =0.2
f(x)
0.4
The cumulative distribution is F .x/ D
f .x/ dx D
Zx f .x/ dx
x>0
1
0.1
0
1 1 ln.x/ : D C erf p 2 2 2
0 2
4
-0,1
6
8
10
x
Fig. 5.21 Lognormal distribution, density function. D 1, D f0:2; 0:4; 0:8g
C1 C1 Z Z D M.x/ D Œxf .x/ dx Œxf .x/ dx x>0
2 : D exp C 2
0
(5.65)
The mean function, i. e., the expected value, is
1
μ=1, σ =0.8
0.2
1.1
0.9
0
(5.66)
Figures 5.21 and 5.22 illustrate the trend of the density function f .x/ and the cumulative function F .x/ for different parameterizations of the analytical model. Figure 5.23 presents the values of the rate obtained by applying Eq. 5.25. The lognormal distribution is generally used to model the stochastic repair process that is characterized by the previously introduced random variable time to repair (ttr). In particular, Figs. 5.24–5.26 illustrate the trend of the most significant functions that describe the repair process, assuming a lognormal distribution of a set of ttr values (represented by the dots
0.7
F(x)
Zx
μ=1, σ =0.4 0.3
μ=1, σ=0.2
0.5
μ=1, σ=0.4 μ=1, σ=0.8
0.3
0.1 0 -0.1
2
4
6
8
10
x
Fig. 5.22 Lognormal distribution, cumulative function. D 1, D f0:2; 0:4; 0:8g
106
5 Basic Statistics and Introduction to Reliability
4
1.0 0.9
3.5
0.8 0.7
3
G(x)
0.6 2.5 μ=1, σ=0.2 2
λ(x)
0.4 0.3
μ=1, σ=0.4
0.2
μ=1, σ=0.8
1.5
0.5
0.1 0.0
1
0
1
2
3
4
5
6
7
4
5
6
7
x
0.5
Fig. 5.25 Repair process, G.t /
0 0
2
4
6
8
10
12
x
Fig. 5.23 Lognormal distribution, .x/. D 1, D f0:2; 0:4; 0:8g
0.9 0.8 0.7
g(x)/[1-G(x)]
0.7 0.6
g(x)
0.5 0.4
0.6 0.5 0.4 0.3 0.2 0.1
0.3
0.0 0.2
0
1
2
0.1
3
x
Fig. 5.26 Repair process, (t )
0.0 0
1
2
3
4
5
6
7
x
Fig. 5.24 Repair process, g.t /
in figure). These functions are: • the density function of the ttr variable g.t/, also called “nonconditional repair rate”; • the cumulative function G.t/, also called “maintainability”; • the conditional repair rate (t).
5.10.4 Normal and Lognormal Distributions. Numerical example The failure timeline of the stochastic failure process for a mechanical component, for which a sample of
100 ttf is available, is reported in Fig. 5.27. The frequency distribution of values is illustrated in the histogram shown in Fig. 5.28. Figures 5.29 and 5.30 present the result of a parametric evaluation of the probability plot and reliability measures assuming a normal distribution of random values. Similarly, Figs. 5.31 and 5.32 present the results obtained assuming a lognormal distribution of random values. Both parametric evaluation analyses seem to fit the random variables effectively. Nevertheless, the statistical distributions (normal and lognormal) differ and so do the estimated values of the reliability parameters when one of them is assumed. In-depth analysis using ad hoc “goodness of the fit” models is introduced in the next chapter.
5.10 Parametric Probability Density Functions ReliaSoft Weibull++ 7 - www.ReliaSoft.com
107
Failures Timeline FS Timeline Failure
0.000
1.000
2.000
3.000
4.000
5.000
Time, (t) β = 8.8740, η = 3.2201, ρ = 0.9885
Fig. 5.27 Failure timeline. ReliaSoft® software
ReliaSoft Weibull++ 7 - www.ReliaSoft.com 2.000
TTF Histogram FS Histogram Pdf Line Failures
1.600
Value
1.200
0.800
0.400
0.000 0-0.2 0.2-0.4 0.4-0.6 0.6-0.8 0.8-1.0 1.0-1.2 1.2-1.4 1.4-1.6 1.6-1.8 1.8-2.0 2.0-2.2 2.2-2.4 2.4-2.6 2.6-2.8 2.8-3.0 3.0-3.2 3.2-3.4 3.4-3.6 3.6-3.8 3.8-4.0 4.0-4.2 4.2-4.4 4.4-4.6 4.6-4.8 4.8-5.0
Period β = 8.8740, η = 3.2201, ρ = 0.9885
Fig. 5.28 ttf histogram. ReliaSoft® software
108
5 Basic Statistics and Introduction to Reliability ReliaSoft Weibull++ 7 - www.ReliaSoft.com
Probability - Normal distribution
99.900
Probability-Normal (Normal-2P)
Unreliability, F(t)
Folio1\Data 1 Normal-2P RRX SRM MED FM F=100/S=0 Data Points Probability Line
50.000
10.000 5.000
1.000 0.500 0.100 1.000
Fig. 5.29 Probability plot, normal distribution. ReliaSoft® software
2.000
3.000
4.000
ReliaSoft W eibull++ 7 - www.ReliaSoft.com
Unreliability vs Time Plot distribution
Reliability vs Time Plot distribution
1.000
1.000
Unreliability (Normal-2P) Folio1\Data 1 Normal-2P RRX SRM MED FM F=100/S=0 Data Points Unreliability Line
0.600
0.400
Reliability (Normal-2P) Folio1\Data 1 Normal-2P RRX SRM MED FM F=100/S=0 Data Points Reliability Line
0.800
Reliability, R(t)=1-F(t)
0.800
Unreliability, F(t)=1-R(t)
6.000
μ = 3.1380, σ = 0.4668, ρ = 0.9581
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
0.200
0.000 0.000
5.000
Time, (t)
0.600
0.400
0.200
1.000
2.000
3.000
4.000
0.000 0.000
5.000
1.000
2.000
3.000
4.000
μ = 3.1380, σ = 0.4668, ρ = 0.9581
μ = 3.1380, σ = 0.4668, ρ = 0.9581
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
Failure Rate vs Time Plot distribution
Probability Density Function distribution 0.900
20.000
Pdf (Normal-2P)
Failure Rate (Normal-2P)
Folio1\Data 1 Normal-2P RRX SRM MED FM F=100/S=0 Pdf Line
Folio1\Data 1 Normal-2P RRX SRM MED FM F=100/S=0 Failure Rate Line
0.720
Failure Rate, f(t)/R(t)
16.000
f(t)
0.540
0.360
0.180
0.000 0.200
12.000
8.000
4.000
1.360
2.520
3.680
4.840
6.000
0.000 0.200
1.360
2.520
3.680
Time, (t)
Time, (t) μ = 3.1380, σ = 0.4668, ρ = 0.9581
5.000
Time, (t)
Time, (t)
μ = 3.1380, σ = 0.4668, ρ = 0.9581
Fig. 5.30 F .t /, R.t /, f .t /, and (t ). Normal distribution. ReliaSoft® software
4.840
6.000
5.10 Parametric Probability Density Functions
109
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
Probability - Lognormal distribution
99.900
Probability-Lognormal (Lognormal-2P)
Unreliability, F(t)
Folio1\Data 1 Lognormal-2P RRX SRM MED FM F=100/S=0 Data Points Probability Line
50.000
10.000 5.000
1.000 0.500 0.100 1.000
Fig. 5.31 Probability plot, lognormal distribution. ReliaSoft® software
10.000
Time, (t)
μ = 1.1053, σ = 0.1429, ρ = 0.9907
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
Probability Density Function distribution
Failure Rate vs Time Plot distribution
1.000
7.000
Pdf (Lognormal-2P)
Failure Rate (Lognormal-2P)
Folio1\Data 1 Lognormal-2P RRX SRM MED FM F=100/S=0 Pdf Line
Folio1\Data 1 Lognormal-2P RRX SRM MED FM F=100/S=0 Failure Rate Line 5.600
Failure Rate. f(t)/R(t)
0.800
f (t)
0.600
0.400
0.200
4.200
2.800
1.400
0.000 0.000
2.000
4.000
Time, (t)
6.000
8.000
0.000 0.000
10.000
μ = 1.1053, σ = 0.1429, ρ = 0.9907
16.000
48.000
64.000
80.000
Reliability vs Time Plot distribution
Unreliability vs Time Plot distribution 1.000
Unreliability (Lognormal-2P)
0.800
Folio1\Data 1 Lognormal-2P RRX SRM MED FM F=100/S=0 Data Points Unreliability Line
Reliability, R(t)=1-F(t)
Unreliability, F(t)=1-R(t)
Time, (t)
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
0.600
0.400
1.000
Reliability (Lognormal-2P)
0.800
Folio1\Data 1 Lognormal-2P RRX SRM MED FM F=100/S=0 Data Points Reliability Line
0.600
0.400
0.200
0.200
0.000 0.000
32.000
μ = 1.1053, σ = 0.1429, ρ = 0.9907
1.000
μ = 1.1053, σ = 0.1429, ρ = 0.9907
2.000
Time, (t)
3.000
4.000
5.000
0.000 0.000
1.000
μ = 1.1053, σ = 0.1429, ρ = 0.9907
Fig. 5.32 F .t /, R.t /, f .t / and (t ). Lognormal distribution. ReliaSoft® software
2.000
Time, (t)
3.000
4.000
5.000
110
5.10.5 The Weibull Distribution This is a time-dependent failure model and one of the most useful parametric distributions in reliability engineering. The Weibull density function f .x/ is defined as follows: 8 0; where a is a scale parameter4 and b is a shape parameter5 . b is called a “shape parameter” because: • b < 1 implies infant mortality, i. e., high mortality of infants typical of both electronic and mechanical systems. This is why, before the products are delivered, several of the components are subject to acceptance tests known as “burn-in” and stress screening so that infant mortality is bypassed. Hazard rate declines with age. • b D 1 implies random failures, i. e., failure modes are “ageless” and the probability density function is an exponential in which D 1=a. • 1 < b < 4 implies early wear out. The cost of unplanned failure for this component is generally higher than the cost of planned failure. Consequently, there is an optimal replacement time that minimizes the global cost. • b 4 implies old age and rapid wear out. The probability density function is somewhat symmetrical and similar to a normal distribution. Typical failure modes are stress corrosion, material properties, erosions, etc. These components require inspection and corrective action. Waloddi Weibull (1887–1979) introduced the “B10” life, which is the age at which 10% of the “bearings” fail and can be directly read from the Weibull plot. For example, some manufacturers use B10 life for design requirements, some use lower values (e. g., B0.1 for serious failures or B0.01 for catastrophic failures). The cumulative function of the Weibull probability distribution is x b : (5.68) F .x/ D 1 exp a 4 5
Sometimes represented by ˛ Sometimes represented by ˇ
5 Basic Statistics and Introduction to Reliability
The mean function is C1 C1 Z Z M.x/ D Œxf .x/ dx D Œxf .x/ dx x>0
1
1 ; D a 1 C b
0
(5.69)
where .x/ is the gamma function defined as Z1 .X / D
y x1 ey dy:
(5.70)
0
Table 5.5 presents the value of the gamma function for different values of the variable x. From Eq. 5.25, function (x) is .x/ D
b x b1 : a a
(5.71)
Figures 5.33–5.39 illustrate the trend of the density function f .x/, cumulative function F .x/, and rate function (x) for different combinations of parameters a and b. In particular, Figs. 5.33–5.35 assume b D 1 and different values of a. Similarly Figs. 5.36–5.38 illustrate the obtained values for different shape parameters given a equal to 2. Figure 5.39 presents a zoom of Fig. 5.37 based on a different scale for the function (x/ 2 Œ0I 3 : The equations for reliability and maintainability in the case of a ttf or a ttr random variable distributed in accordance with a Weibull probability distribution are the following: " # 8 b ˆ 0; where T is the mission time defined on the ttf stochastic variable in agreement with the definition introduced in Eq. 5.19. Then, " # 8 b ˆ 0;
ln[ .x/]
2.968879 2.252713 1.827814 1.524064 1.288023 1.095798 0.934581 0.796678 0.677087 0.572365 0.480031 0.398234 0.325552 0.260867 0.203281 0.152060 0.106595 0.066376 0.030969 0.000000
x
0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00
19.470085 9.513508 6.220273 4.590844 3.625610 2.991569 2.546147 2.218160 1.968136 1.772454 1.616124 1.489192 1.384795 1.298055 1.225417 1.164230 1.112484 1.068629 1.031453 1.000000
.x/
Table 5.5 Gamma function
1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 1.55 1.60 1.65 1.70 1.75 1.80 1.85 1.90 1.95 2.00
x
.x/ 0.973504 0.951351 0.933041 0.918169 0.906402 0.897471 0.891151 0.887264 0.885661 0.886227 0.888868 0.893515 0.900117 0.908639 0.919063 0.931384 0.945611 0.961766 0.979881 1.000000
ln[ .x/] 0.026853 0.049872 0.069306 0.085374 0.098272 0.108175 0.115241 0.119613 0.121421 0.120782 0.117806 0.112592 0.105231 0.095808 0.084401 0.071084 0.055924 0.038984 0.020324 0.000000 2.05 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50 2.55 2.60 2.65 2.70 2.75 2.80 2.85 2.90 2.95 3.00
x 0.021937 0.045438 0.070456 0.096947 0.124872 0.154189 0.184864 0.216859 0.250143 0.284683 0.320449 0.357412 0.395545 0.434821 0.475215 0.516703 0.559262 0.602870 0.647505 0.693147
ln[ .x/] 1.022179 1.046486 1.072997 1.101802 1.133003 1.166712 1.203054 1.242169 1.284209 1.329340 1.377746 1.429625 1.485193 1.544686 1.608359 1.676491 1.749381 1.827355 1.910767 2.000000
.x/ 3.05 3.10 3.15 3.20 3.25 3.30 3.35 3.40 3.45 3.50 3.55 3.60 3.65 3.70 3.75 3.80 3.85 3.90 3.95 4.00
x 0.739777. 0.787375 0.835924 0.885405 0.935802 0.987099 1.039279 1.092328 1.146231 1.200974 1.256542 1.312923 1.370104 1.428072 1.486816 1.546322 1.606581 1.667580 1.729310 1.791759
ln[ .x/] 2.095468 2.197620 2.306944 2.423965 2.549257 2.683437 2.827178 2.981206 3.146312 3.323351 3.513252 3.717024 3.935761 4.170652 4.422988 4.694174 4.985735 5.299330 5.636763 6.000000
.x/
4.05 4.10 4.15 4.20 4.25 4.30 4.35 4.40 4.45 4.50 4.55 4.60 4.65 4.70 4.75 4.80 4.85 4.90 4.95 5.00
x
1.854918 1.918777 1.983326 2.048556 2.114457 2.181021 2.248239 2.316103 2.384605 2.453737 2.523490 2.593857 2.664831 2.736405 2.808571 2.881323 2.954654 3.028557 3.103026 3.178054
ln[ .x/]
6.391177 6.812623 7.266873 7.756690 8.285085 8.855343 9.471046 10.136102 10.854777 11.631728 12.472045 13.381286 14.365527 15.431412 16.586207 17.837862 19.195079 20.667386 22.265216 24.000000
.x/
5.10 Parametric Probability Density Functions 111
112
5 Basic Statistics and Introduction to Reliability Weibull F(x), a=2
Weibull f(x), b=1 2.5
1 0.8
1.5
a=4 a=2 a=1 a=0.5
1
F(x)
f(x)
2
b=4 b=2 b=1 b=0.5
0.6 0.4
0.5
0.2 0 0
2
4
6
8
10
0
x
0
Fig. 5.33 Weibull distribution, density function. b D 1
2
4
6
8
10
x
Weibull F(x), b = 1
Fig. 5.37 Weibull distribution, cumulative function. a D 2
1.2 1 0.8 F(x)
a=4
0.6
a=2 a=1
0.4
a=0.5
0.2 0 0
1
2
3
4
5
6
7
8
9
10
x
Fig. 5.34 Weibull distribution, cumulative function. b D 1
Fig. 5.38 Weibull distribution, (x). a D 2
Weibull λ(x), b=1 2.5
f(x)/[1-F(x)]
2 a=4 a=2 a=1 a=0.5
1.5 1 0.5 0 0
2
4
6
8
10
x
Fig. 5.35 Weibull distribution, failure rate (x). b D 1
Fig. 5.39 Weibull distribution, (x). a D 2, zoom
where T is the mission repair time defined on the ttr random variable in agreement with Eq. 5.48 for repairable components.
5.10.6 Weibull Distribution. Numerical Example Fig. 5.36 Weibull distribution, density function. a D 2
This section presents the statistical evaluation of the reliability parameters for the random ttf values intro-
5.11 Repairable Components/Systems: The Renewal Process and Availability A(t)
duced in Sect. 5.4, assuming a Weibull parametric distribution of values. In particular, Fig. 5.40 presents the Weibull probability plot, while Fig. 5.41 presents F .t/, R.t/, estimated f .t/, and (t). From a qualitative point of view, the graphical trends obtained for f .t/, F .t/, and R.t/ seem to be similar to those previously illustrated, assuming a normal or a lognormal distribution of random values. But their failure rate trends and values differ very much. This justifies the importance of the parametric reliability evaluation process discussed in the next chapter.
5.11 Repairable Components/Systems: The Renewal Process and Availability A(t) The first group of definitions, models, and properties previously discussed refer to “nonrepairable” components and the second group refer to “repairable” components in the stochastic repair process. This section introduces useful new definitions and models to characterize repairable components/systems subject to function, failure, and repair (FFR) cycles as illustrated in Fig. 5.1. A very important measurement of reliability for repairable components is the nonconditional hazard rate w(t) defined for the range t 2 Œt0 D 0; C1/. In fact, a generic repairable entity is subject to FFR cycles. Consequently, a nonconditional hazard rate f .t/ as introduced for nonrepairable entities (see Sects. 5.7 and 5.8) cannot be identified. f .t/ is the density function of the unique random variable ttf defined for a nonrepairable component subject to a degradation process to failure. In other words, while the failure event is unique for nonrepairable items, a repairable component exposed to FFR cycles is subject to several degradation processes to failure during its life cycle, starting from the point in time t0 D 0 as illustrated in Fig. 5.1. In particular, for a given repairable component which starts to function in t0 D 0, w.t/ quantifies the rate, i. e., the velocity, to failure at time point t as follows: P .t ttQf t C dtncomponent is in state of function in t D t0 / D w.t/ dt;
(5.74)
113
where ttQf is a random variable defined in the range Œt0 D 0; C1Œ. From Eq. 5.74, w.t/ measures the probability of the repairable component failing in the range Œt; t C dt. The variable ttQf differs from the traditional time to failure variable ttf introduced in Sect. 5.4. Moreover, ttf can also be defined for a repairable component subject to FFR cycles: it represents the period of time from a generic starting point time ti and the following time t when a failure occurs. ti can be equal to t0 or immediately follow the conclusion of the restoration process of a repaired item. Consequently, a set of different time to failure random variables can be defined for a repairable item, which strongly depends on the operating conditions during the generic cycle and the state of function and health of the component after the previous restoration. Similarly, different time to repair random variables can be defined for a repairable item subject to FFR cycles. The generic stochastic repair process obviously depends upon the state of failure and the operating conditions under the repair activity. The ttQf variable is used in the following chapters to demonstrate theoretical and analytical relationships, and in practice is substituted by a set of ttf and ttr random variables defined according to the previous assumptions. The following measurement of reliability is called “expected number of failures” (ENF) and quantifies the number of failures in a period of time T D Œt1 ; t2 for a repairable component/system subject to FFR cycles: Zt2 W .t1 ; t2 / D w.t/ dt : (5.75) t1
Figure 5.42 illustrates the trend of the ENF for the generic period of time [0,t], distinguishing a repairable from a nonrepairable component. The ENF for a nonrepairable item corresponds to the failure probability F .t/, i. e., the cumulative of the density function f .t/ defined for the ttf random variable. The availability is one of the most significant statistical measures defined for a repairable component subject to FFR cycles. The system operates until it fails, after which it is repaired and returned in its original operating condition. This is the so-called renewal process, which is a sequence of independent and not negative random variables. A renewal occurs when a unit
114
5 Basic Statistics and Introduction to Reliability ReliaSoft Weibull++ 7 - www.ReliaSoft.com
Probability - Weibull distribution
99.900
Probability-Weibull Folio1\Data 1 Weibull-2P RRX SRM MED FM F=100/S=0 Data Points Probability Line
90.000
Unreliability, F(t)
50.000
10.000 5.000
1.000 0.500
0.100 1.000
Fig. 5.40 Probability plot, Weibull distribution. ReliaSoft® software
β = 8.8740, η = 3.2201, ρ = 0.9885
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
Unreliability vs Time Plot Weibull
Reliability vs Time Plot - Weibull distribution
1.000
Unreliability
0.800
Folio1\Data 1 Weibull-2P RRX SRM MED FM F=100/S=0 Data Points Unreliability Line
Reliability, R(t)=1-F(t)
Unreliability, F(t)=1-R(t)
10.000
Time, (t)
0.600
0.400
1.000
Reliability
0.800
Folio1\Data 1 Weibull-2P RRX SRM MED FM F=100/S=0 Data Points Reliability Line
0.600
0.400
0.200
0.200
0.000 0.000
1.000
2.000
3.000
4.000
0.000 0.000
5.000
1.000
2.000
3.000
4.000
β = 8.8740, η = 3.2201, ρ = 0.9885
β = 8.8740, η = 3.2201, ρ = 0.9885
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
Probability Density Function f(t) - Weibull distribution 2.000
Failure Rate vs Time Plot 20000.000
Pdf
Failure Rate
Folio1\Data 1 Weibull-2P RRX SRM MED FM F=100/S=0 Pdf Line
Folio1\Data 1 Weibull-2P RRX SRM MED FM F=100/S=0 Failure Rate Line
1.600
Failure Rate, f(t)/R(t)
16000.000
f(t)
1.200
0.800
0.400
0.000 0.000
12000.000
8000.000
4000.000
2.000
4.000
6.000
8.000
10.000
0.000 0.000
Time, (t) β = 8.8740, η = 3.2201, ρ = 0.9885
5.000
Time, (t)
Time, (t)
2.000
4.000
6.000
Time, (t) β = 8.8740, η = 3.2201, ρ = 0.9885
Fig. 5.41 F .t /, R.t /, f .t /, and (t ). Weibull distribution. ReliaSoft® software
8.000
10.000
5.11 Repairable Components/Systems: The Renewal Process and Availability A(t)
115
The availability can be also defined over an interval of time T D t t0 as follows:
W(0,t) repairable
N /D 1 A.T T
ZT A.t/ dt ;
(5.78)
t0
1
W(0,t) not repairable= F(t)
t
where A.t/ is the point availability in t. The availability in Eq. 5.78 is the so-called mean availability. The steady-state availability is the following: A.1/ D lim A.t/;
Fig. 5.42 Expected number of failures for repairable and nonrepairable components
t !1
(5.79)
where A.t/ is the point availability in t. Other definitions of availability refer to the following very simplified expression: fails and is restored to work. Its availability is the probability that it is performing the required function at a given point of time t when it is operating and maintained under specific conditions. In other words, A.t/ measures the capability of the component/system as the probability of being operational at a given time t: A.t/ D P .component is operating in time t/: (5.76) The literature presents several definitions of availability that mainly depend on which types of downtimes are chosen for analysis. In particular, the instantaneous or point availability A.t/ is the probability that a component/system is operational at any random time t. In other words, it is the sum of two contributions: 1. R.t/, the reliability of the component/system; Zt R.t x/m.x/ dx; 2. 0
where m.x/ is the renewal density function of the system because the repairable component has a failure distribution and a repair distribution.6 The point availability is Zt R.t x/m.x/ dx:
A.t/ D R.t/ C
(5.77)
0
6
The so-called renewal theory is properly discussed in Chap. 9.
AD
UT ; UT C DT
(5.80)
where UT is the component/system uptime and DT is the component/system downtime. In particular, Eq. 5.80 can be quantified by only assuming the corrective downtime for DT, or the total amount of downtime (corrective, preventive, inspection, etc.). The unavailability Q.t/ is the complementary function of A.t/, and a statistical measure of nonoperability of the component system at t: Q.t/ D 1 A.t/:
(5.81)
Given a repairable component subject to FFR cycles and assuming constant hazard and repair rates, i. e., .t/ D and .t/ D , the simplified expressions of availability and unavailability are A.t/ D
C e.C/t ; C C
(5.82)
Q.t/ D
1 e.C/t ; C
(5.83)
where is the constant hazard rate and is the constant repair rate. The demonstration of Eqs. 5.82 and 5.83 now follows. In agreement with the previously introduced two-state diagram (see Fig. 5.2), it is assumed the state of function of a generic repairable component is 0 and the state of nonfunction is 1.
116
5 Basic Statistics and Introduction to Reliability
Then the following notation that defines four basic events is assumed: E0 .t/ the component is functioning at time t; E1 .t/ the component is not functioning at time t; Ef .t/ the component fails at time [t,t C dt]; Er .t/ the component is repaired in time [t,t C dt]. Consequently, the probability associated with event E0 .t/ is the availability of the component at time point t: A.t/. Finally, two further definitions are: (t) failure rate of the component in t; (t) repair rate of the component in t: Using these definitions, one obtains the following basic equation: E0 .t C t/ D E0 .t/EN f .t/ C E1 .t/Er .t/ D E0 .t/ Œ1 .t/t C E1 .t/.t/t: (5.84) From Eq. 5.84, 8 dE0 .t/ E0 .t C t/ E0 .t/ ˆ ˆ D lim ˆ ˆ t !0 dt t ˆ ˆ ˆ ˆ ˆ E0 .t/ Œ1 .t/t ˆ ˆ D lim ˆ ˆ ˆ t !0 t ˆ ˆ < E1 .t/.t/t E0 .t/ C ˆ ˆ t ˆ ˆ ˆ ˆ .t/ D .t/t E ˆ f ˆ ˆ ˆ ˆ ˆ Er .t/ D .t/t ˆ ˆ ˆ ˆ :E .t/ D 1 E .t/: 1 0 (5.85) The following derivative equation is obtained: .t/t A.t/.t/t A.t/.t/t dA.t/ D lim t !0 dt t D .t/ A.t/ Œ.t/ C .t/ : (5.86) Now assuming failure and repair rates to be constant, A.t / Z
A.0/D1
dA.t/ D A.t/ . C /
Zt dt D t;
(5.87)
0
ˇ ˇA.t / ˇ ˇ ˇ 1 ln Œ A.t/ . C /ˇ D t: ˇ C ˇ 1
(5.88)
Solving Eq. 5.88, one obtains 1 1 ln Œ A.t/ . C / C ln./ D t: C C (5.89) Applying the properties of logarithms, one gets
Œ A.t/. C / exp ln
D exp Œt. C / ; (5.90)
A.t/ D
C e.C/t ; C C
(5.91)
thus demonstrating Eq. 5.82. The same result can be obtained by applying the Markov chains technique, as illustrated in Chap. 8, which discusses reliability models for dependent components/systems. The asymptotic values of availability and unavailability are .C/t C e A.1/ D lim t !1 C C MTTF D ; (5.92) D C MTTR C MTTF
i h 1 e.C/t t !1 C MTTR D : D C MTTF C MTTR
Q.1/ D lim
(5.93)
Table 5.6 reports the main definitions and properties related to repairable components subject to failure and repair processes. The second column includes the results obtained assuming an infinite MTTR, i. e., a repair rate equal to 0: although the repairable component becomes a nonrepairable item, the analytical models do not change. In particular, 8 ˆ
5.12 Applications and Case Studies
117
Table 5.6 Stochastic failure and repair processes for repairable components/systems Repairable component A.t / D
Nonrepairable component
w.t / C e.C/t D C C
A.t / D et D R.t / A.1/ D 0
MTTF D C MTTR C MTTF
1 e.C/t Q.t / D C
A.1/ D
Q.1/ D w.t / D
Q.t / D 1 et D F .t / Q.1/ D 1
MTTR D C MTTF C MTTR
2 C e.C/t D A.t / C C
w.t / D et D f .t / w.1/ D 0
1 D C MTTF C MTTR Zt W .0; t / D w.t / dt
w.1/ D
W .0; t / D 1 et D F .t /
0
Constant (t ) and (t )
justifies the following analytical relationship, assuming constant failure and repair rates: w.t/ C e.C/t D : (5.95) A.t/ D C C In general, A.t/ D
w.t/ ; .t/
(5.96)
where A.t/ is the availability of the repairable component, w.t/ is the nonconditional failure rate referred to t0 D 0, and (t) is the conditional failure rate at time point t for the repairable component. The condition is that the component is in a state of function at t. The major problem with the practical application of the previously illustrated analytical models is the availability of time-dependent quantities e. g., w.t/ and (t) for repairable components. Consequently, they are usually simplified by the assumption of constant hazard and repair rates, as illustrated in several applications explained in the following chapters. Nevertheless, Chap. 9 briefly discusses the so-called renewal process in order to model several stochastic processes of failure and repair in sequence, assuming independent random variables.
5.12 Applications and Case Studies This section presents two significant numerical examples that concern an industrial case study. In particular, the first discusses nonrepairable systems, the second repairable components. Both applications help the reader to understand and apply the previously illustrated analytical models. Some basic tools for supporting the reliability evaluation and analysis of simple components/systems are introduced in this section: histograms, probability plots, nonparametric estimations, etc. The following chapter discusses and illustrates these very useful tools of statistical analysis and reliability evaluation for nonrepairable and repairable production systems composed of simple and complex combinations of basic components.
5.12.1 Application 1 – Nonrepairable Components The manager of a leading mechanical company producing gearboxes for industrial applications wants to
118
5 Basic Statistics and Introduction to Reliability
Table 5.7 Data collection: ttf ( 100 h) ttf1
ttf2
ttf3
ttf4
2.3 2.7 3.4 3.5 4.0 3.0 3.2 3.0 3.1 3.9 3.2 3.1 2.8 2.6 2.7 2.7 2.8 2.9 3.1 3.3 3.3 3.7 2.7 2.8 2.9
3.4 2.3 2.9 3.0 2.8 2.7 3.4 3.7 3.2 3.8 3.3 3.3 3.0 3.1 3.2 3.4 2.9 2.9 3.0 2.0 3.3 3.7 2.7 3.5 3.5
2.9 2.8 2.8 3.2 2.9 2.2 3.5 3.6 3.2 3.7 2.9 2.3 2.5 2.6 2.7 2.9 2.8 3.1 3.1 3.2 3.2 2.9 2.2 3.5 3.6
3.0 3.1 3.3 3.3 3.7 2.7 3.5 3.5 3.3 3.5 2.9 2.6 2.5 2.4 2.3 2.6 2.6 2.4 3.3 3.2 4.0 3.0 3.2 3.0 3.1
quantify the reliability parameters describing the failure behavior of a nonrepairable and basic component of a family of products: in this case a gearing chain.
Consequently, the manager organizes a destructive test conducted during four different periods of time involving a total of 100 product units. Table 5.7 reports the ttf values of all 100 units tested, expressed in hundreds of hours. The values in Table 5.7 are grouped into four different samples (ttf1 , ..., ttf4 / composed of 25 units. Each sample refers to the complete analysis, i. e., without censored data (see Chap. 6), conducted during a specific period of time: 400 h. All of the elements involved are subject to the same operating conditions, defined by the specifications required by the most important customer. Initial analysis of the data identifies the best fitting statistical distribution. Consequently, it might be useful to analyze histograms of the time to failure. In particular, Fig. 5.43 illustrates the histogram of ttf1 , ttf2 , ttf3 , and ttf4 values distribution, while Fig. 5.44 exemplifies the cumulated frequency diagram for the sample 3 (ttf3 /. Identifying the best statistical distribution of the variable ttf makes it possible to predict the failure behavior of the component analyzed (the gearing chain). The fit analysis illustrated in this application was carried out using Minitab® Statistical Software. In particular, Fig. 5.45 displays the probability plot of estimated cumulative probabilities p versus the nondeterministic data after both variables have undergone linear transformation. Linearity makes it is possible to
Histogram of ttf1 ; ttf2 ; ttf3 ; ttf4
Frequency
ttf 1
ttf 2
8
6.0
6
4.5
4
3.0
2
1.5 0.0
0 2.4
2.8
3.2
3.6
4.0
2.0
2.4
2.8
ttf 3
3.2
3.6
ttf 4
6.0
4
4.5
3
3.0
2
1.5
1
0.0
0 2.4
2.8
3.2
3.6
Fig. 5.43 Histograms of ttf values, ttf1 , ..., ttf4
2.4
2.8
3.2
3.6
4.0
5.12 Applications and Case Studies
119
Cumulative Frequency
25 20 15 10 5 0 2.2
2.4
2.6
2.8
3.0
3.2
3.4
3.6
3.8
ttf3
Fig. 5.44 Histogram of cumulative frequency ttf values. Sample 3 (ttf3 /
use the degree of linear fit to identify the statistical distribution of values (e. g., Weibull, exponential, normal, or lognormal) which best fits the available data.
The set of parameters that best fits the available data is identified for each statistical distribution type according to the maximum likelihood estimation. A probability plot can also be used to perform an analysis based on a selected probability distribution, as illustrated for Weibull and normal distributions in Figs. 5.46 and 5.47, respectively, where maximum likelihood parameter estimates for the selected distribution are also produced. In order to compare the ability of each statistical distribution to fit the available data, the goodness-of-fit statistic for the maximum likelihood introduced by Anderson and Darling (D’Augostino and Stephens 1986) needs to be calculated. The Anderson–Darling statistic measures how far from the fitted line the plot points are located on a graph. The smallest value of this statistic identifies the statistical distribution that best fits the data. As a result, the Weibull and normal distributions in Fig. 5.45 are the best fitting statistical distributions. In particular, the Anderson–Darling statistics
Four-way probability plot for ttf1-ttf4 ML Estimates – Complete Data
Lognormal base e
Percent
Percent
Weibull 99 95 90 80 70 60 50 40 30 20 10 5 3 2
99
ttf1
95
ttf2 ttf3
80 70 60 50 40 30 20
ttf4
Anderson–Darling (adj) Weibull
10 5
1.208; 0.638; 0.845; 0.739 Lognormal base e
1
1
0.746; 0.969; 0.833; 0.856 2
3
4
2
3
4
Exponential
Normal
99
99
98 97
95
95
80 70 60 50 40 30 20
Percent
Percent
Exponential
90 80 70 60 50 30 10
8.893; 8.878; 8.643; 8.568 Normal 0.899; 0.739; 0.777; 0.770
10 5 1
0
5
10
15
2
Fig. 5.45 Probability plot of ttf. Minitab® Statistical Software
3
4
120
5 Basic Statistics and Introduction to Reliability
Overview Plot for ttf1-ttf4 ML Estimates – Complete Data
Probability Density Function
Weibull Probability ttf1 ttf2 ttf3 ttf4
99
Percent
1.0
0.5
95 90 80 70 60 50 40 30
Shape
Scale
AD*
F/C
20
7.6324
3.2498
1.208
25/0
10
9.1912
3.2939
0.638
25/0
5
8.1008
3.1521
0.845
25/0
7.6562
3.2290
0.739
25/0
1
0.0 2
3
2
4
Survival Function
3
4
Hazard Function ttf1 ttf2 ttf3 ttf4
1.0 10
0.9 0.8
0.6
Rate
Probability
0.7
0.5
5
0.4 0.3 0.2 0.1 0
0.0 2
3
4
2
3
4
Fig. 5.46 Weibull distribution. Parameter estimation. Minitab® Statistical Software
for the normal distribution and the set of four samples are 0.899, 0.739, 0.777, and 0.770. Figures 5.46 and 5.47, respectively, refer to the Weibull and normal distributions. These figures also present the estimated probability density function f .t/, the survival function R.t/, and the hazard function (t) according to the estimated parameters of the statistical distributions. Figure 5.48 presents the survival function and the hazard function obtained by the application of the Kaplan–Meier nonparametric estimation method. The advantage of this method is that it is not based on any hypotheses of statistical distribution of data. It might also be helpful to quantify R.t/ and (t) according to the number of failures in t, i. e., Nf .t/, and the number of functioning (i. e., healthy) elements in t, i. e., Nh .t/, and compare the results obtained with those estimated using the Kaplan–Meier method. For this purpose Table 5.8 collects the failure time values for the components of sample 3 (ttf3 without censored data). In order to simplify the calculus of the reliabil-
Table 5.8 Sample 3 ttf values ttf3 2.2 2.2 2.3 2.5 2.6
2.7 2.8 2.8 2.8 2.9
2.9 2.9 2.9 2.9 3.1
3.1 3.2 3.2 3.2 3.2
3.5 3.5 3.6 3.6 3.7
ity parameters, the ttf values are initially ordered in ascending values. The following equation quantifies the reliability value for a time period of 300 h (i. e., t D 3): R.t D 3/ D
11 Nh .t/ D D 0:44; N 25
where Nh .t/ is the number of functioning components at t and N is the number of functioning components at t D 0 (i. e., at the beginning of the test).
5.12 Applications and Case Studies
121
Overview Plot for ttf1-ttf4 ML Estimates – Complete Data
Probability Density Function
Normal Probability
1.0
ttf1 ttf2 ttf3 ttf4
99 95 90
Percent
80
0.5
20 10 5
2
3
2
4
3
Survival Function
25/0
3.120
0.4167
0.739
25/0
2.972
0.4152
0.777
25/0
3.040
0.4336
0.770
25/0
Hazard Function
0.9
ttf1 ttf2 ttf3 ttf4
6
0.8 5
0.7 0.6
4
0.5
Rate
Probability
0.899
4
7
1.0
0.4 0.3
3 2
0.2 1 0.1 0.0
0 2
3
4
2
3
4
Fig. 5.47 Normal distribution. Parameter estimation. Minitab® Statistical Software
Overview Plot for ttf1-ttf4 Kaplan–Meier Method – Complete Data Kaplan-Meier Survival Function 1.0
Probability
F/C
0.4017
1
0.0
ttf1 ttf2 ttf3 ttf4
0.5
0.0 0
1
2
3
4
Nonparametric Hazard Function 1.0
Rate
AD*
3.068
Location Scale
70 60 50 40 30
0.5
0.0 0
1
2
3
4 ®
Fig. 5.48 Kaplan–Meier nonparametric estimation of R.t / and (t ). Minitab Statistical Software
122
5 Basic Statistics and Introduction to Reliability
Nf .t C t/ Nf .t/ t !0 Nh .t/t 11 Nf .4/ Nf .3/ D D D 1; t D1 Nh .3/ 1 11
.t D 3/ D lim
where Nf .t C t/ Nf .t/ is the absolute number of failures in ]t, t C t] and t is assumed to be equal to 1. In particular, this value of the hazard rate is the mean value in ]t, t+ t]. If t D 0:6, then 8 Nf f3; 3:6Œg D D 1:2; t D0:6 Nh .3/ 0:6 11 0:6
.t D 3/ D
where Nf ft; t C tŒg is the number of failures in ]t, t C t[, or .t D 3/ D
t D0:6
10 Nf f3; 3:6g D Š 1:52; Nh .3/ 0:6 11 0:6
where Nf ft; t C tg is the number of failures in ]t, t C t]: This second value of the hazard rate is more correct because Nf .t C t/ Nf .t/ D Nf ft; t C tg : By applying these equations to a larger number of elements, such as the total number of components in the samples, one can adopt shorter values of t which approximate dt more accurately. The effect of applying these models on the total number of elements (whose cumulative frequency values are shown in Fig. 5.49) is R.t D 3/ D
N Nf .t/ Nh .t/ D N N
100 50 D 0:50; D 100 Nf .3:5/ Nf .3/ .t D 3/ D t D0:5 Nh .3/ 0:5 D
Nf f3; 3.5]g 39 D D 1:56; Nh .3/ 0:5 50 0:5
100
Cumulative Frequency
This value agrees with the values in Fig. 5.48 obtained by using Kaplan–Meier estimation. The hazard rate at t D 3 is based on the following:
50
0 2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8 4.0
TTF Fig. 5.49 Cumulative frequency of ttf for the total number of components
Nf f3; 3.5]g Nf .3:5/ Nf .3/ D t D0:5 N 0:5 N 0:5 39 D 0:78: D 100 0:5 In fact, the failure rate at t D 3 can also be quantified by the following: f .t D 3/ D
.t D 3/ D
f .3/ 0:78 D D 1:56: R.3/ t D0:5 0:50
Assuming t D 0:2, Nf .3:2/ Nf .3/ t D0:2 N 0:2
f .t D 3/ D D
Nf f3; 3.2]g 18 D 0:9; N 0:2 100 0:2
which is close to the previously quantified value (i. e., 0.78). The problem of approximating t is solved by applying nonparametric models, so the Kaplan–Meier results are illustrated in Fig. 5.48.
5.12.2 Application 2 – Repairable System An important Italian manufacturing company for hydraulic pumps works every day approximately 100 different parts on CNC machines. The set of machining tools has to be continuously monitored in order to guarantee the required level of quality. The behavior of a specific tool was observed during a 23 weeks-long
5.12 Applications and Case Studies
123 Table 5.10 Time to failure (ttf) and time to repair (ttr)
state [UP/DOWN]
State diagram – Component 1 and 2 Component 1 Component 2 UP2
UP1
DOWN 0
2
4
6
8
10
12
14
16
18
20
time [weeks]
Fig. 5.50 State diagram. Components 1 and 2
Component
ttf1
ttr1
ttf2
ttr2
ttf3
ttr3
ttf4
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
5.5 5.2 6.0 4.9 3.0 2.9 5.1 4.8 5.4 6.3 7.1 8.0 2.9 4.0 5.0
0.5 0.3 0.4 0.32 0.6 0.9 1.2 2.0 2.1 1.6 1.1 0.4 0.7 0.8 0.9
2.9 5.1 4.8 5.4 6.3 7.1 4.1 5.3 3.5 3.6 2.6 3.8 5.0 3.2 4.0
0.9 2.1 1.6 1.1 0.4 0.5 0.5 0.4 0.5 0.3 0.4 0.32 1.1 0.4 0.7
2.9 4.0 5.0 5.2 6.0 4.9 3.0 2.9 7.1 3.5 3.6 2.6 3.8 4.1 3.9
1.1 2.1 1.6 1.1 0.4 0.7 0.6 1.2
4.8
0.6 0.9 1.2 2.0 2.1 1.6
3.0 2.9 5.1 4.8 5.4 3.2
4.1 5.3 3.5 3.6
The 4th failure is definitive.
activity of a CNC machine by registering the instance of failure tfailure and the repair time trepair for a set of 15 units of that tool. These values are collected in Table 5.9, assuming t0 D 0 as starting operating time for the whole set of 15 tools. The tools are “as good as new” at t0 . Figure 5.50 illustrates the state diagram for components 1 and 2: “up” means the component is functioning correctly; “down” refers to the nonfunction of the component, i. e., it is “under repair.” The data reported in Table 5.10 were obtained from Table 5.9 and refer to ttf and ttr, which start and finish during the 23-week time window of the analysis.
In particular, Figs. 5.51 and 5.52, respectively, illustrate the frequency distribution of ttf1 (first set of ttf defined for the components) and ttr1 (first set of ttr). Cumulative frequency values for the amount of time to failure (called “global ttf”) and time to repair (called “global ttr”) are reported in Figs. 5.53 and 5.54, respectively. These histograms are useful in helping to determine some important parameters in failure and repair behaviors in the case of the “as good as new” hypothesis for the generic component following the repair activity.
Table 5.9 Function–failure–repair cycles for 15 tools. Unit of time, week Component
tfailure (1)
trepair (1)
tfailure (2)
trepair (2)
tfailure (3)
trepair (3)
tfailure (4)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
5.5 5.2 6.0 4.9 3.0 2.9 5.1 4.8 5.4 6.3 7.1 8.0 2.9 4.0 5.0
6.0 5.5 6.4 5.22 3.6 3.8 6.3 6.8 7.5 7.9 8.2 8.4 3.6 4.8 5.9
8.9 10.6 11.2 10.62 9.9 10.9 10.4 12.1 11.0 11.5 10.8 12.2 8.6 8.0 9.9
9.8 12.7 12.8 11.72 10.3 11.4 10.9 12.5 11.5 11.8 11.2 12.52 9.7 8.4 10.6
12.7 16.7 17.8 16.92 16.3 16.3 13.9 15.4 18.6 15.3 14.8 15.12 13.5 12.5 14.5
13.8 18.8 19.4 18.02 16.7 17.0 14.5 16.6
18.6
15.9 15.7 16.32 15.5 14.6 16.1
18.9 18.6 21.42 20.3 20.0 19.3
The 4th failure is definitive.
20.8 22.3 18.0 20.2
5 Basic Statistics and Introduction to Reliability 5
60
4
50
Cumulative Frequency
Frequency
124
3 2 1 0 3.0
3.5
4.0
4.5
5.0
5.5
6.0
6.5
7.0
7.5
40 30 20 10 0
8.0
TTF1
2
3
4
5
6
7
8
TTF "global"
Fig. 5.51 Frequency distribution of first time to failure (ttf1 )
Fig. 5.53 Cumulative frequency values for all ttf 3
45
Cumulative Frequency
Frequency
40 2
1
0 0.3
0.5
0.7
0.9
1.1
1.3
1.5
1.7
1.9
2.1
TTR1 Fig. 5.52 Frequency distribution of first time to repair (ttr1 /
35 30 25 20 15 10 5 0 0.3 0.5 0.7 0.9 1.1 1.3 1.5 1.7 1.9 2.1
TTR "global" Fig. 5.54 Cumulative frequency values for all ttr
5.12.2.1 Failure Process Analysis. As Good as New Hypothesis From the so-called as good as new hypothesis, the set of 15 repairable components corresponds to 56 nonrepairable elements as components 2, 3, 4 are still working and 9 still under repair at the end of the week 23. All components start to function in t0 D 0 according to the same set of operating conditions. Table 5.11 reports the total number of ttf by ascending values (“global” ttf). As a result, reliability (i. e., the survival function in Fig. 5.55) and the failure rate (i. e., the hazard function in Fig. 5.55) can be quantified when the components are in a state of function. For example, considering a 4-week period of time, the value of reliability is R.t D 4/ D
56 22 N Nf .t/ D Š 0:607: N 56
Similarly for t D 5 weeks, R.t D 5/ D
56 34 N Nf .t/ D Š 0:393: N 56
Assuming that all 15 units should be “as bad as first failure”, that is in the same state of failure after they fail (similarly to the “as good as new” hypothesis considered in the previous analysis of the failure process), an mount of 44 elements under repair is derived. Figure 5.56 illustrates the distribution overview plot assuming the hypothesis of a lognormal distribution of data for repairs (see the probability density function in Fig. 5.56). In particular, the survival function in Fig. 5.56 corresponds to 1 G.T /;
5.12 Applications and Case Studies
125
Table 5.11 Time to failure according to the “as good as new” hypothesis
cess of failure and each process of repair, which are all statistically independent, i. e., without assuming the as good as new (failure processes) and the as bad as the first failure (repair processes) hypotheses.
“Global” ttf 2.6 2.6 2.9 2.9 2.9 2.9 2.9 2.9 3.0 3.0 3.0 3.2 3.2 3.5
3.5 3.5 3.6 3.6 3.6 3.8 3.8 3.9 4.0 4.0 4.0 4.1 4.1 4.1
4.8 4.8 4.8 4.8 4.9 4.9 5.0 5.0 5.0 5.1 5.1 5.1 5.2 5.2
5.3 5.3 5.4 5.4 5.4 5.5 6.0 6.0 6.3 6.3 7.1 7.1 7.1 8.0
5.12.2.2 Failure Process Analysis Without Assuming the “as Good as New” Hypothesis Figure 5.57 illustrates the overview plot analysis, conducted separately, for the four different sets of failure times (ttf) related to the repetitive failures of the 15 components and assuming the parametric Weibull distribution. The pool of components is subject to progressive degradation by the reduction of MTTF values and the increase of hazard functions. Consequently, the assumption of the previously illustrated “as good as new” hypothesis is not correct.
where G.T / is the maintainability for the time interval T D t t0 , where t0 is the repair starting time. The so-called “hazard function” in Fig. 5.56 corresponds to the repair rate (t) that quantifies the velocity of the component to be repaired after a specific failure. The Anderson–Darling goodness-of-fit parameter shows that the lognormal distribution fits the available data correctly. The following sections study the failure and repair processes, distinguishing each pro-
5.12.2.3 Repair Process Analysis Without Assuming the “as Bad as First Failure” Hypothesis Figure 5.58 illustrates the overview plot analysis, conducted separately, for the four different sets of repair times (ttr) related to the repetitive activities of repair (i. e., repair cycles) on the 15 components and assuming a lognormal distribution of data.
Kaplan–Meier Survival Function
Probability
1.0
0.5
0.0 0
1
2
3
4
5
6
7
8
6
7
8
Nonparametric Hazard Function
Rate
1.0
Fig. 5.55 R.t / and (t ), nonparametric analysis. “As good as new” hypothesis
0.5
0.0 0
1
2
3
4
5
126
5 Basic Statistics and Introduction to Reliability
Time To Repair Analysis ML Estimates – Complete Data Lognormal base e Probability
Probability Density Function 1.0
99
0.9 95
0.8
90 0.7
Percent
0.6 0.5 0.4 0.3
80 70 60 50 40 30 20
-0.2242 0.613
MTTF
0.9644
Failure Censor
45 0
Goodness of Fit AD*
10 5
0.2
Location Scale
1.046
0.1 1
0.0 0
1
2
0.1
3
1.0
μ(t)
1-G(t) 1.0
1.7
0.9 0.8 0.7
Rate
Probability
1.2 0.6 0.5 0.4 0.7
0.3 0.2 0.1 0.0
0.2 0
1
2
3
0
1
2
3
Fig. 5.56 Repair process analysis. Lognormal distribution. Minitab® Statistical Software
Failures Analysis ML Estimates – Complete Data Weibull Probability
Probability Density Function
TTF1 TTF2 TTF3 TTF4
0.45 0.40 0.35 0.30
Percent
0.25 0.20 0.15
99 95 90 80 70 60 50 40 30
Shape
1.018
10
3.8780 4.9134
0.938
3.5483 4.6215
1.111
5.2899 4.5241
1.380
0.05 1
0.00 1
2
3
4
5
6
7
1
8
10
Survival Function
Hazard Function
1.0
TTF1 TTF2 TTF3 TTF4
4
0.9 0.8 3
0.6 0.5
Rate
Probability
0.7
0.4
2
0.3 1 0.2 0.1 0
0.0 1
2
3
4
5
6
7
8
AD*
3.9406 5.6039
5
0.10
Scale
20
1
2
®
3
4
5
6
Fig. 5.57 Failure analysis (ttf1 , ..., ttf4 /. Minitab Statistical Software
7
8
5.12 Applications and Case Studies
127
Repairs Analysis ML Estimates – Complete Data Lognormal base e Probability
Probability Density Function
TTR1 TTR2 TTR3
99 95 1.0
Percent
90
0.5
80 70 60 50 40 30 20
Location Scale
AD*
-0.2659 0.6111
0.902
-0.4765 0.5814
1.346
0.0909 0.4994
1.066
10 5 1
0.0 0.0
0.5
1.0
1.5
2.0
2.5
3.0
0.1
3.5
1.0
μ(t)
1-G(t)
TTR1 TTR2 TTR3
1.0 0.9 2
0.8
0.6 0.5
Rate
Probability
0.7
0.4
1
0.3 0.2 0.1 0.0
0 0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
®
Fig. 5.58 Repair analysis (ttr1 , ..., ttr4 /. Minitab Statistical Software Table 5.12 Calculus for failure rates and ENF(T ) Component
tfailure(1)
trepair(1)
tfailure(2)
trepair(2)
tfailure(3)
t D5
1 2 3 4.9 5 6 7 8 9 10 11 12 13 14 15
5.5 5.2 6.0 5.22 3.0 2.9 5.1 4.8 5.4 6.3 7.1 8.0 2.9 4.0 5.0
6.0 5.5 6.4 10.62 3.6 3.8 6.3 6.8 7.5 7.9 8.2 8.4 3.6 4.8 5.9
8.9 10.6 11.2 11.72 9.9 10.9 10.4 12.1 11.0 11.5 10.8 12.2 8.6 8.0 9.9
9.8 12.7 12.8 16.92 10.3 11.4 10.9 12.5 11.5 11.8 11.2 12.52 9.7 8.4 10.6
12.7 16.7 17.8 y 16.3 16.3 13.9 15.4 18.6 15.3 14.8 15.12 13.5 12.5 14.5 Nf .t /
n n n n n n n y n n n n n n y 3
State of failure [y/n] in t t D6 t D7 t D 10 n y y n n n y y y n n n n n n 5
n n n n n n n n y y n n n n n 2
n n n n y n n n n n n n n n y 2
ANF ]0,10] 2 1 1 1 2 1 1 1 1 1 1 1 2 2 2 20
ENF expected number of failures, ANF absolute number of failures
5.12.2.4 Availability Determination The previously discussed analyses are subject to specific hypotheses concerning the state of health of the components after the generic repair (e. g., as soon as good hypothesis) and/or concerning the state of problems with the components after the generic failure
(e. g., as bad as the first failure hypothesis). The calculus of availability is now presented. It has been specifically quantified for the unit of time t D 5 weeks:
A.t D 5/ D
12 Nh .t D 5/ D D 0:800: N 15
128
5 Basic Statistics and Introduction to Reliability
The unavailability in the unit of time t D 5 weeks is Q.t D 5/ D
3 Nf .t D 5/ D D 0:200: N 15
Finally, the not conditional failure rate of the generic component in the unit period of time t D 5 is ANF ft; t C tg w.t D 5/ D lim t !0 Nt 8 5 ANF f5; 6g ˆ ˆ D D ˆ ˆ t D1 ˆ 15 1 15 ˆ ˆ ˆ < D 0:3N week1 D ˆ 6 ANF f5; 7g ˆ ˆ D D ˆ ˆ t D2 ˆ 15 2 15 2 ˆ ˆ : Š 0:2 week1 ; where ANF is the absolute number of failures (i. e., on the whole number of components) between t and t C t, i. e., in ]t,t C t]. The failure rate strongly depends on the value of t as demonstrated in the discussion of the value of (t) in the previous case study (nonrepairable components). The values obtained quantify the mean hazard rate in [t; t C t]. In particular, w.t/ can be negative, i. e., during t the number of repairs is greater than the number of failures. The conditional failure rate value for the repairable component in the unit period of time equal to 5 weeks is subject to the same considerations and is equal to .t D5/ ANF ft; t C tg D lim t !0 Nh .t/t 8 5 ANF f5; 6g ˆ ˆ D D ˆ ˆ t D1 Nh .t D 5/ 1 12 ˆ ˆ ˆ ˆ < Š 0:417 week1 D ˆ 6 ANF f5; 7g ˆ ˆ ˆ D D ˆ ˆ t D2 Nh .t D 5/ 2 12 2 ˆ ˆ : Š 0:25 week1 : These values could be also obtained from the following equation: 8 0:3N ˆ ˆ D Š 0:417 week1 < w.t D 5/ t D1 0:8 .t D 5/ D D ˆ A.t D 5/ ˆ : D 0:2 Š 0.25 week1 : t D2 0:8
It is important to remember that the failure rate definition is based on the assumption of infinitesimal t, i. e., dt, in accordance with the basic hypothesis that two transactions from state 0 to 1 (see Sect. 5.8) are not admissible. In particular, considering the state of failure (or health) of the generic component in t and t C t (see Table 5.12), .t D 5/ Nf .t C t/ Nf .t/ D lim t !0 Nh .t/t 8 53 Nf .6/ Nf .5/ ˆ D ˆ D ˆ ˆ t D1 N .5/ 1 12 ˆ h ˆ ˆ ˆ < 0:16N week1 D ˆ ˆ ˆ D Nf .7/ Nf .5/ D 2 3 ˆ ˆ ˆ t D2 Nh .5/ 2 12 2 ˆ ˆ : Š 0:042 week1 ; where, in general, Nf .t/ 2 Œ0; 12 and ANF 2 ft; t C tg D Œ0; C1Œ. Consequently these values of the failure rate, (t D 5), differ from previous ones because Nf .t C t/ Nf .t/ does not quantify the absolute number of failures. Finally, the ratio of failures during the first 10 weeks is (see Table 5.12) ANF f0; 10g N 20 Nf .10/ Nf .0/ D D 1:33; D N 15
ENF.T D 10/ D W .0; 10/ D
where 15 is the number of components that fail in [0, 10] and 5 is the number of components that fail a second time in [0, 10].
5.12.2.5 Availability by Monte Carlo Simulation Analysis Now the Monte Carlo simulation analysis has been applied to study this repairable component assuming the “as good as new” and the “as bad as old” hypotheses. In other words all available ttf (ttr) values are used to evaluate the failure (repair) behavior without distinguishing the first failure (repair) event from the subsequent failures (repairs). Different approaches to the analysis of recurrent stochastic events,
5.12 Applications and Case Studies
129
Re lia So f t W e ibull+ + 7 - w w w . Re lia So f t . co m
F/S Histogram 0.400
FS Histogram Pdf Line Failures
0.320
Value
0.240
0.160
0.080
0.000
0 -0 . 5 4 0 . 5 4 -. . . 1 . 0 8 -. . . 1 . 6 2 -. . . 2.16 -. . . 2.70 -. . . 3 .24 -. . . 3 .78 -. . . 4 . 3 2 -. . . 4 . 8 6 -. . . 5 . 4 0 -. . . 5 . 9 4 -. . . 6.48 -. . . 7.02 -. . . 7.56 -. . . 8.10 -. . . 8 . 6 4 -. . .
Period β = 4.4871, η = 4.8796, ρ = 0.9516
Fig. 5.59 Frequency of failures distribution, ttf. ReliaSoft® software
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
F/S Timeline FS Timeline Failure
0 .000
1. 8 00
3. 6 00
5. 4 00
Time, (t) β = 4.4871, η = 4.8796, ρ = 0.9516
Fig. 5.60 Timeline analysis of failure events. ReliaSoft® software
7.200
9.000
130
5 Basic Statistics and Introduction to Reliability
TTF analysis Weibull Shape 3.710 Scale 4.963 N 56
100
Percent
80
60
40
20
0 1
2
3
4
5
6
7
8
TTF
Distribution Overview Plot for TTF ML Estimates – Complete Data Probability Density Function
Table of S hape S cale M ean S tDev M edian IQ R F ailure C ensor A D*
Weibull 99.9
0.3
Percent
PDF
90 0.2
0.1
50 10
0.0 2
4
6
1
8
2
S tatistics 3.70957 4.96289 4.47941 1.34520 4.49599 1.87261 56 0 1.121
5
TTF
TTF
Survival Function
Hazard Function
100
Rate
P er cent
2
50
0
1
0 2
4
6
8
TTF
2
4
6
8
TTF
Fig. 5.61 ttf analysis. Minitab® Statistical Software
e. g., the renewal process analysis (also called “recurrent events data analysis”) conducted with parametric evaluation models or nonparametric evaluation models, have been presented in the literature, but they are not subject of this chapter. Figures 5.59 and 5.60 present the frequency of failure events distribution and the timeline analysis conducted on the available 56 failure events collected on the set of 15 components.
By the parametric distribution evaluation analysis assuming a Weibull statistical distribution of ttf, the shape and scale values are ˇ D 3:710 and ˛ D 4:963 as demonstrated by Fig. 5.61. Similarly Figs. 5.62–5.64 illustrate the parametric analysis conducted to identify the best parameterization of the probabilistic distribution of ttr values (the number of failure events is 44), assuming a lognormal distribution.
5.12 Applications and Case Studies
131
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
F/S Histogram 2.00 0
FS Histogram Pdf Line Failures
1.60 0
Value
1.20 0
0.80 0
0.40 0
0.00 0
0 -0.18 0.18-. . . 0.36-. . . 0 .54-. . . 0.72 -. . . 0.90 -. . . 1.08-. . . 1.26-. . . 1.44 -. . . 1.62 -. . . 1.80 -. . . 1.98-. . . 2. 1 6-. . . 2.34 -. . . 2.52 -. . . 2.70 -. . . 2. 8 8-. . .
Period μ = −0.2242, σ = 0.6220, ρ = 0.9752
Fig. 5.62 Frequency of failures distribution, ttr. ReliaSoft® software
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
F/S Timeline FS Timeline Failure
0.000
0.600
1 .200
1.80 0
Time, (t) μ = −0.2242, σ = 0.6220, ρ = 0.9752
Fig. 5.63 Timeline analysis of repair events. ReliaSoft® software
2. 4 00
3.000
132
5 Basic Statistics and Introduction to Reliability Distribution Overview Plot for TTR
TTR analysis
ML Estimates-Complete Data
Lognormal
90 P er cent
80
99
1.0
P DF
100
-0.2242 0.6201 44
Table of S tatistics Loc -0.224158 Scale 0.613003 M ean 0.964379 StD ev 0.651307 M edian 0.799189 IQ R 0.679864 F ailure 44 C ensor 0 A D* 1.104
Lognormal
P robability Density F unction
Loc Scale N
0.5
50
Percent
10
60
0.0 0
1
2
1 0.1
3
1.0
TTR
40
TTR
S urv iv a l F unction
H azard F unction
100 1.5
0
Rate
P er cent
20 50
1.0
0.5
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
0
TTR
0
1
2
3
TTR
0
1
2
3
TTR
®
Fig. 5.64 ttr analysis. Minitab Statistical Software ReliaSoft BlockSim 7 - www.ReliaSoft.com
System Up/Down Sta t e Operating Time Time Under Repair
System
0.000
10.000
20.000
30.000
40.000
50.000
Time, (t)
Fig. 5.65 Up/down analysis, 50 weeks. ReliaSoft® software
The estimated values of the location and scale parameters of the lognormal distribution of ttr values are 0:224 and 0.613 respectively. By the application of the dynamic simulation on a period of time of 520 weeks (equal to about 10 years) and considering a number of repetitions equal to 100, the following results have been obtained: • mean availability 0.824;
• • • • •
point availability A(t D 520 weeks) 0.74; ENF(T D 520 weeks) 95.46 failures; MTTF 4.69 weeks; uptime 428.3 weeks; downtime 91.7 weeks.
Finally, Fig. 5.65 presents the up/down diagram obtained by the simulation analysis on a period of 50 weeks.
6
Reliability Evaluation and Reliability Prediction Models
Contents 6.1
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
6.2
Data Collection and Evaluation of Reliability Parameters . . . . . . . . . . . . . . . . . . . . . . 134 6.2.1 Empirical Functions Direct to Data . . . . . . . . 135 6.2.2 Theoretical Distribution Research . . . . . . . . . 145
6.3
Introduction to Reliability Block Diagrams . . . . . . 152
6.4
Serial Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . 153 6.4.1 Numerical Example – Serial Configuration . . 154
6.5
Parallel Configuration . . . . . . . . . . . . . . . . . . . . . . . . 161 6.5.1 Numerical Example – Parallel Configuration 163
6.6
Combined Series–Parallel Systems . . . . . . . . . . . . . 168
6.7
Combined Parallel–Series Systems . . . . . . . . . . . . . 170
6.8
k-out-of-n Redundancy . . . . . . . . . . . . . . . . . . . . . . . 170 6.8.1 Numerical Examples, k-out-of-n Redundancy 171
6.9
Simple Standby System . . . . . . . . . . . . . . . . . . . . . . . 174 6.9.1 Numerical Example – Time-Dependent Analysis: Standby System . . . . . . . . . . . . . . . . 180
6.10 Production System Efficiency . . . . . . . . . . . . . . . . . . 183 6.10.1 Water Supplier System . . . . . . . . . . . . . . . . . . . 185 6.10.2 Continuous Dryer System . . . . . . . . . . . . . . . . 187
Chapter 5 introduced the basic maintenance terminology and nomenclature related to a generic item as a part, component, device, subsystem, functional unit, piece of equipment, or system that can be individually considered. It is worth remembering the following definition of availability in accordance with the European standards and specifications: “ability of an item to be in a state to perform a required function under given conditions at a given instant of time or during a given time interval, assuming that the required external re-
sources are provided.” Availability, such as reliability and maintainability, refers to a production system as a combination of different functions, parts, and basic components whose failure and repair behaviors can be known or unknown. In particular, these behaviors can be eventually based on the availability of historical data of failures and repairs, whose statistical evaluation can effectively support the prediction of future and stochastic behaviors for new equipment and/or already operating production systems and components. What about models and methods for reliability evaluation engineering? This chapter also discusses the elementary reliability configurations of a system in order to introduce the reader to the basic tools to evaluate complex systems, i. e., based on complex configurations, as deeply discussed and exemplified in the next chapter.
6.1 Introduction This chapter introduces the basic analytical models and statistical methods used to analyze simple reliability systems that form the basis for evaluation and prediction of the stochastic failure and repair behavior of complex production systems, assembled using a variety of components. Consequently, the first part of the chapter (Sect. 6.2) presents various applications of analytical models that are alternatives to determining the statistical distribution that best fits a set of failure and/or repairable data in the absence (or presence) of censored data. This important activity is the so-called reliability life data analysis based on the statistical in-
R. Manzini, A. Regattieri, H. Pham, E. Ferrari, Maintenance for Industrial Systems © Springer 2010
133
134
ference models and tools explained and illustrated in this chapter, also supported by commercial statistical and reliability packages. The second part of this chapter (Sects. 6.3–6.9) presents simple reliability block diagrams that help to predict the reliability and availability of elementary production systems. The basic reliability block diagram configurations are used to build complex block diagrams capable of describing the failure and repair behaviors of complex production systems composed of both repairable and nonrepairable components, as illustrated and explained in Chap. 5. Several useful numerical examples providing helpful support to practitioners and managers of production systems and maintenance departments are presented in this chapter.
6.2 Data Collection and Evaluation of Reliability Parameters Evaluation of reliability parameters based on the field data collected is a very significant problem. In general, the starting point is a set of failure times or, more precisely, failure and removing times (when units fail or are removed from the test, information about their failure times is sometimes not available). The aim is to obtain a meaningful estimate of the fundamental reliability parameters, especially the cumulative failure distribution FO .t/, the survival funcO tion (reliability function) R.t/, and the hazard funcO tion .t/. Determining these functions means reliability theory and all related optimization policies can be applied. In general, considering a population of n units, each specific failure time can be found. The result is represented by t1 ; t2 ; : : : ; tn , where ti represents the time of failure of the i th unit: there is a complete data situation in this case, i. e., all n unit failure times are available. However, this is frequently not the situation because a lot of time and information is required. The real-world test often ends before all units have failed, or several units have finished their work before data monitoring, so their real working times are unknown. These conditions are usually known as censored data situations.
6 Reliability Evaluation and Reliability Prediction Models
Technically, censoring may be further categorized into: 1. Individual censored data. All units have the same test time t . A unit has either failed before t or is still running (generating censored data). 2. Multiple censored data. Test times vary from unit to unit. Clearly, failure times differ but there are also different censoring times. Censored units are removed from the sample at different times, while units go into service at different times. An individually censored situation usually deals with laboratory tests, while a multiple situation is frequently found in real-world operating conditions. The “clock,” or rather the main parameter defining the censoring, is usually time, but can also be the number of failures. So it is possible to distinguish: 1. Type I censoring. Testing is terminated after a fixed time t . 2. Type II censoring. Testing is terminated after a fixed number of failures occur (usually represented by r). The test time is tr , the failure time of the rth unit. The last important taxonomy deals with censoring: 1. Right censored data. The failure time for some units is known to occur only before a specified time. 2. Left censored data. The failure time for some units is known to occur only after a specified time (in other words the test is finished but the units work well again). 3. Interval censored data. Exact failure times are unknown but the number of failures in a specified interval of times is recorded. Figure 6.1 shows several of these situations. There are two main approaches in both complete and in censored conditions to fitting the reliability parameters to the real-world data set. The first is to derive empirical reliability functions directly (empirical functions direct to data, EFDD). The second is to fit theoretical distributions (theoretical distribution research, TDR) such as exponential and Weibull, which is usually more complicated but more accurate. The second approach usually follows the first one, which remains particularly important. Several methods for both are presented in the technical literature. The more established ones are considered first, and then the latest developments are briefly considered in the final part of this chapter. Figure 6.2 summarizes the most frequently used approaches.
6.2 Data Collection and Evaluation of Reliability Parameters
135
units units n n
i
right censored
i 1
1 t*
0 Complete data set
0
t* Type I – singly censored data set
units n
failure
r censor
i interval censored 1
0
right tr Type II – multiply censored data
Fig. 6.1 Complete and censored data set
6.2.1 Empirical Functions Direct to Data Empirical methods are also called nonparametric methods or nondistribution methods. They directly evaluate F .t/, R.t/, and .t/ in a real-world data set in terms of failure times or removing times. The corresponding estimates are usually indicated as FO .t/, O O R.t/, and .t/. This very simple empirical method is not timeconsuming. The resulting plots of the reliability functions are very easy to interpret but difficult to manage for automatic systems (e. g., PCs, programmable logic controllers, and software packages). In addition, the empirical evaluation provides the starting point for the analytical evaluation of reliability functions (TDR). 6.2.1.1 Complete Data – Direct Method
O i/ D 1 n i D i : FO .ti / D 1 R.t n n
(6.1)
(6.2)
Using the definitions of the failure density function f .t/ and the hazard function .t/, one can easily evaluate the following equations by considering the previous equations: R.t i C1/ R.t i / dF .t/ dR.t/ fO.t/ D D Š dt dt ti C1 ti 1 for ti < t < ti C1 ; D (6.3) n.ti C1 ti / fO.t/ O .t/ D O R.t/ D
Given that t1 ; t2 ; t3 ; : : : ; tn , where ti ti C1 , are n ordered failure times in a random sample, and i is the number of failures occurring up to time ti , a possible estimate of the survival function R.t/ at time ti can be calculated by the fraction of units surviving at time ti : O i/ D n i D 1 i : R.t n n
From this equation F .t/ can be evaluated immediately:
1 .ti C1 ti /.n i /
for ti < t < ti C1 : (6.4)
Using Eq. 6.2 F .tn / D n/n D 1, then the probability for any unit surviving beyond tn is zero. Since it is unlikely that any sample analyzed contains the longest survival time, Eq. 6.1 tends to underestimate the reliability of components, and so an improved direct method was developed.
136
6 Reliability Evaluation and Reliability Prediction Models
EVALUATION of RELIABILITY FUNCTIONS [F(t), R(t), λ(t)]
1
Empirical Functions Direct to Data
COMPLETE DATA
Direct Method
CENSORED DATA
Median Rank
Product Limit Estimator
Improved Direct Method
Rank Adjustment
Kaplan–Meier
Others
Fˆ (t) , Rˆ (t) , λˆ (t) plots
2
Theoretical Distribution Research
Least-squares X rank Y
Maximum likelihood estimator
Fˆ (t) , Rˆ (t), λˆ (t) analytical expressions Fig. 6.2 General framework for the evaluation of reliability functions
6.2.1.2 Complete Data – Improved Direct Method An improved estimate of cumulative failure distribution is FO .ti / D
i : nC1
(6.5)
Compared with the previous one, this method behaves very well on real-world applications and so is widely diffused. The following are very simple to estimate: O i / D 1 FO .ti / D 1 i D n C 1 i ; R.t nC1 nC1 (6.6)
R.t / R.ti / dR.t/ dF .t/ D Š i C1 fO.t/ D dt dt ti C1 ti 1 for ti < t < ti C1 ; D .ti C1 ti /.n C 1/ (6.7) fO.t/ 1 O .t/ D D O .t t /.n C 1 i/ i C1 i R.t/ for ti < t < ti C1 :
(6.8)
6.2.1.3 Complete Data – Median Rank Method The improved estimate of the cumulative failure distribution obtained on a probability plot provides the mean plotting position for the i th-ordered failure. When the value of i is close to the bound of the in-
6.2 Data Collection and Evaluation of Reliability Parameters
terval, i. e., 0 and n, the F .t/ distribution is tilted and the median instead of the mean value is preferred. The median position is usually called “median rank” (MR), depends on both the order of failure (i / and the number of components n, and is defined as the value of F .t/ associated with the probability of i or more failures occurring being 0.5. Numerically MR is expressed by ! n X n MRk .1 MR/nk D 0:50: (6.9) k kDi
MRs are often tabulated (Ebeling 2005) but can be easily approximated as follows, especially when sample sizes are large: MR FO .ti / D
i 0:3 : n C 0:4
(6.10)
And thereby O i / D 1 FO .ti / D 1 i 0:3 D n C 0:7 i ; R.t n C 0:4 n C 0:4 (6.11) R.ti C1 / R.ti / dF .t/ dR.t/ fO.t/ D D Š dt dt ti C1 ti 1 for ti < t < ti C1 ; D .ti C1 ti /.n C 0:4/ (6.12) fO.t/ 1 O .t/ D D O .ti C1 ti /.n C 0:7 i / R.t/ for ti < t < ti C1 : (6.13)
6.2.1.4 Mean Time to Failure and Time to Failure Variance The mean time to failure (MTTF) and its variance are very important parameters in reliability analysis. Their values can be estimated directly from a sample of n elements by n X ti (6.14) MTTF D n i D1
and s2 D
n X .ti MTTF /2 D n1 i D1
Pn
2 i D1 ti
n.MTTF /2 : n1 (6.15)
137
If n is large enough to invoke the central limit theorem, the confidence interval for the MTTF based on Student’s t distribution can be set as follows: s Pr MTTF t˛=2;n1 p MTTF n s D .1 ˛/; MTTF C t˛=2;n1 p n
(6.16)
where ˛ is the level of confidence and t˛=2;n1 is a parameter derived from Student’s distribution. By using the mean time to repair (MTTR) instead of MTTF, one may also use Eqs. 6.14–6.16 for repair times, and the repair cumulative distribution function G.t/ can be estimated using the above-mentioned approach for estimating F .t/. Table 6.1 summarizes the basic results of the reliability estimation using the EFDD approach for complete data. We now illustrate an application. An important international manufacturer of electric motors for the oleodynamic industry collects the failure data for their products from their customers. In particular, the complete set of data for item 3 of product r.090.1768 (Fig. 6.3) is reported in Table 6.2. Table 6.3 presents the rank-ordered data of F .t/, f .t/, and .t/ according to the direct, improved direct, and median rank methods, while Fig. 6.4 compares their trends. The estimated value of MTTF, its variance, and the interval of confidence are provided by Eqs. 6.14–6.16: MTTF D
n X ti i D1
n
D 2;179 h;
n X .ti MTTF /2 2 s D n1 i D1 Pn 2 t n.MTTF /2 D i D1 i D 670;681 h2 ; n1 s Pr MTTF t˛=2;n1 p MTTF n s D .1 ˛/: MTTF C t˛=2;n1 p n
A 90% confidence interval, i. e., .1˛/ D 0:90, can be found using the table of values t˛=2;n1 for Student’s
138
6 Reliability Evaluation and Reliability Prediction Models
Table 6.1 Empirical functions direct to data: reliability estimation for complete data
Direct method Improved direct method Median rank method
FO .ti /
O i/ R.t
fO .t /
O / .t
i n i nC1 i 0:3 n C 0:4
i n nC1i nC1 n C 0:7 i n C 0:4
1 n.tiC1 ti / 1 .tiC1 ti /.n C 1/ 1 .tiC1 ti /.n C 0:4/
1 .tiC1 ti /.n i / 1 .tiC1 ti /.n C 1 i / 1 .tiC1 ti /.n C 0:7 i /
1
MTTF D
MTTF
n X ti n
iD1
Pn n 2 2 X .ti MTTF /2 iD1 ti n.MTTF / s D D n1 n1 iD1 s s D .1 ˛/ Pr MTTF t˛=2;n1 p MTTF MTTF C t˛=2;n1 p n n 2
Variance Confidence interval MTTF mean time to failure
Table 6.2 Complete data set Time to failure (h)
Time to failure (h)
Time to failure (h)
1,124 2,785 1,642 980 1,974 2,461 1,879 2,894 3,097 2,674
667 1,998 2,756 2,489 2,745 1,945 1,478 1,684 1,246 2,056
2,128 4,562 3,467 2,687 1,695 1,745 1,689 1,348 2,497 2,976
1.0
Direct Method Improved Direct Method Median Rank Method
0.8
0.6
F(t) 0.4
0.2
hours
0.0 0
1000
2000
3000
4000
5000
0.4
Direct Method Improved Direct Method Median Rank Method 0.3
(t) 0.2
0.1
hours
0.0 500
Fig. 6.3 Code r.090.1768 sketch (item 3)
1000
1500
2000
2500
3000
3500
4000
Fig. 6.4 Cumulative failure distribution and hazard curve
6.2 Data Collection and Evaluation of Reliability Parameters
139
Table 6.3 Complete data set i
ttf (h)
DM
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
667 980 1,124 1,246 1,348 1,478 1,642 1,684 1,689 1,695 1,745 1,879 1,945 1,974 1,998 2,056 2,128 2,461 2,489 2,497 2,674 2,687 2,745 2,756 2,785 2,894 2,976 3,097 3,467 4,562
0.033 0.067 0.100 0.133 0.167 0.200 0.233 0.267 0.300 0.333 0.367 0.400 0.433 0.467 0.500 0.533 0.567 0.600 0.633 0.667 0.700 0.733 0.767 0.800 0.833 0.867 0.900 0.933 0.967 1.000
F .t / IDM MRM 0.032 0.065 0.097 0.129 0.161 0.194 0.226 0.258 0.290 0.323 0.355 0.387 0.419 0.452 0.484 0.516 0.548 0.581 0.613 0.645 0.677 0.710 0.742 0.774 0.806 0.839 0.871 0.903 0.935 0.968
0.023 0.056 0.089 0.122 0.155 0.188 0.220 0.253 0.286 0.319 0.352 0.385 0.418 0.451 0.484 0.516 0.549 0.582 0.615 0.648 0.681 0.714 0.747 0.780 0.813 0.845 0.878 0.911 0.944 0.977
DM 0.00011 0.00023 0.00027 0.00033 0.00026 0.00020 0.00079 0.00667 0.00556 0.00067 0.00025 0.00051 0.00115 0.00139 0.00057 0.00046 0.00010 0.00119 0.00417 0.00019 0.00256 0.00057 0.00303 0.00115 0.00031 0.00041 0.00028 0.00009 0.00003
f .t / IDM 0.00010 0.00022 0.00026 0.00032 0.00025 0.00020 0.00077 0.00645 0.00538 0.00065 0.00024 0.00049 0.00111 0.00134 0.00056 0.00045 0.00010 0.00115 0.00403 0.00018 0.00248 0.00056 0.00293 0.00111 0.00030 0.00039 0.00027 0.00009 0.00003
MRM
DM
.t / IDM
MRM
0.00011 0.00023 0.00027 0.00032 0.00025 0.00020 0.00078 0.00658 0.00548 0.00066 0.00025 0.00050 0.00113 0.00137 0.00057 0.00046 0.00010 0.00117 0.00411 0.00019 0.00253 0.00057 0.00299 0.00113 0.00030 0.00040 0.00027 0.00009 0.00003
0.000110 0.000248 0.000304 0.000377 0.000308 0.000254 0.001035 0.009091 0.007937 0.001000 0.000393 0.000842 0.002028 0.002604 0.001149 0.000992 0.000231 0.002976 0.011364 0.000565 0.008547 0.002155 0.012987 0.005747 0.001835 0.003049 0.002755 0.001351 0.000913
0.000106 0.000239 0.000293 0.000363 0.000296 0.000244 0.000992 0.008696 0.007576 0.000952 0.000373 0.000797 0.001916 0.002451 0.001078 0.000926 0.000215 0.002747 0.010417 0.000514 0.007692 0.001916 0.011364 0.004926 0.001529 0.002439 0.002066 0.000901 0.000457
0.000108 0.000242 0.000296 0.000367 0.000299 0.000247 0.001005 0.008811 0.007680 0.000966 0.000379 0.000810 0.001948 0.002495 0.001098 0.000945 0.000219 0.002812 0.010684 0.000528 0.007930 0.001982 0.011806 0.005147 0.001610 0.002595 0.002234 0.001001 0.000537
ttf time to failure, DM direct method, IDM improved direct method, MRM median rank method
distribution (see Appendix A.3):
t˛=2;n1 D 1:699;
819 Pr 2;179 1:311 p MTTF 30 819 D .1 0:1/ D 0:90; 2;179 C 1:311 p 30 Prf1;983 MTTF 2;375g D 0:90:
O O The estimates of FO .t/, R.t/, and .t/ are obtained at the suspension time of the test just as they are computed for complete data but with the difference that these values are truncated on the right. For multiple censored data, ti defines a failure time, while tiC represents a censored (suspension) time. The lifetime distribution of censored components is considered to be the same as for noncensored components. The product limit estimator method suggested by Lewis (1987) is based on the improved direct method used for complete data: O i / D 1 FO .ti / D 1 R.t
6.2.1.5 Censored Data – Product Limit Estimator Let n be the number of units in a test and r < n be the number of failures that occur. The test is suspended before n failures, and the data set is individually right censored (see Fig. 6.1).
nC1i i D nC1 nC1
and O i 1 / D 1 FO .ti 1 / D 1 i 1 D n C 2 i : R.t nC1 nC1
140
6 Reliability Evaluation and Reliability Prediction Models
Table 6.4 Censored data set
1.0
Time to failure (h)
Time to failure (h)
Time to failure (h)
1,124 2,785 1,642 800C (980) 1,974 2,461 1,300C (1,879) 2,894 3,097 2,674
667 700C (1,998) 2,756 2,489 1,500C (2,745) 1,945 1,478 1,500C (1,684) 1,246 2,056
2,128 2,500C 3,467 2,687 1,000C 1,745 1,000C 1,348 2,497 2,500C
0.8
(4,562) Rˆ (ti )
(1,695)
0.6
0.4
(1,689) 0.2
(2,976)
0.0 0
The plus superscripts indicate the suspension times.
1000
2000
3000
4000
hours
Fig. 6.5 Reliability plot for the product limit estimator method
O i / are only estimated with respect The values of R.t to failure times ti . In particular, the points plotted are those given in Table 6.5.
Then O i/ R.t nC1i I D O i 1 / nC2i R.t hence, O i / D n C 1 i R.t O i 1 /: R.t nC2i
6.2.1.6 Censored Data – Kaplan–Meier Approach (6.17)
If censoring occurs at time tiC , the reliability at that time is estimated by the reliability at time ti 1 . If failure occurs at time ti , the reliability at that time is given by Eq. 6.17. In a unique equation, O i/ D R.t
nC1i nC2i
ıi
O i 1 /; R.t
(6.18)
where ıi D .1:0/ (if failure occurs at time ti , if cenO D 1. soring occurs at time ti ) and R.0/ O With the appropriate value of R.t/ and by simply inputting the ti corresponding to failure times, O Eqs. 6.6–6.8 can estimate FO .t/, fO.t/, and .t/. We now illustrate an application. Censoring (or suspension) times tiC are introduced in the previously cited electric motors data set, assuming the suspension of several units before the recorded failures, and the complete data set is transformed into a right censored one. The real (future and not known) failure time is reported in parentheses in Table 6.4 next to the suspension time. O i / derived from The graph in Fig. 6.5 represents R.t Eq. 6.18. A linear trend between points is assumed in this case.
Kaplan and Meier introduce a variation of the product limit estimator method. Assuming ti is the ranked failure times and ni is the number of components at risk prior to the i th failure, the estimated reliability is calculated by ıi O i 1 /; O i/ D 1 1 R.t R.t ni
(6.19)
where ıi D .1; 0/ (if failure occurs at time ti , if censorO ing occurs at time ti / and R.0/ D 1. The estimates for O FO .t/, fO.t/, and .t/ in this case are also derived from Eqs. 6.6–6.8 by simply inputting the ti corresponding O to failure times and the appropriate R.t/. O i / be deWe now illustrate an application. Let R.t rived directly from Eq. 6.19, only in this case for ti corresponding to failure times, and compare the product limit estimator method with the Kaplan–Meier approach on the basis of the same data set. The results are shown in Table 6.6 and Fig. 6.6.
6.2.1.7 Censored Data – Rank Adjustment Method This method is based on determining a failure ranking, which is influenced by the censored data position. The
6.2 Data Collection and Evaluation of Reliability Parameters
141
Table 6.5 Reliability estimation according to the product limit estimator method i
ti
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
0 667 700 800 1,000 1,000 1,124 1,246 1,300 1,348 1,478 1,500 1,500 1,642 1,745 1,945 1,974 2,056 2,128 2,461 2,489 2,497 2,500 2,500 2,674 2,687 2,756 2,785 2,894 3,097 3,467
Censored time
.n C 1 i /=.n C 2 i /
ıi
R.ti /
0.968 0.967 0.966 0.964 0.963 0.962 0.960 0.958 0.957 0.955 0.952 0.950 0.947 0.944 0.941 0.938 0.933 0.929 0.923 0.917 0.909 0.900 0.889 0.875 0.857 0.833 0.800 0.750 0.667 0.500
1 0 0 0 0 1 1 0 1 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1
R.667/ D 0:968R.0/ D
1.000 0.968
R.1; 124/ D 0:962R.667/ D R.1;246/ D 0:960R.1;124/ D
0.931 0.893
C C C C C C C
C C
0.773 0.730 0.687 0.644 0.601 0.558 0.515 0.472 0.429
0.376 0.322 0.268 0.215 0.161 0.107 0.054
where n is the total number of units and iti is the rank order of the failure at time ti . In particular,
basic formula is
iti 0:3 O R.ti / D 1 ; n C 0:4
0.854 0.816
(6.20)
iti D iti 1 C RI;
(6.21)
where RI is the rank increment, 1.0
RI D 0.8
0.6
Rˆ (ti ) 0.4
0.2
0.0 0
1000
2000
3000
hours
Fig. 6.6 Reliability plot for the Kaplan–Meier method
4000
.n C 1/ iti 1 ; 1 C n
(6.22)
where n is the number of units at risk (present unit included). The rank increment is recomputed for the next failure time following a censored unit, and then it remains the same until the next piece of censored data appears. Both failure time it i and RI are initially 1. O i / values obWe now illustrate an application. R.t tained using the rank adjustment method on the basis of the same data set are shown in Table 6.7 and Fig. 6.7.
142
6 Reliability Evaluation and Reliability Prediction Models
Table 6.6 Reliability estimation according to the Kaplan–Meier method i
ti
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
0 667 700 800 1,000 1,000 1,124 1,246 1,300 1,348 1,478 1,500 1,500 1,642 1,745 1,945 1,974 2,056 2,128 2,461 2,489 2,497 2,500 2,500 2,674 2,687 2,756 2,785 2,894 3,097 3,467
Censored time
C C C C C C C
C C
ni
.1 1=ni /
ıi
R.ti /
30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
0.967 0.966 0.964 0.963 0.962 0.960 0.958 0.957 0.955 0.952 0.950 0.947 0.944 0.941 0.938 0.933 0.929 0.923 0.917 0.909 0.900 0.889 0.875 0.857 0.833 0.800 0.750 0.667 0.500 0.000
1 0 0 0 0 1 1 0 1 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1
R.667/ D 0:967R.0/ D
1.000 0.967
R.1;124/ D 0:960R.667/ D R.1;246/ D 0:958R.1;124/ D
0.928 0.889
6.2.1.8 Crossover Analysis Comparing different methods that use censored data leads to some interesting observations. 1.0
0.8
0.6
Rˆ (ti ) 0.4
0.2
0.0 0
1000
2000
3000
hours
Fig. 6.7 Reliability plot for the rank adjustment method
4000
0.849 0.808
0.764 0.719 0.674 0.629 0.584 0.539 0.494 0.449 0.404
0.346 0.289 0.231 0.173 0.115 0.058 0.000
As seen in Fig. 6.8, the values of the product limit estimation method and the rank adjustment method are very close to each other, while the Kaplan–Meier method tends to underestimate the value (12% on average). Furthermore, comparing the reliability estimation and the complete set of data with the corresponding estimation in the censored condition is very revealing. The data set used in censored applications is directly derived from the original complete data set used for the previously completed applications, which makes comparison very easy. For the sake of simplicity only two methods are compared: the improved direct method (complete data) and the Kaplan–Meier method (censored data). They are the methods most frequently used in real-world applications. The choice in the same class of methods (complete and censored) is not so relevant for the following
6.2 Data Collection and Evaluation of Reliability Parameters
143
Table 6.7 Reliability estimation according to the rank adjustment method i
ti
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
0 667 700 800 1,000 1,000 1,124 1,246 1,300 1,348 1,478 1,500 1,500 1,642 1,745 1,945 1,974 2,056 2,128 2,461 2,489 2,497 2,500 2,500 2,674 2,687 2,756 2,785 2,894 3,097 3,467
Censored time
iti
R.ti /
1
1.000
1.000 0.977
Œ.30 C 1/ 1:000=.1 C 25/ D
1.154 1.154
2.154 3.308
0.939 0.901
Œ.30 C 1/ 3:308=.1 C 22/ D
1.204 1.204
4.512 5.716
0.861 0.822
Œ.30 C 1/ 5:716=.1 C 18/ D
1.331 1.331 1.331 1.331 1.331 1.331 1.331 1.331 1.331
7.047 8.378 9.709 11.040 12.371 13.702 15.033 16.364 17.695
0.778 0.734 0.690 0.647 0.603 0.559 0.515 0.472 0.428
Œ.30 C 1/ 17:695=.1 C 7/ D
1.663 1.663 1.663 1.663 1.663 1.663 1.663
19.358 21.021 22.684 24.347 26.010 27.673 29.336
0.373 0.318 0.264 0.209 0.154 0.100 0.045
RI
C C C C
C
C C
C C
RI rank increment
analysis as different methods in each class perform in a very similar way. The aim is to evaluate the change in reliability estimation when suspension of several data items occurs (censoring). The ti column in Table 6.8 reports all failure times, while the ti .c/ columns contain both failure times and censored data (the ttt C form suggests a suspension of observation at time ttt). O i / estimaFigure 6.9 directly compares the two R.t tions. Clearly, the “censored” plot only presents values when ti .c/ is a failure time. Using a censored data set obviously introduces errors. Moreover, this is very frequently found in realworld situations (e. g., complete tests are very time
consuming and so very expensive, leading to them often being suspended before all the units fail). The maximum error in the application analyzed is around 20% and corresponds to an overestimate by the Kaplan– Meier method. The error is concentrated in the time zone that follows the concentration of suspended units. Figure 6.9 shows a significant error (underestimation) for lifetime near the end of the scale (maximum values). This is typical of the Kaplan–Meier method when the last data item is a censored time. The error is generally an overestimation of reliability. It depends on the percentage of suspended units, on the censoring times, and on the link between this censoring time and the real failure time of units (a cen-
144
6 Reliability Evaluation and Reliability Prediction Models
Table 6.8 Comparison of complete and censored data ti
ti .c/
0 667 980 1,124 1,246 1,348 1,478 1,642 1,684 1,689 1,695 1,745 1,879 1,945 1,974 1,998 2,056 2,128 2,461 2,489 2,497 2,674 2,687 2,745 2,756 2,785 2,894 2,976 3,097 3,467 4,562
0 667 800 1,124 1,246 1,348 1,478 1,642 1,500 1,000 1,000 1,745 1,300 1,945 1,974 700 2,056 2,128 2,461 2,489 2,497 2,674 2,687 1,500 2,756 2,785 2,894 2,500 3,097 3,467 2,500
Censored time
C
C C C C C
C
C C
Improved direct method
Kaplan–Meier method
1.000 0.968 0.935 0.903 0.871 0.839 0.806 0.774 0.742 0.710 0.677 0.645 0.613 0.581 0.548 0.516 0.484 0.452 0.419 0.387 0.355 0.323 0.290 0.258 0.226 0.194 0.161 0.129 0.097 0.065 0.032
1.000 0.967 0.928 0.889 0.849 0.808 0.764
0.719 0.674 0.629 0.584 0.539 0.494 0.449 0.404 0.346 0.289 0.231 0.173 0.115 0.058 0.000
1.0
1.0
0.8
0.8
0.6
0.6
Rˆ (ti ) 0.4
0.4 Product Limit Estimator Kaplan–Meier Rank Adjustment Method
0.2
Improved Direct Method Kaplan–Meier
0.2
0.0 0
0.0 0
1000
2000
3000
4000
1000
2000
3000
4000
hours
hours
Fig. 6.8 Compared reliability plots
Fig. 6.9 R.ti / comparison between complete and censored data
6.2 Data Collection and Evaluation of Reliability Parameters
145
sored unit can work well for a few hours only or for many hours).
lyzing a group of patients undergoing a specific therapy.
6.2.1.9 Recent Development Affecting Censored Data Analysis
6.2.2 Theoretical Distribution Research
The issue of estimating reliability parameters in the censoring condition is very critical and also very important in the field. Some of the more consolidated approaches presented in the previous sections only form a starting point for this open issue involving researchers and practitioners. Several authors have recently proposed very interesting potentially important methodologies. In adopting simulation to evaluate the censoring effect, Fu (2007) found it was more accurate and easier to use than traditional methods. The neural network approach is another strategy for solving the censoring problem that is examined here. The contribution made by Hsieh (2007) is very important. He studied two neural networks: the first was designed to estimate the censored data extracted from the model derived from the uncensored data, and the second was designed to find the optimal settings for the control factors using the uncensored data and the estimated censored data. On the other hand, several authors are developing an alternative approach based on the expectationmaximization algorithm. In particular, Contreras (2007) has implemented this statistical analysis algorithm on a finite censored distribution of data. Sets of censored data are also analyzed using estimators based on fuzzy sets and on genetic algorithms. In light of the work by Cheng and Mordeson (1985), Cheng (2005) discussed an interesting approach based on fuzzy logic that provides more information than a simple point estimate of reliability. Zhou and Wang (2005) discussed the introduction of a genetic algorithm that provides a good estimation of reliability parameters with a large probability. This approach seems to be particularly interesting in the case of heavy censoring. In conclusion, the analysis of a censored data set is a very important issue because in the field it does no impact significantly only on the reliability parameters of industrial equipment but also on human “reliability.” For example, in studying therapy effects, physicians “fortunately” must use censored data when ana-
Section 6.2.1 dealt with methods for deriving an empirical reliability distribution based on estimations of reliability information directly collected in the field (i. e., failure times), but an alternative approach uses theoretical distributions derived from the data collected. This second approach is generally preferable because of its thoroughness. It is also possible to evaluate reliability over the range of data collected. Moreover, theoretical distributions can be used to further develop analysis of maintenance policies and the failure process (see Chap. 5). The collection of failure data is also the starting point in this case. The determination of the EFDD using the EFDD approach is effective in fitting a good O i / derived usdistribution. The estimates FO .ti / or R.t ing the EFDD methods are used in the fitting phase. When the sample data include both failure and censored times, the fitting process remains the same. Then in agreement with the above-mentioned approaches, several adjustments must be made to the cumulative function estimates. The fitting of a theoretical distribution can be viewed as a two-step process: the first step identifies a candidate distribution, the second implements a goodness-of-fit test. Both of these steps were developed by researchers, but here we present several approaches that are used in practice.
6.2.2.1 Least Squares Curve Fitting Method The basic idea is to fit a linear regression using the least-squares method in the form of y D a C bx to a set of transformed data depending on the theoretical distribution considered. If the index of fit, usually represented by r, is high (close to one) then the fit is good. Exponential, Weibull, and normal are the most used distributions considered in this approach.
146
6 Reliability Evaluation and Reliability Prediction Models
Least-Squares Method: Exponential Distribution Case The cumulative distribution of the exponential distribution is well known:
normal distribution, which is usually tabulated (Appendix A.1). Using the inverse function,
F .t/ D 1 et :
ti ti D : (6.31) This function is linear in t, so the least-squares fitting process is applied to the following variables: yi D zi D 1 ŒF .ti / and xi D ti . From application of the least-squares fit,
(6.23)
Applying the natural logarithm on both sides gives 1 D t: (6.24) lnŒ1 F .t/ D ln 1 F .t/ The
slope of the line produced by considering yi D 1 ln and xi D ti represents an estimation of . 1FO .ti / Performing the least-squares method in the form of y D bx, one obtains Pn xi yi O b D D Pi D1 : (6.25) n 2 i D1 xi Least-Squares Method: Weibull Distribution Case The Weibull cumulative distribution (Eq. 5.68) provides ˇ F .t/ D 1 e.t =˛/ : (6.26) Taking two natural logarithms in sequence, one obtains 1 D ˇ ln t ˇ ln ˛: (6.27) ln ln 1 F .t/ The linear regression form is obtained by considering yi D ln ln 1F1.ti / and xi D ln ti , and especially yi D a C bxi ;
where b D ˇO D and
Pn
.xi x/.y N i y/ N Pn 2 N i D1 .xi x/
i D1
a D ˇO ln ˛O D yN b x: N
(6.28) (6.29)
ˇO is derived from Eq. 6.28 and then ˛O is estimated by Eq. 6.29. Least-Squares Method: Normal Distribution Case Assuming the cumulative function F .t/ is a normal distribution, the normalized variable z can be used. In particular, F .t/ D .z/ D
t
Zz D 1
1 2 p ey =2 dy; 2
(6.30) where is the standard deviation and is the average value of the normal distribution (in t). The link between z and .z/ can be obtained quite quickly using the inverse function of the standardized
1 ŒF .ti / D 1 Œ .zi / D zi D
O D
1 b
(6.32)
and
a O D aO D : (6.33) b Table 6.9 presents the fundamental information collected using the least-squares approach according to the main distributions mentioned above. The index of fit in the least-squares method is calculated by 1 Pn .xi x/.y N i y/ N n r D qP i D1 qP ; (6.34) 2 n n .xi x/ .yi y/ N N 2 i D1
n
i D1
n
where yN and xN are, respectively, the average values of yi and xi , and n is the number of couples .xi ; yi / available. Application The same complete data set used as in the EFDD approach (Table 6.2) was used in the research into a theoretical distribution of cumulative function F .t/ using the least-squares method: 1. Exponential distribution (Table 6.10). Solving Eq. 6.25 for b, one obtains Pn xi yi b D Pi D1 D O D 0:000501: n 2 x i D1 i The linear regression is represented by yi D aCbxi D 0:000501xi and the index of fit r is 0.6601. In terms of a cumulative distribution, the equation of the exponential distribution fitting the real-world data is F .t/ D 1 et D 1 e0:000501t . The dashed line in Fig. 6.10 represents the linear regression: the approximation is not satisfactory, as the index of fit is very poor. In conclusion, the exponential distribution is not very appropriate.
6.2 Data Collection and Evaluation of Reliability Parameters
147
Table 6.9 Least squares curve fitting method Distribution
Cumulative function xi
Exponential
F .t / D 1 et
Weibull
F .t / D 1 e.t=˛/
Normal
F .t / D .z/ D
ti
ˇ
Zz D 1
Function
1
t
ln ti
zi D 1 F .ti /
a D yN b xN D b O Pn .x x/.y N N 1 i i y/ Pn b D iD1 D 2 O .x x/ N i iD1
1 2 p ey =2 dy 2
ŒF .t / in Appendix A.1
Table 6.10 Exponential distribution 1 ti (h) FO .ti / yi D ln 1 FO .ti / 667 980 1,124 1,246 1,348 1,478 1,642 1,684 1,689 1,695 1,745 1,879 1,945 1,974 1,998
ti
Linear regression function yi D a C bxi yi Parameters (a; b) 1 ln aD0 1 FO .ti / Pn iD1 xi yi O bD P D n 2 iD1 xi 1 a D yN b xN D ˇO ln ˛O ln ln 1 F .t / Pn .xi x/.y N N i y/ Pn b D iD1 D ˇO 2 .x x/ N i iD1
0.032 0.065 0.097 0.129 0.161 0.194 0.226 0.258 0.290 0.323 0.355 0.387 0.419 0.452 0.484
0.033 0.067 0.102 0.138 0.176 0.215 0.256 0.298 0.343 0.389 0.438 0.490 0.544 0.601 0.661
ti (h)
FO .ti /
2,056 2,128 2,461 2,489 2,497 2,674 2,687 2,745 2,756 2,785 2,894 2,976 3,097 3,467 4,562
0.516 0.548 0.581 0.613 0.645 0.677 0.710 0.742 0.774 0.806 0.839 0.871 0.903 0.935 0.968
yi D ln
1 1 FO .ti /
0.726 0.795 0.869 0.949 1.036 1.131 1.237 1.355 1.488 1.642 1.825 2.048 2.335 2.741 3.434
Estimated using the improved direct method
2. Weibull distribution (Table 6.11). In solving Eqs. 6.28 and 6.29 for a, one can derive the estimates for the following directly: Pn .xi x/.y N i y/ N O Pn b D ˇ D i D1 D 2:766 2 N i D1 .xi x/ and
a D ˇO ln ˛O D yN b xN D 21:593:
The linear regression is represented by yi D aCbxi D 21:593 C 2:766xi and the index of fit r is 0.9801. Figure 6.11 shows the plots of real-world data and linear regression.
In terms of the failure cumulative distribution, the original equation is ˇ
F .t/ D 1 e.t =˛/ : Parameters ˛ and ˇ are directly derived from parameters a and b, which characterize the linear regression. a In particular, ˇO D b D 2:766 and ˛O D e b D 2,463.66. In conclusion, the equation of the cumulative failure function is F .t/ D 1 e.t =2;463:66/
2:766
:
148
6 Reliability Evaluation and Reliability Prediction Models
Table 6.11 Weibull distribution ti (h)
FO .ti /
xi D ln ti
667 980 1,124 1,246 1,348 1,478 1,642 1,684 1,689 1,695 1,745 1,879 1,945 1,974 1,998
0.032 0.065 0.097 0.129 0.161 0.194 0.226 0.258 0.290 0.323 0.355 0.387 0.419 0.452 0.484
6.503 6.888 7.025 7.128 7.206 7.298 7.404 7.429 7.432 7.435 7.465 7.538 7.573 7.588 7.600
yi D ln ln
1 1 FO .ti /
3:418 2:708 2:285 1:979 1:738 1:537 1:363 1:209 1:070 0:943 0:825 0:714 0:610 0:510 0:413
ti (h)
FO .ti /
xi D ln ti
2,056 2,128 2,461 2,489 2,497 2,674 2,687 2,745 2,756 2,785 2,894 2,976 3,097 3,467 4,562
0.516 0.548 0.581 0.613 0.645 0.677 0.710 0.742 0.774 0.806 0.839 0.871 0.903 0.935 0.968
7.629 7.663 7.808 7.820 7.823 7.891 7.896 7.918 7.922 7.932 7.970 7.998 8.038 8.151 8.426
yi D ln ln
1 1 FO .ti /
0:320 0:230 0:140 0:052 0.035 0.123 0.212 0.303 0.397 0.496 0.601 0.717 0.848 1.008 1.234
Estimated using the improved direct method
yi
yi
4.000
2
3.500
1
3.000
0
2.500 2.000
-1
1.500
-2
1.000
-3
0.500
-4
0.000
6 0
1000
2000
3000
4000
6.5
7
7.5
5000
hours
Fig. 6.10 Exponential least-squares plot of failure data
8
8.5
9
ln(hours)
Fig. 6.11 Weibull least-squares plot of failure data
yi 3.000 2.000
3. Normal distribution. Using the well-known method (Table 6.9) and the data in Table 6.12, Pn .xi x/.y N i y/ N 1 Pn b D i D1 D D 0:0011; 2 O N i D1 .xi x/
1.000 0.000 -1.000 -2.000 0
1000
2000
3000
4000
hours
a D yN b xN D b O D 2:3826: The linear regression is represented by yi D aCbxi D 2:3826 C 0:0011xi and the index of fit r is 0.9531. Figure 6.12 shows the plots of real-world data and linear regression. The resolution of Eqs. 6.32 and 6.33 makes it possible to determine the cumulative failure distribution,
Fig. 6.12 Normal least-squares plot of failure data
and in particular O D
1 D 914:528 b
and O D aO D
a D 2;178:933: b
5000
6.2 Data Collection and Evaluation of Reliability Parameters
149
Table 6.12 Normal distribution ti (h) D xi
FO .ti /
yi D zi D 1 ŒF .ti /
ti (h) D xi
FO .ti /
yi D zi D 1 ŒF .ti /
667 980 1,124 1,246 1,348 1,478 1,642 1,684 1,689 1,695 1,745 1,879 1,945 1,974 1,998
0.032 0.065 0.097 0.129 0.161 0.194 0.226 0.258 0.290 0.323 0.355 0.387 0.419 0.452 0.484
1:849 1:518 1:300 1:131 0:989 0:865 0:753 0:649 0:552 0:460 0:372 0:287 0:204 0:122 0:040
2,056 2,128 2,461 2,489 2,497 2,674 2,687 2,745 2,756 2,785 2,894 2,976 3,097 3,467 4,562
0.516 0.548 0.581 0.613 0.645 0.677 0.710 0.742 0.774 0.806 0.839 0.871 0.903 0.935 0.968
0.040 0.122 0.204 0.287 0.372 0.460 0.552 0.649 0.753 0.865 0.989 1.131 1.300 1.518 1.849
Estimated using the improved direct method
In conclusion, the equation of the cumulative failure function is Zt F .t/ D 1
Zt D 1
.y/2 1 p e 2 2 dy 2
1
p e 914:528 2
.y2;178:933/2
2914:5282
6.2.2.2 Maximum Likelihood Estimator
dy:
Crossover Analysis and Final Observations On comparing the three different equations representing the cumulative failure function calculated using the least-squares method, it is worth initially noting that as reported in Fig. 6.13 the exponential distribution does not fit the real-world data well enough, whereas the two remaining distributions (i. e., Weibull and normal) are perfectly satisfactory. This observation is confirmed by the respective index of fit results: 0.6601, 0.9801, and 0.9531. The good fit of the Weibull and normal distributions is an indicator of the typical process of failure for the component analyzed. In fact, the failure rate according to the Weibull distribution is ˇ t ˇ 1 .t/ D : ˛ ˛
demonstrates that the component tested is working in conditions of wear.
From a statistical point of view, the method of maximum likelihood estimation is considered to be a very robust one, with some exceptions. As the name suggests, maximum likelihood estimation aims to obtain the most likely values of the parameters that best describe the data for a given distribution. If x is a continuous random variable with the following probability density function f .xI 1 ; 2 ; : : : ; k /; where 1 ; 2 ; : : : ; k are k unknown parameters to be estimated, with n independent observations x1 ; x2 ; : : : ; xn , corresponding in the case of life data analysis to failure times (or suspended times), the likelihood function is given by L. 1 ; 2 ; : : : ; k =x1 ; x2 ; : : : ; xn / D L D
(6.35)
Figure 6.13(d) presents the trend of the failure rate adopting the Weibull distribution. The increasing trend
n Y
f .xi I 1 ; 2 ; : : : ; k /: (6.36)
i D1
The maximum likelihood estimators (MLE; or parameter values) of 1 ; 2 ; : : : ; k are obtained by maximizing L. It is possible to define the logarithmic version,
150
6 Reliability Evaluation and Reliability Prediction Models
1.200
1.200
F(t) - IDM
F(t) - IDM Weibull
exponential
1.000
1.000
0.800
0.800
0.600
0.600
0.400
0.400
0.200
0.200
0.000 600
1600
2600
3600
0.000 600
4600
1600
2600
3600
4600
hours (b)
hours (a) 1.200
F(t) - IDM normal
1.000
0.0035 0.003
0.800
0.0025 0.600
λ(t) 0.002
0.400
0.0015 0.001
0.200
0.0005 0
0.000 600
1600
2600
3600
4600
0
hours (c)
1000
2000
3000
4000
5000
hours (d)
Fig. 6.13 Comparison between cumulative failure distribution F .t / calculated using the empirical functions direct to data and theoretical distribution research methods (least-squares approach) (a–c) and failure rate curve (Weibull distribution) (d). IDM improved direct method
which is much easier to work with than L, as follows: D ln L D
n X
ln.f .xi I 1 ; 2 ; : : : ; k //:
(6.37)
i D1
By maximizing the MLE of 1 ; 2 ; : : : ; k are the simultaneous solutions of k equations, so @ ln L D 0; @ i
i D 1; 2; : : : ; k:
(6.38)
With censored data (e. g., on the right) the likelihood function is modified in L. 1 ; 2 ; : : : ; k =x1 ; x2 ; : : : ; xn / D L D
r Y
f .xI 1 ; 2 ; : : : ; k /ŒR.t /nr ;
(6.39)
i D1
where r is the number of failures and n is the number of components at risk. The term ŒR.t /nr repre-
sents the probability that the .n r/ censored components do not fail prior to the termination of the test. Generally speaking, some components are assumed to be suspended at the termination time of analysis (test). The MLE method is very appealing as the many properties it possesses can deal with a large sample. It is asymptotically consistent, i. e., as the sample size increases, the estimates converge to the right values. It is asymptotically efficient, i. e., it produces the most precise estimates for large samples. It is asymptotically unbiased, i. e., on average the expected right value is obtained for large samples. MLE Method: Exponential Distribution Case Let n be the number of components in a test, r n the number of failures, and ti the ordered punctual values of failure times. The corresponding probability
6.2 Data Collection and Evaluation of Reliability Parameters
density functions are
151
particular
f .ti / D i eti ;
Pr
i D 1; 2; : : : ; r:
g.ˇ / D
According to Eq. 6.39, the likelihood function is L.t1 ; t2 ; : : : ; tr / D
r Y
i eti . et /nr
r X D r exp ti .n r/t : i D1
(6.40) Then, ln L.t1 ; t2 ; : : : ; tr / D r ln
Pr
ˇ
ln ti C .n r/ts ln ts
ˇ i D1 ti
C .n r/tsˇ ln ts r 1 1X ln ti D 0: ˇ r
(6.45)
i D1
i D1
r X
ˇ i D1 ti
ı
This equation must be solved numerically, and its result is the ˇ value. The parameter ˛ is obtained by ı
˛ D
X r 1 r
( ts D
i D1
C .n
ı r/tsˇ
1 ˇı
;
(6.46)
i D1
where ti .n r/t :
ı tiˇ
1
for complete data
t
for censored data:
(6.41) By applying Eqs. 6.38 and 6.39
MLE Method: Normal Distribution Case The derivation of the MLE function for a normal distribution has the following parameters:
d ln L.t1 ; t2 ; : : : ; tr / D 0; d r X ti .n r/t D 0: r
D
i D1
i D1
2 D
Solving in , one obtains r :
i D1 ti .n r/t
D P r
(6.42)
f .t/ D e t ;
(6.43)
R.t/ D e t :
r Y
f .t/ŒR.t /nr :
n
;
.n 1/s 2 ; n
(6.47) (6.48)
n X .ti MTTF/2 : n1 i D1
Application This application is realized using the same complete data set employed in the EFDD and in the leastsquares approaches: 1. Exponential distribution. Using Eq. 6.42, r 30 D 0:000459 h1 : D
65;368 t .n r/t i i D1
D P r
MLE Method: Weibull Distribution Case The likelihood function in the Weibull distribution case is L.˛; ˇ/ D
where s2 D
In conclusion, the resulting exponential distribution is characterized by
F .t/ D 1 e t ;
n X ti
(6.44)
i D1
The two parameters ˛ and ˇ have to be computed numerically (e. g., Newton–Raphson method), and in
In conclusion, the resulting exponential distribution is characterized by f .t/ D 0:000459 e0:000459t F .t/ D 1 e0:000459t R.t/ D e0:000459t : This result compares favorably with the previously obtained least-squares estimates (8:4%).
152
6 Reliability Evaluation and Reliability Prediction Models
2. Weibull distribution. The application presents a complete data set. From Eq. 6.45, Pn
ˇ i D1 ti ln ti Pn ˇ i D1 ti
ı
g.ˇ / D
In conclusion, the equation of the cumulative failure function is Zt
1X 1 ln ti D 0: ˇ n n
F .t/ D
i D1
1 Zt
ı
Figure 6.14 shows the plot of the g.ˇ / function. An acceptable value of ˇ ı is 2.873 (close to the value obtained by least-squares approach; C3:8%). The second parameter of Weibull distribution (˛) is obtained by simplifying Eq. 6.46, especially with a complete data set: ˛ı D
X n 1 n
tiˇ
ı
1 ˇı
i D1
This value is very close to the results of the leastsquares approach (0:1%). In conclusion, the equation of the cumulative failure function is 2:873
:
3. Normal distribution. By using Eqs. 6.47 and 6.48, the MLE function for a normal distribution has the following parameters: D
n X ti i D1
r D
n
1
.y2;178:933/2 p e 2805:1862 dy: 805:186 2 1
6.3 Introduction to Reliability Block Diagrams
D 2;443:47:
F .t/ D 1 e.t =2;443:47/
D
.y/2 1 p e 2 2 dy 2
D 2;178:933;
.n 1/s 2 D 805:186: n
The first parameter is the same as that found by the least-squares method. The second one is underestimated by the MLE method compared to the leastsquares method (11:9%).
1
Functional schemes representing the physical connections among the components of a production system can be used in describing, modeling, and studying its operating principles. Examples of functional schemes are represented by mechanical applications such as steam production and distribution plants, water supply distribution systems, and liquid fuel storage systems. Otherwise a reliability scheme is useful to model and study the operating configurations for the correct and incorrect working of a production system according to different operating conditions and physical connections. In order to understand the difference between functional and reliability schemes more clearly, Fig. 6.15 presents the scheme for a water supply plant composed of two pumps connected in a parallel redundant configuration. In terms of reliability, pumps P1 and P2 are not necessarily related to each other in this configuration: depending on the water requirement of the user, located at the end of the functional scheme, only one of the two pumps could operate rather than both.
0 -1
0
2
4
6
8
10
-2 -3 -4 -5 -6 -7 -8
Fig. 6.14 g.ˇ ı / function
Fig. 6.15 Functional scheme of a production system
6.4 Serial Configuration
153
In the case of independent events, Eq. 6.49 changes as follows:
C1
RS D P .X1 /P .X2 /P .X3 / : : : P .Xn /:
(6.50)
In other words by Eq. 6.50 the system reliability RS .T / for a period of time T is
C2 Fig. 6.16 Reliability block diagram: parallel configuration
RS .T / D C1
C2
D Fig. 6.17 Reliability block diagram: serial configuration
n Y i D1 n Y
P .Xi / DR1 .T /R2 .T / : : : Rn .T / Ri .T / D e
i D1 R T Pn 0 i D1 i .t / dt
De Consequently, the functional scheme in Fig. 6.15 may be associated with different reliability block diagrams. For example, the diagram in Fig. 6.16 (parallel or redundant configuration) is suitable when one component of the system must supply the whole request. The reliability block diagram in Fig. 6.17 (serial configuration) is instead applicable when every component is critical and its function must be performed in order to guarantee the operativity of the whole system.
RT 0
S .t / dt
;
(6.51)
where Ri .T / is the reliability of the i th component for the time interval T , i .t/ is the failure rate for the i th component in the unit period of time t, S .t/ is the failure rate (i. e., hazard rate) for the system in the unit period of time t, and n is the number of components in serial configuration. Derived from Eq. 6.51, the failure rate for the system S is n X S .t/ D i .t/: (6.52) i D1
6.4 Serial Configuration The reliability block diagram for serial components can be observed in Fig. 6.18. In this reliability configuration every component Ci of the system is indispensable to the functioning of the whole system, i. e., should a component fail the whole system fails too. The system reliability RS for the system is RS D P .X1 /P .X2 =X1 /P .X3 =X1 X2 / P .Xn =X1 X2 : : : Xn1 /;
(6.49)
By Eq. 6.51, in order to increase the reliability of a serial configuration system RS .t/, the reliability of the component with the lowest value can be properly and effectively improved (see the discussion below on the so-called reliability importance). In a serial configuration the failure statistical distribution of the system is quantified by the following equation: fS .t/ D
i D1
Fig. 6.18 Serial reliability configuration
Ci
fi .t/
Y
Rj .t/ :
(6.53)
j ¤i
This is the unconditional failure rate. It depends on the generic component i when the others are supposed to be reliable (j ¤ i ).
where P .E/ is the probability of event E and Xi means event component i is operating.
C1
n X
Cn
154
6 Reliability Evaluation and Reliability Prediction Models
By Eq. 6.53, the system failure rate S .t/ is Pn
i D1 fi .t/
S .t/ D
i D1 fi .t/
D
j ¤i
Rj .t/
RS .t/
Pn
.
Q
Rj .t // Ri .t /
j
RS .t/ n X
D Q
RS .t /D
Q
j Rj .t /
i .t/:
(6.54)
i D1
In accordance with the Eq. 6.51. Therefore, if the generic failure rate i is constant, S is also constant and the reliability behavior of the system is random. In other words, there is not a specific period of time with a greater probability for the system to fail. The following equation, derived from the expression of MTTF for a generic component, quantifies MTTF for the system, called MTTFS : Z1 MTTFS D
Z1 RS .t/ dt D
0
e
Rt 0
S .x/ dx
dt:
where Rs is the system reliability and Ri is the component reliability. Equation 6.51 presents an analytical model for the determination of the reliability RS .t/ of a simple system of components. A similar model can be applied to quantify the availability of the system AS .t/: it is necessary to substitute the genericRi .t/ with the probability functionAi .t/:The same substitution is necessary to quantify the availability function of the systems introduced and exemplified below, starting from the equations which model the reliability function RS .t/, e. g., Eqs. 6.59, 6.65, and 6.66.
6.4.1 Numerical Example – Serial Configuration Figure 6.19 presents the block diagram of a piping system made of a pump and two valves: a ball valve called “Valve1 ,” located before the pump, and a check valve called “Valve2 ,” located after the pump.
0
(6.55) Finally, when the failure rate of all components is constant, then 1 1 1 D Pn ; D Pn 1 S i D1 i .t/ i D1 MTTFi (6.56) where MTTFi is the MTTF of the i th component. Once the reliability of a system has been determined, engineers must often face the task of identifying the least reliable component(s) in the system in order to improve the system design. In particular, the analyst needs a mathematical approach capable of pointing out and quantifying the importance of each component in the system. The reliability importance of a system is defined as follows:
6.4.1.1 Exponential Distributions of Components’ ttf, Nonepairable Components
MTTFS D
IRi .t/ D
@RS .t/ ; @Ri .t/
Valve 1 Fig. 6.19 Block diagram, piping system
(6.57)
All these components are supposed to be not repairable, and the probability distributions of time to failure (ttf) random variables are assumed to be exponential. In particular, the values of MTTF are the following: • MTTFValve1 D 10;000 h; • MTTFValve2 D 6;000 h; • MTTFPump D 7;000 h. By Eq. 6.52 the failure rate of the system is S .t/ D
3 X i D1
Pump
Valve 2
i .t/ D 4:095 104 h1 :
6.4 Serial Configuration
155
By Eq. 6.51 the reliability of the system is RS .T / D R1 .T /R2 .T /R3 .T / D
3 Y
e
i D1
D e
RT
RT 0
0
S .t / dt
D e
1 1 1 10;000 C 7;000 C 6;000
R T Pn 0
dt
i D1 i .t / dt
4 T
D e4:09510
:
Considering two values for the mission time T , T D 4;000 h and T D 8;000 h, the values of system reliability are RS .T D 4;000/ Š 0:194 and RS .T D 8;000/ Š 0:038: By Eq. 6.53 the density function of the system is fS .t/ D
n X
fi .t/
i D1
Y
Rj .t/
j ¤i
D fValve1 .t/ŒRPump .t/RValve2 .t/ C fValve2 .t/ŒRPump .t/RValve1 .t/ C fPump .t/ŒRValve1 .t/RValve2 .t/ D
exponential distributions of ttf
Valve1 eValve1 t . ePump t eValve2 t /
C Valve2 eValve2 t . ePump t eValve1 t / C Pump ePump t . eValve1 t eValve2 t / D .Valve1 C Valve2 C Pump / . eValve1 t ePump t eValve2 t / D S eS t in accordance with Eqs. 6.51 and 6.52. As a consequence, fS .t D 4;000/ D S eS 4000 Š 7:95 105 h1 ; fS .t D 8;000/ D S eS 8000 Š 1:54 105 h1 : Figure 6.20 presents the trend of the system’s probability function F .t/, reliability R.t/, density function f .t/, and failure rate .t/ when compared with the trends of the components involved, called “blocks.”
By the application of the previously introduced reliability importance evaluation model, 8 @RS .t/ ˆ ˆ D RPump .t/RValve2 .t/ ˆ IRValve1 .t/ D @R ˆ ˆ Valve1 .t/ ˆ ˆ ˆ ˆ ˆ D expŒ.Pump C Valve2 /t ˆ ˆ ˆ ˆ ˆ 1 1 ˆ ˆ C t D exp ˆ ˆ ˆ 7;000 6;000 ˆ ˆ ˆ ˆ ˆ @RS .t/ ˆ ˆ IRValve2 .t/ D ˆ ˆ @RValve2 .t/ ˆ ˆ ˆ ˆ < DR .t/R .t/ Pump
ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ IRPump .t/ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ :
Valve1
D expŒ.Pump C Valve1 /t 1 1 C t D exp 7;000 10;000 D
@RS .t/ D RValve1 .t/RValve2 .t/ @RPump .t/
D expŒ.Valve1 C Valve2 /t 1 1 D exp C t : 10;000 6;000
In particular, for t D 4;000 h and t D 8;000 h, respectively, 8 ˆ IRValve1 .t D 4;000/ ˆ ˆ ˆ ˆ ˆ 1 1 ˆ ˆ ˆ C 4;000 Š 0:290 D exp ˆ ˆ 7;000 6000 ˆ ˆ ˆ ˆ ˆ ˆ ˆ < IRValve2 .t D 4;000/ 1 1 ˆ C 4;000 Š 0:379 D exp ˆ ˆ ˆ 7;000 10000 ˆ ˆ ˆ ˆ ˆ ˆ IRPump .t D 4;000/ ˆ ˆ ˆ ˆ ˆ ˆ 1 1 ˆ ˆ C 4;000 Š 0:344; : D exp 10;000 6;000 8 ˆ IRValve1 .t D 8;000/ ˆ ˆ ˆ ˆ ˆ 1 1 ˆ ˆ ˆ C 8;000 Š 0:084 D exp ˆ ˆ 7;000 6000 ˆ ˆ ˆ ˆ ˆ ˆ ˆ < IRValve2 .t D 8;000/ 1 1 ˆ C 8;000 Š 0:143 D exp ˆ ˆ ˆ 7;000 10000 ˆ ˆ ˆ ˆ ˆ ˆ IRPump .t D 8;000/ ˆ ˆ ˆ ˆ ˆ ˆ 1 1 ˆ ˆ C 8;000 Š 0:118: : D exp 10;000 6;000
156
6 Reliability Evaluation and Reliability Prediction Models
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Unreliability vs Time
1.000
Block Reliability vs Time
1.000
Unreliability
Reliability Diagram1 System Pump Valve 1 Valve 2
0.800
0.800
0.600
0.600
Reliability, R(t)
Unreliability, F(t)=1-R(t)
Diagram1 System Pump Valve 1 Valve 2
0.400
0.400
0.200
0.200
0.000 0.000
10000.000
20000.000
30000.000
40000.000
0.000 0.000
50000.000
10000.000
20000.000
30000.000
40000.000
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Probability Density Function
Block Failure Rate vs Time
5.000E-4
5.000E-4
Pdf
Failure Rate
Diagram1 System Pump Valve 1 Valve 2
Diagram1 System Pump Valve 1 Valve 2
4.000E-4
Failure Rate, f(t)/R(t)
4.000E-4
f(t)
3.000E-4
2.000E-4
1.000E-4
0.000 0.000
50000.000
Time, (t)
Time, (t)
3.000E-4
2.000E-4
1.000E-4
10000.000
20000.000
30000.000
40000.000
0.000 0.000
50000.000
Time, (t)
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
®
Fig. 6.20 Serial configuration, exponential distributions. F .t /, R.t /, f .t /, and .t /. ReliaSoft software
Figure 6.21 presents the values of the reliability importance IRi .t/ for different values of t, while Fig. 6.22 presents the reliability importance for t D 4;000 h and t D 8;000 h. The most critical component is Valve2 . Every graph reported in these figures was obtained using ReliaSoft® software.
Figure 6.23 presents the trend of the system’s probability function F .t/, reliability R.t/, density function f .t/, and failure rate .t/ compared with the trends of the three components involved. Figures 6.24 and 6.25 present the results of the reliability importance evaluation for the components of the serial block diagram.
6.4.1.2 Mix of Probability Distributions of Components’ ttf, Nonrepairable Components
6.4.1.3 Repairable Components and Exponential Distributions of ttf and ttr Random Variables
Figures 6.23–6.25 illustrate the results obtained by assuming the following distributions of the blocks’ ttf in Fig. 6.19:
Now every component in Fig. 6.18 is supposed to be repairable under corrective actions, and the probability distributions of the random variables ttf and time to repair (ttr) are assumed to be exponential. In particular, the values of MTTF and MTTR are:
• Valve1 . Exponential distribution, MTTFValve1 D 10;000 h; • Valve2 . Normal distribution, MTTFValve2 D 6;000 h and standard deviation of ttf equal to 100 h; • Pump. Weibull distribution, scale parameter ˛ D 7;000 h, and shape parameter ˇ D 1:5.
• • • •
MTTFValve1 D 10;000 h; MTTFValve2 D 6;000 h; MTTFPump D 7;000 h; MTTRPump D MTTRValve2 D MTTRValve1 D 100 h.
6.4 Serial Configuration
157
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Reliability Importance vs Time
1.000
Importance Diagram1 Valve 1 Pump Valve 2
Reliability Importance Value
0.800
0.601
0.401
0.202
0.002 0.000
4000.000
8000.000
12000.000
16000.000
20000.000
Time, (t)
Fig. 6.21 Serial configuration, reliability importance of components within the system. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Static Reliability Importance 0.379
Static Reliability Importance 0.143
Reliability 100%
Reliability 100%
50%
50%
0.115 0% 3 Item(s)
0.227
0.151
0.076
0.000
Reliability Importance Value
Reliability Importance Value
0.303
0% 3 Item(s)
0.086
0.057
0.029
Valve 2
Pump
Valve 1
Time = 4000
0.000
Valve 2
Pump
Valve 1
Time = 8000
Fig. 6.22 Serial configuration, reliability importance of components within the system. t D 4;000 h and t D 8;000 h. ReliaSoft® software
Figure 6.26, obtained by the application of the Monte Carlo simulation analysis of the serial system, illustrates the state diagram, i. e., the up/down diagram, reporting the state of the components and of the system for different values of time t. The system is failing when a generic component fails, i. e., it passes from the state of function to the state of failure. The failure and repair events are random because of the assumption of exponential distributions of ttf and ttr. Figures 6.27 and 6.28 present some other significant results obtained by the simulation analysis. Fig-
ure 6.27 compares the value of point reliability R.t/ by assuming nonrepairable components and point availability A.t/ of the system made of repairable components. In particular A.t/ is the probability that the system is up at time t. In order to obtain this value at t , A.t /, a special counter is utilized during the simulation analysis: this counter is incremented by one every time the system is up at t . Thus, A.t / is the number of times the system is up at t divided by the number of simulation runs executed in the dynamic analysis. Similarly R.t / is the number of times the system is
158
6 Reliability Evaluation and Reliability Prediction Models
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Unreliability vs Time
Block Reliability vs Time
1.000
1.000
Unreliability
Reliability
Diagram1 System Pump Valve 1 Valve 2
Diagram1 System Pump Valve 1 Valve 2
0.800
Reliability, R(t)
Unreliability, F(t)=1-R(t)
0.800
0.600
0.600
0.400
0.400
0.200
0.200
0.000 0.000
10000.000
20000.000
30000.000
40000.000
0.000 0.000
50000.000
10000.000
20000.000
30000.000
40000.000
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Probability Density Function
Block Failure Rate vs Time
5.000E-4
2.000E+64
Pdf
Failure Rate Diagram1 System Pump Valve 1 Valve 2
Diagram1 System Pump Valve 1 Valve 2
1.600E+64
Failure Rate, f(t)/R(t)
4.000E-4
f(t)
3.000E-4
2.000E-4
1.200E+64
8.000E+63
4.000E+63
1.000E-4
0.000 0.000
10000.000
20000.000
30000.000
40000.000
50000.000
0.000 0.000
10000.000
20000.000
Time, (t)
30000.000
Time, (t)
Fig. 6.23 Serial configuration,mix of distributions. F .t /, R.t /, f .t /, and .t /. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
Reliability Importance vs Time 1.000
Importance Diagram1 Valve 1 Valve 2 Pump
Reliability Importance Value
0.800
0.600
0.400
0.200
0.000 0.000
50000.000
Time, (t)
Time, (t)
4000.000
8000.000
12000.000
16000.000
20000.000
Time, (t)
Fig. 6.24 Serial configuration. Reliability importance of components within the system. ReliaSoft® software
40000.000
50000.000
6.4 Serial Configuration
159 ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Static Reliability Importance
0.670
Static Reliability Importance
0.449
Reliability 100%
Reliability 100%
50%
50% 0.359
0% 3 Item(s)
0.402
0.268
0.134
Reliability Importance Value
Reliability Importance Value
0.536
0% 3 Item(s)
0.270
0.180
0.090
0.000
Pump
Valve 1
0.000
Valve 2
Pump
Valve 1
Time = 4000
Valve 2
Time = 8000
Fig. 6.25 Serial configuration. Reliability importance of components within the system. t D 4;000 h and t D 8;000 h. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Up/Down
System Up/Down
State
State
Operating Time Time Under Repair
Operating Time Time Under Repair
Valve 2
Valve 1
System
Pump
System
0.000
3000.000
6000.000
9000.000
12000.000
15000.000
Time, (t)
0.000
1400.000
2800.000
4200.000
5600.000
7000.000
Time, (t)
Fig. 6.26 Serial configuration. Repairable components, simulation analysis. State diagram of the system. ReliaSoft® software
up at t divided by the number of simulation runs executed in the dynamic analysis and given the basic hypothesis of nonrepairable components/systems. Figure 6.28 presents the trend of the so-called mean availability defined by Eq. 5.78: N D 1 A.t/ t
Zt A.x/ dx; 0
where A.t/1 is the point availability in t. 1
The theoretical definition of A.t / is the following: Zt A.t / D R.t / C
R.t x/m.x/ dx;
The following trends and measures are also the result of the average value quantified among all simulation runs. In particular, Fig. 6.29 presents the number of failures NF.t/ for the system, obtained by the application of the Monte Carlo simulation. Figure 6.30 presents the number of failures for t D 15;000 h. The following chapter introduces an analytical and effective expression for the determination of the expected value of the number of failures. This expression is very useful in the so-called quantitative evaluation of the reliability and availability of a complex system by the application of fault tree analysis. This numerical example is the opportunity to introduce the downing event criticality index (DECI) defined as follows:
0
where m.x/ is the renewal density function illustrated in Chap. 9 discussing the renewal process and maintenance strategies.
DECIi D
component.i /DE ; ALLDE
(6.58)
160
6 Reliability Evaluation and Reliability Prediction Models
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Availability and Reliability vs Time 1.000
Diagram1 Point Availability Line Point Reliability Line
A(t), R(t)
0.800
0.600
0.400
0.200
0.000 0.000
3000.000
6000.000
9000.000
12000.000
15000.000
Time, (t)
Fig. 6.27 Repairable components. Availability and reliability, simulation analysis. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
Mean Availability
1.000
Mean Availability Diagram1 Mean Availability
Mean Availability(t)
0.980
0.960
0.940
0.920
0.900 0.000
3000.000
6000.000
9000.000
12000.000
15000.000
Time, (t)
Fig. 6.28 Repairable components, system availability. ReliaSoft® software
where component.i /DE is the number of downing events for the system caused by component i and ALLDE is the total number of downing events for the system. Figure 6.31 shows the values obtained by the application of the simulation analysis. Because of the sys-
tem serial configuration, X DECIi D DECIValve1 C DECIValve2 C DECIPump i
D 24:409% C 40:293% C 35:298% D 100%:
6.5 Parallel Configuration ReliaSoft BlockSim 7 - www.ReliaSoft.com
161 System Failures
7.000
System Failures Diagram1 System Failures
System Failures(t)
5.600
4.200
2.800
1.400
0.000 0.000
3000.000
6000.000
9000.000
12000.000
15000.000
Time, (t)
Fig. 6.29 Serial system, simulation analysis. NF.t /. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Expected Failures
2.420
RS FCI 100%
50% 1.936 0% 3 Item(s)
1.452
0.968
0.484
0.000
Valve 2
Pump
Valve 1
Fig. 6.30 Serial system, simulation analysis. Component’s NF.t D 15;000 h/. ReliaSoft® software
6.5 Parallel Configuration Figure 6.32 presents the parallel reliability block diagram of a system. This is the so-called fully redundant system, where all units must fail for the whole system to fail. In the case of independent components (i. e., the failure of a single component does not affect the reliability of the other components), the system reliability
is expressed as RS .T / D 1 FS .T / D 1 Œ1 R1 .T / : : : Œ1 Rn .T / D1
n Y i D1
Œ1 Ri .T / D
n a i D1
Ri .T /; (6.59)
162
6 Reliability Evaluation and Reliability Prediction Models
ReliaSoft BlockSim 7 - www.ReliaSoft.com
RS DECI
40.293
Availability 100%
50% 32.234 0% 3 Item(s)
24.176
16.117
8.059
0.000 Valve 2
Pump
Valve 1
Fig. 6.31 Serial system, downing event criticality index (DECI). ReliaSoft® software
From Eq. 6.59 C1
RS .T / D 1 D
Ci
n a
n Y
1 e
i D1 RT 0 i .t / dt
e
RT 0
i .t / dt
;
(6.60)
i D1
where i is the failure rate for the i th component. In a fully redundant parallel system the unconditional failure rate is
Cn
Fig. 6.32 Parallel reliability configuration
fS .t/ D
n Y X fi .t/ Œ1 Rj .t/ : i D1
where FS .T / is the system unreliability function for the time interval T , Ri .T / is the reliability of the i th component for the time interval T , and n is the number of components in parallel configurations. n a i D1
pi 1
n Y
.1 pi /
i D1
Its value depends on the generic component i when the others (j ¤ i ) are supposed to be in the state of failure. By Eqs. 6.60 and 6.61, the failure rate of the system is S .t/ D
fS .t/ RS .t/ Pn
Œ1 Rj .t/ D : Rt Rt
` Q 1 niD1 1 e 0 i .x/ dx D niD1 e 0 i .x/ dx (6.62) i D1
In this case the whole system is able to function even if only one component is correctly functioning, i. e., the system fails only if all the components fail.
(6.61)
j ¤i
fi .t/
Q
j ¤i
6.5 Parallel Configuration
163
Reliability – parallel configuration 1.2 1
RS (t)
0.8
n=1 n=8 n=15 n=22
0.6 0.4 0.2 0 0
5
10
15 t
20
25
30
where fS .t/ is the failure probability distribution, i. e., the probability distribution of the time to failure of the system. Figures 6.33 and 6.34 compare the trend of the system reliability RS .t/ and the ratio S for different numbers of components in a parallel configuration when failure rates are constant and equal to . In order to increase the reliability of a parallel redundant configuration system, it is necessary to improve the reliability of the component with the highest value.
Fig. 6.33 System reliability RS .t / for different numbers of components in a parallel configuration
6.5.1 Numerical Example – Parallel Configuration
Failure rate analysis – parallel configurations
1
λS/λ
0.8 n=1
0.6
n=8
Figure 6.35 presents a block diagram of a piping system made of three redundant parallel pumps: Pump1 , Pump2 , and Pump3 .
n=15 0.4
n=22
6.5.1.1 Exponential Distributions of Components’ ttf, Nonrepairable Components
0.2
0 0
5
10
15
20
25
30
t
Fig. 6.34 S = for different numbers of components in a parallel configuration
When the generic failure rate is constant and equal to (i D 8i D 1; : : : ; n), Eq. 6.60 assumes the following special configuration: RS .T / D e
RT 0
S .t / dt
• MTTFPump1 D 10;000 h; • MTTFPump2 D 6;000 h; • MTTFPump3 D 7;000 h.
D 1 .1 eT /n ; (6.63)
where S .t/ is the failure rate for the system. In this special case of constant failure rate for every component, the system failure rate does not assume constant values. As a consequence, the combination of components whose failure behavior is random does not guarantee constant system failure rates, as seen in Eq. 6.62: S .t/ D
All components in Fig. 6.35 are supposed to be not repairable and the probability distributions of the ttf random variables, assumed to be exponential, are based on the following assumptions:
n et .1 et /n1 fS .t/ D ; (6.64) RS .t/ 1 .1 et /n
Pump1
Pump2
Pump3
Fig. 6.35 Block diagram parallel system, piping system
164
6 Reliability Evaluation and Reliability Prediction Models
By Eq. 6.60 the reliability RS .t/ of the parallel system is RS .t/ D
n a
Ri .t/
i D1
D 1 Œ1 RPump1 .t/Œ1 RPump2 .t/Œ1 RPump3 .t/ D R1 .t/ C R2 .t/ C R3 .t/ R1 .t/R2 .t/ R2 .t/R3 .t/ R1 .t/R3 .t/ t t C exp Š exp 10;000 6;000 t C exp exp.2:67 104 t/ 7;000 exp.2:43 104 t/ exp.3:10 104 t /: Considering two values for the mission time T , T D 4;000 h and T D 8;000 h, respectively, the values of the system reliability are RS .T D 4;000/ Š 0:737; RS .T D 8;000/ Š 0:687: By Eq. 6.61 the unconditional failure rate fS .t/ of the parallel system is fS .t/ D
n Y X fi .t/ Œ1 Rj .t/ i D1
j ¤i
D f1 .t/ C f2 .t/ C f3 .t/ C f1 .t/R2 .t/R3 .t/ C f2 .t/R1 .t/R3 .t/ C f3 .t/R1 .t/R2 .t/ f1 .t/ŒR2 .t/ C R3 .t/ f2 .t/ŒR3 .t/ C R1 .t/ f3 .t/ŒR2 .t/ C R1 .t/: Similarly, S .t/ is given by S .t/ D
n Y X i .t/ Œ1 Rj .t/ i D1
j ¤i
D 1 .t/ C 2 .t/ C 3 .t/ C 1 .t/R2 .t/R3 .t/
f .t/, and failure rate .t/ compared with the trends of the components involved (i. e., three pumps). The results illustrated in Fig. 6.36 and related to a parallel configuration of the system can be directly compared with those reported in Fig. 6.20 and related to the same components in a serial configuration. As a consequence, comparing the results obtained, in terms of system reliability, the parallel configuration is much more reliable than the serial one. Figure 6.37 presents the values of the reliability importance IRi .t/for different values of t, while Fig. 6.38 presents the reliability importance for t D 4;000 h and t D 10;000 h. Every graph shown in these figures was obtained with ReliaSoft® software:The most critical component is Pump1 because it is the most reliable one; in other words it is convenient to improve it and further increase the values of reliability.
6.5.1.2 Mix of Probability Distributions of Components’ ttf, Nonrepairable Components Figures 6.39–6.41 illustrate the results obtained by assuming the following distributions of the blocks’ ttf in Fig. 6.35: • Pump1 . Exponential distribution, MTTFPump1 D 10;000 h; D • Pump2 . Normal distribution, MTTFPump2 6;000 h, and standard deviation of ttf 100 h. • Pump3 . Weibull distribution, scale parameter ˛ D 7;000 h, and shape parameter ˇ D 1:5. Figure 6.39 presents the trend of the system’s probability function F .t/, reliability R.t/, density function f .t/, and failure rate .t/ compared with the trends of the three components involved (i. e., blocks). Figures 6.40 and 6.41 present the results of the reliability importance evaluation for the components of the parallel block diagram.
C 2 .t/R1 .t/R3 .t/ C 3 .t/R1 .t/R2 .t/ 1 .t/ŒR2 .t/ C R3 .t/ 2 .t/ŒR3 .t/ C R1 .t/ 3 .t/ŒR2 .t/ C R1 .t/: Figure 6.36 presents the trend of the system’s probability function F .t/, reliability R.t/, density function
6.5.1.3 Repairable Components and Exponential Distributions of ttf and ttr Random Variables In this case every component in the parallel system in Fig. 6.35 is supposed to be repairable under corrective actions and the probability distributions of random
6.5 Parallel Configuration ReliaSoft BlockSim 7 - www.ReliaSoft.com
165 ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Unreliability vs Time
1.000
Block Reliability vs Time
1.000
Unreliability
Reliability Diagram1
Diagram1
System Pump 1 Pump 2 Pump 3
System Pump 1 Pump 2 Pump 3
0.800
Reliability, R(t)
Unreliability, F(t)=1-R(t)
0.800
0.600
0.600
0.400
0.400
0.200
0.200
0.000 0.000
10000.000
20000.000
30000.000
40000.000
0.000 0.000
50000.000
10000.000
20000.000
Time, (t) ReliaSoft BlockSim 7 - www.ReliaSoft.com
30000.000
40000.000
50000.000
Time, (t) ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Probability Density Function
2.000E-4
Block Failure Rate vs Time
2.000E-4
Pdf
Failure Rate
Diagram1
Diagram1
System Pump 1 Pump 2 Pump 3
System Pump 1 Pump 2 Pump 3
1.600E-4
Failure Rate, f(t)/R(t)
1.600E-4
f(t)
1.200E-4
8.000E-5
1.200E-4
8.000E-5
4.000E-5
-3.000E-20 0.000
4.000E-5
10000.000
20000.000
30000.000
40000.000
50000.000
-3.000E-20 0.000
10000.000
20000.000
30000.000
Time, (t)
Time, (t)
Fig. 6.36 Parallel system, exponential distributors. F .t /, R.t /, f .t /, and (t /. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
Reliability Importance vs Time
0.999
Importance Diagram1 Starting Block Ending Block Pump 3 Pump 2 Pump 1
Reliability Importance Value
0.799
0.599
0.400
0.200
0.000 0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 6.37 Parallel system. Reliability importance of components within the system. ReliaSoft® software
40000.000
50000.000
166
6 Reliability Evaluation and Reliability Prediction Models
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Static Reliability Importance
0.212
Static Reliability Importance
0.617
Reliability
Reliability
100%
100%
50%
0.493 0% 3 Item(s)
0.127
0.085
Reliability Importance Value
Reliability Importance Value
50%
0.169
0.042
0% 3 Item(s)
0.370
0.247
0.123
0.000
0.000 Pump 1
Pump 3
Pump 2
Pump 1
Time = 4000
Pump 3
Pump 2
Time = 10000
Fig. 6.38 Parallel system. Reliability importance of components within the system. t D 4;000 h and t D 10;000 h. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Unreliability vs Time
1.000
Block Reliability vs Time
1.000
Unreliability
Reliability Diagram1
Diagram1
System Pump 1 Pump 2 Pump 3
System Pump 1 Pump 2 Pump 3
0.800
Reliability, R(t)
Unreliability, F(t)=1-R(t)
0.800
0.600
0.400
0.200
0.000 0.000
0.600
0.400
0.200
10000.000
20000.000
30000.000
40000.000
0.000 0.000
50000.000
10000.000
Time, (t) ReliaSoft BlockSim 7 - www.ReliaSoft.com
20000.000
30000.000
40000.000
50000.000
Time, (t) ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Probability Density Function
2.000E-4
Block Failure Rate vs Time
1.000E+27
Pdf
Failure Rate
Diagram1
Diagram1
System Pump 1 Pump 2 Pump 3
System Pump 1 Pump 2 Pump 3
8.000E+26
Failure Rate, f(t)/R(t)
1.600E-4
f(t)
1.200E-4
8.000E-5
4.000E+26
2.000E+26
4.000E-5
0.000 0.000
6.000E+26
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
0.000 0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 6.39 Parallel system, mix of distributions. F .t /, R.t /, f .t /, and .t /. ReliaSoft® software
variables ttf and ttr are assumed to be exponential. In particular the values of MTTF and MTTR are the following: • MTTFPump1 D 10;000 h; • MTTFPump2 D 6;000 h; • MTTFPump3 D 7;000 h;
• MTTRPump1 D MTTRPump2 D MTTRPump3 D 100 h. Figure 6.42 illustrates the state diagram, reporting the state of the components and of the system for different values of time t obtained by the application of the Monte Carlo simulation analysis. In the case of “full
6.5 Parallel Configuration
167
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Reliability Importance vs Time
1.000
Importance Diagram1 Starting Block Ending Block Pump 1 Pump 2 Pump 3
Reliability Importance Value
0.800
0.600
0.400
0.200
0.000 0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 6.40 Parallel system. Reliability importance of components within the system. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Static Reliability Importance
Static Reliability Importance
0.351
0.819
Reliability
Reliability 100%
100%
0% 3 Item(s)
0.210
0.140
0.070
0.000
50%
Reliability Importance Value
Reliability Importance Value
50%
0.281
0.655 0% 3 Item(s)
0.491
0.327
0.164
Pump 1
Pump 3
Pump 2
Time = 4000
0.000
Pump 1
Pump 3
Pump 2
Time = 10000
Fig. 6.41 Parallel system. Reliability importance of components within the system. t D 4;000 h and t D 10;000 h. ReliaSoft® software
redundancy” the system fails if all the components fail. In other words the number of expected failures for the system is close to 0. In fact if the components introduced are used as parts of a redundant parallel system, the value of the system availability is very close to 1 as shown in the Fig. 6.43 reporting the simulated analysis conducted by ReliaSoft® software. If the value of MTTR passes from 100 to 600 h (C500%), the trend of the state diagram (the so-called
up/down diagram) related to the three components and to the system changes as illustrated in Fig. 6.44. This simulated analysis is called “B” in order to distinguish it from previous one, called “A,” which relates to MTTR equal to 100 h. By the analysis of the simulated scenario, in configuration B the system is always in the state of function (up state). The system availability versus reliability diagram changes as illustrated in Fig. 6.45.
168
6 Reliability Evaluation and Reliability Prediction Models
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Up/Down State Operating Time Time Under Repair
Pump 3
Pump 2
Pump 1
System
0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 6.42 Parallel configuration. Repairable components, simulation analysis A. State diagram of the system. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
Availability and Reliability vs Time
1.000
Diagram1 Point Availability Line Point Reliability Line
0.999
A(t), R(t)
0.999
0.998
0.998
0.997 0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 6.43 Repairable components, simulation A. Availability and reliability. ReliaSoft® software
6.6 Combined Series–Parallel Systems This reliability configuration is composed of a series of parallel systems, as illustrated in Fig. 6.46. A similar reliability system configuration can be obtained by using a pool of components in serial configuration
and with each component repeated more than once. In particular, Fig. 6.46 presents m 1 copies (i. e., units) for the generic component Cij (i D 1; : : : ; m and j D 1; : : : ; n). The basic hypothesis is that the “standby copy” Cij only functions and takes part in system operation if the primary component unit fails.
6.6 Combined Series–Parallel Systems
169
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Up/Down State
Operating Time Time Under Repair Pump 3
Pump 2
Pump 1
System
0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 6.44 Parallel configuration. Repairable components, simulation analysis B. State diagram of the system. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
Availability and Reliability vs Time 1.000
Diagram1 Point Availability Line Point Reliability Line
A(t), R(t)
0.976
0.952
0.928
0.904
0.880 0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 6.45 Repairable components, simulation B. Availability and reliability. ReliaSoft® software
Consequently, system reliability is based on Eqs. 6.51 and 6.59: RSP .t/ D
Y m n m n a Y Y 1 Œ1 rij .t/ D rij .t/; j D1
i D1
j D1 i D1
(6.65)
where rij .t/ is the reliability of the i th copy of the j th component.
170
6 Reliability Evaluation and Reliability Prediction Models
1–m
1–n
C1n
C1j
C11
Ci1
Cij
Cin
Cm1
Cmj
Cmn
Fig. 6.46 Series–parallel configuration
6.7 Combined Parallel–Series Systems
6.8 k-out-of-n Redundancy
This reliability configuration differs from those previously described because the redundancy is applied to the whole series of components: several independent series of components are in a parallel reliability configuration. Should one series fail because at least one component of the series fails, a redundant series starts to operate and takes part in the system function. Figure 6.47 illustrates the reliability block diagram of a parallel–series configuration. The system reliability is a n m n m Y Y Y 1 RPS .t/ D 1 Œrij .t/ D rij .t/;
This configuration of a reliability system is a generalization of a parallel redundant system with a requirement for k out of n (obviously k n) identical and independent components to function in order for the whole system to function. An example is represented by a supply system for a foundry furnace: it is based on five conveyors, three of which must function in order to guarantee the right level of service to the furnace. When k D 1 the previously discussed complete redundancy occurs, while if k D n the system is made up of n components in series. The number of configurations for k functioning components of the available n components is ! n nŠ : (6.67) D kŠ.n k/Š k
i D1
j D1
i D1 j D1
(6.66) where rij .t/ is the reliability of the j th component in the i th chain of the parallel system.
1–n
C11
C1j
C1n
Cij
Cin
Cm1
Cmj
Cmn
1–m
Ci1
Fig. 6.47 Parallel–series configuration
6.8 k-out-of-n Redundancy
171
i Dk
Table 6.13 Successful configurations in two-out-of-three redundancy Reliability Rj .t /
Successful configurations (i. e., functioning components)
RA RB RC RD
A: 1, 2, 3 B: 1, 2 C: 2, 3 D: 1, 3
D r1 r2 r3 D r1 r2 .1 r3 / D r2 r3 .1 r1 / D r1 r3 .1 r2 /
Ri .t / is the reliability of the i th component. 1.2 1 0.8 R(t)
For a better understanding of the so-called partial redundancy, Table 6.13 lists the reliability values of a system composed of three independent components (n D 3) if at least two (k D 2) of them have to function. In particular, the reliability of the number of different configurations2 of the operational system is quantified. In agreement with Eq. 6.67, the number of successful configuration is 4 when k D 2 (successful configurations B, C, and D in Table 6.13) and 1 when k D 3 (successful configuration A in Table 6.13). The generic expression of reliability for a k-out-ofn system composed of identical and independent components is ! n X n (6.68) Œr.t/i Œ1 r.t/ni ; Rk=n .t/ D i
0.6 r(t)
0.4
Parallel (2 comp.) Parallel (3 comp.)
where r.t/ is the reliability function for each component of the system. The following quantifies the reliability of the system in the case of two-out-of-three redundancy, where A, B, C, and D refer to the successful configurations of Table 6.13: ! X 3 2 3 Rj D r .t/ C R2=3 .t/ D r .t/Œ1 r.t/ 2 j DA;B;C;D
D 3r 2 .t/ 2r 3 .t/:
(6.69)
If the failure distribution is exponential, Eq. 6.68 is quantified by the following: ! n X n i t .1 et /ni : (6.70) e Rk=n .t/ D i i Dk
The value of MTTF in the case of an exponential distribution is Z1 MTTFk=n D
1X1 : i
(6.71)
i Dk
Consequently, this is the MTTF in the special case of two-out-of-three redundancy: MTTF2=3 D
.3 e2t 2 e3t / dt D
0
2
0 0
0.2
0.4
0.6
0.8
1
1.2
r(t) - component
Fig. 6.48 Reliability of redundancy systems
Figure 6.48 presents the reliability for different redundancy systems: parallel systems of two and three fully redundant and independent components, and the twoout-of-three system.
6.8.1 Numerical Examples, k-out-of-n Redundancy Now a few numerical examples illustrate the application of the previously introduced analytical model for k-out-of-n redundancy both for nonrepairable and for repairable components.
n
Rk=n .t/ dt D 0
Z1
2-out-of-3 system
0.2
Called “successful configurations”
5 : (6.72) 6
6.8.1.1 k-out-of-n Redundancy, Exponential Distributions and Nonrepairable Components Consider the previously illustrated parallel system made of three nonrepairable pumps whose ttf are supposed to be exponentially distributed with (see Fig. 6.35): • MTTFPump1 D 10;000 h; • MTTFPump2 D 6;000 h; • MTTFPump3 D 7;000 h.
172
6 Reliability Evaluation and Reliability Prediction Models
Now in the case k D 2, i. e., two of three working pumps are required, the reliability of the system is
As a consequence, the expression for the unconditional failure rate fS .t/ is
RS .t/ D RPump1 .t/RPump2 .t/ C RPump2 .t/RPump3 .t/
fS .t/ D fPump3 .t/RPump2 .t/ C fPump2 .t/RPump3 .t/
C RPump1 .t/RPump3 .t/
C fPump3 .t/RPump1 .t/ C fPump1 .t/RPump3 .t/
2RPump1 .t/RPump2 .t/RPump3 .t/ 1 1 C D exp t 10;000 6;000 1 1 C exp t C 7;000 6;000 1 1 C exp t C 7;000 10;000 1 1 1 2 exp t C C ; 7;000 6;000 10;000
where t is in hours. In particular, in the case t D 4;000 h and t D 10;000 h, R.t D 4;000 h/ D 0:624; R.t D 10;000 h/ D 0:170: The analytical expression of the failure rate S .t/ of the system is fS .t/ RS .t/ rate.one pump is working, one is failing/ D P .exactly two of three pumps are working/ fPump3 .t/RPump2 .t/ CfPump2 .t/RPump3 .t/ C fPump3 .t/RPump1 .t/ D RPump1 .t/RPump2 .t/ C RPump1 .t/RPump3 .t/ CRPump3 .t/RPump2 .t/ 2RPump1 .t/RPump2 .t/RPump3 .t/
S2=3 .t/ D
C
RPump1 .t/RPump2 .t/ C RPump1 .t/RPump3 .t/ CRPump3 .t/RPump2 .t/ 2RPump1 .t/RPump2 .t/RPump3 .t/
C fPump2 .t/RPump1 .t/RPump3 .t/ C fPump3 .t/RPump1 .t/RPump2 .t/: Figure 6.49 presents the trend of the system’s probability function F .t/, reliability R.t/, density function f .t/, and failure rate .t/ compared with the trends of the components involved (i. e., blocks). Figures 6.50 and 6.51 present the results of the reliability importance evaluation for the components of the two-out-of-three block diagram.
6.8.1.2 k-out-of-n Redundancy, Nonrepairable Components and Mix of Probability Distributions of Components’ ttf Consider the reliability block diagram of a two-out-ofthree system as illustrated in Fig. 6.52 with the following assumptions:
Figure 6.53 presents the trend of the system’s probability function F .t/, reliability R.t/, density function f .t/, and failure rate .t/ compared with the trends of the components involved (i. e., blocks). Figures 6.54 and 6.55 present the results of the reliability importance evaluation for the components of the two-out-of-three block diagram.
RPump1 .t/RPump2 .t/ C RPump1 .t/RPump3 .t/ CRPump3 .t/RPump2 .t/ 2RPump1 .t/RPump2 .t/RPump3 .t/
2
2ŒfPump1 .t/RPump2 .t/RPump3 .t/
• Pump1 . Exponential distribution, MTTFPump1 D 10;000 h; • Pump2 . Normal distribution, MTTFPump2 D 6;000 h, and standard deviation of ttf 100 h; • Pump3 . Weibull distribution, scale parameter ˛ D 7;000 h, and shape parameter ˇ D 1:5.
fPump1 .t/RPump3 .t/ C fPump2 .t/RPump1 .t/ CfPump1 .t/RPump2 .t/
fPump1 .t/ RPump2 .t/RPump3 .t/ CfPump2 .t/RPump1 .t/RPump3 .t/ CfPump3 .t/RPump1 .t/RPump2 .t/
C fPump2 .t/RPump1 .t/ C fPump1 .t/RPump2 .t/
6.8.1.3 Repairable Components and Exponential Distributions of Components’ ttf : Consider the system of three pumps previously introduced. In particular, the system is supposed to be in a state of function if two out of three components are
6.8 k-out-of-n Redundancy ReliaSoft BlockSim 7 - www.ReliaSoft.com
173 ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Unreliability vs Time
1.000
Block Reliability vs Time
1.000
Unreliability
Reliability
Diagram1
Diagram1
System Pump 1 Pump 2 Pump 3
System Pump 1 Pump 2 Pump 3
0.800
Reliability, R(t)
Unreliability, F(t)=1-R(t)
0.800
0.600
0.400
0.200
0.000 0.000
0.600
0.400
0.200
10000.000
20000.000
30000.000
40000.000
0.000 0.000
50000.000
10000.000
20000.000
Time, (t) ReliaSoft BlockSim 7 - www.ReliaSoft.com
30000.000
40000.000
50000.000
Time, (t) ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Probability Density Function
2.000E-4
Block Failure Rate vs Time
3.000E-4
Pdf
Failure Rate
Diagram1
Diagram1
System Pump 1 Pump 2 Pump 3
System Pump 1 Pump 2 Pump 3 2.400E-4
Failure Rate, f(t)/R(t)
1.600E-4
f(t)
1.200E-4
8.000E-5
4.000E-5
0.000 0.000
1.800E-4
1.200E-4
6.000E-5
10000.000
20000.000
30000.000
40000.000
0.000 0.000
50000.000
10000.000
20000.000
Time, (t)
30000.000
Time, (t)
Fig. 6.49 k-out-of-n system, exponential distributions. F .t /, R.t /, f .t /, and .t /. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
Reliability Importance vs Time
0.515
Importance Diagram1 Starting Block Ending Block Pump 2 Pump 1 Pump 3 Node 1
Reliability Importance Value
0.412
0.309
0.206
0.103
0.000 0.000
6000.000
12000.000
18000.000
24000.000
30000.000
Time, (t)
Fig. 6.50 k-out-of-nsystem. Reliability importance of components within the system. ReliaSoft® software
40000.000
50000.000
174
6 Reliability Evaluation and Reliability Prediction Models
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Static Reliability Importance
0.498
Static Reliability Importance
0.431
Reliability
Reliability
100%
100%
50%
50%
0.345 0% 3 Item(s)
0.299
0.199
0.100
0.000
Reliability Importance Value
Reliability Importance Value
0.399
0% 3 Item(s)
0.259
0.172
0.086
Pump 1
Pump 3
0.000
Pump 2
Time = 4000
Pump 2
Pump 3
Pump 1
Time = 10000
Fig. 6.51 k-out-of-n system. Reliability importance of components within the system. t D 4;000 h and t D 10;000 h. ReliaSoft® software
Monte Carlo simulation. Figure 6.59 presents the number of failures for t D 50;000 h. Finally, Fig. 6.60 presents the DECI values obtained for t D 50;000 h.
Pump 1
Starting Block
Pump 2
Node 2_out_of_3
Ending Block
Pump 3
Fig. 6.52 Reliability block diagram, two-out-of-three system
operating properly (see Fig. 6.52). The pumps are supposed to be repairable under corrective actions and the probability distributions of the random variables ttf and ttr are assumed to be exponential. In particular, the values of MTTF and MTTR are: • • • •
MTTFPump1 D 10;000 h; MTTFPump2 D 6;000 h; MTTFPump3 D 7;000 h; MTTRPump1 D MTTRPump2 D MTTRPump3 D 100 h.
Figure 6.56 illustrates the state diagram obtained by the application of the Monte Carlo simulation analysis. It reports the state of the components and of the system for different values of time t. The system availability A.t/ versus reliability R.t/ diagram is illustrated in Fig. 6.57. Figure 6.58 presents the number of failures NF(t/ for the system, obtained by the application of the
6.9 Simple Standby System Standby redundancy configurations consist of items that are inactive and available to be called into service when/if an active item fails. The inactive items are on standby. Standby systems represent a significant and important part of reliability systems: the functioning of several production systems has its foundation on components that are not based on the critical assumption of independency of failures. Simple standby is a redundancy strategy but it differs from those previously discussed (e. g., parallel, k-out-of-n configurations) in that the redundant units, if they do not fail, are always in a state of use. Chapter 8 presents and applies Markov analysis for the determination of reliability in state-dependent complex systems; this section briefly introduces a simple standby system whose reliability can be quantified without introducing Markov analysis. Figure 6.61 presents a parallel configuration of two identical components, one of which must function in order to guarantee the operation of the whole system. As a consequence, only one component is in use, while the second is ready to function in case the first one fails; the third element, SW, switches the activity between the components. When the reliability of the switch is equal to 1 (i. e., Rsw D 1), the value of the reliability system can be
6.9 Simple Standby System
175 ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Failure Rate vs Time
Block Probability Density Function
2.000E+68
0.002
Failure Rate
Pdf
Diagram1
Diagram1 System Pump 1 Pump 2 Pump 3
1.600E+68
0.002
1.200E+68
0.001
f(t)
Failure Rate, f(t)/R(t)
System Pump 1 Pump 2 Pump 3
8.000E+67
8.000E-4
4.000E+67
4.000E-4
0.000 0.000
10000.000
20000.000
30000.000
40000.000
2.000E-223 0.000
50000.000
10000.000
20000.000
Time, (t)
30000.000
40000.000
50000.000
Time, (t) ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Reliability vs Time
Block Unreliability vs Time
1.000
1.000
Reliability
Unreliability Diagram1
Diagram1 System Pump 1 Pump 2 Pump 3
System Pump 1 Pump 2 Pump 3
0.800
Unreliability, F(t)=1-R(t)
Reliability, R(t)
0.800
0.600
0.400
0.400
0.200
0.200
0.000 0.000
0.600
10000.000
20000.000
30000.000
40000.000
50000.000
0.000 0.000
10000.000
20000.000
Time, (t)
30000.000
Time, (t)
Fig. 6.53 k-out-of-n system, mix of distributions. F .t /, R.t /, f .t /, and .t /. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
Reliability Importance vs Time
0.533
Importance Diagram1 Starting Block Ending Block Pump 1 Pump 2 Pump 3
Reliability Importance Value
0.426
0.320
0.213
0.107
0.000 0.000
4000.000
8000.000
12000.000
16000.000
20000.000
Time, (t)
Fig. 6.54 k-out-of-nsystem. Reliability importance of components within the system. ReliaSoft® software
40000.000
50000.000
176
6 Reliability Evaluation and Reliability Prediction Models ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Static Reliability Importance
Static Reliability Importance 0.449
0.416
Reliability
Reliability 100%
100%
50%
50%
0.333 0% 3 Item(s)
0.269
0.180
Reliability Importance Value
Reliability Importance Value
0.359
0% 3 Item(s)
0.249
0.166
0.083
0.090
0.000
Pump 2
Pump 1
0.000
Pump 3
Pump 2
Pump 3
Pump 1
Time = 10000
Time = 4000
Fig. 6.55 k-out-of-n system. Reliability importance of components within the system. t D 4;000 h and t D 10;000 h. ReliaSoft® software
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Up/Down State Operating Time Time Under Repair
Pump 3
Pump 2
Pump 1
System
0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 6.56 Two-out-of-three system. Repairable components, simulation analysis. State diagram of the system. ReliaSoft® software
quantified by RS .t/ D RI .t/ C RII .t/;
These are the equations used to determine the reliability values: (6.73)
where RI .t/ is the reliability of component A and RII .t/ is the probability component A fails, component B starts functioning and is reliable for a period of time equal to t (see Fig. 6.62). Figure 6.62 illustrates the disjoint events modeled by RI .t/ and RII .t/.
RI .t/ D RA .t/; Zt RII .t/ D
fA . /RB .t / d ;
(6.74)
0
where fA . / is the probability density function for component A.
6.9 Simple Standby System ReliaSoft BlockSim 7 - www.ReliaSoft.com
177
Availability and Reliability vs Time
1.000
Diagram1 Point Availability Line Point Reliability Line
A(t), R(t)
0.900
0.800
0.700
0.600
0.500 0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 6.57 Two-out-of-three system. Repairable components, simulation. Availability and reliability. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
System Failures 0.600
System Failures Diagram1 System Failures
System Failures(t)
0.480
0.360
0.240
0.120
0.000 0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 6.58 Two-out-of-three system, simulation analysis. NF.t /. ReliaSoft® software
From Eqs. 6.73 and 6.74, the reliability of the system is RS .t/ D e
Rt 0
Zt A .x/ dx
C
R t
fA . / e 0 B .x/ dx d ;
0
(6.75)
where A .t/ is the failure rate of component A and B .t/ is the failure rate of component B. In Eq. 6.75 it is assumed that the standby component B does not fail during its waiting time, i. e., the operating time of component A: component B is as good as new at time and is so when it starts to function.
178
6 Reliability Evaluation and Reliability Prediction Models
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Expected Failures
8.352
RS FCI 100%
50%
6.682
0% 3 Item(s)
5.011
3.341
1.670
0.000 Pump 2
Pump 3
Pump 1
Fig. 6.59 Two-out-of-three system, simulation analysis. Component’s NF.t D 50;000 h/. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
RS DECI 37.667
Availability 100%
50%
30.134
0% 3 Item(s)
22.600
15.067
7.533
0.000
Pump 2
Pump 3
Pump 1
Fig. 6.60 Two-out-of-three system. Downing event criticality index, t D 50;000 h. ReliaSoft® software
If A .t/ D B .t/ D , Eq. 6.75 can be modified as RS .t/ D e
t
Z1
Zt C
and the MTTF of the system (MTTFS ) is
. e
.t /
e
MTTFS D
/ d
0
0
D et .1 C t/
RS .t/ dt D
(6.76)
2 :
(6.77)
6.9 Simple Standby System
179 1 0.9
A
0.8
R (system, R(SW) = 1, B OK in τ)
0.7 Reliability
SW B
R(comp. A)
0.6 R (system, R(SW) = 1 and B failing)
0.5 R (system, R(SW) = 1, B ok in τ, λA = λB, aer = 0.002) R (parallel A & B, λA = λB)
0.4 0.3
Fig. 6.61 Simple standby system
0.2 0.1 0 0
Component A functions from t0 to t
I
1000
1500
2000
2500 3000 units of me
3500
4000
4500
5000
Fig. 6.63 Standby system, R.SW/ D 1
t0=0 II
500
t
Component A functions from t0 to τ t0=0
τ
Component B functions from τ to t t
Fig. 6.62 Disjoint events in a simple standby system
The following equation quantifies the reliability RII .t/ should the switch component be subject to failures with a failure rate of sw .t/: Zt RII .t/ D
R R t
fA . / e 0 sw .x/ dx e 0 B .x/ dx d :
0
(6.78) If component A is subjected to random failures with failure rate A and component B is subjected to a random failure with failure rate B;before during the “waiting state” and failure rate B;after during the “use state,” the expression for the system reliability function is
RS .t/ D e
A t
Zt C 0
D eA t C A
A eA eB;before e
B;after .t /
d
B;after t
e B;after A B;before
. et .A CB;before B;after / 1/:
(6.79)
Figure 6.63 compares the values of reliability obtained in the case of the presence of a perfect switch, i. e., R.SW/ D 1, A D 0:002 (units of time)1 , B;after D 1=2A D 0:001 (units of time)1 , and B;before D 0:0005 (units of time)1 . In particular, if t D 1;000 units of time, the following values of reliability can be obtained:
• R.A/ D 0:135, reliability of component A without the standby; • RŒsystem; R.SW/ D 1; component B OK in D 0:600, the reliability of the standby system if component B is not subjected to failures during the waiting state; • RŒsystem; R.SW/ D 1; component B failing D 0:516, the reliability of the standby system if component B is subjected to failures during the waiting state; • RŒsystem; R.SW/ D 1, component B OK in , A D B;after D 0:406, the reliability of the standby system if component B is not subjected to failures during the waiting state and component A is identical to component B, i. e., the failure rate of the component A is equal to failure rate of component B (A D B;after ); • RŒparallel components A and B; A D B D 0:252, the reliability of a parallel system made of two identical components A and B. As a consequence, the introduction of a redundancy based on standby can increase the value of the system reliability up to 300% when compared with the reliability of component A, and up to 140% when compared with the reliability of a parallel system. Figure 6.64 presents the values of the increment of reliability passing from a single component A, i. e., R(A), to a standby system made of two identical components (R1 values), and the values of the increment passing from a redundant system made of two parallel components to the standby configuration (R2 values). Similarly, Fig. 6.65 presents the percentage increment of reliability. It increases when the units of time
180
6 Reliability Evaluation and Reliability Prediction Models
6.9.1 Numerical Example – Time-Dependent Analysis: Standby System
Absolute Reliability Increment 0.4
Reliability Increment
ΔR1 0.3
ΔR2 0.2
0.1
0 0
500
1000
1500
2000
2500
3000
3500
4000
4500
5000
units of me
Fig. 6.64 Absolute reliability increment in a standby system
Reliability Increment (%) Reliability Increment [%]
1000% 900%
ΔR1/R(A)
800%
ΔR2/R(parallel A & B)
700% 600% 500% 400% 300% 200% 100% 0% 0
500
1000 1500
2000
2500 3000 3500 units of me
4000
4500
5000
Fig. 6.65 Reliability increment in a standby system
are incremented. As a consequence, the reliability increment is great for large values of time in terms of percentage but in absolute terms it assumes a maximum value depending on the failure rates of the components in the system. Similarly, in the presence of a switch randomly subjected to failures with failure rate sw , RS .t/ D e
A t
Zt C 0
De
A t
In previous numerical examples and in most industrial applications (cases studies), all the components within the system are supposed to be independent. For example, the failure of component A does not affect the failure of component B. Consider two pumps, Pump1 and Pump2, in a standby redundancy system. For each block of the system the “active” failure distribution is distinguished by the “quiescent” failure distribution. In particular, the quiescent failure distribution refers to the component when it is in standby mode. For a generic component the failure modes during the quiescent mode are generally different from those during the active mode. In the case of identical failure distributions for both quiescent and active modes, the components are in a simple parallel configuration (also called a “hot standby” configuration). When the rate of failure of the standby component is less in quiescent mode than in active mode, then the configuration is called a “warm standby” configuration. Lastly, in a cold standby configuration the rate of failure of the standby component is zero in quiescent mode (i. e., the component cannot fail when in standby). Dealing with standby systems, a switching device to the standby component in the case of failure for the active component is often present. In particular, it is possible for the switch to fail before the active component. If the active component fails and the switch has also failed, then the system cannot be switched to the standby component and it therefore fails.
6.9.1.1 Nonrepairable Components, Exponential Distribution of ttf. Perfect Switch
A eA eswitch eB;before eB;after .t / d
eB;after t C A B;after A B;before switch
. et .A Cswitch CB;before B;after / 1/: (6.80) The previously introduced parameters and models have been applied in the following industrial case study.
Figure 6.66 presents the trend of F .t/, R.t/, f .t/, and .t/ for different values of t, distinguishing and comparing the hot standby system (where both quiescent and active failure distributions are the same – first column in the figure) from the cold standby system (where the rate of failure of the standby component is zero in quiescent mode – second column in the figure). Both systems are supposed to be not repairable. Obviously, as demonstrated by Fig. 6.66, the cold system
6.9 Simple Standby System ReliaSoft BlockSim 7 - www.ReliaSoft.com
181 ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Unreliability vs Time
1.000
Block Unreliability vs Time
1.000
Unreliability
Unreliability
Diagram1 System Stand-by 1 (Pump2 in stand by)
Diagram1 System Stand-by 1 (Pump2 in stand by)
0.800
Unreliability, F(t)=1-R(t)
Unreliability, F(t)=1-R(t)
0.800
0.600
0.400
0.200
0.000 0.000
0.600
0.400
0.200
10000.000
20000.000
30000.000
40000.000
0.000 0.000
50000.000
10000.000
Time, (t) ReliaSoft BlockSim 7 - www.ReliaSoft.com
20000.000
30000.000
40000.000
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Reliability vs Time
1.000
Block Reliability vs Time
1.000
Reliability
Reliability Diagram1 System Stand-by 1 (Pump2 in stand by)
Diagram1 System Stand-by 1 (Pump2 in stand by)
0.800
Reliability, R(t)
Reliability, R(t)
0.800
0.600
0.400
0.600
0.400
0.200
0.200
0.000 0.000
10000.000
20000.000
30000.000
40000.000
0.000 0.000
50000.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Time, (t) ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Probability Density Function 7.000E-5
Block Probability Density Function 5.000E-5
Pdf
Pdf
Diagram1 System Stand-by 1 (Pump2 in stand by)
Diagram1 System Stand-by 1 (Pump2 in stand by)
4.000E-5
4.200E-5
3.000E-5
f(t)
f(t)
5.600E-5
2.800E-5
2.000E-5
1.400E-5
1.000E-5
0.000 0.000
10000.000
20000.000
30000.000
40000.000
0.000 0.000
50000.000
10000.000
Time, (t)
20000.000
30000.000
40000.000
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Failure Rate vs Time
Block Failure Rate vs Time
2.000E-4
1.000E-4
Failure Rate
Failure Rate
Diagram1 System Stand-by 1 (Pump2 in stand by)
Diagram1 System Stand-by 1 (Pump2 in stand by)
8.000E-5
Failure Rate, f(t)/R(t)
1.600E-4
1.200E-4
8.000E-5
4.000E-5
0.000 0.000
50000.000
Time, (t)
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Failure Rate, f(t)/R(t)
50000.000
Time, (t)
6.000E-5
4.000E-5
2.000E-5
10000.000
20000.000
30000.000
Time, (t)
40000.000
50000.000
0.000 0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 6.66 Hot standby versus cold standby. Pump1 active, Pump2 standby. Exponential distribution. Nonrepairable components: F .t /, R.t /, f .t /, .t /. ReliaSoft® software
182
6 Reliability Evaluation and Reliability Prediction Models
is better than the hot one because the standby component is “as good as new” till the switch component, supposed to be perfect, switches the active and the quiescent pumps (i. e., it substitutes the originally active component which fails).
• Pump1 . Exponential distribution, MTTFPump1 D 10;000 h; • Pump2 . Normal distribution, MTTFPump2 D 6;000 h, and standard deviation of ttf 100 h; • Switch. Weibull distribution, scale parameter ˛ D 7;000 h, and shape parameter ˇ D 1:5.
6.9.1.2 Nonrepairable Components, Mix of Probability Distributions of Blocks’ ttf. Not Perfect Switch
Both hot and cold time-dependent systems are supposed to be not repairable.
Similarly to the analysis conducted in the previous section, Fig. 6.67 presents the trend of F , .t/ and f .t/ for different values of t, distinguishing and comparing the hot standby system (where both quiescent and active failure distributions are the same – first column of figure) from the cold standby system (where the rate of failure of the standby component is zero in quiescent mode – second column of figure), and assuming:
6.9.1.3 Nonrepairable Components and Simulation Analysis. Hot Standby System and Switch Perfect
ReliaSoft BlockSim 7 - www.ReliaSoft.com
0.002
The following analysis was conducted with the use of Monte Carlo simulation in order to test the system behavior in accordance with the hot and cold hypotheses. In particular, Fig. 6.68 presents an up/down diagram
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Failure Rate vs Time Block Unreliability vs Time Cold stand by
Block Failure Rate vs Time Block Unreliability vs Time - Cold stand by Failure Rate Unreliability
1,000
3.000E-4 1,000
Failure Rate Unreliability
Diagram1 Diagram1 System System Stand-by 1 (Pump2 in stand by) Stand-by 1 (Pump2 in stand by)
0.002
2.360E-4 0,800
Failure Rate, f(t)/R(t)
Unreliability, F(t)=1-R(t)
Failure Rate, f(t)/R(t)
0,800
0,6000.001
1.720E-4 0,600
8.000E-4 0,400
1.080E-4 0,400
4.000E-4 0,200
0,0000.000 0,000 0.000
4.400E-5 0,200
10000,000 10000.000
20000,000 20000.000
30000,000 30000.000
Time, (t) Time,
ReliaSoft BlockSim 7 - www.ReliaSoft.com
8.000E-4
Diagram1 Diagram1 System System Stand-by 1 (Pump2 in stand Stand-by 1 (Pump2 in stand by)by)
40000,000 40000.000
50000,000 50000.000
-2.000E-5 0,000 0,000 0.000
10000,000 10000.000
20000,000 20000.000
30000,000 30000.000
Time, (t) Time,
(t) ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Reliability vs Time -Density Cold standFunction by Block Probability Pdf Reliability
1,000
1,000 2.000E-4
40000,000 40000.000
Block Reliability vs Time
Block Probability Density Function - Cold stand by Reliability Pdf
Diagram1 Diagram1 System System Stand-by 1 (Pump2 in stand by) Stand-by 1 (Pump2 in stand by)
Diagram1 Diagram1 System System Stand-by 1 (Pump2 in stand by) by) Stand-by 1 (Pump2 in stand
6.400E-4
0,800 1.594E-4
4.800E-4 0,600
0,600 1.188E-4
f(t)
f(t)
0,800
3.200E-4 0,400
0,400 7.820E-5
1.600E-4 0,200
0,200 3.760E-5
0,0000.000 0,000 0.000
50000,000 50000.000
(t)
10000,000 10000.000
20000,000 20000.000
Time, (t)
30000,000 30000.000
Time, (t)
40000,000 40000.000
50000,000 50000.000
0,000 -3.000E-6 0,000 0.000
10000,000 10000.000
20000,000 20000.000
30000,000 30000.000
40000,000 40000.000
50000,000 50000.000
Time, (t)
Time, (t)
Fig. 6.67 Hot standby versus cold standby. Pump1 active, Pump2 standby. Nonrepairable components: F .t /, R.t /, f .t /, .t /. Switch not perfect. ReliaSoft® software
6.10 Production System Efficiency
183
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Up/Down
Stand-by 1 (Pump2 in stand by)_Switch
Stand-by 1 (Pump2 in stand by)
Pump 2
Pump 1
System
0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 6.68 Hot standby, simulation analysis. Switch perfect. ReliaSoft® software
related to the nonrepairable hot standby system made of pumps Pump1 and Pump2 , and a “perfect” switch component, i. e., a component which does not fail and it is not subject to failures. From Fig. 6.68, the standby system fails when the active Pump1 fails because nonrepairable Pump2 fails first, i. e., during the standby period. 6.9.1.4 Nonrepairable Components and Simulation Analysis, Cold standby system Figure 6.69 presents the up/down diagram obtained by a simulation analysis. It shows the system failing when Pump1 fails because the switch fails first, i. e., it is in the state of failure when Pump1 fails and has to be substituted by Pump2 . If the switch is perfect, Pump2 action starts immediately when Pump1 fails as illustrated in Fig. 6.70. 6.9.1.5 Repairable Components and Simulation Analysis. Hot Standby System and Switch Perfect Assuming an exponential distribution of ttr (MTTRPump1 D MTTRPump2 D 100 h), a Monte Carlo simulation analysis generates the state diagram shown
in Fig. 6.71 for the hot standby system. Figure 6.72 reports the trend of the expected availability and reliability of the hot standby system as the result of a simulation analysis by ReliaSoft® reliability software. Figure 6.73 shows the results of the simulation analysis with MTTR equal to 1,000, it can be stated that the system too passes from up to the down when it fails because both Pump1 and Pump2 are under the random repair process (between 20,000 and 30;000 h).
6.10 Production System Efficiency Production system efficiency measures the productivity of a system able to work in different operating conditions with different performance levels. In contrast to the reliability and the availability functions, efficiency is not a measure of probability but depends on the reliability of different operating configurations. In fact, a production system is normally composed of several components whose possible failure requires different operating configurations and performance. Efficiency eS is an estimation of the average productivity of a system: X QSi P .Si /; (6.81) eS D i
184
6 Reliability Evaluation and Reliability Prediction Models
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Up/Down
Stand-by 1 (Pump2 in stand by)_Switch
Stand-by 1 (Pump2 in stand by)
Pump 2
Pump 1
System
0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 6.69 Cold standby, simulation analysis. Nonrepairable components. Switch not perfect. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Up/Down
Stand-by 1 (Pump2 in stand by)_Switch
Stand-by 1 (Pump2 in stand by)
Pump 2
Pump 1
System
0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 6.70 Cold standby, simulation analysis. Nonrepairable components. Switch perfect. ReliaSoft® software
where QSi is the productivity (measured as a percentage of the nominal productivity value) of the i th operating configuration of the system and P .Si / is the probability the system functions in configuration i . An example is provided by the helpdesk service of a bank. Its productivity is measured in terms of users
served in 1 h and can change quite markedly during the working day according to the various degrees of stress and fatigue experienced by the bank employees. Two significant applications of the determination of efficiency are illustrated next.
6.10 Production System Efficiency
185
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Up/Down
Stand-by 1 (Pump2 in stand by)_Switch
Stand-by 1 (Pump2 in stand by)
Pump 2
Pump 1
System
0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 6.71 Repairable components, simulation analysis. Hot standby system: state diagram, MTTR D 100 h. ReliaSoft® software
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Availability and Reliability vs Time 1.000 Diagram1 Point Availability Line Point Reliability Line
0.960
A(t), R(t)
0.920
0.880
0.840
0.800 0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 6.72 Repairable components, simulation analysis. Hot standby system: A.t / and R.t /. ReliaSoft® software
6.10.1 Water Supplier System Figure 6.74 illustrates a water supplier which supplies water for a production activity. It is composed of four independent and identical pumps whose hazard rate is
assumed to be constant and equal to 0:8 year1 considering an average and continuous functioning of the pump and a nominal and constant water flow rate of 5 kg s1 . The year is composed of 200 operating days composed of 16 hours per day.
186
6 Reliability Evaluation and Reliability Prediction Models
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Up/Down
Stand-by 1 (Pump2 in stand by)_Switch
Stand-by 1 (Pump2 in stand by)
Pump 2
Pump 1
System
0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 6.73 Repairable components, simulation analysis. Hot standby system: state diagram, MTTR D 1;000 h. ReliaSoft® software
A B DRYER
C D Fig. 6.74 System function scheme
Fig. 6.75 Supply system of the continuous dryer
Table 6.14 Probability P .Si / Scenario Si
Productivity QSi
Case I
5 kg=s
Case II
10 kg=s
Case III
15 kg=s
Case IV
20 kg=s
Probability P .Si / ! 4 4Š P .S1 / D Ri .2500/1 Œ1 Ri .2500/41 D .0:535/1 .1 0:535/3 D 0:215 1 1Š.4 1/Š ! 4 4Š Ri .2500/2 Œ1 Ri .2500/42 D P .S2 / D .0:535/2 .1 0:535/2 D 0:371 2 2Š.4 2/Š ! 4 4Š P .S3 / D .0:535/3 .1 0:535/1 D 0:285 Ri .2500/3 Œ1 Ri .2500/43 D 3Š.4 3/Š 3 n Y P .S4 / D Ri .2500/ D .0:535/4 D 0:082 iD1
Ri is the component reliability
6.10 Production System Efficiency
187
Table 6.15 System efficiency values Efficiency eS X Qi P .Si / D P .S1 / 100% C .P .S2 / C P .S3 / C P .S4 // 100% D 0:953 i X Qi P .Si / D P .S1 / 50% C P .S2 / 100% C .P .S3 / C P .S4 // 100% D 0:8455 i X Qi P .Si / D P .S1 / 33% C P .S2 / 67% C P .S3 / 100% C P .S4 / 100% D 0:686 i X Qi P .Si / D P .S1 / 25% C P .S2 / 50% C P .S3 / 75% C P .S4 / 100% D 0:535
Target value 5 kg=s 10 kg=s 15 kg=s 20 kg=s
i
Table 6.16 Productivity and reliability of the conveyors Conveyor
Productivity Q (%)
Reliability R.144/
A B C D
40 30 30 20
0.989 0.921 0.997 0.893
Consequently, the number of active operating hours per year is N D 200
hours hours days 16 D 3200 : year day year
The following equation quantifies the value of reliability Ri for the i th component and 2;500 h of operation: Ri .2500/ D eT D e 3200 2500 D 0:535: 0:8
Table 6.14 quantifies reliability in different operating scenarios. Table 6.15 quantifies the efficiency of the system for different values of system performance (operating target value).
6.10.2 Continuous Dryer System The supply system of a continuous dryer used to dry pasta is composed of four conveyors: A, B, C, and D (Fig. 6.75). The system has been modified several times during the last decade. As a result, each conveyor works with a specific production capacity Qsi and reliability values (see Table 6.16). The dryer works 24 h a day for 6 days a week.
Table 6.17 Efficiency calculus OK
Not OK
– A B C D A, B A, C A, D B, C B, D C, D A, B, C A, C, D A, B, D B, C, D A, B, C, D
A, B, C, D B, C, D A, C, D A, B, D A, B, C C, D B, D B, C A, D A, C A, B D B C A –
P .Si / .1 RA .144//.1 RB .144//.1 RC .144//.1 RD .144// D 2.79E07 RA .144/.1 RB .144//.1 RC .144//.1 RD .144// D 2.51E05 RB .144/.1 RA .144//.1 RC .144//.1 RD .144// D 3.25E06 RC .144/.1 RA .144//.1 RB .144//.1 RD .144// D 2.33E06 RD .144/.1 RA .144//.1 RB .144//.1 RC .144// D 2.33E06 RA .144/RB .144/.1 RC .144//.1 RD .144// D 2.92E04 RA .144/RC .144/.1 RB .144//.1 RD .144// D 8.33E03 RA .144/RD .144/.1 RB .144//.1 RC .144// D 2.09E04 RB .144/RC .144/.1 RA .144//.1 RD .144// D 1.08E03 RB .144/RD .144/.1 RA .144//.1 RC .144// D 2.71E05 RC .144/RD .144/.1 RA .144//.1 RB .144// D 7.74E04 RA .144/RB .144/RC .144/.1 RD .144// D 9.72E02 RA .144/.1 RB .144//RC .144/RD .144/ D 6.96E02 RA .144/RB .144/.1 RC .144//RD .144/ D 2.44E03 .1 RA .144//RB .144/RC .144/RD .144/ D 9.02E03 RA .144/RB .144/RC .144/RD .144/ D 8.11E01
QSi (%)
P .Si /Qi
0 40 30 30 20 70 70 60 60 50 50 100 90 90 80 100
0 1.00E05 9.76E07 6.98E07 4.66E07 0.0002 0.0058 0.0001 0.0006 1.36E05 0.0004 0.0972 0.0626 0.0022 0.0072 0.8110
es D
0.987
188
Considering 1 week (i. e., 144 h) of operating time, Table 6.17 quantifies the efficiency of the system. In particular, there are 16 different system operating configurations: each configuration is composed of “OK” (i. e., in a state of function) and “not OK” (i. e., not in a state of function) components. Finally, each configuration is characterized in terms of productivity. The
6 Reliability Evaluation and Reliability Prediction Models
generic value of P .Si / is based on combining the reliability of each component. When the capacity of the supply system exceeds the requested value (considering the values in Table 6.16), productivity is assumed to be equal to 100%. The system efficiency eS is 0.987, and the results obtained are reported in Table 6.17.
7
Maintenance Information System and Failure Rate Prediction
Contents 7.1
The Role of a Maintenance Information System . . 189
7.2
Maintenance Information System Framework . . . 7.2.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . 7.2.2 Maintenance Engineering . . . . . . . . . . . . . . . . 7.2.3 Interventions and Workload Analysis . . . . . . . 7.2.4 Spare Parts and Equipment Management . . .
7.3
Computer Maintenance Management Software . . 196
7.4
CMMS Implementation: Procedure and Experimental Evidence . . . . . . . . . 7.4.1 System Configuration and Integration . . . . . . 7.4.2 Training and Data Entry . . . . . . . . . . . . . . . . . 7.4.3 Go Live . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.4.4 Postimplementation Phase and Closing . . . . . 7.4.5 Experimental Evidence Concerning CMMS Implementation . . . . . . . . . . . . . . . . . . . . . . . .
190 190 192 194 195
199 199 200 200 200 200
7.5
Failure Rate Prediction . . . . . . . . . . . . . . . . . . . . . . . 204 7.5.1 Accelerated Testing . . . . . . . . . . . . . . . . . . . . . 204 7.5.2 Failure Data Prediction Using a Database . . . 206
7.6
Remote Maintenance/Telemaintenance . . . . . . . . . 214 7.6.1 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
A modern approach to the maintenance problem requires an efficient support operated by the information system. There are a lot of articulated data to be taken into consideration. A system that collects and organizes this information is a prerequisite for any further elaboration. Nowadays, information technology provides to maintenance engineers and practitioners an automatic software platform called a “computerized maintenance management system,” with some advantages but also some omissions. Often engineers and practitioners cannot wait for the implementation of the
computerized maintenance management system; their policies require robust information since from the phase-in of the equipment or plant. They may wish to get reliability results more quickly than in the case of data coming from products operating under normal conditions. This situation is usually faced using the experience of the maintenance personnel but several lacks of robustness of data occur. Alternative, more accurate approaches are accelerated testing and failure data prediction using an existing database.
7.1 The Role of a Maintenance Information System Some parts of this book emphasize very clearly the importance of the knowledge of the performance of plants, equipment, and facilities in order to operate an effective management of the maintenance of the system. For example, reliability theory is absolutely based on the failure behavior, which is the starting point to evaluate appropriate key performance indexes. For this reason an effective maintenance system requires the introduction of a maintenance information system to record the history of equipment in terms of failures, spare parts, workloads, interventions, and to support the optimization policies (i. e., preventive, predictive, etc.). In a normal situation there is a large set of critical components operating a lot of cycle failure–restoration cycles during their lives, and maintenance workers make interventions daily. In conclusion, all the information about maintenance growing day by day represents an unreleaseable source of data for the com-
R. Manzini, A. Regattieri, H. Pham, E. Ferrari, Maintenance for Industrial Systems © Springer 2010
189
190
7 Maintenance Information System and Failure Rate Prediction
Maintenance information system
Data collection
Maintenance engineering
Interventions and workload analysis
Spare parts and equipment management
Asset register (machines or components)
Key performance index evaluation
Interventions scheduling
Field data collection
Maintenance policies
Activities progress checking
Spare parts procurement and storage
Remote data collection
Maintenance budgeting and costing
Maintenance workers management
Maintenance instrumentation management
Spare parts forecasting and management
Fig. 7.1 Typical corrective intervention activities
pany. The solution is the maintenance information system. The relevance of this topic is demonstrated by the interest of the European Committee for Standardization (CEN). CEN technical committee TC319 has been working for several years on the unification of different standards existing in maintenance, with particular attention to the information system. In 1997 the Italian Standardization Center (UNI) promoted the standard UNI 10584/97 devoted to the structure of a maintenance information system. It has four general sections dealing with the “environment,” i. e., description of plants, equipment, and facilities, the “maintenance management” devoted to managing interventions (e. g., corrective, preventive), the “check” dedicated to key performance index evaluation, and finally the “improvement section” concerning the application of several techniques such as failure modes and effects analysis and failure mode, effects, and criticality analysis to enhance system performance. In the following section we show a general and complete framework for a maintenance information system coming from a literature analysis, and above all from several applications in the real industrial field.
7.2 Maintenance Information System Framework A modern information system representing an effective support to all maintenance activities must have
several sections, such as data collection, maintenance engineering, interventions and workload analysis, and spare parts and equipment management. Each section in the framework proposed in Fig. 7.1 is divided into its typical subsections. These strictly intercorrelated sections have different goals but work together to reach the maximum economic result for the company.
7.2.1 Data Collection The fundamental scope of this section is to collect all interesting data from the field. First of all it is important to identify the facility or plant characteristics and their “critical” components. Plants usually have hundreds or thousands of components of which a manageable part must be preliminarily selected. This proper set of components is referred to a specific configuration of the plant, and is to be revised when the configuration changes. In the start-up phase the selection of critical components is difficult because no historical data are available. In this case, information from suppliers and expertise developed in similar plants can represent a valid initial solution. The result of this preventive phase is usually the construction of an asset register (machines and/or components). The typical information collected deals with “general data” such as purchase date, cost, supplier, layout position, critical components, preventive interventions suggested, and spare parts suggested.
7.2 Maintenance Information System Framework
191
Company
Asset Register
MAINTENANCE
MACHINES
MACHINE
MACHINE CODE
MANUFACTURER
PURCHASE DATE
MANUFACTURING DATE SUPPLIER
PURCHASE COST
PURCHASE CONDITION
LAYOUT POSITION
SUPPLIER SUGGESTED INTERVENTIONS TIME INTERVAL
INTERVENTION
STANDARD MTTR
CRITICAL COMPONENTS and NOTES
N° data sheet
Date
Compiler
Signature
Fig. 7.2 Example of an asset register document (machines)
The dual approach, machines and components, is interesting when the same components are installed on different machines: the evaluation of the key performance index and the application of optimizing policies are easier. Figures 7.2 and 7.3 show, respectively, an example of an asset register document dedicated to machines and an asset register document dedicated to components. The asset register collects “static” information. But in a working production or service system all the machines, equipment, and facilities continuously alternate between uptimes and downtimes, i. e., failures and restorations. The relating information is fundamental knowledge for an effective approach to the maintenance problem. For this reason, the core of the
data collection section is data mining while systems are working. This goal is achieved basically by the workflow of two documents: the failure report and the work order. When a failure occurs the operators of a maintenance division perform the corrective intervention in order to recreate the original work conditions as soon as possible. After this, they must fill out a report, the so-called failure report, to characterize the interventions. Figure 7.4 shows an example of a failure report. The fundamental pieces of information to be collected are the date and time of failure, the machine and component that failed, and the characteristics of the intervention performed (time to repair, spare parts if used, and workload employed).
192
7 Maintenance Information System and Failure Rate Prediction
Company
Asset Register
MAINTENANCE
COMPONENTS
COMPONENT
COMPONENT CODE
MANUFACTURER
MANUFACTURING DATE SUPPLIER
PURCHASE DATE
PURCHASE COST
PURCHASE CONDITION
COMPONENT INSTALLATIONS MACHINE
MACHINE CODE
MACHINE LAYOUT POSITION
TIMES TO REPAIR REPLACE
ASSEMBLY/DISASSEMBLY
TUNING
NOTES
Fig. 7.3 Example of an asset register document (components)
N° data sheet
Preventive and predictive interventions must be planned according to a formal document indicating provided activities, times, workload, and spare parts, if due. This document is followed by a final report, containing the effective actions in the intervention. The experience in practice suggests condensing both planning and reporting phases in a single document. Figure 7.5 presents an example of this work order document with the planning sector at the top and the report sector at the bottom. The failure report and the work order continuously fill a dynamic database tracing the maintenance history of plants and linked to maintenance intervention operated by workers. In the few past years companies have developed new industrial instrumentation devices that
Date
Compiler
Signature
would allow an automated collection of multiple data, i. e., temperature, vibrations, velocity, noises, power, etc., thus powering a fundamental activity not fully exploited at the moment, as stated in Sect. 7.6.
7.2.2 Maintenance Engineering This section is devoted to developing the analysis supporting the maintenance optimization, and in particular the evaluation of key performance indexes and the determination of the best policies. The correct collection of data, as described in Chaps. 5 and 6, is standard in order to carry out a set of synthetic parameters that “measure” the maintenance performance
7.2 Maintenance Information System Framework
193
Company
Failure report
MAINTENANCE ID NUMBER
APPLICANT
FAILURE MODE
FAILURE DATE & TIME /
/
COMPONENT CODE
MACHINE CODE
:
FAILURE PRESUMED CAUSE
FAILURE EFFETS
INTERVENTION STARTING DATE & TIME
/
/
ENGAGED WORKERS
FINISHING DATE & TIME
:
/
/
:
JOB DESCRIPTION
SPARE PARTS & EXPENDABLE DESCRIPTION
CODE
QUANTITY
NOTES
N° data sheet
Date
Compiler
Signature
Fig. 7.4 Example of a failure report
of the system. Reliability, maintainability, availability, and hours spent in maintenance are some typical parameters usually considered. There are different levels of investigation, from a group of machines to a single machine, or to the components, according to the completeness of the data and to the goals to be reached. The best solution, i. e., the way to maximize the benefits, is usually a mix of maintenance policies deriving from the as-is analysis; some suitable key performance indexes can help in identifying the right techniques to be applied.
Some of them, such as preventive and inspection maintenance models, fault tree analysis, failure modes and effects analysis, and failure mode, effects, and criticality analysis models (see Chap. 8), are supported in this section for maintenance engineering. All these efforts are directed to an economic result. In every company, adopted models and techniques must be validated from an economic point of view, and the evaluation of costs related to production losses, maintenance interventions, spare parts, equipment, and personnel is crucial.
194
7 Maintenance Information System and Failure Rate Prediction
Company
Work order
MAINTENANCE ID NUMBER
APPLICANT
EMISSION DATE & TIME /
/
:
MAINTENANCE PLANNING SERVICE RESERVED MACHINE
MACHINE CODE
COMPONENT
COMPONENT CODE
JOB DESCRIPTION
REPORT of INTERVENTION STARTING DATE & TIME
/
/
ENGAGED WORKERS
FINISHING DATE & TIME
:
/
/
:
JOB DESCRIPTION
SPARE PARTS & EXPENDABLE DESCRIPTION
CODE
QUANTITY
NOTES
N° data sheet
Date
Compiler
Signature
Fig. 7.5 Example of a work order
In conclusion, the maintenance engineering module as a part of the maintenance information system copes with a main group of structured key performance indexes to monitor the maintenance performance and costs, and several subsections for developing the optimization policies.
7.2.3 Interventions and Workload Analysis An effective maintenance system requires a mix of policies, usually not easy to manage contemporaneously because of the large number of items, the very
7.2 Maintenance Information System Framework
significant impact on production losses, and the relevant number of workers engaged. A correct scheduling of maintenance policies and activities is required in order to seize the possibility of important savings. Project management techniques such as Gantt diagrams, the program evaluation review technique, and the critical path method match efficiency with simplicity and are very effective tools also in a maintenance system. The scheduling of maintenance interventions, and especially of preventive activities, has a great impact on the productivity of systems. Often maintenance interventions require the production systems be stopped. For this reason, there must be close coordination with maintenance and production to avoid a delay in the due date and reductions in the customer service level. Several maintenance interventions are time-consuming (e. g., days or weeks of service for a steam turbine) and require many activities. In these situations, in addition to an effective scheduling, it is very important to check the progress of different actions day by day, sometimes even hour by hour. This monitoring activity must consider the possible delay and generate corrective actions as soon as possible in case of misalignments with the schedule. Maintenance activities are usually executed by skilled personnel. Depending on the production/service system, the maintenance branch can have a lot of workers. This section of the information system supplies information concerning working hours, shifts, vacations, and skills training, thus supporting people management. The integration of the information system in the scheduling module allows the analysis of maintenance cost, based on the schedule of activities, in terms of supplied hours, e. g., classified into the different policies (i. e., corrective, preventive, inspective).
7.2.4 Spare Parts and Equipment Management Spare parts represent a very important part of the economic impact of maintenance in a production/service system. To take effective decisions, the robustness of information is very important. This module is devoted to supporting the forecast of spare parts requirements and the management of the quantities procured.
195
The spare parts forecasting problem is discussed in Chap. 11, where the optimal number of spare parts is achieved by some models presented. From an informative point of view, a valuable solution needs a robust historical data set. Data on previous consumption of technical items collected by the failure reports and work order reports are the grounds for the optimizing models, and after that evaluation it is necessary to cope with the management of procured spare parts. In any company the procurement branch is usually devoted to getting raw materials for production, and possibly can attend to spare parts procurement too, but it is important to underline the distinctive peculiarities of spare parts, such as low consumption, high cost, and uncertain and specific use, in comparison with “ordinary” materials. This is a typical trade-off problem within the company because the procurement area office has high skills in negotiation and trading but no competence regarding technical features of materials, which is possessed by the maintenance personnel, who do not have commercial expertise to procure the material in an economic way. If the spare parts procurement is exploited by the maintenance division, it is absolutely important to integrate the applied methodologies into the general enterprise resource program (ERP) software (e. g., SAP, JDE, Baan). Another typical problem related to spare parts management deals with the phase-out of plants and equipment. The phase-out is the terminal step of the life of a production/service system: the management has already decided on the future date when the plant will be cast off, and until that time it is necessary to guarantee the correct level of output with the minimum maintenance expense, e. g., spare parts investments. An effective maintenance information system supports the phase-out by taking into consideration every assumed decision and informing all people involved in maintenance, procurement, process design, etc., thus avoiding wrong behaviors. Not only plants and machines require maintenance, even tools and equipment, such as hand tools, measuring devices, and programmable logic controllers, used by maintenance performers need maintenance and calibration. For example, devices for measuring length are subjected to an official calibration by certified associations. These validations have a specific duration and must be renewed. A company has many devices to take into consideration, and the information maintenance system
196
plays an important role to support their effective management.
7.3 Computer Maintenance Management Software In maintenance, some decisions concerning maintenance policies, spare parts procurement, etc., and based on information stored in the maintenance information system, are often made very repetitively and quickly. This large amount of data is very difficult to manage, especially when information is stored on paper documents. For example, the choice of the preventive policy is fundamentally based on the hazard rate, whose evaluation requires the time to failure analysis as recorded in the failure reports: sometimes it could be necessary to review hundreds of sheets concerning a specific component simply to extract its reliability parameter. Such a scenario enlightens us about the positive impact of information technology instruments such as databases and software. Automatic data processing reduces the time spent and its correspondent cost, and usually improves the robustness of elaboration. Furthermore, the experimental evidence shows that the maintenance personnel has fewer difficulties accepting maintenance information management through software support in comparison with a paper one, considered as a time-consuming activity with no added value. The software for a maintenance information system is usually called “computer maintenance management software” (CMMS). Different CMMS packages offer a wide range of capabilities and cover a correspondingly wide range of prices. Anyway, they have a great data management capacity in terms of data storage and filtering, but very rarely support optimizing models and techniques for determination of the optimal mix of policies, spare parts forecasting, etc. In other words, the existing CMMS packages contain a subset of functionalities provided by the general framework discussed above. A typical commercial package is structured in several sections: – Asset management: Recording data about equipment and property, including specifications, warranty information, service contracts, suggested spare parts, purchase date, and anything else that might be not linked to the equipment functioning.
7 Maintenance Information System and Failure Rate Prediction
– Work orders: Scheduling jobs, assigning personnel, reserving materials, recording costs, and other relevant information, such as the cause of the problem (if any), downtime involved (if any), and recommendations for future action. – Purchase orders: Procuring materials (spare parts, instruments, and external workload). This section points out the “commercial” setting typically adopted by the software house, usually devoted to the general ERP. – Spare parts inventory control: Management of spare parts, tools, and other materials, including the reservation of materials for particular jobs, recording where materials are stored, determining when materials should be purchased, tracking shipment receipts, and taking inventory. CMMS packages can produce status reports and documents giving details or summaries of maintenance activities, but usually these reports are obtained only by filtering of the data set. No contributions dealing with reliability parameters, probability failure distributions, hazard rates, and optimizing approaches are supported. The ideal framework shown in Fig. 7.1 has not yet been achieved. There are a number of CMMS packages available on the market today, from small solutions working on stand-alone PCs, to very complicated integrated packages working only on the company mainframe, with costs varying from a few thousand euros (PC standalone solutions) to 80,000–100,000 euros for a mainframe system with 25–30 licenses. The CMMS implementation in a company requires a significant customizing phase, with its relevant cost. Evans (2005) estimated for an intermediate-level CMMS package an implementation cost of about 18 months per worker for each ten licenses. In conclusion, owing to technical reasons (i. e., lack of optimizing models) and/or owing to economic reasons (i. e., significant purchase and implementation costs), many companies decided to develop software to support maintenance activities themselves. Several sections of a CMMS package1 representative of the standard level of computerized maintenance management systems available on the market today are shown in Figs. 7.6–7.11. 1
MaintiMizerTM . Copyright 2005 Ashcom Tecnologies, Ann Arbor, USA
7.3 Computer Maintenance Management Software
Fig. 7.6 Example of the main form of computer maintenance management software (CMMS) (MaintiMizerTM /
Fig. 7.7 Example of CMMS data entry mask (MaintiMizerTM /
Fig. 7.8 Example of CMMS preventive actions agenda (MaintiMizerTM /
197
198
7 Maintenance Information System and Failure Rate Prediction
Fig. 7.9 Example of CMMS work order (MaintiMizerTM /
Fig. 7.10 Example of CMMS purchase order frame (MaintiMizerTM /
Example of commercial CMMS The user is welcomed by the main form reported in Fig. 7.6, useful to reach the different parts of the program, and in particular those four parts discussed above. The menus Quick Parts Lookup and Display Equip Info represent the asset management section for data collection about equipment and plants. The top submenu (i. e., General, Tasks PM’s, Parts, Costs, Information) completes the setting of the “static” information about preventive intervention, usually suggested by supplier, spare parts, and correspondent costs (see Fig. 7.7). Concerning planned preventive interventions, the software remembers the user actions according to a bill
book made manually. These actions are inserted manually by the user, as illustrated in Fig. 7.8, because no optimizing models are supported. In this software the failure report and the work order linked to preventive or predictive interventions are unified in a single document, simply called “Quick Work Order.” An example of this document is presented in Fig. 7.9. The navigation buttons at the bottom connect the main features of failure and performed intervention, i. e., spare parts used (material) and workload engaged (labor). Even the procurement activity is supported in CMMS by storage of data about items, corresponding to quantity at hand, tracking, tracing of prices, etc. Figure 7.10 shows a frame containing information about
7.4 CMMS Implementation: Procedure and Experimental Evidence
199
Fig. 7.11 Example of CMMS purchase order frame (MaintiMizerTM /
the supplier, while Fig. 7.11 describes the item to be bought. Usually for every item CMMS keeps the quantity on hand, but the availability of this information is dependent on manual load/unload procedures concerning the storage/retrieval of materials in/from the warehouse. These procedures are very crucial in order to avoid great misalignment between virtual and physical stocks, hence the absence of materials or obsolescence risks.
7.4 CMMS Implementation: Procedure and Experimental Evidence Often companies purchase CMMS with the expectation that it will solve their problems regarding maintenance. The implementation of every CMMS package is not a trivial procedure, is made of a lot of activities and takes several months. This phase can take place only after the requirements have been set and the software selected. Unfortunately, it is not as easy as flicking a switch. Functional CMMS means configuring the software, entering collected key data, and involving people in the system. It is important to emphasize that the system aims to organize the maintenance question in a proactive, instead of a reactive, mode and not to monitor employees.
Experimental evidence shows that the introduction of a CMMS steering committee can reduce the effort and the time of the phase-in. This group includes members of the same team, possibly together with consultants, suppliers, and direct users of CMMS, involved in the definition of the business process, its requirements, and the software selection. The fundamental milestones for a useful implementation are: • • • •
system configuration and integration; training and data entry; go live; postimplementation phase and closing.
7.4.1 System Configuration and Integration CMMS works well only if it is correctly configured according to the real industrial system. The existing commercial CMMS solutions have their own typical structure to be customized according to the real case. This structure should be as complete and accurate as possible. Incorrect or inconsistent data are the quickest road to frustration for CMMS users; moreover, the system should provide a very user-friendly interface. For all these reasons, it is very important to state the final target, the intermediate subtargets, and the relative
200
activities before the customization phase. In the field it is very frequent to notice some sections, or modules, of CMMS that are utilized less. This is often due to inaccurate customizations of the software. Another very important issue is the integration between the CMMS and the ERP in use, i. e., the company central database, in order to avoid data misalignments and duplications and to guarantee continuous control of the maintenance division.
7.4.2 Training and Data Entry The system should be well tested prior to going live. The test phase is developed by scripts modeling the process and involving computer-savvy end-users, getting their first hands-on experience with the system and recording their first impressions. It is recommended to use “not canned” data for training. The training environment should mirror the production database, and its format should be step by step role and process based in order to avoid misleading and confusion among all the CMMS functionalities. Several team members must be available to train users, if possible in a “temporary” environment where they can practice without corrupting production data. After training, the data entry phase must follow. CMMS functionalities are exploited only if the asset register is sufficiently consistent, i. e., the maintenance database has to reach a critical mass before the go live stage.
7.4.3 Go Live The best practice is to schedule the “go live” when the training is sufficient and the maintenance work does not have a peak (i. e., general overhaul, revamping, or very important preventive interventions). It is important to plan a backup solution for managing the flow of information in the case of an unexpected crash of the system, and the users can follow some good practices in order to reduce the corresponding risk. For example, it is useful for workers to create work orders in the CMMS at the end of their shift, or to preserve some quick references or sheets and diagrams as well containing the proper values to be entered. A daily review
7 Maintenance Information System and Failure Rate Prediction
of what went wrong is necessary, in order to schedule the required modifications and update the work process for the next day: this is an excellent way to see how successfully each maintenance user is interacting with the system, or who needs some extra help.
7.4.4 Postimplementation Phase and Closing After the “go live” and before the definitive release of the system, the project team has to review all the defined requirements and evaluate the corresponding fulfillment. Usually it is necessary to schedule several corrective actions, with their goals and due dates. In the postimplementation phase some negative factors, such as the turnover in maintenance employees, the modifications in company technical assets, and the new releases of CMMS, must be considered. The corresponding actions are the organization of training courses for new maintenance workers, the application of procedures for data collection about new assets, and relations with software providers that ensure the compatibility of different releases. Finally, it is necessary to put in place a performance indicator about the maintenance processes, not only technical, as every efficient CMMS still does in an automatic way, but also economic. A CMMS system is a tool that can genuinely enable an organization to meet profitability, but its impact must be continuously monitored. An effective CMMS implementation process is fundamental. Those organizations that successfully supported this processes claim 10–30% reduction in maintenance-related expenditures. But the experimental evidence points out a generalized underutilization of CMMS systems, resulting in an insufficient return of money and work paid and a not complete commitment of people. This is fully demonstrated in the following studies about CMMS implementations in practice.
7.4.5 Experimental Evidence Concerning CMMS Implementation Several authors developed studies about the diffusion of CMMS systems. Swanson (2003) focused his attention on the general characteristics of CMMS systems
7.4 CMMS Implementation: Procedure and Experimental Evidence Table 7.1 Computer maintenance management software (CMMS) hardware characteristics Companies with a CMMS system (%) Companies without a CMMS system (%)
Table 7.3 CMMS software structure CMMS user
Percentage of the total number of companies
Degree of use (1 rarely, 5 frequently)
Maintenance directors Maintenance planners Maintenance workers Purchase employees Warehouse employees Production managers Production workers
93.2 86.1 86.1 77.9 64.7 51.9 36.8
3.9 4.0 2.9 3.3 3.6 2.3 1.8
60.1 39.9
CMMS average go live (years) CMMS origin Commercial (%) In-house software (%) Others (%) No answer (%) Hardware configuration Mainframe (%) Minicomputer (%) PC-LAN (%) Stand-alone PC (%) Others (%) No answer (%)
4.0 57.1 28.6 9.0 5.3 28.6 4.5 17.3 29.3 14.3 6.0
Table 7.2 CMMS software structure CMMS module
Percentage of CMMS with the module
Degree of use (1 rarely, 5 frequently)
Scheduling of preventive interventions Database of past interventions Asset register Scheduling of workload Spare parts purchasing Spare parts need management Spare parts stock management Support to inspections Maintenance budgeting
95.5
4.0
95.5
3.4
89.5 82.7 80.5 80.5
3.4 2.4 3.3 2.9
78.9
3.3
70.7 72.9
2.7 2.6
201
Table 7.4 CMMS commercial packages CMMS
Percentage
PLM300 (SAP) Maximo (IBM) MP2 (Datastream) MIMS (EAM) PMC (DPSI) Mainsaver (Mainsaver) MPAC (Indus) Others In-house software
24.8 13.3 5.7 4.8 3.8 2.9 2.9 28.5 13.3
Table 7.5 Average CMMS “go live”
in terms of hardware architecture, software structure, and company users. This study was based on the analysis of 354 American companies participating at the National Maintenance Excellence Award section Mechanical Industries. Fundamental results are shown in Tables 7.1–7.3. A sufficient diffusion of CMMS systems is seen in Table 7.1, with a work period quite short on average, suggesting a situation on the rise. The percentage of companies that developed the software themselves is significant (28.6%), but very significant is the hardware configuration adopted: the same diffusion for mainframes and stand-alone PCs. Moreover, the CMMS is not yet sufficiently integrated with the company ERP, and the maintenance function exploits its
Years
Percentage
In progress < 1 year 1–2 years 2–3 years 3–4 years > 4–5 years > 5 years No answer
4.8 5.7 16.2 12.4 12.4 6.7 25.7 16.2
support autonomously, without sharing any data with the other parts of the company, such as purchase office and administration. Table 7.2 points out the typical support offered by commercial CMMS: a database of interventions and the management of the scheduling of preventive actions. In general, the spare parts management is well supported and employed by users. Commercial CMMS packages usually do not support any model to optimize maintenance policies and to support maintenance engineering choices. Table 7.3 underlines a full commitment of maintenance directors and planners, while maintenance workers are less involved in the use of CMMS. Because of the scarce integration between maintenance and production, the
202
7 Maintenance Information System and Failure Rate Prediction
Table 7.6 Reasons for CMMS choice Reason
Most important (%)
Second most important (%)
Don’t know Integration with other commercial software General functionality and features Ease of use Price General reputation of software and its vendor Compatibility with previous CMMS Compatibility with operating system Availability of training Availability of local support It uses the latest technology Speed of system response Ease of implementation Integration with other technical software Availability in local language version Other/not applicable
22.9 15.2 9.5 8.6 6.7 3.8 3.8 2.9 1.9 1.0 1.0 1.0 1.0 0.0 0.0 21.0
21.0 7.6 13.3 3.8 6.7 8.6 1.9 2.9 1.0 6.7 3.8 1.9 1.0 1.9 1.0 17.1
Table 7.7 CMMS success factors Factor
Most important (%)
Second most important (%)
Senior management commitment Effective training Choosing the right CMMS Effective change management CMMS vendor support Adequate budget Focus on business benefits Effective BPR Effective project management Consultant support
46.9 37.5 31.3 31.3 21.9 18.8 15.6 15.6 15.6 12.5
53.1 53.1 21.9 15.6 6.3 25.0 28.1 25.0 15.6 6.3
BPR business process reengineering
production personnel is rarely aware of the potentiality of a CMMS. Another interesting study was developed by the Plant Maintenance Resource Center (PMRS 2004) of Booragoon (Australia). In this case, a sample of 105 companies from several sectors (automotive, petroleum, food and beverage, transport) in the USA (29.5%), Australia (10.5%), the UK (6.7%), and Canada (5.7%) was investigated. The study was particularly devoted to analyzing the reason for the choice of CMMS. These companies generally had in their trading staff more than ten people, (84.8%, and in particular 47.6% had more than 100). CMMS was present in the 81.9% of the sample, and 13.3% of CMMS was developed in-house, while the first seven commercial packages had about 60% penetration (Table 7.4). Most of the systems analyzed had been in
place in recent years, but a significant proportion had been in place for at least 5 years or more (Table 7.5). The analysis of the factors that influence the software selection is very interesting. A great number of maintenance managers who replied to this question were not aware of the reasons driving this process. Anyhow, the most commonly stated reasons were general functionality and features and integration with other commercial software, as summarized in Table 7.6. In addition, some other factors, such as the possibility to handle enormous amounts of data, the commonality with tools adopted in other divisions of the company, or a convenient price, were considered. The Plant Maintenance Resource Center research points out the importance of a senior management commitment, an effective change in the management, and valuable training (Table 7.7).
7.4 CMMS Implementation: Procedure and Experimental Evidence Table 7.8 “Hot” factors Factor
Percentage
Effective training Effective BPR Effective change management Choosing the right CMMS Senior management commitment Effective project management Adequate budget Focus on business benefits CMMS vendor support Consultant support Other/not applicable
19.0 15.2 11.4 8.6 7.6 4.8 4.8 1.9 2.9 1.0 22.9
As reported in Table 7.8, training is the activity with the biggest potential improvement, but a lot of effort and time was also paid to an effective business process reengineering. The commitment of senior management and an effective change in management are very popular factors: in other words, the success of the CMMS implementation is related to a significant change in mentality firstly of top management and secondly of workers. The last important question deals with the benefits accrued from the CMMS implementation. The results in Table 7.9 report the prevalence of “don’t know/not applicable,” including people who currently do not use CMMS. The most important benefits concern the possibility to improve the control of technical activities, such as maintenance history, planning and scheduling of interventions and spare parts, and the related costs. There is not a clear vision about benefits concerning the reliability and availability of equipment, thus confirming a weak approach to optimization strategies: current CMMS systems are considered overall
203
as large databases useful for data classification and management. The work by O’Hanlon (2005) confirms the difficulties in the implementation process of a CMMS system. The investigation involved more than 600 companies all over the world and focused on the expected return of investment due to introduction of CMMS. Fifty-seven percent of companies declared missing the expected return of investment, 4% had no idea about the expected return of investment, and for only 39% was the investment successful. This low percentage of successful investments is mainly due to an incomplete implementation of CMMS. In particular, CMMS is often considered as a formal attainment requiring time and resources without positive impacts on the maintenance work. Consequently, interventions are partially registered in the database and with great time delay, spare parts are managed in an informal manner without the CMMS support, and data elaborations by CMMS (i. e., mean time to failure, mean time to repair calculus) are not used to support maintenance policies. This situation is clearly reported in Figs. 7.12 and 7.13. The return of the investment associated with a CMMS system can be seriously compromised by discontinuous training. Companies often invest their time and money in a CMMS system without supporting this choice through training of new personnel, updating the software through new releases, and “maintaining” the CMMS during the “go live” years. Figure 7.14 shows how much companies reserve for updating the system and the correspondent training on average per year. The CMMS impact is strongly related to a massive use of its potentiality: every critical asset must be registered, all the interventions must be recorded, the spare parts must be fully managed with the dedicated
Table 7.9 CMMS benefits Benefit
Significant (%)
Some (%)
None (%)
Don’t know/not applicable (%)
Improved cost control Improved maintenance history Improved maintenance planning Improved maintenance scheduling Improved spare parts control Improved equipment reliability Improved equipment availability Reductions in materials costs Reductions in other costs Reductions in labor costs
35.2 30.5 30.5 28.6 21.9 13.3 9.5 11.4 8.6 5.7
23.8 37.1 36.2 39.0 35.2 41.0 37.1 32.4 36.2 32.4
16.2 9.5 8.6 6.7 12.4 15.2 21.9 22.9 23.8 29.5
24.8 22.9 24.3 25.7 30.5 30.5 31.4 33.3 31.4 32.4
204
7 Maintenance Information System and Failure Rate Prediction [no answer] 1%
7.5 Failure Rate Prediction [all] 19%
[<25%] 27% [20 –50%] 9%
all (100%) 75– 95% 50– 75%
[75–95%] 31 %
[50 –75%] 13%
20– 50% < 25% no answer
Fig. 7.12 Spare parts managed by the CMMS system [no answer] 1%
[20 –50%] 12%
[<25%] 8%
[all] 20%
all (100%) 75 –95%
[50 –75%] 15%
50 –75% [75 –95%] 44%
20 –50% < 25% no answer
Fig. 7.13 Maintenance intervention registered on the CMMS system
CMMS module, etc. It is possible to gain real technical and economic benefits only with a robust and complete database. Furthermore, as a CMMS system needs trained and skilled users, it is also fundamental to develop the CMMS according to new trends and models in maintenance: companies have to be ready to upgrade their system to constantly take new advantages.
Previous chapters dealt with the reliability evaluation of complex systems using reliability theory (i. e., statistical approach) or other approaches (e. g., Markov analysis). The initial part of this chapter discussed resources, such as CMMS, supporting the collection of data from the field and their elaboration. Data collection is time-consuming and very expensive (e. g., introduction of the CMMS system). Anyhow, the experimental evidence shows a time interval, about 10–14 months according to the case study, between the introduction of the information system and sufficient usability of data. Engineers and practitioners require robust information from the phase-in of the equipment or the plant, and often they cannot wait so long. They may wish to obtain reliability results more quickly than they can when data come from products operating under normal conditions. The experience of the maintenance personnel is useful to overcome such a situation, but several lacks in data robustness occur. Otherwise, accelerated test and failure data prediction using existing databases are more accurate approaches.
7.5.1 Accelerated Testing In a reliability accelerated test, the components are stressed over the normal operating conditions in order to capture reliability data related to failure state more rapidly. An accelerated test can significantly reduce the amount of time needed, if it is properly conducted. A lot of different approaches are available, but all
[75,000–100,000 $/ y] [>250,000 $/y] 7% 8% [100,000–250,000 $/ y] [no answer] 7% 6% > 250,000 $/year 100,000–250,000 $/year 75,000–100,000 $/year 50,000–75,000 $/year 25,000–50,000 $/year 10,000–25,000 $/year < 10,000 $/ year no answer
Fig. 7.14 Yearly investment devoted to CMMS update and training
[<10,000 $/y] 39%
[10,000 – 25,000 $/y] 16%
[50,000–75,000 $/ y] 6% [25,000 – 50,000 $/y] 11%
7.5 Failure Rate Prediction
of them belong to two fundamental categories: qualitative and quantitative. Qualitative accelerated tests, such as highly accelerated life tests, highly accelerated stress tests, “torture tests,” or “shake and bake,” are primarily used to investigate failure modes for the product. These are “on/off” tests: if the product survives, the test is passed, otherwise the test is failed. This kind of test is usually employed to limit the investment in comparison with the quantitative test, which is more expensive. Another typical application is related to the improvement of the product’s design, in order to eliminate the main causes of failure identified during the test. For equipment that works intermittently, the advantage of accelerated test lies in its extended use: the product to be tested operates at a rate greater than normal to simulate longer periods of work under normal conditions. Anyhow, devices are very often expected to operate continuously under normal conditions. In this case a different type of accelerated life test, founded on overstress, must be used in order to get data more rapidly. By an overstress acceleration, one or more environmental factors, such as temperature, voltage, and humidity, supposed to cause the product to fail under normal conditions are increased in order to stimulate the product to fail more quickly during the test. The stress types and levels used in an overstress acceleration test must be carefully chosen, in order to speed up the failure modes of the product without introducing other failure modes that would never occur under normal use conditions. The stressed conditions are usually reached by mechanical strains, force cycling, cold to hot, vibrations, and other solutions according to the task of the device being analyzed. The approach is usually very cheap because the sample is limited to a few components; however, in general, it does not provide information useful for quantifying the failure rate or the reliability parameter of the product under normal-use conditions. Quantitative accelerated life testing is the solution. This type of test involves the application of punctual levels of stress and requires a punctual evaluation of the resulting life data. The test output is useful for an estimation of the probability density function for the product under normal-use conditions, and many other very important metrics for the product, such as reliability, probability of failure, mean life, and failure rate. The application of the stress can be constant, i. e., time-independent, or time-dependent as well. Each
205 Time
Pdf functions
Stress level
Fig. 7.15 Relationship between life and stress
stress combination, based on single or multiple levels, is usually called a “stress cell.” When a stress cell is operating for a fixed period of time, some components typically end the test without failing, thus giving rise to the censoring problem discussed in Chap. 6. In general, accelerated life data sets from stress cells require special data analysis techniques, including mathematical models to “translate” the probability density function from stressed conditions to normaluse conditions. These models, called “life–stress relationship,” work out the probability distribution at each accelerated stress level in order to estimate the probability density function at the normal stress level. Figure 7.15 shows the relationship between life and stress for a particular product. A typical problem affecting the accelerated life tests is the determination of the best stress cells: often the link between strains and product performance is not clear (e. g., an electronic device facing temperature, humidity, vibrations), and the definition of a representative group of stress cells and the consequent robust analysis of data are quite complex tasks. Available life–stress relationships include these principal models (Nelson, 2005): • • • • •
Arrhenius; the inverse power rule; the exponential voltage model; two temperature/voltage models; the electromigration model;
206
Fig. 7.16 Reliability performance according to stress and time levels
• three stress models (temperature, voltage and humidity); • Eyring; • the Coffin–Manson mechanical crack growth model. The Arrhenius model is very general and widely applied to chemical and electronic failure mechanisms. The Coffin–Manson model works well for many mechanical-fatigue-related mechanisms. The Eyring approach is used when more than three kinds of stress are considered, or as an alternative to the above-mentioned models. The final goal is to detect the connection among the reliability behavior under stress conditions and under normal conditions, as represented in Fig. 7.16.
7 Maintenance Information System and Failure Rate Prediction
During the Second World War the US Navy decided to collect information about the failures of the electronic devices in its equipment in a database. The goal was to permit the failure rate prediction using extrapolative techniques, running with this data, without tests or implementations of expansive maintenance information systems, such as CMMS systems. By this approach, considering the enormous number of pieces of equipment and every single contribution of information about the normal life cycle (i. e., uptimes/downtimes), a general purpose database of failure rate was obtained. Several public and private companies still follow the same path to develop their databases in a very cheap and rapid way. Probably the earliest source of reliability data was the Martin Titan Handbook published in 1959 (Akhmedjanov 2001). It contained generic failure rates on a wide range of electrical, electronic, electromechanical, and mechanical parts and assemblies. The Martin Titan Handbook was the first known attempt to standardize the presentation of failure rates, expressed in terms of 106 h and eventually corrected by factors involving the redundancy and the operative conditions. The Martin Titan Handbook was the starting point for the next generations of databases which have survived in some forms to the present day. Well-known instruments derived from the Martin Titan Handbook experience useful at the present time are: • MIL-STD-217 handbook (MIL-HDBK-217); • Government–Industry Data Exchange Program (GIDEP) and failure rate databank (FARADA); • Rome Air Development Center (RADC) nonelectronic reliability notebook.
7.5.2.1 MIL-HDBK-217
7.5.2 Failure Data Prediction Using a Database The collection of empirical information for the prediction of reliability performance has a long history. Since the seventeenth century, many insurance companies have collected empirical data about vessel accidents and estimated the probability of completion of a trip on a specific route, in order to calculate a convenient premium.
MIL-HDBK-217, published by the US Department of Defense, is based on the work done by the Reliability Analysis Center and Rome Laboratory at Griffiss Air Force Base, New York. MIL-STD-217 was developed for military and aerospace applications; however, it has become widely used for industrial and commercial electronic equipment applications throughout the world. This handbook contains failure rate models for the various part types used in electronic systems, such as integrated circuits, transistors, diodes, resistors, capacitors, relays, switches, and connectors. These fail-
7.5 Failure Rate Prediction
207
ure rate models are based on the best field data that could be obtained for a wide variety of parts and systems; these data are then analyzed assuming many simplifying hypotheses to create applicable models. The latest version of MIL-HDBK-217 is MIL-HDBK217F notice 2 (MIL-HDBK-217F2). The MIL-HDBK-217 standard for reliability prediction reports failure rate and mean time between failures values for individual components, pieces of equipment, and the overall system. The final calculated prediction results are based on the roll-up, or summation, of all the individual component failure rates. The handbook contains two methods for reliability prediction: part stress analysis and parts count analysis. The two methods vary in the degree of information required to be provided. The part stress method requires a greater amount of detailed information and is usually more applicable to the later design phase. The parts count method requires less information, such as part quantities, quality level, and application environment. It is most applicable during the early design or proposal phases of a project. The parts count method will usually result in a higher failure rate or lower system reliability. In other words, it provides a more conservative result than the part stress method. The widely diffused part stress method is applicable when the design phase is complete, and the definition of the bill of material and the component stresses are available. As a standard, the level of stress on each component is referred to the actual operating conditions, such as environment, temperature, voltage, current, and power levels applied. A sample MIL-STD-217 failure rate model for a simple very high speed integrated circuit (VHSIC)/VHSIC-like and very large scale integration CMOS component is shown below. Many components, especially microcircuits, have significantly different and more complex models. D b T A R S C Q E 6
(failures/10 h); (7.1) where b is the base failure rate, T is a temperature factor, A is an application factor (linear, switching, etc.), R is the power rating factor, S is the electrical (voltage) stress factor, C is the contact construction factor, Q is the quality factor, and E is the operating environment factor.
The failure rate formulas include a base failure rate for the selected component. These rates apply to components and parts operating under normal environmental conditions, with power applied, performing the intended function, using base component quality levels and operating at the design stress levels. Base failure rates are adjusted by applying the i factors, ranging from 0 to 1.0, to the underlying equation or model provided for each component category. The i factors listed are based on a simple component and are presented in different tables; Tables 7.10–7.12 show several examples. There are also i factors for issues such as learning factor, complexity factor, manufacturing process factor, device complexity factor, programming cycles factor, and package type factor. Each component, or part group, and its associated subgroup has a base failure rate plus numerous i factor tables specific to that component or part, in order to capture these issues in the model and to adjust the base failure rate. For example, ambient and operating temperatures have a great impact on the failure rate prediction results, especially for equipment involving semiconductors and integrated circuits. The MIL-STD-217 requires as input the value of ambient temperature and more defini-
Table 7.10 Base failure rate (MIL-HDBK-217F – semiconductors) Diode type – application
b (failures/106 h)
General purpose Switching Fast recovery power rectifier Power rectifier/Schottky Power rectifier/stacks Transient suppressor/varistor Current regulator Voltage regulator
0.0038 0.0010 0.025 0.0030 0.0050 0.0013 0.0034 0.0020
Table 7.11 Temperature factor T (MIL-HDBK-217F – semiconductors) Tj (°C)
T
Tj (°C)
T
25 30 35 40 45
1.0 1.1 1.2 1.4 1.5
50 55 60 65 ...
1.6 1.8 2.0 2.1 ...
1 T D exp 0:1925 Tj C273
1 298
208
7 Maintenance Information System and Failure Rate Prediction
Table 7.12 Electrical stress factor E (MIL-HDBK-217F – semiconductors) Stress
S
Transient suppressor, voltage regulator, voltage reference, current regulator
1.0
lamp 1 pp24-60 selector ff56
switch 1 qa1304
All others: Vs 0.3 0.3 Vs 0.4 0.4 Vs 0.5 0.5 Vs 0.6 0.6 Vs 0.7 0.7 Vs 0.8 0.8 Vs 0.9 0.9 Vs 1.0
0.054 0.11 0.19 0.29 0.42 0.58 0.77 1.0
For all except transient suppressor, voltage regulator, voltage reference, current regulator
switch 2 qa1304
lamp 2 ght24-56
fuse whsk 20
0.054 (Vs 0:3) Vs2:43 (0:3 Vs 1:0)
connector xc102
Fig. 7.17 Electronic circuit of a signaling system
voltage applied Vs .voltage stress ratio/ D rated voltage
tive data for the calculation of junction temperatures in semiconductors and microcircuits. The parts count reliability prediction is normally applied when design data and component specifications are not complete. Typically, this will happen at the start of the product design process, when equally many design decisions and project specifications, allocations, etc. can be determined with help from preliminary reliability prediction data. The formula for a parts count analysis is simply the sum of the base failure rate of all the components in the system: tot D
n X
Ni .g q /i
(failures/106 h);
(7.2)
i D1
where g is the generic failure rate for the i th generic part, Ni is the quantity of the i th generic part, q is the quality factor for the i th generic part, and n is the number of different generic part categories in the equipment. The standard provides tables for the component groups listing generic failure rates and quality factors for different environments. The predicted failure rate results will normally be harsher using the parts count method than using the part stress analysis. The parts count analysis does not consider the numerous variables and applies generic worst-case or base failure rates and i factors.
The MIL-HDBK-217F2 approach allows an easy “what if” evaluation, thus enabling the engineer to experiment with temperature, environmental, and stress settings and see how the system performance will vary.
7.5.2.2 MIL-HDBK-217F2 Application Consider the electronic circuit in Fig. 7.17 which represents a part of the signaling system of an automatic cutting machine for leather in a dressmaking process. The prediction provided by MIL-HDBK-217F2 requires different models and specific parameters for each kind of component. For example, the model for connectors is p D b P Q E
Table 7.13 217F2)
(failures/106 h):
(7.3)
Base failure rate – connectors (MIL-HDBK-
Description
b (failures/106 h)
Dual in-line package Single in-line package Chip carrier Pin grid array Relay Transistor CRT
0.00064 0.00064 0.00064 0.00064 0.037 0.0051 0.011
7.5 Failure Rate Prediction Table 7.14 217F2)
209
Active pins factor – connectors (MIL-HDBK-
Active contacts (N )
P
1 2 3 4 5 6 ...
1.0 1.5 1.7 1.9 2.0 2.1 ...
N 1 p D exp 10
of environment is defined by MIL-HBBK-217F2 as ground mobile (GM /. In conclusion, using Eq. 7.3, p D b P Q E
21 D 0:00064 exp 10
0:39 1:0 14
D 0:013 failures/106 h: Considering fuse whsk 20, the MIL-HDBK-217F2 model is very simple:
0:39 :
p D b E
Table 7.15 Quality factor – connectors (MIL-HDBK-217F2) Quality
Q
Military specifications Low quality
0.3 1.0
(failures/106 h):
(7.4)
The base failure rate for all fuses is 0.010 failures/106 h and the environmental factor E is defined as in Table 7.17. The failure rate predicted value for fuse whsk 20 is p D b E D 0:010 8:0 D 0:080 failures/106 h:
Table 7.16 Environmental factor – connectors (MIL-HDBK217F2) Environment
E
Ground benign (GB / Ground fixed (GF / Ground mobile (GM / Naval sheltered (NS / Naval unsheltered (NU / Airborne inhabited cargo (AIC / Airborne inhabited fighter (AIF / ...
1.0 3.0 14 6.0 18 8.0 12 ...
Table 7.17 Environmental factor – fuses (MIL-HDBK-217F2) Environment
E
Ground benign (GB / Ground fixed (GF / Ground mobile (GM / Naval sheltered (NS / Naval unsheltered (NU / Airborne inhabited cargo (AIC / Airborne inhabited fighter (AIF / ...
1.0 2.0 8.0 5.0 11 9.0 12 ...
The factors in Eq. 7.3 depend on several conditions and are collected in Tables 7.13–7.16. Connector xc102 in the electronic circuit in Fig. 7.17 is a single in-line package connector with two pins, has a normal, i. e., not military specification, quality, and is installed on a moving shuttle. This kind
Switch 1 and switch 2 (code qa1304) are identical, perform the same function, and operate in a unique assembled group. They are push-button resistive switches, not military specifications with two double pole, single throw contacts, with stress level S (see Table 7.18) near 0.4. For these components MIL-HBBK-217F2 suggests this model: p D b L C Q E
(failures/106 h):
(7.5)
The parameters are defined using Tables 7.19–7.22.
Table 7.18 Load stress factor – switches (MIL-HDBK-217F2) Stress S
Load stress factor L Load type
0.05 0.1 0.2 0.3 0.4 0.5 ...
SD
Resistive
Inductive
Lamp
1.00 1.02 1.06 1.15 1.28 1.48 ...
1.02 1.06 1.28 1.76 2.72 4.77
1.06 1.28 2.72 9.49 54.6
operating load current ; rated resistive load current
L D exp
S 0:8
2 :
210
7 Maintenance Information System and Failure Rate Prediction
Table 7.19 Base failure rate – switches (MIL-HDBK-217F2) 6
Description
b (failures/10 h)
Centrifugal Dual in-line package Limit Liquid level Push-button Rocker ...
3.4 0.00012 4.3 2.3 0.10 0.023 ...
p D b L C Q E 0:6 2 D 0:10 exp 1:4 2:0 18 0:8 D 6:346 failures/106 h:
Table 7.20 Contact configuration factor – switches (MILHDBK-217F2) Form
Contacts
C
SPST DPST SPDT 3PST ...
1 2 2 3
1.0 1.3 1.3 1.4 ...
Table 7.21 Quality factor – switches (MIL-HDBK-217F2) Q
Military specifications Low quality
1.0 2.0
Table 7.22 217F2)
MIL-HDBK-217F2 standard provides a dedicated model to estimate the failure rate for lamps. In particular, p D b A U E
SPST single pole, single throw; DPST double pole, single throw; SPDT single pole, double throw; 3PST triple pole, single throw
Quality
tacts and does not have a military specification. Using the previous tables and Eq. 7.5, the final result is
Environmental factor – switches (MIL-HDBK-
Environment
E
Ground benign (GB / Ground fixed (GF / Ground mobile (GM / Naval sheltered (NS / Naval unsheltered (NU / Airborne inhabited cargo (AIC / Airborne inhabited fighter (AIF / ...
1.0 3.0 18 8.0 29 10 18 ...
In conclusion, switches 1 and 2 have the following failure rate predicted value: p D b L C Q E 0:4 2 D 0:10 exp 1:3 2:0 18 0:8 D 6:009 failures/106 h: Selector ff56 is a three-position resistive push-button device with a higher stress level (S D 0:6) than the previous switches; it has triple pole, single throw con-
(failures/106 h):
(7.6)
Lamp 1 (code pp24-60) is a 24-V direct current device working when the alarm is disabled, then probably with a coefficient of utilization greater than 0.90. Lamp 2 (code ght24-56) has the same characteristics, i. e., voltage and direct current, but it works in the opposite manner in comparison with lamp 1. The parameters b , A , U , and E are fixed in Tables 7.23–7.26. In conclusion, the estimated failure rates for lamps in the circuit are: • p D 4:5 3:3 1:0 3:0 D 44:550 failures/106 h for lamp 1;
Table 7.23 Base failure rate – lamps (MIL-HDBK-217F2) Voltage (V)
b (failures/106 h)
5 6 12 24 28 37.5 ...
0.59 0.75 1.80 4.50 5.40 7.90 ...
Table 7.24 Application factor – lamps (MIL-HDBK-217F2) Application
A
Alternating current Direct current
0.59 0.75
Table 7.25 Utilization factor – lamps (MIL-HDBK-217F2) Coefficient of utilization
U
< 0:10 0.10–0.90 > 0:90
0.10 0.72 1.0
7.5 Failure Rate Prediction Environmental factor – lamps (MIL-HDBK-
Environment
E
Ground benign (GB / Ground fixed (GF / Ground mobile (GM / Naval sheltered (NS / Naval unsheltered (NU / Airborne inhabited cargo (AIC / Airborne inhabited fighter (AIF / ...
1.0 2.0 3.0 3.0 4.0 4.0 4.0 ...
12
λb (failures/millions hs)
Table 7.26 217F2)
211
10 8 6 4 2 0 0
5
10
15
20
25
30
35
40
45
50
supply voltage (V)
Fig. 7.18 Base failure rate predictions under different supply voltages – pp series lamp
• p D 4:5 3:3 0:1 3:0 D 4:455 failures/106 h for lamp 2. All the devices in the circuit have a serial placement: the failure of a single component compromises all the system. The predicted failure rate of the entire circuit is therefore the sum of the different contributions of the predicted failure rates of the components: X system D components : Table 7.27 summarizes the results. The final predicted failure rate for this part of the signaling system of an automatic cutting machine is 67:462 106 h1 . In the case of a continuous variation of parameters, the MIL-HDBK-217F2 standard provides some equations to estimate the failure rates. This very interesting feature allows a kind of sensitivity analysis for the failure rate under varying conditions. For example, the MIL-HDBK-217F2 standard suggests for lamps the following law devoted to b evaluation in order to take into consideration the effect of the supply voltage Vr : b D 0:074Vr1:29
(failures/1;66 h):
(7.7)
Fig. 7.19 Failure rate predictions under different supply voltages and environments
It is possible therefore to investigate the variations of failure rate as a function of voltage. For example, the supply voltage of lamp 1 belonging to the pp series in the electronic circuit in Fig. 7.17 runs from 4 to 48 V, and its base failure rate can change according to this variation as represented in Fig. 7.18. Figure 7.19 presents the failure rate for the lamp of the pp series under different supply voltages and
Table 7.27 Failure rate predictions – signaling system electronic circuit (MIL-HDBK-217F2) Name
MIL-STD-217 category
Part number
Failure rate (failures/106 h)
Connector Fuse Switch 1 Switch 2 Selector Lamp 1 Lamp 2
15.2 Connectors, socket 22.1 Fuses 14.1 Switches 14.1 Switches 14.1 Switches 20.1 Lamps 20.1 Lamps
xc102 whsk 20 qa1304 qa1304 ff56 pp24-60 ght24-56
0.013 0.080 6.009 6.009 6.346 44.550 4.455
Total
67.462
212
environmental conditions. Application and utilization factors are fixed, i. e., U equal to 1.0 and A equal to 3.3.
7.5.2.3 GIDEP and FARADA GIDEP is a cooperative effort to exchange research, development, design, testing, acquisition, and logistics information among the government and participant industries. The objective of GIDEP is to improve the availability of information for the total quality management of critical materials. This goal includes improving reliability, maintainability, and cost of ownership while reducing or eliminating the use of critical resources for redundant testing and avoiding the use of known problem or discontinued parts and materials. GIDEP was born in 1959 as the Interservice Data Exchange Program (IDEP), a mutual agreement created by the Army, Navy, and Air Force in an effort to reduce duplicate qualification and environmental testing carried on for the military services by various contractors on the same parts, components, and materials. Initially IDEP covered only the military equipment and in a second stage it was expanded to include other types of data and information and others participants according to the requirements of the US defense industries. The program was renamed GIDEP to reflect the makeup of its participants and its evolution. In the early 1960 the data were collected, cataloged, analyzed, and published in a series of books known as the FARADA handbooks. Recently, several technical modernizations were made, with particular reference to the connection to automated data mining systems. At the time of writing, the GIDEP database contains five major data areas: • Engineering data. Information in engineering data covers a broad range of technical reports related to parts, components, materials, processes, systems, and subsystems applicable to all the engineering and technical disciplines. Soldering technology, best manufacturing practices, and value engineering reports are also contained in this data area. • Product information data. The product Information data include the diminishing manufacturing sources and material shortages notices, product change notices, and product information notices. • Failure experience data. This part of the database contains information about important failures and
7 Maintenance Information System and Failure Rate Prediction
their consequences. Failure experience data include the well-known ALERTs problem advisories and agency action notices. • Reliability–maintainability data. The reliability– maintainability data contain failure rate, failure mode, replacement rate, and mean time to repair data on parts, components, and subsystems. Some information is also in the failure experience data section. This is the core of the database when the problem is the failure rate prediction. The FARADA handbook is derived from this section. • Metrology data. This part contains the calibration procedures and technical manuals for test and measurement equipment. GIDEP data are accessible through a series of menus. Every document required is downloadable electronically. Data about new products are continually being assessed and are available according to the analysis and recommendations of the Data Committee.
7.5.2.4 RADC Nonelectronic Reliability Notebook In early 1980, RADC, New York State, USA, was engaged by the Air Force Agency to increase knowledge of the reliability performance of nonelectronic components in avionic equipment. At first, RADC developed methodologies to test components, thus introducing the “testability engineering principles.” Afterwards RADC published reliability handbooks containing failure data and reliability methods pertaining to a variety of applications. Its objective was the collection, analysis, and presentation of nonelectronic component failure data and the presentation of analytical methods forming the state of the art in nonelectronic reliability analysis. Topics include applicable statistical methods for nonelectronic reliability; reliability specifications; special application methods for reliability prediction; part failure characteristics; reliability demonstration tests. The last available version of this handbook is RADC-TR-85-194 distributed in 1985. The above-mentioned approaches, i. e., MIL-STD217, GIDEP, and RADC, are still applied to estimate figures for the predicted reliability of products. Many studies (Economou 2004) have indicated that their predictions are not concordant. Usually, MIL-HDBK217F2 is conservative and the actual value is several
7.5 Failure Rate Prediction
213
times better than the one predicted. The databases are built through information collected in the field and provided by supplier or users; since field failures depend on the specific application, these data are not representative for every situation. During the last few years, effort has mainly been devoted to enlarging the information in the database considering more influencing parameters: starting from MIL-STD-217 several other sources of reliability information have been developed, such as FIDES 2004, Telcordia SR-332, Naval Surface Warfare Center (NSWC) NSWC Handbook of Reliability Prediction Procedures for Mechanical Equipment, RDF 2000/2003, and the China 299B Electronic Reliability Prediction standard.
7.5.2.5 FIDES 2004 This approach has been developed since 2004 by a group of French companies working in the aeronautic and defense sector. It is based on the physics of failures method and supported by the analysis of test data and field returns. The FIDES approach provides models for components considering technological and physical factors, precise consideration of the mission profile, consideration of mechanical and thermal overstress, and the possibility of distinguishing the failure rate of a specific supplier of a component. Moreover, it takes into account failures linked to development, production, field operation, or maintenance processes. In synthesis, the failure rate predicted by the FIDES method is related to three parameters: D phis man proc :
(7.8)
phis is the physical failure rate. It is calculated using the base failure rate, usually represented by 0 and provided in tables, corrected by several factors, such as thermal conditions, electrical stresses, and humidity. man is a factor considering the quality level surrounding the part. Usually, the value is linked to specific certifications by the supplier of the components. proc is a factor linked to the characteristics of the realized process. In order to determine this value, a set of questions are provided. The FIDES approach is consistent with MILHDBK-217F2 (Marin and Pollard 2005) and it is usually less conservative, its failure rate being close to the observed rate.
7.5.2.6 Telcordia (Bellcore) SR-332 Telcordia is the new name of Bellcore Company (Bell Communications Research, a spin-off of AT&T Bell Labs). Bellcore previously referred to MIL-HDBK217 for its reliability predictions, and subsequently modified this model to reflect the field experience more exactly, thus developing in 1985 the Bellcore reliability prediction procedure, still applied to commercial electronic products. Many commercial electronic product companies are now choosing to use the Bellcore handbook for their reliability predictions. Typically this approach is useful to provide predictions for devices, units, or serial systems constituted by commercial electronic products. The information requested is the physical design data, the installation’s parameters, and the boundary conditions (e. g., temperature, vibrations).
7.5.2.7 NSWC Mechanical Reliability Prediction (US Navy Standard NSWC 06/LE1) Since 1992 the US Navy has dealt with the reliability prediction problem through its NSWC. The NSWC Handbook of Reliability Prediction Procedures for Mechanical Equipment contains 23 chapters of information with equations, engineering tables, and procedures for estimating the reliability of a mechanical design for the intended operating environment. The NSWC 06/LE1 standard is particularly devoted to mechanical components. Handbook procedures are used to determine the reliability of fundamental components such as springs, bearings, seals, and gaskets. These component applications are then expanded to subassemblies such as valves, actuators, and pumps and then to the system level. Equations in the handbook include parameters for material properties, operating conditions, and stress levels at each equipment indenture level, providing a full reliability, maintainability, and availability analysis at the system, assembly, and component indenture levels.
7.5.2.8 IEC 62380 (RDF 2000/2003 UTEC 80810 Method) The IEC 62380 module supports reliability prediction methods based on the European Reliability Predic-
214
tion Standard. This standard is directly derived from a French standard published by the Union Technique de L’Electricite in 2000. The standard evolved and became the European Standard for Reliability Prediction (IEC 62380). It includes most of the same components as MIL-HDBK-217, mainly therefore electronic devices. As this standard becomes more widely used, it could become the international successor to the US MIL-HDBK-217. Since it is difficult to evaluate the environmental factor, IEC 62380 uses equipment mission profiles and thermal cycling for evaluation. IEC 62380 provides complex models that can handle permanent working, on/off cycling, and sleeping applications. Its unique approach and methodology has gained worldwide recognition. IEC 62380 is a significant step forward in reliability prediction when compared with older reliability standards. It makes equipment reliability optimization studies easier to carry out, thanks to the introduction of influence factors. The reliability data contained in the IEC 62380 handbook are derived from field data concerning electronic equipment operating in these environments: • ground; stationary; weather-protected (equipment for stationary use on the ground in weatherprotected locations, operating permanently or otherwise); • ground; stationary; non-weather-protected (equipment for stationary use on the ground in nonweather-protected locations); • airborne, inhabited, cargo (equipment used in an aircraft, benign conditions); • ground; nonstationary; moderate (equipment for nonstationary use on the ground in moderate conditions of use). In conclusion, the latest version provides: • failure rate calculation at component, block, and system levels; • unavailability calculation at the system level; • repairable system calculation; • component and block i factors (see MIL-STD217 equations).
7.5.2.9 China 299B Electronic Reliability Prediction The China 299B standard is a reliability prediction approach based on the internationally recognized method
7 Maintenance Information System and Failure Rate Prediction
of calculating electronic equipment reliability given in the Chinese Military Standard GJB/z 299B. This standard uses a series of models, also very complicated, for various categories of electronic, electrical, and electromechanical components to predict failure rates that are affected by environmental conditions, quality levels, stress conditions, and various other parameters. The procedure requires a hierarchy process associating components, often not so user-friendly.
7.6 Remote Maintenance/Telemaintenance In this manuscript the authors strongly sustain the need for a “continuous” check of the equipment conditions, as a prerequisite to applying advanced maintenance policies (i. e., preventive and on condition). In the last few years, from this important issue of the technological evolution companies have been able to gain advantages: sensors, data capture systems, and the data transfer systems permit automatic data collection from the field. The integration of the automatic data collection and the CMMS database is a natural evolution of the system, suggesting very interesting advantages in terms of completeness of data and consumption of resources (i. e., workload and money). Moreover, in several cases the maintenance interventions are executed remotely thanks to remote control of actuators. This approach is generally called “remote maintenance” or “telemaintenance.” Early studies and applications have been developed in high-risk sectors, such as nuclear and chemical. The research linked to the International Thermonuclear Experimental Reactor (Haange 1995) is very interesting. Afterwards, the remote maintenance was extended to “capital-intensive” industrial sectors. General Electric can be considered a pioneer for proactive maintenance in large power plants (Rosi and Salemme 2001; Rotival et al. 2001). At the moment, the technology allows an extension of the remote control principle to small and medium-sized plants, thus opening enormous possibilities to plant managers, plant suppliers, and external companies for a global service. In summary, the technological resources (e. g., sensors, data management systems, actuators), the Internet, and other communication technologies can give or facilitate:
7.6 Remote Maintenance/Telemaintenance
215
• remote monitoring and as a consequence the analysis of degradation of plants; • notification of faults; • remote maintenance intervention (in particular, on the logical controller of the plant); • help on-line and remote counseling in real time; • management of spare parts; • education of personnel and continuous training. It is important to underline that the Internet and remote signaling are very powerful instruments also for off-line, which is not strictly linked to production flow, functions, having continuous development and a great impact in maintenance systems. Figure 7.20 gives a general representation of a telemaintenance system. Quick response and integration are the main advantages permitted by the automatic remote control, and practically their consequences lead to a significant reduction of cost. In particular, it is possible to build quickly a database for failures by the concentration of recorded data in some locations, even very far from each other. With this information a set of optimization algorithms and different approaches are usable, from simple ones to very complicated ones, such as expert systems or neural networks. Moreover, this centralized and continuously updated source of data guarantees maximum flexibility and real-time diffusion of knowledge. The absence of data sharing in industrial organizations is often a great problem. In this new vision each modification in the management system of main-
tenance data is very quick and easy: in fact, it is firstly based on the centralized master system and only secondarily on remote and local slaves. This new approach offers relevant possibilities about integration between users and suppliers of plants. This innovative link allows rapid interventions, maybe directly remotely, and can limit intermediate levels of maintenance structure, with maintenance engineers and local technicians. The heavy exchange of data that is usually realized between the customer and the supplier of equipment can be simplified by means of on-line counseling: e. g., remote training both in the starting phase and in the work phase, remote management of spare parts, and technical support and placing of purchase orders. Now that the potentiality of telemaintenance has been underlined some observations about the actors could be interesting. The evolution of the industrial market and the increasing costs of manpower are pushing companies to the delocalization of plants. In this situation, remote maintenance service can be an “owner resource,” totally managed by the enterprise. On the other hand, also in a “localized” case, many companies use external services for maintenance. From this point of view, remote strategies are very significant instruments. In fact, a lot of maintenance global service suppliers, with specific skills in different sectors, such as packaging machines, petroleum, and food and beverages, could be interested in offering their services to a set of similar plants owned by different companies around the world. These companies can use the high-level competences developed by
Fleet of machines/equipment
Database Algorithms Support ….
Control parameter adjustment Performance feedback Maintenance history Telemaintenance and diagnosis Spare parts Training
Real-time controller Production management
Remote site
Internet, LAN, WAN
Production/service site
Fig. 7.20 Remote maintenance system structure
216
7 Maintenance Information System and Failure Rate Prediction
Fig. 7.21 Peeling line scheme
outsourcers in different plants, and this is very crucial, especially during the start-up phases. Equipment suppliers can achieve concentration and scale economy, even offering their service 24 h=day with very competitive costs. It must be remembered that in a global service condition customers buy a fixed level of availability and productivity of plants. Plant supplier is the third category that can take advantage of remote maintenance, making it a not marginal factor: providing skills and competences to the plant customer in a rapid and economic way could turn into a strategic competitive advantage. Moreover, by punctual control of an installed fleet it is important to keep in mind that suppliers have strategic feedback, useful for addressing the research and the development of new products. Industrial experience shows that some criticalities are actually linked to remote maintenance and for this reason researchers will have a great job to overcome them in the future. Primarily, some observations about the measuring system must be underlined. The fundamental question is the definition, for each plant, of the most important parameters to take under control and to send. This choice, usually among temperatures, velocities, vibrations, torques, and electrical intensities, masks a determination of models linking the states of the plant to these parameters. In this perspective, research appears very long and interesting. Anyway, as the net of sensors will expand following the same increasing trend of recorded information, its management will turn into a very complex task. Sensors must transmit robust and reliable data, and actually we can use algorithms for the validation of field signals. In this following interpretative phase the human contribution is still desirable. Use of remote transmission systems, the Internet, and LANs involve questions about protocol standardization, security of data, and precompression techniques in order to make data transmission less onerous. The electronic and information technology sectors must provide suitable methods and instruments. In addition to this “technical question,” there are political and psychological criticalities. First, plant
users are still suspicious of maintenance systems based on remote suggestions. Second, the same plant suppliers are still reluctant to install sensors on machines. In this perspective, the last industrial positive results will surely be a great impulse.
7.6.1 Case Study This is an application of remote maintenance to a “peeling line” for wood panel manufacturing. In particular, the company is European leader for plywood panel production. The plant considered is located in northern Italy and started its production in February 2004, while the supplier is a great north European company. In 2006 the wood panel manufacturer accepted the supplier’s offer to adopt remote control and maintenance, management of spare parts, and continuous training of personnel by the Internet. Figure 7.21 shows the scheme of the plant and Fig. 7.22 presents a photograph of the exit section of the peeler. The plant works 16 h=day with two shifts, and has a cost per hour of about $1,500. Telecontrol required
Fig. 7.22 Product exiting from peeler. (Courtesy of Reni Ettore Spa)
7.6 Remote Maintenance/Telemaintenance
217
Table 7.28 Comparison between traditional maintenance and remote maintenance Traditional maintenance (2005)
Remote maintenance (2006)
Total hours available per year Production losses (h)a Production losses ($)a Corrective interventions by supplier (n/b Corrective interventions by supplier ($)b Corrective interventions by wood manufacturer (n/b Corrective interventions by wood manufacturer ($)b Preventive interventions by supplier .n/b Preventive interventions by supplier ($)b Preventive interventions by wood manufacturer .n/b Preventive interventions by wood manufacturer ($)b Remote interventions by supplier .n/c Total spare parts costs ($)d
5,198 287.3 (5.5%) 430,950 26 70,345 23 7,540 4 7,778 5 2,220 0 405,68 559,401
5,185 145.2 (28%) 217,100 5 19,874 16 2,350 2 13,520 20 8,952 6 42,550 304,346 255; 055
Production losses ($)a Maintenance policies total costs Spare parts total costs
430,950 87,883 40,568
217,100 44,696 304,346
a
Due only to corrective and preventive maintenance Excluding spare parts costs c Cost is included in the annual fee d Spare parts used in corrective and preventive interventions b
the introduction of a management system for the signal based on Sinumerik© technology by Siemens and the installation of new sensors. The fundamental variables under control are angular velocities of shafts, temperatures, intensity of currents, and vibrations, both for machines and the working environment. The interventions on hardware were realized during the 2005 winter stoppage, and the correspondent cash flow was about $ 130,200 .$ 1 D ¤ 0:98/. For this service of remote maintenance the supplier requested an annual fee of about $ 9,300 for remote counseling, training, and ordering of spare parts. In 2006 the new system worked, and Table 7.28 matches the most relevant maintenance factors for the traditional system (2005) and the new system (2006). A great recovery in hours worked, and therefore in costs, due to production losses can be observed immediately. At the same time the total cost of maintenance policies is decreased, and costs for spare parts are not changed much. Continuous remote control of the plant on more than one opportunity permitted an intervention, during unproductive time, before the failure. The possibility to use the great competences of the supplier in real time with very competitive costs (the largest fraction of supplier contri-
butions was only in a remote way) reduces downtimes in a significant manner. Finally, it must be noted that this system enabled the training of personnel, still in progress, and a continuous alignment between the technological improvements of the plant by operators. In conclusion, telemaintenance is a very powerful resource that can open great perspectives for industrial/service systems. Not only manufacturers, but also services industries can take advantage with remote control diagnosis and maintenance, both for users and suppliers. Experimental evidence shows the wide applicability of this technique: increasing availability and reducing costs are gained by punctual and continuous equipment monitoring, a rationalization of maintenance interventions, and low-cost management of spare parts and training. A large part of the technologies required to provide remote maintenance is available. Progress in sensors, protocols, and compression methods is desirable, but first and foremost a more intensive diffusion of the remote concept is needed. Very significant initial results of real applications surely will represent a great impulse.
8
Effects Analysis and Reliability Modeling of Complex Production Systems
Contents 8.1
Introduction to Failure Modes Analysis and Reliability Evaluation . . . . . . . . . . . . . . . . . . . . . 220
8.2
Failure Modes and Effects Analysis . . . . . . . . . . . . 8.2.1 Product Analysis . . . . . . . . . . . . . . . . . . . . . . . . 8.2.2 Failure Mode, Effects, and Causes Analysis . 8.2.3 Risk Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 8.2.4 Corrective Action Planning . . . . . . . . . . . . . . . 8.2.5 FMEA Concluding Remarks . . . . . . . . . . . . . .
220 221 222 222 225 229
8.3
Failure Mode, Effects, and Criticality Analysis . . 8.3.1 Qualitative FMECA . . . . . . . . . . . . . . . . . . . . . 8.3.2 Quantitative FMECA . . . . . . . . . . . . . . . . . . . . 8.3.3 Numerical Examples . . . . . . . . . . . . . . . . . . . .
229 231 231 232
8.4
Introduction to Fault Tree Analysis . . . . . . . . . . . . . 236
8.5
Qualitative FTA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.5.1 Fault Tree Construction Guidelines . . . . . . . . 8.5.2 Numerical Example 1. Fault Tree Construction . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.5.3 Boolean Algebra and Application to FTA . . . 8.5.4 Qualitative FTA: A Numerical Example . . . .
8.6
8.7
8.8
239 239 240 241 242
Quantitative FTA . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.6.1 Quantitative FTA, Numerical Example 1 . . . . 8.6.2 Quantitative FTA, Numerical Example 2 . . . . 8.6.3 Numerical Example. Quantitative Analysis in the Presence of a Mix of Statistical Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . .
244 248 252
Application 1 – FTA . . . . . . . . . . . . . . . . . . . . . . . . . . 8.7.1 Fault Tree Construction . . . . . . . . . . . . . . . . . . 8.7.2 Qualitative FTA and Standards-Based Reliability Prediction . . . . . . . . . . . . . . . . . . . . 8.7.3 Quantitative FTA . . . . . . . . . . . . . . . . . . . . . . .
263 264
Application 2 – FTA in a Waste to Energy System 8.8.1 Introduction to Waste Treatment . . . . . . . . . . . 8.8.2 Case study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.8.3 Emissions and Externalities: Literature Review . . . . . . . . . . . . . . . . . . . . . . .
277 277 278
254
266 269
279
8.8.4 SNCR Plant . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.8.5 SNCR Plant. Reliability Prediction and Evaluation Model . . . . . . . . . . . . . . . . . . . 8.8.6 Qualitative FTA Evaluation . . . . . . . . . . . . . . . 8.8.7 NOx Emissions: Quantitative FTA Evaluation 8.8.8 Criticality Analysis . . . . . . . . . . . . . . . . . . . . . . 8.8.9 Spare Parts Availability, What-If Analysis . . 8.8.10 System Modifications for ENF Reduction and Effects Analysis . . . . . . . . . . . . . . . . . . . . . 8.9
Markov Analysis and Time-Dependent Components/Systems . . . . . . . . . . . . . . . . . . . . . . . . . 8.9.1 Redundant Parallel Systems . . . . . . . . . . . . . . 8.9.2 Parallel System with Repairable Components 8.9.3 Standby Parallel Systems . . . . . . . . . . . . . . . . .
280 281 283 287 292 295 300 301 302 304 306
8.10 Common Mode Failures and Common Causes . . . 309 8.10.1 Unavailability of a System Subject to Common Causes . . . . . . . . . . . . . . . . . . . . . 310 8.10.2 Numerical Example, Dependent Event . . . . . 311
Given a complex system made of thousands of parts and components, such as an Airbus A380, a flexible manufacturing system, an item of health-care equipment (e. g., a radiation machine, a cardiograph), a particle accelerator, etc., there are several modes in which the system does not function properly, i. e., in accordance with specifications. The first problem is the identification of all these modes, even the rarest and most hidden ones, especially if the safety of people and the environment could be compromised. The second problem is the identification of the minimal conditions which can bring a system into one of its possible states of “not function” (i. e., failures). What about the number of failure events, the downtime, the uptime, and the availability of a complex system given a period of time T ? How can the performance of a system be improved? How can the exter-
R. Manzini, A. Regattieri, H. Pham, E. Ferrari, Maintenance for Industrial Systems © Springer 2010
219
220
8 Effects Analysis and Reliability Modeling of Complex Production Systems
nalities generated by a piece of equipment be reduced for a given reliability system configuration? A very critical problem deals with the treatment of dependency among failure and repair events for the basic components of the system under investigation. The Markov chain technique can effectively support the modeling activity of such a production system. The models and methods proposed and exemplified in this chapter will support the introduction of costbased optimization models for planning and executing the maintenance actions and the spare parts fulfillment and management, as properly discussed in the following chapters.
8.1 Introduction to Failure Modes Analysis and Reliability Evaluation The objective of this chapter is the introduction to models and methods supporting the production system designer and the safety and/or maintenance manager to identify how subsystems and components could fail and what are the corresponding effects on the whole system, and to quantify the reliability parameters for complex systems. A system is complex when it is made of physical and logical combinations of several primary components, a lot of basic items whose failure and repair behaviors are known in terms of reliability performance indexes, e. g., failure rate, expected number of failures (ENF), and the mean time to repair (MTTR). This chapter is organized as follows: firstly some models and tools i. e., failure modes and effects analysis (FMEA) and failure mode, effects, and criticality analysis (FMECA) for the identification of failure modes and causes are illustrated and exemplified; afterwards fault tree analysis (FTA) is introduced and applied to several significant examples; and, finally, Markov chain modeling is illustrated and applied.
8.2 Failure Modes and Effects Analysis FMEA is a systematic inductive technique designed to identify the potential failure modes for a product or a process, to assess the risk associated with those failure modes, to rank the issues in terms of importance,
and to identify and carry the correspondent corrective actions out. The final goal is to anticipate problems and minimize their occurrence and impact. Practically, the target is to prioritize the failure modes (product or process) by an index usually called “risk priority number” (RPN) which is very useful in designing activities to reduce the criticalities. FMEAs are often referred to by type, such as design FMEA (DFMEA) and process FMEA (PFMEA). DFMEA is focused on the product, the failure modes and their causes being related to product functions and components. The primary objective is to uncover the potential failures associated with the product that could cause malfunctions, safety hazards for the user, or shortened product life. Ideally the DFMEA should be conducted throughout the entire product design process, from the preliminary design until the product goes into production, with an iterative procedure. PFMEA examines how failures in manufacturing and assembly processes can affect operation and quality of a product or service. PFMEA indicates what can be done to prevent potential process failures prior to the first production run. Ideally the PFMEA should be conducted throughout the process design phase. Overall, FMEA is intended to be a dynamic and iterative process where practitioners review and update the analysis as new information becomes available, corrective actions are implemented, design phases progress, etc. FMEA requires different skills; hence, it is absolutely necessary to build an FMEA group usually organized and conducted by a FMEA process owner. This group may include representatives from the following areas: product design, testing, materials, suppliers/OEM, manufacturing and assembling, quality, and field service. The project leader plays a fundamental role in defining the rules and the organization of work. FMEA can represent a very powerful approach but in compliance with rules and personnel commitment, otherwise FMEA is only a time-consuming activity. There are several guidelines and standards for the requirements of FMEA as well as the recommended reporting format. Some of the main published standards for this type of analysis include: • MIL-STD-1629A; • J1739 from the Society of Automotive Engineers for the automotive industry;
8.2 Failure Modes and Effects Analysis
221
• AIAG FMEA-3 from the Automotive Industry Action Group for the automotive industry; • ARP5580 from the Society of Automotive Engineers for nonautomotive applications; • IEC 812 from the International Electrotechnical Commission; • BS 5760 from the British Standards Institution. In addition, many industries and companies have developed their own procedures to meet the specific requirements of their products/processes. The standards are slightly different, but the core of the FMEA procedure is the same: 1. 2. 3. 4. 5.
life application dealing with a fundamental part of a drink vending machine: the distribution valve system. These automatic machines for the preparation of various drinks are normally equipped with a multiway valve used for supplying water or steam to different collecting vessels, according to the drink required. The multiway valve is exposed to considerable stresses due to temperatures and pressures, and usually its behavior can significantly influence the total reliability of the machine (Fig. 8.1).
8.2.1 Product Analysis
FMEA group formation and rule sharing; product or process analysis; FMECA; risk evaluation; corrective action planning.
In the following pages, the DFMEA procedure (MILSTD-1629A standard) is detailed by means of a real-
The FMEA team must analyze the machine (in general, the system) with the goal to define the system structure having its subsystems and components placed at different hierarchical levels. This structure, usually in a top-down form, represents a very useful permanent reference when the system is very com-
Distribution valve subsystem Subsystem 1 Switch A Distribution valve subsystem Drink vending machine
Subsystem 3
Valve actuator Discharge pipe
……….. Subsystem 4
Fig. 8.1 Distribution valve subsystem, drink vending machine
………..
222
8 Effects Analysis and Reliability Modeling of Complex Production Systems
plex. This subsection arrangement is usually generated according to the different functions performed by subsystems, such as supply electrical energy, storage data, and sound recording. Normally each subsystem performs a single function. In this phase the analysis can usually require a lot of information, such as design drawings, description and operation documentation, and supplier information. In the real case discussed, the system has several subsystems, but the focus is on the distribution valve subsystem (item code 1100). Its “critical” components are an electrical switch (switch A), the valve actuator, and the discharge pipe: it is very important in this phase to concentrate the analysis on a small group of components having a strong impact on reliability. Machines have hundreds or thousands of items, and a thorough investigation is not applicable.
8.2.2 Failure Mode, Effects, and Causes Analysis Failures may potentially occur for each subsystem or function, resulting in several effects such as loss of production, no entrance of people, and absence of lighting. Usually each failure, or failure mode, can have several causes. A basic step of the FMEA procedure is the definition of the sequence of failure modes, effects, and causes. Typically data are arranged into a structured standard worksheet or a hierarchical diagram, as reported in Figs. 8.2 and 8.3, respectively (distribution valve subsystem – drink vending machine example). Switch A (item 1100.1), whose main function is to allow the distribution valve to supply the beverage, has three principal failure modes: oxidation, mechanical break, and pin disconnection from the connector. Columns 1 and 2 in the worksheet shown in Fig. 8.2 show, respectively, the item and its correspondent failure modes. Speaking about effects, one can distinguish among different categories: a local effect (FMEA worksheet, column 3), i. e., strictly concerning the item analyzed, a next-higher-level effect (FMEA worksheet, column 4), i. e., involving items set on the next-higher assembly level, and an end effect (FMEA worksheet, column 5), the most important in the FMEA.
Each failure mode can have different causes as reported in column 8 of the FMEA worksheet. Considering oxidation as a failure mode for switch A, the end effect is a difficult supply of beverage and the causes of oxidation can be a loss of water and steam and a problem with gaskets (tear and wear). Several FMEA styles (e. g., MIL-STD-1629A) potentially provide a failure detection method and a compensating provisions action (FMEA worksheet, columns 10 and 11). This supplementary information is very useful when corrective actions are investigated and implemented.
8.2.3 Risk Evaluation The core task of the FMEA is the evaluation of risks associated with the potential problems identified through the failure modes identification and analysis. The purpose of FMEA is to take actions in order to eliminate or reduce failures, starting with the highestpriority ones. It may be used to evaluate risk management priorities for mitigating known threat vulnerabilities. FMEA helps to select some remedial actions by reducing the cumulative impacts of life-cycle consequences resulting from a system failure. The risk of each failure is called “risk priority number” (RPN) and it is expressed by the product of severity (S ), occurrence (O), and detection (D). For a generic cause of failure i , RPNi D Si Oi Di :
(8.1)
Severity (Si ) is the amount of harm or damage the failure effect may cause to people or equipment. This parameter is rated following a qualitative scale. From the MIL-STD-1629A standard the correspondent magnitudes range from 1 to 4 as expressed in Table 8.1; this rate is reported in column 7 of the FMEA worksheet in Fig. 8.2. Occurrence (Oi ) is the rate stating the likelihood of occurrence for each cause of failure. The probability of occurrence ranges from extremely unlikely to frequent. Also in this case the evaluation is qualitative but it is clearly linked to the failure rate. This concept will be stressed later on when we speak about criticality analysis. From the MIL-STD-1629A standard the classification of occurrence is expressed in Table 8.2.
difficult supply of beverage
gaskets tear
water loss
Fig. 8.2 FMEA worksheet. RPN risk priority number, distribution valve subsystem
superficial cut
water loss
no supply of beverage
wear of internal crown
no supply of beverage
disconnection from connector
End Effects
(6)
no supply of beverage
Next Higher Level
(5)
mechanical break
Local Effects
(4)
difficult supply of beverage
Mission Phase/Operational Mode
(3)
oxidation
Failure Modes
(2)
1100.3 - discharge pipe keep clean supply system disconnection from chassis
1100.2 - valve actuator beverages supply
1100.1 - switch A supply permission
Item/Function
(1)
Causes
Sev - Si supply incorrect
assembly incorrect
vibrations from pump
thermal stress
pipe occlusion (residuals)
superficial treatment failed
assembly incorrect
assembly incorrect
presswork incorrect
normal use of disposal
assembly incorrect
vibrations from pump
vibrations from pump
wear
gaskets wear
gaskets tear
steam or water loss by valve
( 8)
(7) (10)
Failure Detection Method
(9)
Compensating Provisions
(11)
(13)
RPN
(12)
8.2 Failure Modes and Effects Analysis 223
Det - Di
Occ - Oi
224
8 Effects Analysis and Reliability Modeling of Complex Production Systems
1.1.1.1 - steam or water loss by valve (joint loosening) 1.1 - oxidation
1.1.1 - no supply of beverage
1.2 - mechanical break
1.2.1 - no supply of beverage
1.3 - disconnection from connector
1.3.1 - no supply of beverage
1.1 - wear of internal crown
1.1.1 - no supply of beverage
1.1.1.2 - gaskets tear 1.1.1.3 - gaskets wear
1.2.1.1 - wear 1100.1 - switch A 1.2.1.2 - vibrations from pump 1.3.1.1 - vibrations from pump 1.3.1.2 - assembly incorrect 1.1.1.1 - normal use of disposal 1.1.1.2 - presswork incorrect 1.1.1.3 - assembly incorrect 1.2.1.1 - assembly incorrect
1100 - distribution valve
1100.2 - valve actuator
1.2 - gaskets tear
1.2.1 - no supply of beverage
1.1 - disconnection from chassis
1.1.1 - water loss
1.2.1.2 - superficial treatment failed 1.1.1.1 - pipe occlusion (residuals) 1.1.1.2 - thermal stress
1100.3 - discharge pipe
1.1.1.3 - vibrations from pump
Fig. 8.3 FMEA diagram, distribution valve subsystem
In the FMEA worksheet (Fig. 8.2) this rate is posted in column 9. Detection (Di / is the likelihood that the failure will be detected. This parameter introduces an important point of view, often not considered in the classic magnitude-effect analysis. The difficulty of failure detection can represent a significant problem increasing the total criticality of a cause of a failure characterized by average severity and occurrence. Table 8.3 shows the criteria adopted for detection evaluation and the correspondent qualitative numerical ranking. Column 12 of the FMEA worksheet collects this ratio. The scales adopted by MIL-STD-1629A and presented here are only a model: various textbooks and
manuals addressing FMEA, or the standards adopted by major industries provide several rating scales, with the possibility for the team to create/modify them in order to fit the specific analysis. The basic concept remains to rate the failure risk by RPN. High values of RPN reveal critical causes of failure. The sum of the RPNi for a lower level (i. e., subsystem, subassembly, components) is the overall RPN for the upper level, up to the entire product. Considering to the distribution valve example, and in particular to switch A and its first failure mode (i. e., oxidation), the correspondent severity level is near the maximum (rate 3 – critical) because in this condition the customer has significant difficulties to obtain the
8.2 Failure Modes and Effects Analysis
225
Table 8.1 Severity rating scale (MIL-STD-1629A) Rate
Description
Criteria
1
Category IV – minor
2
Category III – marginal
3
Category II – critical
4
Category I – catastrophic
A failure not serious enough to cause injury, property damage, or system damage, but which will result in unscheduled maintenance or repair A failure which may cause minor injury, minor property damage, or minor system damage which will result in delay or loss of availability or mission degradation A failure which may cause severe injury, major property damage, or major system damage which will result in mission loss A failure which may cause death or weapon system loss (i. e., aircraft, tank, missile, ship, etc.)
Table 8.2 Occurrence rating scale (MIL-STD-1629A) Rate
Description
Criteria
1
Level E – extremely unlikely
2
Level D – remote
3
Level C – occasional
4
Level B – reasonably probable
5
Level A – frequent
Probability of occurrence is essentially zero during the item operating time interval. A single failure mode probability of occurrence is less than 0.001 of the overall probability of failure during the item operating time An unlikely probability of occurrence during the item operating time interval. A single failure mode probability of occurrence is more than 0.001 but less than 0.01 of the overall probability of failure during the item operating time An occasional probability of occurrence during the item operating time interval. A single failure mode probability of occurrence is more than 0.01 but less than 0.10 of the overall probability of failure during the item operating time A moderate probability of occurrence during the item operating time interval. A single failure mode probability of occurrence is more than 0.10 but less than 0.20 of the overall probability of failure during the item operating time A high probability of occurrence during the item operating time interval. A single failure mode probability greater than 0.20 of the overall probability of failure during the item operating time interval
Table 8.3 Detection rating scale (MIL-STD-1629A) Rate 1 2 3 4 5 6 7 8 9 10
Description
Criteria
Almost certain
Current controls almost always will detect the failure. Reliable detection controls are known and used in similar processes Very high likelihood current controls will detect the failure Good likelihood current controls will detect the failure Moderately high likelihood current controls will detect the failure Medium likelihood current controls will detect the failure Low likelihood current controls will detect the failure Slight likelihood current controls will detect the failure Very slight likelihood current controls will detect the failure Remote likelihood current controls will detect the failure No known controls available to detect the failure
Very high High Moderately high Medium Low Slight Very slight Remote Almost impossible
drink. The three causes of failure detected have an average value of probability of occurrence, but the higher level of probability is assigned to the wear of gaskets (ranked 4 in the occurrence scale), a cause linked to the natural use of the machine. All the above-mentioned causes are relatively easy to detect; the wear of gaskets is the higher level of criticality (ranked 5 – medium) in this case too. The result of the iteration of this approach to other components is the risk evaluation summarized in Fig. 8.4.
8.2.4 Corrective Action Planning The risk evaluation is the starting point for the design and the execution of corrective actions. The goal of FMEA is to anticipate potential problems and to perform activities in order to reduce and/or remove risks. RPN permits the interventions to be prioritized. It is worth remembering that RPN ratings are related to a specific analysis. A crossover comparison of some RPN values among different applications (product or process) is in fact meaningless.
difficult supply of beverage
gaskets tear
water loss
Fig. 8.4 Risk evaluation for distribution valve system (MIL-STD-1629A standard)
superficial cut
water loss
no supply of beverage
wear of internal crown
no supply of beverage
disconnection from connector
End Effects
(6)
no supply of beverage
Next Higher Level
(5)
mechanical break
Local Effects
(4)
difficult supply of beverage
Mission Phase/Operational Mode
(3)
oxidation
Failure Modes
(2)
1100.3 - discharge pipe keep clean supply system disconnection from chassis
1100.2 - valve actuator beverages supply
1100.1 - switch A supply permission
Item/Function
(1)
2
2
3
4
4
4
1
supply incorrect
4
vibrations from pump
4
2
thermal stress
assembly incorrect
3
2
superficial treatment failed
pipe occlusion (residuals)
2
4
assembly incorrect assembly incorrect
2
2
assembly incorrect
presswork incorrect
3
vibrations from pump
4
2
vibrations from pump
normal use of disposal
3
4
gaskets wear wear
3
gaskets tear
Failure Detection Method
3
(10)
(9)
Occ - Oi
steam or water loss by valve
Causes
Sev - Si 3
(8)
(7)
Compensating Provisions
(11)
5
5
6
4
9
10
3
3
7
2
2
9
8
2
5
4
10
40
48
16
54
68
18
48
56
32
16
108
64
24
60
36
18
Det - Di 2
(13)
RPN
(12)
226 8 Effects Analysis and Reliability Modeling of Complex Production Systems
8.2 Failure Modes and Effects Analysis
227
The RPN analysis recommends corrective actions focused on reducing a single factor or more than one factor. Usually the FMEA team provides a new level of RPN, the so-called revised RPN, to be compared with the initial RPN. The FMEA team must spend time analyzing the RPNi configuration. Typical instruments are the Pareto chart of RPNi , the occurrence–severity matrix, the causes by occurrence analysis, and the effects analysis. Application of these tools with reference to the distribution valve example is shown in Figs. 8.5–8.8. The most critical cause of failure has RPNi D 108, which corresponds to Si D 4, Oi D 3, and Di D 9 due to vibrations from the pump as a result of the disconnection of switch A from the connector. Others critical issues engage switch A and pump vibrations: in particular, a mechanical break is possible (RPNi D 64, Si D 4, Oi D 2, and Di D 8). Switch A has a very high occurrence among the greatest RPN values. Its problems are fundamentally due to pump vibrations and gaskets. The occurrence–severity matrix is another interesting tool for the risk assessment. The user can set three different regions on the two-dimensional space severity (on x-axes) and occurrence (on y-axes) by the
Causes Ranked by Initial RPN 120 108 100
Cause RPN
80
64 60 60
60
56
54 48 48 40
40
36 32 24
20
18 18
16 16 10
0
1
2
3
4
5
6
7
8
9
definition of high and low levels. The matrix gives a prompt idea about the criticality of the causes of failure. The analysis can be completed by other studies such as the causes by occurrence (Fig. 8.7) and the effects classification (Fig. 8.8). In conclusion, the analysis of RPNi allows one to prioritize some corrective actions usually linked to the product design. For the distribution valve case, the FMEA team decided to improve the first four criticalities sorted by the Pareto analysis of RPN. As mentioned, the more critical problems deal with the vibrations induced by the pump and the resistance and the retaining of valve gaskets. In particular, several corrective actions are defined: • A rubber bumper insertion in the fixing system between the pump and the chassis to reduce the vibrations induced on other components (i. e., switch A and discharge pipe). The responsibility is shared by the mechanical design division and the procurement division. The activity starts on 1 November 2008 and the due date is fixed at 1 June 2009. • A new switch design with mechanical redundancy to increase the availability of disposal. The responsibility is shared by the mechanical design division and the procurement division. The activity starts on 1 November 2008 and the due date is fixed at 1 June 2009. • A new connection system to avoid disconnection of electrical pins. The quality assurance division must guarantee the study and the procurement division must search for a new effective supplier. The starting date is 1 November 2008 and the new system must work before 1 April 2009. • A new material or new treatment for gaskets. At the same time a new profile is needed for the gasket to avoid tearing. The mechanical design division must develop the new profile, and the quality assurance division executes the experiments to validate new materials and a new profile. The procurement division must search for new suppliers. The activity starts on 1 November 2008 and the due date is 1 April 2009.
10 11 12 13 14 15 16 17
Cause
Fig. 8.5 Pareto analysis of initial RPN, distribution valve subsystem
The corrective actions provided have a significant potential effect on the criticality of the distribution valve, as confirmed by the 50% decrease of the criticality
228
8 Effects Analysis and Reliability Modeling of Complex Production Systems Occurrence/Severity Matrix (Initial Ratings) 5
Low Priority Line High Priority Line
Occurrence
4
x2
x2
3
x2
x2
Severity limit HIGH: 7 LOW: 3 Occurrence limit HIGH: 6 LOW: 4 2
x2
x3
1
0
0
1
2
3
4
Severity
High-priority causes: Normal use of disposal Assembly incorrect Gaskets wear Wear Vibrations from pump
(Item: 1100.2 - valve actuator) (Item: 1100.2 - valve actuator) (Item: 1100.1 - switch A) (Item: 1100.1 - switch A) (Item: 1100.1 - switch A)
Sev = 4, Sev = 4, Sev = 3, Sev = 4, Sev = 4,
Occ = 4 Occ = 4 Occ = 4 Occ = 3 Occ = 3
Medium-priority causes: Assembly incorrect Steam or water loss by valve Gaskets tear Presswork incorrect Vibrations from pump Assembly incorrect Assembly incorrect Superficial treatment failed Pipe occlusion (residuals) Thermic stress Vibrations from pump
(Item: 1100.3 - discharge pipe) (Item: 1100.1 - switch A) (Item: 1100.1 - switch A) (Item: 1100.2 - valve actuator) (Item: 1100.1 - switch A) (Item: 1100.1 - switch A) (Item: 1100.2 - valve actuator) (Item: 1100.2 - valve actuator) (Item: 1100.3 - discharge pipe) (Item: 1100.3 - discharge pipe) (Item: 1100.3 - discharge pipe)
Sev = 2, Sev = 3, Sev = 3, Sev = 4, Sev = 4, Sev = 4, Sev = 3, Sev = 3, Sev = 2, Sev = 2, Sev = 2,
Occ = 4 Occ = 3 Occ = 3 Occ = 2 Occ = 2 Occ = 2 Occ = 2 Occ = 2 Occ = 3 Occ = 2 Occ = 4
Low-priority causes: Supply incorrect
(Item: 1100.3 - discharge pipe)
Sev = 2, Occ = 1
Fig. 8.6 Occurrence–severity matrix, distribution valve subsystem
of the “original” causes at least. The FMEA procedure suggests a calculus of the new levels of severity, occurrence, and detection parameters (so-called revised) and in conclusion a new revised RPN is available.
Clearly, both the initial RPN and the revised RPN are based on an estimation of their factors, no mathematical models, or something similar supporting these evaluations. Figure 8.9 shows the action plan and the comparison between RPN values.
8.3 Failure Mode, Effects, and Criticality Analysis
229
Fig. 8.7 Causes by occurrence (distribution valve system)
Fig. 8.8 Effects classification (distribution valve system)
8.2.5 FMEA Concluding Remarks FMEA is a well-known qualitative reliability method. It is devoted both to the product and to the process analysis. It provides a systematic approach requiring all known or suspected potential failures to be considered. Usually the analysis directly results in actions to reduce failures and anyhow includes a follow-up system and reevaluation of potential causes of reliability problems. By paying attention to the customer point of view, it permits a tangible improvement of product and process reliability. Since FMEA represents a valid support to the design review provided by EN ISO 9001 and gives immediacy to the problem’s revision procedures, it should be approached together with the design phase as a whole. Some difficulties are of course related to its application. In particular, FMEA is a time-consuming process with very complex tasks taking hours or days to complete the process; it accounts for every cause of problems as a single event, and the combinations of events are captured as a single initiating event. Moreover, the process relies on recruiting the right participants and the personnel involved must be truthful about the respective activities. Nevertheless, it is worth mentioning some complications due to human error, some-
times overlooked because of the limited possibility of examination. Finally, it is important to remember that FMEA is only a qualitative procedure based on different scales of attributes such as severity, occurrence, and detection of failures, whose evaluations are dependent on the team involved. Just to overcome this last criticism, FMECA was developed as an extension of FMEA. The fundamental feature of FMECA is the introduction of the criticality factor, which is an effort to evaluate the criticality of the components on a quantitative basis instead of the qualitative approach adopted by FMEA.
8.3 Failure Mode, Effects, and Criticality Analysis FMECA differs from FMEA in investigating the criticality of failure in detail. This process systematically determines functions, functional failures, and failure modes of the production system, i. e., the equipment, with particular attention to the related effects, severity, and frequency of failure effects. A fundamental reference for the FMECA is represented by the MIL-STD-1629A standard. It provides two levels of criticality analysis: the qualitative and the quantitative FMECA.
Failure Modes
no supply of beverage
disconnection from connector
superficial cut
keep clean disconnection supply from chassis system
gaskets tear
water loss
water loss
difficult supply of beverage
beverages wear of internal no supply of supply crown beverage
no supply of beverage
difficult supply of beverage
End Effects
mechanical break
supply oxidation permission
Function
Si
24 64 rubber bumper insertion under pump fixing system - switch with mechanical redundancy
56 48 18
16 32
4 1
2 assembly incorrect supply incorrect
5 5
4 6
40 10
16 48 rubber bumper insertion under pump fixing system
54
10 60 developing and testing new material or new treatment - new profile for gasket to reveal tear
2 4
2
superficial treatment failed
7 3 3
9
2 4 2
presswork incorrect assembly incorrect 3 assembly incorrect
2 2
3
2 4
assembly incorrect 4 normal use of disposal
Corrective action description
36 60 developing and testing new material or new treatment - new profile for gasket to reveal tear
18
RPNi
9 108 rubber bumper insertion under pump fixing system - new connection system
2 pipe occlusion (residuals) thermal stress vibrations from pump
3
2 8
4 5
3 4
3 2
2
Di
3
4 vibrations from pump
4 wear vibrations from pump
3 steam or water loss by valve gaskets tear gaskets wear
Causes
Oi
Fig. 8.9 Planned corrective actions and revised RPN, distribution valve subsystem
1100.3 discharge pipe
1100.2 valve actuator
1100.1 switch A
item
quality assurance / mech. design / procurement
quality assurance / mech. design / procurement
quality assurance / procurement
mech. design / procurement
quality assurance / mech. design / procurement
Responsability
2008-11-01
2008-10-01
2008-11-01
2008-11-01
2008-10-01
Planned start date
2009-04-01
2009-02-01
2009-04-01
2009-06-01
2009-02-01
Completion Date
Sr 2
3
4
4
3
3
1
1
1
2
4
8
9
8
3
Dr
revised Or
initial
RPNr 24
24
36
32
18
230 8 Effects Analysis and Reliability Modeling of Complex Production Systems
8.3 Failure Mode, Effects, and Criticality Analysis
8.3.1 Qualitative FMECA The qualitative FMECA approach is a direct followup of the FMEA result. The target is to assign a priority to the failure modes and to group them in different “classes of criticalities,” usually three, according to a qualitative criticality matrix including the parameters severity and occurrence. The first factor can be evaluated by four different levels, from minor to catastrophic, as used for FMEA (Table 8.1). In the same way, the occurrence of the second factor is evaluated according to a qualitative scale ranging from extremely unlikely to frequent, as in FMEA (Table 8.2). Each failure mode is classified into the matrix depending on its own evaluations, usually indicated as Si and Oi for severity and occurrence, respectively. The most critical failure modes are revealed immediately, since three areas of criticalities, low, medium, and strong as in Fig. 8.10, are provided as a standard. The relative position of each failure mode with respect to the position of the “best” and “worst” categories gives a qualitative idea of its corresponding criticality level. The qualitative FMECA applied to the example of the distribution valve system is summarized by the criticality matrix in Fig. 8.11. Comparing some failure modes, the oxidation of switch A contacts, the wear of the internal crown of the valve actuator, and the mechanical break of the switch are very critical, while the disconnection of the discharge pipe from chassis failure mode has a medium level of criticality.
231
On one hand, the simplicity of the approach makes it suitable as a preliminary activity in order to drive the qualitative FMECA; however, on the other hand, it is sometimes very hard to estimate the qualitative evaluations of factors in a significant way.
8.3.2 Quantitative FMECA This approach is based on a quantitative procedure representing the most rigorous method currently available. The fundamental goal is the development of a numerical expression of the item criticality. Considering an item having c significant components, the correspondent item criticality is IC D
c X
CCi ;
(8.2)
i D1
where CCi is the criticality of component i defined as CCi D
m X
FMCij ;
(8.3)
j D1
where m is the number of failure modes for component i and FMCij is the failure mode criticality of failure mode j for component i . Each failure mode is characterized by a criticality value derived from FMCij D CUi .t / RUij PLij ;
(8.4)
Level A frequent
Occurrence
Level B reasonably probable
strong
Level C occasional
medium
Level D remote
low
Level E extremely unlikely
Fig. 8.10 Criticality matrix and criticality regions
IV - minor
III - marginal Severity
II - crical
I - catastrophic
232
8 Effects Analysis and Reliability Modeling of Complex Production Systems
PROBABILITY OF OCCURRENCE LEVEL ( INCREASING LEVEL OF PROBABILITY ----------> )
(HIGH)
(LOW)
Level A frequent Level B reasonably probable Level C occasional
- disconnection from chassis
- oxidation
- gaskets tear
- mechanical break
Level D remote
- disconnection from connector
Level E extremely unlikely
- superficial cut
Category IV - minor
- wear of internal crown
Category III - marginal
Category II - critical
Category I - catastrophic
SEVERITY CLASSIFICATION (INCREASING LEVEL OF SEVERITY ------>)
Fig. 8.11 Criticality matrix, distribution valve system
where t is the operating time, CUi .t / is the unreliability of component i at operating time t , RUij is the ratio of unreliability of failure mode j for component i , and PLij is the probability of loss of function, due to the failure mode j for component i . As shown in Eq. 8.4, for each failure mode the criticality is the product of three numerical factors. The first one, CUi .t /, is common for all the failure modes of the same component, and represents the unreliability of the component at the operating time t , thus disclosing a bridge between the quantitative FMECA and the theory of reliability. The definition of the component unreliability requires the operating time setting and the evaluation of the time-dependent failure distributions through well-known mathematical approaches, e. g., Weibull and exponential, as discussed in Chaps. 5 and 6. The ratio of unreliability RUij of the failure mode j is the probability that the component failure will be due to the considered failure mode j ; it is the percentage of failures, among all the failures allowed for the component, that will be caused by the given mode. It is important to note that the total percentage assigned to all modes must be obviously equal to 100%: m X j D1
RUij D 1
(8.5)
The probability of loss PLij is the probability of the loss of function at the occurrence of the considered failure mode j . This value is often equal to 1, because the failure gives rise to a complete loss of functionality of the component. In conclusion, the quantitative FMECA requires a procedure based on several steps: • definition of the reliability statistical distribution for different components of each item; • definition of an analysis operating time; • identification of the part of unreliability assigned to each potential failure mode; • rating of the probability of loss of function resulting from each failure mode that may occur; • calculation of the criticality for each component; • calculation of total item criticality by the sum of previous calculated criticalities. The final results are numerical evaluations of item criticalities which represent the starting points for a critical analysis and for the corrective action plan.
8.3.3 Numerical Examples We now present two numerical examples. Consider an item X, composed of two components A and B. The experimental evidence permits
8.3 Failure Mode, Effects, and Criticality Analysis
233
Table 8.4 Statistical distribution of reliability of components A and B f .t / Component A Component B
Exponential Normal
Table 8.5 Statistical distribution of reliability of components of the distribution valve system
Parameters
f .t / 1
.t / D 0:000207 h D 6:578 h D 1:211 h
an evaluation of their reliability performance, summarized in Table 8.4. Setting the operating time t D 6;000 h, the correspondent unreliabilities of the two components are CUA D FA .6000/ D 0:712; CUB D FB .6000/ D 0:316; Consider component A responsible for a generic function, named “function A,” and two failure modes, named “failure mode A.1” and “failure mode A.2,” generating, respectively, two causes named “cause A.1.1” and “cause A.1.2” and a single cause A.2.1. Failure mode A.1 is responsible for 60% of the failures of component A, then the remaining 40% is due to failure mode A.2. Failure mode A.1 gives rise to a complete loss of function A, while the probability of loss of function for failure mode A.2 is about 90%. Focusing on failure modes, FMCA;1 D CUA RUA;1 PLA;1 D 0:712 0:6 1 D 0:427; FMCA;2 D CUA RUA;2 PLA;2 D 0:712 0:4 0:9 D 0:256: Then the criticality of component A is CCA D FMCA;1 C FMCA;2 D 0:427 C 0:256 D 0:683: Similarly for component B the criticality is CCB D 0:269. In conclusion, item X has a criticality defined by the sum of the criticalities of its components:
Switch A
Normal
Valve actuator Discharge pipe
Exponential Weibull
D 752 h D 321 h .t / D 0:001 h1 ˇ D 2:766 D 2;463 h
Now consider the application of the distribution valve system, the significant components are switch A (ID 1100.1), the valve actuator (ID 1100.2), and the discharge pipe (ID 1100.3). For each of them the failure statistical distributions are defined in Table 8.5. The operating time is set to 1;000 h; for a drink vending machine, having an average operating of about 4 hours per day, this time represents more or less 1 year of work, that is the time between two consequent overhaul interventions. Figure 8.13 shows the final result of the quantitative FMECA approach. The results of the quantitative FMECA have different levels of detail: the criticality index can be defined for a single failure mode, or for a single component, i. e., groups of failure modes, or finally for a single item, i. e., groups of components. This feature allows a complete top-down analysis for the research of the most critical items of a product, its most critical components, and their related failure modes. In spite of this, a very effective corrective action plan can be developed. The distribution valve system has a criticality index of 1.289 fundamentally due to the criticality of switch A (0.689) and of the valve actuator (0.533). The discharge pipe has a secondary effect on the criticality of the entire item (Table 8.6). Analyzing the criticality of failure modes, the oxidation of contacts, the mechanical break for switch A, and the wear of the internal crown for the valve ac-
Table 8.6 Distribution valve criticality and component criticalities Criticality
ICX D CCA C CCB D 0:683 C 0:269 D 0:952: Figure 8.12 presents a typical worksheet used for the quantitative FMECA populated with the data of the previous example referred to item X.
Parameters
1100 – distribution valve 1100.1 – switch A 1100.2 – valve actuator 1100.3 – discharge pipe
1.289 0.689 0.533 0.077
234
8 Effects Analysis and Reliability Modeling of Complex Production Systems
Fig. 8.12 Quantitative failure mode, effects, and criticality analysis (FMECA) worksheet (item X example)
Table 8.7 Failure mode criticalities for the distribution valve system Failure modes and causes Wear of internal crown – Normal use of disposal – Presswork incorrect – Assembly incorrect Mechanical break – Wear – Vibrations from pump Oxidation – Steam or water loss by valve – Gaskets tear – Gaskets wear Disconnection from chassis – Pipe occlusion (residuals) – Thermal stress – Vibrations from pump Disconnection from connector – Vibrations from pump – Assembly incorrect Gaskets tear – Assembly incorrect – Superficial treatment failed Superficial cut – Assembly incorrect – Supply incorrect
Mode criticality 0.491
0.351
0.281
0.071
0.047
0.042
0.006
tuator are clearly very critical modes. The remaining modes have marginal criticalities. In conclusion, the product designers must focus their attention on the causes of these critical modes, listed in Table 8.7. The characteristic numerical approach of the quantitative FMECA allows a robust comparison in terms
of criticalities among different items of a product, and moreover gives priority to the corrective actions to be taken, ranking the failure modes and the related causes. It is important to note that this robustness is paid for, on the other hand, in terms of the time spent collecting data and developing the calculus of criticality factors. Moreover, the quantitative FMECA also requires some subjective assumptions; in particular, the unreliability ratio of failure mode j for component i RUij and the probability of loss of failure mode j for component i PLij depend on personal evaluations by the engineers, the technicians, and the practitioners who will develop the analysis. For this reason, some authors consider FMEA and in particular FMECA very effective instruments in the product/process design phase, but suggest their use exclusively for a comparison among the different failure modes or/and the components of a single product or process. In the case of a cross-check of the results among different products or processes, these methods reach their limits. Another typical result of the quantitative approach is the quantitative criticality matrix. It represents a hybrid matrix mixing the severity evaluation and the criticality value of each failure mode. As well as the FMEA criticality matrix, it usually individuates three zones characterized by different levels of criticality. Figure 8.14 shows the quantitative criticality matrix for the distribution valve system.
Fig. 8.13 Quantitative FMECA worksheet for the distribution valve system
8.3 Failure Mode, Effects, and Criticality Analysis 235
236
8 Effects Analysis and Reliability Modeling of Complex Production Systems
CRITICALITY NUMBER (Cr) ( INCREASING LEVEL OF CRITICALITY ------ > )
(HIGH)
(LOW)
0.491
- wear of internal crown
0.351
- mechanical break
- oxidation
0.281
- disconnection from chassis
0.071
0.047
- disconnection from connector
- gaskets tear
0.042
- superficial cut
0.006 Category IV - minor
Category III - marginal
Category II - critical
Category I - catastrophic
SEVERITY CLASSIFICATION (INCREASING LEVEL OF SEVERITY ------ >)
Fig. 8.14 Quantitative FMECA matrix for the distribution valve system
8.4 Introduction to Fault Tree Analysis FTA is a systematic technique which is used to acquire information on a system, in the case of normal behavior but, in particular, in the presence of a failure, in order to support the very complex decision-making process during the design stage as well as its managing and controlling activities. This process generally involves people dealing with the system, from suppliers to customers passing through managers and employees working daily within the system. This analysis can also support the decision-making process developed by safety and maintenance engineers who plan and organize preventive and/or breakdown maintenance and monitoring activities on the production systems. The fault tree is a deductive system analysis by which the analyst postulates that the system could fail in a certain way and attempts to find out how the system or its components could contribute to this failure. Born as a qualitative model, it turned into a quantitative tool: for this reason in this chapter qualitative and quantitative analyses are distinguished and applied to trivial academic examples and some industrial case studies. A fault tree is a whole set of entities called “gates” addressing the bottom-up transmission of fault logic. These gates represent the relationships of events for the occurrence of a higher event, called “father event.” The higher event is the output of the gate, while the
events at a lower level, also called “sons of the father,” are the input. Figure 8.15 reports a list of main typologies of events, gates, and transfers. Figure 8.16 shows a list of gates available in the commercial Relex® Reliability software. Figure 8.17 illustrates a FTA applied to an elevator, here referred to as a particular production system. The top event “passenger injury which occurs in an elevator” is analyzed by Relex® Reliability software. In general, the top event is the result of different combinations of basic events identified for the components of the system. The behavior of every element in the system is known in terms of failures and repairs, and it can be modeled by the usual parameters coming from the reliability evaluating activities. With reference to the failure rate, two kinds of components can be mainly distinguished: passive and active components. A passive, or quasi-static, component transmits a signal, e. g., a current or a force: the failure rates are below 104 per demand, i. e., about 3 107 h1 . An active component causes or modifies a signal above this value. Usually there are 3 orders of magnitude between these rate values. In the case of failure of an active component, e. g., a switch in an electrical circuit, a hydraulic pump, or a valve regulating the fluid flow in a piping system, the output signal could be incorrect or absent, while the failure of a passive component, e. g., an electric wire in a circuit or a pipe in a piping system, can result in a no-signal transmission.
8.4 Introduction to Fault Tree Analysis Symbol
237
Name
Description
Basic event
A fault event which does not require further development
Top event
This event is related to a failure mode of the production system. The aim of a FTA is the characterization of this event
Conditioning event
It specifies the condition and/or the restrictions applied to a logic gate (e.g., a P-AND gate)
Intermediate event
It occurs because of one or more former causes acting through logic gates
AND gate
Output fault occurs if all input faults occur
OR gate
Output fault occurs if at least one of the input faults occurs
XOR gate (exclusive OR gate)
Output fault occurs if solely one of the input faults occurs
P-AND gate (priority AND gate)
It is a special case of an AND gate. Output fault occurs if all of the input fails in a specific sequence, stated by a conditioning event
INHIBIT gate
The output is caused by a single input if only it is conditional, i.e., under the condition specified by the conditioning event
Transfer IN
It points out that the tree is developed further at the transfer OUT
Transfer OUT
It shows the portion of the tree that has to be attached to the related transfer IN
Fig. 8.15 Main gates, events and transfers in a fault tree analysis (FTA)
238
8 Effects Analysis and Reliability Modeling of Complex Production Systems
In order to introduce the reader to the meaning and use of a fault tree, Fig. 8.18 illustrates a two-input OR gate, where A and B are the input events and C is the output. By Eq. 5.9 the probability of event C can be expressed as follows: P .C / D P .A/ C P .B/ P .A \ B/ D P .A/ C P .B/ P .A/P .BnA/:
(8.6)
Equation 8.6 can be properly modified in accordance with the following hypotheses: 1. A and B are mutually exclusive events: ( P .A \ B/ D 0; P .C / D P .A/ C P .B/: Fig. 8.16 Gate list in Relex® Reliability software
Fig. 8.17 FTA, passenger injury in an elevator (Relex® Reliability software)
8.5 Qualitative FTA
239
8.5.1 Fault Tree Construction Guidelines Before the introduction of the main notation and properties of Boolean algebra, a few guidelines for the construction of a fault and its application to a production system, with a previously identified top event, could be useful. It is a top-down process of analysis starting from the top event defined for the system, or a generic part (subsystem) of the system: Fig. 8.18 OR gate
2. A and B are independent events: ( P .B=A/ D P .B/; P .C / D P .A/ C P .B/ P .A/P .B/: 3. Event B is completely dependent on event A: ( P .B=A/ D 1;
1. Identification of a more detailed event. The generic event or input is substituted by a new and more detailed output event, as in Fig. 8.19. 2. Classification. The generic input event is analyzed in depth by the identification of two, or more, basic and alternatives configurations, e. g., cases 1 and 2 in Fig. 8.20. This identification is based on a process of classification applied to the input event and the introduction of an OR gate which classifies the available configuration (and/or failure) modes of the starting event, as illustrated in Fig. 8.20.
P .C / D P .A/ C P .B/ P .A/ D P .B/: Figure 8.17 reports the value of unavailability, or failure probability, for every basic event or combination; e. g., the failure probability for the basic Event11 “controller failure” is Q D 0:00741239, while for Gate5 “door close failure” Q D 0:00989076. The determination of these measures of unavailability, accomplished by ENF values, MTTR values, etc., is the result of the so-called quantitative FTA, properly illustrated and exemplified in Sect. 8.6. The next section presents the “qualitative” FTA, whose aim is the identification of the so-called cut sets, which are the minimal combinations of primary failure components/events causing the top event of the production system.
Abstract event
More detailed event
Fig. 8.19 A more detailed event
Event
8.5 Qualitative FTA The objective of this section is to identify the minimal cut sets (MCS) of a fault tree defined for a specific top event in a production system. A MCS is an intersection of “primary,” or “basic,” events essential for the top event: if a single failure in the cut set does not occur, there is no top event failure. The identification of cut sets can be effectively supported by the application of the Boolean algebra, whose basic notation and properties are introduced below.
+
Case A
Case B
Fig. 8.20 Classification of failure modes
240
8 Effects Analysis and Reliability Modeling of Complex Production Systems
+
Event
Cause A
Cause B
Fig. 8.21 Identification of distinct causes
x
Event
Absence of protection/ protective action
Hazard/failure event
Fig. 8.22 Absence of protection/protective action
8.5.2 Numerical Example 1. Fault Tree Construction Figure 8.24 presents a pumping system supplying cooling water for temperature control of a reactor and the related tank pressure. In particular, given the catastrophic top event “reactor explosion” and knowing the reliability performance indexes for a set of basic components, Fig. 8.25 shows a fault constructed according to the previously illustrated guidelines. The breakage of valves V1 and V2, of pumps P1 and P2, of processor PR, and the absence of electric power PW are the failure basic events defined for the system. Only supply line 2, exactly like line 1, is considered in the fault tree. The proposed fault tree corresponds to the hypothesis of redundant pumping lines in parallel, i. e., the cooling service is ensured by a single line at least. If the two circuits are both required simultaneously to supply the reactor’s demand, an OR gate replaces the AND gate, and the fault tree changes as illustrated in Fig. 8.26.
x
Event
3. Identification of distinct causes. Some different causes for the generic failure event are identified, and an OR gate is introduced, as in Fig. 8.21. The generic cause is capable of generating the failure event. 4. Failure event and absence of protection. A generic failure event is coupled with the absence of protection or a protective action (see Fig. 8.22). An AND gate is introduced. 5. Concurrent causes. The generic failure event occurs only in the case of concurrent causes, as exemplified in Fig. 8.23.
Cause 1
Cause 2
Fig. 8.23 Concurrent causes
Fig. 8.24 Pressure control in a chemical reactor
8.5 Qualitative FTA
241
No cold water from line 2
Reactor explosion
+
Pressure tank rupture
Internal overpression V2 does not open
P2 does not work
+
Cold water supplier fails
X
V2 broken
PR broken No electric power
No cold water from line 1
+
Temperature out of control
No cold water from line 2
P2 broken
No electric power
Fig. 8.25 Fault tree construction. AND gate, configuration A
8.5.3 Boolean Algebra and Application to FTA The Boolean algebra, or “algebra of events,” is particularly useful for conducting a FTA from both a qualitative and a quantitative point of view. In particular, this algebra supports the designer and manager of a production system in answering to this critical question: What are the basic/primary events causing the defined top event for the production system? Given a production system and a top event related to the system function, it is possible to construct a fault tree. The Boolean algebra materially supports the application of reducing and simplifying properties to obtain an equivalent fault tree (EFT), as a result of different MCS.
Boolean algebra is the algebra of two values introduce by George Boole, a British mathematician and philosopher of the nineteenth century. These values are usually taken to be 0 and 1, corresponding to false and true. In particular, given a generic event A, a Boolean variable XA can be defined as follows: ( 0 if event A does not occur (8.7) XA D 1 if event A occurs. Tables 8.8 and 8.9 refer to the main properties and rules of the Boolean algebra, useful for conducting a FTA and in particular for obtaining the EFT. The significance and validity of the Boolean rules can be checked by the application of Venn diagrams. An EFT is a tree made of two levels: level 0 identifies the top event and level 1 the set of MCS, as il-
242
8 Effects Analysis and Reliability Modeling of Complex Production Systems
No cold water from line 2
Reactor explosion
+
Pressure tank rupture
Internal overpression V2 does not open
P2 does not work
+
Cold water supplier fails
+
Temperature out of control
+
V2 broken
PR broken No electric power
No cold water from line 1
P2 broken
No electric power
No cold water from line 2
Fig. 8.26 Fault tree construction. OR gate, configuration B
lustrated in Fig. 8.27. A MCS defines a failure mode of the top event, because it is a smaller combination of component failures capable of causing the top event, if all component failures occur. A generic MCS can be represented by the fault tree in Fig. 8.28. The application of the Boolean properties previously illustrated allows one to express the MCS for the top event in an EFT as follows: TOP D
n X i D1
MCSi D
mi n Y X i D1
Cij ;
(8.8)
j D1
where MCSi is the MCS i for the top event, n is the number of MCS, mi is the number of primary events in MCS i , and Cij is primary event j for MCS i . Every algebraic operation in Eq. 8.8 is executed in accordance with Boolean definitions and properties, as illustrated below.
It is possible to rank the MCS according to their size, thus weighting the relevance of a failure; moreover, it could be useful to conduct a quantitative evaluation of a fault tree in order to properly identify the system’s criticalities, as illustrated below.
8.5.4 Qualitative FTA: A Numerical Example This numerical example refers to the system represented in Fig. 8.24, which is useful for identifying the MCS, given the top event “reactor explosion.” In Sect. 8.5.2 two different reliability configurations, A and B, were considered, but in this case the FTA applies to configuration A made up of two redundant lines for cooling water in parallel. Figure 8.30 presents
8.5 Qualitative FTA
243
Table 8.8 Boolean algebra and Venn diagrams Event
Venn diagrams
Boolean algebra
U
A
A
XA
U
AN
Boolean variable
Complement or negation XAN D XN A D 1 XA
A
Disjunction ˚ U
A [ B or A C B A
XA[B D XA ˚ XB D
B
a
Xi
iDA;B
D 1 .1 XA /.1 XB /
Conjunction ˝ U
A \ B or A B A
B
XA\B D XA ˝ XB Y D Xi D XA XB iDA;B
˚ Boolean sum, ˝ Boolean product
Table 8.9 Rules of Boolean algebra Events domain
Boolean algebra
Operation with
A[;D A
X A C ; D XA C 0 D XA
events ; and U
A\;D ;
X A ; D XA C 0 D ; D 0
U [ADU
XU C X A D X U D 1 XU X A D X A
Complementation
U \ADA AN \ A D ;
Commutative law
A[B DB [A
X A C XB D XB C XA
AB DB A
X A XB D XB XA
A [ .B [ C / D .A [ B/ [ C
XA C .XB C XC / D .XA C XB / C XC
A \ .B \ C / D .A \ B/ \ C
XA .XB XC / D .XA XB /XC
A \ .B [ C / D .A \ B/ [ .A \ C /
XA .XB C XC / D .XA XB / C .XA XC /
Associative law Distributive law
XAN XA D 0
A [ .B \ C / D .A [ B/ \ .A [ C /
XA C .XB XC / D .XA C XB /.XA C XC /
Law of absorption
A [ .A \ B/ D A
XA C .XA XB / D XA
A \ .A \ B/ D A \ B
XA .XA XB / D XA XB
Idempotent Law
A[ADA
X A C XA D XA
A\ADA
X A XA D XA
244
8 Effects Analysis and Reliability Modeling of Complex Production Systems Top event
Level 0
OR
Level 1
Cut set n
Cut set 1
Cut set 2 Cut set i
Fig. 8.27 Equivalent fault tree (EFT)
the EFT resulting from the application of the qualitative evaluation of the fault tree in Fig. 8.29, in accordance with the following expression: AND gate
TOP D Œ.V1 C PR/ C .P1 C PW/ Œ.V2 C PR/ C .P2 C PW/ D V1 V2 C V1 PR C V1 P2 C V1 PW C PR V2 C PR C PR P2 C PR PW C P1 V2 C P1 PR C P1 P2 C P1 PW C PW V2 C PW PR C PW P2 C PW D V1 V2 C V1 P2 C P1 V2 law of absorption
C P1 P2 C PR C PW D
5 X
MCSi :
i D1
Level 1
CSi -Cut set i
AND
Cimi
Ci1 Ci 2
Cij
Fig. 8.28 EFT and generic cut set
Level 2
On a whole there are five MCS, two on five of cardinality 1, i. e., including only one basic event (PR and PW) and the remaining three of cardinality 2 (V1 V2; V1 P2; P1 V2; P1 P2). Figures 8.29 and 8.30 are both based on the introduction of a few “mirrored blocks.” A mirrored block is an event repeated more than once in the system: e. g., the basic event “no electric power” is repeated four times and it certainly represents a very critical component for the system, especially in the case of a great value of failure rate .t/. Figure 8.31 reports the equivalent reliability block diagram generated by the fault tree in Fig. 8.29 and made up of two parallel and identical subsystems corresponding to the inputs of the AND gate in Fig. 8.25. Similarly, Fig. 8.32 presents the equivalent reliability block diagram generated by the EFT in Fig. 8.30. Figure 8.33 presents the fault tree generated for the not redundant configuration B, where the two lines are both necessary to properly control the reactor temperature level. In this special configuration there are six cut sets of cardinality 1, because every basic event is critical. Figure 8.34 lists the cut sets obtained by the qualitative analysis applied to the system in configuration B.
8.6 Quantitative FTA The aim of quantitative FTA is the determination of some reliability and probabilistic parameters, mainly referred to the top event declared for the production system investigated. This analysis can be performed
8.6 Quantitative FTA
245
Top
AND
OR
OR
OR
V1 broken
OR
OR
No electric power
P1 broken
PR broken
No electric power
V2 broken
No electric power
OR
PR broken
P2 broken
Fig. 8.29 FTA, “reactor explosion.” Configuration A – “redundancy.” ReliaSoft® software Top
OR
AND
V1 broken
V2 broken
AND
P1 broken
P2 broken
AND
AND
P1 broken
V2 broken
P2 broken
AND
V1 broken
AND
PR broken
PW broken ®
Fig. 8.30 Qualitative fault tree evaluation. EFT. Configuration A – “redundancy.” ReliaSoft software
V2 broken
No electric power
PR broken
P2 broken
No electric power
Extra Starting Block
Extra Ending Block
V1 broken
No electric power
PR broken
P1 broken
No electric power
Fig. 8.31 Equivalent reliability block diagram, “reactor explosion.” Configuration A – “redundancy.” ReliaSoft® software
No electric power
246
8 Effects Analysis and Reliability Modeling of Complex Production Systems
V2 broken
P2 broken
Extra Starting Block
V2 broken
Extra Node3 V1 broken
Extra Node4 P1 broken
V1 broken Extra Node5
P1 broken
PR broken
PW broken
P2 broken
Fig. 8.32 Equivalent reliability block diagram by the EFT. Configuration A – “redundancy.” ReliaSoft® software
Top
OR
OR
OR
OR
V1 broken
OR
OR
No electric power
PR broken
P1 broken
No electric power
V2 broken
No electric power
OR
PR broken
P2 broken
No electric power
Fig. 8.33 FTA, “reactor explosion.” Configuration B – “no redundancy.” ReliaSoft® software
OR
once MCS have been identified. It is a sequential evaluation which firstly determines the failure probability for the components, then the MCS, and finally the probabilities for the system, given the top event. The main equations for the determination of these probabilities are give as follows:
Fig. 8.34 Qualitative fault tree evaluation. EFT. Configuration B – “no redundancy.” ReliaSoft® software
• Component failure probability. Generally, for any component, or basic primary event, a constant failure rate per hour is assumed, and any timedependent effect is ignored. If a generic component is considered, it could be necessary to distinguish a “standby failure rate” from an “operating failure rate”: as a consequence, the proper failure rate has to be coupled to the proper time period, standby
Top
V1 broken
V2 broken
P1 broken
P2 broken
PR broken
PW broken
8.6 Quantitative FTA
247
time t or operating time t, respectively. The component failure probability, which mainly refers to the nonrepairable items, is Fj .t/ D Fj;s .ts / C Œ1 Fj;s .ts /Fj;o .to /;
(8.9)
where s is the standby phase, ts is the ready (i. e., standby) time period, o is the operating phase, and to is the operating time period. Assuming an exponential distribution for the random variable t, one can approximate the cumulated value F .t/ by its first-order term, when t < 0:1, as follows: F .t/ Š t;
(8.10)
where is the conditional and constant rate defined for the variable t. In particular, if t is the time to failure (ttf), then F .t/ is the failure probability function (unreliability) and is the constant failure rate. For repairable failures the constant asymptotic unavailability of a component is quantified by qj
D
Dconstant Dconstant
MTTR D Š ; C (8.11)
• Component failure occurrence rate. This rate is defined for both repairable and nonrepairable components or systems. For nonrepairable items it is defined as w.t/ D f .t/ D et ;
(8.15)
where f .t/ is the probability density function of the ttf. For both unrepairable and repairable failures .t/ is a reasonable approximation of this rate. • Failure occurrence rate of a cut set given a TOP event. A MCS failure occurs at time t to t C t if all components except one are down at time t, and the other component fails at time t to t C t. Consequently, X Y wj .t/ qk .t/; (8.16) wCSi .t/ D j 2CSi
k¤j k;j 2CSi
where wj .t/ is the failure rate of component j in MCS i . • ENF for a cut set. The ENF for a cut set CSi on a time period T is ENFCSi .T / D WCSi .0; T / D WCSi .T / ZT
where is the repair rate. • Failure probability and unavailability of a cut set given a top event. The general model for the evaluation of cut set unavailability, equivalent to failure probability, is Y qCSi .t/ D qj .t/; (8.12)
D
wCSi .t/ dt 0
D
ZT X 0
wj .t/
j 2CSi
Y k¤j k;j 2CSi
(8.17)
j 2CSi
where CSi is cut set i and qj .t/ is the unavailability of component j which belongs to CSi . • Unavailability of the system given a top event. a Y qCSi .t/ D 1 Œ1 qCSi .t/: QS .t/ D i
i
(8.13)
where T is the time period. • ENF of a system on a time period T, given a top event. ENF.T / D WS .T / n\ o X D WCSi .T / Pr E.CSi / i
A simplified equation quantifying the unavailability of the system is X QS .t/ Š qCSi .t/: (8.14) i
qk .t/ dt;
X
i
WCSi .T /;
(8.18)
i
where Prf: : :g is the failure probability and E.CSi / is the failure event defined for cut set i .
248
8 Effects Analysis and Reliability Modeling of Complex Production Systems
For the system the ENF is generally quantified by the following expression: X ENF.T / Š WCSi : (8.19)
TOP
i
• Virtual MTTR of a system given a top event. The following equations quantify the MTTR for the production system, given a top event: 8 QS .T / ˆ ˆ < MTTRS Š ws .T / (8.20) ˆ Ws .T / ˆ : ws .T / D ; T
G1
where ws .T / is the average estimated failure rate for the system.
G2
C G3
G4
G5
8.6.1 Quantitative FTA, Numerical Example 1 The fault tree reported in Fig. 8.35 relates to a repairable system and five repairable components, or basic events, A, B, C, D, and E, having well-known failure and repair behaviors. The analyst needs to quantify the unavailability, the ENF, and the MTTR of the system for a given top event and assuming a period of time T equal to 8;000 h. Table 8.10 presents the values of the failure and repair rates assuming an exponential distribution, i. e., random failure and repair durations, for ttf and the time to repair (ttr). By the application of the Boolean algebra, three MCS can be identified, each made up of two basic components: TOP D AB C ABE C ABD C ABC C EC C CD D AB C EC C CD: The quantitative analysis of the fault tree is found on the values of availability and unavailability for each basic component illustrated in Table 8.11. In particular, the unavailability has been quantified by the application both of the simplified model in Eq. 8.11, as reported in the fourth column in Table 8.11, and the exact exponential analytical model illustrated in Chap. 5 (Eq. 5.83) as reported in the fifth column in Table 8.11. The reliability of the component, representing the survival function of the item to the first failure, has been quantified by the application of the simplified model [see Eq. 8.10 for the failure probability function F .t/], as reported in the sixth column in Table 8.11, and of
A
B
A
B
D G6
E
D
Fig. 8.35 Fault tree, numerical example 1
the exact model (see Eq. 5.27), as reported in the seventh column in Table 8.11. Sometimes the simplified analytical models previously introduced are not applicable, as demonstrated by the value 2.4 assumed by the reliability for component C, while for other applications, such as for basic event D, the exact and simplified values of reliability significantly differ. A similar consideration can be made for the estimated values of availability.
Table 8.10 Reliability parameters, numerical example 1 Basic event
(h1 )
(h1 )
A B C D E
2 105 105 3 104 104 105
102 5 102 0 5 102 0
8.6 Quantitative FTA
249
Table 8.11 Reliability and availability evaluation, numerical example 1 Basic event A B C D E
MTTF (h)
MTTR (h)
MTTR
Œ=. C /Œ1 exp.. C /t /
T
1 exp. T /
50,000 100,000 3,333.333 10,000 100,000
100 20 1 20 1
0.002 0.0002
0.0020 0.0002 0.9093 0.0020 0.0769
0.16 0.08 2.4 0.8 0.08
0.148 0.077 0.909 0.551 0.077
0.002
MTTF mean time to failure, MTTR mean time to repair
Assuming the hypothesis of statistical independence between basic events related to the component of the system, the unavailabilities of the cut sets are qAB D qA qB Š 0:002 0:0002 Š 4 107 ;
where B is the MTTR of component B and A is the MTTR of component A, in accordance with the opportunity to apply the simplified analytical models of the unavailability. Similarly, for the other cut sets,
qE C D qE qC Š 0:910 0:08 Š 0:073; 3
qCD D qC qD Š 0:910 0:002 Š 1:82 10 : By application of Eq. 8.13, the unavailability of the system is a QS .8;000 h/ D qCSi .t/ i
D 1 .1 4 107 /.1 0:073/ .1 1:82 103 / Š 0:0747: If the simplified Eq. 8.14 is applied, X QS .8;000 h/ D qCSi .t/ Š 0:0748: In order to quantify the ENF of the system, Eq. 8.17 has been applied for each cut set: 8;000 Z
X i
0
WEC .T / Š
ŒC qE .t/ C E qC .t/ dt 0
ZT Š E
wi .t/
Y
qj .t/ dt;
j ¤i
i. e.,
0
1 C t T 1 je j 0 C C T 2 Š E T C C 2 1 C T 1 Š E T C .e 1/ C C T 2 C 2 1 4 .e8;000310 1/ Š 105 8;000 C 3 104 C 0:5 3 104 8;0002
ZT WCD .T / Š
ŒD qC .t/ C C qD .t/ dt 0
ZT Š D 0
ZT WAB .0; 8;000/ Š
ŒA qB .t/ C B qA .t/ dt 0 8;000 Z
Š
Œ.1 eC t / C C t dt
Š 0:146 failures;
i
WCS .0; 8;000/ D
ZT
ŒA B B C B A A dt 0
Š ŒA B B C B A A 8;000 Š 1:92 104 failures;
Œ.1 eC t / C C D dt
1 C t T je j 0 C C D T Š D T C C 1 C T .e 1/ C C D T Š D T C C 1 4 4 .e8000310 1/ Š 10 8;000 C 3 104 1 C 3 104 8;000 5 102 Š 0:502 failures:
250
8 Effects Analysis and Reliability Modeling of Complex Production Systems
As a consequence, given the top event and assuming a period of time of 8;000 h, the ENF for the system is X WCSi Š 0:648 failures: ENF.T / Š
The failure probability of the system FS .T / is FS .8;000 h/ D
a
FCSi .T /
i
i
D 1 .1 0:0114/.1 0:07/
Now it is possible to quantify the MTTR of the system by the application of the Eq. 8.20: 0:648 WS .T / D Š 8:1 105 h1 wS Š T 8000
.1 0:501/ Š 0:541; which is very similar to the “simplified” value:
and QS .T / QS .T / Š s .T / wS .T / 0:0748 Š Š 923:5 h: 8:1 105
FS .T D 8;000 h/ Š
MTTRS D
If the analyst has to quantify the failure probability of the repairable system considering the first failure, it is useful to evaluate the failure probabilities for the cut sets as follows: FAB .T / D FA .T /FB .T / Š 0:148 0:077 Š 0:0114; FE C .T / D qE C D FE .T /FC .T / Š 0:910 0:077 Š 0:070; FCD .T / D FC .T /FD .T / Š 0:910 0:551 Š 0:501: ReliaSoft BlockSim 7 - www.ReliaSoft.com
Figures 8.36 and 8.37 present the results obtained by the application of the Monte Carlo simulation analysis on the system for T D 8;000 h. In particular, Fig. 8.36 shows the up/down diagram obtained for components/events A–E and their contributions. Component C is clearly nonrepairable, but fortunately it is not a cut set and the system is always repairable within 8;000 h. For T longer than 8;000 h, the system can reach a state of nonrepairable failure owing to the simultaneous failure of the nonrepairable components E and C, as illustrated in Fig. 8.37. Finally, Fig. 8.38 presents the histogram of the expected failures.
Block Up/Down
Operating Time Time Under Repair
E
D
C
B
A
System
1600.000
3200.000
4800.000
FCSi .T / Š 0:582:
i
State
0.000
X
6400.000
Time, (t)
Fig. 8.36 Block up/down analysis, T D 8;000 h. ReliaSoft® software
8000.000
8.6 Quantitative FTA
251
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Up/Down State Operating Time Time Under Repair
E
C
System
0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 8.37 Block up/down analysis, T D 5;000 h. ReliaSoft® software
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Expected Failures 0.912
RS FCI 100%
50%
0.730 0% 5 Item(s)
0.547
0.365
0.182
0.000
C
D
A
E
Fig. 8.38 Component expected failures. ReliaSoft® software
B
252
8 Effects Analysis and Reliability Modeling of Complex Production Systems
8.6.2 Quantitative FTA, Numerical Example 2 The FTA is applied in this case to the system previously described in Sect. 8.5.2, whose cut sets were illustrated in Sect. 8.5.4.
8.6.2.1 System Configuration A The analytical evaluation of the reliability and the failure rate for the system, given a top event and assuming the redundant configuration A, is as follows: RS .t/ D RP2 .t/ Relectric power .t/ RPR .t/ RV2 .t/ C RP1 .t/ Relectric power .t/ RPR .t/ RV1 .t/ RP2 .t/ RV1 .t/ Relectric power .t/ RPR .t/ RP1 .t/ RV2 .t/; S .t/ D P1 broken .t/RV1 .t/Relectric power .t/RPR .t/ C PR broken .t/RV1 .t/Relectric power .t/RP1 .t/ C no electric power .t/RV1 .t/RPR .t/RP1 .t/ C V1 broken .t/Relectric power .t/RPR .t/RP1 .t/ C V2 broken .t/RP2 .t/Relectric power .t/RPR .t/ C PR broken .t/RP2 .t/Relectric power .t/RV2 .t/ C PR broken .t/RP2 .t/Relectric power .t/RV2 .t/ C no electric power .t/RP2 .t/RPR .t/RV2 .t/ C P2 broken .t/Relectric power .t/RPR .t/RV2 .t/ V2 broken .t/RP2 .t/RV1 .t/Relectric power .t/ RPR .t/RP1 .t/ V1 broken .t/RP2 .t/RV2 .t/Relectric power .t/ RPR .t/RP1 .t/ P1 broken .t/RP2 .t/RV1 .t/Relectric power .t/ RPR .t/RV2 .t/ P2 broken .t/RP1 .t/RV1 .t/Relectric power .t/ RPR .t/RV2 .t/ PR broken .t/RP2 .t/RV1 .t/Relectric power .t/ RV2 .t/RP1 .t/ no electric power .t/RP2 .t/RV1 .t/RV2 .t/ RPR .t/RP1 .t/:
A quantitative analysis based on different scenarios is illustrated next for configuration A and exponential distributions of ttf and ttr random variables. Table 8.12 reports the values of ttf and ttr assumed for the basic components in the system illustrated in Fig. 8.24. Given the top event “reactor explosion,” Fig. 8.39 shows the trends of F .t/, R.t/, f .t/, and .t/ as a function of time t for system configuration A; as a consequence, the components and the system, subject to the top event, are supposed to be not repairable. These trends also illustrate the top event for the system in the case of repairable components, but considering the so-called first failure top event as catastrophic. From the reliability importance analysis in Fig. 8.40, the most critical component is the electric power supplier, whose “absence of power” event is very critical because of its great failure rate and the cardinality 1 of the corresponding cut set. The same conclusion is supported by the static reliability importance analysis for time t D 4;000 h and t D 8;000 h, as reported in Fig. 8.41. Figures 8.42–8.45 present the results of a dynamic Monte Carlo simulation analysis for a period T of 50;000 h, assuming the hypothesis of repairable components. It is worth noting in Fig. 8.42 that each time the electric power supply fails, the system fails too. Figure 8.43 presents the trend of the system failures NF.t/ cumulated from t0 D 0 to the generic time point t. Figure 8.44 shows the expected downing events for the set of components, or basic events, and, finally, Fig. 8.45 shows the point availability A.t/. 8.6.2.2 System Configuration B Considering the not redundant configuration B, the analytical evaluation of reliability functions RS .t/ and S .t/ results in the following: RS .t/ D RV1 .t/Relectric power .t/RPR .t/ RP1 .t/RV2 .t/RP2 .t/; S .t/ D V1 broken .t/ C no electric power .t/ C PR broken .t/ C P1 broken .t/ C V2 broken .t/ C P2 broken .t/: As for configuration A, Figs. 8.46–8.48 illustrate the results for configuration B, assuming the failure and repair probability distributions listed in Table 8.12.
8.6 Quantitative FTA
253
Table 8.12 Constant failure and repair rates. Configuration A .t / D
1=.t / D 1= D MTTR
3 105 h1 3 104 h1 104 h1 106 h1
25 h 18 h 15 h 30 h
Component P1 , P2 pumps PW electric power supplier V1 , V2 valves PR processor
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Unreliability vs Time
1.000
Reliability vs Time
1.000
Unreliability
Reliability
Fault Tree1 Unreliability Line
Fault Tree1 Reliability Line
0.800
Reliability, R(t)
Unreliability, F(t)=1-R(t)
0.800
0.600
0.400
0.200
0.000 0.000
0.600
0.400
0.200
4000.000
8000.000
12000.000
16000.000
0.000 0.000
20000.000
4000.000
8000.000
Time, (t) ReliaSoft BlockSim 7 - www.ReliaSoft.com
12000.000
16000.000
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Probability Density Function
4.000E-4
Failure Rate vs Time
5.000E-4
Pdf
Failure Rate
Fault Tree1 Pdf Line
Fault Tree1 Failure Rate Line
4.400E-4
Failure Rate, f(t)/R(t)
3.200E-4
f(t)
2.400E-4
1.600E-4
8.000E-5
0.000 0.000
20000.000
Time, (t)
3.800E-4
3.200E-4
2.600E-4
4000.000
8000.000
12000.000
16000.000
2.000E-4 0.000
20000.000
4000.000
8000.000
Time, (t)
12000.000
16000.000
20000.000
Time, (t)
®
Fig. 8.39 Event “reactor explosion,” configuration A. F .t /, R.t /, f .t /, .t /. ReliaSoft software ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Reliability Importance vs Time
0.200
Reliability Importance Value
0.160
0.120
0.080
0.040
0.000 0.000
Importance Fault Tree1 Extra Starting Block Extra Ending Block V1 broken no electric power PR broken P1 broken no electric power V2 broken no electric power PR broken P2 broken No electric power
0.800
Reliability Importance Value
Fault Tree1 Extra Starting Block Extra Ending Block V1 broken no electric power PR broken P1 broken no electric power V2 broken no electric power PR broken P2 broken No electric power
Reliability Importance vs Time
1.000
Importance
0.600
0.400
0.200
4000.000
8000.000
12000.000
Time, (t)
16000.000
20000.000
0.000 0.000
4000.000
8000.000
12000.000
Time, (t)
Fig. 8.40 Event “reactor explosion,” configuration A. Reliability importance. ReliaSoft® software
16000.000
20000.000
254
8 Effects Analysis and Reliability Modeling of Complex Production Systems
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Static Reliability Importance
Static Reliability Importance
0.832
0.577
Reliability
Reliability
100%
100%
50%
50%
0.462 0% 5 Item(s)
0.499
0.333
Reliability Importance Value
Reliability Importance Value
0.666
0.166
0.000
0% 5 Item(s)
0.346
0.231
0.115
no electric power
PR broken
V1 broken
V2 broken
0.000
P2 broken
no electric power
Time = 4000
PR broken
V2 broken
V1 broken
P2 broken
Time = 8000
Fig. 8.41 Event “reactor explosion,” configuration A. Static reliability importance, t D 4;000 h and t D 8;000 h. ReliaSoft® software
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Up/Down State
V2 broken
V1 broken
PR broken
P2 broken
P1 broken
no electric power
System
0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 8.42 Event “reactor explosion,” repairable components, configuration A. Simulation analysis. Up/down diagram. ReliaSoft® software
From the reliability importance analysis in Fig. 8.47, the most critical component is the electric power supplier, whose “absence of power” event is very critical because of its great failure rate and the cardinality 1 of the corresponding cut set. Fig. 8.48 presents the result of a static reliability importance analysis for t D 4;000 h and t D 8;000 h.
8.6.3 Numerical Example. Quantitative Analysis in the Presence of a Mix of Statistical Distributions This numerical example rejects the assumption of constant failure rates, and the probability distributions for ttf and ttr vary as reported in Table 8.13.
8.6 Quantitative FTA
255
ReliaSoft BlockSim 7 - www.ReliaSoft.com
System Failures
20.000
System Failures Fault Tree1 System Failures
System Failures(t)
16.000
12.000
8.000
4.000
0.000 0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 8.43 Event “reactor explosion,” repairable components, configuration A. Simulation analysis. System failures. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Expected Downing Events 14.928
RS DECI 100%
50%
11.942 0% 6 Item(s)
8.956
5.971
2.986
0.000
no electric po...
V2 broken
V1 broken
P1 broken
P2 broken
PR broken
Fig. 8.44 Event “reactor explosion,” repairable components, configuration A. Expected downing events. ReliaSoft® software
8.6.3.1 System Configuration A Given the top event “reactor explosion,” Fig. 8.49 shows the trends of F .t/, R.t/, f .t/, and .t/ as a function of time t for system configuration A; as
a consequence, these trends can support the determination and analysis of the first failure process assuming the system is not repairable, i. e., in the case of a failure catastrophic event and repairable components (see Table 8.13). In particular, assuming a mission time T
256
8 Effects Analysis and Reliability Modeling of Complex Production Systems
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Point Availability vs Time
1.000
Availability Fault Tree1 Point Availability Line
Availability, A(t)
0.998
0.996
0.994
0.992
0.990 0.000
10000.000
20000.000
30000.000
40000.000
50000.000
Time, (t)
Fig. 8.45 Event “reactor explosion,” configuration A. Availability A.t /. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Unreliability vs Time
Reliability vs Time
1.000
1.000
Unreliability
Reliability
Fault Tree1 Unreliability Line
Fault Tree1 Reliability Line
0.800
Reliability, R(t)
Unreliability, F(t)=1-R(t)
0.800
0.600
0.400
0.200
0.000 0.000
0.600
0.400
0.200
1800.000
3600.000
5400.000
7200.000
0.000 0.000
9000.000
1800.000
Time, (t)
3600.000
5400.000
7200.000
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Probability Density Function
Block Failure Rate vs Time
6.000E-4
6.000E-4
Pdf
Failure Rate
Fault Tree1 Pdf Line
Fault Tree1 System no electric power P1 broken P2 broken PR broken V1 broken V2 broken
4.800E-4
Failure Rate, f(t)/R(t)
4.804E-4
f(t)
3.608E-4
2.412E-4
1.216E-4
2.000E-6 0.000
9000.000
Time, (t)
3.600E-4
2.400E-4
1.200E-4
1800.000
3600.000
5400.000
Time, (t)
7200.000
9000.000
0.000 0.000
1000000.000
2.000E+6
3.000E+6
Time, (t)
Fig. 8.46 Event “reactor explosion,” configuration B. F .t /, R.t /, f .t /, (t /. ReliaSoft® software
4.000E+6
5.000E+6
8.6 Quantitative FTA
257
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Reliability Importance vs Time 1.000
Importance Fault Tree1 V1 broken P1 broken V2 broken P2 broken no electric power no electric power No electric power no electric power PR broken PR broken
Reliability Importance Value
0.800
0.600
0.400
0.200
0.000 0.000
1800.000
3600.000
5400.000
7200.000
9000.000
Time, (t)
Fig. 8.47 Event “reactor explosion,” configuration B. Reliability importance. ReliaSoft® software
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Static Reliability Importance
Static Reliability Importance
0.352
0.124
Reliability
Reliability
100%
100%
50%
50%
Reliability Importance Value
0% 5 Item(s)
0.074
0.050
Reliability Importance Value
0.282
0.099
0.211
0.141
0.070
0.025
0.000
0% 5 Item(s)
no electric power
V1 broken
V2 broken
P2 broken
0.000
P1 broken
no electric power
V1 broken
Time = 8000
V2 broken
P2 broken
Time = 4000
Fig. 8.48 Event “reactor explosion,” configuration B. Static reliability importance. ReliaSoft® software
Table 8.13 Mix of failure and repair distributions. Configuration A Component
Process
Distribution
Parameter 1
Parameter 2
P1, P2 pumps
Failure Repair Failure Repair Failure Repair Failure Repair
Weibull Lognormal Exponential Exponential Weibull Lognormal Exponential Exponential
1=a D 33;333 h D 25 h D 3 104 h1 MTTR D 18 h 1=a D 1;000 h D 15 h D 106 h1 MTTR D 30 h
b D 1:5 3h
PW electric power supplier V1, V2 valves PR processor
b D 1:5 0:5 h
P1 broken
258
8 Effects Analysis and Reliability Modeling of Complex Production Systems
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Unreliability vs Time
Reliability vs Time
1.000
1.000
Unreliability
Reliability
Fault Tree1 Unreliability Line
Fault Tree1 Reliability Line
0.800
Reliability, R(t)
Unreliability, F(t)=1-R(t)
0.800
0.600
0.400
0.200
0.000 0.000
0.600
0.400
0.200
600.000
1200.000
1800.000
2400.000
0.000 0.000
3000.000
600.000
1200.000
Time, (t)
1800.000
2400.000
3000.000
Time, (t)
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Probability Density Function
Failure Rate vs Time
8.000E-4
0.003
Pdf
Failure Rate
Fault Tree1 Pdf Line
Fault Tree1 Failure Rate Line
0.002
Failure Rate, f(t)/R(t)
6.400E-4
f(t)
4.800E-4
3.200E-4
1.600E-4
0.000 0.000
0.002
0.001
7.600E-4
600.000
1200.000
1800.000
2400.000
3000.000
2.000E-4 0.000
600.000
1200.000
Time, (t)
1800.000
2400.000
3000.000
Time, (t)
Fig. 8.49 Event “reactor explosion,” configuration A and mix of distributions.F .t /, R.t /, f .t /, .t /. ReliaSoft® software
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Reliability Importance vs Time
1.000
Importance Fault Tree1 Extra Starting Block Extra Ending Block no electric power no electric power No electric power PR broken PR broken no electric power V2 broken V1 broken P1 broken P2 broken
Reliability Importance Value
0.800
0.600
0.400
0.200
0.000 0.000
600.000
1200.000
1800.000
2400.000
3000.000
Time, (t)
Fig. 8.50 Event “reactor explosion,” configuration A and mix of distributions. Reliability importance. ReliaSoft® software
8.6 Quantitative FTA ReliaSoft BlockSim 7 - www.ReliaSoft.com
259 ReliaSoft BlockSim 7 - www.ReliaSoft.com
Static Reliability Importance
0.860
Static Reliability Importance
0.507
Reliability
Reliability 100%
100%
50%
50%
0.406 0% 5 Item(s)
0.516
0.344
0.172
Reliability Importance Value
Reliability Importance Value
0.688
0% 5 Item(s)
0.304
0.203
0.101
0.000
no electric power
PR broken
V1 broken
V2 broken
0.000
P2 broken
V2 broken
V1 broken
Time = 600
no electric power
PR broken
P1 broken
Time = 1200
Fig. 8.51 Event “reactor explosion,” configuration A and mix of distributions. Static reliability importance, t D 600 h and t D 1;200 h. ReliaSoft® software
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Failure Rate vs Time 0.005
Failure Rate Fault Tree1 Failure Rate Line
Failure Rate, f(t)/R(t)
0.004
0.003
0.002
0.001
2.000E-4 0.000
400.000
800.000
1200.000
1600.000
2000.000
Time, (t)
Fig. 8.52 Event “reactor explosion,” configuration B and mix of distributions .t /. ReliaSoft® software
of about 3;000 h, the system certainly fails as clearly illustrated by the unreliability function, i. e., the failure probability function. Figure 8.50 shows the results of the reliability importance analysis conducted by ReliaSoft® software: the most critical component is the electric power supply before t about 1;200 h, while later valves V1 and V2 reveal themselves as the most important components in terms of reliability. The same conclusion is supported by the static reliability importance analysis illustrated in Fig. 8.51.
8.6.3.2 System Configuration B Given the top event “reactor explosion,” Figs. 8.52 and 8.53 present the failure rate .t/ and the reliability importance function for the repairable system in configuration B, made up of components subject to random failure and repair processes with different probability distributions, as listed in Table 8.13.
260
8 Effects Analysis and Reliability Modeling of Complex Production Systems
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Reliability Importance vs Time 1.000
Importance Fault Tree1 no electric power no electric power No electric power PR broken PR broken no electric power V2 broken V1 broken P1 broken P2 broken
Reliability Importance Value
0.800
0.600
0.400
0.200
0.000 0.000
400.000
800.000
1200.000
1600.000
2000.000
Time, (t)
Fig. 8.53 Event “reactor explosion,” configuration B and mix of distributions. Reliability importance. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Expected Downing Events
0.992
RS DECI 100%
50%
0.793 0% 6 Item(s)
0.595
0.397
0.198
0.000
V2 broken
V1 broken
no electric po...
P2 broken
P1 broken
PR broken
Fig. 8.54 Event “reactor explosion,” configuration A and repairable components. Expected downing events. Simulation, t D 3;000 h. ReliaSoft® software
8.6.3.3 Monte Carlo Simulation The following results relate to the application of the Monte Carlo dynamic simulation of system configuration A, whose top event is the same as in the numerical example illustrated in Sect. 8.5.4 (see also
Figs. 8.29 and 8.30), assuming the hypothesis of repairable components and a mix of random variables ttf and ttr (see Table 8.13). Figure 8.54 presents the expected values of downing events related to the components of the repairable system and assuming t D 3;000 h.
8.6 Quantitative FTA
261
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Up/Down
Block Up/Down
State
State Operating Time Time Under Repair
V2 broken
V1 broken
V1 broken
PR broken
PR broken
P2 broken
P2 broken
P1 broken
P1 broken
no electric power
no electric power
System
0.000
Operating Time Time Under Repair
V2 broken
System
600.000
1200.000
1800.000
2400.000
3000.000
0.000
600.000
1200.000
Time, (t)
1800.000
2400.000
3000.000
Time, (t)
Fig. 8.55 Event “reactor explosion,” configuration A and repairable components. Up/down dynamic analysis. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
System Failures
2.000
System Failures Fault Tree1 System Failures
System Failures(t)
1.600
1.200
0.800
0.400
0.000 0.000
600.000
1200.000
1800.000
2400.000
3000.000
Time, (t)
Fig. 8.56 Event “reactor explosion,” configuration A and repairable components. Simulation analysis: system failures, t D Œ0; 3;000 h. ReliaSoft® software
Figure 8.55 presents the up/down (i. e., 0/1) diagrams obtained by two different simulation runs of the repairable system. In the first diagram the system fails twice because of the failure events for the electric power supply. A third time relates to the failure of valve V1 (very close to time point t D 1;800 h) following the failure of valve V2 in accordance with the existence of the cut set V1V2. In the second diagram the system fails when the failure of valve V2 occurs, given a previous failure of valve V1. Figure 8.56 presents the trend of the system failures for t belonging to the range Œ0; 3;000 h. This is the
result of a specific simulation run of the system and the top event. Figure 8.57 reports the measure of the downing event criticality index for the components, or basic events, of the system, given the “reactor explosion” top event. Figure 8.58 presents the values of the point availability A.t/ for the system subject to the top event, i. e., the probability that the system is operational at a given time in accordance with the so-called alternating renewal process made up of ttf and ttr stochastic processes. In particular, it is useful to remember that A.t/ is the probability that the system is up at time t.
262
8 Effects Analysis and Reliability Modeling of Complex Production Systems
ReliaSoft BlockSim 7 - www.ReliaSoft.com
RS DECI 36.265
Availability 100%
50%
29.012 0% 5 Item(s)
21.759
14.506
7.253
0.000
V2 broken
V1 broken
no electric power
P1 broken
P2 broken
Fig. 8.57 Event “reactor explosion,” configuration A and repairable components. Downing event criticality index (DECI), t D 3;000 h. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
Availability and Reliability vs Time
1.000 Fault Tree1 Point Availability Line Point Reliability Line
0.800
A(t), R(t)
0.600
0.400
0.200
0.000 0.000
600.000
1200.000
1800.000
2400.000
3000.000
Time, (t)
Fig. 8.58 Point system availability A.t / and reliability R.t /, configuration A and repairable components. Simulation analysis. ReliaSoft® software
In other words, during the simulation analysis a special counter would be required in order to get this value at t . This counter is incremented by one every time the system is up at t considering the whole set of simulations runs; thus, the point availability at t is
the number of times the system is up at t divided by the number of simulation runs in the dynamic analysis. Figure 8.58 also reports the value of the point reliability R.t/ obtained in the same way as for A.t/, i. e., by means of several runs of dynamic simulation: this is
8.7 Application 1 – FTA
263
the probability that the nonrepairable system has not failed by time t.
The hot water produced by the boiler is pumped by a force pump, called a “boiler pump,” along a primary loop of piping; some thermic and hydraulic drops are obviously encountered. The hydraulic circuit is completed by a secondary loop, when the two heat exchangers in the controlled zone are fed by the same boiler, but it is possible to double the secondary loop (loop1 and loop2) in order to feed the fan-coils by two distinct and independent boiler systems. Each secondary loop is supported by its own pump. The generic loop associated with a boiler is made of two subloops, one for each exchanger. The environmental temperature is controlled by adjusting the hot water flow by means of automatic valves, one for each secondary loop, and a zone valve (mixing three-way valve) for each exchanger and for each loop. As a consequence, in the case of two fan-coils and two boilers, four valves are required. The boiler pump as well as every pump
8.7 Application 1 – FTA This application deals with the FTA conducted on a heating plant for a 160-m2 public lounge. The system, conventionally split into a hydronic device for warm water and a heating device based on water temperatures and thermic energy conservation, has three main components, as illustrated in the functional simplified block scheme of Fig. 8.59: the boiler, the distribution system (pumps, collectors, valves, etc.), and the heat exchangers. In particular, two fan-coils are fed in a redundant configuration, i. e., the heating system is supposed to be capable when at least one fan-coil is operating.
8 O100 mm
sub-loop 2.1
sub-loop 2.2
5 sub-loop 2.1
sub-loop 1.1
10 6 9 1
sub-loop 1.1 sub-loop 2.1
3
natural gas
water supply 11
2
4
primary loop 7 return
supply
secondary loop 1 Legend - Thermal system Code Block/component name 1 Boiler 2 Pump 1 3 Thermal sensor 4 Hydraulic disjunctor 5 3-way zone valve 6 Fan-coil 1
Legend - Thermal system Code Block/component name 7 Gas adducon valve 8 Gas burner system 9 Boiler circulaon pump 10 Flow fan temperature sensor 1 11 Control valve loop 1
Fig. 8.59 Functional block scheme of the thermic system
264
8 Effects Analysis and Reliability Modeling of Complex Production Systems
on the secondary loops operate according to the simultaneous and integrated control of sensors, such as a thermic sensor for each subloop, a flow fan temperature sensor for each fan, and an environmental sensor. Some other critical components playing a significant role complete the generic FTA: • a boiler system with natural gas adduction and combustion gas evacuation; • two fan-coils; • the electric power supply system; • the water supply system with a hydraulic pipe adduction; • the piping system, i. e., the piping distribution network; • the hydraulic disjunctors, as many as the secondary loops, for the right mix of hot and cold water in the primary and secondary loops.
The hydraulic circuit has to be filled up with water at the start-up, and later the water recirculates in the system when it is working. A refill is sometimes required in order to compensate for some water leaks.
8.7.1 Fault Tree Construction Assuming the situation “no thermic comfort” as the top event for the heating plant or “thermic system,” one can develop some different fault trees in accordance with different system configurations and hypotheses. These trees are made of the four basic “subtrees” illustrated in Figs. 8.60–8.63, representing the events of absence of hot water within the two available fans as follows:
no warm H2O supp.1
1.1
cold water
hydraulic disjunctor 1
3-way zone valve closed 1.1
no discharge head - loop 1.1
not operating boiler no electric power
piping rupture loop 1.1
zone valve 1.1 rupture
not operating pump 1
not operating circulation pump
no natural gas
valve automatic control 1
gas burner rupture 1
no drive c.v. no gas supply
gas adduction valve 1 closed
no drive c.p.
boiler pump 1 broken
no drive pump
pump 1 broken
no control
no electric power
no electric power flow fan n°1 temperature sensor
environmental thermal sensor
Fig. 8.60 Fault tree construction, subloop 1.1
Control Valve rupture loop1
no control p.
no electric power
no control
thermal sensor n°1.1 not operative environmental thermal sensor environmental thermal sensor
thermal sensor n° 1.1 not operative
thermal sensor n° 1.1 not operative
8.7 Application 1 – FTA
265
2.1
no warm H2O supp.2
cold water
hydraulic disjunctor 2
3-way zone valve closed 2.1
no discharge head - loop 2.1
not operating boiler no electric power
piping rupture loop 2.1
zone valve 2.1 rupture
not operating pump 2
not operating circulation pump
no natural gas
valve automatic control 2
gas burner rupture 2
no drive c.v. no gas supply
gas adduction valve 2 closed
no drive c.p.
no drive pump
no electric power
Control Valve rupture loop2
boiler pump 2 broken
pump 2 broken no electric power
no control
no control
no electric power flow fan n°1 temperature sensor
no control p.
thermal sensor n° 2.1 not operative environmental thermal sensor environmental thermal sensor
environmental thermal sensor
thermal sensor n° 2.1 not operative
thermal sensor n° 2.1 not operative
Fig. 8.61 Fault tree construction, subloop 2.1
• Transfer out block 1.1. It refers to subloop boiler 1, and no hot water on fan 1. • Transfer out block 2.1. It refers to subloop boiler 2 (i. e., in the case of the existence of a ond boiler), and no hot water on fan 1. • Transfer out block 1.2. It refers to subloop boiler 1, and no hot water on fan 2. • Transfer out block 2.2. It refers to subloop boiler 2 (i. e., in the case of the existence of a ond boiler), and no hot water on fan 2.
1.1, 2.1, sec1.2, 2.2, sec-
Every tree configuration, on five configurations A, B, C, D, and E proposed, has been generated and analyzed from both a qualitative and a quantitative point of view as follows: • Configuration A – one boiler and fan-coil redundancy and fill water (Fig. 8.64.) There is only a single boiler and two redundant fan-coils, i. e., there is
one secondary loop made of two subloops, one for each fan. It is supposed the system requires the water supplier to be operative, i. e., in a state of function. • Configuration B – one boiler and fan-coil redundancy (Fig. 8.65): There is only a single boiler and two redundant fan-coils, i. e., there is one secondary loop made of two subloops, one for each fan. It is also supposed the system does not require the water supplier to be operative because the piping has already been filled. • Configuration C – one boiler and no fan-coil redundancy (Fig. 8.66). There is only a single boiler and two fan coils, both necessary to guarantee thermic comfort. It is also supposed the system does not require the water supplier to be operative because the piping has already been filled.
266
8 Effects Analysis and Reliability Modeling of Complex Production Systems
1.2
no warm H2O supp.1
cold water
hydraulic disjunctor 1
3-way zone valve closed 1.2
no discharge head - loop 1.2
not operating boiler 1 no electric power
piping rupture loop 1.2
zone valve 1.2 rupture
not operating pump 1
not operating circulation pump
no natural gas
valve automatic control
gas burner rupture 1
no drive c.v. no gas supply
gas adduction valve 1 closed
no drive c.p.
boiler pump 1 broken no drive pump 1
Control Valve rupture loop1
pump 1 broken
no control
no electric power
no electric power no electric power
flow fan n°2 temperature sensor
environmental thermal sensor
no control
no control p.
thermal sensor n°1.2 not operative environmental thermal sensor environmental thermal sensor
thermal sensor n° 1.2 not operative
thermal sensor n° 1.2 not operative
Fig. 8.62 Fault tree construction, subloop 1.2
• Configuration D – two boilers and fan-coil redundancy and fill water (Figs. 8.67 and 8.68). There are two alternative boilers (i. e., one is redundant) and two redundant fan-coils. It is also supposed the system requires the water supplier to be operative because the piping network could be empty. • Configuration E – two boilers and fan-coil redundancy (Fig. 8.69). There are two alternative boilers (i. e., one is redundant) and two redundant fan-coils. It is also supposed the production system does not require the water supplier to be operative because the piping network is already filled (both primary and secondary loops).
8.7.2 Qualitative FTA and Standards-Based Reliability Prediction The generic fault tree previously illustrated is made up of several blocks, many of which are primary blocks/events related to the components of the system investigated. Many blocks are mirrors of a few primary events, such as the so-called no electric power, the rupture on the “environmental thermic sensor,” and the “no gas supply” event related to the natural gas supply system. The generic event mirror of a basic/primary component can be represented by a “little square” near the block associated with the event. The
8.7 Application 1 – FTA
267
2.2
no warm H2O supp.2
cold water
hydraulic disjunctor 2
3-way zone valve closed 2.2
no discharge head - loop 2.2
not operating boiler 2
no electric power
piping rupture loop 2.2
zone valve 2.2 rupture
not operating pump 2
not operating circulation pump
no natural gas
no gas supply
gas adduction valve 2 closed
valve automatic control 2
gas burner rupture 2
no drive c.v. no drive c.p.
boiler pump 2 broken
Control Valve rupture loop2
pump 2 broken no drive pump 2
no electric power
no control no control
no electric power no electric power
flow fan n°2 temperature sensor
no control p.
thermal sensor n° 2.2 not operative
environmental thermal sensor
environmental thermal sensor
environmental thermal sensor
thermal sensor n° 2.2 not operative
thermal sensor n° 2.2 not operative
Fig. 8.63 Fault tree construction, subloop 2.2
event associated with a component is considered “basic/primary” in accordance with the availability of data related to the failure and repair random behaviors. In particular, Table 8.14 reports the failure rates of the basic events/components collected by a library reference of nonelectronic parts (see standards-based reliability database of predefined components MIL-217, NSWC98/LE1, etc.). Another trivial but significant consideration can be made. The presence of redundancies justifies the absence of AND gates in fault tree construction (e. g., only OR gates in configuration C). In particular, according to the previously introduced and discussed Boolean absorption laws, configuration C is as illustrated in Fig. 8.70. The number of MCS is 19, each one made up of a single member. Given the top event, the failure rate
of the system is S D pump 1 broken C piping rupture loop 1.1 C boiler pump 1 broken C thermal sensor 1.1 not operative C control valve rupture loop 1 C hydraulic disjunctor 1 C fan axial flow 1 C no electric power C flow fan 1 temperature sensor Cenvironmental thermal sensor C thermal sensor 1.2 not operative C piping rupture loop 1.2 C zone valve 1.2 rupture C fan axial flow 2 C flow fan 2 temperature sensor C zone valve 1.1 rupture C no gas supply C gas adduction valve 1 closed C gas burner rupture 1 : Tables 8.15 and 8.16 illustrate the configuration of the MCS identified by the qualitative analysis for the
268
8 Effects Analysis and Reliability Modeling of Complex Production Systems
No Thermal Comfort
no warm air
no warm air Fan 2
no warm air Fan 1
no supply H2O
no flow fan 1
flow fan n°1 temperature sensor
fan axial flow 1
no flow fan 2
no electric power
no water supply
rupture hydronic pipe adduction
1.1
flow fan n°2 temperature sensor
fan axial flow 2
no supply H2O
no electric power
no water supply
rupture hydronic pipe adduction
1.2
Fig. 8.64 One boiler and fan-coil redundancy and fill water. Configuration A
Table 8.14 Failure rates from standards-based reliability libraries. FT fault tree (106 )
Number of components Configurations Configurations A, B and C – D and F – 1 boiler 2 boilers
Code
FT component
Other reference
1 2 3
Fan axial flow No electric power Flow fan temperature sensor Rupture hydronic pipe adduction No gas supply Boiler pump broken
Fancoil Electric power supplier Sensor transmitter temperature Piping water system
1.586 13.65 25.69
630,517 73,260 38,926
2 1 2
2 1 2
1.066
938,086
1
1
Gas supplier Pump hydraulic boiler feed Pump hydraulic Valve mixing 3-way Valve hydraulic gate Water supplier system Sensor temperature
50.7 0.4216
19,724 2,371,916
1 1
1 2
86.28 18.54 1.336 95.1 0.1053
11,590 53,937 74,8503 10,515 9,496,676
1 2 1 1 1
2 4 2 1 1
Valve automatic control
10.87
91,966
1
2
4 5 6 7 8 9 10 11 12
Pump broken Zone valve rupture Gas adduction valve No water supply Environmental thermal sensor Control valve rupture
MTTF (h)
8.7 Application 1 – FTA
269
No Thermal Comfort
no warm air
no warm air Fan 2
no warm air Fan 1
no flow fan 1 no flow fan 2
flow fan n°1 temperature sensor
fan axial flow 1
no electric power
1.1
flow fan n°2 temperature
fan axial flow 2
no electric power
1.2
Fig. 8.65 One boiler and fan-coil redundancy. Configuration B
available configurations. In particular, the number of cut sets is 36 for configuration A, 34 for configuration B, 19 for configuration C (as previously demonstrated), 414 for configuration D, and 412 for configuration E.
8.7.3 Quantitative FTA By the application of the analytical model illustrated in the previous sections of this chapter, it is possible to quantify the reliability parameters of the system, e. g., reliability RS .t/ and MTTF. Table 8.16 summarizes these values for the five system configurations previously illustrated. In particular, the reliability function has been quantified for t D 4;000 h and
t D 6;570 h, corresponding to an operating period of 1 year (i. e., 365 days per year and 18 h per day). The system is supposed to be nonrepairable and made up of nonrepairable components, and as a consequence these values refer to the first occurrence of the system failure event. In accordance with this hypothesis, the following sections illustrate some basic results obtained for the five system configurations previously introduced.
8.7.3.1 Configuration A – One Boiler and Fan-Coil Redundancy and Fill Water Figure 8.71 presents the failure probability function F .t/ (i. e., the unreliability), the reliability R.t/ (i. e., the survival function), the probability density func-
270
8 Effects Analysis and Reliability Modeling of Complex Production Systems
No Thermal Comfort
no WARM air
no warm air Fan 2
no warm air Fan 1
no flow fan 1 no flow fan 2
flow fan n°1 temperature sensor
fan axial flow 1
no electric power
flow fan n°2 temperature
1.1
fan axial flow 2
no electric power
1.2
Fig. 8.66 One boiler and no fan coil redundancy. Configuration C
tion f .t/, and the failure rate .t/ for the thermic system made up of one boiler and two redundant fan-coils. The hydraulic circuit could be empty. Figure 8.72 presents the results obtained by the static reliability importance analysis (see Chaps. 5 and 6) applied to the system for t D 4;000 h and t D 8;000 h. The most critical components are the water supply system, pump 1, the gas supply system, the electric power system, and the automatic control valve. This rank ordering list is confirmed by the time-dependent reliability importance analysis, whose main results are illustrated in Fig. 8.73, and whose most critical components have the highest values of the reliability importance value (in the vertical y-coordinate). Figure 8.74 compares the failure rate of the system S .t/ with the failure rates of the most critical components previously identified. Now, the reliability of two exemplifying cut sets is quantified as follows: Y qj .t/ qCSfpump 1 brokeng .t/ D j 2CSfpump 1 brokeng
D qpump 1 broken .t/ D 1 epump 1 broken t 6 t
D 1 e86:2810
qCS˚
fan axial_flow_1I zone_valve_1.2_rupture
D
.t/
Y
j 2CS˚ fan axial_flow_1I zone_valve_1.2_rupture
qj .t/
D qfan axial_flow_1 .t/ qzone_valve_1.2_rupture .t/ D Œ1 efan axial_flow_1 t Œ1 ezone_valve_1.2_rupture t 6
6
D Œ1 e1:586x10 tŒ1 e18:54x10 t
8.7.3.2 Configuration B – One Boiler and Fan-Coil Redundancy As previously applied to configuration A, Fig. 8.75 presents the failure probability function F .t/, the reliability R.t/, the probability density function f .t/, and the failure rate .t/ for the thermic system made up of one boiler and two redundant fan-coils, without requiring water from the water supplier system in this case. Figure 8.76 presents the results obtained by the static reliability importance analysis applied to the system for t D 4;000 h and t D 8;000 h. The most critical components are the same as for configuration A: the rank ordering list is confirmed by the time-dependent reliability importance analysis (see Fig. 8.77). Fig-
8.7 Application 1 – FTA
271
No Thermal Comfort
no warm air
no warm air Fan 2
no warm air Fan 1
no supply H2O
no flow fan 1
flow fan n°1 temperature sensor
fan axial flow 1
no flow fan 2
no electric power
no water supply
rupture hydronic pipe adduction
1.1
flow fan n°2 temperature sensor
AND
fan axial flow 2
2.1
no supply H2O
no electric power
no water supply
rupture hydronic pipe adduction
1.2
AND
2.2
Fig. 8.67 Two boilers and fan-coil redundancy and fill water. Configuration D
ure 8.78 compares the failure rate of the system S .t/ with the failure rates of the most critical components.
As previously demonstrated, the failure rate of the system is constant, i. e., the top event is random.
8.7.3.3 Configuration C – One Boiler and No Fan-Coil Redundancy
8.7.3.4 Configuration D – Two boilers and Fan-Coil Redundancy and Fill Water
Figure 8.79 presents the failure probability function F .t/, the reliability R.t/, the probability density function f .t/, and the failure rate .t/ for the system made up of one boiler and two fan-coils, all necessary to guarantee environmental thermic comfort, without requiring water from the water supply system. Figures 8.80–8.82 are similar to those introduced for configurations A and B. The most critical basic events/components are the failure of the pump, the gas supply system, the flow fan thermic sensors, and the subloop thermic sensors.
Figure 8.83 presents the failure probability function F .t/, the reliability R.t/, the probability density function f .t/, and the failure rate .t/ for the thermic system made up of two boilers and two redundant fan-coils. The hydraulic circuit could be empty. Figures 8.84–8.86 correspond to those introduced for the previous system configurations. The most critical basic events/components are the water supply system, the gas supply system, the electric power system, the hydronic pipe adduction (for the water supply system), and the environmental thermic sensor.
fan axial flow 1
no electric power
hydraulic disjunctor 1
no electric power
cold water
boiler pump 1 broken
environmental thermal sensor
flow fan n°1 temperature sensor
no control
no drive c.p.
no electric power
gas adduction valve 1 closed
not operating circulation pump
not operating boiler
no natural gas
no gas supply
zone valve 1.1 rupture
rupture hydronic pipe adduction
no warm H2O supp.1
no water supply
no supply H2O
thermal sensor n°1.1 not operative
gas burner rupture 1
piping rupture loop 1.1
pump 1 broken
thermal sensor n° 1.1 not operative
no control p.
environmental thermal sensor
no electric power
no drive pump
not operating pump 1
no discharge head - loop 1.1
no electric power
environmental thermal sensor
thermal sensor n° 1.1 not operative
Control Valve rupture loop1
no control
no drive c.v.
no electric power
hydraulic disjunctor 2
valve automatic control 1
3-way zone valve closed 2.1
AND
gas burner rupture 2
environmental thermal sensor
thermal sensor n° 2.1 not operative
boiler pump 2 broken
no control
no drive c.p.
flow fan n°1 temperature sensor
no electric power
gas adduction valve 2 closed
not operating circulation pump
not operating boiler
no natural gas
no gas supply
zone valve 2.1 rupture
cold water
no warm H2O supp.2
environmental thermal sensor
thermal sensor n° 2.1 not operative
no control p.
pump 2 broken
not operating pump 2
no drive pump
no electric power
piping rupture loop 2.1
fan axial flow 2
no discharge head - loop 2.1
flow fan n°2 temperature sensor
no flow fan 2
environmental thermal sensor
thermal sensor n° 2.1 not operative
no control
Control Valve rupture loop2
valve automatic control 2
no drive c.v.
no electric power
no electric power
no water supply
3-way zone valve closed 1.2
rupture hydronic pipe adduction
no supply H2O
Fig. 8.68 Thermic system, two boilers and fan-coil redundancy and fill water. Configuration D
3-way zone valve closed 1.1
flow fan n°1 temperature sensor
no flow fan 1
no warm air Fan 1
no warm air
No Thermal Comfort
no electric power
hydraulic disjunctor 1
gas adduction valve 1 closed
no control
gas burner rupture 1
thermal sensor n°1.2 not operative
boiler pump 1 broken
environmental thermal sensor
flow fan n°2 temperature sensor
no electric power
no drive c.p.
not operating circulation pump
not operating boiler 1
no natural gas
no gas supply
zone valve 1.2 rupture
cold water
no warm H2O supp.1
AND
environmental thermal sensor
thermal sensor n° 1.2 not operative
no control p.
pump 1 broken
not operating pump 1
no gas supply
no control
no drive c.v.
thermal sensor n° 1.2 not operative
Control Valve rupture loop1
environmental thermal sensor
flow fan n°2 temperature sensor
gas burner rupture 2
thermal sensor n° 2.2 not operative
boiler pump 2 broken
no control
no drive c.p.
no electric power
gas adduction valve 2 closed
not operating circulation pump
not operating boiler 2
cold water
no warm H2O supp.2
no natural gas
valve automatic control
zone valve 2.2 rupture
hydraulic disjunctor 2
environmental thermal sensor
no electric power
no electric power
no discharge head - loop 1.2
3-way zone valve closed 2.2
no drive pump 1
no electric power
piping rupture loop 1.2
no warm air Fan 2
no electric power
piping rupture loop 2.2
thermal sensor n° 2.2 not operative
no control p.
environmental thermal sensor
no drive pump 2
pump 2 broken
not operating pump 2
no discharge head - loop 2.2
no electric power
environmental thermal sensor
Control Valve rupture loop2
thermal sensor n° 2.2 not operative
no control
no drive c.v.
valve automatic control 2
272 8 Effects Analysis and Reliability Modeling of Complex Production Systems
8.7 Application 1 – FTA
273
No Thermal Comfort
no warm air
no warm air Fan 2
no warm air Fan 1
no flow fan 1 no flow fan 2
flow fan n°1 temperature sensor
fan axial flow 1
no electric power
flow fan n°2 temperature sensor
AND
1.1
fan axial flow 2
2.1
no electric power
AND
1.2
2.2
Fig. 8.69 Two boilers and fan-coil redundancy. Configuration E
No Thermal Comfort
no WARM air
flow fan n°1 temperature
fan axial flow 1
no electric power
zone valve 1.1 rupture
hydraulic disjunctor 1
Fig. 8.70 Configuration C, EFT
no gas supply
gas adduction valve 1
environmental thermal sensor
boiler pump 1 broken
gas burner rupture 1
piping rupture loop 1.1
thermal sensor n° 1.1
pump 1 broken
Control Valve rupture loop1
flow fan n°2 temperature
fan axial flow 2
zone valve 1.2 rupture
piping rupture loop 1.2
thermal sensor n° 1.2
pump 1 broken no electric power no gas supply gas adduction valve 1 closed gas burner rupture 1 boiler pump 1 broken Control Valve rupture loop1 hydraulic disjunctor 1 environmental thermal sensor piping rupture loop 1.1 and thermal sensor no. 1.2 not operative piping rupture loop 1.1 and fan axial flow 2 piping rupture loop 1.1 and flow fan no. 2 temperature sensor piping rupture loop 1.1 and piping rupture loop 1.2 piping rupture loop 1.1 and zone valve 1.2 rupture zone valve 1.1 rupture and thermal sensor no. 1.2 not operative zone valve 1.1 rupture and piping rupture loop 1.2 zone valve 1.1 rupture and zone valve 1.2 rupture zone valve 1.1 rupture and fan axial flow 2 zone valve 1.1 rupture and flow fan no. 2 temperature sensor thermal sensor no. 1.1 not operative and thermal sensor no. 1.2 not operative thermal sensor no. 1.1 not operative and piping rupture loop 1.2 thermal sensor no. 1.1 not operative and zone valve 1.2 rupture thermal sensor no. 1.1 not operative and fan axial flow 2 thermal sensor no. 1.1 not operative and flow fan no. 2 temperature sensor fan axial flow 1 and thermal sensor no. 1.2 not operative fan axial flow 1 and piping rupture loop 1.2 fan axial flow 1 and zone valve 1.2 rupture fan axial flow 1 and fan axial flow 2 fan axial flow 1 and flow fan no. 2 temperature sensor flow fan no. 1 temperature sensor and thermal sensor no. 1.2 not operative flow fan no. 1 temperature sensor and piping rupture loop 1.2 flow fan no. 1 temperature sensor and zone valve 1.2 rupture flow fan no. 1 temperature sensor and fan axial flow 2 flow fan no. 1temperature sensor and flow fan no. 2 temperature sensor
1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
environmental thermal sensor piping rupture loop 1.1 and thermal sensor no. 1.2 not operative
piping rupture loop 1.1 and fan axial flow 2 piping rupture loop 1.1 and flow fan no. 2 temperature sensor piping rupture loop 1.1 and piping rupture loop 1.2
piping rupture loop 1.1 and zone valve 1.2 rupture zone valve 1.1 rupture and thermal sensor no. 1.2 not operative zone valve 1.1 rupture and piping rupture loop 1.2 zone valve 1.1 rupture and zone valve 1.2 rupture
zone valve 1.1 rupture and fan axial flow 2
zone valve 1.1 rupture and flow fan no. 2 temperature sensor
thermal sensor no. 1.1 not operative and thermal sensor no. 1.2 not operative thermal sensor no. 1.1 not operative and piping rupture loop 1.2
thermal sensor no. 1.1 not operative and zone valve 1.2 rupture
thermal sensor no. 1.1 not operative and fan axial flow 2
thermal sensor no. 1.1 not operative and flow fan no. 2 temperature sensor fan axial flow 1 and thermal sensor no. 1.2 not operative fan axial flow 1 and piping rupture loop 1.2 fan axial flow 1 and zone valve 1.2 rupture fan axial flow 1 and fan axial flow 2
fan axial flow 1 and flow fan no. 2 temperature sensor
flow fan no. 1 temperature sensor and thermal sensor no. 1.2 not operative flow fan no. 1 temperature sensor and piping rupture loop 1.2 flow fan no. 1 temperature sensor and zone valve 1.2 rupture
flow fan no. 1 temperature sensor and fan axial flow 2 flow fan no. 1 temperature sensor and flow fan no. 2 temperature sensor
Configuration B
1 1 1 1 1 1 1 1 1 1
Cardinality
pump 1 broken no water supply rupture hydronic pipe adduction no electric power no gas supply gas adduction valve 1 closed gas burner rupture 1 boiler pump 1 broken Control Valve rupture loop 1 hydraulic disjunctor 1
Configuration A
Table 8.15 Minimal cut sets configuration, configurations A, B, and C
2 2
2
2
2 2 2 2
2
2
2
2
2
2
2
2 2 2 2
2 2 2
2 2
1 1 1 1 1 1 1 1 1 2
Cardinality
zone valve 1.2 rupture fan axial flow 2 flow fan no. 2 temperature sensor
environmental thermal sensor thermal sensor no. 1.2 not operative piping rupture loop 1.2
pump 1 broken piping rupture loop 1.1 no electric power zone valve 1.1 rupture no gas supply gas adduction valve 1 closed gas burner rupture 1 boiler pump 1 broken thermal sensor no. 1.1 not operative Control Valve rupture loop 1 hydraulic disjunctor 1 fan axial flow 1 flow fan no. 1 temperature sensor
Configuration C
1 1 1
1 1 1
1 1
1 1 1 1 1 1 1 1 1 1
Cardinality
274 8 Effects Analysis and Reliability Modeling of Complex Production Systems
8.7 Application 1 – FTA
275
Table 8.16 Minimal cut sets configuration, configurations D and E Configuration D no water supply rupture hydronic pipe adduction no electric power no gas supply environmental thermal sensor cut sets of cardinality 2 40 cut sets example: ...
Cardinality
Configuration E
Cardinality
1
no electric power
1
1 1 1 1
no gas supply environmental thermal sensor
1 1
2 2
cut sets of cardinality 3 289 cut sets example: pump 1 broken and piping rupture loop 2.1 and zone valve 2.2 rupture ... cut sets of cardinality 4 80 cut sets example: zone valve 1.1 rupture, zone valve 1.2 rupture, thermal sensor no. 2.1 not operative, piping rupture loop 2.2 piping rupture loop 1.1 and zone valve 2.1 rupture and zone valve 1.2 rupture and zone valve 2.2 rupture ...
3 3
4 4
Configuration A Configuration B Configuration C Configuration D Configuration E
0.3288 0.4831 0.2886 0.4492 0.6599
0.1534 0.2886 0.1299 0.2367 0.4453
cut sets of cardinality 3 288 cut sets example: Control Valve rupture loop 1 and zone valve 2.1 rupture and flow fan no. 2 temperature sensor ... cut sets of cardinality 4 81 cut sets example: ...
2 2
3 3
4
4
Table 8.17 System reliability parameters Reliability t D 4;000 t D 6;570
cut sets of cardinality 2 40 cut sets example: pump 1 broken and gas burner rupture 2 ...
MTTF 3,524 5,180 3,218 4,510 7,062
8.7.3.5 Configuration E – Two Boilers and Fan-Coil Redundancy Figure 8.87 presents the failure probability function F .t/, the reliability R.t/, the probability density function f .t/, and the failure rate .t/ for the thermic system, made up of two alternative boilers (i. e., one is redundant) and two redundant fan-coils, without requiring water supply. Figures 8.88–8.90 are similar to those introduced for the previous system configurations. The most critical basic events/components are the gas supply system, the electric power system, the
environmental thermic sensor, pump 1 and pump 2, and the control valve rupture event. Table 8.17 reports the values of reliability (t D 4;000 and 6,570) and MTTF for configurations A– E. In particular configuration E assumes the best values of reliability and MTTF if compared with the others.
8.7.3.6 Repairable System and Monte Carlo Simulation Now the system is supposed to be repairable and all basic components subject to very similar repair behaviors. Figure 8.91 presents the results of the evaluation of the probability distribution of the ttr values in accordance with the availability of a set of 100 historical values. In particular, by a normal distribution is detected with mean value 4.844 hours and standard deviation 1.104 hours. All components are supposed to be repairable in accordance to this statistical distribution.
276
8 Effects Analysis and Reliability Modeling of Complex Production Systems
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Unreliability vs Time
1.000
Reliability vs Time
1.000
Unreliability
Reliability
Fault Tree1 Unreliability Line
Fault Tree1 Reliability Line
0.800
Reliability, R(t)
Unreliability, F(t)=1-R(t)
0.800
0.600
0.400
0.200
0.600
0.400
0.200
0.000 0.000
4000.000
8000.000
12000.000
16000.000
0.000 0.000
20000.000
4000.000
8000.000
Time, (t) ReliaSoft BlockSim 7 - www.ReliaSoft.com
12000.000
16000.000
20000.000
Time, (t) ReliaSoft BlockSim 7 - www.ReliaSoft.com
Probability Density Function
3.000E-4
Failure Rate vs Time
4.000E-4
Pdf
Failure Rate
Fault Tree1 Pdf Line
Fault Tree1 Failure Rate Line
3.400E-4
Failure Rate, f(t)/R(t)
2.401E-4
f(t)
1.802E-4
1.204E-4
6.048E-5
2.800E-4
2.200E-4
1.600E-4
6.000E-7 0.000
4000.000
8000.000
12000.000
16000.000
1.000E-4 0.000
20000.000
4000.000
8000.000
Time, (t)
12000.000
16000.000
20000.000
Time, (t)
®
Fig. 8.71 F .t /, R.t /, f .t /, and .t /. System configuration A. ReliaSoft software ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Static Reliability Importance
0.481
Static Reliability Importance
0.213
Reliability
Reliability
100%
100%
50%
50%
0.170 0% 5 Item(s)
0.289
0.192
0.096
0.000
Reliability Importance Value
Reliability Importance Value
0.385
0% 5 Item(s)
0.128
0.085
0.043
no water supply
pump 1 broken
no gas supply
no electric power Control Valve rup...
0.000
no water supply
Time = 4000
pump 1 broken
no gas supply
no electric power Control Valve rup...
Time = 8000
Fig. 8.72 Static reliability analysis. System configuration A. ReliaSoft® software
Figure 8.92 presents the system up/down diagram, within an operating time period of 10 years, corresponding 65,700 h, obtained by the application of the Monte Carlo simulation. Figure 8.93 presents the block up/down analysis obtained by the Monte Carlo dynamic evaluation ap-
plied to the most critical basic components/events of the failure tree. It can be stated that the mean availability is 0.9997, the point availability (for t D 65;700 h) is 1, the ENF is 4.15, the uptime is 65;679 h, and the corrective downtime is 20:17 h.
8.8 Application 2 – FTA in a Waste to Energy System
277
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Reliability Importance vs Time 1.000
Importance Fault Tree1 Extra Starting Block Extra Ending Block no electric power no electric power no electric power no electric power no electric power no electric power no electric power flow fan n°1 temperature sensor flow fan n°2 temperature sensor no water supply no electric power rupture hydronic pipe adduction hydraulic disjunctor 1 no gas supply gas adduction valve 1 closed gas burner rupture 1 boiler pump 1 broken environmental thermal sensor environmental thermal sensor environmental thermal sensor pump 1 broken Control Valve rupture loop1 thermal sensor n° 1.1 not operative environmental thermal sensor thermal sensor n° 1.2 not operative thermal sensor n°1.1 not operative thermal sensor n°1.2 not operative fan axial flow 1 flow fan n°1 temperature sensor no water supply rupture hydronic pipe adduction no electric power zone valve 1.1 rupture gas adduction valve 1 closed no gas supply environmental thermal sensor boiler pump 1 broken piping rupture loop 1.1 thermal sensor n° 1.1 not operative pump 1 broken zone valve 1.2 rupture piping rupture loop 1.2
Reliability Importance Value
0.800
0.600
0.400
0.200
0.000 0.000
4000.000
8000.000
12000.000
16000.000
20000.000
Time, (t)
Fig. 8.73 Time-dependent reliability analysis. System configuration A. ReliaSoft® software
8.8 Application 2 – FTA in a Waste to Energy System This section introduces a case study including a costbased model for failure modes analysis, reliability prediction, and magnitude evaluation of a waste to energy (WtE) plant. The model pays particular attention to the economic determination and evaluation of the environmental effects, here called “externalities,” of those facilities dedicated to the thermic treatment of waste, in accordance with the adoption of different maintenance policies. In detail, after a short description of the incinerator object of the study, this section illustrates the FTA conducted on some critical subsystems of the WtE plant. A qualitative and quantitative evaluation of the solid waste incinerator is carried out and the results of these FTAs, as reported in Sects. 8.8.6 and 8.8.7, join in a cost-based prediction reliability model for the determination of the economic effects of the emissions, e. g., nitrogen oxides (NOx ) and carbon dioxide. This
model is based on the integration of a failure modes analysis, a reliability prediction analysis, and a “magnitude of consequences” evaluation, which takes inspiration from the large number of literature studies on the determination of the externalities in WtE plants.
8.8.1 Introduction to Waste Treatment An incinerator is a waste treatment technology for the thermic treatment of waste. By high-temperature combustion it transforms waste into thermic energy useful for the generation of electricity and/or for district heating. An incinerator also produces gaseous emissions in the atmosphere and residual ash. The incinerator represents one of the most popular alternative technology to landfilling and biological treatment of waste. It is particularly popular in countries such as Japan where land is a scarce resource, but several municipalities all over the world, such as
278
8 Effects Analysis and Reliability Modeling of Complex Production Systems
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Failure Rate vs Time 4.000E-4
Failure Rate Fault Tree1 System Control Valve rupture loop1 no electric power no gas supply no water supply pump 1 broken
Failure Rate, f(t)/R(t)
3.200E-4
2.400E-4
1.600E-4
8.000E-5
0.000 0.000
100000.000
200000.000
300000.000
400000.000
500000.000
Time, (t)
Fig. 8.74 Failure rates of the system and of the most critical components. System configuration A. ReliaSoft® software
Hong Kong, Saugus in Massachusetts, USA, Brescia in Italy, London in the UK, and Tokyo in Japan, have adopted municipal solid waste incinerators. Table 8.18 presents a snapshot on WtE plants in Europe as of 2002. A WtE plant is equipped with high-efficiency furnaces and devices for continuous monitoring of emissions and air pollution control. There are various types of incinerator plants: • Simple incinerator made of a brick-lined cell, with a metal grate over a lower ash pit, and openings, called “clinkers,” for waste loading and refuse removal; often used for domestic heating. • Moving grate combustion. A grate enables the movement of waste through the combustion chamber. • Rotary kiln, made of a long, slightly inclined cylindrical tube along which refuse is continuously moved and spills out of the end through the clink-
ers. The system is made of some different sections where waste is dried, ignited, and completely burned. • Multiple/stepped heart. Waste is transported through the furnace by moving teeth mounted on a central rotating shaft. • Fluidized bed. An flow of air is forced through a bed of sand. The sand particles separate, enabling air to flow through; thus, a fluidized bed is created and fuel and waste can be introduced. The mass of waste, fuel, and sand is fully circulated through the furnace.
8.8.2 Case study The WtE plant considered, as reported in Table 8.19, has a plant capacity, or waste treatment capacity, of about 200 ton=day for 2;600 kcal=kg
8.8 Application 2 – FTA in a Waste to Energy System ReliaSoft BlockSim 7 - www.ReliaSoft.com
279 ReliaSoft BlockSim 7 - www.ReliaSoft.com
Unreliability vs Time
1.000
Reliability vs Time
1.000
Unreliability
Reliability Fault Tree1 Reliability Line
Fault Tree1 Unreliability Line
0.800
Reliability, R(t)=F(t)
Unreliability, F(t)=1-R(t)
0.800
0.600
0.400
0.200
0.600
0.400
0.200
0.000 0.000
6000.000
12000.000
18000.000
24000.000
0.000 0.000
30000.000
6000.000
ReliaSoft BlockSim 7 - www.ReliaSoft.com
12000.000
18000.000
24000.000
30000.000
Time, (t)
Time, (t) ReliaSoft BlockSim 7 - www.ReliaSoft.com
Probability Density Function
2.000E-4
Failure Rate vs Time
3.000E-4
Pdf
Failure Rate
Fault Tree1 Pdf Line
Fault Tree1 Failure Rate Line
1.600E-4
Failure Rate, f(t)/R(t)
2.400E-4
f(t)
1.201E-4
8.012E-5
4.016E-5
1.800E-4
1.200E-4
6.000E-5
2.000E-7 0.000
6000.000
12000.000
18000.000
24000.000
0.000 0.000
30000.000
6000.000
12000.000
18000.000
24000.000
30000.000
Time, (t)
Time, (t)
Fig. 8.75 F .t /, R.t /, f .t /, and .t /. System configuration B. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Static Reliability Importance 0.682
Static Reliability Importance 0.428
Reliability
Reliability
100%
100%
50%
50%
0.342 0% 5 Item(s)
0.409
0.273
0.136
Reliability Importance Value
Reliability Importance Value
0.546
0% 5 Item(s)
0.257
0.171
0.086
0.000
0.000 pump 1 broken
no gas supply
no electric power Control Valve rup... gas adduction val...
pump 1 broken
Time = 4000
no gas supply no electric power Control Valve rup... gas adduction val...
Time = 8000
Fig. 8.76 Static reliability analysis. System configuration B. ReliaSoft® software
of waste, resulting in 11;000 MWh=year of electric energy and 34;000 MWh=year of thermic energy produced, thus corresponding to 1:238 kWh for each kilogram of waste. The system supplies thermic energy for a community of about 2,600 families.
8.8.3 Emissions and Externalities: Literature Review Even incinerators are faced with environmental and health questions. An exemplifying list obtained from the literature mentions damage to buildings, forests,
280
8 Effects Analysis and Reliability Modeling of Complex Production Systems
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Reliability Importance vs Time 1.000
Importance Fault Tree1 Extra Starting Block Extra Ending Block no electric power no electric power no electric power no electric power no electric power no electric power no electric power flow fan n°1 temperature sensor flow fan n°2 temperature sensor no electric power hydraulic disjunctor 1 no gas supply gas adduction valve 1 closed gas burner rupture 1 boiler pump 1 broken environmental thermal sensor environmental thermal sensor environmental thermal sensor pump 1 broken Control Valve rupture loop1 thermal sensor n° 1.1 not operative environmental thermal sensor thermal sensor n° 1.2 not operative thermal sensor n°1.1 not operative thermal sensor n°1.2 not operative fan axial flow 1 flow fan n°1 temperature sensor no electric power zone valve 1.1 rupture gas adduction valve 1 closed no gas supply environmental thermal sensor boiler pump 1 broken piping rupture loop 1.1 thermal sensor n° 1.1 not operative pump 1 broken zone valve 1.2 rupture piping rupture loop 1.2 no electric power gas burner rupture 1 flow fan n°2 temperature sensor fan axial flow 2
Reliability Importance Value
0.800
0.600
0.400
0.200
0.000 0.000
6000.000
12000.000
18000.000
24000.000
30000.000
Time, (t)
Fig. 8.77 Time-dependent reliability analysis. System configuration B. ReliaSoft® software
and agricultural yields; costs associated with transportation and logistics (e. g., vehicle emissions, congestion, accidents, noise); odor, dust, visual intrusion, etc. The magnitude of these effects strongly depends on the distance from the site, the type of waste, topography, prevailing wind directions, etc., and as a consequence the costs of externalities can range in a wide interval. According to EC Directives, published in 2000, NOx emissions, with about 70% of the total health costs, are the most critical externality generated by an incinerator. They are believed to aggravate asthmatic conditions, and react with the oxygen in the air to produce ozone, which is also an irritant, and eventually forming nitric acid when they are dissolved in water. When they are dissolved in atmospheric moisture, the result is acid rain, which can damage entire forest ecosystems. As illustrated in Table 8.20, costs associated with NOx vary very significantly in literature studies (Eshet et al. 2006), ranging from US$ 0.13 to US$ 18.6
per kilogram of NOx . This table presents economic unit values of all externalities associated with different emissions (CO2 , CH4 , NOx , PM10 , SO2 , etc.) for both landfill and incinerators. These economic unit values are quantified in dollar per kilogram of pollutant (at 2003 prices). Table 8.21 reports economic valuations in US dollars per ton of waste (2003 prices) for specific impacts (e. g., transportation, leachate) for incineration. The following analysis and results refer to the control and reduction of NOx emissions in the incinerator considered, with particular attention to the so-called selective noncatalytic reduction (SNCR) technology.
8.8.4 SNCR Plant Table 8.22 quantifies the annual cost of externalities associated with some critical emissions of the incinerator, in accordance with the economic unit values
8.8 Application 2 – FTA in a Waste to Energy System
281
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Failure Rate vs Time 3.000E-4
Failure Rate Fault Tree1 System Control Valve rupture loop1 gas adduction valve 1 closed no electric power no gas supply pump 1 broken
Failure Rate, f(t)/R(t)
2.400E-4
1.800E-4
1.200E-4
6.000E-5
0.000 0.000
100000.000
200000.000
300000.000
400000.000
500000.000
Time, (t)
Fig. 8.78 Failure rates of the system and of the most critical components. System configuration B. ReliaSoft® software
collected from the literature (total average value in Table 8.20, last row). In particular, the emission of NOx represents about 33.5% of the admissible value of 85;619 kg=year (EC Directives); moreover, the related cost represents 99% of global social costs associated with pollutant emissions. In order to limit gas emissions in the atmosphere, and in particular the emissions of NOx , in accordance with the limits fixed by 2000/76/CE Directive, a SNCR plant has been recently introduced. The SNCR technology injects urea into the firebox of the boiler to react with the nitrogen oxides formed in the combustion process at a gas temperature between 1,600 and 2;100 ı F. This chemical reaction produces elemental nitrogen, carbon dioxide, and water. As a result of the introduction of the SNCR plant, the average value of NOx emissions decreased from 150 to 120 mg=Nm3 . This is the control parameter of the incineration process, and values greater than 200 mg=Nm3 , as declared by the manufacturer, can reveal anomalies.
Figure 8.94 illustrates the statistical distribution of NOx (mg=Nm3) emissions during a period of time T from June 2005 to February 2007, for the power plant considered. This analysis is based on more than 25,000 half-hour observations. A half-hourly observation gives the average value of 30 values registered each minute. Figure 8.95 reports the trend of half-hour values during the 20-month observation period. By an in-depth analysis of these values, for 12;185 h the NOx emissions did not pass the critical value of 200 mg=Nm3, while for 75 h the SNCR system did not function correctly. In particular, the emission values exceeded 235 mg=Nm3 for 4 h.
8.8.5 SNCR Plant. Reliability Prediction and Evaluation Model A FTA was implemented by Relex® Reliability software in order to investigate the minimal conditions
282
8 Effects Analysis and Reliability Modeling of Complex Production Systems
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Unreliability vs Time
1.000
Reliability vs Time
1.000
Unreliability
Reliability
Fault Tree1 Unreliability Line
Fault Tree1 Reliability Line
0.800
Reliability, R(t)=F(t)
Unreliability, F(t)=1-R(t)
0.800
0.600
0.400
0.200
0.600
0.400
0.200
0.000 0.000
4000.000
8000.000
12000.000
16000.000
0.000 0.000
20000.000
4000.000
8000.000
12000.000
Time, (t) ReliaSoft BlockSim 7 - www.ReliaSoft.com
16000.000
20000.000
Time, (t) ReliaSoft BlockSim 7 - www.ReliaSoft.com
Probability Density Function
4.000E-4
Failure Rate vs Time
4.000E-4
Pdf
Failure Rate
Fault Tree1 Pdf Line
Fault Tree1 Failure Rate Line
3.600E-4
Failure Rate, f(t)/R(t)
3.201E-4
f(t)
2.402E-4
1.603E-4
8.040E-5
3.200E-4
2.800E-4
2.400E-4
5.000E-7 0.000
4000.000
8000.000
12000.000
16000.000
2.000E-4 0.000
20000.000
4000.000
Time, (t)
8000.000
12000.000
16000.000
20000.000
Time, (t)
Fig. 8.79 F .t /, R.t /, f .t /, and .t /. System configuration C. ReliaSoft® software
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Static Reliability Importance
0.408
Static Reliability Importance
0.166
Reliability
Reliability
100%
100%
50%
50%
0.133 0% 5 Item(s)
0.245
0.163
0.082
0.000
Reliability Importance Value
Reliability Importance Value
0.326
0% 5 Item(s)
0.100
0.066
0.033
pump 1 broken
no gas supply
flow fan n°2 tem... thermal sensor n... flow fan n°1 tem...
0.000
pump 1 broken
Time = 4000
no gas supply thermal sensor n... flow fan n°1 tem... flow fan n°2 tem...
Time = 8000
®
Fig. 8.80 Static reliability analysis. System configuration C. ReliaSoft software
8.8 Application 2 – FTA in a Waste to Energy System ReliaSoft BlockSim 7 - www.ReliaSoft.com
283
Reliability Importance vs Time
1.000
Importance Fault Tree1 no electric power no electric power no electric power no electric power no electric power no electric power no electric power flow fan n°1 temperature sensor flow fan n°2 temperature sensor no electric power hydraulic disjunctor 1 no gas supply gas adduction valve 1 closed gas burner rupture 1 boiler pump 1 broken environmental thermal sensor environmental thermal sensor environmental thermal sensor pump 1 broken Control Valve rupture loop1 thermal sensor n° 1.1 not operative environmental thermal sensor thermal sensor n° 1.2 not operative thermal sensor n°1.1 not operative thermal sensor n°1.2 not operative fan axial flow 1 flow fan n°1 temperature sensor no electric power zone valve 1.1 rupture gas adduction valve 1 closed no gas supply environmental thermal sensor boiler pump 1 broken piping rupture loop 1.1 thermal sensor n° 1.1 not operative pump 1 broken zone valve 1.2 rupture piping rupture loop 1.2 no electric power gas burner rupture 1 flow fan n°2 temperature sensor fan axial flow 2 hydraulic disjunctor 1 Control Valve rupture loop1
Reliability Importance Value
0.800
0.600
0.400
0.200
0.000 0.000
4000.000
8000.000
12000.000
16000.000
20000.000
Time, (t)
Fig. 8.81 Time-dependent reliability analysis. System configuration C. ReliaSoft® software
which cause an incorrect functioning of the system identified by the top event “NOx emissions exceeding the threshold 200 mg=Nm3.” Figure 8.96 shows the fault tree obtained for the determination of the unavailability Q.t/ of the SNCR plant and the probability associated with the top event.
where AIR_SEC D VR1101_fail C AIR_fail level 3
D VR1101 C ELECRTRIC_fail
level 4
C AIR_fail D VR1101 C TT101 TT105
8.8.6 Qualitative FTA Evaluation This section illustrates the qualitative evaluation of the fault tree, given the top event “exceeding NOx 200 mg=Nm3 limit.” By applying the Boolean algebra, one can explain the top event explained as follows (see Fig. 8.96 for nomenclature): TOP D TCOMB C P_UREA level 1
D AIR_SEC C m_CIRCU C TKUREA
level 2
C m_DOSAGE C e_ELECTRIC C m_SUPPLY;
level 5
C AIR_fail; m_CIRCU D p_CIRCU C f_CIRCU level 3
D CX51005 CX51006
level 4
C DH51001 DH51002; m_DOSAGE D p_DOSAGE C f_DOSAGE level 3
D CX51008 CX51009
level 4
C DH51003 DH51004;
284
8 Effects Analysis and Reliability Modeling of Complex Production Systems
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Failure Rate vs Time 4.000E-4
Failure Rate Fault Tree1 System flow fan n°1 temperature sensor flow fan n°2 temperature sensor no gas supply pump 1 broken thermal sensor n° 1.1 not operative
Failure Rate, f(t)/R(t)
3.220E-4
2.440E-4
1.660E-4
8.800E-5
1.000E-5 0.000
40000.000
80000.000
120000.000
160000.000
200000.000
Time, (t)
Fig. 8.82 Failure rates of the system and of the most critical components. System configuration C. ReliaSoft® software
m_SUPPLY D SPEARS_1 SPEARS_2 level 3
D .INJ51101L C INJ51102L
level 4
C INJ51103L/ .INJ51101H C INJ51102H C INJ51103H/ D INJ51101L INJ51101H
level 5
C INJ51101L INJ51102H C INJ51101L INJ51103H C INJ51102L INJ51101H C INJ51102L INJ51102H C INJ51102L INJ51103H C INJ51103L INJ51101H C INJ51103L INJ51102H C INJ51103L INJ51103H:
Consequently, TOP D VR1101 C TT101 TT105 C AIR_fail C CX51005 CX51006 C DH51001 DH51002 C TKUREA C CX51008 CX51009 C DH51003 DH51004 C e_ELECTRIC C INJ51101L INJ51101H C INJ51101L INJ51102H C INJ51101L INJ51103H C INJ51102L INJ51101H C INJ51102L INJ51102H C INJ51102L INJ51103H C INJ51103L INJ51101H C INJ51103L INJ51102H C INJ51103L INJ51103H:
8.8 Application 2 – FTA in a Waste to Energy System
285
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Unreliability vs Time
1.000
Reliability vs Time
1.000
Unreliability
Reliability
Fault Tree1 Unreliability Line
Fault Tree1 Reliability Line
0.800
Reliability, R(t)=F(t)
Unreliability, F(t)=1-R(t)
0.800
0.600
0.400
0.200
0.600
0.400
0.200
0.000 0.000
4000.000
8000.000
12000.000
16000.000
0.000 0.000
20000.000
4000.000
8000.000
Time, (t)
12000.000
16000.000
20000.000
Time, (t)
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Probability Density Function
2.000E-4
Failure Rate vs Time
4.000E-4
Pdf
Failure Rate
Fault Tree1 Pdf Line
Fault Tree1 Failure Rate Line
3.200E-4
Failure Rate, f(t)/R(t)
1.600E-4
f(t)
1.200E-4
8.000E-5
4.000E-5
2.400E-4
1.600E-4
8.000E-5
0.000 0.000
4000.000
8000.000
12000.000
16000.000
0.000 0.000
20000.000
4000.000
8000.000
Time, (t)
12000.000
16000.000
20000.000
Time, (t)
Fig. 8.83 F .t /, R.t /, f .t /, and .t /. System configuration D. ReliaSoft® software ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Static Reliability Importance
0.657
Static Reliability Importance
0.345
Reliability
Reliability
100%
100%
50%
50%
0.276 0% 5 Item(s)
0.394
0.263
0% 5 Item(s)
0.207
0.138
0.069
0.131
0.000
Reliability Importance Value
Reliability Importance Value
0.526
no water supply
no gas supply
no electric power rupture hydronic ... environmental th...
0.000
no water supply
no gas supply
no electric power rupture hydronic ... environmental th...
Time = 8000
Time = 4000
Fig. 8.84 Static reliability analysis. System configuration D. ReliaSoft® software
Filters DH, pumps CX, and spears INJ can be considered to be identical items, and consequently the analyst could be seduced into appling the absorption laws. The previous equation seems to change as follows:
TOP D VR1101 C TT101 TT105 C AIR_fail C TKUREA C e_ELECTRIC C INJ C CX C DH;
286
8 Effects Analysis and Reliability Modeling of Complex Production Systems
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Reliability Importance vs Time
1.000
Importance Fault Tree1 Extra Starting Block Extra Ending Block no water supply rupture hydronic pipe adduction fan axial flow 1 no electric power rupture hydronic pipe adduction no water supply no electric power fan axial flow 2 hydraulic disjunctor 1 piping rupture loop 1.1 zone valve 1.1 rupture gas burner rupture 1 no electric power pump 1 broken boiler pump 1 broken Control Valve rupture loop1 thermal sensor n°1.1 not operative no electric power no electric power environmental thermal sensor thermal sensor n° 1.1 not operative no electric power zone valve 2.1 rupture no gas supply no electric power environmental thermal sensor flow fan n°1 temperature sensor flow fan n°1 temperature sensor flow fan n°1 temperature sensor thermal sensor n° 1.1 not operative thermal sensor n° 2.1 not operative environmental thermal sensor environmental thermal sensor no electric power no electric power piping rupture loop 2.1 environmental thermal sensor thermal sensor n° 2.1 not operative thermal sensor n° 2.1 not operative no electric power hydraulic disjunctor 1 zone valve 1.2 rupture
Reliability Importance Value
0.800
0.600
0.400
0.200
0.000 0.000
4000.000
8000.000
12000.000
16000.000
20000.000
Time, (t)
Fig. 8.85 Time-dependent reliability analysis. System configuration D. ReliaSoft® software
where CX D CX51008 D CX51008 CX51009 C CX51005 CX51006; DH D DH51001 D DH51001 DH51002 C DH51003 DH51004; INJ D INJ51101L D INJ51101L INJ51101H
cardinality 1. Nevertheless this equation is not correct because the absorption laws can be applied only in the case when the same basic component event, i. e., the same item, is redundant in a Boolean equation. For example, if components DH51001 and DH51002 have the same failure behavior but they deal with distinct items, the following reduction is consequently false: DH D DH51001 D DH51001 DH51002
C INJ51101L INJ51102H
C DH51003 DH51004:
C INJ51101L INJ51103H C INJ51102L INJ51101H C INJ51102L INJ51102H C INJ51102L INJ51103H C INJ51103L INJ51101H C INJ51103L INJ51102H C INJ51103L INJ51103H: By the last equation eight cut sets are obtained, one of cardinality 2 (TT101 TT105) and the others of
In the same way the other reductions in the equation reported above are not feasible. The basic events involved are not mirror1 items. Similarly for the control of every critical emission and pollutant, e. g., HCl, CO, and SO2 , specific fault trees have been designed. Qualitative analyses for the determination of the MCS and quantitative anal1
The meaning of mirror event was illustrated at the beginning of this chapter.
8.8 Application 2 – FTA in a Waste to Energy System ReliaSoft BlockSim 7 - www.ReliaSoft.com
287
Block Failure Rate vs Time
4.000E-4
Failure Rate Fault Tree1 System environmental thermal sensor no electric power no gas supply no water supply rupture hydronic pipe adduction
Failure Rate, f(t)/R(t)
3.200E-4
2.400E-4
1.600E-4
8.000E-5
0.000 0.000
100000.000
200000.000
300000.000
400000.000
500000.000
Time, (t)
Fig. 8.86 Failure rates of the system and of the most critical components. System configuration D. ReliaSoft® software
yses for the determination of the reliability parameters, e. g., unavailability, ENF, and reliability function, which describe the correct and incorrect function of the system, have been implemented.
8.8.7 NOx Emissions: Quantitative FTA Evaluation This section summarizes the results obtained by the evaluation of the most important reliability parameters related to the system, given a specific top event “exceeding NOx limit.” For this purpose, Table 8.23 summarizes some significant parameters for the basic/primary components of the system which are involved in MCS previously identified. In particular,
assuming a length of the period of time T equal to 365 h, about 15 days, the approximated values of the unavailability by Eq. 8.11 and of the probability function F .T / by Eq. 8.10 are reported in Table 8.23, columns 4 and 5, respectively, while the exact value of F .T / is in the last column. In order to properly illustrate the correct quantitative evaluation of the fault trees in Figs. 8.96 and 8.100, the analysis is conducted on MCS assuming the same failure behavior for every component of the same kind, i. e., pumps, filters, and spears. Table 8.24 reports the unavailability by Eq. 8.12, the ENF by Eq. 8.17, the probability function, and the survival function for the generic cut set CSi . The following equation exemplifies the calculus of the ENF for the MCS made up of two temperature transmitters TT101 and TT105 (related to the cut set
288
8 Effects Analysis and Reliability Modeling of Complex Production Systems
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Unreliability vs Time
1.000
Reliability vs Time
1.000
Unreliability
Reliability
Fault Tree1 Unreliability Line
Fault Tree1 Reliability Line
0.800
Reliability, R(t)=F(t)
Unreliability, F(t)=1-R(t)
0.800
0.600
0.400
0.200
0.600
0.400
0.200
0.000 0.000
6000.000
12000.000
18000.000
24000.000
0.000 0.000
30000.000
6000.000
12000.000
Time, (t) ReliaSoft BlockSim 7 - www.ReliaSoft.com
18000.000
24000.000
30000.000
Time, (t) ReliaSoft BlockSim 7 - www.ReliaSoft.com
Probability Density Function
1.000E-4
Failure Rate vs Time
3.000E-4
Pdf
Failure Rate Fault Tree1 Failure Rate Line
Fault Tree1 Pdf Line
2.500E-4
Failure Rate, f(t)/R(t)
8.010E-5
f(t)
6.020E-5
4.030E-5
2.040E-5
2.000E-4
1.500E-4
1.000E-4
5.000E-7 0.000
6000.000
12000.000
18000.000
24000.000
5.000E-5 0.000
30000.000
6000.000
Time, (t)
12000.000
18000.000
24000.000
30000.000
Time, (t)
Fig. 8.87 F .t /, R.t /, f .t /, and .t /. System configuration E. ReliaSoft® software
ReliaSoft BlockSim 7 - www.ReliaSoft.com
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Static Reliability Importance
0.808
Static Reliability Importance
0.522
Reliability
Reliability
100%
100%
50%
50%
0.417 0% 5 Item(s)
0.485
0.323
0.162
0.000
Reliability Importance Value
Reliability Importance Value
0.647
0% 5 Item(s)
0.313
0.209
0.104
no gas supply
no electric power environmental th... pump 1 broken
pump 2 broken
0.000
no gas supply
Time = 4000
Fig. 8.88 Static reliability analysis. System configuration E. ReliaSoft® software
no electric power environmental th... pump 1 broken
Time = 8000
pump 2 broken
8.8 Application 2 – FTA in a Waste to Energy System
289
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Reliability Importance vs Time 1.000
Importance Fault Tree1 Extra Starting Block Extra Ending Block fan axial flow 1 no electric power no electric power fan axial flow 2 hydraulic disjunctor 1 piping rupture loop 1.1 zone valve 1.1 rupture gas burner rupture 1 no electric power pump 1 broken boiler pump 1 broken Control Valve rupture loop1 thermal sensor n°1.1 not operative no electric power no electric power environmental thermal sensor thermal sensor n° 1.1 not operative no electric power zone valve 2.1 rupture no gas supply no electric power environmental thermal sensor flow fan n°1 temperature sensor flow fan n°1 temperature sensor flow fan n°1 temperature sensor thermal sensor n° 1.1 not operative thermal sensor n° 2.1 not operative environmental thermal sensor environmental thermal sensor no electric power no electric power piping rupture loop 2.1 environmental thermal sensor thermal sensor n° 2.1 not operative thermal sensor n° 2.1 not operative no electric power hydraulic disjunctor 1 zone valve 1.2 rupture gas adduction valve 1 closed no electric power piping rupture loop 1.2 flow fan n°2 temperature sensor
Reliability Importance Value
0.800
0.600
0.400
0.200
0.000 0.000
6000.000
12000.000
18000.000
24000.000
30000.000
Time, (t)
Fig. 8.89 Time-dependent reliability analysis. System configuration E. ReliaSoft® software
TT101TT105) on the ground of Eq. 8.17 for a period of time T D 365 h: ENFCSTT101TT105 .T D 365 h/ Z365 wCSTT101TT105 .t/ dt D
D 0
0
D 365.T T 101 T T 105 T T 105 C T T 105 T T 101 T T 101 /
X
Š 3:77 106 failures.
wj .t/
j 2CSTT101TT105
Y
qk .t/ dt
k¤j k;j 2CSTT101TT105
Z365 D ŒT T 101 qT T 105 .t/ C T T 105 qT T 101 .t/ dt 0
1 T T 105 T T 101 D 1 T T 101
C T T 105 .T T 101 T T 101 / dt
0
Z365
D
T T 105 D
Z365 ŒT T 101 .T T 105 T T 105 /
By the application of Eq. 8.13 for a period of time T D 15 days, a Y qCSi .T / D 1 Œ1 qCSi .T / QS .t D T / D i
i 4
D 7:410 10 X qCSi .T / Š 7:412 104 : i
290
8 Effects Analysis and Reliability Modeling of Complex Production Systems
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Failure Rate vs Time 2.500E-4
Failure Rate Fault Tree1 System environmental thermal sensor no electric power no gas supply pump 1 broken pump 2 broken
Failure Rate, f(t)/R(t)
2.000E-4
1.500E-4
1.000E-4
5.000E-5
0.000 0.000
100000.000
200000.000
300000.000
400000.000
500000.000
Time, (t)
Fig. 8.90 Failure rates of the system and of the most critical components. System configuration E. ReliaSoft® software
time to repair - ttr Evaluation Normal Mean StDev N
100
Percent
80
60
40
20
0 00 25 50 75 0 0 2 5 50 75 00 25 50 7 5 0 0 25 50 75 00 25 50 7 5 00 25 50 75 00 2. 2. 2. 2. 3 . 3 . 3. 3. 4. 4. 4. 4. 5 . 5 . 5. 5. 6. 6. 6. 6 . 7 . 7. 7. 7. 8.
ttr
Fig. 8.91 Time to repair (ttr) probability distribution evaluation
4.844 1.104 100
8.8 Application 2 – FTA in a Waste to Energy System ReliaSoft BlockSim 7 - www.ReliaSoft.com
291
System Up/Down State Operating Time Time Under Repair
System
0.000
13140.000
26280.000
39420.000
52560.000
65700.000
Time, (t)
Fig. 8.92 Repairable system. System up/down analysis. ReliaSoft® software
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Up/Down State Operating Time Time Under Repair
pump 2 broken
pump 1 broken
no gas supply
no electric power
environmental thermal sensor
System
0.000
13140.000
26280.000
39420.000
52560.000
65700.000
Time, (t)
Fig. 8.93 Repairable system. Component up/down analysis. ReliaSoft® software
292
8 Effects Analysis and Reliability Modeling of Complex Production Systems
Table 8.18 Waste to energy (WtE) plants in Europe (2002) Country
Number of plants
Burned quantities (ton=year)
Austria Belgium Denmark France Germany UK Italy Norway Holland Portugal Spain Sweden Switzerland Hungary
2 18 32 112 60 3 50 4 11 2 8 19 31 1
406,700 2,652,000 3,136,000 11,965,800 16,787,400 1,071,000 3,488,776 273,000 4,412,000 933,800 1,070,300 2,344,000 3,150,700 420,000
Total
354
52,111,476
Table 8.19 Operative characteristic of the WtE plant, case study Operative characteristic Incinerator capacity. Waste quantities (nominal value considering 2 lines) Waste heat of combustion Smoke flow during gas purification Mean temperature of furnace Mean temperature of the postcombustion chamber Smoke temperature during cleaning Smoke temperature (ref. chimney) Vapor production Vapor pressure Overheated temperature Operation hours per year
Value
Unit of measure
8.33 (200) 10,868 (2,600) 50,400 1,000 980 230 170 28 10 300 8,000
ton=h (ton=day) kJ=kg (kcal=kg) Nm3 =h ı C ı C ı C ı C ton=h bar ı C h=year
Similarly, the failure probability function of the system for the period of time T is a Y FCSi .T / D 1 Œ1 FCSi .T / FS .t D T / D i
i
D 0:08373
X
FCSi .T / Š 0:08665:
i
Applying Eq. 8.18, the ENF for the system is X WCSi Š 7:25 102 failures: ENF.T D 365 h/ Š i
Finally, the MTTR defined for the system, given the top event, can be quantified by the application of
Eq. 8.20: 8 7:412 104 QS .T / ˆ ˆ MTTR D Š Š 3:73 h ˆ S ˆ ws .T / 1:986 104 ˆ ˆ ˆ < 0:0725 Ws .T / ˆ D ws .T / D ˆ ˆ T 365 ˆ ˆ ˆ ˆ : Š 1:986 104 day1 :
8.8.8 Criticality Analysis Figure 8.97 presents a view of the criticality analysis conducted with Relex® Reliability software. There are three main measures to detect weak points in the
2001
2002
2002
2003
RDC and PIRA
Eunomia
AEA Technology
Dijkgraaf and Vollebergh
Total average value
2000
EU 0.0035
0.0238
0.034
0.0245–0.0257 (0.0251)
0.6242
0.379
0.4506–0.4892 (0.4694)
1999
Krewitt et al. 0.004–0.042 (0.023)
18.6
1998
6.8104
3.291
1.4–8.2 (4.2)
4.3–18.34 (11.32)
3.4–5.4 (5 EU)
18.05
1998
4.6–18 (11.3)
8
0.9–18 (9.45)
0.13
0.19
18.34
7.33 2.4–4.7 (3.55)
0.132–0.523 (0.3275)
0.132–0.523 (0.3275)
NOx
Rabl et al. (a) (average France) Rabl et al. (b)
0.053–2.223 (1.138)
0.124
0.086
2.69
0.051–0.2216 (0.1363)
0.0496–0.2216 (0.1356)
CH4
1997
0.0038–0.1339 (0.072)
0.023
0.023
0.004
0.04 0.004
0.0065–0.0496 (0.02805)
0.0017–0.0136 (0.00765)
Pollutant
CO2
ExternE (Spain–France)
UK
1998
1996
Enosh
Eyre EU
1996
EC (a) (German case)
1996
1995 1996
ECON EC (b) (average EU12)
1997
1994
Powell and Brisson
Rosendash
1993
CSERGE
EMC
Year
Study
36.156
1.7–22 (14)
24
12.8–17.4 (13 EU)
6.6–62.7 (34.6)
13.6
4.41–57 (30.7)
15
1.3–57 (29.5)
260
22.2
13.6
28.7
20.5 9.5–12.8 (11.15)
22.75
22.75
PM10
5.383
4.701
5.2
1
2.1–12.2 (7.15)
6–8.3 (6 EU)
13.4
12.2
4.21–15.3 (9.755)
7
1–15 (7.5)
0.38
0.42
7.3
2.1 3.1–7.3 (5.2)
0.392–0.68 (0.536)
0.392–0.68 (0.536)
SO2
0.1905
0.002–0.009 (0.0055)
0.002
0.045–1.583 (0.814)
0.124
0.007
CO
8.239–14.161 (11.2)
1.469
N2 O
Economic unit-values (US$=kg emission, $, 2003) calculated/used in estimates of landfill and incineration externalities (average in parenthesis)
Table 8.20 Economic unit values of emissions for incinerators (Eshet et al. 2006)
1.262
2.1
0.73
0.7
0.7
2.53
1.65
VOC
314.24
VCI
293
1916
1445
Heavy metals
0
3378
Leachate Pb C Cd C Hg
16,300,000
2,000,000
Dioxins
8.8 Application 2 – FTA in a Waste to Energy System 293
294
8 Effects Analysis and Reliability Modeling of Complex Production Systems
Table 8.21 Costs and benefits from incineration (US$=ton waste) (Eshet et al. 2006) Valuation results (costs and benefits) on emissions from incineration (US$=ton waste, $, 2003) Pullutant study
CO2
Tellus (1992) CSERGE et al. (1993)b Powell and Brisson (1994)b ECON (1995)c EC (1996) Enosh (1996) EMC (1996) Miranda and Hale (1997)d Rabl et al. (1998a) ExternE (1998) EC (2000a,b) Eunomia (2002) Dijkgraaf and Vollebergh (2003)e
NO2
Other conventional
Transportation
Energy recovery
1.1–10.72
1.64–3.3
0.17–1.64
6.88–23.6
1–5 5.77–19.8
1.1–10.72
1.85–4.08
0.368–0.567
10.99–15.04
(–)3.15–6.3
3.9
0.5–1 19.65–20.69 17.26
5–108 8.72–23.43
Total estimatea
28–171 1.3 10.09 1.65 5.17–31.5
8.55 8.55
2.51
0.97–1.68
Leachate (most ash)
0–115 0.05 0
22.62
12.3 15–92d (–)9–124 29.39–45.85 17.57
a
Each of the estimate is a sum of different components and not necessarily the sum of the values in the line. The ranges refer to rural and urban sites for UK and UK C ECE. c The rang presents different types of materials (left for glass and right for plastic). d The ranges refers to differences between countries. e Modern incinerator with energy recovery including calculation of chemicals and materials. b
Table 8.22 Annual emissions (year 2006) and annual costs (2003 prices) Pollutant PM10 CO COT HCl SO2 NOx
Total amount of annual emissions (kg)
Unit cost ($=kg)
Annual cost ($=year)
28 541 70 42 73 28,711
36.2 0.2 1.3 5.4 5.4 6.8
1,005 103 89 224 393 195,534
design and to put in light the most critical component failures for the system. They can assist in identifying the fault tree event whose upgrade is most likely to yield the greatest improvement in system performance. These measures are: • Birnbaum. It determines the maximum increase of the risk due to the failure event of a component in comparison with when the component is operating. This measure is very important because it allows one to rate how much the top gate probability changes when the unavailability of a basic event has changed; as a consequence, it is possible to rank the events according to the Birnbaum measure and to select those on which to concentrate the best efforts for improvement. The Birnbaum measure is defined
as follows: IB .A/ D P .TOPnA/ P .TOPnAN /; where A is the primary/basic event and TOP is the top event. • Criticality. The criticality importance measure of event A determines the probability that the top event, here assumed to have occurred, is due to the failure of component A: IC .A/ D IB .A/
P .A/ : P .TOP/
• Fussell–Vesely. Given that the system failed, the Fussell–Vesely measure determines the probability
8.8 Application 2 – FTA in a Waste to Energy System
295
Fig. 8.94 Distribution of NOx emission values. Year 2006 (25,091 half-hour observations)
Fig. 8.95 Half-hour values of NOx emissions (mg=Nm3 )
that component A contributed to this failure. In particular, it is the ratio of the probability of occurrence of any cut set containing event A and the probability of the top event. The Birnbaum importance measure considers only the conditional probability that event A is critical, while the criticality importance measure also takes into account the overall probability of the occurrence of the top event due to event A. According to this criticality analysis, the urea tank, electric equipment, and air secondary piping are the most critical parts.
8.8.9 Spare Parts Availability, What-If Analysis As illustrated in Fig. 8.96, the system unavailability for a period of time T equal to 365 h is 7:407 104 ; for a longer period of 1 year the availability of the
system, given the top event, is 0.9984, as reported in the second column of Table 8.25. This last value was obtained by the application of the Monte Carlo dynamic simulation with 10,000 repetitions, i. e., simulating the failures and repair events for 10,000 virtual production systems based on the same components/basic events parameterization. The point availability A.t/ at t D 8;760 h is about 0.9979, while the reliability is about 0.1735 for a mission period T (D t t0 ) equal to 1 year. Other significant results, reported in Table 8.25, are the ENF, the mean time to first failure, and the annual downtime, which amounts to 13:74 h=year. This system configuration is called “optimistic” because it does not consider the lead times required to supply spare parts, such as valves and pumps, in the case of failures and corrective maintenance actions. In other words, the MTTR is based on the optimistic hypothesis of assured availability of every generic spare part, i. e., a fulfillment lead time equal to zero or an infinite number of spare parts in storage.
8 Effects Analysis and Reliability Modeling of Complex Production Systems
Fig. 8.96 FTA. Top event: Exceeding NOx limit. Relex® Reliability software
296
8.8 Application 2 – FTA in a Waste to Energy System
297
Table 8.23 Components’ basic reliability parameters, T D 365 h (h1 )
(h1 )
=
T
1 exp.T /
1:25 105 6:23 105 6:23 105 2:49 105 6:23 105 3:11 105 3:11 105 3:11 105 3:11 105 1:56 105 1:56 105 1:56 105 1:56 105 1:04 104 1:04 104 1:04 104 1:04 104 1:04 104 1:04 104 6:23 105
5:00 101 1:25 101 3.00 3.00 5:0 101 1:35 101 1:35 101 1:35 101 1:35 101 3.00 3.00 3.00 3.00 8:62 101 8:62 101 8:62 101 8:62 101 8:62 101 8:62 101 6:67 101
2:49 105 4:98 104 2:08 105 8:30 105 1:25 104 2:30 104 2:30 104 2:30 104 2:30 104 5:19 106 5:19 106 5:19 106 5:19 106 1:20 104 1:20 104 1:20 104 1:20 104 1:20 104 1:20 104 9:34 105
4:55 103 2:27 102 2:27 102 9:09 102 2:27 102 1:14 102 1:14 102 1:14 102 1:14 102 5:68 103 5:68 103 5:68 103 5:68 103 3:79 102 3:79 102 3:79 102 3:79 102 3:79 102 3:79 102 2:27 102
4:54 103 2:25 102 2:25 102 8:69 102 2:25 102 1:13 102 1:13 102 1:13 102 1:13 102 5:67 103 5:67 103 5:67 103 5:67 103 3:72 102 3:72 102 3:72 102 3:72 102 3:72 102 3:72 102 2:25 102
Component AIR_fail VR1101 TT101 TT105 TKUREA CX51005 CX51006 CX51008 CX51009 DH51001 DH51002 DH51003 DH51004 INJ51101H INJ51102H INJ51103H INJ51101L INJ51102L INJ51103L e_ELECTRIC
Table 8.24 MCS evaluation, T D 365 h Minimal cut set i VR1101 AIR_fail TKUREA e_ELECTRIC TT101 TT105 CX51005 CX51006 CX51008 CX51009 DH51001 DH51002 DH51003 DH51004 INJ51101L INJ51101H INJ51101L INJ51102H INJ51101L INJ51103H INJ51102L INJ51101H INJ51102L INJ51102H INJ51102L INJ51103H INJ51103L INJ51101H INJ51103L INJ51102H INJ51103L INJ51103H
qCSi
WCSi 4
4:98 10 2:49 105 1:25 104 9:34 105 1:72 109 5:31 108 5:31 108 2:69 1011 2:69 1011 1:45 108 1:45 108 1:45 108 1:45 108 1:45 108 1:45 108 1:45 108 1:45 108 1:45 108
1 FCSi
FCSi 2
2:27 10 4:55 103 2:27 102 2:27 102 3:77 106 5:24 106 5:24 106 5:90 108 5:90 108 9:12 106 9:12 106 9:12 106 9:12 106 9:12 106 9:12 106 9:12 106 9:12 106 9:12 106
2
2:25 10 4:54 103 2:25 102 2:25 102 1:95 103 1:28 104 1:28 104 3:21 105 3:21 105 1:38 103 1:38 103 1:38 103 1:38 103 1:38 103 1:38 103 1:38 103 1:38 103 1:38 103
9:78 101 9:95 101 9:78 101 9:78 101 9:99 101 9:99 101 9:99 101 9:99 101 9:99 101 9:99 101 9:99 101 9:99 101 9:99 101 9:99 101 9:99 101 9:99 101 9:99 101 9:99 101
Table 8.25 also summarizes the predicted values of system reliability parameters for two different scenarios:
case of an exponential distribution of probability for ttr, and to about 29:21 h=year when ttr is constant, as reported in the last column of Table 8.25.
• Realistic operating scenario. The required supply lead time LTS is 2 weeks, corresponding to 10 working days or 15 operating days, or 360 h, for pumps and 1 day, or 24 h, for valves. The system downtime amounts to about 28:77 h=year in the
• Pessimistic operating scenario. Same hypotheses of the realistic scenario for pumps and valves, while for the other parts LTS is equal to 144 h, or 6 days. The system downtime amounts about to 203 h=year.
298
8 Effects Analysis and Reliability Modeling of Complex Production Systems
Fig. 8.97 Criticality analysis. Relex® Reliability software ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Up/Down State Operating Time Time Under Repair
VR1101
TKUREA
INJ51103L
e_ELECTRIC
Air_fail
System
0.000
1752.000
3504.000
5256.000
7008.000
8760.000
Time, (t)
Fig. 8.98 System up/down analysis, pessimistic configuration. Reliasoft® Reliability software
An exponential distribution of ttr random values is assumed and the MTTR for pumps is the value reported in Table 8.23 (MTTR D 1=) in the realistic scenario with 360 h in addition. A similar consideration applies for the MTTR defined for valves of S and for the other parts in case of a “pessimistic” scenario. Figure 8.98 shows the results of the up/down analysis obtained by Monte Carlo simulation applied to the
“pessimistic” system. Figure 8.99 presents the most critical components in terms of the number of failures in the same system configuration. The values obtained assuming the so-called realistic hypothesis agree with the results obtained by the analysis of the historical data of NOx emissions. The following equation can be applied in order to quantify the economic effects of externalities, in terms
8.8 Application 2 – FTA in a Waste to Energy System
299
ReliaSoft BlockSim 7 - www.ReliaSoft.com
Block Expected Failures 0.878
RS FCI 100%
50%
0.702 0% 5 Item(s)
0.527
0.351
0.176
0.000
INJ51103L
TKUREA
e_ELECTRIC
VR1101
Air_fail
Fig. 8.99 Expected failures, pessimistic configuration. Reliasoft® Reliability software
of euros per year, on the environment and on the community: MNOx D Q.CNOx ;failure CNOx ;function /tfailure ; where MNOx is the extra emission quantity of NOx (mg/year) in comparison with the correct function of the system,Q is the air flow, i. e., 24;860 Nm3 =h, CNOx ;failure is the NOx emission concentration in the case of failure, i. e., 212:4 mg=Nm3, CNOx ;function is the NOx emission concentration in the case of correct function, i. e., 133:7 mg=Nm3 , and tfailure is the annual downtime of the system, given the top event.
Table 8.25 reports the economic impact for the system configurations/parameterizations evaluated, assuming a unit cost of the NOx emission equal to US$ 6.81 per kilogram (2003 prices; see Table 8.20). The results demonstrate that the estimated extra cost of externalities, due to an incorrect function of the system, amounts about to US$ 180,000 per year assuming the optimistic hypothesis and the first what-if scenario configuration, and to ¤ 2,700,000 per year in case of the pessimistic, but not realistic, scenario. It is worth noting how important it is to conduct a quantitative analysis more accurately and as realisti-
Table 8.25 Reliability parameters prediction, multiscenario analysis Optimistic T (h) Mean availability (all events) Point availability (all events) at 8;760 h Reliability (8;760 h) Expected number of failures (failures) MTTFF (h) System uptime (h) System downtime (h) NOx (kg) NOx externality costs (2003 US$=year) MTTFF mean time to first failure
8,760 0.9984 0.9979 0.1735 1.74 5,013.38 8,746.26 13.74 26,882 183,066
Spare parts availability scenarios Realistic Pessimistic Realistic MTTR constant 8,760 0.9967 0.9962 0.1663 1.77 4,885.94 8,731.23 28.77 56,286 383,308
8,760 0.9768 0.976 0.139 1.94 4,451.88 8,556.93 203.07 397,311 2,705,687
8,760 0.9967 0.996 0.1704 1.76 4,933.15 8,730.79 29.21 57,149 389,185
300
8 Effects Analysis and Reliability Modeling of Complex Production Systems
cally as possible, and to manage spare parts. For this purpose it could be useful to repeat the FTA assuming more realistic probabilistic distributions of ttr and ttf random variables, e. g., introducing a Weibull parametric distribution. Chapter 11 will opportunely discuss basic and innovative models and methods to optimize the management of critical spare parts, in accordance with the adoption of different maintenance strategies and actions.
ing a planning period T D 8;760 h; the total amount of the annual cost saving, due to the introduction of a second redundant valve, for three scenarios is: 1. Optimistic configuration, Costextern.,annual .opt./ D Cost2 valves .opt./ Cost1 valves .opt./ D 126;675 183;066 D US$ 56;391 per year .30:8%/: 2. Realistic configuration,
8.8.10 System Modifications for ENF Reduction and Effects Analysis This section exemplifies the impacts on reliability and costs associated with some modifications to the SNCR plant and to the strategies/rules for the control of NOx emissions. In particular, they deal with the introduction of two alternative management policies for the critical valve VR1101. Similar considerations could of course be applied to other parts and components of the system.
Costextern.,annual .real./ D Cost2 valves .real./ Cost1 valves .real./ D 156;288 383;308 D US$ 227;020 per year .59:2%/: 3. Pessimistic configuration, Costextern.,annual .pess./ D Cost2 valves .pess./ Cost1 valves .pess./ D 2;486;237 2;705;687 D US$ 219;450 per year .8:1%/:
8.8.10.1 A Redundant Valve In the case of insertion of a new redundant valve in a parallel configuration, the fault tree changes. Figure 8.100 shows this new situation, given the top event, assuming T D 365 h and the optimistic configuration of the system. In Table 8.26 the performance of the system and the related externality costs are compared for different configurations/parameterizations, assum-
It is worth noting that the redundant valve brings very important benefits from an environmental and social point of view; moreover, this introduction is very profitable, considering an annual investment cost of about $6,000. Similar considerations can be made, considering different system alternative and/or simultaneous modifications, with reference to other externality costs, such as the emissions of CO2 , CH4 , PM10 , SO2 , CO, and N2 O.
Table 8.26 Valve redundancy introduction, what-if analysis 1 vs. 2 valves T (h) Mean availability (all events) Point availability (all events) at 8;760 h Reliability (8;760 h) Expected number of failures (failures) MTTFF (h) System uptime (h) System downtime (h) NOx (kg) NOx externality costs (2003 US$=year)
Realistic – 1 valve 8,760 0.9967 0.9962 0.1663 1.77 4,885.94 8,731.23 28.77 56,286 383,308
Spare parts availability scenarios Realistic – Optimistic – 2 valves 2 valves 8,760 0.9987 0.9989 0.2918 1.219 7,126.994 8,748.2699 11.7301 22,950 156,288
8,760 0.9989 0.9987 0.3037 1.1986 7,297.2985 8,750.4925 9.5075 18,601 126,675
Pessimistic – 2 valves 8,760 0.9787 0.9808 0.2351 1.3988 6,105.6822 8,573.3966 186.6034 365,086 2,486,237
8.9 Markov Analysis and Time-Dependent Components/Systems
301
Fig. 8.100 System modification: valves VR1101 and VR1102
8.9 Markov Analysis and Time-Dependent Components/Systems
and past states are independent: P fXnC1 D xnXn D xn ; : : : ; X1 D x1 g D P fXnC1 D xnXn D xn g: (8.21)
Markov modeling and analysis are very useful in the presence of dependences among basic/primary events in a fault tree, in particular with standby redundancies and common causes. A Markov chain is a discretetime stochastic process complying with the so-called Markov property: given the present state of a system/component, its future states are independent of its past states. Alternatively stated, the present state description fully captures all the information that can influence the future evolution of the process. Thus, given the present, the future is conditionally independent of the past. In particular, at the generic time instant the system may change its state from the current state to another state, or it may remain in the same state, according to a certain probability distribution. These changes of state are called “transitions,” and the probabilities associated with various state changes are termed “transition probabilities.” Formally given a sequence of random variables X1 ; X2 ; X3 ; : : : with the Markov property, the future
The state space of the chain is the set of possible values assumed by Xi . Markov chains are often described by a directed graph, where the edges are labeled by the probabilities of going from one state to the other states, as illustrated in Fig. 8.101. In other words, considering a generic system, Si .ti / identifies the state Si of the system at the instant of time ti and Eq. 8.21 changes as follows: P fSnC1 .tn C t/nSn .tn /; Sn1 .tn1 /; : : : ; S1 .t1 /g D P fSnC1 .tn C t/nSn .tn /g D Pn;nC1 ; (8.22) where Pn;nC1 represents the transition from state n to state n C 1. The generic Markov chain can be modeled by a set of differential equations, in accordance with the notation introduced in Fig. 8.101. Given a state i for the system and transitions tk and tj , respectively, from
302
8 Effects Analysis and Reliability Modeling of Complex Production Systems
Fig. 8.102 Vertex sections in the graph representation of a Markov chain
Fig. 8.101 Markov chain and differential equation model
state i to state k and from state j to state i , Pi .t C t/ D Pi .t/.1 tk t/ C Pj .t/tj t: (8.23) Equation 8.23 can be explained as follows: Pi .t C t/ Pi .t/ dPi .t/ D lim t !0 dt t D Pj .t/tj Pi .t/tk :
(8.24) Fig. 8.103 Markov chain for a parallel system and nonrepairable components
In general, dPi .t/ D dt
X
Pj .t/tj
j 2fstate IN i g
X
Pi .t/tk
k2fstate OUT i g
(8.25) when X
Pj .t/ D 1:
(8.26)
j 2fstate of the system Sg
8.9.1 Redundant Parallel Systems A significant example of the Markov chain theory is its application to the reliability prediction for a system made of two components, A and B, in a parallel configuration. For each component, consider the two states of function f0; 1g, representing the state of function or of failure, respectively; typical notation is reported schematically in Fig. 8.102. Figure 8.103 presents the Markov chain model, based on a vertex made of three sections as in Fig. 8.102, for a parallel system made of nonrepairable components (A D B D 0).
By the application of Eq. 8.25, 8 dP .t/ 1 ˆ D P1 .t/.A C B / ˆ ˆ ˆ dt ˆ ˆ ˆ ˆ dP2 .t/ ˆ ˆ D P1 .t/A P2 .t/B < dt ˆ dP3 .t/ ˆ ˆ D P1 .t/B P3 .t/A ˆ ˆ ˆ ˆ dt ˆ ˆ ˆ : dP4 .t/ D P .t/ C P .t/ ; 2 B 3 A dt
(8.27)
considering the following starting conditions: ( P1 .0/ D 1 Pj .0/ D 0 8j ¤ 1; where 1, 2, etc. refer to states S1 , S2 , etc. (see Fig. 8.103). By the application of the Laplace transform, Z1 F .s/ D LŒy.t/ D 0
est y.t/ dt
(8.28)
8.9 Markov Analysis and Time-Dependent Components/Systems
and the following property dy.t/ D sF .s/ y.t D 0C / L dt to Eq. 8.27,
(8.29)
.s/ D
LŒf .t/ D F .s/; 1 LŒ1 D ; s k LŒk D ; s 1 LŒt D 2 ; s 1 ; LŒ ekt D sCk L1 ŒF .s/ D f .t/:
.s a/P .s/ Q.s/
(8.34)
and a1 ; : : : ; an are nonmultiple roots of Q.s/ D 0. The roots obtained in Eq. 8.32 when Q.s/ D 0 are 8 a1 D .A C B / < .s a1 /P .s/ : .s D a1 / D D 1: Q.s/ As a consequence, 1 P .s/ P1 .t/ D L D .a1 / ea1 t D e.A CB /t : Q.s/ (8.35) Exactly the same result can be obtained by the integration of the first term in Eq. 8.27:
As a consequence, it is useful to derive from Eq. 8.26 the following equation: 1 p1 p2 p3 : s
(8.31)
From Eqs. 8.30 and 8.31 the values of pi .s/ are 8 P1 .s/ 1 ˆ ˆ p1 .s/ D D ˆ ˆ ˆ s C .A C B / Q1 .s/ ˆ ˆ ˆ ˆ 1 A A ˆ ˆ p2 .s/ D p1 D ˆ ˆ s C B s C B s C .A C B / ˆ ˆ ˆ ˆ ˆ P2 .s/ ˆ ˆ D ˆ ˆ Q2 .s/ ˆ ˆ ˆ ˆ B B 1 ˆ ˆ ˆ < p3 .s/ D s C p1 D s C s C . C / A
(8.33)
where (8.30)
Other general and useful analytical relationships and properties are
A
The inverse Laplace transform is then applied in accordance with the following property: P .s/ D .a1 / ea1 t C .a2 / ea2 t L1 Q.s/ C C .an / ean t ;
8 sp1 1 D p1 .A C B / ˆ ˆ ˆ < sp2 D p1 A p2 B ˆ sp 3 D p1 B p3 A ˆ ˆ : sp4 D p2 B C A p3 :
p4 D
303
A
B
P3 .s/ ˆ ˆ ˆ D ˆ ˆ Q 3 .s/ ˆ ˆ ˆ ˆ ˆ Œs C .A C B /.s C B /.s C A / ˆ ˆ ˆ ˆ s.s C B /.s C A / ˆ ˆ ˆ ˆ A s.s C A / B s.s C B / ˆ ˆ ˆ p4 .s/ D ˆ ˆ sŒs C .A C B /.s C B /.s C A / ˆ ˆ ˆ ˆ P4 .s/ ˆ ˆ D : : (8.32) Q4 .s/
8 dP1 .t/ ˆ D .1 C 2 / dt ˆ ˆ ˆ P1 .t/ ˆ ˆ ˆ ˆ ˆ P ˆ Z1 .t / Zt ˆ < dP1 .t/ D .1 C 2 / dt P1 .t/ ˆ ˆ 0 P1 .0/ ˆ ˆ ˆ ˆ ˆ ˆ lnŒP1 .t/ D .1 C 2 /t ˆ ˆ ˆ : P1 .t/ D e.1 C2 /t :
(8.36)
This result is the well-known expression of the reliability of a serial system made of unrepairable components as illustrated in Sect. 6.4. In fact, in state 1 components A and B have to be in a state of function.
304
8 Effects Analysis and Reliability Modeling of Complex Production Systems
Similarly, we have the expression for P2 .t/: 8 a1 ˆ ˆ ˆ ˆ ˆ ˆ a2 ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ .s D a1 / ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ < .s D a2 / ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ P2 .t/ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ :
D .A C B / D B D
.s a1 /P2 .s/ Q2 .s/
D Œs C .A C B /
A 1 s C B s C .A C B /
ˇ A ˇˇ D D 1 s C B ˇsD.A CB / D
.s a2 /P2 .s/ Q2 .s/
Fig. 8.104 Markov chain for a parallel system and repairable components
1 A s C B s C .A C B / ˇ ˇ A ˇ D1 D s C . C / ˇ
D .s C B /
A
D L1
P2 .s/ Q2 .s/
B
is the determination of P1 :
sDB
D .a1 / ea1 t C .a2 / ea2 t D eB t e.A CB /t : (8.37)
In the same way we obtain P3 .t/ and P4 .t/: P3 .t/ D eA t e.A CB /t ; P4 .t/ D 1 eA t eB t C e.A CB /t : (8.38) By the calculus of 1P4 .t/ it is possible to evaluate the reliability of a parallel redundant system as illustrated in Sect. 6.5.
8.9.2 Parallel System with Repairable Components This section applies the Markov chain modeling to the analysis of a parallel system made up of repairable components, as illustrated in Fig. 8.104. In particular, it is assumed that it is not possible to return to state S2 or S3 , starting from S4 . The main aim of this analysis
8 dP1 .t/ ˆ ˆ D A P2 .t/ C B P3 .t/ .A C B /P1 .t/ ˆ ˆ ˆ dt ˆ ˆ ˆ ˆ ˆ dP2 .t/ ˆ ˆ D P1 .t/A .B C A /P2 .t/ ˆ ˆ dt ˆ ˆ ˆ ˆ ˆ ˆ < dP3 .t/ D P1 .t/B .A C B /P3 .t/ dt ˆ ˆ ˆ ˆ ˆ dP4 .t/ ˆ ˆ D P3 .t/A C P2 .t/B ˆ ˆ ˆ dt ˆ ˆ ˆ ˆ ˆ ˆ P1 .0/ D 1 ˆ ˆ ˆ ˆ : Pj .0/ D 0; j ¤ 1: (8.39) Applying Laplace transforms, 8 sp1 1 D A p2 C B p3 .A C B /p1 ˆ ˆ ˆ ˆ ˆ ˆ sp2 D A p1 .B C A /p2 < ˆ ˆ ˆ ˆ ˆ ˆ :
sp3 D B p1 .A C B /p3 p1 C p2 C p3 C p4 D
(8.40)
1 : s
As a consequence, p1 D
1 s C A C B
A A sCB CA
B B sCA CB
: (8.41)
8.9 Markov Analysis and Time-Dependent Components/Systems
305
State S 1 - P1(t) 1 0.9 0.8
Probability
0.7 0.6 0.5 P1(t)
0.4 0.3 0.2 0.1 0 0
50000
100000
150000
200000
250000
300000
Unit of me
Fig. 8.105 Probability of the event “system in state S1 ”
Applying the inverse Laplace transform in the special case A D B D and A D B D , P1 .t/ D L1 Œp1 .s/ p
1 2 C 12 C 12 2 C 6 C 2 p
exp 12 t.3 C 2 +6 C 2 / D p 2 C 6 C 2
1p 2 C 6 C 2 C 12 . / 2 p
exp 12 t. 2 C 6 C 2 C 3 C / C p : 2 C 6 C 2 (8.42) Similarly, it is possible to quantify P2 .t/ and P3 .t/. Figure 8.105 presents the probability that the system is in state 1. In the case of repairable component A and/or component B and in the state of failure of both (see state S4 in Fig. 8.106), it could be useful to quantify the unavailability of the system, which is equal to the probability P4 .t/, i. e., the availability: A.t/ D 1 P4 .t/
(8.43)
Fig. 8.106 Markov chain for a parallel system and repairable components
The differential equation related to state S4 is dP4 .t/ D P3 .t/A C P2 .t/B .A C B /P4 .t/: dt (8.44)
306
8 Effects Analysis and Reliability Modeling of Complex Production Systems
8.9.3 Standby Parallel Systems In this section different examples regarding repairable systems are illustrated in accordance with the new notation reported in Fig. 8.107. Figure 8.108 represents the Markov chain model of the standby parallel system when the generic component, in the standby state, is not subject to failures. This is the so-called cold standby parallel system. Similarly, Fig. 8.109 presents the Markov chain model of the system when the generic standby component C can fail, with failure rate 0C , during the “waiting time”: this is a “warm standby” parallel system.
8.9.3.1 Cold Standby
In the case of A D B D and A D B D , 8 dP1 .t/ ˆ ˆ D P3 .t/ P1 .t/ ˆ ˆ dt ˆ ˆ ˆ ˆ ˆ dP2 .t/ ˆ ˆ D P4 .t/ P2 .t/ ˆ ˆ ˆ dt ˆ ˆ ˆ ˆ ˆ dP3 .t/ ˆ ˆ D P2 .t/ C P5 .t/ . C /P3 .t/ ˆ ˆ ˆ dt ˆ < dP4 .t/ D P1 .t/ C P5 .t/ . C /P4 .t/ ˆ ˆ dt ˆ ˆ ˆ ˆ ˆ ˆ dP5 .t/ ˆ ˆ D ŒP3 .t/ C P4 .t/ 2P5 .t/ ˆ ˆ dt ˆ ˆ ˆ ˆ ˆ ˆ P1 .t/ C P2 .t/ C P3 .t/ C P4 .t/ C P5 .t/ D 1 ˆ ˆ ˆ ˆ ˆ P1 .0/ D 1 ˆ ˆ : (8.46) Pj .0/ D 0; j ¤ 1:
In the cold standby parallel system (Fig. 8.108), 8 dP1 .t/ ˆ ˆ D B P3 .t/ P1 .t/A ˆ ˆ dt ˆ ˆ ˆ ˆ ˆ dP2 .t/ ˆ ˆ D A P4 .t/ P2 .t/B ˆ ˆ ˆ dt ˆ ˆ ˆ ˆ ˆ dP3 .t/ ˆ ˆ D B P2 .t/ C A P5 .t/ .A C B /P3 .t/ ˆ ˆ ˆ dt ˆ < dP4 .t/ D A P1 .t/ C B P5 .t/ .B C A /P4 .t/ ˆ ˆ dt ˆ ˆ ˆ ˆ ˆ ˆ dP5 .t/ ˆ ˆ D A P3 .t/ C B P4 .t/ .A C B /P5 .t/ ˆ ˆ dt ˆ ˆ ˆ ˆ ˆ ˆ P .t/ C P2 .t/ C P3 .t/ C P4 .t/ C P5 .t/ D 1 ˆ ˆ 1 ˆ ˆ ˆ ˆ P1 .0/ D 1 ˆ : Pj .0/ D 0; j ¤ 1: (8.45)
Fig. 8.107 Vertex sections in the graph representation of a Markov chain
Fig. 8.108 Markov chain for a parallel cold standby system
Fig. 8.109 Markov chain for a parallel warm standby system
8.9 Markov Analysis and Time-Dependent Components/Systems
It is now possible to define three new states for the system as follows: 8 ˆ < P0 .t/ D P1 .t/ C P2 .t/ (8.47) PI .t/ D P3 .t/ C P4 .t/ ˆ : PII .t/ D P5 .t/: Then, 8 dP0 .t/ dP1 .t/ dP2 .t/ ˆ ˆ D C ˆ ˆ ˆ dt dt dt ˆ ˆ ˆ ˆ D ŒP3 .t/ C P4 .t/ ŒP1 .t/ C P2 .t/ ˆ ˆ ˆ ˆ ˆ ˆ D PI .t/ P0 .t/ ˆ ˆ ˆ ˆ ˆ ˆ dP3 .t/ dP4 .t/ dPI .t/ ˆ ˆ ˆ < dt D dt C dt D P0 .t/ C 2PII .t/ . C /PI .t/ ˆ ˆ ˆ ˆ ˆ ˆ dP5 .t/ dPII .t/ ˆ ˆ D D PI .t/ 2PII .t/ ˆ ˆ dt dt ˆ ˆ ˆ ˆ ˆ ˆ P0 .t/ C PI .t/ C PII .t/ D 1 ˆ ˆ ˆ ˆ ˆ PI .0/ D 1 ˆ ˆ ˆ : (8.48) Pj .0/ D 0; j ¤ I:
307
following: PII .t/ D p p . C 4/ . C 4/
p exp 12 t.3 C 2/ cosh 12 t . C 4/ 2 p .2 C 22 C / . C 4/
1 p
1 2 exp 2 t.3 C 2/ sinh 2 t . C 4/ 3 p .2 C 22 C / . C 4/
1
1 p 3 exp 2 t.3 C 2/ sinh 2 t . C 4/ p ; 2 .2 C 22 C / . C 4/ (8.51) while the state of function is 1 PII .t/:
(8.52)
Figure 8.110 presents the trend of the probability PII .t/ assuming D 104 (unit of time)1 and D 103 (unit of time)1 .
8.9.3.2 Warm Standby By the application of Laplace transforms, 8 sp0 1 D pI p0 ˆ ˆ ˆ < spI D p0 pI . C / C 2pII ˆ spII D pI 2pII ˆ ˆ : p0 C pI C pII D 1=s:
(8.49)
As a consequence, 8 .s 2 C 3s C s C 22 / ˆ ˆ ˆ p0 D ˆ ˆ s.s 2 C 2s C 3s C 2 C 2 C 22 / ˆ ˆ ˆ < .s C 2/ pI D 3 2 ˆ s C 2s C 3s 2 C 2s C 2 s C 22 s ˆ ˆ ˆ ˆ ˆ 2 ˆ ˆ : pII D : s.s 2 C 2s C 3s C 2 C 2 C 22 / (8.50) It is possible to quantify the probability of the system being in states S0 , SI , and SII , by the application of the inverse Laplace transform to p0 , pI , and pII . In particular, the state of not function is quantified by the
In the warm standby parallel system (Fig. 8.109), 8 dP1 .t/ ˆ ˆ D B P3 .t/ P1 .t/.A C 0B / ˆ ˆ dt ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ dP2 .t/ D A P4 .t/ P2 .t/.B C 0 / ˆ A ˆ ˆ ˆ dt ˆ ˆ ˆ ˆ ˆ dP3 .t/ ˆ ˆ D B P2 .t/ C 0B P1 .t/ C A P5 .t/ ˆ ˆ ˆ dt ˆ ˆ ˆ .A C B /P3 .t/ ˆ < dP4 .t/ D A P1 .t/ C 0B P2 .t/ C B P5 .t/ ˆ ˆ dt ˆ ˆ ˆ .B C A /P4 .t/ ˆ ˆ ˆ ˆ dP .t/ ˆ 5 ˆ ˆ D A P3 .t/ C B P4 .t/ .A C B /P5 .t/ ˆ ˆ dt ˆ ˆ ˆ ˆ ˆ ˆ ˆ P1 .t/ C P2 .t/ C P3 .t/ C P4 .t/ C P5 .t/ D 1 ˆ ˆ ˆ ˆ ˆ ˆ P1 .0/ D 1 ˆ ˆ ˆ : Pj .0/ D 0; j ¤ 1: (8.53)
308
8 Effects Analysis and Reliability Modeling of Complex Production Systems
State II - PII(t) 5.00E-03 4.50E-03 4.00E-03
Probability
3.50E-03 3.00E-03 2.50E-03 PII(t) 2.00E-03 1.50E-03 1.00E-03 5.00E-04 0.00E+00 0
2000
4000
6000
8000
10000
12000
14000
Unit of me
Fig. 8.110 Failure probability of the standby system (“system in state II”)
In the case of A D B D , A D B D , and 0A D 0B D 0 , 8 dP1 .t/ ˆ ˆ D B P3 .t/ P1 .t/.A C 0B / ˆ ˆ ˆ dt ˆ ˆ ˆ D P3 .t/ P1 .t/. C 0 / ˆ ˆ ˆ ˆ ˆ dP2 .t/ ˆ ˆ D A P4 .t/ P2 .t/.B C 0A / ˆ ˆ ˆ dt ˆ ˆ ˆ D P4 .t/ P2 .t/. C 0 / ˆ ˆ ˆ ˆ ˆ dP3 .t/ ˆ ˆ D B P2 .t/ C 0B P1 .t/ C A P5 .t/ ˆ ˆ ˆ dt ˆ ˆ .A C B /P3 .t/ ˆ ˆ ˆ ˆ ˆ ˆ D P2 .t/ C 0 P1 .t/ C P5 .t/ ˆ ˆ ˆ ˆ ˆ ˆ . C /P3 .t/ < dP4 .t/ ˆ D A P1 .t/ C 0B P2 .t/ C B P5 .t/ ˆ ˆ dt ˆ ˆ ˆ .B C A /P4 .t/ ˆ ˆ ˆ ˆ ˆ ˆ D P1 .t/ C 0 P2 .t/ C P5 .t/ ˆ ˆ ˆ ˆ ˆ ˆ . C /P4 .t/ ˆ ˆ ˆ ˆ ˆ dP5 .t/ ˆ ˆ D A P3 .t/ C B P4 .t/ .A C B /P5 .t/ ˆ ˆ dt ˆ ˆ ˆ ˆ D P3 .t/ C P4 .t/ 2P5 .t/ ˆ ˆ ˆ ˆ ˆ P1 .t/ C P2 .t/ C P3 .t/ C P4 .t/ C P5 .t/ D 1 ˆ ˆ ˆ ˆ ˆ ˆ P .0/ D 1 ˆ ˆ 1 ˆ : (8.54) Pj .0/ D 0; j ¤ 1:
In particular, it is possible to define three new states for the system as follows: 8 ˆ < P0 .t/ D P1 .t/ C P2 .t/ (8.55) PI .t/ D P3 .t/ C P4 .t/ ˆ : PII .t/ D P5 .t/: The unavailability of the system is quantified by PII .t/: 8 dP0 .t/ ˆ ˆ D PI .t/ P0 .t/. C 0 / ˆ ˆ dt ˆ ˆ ˆ ˆ ˆ dPI .t/ ˆ ˆ D . C 0 /P0 .t/ . C /PI .t/ ˆ ˆ ˆ dt ˆ < C2PII .t/ (8.56) .t/ dP II ˆ ˆ D PI .t/ 2PII .t/ ˆ ˆ dt ˆ ˆ ˆ ˆ P0 .t/ C PI .t/ C PII .t/ D 1 ˆ ˆ ˆ ˆ ˆ ˆ P0 .0/ D 1 ˆ : Pj .0/ D 0; j ¤ 0: By the application of Laplace transforms, 8 sp0 1 D pI p0 . C 0 / ˆ ˆ ˆ < spI D p0 . C 0 / pI . C / C 2pII ˆ spII D pI 2pII ˆ ˆ : (8.57) p0 C pI C pII D 1=s:
8.10 Common Mode Failures and Common Causes
309
Then, pI D
. C 0 /.s C 2/ : s 3 C 3s 2 C 2s 2 C 2s C 0 s 2 C20 s C s2 C s0 C 22 s
(8.58)
The probability of the system being in a state of function, but with a component under repair, can be quantified by the application of the inverse Laplace transform as follows: p . C 0 /. 02 20 C 2 C 4
PI .t/ D
2 C 2 C 0 C 0 /
1 0 exp 2 t. C 3 C 2 p 02 20 C 2 C 4/ .22 C 20 C 2 C 2 C 0 / p Œ 02 20 C 2 C 4 C 2. C 0 / Œ.22 C 20 C 2 C 2 C 0 / . C 0 /.2 C 2 C 0 C 0 p C 02 20 C 2 C 4/ p
exp 12 t. 02 20 C 2 C 4 C 0 C 3 C 2/
In particular, for the previously introduced differential equations, 8 dP0 .t/ ˆ ˆ D 0 D PI .t/ P0 .t/. C 0 / ˆ ˆ dt ˆ ˆ ˆ ˆ ˆ ˆ ˆ dPI .t/ D 0 D . C 0 /P0 .t/ . C /PI .t/ < dt C2PII .t/ ˆ ˆ ˆ .t/ dP ˆ II ˆ ˆ D 0 D PI .t/ 2PII .t/ ˆ ˆ dt ˆ ˆ ˆ : P0 .t/ C PI .t/ C PII .t/ D 1; (8.62) i. e., 22 C 2. C 0 / C . C 0 / Œ. C 0 /2 22 C 2. C 0 / C . C 0 / PI .t ! 1/ D 2 2 . C 0 / 2 2 C 2. C 0 / C . C 0 / : PII .t ! 1/ D . C 0 / (8.63) P0 .t ! 1/ D 22
These results are true in the case of asymptotic values of availability and unavailability. The applications of : the Markov chain modeling and analysis illustrated so far are a few examples of the power and effectiveness .22 C 20 C 2 C 2 C 0 / p 0 Œ 02 20 C 2 C 4 C 2. C / of this set of tools. Other advanced applications are presented in the literature and are not subject of this Œ.22 C 20 C 2 C 2 C 0 / book. (8.59) Similarly, it is possible to quantify PII .t/ and P0 .t/. It could be useful to quantify, for each state j of the system, the probability Pj .t/ in the case of stationary conditions, i. e., dPj .t/ D 0: dt
(8.60)
As a consequence, the generic condition explained by the Eq. 8.25 becomes dPi .t/ D dt
X j 2fstate IN j g
D 0;
Pj .t/tj
X
Pi .t/tk
k2fstate OUT j g
(8.61)
8.10 Common Mode Failures and Common Causes The assumption of independency of failures among different components within a production system is sometimes violated. Some components can share the same power source or external environmental conditions. This is the reason why in FTA, given a top event, it is possible to identify several identical basic events, and mirror events were properly introduced in the numerical examples illustrated in Sects. 8.5 and 8.7. How should we consider a MCS with two or more different basic components subject to common mode failures (also called “common causes”)?
310
8 Effects Analysis and Reliability Modeling of Complex Production Systems
μ1 0,0
For this purpose consider a MCS made of two components A and B, modeled as .i ; i / with i D 1; : : : ; n and n D 2, subject to a common cause modeled as .c; b/ and the following events:
1,0
λ1 μ2
λ2
C
μ2
C
1. There are no common cause events in .0; t/. 2. The last common cause event occurs in .u du; u, where u 2 .0; t/.
λ2
The MCS can be considered as a redundant parallel system of components A and B. As a consequence, the unavailability of the system is the result of two different contributions:
μ1 C
0,1
1,1
1. Hypothesis I. The system unavailability is the result of the application of Eq. 8.12 when components A and B are supposed to be in a state of function for t D 0, i. e., .0; 0/t D0. The probability of components A and B being in a state of failure in t is
λ1 Fig. 8.111 Markov chain, common cause and a 2-dim MCS
A common cause can be modeled as a repairable event based on constant failure and repair rates. In particular, if we call them c and b, respectively; the density function for the common cause event wc_cause .t/ by the application of Eq. 5.952 is wc_cause .t/ D
2 Y i D1
i .1 e.i Ci /t /: i C i
Equation 8.66 does not consider the event “no common cause in Œ0; t.” The probability of no common causes in the system during Œ0; t is quantified by the basic equation (Eq. 5.27) as follows:
2
c cb C expŒ.c C b/t: cCb cCb (8.64)
ect :
The asymptotic value is wc_cause .1/ D lim Œwc_cause .t/ D t !1
cb c Cb
Š if 1=cŠ1=b
(8.65) QI;S .t/ D ect
If a MCS is made up of two or more basic events subject to common causes, Eq. 8.12 cannot be applied. For example, in presence of a cut set made up of two repairable components subject to a common cause of rates .; / D .c; b/ it is possible to introduce the Markov chain as in Fig. 8.111.
i .1 e.i Ci /t /: i C i (8.68)
2. Hypothesis II. Assuming configuration .1; 1/u , Eq. 5.82 can be applied3 to quantify the probability of components A and B remaining in .1; 1/t : 2 Y i D1
See also Table 5.6.
2 Y i D1
8.10.1 Unavailability of a System Subject to Common Causes
2
(8.67)
As a consequence, the system unavailability assuming hypothesis I is
c:
The object of this section is to present an analytical model for the determination of the unavailability of a system with two or more components subject to common causes.
(8.66)
i i .i Ci /.t u/ : C e i C i i C i (8.69)
Consequently, Eq. 8.69 differs from Eq. 5.82 because of the swapping of terms and . 3
A new failure event is introduced: the failure rate is and the repair rate is and Eq. 5.82 is applied.
8.10 Common Mode Failures and Common Causes
311
The probability of a common cause event occurring in .u du; u is wc_cause .u/ du
Š
Eq. 8.65
c du:
i D1
ec.t u/ :
QII;S .t/ D
c ec.t u/
i D1
0
C
i i C i
i e.i Ci /.t u/ du: i C i (8.72)
As a consequence, the system unavailability, i. e., the probability of components A and B being in a state of failure in t is
(8.75) This is the result of the application of Eqs. 5.83 and 8.12.
8.10.2 Numerical Example, Dependent Event Consider the application illustrated in Sect. 8.6.1 and the hypothesis that there is a common cause between the basic components/events A and B. Then the value of c is supposed to be 0:2 year1 (five events per year). By the application of the Eq. 8.74 for T D 8;000 h, when the system operates 365 days per year and 24 h per day,4 the unavailability is QS .8;000/ D ect
D ect
i D1
Zt C
ce
i .1 e.i Ci /t / i C i cs
Y 2 i D1
C
n Y i D1
Zt C
ce 0
Y n i D1
C
i i .i Ci /s ds C e i C i i C i
2 105 2 105 C 102 105 5 2 .1 e.210 C10 /8;000 / 5 10 C 5 102 5 8;000
i i C i
5 C5102 /8;000
.1 e.10 Zt C 0
/
5 s
2:28 105 e2:2810
2 105 102 C 2 105 C 102 2 105 C 102 5 2 . e.210 C10 /s / 5 102 105 C 105 C 5 102 105 C 5 102 5 2 . e.10 C510 /s / ds
i .1 e.i Ci /t / i C i cs
i D1
i .1 e.i Ci /t / i C i
D e2:2810
In general, for a MCS made of n components subject to a common cause, QS .t/ D ect
ce
cs
Y n
0
i .i Ci /s ds: C e i C i (8.73)
0
n Y i D1
Zt
QS .t/ D QI;S .t/ C QII;S .t/ 2 Y
i .1 e.i Ci /t /: i C i
(8.71)
By Eqs. 8.69–8.71 the probability of components A and B remaining in the state of failure .1; 1/ in t as in .1; 1/u , because it is subject to a common cause between .u du; u, is Y 2
n Y
QS .t; c D 0/ D
(8.70)
Hypothesis II is based on the assumption that the last common cause occurs in .u du; u. In particular, the probability that the system stays in .1; 1/ during the period Œu; t can be quantified similarly to Eq. 8.67:
Zt
If c D 0,
i i C i
i e.i Ci /s ds: i C i (8.74) 4
c D 0:2 year1 D 0:2=.24 365/ h1 2:28 105 .
312
8 Effects Analysis and Reliability Modeling of Complex Production Systems
D 3:9999 107 C 2:2850 105 8;000 Z
5 C2105 C102 C105 C5102 /s
e.2:2810
ds
0
D 3:9999 107 C 2:2850 105
1 6 102 .6102 /8;000 .1 e /
D 3:9999 107 C 2:2850 105
8;000 Z
0
2 s
e610
ds
D 3:81 104 :
(8.76)
This value differs from qAB quantified in Sect. 8.6.1 and also influences the system availability in T D 8;000 h.
9
Basic Models and Methods for Maintenance of Production Systems
Contents 9.1
Introduction to Analytical Models for Maintenance of Production Systems . . . . . . . . . 314 9.1.1 Inspection Versus Monitoring . . . . . . . . . . . . . 315
9.2
Maintenance Strategies . . . . . . . . . . . . . . . . . . . . . . . 315
9.3
Introduction to Preventive Maintenance Models . 318
9.4
Component Replacement . . . . . . . . . . . . . . . . . . . . . . 319 9.4.1 Time-Related Terms and Life Cycle Management . . . . . . . . . . . . . . 319 9.4.2 Numerical Example. Preventive Replacement and Cost Minimization . . . . . . . 320
9.5
9.6
9.7
Time-Based Preventive Replacement – Type I Replacement Model . . . . . . . . . . . . . . . . . . . . . . . . . . 9.5.1 Numerical Example. Type I Replacement Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9.5.2 Numerical Example. Type I Model and Exponential Distribution of ttf . . . . . . . . . 9.5.3 Type I Replacement Model for Weibull distribution of ttf . . . . . . . . . . . . . . . . . . . . . . . . 9.5.4 The Golden Section Search Method . . . . . . . . 9.5.5 Numerical Example. Type I Model and the Golden Section Method . . . . . . . . . . . Time-Based Preventive Replacement Including Duration of Replacements . . . . . . . . . . . . 9.6.1 Numerical Example 1: Type I Replacement Model Including Durations Tp and Tf . . . . . . 9.6.2 Type I Model with Duration of Replacement for Weibull Distribution of ttf . 9.6.3 Numerical Example 2: Type I Model with Durations Tp and Tf . . . . . . . . . . . . . . . . . 9.6.4 Practical Shortcut to t p Determination . . . . . . Block Replacement Strategy – Type II . . . . . . . . . . 9.7.1 Renewal Process . . . . . . . . . . . . . . . . . . . . . . . . 9.7.2 Laplace Transformation: W(t) and w(t) . . . . . 9.7.3 Renewal Process and W(t) Determination, Numerical Example . . . . . . . . . . . . . . . . . . . . . 9.7.4 Numerical Example, Type II Model . . . . . . . .
323 324 325 326 326 328 333 333 335 335 335 339 340 341 341 343
9.7.5 Discrete Approach to W(t) . . . . . . . . . . . . . . . . 348 9.7.6 Numerical Examples . . . . . . . . . . . . . . . . . . . . 349 9.7.7 Practical Shortcut to W(t) and t p Determination . . . . . . . . . . . . . . . . . . . . . . . . . . 352 9.8
Maintenance Performance Measurement in Preventive Maintenance . . . . . . . . . . . . . . . . . . . . 353 9.8.1 Numerical Example . . . . . . . . . . . . . . . . . . . . . 354
9.9
Minimum Total Downtime . . . . . . . . . . . . . . . . . . . . 355 9.9.1 Type I – Minimum Downtime . . . . . . . . . . . . . 355 9.9.2 Type II – Downtime Minimization . . . . . . . . . 357
9.10 Group Replacement: The Lamp Replacement Problem . . . . . . . . . . . . . . 358 9.11 Preventive Maintenance Policies for Repairable Systems . . . . . . . . . . . . . . . . . . . . . . . 359 9.11.1 Type I Policy for Repairable Systems . . . . . . . . . . . . . . . . . . 360 9.11.2 Type II Policy for Repairable Systems . . . . . . 370 9.12 Replacement of Capital Equipment . . . . . . . . . . . . 372 9.12.1 Minimization of Total Cost . . . . . . . . . . . . . . . 372 9.12.2 Numerical Example . . . . . . . . . . . . . . . . . . . . . 372 9.13 Literature Discussion on Preventive Maintenance Strategies . . . . . . . . . . 372 9.14 Inspection Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373 9.15 Single Machine Inspection Model Based on a Constant Value of Conditional Probability Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375 9.15.1 Numerical Example 1, Elementary Inspection Model . . . . . . . . . . . . . 376 9.15.2 Numerical Example 2, Elementary Inspection Model . . . . . . . . . . . . . 377 9.16 Inspection Frequency Determination and Profit per Unit Time Maximization . . . . . . . . . 378 9.17 Inspection Frequency Determination and Downtime Minimization . . . . . . . . . . . . . . . . . . 380
R. Manzini, A. Regattieri, H. Pham, E. Ferrari, Maintenance for Industrial Systems © Springer 2010
313
314
9 Basic Models and Methods for Maintenance of Production Systems
9.18 Inspection Cycle Determination and Profit per Unit Time Maximization . . . . . . . . . 9.18.1 Exponential Distribution of ttf . . . . . . . . . . . . 9.18.2 Weibull Distribution of ttf . . . . . . . . . . . . . . . . 9.18.3 Numerical Example . . . . . . . . . . . . . . . . . . . . .
381 381 382 382
9.19 Single Machine Inspection Model Based on Total Cost per Unit Time Minimization . . . . . . 383 9.20 Single Machine Inspection Model Based on Minimal Repair and Cost Minimization . . . . . . 384 9.21 Inspection Model Based on Expected Availability per Unit Time Maximization . . . . . . . . 385 9.22 Group of Machines Inspection Model . . . . . . . . . . . 386 9.23 A Note on Inspection Strategies . . . . . . . . . . . . . . . . 387 9.24 Imperfect Maintenance . . . . . . . . . . . . . . . . . . . . . . . 388 9.24.1 Imperfect Preventive Maintenance p – q . . . . 388 9.25 Maintenance-Free Operating Period . . . . . . . . . . . . 390 9.25.1 Numerical Example (Kumar et al. 1999) . . . . 391 9.25.2 MFOPS and Weibull Distribution of ttf . . . . . 392 9.26 Opportunistic Maintenance Strategy . . . . . . . . . . . 393
“There is a proverb in this country which says prevention is better than cure,” interrupted Mr Vladimir, throwing himself into the arm-chair. (The Secret Agent by Joseph Conrad)
The European standard EN 13306 (Maintenance terminology) distinguishes two main types of maintenance, called “maintenance strategies,” as: • “Preventive maintenance ... carried out at predetermined intervals or according to prescribed criteria and intended to reduce the probability of failure or the degradation of the functioning of an item”; • “Corrective maintenance ... carried out after fault recognition and intended to put an item into a state in which it can perform a required function”. One of the most critical decisions for the analyst, i. e., the maintenance manager, is the determination of the items subject to preventive maintenance, then the time schedule or the number of units of use suitable for performing the maintenance actions. A famous proverb, also used a lot in television spots of a well-known toothpaste, is “prevention is better than cure,” known as “prevenire è meglio che curare” by the Italian authors of this book. This is the Hamlet-like
maintenance issue against the “outrageous fortune,” as William Shakespeare calls stochastic processes: “is it better to prevent or wait and see?: that is the question”. This is the question of strategy in maintenance management and this chapter introduces models, methods, and significant applications to support the choice of the best reply and reaction to it. Other very important questions deal with the identification of the production system’s performance and parameters subject to monitoring and inspection activities, monitoring or inspection?, deferred or immediate maintenance?, on-line or off-line maintenance?, on-site or off-site and/or remote maintenance? replacement or overhaul or rebuilding?
9.1 Introduction to Analytical Models for Maintenance of Production Systems Chapter 4 defined maintenance management as the set of “activities of the management that determine the maintenance objectives, strategies, and responsibilities and implement them by means such as maintenance planning, maintenance control and supervision, improvement of methods in the organization including economical aspects” (European standard EN 13306). This chapter aims to classify and illustrate the most significant maintenance strategies proposed in the literature and applied to production systems. The largest number of automotive companies suggest their customers, and sometimes force them, to plan a preventive maintenance action (sometimes called “voucher”) in accordance with an established time schedule (e. g., 1 year) and/or an established number of units of use (e. g., 20;000 km). Nevertheless, the car could be subject to unexpected downtimes and require corrective maintenance, or a “compliance test”, i. e., a test used to show whether or not a property of an item complies with the stated specifications, or a “function checkout,” etc. What is the best number of time units or units of use to schedule a preventive action? Finally it is necessary to remember that maintenance excellence is the result of maintenance decisions with technical, economic, and organizational implications. In particular, a very critical issue, not the subject of this book, is summarized by the following ques-
9.2 Maintenance Strategies
tion: What are the best procedures and resources cited in the definition of maintainability as “The ability of an item under given conditions of use, to be retained in, or restore to, a state in which it can perform a required function, when maintenance is performed under given conditions and using stated procedures and resources” (EN 13306:2001 Maintenance terminology)? An effective reply to this question differs from business to business, company to company, department to department of the same company, production system to production system of the same department, component to component of the same production system, etc. For this reason this is not the subject of this book but the analyst, which could be the reader of this book, has to be conscious of its existence and criticality. In order to introduce the reader to the most significant maintenance strategies, it is useful to cite the standard EN 13306:2001, which identifies two main kinds of strategies – preventive and corrective – whose definitions are reported at the beginning of this chapter. How many strategies exist? We think that an answer to this question does not exist, because it is possible to identify different conceptual frameworks useful for classifying strategies and actions in maintenance management. For this purpose we choose to illustrate the classification proposed by the European standards and specifications (see Fig. 9.1) and another framework proposed by the authors, inspired by the literature and introduced in Sect. 9.2. In particular the proposed framework is coherent with the models and methods illustrated and applied in this chapter. We now give a few definitions from EN 13306 to properly illustrate the framework reported in Fig. 9.1: • Condition based maintenance ... Preventive maintenance based on performance and/or parameter monitoring and the subsequent actions. ... monitoring may be scheduled, on request or continuous. • Predetermined maintenance ... Preventive maintenance carried out in accordance with established intervals of time or number of units of use but without previous condition investigation. • Deferred maintenance ... Corrective maintenance which is not immediately carried out after a fault detection but is delayed in accordance with given maintenance rules. • Immediate maintenance ... is carried out without delay after a fault has been detected to avoid unacceptable consequences.
315
9.1.1 Inspection Versus Monitoring The framework illustrated in Fig. 9.1 classifies the most important strategies which operatively are maintenance activities. The activities classified by the EN 13306 are inspection, monitoring, compliance test, function checkout, routine maintenance (e. g., cleaning, lubrication), overhaul, rebuilding, repair, fault diagnosis (the well-known troubleshooting), fault localization, improvement, and modification (i. e., change the function of an item). In particular EN 13306 also helps us to identify the most important differences between inspection and monitoring activities. Inspection is defined as “Check for conformity by measuring, observing, testing or gauging the relevant characteristics of an item. ... inspection can be carried out before, during or after other maintenance activity”; monitoring is defined as “Activity, performed either manually or automatically, intended to observe the actual state of an item ... used to evaluate any changes in the parameters of the item with time. ... continuous, over time interval or after a given number of operations. ... usually carried out in the operating state.” The remainder of this chapter is organized as follows. Section 9.2 presents the classification of the maintenance strategies adopted by the authors and a little bit different from that illustrated in Fig. 9.1. Sections 9.3–9.10 present different analytical models and several applications on preventive maintenance based on replacements. Section 9.11 presents preventive maintenance policies for repairable systems. Section 9.12 illustrates a model for planning the replacement of capital equipment. Sections 9.14–9.24 present analytical models for inspection maintenance. Section 9.25 introduces and exemplifies an important reliability measure: maintenance-free operating period. Finally, Sect. 9.26 discusses opportunistic maintenance.
9.2 Maintenance Strategies During last few decades academic researchers and practitioners of industrial companies developed several rules and techniques for planning and managing maintenance activities in production systems. These supporting decision-making models and methods can
316
9 Basic Models and Methods for Maintenance of Production Systems
Failure
Time between failures
Failure
Maintenance Before a detected fault
After a detected fault
Preventive maintenance
Condition-based maintenance
Scheduled, continuous or on request
Corrective maintenance
Predetermined maintenance
Scheduled
Deferred
Immediate
Fig. 9.1 Maintenance strategies overview, EN 13306:2001
be classified in accordance with one of the following maintenance philosophies: • Breakdown/corrective maintenance (CM). It is performed when the production system stops functioning correctly, i. e., in accordance with a set of known operating conditions. There are no planning activities to optimize equipment maintenance and support management decisions. This strategy is influenced by the spare parts fulfillment and management system adopted and the cost of a breakdown maintenance action obviously depends on the availability (unavailability) of spare parts necessary to perform the repair action. • Preventive maintenance (PM) (scheduled and unscheduled). It deals with planned actions performed to face and counteract potential failures on a component/system. Timing (i. e., frequency) and outcome of a preventive maintenance action have to be properly planned and optimized, maximizing the throughput of the production system and minimizing costs. It is supposed that a preventive maintenance strategy can be performed only with the continuous knowledge of system operating conditions, which can be correct (incorrect) when they respect (do not respect) a pool of predefined specifications in accordance with the definition of continuous monitoring action introduced in Sect. 9.1.1.
Several models and methods to support management and practitioners in planning and scheduling preventive maintenance activities have been presented in the literature. Some examples are represented by replacement and the adoption of the as good as new hypothesis, refurbishment, and overhaul (i. e., restoration). The first class of preventive maintenance is the socalled statistically and reliability based preventive maintenance, which mainly refers to the analysis of the equipment historical records. Two widely used approaches to this preventive maintenance planning strategy are the use-based preventive maintenance actions, performed on an hours run of the component/system basis, and the time-based preventive maintenance actions, performed on a calendar basis. These are also known as scheduled-basis preventive maintenance strategies. Another special class of preventive maintenance is the condition-based preventive maintenance (the so-called predictive maintenance or unscheduledbasis maintenance), which is carried out on the basis of the continuous monitoring and knowledge of the operating condition and performance of the equipment. In particular, a set of relevant system functions’ parameters is monitored on-line or off-line, detecting a deterioration or degradation in the functional performance of the component/system.
9.2 Maintenance Strategies
By the current definition, preventive maintenance requires continuous monitoring of the system. Obviously preventive maintenance actions need to be properly integrated with spare parts fulfillment and management decisions. • Replacement. This widely used maintenance strategy can be classified in two main classes of rules: planned replacement and replacement upon failure. The first class belongs to the family of preventive maintenance rules (the so-called preventive replacement) and is based on the determination and optimization of the best timing and outcome of the maintenance action as previously introduced (see the introduction to preventive maintenance) and discussed in detail later. Applying the replacement upon failure, the component/system is left to run until it fails. As a consequence, this second class of rules belongs to the breakdown/corrective maintenance strategy. Both replacement rules are significantly influenced by the spare parts fulfillment and management system adopted. • Inspection maintenance (IM). These maintenance actions firstly determine the state of the equipment and ad hoc models and methods try to identify the points in time at which these actions have to be performed. This strategy is also called “fault finding”: measurements and inspections can be properly planned in advanced, but restorative or preventive tasks (e. g., preventive replacement, failure repair, or replacement, overhaul) can not. The state of function of the system/component can be based on a set of indicators capable of describing the health of the system in accordance with a pool of specifications. As a consequence, inspection rules can be referred to the previously cited condition-based maintenance strategy, because the state of function of the equipment can depend exclusively upon one or more monitored and relevant conditions. The basic difference between condition-based preventive maintenance and condition-based inspection maintenance is that the first one needs a continuous monitoring activity of the production system to reduce downtimes/failure occurrences/events and to detect them when they occur, while condition-based inspection maintenance schedules fault-finding actions at specific points in time t to detect if the system is in a state of failure and eventually perform a maintenance action.
317
The primary aim of the inspection strategy is to make a system more reliable, but an inspection action costs money in terms of materials, wages, and loss of production owing to scheduled downtimes. For these reasons managers of production systems have to properly plan and schedule inspection maintenance actions capable of maximizing throughput and profit, and minimizing global production costs. • Condition-based maintenance. This strategy requires monitoring a relevant variable or a set of relevant variables that are closely related to equipment failure. As previously illustrated, conditionbased maintenance refers to models and rules which can belong to preventive maintenance (in the case of continuous monitoring of equipment parameters) or to inspection maintenance, when the state of the equipment is known only after an inspection activity that can be properly scheduled. In condition-based maintenance based on continuous monitoring, the decision refers to the value of a suitable diagnostic signal (e. g., operating/use times, structural parameter, cost indicator) associated with the item and equipment under consideration. As a consequence, a continuous condition-based maintenance is not a scheduled basis preventive maintenance (i. e., based on predetermined time intervals). Some examples of monitored parameters are related to equipment operations, e. g., vibration of machines, operating temperature, and noise, or to indirect measures of equipment function, e. g., product dimensions and quality levels. The first problem related to condition-based maintenance is the determination of the best set of parameters to be monitored and measures of system function. • Opportunistic maintenance. Maintenance actions are performed when the opportunity arises (such as during shutdown periods). • Overhaul. This strategy is based on maintenance actions for the restoration of a component/system to an acceptable condition. The action restores the equipment to a desired level of function. As a consequence, overhaul actions can belong to the class of preventive maintenance, e. g., the socalled time-based preventive maintenance, or to inspection maintenance (i. e., condition-based maintenance) in the case of detection of a degraded con-
318
9 Basic Models and Methods for Maintenance of Production Systems
dition or performance of the production system by performing an inspection action. • Design modification. This strategy deals with the introduction of modifications in system configurations and/or components in order to increase the reliability and the productivity of the production system. Figure 9.2 reports the classification of the main maintenance strategies whose analytical models and methods are illustrated and applied in examples and case studies presented in following sections. No maintenance philosophy or maintenance rule is better than the others. The efficacy is based on the operative context and conditions of the production system, which usually requires managers and practitioners to apply a combination (i. e., a mix) of different models and techniques. As a consequence, different strategies and rules need to be properly integrated in accordance with both preventive maintenance and inspection maintenance programs, whose tasks are grouped by periodicity (e. g., daily, weekly, based on the number of cycles), availability and skills of maintenance teams of workers (also called “maintenance crews”), and the availability of spare parts
and equipment necessary to perform the maintenance action. In complex production systems the planning activity of maintenance tasks needs to be properly supported by models and methods for finite capacity constraints scheduling and sequencing problems, whose significant and efficacy contributions are supported by operations research studies (e. g., Jeong et al. 2007; Tam et al. 2007; Oke and Charles-Owaba 2006).
9.3 Introduction to Preventive Maintenance Models Preventive maintenance is defined as a series of tasks, called “planned maintenance actions,” performed to face known causes of potential failures of a production system (i. e., a component or a piece of equipment). As previously introduced, there are two main categories of preventive maintenance: statistically and reliability based and condition-based (Fig. 9.2). In preventive maintenance the first critical question is to identify the tasks that should be performed to
Maintenance strategies
Design modification
Breakdown – corrective maintenance
Maintenance actions
Preventive maintenance (PM)
Opportunity maintenance (OP)
Conditionbased
Statistically and reliability based
Inspection maintenance (IM)
Condition-based maintenance
Overhault/restoration
Replacement or repair
Fig. 9.2 Classification of maintenance strategies
9.4 Component Replacement
prevent failures and reduce downtimes, i. e., select the components and subsystems of the production system subject to planned maintenance actions instead of corrective tasks in the presence of failures. The second level of decisions deals with planning and scheduling of maintenance actions. The following sections present a set of different models for supporting managers and practitioners in planning and scheduling preventive maintenance activities. These models belong to the statistically and reliability based class of preventive maintenance and in particular they deal with preventive component replacement decisions. The proposed analytical models and methods are accompanied by numerical examples and case studies. A list of notation used in preventive replacement models follows: Cp Cf f .t/ F .t/ R.t/ r.t/ W .t/ UEC
preventive replacement unit cost; corrective replacement unit cost; probability function of the variable time to failure (ttf) of the generic component; failure probability function; survival probability functionI failure rate function;1 expected number of failures in .0; t/; unit (i. e., per unit time) expected cost of replacement.
Since failure is unexpected, a failure replacement is more costly than a preventive replacement, i. e., Cf > Cp . This is true especially if a failure results in damage to the equipment, or to other production systems, and is accompanied by delays related to the organization of maintenance teams/crews, the fulfillment of spare parts, etc. A balance is required between the amount spent on the preventive replacements and the resulting benefits, i. e., the reduction of downtimes and in particular of failure replacements, which are more expensive than preventive replacement. Section 9.8 discusses performance measures of effectiveness of preventive maintenance , with particular attention to preventive replacement.
1
In the case of nonrepairable components/systems, the failure rate function is generally represented by .t /; see Chap. 5.
319
9.4 Component Replacement The replacement of parts and components of a production system can be a preventive maintenance action, whose first decision deals with the determination of which critical entities have to be preventively replaced and which components, subject to breakdown/corrective actions, should be left to run until they fail. The second decision refers to the determination of timing of actions capable of improving the availability and reliability of the system. Barlow and Hunter (1960) proposed two simple analytical models for the determination of the optimal replacement policy minimizing the operating cost of the production system: 1. age-based replacement policy, or time based preventive replacement, also called “type I policy”; 2. constant interval replacement policy, also called “type II policy” or “block replacement policy.” These basic models represent the main and first reference for the development of several and more complex models and methods dealing with a preventive maintenance strategy (Huang et al. 1995; Jiang et al. 2006). In particular, the preventive replacement should take place after the component/system has been significantly used and before it has aged for too long. As a consequence, a too early or too late scheduling of a preventive replacement action is not a good decision. The numerical example illustrated in Sect. 9.4.2 clarifies this important rule.
9.4.1 Time-Related Terms and Life Cycle Management The European standard EN 13306 gives a set of useful definitions related to maintenance strategies and rules. A few of them are reported as follows: • operating time ... time interval during which an item is performing its required function; • required time ... time interval during which the user requires the item to be in a condition to perform a required function; • standby time ... time interval during which an item is in a standby state; • idle time ... time interval during which an item is in an idle time;
320
9 Basic Models and Methods for Maintenance of Production Systems
• maintenance time ... time interval during which a maintenance is carried out an item either manually or automatically, including technical and logistic delays; • active maintenance time ... part of maintenance time during which active maintenance is carried out on an item, excluding logistic delays; • repair time ... part of active corrective maintenance time during which repair is carried out on an item; • logistic delay ... accumulated time during which maintenance cannot be carried out due to the necessity to acquire maintenance resources, excluding any administrative delay; • life cycle ... time interval that commences with the initiation of the concept and terminates with the disposal of the item. In particular, the logistic delay time can have a very significant contribution to maintenance time because of traveling to an unattended installation, pending arrival of spare parts (see Chap. 11), specialists, test equipment and information, and unsuitable environmental conditions. There are a lot of literature studies regarding life cycle management (LCM) and product lifecycle management (PLM). Life cycle management and product life cycle management can be especially defined as an integrated concept to assist in businesses managing the total life cycle of products and services towards more sustainable consumption and production patterns. Product life cycle goes through many phases, involves many professional disciplines, and requires many skills, tools, and processes; this is not the subject of this book, but reliability engineering and the optimization of maintenance management represent an effective set of quantitative and practical tools to support the optimization of life cycle management and product life cycle management.
9.4.2 Numerical Example. Preventive Replacement and Cost Minimization Consider a component whose failure behavior is well known, and in particular the probability distribution of the random variable ttf is a Weibull distribution (shape parameter ˇ D 2:1 and scale parameter ˛ D 1;531:4 h). Figure 9.3 reports the trend of the failure
Table 9.1 Numerical example. Corrective maintenance (CM) compared with preventative maintenance (PM) actions Performance of action
CM
PM
Spare part cost (¤/unit) Call cost (¤/replacement) Crew cost (¤/h) Nonproduction cost (¤/h) TTR (h)
400 200 100 600 18
350 100 100 600 8
TTR time to repair
probability function F .t/, reliability function R.t/, density function f .t/, and failure rate function .t/. The value of the mean time to repair (MTTR) is about 1;356 h and reliability referred to a period of time T D 1;000 h is about 0.665. The component is assumed to be repairable, and in particular to be as good as new after a maintenance action consisting of a replacement. The duration of the generic replacement action is supposed to be constant and equal to 18 h [i. e., time to repair (ttr) equals MTTR D 18) in the case of a corrective replacement and 8 h in the case of a preventive replacement. Table 9.1 summarizes the assumed variable and fixed costs of maintenance actions, distinguishing the following contributions: • Spare part cost, i. e., the cost of acquiring and storing the replacing new part C. • Call cost, i. e., the fixed cost of calling and organizing the maintenance crew activity. • Crew cost, i. e., the direct cost of the crew for unit time. • Nonproduction cost, i. e., the direct cost of nonproduction for unit time. This is generally called “lost production cost”. In particular, the second column of Table 9.1 refers to the corrective maintenance cost contributions, i. e., the cost performance in the case of a corrective replacement action. Similarly the third column reports the costs in case of a preventive maintenance (i. e., a preventive replacement action). Table 9.2 reports the results obtained in terms of system costs and reliability performance by the application of dynamic simulation to the component/system, assuming a period of time T equal to 32;200 h and 500 repetitions (simulation runs). In particular, configuration A refers to the component when the hypothesis of corrective replacement is adopted and no preventive maintenance rules are applied.
9.4 Component Replacement
321 Reliability vs Time 1.000
0.800
0.800
Reliability, R(t)
Unreliability, F(t)=1-R(t)
Unreliability vs Time 1.000
0.600
0.400
0.200
0.000 0.000
0.600
0.400
0.200
800.000
1600.000
2400.000
3200.000
0.000 0.000
4000.000
800.000
1600.000
Time, (t)
3200.000
4000.000
Failure Rate vs Time
Probability Density Function 0.004
4.800E-4
0.003
Failure Rate, f(t)/R(t)
6.000E-4
f(t)
3.600E-4
2.400E-4
0.002
0.002
8.000E-4
1.200E-4
4.000E-116 0.000
2400.000
Time, (t)
800.000
1600.000
2400.000
3200.000
4000.000
4.000E-116 0.000
800.000
Time, (t)
1600.000
2400.000
3200.000
4000.000
Time, (t)
Fig. 9.3 F .t /, R.t /, f .t /, and .t /, numerical example. ReliaSoft® software
Table 9.2 Different maintenance strategies and parameterizations tp (h)
Configuration A –
Configuration B 1,356
Configuration C 600
Mean availability CM downtime (h) PM downtime (h) Total downtime (h) W .T / (failures) Number of PR Maintenance cost (¤) Total cost (¤)
0.9871 415.71 0 415.71 23.1 0 55,192 304,618
0.9878 288.2 104.45 392.65 13.06 16.01 54,749 290,333
0.9842 128.61 380.56 509.17 7.15 47.58 76,614 382,116
T (h) Simulation repetitions (runs)
Configuration D 4,000 0.9871 416.62 0.03 416.65 23.15 0.004 55,557 305,547
32,200 500
PR preventative replacements
Configurations B, C, and D refer to the hypothesis that preventive replacement is also adopted and the component is preventively replaced when the number of hours from the last replacement (preventive or corrective) reaches the tp value. Configuration B adopts
1;356 h as the value for tp , configuration C adopts 600 h, and configuration D adopts 4;000 h. Figure 9.4 compares the values of the downtimes obtained for the set of simulated system configurations, distinguishing the contribution of
322
9 Basic Models and Methods for Maintenance of Production Systems
500
Downme [h]
400 300 CM downme [h]
200
PM downme [h] Total downme [h]
100 0 Config. A 415.71
CM downme [h] PM downme [h] Total downme [h]
Config. B 288.2
Config. C 128.61
Config. D 416.62
0
104.45
380.56
0.03
415.71
392.65
509.17
416.65
System Configuraon
Fig. 9.4 Downtime analysis in different system configurations. CM corrective maintenance, PM preventive maintenance
350000 300000
Costs [€]
250000 200000 Maintenance cost [
150000
Total cost [ €]
100000 50000 0 Maintenance cost [ €]
Config. A 55192
Config. B 54749
Config. C 76614
Config. D 55557
Total cost [ €]
304618
290333
382116
305547
System Configuraon
Fig. 9.5 Maintenance cost analysis in different system configurations
corrective maintenance from that of preventive maintenance. Finally, Fig. 9.5 presents the results obtained in terms of system costs distinguishing maintenance cost due to corrective maintenance and preventive maintenance actions from total cost, including the significant nonproduction cost contribution. These results clearly demonstrate how much the downtimes and system costs differ for the adoption of different parameterizations of a maintenance action,
and in particular for different values of the time tp . In general, it is possible to obtain advantages from the introduction of a preventive maintenance, e. g., a preventive replacement, but it is also possible to obtain disadvantages as demonstrated by a bad parameterization of the preventive maintenance action in configuration C (C25:4% total cost and C38:8% downtime) if compared with the absence of preventive maintenance. The following sections present and apply basic models for the control and minimization of these costs.
9.5 Time-Based Preventive Replacement – Type I Replacement Model
9.5 Time-Based Preventive Replacement – Type I Replacement Model This strategy refers to the practice of periodically replacing the deteriorating units and components of a production system. This practice is particularly effective for parts and components whose failure behaviors are closely correlated with the time or age of the unit in service. The so-called single unit model can be applied to systems with one unit, but also to each unit in a complex system where the economic dependency among components is weak. In this strategy, maintenance of the system means replacing the component/unit. Cf is the cost due to a replacement after failure;Cp is the unit cost due to a preventive replacement (assuming Cf > Cp ). The object of the problem is to determine the optimal preventive replacement age tp such that the expected system maintenance unit cost (i. e., the cost per unit of operation time, i. e., the total expected replacement cost per unit time) is minimized. Considering Fig. 9.6, when failures occur and failure replacements are performed, the time clock is reset to zero and the planning preventive replacement occurs when the component has been in use for a specified period tp . The following analytical model proposed by Barlow and Hunter (1960) quantifies the UEC, i. e., as a ratio of two expectations: the total expected replacement cost per cycle and the expected cycle length, defined as follows:
Corrective action
tp
0
time
Fig. 9.6 Time-based preventive replacement. Type I
the differentiation of products of differentiable functions (the so-called Leibniz law), and Eq. 9.1 can be explained as follows: UEC.tp / D
Cp R.tp / C Cf Œ1 R.tp / : R tp 0 R.t/ dt
(9.3)
The minimum UEC(tp / given by Eqs. 9.1 and 9.3 is as follows (Jiang et al. 2006): ˇ dUEC.t/ ˇˇ (9.4) ˇ D 0; dt t Dt or r.t/G.t/ D
c R.t/; c1
(9.5)
where c D Cf =Cp > 1; Zt G.t/ D
R.x/ dx;
(9.6)
0
r.t/ D
f .t/ 1 F .t/
failure rate function.
In particular, assuming a Weibull distribution of ttf for a generic component/system,
where R tp
tf .t/ dt ; 1 R.tp /
Preventive action
tp
expected total replacement cost per cycle expected cycle lenght Cp R.tp / C Cf Œ1 R.tp / D tp R.tp / C m.tp /Œ1 R.tp / Cp R.tp / C Cf Œ1 R.tp / D ; (9.1) Rt tp R.tp / C 0 p tf .t/ dt
1
Corrective action Preventive action
UEC.tp / D
m.tp / D
323
(9.2)
where tp is the age of the component/system and m.tp / is the mean time to failure (MTTF) if a corrective replacement occurs before tp (since the last preventive or corrective replacement). It is the mean of the truncated distribution at tp : Rt In particular, the term tp R.tp /C 0 p tf .t/ dt is equal Rt to 0 p R.t/ dt by applying the integration by parts, i. e.,
r.t/ D 8 ˆ ˆ < ˆ ˆ :
˛ G.t/ D ˇ
Zz
ˇ t ˇ 1 : ˛ ˛
z 1=ˇ 1 ez dz D
(9.7)
˛ .1=ˇ; z/ ˇ (9.8)
0
z D .t=˛/ˇ ;
where ˇ is a shape parameter of the Weibull distribution, ˛ is scale parameter of the Weibull distribution,
324
9 Basic Models and Methods for Maintenance of Production Systems UEC(tp) Type I model
9.165
9.16
UEC [€/h]
9.155
9.15
9.145
9.14
9.135 1300
1320
1340
1360
1380
1400
1420
1440
1460
1480
1500
tp [h]
Fig. 9.7 Unit expected cost of replacement (UEC) minimization, type I model numerical example
and Zz .k; z/ D
x k1 ex dx
(9.9)
0
.k; z/ is the lower incomplete gamma function whose properties are illustrated by Weisstein (2008). In particular, Table 5.5 reports the values of the gamma function for different values of k, and assuming z equal to C1.
9.5.1 Numerical Example. Type I Replacement Model Consider the numerical example illustrated in Sect. 9.4.2 where the values of maintenance cost per action, including nonproduction costs, are: • corrective maintenance, Cf D ¤ 13;200 per action; • preventive maintenance, Cp D ¤ 6;050 per action. These values refer to the hypothesis of a fixed ttr in both the preventive maintenance and the corrective maintenance, and are equal to 8 and 18 h, respectively. The analytical model introduced above for the type I replacement model does not consider the existence of a repair duration: it is assumed to be equal to zero, i. e., the replacement is instantaneous. As a consequence, to properly apply this model it is necessary to quantify the cost of replacement due to the repair duration
and neglect the repair duration.2 The next model proposed faces this problem explicitly by introducing the duration of replacements, as discussed in Sect. 9.6. Figure 9.7 presents the results obtained by the application of the analytical model in terms of UEC(tp /. The best value of tp is 1;429 h, while the minimum value obtained for UEC is about ¤ 9.14 per hour. By the application of Monte Carlo simulation (assuming T D 32;200 h and 2,000 simulation repetitions3 ) the following results can be obtained and compared with those illustrated in Sect. 9.4.2: • • • • • • • •
mean availability 0.988; corrective maintenance downtime 293:38 h; preventive maintenance downtime 94:09 h; total downtime 387:47 h; W .T / 16.3 failures; number of preventive replacement 11.76; maintenance cost ¤ 53,823; total cost ¤ 286,305.
The total cost of ¤ 286,305 is about 25:07% if compared with previously defined configuration C (see Table 9.2) and 1:39% if compared with configuration B (see Table 9.2). Figure 9.8 summarizes the results obtained in terms of costs, comparing configurations A–D with the best configuration, E, and corresponding to tp D 1;429 h.
2
See Eq. 9.3 The number of repetitions is 2,000 in order to obtain more precise values of system performance. 3
9.5 Time-Based Preventive Replacement – Type I Replacement Model
325
Type I model - tp* determinaon 350000
Costs [€]
300000 250000 200000 150000
Maintenance cost [
100000
Total cost [ €]
50000 0
Maintenance cost [ €]
Config. A 55192
Config. B 54749
Config. C 76614
Config. D 55557
Config. E 53823
Total cost [ €]
304618
290333
382116
305547
286305
System Configuraon
Fig. 9.8 Maintenance costs minimization, type I model
9.5.2 Numerical Example. Type I Model and Exponential Distribution of ttf This numerical example relates to a component/system whose probability distribution of ttf is exponential (i. e., the density function is a Weibull distribution with shape parameter ˇ D 1) and consequently it differs from the distribution of the application illustrated in Sect. 9.5.1. The failure event is random because the failure rate is constant. Table 9.3 presents the results of the performance evaluation and comparison carried on the component/system for different values of time tp by the application of the Monte Carlo simulation.
In particular, the total downtime cumulated on a period of time T equal to 32;200 h (about 5 years) increases when a preventive maintenance replacement strategy is adopted. Consequently, it decreases when the adopted tp interval of time increases. A similar conclusion can be drawn from the analysis of both the maintenance cost and the total cost (see also Fig. 9.9). These results support the rule that it is not convenient to apply preventive maintenance actions of replacement on a component/system whose ttf is subject to an exponential distribution. This thesis is further supported by the following section, which presents and demonstrates universal results.
Table 9.3 Type I model and exponential distribution of time to failure (ttf) for ˇ D 1 tp (h)
Configuration A –
Configuration B 1,356
Configuration C 600
Configuration D 4,000
Mean availability CM downtime (h) PM downtime (h) Total downtime (h) W .T / (failures) Number of PR Maintenance cost (¤) Total cost (¤)
0.988 379.65 0 379.65 21.09 0 50,624 278,414
0.985 375.53 113.55 489.09 20.87 14.2 67,817 361,265
0.978 377.09 338.31 715.39 20.95 42.29 102,641 531,875
0.988 375.95 12.1 388.05 20.9 1.51 52,025 284,855
T (h) Simulation repetitions (runs)
32,200 500
326
9 Basic Models and Methods for Maintenance of Production Systems
Weibull β=1, α=1531.4 350000
Costs [€]
300000 250000 200000 Maintenance cost [
150000
Total cost [ €]
100000 50000 0 Maintenance cost [ €]
Config. A 50624
Config. B 67817
Config. C 102641
Config. D 52025
Total cost [ €]
278414
361265
531875
284855
System Configuraon
Fig. 9.9 Maintenance costs minimization, type I model and ˇ D 1
9.5.3 Type I Replacement Model for Weibull distribution of ttf Figure 9.10 presents the UEC for different Weibull distributions of ttf. These probability distributions differ for different values of shape parameter ˇ (called b in the figure). Cp and Cf values are assumed to be equal to 100 units of cost (e. g., dollars or euros) and 1,000 units of cost, respectively. In particular, for values of ˇ greater than 1 it is possible to identify an optimal value of tp in terms of units of time (e. g., hours or days). Values of the shape parameter lower than 1 are not supported by the determination of the best tp value, as clearly demonstrated by Fig. 9.11. Figure 9.12 presents the expected total cost and the expected cycle length for different values of shape parameter ˇ. Finally, Fig. 9.13 presents the UEC values for different shape parameters of the Weibull distribution, with Cp passing from a value equal to 100 units of cost to a new value equal to 10 units of cost.
9.5.4 The Golden Section Search Method This is a method to find a minimum of a unimodal continuous function over an interval without using derivatives. It can therefore be applied for the minimization
of an objective function similar to Eq. 9.10. Consider a function g over an interval Œa; b; g.t/ is continuous and unimodal (i. e., it has only one minimum) over Œa; b. This method applies as well to finding the maximum of g.t/. The basic idea is to narrow the interval that contains the minimum value, comparing different function values: min fg.t/g:
at b
(9.10)
A method based on five steps for the determination of the minimum (maximum) follows. This algorithm is based on an allowable final tolerance level, ı: Step 1.
Let Œa1 ; b1 D Œa; b; 1 D a1 C .1 ˛/.b1 a1 /; (9.11) 1 D a1 C ˛.b1 a1 /; p 1 C 5 ˛D D 0:6180: 2
(9.12)
˛ is a constant reduction factor for the determination of the size of the interval. Set k D 1. Evaluate g.1 / and g.1 /.
9.5 Time-Based Preventive Replacement – Type I Replacement Model
327
Type I Replacement Model - UEC for Weibull distributions of ttf 100 Weibull (a=50, b=1), Cp=100,Cf=1000 Weibull (a=50, b=2), Cp=100,Cf=1000 Weibull (a=50, b=3), Cp=100,Cf=1000 Weibull (a=50, b=4), Cp=100,Cf=1000 Weibull (a=50, b=0.5), Cp=100,Cf=1000 Weibull (a=50, b=0,2), Cp=100,Cf=1000
90
80
70
UEC(tp)
60
50
40
30
20
10
0
0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
150
160
170
180
190
200
tp [unit of time]
Fig. 9.10 Weibull distribution of time to failure (ttf ). UEC.tp / for different values of shape parameter ˇ (i. e., b), UEC D Œ0; 100
Type I Replacement Model - UEC for Weibull distributions of ttf 30 Weibull (a=50, b=1), Cp=100,Cf=1000 Weibull (a=50, b=2), Cp=100,Cf=1000 Weibull (a=50, b=3), Cp=100,Cf=1000 Weibull (a=50, b=4), Cp=100,Cf=1000 Weibull (a=50, b=0.5), Cp=100,Cf=1000 Weibull (a=50, b=0,2), Cp=100,Cf=1000
25
UEC(tp)
20
15
10
5
0
0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
150
160
170
180
190
200
tp [unit of time]
Fig. 9.11 Weibull distribution of ttf. UEC.tp / for different values of shape parameter ˇ (i. e., b), UEC D Œ0; 30
Step 2.
If bk ak < ı, the optimal solution t is defined as ak C bk : t D 2 Otherwise if g.k / > g.k /, go to step 3 and if g.k / g.k /, go to step 4.
(9.13)
Step 3.
Let akC1 D k ; bkC1 D bk ; kC1 D k ;
(9.14)
kC1 D akC1 C ˛.bkC1 akC1 /: (9.15) Evaluate g.kC1 / and go to step 5.
328
9 Basic Models and Methods for Maintenance of Production Systems Type I Replacement Model - Total Expected Cost for Weibull distributions of ttf
1100
1000
900
Total Expected Cost
800
700
600
500 Weibull (a=50, b=1), Cp=100, Cf=1000 Weibull (a=50, b=2), Cp=100, Cf=1000 Weibull (a=50, b=3), Cp=100, Cf=1000 Weibull (a=50, b=4), Cp=100, Cf=1000 Weibull (a=50, b=0.5), Cp=100, Cf=1000 Weibull (a=50, b=0.2), Cp=100, Cf=1000
400
300
200
100
0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
150
160
170
180
190
200
tp [units of time]
Type I Replacement Model - Expected Cycle Length for Weibull distributions of ttf 70
Expected cycle lenght [units of time]
60
50
40
30 Weibull (a=50, b=1), Cp=100,Cf=1000 Weibull (a=50, b=2), Cp=100,Cf=1000 Weibull (a=50, b=3), Cp=100,Cf=1000 Weibull (a=50, b=4), Cp=100,Cf=1000 Weibull (a=50, b=0.5), Cp=100,Cf=1000 Weibull (a=50, b=0,2), Cp=100,Cf=1000
20
10
0
0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
150
160
170
180
190
200
tp [unit of time]
Fig. 9.12 Weibull distribution of ttf. Expected total cost and expected cycle length
Step 4.
9.5.5 Numerical Example. Type I Model and the Golden Section Method
Let akC1 D ak ; bkC1 D k ; kC1 D k ;
(9.16)
kC1 D akC1 C .1 ˛/.bkC1 akC1 /: (9.17) Evaluate g.kC1 / and go to step 5. Step 5.
Set k D k C 1 and go to step 2.
Consider a component whose ttf probability density function f .t/ between Œ0; 7 weeks is defined as follows: 8 1 ˆ ˆ ; 0t <4 ˆ ˆ ˆ 8 < .week1 / f .t/ D 1 ; 4t 7 ˆ ˆ 6 ˆ ˆ ˆ :0 otherwise:
9.5 Time-Based Preventive Replacement – Type I Replacement Model
329
Type I Replacement Model - UEC(tp) for Weibull distributions of ttf 70 Weibull (a=50, b=1). Cp=100, Cf=1000 Weibull (a=50, b=1). Cp=10, Cf=1000 Weibull (a=50, b=1). Cp=100, Cf=1000 Weibull (a=50, b=4). Cp=100, Cf=1000 Weibull (a=50, b=4). Cp=10, Cf=1000 Weibull (a=50, b=4). Cp=100, Cf=1000
60
UEC(tp)
50
40
30
20
10
0
0
10
20
30
40
50
60
70
80
tp [unit of time]
Fig. 9.13 Weibull distribution of ttf. UEC for different values of shape parameter ˇ (i. e., b) and Cp
f [week-1]
f(t)
The reliability R.t/ of the equipment is
0.18 0.16 0.14 0.12 0.10 0.08 0.06 0.04 0.02 0.00 0
1
2
3
4
5
6
7
8
9
R.t/ D 1 F .t/ 8 Zt ˆ ˆ 1 1 ˆ ˆ ˆ 1 dt D 1 t; 0t <4 ˆ ˆ 8 8 ˆ ˆ ˆ 0 ˆ ! ˆ < Z4 Zt 7t 1 1 D 1 dt C dt D ; ˆ ˆ 8 6 6 ˆ ˆ 0 4 ˆ ˆ ˆ 4t 7 ˆ ˆ ˆ ˆ ˆ :0; t 7:
10
t [week]
Fig. 9.14 f .t /, type I model numerical example
The failure rate function .t/4 is The cost of preventive replacement is Cp D ¤ 5,000 per generic replacement action and the cost of failure replacement is Cf D ¤ 50,000 per generic replacement cation. Figure 9.14 illustrates the values assumed by the function f .t/. By the analytical expression of f .t/, the failure probability function F .t/ is 8 Zt ˆ ˆ 1 1 ˆ ˆ dt D t; 0t <4 ˆ ˆ ˆ 8 8 ˆ ˆ <0 Zt F .t/ D Z4 1 t 1 1 ˆ ˆ dt C dt D ; 4t 7 ˆ ˆ ˆ 8 6 6 ˆ ˆ 0 4 ˆ ˆ :1; t 7:
f .t/ R.t/ 8 1 1 ˆ ˆ ˆ 81 D ; < 8t 1 8t D 1 ˆ ˆ 6 D 1 ; ˆ : 7t 7t 6
.t/ D
4
0t <4 .week1 /: 4t 7
.t / is the symbol used for defining failure rate function for nonrepairable components. In the type I model the component is supposed to be “as good as new” after the generic maintenance action of replacement. As a consequence, this rate is represented by (t /, where r.t / D .t / and t D 0 after the generic instantaneous replacement action.
330
9 Basic Models and Methods for Maintenance of Production Systems
In order to evaluate Eq. 9.1 it is necessary to quantify the following functions: Ztp
D
8 tp Z ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ 0 ˆ ˆ ˆ ˆ ˆ 4 Z ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ <0
tf .t/ dt 0
tp2 1 t dt D ; 8 16 1 t dt C 8
Ztp
0 tp < 4
1 t dt 6
0
ˇ Ztp ˆ t 2 ˇˇ 1 ˆ ˆ D t dt C ˆ ˇ ˆ 16 6 ˆ t D4 ˆ ˆ 4 ˆ ˆ ˆ tp2 4 ˆ ˆ ˆ D ; ˆ ˆ 12 ˆ ˆ ˆ ˆ ˆ ˆ ˆ : 45 ; 12
.weeks/:
4 tp 7 tp > 7
Table 9.4 reports the values of some reliability measures in accordance with previous identified analytical equations. Figure 9.15 illustrates the trend of R.t/ and
R tp
f .t/ dt 1 R.tp / 8 2 t =16 tp ˆ ˆ p D ; 0 tp < 4 ˆ ˆ ˆ tp =8 2 ˆ ˆ ˆ ˆ ˆ ˆ .t 2 4/=12 ˆ ˆ < p .t 1/=6 D 2 ˆ ˆ ˆD tp 4 ; 4 t 7: ˆ p ˆ ˆ 2.t 1/ ˆ ˆ ˆ ˆ ˆ ˆ ˆ : 45 ; tp > 7 12 1
m.tp / D
λ(t)
.weeks/
λ [week -1]
1
Cp R.tp / C Cf Œ1 R.tp / R tp tp R.tp / C 1 tf .t/ dt 8
5 1 18 tp C 50 81 tp ˆ ˆ ˆ ˆ
t2 ˆ ˆ ˆ tp 1 18 tp C 16p ˆ ˆ ˆ ˆ 45tp C 40 ˆ ˆ ˆ D2 ; 0 tp < 4 ˆ ˆ 16tp tp2 ˆ ˆ ˆ ˆ ˆ
t 1 < 7tp 5 6 C 50 p 6 D .103 ¤=week/: 2 4
ˆ t 7t ˆ ˆ tp 6 p C p12 ˆ ˆ ˆ ˆ ˆ 45tp 15 ˆ ˆ ; 4 tp < 7 D2 ˆ ˆ 14tp tp2 4 ˆ ˆ ˆ ˆ ˆ ˆ 40 50 ˆ ˆ ˆ tp 7 : 45 D 3 ; 12
UEC.tp / D
Ztp tf .t/ dt D
The UEC(tp / obtained is
9.0 8.0 7.0 6.0 5.0 4.0 3.0 2.0 1.0 0.0
λ(
0
1
2
3
4
5
6
7
t [week]
Fig. 9.16 .t /, type I model numerical example
R(t) vs F(t)
m(tp)
1.00 0.90 0.80 0.70 0.60 0.50 0.40 0.30 0.20 0.10 0.00
4.00 3.50
R(t) F(t)
m [week]
3.00 2.50 2.00 1.50 1.00 0.50 0.00
0
1
2
3
4
5
6
7
t [week]
Fig. 9.15 R.t / and F .t /, type I model numerical example
1
2
3
4
5
6
tp [week]
Fig. 9.17 m.t /, type I model numerical example
7
8
9.5 Time-Based Preventive Replacement – Type I Replacement Model
F .t/ for different values of time t. Similarly, Fig. 9.16 illustrates the trend of .t/ and Fig. 9.17 illustrates the trend of m.tp /. By these values it is possible to identify tp , i. e., the best value of tp minimizing the UEC, as also illustrated in Fig. 9.18. We called this method as the “exhaustive approach” for the determination of the minimum value of a continuous function in a range Œa; b, i. e.,
331
Iteration 2 Œa2 ; b2 D Œ0; 4:326 2 D a2 C .1 ˛/.b2 a2 / D 0 C 0:382 4:326 D 1:653 2 D 1 D 2:674 UEC.2 D 1:653/ D 2
UEC.tp / D min fUEC.tp /g: tp 2Œa;b
452 C 40 162 22
D 9:646 .103 ¤=week/
Now the previously illustrated golden section model has been applied to identify tp , letting
D 0:25:
UEC.2 D 2:674/ D UEC.1 / D 9:003 .103 ¤=week/ Now UEC.2 / > UEC.2 /; as a consequence Œa3 ; b3 D Œ1:653; 4:326.
Iteration 1 Œa1; b1 D Œ0; 7 ˛ D 0:618 and 1 ˛ D 0:382
Iteration 3
1 D a1 C .1 ˛/.b1 a1 /
Œa3 ; b3 D Œ1:653; 4:326
D 0 C 0:382 7 D 2:674 1 D a1 C ˛.b1 a1 / D 0 C 0:618 7 D 4:326 451 C 40 UEC.1 D 2:647/ D 2 161 21
3 D a3 C ˛.b3 a3 / D 1:653 C 0:618.4:326 1:653/ D 3:305 UEC.3 D 2:674/ D 9:003 .103 ¤=week/
D 9:003 .103 ¤=week/ UEC.1 D 6:18/ D 2
3 D 2 D 2:674
451 15 141 21 4
UEC.3 D 3:305/ D 2
453 C 40 163 23
D 8:996 .103 ¤=week/
D 11:871 .103 ¤=week/ Now UEC.1 / UEC.1 /; as a consequence Œa2 ; b2 D Œ0; 4:326.
Now UEC.3 / > UEC.3 /; as a consequence Œa4 ; b4 D Œ2:674; 4:326. Iteration 4 Œa4 ; b4 D Œ2:674; 4:326
UEC(tp) 14.00
UEC [k€/week]
12.00
11.3 9.3
10.00
9.0
9.2
13.3 10.2
4 D 3 D 3:305 4 D a4 C ˛.b4 a4 /
11.6
D 2:674 C 0:618.4:326 2:674/ D 3:695
8.00
UEC.4 D 3:305/ D 8:996 .103 ¤=week/
6.00 4.00
UEC.4 D 3:695/ D 2
2.00 0.00 1
2
3
4
5
6
7
tp [week]
UEC(tp )
Fig. 9.18 merical example
“exhaustive approach”, type I model nu-
454 C 40 164 24
D 9:07 .103 ¤=week/ Now UEC.4 / < UEC.4 /, as a consequence Œa5 ; b5 D Œ2:674; 3:695.
332
9 Basic Models and Methods for Maintenance of Production Systems
Table 9.4 Reliability measures, type I model numerical example tp f .t / R.t / F .t / .t / m.tp / UEC.tp /
0
1
2
3
4
5
6
7
8
0.125 1.000 0.000 0.125 0.000 1
0.125 0.875 0.125 0.143 0.500 11.333
0.125 0.750 0.250 0.167 1.000 9.286
0.125 0.625 0.375 0.200 1.500 8.974
0.125 0.500 0.500 0.250 2.000 9.167
0.167 0.333 0.667 0.500 2.625 10.244
0.167 0.167 0.833 1.000 3.200 11.591
0.167 0.000 1.000 1 3.750 13.333
0.000 0.000 1.000 1 3.750 13.333
Iteration 5
UEC.7 D 2:823/ D 2
Œa5 ; b5 D Œ2:674; 3:695
D 8:981 .103 ¤=week/
5 D a5 C .1 ˛/.b5 a5 /
UEC.7 D 2:915/ D 8:9755 .103 ¤=week/
D 2:674 C 0:382.3:695 2:674/ D 3:064 5 D 4 D 3:305 UEC.5 D 3:064/ D 2
455 C 40 165 25
Now UEC.7 / < UEC.7 /; as a consequence Œa8 ; b8 D Œ2:823; 3:064, and b8 a8 < . Halting the iterative procedure, the best value of tp can be assumed to be equal to
D 8:976 .103 ¤=week/ tp Š
UEC.5 D 3:305/ D 8:996 .103 ¤=week/ Now UEC.5 / < UEC.5 /; as a consequence Œa6 ; b6 D Œ2:674; 3:305.
2:823 C 3:064 D 2:95 .weeks/: 2
The corresponding value of UEC is quantified by the following: UEC.tp D 2:95/ D 2
Iteration 6
6 D a6 C .1 ˛/.b6 a6 / D 2:674 C 0:382.3:305 2:674/ D 2:915 6 D 5 D 3:064
16tp tp2
456 C 40 166 26
Figure 9.19 illustrates the trend of the value bk ak passing from Œa1 ; b1 to Œa8 ; b8 , where ak , bk represent the coordinates on the axis (ak -axis of abscissae and bk -axis of ordinates): the dimension of the generic ball in the figure is proportional to the value bk ak .
D 8:9755 .103 ¤=week/
9
UEC.6 D 3:064/ D 8:9757 .103 ¤=week/
8
Iteration 1
7 6
bk
Now UEC.6 / < UEC.6 /; as a consequence Œa7 ; b7 D Œ2:674; 3:064.
Iteration 3
5 4
Iteration 2
3
Iteration 7
7 D a7 C .1 ˛/.b7 a7 / D 2:674 C 0:382.3:064 2:674/ D 2:823
Iteration 8
2 1
Œa7 ; b7 D Œ2:674; 3:064
7 D 6 D 2:915
45tp C 40
D 8:975 .103 ¤=week/:
Œa6 ; b6 D Œ2:674; 3:305
UEC.6 D 2:915/ D 2
457 C 40 167 27
0 -0.5
0
0.5
1
1.5
ak
2
2.5
3
3.5
Fig. 9.19 Golden section method, type I model numerical example
9.6 Time-Based Preventive Replacement Including Duration of Replacements
9.6 Time-Based Preventive Replacement Including Duration of Replacements
9.6.1 Numerical Example 1: Type I Replacement Model Including Durations Tp and Tf
The following new parameters have been introduced in a modified version of the original type I time-based preventive replacement model: Tp Tf
Consider the numerical example illustrated in Sects. 9.4.2 and 9.5.1 illustrating the type I preventive replacement maintenance model. The values of Tp and Tf were assumed to be 8 and 18 h, respectively (see Table 9.1), but these values were not explicitly considered to find tp by the application of Eq. 9.1. They were only considered to quantify Cp and Cf . Figure 9.21 presents the values obtained for the UEC, including durations Tp and Tf , where the optimal value of the time period of preventive replacement tp is 1;445 h and the corresponding minimal value of the UEC is ¤ 9.025 per hour. By the application of the Monte Carlo simulation to the current system and assuming corrective maintenance and preventive maintenance based on a tp value equal to 1;445 h, it is possible to define a new operating scenario: configuration F. The following results, to be compared with those obtained in scenarios A–E illustrated and simulated in Sects. 9.4.2 and 9.5.1, are obtained: mean availability A D 0:988; total downtime 386:39 h; total cost ¤ 285,465. The total cost is reduced by about 1:68% if compared with the previously identified scenario called “configuration B” (see Sect. 9.4.2) and by about 0:29% if compared with configuration E (see Sect. 9.5.1). Table 9.5 summarizes the results obtained in the multiscenario analysis of the performance of the system. Figure 9.22 compares the maintenance and total costs obtained by the Monte Carlo simulation analysis comparing configurations A–D with configuration F.
mean time required to perform a preventive replacement; mean time required to perform a failure replacement.
In Sect. 9.5 Tf and Tp were assumed to be equal to 0 and replacements were assumed to be instantaneous. The expected cycle length changes as follows: .tp C Tp /R.tp / C Œm.tp / C Tf Œ1 R.tp /;
(9.18)
where m.tp / C Tf is the expected length of a failure cycle. Figure 9.20 illustrates the composition of three operative cycles, of which only the third is complete, i. e., it is made up of a scheduled preventive replacement. The first and the second cycles are characterized by unexpected failure replacements. The total expected replacement cost per unit time is Cp R.tp / C Cf Œ1 R.tp / .tp C Tp /R.tp / C Œm.tp / C Tf Œ1 R.tp / Cp R.tp / C Cf Œ1 R.tp / D : R tp .tp C Tp /R.tp / C 1 tf .t/ dt C Tf Œ1 R.tp / (9.19)
UEC.tp / D
In Eq. 9.19 the replacement times Tp and Tf are assumed to be deterministic values, and in particular constant.
Preventive replacement
Failure replacement
Tp
Failure replacement
Tf
Preventive replacement
Tf tp
333
Tp tp time
tp A cycle length
A cycle length
Fig. 9.20 Type I model with duration of replacement, Tp < Tf
A cycle length
334
9 Basic Models and Methods for Maintenance of Production Systems
Table 9.5 Performance evaluation and comparison. Type I with Tp and Tf . Numerical example Configuration A –
tp (h) Mean availability CM downtime (h) PM downtime (h) Total downtime (h) W .T / (failures) Number of PR Maintenance cost (¤) Total cost (¤)
Configuration B 1,356
0.9871 415.71 0 415.71 23.1 0 55,192 304,618
0.9878 288.2 104.45 392.65 13.06 16.01 54,749 290,333
T (h) Simulation repetitions (runs)
Configuration C 600 0.9842 128.61 380.56 509.17 7.15 47.58 76,614 382,116
Configuration D 4,000 0.9871 416.62 0.03 416.65 23.15 0.004 55,557 305,547
32,200 500
UEC(tp) [€/h]
9.03 9.029 9.028 9.027 9.026
1410
1420
1430
1440
1450
1460
1470
1480
1490
1500
tp [h]
Fig. 9.21 UEC minimization, type I model including Tp and Tf . Numerical example
Type I model with T p and T f - tp* determinaon 350000
Costs [€]
300000 250000 200000 150000
Maintenance cost [
100000
Total cost [ €]
50000 0
Maintenance cost [ €]
Config. A 55192
Config. B 54749
Config. C 76614
Config. D 55557
Config. F 53631
Total cost [ €]
304618
290333
382116
305547
285465
System Configuraon
Fig. 9.22 Maintenance cost and total cost. Type I model with Tp and Tf
0.988 294.42 91.97 386.39 16.36 11.49 53,631 285,465 32,200 2,000
9.031
9.025 1400
Configuration F 1,445
9.6 Time-Based Preventive Replacement Including Duration of Replacements
9.6.2 Type I Model with Duration of Replacement for Weibull Distribution of ttf
335
scenario A is UEC.tp / Cp R.tp / C Cf Œ1 R.tp / R tp .tp C Tp /R.tp / C 1 tf .t/ dt C Tf Œ1 R.tp /
8 1 5 1 8 tp C 50 81 tp ˆ ˆ ˆ 2
ˆ ˆ ˆ .tp C 0:5/ 1 1 tp C tp C 0:5 1 tp ˆ 8 16 8 ˆ ˆ ˆ ˆ tp 5 C 45 ˆ 8 ˆ ; 0 tp < 4 ˆD ˆ 1 2 ˆ 0:5 C t t p ˆ p 16 ˆ ˆ ˆ
t 1
7t < 5 6 p C 50 p 6 D ˆ
7t t 2 4
t 1 ˆ ˆ .tp C 0:5/ 6 p C p12 C 0:5 p 6 ˆ ˆ ˆ ˆ 45tp 15 ˆ ˆ D2 ; 4 tp < 7 ˆ ˆ ˆ 2 C 14tp tp2 ˆ ˆ ˆ ˆ ˆ ˆ 50 ˆ ˆ D 11:76; tp 7 : 45 12 C 0:5
D Figure 9.23 presents the UEC for different Weibull distributions of ttf and adopting the analytical model of Eq. 9.19. In particular, different values of shape parameter ˇ have been considered and the following assumptions: • Cp and Cf equal to 100 and 1,000 units of cost (e. g., dollars or euros), respectively; • Tp and Tf equal to 0.5 and 1 unit of time (e. g., hour or day), respectively. For values of ˇ greater than 1 it is possible to identify an optimal value of tp in terms of units of time (e. g., hours or days). Values of the shape parameter lower than 1 are not supported by a best tp value, as clearly demonstrated by Figs. 9.23 and 9.24 based on different scaling of the axes. Figure 9.25 presents the trends of UEC for different values of shape parameter ˇ (i. e., b in the figure) and Cp , when Cf is 1,000 units of cost. If Cp passes from 100 to 10 units of cost, the UEC is reduced and in the case of the existence of an optimal tp value (e. g., ˇ D 3) this cost is reduced further.
9.6.3 Numerical Example 2: Type I Model with Durations Tp and Tf Consider the previously introduced example (Sect. 9.5.1) for the determination of the optimal tp in accordance to the original type I model. Replacement times are supposed to assume values in agreement with four different operating scenarios (A, B, C, D): A. B. C. D.
Tp Tp Tp Tp
D 0:5, Tf D 0:5; D 1, Tf D 1; D 0:5, Tf D 1; D 0:25, Tf D 0:5.
These scenarios have been simulated and the results are summarized in Table 9.6 and illustrated in Fig. 9.26. In particular, the configuration assumed by the expected replacement cost per unit time in
.103 ¤=week/: By changing the values assumed by Tp and Tf , one can identify the best value tp of the interval of preventive replacement tp . In particular, for simulated scenario B (where Tp D 1 week and Tf D 1 week) the best tp seems to be equal to 0, i. e., after a replacement is complete (duration 1 week) a new replacement is executed and the production system pays ¤ 5,000 every week because the cycle length is 1 week. This result clearly demonstrates that Eq. 9.19 minimizes the maintenance cost dealing with preventive and failure replacements, and not the global production cost, which also quantifies the system costs for unproductive operating periods. In other words, if Tp and/or Tf increase/increases, the unproductive cost (“lost production cost”) contribution increases too: as a consequence, Cp and Cf need to be updated in order to properly quantify the global system operating cost. This is the reason why in the applications illustrated in Sects. 9.5.1, 9.5.2, and 9.6.1, and supported by the Monte Carlo simulation, Cp and Cf include the variable nonproduction cost equal to ¤ 600 per hour.
9.6.4 Practical Shortcut to tp Determination How should one quickly compute the optimal age replacement interval, given a Weibull density function of the ttf random values? Legat et al. (1996) presented
336
9 Basic Models and Methods for Maintenance of Production Systems Type I model with duration of replacements (Tp=0.5, Tf=1). Weibull distributions
100 Weibull (a=50, b=1), Cp=100, Cf=1000 Weibull (a=50, b=2), Cp=100, Cf=1000 Weibull (a=50, b=3), Cp=100, Cf=1000 Weibull (a=50, b=4), Cp=100, Cf=1000 Weibull (a=50, b=0.5), Cp=100, Cf=1000 Weibull (a=50, b=0.2), Cp=100, Cf=1000
90
80
70
UEC(tp)
60
50
40
30
20
10
0
0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
150
160
170
180
190
200
tp [unit of time]
Fig. 9.23 Weibull distribution of ttf. Type 1 model with duration of replacements (Tp D 0:5, Tf D 1) Type I model with duration of replacements (Tp=0.5, Tf=1). Weibull distributions 30 Weibull (a=50, b=1), Cp=100, Cf=1000 Weibull (a=50, b=2), Cp=100, Cf=1000 Weibull (a=50, b=3), Cp=100, Cf=1000 Weibull (a=50, b=4), Cp=100, Cf=1000 Weibull (a=50, b=0.5), Cp=100, Cf=1000 Weibull (a=50, b=0.2), Cp=100, Cf=1000
25
UEC(tp)
20
15
10
5
0
0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
150
160
170
180
190
200
tp [unit of time]
Fig. 9.24 Weibull distribution of ttf. Type I model with duration of replacements (Tp D 0:5, Tf D 1)
a table containing the results of minimizing Eq. 9.1 for different combinations of the Weibull shape parameter ˇ and the cost ratio Cf =Cp , assuming ˛ D 1 for the characteristic life parameter (this is the so-called scaled Weibull distribution). The table is reported in Table 9.7. To exemplify this, consider the numerical example illustrated in Sect. 9.5.1 where Cf D ¤ 13,200 per action and Cp D ¤ 6,050 per action, ˇ D 2:1, and ˛ D 1;531:4 h. From the values reported in Table 9.7
it is possible to quantify the following combination of normalized5 replacement times: tp;1 D f .Cf =Cp D 2:0; ˇ D 2:0/ D 1:094; D f .Cf =Cp D 2:5; ˇ D 2:0/ D 0:866; tp;2 tp;3 D f .Cf =Cp D 2:0; ˇ D 2:5/ D 0:866; tp;3 D f .Cf =Cp D 2:0; ˇ D 2:5/ D 0:744: 5
Because it is expressed as a multiple of ˛. In other words it refers to the scaled Weibull distribution.
9.6 Time-Based Preventive Replacement Including Duration of Replacements
337
Type I model wit duration of replacement (Cf=1000, Tp=0.5, Tf=1) 200 Weibull (a=50, b=1), Cp=100 Weibull (a=50, b=1), Cp=10 Weibull (a=50, b=3), Cp=100 Weibull (a=50, b=3), Cp=10 Weibull (a=50, b=0.2), Cp=100 Weibull (a=50, b=0.2), Cp=10
180
160
140
UEC(tp)
120
100
80
60
40
20
0
0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
150
160
170
180
190
200
tp [unit of time] Type I model wit duration of replacement (Cf=1000, Tp=0.5, Tf=1) 200 Weibull (a=50, b=1), Cp=100 Weibull (a=50, b=1), Cp=10 Weibull (a=50, b=3), Cp=100 Weibull (a=50, b=3), Cp=10 Weibull (a=50, b=0.2), Cp=100 Weibull (a=50, b=0.2), Cp=10
180
160
140
UEC(tp)
120
100
80
60
40
20
0
0
10
20
tp [unit of time]
Fig. 9.25 Weibull distribution of ttf. Type 1 model with duration of replacements. ˇ and Cp variables
Table 9.6 Multiscenario analysis. Type I with Tp and Tf . UEC(tp / values Id scenario
Tp
Tf
0
1
2
3
tp 4
5
6
7
8
A B C D
0.5 1 0.5 0.25
0.5 1 1 1
10.00 5.00 10.00 20.00
7.39 5.48 7.08 8.29
7.22 5.91 6.84 7.43
7.45 6.36 7.00 7.37
7.86 6.88 7.33 7.59
8.94 7.92 8.24 8.40
10.20 9.11 9.27 9.36
11.76 10.53 10.53 10.53
11.76 11.76 11.76 11.76
338
9 Basic Models and Methods for Maintenance of Production Systems 20 18
UEC(tp)
16 14
A
12
B
10
C
8
D
6 4 2
Fig. 9.26 Type I withTp and Tf . UEC.tp / values. Scenarios A, B, C, and D
0
1
2
3
4
5
6
7
8
tp
Table 9.7 Values of optimal age replacement interval tp as a multiple of the Weibull parameter ˛ Cf =Cp 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8 8.5 9 9.5 10 20 50 100
1.5
2
2.5
ˇ 3
3.5
4
4.5
4.932 2.186 1.486 1.162 0.97 0.842 0.75 0.678 0.622 0.576 0.538 0.506 0.478 0.454 0.432 0.412 0.396 0.38 0.228 0.12 0.076
1.694 1.094 0.866 0.74 0.656 0.596 0.55 0.512 0.482 0.456 0.434 0.416 0.398 0.384 0.37 0.358 0.348 0.338 0.232 0.144 0.102
1.214 0.886 0.744 0.658 0.6 0.556 0.522 0.494 0.472 0.452 0.434 0.42 0.406 0.394 0.382 0.372 0.364 0.356 0.264 0.18 0.136
1.044 0.812 0.704 0.638 0.59 0.556 0.528 0.504 0.484 0.468 0.452 0.44 0.428 0.418 0.408 0.398 0.39 0.384 0.298 0.218 0.172
0.964 0.78 0.692 0.636 0.596 0.566 0.542 0.52 0.504 0.488 0.476 0.464 0.452 0.444 0.434 0.426 0.42 0.412 0.334 0.254 0.208
0.92 0.768 0.692 0.642 0.608 0.58 0.558 0.54 0.524 0.51 0.498 0.488 0.478 0.468 0.46 0.454 0.446 0.44 0.366 0.288 0.242
0.894 0.762 0.696 0.652 0.62 0.596 0.576 0.558 0.544 0.532 0.52 0.51 0.5 0.492 0.486 0.478 0.472 0.466 0.394 0.32 0.274
Now it is possible to apply the linear interpolation method6 to determine new replacement times tp;4 and tp;5 : f .Cf =Cp D 2:18; ˇ D 2:0/ f .Cf =Cp D 2:0; ˇ D 2:0/ 2:18 2 f .Cf =Cp D 2:5; ˇ D 2:0/ f .Cf =Cp D 2:0; ˇ D 2:0/ ; D 2:5 2 6
A first time to tp;1 and tp;2 , a second time to tp;3 and tp;4
f .Cf =Cp D 2:18; ˇ D 2:0/ 1:094 0:18 0:866 1:094 D 0:456; D 0:5 tp;4 D f .Cf =Cp D 2:18; ˇ D 2:0/ Š 1:012:
9.7 Block Replacement Strategy – Type II
339
Similarly,
the number of intervening failures, where failed units are replaced at failure. The following model, named “constant-interval replacement policy” or “type II” by its proponents Barlow and Hunter (1960) quantifies and sets the cost of replacement per unit time at a minimum:
f .Cf =Cp D 2:18; ˇ D 2:5/ f .Cf =Cp D 2:0; ˇ D 2:5/ 2:18 2 f .Cf =Cp D 2:5; ˇ D 2:5/ f .Cf =Cp D 2:0; ˇ D 2:5/ ; D 2:5 2
UEC.tp / D
f .Cf =Cp D 2:18; ˇ D 2:5/ 0:866 0:18 0:744 0:866 D 0:244; D 0:5 tp;4 D f .Cf =Cp D 2:18; ˇ D 2:5/ Š 0:822:
Consequently, it is possible to apply the linear interpo lation for a third time to tp;4 and tp;5 as follows: f .Cf =Cp D 2:18; ˇ D 2:1/ f .Cf =Cp D 2:18; ˇ D 2:0/ 2:1 2 D
f .Cf =Cp D 2:18; ˇ D 2:5/ f .Cf =Cp D 2:18; ˇ D 2:0/ 2:5 2
Cp C Cf W .tp / ; tp
where W .t/ is the expected number of failures in the interval .0; t/. Figure 9.27 illustrates the sequence of maintenance actions during two cycles of tp units of time. The duration of replacement is supposed to be equal to 0 (i. e., instantaneous replacement). Differentiating the right-hand side of Eq. 9.20 with respect to the length of the preventive replacement interval tp and equating it to zero, one obtains tp w.tp / W .tp / D
;
f .Cf =Cp D 2:18; ˇ D 2:1/ 1:012 0:1 0:822 1:012 D 0:38; D 0:5 tp;4 D f .Cf =Cp D 2:18; ˇ D 2:1/ Š 0:974:
Now the value of optimal replacement time tp is tp .˛ D 1531:4; ˇ D 2:1; Cf =Cp D 2:18/ Š 0:974 1531:4 Š 1;491:6 h: This value is very close to the exact value of 1;429 h obtained by the application of numerical and continuous simulation, as illustrated in Sect. 9.5.1.
9.7 Block Replacement Strategy – Type II This model, also known as the group replacement policy model, is suitable for the determination of the optimal preventive replacement intervals of items subject to breakdown. A preventive replacement is performed on the unit at periodic intervals tp , regardless of
(9.20)
Cp ; Cf
(9.21)
where w.t/ is the derivative of W .t/ called the “renewal density function”.7 The renewal density function is defined for stochastic processes based on identically distributed variables as described in the next section. As a consequence, the basic assumptions for applying the renewal theory are the hypotheses “as good as new” and the instantaneous replacement. Both hypotheses characterize the maintenance replacement rule type II as defined in the current section. For this reason, the failure rate function w.t/ introduced in Sect. 5.10 for the determination of the expected number of failures W .t/ in a generic repairable component/system subject to to function, failure, and repair (FFR) cycles generally differs from the renewal density function m.t/, called w.t/ for simplicity in Eq. 9.21. In fact, for a repairable item the W .t/ function quantifies the expected number of failures considering a sequence of multiple operative cycles separated by failures and characterized by repair activities and repair variable/stochastic times (ttr values). Nevertheless, the following basic equation8 is true for both repairable items subject to FFR cycles and components replaced in accordance with the “as good as new” hy7 8
Introduced in Sect. 5.10 and called m.x/ See Eq. 5.75
340
9 Basic Models and Methods for Maintenance of Production Systems Preventive action
Corrective actions
Corrective actions
tp
0
Preventive action
2tp
time
tp
tp Fig. 9.27 Block replacement strategy – type II
pothesis adopted in the type II model: Zt W .t/ D EŒN.t/ D
w.t/ dt ;
(9.22)
0
where N.t/ is the number of failures in the interval .0; t/ and EŒ::: is the expectation function. A special class of renewal process is known as an alternating renewal process because the state of the component alternates between a functioning state and a repair state. In other words, the process is a sequence of independent and nonnegative random variables representing the ttf and the ttr/time to restore variables. The alternating renewal process is not the subject of this book.
9.7.1 Renewal Process A large class of stochastic processes are renewal processes. This class of processes is used to model independent identically distributed occurrences. The hypothesis of identically distributed random variables cannot be applied to the previously defined alternating renewal process. Let Y1 ; Y2 ; Y3 ; : : : be independent identically distributed and positive stochastic variables, and set Tn D Y 1 C Y 2 C C Y n : The following process X.t/ is called a “renewal process”: X.t/ D maxfTn t g; n
where t > 0.
(9.23)
The process is named “renewal” because of the fact that every time there is an occurrence, the process “starts all over again,”, i. e., it renews itself. As a consequence, Yi and Yj are independent for i ¤ j , and cov.Yi ; Yj / D 0, where cov.:::/ is the covariance function. With reference to the renewal process related to the failure process, as illustrated by Jardine and Tsang (2006), the number of expected failures in t is correctly quantified by Eq. 9.22. The authors also introduced the random variable Sr def ined as follows: Sr D t1 C t2 C C tr ;
(9.24)
where t1 ; t2 ; : : : ; tr are intervals between failures. As a consequence, the following equation quantifies the probability that variable t lies between the rth and the .r C 1/th failures: N.t/ D r:
(9.25)
The following set of equations can be properly justified: P ŒN.t/ < r D P .Sr > t/ D 1 P .Sr < t/ D 1 Fr .t/; (9.26) P ŒN.t/ > r D P .SrC1 < t/ D FrC1 .t/;
(9.27)
where Fr .t/ is the cumulative distribution function of variable Sr . Equation 9.26 measures the probability of cumulating fewer than r failures in a period of time t. The complementary equation (Eq. 9.27) obviously measures the probability of cumulating fewer than r failures in t, as illustrated in Fig. 9.28.
9.7 Block Replacement Strategy – Type II
341
N(t)
Sr = t1 + t2 +…+ tr S3 = t1 + t2 +t3 S2 = t1 + t2 S1 = t1
t t1
t2
1
t3
2
t4
3
tr
4
r-1
time tr+1
r
r+1
0
number of failures
Fig. 9.28 Renewal process. Variable t and Sr
As a consequence, it is possible to accept the following identical equations: ( P ŒN.t/ < r C P ŒN.t/ D r C P ŒN.t/ > r D 1 P ŒN.t/ D r D Fr .t/ FrC1 .t/: (9.28) Now the expected value W .t/ of N.t/ can be quantified by the following equation: EŒN.t/ D D
1 X rD0 1 X
rŒFr .t/ FrC1 .t/ D
0
Then,
rP ŒN.t/ D r
rD0
In particular, if ttf is distributed in accordance with a negative exponential function, from Eq. 9.29, 8 f .s/ ˆ ˆ W D 2 .s/ D ˆ ˆ sŒ1 f .s/ s ˆ ˆ ˆ < f .s/ D LŒf .t/ D et (9.31) ˆ Z1 ˆ ˆ ˆ ˆ D : est f .t/ dt D ˆ ˆ : Cs
1 X
Fr .t/:
rD1
(9.29)
W .t/ D L1 W .s/ D 2 D t: s
(9.32)
As a consequence, the number of expected failures increases as a linear function of time t.
9.7.2 Laplace Transformation: W(t) and w(t)9 Determination
9.7.3 Renewal Process and W(t) Determination, Numerical Example
Applying Laplace integral transforms to both sides of Eq. 9.29, we have (Jardine and Tsang 2006) 8 f .s/ ˆ ˆ W .s/ D ˆ ˆ ˆ sŒ1 f .s/ < (9.30) Z1 ˆ ˆ st ˆ f .s/ D LŒf .t/ D e f .t/ dt ; ˆ ˆ :
In order to exemplify the application of the Laplace transform consider the following probability distribution of the random variable ttf:
0
where f .t/ is the probability density function of the random variable ttf. 9 This is m.x/, the renewal density function introduced in Sect. 5.5 in accordance with the basic hypotheses of the renewal process as illustrated in Sect. 9.7.1.
f .t/ D
1 ; 10
0 t 10:
Applying Laplace transforms, f .s/ D
1 ; 10s
1 ; s.10s 1/ 1 1 W .t/ D 2 exp t sinh t : 20 20
W .s/ D
342
9 Basic Models and Methods for Maintenance of Production Systems
Figure 9.29 presents the values assumed by W .t/ in the case of immediate replacement of failed components and t D Œ0; 10. Similarly, Fig. 9.30 presents the trend of W .t/ for the period of time t D Œ0; 100. How is possible to determine the renewal density w.t/ for an item subject to a renewal process? By Eq. 9.22, w.t/ D
In particular, considering the example illustrated in this section, 1 1 1 dW .t/ D exp t sinh t w.t/ D dt 10 20 20 1 1 1 C exp t cosh t : 10 20 20 It is important to remember that this w.t/ is not the generic failure rate function defined for a repairable
dW .t/ : dt
1.8
W(t)=2*exp(1/20*t)*sinh(1/20*t)
1.6 1.4 1.2 1 0.8 0.6 0.4 0.2
Fig. 9.29 Renewal process. Transforms of Laplace W .t /, t D Œ0; 10
0
0
1
2
3
4
5 t
6
7
8
9
10 -3 0
10
20
30
40
50
60
70
80
90
10
10 5
10 4
10 3
W(t)
10 2
10 1
10 0
10 -1
10 -2
Fig. 9.30 Renewal process. Transforms of Laplace W .t /, t D Œ0; 100
t
100
9.7 Block Replacement Strategy – Type II
343 2 1.8 1.6
w(t). λ(t). f(t)
1.4 1.2
f(t)
1
λ(t) w(t)
0.8 0.6 0.4 0.2 0
Fig. 9.31 Renewal process .t /, f .t /, and w.t /, numerical example
0
1
f .t/ f .t/ D R1 R.t/ t f .x/ dx D
3
4
5
6
7
8
9
10
t
component subjected to a sequence of operative cycles (FFR), i. e., a sequence of failures and repairs (see footnote 9). Figure 9.31 presents the trend assumed by renewal density w.t/, .t/, and f .t/. In particular, for a repairable component the rate function .t/ represents the failure rate at point in time t measured from the last replacement: .t/ D
2
1 1=10 D : 1 .1=10/t 10 t
As a consequence, it is not correct to strictly compare these functions which are defined for different ranges of values: Œ0; 10 for .t/ and f .t/, Œ0; 1/ for w.t/.
9.7.4 Numerical Example, Type II Model This example relates to the application introduced in Sect. 9.4.2. The component is subject to preventive maintenance and possibly corrective maintenance actions in accordance with the model of Eq. 9.20. In particular, Monte Carlo analysis has been applied to different operating scenarios, from configuration A, corresponding to the absence of preventive maintenance actions, to configuration F identified in Sects. 9.4.2,
9.5.1, and 9.6.1. Configuration G will be properly justified in Sect. 9.9.1.1 (the application of the so-called Type I – Minimum Downtime model will justify a replacement time equal to 1,392 h). The proposed scenarios differ from the value of tp adopted in Eq. 9.20. Both preventive and corrective actions perform replacements in accordance with the “as good as new” hypothesis. Figure 9.32 shows that corrective maintenance downtime increases when the value of tp increases too, while preventive maintenance downtime decreases. In terms of maintenance and total costs the first scenario, configuration A, turns out to be the best one. In order to identify the best value of tp , in accordance with Eq. 9.20, it is necessary to quantify the renewal function W .t/, i. e., the expected number of failures. For the two-parameter Weibull distribution, W .t/ is not computable in explicit form, and for its guesstimate several functions, lower and upper bounds not the subject of this book (e. g., Soland 1969; Bilgen and Deligönül 1987; Constantine and Robinson 1997; Yannaros 1994; Jiang et al. 2006; Politis and Koutras 2006), are outlined in the literature. In particular, Soland (1969) and Constantine and Robinson (1997) presented useful tables for computing the renewal function W .t/. In order to evaluate W .t/ as an approximation, two alternative and practical methods are proposed:
344
9 Basic Models and Methods for Maintenance of Production Systems 600
500
downtime [h]
400
300
CM downtime [h] PM downtime [h] Total downtime [h]
200
100
0
Config. A
Config. C
Config. B
Config. G
CM downtime [h]
416
124
253
259
265
263
PM downtime [h]
0
421
182
182
174
174
63
416
545
435
441
438
436
427
Total downtime [h]
Config. E
Config. F
Config. D 364
Fig. 9.32 Downtime contributions, type II model
1. assuming w.t/ D .t/; 2. applying Monte Carlo simulation analysis.
9.7.4.1 Approximation Method 1 for W(t) In this method Eq. 5.75 is applied as follows: Zt W .t/ D 0
Z t b1 x b b x dx D .x/ dx D : a a a 0
The trend of the approximated renewal function W .t/ is illustrated in Fig. 9.33. Figure 9.34 reports the estimated values of UEC(tp ) as a result of the application of the analytical model (Eq. 9.20), and assuming the W .t/ trend in Fig. 9.33 and the following unit costs10 : Cf D ¤ 13,200 per action and Cp D ¤ 6,050 per action. The minimum value of UEC(tp / obtained is about ¤ 11.45 per hour, for tp equal to 1;043 h. For t equal to 1,043 time units the number of expected failures obtained by the assumption w.t/ D .t/ is about 0.445. This value can now be compared with the values obtained by the application of the Monte Carlo simulation, see Table 9.8. In particular, in configuration B the expected number of failures W .t D 32;200 h/ was about 14.09, 10
In coherence with the hypotheses summarized in Table 9.1
assuming the hypothesis of random failure events but also constant repair times (see ttr values in Table 9.1). It is worth observing that this value obtained by the Monte Carlo simulation refers to 32;200 h, while the number of failures, 0.445, obtained by the assumption w.t/ D .t/ refers to a period of time of 1;043 h. This is a good result as it can be checked by a simple proportion: 14:09 1043 Š 0:456; 32;200 which is a value very close to 0.445. UEC evaluation In order to complete the comparative what-if analysis conducted on different values of tp (see Table 9.8 Conf. A–G), we also present the results obtained by the application of Monte Carlo simulation assuming tp D 1;043 h, T D 32;200 h, and 2,000 repetitions/runs (we call this configuration H). These results are obtained for configuration H: mean availability 0.9861, corrective maintenance downtime 209.87, preventive maintenance downtime 237.22, total downtime 447.10, W .T / D 11:67, number of preventive maintenance actions 29.65, maintenance cost ¤ 65,051, and total cost ¤ 333,311. These values further suggest it would not be useful
9.7 Block Replacement Strategy – Type II
345
Table 9.8 Performance evaluation and comparison, type II model Type II tp (h)
Configuration A –
Configuration C 600
Configuration B 1,356
Configuration G 1,392
Configuration E 1,429
Configuration F 1,445
Configuration D 4,000
Mean availability CM downtime (h) PM downtime (h) Total downtime (h) W .T / (failures) Number of PR
0.9871 416 0 416 23.1 0
0.9831 124 421 545 6.9 52.58
0.9865 253 182 435 14.09 22.72
0.9863 259 182 441 14.4 22.71
0.9864 265 174 438 14.71 21.7
0.9864 263 174 436 14.6 21.71
0.9867 364 63 427 20.23 7.91
Maintenance cost (¤) Total cost (¤)
55,192 304,618
82,285 409,189
62,210 323,390
62,935 327,409
T (h) Simulation repetitions (runs)
62,415 325,377
62,164 323,986
58,429 314,833
32,200 2,000
W(t)
W(t) determination by failure rate
Fig. 9.33 W .t / determination by the failure rate .t /
130 120 110 100 90 80 70 60 50 40 30 20 10 0 0
2000
to apply a preventive maintenance strategy based on the type II replacement rule to the current case study. How is it possible that Fig. 9.34 clearly identifies an optimal value of tp and what-if analysis demonstrates that it is not economic to apply a type II based preventive replacement? First of all, the analytical model illustrated by Eq. 9.20 does not consider the replacement times (Tp and Tf introduced in Sect. 9.6) which influence the alternating renewal process as a sequence of ttf and ttr values (see the definition introduced in Sect. 9.7) and the number of replacement cycles in the simulated period of time T , e. g., 32;200 h. In general, during a cycle of preventive replacement, therefore, some corrective actions take place, i. e., replacements based on the as good as new hypothesis. This is in contrast with the assumption w.t/ D .t/, because the density function w.t/ assumed to quantify the expected number of failures W .t/ is de-
4000
6000
8000
10000
12000
14000
t
fined for the whole preventive cycle that can include several corrective replacements. Consequently, by the assumption w.t/ D .t/, the function w.t/ increases during a preventive cycle as exemplified in Fig. 9.35 and corrective actions cannot be based on the as good as new hypothesis. The method illustrated in next section tries to bypass the limit of adopting w.t/ D .t/.
9.7.4.2 Approximation Method 2 for W(t) Evaluation By the application of the simulation analysis to the case study introduced for the first time in Sect. 9.4.2, it is possible to quantify the expected number of failures of the component in a period of time T . In particular, assuming T D 2;000 h the following what-if scenarios, based on different values of the corrective replace-
346
9 Basic Models and Methods for Maintenance of Production Systems UEC(tp) - Type II, W(t) approx. 11.85 11.8 11.75
UEC(tp)
11.7 11.65 11.6 11.55 11.5 11.45 11.4
Fig. 9.34 UEC(tp ) type II, numerical example
800
850
900
950
1000
1050
Corrective actions
0
Preventive action Corrective actions
tp
W1(t)
130
120
120
110
110
100
100
90
90
80
80
70
70
60
60
50
50
40
40
30
30 20
10 0
time
W(t) determination by failure rate
130
20
10 0
1000
2000
3000
4000
5000
6000
7000
8000
9000
0 10000 11000 12000 13000 14000 1500 0
1000
t
Fig. 9.35 Preventive maintenance and preventive replacement (PR) cycles when w.t / D .t /
2000
3000
4000
5000
6000
7000
8000
9000
10000 11000 12000 13000 14000 1500
t
PR cycle 1 w1(t)=λ(t) t0=0 t=time- t0
PR cycle 2 w2(t)=λ(t) t0=tp t=time- tp
t
t
W1 (t ) = ∫ w1 ( x)dx
W2 (t ) = ∫ w2 ( x)dx
W (t ) = W1 (t )
W (t ) = W1 (t ) + W2 (t )
0
ment time Tf and the restoration factor q 11 , have been simulated and compared, as illustrated in Fig. 9.36:
11
1200
Preventive action
2tp
W(t) determination by failure rate
Tf Tf Tf Tf
1150
tp
tp
1. 2. 3. 4.
1100
tp
D 18 h, q D 1 (as good as new hypothesis); D 18 h, q D 0 (minimal repair hypothesis); D 0 h, q D 1 (as good as new hypothesis); D 18 h, q D 0 (minimal repair hypothesis).
The percentage to which a component is restored after the execution of the maintenance action. In particular q D 1 corresponds to the well-known “as good as new hypothesis,” while q D 0 corresponds to the minimal repair hypothesis properly defined in Sect. 9.11.
0
UEC evaluation It is possible to quantify UEC(tp /, as illustrated in Fig. 9.37, by entering the values of W .t/ obtained in Eq. 9.20. Table 9.9 summarizes in detail the minimum values of UEC(tp / for scenarios A–D. From the values of UEC(tp / obtained, we see that the preventive maintenance replacement strategy based on the type II policy when q=1, i. e., in the presence of the “as good as new” hypothesis12, is not so attractive. This con clusion is coherent with the simu12
See footnote 11.
9.7 Block Replacement Strategy – Type II
347 1.8 1.6 1.4
W(T)
1.2 1.0
W(Tf=18,q=1) W(Tf=18,q=0)
0.8
W(Tf=0,q=1) W(Tf=0,q=0)
0.6 0.4 0.2
Fig. 9.36 Evaluation of the expected number of failures by simulation
0 0
200
400
600
800
1000
1200
1400
1600
1800
2000
me-t
100 90 80 70
UEC(tp)
60
UEC(A)
50
UEC(B) UEC(C)
40
UEC(D) 30 20 10 0 0
Fig. 9.37 UEC values, scenarios A–D
200
400
600
800
1000
1200
1400
1600
1800
2000
me replacement-tp
Table 9.9 UEC(tp ) values, scenarios A–D Scenario
Tf (h)
q
tp (h)
UEC(tp ) (¤=h)
A B C D
18 18 0 0
1 0 1 0
> 2;000 900 1,960 880
– 11.09 10.28 11.42
lated results obtained by the application of the Monte Carlo analysis and reported in Table 9.8 (see also the previously introduced configuration H, Sect. 9.7.4.1).
Otherwise, assuming the restoration factor q equal to 0, i. e., in the presence of “minimal repair” maintenance actions, the best value of tp exists in coherence with the previously illustrated analysis, conducted with basic and simplifying hypothesis w.t/ D .t/, whose results are illustrated in Fig. 9.34. This numerical example demonstrates how important it is to quantify the expected number of failures W .t/.
348
9 Basic Models and Methods for Maintenance of Production Systems
9.7.5 Discrete Approach to W(t) In the following a discrete analytical method to predict the expected number of failures in a period of time T , made up of several units or intervals, is presented. The basic assumption is that no more than one failure can occur in any unit of the period. As a consequence, this hypothesis is not so restrictive because it is possible to define the desired number of intervals for a given period of time T . The number of expected failures W .T / occurring in the interval .0; t D T / can be considered as the sum of the following contributions (as illustrated in Fig. 9.38): 1. Number of expected failures occurring in .0; T / when the first failure occurs in the first period .0; 1/, multiplied by the probability of the first failure occurringR in the interval .0; 1/, i. e., P .0 1 ttf < 1/ D 0 f .t/ dt . The number of expected failures that occurs in the interval .0; T / when the first failure occurs in the first period is 1 (e. g., the failure occurred in the first week) plus the expected number of failures in the remaining T 1 periods W .T 1/. 2. Number of expected failures that occur in interval .0; T / when the first failure occurs in the second period multiplied by the probability of the first failure occurring in the interval .1; 2/. 3. ...
T . Number of expected failures that occur in interval .0; T / when the first failure occurs in the T th period multiplied by the probability of the first failure occurring in the interval .T 1; T /. The events .1; : : : ; T / described above are disjunctive. As a consequence, W .T / D
T 1 X
Zi C1 Œ1 C W .T i 1/ f .t/ dt :
i D0
i
(9.33) For example, the number of expected failures in the interval (0,5) is Z1 W .T D 5/ D Œ1 C W .4/ f .t/ dt 0
Z2 C Œ1 C W .3/ f .t/ dt 1
C Œ1 C W .2/
Z3
C Œ1 C W .1/
Z4
2
3 T ZD5 f .t/ dt : C Œ1 C W .0/
N(T)
0
2
3
1
T-1
1+W(T-1)
∫ f (t )dt 0
2
1+W(T-2)
∫ f (t )dt 1
3
∫ f (t )dt 2
1+W(T-3)
…
Fig. 9.38 W .T / determination. Discrete approach
T
f .t/ dt
4
1
f .t/ dt
time period
(9.34)
9.7 Block Replacement Strategy – Type II
349
Z1 D Œ1 C W .2/ f .t/ dt
9.7.6 Numerical Examples
0
The following two examples illustrate the application of the discrete approach for the determination of the expected number of failures W .t/ during a renewal process and in accordance with the hypotheses previously introduced. Example 2 also exemplifies the determination of the best tp value by the application of preventive replacement model type II, i. e., the minimization of UEC(tp /.
Z2 C Œ1 C W .1/ f .t/ dt 1
C Œ1 C W .0/
D Œ1 C W .0/ŒF .3/ F .2/ Š 0:022;
W .T D 4/ D
Zi C1 Œ1 C W .T i 1/ f .t/ dt i
D Œ1 C W .3/
Z1
f .t/ dt
0
Z2 C Œ1 C W .2/ f .t/ dt Z1
W .T D 1/ D Œ1 C W .0/
1
Z3 C Œ1 C W .1/ f .t/ dt
f .t/ dt 0
2
D F .t D 1/ F .t D 0/ D
3 X i D0
W .0/ D 0;
zD tMTTF .TTF/
o f .t/ dt
2
9.7.6.1 Numerical Example 1 The variable ttf is assumed to be distributed in accordance with a normal distribution with a mean of 6 weeks and a standard deviation of 1.5 weeks. By the application of Eq. 9.33,
Z3
n
C Œ1 C W .0/
˚.z D 3:34/ ˚.z D 4/ Š 0;
Z4
f .t/ dt
3
W .T D 2/ D
1 X
Zi C1 Œ1 C W .T i 1/
i D0
D Œ1 C W .1/
f .t/ dt
Z3 2
f .t/ dt C
W .T D 5/ D
4 X
Zi C1 Œ1 C W .T i 1/ f .t/ dt
i D0
D Œ1 C W .4/
1
Z2
0
3
f .t/ dt
Z2 C Œ1 C W .0/ f .t/ dt
D
f .t/ dt
D F .4/ F .2/ D 0:087;
0
Z1
f .t/ dt C
Š
i
Z1
Z4
f .t/ dt
i
Z1
f .t/ dt
0
Z2 C Œ1 C W .3/ f .t/ dt
1
Š F .2/ F .1/ Š 0:0034;
1
Zi C1 2 X W .T D 3/ D Œ1 C W .T i 1/ f .t/ dt i D0
i
C Œ1 C W .2/
Z3
f .t/ dt
2
350
9 Basic Models and Methods for Maintenance of Production Systems
Z4 C Œ1 C W .1/ f .t/ dt C Œ1 C W .0/
Z3 C Œ1 C W .4/ f .t/ dt
3
2
Z5
Z4
C Œ1 C W .3/
f .t/ dt
3
4
Z5 Š .1:0034/ŒF .3/ F .2/ C
Z5 C Œ1 C W .2/ f .t/ dt
f .t/ dt 3
4
D 0:249; W .T D 6/ D
5 X
Zi C1 Œ1 C W .T i 1/ f .t/ dt
i D0
Z1
D Œ1 C W .5/
C 1:022ŒF .4/ F .3/
Z2 C Œ1 C W .4/ f .t/ dt
C Œ1 C W .2/ C Œ1 C W .1/
Z5
C Œ1 C W .0/
Z6
C 1:003ŒF .5/ F .4/ C ŒF .7/ F .5/ D 0:747:
1
Z4
f .t/ dt
Š 1:087ŒF .3/ F .2/
0
C Œ1 C W .3/
C Œ1 C W .0/
Z7
f .t/ dt
6
f .t/ dt
Z3
C Œ1 C W .1/
Z6 5
i
f .t/ dt
f .t/ dt
2
9.7.6.2 Numerical Example 2
f .t/ dt 3
f .t/ dt
4
f .t/ dt
Consider the component introduced in the example illustrated in Sect. 9.5.5 and related to the application of the type I replacement model. The value of the expected number of failures can be quantified by the application of the discrete approach: W .0/ D 0;
5
Š .1:022/ŒF .3/ F .2/
Z1
C 1:003ŒF .4/ F .3/ C ŒF .6/ F .4/
W .T D 1/ D Œ1 C W .0/
D 0:496; W .T D 7/ D
6 X
0
Zi C1 Œ1 C W .T i 1/ f .t/ dt
i D0
f .t/ dt
Z1 D
1 1 dt D D 0:125; 8 8
0
i
Z1
D Œ1 C W .6/
W .T D 2/ D
f .t/ dt
Zi C1 Œ1 C W .T i 1/ f .t/ dt
i D0
0
C Œ1 C W .5/
1 X
Z2
f .t/ dt
1
D Œ1 C W .1/
i
Z1
f .t/ dt
0
9.7 Block Replacement Strategy – Type II
351
Z2 C Œ1 C W .0/ f .t/ dt
Z5 C
1
4
Z1 Z2 1 1 1 dt C dt D 1C 8 8 8 0
22;185 1 D C Š 0:844; 32;768 6
1
1 1 1 17 D 1C C D Š 0:266; 8 8 8 64 Zi C1 2 X W .T D 3/ D Œ1 C W .T i 1/ f .t/ dt i D0
i
D Œ1 C W .2/
Z1
f .t/ dt
0
Z2 C Œ1 C W .1/ f .t/ dt 1
C Œ1 C W .0/
Z3
f .t/ dt
W .T D 6/ D
W .T D 4/ D
i D0
Zi C1 Œ1 C W .T i 1/ f .t/ dt i
1 Œ1 C W .3/ C 1 C W .2/ 8 C 1 C W .1/ C 1 C W .0/ 217 17 1 1 4C C C D 8 512 64 8 2;465 D Š 0:602; 4;096 D
W .T D 5/ D
4 X i D0
Zi C1 Œ1 C W .T i 1/ f .t/ dt i
1 Œ1 C W .4/ C 1 C W .3/ C 1 C W .2/ 8 1 C 1 C W .1/ C Œ1 C W .0/ 6 1 2;465 217 17 1 D 4C C C C 8 4;096 512 64 8
5 X
Zi C1 Œ1 C W .T i 1/ f .t/ dt
i D0
i
1 D Œ1 C W .5/ C 1 C W .4/ 8 C 1 C W .3/ C 1 C W .2/ 1 C Œ1 C W .1/ C 1 C W .0/ 6 1 22;185 1 2;465 D 4C C C 8 32;768 6 4;096 17 1 1 217 C C 2C C 512 64 6 8 Š 1:121;
2
1 D Œ1 C W .2/ C 1 C W .1/ C 1 C W .0/ 8 1 1 17 217 C Š 0:424; D 3C D 8 64 8 512 3 X
1 dt.1 C 0/ 6
W .T D 7/ D
6 X
Zi C1 Œ1 C W .T i 1/ f .t/ dt
i D0
i
1 D Œ1 C W .6/ C 1 C W .5/ 8 C 1 C W .4/ C 1 C W .3/ 1 C Œ1 C W .2/ C 1 C W .1/ 6 C 1 C W .0/ 22;185 1 4 C 1:121 C D 8 32;768 2;465 217 1 C C C 6 4;096 512 17 1 1 3C C C 6 64 8 Š 1:439:
D
Figure 9.39 illustrates the trend of W .T / values for the range Œ0; 7. Figure 9.40 illustrates the trend of UEC(tp / values for the range considered. The best value of tp is equal to 3 weeks, UEC(tp / D 8:73.
352
9 Basic Models and Methods for Maintenance of Production Systems Expected Number of Failures 1.6 1.4 1.2
W(T)
1 0.8 0.6 0.4 0.2 0
Fig. 9.39 Expected number of failures W .T /
0
1
2
3
4
5
6
7
T
UEC - Type II 12 11.5 11
UEC
10.5 10 9.5 9 8.5 8 7.5 7
1
Fig. 9.40 Type II model. Tp determination
2
3
4
5
6
7
Tp
Now, introducing the approximation w.t/ D .t/, Ztp EŒN.tp / D W .tp /approx. D
Ztp w.t/ dt Š
0
8 t p ˆ ˆZ 8 1 ˆ ˆ dt D ln ; ˆ ˆ ˆ 8t 8 tp ˆ ˆ ˆ 0 ˆ ˆ ˆ ˆ Z4 Ztp ˆ < 1 1 dt C dt D 8t 7t ˆ ˆ 0 ˆ 4 ˆ ˆ 3 ˆ ˆ D ln 2 C ln ˆ ˆ ˆ ˆ 7 tp ˆ ˆ 6 ˆ ˆ :D ln ; 7 tp
.t/ dt 0
0 t < 4;
9.7.7 Practical Shortcut to W(t) and tp Determination Similarly to Sect. 9.6.4, which relates to a practical shortcut for the optimal age replacement interval, this section presents a quick way to determine the renewal function W .t/. Smith (1954) proposed the following asymptotic approximation of W .t/, which is effective for large values of t: W .t/ Š
4 t < 7:
The values obtained, W .tp /approx. , approximate in a satisfactory way the expected number of failures W .tp / determined with the discrete approach, as illustrated in Fig. 9.41. Finally, Fig. 9.42 presents the values of UEC(tp / obtained by the application of both the approximation approach and the discrete approach.
t 2 2 C ; 22
(9.35)
where is the mean of an arbitrary lifetime density function f .t/ and 2 is the variance of f .t/. In particular, for the Weibull distribution Eq. 9.35 with ˇ < 4 gives good numerical accuracy for t 3 and reasonable relative accuracy for t 2 Œ1; 3, where t is the variable time for the scaled Weibull distribution.13 For larger ˇ the accuracy is not very good 13
Because ˛ is the “scale” parameter of the generic Weibull density function. The condition t D 3 for the scaled function is equal to the condition t =˛ D 3 for the generic Weibull function.
9.8 Maintenance Performance Measurement in Preventive Maintenance
353 W(T) W(T) approx.
Expected Number of Failures
2 1.8 1.6 1.4 W(T)
1.2 1 0.8 0.6 0.4 0.2 0
Fig. 9.41 Expected number of failures, numerical example
0
1
2
3
4
5
6
7
T
UEC(Tp)
UEC - Type II
12
UEC(Tp) approx.
11.5 11 10.5 UEC
10 9.5 9 8.5 8 7.5
Fig. 9.42 UEC(Tp /. “Discrete” calculus compared with “approximation” calculus
7 1
2
for moderate values of t (Constantine and Robinson 1997). To exemplify this, consider the numerical example introduced in Sect. 9.4.2 and illustrated in Sect. 9.7.4 with regards to the type II preventive replacement model: Cf D ¤ 13,200 per action and Cp D ¤ 6,050 per action, ˇ D 2:1, and ˛ D 1;531:4 h. The MTTR is 1;356 h by Eq. 5.69 and the variance can be quantified by the following general equation (Abernethy 2007): 1 2 2 1C : (9.36) D˛ 1C ˇ ˇ 2
2
As a consequence (see also Table 5.5), 2 .˛ D 1531:4; ˇ D 2:1/ 2 1 2 1C D 1531:42 1 C 2:1 2:1 Š 460;119:8:
3
4
5
6
7
Tp
Table 9.10 presents the results obtained by the application of Eq. 9.35. These values can be compared with those reported in Fig. 9.36. In particular, in accordance with the results and conclusions of Sect. 9.7.4, it seems there is not an optimal replacement time period tp .
9.8 Maintenance Performance Measurement in Preventive Maintenance Several authors have proposed some measures of effectiveness of preventive maintenance using the relative amount of preventive maintenance actions, such as the ratio of preventive maintenance hours and the total maintenance hours (Arts et al. 1998). They affirm that the benchmark data for appropriate preventive maintenance are about 75–97%. As a consequence, an effective preventive maintenance is supported by as few
354
9 Basic Models and Methods for Maintenance of Production Systems
Table 9.10 Asymptotic approximation of W .t / and UEC(tp /, numerical example t
1,000 1,500 2,000 2,500
3,000
3,500
4,000
4,500
5,000
5,500
6,000
6,500
7,000
7,500
W .t / 0.36 0.73 1.10 1.47 UEC(tp / 10.84 10.47 10.29 10.18
1.84 10.10
2.21 10.05
2.57 10.01
2.94 9.98
3.31 9.95
3.68 9.93
4.05 9.92
4.42 9.90
4.79 9.89
5.16 9.88
t
8,000 8,500 9,000 9,500 10,000 10,500 11,000 11,500 12,000 12,500 13,000 13,500 14,000 14,500
W .t / UEC.tp /
5.52 9.87
5.89 9.86
6.26 9.86
6.63 9.85
7.00 9.84
7.37 9.84
7.74 9.83
corrective maintenance (i. e., high occurrence rate for preventive maintenance) and as few preventive maintenance occurrences as possible. The first condition can be explained as follows:
8.11 9.83
8.47 9.83
8.84 9.82
number of corrective events after t ; (9.37) total number of events number of preventive events after t : Sm .t/ D total number of events (9.38) Sc .t/ D
Jiang et al. (2006) proposed preventive effect measures, which are not based on historic data (i. e., model-free) but are based on the previously introduced preventive maintenance replacement analytical models of type I and type II. For the age replacement policy (type I), they introduced the measure Pe .t/, called “preventive effect indicator”:
1 ; W .t/ C 1
10.32 9.81
Pe .0/ D 1;
Finally, Pe .t/ is decreasing. The authors demonstrated that a poor preventive effect implies a poor cost saving. In particular, for the age replacement policy the following measure of cost saving can be introduced: Scost D
(9.41)
Consider the numerical example illustrated in Sect. 9.5.5. The cost saving obtained by the introduction of the preventive maintenance type I replacement rule is14 Cf =MTTF UEC.t / Cf =MTTF 50=.15=4/ 8:975 Š 32:7%; D 50=.15=4/
Scost .t D 2:95/ D
where Z1 MTTF D
R.t/ dt 0
Z4 Z7 Z1 1 7t dt C 0 dt D 1 t dt C 8 6
(9.40)
where W .t/ is the renewal function, i. e., the expected number of corrective maintenance actions.
Cf =MTTF UEC.t / : Cf =MTTF
9.8.1 Numerical Example
(9.39)
whereF .t/ measures the fraction of corrective replacement andR.t/ measures the fraction of preventive replacement. Similarly, for the block replacement policy (type II), where the number of preventive maintenance actions is 1 in the interval Œ0; t, PeII .t/ D
9.95 9.81
Pe .1/ D 0:
where
R.t/ D R.t/; F .t/ C R.t/
9.58 9.82
The generic preventive effect indicator Pe .t/ has the following properties:
Sm .t/ Sc .t/;
PeI .t/ D
9.21 9.82
0
15 D 4 14
4
7
.weeks/:
tp D 2:95 weeks and UEC.tp / D ¤ 8,975 per week.
9.9 Minimum Total Downtime
The expected maintenance cost per unit time decreases from ¤ 13,333 per week to ¤ 8,975 per week. Other significant performance indexes were introduced in some previously illustrated numerical examples and applications, e. g., Table 9.3: mean availability, corrective maintenance downtime, preventive maintenance downtime, total downtime, W .T /, number of preventive replacements, maintenance cost, and total cost. All these performance indexes are defined for a period of time15 T .
9.9 Minimum Total Downtime If the aim of the optimal replacement strategy is to maximize the throughput or the utilization of the equipment , the objective function of the supporting decision-making model can be the total downtime per unit time (due to both preventive and failure replacement actions and frequencies). The proposed model sets this function to a minimum. The type I and type II models previously described have been modified in accordance with this new objective function, and are described and exemplified separately in next sections.
9.9.1 Type I – Minimum Downtime This model supports the determination of the optimal age tp at which preventive replacements should occur in order to minimize the total downtime per unit time: total expected downtime cycle length Tp R.tp / C Tf Œ1 R.tp / : D .tp C Tp /R.tp / C Œm.tp / C Tf Œ1 R.tp / (9.42)
DT.tp / D
The cycle length is calculated in accordance with Eq. 9.18. 9.9.1.1 Type I – Minimum Downtime, Numerical Example 1 Consider the numerical example introduced in Sect. 9.4.2. The application of the analytical model 15
Mission time, also known as time of analysis – observing time.
355
(Eq. 9.42) generates the trend of the downtime DT.tp / illustrated in Fig. 9.43. The minimum value of the downtime is about 0.0122 for tp D 1;392 h. The results obtained by the application of the Monte Carlo simulation, in accordance with a preventive replacement executed after 1;392 h from the last preventive or corrective replacement, are reported in Table 9.11 (configuration G). These values are very similar to those related to the application of the original type I replacement model (configuration E, see Sect. 9.5.1) and to the application of the type I model with Tp and Tf (configuration F, see Sect. 9.6.1). All the results are obtained by a number of simulation runs, called “repetitions,” equal to 2,000. They are not deterministic values and this is why the total downtime does not seem to be at its minimum in configuration G.
9.9.1.2 Type I – Minimum Downtime, Numerical Example 2
Considering the numerical example introduced in Sect. 9.5.5, the total downtime per unit time is Tp R.tp / C Tf Œ1 R.tp / .tp C Tp /R.tp / C Œm.tp / C Tf Œ1 R.tp /
8 Tp 1 18 tp C Tf 18 tp ˆ ˆ ˆ t ;
ˆ ˆ ˆ .tp C Tp / 1 18 tp C 2p C Tf 18 tp ˆ ˆ ˆ ˆ ˆ 0 tp < 4; ˆ ˆ ˆ ˆ ˆ
7tp
tp 1 < Tp 6 C Tf 6 D t 1 ;
7t t 2 4 ˆ ˆ ˆ .tp C Tp / 6 p C 2.tp 1/ C Tf p 6 ˆ ˆ ˆ ˆ ˆ 4 tp < 7; ˆ ˆ ˆ ˆ ˆ ˆ Tf ˆ ˆ ; tp 7: : 45 12 C Tf
DT.tp / D
Table 9.12 summarizes the values of DT(tp / obtained for different operating scenarios and couplets of Tp and Tf values (see the scenarios introduced in Sect. 9.6.3). For example, in scenario C the best value of tp is 5 weeks, corresponding to a unit DT equal to 0.196. Figure 9.44 presents the graphic trend of the downtime values obtained.
356
9 Basic Models and Methods for Maintenance of Production Systems
Table 9.11 Monte Carlo analysis and type I model with downtime minimization Configuration A –
tp (h)
Configuration B 1,356
Configuration C 600
Configuration D 4,000
Configuration E 1,429
Configuration F 1,445
Configuration G 1,392
Mean availability CM downtime (h) PM downtime (h) Total downtime (h) W .T / (failures) Number of PR Maintenance cost (¤) Total cost (¤)
0.9871 0.9878 0.9842 0.9871 415.71 288.2 128.61 416.62 0 104.45 380.56 0.03 415.71 392.65 509.17 416.65 23.1 13.06 7.15 23.15 0 16.01 47.58 0.004 55,192 54,749 76,614 55,557 304,618 290,333 382,116 305,547
0.988 0.988 0.988 293.38 294.42 288.15 94.09 91.97 100 387.47 386.39 388.15 16.3 16.36 16.01 11.76 11.49 12.51 53,823 53,631 54,059 286,305 285,465 286,949
T (h) Simulation repetitions (runs)
32,200 500
32,200 2,000
Table 9.12 Analysis multiscenario. Downtime minimization, type I Type I – downtime minimization Id scenario Tp Tf A B C D
0.5 1 0.5 0.25
0.5 1 1 1
0
1
2
3
tp 4
5
6
7
8
1.00 1.00 1.00 1.00
0.35 0.52 0.38 0.27
0.22 0.36 0.26 0.20
0.17 0.29 0.22 0.18
0.14 0.25 0.20 0.17
0.13 0.23 0.20 0.18
0.12 0.21 0.20 0.19
0.12 0.21 0.21 0.21
11.76 11.76 11.76 11.76
Type I repl. model DT minimization
0.01226 0.01225
DT(tp)
0.01224 0.01223 0.01222 0.01221
Fig. 9.43 Type I model and downtime (DT) minimization, numerical example
0.01220 1300 1310 1320 1330 1340 1350 1360 1370 1380 1390 1400 1410 1420 1430 1440 1450 tp
1 0.9 0.8
DT(Tp)
0.7 0.6
A
0.5
B
0.4
C
0.3
D
0.2 0.1
Fig. 9.44 Downtime minimization, type I. Numerical example
0 0
1
2
3
4
tp
5
6
7
9.9 Minimum Total Downtime
357
9.9.1.3 Type I Replacement for Minimum Downtime. Weibull Distribution of ttf
Finally, Fig. 9.46 presents the expected total downtime per unit time when Tp passes from 0.5 to 0.1 units of time.
Figure 9.45 presents the expected total downtime per unit time for distributions of ttf which differ for the value of shape parameter ˇ, assuming Tp and Tf are equal to 0.5 and 1 unit of time (e. g., hour or day), respectively. For values of ˇ greater than 1 it is possible to identify an optimal value of tp in terms of units of time. Values of the shape parameter lower than 1 are not supported by a best tp value.
9.9.2 Type II – Downtime Minimization The following model supports the determination of the optimal replacement interval tp between preventive replacements adopting the block replacement strategy (type II) and setting the total downtime per unit time
Type I Replacement Model - DownTime minimization (Tp =0.5, Tf =1)
1
Weibull (a=50, b=1) 0.9
Weibull (a=50, b=2) Weibull (a=50, b=3)
0.8
Weibull (a=50, b=4) Weibull (a=50, b=0.5)
0.7
Weibull (a=50, b=0.2)
DT(tp)
0.6
0.5
0.4
0.3
0.2
0.1
0 0
10
20
30
40
50
60
70
80
90
100
110
tp [unit of time] Type I Replacement Model - DownTime minimization (Tp = 0.5, Tf = 1) 0.1 Weibull (a=50, b=1) 0.09
Weibull (a=50, b=2) Weibull (a=50, b=3)
0.08
Weibull (a=50, b=4) Weibull (a=50, b=0.5)
0.07
Weibull (a=50, b=0.2)
DT(tp)
0.06
0.05
0.04
0.03
0.02
0.01
0
0
10
20
30
40
50
60
70
80
90
100
110
120
tp [unit of time]
Fig. 9.45 Weibull distribution of ttf. Type I replacement model based on downtime minimization. Variable ˇ
130
140
150
160
170
180
190
200
358
9 Basic Models and Methods for Maintenance of Production Systems Type I Replacement Model - DownTime Minimization (Tf = 1) Weibull (a=50, b=1), Tp=0.5 Weibull (a=50, b=1), Tp=0.1 Weibull (a=50, b=3), Tp=0.5 Weibull (a=50, b=3), Tp=0.1 Weibull (a=50, b=0.2), Tp=0.5 Weibull (a=50, b=0.2), Tp=0.1
0.25
DT(tp)
0.2
0.15
0.1
0.05
0
0
10
20
30
40
50
60
70
80
90
100
110
120
130
140
150
160
170
180
190
200
t p [unit of time]
Fig. 9.46 Weibull distribution of ttf. Type I replacement model based on downtime minimization. Variables ˇ and Tp Preventive replacement Tp
Failure replacement
Failure replacement
Tf
Tf
Preventive replacement Tp
tp
time
Cycle length
Fig. 9.47 Cycle length. Downtime minimization, type II with Tp and Tf
DT(tp / to its minimum, as illustrated in Fig. 9.47. In particular, Fig. 9.47 illustrates the cycle length determination in the presence of fixed times of replacement Tp and Tf . failure replacement downtime C preventive replacement downtime DT.tp / D cycle length W .tp /Tf C Tp D : (9.43) tp C Tp 9.9.2.1 Type I – Minimum Downtime, Numerical Example Consider the numerical example introduced in Sect. 9.5.5. Table 9.13 summarizes the unit downtime values obtained by the application of Eq. 9.43, in accordance with the expected number of failures quantified by Eq. 9.33, the discrete approach.
Finally, Fig. 9.48 illustrates the trend of DT(Tp / for scenarios A, B, C, and D.
9.10 Group Replacement: The Lamp Replacement Problem Sometimes groups of similar items subject to failure (valves or filters in a piping system, lamps in a building or in a street, racks in a warehousing systems, etc.) are managed simultaneously in order to accomplish economies of scale. In such a situation, it could be useful to replace a generic item under group replacement conditions rather than replace only a single unit/entity. For example, it could be justifiable to replace all valves and filters of a piping system rather than only the failed ones. Replacing an item under group replacement, at the end of a fixed cycle length tp , is assumed to be less expensive than every failure replacement performed in
9.11 Preventive Maintenance Policies for Repairable Systems
359
Table 9.13 Analysis multiscenario. Downtime minimization, type II Type II – downtime minimization Id scenario Tp Tf 0.5 1 0.5 0.25
0.5 1 1 1
1
2
3
4
5
6
7
1.00 1.00 1.00 1.00
0.38 0.56 0.42 0.30
0.25 0.42 0.31 0.23
0.20 0.36 0.26 0.21
0.18 0.32 0.24 0.20
0.17 0.31 0.24 0.21
0.16 0.30 0.25 0.22
0.16 0.30 0.26 0.23
DT(Tp )
A B C D
tp 0
Fig. 9.48 Downtime minimization, type II
DT(Tp ) - Type II
1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0
A B C D 0
1
the course of tp ; in other words, the replacing cost per item, during a group replacement at tp , Cg is lower than the cost of failure replacement, during tp , Cf . Moreover, it is assumed that when an item fails in the course of tp that item is replaced by a new one before tp expires. The aim of the proposed model is to minimize the total expected cost of replacement per unit time UEC, defined as total expected cost in .0; tp / cycle length NCg C Cf ŒN W .tp / D ; (9.44) tp
UEC.tp / D
where N is the number of items in a group and W .t/ is the expected number of failures for one item. We now present a numerical example. Consider the application introduced in Sect. 9.5.5 and the definition of the probability distribution of the ttf f .t/ for the item subject to replacement. In particular, there is a group of 70 similar items subject to f .t/: It is possible to apply Eq. 9.44 assuming Cg and Cf are equal to ¤ 3,000 and ¤ 50,000 per replacement, respectively. Cg differs from Cp (equal to ¤ 5,000 per replacement) because the group replacement is performed in the presence of economies of scale. Figure 9.49 illustrates the values of the expected cost per
2
3
4
5
6
7
Tp
unit time obtained for different values of the period tp as follows: NCg C Cf ŒN W .tp / tp 70 3 C 50Œ70W .tp / D : tp
UEC.tp / D
Figure 9.49 shows the minimum of UEC for tp equal to 3 weeks, i. e., executing a block replacement on 70 items after three periods of time.
9.11 Preventive Maintenance Policies for Repairable Systems The analytical models proposed in this section assume, as a basic hypothesis, that the equipment, i. e., the production systems and components, is repairable and not as good as new immediately after the completion of the generic maintenance action (preventive or corrective). This is the reason why these models are not preventive replacement models, but are based on repair activity and/or replacement of a part of the whole system. In other words, the production system is subject to a continuous process of degradation and ageing.
360
9 Basic Models and Methods for Maintenance of Production Systems
Group replacement - UEC(tp)
800
UEC(t p) [k€/week]
750 700 650 600 550 500 1
Fig. 9.49 Group replacement. UEC(tp /
2
lim i .t/ D 1;
t !1
(9.45)
where i .t/ is the failure rate at time t (time from the last repair action) of the repairable component i in a system subjected to (i 1) repairs; •
i C1 .t/ i .t/; i C1 .0/ i .0/:
4
5
6
7
t p [week]
In particular, if the generic failure rate is not influenced and disturbed by any minimal repair of failures, we are in the presence of the so-called minimal repair action. It is also assumed that the state of the item is always known with certainty, in accordance with the adopted framework for the classification of maintenance strategies and the definition of preventive maintenance (see Sect. 9.2). It is therefore assumed that repair and/or replacement activities start immediately as soon as a failure occurs. In general (i. e., in the absence of the “as good as new hypothesis” and “minimal repair” actions), the life distribution of the equipment is assumed to change after each repair, i. e., the failure rate function increases after a generic maintenance action. We call this kind of repair activity a “not perfect or imperfect repair” action. As a consequence, the following properties follow: • i .t/ strictly increases,
3
t > 0;
(9.46)
From these assumptions and the property that the generic component degrades after the not perfect repair action, the following set of equations can be properly demonstrated: MTTFi MTTFi C1 ; (9.47) FNi .t/ FNi C1 .t/;
where MTTFi is the MTTF of the component subjected to (i 1) repairs and FNi .t/ is the survival function of the component at time t after the last (i 1) repair. The following sections present two analytical models for the determination of the best preventive policy for repairable systems subjected to replacement cycles (Nguyen and Murthy 1981) in accordance with the following considerations: 1. The replacement and repair costs of a failed component/system are generally greater than the replacement and repair costs of an entity that has not failed. 2. Continuing to repair a system is often costly compared with replacing it after a certain number of repairs.
9.11.1 Type I Policy for Repairable Systems The basic rule of this preventive maintenance model is to replace the component/system after (k 1) repairs. Considering an entity subjected to (i 1) repairs, that entity is repaired, or replaced if i D k, at the time of failure (breakdown action) or at the age Ti (preventive action) from the last repair or replacement. Figure 9.50 illustrates the replacement cycle with related costs, assumed to be constant. The notation adopted follows: Cr
Cp
is the replacement cost at the kth maintenance action. The replacement activity is coherent with the as good as new hypothesis. is the repair cost.
9.11 Preventive Maintenance Policies for Repairable Systems
361
Cp
Cp
Cp
Cp
Cr
Fi(Ti)Cf
Fi(Ti)Cf
Fi(Ti)Cf
Fi(Ti)Cf
Fi(Ti)Cf
i=1
i=2
i=3
T2
i = k-1
T3
Replacement with a new component
i=k
Tk
The same component “gets old”
Fig. 9.50 Type I policy, repairable systems. Replacement cycle and costs
The component is repaired after a failure Cp
Cp
Fi(Ti)Cf
Fi(Ti)Cf
i=1
i=2
C p + Cf
Cp
Cr + Cf
Replacement with a new component
Fi(Ti)Cf i=3
i = k-1
i=k
Tk
T2 T3 The same component “gets old”
Fig. 9.51 Type I policy, repairable systems. Example
Cf
is the breakdown cost; it is a cost additional to Cp . As a consequence, the generic corrective action, which follows a breakdown event, costs Cp C Cf :
This policy is characterized by k and fTi g variables, whose values have to be properly identified, and where fTi g denotes the set of maintenance ages T1 ; T2 ; : : : ; Tk . Figure 9.51 exemplifies a replacement cycle for a component which fails after a second preventive repair (i D 2) before waiting T3 and the third planned preventive action. The expected costs of a repair Cp .Ti / and of a replacement Cr .Ti / are, respectively, Cp .Ti / D .Cp C Cf /Fi .Ti / C Cp Œ1 Fi .Ti / D Cp C Cf Fi .Ti /
Applying this rule, UEC is UECŒk; T1 ; : : : ; Tk D
P .k 1/Cp C Cr C Cf kiD1 Fi .Ti / ; D Pk R Ti FNi .t/ dt
where ECŒk; fTi g is the expected cost for a replacement cycle and LŒk; fTi g is the expected length of a replacement cycle. The optimal policy is to select k and maintenance ages fTi g so as to minimize Eq. 9.50. Differentiating Eq. 9.47 with respect to Ti and equating to zero, ri .Ti / D
Cr .Ti / D .Cr C Cf /Fi .Ti / C Cr Œ1 Fi .Ti / D Cr C Cf Fi .Ti /:
(9.49)
(9.50)
i D1 0
(9.48)
and
ECŒk; fTi g LŒk; fTi g
UECŒk; fTi g ; Cf
(9.51)
where ri .Ti / is failure rate of the component at time Ti after the last repair, called the “(i 1)th repair,”
362
and
9 Basic Models and Methods for Maintenance of Production Systems
8 ˆ r .T / D r1 .T1 /; 1 < i k ˆ ˆ i i ˆ ˆ ˆ ˆ k ZTi < X r1 .T1 / FNi .t/ dt Fi .Ti / ˆ ˆ ˆ i D1 ˆ 0 ˆ ˆ ˆ : Œ.k1/Cp CCr D : Cf
and Zz
eu ux1 du:
.x; z/ D 0
(9.52)
Now we demonstrate that ZTi
The generic i th cycle is based on the set of functions fi .t/, ri .t/, etc. defined for the random variable ttfi . Consequently, the failure rate ri .t/ can be assumed to be equal to i .t/, where t is the point in time from the last maintenance action. The availability of the system is Pk R T i N i D1 0 Fi .t/ dt AŒk; fTi g D : P .k 1/Cp C Cr C Cf kiD1 Fi .Ti / RT P C kiD1 0 i FNi .t/ dt (9.53)
1 Ti b N C Ti FN .Ti /; F .t/ dt D a 1 C ; b a
0
(9.57) where 1 Ti b a 1 C ; b a
T b i ! Za 1=b y Da y e dy 0
T b i Za
yD
The problem of maximizing AŒk; fTi g is equivalent to the problem of minimizing UECŒk; T1 ; : : : ; Tk . The following algorithm can be applied to compute optimal policy I, as demonstrated by Nguyen and Murthy (1981):
5. k D k 1. Compute UECŒk; fTi g: Now consider the case of a Weibull distribution, t ˇi ; Fi .t/ D 1 exp ˛i (9.54) ˇi t ˇi 1 ri .t/ D i .t/ D : ˛i ˛i RT In Eq. 9.50, in order to compute 0 i FNi .t/ dt when f .t/ is represented by a Weibull density function, it is useful to quantify the lower incomplete function .x; z/. Given a Weibull distribution (a scale parameter and b shape parameter), we know that
0
t b exp dt a
!
xDT Z i
D
b1 ab
a dx
x
x . x /b x b1 e a b b dx a a
b x b . x /b e a dx a a
!
!
0
4. Set k D k C 1 and go to step 2.
ZTi
x . x /b e a dy a
yD0
Da
3. If T1 .k/ T1 .k 1/, go to step 5.
0
a
ZTi
2. Solve Eq. 9.52 for fTi .k/g.
FNi .t/ dt D
D
b yD. x a/
x b dyD dŒ. x a / Db
1. Set k= 1.
ZTi
(9.56)
ZTi b x b x e. a / dx D b a 0
D
x b
b x b1 . a / f .x/D a .a/ e
ZTi b1 b x x b e. a / x dx a a 0
ZTi D
xf .x/ dx 0
D
Eq. 5.38
Ti FN .Ti / C
ZTi
FN .x/ dx:
0
9.11.1.1 Numerical Example (9.55)
Consider a component subject to preventive maintenance type I general actions, in accordance with
9.11 Preventive Maintenance Policies for Repairable Systems
363
Table 9.14 T1 determination, k D 1 and ˇ D 2 T1
0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 1.05 1.1 1.15 1.2 1.25 1.3 1.35 1.4
F1 .T1 /
r1 .T1 /
1 C 1=ˇ
.T1 =˛/ˇ
A1 D
Œ1 C 1=ˇ I .T1 =˛/ˇ
˛A1
0.002 0.010 0.022 0.039 0.061 0.086 0.115 0.148 0.183 0.221 0.261 0.302 0.345 0.387 0.430 0.473 0.514 0.555 0.594 0.632 0.668 0.702 0.734 0.763 0.790 0.815 0.838 0.859
0.100 0.200 0.300 0.400 0.500 0.600 0.700 0.800 0.900 1.000 1.100 1.200 1.300 1.400 1.500 1.600 1.700 1.800 1.900 2.000 2.100 2.200 2.300 2.400 2.500 2.600 2.700 2.800
1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500
0.003 0.010 0.023 0.040 0.063 0.090 0.123 0.160 0.203 0.250 0.303 0.360 0.423 0.490 0.563 0.640 0.723 0.810 0.903 1.000 1.103 1.210 1.323 1.440 1.563 1.690 1.823 1.960
0.000 0.001 0.002 0.005 0.010 0.017 0.027 0.039 0.054 0.072 0.093 0.117 0.143 0.172 0.203 0.236 0.270 0.306 0.342 0.379 0.416 0.452 0.488 0.522 0.556 0.588 0.618 0.647
0.000 0.001 0.002 0.005 0.010 0.017 0.027 0.039 0.054 0.072 0.093 0.117 0.143 0.172 0.203 0.236 0.270 0.306 0.342 0.379 0.416 0.452 0.488 0.522 0.556 0.588 0.618 0.647
the previously illustrated model. We assume Cp D ¤ 5,000 per action, Cr D ¤ 15,000 per action, and Cf D ¤15,000 per action. The failure probability function, the rate function, and the scale parameter of the generic Weibull density function are defined as follows: t ˇ ; Fi .t/ D 1 exp ˛i ri .t/ D i .t/ D
ˇ t ˇ 1 ; ˛i ˛i
˛i D .1:5/1i : The algorithm illustrated above and introduced by Nguyen and Murthy (1981) is applied to find the best .T /i values. In the first iteration, when k D 1, the value of ˛ is 1. Table 9.14 presents the calculus to
A2 D ˛A1 C T1 Œ1 F .T1 /
B1 D r1 .T1 /A2 F1 .T1 / Cr =Cf
0.050 0.100 0.149 0.197 0.245 0.291 0.336 0.380 0.421 0.461 0.499 0.535 0.569 0.601 0.630 0.658 0.683 0.706 0.727 0.747 0.764 0.780 0.794 0.807 0.818 0.828 0.836 0.844
0:998 0:990 0:978 0:960 0:938 0:911 0:880 0:844 0:804 0:760 0:712 0:660 0:605 0:546 0:485 0:420 0:353 0:284 0:212 0:138 0:063 0:014 0.093 0.173 0.254 0.337 0.420 0.504
quantify T1 in the case ˇ D 2: ZT1 1 .T1 /
Cr FN1 .t/ dt F1 .T1 / D 0; Cf
0
where ZT1
1 T1 ˇ C T1 FN .T1 /: FN .T1 / dt D ˛ 1 C ; ˇ ˛
0
In particular, T1 .k D 1/ 2 .1:05; 1:1/; r1 .T1 / 2 .2:1; 2:2/: This calculus was implemented in a spreadsheet in order to demonstrate that no particular informatics skills are required, and practitioners or managers can apply
364
9 Basic Models and Methods for Maintenance of Production Systems
the proposed model, even if it can appear very complicated. The value of T1 , which sets the B1 values to zero in the last column of Table 9.14, is the best T1 assuming k D 1. Before quantifying UEC, it is useful to quantify A2 as (see Table 9.14) ZT1 A2 .k D 1; T1 D 1:05/ D
FN .t/ dt
For this purpose we propose the use of a new spreadsheet, reported in Table 9.15, which refers to Table 9.16 for the explanation of the symbols. Then, ˇ ˇ 1 ˇ ˇ ˇˇ t 2t ˇ D 2 2 .t/ D ˇˇ ˇ ˛2 ˛2 ˛2 ˇ D2 and
0
1 T1 D a 1 C ; ˇ ˛
ˇ ZT1
A2 .k D 1; T1 D 1:1/ D
T2 D
C T1 FN .T1 / D 0:764; FN .t/ dt D 0:780:
2 .t/ 2 ˛ 2 2
D
1 .t /D2 .t /
The last column in Table 9.15 reports the values of the following equation, called “B2”: ZT1
0
B2.T1 ; T2 / D 1 .T1 /
The value of UEC obtained, assuming T1 D 1:05 as a lower bound of T1 , is UECŒk D 1; T1 D 1:05 D
ZT2 C 1 .T1 /
.k 1/Cp C Cr C Cf i D1 Fi .Ti / Pk R Ti FNi .t/ dt
Š
FN2 .t/ dt F2 .T2 /
0
Cp C Cr : Cf
i D1 0
Table 9.14
FN1 .t/ dt F1 .T1 /
0
Pk
15 C 15 0:668 Cr C Cf F1 .T1 / D D RT 1 N 0:764 0 Fi .t/ dt
1 .t/ 2 ˛ : 2 2
Equation B2 is equal to 0, in accordance with Eq. 9.52, when (see also Fig. 9.52)
¤ 32;740 per unit of time.
T1 .k D 2/ 2 .1; 1:05/; T2 .k D 2/ 2 .0:444; 0:467/;
When T1 is adopted for its upper bound T1 D 1:1,
r1 .T1 / D r2 .T2 / 2 .2:0; 2:1/:
UECŒk D 1; T1 D 1:1
P .k 1/Cp C Cr C Cf kiD1 Fi .Ti / D Pk R Ti FNi .t/ dt
Now
15 C 15 0:702 Cr C Cf F1 .T1 / D D RT 1 N 0:780 0 Fi .t/ dt
Consequently, values of unit cost lower than UEC.k D 1/ previously quantified are expected. Before the illustration of the calculus of the UEC we explicitly quantify the following values in accordance with Table 9.15:
i D1 0
Š
Table 9.14
¤ 32;730 per unit of time.
For k D 2 it is necessary to solve the following set of equations: 8 ˆ 1 .T1 / D 2 .T2 / ˆ ˆ ˆ ˆ ˆ ˆ < .T / R T1 FN .t/ dt F .T / 1 1 0 1 1 1 R T2 C C Cr ˆ N ˆC1 .T1 / 0 F2 .t/ dt F2 .T2 / D p ˆ ˆ Cf ˆ ˆ ˆ : ˛1 D 1; ˛2 D .1:5/1 :
T1 .k D 1/ > T1 .k D 2/:
ZT1 b2.k D 2; T1 D 1/ D
FN .t/ dt
0
1 T1 ˇ D a 1 C ; C T1 FN .T1 / D 0:747; ˇ ˛ ZT1 b2.k D 2; T1 D 1:05/ D FN .t/ dt D 0:764; 0
0.022 0.044 0.067 0.089 0.111 0.133 0.156 0.178 0.200 0.222 0.244 0.267 0.289 0.311 0.333 0.356 0.378 0.400 0.422 0.444 0.467 0.489 0.511 0.533 0.556 0.578 0.600 0.622 0.644 0.667 0.689 0.711 0.733 0.756 0.778 0.800 0.822 0.844 0.867 0.889 0.911
0.002 0.010 0.022 0.039 0.061 0.086 0.115 0.148 0.183 0.221 0.261 0.302 0.345 0.387 0.430 0.473 0.514 0.555 0.594 0.632 0.668 0.702 0.734 0.763 0.790 0.815 0.838 0.859 0.878 0.895 0.910 0.923 0.934 0.944 0.953 0.961 0.967 0.973 0.978 0.982 0.985
0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 1.05 1.1 1.15 1.2 1.25 1.3 1.35 1.4 1.45 1.5 1.55 1.6 1.65 1.7 1.75 1.8 1.85 1.9 1.95 2 2.05
0.100 0.200 0.300 0.400 0.500 0.600 0.700 0.800 0.900 1.000 1.100 1.200 1.300 1.400 1.500 1.600 1.700 1.800 1.900 2.000 2.100 2.200 2.300 2.400 2.500 2.600 2.700 2.800 2.900 3.000 3.100 3.200 3.300 3.400 3.500 3.600 3.700 3.800 3.900 4.000 4.100
F1 .T1 / r1 .T1 / D r2 .T2 / T2
T1 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500 1.500
0.003 0.010 0.023 0.040 0.063 0.090 0.123 0.160 0.203 0.250 0.303 0.360 0.423 0.490 0.563 0.640 0.723 0.810 0.903 1.000 1.103 1.210 1.323 1.440 1.563 1.690 1.823 1.960 2.103 2.250 2.403 2.560 2.723 2.890 3.063 3.240 3.423 3.610 3.803 4.000 4.203
0.000 0.001 0.002 0.005 0.010 0.017 0.027 0.039 0.054 0.072 0.093 0.117 0.143 0.172 0.203 0.236 0.270 0.306 0.342 0.379 0.416 0.452 0.488 0.522 0.556 0.588 0.618 0.647 0.673 0.698 0.721 0.742 0.760 0.777 0.793 0.806 0.818 0.828 0.838 0.845 0.852
0.000 0.001 0.002 0.005 0.010 0.017 0.027 0.039 0.054 0.072 0.093 0.117 0.143 0.172 0.203 0.236 0.270 0.306 0.342 0.379 0.416 0.452 0.488 0.522 0.556 0.588 0.618 0.647 0.673 0.698 0.721 0.742 0.760 0.777 0.793 0.806 0.818 0.828 0.838 0.845 0.852
1 C 1=ˇ .T1 =˛1 /ˇ Œ1 C 1=ˇ I b1 .T1 =˛1 /ˇ 0.050 0.100 0.149 0.197 0.245 0.291 0.336 0.380 0.421 0.461 0.499 0.535 0.569 0.601 0.630 0.658 0.683 0.706 0.727 0.747 0.764 0.780 0.794 0.807 0.818 0.828 0.836 0.844 0.851 0.856 0.861 0.865 0.869 0.872 0.874 0.877 0.878 0.880 0.881 0.882 0.883
b2 0:998 0:990 0:978 0:960 0:938 0:911 0:880 0:844 0:804 0:760 0:712 0:660 0:605 0:546 0:485 0:420 0:353 0:284 0:212 0:138 0:063 0.014 0.093 0.173 0.254 0.337 0.420 0.504 0.589 0.674 0.760 0.846 0.933 1.020 1.107 1.195 1.283 1.370 1.458 1.547 1.635
b3 0.001 0.004 0.010 0.018 0.028 0.040 0.054 0.071 0.090 0.111 0.134 0.160 0.188 0.218 0.250 0.284 0.321 0.360 0.401 0.444 0.490 0.538 0.588 0.640 0.694 0.751 0.810 0.871 0.934 1.000 1.068 1.138 1.210 1.284 1.361 1.440 1.521 1.604 1.690 1.778 1.868
0.000 0.000 0.001 0.002 0.003 0.005 0.008 0.012 0.017 0.023 0.030 0.039 0.049 0.060 0.072 0.086 0.100 0.117 0.134 0.152 0.172 0.192 0.214 0.236 0.259 0.282 0.306 0.330 0.354 0.379 0.403 0.428 0.452 0.476 0.499 0.522 0.545 0.567 0.588 0.608 0.628
0.000 0.000 0.000 0.001 0.002 0.003 0.005 0.008 0.011 0.015 0.020 0.026 0.032 0.040 0.048 0.057 0.067 0.078 0.089 0.102 0.115 0.128 0.142 0.157 0.172 0.188 0.204 0.220 0.236 0.253 0.269 0.285 0.301 0.317 0.333 0.348 0.363 0.378 0.392 0.406 0.419
.T2 =˛1 /ˇ Œ1 C 1=ˇ I b4 .T2 =˛2 /ˇ
Table 9.15 T1 and T2 determination, k D 2 and ˇ D 2 (the explanation of the symbols is given in Table 9.16)
0.022 0.044 0.066 0.088 0.110 0.132 0.153 0.174 0.194 0.214 0.234 0.253 0.272 0.290 0.308 0.325 0.341 0.357 0.372 0.387 0.400 0.414 0.426 0.438 0.450 0.461 0.471 0.480 0.489 0.498 0.506 0.513 0.520 0.526 0.532 0.538 0.543 0.548 0.552 0.556 0.559
b5 0.001 0.004 0.010 0.018 0.027 0.039 0.053 0.069 0.086 0.105 0.126 0.148 0.171 0.196 0.221 0.248 0.275 0.302 0.330 0.359 0.387 0.416 0.444 0.473 0.501 0.528 0.555 0.582 0.607 0.632 0.656 0.679 0.702 0.723 0.744 0.763 0.782 0.799 0.815 0.831 0.846
F2 .T2 / 1:330 1:319 1:301 1:276 1:244 1:205 1:159 1:107 1:049 0:984 0:914 0:838 0:756 0:670 0:578 0:482 0:382 0:277 0:169 0:058 0:057 0.175 0.296 0.419 0.545 0.673 0.803 0.934 1.067 1.202 1.338 1.475 1.614 1.753 1.893 2.035 2.176 2.319 2.462 2.605 2.749
B2
9.11 Preventive Maintenance Policies for Repairable Systems 365
366
9 Basic Models and Methods for Maintenance of Production Systems
Table 9.16 Table 9.15
Explanation of symbols b1, b2, and b3 in
b1
˛1 Œ1 C 1=ˇ I .T1 =˛1 /ˇ
b2
b1 C T1 Œ1 F .T1 /
b3
r1 .T1 /b2 F1 .T1 / Cr =Cf
b4
˛2 Œ1 C 1=ˇ I .T2 =˛2 /ˇ
b5
b4 C T2 Œ1 F .T2 /
D
5 C 15 C 15 0:632 C 15 0:359 0:747 C 0:387
D ¤ 30,745 per unit of time. If the upper bounds values of time T1 D 1:05 and T2 D 0:668 are assumed, UECŒk D 2; T1 D 1:05; T2 D 0:668 P .k 1/Cp C Cr C Cf kiD1 Fi .Ti / D Pk R T i FNi .t/ dt
B2(T1,T2)
i D1 0
5.000
D
4.000
Table 9.15 3.000
D B2
2.000
Cp C Cr C Cf F1 .T1 / C Cf F2 .T2 / R T1 RT FN1 .t/ dt C 2 FN2 .t/ dt 0
0
5 C 15 C 15 0:668 15 0:387 0:764 C 0:400
Š ¤ 30,777 per unit of time.
1.000
0.000 0
0.2
0.4
0.6 0.8
1
1.2
1.4
1.6
1.8
2
2.2
2.4
-1.000
-2.000 T1
Fig. 9.52 B2.T1 ; T2 /, numerical example
and ZT2 b5.k D 2; T2 D 0:444/ D
1 T2 D a 1 C ; ˇ ˛
FN .t/ dt
0
ˇ
1
C T2 FN .T2 / D 0:387;
ZT2 b5.k D 2; T2 D 0:467/ D
As expected these values of unit cost are lower than UEC.k D 1/, and a further iteration of the proposed algorithm is performed as follows (see k D k C 1 D 3). For k D 3 it is necessary to solve the following: 8 ˆ ˆ ˆ1 .T1 / D 2 .T2 / D 3 .T3 / ˆ ˆ ˆ ˆ .T / R T1 FN .t/ dt F .T / ˆ ˆ 1 1 0 1 1 1 ˆ ˆ < R T2 C1 .T1 / 0 FN2 .t/ dt F2 .T2 / ˆ ˆ ˆ RT ˆ 2Cp C Cr ˆ ˆ C1 .T1 / 0 1 FN3 .t/ dt F3 .T1 / D ˆ ˆ Cf ˆ ˆ ˆ :˛ D 1; ˛ D .1:5/1 ; ˛ D .1:5/2 :
FN .t/ dt D 0:400:
2
3
Then, ˇ ˇ 1 ˇ ˇ ˇ ˇ T3 2T3 ˇ D 2 : 3 .T3 / D 1 .T1 / D ˇˇ ˇ ˛3 ˛3 ˛3 ˇ D2
0
The UEC assuming the lower bounds values of time T1 D 1 and T2 D 0:444 is therefore UECŒk D 2; T1 D 1; T2 D 0:444 P .k 1/Cp C Cr C Cf kiD1 Fi .Ti / D Pk R T i N i D1 0 Fi .t/ dt Š
Table 9.15
Cp C Cr C Cf F1 .T1 / C Cf F2 .T2 / R T1 RT FN1 .t/ dt C 2 FN2 .t/ dt 0
0
Table 9.17 presents the spreadsheet used to support the algorithm calculus for k D 3, adopting the explanation of symbols in Tables 9.16 and 9.18. The values of time obtained are T1 .k D 3/ 2 .1:05; 1:1/; T2 .k D 3/ 2 .0:467; 0:489/; T3 .k D 3/ 2 .0:207; 0:217/; r1 .T1 / D r2 .T2 / D r3 .T3 / 2 .2:1; 2:2/:
0.002 0.010 0.022 0.039 0.061 0.086 0.115 0.148 0.183 0.221 0.261 0.302 0.345 0.387 0.430 0.473 0.514 0.555 0.594 0.632 0.668 0.702 0.734 0.763 0.790 0.815 0.838 0.859 0.878 0.895 0.910 0.923 0.934 0.944 0.953 0.961 0.967 0.973 0.978 0.982 0.985
0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 1.05 1.1 1.15 1.2 1.25 1.3 1.35 1.4 1.45 1.5 1.55 1.6 1.65 1.7 1.75 1.8 1.85 1.9 1.95 2 2.05
0.100 0.200 0.300 0.400 0.500 0.600 0.700 0.800 0.900 1.000 1.100 1.200 1.300 1.400 1.500 1.600 1.700 1.800 1.900 2.000 2.100 2.200 2.300 2.400 2.500 2.600 2.700 2.800 2.900 3.000 3.100 3.200 3.300 3.400 3.500 3.600 3.700 3.800 3.900 4.000 4.100
F1 .T1 / r1 .T1 /
T1
0.022 0.044 0.067 0.089 0.111 0.133 0.156 0.178 0.200 0.222 0.244 0.267 0.289 0.311 0.333 0.356 0.378 0.400 0.422 0.444 0.467 0.489 0.511 0.533 0.556 0.578 0.600 0.622 0.644 0.667 0.689 0.711 0.733 0.756 0.778 0.800 0.822 0.844 0.867 0.889 0.911
T2
0.010 0.020 0.030 0.040 0.049 0.059 0.069 0.079 0.089 0.099 0.109 0.119 0.128 0.138 0.148 0.158 0.168 0.178 0.188 0.198 0.207 0.217 0.227 0.237 0.247 0.257 0.267 0.277 0.286 0.296 0.306 0.316 0.326 0.336 0.346 0.356 0.365 0.375 0.385 0.395 0.405
T3 0.003 0.010 0.023 0.040 0.063 0.090 0.123 0.160 0.203 0.250 0.303 0.360 0.423 0.490 0.563 0.640 0.723 0.810 0.903 1.000 1.103 1.210 1.323 1.440 1.563 1.690 1.823 1.960 2.103 2.250 2.403 2.560 2.723 2.890 3.063 3.240 3.423 3.610 3.803 4.000 4.203
0.000 0.001 0.002 0.005 0.010 0.017 0.027 0.039 0.054 0.072 0.093 0.117 0.143 0.172 0.203 0.236 0.270 0.306 0.342 0.379 0.416 0.452 0.488 0.522 0.556 0.588 0.618 0.647 0.673 0.698 0.721 0.742 0.760 0.777 0.793 0.806 0.818 0.828 0.838 0.845 0.852
.T1 =˛1 /ˇ Œ1 C 1=ˇ I .T1 =˛1 /ˇ 0.000 0.001 0.002 0.005 0.010 0.017 0.027 0.039 0.054 0.072 0.093 0.117 0.143 0.172 0.203 0.236 0.270 0.306 0.342 0.379 0.416 0.452 0.488 0.522 0.556 0.588 0.618 0.647 0.673 0.698 0.721 0.742 0.760 0.777 0.793 0.806 0.818 0.828 0.838 0.845 0.852
b1 0.050 0.100 0.149 0.197 0.245 0.291 0.336 0.380 0.421 0.461 0.499 0.535 0.569 0.601 0.630 0.658 0.683 0.706 0.727 0.747 0.764 0.780 0.794 0.807 0.818 0.828 0.836 0.844 0.851 0.856 0.861 0.865 0.869 0.872 0.874 0.877 0.878 0.880 0.881 0.882 0.883
b2 0:998 0:990 0:978 0:960 0:938 0:911 0:880 0:844 0:804 0:760 0:712 0:660 0:605 0:546 0:485 0:420 0:353 0:284 0:212 0:138 0:063 0.014 0.093 0.173 0.254 0.337 0.420 0.504 0.589 0.674 0.760 0.846 0.933 1.020 1.107 1.195 1.283 1.370 1.458 1.547 1.635
b3 0.001 0.004 0.010 0.018 0.028 0.040 0.054 0.071 0.090 0.111 0.134 0.160 0.188 0.218 0.250 0.284 0.321 0.360 0.401 0.444 0.490 0.538 0.588 0.640 0.694 0.751 0.810 0.871 0.934 1.000 1.068 1.138 1.210 1.284 1.361 1.440 1.521 1.604 1.690 1.778 1.868
0.000 0.000 0.001 0.002 0.003 0.005 0.008 0.012 0.017 0.023 0.030 0.039 0.049 0.060 0.072 0.086 0.100 0.117 0.134 0.152 0.172 0.192 0.214 0.236 0.259 0.282 0.306 0.330 0.354 0.379 0.403 0.428 0.452 0.476 0.499 0.522 0.545 0.567 0.588 0.608 0.628
.T2 =˛1 /ˇ Œ1 C 1=ˇ I .T2 =˛2 /ˇ 0.000 0.000 0.000 0.001 0.002 0.003 0.005 0.008 0.011 0.015 0.020 0.026 0.032 0.040 0.048 0.057 0.067 0.078 0.089 0.102 0.115 0.128 0.142 0.157 0.172 0.188 0.204 0.220 0.236 0.253 0.269 0.285 0.301 0.317 0.333 0.348 0.363 0.378 0.392 0.406 0.419
b4 0.022 0.044 0.066 0.088 0.110 0.132 0.153 0.174 0.194 0.214 0.234 0.253 0.272 0.290 0.308 0.325 0.341 0.357 0.372 0.387 0.400 0.414 0.426 0.438 0.450 0.461 0.471 0.480 0.489 0.498 0.506 0.513 0.520 0.526 0.532 0.538 0.543 0.548 0.552 0.556 0.559
b5 0.001 0.004 0.010 0.018 0.027 0.039 0.053 0.069 0.086 0.105 0.126 0.148 0.171 0.196 0.221 0.248 0.275 0.302 0.330 0.359 0.387 0.416 0.444 0.473 0.501 0.528 0.555 0.582 0.607 0.632 0.656 0.679 0.702 0.723 0.744 0.763 0.782 0.799 0.815 0.831 0.846
0.000 0.002 0.004 0.008 0.012 0.018 0.024 0.032 0.040 0.049 0.060 0.071 0.083 0.097 0.111 0.126 0.143 0.160 0.178 0.198 0.218 0.239 0.261 0.284 0.309 0.334 0.360 0.387 0.415 0.444 0.475 0.506 0.538 0.571 0.605 0.640 0.676 0.713 0.751 0.790 0.830
0.000 0.000 0.000 0.000 0.001 0.002 0.002 0.004 0.005 0.007 0.009 0.012 0.015 0.019 0.023 0.028 0.033 0.039 0.045 0.052 0.060 0.068 0.076 0.086 0.095 0.106 0.117 0.128 0.140 0.152 0.165 0.179 0.192 0.206 0.221 0.236 0.251 0.266 0.282 0.298 0.314
0.000 0.000 0.000 0.000 0.000 0.001 0.001 0.002 0.002 0.003 0.004 0.005 0.007 0.008 0.010 0.012 0.015 0.017 0.020 0.023 0.026 0.030 0.034 0.038 0.042 0.047 0.052 0.057 0.062 0.068 0.073 0.079 0.085 0.092 0.098 0.105 0.112 0.118 0.125 0.132 0.140
F2 .T2 / .T3 =˛3 /ˇ Œ1 C 1=ˇ I b6 .T3 =˛3 /ˇ
Table 9.17 T1 , T2 , and T3 determination, k D 3 and ˇ D 2 (the explanation of the symbols is given in Table 9.18)
0.010 0.020 0.030 0.039 0.049 0.059 0.069 0.078 0.088 0.097 0.107 0.116 0.125 0.134 0.143 0.152 0.160 0.169 0.177 0.185 0.193 0.201 0.209 0.216 0.224 0.231 0.238 0.245 0.251 0.258 0.264 0.270 0.276 0.282 0.287 0.292 0.297 0.302 0.307 0.312 0.316
b7 0.000 0.002 0.004 0.008 0.012 0.018 0.024 0.031 0.039 0.048 0.058 0.069 0.080 0.092 0.105 0.119 0.133 0.148 0.163 0.179 0.196 0.213 0.230 0.248 0.266 0.284 0.302 0.321 0.340 0.359 0.378 0.397 0.416 0.435 0.454 0.473 0.491 0.510 0.528 0.546 0.564
F3 .T3 /
1:663 1:650 1:630 1:601 1:565 1:521 1:469 1:409 1:342 1:269 1:188 1:101 1:007 0:908 0:802 0:692 0:576 0:455 0:329 0:200 0:066 0.072 0.213 0.358 0.505 0.656 0.809 0.965 1.123 1.283 1.445 1.609 1.775 1.942 2.111 2.281 2.452 2.624 2.798 2.972 3.148
B3
9.11 Preventive Maintenance Policies for Repairable Systems 367
368
9 Basic Models and Methods for Maintenance of Production Systems Table 9.18 Explanation of symbols b6 and b7 in Table 9.17
Now, T1 .k D 3/ T1 .k D 2/:
b3
˛3 Œ1 C 1=ˇ I .T3 =˛3 /ˇ
b7
b6 C T3 Œ1 F .T3 /
Consequently, the best value of k, k , is 2 and for k D 3 values of unit cost UEC.k D 3/ greater than UEC.k D 2/ previously quantified are expected. Before the illustration of the calculus of UEC.k D 3/ we explicitly quantify the following values in accordance with Table 9.17: ZT1 b2.k D 3; T1 D 1:05/ D D a 1 C
1 T1 ; ˇ ˛
2Cp C Cr C Cf F1 .T1 / C Cf F2 .T2 / C Cf F3 .T3 / R T1 R T2 R T3 Table 9.17 N N N 0 F1 .t/ dt C 0 F2 .t/ dt C 0 F3 .t/ dt Š
10 C 15 C 15 0:668 C 15 0:387 C 15 0:196 0:764 C 0:400 C 0:193 D ¤32,251 per unit of time.
D FN .t/ dt
0
ˇ ZT1
b2.k D 3; T1 D 1:1/ D
C T1 FN .T1 / D 0:764;
FN .t/ dt D 0:780;
0
then ZT2 b5.k D 3; T2 D 0:444/ D
Now, assuming the upper bounds values, T1 D 1:1, T2 D 0:489, and T3 D 0:217, the unit cost is UECŒk D 3; T1 D 1:1; T2 D 0:489; T3 D 0:217 P .k 1/Cp C Cr C Cf kiD1 Fi .Ti / D Pk R T i N i D1 0 Fi .t/ dt Š
FN .t/ dt
Table 9.17
10 C 15 C 15 0:702 C 15 0:416 C 15 0:213 0:780 C 0:414 C 0:201 D ¤32,233 per unit of time.
D
0
1 T2 ˇ C T1 FN .T1 / D 0:4; D a 1 C ; ˇ ˛ ZT2 b5.k D 3; T1 D 0:467/ D FN .t/ dt D 0:414; 0
and finally ZT3 b7.k D 3; T3 D 0:207/ D
FN .t/ dt
0
1 T2 ˇ D a 1 C ; C T1 FN .T1 / D 0:193; ˇ ˛ ZT3 b7.k D 3; T3 D 0:217/ D FN .t/ dt D 0:201: 0
UEC.k D 3/ assuming T1 D 1:05, T2 D 0:467, and T3 D 0:207 is therefore UECŒk D 3; T1 D 1:05; T2 D 0:467; T3 D 0:207 P .k 1/Cp C Cr C Cf kiD1 Fi .Ti / D Pk R Ti N i D1 0 Fi .t/ dt
2Cp C Cr C Cf F1 .T1 / C Cf F2 .T2 / C Cf F3 .T3 / R T1 R T2 R T3 N N N 0 F1 .t/ dt C 0 F2 .t/ dt C 0 F3 .t/ dt
Figure 9.53 illustrates the trend of the expected lower and upper bounds of UEC for different values of k, further demonstrating that the best value is k D 2. From Eqs. 9.46 and 9.52, and from the assumed values of the scale parameter of the Weibull function ai , the trend of the failure rate increases on passing from the generic period i to i +1, as properly illustrated in Fig. 9.54 when k= 3 and i assumes the values f1; 2; 3g. Figure 9.55 presents the sawtooth trend of the failure rate r.t/ [i. e., .t/] for the component/system during the time period made up of three identical replacement cycles16 , each made up of three periods of duration T1 , T2 , and T3 . The results obtained are in accordance with the as good as new hypothesis adopted and with Eq. 9.52, where 8 ˆ
16
9.11 Preventive Maintenance Policies for Repairable Systems
369
33.000 32.500 UEC(k)
32.000 31.500 31.000 30.500 UEC (k,lower)
30.000 UEC (k,upper)
29.500
UEC (k,upper)
UEC (k,lower)
k=1
k=2 k=3
Fig. 9.53 UEC.k/ and k determination, numerical example
k=1
k=2
k=3
UEC (k,lower)
32.749
30.745
32.251
UEC(k,upper)
32.731
30.777
32.233
Failure rates for ai=(1.5)1-i 20.000 18.000
r1(t), r 2(t), r 3(t)
16.000 14.000 12.000 10.000
r1(t)
8.000
r2(t)
6.000
r3(t)
4.000 2.000 0.000 0.000
Fig. 9.54 1 .t /, 2 .t /, and 3 .t /, numerical example
0.200
0.400
0.600
0.800
1.000
1.200
1.400
1.600
1.800
2.000
t
2.20 2.00 1.80 1.60
r(t)
1.40 1.20 Cycle 1
1.00
Cycle 2 0.80
Cycle 3
0.60
Fig. 9.55 Failure rate r.t / when k D 3, three exemplifying identical cycles of replacement
0.40 0.20 0.00 0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
4.0
4.5
370
9 Basic Models and Methods for Maintenance of Production Systems
in the first cycle, while in the generic cycle j 8 ˆ tj D .j 1/.T1 C T2 C T3 /; ˆ ˆ ˆ
1
2
3
3Cp C Cr C Cf F1 .T1 / C Cf F2 .T2 / CCf F3 .T3 / C Cf F4 .T4 / D RT R T2 1 N N 0 R F1 .t/ dt C 0 R F2 .t/ dt T3 N T4 N C 0 F3 .t/ dt C 0 F4 .t/ dt
2
Š
The configuration illustrated in Fig. 9.55 costs UEC.k D 3/ 2 .32:23; 32:25/: Obviously, the best solution (k D k D 2) is made up of identical cycles of replacement which differ from those illustrated in Fig. 9.55. In fact the generic cycle is made up of two periods (j is the generic replacement cycle) where 8 tj D .j 1/.T1 C T2 / ˆ ˆ ˆ ˆ ˆ ˆ ˆ
3 0:5 C 15 C 15 0:490 C 15 0:258 C15 0:124 C 15 0:057
0:668 C 0:331 C 0:155 C 0:071 D ¤ 28,840 per unit of time. We also discuss the results obtained assuming a progressive change for the generic scale parameter according to the relation ai D .1:1/1i . Then the cost Cp is assumed to be ¤ 5,000 per action. Figure 9.56 illustrates the trend of ri D1 .t/, ri D2 .t/, and ri D3 .t/ when k D 3. This figure is comparable to Fig. 9.54 (this is the reason the scale of y-axis adopted by Figs. 9.54 and 9.56 is the same). The best value of k, k , is 4, and the main results are reported below: T1 .k D k D 4; Cp D 5; ai D 1:11i / 2 .0:84; 0:85/; T2 .k ; Cp D 5/ 2 .0:694; 0:702/; T3 .k ; Cp D 5/ 2 .0:574; 0:581/;
Now, if Cp reduces from ¤ 5,000 per action to a value of ¤ 500 per action, the best value of k, k , is 4 because T1 .k D 1/ T1 .k D 2/ T1 .k D 3/ T1 .k D k D 4/ T1 .k D 5/: In particular, T1 .k D 4; Cp D 0:5/ 2 .0:82; 0:83/; T2 .k D 4; Cp D 0:5/ 2 .0:364; 0:369/; T3 .k D 4; Cp D 0:5/ 2 .0:162; 0:164/; T4 .k D 4; Cp D 0:5/ 2 .0:072; 0:074/; r1 .T1 / D r2 .T2 / D r3 .T3 / D r4 .T4 / 2 .1:64; 1:66/: The UEC assuming the lower bounds values of Ti is UECŒCp D 0:5; k D 4; T1 D 0:82; T2 D 0:364; T3 D 0:162; T4 D 0:072 P .k 1/Cp C Cr C Cf kiD1 Fi .Ti / D Pk R T i FNi .t/ dt i D1 0
T4 .k ; Cp D 5/ 2 .0:474; 0:480/; r1 .T1 / D r2 .T2 / D r3 .T3 / D r4 .T4 / 2 .1:68; 1:7/: The present optimal number of periods in the generic replacement cycle k D 4 differs from k D 2 previously illustrated and obtained assuming Cp D 5 and ˛i D .1:5/1i . Why? The reader can find the answer to this question by looking to the failure rate values and trends illustrated in Figs. 9.54 and 9.56: when ai D .1:5/1i the failure rate increases more rapidly on passing from the i th to the .i C 1/th period of the generic replacement cycle. Consequently, in this case it is more convenient to replace the item in advance (k D 2).
9.11.2 Type II Policy for Repairable Systems This policy is based on a replacement cycle of a component/system after (k 1) repairs. Consider an entity subjected to (i 1) repairs and repaired at age Ti from
9.11 Preventive Maintenance Policies for Repairable Systems
371
Failure rates for ai=(1.1)1-i 20.000 18.000
r1(t), r2(t), r3(t)
16.000 14.000 12.000
r1(t)
10.000
r2(t)
8.000
r3(t)
6.000 4.000 2.000 0.000 0.000
Fig. 9.56 r1 .t /, r2 .t / when k D 3 and ˛i D .1:1/1i
0.400
0.600
0.800
1.000
1.200
1.400
1.600
1.800
2.000
t
Ti
Ti
Ti
o
o
o
∫ wi (t )dt
∫ wi (t )dt
0.200
∫ wi (t )dt
Replacement with a new component
Cp
Cp
Cp
Cp
Cr
i=1
i=2
i=3
i = k-1
i=k
Cf Cf Cf T2
Tk T3 The same component “gets old”
Fig. 9.57 Type II policy, repairable systems. Replacement cycle and costs
the last planned repair. In the case of failure during the period Ti it is generally supposed that a minimal repair is carried out. A minimal repair does not influence the current failure rate, which changes after a replacement action (i D k and “as good as new hypothesis”) or a preventive repair of cost Cp . As a consequence, similarly to the type I policy illustrated in Sect. 9.11.1, the type II policy requires the determination of the best values of k and fTi g. Figure 9.57 illustrates the replacement cycle and related costs concerning the application of policy II to a repairable item. The expected number of failures (and related minimal repairs) in Œ0; Ti is ZTi EŒN.Ti / D W .0; Ti / D
wi .t/ dt ;
(9.58)
0
where wi .t/ is the entity failure rate after (i 1) planned preventive maintenance actions.
The UEC is UECŒk; T1 ; : : : ; Tk .k 1/Cp C Cr C Cf D Pk
R Ti i D1 0
Pk
wi .t/ dt
:
i D1 Ti
(9.59) By differentiating Eq. 9.59 with respect to Ti and equating to zero, wi .Ti / D
UECŒk; fTi g ; Cf
(9.60)
where fTi g are the optimal values of fTi g and 8 ˆ wi .Ti / D w1 .T1 /; 1 < i k ˆ ˆ ˆ < ZTi k X Œ.k 1/Cp C Cr ˆ ˆ Ti w1 .T1 / wi .t/ dt D : ˆ ˆ Cf : i D1
0
(9.61)
372
9 Basic Models and Methods for Maintenance of Production Systems
In order to solve this set of equations, the algorithm proposed in Sect. 9.11.1 and related to the generalized type I replacement model is a very useful tool. It is important to remember that there are different approximation ways to quantify the renewal function Wi .Ti / from wi .t/, but in the case of minimal repair it is possible to assume wi .t/ D i .t/. The availability of the system is Pk
AŒk; fTi g D
i D1 Ti : .k 1/Cp C Cr Pk Pk R T i CCf i D1 0 wi .t/ dt C i D1 Ti (9.62)
Similarly to the type I policy (see Sect. 9.11.1), the problem of maximizing AŒk; fTi g is equivalent to the problem of minimizing UECŒk; T1 ; : : : ; Tk .
the equipment after n periods (at the end of the nth period). Similarly, assuming n is the age of the capital equipment during all the operative cycles, the discounted cost back to the start of a generic operative cycle j is Cj .n/ D
n X Ci i D1
ri
C
1 .A Sn /: rn
(9.64)
The total discounted cost, which corresponds to an infinite chain of replacement, is C.n/ D
1 X Cj .n/ j D1
Pn D
rjn
Ci i D1 r i
D
Cj .n/DCj D1 .n/ 8j
C
1
1 .A rn n r
Sn /
:
C1 .n/
1 1 r1n (9.65)
9.12 Replacement of Capital Equipment Jardine and Tsang (2006) treated the determination of the optimal replacement interval, based on the minimization of total cost, for capital equipment subject to a sequence of cycles which are separated by replacements. An exemplifying cost-based model is now illustrated.
9.12.1 Minimization of Total Cost This model is based on the minimization of the total discounted cost generated by operation and maintenance activities on the capital production system to be replaced. A first operative cycle generates a cost discounted back to the beginning of the cycle equal to (Fig. 9.58) C1 .n/ D
n X Ci i D1
ri
C
1 .A Sn /; rn
(9.63)
r D 1 C i; where i is the interest rate, r is the discount rate, n is the age of the equipment at its replacement, Ci is the operation and maintenance cost during the i th period from the instant of time the equipment is new, A is the purchase cost of the equipment and generally includes the purchase price, the installation cost, and the cost for loss of production due to the time required to replace the equipment, and Sn is the residual value of
9.12.2 Numerical Example Consider a new capital production plant whose acquisition cost is US$ 17,000. The estimation of the annual operations and maintenance costs is reported in Table 9.19, which also reports the trend of resale values of the capital equipment after different numbers of years of operation. The trend of the annual costs increases because the efficiency of the production system decreases, while the resale value decreases in accordance with increasing obsolescence. Table 9.20 presents the results obtained by the evaluation of Eq. 9.61 for the determination of the optimal age n. In particular, the minimum total discounted cost is US$ 48,738, obtained for n D 2. This value corresponds to US$ 5,361 per cycle (i. e., US$ 377 in the first period/year and about US$ 4,984 in the second period) paid every 2 years. Table 9.20 also quantifies the value of Cj .n/=n, which is at its minimum for n D 7, but does not correctly consider the effect of time on discounted cash flows.
9.13 Literature Discussion on Preventive Maintenance Strategies In the literature there are a large set of studies on maintenance strategies, rules, models, and methods
9.14 Inspection Models
C1
C2
C3
1
2
3
373
Cn-1
…
Cn
n-1
n
C1
C2
C3
1
2
3
…
Cn-1 n-1
Cycle 1
Cn
…
n
…
Cycle 2 replacement
replacement
Fig. 9.58 Replacement cycles Table 9.19 Annual costs and resale values of the production plant Year Estimated annual costs Ci (US$) Residual value (US$) Ci =r i .A Sn /=r n
1
2
3
4
5
6
7
8
400 13,500 1.00 4449.98
600 12,000 2.00 7556.57
780 8,000 3.00 8713.03
1,100 6,000 4.00 10162.71
1,300 3,400 5.00 10651.95
2,000 1,890 6.00 10308.39
3,660 1,500 7.00 10101.34
4,900 900 8.00 17000.00
Table 9.20 Annual costs and resale values of the production plant n
Cj .n/ (US$)
C.n/ (US$)
Cj .n/=n (US$)
1 2 3 4 5 6 7 8
3,679.25 5,361.34 9,122.83 11,150.59 13,571.71 15,470.87 17,561.41 20,428.69
65,000 48,738 56,882 53,633 53,698 52,437 52,431 54,829
3,679.25 2,680.67 3,040.94 2,787.65 2,714.34 2,578.48 2,508.77 2,553.59
useful for planning, managing, and controlling maintenance actions. In this chapter we chose to discuss some basic analytical models for maintenance planning, sometimes supported by simulation, with a focus on numerical examples and applications useful to practitioners and managers of industry and services. Particular attention has been paid to the calculus based on the use of spreadsheets and what-if analysis. The reader can develop new and ad hoc supporting decisions models based on different levels of approximation, useful for the particular case study to be faced. An optimal maintenance strategy should properly incorporate various maintenance policies, single configurations, maintenance restoration degrees (e. g., imperfect, minimal, perfect), correlated failures and repairs, failure dependence, economic dependence, modeling tools (e. g., simulation, Markov chain), planning horizon, etc. (Wang and Pham 2006a). To this purpose few literature references are here cited and further reported in detail in bibliography.
In particular, Wang (2002) presented a survey on maintenance policies with particular attention to the factors most affecting the configuration of optimal maintenance strategies and rules. Lai et al. (2000) presented a case study to determine and schedule preventive maintenance and replacement actions, starting from data collection on failures and repair events. Dekker and Roelvink (1995) presented some advanced cost-based models for the group replacement strategy. Other significant references are reported in the bibliography (e. g., Clavareau and Labeau 2009; Rezg et al. 2005).
9.14 Inspection Models Inspection maintenance actions generally try to detect and correct minor defects of the production system before major (i. e., critical and expensive) breakdowns occur. The status of function of the component/production system, subject to the inspection maintenance strategy, is known only after a fault finding action (inspection action) based on the determination of the values assumed by a set of critical equipment indicators (or production system parameters). These indicators suggest the need for a further maintenance action (e. g., replacement, repair, overhaul, depending on the outcome of the survey) to the manager. In other words, a therapy eventually follows a diagnostic investigation. The main goal of an inspection model is to determine the optimal inspection schedules, i. e., the points in time at which the inspection action should take place. The challenge is to identify the optimal frequency and intensity of inspections aris-
374
9 Basic Models and Methods for Maintenance of Production Systems
ticular, the following assumptions have been adopted for the component (called “block 1” in Fig. 9.60): costs
C failure C inspecon C tot
me - inspecon frequency
Fig. 9.59 Optimal inspection frequency/interval determination
ing from the reduction in system failures, breakdowns, loss of production, wages, etc. The effectiveness of an inspection strategy allows one to detect the potential failure of a component that, if neglected, would cause a failure of the complete equipment, i. e., the production system the component belongs to. When a detected failure is repaired, the failure distribution of the equipment is generally based on a reduced rate of failure. In this case the problem is the determination of the best failure rate of the equipment. Sometimes the problem is explained in terms of the determination of the best level of inspection. In particular, higher inspection costs offer the opportunity to identify potential failures in a better way. Jardine and Tsang (2006) distinguished three classes of inspection problems: 1. determination of inspection frequencies for continuous operation equipment subject to breakdowns; 2. determination of inspection intervals for equipment used only in emergency conditions, the socalled protective devices coming into service in the case of need; 3. optimization of condition monitoring decisions for production equipment. Figure 9.59 exemplifies the relationships between inspection costs (Cinsp / and corrective/breakdown costs (Cfailure ) for the determination and minimization (Cmin ) of the global cost of maintenance on a production system. T is the optimal inspection frequency/interval which minimizes costs and/or maximizes the throughput of the system. Figure 9.60 shows the trend of the well known up/down analysis conducted on a component/system subject to an inspection maintenance strategy. In par-
1. Inspection frequency: every 100 units of time17 . 2. Duration of inspection constant and equal to 20 units of time. 3. Block 1 is subject to failures and the ttf’s probability distribution is N.250; 10/. 4. In the case of failure, block 1 is repaired after the inspection action. 5. The ttr is constant and equal to 50 units of time. 6. The inspection action does not change the reliability status of the component. In other words, the restoration factor q is assumed to be equal to 0. 7. The corrective action is a restoration based on the as good as new hypothesis (i. e., restoration factor q D 1). Figure 9.60 distinguishes the operating time from the time under repair (corrective action), from the socalled “waiting for repair opportunity.” In particular, the waiting for repair opportunity is the time the component/system waits before an inspection is executed, and the so-called dormant failure is identified and repaired. It is also possible to introduce maintenance preventive actions combined with inspection actions as exemplified in Fig. 9.61, where tIM is the duration of the inspection action and tPM is the duration of preventive restoration action. For this purpose we define the so-called warning period from an instant of time tP , when a potential failure can be detected, to tF , when the failure occurs. Every inspection action scheduled within this period is followed by a preventive maintenance action based on a specific value of the restoration factor q. The preventive maintenance task is triggered to take action against the failure occurring. In particular, Fig. 9.61 assumes q D 1, i. e., the component is replaced with another item as good as new and the duration of replacement is 10 units of time. The following sections illustrate some basic models for the application of inspection maintenance combined with corrective maintenance, and not with preventive maintenance actions as illustrated in Fig. 9.61. In fact inspection maintenance without corrective maintenance does not exist. 17
This is the time of the system S , not the age of component block 1
9.15 Single Machine Inspection Model Based on a Constant Value of Conditional Probability Failure
375
Block Up/Down State Operating Time Time Under Repair Waiting for Repair Opportunity
Block 1
System
0.000
200.000
400.000
600.000
800.000
1000.000
Time, (t)
Fig. 9.60 Inspection and corrective actions. ReliaSoft® software
PM action
Scheduled inspection
starting time
tIM
Fig. 9.61 Inspection, preventive, and corrective actions
tP
9.15 Single Machine Inspection Model Based on a Constant Value of Conditional Probability Failure The aim of this model, indicated as elementary inspection maintenance, is to minimize the maintenance cost of a production cycle beginning when the component/system is new and starts to operate, and ending when the component/system fault is found. The instant of time the failure is identified (failure identification, Ii ) generally differs from the time the failure occurs (failure event) as illustrated in Fig. 9.62. The cost of the maintenance action (e. g., minimal repair, replacement) depends on time during which the production system does not operate correctly in
warning period
tPM
tF
time
accordance with specifications. The costs related to a generic maintenance action can be summarized as follows: • CF is the cost per unit time including repair cost action and costs for an unproductive function of the production system. • CI is the cost of an inspection. CF differs from Cf introduced in the previous sections: the former depends on the period of time passed from the failure event to its instant of identification, e. g., euros per hour; the latter is the cost of a single repair action, e. g., euros per action. This model is based on a constant value p for the conditional probability that the component/system, in a state of function in ti 1 , fails between two instants
376
9 Basic Models and Methods for Maintenance of Production Systems
τ Fig. 9.62 Failure event and failure identification. F_Event failure event
F_Event
Ii-1 ti-1
ti
Ii
ti+1
ti+2
time
ti+3
Fig. 9.63 Schedule of inspection times. Example
time
of time (ti 1 , ti / when inspection actions (Ii 1 , Ii ) are executed, defined as follows: F .ti / F .ti 1 / D p; R.ti 1 /
(9.66)
By the differentiation of Eq. 9.69 and equating to zero, one can find the optimal value of p, called p , as dCTotal D 0: (9.71) dp p
i. e., F .ti / F .ti 1 / D .1 p/i 1 p:
(9.67)
The operative cycle is defined as the time interval from the point in time the component/system begins to work to the failure detection during an inspection action. The average number of inspection actions before the identification of a failure is i D
1 X
if .i / D
i D1
1 X
Œi.1 p/i 1 p D
i D1
1 : (9.68) p
Figure 9.63 illustrates an example of an inspection time schedule, where the distance between two generic instants of time Ii 1 , Ii in the series reduces as the component/system gets older and older. The cost of an operative cycle is CTotal D CI i C CF .p/;
(9.69)
where .p/ is the mean time to detection of the failure. An estimation of the mean time to detection is based on the probability that the failure event is occurring in the range Œti 1 ; ti : EŒ .p/ D
1 X
" Zti
D D
1 X i D1 1 X
ti 1
where q D 1 p.
and
8
1
Z1
CTotal D CI i C CF .p/ X 1 CI .i 1/ C CF ti q p MTTF D p
xf .x/ dx
i D1
1
0
i D1
F .t0 /D0
:F .t / D 1 et1 1
Subsequently, from Eqs. 9.69 and 9.70,
ti ŒF .ti / F .ti 1 / ti q .i 1/ p MTTF;
An industrial component has its ttf distributed in accordance with a negative exponential distribution, and its MTTF is 1;000 h. The component is subjected to an inspection maintenance, the cost of an inspection is US$ 64,000, and the cost per unit time CF is US$ 800 per hour. The following trivial relationships can be obtained from Eq. 9.67: 8
# .ti x/f .x/ dx
t D1
9.15.1 Numerical Example 1, Elementary Inspection Model
X CF CI C CF t1 i q .i 1/ p 1 p MTTFD i D1 ln.1 p/ 1 CF CI Eq. (7-44) C C : D F P i 1pD 1 p p i D 1 i D1 Œi.1p/ p D
(9.70)
9.15 Single Machine Inspection Model Based on a Constant Value of Conditional Probability Failure
Differentiating and equating to zero, CI dCTotal ln.1 p/ 1 D 2 C CF C CF 2 dp p p .1 p/p D 0; i. e., CI p C ln.1 p/ D : 1p CF Now, p 1 et1 C ln.1 p/ D t1 1p 1 .1 et1 / 1 et1 t1 et1 D et1 1 t1 : D
Table 9.21 Elementary inspection model. p values, exponential distribution CI =CF
p
CI =CF
p
0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.20 0.30 0.40 0.50
0.131 0.181 0.209 0.236 0.261 0.280 0.298 0.312 0.326 0.340 0.436 0.496 0.541 0.576
0.60 0.70 0.80 0.90 1.00 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00 10.00
0.604 0.628 0.648 0.667 0.682 0.778 0.826 0.856 0.876 0.891 0.903 0.912 0.920 0.926
D
As a consequence, it is possible to quantify t1 by the following: e
t1
1 X CI C CF zi q .i 1/ p t p Zi D i i D1 1 X c C D CF zi q .i 1/ p ; p
D
i D1
where zi is the standard variable of the standard normal distribution (see Appendix A.1), N.0; 1/, and cD
1
CTotal
X CI C CF D ti q .i 1/ p CF p i D1
CI : CF
Table 9.22 summarizes the best values of p for different values of c. The values of c and p in the production system, the subject of this application, are
9.15.2 Numerical Example 2, Elementary Inspection Model The variable ttf of a painting system in a production plant is distributed in accordance with a normal distribution (MTTF D 700 h and standard deviation D 150 h). CI D ¤ 650 and CF = ¤48 per hour. The maintenance manager has to schedule inspection actions on the system in accordance with the elementary inspection model. By applying Eqs. 9.69 and 9.70,
1 X ti .i 1/ CI C CF q p p i D1
CI 1 t1 D : CF
Table 9.21 summarizes the best values of p for different values of CCFI . t1 and ti , expressed in hours, can be quantified as follows: 8 ln.1 p/ ˆ ˆ D ln.1 p /MTTF ˆ < t1 D F .t1 /Dp D Œln.1 0:312/ 1000 Š 374 h ˆ ˆ ˆ : ti D i t1 hours.
377
cD
650 CI D Š 0:09 CF 48 150
Table 9.22 Elementary inspection model. p values, normal distribution c
p
c
p
0.01 0.03 0.05 0.07 0.09 0.10 0.30
0.0985 0.1734 0.2234 0.2628 0.2956 0.3103 0.4927
0.50 0.70 0.90 1.00 2.00 3.00 4.00
0.5897 0.6538 0.7001 0.7189 0.8278 0.8769 0.9069
378
9 Basic Models and Methods for Maintenance of Production Systems
F .t4 / Š 0:7538;
and p D 0:2956: As a consequence, it is possible to identify the values of ti by the following: F .ti / F .ti 1 / F .t1 / F .0/ D D F .t1 / D 0:2956; R.ti 1 / R.0/ R.t1 / D 1 F .t1 / Š 0:7044; t1 700 ; 150 t1 Š 700 C 150 .0:535/ D 619:75 h;
1 .0:2956/ Š 0:535 D z1 D Fstd
1 where Fstd .y/ is the inverse function of the failure probability function (see Appendix A.1), where y D F .t/. The determination of t2 in the case of no failure detection in t1 gives
F .t2 / F .t1 / F .ti / F .ti 1 / D R.ti 1 / R.t1 / F .t2 / 0:2956 D 0:2956; D 0:7044 F .t2 / 0:5038; R.t2 / D 1 F .t2 / Š 0:4962; t2 700 ; 150 t2 Š 700 C 150 .0:01/ D 701:5 h:
1 z2 D Fstd .0:5038/ Š C0:01 D
R.t4 / D 1 F .t2 / Š 0:2462; t4 700 ; 150 t4 Š 700 C 150 .0:685/ D 802:75 h: 1
z4 D Fstd .0:7538/ Š 0:685 D
The inspection action has to be performed more quickly as the component/system is getting older, as illustrated in Fig. 9.64. The periods of time between two consecutive inspections are t1 t2 t3 t4
D t1 t0 D t2 t1 D t3 t2 D t4 t3
D 619:75 0 Š 619:5 h; D 701:5 619:5 Š 82 h; D 757:75 701:5 Š 56:3 h; D 802:75 757:75 Š 45 h:
In the case of a failure detected during the third planned inspection (I3 / with an immediate replacement of the component/system (or similarly another as good as new maintenance action is performed) during the repair activity, the scheduled instants of time obtained for the inspections are: t1 t2 t3 t4 D t3 C t1 t5 D t3 C t2 t6 D t3 C t3
D 619:75 h D 701:5 h D 757:75 h D 757:75 C 619:75 D 1;377:5 hI D 757:75 C 701:5 D 1;459:3 hI D 757:75 C 757:75 D 1;515:5 h; etc.
The values obtained are illustrated in Fig. 9.65.
For t3 (no failure detection in t2 / and t4 (no failure detection in t3 / we have F .t3 / F .t2 / F .ti / F .ti 1 / D R.ti 1 / R.t2 / F .t3 / 0:5038 D 0:2956; D 0:4962 F .t3 / Š 0:6505; R.t3 / D 1 F .t2 / Š 0:3495; t3 700 ; 150 t3 Š 700 C 150 .0:385/ D 757:75 h; F .t4 / F .t3 / F .ti / F .ti 1 / D R.ti 1 / R.t3 / F .t4 / 0:6505 D 0:2956; D 0:3495 1
z3 D Fstd .0:6505/ Š 0:385 D
9.16 Inspection Frequency Determination and Profit per Unit Time Maximization This analytical model of Jardine and Tsang (2006) relates to a single machine inspection and is based on a negative exponential distribution of equipment failures (failure rate ) and repair times (repair rate ). .n/ measures the breakdown rate of the equipment and is a function of the number of inspections n performed on the equipment. As a consequence, the profit per unit time P .n/ is a function of n: P .n/ D V .V C R/
n .n/ .V C I / ; i
(9.72)
9.16 Inspection Frequency Determination and Profit per Unit Time Maximization 619 h
701 h
Fig. 9.64 Schedule of inspection maintenance actions
Δt2
619 h
Fig. 9.65 Schedule of inspection maintenance actions in case of a failure in Œt2 ; t3
Δt2
where V is the output profit per unit time in an interrupted (i. e., without downtime losses) unit time, n is the number of inspections per unit time, 1= i is the mean time of negative exponentially distributed inspection times, i. e., duration of the inspection time, R is the cost of repair per unit of time, and I is the cost of inspection per uninterrupted unit of time. By Eq. 9.72 the output value is lost both during repairs and during inspections, i. e., both repairs and inspections have a cost. Differentiating Eq. 9.72 and equating to zero, 0 .n/ .V C I / dP .n/ D .V C R/ D 0; (9.73) dn i i. e.,
V CI .n/ D ; i V CR 0
(9.74)
where 0 .n/ D
d.n/ : dn
We now provide a numerical example. The failure rate of a component is distributed in accordance with a negative exponential function depending on the number of inspections performed as .n/ D =nˇ ;
757 h 803 h
Δt3
701 h
379
Δt4
time
1377 h
757 h
Δt3
1459 h
Δt2
1515 h
Δt3
time
per unit time and ˇ is an adimensional parameter introduced to obtain more general results. Differentiating the failure rate function and applying Eq. 9.74, V CI ; 0 .n/ D ˇ nˇ 1 D i V CR i. e., nD
1=.ˇ C1/ V CI : iˇ V C R
If we assume the unit period is 1 month, i D 90 month1 , D 30 month1 , D 6 month1 , V D ¤ 22,000 per month, I D ¤ 210 per month, and R D ¤ 310 per month, Table 9.23 presents the values of .n/ and P .n/ obtained for ˇ D 2. In particular, the optimal value of n, n , is 3:31 month1, i. e., about one inspection every week assuming 21 working days per month. Figure 9.66 presents the value assumed by the different economic contributions to the profit per unit time assuming ˇ D 2. Figure 9.67 shows the value assumed by the profit per unit time for different values of ˇ. Finally, Table 9.24 presents the optimal value n for different ˇ.
Table 9.24 n values for different ˇ
where is the breakdown rate (i. e., number of breakdowns per unit time) when an inspection is performed
ˇ n
0.1 1.713
0.5 4.340
1 4.252
2 3.307
3 2.714
Table 9.23 Profit maximization ˇ D 2, numerical example n
3.31
1.00
2.00
3.00
4.00
5.00
6.00
7.00
8.00
9.00
10.00
.n/ 0.55 6.00 1.50 0.67 0.38 0.24 0.17 0.12 0.09 0.07 0.06 P .n/ 20,775.90 17,291.22 20,390.94 20,763.89 20,734.01 20,587.63 20,395.39 20,181.49 19,956.06 19,723.91 19,487.60
380
9 Basic Models and Methods for Maintenance of Production Systems 17291.22
20390.94
20763.89
20734.01
20587.63 20395.39
20181.49
19956.06
19723.91
19487.60
10000
P(n)
1000
Output lost for inspecons
100
Cost of repairs Cost of inspecons P(n)
10
Output lost for repairs
1
1
2
3
4
5
6
7
8
9
10
0
n -number of inspecon per unit me
P(n)
Fig. 9.66 Profit per unit time contributions, ˇ D 2
21500 21000 20500 20000 19500 19000 18500 18000 17500 17000 16500 16000 15500
i. e., 0 .n/ D : i
β=0.1 β=0.5 β=1 β=2 β=3 1
2
3
4
5
6
7
8
9
10
n -number of inspecons per unit me
Fig. 9.67 P .n/ and n for different values of ˇ , numerical example
We now provide a numerical example. Considering the previous example for the maximization of the profit per unit time, one can determine the best number of inspections per unit time n by solving the following: 0 .n/ D ˇ nˇ 1 D ; i i. e.,
9.17 Inspection Frequency Determination and Downtime Minimization The total downtime per unit time D.n/ is a function of the inspection frequency n in accordance with the definitions introduced in Sect. 9.16: D.n/ D .n/MTTR C nMTTI D
nD
iˇ
1=.ˇ C1/ :
Figure 9.68 presents the best number of inspections per unit time n, n , for different values of ˇ. Finally, Table 9.25 presents the optimal values of n, n for different ˇ.
n .n/ C ; i (9.75)
where MTTI is the mean time duration of an inspection on a single machine. 0 .n/ 1 dD.n/ D C D 0; dn i
Table 9.25 n values for different ˇ ˇ n
0.1 1.706
0.5 4.327
1 4.243
2 3.302
3 2.711
D(n)
9.18 Inspection Cycle Determination and Profit per Unit Time Maximization 0.280 0.260 0.240 0.220 0.200 0.180 0.160 0.140 0.120 0.100 0.080 0.060 0.040
β=0.1 β=0.5 β=1 β=2 β=3 1
2
3
4
5
6
7
8
9
10
n -number of inspecons per unit me
Fig. 9.68 Downtime minimization for different ˇ values
381
P2 .T / D EŒptnt < T Ci CR RT ptf .t/ dt Ci CR ; D 0 F .T / RT
(9.77)
ptf .t / dt
where 0 F .T / is the expected profit in the presence of a failure in T . By the application of the integration by parts as in Eq. 5.38, the expected profit per cycle is P .T / D P1 .T /R.T / C P2 .T /F .T / ZT D .pT Ci /R.T / C
ptf .t/ dt 0
9.18 Inspection Cycle Determination and Profit per Unit Time Maximization
.Ci C CR /F .T / D
RT RT 0 tf .t / dt DTR.T /C 0 R.t / dt
.pT Ci /R.T /
C Œ1 R.T /.Ci C CR / The following notation is derived from the optimization maintenance model of Hariga (1996) for a single machine:
ZT Cp
tf .t/ dt 0
p
T
P .T / P1 .T / P2 .T / UP.T / Ci CR
Profit per unit time when the production resource (frequently called “machine”) is operating. Length of the operative cycle (inspection cycle), i. e., the interval when the inspection action takes place (in time t, t < T ) and a failure can be detected. Expected profit per inspection cycle. Profit per operative cycle in the hypothesis of the absence of a failure. Profit per operative cycle in the hypothesis of the presence of a failure. Expected profit per unit time. Inspection cost (i. e., cost of a scheduled inspection action). Cost of repairing or replacing the machine. The as good as new hypothesis is generally adopted.
The durations of inspection and repair activities are supposed to be negligible and all failures are equally expensive in terms of repair. Subsequently, the generic failure is supposed to halt completely the production, and finally the machine is as good as new after repair. The following equations can be derived: P1 .T / D pT Ci ;
(9.76)
ZT R.t/ dt C CR R.T / Ci CR :
Dp 0
(9.78)
and the expected profit per unit time: UP.T / D
p
RT 0
R.t/ dt C CR R.T / Ci CR T (9.79)
Hariga (1996) demonstrated the existence of an unique break-even-point inspection interval.
9.18.1 Exponential Distribution of ttf In the case of an exponential distribution of the ttf, f .t/ D exp.t/;
(9.80)
UP.T / RT p 0 exp.t/ dt C CR exp.T / Ci CR D T p Œ1 exp.T / C CR exp.T / Ci CR D : T (9.81)
382
9 Basic Models and Methods for Maintenance of Production Systems
2000 1900
UP [US$/week]
1800 1700 1600 1500 1400 1300 T [week]
Fig. 9.69 UP.T / – profit per unit time maximization, Exponential distribution, numerical example
Figure 9.69 reports the values of the expected unit profit for different values of T , assuming D 0:001 week1, CR D US$ 8,550 per action, Ci D US$ 650 per action, and p D US$ 2,000 per week. The optimal inspection interval is about 26 weeks, and the optimal expected profit per unit time is about US$ 1,941 per week.
9.18.2 Weibull Distribution of ttf In the case of a Weibull distribution of ttf, Eq. 5.67 quantifies the density function at a point in time t: ˇ t ˇ 1 t ˇ ; (9.82) f .t/ D exp ˛ ˛ ˛ where ˛ is a scale parameter of the Weibull distribution and ˇ is a shape parameter of the Weibull distribution. By the application of Eq. 9.79,
D
p
RT 0
R.t/ dt C CR R.T / Ci CR T RT
t ˇ dt p 0 exp ˛
T ˇ CCR exp ˛ Ci CR
R.T /DFN .T /
RT
D
FN .t / dt b Da Œ1C b1 ;. T a/ CTF .T / 0
T
T
9.18.3 Numerical Example Consider the example introduced in Sect. 9.5.5 where the density function of ttf is defined as follows: 8 1 ˆ ˆ ; 0 t < 4; ˆ ˆ8 ˆ < f .t/ D 1 ; ; 4 t 7 .week1 / ˆ ˆ 6 ˆ ˆ ˆ : 0 otherwise: 500 450
T ˇ
p ˛ 1 C ; ˛ ˚
ˇ CT 1 exp T˛
ˇ CCR exp T˛ Ci CR 1 ˇ
where .x; z/ is the lower incomplete function, explicitly introduced by Eq. 9.56. Figure 9.70 reports the values of the expected unit profit for different values of T and ˇ, when the scale parameter ˛ D 50 units of time, CR D US$ 5,000 per action, Ci D US$ 1,000 per action, and p D US$ 500 per unit of time.
;
Expected unit profit [$/week]
UP.T / D
Fig. 9.70 UP.T / ($/unit of time) – profit per unit time maximization, Weibull distribution, numerical example
400 350 300 250 200 150 100 50 0 1
(9.83)
2
3
4 T [week]
5
6
Fig. 9.71 Unit of profit (per week), numerical example
7
9.19 Single Machine Inspection Model Based on Total Cost per Unit Time Minimization
383
Table 9.26 Unit of profit (per week), numerical example T UP.T /
1 25.00
2 375.00
3 441.67
4 437.50
The unit costs of maintenance are CR D US$ 2,000 per action and Ci D US$ 850 per action, and the profit per unit time p D US$ 1,200 per week. The expected value of unit profit can be quantified as follows: RT p 0 R.t/ dt C CR R.T / Ci CR UP.T / D T 8 RT
ˆ p 0 1 18 t dt C CR 1 18 T Ci CR ˆ ˆ ˆ ; ˆ ˆ T ˆ ˆ ˆ 0T 4 ˆ ˆ ˆ < R4
R T 7t 1 D p 0 1 8 t dt C p 4 6 dt ˆ ˆ Ci CR CCR 7T ˆ ˆ 6 ˆ ; 4
1 p T 16 T 2 C CR 1 18 T Ci CR ˆ ˆ ˆ ; ˆ ˆ T ˆ ˆ ˆ ˆ 0T 4 ˆ ˆ ˆ
< 7 1 T2 7T D p 6 T 3 12 C CR 6 Ci CR ; ˆ ˆ T ˆ ˆ ˆ 4
4 437.50
5 383.33
6 313.89
7 235.71
starts when a failure is detected and the equipment is repaired, or replaced, and ends when a new failure is detected. The aim of the proposed model is the determination of the optimal inspection schedule that minimizes the global expected cost per unit time. Inspections at time x1 ; x2 ; : : :; xn are performed until failure is detected. In other words, the objective is to determine the values of fxi g that minimize the total UEC. The expected cost of maintenance, if a failure occurs in the range Œxk1 ; xk , is Zxk ŒkCi C Cu .xk t/ C CR f .t/ dt;
(9.84)
xk1
where f .t/ is the statistical distribution of ttf for the component/system, Ci is the cost of an inspection action, Cu is the cost per unit time associated with undetected equipment failure, and CR is the cost of repair (or replacement). Figure 9.72 illustrates the main contribution to the maintenance cost and the instants of time when the generic maintenance actions are performed. The total expected cost per cycle is ECŒfxi g x 1 ZkC1 X D Œ.k C 1/Ci C Cu .xkC1 t/ C CR f .t/ dt kD0 xk x 1 ZkC1 X D CR C Œ.k C 1/Ci C Cu .xkC1 t/f .t/ dt: kD0 xk
(9.85)
9.19 Single Machine Inspection Model Based on Total Cost per Unit Time Minimization
The expected cycle length is x 1 ZkC1 X .xkC1 t/f .t/ dt ; ETŒfxi g D MTTF C TR C kD0 xk
This model is based on the definition of the production cycle, as illustrated in Sect. 9.15: the generic cycle
where TR is the repair time.
(9.86)
384
9 Basic Models and Methods for Maintenance of Production Systems
failure detection
failure event, tk Ci
Ci
Ci
x1
x2
x3
Ci
Ci CR
xk-1= x4
time
xk=x5
cycle length
new cycle
(x5 - tk) Cu Fig. 9.72 Total cost per unit time minimization model, k D 5
The expected cost per unit time is ECŒfxi g ETŒfxi g R xkC1 P CR C 1 Œ.k C 1/Ci kD0 xk CCu .xkC1 t/f .t/ dt R xkC1 D : P MTTF C TR C 1 .xkC1 t/f .t/ dt kD0 xk (9.87)
out of control for the component/system, the expected cost of repair and restoration is
UECŒfxi g D
The timing fxi g could be very hard because of the great number of variables to be managed and optimized simultaneously. As a consequence, in order to identify the best fxi g, it could be useful to apply the simulation analysis on different operating systems, similarly to the what-if analyses conducted in Sects. 9.4.2, 9.5.2, 9.6.1, etc., on preventive maintenance .
R Ti C1 Ci0C1
Ti
Œr C s.Ti C1 t/f .t/ dt ; FN .Ti /
(9.88)
where FN .Ti / D 1 F .Ti / is the survival function, RT F .Ti / D 0 i f .t/ dt is the failure probability function,18 r is the minimal repair cost, and s is the cost per unit time of function in an out of control state. Figure 9.73 illustrates the inspection rule proposed by the authors exemplifying a failure event in time t in the interval between the instants of time T2 and T3 . CI is the cost of an inspection, Cr the cost of replacement. This event is detected during the inspection performed in T3 . Consequently, the expected cost per unit time can be quantified by the following:
9.20 Single Machine Inspection Model Based on Minimal Repair and Cost Minimization Banerjee and Chuiv (1996) presented a single machine inspection model for cost per unit time minimization. The production cycle has a duration T defined from the starting time of the component/system to its replacement or restoration in as good as new condition. This period of time is made up of a sequence of n intervals T1 ; T2 ; : : : ; Tn , separated by inspections. A minimal repair action of cost r is performed in the case of detection of the out of control state of the production system. The aim of the model is to define n and fTi g. The generic minimal repair does not alter the probability density function f .t/. Considering the generic inspection interval of duration Ti C1 Ti , in the case of detection of a state of
D
UECfn; .Ti D1;:::;n /g D
Cr C nCI C T
Pn1 i D0
Ci0C1
;
(9.89) where T is the cycle length. How should we minimize UEC? We suggest the use of a what-if analysis performed by Eq. 9.89 and/or by the application of the event simulation, such as the Monte Carlo simulation, and/or by a continuous simulation. In the first case, a successful attempt is introducing the probability density function f .t/ of the variable ttf, e. g., a Weibull function supposing different values of the set of times (Ti /, and simplifying the analytical expression of UEC by the introduction of the lower incomplete gamma function, illustrated and applied in Sect. 9.11.1.1. For example, we can suppose 18
Only a density function, called f .t /, is defined because of the minimal repair hypothesis.
9.21 Inspection Model Based on Expected Availability per Unit Time Maximization Failure in t CI
CI
CI
T1
T2
T3
replacement CI
C 2' =
∫
T1
CI
Cr
T4
cycle length T2
385
time n
new cycle
[r + s (T2 − t )] f (t )dt F (T2 )
r + s(T3 - t)
Fig. 9.73 Inspection model based on minimal repair and cost minimization
one of the following values of Ti in order to reduce the number of degrees of freedom: Ti D T = i; Ti D T = i 2 ; Ti D T = i k ; Ti D T =.1:5/i ; Ti D T =.k/i ;
where k > 1:
It is possible to conduct a Monte Carlo simulation by the use of a spreadsheet, e. g., in Microsoft Excel, or by the application of an ad hoc tool for event simulation analysis. This tool can be a general purpose statistical tool or reliability software, e. g., Weibull reliability software. Finally, for the continuous simulation MATLAB® and Simulink® are very effective decision tools for solving this kind of optimization problem.
9.21 Inspection Model Based on Expected Availability per Unit Time Maximization This is a useful model in particular to plan inspections of equipment used in emergency conditions (Jardine and Tsang 2006), e. g., protective devices (fire hydrants, fire extinguishers, diesel generators for runway lights, automatic switchers for emergency power supply, etc.). The determination of the optimal inspection intervals is generally called a “failure-finding interval” problem. If a piece of equipment is found in a failure state it is repaired or replaced. As a consequence, it is possible to distinguish good cycles, where no failures are detected and whose length is ti CTi , from failed cycles based on the MTTF of the equipment and whose duration is ti C Ti C Tr , where ti is the inspection interval, Ti is the duration
of an inspection, and Tr is the repair (or replacement) duration. A basic hypothesis is that the equipment is as good as new after a replacement or a repair activity following an inspection. The expected uptime per cycle is R ti tf .t/ dt Œ1 R.ti / UT.ti / D ti R.ti / C 1 1 R.ti / Zti tf .t/ dt: (9.90) D ti R.ti / C 1
The expected cycle length is L.ti / D .ti C Ti /R.ti / C .ti C Ti C TR /Œ1 R.ti /: (9.91) Therefore, the expected availability per unit time is R ti ti R.ti / C 1 tf .t/ dt A.ti / D : ti C Ti C Tr Œ1 R.ti /
(9.92)
We now provide a numerical example. Considering the numerical example introduced in Sect. 9.5.5, Eq. 9.92 becomes A.ti / D
ti R.ti / C
R ti
2f ty
tf .t/ dt
ti C Ti C Tr Œ1 R.ti /
8
t2 ˆ ti 1 18 ti C 16i 16ti ti2 ˆ ˆ
1 D ; ˆ ˆ ˆ 2.8ti C 8Ti C Tr ti / ti C Ti C Tr 8 ti ˆ ˆ ˆ ˆ ˆ 0 ti < 4 ˆ ˆ < ti2 4 7ti 2 ti 6 C 12 14ti ti 4 D ;
ti 1 D ˆ ˆ 2Œ6ti C 6Ti C Tr .ti 1/ ˆ ti C Ti C Tr 6 ˆ ˆ ˆ ˆ 4 ti 7 ˆ ˆ ˆ 45 ˆ ˆ ˆ 12 : ; ti 7: ti C Ti C Tr
386
9 Basic Models and Methods for Maintenance of Production Systems
Table 9.27 Maximization of availability, numerical example ti A.ti /
0 0.00
1 0.63
2 0.68
3 0.67
4 0.63
5 0.58
6 0.53
terval can be determined for each machine:
7 0.46
ti;j C1 ti;j D ti;j ti;j 1 D Ti : The following assumptions have to be further considered (see Fig. 9.75):
0.8
T
0.7 0.6
A(ti)
0.5
A(ti)
0.4
tij
0.3 0.2
A
0.1 0 0
2
4
6
8
ti
length of planning horizon; consequently, the cycle time T is usually made up of several basic cycles of duration T0 ; time between (j 1)th and j th inspections of machine i ; average setup cost of a review for inspection, not dependent on the machines inspected.
The number of inspections for machine i during the time periodT is defined as follows:
Fig. 9.74 Maximization of availability, numerical example
ni D Assuming Ti D 0:4 weeks and Tr D 0:7 weeks, Table 9.27 and Fig. 9.74 report the values of A.ti / for different values of the inspection time. In particular, the best inspection interval corresponds to 2 weeks.
9.22 Group of Machines Inspection Model This model developed by Ben Daya and Duffuaa (1997) is a generalization of that illustrated in Sect. 9.20 when applied to a set of machines: the aim is to choose a subset of machines to be inspected in order to coordinate the maintenance activities, thus gaining a significant advantage of setup costs and generating savings. The time between two consecutive setups is the so-called basic cycle, having duration T0 , and the objective is the determination of the inspection time Ti for machine i , i D 1; : : : ; N , so as to minimize the total UEC. The generic item/machine shifts from an in-control state to an out-of-control state, only detectable by an inspection. The duration T0 corresponds to a periodic review to determine those machines to be inspected. The average setup cost for this reviewing activity isA, and ti;j is the point in time the j th inspection is executed for item i . If the ttf variable is distributed in accordance with an exponential distribution, a constant inspection in-
T : Ti
These are three different contributions to the global expected cost per cycle, or planning horizon, of length T : 1. Setup cost
T T0 A.
P T 2. Cost of inspection N i D1 Ti ai , where ai is the unit inspection cost on machine i and N is the number of machines. 3. Similarly to the single machine model illustrated in Sect. 9.20, the generic failure cost contribution for the component i and the time period Œti;j ; ti;j C1 is defined as R ti;j C1 Ci;j C1 D
ti;j
Œri C si .ti;j C1 t/fi .t/ dt FN .ti;j /
;
(9.93) where ri is the repair cost of machine i and si is the cost per unit time of function in the “out of control” state for machine i . Consequently, the expected cost per unit time is UEC.T0 I T1 ; : : : ; Tj ; : : : ; TN / PN T PN Pni 1 T i D1 Ti ai C i D1 j D0 Ci;j C1 T0 A C D T N N nX i 1 X X ai Ci;j C1 A : (9.94) D C C T0 Ti T i D1
i D1 j D0
The inspection time Ti is a multiple of the basic cycle T0 and the objective of the model is to determine Ti .
9.23 A Note on Inspection Strategies
Failure: out of control
T1=t1,2-t1,1 Machine i=1
ai=1
a1
ti=1,1
t i,2
387
a1
a1
t i,3
ti, 4
repair a1 ti, 5
time
Planning horizon T r1+s1(t1,4-t)
Machine i=2
. . .
T2=t2,2-t2,1
Failure: out of control
repair
a2
a2
a2
a2
a2
ti=2,1
t2,2
t2,3
t2,4
t2,5
time
Planning horizon T
Machine i=N
r2+s2(t2,3-t)
Fig. 9.75 Group of machines inspection model
Consequently, Ti D ki T0 , where ki is the number of basic cycles between two consecutive inspections on machine i . The same final conclusions of Sect. 9.20 about the use of simulation as an effective tool to support the minimization of equations and models, such as Eq. 9.94, can be drawn in this case too.
9.23 A Note on Inspection Strategies Inspection can be defined as the process of planning, implementation, and evaluation of examinations to determine the condition of a piece of equipment subject to deterioration, in terms of fitness for service. Deterioration is made up of damage, defects, or degradation, and inspection strategies are particularly useful in the presence of lack of data (e. g., on materials, equipment design), unknown operating history, and uncertainty of operating conditions. The effectiveness of inspection actions is strongly based on the quality of the data required to be collected directly in the field during the operating time, when not only profits but also costs, in the case of lost production, are generated. The international regulations for safety on critical and hazardous production systems state several
requirements or “goal setting,” i. e., final results and desired expectations are declared without the prescriptive models and methods to achieve them. There are several guides for the best practice regarding the application of inspection maintenance, and a lot of guidelines on planning the periodicity of examinations, i. e., recommended intervals between inspections (see the API recommended practices – http://www.api.org/Publications/2009catalog-pages.cfm; Wintle et al. 2001; de Almeida et al. 2003; Pinto 2008; Nagano 2008). A significant example dealing with inspection in terms of in-service examinations on pressure equipment and systems, e. g., storage tanks and containers, boilers, refrigerated storage spheres, and protective devices, subject to the Pressure Systems Safety Regulations, was proposed by Wintle et al. (2001). In general, similar best practice can be applied on systems containing hazardous materials to be inspected. Analogous equipment can involve flammable and/or toxic contents which are responsible for externalities such as environmental pollution, lost production, harm to health and safety of employees and the community, and production costs (including replacement of a piece of equipment after an incident, insurance premiums, legal actions). In particular, deterioration and modes of failures of
388
9 Basic Models and Methods for Maintenance of Production Systems
pressure systems are corrosion/erosion, creep and high-temperature damage, fati gue, stress corrosion, embrittlement, brittle fracture, buckling, etc. Visual surveys, ultrasonic testing and radiography, dimension control, metallurgical analyses, pressure tests and stress concentrations and distribution analysis, and leak-before-break analyses are only a few examples of tests and analyses suitable to support a condition-based maintenance action, such as an inspection as a sparkling event for repair and/or replacement actions. In particular, local inspection methods for pressure equipment include visual inspection, penetrant testing, magnetic particle inspection19 , eddy current, radiography, ultrasonic testing, thickness gauging for wall detections on components subject to corrosion/erosion, alternating current field measurement, thermography, long-range ultrasonic, acoustic emissions, etc. What about the contribution of this book to inspection maintenance strategies and rules? We chose simply to cite the technologies and the recent methods for executing effective condition-based maintenance actions, because each industrial sector (e. g., refining and petrochemicals industry, automotive industry, food industry, health care and pharmaceutical industry) has its specific issues, i. e., products and services, their deterioration mechanisms and rates, production processes, plants and equipment, systems’ operating conditions, customers’ expectations, etc. How is possible to affirm that similar technologies are recent and effective in general? Consequently, we decided to present models and methods for supporting managers and practitioners in planning and executing risk-based inspections. These techniques are mainly based on information on potential degradation mechanisms and threats obtained by a risk analysis, and not only by prescriptive practices generally based on industrial experience (e. g., historical experience, industry guidelines for classes of equipment, as a prescribed percentage of the estimated residual/remnant life). These analytical models are basic, but they can effectively support the development of ad hoc supporting decisionmaking methods for the analysts, owners, and users (e. g., safety managers, site inspectors, the so-called competent person, duty holders) of many industrial and service companies.
Similarly to the discussion on preventive maintenance models and methods, we decided to introduce the reader to the basic models for planning an inspection maintenance strategy. The literature contains some other materials, a large part of which is made up of very complicated models.
9.24 Imperfect Maintenance In the maintenance planning models previously discussed two kinds of hypotheses play a fundamental role: the “as good as new” hypothesis after a renewal or replacement action, and the “as bad as old” hypothesis after a minimal repair action. We are in the presence of an “imperfect maintenance” strategy or action when the item, after a maintenance activity, is neither “as good as new” nor “as bad as old,” and only a partial/imperfect restoration, not a complete renewal process, is performed. Several ways to model an imperfect restoration can be found in the literature. Three significant and alternative examples were illustrated by Duffuaa et al. (1999): 1. The item can be good as new or restored with different levels of incomplete restorations after the execution of a maintenance action. The level of restoration is a stochastic variable between 0 and 1,20 where 0 is no restoration, corresponding to a minimal repair action, and 1 is the result of a perfect renewal action. 2. The residual life of the component/system is subject to a reduction of X units of time. 3. The age of the item is reduced to its original value at the beginning of the maintenance action in proportion to the maintenance cost.
9.24.1 Imperfect Preventive Maintenance p – q These are the hypotheses and notation adopted by the proposed model for planning imperfect preventive maintenance actions (Duffuaa et al. 1999; Nagakawa 1980):
19
The item is magnetized and if the item is sound the magnetic flux is inside the material, otherwise it is distorted.
20
Remember the restoration factor q introduced in Sect. 9.7.
9.24 Imperfect Maintenance
389 jT
Cf
∫ r (t )dt 0
Cp
Cp P
Cp
P
0
2T
T T
Cp q = 1-p
jT
(j-1)T
T
time
T
Fig. 9.76 Imperfect preventive maintenance
• With probability p; the generic item after a preventive maintenance action has the same failure rate as before,21 i. e., it is as good as new with probability q D 1 p. Consequently, in the case of minimal f .t / 22 repair action the failure rate is r.t/ D 1F . .t / • Planned preventive maintenance points in time are: kT;
where k D 1; 2; : : : ; and T > 0:
• Repair action duration is negligible. • Cp is the preventive maintenance cost per action. • Cf is the repair cost per action. The imperfect maintenance object of the proposed model is therefore the result of the combination of minimal repair and renewal actions. The component/system starts to function at the point in time t0 D 0. The following equation quantifies the expected cost from time 0 to the time the item is as good as new after a perfect preventive maintenance action: ZjT jCp C Cf
r.t/ dt :
(9.95)
The expected cycle duration up to a perfect preventive maintenance is EŒcycle D
j D1
(9.97)
which is the expected value of a discrete stochastic variable y, where y D j T and the density function is f .y/ D qp j 1 . As a consequence, the expected unit cost is ECŒP; T EŒcycle
R jT P1 j 1 jCp C Cf 0 r.t/ dt j D1 qp P1 D : j 1 .j T / j D1 qp (9.98)
UECŒP; T D
By the introduction of a few simplifications, 1 X
qp j 1 .jCp / D qCp
j D1
1 X
jp j 1
j D1
1 d X j p dP j D1 p d D qCp dP 1 p qCp qCp Cp D ; (9.99) D 2 D 2 .1 p/ q q D qCp
Equation 9.95 is based on the number of minimal repair actions executed before a perfect preventive maintenance . This is the expected cost for an operating cycle of duration j T , i. e., up to perfect preventive maintenance (Fig. 9.76): ECŒp; T D
qp j 1 .j T /;
j D1
0
1 X
1 X
ZjT j 1 jCp C Cf r.t/ dt : qp
1 X
qp j 1 .j T / D qT
j D1
1 X
jp j 1 D
j D1
qT T D : q2 q
0
(9.100)
(9.96) The expected unit cost is 21
This hypothesis is the well-known minimal repair action. 22 In particular, r.t / D .t /, where (t / is the conditional failure rate introduced in Chap. 5, and the number of expected failures can be quickly quantified by the integration function applied to .t /.
UECŒP; T D
Cp C Cf q 2
P1
j D1
T
p j 1
R jT 0
r.t/ dt
:
(9.101)
390
9 Basic Models and Methods for Maintenance of Production Systems
Up
Down time T0
T1
T2
T3
T4
T5
T6
T7
Fig. 9.77 Points in time at whic h the system fails and system startup points
In the case of a Weibull distribution of ttf the failure rate is a simple function, r.t/
D
Eq. 5.71
ˇ x ˇ 1 ; ˛ ˛
but in Eq. 9.101 it is necessary to cope with a series of functions that in general are very difficult to solve, especially for practitioners and maintenance managers of industrial and service companies. This is the last model we choose to present in this chapter, and it demonstrates that analytical models can rapidly increase the level of complexity when we try to introduce more and more realistic hypotheses.
9.25 Maintenance-Free Operating Period This section introduces a reliability and maintainability performance index for a system of components used by the Royal Air Force as a reliability metric: the so-called maintenance-free operating period (MFOP), also discussed by Hocley and Appleton (1997). MFOP defines a period of operation, tmf life units, during which an item will be able to carry out all its assigned missions, without system faults and limitations, and with the minimum of maintenance. MFOP introduces another metric, maintenance-free operating period survivability (MFOPS), defined as the probability that a component/system will survive for the duration of the MFOP. As a consequence, it measures the probability of not having any unscheduled maintenance, without the need for corrective maintenance, for a period of tmf given the current age of the item. During this period of time some planned minimal maintenance actions can be allowed, and redundant components can fail: any corrective action has to be bypassed.
A MFOP is generally followed by a maintenance recovery period during which scheduled maintenance actions are performed. More explicitly, considering a repairable system, Kumar et al. (1999) defined MFOPS as n\ o Œ.T2i C1 T2i / tmf ; MFOPS.tmf / D Pr i D0
(9.102)
where fT1 ; T3 ; T5 ; : : :g are system fails time points, fT2 ; T4 ; T6 ; : : :g are the times the system starts up after a repair, T0 is the beginning of life, and is a confidence interval (e. g., 0.95). Periods fT1 T0 , T3 T2 , : : : , T2i C1 T2i g are the operating times of the production system (Fig. 9.77). They can be independent and identically distributed. MFOPS of tmf life units is a reliability requirement used for a very complex production system, whose unavailability is very expensive. An example is represented by civil airlines, burdened with the cost for unscheduled maintenance of about two million dollars per aircraft per year (Kumar et al., 1999). In particular, Kumar et al. (1999) developed analytical models to predict the MFOP for a system. If the system is made of n components connected in series, the probability the system has a MFOP of tmf life units for the i th operative cycle, i. e., it survives to the i th operative cycle assuming that it survives (i 1) cycles, is given by MFOPS.tmf ; i / D
n Y kD1
Rk .i tmf / : Rk Œ.i 1/tmf
(9.103)
The authors propose a trivial iterative procedure for the determination of the number of cycles the system guarantees a probability MFOPS , assuming independence between ttf distributions in each operative cycle.
9.25 Maintenance-Free Operating Period
391 1 0.9 0.8
MFOPS(tmf,i)
0.7 0.6 0.5
MFOPS (tmf=50,i)
0.4
MFOPS (tmf=100,i) MFOPS (tmf=150,i)
0.3 0.2 0.1
Fig. 9.78 MFOPS.50; i / for different cycles. MFOPS maintenance-free operating period survivability
Step 1.
Set i =1.
Step 2.
MFOPS.tmf ; i / D
Step 3. Step 4. Step 5. Step 6.
0 1
3
5
7
9
11
13
15
17
19
21
23
25
27
29
Cycle number - i
Table 9.28 ttf distributions of components n Y
Rk .i tmf / . Rk .Œi 1tmf / kD1 If MFOPS.tmf ; i / , go to step 5. Set i D i C 1, go to step 2. Number of cycles is i 1. Stop.
Assuming a Weibull distributed failure time, the probability for the item to survive tmf units of time, in addition to the already survived t units, is ˇ t .t C tmf /ˇ ; (9.104) MFOPS.tmf / D exp ˛ˇ where ˛ is the Weibull scale parameter and ˇ is the Weibull shape parameter. As a consequence, the value of tmf for a given level of confidence can be calculated as follows: ( tmf D ft ˇ ˛ ˇ lnŒMFOPS.tmf /g1=ˇ t ; MFOPS.tmf / D : (9.105)
9.25.1 Numerical Example (Kumar et al. 1999) Consider a system with four components connected in series. Table 9.28 collects the reliability parameters of ttf distribution of the values.
Component
Distribution
Parameter values
1 2 3 4
Exponential Weibull Normal Weibull
D 0:001 h1 ˛ D 1;200 h, ˇ D 3 MTTF D 1;500 h, D 200 h ˛ D 1;400 h, ˇ D 2:1
By Eq. 9.103, 8 n Y ˆ Rk .i tmf / ˆ ˆ MFOPS.t ; ; i / D ˆ mf ˆ ˆ Rk Œ.i 1/tmf ˆ kD1 ˆ ˆ ˆ ˆ ˆ ˆ ˆ R1 .i tmf / D exp.i tmf / D exp.0:001i tmf /; ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ i tmf 3 i tmf ˇ ˆ ˆ R .i t / D exp D exp ; ˆ ˆ < 2 mf ˛ 1200 ˆ ˆ i tmf MTTF ˆ ˆ z D R .i t / D 1 F 3 mf std ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ i tmf 1500 1500 i tmf ˆ ˆD 1 Fstd D Fstd ; ˆ ˆ 200 200 ˆ ˆ ˆ ˆ ˆ ˆ 2:1 ˆ ˆ i tmf ˇ ˆ i tmf ˆ D exp 1400 ; :R4 .i tmf / D exp ˛ where Fstd is the standard cumulative normal distribution function (see Appendix A.1). Figure 9.78 compares MFOPS values for different numbers of operative cycles and different values of tmf
392
9 Basic Models and Methods for Maintenance of Production Systems
Fig. 9.79 MFOP.t /, Weibull distribution, MTTF D 1,000, MFOPS.tmf / D 0:95. MFOP maintenance-free operating period
Fig. 9.80 MFOP.t /, Weibull distribution, MTTF D 1,000, MFOPS.tmf / D 0:80
units. As a consequence, this figure can support the determination of the number of cycle i which guarantees the MFOPS value in accordance with a confidence interval .
shape parameters ˇ when MTTF D 1; 000 units of time and MFOPS.tmf ; i / D 0:95. Values of time on the abscissa represent the age of the system and are t D tmf i:
9.25.2 MFOPS and Weibull Distribution of ttf Assuming a Weibull distribution of ttf, Fig. 9.79 presents the MFOP values obtained for different
(9.106)
Similarly, Fig. 9.80 illustrates the results obtained when MFOPS.tmf ; i / D 0:80. Figures 9.81 and 9.82 present the same results illustrated, respectively, in Figs. 9.79 and 9.80 when t 2 .600; 1100/.
9.26 Opportunistic Maintenance Strategy
393
Fig. 9.81 MFOP.t /, Weibull distribution, MTTF D 1,000, MFOPS.tmf / D 0:95, t 2 .600; 1100/
Fig. 9.82 MFOP.t /, Weibull distribution, MTTF D 1,000, MFOPS.tmf /=0.80, t 2 .600; 1100/
Finally, Figs. 9.83 and 9.84 present the results for MFOP obtained for different values of i , the number of cycles, and assuming MFOPS.tmf ; i / D 0:80, in the range i 2 Œ1; 250 (Fig. 9.83) and in the range i 2 Œ1; 12 (Fig. 9.84). MFOP and MFOPS turn out to be very effective measures of reliability of a component/system subject to maintenance.
9.26 Opportunistic Maintenance Strategy The opportunity to take advantage of a system downtime to perform a preventive replacement is called “optional strategy,” and has related optional rules. The generic supporting management decision model has to take account of the costs of failure replace-
394
9 Basic Models and Methods for Maintenance of Production Systems
Fig. 9.83 MFOP.i /, Weibull distribution, MTTF D 1,000, MFOPS.tmf / D 0:80
Fig. 9.84 MFOP.i /, Weibull distribution, MTTF D 1,000, MFOPS.tmf / D 0:80, i 2 Œ1; 12
ments/repairs for the generic item as a part of a system, the cost of optional replacements/repairs during the downtimes of the system and/or components and the behavior of the components/system during their life cycles. For example, in a generic opportunistic maintenance action executed when a component fails, the whole production system takes advantage of the downtime opportunity to replace or repair the other similar components. An example is represented by the maintenance planning and scheduling of an aircraft, meant as
a production system operating under high reliability and safety requirements: the cost of downtime can be reduced by the adoption of opportunistic maintenance, e. g., the parts to be replaced in the immediate future are replaced in advance during another scheduled or unscheduled maintenance (preventive or corrective) action. In the aviation industry the downtime due to unscheduled maintenance generates great costs owing to a canceled flight, unavailability of logistical support, loss of customers’ goodwill, etc. Some critical maintenance
9.26 Opportunistic Maintenance Strategy
checks therefore require thousands of man-hours. As a consequence, it is impossible to avoid downtimes but it is right and proper to redu ce them by performing opportunistic maintenance actions in advance. Another significant example dealing with a multistage production system which can be modeled as a flow shop, i. e., a line of different machines using a set of similar machining tools, was illustrated by Kaspi and Shabtay (2003): if a failure occurs, the tool (i. e., the component) is replaced, and if a downtime opportunity occurs and simultaneously a component, or another similar tool, exceeds a predefined control limit of time, a preventive replacement is performed. This is the so-called integrated replacement strategy, i. e., the maintenance action simultaneously replaces the failed tool on machine i and the tool used in a different machine j.j ¤ i / when tool j is older than a specific value. Saranga (2004) distinguished two different kinds of opportunistic maintenance: age-related and non-agerelated, performed during a failure, i. e., a corrective action. The first category is made up of three subcategories: 1. Hard life. This refers to the life of safety-critical parts, i. e., items that significantly compromise the safety of people, equipment and/or the environment. 2. Soft life (Crocker and Kumar 2000). This is the age of a part that is rejected during the next recovery of the module where it is placed. Soft lives are generally set on cheap components using opportunistic maintenance.
395
3. Degradation. This refers to a component controlled and managed through condition-monitoring devices: components are repaired or replaced after the deterioration level reaches a critical value. A non-age-related strategy refers to those items whose failure is undetectable and known only after an inspection activity, or when the containing inaccessible module has been completely dismantled. Saranga (2004) called these entities “non-safety-significant components.” They are generally replaced during routine maintenance recoveries. In accordance with this classification of parts and components of a generic complex production system, the question is whether to replace them if a downtime occurs or whether to wait until the next shutdown of the system. Then a second question deals with the definition of the basis and specifics for the optional maintenance action. What is the cost of the remaining life of an item of a production system? What about the downtime costs related to the decision to wait until the part has used up its remaining life, or related to the decision to perform an optional replacement in the presence of an opportunistic system shutdown? This discussion on opportunistic strategy is not exhaustive: there are several contributions in the literature which can properly inspire the professional and research activities of the reader of this book now properly introduced to the importance of an integrated approach to health and safety management, risk assessment, maintenance planning and execution, quality management, and production system optimization all based on cost reduction and profit maximization.
10
Advanced Maintenance Modeling
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
olds are also determined. Several numerical examples are also given to illustrate the models. A brief warranty concept is also discussed.
10.2 Maintenance Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . 398 10.2.1 Age Replacement . . . . . . . . . . . . . . . . . . . . . . . 398 10.2.2 Block Replacement . . . . . . . . . . . . . . . . . . . . . 399
10.1 Introduction
Contents
10.3 Modeling of Nonrepairable Degraded Systems . . . 399 10.4 Modeling of Inspection-Maintenance Repairable Degraded Systems . . . . . . . . . . . . . . . . . 10.4.1 Calculate EŒNI . . . . . . . . . . . . . . . . . . . . . . . . . 10.4.2 Calculate Pp . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.4.3 Expected Cycle Length Analysis . . . . . . . . . . 10.4.4 Optimization of Maintenance Cost Rate Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.4.5 Numerical Example . . . . . . . . . . . . . . . . . . . . .
402 403 404 405 405 406
10.5 Warranty Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . 406 10.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
This chapter provides a brief introduction to maintenance modeling with various maintenance policies, including age replacement, block replacement, and multiple failure degradation processes. We then discuss reliability modeling for degradation systems subject to competing failure processes. We also describe inspection maintenance modeling for degraded repairable systems with competing failure processes. An average long-run maintenance cost rate function is derived on the basis of the expression for the degradation paths and cumulative shock damage, which are measurable. An inspection sequence is determined on the basis of the minimal maintenance cost rate. Upon inspection, a decision will be made whether to perform a preventive maintenance or not. The optimum replacement policies and preventive maintenance thresh-
Maintenance involves preventive (planned) and corrective (unplanned) actions carried out to retain a system in or restore it to an acceptable operating condition. Maintenance, replacement, and inspection problems have been extensively studied in the reliability, maintainability, and warranty literature as demonstrated by the large number of references in the bibliography at the end of the book. A few models and methods were introduced and applied in Chap. 9. Many researchers have developed various models and maintenance policies in order to prevent the occurrence of system failures at the lowest possible maintenance costs (Barlow and Proschan 1965; Bai and Pham 2006; Beichelt and Fisher 1980; Boland 1982; Esary et al. 1973; Hollander et al. 1992; Murthy and Nguyen 1981; Wang and Pham 1996a–c). Mccall (1965), Pierskalla and Voelker (1976), Sherif and Smith (1981), Jardine and Buzacott (1985), Valdez-Flores and Feldman (1989), Cho and Parlar (1991), Dekker (1996), and Pham and Wang (1996) summarized the research done in the areas of maintenance and warranty. Maintenance can be classified into two major categories: corrective and preventive. Corrective maintenance occurs when the system fails. In other words, corrective maintenance means all actions performed as a result of failure, to restore an item to a specified condition. Some researchers also refer to correc-
R. Manzini, A. Regattieri, H. Pham, E. Ferrari, Maintenance for Industrial Systems © Springer 2010
397
398
tive maintenance as “repair.” Preventive maintenance occurs when the system is operating. In other words, preventive maintenance means all actions performed in an attempt to retain an item in a specified condition for operation by providing systematic inspection, detection, adjustment, and prevention of failures. Maintenance can also be categorized according to the degree to which the operating conditions of an item are restored by maintenance as follows1 : 1. Perfect repair or perfect maintenance: a maintenance action which restores the system operating condition to “as good as new,” i. e., upon perfect maintenance, a system has the same lifetime distribution and failure rate function as a brand new one. Complete overhaul of an engine with a broken connecting rod is an example of perfect repair. Generally, replacement of a failed system by a new one is a perfect repair. 2. Minimal repair or minimal maintenance: a maintenance action which restores the system to the failure rate it had when it just failed. The operating state of the system under minimal repair is also called “as bad as old” policy in the literature. 3. Imperfect repair or imperfect maintenance2: a maintenance action may not make a system “as good as new” but younger. Usually, it is assumed that imperfect maintenance restores the system operating state.
10 Advanced Maintenance Modeling
the number of replacements of nonfailed units during the interval .0; t, respectively. Then the expected total cost during .0; t is given by Ef ŒC.t/ D c1 EŒN1 .t/ C c2 EŒN2 .t/:
(10.1)
When the planning horizon is approaching infinity, the approximation function limt !1 Ef ŒC.t/=t can be used to obtain the expected cost per unit time. We next discuss the optimum policies, introduced in Chap. 9, which minimize the expected costs per unit time of each replacement policy such as age replacement and block replacement.
10.2.1 Age Replacement A unit is replaced at time T or at failure, whichever occurs first. T is also called a planned replacement interval time. Let fXk g1 be the failure times of suckD1 cessive operating units with a density f and distribution F with mean . Let Zk minfXk ; T g represent the intervals between the replacements caused by either failure or planned replacement for k D 1; 2; : : : The probability of Zk can be written as follows: ( F .t/ t < T (10.2) Pr fZk tg D 1 t T: The mean time of one cycle is
10.2 Maintenance Policy A failed system is assumed to be immediately replaced or repaired. There is a cost associated with it. On one hand, the designer may want to maintain a system before its failure. On the other hand, it is better not to maintain the system too often because of the cost involved each time. Therefore, it is important to determine when to perform the maintenance of the system that minimizes the expected total cost. Consider a one-unit system where a unit is replaced upon failure. Let c1 , called Cf in Chap. 9, be the cost of each failed unit which is replaced and c2 (< c1 ), called Cp in Chap. 9, be the cost of a planned replacement for each nonfailed unit. Let N1 .t/ and N2 .t/ denote the number of failures with corrective replacements and 1 2
See Chap. 9 See the “restoration factor” q introduced in Chap. 9.
ZT EfZk g D
ZT t dF .t/ C TR.T / D
0
R.t/ dt: 0
(10.3)
The expected total cost per cycle is Ec .T / D c1 F .T / C c2 R.T /;
(10.4)
where R.T / D 1 F .T /. The expected total cost per unit time for an infinite time span is E.T / D
c1 F .T / C c2 R.T / : RT 0 R.t/ dt
(10.5)
Let r.t/ f .t/=R.t/ be the failure rate. One can obtain the optimal replacement policy time T which minimizes the expected total cost per unit time E.T / in Eq. 10.5. Assume the failure rate r.t/ is a strictly
10.3 Modeling of Nonrepairable Degraded Systems
increasing function and A D
c1 , .c1 c2 /
399
then
1. If r.1/ > A then there exists a finite value c2 ; (10.6) T D G 1 c1 c2 where ZT G.T / D r.T /
R.t/ dt F .T /;
(10.7)
0
and the resulting expected total cost is C.T / D .c1 c2 /r.T /:
(10.8)
2. If r.1/ A, then the optimum replacement time T is at T D 1. This implies that a unit should not be replaced unless it fails. It is easy to obtain the above results by differentiating the expected total cost function from Eq. 10.5 with respect to T and setting it equal to 0. We have ZT @E.T / D .c1 c2 / r.T / R.t/ dt F .T / c2 @T 0
0
(10.9)
2 or, equivalently, G.T / D c1cc . Since r.T / is strictly 2 increasing and G.0/ D 0, we can easily show that the function G.T / is strictly increasing in T . 2 . This shows If r.1/ > A, then G.1/ > .c1cc 2/ that there exists a finite value T , where T is given in Eq. 10.6 and it minimizes C.T /. 2 . This shows If r.1/ A, then G.1/ .c1cc 2/ that the optimum replacement time is T D 1. This implies that a unit will not be replaced until it fails.
10.2.2 Block Replacement Consider that a unit begins to operate at time t D 0 and when it fails, it is discovered instantly and replaced immediately by a new one. Under this block policy, a unit is replaced at periodic times kT (k D 1; 2; : : : ) independent of its age. Suppose that each unit has a failure time distribution F .t/ with finite mean . The expected total cost per cycle is given by c1 EŒN1 .T / C c2 EŒN2 .T / D c1 M.T / C c2 ; (10.10)
where M.T / is differential and the expected number of failed units per cycle. The expected cost per unit time for an infinite time span under block replacement policy is defined as C.T / D
c1 M.T / C c2 : T
(10.11)
This indicates that there will be one planned replacement per period at a cost of c2 and the expected number of failures with corrective replacement per period where each corrective replacement has a cost of c1 . One can obtain the optimum planned replacement time T which minimizes the expected cost per unit time C.T / by differentiating the function C.T / with respect to T and setting it equal to zero. Then we obtain c2 (10.12) T m.T / M.T / D ; c1 where m.t/ dM.t/= dt. There exists a finite T and the resulting expected cost is C.T / D c1 m.T /.
10.3 Modeling of Nonrepairable Degraded Systems Maintenance has evolved from a simple model that deals with machinery breakdowns, to time-based preventive maintenance, to today’s condition-based maintenance. It is of great importance to avoid the failure of a system during its actual operation, especially when such failure is dangerous and costly. This section examines the problem of developing maintenance cost models for determining the optimal maintenance policies of degraded systems with competing failure processes. Most of the content in this section is based on the study conducted by Li and Pham (2005). Pham et al. (1996) presented a Markov model for predicting the reliability of k-out-of-n systems in which components are subject to multistage degradation as well as catastrophic failures. Owing to the aging effect, the failure rate of the component will increase. They considered the state-dependent transition rates for the degradation process. Pham et al. (1997) derived models for predicting the availability and mean lifetime of multistage degraded systems with partial repairs. Several authors have proposed various inspection policies and models
400
10 Advanced Maintenance Modeling
for systems with a degradation process (Dieulle et al. 2003; Grall et al. 2002; Klutke and Yang 2002; Lam 1991; Li and Pham 2005; Li 2005; Lam and Yeh 1994; Wortman et al. 1994). The notation adopted follows: Cc Cp Cm Ci Y .t/ G D.t/ Xi
N.t/ S C.t/ EŒC1 EŒW1 EŒNI EŒ fIi gi 2N fUi gi 2N Pi C1 Pp Pc
M
This section discusses a reliability model for nonrepairable degraded systems subject to two competing processes. Consider that: 1. The system has the state space ˝U D fM; : : : ; 1; 0; F g and it starts at state M at time t D 0. 2. The system fails owing either to degradation [Y .t/ > G] or to catastrophic failure PN.t / (D.t/ D i D1 Xi > S /. The system may either go from state i to the next degraded state i 1 or may go directly to catastrophic failure state F , i D M; : : : ; 1. 3. No repair or maintenance is performed on the system. 4. The two processes Y .t/ and D.t/ are independent.
Cost per corrective maintenance action; Cost per preventive maintenance action; Loss per unit idle time; Cost per inspection; Degradation process; Critical value for degradation process; Cumulative shock damage value up to time t; The damage of the i th shock and is independent and identically distributed with a cumulative distribution function Fx ; A random variable that represents the number of shocks; Critical value for shock damage; Cumulative maintenance cost up to time t; Average total maintenance cost during a cycle; Mean cycle length; Mean number of inspections during a cycle; Mean idle time during a cycle; Inspection sequence; Interinspection sequence; Probability that there are a total of (i C 1) inspections in a renewal cycle; Probability that a renewal cycle ends as a result of a preventive maintenance action; Probability that a renewal cycle ends as a result of a corrective maintenance action (Pc D 1 Pp ).
M–1
Figure 10.1 illustrates the case where systems are subject to two competing failure processes: degradation process Y .t/ and the random shock process D.t/: Whichever process occurred first would cause the system to fail. Suppose that the operating condition of the system at any time point could be classified into one of a finite number of the states, say, ˝U D fM; : : : ; 1; 0; F g: A one-to-one relationship between the element of ˝ D fM; : : : ; 1; 0g and its corresponding interval is defined as follows:
State M 1 if Y .t/ 2 .WM ; WM 1 :: : State i
Y .t/ 2 .Wi C1 ; Wi
State 1
Y .t/ 2 .W2 ; W1
State 0
Y .t/ > W1 :
1
D(t)>S Fig. 10.1 Flow diagram of the system with two competing failure processes (Li and Pham 2005)
Y .t/ 2 Œ0; WM
State M if
0
F
10.3 Modeling of Nonrepairable Degraded Systems
401
Let Pi .t/ be the probability that the value of Y .t/ will fall within a predefined interval corresponding to state i and D.t/ S . From state i , the system will make a direct transition to state i 1 owing to gradual degradation or to state F owing to a random shock (Fig. 10.1). The reliability function is defined as RM .t/ D
M X
Pi .t/ D P fY .t/ G; D.t/ S g;
i D1
(10.13)
where Pi .t/ is the probability of being in state i . Let T be the time to failure of the system. Then T can be defined as T D infft > 0 W Y .t/ > G or D.t/ > S g. The mean time to failure is given by
Bt
e , assume that In particular, for Y .t/ D W AC eBt the degradation process is described as the function eBt Y .t/ D W AC , where the two random variables eBt A and B are independent, and that A follows a uniform distribution with parameter interval [0,a] and B follows an exponential distribution with parameter ˇ > 0. In short, A U Œ0; a, a > 0 and B exp ˇ, ˇ > 0. The probability for the system being in state M is as follows:
PM .t/ D P fY .t/ WM ; D.t/ S g Z D 8A
Z1 P fY .t/ G; D.t/ S g dt
EŒT D
P fD.t/ S g " # ˇ 1 1 u1 t t 1 ˇt D 1 .a 1/ a u1 t ˇ
0
Z1 D
P fY .t/ Gg
1 X .2 t/j e2 t .j / FX .S / dt jŠ j D0
0
e2 t
(10.14) or, equivalently, Z 1 .j / X FX .S / P fY .t/ Gg.2 t/j e2 t dt: EŒT D jŠ j D0 0 (10.15) 1
Let FG .t/ D P fY .t/ Gg, then fG .t/ D dtd FG .t/. The probability distribution function of the time to failure, fT .t/, can be easily obtained: d fT .t/ D ŒP fY .t/ GgP fD.t/ S g dt 1 X FX.j / .S / d D ŒP fY .t/ Gg.2 t/j e2 t : jŠ dt j D0
(10.16) After simplifications, we have fT .t/ D
1 .j / X F .S / X
j D1
jŠ
j
fG .t/.2 t/ e
2 t
C FG .t/j2 .2 t/j 1 e2 t 2 FG .t/.2 t/j e2 t : (10.17)
! u1 A ˇˇ 1 P B < ln ˇA D x fA .x/ dx t 1 u1
1 X .2 t/j j D0
jŠ
FX.j / .S /:
(10.18)
Then the probability for the system being in state i can be calculated as follows: ( ) eBt Pi .t/ D P Wi C1 < W Wi ; D.t/ S A C eBt ! " Za ˇ ui 1 A ui A ˇˇ 1 1 ADx ln D P < B ln t 1 ui 1 t 1 ui ˇ 0
1 X .2 t/j .j / fA .x/ dx e2 t FX .S / jŠ j D1 ( t 1 1 ˇt a D a t ˇ " ˇ ˇ #) 1 ui t 1 ui 1 t ui ui 1 e2 t
1 X .2 t/j j D0
jŠ
.j /
FX .S /;
where i D
Wi ; W
i D M 1; : : : ; 1:
(10.19)
402
10 Advanced Maintenance Modeling 1
0.9
λ 2 = 0.12
0.8
λ2 = 0.20
We now provide a numerical example. eBt , where A U Œ0; 5 and Assume Y .t/ D W AC eBt B exp 10, and critical values for the degradation and the shock damage are G D 500 and S D 200, respectively. The random shock function is D.t/ D PN.t / i D1 Xi , where Xi exp 0:3. Figure 10.2 shows the reliability of the system using Eq. 10.22 for 2 D 0:12 and 2 D 0:20.
Reliability
0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0
5
10
15
20
25
30
35
40
Time x 20
Fig. 10.2 Reliability versus time (Li and Pham 2005)
Similarly, the probability for the system being in state 0 is as follows: eBt > G; D.t/ S P0 .t/ D P Y .t/ D W A C eBt # " ˇt t 1 1 uM 1 ˇt .a / D a uM t ˇ e2 t
1 X .2 t/j
jŠ
j D0
.j /
FX .S /:
(10.20)
The probability for a catastrophic failure state F is given by eBt PF .t/ D P Y .t/ D W G; D.t/ > S A C eBt ˇ ˇ t 1 1 u1 t .a1 t / D 1 a u1 t ˇ 1 X .2 t/j .j / 2 t FX .S / (10.21) 1 e jŠ j D0
Hence, the reliability RM .t/ is given by RM .t/ D
M X
Pk .t/
kD1
"
# ˇ ˇ 1 1 uM t t .a1 t / D 1 a uM a t ˇ 1 X .2 t/j .j / 2 t FX .S / : (10.22) e jŠ j D0
10.4 Modeling of Inspection-Maintenance Repairable Degraded Systems The system is assumed to be periodically inspected at times fI; 2I; : : : ; nI; : : : g and the state of the system can only be detected by inspection. After a preventive maintenance or corrective maintenance action the system will be restored to the as-good-as-new state. Assume that the degradation fY .t/gt 0 and random shock fD.t/gt 0 are independent, and a corrective maintenance action is more costly than a preventive maintenance and a preventive maintenance costs much more than an inspection. In other words, Cc > Cp > Ci . From Sect. 10.3, T is defined as the time to failure T D infft > 0 W Y .t/ > G or D.t/ > S g, where G is the critical value for fY .t/gt 0 and S is the threshold level for fD.t/gt 0 . The material in this section is mostly based on the study conducted by Li and Pham (2005). The two threshold values L and G (G is fixed) effectively divide the system state into three zones as shown in Fig. 10.3. They are as follows: doing nothing zone when Y .t/ L and D.t/ S ; preventive maintenance zone when L < Y .t/ G and D.t/ S ; and corrective maintenance zone when Y .t/ > G and D.t/ > S . The maintenance action will be performed when either of the following situations occurs: 1. The current inspection reveals that the system condition falls into the preventive maintenance zone; however, this state is not found at the previous inspection. At the inspection time iI , the system falls into the preventive maintenance zone, which means fY ..i 1/I / L, D..i 1/I / S g\fL < Y .iI / G, D.iI / S g. Then preventive maintenance action is performed and it will take a random time R1 .
10.4 Modeling of Inspection-Maintenance Repairable Degraded Systems
403
Y(t)
CM zone
G PM zone
L
Doing nothing zone
D(t)
S Fig. 10.3 The evolution of the system. CM corrective maintenance, PM preventative maintenance. (Li and Pham 2005)
I1 L
I 1 ... I i I i +1 R1 W1
2. When the system fails at T , a corrective maintenance action is taken immediately and would take a random time R2 .
I i T R2 W2
I1 L I i T W3
R2
10.4.1 Calculate EŒNI Let EŒNI denote the expected number of inspections during a cycle. Then,
Note that after a preventive maintenance or a corrective maintenance action has been performed, the system is renewed and the cycle ends. From a concept of renewal reward theory, the average long-run maintenance cost per unit time is given by EC.L; I / D
EŒC1 : EŒW1
(10.23)
EŒNI D
Note that there is a probability Pp that the cycle will end as a result of a preventive maintenance action and it will take on average EŒR1 amount of time to complete a preventive maintenance action with a corresponding cost Cp EŒR1 Pp . Similarly, if a cycle ends as a result of a corrective maintenance action with probability P , it will take on average cEŒR2 amount of time to complete a corrective maintenance action with corresponding cost Cc EŒR2 Pc . We next discuss the analytical analysis of EŒC1 .
.i /P fNI D i g;
(10.25)
i D1
where P fNI D i g is the probability that there are a total of i inspections in a renewal cycle. It can be shown that P fNI D i g D P fY Œ.i 1/I L; DŒ.i 1/I S g P fL < Y .iI / G; D.iI / S g C P fY .iI / L; D.iI / S g
The expected total maintenance cost during a cycle EŒC1 is defined as EŒC1 D Ci EŒNI C Cp EŒR1 Pp C Cc EŒR2 Pc : (10.24)
1 X
P fiI < T .i C 1/I g: (10.26) Hence, EŒNI D
1 X i D1
i fP fY Œ.i 1/I L; DŒ.i 1/I S g P fL < Y .iI / G; D.iI / S g
C P fY .iI / L; D.iI / S g P fiI < T .i C 1/I g:
(10.27)
Assume Y .t/ D A C Bg.t/, where A N.A ; A2 /, B N.B ; B2 /, and A and B are independent. We
404
10 Advanced Maintenance Modeling
now calculate the probabilities P fY Œ.i 1/I L, DŒ.i 1/I S g and P fL < Y .iI / G, D.iI / S g. PN.t / Given g.t/ D t, D.t/ D i D0 Xi , where the Xi are independent and identically distributed, and N.t/ Poisson./, then, P fY Œ.i 1/I L; DŒ.i 1/I S g D P fA C B.i 1/I Lg N..i 1/I / n o X P DŒ.i 1/I D Xi S i D0
L .A C B .i 1/I / D˚ q A2 C B2 Œ.i 1/I 2 e.i 1/I
!
1 X Œ.i 1/I j .j / FX .S / jŠ
j D0
(10.28) and
creasing function of time t, assume that A fA .a/, B fB .b/. Let y1 D a C bg.iI / (10.31) y2 D a C bgŒ.i C 1/I : After simultaneously solving the above equations in terms of y1 and y2 , we obtain y1 gŒ.i C 1/I y2 g.iI / D h1 .y1 ; y2 /; (10.32) gŒ.i C 1/I g.iI / y2 y1 D h2 .y1 ; y2 /: (10.33) bD gŒ.i C 1/I g.iI /
aD
Then the random vector .Y .iI /; Y Œ.i C 1/I / has a joint continuous probability distribution function as follows: fY.iI /;Y Œ.i C1/I .y1 ; y2 / D jJ jfA Œh1 .y1 ; y2 /fB Œh2 .y1 ; y2 /;
(10.34)
where the Jacobian J is given by P fL < Y .iI / G; D.iI / S g " ! G .A C B iI / D ˚ q A2 C B2 .iI /2 L .A C B iI / ˚ q A2 C B2 .iI /2 eiI
1 X j D0
ˇ @h ˇ 1 ˇ ˇ @y1 J D ˇˇ ˇ @h2 ˇ @y1
!#
j
.iI / .j / FX .S /: jŠ
(10.29)
Since T D infft > 0 W Y .t/ > G or D.t/ > S g, we have P fiI < T .i C 1/I g D P fY .iI / L; Y Œ.i C 1/I > Gg P fDŒ.i C 1/I S g C P fY Œ.i C 1/I Lg P fD.iI / S; DŒ.i C 1/I > S g: (10.30) In Eq. 10.30, since Y .iI / and Y Œ.i C 1/I are not independent, we need to obtain the joint probability distribution function fY.iI /;Y Œ.i C1/I .y1 ; y2 / in order to compute P fY .iI / L; Y Œ.i C 1/I > Gg. In general, as for when A > 0 and B >0 are two independent random variables, and g.t/ is an in-
ˇ ˇ ˇ ˇ ˇ ˇ ˇ ˇ 1 ˇ: ˇDˇ ˇ ˇ g.iI / gŒ.i C 1/I ˇ @h2 ˇ ˇ @y2 (10.35) @h1 @y2
As for the term P fD.iI / S;PDŒ.i C 1/I > S g N.t / in Eq. 10.30, since D.t/ D i D0 Xi is a compound Poisson process, the compound Poisson process has a stationary independent increment property. Therefore, the random variables D.iI / and DŒ.i C 1/I D.iI / are independent. Using the Jacobian transformation, random vector .D.iI /; DŒ.i C 1/I D.iI // is distributed the same as vector .D.iI /; DŒ.i C 1/I /. Note that D.iI / and D.Ii C1 / are independent; therefore, P fD.iI / S; DŒ.i C 1/I > S g D P fD.iI / S gP fDŒ.i C 1/I > S g: (10.36)
10.4.2 Calculate Pp Note that either a preventive maintenance or a corrective maintenance action will end a renewal cycle. In other words, preventive maintenance and corrective maintenance events are mutually exclusive at the re-
10.4 Modeling of Inspection-Maintenance Repairable Degraded Systems
newal time point. As a consequence, Pp C Pc D 1. The probability Pp can be obtained as follows: Pp D P fpreventative maintenance ending a cycleg D
1 X
P fY Œ.i 1/I L; L < Y .iI / Gg
i D1
P fD.iI / S g:
(10.37)
10.4.3 Expected Cycle Length Analysis Since the renewal cycle ends as a result of either a preventive maintenance action with probability Pp or a corrective maintenance action with probability Pc , the mean cycle length EŒW1 is calculated as follows:
The expression for EŒT depends on the probability P fY .t/ Gg and sometimes it is not easy to obtain a closed form.
10.4.4 Optimization of Maintenance Cost Rate Policy We determine the optimal inspection time I and preventive maintenance threshold L such that the longrun average maintenance cost rate EC.L; I / is minimized. Mathematically, we wish to minimize the following objective function: EC.L; I / P1
i D1
EŒW1 D
1 X
EŒ.iI C R1 /IPM occurs in Œ.i 1/I;iI
i D1
C EŒ.T C R2 /ICM occurs X 1 iIP fY Œ.i 1/I L; DŒ.i 1/I S g D i D1
P fL < Y .iI / G; D.iI / S g
C EŒR1 Pp C .EŒT C EŒR2 /Pc ;
(10.38)
where IPM occurs in ..i 1/I;iI and ICM occurs are the indicator functions. The mean time to failure, EŒT , is given by Z1 EŒT D
P fT > tg dt 0
Z1 P fY .t/ G; D.t/ S g dt
D 0
Z1 D
P fY .t/ Gg
1 X .2 t/j e2 t .j / FX .S / dt jŠ
j D0
0
(10.39) or, equivalently, by Z 1 X FX.j / .S / P fY .t/ Gg.2 t/j e2 t dt: EŒT D jŠ 1
j D0
0
(10.40)
405
iP fY .Ii 1 / L; D.Ii 1 / S g P fL < Y .Ii / G; D.Ii / S g
D P1 . i D1 Ii P fY .Ii 1 / L; D.Ii 1 / S g P fL < Y .Ii / G; D.Ii / S g/ CEŒR1 Pp C EŒR2 Pc P1 i D1 iVi fP fY .Ii / L; Y .Ii C1 / > Gg P fD.Ii C1 / S g CP fY .Ii C1 / Lg P fD.Ii / S; D.Ii C1 / > S gg C P1 . i D1 Ii P fY .Ii 1 / L; D.Ii 1 / S g P fL < Y .Ii / G; D.Ii / S g/ CEŒR1 Pp C EŒR2 Pc P Cp EŒR1 1 i D1 P fY .Ii 1 / L; D.Ii 1 / S g P fL < Y .Ii / G; D.Ii / S g C P . 1 I P fY .Ii 1 / L; D.Ii 1 / S g i D1 i P fL < Y .Ii / G; D.Ii / S g/ CEŒR1 Pp C EŒR2 Pc Cc EŒR2 P f1 1 i D1 P fY .Ii 1 / L; D.Ii 1 / S g P fL < Y .Ii / G; D.Ii / S gg C ; P1 . i D1 Ii P fY .Ii 1 / L; D.Ii 1 / S g P fL < Y .Ii / G; D.Ii / S g/ CEŒR1 Pp C EŒR2 Pc where Ii 1 D .i 1/I , Ii D iI , Ii C1 D .i C 1/I , and Vi D P fY .iI / L; D.iI / S g. The above complex objective function is a nonlinear optimization problem. Li and Pham (2005) discussed a step-by-step algorithm based on the Nelder– Mead downhill simplex method shown as follows:
406
• Step 1: Choose .n C 1/ distinct vertices as an initial set fZ .1/ ; : : : ; Z .nC1/ g, then calculate the value of the function f .Z/ for i D 1; 2; : : : ; .n C 1/, where f .Z/ D EC.I; L/. Put the values f .Z/ in increasing order, where f .Z .1/ / D minfEC.I; L/g and f .Z .nC1/ / D maxfEC.I; L/g and set k D 0. P • Step 2: Compute X .k/ D n1 niD1 Z .i / . • Step 3: Use the centroid X .k/ in step 2 to compute X .kC1/ D X .k/ Z .nC1/ . • Step 4: Set D 1 and compute f .X .k/ C X .kC1/ /. If f .X .k/ C X .kC1/ / f .Z .1/ /, go to step 5. If f .X .k/ C X .kC1/ / f .Z .n/ /, go to step 6. Otherwise, fix D 1 and go to step 8.
10 Advanced Maintenance Modeling
p spectively, and g.t/ D t e0:005t . Assume that the ranPN.t / dom shock damage is described by D.t/ D i D1 Xi , where Xi follows the exponential distribution, i. e., Xi exp.0:04t/ and N.t/ Poisson.0:1/. Given G D 50, S D 100, Ci D 900 per inspection Cc D 5600 per corrective maintenance, Cp D 3000 per preventative maintenance, R1 exp.0:1t/, and R2 exp.0:04t/, we now determine the values of both I and L so that the average total cost per unit time EC.I; L/ is minimized. The step-by-step procedure follows: Step 1:
I and L are two decision variables. We need .n C 1/ D 3 initial distinct vertices, which are Z .1/ D .25; 20/, Z .2/ D .20; 18/, and Z .3/ D .15; 10/. Set k D 0. Calculate the value of f .Z . / / corresponding to each vertex and sort them in increasing order in terms of EC.I; L/.
Step 2:
Calculate: X .0/ D .Z .1/ C Z .2/ /=2 D .22:5; 19/.
Step 3:
Generate the searching direction: X D X .0/ Z .3/ D .7:5; 9/.
Step 4:
Set D 1; it will produce a new minimum EC.30; 28/ D 501:76 that leads trying an expansion with D 2, i. e., .37:5; 38/.
Step 5:
Set D 2. Similarly, calculate f .Z/ that leads to EC.37:5; 38/ D 440:7: Go to step 8 in Sect. 10.4.4. This result turns out to be a better solution; hence .15; 10/ is replaced by .37:5; 38/.
• Step 5: Set D 2 and compute f .X .k/ C 2X .kC1/ /: If f .X .k/ C 2X .kC1/ / f .X .k/ C X .kC1/ /, set D 2. Otherwise, set D 1. Then go to step 8. • Step 6: If f .X .k/ C X .kC1/ / f .Z .nC1/ /, set D 1=2: Compute f .X .k/ C 12 X .kC1/ /. If f .X .k/ C 12 X .kC1/ / f .Z .nC1/ /, set D 1=2 and go to step 8. Otherwise, set D 1=2 and if f .X .k/ 12 X .kC1/ / f .Z .nC1/ /, set D 1=2 and go to step 8. Otherwise, go to step 7. • Step 7: Shrink the current solution set toward the best Z .1/ by Z .i / D 12 .Z .1/ C Z .i / /, i D 2; : : : ; n C 1. Compute the new f .Z .2/ /, : : : , f .Z .nC1/ /, let k D k C 1, and return to step 2. • Step 8: Replace worst Z .nC1/ by X .k/ C q the P 1 .kC1/ .i / N2 X . If nC1 nC1 i D1 Œf .Z / f < ", where fN is an average value, stop. Otherwise, let k D k C 1 and return to step 2. It should be noted that " denotes the difference between the maximum and the minimum values of f . In the following example, " D 0:5, which also indicates how soon we would like the algorithm to stop when the vertices function values are close.
10.4.5 Numerical Example Assume that the degradation process is described by Y .t/ D ACBg.t/, where A andB are independent and follow the uniform distribution with parameter interval Œ0; 4 and an exponential distribution with parameter 0.3, i. e., A U.0; 4/ and B exp.0:3t/, re-
The iteration continues q P and stops at k D 6 (see Table 10.1) since 13 3i D1 ŒEC.Z .i / / EC.I; L/2 < 0:5, where EC.I; L/ is the average value. From Table 10.1, the optimal values are I D 37:5 and L D 38 and the corresponding cost value is EC .I; L/ D 440:7. Figure 10.4 shows the relationship between L and Pc for different I values, i. e., 35, 37.5, and 40. We also observe that Pc is an increasing function on L. This means a higher preventive maintenance threshold is more likely to result in a failure.
10.5 Warranty Concepts A warranty is a contract under which the manufacturers of a product and/or service agree to repair or replace the product or provide a service when a product
10.5 Warranty Concepts
407
Table 10.1 Optimal values I and L (Li and Pham 2005) k
Z .1/
Z .2/
Z .3/
Search result
0
(25,20) EC.I; L/ D 564:3
(20,18) EC.I; L/ D 631:1
(15,10) EC.I; L/ D 773:6
(37.5,38) EC.I; L/ D 440:7
1
(37.5,38) EC.I; L/ D 440:7
(25,20) EC.I; L/ D 564:3
(20,18) EC.I; L/ D 631:1
(42.5,40) EC.I; L/ D 481:2
2
(37.5,38) EC.I; L/ D 440:7
(42.5,40) EC.I; L/ D 481:2
(25,20) EC.I; L/ D 564:3
(32.5,29) EC.I; L/ D 482:2
3
(37.5,38) EC.I; L/ D 440:7
(42.5,40) EC.I; L/ D 481:2
(32.5,29) EC.I; L/ D 482:2
(32.5,33.5) EC.I; L/ D 448:9
4
(37.5,38) EC.I; L/ D 440:7
(32.5,33.5) EC.I; L/ D 448:9
(42.5,40) EC.I; L/ D 481:2
(38.75,37.125) EC.I; L/ D 441:0
5
(37.5,38) EC.I; L/ D 440:7
(38.75,37.125) EC.I; L/ D 441:0
(32.5,33.5) EC.I; L/ D 448:9
(35.3125,35.25) EC.I; L/ D 441:1
6
(37.5,38) EC.I ; L / D 440:7
(38.75,37.125) EC.I; L/ D 441:0
(35.3125,35.25) EC.I; L/ D 441:4
Stop
for a variety of products (Bai and Pham 2004, 2005, 2006; Murthy and Blischke 2006). Warranty types are dependent on the kind of product that it protects. For larger or more expensive products with many components, it may be cheaper to repair the product rather than to replace it. These items are called “repairable products.” Other warranties simply result in replacement of an entire product because the cost to repair it is either close to or exceeds its original price. These products are considered nonrepairable. The following are the most common types used in warranties:
Fig. 10.4 Pc versus L
fails or the service does not meet the intended requirements. These agreements exist because of the uncertainty present in the supply of products or services, especially in a competitive environment. Warranties are important factors in both the consumers’ and the manufacturers’ decision making (Wang and Pham 2006b). A warranty can be the deciding factor for the purchase of a particular item when different products have similar functions and prices. The length and type of warranty is often thought of as a reflection of the reliability of a product as well as the company’s reputation. Many researchers have developed different models to provide guidance in selecting a successful warranty plan
• Ordinary free replacement. Under this policy, when an item fails before a warranty expires it is replaced at no cost to the consumer. The new item is then covered for the remainder of the warranty period. This is the most common type of a warranty and often applies to cars and kitchen appliances. • Unlimited free replacement. This policy is the same as the ordinary free replacement policy but each replacement item carries a new identical warranty. This type of warranty is often used for electronic appliances with high early failure rates and usually has a shorter length because of this. • Pro rata warranty. The third most common policy takes into account how much an item is used. If the product fails before the end of the warranty period, then it is replaced at a cost that is discounted proportional to its use. Items that experience wear or aging, such as tires, are often covered under these warranties.
408
10 Advanced Maintenance Modeling
Table 10.2 Maintenance and warranty modeling and analysis literature Group
References
General modeling
Amari and Pham (2007), Bai and Pham (2006), Beichelt and Fisher (1980), Brown and Proschan (1983), Esary et al. (1973), Kijimma (1989), Özekici (1996), Sheu (1998), Wang and Pham (1999)
Maintenance modeling
Barlow and Proschan (1965), Ben-Daya et al. (2000), Lie et al. (1995), Pham (2003a), Wang and Pham (2006b)
Age, block replacements
Ansell et al. (1984), Beichelt (1981), Berg (1995), Block et al. (1988), Bris et al. (2003), Fox (1966), Lam (1991), Nakagawa (1981a, b), Park and Yoo (1993), Savits (1988), Wang and Pham (1999)
Imperfect repairs
Bagai and Jain (1994), Hollander et al. (1992), Ebrahimi (1985), Nakagawa (1977), Park (1979), Wang and Pham (1996a–c)
Optimal policies
Chen and Feldman (1997), Feldman (1977), Lam and Yeh (1994), Makis and Jardine (1992), Nakagawa and Yasui (1987), Phelps (1983), Sheu (1994), Suresh and Chaudhuri (1994), Wang and Pham (1996a–c)
Inspection policies
Dieulle et al. (2003), Li and Pham (2005a, b), Zuo et al. (2000), Zuckerman (1989)
Warranty modeling
Bai and Pham (2005, 2006), Murthy and Blischke (2006)
Optimization
Canfield (1986), Inagaki et al. (1980), Lam and Yeh (1994), Pham and Wang (2000), Wang and Pham (1997, 2006a), Zheng (1995)
Different warranty models may include a combination of these three types as well as offering other incentives such as rebates, maintenance, or other services that can satisfy a customer and extend the life of the product. Table 10.2 presents a brief summary of references to research papers and books on maintenance and warranty modeling and analysis for quick reference.
10.6 Conclusions In this chapter, we presented reliability and maintenance models for systems with multiple competing failure processes such as degradation and random shock. The results of the maintenance models can be used as decision-tools to help practitioners and
inspectors as well as marketing managers to allocate the resources and also for the purposes of promotion strategies of the new products, including warranty policies. It should be noted that maintenance system costs associated with inspections, preventive maintenance, corrective maintenance, and downtime are often difficult to obtain, even though they are applicable in practice. For some critical systems, the overriding goal is to ensure that the system is available when needed; therefore, in many cases, the cost is, however, secondary. To achieve as high a level of availability as possible for a specified inspection rate, it is worth determining the optimal policies, including the number of inspections with respect to imperfect repairs (i. e., minimal and opportunistic schemes), that maximizes the degraded system availability.
11
Spare Parts Forecasting and Management
Contents 11.1 Spare Parts Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 409 11.2 Spare Parts Characterization . . . . . . . . . . . . . . . . . . 410 11.3 Forecasting Methods . . . . . . . . . . . . . . . . . . . . . . . . . 411 11.4 Croston Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412 11.5 Poisson Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413 11.6 Binomial Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414 11.6.1 Numerical Example . . . . . . . . . . . . . . . . . . . . . 415 11.7 Spare Parts Forecasting Accuracy . . . . . . . . . . . . . 416 11.8 Spare Parts Forecasting Methods: Application and Case Studies . . . . . . . . . . . . . . . . . . 417 11.8.1 Case Study 1: Spare Parts Forecasting for an Aircraft . . . . . . . . . . . . . . . . . . . . . . . . . . 417 11.8.2 Case Study 2: Spare Parts Forecasting in a Steel Company . . . . . . . . . . . . . . . . . . . . . 418 11.9 Methods of Spare Parts Management . . . . . . . . . . . 422 11.9.1 Spare Parts Management: Qualitative Methods . . . . . . . . . . . . . . . . . . . . . 423 11.9.2 Spare Parts Management: Quantitative Methods . . . . . . . . . . . . . . . . . . . . 426
Even without considering crashes and other damage during its life, a car needs parts such as its engine oil, tires, and brake pads to be replaced or changed. Similarly, production systems have the same need for spare parts. Although a very relevant topic, spare parts management has rarely been studied. How many spare parts are there in the local warehouse of the company? How can future demand be forecast? This chapter deals with these questions, and presents several methodologies to support decision making on this theme.
11.1 Spare Parts Problem During its working life a production system needs spare parts to fix breakdowns and other reliability problems, while equipment also wears out with use. Spare parts management is therefore very important in economic terms and also technical terms. Figure 11.1 shows a typical sequence of activities performed during corrective maintenance requiring spare parts (e. g., electronic card, gearbox, chains, and other components) or expendable material (e. g., oil, glue). The procurement of spare parts is often included in the sequence. The duration of this activity is strongly related to the presence of spare parts in the local warehouse of the company. If the required spare part is available in the company’s warehouse, the procurement lead time is only a few minutes, but otherwise it is days or even weeks (e. g., when the supplier is located very far away or has to manufacture the items). The absence of a spare part can lead to production stopping or being curtailed, and so to a very significant increase in related costs. Furthermore, adapting not original spare parts that are not perfectly interchangeable with failed components leads to further damage to the equipment occurring rather than to its swift and effective repair. Spare parts are typically expensive and are at great risk of becoming obsolete (see Sect. 11.2). In addition, they may or may not be used and this uncertainty usually makes storing them expensive. In conclusion, spare parts management must consider two opposing factors: the lack of production cost and the procurement and storage cost. As shown in
R. Manzini, A. Regattieri, H. Pham, E. Ferrari, Maintenance for Industrial Systems © Springer 2010
409
410
11 Spare Parts Forecasting and Management stop equipment
restart
failure
time
uptime
downtime
uptime
intervention
ΔT
ΔT
ΔT
ΔT
ΔT
ΔT
ΔT
ΔT
ΔT
call
preparation
disassembly
procurement
repair
calibration
assembly
check
reporting
Fig. 11.1 Typical corrective maintenance activities
Lack of production cost
Total cost
Storage cost Costs
Spare parts in stock Optimal level
Fig. 11.2 The spare parts trade-off problem
Fig. 11.2, this is a trade-off problem in which the goal is to determine the optimal set (kind and quantity) of spare parts required at the company’s local warehouse. This set gives the minimum total cost. Two subproblems arise out of this optimal level: the forecasting of spare part consumption and the economic management of actual consumption. The important first step is to forecast the number of spare parts that the system will use in the future very carefully. Then these parts need to be procured and managed as efficiently as possible.
11.2 Spare Parts Characterization Compared with other materials flowing in a supply chain, the behavior of spare parts is very peculiar. The consumption of spare parts is basically intermittent and storage usually requires a wide variety of spare parts combined with few units per type. According to Williams (1984), Syntesos (2001), and Syntetos et al.
(2005), the parameters usually adopted to characterize spare part properties are: ADI Average interdemand interval: the average time interval between two successive consumptions of a spare part. It is usually expressed in time periods (e. g., months). CV 2 Squared coefficient of variation: standard deviation of consumption divided by the average value of consumption. It is adimensional. Figure 11.3 shows the typical consumption of a spare part in agreement with which the following can be defined: PN i (11.1) ADI D i D1 ; N 12 0qP N i D1
B CV D @ 2
."ri "a /2 N
"a
C A ;
(11.2)
PN "a D
i D1 "ri
N
;
(11.3)
where "ri is the spare part demand (units), i is the time interval between two successive spare part demands (periods), and N is the number of time intervals analyzed Several studies in the literature (Syntetos 2001; Syntetos and Boylan 2005, 2006; Syntetos et al. 2005; Ghobbar and Friend 2002, 2003; Boylan and Syntetos 2006; Boylan et al. 2008) introduce different patterns of spare parts according to ADI and CV2 values. In particular, they suggest different cutoff values for the classification depending on the context of the application. For example, Fig. 11.4 shows the different pattern discussed in Syntetos et al. (2005).
11.3 Forecasting Methods τ1
411
τ2
0
εr1
τi
εr2
εri
time
Fig. 11.3 Typical spare parts consumption
0
ADI = 1.32 periods
Smooth
Intermittent
Erratic
Lumpy
CV2 = 0.49
prevent problems or damage (i. e., thermal or hygrometric conditions). For the reasons mentioned above, spare part acquisition and storage can lead to a significant financial investment. Spare parts have tricky specific properties. They represent a particular category of materials in a production system that needs to be managed very carefully.
Fig. 11.4 Patterns of spare parts. ADI average interdemand interval, CV 2 squared coefficient of variation
11.3 Forecasting Methods Using Fig. 11.4, one can classify the patterns into four categories according to the state and size of the demand: 1. Intermittent demand is random, and a lot of time periods have no demand. 2. Erratic demand is (highly) variable and there is erratic behavior of the size of the demand rather than the demand per single time period. 3. Smooth demand, also occurs at random with a lot of time periods having no demand. When there is demand, it occurs in single or very few units. 4. Lumpy demand is similarly random with many time periods having no demand. Moreover, when the demand occurs, it is (highly) variable. The concept of lumpy corresponds to an extremely irregular demand, with great differences between each period’s requirements and with a large number of periods with zero requirements. Another fundamental peculiarity of spare parts is their specificity of use. In other words, a spare part is not usually of general purpose but only has its own use. Consequently, the risk of obsolescence is very high. For example, when a machine is superseded in a production system by a new one, most of the spare parts are not reusable on other equipment and so immediately become obsolete. Spare parts are usually expensive because their technological content is significant. Furthermore, specific storage devices are required in some situations to
The goal of an efficient spare parts management system is to minimize the total cost. This general observation is not true when safety or environmental questions impose specific constraints. Generally speaking, a trade-off between storage costs and production downtime costs needs to be found. This determination of the optimal level of spare parts requires two levels of analysis: the forecasting of future demand and the consequent optimal management of this demand. Several different approaches are available in order to determine the future requirement of spare parts in the real world of industry: • Experience and the know-how possessed by maintenance personnel. The experience of operators often represents a unique source of information. • Information from suppliers. Several suppliers develop lists of “suggested” spare parts for local stock. These lists are developed according to the work experience of the supplier or using suitably developed tests. • Forecasting models. Statistical models elaborate the consumption of spare parts registered in the past and estimate future demand. These categories of methods need different investments in terms of time and cost. The simultaneous use of all of these approaches can produce the best practice: the forecasting models provide good results that can then be fine-tuned using the know-how of maintenance operators and suppliers. The existence of a sig-
412
11 Spare Parts Forecasting and Management
Table 11.1 Spare parts forecasting methods Method
Abbreviation
Description
Moving averages Weighted moving averages
MA WMA
Exponential weighted moving averages
EWMA
Single exponential smoothing
SES
Croston method
Croston
Double exponential smoothing Additive Holt–Winter
DES AW
Multiplicative Holt–Winter
MW
Adaptive-response-rate single exponential smoothing Time series decomposition (seasonal regression model) Autoregressive integrated moving average Poisson model
ARRSES SRM
Rolling average value based on past demand data A simple variation on the moving average technique that weights the data in order to average them Applies weighting factors that decrease exponentially. The weighting for each older data point decreases exponentially, giving much more importance to recent observations while not discarding older observations entirely Similar to exponential weighted moving averages, weights decrease exponentially. It produces interesting results in the case of low and intermittent demand Adjustment of single exponential smoothing to consider series with zero value of demand occurring many times. Forecasting in the case of low and intermittent demand A factor considering trend effects is introduced into single exponential smoothing Extension of single exponential smoothing to linear exponential smoothing. Assumes that the seasonal effects are constant in size Assumes that the seasonal effects are proportional in size to the local deseasonalized mean level This is a variation of single exponential smoothing that continually adjusts the smoothing parameter to allow for changes in the trend Identifies different separate components of the basic pattern
ARIMA
Based on autocorrelation of residual (noise) in the data of the series
Poisson
Binomial model
BM
Models based on the Poisson distribution with the customer’s service level being defined Method based on a two-factor consumption model
nificant maintenance information system (see Chap. 7) containing information on the past consumption of spare parts is fundamental to the application of statistical methods. In the literature the forecasting of spare parts using a statistical approach is usually based on the general demand forecasting problem. This very broad approach can be focused by taking the specific peculiarities of spare parts into account, therefore avoiding the way a great many statistical methods underperform. Several studies on this topic are reported in the technical literature (Makridakis et al. 1998; Willemain et al. 2004; Ghobbar and Friend 2004; Regattieri et al. 2005; Ferrari et al. 2006), and their conclusions sometimes differ. Moreover, a group of interesting methods can be selected from the experimental evidence (Table 11.1). The following sections only deal with “nonconventional” approaches such as the Croston, Poisson, and binomial methods specifically devoted to the intermittent-demand case. Other models are very well known and very frequently used in the product demand forecasting problem (Madrikakis et al. 1998).
11.4 Croston Model Croston’s method is a widely used approach for intermittent-demand forecasting, and is based on exponential smoothing. In particular, it involves separate simple exponential smoothing forecasts of the demand size and the time period between demands. This approach is devoted to the situation where the time series has several zero values. Let Yt0 be the expected consumption of a spare part for the period (t C 1) defined at period t: Yt0 D
zt : pt
If yt D 0, then pt D pt 1 ; zt D zt 1 ; q D q C 1; otherwise pt D pt 1 C ˛.q pt 1 /; zt D zt 1 C ˛.yt zt 1 /; q D 1;
(11.4)
11.5 Poisson Model
413
where yt is the spare part consumption at period t, pt is the time interval between period t and the last period with a positive consumption of spare part(s), zt is the average consumption of a spare part upgraded at period t, q is the number of periods between period t and the last period with a positive consumption of spare part(s), and ˛ is a smoothing factor (optimized by a trial-and-error procedure). Some authors, including Johnston and Boylan (1996), Syntetos and Boylan (2001), Syntesos et al. (2005), and Boylan et al. (2008), have proposed modifications to Croston’s method for the purpose of improving the accuracy of the forecast. In particular, Syntetos and Boylan (2001) proposed a modification in the final calculus of the forecast: Yt0 D
zt ; pt c pt 1
d
d
d
Time (periods)
0
T x components
Fig. 11.5 Average rate of consumption and time horizon
(11.5)
where c is a constant optimized by a trial-and-error procedure (c usually ranges from 100 to 200). Fig. 11.6 Component XC100 Table 11.2 Probability of consumption for component XC100
11.5 Poisson Model The Poisson method is based on the Poisson distribution and forecasts the probability of a rare event. It is a direct consequence of the binomial distribution. When applied to the spare parts forecasting problem, it provides an estimate of the probability of consumption for a fixed value of spare parts. The starting point of this approach is the average consumption rate of a spare part (called d ). The probability that x spare parts will be used in a time horizon T at an average rate of consumption d is given by Pd;T;x D
.d T /x e .d T / ; xŠ
(11.6)
where d is the average rate of spare part consumption (pieces per period), x is the number of pieces consumed, and T is the time horizon (periods). The cumulative probability of the maximum consumption of x spare parts is given by P CUM d;T;x D
x X .d T /k e .d T / : kŠ
kD0
Figure 11.5 shows the situation.
(11.7)
T
dT
P .d; T; 1/
P .d; T; 2/
P .d; T; 3/
0 1 10 100 1,000 2,000 3,000 4,000 5,000 ... 10,000
0 0.00024 0.0024 0.024 0.24 0.48 0.72 0.96 1.2 ... 2.4
0 0.00024 0.002394 0.023431 0.188791 0.297016 0.350462 0.367577 0.361433 ... 0.217723
0 2.88E-08 2.87E-06 0.000281 0.022655 0.071284 0.126166 0.176437 0.21686 ... 0.261268
0 2.3E-12 2.3E-09 2.25E-06 0.001812 0.011405 0.03028 0.05646 0.086744 ... 0.209014
We now present an application. The average consumption for component XC100 (a secondary shaft for a conveyor system; Fig. 11.6) registered in the past and extracted from the database management system is d D 2:4 104 pieces=h, i. e., one replacement approximately every 4;200 h. The probability of consumption relating to a time horizon T (in hours) is given by Eq. 11.6 (see also Fig. 11.7). Table 11.2 shows results for consumption of one, two, and three spare parts [P .d; T; 1/, P .d; T; 2/, and P .d; T; 3/, respectively] according to different time horizons.
414
11 Spare Parts Forecasting and Management
11.6 Binomial Model
0.4 0.35
P(d,T,x)
0.3 0.25
P(d,T,1) P(d,T,2) P(d,T,3)
0.2 0.15 0.1 0.05 0 0
2000
4000
6000
8000
10000
12000
T(h)
Fig. 11.7 Probability plot of expected consumption for XC100 Table 11.3 Probability of consumption for component XC100 in T D 5;000 h X
dT
P .d; T; X/
0 1 2 3 4 5 6 7 8 9 10
1.20 1.20 1.20 1.20 1.20 1.20 1.20 1.20 1.20 1.20 1.20
0.301194 0.361433 0.21686 0.086744 0.026023 0.006246 0.001249 0.000214 3.21E-05 4.28E-06 5.14E-07
N D x1 C x2 : T n; x1 D 1=d
0.4 0.3
P(d,t,X)
0.25 0.2 0.15 0.1 0.05 0 1
2
3
4
(11.8) (11.9)
where N is the spare part forecast (pieces), d is the average consumption of spare parts (pieces per period), and T is the forecasting time horizon. x2 is related to the probability of at maximum x2 failures occurring; hence, spare part consumption in the time interval Tresidual is defined as 1 T : (11.10) Tresidual D T 1=d d
0.35
0
This method was proposed by Regattieri (1996) and is based on the binomial distribution. When the lumpiness of demand is significant, the simple application of the Poisson formula can give an inconsistent forecast (usually overestimated). In this method the spare part forecast is composed of two terms: the first, x1 , considers the average consumption of the spare part for a fixed period T and the second, x2 , tries to link the consumption to a desired level of service using the binomial approach. This method also considers the multiple use of the same spare part on different items of equipment in the system by applying the number of installations parameter (n). The forecast is given by
5
6
7
8
9
10
11
X
Fig. 11.8 Probability trend for expected consumption of XC100 in T D 5;000 h
In real applications the time horizon T is fixed and usually represents the lead time of supply. The main problem is to define the expected probability of consumption for different values of spare parts. The approach used is the same as in Eq. 11.6, i. e., fixed T and variable x. Table 11.3 and Fig. 11.8 show the expected consumption for time horizon T D 5;000 h and number of spare parts from zero to ten.
Let p be the cumulative probability of the spare part being used (i. e., to have a failure) in the Tresidual period. Assuming an exponential distribution of time to failure (other competitive distributions are Weibull and normal ones), F .Tresidual / D 1 e
1 1=d Tresidual
D p:
(11.11)
Let n be the number of examined components contemporaneously installed and LS the a priori fixed probability of satisfying the demand of spare parts forecast. Using the binomial formula, ! x2 X n (11.12) .1 p/ni p i LS: P .x2 / D i i D0
The iterative application of Eq. 11.12 means the minimum value x2 satisfying the disequation can be defined.
11.6 Binomial Model
415
d =2×10-3 pieces/h
0
time
T = 640 h x components
Fig. 11.9 P000-303 valve
11.6.1 Numerical Example
F .Tresidual / D 1 e D 1e
On a power and free transportation system installed in a car production plant for chassis handling there are n D 10 elements of a solenoid valve named “P000303” (Fig. 11.9). The past consumption of this spare part according to the database management system is d D 2 103 pieces=h. The plant engineer must forecast the expected consumption of item P000-303 for a time horizon of 640 h, corresponding to the time interval between two consecutive procurements. The fixed service level is 90%. The starting point is Eq. 11.8:
x1 D
T 1=d
$
nD
1 2
640 h 103 pieces=h
10
D 10 pieces: Tresidual and the corresponding cumulative probability of failure p are 1 T Tresidual D T 1=d d % $ 640 h D 640 h 1 3 pieces=h 2 10 1 103 pieces=h 2 D 140 h;
D 0:244:
Let x2 be equal to one unit: ! 1 X n P .x2 D 1/ D .1 p/ni p i i i D0
D
%
1 1 10 2
Dp
140 3
i D0
1 X i D0
Then,
The service level in Tresidual is 90%, and the value of x2 is obtained from the recursive application of ! x2 X n P .x2 / D .1 p/ni p i 0:90: i
D
N D x1 C x2 :
1 1=d Tresidual
nŠ .1 p/ni p i i Š.n i /Š
10Š .1 0:244/100 0:2440 0Š.10 0/Š
C
10Š .1 0:244/101 0:2441 1Š.10 1/Š
D 0:061 C 0:197 Š 0:258: In conclusion, P .x2 D 1/ < LS and x2 D 1 is not the solution. The following attempt value must be x2 D 2: ! 2 X n P .x2 D 2/ D .1 p/ni p i i i D0
D
10Š .1 0:244/100 0:2440 0Š.10 0/Š 10Š .1 0:244/101 0:2441 C 1Š.10 1/Š
416
11 Spare Parts Forecasting and Management
10Š .1 0:244/102 0:2442 2Š.10 2/Š D 0:061 C 0:197 C 0:286 Š 0:544: C
Furthermore, in this case the disequation (Eq. 11.12) is not satisfied. In brief, if x2 D 4, P .x2 D4/
! 4 X n D .1 p/ni p i i i D0
D
10Š .1 0:244/10 0:2440 0Š 10Š 10Š .1 0:244/104 C ::: C 4Š.10 4/Š 0:2444
D 0:060 C 0:197 C 0:286 C 0:246 C 0:136 D 0:928 > 0:90; then the expected value of x2 is four pieces. In conclusion, the total forecast of spare parts demand in 640 h is N D x1 C x2 D 10 C 4 D 14 pieces.
the outcome: in this case the expected values are very uncertain. There are many parameters to evaluate the forecast error. Let At be the actual value at time t, Ft the forecast value at time t, and n the couples (At ; Ft ) considered: • Mean deviation (MD): Pn Pn .At Ft / t D1 et MD D D t D1 I n n
(11.13)
• Mean square deviation (MSD): Pn Pn 2 .At Ft /2 t D1 et MSD D D t D1 I (11.14) n n • Mean absolute deviation (MAD): Pn Pn jAt Ft j t D1 jet j MAD D D t D1 I (11.15) n n • Mean absolute percentage error (MAPE): ˇ n ˇ 1 X ˇˇ At Ft ˇˇ I MAPE D n t D1 ˇ At ˇ
(11.16)
• Standardized MAD (SMAD):
11.7 Spare Parts Forecasting Accuracy
SMAD D
MAD MAD D Pn : A t D1 At
(11.17)
n
The forecast error is the difference between the actual/real and the predicted/forecast value of a time series or any other phenomenon of interest. In simple cases, a forecast is compared with an outcome at a single point in time and a summary of forecast errors is constructed over a collection of these samples. Here the forecast may be assessed using the difference or using a proportional error. By convention, the error is defined using the value of the outcome minus the value of the forecast. Obviously, the forecast accuracy is linked to the forecast error. In particular, if the error E is expressed as a percentage, the accuracy is equal to (1 E)%. The evaluation of the forecast error (or accuracy) is of critical importance in choosing the best method according to the real data in the analysis. Furthermore, the evaluation of forecasting error, and in particular its value compared with the real outcomes, means the robustness of the choice can be evaluated. At times the forecasting error is greater than (or comparable with)
MD is the basic error but suffers from a significant problem linked to a “compensation” of errors (minus sign and plus sign). Consequently, MSD and MAD are introduced. MSD and MAD can introduce relevant bias effects when couples with significant differences in terms of value (e. g., different orders of magnitude) are compared. MAPE skips this problem by introducing the concept of percentage error. MD, MSD, MAD, and MAPE are typical error measurements normally used for the demandforecasting problem in which there are no periods with null demand. But in spare parts forecasting a null demand for components in a period is very frequently observed: in this situation a MAPE is not available and other methods can introduce the above-mentioned bias. The SMAD, defined as the ratio between MAD and the average value of the actual time series, is an efficient parameter to evaluate the accuracy of a forecast for an intermittent demand.
11.8 Spare Parts Forecasting Methods: Application and Case Studies
11.8 Spare Parts Forecasting Methods: Application and Case Studies 11.8.1 Case Study 1: Spare Parts Forecasting for an Aircraft Accurate spare parts demand forecasting is a very critical issue in the management of an aircraft fleet. Airline operators often base their predictions on work experience and on information from aircraft manufacturers. Stocking costs, obsolescence risks, or costs incurred by the unavailability of the airplane can be very important. A large stock of spare parts is often required for many reasons, thus making the management of aircraft fleets very difficult. Safety issues and costs due to interruption of service by airplanes being out of service while undergoing maintenance require efficient maintenance policies in cooperation with continuous inspection and preventive maintenance. Airline companies must have a policy for coping with unanticipated mechanical problems when their aircraft are away from their base. The management of spare parts inventory becomes a significant issue in this context. In particular, accurate forecasts of consumption are important and influence both the performance of an airline fleet and economic returns on capital. As demonstrated by Ghobbar and Friend (2002, 2004) and others, lumpiness is a direct consequence of the inner structural features of the operations performed by an airline company, in particular the fierce competition between companies to meet performance targets expected by customers while still making a profit.
417
There are two broad approaches to spare parts selection: the first is based on the operational experience of an enterprise and the second on the application of forecasting techniques. Ghobbar and Friend (2004) found that only 9–10% of companies use forecasting models. Airline operators usually base predictions on their operational experience, on annual budgets, and on information from lists of spare parts recommended by the aircraft manufacturers. The application presented here is a comparison of different forecasting techniques applied to the spare parts of a fleet of Airbus A320 aircraft belonging to an important national airline company. The airline’s technical division collects daily records of the demand for each component. These records are aggregated to provide monthly data. This database covers the 6 years from 1998 to 2004, and more than 3,000 different items are affected with five different levels, or classes, of lumpiness. Each class contains a population of many items, but for the sake of brevity the following analysis refers only to one item per class as a sample. These five groups of lumpiness seem to be typical for aircraft spare parts. To maintain confidentiality, the items are referred to as a, x, y, z, and w. Figure 11.10 presents an illustrative time series of the demand for item z. Table 11.4 contains the values of CV2 on a monthly basis, and ADI for these five items, while positions inside the lumpy area are given in Fig. 11.11. The performance of forecasting methods is evaluated using the MAD as defined in Eq. 11.15. The comparison of the forecasting methods, in terms of evaluating forecast accuracy using MAD and
demand
20
15
10
5
0 1
4
7
10
13
16
19
22
25
28
31
month
Fig. 11.10 Time series of demand for item z
34
37
40
43
46
49
52
55
58
418
11 Spare Parts Forecasting and Management
Table 11.4 Values of the squared coefficient of variation (CV 2 ) and the average interdemand interval (ADI) for the five representative items Item
CV2
ADI
a x y z w
2.56 2.15 0.79 0.59 2.59
1.63 2.19 1.55 1.34 3.17
tio of maximum SMAD to the minimum SMAD for a single item found using different techniques is approximately 1.56, ranging from 1.45 to 1.71, while for any forecasting method the same average fluctuation for different items is approximately 2.16, ranging from 1.84 and 2.31. This clearly demonstrates the dominant influence of lumpiness. These empirical experiments are summarized in Tables 11.6 and 11.7 that show the effectiveness of each model, and that the weighted moving averages (WMA), Croston, exponential weighted moving averages (EWMA), and trend adjusted exponential smoothing models are the best performers. However, the seasonal regression model (SRM) does perform well, particularly for small values of ADI (less than 1.70). Its forecast error for items y and z is very close to that of the best methods. Small values of CV2 and ADI, as for items y and z, improve the performance of each method. In particular, the Holt–Winter method is very competitive in these conditions, with the additive version generally being more effective than the multiplicative one.
Fig. 11.11 Lumpy coordinates of the five representative items
11.8.2 Case Study 2: Spare Parts Forecasting in a Steel Company SMAD parameters, applied to the five selected items, is reported in Table 11.5. The value of SMAD for each item returned using each forecasting method is presented in Table 11.6 in descending order. The column “position” represents the relative weight of a forecast’s performance, using which a comprehensive comparison and evaluation of strengths and weaknesses for each method can be carried out. Table 11.7 shows the total and average scores based on the collected values and the relative weights in Table 11.6. SMAD makes comparison possible in terms of performance of the forecasting methods on different items, as well as their behavior in different conditions of lumpiness (Fig. 11.12). As clearly seen from Fig. 11.12, the dominant parameter is item lumpiness. The choice of the forecasting method is a side issue. All methods generally perform better when applied to items with shady lumpiness such as y and z compared with the best performer applied to items with glaring lumpiness, such as x and w. Moreover, lumpiness is an independent variable and is uncontrollable. The average fluctuation of the ra-
A European leader in the metallurgy sector and trading worldwide has to cope with forecasting the spare parts requirements in its manufacturing plants. This application is focused on comparing the performance of the “traditionally” good methods for spare parts (particularly Croston, WMA, and EWMA) with the family of autoregressive integrated moving average (ARIMA) methods. The available data set contains information covering 5 years for approximately 2,500 items (Fig. 11.13). The analysis of 12 items (A–N) is now reported with different patterns according to Fig. 11.4, while Table 11.8 summarizes the characteristics of the components presented. ARIMA methods are a populated family of forecasting methods whose object is to express the forecast as a function of the previous values of the series (autoregressive terms) and previous values of forecasting error (moving average terms). The model is generally referred to as an ARIMA(p; d; q) model, where p, d , and q are integers greater than or equal to zero linked,
9.26 1.70 0.57 4.54 5.08
1.00 1.11 0.71 0.65 1.31 0.65 1.31 2.02
10.31 1.35 0.60 3.71 5.10
1.12 0.88 0.74 0.53 1.31 0.53 1.31 2.48
1.22 1.42 0.96 0.82 1.56 0.82 1.56 1.90
11.24 2.18 0.77 5.75 6.04
SES DES
1.36 1.39 0.99 0.86 1.66 0.86 1.66 1.94
12.55 2.12 0.79 6.01 6.44 1.19 1.38 0.92 0.74 1.55 0.74 1.55 2.10
11.02 2.12 0.74 5.17 5.99 1.10 1.26 0.88 0.68 1.43 0.68 1.43 2.09
10.17 1.93 0.71 4.80 5.54 1.17 1.22 0.85 0.71 1.42 0.71 1.42 2.00
10.76 1.87 0.68 4.98 5.51 1.16 1.25 0.81 0.68 1.51 0.68 1.51 2.21
10.74 1.91 0.65 4.78 5.84 1.08 1.22 0.83 0.65 1.49 0.65 1.49 2.31
10.00 1.86 0.67 4.54 5.79 1.08 1.20 0.82 0.63 1.50 0.63 1.50 2.37
10.01 1.84 0.66 4.44 5.82 1.10 1.17 0.78 0.65 1.45 0.65 1.45 2.23
10.19 1.79 0.63 4.55 5.61 1.12 1.14 0.75 0.66 1.47 0.66 1.47 2.23
10.33 1.75 0.60 4.63 5.70 1.09 1.14 0.78 0.65 1.48 0.65 1.48 2.27
10.10 1.75 0.62 4.59 5.76 1.11 1.18 0.78 0.62 1.44 0.62 1.44 2.31
10.23 1.80 0.63 4.36 5.58
0.95 1.12 0.69 0.57 1.14 0.57 1.14 2.00
8.77 1.71 0.55 4.00 4.42
1.04 1.10 0.69 0.53 1.18 0.53 1.18 2.23
9.60 1.68 0.55 3.72 4.58
0.92 1.08 0.71 0.55 1.21 0.55 1.21 2.19
8.48 1.65 0.57 3.90 4.71
0.90 1.03 0.65 0.50 1.18 0.50 1.18 2.36
8.31 1.58 0.52 3.51 4.58
0.95 1.10 0.68 0.55 1.20 0.55 1.20 2.18
8.77 1.68 0.54 3.86 4.66
MA(2) MA(3) MA(4) MA(5) MA(6) MA(7) MA(8) MA(9) MA(10) MA(11) MA(12) SRM TAES EWMA WMA Croston
MAD mean absolute deviation, SMAD standardized mean absolute deviation, TAES trend-adjusted exponential smoothing
MAD a 9.79 x 1.38 y 0.62 z 4.05 w — SMAD a 1.06 x 0.90 y 0.77 z 0.58 w Min 0.58 Max 1.06 Max/ 1.84 min
MW AW
Table 11.5 Evaluation of the forecast accuracy for different methods (MAD and SMAD = MAD/A)
11.8 Spare Parts Forecasting Methods: Application and Case Studies 419
420
11 Spare Parts Forecasting and Management
Table 11.6 Classification of methods based on performance evaluation Item a
x
y
z
w
Position
WMA EWMA Croston SES TAES MW SRM MA(7) MA(8) MA(11) MA(4) MA(9) MA(12) AW MA(10) MA(6) MA(5) MA(3) DES MA(2)
AW MW WMA SRM EWMA Croston TAES SES MA(10) MA(11) MA(9) MA(12) MA(8) MA(7) MA(5) MA(6) MA(4) MA(3) MA(2) DES
WMA CROSTON TAES EWMA SES SRM AW MA(10) MW MA(11) MA(12) MA(9) MA(6) MA(8) MA(7) MA(5) MA(4) MA(3) DES MA(2)
WMA AW TAES Croston EWMA SRM MW MA(12) MA(8) MA(7) SES MA(9) MA(11) MA(10) MA(6) MA(4) MA(5) MA(3) DES MA(2)
SRM WMA TAES Croston EWMA SES AW MA(5) MA(4) MA(12) MA(9) MA(10) MA(11) MA(7) MA(8) MA(6) MA(3) DES MA(2)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1.80 1.60
MAD/A
1.40 x y z w a
1.20 1.00 0.80 0.60
(9 )
(1 0 M ) A (1 1 M ) A (1 2) SR M TA ES EW M A W M A C R O S TO N
M A
M A
(7 )
(8 )
M A
(6 )
M A
(5 )
M A
(4 )
M A
M A
(2 )
(3 )
M A
M A
S
ES D
AW
SE
M W
0.40
Fig. 11.12 Accuracy evaluation of forecasting methods using SMAD = MAD/A. See Table 11.1 for an explanation of the methods item L demand (pieces )
60 50 40 30 20 10 0 1
3
5
7
9
11
13
15
17 19
21
23
25
27
29 31
33
time (months)
Fig. 11.13 Time series of the demand for item L
35
37
39
41 43
45
47
49
51
53 55
57
59
11.8 Spare Parts Forecasting Methods: Application and Case Studies Table 11.7 Total and average score for forecasting methods applied to different items Method
Total score
Average score
WMA Croston EWMA TAES SRM MW AW SES MA(12) MA(11) MA(9) MA(10) MA(8) MA(7) MA(4) MA(5) MA(6) MA(3) DES MA(2)
8 18 21 21 24 24 31 34 54 56 58 58 60 61 70 73 76 89 95 98
1.6 3.6 4.2 4.2 4.8 6.0 6.2 6.8 10.8 11.2 11.6 11.6 12.0 12.2 14.0 14.6 15.2 17.8 19.0 19.6
Table 11.8 Values of CV2 and ADI for the 12 representative items Item
CV2
ADI
Pattern
A B C D E F G H I L M N
0.80 1.03 0.30 0.30 0.69 0.58 0.18 0.00 0.13 0.95 2.26 0.54
1.94 3.28 1.59 1.48 1.28 1.20 2.14 1.23 1.11 2.57 1.43 3.00
Lumpy Lumpy Intermittent Intermittent Erratic Erratic Intermittent Smooth Smooth Lumpy Lumpy Lumpy
421
respectively, to the order of the autoregressive, integrated, and moving average parts of the model. Detailed information can be found in Makridakis et al. (1998). The choice of the ARIMA model and the optimization of its parameter is performed by an iterative process programmed in software for statistical analysis (e. g., Minitab Statistical Software). Figures 11.14 and 11.15, respectively, show the autocorrelation analysis, complete and partial, and the parameter estimations for item L calculated by Minitab Statistical Software. Table 11.9 summarizes the performance of different forecasting methods in terms of MAD and SMAD. The parameters input in several methods such as WMA, EWMA, and seasonal ARIMA (ARIMAs) are optimized by an iterative process programmed in Minitab Statistical Software. For item L, the ARIMA methods perform best, especially ARIMA(2,0,2)(1,0,0)12. The forecast error is still relevant: SMAD is approximately 0.850, as is typical for spare parts forecasting. The application of the same process to the whole set of 12 items is summarized in the classification of merit in Table 11.10. Several guidelines can be formulated in this heterogeneous situation. The Croston method performs best for erratic patterns, with a significant reduction in SMAD of 20–30%. This trend is also confirmed for slow-moving patterns, but the difference is considerably less at 5–8 %. For an intermittent pattern, ARIMA models perform best. Lumpy patterns are generally forecast well by ARIMA models, but the EWMA method is strong when values of ADI and CV2 are high, i. e., near the
Autocorrelation Function for item L (with 5% significance limits for the autocorrelations)
Partial Autocorrelation Function for item L (with 5% significance limits for the partial autocorrelations) 1.0
0.8
0.8 Partial Autocorrelation
1.0
Autocorrelation
0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -1.0
0.6 0.4 0.2 0.0 -0.2 -0.4 -0.6 -0.8 -1.0
1
2
3
4
5
6
7
8 9 Lag
10
11
12 13
14
15
1
Fig. 11.14 Autocorrelation analysis for item L. Minitab Statistical Software
2
3
4
5
6
7
8 9 Lag
10 11 12 13 14 15
422
11 Spare Parts Forecasting and Management 0.49 ADI
ARIMA model: item L Estimates at each iteration Iteration 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
SSE 2983.20 2978.28 2975.19 2972.31 2969.10 2965.18 2959.97 2951.94 2937.25 2908.05 2822.13 2797.74 2767.09 2764.72 2764.51 2764.14 2764.10
0.100 0.112 0.262 0.412 0.562 0.711 0.860 1.009 1.153 1.303 1.406 1.445 1.446 1.452 1.455 1.457 1.458
0.100 0.112 -0.002 -0.109 -0.216 -0.325 -0.445 -0.585 -0.733 -0.842 -0.857 -0.904 -0.886 -0.887 -0.888 -0.888 -0.887
Parameters 0.100 0.100 0.088 0.089 0.238 -0.021 0.388 -0.125 0.538 -0.228 0.688 -0.335 0.838 -0.453 0.988 -0.591 1.133 -0.741 1.280 -0.851 1.344 -0.868 1.374 -0.920 1.354 -0.914 1.354 -0.915 1.355 -0.917 1.355 -0.917 1.355 -0.918
EWMA 3.520 3.354 3.197 3.011 2.826 2.652 2.527 2.487 2.506 2.331 1.964 2.029 1.964 1.957 1.951 1.943 1.940
3.0
1.32 1.0
Croston 0
Relative change in each estimate less than 0.0010
Fig. 11.15 Estimation of autoregressive integrated moving average (ARIMA) parameters for item L. Minitab Statistical Software. SSE sum of squares due to error Table 11.9 Performance comparison of forecasting methods for item L
a b
Method
MAD
MAD/A
WMA(3 periods) WMA(5 periods) WMA(7 periods) EWMA ARIMA(1,0,0) ARIMA(0,0,1) ARIMA(1,0,1) ARIMA(2,0,0) ARIMA(0,0,2) ARIMA(2,0,2) ARIMA(2,0,1) ARIMA(1,0,2) ARIMA(2,1,2) ARIMA(1,1,1) a ARIMA(2,0,2)(0,1,0)12 ARIMA(2,0,2)(0,1,0)12 without constant ARIMA(1,0,2)(0,1,0)12 without constant ARIMA(2,0,2)(1,0,0)12 without constant ARIMA(2,0,2)(0,0,1)12 ARIMA(2,0,2)(1,0,0)12 Croston Croston modified b
6.316 6.291 6.617 4.750 5.317 5.317 5.310 5.319 5.322 5.001 5.315 5.072 5.217 5.249 4.854 4.880 5.525 4.432 4.473 4.327 7.055 5.053
1.237 1.232 1.296 0.930 1.041 1.041 1.040 1.042 1.042 0.979 1.041 0.993 1.021 1.028 0.950 0.956 1.082 0.868 0.876 0.847 1.381 0.989
Seasonal ARIMA (Makridakis et al. 1998) Modified according to Syntetos and Boylan (2001)
lower limit of the lumpy class. ARIMA methods are very interesting when “seasonality” is present, which is very difficult to discover in lumpy behavior (e. g., item B): ARIMAs, with SMAD of 50%, work significantly better than the others.
ARIMA
2.0
0
0.5
1.0
1.5
2.0
CV2
Fig. 11.16 The technical merit of forecasting methods. EWMA exponential weighted moving averages
It is important to note that the number of iterations required to optimize the parameters in the ARIMA approach can be high, especially in a lumpy pattern. The modified Croston method (Syntetos and Boylan 2001) mainly works better than the original Croston method for lumpy patterns with high lumpiness in terms of ADI. The generalization of the proposed approach, which was applied to almost 900 items, results in the development of a technical diagram of merit for the forecasting methods that presents the best method according to the area of the item in terms of CV2 and ADI values (Fig. 11.16).
11.9 Methods of Spare Parts Management The goal of an efficient spare parts management system is to minimize the total cost. The forecasting problem of spare parts has been investigated in the preceding pages. When the requirement of spare parts is based on an estimate, the challenge is to optimize the management of these items. The main questions for a manager forecasting the future demand for a set of spare parts are which items to stock at the maintenance division of the company, and how many? There are two fundamental strategies in tackling this problem: the first is based on qualitative methods and the second on quantitative methods.
11.9 Methods of Spare Parts Management
423
Table 11.10 Technical merit for different items Item
CV2
ADI
Pattern
Forecasting technical merit 1°(best)
2°
3°
4°
5°(worst)
Croston modified Croston WMA WMA EWMA EWMA Croston WMA WMA WMA ARIMA Croston
WMA
A
0.80
1.94
Lumpy
ARIMA
EWMA
Croston
B C D E F G H I L M N
1.03 0.30 0.30 0.69 0.58 0.18 0.00 0.13 0.95 2.26 0.54
3.28 1.59 1.48 1.28 1.20 2.14 1.23 1.11 2.57 1.43 3.00
Lumpy Intermittent Intermittent Erratic Erratic Intermittent Smooth Smooth Lumpy Lumpy Lumpy
ARIMAs ARIMA ARIMA Croston Croston ARIMA Croston Croston ARIMAs Croston Croston mod
Croston modified Croston Croston Croston modified Croston modified EWMA ARIMA ARIMA EWMA Croston modified ARIMA
EWMA EWMA EWMA ARIMAs ARIMAs Croston modified EWMA EWMA Croston modified EWMA EWMA
WMA Croston modified Croston modified WMA WMA WMA Croston modified Croston modified Croston WMA WMA
ARIMAs seasonal ARIMA
11.9.1 Spare Parts Management: Qualitative Methods The goal of these approaches is to determine which spare parts should be stocked in the local warehouse. A set of significant spare parts management parameters is evaluated qualitatively. Several authors (Botter and Fortuin 2004; Cobbaert and Van Oudheusden 1996; Braglia et al. 2004) support qualitative approaches to approximating low levels of demand, and reject sophisticated mathematical models with complex distribution functions on the grounds that all the work involved in applying and preserving them is not worth the result. Some suitable qualitative solutions are now reported.
11.9.1.1 The VED Approach The starting point of the VED approach (Botter and Fortuin 2000) is a qualitative classification of service parts into vital, essential, and desirable, which is carried out by analyzing a set of factors. Criticality is the main rule and it can relate closely to several parameters, as reported in Table 11.11. Table 11.12 shows an example of the decision levels for each factor. The three decision levels usually correspond to VED classification. The cut sets for these criteria are clearly related to the specific case. The authors suggest that the specific case be analyzed by focusing on a small number of factors, and
that the analysis of each factor concentrates on a small number of feasible values. Since the choice is absolutely arbitrary, an unfounded situation is very commonly generated. Figure 11.17 presents an example of a framework built on these three factors. Assuming a two-level value for these three factors (i. e., short–long, low–high) it is possible to identify eight areas, each corresponding to a particular management strategy for the spare parts: 1. Low price, short response time, high usage. These cheap, fast-moving spare parts have to be stocked in local warehouses in large quantities. 2. Low price, short response time, low usage. These cheap, slow-moving items also have to be stocked close to the market, but in lower quantities. 3. Low price, long response time, high usage. Inventory and transport costs for these items should be investigated in order to determine whether or not the local stock level of these items is the most economical. For instance, some or all of the local stock of fast-moving parts could be positively substituted and absorbed by transport costs through the shipping of larger quantities using more economical means of transport. 4. Low price, long response time, low usage. In this case no stock can be a good solution. 5. High price, short response time, high usage. Because of the short response time, the expensive stocking of these items, primarily in local warehouses, must be managed particularly carefully.
424
11 Spare Parts Forecasting and Management
Table 11.11 Criticality factors Factor
Comment
Response time Functionality
Maximum time between a call for help and restoration of the system’s functionality Effect the failure of an item has on the system’s availability: an item is functional if the system cannot function without it, or is merely cosmetic if the system can continue to run without it, possibly with minor restrictions Total demand for an item in a unit of time, expressed in units or in money Newly developed, established, continued, or soon to be phased out An item can be (relatively) cheap or expensive Time between placing an order with the supplier of an item and the moment it is ready for use The possibility of restoring an item’s functionality after failure
Consumption Stage of the life cycle Price Purchase lead time Repairability
1
Criteria
Options
Response time
2–4 h Next business day Later than next business day AMC < 5 5 < AMC < 100 AMC > 100 Introduction Maturity Decline P < 100 100 < P < 1;000 P > 1;000 LT < 1 1 < LT < 3 LT > 3
high
Demand (AMC) 8 low
high
short long Response time
Pr
low
Life cycle
ice
Consumption (pieces)
Table 11.12 Choice of criteria and options for spare parts criticality (example)
Fig. 11.17 Framework based on three factors: consumption, response time, and price (Botter and Fortuin 2000)
The quantities should be the minimum required to meet the desired level of service. 6. High price, short response time, low usage. In this case no stock can be the best solution. 7. High price, long response time, high usage. A trade-off analysis is required to choose between maintaining a local stock or no stock for these parts. 8. High price, long response time, low usage. The same observation as in area 7.
P (dollars)
Purchase LT (weeks)
AMC average monthly consumption, P price, LT lead time Table 11.13 Inventory policy matrix Stocking policy
No stock Single item inventory Just-in-time inventory Multi-item inventory
Spare parts classification A
B
C
D
11.9.1.2 Multiattribute Spare Tree Analysis As a result of taking a decision tree approach (Braglia et al. 2004), the authors define four stocking policies and four classes of spare parts. As before, the classification in this case is based on qualitative parameters. The method provides one or more eligible policies for each class. Table 11.13 shows the grid of spare part classes and stocking policies.
Fig. 11.18 The main choice for criticality (Braglia et al. 2004)
As in the VED approach, the method is centered on the concept of spare part plant criticality with its three levels: desirable, important, and critical (see
11.9 Methods of Spare Parts Management
425
Fig. 11.19 Subtree 1: spare part plant criticality as critical (Braglia et al. 2004)
Fig. 11.20 Subtree 2: spare part plant criticality as important (Braglia et al. 2004)
Fig. 11.18). Each level is given by the application of an analytic hierarchy process, as in Sharaf and Helmy (2001). This main choice is influenced by a great many attributes, such as quality, production loss, domino effect, safety, and spare part characteristics, making a qualitative evaluation necessary. For example, if the rating for quality is between 0 and 85%, the authors suggest an evaluation is critical, while between 85 and 95% it is important, and it is desirable between 95 and 100%. Dealing with qualitative evaluation, these cut sets are absolutely arbitrary. Following this, a tun-
ing phase is usually required, preferably carried out by maintenance experts according to the application examined. Three alternative decision-making subtrees are reported in Figs. 11.19–11.21, and are covered by the assigned level. The final result is the classification of the spare part; hence, the related best policy is that shown in Table 11.13. In conclusion, it is important to note that qualitative methods are simple, rapid, and usually cheap, very interesting features especially when the mainte-
426
11 Spare Parts Forecasting and Management
Fig. 11.21 Subtree 3: spare part plant criticality as desirable (Braglia et al. 2004)
nance database is poor. Furthermore, it is possible to consider several intangible additional factors such as the quality of suppliers, obsolescence, and transport. They can help to reduce the size of the problem, subsequently making it positively suitable for a quantitative approach.
N
Pd,T,1 N–1
Pd,T,2
N–2 Pd,T,N-1 1 t0
t0+T
time
Fig. 11.22 Trend of spare parts stock with time
11.9.2 Spare Parts Management: Quantitative Methods The aim of these procedures is very ambitious: the accurate prediction of the numbers of spare parts to stock in the company’s local warehouse. Several authors (Giri et al. 2005; Chang et al. 2005; Gutierrez et al. 2008) proposed different approaches based on linear programming, simulation, and many other methods, normally focusing on the optimization of the level of service or inventory costs. Unfortunately, the extreme degree of complexity or oversimplification sometimes tends to restrict their effectiveness in real applications. Several methodologies successfully tested in industrial applications are now presented.
11.9.2.1 Minimum Total Cost Method This basic method (Roversi and Turco 1974) considers two fundamental costs, the holding cost and the
shortage cost. These two result in a trade-off problem. Moreover, spare parts storage is an expensive activity in which the probability of the use of the components is low. On the other hand, should there be a shortage of spare parts, the related loss of production can be very expensive. (11.18) Ctot .N / D CH C CS ; where Ctot is the total cost, N is the number of spare parts to stock, CH is the holding cost, and CS is the shortage cost. The holding cost term CH can be estimated as follows. In considering a single item, N is the number of spare parts in the warehouse at the point in time t0 D 0, and T is the interval between two consecutive supplies. The number of items in stock remains N if there is no consumption in T , the corresponding probability being Pd;T;0. In the case of a single consumption, the stock decreases to (N 1) with probability Pd;T;1 , and so on. The situation concerning spare parts stock is explained in Fig. 11.22.
11.9 Methods of Spare Parts Management
427
In conclusion, the holding cost is estimated by
CH D R NPd:T;0 C .N 1/Pd:T;1
C .N 2/Pd:T;2 C : : : C Pd:T;N 1 ; (11.19)
where R is the purchase cost of the item, ' is the annual stock percentage cost (on purchase cost), Pd;T;x is the probability of x items being consumed during supply time T , and d is the average consumption of the item. Production losses could occur if the number of failures during the time T exceeds the number N of parts initially supplied. The cumulative probability of this event is calculated by P D Pd:T:N C1 CPd:T:N C2 CPd:T:N C3 C: : : (11.20) The probability Pd;T;x is easily computed using the Poisson formula (see Sect. 11.5). Assuming Cm is the cost resulting from the shortage of a single spare part, the shortage cost is CS D Cm dP:
(11.21)
These two cost terms are functions of N with opposite trends, and the optimal value for N sets their sum to a minimum, i. e., the total cost Ctot is minimized.
11.9.2.2 Numerical Application The main motor of an automatic labeling machine has an average consumption d D 2 pieces/year and a purchase cost R D ¤ 5,000 per piece. The manufacturer purchases spare parts every 4 months (i. e., supply time T D 4 months). Should a failure occur, this
7000 6000
[€/year]
5000 CH
4000
motor causes expensive production losses and emergency procurements that from historic Cm amount to ¤ 6,700 per piece. The annual stock percentage value is D 22%. Using the Poisson formula to calculate the probability Pd;T;n , we get the values given in Table 11.14. According to Eqs. 11.19 and 11.21 and considering different values of N , the total cost Ctot can be estimated. For N D 2,
CH D R NPd:T;0 C .N 1/Pd:T;1 C .N 2/Pd:T;2 C : : : C Pd:T;N 1 D 5000 0:22 Œ2 0:513 C .2 1/ 0:342 D ¤ 1504 per year and CS D Cm dP D Cm d.Pd;T;3 C Pd;T;4 C : : :/
D Cm d 1 .Pd;T;0 C Pd;T;1 C Pd;T;2 / D 6700 2 0:031 D ¤ 415 per year. Table 11.15 shows the costs as a function of N . The optimal solution provides two motors in stock as spare parts (Fig. 11.23). This minimal cost model is very easy to use and is widely diffused among engineers and practitioners. The literature presents several graphical abacuses based on this model in support of the rapid solution of the proposed problem. Figure 11.24 shows one of these abacuses: the entering side (input zone) requires the average item consumption d and the supply time T . Following the different zones and using the remain-
Table 11.14 Poisson probability distribution N
Pd;T;n
0 1 2 3 4 5
0.513 0.342 0.114 0.025 0.004 ...
CS
3000
Ctot (€/year)
Table 11.15 Cost plan according to N values
2000 1000 0 0
1
2
3
N
Fig. 11.23 Optimal number of spare parts
4
N
CH (¤/year)
CS (¤/year)
Ctot (¤/year)
0 1 2 3
0 564 1,504 2,571
6,525 1,943 415 80
6,525 2,507 1,919 2,651
428
11 Spare Parts Forecasting and Management I n p u t region
Output region
T constant 1 w 2 ee w ks ee ks 1 m on th 3 m on th 6 m s on t h 1 ye s a 2 r
N= 0
N= 1
N= 2
N= 3 N= 4 N= 5
s ar
ye
0.001
d (pieces/year)
0.01
0.1
N= 6
1
10
Cm ( €/ p i e c e )
5 50 5 00 5
5 000
50
5 0000
(€/ p i e c e × year) 5 00
5 00000
Rϕ
5 000
Fig. 11.24 Example of abacus for the optimal number N of spare parts
ing input (i. e., R, ', Cm /, the final result in the output zone is the optimal number of spare parts.
11.9.2.3 Stock Level Conditioned to Least-Availability Method This method (Regattieri 1999) is based on the definition of asymptotic availability: AD
MTTF ; MTTF C MTTR
The MTTR is derived from different factors, as shown in Fig. 11.25. Its value is strongly dependent on the number of spare parts stored in the particular case, as reported in Fig. 11.26. Assuming no spare parts are in stock, let tN be the point in time corresponding to a failure and f .tN / its corresponding probability distribution. Consequently, the expected waiting time is ta D T tN , with an average value of M.ta / given by RT
(11.22)
where MTTF is the mean time to failure and MTTR is the mean time to repair.
M.ta / D
0
.T tN /f .tN / dtN : RT f .t / dt N N 0
(11.23)
11.9 Methods of Spare Parts Management
429
MTTR
Predictive maintenance
call
set-up
disassembly
Breakdown maintenance
reparation
supply
from market
calibration
assembly
check
closure
from stock
Fig. 11.25 Structure of the mean time to repair (MTTR)
tN ; f(tN)
The average value of MTTR can be expressed by the sum of the average waiting time M.ta / and Treplace , i. e., the amount of time due to all the remaining factors (except time of supply, e. g., disassembling, repairs):
time 0
T
2T
Fig. 11.26 Spare parts waiting time evaluation
MTTR D Treplace C M.ta / RT .T tN /f .tN / dtN : D Treplace C 0 R T 0 f .tN / dtN (11.24) It is worth noting that increasing N lowers MTTR, with corresponding improvement of availability A and collapse of downtime costs. Moreover, the method affords the quantitative definition of the holding cost CH as in Eq. 11.19. An iterative process can be used to evaluate CH in order to find the optimum value of N by setting CH to a minimum while allowing the minimum level of availability Amin .N / to guarantee ontime dispatch of several technical requests (e. g., safety questions or productivity level):
8 ˚
ˆ min CH D min R NPd;t;0 C .N 1/Pd;T;1 C ˆ ˆ ˆ < C Pd;T;N 1 ˆ ˆ ˆ MTTF ˆ :A.N / D Amin : MTTF C MTTR.N / (11.25)
11.9.2.4 Case Study: Spare Parts Optimization for a Motorcycle Manufacturer A worldwide leader in motorcycle race championships was having problems with spare parts optimization in
Table 11.16 Classes and values for the factors in the criticality index Lead time of supply LTmin
S
EV
LTmax
PAC
¤/piece
XEV
(%)
XS
Days
XLTm
Days
XLTM
Pieces/year
XPAC
0–15 16–50 51–200 201–500 501–...
0.2 0.4 0.6 0.8 1.0
0–10 11–50 51–89 90–100
0.2 0.5 0.8 1.0
0–1 2–5 6–22 23–...
0.2 0.4 0.7 1.0
0–5 6–10 11–44 45–...
0.2 0.4 0.7 1.0
0–5101 5101 –1 1.1–5 5.1 –...
1.0 0.8 0.45 0.2
EV economic value of the item, S possibility of substituting the item with a similar item when the part is not in stock, LT lead time of supply, PAC past average consumption
430
11 Spare Parts Forecasting and Management 100
C
90
Cumulative percentage
80 70
B
60 50 40
A
30 20 10 0
Fig. 11.27 ABC analysis for the criticality index
192
Number of items
3,000
2
spare parts classificaon (CV and ADI) 1.00 0.90 0.80 0.70 0.60 0.50 0.40 0.30 0.20 0.10 0.00
2
Errac
Lumpy
N L
U
M G
H
F
C
O
A D 0.4
0.6
0.8
1.0
V
E 1.2
Q 1.4
F - BC06SD3476 G - DF573 H - FG563ER52 I - FF456LK86 L - EW986GH527
A - FG56WT356 B - AA65GH354 C - QA3465 D - QA3489 E - RT678YH452
S
P
smooth
0.2
R
I
B
0.0
T
1.6 ADI
1.8
Intermient
2.0
2.2
M - XC190FG0236 N - WE345JK784 O - WE3729RF45 P - D4239 Q - FG56WT654
2.4
2.6
2.8
3.0
3.2
R - DC0946SS4572 S - WE34897KL213 T - GG54TY673 U - DF254 V - AA65DFGE523
Fig. 11.28 Example of spare parts classification (CV2 and ADI)
item BC06SD3476 250
forecast (pieces)
200
150
historical
100
binomial
Poisson
50
0 0
1000
2000
3000
4000
T (h)
Fig. 11.29 Forecasting performance for a smooth item
its main production plant. These spare parts were routinely managed using the experience of staff involved in manufacturing and assembly operations. Significant
losses in production due to shortage of spare parts occurred, with high costs being incurred due to large numbers of parts in stock and the risk of obsolescence. The high number of items to be considered is often a problem. In this case more than 3,000 spare parts are codified and used, and on which, in terms of time and cost, it is neither convenient nor possible to carry out a systematic analysis. Rather, it is important to “simplify” the data set by, e. g., introducing an “item criticality index” composed of several factors. In this case the economic value of the item, the possibility of substituting it with a similar item when the part is not in stock, the lead time of supply, and the past average consumption are related to each other in an average weighted value as follows: PF i D1 pi Xi ; Criticality index D 1 P F i D1 pi
(11.26)
11.9 Methods of Spare Parts Management
431
16
Table 11.17 DX345SS387
14
N (pieces)
CH (¤/year)
CS (¤/year)
Ctot (¤/year)
12
1 2 3 4 5 6 7 8 9 10
43 65 94 120 164 230 327 462 664 950
910 468 240 116 49 21 16 4 3 1
953 533 334 236 213 251 343 466 667 950
forecast (pieces)
item DC0946SS4572
10
historical
8
Poisson binomial
6 4 2 0 0
1000
2000
3000
4000
Minimum total
cost
method
for
item
T (h)
Fig. 11.30 Forecasting performance for an intermittent item item XC190FG0236 25
forecast (pieces)
20
15
historical binomial Poisson
10
5
0
0
1000
2000
3000
4000
T (h)
Fig. 11.31 Forecasting performance for an erratic (or lumpy) item
where F is the number of factors, pi is the relative weight assigned to the factor, and Xi is the value as-
signed to the item as a function of its range for each factor. The smaller the index, the greater the criticality. Table 11.16 shows these factors, their ranges, and the values Xi for each class and each factor. The relative weights pi used in this application for the economic value of the item, the possibility of substituting it with a similar item when the part is not in stock, the lead time of supply, and the past average consumption are 0.4 (40%), 0.2 (20%), 0.1 (10%), and 0.3 (30%), respectively. All the items are ranked in increasing order of the criticality index, and for every item the specific criticality index is expressed as a percentage of the total amount of the indexes in order to obtain a cumulative percentage order. Hence, the items are classified in three categories A, B, and C, as reported in Fig. 11.27. The first 30% of the cumulative percentage forms a reduced set composed of 192 critical items representing approximately 73% in value of the whole set of spare
costs 1000
CH [€/year] CS [€/year] Ctot [€/year]
900 800
[€/year]
700 600 500 400
Saving
300 200 100 0 0
Fig. 11.32 Costs and saving for item DX345SS387
1
2
3
4
Optimal value
5
6
7
8
9
10
Spare parts in stock
432
parts in stock. This set is analyzed according to the well-known lumpiness parameters CV2 and ADI (see Fig. 11.28), leading to some interesting observations about the consumption forecast and optimal management. Forecasting of each item in the set was performed using both Poisson and binomial distributions, and the results were compared with the historical consumption in order to specify the best alternative. In brief, smooth items are best forecast by the Poisson formula, while for intermittent items the results from Poisson and binomial distributions are substantially good, and the bi-
11 Spare Parts Forecasting and Management
nomial distribution is slightly better for lumpy and erratic patterns. Figures 11.29–11.31 present a collection of different forecasting results for smooth, intermittent, and erratic items, respectively. The optimal management of the consumption forecast for the 192 items in the set is performed by the minimum total cost method as reported for a specific item in Table 11.17 and Fig. 11.32. Regarding the whole set, the optimal quantity is often smaller than the average number of spare parts in stock, with a consequent saving of several hundred thousand euros per year (approximately 37% of total cost).
12
Applications and Case Studies
Contents 12.1 Preventive Maintenance Strategy Applied to a Waste to Energy Plant . . . . . . . . . . . . . . . . . . . . 12.1.1 Motor System Reliability Evaluation . . . . . . . 12.1.2 Bucket Reliability Evaluation . . . . . . . . . . . . . 12.1.3 Motor System. Determination of Maintenance Costs . . . . . . . . . . . . . . . . . . . . 12.1.4 Time-Based Preventive Replacement for the Motor System . . . . . . . . . . . . . . . . . . . . 12.1.5 Time-Based Preventive Replacement for the Bucket Component . . . . . . . . . . . . . . . . 12.1.6 Time-Based Preventive Replacement with Durations Tp and Tf . . . . . . . . . . . . . . . . . 12.1.7 Downtime Minimization . . . . . . . . . . . . . . . . . 12.1.8 Monte Carlo Dynamic Analysis . . . . . . . . . . . 12.1.9 Monte Carlo Analysis of the System . . . . . . . 12.2 Reliability, Availability, and Maintainability Analysis in a Plastic Closures Production System for Beverages . . . . . . . . . . . . . . . . . . . . . . . . . 12.2.1 RBD construction . . . . . . . . . . . . . . . . . . . . . . . 12.2.2 Rotating Hydraulic Machine . . . . . . . . . . . . . . 12.2.3 Data Collection and Reliability Evaluation of Components . . . . . . . . . . . . . . . . . . . . . . . . . 12.2.4 Reliability Evaluation, Nonrepairable Components/Systems . . . . . . . . . . . . . . . . . . . . 12.2.5 Data on Repairs and Maintenance Strategies 12.2.6 Monte Carlo Analysis of the Repairable System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12.2.7 Alternative Scenarios and System Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . .
433 434 436 437 439 439 441 442 442 446
446 448 449 449 454 456 456 460
12.3 Conclusions and Call for New Contributions . . . . 462
Reliability, availability, maintainability, quality, safety, preventive maintenance, inspections, spare parts, failure modes and effects analysis (FMEA), failure mode, effects, and criticality analysis (FMECA), reliability block diagrams, fault tree analysis, Markov
chains, etc. are just a few measures, parameters, and tools introduced in the previous chapters. Several models and methods, together with supporting numerical examples and applications, have been illustrated. This chapter aims to present some applications for industrial case studies where the previously introduced models and methods have been effectively applied. We hope that readers will be able to contribute to a new edition of this book by proposing industrial case studies which we can analyze further.
12.1 Preventive Maintenance Strategy Applied to a Waste to Energy Plant With reference to the case study introduced in Sect. 8.8 dealing with the maintenance optimization of a waste to energy (WtE) plant, this application presents the results obtained by the introduction of the analytical models for planning of preventive maintenance actions, as discussed in Chap. 9. The production system is made up of two lines for the waste treatment, supplied by two bridge cranes acting as primary loaders of the furnace hoppers. Every bridge crane has a motor system and a bucket among its most important elements. Tables 12.1 and 12.2 collect the time records of failure for these elements during a period of time T and quantify the time to failure (ttf) values representing the input of a statistical analysis and reliability evaluation for the determination of the best preventive rule. In particular, considering the bridge crane 1, the motor system registered 15 failure events from January 2005 to March 2007, while the bucket had nine fail-
R. Manzini, A. Regattieri, H. Pham, E. Ferrari, Maintenance for Industrial Systems © Springer 2010
433
434
12 Applications and Case Studies
ures during the period from July 2005 to March 2007. Figure 12.1 presents the frequency distribution of the ttf for the motor system and the bucket of the bridge crane 1. The analytical models for the optimization of the replacements of components and systems are based on the determination of reliability measures such as the reliability function, hazard rate, and expected number of failures. For this purpose, it is possible to apply a parameter estimation of these measures using literature statistical distributions such as Weibull, exponential, and normal, or a nonparametric evaluation, e. g., based on the Kaplan–Meier method and on a confidence interval (see the theory and applications illustrated in Chaps. 5 and 6). The following section presents some of the most significative results achieved by the application of both evaluation approaches, for the motor system and the bucket components respectively.
12.1.1 Motor System Reliability Evaluation
t – TTF
R.t /
LCI limit
UCI limit
.t /
2 4 5 10 12 14 37 47 51 52 59 74 86 151 168
0.933 0.867 0.800 0.733 0.667 0.600 0.533 0.467 0.400 0.333 0.267 0.200 0.133 0.067 0.000
0.807 0.695 0.598 0.510 0.428 0.352 0.281 0.214 0.152 0.095 0.043 0.000 0.000 0.000 0.000
1.000 1.000 1.000 0.957 0.905 0.848 0.786 0.719 0.648 0.572 0.490 0.402 0.305 0.193 0.000
0.067 0.071 0.077 0.083 0.091 0.100 0.111 0.125 0.143 0.167 0.200 0.250 0.333 0.500 1.000
TTF time to failure, LCI lower confidence interval, UCI upper confidence interval
tion f .t/, survival function R.t/, and hazard rate .t/, assuming a Weibull distribution with shape parameter 0.9455 and scale parameter 50.21 days. Figure 12.4 presents the same reliability functions but assuming an exponential distribution for the historical observations, with a correspondent Anderson–Darling index equal to 1.020. The scale parameter is 51.467 days, i. e., the constant hazard rate is 0.194 day1 . Finally, Figs. 12.5 and 12.6 present the results obtained by a nonparametric reliability evaluation. In particular, the values of the survival function assuming a confidence interval of 95% (Fig. 12.5) and a hazard rate (Fig. 12.6) are reported in Table 12.1.
3
3
2
2
Frequency
Frequency
Figure 12.2 presents the well-known four-way probability plot for the ttf. This is a very useful tool to identify the best parametric statistical distribution function capable of fitting the historical observations of failure events. The values obtained for the Anderson–Darling index suggest the Weibull probability function is the best-fitting distribution for the available data. In particular, Fig. 12.3 presents the trend of the most important estimated functions, the probability density func-
Table 12.1 Motor system. Nonparametric estimation. Values of R.t / and .t /
1
0
1
0
0
20
40
60
80
100 120 140 160
ttf - motor system Fig. 12.1 Time to failure (ttf ) (days) distribution for two components of bridge crane 1
0
25
50
75
100
125
ttf - bucket
150
175
200
12.1 Preventive Maintenance Strategy Applied to a Waste to Energy Plant
435
Motor System. Four-way Probability Plot of ttf ML Estimates - Complete Data
Lognormal base e
99 95 90 80 70 60 50 40 30
99
Anderson–Darling
95
Percent
Percent
Weibull
20 10 5
Weibull 0.991
80 70 60 50 40 30 20
Lognormal base e 1.137 Exponential 1.020
10
3 2
5
1
Normal
1 1
10
1.484
100
1
10
Exponential
100
1000
Normal
99
99
98
95
97
Percent
Percent
95 90 80 70 60 50
80 70 60 50 40 30 20 10 5
30 10
1 0
100
200
0
100
200
Fig. 12.2 Motor system. Four-way probability plot of ttf. Minitab® Statistical Software. ML maximum likelihood
Motor System. Overview Plot of ttf. Weibull function ML Estimates - Complete Data
Probability Density Function
Weibull Probability
Percent
0.02
0.01
99 95 90 80 70 60 50 40 30
Shape Scale
0.9455 50.21
MTTF
51.503
Failure Censor
15 0
20
Goodness of Fit
10
AD*
5
0.991
1
0.00 0
50
100
150
200
1
250
10
Survival Function
100
Hazard Function
1.0
0.025
0.9
0.024
0.8
0.023 0.022
0.6 0.5
Rate
Probability
0.7
0.4
0.021 0.020
0.3 0.019
0.2
0.018
0.1 0.0
0.017 0
50
100
150
200
250
0
50
100
150
200
250
Fig. 12.3 Motor system. Parametric estimation. Weibull function. Minitab® Statistical Software. MTTF mean time to failure
436
12 Applications and Case Studies
Motor System. Overview Plot of ttf. Exponential function ML Estimates - Complete Data
Probability Density Function
Exponential Probability
0.02 99
Percent
95 0.01
100
MTTF
51.467
Failure Censor
15 0
Goodness of Fit AD*
70 60 50 40 30 20 10 5 1 0
1 51.467
90 80
0.00
Shape Scale
0
200
100
Survival Function
1.02
200
Hazard Function 0.0200
1.0 0.9 0.8
0.6 0.0195
0.5
Rate
Probability
0.7
0.4 0.3 0.2 0.1
0.0190
0.0 0
100
200
0
100
200
Fig. 12.4 Motor system. Parametric estimation. Exponential function. Minitab® Statistical Software
Nonparametric Survival Plot for ttf - motor system Kaplan-Meier Method - 95.0% CI Complete Data 1.0
MTTF
51.467
0.9
Median IQR
47.000 64.000
0.8
Probability
0.7 0.6 0.5 0.4 0.3
Fig. 12.5 Motor system. Nonparametric estimation. Survival function and confidence interval (CI). Minitab® Statistical Software. IQR interquartile range
0.2 0.1 0.0 0
12.1.2 Bucket Reliability Evaluation Figures 12.7–12.9 and Table 12.2 report the estimated values and trends obtained by the application of both
50
100
150
Time to Failure
parametric and nonparametric statistical evaluation analyses, as similarly performed for the most important reliability functions for the motor system reported in the previous section.
12.1 Preventive Maintenance Strategy Applied to a Waste to Energy Plant
437
Nonparametric Hazard Plot for ttf - motor system Empirical Hazard Function Complete Data 1.0
MTTF
51.467
0.9
Median IQR
47.000 64.000
0.8 0.7
Rate
0.6 0.5 0.4 0.3 0.2 0.1
Fig. 12.6 Motor system. Nonparametric estimation. Hazard function. Minitab® Statistical Software
0.0 0
50
100
150
Time to Failure
Bucket comp. Four-way Probability Plot of ttf ML Estimates - Complete Data
Lognormal base e
99 95 90 80 70 60 50 40 30 20
99
Anderson–Darling
95
Percent
Percent
Weibull
10 5
Weibull 1.313
80 70 60 50 40 30 20
Lognormal base e 1.379 Exponential 1.324
10
3 2
5
1
Normal
1 1
10
100
1.521 10
100
Exponential
1000
Normal
99
99
98
95
97
Percent
Percent
95 90 80 70 60 50
80 70 60 50 40 30 20 10 5
30 10
1 0
100
200
300
0
100
200
®
Fig. 12.7 Bucket component. Four-way probability plot of ttf. Minitab Statistical Software
12.1.3 Motor System. Determination of Maintenance Costs The motor system and the bucket of bridge crane 1 are both subject to corrective and preventive maintenance, mainly consisting of actions of replacement. The com-
ponent is assumed to be “as good as new” after the completion of the generic recovery action, whose duration is the time to repair (ttr). The mean value of ttr (MTTR) is about 8 h, equivalent to about 10 h=year of idle time for the WtE plant. The corresponding cost for the lost production of electricity is about US$ 2,103
438
12 Applications and Case Studies
Bucket comp. Overview Plot of ttf. Weibull function ML Estimates - Complete Data Weibull Probability
Probability Density Function
Percent
0.010
0.005
99 95 90 80 70 60 50 40 30 20
Shape Scale
1.0807 69.247
MTTF
67.218
Failure Censor
9 0
Goodness of Fit
10
AD*
5
1.313
1
0.000 0
100
200
1
300
Survival Function
10
100
Hazard Function 0.018
1.0 0.9
0.017
0.8 0.016
0.6
0.015
0.5
Rate
Probability
0.7
0.4 0.3
0.014 0.013
0.2 0.012
0.1 0.0
0.011 0
100
200
300
0
100
200
300
Bucket comp. Overview Plot of ttf. Esponential function ML Estimates - Complete Data
Probability Density Function
Exponential Probability
0.015
99
0.010
Percent
95
80
0.000
70 60 50 40 30 20 10 5 1 100
200
1 67.222
MTTF
67.222
Failure Censor
9 0
90
0.005
0
Shape Scale
Goodness of Fit AD*
0
300
Survival Function
100
200
1.324
300
Hazard Function 0.0154
1.0 0.9 0.8
0.6 0.0149
0.5
Rate
Fig. 12.8 Bucket component. Parametric estimation. Weibull compared with exponential. Minitab® Statistical Software
Probability
0.7
0.4 0.3 0.2 0.1
0.0144
0.0 0
100
per year, considering a cost per hour equal to US$ 0.21 per kilowatt-hour, while the cost for the not produced and distributed heat is about US$ 2,277 per year, considering a cost per hour equal to US$ 0.08 per kilowatthour. Consequently the related average cost of a failure is supposed to be about US$ 150.7 per failure. The average cost of a preventive action Cp on the components is about US$ 111.5 per action, made up of US$ 101.5 per action for man work and US$ 10
200
300
0
100
200
300
per action for materials and spare parts. The average global cost of a failure maintenance action Cf is about US$ 1,348.7 per action, made up of the following contributions: • • • •
Cost of failure: US$ 150.7 per failure; Cost of man work: US$ 548 per action; Cost of materials and spare parts: US$ 650 per year; Cost of emissions: US$ 0 per year.
12.1 Preventive Maintenance Strategy Applied to a Waste to Energy Plant
Nonparametric Analysis - Bucket component Kaplan-Meier Method - Complete Data
Kaplan-Meier Survival Function
Probability
1.0
0.5
0.0 0
100
200
Nonparametric Hazard Function
Rate
1.0
0.5
0.0 0
100
200
Fig. 12.9 Bucket component. Nonparametric estimation. Survival function and hazard function. Minitab® Statistical Software
Table 12.2 Bucket component. Nonparametric estimation. Values of R.t / and .t / t – TTF
R.t /
LCI limit
UCI limit
.t /
5 13 16 31 52 81 99 108 200
0.889 0.778 0.667 0.556 0.444 0.333 0.222 0.111 0.000
0.684 0.506 0.359 0.231 0.120 0.025 0.000 0.000 0.000
1.000 1.000 0.975 0.880 0.769 0.641 0.494 0.316 0.000
0.111 0.125 0.143 0.167 0.200 0.250 0.333 0.500 1.000
TTF time to failure, LCI lower confidence interval, UCI upper confidence interval
439
Figure 12.11 reports the values assumed by the expected unit cost of maintenance UEC.tp / by the application of Eq. 9.3 and the continuous dynamic analysis. In particular, the best value of interval tp is about 46.99 days. This value generates a UEC equal to US$ 22.464 per day, as the result of the expected total replacement cost per cycle of US$ 688.86 and the expected cycle length of about 30.7 days. The trends presented in Figs. 12.12 and 12.13 are similar to those illustrated and exemplified in Sect. 9.5 about the generic Weibull density function for the variable ttf (see, in particular, Fig. 9.10). Nevertheless, they are different in shape: in detail, their sawtooth shape is due to the introduction of historical observations of failure events, while the results of Sects. 12.1.1 and 12.1.2 are the output of a parametric evaluation by a failure probability function. The expected cost per unit time displayed in Fig. 12.13 was derived assuming a Weibull distribution for ttf. As previously demonstrated in Sect. 12.1.1, the distribution of ttf values is well modeled by a shape factor equal to 0.9455, corresponding to a nonaging component subject to early wear out, as discussed in Sect. 5.10.5. This value is very close to 1: as a consequence, an exponential function can justify the UEC values, reported in Fig. 12.13 and obtained by a best fitting Weibull analytical model assuming ˇ D 0:9455 and ˛ D 50:21 days. In this case, a preventive maintenance strategy is not suitable. These remarks seem to be in conflict with the previous best tp value (46.99 days), but Fig. 12.11 demonstrates that the minimal UEC is also obtained with a larger interval of time tp . In particular, the expected cost of replacement per unit time is about US$ 24.6 per day in tp D 160 days, while the Weibull assumption justifies a unit cost equal to US$ 26.4 per day in tp D 160 days.
12.1.4 Time-Based Preventive Replacement for the Motor System Figure 12.10 illustrates the logic diagram of the continuous dynamic model developed in Simulink® (MATLAB / to support the determination of the best tp value by the application of the time-based preventive replacement model illustrated in Sect. 9.5. This is a parametric model useful for identifying the best tp value for the generic distribution of ttf values and couples of (Cp , Cf ).
12.1.5 Time-Based Preventive Replacement for the Bucket Component This section provides the results obtained by the application of replacement model type I to the decisions regarding the maintenance activity for the bucket component, similarly to Sect. 12.1.4 for the motor system.
440
12 Applications and Case Studies 1 s
Int. R(t)
Clock
R(t) 1 Constant
Divide
Cp
Cf
F(t)
Expected cost & Expected length
111.5
1348.7
Expected cost
R(t) & F(t) UEC(tp)
Fig. 12.10 Motor system. Type I model. Simulink® , MATLAB®
simout
To Workspace
100 90
UEC(t p) [$/day]
80 70 60 50 40 30
Fig. 12.11 Motor system, nonparametric analysis. UEC.tp /
20
10
20
30
40
50
60
70
80 90 t p [day]
100
110
120
130
140
150
160
Fig. 12.12 Motor system, nonparametric analysis. Expected total replacement cost per cycle
Expected total replacement cost per cycle [$/cycle]
1400
1200
1000
800
600
400
200
0
0
10
20
30
40
50
60
70
80 90 t p [day]
100
110
120
130
140
150 160
12.1 Preventive Maintenance Strategy Applied to a Waste to Energy Plant
441
100 90
UEC(t p) [$/day]
80 70 60 50 40 30
Fig. 12.13 Motor system, parametric analysis. Weibull distribution, UEC.tp /
20 0
10
20
30
40
50
60
70
80 90 t p [day]
100
110
120
130
140
150
160
90 80
UEC(t p) [$/day]
70 60 50 40 30 20
Fig. 12.14 UEC.tp / for the bucket component, nonparametric analysis
10 0
10
20
30
40
In particular, Fig. 12.14 reports the trend of the UEC values assuming a nonparametric distribution of ttf, as reported in Fig. 12.9. The minimum value of UEC.tp /, for tp D 80:99 days, is US$ 16.3 per day. The expected total replacement cost per cycle is US$ 798.83 per cycle and the expected cycle length is about 48.99 days. Figure 12.15 shows the outcomes of the application of the type I model to the Weibull distribution function, having scale value ˛ D 69:247 days and shape factor ˇ D 1:0807, reported in Fig. 12.8. The best tp value is about 98 days and the minimum UEC.tp / is US$ 19.94 per day.
50
60
70
80
90
100 110 120 130 140 150 160 170 180 190 200 t p [day]
12.1.6 Time-Based Preventive Replacement with Durations Tp and Tf Figure 12.16 illustrates the results obtained by the application of the analytical model for the determination of the best interval of time tp , as illustrated in Sect. 9.6, for the preventive replacement of the bucket component in bridge crane 1. The duration of a preventive replacement is assumed to be equal to 0.5 days per replacement, while the duration of a replacement in the case of failure is 1 day per replacement. The min-
442
12 Applications and Case Studies 22
UEC(tp) [$/day]
21.5
21
20.5
20
Fig. 12.15 Bucket component, parametric analysis. Weibull distribution. UEC.tp /
19.5
30
40
50
60
70
80
90
100 110 120 130 140 150 160 170 180 190 200 t p [day]
Type I vs Type I with duration of replacement (Tp = 0.5, Tf = 1)
40
35 Type I with duration of replacement
UEC(t p) [$/day]
Type I
Fig. 12.16 Bucket component, nonparametric analysis. Type I model versus type I model with duration of replacement
30
25
20
15 10
20
30
40
imum UEC based on the duration of replacement is about US$ 16.05 per day, when tp D 80:99 days. Figure 12.16 compares in detail the UEC values obtained by applying the type I model and the type I model with the duration of replacement.
12.1.7 Downtime Minimization With reference to the bucket component of bridge crane 1, Fig. 12.17 reports the values of the downtime obtained by the application of the model illustrated in Sect. 9.9 (see Eq. 9.42). From 0 to 200 days, there is not a tp value corresponding to a minimum for the downtime.
50
60
70
80
90
100 110 120 130 140 150 160 170 180 190 200 t p [day]
12.1.8 Monte Carlo Dynamic Analysis This section deals with the most important results obtained by the application of the Monte Carlo dynamic simulation. First of all, each component has to be considered separately and subsequently, as discussed in Sect. 12.1.9, it is possible to analyze the failure and repair behavior for each component in detail. Table 12.3 and Figs. 12.18 and 12.19 refer to the motor system. The following scenarios have been simulated with a period of time of 3,650 days, i. e., 10 years: • Corrective maintenance strategy. The component is subject to corrective maintenance and not to preventative maintenance.
12.1 Preventive Maintenance Strategy Applied to a Waste to Energy Plant
443 Downtime Minimization
25
Dow DT [%]
20
15
10
5
Fig. 12.17 Bucket component, nonparametric analysis. Downtime (DT) minimization
0
0
10
20
30
40
50
60
70
80
90
100 110 120 130 140 150 160 170 180 190 200 t p [day]
Table 12.3 Motor system. Multiscenario analysis. q D 1 Motor system (1,000 rep. – 3,650 days)
CM
Mean availability (all events) Point availability (all events) at 3,650 MTTFF Uptime CM downtime PM downtime Total downtime Number of failures Number of CMs Number of PMs Total events CMs and PMs Total costs
0.981 0.986 54.062 3,580.510 69.491 0.000 69.491 69.498 69.498 0.000 69.498 93,731.953
CM and PM (tp D 46:99)
CM and PM (tp D 160)
0.974 0.974 48.927 3,555.128 71.884 22.988 94.872 71.892 71.892 45.980 117.872 102,087.510
0.980 0.970 48.840 3,578.573 69.600 1.827 71.427 69.612 69.612 3.654 73.266 94,293.125
CM corrective maintenance, PM preventative maintenance, rep. replacements, MTTFF mean time to first failure
Motor system - System Failures (t)
• Corrective maintenance and preventive maintenance (tp D 46:99 days) strategy. The component is subject to corrective maintenance upon failure, and to preventative maintenance adopting the time-based replacement policy with tp D 46:99 days, in accordance with the optimization analysis conducted in Sect. 12.1.4. • Corrective maintenance and preventive maintenance (tp D 160 days) strategy. The component is subject to corrective maintenance and to preventative maintenance adopting the time-based replacement policy with tp D 160 days, in accordance with the analysis conducted in Sect. 12.1.4.
80 70 60 50 40 30
CM CM & PM (tp=46.99)
20
CM & PM (tp=160)
10 0 0
500
1000
1500
2000
2500
3000
3500
Fig. 12.18 Motor system. Number of failures, period 10 years. q D 1. CM corrective maintenance, PM preventative maintenance
All these scenarios are based on the “as good as new” assumption at the end of the generic replacement action, i. e., the adopted restoration factor is q D 1 after
444
12 Applications and Case Studies
Motor system - Availability A (t) 1
0.99
0.98 CM
0.97
CM & PM (tp=46.99) CM & PM (tp=160) 0.96
0.95
Fig. 12.19 Motor system. Point availabilityA.t /, period 10 years. q D 1
0.94 0
500
1000
both corrective and preventive actions. The assumed repair/restoration times are Tf D MTTRCM D 1 day and Tp D MTTRPM D 0:5 day. The first approach based on the corrective maintenance strategy has the lowest maintenance costs. The item is not an aging component, and the preventative maintenance strategy is not convenient: it is more affordable to act in the case of failure, even if the cost of corrective replacement is 1,348.7 unit of cost (u. c.) per action compared with 111.5 u. c. per action in the case of a preventative maintenance action. With reference to a finite number of historical observations, the best preventive replacement time tp seems to be equal to 46.99 days. Moreover, the number of corrective maintenance actions, about 69.49 events, is the same moving from the first to the third scenario, while the number of the preventive maintenance actions ranges from 0 to about 4 events. The number of failures slightly decreases instead, as shown in Fig. 12.18. This is behavior typical of the components subject to random failures, i. e., ageless, or without memory, items. The trend of the point availability A.t/ in the simulated hypothesis is reported in Fig. 12.19, where the greatest values are referred to all the components subject to corrective maintenance, and not to preventative maintenance. Similar remarks are made for the bucket component in Table 12.4 and Figs. 12.20 and 12.21. Different scenarios have been simulated. The period T is equal to 10 years:1 1
The restoration factor q is adopted equal to 1. Other scenarios illustrated below are based on different assumptions.
1500
2000
2500
3000
3500
Bucket comp. System Failures (t) 60 55 50 45 40 35 30 25 20
CM
15
CM & PM (tp=98)
10
CM & PM (tp=30)
5 0 0
500
1000
1500
2000
2500
3000
3500
Fig. 12.20 Bucket component. Number of failures, period 10 years. q D 1
• Corrective maintenance strategy. The component is subject to corrective maintenance and not to preventative maintenance; • Corrective maintenance and preventative maintenance (tp D 98 days) strategy. The component is subject to corrective maintenance, upon failure, and to preventative maintenance adopting the timebased replacement policy with tp D 98 days, in accordance with the optimization analysis conducted in Sect. 12.1.5; • Corrective maintenance and preventative maintenance (tp D 30 days) strategy. The component is subject to corrective maintenance, upon failure, and to preventative maintenance adopting the timebased replacement policy with tp is assumed to be equal to 30 days.
12.1 Preventive Maintenance Strategy Applied to a Waste to Energy Plant
445
Table 12.4 Bucket component. Multiscenario analysis. q D 1 CM
CM and PM (tp D 98)
CM and PM (tp D 30)
0.985 0.980 64.167 3,596.523 53.477 0.000 53.477 53.488 53.488 0.000 53.488 72,139.266
0.984 0.983 70.182 3,590.543 51.624 7.833 59.457 51.631 51.631 15.666 67.297 71,381.489
0.974 0.967 73.475 3,555.128 47.338 47.534 94.872 47.346 47.346 95.075 142.421 74,456.413
Bucket comp. (1,000 rep. – 3,650 days) Mean availability (all events) Point availability (all events) at 3,650 MTTFF Uptime CM downtime PM downtime Total downtime Number of failures Number of CMs Number of PMs Total events CMs and PMs Total costs
Bucket comp. Availability A (t) 1
0.99
CM
0.98
CM & PM (tp=98) CM & PM (tp=30) 0.97
Fig. 12.21 Bucket component. Point availability A.t /, period 10 years. q D 1
0.96 0
500
In this case the best maintenance strategy corresponds to the second scenario, because the shape factor of the Weibull function is greater than 1. With reference to the number of failures, the third scenario has the best performance, as pointed out in Fig. 12.20. This strategy is also based on 95 preventative maintenance actions compared with 15.7 for the second scenario, and the total downtime is about 95 days compared with 59 days (37:3%) for the second scenario and 53 days (43:63%) for the first scenario. Figure 12.22 reports the comparison of the total maintenance cost for the three scenarios, assuming alternatively the restoration factor q is equal to 1 and q D 0:5 in corrective maintenance and preventative maintenance. The maintenance cost is always higher for q D 0:5 because the aging component is not completely restored: moving from the first to the third scenario, the percentage increment is C8:07, C4:49, and C2:77%, respectively.
1000
1500
2000
2500
3000
3500
Bucket comp. Total cost (€) 78000 77000 76000 75000 74000 73000 72000
CM - q=1
71000
CM - q=0.5
70000
CM & PM (tp=98) - q=1
69000
CM & PM (tp=98) - q=0.5
68000 1 CM - q=1
72139
CM - q=0.5
77964
CM & PM (tp=98) - q=1
71381
CM & PM (tp=98) - q=0.5
74584
CM & PM (tp=30) - q=1
74456
CM & PM (tp=30) - q=0.5
76522
CM & PM (tp=30) - q=1 CM & PM (tp=30) - q=0.5
Fig. 12.22 Bucket component. Total maintenance cost (period 10 years) q D 1 compared with q D 0:5
Figures 12.23 and 12.24 compare severally corrective maintenance downtimes (days) and preventative maintenance downtimes for these six simulated operating scenarios.
446
12 Applications and Case Studies
CM Downme [days] 60
Fig. 12.25 Reliability block diagram. System with two basic components
50 40
Motor system
Bucket
30 CM - q=1
20
CM - q=0.5
10
CM & PM (tp=98) - q=1 CM & PM (tp=98) - q=0.5
0 1 CM - q=1
53.477
CM - q=0.5
57.7989
CM & PM (tp=98) - q=1
51.6244
CM & PM (tp=98) - q=0.5
53.4166
CM & PM (tp=30) - q=1
47.3378
CM & PM (tp=30) - q=0.5
48.0118
CM & PM (tp=30) - q=1 CM & PM (tp=30) - q=0.5
Fig. 12.23 Bucket component. CM downtime (period 10 years) q D 1 versus q D 0:5
PM Downme days 60 50 40 30 CM - q=1 20
CM - q=0.5
10
CM & PM (tp=98) - q=1 CM & PM (tp=98) - q=0.5
0 1
CM & PM (tp=30) - q=1
CM - q=1
0
CM & PM (tp=30) - q=0.5
CM - q=0.5
0
CM & PM (tp=98) - q=1
7.8326
CM & PM (tp=98) - q=0.5
11.3323
CM & PM (tp=30) - q=1
47.5342
CM & PM (tp=30) - q=0.5
52.7175
Fig. 12.24 Bucket component. PM downtime (period 10 years) q D 1 compared with q D 0:5
12.1.9 Monte Carlo Analysis of the System This section deals with the Monte Carlo analysis of a system made up of two components as in Fig. 12.25. It is possible to design several combinations of maintenance strategies to apply to the two components. In detail, the following scenarios, or systems, are considered: 1. System 1. The motor system and the bucket are both subject to corrective maintenance and not to preventative maintenance, q D 1. 2. System 2. In addition to the scenario for system 1, the bucket is subject to preventative maintenance with tp D 98 days with a related restoration factor q(PM) equal to 1 (as good as new replacement).
3. System 3. This differs from system 2 because the restoration factor in the case of corrective maintenance, q(CM), is equal to 0.5. 4. System 4. This differs from system 3 because the restoration factor of the motor system is equal to 1 and not to 0.5. Table 12.5 presents the results obtained by the simulation analysis of these scenarios. System 1 is the best performer regarding the total events preventative maintenance actions + corrective maintenance actions and total downtime because the less onerous preventive actions are not admissible. Moving to system 2 where these actions are applied on the bucket, i. e., on an aging component, the total costs range from US$ 163,348 to US$ 162,841. In the case of improper application of corrective maintenance, i. e., restoration factor q(CM) < 1, the expected cost for system 3 is lower. This astonishing result must be compared with the results for system 4, where q(CM) D 1 for the motor system and q(CM) D 0:5 for the bucket, and the expected cost of about US$ 165,000 is the greatest. In fact, if it is possible to choose between q D 1 and q < 1 for the motor system, i. e., a nonaging item, the first option, q D 1, is not convenient. This is a case where the “as good as new” hypothesis does not perform better than an incomplete restoration action. In conclusion, the best combination can be summarized as follows: only corrective maintenance with q < 1 for the motor system; corrective maintenance with q D 1 and preventative maintenance with tp D 98 and q D 1 for the bucket.
12.2 Reliability, Availability, and Maintainability Analysis in a Plastic Closures Production System for Beverages The aim of this section is to illustrate how a reliability, availability, and maintainability (RAM) analysis can result in cost reduction and productivity improvement.
12.2 Reliability, Availability, and Maintainability Analysis in a Plastic Closures Production System for Beverages
447
Table 12.5 System performance. Multiscenario analysis, q D 1 versus q D 0:5 System configuration
System 1
System 2
System 3
System 4
CM on motor system (Y/N) q(CM) on motor system CM on bucket (Y/N) q(CM) on bucket PM on motor system (Y/N) PM_tp on motor system (day) q(CM) on motor system PM on bucket (Y/N) PM_tp on bucket (day) q(CM) on bucket
Y 1 Y 1 N – – N – –
Y 1 Y 1 N – – Y 98 1
Y 0.5 Y 0.5 N – – Y 98 1
Y 1 Y 0.5 N – – Y 98 1
Mean availability (all events) Expected number of failures MTTFF Uptime CM downtime PM downtime Total downtime Number of failures Number of CMs Number of PMs Total events PMs and CMs Total costs
0.967 121.115 30.175 3,528.905 121.095 0.000 121.095 121.115 121.115 0.000 121.115 163,348
0.965 119.474 26.697 3,522.890 119.457 7.653 127.110 119.474 119.474 15.306 134.780 162,841
0.965 116.832 29.799 3,522.105 116.816 11.079 127.895 116.832 116.832 22.159 138.991 160,042
0.964 120.630 30.127 3,518.283 120.616 11.102 131.717 120.630 120.630 22.203 142.833 165,169
This second case study applies the reliability and availability evaluation analysis and the maintenance planning models to a complex mechanical system, illustrated in Fig. 12.26. This system is manufactured by an Italian company operating in the beverage and packaging sector for the production of plastic closures, such as the bottle top in Fig. 12.27. This production line is essentially made up of three main pieces of equipment: the compression-molding group, the folding (processing the corrugated effect on the cap to allow a proper “capping”) and scoring group, and a lining group for the insertion of a polyethylene-based liner through the cap. The analysis is focused on the first functional group, made up of a rotating hydraulic machine for the molding of plastic closures. The caps are made of a plastic compound (highdensity polyethylene, polypropylene) and their manufacturing process schedules several tasks such as extrusion, metering, pelleting, insertion, and molding. The core of the compression-molding group is a rotating carousel, represented in Fig. 12.28, driving the whole manufacturing process; Fig. 12.29 details the molding task identified by number 3 in Fig. 12.28. The compression-molding process has a high level of quality and repeatability, i. e., a small deviation from the quality standards.
Folding and scoring group Lining group
Compression molding group
Fig. 12.26 Rotating and compression-molding system
Fig. 12.27 X-ray picture of a cap for water
448
12 Applications and Case Studies
icy rules, spare parts numbers, maintenance action costs, availability of maintenance crews, etc.), and system optimization.
1 2 3 4 5 6
Placement Cavity lifting Molding Cooling Mold opening Ejection
Melt Pellet Caps
Fig. 12.28 Rotating compression-molding manufacturing process
A Punch B Stripper C Melt feeding D Mold E Pellet portion
Fig. 12.29 Compression-molding task
The application of the RAM analysis follows the decisional steps theoretically illustrated and discussed in the previous chapters: • production system analysis and reliability block diagram (RBD) construction; • failure modes identification and analysis by the application of FMEA or FMECA techniques; • data collection: failure and repair times for all basic components or “blocks” and failure modes of the production system, maintenance action costs (corrective, preventive, inspection, etc.), spare parts availability (e. g., storage costs and fulfillment costs), availability of crews, etc.; • evaluation of system reliability parameters assuming the hypothesis of nonrepairable components/systems; • system availability evaluation by dynamic simulation assuming the hypothesis of repairable components/systems; • evaluation of maintenance costs, multiscenario comparison assuming different configurations and parameterizations of the system (maintenance pol-
Some of these steps and the related results are presented in the following sections.
12.2.1 RBD construction The RBD construction is obtained by the analysis of the production system, and/or a concurrent and advanced design of the system, which simultaneously involves the following group of technicians: • research and development department with responsibility for the development of new products; • product’s engineers and designers with responsibility for the products; • production managers with the responsibility for the production system in accordance with production plans; • quality managers with responsibility for the quality of products, processes, and the whole production system; • logistics manager with responsibility for the production system’s spare parts; • maintenance manager with responsibility for the availability of the equipment (components and production system). The construction of the RBD is influenced by the results of a FMEA and FMECA conducted on the basic components/parts of the production system. In particular, these analyses support the identification of the failure modes of the basic components that are critical for the system function. At the same time, it is possible to evaluate the whole system, both from a qualitative and from a quantitative point of view, in order to point out its most important failure modes by an ad hoc fault tree analysis. The RBD model is a powerful tool for the accurate evaluation of a set of reliability parameters involving the whole production system, and it is very useful for the planning and organizing of costsaving maintenance actions on components and subsystems. The following case study does not explicitly describe the identification of the failure modes affecting the RBD construction.
12.2 Reliability, Availability, and Maintainability Analysis in a Plastic Closures Production System for Beverages
12.2.2 Rotating Hydraulic Machine
449
strength and life of products that very quickly wear out over a certain age, is given by
The RBD concerning the whole production line of plastic closures, showed in Fig. 12.26, is made up of thousands of basic reliability blocks. The focus is on the rotating hydraulic machine, the subject of the application of the RAM analysis discussed in the current chapter, having the operational diagram shown in Fig. 12.30. This functional group can be modeled by 259 basic blocks, as properly illustrated in the RBD presented in Fig. 12.31, and is made up of several pieces of equipment, e. g., thrust, driving belts, cams, and O-rings, grouped in operational sets named A, B, C, D, and E. Set A is made up of two components, set C is made up of three components, set D is a 10=12 (k=n) redundant system, and, finally, set E is made up of 48 units, each of five items.
f .x/ D
1 x e x ˇ e ˇ e ; ˇ
(12.1)
where is the location parameter and ˇ is the scale parameter. The Gumbel mean or MTTF is C 0:5772ˇ, where 0.5772 is the Euler’s constant. The standard p . The Gumbel cumulative distribution deviation is ˇ 6 function is: x F .x/ D e e : (12.2) Figure 12.34 presents the histogram of the distribution of failures. The very little number of available data gives the statistical analysis a significant level of uncertainty. Obviously, the greater the number of available data on failures, the more accurate the evaluation of the estimated reliability parameters. For 200 available ttf values, Figs. 12.35–12.37 present the result of the parametric probability plot analysis conducted by Minitab® Statistical Software for different distributions of ttf. Some of these distributions were introduced in Chap. 5, while the others, such as loglogistic, three-parameter Weibull, and three-parameter exponential, are illustrated by several literature references on statistics science and applications. The result of the goodness-of-fit analysis by Minitab® Statistical Software, based both on the Anderson–Darling index and the correlation coefficient, is reported in Table 12.6. With reference to the correlation coefficient, in spite of the good performance of the three-parameter Weibull distribution, the adopted probability distribution of the ttf is the Gumbel distribution, whose probability plot is illustrated in Fig. 12.38, having a very good correlation coefficient too but a better Anderson– Darling index.
12.2.3 Data Collection and Reliability Evaluation of Components This section exemplifies the reliability evaluation analysis conducted for a specific component E.4 of the production system. In particular, Fig. 12.32 illustrates five values of ttf, 50, 4,000, 4,200, 4,950, and 5;060 h, collected on five different applications of this component. In the case of a few historical values of the random ttf, as in this situation, the best-fit parametric analysis and evaluation suggests adopting the so-called Gumbel distribution, whose related probability plot is illustrated in Fig. 12.33. The probability density function of the Gumbel distribution, also called “smallest extreme value distribution,” used in general to model the
D.1 D.2 A.1
A.2 A
B
C.1
C.2
C.3
E.1
E.2
C
E.3 E (1th)
D.1 D k/n parallel (10/12)
Fig. 12.30 Operational diagram of the rotating hydraulic machine
E.4
E.5
E.1
E.2
E.3 E (48th)
E.4
E.5
Driv.Belt
Cam
E.2
A.2
Distr.
E.3
Water o-ring
B
Oil o-ring
E.4
E.5
C.1
C.2
Extruder12
Extruder11
Extruder10
Extruder9
Extruder8
Extruder7
C.3
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
PUNCH
Molding unit (1–48)
Pelleting and inserting system (1–12)
UP flange o-ring
UP flange o-ring
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
UP flange o-ring
UP flange o-ring
UP flange (stripper)
Transmission pallet placement
k/n, 10/12
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange o-ring
Fig. 12.31 Reliability block diagram of the rotating hydraulic machine, 259 blocks
E.1
D
A.1
Legend
Thrust
Extruder6
Extruder5
Extruder4
Extruder3
Extruder2
Extruder1
UP flange (stripper)
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange o-ring
UP flange (stripper)
UP flange (stripper)
UP flange (stripper)
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
PUNCH
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
DOWN DOWN flange flange (mould) o-ring
450 12 Applications and Case Studies
12.2 Reliability, Availability, and Maintainability Analysis in a Plastic Closures Production System for Beverages
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
F/S Timeline FS Timeline Failure
0.000
1200.000
2400.000
3600.000
4800.000
6000.000
Time, (t) μ=4412,8325, σ=1069,6069
Fig. 12.32 Failure timeline analysis, five applications. ReliaSoft® software
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
Probability - Gumbel
99.000
Probability-Gumbel
Unreliability, F(t)
Data 1 Gumbel-2P MLE SRM K-M FM F=5/S=0 Data Points Probability Line
50.000
10.000 30.000
1224.000
2418.000
3612.000
4806.000
6000.000
Time, (t) μ=4412,8325, σ=1069,6069
Fig. 12.33 Probability plot, Gumbel distribution five data points. ReliaSoft® software
451
452
12 Applications and Case Studies
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
F/S Histogram
4.000E-4
FS Histogram Pdf Line Failures
3.200E-4
Value
2.400E-4
1.600E-4
8.000E-5
0.000
0-501
501-1002
1002-1503 1503-2004 2004-2505 2505-3006 3006-3507 3507-4008 4008-4509 4509-5010 5010-5511 5511-6012
Period μ=4412.8325, σ=1069.6069
Fig. 12.34 Failure histogram, Gumbel distribution five data points. ReliaSoft® software
Probability Plot for ttf - comp. E.4 LSXY Estimates-Complete Data Correlation Coefficient Weibull 0.945 Lognormal 0.857 Exponential * Loglogistic 0.872
Lognorm al
99.9
99.9
90
99
50
90 Percent
Percent
Weibull
10 1
50 10 1
0.1 100
1000 ttf
0.1 100
10000
E xponential
1000 ttf
10000
Loglogistic
99.9
99.9
90
99 Percent
Percent
50 10
90 50 10
1 1 0.1
1
10
100 ttf
1000
10000
0.1 100
1000
10000 ttf
Fig. 12.35 Probability plot for ttf, part 1. Minitab® Statistical Software
12.2 Reliability, Availability, and Maintainability Analysis in a Plastic Closures Production System for Beverages
Probability Plot for ttf - comp. E.4 LSXY Estimates-Complete Data 3-Param et er Weibull 99.9
90
99
50
90 Percent
Percent
Correlation Coefficient 3-Parameter Weibull 0.996 3-Parameter Lognormal 0.985 2-Parameter Exponential * 3-Parameter Loglogistic 0.983
3-Param et er Lognorm al
99.9
10
50 10
1
1 0.1
0.1 8000
10000 12000 ttf - Threshold
14000
188000
2-Param et er Exponent ial
190000 192000 194000 ttf - Threshold 3-Param et er Loglogis t ic
99.9
99.9 90
99 Percent
Percent
50 10 1
90 50 10 1
0.1
01
0.
10
0.
00
1.
.
10
00
00
0.
10
185000
0.
.
00
10
0.1
00
00 00
10
187500 190000 ttf - Threshold
192500
ttf - Threshold
Fig. 12.36 Probability plot for ttf, part 2. Minitab® Statistical Software
Probability Plot for ttf - comp. E.4 LSXY Estimates-Complete Data Sm alles t Ext rem e V alue 99.9
90
99
50
90 Percent
Percent
Correlation Coefficient Smallest Extreme Value 0.993 Normal 0.985 Logistic 0.983
Norm al
99.9
10
50 10
1
1 0.1
0.1 0
4000
8000
ttf
0
2500
5000 ttf
Logistic 99.9
Percent
99 90 50 10 1 0.1 0
2500
5000
7500
ttf
Fig. 12.37 Probability plot for ttf, part 3. Minitab® Statistical Software
7500
453
454
12 Applications and Case Studies
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
Probability - Gumbel
99.900
Probability-Gumbel Data 1 Gumbel-2P RRX SRM MED FM F=200/S=0 Data Points Probability Line
Unreliability, F(t)
50.000
10.000
5.000
1.000
0.500
0.100 100.000
1480.000
2860.000
4240.000
5620.000
7000.000
Time, (t) μ=4391.6311, σ=987.9175, ρ=0.9927
Fig. 12.38 Probability plot for ttf, Gumbel distribution. ReliaSoft® software Table 12.6 Goodness-of-fit analysis conducted with Minitab® Statistical Software Distribution Weibull Lognormal Exponential Loglogistic 3-Parameter Weibull 3-Parameter lognormal 2-Parameter exponential 3-Parameter loglogistic Smallest extreme value Normal Logistic
Anderson–Darling index
Correlation coefficient
5.269 10.924 91.066 9.718 0.386 1.537 85.054 1.532 0.869 1.487 1.489
0.945 0.857
Figure 12.39 presents the trend of reliability, unreliability, failure distribution, and conditional failure rate assuming a Gumbel distribution ( D 4;391, D 988). Figures 12.40 and 12.41 present the reliability, unreliability, and failure rate obtained by the application of a nonparametric distribution analysis deriving from the Kaplan–Meier method with confidence interval equal to 95%. Table 12.7 summarizes the parametric probability distributions of the random ttf for the components of the system. These values are very important for one to be able to conduct an evaluation analysis on the whole
0.872 0.996 0.985 0.983 0.993 0.985 0.983
production system and to plan and optimize the maintenance actions.
12.2.4 Reliability Evaluation, Nonrepairable Components/Systems Once data collection has been completed and the failure behaviors of the parts and components of the system have been modeled, it is possible to evaluate the reliability of the whole system as a combination
12.2 Reliability, Availability, and Maintainability Analysis in a Plastic Closures Production System for Beverages ReliaSoft Weibull++ 7 - www.ReliaSoft.com
455
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
Reliability vs Time Plot
Unreliability vs Time Plot
1.000
1.000
Reliability
Unreliability
Data 1 Gumbel-2P RRX SRM MED FM F=200/S=0 Data Points Reliability Line
Data 1 Gumbel-2P RRX SRM MED FM F=200/S=0 Data Points Unreliability Line 0.800
Unreliability, F(t)=1-R(t)
Reliability, R(t)=1-F(t)
0.800
0.600
0.400
0.200
0.000 0.000
0.600
0.400
0.200
1400.000
2800.000
4200.000
5600.000
0.000 0.000
7000.000
1400.000
2800.000
Time, (t)
4200.000
5600.000
7000.000
Time, (t) μ=4391.6311, σ=987.9175, ρ=0.9927
μ=4391.6311, σ=987.9175, ρ=0.9927
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
ReliaSoft Weibull++ 7 - www.ReliaSoft.com
Probability Density Function
Failure Rate vs Time Plot
4.000E-4
0.060
Pdf
Failure Rate
Data 1 Gumbel-2P RRX SRM MED FM F=200/S=0 Pdf Line
Data 1 Gumbel-2P RRX SRM MED FM F=200/S=0 Failure Rate Line 0.048
Failure Rate, f(t)/R(t)
3.200E-4
f(t)
2.400E-4
1.600E-4
8.000E-5
0.000 0.000
0.036
0.024
0.012
2000.000
4000.000
6000.000
8000.000
0.000 400.000
10000.000
2120.000
Time, (t)
3840.000
5560.000
7280.000
9000.000
Time, (t) μ=4391.6311, σ=987.9175, ρ=0.9927
μ=4391.6311, σ=987.9175, ρ=0.9927
Fig. 12.39 R.t /, F .t /, f .t /, and .t /, Gumbel distribution. ReliaSoft® software Table 12.7 Probability distribution of ttf. Parts system Part
Probability distribution
A.1 A.2 B C.1 C.2 C.3 D (1–48) E.1 E.2 E.3 E.4 E.5
normal normal normal normal 3-parameters Weibull normal normal normal lognormal normal normal Gumbel
4,000 1,650 4,000 1,436
1,600 660 1,600 574.4
4,000 2,810 4,000 7.877 2,650 4,600 4,391
1,600 1,124 1,600 2.452 1,060 1,840 988
of different parts and components, in accordance with the elementary configurations illustrated in Chap. 6. Figure 12.42 reports the main results of the reliability evaluation for the system, i. e., the determination of the failure probability function FS .t/ and the reliability function RS .t/, very significant for prediction of the first failure in the case of nonrepairable components/systems.
ˇ
0.7612
1,531.4
1,212.65
The values of the reliability of the system RS .t/, where t is in hours, are very low: e. g., RS .10/ D 0:1301, RS .50/ D 0:0151, RS .100/ D 0:0018. The MTTF is about 4:185 h. This is the mean time to the first failure and in the presence of repairable components it is not a useful measure of system performance.
12 Applications and Case Studies Survival Plot for ttf
Cumulative Failure Plot for ttf
Kaplan-Meier Method - 95% CI Complete Data
Kaplan-Meier Method - 95% CI Complete Data
100
100
80
80
60
Table of M ean M edian IQ R
40
Statistics 3826.23 3980.76 1468.34
20
Percent
Percent
456
60
Table of M ean M edian IQ R
40
Statistics 3826.23 3980.76 1468.34
20
0
0 0
1000
2000
3000
4000
5000
6000
7000
ttf
0
1000
2000
3000
4000
5000
6000
7000
ttf
Fig. 12.40 Nonparametric distribution analysis, R.t / and F .t /
Hazard Plot for ttf Empirical Hazard Function Complete Data 1.0
Table of Mean Median IQR
0.8
Statistics 3826.23 3980.76 1468.34
Rate
0.6
0.4
0.2
0.0 0
1000
2000
3000
4000
5000
6000
7000
ttf
Fig. 12.41 Nonparametric distribution analysis, .t /
12.2.5 Data on Repairs and Maintenance Strategies In order to evaluate the availability of the components/system it is necessary to know the repair stochastic behavior that can be modeled by evaluating the parametric distribution of the generic random variable ttr. In this case study, the repair process is characterized by the parameterization, reported in Table 12.8, depending on the maintenance strategy adopted (corrective, preventive, or inspection). Table 12.8 also reports a lot of data grouped in three different sections: 1. Corrective maintenance data and parameters: ttr, cost per action,2 and restoration factor. The ttr for the generic component is assumed to be constant, but in general it is randomly distributed in accor2
In this case study, all costs are reported in a generic unit of measure, unit of cost, because the real values cannot be revealed.
dance with a generic probability distribution, such as Weibull or lognormal. The restoration factor specifies the level to which the block is restored after the maintenance action. In particular, type I assumes that the repair removes only the damage since the last repair, while type II, represented as II.q/ in Table 12.8, assumes that the generic repair is capable of removing any damage accumulated up to failure. As is well known, q D 1 means adopting the “as good as new” hypothesis. 2. Preventive maintenance data and parameters: replacement time tp in accordance with the type I analytical model illustrated in Chap. 9, duration of the preventive action assumed to be constant, cost per action and restoration factor. 3. Inspection maintenance (data and parameters: fixed interval time based on item age, duration of the inspection action, restoration factor assumed to be type II and q D 0.
12.2.6 Monte Carlo Analysis of the Repairable System The outcomes of the evaluation analysis for the repairable system are illustrated in the following figures. Scenario 1 represents the as-is configuration of the production system and maintenance system that the company needed to optimize in order to minimize the global production cost. This cost includes
195.6400 0.7000 8.6800 182.9200 0.9900 0.4900 1.3800 1.7000 0.0360 5.8000 2.5400 0.0010
II(1) II(1) II(1) II(1) II(1) II(1) II(1) II(1) II(1) II(1) II(1) II(1)
Type I/II
– 2,160 2,160 – 4,320 4,320 360 – 12,960 – – 12,960
Replacement tp (h)
A.1 A.2 B C.1 C.2 C.3 D (1–48) E.1 E.2 E.3 E.4 E.5
Part
– 1,000 1,000 – – – – – – 1,440 – –
Insp. fixed time interval – item age (h)
4,000 2,160 2,160 1,436 4,320 4,320 360 – 12,960 – – 12,960
5 0.167 1 1 4 4 0.5 – 0.3 – – 0.3
195.6400 0.7000 8.6800 182.9200 0.9900 0.4900 1.3800 – 0.0360 – – 0.0010
II(1) II(1) II(1) II(1) II(1) II(1) II(1) – II(1) – – II(1)
4,000 2,160 2,160 1,436 4,320 4,320 360 – 12,960 – – 12,960
– 0.167 0.033 – – – – – – 0.167 – –
20 0.167 1 5 4 4 0.5 – 0.3 – – 0.3
– – – – – – – – –
–
Cost per action (u. c./action)
195.6400 0.7000 8.6800 182.9200 0.9900 0.4900 1.3800 – 0.0360 – – 0.0010
Cost per action (u. c./action)
Insp. time duration (constant) (h)
Inspection maintenance
Prev. time duration (constant) (h)
Replacement tp (h)
Type I/II
– II(1) II(1) – II(1) II(1) II(1) – II(1) – – II(1)
Type I/II
Replacement tp (h) Cost per action (u. c./action)
– 0.7000 8.6800 – 0.9900 0.4900 1.3800 – 0.0360 – – 0.0010
Cost per action (u. c./action)
Scenario 3 Preventive maintenance Prev. time duration (constant) (h)
– 0.167 1 – 16 16 1 – 2 – – 2
Prev. time duration (constant) (h)
Preventive maintenance
Scenario 2 Preventive maintenance
Table 12.9 Preventive maintenance parameterization, scenarios 2 and 3
20 2 2.5 5 8 8 1 0.67 0.67 2 0.67 0.67
Cost per action (u. c./action)
Corrective maintenance
ttr (constant) (h)
u. c. unit of cost
A.1 A.2 B C.1 C.2 C.3 D (1–48) E.1 E.2 E.3 E.4 E.5
Part
Scenario
Table 12.8 Repair process parameterization, ttr constant (hours). Scenario 1
II(1) II(1) II(1) II(1) II(1) II(1) II(1) – II(1) – – II(1)
Type I/II
– – – II(0) – –
– II(0) II(0) – –
Type I/II
12.2 Reliability, Availability, and Maintainability Analysis in a Plastic Closures Production System for Beverages 457
458
12 Applications and Case Studies Unreliability vs Time
Reliability vs Time
1.000
1.000
Unreliability
Reliability
CompMould Unreliability Line
CompMould Reliability Line
0.800
Reliability, R(t)=F(t)
Unreliability, F(t)=1-R(t)
0.800
0.600
0.400
0.200
0.000 0.000
0.600
0.400
0.200
12.000
24.000
36.000
48.000
0.000 0.000
60.000
12.000
Time, (t)
24.000
36.000
48.000
60.000
Time, (t)
Fig. 12.42 System unreliability FS .t / and reliability RS .t /. ReliaSoft® software
Availability and Reliability vs Time 1.000
CompMould Point Availability Line Point Reliability Line
0.800
A(t), R(t)
0.600
0.400
0.200
0.000 0.000
6440.000
12880.000
19320.000
25760.000
32200.000
Time, (t)
Fig. 12.43 Availability and reliability, scenario 1. ReliaSoft® software
costs for corrective, preventive, and inspection maintenance met for crew and materials, e. g., spare parts and replacement costs. When the system is down, i. e., in the presence of a downtime of the system, a cost for nonproduction3 equal to 11:8 u: c:=h is assumed. Figure 12.43 shows the trend of the point availability AS .t/ and the reliability RS .t/ of the system for the range t 2 Œ0; 32;200 h by the application of the Monte Carlo dynamic simulation. The period of 3
Lost production cost
time of 32;200 h for the system corresponds to about five operating years. The reliability of the repairable systems is referred to the first failure. The simulation analysis is carried on 500 repeated runs, corresponding to 500 virtual periods of work under the same environmental conditions for the production systems. Figures 12.44 and 12.45 present the analysis of the most critical components/blocks considering the so-called failure criticality index and the downing events criticality index, respectively.
12.2 Reliability, Availability, and Maintainability Analysis in a Plastic Closures Production System for Beverages
459
RS FCI 1.144
Availability 100%
50%
0.915 0% 5 Item(s)
0.687
0.458
0.229
Fig. 12.44 Failure criticality index (FCI) – scenario 1. ReliaSoft® software
0.000
C.1
A.2
E.3
E.3
E.3
RS DECI 1.356
Availability 100%
50%
1.085 0% 5 Item(s)
0.814
0.543
0.271
Fig. 12.45 Downing events criticality index (DECI) – scenario 1. ReliaSoft® software
0.000
B
As previously exemplified in Chaps. 5 and 6, for a generic block the failure criticality index is obtained from the number of system downing failures due to the generic block divided by the whole number of system failures. Similarly, for a generic block the downing events criticality index is obtained from the number of system downing events, different from system downing failures, due to the generic block divided by the whole number of system failures. This implies that
C.1
A.2
C.2
E.3
1.36% of the times that the system is down is due to the down condition of component B. Figure 12.46 reports the state diagram (up/down) for the system and the most critical components/blocks. The Pareto chart of the costs for the corrective, preventive, and inspection maintenance of the blocks in Fig. 12.47 highlights the most critical components in terms of annual costs.
460
12 Applications and Case Studies Block Up/Down State Operating Time Time Under Repair
C.2
C.3
A.2
C.1
B
System
Fig. 12.46 Critical blocks up/down analysis. ReliaSoft® software
0.000
2000.000
4000.000
6000.000
8000.000
10000.000
Time, (t)
12.2.7 Alternative Scenarios and System Optimization 2500000
100
2000000
80
1500000
60
1000000
40
500000
20
0 Block Name
Percent
Total cost
Pareto Chart of Blocks/Components costs
0 3 1 1 4 1 5 2 2 3 1 2 4 1 0 7 6 5 2 9 r E. C . E. E. A . E. E. C . C . D.1 D. D. D. D.1 D. D. D. D.1 D. the O
Fig. 12.47 Pareto chart of components costs. Scenario 1
Component A.2 maintenance optimization N(1650, 660)
0.8 0.75 0.7 0.65 0.6 0.55 0.5 0.45 0.4 0.35 0.3 200
Starting from the previously illustrated scenario 1, some other scenarios concerning alternative operating conditions for the production system can be discussed. Every scenario involves different maintenance strategies and rules to be applied to the components/blocks. Scenarios 2–4 have been simulated in order to identify the best to-be configuration of the production system capable of minimizing the whole production and maintenance costs, in accordance with the adoption of different maintenance strategies and decisions. In particular, as an alternative to the as-is situation, the following hypotheses are adopted:
300
400
500
600
700
800
900
1000
1100
1200
Fig. 12.48 Component A.2 preventive maintenance optimization, scenario 4
• Scenario 2 (see the first section of Table 12.9). Maintenance actions are planned for the following very expensive components, not preventively replaced in the as-is configuration: component A.1 (cost per action equal to about 196 u: c: added to crew costs) and component C.1 (cost per action equal to 183 u: c: added to crew costs). The adopted replacement times tp , 4;000 h for component A.1 and 1;436 h for component C.1, respectively, are the mean values of the parametric probability distribution of the related ttf random variable. The values of the preventive time duration are assumed to be
12.2 Reliability, Availability, and Maintainability Analysis in a Plastic Closures Production System for Beverages
461
Table 12.10 Preventive maintenance parameterization, scenario 4 Part
x 10000
A.1 A.2 B C.1 C.2 C.3 D (1–48) E.1 E.2 E.3 E.4 E.5
Replacement tp (h)
Cf (u. c./action)
Cp (u. c./action)
3,433 778 3,878
471.6400 28.3000 43.1800
264.6400 3.0000 22.5800
1,456 2,810 3,921
110.8800 15.1800 10.9400
6.0000 8.2800 5.8400
2,116 4,727
33.4000 11.7800
12.7000 6.6800
UEC (u. c./h)
Prev. time duration (constant) (h)
0.0822 0.0069 0.0086 no prev. replacement no prev. replacement 0.0078 0.0049 0.0025 no prev. replacement 0.0098 0.0024 no prev. replacement
Cost per action (u. c./action)
Type I/II
195.6400 0.7000 8.6800
II(1) II(1) II(1)
4 0.5 0.3
0.4900 1.3800 –
II(1) II(1) –
0.5 0.3
5.8000 2.5400
– –
20 0.167 1
80 70
Costs [u.c.]
60 50 40 Scenario 1
30
Scenario 2 Scenario 3
20
Scenario 4 10 0 Costs for Parts (CM)
Costs for Crews (CM)
Costs for Parts (PM)
Costs for Crews (PM)
Costs for Crews (IN)
Cost for not producon [€/year x 5]
Total Cost [€/year x 5]
Scenario 1
94600
311615
14735
73657
13484
274915
783006
Scenario 2
70053
296349
41363
78024
13529
265165
764481
Scenario 3
70175
295472
41050
90987
13504
275613
786801
Scenario 4
62248
187599
53616
73404
18124
210058
605048
Fig. 12.49 Multiscenario analysis. Cost evaluation and comparison
constant, and equal to 5 and 1 h, respectively (see Table 12.9). • Scenario 3 (see the second section of Table 12.9). This configuration differs from scenario 2 in the time duration of the preventive action, here equal to 20 and 5 h, respectively, for components A.1 and C.1. These values are the same as the constant ttr in the case of corrective action. • Scenario 4 (see Table 12.10). This configuration differs from scenario 2 in the identification and the adoption of the best value of the replacement time tp for the generic component/block of the
system, in accordance with the replacement analytical model of type I and the application of the multiscenario parametric analysis by Simulink® MATLAB® 7.0. Figure 12.48 exemplifies the output of this dynamic analysis for component A.2 with a best replacement time tp equal to 778 h. It can be concluded that it is not convenient to plan a preventive replacement for four basic components C.1, C.2, E.2, and E.5. Tables 12.11 and 12.12 compare the results for the set of four scenarios previously illustrated in or-
462
12 Applications and Case Studies
Table 12.11 Multiscenario analysis and performance evaluation Scenario 1 Mean availability (all events) Mean availability (w/o PM and inspection) Point availability (all events) at 32,200 Reliability (32,200) Expected number of failures MTTFF Uptime (ref. 5 years) CM downtime (ref. 5 years) Inspection downtime (ref. 5 years) PM downtime (ref. 5 years) Total downtime (ref. 5 years) Number of failures (ref. 5 years) Number of CMs (ref. 5 years) Number of inspections (ref. 5 years) Number of PMs (ref. 5 years) Total events (ref. 5 years) Total maintenance costs (u. c./year x 5)
0.928 0.930 0.928 0 1,771.77 4.63 29,870.21 2,243.62 0.46 85.71 2,329.79 1,771.77 1,771.77 14.00 109.97 1,895.73 50,809
Scenario 2 0.930 0.934 0.946 0 1,764.42 3.24 29,952.84 2,132.72 . 0.46 113.98 2,247.16 1,764.42 1,764.42 14.00 122.90 1,901.32 49,932
Scenario 3 0.928 0.934 0.926 0 1,759.39 5.35 29,864.30 2,126.06 0.46 209.18 2,335.70 1,759.39 1,759.39 14.00 121.18 1,894.57 51,119
Scenario 4 0.945 0.960 0.940 0 1,175.37 4.97 30,419.85 1,283.35 0.27 496.53 1,780.15 1,175.37 1,175.37 8.09 786.04 1,969.49 39,499
Table 12.12 Multiscenario analysis and cost evaluation Costs (u. c.) Costs for parts (CM) (ref. 5 years) Costs for crews (CM) (ref. 5 years) Total CM costs (ref. 5 years) Costs for parts (PM) (ref. 5 years) Costs for crews (PM) (ref. 5 years) Total PM costs (ref. 5 years) Costs for crews (IN) (ref. 5 years) Total inspection costs (ref. 5 years) Total maintenenace costs (u. c./year 5) Total Downtime (h) (ref. 5 years) Cost for not production (u. c./year 5) Total cost (u. c./year 5) Annual total cost (u. c./year)
Scenario 1
Scenario 2
Scenario 3
Scenario 4
94,600 311,615 406,216 14,735 73,657 88,391 13,484 13,484 508,091 466 274,915 783,006 156,601
70,053 296,349 366,402 41,363 78,024 119,387 13,529 13,529 499,317 449 265,165 764,481 152,896
70,175 295,472 365,647 41,050 90,987 132,036 13,504 13,504 511,188 467 275,613 786,801 157,360
62,248 187,599 249,847 53,616 73,404 127,019 18,124 18,124 394,991 356 210,058 605,048 121,010
der to demonstrate the efficacy of the applied models and methods to support decisions regarding maintenance planning. In particular, scenario 4 is the best performer in terms of mean availability, equal to 0.945. Moreover, in comparison with the as-is configuration of scenario 1, both the total maintenance cost (about 39,450 u. c. for 5 years, 22%) and the total annual costs, including nonproduction costs (about 121,009 u. c., 29:5%), are significantly reduced. Tables 12.11 and 12.12 outline the whole situation, as illustrated in Fig. 12.49.4
4
All costs are in unit of cost, see footnote 3.
12.3 Conclusions and Call for New Contributions This chapter is thought to be continuously under construction, and therefore it is open to new case studies from industrial and service applications, as outlined at the beginning with an explicit invitation for readers to submit original contributions. For this purpose the reader is invited to submit original contributions by contacting one of the authors.
A
Appendix
A.1 Standardized Normal Distribution 8 Zz ˆ ˆ ˆ ˆ f .x/ dx F .z/ D ˆ < 1
ˆ ˆ ˆ 1 x2 ˆ ˆ : f .x/ D p e 2 2 z
0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4
0.50000 0.53983 0.57926 0.61791 0.65542 0.69146 0.72575 0.75804 0.78814 0.81594 0.84134 0.86433 0.88493 0.90320 0.91924 0.93319 0.94520 0.95543 0.96407 0.97128 0.97725 0.98214 0.98610 0.98928 0.99180 0.99379 0.99534 0.99653 0.99744 0.99813 0.99865 0.99903 0.99931 0.99952 0.99966
0.50399 0.54380 0.58317 0.62172 0.65910 0.69497 0.72907 0.76115 0.79103 0.81859 0.84375 0.86650 0.88686 0.90490 0.92073 0.93448 0.94630 0.95637 0.96485 0.97193 0.97778 0.98257 0.98645 0.98956 0.99202 0.99396 0.99547 0.99664 0.99752 0.99819 0.99869 0.99906 0.99934 0.99953 0.99968
0.50798 0.54776 0.58706 0.62552 0.66276 0.69847 0.73237 0.76424 0.79389 0.82121 0.84614 0.86864 0.88877 0.90658 0.92220 0.93574 0.94738 0.95.728 0.96562 0.97257 0.97831 0.98300 0.98679 0.98983 0.99224 0.99413 0.99560 0.99674 0.99760 0.99825 0.99874 0.99910 0.99936 0.99957 0.99969
0.51197 0.55172 0.59095 0.62930 0.66640 0.70194 0.73565 0.76730 0.79673 0.82381 0.84850 0.87076 0.89065 0.90824 0.92364 0.93699 0.94845 0.95818 0.96638 0.97320 0.97882 0.98341 0.98713 0.99010 0.99245 0.99430 0.99573 0.99683 0.99767 0.99831 0.99878 0.99913 0.99938 0.99957 0.99970
0.51595 0.55567 0.59483 0.63307 0.67003 0.70540 0.73891 0.77035 0.79955 0.82639 0.85083 0.87286 0.89251 0.90988 0.92507 0.93822 0.94950 0.95907 0.96712 0.97381 0.97932 0.98382 0.98745 0.99036 0.99266 0.99446 0.99585 0.99693 0.99774 0.99836 0.99882 0.99916 0.99940 0.99958 0.99971
0.51994 0.55962 0.59871 0.63683 0.67364 0.70884 0.74215 0.77337 0.80234 0.82894 0.85.314 0.87493 0.89435 0.91149 0.92647 0.93943 0.95053 0.95994 0.96784 0.97441 0.97982 0.98422 0.98778 0.99061 0.99286 0.99461 0.99598 0.99702 0.99781 0.99841 0.99886 0.99918 0.99942 0.99960 0.99972
0.52392 0.56356 0.60257 0.64058 0.67724 0.71226 0.74537 0.77637 0.80511 0.83147 0.85543 0.87698 0.89617 0.91309 0.92786 0.94062 0.95154 0.96080 0.96856 0.97500 0.98030 0.98461 0.98809 0.99086 0.99305 0.99477 0.99609 0.99711 0.99788 0.99846 0.99889 0.99921 0.99944 0.99961 0.99973
0.52790 0.56749 0.60642 0.64431 0.68082 0.71566 0.74857 0.77935 0.80785 0.83398 0.85769 0.87900 0.89796 0.91466 0.92922 0.94179 0.95254 0.96164 0.96926 0.97558 0.98077 0.98500 0.98840 0.99111 0.99324 0.99492 0.99621 0.99720 0.99795 0.99851 0.99893 0.99924 0.99946 0.99962 0.99974
0.53188 0.57142 0.61026 0.64803 0.68439 0.71904 0.75175 0.78230 0.81057 0.83646 0.85993 0.88100 0.89973 0.91621 0.93056 0.94295 0.95352 0.96246 0.96995 0.97615 0.98124 0.98537 0.98870 0.99134 0.99343 0.99506 0.99632 0.99728 0.99801 0.99856 0.99897 0.99926 0.99948 0.99964 0.99975
0.53586 0.57535 0.61409 0.65173 0.68793 0.72240 0.75490 0.78524 0.81327 0.83891 0.86214 0.88298 0.90147 0.91774 0.93189 0.94408 0.95449 0.96327 0.97062 0.97670 0.98169 0.98574 0.98899 0.99158 0.99361 0.99520 0.99643 0.99736 0.99807 0.99861 0.99900 0.99929 0.99950 0.99965 0.99976
R. Manzini, A. Regattieri, H. Pham, E. Ferrari, Maintenance for Industrial Systems © Springer 2010
463
464
A Appendix
A.2 Control Chart Constants
n
D3
D4
B3
B4
A2
A3
d2
c4
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
0 0 0 0 0 0.076 0.136 0.184 0.223 0.256 0.283 0.307 0.328 0.347 0.363 0.378 0.391 0.403 0.415 0.425 0.434 0.443 0.451 0.459
3.267 2.574 2.282 2.114 2.004 1.924 1.864 1.816 1.777 1.744 1.717 1.693 1.672 1.653 1.637 1.622 1.608 1.579 1.585 1.575 1.566 1.557 1.548 1.541
0 0 0 0 0.030 0.118 0.185 0.239 0.284 0.321 0.354 0.382 0.406 0.428 0.448 0.466 0.482 0.497 0.510 0.523 0.534 0.545 0.555 0.565
3.267 2.568 2.266 2.089 1.970 1.882 1.815 1.761 1.716 1.679 1.646 1.618 1.594 1.572 1.552 1.534 1.518 1.503 1.490 1.477 1.466 1.455 1.445 1.435
1.880 1.023 0.729 0.577 0.483 0.419 0.373 0.337 0.308 0.285 0.266 0.249 0.235 0.223 0.212 0.203 0.194 0.187 0.180 0.173 0.167 0.162 0.157 0.153
2.659 1.954 1.628 1.427 1.287 1.182 1.099 1.032 0.975 0.927 0.886 0.850 0.817 0.789 0.763 0.739 0.718 0.698 0.680 0.663 0.647 0.633 0.619 0.606
1.128 1.693 2.059 2.326 2.534 2.704 2.847 2.970 3.078 3.173 3.258 3.336 3.407 3.472 3.532 3.588 3.640 3.689 3.735 3.778 3.819 3.858 3.895 3.931
0.7979 0.8862 0.9213 0.9400 0.9515 0.9594 0.9650 0.9693 0.9727 0.9754 0.9776 0.9794 0.9810 0.9823 0.9835 0.9845 0.9854 0.9862 0.9869 0.9876 0.9882 0.9887 0.9892 0.9896
A.3 Critical Values of Student’s Distribution with Degree of Freedom
A.3 Critical Values of Student’s Distribution with Degree of Freedom
˛ 0.2
0.1
0.05
0.01
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
1.376 1.061 0.978 0.941 0.920 0.906 0.896 0.889 0.883 0.879 0.876 0.873 0.870 0.868 0.866 0.865 0.863 0.862 0.861 0.860 0.859 0.858 0.858 0.857 0.856 0.856 0.855 0.855 0.854 0.854
3.078 1.886 1.638 1.533 1.476 1.440 1.415 1.397 1.383 1.372 1.363 1.356 1.350 1.345 1.341 1.337 1.333 1.330 1.328 1.325 1.323 1.321 1.319 1.318 1.316 1.315 1.314 1.313 1.311 1.310
6.314 2.920 2.353 2.132 2.015 1.943 1.895 1.860 1.833 1.812 1.796 1.782 1.771 1.761 1.753 1.746 1.740 1.734 1.729 1.725 1.721 1.717 1.714 1.711 1.708 1.706 1.703 1.701 1.699 1.697
31.821 6.965 4.541 3.747 3.365 3.143 2.998 2.896 2.821 2.764 2.718 2.681 2.650 2.624 2.602 2.583 2.567 2.552 2.539 2.528 2.518 2.508 2.500 2.492 2.485 2.479 2.473 2.467 2.462 2.457
465
Bibliography
Abernethy RB (2007) The new Weibull handbook. Abernethy, North Palm Beach Ait Kadi D, Cléroux R (1988) Optimal block replacement policies with multiple choice at failure. Nav Res Logist 35:99– 110 Akhmedjanov FM (2001) Reliability databases: state of the art and perspectives. Riso Natl Lab R1235:1–37 Amari SV, Pham H (2007) A novel approach for optimal costeffective design of complex repairable systems. IEEE Trans Syst Man Cybern Part A 37(3):406–415 American Petroleum Institute (2002) API risk based inspection. API recommended practice 580. American Petroleum Institute, Washington Ansell J, Bendell A, Humble S (1984) Age replacement under alternative cost criteria. Manag Sci 30:358–367 Arts RHPM, Knapp GM, Mann L Jr (1988) Some aspects of measuring maintenance performance in the process industry. J Qual Maint Eng 4(1):6–11 Aven T (1983) Optimal replacement under a minimal repair strategy – a general failure model. Adv Appl Probab 15:198– 211 Aven T (1990) Availability formulae for standby systems of similar units that are preventively maintained. IEEE Trans Reliab 39:603–606 Bagai I, Jain K (1994) Improvement, deterioration, and optimal replacement under age-replacement with minimal repair. IEEE Trans Reliab 43:156–162 Bai DS, Jang JS, Kwon YI (1983) Generalized preventive maintenance policies for a system subject to deterioration. IEEE Trans Reliab 32:512–514 Bai J, Pham H (2004) Discounted warranty cost of minimally repaired series systems. IEEE Trans Reliab 53(1):37–42 Bai J, Pham H (2005) Repair-limit risk-free warranty policies with imperfect repair. IEEE Trans Syst Man Cybern Part A 35(6):765–772 Bai J, Pham H (2006) Cost analysis on renewable full-service warranties for multi-component systems. Eur J Oper Res 168:492–508 Baldin A (1983) La manutenzione e il rinnovo. Angeli, Milan Baldin A, Furlanetto L, Roversi A, Turco F (1978) Manuale della manutenzione degli impianti industriali. Angeli, Milan
Banerjee PK, Chuiv N (1996) Inspection policies for repairable systems. IIE Trans 28(12):1003–1010 Barlow RE, Hunter LC (1960) Optimum preventive maintenance policies. Oper Res 8:90–100 Barlow RE, Proschan F (1965) Mathematical theory of reliability. Wiley, New York Beichelt F (1981) A generalized block-replacement policy. IEEE Trans Reliab 30:171–172 Beichelt F, Fisher K (1980) General failure model applied to preventive maintenance policies. IEEE Trans Reliab 29:39– 41 Ben Daya M, Duffuaa SO (1997) Coordination of inspections for a group of machines. J Qual Maint Eng 3(3)199–208 Ben-Daya M, Duffuaa SO, Raouf A (2000) Maintenance, modeling and optimization. Kluwer, Boston. Berg M (1978) General trigger-off replacement procedures for two-unit system. Nav Res Logist Q 25:15–29 Berg MP (1995) The marginal cost analysis and its application to repair and replacement policies. Eur J Oper Res 82:214– 224 Bilgen S, Deligönül ZS (1987) Weibull renewal equation solution. Computing 39(1):71–76 Birolini A (1994) Quality and reliability of technical systems. Springer, Berlin Black JT, Hunter SL (2003) Lean manufacturing systems and cell design. Society of Manufacturing Engineers, Dearborn Block HW, Borges WS, Savits TH (1985) Age-dependent minimal repair. J Appl Probab 22:370–385 Block HW, Borges WS, Savits TH (1998) A general age replacement model with minimal repair. Nav Res Logist 35(5):365– 372 Boland PJ (1982) Periodic replacement when minimal repair costs vary with time. Nav Res Logist Q 29:541–546 Boland PJ, Proschan F (1982), Periodic replacement with increasing minimal repair costs at failure. Oper Res 30:1183– 1189 Botter R, Fortuin L (2000) Stocking strategy for service parts – a case study. Int J Oper Prod Manag 20(6):656–674 Boylan JE, Syntetos AA (2006) Accuracy and accuracy implication metrics for intermittent demand. Int J Appl Forecast 4:39–42
467
468 Boylan JE, Syntetos AA, Karakostas GC (2008) Classification for forecasting and stock control: a case study. J Oper Res Soc 59:473–481 Braglia M, Grassi A, Montanari R (2004) Multi-attribute classification method for spare parts inventory management. J Qual Manag Eng 10(1):55–65 Bris R, Chatelet E, Yalaoui F (2003) New method to minimize the preventive maintenance cost of series-parallel systems. Reliab Eng Syst Saf 82:247–255 Brown M, Proschan F (1983) Imperfect repair. J Appl Probab 20:851–859 Burr JT (2005) Elementary statistical quality control, 2nd edn. Dekker, New York Canfield RV (1986) Cost optimization of periodic preventive maintenance. IEEE Trans Reliab 35:78–81 Cattaneo M (2004) La manutenzione in Italia nelle PMI. http://aiman.gs-m.eu/documenti/manutenzione_italia_pmi_ 2000%20.pdf Chan FTS, Lau HCW, Ip L, Chan HK, Kong S (2005) Implementation of total productive maintenance: a case study. Int J Prod Econ 95(1):71–94 Chan PKW, Downs T (1978) Two criteria for preventive maintenance. IEEE Trans Reliab 27:272–273 Chand G, Shirvani B (2000) Implementation of TPM in cellular manufacture. J Mater Process Technol 4(3):149–154 Chang PL, Chou YC, Huang MG (2005) A (r; r; Q) inventory model for spare parts involving equipment criticality. Int J Prod Econ 76:66–74 Chelbi S, Ait-Kadi D (1999) An optimal inspection strategy for randomly failing equipment, Reliab Eng Syst Saf 63:127– 131 Chen M, Feldman RM (1997) Optimal replacement policies with minimal repair and age-dependent costs. Eur J Oper Res 98:75–84 Cheng SC (2005) Fuzzy estimator of the total failure time of censored data. In: Proceedings of conference of the North American Fuzzy Information Processing Society – NAFIPS, pp 505–509 Cheng SC, Mordeson JN (1985) A note on the notion of adequacy. Chin J Math 13(1):13–41 Cho DI. Parlar M (1991) A survey of maintenance models for multi-unit systems. Eur J Oper Res 51:1–23 Chuenusa C, Ramnik B, Fiju A (2004) The status of maintenance management in UK manufacturing organizations: results from a pilot survey. J Qual Maint Eng 10(1):5–15 Clavareau J, Labeau P-E (2009) Maintenance and replacement policies under technological obsolescence. Reliab Eng Syst Saf 94(2):370–381 Cléroux R, Dubuc S, Tilquin C (1979) The age replacement problem with minimal repair and random repair costs. Oper Res 27:1158–1167 Cobbaert K, Van Oudheusden D (1996) Inventory models for fast moving spare parts subject to sudden death obsolescence. Int J Prod Econ 44:239–248 Constantine AG, Robinson NI (1997) The Weibull renewal function for moderate to large arguments. Comput Stat Data Anal 24:9–27 Contreras CA (2007) Using the EM algorithm for inference in a mixture of distributions with censored but partially identifiable data. Comput Stat Data Anal 51(5):2769–2781
Bibliography Corrigo (2007) Preventive maintenance benchmark study results. http://www.corrigo.com/uploadedFiles/Products/ Resources/Benchmark_Reports/2007_Corrigo_PM_ Benchmark_Report.pdf Crocker J, Kumar DU (2000) Age related maintenance versus reliability centered maintenance: a case study on aeroengines. Reliab Eng Syst Saf 67(2):113–118 Croston JD (1972) Forecasting and stock control for intermittent demands. Oper Res Q 23:289–304 CSERGE, Warren Spring Laboratory, EFTEL (1993) Externalities from landfill and incineration. Report to the Department of Environment. HMSO, London Dal B, Tugwell P, Greatbanks R (2000) Overall equipment effectiveness as a measure of operational improvement, a practical analysis. Int J Oper Prod Manag 20(12):1488–1502 D’Augostino RB, Stephens MA (1986) Goodness-of-fit techniques. Dekker, New York de Almeida JR, Camargo JB, Basseto BA, Paz SM (2003) Best practices in code inspection for safety-critical software, IEEE Software 20(3):56–63 Dekker R (1996) Applications of maintenance optimization models: a review and analysis. Reliab Eng Syst Saf 51(3):229–240 Dekker R, Roelvink IFK (1995) Marginal cost criteria for preventive replacement of a group of components. Eur J Oper Res 84:467–480 Devore JL, Farnum NR (1999) Applied statistics for engineers and scientists. Duxbury, Pacific Grove Dhillon BS (2005) Reliability, quality, and safety for engineers. CRC, New York Dieulle L, Berenguer C, Gralland G, Roussignol M (2003) Sequential condition-based maintenance scheduling for a deteriorating system. Eur J Oper Res 150:451–461 Dijkgraaf E, Vollebergh H (2003) Burn or bury? A social comparison of final waste disposal methods SIEV. Erasmus University Rotterdam and OCFEB, Rotterdam Dimitrakos TD, Kyriakidis EG (2007) An improved algorithm for the computation of the optimal repair/replacement policy under general repairs. Eur J Oper Res 182:775–782 Duffuaa SO, Raouf A, Campbell JD (1999) Planning and control of maintenance systems. Modelling and analysis. Wiley, New York Ebeling CE (2005) An introduction to reliability and maintainability engineering, McGraw-Hill, New York Ebrahimi N (1985) Mean time to achieve a failure-free requirement with imperfect repair. IEEE Trans Reliab 34:34–37 EC (1996) Cost–benefit analysis of the different municipal solid waste management systems: objectives and instruments for the year 2000. Final report to EC, by Coopers and Lybrand, CSERGE and EFTEC EC (2000a) A study on economic valuation of environment externalities from landfill disposal and incineration of waste. Final main report. European Commission, DG Environment. http://europa.eu.int/comm/ environment/waste/studies/econ eva landfill.htm EC (2000b) A study on economic valuation of environment externalities from landfill disposal and incineration of waste. European Commission, DG Environment ECON, Senter for Okonomosk Analyse (1995) Environmental costs of different types of waste
Bibliography Economou M (2004) The merit and limitations of reliability predictions. In: IEEE – RAMS 2004 conference, pp 352–357 EMC (1996) Waste disposal management, analyzing direct and indirect costs. Final report. EMC, Israel, p 70 Enosh (1996) Solid waste management externalities. Final report. Enosh, Israel, p 140 Esary JD, Marshall AW, Proschan F (1973) Shock models and wear processes. Ann Probab 1:627–649 Eshet T, Ayalon O, Shechter M (2006) Valuation of externalities of selected waste management alternatives: a comparative review and analysis. Resour Conserv Recycl 46:335–364 Eti MC, Ogaji SOT, Probert SD (2004) Implementing total productive maintenance in Nigerian manufacturing industries. Appl Energy 79(4):385–40 Eunomia (2002) Economic analysis of option for managing biodegradable municipal waste. Final report and appendices. http://europa.eu.int/comm/environment/waste/compost/ econanalysis finalreport.pdf Evans RD (2005) Too small for a CMMS, working paper. Compliance Technologies, Woodstock. http://www. maintenanceworld.com/articles/evansr/toosmall.htm ExternE (1998) Extern: externalities of energy. New results, to be published by Extern Program of European Commission. Directorate-General XII, Science Research and Development, p 67 Fedele L, Furlanetto L, Saccardi D (2005) Progettare e gestire la manutenzione. Mc-Graw-Hill, Milan Feldman RM (1976) Optimal replacement with semi-Markov shock models. J Appl Probab 13:108–117 Feldman RM (1977) Optimal replacement for systems governed by Markov additive shock processes. Ann Probab 5:413–429 Fernandes P (2001) A framework for a strategy driven manufacturing system design in an aerospace environment – design beyond factory floor. MSc thesis, Massachusetts Institute of Technology, Boston Ferrari E, Pareschi A, Persona A, Regattieri A (2001) TPM: a total approach for industrial maintenance question. In: Proceedings of 7th international conference on reliability and quality in design – ISSAT01, Washington DC, pp 262–268 Ferrari E, Pareschi A, Persona A, Regattieri A (2002) TPM: situation and procedure for a soft introduction in Italian factories. Tqm Mag 14:350–358 Ferrari E, Pareschi A, Persona A, Regattieri A (2003) Plant layout computerised design: logistic and relayout program (LRP). Int J Adv Manuf Technol 21(12):917–922 Ferrari E, Pareschi A, Persona A, Regattieri A (2006) Statistical management and modeling for demand spare parts. In: Springer handbook of engineering statistics. Springer, Berlin, pp 905–929 Ferrari G (1994) La gestione delle manutenzione. Angeli, Milan Fox B (1966) Age replacement with discounting. Oper Res 14:533–537 Fu HM (2007) Integral estimate method for interval censored data. J Aerosp Power 22(2):175–179 Gamberi M, Manzini R, Regattieri A (2009) A new approach for the automatic analysis and control of material handling systems: integrated layout flow analysis (ILFA). Int J Adv Manuf Technol 41(1):156–167 Geraghty T (1996) Beyond TPM. Manuf Eng 7(5):183–185
469 Ghobbar AA, Friend CH (2002) Sources of intermittent demand for aircraft spare parts within airline operations. J Air Transp Manag 8:221–231. Ghobbar AA, Friend CH (2003) Evaluation of forecasting methods for intermittent parts demand in the field of aviation: a predictive model. Comput Oper Res 30(14):2097–2114. Ghobbar A.A, Friend CH (2004) The material requirements planning system for aircraft maintenance and inventory control: a note. J Air Transp Manag 10:217–221 Giri BC, Dohi T, Kaio N (2005) A discrete time order replacement model with time discounting and spare parts provisioning. J Qual Maint Eng 11(3):190–205 Glasser GJ (1967) The age replacement problem. Technometrics 9:83–91 Gosavi A (2006) A risk-sensitive approach to total productive maintenance. Automatica 42(8):1321–1330 Grall C, Berenguer L, Dieulle C (2002) A condition-based maintenance policy for stochastically deteriorating systems. Reliab Eng Syst Saf 76:167–180 Grall L, Dieulle C, Berenguer L, Roussignol M (2002) Continuous-time predictive-maintenance scheduling for a deteriorating system. IEEE Trans Reliab 51(2):141–150 Gutierrez RS, Solis AO, Mukhopadhyay S (2008) Lumpy demand forecasting using neural networks. Int J Prod Econ 111(2):409–420 Haange R (1995) Overview of remote-maintenance scenarios for the ITER machine. Fusion Eng Des 27:69–82 Hariga M (1996) A maintenance inspection model for a single machine with general failure distribution. Microelectron Reliab 36(3):353–358 Hariga M, Azaiez N, Ben Daya M (2006) A discounted integrated inspection-maintenance model for a single deteriorating production facility. Int Trans Oper Res 13:353–364 Hartman E (1992) Successfully installing TPM in a nonJapanese plant. TPM, London Health and Safety Executive (2005) European comparisons – fatal injuries. http://www.hse.gov.uk/statistics/european/ fatal.htm Hocley CJ, Appleton DP (1997) Setting the requirements for the Royal Air Force’s next generation aircraft. Paper presented at the annual reliability and maintainability symposium, Philadelphia Hollander M, Presnell B, Sethuraman J (1992) Nonparametric methods for imperfect repair models. Ann Stat 20(2):879– 887 Hsieh KL (2007) Applying neural networks approach to achieve the parameter optimization for censored data. WSEAS Trans Comput 6(5):858–863 Huang J, Miller CR, Okogbaa OG (1995) Optimal preventivereplacement intervals for the Weibull life distribution: solutions and applications. In: Proceedings of the annual reliability and maintainability symposium, Washington Hutchins D (1998) Introducing TPM. Manuf Eng 4(2):34–36 Inagaki T, Inoue K, Akashi H (1980) Optimization of staggered inspection schedules for protective systems. IEEE Trans Reliab 29:170–173 International Labor Organization (2006) Facts on safety at work. http://www.ilo.org/public/english/bureau/inf/download/ factsheets/pdf/wdshw.pdf
470 Jardine AKS, Buzacott JA (1985) Equipment reliability and maintenance. Eur J Oper Res 19:285–296 Jardine AKS, Tsang AHC (2006) Maintenance, replacement and reliability. Theory and applications. Taylor & Francis, Oxford Jeong IJ, Leon VJ, Villalobos JR (2007) Integrated decisionsupport system for diagnosis. Maint Plann Sched Manuf Syst 45(2):267–285 Jiang R, Ji P, Tsang AHC (2006) Preventive effect of optimal replacement policies. J Qual Maint Eng 12(3):267–274 JIPM – Japan Institute of Plant Maintenance (1995) Applichiamo il TPM. Guida operativa alla realizzazione del total productive maintenance. Angeli, Milan Johnston FR, Boylan JE (1996) Forecasting for items with intermittent demand. J Oper Res Soc 47:113–121 Kaspi M, Shabtay D (2003) Optimization of the machining economics problem for a multistage transfer machine under failure, opportunistic and integrated replacement strategies. Int J Prod Res 41(10):2229–2247 Khan FI, Haddara M, Krishnasamy LA (2008) New methodology for risk-based availability analysis reliability. IEEE Trans Reliab 57(1):103–111 Kijima M, Nakagawa T (1991) Accumulative damage shock model with imperfect preventive maintenance. Nav Res Logist 38:145–156 Kijima M, Nakagawa T (1992) Replacement policies of a shock model with imperfect preventive maintenance. Eur J Oper Res 57:100–110 Kijima M, Morimura H, Suzuki Y (1988) Periodical replacement problem without assuming minimal repair. Eur J Oper Res 37(2):194–203 Kijimma M (1989) Some results for repairable systems with general repair. J Appl Probab 26:89–102 Klutke GA, Yang YJ (2002) The availability of inspected systems subjected to shocks and graceful degradation. IEEE Trans Reliab 44:371–374 Koelsch JR (1993) A dose of TPM: downtime needn’t be a better pill. Manuf Eng 110(3):6–66 Konopka J, Fowler J (1994) A TPM inspired factory throughput analysis tool. In: Proceedings of international symposium on semiconductor manufacturing, pp 64–67 Koudate A, Suzue T (1990) Variety reduction program: a production strategy for product diversification. Productivity, New York Kumar DU, Knezevic J, Crocker J (1999) Maintenance free operating period – an alternative measure to MTBF and failure rate for specifying reliability? Reliab Eng Syst Saf 64:127– 131 Lai KK, Leung FKN, Tao B, Wang SY (2000) Eur J Oper Res 124(2):294–306 Lam CT, Yeh RH (1994) Optimal maintenance-policies for deteriorating systems under various maintenance strategies. IEEE Trans Reliab 43(3):423–430 Lam YA (1991) Optimal repairable replacement model for deteriorating system. J Appl Probab 28:843–851 Lee YB, Kim HE, Yoo YC, Park JH (2006) Accelerated life test model for life prediction of piston assemblies in hydraulic pump and motor. Key Eng Mater 326–328:649–652 Legat V, Zaludova AH, Cervenka V, Jurca V (1996) Contribution to optimization of preventive replacement. Reliab Eng Syst Saf 51:259–266
Bibliography Lewis EE (1987) Introduction to reliability engineering. Wiley, New York Li W (2005) Reliability and maintenance modeling of multistate degraded systems with multiple competing failure processes. PhD dissertation, Department of Industrial and Systems Engineering, Rutgers University Li W, Pham H (2003) An inspection-maintenance model for systems with multiple competing processes. IEEE Trans Reliab 54:318–327 Li W, Pham H (2005a) Reliability modeling of multi-state degraded systems with multi-competing failures and random shocks. IEEE Trans Reliab 54(2):297–303 Li W, Pham H (2005b) An inspection-maintenance model for systems with multiple competing processes. IEEE Trans Reliab 54(2):318–327 Lie XG, Makis V, Jardine AKS (1995) A replacement model with overhauls and repairs. Nav Res Logist 42:1063–1079 Maggard B (1992) The theory and design of total productive maintenance. A guide for implementing TPM. TPM, London Makis V, Jardine AKS (1992) Optimal replacement policy for a general model with imperfect repair. J Oper Res Soc 43:111–120 Makridakis S, Wheelwright SC, Hyndman RJ (1998) Forecasting: methods and applications. Wiley, New York Manzini R, Gamberi M, Regattieri A, Persona A (2004) Framework for designing a flexible cellular assembly system. Int J Prod Res 42(17):3505–3528 Manzini R, Gamberi M, Regattieri A (2005) Design and control of a flexible order-picking system (FOPS). A new integrated approach to the implementation of an expert system. J Manuf Technol Manag 16(1):18–35 Manzini R, Persona A, Regattieri A (2006a) Framework for designing and controlling a multi-cellular flexible manufacturing system. Int J Serv Oper Manag 2:1–21 Manzini R, Gamberi M, Regattieri A (2006b) Design and control of an AS/RS. Int J Adv Manuf Technol 28(7/8):766–774 Manzini R, Ferrari E, Meccariello A, Regattieri A (2007) A cost based model for failure modes analysis, reliability prediction and magnitude evaluation of a waste to energy plant. In: Proceedings of the ISSAT international conference on modeling of complex systems and environments, Ho Chi Minh City, Vietnam Manzini R, Gamberi M, Persona A, Regattieri A (2007) Design of a class based storage picker to product order picking system. Int J Adv Manuf Technol 32:811–821 Marin JJ, Pollard RW (2005) Experience report on the FIDES reliability predictions. In: IEEE – RAMS 2005 conference, vol 1, pp 8–13 McCall JJ (1965) Maintenance policies for stochastically failing equipment: a survey. Manag Sci 11(5):493–524 Miranda ML, Hale B (1997) Waste not, want not: the private and social costs of waste-to-energy production. Energy Policy 25:587–600 Monesi F (2008) La qualità totale nelle aziende di servizi: implementazione di un approccio lean six Sigma finalizzato all’efficienza gestionale. PhD thesis, University of Bologna Murthy DNP, Blischke WR (2006) Warranty management and product manufacture. Springer, Berlin Murthy DNP, Nguyen DG (1981) Optimal age-policy with imperfect preventive maintenance. IEEE Trans Reliab 30:80– 81
Bibliography Nakagawa T (1977) Optimum preventive maintenance policies for repairable systems. IEEE Trans Reliab 26:168–173 Nakagawa T (1979) Imperfect preventive maintenance. IEEE Trans Reliab 28(5):402–408 Nagakawa T (1980) A summary of imperfect preventive maintenance policies with minimal repair. Res Oper 14(3):249–255 Nakagawa T (1981a) A summary of periodic replacement with minimal repair at failure. J Oper Res Soc Jpn 24:213–227 Nakagawa T (1981b) Modified periodic replacement with minimal repair at failure. IEEE Trans Reliab 30:165–168 Nakagawa T (1983) Optimal number of failures before replacement time. IEEE Trans Reliab 32:115–116 Nakagawa T, Osaki S (1974) Some aspects of damage models. Microelectron Reliab 13:253–257 Nakagawa T, Yasui K (1987) Optimum policies for a system with imperfect maintenance. IEEE Trans Reliab 36:631–633 Nagano S (2008) Space systems verification program and management process. Syst Eng 11(1):27–38 Nakajima S (1988) Introduction to TPM. Productivity, Cambridge Nakajima S (1992) TPM-total productive maintenance. Productivity, Isedi Natvig B (1990) On information based minimal repair and the reduction in remaining system lifetime due to the failure of a specific module. J Appl Probability 27:365–375 Nelson WB (2005) A bibliography of accelerated test plans. IEEE Trans Reliab 54(2):194–197 Nguyen D.G, Murthy DNP (1981) Optimal preventive maintenance policies for repairable systems. Oper Res 29(6):1181– 1194 O’Hanlon (2005) Computerized maintenance management and enterprise asset management best practices. http://www.cmmscity.com/articles/cmms_best_practices.pdf O’Connor PDT (1988). RADC reliability engineer’s toolkit, USAF Rome Air Development Center. Wiley, New York Ohnishi M, Kawai H, Mine H (1986) An optimal inspection and replacement policy for a deteriorating system. J Appl Probab 23:973–988 Oke SA, Charles-Owaba OE (2006) Application of fuzzy logic control model to Gantt charting preventive maintenance scheduling. Int J Qual Reliab Manag 23(4):441–459 Özekici S (1996) Reliability and maintenance of complex systems. Springer, Berlin Pareschi A, Regattieri A (2002) Remote maintenance, teleservice and internet. Topics, methods and applications. In: Proceedings of 8th international conference on reliability and quality in design, Anaheim, USA Park D, Willemain TR (1999) The threshold bootstrap and threshold jackknife. Comput Stat Data Anal 31:187–202 Park KS (1979) Optimal number of minimal repairs before replacement. IEEE Trans Reliab 28:137–140 Park KS, Yoo YK (1993) (,k) block replacement policy with idle count. IEEE Trans Reliab 42:561–565 Persona A, Regattieri A (1997) Methodology for maintenance system establishment and management in little and medium industry. In: Proceedings of SMI conference, Bologna Pham H (2001) Recent advances in reliability and quality engineering, World Scientific, New York Pham H (ed) (2003a) Handbook of reliability engineering. Springer, Berlin
471 Pham H (2003b) Commentary: steady-state series-system availability. IEEE Trans Reliab 52(2):146–147 Pham H (ed) (2006a) Springer handbook of engineering statistics. Springer, Berlin Pham H (2006b) Reliability modeling, analysis and optimization. World Scientific, New York Pham H (2008) Recent advances in reliability and quality in design. Springer, Berlin Pham H, Wang HZ (1996) Imperfect maintenance. Eur J Oper Res 94:425–438 Pham H, Wang H (2000) Optimal (t,T) opportunistic maintenance of a k-out-of-n system with imperfect PM and partial failure. Nav Res Logist 47:223–239 Pham H, Suprasad A, Misra RB (1996) Reliability analysis of k-out-of-n systems with partially repairable multi-state components. Microelectron Reliab J 36(10):1407–1415 Pham H, Suprasad A, Misra RB (1997) Availability and mean life time prediction of multistage degraded system with partial repairs. Reliab Eng Syst Saf J 56:169–173 Phelps RI (1983) Optimal policy for minimal repair. J Oper Res Soc 34:452–457 Pierskalla WP, Voelker JA (1976) A survey of maintenance models: the control and surveillance of deteriorating systems. Nav Res Logist Q 23:353–388 Pinto FW (2008) Mechanical integrity – stationary equipment reliability. Process Saf Prog 27(2):105–111 Plant Maintenance Resource Center (2004) CMMS implementation survey results. Plant Maintenance Resource Center, Booragoon Politis K, Koutras MV (2006) Some new bounds for the renewal function. Probab Eng Inf Sci 20:231–250 Ponte M (2003) L’informazione essenziale di tecnica e legislazione per costruire. Tipografia del Genio Civile, Rome Powell JC, Brisson I (1994) The assessment of social costs and benefits of waste disposal. Working paper 0967-8875 Rabl A, Spadaro JV, Desaigues B (1998) Usefulness of damage cost estimates despite uncertainties: the example of regulations for incinerators. Environmental risk final report 1 annex 7: COSTS Rampersad HK (1995) Concurrent design of product, process and robotic assembly system, Assem Autom 15:21–28 Regattieri A (1996) Maintenance engineering: methodologies and applied instruments. Graduate thesis, Bologna University Regattieri A (1999) Maintenance engineering in industrial systems. PhD thesis, Parma University Regattieri A, Gamberi M, Gamberini R, Manzini R (2005) Managing lumpy demand for aircraft spare parts. J Air Transp Manag 11:426–431 Rezg N, Chelbi A, Xie X (2005) Modeling and optimizing a joint inventory control and preventive maintenance strategy for a randomly failing production unit: analytical and simulation approaches. Int J Comput Integr Manuf 18(2/3):225– 235 Ritchken P, Wilson JG (1990) (m,T) group maintenance policies. Manag Sci 36:632–639 Rosi L, Salemme R (2001) Remote monitoring & diagnostic for rotating equipment. Working paper. General Electric Power Systems
472 Rotival L, Gai S, Canova M, Lancioni R, Rossi C (2001) Internet changes the rules of the game. Working paper. Nuovo Pignone, Florence Roversi A, Turco F (1974) Il problema dei ricambi negli impianti industriali: la valutazione dei fabbisogni. Angeli, Milan Ryan C, Joiner J, Cryer D (2005) MINITAB handbook. Thomson Brooks/Cole, Belmont Saranga H (2004) Opportunistic maintenance using genetic algorithms. J Qual Maint Eng 10(1):66–74 Savits TH (1988) A cost relationship between age and block replacement policies. J Appl Probab 25:789–796 Sharaf MA, Helmy HA (2001) A classification model for inventory management of spare parts. In: Proceedings of 7th international conference on production, Alexandria, Egypt, vol 1, pp 375–382 Sherif YS, Smith ML (1981) Optimal maintenance models for systems subject to failure – a review. Nav Res Logist Q 28:47–74 Sheu SH (1994) Extended block replacement policy with used item and general random minimal repair cost. Eur J Oper Res 79:405–416 Sheu SH (1998) A generalized age block replacement of a system subject to shocks. Eur J Oper Res 108:345–362 Sheu SH, Griffith WS (1996) Optimal number of minimal repairs before replacement of a system subject to shocks. Nav Res Logist 43:319–333 Smith DJ (2001) Reliability maintainability and risk. Newnes, London Smith R (2003) Benchmark maintenance organization effectiveness. Life Cycle Engineering, Charleston Smith WL (1954) Asymptotic renewal theorems. Proc R Soc Edinb A 64:9–48 Smith WL, Leadbetter MR (1963) On the renewal function for the Weibull distribution. Technometrics 5:393–396 Soland RM (1969) Availability of renewal functions for gamma and Weibull distributions with increasing hazard rate. Oper Res 17(3):536–543 Srinivasan VS (1966) The effect of standby redundancy in system’s failure with repair maintenance. Oper Res 14:1024– 1036 Suresh PV, Chaudhuri D (1994) Preventive maintenance scheduling for a system with assured reliability using fuzzy set theory. Int J Reliab Qual Saf Eng 1(4):497–513 Swanson L (2003) An information-processing model of maintenance management. Int J Prod Econ 83:45–64 Syntesos AA (2001) Forecasting of intermittent demand. PhD thesis, Buckinghamshire Business School, Brunel University Syntetos AA, Boylan JE (2001) On the bias of intermittent demand estimates. Int J Prod Econ 71:457–466 Syntetos AA, Boylan JE (2005) The accuracy of intermittent demand estimates. Int J Forecast 21:303–314 Syntetos AA, Boylan JE (2006) Comments on the attribution of an intermittent demand estimator. Int J Forecast 22:201–208 Syntetos AA, Boylan JE, Croston JD (2005) On the categorization of demand patterns. J Oper Res Soc 56:495–503 Tam ASB, Chan WM, Price JWH (2007) Maintenance scheduling to support the operation of manufacturing and production assets. Int J Adv Manuf Technol 34:399–405
Bibliography Tellus Institute (1992) Tellus packaging study – assessing the impacts of production and disposal of packaging and public policy measures, vol 1. Tellus Institute, Boston Thonemann U, Bradley J (2002) The effect of product variety on supply-chain performance. Eur J Oper Res 143(3):548–569 United States Securities and Exchange Commission (2000) Form 10-Q. http://www.continental.com/web/en-US/content/ company/investor/docs/continental_10q_2000_q3.pdf Usher JS, Kamal AH, Syed WH (1998) Cost optimal preventive maintenance and replacement scheduling. IIE Trans 30:1121–1128 Valdez-Flores C, Feldman RM (1989) A survey of preventive maintenance models for stochastically deteriorating singleunit systems. Nav Res Logist 36:419–446 Vaughn A, Fernandes P, Shields JT (2002) Manufacturing system design framework manual. Massachusetts Institute of Technology, Boston Vesely WE, Roberts NH, Haasl DF (1981) Fault tree handbook, system and reliability. Research Office of Nuclear Regulatory Research, US Nuclear Regulatory Commission, Washington Wang H (2002) A survey of maintenance policies of deteriorating systems. Eur J Oper Res 139:469–489 Wang H, Pham H (1996a) Optimal age-dependent preventive maintenance policies with imperfect maintenance. Int J Reliab Qual Saf Eng 3(2):119–135 Wang H, Pham H (1996b) A quasi renewal process and its applications in imperfect maintenance. Int J Syst Sci 27(10):1055–1062 Wang H, Pham H (1996c) Optimal maintenance policies for several imperfect repair models. Int J Syst Sci 27(6):543–549 Wang H, Pham H (1997) Optimal opportunistic maintenance of a k-out-of-n:G system. Int J Reliab Qual Saf Eng 4(4):369– 386 Wang H, Pham H (2006a) Reliability and maintenance of series systems subject to imperfect repair and correlated failure and repair. Eur J Oper Res 174:1706–1722 Wang H, Pham H (2006b) Reliability and optimal maintenance. Springer, Berlin Wang HZ, Pham H (1999) Some maintenance models and availability with imperfect maintenance in production systems. Ann Oper Res 91:305–318 Weisstein EW (2008) Incomplete gamma function at MathWorld. http://mathworld.wolfram.com/ IncompleteGammaFunction.html Willemain TR, Smart CN, Shockor JH, DeSautels PA (1994) Forecasting intermittent demand in manufacturing: a comparative evaluation of Croston’s method. Int J Forecast 10(4):529–538 Willemain TR, Smart CN, Schwarz HF (2004) A new approach to forecasting intermittent demand for service parts inventories. Int J Forecast 20(3):375–387 Williams TM (1984) Stock control with sporadic and slowmoving demand. J Oper Res Soc 35(10):939–948 Wintle JB, Kenzie BW, Amphlett GH, Smalley J (2001) Best practice for risk based inspection as a part of plant integrity management. Contract research report 363/2001. Prepared by TWA and Royal & Sun Alliance Engineering for the Health and Safety Executive
Bibliography Wortman MA, Klutke GA, Ayhan H (1994) A maintenance strategy for systems subjected to deterioration governed by random shocks. IEEE Trans Reliab 43:439–445 Yannaros N (1994) Weibull renewal processes. Ann Inst Stat Math 46(4):641–648 Zheng X (1995) All opportunity-triggered replacement policy for multiple-unit systems. IEEE Trans Reliab 44:648–652
473 Zhou X, Wang J (2005) A genetic method of LAD estimation for models with censored data. Comput Stat Data Anal 48(3):451–466 Zuckerman D (1989) Optimal inspection policy for a multi-unit machine. J Appl Probab 26:543–551 Zuo MJ, Liu B, Murthy DNP (2000) Replacement-repair policy for multi-state deteriorating products under warranty. Eur J Oper Res 123:519–530
Index
A accelerated life testing, 204 accident, 53, 54 active maintenance time, 320 adaptive-response-rate single exponential smoothing, 412 additive Holt–Winter, 412 ADI average interdemand interval, 410, 430 Aerospace, 21 age-based replacement policy, 319 AIAG FMEA-3, 221 airlines, 6, 417 alternating renewal process, 261, 345 analytic hierarchy process, 425 Anderson–Darling, 43 ANEC, 22 ARP5580, 221 Arrhenius, 205 as bad as first failure, 124 as good as new, 96 asset management, 196 asset register, 190 associative law, 243 attribute data, 33 automation, 8 automotive, 220, 221 autonomous maintenance, 74 autoregressive integrated moving average, 412 availability, 91, 113, 127 B basic event, 237 basic statistics, 89 bathtub curve, 94 Bellcore, 213 binomial distribution, 27, 48 binomial model, 412 Birnbaum, 294 block diagram, 156 block replacement, 399 block replacement policy, 319, 339 Boolean algebra, 239, 243
breakdown, 65, 67, 236, 316 British Standards Institution, 221 BS 5760, 221 C c-chart, 39 call cost, 320 capability analysis, 25, 40 capital equipment, 372 CAPP, 12 case studies, 117 catastrophic risks, 58 causes by occurrence analysis, 227 CEN standard, 19, 21, 60 censored data, 118, 135, 145 central limit theorem, 23 check lists, 59 closure production system, 446 CM downtime, 334 CMMS, 196 Coffin–Manson model, 206 cold standby, 180 comakership with suppliers, 13 combined parallel–series system, 170 combined series–parallel system, 168 common causes, 25, 309 commutative law, 243 complete failure data, 134 component, 88 computer-aided design, 10 computer-aided manufacturing, 10 computerized maintenance management system, ix, 189 condition based maintenance, 315, 454 conditional probability, 89 conditioning event, 237 confidence interval, 137 constant failure rate, 95, 97, 247 constant interval replacement policy, 319, 339 continuous dryer system, 187 continuous improvement, 18 control charts, 25, 464
475
476 conventional risks, 58 corrective, 67, 70 corrective actions, 227 corrective maintenance, 314 cost, 3, 203 cost control, 68 cost of emissions, 438 cost of failure, 438 cost of man work, 438 cost of materials and spare parts, 438 cost rate, 405 crew cost, 320 critical path method, 11 criticality, 294, 430 criticality matrix, 231, 234 Croston method, 412 cumulative distribution, 90 cumulative failure, 152 customer, 5, 18 CV2 squared coefficient of variation, 410, 430 cycle length, 333, 405 cycles of replacement, 369 D danger, 54 data collection, 83, 134, 191, 196 data mining, 12 data warehousing, 12 decision tree, 12 defect, 24, 50 defectives, 75 deferred maintenance, 315 degradation process, 400, 402 demand analysis, 10 density function, 90 dependent event, 311 design, 10 design FMEA (DFMEA), 220 design for assembly, 5 design for disassembly, 5 design for manufacturing, 5 design modification, 318 detection, 222, 225 DFA, 4 DFD, 5 DFM, 4 direct method, 136 discounted cash flow rate of return, 11 discrete random variable, 36 disjunction, 243 distinct causes, 240 distribution function, 36 distribution management, 13, 14 distributive law, 243 double exponential smoothing, 412 downing event criticality index, 159 downtime, 65, 115 drink vending machine, 221 duration of replacements, 336
Index E early wear out, 110 economic order quantity, 13 economic value added, 12 ECOS, 22 effects classification, 227 Efficiency, 183 EFTA, 19 elasticity, 3 electric power supplier, 252 electrical hazards, 55 electromigration model, 205 elementary inspection model, 376 emergency situation, 57 EN ISO 14121, 55 EN ISO 9000, 17, 19 enterprise resource program, 195 environment factor, 207 environmental standards, 21 equivalent fault tree (EFT), 244 equivalent reliability block diagram, 244 ergonomic hazards, 56 erratic demand, 411 expected cycle length, 323 expected number of failures (ENF), 113 expected overall performance, 43 expected within performance, 43 exponential distribution, 97 exponential smoothing, 10 exponential voltage model, 205 exponential weighted moving averages, 412 Eyring, 206 F failure event, 91 failure mode, 233 failure mode and effects analysis (FMEA), 222 failure mode, effects, and criticality analysis (FMECA), 220, 231 failure modes and effects analysis (FMEA), 220, 224 failure process, 90 failure rate databank (FARADA), 206 failure rate prediction, 97, 204, 211 failure replacement, 333 failure report, 191, 192 failure to danger, 57 father event, 236 fault finding, 317 fault tree analysis (FTA), 237, 239, 244, 263 FFR, 113 fire service, 60 first failure, 248 fit analysis, 118, 145 flexible automation, 9 flexible manufacturing system, 8 forecasting, 11, 410 forecasting accuracy, 416 functional scheme, 152
Index
477
functional unit, 133 Fussell–Vesely, 294
key characteristic, 24 KPI, 71, 353
G
L
gamma function, 110 Gantt, 11 golden section search method, 326 goodness of the fit, 106, 145 Government–Industry Data Exchange Program (GIDEP), 206 great risks, 58 group replacement, 339, 358
lamp replacement problem, 358 Laplace transform, 302 law of absorption, 243 lean manufacturing, 73 least-square, 136, 145 left censored data, 134 life cycle management, 5, 320 life data analysis, 133 life–stress relationships, 205 linear regression, 145 location allocation problem, 13 logistic delay, 320 loglogistic function, 454 lognormal distribution, 103, 104 lower control limit, 26 lower incomplete gamma function, 324 lower specification limit, 24 lumpy demand, 411
H harm, 54 hazard, 54, 57 hazard operability, 59 hazard rate, 92, 94 head protection, 60 health, 21, 51 hearing protectors, 60 heating system, 263 hospitals, 6 hot standby, 180 I idempotent law, 243 idle time, 319 IEC 812, 221 immediate maintenance, 315 imperfect maintenance, 388, 398 improved indirect method, 136 in control, 25 incinerator, 278 independent events, 90, 239 individual censored data, 134 industrial management, 5 infant mortality, 94, 110 information technology, 8 INHIBIT gate, 237 inspection maintenance, 317, 373, 381 inspection units, 37, 38 intermediate event, 237 intermittent demand, 410, 411 International Electrotechnical Commission, 221 interval censored data, 134 inventory control, 68, 196 inverse Laplace transform, 305 inverse power rule, 205 item criticality, 232 J J1739, 220 just in time, 13 K k-out-of-n parallel, 170 Kaplan–Meier, 120, 136
M M –P diagram, 58 magnitude, 54, 224 maintainability, 96 maintenance, 65, 71, 398 maintenance control, 66 maintenance cost, 334 maintenance global service, 83, 215 maintenance information system, 189, 196 maintenance management, 65, 77 maintenance planning, 66 maintenance status survey, 80 maintenance strategies, 66, 315, 398, 437 maintenance-free operating period, 390 manufacturing systems, 8 market investigation, 12 market uncertainty, 2 Markov analysis, 116, 301 Martin Titan Handbook, 206 material handling device design, 11 material/substance hazards, 56 maximum likelihood estimator, 136, 149 mean absolute deviation (MAD), 416 mean absolute percentage error (MAPE), 416 mean availability, 115 mean deviation (MD), 416 mean square deviation (MSD), 416 mean time to failure (MTTF), 95, 137 mean time to repair (MTTR), 96, 429 mechanical hazards, 55 median rank, 136 memoryless, 94 micro-stops, 74 MIL-STD-1629A, 220 MIL-STD-217, 206 minimal cut sets (MCS), 239
478 minimal repair, 371 minimum total cost method, 426 minimum total downtime, 355 mirrored blocks, 244 Monte Carlo simulation, 128, 157, 260, 275, 442 motorcycle manufacturer, 429 moving average, 10, 412 multiattribute spare tree analysis, 424 multiple censored data, 134 multiscenario analysis, 337 N net present value, 11 neural network, 145 noise hazards, 55 nonconformity, 24, 27 nonnormal probability, 46 nonparametric reliability evaluation, 101, 120 nonproduction cost, 320 nonrepairable component, 91 normal distribution, 41, 103 not conditional failure rate, 92 np-chart, 37 number of failures, 159 O occurrence–severity matrix, 227 on condition monitoring, 70 on-line counseling, 215 operating time, 319 opportunistic maintenance, 317, 393 ordinary free replacement, 407 OSHA, 53 out of control, 26 out of specification, 49 outsourcing, 83 overall equipment effectiveness OEE, 76 overhaul, 83, 316 P P-AND gate (priority AND gate), 237 p-chart, 35 parallel configuration, 161 Pareto chart, 227 part stress analysis, 207 payback analysis, 11 performance, 2 piping system, 236 planned replacement, 317 plant control, 68 plant layout, 12 PM downtime, 334 point availability, 115 Poisson distribution, 27, 38, 413 population, 23, 35 power rating factor, 207 PPM, 48 predetermined maintenance, 315
Index predictive maintenance, 72, 316, 439 prevention strategy, 60 preventive maintenance, 57, 314, 317, 333 pro rata warranty, 407 proactive, 72 probability distribution function, 90 probability of event, 238 probability plot, 101 process capability, 2 process design, 10 process FMEA (PFMEA), 220 product design, 10 product life cycle management, 5, 9, 320 product limit estimator method, 136 product mix, 2, 3, 5 production efficiency, 75 production planning, 14 production process, 66 production system, 2, 11, 13 production system design framework, 4 profit analysis, 12 profit per unit time maximization, 378 program evaluation and review technique, 11 project execution, 12 project planning, 11 protection, 54 protection strategy, 60 protective action, 57, 60, 63 purchase order, 196 Q quality audit, 19 quality control, 23, 68 quality factor, 207 quality management system, 18 R R-chart, 26 RADC, 212 radiation hazards, 56 radio-frequency identification, 9 RAMS, 72 random failures, 110 rank adjustment method, 136, 140 rapid wear out, 110 rate of quality, 75 RCM, 71 reactor explosion, 240 redundant system, 161, 171, 246, 302 refurbishment, 316 relevant accident, 58 reliability, 88 reliability based preventive maintenance, 316 reliability block diagram, 152 reliability database, 267 reliability function, 91 reliability libraries, 268 reliability of system, 153, 163, 434 reliability parameters evaluation, 133, 454
Index remote maintenance, 190, 214 renewal process, 113, 115, 340 repair process, 91, 95, 99, 248 repair time, 320 replacement, 317 replacement upon failure, 317 required time, 319 research for productivity, 2 residual risk, 59 restoration, 316, 346 right censored data, 134 risk, 53, 56 risk analysis, 54, 57, 222 risk priority number (RPN), 220 Rome Air Development Center (RADC), 206 running in period, 94 S s-chart, 30 SABE, 21 safety, 53 safety of machinery, 61 safety stock, 13 scheduled-basis preventive maintenance, 316 scheduling, 10 sequencing, 10 serial configuration, 153 service life period, 94 severity, 222, 232 shock damage, 400 simple standby system, 174 simulation, 11, 157 single exponential smoothing, 412 Six Sigma analysis, 48 six-pack capability analysis, 43 spare parts, 195, 295, 320, 409 spare parts forecasting, 411, 414 spare parts management, 7, 423, 426 specific/minor risks, 58 specification limit, 24 stakeholders, 4 standardized MAD (SMAD), 416 standardized normal distribution, 463 standby system, 180, 246, 319 state diagram, 157 static reliability importance analysis, 252 statistical quality control, 24 steady-state availability, 115 stochastic failure and repair process, 89, 95, 117 storage cost, 409 stress factor, 207 student distribution, 137, 465 successful configuration, 171 supply plant, 152 survival function, 92 switching device, 180
479 telemaintenance, 214 thermal hazards, 55 thermal water treatments, 51 three stress models, 206 time series, 10, 59 time series decomposition, 412 time to failure, 90 time to market, 3 time to repair, 90 time-based preventive maintenance, 316 time-dependent analysis, 180, 301 top event, 237, 239 top-down analysis, 233 total expected replacement cost per cycle, 323 TPM, 71, 73, 76 transfer out block, 265 transporation, 13 traveling scheduling procedures, 11 two temperature/voltage models, 205 two-state diagram, 91 type I model, 324, 328 type II model, 319, 343, 357 U unavailability, 247 UNI, 19 unlimited free replacement, 407 up/down analysis, 132, 157 uptime, 65 use-based preventive maintenance, 316 V variety reduction program, 7, 12 VED approach, 423 vehicle routing, 12, 13 Venn diagrams, 241 vibration hazards, 55 VRP, 7, 12 W warm standby, 307 warranty, 406, 407 waste to energy plant, 433 waste treatment, 277 water supplier system, 185 wear out, 94 Weibull distribution, 47, 110, 454 weighted moving averages, 412 what-if analysis, 12 wood panel manufacturing, 216 work order, 191
T
X
Telcordia, 213
x-chart, N 29