Advances in Intelligent and Soft Computing, Volume 56

Editor-in-Chief
Prof. Janusz Kacprzyk
Systems Research Institute
Polish Academy of Sciences
ul. Newelska 6
01-447 Warsaw
Poland
E-mail: [email protected]

Further volumes of this series can be found on our homepage: springer.com

Vol. 42. O. Castillo, P. Melin, O. Montiel Ross, R. Sepúlveda Cruz, W. Pedrycz, J. Kacprzyk (Eds.)
Theoretical Advances and Applications of Fuzzy Logic and Soft Computing, 2007
ISBN 978-3-540-72433-9

Vol. 43. K.M. Węgrzyn-Wolska, P.S. Szczepaniak (Eds.)
Advances in Intelligent Web Mastering, 2007
ISBN 978-3-540-72574-9

Vol. 44. E. Corchado, J.M. Corchado, A. Abraham (Eds.)
Innovations in Hybrid Intelligent Systems, 2007
ISBN 978-3-540-74971-4

Vol. 45. M. Kurzynski, E. Puchala, M. Wozniak, A. Zolnierek (Eds.)
Computer Recognition Systems 2, 2007
ISBN 978-3-540-75174-8

Vol. 46. V.-N. Huynh, Y. Nakamori, H. Ono, J. Lawry, V. Kreinovich, H.T. Nguyen (Eds.)
Interval / Probabilistic Uncertainty and Non-classical Logics, 2008
ISBN 978-3-540-77663-5

Vol. 47. E. Pietka, J. Kawa (Eds.)
Information Technologies in Biomedicine, 2008
ISBN 978-3-540-68167-0

Vol. 48. D. Dubois, M. Asunción Lubiano, H. Prade, M. Ángeles Gil, P. Grzegorzewski, O. Hryniewicz (Eds.)
Soft Methods for Handling Variability and Imprecision, 2008
ISBN 978-3-540-85026-7

Vol. 49. J.M. Corchado, F. de Paz, M.P. Rocha, F. Fernández Riverola (Eds.)
2nd International Workshop on Practical Applications of Computational Biology and Bioinformatics (IWPACBB 2008), 2009
ISBN 978-3-540-85860-7

Vol. 50. J.M. Corchado, S. Rodriguez, J. Llinas, J.M. Molina (Eds.)
International Symposium on Distributed Computing and Artificial Intelligence 2008 (DCAI 2008), 2009
ISBN 978-3-540-85862-1

Vol. 51. J.M. Corchado, D.I. Tapia, J. Bravo (Eds.)
3rd Symposium of Ubiquitous Computing and Ambient Intelligence 2008, 2009
ISBN 978-3-540-85866-9

Vol. 52. E. Avineri, M. Köppen, K. Dahal, Y. Sunitiyoso, R. Roy (Eds.)
Applications of Soft Computing, 2009
ISBN 978-3-540-88078-3

Vol. 53. E. Corchado, R. Zunino, P. Gastaldo, Á. Herrero (Eds.)
Proceedings of the International Workshop on Computational Intelligence in Security for Information Systems CISIS 2008, 2009
ISBN 978-3-540-88180-3

Vol. 54. B.-y. Cao, C.-y. Zhang, T.-f. Li (Eds.)
Fuzzy Information and Engineering, 2009
ISBN 978-3-540-88913-7

Vol. 55. Y. Demazeau, J. Pavón, J.M. Corchado, J. Bajo (Eds.)
7th International Conference on Practical Applications of Agents and Multi-Agent Systems (PAAMS 2009), 2009
ISBN 978-3-642-00486-5

Vol. 56. H. Wang, Y. Shen, T. Huang, Z. Zeng (Eds.)
The Sixth International Symposium on Neural Networks (ISNN 2009), 2009
ISBN 978-3-642-01215-0
Hongwei Wang, Yi Shen, Tingwen Huang, Zhigang Zeng (Eds.)
The Sixth International Symposium on Neural Networks (ISNN 2009)
Editors

Hongwei Wang
Department of Control Science and Engineering
Huazhong University of Science and Technology
No. 1037, Luoyu Road
Wuhan, Hubei, 430074
China

Yi Shen
Department of Control Science and Engineering
Huazhong University of Science and Technology
No. 1037, Luoyu Road
Wuhan, Hubei, 430074
China

Tingwen Huang
Texas A&M University at Qatar
PO Box 23874
Doha
Qatar
E-mail: [email protected]

Zhigang Zeng
Department of Control Science and Engineering
Huazhong University of Science and Technology
No. 1037, Luoyu Road
Wuhan, Hubei, 430074
China
E-mail: [email protected]

ISBN 978-3-642-01215-0
e-ISBN 978-3-642-01216-7
DOI 10.1007/978-3-642-01216-7

Advances in Intelligent and Soft Computing
ISSN 1867-5662

Library of Congress Control Number: Applied for

© 2009 Springer-Verlag Berlin Heidelberg
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India. Printed in acid-free paper 543210 springer.com
Preface
This volume of Advances in Soft Computing, together with vols. 5551, 5552 and 5553 of Lecture Notes in Computer Science, constitutes the Proceedings of the 6th International Symposium on Neural Networks (ISNN 2009) held in Wuhan, China during May 26–29, 2009. ISNN is a prestigious annual symposium on neural networks with past events held in Dalian (2004), Chongqing (2005), Chengdu (2006), Nanjing (2007) and Beijing (2008). Over the past few years, ISNN has matured into a well-established series of international conferences on neural networks and their applications to other fields. Following this tradition, ISNN 2009 provided an academic forum for the participants to disseminate their new research findings and discuss emerging areas of research. It also created a stimulating environment for the participants to interact and exchange information on future research challenges and opportunities of neural networks and their applications.

ISNN 2009 received 1,235 submissions from about 2,459 authors in 29 countries and regions (Australia, Brazil, Canada, China, Democratic People's Republic of Korea, Finland, Germany, Hong Kong, Hungary, India, Islamic Republic of Iran, Japan, Jordan, Macao, Malaysia, Mexico, Norway, Qatar, Republic of Korea, Singapore, Spain, Taiwan, Thailand, Tunisia, United Kingdom, United States, Venezuela, Vietnam, and Yemen) across six continents (Asia, Europe, North America, South America, Africa, and Oceania). Based on rigorous reviews by the Program Committee members and reviewers, 95 high-quality papers were selected for publication in this volume. These papers cover all major topics of the theoretical research, empirical study and applications of neural networks. In addition to the contributed papers, the ISNN 2009 technical program included five plenary speeches by Anthony Kuh (University of Hawaii at Manoa, USA), Jose C. Principe (University of Florida, USA), Leszek Rutkowski (Technical University of Czestochowa, Poland), Fei-Yue Wang (Institute of Automation, Chinese Academy of Sciences, China) and Cheng Wu (Tsinghua University, China). Furthermore, ISNN 2009 also featured five special sessions focusing on emerging topics of neural network research.

ISNN 2009 would not have achieved its success without the support and contributions of many volunteers and organizations. We would like to express our sincere thanks to the Huazhong University of Science and Technology, The Chinese University of Hong Kong, and the National Natural Science Foundation of China for their sponsorship, to the IEEE Wuhan Section, the IEEE Computational Intelligence Society, the International Neural Network Society, the Asia Pacific Neural Network Assembly, and the European Neural Network Society for their
technical co-sponsorship, and to the Systems Engineering Society of Hubei Province and the IEEE Hong Kong Joint Chapter on Robotics & Automation and Control Systems for their logistical cooperation. We would also like to sincerely thank the General Chair and General Co-Chairs for their overall organization of the symposium, the members of the Advisory Committee and Steering Committee for their invaluable assistance and guidance in enhancing the scientific level of the event, the members of the Program Committee and the additional reviewers for reviewing the papers, and the members of the Publications Committee for checking the accepted papers in a short period of time. In particular, we deeply appreciate Prof. Janusz Kacprzyk (Editor-in-Chief), Dr. Thomas Ditzinger (Senior Editor, Engineering/Applied Sciences) and the other Springer-Verlag staff for their help and collaboration in this demanding scientific publication project – it is always a great pleasure to work with them. There are still many more colleagues, associates, friends, and supporters who helped us in many ways; we would like to say "Thank you so much" to all of them. We would also like to express our heartfelt gratitude to the plenary and panel speakers for their vision and discussions on state-of-the-art research developments in the field as well as promising future research directions, opportunities, and challenges. Last but not least, we would like to express our most cordial thanks to all the authors of the papers constituting this volume of Advances in Soft Computing: it is the excellence of their research work that gives value to the book.
May 2009
Hongwei Wang Yi Shen Tingwen Huang Zhigang Zeng
Organization
General Chair Shuzi Yang, China
General Co-chairs Youlun Xiong, China Yongchuan Zhang, China
Advisory Committee Chairs Shoujue Wang, China Paul J. Werbos, USA
Advisory Committee Members Shun-ichi Amari, Japan Zheng Bao, China Tianyou Chai, China Guanrong Chen, China Shijie Cheng, China Ruwei Dai, China Jay Farrell, USA Chunbo Feng, China Russell Eberhart, USA David Fogel, USA Walter J. Freeman, USA Kunihiko Fukushima, Japan Marco Gilli, Italy Aike Guo, China Xingui He, China Zhenya He, China Petros Ioannou, USA Janusz Kacprzyk, Poland
Nikola Kasabov, New Zealand Okyay Kaynak, Turkey Frank L. Lewis, USA Deyi Li, China Yanda Li, China Chin-Teng Lin, Taiwan Robert J. Marks II, USA Erkki Oja, Finland Nikhil R. Pal, India Marios M. Polycarpou, USA Leszek Rutkowski, Poland Jennie Si, USA Youxian Sun, China Joos Vandewalle, Belgium DeLiang Wang, USA Fei-Yue Wang, USA Donald C. Wunsch II, USA Lei Xu, China Xin Yao, UK Gary G. Yen, USA Bo Zhang, China Nanning Zheng, China Jacek M. Zurada, USA
Steering Committee Chairs Jun Wang, Hong Kong Derong Liu, China
Steering Committee Members Jinde Cao, China Shumin Fei, China Chengan Guo, China Min Han, China Zeng-Guang Hou, China Xiaofeng Liao, China Bao-Liang Lu, China Fuchun Sun, China Zhang Yi, China Fuliang Yin, China Hujun Yin, UK Huaguang Zhang, China Jianwei Zhang, Germany
Organizing Committee Chairs Hongwei Wang, China Jianzhong Zhou, China Yi Shen, China
Program Committee Chairs Wen Yu, Mexico Haibo He, USA Nian Zhang, USA
Special Sessions Chairs Sanqing Hu, USA Youshen Xia, China Yunong Zhang, China
Publications Chairs Xiaolin Hu, China Minghui Jiang, China Qingshan Liu, China
Publicity Chairs Tingwen Huang, Qatar Paul S. Pang, New Zealand Changyin Sun, China
Finance Chair Xiaoping Wang, China
Registration Chairs Charlie C.L. Wang, China Zhenyuan Liu, China Weifeng Zhu, China
Local Arrangements Chairs Zhigang Zeng, China Chao Qi, China Liu Hong, China
Program Committee Members José Alfredo, Brazil Sabri Arik, Turkey Xindi Cai, USA Yu Cao, USA Matthew Casey, UK Emre Celebi, USA Jonathan Chan, Thailand Sheng Chen, UK Yangquan Chen, USA Ji-Xiang Du, China Hai-Bin Duan, China Andries Engelbrecht, South Africa Péter érdi, USA Jufeng Feng, China Chaojin Fu, China Wai Keung Fung, Canada Erol Gelenbe, UK Xinping Guan, China Chengan Guo, China Ping Guo, China Qing-Long Han, Australia Hanlin He, China Daniel Ho, Hong Kong Zhongsheng Hou, China Huosheng Hu, UK Jinglu Hu, Japan Junhao Hu, China Marc van Hulle, Belgium Danchi Jiang, Australia Haijun Jiang, China Shunshoku Kanae, Japan Rhee Man Kil, Republic of Korea Sungshin Kim, Korea Arto Klami, Finland Rakhesh Singh Kshetrimayum, India Hon Keung Kwan, Canada Chuandong Li, China Kang Li, UK Li Li, China Michael Li, Australia Ping Li, Hong Kong Shutao Li, China
Xiaoli Li, UK Xiaoou Li, Mexico Yangmin Li, Macao Hualou Liang, USA Jinling Liang, China Wudai Liao, China Alan Liew, Australia Ju Liu, China Li Liu, USA Meiqin Liu, China Wenxin Liu, USA Yan Liu, USA Jianquan Lu, Hong Kong Jinhu Lu, China Wenlian Lu, China Jinwen Ma, China Ikuko Nishkawa, Japan Seiichi Ozawa, Japan Jaakko Peltonen, Finland Juan Reyes, Mexico Jose de Jesus Rubio, Mexico Eng. Sattar B. Sadkhan, Iraq Gerald Schaefer, UK Michael Small, Hong Kong Qiankun Song, China Humberto Sossa, Mexico Bingyu Sun, China Norikazu Takahashi, Japan Manchun Tan, China Ying Tan, China Christos Tjortjis, UK Michel Verleysen, Belgium Bing Wang, UK Dan Wang, China Dianhui Wang, Australia Meiqing Wang, China Rubin Wang, China Xin Wang, China Zhongsheng Wang, China Jinyu Wen, China Wei Wu, China Degui Xiao, China
Rui Xu, USA Yingjie Yang, UK Kun Yuan, China
Xiaoqin Zeng, China Jie Zhang, UK Liqing Zhang, China
Publications Committee Members Guici Chen Huangqiong Chen Shengle Fang Lizhu Feng Junhao Hu Feng Jiang Bin Li Yanling Li Mingzhao Li Lei Liu Xiaoyang Liu Cheng Wang Xiaohong Wang
Zhikun Wang Shiping Wen Ailong Wu Yongbo Xia Li Xiao Weina Yang Zhanying Yang Tianfeng Ye Hongyan Yin Lingfa Zeng Yongchang Zhang Yongqing Zhao Song Zhu
Technical Committee Members Helena Aidos Antti Ajanki Tholkappia AraSu Hyeon Bae Tao Ban Li Bin Binghuang Cai Lingru Cai Xindi Cai Qiao Cai Chao Cao Hua Cao Jinde Cao Kai Cao Wenbiao Cao Yuan Cao George Cavalcanti Lei Chang Mingchun Chang Zhai Chao Cheng Chen Gang Chen Guici Chen
Ke Chen Jiao Chen Lei Chen Ming Chen Rongzhang Chen Shan Chen Sheng Chen Siyue Chen TianYu Chen Wei Chen Xi Chen Xiaochi Chen Xiaofeng Chen XinYu Chen Xiong Chen Xuedong Chen Yongjie Chen Zongzheng Chen Hao Cheng Jian Cheng Long Cheng Zunshui Cheng Rong Chu
Bianca di Angeli C.S. Costa Jose Alfredo Ferreira Costa Dadian Dai Jianming Dai Jayanta Kumar Debnath Spiros Denaxas Chengnuo Deng Gang Deng Jianfeng Deng Kangfa Deng Zhipo Deng Xiaohua Ding Xiuzhen Ding Zhiqiang Dong Jinran Du Hongwu Duan Lijuan Duan Xiaopeng Duan Yasunori Endo Andries Engelbrecht Tolga Ensari Zhengping Fan Fang Fang Haitao Fang Yuanda Fang June Feng Lizhu Feng Yunqing Feng Avgoustinos Filippoupolitis Liang Fu Ruhai Fu Fang Gao Lei Gao Ruiling Gao Daoyuan Gong Xiangguo Gong Fanji Gu Haibo Gu Xingsheng Gu Lihe Guan Jun Guo Songtao Guo Xu Guo Fengqing Han Pei Han Qi Han
Weiwei Han Yishan Han Yunpeng Han Hanlin He Jinghui He Rui He Shan He Tonejun He Tongjun He Wangli He Huosheng Hu Li Hong Liu Hong Ruibing Hou Cheng Hu Jin Hu Junhao Hu Hao Hu Hui Hu Ruibin Hu Sanqing Hu Xiaolin Hu Xiaoyan Hu Chi Huang Darong Huang Diqiu Huang Dongliang Huang Gan Huang Huayong Huang Jian Huang Li Huang Qifeng Huang Tingwen Huang Zhangcan Huang Zhenkun Huang Zhilin Huang Rey-Chue Hwang Sae Hwang Hui Ji Tianyao Ji Han Jia Danchi Jiang Shaobo Jiang Wei Jiang Wang Jiao Xianfa Jiao
Yiannis Kanellopoulos Wenjing Kang Anthony Karageorgos Masanori KaWakita Haibin Ke Seong-Joo Kim Peng Kong Zhanghui Kuang Lingcong Le Jong Min Lee Liu Lei Siyu Leng Bing Li Changping Li Chuandong Li Hui Li Jian Li Jianmin Li Jianxiang Li Kelin Li Kezan Li Lei Li Li Li Liping Li Lulu Li Ming Li Na Li Ping Li Qi Li Song Li Weiqun Li Wenlong Li Wentian Li Shaokang Li Shiying Li Tian Li Wei Li Wu Li Xiang Li Xiaoli Li Xiaoou Li Xin Li Xinghai Li Xiumin Li Yanlin Li Yanling Li
Yong Li Yongfei Li Yongmin Li Yuechao Li Zhan Li Zhe Li Jinling Liang Wudai Liao Wei Lin Zhihao Lin Yunqing Ling Alex Liu Bo Liu Da Liu Dehua Li Dayuan Liu Dongbing Liu Desheng Liu F.C. Liu Huaping Liu Jia Liu Kangqi Liu Li Liu Ming Liu Qian Liu Qingshan Liu Shangjin Liu Shenquan Liu Shi Liu Weiqi Liu Xiaoyang Liu Xiuquan Liu Xiwei Liu XinRong Liu Yan Liu Yang Liu Yawei Liu Yingju Liu Yuxi Liu Zhenyuan Liu Zijian Liu Yimin Long Georgios Loukas Jinhu Lu Jianquan Lu Wen Lu
Wenlian Lu Wenqian Lu Tongting Lu Qiuming Luo Xucheng Luo Chaohua Ma Jie Ma Liefeng Ma Long Ma Yang Ma Zhiwei Ma Xiaoou Mao Xuehui Mei Xiangpei Meng Xiangyu Meng Zhaohui Meng Guo Min Rui Min Yuanneng Mou Junichi Murata Puyan Nie Xiushan Nie Gulay Oke Ming Ouyang Yao Ouyang Seiichi Ozawa Neyir Ozcan Joni Pajarinen Hongwei Pan Linqiang Pan Yunpeng Pan Tianqi Pang Kyungseo Park Xiaohan Peng Zaiyun Peng Gao Pingan Liquan Qiu Jianlong Qiu Tapani Raiko Congjun Rao Fengli Ren Jose L. Rosseilo Gongqin Ruan Quan Rui Sattar B. Sadkhan Renato Jose Sassi Sassi
Sibel Senan Sijia Shao Bo Shen Enhua Shen Huayu Shen Meili Shen Zifei Shen Dianyang Shi Jinrui Shi Lisha Shi Noritaka Shigei Atsushi Shimada Jiaqi Song Wen Song Yexin Song Zhen Song Zhu Song Gustavo Fontoura de Souza Kuo-Ho Su Ruiqi Su Cheng Sun Dian Sun Junfeng Sun Lisha Sun Weipeng Sun Yonghui Sun Zhaowan Sun Zhendong Sun Manchun Tan Xuehong Tan Yanxing Tan Zhiguo Tan Bing Tang Hao Tang Yili Tang Gang Tian Jing Tian Yuguang Tian Stelios Timotheou Shozo Tokinaga Jun Tong Joaquin Torres Sospedra Hiroshi Wakuya Jin Wan B.H. Wang Cheng Wang
Fan Wang Fen Wang Gang Wang Gaoxia Wang Guanjun Wang Han Wang Heding Wang Hongcui Wang Huayong Wang Hui Wang Huiwei Wang Jiahai Wang Jian Wang Jin Wang Juzhi Wang Kai Wang Lan Wang Lili Wang Lu Wang Qilin Wang Qingyun Wang Suqin Wang Tian Wang Tianxiong Wang Tonghua Wang Wei Wang Wenjie Wang Xiao Wang Xiaoping Wang Xiong Wang Xudong Wang Yang Wang Yanwei Wang Yao Wang Yiping Wang Yiyu Wang Yue Wang Zhanshan Wang Zhengxia Wang Zhibo Wang Zhongsheng Wang Zhihui Wang Zidong Wang Zhuo Wang Guoliang Wei Li Wei
Na Wei Shuang Wei Wenbiao Wei Yongchang Wei Xiaohua Wen Xuexin Wen Junmei Weng Yixiang Wu You Wu Huaiqin Wu Zhihai Wu Bin Xia Weiguo Xia Yonghui Xia Youshen Xia Zhigu Xia Zhiguo Xia Xun Xiang Chengcheng Xiao Donghua Xiao Jiangwen Xiao Yongkang Xiao Yonkang Xiao Yong Xie Xiaofei Xie Peng Xin Chen Xiong Jinghui Xiong Wenjun Xiong Anbang Xu Chen Xu Hesong Xu Jianbing Xu Jin Xu Lou Xu Man Xu Xiufen Yu Yan Xu Yang Xu Yuanlan Xu Zhaodong Xu Shujing Yan Dong Yang Fan Yang Gaobo Yang Lei Yang
Sihai Yang Tianqi Yang Xiaolin Yang Xing Yang Xue Yang Yang Yang Yongqing Yang Yiwen Yang Hongshan Yao John Yao Xianfeng Ye Chenfu Yi Aihua Yin Lewen Yin Qian Yin Yu Ying Xu Yong Yuan You Shuai You Chenglong Yu Liang Yu Lin Yu Liqiang Yu Qing Yu Yingzhong Yu Zheyi Yu Jinhui Yuan Peijiang Yuan Eylem Yucel Si Yue Jianfang Zeng Lingjun Zeng Ming Zeng Yi Zeng Zeyu Zhang Zhigang Zeng Cheng Zhang Da Zhang Hanling Zhang Haopeng Zhang Kaifeng Zhang Jiacai Zhang Jiajia Zhang Jiangjun Zhang Jifan Zhang Jinjian Zhang
Liming Zhang Long Zhang Qi Zhang Rui Zhang Wei Zhang Xiaochun Zhang Xiong Zhang Xudong Zhang Xuguang Zhang Yang Zhang Yangzhou Zhang Yinxue Zhang Yunong Zhang Zhaoxiong Zhang YuanYuan Bin Zhao Jin Zhao Le Zhao Leina Zhao Qibin Zhao Xiaquan Zhao Zhenjiang Zhao Yue Zhen Changwei Zheng Huan Zheng Lina Zheng Meijun Zheng Quanchao Zheng Shitao Zheng Ying Zheng Xun Zheng Lingfei Zhi Ming Zhong Benhai Zhou Jianxiong Zhou Jiao Zhou Jin Zhou Jinnong Zhou Junming Zhou Lin Zhou Rong Zhou Song Zhou Xiang Zhou Xiuling Zhou Yiduo Zhou Yinlei Zhou
Yuan Zhou Zhenqiao Zhou Ze Zhou Zhouliu Zhou Haibo Zhu Ji Zhu
Jiajun Zhu Tanyuan Zhu Zhenqian Zhu Song Zhu Xunlin Zhu Zhiqiang Zuo
Contents
Session 1: Theoretical Analysis

The Initial Alignment of SINS Based on Neural Network .......................... 1
Tingjun Li

Analysis on Basic Conceptions and Principles of Human Cognition .......................... 7
Xiaorui Zhang, Minyong Li, Zhong Liu, Meng Zhang

Global Exponential Stability for Discrete-Time BAM Neural Network with Variable Delay .......................... 19
Xiaochun Lu

The Study of Project Cost Estimation Based on Cost-Significant Theory and Neural Network Theory .......................... 31
Xinzheng Wang, Liying Xing, Feng Lin

Global Exponential Stability of High-Order Hopfield Neural Networks with Time Delays .......................... 39
Jianlong Qiu, Quanxin Cheng

Improved Particle Swarm Optimization for RCP Scheduling Problem .......................... 49
Qiang Wang, Jianxun Qi

Exponential Stability of Reaction-Diffusion Cohen-Grossberg Neural Networks with S-Type Distributed Delays .......................... 59
Yonggui Kao, Shuping Bao

Global Exponential Robust Stability of Static Reaction-Diffusion Neural Networks with S-Type Distributed Delays .......................... 69
Shuping Bao

A LEC-and-AHP Based Hazard Assessment Method in Hydroelectric Project Construction .......................... 81
Jian-lan Zhou, Da-wei Tang, Xian-rong Liu, Sheng-yu Gong

A Stochastic Lotka-Volterra Model with Variable Delay .......................... 91
Yong Xu, Song Zhu, Shigeng Hu
Extreme Reformulated Radial Basis Function Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 Gexin Bi, Fang Dong Research of Nonlinear Combination Forecasting Model for Insulators ESDD Based on Wavelet Neural Network . . . . . . . . . 111 Haiyan Shuai, Jun Wu, Qingwu Gong Parameter Tuning of MLP Neural Network Using Genetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 Meng Joo Er, Fan Liu Intelligent Grid of Computations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Samia Jones Method of Solving Matrix Equation and Its Applications in Economic Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 Qingfang Zhang, June Liu Efficient Feature Selection Algorithm Based on Difference and Similitude Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143 Weibing Wu, Zhangyan Xu, June Liu Exponential Stability of Neural Networks with Time-Varying Delays and Impulses . . . . . . . . . . . . . . . . . . . . . . . . . . 153 Haydar Ak¸ca, Val´ery Covachev, Kumud Singh Altmayer
Session 2: Machine Learning Adaptive Higher Order Neural Networks for Effective Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 Shuxiang Xu, Ling Chen
Exploring Cost-Sensitive Learning in Domain Based Protein-Protein Interaction Prediction . . . . . . . . . . . . . . . . . . . . . . . 175 Weizhao Guo, Yong Hu, Mei Liu, Jian Yin, Kang Xie, Xiaobo Yang An Efficient and Fast Algorithm for Estimating the Frequencies of 2-D Superimposed Exponential Signals in Presence of Multiplicative and Additive Noise . . . . . . . . . . . . . . . 185 Jiawen Bian, Hongwei Li, Huiming Peng, Jing Xing An Improved Greedy Based Global Optimized Placement Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 Luo Zhong, Kejing Wang, Jingling Yuan, Jingjing He An Alternative Fast Learning Algorithm of Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 Pin-Hsuan Weng, Chih-Chien Huang, Yu-Ju Chen, Huang-Chu Huang, Rey-Chue Hwang Computer Aided Diagnosis of Alzheimer’s Disease Using Principal Component Analysis and Bayesian Classifiers . . . . . . 213 ´ Miriam L´ opez, Javier Ram´ırez, Juan M. G´ orriz, Ignacio Alvarez, Diego Salas-Gonzalez, Fermin Segovia, Carlos Garc´ıa Puntonet Margin-Based Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223 Bai Su, Wei Xu, Yidong Shen
Session 3: Support Vector Machines and Kernel Methods Nonlinear Dead Zone System Identification Based on Support Vector Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235 Jingyi Du, Mei Wang A SVM Model Selection Method Based on Hybrid Genetic Algorithm and Empirical Error Minimization Criterion . . . . . . 245 Xin Zhou, Jianhua Xu An SVM-Based Mandarin Pronunciation Quality Assessment System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255 Fengpei Ge, Fuping Pan, Changliang Liu, Bin Dong, Shui-duen Chan, Xinhua Zhu, Yonghong Yan An Quality Prediction Method of Injection Molding Batch Processes Based on Sub-Stage LS-SVM . . . . . . . . . . . . . . . . . . . . . . 267 XiaoPing Guo, Chao Zhang, Li Wang, Yuan Li
Soft Sensing for Propylene Purity Using Partial Least Squares and Support Vector Machine . . . . . . . . . . . . . . . . . . . . . . . . 275 Zhiru Xu, Desheng Liu, Jingguo Zhou, Qingjun Shi Application of Support Vector Machines Method in Credit Scoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283 Leilei Zhang, Xiaofeng Hui
Session 4: Pattern Recognition Improving Short Text Clustering Performance with Keyword Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291 Jun Wang, Yiming Zhou, Lin Li, Biyun Hu, Xia Hu Nonnative Speech Recognition Based on Bilingual Model Modification at State Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299 Qingqing Zhang, Jielin Pan, Shui-duen Chan, Yonghong Yan Edge Detection Based on a PCNN-Anisotropic Diffusion Synergetic Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311 Mario I. Chacon-Murguia, Mitzy Nevarez-Aguilar, Angel Licon-Trillo, Oscar Mendoza-Vida˜ na, J. Alejandro Martinez-Ibarra, Francisco J. Solis-Martinez, Lucina Cordoba Fierro Automatic Face Recognition Systems Design and Realization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323 Zhiming Qian, Chaoqun Huang, Dan Xu Multi-view Face Detection Using Six Segmented Rectangular Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333 Jean Paul Niyoyita, Zhao Hui Tang, Jin Ping Liu Level Detection of Raisins Based on Image Analysis and Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343 Xiaoling Li, Jimin Yuan, Tianxiang Gu, Xiaoying Liu English Letters Recognition Based on Bayesian Regularization Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351 Xiaoli Huang, Huanglin Zeng Iris Disease Classifying Using Neuro-Fuzzy Medical Diagnosis Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359 Sara Moein, Mohamad Hossein Saraee, Mahsa Moein
An Approach to Dynamic Gesture Recognition for Real-Time Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369 Jinli Zhao, Tianding Chen Dynamic Multiple Pronunciation Incorporation in a Refined Search Space for Reading Miscue Detection . . . . . . . . . 379 Changliang Liu, Fuping Pan, Fengpei Ge, Bin Dong, Shuiduen Chen, Yonghong Yan Depicting Diversity in Rules Extracted from Ensembles . . . . . 391 Fabian H.P. Chan, A. Chekima, Augustina Sitiol, S.M.A. Kalaiarasi A New Statistical Model for Radar HRRP Target Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401 Qingyu Hou, Feng Chen, Hongwei Liu, Zheng Bao Independent Component Analysis of SPECT Images to Assist the Alzheimer’s Disease Diagnosis . . . . . . . . . . . . . . . . . . . . 411 ´ Ignacio Alvarez, Juan M. G´ orriz, Javier Ram´ırez, Diego Salas-Gonzalez, Miriam L´ opez, Carlos Garc´ıa Puntonet, Fermin Segovia The Multi-Class Imbalance Problem: Cost Functions with Modular and Non-Modular Neural Networks . . . . . . . . . . . . . . . 421 Roberto Alejo, Jose M. Sotoca, R.M. Valdovinos, Gustavo A. Casa˜ n Geometry Algebra Neuron Based on Biomimetic Pattern Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433 Wenming Cao, Feng Hao A Novel Matrix-Pattern-Oriented Ho-Kashyap Classifier with Locally Spatial Smoothness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441 Zhe Wang, Songcan Chen, Daqi Gao An Integration Model Based on Non-classical Receptive Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451 Xiaomei Wang, Hui Wei Classification of Imagery Movement Tasks for Brain-Computer Interfaces Using Regression Tree . . . . . . . . . . . 461 Chiman Wong, Feng Wan MIDBSCAN: An Efficient Density-Based Clustering Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469 Cheng-Fa Tsai, Chun-Yi Sung
Detection and Following of a Face in Movement Using a Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481 Jaime Pacheco Mart´ınez, Jos´e de Jes´ us Rubio Avila, Javier Guillen Campos
Session 5: Intelligent Modelling and Control Nonparametric Inter-Quartile Range for Error Evaluation and Correction of Demand Forecasting Model under Short Product Lifecycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491 Wen-Rong Li, Bo Li Simulated Annealing and Crowd Dynamics Approaches for Intelligent Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501 Qingpeng Zhang Accomplishing Station Keeping Mode for Attitude Orbit Control Subsystem Designed for T-SAT . . . . . . . . . . . . . . . . . . . . . 507 Montenegro Salom´ on, Am´ezquita Kendrick Nonlinear System Identification Based on Recurrent Wavelet Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517 Fengyao Zhao, Liangming Hu, Zongkun Li Approximation to Nonlinear Discrete-Time Systems by Recurrent Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 527 Fengjun Li Model-Free Control of Nonlinear Noise Processes Based on C-FLAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535 Yali Zhou, Qizhi Zhang, Xiaodong Li, Woonseng Gan An Empirical Study of the Artificial Neural Network for Currency Exchange Rate Time Series Prediction . . . . . . . . . . . . 543 Pin-Chang Chen, Chih-Yao Lo, Hung-Teng Chang Grey Prediction with Markov-Chain for Crude Oil Production and Consumption in China . . . . . . . . . . . . . . . . . . . . . . 551 Hongwei Ma, Zhaotong Zhang Fabric Weave Identification Based on Cellular Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563 Suyi Liu, Qian Wan, Heng Zhang Cutting Force Prediction of High-Speed Milling Hardened Steel Based on BP Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . 571 Yuanling Chen, Weiren Long, Fanglan Ma, Baolei Zhang
BP Neural Networks Based Soft Measurement of Rheological Properties of CWS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579 Dong Xie, Changxi Li A Parameters Self-adjusting ANN-PI Controller Based on Homotopy BP Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 587 Shuguang Liu, Mingyuan Liu
Session 6: Optimization and Genetic Algorithms Study on Optimization of the Laser Texturing Surface Morphology Parameters Based on ANN . . . . . . . . . . . . . . . . . . . . . 597 Zhigao Luo, Binbin Fan, Xiaodong Guo, Xiang Wang, Ju Li A Combined Newton Method for Second-Order Cone Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605 Xiaoni Chi, Jin Peng MES Scheduling Optimization and Simulation Based on CAPP/PPC Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 613 Yan Cao, Ning Liu, Lina Yang, Yanli Yang An Improved Diversity Guided Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 623 Dongsheng Xu, Xiaoyan Ai Research on Intelligent Diagnosis of Mechanical Fault Based on Ant Colony Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631 Zhousuo Zhang, Wei Cheng, Xiaoning Zhou A New Supermemory Gradient Method without Line Search for Unconstrained Optimization . . . . . . . . . . . . . . . . . . . . . . 641 June Liu, Huanbin Liu, Yue Zheng A Neural Network Approach for Solving Linear Bilevel Programming Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 649 Tiesong Hu, Bing Huang, Xiang Zhang Fuzzy Solution for Multiple Targets Optimization Based on Fuzzy Max-Min Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 659 Pengfei Peng, Jun Xing, Xuezhi Fu Fixed-Structure Mixed Sensitivity/Model Reference Control Using Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . . 669 Pitsanu Srithongchai, Piyapong Olranthichachat, Somyot Kaitwanidvilai
Session 7: Telecommunication and Transportation Systems ANN-Based Multi-scales Prediction of Self-similar Network Traffic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 677 Yunhua Rao, Lujuan Ma, Cuncheng Zhao, Yang Cao Application of DM and Combined Grey Neural Network in E-Commerce Data Transmission . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 685 Zhiming Qu Application of Prediction Model in Monitoring LAN Data Flow Based on Grey BP Neural Network . . . . . . . . . . . . . . . . . . . . 693 Zhiming Qu Monitoring ARP Attack Using Responding Time and State ARP Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 701 Zhenqi Wang, Yu Zhou A Study of Multi-agent Based Metropolitan Demand Responsive Transport Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 711 Jin Xu, Weiming Yin, Zhe Huang
Sesson 8: Applications The Diagnosis Research of Electric Submersible Pump Based on Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 721 Ding Feng, Cheng Yang, Bianyou Tan, Guanjun Xu, Yongxin Yuan, Peng Wang The Application of BP Feedforward Neural Networks to the Irradiation Effects of High Power Microwave . . . . . . . . . . . . 729 Tingjun Li A Novel Model for Customer Retention . . . . . . . . . . . . . . . . . . . . . 739 Yadan Li, Xu Xu, Panida Songram Neural Network Ensemble Approach in Analog Circuit Fault Diagnosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 749 Hong Liu, Guangju Chen, Guoming Song, Tailin Han Research on Case Retrieval of Case-Based Reasoning of Motorcycle Intelligent Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 759 Fanglan Ma, Yulin He, Shangping Li, Yuanling Chen, Shi Liang
Improving Voice Search Using Forward-Backward LVCSR System Combination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 769 Ta Li, Changchun Bao, Weiqun Xu, Jielin Pan, Yonghong Yan Agent Oriented Programming for Setting Up the Platform for Processing EEG / ECG / EMG Waveforms . . . . . . . . . . . . . . 779 Tholkappia Arasu Govindarajan, Mazin Al-Hadidi, Palanisamy V. A Forecasting Model of City Freight Volume Based on BPNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 791 Peina Wen, Zhiyong Zhang The Estimations of Mechanical Property of Rolled Steel Bar by Using Quantum Neural Network . . . . . . . . . . . . . . . . . . . . . 799 Jen-Pin Yang, Yu-Ju Chen, Huang-Chu Huang, Sung-Ning Tsai, Rey-Chue Hwang Diagnosis of Epilepsy Disorders Using Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 807 Anupam Shukla, Ritu Tiwari, Prabhdeep Kaur Neural Forecasting Network for the Market of Pleione Formosana Hayata Orchid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817 Chih-Yao Lo, Cheng-I Hou, Tian-Syung Lan Harmonic Current Detection Based on Neural Network Adaptive Noise Cancellation Technology . . . . . . . . . . . . . . . . . . . . . 829 Ziqiang Xi, Ruili Tang, Wencong Huang, Dandan Huang, Lizhi Zheng, Pan Shen Study on Dynamic Relation between Share Price Index and Housing Price: Co-integration Analysis and Application in Share Price Index Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 837 Jin Peng Application of RBF and Elman Neural Networks on Condition Prediction in CBM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 847 Chao Liu, Dongxiang Jiang, Minghao Zhao Judging the States of Blast Furnace by ART2 Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 857 Zhiling Lin, Youjun Yue, Hui Zhao, Hongru Li Research on Dynamic Response of Riverbed Deformation Based on Theory of BP Neural Network . . . . . . . . . . . . . . . . . . . . . 865 Qiang Zhang, Xiaofeng Zhang, Juanjuan Wu
Adaboosting Neural Networks for Credit Scoring . . . . . . . . . . . . 875 Ligang Zhou, Kin Keung Lai An Enterprise Evaluation of Reverse Supply Chain Based on Ant Colony BP Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . 885 Ping Li, Xuhui Xia, Zhengguo Dai Ultrasonic Crack Size Estimation Based on Wavelet Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 893 Yonghong Zhang, Lihua Wang, Honglian Zhu Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 901
The Initial Alignment of SINS Based on Neural Network Tingjun Li 1
Abstract. The Strap-down Inertial Navigation System (SINS) is the development direction of inertial navigation technology. Its working precision depends not only on the inertial measurement units, but also on the initial alignment. This paper applies the N-Tupple neural network to the initial alignment, gives the principal algorithm and the training algorithm of the N-Tupple network, and presents a computer simulation. The simulation shows that the alignment precision of the neural network method is higher than that of the normal methods, and the greatly reduced alignment time shows the advantage of the new method.

Keywords: Strap-down Inertial Navigation System (SINS), N-Tupple neural network, Training algorithm of N-Tupple.
1 Introduction

Modern intelligence techniques have been successfully used in both pattern recognition and function approximation tasks, and have been used in the initial alignment of INS. Existing approaches adopt BP neural networks as well as RBF neural networks to carry out the initial alignment of INS. This technique simplifies the algebraic structure of the system operation, the real-time performance of the system is better than that of a distributed Kalman filter, and the precision is almost the same as that of the Kalman filter. N-Tupple neural networks have been successfully applied to both pattern recognition and function approximation tasks. Their main advantages include a single-layer structure, the capability of realizing highly non-linear mappings and simplicity of operation. This network is capable of approximating complex probability density functions (PDFs) and deterministic arbitrary function mappings. The main advantage of the N-Tupple network is the fact that the training set points are stored by the network implicitly, rather than explicitly, and thus the operation speed remains constant and independent of the training set size. Therefore, the network performance can be guaranteed in practical implementations. This paper uses the N-Tupple network to realize the initial alignment of SINS; the computer simulation results show that this method meets engineering needs.

Tingjun Li
Naval Aeronautical and Astronautical University, Yantai 264001, China
E-mail: [email protected]
2 Algorithm of N-Tupple Network

We consider a general system taking a D-dimensional real-valued vector, X, as its input and producing a scalar real-valued output, Y (a scalar rather than vector output form is considered for simplicity). The input and output are realizations of random variables X and Y, respectively. It is assumed that X and Y are distributed according to a continuous joint probability density function (pdf). We seek to find an input/output relationship of the system in terms of the regression or a conditional mean of the dependent variable Y for any particular value of the input.
m(x) = E\{Y \mid x\} = E\{Y \mid X = x\} : \Re^D \to \Re \qquad (1)
where it is assumed that the conditional mean exists, i.e., \forall x:\ |m(x)| < \infty. For a known underlying PDF the regression function is given by

m(x) = E\{Y \mid x\} = \int_{-\infty}^{\infty} y \cdot f(y \mid x)\, dy \qquad (2)
And for any particular (x, y) pair generated by the system, y = m(x) + \varepsilon, where the random error component, \varepsilon, disappears in the average (i.e., m(x) = E\{Y \mid x\}). However, when no explicit knowledge about the system is available, the regression function can only be estimated from a finite set of points drawn according to its distribution. Regression analysis plays a major role in statistics. In this work we are concerned only with one kind of non-parametric regression, based on the kernel method for probability density estimation. From the definition of the conditional mean it is apparent that if an estimate of the system joint PDF was available, it could be used directly for estimating the regression function. The kernel method provides a means of estimating f(x, y) with no assumptions being made about its form, allowing approximation of the regression function in the general case. The univariate, real and even kernel function, \varphi(x), satisfies the following conditions. Additionally, if h_T denotes a smoothing parameter (also called bandwidth or window width of the kernel function), dependent on the number of training samples, T, and satisfying the condition

\lim_{T \to \infty} h_T = 0, \qquad \lim_{T \to \infty} T \cdot h_T = \infty \qquad (3)
Then the estimator, \hat{f}(x), given by

\hat{f}(x) = \sum_{i=1}^{T} \varphi\!\left(\frac{x - x^i}{h_T}\right) \Big/ \left[ T \cdot \int_{-\infty}^{\infty} \varphi(x)\, dx \right] \qquad (4)
approaches asymptotically the univariate distribution density f(x); this provides a consistent and asymptotically unbiased estimate of \hat{f}(x). The kernel function itself satisfies

\varphi(x) \geq 0, \qquad \int_{\Re} \varphi(x)\, dx = 1 \qquad (5)
For the joint input/output density a product kernel is used,

\varphi(x, y) = \varphi_x(x)\, \varphi_y(y) \qquad (6)

where \varphi_y(y) is a univariate kernel function satisfying the conditions (3) and (6). Thus, the estimate of the joint PDF is

\hat{f}(X, y) = \frac{1}{T} \sum_{i=1}^{T} \Phi_X(X - X^i) \cdot \varphi_y(y - y^i) \qquad (7)
And the regression function can be estimated as

\hat{E}(Y \mid X) = \frac{\sum_{i=1}^{T} y^i \cdot \Phi_X(X - X^i)}{\sum_{i=1}^{T} \Phi_X(X - X^i)} \qquad (8)
where \Phi_X(X) satisfies the following boundedness condition:

\sup_{X \in \Re^D} \Phi(X) < \infty \qquad (9)
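To make the kernel estimators of Eqs. (4), (7) and (8) concrete, the short sketch below evaluates the regression estimate (8) in the Nadaraya-Watson manner. It is a minimal sketch, not the authors' implementation: the Gaussian kernel, the fixed bandwidth h and the toy data are assumptions chosen only for illustration.

```python
import numpy as np

def kernel_regression(X_train, y_train, X_query, h):
    """Kernel regression estimate, cf. Eq. (8):
    E{Y|X} ~= sum_i y_i * Phi(X - X_i) / sum_i Phi(X - X_i)."""
    # Assumed Gaussian product kernel Phi(u) = exp(-||u||^2 / (2 h^2))
    diffs = X_query[:, None, :] - X_train[None, :, :]                # (Q, T, D)
    weights = np.exp(-np.sum(diffs ** 2, axis=2) / (2.0 * h ** 2))   # (Q, T)
    num = weights @ y_train                                          # sum_i y_i * Phi(...)
    den = weights.sum(axis=1)                                        # sum_i Phi(...)
    out = np.zeros_like(num)
    nonzero = den > 0                                                # avoid division by zero
    out[nonzero] = num[nonzero] / den[nonzero]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 2))                            # T = 200 training points, D = 2
    y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.standard_normal(200)
    Xq = rng.uniform(-1, 1, size=(5, 2))
    print(kernel_regression(X, y, Xq, h=0.2))
```

The bandwidth h plays the role of h_T in condition (3): it should shrink as the number of training samples grows, but not so fast that T·h_T stays bounded.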
According to the above analysis, if the function \Phi_X(X) can be realized, then this network can reproduce the input/output relationship. The literature [3] gives the N-Tupple network. The network consists of an R-bit binary array and a set of K memory nodes, each having an N-bit long address word (i.e., having 2^N addressable locations); the structure is shown in Fig. 1. The \Re^D \to \Re mapping performed by the network consists essentially of three stages:
- Conversion of the real vector input into a binary format and projecting it onto the network retina.
- Sampling of the R-bit binary array to form its address with N randomly selected array bits.
- Combining the contents of the addressed memory locations to produce the network response.

Fig. 1 Structure of N-Tupple Network for Initial Alignment (binary model of input; N-bit address; input/output)
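The first two stages can be illustrated with a brief sketch. The thermometer coding scheme, the retina size R, the number of tupples K and the tupple size N are assumptions made for the example; the paper does not specify how the real-valued input is binarized.

```python
import numpy as np

def make_retina_encoder(D, bits_per_dim, lo, hi):
    """Stage 1: project a D-dimensional real vector onto an R-bit binary retina
    using thermometer coding of every component (R = D * bits_per_dim)."""
    def encode(x):
        levels = np.floor((x - lo) / (hi - lo) * bits_per_dim).astype(int)
        levels = np.clip(levels, 0, bits_per_dim)
        retina = np.zeros((D, bits_per_dim), dtype=np.uint8)
        for d in range(D):
            retina[d, :levels[d]] = 1            # first 'levels[d]' bits set
        return retina.ravel()                    # R-bit binary array
    return encode

def make_tuple_sampler(R, K, N, seed=0):
    """Stage 2: K random N-bit samplings of the retina -> K memory addresses."""
    rng = np.random.default_rng(seed)
    taps = np.stack([rng.permutation(R)[:N] for _ in range(K)])   # (K, N) retina indices
    powers = 2 ** np.arange(N)
    def addresses(retina):
        return (retina[taps] * powers).sum(axis=1)                # K integers in [0, 2^N)
    return addresses

if __name__ == "__main__":
    encode = make_retina_encoder(D=3, bits_per_dim=32, lo=-1.0, hi=1.0)
    address = make_tuple_sampler(R=3 * 32, K=20, N=8)
    print(address(encode(np.array([0.1, -0.4, 0.7]))))            # 20 tupple addresses
```

Stage 3, combining the contents of the addressed locations, is the training/recall rule given by Eqs. (13) and (14) in Sect. 3.2.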
3 N-Tupple Network of Initial Alignment

3.1 Initial Alignment Principle of SINS

Assume that the state and measurement equations of the initial alignment system are

X(k+1) = \Phi(k+1,k) X(k) + T(k+1,k) U(k) + \Gamma(k+1,k) W(k), \qquad Z(k+1) = H(k+1) X(k+1) + V(k+1) \qquad (10)

where X \in R^n is the state vector, Z \in R^m is the measurement vector, U(k) is the controlling vector, and W(k), V(k+1) are the system dynamic noise vector and the measurement noise vector; both are white noise with variance matrices Q(k) and R(k+1). The rest are coefficient matrices. Taking an 8-order SINS as an example, the process and precision of the initial alignment system using the N-Tupple network will be discussed concretely. Assume that the state vector is x = [\Phi_x, \Phi_y, \Phi_z, \varepsilon_x, \varepsilon_y, \varepsilon_z, \nabla_x, \nabla_y]^T, the measurement vector z = [z_x, z_y]^T, the controlling vector u = [u_x, u_y, u_z]^T, and the measurement noise vector V = [V_x, V_y]^T; then the mathematical model is

\frac{dx(t)}{dt} = A x(t) + B u(t), \qquad z(t) = H x(t) + V(t) \qquad (11)

where \Phi_x, \Phi_y, \Phi_z are the error angles of the SINS, \varepsilon_x, \varepsilon_y, \varepsilon_z are the random constant drifts of the three gyros (east and north are 0.01°/h, the sky (vertical) direction is 0.03°/h), and \nabla_x, \nabla_y are the random constant biases of the accelerometers (east and north are 10^{-4} g). A, B, H are the coefficient matrices, whose nonzero entries are A(1,2) = -A(2,1) = \omega_{ie}\sin L, H(1,2) = -H(2,1) = -g, A(1,3) = -A(3,1) = -\omega_{ie}\cos L, A(1,4) = -A(2,5) = A(3,6) = 1, H(1,7) = H(2,8) = 1, B(1,1) = B(2,2) = B(3,3) = 1.
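A small sketch that fills in the nonzero entries of A, H and B listed above is given below; the numerical values of latitude, gravity, Earth rate and the first-order discretization step are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def sins_error_model(L_deg=45.0, g=9.8, omega_ie=7.2921e-5):
    """Continuous 8-state SINS error model dx/dt = A x + B u, z = H x + V,
    with the nonzero entries listed in Sect. 3.1 (1-based indices in comments)."""
    L = np.deg2rad(L_deg)
    A = np.zeros((8, 8))
    B = np.zeros((8, 3))
    H = np.zeros((2, 8))

    A[0, 1] = omega_ie * np.sin(L);  A[1, 0] = -A[0, 1]   # A(1,2) = -A(2,1) = wie*sinL
    A[0, 2] = -omega_ie * np.cos(L); A[2, 0] = -A[0, 2]   # A(1,3) = -A(3,1) = -wie*cosL
    A[0, 3] = 1.0                                         # A(1,4) = 1
    A[1, 4] = -1.0                                        # A(2,5) = -1 (from A(1,4) = -A(2,5))
    A[2, 5] = 1.0                                         # A(3,6) = 1

    H[0, 1] = -g; H[1, 0] = g                             # H(1,2) = -H(2,1) = -g
    H[0, 6] = 1.0; H[1, 7] = 1.0                          # H(1,7) = H(2,8) = 1
    B[0, 0] = B[1, 1] = B[2, 2] = 1.0                     # B(1,1) = B(2,2) = B(3,3) = 1
    return A, B, H

if __name__ == "__main__":
    A, B, H = sins_error_model()
    dt = 0.01                                             # assumed sampling step
    Phi = np.eye(8) + A * dt                              # first-order discretization, cf. Eq. (10)
    print(Phi[:3, :3])
```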
3.2 Study Algorithm of N-Tupple

As each memory location contains a real number, any particular choice of tupple addresses results in a selection of K numerical weights \{w_1(x), w_2(x), \ldots, w_K(x)\}. When the input is x, let a_k(x) designate the counter value corresponding to the location addressed in the k-th tupple memory by the input X; thus any input to the
network results in a unique selection of K-Tupple addresses together with their associated weight and counter values.
x \to \{t_1(x), t_2(x), \ldots, t_K(x)\},\ \{w_1(x), w_2(x), \ldots, w_K(x)\},\ \{a_1(x), a_2(x), \ldots, a_K(x)\} \qquad (12)

Initially, all network tupple memory locations (both the weight and counter values) are set to zero. During the training phase the network is presented with T training pairs (X^i, Y^i) drawn according to the PDF of the system being modeled, where X^i is the D-dimensional input vector, and Y^i denotes the corresponding output. For each tupple location addressed by X^i the value of Y^i is added to the corresponding weight, and the location counter is incremented:

w_k(X^i) \leftarrow w_k(X^i) + y^i, \qquad a_k(X^i) \leftarrow a_k(X^i) + 1 \qquad (13)

During the recall phase the network output, \hat{y}(X), is obtained by normalizing the sum of the addressed weights with the sum of their corresponding counter values:

\hat{y}(X) = \sum_{k=1}^{K} w_k(X) \Big/ \sum_{k=1}^{K} a_k(X), \qquad \sum_{k=1}^{K} a_k(X) = 0 \ \Rightarrow\ \hat{y}(X) = 0 \qquad (14)
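The training update (13) and the recall rule (14) can be summarized in a few lines of code. This is a sketch under the assumption that the K tupple addresses of an input are already available (for instance from the sampling stage sketched after Fig. 1); the class name and the toy training loop are illustrative only.

```python
import numpy as np

class NTuppleRegression:
    """K memory nodes, each with 2^N locations holding a weight and a counter."""
    def __init__(self, K, N):
        self.weights = np.zeros((K, 2 ** N))    # w_k(.)
        self.counts = np.zeros((K, 2 ** N))     # a_k(.)
        self.k_idx = np.arange(K)

    def train(self, addresses, y):
        """Eq. (13): add y to every addressed weight, increment its counter."""
        self.weights[self.k_idx, addresses] += y
        self.counts[self.k_idx, addresses] += 1.0

    def recall(self, addresses):
        """Eq. (14): sum of addressed weights normalized by sum of counters."""
        w = self.weights[self.k_idx, addresses].sum()
        a = self.counts[self.k_idx, addresses].sum()
        return 0.0 if a == 0 else w / a

if __name__ == "__main__":
    net = NTuppleRegression(K=20, N=8)
    rng = np.random.default_rng(1)
    for _ in range(1000):                        # toy training pairs (addresses, y)
        addr = rng.integers(0, 2 ** 8, size=20)
        net.train(addr, float(addr.mean()) / 255.0)
    print(net.recall(rng.integers(0, 2 ** 8, size=20)))
```

Because training only increments table entries, the recall time is independent of the number of training pairs, which is the constant-speed property emphasized in the introduction.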
3.3 Simulation and Analysis of N-Tupple SINS Initial Alignment

The initial data for the simulation are L = 45°, E\{\Phi_x^2\} = E\{\Phi_y^2\} = (10')^2, E\{\Phi_z^2\} = (60')^2, E\{\varepsilon_x^2\} = E\{\varepsilon_y^2\} = (0.01°/h)^2, E\{\varepsilon_z^2\} = (0.03°/h)^2, E\{\nabla_x^2\} = E\{\nabla_y^2\} = (10^{-4} g)^2. When the N-Tupple network is introduced, the simulated curves for the error angles \Phi_{x,N}, \Phi_{y,N} and \Phi_{z,N} of the SINS initial alignment are shown in Fig. 2.
Fig. 2 Simulated Curves Using N-Tupple (error angles \Phi_{x,N}, \Phi_{y,N} and \Phi_{z,N} plotted over 0–400 s)
4 Conclusions

As discussed above, we have accomplished the initial alignment of SINS using the N-Tupple network. From the simulated results, we can reach the following conclusions: (1) the initial alignment using the N-Tupple network is feasible; (2) the initial alignment using the N-Tupple network can greatly decrease the alignment time, so it is especially applicable to rapid alignment; at the same time, from the point of view of hardware realization, the N-Tupple network has a great advantage.
References

1. Yang, L., Wang, S.H.: Initial Alignment System of SINS Using BP Neural Network. Transactions of Nanjing University of Aeronautics & Astronautics 28, 487–499 (1996)
2. Wang, D.L.: Initial Alignment System of SINS Using RBF Neural Network. Astronautics Control 2, 48–59 (1999)
3. Aleksander, K.: N-Tupple Regression Network. Neural Network 9, 855–869 (1996)
4. Yuan, X., Yu, J., Chen, Z.: Navigation System. China Aviation Industry Press, Beijing (1994)
5. Li, T.J.: Data Acquiring System Based on VXI Bus. In: Proceedings of the Second International Conference on Active Media Technology, vol. 5, pp. 688–692 (2004)
6. Li, T.J.: Design of Computer Management System. In: Proceedings of the Third International Conference on Wavelet Analysis and Applications, vol. 5, pp. 744–749 (2004)
7. Li, T.J.: Design of Boot Loader in Embedded System. In: Proceedings of the 6th International Progress Wavelet Analysis and Active Media Technology, vol. 6, pp. 458–463 (2005)
8. Li, T.J., Lin, X.Y.: Research on Integrated Navigation System by Rubidium Clock. Journal on Communication 8, 144–147 (2006)
Analysis on Basic Conceptions and Principles of Human Cognition Xiaorui Zhang, Minyong Li, Zhong Liu, and Meng Zhang 1
Abstract. Knowledge of consciousness, thinking and cognition is the basis of research on human intelligence. The basic elements of cognition and the relations between them are discussed, and a breakthrough in understanding cognition has been made. It is concluded that thinking is a one-dimensional kinetic process that is based on the memory function and is composed of two or more static consciousnesses, with certain relationships between them, in one's conscious focus. The concepts of consciousness in several scientific subjects and their relationships are summarized, and conscious forms are newly classified. Following the thought of the contemporary system of science and technology, an opinion about human cognitive levels is proposed.

Keywords: Systems science, Cognitive science, Consciousness, Thinking, Language.
1 Introduction Cognitive researches in China belong to the field of noetic science [1] that is in fact no other than cognitive science, because they both have the same research objects, but there exist some dissimilarities between their viewpoints. Noetic science is dedicated to disclose the essence and laws of thinking and cognition, and aims at understanding human cognition. It is very significant to study cognition. First, it can promote cognitive abilities and broaden our fields of vision and help us to cognize the world better. Second, it can facilitate thoughts and cognitions owned by different persons to be better communicated and imparted, and make individual cognition become social cognition. Third, it can accelerate one’s consciousness and body to coevolve better and answer some basic problems, e.g. what life is, why life forms are so various, why I exist, how human come, whether there is life in other planets, if there is limit in the scales of matter forms, etc. Fourth, it can improve individual’s life, enhance his inner psychic realm, and increase the morality at the aspect of his behavior and make individuals pursue to be saints. Uncovering human Xiaorui Zhang . Minyong Li . Zhong Liu . Meng Zhang Dept. of Command Automatization, Naval Univ. of Engineering, Wuhan 430033, China
E-mail: [email protected]
cognition entirely can help individuals to heighten their cognitive abilities and make it easier to exceed themselves in cognition. Studying cognition means that it is studying life especially human brains that are very complex huge systems. Any object correlated to cognition can be included in the research range, so the research about cognition is a field that has strong characteristics of systems and synthesis.
2 Consciousness and Thinking 2.1 Reflection on Consciousness and Thinking Marxism philosophy considers that consciousness is the reflection of matter in brains. From the viewpoint of psychological motion, consciousness may be classified as psychological imitation and knowledge [2]. Psychological imitation is subjective consciousness, and is a kind of psychological experiences and mental states, and includes emotion, feeling, interest, attention, will and so on. Knowledge is formed from thought, is objective consciousness abstracted and refined by rational thinking. In a broad sense, consciousness includes explicit consciousness and potential consciousness. Consciousness usually means explicit consciousness and is a category in a narrow sense. Explicit consciousness can be grasped and controlled by our spirit, but potential consciousness can’t. Potential consciousness is the base of explicit consciousness which comes from potential consciousness. Instinct is an innate capability or patterns of behaviors that are often responses to specific environmental stimuli, e.g. babies needn’t study and are able to cry and suckle after birth. Instinct also includes the abilities that have been posteriorly possessed by bodies and seem not to need any feeling to enter brains, e.g. at the time of emergencies, human usually can erupt self-protecting behaviors under the condition of self-unknowing and sometimes can appear supernormal behavioral capability. In fact, instinct is a kind of potential consciousness. Potential consciousness shows itself more at the aspect of functionality, in which one part is owned with birth and seems solidified in neural organizations, and the other part is formed after birth and possessed by studying in the course of living, e.g. writing, riding, driving, typing on a keyboard, etc. Consciousness is the higher form of potential consciousness and is a mental form which can be felt, grasped and controlled. Based on consciousness, thinking comes into being. Thinking is a form that belongs to consciousness. In a point of time, consciousness is quiescent, and is like a picture depicting the form of a material object, so thinking is a dynamic course that is like a film comprised of some pictures between which there are some relations. Human’s thinking is one-thread or one-dimensional. The consciousness which is being paid attention to is named as conscious focus, then in a time point there is only one conscious focus for any person. A thinking course in which there exists change is formed by different consciousnesses between which there usually still exist other relations besides temporal and spatial logical relations. Thinking is interrelated with memories that supply the relations between consciousnesses in a thinking course. Memory is also a kind of consciousnesses, and the difference is that memories are the consciousnesses which are saved in brains and can be taken out and then reappear in conscious focus and feelings.
2.2 Cognitive Forms on Thinking in Scientific Subjects There is meta-cognition in any cognitive field. All cognition comes from methodologies that include relations, information, scale views [3], points of view, methods, etc. Different cognitive bases lead to different cognitive contents. The conceptions about thinking in different subjects are different. In human history, mental phenomena were originally researched by philosophy. With the development of human cognition, the researches on thinking formed logic which gradually separated from philosophy, afterward cognitive science comes into being, and in China this research field is called noetic science. In Marxism philosophy, the meaning of thinking has the difference between broad sense and narrow sense [2]. Thinking in a broad sense equals consciousness, spirit and cognition. It expresses the meaning as the object opposite to existence, and is a general category. In a narrow sense, thinking just denotes the rational cognition in a cognitive course. Obviously, these are two different conceptions, but they are not distinguished and used by a mixed mode in Marxism philosophy. Brain science is interested in the inner laws of neural physiological functions which are the base of consciousness and thinking, and researches how consciousness and thinking emerge from the operations of neural organizations. Psychology science pays attention to the operational mechanism of thinking and mental behaviors, and commits itself to find out the inner laws contained in the transformation from psychological activities to physical behaviors. Noetic science is the scientific category which studies the laws relevant with thinking. It covers the research results of brain science, psychology, thinking, cognitive science, philosophy, systems science and so on, and synthesizes all cognitive contents about thinking that comes from different scale views or viewpoints in order to understand how human cognize the objective world and how the information gotten from feeling the world is saved and processed to become the knowledge about the world. Systems science thinks much of the consistency, difference and systematization about thinking and cognition between all relevant scientific subjects, emphasizes the relations, cross and amalgamation of subjects. Systems science is also interested in the function which thinking and cognition perform. In fact, what systems science here studies is the same as noetic science’s, the difference is just their viewpoints, so noetic science can also be claimed as noetic systems science from the viewpoint of system idea. Cognition has hierarchical structure from concreteness to abstraction. Philosophy grasps the most ultimate and general properties in the abstract. Concrete subjects such as brain science, psychology, neurophysiology and somatology research some subsidiary or concrete properties, laws and processes in a less scale under the macro direction of philosophy. The conception about thinking which is formed from each subject is different from other subject’s due to different methodologies. This is like a focus lens which is facing an object and shows different pictures according to different focus positions and ranges, but the object doesn’t change with the change of focus and it is objective. Though there are some differences between correlated concrete subjects, they all faces the same object as thinking, they should be theoretically consistent with each other. 
This consistency problem is just where systems science usually performs its function. Systems science will harmonize all the concrete subjects correlated with noetic science and do its best to form synthetic, uniform conceptions of thinking and the other relevant objects, from concreteness to abstraction and
from feelings to logos. Understanding the essence of thinking through a definition of a few general sentences does not accord with cognitive characteristics; thinking should be grasped by synthesizing the cognitive contents that come from different methodologies.
2.3 Classification of Thinking Of the classification of thinking it can be said that the benevolent see benevolence and the wise see wisdom. It is bewildering that the same cognitive object attracts so many different opinions, which to some extent shows that it is not easy to cognize thinking completely. Following Hsue-shen Tsien's thought, thinking is classified according to its basic units as behavioral thinking, visual thinking and abstract thinking; by its results and functions, it is divided into reappearing thinking and creating thinking; from the viewpoint of its subjects, it is composed of individual thinking and social thinking [2]. Based on the restudy of consciousness and thinking, the forms of thinking are newly classified here. According to the relational modes in thinking, thinking contains two kinds of forms: logical thinking and alogical thinking. In logical thinking there are laws that have already been grasped by humans. Logical thinking is mainly symbolic thinking, which is so particular that many people believe they cannot think without using symbols. Symbolic thinking includes language thinking, thinking in the various computer languages, mathematical symbolic thinking, graphical thinking and other kinds of symbolic logical thinking. Language thinking is a tool that plays a key role in the development of human cognition. Symbolic thinking, especially language thinking, spans all levels of cognition from concreteness to abstraction, and carves up consciousness and cognition precisely and punctiliously. In the course of cognitive development this carving is carried on continuously, affirming and recording each result obtained in cognitive progress, so cognition accumulates and can be handed down or inherited through series of symbols. Alogical thinking denotes the thinking forms in which humans have not yet found laws; it has many forms, e.g. replaying thinking, dreaming thinking, associational thinking, random relating thinking, inspirational thinking, etc. The boundary between logical thinking and alogical thinking is not changeless. The development of human cognition shows that the logical range has been ceaselessly expanding, as if it were constantly nibbling at the range of alogical thinking: laws are found within the alogical range and transferred to the logical range of thinking. Replaying thinking means that, based on memories, what has been felt reappears in the conscious focus. Dreaming thinking refers to the thinking forms in one's dreams. Associational thinking is the thinking activity in which some consciousnesses are correlated according to certain relations and reflected orderly in the conscious focus. Random relating thinking is the least restricted form and can be completely divorced from the usual relations of the other kinds of thinking. Alogical thinking usually contains both logical and alogical thinking contents; compared with logical thinking, the laws contained in alogical thinking appear faint. Inspirational thinking has close relations with the other thinking forms, and it seems to emerge from a kind of
thinking chaos and has strong creativity. According to the concrete forms of consciousness contained in thinking, thinking forms can be divided into perceptional reflection thinking, conscious presentation thinking, abstract idea thinking and symbolic thinking. Perceptional reflection thinking is the direct, sensory reflection of objective objects and forms the perception of objects. Conscious presentation thinking is composed of pure conscious perceptions without the help of symbolic languages, and begins to grasp the content and essence of objective objects to some extent. Abstract idea thinking breaks away from sensory reflection, synthetic perception and conscious presentation, and moves purely within the range of the content and essence of the objective world. Symbolic thinking is the thinking activity that depends on symbols correlated with concrete consciousnesses. The forms of things are consistent with their contents. According to the concrete contents of the consciousnesses contained in thinking, thinking forms can be divided into sensory thinking, perceptional thinking, presentative thinking, idea thinking and symbolic thinking. Sensory thinking concerns the original feelings about objects and is an intuitionistic thinking. Perceptional thinking synthesizes relevant sensory thinking and forms a total sensory grasp. Presentative thinking is the process that is based on relevant perceptional thinking and can partly grasp the content and essence of objects. Idea thinking is purely abstract thinking and is the highest cognitive form of human thinking. Thinking can also be classified into concrete thinking and abstract thinking, and concrete thinking is also called intuitionistic thinking or imagery thinking. From the extent to which perception and sensibility participate, thinking can be divided into sensorial thinking and rational thinking. The most attractive thinking form is creating thinking, which can find new ideas. Creation is weighed by the results of thinking, so creating thinking is always associated with concrete problems, which makes it pragmatic. Creating thinking has two types: one is pattern logic evolvement, and the other is pattern extension or reformation. Most past research belongs to the former, which cannot direct the development of an era, but the latter can. The work of Aristotle, Copernicus, Galileo, Heng Zhang, Newton, Hegel, Marx, Einstein, Planck, Hawking, etc., represents the greatest creations.
3 Thinking and Language 3.1 The Antiquity and Evolvement of Language At the beginning of the formation of a language, every syllable was correlated with a concrete object in life; in Chinese, for example, such words are sun, moon, white, black, eat, look, etc. With the development of cognition, the ability to abstract also developed, and the denotation of language symbols was changed or extended. As for the meanings of words, most archaic words originally had only one kind of meaning, but most modern words have developed several kinds of meanings, while some words still keep their original meanings. A language becomes more complex and richer with its development. The pronunciation of words has changed from single syllables to multiple syllables because of the characteristics of human physiology and thinking, and multisyllabic words make up for the fact that human hearing is not as acute as might be hoped and is in fact usually dull.
The evolution of languages clearly shows that human cognition is evolving from simplicity and sensibility and is becoming more and more complex and rational.
3.2 Analysis of the Broad and Narrow Meanings of Words The broad and narrow meanings of words mean that the denotation of a word is a range, like a circle with a core and a periphery. The content of the core denotes the narrow meaning, and the periphery, formed from relations with the core, denotes the broad meaning. Although the form of a word does not change, its denotative content is not single or identical in different linguistic contexts. When a word shows its broad meaning, the word is like a point correlated with the body that is regarded as its meaning, so the same meaning can be denoted by different points, i.e. different words can denote the same meaning; for example, in Marxist philosophy consciousness has the same meaning as spirit or cognition, and they are identical. The broad and narrow meanings of words come from two factors: one is the human cognitive ability to abstract, and the other is the reduction, resolution and exactness of the cognition of cognitive objects. Abstraction corresponds to holism and resolution to reductionism, so the cognitive abilities pertinent to holism and reductionism jointly produce the emergence of the broad and narrow meanings of words. It is the difference in the scales of view in cognition that is the essential reason for the emergence of the broad and narrow meanings of words. The broad meaning of a word is formed by magnifying the range of its denotation so as to contain the correlated contents in a bigger range. Although the denotation changes with the context, the symbol is still the same word, which has not changed.
3.3 The Relation between Thinking and Language Our living bodies are living fossils that have been witnessing human history, and also the earth's history insofar as there existed the evolving forms from which our ancestors' lives came. The language we use is also a historical witness, and it is easier to study. There are more than six thousand languages in the world, but they all belong to eight phyla [4]. Many languages have evolved from the same language; for example, Tibetan, Burmese and Chinese have an obvious common base, which proves that long ago they belonged to one people. Language is a tool used to measure cognition; the smallest metrical unit is the morpheme. Language is composed of a series of symbols that are used to segment and match nearly all the things that belong to one's conscious world. With language, thinking keeps segmenting and makes itself more precise, and each bit of progress in cognition obtained by thinking is recorded by new language symbols, so things that are invisible, blurry or dormant are transformed into things that are visible, specific and tangible. The development of language toward complexity shows that the human ability to think is evolving to be more intelligent. Not all thinking is embodied or exhibited in language, because thinking can operate without language, but thinking does its best to be expressed in language, because without language no thinking or cognition can be inherited. Thinking needs memories, and each tiny
progress needs the help of memories, which are recorded in the brain, but most are finally transferred and recorded outside the brain in some mode such as language or pictures. Thinking and language help and accelerate each other's evolution.
4 Thinking and Cognition 4.1 The Definition of Cognition Each individual is a conscious unit or body. Cognition emphasizes the result of the knowledge about objects; that is, a conscious body does its best to approach the objective existence of things through conscious reflection and, within the context of the historical level of cognition, to hold the truth about things accurately to the greatest extent. Comprehending, as a kind of thinking mode, emphasizes the cognitive process, while cognition is the result of comprehending and emphasizes the cognitive result.
4.2 The Hierarchical Structure of Cognition The contemporary system of science and technology [5] proposed by Hsue-shen Tsien divides human knowledge into six levels and eleven big branches, as shown in Fig. 1. Knowledge is longitudinally carved into six levels, which from top to bottom are philosophy, bridges, basic theories, technical sciences, applied technologies and pre-science. Scientific knowledge is latitudinally divided into natural science, social science, mathematical science, systems science, noetic science, anthropic science, geographical science, military science, behavioral science, architectural science, and literature and art. The content of cognition is hierarchical, and accordingly knowledge also has multiple levels. According to the thought of the contemporary system of science and technology, and from the viewpoint of cognition, an individual's cognitive levels can be abstracted and formed. As shown in Fig. 2, human cognition has six levels, which from abstraction to concreteness are view, methodology, science, method, art, technology and work.
Fig. 1 The contemporary system of science and technology proposed by Tsien Hsue-shen
Fig. 2 The abstract structure of one’s cognition has six levels
Work here is tightly correlated with practice; it may be said that what work denotes here is just practice. Through work, a human transforms his thoughts and ideas into behavioral activities, communicates with practical objects and the world, is able to transform nature, and feeds information back through practice to validate and improve cognition. Work here means concrete behavioral activities with specific behavioral objects. Technology here denotes abilities in the aspects of thinking and skills; it is combined with the individual body and cannot be separated from it. Without the help of technology, work cannot be carried on. Art here is a kind of rule or strategy relevant to concrete behavioral activities or operational processes; it cannot be separated from the operations and has its concrete operational object, i.e. an art is the art of some activity or object and is not the art of other activities or objects. It is possible that the arts of several activities or objects are the same, and that situation belongs to the method level. Method here denotes a kind of arts that are the same, or a set of arts that are pertinent, and a method faces a kind of collectivity that may be a set or a range. Some methods for the problems in a field can be logically correlated to form a logical system. Different methodologies and world views lead to different logical systems about the same things, and each kind of logical system can be called a school. A science here is a general uniform theory that has syncretized, synthesized, combined and integrated nearly all the schools in a field and has formed a uniform, acknowledged logical system. In the process of forming a science, each school needs an abstract theoretical grasp of, or direction about, certain problems in order to coordinate and harmonize their opinions; this kind of abstract theoretical grasp is called their methodology. Based on all kinds of methodologies, one can have a whole, general, abstract cognition about the world or about problems, which is here called view. When one faces a problem, he always integrates and grasps his cognition from the most concrete work up to the most abstract view, and finally through thinking gives birth to one or more ideas. Once an idea is performed, one shows it through facial expressions, physical actions or behaviors, which can be organized to form the state of work.
4.3 The Relation between Thinking and Cognition The realization and improvement of cognition is a process which goes from sense to perception, then to conscious presentation, abstract view and finally to idea; from visualness to abstraction, from concreteness to generality, from
the microcosmic to the macroscopic, from phenomena to essence, from practice to rationality, from chance to necessity and from the finite to the infinite. The realization of this process owes to the function of consciousness, which innately has this kind of talent and the necessary organizational material base. Each individual is unique and different from other conscious bodies. The uniqueness is confirmed by the material constitution of a life, which cannot be substituted by other lives. Furthermore, life growth and cognitive development are also unique in the world, so each conscious body is inevitably different from the others. On the other hand, every individual can coexist and communicate with others, which shows that there are similarities, commonness, congenericness or homogeneity among conscious bodies. The same kind of life bodies have the same physiological material bases and similar habitats, and they live and evolve in the mode of humanity, which produces the commonness between conscious bodies; this commonness is the base for communication between conscious bodies. Consciousness cannot communicate with others directly in the forms in which it exists in the brain, so consciousness needs explicit forms to make contact with other individuals' consciousness. Facial expressions and body actions are the original forms for embodying consciousness, and doodling, depicting, drawing or painting, such as ancient cave frescoes, belong to another kind of most intuitionistic form; all of these forms can be called denotations that are used to pass conscious information. The emergence of concrete denotations that are separate from human bodies is a qualitative progress. The significance of denotations lies in the representation of nearly any complex conscious content in a concise mode. Language is an advanced denotation system that can denote nearly all conscious contents and the relations between them. Each word is relevant to a kind of special consciousness within a certain range. If the relevant conscious memories of a word are not in one's brain, one cannot understand the meanings of the word. It is known that there always exist differences between any two conscious bodies at the level of concreteness, and people's consciousnesses or conscious backgrounds correlated with a word are different, so how do they manage to communicate? Here one's conscious background is composed of the concrete and abstract consciousnesses relevant to something and is a mass of all kinds of relevant consciousnesses. For example, the word mama makes everyone think of his own mother, then relate to others' mothers and to other life forms' mothers, until finally his conscious world grasps the most abstract content: mama denotes a kind of relation identifying the object that has given birth to a new life or has fostered a new life. It is in this way that the cognition about mama comes into being. When one hears a child's pronunciation of mama, one first searches out the relation between the denotation in its sound form and the most abstract rational consciousness already recorded in the brain in the form of memories, then relates downwards to others' mothers, and by associating these consciousnesses with the child one becomes conscious of the child's mother. Therefore, although each individual has his unique consciousness background that differs from others', all individuals are able to form identical rational consciousnesses through the ability to abstract some consciousnesses.
Although these identical rational consciousnesses exist separately in each conscious body, closed cognitive channels that achieve all human cognition are
formed based on these pivotal rational consciousnesses, which are the upward peaks in the closed cognitive channels. One's thinking is cultivated and improved in the process of one's growth. For a student, the contents studied in school are made up of two aspects: one is the exercise and improvement of the abilities to think and to cognize, and the other is extending the field of vision, i.e. obtaining knowledge. Although human brains have a congenital cognitive material base that is far more dominant than other species', the ability to think and cognize intelligently is reached and acquired through a great deal of methodical exercise. Without the necessary exercise, even one with good genetic aptitude cannot have his potential intelligent ability aroused; after he has grown up he may be clever because of his life experiences, but he will not be intelligent or knowledgeable. Language can boost the ability to think, but in many cases thinking can operate without language, which can itself be regarded as a kind of thinking ability. Most people are usually accustomed to thinking with language and believe it incredible to think without language; this is because people rely so excessively on the help of language that the ability to think without language has never been evoked or has degenerated. A lot of abilities are correlated with cognition. Since life is a kind of material motion [6], human abilities are not changeless. Through study and exercise, one can learn an ability and reach a level of proficiency, but if one has not used the ability for a long time, one feels rusty when one needs it to tackle some problem, i.e. the ability has degenerated. Thinking is a kind of ability, so it has this characteristic of abilities. Thinking has many forms, and each form can be said to be a kind of ability to think. Any kind of thinking needs to be evoked, cultivated and improved; after it is acquired, it also needs to be kept through frequent use or exercise. Cognition is reached through thinking. Different cognitive abilities form different opinions. Even cognition on the same level can lead to different cognitive results because of different bases and methods. Thinking abilities decide the level and profundity of cognition: the more one's cognitive abilities are liberated, i.e. the better one's thinking abilities are, the deeper and the more self-contained one's cognition of the essence of cognitive objects is. It may be said that the abilities of thinking and cognition are not inherent or congenital, but cognitive parochialism is. Consequently, conscious bodies lying at certain cognitive levels coexist with the bondage of cognitive parochialism. As long as one does not improve oneself to a higher cognitive level, which means one does not break away from one's current level, one cannot feel the parochialism of the cognition formed at the current level and will think one's opinions entirely correct; on the contrary, one may doubt or deny some significant opinions, which shows conservativeness of thought and the restriction of cognition. When one has got away from one's current cognitive level and risen to a new, higher level, one naturally realizes the parochialism of the opinions formed on the former level. Therefore, it is not proper to deny others' cognition or to comment arbitrarily on others' opinions, because they are not at fault given their cognitive level.
Anybody can’t get away from the parochialism of his historical existence as if today can’t exceed tomorrow forever. The development of one’s cognition is the course in which he is ceaselessly exceeding himself and wiping
off the current cognitive parochialism through self-conscious cognitive evolution and improvement; it is also a process of frequently denying oneself in cognition, through which conscious bodies rise toward a higher level of cognitive states.
References
1. Tsien, H.S.: On Noetic Science. Nature Magazine 8, 563–567 (1983)
2. Lu, M.S.: Research on the Mystery of Thinking. Beijing Agricultural University Publishing House, China (1994)
3. Zhang, X.R., Li, M.Y.: Scale View and Thinking Principle in System Mode. Science and Technology Philosophy Doctoral Forum of China, Taiyuan (2008)
4. Pinker, S.: The Language Instinct—How the Mind Creates Language. Translated by Lan Hong. Shantou University Press, China (2004)
5. Tsien, H.S.: The Structure of Contemporary Science—Rediscussion on the Systematology of Contemporary Science and Technology. Philosophical Researches 3, 19–22 (1982) (in Chinese)
6. Zhang, X.R., Li, M.Y., Liu, Z., Zhang, M.: System Cognition Idea. In: 4th National Conference of Logic System, Intelligent Science and Information Science, Guiyang, China (2008)
Global Exponential Stability for Discrete-Time BAM Neural Network with Variable Delay
Xiaochun Lu
School of Water Resources and Hydropower, Wuhan University, Wuhan 430072, China
Abstract. In this paper, the existence and the global exponential stability of the equilibrium point for a class of discrete-time BAM neural networks with variable delay are investigated via Lyapunov stability theory and analysis techniques such as an important inequality and norm inequalities from matrix theory. Several delay-independent sufficient conditions for the existence and the global exponential stability of the equilibrium point are derived by constructing different Lyapunov functions for different cases. Finally, two illustrative examples are given to demonstrate the effectiveness of the obtained results. Keywords: BAM neural network, Discrete-time system, Global exponential stability, Lyapunov stability theory.
1 Introduction The bi-directional associative memory (BAM) neural network model, known as an extension of the unidirectional auto-associator of the Hopfield neural network, was first introduced by Kosko in 1987. It has been widely used in many fields such as pattern recognition, optimization and automatic control. Up to now, much work concerning the global exponential stability of BAM networks has been done. Many scholars have derived various sufficient conditions for the global exponential stability of continuous-time BAM neural networks (see [1-8]) by constructing suitable Lyapunov functions and using different inequality techniques. Nowadays, more and more researchers study the more general BAM network model without delays, described in the following form:
\[
\begin{cases}
\dot{x}_i(t) = -\alpha_i(x_i(t))\Big[a_i(x_i(t)) - \sum\limits_{j=1}^{p} c_{ji} f_j(y_j(t)) + I_i\Big], & i = 1,\dots,m,\\[1mm]
\dot{y}_j(t) = -\beta_j(y_j(t))\Big[b_j(y_j(t)) - \sum\limits_{i=1}^{m} d_{ij} g_i(x_i(t)) + J_j\Big], & j = 1,\dots,p,
\end{cases}
\tag{1}
\]
or the model with delays described in the following form:
\[
\begin{cases}
\dot{x}_i(t) = -\alpha_i(x_i(t))\Big[a_i(x_i(t)) - \sum\limits_{j=1}^{p} s_{ji} f_j(y_j(t-\tau_{ji}(t))) + I_i\Big], & i = 1,\dots,m,\\[1mm]
\dot{y}_j(t) = -\beta_j(y_j(t))\Big[b_j(y_j(t)) - \sum\limits_{i=1}^{m} t_{ij} g_i(x_i(t-\sigma_{ij}(t))) + J_j\Big], & j = 1,\dots,p,
\end{cases}
\tag{2}
\]
where m ≥ 2, p ≥ 2 are the numbers of neurons in the network, x_i, y_j denote the state variables associated with the neurons, and a_i, b_j are appropriately behaved functions. The connection matrices C = (c_ji)_{p×m}, D = (d_ij)_{m×p} describe how the neurons are connected in the network, and S = (s_ji)_{p×m} and T = (t_ij)_{m×p} indicate the strength of the neuron interconnections within the network, with time-delay parameters τ_ji(t) and σ_ij(t). The activation functions f_j and g_i show how the neurons respond to each other. However, when continuous-time neural networks are implemented for simulation or computational purposes, it is essential to formulate a discrete-time system that is an analogue of the continuous-time network. There is no unique way of obtaining a discrete-time analogue from a continuous-time network; many numerical schemes, such as the Euler scheme and the Runge-Kutta scheme, can be used to obtain the discrete-time version of a continuous-time system. Certainly, the discrete-time system is desired to preserve the dynamical characteristics of the continuous-time system. For the stability problem of discrete-time (see [9-13]) bi-directional associative memory neural networks, results are much scarcer than those for continuous-time neural networks. For example, in [13] the authors studied the global exponential stability of a class of discrete-time Cohen-Grossberg neural networks (CGNN) with or without delays. Mohamad and Gopalsamy [11,12] investigated the exponential stability of continuous-time and discrete-time cellular neural networks with delays via Halanay-type inequalities and Lyapunov methods. Based on the linear matrix inequality (LMI), in [14] the authors derived some sufficient delay-independent and delay-dependent conditions for the existence, uniqueness and global exponential stability of the equilibrium point of discrete-time BAM neural networks with variable delays. With the development of the Cohen-Grossberg neural network (see [15-17]), in this paper we consider the corresponding discrete-time version of the bi-directional associative memory (BAM) neural network model (2), described by the following equation
\[
\begin{cases}
x_i(n+1) = x_i(n) - \alpha_i(x_i(n))\Big[a_i(x_i(n)) - \sum\limits_{j=1}^{p} s_{ji} f_j(y_j(n-k_{ji}(n))) + I_i\Big],\\[1mm]
y_j(n+1) = y_j(n) - \beta_j(y_j(n))\Big[b_j(y_j(n)) - \sum\limits_{i=1}^{m} t_{ij} g_i(x_i(n-l_{ij}(n))) + J_j\Big].
\end{cases}
\tag{3}
\]
The initial conditions associated with (3) are of the form
\[
x_i(s) = \phi_i(s),\ i = 1,\dots,m,\ s \in [-l,0];\qquad y_j(s) = \psi_j(s),\ j = 1,\dots,p,\ s \in [-k,0],
\tag{4}
\]
where k = max_{1≤i≤m, 1≤j≤p} {k_ji(n)}, l = max_{1≤i≤m, 1≤j≤p} {l_ij(n)}, and the delays satisfy 1 < k_ji(n+1) < 1 + k_ji(n) and 1 < l_ij(n+1) < 1 + l_ij(n). The rest of this paper is organized as follows. In Section 2 we give some preliminaries; in Section 3 we obtain some sufficient conditions ensuring that the equilibrium of system (3) is globally exponentially stable by constructing two different Lyapunov functions; an example and a figure are given to illustrate the effectiveness of our results in Section 4; finally, the conclusions are given in Section 5.
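To make the iteration in model (3) concrete, the following sketch performs the delayed update step for a small network. It is not part of the paper; the network size, weights, delays, amplification and behaved functions chosen below are illustrative assumptions.

```python
import numpy as np

# Illustrative one-step iteration of the discrete-time BAM model (3).
m, p = 2, 2
S = np.full((p, m), 0.1)          # s_ji: y -> x connection strengths (assumed)
T = np.full((m, p), 0.1)          # t_ij: x -> y connection strengths (assumed)
I, J = np.zeros(m), np.zeros(p)   # external inputs
k = l = 1                         # constant delays k_ji(n) = l_ij(n) = 1 (assumed)

alpha = lambda x: 0.5 + 0.25 * np.cos(x)   # bounded, positive amplification functions
beta  = lambda y: 0.5 + 0.25 * np.sin(y)
a, b = (lambda x: x), (lambda y: y)        # behaved functions
f, g = np.tanh, np.tanh                    # Lipschitz activation functions

def step(x_hist, y_hist):
    """Compute (x(n+1), y(n+1)) from the stored state histories (newest last)."""
    x_n, y_n = x_hist[-1], y_hist[-1]
    x_next = x_n - alpha(x_n) * (a(x_n) - S.T @ f(y_hist[-1 - k]) + I)
    y_next = y_n - beta(y_n) * (b(y_n) - T.T @ g(x_hist[-1 - l]) + J)
    return x_next, y_next

x_hist = [np.array([1.0, -0.5])] * (l + 1)   # constant initial history phi_i(s)
y_hist = [np.array([0.5, -1.0])] * (k + 1)   # constant initial history psi_j(s)
for n in range(20):
    x_new, y_new = step(x_hist, y_hist)
    x_hist.append(x_new); y_hist.append(y_new)
print(x_hist[-1], y_hist[-1])
```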
2 Preliminaries
Throughout this paper, we need the following assumptions:
(H1) The functions α_i(·), β_j(·) are bounded, positive and Lipschitz continuous; furthermore, 0 ≤ α_i^- ≤ α_i(x) ≤ α_i^+ < ∞ for x ∈ R^m, and 0 ≤ β_j^- ≤ β_j(y) ≤ β_j^+ < ∞ for y ∈ R^p.
(H2) The behaved functions a_i(·), b_j(·) are bounded and Lipschitz continuous with Lipschitz constants λ_i, h_j such that |a_i(μ) − a_i(ν)| ≤ λ_i|μ − ν|, |b_j(μ) − b_j(ν)| ≤ h_j|μ − ν|; furthermore, they are invertible and satisfy ȧ_i(x) ≥ ξ_i > 0, ḃ_j(y) ≥ r_j > 0.
(H3) The activation functions f_j(·), g_i(·) are bounded or globally Lipschitz continuous, and there exist Lipschitz constants M_i, N_j such that |f_j(μ) − f_j(ν)| ≤ N_j|μ − ν|, |g_i(μ) − g_i(ν)| ≤ M_i|μ − ν|.
Let R denote the set of real numbers, R^m = R × … × R (m factors) and R^p = R × … × R (p factors). Let Z = {…, −1, 0, 1, …} and Z_0^+ = {0, 1, 2, …}; k_ji ∈ Z_0^+, l_ij ∈ Z_0^+. For arbitrary x ∈ R^m and y ∈ R^p, x^T and y^T denote the transposes of x and y, respectively. Define ‖x‖_2 = (x^T x)^{1/2} to be the vector 2-norm. Let R^{m×m} denote the set of all m × m real matrices. For A ∈ R^{m×m}, the spectral norm of A is defined as ‖A‖_2 = (max{|λ| : λ is an eigenvalue of A^T A})^{1/2}. Suppose (x*, y*) = (x_1*, …, x_m*, y_1*, …, y_p*)^T is an equilibrium point of system (3).
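A quick numerical illustration of the vector 2-norm and the matrix spectral norm defined above (not part of the paper; the matrix is arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])          # an arbitrary example matrix
x = np.array([3.0, 4.0])

norm_x = np.sqrt(x @ x)             # ||x||_2 = (x^T x)^{1/2}  -> 5.0
eigs = np.linalg.eigvals(A.T @ A)   # eigenvalues of A^T A (real, nonnegative)
spectral = np.sqrt(max(abs(eigs)))  # ||A||_2 = (max eigenvalue of A^T A)^{1/2}
print(norm_x, spectral, np.linalg.norm(A, 2))   # the last call agrees with `spectral`
```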
Let u_i(n) = x_i(n) − x_i*, v_j(n) = y_j(n) − y_j*; then we can rewrite (3) as
\[
\begin{cases}
u_i(n+1) = u_i(n) - \alpha_i(u_i(n)+x_i^*)\Big[a_i(u_i(n)+x_i^*) - a_i(x_i^*) - \sum\limits_{j=1}^{p} s_{ji}\big[f_j(v_j(n-k_{ji}(n))+y_j^*) - f_j(y_j^*)\big]\Big],\\[1mm]
v_j(n+1) = v_j(n) - \beta_j(v_j(n)+y_j^*)\Big[b_j(v_j(n)+y_j^*) - b_j(y_j^*) - \sum\limits_{i=1}^{m} t_{ij}\big[g_i(u_i(n-l_{ij}(n))+x_i^*) - g_i(x_i^*)\big]\Big].
\end{cases}
\tag{5}
\]
For convenience, we let α_i(u_i(n)) = α_i(u_i(n)+x_i*), β_j(v_j(n)) = β_j(v_j(n)+y_j*), a_i(u_i(n)) = a_i(u_i(n)+x_i*) − a_i(x_i*), b_j(v_j(n)) = b_j(v_j(n)+y_j*) − b_j(y_j*), f_j(v_j(n)) = f_j(v_j(n)+y_j*) − f_j(y_j*), g_i(u_i(n)) = g_i(u_i(n)+x_i*) − g_i(x_i*); then (5) can be further reduced to
\[
\begin{cases}
u_i(n+1) = u_i(n) - \alpha_i(u_i(n))\Big[a_i(u_i(n)) - \sum\limits_{j=1}^{p} s_{ji} f_j(v_j(n-k_{ji}(n)))\Big],\\[1mm]
v_j(n+1) = v_j(n) - \beta_j(v_j(n))\Big[b_j(v_j(n)) - \sum\limits_{i=1}^{m} t_{ij} g_i(u_i(n-l_{ij}(n)))\Big].
\end{cases}
\tag{6}
\]
Obviously, the functions in (6) satisfy H1 and:
(H2′) The behaved functions a_i(·), b_j(·) are bounded, with constants λ_i, h_j such that |a_i(u)| ≤ λ_i|u|, |b_j(v)| ≤ h_j|v|; furthermore they are invertible and satisfy ȧ_i(u) ≥ ξ_i > 0, ḃ_j(v) ≥ r_j > 0.
(H3′) The activation functions f_j(·), g_i(·) are bounded, with constants M_i, N_j such that |f_j(v)| ≤ N_j|v|, |g_i(u)| ≤ M_i|u|, for all u ∈ R^m, v ∈ R^p.
3 Existence of the Equilibrium Point and Global Exponential Stability for System (6)
It is obvious that the stability of the equilibrium of system (6) implies the stability of the equilibrium of system (3), so we only need to study system (6).

Theorem 1. For system (6), under assumptions H1, H2′, H3′, if in addition
\[
1 \ge \xi_i \sum_{j=1}^{p} N_j|s_{ji}|,\qquad 1 \ge r_j \sum_{i=1}^{m} M_i|t_{ij}|,
\]
then there exists at least one zero equilibrium.

The proof of this theorem is similar to that of existing results [12] and is omitted here. Our main purpose is to prove that the equilibrium of system (6) is globally exponentially stable by constructing proper Lyapunov-Krasovskii functions.

Theorem 2. Under assumptions H1, H2′, H3′ and the conditions of Theorem 1, if there exist γ_i, η_j (i = 1,…,m, j = 1,…,p) such that u a_i(u) ≥ γ_i u², v b_j(v) ≥ η_j v² and
\[
\alpha_i^+\lambda_i \le 1,\quad \beta_j^+ h_j \le 1,\quad
\beta_j^-\eta_j > \sum_{i=1}^{m}\alpha_i^+ N_j|s_{ji}|,\quad
\alpha_i^-\gamma_i > \sum_{j=1}^{p}\beta_j^+ M_i|t_{ij}|,
\]
then the zero equilibrium of (6) is unique and globally exponentially stable, i.e. every solution (u, v) = (u_1,…,u_m, v_1,…,v_p)^T of (6) satisfies
\[
\sum_{i=1}^{m} |u_i(n)| + \sum_{j=1}^{p} |v_j(n)| \le \nu\Big(\frac{1}{\xi}\Big)^{n} \Big[\sum_{i=1}^{m}\sup_{s\in[-l,0]}|\phi_i(s)| + \sum_{j=1}^{p}\sup_{s\in[-k,0]}|\psi_j(s)|\Big],
\tag{7}
\]
where l = max_{1≤i≤m, 1≤j≤p}{l_ij}, k = max_{1≤i≤m, 1≤j≤p}{k_ji}, and ξ, ν are constants with ξ > 1, ν > 1, for n ∈ Z_0^+.
Proof. By Theorem 1, there exists an equilibrium, and the uniqueness of the equilibrium will be guaranteed by (7), so we only need to verify (7). Consider the functions G_1(·), G_2(·) given by
\[
G_1(\tilde\xi) = 1 - \tilde\xi + \tilde\xi\,\alpha_i^-\gamma_i - \tilde\xi\sum_{j=1}^{p}\beta_j^+ M_i|t_{ij}|\,\tilde\xi^{\,l},\qquad
G_2(\tilde\xi) = 1 - \tilde\xi + \tilde\xi\,\beta_j^-\eta_j - \tilde\xi\sum_{i=1}^{m}\alpha_i^+ N_j|s_{ji}|\,\tilde\xi^{\,k},
\]
for \(\tilde\xi \in [1, +\infty)\). We note that
\[
G_1(1) = \alpha_i^-\gamma_i - \sum_{j=1}^{p}\beta_j^+ M_i|t_{ij}| > 0,\qquad
G_2(1) = \beta_j^-\eta_j - \sum_{i=1}^{m}\alpha_i^+ N_j|s_{ji}| > 0.
\]
Using the continuity of G_1(·), G_2(·) on [1, +∞), it follows that there exists a real number ξ > 1 such that
\[
G_1(\xi) = 1 - \xi + \xi\alpha_i^-\gamma_i - \xi\sum_{j=1}^{p}\beta_j^+ M_i|t_{ij}|\,\xi^{l} \ge 0,
\tag{8}
\]
\[
G_2(\xi) = 1 - \xi + \xi\beta_j^-\eta_j - \xi\sum_{i=1}^{m}\alpha_i^+ N_j|s_{ji}|\,\xi^{k} \ge 0.
\tag{9}
\]
Now let us consider the functions W_i(n) = ξ^n|u_i(n)|, Z_j(n) = ξ^n|v_j(n)|. We obtain
\[
\begin{aligned}
W_i(n+1) &= \xi^{n+1}|u_i(n+1)|
= \xi^{n+1}\Big|u_i(n) - \alpha_i(u_i(n))\Big(a_i(u_i(n)) - \sum_{j=1}^{p}s_{ji}f_j(v_j(n-k_{ji}(n)))\Big)\Big|\\
&\le \xi^{n+1}\big|u_i(n) - \alpha_i(u_i(n))a_i(u_i(n))\big| + \xi^{n+1}\sum_{j=1}^{p}|s_{ji}|\,|\alpha_i(u_i(n))|\,|f_j(v_j(n-k_{ji}(n)))|\\
&\le \xi^{n+1}(1 - \alpha_i^-\gamma_i)|u_i(n)| + \xi^{n+1}\alpha_i^+\sum_{j=1}^{p}N_j|s_{ji}|\,|v_j(n-k_{ji}(n))|\\
&= \xi W_i(n) - \alpha_i^-\gamma_i\,\xi W_i(n) + \alpha_i^+\sum_{j=1}^{p}N_j|s_{ji}|\,\xi^{k_{ji}(n)+1} Z_j(n-k_{ji}(n))\\
&\le \xi W_i(n) - \alpha_i^-\gamma_i\,\xi W_i(n) + \alpha_i^+\sum_{j=1}^{p}N_j|s_{ji}|\,\xi^{k+1} Z_j(n-k_{ji}(n)).
\end{aligned}
\tag{10}
\]
Similarly, we can calculate
\[
Z_j(n+1) = \xi^{n+1}|v_j(n+1)| \le \xi Z_j(n) - \beta_j^-\eta_j\,\xi Z_j(n) + \beta_j^+\sum_{i=1}^{m}M_i|t_{ij}|\,\xi^{l+1} W_i(n-l_{ij}(n)),
\tag{11}
\]
for i = 1,…,m, j = 1,…,p, n ∈ Z_0^+. According to (10) and (11), we can construct a Lyapunov function V(·) as V(n) = V_1(n) + V_2(n), where
\[
V_1(n) = \sum_{i=1}^{m}\Big[W_i(n) + \alpha_i^+\sum_{j=1}^{p}N_j|s_{ji}|\,\xi^{k+1}\sum_{q=n-k_{ji}(n)}^{n-1} Z_j(q)\Big],\qquad
V_2(n) = \sum_{j=1}^{p}\Big[Z_j(n) + \beta_j^+\sum_{i=1}^{m}M_i|t_{ij}|\,\xi^{l+1}\sum_{q=n-l_{ij}(n)}^{n-1} W_i(q)\Big].
\]
Calculating the difference ΔV_1(n) = V_1(n+1) − V_1(n), we have
\[
\begin{aligned}
\Delta V_1(n) &= \sum_{i=1}^{m}\Big[W_i(n+1) + \alpha_i^+\sum_{j=1}^{p}N_j|s_{ji}|\,\xi^{k+1}\sum_{q=n-k_{ji}(n+1)+1}^{n} Z_j(q)\Big]
- \sum_{i=1}^{m}\Big[W_i(n) + \alpha_i^+\sum_{j=1}^{p}N_j|s_{ji}|\,\xi^{k+1}\sum_{q=n-k_{ji}(n)}^{n-1} Z_j(q)\Big]\\
&\le -\sum_{i=1}^{m}(1-\xi+\alpha_i^-\gamma_i\xi)W_i(n) + \sum_{i=1}^{m}\alpha_i^+\sum_{j=1}^{p}N_j|s_{ji}|\,\xi^{k+1} Z_j(n).
\end{aligned}
\tag{12}
\]
Similarly, we have
\[
\Delta V_2(n) \le -\sum_{j=1}^{p}(1-\xi+\beta_j^-\eta_j\xi)Z_j(n) + \sum_{j=1}^{p}\beta_j^+\sum_{i=1}^{m}M_i|t_{ij}|\,\xi^{l+1} W_i(n).
\tag{13}
\]
Thus
\[
\Delta V(n) \le -\sum_{i=1}^{m}\Big[1-\xi+\alpha_i^-\gamma_i\xi - \sum_{j=1}^{p}\beta_j^+ M_i|t_{ij}|\,\xi^{l+1}\Big]W_i(n)
-\sum_{j=1}^{p}\Big[1-\xi+\beta_j^-\eta_j\xi - \sum_{i=1}^{m}\alpha_i^+ N_j|s_{ji}|\,\xi^{k+1}\Big]Z_j(n).
\tag{14}
\]
By using (8) and (9), we assert that ΔV(n) ≤ 0 for n ∈ Z_0^+, which implies V(n) ≤ V(0) for n ∈ Z_0^+. Since
\[
\begin{aligned}
V(0) &= V_1(0) + V_2(0)
= \sum_{i=1}^{m}\Big[W_i(0) + \alpha_i^+\sum_{j=1}^{p}N_j|s_{ji}|\,\xi^{k+1}\sum_{q=-k_{ji}(0)}^{-1} Z_j(q)\Big]
+ \sum_{j=1}^{p}\Big[Z_j(0) + \beta_j^+\sum_{i=1}^{m}M_i|t_{ij}|\,\xi^{l+1}\sum_{q=-l_{ij}(0)}^{-1} W_i(q)\Big]\\
&\le \sum_{i=1}^{m}\Big(1+\sum_{j=1}^{p}\beta_j^+ M_i|t_{ij}|\,\xi^{l+1}(l_{ij}(0)-1)\Big)\sup_{s\in[-l,0]}|W_i(s)|
+ \sum_{j=1}^{p}\Big(1+\sum_{i=1}^{m}\alpha_i^+ N_j|s_{ji}|\,\xi^{k+1}(k_{ji}(0)-1)\Big)\sup_{s\in[-k,0]}|Z_j(s)|\\
&\le \nu\Big\{\sup_{s\in[-l,0]}\sum_{i=1}^{m}|W_i(s)| + \sup_{s\in[-k,0]}\sum_{j=1}^{p}|Z_j(s)|\Big\},
\end{aligned}
\tag{15}
\]
where
\[
\nu = \max\Big\{\max_{1\le i\le m}\Big\{1+\sum_{j=1}^{p}\beta_j^+ M_i|t_{ij}|\,\xi^{l+1}(l-1)\Big\},\
\max_{1\le j\le p}\Big\{1+\sum_{i=1}^{m}\alpha_i^+ N_j|s_{ji}|\,\xi^{k+1}(k-1)\Big\}\Big\} > 1.
\]
We have \(\sum_{i=1}^{m} W_i(n) + \sum_{j=1}^{p} Z_j(n) \le V(n)\). Thus
\[
\sum_{i=1}^{m}|u_i(n)| + \sum_{j=1}^{p}|v_j(n)| \le \nu\Big(\frac{1}{\xi}\Big)^{n}\Big\{\sum_{i=1}^{m}\sup_{s\in[-l,0]}|\phi_i(s)| + \sum_{j=1}^{p}\sup_{s\in[-k,0]}|\psi_j(s)|\Big\}
\]
holds. This completes the proof.
Corollary 1. Under assumptions H1, H2′, H3′ and the conditions of Theorem 1, if there exist γ_i (i = 1,…,m), η_j (j = 1,…,p) such that u a_i(u) ≥ γ_i u², v b_j(v) ≥ η_j v² and
\[
\alpha_i^+\lambda_i \le 1,\quad \beta_j^+ h_j \le 1,\quad
\beta_j^-\eta_j > \sum_{i=1}^{m}\alpha_i^+ N_j|s_{ji}|,\quad
\alpha_i^-\gamma_i > \sum_{j=1}^{p}\beta_j^+ M_i|t_{ij}|,
\]
then the zero equilibrium of (6) is unique and globally exponentially stable, i.e. every solution (u, v) = (u_1,…,u_m, v_1,…,v_p)^T of (6) satisfies
\[
\sum_{i=1}^{m}|u_i(n)| + \sum_{j=1}^{p}|v_j(n)| \le \nu\Big(\frac{1}{\xi}\Big)^{n}\Big[\sum_{i=1}^{m}|u_i(0)| + \sum_{j=1}^{p}|v_j(0)|\Big],
\]
where ξ, ν are constants with ξ > 1, ν > 1, n ∈ Z_0^+.

Proof. Similarly to the proof of Theorem 2, we consider another Lyapunov function V(n) = Σ_{i=1}^{m} W_i(n) + Σ_{j=1}^{p} Z_j(n), for n ∈ Z_0^+. The remaining part of the proof is similar to that of Theorem 2 and is omitted here. This completes the proof.

Consider the following system:
\[
\begin{cases}
u_i(n+1) = u_i(n) - \alpha_i(u_i(n))\Big[a_i(u_i(n)) - \sum\limits_{j=1}^{p} c_{ji} f_j(v_j(n)) - \sum\limits_{j=1}^{p} s_{ji} f_j(v_j(n-k_{ji}(n)))\Big],\\[1mm]
v_j(n+1) = v_j(n) - \beta_j(v_j(n))\Big[b_j(v_j(n)) - \sum\limits_{i=1}^{m} d_{ij} g_i(u_i(n)) - \sum\limits_{i=1}^{m} t_{ij} g_i(u_i(n-l_{ij}(n)))\Big].
\end{cases}
\tag{16}
\]
Obviously, according to the proof of Theorem 1 and Corollary 1, we have the following corollary.
Corollary 2. Under assumptions H1, H2′, H3′, if there exist γ_i (i = 1,…,m), η_j (j = 1,…,p) such that u a_i(u) ≥ γ_i u², v b_j(v) ≥ η_j v² and
\[
\alpha_i^+\lambda_i \le 1,\quad \beta_j^+ h_j \le 1,\quad
\beta_j^-\eta_j > \sum_{i=1}^{m}\alpha_i^+ N_j(|c_{ji}|+|s_{ji}|),\quad
\alpha_i^-\gamma_i > \sum_{j=1}^{p}\beta_j^+ M_i(|d_{ij}|+|t_{ij}|),
\]
then the zero equilibrium of (16) is unique and globally exponentially stable, i.e. every solution (u, v) = (u_1,…,u_m, v_1,…,v_p)^T of (16) satisfies
\[
\sum_{i=1}^{m}|u_i(n)| + \sum_{j=1}^{p}|v_j(n)| \le \nu\Big(\frac{1}{\xi}\Big)^{n}\Big[\sum_{i=1}^{m}|u_i(0)| + \sum_{j=1}^{p}|v_j(0)|\Big],
\]
where ξ, ν are constants with ξ > 1, ν > 1, n ∈ Z_0^+.

Remark 1. In this paper, we have derived analogous conditions by constructing different Lyapunov functions and applying distinct analysis techniques in the proofs. Comparing Theorem 1 with Theorem 2, we can conclude that our results are independent of the delays.

Remark 2. In [14], the authors considered the special case of model (3) with α_i(x_i(n)) = 1, β_j(y_j(n)) = 1, a_i(x_i(n)) = 1, b_j(y_j(n)) = 1. In other words, this shows that our results are general and new.
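The delay-independent conditions above involve only the bounds α_i^±, β_j^±, the Lipschitz constants and the connection weights, so they can be checked mechanically. The following sketch is not from the paper; all parameter values in the call are hypothetical.

```python
import numpy as np

def theorem2_conditions(alpha_minus, alpha_plus, beta_minus, beta_plus,
                        lam, h, gamma, eta, M, N, S, T):
    """Check the delay-independent conditions of Theorem 2 elementwise.

    S is p x m with entries s_ji, T is m x p with entries t_ij;
    the remaining arguments are vectors of bounds/constants (assumed given).
    """
    c1 = np.all(alpha_plus * lam <= 1)                    # alpha_i^+ lambda_i <= 1
    c2 = np.all(beta_plus * h <= 1)                       # beta_j^+ h_j <= 1
    # beta_j^- eta_j > sum_i alpha_i^+ N_j |s_ji|
    c3 = np.all(beta_minus * eta > N * (np.abs(S) @ alpha_plus))
    # alpha_i^- gamma_i > sum_j beta_j^+ M_i |t_ij|
    c4 = np.all(alpha_minus * gamma > M * (np.abs(T) @ beta_plus))
    return bool(c1 and c2 and c3 and c4)

# Hypothetical bounds for a 2-by-2 network (illustration only).
ok = theorem2_conditions(
    alpha_minus=np.array([0.25, 0.25]), alpha_plus=np.array([0.75, 0.75]),
    beta_minus=np.array([0.5, 0.5]),    beta_plus=np.array([5/6, 5/6]),
    lam=np.ones(2), h=np.ones(2), gamma=np.ones(2), eta=np.ones(2),
    M=np.array([0.5, 0.5]), N=np.ones(2),
    S=np.full((2, 2), np.sqrt(2)/16), T=np.full((2, 2), np.sqrt(2)/16))
print("Theorem 2 conditions hold:", ok)
```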
4 Example
In this section, an example is presented to illustrate the feasibility and effectiveness of our results. Consider the discrete-time Cohen-Grossberg type BAM neural network model with delay, i.e. network (6) with the parameters
\[
f_1(y)=f_2(y)=\sin y,\qquad g_1(x)=g_2(x)=\tfrac{1}{2}\sin x,
\]
\[
\alpha(x)=\begin{pmatrix}\frac{2+\sin x_1(n)}{4} & 0\\[1mm] 0 & \frac{2+\cos x_2(n)}{4}\end{pmatrix},\quad
\beta(y)=\begin{pmatrix}\frac{4-\sin y_1(n)}{6} & 0\\[1mm] 0 & \frac{4-\cos y_2(n)}{6}\end{pmatrix},\quad
a(x)=\begin{pmatrix}1&0\\0&1\end{pmatrix}\begin{pmatrix}x_1(n)\\x_2(n)\end{pmatrix},\quad
b(y)=\begin{pmatrix}1&0\\0&1\end{pmatrix}\begin{pmatrix}y_1(n)\\y_2(n)\end{pmatrix},
\tag{17}
\]
where S and T are constant 2 × 2 connection matrices whose entries have moduli |s_ji| = √2/16 and |t_ij| = √2/4. Thus we can obtain N_1 = N_2 = 1, M_1 = M_2 = 0.5, α_1^+ = α_2^+ = 0.75, β_1^+ = β_2^+ = 5/6, α_1^- = α_2^- = 0.25, β_1^- = β_2^- = 0.5, λ_1 = λ_2 = 1, h_1 = h_2 = 1, η_1 = η_2 = 1, γ_1 = γ_2 = 1. So the conditions of Theorem 2, α_i^+λ_i = 0.75 < 1, β_j^+h_j = 5/6 < 1, and
\[
-\alpha_i^-\gamma_i + M_i\big(\beta_1^+|t_{i1}| + \beta_2^+|t_{i2}|\big) < 0,\qquad
-\beta_j^-\eta_j + N_j\big(\alpha_1^+|s_{j1}| + \alpha_2^+|s_{j2}|\big) < 0,
\]
hold, where i, j = 1, 2. Consequently, the origin of system (17) is globally exponentially stable, which can also be seen from Fig. 1.
Fig. 1 The numeric simulation of the state variables x(n) and y(n) of system (17)
In Fig. 1, we take the initial values as (x_1(0), x_2(0), y_1(0), y_2(0))^T = (1, 0.5, −0.5, −1)^T and the delays k_ji = 1, l_ij = 1. However, using the criteria in [14], it is difficult to ascertain the exponential stability of example (17). This means that our results are new and general.
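A simulation in the spirit of this example can be sketched as follows. The functions, bounds and initial values follow the example above, but the sign pattern of S and T and the constant initial history are assumptions made for illustration only.

```python
import numpy as np

# Sketch of a simulation of example (17); signs of S and T are assumed.
alpha = lambda x: np.array([(2 + np.sin(x[0])) / 4, (2 + np.cos(x[1])) / 4])
beta  = lambda y: np.array([(4 - np.sin(y[0])) / 6, (4 - np.cos(y[1])) / 6])
a, b = (lambda x: x), (lambda y: y)          # a(x) = (x1, x2)^T, b(y) = (y1, y2)^T
f = np.sin                                   # f1 = f2 = sin
g = lambda x: 0.5 * np.sin(x)                # g1 = g2 = (1/2) sin
S = (np.sqrt(2) / 16) * np.array([[1.0, -1.0], [-1.0, -1.0]])   # assumed signs
T = (np.sqrt(2) / 4)  * np.array([[1.0,  1.0], [-1.0,  1.0]])   # assumed signs
k = l = 1                                    # delays k_ji = l_ij = 1, as in the example

x_hist = [np.array([1.0, 0.5])] * (l + 1)    # x(0) = (1, 0.5), constant history assumed
y_hist = [np.array([-0.5, -1.0])] * (k + 1)  # y(0) = (-0.5, -1)
for n in range(15):
    x_n, y_n = x_hist[-1], y_hist[-1]
    x_next = x_n - alpha(x_n) * (a(x_n) - S.T @ f(y_hist[-1 - k]))
    y_next = y_n - beta(y_n) * (b(y_n) - T.T @ g(x_hist[-1 - l]))
    x_hist.append(x_next); y_hist.append(y_next)

# The trajectory should decay toward the origin (cf. Fig. 1).
print(np.abs(x_hist[-1]).max(), np.abs(y_hist[-1]).max())
```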
5 Conclusions In this paper, the main purpose is to study the existence and the global exponential stability of the equilibrium point for discrete-time Cohen-Grossberg type BAM networks with variable delay. Our model is more general, and we generalize previous authors' techniques and results. We have derived some delay-independent sufficient conditions ensuring the global exponential stability of the equilibrium of discrete-time Cohen-Grossberg type BAM networks via the Lyapunov functional approach, inequality analysis techniques and matrix theory.
References 1. Arik, S., Tavsanoglu, V.: Global asymptotic stability analysis of bidirectional associative memory neural networks with constant time delays. Neurocomputing 68, 161–176 (2005) 2. Gopalsamy, K., He, X.Z.: Delay-independent stability in bidirectional associative memory networks. IEEE Trans. on Neural Networks 5, 998–1002 (1994) 3. Li, C.D., Liao, X.F., Zhang, R.: Delay-dependent exponential stability analysis of bi-directional associative memory neural networks with time delay: an LMI approach. Chaos, Solitons and Fractals 24, 1119–1134 (2005) 4. Liao, X.F., Wong, K.W., Yang, S.Z.: Convergence dynamics of hybrid bidirectional associative memory neural networks with distributed delays. Physics Letters A 316, 55–64 (2003) 5. Liu, Y.R., Wang, Z.D., Liu, X.H.: Global asymptotic stability of generalized bidirectional associative memory networks with discrete and distributed delays. Chaos, Solitons and Fractals 28, 793–803 (2006) 6. Lou, X.Y., Cui, B.T.: Absolute exponential stability analysis of delayed bidirectional associative memory neural networks. Chaos, Solitons and Fractals 31, 695–701 (2007) 7. Wang, H.X., He, C., Yu, J.B.: Analysis of global exponential stability for a class of bi-directional associative memory networks. Circuits and Systems 5, 673–676 (2003) 8. Zhao, H.Y.: Exponential stability and periodic oscillatory of bi-directional associative memory neural network involving delays. Neurocomputing 69, 424–448 (2006) 9. Feng, Z.S., Michel, A.N.: Robustness analysis of a class of discrete-time recurrent neural networks under perturbations. IEEE Trans. on Circuits and Systems I 46, 1482–1486 (1999)
10. Ma, K.L., Peng, J.G., Xu, Z.B., Yiu, K.F.C.: A new stability criterion for discrete-time neural networks: Nonlinear spectral radius. Chaos, Solitons and Fractals 31, 424–436 (2007) 11. Mohamad, S., Gopalsamy, K.: Dynamics of a class of discrete-time neural networks and their continuous-time counterparts. Mathematics and Computers in Simulation 53, 1–39 (2000) 12. Mohamad, S., Gopalsamy, K.: Exponential stability of continuous-time and discrete-time cellular neural networks with delays. Applied Mathematics and Computation 135, 17–38 (2003) 13. Xiong, W.J., Cao, J.D.: Global exponential stability of discrete-time Cohen-Grossberg neural networks. Neurocomputing 64, 433–446 (2005) 14. Liang, J.L., Cao, J., Ho, D.W.C.: Discrete-time bidirectional associative memory neural networks with variable delays. Physics Letters A 335, 226–234 (2005) 15. Cao, J., Liang, J.L.: Boundedness and stability for Cohen-Grossberg neural network with time-varying delays. Journal of Mathematical Analysis and Applications 296, 665–685 (2004) 16. Cao, J., Li, X.L.: Stability in delayed Cohen-Grossberg neural networks: LMI optimization approach. Physica D: Nonlinear Phenomena 212, 54–65 (2005) 17. Liao, X.F., Li, C.G., Wong, K.W.: Criteria for exponential stability of Cohen-Grossberg neural networks. Neural Networks 17, 1401–1414 (2004) 18. Song, Q.K., Cao, J.D.: Stability analysis of Cohen-Grossberg neural network with both time-varying and continuously distributed delays. Journal of Computational and Applied Mathematics 197, 188–203 (2006) 19. Liu, X.G., Tang, M.L., Martin, R., Liu, X.B.: Discrete-time BAM neural networks with variable delays. Physics Letters A 367, 322–330 (2007)
The Study of Project Cost Estimation Based on Cost-Significant Theory and Neural Network Theory Xinzheng Wang, Liying Xing, and Feng Lin
School of Civil Engineering, Nanyang Normal University, Nanyang, Henan 473061, China
Abstract. Drawing on relevant domestic and foreign theories and methods, cost-significant theory and neural network theory are used in this paper to estimate project cost. Cost-significant theory is put forward to solve the tedious operational issues: by finding the significant items, it simplifies the work of engineering cost estimation. A BP neural network is then applied to "distill" the data of CSIs and csf from completed projects. The accurate prediction of project investment is realized by using these two nonlinear theories. The basic theories of CS and the BP neural network are illustrated by an example. The example shows that the relative errors are small enough to meet the accuracy demands of cost estimation. Meanwhile, the test results show that the model based on cost-significant theory and neural network theory is successful and effective for practical engineering.
Keywords: Cost estimation, Cost-significant theory, BP neural network, Investment control.
1 Introduction The construction investment estimation is an important part of a project's feasibility study. The accuracy of the cost estimation directly affects the project's decision, construction scale, design scheme and economic effects, and affects the project's progress. Processing the estimate handily, quickly and exactly is significant for the management and control of project investment. The main feature of engineering estimation is that there are many factors which affect the project cost, and there is a highly nonlinear mapping relationship between the project cost and these uncertain factors. In the traditional estimation method, the quota index has a fixed and static character (such as the Investment Estimate Fixed Estimation), and the traditional estimation method is to build a model
according to a linear relationship between the project's cost and its factors (such as Units of Production Capacity Estimation, Production Capacity Index Estimation, Proportion Estimation, the Lange Coefficient Method, the Capital Turnover Method, etc.), which does not accurately reflect practical conditions. To solve these problems, this paper studies a project cost estimation method for our country that combines cost-significant theory with BP neural network theory. Firstly, the estimation budget and the budget-estimate programming are simplified according to cost-significant theory. Secondly, on this foundation, the paper aims to "discover and seek" the data of the similar cost-significant items (CSIs) and cost-significant factors (csf) from a large quantity of historical project information by means of the BP neural network. The nonlinear relationship between the uncertain factors that influence the project cost and the construction cost is then discovered and obtained, so the investment estimation of the proposed project is realized and the purpose of simplifying the estimation procedure and improving the estimation accuracy is reached.
2 Basic Thoughts of Cost-Significant Theory CS theory is the cost-significant theory, whose theoretical ideas originate from the Italian economist Vilfredo Pareto's finding that social wealth is not evenly distributed: about twenty percent of the population owns about eighty percent of the social wealth. Malcolm Horner of the University of Dundee applied the cost-significant idea to construction cost estimation research, finding that about twenty percent of the quantity items bear about eighty percent of the total construction cost; these twenty percent of items are called the cost-significant items (CSIs), and the others are the non-cost-significant items (non-CSIs) [1]. Although the non-cost-significant items have less effect on the total cost, their computing workload is far greater than that of the CSIs. Therefore we only pay attention to the cost-significant items, catching and studying the key sectional works among the various sectional works according to this key idea, and thereby inferring the cost of the whole project and of similar projects, which both simplifies the calculation workload and guarantees the accuracy of the investment estimation [2-4].
2.1 The Search Steps for CSIs (1) Calculate the total construction cost C and the total number of items N. The counting of quantity items needs explanation here: first select the level of construction items to be counted according to the construction scale; then count only the items at that level which actually incur expenses, which gives the number N. (2) Calculate the average cost T of the items. The average cost is the ratio of the total project cost to the total number of items, namely T = C/N. (3) Find the CSIs by comparing each single item cost with the average cost. Generally, if a single item cost is more than the average cost, the item is a cost-significant item (CSI); if it is less than the average cost, it is a non-cost-significant item (non-CSI). If the proportion of CSIs cannot be kept within thirty percent, a second averaging can be carried out [5].
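A minimal sketch of these search steps is given below; the item names and costs are invented for illustration and are not taken from the paper.

```python
# Sketch of the CSI search steps on an invented bill of quantities (costs in 10^4 yuan).
items = {
    "earthwork": 820.0, "pavement": 1450.0, "bridges": 2300.0, "culverts": 310.0,
    "drainage": 150.0, "guard works": 95.0, "greening": 60.0, "signs": 40.0,
}

C = sum(items.values())                # step (1): total cost C
N = len(items)                         # step (1): number of cost-bearing items N
T = C / N                              # step (2): average item cost T = C / N

csis = {name: cost for name, cost in items.items() if cost > T}   # step (3)
csf = sum(csis.values()) / C           # cost-significant factor of this project

print("average cost T =", round(T, 1))
print("CSIs:", sorted(csis), " csf =", round(csf, 3))
```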
2.2 The Consistency of CSIs Foreign studies show that similar constructions have the same CSIs, and that the "cost-significant factor", i.e. the ratio of the CSIs' cost to the total project cost, is very stable. Although the CSIs may not be exactly the same in different bills of quantities, the CSIs of similar projects have many similarities. Many studies also show that [6] similar bills of quantities have almost identical CSIs. Abroad, many CSI models have been built for construction cost estimation. The results indicate that the csf data of different projects lie between 10% and 27%, and the investment estimation precision is between 4% and 10%, which satisfies the precision demand of investment estimation.
2.3 The Application Steps of the CSIs Model If completed projects of the same type have the same CSIs and their cost-significant factor is stable, the investment estimation of a proposed project may be carried out in the following way: ① choose the CSIs from the bills of quantities of similar completed projects by the mean theory, and calculate the similar projects' cost-significant factors; ② estimate the proposed project's CSIs cost and its cost-significant factor; ③ the proposed project cost equals the ratio of the CSIs cost to the mean of the cost-significant factor (csf). Thus, the CSIs can greatly simplify the computation procedure of investment estimation in the feasibility study phase without affecting the accuracy of the measurement. However, to reach this purpose, distinguishing between projects of the same type and similar projects plays an important role. If this distinction is made artificially, it will cause large errors and poor stability. This paper therefore adopts the neural network method: given the CSIs and csf data of many completed constructions, the same-type or similar projects are judged by simulating the "experience" of the human brain, and the proposed project's CSIs cost and cost-significant factor are thereby predicted.
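A minimal sketch of application steps ①–③ (all numbers are hypothetical): the proposed project's cost is obtained by dividing its estimated CSIs cost by the mean csf of similar completed projects.

```python
# Step 1: hypothetical csf values taken from similar completed projects.
similar_csf = [0.79, 0.82, 0.80, 0.78]

# Step 2: CSIs cost of the proposed project, e.g. priced from its bill of quantities
# (value invented for illustration), in ten thousand yuan.
proposed_csi_cost = 4120.0

# Step 3: total cost estimate = CSIs cost / mean csf.
mean_csf = sum(similar_csf) / len(similar_csf)
estimate = proposed_csi_cost / mean_csf
print(f"mean csf = {mean_csf:.3f}, estimated total cost = {estimate:.0f}")
```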
3 The Applications of the BP Network An artificial neural network is an information processing method inspired by biological neural systems. Based on learning from sample data, an artificial neural network analyses the data, builds a model and then finds new knowledge. A neural network can automatically adjust the inputs and outputs of its neurons in accordance with learned rules, changing its internal state [7-9]. Among all kinds of neural network models, the BP neural network is a widely applied one. The standard BP network is made up of three kinds of neuron layers. The lowest layer is called the input layer, the middle one (which can have several layers) is called the hidden layer, and the top one is called the output layer. The neurons of adjacent layers are fully connected, while the neurons within each layer have no connections. The learning course of the BP algorithm is made up of
propagation and back-propagation [10-11]. In the propagation course, the input information is transferred and processed through the input layer and the hidden layer; the state of each layer of neurons only affects the state of the next layer. If the expected information cannot be obtained at the output layer, the course turns into back-propagation and returns the error signal along the former connection path. By altering the connection weights between the layers, the error signal is transmitted back towards the input layer, and then the propagation course is repeated. The repeated application of these two courses makes the error smaller and smaller until it meets the demand. The specific structure can be seen in Fig. 1. In this structure, the relationship between the input and output of each neuron (except in the input layer) is a nonlinear mapping, and the S (sigmoid) function is usually adopted: f(x) = 1/(1+e^(-x)) is the node output function, and its derivative is f'(x) = f(x)(1 − f(x)). Its advantage is that any input data can be transformed into numbers lying in (0, +1).
Fig. 1 BP neural network basic structure
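To make the propagation / back-propagation cycle described above concrete, here is a minimal three-layer BP sketch using the sigmoid node function and its derivative f'(x) = f(x)(1 − f(x)); the layer sizes, learning rate and data are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: 1.0 / (1.0 + np.exp(-x))          # sigmoid node output function

n_in, n_hidden, n_out, lr = 8, 17, 2, 0.5        # assumed sizes and learning rate
W1 = rng.uniform(-1, 1, (n_hidden, n_in))        # input -> hidden weights
W2 = rng.uniform(-1, 1, (n_out, n_hidden))       # hidden -> output weights

X = rng.uniform(0, 1, (16, n_in))                # 16 invented training samples
Y = rng.uniform(0, 1, (16, n_out))               # targets scaled into (0, 1)

for epoch in range(2000):
    for x, y in zip(X, Y):
        h = f(W1 @ x)                            # forward propagation
        o = f(W2 @ h)
        delta_o = (o - y) * o * (1 - o)          # back-propagated error, output layer
        delta_h = (W2.T @ delta_o) * h * (1 - h) # back-propagated error, hidden layer
        W2 -= lr * np.outer(delta_o, h)          # weight corrections
        W1 -= lr * np.outer(delta_h, x)

print("final mean squared error:", float(np.mean((f(W2 @ f(W1 @ X.T)) - Y.T) ** 2)))
```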
3.1 BP Neural Network Application in CSIs Estimation From the budget (or bill of quantities) cost information of a large number of typical completed projects, we can first analyse each project's CSI sectional items, total project cost and cost-significant factor data by the mean theory. Secondly, the information is sorted into a certain format according to the project data analysis and engineering properties and taken as the training samples. Thirdly, the samples are input into the neural network for training, thereby completing the mapping from the input layer (engineering characteristics) to the output layer (CSI data). The result is a highly nonlinear neural network model that automatically extracts this knowledge and stores it in the network weights. Project cost personnel can then obtain the proposed project's CSIs and csf by inputting the relevant information about the characteristics of the proposed project into the neural network.
3.2 Building the BP Neural Network Model of Investment Estimation The neural network investment estimation model is composed of an input preprocessing module, a neural network module and an output processing module. The input preprocessing module mainly pre-processes the input data, changing qualitative things into quantitative data that are easy for the neural network to operate on. The output processing module transforms the neural network output into the investment estimate data we need. The core is the neural network module.
3.3 Case Analysis
,
This article takes the highway investment estimation for example analyzes the different constructions characteristic’s CSIs and cfs data, by collecting and sorting out the material cost, labor cost, levy land cost and project construction cost of completed highway construction projects. The selection of the project’s feature should consult statistics and analysis of the historical projects’ materials. And it can be sure by the expert experience. We can analyze the effect that the cost of the typical highway project and the change of construction’s parameter make for the estimation of the project’s investment. Then, we confirm eight main factors that are landform, cross-section’s type ( cutting excavation, embankment, half-digging and half-filling), height, and width, foundation processing type , road surface’s material and thickness, guard project’s type and so on as the project’s features. And then list the different types of project’s feature, and lead to the change of every kilometer highway engineering’s cost according to quota standard and engineering characteristic to being mutually related nature of cost influence, and sequence them from childhood to the big, and the Tab.1 is seen to the subjective preset corresponding quantification data at last. To illustrate this problem, we can simply calculation, choose twenty typical construction CSIs remit total(taking the seventeen and twentieth groups data as testing samples, the first and sixteen groups data as learning samples) to set up a BP neural network investment estimation model. From Table 1, it shows that any highway project’s model can be given a quantitative description. Taking Ti = (ti1 , ti 2 , , ti 8 ) as an example, Ti is the serial
number of the project (i = 1, 2, ...), and t_{ij} (j = 1, 2, ..., 8) is the quantitative value of the j-th feature of the i-th project. For instance, consider a highway project (with serial number i) located in the plain, whose cross-section type is embankment, designed for high speed, with a roadbed cross-section height of 1.8 m and width of 35 m, mason grizzly screen foundation processing, bitumen concrete surfacing, common guard and a road surface structural thickness of 0.45 m. Its quantitative description is T_i = (3, 2, 4, 6, 3, 1, 1, 4). If some feature is made up of several kinds, its quantification result is taken as the weighted average according to the proportions. The model adopts a three-layer BP network and chooses f(x) = 1/(1 + e^(-x)) as the node output function. The input layer has eight units, which stand for the project characteristic vector: landform, cross-section type, cross-section height, cross-section width, foundation processing type, road surface material, road surface thickness and protection type; they
are expressed by I1~I8. The two output units are the per-kilometre CSIs cost (ten thousand yuan) and the csf, expressed by O1 and O2 respectively [12]. The number of hidden-layer units is taken as seventeen according to Kolmogorov's theorem. The initial weights are chosen as random numbers in (-1, 1). According to the complexity of the input-output mapping, sixteen training samples and four testing samples are collected. The characteristic quantification data of the twenty typical samples, together with the data of O1 and O2, are listed in Tab. 2.
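The following sketch (NumPy only, with an assumed scaling constant of 3500 so every value lies in (0, 1)) trains a network of the 8-17-2 shape just described on two rows of Table 2, to show how the forward pass and the weight updates fit together; it is illustrative rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two training rows taken from Table 2 (samples 1 and 2); inputs are divided
# by the largest category code (6) and O1 by an assumed 3500.
X = np.array([[3, 2, 1, 2, 1, 1, 2, 6],
              [2, 1, 1, 2, 3, 2, 1, 4]]) / 6.0
Y = np.array([[1608 / 3500.0, 0.782],
              [1823 / 3500.0, 0.807]])

W1 = rng.uniform(-1, 1, size=(8, 17))   # input -> hidden, initial weights in (-1, 1)
W2 = rng.uniform(-1, 1, size=(17, 2))   # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for epoch in range(5000):
    H = sigmoid(X @ W1)                  # forward pass, hidden layer
    O = sigmoid(H @ W2)                  # forward pass, output layer
    err_out = (O - Y) * O * (1 - O)      # back-propagated output error
    err_hid = (err_out @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ err_out             # weight updates
    W1 -= lr * X.T @ err_hid

print(np.round(O, 3))  # approaches the scaled targets in Y
```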
3.4 The Analysis of the Test Results The converged network is used to test the data of groups seventeen to twenty, and the outputs are taken as the predicted values of O1 and O2; the results are shown in Tab. 3, where O1/O2 is the predicted per-kilometre total project investment of the test sample. It can be seen from the test results that the relative error between the real value and the predicted value is less than 8%; the overall error is small, and the needs of estimation at the engineering feasibility-study stage can basically be satisfied. This shows that the model generalises well and that the estimation model is successful.
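For reference, the comparison reported in Tab. 3 below can be reproduced from its O1 and O2 columns as in this small sketch (the four rows are the values recorded for test samples 17-20).

```python
# (predicted O1, actual O1, predicted O2, actual O2) for samples 17-20
rows = [
    (1906, 1990, 0.817, 0.798),
    (2511, 2492, 0.790, 0.801),
    (2389, 2490, 0.824, 0.806),
    (1826, 1855, 0.790, 0.812),
]
for o1_pred, o1_act, o2_pred, o2_act in rows:
    total_pred, total_act = o1_pred / o2_pred, o1_act / o2_act   # O1/O2 = per-km total
    rel_err = 100 * (total_pred - total_act) / total_act
    print(f"{total_pred:8.1f} {total_act:8.1f} {rel_err:6.1f}%")
```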
Table 1 Quantification of the completed highway projects' features

| Feature | 1 | 2 | 3 | 4 | 5 | 6 |
| Landform | mountain area | hill | plain | | | |
| Type of foundation's cross-section | cutting excavation | embankment | half-digging and half-filling | cutting excavation & embankment | cutting excavation & half-digging and half-filling | embankment & half-digging and half-filling |
| Height of foundation's cross-section (m) | 0~0.5 | 0.5~1 | 1~1.5 | 1.5~2 | 2~2.5 | more than 2.5 |
| Width of foundation's cross-section (m) | 0~20 | 20~30 | 30~40 | 40~50 | 50~60 | more than 60 |
| Type of foundation | general replacement | plastic board drain concretion | mason grizzly screen | sand pile drain concretion | punning, stir pile | geo-textile |
| Material of road surface's structure | bitumen concrete | cement concrete | | | | |
| Guard project | common guard | anchor plate revetment | gravitation retaining wall | spray-net strut | plate girder strut | grass slope preserve |
| Thickness of road surface's structure (m) | 0~0.2 | 0.2~0.3 | 0.3~0.4 | 0.4~0.5 | 0.5~0.6 | more than 0.6 |
Table 2 Characteristic quantification data and budget data of the typical samples (output in ten thousand yuan/km)

| Serial number | I1 | I2 | I3 | I4 | I5 | I6 | I7 | I8 | O1 | O2 |
| 1 | 3 | 2 | 1 | 2 | 1 | 1 | 2 | 6 | 1608 | 0.782 |
| 2 | 2 | 1 | 1 | 2 | 3 | 2 | 1 | 4 | 1823 | 0.807 |
| 3 | 1 | 4 | 2 | 2 | 2 | 1 | 5 | 3 | 2400 | 0.824 |
| 4 | 2 | 4 | 2 | 3 | 6 | 2 | 3 | 3 | 2528 | 0.811 |
| 5 | 3 | 1 | 1 | 3 | 1 | 1 | 1 | 4 | 1723 | 0.794 |
| 6 | 3 | 6 | 4 | 5 | 6 | 1 | 6 | 4 | 2346 | 0.778 |
| 7 | 1 | 1 | 1 | 5 | 1 | 2 | 5 | 3 | 3113 | 0.813 |
| 8 | 1 | 4 | 3 | 3 | 3 | 4 | 5 | 2 | 2361 | 0.806 |
| 9 | 3 | 2 | 5 | 3 | 6 | 1 | 1 | 3 | 1742 | 0.814 |
| 10 | 2 | 1 | 1 | 4 | 4 | 1 | 4 | 4 | 2240 | 0.794 |
| 11 | 2 | 2 | 6 | 5 | 6 | 2 | 6 | 5 | 2468 | 0.782 |
| 12 | 3 | 2 | 6 | 6 | 2 | 1 | 4 | 5 | 2521 | 0.815 |
| 13 | 1 | 4 | 5 | 4 | 4 | 1 | 3 | 5 | 2904 | 0.808 |
| 14 | 3 | 2 | 6 | 6 | 3 | 1 | 1 | 5 | 2232 | 0.825 |
| 15 | 1 | 4 | 5 | 4 | 3 | 1 | 1 | 4 | 2704 | 0.802 |
| 16 | 3 | 1 | 6 | 6 | 1 | 1 | 6 | 5 | 2400 | 0.805 |
| 17 | 3 | 4 | 1 | 3 | 1 | 2 | 6 | 4 | 1906 | 0.798 |
| 18 | 1 | 4 | 2 | 2 | 6 | 1 | 3 | 3 | 2511 | 0.801 |
| 19 | 1 | 3 | 4 | 3 | 1 | 1 | 6 | 4 | 2389 | 0.806 |
| 20 | 3 | 2 | 2 | 3 | 1 | 1 | 1 | 6 | 1826 | 0.812 |
Table 3 Results analysis

| Serial number | predicted O1 | actual O1 | relative error of O1 (%) | predicted O2 | actual O2 | relative error of O2 (%) | predicted O1/O2 | actual O1/O2 | relative error (%) |
| 17 | 1906 | 1990 | -4.22 | 0.817 | 0.798 | 2.4 | 2332.9 | 2493.7 | -6.4 |
| 18 | 2511 | 2492 | 0.8 | 0.790 | 0.801 | -1.3 | 3178.5 | 3111.1 | 2.1 |
| 19 | 2389 | 2490 | -4.4 | 0.824 | 0.806 | 2.2 | 2899.3 | 3089.3 | -6.2 |
| 20 | 1826 | 1855 | -1.6 | 0.790 | 0.812 | -2.7 | 2311.4 | 2284.5 | 1.2 |
4 Conclusion In this paper, cost-significant theory and neural network theory are used to estimate engineering cost. Following the idea of CS theory, a few key items, namely the significant items, are extracted from the numerous subentry projects, which greatly reduces the operational difficulty and the calculation workload of project cost estimation. Meanwhile, neural network theory is applied to extract the cost-significant items (CSIs) and the cost-significant factors (csf) automatically from a large amount of historical project cost data. In addition, owing to its high fault tolerance, the neural network can automatically correct the
deviations in the completed project data caused by human or other factors. The neural network also has an automatic matching and correcting function for the similarities between the cost-significant items and factors. Furthermore, because the neural network processes data in parallel, the data can be processed at high speed, so the method can meet the requirements of rapid estimation. The results show that using cost-significant theory together with the neural network to estimate highway project investment is efficient, feasible and accurate. At the same time, the work has reference value and academic significance for adopting new scientific methods to advance the study of project investment estimation.
References
1. Malcolm, R., Horner, W.: New Property of Numbers - The Mean Value and its Application to Data Simplification. The Papers to the Royal Society, London (2004)
2. Zakieh, R.: Quantity-significance and Its Application to Construction Project Modelling. University of Dundee, Dundee (1991)
3. Al-Hubail, J.: Modelling Operation and Maintenance Expenditures in the Offshore Oil and Gas Industry. Department of Civil Engineering, University of Dundee, Dundee (2000)
4. Asif, M.: Simple Generic Models for Cost-Significant Estimating of Construction Project Cost. Department of Civil Engineering, University of Dundee, Dundee (1988)
5. Duan, X.C., Zhang, J.W.: The Government Investment Project Overall Investment Control Theory and Study Method. Science Press, Beijing (2007)
6. Wang, N.: New Approaches to Optimising the Whole Life Performance of Building and Civil Engineering Projects. Department of Civil Engineering, University of Dundee, Dundee (2005)
7. Shao, F.J., Yu, Z.Q.: Principle and Method of Data Mining. China Hydropower Press, Beijing (2003)
8. Wang, W.: Artificial Neural Network Theory - Application and Introduction. Beijing Aeronautics and Astronautics University Press, Beijing (1996)
9. Jiao, L.C.: Neural Network Theory. Xi'an Electronic Technology University Press, Xi'an (1994)
10. Cheng, M.: Neural Network Model. Dalian University of Science and Technology Press, Dalian (1995)
11. Yan, P.F., Zhang, C.S.: Artificial Neural Network and Simulation Calculation. Tsinghua University Press, Beijing (2000)
12. Zhou, L.P., Hu, Z.F.: The Application of Neural Network in the Cost Estimation of Construction. Journal of Xi'an University of Architecture & Technology 37(2), 261-264 (2005)
Global Exponential Stability of High-Order Hopfield Neural Networks with Time Delays Jianlong Qiu and Quanxin Cheng
Abstract. In this paper, the global exponential stability is studied for a class of high-order Hopfield neural networks (HHNNs) with time delays by employing Lyapunov method and linear matrix inequality (LMI) technique. Simple sufficient conditions are given ensuring global exponential stability of HHNNs. The proposed results improve some previous works and do not require the symmetry of weight matrix. In addition, the proposed conditions are easily checked by using the Matlab LMI Control Toolbox.
1 Introduction It is well known that neural networks play an important role in many fields, such as pattern recognition, signal processing, associative memory, and optimization. These applications heavily depend on the networks' strong approximation property. In this respect, high-order neural networks do better than ordinary neural networks; that is, they have a stronger approximation property, a faster convergence rate, greater storage capacity, and higher fault tolerance. Because of this, HHNNs have recently attracted considerable attention, see [1-8]. In [2], the absolute stability of high-order neural networks is studied. In [7], some criteria are derived to ascertain global asymptotic stability for high-order Hopfield-type neural networks, and in [8] some sufficient conditions are presented for the exponential stability of
Jianlong Qiu: School of Automation, Southeast University, Nanjing 210096, China, and Department of Mathematics, Linyi Normal University, Linyi 276005, China
Quanxin Cheng: Department of Mathematics, Southeast University, Nanjing 210096, China
high-order BAM neural networks. This previous literature considers only the stability of high-order neural networks. In reality, studies of neural dynamical systems involve not only stability but also many other dynamical behaviours, such as periodic oscillation, bifurcation and chaos; in many applications the existence of periodic solutions is of great interest. Motivated by the above discussion, this paper studies the exponential stability of HHNNs. The proposed results are given in the form of linear matrix inequalities and do not require the symmetry of the weight matrix. Moreover, an illustrative example is given to demonstrate the effectiveness of the obtained results. The remainder of the paper is organized as follows. In Section 2 the model formulation and some preliminaries are given. The main results are stated in Section 3. Finally, concluding remarks are made in Section 4.
Notation. Throughout this paper the following notation is used. Let $A = (a_{ij})$ be an $n \times n$ real matrix. $A^T$, $A^{-1}$, $\lambda_{\max}(A)$, $\lambda_{\min}(A)$ denote, respectively, the transpose, the inverse, the maximum eigenvalue and the minimum eigenvalue of a square matrix $A$. The notation $A > 0$ means that $A$ is symmetric and positive definite. For $x \in \mathbb{R}^n$, its norm is defined by $\|x\| = \sqrt{x^T x}$; $M^* = \sum_{j=1}^{n} M_j^2$, $K = \mathrm{diag}(K_1, K_2, \ldots, K_n)$, $L = \mathrm{diag}(L_1, L_2, \ldots, L_n)$.
2 Model Formulation and Preliminaries
Consider the following second-order neural networks with time delays, modelled by the set of differential equations
\[
\dot{x}_i(t) = -a_i x_i(t) + \sum_{j=1}^{n} b_{ij} f_j(x_j(t)) + \sum_{j=1}^{n} c_{ij} g_j(x_j(t-\tau)) + \sum_{j=1}^{n}\sum_{l=1}^{n} d_{ijl}\, g_j(x_j(t-\tau))\, g_l(x_l(t-\tau)) + I_i(t), \tag{1}
\]
where $i = 1, 2, \ldots, n$; $t > 0$; $x_i(t)$ denotes the potential (or voltage) of the $i$th neuron at time $t$; $a_i$ are positive constants denoting the rate with which the $i$th unit resets its potential to the resting state in isolation when disconnected from the network and external inputs; the time delay $\tau$ is a non-negative constant corresponding to the finite speed of axonal signal transmission; $b_{ij}$, $c_{ij}$, $d_{ijl}$ are the first-order and second-order connection weights of the neural network, respectively; $I_i$ denotes the $i$th component of an external input source introduced from outside the network to the $i$th neuron. Throughout this paper, the activation functions are assumed to satisfy the following assumptions.
($H_1$) There exist numbers $\tilde M_j > 0$ such that $|g_j(x)| \le \tilde M_j$ for all $x \in \mathbb{R}$, $j = 1, 2, \ldots, n$;
($H_2$) There exist numbers $\tilde L_i > 0$, $\tilde K_j > 0$ such that
\[
0 \le \frac{f_i(x) - f_i(y)}{x - y} \le \tilde L_i, \qquad 0 \le \frac{g_j(x) - g_j(y)}{x - y} \le \tilde K_j,
\]
for all $x, y \in \mathbb{R}$ $(i, j = 1, 2, \ldots, n)$. The initial conditions associated with system (1) are of the form
\[
x_i(t) = \varphi_i(t), \qquad -\tau \le t \le 0, \tag{2}
\]
in which $\varphi_i(t)$ $(i = 1, 2, \ldots, n)$ are continuous functions. Under the assumptions ($H_1$) and ($H_2$), system (1) has an equilibrium point $X^* = [x_1^*, x_2^*, \ldots, x_n^*]^T$. Denote $y_i(t) = x_i(t) - x_i^*$, $f_j(y_j(t)) = f_j(y_j(t) + x_j^*) - f_j(x_j^*)$, $g_j(y_j(t)) = g_j(y_j(t) + x_j^*) - g_j(x_j^*)$; then it is obvious that the functions $f_j(\cdot)$, $g_j(\cdot)$ have the following properties:
($A_1$) There exist numbers $M_j > 0$ such that $|g_j(x)| \le M_j$ for all $x \in \mathbb{R}$, $j = 1, 2, \ldots, n$;
($A_2$) There exist numbers $L_i > 0$, $K_j > 0$ such that
\[
0 \le \frac{f_i(x) - f_i(y)}{x - y} \le L_i, \qquad 0 \le \frac{g_j(x) - g_j(y)}{x - y} \le K_j,
\]
for all $x, y \in \mathbb{R}$ $(i, j = 1, 2, \ldots, n)$. System (1) is transformed into
\[
\begin{aligned}
\dot{y}_i(t) &= -a_i y_i(t) + \sum_{j=1}^{n} b_{ij} f_j(y_j(t)) + \sum_{j=1}^{n} c_{ij} g_j(y_j(t-\tau))\\
&\quad + \sum_{j=1}^{n}\sum_{l=1}^{n} d_{ijl}\bigl[\bigl(g_j(x_j(t-\tau)) - g_j(x_j^*)\bigr) g_l(x_l(t-\tau)) + \bigl(g_l(x_l(t-\tau)) - g_l(x_l^*)\bigr) g_j(x_j^*)\bigr]\\
&= -a_i y_i(t) + \sum_{j=1}^{n} b_{ij} f_j(y_j(t)) + \sum_{j=1}^{n} c_{ij} g_j(y_j(t-\tau)) + \sum_{j=1}^{n}\sum_{l=1}^{n} (d_{ijl} + d_{ilj})\, \xi_l\, g_j(y_j(t-\tau)),
\end{aligned} \tag{3}
\]
where $\xi_l = \bigl[d_{ijl}/(d_{ijl}+d_{ilj})\bigr] g_l(x_l(t-\tau)) + \bigl[d_{ilj}/(d_{ijl}+d_{ilj})\bigr] g_l(x_l^*)$ when $d_{ijl} + d_{ilj} \ne 0$, so that it lies between $g_l(x_l(t-\tau))$ and $g_l(x_l^*)$; otherwise $\xi_l = 0$.
Denote $Y(t) = (y_1(t), y_2(t), \ldots, y_n(t))^T$, $F(Y(t)) = (f_1(y_1(t)), f_2(y_2(t)), \ldots, f_n(y_n(t)))^T$, $G(Y(t-\tau)) = (g_1(y_1(t-\tau)), g_2(y_2(t-\tau)), \ldots, g_n(y_n(t-\tau)))^T$, $A = \mathrm{diag}(a_1, a_2, \ldots, a_n)$, $B = (b_{ij})_{n\times n}$, $C = (c_{ij})_{n\times n}$ (where the matrices $B$, $C$ are not assumed to be symmetric), $D = (D_1 + D_1^T, D_2 + D_2^T, \ldots, D_n + D_n^T)^T$ with $D_i = (d_{ijl})_{n\times n}$, and $S = \mathrm{diag}(\xi, \xi, \ldots, \xi)_{n\times n}$, where $\xi = [\xi_1, \xi_2, \ldots, \xi_n]^T$. Then system (3) can be written in the vector-matrix form
\[
\dot{Y}(t) = -AY(t) + BF(Y(t)) + CG(Y(t-\tau)) + S^T D\, G(Y(t-\tau)). \tag{4}
\]
It is obvious that the global exponential stability of the origin of system (4) is equivalent to the global exponential stability of the equilibrium point $X^*$ of system (1).
Definition 1. The equilibrium point $X^*$ of system (1) is said to be globally exponentially stable if there exist constants $k > 0$ and $\gamma \ge 1$ such that, for $t \ge 0$,
\[
\|X(t) - X^*\| = \|Y(t)\| \le \gamma e^{-k(t-\tau)} \sup_{s\in[-\tau,\,0]} \|Y(s)\|.
\]
To give our main results, we also need the following lemmas.
Lemma 1 [12]. Suppose $W$, $U$ are any matrices, $\varepsilon$ is a positive number and the matrix $H = H^T > 0$; then the following inequality holds:
\[
W^T U + U^T W \le \varepsilon\, W^T H W + \varepsilon^{-1} U^T H^{-1} U.
\]
Lemma 2 [13]. (Schur complement) The linear matrix inequality (LMI)
\[
\begin{pmatrix} Q(x) & S(x) \\ S^T(x) & R(x) \end{pmatrix} > 0,
\]
where $Q(x) = Q^T(x)$, $R(x) = R^T(x)$, and $S(x)$ depends affinely on $x$, is equivalent to either of
(1) $R(x) > 0$, $Q(x) - S(x)R^{-1}(x)S^T(x) > 0$;
(2) $Q(x) > 0$, $R(x) - S^T(x)Q^{-1}(x)S(x) > 0$.
3 Global Exponential Stability of HHNNs In this section, some criteria are given for checking the global exponential stability of HHNNs by constructing suitable Lyapunov functional and employing LMI method. Theorem 1. Under the assumptions (H1 )–(H2 ), the equilibrium point X ∗ of system (1) is unique and globally exponentially stable if there exist positive
definite matrices $P$, $Q$, $\Sigma$, a positive diagonal matrix $R = \mathrm{diag}(r_1, r_2, \cdots, r_n)$ and constants $\beta, \varepsilon_i > 0$, $i = 1, 2, 3, 4$, such that
\[
\begin{pmatrix}
H & PB & PC & \sqrt{\beta}\,LB^T & P\\
B^T P & I_{n\times n} & 0 & 0 & 0\\
C^T P & 0 & \varepsilon_1 \Sigma & 0 & 0\\
\sqrt{\beta}\,BL & 0 & 0 & I_{n\times n} & 0\\
P & 0 & 0 & 0 & \frac{\varepsilon_2}{M^*} I_{n\times n}
\end{pmatrix} > 0 \tag{5}
\]
and
\[
\varepsilon_1 \Sigma + (\varepsilon_2 + \beta\varepsilon_4) D^T D + \beta\varepsilon_3 C^T C - 2Q \le 0, \tag{6}
\]
where $H = A^T P + P A - 2KQK - L^2 - \beta(2 + \varepsilon_3^{-1} + \varepsilon_4^{-1} M^*) K R^2 K - \beta A^T A$.
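The abstract notes that conditions of this kind are easily checked with the Matlab LMI Control Toolbox. As a rough illustration only, the Python sketch below (using the CVXPY modelling package instead of that toolbox) tests the feasibility of conditions of the form (5)-(6) for a hypothetical two-neuron system; every matrix and constant in it is invented, and the epsilons and beta are fixed rather than searched over.

```python
import numpy as np
import cvxpy as cp

n = 2
A = np.diag([3.0, 3.0])
B = np.array([[0.2, -0.1], [0.1, 0.3]])
C = np.array([[0.1, 0.2], [-0.2, 0.1]])
D = np.array([[0.1, 0.0], [0.0, 0.1]])      # stand-in for the stacked second-order weights
K, L = np.eye(n), np.eye(n)
M_star, beta = 1.0, 1.0
e1, e2, e3, e4 = 1.0, 1.0, 1.0, 1.0

P = cp.Variable((n, n), symmetric=True)
Q = cp.Variable((n, n), symmetric=True)
Sigma = cp.Variable((n, n), symmetric=True)
r2 = cp.Variable(n, nonneg=True)            # diagonal of R^2, kept as one variable
R2 = cp.diag(r2)

H = A.T @ P + P @ A - 2 * K @ Q @ K - L @ L \
    - beta * (2 + 1 / e3 + M_star / e4) * (K @ R2 @ K) - beta * A.T @ A
Z = np.zeros((n, n))
lmi5 = cp.bmat([
    [H,                     P @ B,     P @ C,      np.sqrt(beta) * L @ B.T, P],
    [B.T @ P,               np.eye(n), Z,          Z,                       Z],
    [C.T @ P,               Z,         e1 * Sigma, Z,                       Z],
    [np.sqrt(beta) * B @ L, Z,         Z,          np.eye(n),               Z],
    [P,                     Z,         Z,          Z,                       (e2 / M_star) * np.eye(n)],
])
cond6 = e1 * Sigma + (e2 + beta * e4) * D.T @ D + beta * e3 * C.T @ C - 2 * Q

prob = cp.Problem(cp.Minimize(0), [
    0.5 * (lmi5 + lmi5.T) >> 1e-6 * np.eye(5 * n),   # condition (5), strict via a margin
    0.5 * (cond6 + cond6.T) << 0,                    # condition (6)
    P >> 1e-6 * np.eye(n), Q >> 1e-6 * np.eye(n),
    Sigma >> 1e-6 * np.eye(n), r2 >= 1e-6,
])
prob.solve(solver=cp.SCS)
print(prob.status)   # "optimal" means feasible P, Q, Sigma, R were found
```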
Proof. From Lemma 2, condition (5) is equivalent to
\[
A^T P + P A - 2KQK - L^2 - \beta(2 + \varepsilon_3^{-1} + \varepsilon_4^{-1} M^*) K R^2 K - \beta A^T A - \varepsilon_2^{-1} M^* P^2 - P B B^T P - \varepsilon_1^{-1} P C \Sigma^{-1} C^T P - \beta L B^T B L > 0;
\]
then there exists a scalar $k > 0$ such that
\[
A^T P + P A - 2kP - 2e^{2k\tau} KQK - L^2 - \beta A^T A - \beta(2 + \varepsilon_3^{-1} + \varepsilon_4^{-1} M^*) K R^2 K - 2k\beta RK - \varepsilon_2^{-1} M^* P^2 - P B B^T P - \varepsilon_1^{-1} P C \Sigma^{-1} C^T P - \beta L B^T B L \ge 0. \tag{7}
\]
Consider the Lyapunov functional
\[
V(Y_t) = e^{2kt} Y^T(t) P Y(t) + 2\beta e^{2kt} \sum_{i=1}^{n} r_i \int_0^{y_i(t)} g_i(s)\,ds + 2\int_{t-\tau}^{t} e^{2k(s+\tau)} G^T(Y(s)) Q G(Y(s))\,ds.
\]
Calculating the derivative of $V(Y_t)$ along the trajectory of (4), we have
\[
\begin{aligned}
\dot V(Y_t)|_{(4)} &= e^{2kt}\Bigl\{ 2k Y^T(t)PY(t) + 2Y^T(t)P\dot Y(t) + 4\beta k \sum_{i=1}^{n} r_i \int_0^{y_i(t)} g_i(s)\,ds + 2\beta \sum_{i=1}^{n} r_i g_i(y_i(t))\dot y_i(t)\\
&\qquad + 2e^{2k\tau} G^T(Y(t)) Q G(Y(t)) - 2 G^T(Y(t-\tau)) Q G(Y(t-\tau)) \Bigr\}\\
&= e^{2kt}\Bigl\{ Y^T(t)[2kP - PA - A^T P] Y(t) + 2e^{2k\tau} G^T(Y(t)) Q G(Y(t)) + 2Y^T(t) P B F(Y(t))\\
&\qquad + 2Y^T(t) P C G(Y(t-\tau)) + 2Y^T(t) P S^T D G(Y(t-\tau)) - 2\beta G^T(Y(t)) R A Y(t)\\
&\qquad + 2\beta G^T(Y(t)) R B F(Y(t)) + 2\beta G^T(Y(t)) R C G(Y(t-\tau)) + 2\beta G^T(Y(t)) R S^T D G(Y(t-\tau))\\
&\qquad + 4\beta k \sum_{i=1}^{n} r_i \int_0^{y_i(t)} g_i(s)\,ds - 2 G^T(Y(t-\tau)) Q G(Y(t-\tau)) \Bigr\}. 
\end{aligned} \tag{8}
\]
By Lemma 1, we have
\[
\begin{aligned}
2Y^T(t)PBF(Y(t)) &\le Y^T(t)\bigl[PBB^TP + L^2\bigr]Y(t), &(9)\\
2Y^T(t)PCG(Y(t-\tau)) &\le \varepsilon_1^{-1} Y^T(t) P C \Sigma^{-1} C^T P Y(t) + \varepsilon_1 G^T(Y(t-\tau)) \Sigma G(Y(t-\tau)), &(10)\\
2Y^T(t)PS^TDG(Y(t-\tau)) &\le \varepsilon_2^{-1} Y^T(t) P S^T S P Y(t) + \varepsilon_2 G^T(Y(t-\tau)) D^T D G(Y(t-\tau)), &(11)\\
0 \le \sum_{i=1}^{n} r_i \int_0^{y_i(t)} g_i(s)\,ds &\le \sum_{i=1}^{n} r_i \int_0^{y_i(t)} k_i s\,ds = \tfrac{1}{2} Y^T(t) R K Y(t), &(12)\\
-2\beta G^T(Y(t)) R A Y(t) &\le \beta Y^T(t)\bigl(KR^2K + A^TA\bigr)Y(t), &(13)\\
2\beta G^T(Y(t)) R B F(Y(t)) &\le \beta Y^T(t)\bigl(KR^2K + LB^TBL\bigr)Y(t), &(14)\\
2\beta G^T(Y(t)) R C G(Y(t-\tau)) &\le \varepsilon_3^{-1}\beta Y^T(t) K R^2 K Y(t) + \varepsilon_3 \beta G^T(Y(t-\tau)) C^T C G(Y(t-\tau)), &(15)\\
2\beta G^T(Y(t)) R S^T D G(Y(t-\tau)) &\le \varepsilon_4^{-1}\beta M^* Y^T(t) K R^2 K Y(t) + \varepsilon_4 \beta G^T(Y(t-\tau)) D^T D G(Y(t-\tau)). &(16)
\end{aligned}
\]
Substituting (9)-(16) into (8), and using conditions (6)-(7), we have
\[
\begin{aligned}
\dot V(Y_t)|_{(4)} &\le e^{2kt}\Bigl\{ Y^T(t)\bigl[2kP - PA - A^TP + 2\beta k RK + 2e^{2k\tau}KQK + PBB^TP + \varepsilon_1^{-1} P C \Sigma^{-1} C^T P\\
&\qquad + \varepsilon_2^{-1} M^* P^2 + \beta L B^T B L + L^2 + \beta(2 + \varepsilon_3^{-1} + \varepsilon_4^{-1} M^*) K R^2 K + \beta A^T A\bigr] Y(t)\\
&\qquad + G^T(Y(t-\tau))\bigl[\varepsilon_1 \Sigma + (\varepsilon_2 + \beta\varepsilon_4) D^T D + \beta\varepsilon_3 C^T C - 2Q\bigr] G(Y(t-\tau)) \Bigr\} \le 0,
\end{aligned}
\]
which means $V(Y_t) \le V(Y_0)$ for all $t \ge 0$.
Since
\[
V(Y_t) \ge e^{2kt} \lambda_{\min}(P) \|Y(t)\|^2, \quad \forall t \ge 0,
\]
and
\[
V(Y_0) \le \bigl(\lambda_{\max}(P) + \beta\|R\|\,\|K\|\bigr)\|Y(0)\|^2 + \frac{1}{2k}\|Q\|\,\|K\|^2\bigl(e^{2k\tau} - 1\bigr) \sup_{s\in[-\tau,\,0]} \|Y(s)\|^2,
\]
by the above two inequalities we easily obtain
\[
\|Y(t)\| \le \gamma e^{-k(t-\tau)} \sup_{s\in[-\tau,\,0]} \|Y(s)\|
\]
for all $t \ge 0$, where $\gamma \ge 1$ is a constant. By Definition 1, this implies that the equilibrium $X^* = (x_1^*, x_2^*, \cdots, x_n^*)^T$ is globally exponentially stable.
Corollary 1. Under the assumptions ($H_1$)-($H_2$), the equilibrium point $X^*$ of system (1) is unique and globally exponentially stable if there exist positive definite matrices $P$, $Q$, a positive diagonal matrix $R = \mathrm{diag}(r_1, r_2, \cdots, r_n)$ and constants $\beta, \varepsilon_i > 0$, $i = 1, 2, 3, 4$, such that any one of the following conditions holds:
(i)
\[
\begin{pmatrix}
\tilde H & PB & PC & \sqrt{\beta}\,LB^T & P\\
B^T P & I_{n\times n} & 0 & 0 & 0\\
C^T P & 0 & I_{n\times n} & 0 & 0\\
\sqrt{\beta}\,BL & 0 & 0 & I_{n\times n} & 0\\
P & 0 & 0 & 0 & \frac{1}{M^*} I_{n\times n}
\end{pmatrix} > 0 \tag{17}
\]
and
\[
I_{n\times n} + (1 + \beta) D^T D + \beta C^T C - 2Q \le 0, \tag{18}
\]
where $\tilde H = A^T P + P A - 2KQK - L^2 - \beta(3 + M^*) K R^2 K - \beta A^T A$.
(ii)
\[
\begin{pmatrix}
H & PB & \sqrt{\beta}\,LB^T & P\\
B^T P & I_{n\times n} & 0 & 0\\
\sqrt{\beta}\,BL & 0 & I_{n\times n} & 0\\
P & 0 & 0 & (\varepsilon_1^{-1} + \varepsilon_2^{-1})^{-1} I_{n\times n}
\end{pmatrix} > 0 \tag{19}
\]
and
\[
(\varepsilon_1 + \beta\varepsilon_3) C^T C + (\varepsilon_2 M^* + \beta\varepsilon_4) D^T D - 2Q \le 0, \tag{20}
\]
where $H = A^T P + P A - 2KQK - L^2 - \beta(2 + \varepsilon_3^{-1} + \varepsilon_4^{-1} M^*) K R^2 K - \beta A^T A$.
Theorem 2. Under the assumptions ($H_1$)-($H_2$), the equilibrium point $X^*$ of system (1) is unique and globally exponentially stable if there exist positive definite matrices $P$, $Q$, $\Sigma$, a positive diagonal matrix $R = \mathrm{diag}(r_1, r_2, \cdots, r_n)$ and constants $\beta, \varepsilon_i > 0$, $i = 1, 2, 3, 4$, such that
\[
\begin{pmatrix}
H & PB & PC & \sqrt{\beta}\,LB^T & P\\
B^T P & I_{n\times n} & 0 & 0 & 0\\
C^T P & 0 & \varepsilon_1 \Sigma & 0 & 0\\
\sqrt{\beta}\,BL & 0 & 0 & I_{n\times n} & 0\\
P & 0 & 0 & 0 & \frac{\varepsilon_2}{M^*} I_{n\times n}
\end{pmatrix} > 0 \tag{21}
\]
and
\[
\varepsilon_1 K\Sigma K + (\varepsilon_2 + \beta\varepsilon_4) K D^T D K + \beta\varepsilon_3 K C^T C K - 2Q \le 0, \tag{22}
\]
where $H = A^T P + P A - 2Q - L^2 - \beta(2 + \varepsilon_3^{-1} + \varepsilon_4^{-1} M^*) K R^2 K - \beta A^T A$.
Proof. Consider the Lyapunov functional
\[
V(Y_t) = e^{2kt} Y^T(t) P Y(t) + 2\beta e^{2kt} \sum_{i=1}^{n} r_i \int_0^{y_i(t)} g_i(s)\,ds + 2\int_{t-\tau}^{t} e^{2k(s+\tau)} Y^T(s) Q Y(s)\,ds.
\]
Calculating the derivative of $V(Y_t)$ along the trajectory of (4) and applying Lemmas 1 and 2, we have $V(Y_t) \le V(Y_0)$, $\forall t \ge 0$. The remaining part of the proof is similar to that of Theorem 1 and is omitted here.
If $d_{ijl} = 0$, $i, j, l = 1, 2, \cdots, n$, system (1) degenerates into the following first-order neural network:
\[
\dot{x}_i(t) = -a_i x_i(t) + \sum_{j=1}^{n} b_{ij} f_j(x_j(t)) + \sum_{j=1}^{n} c_{ij} g_j(x_j(t-\tau)) + I_i, \tag{23}
\]
where $i = 1, 2, \ldots, n$, $t > 0$. We have the following corollary.
Corollary 2. Under the assumptions ($H_1$)-($H_2$), the equilibrium point $X^*$ of system (23) is unique and globally exponentially stable if there exist positive definite matrices $P$, $Q$, $\Sigma$, a positive diagonal matrix $R = \mathrm{diag}(r_1, r_2, \cdots, r_n)$ and constants $\beta, \varepsilon_i > 0$, $i = 1, 2$, such that the following conditions hold:
\[
\begin{pmatrix}
A^T P + P A - L^2 - KQK - \beta A^2 - (2 + \varepsilon_2^{-1})\beta K R^2 K & PB & PC & \sqrt{\beta}\,LB^T\\
B^T P & I_{n\times n} & 0 & 0\\
C^T P & 0 & \varepsilon_1 \Sigma & 0\\
\sqrt{\beta}\,BL & 0 & 0 & I_{n\times n}
\end{pmatrix} > 0 \tag{24}
\]
and
\[
\varepsilon_1 \Sigma + \varepsilon_2 C^T C - Q \le 0. \tag{25}
\]
4 Conclusion In this paper, we have studied the exponential stability of high-order neural networks with time delay. Several new sufficient conditions have been derived, by using the Lyapunov method and the LMI technique, to show that the HHNNs are exponentially stable and have periodic solutions. The conditions are easy to check, and the results are useful in applications to manufacturing high-quality neural networks.
Acknowledgements. This work was jointly supported by the National Natural Science Foundation of China under Grant No. 60574043 and the Natural Science Foundation of Shandong Province of China under Grant No. Y2008A32.
References 1. Karayiaaanis, N., Nvenetsaopoulos, A.: On the Dynamics of Neural Networks Realizing Associative Memories of First and High-order. Network: Comp. Neural Syst. I, 345–364 (1990) 2. Dembo, A., Farotimiand, O., Kaillath, T.: High-order Absolutely Stable Neural Networks. IEEE Trans. Circuits Syst. 38, 57–65 (1991) 3. Karayiaaanis, N., Nvenetsaopoulos, A.: On the Training and Performance of High-order Neural Networks. Math. Biosci. 129, 143–168 (1995) 4. Kosmatopoulos, E., Christodoulou, M.: Structural Properties of Gradient Recurrent High-order Neural Networks. IEEE Trans. Circuits Syst. II 42, 592–603 (1995) 5. Kosmatopoulos, E., Polycarpou, M., Christodoulou, M., et al.: High-order Neural Networks Structures for Identification of Dynamical Systems. IEEE Trans. Neural networks 6, 442–431 (1995) 6. Brucoli, M., Carnimeo, L., Grassi, G.: Associative Memory Design Using Discrete-time Second-order Neural Networks with Local Interconnections. IEEE Trans. Circuits Syst. I 44, 153–158 (1997) 7. Xu, B., Liu, X., Liao, X.: Global Asymptotic Stability of High-order Hopfield Type Neural Networks with Time Delays. Comput. Math. Appl. 45, 1729–1737 (2003) 8. Cao, J., Liang, J., Lam, J.: Exponential Stability of High-order Bidirectional Associative Memory Neural Networks with Time Delays. Phys. D 199(3-4), 425–436 (2004) 9. Cao, J., Wang, L.: Exponential Stability and Periodic Oscillatory Solution in BAM Networks with Delays. IEEE Trans. Neural Networks 13, 457–463 (2002) 10. Cao, J., Wang, J.: Absolute Exponential Stability of Recurrent Neural Networks with Lipschitz-continuous Activation Functions and Time Delays. Neural Networks 17, 379–390 (2004) 11. Cao, J.: A Set of Stability Criteria for Delayed Cellular Neural Networks. IEEE Trans. Circuits Syst. I 48, 494–498 (2001) 12. Cao, J., Ho, D.W.C.: A General Framework for Global Asymptotic Stability Analysis of Delayed Neural Networks Based on LMI Approach. Chaos, Solitons and Fractals 24(5), 1317–1329 (2005) 13. Boyd, S., Ghaoui, L., Feron, E., et al.: Linear Matrix Inequalities in System and Control Theory. SIAM, Philadelphia (1994)
Improved Particle Swarm Optimization for RCP Scheduling Problem Qiang Wang and Jianxun Qi
*
Abstract. In this paper, an improved particle swarm optimization (IPSO) algorithm is presented to solve RCP Scheduling Problem. Firstly, a mapping is created between the feasible schedule and the position of the particle, then the IPSO begin to search the global best and the local best until the stop criteria is satisfied. A case study is presented and a comparison is made between IPSO and some traditional heuristic methods. Results show that the IPSO algorithm is more satisfying than those of the heuristic methods in terms of feasibility and efficiency. Keywords: RCP scheduling problem, Improved particle swarm optimization, Project management, Heuristic methods, Project scheduling, Genetic algorithm.
1 Introduction The resource constrained project scheduling problem (RCPSP) is defined by Davis (1973) as "the method of scheduling activities within fixed amounts of resources available during each time period of project duration so as to minimize the increase in project duration" [1]. Blazewicz et al. (1983) proved that the RCPSP is a general form of the job shop problem, so it is a kind of NP-hard problem [2]. This type of problem is characterized by factorial growth in the amount of computation required to consider all possible solutions as the problem size increases. There are two classes of methods to solve this kind of problem: exact methods and heuristic methods [3]. The exact methods include branch and bound, and mathematical programming such as 0-1 programming [4] and dynamic programming [5]; heuristic procedures employ some rule of thumb or experience to determine priorities among activities competing for available resources [6-8]. Articles [9-10] give a summary of these heuristic methods. These studies, however, do not contain any computational performance analysis and discuss only small problem examples. In most instances, the exact algorithms may be computationally infeasible or face the "combinatorial explosion" problem if the project under study is large or complicated [11]. Due to these drawbacks, some evolutionary computation techniques such as genetic algorithms [12-14] have been developed and widely
Qiang Wang, Jianxun Qi: School of Business and Management, North China Electric Power University, Beijing 102206, China
H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 49–57. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
50
Q. Wang and J. Qi
employed to solve the RCP Scheduling Problem. Although the GA approach can overcome the drawbacks of the analytical and heuristic algorithms, it meets difficulties such as premature convergence or slow convergence process. The Particle Swarm Optimization (PSO) algorithm developed by Kennedy and Eberhart is a stochastic, population-based optimization method which consists of a swarm of particles [15-18]. It has shown to be less susceptible to premature convergence. Previous work also illustrated that it is an efficient and effective algorithm to map high-dimensional dada sets to a lower-dimension with minimal mapping error [19]. With all these advantages, PSO has been applied to other industrial areas like electrical engineering for power optimization [20], resource-constrained project scheduling [21] and so on. To our knowledge, however, there have not been any applications of PSO to RCP Scheduling Problem. In this paper, an improved PSO (IPSO) algorithm is proposed for the RCP Scheduling Problem in project management.
2 The RCP Scheduling Problem 2.1 Activities-on-Node Network A project consists of a number of events (milestones) and activities or tasks that have to be performed in accordance with a set of precedence constrains. Each activity has duration and normally requires resources. Resources may be of different types, including financial resources, manpower, machinery, equipment, materials energy, space, etc. The problem here is to arrange a suitable schedule for the activities, so that all these resources will be used within the limits. The precedence relationship in this paper is the finish-start relationship with a zero time lag: an activity can only start as soon as all its predecessor activities have finished. Usually, there are two possible modes of representation of a project network: the activity-on-arcs mode and the activity-on- node mode.The projects in this paper are represented by an activity-on-node (AoN) network G = (V , E ) , where the set of nodes V represents the activities and the set of arcs E represents finish-start precedence constrains with a time-lag equal to zero. The activities are numbered from the dummy start activity1 to the dummy end activity n . So for each arc ( i , j ) ∈ E in the network, the activity i is the immediate predecessor of activity.
2.2 The Performance Measure When
Tn denotes the finished time of the last activity, the objective function of
RCP Scheduling Problem can be formulated conceptually as follows:
min Tn
(1)
Tn = S n + d n
(2)
Improved Particle Swarm Optimization for RCP Scheduling Problem
51
The above two equations are subjected to the following conditions: For all ( i , j ) ∈ E
si + d i ≤ s j s1 = 0 For k
i , j = 1, 2, … , N .
(3)
sn ≤ δ n .
(4)
and
= (1, … , m ) and t = ( s1 , … , sn )
∑r i∈st
ik
= ukt ≤ Rkt
(5)
where N is the total number of the activities of the project under study; si denotes the start time of activity
i ( i = 1, 2,… , N ) ; d j is the duration of the activity
i ; Rkt represents the available amount of resource k during time period t ; rik is
st is taken as the set of ongoing activities at time t ; ukt denotes the resource usage for resource type k during time period t . the amount of resource k required by activity i ;
3 The Principle of Improved Particle Swarm Optimization 3.1 The Standard PSO Algorithm The standard PSO simulates a social behavior such as bird flocking to a promising position for food or other objective in an area or apace. By sharing the swarm experience and generalizing its own experience, the birds can easily adjust its own behavior police toward the best destination. Just like the birds, the particles fly through the problem space by updating their velocities based on the particle’s personal best position and the best previous position attained by any member of its th
neighbor. In a N dimensional space, the position of the i particle in the t
generation can be represented as X i ( t ) = ( xi1 ( t ) , velocity is denoted as Vi ( t ) = ( vij ( t ) ,
…,
xi 2 ( t ) ,
vij ( t ) ,
…,
…,
th
xin ( t ) ) ; its
vin ( t ) ) . During
iterations, the local bests and the global bests are determined through evaluating the performance, i.e., fitness values or objective, of the current population of parth
ticles. The i particle’s new position is calculated as: X i ( t + 1) = X i ( t ) + Vi ( t )
(6)
where Vi ( t ) denotes the particle’s velocity and X i ( t ) represents the current position in the i
th
generation. The particle velocity is updated according to:
Vi ( t + 1) = ϕ 0Vi ( t ) + ϕ1 ⋅ r1 ( t ) ⋅ [ pi Best − X i ( t )] + ϕ 2 ⋅ r2 ( t ) ⋅ [ g Best − X i ( t )]
(7)
where 0 ≤ ϕ 0 ≤ 1 is an inertia weight taken to control the impact particle’s previous velocity on the current velocity; ϕ1 , ϕ 2 are acceleration constants; r1 ( t ) and r2 ( t ) are random variables sampled from ( 0,
1) ; piBest is the personal best posi-
th
tion found by i particle, and g Best is the global best among all the population of particles achieved so far.
3.2 The Improved PSO Algorithm As the search mechanism of PSO is based on its personal best and the global best, no information of other particles is considered. Therefore, the search in the solution space is single direction. No doubt, it will suffer from premature convergence, where the particles cannot escape local minima. The improved particle swarm optimization (IPSO) is designed to overcome this disadvantage [22]. In the IPSO, all the particles are ranked according to their fitness value which is decided by their personal best piBest . The information from the top n particles is selected to adjust the behavior police for each particle in its next iteration (for the instance of Fig 1.2 in this paper n =2 is selected). Thereby the search of IPSO turned out to be even and multi-directional, the precision and global convergence ability of the algorithm are also improved. The basic formulation for IPSO algorithm is as the follows: Vi ( t + 1) = ϕ 0Vi ( t ) + ϕ1 ⋅ r1 ( t ) ⋅ [ pi Best − X i ( t )] +
1 n
n
∑ϕ
2j
[
⋅ r2 j ( t ) ⋅ g jBest − X i ( t )
]
(8)
j =1
X i ( t + 1) = X i ( t ) + Vi ( t )
(9)
where ϕ 2 j is the acceleration constant of the j particle; r2 j ( t ) is its random varith
able sampled from ( 0,
1) ; g jBest is the personal best position of the j particle th
associated with the best fitness encountered after ( t − 1) iterations. The j particle th
is among top n particles which are ranked according to their fitness value decided by their personal best piBest .
4 IPSO Framework for the RCP Scheduling Problem 4.1 Particle-Represented Schedule For an AoN network with N activities, if its schedule fulfills (3), (4) and (5), it is called a feasible schedule. Each of its arbitrary feasible schedules S ( t ) may be represented by the start times s j ( t ) of all its activities, using:
Improved Particle Swarm Optimization for RCP Scheduling Problem
S ( t ) = ( s1 ( t ) ,
…,
s j (t ) ,
…,
sn ( t ) )
53
(10)
In order to apply the IPSO, A mapping is created between the feasible schedules and the IPSO particles. In this mapping, the particle position X ( t ) = ( x ( t ) , … , x ( t ) , … , x ( t ) ) can be represented by the Activities’ start time (the candidate solutions to the resource leveling problem) as follows: i
i1
X i (t ) =
ij
Si ( t )
δn
in
⎛ si1 ( t ) , … , sij ( t ) , … , sin ( t ) ⎞ δn δn δ n ⎟⎠ ⎝
=⎜
(11)
Where Si ( t ) represents a feasible schedule found by the i particle after ( t − 1) itth
erations; sij ( t ) represents the start time of j activity of i particle after ( t − 1) itth
erations, and δ n
th
n
= ∑ di is the longest duration of the project. It can be learned i =1
from (11) that the value of xij ( t ) varies between [ 0,1] . Also a feasible project schedule can be transformed from the particle position using:
Si ( t ) = X i ( t ) ⋅ δ n
(12)
4.2 Parameters Configuration for IPSO Usually the IPSO size (total amount of the particles) between 40~60 will lead to a good result. It can be adjusted according to the concrete problem. For the instance Fig. 1 in this paper, m = 10 is selected as the size of the IPSO. For the AoN network mentioned above, all parameters of the particle position, either initialized or updated during search should represent a feasible schedule. In order to avoid infeasible particle position, we adjust (9) as follows: If [ X i ( t ) + Vi ( t )] ⋅ δ n represent a feasible schedule
X i ( t + 1) = X i ( t ) + Vi ( t )
(13)
If [ X i ( t ) + Vi ( t )] ⋅ δ n represent an infeasible schedule
X i ( t + 1) = X i ( t )
(14)
In order to balance the global search ability and the local search ability, a suitable the maximum velocity Vmax should be confirmed and the velocity of the particles should vary within interval [ −Vmax , Vmax ] . Usually Vmax is selected between the 10% and 20% of the total variable scope. In this paper, 10% of the deadline is used as
54
Q. Wang and J. Qi
the maximal velocity ( Vmax = 10% ). During the IPSO search, if the i particle’s th
velocity in j
th
direction ( vij ( t ) ) is beyond interval [ −Vmax , Vmax ] , its velocity
should be adjusted as follows: If vij ( t ) ≥ Vmax , then
vij ( t ) = Vmax = 10%
(15)
vij ( t ) = −Vmax = −10%
(16)
If vij ( t ) ≤ −Vmax , then
The convergence ability of the IPSO has been discussed by Eberhart R C and Shi Ti in 2000; some parameters are suggested to guarantee the convergence and effective calculation [23]. In this paper some parameters are selected as the follows:
ϕ 0 = 0.729844 , ϕ1 = 1.49618 and
1
n
∑ϕ n
2i
= 1.49618 .
i =1
4.3 Procedure of the IPSO Algorithms Some feasible schedules can be attained through heuristic methods based on priority rules. In the computational study of section 7.4.1 in Klein (2000) [24], a total of 73 priority rules were evaluated. In the heuristic algorithms, schedulable activities (whose predecessor are all completed and which require no more resources than available amounts at the time) with a higher priority should be assigned the resource and scheduled prior to the ones with lower priorities. Although the schedules thus acquired are not optimal, they are feasible and can be transformed into initial particles for the IPSO. In this paper, 40 initial particles are randomly created with these heuristic methods such as SAD rule, MILFT rule and so on. Beginning with these initial particles, the IPSO search process is as the follows: Step1: Set iteration counters as 0; calculate the positions of the 40 initialized particles with (11); go to step 5 Step2: update the particles’ velocity using (8) and the adjustment condition represented by (15) and (16). Step3: update the particles’ position using (9) and the adjustment condition represented by (13) and (14) Step4: A feasible projected schedule can be transformed from the particle position on the basis of (12). Step5: Calculate the fitness value (the total weighted sum of squired resource usage) of the particles at current time, using (1) and (2). For each particle, if its current fitness value is smaller than that of its previous local best position piBest ,
the piBest will be updated as the current position; according to the IPSO Principle, the top n piBest among the particles are selected as the g jBest for (8) ( n =2 for the instance of Fig.1 in this paper). Step6: update the count number using t = t + 1 and test if the approach fulfills the stop criteria. If the stop criteria are fulfilled, then go to step 7; otherwise, go to step 2. [Stop criteria. The IPSO will be terminated if the current iteration meets any one of the following termination signals: (1) maximum number of iterations since last updating of the global best, and (2) maximum total number of iterations.] Step7: Recorded the top
n piBest among the particles, the best one, p1 pest are se-
lected as the g Best
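A compact sketch of this search loop is given below; the fitness function and the feasibility test are generic placeholders for the schedule-specific ones described above, and the parameter values follow Section 4.2 (all other names are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
num_particles, dim, top_n = 10, 10, 2
phi0, phi1, phi2, v_max = 0.729844, 1.49618, 1.49618, 0.10

def fitness(x):                      # placeholder objective (smaller is better)
    return float(np.sum(x ** 2))

def feasible(x):                     # placeholder for the schedule checks (3)-(5)
    return bool(np.all((0.0 <= x) & (x <= 1.0)))

X = rng.uniform(0, 1, (num_particles, dim))          # initial feasible particles
V = rng.uniform(-v_max, v_max, (num_particles, dim))
p_best = X.copy()
p_best_fit = np.array([fitness(x) for x in X])

for it in range(200):
    top = np.argsort(p_best_fit)[:top_n]             # rank by personal bests (IPSO)
    for i in range(num_particles):
        social = sum(rng.random(dim) * (p_best[j] - X[i]) for j in top) / top_n
        V[i] = phi0 * V[i] + phi1 * rng.random(dim) * (p_best[i] - X[i]) + phi2 * social
        V[i] = np.clip(V[i], -v_max, v_max)          # velocity limit, cf. (15)-(16)
        cand = X[i] + V[i]
        if feasible(cand):                           # otherwise keep the old position, cf. (14)
            X[i] = cand
        f = fitness(X[i])
        if f < p_best_fit[i]:
            p_best_fit[i], p_best[i] = f, X[i].copy()

print(p_best_fit.min())   # best fitness found
```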
5 Computational Analyses A simple instance, shown in Fig. 1, is created to test the IPSO algorithm. It is an AoN network of 10 nodes with dummy start node 1 and dummy end node 10. The duration of each activity is indicated above the corresponding node, while the requirement for the single renewable resource (assumed for simplicity in this instance) is given below the node. The resource availability limit R_kt of this instance is 9 and δ_n is 36. The IPSO size is taken as 10, as it is only a simple network; the other parameters follow Section 4.2. Some other methods, such as GA and traditional PSO, have been applied for comparison, and the results are shown in Tab. 1.
Fig. 1 An example instance for the resource leveling problem
Table 1 Results comparison

| Algorithm | Size | Iteration times | Best fitness value | Schedule searched |
| GA | 20 | 27 | 20 | (0,0,0,2,5,5,10,8,18,20) |
| PSO | 10 | 25 | 19 | (0,0,0,2,6,5,10,8,17,19) |
| IPSO | 10 | 17 | 18 | (0,0,5,2,0,6,12,12,16,18) |
It can easily be seen from Tab. 1 that the IPSO presented in this paper performs best among the compared methods. Its solution, with a fitness value of 18, is better than that of the GA (genetic algorithm), and its number of iterations (17) is much smaller than that of the GA (27) and of the traditional PSO (25). Although this is only a small network, it can be expected that the advantages in optimization quality and convergence ability will become even more evident for large and complex project networks.
6 Conclusions In this paper, an improved particle swarm optimization (IPSO) method is designed to solve the RCP scheduling problem. A mapping between feasible schedules and particle positions is created, and the best particle position found by the IPSO in the solution space is transformed back into the best schedule in terms of project duration. The computational analysis shows that the IPSO-based approach is able to search for the global optimum and is more efficient than the traditional PSO method and the GA approach. Acknowledgments. This paper is supported by National Natural Science Foundation 80579101 and Doctor Funds of the China Education Ministry 20050079008.
References 1. Davis, E.W.: Project Scheduling under Resource Constraints—a Historical Review and Categorization of Procedures. AIIE Transactions 5, 297–313 (1973) 2. Erik, L., Demeulemeester, Willy, S.H.: Project Scheduling: A Research Handbook, pp. 10–15. Kluwer Academic Publishers, Dordrecht (2002) 3. Herroelen, W., Reyck, B.D., Demeulemeester, E.: Resource-constrained Project Scheduling: A Survey of Recent Developments. Computers & Ops. Res. 25, 279–302 (1998) 4. Patterson, J.H., Huber: A Horizon-varying Zero-one Approach to Project Scheduling. Management Science 20, 990–998 (1974) 5. Bell, C.E., Park, K.: Solving Resource Constrained Project Scheduling Problems by A*-search. Naval Research Logistics 37, 61–84 (1990) 6. Kolisch, R., Kolisch, A.: Adaptive Search for Solving Hard Project Scheduling Problem of Operational Research. Naval Research Logistics 43, 23–40 (1996) 7. Bouleimen, K., Lecocq, H.: A New Efficient Simulated Annealing Algorithm for Resource Constrained Scheduling Problem. Technical Report, Service de Robotique et Automatisation, University de Liege, pp. 1–10 (1998)
8. Hartmann, S.: A Competitive Genetic Algorithm for Resource Constrained Project Scheduling. Naval Research Logistics 45, 733–750 (1998) 9. Lawrence, S.R.: Resource-constrained Project Scheduling-A Computational Comparison of Heuristic Scheduling Techniques. Technical Report, Graduate School of Industrial Administration, Carnegie-Mellon University, Pittsburgh, pp. 10–18 (1985) 10. Alvarez-Valdes, R., Tamarit, J.M.: Advances in Project Scheduling, pp. 113–134. Elsevier, Amsterdam (1989) 11. Lee, J.K., Kim, Y.D.: Search Heuristics for Resource Constrained Project Scheduling. Journal of Operat. Res. Soc. 47, 678–689 (1996) 12. Zhang, L.Y., Zhang, J.P., Wang, L.: Genetic Algorithms Based on MATLAB of Construction Project Resource Leveling. Journal of Industrial Engineering/Engineering Management 18, 52–55 (2004) (Chinese) 13. Leu, S.S., Yang, C.H., Huang, J.C.: Resource Leveling in Construction by Genetic Algorithm-based Optimization and in Decision Support System Applicaiton. Automation in Construction 10, 27–41 (2000) 14. Tarek, H.: Optimization of Resource Allocation and Leveling Using Genetic Algorithms. Journal of Construction Engineering and Management 6, 167–175 (1999) 15. Kennedy, R., Eberhart, C.: Particle Swarm Optimization. In: Proceedings of International Conference on Neural Networks, pp. 1942–1948 (1995) 16. Wang, J.W., Wang, D.W.: Experiments and Analysis on Inertia Weight In Particle Swarm Optimization. Journal of Systems Engineering 20, 194–198 (2005) (Chinese) 17. Clerc, M., Kennedy, J.: The Particle Swarm. Explosion, Stability, and Convergence in a Multi- dimensional Complex Space. IEEE Trans. on Evolutionary Computation 6, 58–73 (2002) 18. Trelea, I.: The Particle Swarm Optimization Algorithm. Convergence Analysis and Parameter Selection. Information Processing Letters 85, 317–325 (2003) 19. Edwards, A., Engelbrecht, A.P.: Comparing Particle Swarm Optimisation and Genetic Algorithms for Nonlinear Mapping. In: 2006 IEEE Congress on Evolutionary Computation, pp. 694–701 (2006) 20. Naka, S., Genji, T., Yura, T., et al.: A Hybrid Particle Swarm Optimization for Distribution State Estimation. IEE Trans. on Power System 18, 60–68 (2003) 21. Zhang, H., Li, H., Tam, C.M.: Particle Swarm Optimization for Resource-constrained Project Scheduling. International Journal of Project Management 14, 83–92 (2006) 22. Zhao, B., Cao, Y.J.: An Improved Particle Swarm Optimization Algorithm For Power System Unit Commitment. Power System Technology 28, 6–10 (2004) 23. Eberhart, R.C., Shi, Y.: Comparing Inertia Weights and Constriction Factors in Particle Swarm Optimization. In: Proceedings of the Congress on Evolutionary Computing, pp. 84–88. IEE Service Center, California (2000) 24. Klein, R.: Scheduling of Resource-Constrained Projects. Kluwer Academic Publisher, Boston (2000)
Exponential Stability of Reaction-Diffusion Cohen-Grossberg Neural Networks with S-Type Distributed Delays Yonggui Kao and Shuping Bao
Abstract. This paper is devoted to investigation of the existence of the equilibrium point and its globally exponential stability for reaction-diffusion Cohen-Grossberg neural networks with variable coefficients and distributed delays by means of the homotopic mapping theory and Lyapunov-functional method. The sufficient conditions obtained which are easily verifiable, have a wider adaptive range. Finally, a numerical example verifies the theoretical analysis. Keywords: Homotopic mapping theory, Cohen-Grossberg neural networks, Reaction-diffusion terms, Exponential stability.
1 Introduction Cohen-Grossberg neural networks (CGNNs) have been widely applied in parallel computation, associative memory and optimization problems; these applications rely on the stability, especially the global asymptotic or exponential stability, of the neural networks. Many researchers have presented various criteria for the uniqueness and global asymptotic or exponential stability of the equilibrium point of CGNNs with or without time-varying delays. In practice, the delays in artificial neural networks are usually continuously distributed [1-10], because a neural network usually has a spatial extent due to the presence of a large number of parallel pathways with a variety of axon sizes and lengths. Moreover, the diffusion effect cannot be avoided in the neural network model when electrons
Yonggui Kao: Department of Mathematics, Harbin Institute of Technology, Weihai 264209, China
Shuping Bao: College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China
are moving in an asymmetric electromagnetic field [11-13]. Besides, it is common to consider diffusion in biological systems (such as immigration) [6,8,10]. However, few authors have studied the existence of the equilibrium point and its exponential stability for reaction-diffusion CGNNs with distributed delays. We consider the following reaction-diffusion CGNNs model:
\[
\begin{cases}
\dfrac{\partial u_i(t,x)}{\partial t} = \displaystyle\sum_{l=1}^{m} \dfrac{\partial}{\partial x_l}\Bigl(D_{il}\dfrac{\partial u_i(t,x)}{\partial x_l}\Bigr) - a_i(u_i(t,x))\Bigl[\,b_i(u_i(t,x)) - \displaystyle\sum_{j=1}^{n} a_{ij}\, s_j\Bigl(\int_{-\infty}^{0} d\eta_j(\theta)\, u_j(t+\theta,x)\Bigr)\\[1mm]
\qquad\qquad - \displaystyle\sum_{k=0}^{K}\sum_{j=1}^{n} t_{ij}^{(k)} s_j(u_j(t-\tau_k(t),x)) + I_i\Bigr], \quad t \ge 0,\; x \in \Omega,\\[1mm]
u_i(t_0 + s, x) = \varphi_i(s, x), \quad -\infty < s \le 0,\; x \in \Omega,\\
\dfrac{\partial u_i(t,x)}{\partial n} = 0, \quad x \in \partial\Omega,
\end{cases} \tag{1}
\]
where i = 1, . . . , n; n is the number of neurons in the network; ui (t, x) denotes the state variable associated with the i th neurons at time t and in space x; ai (t, x) represents an amplification function; bi (t, x) represents an appropriately behaved function; aij represents the strength of the neuron interconnections within the network; sj (t, x) shows how the jth neuron reacts (k) to the input; tij represents the interconnection with delay τk (t); τk (t) are delay functions satisfying 0 ≤ τk (t) ≤ τ and 0 ≤ τ˙k (t) ≤ τ¯ ≤ 1; Ii is the constant input from outside the system; Dik ≥ 0 corresponds to the transmission diffusion operator along the ith neuron. u(t, x) = (u1 (t, x), . . . , un (t, x))T , x = (x1 , . . . , xm )T . Throughout the paper, we assume that system (1) has a continuous solution denoted by u(t, 0, ϕ; x) or simply u(t, x) . we also assume that (H1) There exist mi and Mi , such that 0 < mi ≤ ai (ui ) ≤ Mi , i = 1, . . . , n. Δ (H2) bi (·) is differentiable, and αi = inf ui ∈R {b˙ i (ui )} > 0, where b˙ i (·) is the derivative of bi (·), bi (0) = 0, i = 1, . . . , n.
($H_3$) There exist constants $\beta_j > 0$ $(j = 1, \ldots, n)$ such that $|s_j(x) - s_j(y)| \le \beta_j |x - y|$, $\forall x, y \in \mathbb{R}$.
($H_4$) There exist constants $p_{ij}$, $q_{ij}$, $p_{ij}^*$, $q_{ij}^*$ such that
\[
-2m_i\alpha_i + \sum_{j=1}^{n} M_i |a_{ij}|^{2p_{ij}} \beta_j^{2q_{ij}} \delta_j^{2q_{ij}} + \sum_{j=1}^{n} M_j |a_{ji}|^{2-2p_{ji}} \beta_i^{2-2q_{ji}} \delta_i^{2-2q_{ji}} + \sum_{k=0}^{K}\sum_{j=1}^{n} M_i \bigl|t_{ij}^{(k)}\bigr|^{2p_{ij}^*} \beta_j^{2q_{ij}^*} + (1+\bar\tau) \sum_{k=0}^{K}\sum_{j=1}^{n} M_j \bigl|t_{ji}^{(k)}\bigr|^{2-2p_{ji}^*} \beta_i^{2-2q_{ji}^*} < 0.
\]
($H_5$) $\int_{-\infty}^{0} d\eta_j(\theta)\, u_j(t+\theta,x)$, $j = 1, 2, \ldots, n$, are Lebesgue-Stieltjes integrable, and the delay kernels $\eta_j: (-\infty, 0] \to \mathbb{R}$ are nondecreasing bounded variation functions satisfying
\[
\int_{-\infty}^{0} d\eta_j(\theta) = \delta_j > 0.
\]
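As a quick numerical illustration (all parameter values are invented, and the exponents are fixed at 1/2 so that each term reduces to a simple product, in the spirit of the simplified conditions in Corollary 1 later in the paper), a condition of the ($H_4$) type can be checked as follows.

```python
import numpy as np

n, K = 2, 1
m = np.array([1.0, 1.0]); M = np.array([1.2, 1.2])
alpha = np.array([4.0, 4.0])
beta = np.array([0.5, 0.5]); delta = np.array([1.0, 1.0])
tau_bar = 0.5
a = np.array([[0.4, 0.3], [0.2, 0.5]])
t = np.array([[[0.2, 0.1], [0.1, 0.2]]] * (K + 1))     # t[k][i][j]

lhs = np.empty(n)
for i in range(n):
    term1 = sum(M[i] * abs(a[i, j]) * beta[j] * delta[j] for j in range(n))
    term2 = sum(M[j] * abs(a[j, i]) * beta[i] * delta[i] for j in range(n))
    term3 = sum(M[i] * abs(t[k, i, j]) * beta[j] for k in range(K + 1) for j in range(n))
    term4 = (1 + tau_bar) * sum(M[j] * abs(t[k, j, i]) * beta[i]
                                for k in range(K + 1) for j in range(n))
    lhs[i] = -2 * m[i] * alpha[i] + term1 + term2 + term3 + term4
print(lhs, "all negative:", bool(np.all(lhs < 0)))
```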
Let $\mathcal{C} \triangleq C((-\infty, 0] \times \mathbb{R}^m, \mathbb{R}^n)$ be the Banach space of continuous functions which map $(-\infty, 0] \times \mathbb{R}^m$ into $\mathbb{R}^n$ with the topology of uniform convergence. Let $\Omega$ be an open bounded domain in $\mathbb{R}^m$ with smooth boundary $\partial\Omega$, and let $\mathrm{mes}\,\Omega > 0$ denote the measure of $\Omega$. $L^2(\Omega)$ is the space of real functions on $\Omega$ which are $L^2$ for the Lebesgue measure; it is a Banach space for the norm $\|u(t)\| = \sum_{i=1}^{n} \|u_i(t)\|_2$, where $u(t) = (u_1(t), \cdots, u_n(t))^T$ and $\|u_i(t)\|_2 = \bigl(\int_\Omega |u_i(t,x)|^2\,dx\bigr)^{1/2}$. For any $\varphi(t,x) \in C[(-\infty,0]\times\Omega, \mathbb{R}^n]$, we define $\|\varphi\| = \sum_{i=1}^{n} \|\varphi_i\|_2$, where $\varphi(t,x) = (\varphi_1(t,x), \cdots, \varphi_n(t,x))^T$, $\|\varphi_i\|_2 = \bigl(\int_\Omega |\varphi_i(x)|_\tau^2\,dx\bigr)^{1/2}$ and $|\varphi_i(x)|_\infty = \sup_{-\infty \le s \le 0} |\varphi_i(s,x)|$.
Definition 1. $u(t,x) \equiv u^* \in \mathbb{R}^n$ is said to be an equilibrium point of system (1) if the constant vector $u^* = (u_1^*, \ldots, u_n^*)^T$ satisfies
\[
b_i(u_i^*) - \sum_{j=1}^{n} a_{ij} s_j(\delta_j u_j^*) - \sum_{k=0}^{K}\sum_{j=1}^{n} t_{ij}^{(k)} s_j(u_j^*) + I_i = 0, \qquad i = 1, \ldots, n. \tag{2}
\]
Lemma 1. (Cronin [8], Kronecker's theorem). Assume that $f: \bar\Omega \to \mathbb{R}^n$ is a continuous function and $\deg(f, \Omega, p) \ne 0$; then there exists $x_0 \in \Omega$ such that $f(x_0) = p$.
Lemma 2. (Cronin [8], homotopy invariance theorem). Assume that $H(x,\lambda): \bar\Omega \times [0,1] \to \mathbb{R}^n$ is a continuous function, and denote $h_\lambda(x) = H(x, \lambda)$. When $\lambda \in [0,1]$ and $p \notin h_\lambda(\partial\Omega)$, $\deg(h_\lambda, \Omega, p)$ is independent of $\lambda$.
2 Existence and GES of the Equilibrium Point
In order to study the existence and uniqueness of the equilibrium point, we consider the following algebraic equations associated with system (1):
\[
b_i(u_i(t,x)) - \sum_{j=1}^{n} a_{ij} s_j(\delta_j u_j(t,x)) - \sum_{k=0}^{K}\sum_{j=1}^{n} t_{ij}^{(k)} s_j(u_j(t,x)) + I_i = 0, \qquad i = 1, \ldots, n. \tag{3}
\]
System (3) can be rewritten in the following vector form:
\[
B(u) - AS(\delta u) - TS(u) + I = 0. \tag{4}
\]
where $u = (u_1, \ldots, u_n)^T$, $B(u) = (b_1(u_1), \ldots, b_n(u_n))^T$, $A = (a_{ij})_{n\times n}$, $\delta = \mathrm{diag}(\delta_1, \ldots, \delta_n)$, $T = \bigl(\sum_{k=0}^{K} t_{ij}^{(k)}\bigr)_{n\times n}$, $S(\delta u) = (s_1(\delta_1 u_1), \ldots, s_n(\delta_n u_n))^T$, $I = (I_1, \ldots, I_n)^T$.
Theorem 1. If ($H_1$)-($H_5$) hold, then system (1) has a unique equilibrium point $u^*$.
Proof. Let $h(u) = B(u) - AS(\delta u) - TS(u) + I$. Define the homotopic mapping $H(u, \lambda): \bar\Omega \times [0,1] \to \mathbb{R}^n$ by $H(u, \lambda) = (H_1(u,\lambda), \ldots, H_n(u,\lambda))^T \triangleq \lambda h(u) + (1-\lambda) B(u)$, $\lambda \in [0,1]$, in which $\alpha = \mathrm{diag}(\alpha_i)$. It follows from ($H_2$) and ($H_4$) that
\[
|b_i(u_i)| \ge \alpha_i |u_i|, \qquad i = 1, \ldots, n, \tag{5}
\]
|sj (uj )| ≤ βj |uj |+|sj (0)| , |sj (δj uj )| ≤ βj δj |uj |+|sj (0)| , j = 1, . . . , n. (6) Thus we have |Hi (u,
λ)| = |λ bi (ui ) −
n
aij sj (δj uj ) −
j=1 n
≥ αi |ui | −
|aij | βj δj |uj | −
j=1
% n % K % (k) % − %tij % |sj (0)| − |Ii | .
n K
k=0 j=1 n
(k) tij sj (uj )
|aij | |sj (0)| −
j=1
+ Ii
+ (1 − λ)bi (ui )|
% n % K % (k) % %tij % βj |uj |
k=0 j=1
k=0 j=1
By (7), we obtain n
Mi |ui | |Hi (u, λ)|
% n n n K n % % (k) % Mi |ui | αi |ui | − |aij | βj δj |uj | − |aij | |sj (0)| − ≥ %tij % βj |uj | i=1 j=1 j=1 k=0 j=1 % K n % % (k) % − %tij % |sj (0)| − |Ii |
i=1
k=0 j=1
% ∗ ∗ n % K % (k) %2pij 2qij |aij | |sj (0)| |ui | − 12 βj |ui |2 %tij % j=1 k=0 j=1 % ∗ % ∗ K K n % n % % (k) %2−2pij 2−2qij % (k) % 1 −2 βj |uj |2 − %tij % %tij % |sj (0)| |ui | − |Ii | |ui | k=0 j=1 k=0 j=1
n n n 2q 2q 2−2qji 2−2qij Mi αi − 12 Mi |aij |2pij βj ij δj ij − 12 Mj |aji |2−2pji βi δj ≥ i=1 j=1 j=1 % % ∗ % % ∗ ∗ ∗ n n K K % (k) %2pij 2qij % (k) %2−2pji 2−2qji − 12 Mi %tij % βj − 12 Mj %tji % βi |uj |2 k=0 j=1 k=0 j=1
% n n K n % % (k) % |s − Mi |aij | |sj (0)| + (0)| + |I | |ui | . %tij % j i
−
n
i=1
j=1
k=0 j=1
(7)
So, we obtain n M |u | |H (u, λ)| dx i i i Ω i=1 & n n 2q 2q Mi αi − 12 ≥ Ω Mi |aij |2pij βj ij δj ij − i=1
j=1
n j=1
Mj |aji |2−2pji
% %2−2p∗ji ∗ 2−2qji % (k) % |uj |2 Mj %tji % βi k=0 j=1 k=0 j=1
' % n % n n K % (k) % − Mi |aij | |sj (0)| + %tij % |sj (0)| + |Ii | |ui | dx − 21
n K
i=1
% %2p∗ij 2q∗ % (k) % Mi %tij % βj ij −
1 2
j=1
n K
k=0 j=1
2 δ0 u(t)2 (
≥
1 2
− L u(t)2 ,
δ0 = min Mi αi −
1 2
i
n
where,
2pij
Mi |aij |
−
j=1
1 2
n
=
2−2pji
Mj |aji |
j=1
2qij 2qij δj ,
βj
and
2−2qji 2−2qji δi
βi )
% %2p∗ij % %2−2p∗ji n K ∗ 2q∗ 2−2qji % (k) % % (k) % > 0, − 21 Mi %tij % βj ij − 12 Mj %tji % βi k=0 j=1 k=0 j=1 % % * # $ + K n % (k) % n L = maxi Ω Mi |s |a | |s (0)| + (0)| + |I | dx . %t % ij j j i j=1 k=0 j=1 ij n Take U (R 0 }. For any u ∈ ∂U (R0 ), 0n) = {U ∈ R | u(t) 2 < R0 = (L + 1)/δ # $ Mi |ui | |Hi (u, λ)| dx ≥ δ0 u(t)2 u(t)2 − δL0 > 0, ∀λ ∈ and Ω n K
i=1
[0, 1], i.e., H(u, λ) = 0, obtain:
∀λ ∈ [0, 1]. From Lemma 2, we
∀u ∈ ∂U (R0 ),
deg(h(u), U (R0 ), 0) = deg(H(u, 1), U (R0 ), 0) = deg(H(u, 0), U (R0 ), 0) = 1. It follows from Lemma 1 that there exist at least an u∗ ∈ U (R0 ), such that h(u∗ ) = 0. i.e., system (1) has at least an equilibrium point u∗ . Next, we prove the uniqueness of the equilibrium point. Suppose that u ¯∗ is also an equilibrium point of system (1), then u∗i ) − bi (¯
n
aij sj (δj u ¯∗j ) −
j=1
K n
bi (u∗i ) +
(k)
−
bi (¯ u∗i )
n K
k=0 j=1
=
n j=1
(k) tij sj (u∗j )
aij sj (δj u∗j )
−
n K k=0 j=1
From
(H2),(H4) and (H5), we obtain n
i=1
+ 12
−Mi αi + K n k=0 j=1
1 2
(8)
k=0 j=1
Thus
tij sj (¯ u∗j ) + Ii = 0, i = 1, . . . , n
n
j=1
2qij 2qij δj
Mi |aij |2pij βj
% % ∗ ∗ % (k) %2pij 2qij Mi %tij % βj +
1 2
K n k=0 j=1
+
1 2
−
n j=1
aij sj (δj u ¯∗j )
(k) tij sj (¯ u∗j )
n j=1
(9) .
2−2qji 2−2qji δi
Mj |aji |2−2pji βi
% % ∗ ∗ % (k) %2−2pji 2−2qji Mj %tji % βi
|u∗i − u ¯∗i |2 ≤ 0,
64
Y. Kao and S. Bao
which implies u∗i = u ¯∗i (i = 1, . . . , n). Hence system (1) has an unique equilib∗ rium point u . This completes the proof. Denoting yi (t, x) = ui (t, x) − u∗i , i = 1, . . . , n, system (1) can be rewritten as follows: ⎧ # $ m ∂yi (t,x) ∂yi (t,x) ∂ ⎪ D − ai (yi (t, x)) × [bi (ui (t, x)) − bi (u∗i ) = ⎪ il ∂xl ∂t ∂xl ⎪ ⎪ l=1 ⎪ ⎪ n n 0 0 ⎪ ⎪ ∗ ⎪ ⎪ ⎨ − j=1 aij sj ( −∞ dηj (θ)uj (t, x)) + j=1 aij sj ( −∞ dηj (θ)uj ) (10) K n K n ⎪ − t(k) s (u (t − τ (t), x)) + t(k) s (u∗ )], t ≥ 0, x ∈ Ω ⎪ j j k j ⎪ j ij ij ⎪ ⎪ k=0 j=1 k=0 j=1 ⎪ ⎪ ⎪ yi (t, x) = ψi (t, x), −τ ≤ t ≤ 0, x ∈ Ω ⎪ ⎪ ⎩ ∂yi (t,x) = 0, x ∈ ∂Ω, ∂n where i = 1, . . . , n, ψi (t, x) = ϕi (t, x)−u∗i . y(t, x) = (y1 (t, x), . . . , yn (t, x))T , ψ(t, x) = (ψ1 (t, x), . . . , ψn (t, x))T . Obviously, u∗ of system (1) is GES if and only if the equilibrium point O of system (10) is GES. Thus in the following, we only consider GES of the equilibrium point O for system (10). Theorem 2. If (H1)-(H5) hold, then the equilibrium point O of system (10) is GES. Proof. From (H4), there exists a sufficiently small constant 0 < λ < mini {mi αi } , such that n n 2q 2q 2p 2−2pji 2−2qji 2−2qji 2λ − 2mi αi + Mi |aij | ij βj ij δj ij + Mj |aji | βi δi j=1 j=1 % %2p∗ij % %2−2p∗ji n n K K ∗ 2q∗ 2−2qji % (k) % % (k) % + Mi %tij % βj ij + Mj %tji % βi e2λτ k=0 j=1 k=0 j=1 % %2−2p∗ji n K ∗ 2−2qji % (k) % +¯ τ Mj %tji % βi ≤ 0. k=0 j=1
Taking Lyapunov functional and calculating the rate of change of V (t): % %2−2p∗ij n n K ∗ 2−2qij % (k) % 2 Mi %tij % βi V (t) = Ω ( |yi (t, x)| e2λt + i=1 k=0 j=1 t × t−τk (t) |yj (s, x)|2 e2λ(s+τ ) ds)dx n
2
(11)
(2λ |yi (t, x)| e2λt + 2 |yi (t, x)| e2λt sign(yi (t, x))y˙i (t, x) i=1 % %2−2p∗ij n K 2−2q∗ % (k) % 2 + Mi %tij % βj ij |yj (t, x)| e2λ(t+τ ) k=0 j=1 % %2−2p∗ij n K 2−2q∗ % (k) % 2 −(1 − τ˙k (t)) Mi %tij % βj ij |yj (t − τk (t), x)| e2λ(t−τk (t)+τ ) )dx D+ V (t) =
Ω
k=0 j=1
Exponential Stability of Reaction-Diffusion CGNNs
65
n
[2λ |yi (t, x)|2 e2λt + 2 |yi (t, x)| e2λt sign(yi ) # $ i (t,x) Dil ∂y∂x − ai (yi (t, x)) k 0 0 ×[bi (ui (t, x)) − bi (u∗i ) − aij sj ( −∞ dηj (θ)uj (t + θ, x)) + aij sj ( −∞ dηj (θ)u∗j ) K K n n (k) (k) − tji sj (uj (t − τk (t), x)) + tji sj (u∗j )] k=0 j=1 k=0 j=1 ∗ % % ∗ K n % (k) %2−2pij 2−2qij + Mi %tij % βj |yj (t, x)|2 e2λ(t+τ ) k=0 j=1 % % ∗ K ∗ n % (k) %2−2pij 2−2qij −(1 − τ˙k (t)) Mi %tij % βj |yj (t − τk (t), x)|2 e2λ(t−τk (t)+τ ) ]dx =
Ω
mi=1 ∂ × ∂xl l=1
k=0 j=1
From the boundary condition, we have # $ # $ m m ∂yi (t,x) ∂ ∂ i (t,x) |yi (t, x)| dx = Dil ∂y∂x dx |y (t, x)| ∂x ∂xl Dil ∂xl Ω Ω i l l l=1 l=1 # $ m , ∂yi (t,x) ∂yi (t,x) ∂ i (t,x) = dx− Ω ∂y∂x D dx il Ω ∂xl |yi (t, x)| Dil ∂xl ∂x l l l=1 # $2 m ∂yi (t,x) =− |yi (t, x)| dx. Ω Dil ∂xl l=1
Combining (H1)-(H5) with 2ab ≤ a2 + b2 , we obtain D+ V (t) ≤
n
2
2
[2λ |yi (t, x)| e2λt + 2e2λt (−mi αi |yi (t, x)| i=1 % % n n K % (k) % + |aij | βj δj |yi (t, x)| |yj (t, x)| + Mi %tij % βj |yi (t, x)| ) j=1 k=0 j=1 % %2−2p∗ij n K ∗ 2−2qij % (k) % Mi %tij % βj |yj (t, x)|2 e2λ(t+τ ) + j=1 k=0 % %2−2p∗ij n K 2−2q∗ % (k) % 2 −(1 − τ¯) Mi %tij % βj ij |yj (t − τk (t), x)| e2λ(t−τ (t)+τ ) ]dx n
Ω
k=0 j=1
2 2 [2λ |yi (t, x)| e2λt + 2e2λt (−mi αi |yi (t, x)| Ω i=1 n q q 1−q 1−q p 1−p + Mi |aij | ij βj ij δj ij |yi (t, x)| |aij | ij βj ij δj ij |yj (t, x)| j=1 % %p∗ij ∗ % %1−p∗ij n K q 1−q∗ % (k) % % (k) % Mi %tij % βj ij |yi (t, x)| %tij % βj ij |yj (t − τk , x)|) + k=0 j=1 % %2−2p∗ij n K 2−2q∗ % (k) % Mi %tij % βj ij |yj (t, x)|2 e2λ(t+τk ) + k=0 j=1 % %2−2p∗ij n K 2−2q∗ % (k) % 2 − Mi %tij % βj ij |yj (t − τk , x)| e2λt ]dx k=0 j=1 n 2 2 ≤ Ω [2λ |yi (t, x)| e2λt + e2λt (−2mi αi |yi (t, x)| i=1 n n (12) 2q 2p 2 2−2pji 2−2qji 2 Mi |aij | ij βj ij |yi (t, x)| + Mi |aij | βj |yj (t, x)| + j=1 j=1
≤
66
Y. Kao and S. Bao
% %2p∗ij % %2−2p∗ij n K 2q∗ 2−2q∗ % (k) % % (k) % 2 Mi %tij % βj ij |yi (t, x)| + Mi %tij % βj ij k=0 j=1 k=0 j=1 % %2−2p∗ij n K ∗ 2−2qij % (k) % 2 Mi %tij % βj |yj (t, x)| e2λ(t+τk ) + k=0 j=1 % %2−2p∗ij n K 2−2q∗ % (k) % 2 − Mi %tij % βj ij |yj (t − τk , x)| e2λt ]dx +
n K
k=0 j=1 n
n n 2q 2p 2−2pji 2−2qji [2λ − 2mi αi + Mi |aij | ij βj ij + Mi |aij | βj i=1 j=1 j=1 ∗ ∗ % % % % n n K K ∗ ∗ % (k) %2pij 2qij % (k) %2−2pji 2−2qji + Mi %tij % βj + Mj %tji % βi e2λτ ] k=0 j=1 k=0 j=1 % %2−2p∗ij n K 2−2q∗ % (k) % 2 Mi %tij % βj ij |yj (t − τk , x)| yi (t)2 e2λt +
≤
k=0 j=1
≤ 0,
Where = |yj (t − τk , x)|. It implies V (t) ≤ V (0). By (11) we have % % ∗ K ∗ n n % (k) %2−2pij 2−2qij t V (0) = Ω (|ψi (t, x)|2 + Mi %tij % βi |yj (s, x)|2 ℘ds)dx −τk i=1 k=0 j=1
% % ∗ n K ∗ n % (k) %2−2pij 2−2qij 0 2 2λ(s+τk ) Mi %tij % βi y (s) e ds ≤ ψi (t)22 + j 2 −τk i=1 k=0 j=1 ( ) ∗ % ∗ K n % % (k) %2−2pij 2−2qij ≤ max 1 + βi (eτ − 1) ψ22 , %tij % i
k=0 j=1
2
where ℘ = e2λ(s+τk )(. And V (t) ≥ y(t)2 e2λt . Hence ) ∗ % n % K ∗ % (k) %2−2pij 2−2qij 2 2λt y(t)2 e ≤ max 1 + βi (eτ − 1) ψ22 , %tij % i
k=0 j=1
which leads to y(t)2 ≤ γ ψ2 e−λt , where ( )1/
∗ 2 % n % K ∗ % (k) %2−2pij 2−2qij . βi (eτ − 1) γ = max 1 + %tij % i
k=0 j=1
This completes the proof. Corollary 1. System (1) has an unique equilibrium point which is GES point, if the conditions (H1)-(H3) hold. Furthermore, assume one of the following conditions hold: % %2 n n K % (k) % 2 (H5) −2mi αi + Mi |aij | βj2 + Mi %tij % βj2 < 0. (H6)
−2mi αi +
j=1 n j=1
+
n K k=0 j=1
k=0 j=1
Mi |aij | βj +
% % % (k) % Mj %tji % βi < 0.
n j=1
Mj |aji | βi +
n K k=0 j=1
% % % (k) % Mi %tij % βj
Exponential Stability of Reaction-Diffusion CGNNs
67
∗ Proof. In (H4), let pij = qij = p∗ij = qij = 1, then (H4) turns to (H5). ∗ Furthermore, we suppose that pij = qij = p∗ij = qij = 12 , then (H4) turns to (H6).By Theorems 1 and 2, system (1) has a unique equilibrium point which is GES.
Remark 1. When Dik ≡ 0, then system (1) becomes the system analyzed in [2,9,14,15,16]. It is worth noting that, in the paper, we did not need sj is bounded. Thus, we improve the results in Refs. [2,9,14,15,16]. Remark 2. From Theorems 1 and 2, we see if reaction- diffusion terms satisfy a weaker condition Dik ≥ 0, then the effects for the existence and GES of the equilibrium point just come from the networks parameters, the stability is completely expressed by the relations of these parameters. Remark 3. Although the assertions of exponential stability in Theorems 1 and 2 are independent of the delays, the convergence rate λ do depend on the delays τk .
3 An Example Example 1. Consider Cohen-Grossberg neural networks with delays and reaction-diffusion terms ⎧ # $ ∂u1 (t,x) (t,x) ∂ ⎪ D11 ∂u1∂x − a1 (u1 (t, x)) = ∂x ⎪ ⎪ ∂t ⎪ ' & ⎪ ⎪ 1 2 2 ⎪ 0 ⎪ (k) ⎪ a1j sj ( −∞ dηj (θ)uj ) − t1j sj (uj (t − τk , x)) ⎪ ⎨ × b1 (u1 (t, x)) − j=1 j=1 k=0 # $ ∂u2 (t,x) (t,x) ∂ ⎪ D21 ∂u2∂x − a2 (u2 (t, x)) = ∂x ⎪ ⎪ ∂t ⎪ & ' ⎪ ⎪ 1 2 2 ⎪ 0 ⎪ (k) ⎪ a2j sj ( −∞ dηj (θ)uj ) − t2j sj (uj (t − τk , x)) . ⎪ ⎩ × b2 (u2 (t, x)) − j=1
j=1 k=0
(13) Let D11 > 0, D21 > 0, ai = 4 + sin ui , bi (ui ) = 4ui , s1 (u1 ) = arctan u1 , s2 = u2 . Clearly, ai satisfies (H1) with mi = 3, Mi = 5, sj satisfies (H2) with βj = 1, bi satisfies (H3) with αi = 4, i, j = 1, 2. Moreover, we choose (0) (0) (1) (1) 1 1 1 1 1 a11 = 12 , a12 = 12 , a21 = 16 , a22 = 24 , t11 = 12 , t12 = 16 , t11 = 12 , t12 = (1) (1) 1 (0) 1 (0) 1 1 1 4 , t21 = 6 , t22 = 24 , t21 = 12 , t22 = 24 . By simple calculation, we show that (H6) holds. It follows from Theorems 1 and 2 that system (13) has a unique equilibrium point (0, 0)T which is GES.
4 Conclusions In this paper, the dynamics of Cohen-Grossberg neural networks model with delays and reaction-diffusion is studied. By employing homotopic mapping theory and constructing Lyapunov functional method, some sufficient condi-
68
Y. Kao and S. Bao
tions have been obtained which guarantee the model to be GES. The given algebra conditions are useful in design and applications of reaction-diffusion Cohen-Grossberg neural networks. Moreover, our methods in the paper may be extended for more complex networks. Acknowledgements. This paper was supported by the National Natural Science Foundations of China under Grant 60673101 and 60674020. And supported by the Foundations under Grant HITWHXB200807.
References 1. Xu, Z., Qiao, H., Peng, J., Zhang, B.: A Comparative Study of Two Modeling Approaches in Neural Networks. J. Neural Networks 17, 73–85 (2004) 2. Cao, J.: An Estimation of the Domain of Attraction and Convergence Rate for Hopfield Continuous Feedback Neural Networks. Phys. Lett. A 325, 370–374 (2004) 3. Cao, J., Liang, L.: Boundedness and Stability for Cohen-Grossberg Neural Network with Time-varying Delays. J. Math. Anal. Appl. 296, 665–685 (2004) 4. Cao, J., Wang, J.: Globally Exponentially Stability and Periodicity of Recurrent Neural Networks with Time Delays. IEEE Trans. Circuits Syst. I 52, 920–931 (2005) 5. Zhang, J., Suda, Y., Komine, H.: Global Exponential Stability of CohenGrossberg Neural Networks with Variable Delays. Phys. Lett. A 338, 44–50 (2005) 6. Hasting, A.: Global Stability in Lotka-Volterra Systems with Diffusion. J. Math. Biol. 6, 163–168 (1978) 7. Liang, J., Cao, J.: Global Exponential Stability of Reaction-Diffusion Recurrent Neural Networks with Time-varying Delays. Phys. Lett. A 314, 434–442 (2003) 8. Liao, X., Li, C.: Stability in Gilpin-Ayala Competition Models with Diffusion. Nonlinear Anal. TMA. 18, 1751–1758 (1997) 9. Liao, X., Li, C., Wong, K.: Criteria for Exponential Stability of CohenGrossberg Neural Networks. Neural Networks 17, 1401–1414 (2004) 10. Rothe, F.: Convergence to the Equilibrium State in the Volterra-Lotka Diffusion Equations. J. Math. Biol. 3, 319–324 (1976) 11. Song, Q., Cao, J.: Global Exponential Stability and Existence of Periodic Solutions in BAM Networks with Delays and Reaction-diffusion Terms. Chaos Solitons & Fractals 23, 421–430 (2005) 12. Song, Q., Zhao, Z., Li, Y.: Global Exponential Stability of BAM Neural Networks with Distributed Delays and Reaction-diffusion Terms. Phys. Lett. A 335, 213–225 (2005) 13. Wang, L., Xu, D.: Global Exponential Stability of Reaction-diffusion Hopfield Neural Networks with Variable Delays. Sci. China Ser. E 33, 488–495 (2003) 14. Xiong, W., Cao, J.: Global Exponential Stability of Discrete-time CohenGrossberg Neural Networks. Neurocomputing 64, 433–446 (2005) 15. Yuan, K., Cao, J.: Global Exponential Stability of Cohen-Grossberg Neural Networks with Multiple Time-varying Delays. LNCS, vol. 73, pp. 78–83 (2004) 16. Zhang, J., Suda, Y., Komine, H.: GlobalExponential Stability of CohenGrossberg Neural Networks with Variable Delays. Phys. Lett. A 338, 44–50 (2005)
Global Exponential Robust Stability of Static Reaction-Diffusion Neural Networks with S-Type Distributed Delays Shuping Bao
Abstract. In this letter, by using homotopic invariance, toplogical degree theory and Lyapunov functional method, we investigate the global exponential robust stability of static neural network models with reaction-diffusion terms and S-type distributed delays. We present a theorem and a corollary which generalize the results of related literature. Moreover, the exponential convergence rate is estimated. Keywords: Static neural networks, Distributed delays, Exponential robust stability, Liapunov functional, Reaction-diffusion.
1 Introduction On the basis of the difference of basic variables (local field states or neuron states), a dynamical neural network can frequently be cast either as a local field neural network model or as a static neural network model [1,2]. Generally speaking, Hopfield neural networks [3], bidirectional associative memory networks [4] and cellular neural networks [5] all are local field neural network models. The recurrent back-propagation networks [6], brain-statein-a-box/domain type networks [7,8] and optimization-type neural networks in [9-11] are all in the static neural network model forms. The basic form of the local field model is n dxi (t) = −ai xi (t) + wij fj (xj (t)) + Ii , i = 1, 2, · · · , n, dt j=1
(1)
where n denotes the number of neurons; wij is the value of the synaptic connectivity from neuron j to i; fi (·) is the nonlinear activation function of Shuping Bao College of information Science and Technology, Qingdao University of Science and Technology, Qingdao 266061, China H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 69–79. c Springer-Verlag Berlin Heidelberg 2009 springerlink.com
70
S. Bao
neuron i; Ii is the external input imposed on neuron i [1, 2]. With the same notation, the static model can be written as ⎛ ⎞ n dyi (t) = −ai yi (t) + fi ⎝ wij yj (t) + Ii ⎠ , i = 1, 2, · · · , n. (2) dt j=1 Local field neural network models have attracted a great deal of attention. Many deep theoretical results have been obtained for local field neural network, we can refer to [3,4,5,12-18] and so on. Recently, more and more attention has been paid to the static models for its great potential of applications. In [2], the authors presented a reference model approach for investigating the stability of system (2), but did not consider the influence of time delays. In [19], the authors investigated the global robust asymptotic stability of static neural network models with S-type distributed delays on a finite interval, In [20], the authors discussed local robust stability of static neural network (SNN) with S-type distributed delays, but they both didn’t consider the effect of reaction-diffusion phenomina. In this paper, we will study the global exponential robust stability of the following static reaction-diffusion neural networks with S-type distributed delays on a finite interval: ⎧ m ∂ui ∂ ∂ui ⎪ ⎪ ⎪ = (Dik ) − ai (t, λ)ui (t, x) ⎪ ⎪ ∂t ∂x ∂x k k ⎪ ⎪ k=1 ⎪ ! 0 ⎪ ⎪ n ⎪ ⎨ +fi uj (t + θ, x)dωij (θ, λ) + Ii (t) (3) j=1 −r(λ) ⎪ ⎪ ⎪ ⎪ ∂ui ∂ui ∂ui ⎪ ⎪ := col( ,..., ) = 0, t ≥ σ ≥ 0, x ∈ ∂Ω, ⎪ ⎪ ⎪ ∂n ∂x ∂x 1 m ⎪ ⎪ ⎩ yi (σ + θ) = φi (θ), t ≥ σ, θ ∈ [−r(λ), 0], i = 1, 2, · · · , n, where n denotes the number of neurons; ai (t) > 0 and Dik (t, x, u) ≥ 0 represent the neuron charging time functions and smooth diffusion operator, respectively; xi (i = 1, ..., n) corresponds to the neuron ith coordinate in the space X; Ω is a bounded compact set with smmoth boundary ∂Ω and measure μ(Ω) > 0 in Rm ; is the nonlinear activation function of neuron i; Ii is the external input imposed on neuron i; λ ∈ Λ ⊂ R is the parameter, σ ∈ R, φi (θ, x) ∈ C([−r(λ), 0] × Rm ) is the initial conditions; ωij (θ, λ), i, j = 1, 2, · · · , n, are nondecreasing bounded variation functions ! 0 on [−r(λ), 0], uj (t + θ, x)dωij (θ, λ), i, j = 1, 2, · · · , n, are Lebesgue−r(λ)
Stieltjes integrable. ∗ There exist positive constants r, ai , ai , ωij , i, j = 1, 2, · · · , n, such that for 0 ∗ < any λ ∈ Λ, 0 < ai ≤ ai (λ, t) ≤ ai , 0 ≤ r(λ) ≤ r, | −r(λ) dωij (θ, λ)| ≤ ωij
∗ ∞, i, j = 1, 2, · · · , n. Let W = (ωij )n×n , A = diag(ai ), L = diag(li ),
Global Exponential Robust Stability
71
i = 1, 2, · · · , n, B = {φ(θ, x)|φ(θ, x) = (φ1 (θ, x), φ1 (θ, x), · · · , φn (θ, x)), φi (θ, x) ∈ C([−r(λ), 0] × Rm ), i = 1, 2, · · · , n}. Define φmax = max ! 1≤i≤n {max−r(λ)≤θ≤0 φi (θ)} as the maximum norm, 1
|φi (θ)|2 dx) 2 . Then B is a Banach space. For any
where φi (θ) = ( Ω
φ(θ) ∈ B and σ ∈ R, a solution of system (3) is a vector function u(t, x) = u(σ, φ, t, x) = (u1 (t, x), · · · , un (t, x)) satisfying (3) for t ≥ σ.
2 Preliminaries Definition 1. Suppose that u(t, x) = u(σ, φ, t, x) is a solution of system (3). The equilibrium point u∗ of system (3) is said to be globally exponentially stable if there exist constants M > 0, α > 0, such thatu−u∗ ≤ M e−α(t−t0 ) , where · denotes the Euclidean norm. Definition 2. System (3) is said to be robust stable or globally exponentially robust stable if its equilibrium u∗ = (u∗1 , u∗2 , · · · , u∗n ) is stable or globally exponentially stable for any r(λ) ∈ [0, r] and ai (λ) ∈ [ai , ai ], i = 1, · · · , n. Lemma 1. If f (x, y, θ) is continuous on [a, b; c, d; α, β], ω(θ) is a non! β decreasing bounded variation function on [α, β], dω(θ) = ω ∗ < ∞, then α ! β f (x, y, θ)dω(θ) is continuous on [a, b; c, d]. g(x, y) = α
Proof. f (x, y, θ) is continuous on [a, b; c, d; α, β], so it is uniformly continuous, i.e., ∀ε ≥ 0, ∃δ ≥ 0 such that when |x1 −x2 | ≤ δ,|y1 −y2 | ≤ δ and |θ1 −θ2 | ≤ δ, we have |f (x1 , y1 , θ1 ) − f (x2 , y2 , θ2 )| ≤ ε. Since −|f (x1 , y1 , θ)−f (x2 , y2 , θ)| ≤ f (x1 , y1 , θ)−f (x2 , y2 , θ) ≤ |f (x1 , y1 , θ)−f (x2 , y2 , θ)|,
!
!
β
−
β
|f (x1 , y1 , θ) − f (x2 , y2 , θ)|dω(θ) ≤ α
|f (x1 , y1 , θ) − f (x2 , y2 , θ)|dω(θ), α
we have %! % % β % % % (f (x1 , y1 , θ) − f (x2 , y2 , θ))dω(θ)% ≤ εω ∗ . % % α % %! % % β % % % (f (x1 , y1 , θ) − f (x2 , y2 , θ))dω(θ)% ≤ εω ∗ . |g(x1 , y1 ) − g(x2 , y2 )| = % % α % Thus g(x, y) is continuous on [a, b; c, d].
72
S. Bao
From Lemma 1, we know that when the conditions of Lemma 1 are satisfied, ! lim ∗
x→x , y→y ∗
!
β
f (x, y, θ)dω(θ) =
β
lim
∗ ∗ α x→x ,y→y
α
f (x, y, θ)dω(θ).
∂ f (x, y, θ)and fy (x, y, θ) = Lemma 2. If f (x, y, θ), fx (x, y, θ) = ∂x ∂ f (x, y, θ)are continuous on [a, b; c, d; α, β], ω(θ) is a nondecreasing ∂y ! β bounded variation function on [α, β], dω(θ) = ω ∗ < ∞, then α ! β ! β ∂ f (x, y, θ)dω(θ) = fx (x, y, θ)dω(θ), ∂x α α ! β ! β ∂ f (x, y, θ)dω(θ) = fy (x, y, θ)dω(θ) ∂y α α ! β Proof. Let g(x, y) = f (x, y, θ)dω(θ). From the mean value theorem, α
g(x + Δx, y) − g(x, y) = Δx
!
β
α
!
f (x + Δx, y, θ) − f (x, y, θ) dω(θ), Δx
β
fx (x + ξΔx, y, θ)dω(θ)
= α
ξ ∈ [0, 1]. From Lemma 1, g(x + Δx, y) − g(x, y) ∂ g(x, y) = lim = lim Δx→0 Δx→0 ∂x Δx ! ! β lim fx (x + ξΔx, y, θ)dω(θ) = = α Δx→0
Similarly,
∂ ∂y
!
β
! f (x, y, θ)dω(θ) =
α
!
β
fx (x + ξΔx, y, θ)dω(θ) α β
fx (x, y, θ)dω(θ). α
β
fy (x, y, θ)dω(θ). α
3 Global Exponential Robust Stability Theorem 3. Assume that (T1 ) |fi (y1 ) − fi (y2 )| ≤ li |y1 − y2 |, i = 1, · · · , n (T2 ) A − LW ∗ is an M -matrix. Then system (3) is globally exponentially robust stable.
Global Exponential Robust Stability
73
Proof. Part I. Existence of the equilibrium. We can prove that for any constant inputs I ∈ Rn ,λ ∈ Λ, r(λ) ∈ [0, r], ai (t, λ) ∈ [ai , ai ] , i = 1, · · · n, system (3) has at least an equilibrium, by homotopic invariance, topological degree theory. From (T1 ), it follows that |fi (s)| ≤ li |s| + |fi (0)|, i = 1, 2, · · · , n, ∀s ∈ R. Let h(u, t) = A(t)u − f (u) = 0,
(4)
Obviously, the solutions of (4) are the equilibria of system (3). Define the ¯ × [0, 1] → Rn as follows homotopic mapping H(u, t) : Ω H(u, t, η) = (H1 (u, t, η), ..., Hn (u, t, η)) = ηh(u, t) + (1 − η)u, η ∈ J = [0, 1].
|Hi (u, t, η)| = |η[ai (λ, t)ui − fi
n
!
j=1
0 −r(λ)
+ (1 − η)ui |
≥ [1 + η(ai − 1)]|ui | − η|fi & ≥ [1 + η(ai − 1)]|ui | − η li | ≥ [1 + η(ai − 1)]|ui | − ηli
uj (t + θ, x)dωij (θ, λ) + Ii ] !
n j=1
n j=1
n j=1
0
uj (t + θ, x)dωij (θ, λ) + Ii −r(λ) '
|
∗ wij uj | + li |Ii | + |fi (0)|
∗ |wij ||uj | − η(li |Ii | + |fi (0)|)
i = 1, 2, . . . , n.
(5) That is, H + ≥ [E + η(A − E)][u]+ − ηLW ∗ [u]+ − η[LI + + f + (0)] = (1 − λ)[u]+ + η(A − σW ∗ )[u]+ − η[LI + + f + (0)]
(6)
where H + = [|H1 |, |H2 |, . . . , |Hn |]T ,[u]+ = [|u1 |, . . . , |un |]T , I + = [|I1 |, |I2 |, . . . , |In |]T , f + (0) = [|f1 (0)|, |f2 (0)|, . . . , |fn (0)|]T , E is an identity matrix. Since C = A−LW ∗ is an M-matrix, we have (A−LW ∗ )−1 ≥ 0(nonnegative matrix) and there exists Q = (Q1 , Q2 , · · · , Qn )T > 0. namely Qi > 0, i = 1, 2, . . . , n. such that (A − LW ∗ )Q > 0. Let " U (R0 ) = u : [u]+ ≤ R0 = Q + (A − LW ∗ )−1 [LI + + f + (0)] (7) Then, U (R0 )is not empty and it follows from (7) that for any u ∈ ∂U (R0 )(boundary of U (R0 )), " H + ≥ (1 − η)[u]+ + η(A − LW ∗ ) [u]+ − (A − LW ∗ )−1 [LI + + f + (0)] (8) = (1 − η)[u]+ + η(A − LW ∗ )Q > 0, η ∈ J = [0, 1].
74
S. Bao
That is,H(u, t, η) = 0, ∀u ∈ ∂U (R0 ), η ∈ [0, 1]. So, from homotopy invariance theory[21], we have d(h, U (R0 ), 0) = d(H(x, 1), U (R0 ), 0) = d(H(x, 0), U (R0 ), 0) = 1, where d(h, U (R0 ), 0)denotes topological degree. By topological degree theory, we can conclude that h(u,t)=0 has at least one solution in U (R0 ). That is, system (3) has at least one equilibrium. Part II. Uniqueness of equilibrium and its global exponential robust stability. Let u∗ = (u∗1 , u∗2 , · · · , u∗n )T be an equilibrium of system (3) and u(t, x) = (u1 (t, x), u2 (t, x), · · · , un (t, x))T = (u∗1 , u∗2 , · · · , u∗n )T is any solution of system (3). Rewrite system (3) as m ∂ui − u∗ ∂ ∂ui − u∗ = (Dik ) − ai (t, λ)(ui − u∗ ) ∂t ∂xk ∂xk
! 0k=1 n +fi uj (t + θ, x)dωij (θ, λ) + Ii (t) (9) j=1 −r(λ)
! 0 n ∗ uj dωij (θ, λ) + Ii (t) , i = 1, 2, · · · , n, −fi j=1
−r(λ)
Multiply both sides of (9) by ui − u∗ and integrate it, we get
! ! m ! 1 d ∂ui − u∗ ∂ (ui − u∗ )2 dx = (ui − u∗ ) (Dik )dx − ai (t, λ) (ui − 2 dt Ω ∂xk ∂xk Ω k=1 Ω u∗ )2 dx + Ω (ui − u∗ )Υ dx
(10)
Where Υ =
fi
n !
−r(λ)
j=1
0
uj (t + θ, x)dωij (θ, λ) + Ii (t)
− fi
n ! j=1
0 −r(λ)
u∗j dωij (θ, λ)
+ Ii (t)
by the boundary condition of equation (3), we get n ! Ω
k=1
!
= ∂Ω
=−
(ui − u∗i )
∂(ui − u∗i ) ∂ (Dik dx = ∂xk ∂xk
((ui − u∗i )Dik
m k=1
Dik ( Ω
∂(ui − ∂xk
u∗i ) m )k=1 dx
∂(ui −u∗ i) 2 ) dx ∂xk
−
! Ω
(ui − u∗i )∇(Dik
m ! k=1
Ω
Dik (
∂(ui − u∗i ) m )k=1 dx ∂xk
∂(ui − u∗i ) 2 ) dx ∂xk
(11) ∂ ∂ T in which ∇ = ( ∂x , . . . , ) is the gradient operator. From (10) and (11), ∂x 1 m assumption (T1 ) and H¨ older inequality, we have n ! 0 d ∗ 2 ∗ 2 ||ui − ui ||2 ≤ −2ai (t, λ)||ui − ui ||2 + 2li uj (t + θ, x) − u∗j × dt −r(λ) j=1 ! 0 n ∗ ∗ 2 ||ui − ui ||2 dωij (θ, λ) ≤ −2ai ||ui − ui ||2 + 2li uj (t + θ, x) − Θ j=1
−r(λ)
(12)
Global Exponential Robust Stability
75
Where Θ=u∗j .||ui − u∗i ||2 dωij (θ, λ). I.e., n ! 0 d ∗ ∗ ||ui − ui ||2 ≤ −ai ||ui − ui ||2 + li uj (t + θ, x) − u∗j dωij (θ, λ) dt −r(λ) j=1
(13) We can choose a sufficient small positive constant ε such that βi ai − li
n
∗ βj ωji > ε, i = 1, · · · n.
(14)
j=1
Let us consider functions Fi (ξi ) = βi (ai − ξi ) − li
n
! βj
j=1
0 −r(λ)
e−ξi θ dωij (θ, λ), ξi ∈ [0, +∞), i = 1, 2, ..., n.
From (14), we get Fi (0) > ε > 0 and Fi (ξ) is continuous for ξi ∈ [0, +∞), moreover,Fi(ξ) → −∞ as ξ → +∞, thus there exist constant αi ∈ (0, +∞) such that ! 0 n Fi (αi ) = βi (ai − αi ) − li βj e−αi θ dωij (θ, λ) = 0, (15) j=1
−r(λ)
for i ∈ 1, 2, ..., n. By choose α = max1≤i≤n αi , we have ! 0 n Fi (α) = βi (ai − α) − li βj e−αθ dωij (θ, λ) ≥ 0, j=1
Define the Lyapunov functional ⎛
! n n ∗ αt V (t) = βi ⎝ui − ui 2 e + li i=1
(16)
−r(λ)
j=1
0
−r(λ)
⎞ Ξdωij (θ, λ) ⎠ .
$ # t Where Ξ= t+θ uj (s, x) − u∗j 2 eα(s−θ) ds . Since 0 ≤ r(λ) ≤ r, Let ωij (θ, λ) = ωij (−r(λ), λ) when θ ∈ [−r, −r(λ)), we have that θe−αθ and ωij (θ, λ) can be% continued as %the bounded variation function on [−r, 0] and ! % 0 % −αθ ∗ % |θe |≤M , % dωij (θ, λ)%% ≤ ωij , i, j = 1, · · · , n. −r
From Lemma 2, the upper right Dini-derivative of V (t) along the solution of system (3) can be calculated as DV + =
n i=1
+
n j=1
! li
βi 0
−r(λ)
αeαt ui − u∗i 2 + eαt
dui − u∗i 2 dt )
[uj (t, x) −
u∗j 2 eα(t−θ)
− uj (t + θ, x) −
u∗j 2 eαt ]dωij (θ, λ)
76
S. Bao
≤
(
n
βi e
α − ai +
αt
i=1
= −e
i=1
li
βi (ai − α) −
)
0
e
−αθ
−r(λ)
j=1
&
n
αt
!
n
n
! l i βj
'
0
e
−αθ
−r(λ)
j=1
||ui − u∗i ||2
dωij (θ, λ)
||ui − u∗i ||2
dωij (θ, λ)
From (16), we have DV + ≤ 0, t > 0 . So V (t) ≤ V (0), t > 0. Since V (t) =
n
βi eαt ui − u∗i 2 ≥ min {βi }eαt 1≤i≤n
i=1 n
n
n
u∗i 2 eαt
!
ui − u∗i 2
i=1
!
0
0
ui (0, x) − + li ϑ(s, x)ds dωij (θ, λ) j=1 −r(λ) θ
n ui (0, x) − u∗i 2 eαt + ≤ max {βi } 1≤i≤n i=1
! 0 n n ∗ α(ρ−θ) li |θ|uj (ρ, x) − uj 2 e dωij (θ, λ) (θ ≤ ρ ≤ 0) i=1 j=1 −r(λ)
n n ∗ ≤ max {βi } 1 + max {li M ωij } × sup ui (s, x) − u∗i 2
V (0) =
i=1
βi
1≤i≤n
1≤i≤n
−r≤s≤0 i=1
j=1
Where ϑ(s, x) = uj (s, x) − u∗j 2 eα(s−θ) . We let
max1≤i≤n {βi } 1 + max1≤i≤n {li
n j=1
M=
min1≤i≤n {βi }
∗ M ωij }
,
And then M ≥ 1 and n
ui −
u∗i 2
≤ Me
j=1
−αt
sup
n
−r≤s≤0 i=1
ui (s, x) − u∗i 2 ,
So the equilibrium u∗ of system (3) is exponentially stable, and the exponential converging velocity index α = min1≤i≤n {αi } from (16). Because of the arbitrariness of u, u∗ is globally exponentially stable. Thus, system (3) is globally exponentially robust stable.From[14,15], we have: Corollary 4. Assume that system(3) satisfies (T1)and one of the following: (T3) A − LW ∗ is a matrix with strictly diagonal dominance of column (or row). n ∗ βj li wji ∗ (T4) ai − li wij >
j=1,j =i
βi > 0
, βi > 0, i = 1, 2, ..., n
Global Exponential Robust Stability n
(T5) max
1≤i≤n
(T6) max
1≤i≤n
j=1 n j=1
77
∗ lj wji
ai
<1
∗ ∗ li wij + lj wji
2ai
<1
Then system (3) is globally asymptotically robust stable.
4 Remark and Conclusion System (3) includes many models as special cases. For example, when the smooth operators Dik = 0, (i = 1, 2, ..., n; k = 1, 2, ..., m), ( ωij , θ = 0 1 1 ωij (θ, λ) = , r(λ) = r, ai (λ) = , fi (·) = gi (·), τ τ 0, −r ≤ θ < 0 System (3) is model (2). When ⎧ m k ⎪ ⎪ ωij (λ), θ = r0 = 0 ⎪ ⎪ ⎪ k=0 ⎪ ⎪ ⎪ ⎪ m ⎪ ⎪ k ⎪ ωij (λ), r1 ≤ θ < 0 ⎪ ⎪ ⎪ k=1 ⎪ ⎪ ⎨ m ωij (θ, λ) = ω k (λ), r2 ≤ θ < r1 ⎪ k=2 ij ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ...... ⎪ ⎪ ⎪ ⎪ (m) ⎪ ⎪ ωij (λ), rm ≤ θ < rm−1 ⎪ ⎪ ⎪ ⎪ ⎩ 0, −r ≤ θ < rm i, j = 1, . . . , n System (3) becomes a static neural network model with discrete time delays: ∂ ∂ui ∂ui (Dik ) − ai (λ)ui (t, x) + fi = ∂t ∂xk ∂xk m
k=1
n m
k ωij (λ)uj (t
− rk , x) + Ii
.
k=0 j=1
(17)
When ωij (θ, λ) ∈ C 1 [−r, 0], i, j = 1, . . . , n, the value of the synaptic connectivity from neuron j to i is a continuous function on [−r, 0], which means that time delays influence the network continuously, system (3) belongs to the static model with continuous time delays:
78
S. Bao
∂ ∂ui ∂ui (Dik ) − ai (λ)ui (t, x) + fi = ∂t ∂xk ∂xk m
k=1
n ! j=1
0 −r
uj (t`, x)ωij (θ, λ)dθ + Ii
,
.
(18) Where t` = t + θ. As many neural networks are described by static models (see[2]), system(3) is widely representative. The global exponential robust stability of neural networks can be widely applied in solving control and optimization problems, and the M -matrix is easy to verify, so the results in this paper are significant and practical in both theory and application. Acknowledgements. This paper was supported by the National Natural Science Foundations of China under Grant 60673101 and 60674020.
References 1. Xu, Z., Qiao, H., Peng, J., Zhang, B.: A Comparative Study of two Modeling Approaches in Neural Networks. J. Neural Networks 17, 73–85 (2004) 2. Qiao, H., Peng, J., Xu, Z., Zhang, B.: A Reference Model Approach to Stability Analysis of Neural Networks. IEEE Trans. on Systems, Man and Cybernetics 33, 925–936 (2003) 3. Hopfield, J.: Neurons with Graded Response have Collective Computational Properties Like Those of Two-state Neurons. In: National Academy of Sciences of the United States of America, pp. 3088–3092. NASA Press (1984) 4. Kosko, B.: Bidirectional Associative Memories. IEEE Trans. Syst., Man Cybernetics 18, 49–60 (1988) 5. Chua, L., Yang, L.: Cellular Neural Networks: Theory and applications. IEEE Trans. Circuits Syst. 35, 1257–1290 (1988) 6. Pineda, F.J.: Generalization of Back-propagation to Recurrent Neural Networks. Phys. Rev. Lett. 59, 2229–2232 (1987) 7. Li, J., Michel, A.N., Porod, W.: Analysis and Synthesis of A Class of Neural Networks Linear Systems Operating on a Closed Hypercube. IEEE Trans. Circuits Syst. 36, 1406–1422 (1989) 8. Varga, I., Elek, G., Zak, H.: On the Brain-state-in-a-convex-domain Neural Models. Neural Networks 9, 1173–1184 (1996) 9. Bouzerdoum, A., Pattison, T.R.: Neural network for quadratic optimization with bound constraints. IEEE Trans. Neural Networks 4, 293–303 (1993) 10. Forti, M., Tesi, A.: New Conditions for Global Stability of Neural Networks with Applications to Linear and Quadratic Programming Problems. IEEE Trans. Circuits Syst. I 42, 354–366 (1995) 11. Friesz, T.L., Bernstein, D.H., Mehta, N.J., et al.: Day-to-day Dynamic Network Disequilibrium and Idealized Traveler Information Systems. Operat. Res. 42, 1120–1136 (1994) 12. Song, Q., Cao, J., Zhao, Z.: Periodic and Its Exponential Stability of Reactiondiffusion Recurrent Neural Networks with Continuously Distributed Delays. Nonlinear Analysis Series B 7, 65–80 (2006)
Global Exponential Robust Stability
79
13. Linshan, W., Zhe, Z., Yangfan, W.: Stochastic Exponential Stability of the Delayed Reaction-diffusion Recurrent Neural Networks with Markovian Jumping Parameters. Physics Letters A 372, 3201–3209 (2008) 14. Linshan, W., Yuying, G.: Global Exponential Robust Stability of Reactiondiffusion Interval Neural Networks with Time-varying Delays. Physics Letters A 350, 342–348 (2006) 15. Yang, X.F., Liao, X.F., Tang, Y.Y.: Guaranteed Attractivity of Equilibrium Points in a Class of Delayed Neural Networks. Int. J. Bifurcat. Chaos 16, 2737– 2743 (2006) 16. Mohamad, S.: Global Exponential Stability in DCNNs with Distributed Delays and Unbounded Activations. J. Comput. Appl. Math. 205, 161–173 (2007) 17. Mohamad, S., Gopalsamy, K., Haydar, A.: Exponential Stability of Artifical Neural Networks with Distributed Delays and Large Impulses. Nonlinear Anal. Real World Appl. 9, 872–888 (2008) 18. Huang, Z.K., Xia, Y.H., Wang, X.H.: The Existence and Exponential Attractivity of j-almost Periodic Sequence Solution of Discrete Time Neural Networks. Nonlinear Dyn. 50, 13–26 (2007) 19. Wang, M., Wang, L.: Global Asymptotic Robust Stability of Static Neural Network Models with S-type Distributed Delays. Mathematical and Computer modelling 44, 218–222 (2006) 20. Guo, D.J., Sun, J.X., Liu, Z.L.: Functional Methods of Nonlinear Ordinary Differential Equations. Shandong Science Press, Jinan (1995)
A LEC-and-AHP Based Hazard Assessment Method in Hydroelectric Project Construction Jian-lan Zhou, Da-wei Tang, Xian-rong Liu, and Sheng-yu Gong
*
Abstract. Aiming at the defects of simply using LEC method to evaluate the risk factors in working system, this paper thoroughly analyses the main factors which can influence the hazard assessment; adapts LEC and AHP assessment methods with the combination of the former accident cases and experts’ experiences; associates multiple factors that can lead to accidents, assigns reasonable weights, thus determines the importance order of the factors. The statistical data verifies that the above method implements the quantitative assessment objectively, which can guarantee the consistency of the evaluation result and statistical data. Keywords: Hydroelectric project construction, Hazard assessment, LEC, AHP, Statistical data on accidents.
1 Introduction As we know, the Three-Gorges Project’ commencement ceremony was held in 1993, which is 15 years ago. Recently, the other two hydroelectric projects on the Yangtse River, namely Xiluodu hydroelectric project and Xiangjiaba hydroelectric project, are also on their way in construction. All these projects share some common features: the scale of the project is huge; it will take a rather long time to fully finish the project; the construction process of the project is extremely complicated and the project involves very high risks. Thus, it is very urgent to carry out the identification, evaluation as well as the monitoring of the risk sources, to strengthen the control of the potential accidents, to prevent and respond to the severe accidents during the process of the construction of hydroelectric project. However, there is little systematic research on the above issues, and there are no Jian-lan Zhou . Xian-rong Liu Department of Work Safety, China Three Gorges Project Corporation, Yichang 443002, China
*
Jian-lan Zhou . Da-wei Tang Institute of Systems and Engineering, Huazhong University of Science & Technology, Wuhan 430074, China Sheng-yu Gong Sinohydro Engineering Bureau No. 3, Ankang 725011, China {ZhouJian-lan,TangDa-wei,LiuXian-rong,GongSheng-yu, ZHOUJL1999}@163.com H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 81–90. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
82
J.-l. Zhou et al.
existing regulations or laws, which can cover the relevant issues of risk sources in construction of hydroelectric project, either. Therefore, it is very necessary and emergent to carry out some research on them [1].
2 Identification and Evaluation of the Risk Sources The risk sources in the hydroelectric project construction refer to those working positions, working process and working tools which may lead to human injury or death, economic loss, working environment breach or the combination of the above consequences. To be more specifically, the risk sources include earth excavation, rock excavation, shaft excavation, slope excavation, mechanical equipments, electrical equipments, dynamotor installation and so on. Accordingly, the identification of the risks sources include the analysis of working environment, working layout, working process, working tools, transportation routes, equipments with high risks, workers’ psychology and so on. Through the identification, we can get various factors which may cause accidents in the construction process, including physical factors (like equipment defects, protection defects, electricity hazards etc.), chemical factors (like flammable materials, explosive materials, poison materials etc.), biological factors (like pathogenic microorganism etc.), psychological factors (like overloaded, abnormality in health etc.), behavioral factors (like miss-leading, miss-operating etc.) and so on. The evaluation of the risk sources is a dynamic process which involves many interactive aspects and the criteria of the evaluation should be scientific, rational and comprehensive. Therefore, based on the features of the risk sources in hydroelectric project construction, this paper will carry out the evaluation in the following steps: first, LEC will be used to identify and roughly evaluate the factors which may cause the accidents; then, incorporated with the accidents cases and expert experience, AHP will be applied to perform the correlation analysis on those factors, and through the specification of the relevant weights, the importance of each factor concerning the potential accidents will be determined. The paper is based on the example of scaffold operation. The corresponding risk factors which may lead to accidents as well as the evaluation framework are given as follows: A
B1
C1
C2
C3
C4
B2
C5
C6
C7 C8
B3
C9
C1 0
B4
C1 1
C1 2
C1 3
B5
C1 4
C1 5
C1 6
Fig. 1 The risk evaluation framework of working on the scaffold
B6
C1 7
C1 C1 8 9
C2 0
C2 1
C2 2
A LEC-and-AHP Based Hazard Assessment Method
83
In the above figure: A stands for ‘risk evaluation of working on scaffold’; B1 stands for ‘falling from high places’; B2 stands for ‘collapse’; B3 stands for ‘objects hitting’; B4 stands for ‘fire’; B5 stands for ‘electric shock’; B6 stands for ‘others’ C1 stands for ‘haven’t made proper construction plan or developed safety related measures according to the features of the work before setting up and taking down the scaffold’; C2 stands for ‘the men who set up and take down the scaffold have not been properly trained’; C3 stands for ‘haven’t informed or haven’t completely informed the workers about the potential risks before they are working’; C4 stands for ‘working in abominable weather without any protection measures’; C5 stands for ‘setting up and taking down the scaffold without obeying design regulations’; C6 stands for ‘setting up the scaffold with unqualified materials’; C7 stands for ‘improper guidance from the commander or workers disobey the commander’; C8 stands for ‘working without obeying the rated load, overload, or unbalance load’; C9 stands for ‘improper or inadequate protection measures on working place; C10 stands for ‘workers don’t carry tools for protecting themselves’; C11 stands for ‘inadequate or improper cooperation when delivering objects among workers’; C12 stands for ‘intercrossing work by different groups simultaneously without proper protection measures’; C13 stands for ‘construction residues, tools and materials are not cleared in time’; C14 stands for ‘unauthorized people entering into working places because of no protection fence around the working places’; C15 stands for ‘rock falling along the slope’; C16 stands for ‘the bases of scaffold are not solid enough’; C17 stands for ‘fire caused by improper disposition of welding residues’; C18 stands for ‘flammable materials ignited by naked fire’; C19 stands for ‘improper layout of the temporary electric wires’; C20 stands for ‘electric leakage due to damp and naked wires’; C21 stands for ‘slipping, being hit or falling down due to carelessness’; C22 stands for ‘insufficient light in working place, slippery on the surface of working place’.
3 Working Conditions Hazard Analysis (LEC Method) LEC method uses the product of 3 factors to determine the extent of hazard of a certain object. These 3 factors are: the probability of risks (L); the frequency of exposure to the risks (E); and the severity of consequences (C). The value of each
84
J.-l. Zhou et al.
factor and its corresponding meanings are shown in table 1- table 3 [2]. Practically, the values of those factors are determined through the discussion of the experts or through the Delphi Method. Based on those values, the final value (D), which is used to specify the hazardous level, can be got from the product of L, E and C, i.e., D = LEC. The value of D and its corresponding meanings are described in table 4. Table 1. Probability of risks (L) Value
Description
10 6 3 1 0.5 0.2 0.1
Certain Quite likely Likely Unlikely Quite unlikely Nearly impossible Impossible
Table 2. Frequency of exposure to the risks (E) Value 10 6 3 2 1 0.5
Description
Constant exposure Daily exposure Weekly exposure Monthly exposure Occasional exposure Rare exposure
Table 3. Severity of consequences (C) Value 100 40 15 7 3 1
Description Extremely severe, more than 10 deaths or more than 50 seriously injured Very severe, 3-9 deaths or 10-49 seriously injured Quite severe, 1-2 deaths or 6-9 seriously injured Severe, 2-5 seriously injured Severe, 1 seriously injured Not so severe, slightly injured
Table 4. The extent of hazard and the corresponding level Value
Description
Level
>=320
Extremely hazardous, must stop working Very hazardous, countermeasures are needed immediately Quite hazardous, countermeasures are needed hazardous, attentions should be paid to Not so hazardous, acceptable
Level 1
[160, 320) [70, 160) [20, 70) <20
Level 2 Level 3 Level 4 Level 5
A LEC-and-AHP Based Hazard Assessment Method
85
Table 5. The risk evaluation results of a certain operation process with scaffold Accident type Factor
B1
B2
B3
B4
B5
B6
L
E
L*E
C
D
Hazardous level
C1
10
6
60
7
420
1
C2
10
6
60
7
420
1
C3
10
6
60
7
420
1
C4
6
1
6
7
42
4
C5
10
6
60
40
2400
1
C6
6
6
36
40
1440
1
C7
3
2
6
15
90
3
C8
10
6
60
7
420
1
C9
6
6
36
7
252
2
C10
10
2
20
7
140
3
C1
10
6
60
7
420
1
C2
6
6
36
7
252
2
C3
10
6
60
7
420
1
C5
10
6
60
7
420
1
C6
10
6
60
40
2400
1
C7
6
6
36
7
252
2
C8
6
6
36
7
252
2
C16
10
6
60
40
2400
1
C10
6
6
36
7
252
2
C11
3
6
18
15
270
2
C12
10
6
60
7
420
1
C13
10
6
60
3
180
2
C14
10
6
60
7
420
1
C15
10
6
60
7
420
1
C17
10
6
60
7
420
1
C18
6
6
36
7
252
2
C4
3
1
3
15
45
4
C19
6
6
36
15
540
1
C20
6
6
36
7
252
2
C21
6
6
36
1
36
4
C22
10
6
60
3
180
2
In our example of a certain operation process with scaffold, after consulting the experts, the value of L, E and C can be summarized in table 5. Note that, those values can only reflect the intrinsic features of the corresponding risks, and influences of countermeasures against them are not incorporated.
86
J.-l. Zhou et al.
From the above table, we can see that the same factor can lead to different types of accidents, different values of L, E, C and D. This is reasonable. For example, factor C5 may lead to accident B1 and B2, although the hazardous level are both 1, their values of D have a significant different.
4 AHP The basic idea of AHP is to decompose the target objectives into different levels and perform the qualitative and quantitative analysis based on the hierarchical structure. Based on the analysis of the essence and factors of a complex decision problem, as well as the relationship among those factors, AHP can mathematically model the decision process without much quantitative information. In this way, it can conveniently analyze the complex and ill-structured decision problems with multi-objective, and multi-criteria [3]. The process of AHP can be roughly divided into 4 steps: determine the weights, specify the hierarchy structures, construct pair-wise comparison matrix and rank. In order to perform the ranking, the maximum Eigen-value of the comparison matrix and its corresponding Eigen-vector should be specified. Concerning the consistency test, AHP introduces the factor of CR, which equals to CI/RI. When CR is less than 0.1, the consistency is acceptable. Note that CI = (λmax − n) /(n − 1) ( λmax is the maximum Eigen-value of the comparison matrix, n is the dimension of the matrix), and RI is a series of constant depend on n. The value of RI is shown in table 6. Table 6. The value of RI n
1
2
3
4
5
6
7
8
9
10
RI
0
0
0.58
0.90
1.12
1.24
1.32
1.41
1.46
1.49
5 Hazard Assessment Based on AHP and LEC The product of L and E in table 5 can represent the possibility of the accident during a certain period, in addition, the product of L and E can also be seen as the weight of the consequence C. In table 5, the possible values of L × E include 60, 36, 20, 18, 6 and 3. Further, these values can be categorized into 5 levels: Level A for 60, Level B for 36, Level C for 2o and 18, Level D for 6 and Level E for 3. Based on these levels, the meaning of the scale in the comparison matrix can be determined as follows: According to the scales and their corresponding meanings in table 7, and incorporating the values of L × E in table 6, the comparison matrixes can be got. Table 8 gives the factor comparison matrix of B1: b1 =10.2311 , CRb1 =0.0172 . Since CR is much From the above matrix, we can get λmax less than 0.1, the consistency is quite satisfactory. In addition, the corresponding Eigen-vector which can be seen as the weight vector is:.
A LEC-and-AHP Based Hazard Assessment Method
87
Table 7. The scales in comparison matrix and their meanings scale
Corresponding relation between the factors
1
A / A, B / B, C / C , D / D, E / E
Corresponding meaning of the scale The
two
factors
are
equally
important 3
One factor is slightly more important
A / B , B / C , C / D, D / E
than the other 5
One factor is more important than
A / C , B / D, C / E
the other 7
One factor is much more important
A / D, B / E
than the other 9
One factor is extremely more im-
A/ E
portant than the other The comparison between factor i and factor j is expressed by bij, The
Reciprocal
comparison between factor j and factor i is expressed by bji= 1/bij
Table 8. Factor comparison matrix of B1
b1 c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
c1
c2
c3
c4
c5
c6
c7
c8
c9
c10
1
1
1
7
1
3
7
1
3
5
1
1
1
7
1
3
7
1
3
5
1
1
1
7
1
3
7
1
3
5
1/7
1/7
1/7
1
1/7
1/5
1
1/7
1/5
1/3
1
1
1
7
1
3
7
1
3
5
1/3
1/3
1/3
5
1/3
1
5
1/3
1
3
1/7
1/7
1/7
1
1/7
1/5
1
1/7
1/5
1/3
1
1
1
7
1
3
7
1
3
5
1/3
1/3
1/3
5
1/3
1
5
1/3
1
3
1/5
1/5
1/5
3
1/5
1/3
3
1/5
1/3
1
ωb1 =(0.158, 0.158, 0.158, 0.019, 0.158, 0.068, 0.019, 0.158, 0.068, 0.035) Similarly, λmax RC ω of the other comparison matrix concerning B2, B3, B4,
、 、
B5 and B6 can also be specified as follows: b2 λmax = 8, CR b2 = 0, ω b 2 =(0.167, 0.056, 0.167, 0.167, 0.167, 0.056, 0.056, 0.167) b3 λmax = 6.0393, CR b3 = 0.0063, ω b 3 =(0.082, 0.041, 0.219, 0.219, 0.219, 0.219)
88
J.-l. Zhou et al.
Table 9. Evaluation value of each factor and the corresponding ranking factor
D’
ranking
D
Original ranking
B1C5
1.896
1
2400
1
B2C6
1.196
2
2400
1
B2C16
1.196
3
2400
1
B5C19
0.918
4
540
5
B1C6
0.816
5
1440
4
B3C12
0.518
6
420
6
B3C14
0.518
6
420
6
B3C15
0.518
6
420
6
B5C20
0.428
9
252
18
B1C1
0.332
10
420
6
B1C2
0.332
10
420
6
B1C3
0.332
10
420
6
B1C8
0.332
10
420
6
B3C13
0.222
14
180
25
B2C1
0.209
15
420
6
B2C3
0.209
15
420
6
B2C5
0.209
15
420
6
B3C11
0.208
18
270
17
B3C10
0.194
19
252
18
B1C9
0.143
20
252
18
B4C17
0.137
21
420
6
B5C4
0.132
22
45
29
B1C7
0.086
23
90
28
B1C10
0.074
24
140
27
B2C2
0.070
25
252
18
B2C7
0.070
25
252
18
B2C8
0.070
25
252
18
B6C22
0.059
28
180
25
B4C18
0.046
29
252
18
B1C4
0.040
30
42
30
B6C21
0.007
31
36
31
A LEC-and-AHP Based Hazard Assessment Method
89
b4 λmax = 2, CR b4 = 0, ω b 4 =(0.75,0.25) b5 λmax = 3, CR b5 = 0, ω b 5 =(0.067, 0.467, 0.467) b6 λmax = 2, CR b6 = 0, ω b 6 =(0.25, 0.75)
Since all the value of CR above is much less than 0.1, the consistencies are all very satisfactory. According to the actual severe accidents data from the 3 hydroelectric projects mentioned in the introduction, B1 happens 17 times, B2 happens 8 times, B3 happens 21 times, B4 happens 0 times, B5 happens 4 times, B6 happens 0 times. Based on the experts’ opinion, the values of λmax RC ω of comparison matrix concerning A in figure 1 are:
、 、
a λmax = 6.4031, CR a = 0.065, ω a =(0.3, 0.179, 0.338, 0.026, 0.131, 0.026) .
Using Dbj ' = ω aj ω bj cbj ( i = 1, , k j ; j = 1, 2, , 6 ) , we can get the evaluation value of D of each factor and the corresponding ranking. The results are shown in the table 9: i
i
i
6 Result Analysis From table 9, we can see that the importance of the first 3 factors remain unchanged. These factors are C5 which can cause B1, C6 which can cause B2 and C16 which can cause B2. The reason for why B1C5 ranks first is that it is the most primary factor which can cause B1, another primary factor is B1C6, which ranks 5th. The rankings of other factors which can cause B1 have a slight decrease, for example, the ranking of B1C1 has decreased to 10 from 6. As for B2, C6 and C16 are two primary causes, which rank 2nd and 3rd respectively. B5 happens a lot in actual cases, and the main reason for B5 is C19, ranking 4th, another cause of B5 ---- C20 ranks 9th. There are negligible changes in the rankings of the factors which can cause B3. Among the factors which can cause B4 and B6, B4C17 ranks highest, but it only ranks 21st, the reason for this phenomenon is that B4 and B6 never happen in reality. Besides the intrinsic features of the risk sources, the occurrences of accidents also depend on the establishment and fulfillment of the security management regulations. Based on the analysis above, and considering the accident investigation results, the following facts are contributes to the frequent occurrences of the accidents in operation with scaffold: The security management regulation concerning the scaffold is insufficient; Security criteria are not strictly obeyed; Supervising companies misunderstand the relationship between the production and security.
7 Conclusion Based on the example of operation with scaffold during the construction of hydroelectric projects, we proposed a LEC-and-AHP based hazard assessment method by constructing a hierarchy model of risk factors and the comparison matrixes. With the help of actual cases and experts’ experience, the method
90
J.-l. Zhou et al.
determines the importance of risk factors which can lead to different kinds of accidents through the analysis on those factors and the assignment of different weights accordingly. The future research may include the establishment of a set of security assessment criteria and a unified security checking list of the construction of hydroelectric projects.
References 1. Lv, A., Wang, P., Feng, L.: Identification, Appraisal and Control of Hazard Installations in Construction Engineering. Journal of Huazhong University of Science and Technology 1, 43–45 (2006) 2. Liu, Zhang, X., Liu, G.: The Application Guide of Safety Assessment method. Chemical Industry Press, Safety Science and Engineering Publishing Center (2005) 3. Wang, L., Xu, S.: AHP Introduction. China Renmin University Press (1990)
A Stochastic Lotka-Volterra Model with Variable Delay Yong Xu, Song Zhu, and Shigeng Hu
Abstract. Since population models are often subject to environmental noise, in this paper we stochastically perturb the Lotka-Volterra model with variable delay x(t) ˙ = diag(x(t)) b + Ax(t) + Bx(t − δ(t)) into the Itˆ o form dx(t) = diag(x(t)) b + Ax(t) + Bx(t − δ(t)) dt + σdw(t) . We show that under certain conditions, the deterministic delay equation and the perturbed delay equation have similar behaviour in the sense that both have positive solutions which will not explode to infinity in a finite time and, in fact, will be ultimately bounded. Keywords: Stochastic Lotka-Volterra model, Variable delay, Ultimate boundedness, Itˆ o s formula. AMR Subject Classifications: 34K40, 60H10, 60J65.
1 Introduction The variable delay Lotka-Volterra model for n interacting species is described by the n-dimensional variable delay differential equation n n , x˙ i (t) = xi (t) bi + aij xj (t) + bij xj (t − δj (t)) , j=1
i = 1, · · · , n ,
(1)
j=1
where xi (t) represents the population size of ith species, bi is the inherent net birth rate of ith species, aij and bij (i, j = 1, · · · , n) represent their Yong Xu · Shigeng Hu School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan 430074, China Song Zhu Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
[email protected] H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 91–100. c Springer-Verlag Berlin Heidelberg 2009 springerlink.com
92
Y. Xu, S. Zhu, and S. Hu
interactions rates and all these parameters are constants, δi (t) (i = 1, · · · , n) is the delay function. There is an extensive literature concerned with the dynamics of this variable delay model and here we only mention Gopalsamy and He [1], Huo and Li [2], Freedman and Wu [3], Liu and Chen [4], Li and Kuang [5], Teng el.ct. [6–8]. In [1], Gopalsamy and He discuss the following variable delay Lotka-Volterra model for two interacting species ( N˙ 1 (t) = N1 (t) b1 (t) − a11 (t)N1 (t − τ (t)) − a12 (t)N2 (t − τ (t)) (2) N˙ 2 (t) = N2 (t) b2 (t) − a21 (t)N1 (t − τ (t)) − a22 (t)N2 (t − τ (t)) , where all bi (t) and aij (t) are positive periodic functions. They show that, in order to guarantee the existence and global asymptotic stability of the positive periodic solution of the system (2), the coefficient functions bi (t) and ai (t) need to satisfy a set of algebraic conditions. On the other hand, population systems are often subject to environmental noise. It is therefore useful to reveal how the noise affects on the population systems. Recently, stochastic Lotka-Volterra system receives an increasing attention. For example, Mao and his coauthors [9–11] show the presence of environmental noise may suppress explosion in population dynamics. Bahar et al. [12] show that under the sufficient large noise, the population will become extinct with probability one. Rudnicki et al. [13, 14] mainly examine the stationary distribution and extinction. However, little is known about the variable delay Lotka-Volterra system (1) with perturbation. This paper will fill the blank. To consider the environment noise, we may replace the rate bi by bi + σi w, ˙ where w˙ is a white noise (i.e., w(t) is a Brownian motion). Hence, Eq. (1) becomes the following Itˆo stochastic differential equation with variable delay n n ,# $ dxi (t) =xi (t) bi + aij xj (t) + bij xj (t − δj (t)) dt + σi dw(t) j=1
(3)
j=1
on t ≥ 0, where i = 1, · · · , n. In this paper we will show that under certain conditions, the deterministic systems (1) and the stochastic systems (3) behave similarly in the sense that both have positive solutions which will not explode to infinity in a finite time and, in fact, will be ultimately bounded. In other words, we show that under certain condition the noise will not spoil these nice properties. Moreover, under certain conditions, we find that for any p > 0, the pth moment of the solutions for the stochastic systems (3) is ultimately bounded. This paper is organized as follow: In the next Section, some notations and a useful lemma are given. In Section 3, we show Eq. (3) has a global positive solution. From the biological point of view, the asymptotic ultimate bound property is obtain in Section 4.
A Stochastic Lotka-Volterra Model with Variable Delay
93
2 Preliminaries Throughout this paper unless otherwise specified, we use the following notations. Let | · | denote the Euclidean norm in Rn . If A is a vector or matrix, its transpose is denoted by AT . If A is matrix, its trace norm is denoted by 2 |A| = trace(AT A). Let R+ = (0, ∞). For any x ∈ R, denote x+ = x∨0. For any x = (x1 , · · · , xn )T ∈ Rn , denote x∞ = max1≤i≤n xi . For any matrix A = (aij )n×n ∈ Rn×n , we use the following notations, Ai =
n
a+ ij ,
A¯i =
n
A¯ i =
a+ ij ,
j=1
j =i
n
a+ ji ,
i = 1, · · · , n.
j=1
Let (Ω, F , P) be a complete probability space with a filtration {Ft }t≥0 satisfying the usual conditions (i.e. it is right continuous and F0 contains all P -null sets). Assume that w(t) is a scalar Brownian motion defined on the complete probability space. If x(t) is an Rn -valued stochastic process on t ∈ [−τ, ∞), we let xt = {x(t + θ) : θ ∈ [−τ, 0]} for t ≥ 0. In this paper we consider the following Itˆo stochastic differential equation with variable delay n n ,# $ dxi (t) =xi (t) bi + aij xj (t) + bij xj (t − δj (t)) dt + σi dw(t) j=1
(4)
j=1
on t ≥ 0, where i = 1, · · · , n. This takes the matrix form , dx(t) = diag(x(t)) b + Ax(t) + By(t) dt + σdw(t) ,
(5)
where x(t) = (x1 (t), · · · , xn (t))T , y(t) = (y1 (t), · · · , yn (t))T = (x1 (t − δ1 (t)), · · · , xn (t − δn (t)))T , diag(x(t)) represents the n × n matrix with all elements zero except those on the diagonal which are x1 (t), · · · , xn (t), b = (b1 , · · · , bn )T , σ = (σ1 , · · · , σn )T , A = [aij ]n×n , B = [bij ]n×n , w(t) is a scalar Brownian motion, the delay function δi (t) ∈ C 1 ([0, ∞); [0, τ ]) (i = 1, · · · , n) and satisfies the following condition η := max inf (1 − δi (t)) > 0. 1≤i≤n t≥0
The initial data {x(t) : −τ ≤ t ≤ 0} = {ξ(t) : −τ ≤ t ≤ 0} ∈ C([−τ, 0]; Rn+ ), which is the family of continuous Rn+ -valued function ϕ with norm ϕ = sup−τ ≤t≤0 |ϕ(t)| < ∞. Before we state our main result, let us introduce a useful lemma. Lemma 1. Let p > 0. Assume that there a matrix A = (aij )n×n ∈ Rn×n such that aii < −Ai ,
i = 1, · · · , n.
(6)
94
Y. Xu, S. Zhu, and S. Hu
Then, for any x ∈ Rn+ , Ip (x) :=
n n
aij xpi xj ≤ −Kp xp+1 ∞ ,
(7)
i=1 j=1
where −Kp = max
1≤k≤n
, akk + Ak +
pp (p + 1)p+1
n
. |aii |−p Ap+1 i
i=1,i =k
Proof. Fixing x ∈ Rn+ , we may compute that Ip (x) ≤
n #
aii xp+1 + i
i=1
n
n $ p := a+ x x ϕi (xi ), ∞ ij i
(8)
i=1
j =i
where ϕi (t) = aii tp+1 + Ai tp x∞ , i = 1, · · · , n. By condition (6), we have ϕi (x∞ ) = (aii + Ai )xp+1 ∞ < 0,
i = 1, · · · , n.
Therefore, it is easy to get that max 0≤t≤ x ∞
ϕi (t) =
pp |aii |−p Ap+1 xp+1 ∞ , i (p + 1)p+1
i = 1, · · · , n.
On the other hand, by the definite of x∞ , there must exist some k such that xk = x∞ . Hence, by (8), we have that Ip (x) ≤ ϕk (x∞ ) +
n
ϕi (xi )
i=1,i =k
≤ max
1≤k≤n
= max
1≤k≤n
n , ϕk (x∞ ) + i=1,i =k
, akk + Ak +
max
0≤t≤ x ∞
pp (p + 1)p+1
ϕi (t)
n
xp+1 |aii |−p Ap+1 ∞ i
i=1,i =k
as required.
3 Positive Global Solutions It is well known that, in order for a stochastic differential equation to have a unique global solution for any given initial data, the coefficients of the equation are generally required to satisfy the linear growth condition and the local Lipschitz condition (cf. Mao [15]). However, the coefficients of (5)
A Stochastic Lotka-Volterra Model with Variable Delay
95
do not satisfy the linear growth condition, though they are locally Lipschitz continuous, so the solution of (5) may explode at a finite time (cf. Mao [16], Wu [17]). In this section, we shall show that the solutions of Eq. (3) is not only positive but will also not explode to infinite at any finite time. Theorem 2. Assume that parameter matrix A, B of Eq. (5) satisfy that ¯i ). max (aii + Ai ) ≤ − max (nB
1≤i≤n
(9)
1≤i≤n
Then, for any given initial data x0 = ξ ∈ C([−τ, 0]; Rn+ ), there is a unique solution x(t) to Eq. (5) on t ≥ −τ . Moreover, this solution remains in Rn+ with probability 1, namely x(t) ∈ Rn+ for all t ≥ −τ almost surely. Proof. Since the coefficients of Eq. (5) are locally Lipschitz continuous, for any given positive initial data ξ ∈ C([−τ, 0]; Rn+ ), there is a unique maximal local solution x(t) on t ∈ [−τ, τe ), where τe is the explosion time (cf. [16] Theorem 3.2.2, p. 95). To show this solution is global, we only need to prove that τe = ∞ a.s. Let k0 > 0 be sufficiently large in the sense k0−1 <
min |ξ(θ)| ≤ max |ξ(θ)| < k0 .
−τ ≤θ≤0
−τ ≤θ≤0
For each integer k ≥ k0 , define the stopping time τk = inf{t ∈ [−τ, τe ) : xi (t) ∈ / (k −1 , k) for some i = 1, 2, · · · , n} with usual setting inf ∅ = ∞,where ∅ denotes the empty set. Clearly, τk is increasing as k → ∞. Set τ∞ = limk→∞ τk , whence τ∞ ≤ τe a.s. If we can prove τ∞ = ∞ a.s., then τe = ∞ a.s., which implies the desired result. To prove this statement, for any p > 0, let us define a C 2 -function V : Rn+ → R+ by V (x) =
n
u(xi ),
(10)
i=1
where u(xi ) = xpi − 1 − p log(xi ). Clearly, u(·) ≥ 0 and u(0+ ) = u(∞) = ∞. Let T > 0 be arbitrary. For 0 ≤ t ≤ τk ∧ T , applying the Itˆo formula to V (x(t)) and taking expectation yield !
t
EV (x(t)) = EV (x(t)) + E
LV (x(t), y(t))dt, 0
where LV : Rn+ × Rn+ → R is defined by LV (x, y) =p
n i=1
n n n , n n , xpi bi + aij xj + bij yj − p aij xj + bij yj bi + j=1
j=1
n n p(p − 1) 2 p p 2 σi x i + σ . + 2 2 i=1 i i=1
i=1
j=1
j=1
(11)
96
Y. Xu, S. Zhu, and S. Hu
By the condition (9) and the Lemma 1, we have that n n
aij xpi xj ≤ −Kp xp+1 ∞ ,
i=1 j=1
where −Kp = max
1≤k≤n
, akk + Ak +
pp (p + 1)p+1
n
. |aii |−p Ap+1 i
i=1,i =k
Noting that Ak < |akk | for k = 1, · · · , n, we have K∞ := lim Kp = − max (akk + Ak ) > 0. p→∞
1≤k≤n
So we may choose sufficiently large p such that Kp > 0. Clearly, nxp+1 ∞ . We therefore have n n
aij xpi xj
≤ −n
−1
Kp
i=1 j=1
n
n i=1
xp+1 ≤ i
xp+1 . i
(12)
i=1
By the Young inequality, we may compute that n n
bij xpi yj ≤
i=1 j=1
n n
b+ ij
i=1 j=1
, p 1 p+1 xp+1 y + p+1 i p+1 j
1 , ¯ ¯ xp+1 pBi + η −1 B = i i p + 1 i=1 n
1 + , p+1 . bij yj − η −1 xp+1 j p + 1 i=1 j=1 n
n
+
Substituting (12) and (13) into (11) gives LV (x, y) ≤
n i=1
+p
p + , p+1 bij yj − η −1 xp+1 j p + 1 i=1 j=1 n
Fi (x) + n
n
, |bji | yi − η −1 xi ,
j=1
where , ¯ - p+1 ¯i + η −1 B pB i xi + p bi + (p − 1)σi2 /2 xpi Fi (x) := − p n−1 Kp − p+1 + p(η −1 |bji | − aji )xi + p(σi2 /2 − bi ).
(13)
A Stochastic Lotka-Volterra Model with Variable Delay
97
By condition (9), we have , ¯i ¯i + η −1 B pB ¯i > 0, lim n−1 Kp − = n−1 max (akk + Ak ) − B p→∞ 1≤k≤n p+1
i = 1, · · · , n.
Therefore we may choose sufficiently large p such that ¯
¯i + η −1 B pB i > 0, p+1
n−1 Kp −
i = 1, · · · , n,
which implies Fi (x) is bounded above by a positive constant H for any x ∈ Rn+ . On the other hand, for any θ > 0, we may compute that !
!
t
0
≤η
−1
≤ η −1
!
t
xθi (s − δi (s))ds =
yiθ (s)ds = 0
!
t
−τ ! t
xθi (r)dr
=η
−1
t−δi (t)
δi (0)
!
t
xθi (r)dr
+η
−1
!
xθi (r) dr (1 − δi (s)) 0
−τ
0
ξiθ (r)dr
xθi (r)dr + η −1 τ ξθ .
(14)
0
Therefore, we have p + bij E p+1 n
EV (x(t)) ≤EV (ξ(0)) + nHt + +p
n n i=1 j=1
! |bij |E
n
i=1 j=1
0 −τ
≤EV (ξ(0)) + nHt + η
!
0
−τ
η −1 xp+1 (s)ds j
η −1 xj (s)ds
−1
n , n b+ ij ξp+1 + p|bij |ξ τ p+1 i=1 j=1
= : Ht . Let t = τk ∧ T . We obtain that EV (x(τk ∧ T )) ≤ HT . By the definition of τk , xi (τk ) = k or 1/k for some i = 1, 2, . . . , n, P(τk ≤ T )[u(k −1 ) ∧ u(k)] ≤ P(τk ≤ T )V (x(τk ∧ T )) ≤ EV (x(τk ∧ T )) ≤ HT , which implies that lim sup P(τk ≤ T ) ≤ lim k→∞
k→∞
HT = 0. u(k −1 ) ∧ u(k)
Since T > 0 is arbitrary, we must have P(τ∞ < ∞) = 0 as required.
98
Y. Xu, S. Zhu, and S. Hu
4 Asymptotic Bound Properties

In Section 3, we showed that the solution of Eq. (5) is positive and does not explode in any finite time. This positivity allows us to discuss further asymptotic properties of the solution of Eq. (5). Compared with non-explosion of the solution, stochastic ultimate boundedness is more interesting from the biological point of view. To discuss stochastic ultimate boundedness, we first examine boundedness of the pth moment, which is also of independent interest.

Theorem 3. Let condition (9) hold and p > 0. Then there is a positive constant H_p, which is independent of the initial data x_0 = ξ ∈ C([−τ, 0]; R^n_+), such that the solution x(t) of Eq. (5) has the property that

limsup_{t→∞} E|x(t)|^p ≤ H_p.    (15)
Proof. By Theorem 2, the solution x(t) remains in R^n_+ for all t ≥ −τ. Define a C²-function U : R^n_+ → R_+ by U(x) = Σ_{i=1}^{n} x_i^p. Applying the Itô formula to e^t U(x(t)) and taking expectations yield

e^t EU(x(t)) = EU(ξ(0)) + E ∫_0^t e^s ( LU(x(s), y(s)) + U(x(s)) ) ds,    (16)

where LU : R^n_+ × R^n_+ → R is defined by

LU(x, y) = p Σ_{i=1}^{n} x_i^p ( b_i + Σ_{j=1}^{n} a_ij x_j + Σ_{j=1}^{n} b_ij y_j ) + (p(p−1)/2) Σ_{i=1}^{n} σ_i² x_i^p.
Using the same computations as in (12) and (13) in the proof of Theorem 2, we have that

LU(x, y) ≤ Σ_{i=1}^{n} G_i(x) + (p/(p+1)) Σ_{i=1}^{n} Σ_{j=1}^{n} b_ij^+ ( y_j^{p+1} − e^τ η^{−1} x_j^{p+1} ),    (17)

where

G_i(x) := −p ( n^{−1}K_p − (pB̄_i + η^{−1} e^τ B̄_i)/(p+1) ) x_i^{p+1} + p ( b_i + ((p−1)/2) σ_i² ) x_i^p.

By condition (9), we have

lim_{p→∞} ( n^{−1}K_p − (pB̄_i + η^{−1} e^τ B̄_i)/(p+1) ) = −n^{−1} max_{1≤k≤n} (a_kk + A_k) − B̄_i > 0,   i = 1, ..., n.
Therefore, there exists p* > 0 such that, for any p ≥ p*,

n^{−1}K_p − (pB̄_i + η^{−1} e^τ B̄_i)/(p+1) > 0,   i = 1, ..., n,
which implies that G_i(x) is bounded above by a positive constant H_1 for any x ∈ R^n_+. On the other hand, we may compute that

∫_0^t e^s y_i^{p+1}(s) ds ≤ e^τ ∫_0^t e^{s−δ_i(s)} x_i^{p+1}(s − δ_i(s)) ds ≤ e^τ η^{−1} ∫_{−τ}^{t} e^s x_i^{p+1}(s) ds ≤ e^τ η^{−1} ∫_{0}^{t} e^s x_i^{p+1}(s) ds + e^τ η^{−1} τ ‖ξ‖^{p+1}.
Therefore, we get

e^t EU(x(t)) ≤ EU(ξ(0)) + nH_1 ∫_0^t e^s ds + (p e^τ η^{−1} τ ‖ξ‖^{p+1}/(p+1)) Σ_{i=1}^{n} Σ_{j=1}^{n} b_ij^+.

This immediately implies that limsup_{t→∞} EU(x(t)) ≤ nH_1.
Recall the elementary inequality: for any x_i > 0 and α > 0,

|x|^α ≤ n^{α/2} max_{1≤i≤n} x_i^α ≤ n^{α/2} Σ_{i=1}^{n} x_i^α,    (18)

where x = (x_1, ..., x_n)^T. Therefore, for any p ≥ p*, we have

limsup_{t→∞} E|x(t)|^p ≤ H_p,
by setting H_p = n^{1+p/2} H_1. Recall the Lyapunov inequality:

( E|x(t)|^r )^{1/r} ≤ ( E|x(t)|^s )^{1/s},   0 < r < s < ∞.

Then the assertion (15) follows for any p > 0.

We now give a simple corollary, by which boundedness of the pth moment implies stochastic ultimate boundedness.

Corollary 4. For any p > 0, if the stochastic process x(t) is bounded in pth moment, i.e., limsup_{t→∞} E|x(t)|^p ≤ K_p, where K_p is a constant depending on p, then x(t) is stochastically ultimately bounded; namely, for any ε ∈ (0, 1), there exists a constant M = M(ε) such that

limsup_{t→∞} P{|x(t)| ≤ M} ≥ 1 − ε.    (19)
Proof. For any ε ∈ (0, 1), let M = K_p^{1/p} / ε^{1/p}. Then by the Chebyshev inequality,

P{|x(t)| > M} ≤ E|x(t)|^p / M^p.

Hence,

limsup_{t→∞} P{|x(t)| ≤ M} ≥ 1 − ε,    (20)
as required.
Extreme Reformulated Radial Basis Function Neural Networks Gexin Bi and Fang Dong
Abstract. Gradient descent based learning algorithms are generally very slow due to improper learning steps, may easily converge to local minima, and may require many iterative learning steps to obtain good learning performance. This paper proposes a new learning algorithm for reformulated radial basis function neural networks (R-RBFNs) which randomly chooses the hidden nodes and analytically determines the output weights. Experimental results on several benchmark problems show that the proposed algorithm tends to provide better generalization performance at extremely fast learning speed. Keywords: Reformulated radial basis function neural network, Gradient descent based learning algorithm, Admissible radial basis function, Generator function.
Gexin Bi · Fang Dong
College of Navigation, Dalian Maritime University, 1 Linghai Road, 116026 Dalian, China
[email protected], [email protected]

This work is supported by the Application Fundamental Research Foundation of China Ministry of Communications under grant 200432922505.

1 Introduction

Radial basis function neural networks (RBFNNs) are function approximation models that can be trained to implement a desired input-output mapping [1], [2]. The performance of RBFNNs depends on the number and centers of the radial basis functions, their shapes, and the methods used for learning the input-output mapping. Broomhead and Lowe [3] suggested that the centers of the radial basis functions can either be distributed uniformly within the region of the input space for which there is data, or chosen to be a subset of the training vectors by analogy with strict interpolation. Moody and Darken [4] proposed a hybrid learning process for training radial basis function neural networks with Gaussian radial basis functions, which employs a supervised scheme for updating the output weights, i.e., the weights that
connect the radial basis functions with the output units, and an unsupervised clustering algorithm for determining the centers of the radial basis functions. Karayiannis et al., proposed a systematic approach for constructing reformulated radial basis function neural networks (R-RBFNs), which was developed to facilitate their training by supervised learning algorithms based on gradient descent learning algorithms. This approach reduces the development of reformulated radial basis function models to the selection of admissible generator functions. Experiments involving a variety of reformulated radial basis function models generated by linear and exponential generator functions [5] perform considerably better than conventional RBF models trained by existing algorithms [1], [6]. However, It is clear that the learning speed of gradient descent based learning algorithms is in general far slower than required and it has been a major bottleneck in their applications for past decades. Further the learning parameters (i.e., learning rate, number of learning epochs, stopping criteria, and other predefined parameters) must be properly chosen to ensure convergence [7]. Huang, et al [7], [8], [9] proved that single hidden layer feedforward neural networks (SLFNs) with randomly generated RBF nodes and a widespread of piecewise continuous activation functions can universally approximate any continuous target function on any compact subspace of the Euclidean space Rn [10]. After the hidden layer parameters are chosen randomly, SLFNs can be simply considered as a linear system and the output weights of SLFNs can be analytically determined through simple generalized inverse operation of the hidden layer output matrices [11], [12], [13]. In this paper, we present a extreme reformulated radial basis function neural networks (ER-RBFNNs) as a systematic and axiomatic approach for extremely designing R-RBFNNs. ER-RBFNNs provides an axiomatic approach for R-RBFNNs by using basis functions formed in terms of admissible generator functions. ERRBFNNs need not be tuned during training and may simply be assigned with random values. After the hidden layer parameters are chosen randomly, ER-RBFNNs can be simply considered as a linear system and the output weights of ER-RBFNNs can be analytically determined through simple generalized inverse operation of the hidden layer output matrices.
2 Construction of R-RBFNs

2.1 Mathematical Description of RBF Models

Consider the R^n → R^p mapping implemented by the model

f( Σ_{i=1}^{Ñ} ω_i φ(‖x − v_i‖) ) = y,    (1)
where f (·) is a nondecreasing, continuous and differentiable function, x = [x1 , x2 , . . . , xn ]T ∈ Rn , y = [y1 , y2 , . . . , yp ]T ∈ Rp , vi = [v1 , v2 , . . . , vn ]T ∈ Rn
is the prototype of the radial basis function, ωi = [ωi1 , ωi2 , . . . , ωim ]T ∈ Rm is the weight vector connecting the ith hidden node and the input nodes, and φ(x) are radial basis functions.
2.2 Admissible Radial Basis Functions

The interpretation of an RBF neural network as a composition of receptive fields requires that the responses of all RBFs to all inputs are always positive. If the prototypes are interpreted as the centers of receptive fields, it is required that the response of any RBF becomes stronger as the input approaches its corresponding prototype. In addition, it is required that the response of any RBF becomes more sensitive to an input vector as this input vector approaches its corresponding prototype. Finally, it is reasonable to require that the responses of all radial basis functions to all inputs be bounded [1], [6]. In order for the model (1) to satisfy the desired properties mentioned above, any admissible radial basis function φ(x) = g(‖x‖²) must satisfy the following basic axiomatic requirements:
A1) g(‖x − v‖²) > 0 for all x, v ∈ R^n;
A2) g(‖x − v‖²) > g(‖y − v‖²) for all x, y, v ∈ R^n such that ‖x − v‖² < ‖y − v‖²;
A3) If ∇_x g = ∇_x g(‖x − v‖²) denotes the gradient with respect to x of g(‖x − v‖²) at x, then ‖∇_x g‖²/‖x − v‖² > ‖∇_y g‖²/‖y − v‖² for all x, y, v ∈ R^n such that ‖x − v‖² < ‖y − v‖²;
A4) g(‖x − v‖²) < ∞ for all x, v ∈ R^n.
The four basic axiomatic requirements impose some mathematical restrictions on the search for admissible radial basis functions that can lead to R-RBFNs. The selection of admissible radial basis functions in accordance with the four basic axiomatic requirements can be facilitated by the following theorem.
Theorem 1: The model described by (1) represents a radial basis function neural network in accordance with all four axiomatic requirements if g(x) is infinitely differentiable on (0, ∞) such that: 1) g(x) > 0, ∀x ∈ (0, ∞); 2) g'(x) < 0, ∀x ∈ (0, ∞); 3) g''(x) > 0, ∀x ∈ (0, ∞); 4) lim_{x→0+} g(x) = L, where L is a finite number.
Theorem 1 requires that g(x) is infinitely differentiable on (0, ∞); Theorem 1 is therefore more restrictive than Karayiannis et al.'s theorem. A radial basis function is said to be admissible in the wide sense if it satisfies the three basic axiomatic requirements, that is, the first three conditions of Theorem 1. If a radial basis function satisfies all four axiomatic requirements, that is, all four conditions of Theorem 1, then it is said to be admissible in the strict sense.
2.3 Admissible Generator Functions

The search for admissible radial basis functions can be simplified by considering basis functions of the form φ(x) = g(‖x‖²), with each g(x) defined in terms of a generator function g₀(x) as g(x) = (g₀(x))^{1/(1−m)}, m ≠ 1. The search can be facilitated by the following theorem [1], [6]:
Theorem 2: Consider the model (1) and let each g(x) be defined as g(x) = (g₀(x))^{1/(1−m)}, m ≠ 1, where g₀(x) is a generator function that is continuous on (0, ∞) and has continuous first- and second-order derivatives. If m > 1, then this model represents a radial basis function neural network in accordance with all four axiomatic requirements if:
1) g₀(x) > 0, ∀x ∈ (0, ∞);
2) g₀'(x) > 0, ∀x ∈ (0, ∞);
3) r₀(x) = (m/(m−1)) (g₀'(x))² − g₀(x) g₀''(x) > 0, ∀x ∈ (0, ∞);
4) lim_{x→0+} g₀(x) = L₁, where L₁ ∈ (0, ∞).
If m < 1, then this model represents a radial basis function neural network in accordance with all four axiomatic requirements if:
1) g₀(x) > 0, ∀x ∈ (0, ∞);
2) g₀'(x) < 0, ∀x ∈ (0, ∞);
3) r₀(x) = (m/(m−1)) (g₀'(x))² − g₀(x) g₀''(x) < 0, ∀x ∈ (0, ∞);
4) lim_{x→0+} g₀(x) = L₂, where L₂ ∈ (0, ∞).
Any generator function that satisfies the first three conditions of Theorem 2 leads to admissible radial basis functions in the wide sense [1], [6]. Admissible radial basis functions in the strict sense can be obtained from generator functions that satisfy all four conditions of Theorem 2.
2.4 Constructing Admissible Generator Functions Theorem 2 essentially reduces the construction of admissible radial basis function models to the search for admissible generator functions. A broad variety of admissible generator functions can be determined by a constructive approach based on the admissibility conditions of Theorem 2 [1], [6]. The construction of wide-sense admissible generator functions can be attempted by assuming that g0 (x) = p(g(x)), where p(·) satisfies certain conditions in accordance with Theorem 2. Theorem 2 requires that g0 (x) > 0, ∀x ∈ (0, ∞), for m > 1 and g0 (x) < 0, ∀x ∈ (0, ∞), for m < 1. Since it is also required by Theorem 2 that g0 (x) > 0, ∀x ∈ (0, ∞), the function p(x) must be selected so that p(x) > 0, ∀x ∈ (0, ∞), if m > 1 and p(x) < 0, ∀x ∈ (0, ∞), if m < 1. For such functions, the admissibility conditions of Theorem 2 are satisfied by all solutions g0 (x) > 0, ∀x ∈ (0, ∞), of the differential equation g0 (x) = p(g0 (x)) that satisfy the conditions r0 (x) > 0, ∀x ∈ (0, ∞), for m > 1 and r0 (x) < 0, ∀x ∈ (0, ∞), for m < 1.
Generator functions admissible in the strict sense can be obtained by determining the subset of the resulting wide-sense admissible generator functions that satisfy the fourth condition of Theorem 2. The constructive approach outlined above is employed here to produce increasing generator functions that can be used for m > 1. The same constructive approach can be extended to produce decreasing generator functions that can be used for m < 1. Assume that m > 1 and let the function p(x) be of the form p(x) = kx^n, k > 0. The function g₀(x) can be obtained in this case by solving the differential equation

g₀'(x) = k (g₀(x))^n,   k > 0.    (2)

According to (2),

g₀''(x) = k n (g₀(x))^{n−1} g₀'(x) = k² n (g₀(x))^{2n−1}.

In this case,

r₀(x) = (g₀'(x))² ( m/(m−1) − n ).    (3)
If m > 1, it is required that r₀(x) > 0, ∀x ∈ (0, ∞), which holds for all m/(m − 1) > n. For m > 1, m/(m − 1) > 1 and the inequality m/(m − 1) > n holds for all n < 1. For n = 1, m/(m − 1) − n = 1/(m − 1) > 0. Thus, the condition r₀(x) > 0, ∀x ∈ (0, ∞), is satisfied for all n ≤ 1. For n = 1, p(x) = kx and the solutions of (2) are g₀(x) = θ exp(βx), where θ > 0 and β = k/θ > 0. These generator functions also satisfy the fourth axiomatic requirement since lim_{x→0+} g₀(x) = θ > 0. The exponential generator functions g₀(x) = exp(βx), β > 0, obtained for θ = 1 correspond to g(x) = exp(βx/(1 − m)), which lead to Gaussian radial basis functions φ(x) = g(‖x‖²) = exp(−‖x‖²/σ²), with σ² = (m − 1)/β. For n < 1, the solutions of (2) are of the form g₀(x) = (ax + b)^{1/(1−n)}, where a = k(1 − n) > 0 and b ≥ 0. For n = 0, p(x) = k and (2) leads to linear generator functions g₀(x) = ax + b, a > 0, b ≥ 0. For g₀(x) = ax + b, the fourth axiomatic requirement is satisfied only if lim_{x→0+} g₀(x) = b > 0. Thus, the fourth axiomatic requirement excludes generator functions of the form g₀(x) = ax. Linear generator functions produce radial basis functions of the form φ(x) = g(‖x‖²) = (a‖x‖² + b)^{1/(1−m)}, with m > 1. If a = 1 and b = γ², then the linear generator function g₀(x) = ax + b becomes g₀(x) = x + γ² and g(x) = (x + γ²)^{1/(1−m)}. If m = 3, g(x) = (x + γ²)^{−1/2} corresponds to the inverse multiquadric radial basis function φ(x) = g(‖x‖²) = 1/(‖x‖² + γ²)^{1/2}. Another useful generator function for practical applications can be obtained from g₀(x) = ax + b by selecting b = 1 and a = δ > 0. For g₀(x) = 1 + δx, lim_{x→0+} g(x) = lim_{x→0+} g₀(x) = 1. For this choice of parameters, the corresponding radial basis function φ(x) = g(‖x‖²) is bounded by 1, which is also the bound of the Gaussian radial basis function.
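To make the relationship between generator functions and the resulting radial basis functions concrete, the following short sketch (in Python; the papers' own experiments were run in MATLAB, and the function and parameter names here are ours, chosen for illustration) evaluates φ(x) = g(‖x − v‖²) for the linear generator g₀(x) = x + γ² and the exponential generator g₀(x) = exp(βx) with m > 1, as derived above.

import numpy as np

def phi_linear(x, v, gamma=1.0, m=3):
    # RBF from the linear generator g0(x) = x + gamma^2, g = g0^(1/(1-m)).
    # For m = 3 and gamma > 0 this is the inverse multiquadric 1/sqrt(||x-v||^2 + gamma^2).
    r2 = np.sum((x - v) ** 2)
    return (r2 + gamma ** 2) ** (1.0 / (1 - m))

def phi_exponential(x, v, beta=1.0, m=3):
    # RBF from the exponential generator g0(x) = exp(beta*x), g = g0^(1/(1-m)).
    # This is the Gaussian exp(-||x-v||^2 / sigma^2) with sigma^2 = (m-1)/beta.
    r2 = np.sum((x - v) ** 2)
    return np.exp(beta * r2 / (1 - m))

# Both responses are bounded by 1 and decay as x moves away from the prototype v.
x = np.array([0.3, -0.2])
v = np.zeros(2)
print(phi_linear(x, v), phi_exponential(x, v))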
3 ER-RBFNNs

For N arbitrary distinct samples (x_k, t_k), where x_k = [x_k1, x_k2, ..., x_kn]^T ∈ R^n and t_k = [t_k1, t_k2, ..., t_kp]^T ∈ R^p, R-RBFNNs with Ñ hidden nodes and admissible RBF φ(x) are mathematically modeled as

Σ_{i=1}^{Ñ} ω_i (g₀(‖x_k − v_i‖²))^{1/(1−m)} = y_k,   k = 1, ..., N.    (4)
That R-RBFNNs with Ñ hidden nodes with activation function φ(x) can approximate these N samples with zero error means that Σ_{k=1}^{N} ‖y_k − t_k‖ = 0, i.e., there exist ω_i, v_i such that

Σ_{i=1}^{Ñ} ω_i (g₀(‖x_k − v_i‖²))^{1/(1−m)} = t_k,   k = 1, ..., N.    (5)
(6)
where HN ×N˜ = ⎞ ⎛ 1 1 g0 (x1 − v1 2 ) 1−m · · · g0 (x1 − vN˜ 2 ) 1−m ⎟ ⎜ .. .. ⎠ ⎝ . ··· . 1 1 2 1−m 2 1−m g0 (xN − v1 ) · · · g0 (xN − vN˜ ) ⎛ T⎞ ⎛ T⎞ ω1 t1 ⎜ .. ⎟ ⎜ .. ⎟ and T = ⎝ . ⎠ . ω=⎝ . ⎠ T T ωN˜ N˜ ×m tN N ×m
(7)
(8)
H is called the hidden layer output matrix of the R-RBFNs; the ith hidden node output with respect to inputs x1 , x2 , . . . , xN . Huang et al. proved that for SLFNs with additive or RBF hidden nodes, one may randomly choose and fix the hidden node parameters and then analytically determine the output weights when approximating any continuous target function. Note that the proof of universal approximation is shown to be valid for R-RBFNs with general type of hidden nodes generated by linear and exponential generator functions. In this regard, for N arbitrary distinct samples (xk , tk ), in order to obtain arbi˜ (≤ N ) hidden trarily small non-zero training error, one may randomly generate N nodes generated by linear and exponential generator functions with random parameters. Eq. (4) then becomes a linear system and the output weights ω are estimated as ω ˆ = H † T, (9) where H † is the Moore-Penrose generalized inverse of the hidden layer output matrix H. Calculation of the output weights is done in a single step here. Thus this avoids any lengthy training procedure where the network parameters are adjusted iteratively with appropriately chosen control parameters (learning rate and learning epochs, etc.).
Extreme Reformulated Radial Basis Function Neural Networks
107
4 Performance Evaluation In this section, the performance of the proposed ER-RBFNNs is evaluated on seven different benchmark regression data sets, which are summarized in table 1. All the simulations have been conducted in MATLAB 7.1 environment running on ordinary PC with 2.0 GHZ CPU and 512M RAM. In our simulations, the input and output attributes of regression applications are normalized into the range [-1,1]. The prototypes representing the locations of the RBF’s in the input space are randomly chosen from the range [-1,1]. There are 20 hidden nodes assigned for our ER-RBFNNs algorithm . The experimental results reported here are based on the average of 20 independent trials. Table 1 Specification of real-world regression cases Data sets Training data Testing data Attributes Ailerons 7154 6596 40 bank domains 4500 3692 32 Delta ailerons 3000 4129 5 Delta elevators 4000 5517 6 Elevators 8752 7846 18 Kinematics 4000 4192 8 Puma32H 4500 3692 32
4.1 Comparison with Different Value of Linear Generator Functions We first compare the generalization performance of ER-RBFNNs generated by linear generator functions g0 (x) = x + γ 2 , m = 3 and γ = 0, γ = 0.1, γ = 1. The 1 radial basis functions of the form φ(x) = g(x2 ),with g(x) = (g0 (x)) 1−m . It is clear from Table 2 that the best performance of ER-RBFNNs on the testing root mean square error (RMSE) was achieved for γ = 0. However, the best performance of standard deviaton (Dev) on the testing set was achieved for γ = 1. As observed from table 2, the learning time of ER-RBFNNs is very stable on a wide range of value of linear generator functions. According to table 2, the learning phase of ERRBFNNs can be completed in seconds or less than seconds.
4.2 Comparison with Different Value of Exponential Generator Functions Next, we compare the generalization performance of ER-RBFNNs generated by exponential generator functions g0 (x) = exp(βx), m = 3 and β = 0.5, β = 1, β = 5. It is clear from Table 3 that the best performance of ER-RBFNNs on the testing root mean square error (RMSE) and exponential were achieved for β = 0.5, ER-RBFNNs algorithm run faster than R-RBFNNs algorithm. As observed from
108
G. Bi and F. Dong
Table 2 Comparison with different value of linear generator functions Data sets Ailerons
γ
0.0 0.1 1.0 bank 0.0 domains 0.1 1.0 Delta 0.0 ailerons 0.1 1.0 Delta 0.0 elevators 0.1 1.0 Elevators 0.0 0.1 1.0 Kinematics 0.0 0.1 1.0 Puma32H 0.0 0.1 1.0
Testing Mean Dev. 1.7286e-004 6.9348e-006 1.7397e-004 5.3345e-006 1.7305e-004 4.7740e-006 0.1132 0.0038 0.1139 0.0033 0.1137 0.0027 4.5021e-004 4.2689e-005 4.5627e-004 4.0186e-005 4.6176e-004 2.7356e-005 0.0015 1.4005e-005 0.0015 2.1459e-005 0.0015 8.2240e-006 0.0031 1.5747e-004 0.0032 2.2819e-004 0.0031 1.4728e-004 0.1967 0.0072 0.2039 0.0074 0.2079 0.0054 0.0282 3.7659e-004 0.0284 3.5732e-004 0.0285 2.8951e-004
Training time(s) 0.4609 0.4261 0.4500 0.4461 0.4727 0.4877 0.0711 0.0727 0.0711 0.1000 0.1008 0.1047 0.3586 0.3570 0.3438 0.1078 0.1187 0.1078 0.2383 0.2305 0.2445
table 3, the performance of ER-RBFNNs improved as the value of β decreased from 5 to 0.5. The networks trained with β = 5, did not achieve satisfactory performance. This experiments indicate that there exists a certain range of values of β that guarantee satisfactory performance. This determines the responses of the RBF’s and plays a critical role in the ability of the network to implement the desired input-output mapping.
4.3 Comparison between Linear and Exponential Generator Functions with Random Value In this section, we compare the generalization performance of ER-RBFNNs generated by linear generator function g0 (x) = x + γ 2 and exponential generator functions g0 (x) = exp(βx) with m = 3. Each of these networks was tested with random value chosen from the range [-1,1]. In order to comparison to be fair, the same set of randomly selected prototypes were used in each trials. Table 4 shows the testing root mean square error (RMSE) ,standard deviaton (Dev) and training time in 20 trials. The experiments verified that training speed is one of the major advantages of ER-RBFNNs. According to table 4, the root mean square error produced on the average in 20 trials by ER-RBFNNs were not significantly affeced by different type of generator function with random value. Compared with ER-RBFNNs generated
Extreme Reformulated Radial Basis Function Neural Networks
109
Table 3 Comparison with different value of exponential generator functions Data sets
β
Ailerons
0.5 1.0 5.0 bank 0.5 domains 1.0 5.0 Delta 0.5 ailerons 1.0 5.0 Delta 0.5 elevators 1.0 5.0 Elevators 0.5 1.0 5.0 Kinematics 0.5 1.0 5.0 Puma32H 0.5 1.0 5.0
Testing Mean Dev. 3.0253e-004 3.1286e-005 5.6511e-004 5.4372e-005 0.0031 0.0078 0.1195 0.0016 0.1187 0.0015 4.3878 15.3158 4.4930e-004 1.0320e-005 4.6034e-004 2.1009e-005 4.6053e-004 8.4129e-005 0.0014 1.0338e-005 0.0015 3.3189e-006 0.0017 8.3193e-005 0.0038 3.1361e-004 0.0057 0.0010 52.9946 199.1462 0.2181 0.0053 0.2774 0.0091 0.6623 0.0151 0.0291 1.3406e-004 0.0300 2.4612e-004 0.2458 0.3968
Training time(s) 0.5016 0.5070 0.4781 0.5338 0.2703 0.2609 0.0813 0.0914 0.0836 0.1164 0.1187 0.1211 0.3844 0.3984 0.3742 0.1344 0.1281 0.1227 0.2555 0.2531 0.2539
by liner generator functions, ER-RBFNNs generated by exponential generator functions produced a slightly lower standard deviaton on the testing set. Table 4 Comparison between linear and exponential generator functions with random value Data sets Mode Ailerons linear expo bank linear domains expo Delta linear ailerons expo Delta linear elevators expo Elevators linear expo Kinema linear -tics expo Puma linear -32H expo
Testing Mean Dev. 1.7168e-004 4.1466e-006 2.1784e-004 2.0198e-005 0.1133 0.0030 0.1175 0.0016 4.5690e-004 3.7404e-005 4.3316e-004 2.2301e-005 0.0015 4.0118e-006 0.0014 1.6391e-005 0.0031 1.9121e-004 0.0035 3.1569e-004 0.2025 0.0070 0.1922 0.0057 0.0284 2.7302e-004 0.0293 1.9312e-004
Training time(s) 0.8542 0.5094 0.4852 0.2664 0.0656 0.0766 0.1055 0.1148 0.3555 0.3906 0.1102 0.1242 0.2500 0.2648
5 Conclusion

In this paper, a fast and axiomatic learning algorithm (ER-RBFNN) has been developed for R-RBFNNs generated by linear and exponential generator functions. The parameters of the linear and exponential generator functions can be randomly chosen, and the output weights of ER-RBFNNs are then analytically determined. The learning speed of ER-RBFNNs is extremely fast.
References 1. Karayiannis, N.B., Randolph-Gips, M.M.: On the Construction and Training of Reformulated Radial Basis Function Neural Networks. IEEE Trans. Neur. Netw. 14, 835–846 (2003) 2. Micchelli, C.A.: Interpolation of Scattered Data: Distance Matrices and Conditionally Positive Definite Functions. Constr. Approx. 2, 11–22 (1986) 3. Broomhead, D.S., Lowe, D.: Multivariable Functional Interpolation and Adaptive Networks. Comp. Sys. 2, 321–355 (1988) 4. Moody, J.E., Darken, C.J.: Fast Learning in Networks of Locally-Tuned Processing Units. Neur. Comput. 1, 281–294 (1989) 5. Chen, T., Chen, H.: Approximation Capability to Functions of Several Variables, Nonlinear Functionals, and Operators by Radial Basis Function Neural Networks. IEEE Trans. Neur. Netw. 6, 904–910 (1995) 6. Karayiannis, N.B.: Reformulated Radial Basis Neural Networks Trained by Gradient Descent. IEEE Trans. Neur. Netw. 10, 657–671 (1999) 7. Liang, N.Y., Huang, G.B., Saratchandran, P., Sundararajan, N.: A Fast and Accurate Online Sequential Learning Algorithm for Feedforward Networks. IEEE Trans. Neur. Netw. 17, 1411–1423 (2006) 8. Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme Learning Machine: Theory and Applications. Neur. Comput. 70, 489–501 (2006) 9. Huang, G.B., Liang, N.Y., Rong, H.J., Saratchandran, P., Sundararajan, N.: On-Line Sequential Extreme Learning Machine. In: Proc. IASTED International Conference on Computational Intelligence, Calgary, Canada (2005) 10. Huang, G.B., Chen, L., Siew, C.K.: Universal Approximation Using Incremental Constructive Feedforward Networks with Random Hidden Nodes. IEEE Trans. Neur. Netw. 17, 879–892 (2006) 11. Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme Learning Machine: Theory and Applications. Neur. Computing 70, 489–501 (2006) 12. Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks. In: Proceedings of International Joint Conference on Neural Networks, pp. 985–990. IEEE Press, New York (2004) 13. Huang, G.B., Siew, C.K.: Extreme Learning Machine with Randomly Assigned RBF Kernels. Inter. J. Inform. Tech. 11, 16–24 (2005) 14. Chen, S., Cowan, C., Grant, P.: Orthogonal Least Squares Learning Algorithm for Radial Basis Function Networks. IEEE Trans. Neur. Netw. 2, 302–309 (1991)
Research of Nonlinear Combination Forecasting Model for Insulators ESDD Based on Wavelet Neural Network* *
Haiyan Shuai, Jun Wu, and Qingwu Gong
Abstract. Wavelet neural network (WNN) effectively overcomes the intrinsic shortcomings of artificial neural network, namely, slow learning speed, difficult determination of structure and existence of local minima, combination forecasting model fully integrates the information of each model, and nonlinear one effectually conquers the difficulties and drawbacks in combined modeling non-stationary time serial by using linear model , hence, nonlinear combination forecasting model based on WNN possesses more flexible structure, higher data fitting and forecasting accuracy. Simulation experiments show, compared with other forecasting models, the predicted results are closer to the actual ones which show the nonlinear combination forecasting model for insulators ESDD based on WNN can efficaciously improve the speed and accuracy of the forecasting. Therefore, the method presented provides a doable thought for the computerization of pollution area map of power network. Keywords: Wavelet neural network(WNN), Equal salt deposit density(ESDD), Nonlinear combination forecasting, Insulators.
Haiyan Shuai · Qingwu Gong
School of Electrical Engineering of Wuhan University, Wuhan 430074, China

Haiyan Shuai
Wuhan Technical College of Communications, Wuhan 430074, China

Jun Wu
Three Gorges Vocational College of Electric Power, Yichang 443000, China
[email protected]

Project supported by the Ministry of Science and Technology of China (NCSIE-2006-JKZX-174).

1 Introduction

Exposed to a dirty environment, the surfaces of insulators become polluted. After being wetted, the polluted layers reduce the insulators' insulation capability and often invite pollution flashover. According to the operation experience of power
sectors, pollution flashover is one of the main factors causing power accidents. In recent years, there have happened several large area pollution flashovers in some locals or the whole country, which resulted in large area power blackout. For example, in the early of 2001, the large area flashovers happened in Northeast China, North China and Central China respectively l. Among all the causes resulting in pollution flashover, the lag and the inaccuracy of pollution area map of power network is a primary one. Equal Salt Deposit Density (ESDD) is the equivalent amount of NaCl that would yield the same conductivity at complete dilution [1]. ESDD is a main factor to classify contamination severity and map pollution areas. Therefore, forecasting ESDD plays an important role in the safety, economy and reliability of power system. At present, the methods of ESDD forecasting are mainly traditional multivariate regression analysis [2-3], artificial neural network [4-6], support vector machines [7], et al.. By far, there is no report about using combination forecasting model to predict ESDD at home and abroad. So-called combination forecasting method is the way to use two or more than two different forecasting methods to make predictions for an object respectively, carry out a proper combination of each single prediction result , and regard the combination as the final forecasting result [8]. Combination forecasting integrates the useful information of all single forecasting models and generally considers each forecasting result; hence it can more systemically and comprehensively reflect the changes of an object than a single model. Bates J.M and Granger C.J.W. [9] proved the combination of two or more than two agonic single forecasting models can produce the results better than those of each single model, which showed combination forecasting method can increase prediction accuracy. According to the different ways of combination of each single forecasting model, combination forecasting models can generally be divided into two classes: linear one and nonlinear one. Most of former combination forecasting methods belonged to linear ones. Although linear combination forecasting is relatively simple, there exist some limitations in it. Reference [10] points out that the value of a linear combination forecasting is merely a convex combination of the values of different single forecasting models. In coordinate system, when the actual values of an object being predicted are over, under or intersecting the values of the two different forecasting models, linear method tends to be disable. In order to conquer these limitations, reference[11] puts forward a nonlinear combination forecasting method, namely, supposing a set of the actual values of an object being predicted are y ( d ) ( d = 1, 2, , D ) ,then using L forecasting methods to get L values of the object: z (i ) (i = 1, 2, , L) ,and finally utilizing the L values to construct a nonlinear combination function: ∧
ŷ = φ(Z) = φ(z(1), z(2), ..., z(L)),    (1)

where φ is the nonlinear function. According to the nonlinear combination forecasting theory, under a certain measure, the metric of ŷ will be better than those
113
of z (i ) (i = 1, 2, , L) . References [11-12] uses BP neural network to construct nonlinear combination functions respectively and all get relatively good results. However, the researches in recent years show that, although BP network has very strong ability to self-study and nonlinearly map, but in the view of function expression, the network is an inferior one because there does not exist a unified and integrated theory to instruct the selection of BP structure which is selected depending on experiences, leading to over-fitting and low generalization, especially under small samples. Besides, BP algorithm belonging to a local optimization method also suffers from many local minima, which will influence the reliability and veracity of nonlinear combination forecasting model. Wavelet neural network (WNN) [13] is a mathematical modeling method which is developed based on the combination of wavelet transform [14] with artificial neural network. By virtue of wavelet outstanding performances on analyzing of non-fixed signals and constructing of nonlinear function models, WNN which has been combined with wavelet basis function has more advantages than traditional neural network, and been effectively used in signal processing ,data compression and fault diagnosis ,et al.. The paper applies the nonlinear combination forecasting model based on WNN to carry out a study on insulators’ ESDD prediction, and compares the ESDD values of the model with those of BP nonlinear combination model, WNN single model and BP single model. The experiments show that the model being discussed, compared to other models, has intelligent combined-modeling process, more flexible structure, broader applicability, higher accuracy of data fitting and forecasting, et al..
2 Nonlinear Combination Forecasting Model for ESDD Based on WNN In fact, the nature of wavelet transform [15] is an integral transformation between different parameters.
wab ( a, b ) =
+∞
∫
f ( x) g (a, b, x)dt
(2)
−∞
Where, g ( a, b, x) = mother wavelet.
a
1 x−b g( ) is called wavelet basis and g ( x) called a |a|
、 b are termed as scaling factor and translation factor of
g (a, b, x) respectively. To f ( x ) , the resolution of its local structure can be realized through adjusting wavelet basis window.
a
、 b ,namely, gearing the scale and position of
WNN is a model which, based upon wavelet analysis, embodies the idea of a neural network. In other words, WNN adopts nonlinear wavelet bases to replace the commonly used sigmoid function of traditional neural networks. Through a linear superposition of the selected nonlinear wavelet bases, the combination of the ESDD data of all single forecasting models is realized. The nonlinear combination function in Equation (1) can be fitted as follows, by adopting the wavelet basis g(a, b, x):

ŷ = φ(Z) = Σ_{k=1}^{m} ω_k g( (Σ_{j=1}^{L} v_jk z(j) − b_k) / a_k ),    (3)

where φ(Z) is the ESDD value of the nonlinear combination forecast corresponding to the actual one y(d); z(j) expresses the prediction value of the No.j model, Z = (z(1), z(2), ..., z(L)); ω_k, v_jk, b_k, a_k are, respectively, the weighting coefficient between the output terminal and the No.k hidden layer node, the weighting coefficient between the No.j input and the No.k hidden layer node, and the translation and scaling factors of the No.k wavelet basis; m is the number of wavelet bases (the number of hidden layer nodes). Considering that the Morlet wavelet possesses relatively good localization and smoothness, it is selected in Equation (3):

g(x) = cos(1.75x) exp(−x²/2).    (4)

Fig. 1 shows the structure of the WNN. There are L input nodes z(1), z(2), ..., z(L) and one output node φ(Z).
、 、 、 、
The objective of using WNN to carry out regression analysis is to determine the ∧ network parameters- ωk v jk bk and ak by which ϕ ( Z ) can be fitted optimally with y ( d ) . ωk v jk bk and ak can be optimized via Minimum Mean Square Error(MMSE) energy function as follow.
E = (1/2) Σ_{d=1}^{D} [ Σ_{k=1}^{m} ω_k g( (Σ_{j=1}^{L} v_jk z(j) − b_k) / a_k ) − y(d) ]²,    (5)
where D is the number of training samples and y(d) the d-th actual value. Obtaining the optimal ω_k, v_jk, b_k and a_k amounts to minimizing Equation (5). In this article, the gradient descent algorithm is used as the WNN learning principle. The details are as follows:
、 、
1) Set the objective error function value; 2) Define input z ( j ) and corresponding output 3) Initialize network parameters;
y (d ) , and normalize them;
Table 1 Initial parameters of the WNN
4) Calculate the gradient of each parameter; L
Let
z * = ∑ v jk z j ( j ) j =1
, S = z a− b ,then the gradients of Equation(5)are *
k
k
respectively: D ∧ z * − bk ∂E = − ∑ [ϕ ( Z ) − y ( d )] g [ ] ∂ ωk ak t =1 D ∧ ∂E S2 S = − ∑ [ϕ ( Z ) − y ( d )]ω k [ − cos(1.75 S ) exp( − ) − ∂ v jk 2 ak t =1
1.75sin(1.75 S ) exp( −
S2 1 ) ]z ( j ) 2 ak
D ∧ S2 S ∂E = − ∑ [ϕ ( Z ) − y ( d )]ω k [cos(1.75 S ) exp( − ) + 2 ak ∂ bk t =1
1.75sin(1.75 S ) exp( −
S2 1 ) ] 2 ak
116
H. Shuai, J. Wu, and Q. Gong D ∧ ∂E S2 S2 + = − ∑ [ϕ ( Z ) − y ( d )]ω k [cos(1.75S ) exp( − ) 2 ak ∂a t =1 k
1.75sin(1.75 S ) exp( − 5) Introduce momentum factor
α
S2 S ) ] 2 ak
to amend each parameter;
ωk (t + 1) = ωk (t ) − η
∂E + α ω k (t ) ∂ωk
v jk (t + 1) = v jk (t ) − η
∂E + α v jk (t ) ∂v jk
bk (t + 1) = bk (t ) − η
∂E + α bk (t ) ∂bk
ak (t + 1) = ak (t ) − η
∂E + α ak (t ) ∂ak
where η is the learning rate and α the momentum factor. 6) Compute the current output of the WNN: put the current parameters into Equation (3) to get the current output of the network; 7) Compute the error function value; when the error is less than the preset one, the learning process is terminated, otherwise return to step 4). (A sketch of one such update step is given below.) The basic principle of the nonlinear combination forecasting model for insulators' ESDD based on WNN is to regard the ESDD of each single model as the input of the WNN and the corresponding actual ESDD as the output, and then utilize sufficient samples to train the model in order to establish a nonlinear mapping relationship between the ESDD of each single model and the actual one. When its precision is reached after repeated training and testing, the model can be used as an effective way to predict insulators' ESDD.
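The following compact sketch (assumed, not taken from the paper; E is taken as one half of the sum of squared errors and only the output weights ω are updated, the other parameters following the same pattern) illustrates one gradient-descent step with momentum, i.e., steps 4)-5) above.

import numpy as np

def train_step(Z, y, omega, v, b, a, d_omega_prev, eta=0.05, alpha=0.9):
    # One gradient-descent-with-momentum update of the output weights omega.
    # Z: (D, L) single-model forecasts for D training samples; y: (D,) actual ESDD values.
    S = (Z @ v - b) / a                                   # (D, m) wavelet arguments
    G = np.cos(1.75 * S) * np.exp(-S ** 2 / 2.0)          # Morlet responses of the hidden nodes
    pred = G @ omega                                      # WNN outputs for all samples
    grad_omega = G.T @ (pred - y)                         # dE/d(omega) for E = 1/2 * sum of squared errors
    d_omega = -eta * grad_omega + alpha * d_omega_prev    # momentum-corrected step
    return omega + d_omega, d_omega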
3 Appraisal of the Prediction

In order to evaluate the prediction effects of the combination model comprehensively, according to practice and principle, the paper uses the relative error δ and the average relative error δ̄ to evaluate the accuracy of the prediction:

δ = |y_i − ŷ_i| / y_i × 100%,    (6)

δ̄ = (1/l) Σ_{i=1}^{l} |y_i − ŷ_i| / y_i × 100%,    (7)
Where yi are actual values and yˆ i predicted ones. These values reflect comprehensively the prediction results. The smaller δ is, the better the generalization of the model and the corresponding parameters are.
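For reference, a direct Python transcription of Eqs. (6)-(7) might look as follows (the function and variable names are ours).

import numpy as np

def relative_errors(y_true, y_pred):
    # Per-sample relative error delta (Eq. 6) and average relative error (Eq. 7), in percent.
    delta = np.abs(y_true - y_pred) / np.abs(y_true) * 100.0
    return delta, delta.mean()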
4 Experimental Results On the basis of the model principles and modeling steps in the text, Matlab 7.0 is adopted to write the ESDD prediction programs based on WNN. In the experiment, the historical meteorological data and ESDD data in the ESDD monitoring spot of Qingshan District, Wuhan from April to June in 2005 are regarded as training set, and those from April to June in 2006 as forecasting set. Tab.2 shows the comparisons between WNN combination forecasting model and BP combination forecasting model, WNN single forecasting model, and BP single forecasting model.
Table 2 Comparison of forecasting results of ten ESDD Single foreca forecasting ng
Combination m ination forecasting mb ng Times
WNN ecasting g Actuall Forecasting 2 ˄PJFP ˅˄mg/cm ˅ 0.0268
0.0281
0.0325
0.0341
0.0346
0.0328
0.0376
0.0387
0.0230
0.0243
0.0255
0.0242
0.0494
0.0517
0.0422
0.0448
0.0467
0.0450
0.0474
0.0491
G (%)
WNN
BP Forecasting g G (%) %)˄mg/cm2˅ 4.85 4.92 5.20 2.93 5.65 5.10 4.66 6.16 3.64 3.59
4.670
G (%)
Forecasting i ˄mg/cm2˅
BP
G (%) %)
Forecasting ˄mg/cm2˅
G (%)
0.0252
5.97
0.0286
6.72
0.0249
7.09
0.0346
6.46
0.0354
6.77
0.0360
10.77
0.0367
6.07
0.0322
6.94
0.0372
8.00
0.0395
5.05
0.0397
5.99
0.0408
8.51
0.0213
7.39
0.0248
7.83
0.0211
8.26
0.0245
3.92
0.0239
6.27
0.0238
6.67
0.0526
6.48
0.0542
9.72
0.0534
8.10
0.0403
4.50
0.0452
7.11
0.0398
5.69
0.0451
3.43
0.0441
5.57
0.0443
5.14
0.0500
5.49
0.0509
7.38
0.0516
8.86
5.476
7.03
7.79
From Table 2, it can be seen that the maximum relative errors of WNN single model and BP one are 9.72% and 10.77% respectively, and the average relative errors of them 7.03% and 7.79% respectively. The maximum relative error and the average relative error of BP nonlinear combination model are 7.39% and 5.467% respectively; while those of WNN nonlinear combination model are 6.16% and 4.670% respectively. This indicates that the prediction algorithm based on WNN is stable and practical, and can improve forecasting precision. Using the ESDD predicted by WNN combination forecasting model to divide pollution area can greatly increase the accuracy of power area pollution map.
5 Conclusions ESDD is a main factor to classify contamination severity and draw pollution areas map of power network, hence its accuracy directly influences the precision of pollution areas map, and further affects the insulation capability of power system. There are some conclusions according to the study. (1) By virtue of WNN forecasting model for ESDD conquering some intrinsic shortcomings of traditional BP network, such as low convergence, difficulty to select structure, over-fitting &low generalization, existence of local minima, et al., its forecasting accuracy is higher than that of BP network. (2) The nonlinear combination forecasting model for ESDD fully takes use of the raw data and the information of all single forecasting models which compensates the flaws of only using single model, therefore its prediction precision is higher than that of single model. (3) The model discussed above can effectively improve ESDD forecasting accuracy. Compared with other models, its values are closer to the actual ones. Using its predicting ESDD to divide pollution area will largely boost the accuracy of pollution area map of power network. The way provides a new thread for the computerization of drawing pollution area map of power network.
References 1. Su, Z.Y., Wu, G.Y.: Q/GDW152-2006 (4) (2006) 2. Almad, A. S., Ahmad, H., Salam, M. A.: Regression Technique for Prediction of Salt Contamination Severity on High Voltage Insulators. In: Annual Report Conference on Electrical Insulation and Dielectric Phenomena, Victoria BC, vol. 1, pp. 218–221 (2000) 3. Almad, A.S., Ahmad, H., Salam, M.A.: Prediction of Salt Contamination on High Voltage Insulators in Rainy Season Using Regression Technique. In: Proceedings of TENCON, vol. 3, pp. 24–27 (2000) 4. Almad, A.S., Ghosh, P.S., Aljunid, S.A.K.: Modeling of Various Meteorological Effects on Contamination Level for Suspension Type of High Voltage Insulators Using ANN. In: IEEE/PES Transmission and Distribution Conference and Exhibition 2002: Asia Pacific, Yokohama, Japan, pp. 1030–1035 (2002) 5. Almad, A.S., Ghosh, P.S., Aljunid, S.A.K.: Assessing of ESDD on High-voltage Insulators Using Artificial Neural Network. Electric Power System Research, 131–136 (2004) 6. Zhang, H., Wen, X.S., Ding, H.: Extrapolation of Insulator’s ESDD Based on Climate Factor with Artificial Neural Network. High Voltage Apparatus 39, 31–32 (2003) 7. Jiao, S.B., Liu, D., Zheng, G.: Forecasting the ESDD of Insulator Based on Least Squares Support Vector Machine. In: Proceedings of the CSEE, vol. 26, pp. 149–153 (2006) 8. Tang, X.W.: Research of Combination-forecasting Calculation Methods. Forecasting 10, 35–40 (1991) 9. Bate, J.M., Granger, C.W.J.: Combination of Forecasting. Operation Research Quarterly 20, 451–468 (1969)
10. Zhang, G.P.: Anatomy of B-G Combination-forecasting. Forecasting 12, 24–27 (1988) 11. Wen, G.X., Niu, M.J.: A New Nonlinear Combined Forecasting Method on the Basis of Neural Networks. Systems Engineering-Theory &Practice 12, 66–72 (1994) 12. Dong, J.R., Yang, X.T.: Neural Network Approach to Combination Forecast in Ownerships of Chinese motor vehicle. In: Proceedings of 1997 international conference on management science & engineering, pp. 98–105. Harbin Institute of Technology Press (1997) 13. Cao, L., Hong, Y.: Predicting Chaotic Time Series with Wavelet Networks. Physics D (1995) 14. Li, S.X.: Wavelet Transform and its Applications. Higher Education Press, Beijing (1997) 15. Yang, J.G.: Wavelet Analysis and its Engineering Applications. China Machine Press, Beijing (2005)
Parameter Tuning of MLP Neural Network Using Genetic Algorithms Meng Joo Er and Fan Liu
Abstract. In this paper, a hybrid learning algorithm for a Multilayer Perceptron (MLP) Neural Network using Genetic Algorithms (GA) is proposed. This hybrid learning algorithm has two steps: first, all the parameters (weights and biases) of the initial neural network are encoded to form a long chromosome and tuned by the GA; second, starting from the result of the GA process, a quasi-Newton method called the BFGS method is applied to train the neural network. Simulation studies on function approximation and nonlinear dynamic system identification are presented to illustrate the performance of the proposed learning algorithm. Keywords: Genetic algorithms, Backpropagation, Function approximation, Nonlinear dynamic system identification.
Meng Joo Er · Fan Liu
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore
{Emjer,liuf0009}@ntu.edu.sg

1 Introduction

The most noticeable applications of Neural Networks (NNs) are in the areas of communication channel equalization, signal processing, pattern recognition, system identification, prediction and financial analysis, and process control [1~5]. Such widespread use of NNs is mainly due to their behavioral emulation of the human brain and their learning ability. A NN can be termed a parallel and distributed process that consists of multiple processing elements. NNs have emerged as an important tool in terms of learning speed, accuracy, noise immunity and generalization ability. In this paper, the most popular models of NN, namely feedforward NNs, are considered. Supervised Learning (SL) is one of the most effective weight training approaches, whereby efforts are made to find an optimal set of connective weights for a NN according to some optimality criteria. One of the most popular SL training algorithms for feedforward NNs is backpropagation (BP). The BP is a gradient descent search algorithm. It is based on minimization of the total mean square
error between the actual output and the desired output [6]. This error is used to guide the search of the BP algorithm in the weight space. The BP is widely used in many applications in that it is not necessary to determine the exact structure and parameters of NN in advance. However, the problem of the BP algorithm is that it is very often trapped in local minima and the learning and the adaptation speed is very slow in searching for global minimum of the search space. The speed and robustness of the BP algorithm are sensitive to several parameters of the algorithm and the best parameters vary from problems to problems. Recently, some developed evolutionary algorithms, notably Genetic Algorithms (GA), have attracted a great attention in the NN areas [6]-[12]. The GA is proposed to train a feedforward NN and its performance is investigated in [6]. In [7], the GA is proposed to optimize the NN topology and connective weights. The GA is used to identify unimportant neurons and delete those neurons to yield a compact structure in [9]. By working with a population of solutions, the GA can seek many local minima, and thus increase the likelihood of finding global minimum. This advantage of the GA can be applied to NNs to optimize the topology and weight parameters. In this paper, a hybrid algorithm based on GA technique is proposed to optimize parameters of the MLP Neural Network (MLPNN). A simple chromosome representation is used, which contains information about connections, weights and biases of the MLPNN. The parameter learning process, based on GA technique and BP algorithm, is a two-step learning process. In the first step, the initial parameters, such as weights and biases of the NN are tuned by the GA. In the second step, the BP algorithm and the quasi-Newton method is introduced to train the initial NN to yield optimal values of weights and biases of the NN. This paper is organized as follows: In Section 2, the general structure of the MLPNN and the process of the GA optimal algorithm are briefly reviewed. Details of the proposed hybrid algorithm are given in Section 3. To demonstrate the effectiveness of the proposed algorithm, simulation studies on function approximation and nonlinear dynamic system identification are carried out in Section 4. Conclusions are given in Section 5.
2 MLPNN MLPNN has become very popular in past decades. The development of the BP training algorithm represents a landmark in NNs in that it provides a computationally efficient method for training the MLPNN. A general three-layer MLPNN is depicted in Fig. 1. In Fig. 1, the NN is a three-layer feedforward NN. Here, multi-input and singleoutput (MISO) systems are considered. However, all results could be extended to multi-input and multi-output (MIMO) systems. The mathematical description of the network is as follows. Input layer: Each node represents an input variable pi , i = 1, 2, , n .
Fig. 1 Topology structure of a three-layer feedforward NN
Layer 1: Each node represents a neuron of layer 1. The term w ji denotes the weight of the link between the jth neuron of layer 1 and the ith input variable. Here, W1 is the corresponding weight matrix of layer 1. Layer 2: Each node represents a neuron of layer 2. The term wkj denotes the weight of the link between the kth neuron of layer 2 and the jth neuron of layer 1. W2 is the corresponding weight matrix of layer 2. Output layer: Each node represents an output as the summation of the incoming signals from layer 2. In such a network, a neuron sums a number of weighted inputs and a bias and then it passes the result through a nonlinear activation function. Fig. 2 shows a typical structure of a MLP neuron with n -input connections and a single output.
Fig. 2 A typical MLP neuron
The output of the neuron is given by

y = f( Σ_{i=1}^{n} w_{1i} p_i + b ),    (1)

where p_1, p_2, ..., p_n are the input variables; w_11, w_12, ..., w_1n the connective weights; b the bias; and f the activation function.
Before a NN can be used for any purpose, the weights of neurons and bias values should be adjusted by a learning algorithm so that outputs of the NN will match the desired patterns for specific sets of inputs. One of the most widespread learning algorithms is the BP algorithm. However, difficulties faced by the BP algorithm are poor convergence rate [13] and an entrapment of local minimum. Here, we choose a quasi-Newton method called the BFGS method for NN training [14].
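A one-line illustration of Eq. (1) for a single neuron, with a hyperbolic tangent activation (a sketch; the choice of tanh mirrors the "tansig" activation used later in the experiments, and the function name is ours).

import numpy as np

def neuron_output(p, w, b):
    # Eq. (1): y = f(sum_i w_i * p_i + b) with f = tanh.
    return np.tanh(np.dot(w, p) + b)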
3 Hybrid BP Learning Algorithm Based on GA The GA is a derivative-free stochastic optimization method based on the features of natural selection and biological evolution. It has several advantages over other optimization algorithms. It can be applied to both continuous and discrete optimization problems. Compared with the BP algorithm, the GA is less likely to get trapped in local minima [6]. It is a computational model inspired by population genetics. It has been used mainly as function optimizers and it has been demonstrated to be an effective global optimization tool, especially for multi-model and non-continuous functions. The GA evolves a multi-set of elements, called a population of individuals. Each individual X i (i = 1,2, , p ) ( p , the size of the population) of population X represents a solution of the problem. Individuals are usually represented by strings and each element of which is called a gene. The value of a gene is called its allelic value, and its range is usually restricted to [0, 1], but it can also be continuous and even structured. We use real-valued strings in our approach. The GA is capable of maximizing a given fitness function F computed on each individual of the population. The block diagram of the proposed hybrid algorithm is depicted by Fig. 3. This model describes a hybrid learning algorithm of MLPNN by using the GA to optimize the parameters of the network. All the parameters of the network are encoded to form a long chromosome and tuned by the GA. Then, as a result of the GA process, the BP algorithm is used to train the network. The procedure of the hybrid BP learning algorithm is presented as follows.
Fig. 3 Flowchart of the proposed learning algorithm
3.1 Chromosome Representation

A MLPNN can be represented by a directed graph, encoded on a chromosome with each parameter (weights and biases). All these parameters are stored in a row vector C = (c_i), i = 1, 2, ..., N, where N is the number of all NN parameters. We can write the chromosome as

C = [W_1, W_2, b_1, b_2, ..., b_m, θ_1, θ_2, ..., θ_r],    (2)

where W_1 denotes the connective weights of the links between the input layer and the first hidden layer, W_2 the connective weights of the links between the first hidden layer and the second hidden layer, and b_1, b_2, ..., b_m are the biases of neurons of the
first hidden layer, and θ_1, θ_2, ..., θ_r are the biases of neurons of the second hidden layer. We use real-valued encoding in this paper: W_1, W_2, b_1, b_2, ..., b_m, θ_1, θ_2, ..., θ_r are the real values of the connective weights and biases, respectively.
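A possible encoding and decoding of the real-valued chromosome of Eq. (2) is sketched below (our own helper functions; the layer-size arguments are illustrative and not specified by the paper).

import numpy as np

def encode(W1, W2, b1, b2):
    # Flatten all weights and biases into one real-valued chromosome C.
    return np.concatenate([W1.ravel(), W2.ravel(), b1.ravel(), b2.ravel()])

def decode(C, n_in, n_hid1, n_hid2):
    # Recover the weight matrices and bias vectors from the chromosome.
    i = 0
    W1 = C[i:i + n_hid1 * n_in].reshape(n_hid1, n_in); i += n_hid1 * n_in
    W2 = C[i:i + n_hid2 * n_hid1].reshape(n_hid2, n_hid1); i += n_hid2 * n_hid1
    b1 = C[i:i + n_hid1]; i += n_hid1
    b2 = C[i:i + n_hid2]
    return W1, W2, b1, b2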
3.2 Fitness Function The fitness function is dependent on problem and is used to evaluate the performance of each individual. The error signal of the output neuron j at iteration n (i.e., presentation of the nth training example) is defined by
e j ( n) = d j ( n) − y j (n) .
(3)
We defined the instantaneous value of the error energy for neuron j as 1 2 1 e j (n) . Correspondingly, the value of ξ (n) is obtained by summing e 2j (n) 2 2 over all neurons in the output layer
ξ ( n) =
1 2
∑ e ( n) , 2 j
j∈C
(4)
where the set C includes all the neurons in the output layer of the network. For MLPNN it is the sum squared error. The fitness is defined as by summing ξ (n) over all n with respect to the set size N , as shown by N
F=
∑ ξ ( n) .
(5)
n =1
Obviously the objective is to minimize F (⋅) subject to weights w ji , wkj and biases bi (i = 1,2,
, m) and θ j ( j = 1,2,
, r) .
3.3 Selection Selection operator is to select individuals from the population for reproduction based on the relative fitness value of each individual. The extraction can be carried out in several ways. We use “NormGeomSelcet” ranking selection method in this paper, it is a ranking selection function based on the normalized geometric distribution.
3.4 Crossover To apply the standard crossover operator the individuals of the population are randomly paired. The two mating chromosomes are cut once at corresponding points and the sections after the cuts exchanges. The crossover point can be chosen randomly.
3.5 Mutation After crossover, the new individuals are subjected to mutation. Mutation prevents the algorithm from being trapped in a local minimum. A gene is selected with a certain probability and its value is modified by a random amount. Here we choose the non-uniform mutation method, which changes one of the genes of the parent based on a non-uniform probability distribution.
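The three operators can be sketched as follows. The normalized-geometric selection probability q(1−q)^rank and the non-uniform mutation decay schedule are written from their usual textbook forms; the probabilities, bounds and shape parameter b are assumptions, not the authors' exact settings.

```python
import numpy as np
rng = np.random.default_rng(0)

def norm_geom_select(pop, fit, q=0.08):
    """Ranking selection via a normalized geometric distribution (pop: 2-D array)."""
    order = np.argsort(fit)                       # best (smallest error) first
    ranks = np.empty(len(pop)); ranks[order] = np.arange(len(pop))
    p = q * (1 - q) ** ranks
    p /= p.sum()
    return pop[rng.choice(len(pop), size=len(pop), p=p)]

def crossover(a, b):
    """One-point crossover: cut both parents at the same point and swap the tails."""
    cut = rng.integers(1, len(a))
    return np.concatenate([a[:cut], b[cut:]]), np.concatenate([b[:cut], a[cut:]])

def nonuniform_mutation(x, gen, max_gen, lo=-5.0, hi=5.0, pm=0.05, b=3.0):
    """Perturbation shrinks as the generation counter grows (non-uniform mutation)."""
    x = x.copy()
    for i in np.where(rng.random(len(x)) < pm)[0]:
        span = (hi - x[i]) if rng.random() < 0.5 else -(x[i] - lo)
        x[i] += span * (1 - rng.random() ** (1 - gen / max_gen) ** b)
    return x
```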
4 Illustrative Examples One of the most powerful uses of an NN is function approximation. Two examples are simulated in this section to demonstrate the effectiveness of the proposed hybrid learning algorithm. They are the Hermite polynomial function and the nonlinear dynamic system identification.
4.1 Function Approximation Here, the underlying function to be approximated is the Hermite polynomial, defined as

f(x) = 1.1 (1 − x + 2x²) exp(−x²/2).   (6)
A set of 100 one-input normalized data points and the corresponding target data is used as the training data. Another 100 input-target pairs are used as the testing data. The GA parameters are set as crossover probability pc = 0.09 and number of generations gen = 100. We construct a 1-15-1 MLPNN whose activation functions are set to "tansig", "tansig" and "purelin", respectively; "tansig" denotes the hyperbolic tangent sigmoid transfer function and "purelin" denotes the linear transfer function. A set of 17 neurons is generated and the total number of parameters is 46. Simulation results are shown in Fig. 4 ~ Fig. 6. The comparison of the proposed method and the BP method is depicted in Fig. 4 and Fig. 5: the mean squared error reached the goal value at epoch 121 in Fig. 4 and at epoch 123 in Fig. 5. The fitness value is shown in Fig. 6.
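For example, the 100 training pairs for Eq. (6) can be generated as follows; the input range and the normalization used here are assumptions, since the paper does not state them.

```python
import numpy as np

x_train = np.linspace(-4.0, 4.0, 100)                                       # assumed range
y_train = 1.1 * (1 - x_train + 2 * x_train**2) * np.exp(-x_train**2 / 2)    # Eq. (6)
x_train = (x_train - x_train.mean()) / x_train.std()                        # normalization
```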
4.2 Nonlinear Dynamic System Identification The system can be described as

y(t+1) = y(t) y(t−1) [y(t) + 2.5] / [1 + y²(t) + y²(t−1)] + u(t),   y(1) = 0,  u(t) = sin(2πt/25).   (7)
Fig. 4 Training performance with GA
Fig. 5 Training performance without GA
Fig. 6 Fitness evolution (Example 1)
Fig. 7 Fitness evolution (Example 2)
Fig. 8 Training performance with GA
Fig. 9 Training performance without GA
The model is identified in series-parallel mode and is given by

ŷ(t+1) = f(y(t), y(t−1), u(t)).   (8)
It is a three-input, single-output model. There are 200 input-data sets chosen as training data. The GA parameters are set as crossover probability pc = 0.095 and number of generations gen = 100. We construct a 3-3-1 MLPNN with activation functions "tansig", "tansig" and "purelin", respectively. A set of 7 neurons is generated and the total number of parameters is 16. Simulation results are shown in Fig. 7 ~ Fig. 9. The comparison of the proposed method and the BP method is depicted in Fig. 8 and Fig. 9: the mean squared error reached the goal value at epoch 67 in Fig. 8 and at about epoch 71 in Fig. 9. The fitness value is shown in Fig. 7.
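The training samples for Eq. (7) can be produced by iterating the recursion, and the series-parallel regressors (y(t), y(t−1), u(t)) of Eq. (8) then form the three network inputs. This is only a data-generation sketch; the initial conditions y(1) = y(2) = 0 and the indexing are assumptions.

```python
import numpy as np

T = 202
y = np.zeros(T + 1)                                        # y[t] ~ y(t); y(1)=y(2)=0 assumed
u = np.sin(2 * np.pi * np.arange(1, T + 1) / 25)           # u[t-1] ~ u(t) = sin(2*pi*t/25)
for t in range(2, T):                                      # Eq. (7)
    y[t + 1] = y[t] * y[t - 1] * (y[t] + 2.5) / (1 + y[t]**2 + y[t - 1]**2) + u[t - 1]

# 200 regressor rows (y(t), y(t-1), u(t)) with target y(t+1), as in Eq. (8)
X = np.column_stack([y[2:T], y[1:T - 1], u[1:T - 1]])
target = y[3:T + 1]
```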
5 Conclusions In this paper, a hybrid learning algorithm that uses a GA to optimize the parameters of an MLPNN is presented. The initial structure of the MLPNN is created, and all the parameters of the network are tuned by the GA. Simulations show that the hybrid learning algorithm performs well in function approximation and nonlinear dynamic system identification. The GA, as a global search tool, can avoid the local minima problem of BP. In our recent work we proposed that the GA can be used to optimize the parameters of the MLPNN; further research will focus on how to use the GA to obtain a more compact network structure with higher accuracy. Optimizing not only the parameters of the network but also the network topology by the GA is future research work.
References 1. Kadirkamanathan, V., Niranjan, M.: A Function Estimation Approach to Sequential Learning with Neural Networks. Neural Computation 5, 954–975 (1993) 2. Juang, C.F., Chin, C.T.: An On-Line Self-Constructing Neural Fuzzy Inference Network and its Applications. IEEE Trans. Fuzzy Systems 6, 12–32 (1998) 3. Widrow, B., Rumelhart, D.E., Lehr, M.A.: Neural Networks Applications in Industry, Business and Science. Communication of the ACM 37, 93–105 (1994) 4. Narendra, K.S., Kannan, P.: Identification and Control of Dynamical Systems Using Neural Networks. IEEE Trans. Neural Networks 1, 4–27 (1990) 5. Vellido, A., Lisboa, P.J.G., Vaughan, J.: Neural Networks in Business: A survey of applications. Expert Systems with Applications 17, 51–70 (1999) 6. Siddique, M.N.H., Tokhi, M.O.: Training Neural Networks: Backpropagation vs Genetic Algorithms. In: Proceeding of the International Joint Conference on Neural Networks, vol. 4, pp. 2673–2678 (1999) 7. Tang, K.S., Chan, C.Y., Man, K.F., Kwong, S.: Genetic Structure for NN Topology and Weights Optimization. In: IEEE Conference Publication, vol. 414, pp. 250–255 (1995) 8. Leng, G., McGinnity, T.M., Prasad, G.: Design for Self-Organizing Fuzzy Neural Networks Based on Genetic Algorithms. IEEE Trans. on Fuzzy Systems 14, 755–765 (2006)
9. Chen, S., Wu, Y., Luk, B.L.: Combined Genetic Algorithm Optimization and Regularized Orthogonal Least Squares Learning for Radial Basis Function Networks. IEEE Trans. on Neural Networks 10, 1239–1243 (1999)
10. Maniezzo, V.: Genetic Evolution of Topology and Weight Distribution of Neural Networks. IEEE Trans. on Neural Networks 5, 39–53 (1994)
11. Billings, S.A., Zheng, G.L.: Radial Basis Function Network Configuration Using Genetic Algorithms. Neural Networks 8, 877–890 (1995)
12. Leung, F.H.F., Lam, H.K., Ling, S.H., Tam, P.K.S.: Tuning of the Structure and Parameters of a Neural Network Using an Improved Genetic Algorithm. IEEE Trans. on Neural Networks 14, 79–88 (2003)
13. Lippmann, R.P.: An Introduction to Computing with Neural Nets. IEEE ASSP Magazine 1, 4–22 (1987)
14. Watrous, R.L.: Learning Algorithms for Connectionist Networks: Applied Gradient Methods for Nonlinear Optimization. In: Proceeding IEEE First Int. Conf. Neural Net., vol. 2, pp. 619–627 (1987)
Intelligent Grid of Computations Samia Jones*
Abstract. The Web can be considered a massive information system with interconnected databases and remote applications providing various services. While these services are becoming more and more user oriented, the concept of smart applications on the Web is growing. Most sites still measure success by hits and page views; instead, building an intelligent infrastructure to track visitors and their activities could be more useful, since Web intelligence accurately measures site success and guides future directions. Once built, visitor profile, event, and scenario models will clarify which hit measurements are relevant. To track users, a three-tiered infrastructure that aggregates, stores, and distributes intelligence across the organization could be built. A middleware platform is required to deal with multiple very large data sources for multi-aspect analysis, by creating a grid of Web data mining agents known as a Data Mining Grid. As users click banners, view products, and make purchases, the data captured by commerce software will be filed in a central repository (data warehouse), typically built on Oracle or Microsoft SQL Server. This Web warehouse will become the definitive repository for clean and consistent organizational information. To test hypotheses, non-technical decision-makers will use interactive analysis tools (OLAP and query tools) from vendors like SAS. OLAP and query tools only answer the questions put to them -- they do not reveal what users should have asked. Data mining tools, in contrast, uncover hidden trends and find less obvious knowledge; they are available from vendors like DataSage, kapowtech and DBMiner. A multi-level control data mining architecture model is included. Keywords: Intelligent grid, Data mining.
1 Introduction Central to human intelligence is the process of learning or adapting. Likewise, machine learning may be the most important aspect of Artificial Intelligence (AI), including behavior, cognition, symbolic manipulation, and achieving goals. This suggests that AI software should be concerned with being changeable or adaptable. The challenge for AI is to learn capabilities for helping people derive specifically targeted knowledge from diverse information sources, such as the Web. Subsequently, one of the challenges facing Web Services includes developing a
Samia Jones
Texas A&M University at Qatar, PO Box 23874, Doha, Qatar, Texas A&M Engineering Building, Education City
Table 1 The total verified number of search results

Search Engine   Total from Dec. 2002   Total from March 2002   Total from Aug. 2001
                (in millions)          (in millions)           (in millions)
Google          9,732                  8,371                   6,567
AlltheWeb       6,757                  4,388                   4,969
AltaVista       5,419                  3,432                   3,112
WiseNut         4,664                  5,009                   4,587
HotBot          3,680                  2,869                   3,277
MSN Search      3,267                  2,523                   3,005
Teoma           3,259                  1,839                   2,219
NLResearch      2,352                  3,610                   3,321
Gigablast       2,352                  NA                      NA
global consensus for an architecture that lets applications (using object-oriented specialized software components) plug into an "application bus" and call a Web AI service. Smart Web applications are security controlled to allow users access only to information that they are authorized to see. Multi-layered security (via password control and the Oracle database) allows some users to view only, while others can also edit directory information, send pages or update on-call schedules. Information portals are among the most sophisticated applications on the Web today. During 1998, the first wave of Internet portals became very popular; they provided consumers with personalized points of entry to a wide variety of information on the Internet. Examples included MyYahoo (Yahoo), NetCenter (Netscape), MSN (Microsoft) and AOL. Table 1 above gives the total verified number of search results (including Web pages, PDFs and other file types) from all searches; the exact same queries were used on Dec. 31, 2002, in March 2002 and in August 2001, based on the sizes and percentages reported by AlltheWeb, as compiled by Greg R. Notess. Moreover, it should be obvious that the current trend is toward a grid technology of the Web services structure. The aim of this paper is to develop an intelligent grid and test it.
2 Research It is believed that the next paradigm shift in the Web is towards smart choices and knowledgeable queries. The question now is: how to evolve through this paradigm and generate association rules with services that support the choices and automatically aggregate knowledgeable queries?
2.1 Web Intelligence System A basic Web Intelligence System (WIS) may look like this: Portal → search engine → database ↔ updates → cookies → retrieve info ↔ back to user (Figure 1 shows this graphically).
Fig. 1 A basic Web Mining Intelligence model (queries flow from the Internet, databases, search engines and networks through Web services over the protocol stack; Web logs, cookies and user profiles feed usage mining and content mining, backed by a data grid)
Nevertheless, based on Russell & Norvig's scheme, WIS can be classified into four categories as follows:

Table 2 Designing philosophy, ability and functionality of WIS

  System that thinks like humans   |   System that thinks rationally
  System that acts like humans     |   System that acts rationally
Moreover, to capture transaction data for future analysis, Web mining techniques are needed to track the mouse clicks that define where the visitor has been on the website. In order for the smart choices to cooperate and compete among themselves to optimize their own as well as others' resources and utilities, a new platform is required as the middleware to deal with the multiple very large data sources for multi-aspect analysis in portals for the Web Intelligence System (Figure 2). Creating a grid of Web mining agents, called a Data Mining Grid, may be used to (a conceptual sketch follows this list):
1. develop various data mining agents for different targeted tasks;
2. organize the agents into a multi-layer grid under the Web as middleware;
3. use the grid for multi-aspect analysis of distributed, multiple data sources;
4. manage the grid by multi-level control techniques.
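A toy sketch of the four points above: mining agents are registered, arranged in layers, run over several data sources, and coordinated from a top-level controller. All class and function names here are hypothetical illustrations, not an existing Data Mining Grid framework.

```python
# Conceptual sketch only: hypothetical names, not an existing Data Mining Grid API.
from collections import Counter

class DataMiningGrid:
    def __init__(self):
        self.layers = {}                       # layer name -> list of (agent_name, fn)

    def register(self, layer, name, fn):       # 1. develop agents for targeted tasks
        self.layers.setdefault(layer, []).append((name, fn))

    def run(self, sources):                    # 3. multi-aspect analysis over sources
        results = {}
        for layer, agents in self.layers.items():   # 2. multi-layer organization
            for name, fn in agents:
                results[(layer, name)] = [fn(src) for src in sources]
        return results                         # 4. a top-level controller acts on these

grid = DataMiningGrid()
grid.register("usage",   "click_counts", lambda logs: Counter(e["page"] for e in logs))
grid.register("content", "page_set",     lambda logs: {e["page"] for e in logs})

web_logs = [[{"page": "home"}, {"page": "cart"}], [{"page": "home"}, {"page": "home"}]]
print(grid.run(web_logs))
```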
Fig. 2 A multi-level control intelligent infrastructure model (databases and warehouses and Internet Web services feed an AI-driven intelligent infrastructure with adaptive SQL and dynamic-language logic layers over the protocol stack, connected to the data grid, XML schema and compute grid)
So, the purpose of this paper is to study the pattern of the visited sites and the relations between them, as well as to analyze the visitors and possibly their motives for visiting the site, and to design a system that acts like humans based on their preferences and the wide variety of information available on the Web. Therefore, data will be collected about the visited sites, and the association between the pages visited will be studied to determine whether it is random or related.
3 Hypothesis
The visitor is known to the visited sites.
The visitor is new to the visited sites, but seems to be a savvy user (computer knowledgeable).
The visitor is new to the visited sites, but seems to be an ignorant user (computer illiterate).
The visitor is new to the visited sites, but seems to be a random user (no pattern in the visits).
Four groups will be formed:
• Recurrent user: is known to the visited sites and finds the information quickly.
• Savvy user: is new to the website, and knowledgeable enough to find the information through hyperlinks.
• Ignorant user: is new to the website, but computer illiterate; makes too many mistakes or clicks before reaching the right hyperlinks.
• Random user: has no strong intention to reach any links and no pattern among the choices; just wanders among pages.
4 Data Collection Multiple data sources are obtained from multiple customer touch points, including the Web, wireless webs, call centers, and brick-and-mortar store data. All will be integrated into a distributed data warehouse that provides a multifaceted view of the customers.
5 Conclusion It should be obvious from the description of the proposed model, and from the current trend of grid technology's move toward a Web services structure, that the two are converging. In simple terms, the grid is a distributed system for sharing resources, while this model is a distributed architecture most concerned with easy integration and simple, extensible, secure access. Both systems share a common problem: partial failures. This research could provide better scalable performance and/or improved data mining model quality by using social network theory, which is now significantly influencing search engine and portal development for the Web. It could also be compared to existing implementations of database-integrated mining kernels. Finally, the present approach can lead to greater flexibility in the deployment of a data mining application, leading to better overall workload management on enterprise data servers that support a complex mix of transactional, decision support, data management and data mining applications.
References 1. Yao, Y., Zhong, N., Liu, J., Ohsuga, S.: Web Intelligence (WI). In: Zhong, N., Yao, Y., Ohsuga, S., Liu, J. (eds.) WI 2001. LNCS(LNAI), vol. 2198, pp. 1–17. Springer, Heidelberg (2001) 2. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American 284, 34–43 (2001) 3. Alesso, H.P., Smith, C.F.: The Intelligent Wireless Web. Addison-Wesley, Reading (2000) 4. Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.: The Web and Social Networks. IEEE Computer Special Issue on Web Intelligence 35, 32–36 (2002) 5. Congiusta, A., Pugliese, A., Talia, D., Trunfio, P.: Designing Grid Services for Distributed Knowledge Discovery. Web Intelligence and Agent Systems 1(2) (2003) 6. Zhong, N., Yao, Y., Ohshima, M.: Peculiarity Oriented Multi-Database Mining. IEEE TKDE 15(4) (2003) 7. Friedman, N., Getoor, L.: Efficient Learning Using Constrained Sufficient Statistics. In: Proceedings of the 7th International Workshop on Artificial Intelligence and Statistic (1999) 8. Apte, C., Natarajan, R., Pednault, E., Tipu, F.: A Probabilistic Framework for Predictive Modeling Analytics. IBM Systems Journal 41(3) (2002) 9. Apte, C., Grossman, E., Pednault, E., Rosen, B., Tipu, F., White, B.: Probabilistic Estimation Based DataMining for Discovering Insurance Risks. IEEE Intelligent Systems 14 (1999) 10. Natarajan, R., Pednault, E.P.: Segmented Regression Estimators for Massive Data Sets. In: Proc. Second SIAM Conference on Data Mining, Crystal City, VA (2002) 11. Pednault, E.: Transform Regression and the Kolmogorov Superposition Theorem. IBM Research Report RC 23227, IBM Research Division, Yorktown Heights, NY 10598 (2004) 12. Fraley, C.: Algorithms for Model-Based Gaussian Hierarchical Clustering. SIAM J. Sci. Comput. 20, 270–281 (1998)
Method of Solving Matrix Equation and Its Applications in Economic Management Qingfang Zhang and June Liu*
Abstract. By changing the conditions of the linear system of equations, some important conclusions about the matrix equation are obtained. Furthermore, applications of the matrix equation in economic management are discussed. Keywords: Matrix equation, Solution of matrix equation, Economic management.
1 Introduction In production, management and commodity circulation, there are questions that can be expressed directly or indirectly by more than one system of linear equations. The matrix equation is a powerful tool to solve this kind of problem. Because of its concise algorithmic rules, the matrix equation makes linear relationships with many variables easier to handle. Furthermore, the matrix equation not only supplies the theoretical foundation, but also helps optimize programs and improve strategic ability in practice. Hence, we should attach importance to the study of the matrix equation. Many linear algebra books give satisfactory treatments of linear equations; unfortunately, research into applications of the matrix equation is limited. In this paper, the author further studies the solution of the matrix equation, with a preliminary probe into its application to related issues of economic management.
2 Preliminaries We denote: A: m × s matrix, B: m × n matrix, X, b: n × 1 matrices, E: identity matrix, R(A) = r: the rank of matrix A.
Lemma 1. Linear system of equations AX = b has solutions if and only if
R(A) = R(A, b) = r;
Qingfang Zhang · June Liu
College of Mathematical and Information Sciences, Huanggang Normal University, Huanggang 438000, China
[email protected]
(1) AX = b has a unique solution when r = s; (2) AX = b has infinite solutions when r < s.

Lemma 2. Let A = (a_ij) ≠ O be a matrix of order m. Then A is an invertible matrix if

Σ_{i≠j} |a_ij| < |a_jj|   (j = 1, 2, ..., m).
Replacing the column matrix b in Lemma 1 by the matrix B, we have

Theorem 1. The necessary and sufficient condition for the matrix equation AX = B to have solutions is R(A) = R(A, B) = r.
(1) AX = B has a unique solution when r = s; (2) AX = B has infinite solutions when r < s.

Proof. Let X = (X1, X2, ..., Xn), B = (B1, B2, ..., Bn); then the matrix equation AX = B is equivalent to the vector equations AXi = Bi (i = 1, 2, ..., n).
Since (A, B) = (A, B1, B2, ..., Bn) → (Ā, B̄1, B̄2, ..., B̄n), where Ā is the row simple matrix of A, we have (A, Bi) → (Ā, B̄i) (i = 1, 2, ..., n) by primary translation.
Suppose that R(A) = r; then Ā has r nonzero rows and the others are zero. Using Lemma 1, we have

AX = B ⇔ AXi = Bi (i = 1, 2, ..., n) ⇔ R(A) = R(A, Bi) (i = 1, 2, ..., n),
which implies that the last m − r elements of B̄i are zero (i = 1, 2, ..., n). In other words, R(A, B) = r = R(A). The matrix equation AX = B is equivalent to AXi = Bi (i = 1, 2, ..., n) according to the proof above. Hence AXi = Bi (i = 1, 2, ..., n) has a unique solution when r = s, and so does AX = B; and AXi = Bi (i = 1, 2, ..., n) has infinite solutions when r < s, in which case AX = B has infinite solutions.

Theorem 2. Let A = (a_ij)_{m×m} be a matrix with
a_ij ≥ 0 (i, j = 1, 2, ..., m),   Σ_{i=1}^{m} a_ij < 1 (j = 1, 2, ..., m).

Then (E − A)X = b has a unique solution and X = (E − A)⁻¹ b.
Proof. Σ_{i=1}^{m} a_ij = a_1j + a_2j + ... + a_mj < 1 (j = 1, 2, ..., m) shows

a_1j + a_2j + ... + a_{j−1,j} + a_{j+1,j} + ... + a_mj < 1 − a_jj,
i.e., Σ_{i≠j} a_ij < 1 − a_jj.
Since a_ij ≥ 0 (i, j = 1, 2, ..., m), it is obvious that a_ij = |a_ij| = |−a_ij|. Then

Σ_{i≠j} |−a_ij| < |1 − a_jj|

is obtained for the matrix E − A. Lemma 2 implies that E − A
is an invertible matrix and R(E − A) = m. By Lemma 1, the matrix equation (E − A)X = b has a unique solution and X = (E − A)⁻¹ b.
3 The Methods of Solving Matrix Equation
Not only the matrix equation (E − A)X = b but also AX = B can be solved as follows.

Case 1: r = s. First, (A, B) → [E_r  B1; O  O] by primary row translation; then AX = B and [E_r; O] X = [B1; O] have the same solution X = B1.

Case 2: r < s. First, (A, B) → [Ā  B̄; O  O] by primary row translation, where Ā is an r × s row simple matrix and B̄ is an r × n matrix. Let X = (X1, X2, ..., Xn), B̄ = (B̄1, B̄2, ..., B̄n); then the same-solution equation ĀXi = B̄i of AX = B has infinite solutions for every i, with s − r free variables. Suppose that X_0i is a special solution of ĀXi = B̄i (i = 1, 2, ..., n) and X̃ is the corresponding general homogeneous solution; then all solutions of ĀXi = B̄i are Xi = {X_0i + X̃} (i = 1, 2, ..., n), and all solutions of AX = B are

X = (X1, X2, ..., Xn) = (X_01 + X̃, X_02 + X̃, ..., X_0n + X̃).
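The procedure above can be reproduced numerically by row-reducing the augmented matrix (A B), for instance with SymPy's rref. The sketch below uses the data of Example 1 from the next section as a check; it is an illustration, not the authors' implementation.

```python
from sympy import Matrix

A = Matrix([[60, 64], [100, 48]])
B = Matrix([[4360, 2780], [4920, 3460]])

aug, _ = Matrix.hstack(A, B).rref()        # primary row translation of (A B)
X = aug[:, A.shape[1]:]                    # Case 1 (r = s): X sits to the right of E_r
print(X)                                   # Matrix([[30, 25], [40, 20]])
```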
4 Application Examples in Economic Management The matrix equation is a powerful tool for expressing relationships between many variables, and it can be applied to describe the world as a mathematical model. Some problems in economics can be solved by using matrix equations.
Table 1 The amounts will be sold

        A      B
  I     60     64
  II    100    48
Example 1. Two companies I and II both sell two products A and B. Table 1 shows the amounts to be sold in a year. What must the prices of the two products and the profit per product be in order to achieve the objectives in Table 2?

Table 2 Income and profit in a year

        Income (a year)   Profit (a year)
  I     4360              2780
  II    4920              3460
Now, suppose that the price and profit are as shown in Table 3.

Table 3 Price and profit

        Price   Profit
  A     X11     X12
  B     X21     X22
Let A = [60 64; 100 48] describe the amounts of products sold in a year, B = [4360 2780; 4920 3460] the income and profit in a year, and X = [X11 X12; X21 X22] the price and profit of one product, respectively. Then the problem can be solved by the matrix equation AX = B as follows:

(A B) = [60 64 4360 2780; 100 48 4920 3460] → (primary row translation) → [1 0 30 25; 0 1 40 20],

and X = [30 25; 40 20].

Example 2. Let I, II and III be three workshops. The expense coefficients of these three workshops in a production period are represented in Table 4. The total products of the three workshops are 235, 125 and 210, respectively. Now we compute the total value of output.
Table 4 Expense coefficients of the three workshops in a production period

         I      II     III
  I      0.25   0.10   0.10
  II     0.20   0.20   0.10
  III    0.10   0.10   0.20
Let A = [0.25 0.1 0.1; 0.2 0.2 0.1; 0.1 0.1 0.2] describe the expense coefficients, b = (235, 125, 210)ᵀ the total products of the three workshops, and X = (x1, x2, x3)ᵀ the total value of output. Then we have the system of equations

(1 − 0.25)x1 − 0.1x2 − 0.1x3 = 235,
−0.2x1 + (1 − 0.2)x2 − 0.1x3 = 125,
−0.1x1 − 0.1x2 + (1 − 0.2)x3 = 210,

and the matrix equation (E − A)X = b is obtained. Finally, the solution of the matrix equation is X = (400, 300, 350)ᵀ, by

E − A = [0.75 −0.1 −0.1; −0.2 0.8 −0.1; −0.1 −0.1 0.8] = A1

and

(A1 b) = [0.75 −0.1 −0.1 235; −0.2 0.8 −0.1 125; −0.1 −0.1 0.8 210] → (primary row translation) → [1 0 0 400; 0 1 0 300; 0 0 1 350].
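Example 2 can also be checked numerically by solving (E − A)X = b directly; this is only a verification sketch.

```python
import numpy as np

A = np.array([[0.25, 0.10, 0.10],
              [0.20, 0.20, 0.10],
              [0.10, 0.10, 0.20]])
b = np.array([235.0, 125.0, 210.0])
X = np.linalg.solve(np.eye(3) - A, b)      # (E - A) X = b
print(X)                                   # [400. 300. 350.]
```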
References 1. Guo, J., Wang, W.: The solutions of Matrix Equation AX=B. Journal of Science of Teachers College and University 7, 86–88 (2008) 2. Ma, H., Liu, X.: Structured Solutions to Matrix Equation XB=C. Journal of Applied Mathematics 1, 49–51 (2008) 3. Sheng, X., Chen, G.: Research to the Solution of Matrix Equation AXB=D. Journal of Lanzhou Universtiy 6, 101–104 (2006) 4. Wu, C.: Linear Algebra. Higher Education Press, Beijing (2003) 5. Yang, G.: Some Issues on Linear Equation. College Mathematics 8, 161–167 (2008) 6. Zheng, G.: A Block Matrix Solution to Matrix equation. College Mathematics 4, 124– 127 (2005) 7. Zhang, G.: Solving Matrix Equations by Primary Row Transformation of Matrix. College Mathematics 12, 117–120 (2003) 8. Zheng, H.: On Solving General Solutions of the Linear Matrix Equation. Journal of Mathematics for Technology 6, 83–86 (2002)
Efficient Feature Selection Algorithm Based on Difference and Similitude Matrix Weibing Wu, Zhangyan Xu, and June Liu*
Abstract. The feature selection algorithm based on the difference-similitude matrix method (DSM) is a good data mining method. However, because the D-matrix and S-matrix must be stored, the efficiency of the algorithm suffers seriously when massive data sets are considered. We therefore use the idea of the old algorithm to design a new feature selection algorithm that does not need to store the D-matrix and S-matrix. The complexity of the new algorithm is better than that of the old one. Finally, an example is used to illustrate the efficiency of the new algorithm. Keywords: Rough set, Feature selection, Difference matrix, Similitude matrix.
1 Introduction Reduction of pattern dimensionality via feature extraction and feature selection belongs to the most fundamental steps in data preprocessing [1-5]. Feature selection techniques aim at reducing the number of unnecessary, irrelevant, or unimportant features. It is common practice to use a measure to decide the importance and necessity of features. In 1992, Professor Skowron [6] proposed discernibility matrix methods to reduce the unnecessary attributes and select the necessary attributes of an information system. Since the frequencies of attributes appearing in the discernibility matrix reflect their ability to distinguish different classes, an attribute with higher frequency is preferred for selection. However, in rule reduction, the number of rules reduced with the discernibility matrix method may not be the least and the reduction result may not be the simplest set of rules, because similar characteristics are not revealed in the discernibility matrix. So Professor Yan et al. [7,8,9,10] proposed a new knowledge reduction method based on the difference-similitude matrix method (DSM for short). The new method can synthesize the
Weibing Wu · June Liu
College of Mathematics and Information Science, Huanggang Normal University, Huanggang, 438000, China
[email protected]
*
Zhangyan Xu Department of Computer, Guangxi Normal University, Guilin, 541004, China
[email protected]
similar rules into as few rules as possible. So the DSM is a better method for feature selection in rough sets. Professor Wu proposed an efficient algorithm for feature selection with DSM. The time complexity of this algorithm is O(|C|²|U|²). This algorithm must store the D-matrix and S-matrix, so its space complexity is O(|C||U|²), and we need about |C| × |U|² units for storing them. For example, if the numbers of objects and attributes are 10⁵ and 10² respectively, the D-matrix and S-matrix need about 10¹² units in the worst case. This is unacceptable. Therefore, the number of units the D-matrix and S-matrix require is a restriction when massive object sets are considered. To use a restricted space, we used the idea of radix sorting in [11] to design a new algorithm for feature selection based on DSM, but in the new algorithm we do not construct the D-matrix and S-matrix. The time and space complexity of the new algorithm are cut to O(|C||U|²) + O(|C|²|U|) and O(|U|) + O(|C|) respectively. So the new algorithm is better than the old algorithm.
2 The Old Algorithm Based on DSM In this section, we briefly introduce the old algorithm based on DSM [10], and briefly review the concepts in [10]. A decision table is defined as S = (U, C, D, V, f), where U is the set of objects, C is the set of condition attributes, D is the set of decision attributes, and C ∩ D = ∅; V = ∪_{a∈C∪D} V_a, where V_a is the value range of attribute a; f : (C ∪ D) × U → V is an information function which assigns one information value to each attribute of each object, that is, ∀a ∈ C ∪ D, x ∈ U, f(a, x) ∈ V_a holds. For ∀P ⊆ (C ∪ D) there is a binary indiscernibility relation IND(P):
IND(P) = {(x, y) ∈ U × U | ∀a ∈ P, f(a, x) = f(a, y)}.

The partition of U generated by IND(P) is denoted as U/IND(P) (for short U/P). If (x, y) ∈ IND(P), then x and y are indiscernible by the attributes from P. Any element [x]_P = {y | ∀a ∈ P, f(x, a) = f(y, a)} in U/P is called an equivalence class. Suppose there are n objects in S, and they are categorized into m categories by D. Let us denote the ith object as x_i, the union of objects in the jth category as U_j, and the value of attribute a of x_i as f(a, x_i). Difference sets: The difference set of x_i is denoted as S_D(x_i):
S_D(x_i) = {m_d | m_d = ∪{a | f(a, x_i) ≠ f(a, x_j) ∧ f(D, x_i) ≠ f(D, x_j)}, a ∈ C, j = 1, 2, ..., n}. The difference set of the jth category, denoted as S_D(U_j) (j = 1, 2, ..., m), is defined as:
S D (U j ) = {S D ( xi ) | xi ∈ U j } . The difference set of S, denoted as S D ( S ) , is defined as:
S D ( S ) = {S D ( xi ) | xi ∈ U } . The core of S, denoted as Core(C ) , is defined as:
Core(C) = {a | a ∈ C ∧ ∃ m_d = {a} ∈ S_D(x_i), x_i ∈ U}. Similitude sets: The similitude set of x_i is denoted as S_S(x_i):

S_S(x_i) = {m_s | m_s = ∪{a | f(a, x_i) = f(a, x_j) ∧ f(D, x_i) = f(D, x_j)}, a ∈ C, j = 1, 2, ..., n}. The similitude set of the jth category, denoted as S_S(U_j), is defined as:
S S (U j ) = {S S ( xi ) | xi ∈ U j } . The similitude set of S, denoted as S S ( S ) , is defined as:
S S ( S ) = {S S ( xi ) | xi ∈ U } . The important factor of attribute a, denoted as F(a) , is defined as below:
F(a) = Σ_{i=1}^{m} q(a, i) / |S_S(U_i)|²,
where q(a,i) denotes the frequency of attribute a appearing in S S (U i ) . The rank function of attribute a, denoted as R(a), is defined as below:
R(a) = N(a) × F(a),
where N(a) denotes the frequency of attribute a appearing in S_D(S). Only the attribute with the greatest rank value will be selected, and the rank values will be re-calculated after each selection. The following is the old algorithm [10].
Input: Object number: n; category number: m; condition attribute set: C; decision attribute set: D; attribute value set: V; information function: f.
Output: Reduct.
Initial state: Reduct = Null, Core(C) = Null.
Steps:
(1) Calculate the difference-similitude set of the whole decision table.
    Generate the difference set of the decision table S_D(S);
    for i = 1 to m do generate the difference set S_D(U_i);
(2) Find out the core attributes.
    Compute Core(C);
    Reduct ← Core(C);
    S_D(S) = S_D(S) − {m_d | m_d ∩ Core(C) ≠ ∅}; X = C − Core(C);
(3) Calculate F(a), the important factor of each attribute a in X.
(4) Generate the reduction result.
    (4.1) For each attribute a in X, compute R(a);
    (4.2) Choose the attribute a with the greatest R(a) in X;
    (4.3) S_D(S) = S_D(S) − {m_d | m_d ∩ {a} ≠ ∅};
    (4.4) Reduct ← Reduct + {a};
    (4.5) X = X − {a};
    (4.6) If |S_D(S)| ≠ 0 then go to step (4.1).
In this old algorithm, the D-matrix and S-matrix must be stored. The larger the data set, the worse the efficiency of the old algorithm, because the number of units needed to store the D-matrix and S-matrix becomes unacceptable for massive data sets. In the next section, we use the idea of the old algorithm, but we do not need to store the D-matrix and S-matrix.
3 The New Algorithm Based on DSM Because the core of the decision table is part of any reduction of the decision table, the core is computed first. To cut the space complexity of the new algorithm, we do not store the D-matrix and S-matrix. We first propose the algorithm for computing the core of the decision table.
Algorithm 1: computing the Core(C)
Input: Object set: U = {x1, x2, ..., xn}; condition attribute set: C = {c1, c2, ..., cr}; decision attribute set: D; attribute value set: V; information function: f.
Output: Core(C).
Initial state: Core(C) = ∅.
for (i = 1; i < n; i++)
  for (j = i + 1; j < n + 1; j++) {
    B ← ∅;
    for (k = 1; k < r + 1; k++) {
      if (f(ck, xi) ≠ f(ck, xj) ∧ f(D, xi) ≠ f(D, xj)) B ← B ∪ {ck};
      if (|B| > 1) break; }
    if (|B| == 1) Core(C) = Core(C) ∪ B; }
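Algorithm 1 translates directly into a pairwise scan. The sketch below runs it on the small decision table of Section 4 (objects written as value tuples); it is an illustration, not the authors' implementation, and it omits the early-break optimization.

```python
def core(objects, decisions, attrs):
    """Add attribute a to the core when two objects with different decisions
    differ on exactly that one condition attribute (as in Algorithm 1)."""
    result = set()
    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            if decisions[i] == decisions[j]:
                continue
            diff = [a for a, vi, vj in zip(attrs, objects[i], objects[j]) if vi != vj]
            if len(diff) == 1:
                result.add(diff[0])
    return result

# decision table of Section 4: condition attributes a..f and decision D
attrs = ["a", "b", "c", "d", "e", "f"]
rows = [(1,1,1,1,1,1,1),(1,1,1,2,2,2,1),(3,3,2,2,2,1,1),(1,2,1,1,2,1,1),
        (3,2,1,2,2,1,1),(2,1,1,1,1,1,2),(3,2,1,1,1,1,2),(3,3,2,1,2,1,2),
        (2,3,2,2,2,1,2),(1,1,2,1,2,1,2),(3,2,2,1,1,2,2),(1,2,1,2,1,2,2),
        (2,2,1,2,2,1,2),(2,1,2,1,1,2,2)]
objs = [r[:-1] for r in rows]
dec  = [r[-1] for r in rows]
print(core(objs, dec, attrs))        # {'a', 'd'}, as found in Section 4
```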
It is obvious that the time and space complexity of the algorithm for computing Core(C) are O(|U|²|C|) and O(|C|) in the worst case, respectively. In the old algorithm, the S-matrix is used to compute F(a) = Σ_{i=1}^{m} q(a, i)/|S_S(U_i)|². In fact, there is another method to compute F(a) that
does not use the S-matrix. Now we provide a new method to calculate F(a) without the S-matrix.
Algorithm 2: calculate F(a)
Input: Object set: U = {x1, x2, ..., xn}; category number: U/D = {U1, U2, ..., Um}; condition attribute set: C; decision attribute set: D; attribute value set: V; information function: f.
Output: F(a), ∀a ∈ C.
(1) For a, calculate the maximum and minimum of f(a, xj) (j = 1, 2, ..., n) and denote them M and m respectively;
(2) F(a) = 0;
(3) for (k = 1; k < m + 1; k++)
  (3.1) Use a static list to store the objects Uk = {xk1, xk2, ..., xkt} in turn; let the head pointer of the list point to xk1;
  (3.2) Construct M − m + 1 empty queues, and let front_k and end_k (k = 0, 1, ..., M − m) be the head pointer and tail pointer of the kth queue respectively. Distribute each object x of the list Uk to the (f(x, a) − m)th queue according to the element order of list Uk;
  (3.3) Calculate the number of objects in each non-empty queue, denoted t1, t2, ..., ts (t1 + t2 + ... + ts = t); update F(a) as follows: F(a) = F(a) + Σ_{i=1}^{s} t_i²/t².
It is obviously that the time and space complexity of the algorithm for calculating F(a) are O (| U |) and O (| U |) respectively. Now we only calculate N(a). It also has a new method to calculate N(a) without D-matrix. To calculate N(a), we first propose an efficient algorithm for computing U/R(R⊆C). Algorithm 3: calculate U/R(R⊆C) Input: Object sets: U = {x1, x2, , xn} ; condition attribute set: R = {r1, r2, , rl }; decision attribute set: D; attribute value set: V; information function: f. Output: U/R. (1) To each ri ( i = 1, 2, , l ) , calculate the maximum and minimum of f ( x j , ri ) ( j = 1,2, , n) and denote M i and mi respectively;
(2) Use static list to store the objects of the list point to x1 ; (3) for (i=1;i
x1 , x2 , , xn in turn ; let the head pointer
148
W. Wu, Z. Xu, and J. Liu
(3.1) The ith “distribution”: construct Mi-mi+1 empty queues, let frontk and
end k (k=0,1,…, Mi-mi ) be the head pointer and tail pointer of the kth queue respectively. Distribute the object x of the list U to the f ( x , ri ) − mi th queue according to the elements order of list U. (3.2) The ith “collection”: the head pointer of the list points to the head pointer of the first non-empty queue, modify the tail pointer of each non-empty and let it point to the head object of the next nonempty queue. In this way, recombine M i − mi + 1 queues to a new list; (4) Let the objects sequence of list from Step 3 be x1′, x2′ , t=1; Bt = {x1′} ;
, x′n ;
for (j=2;j
else { t=t+1; Bt = {x′j } ;} The time and space complexity of the algorithm 3 are O (| R || U |) and O ( | U |) respectively [11,12]. Now we give the following algorithm to calculate N(a). Algorithm 4: calculate N(a) Input: current reduction reduce(C) , U/ reduce(C ) = {R1 , R2 , , Rz } ; condition attribute set C; decision attribute set: D; attribute value set: V; information function: f. Output: N(a) , U/( reduce(C ) ∪a)= {R1′, R2′ , , Rz′′} , a ∈ C − reduce(C ) . (1) To a, calculate the maximum and minimum of f (a, x j ) ( j =1,2, ,n) and denote M and m respectively; (2) N(a)=0; (3) for ( k=1;k
i =1
ki
(3.2) If t≠1 we use the algorithm 3 to compute Rk /{a} = {Rk 1 , Rk 2 ,
, Rkt } ;
h = h1 − h2 ; if | Rki / D |== 1 , then delete Rki ( Rki ∈ Rk /{a } ); To the rest of Rkj ∈ R k /{ a } , calculate h2 = ∑t | Rki |(| Rki | −1)/2 . i=1
(3.3) N ( a ) = N ( a ) + h ; Notation: Any two objects of each Rk (k = 1,2,..., Z ) are indiscernible by attributes set {a,d}. But any two objects in different Ri and R j are discernible by attributes
set {a,d}. Since N(a) denotes the frequency of attribute a appearing in SD(S), according to the old algorithm, only two objects in each Rk (k = 1,2,..., Z ) may be obtained attribute a in SD(S). However, for any two objects in each Rki ∈ Rk / D can
Efficient Feature Selection Algorithm Based on Difference and Similitude Matrix
149
not be obtained attribute a because the value of decision attribute D of their are the same. According to the old algorithm, the frequencies of attribute a in SD(S)may be h1. On the other hand, to any two objects in each Rki ∈ Rk /{a} can not be obtained attribute a because the value of conditional attribute {a} of their are the same. So to Rk (k = 1,2,..., Z ) , the frequencies of attribute a in SD(S) is h1 –h2. The time and space complexity of the algorithm 4 are O (| U |) and O (| U |) in the worst case respectively according to the algorithm 3. Now, we can design a new as follow. Algorithm 5: efficient algorithm based on DSM Input: Object number: n; category number: m; condition attribute set: C; decision attribute set: D; attribute value set: V; information function: f. Output: Reduct. Initial state: Reduct=∅, Core(C ) =∅. (1) Calculate Core(C ) according to the algorithm 1; (2) Reduct= Reduct ∪Core(C ) ; X=C- Core(C ) ; (3) Calculate U / Core(C) according to the algorithm 3; (4) To ∀ a ∈ X , calculate according to the algorithm 2 and calculate N(a) according to the algorithm 4, R(a)= F(a)× N(a); (5) Choose attribute a with the greatest R(a) in X; (6) If | U / Reduct | ≠ 0 then go to step (4); Analysis of the complexity for the algorithm 5. The time complexity of the step 1 is O (| C || U |2 ) in the worst case. The time complexity of the step 3 is O (| C || U |) in the worst case. The time complexity of the step 4 is O (| X || U |) . So all the time complexity of the algorithm from the step 4 to the step 7 is O(| C || U |) + O((| C | −1) |U |) + + O(2| U |) + O(|U |) = O(| C |2|U |) in the worst case. So the time complexity of the new algorithm is O(| C || U |2 ) + O(| C |2|U |) . So it is better than the time complexity of the old algorithm with O(| C |2|U |2 ) . It is obviously that the space complexity of the new algorithm is O(| C |) + O(|U |) . It is also better than the space complexity of the old algorithm with O(| C || U |2 ) .
4 An Example The reduction on the following decision table will illustrate the reducing in detail. For an decision table is given in table 1[9]. According to the step 1 of the algorithm 5, we use the algorithm 1 to calculate the core of decision table 1. So Core(C)={a,d}. According to the step 3 of the algorithm 5, we can calculate U/{a,d} as follow. According to the step 3.1of Algorithm 3, there is :
front[0] → X1 → X 2 → X 4 → X10 → X12 ← end[0] ;
150
W. Wu, Z. Xu, and J. Liu
Table 1 Decision table 1
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14
a 1 1 3 1 3 2 3 3 2 1 3 1 2 2
b 1 1 3 2 2 1 2 3 3 1 2 2 2 1
c 1 1 2 1 1 1 1 2 2 2 2 1 1 2
d 1 2 2 1 2 1 1 1 2 1 1 2 2 1
e 1 2 2 2 2 1 1 2 2 2 1 1 2 1
f 1 2 1 1 1 1 1 1 1 1 2 2 1 2
D 1 1 1 1 1 2 2 2 2 2 2 2 2 2
front[1] → X 6 → X 9 → X 13 → X 14 ← end [1] ; front[2] → X 3 → X 5 → X 7 → X 8 → X11 ← end[2] ; According to the step 3.2 of Algorithm 3, there is:
X1→X2 →X4 →X10 →X12 →X6→X9→X13→X14 → X 3 → X 5 → X 7 → X 8 → X11; According to the step 3.1 of Algorithm 3, there is :
front[0] → X1 → X 4 → X10 → X 6 → X14 → X 7 → X 8 → X 11 ← end [0] ; front[1] → X 2 → X12 → X9 → X13 → X3 → X5 ←end[1]; According to the step 3.2 of Algorithm 3, there is
X1→X4 →X10 →X6 →X14 →X7 →X8→X11→X2 → X12 → X 9 → X13 → X 3 → X 5 According to the step 4 of Algorithm 3, there is U/{a,d}={{X1,X4,X10}, {X6, X14}, {X7,X8,X11},{X2,X12},{ X9,X13},{X3,X5}}. According to the step 4 of the algorithm 5, we can calculate R(b)= F(a)× N(a). We first use the algorithm 2 to calculate F(b). Now the calculation of F(b). The maximum and minimum of f (b, Xj) ( j =1,2, ,14) are M=3 and m=1 respectively according to the step 1 of the algorithm 2. To X 1 → X 2 → X 3 → X 4 → X 5 , according to the step 3.2 of Algorithm 2, there is: front[0] → X 1 → X 2 ← end [0] ; front[1] → X 4 → X 5 ← end [1] ; front[2] → X 3 ← end [2] ;
Efficient Feature Selection Algorithm Based on Difference and Similitude Matrix
151
According to the step 3.3 of Algorithm 2, there is t1 = t2 = 2, t3 = 1. So
F(b) = 0 + (22 + 22 +12 )/52 =9/25; To X 6 → X 7 → X 8 → there is:
→ X 14 , according to the step 3. 2 of Algorithm 2,
front[0] → X 6 → X 10 → X 14 ← end [0] ; front[1] → X 7 → X 11 → X 12 → X 13 ← end [1] ; front[2] → X 8 → X 9 ← end [2] ; According to the step 3.3 of Algorithm 2, there is t1 = 3, t 2 = 4 , t 3 = 2 . So
F(b) = 9/ 25 + (32 + 42 + 22 )/92 = 9/25 + 29/81 = 0.718 ;
Now we can use the algorithm 4 to calculate N(b). For the {X1,X4,X10}, we can get {X1,X4,X10}/D ={{X1,X4},{X10}}. So h1=2×1=2. And {X1,X4, X10}/{b}={{X1,X10},{X4}}. And {X4} is deleted. h2=2×1/2=1. Therefore N(b)= h1- h2=2-1=1. For the {X2,X12},there is h1=1 and h2=0. So N(b)= N(b)+1=2. For the {X6,X14},{X7,X8,X11}, {X9,X13} and {X3, X5}, according to the step 3.1 of the algorithm 4, they are deleted because their decision attribute values are the same. So
N(b)=2. And R(b)=0.72×2=1.44, U/{a,d,b}= {{X1,X10}}.
For the same reason, there are R(c)=1. 19×2=2.38, U/{a,d,c}={{X2,X12}}; R(e)=1.19×2=2.3,U/{a,d,e}={{X4,X10}};R(f)=1. 24×0=0, U/{a,d,f}={{X1,X4, X10}, {X2,X12}}. According to the step 5 of the algorithm 5, R(c) and R(e) are the maximum. So we can select one of c and e. In here, we select attribute c. So Reduct={a,d,c}. Because U/{a,d,c}={{X2,X12}}, the algorithm go to the step 4. At this time, R(b)=0.72, U/{a,d,c,b}=∅. R(e)=1. 19, U/{a,d,c,e}=∅. R(f)=0, U/{a,d,c,f}={{X2, X12 }}. In this time, we select attribute e. And Reduct={a,d,c,e}. Now U/{a,d,c,e}=∅, so algorithm is end. We can get a reduction of decision table 1 which is Reduct={a,d,c,e}.
5 Conclusions In this paper, for improving the complexity of algorithm for feature selection based on DSM, we proposed a new algorithm with the same idea of the algorithm in [10]. However, the new algorithm is not to store D-matrix and S-matrix, and the space complexity of the new algorithm is O(|U|). So it can drastically cut down the space complexity of the old algorithm with O(|C||U|2) in [10]. On the other hand, the time complexity of the new algorithm with O(|C2||U|)+ O(|C||U|2) is better than that of the old algorithm with O(|C|2|U|2). In the new algorithm, if the core is not calculated, the time complexity of the new algorithm is O(|C2||U|).
152
W. Wu, Z. Xu, and J. Liu
When the objects |U|>>|C|, if the core is not calculated, the algorithm is very good. So how to improve the efficiency of computing the core is our next work. Acknowledgments. This work is supported by the Education Department Foundation of GuangXi, China (200826) and the Scientific Research Foundation of Huanggang Normal University, China (06CB50) and the Doctor Grant of Guangxi Normal University.
References 1. Kudo, M., Sklansky, J.: Comparison of Algorithms that Select Features for Pattern Classifiers. Pattern Recognit. 33, 25–41 (2000) 2. Langley, P.: Selection of Relevant Feature in Machine Learning. In: Proceedings of the AAAI Fall Symposium on Relevance, pp. 140–144. IEEE Press, New York (1994) 3. Liu, H., Setiono, R.: A Probabilistic Approach to Feature Selection-A Filter Solution. In: Proceedings of the 13th International Conference on Machine Learning, pp. 319– 327. IEEE Press, New York (1996) 4. Zhong, N., Dong, J.Z., Ohsuga, S.: Using Rough Sets with Heuristics for Feature Selection. Journal of Intelligent Information Systems 16, 199–214 (2001) 5. Swiniarski, R.W., Skowron, A.: Rough Set Methods in Feature Selection and Recognition. Pattern Recognition Letters 24, 833–849 (2003) 6. Skowron, A., Rauszer, C.: The Discernibility Matrixes and Functions in Information Systems, pp. 331–362. Kluwer Academic Publishers, Dordrecht (1992) 7. Xia, D.L., Yan, P.L.: A New Method of Knowledge Reduction for Information System-DSM Approach. Wuhan University, Wuhan (2001) 8. Jiang, H., Yan, P.L., Xia, D.L.: A New Reduction Algorithm Difference-Similitude Matrix. In: Proceedings of the 2nd International Conference on Machine Learning and Cybernetics, pp. 1533–1537. IEEE Press, Xi’an (2003) 9. Wu, M., Xia, D. L., Yan, P. L.: Difference-Similitude Set Theory. Intelligent Computing: Theory and Applications III, SPIE, Orlando, pp. 1–11. IEEE Press, New York (2005) 10. Wu, M., Yan, P.L.: Feature Selection Based on Difference and Similitude in Data Mining. Wuhan University Journal of Natural Sciences 12, 467–470 (2007) 11. Yang, B.R., Xu, Z.Y., Song, W.: An Efficient Algorithm for Computing Core Based on Positive Region. In: Proceedings of the 2007 International Conference on Artificial Intelligence (ICAI 2007), Las Vegas, Nevada, USA, vol. I, pp. 124–132 (2007)
Exponential Stability of Neural Networks with Time-Varying Delays and Impulses Haydar Ak¸ca, Val´ery Covachev, and Kumud Singh Altmayer
Abstract. We present sufficient conditions for the uniqueness and exponential stability of equilibrium points of impulsive neural networks which are a generalization of Cohen-Grossberg neural networks. Keywords: Neural networks, delays, impulses, stability.
1 Introduction A neural network is a network that performs computational tasks such as associative memory, pattern recognition, optimization, model identification, signal processing, etc. on a given pattern via interaction between a number of interconnected units characterized by simple functions. From the mathematical point of view, an artificial neural network corresponds to a nonlinear transformation of some inputs into certain outputs. Haydar Ak¸ca Mathematical Sciences Department, Faculty of Sciences, United Arab Emirates University, P.O. Box 17551, Al Ain, UAE
[email protected] Val´ery Covachev Department of Mathematics & Statistics, College of Science, P.O. Box 36, Sultan Qaboos University, Muscat 123, Sultanate of Oman and Institute of Mathematics, Bulgarian Academy of Sciences, Sofia 1113, Bulgaria
[email protected],
[email protected] Kumud Singh Altmayer Department of Mathematical and Physical Sciences, Philander Smith College, Little Rock, AR 72202
[email protected]
In the present paper we consider a neural network model described by the following system of differential equations with time-varying delays and impulses:

du_i(t)/dt = −a_i(u_i(t)) + Σ_{j=1}^{n} w_ij f_j(u_j(t)) + Σ_{j=1}^{n} v_ij g_j(u_j(t − τ_ij(t))) + I_i,  t > t0, t ≠ tk, i = 1,n,
u_i(tk+) = J_ik(u_i(tk)),  i = 1,n,  k = 1, 2, 3, ...,   (1)
t0 < t1 < t2 < ··· < tk → ∞ as k → ∞,
where n ≥ 2 is the number of neurons in the network; ui (t) is the state potential of the i-th neuron at time t; i = 1, n means i ∈ {1, 2, . . . , n} or i = 1, 2, . . . , n; fj (·) and gj (·) are the normal and the delayed activation functions; wij and vij are the normal and the delayed synaptic connection weights from the j-th neuron on the i-th neuron; Ii are constant external inputs from outside the network to the i-th neuron; the functions ai (·) show how the neuron self-regulates or resets its potential when isolated from other neurons and inputs; τij (t) ≥ 0 are the time-varying transmission delays; tk (k = 1, 2, 3, . . .) are the instants of impulse effect. By a solution of (1) we mean u(t) = (u1 (t), u2 (t), . . . , un (t))T ∈ IRn , in which u(·) is piecewise continuous on [t0 , α) for some α > t0 such that u(tk +) and u(tk −) exist and u(·) is differentiable on the intervals of the form (tk−1 , tk ) ⊂ (t0 , α) and satisfies the differential equations in (1); we assume that u(t) is left continuous with u(tk ) = u(tk −); the left and right limits of u(·) are related by the impulse operators Jk : IRn → IRn , Jk (u) = (J1k (u1 ), J2k (u2 ), . . . , Jnk (un ))T . The model (1) is a generalization of Cohen-Grossberg neural networks first proposed by Cohen and Grossberg [3]. Special cases of this model are Hopfield-type neural networks with time-varying delays [5], cellular neural networks with time-varying delays [7, 8, 9, 10] and bi-directional associative memory neural networks with discrete delays [2]. Throughout the present paper, we will use the following assumptions. H1. The functions ai : IR → IR are continuous and there exist constants λi > 0 such that (x1 − x2 )[ai (x1 ) − ai (x2 )] ≥ λi (x1 − x2 )2 for all x1 , x2 ∈ IR and i = 1, n. H2. The transfer functions fi and gi are continuous and monotonically increasing. H3. The delays τij (t) are bounded, that is, there exists a constant b such that 0 ≤ τij ≤ b for all t = tk and i, j = 1, n. H4. The number i(t0 , t) = max{k ∈ IN : tk < t} of instants of impulse effect between t0 and t satisfies
lim sup_{t→+∞} i(t0, t)/t = p < +∞
and the impulse functions Jik are monotonically increasing. In our previous paper [1] ai (ui ) = ai ui , where ai (i = 1, n) are positive constants, fi ≡ 0, gi are Lipschitz continuous and monotonically increasing. Here the Lipschitz continuity of fi and gi will be replaced by a weaker property. However, in [1] we were able to obtain conditions for the existence of an equilibrium point of (1). Here we have to assume the existence and uniqueness of the solution of the initial value problem for (1), and the existence of an equilibrium point for (1). This paper is organized as follows. In Sect. 2, we introduce a more general time-delay impulsive system (2), (3) and present the necessary notations and concepts of the stability analysis of this system. Then we present sufficient conditions for the uniqueness and stability of equilibrium points of the impulsive system (2), (3). Finally, in Sect. 3 we apply the results obtained to the neural network with time-varying delays and impulses (1).
2 Notations and Preliminaries Here we follow [5, 6], adapting the approach expounded therein to impulsive systems. In order to study the stability of the general time-delay system, we rewrite system (1) as

du(t)/dt = F(u(t)) + G(uτ(t)),  t > t0, t ≠ tk,   (2)
u(tk+) = Jk(u(tk)),  k = 1, 2, 3, ...,   (3)
where Jk(u) = (J1k(u1), ..., Jnk(un))ᵀ and the operators Jik satisfy condition H4, u(t) = (u1(t), u2(t), ..., un(t))ᵀ is the state vector of the neural network, both F and G are continuous mappings from an open subset Ω of IRⁿ into IRⁿ, and G(uτ(t)) is defined by G(u) = (G1(u), G2(u), ..., Gn(u))ᵀ and Gi(uτ(t)) = Gi(u1(t − τi1(t)), u2(t − τi2(t)), ..., un(t − τin(t))), where the τij (i, j = 1,n) are the delays, which satisfy condition H3. Also, we suppose that the initial value problem for (2), (3) has a unique solution. Let IRⁿ be the n-dimensional real vector space with vector norm ‖·‖ defined as follows: if x = (x1, x2, ..., xn)ᵀ ∈ IRⁿ, then ‖x‖ = Σ_{i=1}^{n} |xi|.
Definition 1. Let Ω ⊂ IRn , f : Ω → IRn and x0 ∈ Ω. The minimal Lipschitz constant of f with respect to x0 is defined by
LΩ(f, x0) = inf{α > 0 : ‖f(x) − f(x0)‖ ≤ α‖x − x0‖, x ∈ Ω}.

Definition 2. Let Ω ⊂ IRⁿ, f : Ω → IRⁿ and x0 ∈ Ω. Then the constant

mΩ(f, x0) = sup_{x∈Ω\{x0}} ⟨f(x) − f(x0), sgn(x − x0)⟩ / ‖x − x0‖

is called the relative nonlinear measure of f with respect to x0 on Ω. Here ⟨·,·⟩ is the inner product in IRⁿ and sgn(x) = (sgn(x1), sgn(x2), ..., sgn(xn))ᵀ.

Remark 1. If f(x) = (f1(x1), f2(x2), ..., fn(xn))ᵀ and the fi (i = 1,n) are monotonically increasing, then

mΩ(f, x0) = sup_{x∈Ω\{x0}} ‖f(x) − f(x0)‖ / ‖x − x0‖.
We will also need the following simple assertion from calculus. Lemma 1. [5] If a > c ≥ 0, then, for each nonnegative real number b, the equation λ − a + ceλb = 0 has a unique positive solution. In fact, the left-hand side of this equation is a strictly monotonic function for λ ≥ 0, which is nonnegative at λ = a and negative at λ = 0. If u∗ is an equilibrium point of system (2), then it satisfies the equation F (u∗ ) + G(u∗ ) = 0.
(4)
If u∗ is an equilibrium point of the impulsive system (2), (3), then it satisfies (4) and u∗ = Jk(u∗), k = 1, 2, 3, ..., that is, u∗ is a fixed point of the impulse operators Jk.

Definition 3. The time-delay impulsive system (2), (3) is said to be exponentially stable on a neighbourhood Ω of an equilibrium point u∗ if there are two positive constants α and M such that

‖u(t) − u∗‖ ≤ M e^{−α(t−t0)} sup_{t0−b≤s≤t0} ‖u0(s) − u∗‖,  t ≥ t0,
where u(t) is the unique trajectory of the system initiated from u0 (s) ∈ Ω with s ∈ [t0 − b, t0 ]. Proposition 1. Let u∗ ∈ Ω be an equilibrium point of system (2). If mΩ (F + G, u∗ ) < 0, then u∗ is the unique equilibrium point in Ω.
Proof. Suppose that x∗ is another equilibrium point of system (2) in Ω. Then both u∗ and x∗ satisfy equation (4) and

0 > mΩ(F + G, u∗) = sup_{u∈Ω\{u∗}} ⟨F(u) + G(u) − F(u∗) − G(u∗), sgn(u − u∗)⟩ / ‖u − u∗‖
  ≥ ⟨F(x∗) + G(x∗), sgn(x∗ − u∗)⟩ / ‖x∗ − u∗‖ = 0,
which is a contradiction.

Let us introduce some notation:

d = sup_{k∈IN} mΩ(Jk, u∗),  D = max{sup_{k∈IN} mΩ(Jk⁻¹, u∗), 1},  ν = sup_{t>t0} (i(t0, t) − i(t0, t − b)).
From condition H4 it follows that ν < ∞.

Proposition 2. Let Ω ⊂ IRⁿ be a neighbourhood of an equilibrium point u∗ of the system (2), (3). If D < ∞ and for some matrix A = diag(a1, a2, ..., an) with ai > 0 we have

m_{A⁻¹(Ω)}(FA, A⁻¹u∗) + D^ν L_{A⁻¹(Ω)}(GA, A⁻¹u∗) < 0,   (5)

then by Lemma 1 the equation

λ min_{i=1,n} ai + m_{A⁻¹(Ω)}(FA, A⁻¹u∗) + D^ν L_{A⁻¹(Ω)}(GA, A⁻¹u∗) e^{bλ} = 0   (6)

has a unique positive solution λ. If

d < e^{λ/p},   (7)

that is, p ln d < λ, then system (2), (3) is exponentially stable on Ω. More precisely, for any λ̃ ∈ (0, λ − p ln d) there exists a constant M such that

‖u(t) − u∗‖ ≤ M e^{−λ̃(t−t0)} sup_{t0−b≤s≤t0} ‖u0(s) − u∗‖   (8)

for all t ≥ t0.
Proof. For any vector w ∈ IRⁿ we have ‖w‖ = ⟨w, sgn(w)⟩ and ‖w‖ ≥ ⟨w, sgn(z)⟩ for all z ∈ IRⁿ. Therefore, for any s ∈ IR, s > 0, we have

(1/s)(‖u(t) − u∗‖ − ‖u(t − s) − u∗‖) ≤ (1/s)⟨u(t) − u(t − s), sgn(u(t) − u∗)⟩.
So, from system (2) we have

d‖u(t) − u∗‖/dt ≤ ⟨du(t)/dt, sgn(u(t) − u∗)⟩
= ⟨F(u(t)) + G(uτ(t)), sgn(u(t) − u∗)⟩
= ⟨F(u(t)) − F(u∗), sgn(u(t) − u∗)⟩ + ⟨G(uτ(t)) − G(u∗), sgn(u(t) − u∗)⟩
= ⟨FA(A⁻¹u(t)) − FA(A⁻¹u∗), sgn(A⁻¹u(t) − A⁻¹u∗)⟩ + ⟨GA(A⁻¹uτ(t)) − GA(A⁻¹u∗), sgn(A⁻¹u(t) − A⁻¹u∗)⟩
≤ m_{A⁻¹(Ω)}(FA, A⁻¹u∗) ‖A⁻¹(u(t) − u∗)‖ + L_{A⁻¹(Ω)}(GA, A⁻¹u∗) sup_{t−b≤s≤t} ‖A⁻¹(u(s) − u∗)‖
≤ [m_{A⁻¹(Ω)}(FA, A⁻¹u∗) ‖u(t) − u∗‖ + L_{A⁻¹(Ω)}(GA, A⁻¹u∗) sup_{t−b≤s≤t} ‖u(s) − u∗‖] (min_{i=1,n} ai)⁻¹.
By virtue of (5), by Halanay's inequality [4] and taking into account the presence of impulses, we have

‖u(t) − u∗‖ ≤ e^{−λ(t−t0)} d^{i(t0,t)} sup_{t0−b≤s≤t0} ‖u(s) − u∗‖,   (9)
where λ is the unique positive solution of equation (6). In fact, for t ∈ [t0, t1] by Halanay's inequality we derive

‖u(t) − u∗‖ ≤ e^{−λ(t−t0)} sup_{t0−b≤s≤t0} ‖u(s) − u∗‖.
Now let t ∈ (tk, tk+1] for some k ∈ IN. In order to apply Halanay's inequality, we extend u(t) as a continuous function from (tk, tk+1] back to t0 − b as follows:

v(t) = u(t) for t ∈ (tk, tk+1];  v(t) = Jk Jk−1 ··· Jℓ u(t) for t ∈ (tℓ−1, tℓ], ℓ = 2, ..., k;  v(t) = Jk Jk−1 ··· J1 u(t) for t ∈ [t0 − b, t1].

If tk−μ−1 < t − b ≤ tk−μ < ··· < tk < t, then

sup_{t−b≤s≤t} ‖u(s) − u∗‖ ≤ D^μ sup_{t−b≤s≤t} ‖v(s) − u∗‖

and μ ≤ ν. Thus we have

d‖v(t) − u∗‖/dt ≤ [m_{A⁻¹(Ω)}(FA, A⁻¹u∗) ‖v(t) − u∗‖ + D^ν L_{A⁻¹(Ω)}(GA, A⁻¹u∗) sup_{t−b≤s≤t} ‖v(s) − u∗‖] (min_{i=1,n} ai)⁻¹.
Exponential Stability of Neural Networks
159
Now from Halanay’s inequality we have v(t) − u∗ ≤ e−λ(t−t0 )
sup
t0 −b≤s≤t0
v(s) − u∗ .
To derive (9) it suffices to notice that v(s) − u∗ ≤ dk u(s) − u∗ for s ∈ [t0 − b, t0 ]. Let ε > 0 be such that λ − (p + ε) ln d > 0. Then i(t0 , t) ≤ (p + ε)(t − t0 ) for all t large enough and there exists a constant M ≥ 1 such that i(t, t0 ) ≤ (p + ε)(t − t0 ) + ln M/ ln d for all t ≥ t0 . Then di(t0 ,t) ≤ M exp{[(p + ε) ln d](t − t0 )} ˜ = λ − (p + ε) ln d. and the desired estimate (8) follows with λ
3 Main Results Now, extending some results of [1, 5, 6], we present some sufficient conditions for uniqueness and exponential stability of the equilibrium of the impulsive network (1). In order to apply Propositions 1 and 2, we define F, G : IRn → n n IRn by Fi (u) = −ai (ui ) + wij fj (uj ) and Gi (u) = vij gj (uj ) + Ii . For j=1
j=1
Ω ⊂ IRn we denote by Ωi the projection of Ω on the i-th axis of IRn . Theorem 1. Let Ω be a neighbourhood of an equilibrium point u∗ = (u∗1 , u∗2 , . . . , u∗n )T of system (1), mi = mΩi (fi , u∗i ) and Mi = mΩi (gi , u∗i ). If ri (i = 1, n) are positive real numbers such that ⎫ ⎧ n n ⎬ 1 ⎨ rj r j |wji | + Dν Mi |vji | < 1, (10) max mi ⎭ r r i=1,n λi ⎩ j=1 i j=1 i then the equilibrium point u∗ of system (1) is unique in Ω. Proof. For each i = 1, n the transfer functions fi and gi are increasing, or equivalently fi (t) − fi (s) sgn(t − s) = |fi (t) − fi (s)|, gi (t) − gi (s) sgn(t − s) = |gi (t) − gi (s)| for all t, s ∈ R. Moreover, from condition H1 we have % % ai (t) − ai (s) sgn(t − s) = %ai (t) − ai (s)% ≥ λi |t − s|.
160
H. Ak¸ca, V. Covachev, and K.S. Altmayer
An equilibrium point u∗ of system (1) corresponds to a solution of the equation (4). Let us suppose that u and u∗ are two distinct solutions of (4) in Ω. Then for R = diag(r1 , r2 , . . . , rn ) we have 0 = R F (u) + G(u) − F (u∗ ) − G(u∗ ) , sgn(u − u∗ ) ⎧ n n ⎨ = ri − ai (ui ) − ai (u∗i ) + wij fj (uj ) − fj (u∗j ) ⎩ i=1 j=1 ⎫ n ⎬ vij gj (uj ) − gj (u∗j ) sgn(ui − u∗i ) + ⎭ j=1 ⎧ n n ⎨ % % % % |wij |%fj (uj ) − fj (u∗j )% ri −%ai (ui ) − ai (u∗i )% + ≤ ⎩ i=1 j=1 ⎫ ⎬ % % + |vij |%gj (uj ) − gj (u∗j )% ⎭ ≤−
n
ri λi |ui − u∗i | +
i=1
=−
n
n n ri |wij |mj |uj − u∗j | + |vij |Mj |uj − u∗j | j=1 i=1
ri λi |ui −
u∗i |
+
i=1
n j=1
&
mj
n
ri |wij | + Mj
i=1
n
' ri |vij | |uj − u∗j |
i=1
⎫ ⎬ rj |wji | − Mj rj |vji | |ui − u∗i | =− ri λi − mi ⎭ ⎩ i=1 j=1 j=1 ⎧ ⎤⎫ ⎡ n n n ⎬ ⎨ 1 ⎣ rj rj ri λi 1 − |wji | + Mj |vji |⎦ |ui − u∗i | < 0. =− mi ⎭ ⎩ λi r r i=1 j=1 i j=1 i ⎧ n ⎨
n
n
As in Proposition 1, the contradiction obtained proves the uniqueness of the equilibrium point u∗ of system (1) in Ω. Theorem 2. Let all assumptions of Theorem 1 hold. Suppose further that the unique positive solution λ of the equation 1 − 1 + Dν qeλb = 0 i=1,n pi
λ min with
pi = λi − mi
n rj j=1
ri
|wji |
Exponential Stability of Neural Networks
161
⎫ ⎧ n ⎬ ⎨M rj i q = max |vji | ⎭ r i=1,n ⎩ pi j=1 i
and
satisfies (7). If u(t) is the trajectory of system (1) initiated from u0 (s) ∈ Ω with s ∈ [t0 − b, t0 ], then max ri ∗
u(t) − u ≤ M e
i=1,n ˜ −λ(t−t 0)
min ri i=1,n
sup
t0 −b≤s≤t0
u0 (s) − u∗ ,
(11)
˜ ∈ (0, λ − p ln d). where λ Proof. We can first note that by virtue of Theorem 1 the equilibrium of system (1) is unique. By the change x = Ru system (1) takes the form ( d −1 x(t)) + RG(R−1 xτ (t)), t = tk , dt x(t) = RF (R (12) x(tk +) = RJk (R−1 x(tk )), k = 1, 2, 3, . . . It is easy to see that x∗ = Ru∗ is an equilibrium point of system (12) and mR(Ω) (RJk R−1 , Ru∗ ) = mΩ (Jk , u∗ ) ≤ d. Denote P = diag(p1 , p2 , . . . , pn ). From inequality (10) it follows that pi > 0, i = 1, n. As in Theorem 1, for all x ∈ P R(Ω) we have RF (R−1 P −1 x) − RF (u∗ ), sgn(x − P Ru∗ ) ⎫ ⎧ n n ⎬ ⎨ % % % % ∗ % ∗ % ≤ ri −%ai (ri−1 p−1 |wij |%fj (rj−1 p−1 i xi ) − ai (ui ) + j xj ) − fj (uj ) ⎭ ⎩ i=1 j=1 ⎫ ⎧ n n ⎬ ⎨ λ % % % % m i % j %xj − pj rj u∗j % xi − pi ri u∗i % + ≤ ri − |wij | ⎭ ⎩ ri pi rj pj i=1 j=1 ⎞ ⎛ n n % % r j ⎝λi − mi =− p−1 |wji |⎠ %xi − pi ri u∗i % i r i=1 j=1 i =−
n
% % ∗% ∗ % p−1 i pi xi − pi ri ui = −x − P Ru
i=1
and thus
mP R(Ω) (RF R−1 P −1 , P Ru∗ ) ≤ −1.
(13)
162
H. Ak¸ca, V. Covachev, and K.S. Altmayer
Further on, for all x ∈ P R(Ω) % % % n % n % % −1 −1 ∗ % %ri g v (r p x ) − g (u ) RG(R−1 P −1 x) − RG(u∗ ) = ij j j j j j % j % % i=1 % j=1 n n % % % % Mi rj %xj − pj rj u∗ % = |vij |Mj rj−1 p−1 |vji |%xi − pi ri u∗i % j j pi j=1 ri i=1 j=1 i=1 ⎫ ⎧ n n ⎬ ⎨M % % rj i %xi − pi ri u∗ % = qx − P Ru∗ , ≤ max |vji | i ⎭ ri i=1,n ⎩ pi
≤
n
ri
n
j=1
i=1
which implies that LP R(Ω) (RGR−1 P −1 , P Ru∗ ) ≤ q.
(14)
From inequalities (13) and (14) we deduce mP R(Ω) (RF R−1 P −1 , P Ru∗ ) + Dν LP R(Ω) (RGR−1 P −1 , P Ru∗ ) ≤ −1 + Dν q ) ') ( ( & n n Mi rj rj ν −1 ν = −1 + D max |vji | = max pi |vji | −pi + D Mi pi j=1 ri ri i=1,n i=1,n j=1 ( & ') n n rj rj −1 ν = max pi mi <0 |wji | + D Mi |vji | − λi ri ri i=1,n j=1 j=1
in view of inequality (10). Thus we can apply Proposition 2 to system (12) with A = P −1 and deduce the estimate ˜
x(t) − Ru∗ ≤ M e−λ(t−t0 )
sup
t0 −b≤s≤t
x0 (s) − Ru∗
for all t ≥ t0 . Since x = Ru, this yields estimate (11). Acknowledgements. The authors express their gratitude to the reviewer whose suggestions allowed them to considerably improve the quality of the paper, in particular, the proof of Proposition 2.
References 1. Ak¸ca, H., Alassar, R., Covachev, V.: Stability of Neural Networks with Time Varying Delays in the Presence of Impulses. Adv. Dyn. Syst. Appl. 1, 1–17 (2006) 2. Chen, A.P., Cao, J.D., Huang, L.H.: Exponential Stability of BAM Neural Networks with Transmission Delays. Neurocomputing 57, 435–454 (2004)
Exponential Stability of Neural Networks
163
3. Cohen, M., Grossberg, G.: Absolute Stability and Global Pattern Formation and Parallel Memory Storage by Competitive Neural Networks. IEEE Trans. Systems Man Cybernet. 13, 815–821 (1983) 4. Driver, R.D.: Ordinary and Delay Differential Equations. Springer, New York (1977) 5. Peng, J., Qiao, H., Xu, Z.: A New Approach to Stability of Neural Networks with Time-Varying Delays. Neural Networks 15, 95–103 (2002) 6. Song, X., Peng, J.: Stability of a Class of Time-Delayed Neural Networks with Non-Lipschitz Continuous Activation Functions. Neurocomputing (submitted) 7. Zhang, Q., Wei, X.P., Xu, J.: Global Exponential Convergence Analysis of Delayed Neural Networks with Time-Varying Delays. Phys. Lett. A 318, 537– 544 (2003) 8. Zhang, Q., Wei, X.P., Xu, J.: Global Asymptotic Stability Analysis of Neural Networks with Time-Varying Delays. Neural Process. Lett. 21, 61–71 (2005) 9. Zhang, Q., Wei, X.P., Xu, J.: Delay-Dependent Exponential Stability of Cellular Neural Networks with Time-Varying Delays. Chaos Solitons Fractals 23, 1363– 1369 (2005) 10. Zhou, D.M., Cao, J.D.: Globally Exponential Stability Conditions for Cellular Neural Networks with Time-Varying Delays. Appl. Math. Comput. 131, 487– 496 (2002)
Adaptive Higher Order Neural Networks for Effective Data Mining Shuxiang Xu and Ling Chen*
Abstract. A new adaptive Higher Order Neural Network (HONN) is introduced and applied in data mining tasks such as determining automobile yearly losses and edible mushrooms. Experiments demonstrate that the new adaptive HONN model offers advantages over conventional Artificial Neural Network (ANN) models such as higher generalization capability and the ability in handling missing values in a dataset. A new approach for determining the best number of hidden neurons is also proposed. Keywords: Neural network, Higher order neural network, Adaptive activation function, Data mining.
1 Introduction Data mining tools can answer many business questions that traditionally were too time-consuming to resolve. They search databases for hidden patterns, finding predictive information that business experts may overlook because it lies outside their expectations. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Another pattern discovery example is detecting fraudulent credit card transactions from collected data [1, 2]. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources. When implemented on high performance client/server or parallel processing systems, data mining tools can analyze massive databases to deliver answers to questions such as, "Which customers are most likely to buy this new product, and why?" [3, 4]. Data mining is usually supported by the following technologies: massive data collection, powerful multiprocessor computers, and Shuxiang Xu School of Computing and IS, University of Tasmania, Launceston, Tasmania 7250, Australia
[email protected] *
Ling Chen Information Services, Department of Health and Human Services, Hobart, Tasmania 7000, Australia H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 165–173. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
166
S. Xu and L. Chen
data mining algorithms. Whilst data collection is a relatively easy task and powerful computers as well as distributed systems are readily available, commonly used data mining algorithms include Artificial Neural Networks (ANNs), Decision Trees, Rule Induction, Nearest Neighbor Classification, and Cluster Analysis [2, 5]. Data mining is often applied to two separate processes: knowledge discovery and prediction. Knowledge discovery provides explicit information that has a readable form and can be understood by users. Forecasting, or predictive modeling provides predictions of future events and may be transparent and readable in some approaches (e.g. Rule Induction), but opaque in others such as ANNs. However, several recent research findings have shown that it’s possible to retrieve transparent rules from certain ANNs [35]. This paper addresses using ANNs for data mining, for the following reasons. First, although usually considered a black-box approach, ANNs are a natural technology for data mining. ANNs are non-linear models that resemble biological neural networks in structure and learn through training. ANNs present a model based on the massive parallelism and the pattern recognition and prediction abilities of the human brain. ANNs learn from examples in a way similar to how the human brain learns. Then ANNs take complex and noisy data as input and make educated guesses based on what they have learned from the past, like what the human brain does. Next, ANNs (especially higher order ANNs) are able to handle incomplete or noisy data [6, 7]. Finally, ANNs may hold superior predictive capability, compared with other data mining approaches [8-11]. While conventional ANN models have been able to bring huge profits to many businesses, they suffer from several drawbacks. First, conventional ANN models do not perform well on handling incomplete or noisy data [6, 7, 12]. Next, conventional ANNs can not deal with discontinuities (which contrasts with smoothness: small changes in inputs produce small changes in outputs) in the input training data set [8, 10, 13]. Finally, conventional ANNs lack capabilities in handling complicated business data with high order nonlinearity [10, 13]. To overcome these limitations some researchers have proposed the use of Higher Order Neural Networks (HONNs) [14, 15]. HONNs are networks in which the net input to a computational neuron is a weighted sum of products of its inputs (instead of just a weighted sum of its inputs, as with conventional ANNs). In [8] HONN models have been used in several business applications. The results demonstrate significant advantages of HONNs over conventional ANNs such as much reduced network size, faster training, as well as reduced forecasting errors. In [16] HONNs are used for data clustering which offer significant improvement when compared to the results obtained from using self-organising maps. In [17] global exponential stability and exponential convergence issues of HONNs are studied. In [10], HONNs have been used for dealing with non-linear and discontinuous financial time-series data, and are able to offer roughly twice the performance of conventional ANNs on financial time-series prediction. [13] employs HONNs for financial data automodeling. Their algorithms are further shown to be capable of automatically finding an optimum model, given a specific application. 
In [18] a HONN model is applied to the classification into age-groups of abalone shellfish, a difficult benchmark to which previous researchers have tried to handle using different ANN architectures.
Adaptive Higher Order Neural Networks for Effective Data Mining
167
Adaptive HONNs are HONNs with adaptive activation functions. Such activation functions are adaptive because there are free parameters in the activation functions which can be adjusted (in the same way as connection weights) to adapt to different problems. In [24], an adaptive activation function is built as a piecewise approximation with suitable cubic splines that can have arbitrary shape and allows them to reduce the overall size of the neural networks, trading connection complexity with activation function complexity. In [25], real variables a (gain) and b (slope) in the generalized sigmoid activation function are adjusted during learning process. A comparison with classical ANNs to model static and dynamical systems is reported, showing that an adaptive sigmoid (ie, a sigmoid with free parameters) leads to an improved data modeling. In this paper, Section 2 proposes a new adaptive HONN model with an adaptive activation function. Section 3 addresses the issue of optimizing the number of hidden layer neurons, one of the key issues yet to be resolved. Section 4 gives experiments to justify our new adaptive HONN model. Section 5 offers a summary of this paper and possible directions for further work in the future.
2 Adaptive HONNs HONNs were first introduced by [14]. The network structure of a three input second order HONN is shown below:
Fig. 1 Left, MLP (multi-layer perceptron) with three inputs and two hidden nodes; Right, second order HONN with three inputs
Adaptive HONNs are HONNs with adaptive activation functions. The network structure of an adaptive HONN is the same as that of a multi-layer ANN. That is,
168
S. Xu and L. Chen
it consists of an input layer with some input units, an output layer with some output units, and at least one hidden layer consisting of intermediate processing units (see next section on the number of hidden units). We will only use one hidden layer as it has been mathematically proved that ANNs with one hidden layer is a universal approximator [26]. Usually there is no activation function for neurons in the input layer and the output neurons are summing units (linear activation), the activation function in the hidden units is an adaptive one. Our adaptive activation function has been defined as the following:
Ψ( x ) = A1⋅ sin (B1⋅ x ) + A2 ⋅ e − B 2⋅x
2
(1)
where A1, B1, A2, B2, are real variables which will be adjusted (as well as weights) during training. Justification of the use of free parameters in the neuron activation function (2.1) can be found in [8, 25, 34]. In our experiments (Section 4) we use an HONN learning algorithm that is based on an improved steepest descent rule [8] to adjust the free parameters in the above adaptive activation function (as well as connection weights between neurons). We will see that such approach provides more flexibility and better data mining ability compared with more traditional approaches.
3 Optimizing the Number of Hidden Units for HONNs Optimizing the number of hidden layer neurons for an ANN to solve a practical problem remains one of the unsolved tasks in this research area. Setting too few hidden units causes high training errors and high generalization errors due to under-fitting, while too many hidden units results in low training errors but still high generalization errors due to over-fitting. It is argued that the best number of hidden units depends in a complex way on: the numbers of input and output units, the number of training cases, the amount of noise in the targets, the type of hidden unit activation function, the training algorithm, etc [27]. A dynamic node creation algorithm for ANNs is proposed in [28]. [29] proposes an approach which is similar to [28] but removes nodes when small error values are reached. In [30] an algorithm is developed to optimize the number of hidden nodes by minimizing the mean-squared errors over noisy training data. In this paper we propose an approach for determining the best number of hidden nodes based on [31], which reports that, using ANNs for function approximation, the rooted mean squared (RMS) error between the well-trained neural network and a target function f is shown to be bounded by ⎛ C 2f O⎜ ⎜ n ⎝
⎞ ⎟ + O⎛⎜ nd log N ⎞⎟ ⎟ ⎠ ⎝N ⎠
(2)
where n is the number of hidden nodes, d is the input dimension of the target function f, N is the number of training pairs, and Cf is the first absolute moment of the Fourier magnitude distribution of the target function f. The two important points of (2) are the approximation error and the estimation error between the
Adaptive Higher Order Neural Networks for Effective Data Mining
169
well-trained neural network and the target function. For this research we are interested in the approximation error which refers to the distance between the target function and the closest neural network function of a given architecture (which represents the simulated function). To this point, [31] mathematically proves that, with n ~ Cf (N/(d log N))1/2 nodes, the order of the bound on the RMS error is optimized to be O(Cf ((d/N) log N)1/2). Based on the above result, we can conclude that if the target function f is known then the best number of hidden layer nodes (which leads to a minimum RMS error) is n = Cf (N/(d log N))1/2
(3)
Note that the above equation is based on a known target function f. However, in most practical cases the target function f is not known, instead, we are usually given a series of training input-output pairs. In these cases, [31] suggests that the number of hidden nodes may be optimized from the observed data (training pairs) by the use of a complexity regularization or minimum description length criterion [32]. This is a criterion which reflects the trade-off between residual error and model complexity and determines the most probable model (in this research, the HONN with the best number of hidden nodes). Based on this, when f is unknown we use a complexity regularization approach to determine the constant C in the following n = C (N/(d log N))1/2
(4)
The approach is to try an increasing sequence of C to obtain different number of hidden nodes, train an ANN for each number of hidden nodes, and then observe the n which generates the smallest RMS error (and note the value of the C). The maximum of n has been proved to be N/d [32]. Please note the difference between the equation (3) and the equation (4): in (3), Cf depends on a known target function f, which is usually unknown (so (3) is only a theoretical approach), whereas in our approach as shown in (4), C is a constant which does not depend on any function. Based on our experiments conducted so far we have found that for a small or medium-sized dataset (with less than 10000 training pairs), when N/d is less than or close to 30, the optimal n most frequently occurs on its maximum, however, when N/d is greater than 30, the optimal n is close to the value of (N/(d log N))1/2.
4 Adaptive HONN Experiments Our first experiment is to use our adaptive HONN to process the Automobile Dataset from the UCI Machine Learning Repository [33]. This dataset is made of 205 instances, with 25 attributes (inputs) and 1 class attribute (output). This data set consists of three types of entities: (a) the specification of an auto in terms of various characteristics, (b) its assigned insurance risk rating, and (c) its normalized losses in use as compared to other cars. The second rating corresponds to the degree to which the auto is more risky than its price indicates. Cars are initially assigned a risk factor symbol associated with its price. Then, if it is more risky (or less), this symbol is
170
S. Xu and L. Chen
adjusted by moving it up (or down) the scale. Actuarians call this process "symboling". A value of +3 indicates that the auto is risky, -3 that it is probably pretty safe. The third factor is the relative average loss payment per insured vehicle year. This value is normalized for all autos within a particular size classification (two-door small, station wagons, sports, etc...), and represents the average loss per car per year. This third factor is considered the output attribute. The 26 attributes are: Attribute: Attribute Range. 1. symboling: -3, -2, -1, 0, 1, 2, 3. 2. normalized-losses: continuous from 65 to 256. 3. make: alfa-romero, audi, bmw, chevrolet, dodge, honda, isuzu, jaguar, mazda, mercedes-benz, mercury, mitsubishi, nissan, peugot, plymouth, porsche, renault, saab, subaru, toyota, volkswagen, Volvo. 4. fuel-type: diesel, gas. 5. aspiration: std, turbo. 6. num-of-doors: four, two. 7. body-style: hardtop, wagon, sedan, hatchback, convertible. 8. drive-wheels: 4wd, fwd, rwd. 9. engine-location: front, rear. 10. wheel-base: continuous from 86.6 120.9. 11. length: continuous from 141.1 to 208.1. 12. width: continuous from 60.3 to 72.3. 13. height: continuous from 47.8 to 59.8. 14. curb-weight: continuous from 1488 to 4066. 15. engine-type: dohc, dohcv, l, ohc, ohcf, ohcv, rotor. 16. num-of-cylinders: eight, five, four, six, three, twelve, two. 17. engine-size: continuous from 61 to 326. 18. fuel-system: 1bbl, 2bbl, 4bbl, idi, mfi, mpfi, spdi, spfi. 19. bore: continuous from 2.54 to 3.94. 20. stroke: continuous from 2.07 to 4.17. 21. compression-ratio: continuous from 7 to 23. 22. horsepower: continuous from 48 to 288. 23. peak-rpm: continuous from 4150 to 6600. 24. city-mpg: continuous from 13 to 49. 25. highway-mpg: continuous from 16 to 54. 26. price: continuous from 5118 to 45400. There are missing values in this dataset. Based on our new approach, the optimal number of hidden layer neurons for this experiment is n=8. For this experiment, the data set is divided into a training set made of 85% of the original set and a test set made of 15% of the original set. A validation set is not needed because the optimal number of hidden layer neurons has been determined. After the adaptive HONN (with 8 hidden layer units) has been well trained over the training data pairs, it is used to forecast over the test set. The correctness rate reaches 96.1%. To verify that for this example the optimal number of hidden layer neuron is 8, we try to apply the same procedure by setting the number of hidden layer neurons to 7, 9, and 11, which results in correctness rates of 81.2%, 83.2%, and 79.3% on the test set, respectively.
Adaptive Higher Order Neural Networks for Effective Data Mining
171
To verify the advantages of our adaptive HONN model we establish a conventional ANN with the sigmoid activation function (and one hidden layer) for the same experiments. With 8 hidden neurons and the same training set, the conventional ANN reaches a correctness rate of only 81.4% on the test set. After we change the number of hidden neurons to 7, 9, and 11, the correctness rates obtained are 70.5%, 72.1%, and 69.1%, respectively. These results seem to suggest that the adaptive HONN model holds better capability in generalization and in handling datasets with missing values. For our second experiment, a Mushroom Data Set is considered [33]. This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. There are 8124 instances with 22 attributes. There are missing values in this data set. Based on our approach, the optimal number of hidden layer neurons for this experiment is n=10. For this experiment, the data set is divided into a training set made of 80% of the original set and a test set made of 20% of the original set. Again a validation set is not needed because the optimal number of hidden layer neurons has been determined. After the adaptive HONN (with 10 hidden layer units) has been well trained over the training data pairs, it is used to forecast over the test set. The correctness rate reaches 95.3%. To verify that for this example the optimal number of hidden layer neuron is 10, we try to apply the same procedure by setting the number of hidden layer neurons to 9, 11, and 14, which results in correctness rates of 82.1%, 84.5%, and 78.8.9% on the test set, respectively. To verify the advantages of the proposed adaptive HONN model we use a conventional ANN with the sigmoid activation function (and one hidden layer) for the same experiments. With 10 hidden neurons and the same training set, the conventional ANN reaches a correctness rate of only 77.1% on the test set. After we change the number of hidden neurons to 9, 11, and 14, the correctness rates obtained are 70.5%, 72.1%, and 62.3%, respectively. These results also confirm that the adaptive HONN model holds better capability in generalization and in handling datasets with missing values, compared with conventional ANNs.
5 Summary and Discussions In this paper a new adaptive HONN model is introduced and applied in data mining tasks such as predicting average yearly losses for automobiles and determining edibility of mushrooms. Such model offers significant advantages over conventional ANNs such as more accurate predictions, and the ability of handling missing values in a dataset. A new approach for determining the best number of hidden nodes has been proposed. For future work, it would be a good idea to extend the research to involve large applications which contain training datasets of over 10000 input-out pairs. Further comparison studies between our adaptive HONN and other ANN approaches should also be conducted to demonstrate the advantages and disadvantages of each method.
172
S. Xu and L. Chen
References 1. Cios, K.J., Pedrycz, W., Swiniarski, R.W., Kurgan, L.A.: Data Mining: A Knowledge Discovery Approach. Springer, Heidelberg (2007) 2. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, Amsterdam (2006) 3. Masseglia, F., Poncelet, P., Teisseire, M.: Successes and New Directions in Data Mining. Information Science Reference (2007) 4. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. Elsevier/Morgan Kaufman (2005) 5. Bramer, M.: Principles of Data Mining. Springer, Heidelberg (2007) 6. Peng, H., Zhu, S.: Handling of incomplete data sets using ICA and SOM in Data Mining. Neural Computing & Applications 16, 167–172 (2007) 7. Wang, S.H.: Application Of Self-Organising Maps for Data Mining with Incomplete Data Sets. Neural Computing & Applications 12, 42–48 (2003) 8. Xu, S.: Adaptive Higher Order Neural Network Models and Their Applications in Business. In: Zhang, M. (ed.) Artificial Higher Order Neural Networks for Economics and Business, ch. XIV. IGI Global (2008) 9. Zhang, M., Xu, S.X., Fulcher, J.: ANSER: an Adaptive-Neuron Artificial Neural Network System for Estimating Rainfall Using Satellite Data. International Journal of Computers and Applications 29, 215–222 (2007) 10. Fulcher, J., Zhang, M., Xu, S.: Application of Higher-Order Neural Networks to Financial Time-Series Prediction. In: Artificial Neural Networks in Finance and Manufacturing, ch. V (2006) 11. Kohonen, T., Kaski, S., Lagus, K., Salojarvi, J., Honkela, J., Paatero, V., Saarela, A.: Self Organization Of A Massive Document Collection. IEEE Trans. Neural Networks 11, 574–585 (2000) 12. Dong, G., Pei, J.: Sequence Data Mining (Advances in Database Systems). Springer, Heidelberg (2007) 13. Zhang, M., Xu, S.X., Fulcher, J.: Neuron-Adaptive Higher Order Neural-Network Models for Automated Financial Data Modeling. IEEE Transactions On Neural Networks 13, 188–204 (2002) 14. Giles, L., Maxwell, T.: Learning Invariance and Generalization in High-Order Neural Networks. Applied Optics 26, 4972–4978 (1987) 15. Redding, N., Kowalczyk, A., Downs, T.: Constructive High-Order Network Algorithm That is Polynomial Time. Neural Networks 6, 997–1010 (1993) 16. Ramanathan, K., Guan, S.U.: Multiorder Neurons for Evolutionary Higher-Order Clustering and Growth. Neural Computation 19, 3369–3391 (2007) 17. Ho, D.W.C., Liang, J.L., Lam, J.: Global Exponential Stability of Impulsive HighOrder BAM Neural Networks with Time-Varying Delays. Neural Networks 19, 1581– 1590 (2006) 18. Abdelbar, A.M.: Achieving Superior Generalisation with a High Order Neural Network. Neural Computing & Applications 7, 141–146 (1998) 19. Chen, Y.H., Jiang, Y.L., Xu, J.X.: Dynamic Properties and a New Learning Mechanism in Higher Order Neural Networks. Neurocomputing 50, 17–30 (2003) 20. Cho, J.S., Kim, Y.W., Park, D.J.: Identification Of Nonlinear Dynamic Systems Using Higher Order Diagonal Recurrent Neural Network. Electronics Letters 33, 2133–2135 (1997)
Adaptive Higher Order Neural Networks for Effective Data Mining
173
21. Burshtein, D.: Long-Term Attraction in Higher Order Neural Networks. IEEE Transactions On Neural Networks 9, 42–50 (1998) 22. Psaltis, D., Park, C.H., Hong, J.: Higher Order Associative Memories and Their Optical Implementations. Neural Networks 1, 149–163 (1988) 23. Reid, M.B., Spirkovska, L., Ochoa, E.: Simultaneous Position, Scale, Rotation Invariant Pattern Classification Using Third-Order Neural Networks. Int. J. Neural Networks 1, 154–159 (1989) 24. Campolucci, P., Capparelli, F., Guarnieri, S., Piazza, F., Uncini, A.: Neural Networks with Adaptive Spline Activation Function. In: Proceedings of IEEE Melecon, vol. 96, pp. 1442–1445 (1996) 25. Chen, C.T., Chang, W.D.: A Feedforward Neural Network with Function Shape Autotuning. Neural Networks 9(4), 627–641 (1996) 26. Leshno, M., Lin, V.Y., Pinkus, A., Schocken, S.: Multilayer Feedforward Networks with a Nonpolynomial Activation Function Can Approximate Any Function. Neural Networks 6, 861–867 (1993) 27. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall, NJ (1999) 28. Ash, T.: Dynamic Node Creation in Backpropagation Networks. Connection Science 1, 365–375 (1989) 29. Hirose, Y., Yamashita, I.C., Hijiya, S.: Back-Propagation Algorithm Which Varies the Number Of Hidden Units. Neural Networks 4, 365–375 (1991) 30. Rivals, I., Personnaz, L.: A Statistical Procedure For Determining the Optimal Number of Hidden Neurons of A Neural Model. In: Second International Symposium on Neural Computation (2000) 31. Barron, A.R.: Approximation and Estimation Bounds for Artificial Neural Networks. Machine Learning 14, 115–133 (1994) 32. Barron, A.R., Cover, T.M.: Minimum Complexity Density Estimation. IEEE Transactions on Information Theory 37, 1034–1054 (1991) 33. UCI Machine Learning Repository (2008), http://archive.ics.uci.edu/ml/index.html 34. Xu, S., Zhang, M.: Justification of a Neuron-Adaptive Activation Function. In: IJCNN 2000 (CD-ROM) Proceeding (2000) 35. Malone, J., McGarry, K., Wermter, S., Bowerman, C.: Data Mining Using Rule Extraction From Kohonen Self-Organising Maps. Neural Computing & Applications 15, 9–17 (2006)
Exploring Cost-Sensitive Learning in Domain Based Protein-Protein Interaction Prediction Weizhao Guo, Yong Hu, Mei Liu, Jian Yin, Kang Xie, and Xiaobo Yang*
Abstract. Protein interactions are of great biological interest because they orchestrate nearly all cellular processes and can further our understandings in biological processes and diseases. Protein interaction data like many real world datasets are imbalanced in nature. Most protein pairs belong to the non-interaction class and few belong to the interaction class. Most existing protein interaction prediction methods assume equal distribution of the positive and negative interaction data. In this study, we first analyze effects of various portions of negative samples on the performance of domain-based protein interaction prediction methods using Artificial Neural Network (ANN), Bayesian Network (BN), and SVM. Then we introduce cost-sensitive learning to address the class imbalance problem. Experimental results demonstrated that the addition of cost-sensitive learning to each classifier: ANN, BN, and SVM, indeed yields an increase in accuracy. Keywords: Cost-sensitive learning, Imbalance data, Protein-protein interactions. Weizhao Guo . Jian Yin School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510275, China
[email protected],
[email protected] *
Yong Hu Business Intelligence and Knowledge Discovery, Guangdong University of Foreign Studies, Sun Yat-sen University, Guangzhou 510275, China
[email protected] Mei Liu Bioinformatics and Computational Life-Sciences Laboratory, ITTC, Department of Electrical Engineering and Computer Science, The University of Kansas, 1520 West 15th Street, Lawrence, KS 66045, U.S.A.
[email protected] Kang Xie School of Business, Sun Yat-sen University, Guangzhou 510275
[email protected] Xiaobo Yang The 2nd Affiliated Hospital, Guangzhou University of TCM, Guangzhou 510120, China H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 175–184. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
176
W. Guo et al.
1 Introduction Proteins are essence of life because they orchestrate virtually every process and function in a living cell. The multiplicity of functions that proteins execute in most cellular processes and biochemical events is attributed to their interactions with other proteins. Protein-protein interaction (PPI) information can contribute to ANNotations of uncharacterized proteins according to the classification of known proteins within interaction network. Moreover, information about these protein interactions can improve our understanding in diseases and directly contribute to new therapeutic approaches. In a simple yeast cell, there are about 6,000 proteins which consequently results in an enormous pool of all possible protein interaction pairs to analyze. It is thus impractical to rely on experimental techniques for obtaining a complete protein interaction map. Over the years, research in seeking complementary in silico methods that are capable of accurately predicting interactions has been actively sought. These researches have focused on machine learning techniques such as artificial neural network (ANN) [1], Bayesian network [2,3], and SVM [4-7]. Despite the preliminary successes achieved, these machine learning techniques may perform poorly in learning from imbalanced data, in which one class is represented by a large number of examples while the other is represented by only a few. The reason is that traditional classifiers seek to maximize classification accuracy over a full range of instances. High predictive accuracy may be produced for the majority class but poor predictive accuracy is produced for the minority class, which is usually the important class. Protein interaction prediction problem is a typical imbalanced data classification problem. There exist much more protein pairs that do not interact than the ones that do interact. It is apparent that failing to identify one of the few true interacting protein pairs is much worse than inaccurately classifying one of the many non-interacting pairs as interacting. For instance, consider two classifiers A and B trained to recognize a class of histones consisting 15 genes. Assume that on a test set of 1000 genes, classifier A identified 30 genes, in which 14 are correct. Classifier B, on the other hand, identified everything in the test set as negative. Undoubtedly, the classifier A has learned something regarding histones, whereas the classifier B has learned nothing. Most existing computational methods bypassed the class imbalance problem by assuming equal number of interaction and non-interaction data samples. In this research, we address the class imbalance problem in protein interaction prediction by employing cost-sensitive learning. Protein interaction prediction is formulated as a classification problem based on their constituent domain information. Protein domains are structural and/or functional units of proteins that are conserved through evolution to represent protein functions or structures. Proteins interact with each other through their domains is a well accepted concept. In this study, three classification algorithms, Artificial Neural Network (ANN), Bayesian network (BN), and SVM are analyzed using different proportions of positive and negative samples. Performances of the classifiers before and after utilizing costsensitive learning are compared. Our experimental results showed that the prediction accuracy can be improved accordingly by the addition of cost-sensitive learning.
Exploring Cost-Sensitive Learning in Domain
177
The paper is organized into four sections. Section 2 introduces feature representation used in protein interaction prediction and cost-sensitive learning that can be applied to different classification algorithms to improve the prediction accuracy. Experimental results and discussion are presented in Section 3 and conclusions are drawn in Section 4.
2 Methods 2.1 Domain Feature Representation In our study, protein-protein interaction (PPI) prediction problem is formulated as a two-class classification problem. Each protein pair is a sample classified as either interacting or non-interacting, where the interacting class is labeled as 1 and the non-interacting class is labeled as 0. A protein pair can be characterized by the domains existing in each protein. Thus, a protein pair is represented as a vector of n features where each feature corresponds to a unique Pfam domain in our data set [8]. Let X = [x1, x2 … xn, y] represents a sample with n feature attributes belonging to the class y which equals to either 0 for non-interacting or 1 for interacting. Each feature xi has a discrete value of 0, 0.5 or 1. The feature value xi = 0 means that neither proteins in the protein pair contain the corresponding domain. Similarly, xi = 0.5 means that one protein in the protein pair contains the domain, and xi = 1 means that both proteins in the pair contain the domain. This ternary-valued feature representation is different from other domain-based binary-valued protein methods. It allows us to distinguish between protein pairs with interaction evidence existing in one protein and those existing in both proteins [9].
2.2 Cost-Sensitive Learning Recent research in machine learning algorithms strives to minimize various errors by considering unequal costs of those errors. This is the case in many real-world applications. For example, diagnosing a healthy person mistakenly to be sick may be less serious than diagnosing a patient erroneously to be healthy because the latter may result in loss of a life. Thus, the inclusion of costs of errors into learning algorithms has been suggested as a potential solution for the class imbalance problem [11]. In Section 2.1, X = [x1, x2, …, xn, y] represents a sample with n feature attributes xi belonging to the class y, where y has a discrete value of 0 and 1. Therefore for a two-class case, the cost matrix has the following structure in Table 1[11]. Table 1. Two-class Cost Matrix
Actual Class
Negative Positive
Predicted Class Negative C(0, 0) = C00 C(1, 0) = C10
Positive C(0, 1) = C01 C(1, 1) = C11
178
W. Guo et al.
In Table 1, the cost matrix columns correspond to the predicted class while rows represent the actual class. C01 stands for the cost of false positives and C10 stands for the cost of false negatives. Similarly, C11 stands for the cost of true positives and C00 for the cost of true negatives. Normally, the values of C00 and C11 are both 0. C01+C10 represent the cost of error for misclassification while C11+C00 represent the cost for correct classification. In our application, the purpose of cost-sensitive learning is to train a more adaptive classifier by adjusting diverse costs of misclassification errors, so that the classifier concentrates on the misclassified protein pairs during every training iteration. Considering a two-case classification problem, the cost of a protein pair X predicted as the class I is W(X, I) where I = 0 or 1. Accordingly, the key of obtaining the optimal prediction of the protein pair X is to minimize the cost W(X, I). Assume that the probability of the protein pair X being in class J is P(J|X) where J = 0 or 1, the cost W(X, I) can be computed by the following formula:
W ( X , I ) = min( P (0 | X )C ( I ,0) + P (1 | X )C ( I ,1))
(1)
Protein interaction prediction is a typical class imbalance problem where the number of ‘non-interacting’ proteins is much more than that of ‘interacting’ proteins. Traditional classifiers do not work well with imbalance data. Therefore, we expect the prediction accuracy of the classifiers to increase with the addition of costsensitive learning. In this paper, three classification algorithms, Artificial Neural Network (ANN) [1], Bayesian network (BN), and SVM [10], are employed. ANN is a biologically inspired learning method that imitates the brain made up of closely interconnected set of neurons. It takes a number of real-valued inputs but produces only one single-valued output and adjusts the connected weights based on the output values obtained for a specific input instance. BN can combine highly irrelevant types of protein instances and converts the different types of data to a common probabilistic framework. SVM is based on structural risk minimization that provides principle means to minimize the predicted error by adjusting kernel function. It is readily adaptable to new types of data and trains mass data fast [4]. Over the years, there has been a growing interest in the application of SVM to biological problems such as prediction of protein-protein interaction [4-7]. In the next section, we analyze the effects of negative samples in domain-based protein interaction prediction by changing the proportion of negative and positive examples and compare performances of different classifiers before and after utilizing the cost-sensitive learning.
3 Experiments 3.1 Data Sources Protein-protein interaction data for the yeast organism was collected from the database of interacting proteins (DIP) [12], Deng et al. [13], and Schwikowski et al. [14]. The dataset used by Deng et al. [13] is a combined interaction data experimentally obtained through two-hybrid assays on Saccharomyces cerevisiae by
Exploring Cost-Sensitive Learning in Domain
179
Uetz et al. [15], Schwikowski et al. [14] gathered their data from yeast two-hybrid, biochemical, and genetic data. We obtained 15,409 interacting protein pairs for the yeast organism from DIP, 5,719 pairs from Deng et al. [13], and 2,238 pairs from Schwikowski et al. [14]. The data sets were then combined by removing the overlapping interaction pairs. Because domains are the basic units of protein interactions, proteins without domain information cANNot provide any useful information for our prediction. Therefore, we excluded the pairs where at least one of the proteins has no domain information. To further reduce noise in our data, pairs with both proteins containing only one domain, which only occurred once among all proteins, were also excluded. Finally, we have 9,834 protein interaction pairs among 3,713 proteins. Negative samples are generated by randomly picking a pair of proteins. A protein pair is considered to be a negative sample if the pair does not exist in the interaction set. The protein domain information was gathered from Pfam [8], a protein domain family database, which contains multiple sequence alignments of common domain families. Hidden Markov model profiles were used to find domains in new proteins. The Pfam database consists of two parts: Pfam-A and Pfam-B. Pfam-A is manually curated, and Pfam-B is automatically generated. Both Pfam-A and Pfam-B families are used here. In total, there are 4,293 Pfam domains defined by the set of proteins.
3.2 Evaluation Measures Three classification methods [1,10] for the protein-protein interaction prediction problem, namely ANN, BN, and SVM, are evaluated by computing sensitivity (SN), specificity (SP) and accuracy (AC). SN is the proportion of matched interactions over total number of observed interactions. SP is defined as the percentage of matched non-interactions over total number of observed non-interactions. AC is the percentage of all protein pairs that are correctly classified by the classifier. Detailed formulas for SN, SP and AC are listed below. Here, TP, TN, FP and FN stand for true-positive, true-negative, false-positive and false-negative, respectively. SN =
TP TP + FN
(2)
SP =
TN TN + FP
(3)
TP + TN TP + TN + FP + FN
(4)
AC =
3.3 Experiment Result 3.3.1 Preprocessing There exists vast amount of noise in protein-protein interaction dataset. Especially, we have tremendous amount of features (i.e. 4293) to consider. Intuitively, not all features are equally important. In contrast to most protein interaction prediction
180
W. Guo et al.
algorithms, data preprocessing is utilized here such that only a set of more important features are used for learning. We implemented entropy and information gain based feature selection to eliminate the noise. In our application, we define entropy as H(x)=-∑Pi Pi, where Pi is the probability of i-th message. For a set of protein pairs, we compute the information expectation using the formula:
㏒
E = p log p + (1 − p) log(1 − p)
(5)
Where p stands for the probability obtained by computing how often one class occurs in the PPI data set and (1-p) for the other class. In Section 2.1, Let X = [x1, x2, …, xn, y] represent a sample with n feature attributes xi which has a discrete value of 0, 0.5 or 1. Therefore, each input attribute xi divides the set of protein pairs into three subsets {Li1, Li2, L i3}. We can get the probability pij for each subset Lij by computing how often each subset Lij occurs, respectively (Note: j=1, 2, 3). Then, the entropy H(Lij) for each subset Lij is computed. At last, the information gain for each input attribute xi is obtained by the formula below: 3
InformationGain( xi ) = E − ∑ pij H ( Lij ) j =1
(6)
Attributes with low information gain are eliminated because they are identified as redundant noise. Based on feature selection described above, we selected 1426 attributes with high information gain from the original 4293 attributes described in section 3.1. Now the protein pairs in our dataset is processed to only consider the selected features and is used to train our model. 3.3.2 Training In order to predict protein-protein interactions, we need to train our model first. After data preprocessing by feature selection, 9834 samples (4917 positives and 4917 negatives) with 1426 feature attributes are utilized to train different models. To construct the SVM model, we use the linear kernel to find a separating hyper plane which is approximated by minimizing the prediction error. In our application, the kernel tries to find more common domains between different pairs of PPIs. The more common domains the pairs of PPIs have, the larger the similarity value is. For instance, let X and Y represent two different PPI samples with n feature attributes and the kernel value K(X, Y) is used to measure the similarity between X and Y in the feature space, the larger the similarity values are, the more likely X and Y are classified into the same class. For our artificial neural network model, the radius basis function (RBF) is employed. Like other ANN methods, it has three layers: input, hidden and output. Different layers have different functions. For example, each node in the input layer represents a feature of an input instance and the output for a given sample depends on the similarity between its domain features and domain features in other protein pairs. The hidden layer is what makes the network nonlinear through hidden neurons and is a bridge that links the input layer and the output layer. ANN
Exploring Cost-Sensitive Learning in Domain
181
predicts a protein pair to be interacting if the value assigned by the output layer is larger than or equal to a certain threshold. At last, the BN model is constructed by computing the Bayes equation (7). Assume that Dj stands for each class in the training set and H refers to the sample to be classified. We can estimate the prior probability P(Dj) by computing how often each class Dj occurs in the training data. Similarly, P(H|Dj) can be obtained by computing how often H occurs in each class for the training data. P(H) is estimated by computing ∑P(H|Dj)P(Dj). Finally, we obtain the posterior probability P(Dj|H) for each class by formula (7). By comparing the value P(Dj|H) for each class, we can classify the sample H as the class with the highest posterior probability P(Dj|H). P( D j | H ) =
P ( H | D j ) P( D j )
(7)
P( H )
3.3.3 Testing Protein-protein interaction predictions are made using the models constructed above. First, we compared different classification methods, SVM, ANN and BN, using 5-fold cross-validation with equal number of positive and negative data samples. Here, the output of three methods is a real number between 0 and 1. Thus, we can set a certain threshold and if the output for a protein pair is greater than or equal to the threshold, the protein pair is predicted as interacting. And the threshold can be changed to produce ROC curves (Figure 1). As clearly shown in Table 2 Predicted results comparing SVM, ANN and BN
Specificity (SP) Sensitivity (SN) Accuracy (AC)
SVM 68.4% 76.8% 72.6%
ANN 67.6% 77.6% 72.6%
Fig. 1 The ROC comparison for SVM, ANN and BN
BN 67.6% 79.4% 73.5%
182
W. Guo et al.
Table 2, for our domain-based protein interaction predictions, Bayesian network outperforms both SVM and ANN. When we fix the specificity at approximately the same level 68% across different classification models, the sensitivity are 76.8%, 77.6% and 79.4% for SVM, ANN and BN respectively. Meanwhile, the accuracies are 72.6%, 72.6% and 73.5%.
3.4 Effects of Adding Cost-Sensitivity In this section, we address the class imbalance problem in protein interaction prediction by changing the proportion of positive and negative samples and employing cost-sensitive learning. Here, we compare performances of the classifiers before and after utilizing cost-sensitive learning. All three classification algorithms, BN, ANN, and SVM are analyzed using different proportions of positive and negative samples (Figure 2). In this section, we validate our result based on the test dataset. In section 2.5, cost matrix for a two-class case was introduced. In this section, cost matrix is represented as [C00 C01; C10 C11]. For example, in the cost matrix [0.0 0.95; 1.05 0.0], 0.95 stands for the cost of false positive and 1.05 for the cost of false negative. Similarly, 0.0 stands for the cost of true positive and the cost of true negative, respectively. Now, we compare the effects of adding cost-sensitivity for BN, SVM and ANN using different proportions of positive and negative samples below. It can be observed that the accuracies with the addition of costsensitive learning are adaptively improved comparing to those without cost-sensitive learning. Table 3 compares the performances for the classification method BN before and after utilizing cost-sensitive learning as the proportion of positive and negative training samples is varied. When the proportion of positive and negative training examples is 1000: 2000, 1000: 3000 and 1000: 4000, the addition of cost matrix, [0.0 0.90; 1.25 0.0], yields an increase in accuracy.
1XPEHU
Fig. 2 Setting different number of positive and negative training examples
3RVLWLYH
1HJDWLYH
Table 3 Comparing the results for BN before and after utilizing cost-sensitive learning
Size (positive: negative) 1000: 2000 1000: 3000 1000: 4000
Without/ With the addition of cost-sensitive SP SN AC 92.3%/89.1% 30.4%/39.8% 61.3%/64.2% 93.9%/92.2% 27.1%/32.8% 60.4 %/62.5% 94.5%/93.0% 25.4%/30.1% 59.9%/61.6%
Exploring Cost-Sensitive Learning in Domain
183
Table 4 Comparing the results for SVM before and after utilizing cost-sensitive learning
Size (positive: negative) 1000: 2000 1000: 3000 1000: 4000
Without/ With the addition of cost-sensitive SP SN AC 86.8%/84.4% 42.9%/46.5% 64.8%/65.5% 91.5%/87.7% 33.2%/40.9% 62.3%/64.2% 93.9%/89.3% 26.2%/36.8% 60.0%/63.0%
Table 4 compares the performances for SVM before and after utilizing costsensitive. When the proportion is 1000: 2000, adding cost matrix [0.0 0.95; 1.05 0.0] improves the accuracy. When 1000: 3000, the corresponding cost matrix is [0.0 0.95; 1.25 0.0]. When 1000: 4000, the corresponding cost matrix is [0.0 0.90; 1.25 0.0]. Table 5 compares the classification result for ANN before and after using costsensitive learning. When the proportions of positive and negative examples are 1000: 2000, the accuracy increases with the addition of [0.0 0.95; 1.25 0.0]. When 1000: 3000 and 1000: 4000, the corresponding cost matrix is [0.0 0.90; 1.25 0.0]. Table 5 Comparing the results for ANN before and after utilizing cost-sensitive learning
Size (positive: negative) 1000: 2000 1000: 3000 1000: 4000
Without/ With the addition of cost-sensitive SP SN AC 88.1%/82.7% 40.1%/49.7% 64.1%/66.2% 89.7%/89.0% 30.6%/39.0% 60.2%/64.0% 94.4%/89.3% 25.4%/32.5% 59.9%/60.9%
4 Conclusion In this paper, we first used feature selection based on entropy and information gain to eliminate existing noise in our data set. The useful features are used to train our models and can obtain the classification result more quickly and more accurately. Then we addressed class imbalance problem in protein interaction prediction by applying cost-sensitive learning. Three classification algorithms, ANN, BN and SVM are analyzed in domain-based protein-protein interactions prediction and we compared the classification results for the three algorithms before and after employing cost-sensitive learning as the proportion of positive and negative samples are varied. Experimental results suggested that adding cost-sensitive learning to each classifier yields an increase in accuracy. Acknowledgements. This research was partly supported by National Natural Science Foundation of China (NSFC, Project No.: (70801020) (60773198) (60573097) (70572053)). Natural Science Foundation of Guangdong Province (7300272), Research Foundation of Science, Technology Plan Project in Guangdong Province (2007B031403003), Sun Yat-sen University “211 Project” Construction Projects of Phase Key Discipline, NECT-06-0737, GDUFS(399-X3207018) and GDUFS(GWQ0718).
Ⅲ
184
W. Guo et al.
References 1. Chen, X.W., Liu, M.: Domain Based Predictive Models for Protein-Protein Interaction Prediction. EURASIP Journal on Applied Signal Processing, 1–8 (2006) 2. Jansen, R., Yu, H., Greenbaum, D., Kluger, Y., Krogan, N.J.: A Bayesian Networks Approach for Predicting Protein-Protein Interactions from Genomic Data. Science 302, 449–453 (2003) 3. Rhodes, D.R., Tomlins, S.A., Varambally, S., Mahavisno, V., Barrette, T.: Probabilistic Model of the Human Protein-protein Interaction Network. Nat. Biotechnol., 951– 959 (2005) 4. Bock, J.R., Gough, D.A.: Predicting Protein-protein Interactions from Primary Structure. Bioinformatics 17, 455–460 (2001) 5. Ben-Hur, A., Noble, W.S.: Kernel Methods for Predicting Protein-protein Interactions. Bioinformatics, i38–i46 (2005) 6. Martin, S., Roe, D., Faulon, J.L.: Predicting Protein-protein Interactions Using Signature Products. Bioinformatics 21, 218–226 (2005) 7. Chen, X.W., Han, B., Fang, J.: Large-scale Protein-protein Interaction Prediction Using Novel Kernel Methods. Int. J. Data Mining and Bioinformatics 2, 145–156 (2008) 8. Finn, R.D., Tate, J., Mistry, J., Coggill, P.C., Sammut, J.S.: The Pfam Protein Families Database. Nucleic Acids Res. 36, D138–D141 (2008) 9. Chen, X.W., Liu, M.: Prediction of Protein-protein Interactions Using Random Decision Forest Framework. Bioinformatics 21, 4394–4400 (2005) 10. Burges, C.J.C.: A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery 2, 121–167 (1998) 11. Elkan, C.: The Foundation of Cost-Sensitive Learning. In: 17th International Joint Conference on Artificial Intelligence, pp. 973–978 (2001) 12. Salwinski, L., Miller, C.S., Smith, A.J., Pettit, F.K., Bowie, J.U.: The Database of Interacting Proteins: 2004 update. Nucleic Acids Res. 32, D449–D451 (2004) 13. Deng, M., Mehta, S., Sun, F., Chen, T.: Inferring Domain-domain Interactions from Protein-protein Interactions. Genome Res. 12, 1540–1548 (2002) 14. Schwikowski, B., Uetz, P., Fields, S.: A Network of Protein-protein Interactions in Yeast. Nature Biotechnol. 18, 1257–1261 (2000) 15. Uetz, P., Giot, L., Cagney, G., Mansfield, T.A., Judson, R.S.: A Comprehensive Analysis of Protein-protein Interactions in Saccharomyces cerevisiae. Nature 403, 623–627 (2000)
An Efficient and Fast Algorithm for Estimating the Frequencies of 2-D Superimposed Exponential Signals in Presence of Multiplicative and Additive Noise Jiawen Bian, Hongwei Li, Huiming Peng, and Jing Xing
Abstract. In this paper, we consider two-dimensional (2-D) superimposed exponential signals in independently and identically distributed (i.i.d.) multiplicative and additive noise. We use a three step iterative (TSI) algorithm to estimate the frequencies of the considered model. It is observed that the estimator is consistent and works quite well in terms of biases and mean squared errors. Moreover, the convergence rates of the estimators attain $O_p(M^{-3/2}N^{-1/2})$ and $O_p(M^{-1/2}N^{-3/2})$ for each pair of frequencies, which is the convergence rate of the least squares estimators (LSEs) in the presence of additive noise only.
Keywords: 2-D superimposed exponential signals, Three step iterative algorithm, Least squares estimators, Consistent estimators.
1 Introduction
We consider the following 2-D model of superimposed exponential signals in multiplicative and additive noise:
$$y(m,n) = \sum_{k=1}^{p}\xi_k(m,n)\, e^{i(u_k m + v_k n + \phi_k)} + \varepsilon(m,n), \quad m = 1, 2, \dots, M, \; n = 1, 2, \dots, N, \qquad (1)$$
Jiawen Bian · Hongwei Li · Huiming Peng
School of Mathematics and Physics, China University of Geosciences, Wuhan 430074, China
Jing Xing
Department of Statistics and Applied Mathematics, Hubei University of Economics, Wuhan 430205, China
[email protected] H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 185–195. c Springer-Verlag Berlin Heidelberg 2009 springerlink.com
where $y(m,n)$ are the observed values, $i = \sqrt{-1}$, and $u_k \in (0, 2\pi)$ and $v_k \in (0, 2\pi)$ are the unknown frequencies. The multiplicative noise $\{\xi_k(m,n)\}$ is an array of independent identically distributed (i.i.d.) real random variables with mean $A_k \neq 0$ and finite variance $\sigma_k^2$, and $\{\phi_k\} \in [0, 2\pi)$ are the unknown phases. The additive noise $\{\varepsilon(m,n)\}$ is an array of complex random variables with mean zero and finite variance $\sigma_0^2/2$ for both the real and imaginary parts, which are assumed to be independent. The number of components $p$ is assumed to be known in advance. In this paper we mainly consider the estimation of the frequencies $(u_k, v_k)$, given a sample of sizes $M$ and $N$, namely $y(1,1), \dots, y(M,N)$. We make the following assumptions: (1) the multiplicative noise $\{\xi_k(m,n)\}$ and the additive noise $\{\varepsilon(m,n)\}$ are mutually independent; (2) the frequencies $u_j$ are different from each other, and the same holds for the $v_j$; (3) $0 < \lim_{\min\{M,N\}\to+\infty} M/N < +\infty$.
The problem of estimating 2-D exponential signals is an important problem in multidimensional statistical signal processing. This modeling and estimation problem has fundamental importance, as well as various applications in texture estimation of images (see [1] and the references therein) and in wave propagation problems (see [2] and the references therein). The problem has been intensively studied in the presence of additive noise, and many techniques have been developed to estimate the parameters of 2-D superimposed exponential signals in white or colored noise, such as the maximum-likelihood estimator (MLE) [3], least squares estimators (LSEs) [4], Pisarenko's method [5], singular value decomposition [6] and cumulant based methods [7]. However, multiplicative noise may also occur in a variety of applications (see [8], [9], [10]), and there are not many methods for 2-D harmonic retrieval in the presence of both multiplicative and additive noise. [11] and [12] considered this problem using cyclic statistics; it was also considered in [13] using the 2-D chirp z-transform. It is known [4] that the LSEs have the best convergence rate among all estimators and attain the CRB in the presence of white i.i.d. noise, but they require multidimensional searching, which is time consuming and hard to implement for large sample sizes. It is therefore very meaningful to find an iterative algorithm for the LSEs. However, to the best of the authors' knowledge, an iterative procedure for the LSEs of the frequencies of a 2-D superimposed exponential signal model with multiplicative and additive noise has not been considered. Recently, a computationally efficient three step iterative (TSI) algorithm was proposed in [14] to estimate the frequencies of 1-D sinusoidal signals in the presence of additive stationary noise, sharing the same technique as the seven step iterative algorithm proposed in [15]; it is observed that the TSI algorithm is statistically efficient and attains the same convergence rate as the LSEs. Moreover, it needs very little computation time and can serve as an online implementation.
Stimulated by the paper [14], in this paper we generalize the TSI algorithm for 1-D signals to estimate the frequencies of a 2-D superimposed exponential signal in the presence of i.i.d. multiplicative and additive noise. The method uses a correction term based on $A1_{M,N}(j)$, $A2_{M,N}(j)$ and $B_{M,N}(j)$, to be defined in Section 3, which are functions of the data vector and the $j$-th available frequency estimators. It is observed that if the initial estimators are accurate up to the order $O_p(M^{-1})$ and $O_p(N^{-1})$ for the frequencies $u_j$ and $v_j$ respectively (here $O_p(M^{\delta})$ means that $M^{-\delta}O_p(M^{\delta})$ is bounded in probability), then the three step iterative algorithm produces fully efficient frequency estimators with convergence rates $O_p(M^{-3/2}N^{-1/2})$ and $O_p(M^{-1/2}N^{-3/2})$ for $u_j$ and $v_j$ respectively, which is the convergence rate of the LSEs [4]. We use the 2-D periodogram maximizers over the Fourier frequencies as the initial estimators. It is known that the 2-D periodogram maximizers over the Fourier frequencies do not generally provide estimators up to the order $O_p(M^{-1})$ and $O_p(N^{-1})$ for the frequencies $u_j$ and $v_j$ respectively. To overcome this problem, we do not use the full sample size available for estimation at each step: at the first step we use a fraction of it, and at the last step we use the whole data set, gradually increasing the effective sample sizes. The rest of the paper is organized as follows. In Section 2 we give the initial estimator based on the 2-D periodogram, which is implemented by the 2-D FFT. The TSI algorithm is presented in Section 3. In Section 4 we present the numerical experiments, and we conclude the paper in Section 5. All proofs are provided in the Appendix.
2 Initial Estimator
We consider the initial estimator as the 2-D periodogram maximizer at the Fourier frequencies, which is defined as follows:
$$I(u,v) = \frac{1}{MN}\left|\sum_{m=1}^{M}\sum_{n=1}^{N} y(m,n)\, e^{-i(mu+nv)}\right|^{2}. \qquad (2)$$
The estimators obtained by finding the $p$ local maxima of $I(u,v)$ achieve the best possible rate and are asymptotically equivalent to the LSEs. This is known as the approximate least squares estimator (ALSE) in the literature, but it requires multidimensional searching in the frequency domain and is very time consuming for large sample sizes. In this paper, we employ the 2-D FFT to find the $p$ local maxima of $I(u,v)$. The initial estimates obtained by the 2-D FFT are Fourier frequencies of the form $u = 2\pi i/M$ and $v = 2\pi j/N$ for some integers $1 \le i \le M$, $1 \le j \le N$, so the convergence rates of the initial estimators are $O_p(M^{-1})$ and $O_p(N^{-1})$ for $u_j$ and $v_j$ respectively.
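As an illustration of this initialization, the following NumPy sketch (our own, not code from the paper; the function name and the crude peak-removal step are assumptions) evaluates the periodogram (2) on the Fourier grid with a 2-D FFT and returns the p largest local maxima as initial frequency estimates.

```python
import numpy as np

def initial_estimates(y, p):
    """Periodogram maximization over Fourier frequencies via the 2-D FFT.

    y : (M, N) complex array of observations y(m, n)
    p : number of frequency pairs to extract
    Returns p pairs (u, v) with u = 2*pi*i/M, v = 2*pi*j/N.
    """
    M, N = y.shape
    I = np.abs(np.fft.fft2(y))**2 / (M * N)   # periodogram I(u, v) on the Fourier grid
    est = []
    for _ in range(p):
        i, j = np.unravel_index(np.argmax(I), I.shape)
        est.append((2 * np.pi * i / M, 2 * np.pi * j / N))
        # crude peak suppression so the next maximum comes from a different component
        I[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2] = 0.0
    return est
```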
3 TSI Algorithm
In this section we discuss the TSI algorithm. Given consistent estimators $\tilde{u}_j$ and $\tilde{v}_j$ for model (1), we compute $\hat{u}_j$ and $\hat{v}_j$ for $j = 1, \dots, p$ as follows:
$$\hat{u}_j = \tilde{u}_j + \frac{12}{M^2}\,\mathrm{Im}\!\left[\frac{A1_{M,N}(j)}{B_{M,N}(j)}\right], \qquad \hat{v}_j = \tilde{v}_j + \frac{12}{N^2}\,\mathrm{Im}\!\left[\frac{A2_{M,N}(j)}{B_{M,N}(j)}\right], \qquad (3)$$
where
$$A1_{M,N}(j) = \sum_{m=1}^{M}\sum_{n=1}^{N} y(m,n)\Big(m - \frac{M}{2}\Big)e^{-i(\tilde{u}_j m + \tilde{v}_j n)}, \qquad (4)$$
$$A2_{M,N}(j) = \sum_{m=1}^{M}\sum_{n=1}^{N} y(m,n)\Big(n - \frac{N}{2}\Big)e^{-i(\tilde{u}_j m + \tilde{v}_j n)}, \qquad (5)$$
$$B_{M,N}(j) = \sum_{m=1}^{M}\sum_{n=1}^{N} y(m,n)\, e^{-i(\tilde{u}_j m + \tilde{v}_j n)}, \qquad (6)$$
and $\mathrm{Im}[\cdot]$ denotes the imaginary part of a complex number. We can start with any consistent estimators $\tilde{u}_j$ and $\tilde{v}_j$ and improve them step by step using Eq. (3). The motivation of the algorithm is based on the following theorem. In the following we write $u = (u_1, \dots, u_p)$, $v = (v_1, \dots, v_p)$, $\hat{u} = (\hat{u}_1, \dots, \hat{u}_p)$, $\hat{v} = (\hat{v}_1, \dots, \hat{v}_p)$ for convenience.

Theorem 1. If for $j = 1, 2, \dots, p$, $\tilde{u}_j - u_j = O_p(M^{-1-\delta})$ and $\tilde{v}_j - v_j = O_p(N^{-1-\delta})$, where $\delta \in (0, 1]$, then
(a) $\hat{u}_j - u_j = O_p(M^{-1-2\delta})$, $\hat{v}_j - v_j = O_p(N^{-1-2\delta})$, if $\delta < \frac{1}{2}$;
(b) $\big(M^{3/2}N^{1/2}(\hat{u} - u),\, M^{1/2}N^{3/2}(\hat{v} - v)\big) \xrightarrow{L} N_{2p}(0, 6\Sigma)$, if $\delta \ge \frac{1}{2}$,
where $\Sigma_{jj} = \frac{\sigma^2 - \sigma_j^2}{A_j^2}$ for $j = 1, \dots, p$; $\Sigma_{jj} = \frac{\sigma^2 - \sigma_{j-p}^2}{A_{j-p}^2}$ for $j = p+1, \dots, 2p$; $\Sigma_{ij} = 0$ for $i \neq j$; and $\sigma^2 = \sum_{k=1}^{p}\sigma_k^2 + \sigma_0^2$.
Proof. See Appendix.
We start with the initial estimates obtained by the 2-D FFT and improve them step by step by the recursive algorithm below. The $m$-th step estimators $\hat{u}_j^{(m)}$ and $\hat{v}_j^{(m)}$ are computed from the $(m-1)$-th step estimators $\hat{u}_j^{(m-1)}$ and $\hat{v}_j^{(m-1)}$ respectively by the formula
$$\hat{u}_j^{(m)} = \hat{u}_j^{(m-1)} + \frac{12}{M_m^2}\,\mathrm{Im}\!\left[\frac{A1_{M_m,N_m}(j)}{B_{M_m,N_m}(j)}\right], \qquad \hat{v}_j^{(m)} = \hat{v}_j^{(m-1)} + \frac{12}{N_m^2}\,\mathrm{Im}\!\left[\frac{A2_{M_m,N_m}(j)}{B_{M_m,N_m}(j)}\right], \qquad (7)$$
where $A1_{M_m,N_m}(j)$, $A2_{M_m,N_m}(j)$ and $B_{M_m,N_m}(j)$ are obtained from Eqs. (4)-(6) by replacing $M$, $N$, $\tilde{u}_j$ and $\tilde{v}_j$ with $M_m$, $N_m$, $\hat{u}_j^{(m-1)}$ and $\hat{v}_j^{(m-1)}$ respectively. We choose suitable $M_m$ and $N_m$ at each step as follows:
Step 1. With $m = 1$, choose $M_1 = M^{0.8}$, $N_1 = N^{0.8}$, $\hat{u}_j^{(0)} = \tilde{u}_j$ and $\hat{v}_j^{(0)} = \tilde{v}_j$, which are the initial estimates obtained by the 2-D FFT. Note that $\tilde{u}_j - u_j = O_p(M^{-1}) = O_p(M_1^{-1-\frac{1}{4}})$ and $\tilde{v}_j - v_j = O_p(N^{-1}) = O_p(N_1^{-1-\frac{1}{4}})$. Taking $M_1 = M^{0.8}$, $N_1 = N^{0.8}$, $\hat{u}_j^{(0)} = \tilde{u}_j$ and $\hat{v}_j^{(0)} = \tilde{v}_j$ in (7), and applying part (a) of Theorem 1, we obtain $\hat{u}_j^{(1)} - u_j = O_p(M_1^{-1-\frac{1}{2}}) = O_p(M^{-\frac{6}{5}})$ and $\hat{v}_j^{(1)} - v_j = O_p(N_1^{-1-\frac{1}{2}}) = O_p(N^{-\frac{6}{5}})$.

Step 2. With $m = 2$, choose $M_2 = M^{0.9}$ and $N_2 = N^{0.9}$, and compute $\hat{u}_j^{(2)}$ and $\hat{v}_j^{(2)}$ from $\hat{u}_j^{(1)}$ and $\hat{v}_j^{(1)}$. Since $\hat{u}_j^{(1)} - u_j = O_p(M^{-\frac{6}{5}}) = O_p(M_2^{-1-\frac{1}{3}})$ and $\hat{v}_j^{(1)} - v_j = O_p(N^{-\frac{6}{5}}) = O_p(N_2^{-1-\frac{1}{3}})$, using part (a) of Theorem 1 again, we have $\hat{u}_j^{(2)} - u_j = O_p(M_2^{-1-\frac{2}{3}}) = O_p(M^{-\frac{3}{2}})$ and $\hat{v}_j^{(2)} - v_j = O_p(N_2^{-1-\frac{2}{3}}) = O_p(N^{-\frac{3}{2}})$.

Step 3. With $m = 3$, choose $M_3 = M$ and $N_3 = N$, compute $\hat{u}_j^{(3)}$ and $\hat{v}_j^{(3)}$ from $\hat{u}_j^{(2)}$ and $\hat{v}_j^{(2)}$, and, applying part (b) of Theorem 1, we have
$$\big(M^{3/2}N^{1/2}(\hat{u} - u),\, M^{1/2}N^{3/2}(\hat{v} - v)\big) \xrightarrow{L} N_{2p}(0, 6\Sigma).$$
Remark 1. Note that the exponents used above are not unique; there are several other ways to choose them so that the iterative process converges in three steps, for example $M_1 = M^{0.6}$, $N_1 = N^{0.6}$; $M_2 = M^{0.8}$, $N_2 = N^{0.8}$; $M_3 = M$, $N_3 = N$. It is not possible to choose a set of exponents that makes the iterative process converge in fewer than three steps, whereas several sets of exponents would take more than three steps to converge.
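To make the correction (3)-(7) and the three-step schedule concrete, here is a small NumPy sketch; it is our own illustration under the assumptions above (the function names and the rounding of the sub-sample sizes are ours), not code from the authors.

```python
import numpy as np

def tsi_correction(y, u, v):
    """One correction of (u, v) via Eqs. (3)-(6) on the data block y of size M x N."""
    M, N = y.shape
    m = np.arange(1, M + 1)[:, None]          # m = 1..M as a column
    n = np.arange(1, N + 1)[None, :]          # n = 1..N as a row
    e = np.exp(-1j * (u * m + v * n))         # e^{-i(u m + v n)}
    B  = np.sum(y * e)                        # B_{M,N}(j), Eq. (6)
    A1 = np.sum(y * (m - M / 2) * e)          # A1_{M,N}(j), Eq. (4)
    A2 = np.sum(y * (n - N / 2) * e)          # A2_{M,N}(j), Eq. (5)
    u_new = u + (12 / M**2) * np.imag(A1 / B)
    v_new = v + (12 / N**2) * np.imag(A2 / B)
    return u_new, v_new

def tsi(y, u0, v0):
    """Three-step TSI: sub-samples of size M^0.8, M^0.9 and the full sample, as in Steps 1-3."""
    M, N = y.shape
    u, v = u0, v0
    for frac in (0.8, 0.9, 1.0):
        Mm, Nm = int(round(M**frac)), int(round(N**frac))
        u, v = tsi_correction(y[:Mm, :Nm], u, v)
    return u, v
```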
4 Numerical Experiment
In this section we present some numerical results to observe how the proposed method works for finite sample sizes. We consider the following model:
$$y(m,n) = \xi_1(m,n)\, e^{i(0.5m + 1.5n + \pi/4)} + \xi_2(m,n)\, e^{i(0.6m + 1.7n + \pi/3)} + \varepsilon(m,n),$$
where $\{\xi_1(m,n)\}$ and $\{\xi_2(m,n)\}$ are real Gaussian random variables with the same mean 3 and variances 0.5 and 0.6 respectively, and $\{\varepsilon(m,n)\}$ is an array of i.i.d. exponential complex random variables with mean zero and the same variance $\sigma_0^2/2$ for the real and the imaginary parts. To assess the sensitivity of the model to different noise levels we consider three different signal to noise ratios (SNR), defined as $\mathrm{SNR} = 10\log_{10}\frac{\min_k\{A_k^2\}}{\sum_{k=0}^{2}\sigma_k^2}$, namely SNR = -5, 0
and 5. To present the consistency we take the sample sizes M=N=64, 128 and 256. For each sample size, we estimate the frequencies based on the TSI algorithm. In all cases we take the initial estimator as the periodogram maximizer at the Fourier frequencies. We report the average estimates of the two pairs of frequencies (Fr11, Fr12), (Fr21, Fr22) and the corresponding square roots of the mean squared errors (SRMSEs) over 100 replications. For comparison purposes, we also report the initial estimates as well as the true parameter values and the corresponding square roots of the asymptotic variances (SRAVs). All results are reported in Tables 1-3, corresponding to M=N=64, M=N=128 and M=N=256 respectively. In each table and for each SNR, the first line gives the true parameter values, the second line the initial estimates, the third line the TSI estimates, the fourth line the SRMSEs of the TSI estimates, and the last line the SRAVs. The following observations are clear from the numerical experiments. The TSI estimates are very close to the true parameter values and are better than the initial estimates in nearly all the cases considered. The biases decrease as the SNR increases; therefore, the TSI estimates provide asymptotically unbiased estimators of the frequencies. The SRMSEs of all the parameters decrease gradually and approach the SRAVs as the sample size increases, which verifies the consistency of the TSI estimates. The TSI estimates are also fairly good even when the sample size is small and the SNR is low, i.e. when the initial estimates are poor, which verifies the effectiveness of the TSI algorithm.

Table 1 The average estimates of the initial and TSI estimators based on 100 replications, along with the SRMSEs and SRAVs of the two pairs of frequencies when M=N=64

SNR  Estimate   Fr11        Fr12        Fr21        Fr22
-5   Parameter  0.500000    1.500000    0.600000    1.700000
     Initial    0.490874    1.472622    0.589049    1.668971
     TSI        0.500148    1.498587    0.599263    1.701828
     SRMSE      1.19256e-3  1.06496e-3  1.23070e-3  1.28560e-3
     SRAV       1.05406e-3  1.05406e-3  1.05217e-3  1.05217e-3
0    Parameter  0.500000    1.500000    0.600000    1.700000
     Initial    0.490874    1.472622    0.589049    1.668971
     TSI        0.499922    1.498533    0.599473    1.701556
     SRMSE      5.89485e-4  6.59702e-4  6.01880e-4  5.73435e-4
     SRAV       5.81172e-4  5.81172e-4  5.77743e-4  5.77743e-4
5    Parameter  0.500000    1.500000    0.600000    1.700000
     Initial    0.490874    1.472622    0.589049    1.668971
     TSI        0.499906    1.498566    0.599791    1.701638
     SRMSE      3.08272e-4  3.14107e-4  3.05855e-4  3.03656e-4
     SRAV       3.05329e-4  3.05329e-4  2.98750e-4  2.98750e-4
Table 2 The average estimates of the initial and TSI estimators based on 100 replications, along with the SRMSEs and SRAVs of the two pairs of frequencies when M=N=128

SNR  Estimate   Fr11        Fr12        Fr21        Fr22
-5   Parameter  0.500000    1.500000    0.600000    1.700000
     Initial    0.490874    1.521709    0.589049    1.718058
     TSI        0.499873    1.499992    0.600115    1.699986
     SRMSE      2.68336e-4  2.64891e-4  2.86920e-4  2.97158e-4
     SRAV       2.63515e-4  2.63515e-4  2.63043e-4  2.63043e-4
0    Parameter  0.500000    1.500000    0.600000    1.700000
     Initial    0.490874    1.521709    0.589049    1.718058
     TSI        0.499851    1.499996    0.600042    1.699976
     SRMSE      1.54061e-4  1.45544e-4  1.45542e-4  1.45130e-4
     SRAV       1.45293e-4  1.45293e-4  1.44436e-4  1.44436e-4
5    Parameter  0.500000    1.500000    0.600000    1.700000
     Initial    0.490874    1.521709    0.589049    1.718058
     TSI        0.499912    1.499980    0.600173    1.700016
     SRMSE      7.64692e-5  7.67465e-5  7.48406e-5  7.49614e-5
     SRAV       7.63322e-5  7.63322e-5  7.46875e-5  7.46875e-5
Table 3 The average estimates of the initial and TSI estimators based on 100 replications, along with the SRMSEs and SRAVs of the two pairs of frequencies when M=N=256

SNR  Estimate   Fr11        Fr12        Fr21        Fr22
-5   Parameter  0.500000    1.500000    0.600000    1.700000
     Initial    0.490874    1.497165    0.589049    1.693515
     TSI        0.499982    1.499999    0.600015    1.700016
     SRMSE      6.88426e-5  6.88849e-5  6.68669e-5  6.88854e-5
     SRAV       6.58787e-5  6.58787e-5  6.57607e-5  6.57607e-5
0    Parameter  0.500000    1.500000    0.600000    1.700000
     Initial    0.490874    1.497165    0.589049    1.693515
     TSI        0.500003    1.499998    0.600033    1.700008
     SRMSE      3.68782e-5  3.65143e-5  3.66264e-5  3.61466e-5
     SRAV       3.63233e-5  3.63233e-5  3.61089e-5  3.61089e-5
5    Parameter  0.500000    1.500000    0.600000    1.700000
     Initial    0.490874    1.497165    0.589049    1.693515
     TSI        0.499985    1.500000    0.600022    1.700011
     SRMSE      1.96997e-5  1.91082e-5  1.917771e-5 1.89554e-5
     SRAV       1.90830e-5  1.90830e-5  1.86719e-5  1.86719e-5
5 Conclusions
In this paper we considered the estimation of the frequencies of a 2-D superimposed exponential signal model in the presence of i.i.d. multiplicative and additive noise. We generalized the three step iterative algorithm of [14] to the considered model (1), proved the efficiency of the TSI estimators and obtained their asymptotic distribution. It is observed that the TSI algorithm works quite well and that the TSI estimators have the same convergence rate as the LSEs in the presence of additive noise only [4]. Since the TSI algorithm needs only three steps to converge from the given starting value, it saves computational time and can be used for online implementation.
Acknowledgements. This research is supported by the National Nature Science Foundation of China (No. 60672049) and the Research Foundation for Outstanding Young Teachers, China University of Geosciences (Wuhan) (No. CUGQNL0822 and No. CUGQNL0732).
References
1. Francos, J.M., Narashimhan, A., Woods, J.W.: Maximum-likelihood Estimation of Textures Using a Wold Decomposition Model. IEEE Trans. Image Process. 4(12), 1655–1666 (1995)
2. Ward, J.: Space-Time Adaptive Processing for Airborne Radar. Lincoln Lab., MIT, Lexington, MA, Tech. Rep. 1015 (1994)
3. Rao, C.R., Zhao, L., Zhou, B.: Maximum Likelihood Estimation of 2-D Superimposed Exponential Signals. IEEE Trans. Signal Process. 42(7), 1795–1802 (1994)
4. Kundu, D., Mitra, A.: Asymptotic Properties of the Least Squares Estimators of 2-D Exponential Signals. Multidimensional Systems and Signal Processing 7, 135–150 (1996)
5. Lang, S.W., McClellan, J.H.: The Extension of Pisarenko's Method to Multiple Dimensions. In: Proc. Int. Conf. Acoustics, Speech, Signal Processing, Paris, France, pp. 125–128 (1982)
6. Kumaresan, R., Tufts, D.W.: A Two-dimensional Technique for Frequency-Wavenumber Estimation. Proc. IEEE 69(11), 1515–1517 (1981)
7. Ibrahim, H.M., Gharieb, R.R.: Estimating Two-dimensional Frequencies by a Cumulant Based FBLP Method. IEEE Trans. Signal Process. 47(1), 262–266 (1999)
8. Giannakis, G.B., Zhou, G.: Harmonics in Multiplicative and Additive Noise: Parameter Estimation Using Cyclic Statistics. IEEE Trans. Signal Process. 43(9), 2217–2221 (1995)
9. Besson, B., Castanie, F.: On Estimating the Frequency of a Sinusoid in Autoregressive Multiplicative Noise. Signal Processing 30, 65–83 (1993)
10. Swami, A.: Multiplicative Noise Models: Parameter Estimation Using Cumulants. Signal Processing 36, 355–373 (1994)
11. Wang, F., Wang, S., Dou, H.: Two-dimensional Parameter Estimation Using Two-dimensional Cyclic Statistics. Acta Electronica Sinica 31(10), 1522–1525 (2003)
12. Yang, S., Li, H.: Two-dimensional Harmonics Parameters Estimation Using Third-order Cyclic Moments. Acta Electronica Sinica 33(10), 1808–1811 (2005)
13. Yang, S., Li, H.: 2-D Harmonic Retrieval in Multiplicative and Additive Noise Based on Chirp Z-transform. In: Proc. Int. Conf. Signal Processing, Beijing, China, pp. 16–20 (2006)
14. Nandi, S., Kundu, D.: An Efficient and Fast Algorithm for Estimating the Parameters of Sinusoidal Signals. Sankhya 68(2), 283–306 (2006)
15. Bai, Z.D., Rao, C.R., Chow, M., Kundu, D.: An Efficient Algorithm for Estimating the Parameters of Superimposed Exponential Signals. J. Statist. Plan. Infer. 110, 23–34 (2003)
16. Mangulis, V.: Handbook of Series for Scientists and Engineers. Academic Press, New York (1965)
Appendix
To prove Theorem 1, we need the following lemmas. These results are elementary, so the proofs are omitted; readers can refer to [16].

Lemma 1. For $k, l = 1, 2, 3$ and $\alpha \neq 0$ or $\beta \neq 0$,
$$\lim_{\min\{M,N\}\to\infty}\frac{1}{M^k N^l}\sum_{m=1}^{M}\sum_{n=1}^{N} m^{k-1} n^{l-1}\cos^2(m\alpha + n\beta) = \frac{1}{2kl};$$
for $u \neq \alpha$ or $v \neq \beta$,
$$\lim_{\min\{M,N\}\to\infty}\frac{1}{M^k N^l}\sum_{m=1}^{M}\sum_{n=1}^{N} m^{k-1} n^{l-1}\cos(m\alpha + n\beta)\cos(mu + nv) = 0.$$

Lemma 2. If $\{\varepsilon(m,n)\}$ is a sequence of i.i.d. random variables with mean zero and finite variance $\sigma^2$, then for any positive integers $k$ and $l$,
$$\sup_{\alpha,\beta}\Big|\sum_{m=1}^{M}\sum_{n=1}^{N}\varepsilon(m,n)\, m^k n^l e^{i(m\alpha + n\beta)}\Big| = O_p(M^{k+\frac{1}{2}} N^{l+\frac{1}{2}}).$$

Proof (of Theorem 1). We give a simple proof of Theorem 1 in the following, computing $B_{M,N}(j)$, $A1_{M,N}(j)$ and $A2_{M,N}(j)$ respectively. Firstly, we compute $B_{M,N}(j)$. Since $\xi_k(m,n)$ is an array of i.i.d. random variables with mean $A_k$ and variance $\sigma_k^2$, if we write $\eta_k(m,n) = \xi_k(m,n) - A_k$, then $\eta_k(m,n)$ is an array of i.i.d. variables with mean zero and variance $\sigma_k^2$. We have
$$B_{M,N}(j) = \sum_{k=1}^{p} A_k \sum_{m=1}^{M}\sum_{n=1}^{N} e^{i[(u_k-\tilde{u}_j)m + (v_k-\tilde{v}_j)n + \phi_k]} + \sum_{k=1}^{p}\sum_{m=1}^{M}\sum_{n=1}^{N}\eta_k(m,n)\, e^{i[(u_k-\tilde{u}_j)m + (v_k-\tilde{v}_j)n + \phi_k]} + \sum_{m=1}^{M}\sum_{n=1}^{N}\varepsilon(m,n)\, e^{-i(\tilde{u}_j m + \tilde{v}_j n)} = \sum_{k=1}^{p} A_k e^{i\phi_k} J1_k(M,N) + R_1(M,N)\ \text{(say)}, \qquad (8)$$
where for $k \neq j$, $J1_k(M,N) = O_p(1)$; for $k = j$, using Taylor series approximations of $e^{i(u_j-\tilde{u}_j)m}$ and $e^{i(v_j-\tilde{v}_j)n}$ up to first order, we have $J1_j(M,N) = \sum_{m=1}^{M}\sum_{n=1}^{N} e^{i[(u_j-\tilde{u}_j)m + (v_j-\tilde{v}_j)n]} = MN(1 + O_p(M^{-\delta}) + O_p(N^{-\delta}))$. Using Lemma 2, we have $R_1(M,N) = O_p(M^{1/2}N^{1/2})$. It is immediate that $B_{M,N}(j) = A_j e^{i\phi_j} MN(1 + O_p(M^{-\delta}) + O_p(N^{-\delta}))$.
For $A1_{M,N}(j)$,
$$A1_{M,N}(j) = \sum_{m=1}^{M}\sum_{n=1}^{N}\Big[\sum_{k=1}^{p}(A_k + \eta_k(m,n))\, e^{i(u_k m + v_k n + \phi_k)} + \varepsilon(m,n)\Big]\Big(m - \frac{M}{2}\Big) e^{-i(\tilde{u}_j m + \tilde{v}_j n)} = \sum_{k=1}^{p} A_k e^{i\phi_k} J2_k(M,N) + R_2(M,N)\ \text{(say)}, \qquad (9)$$
where for $k \neq j$, $J2_k(M,N) = O_p(M)$; for $k = j$, using Taylor series approximations of $e^{i(u_j-\tilde{u}_j)m}$ and $e^{i(v_j-\tilde{v}_j)n}$ up to first order, we have
$$J2_j(M,N) = \sum_{m=1}^{M}\Big(m - \frac{M}{2}\Big)\sum_{n=1}^{N} e^{i(u_j-\tilde{u}_j)m} e^{i(v_j-\tilde{v}_j)n} = i(u_j - \tilde{u}_j)\frac{M^3 N}{12}\big(1 + O_p(M^{-\delta}) + O_p(N^{-\delta})\big).$$
Using the Taylor series approximation of $e^{i[(u_j-\tilde{u}_j)m + (v_j-\tilde{v}_j)n]}$ up to first order and Lemma 2, we have
$$R_2(M,N) = \sum_{m=1}^{M}\sum_{n=1}^{N}\varepsilon(m,n)\Big(m - \frac{M}{2}\Big) e^{-i(u_j m + v_j n)} e^{i[(u_j-\tilde{u}_j)m + (v_j-\tilde{v}_j)n]} + \sum_{k=1}^{p}\sum_{m=1}^{M}\sum_{n=1}^{N}\eta_k(m,n)\Big(m - \frac{M}{2}\Big) e^{i[(u_k-u_j)m + (v_k-v_j)n + \phi_k]} e^{i[(u_j-\tilde{u}_j)m + (v_j-\tilde{v}_j)n]} = R_{21}(M,N) + R_{22}(M,N)\ \text{(say)},$$
where
$$R_{21}(M,N) = \sum_{m=1}^{M}\sum_{n=1}^{N}\varepsilon(m,n)\Big(m - \frac{M}{2}\Big) e^{-i(u_j m + v_j n)} + O_p(M^{\frac{3}{2}-\delta}N^{\frac{1}{2}}) + O_p(M^{\frac{3}{2}}N^{\frac{1}{2}-\delta}),$$
$$R_{22}(M,N) = \sum_{k=1}^{p}\sum_{m=1}^{M}\sum_{n=1}^{N}\eta_k(m,n)\Big(m - \frac{M}{2}\Big) e^{i[(u_k-u_j)m + (v_k-v_j)n + \phi_k]} + O_p(M^{\frac{3}{2}-\delta}N^{\frac{1}{2}}) + O_p(M^{\frac{3}{2}}N^{\frac{1}{2}-\delta}).$$
So,
$$A1_{M,N}(j) = \sum_{m=1}^{M}\sum_{n=1}^{N}\varepsilon(m,n)\Big(m - \frac{M}{2}\Big) e^{-i(u_j m + v_j n)} + i A_j e^{i\phi_j}(u_j - \tilde{u}_j)\frac{M^3 N}{12}\big(1 + O_p(M^{-\delta}) + O_p(N^{-\delta})\big) + \sum_{k=1}^{p}\sum_{m=1}^{M}\sum_{n=1}^{N}\eta_k(m,n)\Big(m - \frac{M}{2}\Big) e^{i[(u_k-u_j)m + (v_k-v_j)n + \phi_k]}. \qquad (10)$$
Therefore
$$\hat{u}_j = \tilde{u}_j + \frac{12}{M^2}\,\mathrm{Im}\!\left[\frac{A1_{M,N}(j)}{B_{M,N}(j)}\right] = u_j + (u_j - \tilde{u}_j)\big(O_p(M^{-\delta}) + O_p(N^{-\delta})\big) + \frac{12}{M^3 N}\,\mathrm{Im}\!\left[\frac{\sum_{m,n}\varepsilon(m,n)(m - \frac{M}{2})\, e^{-i(u_j m + v_j n + \phi_j)} + \sum_{k=1}^{p}\sum_{m,n}\eta_k(m,n)(m - \frac{M}{2})\, e^{i[(u_k-u_j)m + (v_k-v_j)n + (\phi_k-\phi_j)]}}{A_j}\right] = u_j + (u_j - \tilde{u}_j)\big(O_p(M^{-\delta}) + O_p(N^{-\delta})\big) + \frac{12}{M^3 N}X_j\ \text{(say)}, \qquad (11)$$
where
$$X_j = \sum_{m=1}^{M}\sum_{n=1}^{N}\frac{\mathrm{Im}\{\varepsilon(m,n)\}(m - \frac{M}{2})\cos(u_j m + v_j n + \phi_j) - \mathrm{Re}\{\varepsilon(m,n)\}(m - \frac{M}{2})\sin(u_j m + v_j n + \phi_j)}{A_j} + \sum_{k=1}^{p}\sum_{m=1}^{M}\sum_{n=1}^{N}\frac{\eta_k(m,n)(m - \frac{M}{2})\sin[(u_k-u_j)m + (v_k-v_j)n + (\phi_k-\phi_j)]}{A_j}. \qquad (12)$$
Similarly, we have
$$\hat{v}_j = \tilde{v}_j + \frac{12}{N^2}\,\mathrm{Im}\!\left[\frac{A2_{M,N}(j)}{B_{M,N}(j)}\right] = v_j + (v_j - \tilde{v}_j)\big(O_p(M^{-\delta}) + O_p(N^{-\delta})\big) + \frac{12}{M N^3}Y_j\ \text{(say)}, \qquad (13)$$
where $Y_j$ is similar to $X_j$ and can be obtained by replacing $(m - \frac{M}{2})$ in $X_j$ by $(n - \frac{N}{2})$. Using Lemmas 1-2, we have that, when $\min\{M,N\} \to \infty$,
$$\mathrm{var}\Big(\frac{12}{M^{3/2}N^{1/2}}X_j\Big) \to \frac{6(\sigma^2 - \sigma_j^2)}{A_j^2}, \quad \mathrm{var}\Big(\frac{12}{M^{1/2}N^{3/2}}Y_j\Big) \to \frac{6(\sigma^2 - \sigma_j^2)}{A_j^2}, \quad \mathrm{Cov}\Big(\frac{12}{M^{3/2}N^{1/2}}X_j,\ \frac{12}{M^{1/2}N^{3/2}}Y_j\Big) \to 0,$$
and, for $j \neq k$,
$$\mathrm{Cov}\Big(\frac{12}{M^{3/2}N^{1/2}}X_j,\ \frac{12}{M^{3/2}N^{1/2}}X_k\Big) \to 0, \quad \mathrm{Cov}\Big(\frac{12}{M^{1/2}N^{3/2}}Y_j,\ \frac{12}{M^{1/2}N^{3/2}}Y_k\Big) \to 0, \quad \mathrm{Cov}\Big(\frac{12}{M^{3/2}N^{1/2}}X_j,\ \frac{12}{M^{1/2}N^{3/2}}Y_k\Big) \to 0.$$
From assumption (3), we have $O_p(M^{-\delta}) = O_p(N^{-\delta})$ and $O_p(M^{-3/2}N^{-1/2}) = O_p(M^{-1/2}N^{-3/2}) = O_p(M^{-2}) = O_p(N^{-2})$. Therefore, if $\tilde{u}_j - u_j = O_p(M^{-1-\delta})$, $\tilde{v}_j - v_j = O_p(N^{-1-\delta})$ and $0 < \delta < \frac{1}{2}$, then from Eq. (11) and Eq. (13), $\hat{u}_j - u_j = O_p(M^{-1-2\delta})$ and $\hat{v}_j - v_j = O_p(N^{-1-2\delta})$. If $\frac{1}{2} \le \delta \le 1$, from Eqs. (11)-(13) and the Central Limit Theorem, it follows that $\big(M^{3/2}N^{1/2}(\hat{u} - u),\, M^{1/2}N^{3/2}(\hat{v} - v)\big) \xrightarrow{L} N_{2p}(0, 6\Sigma)$.
An Improved Greedy Based Global Optimized Placement Algorithm Luo Zhong, Kejing Wang, Jingling Yuan, and Jingjing He*
Abstract. An improved Greedy based global optimized placement algorithm is proposed in this paper. Probability control is applied to avoid the local minimization of the traditional Greedy algorithm. Some new techniques, such as the marker bit method, dynamical selection and the weight matrix, are introduced to improve the efficiency of this algorithm. From the experiments and a comparison with the Simulated Annealing algorithm, this novel algorithm shows better stability and accuracy.
Keywords: Greedy, Global selection, Marker bit, weight matrix.
1 Introduction
The problem of optimized placement appears in many areas. There are many algorithms to solve it, such as SLA, SA, Greedy, GA and so on. This article proposes an improved Greedy algorithm to avoid the local-optimum problem that usually appears in Greedy search. The article first introduces some common problems of optimized placement algorithms and then two methods to improve the Greedy algorithm.
2 Problems of Placement Algorithm
2.1 Overlap and Connection
Overlap and connection are common problems in placement, so the effective solution is to separate the different modules correctly. For example, there are three tasks in a 4*4 space, as follows:
Task 1    Task 2    Task 3
0010      1000      0000
1111      1111      0001
0110      1000      0001
0000      1000      0001
Luo Zhong . Kejing Wang . Jingling Yuan . Jingjing He Computer Science and Technology School, Wuhan University of Technology, 430070, Wuhan, China
[email protected] *
H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 197–204. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
The marker bit method is applied in this paper to merge and separate different modules for the purpose of solving the overlapping and connection problems. For example, if the first task is represented by 0000 0001 and the second task by 0000 0010, then the overlapping area, i.e. the sum of both tasks, is represented by 0000 0011. So if the value is 0011 0100, that means the third, the fifth and the sixth tasks overlap at this point. The empty matrix is as follows.
Fig. 1 16*16 Matrix Space
Three tasks overlap in the yellow module, as follows:
Task 1    Task 2    Task 3
0010      1000      0000
1111      1111      0001
0110      1000      0001
0000      1000      0001
Two tasks overlap in the red module, as follows:
Task 4    Task 5
100       010
100       010
100       010
There is only one task in the pink module, as follows:
Task 6
1111
1110
1111
That means there are 6 tasks at the beginning.
Task 1, Task 2 and Task 3 are overlapped in the yellow module. Task 4 and Task 5 are connected in the red module. Task 6 is independent in the pink module. Do the same operations for each module as follows:
(1) Assign the first task the value 1, the second task 2, the third task 4, ..., the n-th task 2^(n-1).
(2) Combine the different tasks in the same module; the value is the sum of the values at the same position.
(3) Write the results into the matrix.
So the matrix is represented as
Fig. 2 Representation of overlapping and connection
The advantage of the marker bit method is therefore that it merges and separates different modules effectively. But the number of modules is limited: for example, one integer value can only represent at most 16 overlapping and connected modules at once, because of the limited range of the integer representation.
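The following small Python sketch is our own illustration of the marker-bit idea described above (the function and variable names are ours): task t is assigned the bit 2^t, the task masks are summed into the placement grid, and the set bits of a cell tell exactly which tasks overlap there.

```python
import numpy as np

def place(grid, task, top, left, task_id):
    """Add a task's 0/1 occupancy pattern into the grid using its marker bit 2**task_id."""
    h, w = task.shape
    grid[top:top + h, left:left + w] += task * (1 << task_id)

def tasks_at(cell_value):
    """Decode which task ids share one cell from the set bits of its value (up to 16 tasks)."""
    return [t for t in range(16) if cell_value & (1 << t)]

grid = np.zeros((16, 16), dtype=np.int64)
task1 = np.array([[0,0,1,0],[1,1,1,1],[0,1,1,0],[0,0,0,0]])
task2 = np.array([[1,0,0,0],[1,1,1,1],[1,0,0,0],[1,0,0,0]])
place(grid, task1, 0, 0, 0)
place(grid, task2, 0, 0, 1)          # placed so that the two tasks overlap
print(tasks_at(grid[1, 1]))          # -> [0, 1]: both tasks occupy this cell
```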
2.2 Two Standards
There are two standards that decide the value of a solution in placement:
- the overlapping evaluation value
- the maximal free matrix
2.2.1 Overlapping Evaluation Value
The overlapping evaluation value reflects the overlapping situation. The formula is as follows:
Cvalue = v2*1 + v3*2 + v4*4 + v5*8 + v6*16 + ... + vi*2^(i-2) + ...   (1)
where vi denotes the number of points at which i modules overlap. In the above example there are three points at which two modules overlap and one point at which three modules overlap, so the overlapping evaluation value is 3*1 + 1*2 = 5. In short, the larger the overlapping evaluation value, the more overlap there is.
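As a concrete illustration (ours, not from the paper), the sketch below computes Cvalue of Eq. (1) directly from the per-cell overlap counts of a placement grid.

```python
import numpy as np

def overlap_value(counts):
    """Cvalue: every cell covered by i >= 2 modules contributes 2**(i - 2)."""
    total = 0
    for i in counts[counts >= 2]:
        total += 2 ** (int(i) - 2)
    return total

# three cells covered by 2 modules and one cell covered by 3 modules -> 3*1 + 1*2 = 5
counts = np.array([[2, 2, 2, 3],
                   [1, 1, 0, 0]])
print(overlap_value(counts))   # 5
```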
2.2.2 Maximal Free Matrix
The maximal free matrix is defined as the free matrix with the largest free area, as in Fig. 3, where the yellow region is the maximal free matrix, whose parameter is 9*4 = 36.
Fig. 3 A solution of Placement
Obviously, the smaller the overlapping evaluation value and the larger the maximal free matrix, the better the solution. In practice, the overlapping evaluation value has the higher priority.
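A placement can be scored on this second standard by finding the largest all-free rectangle of the occupancy grid. The sketch below is our own simple illustration (a production implementation would typically use the faster histogram-based largest-rectangle algorithm).

```python
import numpy as np

def maximal_free_area(occupied):
    """Area of the largest all-zero rectangle in a 0/1 occupancy grid, O(rows * cols^2)."""
    rows, cols = occupied.shape
    best = 0
    # height[c] = number of consecutive free cells ending at the current row in column c
    height = np.zeros(cols, dtype=int)
    for r in range(rows):
        height = np.where(occupied[r] == 0, height + 1, 0)
        for c in range(cols):                 # widen a rectangle starting at column c
            h = height[c]
            for c2 in range(c, cols):
                h = min(h, height[c2])
                if h == 0:
                    break
                best = max(best, h * (c2 - c + 1))
    return best

grid = np.zeros((10, 10), dtype=int)
grid[:, 0:6] = 1                          # occupied block
print(maximal_free_area(grid))            # 10 * 4 = 40 free cells in the largest rectangle
```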
3 Improved Greedy
In order to avoid getting trapped in local optima, the solution is obtained from the global space rather than a local space by probability control.
3.1 Global Optimized Placement
3.1.1 Movement and Rotation
A new solution is obtained by movement and rotation of some modules. Movement changes a module's position to produce a new solution. For example, Fig. 4 shows an initial solution that includes 5 modules; we get a new solution (Fig. 5) by moving module 3 from coordinate (12,1) to coordinate (12,8). By rotating module 5, we also get a new solution (Fig. 6).
Fig. 4 Initial solution
Fig. 5 Movement
Fig. 6 Rotation
3.1.2 Global Selection
An important problem is how to generate a new solution over the global arrangement, which rests on the number of movements and rotations performed each time. All solutions can be produced if all modules are moved and rotated at every step. The smaller the number of moved modules, the smaller the solution space, which results in local minimization; but if the number of moved modules is too large, randomness and running time increase. So we propose a dynamical selection to solve this problem. The basic idea of dynamical selection is that if there are i modules in one solution space, k of them are selected for movement and rotation. k is a random value chosen by the following rule: k lies in the range (1...i) and, given that f(v) is the probability of k = v,
f(v) = (1/2)^v for v = 1, ..., i-1, and f(i) = (1/2)^(i-1).   (2)
For example, for a solution including 5 modules, k modules are selected to operate with:
- the probability of k=1 is 50%
- the probability of k=2 is 25%
- the probability of k=3 is 12.5%
- the probability of k=4 is 6.25%
- the probability of k=5 is 6.25%
This probability control not only keeps the selection within a small range, so that a good solution is reached quickly, but also leaves a certain probability of generating a new solution in a larger range, even in the global space. Pseudo code:

Begin GetRandomNum(A)
    V = GetRan(1);      // get a random number in (0...1)
    k = 0;
    While (V < 1)
    Begin
        V = V * 2;      // each doubling halves the remaining probability mass
        k++;
        If (k >= A) Return A;
    End While
    Return k;
End GetRandomNum
3.2 Optimization
Generally, moving modules to the corner yields better solutions, so this paper proposes a weight matrix to decide the movement. The basic idea of the placement optimization is that the probability of moving to a corner is larger than that of moving to the center. The formula is
Map[i][j] = |i - m/2| + |j - n/2| + 1, i = 0...m, j = 0...n,   (3)
where m, n define the range within which the module can move. For example, consider the module
0101
0111
1100
The top-left point is the basic point A; the position of A in a range of 5*9 is
000000000
000000000
000000000
000000000
000000000
so point A can only move in the yellow range, that is map[3][5]. Based on formula (3), the weight matrix is obtained. The nearer to the corner, the bigger the weight in this matrix, which will optimize the solution. On the basis of the weight matrix, adding the former data to the later data produces the selection matrix. From the selection matrix we can get the movement coordinate; the detailed steps are as follows:
1. Get a random value in [0...43], e.g. 17.
2. Decide which area the value belongs to, e.g. 17 ∈ [16, 19], which is the area between map[0][4] and map[1][0].
3. Get the point to which the module moves, e.g. point (1,0).
4. Move the module to that point, so that we get a new solution:
000000000
010100000
011100000
110000000
000000000
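The sketch below is our own illustration of this weight-matrix idea: corner positions get larger weights via Eq. (3), the weights are accumulated into a selection matrix of cumulative sums, and a uniform random value is mapped back to a target coordinate. The function names are ours, and since the paper's actual weight and selection matrices did not survive extraction, the numbers produced here will differ from the worked example above.

```python
import numpy as np

def weight_matrix(m, n):
    """Map[i][j] = |i - m/2| + |j - n/2| + 1 over the feasible movement range (Eq. (3))."""
    i = np.arange(m + 1)[:, None]
    j = np.arange(n + 1)[None, :]
    return np.abs(i - m / 2) + np.abs(j - n / 2) + 1

def pick_target(weights, rng):
    """Sample a coordinate with probability proportional to its weight by drawing a value
    in [0, total) and locating it in the cumulative sums (the 'selection matrix')."""
    cum = np.cumsum(weights.ravel())
    r = rng.uniform(0, cum[-1])
    k = int(np.searchsorted(cum, r, side="right"))
    return np.unravel_index(k, weights.shape)

rng = np.random.default_rng(0)
w = weight_matrix(3, 5)        # a 4 x 6 feasible range
print(pick_target(w, rng))     # corner coordinates are chosen more often than central ones
```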
3.3 Test Data
Based on the test data, the improved Greedy can get a better solution and costs less time than the other algorithms. The improved Greedy avoids local solutions successfully and becomes more efficient.

Table 1 Test data between algorithms

Test case   Algorithm   Complexity O   Execution time
Test1       Greedy      T*n*mn         2.103 s
Test1       SA          T*L*n*mn       3.067 s
Test1       GA          n*mn*P         4.766 s
Test2       Greedy      T*n*mn         1.943 s
Test2       SA          T*L*n*mn       3.264 s
Test2       GA          n*mn*P         4.747 s
Test3       Greedy      T*n*mn         1.487 s
Test3       SA          T*L*n*mn       3.177 s
Test3       GA          n*mn*P         4.664 s
Test4       Greedy      T*n*mn         1.016 s
Test4       SA          T*L*n*mn       1.432 s
Test4       GA          n*mn*P         4.101 s

The maximal free matrix values reported for the four test cases are: Test1: 103.4, 98.3 (average), 16×3 (48, average); Test2: 100.6, 89.7 (average), 8×5 (40, average); Test3: 86.6, 84.3 (average), 6×8 (48, average); Test4: 132, 136.7 (average), 9×12 (108, average). n is the number of patterns, and mn is the number of units of a pattern.
4 Conclusion
The paper proposes two methods: dynamical selection and the weight matrix. Through dynamical selection, Greedy is provided with an ability of global selection, which avoids getting stuck in local solutions. What is more, the weight matrix makes Greedy more efficient. On the other hand, the marker bit method, which is used in placement to solve the problem of overlap and connection, gives Greedy better stability and accuracy.
Acknowledgments. The work is a part of NOC. My guide teacher, Teacher Yuan, gave a lot of good advice for this article, and my classmates helped me complete it. I would like to thank my teacher and my classmates for their help over so many days.
References 1. Bazargan, K., Kastner, R., Sarrafzadeh, M.: Fast Template Placement for Reconfigurable Computing Systems. IEEE Design and Test of Computers 17, 68–83 (2000) 2. Walder, H., Plazner, M., Thiele, L.: Online Scheduling and Placement of real-time asks to partially Reconfigurable Devices. In: Real-Time Systems Symposium (RTSS), pp. 224–225 (2003) 3. Walder, H., Steige, C., Plazner, M.: Fast Online Task Placement on FPGAs: Free Space Partitioning and 2D Hashing. In: International Parralel and Distribute Processing Symposium (IPDPS), p. 178 (2003) 4. Ahmadinia, A., Bodda, C., Teich, J.: A Dynamic Scheduling and Placement Algorithm for Reconfigurable Hardware. In: Müller-Schloer, C., Ungerer, T., Bauer, B. (eds.) ARCS 2004. LNCS, vol. 2981, pp. 125–139. Springer, Heidelberg (2004) 5. Handa, M., Vemuri, R.: An Efficient Algorithm for Finding Empty Space for Online FPGA placement. In: Design Automation Conference (DAC), June 2004, pp. 960–965 (2004) 6. Handa, M., Vemuri, R.: An Integrated online Scheduling and Placement Methodology. In: Becker, J., Platzner, M., Vernalde, S. (eds.) FPL 2004. LNCS, vol. 3203, pp. 444–453. Springer, Heidelberg (2004) 7. Cui, J., Deng, Q., He, X., Gu, Z.: An Efficient Algorithm for Online Management of 2D Area of Partially Reconfigurable FPGAs. In: Design Automation and Test in Europe (DATE), pp. 129–134 (2007) 8. Cui, J., Gu, Z., Liu, W., Deng, Q.: An Efficient Algorithm for Online Soft Real-time Task Placement on Reconfigurable Hardware Device. In: IEEE International Symposium on Object/Component/Service-Oriented Real-time Distributed Computing (ISORC), pp. 321–328 (2007)
An Alternative Fast Learning Algorithm of Neural Network Pin-Hsuan Weng, Chih-Chien Huang, Yu-Ju Chen, Huang-Chu Huang, and Rey-Chue Hwang*
Abstract. In this paper, an alternative fast learning algorithm for supervised neural networks is proposed. Both the linear multi-regression and back-propagation learning methods are used alternately in the network's training process. This new learning scheme is expected to improve the learning efficiency and accuracy of neural networks in real applications. To demonstrate the superiority of the learning method we developed, several examples were simulated. The conventional back-propagation learning method was also performed as a comparison with the proposed method. From the simulation results, the new method we propose not only has a faster learning speed, but also has better learning efficiency.
Keywords: Alternative, Fast learning, Neural network, Linear multi-regression, Back-propagation.
1 Introduction In the past two decades, neural network (NN) has been widely and successfully applied into many areas due to its powerful capabilities of learning and adaptation [1-3]. Usually, error back-propagation (BP) is the most known and popular learning algorithm used in the NN applications. It is an iterative gradient method to define the parameters of a multi-layer network. However, the slow convergence and easy plunging into the local minimum are two main drawbacks of BP learning algorithm. Pin-Hsuan Weng, Chih-Chien Huang, Rey-Chue Hwang Electrical Engineering Department, I-Shou University, Kaohsiung County 840, Taiwan, China
*
Yu-Ju Chen Information Management Department, Cheng Shiu University, Kaohsiung 833, Taiwan, China Huang-Chu Huang Electric Communication Department, Kaohsiung Marine University, Kaohsiung 811, Taiwan, China
[email protected] H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 205–212. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
Basically, the learning rate and momentum are two parameters associated with BP learning. They are selected as constants in most NN applications. An improper setting of these two values may either slow down the learning process or even prevent the convergence of the network. Therefore, how to speed up the training and improve the learning accuracy of NN has been studied by many researchers [4-13]. In these past studies, the linear multi-regression (LMR) method was also investigated and used for the optimization of NN [9-12]. The idea is to partition the overall weights of the NN into two parts, i.e. a nonlinear part and a linear part. It is assumed that the relationship between the output layer and the hidden layer is linear, so that the weights between these two layers can be solved by the LMR method directly. The weights between the hidden layer and the input layer are still nonlinear, so the optimal values of those weights are searched by the gradient error BP method. According to our research experience, the LMR algorithm can undoubtedly approach the best solution for a two-layer neural model, i.e. an NN that includes input and output layers only. But for a multi-layer NN model, how to determine the appropriate number of training epochs for the weights between the input and hidden layers is still a critical problem. In other words, after the LMR process on the linear part of the NN model, the follow-up nonlinear training between the hidden layer and the input layer becomes a very important step for obtaining an optimal NN model. An inappropriate setting of the number of training epochs easily makes the NN model diverge or become a poorly trained model. In this study, an alternative supervised learning algorithm for the NN model is proposed. Both LMR and BP learning methods are used alternately in the network's training. The mechanism for determining the number of training epochs in the nonlinear BP learning part is developed and reported. In Section 2, both the BP learning method and the LMR algorithm are briefly described. Section 3 presents the learning mechanism we propose. The simulations and comparisons using the two methods are presented in Section 4, and a conclusion is given in Section 5.

2 BP Learning and LMR Algorithms
2.1 BP Learning Method
Fig. 1 shows the architecture of a three-layer neural model. The sigmoid function $f(x) = 1/(1 + \exp(-x))$ is used as the signal transfer function for each neuron in the model. Here we assume that the size of the NN is n-m-1, i.e. n nodes in the input layer, m nodes in the hidden layer and one node in the output layer. Denote by $\omega_{ij}^{1}$ the weight connecting the $j$-th unit of the hidden layer and the $i$-th unit of the input layer, and by $\omega_{j}^{2}$ the weight connecting the $j$-th unit of the hidden layer and the output layer. Let $z_{ik}$, $s_k$, $o_k$ and $r_{jk}$ be the $i$-th input, desired output, actual output and
the output of the $j$-th hidden node for input pattern $k$. $\hat{\delta}_k$ and $\delta_{jk}$ are defined as the error terms of the output node and the $j$-th hidden node, respectively. Based on the steepest gradient descent algorithm, the BP learning rule can be summarized as follows [3].
Step 1: Initialize all weights to small random values.
Step 2: Present an input pattern $k$ and specify the desired output $s_k$.
Step 3: Calculate the network output using the present weights.
Step 4: Find the error terms. For the output node:
$$\hat{\delta}_k = (s_k - o_k)\, o_k (1 - o_k) \qquad (1)$$
For the hidden node $j$:
$$\delta_{jk} = r_{jk}(1 - r_{jk})\,\hat{\delta}_k\, \omega_j^2 \qquad (2)$$
Step 5: Adjust the weights by
$$\omega_j^2(k+1) = \omega_j^2(k) + \eta\,\hat{\delta}_k r_{jk} + \alpha\big(\omega_j^2(k) - \omega_j^2(k-1)\big) \qquad (3)$$
$$\omega_{ij}^1(k+1) = \omega_{ij}^1(k) + \eta\,\delta_{jk} z_{ik} + \alpha\big(\omega_{ij}^1(k) - \omega_{ij}^1(k-1)\big) \qquad (4)$$
where $(k+1)$, $(k)$ and $(k-1)$ denote the next, present and previous iterations, respectively, $\eta$ is the learning rate and $\alpha$ is the momentum.
Step 6: Present the next input pattern $k+1$ and go back to Step 2. All training inputs are presented cyclically until the weights stabilize.
Fig. 1 A three-layer neural model
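A compact NumPy sketch of one training pass of the rule (1)-(4) for an n-m-1 sigmoid network is given below. It is our own illustration, not the authors' code; bias terms are omitted for brevity and the function and variable names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bp_epoch(X, s, W1, w2, eta=0.5, alpha=0.1, dW1=None, dw2=None):
    """One pass over all patterns. W1 is (n, m), w2 is (m,). Returns the updated
    weights and the last increments, which act as the momentum terms of (3)-(4)."""
    if dW1 is None: dW1 = np.zeros_like(W1)
    if dw2 is None: dw2 = np.zeros_like(w2)
    for z, target in zip(X, s):
        r = sigmoid(z @ W1)                            # hidden outputs r_jk
        o = sigmoid(r @ w2)                            # network output o_k
        d_out = (target - o) * o * (1 - o)             # Eq. (1)
        d_hid = r * (1 - r) * d_out * w2               # Eq. (2)
        dw2 = eta * d_out * r + alpha * dw2            # Eq. (3) increment
        dW1 = eta * np.outer(z, d_hid) + alpha * dW1   # Eq. (4) increment
        w2 = w2 + dw2
        W1 = W1 + dW1
    return W1, w2, dW1, dw2
```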
2.2 LMR Method
Regression analysis is the method for exploring relationships between a response variable and predictor variables. LMR can obtain the optimal solution if the data base is linear and well suited to estimation. Here, the LMR method is briefly described as follows. Usually, a linear prediction model can be expressed as
$$\hat{y} = \hat{b}_0 + \hat{b}_1 x_1 + \hat{b}_2 x_2 + \cdots + \hat{b}_n x_n \qquad (5)$$
In this expression, $\hat{y}$ represents a predicted value of the model based on specified values of the variables $x_1, x_2, \dots, x_n$. The estimated regression coefficients are $\hat{b}_0, \hat{b}_1, \hat{b}_2, \dots, \hat{b}_n$. Suppose the data are denoted as $(x_{1i}, x_{2i}, \dots, x_{ni}, y_i)$, $i = 1, \dots, m$. Then the multiple linear regression model can be represented as $y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_n x_{ni}$, for $i = 1, \dots, m$. It can be rewritten in the matrix form
$$Y = X\beta \qquad (6)$$
⎡1 ⎢1 ⎢ X = ⎢1 ⎢ ⎢ ⎢⎣1
x11
x21
x12 x13
x22 x23
x1m
x2 m
⎡ y1 ⎤ xn1 ⎤ ⎢y ⎥ ⎥ … xn 2 ⎥ ⎢ 2⎥ , ⎢ y3 ⎥ and β = [b0 , b1 , b2 , Y = ⎥ xn 3 ⎢ ⎥ ⎥ ⎢ ⎥ ⎥ ⎢⎣ y m ⎥⎦ xnm ⎥⎦
, bn ]T
Based on the least-squares estimation, the following equation can be obtained.
$$(X^T X)\beta = X^T Y \qquad (7)$$
Thus, the least-squares estimator can be expressed as
$$\hat{\beta} = (X^T X)^{-1}(X^T Y) \qquad (8)$$
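In code, the estimator (8) is one call to a linear least-squares solver. The sketch below (ours) builds the design matrix with the leading column of ones and solves it numerically instead of forming the explicit inverse.

```python
import numpy as np

def lmr_fit(x, y):
    """Least-squares estimate of [b0, b1, ..., bn] for y = b0 + b1*x1 + ... + bn*xn."""
    X = np.column_stack([np.ones(len(y)), x])         # m x (n+1) design matrix
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)      # numerically safer than (X'X)^-1 X'y
    return beta

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 3))
y = 2.0 + x @ np.array([1.0, -0.5, 0.3]) + 0.01 * rng.normal(size=200)
print(lmr_fit(x, y))   # approximately [2.0, 1.0, -0.5, 0.3]
```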
3 Alternative Learning Algorithm In articles [9-12], the training process by combining LMR and BP techniques was presented for improving the learning speed of NN. The training procedure can be briefly described as follows. Step 1: Initialize all weights to small random values. Step 2: Calculate the output of each hidden node for each input pattern. (Or, the NN model can be trained with some iteration firstly, then calculate the output of each hidden node.) Step 3: Develop a linear system of equations and then use LMR method to calculate the weights between output layer and hidden layer.
Step 4: Evaluate the training error and use the BP method to propagate this error to update the weights between the input layer and the hidden layer.
Step 5: Go back to Step 2, until convergence is achieved.
However, according to our research experience, to make the NN model converge efficiently, the follow-up nonlinear training between the input layer and the hidden layer becomes a very important step in the whole training process. In the above procedure, an inappropriate setting of the number of training epochs easily makes the NN model diverge or become a poorly trained model. To improve the learning efficiency of the NN, an alternative learning method with LMR and BP techniques is proposed. In this method, the LMR and BP algorithms are used alternately in the training process of the NN model. The training procedure is presented as follows.
Step 1: Initialize all weights to small random values.
Step 2: Train the NN by BP learning with a certain number of iterations. In our study, 100 training epochs were set for each simulation.
Step 3: Calculate the average of the weight changes for all hidden nodes at the 100th epoch and denote it as $\Delta\omega^2_{BP}$.
Step 4: Fix the weights between the input and hidden layers, then use the LMR method to find the optimal increments of the weights ($\Delta\omega^2_j$, $j = 1, \dots, m$) between the hidden and output layers. Furthermore, update the weights $\omega^2_j$ for $j = 1, \dots, m$.
Step 5: Calculate the average value of $\Delta\omega^2_j$ for $j = 1, \dots, m$ and denote it as $\Delta\omega^2_{LR}$.
Step 6: Let $N_1 = \Delta\omega^2_{LR} / \Delta\omega^2_{BP}$.
Step 7: Fix the weights between the hidden layer and the output layer and then retrain the weights between the input layer and the hidden layer by the BP method for $N_1$ epochs. In our study, we set a limit on the $N_1$ value, i.e. if $N_1 > 1000$ then $N_1 = 1000$, and if $N_1 < 20$ then $N_1 = 20$.
Step 8: Go back to Step 4, until the convergence of the NN is achieved.
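The alternating schedule of Steps 1-8 can be sketched as the loop below. This is a simplified illustration of ours, not the authors' implementation: it uses batch gradient steps without momentum and a linear output unit so the LMR solve is exact, whereas the paper keeps a sigmoid output and per-pattern updates. It is meant only to show the alternation and the bounded N1 rule.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bp_hidden_epochs(X, s, W1, w2, epochs, eta=0.5):
    """Gradient steps on the input-to-hidden weights only, output weights held fixed.
    Returns the updated W1 and the mean absolute weight change of the last epoch."""
    last = 0.0
    for _ in range(epochs):
        R = sigmoid(X @ W1)                       # hidden outputs
        O = R @ w2                                # linear output unit (simplification)
        err = s - O
        d_hid = R * (1 - R) * err[:, None] * w2   # back-propagated hidden error terms
        step = eta * (X.T @ d_hid) / len(X)
        W1 = W1 + step
        last = np.mean(np.abs(step))
    return W1, last

def alternating_train(X, s, n_hidden=4, rounds=5, seed=0):
    """Alternate LMR solves for the output weights with N1 BP epochs on W1 (Steps 1-8)."""
    rng = np.random.default_rng(seed)
    W1 = 0.1 * rng.normal(size=(X.shape[1], n_hidden))
    w2 = 0.1 * rng.normal(size=n_hidden)
    W1, d_bp = bp_hidden_epochs(X, s, W1, w2, epochs=100)      # Steps 2-3
    for _ in range(rounds):
        R = sigmoid(X @ W1)
        w2_new, *_ = np.linalg.lstsq(R, s, rcond=None)         # Step 4: LMR output weights
        d_lr = np.mean(np.abs(w2_new - w2))                    # Step 5
        w2 = w2_new
        n1 = int(np.clip(d_lr / max(d_bp, 1e-12), 20, 1000))   # Steps 6-7: bounded epoch count
        W1, d_bp = bp_hidden_epochs(X, s, W1, w2, epochs=n1)
    return W1, w2
```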
4 Simulations
To demonstrate the superiority of the learning method we proposed, the identification of two nonlinear systems was studied and simulated. For comparison, both systems were also modeled by the NN with the BP learning rule only. In order to make a clear comparison, all initial parameters of the NN models with the different learning techniques are the same. In addition, all neural models are executed with nine different learning rates, from 0.1 to 0.9. Simulation 1: The first plant to be identified is
$$y(k) = x_1(k)^2 + x_2(k)\cdot x_3(k) + \Big(\frac{1}{x_2(k)\cdot x_4(k)}\Big)^2 \qquad (9)$$
Fig. 2 The error curves performed by BP and proposed methods with different learning rates for simulation 1 (solid line – BP method, dotted line – proposed method)
In our study, 1000 data points are generated from above system. 700 points are used for training and 300 points are used for testing. For all simulations, the size of NN model is 4-4-1. The input vector of neural model is [ x1 (k ), x2 (k ), x3 (k ), x4 (k )] . Fig. 2 shows the error curves performed by BP and proposed methods with 9 different learning rates. From the simulation results shown, the performance of NN with proposed learning method obviously is much better than the NN model with BP learning method only.
Simulation 2: The second plant to be identified is
$$y(k) = 10\sin\big(\pi\, x_1(k)\, x_2(k)\big) + 20\big(x_3(k) - 0.5\big)^2 + 10\, x_4(k) + 5\, x_5(k) + \varepsilon \qquad (10)$$
where ε is the noise term which is uniformly distributed between [0, 1]. For this plant, 1000 data points are generated. 700 points are used for training and 300 points are used for testing. The size of NN model is 5-4-1. The input vector of neural model is [ x1 (k ), x2 (k ), x3 (k ), x4 (k ), x5 (k )] . Fig. 3 shows the error curves performed by two methods with 9 different learning rates. From the simulation
Fig. 3 The error curves performed by BP and proposed methods with different learning rates for simulation 2 (solid line – BP method, dotted line – proposed method)
results, same as the simulation 1, the performance of NN with proposed learning method obviously is much better than the NN model with BP learning method only.
5 Conclusion
In this paper, an alternative fast learning algorithm for supervised NN was proposed. Both LMR and BP learning methods were used alternately in the network's training process. From the simulation results, it is clear that the performance of the NN model with the alternative learning method is much better than that of the NN model with the BP learning method only. This method not only gives the NN faster learning, but also improves on the drawbacks of the methods proposed in past related research. However, only the stationary signal processing problem was studied in this paper; the non-stationary signal processing problem will be studied in our future work.
Acknowledgments. This work is supported by the National Science Council of Republic of China under contract No. NSC-97-2221-E-214-069.
References [1] Carpenter, G.A., Grossberg, S.: A Massively Parallel Architecture for a Selforganizing Neural Pattern Recognition Machine. Computer Vision, Graphics, and Image Processing 37, 55–115 (1987) [2] Chen, S., Billings, S.A., Grant, P.M.: Non-linear System Identification Using Neural Network. International Journal of Control 51, 1191–1214 (1990) [3] Kaotanzad, A., Hwang, R.C., Abaye, A., Maratukulam, D.: An Adaptive Modular Artificial Neural Network Hourly Load Forecaster and Its Implementation at Electric Utilities. IEEE Trans. On Power System 10, 1716–1722 (1995) [4] Leonard, J., Kramer, M.A.: Improvement to The Back-propagation Algorithm for Training Neural Network. Computers and Chemical Engineer 14, 337–341 (1986) [5] Azmi, M.R., Liou, R.: Fast Learning Process of MLPNN Using Recursive Least Squares Methods. IEEE Trans. on Signal Processing 4, 446–450 (1993) [6] Biegler, F., Omatu, S.: A Learning Algorithm for Multilayer Neural Networks Based on Linear Least Squares Problems. Neural Networks 6, 127–131 (1993) [7] Verma, B.K., Mulawka, J.J.: Training of The Multiplayer Perceptron Using Direct Solution Methods. In: Proc. 12th IASTED Int. Conf. Appl. Inform., pp. 18–24 (1994) [8] Mulawka, J.J., Verma, B.K.: Improving The Training Time of the Backpropagation Algorithm. Int. Journal of Microcomputer Application 13, 85–89 (1994) [9] Verma, B.: Fast Learning of Multilayer Perceptrons. IEEE Trans. on Neural Networks 8, 1314–1320 (1997) [10] Stager, F., Agarwal, M.: Three Methods to Speed up The Training of Feedforward and Feedback Perceptrons. Neural Networks 10, 1435–1443 (1997) [11] Xu, C.H., Xu, X.D.: Research of New Learning Method of Feedforward Neural Network. Journal of TSINGHUA University (Science and Technology), 301–307 (1999) [12] Stubberud, P., Bruce, J.W.: LMS Algorithm for Training Single Layer Globally Recursive Neural Networks. In: IEEE International Conference on Neural Networks Conference Proceedings, vol. 3, pp. 2214–2217 (1998) [13] Abid, S., Fnaiech, F., Najim, M.: A Fast Feedforward Training Algorithm Using a Modified Form of The Standard Backpropagation Algorithm. IEEE Trans. on Neural Networks 12, 424–430 (2001)
Computer Aided Diagnosis of Alzheimer’s Disease Using Principal Component Analysis and Bayesian Classifiers
Miriam López, Javier Ramírez, Juan M. Górriz, Ignacio Álvarez, Diego Salas-Gonzalez, Fermin Segovia, and Carlos García Puntonet
Abstract. Functional brain imaging with PET (Positron Emission Tomography) and SPECT (Single Photon Emission Computed Tomography) has a definitive and well established role in the investigation of a variety of conditions such as Alzheimer’s Disease (AD). Nowadays the inspection of PET and SPECT images is performed by expert clinicians, but usually entails time consuming and subjective steps. This work aims at providing an automatic tool to assist the interpretation of SPECT and PET images for the diagnosis of AD. The main problem to be handled is the so-called small size sample, which consists in having a small number of available images compared to the large number of features. This problem is faced up by reducing intensively the dimension of the feature space by means of Principal Component Analysis (PCA). Our approach is based on bayesian classifiers, which uses the a posteriori information to determine to which class the subject belongs, yielding 88.6% and 98.3% accuracy for SPECT and PET images respectively. Keywords: SPECT, PET, Alzheimer type dementia, Principal component analysis, Bayesian classification.
1 Introduction
Alzheimer’s Disease (AD) is a neuro-degenerative disease that causes memory loss, behavioral changes and cognitive impairment. Mainly elderly are
Miriam López · Javier Ramírez · Juan M. Górriz · Ignacio Álvarez · Diego Salas-Gonzalez · Fermin Segovia
Dept. of Signal Theory, Networking and Communications, University of Granada, Spain
Carlos García Puntonet
Dept. of Architecture and Computer Technology, University of Granada, Spain
H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 213–221. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
affected by this disease. Because of the aging populations, the number of Alzheimer’s patients is expected to increase dramatically in coming years, straining the health care system. Single Photon Emission Computed Tomography (SPECT) and Positron Emission Tomography (PET) are being largely used for the study of the regional Cerebral Blood Flow (rCBF) and cell metabolism respectively, and provide unique information for the identification of functional abnormalities relevant to AD. With these methods it is possible to detect early rCBF changes seen in this dementia (even before clinical symptoms) and differentiate AD from other dementias by means of the rCBF pattern change. In the interpretation of these cerebral functional images, well-trained classifiers provide effectively diagnostic information. Both SPECT and PET are non-invasive, three-dimensional functional imaging modalities that provide clinical information regarding biochemical and physiologic processes in patients. The evaluation of these images is usually done through visual ratings performed by experts. However, maybe due to the large amounts of data represented in comparison with the number of available imaged subjects (typically <100), statistical classification methods have not been widely used in this area. Several approaches have been recently proposed in the literature aiming at providing an automatic tool that guides the clinicians in the AD diagnosis process. The Voxels-As-Features (VAF) approach [1], which will be used as a reference, uses the brain voxels intensities to train a Support Vector Machine (SVM) classifier. The main problem to face up is the small size sample, that is, the number of available samples to train a classifier is much lower than the features used in the training step. Other approaches [2, 3, 4] try to resolve this problem by searching the most discriminant features or the brain regions of interest (ROIs), which are used to train the classifier. In this work, a fully computer aided diagnosis (CAD) system for the early detection of AD is shown. The information used in the classification step is obtained from a previous feature extraction phase. The selected features to be used in the training step undergo a Principal Component Analysis (PCA) transformation which allows us to reduce drastically the dimension of the feature space so the small size sample problem is solved. The resultant data are used to make up a bayesian classifier which makes use of the a posteriori information to classify the coming images of a new patient.
2 Image Preprocessing As in previous approaches [1] the classification task is based on the assumption that the same position in the volume coordinate system in different volumes corresponds to the same anatomical position. However this assumption is not met by the images without pre-processing: The subject who is being imaged is not always positioned at the same position in the reference frame of the
imaging device, and the anatomy does not always have the same shape and size among different subjects. This means that registering the volumes spatially is needed. Spatial registration is done by an implementation of the algorithms proposed in [5]. SPECT imaging generates volumes that only give a relative measure of the blood flow, whereas PET images represent information about cell metabolism. In order to make a direct comparison of the voxel intensities between SPECT or PET images possible, a previous normalization of the intensities is needed. For all the experiments, we normalize the images by applying an affine transformation to the intensities, as suggested also in [5]. All the images of the database are transformed using affine and non-linear spatial normalization, so that the basic assumptions are met. After the normalization steps, about $10^5$ voxels remain for each patient. As proposed in [2, 3], we first construct a binary mask which selects the voxels of interest and discards the rest. This is done by taking the voxels whose mean intensity value, averaged over all images, exceeds half of the maximum mean intensity value, and this mask is applied to the original images. In the resulting averaged images the irrelevant information has been removed or reduced. Fig. 1 represents the average image along the transaxial axis. The mask application will reject those voxels whose intensity values are lower than 0.5.
Fig. 1 Average image of the dataset, seen as consecutive transaxial slice images
3 Principal Component Analysis and Eigenbrains

PCA generates a set of orthonormal basis vectors, known as principal components, which maximize the scatter of all the projected samples. After the preprocessing steps, the remaining voxels are rearranged in vector form. Let X = [X1, X2, ..., Xn] be the sample set of these vectors, where n is the number of patients, each of dimensionality N. After normalizing the vectors to unit norm and subtracting the grand mean, a new vector set Y = [Y1, Y2, ..., Yn] is derived. Therefore, each Yi represents a normalized voxel vector with dimensionality N, Yi = (yi1, yi2, ..., yiN)^t, i = 1, 2, ..., n. The covariance matrix of the normalized vector set is defined as

Sigma_Y = (1/n) Sum_{i=1}^{n} Yi Yi^t = (1/n) Y Y^t    (1)

and the eigenvector and eigenvalue matrices Phi, Lambda are computed as

Sigma_Y Phi = Phi Lambda.    (2)
Note that YY^t is an N x N matrix while Y^tY is an n x n matrix. If the sample size n is much smaller than the dimensionality N, then diagonalizing Y^tY instead of YY^t reduces the computational complexity [6]:

(Y^t Y) Psi = Psi Lambda_1    (3)

T = Y Psi    (4)

where Lambda_1 = diag{lambda_1, lambda_2, ..., lambda_n} and T = [Phi_1, Phi_2, ..., Phi_n]. Derived from the eigenface concept [6] and due to their still brain-like appearance, the eigenbrains correspond to the dominant eigenvectors of the covariance matrix. In this approach, only m leading eigenvectors are used, which define the matrix P

P = [Phi_1, Phi_2, ..., Phi_m].    (5)
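The small-matrix trick of (3)-(4) is straightforward to implement; a sketch is given below, with names chosen for illustration only.

```python
import numpy as np

def eigenbrains(Y, m):
    """Y: (N, n) matrix whose columns are the normalized, mean-centred voxel
    vectors (N voxels, n patients, n << N). Returns the m leading eigenbrains
    by diagonalizing the small n x n matrix Y^T Y instead of Y Y^T."""
    eigvals, Psi = np.linalg.eigh(Y.T @ Y)           # n x n eigenproblem, ascending order
    order = np.argsort(eigvals)[::-1]                # sort eigenvalues in descending order
    eigvals, Psi = eigvals[order], Psi[:, order]
    T = Y @ Psi                                      # candidate eigenbrains, one per column
    T /= np.linalg.norm(T, axis=0, keepdims=True)    # normalize each eigenbrain to unit length
    P = T[:, :m]                                     # keep the m leading eigenbrains
    Z = P.T @ Y                                      # patient weights, Eq. (7)
    return P, Z, eigvals[:m]
```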
The criterion to choose the most discriminant eigenbrains is set by their separation ability, which is measured by the Fisher Discriminant Ratio (FDR), defined as

FDR = (mu_1 - mu_2)^2 / (sigma_1^2 + sigma_2^2)    (6)

where mu_i and sigma_i^2 denote the within-class mean and variance of class i, respectively. Fig. 2 shows the aspect of the three main eigenbrains for three slices along the transaxial axis.

For the classification task we project each patient vector onto the previously defined eigenbrain space, producing a vector of weights. For the whole database, a matrix of weights can be constructed, given by

Z = P^t Y.    (7)

Fig. 2 Three representative transaxial slices for the first three eigenbrains
4 Bayes Classifier

For pattern recognition, the Bayes classifier is the best classifier in terms of minimum Bayes error, therefore the a posteriori probability functions will be evaluated [7]. Let omega_1 and omega_2 denote the object classes (AD and NORMAL), and Z a patient voxel vector in the reduced PCA subspace. The a posteriori probability function of omega_i given Z is defined as

P(omega_i | Z) = p(Z | omega_i) P(omega_i) / p(Z),  i = 1, 2,    (8)

where P(omega_i) is the a priori probability, p(Z | omega_i) the conditional probability density function of Z given omega_i, and p(Z) is the mixture density. The maximum a posteriori (MAP) decision rule for the Bayes classifier is defined as

p(Z | omega_i) P(omega_i) = max_j { p(Z | omega_j) P(omega_j) }  =>  Z in omega_i.    (9)

The projected brain data Z is thus assigned to the class omega_i whose a posteriori probability given Z is the largest. Usually there are not enough samples to estimate the conditional probability density function for each class (within-class density). A compromise, therefore, is to assume a particular density form and convert the general density estimation question into a parametric one. The within-class densities are usually modeled as normal distributions

p(Z | omega_i) = 1 / ((2 pi)^{m/2} |Sigma_i|^{1/2}) x exp( -(1/2)(Z - M_i)^t Sigma_i^{-1} (Z - M_i) )    (10)

where M_i (see (11)) and Sigma_i are the mean and covariance matrix of class omega_i, respectively.
4.1 Probabilistic Reasoning Model (PRM)

Under the unified Bayesian framework, a new probabilistic reasoning model, PRM-1, is derived in [8], which utilizes the within-class scatters to derive averaged estimates of the within-class covariance matrices. For the PRM-1 model, in the reduced PCA subspace, we assume that all the within-class covariance matrices are identical and diagonal, and that each diagonal element is estimated by the sample variance in the one-dimensional PCA subspace. As a result, the conditional probability density function specifies the MAP rule as a quadratic classifier characterized by a Mahalanobis distance. In particular, let omega_1, omega_2 and N_1, N_2 denote the classes and the number of patients within each class, respectively, and let M_1, M_2 be the class means in the reduced PCA subspace span [Phi_1, Phi_2, ..., Phi_m]. We then have
M_i = (1/N_i) Sum_{j=1}^{N_i} Z_j^{(i)},  i = 1, 2,    (11)

where Z_j^{(i)}, j = 1, 2, ..., N_i, represents the sample voxel vectors of class omega_i. The PRM-1 model assumes the within-class covariance matrices are identical and diagonal:

Sigma_i = diag{ sigma_1^2, sigma_2^2, ..., sigma_m^2 }.    (12)

Each component sigma_i^2 can be estimated by the sample variance in the one-dimensional PCA subspace

sigma_i^2 = (1/L) Sum_{k=1}^{L} [ (1/(N_k - 1)) Sum_{j=1}^{N_k} ( z_{ij}^{(k)} - m_{ki} )^2 ]    (13)

where z_{ij}^{(k)} is the i-th element of the sample Z_j^{(k)}, m_{ki} the i-th element of M_k, and L the number of classes (two in our approach). From (10) and (12) it follows

p(Z | omega_i) = 1 / ((2 pi)^{m/2} Prod_{j=1}^{m} sigma_j) x exp( -(1/2) Sum_{j=1}^{m} (z_j - m_{ij})^2 / sigma_j^2 )    (14)

Thus the MAP rule (9) specifies a quadratic classifier characterized by the Mahalanobis distance (note that the priors are set to be equal):

Sum_{j=1}^{m} (z_j - m_{ij})^2 / sigma_j^2 = min_k Sum_{j=1}^{m} (z_j - m_{kj})^2 / sigma_j^2  =>  Z in omega_i    (15)
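A compact sketch of the PRM-1 classifier of (11)-(15) is shown below; it assumes the projected data are already available as a NumPy array and all names are illustrative.

```python
import numpy as np

def prm1_fit(Z, labels):
    """Z: (m, n) projected training data; labels in {0, 1}.
    Returns class means (11) and the shared diagonal variances of (12)-(13)."""
    means = np.array([Z[:, labels == c].mean(axis=1) for c in (0, 1)])
    # per-dimension sample variance, averaged over the two classes
    variances = np.mean(
        [Z[:, labels == c].var(axis=1, ddof=1) for c in (0, 1)], axis=0)
    return means, variances

def prm1_predict(z, means, variances):
    """MAP rule (15) with equal priors: pick the class with the smallest
    Mahalanobis-like distance under the shared diagonal covariance."""
    d = [np.sum((z - mu) ** 2 / variances) for mu in means]
    return int(np.argmin(d))
```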
5 Evaluation Results

The SPECT and PET images used in this work were acquired with a PRISM 3000 machine and a SIEMENS ECAT 47, respectively. They were initially labeled by experienced clinicians of the "Virgen de las Nieves" Hospital (Granada, Spain) and the "Clínica PET Cartuja" (Seville, Spain), respectively. The database consists of 79 SPECT patients (41 labeled as NORMAL and 38 labeled as AD) and 60 PET patients (18 NORMAL and 42 AD). Initially, the original brain image of size 79 x 95 x 69 voxels is reduced by averaging over subsets of 4 x 4 x 4 voxels. After applying the mask, the remaining voxels are rearranged into vector form so that PCA can be applied to the training set and the eigenbrains are obtained. The new patient to be classified is projected onto the eigenbrain space, the posterior probabilities P(omega_1 | Z) and P(omega_2 | Z) are computed (where omega_1 = NORMAL and omega_2 = AD), and the MAP rule is applied using the densities of (14).
Fig. 3 For PET images, distributions of the three first principal coefficients for AD and NORMAL patients
Fig. 3 represents the three main PCA coefficients as 3D points for NORMAL and AD patients. Fig. 4 shows the accuracy values obtained for SPECT and PET images as the number m of considered principal components increases. The accuracy results were obtained by testing the classifier with the Leave-One-Out method, that is, the classifier is trained using all but one patient, which is used in the test phase. This procedure is repeated for each patient and an average accuracy value over all the experiments is obtained. The VAF approach was also implemented and tested with the same cross-validation strategy. Results can be compared in Table 1.

Fig. 4 Accuracy for SPECT and PET images using Bayesian classifiers when the number m of considered principal components increases

Table 1 Accuracy results obtained from SPECT and PET images

                       SPECT    PET
VAF approach           77.2%    96.7%
Eigenbrains approach   88.6%    98.3%
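The Leave-One-Out protocol described above is simple to reproduce; a minimal sketch is given below, where `fit` and `predict` stand for the PCA + PRM-1 pipeline (or the VAF baseline) and are placeholders, not functions defined in the paper.

```python
import numpy as np

def loo_accuracy(features, labels, fit, predict):
    """Train on all patients but one, test on the held-out patient,
    repeat for every patient and average the accuracy."""
    n = len(labels)
    hits = 0
    for i in range(n):
        keep = np.arange(n) != i
        model = fit(features[keep], labels[keep])
        hits += predict(model, features[i]) == labels[i]
    return hits / n
```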
6 Conclusions

A computer aided diagnosis system for the early detection of Alzheimer's disease was presented in this paper. The system was developed by applying principal component analysis to the subset of voxels remaining after the preprocessing steps, which allows us to address the small sample size problem and drastically reduce the dimension of the feature space. The components that are most important in terms of separation ability are chosen to build a Bayesian classifier, and the a posteriori information is used when a new patient needs to be classified. With this approach, accuracy values of 88.6% and 98.3% are obtained for SPECT and PET images, respectively. These results outperform the accuracy values attained by the VAF approach, which has been taken as the reference work.

Acknowledgements. This work was partly supported by the Spanish Government under the PETRI DENCLASES (PET2006-0253), TEC2008-02113, NAPOLEON (TEC2007-68030-C02-01) projects and the Consejería de Innovación, Ciencia y Empresa (Junta de Andalucía, Spain) under the Excellence Project (TIC-02566).
References
1. Fung, G., Stoeckel, J.: SVM Feature Selection for Classification of SPECT Images of Alzheimer's Disease Using Spatial Information. Knowledge and Information Systems 11(2), 243-258 (2007)
2. Górriz, J.M., Ramírez, J., Lassl, A., Salas-Gonzalez, D., Lang, E.W., Puntonet, C.G., Álvarez, I., López, M., Gómez-Río, M.: Automatic Computer Aided Diagnosis Tool Using Component-Based SVM. In: Medical Imaging Conference, Dresden (2008)
3. Álvarez, I., López, M., Górriz, J.M., Ramírez, J., Salas-Gonzalez, D., Puntonet, C.G., Segovia, F.: Automatic Classification System for the Diagnosis of Alzheimer Disease Using Component-Based SVM Aggregations. In: Proc. International Conference on Neural Information Processing (2008)
4. Ramírez, J., Górriz, J.M., Salas-Gonzalez, D., Lassl, A., López, M., Puntonet, C.G., Gómez, M., Rodríguez, A.: Computer Aided Diagnosis of Alzheimer Type Dementia Combining Support Vector Machines and Discriminant Set of Features. Accepted in Information Sciences (2008)
5. Friston, K.J., Ashburner, J., Kiebel, S.J., Nichols, T.E., Penny, W.D.: Statistical Parametric Mapping: The Analysis of Functional Brain Images. Academic Press, London (2007)
6. Turk, M., Pentland, A.: Eigenfaces for Recognition. Journal of Cognitive Neuroscience 3(1), 71-86 (1991)
7. Fukunaga, K.: Introduction to Statistical Pattern Recognition, 2nd edn. Academic Press, London (1991)
8. Liu, C., Wechsler, H.: A Unified Bayesian Framework for Face Recognition. In: International Conference on Image Processing, ICIP 1998, pp. 151-155 (1998)
Margin-Based Transfer Learning Bai Su, Wei Xu, and Yidong Shen
Abstract. To achieve good generalization in supervised learning, the training and testing examples are usually required to be drawn from the same source distribution. However, in many cases this identical-distribution assumption is violated when a task from a new domain (the target domain) arrives while labeled data are only available from a similar old domain (the auxiliary domain). Labeling the new data can be costly, and it would also be a waste to throw away all the old data. In this paper, we present a discriminative approach that utilizes the intrinsic geometry of input patterns revealed by unlabeled data points, and derive a maximum-margin formulation of unsupervised transfer learning. Two alternative solutions are proposed to solve the problem. Experimental results on many real data sets demonstrate the effectiveness and the potential of the proposed methods.
Bai Su, Yidong Shen: State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, {subai,ydshen}@ios.ac.cn
Bai Su, Wei Xu: NEC Laboratories America, Inc.

1 Introduction

The problem of transfer learning, where information gained in one learning task is used to improve performance in another related task, is an important new area of research. This paper concerns the subproblem of domain adaptation, in which the information learned from an auxiliary domain is generalized to help learning in a related target domain, where the data of the two domains are distributed similarly, but not identically. Previous works [1, 13, 7, 6] have studied the supervised version of this problem, in which
labeled data from both auxiliary and target domains are available for training. In this work, however, we study the more challenging problem of unsupervised transfer learning, where no labeled data from the target domain are available at training time; instead, unlabeled target test data are available during training.

We are especially interested in applying maximum margin methods to transfer learning. Maximum margin methods, such as the Support Vector Machine and its extensions, have significantly advanced the state of the art for classification. In particular, SVMs, which use structural risk minimization instead of empirical risk minimization, have excellent learning, classification and generalization ability. However, these methods usually perform as poorly as traditional classification methods in the transfer learning scenario, since they all assume that the source and target domains are distributed identically. Recently, there has been some research [1, 13] on applying maximum margin methods to transfer learning, but it is restricted to semi-supervised transfer learning, i.e. a small portion of the target data has been labeled for training.

In this paper, we introduce the maximum margin method into unsupervised transfer learning for the first time. The major obstacle is that the labels of the target data are missing, while traditional maximum margin methods require supervised information to find the maximum margin separating the classes. To overcome this difficulty, we treat the supervised information provided by the classifier trained in the auxiliary domain as prior information and propose an iterative approach based on alternating optimization that makes the classification results consistent with the maximum margin in the target data. The main advantage of this transductive learning is that we can use any classifier as the source of prior information, which provides a certain flexibility in practice. However, it does not provide a model for the target domain, and a new round of computation is required when new data arrive. To avoid this problem, we further propose to train an SVM classifier for the target domain by modifying the regularizer in the objective function, which is solved under the same alternating optimization framework as before.
2 Problem Formulation

We consider a binary classification task with respect to a target data set Dt, where all examples are unlabeled. Besides, there is a fully-labeled auxiliary data set Da which comes from the auxiliary domain, and an auxiliary classifier f(x) has been trained from it. A classifier is a function mapping a data point xi (i = 1..n) with m attributes xi^j (j = 1..m) to yi, where yi is in {+1, -1}. The target data is drawn from a distribution that
is related to, but different from, the auxiliary data in a way unknown to the learner. The objective is to label the data in Dt as accurately as possible.
3 Maximum Margin Method

The goal of an SVM is to find the linear discriminant f(x) = w^T phi(x) + b (phi(x) being the feature map induced by the kernel) that maximizes the minimum classification margin by solving the following optimization problem:

min_{w,b,xi}  (1/2)||w||^2 + C xi^T e
s.t.  yi (w^T phi(xi) + b) >= 1 - xi_i,  xi_i >= 0

where Sum_i xi_i measures the total classification error, ||w||^2 is a regularization term that is inversely related to the margin between the two classes, C > 0 is a regularization parameter, and e is the vector of ones.

We aim at extending maximum margin methods to transfer learning. A trivial solution could be to assign all data points to the same class, since the class labels y are unknown, and the resultant margin could be infinite. To avoid such a meaningless solution, a class balance constraint is introduced:

-l <= e^T y <= l    (1)

where l >= 0 is a constant controlling the class imbalance. Then, the margin can be maximized by optimizing both the unknown y and the unknown parameters (w, b):

min_{y,w,b,xi}  (1/2)||w||^2 + C xi^T e    (2)
s.t.  yi (w^T phi(xi) + b) >= 1 - xi_i,  xi_i >= 0,
      yi = {+1, -1},  -l <= e^T y <= l.

Recall that the dual of the SVM is

max_alpha  alpha^T e - (1/2) alpha^T (K o yy^T) alpha    (3)
s.t.  alpha^T y = 0,  0 <= alpha <= Ce

where alpha = [alpha_1, ..., alpha_n]^T, K is the kernel matrix, and o denotes the element-wise product between matrices. Hence, (2) can also be written as

min_y max_alpha  alpha^T e - (1/2) alpha^T (K o yy^T) alpha    (4)
s.t.  alpha^T y = 0,  0 <= alpha <= Ce,
      yi = {+1, -1},  -l <= e^T y <= l.
3.1 Iterative Approach

As (4) is a non-convex integer programming problem and thus difficult to solve, we use an iterative approach based on alternating optimization [2]. First, fix y and maximize (3) w.r.t. alpha, which is standard SVM training. Since the auxiliary data is similar to the target data, the prediction made by f(x) can be treated as weak evidence of y, so we use it as the initial value of y. Then, fix alpha and minimize (2) w.r.t. y. Note that with a fixed alpha, both w and b can be determined from the KKT conditions, so the second step reduces to

min_{y,xi}  xi^T e
s.t.  yi (w^T phi(xi) + b) >= 1 - xi_i,  xi_i >= 0.

Obviously, this yields yi = sign(w^T phi(xi) + b). The two steps are then repeated until convergence. The second step is supposed to correct y to maintain consistency with the maximum margin through the target data set. However, the performance of this iterative approach is not satisfactory in practice. To understand this, consider the hinge loss used by the SVM:

li = 0                 if yi f(xi) > 1
li = 1 - yi f(xi)      otherwise.

As shown in Figure 1(a), the SVM tries to push yi f(xi) to the right of the turning point where yi f(xi) = 1. Changing the label of any point (e.g. point P) on the right of the turning point will incur a relatively large loss (Figure 1(b)) compared with the original one. Therefore, the procedure over-commits to the initial labels and thus easily gets stuck in local optima.
3.2 Iterative Approach Using Laplacian Loss Function

To solve this problem, the loss function has to be changed so that changing a class label incurs a smaller loss in the iterative approach. This can be done by using the epsilon-insensitive loss function (Figure 2(a)) used in support
Fig. 1 Hinge loss and the loss incurred by changing label

Fig. 2 epsilon-insensitive loss and Laplacian loss
vector regression (SVR). When epsilon = 0, it reduces to the Laplacian loss function l = |f(xi) - yi|, where yi = {+1, -1}, different from the one in standard SVR. The Laplacian loss function is shown under the same reference frame as the hinge loss in Figure 2(b). Note that because the Laplacian loss is symmetric around the turning point, the yi f(xi)'s will tend to lie around the turning point in order to reduce the loss. As changing the label no longer incurs a big loss, SVR can more easily get out of a poor solution. By replacing the hinge loss with the Laplacian loss, the objective function becomes

min_{w,y,b,xi,xi*}  (1/2)||w||^2 + C Sum_{i=1}^{n} (xi_i + xi_i*)    (5)
s.t.  yi - (w^T phi(xi) + b) <= xi_i
      -yi + (w^T phi(xi) + b) <= xi_i*
      xi_i >= 0,  xi_i* >= 0,  yi = +/-1,  -l <= e^T y <= l.

First we can fix y and then obtain w by solving the dual form of (5) and using the KKT conditions. As discussed at the beginning of Section 3, the class balance constraint (1) has to be maintained to avoid trivial solutions. Hence, each of the y's obtained throughout the iterative process is required to satisfy (1). To guarantee this, the bias b cannot be computed from the KKT conditions as usual. Instead, we obtain b by minimizing (5) with w fixed. Then (5) can be reduced to

min_{y,b}  Sum_{i=1}^{n} | w^T phi(xi) + b - yi |    (6)
s.t.  yi = +/-1,  -l <= e^T y <= l.

In this optimization problem, b practically defines a boundary on the sorted list of w^T phi(xi) + b between positive and negative. Apparently, in order to minimize the objective value, we should label those patterns whose f(xi) value is positive as +1, and the rest as -1. Therefore, we can shift b in the range such that the class balance constraint is satisfied, compute the objective value in (6) at each position, and set b to be the position with the
minimum objective. Sorting takes O(n log n) time and the search takes O(n) time, so the procedure takes O(n log n) time overall. y can also be obtained by solving (6), which gives yi = sign(w^T phi(xi) + b). The complete procedure is presented in Algorithm 1.

Algorithm 1. Maximum Margin Transfer Learning
1: Initialize y with the classification results of f(x).
2: Perform SVR training with Laplacian loss.
3: Compute w from the KKT conditions.
4: Compute the bias b as described in Section 3.2.
5: Assign the labels as yi = sign(w^T phi(xi) + b).
6: Repeat steps 2-5 until convergence.
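A rough sketch of Algorithm 1 is given below. It is only an illustration under several assumptions: scikit-learn's SVR with epsilon = 0 stands in for the Laplacian-loss regression, the class-balance constant is called `ell`, and the bias search simply scans thresholds over the sorted scores; none of these names or choices come from the paper.

```python
import numpy as np
from sklearn.svm import SVR

def mm1_transfer_labels(X_target, y_init, C=1.0, ell=5, max_iter=10):
    """y_init: +/-1 labels predicted by the auxiliary classifier on X_target."""
    y = y_init.copy()
    for _ in range(max_iter):
        svr = SVR(kernel="rbf", C=C, epsilon=0.0).fit(X_target, y)
        scores = svr.predict(X_target)                 # w^T phi(x) + b for every target point
        best_b, best_obj = 0.0, np.inf
        for b in -np.sort(scores):                     # candidate bias shifts
            y_cand = np.where(scores + b > 0, 1, -1)
            if abs(y_cand.sum()) <= ell:               # class balance |e^T y| <= ell
                obj = np.abs(scores + b - y_cand).sum()
                if obj < best_obj:
                    best_b, best_obj = b, obj
        y_new = np.where(scores + best_b > 0, 1, -1)
        if np.array_equal(y_new, y):                   # labels stable: converged
            break
        y = y_new
    return y
```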
So far, we have proposed an unsupervised transfer learning approach in which the labels of the target data are refined iteratively during the learning process until they are consistent with the maximum margin through the target data. The main goal of this approach is to obtain correct labels for the target data, not a model over the target data. Although we do get a linear discriminant in the end, the resultant solution will not be sparse due to the use of the Laplacian loss, which degrades its generalization ability.
3.3 Model Learning

In this section, we propose another approach that unifies the two goals, labeling the target data and learning a new model, under the same iterative optimization framework. Based on the observation that the distribution of the target data is similar to that of the auxiliary data, it is reasonable to assume that the discrepancy between the model of the target data and the model of the auxiliary data is also small. Thus we revise (5) to incorporate this assumption:

min_{w,y,b,xi,xi*}  (1/2)||w - w'||^2 + C Sum_{i=1}^{n} (xi_i + xi_i*)    (7)
s.t.  yi - (w^T phi(xi) + b) <= xi_i
      -yi + (w^T phi(xi) + b) <= xi_i*
      xi_i >= 0,  xi_i* >= 0,  yi = +/-1,  -l <= e^T y <= l

where w' is the counterpart of w in f(x). The objective function combines two goals, i.e. minimizing the discrepancy of the two models and the classification error on the target data. Note that ||w'|| + ||w - w'|| >= ||w||. Because ||w|| is inversely related to the margin and ||w'|| is constant, minimizing ||w - w'|| is also equivalent to maximizing a lower bound of the margin. As a result, the solution would have better generalization performance than the original one. The dual of (7) is:

min_y max_{alpha,beta}  -(1/2) || Sum_{i=1}^{n} (alpha_i - beta_i) phi(xi) ||^2 + Sum_{i=1}^{n} (alpha_i - beta_i)( yi - w'^T phi(xi) )    (8)
s.t.  Sum_{i=1}^{n} (alpha_i - beta_i) = 0,  0 <= alpha_i, beta_i <= C
      yi = {+1, -1},  -l <= e^T y <= l.
3.4 Computational Complexity MM1 and MM2 share the same computational complexity. The main computational load of the two algorithms is a sequence of quadratic programming. Modern implementations for quadratic programming typically have an empirical time complexity between O(n) and O(n2.3 ). The number of iterations of both algorithms is usually small (under 10) in practice.
4 Related Work Transfer learning aims to apply knowledge learned from auxiliary tasks, where labeled data are usually plenty, to develop an effective model for a related target task with limited labeled data or even no labeled data. There is no formal definition of “related tasks”, and in practice it refers to either related learning problems on the same data set or the same learning problem on different data sets. Different from semi-supervised learning in which both auxiliary data and target data are distributed identically, transfer learning deals with data sets with different distributions. One line of transfer learning research focused on learning with sample selection bias or covariance shift. In this problem setting, the training samples
230
B. Su, W. Xu, and Y. Shen
is governed by an unknown distribution p(x|λ) while the unlabeled test data is governed by a different unknown distribution p(x|θ). The training and test distribution may differ arbitrarily, but there is only one true unknown conditional distribution p(x|y). Zadronzy [4] and Bickel and Scheffer [5] directly estimate the ratio p(s = 1|x, λ, θ) ∝ p(x|θ)/p(x|λ), where s is a selector variable that decides whether an example x drawn under the test distribution p(x|θ) is moved into the training set or not. In practice, p(s = 1|x, λ, θ) can be estimated by a discriminative approach [5]. Despite the advance on sample bias correction methods, several difficulties limit their applicability on our problem. One of them is the difficulty of estimating the distribution of p(x), not to mention its change across domains. Moreover, the class-conditional p(y|x) often changes, which violates the assumption of sample selection bias. Another line of transfer learning research is about semi-supervised transfer learning. In this scenario, a typical assumption being made is that the number of labeled target data is limited. Thus the problem is how to utilize auxiliary data, which is typically sufficient, to learn a high-quality classification model over target data. Several works based on SVM [1, 13], logistic regression [6], and AdaBoost [7] have revealed that auxiliary data play the role of additional training data and boost the performance of target classification model. In contrast to these works, we focus on a different scenario: unsupervised transfer learning in which no labeled target data is available. There have been a few works to deal with transfer learning in this scenario. [8] proposed a co-clustering based algorithm (CoCC) to overcome the domain difference. CDSC [9] is a spectral classification based method in which the main idea is to regularize two objectives, namely, minimizing the cut size on all the data with the least inconsistency of the auxiliary data, and at the same time maximizing the separation of the target data. We compare our algorithm with CoCC and CDSC in the next section and demonstrate the better performance of our algorithm.
5 Experiments In this section we evaluate our algorithm empirically. We focus on binary text classification problems in the experiments, although the algorithm can be extended for multi-class problems by performing binary classification recursively.
5.1 Data Sets The cross-domain data sets are generated in specific strategies using 20 Newsgroups1 , Reuters-215782 and SRAA3 . Each of these data sets has at least two 1 2 3
http://people.csail.mit.edu/jrennie/20Newsgroups/ http://www.daviddlewis.com/resources/testcollections/ http://www.cs.umass.edu/ mccallum/data/sraa.tar.gz
Margin-Based Transfer Learning
231
level hierarchical structure. Suppose A and B are two root categories in one data collection, and A1, A2 and B1, B2 are sub-level categories of A and B respectively. Let A1, B1 be the positive and negative examples in the training set respectively. Let A2, B2 be the positive and negative examples in the test set respectively. Because training and test data are in different subcategories, they distributions are different but related. The 20 Newsgroups is a text collection of approximately 20,000 newsgroup documents. Six different data sets are generated from 20 Newsgroups for evaluating cross-domain classification algorithms. Each data set contains two top categories: one as positive and the other as negative classes. Then, we split the data based on subcategories. For instance, comp vs. sci indicates that the top category comp is treated as positive class and sci is as negative. The other data sets are named in the same way. SRAA is a Simulated/Real/Aviation/Auto UseNet data set for document classification. 73,218 UseNet articles are collected from four discussion groups about simulated autos (sim-auto), simulated aviation (sim-aviation), real autos (real-auto) and real aviation (real-aviation). We use the documents in real-auto and sim-auto as training data, while real-aviation and sim-aviation as test data. The auto vs aviation data set is generated in the similar way. Reuters-21578 is one of the most used test collections for evaluating automatic text-categorization techniques. The data sets are generated for crossdomain classification in the similar ways as what we have done on the 20 Newsgroups and SRAA corpora. Three data sets, orgs vs. people, orgs vs. places and peoples vs. places, are generated for cross-domain classification. Since there are too many sub-categories, we do not list the detailed description here.
5.2 Compared Algorithms Different types of algorithms are employed for comparison with our algorithms. The first is a supervised learning algorithm SVM. SVM has been successfully applied in many applications like text classification. To implement SVM algorithm, We use libSVM with radial basis kernel function (RBF). For SVM algorithm, the error rates are recorded both in the case of transfer learning and in the case of single domain classification. We also compared our algorithm with two semi-supervised algorithms, Transductive SVM (TSVM) [11] and Spectral Classifier (SC) [12]. SVMlight is employed to implement TSVM with a linear kernel function as in [11]. Two transfer learning algorithms, CoCC and CDSC, are selected as the state-of-art transfer learning algorithms to compare with our algorithm. All parameters are set default by the software. We use test error rate as the evaluation measure.
232
B. Su, W. Xu, and Y. Shen
Table 1 Performance Comparison for Different Data Sets Data Sets
K-L
auto vs aviation real vs simulated orgs vs people orgs vs places people vs places comp vs sci rec vs talk rec vs sci sci vs talk comp vs rec comp vs talk
1.126 1.161 0.303 0.329 0.307 0.874 1.102 1.021 0.854 0.866 0.967
SVM Da − Dt Dt − CV 0.289 0.034 0.235 0.031 0.300 0.106 0.418 0.094 0.241 0.083 0.305 0.015 0.204 0.004 0.202 0.009 0.207 0.009 0.182 0.010 0.114 0.006
Semi-Supervised Transfer Learning TSVM SC CoCC CDSC MM1 MM2 0.188 0.160 0.142 0.120 0.080 0.080 0.316 0.278 0.250 0.188 0.044 0.044 0.294 0.276 0.232 0.232 0.240 0.230 0.424 0.386 0.400 0.318 0.309 0.309 0.256 0.230 0.226 0.202 0.154 0.154 0.334 0.270 0.192 0.098 0.091 0.091 0.118 0.428 0.092 0.092 0.069 0.069 0.162 0.192 0.160 0.124 0.019 0.019 0.148 0.362 0.100 0.044 0.034 0.034 0.104 0.086 0.090 0.042 0.020 0.020 0.024 0.042 0.042 0.024 0.048 0.041
5.3 Experimental Results In Table 1, Kullback-Leibler divergence values [10] between the training and test sets are presented in the second column, which indicate the distances between different distributions. It can been seen that the KL-divergence values for all the data sets are much larger than the case when we simply split the same data set into test and training data, which has a KL value of nearly zero. The next column Da − Dt shows the performance when we use the labeled auxiliary data Da to train the SVM classifier for classification and test the target data Dt , while the column Dt − CV shows the best case of performing 10-fold cross-validation on Dt . Dt − CV is the case of single domain classification, which gives the best possibly result for transfer learning classifiers. In most cases, our algorithms achieve the best performance against other methods.
6 Conclusion In this paper, we address the issue of unsupervised transfer learning and introduce the maximum margin method to it for the first time. We formulated the problem as an iterative optimization problem. Two alternative approaches have be derived, namely MM1 and MM2. While MM1 utilize the prior label information provided by the auxiliary classifier, MM2 combine both the prior label information and the model information from the auxiliary classifier. With different characters, the two approaches can be applied alternatively according to different requirements in practice. The Experimental results on many data sets show that our algorithms consistently outperform the previously proposed unsupervised transfer learning algorithms, which demonstrate the effectiveness and potential of our proposed algorithms.
Margin-Based Transfer Learning
233
References 1. Wu, P., Dietterich, T.G.: Improving SVM Accuracy by Training on Auxiliary Data Sources. In: ICML (2004) 2. Zhang, K., Tsang, I.W., Kwok, J.T.: Maximum Margin Clustering Made Practical. In: ICML (2007) 3. Platt, J.: Fast Training of Support Vector Machines Using Sequential Minimal Optimization. In: Sch¨ olkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods-Support Vector Learning, pp. 185–208. MIT Press, Cambridge 4. Zadrozny, B.: Learning and Evaluating Classifiers under Sample Selection Bias. In: ICML (2004) 5. Bickel, S., Scheffer, T.: Dirichlet-Enhanced Spam Filtering Based on Biased Samples. In: NIPS (2007) 6. Liao, X., Xue, Y., Carin, L.: Logistic Regression with an Auxiliary Data Source. In: ICML (2005) 7. Dai, W., Yang, Q., Xue, G.R., Yu, Y.: Boosting for Transfer Learning. In: ICML (2007) 8. Dai, W., Xue, G.R., Yang, Q., Yu, Y.: Co-Clustering Based Classification for Out-of-Domain Documents. In: SIGKDD (2007) 9. Ling, X., Dai, W., Xue, G.R., Yang, Q., Yu, Y.: Spectral Domain-Transfer Learning. In: SIGKDD (2008) 10. Kullback, S., Leibler, R.A.: On Information and Sufficiency. Annals of Mathematical Statistics 22(1), 79–86 (1951) 11. Joachims, T.: Transductive Inference for Text Classification Using Support Vector Machines. In: ICML (1999) 12. Kamvar, S.D., Klein, D., Manning, C.D.: Spectral Learning. In: IJCAI (2003) 13. Yang, J., Yan, R., Hauptmann, A.: Adapting Svm Classifiers to Data with Shifted Distributions. In: Workshop on Knowledge Discovery and Data Mining from Multimedia Data and Multimedia Applications, Omaha, NE (2007) 14. Porter, M.: An Algorithm for Suffix Stripping Program. Program 14(3), 130– 137 (1980)
Nonlinear Dead Zone System Identification Based on Support Vector Machine Jingyi Du and Mei Wang *
Abstract. For a class of nonlinear dead-zone systems, a novel identification method based on support vector machines is presented. In this method, the nonlinear dynamic characteristic of the system is expressed with a Hammerstein model, and the dead-zone nonlinearity is modeled with a least squares support vector machine. Furthermore, a modified cost function is adopted to improve the accuracy of the dead-zone model for inputs inside the dead zone. In addition, the convergence condition of the algorithm is analyzed theoretically. Finally, a simulation of a hydraulic positioning servo system is carried out, verifying that the accuracy of the identified model is improved by the proposed method. Keywords: Support vector machines, Dead zone nonlinearities, Hammerstein model, Modified cost function, Hydraulic position servo system.
Jingyi Du, Mei Wang: School of Electrical and Control Engineering, Xi'an University of Science and Technology, Xi'an 710054, China

1 Introduction

In industrial processes, dead zones (insensitive zones) of different degrees exist in mechanical, electrical, hydraulic and control systems. This is a typical nonlinearity, and it increases the difficulty of modeling systems that contain it [1]. If the system input and output are measurable (known), the transmission characteristics of the system can be identified. Using the relationship between inputs and outputs, in which both the high-frequency harmonic component and the fundamental frequency component exist, Li and Zhu proposed a method to identify systems that contain a dead-zone link [2]. The method can estimate the system model parameters but fails to handle an asymmetric dead-zone link. Dai et al. applied genetic algorithms to simulate a steam turbine system whose nonlinear governing part contains a dead-zone section [3]. The result can recognize the system parameters accurately, but
the dead zone considered is symmetric with a slope assumed to be 1. Cao and Lu [4] suggested a two-step identification method, based on Legendre polynomial optimal square approximation, to handle systems containing an asymmetric dead-zone link. In practice, systems comprising dead-zone links are often not only asymmetric; the relationship between inputs beyond the positive or negative dead-zone edges and the outputs is itself nonlinear [5, 6]. Neural networks, which have inherent adaptability and learning ability, are essentially nonlinear information-transforming systems. They carry out massive parallel processing with neurons and can effectively identify the dynamic characteristics of nonlinear systems, so models of arbitrary nonlinear systems can be obtained [7]. Tsai and Chuang [8] and Selmic and Lewis [9] presented methods to identify models of nonlinear dead-zone links and achieved good application results. However, the application of neural networks is limited to a certain degree by over-fitting, local minima, and the dependence of structure and design on expertise. The support vector machine (SVM) is a novel machine learning algorithm proposed by Vapnik [10]. It has good properties such as learning from small samples, global optimality and generalization ability, and thus overcomes the above shortcomings of neural networks. Suykens and other researchers [11, 12] presented a new type of SVM, the least squares support vector machine (LS-SVM). As an extension of the SVM, the LS-SVM converts the quadratic programming problem of the SVM into the problem of solving a set of linear equations. For the required precision, the LS-SVM shows faster learning, a simpler architecture and a more concise algorithm. The Hammerstein model is suitable for describing nonlinear processes such as dead zones and switching behaviour. Based on the above analysis, in this paper the LS-SVM algorithm, together with a modified cost function that compensates for the model error inside the dead zone, yields good identification performance. Moreover, the convergence condition of the algorithm is deduced from the Lyapunov stability theorem.
2 System Description

The Hammerstein model of the nonlinear system containing the dead zone is shown in Fig. 1. There are two parts in the model. The first part is f(v(k)), which describes the nonlinear dead zone. The second part is the linear discrete transfer function G(z), which describes the linear system. v(k) and y(k) are the input and the output of the controlled object, respectively, and u(k) is the non-measurable intermediate
Fig. 1 The Hammerstein model of the non-linear system containing the dead zone
variable. In our specific application (a hydraulic servo system), u(k) would be the flow of the proportional valve. (In practice, direct measurement of this flow in the hydraulic servo system is very difficult, so it is hard to model the nonlinear dead zone of the proportional valve in the servo system.) Most references [15, 16, 17] tend to describe the dead zone as a standard symmetric insensitive part. In practice, the dead zone is often asymmetric, and moreover the relationship between inputs beyond the positive or negative dead-zone edge and the outputs is itself nonlinear (as is the case for the dead-zone characteristic of the valve in the hydraulic servo system). f(v(k)) is given by the nonlinear equation below (Fig. 2).
u(k) = f(v(k)) =  f1(v)   if v(k) > br
                  0       if -bl <= v(k) <= br        (1)
                  f2(v)   if v(k) < -bl

Fig. 2 Nonlinear dead zone model
The linear dynamic part can be expressed by the transfer function

G(z) = y(k)/u(k) = B(z^-1)/A(z^-1) = (b0 + b1 z^-1 + ... + bm z^-m) / (1 + a1 z^-1 + ... + an z^-n)    (2)

where m and n are the orders of the shift-operator polynomials and m <= n. Its difference equation is

y(k) = - Sum_{i=1}^{n} ai y(k-i) + Sum_{j=0}^{m} bj u(k-j)    (3)
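A small simulation sketch of this Hammerstein structure (dead zone followed by the difference equation) is given below; the linear branch slopes k1, k2 of the dead zone are illustrative choices, since the paper does not fix f1 and f2.

```python
import numpy as np

def dead_zone(v, br=2.5, bl=2.1, k1=1.0, k2=1.0):
    """Asymmetric dead zone of Eq. (1); linear branches with slopes k1, k2
    stand in for f1 and f2."""
    if v > br:
        return k1 * (v - br)
    if v < -bl:
        return k2 * (v + bl)
    return 0.0

def simulate_hammerstein(v_seq, a, b):
    """Dead zone followed by the difference equation (3):
    y(k) = -sum_i a_i y(k-i) + sum_j b_j u(k-j), with b = [b0, ..., bm]."""
    y_hist = np.zeros(len(a))                         # y(k-1), ..., y(k-n)
    u_hist = np.zeros(len(b))                         # u(k), u(k-1), ..., u(k-m)
    ys = []
    for v in v_seq:
        u_hist = np.concatenate(([dead_zone(v)], u_hist[:-1]))
        y = -np.dot(a, y_hist) + np.dot(b, u_hist)
        y_hist = np.concatenate(([y], y_hist[:-1]))
        ys.append(y)
    return np.array(ys)
```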
3 System Model Based on LS-SVM

Given the training sample set (xk, yk), k = 1, ..., N, a function of the form y(x) = w^T phi(x) + b is used to estimate the unknown function by the LS-SVM method [11, 12]. The final nonlinear model is obtained by solving an equality-constrained optimization problem:

y(x) = Sum_{k=1}^{N} alpha_k K(x, xk) + b    (4)
where K(x, xk) is a kernel function satisfying the Mercer condition; different SVMs can be constructed by selecting different kernel functions. The bias term b is not suitable for sequential calculation and is removed. Set phi^T(x) = (phi(x1), phi(x2), ..., phi(xN), lambda) and w = (w1, w2, ..., wN, b/lambda)^T, where lambda is a constant; that is, the dimension of the SVM feature vector is increased by one and the dimension of the corresponding weight vector is increased by one as well [18]. Then (4) becomes:

y(x) = Sum_{k=1}^{N} alpha'_k K'(x, xk)    (5)
where K'(x, xk) = K(x, xk) + lambda^2. For simplicity, K'(x, xk) and alpha' are still written as K(x, xk) and alpha below. The output y(k) of the nonlinear Hammerstein model containing the dead zone consists of two parts. The first part uses the LS-SVM to estimate the nonlinear dead-zone model between v(k) and u(k); the second part is a linear model which depends on past inputs and outputs. That is

u(k) = Sum_{j=1}^{N} alpha_j K( v(k), v_j )    (6)
where {v_j}, j = 1, ..., N, are the past N input data. For convenience, (6) is written as

u(k) = Sum_{j=1}^{N} alpha_j Kj( v(k) )    (7)
Combining (3) and (7), the output of the dead-zone nonlinear Hammerstein model is

y(k) = - Sum_{i=1}^{n} ai y(k-i) + b0 ( Sum_{j=1}^{N} alpha_j Kj(v(k)) ) + Sum_{l=1}^{m} bl u(k-l)    (8)

It can also be expressed in matrix form:

y(k) = Wn Xn(k)    (9)
where Wn = [-a1, -a2, ..., -an, b0 alpha_1, b0 alpha_2, ..., b0 alpha_N, b1, b2, b3, ..., bm] is the weight coefficient vector. The input vector is

Xn = [ y(k-1), y(k-2), ..., y(k-n), K1(v(k)), K2(v(k)), ..., KN(v(k)), u(k-1), u(k-2), ..., u(k-l) ]^T    (10)

If the parameter lambda and the kernel function (such as the radial basis kernel K(x1, x2) = exp( -||x1 - x2||^2 / 2s^2 )) are determined in the previous equation,
the terms {Kj(v(k))}, j = 1, ..., N, can be computed. Equation (9) is then a system of linear equations, and the weight coefficient vector can be found by an unconstrained gradient descent algorithm. A cost function is needed for gradient descent; usually the squared error is adopted, so the cost function is

J(k) = e(k)^2 / 2    (11)

where e(k) = y(k) - d(k) and d(k) is the target output. If the input signal v(k) lies inside the dead zone, -bl <= v(k) <= br, the output u(k) of the dead-zone nonlinearity is approximately zero. We therefore define

h(k) = u(k) = Sum_{j=1}^{N} alpha_j Kj(v(k)) ~ 0    (12)
or

h(k) = Wp Xp = Wn Xp    (13)

where Wp = [0, 0, ..., 0, b0 alpha_1, b0 alpha_2, ..., b0 alpha_N, 0, 0, ..., 0] and Xp = [0, ..., 0, K1(v(k)), ..., KN(v(k)), 0, ..., 0]^T.
Then the cost function becomes

J(k) = e(k)^2 / 2 + Delta upsilon h(k)^2 / 2    (14)

where upsilon is a penalty factor and Delta is given by

Delta = 1   if -eps_l <= v(k) <= eps_r        (15)
        0   otherwise

where eps_l and eps_r are the estimated bounds of the dead zone.
Following the gradient descent method with learning factor eta, the weight vector is updated as

Wn(k+1) = Wn(k) - eta * dJ(k)/dWn(k)    (16)

or

Wn(k+1) = Wn(k) - eta ( e(k) Xn(k) + Delta upsilon h(k) Xp(k) )    (17)

If the ideal weight vector is denoted W', the error function is

e(k) = Wn(k) Xn(k) - W' Xn(k) = W~n(k) Xn(k)    (18)

where W~n(k) = Wn(k) - W'. The discrete Lyapunov function is defined as V(k) = ||W~n(k)||^2, which varies during the training process.
DeltaV(k+1) = V(k+1) - V(k) = ||W~n(k+1)||^2 - ||W~n(k)||^2
            = eta^2 ||e(k) Xn(k)||^2 + upsilon^2 eta^2 ||h(k) Xp(k)||^2
              + 2 upsilon eta^2 tr{ e(k) Xn(k)^T h(k) Xp(k) }
              - 2 eta tr{ e(k) Xn(k)^T W~n(k)^T } - 2 eta upsilon tr{ h(k) Xp(k)^T W~n(k)^T }    (19)

or
DeltaV(k+1) = eta^2 ( |e(k)| ||Xn(k)|| + upsilon^2 |h(k)| ||Xp(k)|| )^2 - 2 eta e(k)^2 - 2 upsilon eta h(k)^2    (20)

Based on the Lyapunov stability theorem, DeltaV(k+1) < 0 must hold to ensure convergence of the algorithm, i.e. stability of the system. Equivalently, convergence is guaranteed when the learning factor is selected in the range

0 < eta < 2 ( e(k)^2 + upsilon h(k)^2 ) / ( |e(k)| ||Xn(k)|| + upsilon^2 |h(k)| ||Xp(k)|| )^2    (21)
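A loose sketch of the resulting training loop, combining (7), (10), (14) and (17), is shown below. It is only an illustration: the intermediate signal u(k) is replaced by its current LS-SVM estimate (it is not measurable), and the sizes, kernel width and learning rate are placeholder values, not the paper's tuned settings.

```python
import numpy as np

def rbf(v, centers, s=0.25):
    """Kj(v(k)): Gaussian kernel values of the current input against past-input centers."""
    return np.exp(-(v - centers) ** 2 / (2 * s ** 2))

def train_weights(V, Y, centers, n=3, m=3, eta=1e-3, upsilon=10.0,
                  eps_l=2.5, eps_r=2.5, epochs=20):
    N = len(centers)
    W = np.zeros(n + N + m)          # [-a_1..-a_n, b0*alpha_1..b0*alpha_N, b_1..b_m]
    u_hat = np.zeros(len(V))         # running estimate of the dead-zone output
    for _ in range(epochs):
        for k in range(max(n, m), len(V)):
            K = rbf(V[k], centers)
            u_hat[k] = W[n:n + N] @ K                        # current estimate of u(k) (up to b0)
            X_n = np.concatenate((Y[k - n:k][::-1], K, u_hat[k - m:k][::-1]))
            X_p = np.concatenate((np.zeros(n), K, np.zeros(m)))
            e = W @ X_n - Y[k]                               # output error e(k)
            h = W @ X_p                                      # should be ~0 inside the dead zone
            delta = 1.0 if -eps_l <= V[k] <= eps_r else 0.0
            W -= eta * (e * X_n + delta * upsilon * h * X_p)  # update (17)
    return W
```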
4 Simulation

Following the above formulation, a simulation of the modeling problem for a hydraulic servo system containing a dead zone is carried out. The mathematical model of the system can be depicted as a series connection of a dead-zone nonlinear part and a linear part [8]. The nonlinear dead-zone part is as depicted in Fig. 2, with br = 2.5 and bl = 2.1. The discrete model of the linear part is
Fig. 3 The identification structure block diagram of the hydraulic servo system
G(z) = y(k)/u(k) = (0.1499 z^2 + 0.0435 z - 0.0001) / (z^3 - 0.8898 z^2 - 0.1472 z + 0.0414)    (22)
The support vector machine can approximate arbitrary nonlinear characteristics, so an LS-SVM is designed as the identification machine to replace the nonlinear dead zone. As can be seen in Fig. 3, the sample data required for identification are the given excitation signals v(k) and the measured piston position output signals, which are relatively easy to obtain. For the hydraulic servo system identification, the excitation signal is v = 5 sin(5t). For the LS-SVM, the commonly used radial basis function is selected as the kernel; after several simulation tests, the kernel width is set to s = 0.25 and lambda = 5.6. For the nonlinear dead-zone identification we set bl = br = 2.5, i.e. the dead zone is assumed symmetric. Figure 4 shows the result obtained with the modified (variable) cost function (14) with penalty factor upsilon = 10, while Figure 5 shows the result obtained with the common cost function (11). Comparing Fig. 4 with Fig. 5, the nonlinear dead-zone model obtained with the variable cost function strategy is better because it specifically accounts for small inputs. The actual output signals of the system and the model output signals are shown in Fig. 6: the solid curve represents the actual output and the dashed curve the model output. The mean square error is MSE = 0.0832.

Fig. 4 The estimated dead zone using the varying cost function method
Fig. 5 The estimated dead zone using the common cost function method
Fig. 6 The actual output signals of the system and the model output signals
5 Conclusion

Dead-zone phenomena have an important impact on the accuracy and stability of control systems because of their strongly nonlinear character. In order to reduce or eliminate their influence on the system output, it is essential to analyze and understand dead-zone nonlinear systems thoroughly. A Hammerstein-model method based on the LS-SVM is proposed here to identify the nonlinear dead zone. A variable cost function strategy adapted to the input signal is adopted to improve the accuracy of the model identification, and the convergence range of the learning factor is derived from the Lyapunov stability theorem. The simulation results show the effectiveness of this method, which provides a new alternative for modeling nonlinear dead-zone systems.
References
1. Merzouki, R., Davila, J.A., Fridman, L., et al.: Backlash Phenomenon Observation and Identification in Electromechanical System. Control Engineering Practice 15(4), 447-457 (2007)
2. Li, S.X., Zhu, J.: A Modeling Method for the Non-Linear Servo System Containing a Dead Zone. Automation Technology and Application 18, 4-8 (1999)
3. Dai, Y.P., Deng, R., Liu, J.: Research on Parameter Identification for Nonlinear Steam Turbine Governing System Based on Genetic Algorithm. Power Engineering 23(1), 2215-2218 (2003)
4. Cao, X.Q., Lu, Q.H.: Two-Step Method for Identifying Physic Parameters of Systems with Dead Zone Nonlinearity. Journal of Tsinghua University (Natural Science) 47, 2045-2047 (2007)
5. Wang, X.S., Su, C.Y., Hong, H.: Robust Adaptive Control of a Class of Nonlinear Systems with Unknown Dead-Zone. Automatica 40, 407-413 (2004)
6. Lee, S.W., Kim, J.H.: Control of Systems with Deadzones Using Neural Network Based Learning Controller. In: Proc. IEEE Int. Conf. Neural Networks, pp. 2535-2538 (1994)
7. Zhu, Q.B.: Neural Network Learning Correction Method of the Nonlinear Sensing Characteristics and Its Application. Instruments Journal 25(6), 505-805 (2004)
8. Tsai, C.H., Chuang, H.T.: Deadzone Compensation Based on Constrained RBF Neural Network. Journal of the Franklin Institute 341, 361-374 (2004)
9. Selmic, R.R., Lewis, F.L.: Deadzone Compensation in Motion Control Systems Using Neural Networks. IEEE Trans. Autom. Control 45, 602-613 (2000)
10. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
11. Suykens, J., Vandewalle, J.: Least Squares Support Vector Machine Classifiers. Neural Processing Letters 9, 293-300 (1999)
12. Suykens, J., Van Gestel, T., De Brabanter, J., et al.: Least Squares Support Vector Machines. World Scientific Publishing, Singapore (2002)
13. Tsai, C.H., Chuang, H.T.: Deadzone Compensation Based on Constrained RBF Neural Network. Journal of the Franklin Institute 341, 361-374 (2004)
14. Chen, Z.H.: A New Identification Method of Nonlinear System Based on the Hammerstein Model. Control Theory and Applications 24(1), 143-147 (2007)
15. Shen, Q.K., Zhang, T.P.: Robustly Adaptive Fuzzy Control for the Unknown Nonlinear Dead-zone. Control and Decision 21(4), 368-375 (2006)
16. Cho, H., Bai, E.W.: Convergence Results for an Adaptive Deadzone Inverse. Int. J. Adaptive Control Signal Processing 12, 451-466 (1998)
17. Tao, G., Kokotovic, P.V.: Adaptive Sliding Control of Plants with Unknown Deadzone. IEEE Trans. Autom. Control 39, 59-68 (1994)
18. Vijayakumar, S., Wu, S.: Sequential Support Vector Classifiers and Regression. In: Proc. Int. Conf. on Soft Computing, vol. 5, pp. 610-619 (1999)
A SVM Model Selection Method Based on Hybrid Genetic Algorithm and Empirical Error Minimization Criterion Xin Zhou and Jianhua Xu*
Abstract. The generalization capacity of the support vector machine (SVM) depends largely on the selection of the kernel function and its parameters and of the penalty factor, which is referred to as SVM model selection. When various differentiable but loose generalization bounds are used as objective functions, traditional optimization algorithms easily fall into local optima, whereas modern stochastic techniques have difficulty finding truly optimal solutions. Recently, the empirical error criterion on a validation set has been used as a new objective function optimized by classical optimization methods. In this paper, we propose a new SVM model selection method based on a hybrid genetic algorithm and the empirical error minimization criterion. The hybrid genetic method integrates the gradient descent method into the genetic algorithm to search for a better RBF kernel parameter. Experiments on 13 benchmark datasets demonstrate that our method works well on real applications. Keywords: Support vector machine (SVM), Hybrid genetic algorithm, Empirical error minimization criterion.
Xin Zhou, Jianhua Xu: Department of Computer Science, Nanjing Normal University, Nanjing 210097, China

1 Introduction

The support vector machine (SVM) is a prominent classifier introduced by Vapnik and his co-workers [1, 2], and it has been applied successfully to many different classification problems. But its success depends on the tuning of several parameters which affect the generalization capacity, including the penalty parameter C and the kernel function parameters, such as sigma for the radial basis function (RBF) kernel. In order to achieve optimal generalization performance, an SVM model selection technique should determine these parameters optimally. The adjustment of the parameters is usually conducted by minimizing an error estimator. Several methods [3-9] have been developed for choosing the best parameter values, such as the k-fold cross-validation or the leave-one-out (LOO)
methods. Rätsch et al. [3] used 5-fold cross-validation to optimize the penalty parameter C and the kernel parameter on the first five realizations and averaged the estimates as the final model parameters. Lee and Lin [4] made full use of the training samples and adopted an exhaustive search for the optimal parameters that minimizes the expected test LOO error rate. However, both algorithms are time consuming. Since it is difficult to calculate the real LOO error, researchers have used estimated upper bounds of the LOO error as the objective functions for SVM model selection, such as the radius-margin bound, the span bound and the VC dimension bound. Chapelle et al. [5] first proposed an automatic method for selecting SVM parameters by using an approximate error of the LOO procedure. Keerthi [6] applied quasi-Newton algorithms to efficiently minimize the radius-margin bound for tuning the parameters of the RBF-kernel SVM. Duan et al. [7] showed that the radius-margin bound gives good predictions for L2-SVM, but does not work well for L1-SVM. In this case, Chung et al. [8] proposed a modified radius-margin bound for L1-SVM. Recently, Ayat et al. [9] defined the empirical error on a validation set as a new model selection criterion and optimized it with the quasi-Newton method. It is attractive since this criterion does not need to solve another QP problem (for example, to calculate the minimal radius containing all training examples) [10, 11]. Adankon [12] proposed a new formulation for SVM model selection using the empirical error technique, which is based on minimizing the generalization error through a validation set.

However, for the above gradient-based model selection algorithms (Newton, conjugate gradient, etc.), the final generalization capacity of the SVM is largely affected by the initial point of the parameters, since these classical methods can only obtain local optimal solutions. On the other hand, some researchers have begun to use modern optimization techniques (such as genetic, simulated annealing and PSO algorithms) for SVM model selection. Zheng [13] used a genetic algorithm to minimize the radius-margin bound for SVM parameters. Javier [14] proposed a method for tuning L1-SVM parameters with simulated annealing and a modified radius-margin bound. Their experiments show that, since genetic algorithms lack local optimization capability and the upper bounds are loose, the final results of model selection are not yet satisfactory, especially for kernel parameters.

In this paper, we propose a new method for SVM model selection, based on a hybrid genetic algorithm and the empirical error minimization criterion. The hybrid genetic method integrates the gradient descent algorithm into the genetic algorithm, which helps us find a better RBF kernel parameter. Experiments on 13 benchmark datasets show that our new model selection method works well.
2 Support Vector Machine

Given a binary-class training set {xi, yi}, where xi is in R^n, yi = {1, -1} and i = 1, ..., l, the goal of a support vector machine is to solve the following quadratic problem:
max_alpha  Sum_{i=1}^{l} alpha_i - (1/2) Sum_{i=1}^{l} Sum_{j=1}^{l} alpha_i alpha_j yi yj k(xi, xj)    (1)
s.t.  Sum_{i=1}^{l} alpha_i yi = 0,  0 <= alpha_i <= C,
where C is the penalty factor and k ( xi , x j ) the kernel function, for example, RBF kernel:
k(xi, xj) = exp( -||xi - xj||^2 / 2 sigma^2 ) = exp( -theta ||xi - xj||^2 / 2 ),    (2)
where σ is referred to as the width parameter of the RBF kernel; here we let θ = 1/σ² for convenience. By solving problem (1) we obtain a nonlinear classifier:
f(x_i) = \sum_{j=1}^{l} y_j \alpha_j k(x_j, x_i) + b, \quad i = 1, \ldots, l,   (3)
where the class is predicted to be +1 if f > 0 and −1 otherwise. To obtain good performance, some parameters of the SVM have to be chosen carefully. When the RBF kernel is applied, these parameters include the penalty factor C (determining the trade-off between the training error and the model complexity) and the kernel parameter θ (implicitly defining the nonlinear mapping from the input space to some high-dimensional feature space).
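As a minimal illustration (not taken from the paper), the following Python sketch shows how the two hyper-parameters enter an RBF-kernel SVM; scikit-learn parameterizes the RBF kernel as exp(−gamma·||x − x'||²), so the paper's θ corresponds to gamma = θ/2. All data and values below are placeholders.

import numpy as np
from sklearn.svm import SVC

def train_rbf_svm(X_train, y_train, C, theta):
    # The paper's kernel exp(-(theta/2)*||x - x'||^2) maps to sklearn's gamma = theta/2.
    clf = SVC(C=C, kernel="rbf", gamma=theta / 2.0)
    clf.fit(X_train, y_train)
    return clf

# Toy usage with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
model = train_rbf_svm(X, y, C=1.0, theta=0.1)
print(model.predict(X[:5]))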
3 An Empirical Error Criterion
In this section, we briefly introduce the empirical error criterion used in [9, 10]. It is attractive to use a subset of the training data to approximate the expected risk of the SVM [5]. Given an independent validation data set, the number of classification errors is
T = \frac{1}{N} \sum_{i} \Psi(-y_i f(x_i)),   (4)
where Ψ is a step function (i.e., the Heaviside function) and N is the size of the validation set. From a statistical viewpoint, the larger the validation set is, the smaller the variance of the estimated error will be. Since such an error function is not differentiable, we have to replace it with a smooth function. For the SVM output f(x_i) (or simply f_i) of example x_i, Platt [15] estimated the corresponding posterior probability
p_i = \frac{1}{1 + \exp(A f(x_i) + B)},   (5)
which indicates the probability that the example x_i belongs to the positive class, where A and B are constants. Now we substitute a new variable t_i = (y_i + 1)/2 ∈ {0, 1} for the real class label y_i = ±1 of example x_i. By minimizing the test error on the validation data set with a variant of Newton's algorithm, we can get the optimal values of A and B:
(A^*, B^*) = \arg\max_{A,B} \sum_{i=1}^{N_{sv}} \left( t_i \log(p_i) + (1 - t_i) \log(1 - p_i) \right).   (6)
Using two optimal values ( A* , B* ) , we define the error probability for a given example xi as,
E_i = |t_i - p_i| = p_i^{1-t_i} (1 - p_i)^{t_i}, \quad i = 1, \ldots, l.   (7)
For a validation set, the average estimation of the errors can be written as
E = \frac{1}{N} \sum_{i=1}^{N} E_i = \frac{1}{N} \sum_{i=1}^{N} p_i^{1-t_i} (1 - p_i)^{t_i}.   (8)
This is our empirical error criterion for SVM model selection. Generally speaking, such a function is associated with all SVM parameters. In this paper, we particularly apply the gradient descent method to adjust the kernel parameter of the RBF kernel (i.e., θ). In this case, we have to calculate its derivative with respect to θ:
\frac{\partial E}{\partial \theta} = \frac{\partial}{\partial \theta}\left( \frac{1}{N} \sum_{i=1}^{N} E_i \right) = \frac{1}{N} \sum_{i=1}^{N} \frac{\partial E_i}{\partial \theta} = \frac{1}{N} \sum_{i=1}^{N} \frac{\partial E_i}{\partial f_i} \cdot \frac{\partial f_i}{\partial \theta}.   (9)
Now we derive the derivative of E_i with respect to f_i:
\frac{\partial E_i}{\partial f_i} = \frac{\partial E_i}{\partial p_i} \cdot \frac{\partial p_i}{\partial f_i},   (10)
where
\frac{\partial E_i}{\partial p_i} = \frac{\partial |t_i - p_i|}{\partial p_i} = \begin{cases} -1 & \text{if } t_i = 1 \\ +1 & \text{if } t_i = 0 \end{cases} = -y_i,
and \frac{\partial p_i}{\partial f_i} = -A p_i (1 - p_i). Then we have
\frac{\partial E_i}{\partial f_i} = A y_i p_i (1 - p_i).   (11)
The derivative of f_i with respect to θ is
\frac{\partial f_i}{\partial \theta} = \frac{\partial}{\partial \theta}\left( \sum_{j=1}^{N_{sv}} \alpha_j y_j k(x_i, x_j) + b \right),   (12)
where N_{sv} is the number of support vectors. Finally, we get
\frac{\partial f_i}{\partial \theta} = \sum_{j=1}^{N_{sv}} y_j \left[ \frac{\partial k(x_j, x_i)}{\partial \theta} \alpha_j + \frac{\partial \alpha_j}{\partial \theta} k(x_i, x_j) \right] + \frac{\partial b}{\partial \theta}.   (13)
It is noted that this derivative is composed of three parts. Theoretically the last two parts also depend on θ, but in our experiments we find that \frac{\partial \alpha_j}{\partial \theta} k(x_i, x_j) and \frac{\partial b}{\partial \theta} are negligible in comparison with \frac{\partial k(x_i, x_j)}{\partial \theta} \alpha_j. Finally, we obtain the derivative of the empirical error function with respect to the kernel parameter θ of the RBF kernel:
\frac{\partial E_i}{\partial \theta} = A y_i p_i (1 - p_i) \cdot \left\{ \sum_{j=1}^{N_{sv}} y_j \frac{\partial k(x_i, x_j)}{\partial \theta} \alpha_j \right\},   (14)
where the derivative of k(x_i, x_j) with respect to its parameter θ is
\frac{\partial k(x_i, x_j)}{\partial \theta} = \exp\left( -\frac{\theta}{2} \|x_i - x_j\|^2 \right) \cdot \left( -\frac{1}{2} \|x_i - x_j\|^2 \right).   (15)
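A rough NumPy sketch of this computation is given below. It assumes that, for the current θ, the SVM has already been trained (support vectors X_sv with labels y_sv, coefficients alpha, bias b) and the sigmoid parameters A and B of Eq. (6) have been fitted; only the dominant term of Eq. (13) is kept, as discussed above. All names are illustrative.

import numpy as np

def rbf_kernel_matrix(Xa, Xb, theta):
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)   # squared Euclidean distances
    return np.exp(-0.5 * theta * d2), d2

def empirical_error_and_gradient(X_val, y_val, X_sv, y_sv, alpha, b, A, B, theta):
    K, d2 = rbf_kernel_matrix(X_val, X_sv, theta)
    f = K @ (alpha * y_sv) + b                    # SVM outputs, Eq. (3)
    p = 1.0 / (1.0 + np.exp(A * f + B))           # Platt posteriors, Eq. (5)
    t = (y_val + 1) / 2.0                         # targets in {0, 1}
    E = np.mean(p ** (1 - t) * (1 - p) ** t)      # empirical error, Eq. (8)
    dK_dtheta = K * (-0.5 * d2)                   # Eq. (15)
    df_dtheta = dK_dtheta @ (alpha * y_sv)        # dominant term of Eq. (13)
    dE_df = A * y_val * p * (1 - p)               # Eq. (11)
    dE_dtheta = np.mean(dE_df * df_dtheta)        # Eq. (9)
    return E, dE_dtheta

A gradient descent step on θ is then simply θ ← θ − η·∂E/∂θ for a small step size η.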
4 A Novel SVM Model Selection Method
The empirical error criterion (8) is a nonlinear function of the SVM model parameters. A genetic algorithm is a good candidate for optimizing such a criterion, but in practice it is time-consuming and, without a good initialization, is not guaranteed to find a globally optimal solution. On the other hand, the gradient descent method is a widely used classical optimization tool because of its simplicity and easy implementation; however, it generally gets stuck in a local optimum. It is also found that the generalization capacity of the SVM is affected more severely by the RBF kernel parameter than by the penalty parameter. In this paper, the gradient descent method is therefore integrated into the genetic algorithm to construct a hybrid genetic technique, which helps to find a better kernel parameter. We choose the best individual as an initial solution and then search for its optimal kernel parameter using gradient descent, taking the result as an iterative solution. After evaluating a new generation, we pick a new best individual from three candidates: the best solution of the old generation, the best one of the new generation, and the iteratively improved version of the old best solution. The worst individual is also replaced by the iterative solution. Such an optimization method can fully exploit the advantages of the two different methods to find better model parameters. Our SVM model selection procedure based on the hybrid genetic algorithm is listed as follows:
1) Randomly generate an initial population;
2) Calculate the fitness function (Eq. (8)) for each individual by running the SVM algorithm, and save the best one as I_best;
3) Regard I_best as the initial solution, and then search for a local solution using the gradient descent method, taking the result as the iterative solution. Note that only the RBF kernel parameter is optimized in this step;
4) Generate a new generation of the population using the three basic genetic operations (i.e., selection, crossover and mutation);
5) For the new generation, calculate the fitness function for each individual using the SVM;
6) Find the best individual and the worst one in the current generation; compare the current best individual with I_best and the iterative solution, and save the best of them in I_best; replace the worst one with the iterative solution;
7) Return to step 3 until the stopping conditions are met. Upon convergence, the parameters corresponding to the optimal individual are taken as the optimal SVM model parameters.
In this hybrid genetic algorithm for SVM model selection, both the penalty factor C and the RBF kernel parameter θ are optimized at the same time, which is very valuable for SVM applications.
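The loop below is a schematic Python sketch of steps 1)-7); fitness is assumed to train an SVM for a candidate (C, θ) and return the criterion of Eq. (8) on the validation set, refine_theta is assumed to run a few gradient descent steps on θ only, and evolve is an assumed helper applying selection, crossover and mutation. None of these helpers come from the paper.

import random

def hybrid_ga_model_selection(fitness, refine_theta, evolve,
                              pop_size=20, generations=50,
                              C_range=(0.1, 30.0), theta_range=(0.01, 5.0)):
    population = [(random.uniform(*C_range), random.uniform(*theta_range))
                  for _ in range(pop_size)]                       # step 1
    I_best = min(population, key=fitness)                         # step 2
    for _ in range(generations):
        refined = (I_best[0], refine_theta(I_best))               # step 3: gradient step on theta only
        population = evolve(population, fitness)                  # step 4: selection, crossover, mutation
        scored = sorted(population, key=fitness)                  # step 5
        current_best, current_worst = scored[0], scored[-1]       # step 6
        I_best = min([I_best, current_best, refined], key=fitness)
        population[population.index(current_worst)] = refined     # replace the worst individual
    return I_best                                                 # step 7: optimal (C, theta)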
5 Experimental Results
In this paper, we focus on the SVM with the RBF kernel and carry out experiments on 13 datasets from [16]: banana, breast cancer, diabetes, flare-solar, german, heart, image, ringnorm, splice, thyroid, titanic, twonorm, and waveform. Each dataset is composed of 100 or 20 different realizations of the binary classification problem. General information about these data sets is given in Table 1. The search space of the model parameters is C ∈ [0.1, 30] and θ ∈ [0.01, 5.0]. Following the experimental settings in [3] and [5], we randomly extract one third of the examples of each training set as the validation set and use the remaining two thirds for training.

Table 1 General information about the 13 data sets

Datasets        Training size   Test size   Dimension   Realizations
Banana          400             4900        2           100
Breast cancer   200             77          9           100
Diabetes        468             300         8           100
Flare-solar     666             400         9           100
German          700             300         20          100
Heart           170             100         13          100
Image           1300            1010        18          20
Ringnorm        400             7000        20          100
Splice          1000            2175        60          20
Thyroid         140             75          5           100
Titanic         150             2051        3           100
Twonorm         400             7000        20          100
Waveform        400             4600        21          100
Table 2 Test error rates and their standard deviations found by different algorithms for selecting the SVM model parameters. The values in the square brackets are the corresponding model parameters, i.e., [C, θ]

Datasets        5-fold CV[3]   R-M Bound[5]   Span Bound[5]   Adankon's Method [12]      GA-SVM                     Hybrid GA-SVM
Breast cancer   26.04±4.7      26.84±4.71     25.59±4.18      25.79±4.21 [1.080,0.100]   25.62±4.49 [0.654,0.058]   25.51±4.25 [0.785,0.101]
Diabetes        23.53±1.73     23.25±1.70     23.19±1.67      23.25±1.82 [0.500,0.060]   23.32±1.61 [1.458,0.036]   23.11±1.74 [1.879,0.018]
Heart           15.92±3.18     16.13±3.11     15.95±3.26      15.98±3.32 [0.666,0.010]   15.48±3.33 [0.428,0.021]   15.37±3.26 [0.490,0.010]
Thyroid         4.80±2.19      4.62±2.03      4.56±1.97       4.64±2.13 [10.000,0.183]   4.57±2.17 [0.558,0.327]    4.54±2.08 [0.469,0.328]
Titanic         22.42±1.02     22.88±1.23     22.5±0.88       22.93±1.17 [1.100,0.120]   22.57±0.89 [2.884,0.227]   22.57±0.89 [2.833,0.229]
Firstly we conduct experiments on five data sets (breast cancer, diabetes, heart, thyroid, and titanic). Taking account of time efficiency, on each of the first ten training and testing sets, the penalty parameter C and the kernel parameter θ are optimized using our hybrid genetic algorithm (denoted Hybrid GA-SVM). The final model parameters are computed as the average of the ten estimates. For comparison, we also run a plain genetic algorithm as the SVM model selection tool on these five sets (denoted GA-SVM). Additionally, some results from 5-fold cross validation (5-fold CV) [3], the radius-margin bound (R-M bound) [5], the span bound [5] and Adankon's method [12] are cited.

Table 3 The test error rates and their standard deviations obtained with 5-fold CV, GA-SVM, and hybrid GA-SVM. The values in the square brackets are the corresponding optimal model parameters, i.e., [C, θ]

Datasets      5-fold CV[3]   GA-SVM                      Hybrid GA-SVM
Banana        11.53±0.66     11.34±0.49 [3.271, 0.327]   11.30±0.50 [3.036, 0.339]
Flare-solar   32.43±1.82     32.45±1.80 [1.450, 0.043]   32.33±1.80 [1.811, 0.018]
German        23.61±2.07     23.60±2.23 [1.259, 0.034]   23.58±2.21 [2.825, 0.026]
Image         2.96±0.60      3.56±0.67 [3.237, 0.327]    3.21±0.68 [3.239, 0.528]
Ringnorm      1.66±0.12      1.47±0.08 [0.223, 0.077]    1.46±0.08 [0.194, 0.088]
Splice        10.88±0.66     10.83±0.67 [2.528, 0.016]   10.76±0.69 [2.624, 0.018]
Twonorm       2.96±0.23      2.43±0.13 [0.176, 0.040]    2.42±0.13 [0.163, 0.044]
Waveform      9.88±0.43      9.86±0.48 [0.623, 0.042]    9.86±0.45 [0.721, 0.045]
All these experimental data are shown in Table 2, where the best results are indicated in bold font. According to Table 2, our model selection method performs best on four of the five datasets. On the titanic data set, our results are worse than those of the 5-fold cross validation and span bound methods. For the other eight data sets, we execute the two model selection methods hybrid GA-SVM and GA-SVM, as shown in Table 3. The results from 5-fold cross validation are listed too, but we could not find comparable results for the other methods on these data sets in the literature. From this table we can see that our method works best on six data sets, while 5-fold cross validation and GA-SVM each perform best on one data set. According to the above experimental results, we can conclude that for most of the 13 data sets our hybrid genetic algorithm based model selection for SVM obtains better results than those based on 5-fold cross validation, the radius-margin bound or the span bound.
6 Conclusions
In this paper, an effective model selection method for SVM is proposed by combining a hybrid genetic algorithm with the empirical error minimization criterion. The hybrid genetic algorithm integrates the gradient descent algorithm with the genetic one, which helps to find a better RBF kernel parameter. The iterative solution obtained by gradient descent is also used as a candidate when updating the best and worst individuals, which makes it easier to search for better model parameters. Compared with various generalization error bounds, the empirical error rate on a validation set is a better estimate of the generalization error. The simulation results on 13 benchmark datasets demonstrate that the proposed approach works well. In further work, we will test more real applications, including some unbalanced data sets, to validate our method, and will consider other kernel functions (for example, polynomial and sigmoid kernels). We will also utilize some heuristic strategies to speed up the model selection method.
Acknowledgments. This work is supported by the National Natural Science Foundation of China (No. 60875001).
References 1. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, New York (1995) 2. Cortes, C., Vapnik, V.N.: Support Vector Networks. Machine Learning 20(3), 273– 297 (1995) 3. Ratsch, G., Onoda, T., Muller, K.R.: Soft Margins for AdaBoost. Machine Learning 42, 287–320 (2001) 4. Lee, J.H., Lin, C.J.: Automatic Model Selection for Support Vector Machines. Technical Report, Department of Computer Science and Information Engineering, National Taiwan University (2000) 5. Chapelle, O., Vapnik, V.N., Bousquet, O., Mukherjee, S.: Choosing Multiple Parameters for Support Vector Machines. Machine Learning 46(1-3), 131–159 (2002)
6. Keerthi, S.S.: Efficient Tuning of SVM Hyperparameters Using Radius Margin Bound and Iterative Algorithms. IEEE Trans. on Neural Networks 13, 1225–1229 (2002) 7. Duan, K., Keerthi, S.S., Poo, A.N.: Evaluation of Simple Performance Measures for Tuning SVM Hyperparameters. Neurocomputing 51, 41–59 (2003) 8. Chung, K.M., Kao, W.C., Sun, C.L., Wang, L.L., Lin, C.J.: Radius Margin Bounds for Support Vector Machines with the RBF Kernel. Neural Computation 15, 2643–2681 (2003) 9. Ayat, N.E., Cheriet, M., Suen, C.Y.: Optimization of the SVM Kernels Using an Empirical Error Minimization Scheme. In: Lee, S.W., Verri, A. (eds.) SVM 2002. LNCS, vol. 2388, pp. 354–369. Springer, Heidelberg (2002) 10. Adankon, M.M., Cheriet, M., Ayat, N.E.: Optimizing Resources in Model Selection for Support Vector Machines. In: 2005 International Joint Conference on Neural Networks, pp. 925–930. IEEE Press, Montreal (2005) 11. Ayat, N.E., Cheriet, M., Suen, C.Y.: Automatic Model Selection for the Optimization of the SVM Kernels. Pattern Recognition 38, 1733–1745 (2005) 12. Adankon, M.M., Cheriet, M.: New Formulation of SVM for Model Selection. In: 2006 International Joint Conference on Neural Networks, pp. 1900–1907. IEEE Press, Vancouver (2006) 13. Zheng, C.H., Li, C.J.: Automatic Parameters Selection for SVM Based on GA. In: 5th World Congress on Intelligent Control and Automation, pp. 1869–1872. IEEE Press, Hangzhou (2004) 14. Javier, A., Saturnino, M., Philip, S.: Tuning L1-SVM Hyperparameters with Modified Radius Margin Bounds and Simulated Annealing. In: Sandoval, F., Prieto, A.G., Cabestany, J., Graña, M. (eds.) IWANN 2007. LNCS, vol. 4507, pp. 284–291. Springer, Heidelberg (2007) 15. Platt, J.: Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. In: Bartlett, P.J., Scholkopf, B., Schuurmans, D., Smola, A.J. (eds.) Advances in large margin classifiers, pp. 67–74. MIT Press, Cambridge (1999) 16. Ratsch, G.: Benchmark data sets, http://ida.first.fhg.de/projects/bench/benchmarks.htm
An SVM-Based Mandarin Pronunciation Quality Assessment System Fengpei Ge, Fuping Pan, Changliang Liu, Bin Dong, Shui-duen Chan, Xinhua Zhu, and Yonghong Yan
Abstract. This paper presents our Mandarin pronunciation quality assessment system for the Putonghua Shuiping Kaoshi (PSK) examination and investigates a novel Support Vector Machine (SVM) based method to improve its assessment accuracy. Firstly, a selective speaker adaptation module is introduced, in which we select well-pronounced speech from the results of the first-pass automatic pronunciation scoring as the adaptation data, and adopt Maximum Likelihood Linear Regression to update the acoustic model (AM). Then, the monophone based AM is studied and compared with the traditional triphone based AM. Finally, we propose a new method of incorporating all kinds of posterior probabilities using an SVM classifier. Experimental results show that the average correlation coefficient between machine and human scores is improved from 83.72% to 85.48%. This suggests that the two methods of selective speaker adaptation and multi-model combination using SVM are very effective in improving the accuracy of pronunciation quality assessment. Keywords: SVM, Pronunciation Quality Assessment, CALL, Speaker Adaptation, Speech Recognition.
Fengpei Ge · Fuping Pan · Changliang Liu · Bin Dong · Yonghong Yan
ThinkIT Speech Lab, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100871, P.R. China
{gefengpei,panfuping,liuchangliang,dongbin}@hccl.ioa.ac.cn
Shui-duen Chan · Xinhua Zhu
Department of Chinese & Bilingual Studies, the Hong Kong Polytechnic University
{chsdchan,Ctxhzhu}@polyu.edu.hk, [email protected]

1 Introduction
This work is to investigate automatic pronunciation quality assessment (APQA), as the main part of a computer-aided Putonghua Shuiping Kaoshi
(PSK) system. Over the last decades, many research groups have studied this problem based on speech recognition techniques. The evaluation scores are obtained by combining various machine measurements, including scores of Hidden Markov Model (HMM) log-likelihood, timing, phone log-posterior probability and segment duration. Mandarin is a syllabic language: each syllable has two parts, where the initial part is a consonant and the final part is a vowel. Segmental quality is therefore a major problem, especially for nonnative speakers. A classical measurement is the phonetic posterior probability, which was used in several previous works [1]-[5]. However, it does not correlate with human judgment directly. Witt and Young studied pronunciation quality assessment at the phone level and suggested improving evaluation performance by using a likelihood-based GOP measure and explicit error modeling [6]. This paper describes our Mandarin APQA system, and some improvements are investigated. Firstly, a novel speaker adaptation module is presented: we select well-pronounced speech from the results of the first-pass APQA as the adaptation data, and adopt Maximum Likelihood Linear Regression (MLLR) to update the acoustic model (AM). Then, a monophone based AM is studied, which can describe the acoustic characteristics of isolated phones more elaborately. At last, we propose a new method of incorporating all kinds of posterior probabilities using an SVM to classify pronunciation quality. Since pronunciation assessment is highly subjective, three measures have been used, each of which measures a different aspect of how well computer-derived scores agree with human assessment. Experimental results suggest that selective speaker adaptation is useful for improving the performance of forced alignment, and that the method of multi-model combination using SVM is helpful for increasing the accuracy of classifying pronunciation quality and can achieve usable performance. The rest of the paper is organized as follows: section 2 describes our baseline system; the improvement measures are presented in section 3; experimental results are given in section 4; section 5 concludes this paper; and finally the acknowledgement is given.
2 Baseline System Our system evaluates the pronunciation quality of Mandarin speech, in which syllable is the fundamental assessment unit. All Mandarin syllables can be considered as a combination of initial and final parts. The initial is articulated with a final to form a syllable pronunciation. Mandarin is also a tonal language. The tone is mainly specified by the pattern of pitch contour of vowel portion of the syllable. In our system, the syllable pronunciation is evaluated from three aspects: the quality of consonant, vowel and tone. The first two parts are evaluated using the ASR techniques of HMM and Viterbi search [7], which is the core of the system. The tone quality is evaluated using a Gaussian Mixture Model
Fig. 1 Our CALL system structure
(GMM) classifier, which is not the focus of this paper but can be found in [8][9]. A block diagram of the system is shown in Fig. 1. The front-end feature extraction converts the speech waveform into a sequence of Mel-Frequency Cepstral Coefficients (MFCC), which is then fed into the decoder for one-pass Viterbi decoding. The HMM network consists only of the models of the learning text, and the Viterbi decoding is a forced alignment between the speech frames and the HMM models. With the frame index of each HMM state and the accumulated observation probability of the phone segment, the phonetic posterior probability score is calculated as the pronunciation quality measurement. There are two kinds of phonetic posterior probabilities in our system. One is the average of the logarithm of the frame-based posterior probabilities (AFBPP) [3][4][5]:
\rho_{AFBPP}(PH|O) = \frac{1}{e - b + 1} \sum_{t=b}^{e} \log P(s_t | o_t),   (1)
where O is the forced-aligned observation sequence of PH; b is its beginning frame and e is its end frame; S is the state sequence corresponding to O; and P(s_t|o_t) is the state posterior probability. The other is the phone log-posterior probability (PLPP) [6]:
\rho_{PLPP}(PH|O) = \frac{1}{\tau} \log P(q | O^{(q)}) = \frac{1}{\tau} \log \frac{p(O^{(q)} | q)}{\sum_{p \in Q} p(O^{(q)} | p)},   (2)
where τ is the number of frames in the acoustic segment O^{(q)}; Q is the set of Mandarin consonants when q is a consonant, and the set of Mandarin vowels when q is a vowel. We combine the two posterior probability scores as follows:
P(PH|O) = \omega \cdot \rho_{AFBPP} + \rho_{PLPP},   (3)
where P(PH|O) is the combined posterior probability and ω is the combination coefficient. The confidence P(PH|O) is an absolute measurement describing how close the utterance is to the standard pronunciation, and it can be used for phonetic pronunciation assessment directly [1][4]. We classify phonetic pronunciation quality into three classes: good, medium and bad, corresponding to scores 2, 1 and 0 respectively. So the final stage of evaluation uses predetermined thresholds to map the posterior probability scores to grades. The thresholds used for consonants and vowels are context-dependent. Evaluation of the tone is performed in parallel with the phone pronunciation assessment. Finally, the phone pronunciation scores and the tone score are integrated by Eq. (4) to form the syllable score, and all syllable scores for one speaker are summed up to give the speaker score:
S_{syllable} = \min(S_{Consonant}, S_{Vowel}, S_{Tone}) / 2.   (4)
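A minimal Python sketch of this final scoring stage (with placeholder threshold values; the actual thresholds are context-dependent and tuned on development data) could look as follows:

def combined_confidence(afbpp, plpp, omega=1.0):
    return omega * afbpp + plpp                              # Eq. (3)

def grade(confidence, thresholds=(-6.0, -3.0)):              # placeholder, per-context thresholds
    bad_to_medium, medium_to_good = thresholds
    if confidence < bad_to_medium:
        return 0                                             # bad
    if confidence < medium_to_good:
        return 1                                             # medium
    return 2                                                 # good

def syllable_score(consonant_grade, vowel_grade, tone_grade):
    return min(consonant_grade, vowel_grade, tone_grade) / 2.0   # Eq. (4)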
3 Improved System
3.1 Selective Speaker Adaptation
As described in section 2, forced alignment is significant for pronunciation quality assessment: from it, the frame index of each HMM state and the accumulated observation probability of the phone segment are obtained. The AM plays an important role in this process. Ideally, we should adopt standard speech as the evaluation criterion to assess pronunciation quality, and the AM should be pre-trained with standard Mandarin speech. However, a speaker independent (SI) AM often mismatches the test speech, because most users speak with strong accents. The articulations of such speakers differ from those of native speakers, and they often have difficulty pronouncing some initials/finals. They therefore use typical language-learner strategies to compensate for such difficulties, including phonological transfer, overgeneralization, prefabrication, epenthesis, etc. As a result, they pronounce in a way close to standard Mandarin, but not accurately. This mismatch between the AM and the test speech severely degrades the APQA performance. Therefore, adjusting the SI AM to fit the users is necessary for improving the phone segmentation (forced alignment) of strongly accented Mandarin speech, as well as non-native speakers' speech.
Fig. 2 Our CALL system structure with speaker adaptation
In this paper, we adopt MLLR [10] to adapt the SI AM to the specific testee. A block diagram of the improved system is shown in Fig. 2. Firstly, pre-assessment, i.e., the first-pass pronunciation quality assessment, is performed using the baseline system, and the scores for every consonant, vowel and tone of the testee are obtained. Using the first-pass assessment results, we select well-pronounced speech, i.e., segments whose system score is 2, as the adaptation data. Then the SI acoustic model is adapted using MLLR. In order to achieve speaker normalization without adapting to specific phone error patterns, this adaptation is limited to a single global transform of the HMM mixture component means. Finally, we use the speaker-adapted (SA) AM to assess his/her pronunciations again.
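Schematically, the selective adaptation step can be sketched as below; first_pass_score and mllr_global_adapt are hypothetical placeholders standing for the baseline scorer and a standard MLLR routine (a single global transform of the mixture means), not actual APIs.

def selective_speaker_adaptation(utterances, si_model, first_pass_score, mllr_global_adapt):
    adaptation_data = []
    for utt in utterances:
        # first-pass assessment with the speaker-independent model
        for segment, grade in first_pass_score(si_model, utt):
            if grade == 2:                       # keep only well-pronounced segments
                adaptation_data.append(segment)
    # one global MLLR transform of the HMM mixture component means
    sa_model = mllr_global_adapt(si_model, adaptation_data)
    return sa_model                              # used for the second-pass assessment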
3.2 Monophone Based AM Used in Pronunciation Quality Assessment
In the baseline system, we adopt a triphone based AM to perform forced alignment and posterior probability calculation, as is common in ASR systems. The triphone based AM can better model the variability of context-dependent phones. In fact, however, the PSK examination focuses on single syllables and two-syllable words. Moreover, most of the testees cannot pronounce them fluently and often speak phone by phone. In this case, the context of a phone does not need to be modeled subtly, and an elaborate AM with triphones may cause some phonetic confusions. Using the monophone as the modeling unit allows the acoustic characteristics of different phones to be described with more Gaussian components. So the monophone based AM is also used in APQA to model the acoustic characteristics of isolated phones. In our improved system, the triphone and monophone based AMs are combined to assess Mandarin pronunciation quality.
3.3 Multi-model Combined System Using SVM Triphone based AM can model the variability of context-dependent phones better, while monophone based AM can describe the acoustic characteristics of the isolated phones more elaborately. The posterior probabilities calculated
Fig. 3 Multi-model combined system using SVM
with different AMs contain acoustic characteristics from different aspects, so combining them is useful for improving system performance. In the baseline system, we combine the two types of posterior probabilities, AFBPP and PLPP, using the linear weighted sum of Eq. (3) and classify the combined confidence with thresholds. This method cannot handle complex learning problems, whereas SVM is an outstanding statistical learning method for such classification problems [11]. We use the LIBSVM toolkit of Chih-Chung Chang and Chih-Jen Lin [12][13] to implement the combination of the triphone and monophone AM systems. A block diagram of the combined system is shown in Fig. 3. It mainly includes two parts, training and classifying; the latter is implemented with models pre-trained by the former. The input and output of the SVM are confidence vectors and pronunciation qualities (0, 1 and 2) respectively. An important issue is how the confidence vector is defined. In this paper, there are two kinds of phonetic posterior probabilities for every phone, as described in section 2. From the two systems, based on the triphone and monophone AMs, we thus obtain a four-dimensional vector per phone as the input of the SVM. During model training, the confidence vectors of the development corpora are fed into the SVM training module and a model for every phone is obtained. For classification, an evaluation corpus is used.
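The combiner can be sketched as one small classifier per phone trained on the four-dimensional confidence vectors; the paper uses LIBSVM, and the sketch below uses scikit-learn's SVC (which wraps libsvm) purely for illustration, with placeholder data structures.

import numpy as np
from sklearn.svm import SVC

def train_phone_combiners(dev_vectors, dev_grades):
    # dev_vectors[phone]: (N, 4) array of [AFBPP_tri, PLPP_tri, AFBPP_mono, PLPP_mono]
    # dev_grades[phone]: (N,) array of human grades in {0, 1, 2}
    models = {}
    for phone, X in dev_vectors.items():
        clf = SVC(kernel="rbf")
        clf.fit(X, dev_grades[phone])
        models[phone] = clf
    return models

def grade_phone(models, phone, confidence_vector):
    x = np.asarray(confidence_vector, dtype=float).reshape(1, -1)
    return int(models[phone].predict(x)[0])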
4 Experiments
4.1 Performance Measures
For similarity measurement between machine and human assessment, three kinds of evaluation metrics are used: the assessment correct rate (ACR), the average correlation coefficient (ACC), and the speaker correct-rank rate (SCRR). The three evaluation measures reflect the performance of our system from different aspects. ACR is defined as the overall fraction of phones/syllables which are correctly assessed. ACC measures the overall similarity of the machine and human scores, and is calculated as follows:
ACC = \frac{1}{T} \sum_{k=0}^{T} \frac{\langle f_k, g_k \rangle}{\|f_k\| \cdot \|g_k\|} = \frac{1}{T} \sum_{k=0}^{T} \frac{\sum_i (f_{ki} \cdot g_{ki})}{\sqrt{\sum_i f_{ki}^2 \cdot \sum_i g_{ki}^2}},   (5)
where f_{ki} and g_{ki} are the machine and human scores, respectively, for syllable i of testee k, and T is the number of testees. The speaker correct-rank rate (SCRR) is commonly used in the PSK examination and denotes one's Putonghua level [14]. SCRR is the ratio of the number of correctly ranked speakers, i.e., those whose machine-scored and human-scored ranks are the same, to the total number of speakers.
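For concreteness, the two score-level measures can be computed as in the short sketch below, where machine_scores and human_scores are lists of per-testee syllable-score vectors and to_rank is an assumed helper mapping a total score to a PSK rank.

import numpy as np

def average_correlation_coefficient(machine_scores, human_scores):
    sims = [np.dot(f, g) / (np.linalg.norm(f) * np.linalg.norm(g))
            for f, g in zip(machine_scores, human_scores)]          # Eq. (5)
    return float(np.mean(sims))

def speaker_correct_rank_rate(machine_scores, human_scores, to_rank):
    same = [to_rank(np.sum(f)) == to_rank(np.sum(g))
            for f, g in zip(machine_scores, human_scores)]
    return sum(same) / len(same)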
4.2 Experimental Setup
The AMs used in our experiments are gender-dependent, continuous mixture density, state-tied, cross-word HMMs, which model monophones or triphones. For the triphone based AMs, the HMMs consist of about 5100 tied states, each of which has 16 Gaussian components. For the monophone based ones, there are about 700 tied states, and the number of Gaussian components per state is investigated. The front end used MFCC analysis to obtain a 39-dimensional feature, including 12-dimensional static cepstra and 1-dimensional energy, with 1st and 2nd order derivatives. The training data are the same for the above two kinds of AMs: about 400 hours of native Mandarin speech, with 200 hours of female and 200 hours of male data. We collected three test sets for PSK, named PSK1, PSK2 and PSK3, including 289, 201 and 400 testees respectively. They are all spoken by native Hong Kong undergraduates with strong accents, and were recorded in the actual PSK examination. The transcriptions within each test set are the same, but those of different sets are distinct. They are annotated by five trained phoneticians, each of whom grades the pronunciation quality of the consonant, vowel and tone of every syllable with 2, 1 or 0, standing for good, medium and bad pronunciation respectively. The combined human scores are obtained by a voting mechanism. We randomly draw out seventy percent of every test set as its development corpus, and the rest is used as its evaluation corpus.
4.3 System Performance Comparison Using Monophone Based AM with Different GMMs
Using the monophone as the modeling unit can describe the acoustic characteristics of different phones elaborately, and the number of Gaussian components used to describe the distribution of each monophone greatly affects the system performance. Fig. 4 shows the system performance using the monophone based AM with different numbers of Gaussian components on the PSK3 evaluation set. From this figure, the AM with 64 Gaussian components per state appears to be the best in terms of the average CC.
Fig. 4 System performance using monophone based AM with different number of GMMs in PSK3
4.4 Evaluation
This section presents our experimental results with the three kinds of performance measures described in 4.1, illustrating the performance improvement from different aspects. In the tables below, “Baseline monophone AM” and “Baseline triphone AM” stand for the baseline systems using the monophone based and triphone based AM respectively; the improved systems with the speaker adaptation module are labeled “Monophone AM + adaptation” and “Triphone AM + adaptation”; “SVM combination system” denotes the multi-model combination system using SVM, which also adopts the speaker adaptation module. Table 1 shows the increase of ACR from the baseline systems to the SVM combination system. In this table, C-ACR, V-ACR, T-ACR and S-ACR are the assessment correct rates for consonants, vowels, tones and syllables respectively. Comparing the improved systems with speaker adaptation against the baseline systems, we believe that the speaker adaptation module is helpful for improving ACR, most notably on syllables. The rows for the “SVM combination system” show that the multi-model combination using SVM increases the assessment accuracy considerably. Table 2 indicates the performance improvement in terms of the average CC between machine and human scores. From the baseline systems to the SVM combination system, the average CC is increased step by step. In particular, the SVM combination system increases the average CC by 1.39% on PSK1, 1.47% on PSK2 and 1.15% on PSK3 compared with the system “Triphone AM + adaptation”. With the PSK2 corpus in particular, the average CC is improved from 83.72% to 85.48%. On the other two evaluation corpora, the average CC of the SVM combination system is close to the inter-expert agreement. So the SVM-based system can achieve usable performance.
Table 1 System performance in terms of ACR

PSK1
System Description           C-ACR   V-ACR   T-ACR   S-ACR
Baseline triphone AM         0.890   0.890   0.868   0.835
Baseline monophone AM        0.888   0.892   0.870   0.835
Triphone AM + adaptation     0.892   0.895   0.869   0.838
Monophone AM + adaptation    0.889   0.901   0.871   0.838
SVM combination system       0.900   0.903   0.875   0.848

PSK2
System Description           C-ACR   V-ACR   T-ACR   S-ACR
Baseline triphone AM         0.896   0.897   0.894   0.854
Baseline monophone AM        0.896   0.893   0.899   0.851
Triphone AM + adaptation     0.899   0.903   0.895   0.856
Monophone AM + adaptation    0.900   0.901   0.899   0.856
SVM combination system       0.909   0.905   0.902   0.867

PSK3
System Description           C-ACR   V-ACR   T-ACR   S-ACR
Baseline triphone AM         0.884   0.873   0.895   0.809
Baseline monophone AM        0.892   0.871   0.898   0.809
Triphone AM + adaptation     0.886   0.879   0.897   0.815
Monophone AM + adaptation    0.895   0.880   0.899   0.816
SVM combination system       0.905   0.885   0.905   0.829
Table 2 System performance with the average CC between machine and human scores

System Description           PSK1     PSK2     PSK3
Baseline triphone AM         0.8379   0.8372   0.8471
Baseline monophone AM        0.8388   0.8376   0.8456
Triphone AM + adaptation     0.8410   0.8401   0.8505
Monophone AM + adaptation    0.8393   0.8420   0.8483
SVM combination system       0.8549   0.8548   0.8620
Table 3 System performance in terms of SCRR

System Description           PSK1     PSK2     PSK3
Baseline triphone AM         82.04%   90.31%   76.96%
Baseline monophone AM        81.67%   90.20%   77.37%
Triphone AM + adaptation     83.33%   92.06%   78.74%
Monophone AM + adaptation    82.72%   92.16%   78.53%
SVM combination system       84.20%   93.20%   79.44%
The increase of SCRR is shown in Table 3. From the table, we know that SCRR of the SVM combination system is the highest. So we believe that
speaker adaptation and multi-model combination using SVM are useful to improve the performance of our Mandarin pronunciation quality assessment system.
5 Conclusion
This paper has presented our Mandarin pronunciation quality assessment system for PSK and investigated several measures to improve its assessment accuracy. In conclusion, this work indicates that our SVM-based system is capable of providing a testee with feedback similar to that of human raters with regard to which pronunciations can be accepted as correct and which Putonghua Shuiping rank he/she attains. Future work will concentrate on further increasing the agreement of machine scores with human assessment.
Acknowledgements. This work is partially supported by MOST (973 program, 2004CB318106), the National Natural Science Foundation of China (10574140, 60535030), and the National High Technology Research and Development Program of China (863 program, 2006AA01010, 2006AA01Z195).
References 1. Neumeyer, L., Franco, H., Weintraub, M., Price, P.: Automatic Text-Independent Pronunciation Scoring of Foreign Language Student Speech. In: Proc. of ICSLP 1996, Philadelphia, Pennsylvania, pp. 1457–1460 (1996) 2. Tatsuya, K., Masatake, D., Yasushi, T.: Practical Use of English Pronunciation System for Japanese Students in the CALL Classroom. In: INTERSPEECH 2004, pp. 1689–1692 (2004) 3. Franco, H., Neumeyer, L., Kim, Y., Ronen, O.: Automatic Pronunciation Scoring for Language Instruction. In: Proc. Int’l. Conf. on Acoust., Speech and Signal Processing, Munich, pp. 1471–1474 (1997) 4. Neumeyer, L., Franco, H., Digalakis, V., Weintraub, M.: Automatic Scoring of Pronunciation Quality. Speech Communication 30(2-3), 83–93 (2000) 5. Franco, H., Neumeyer, L., Digalakis, V., Ronen, O.: Combination of Machine Scores for Automatic Grading of Pronunciation Quality. Speech Communication 30 (2000) 6. Witt, S.M., Young, S.J.: Phone-level Pronunciation Scoring and Assessment for Interactive Language Learning. Speech Communication 30(2)-32(3), 95–108 (2000) 7. Bernstein, J., Cohen, M., Murveit, H., Rtischev, D., Weintraub, M.: Automatic Evaluation and Training in English Pronunciation. In: ICSLP Kobe, Japan (1990) 8. Chen, J.C., Jang, J.S.R., Li, J.Y., Wu, M.J.: Automatic Pronunciation Assessment for Mandarin Chinese. In: IEEE International Conference on Multimedia and Expo., Taipei, Taiwan (June 2004)
9. Pan, F.P., Zhao, Q.W., Yan, Y.H.: Improvements in Tone Pronunciation Scoring for Strongly Accented Mandarin Speech. In: Proceedings of ISCSLP 2006, pp. 592–602 (2006) 10. Leggetter, C., Woodland, P.: Speaker Adaptation of HMMs Using Linear Regression. Technical Report CUED/F-INFENG/TR. 181. Cambridge University Engineering Department, Cambridge, UK (1994) 11. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Heidelberg (1995) 12. http://www.csie.ntu.edu.tw/~cjlin/libsvm/ 13. http://ntu.csie.org/~piaip/svm/svm_tutorial.html 14. Implementation Outline for Putonghua Shuiping Kaoshi. Commercial Press, Beijing (2006)
A Quality Prediction Method of Injection Molding Batch Processes Based on Sub-Stage LS-SVM XiaoPing Guo, Chao Zhang, Li Wang, and Yuan Li*
Abstract. The injection molding process is a typical batch process; because of its multi-stage nature, its nonlinearity and the unavailability of direct on-line quality measurements, on-line quality control is difficult. A sub-stage LS-SVM quality prediction method is proposed, dedicated to revealing the nonlinear relationship between process variables and final qualities at different stages. Firstly, using a clustering algorithm, the PCA P-loading matrices of the time-slice matrices are clustered and the batch process is divided into several operation stages; the stage most relevant to the quality variable is identified. Correlation analysis is then applied to the time-wise unfolded stage data to obtain the irrelevant variables used as the input of the LS-SVM for end-of-batch product quality prediction. For comparison, a sub-MPLS quality prediction method is applied. The experimental results show that the proposed quality prediction method is superior to the sub-MPLS method.
Keywords: Quality prediction, Batch process, LS-SVM, Sub-stage.

1 Introduction
Injection molding, a typical multistage batch process, is an important polymer processing technique that transforms polymer materials into various shapes and types of products. However, because of the high dimensionality, complexity and batch-to-batch variation of the process, and the limited product-to-market time, the final product quality is usually available only at the end of the batch and is analysed (mostly offline) after the batch completion. This makes on-line quality control difficult. Several statistical modeling methods, such as multi-way partial least squares (MPLS) models, have been reported recently for batch processes [1]. Nevertheless, the MPLS method is inefficient in revealing time-specific relationships for some multi-stage processes and is a linear method. Consequently, it performs poorly in predicting response variables of nonlinear batch processes.
1 Introduction Injection molding processes, a typical multistage batch process, is an important polymer processing technique and transforms polymer materials into various shapes and types of products. However, due to the process high dimensionality, complexity, batch-to-batch variation, and also limited product-to-market time, the final product quality are usually available at the end of the batch, which is analysed (mostly offline) after the batch completion. It is difficult for on-line quality control. Several statistical modeling methods such as multi-way partial least square (MPLS) models, have been reported recently for batch processes[1], Nevertheless, MPLS method is inefficient in revealing time-specific relationships for some multi stage processes and is a linear method. Consequently, it performs poorly in predicting response variables of nonlinear batch processes. XiaoPing Guo . Chao Zhang . Li Wang . Yuan Li Information Engineering School, Shenyang Institute of Chemical Technology, Shenyang 110142, China
[email protected]
However, artificial neural networks (ANN) have the capability to handle modeling problems associated with nonlinear static or dynamic behaviors. The NNPLS method [2] differs from the direct ANN approach in that the input-output data are not directly used to train the NN but are preprocessed by the PLS outer transform. Zhu et al. (1998) [3] proposed a time-delay neural network (TDNN) modeling method for predicting the treatment result. Vapnik (1998) presented the Support Vector Machine (SVM), which takes into account both the expected risk and the generalization performance and can be used to approximate nonlinear functions. Suykens and Vandewalle (1999) presented the LS-SVM method, in which the objective function includes an additional sum-squared-error term. LS-SVM is one of the methods by which statistical learning theory can be brought into practical application. It has advantages in solving pattern recognition problems with small samples, nonlinearity and high dimension, and it can easily be applied to learning problems such as function estimation [4,5]. For multi-stage batch processes, each stage has its own underlying characteristics, and a batch process can exhibit significantly different behaviors over different stages. It is therefore natural to develop stage-based statistical modeling methods that reflect this inherent stage nature in order to improve quality control performance. A stage-based sub-PCA modeling method [6] has been developed, and it has been shown that stage PCA modeling can overcome many difficulties of MPCA-based monitoring for batch processes. Thus, in the present paper, a sub-stage LS-SVM quality prediction method is proposed for the injection molding process, dedicated to revealing the nonlinear relationship between process variables and final qualities at different stages and to building a stage-based quality prediction model. Firstly, using a clustering algorithm, the PCA P-loading matrices of the time-slice matrices are clustered and the batch process is divided into several operation stages according to the change of the process correlation; the stage most relevant to the quality variable is identified. Correlation analysis is then applied to the time-wise unfolded stage data to obtain the irrelevant variables used as the input of the quality prediction model. Finally, sub-stage LS-SVM models are developed for every stage for quality prediction. For comparison purposes, sub-MPLS quality models are established. The results show that the proposed quality prediction method is superior to the sub-MPLS quality prediction method.
2 Process Description
Injection molding, an important polymer processing technique, transforms polymer materials into various shapes and types of products. As a typical multistage process, injection molding operates in stages, among which filling, packing-holding and cooling are the most important phases. During filling, the screw moves forward and pushes melt into the mold cavity. Once the mold is completely filled, the process switches to the packing-holding stage, during which additional polymer is “packed” at a high pressure to compensate for the material shrinkage associated with cooling and solidification. Packing-holding continues until the gate freezes off, which isolates the material in the mold from that in the injection unit. The process then enters the cooling stage; the part in the
mold continues to solidify until it is rigid enough to be ejected from the mold without damage. Concurrently with the early cooling phase, plastication takes place in the barrel, where polymer is melted and conveyed to the front of the barrel by screw rotation in preparation for the next cycle [7]. For injection molding, a high degree of automation is possible: after the process conditions are properly set, the process repeats itself to produce molded parts at a high rate. The process is, however, susceptible to producing off-spec products due to various process malfunctions, drifting of process conditions, changes in materials, and unknown disturbances. Abrupt, gross faults in the key process variables can be easily and reliably detected by a conventional SPC chart. Slow drifts or faults involving multiple process variables, however, cannot be detected in this way. Such process faults, even if they are small and uncommon, can lead to the production of a large quantity of bad parts if they are not detected early. The material used in this work is high-density polyethylene (HDPE). Ten process variables are selected for modeling: Nozzle Pressure, Stroke, Injection Velocity, Hydraulic Pressure, Plastication Pressure, Cavity Pressure, Screw Rotation Speed, SV1 valve opening, SV2 valve opening, and Mold Temperature. The sampling interval of these process variables is 20 ms. The operating conditions are set as follows: the injection velocity is 25 mm/sec; the mold temperature is 25 ºC; the seven-band barrel temperatures are set to (200, 200, 200, 200, 200, 180, 160, 120) ºC; the packing-holding time is fixed at 3 seconds. The quality variable is part weight. In total, 60 batch runs are conducted under 19 different operating conditions, which cover the whole normal operating range. Based on these data, a stage-based sub-MPLS model and an LS-SVM model are developed for quality prediction.
3 A Stage-Based LS-SVM Modeling
3.1 LS-SVM (Least Squares Support Vector Machines)
LS-SVMs follow a primal-dual optimization formulation. The technique makes use of a so-called feature space in which the inputs have been transformed by means of a (possibly infinite dimensional) nonlinear mapping; this is converted to the dual space by means of Mercer's theorem and the use of a positive definite kernel, without computing the mapping explicitly. LS-SVM has advantages in solving pattern recognition problems with small samples, nonlinearity and high dimension. Suykens and Vandewalle (1999) presented the LS-SVM approach, in which the following function is used to approximate the unknown function:
y(x_i) = \omega \cdot \phi(x_i) + b,   (1)
where x_i ∈ X ⊂ R^n, y_i ∈ Y = R, and \phi(\cdot): R^n \to R^{n_h} is a nonlinear function which maps the input space into a higher-dimensional feature space. Given the training data
T = \{(x_1, y_1), \ldots, (x_l, y_l)\} \in (X \times Y)^l, the goal is to estimate a model of the form (1). LS-SVM defines the following optimization problem:
\min_{\omega, b, e} J(\omega, e) = \frac{1}{2}\|\omega\|^2 + \frac{1}{2}\gamma \sum_{i=1}^{l} e_i^2, \quad (\gamma > 0)   (2)
subject to the equality constraints y_i = \omega^T \phi(x_i) + b + e_i, i = 1, 2, \ldots, l. To solve this optimization problem, one defines the following Lagrange function:
L(\omega, b, e, \alpha) = J(\omega, e) - \sum_{i=1}^{l} \alpha_i \left( \omega^T \phi(x_i) + b + e_i - y_i \right),   (3)
where \alpha = \{\alpha_i\}_{i=1}^{l} is the set of Lagrange multipliers. Calculating the partial derivatives of L(\omega, b, e, \alpha) with respect to \omega, b, e, \alpha, one gets the optimality conditions for Eq. (2):
\nabla_b L = \sum_{i=1}^{l} \alpha_i = 0,
\nabla_{\omega} L = \omega - \sum_{i=1}^{l} \alpha_i \phi(x_i) = 0,   (4)
\nabla_{e_i} L = \alpha_i - \gamma e_i = 0, \quad i = 1, 2, \ldots, l,
\nabla_{\alpha_i} L = \omega^T \phi(x_i) + b + e_i - y_i = 0, \quad i = 1, 2, \ldots, l.
Expressing e_i and \omega in terms of \alpha_i and b, one can transform the above equalities into
\begin{bmatrix} \Omega + \gamma^{-1} I & 1 \\ 1^T & 0 \end{bmatrix} \begin{bmatrix} \alpha \\ b \end{bmatrix} = \begin{bmatrix} y \\ 0 \end{bmatrix},   (5)
where y = [y_1, \ldots, y_l]^T, \alpha = [\alpha_1, \ldots, \alpha_l]^T, 1 = [1, \ldots, 1]^T, and \Omega is a square matrix whose entries are
\Omega_{ij} = \phi^T(x_i)\phi(x_j) = K(x_i, x_j), \quad i, j = 1, \ldots, l.   (6)
Choosing \gamma > 0 ensures that the matrix
\Phi = \begin{bmatrix} \Omega + \gamma^{-1} I & 1 \\ 1^T & 0 \end{bmatrix}
is invertible. Then we have the analytical solution for \alpha and b:
\begin{bmatrix} \alpha \\ b \end{bmatrix} = \Phi^{-1} \begin{bmatrix} y \\ 0 \end{bmatrix}.   (7)
Substituting the obtained \alpha and b back into Eq. (1), with \omega given by Eq. (4), we get
y(x) = \sum_{i=1}^{l} \alpha_i K(x, x_i) + b,   (8)
where K(x, x_i) is the kernel function, which can be any symmetric function satisfying Mercer's condition (Smola, 1996).
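For illustration, a compact NumPy sketch of LS-SVM regression with an RBF kernel is given below: training solves the linear system of Eqs. (5) and (7), and prediction uses Eq. (8). The kernel width and γ values are placeholders, not tuned settings from the paper.

import numpy as np

def rbf_kernel(Xa, Xb, sigma=1.0):
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    n = X.shape[0]
    Omega = rbf_kernel(X, X, sigma)
    # Assemble [[Omega + I/gamma, 1], [1^T, 0]] [alpha; b] = [y; 0]   (Eq. (5))
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = Omega + np.eye(n) / gamma
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    solution = np.linalg.solve(A, np.concatenate([y, [0.0]]))        # Eq. (7)
    return solution[:n], solution[n]                                 # alpha, b

def lssvm_predict(X_new, X_train, alpha, b, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b             # Eq. (8)

# Toy usage on a noisy nonlinear function.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=80)
alpha, b = lssvm_fit(X, y)
print(lssvm_predict(np.array([[0.5]]), X, alpha, b))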
Fig. 1 Sub-stage modeling method
3.2 Sub-Stage LS-SVM Modeling
The data gathered from the injection molding process form a three-dimensional data matrix X(I × J × K), where, for batch process applications, I denotes the number of cycles (batches), J the number of variables, and K the number of samples within a cycle. X(I × J × K) is unfolded, with each of the K time slabs concatenated, to produce a two-way array X_new(I × JK). The multi-way unfolding procedure is shown graphically in Fig. 1. Based on the sub-PCA stage partition strategy [6], a batch run is divided into several stages. The unfolded data of one stage are represented as X(I × m), where I is the number of batches and m is the number of process variables after unfolding the stage data. The unfolded data of every stage are used to extract the local covariance information of the process variables and to obtain the irrelevant variables used as input variables of the model. Sub LS-SVM models are then established using these input variable data together with the quality data. For comparison purposes, sub-MPLS models were also established.
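A short NumPy sketch of this unfolding, and of extracting the columns of a single stage, is given below; the shapes and the stage boundaries used in the example are illustrative only.

import numpy as np

def unfold_batchwise(X):
    # X has shape (I, J, K): batches x variables x samples per cycle.
    I, J, K = X.shape
    # Concatenate the K time slabs so that slab k occupies columns [k*J, (k+1)*J).
    return X.transpose(0, 2, 1).reshape(I, K * J)

def unfold_stage(X, start_sample, end_sample):
    # Unfolded data of one stage: X(I x m), with m = (end_sample - start_sample) * J.
    return unfold_batchwise(X[:, :, start_sample:end_sample])

I, J, K = 60, 10, 1000                         # e.g., 60 batches, 10 variables, 1000 samples
X = np.random.default_rng(0).normal(size=(I, J, K))
X_stage = unfold_stage(X, 72, 248)             # illustrative stage boundaries
print(X_stage.shape)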
4 Experimental Results
Without using any prior process knowledge, the trajectories of an injection molding batch run are divided by the sub-PCA based stage-division algorithm into four stages according to the change of the local covariance structure. These correspond to the four physical operation stages, i.e., the injection, packing-holding, plastication and cooling stages shown in Fig. 2. The final quality variable is only weakly related to the plastication and cooling stages. The on-line quality prediction model is a distributed model; the weight variable is estimated by the sub-MPLS model and the sub-LSSVM model in the packing stage.
272
X. Guo et al. Stages defined by the proposed method
0 1 72
2
3
219 250
0 72 248 Injection Packing-Holding
4
565 578
557 Plastication Physical Operating Stages
Cooling
1000
1000
Fig. 2 Stage division results for injection molding process
,
For illustration, the results of the LS_SVM and he sub-MPLS output prediction for 30 batches are shown in Fig. 3. In Fig. 3 the solid line with circle symbols indicates the weight measurements, and the solid line with triangle symbols plots the corresponding weight prediction using sub-MPLS model and the solid line with square symbols plots the corresponding weight prediction using sub-LSSVM model. It is clear that the predictions of two methods are much closer to the actual weight measurements, indicating significant improvement by using tail data of the packing stage prediction model. But there exists an obvious offset between the measured and predicted values for sub-MPLS methods, and the value of the offset varies with different operating conditions, for instance, the offset for batch 5 is much smaller than that of batch 20. An analysis suggests that the problem was caused by nonlinear of packing stages because sub-MPLS model is linear. But the product weight predicted by sub-LSSVM model can be more exactly predicted.
28.5
Measurment Sub_LSSVM Sub_MPLS
28.0
Weight
27.5
27.0
26.5
26.0 5
10
15
20
25
30
Batch Number
Fig. 3 Predicted results for weight variables using sample time data of packing stages
5 Conclusion
For the injection molding process, a typical multi-stage batch process, a new quality prediction method has been presented. Firstly, the process stages are determined by analyzing the change of the process covariance structure and partitioning the time-slice PCA loading matrices using a new clustering algorithm. Correlation analysis is then applied to the time-wise unfolded stage data to obtain the irrelevant variables used as input, and sub LS-SVM models are built. Finally, the prediction results of the sub LS-SVM model are compared with those of the sub-MPLS model; the prediction precision of the proposed model is superior to that of the sub-MPLS model. The results demonstrate the effectiveness of the proposed method.
Acknowledgments. This work was supported by the National Science Foundation of China under Grant 60776070 and a project of the Department of Education of Liaoning under Grant 2008566.
References 1. Nomikos, P., Macgregor, J.F.: Multiway Partial Least Squares in Monitoring Batch Processes. J. Chemometrics Intell. Lab. Syst. 30, 97–108 (1995) 2. Qin, S.J., McAvoy, T.J.: Nonlinear PLS Modeling using Neural Networks. J. Comput. Chem. Eng. 16, 379–391 (1992) 3. Jiabao, Z., Zurcher, J., Ming, R., Meng, Q.H.: An Online Wastewater Quality Predication System Based on a Time-delay Neural Network. J. Engineering Applications of Artificial Intelligence 11, 747–758 (1998) 4. Suykens, J.A.K., Vandewalle, J.: Least Squares Support Vector Machine classifiers. J. Neural Processing Letters 9, 293–300 (1999) 5. Smola, A.J.: Regression Estimation with Support Vector Learning Machines. Master’s Thesis, Technische Universität München (1996) 6. Lu, N., Gao, F., Yang, Y., Wang, F.: A PCA-based Modeling and On-line Monitoring Strategy for Uneven-length Batch processes. J. Ind. Eng. Chem. Res. 43, 3343–3352 (2004) 7. Yang, Y.: Injection Molding: from Process to Quality Control, Ph.D. Thesis, The Hong Kong University of Science & Technology (2004)
Soft Sensing for Propylene Purity Using Partial Least Squares and Support Vector Machine Zhiru Xu, Desheng Liu, Jingguo Zhou, and Qingjun Shi*
Abstract. In an actual propylene distillation system, many important process variables are difficult to measure directly or on-line, especially the propylene purity, which varies with other process parameters. This paper presents an on-line soft sensing method that combines partial least squares and support vector machines. First, multivariate data analysis is performed using partial least squares; then a soft sensing regression model is constructed. Simulation shows that the proposed measuring scheme guarantees parameter estimation and prediction accuracy. Keywords: Soft sensing, Partial least squares, Support vector machine, Propylene purity.
1 Introduction
Batch distillation is one of the most important separation processes used in many chemical industries, especially those related to the manufacture of fine and specialty chemicals and to the recovery of volatile compounds from industrial waste [1]. The propylene rectifying tower is essential in an ethylene production plant; it separates chemical-grade or polymerization-grade propylene from the propylene-propane binary mixture and also yields a propane product [2]. One of the main disadvantages of propylene rectification, however, is that the process exhibits nonlinear characteristics such as multiple steady states and high sensitivity to the operating variables, due to the coupling between separation and chemical reaction [3]. Furthermore, in an actual propylene purity observation and control system, many important process variables are difficult to measure directly or on-line. Hence special consideration must be given to the design of the detection and control system. Soft-sensing technology came into being to estimate and control such variables. Research and development of soft sensing has been comprehensively reviewed by Arijit Bhattacharya, Pandian Vasan [4], L. Fortuna,
Zhiru Xu . Desheng Liu . Jingguo Zhou . Qingjun Shi School of Electrics and Information Engineering, Jiamusi University Heilongjiang 154007, China
[email protected]
S. Graziani, M. G. Xibilia [5], and Weiwu Yan, Huihe Shao, Xiaofan Wang [6]. The basic idea of soft-sensing technology is to choose a set of measurable or easily measured parameters (referred to as secondary or auxiliary variables) and to infer the estimated primary variables from these auxiliary variables by constructing a mathematical relationship between them, so that the measurement function is realized in software instead of hardware. In this paper, the support vector machine (SVM) is employed to predict the propylene purity. First, partial least squares (PLS) is used to perform a principal component analysis of the distillation parameters in order to reduce the dimension of the variable set; these parameters, which all affect the propylene purity, include the refluxing irrigation liquid level, the quantity of reflux, the overhead temperature of B, the reflux temperature, the column pressure, the overhead temperature of A, the overhead temperature difference, and so on. Then an SVM regression model is established from off-line data, and on-line forecasting of the propylene purity is accomplished based on this model. The outline of the paper is as follows. Section 2 describes the propylene distillation process. The concept of partial least squares and the correlation analysis results are given in Section 3. The experimental results and discussion are presented in Section 4. Some final remarks and overall conclusions are given in Section 5.
2 Propylene Distillation Process
The control process of propylene distillation is shown in Figure 1 [7]. Petroleum gas from the catalytic cracking unit, after washing and desulfurization, is liquefied and enters the gas separation plant. The light components above C3 are separated in column 1, and after the ethane is removed in column 2 the stream enters the propylene distillation column (columns 3 and 4), where the propylene and propane products are isolated. From the analysis of product quality, columns 1 and 2 have good separation precision: in good working order, the propane product does not carry C4 components and the propylene does not carry C2 components. As a result, the operation focuses on the propylene distillation column.
Fig. 1 Process of propylene distillation (mixture feed, distillation column, condenser, reflux tank, reboiler, propylene product, propane product)
3 Correlation Analysis Based on Partial Least Squares
PLS is a multivariate statistical data analysis method: partial least squares searches the explanatory variable space for linear combinations (latent components) that best explain the variation carried by the response variables [8]. Assuming that all explanatory variables are associated with the response variable, the basic idea of PLS is illustrated in Figure 2.
Fig. 2 Modeling diagram of partial least squares (OLS relates the interpretation variables directly to the reaction variables; PLS relates them indirectly through interpretation and reaction components)

Fig. 3 Principal component analysis on process parameters (loadings on propylene purity for product output, refluxing irrigation liquid level, quantity of reflux, overhead temperature of B, reflux temperature, column pressure, overhead temperature of A, overhead temperature difference, and reflux ratio)
Ordinary least squares (OLS) regression establishes a direct linear regression model between the explanatory variables and the response variables, reflecting the direct linear relationship between the two (dashed arrow in Figure 2); PLS instead establishes a linear regression model between latent explanatory components and latent response components, reflecting the indirect relationship between the explanatory variables and the response variables (solid arrows in Figure 2). In this study, multivariate data analysis was performed using PLS after variable preprocessing. Principal component analysis (PCA) was performed to get an overview of the off-line data sets: the information contained in the original variables was summarized by calculating new latent variables, and samples that could not be well explained by the latent variables were classified as outliers. The analysis showed that the various parameters influence the purity of propylene to different degrees (see Figure 3); the loadings are 0.1886441 for quantity of reflux, -0.241395 for reflux temperature, -0.276399 for column pressure, and 0.261821 for overhead temperature of A. Therefore, in this paper the four parameters with the largest weights were chosen as the input parameters for forecasting the propylene purity.
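As a minimal illustration of this correlation analysis, the sketch below fits a PLS model and ranks the candidate variables by the weights of the first latent component. It assumes the off-line plant data are available as NumPy arrays; the synthetic placeholder data, variable names and the scikit-learn API choice are illustrative assumptions, not the authors' implementation.

```python
# Sketch: PLS-based ranking of process variables by their contribution to purity.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import StandardScaler

variable_names = [
    "product output", "refluxing irrigation liquid level", "quantity of reflux",
    "overhead temperature of B", "reflux temperature", "column pressure",
    "overhead temperature of A", "overhead temperature difference", "reflux ratio",
]
rng = np.random.default_rng(0)
X = rng.normal(size=(200, len(variable_names)))      # placeholder for plant data
y = X[:, [2, 4, 5, 6]] @ np.array([0.19, -0.24, -0.28, 0.26]) + 0.05 * rng.normal(size=200)

X_s = StandardScaler().fit_transform(X)
pls = PLSRegression(n_components=2).fit(X_s, y)

# Weights of the first latent component indicate how strongly each variable
# contributes to explaining the purity variation (cf. Figure 3).
weights = pls.x_weights_[:, 0]
ranked = sorted(zip(variable_names, weights), key=lambda t: abs(t[1]), reverse=True)
for name, w in ranked[:4]:
    print(f"{name:35s} {w:+.3f}")
```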
4 Soft Sensing Methods of Propylene Purity Based on SVM
4.1 SVM Regression Analysis
Support vector machines (SVM) are learning systems that use a hypothesis space of linear functions in a high-dimensional feature space, trained with a learning algorithm from optimization theory that implements a learning bias derived from statistical learning theory. Least squares support vector machines (LSSVM) are a reformulation of the standard SVM which results in a set of linear equations instead of the quadratic programming problem of the SVM [9]. Based on statistical learning theory and the structural risk minimization principle, SVMs obtain the solution by solving a quadratic programming problem while avoiding local minima, which provides an advantage over other regression techniques [10]. The least squares version of the SVM algorithm, on the other hand, finds the solution by solving a set of linear equations; the main advantage of LSSVMs over SVMs is that a linear system is solved instead of a quadratic programming problem. The motivation for choosing LSSVMs as the approximation tool is their higher generalization capability, as well as the achievement of an almost global solution within a reasonably short training time. In feature space, an LSSVM regression model is of the form
$f(x) = w^{T}\phi(x) + b$   (1)

where the weight vector $w \in \mathbb{R}^{n}$, $\phi(\cdot)$ is a nonlinear mapping from the input space to a feature space, and $b$ is the bias term. The dual form of the model is given by

$f(x) = \sum_{k=1}^{N} \sigma_{k} K(x_{k}, x) + b$   (2)
where $\sigma_{k}$ are positive real constants and $K(\cdot,\cdot)$ is a kernel, which is defined to be a function such that

$K(x, z) = (\phi(x), \phi(z))$   (3)
The mapping $\phi(\cdot)$ need not be defined explicitly since, according to Mercer's theorem, it can be connected to a kernel function by (3) under some conditions. In this work, a Gaussian function (4) is adopted as the kernel function,

$K(x, z) = \exp\left(-\frac{\|x - z\|^{2}}{\sigma^{2}}\right)$   (4)

where $\sigma$ is the width of the function. Choosing a suitable kernel function is cumbersome and depends on each case [11]; however, the most commonly used kernels are the radial basis function (RBF), a simple Gaussian function, and polynomial functions. For the width $\sigma$ of the RBF kernel and the degree $d$ of the polynomial kernel, it should be stressed that careful model selection of these tuning parameters, in combination with the regularization constant, is very important in order to achieve a good generalization model.
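To make Eqs. (1)–(4) concrete, the following is a minimal LSSVM regression sketch that solves the dual linear system directly with NumPy. The parameter names gam (regularization constant) and sig2 (squared kernel width) mirror the notation used later in the paper, but the implementation itself is only an assumed, simplified version, not the authors' code.

```python
import numpy as np

def rbf_kernel(A, B, sig2):
    # K[i, j] = exp(-||a_i - b_j||^2 / sig2), cf. Eq. (4)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sig2)

def lssvm_fit(X, y, gam=100.0, sig2=50.0):
    # Solve the LSSVM dual linear system
    # [ 0   1^T        ] [  b  ]   [ 0 ]
    # [ 1   K + I/gam  ] [alpha] = [ y ]
    n = len(y)
    K = rbf_kernel(X, X, sig2)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gam
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]          # bias b and dual coefficients (sigma_k in Eq. (2))

def lssvm_predict(X_train, alpha, b, X_new, sig2=50.0):
    return rbf_kernel(X_new, X_train, sig2) @ alpha + b   # Eq. (2)
```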
4.2 Soft Sensing Model of Propylene Purity Based on SVM
In actual production, because of the complexity of the industrial process, propylene purity is affected by many factors. There is a certain degree of correlation and redundancy among these factors, which makes the number of inputs to the soft sensing model very large, makes the model prone to over-fitting, and seriously affects the computing speed and forecasting accuracy of the soft sensing model. PLS extracts features from the original correlated variables, can effectively deal with collinearity between variables, and lowers the variable dimension, thereby achieving feature extraction and selection while addressing collinearity and data redundancy. In this paper, a soft sensing model combining PLS and SVM is therefore proposed, as shown in Figure 4.
Fig. 4 Soft sensing model combining PLS and SVM (inputs x1, x2, ..., xn → mean normalization → PLS feature extraction Z1(x), ..., Zn(x) → SVM regression → fine propylene purity (%))
Fig. 5 Output of soft sensing model (measured sample values and predictive values, about 99.73-99.77% purity, over 100 sampling points)
The input-output data were divided into a training set (100 samples) and a test set (100 samples), consisting of different batches. A highly informative process database was thus generated economically, consisting of the four selected variables: quantity of reflux, reflux temperature, column pressure, and overhead temperature of A. The radial basis function was chosen as the kernel of the SVM regression, with penalty parameter gam = 100 and RBF kernel parameter sig2 = 50. The predictive results are shown in Figure 5.
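For concreteness, a sketch of the combined PLS + LSSVM modelling procedure follows. It reuses the hypothetical lssvm_fit/lssvm_predict helpers sketched in Section 4.1 and generates placeholder data, since the actual plant records and preprocessing details are not reproduced here; it is an assumed outline, not the authors' experimental code.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X_all = rng.normal(size=(200, 4))                     # placeholder for the four selected variables
y_all = 99.75 + 0.01 * (X_all @ np.array([0.4, -0.3, -0.3, 0.3])) + 0.001 * rng.normal(size=200)

X_train, X_test = X_all[:100], X_all[100:]            # 100 training samples, 100 test samples
y_train, y_test = y_all[:100], y_all[100:]

scaler = StandardScaler().fit(X_train)                # mean normalization stage
pls = PLSRegression(n_components=2).fit(scaler.transform(X_train), y_train)
Z_train = pls.transform(scaler.transform(X_train))    # PLS feature extraction stage
Z_test = pls.transform(scaler.transform(X_test))

b, alpha = lssvm_fit(Z_train, y_train, gam=100.0, sig2=50.0)   # SVM regression stage
y_pred = lssvm_predict(Z_train, alpha, b, Z_test, sig2=50.0)
print("max absolute prediction error:", np.abs(y_pred - y_test).max())
```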
Fig. 6 Predictive error (error analysis, roughly within ±0.02, over 100 sampling points)
It can be observed from the simulation results (Fig. 5) that the LSSVM model exhibits an acceptable reconstruction performance for the soft sensing of the propylene purity. The red dotted curve is the predictive value of the SVM model, and the black curve is the actual sensor measurement. Because of environmental interference, instrument measurement error and other factors, the measurements are often volatile, and the soft sensing model can effectively avoid these problems brought by the instruments. Fig. 6 shows that the errors are within the acceptable range of the production process.
5 Conclusion
In this paper, a novel approach based on combining PLS and LSSVM has been applied to the task of predicting propylene purity. PLS has strong feature extraction capabilities: it can effectively reduce the sample dimension and provide the model with inputs that are simple yet informative. SVM has a good ability of non-linear function approximation. The soft-sensing modeling method presented in this paper combines the advantages of both. Experiments have been conducted on the actual distillation tower. The simulation results strongly suggest that LSSVM can aid in predicting propylene purity. It is hoped that more interesting results will follow from further exploration of the data.
References 1. Fileti, A.M.F., Cruz, S.L., et al.: Control strategies analysis for a batch distillation column with experimental testing. Chemical Engineering and Processing 39, 121–128 (2000) 2. Gadalla, M., Olujic, Z., et al.: A design method for internal heat integrated distillation columns (iHIDiCs). Computer Aided Chemical Engineering 24, 1041–1046 (2007) 3. Ali, M.A.-H., Betlem, B., et al.: Non-linear model based control of a propylene polymerization reactor. Chemical Engineering and Processing 46, 554–564 (2007) 4. Bhattacharya, A., Vasant, P.: Soft-sensing of level of satisfaction in TOC product-mix decision heuristic using robust fuzzy-LP. European Journal of Operational Research 177, 55–70 (2007) 5. Fortuna, L., Graziani, S., et al.: Soft sensors for product quality monitoring in debutanizer distillation columns. Control Engineering Practice 13, 499–508 (2005) 6. Yan, W., Shao, H., et al.: Soft sensing modeling based on support vector machine and Bayesian model selection. Computers & Chemical Engineering 28, 1489–1498 (2004) 7. Dayi, D., Xionglin, L.: Aplacation of advanced control in propylene rectifying tower of gas ends plant. Petroleum refinery engineering 32, 48–51 (2002) (in China) 8. Sellin, N.: Partial least square modeling in research on educational achievement. Reflections on Educational Achievement, 256–257 (1995) 9. Li, S., Yao, X., et al.: Prediction of T-cell epitopes based on least squares support vector machines and amino acid properties. Analytica Chimica Acta 584, 37–42 (2007) 10. Iplikci, S.: Dynamic reconstruction of chaotic systems from inter-spike intervals using least squares support vector machines. Nonlinear Phenomena 216, 282–293 (2006) 11. Borin, A., Ferrao, M., et al.: Least-squares support vector machines and near infrared spectroscopy for quantification of common adulterants in powdered milk. Analytica Chimica Acta 579, 25–32 (2006)
Application of Support Vector Machines Method in Credit Scoring Leilei Zhang and Xiaofeng Hui*
Abstract. Credit scoring has attracted a lot of research interest in the literature. The credit scoring manager often evaluates the consumer's credit with intuitive experience. However, with the support of a credit classification model, the manager can evaluate the applicant's credit score more accurately. Support Vector Machine (SVM) classification is currently an active research area and successfully solves classification problems in many domains. This article introduces support vector machines (SVM) to the problem in an attempt to provide a model with better explanatory power. We used a backpropagation neural network (BNN) as a benchmark and obtained prediction accuracy around 80% for both BNN and SVM methods for the Australian and German credit datasets from UCI. Keywords: Credit scoring, SVM, BNN.
1 Introduction Credit scoring is a method of modelling potential risk of credit applications. In general it employs statistical techniques and historical data to produce a score that financial institutions can use to evaluate credit applicants in terms of risk.[1] With the rapid growth in the credit industry, credit scoring models have been extensively used for the credit admission evaluation. In the last two decades, several quantitative methods have been developed for the credit admission decision. The credit scoring models are developed to categorize applicants as either accepted or rejected with respect to the applicants’ characteristics such as age, income, and marital condition. [2-4]Credit officers are faced with the problem of trying to increase credit volume without excessively increasing their exposure to default. Therefore, to screen credit applications, new techniques should be developed to help predict credits more accurately. The benefits of credit scoring involve reducing the credit analysis cost, enabling faster credit decisions, closer monitoring of existing accounts and prioritizing credit collections. Although rating agencies and many institutional writers emphasize the importance of analysts’ subjective judgment in determining credit scoring, many researchers have obtained promising Leilei Zhang . Xiaofeng Hui School of Managment, Harbin Institute of Technology, Harbin, China
[email protected]
*
H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 283–290. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
results on credit scoring prediction applying different statistical and Artificial Intelligence (AI) methods. [5-7]. The overall objective of credit scoring is to build models that can extract knowledge of credit risk evaluation from past observations and to apply it to evaluate credit risk of companies or persons with much broader scope. The most popular methods adopted in the scoring industry are linear discriminant analysis, logistic regression and their variations. They are relatively easy to implement and are able to generate straightforward results that can be readily interpreted, but this usually requires domain expert knowledge and in-depth understanding of the data. In addition, these methods are not effective for problems with high-dimensional inputs and small sample size (e.g. new creditors). In real world application, assumptions regarding the data must be held, such as being linear separable and that the data should follow certain distributions. In this study, we experimented with using a relatively new learning method for the field of credit scoring prediction, support vector machines, together with a frequently used high-performance method, backpropagation neural networks, to predict credit scoring.
2 Inductive Learning of Classifiers A classifier is a function that maps an input attribute vector, x = ( x1,x2,x3,...xn ) , to the confidence that the input belongs to a class – i.e., f ( x ) = confidence(class). In the case of text classification, the attributes are words in the document and the classes are the categories of interest. The key idea behind SVMs and other inductive learning approaches is to use a training set of labeled instances to learn the classification function automatically. SVM classifiers resemble the second example above – a vector of learned feature weights. The resulting classifiers have many advantages: they are easy to construct and update, they depend only on information that is easy for people to provide (i.e., examples of items that are in/out of categories), and they allow users to smoothly tradeoff precision and recall depending on their task.
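A toy illustration of such an inductively learned classifier is sketched below: a vector of feature weights learned from labeled instances, mapping an attribute vector to a signed confidence score. The data and the use of scikit-learn's linear SVM are illustrative assumptions, not part of the paper.

```python
import numpy as np
from sklearn.svm import LinearSVC

X_train = np.array([[1.0, 0.2], [0.9, 0.4], [0.1, 0.9], [0.2, 0.8]])  # attribute vectors
y_train = np.array([1, 1, -1, -1])                                    # class labels

clf = LinearSVC(C=1.0).fit(X_train, y_train)
confidence = clf.decision_function([[0.8, 0.3]])   # f(x): signed distance to the hyperplane
print(clf.coef_, clf.intercept_, confidence)       # learned feature weights, bias, confidence
```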
3 The Proposed Support Vector Machine Algorithm The support vector machine (SVM) is a training algorithm for learning classification and regression rules from data; for example, the SVM can be used to learn polynomial, radial basis function (RBF) and multi-layer perceptron (MLP) classifiers. SVMs were first suggested by Vapnik in the 1960s for classification [8-10] and have recently become an area of intense research owing to developments in the techniques and theory coupled with extensions to regression and density estimation. SVMs arose from statistical learning theory; the aim being to solve only the problem of interest without solving a more difficult problem as an intermediate step. SVMs are based on the structural risk minimisation principle, closely related
to regularisation theory. This principle incorporates capacity control to prevent over-fitting and thus is a partial solution to the bias-variance trade-off dilemma. Two key elements in the implementation of SVM are the techniques of mathematical programming and kernel functions. The parameters are found by solving a quadratic programming problem with linear equality and inequality constraints; rather than by solving a non-convex, unconstrained optimisation problem. The flexibility of kernel functions allows the SVM to search a wide variety of hypothesis spaces. Here we focus on SVMs for two-class classification, the classes being P, N
for $y_i = +1, -1$ respectively. This can easily be extended to $k$-class classification by constructing $k$ two-class classifiers. The geometrical interpretation of support vector classification (SVC) is that the algorithm searches for the optimal separating surface, i.e. the hyperplane that is, in a sense, equidistant from the two classes. This optimal separating hyperplane has many nice statistical properties. SVC is outlined first for the linearly separable case. Kernel functions are then introduced in order to construct non-linear decision surfaces. Finally, for noisy data, when complete separation of the two classes may not be desirable, slack variables are introduced to allow for training errors.
Maximal Margin Hyperplanes
If the training data are linearly separable then there exists a pair $(w, b)$ such that
$w^{T}x_i + b \geq 1$, for all $x_i \in P$
$w^{T}x_i + b \leq -1$, for all $x_i \in N$   (1)

with the decision rule given by

$f_{w,b}(x) = \mathrm{sgn}(w^{T}x + b)$   (2)

$w$ is termed the weight vector and $b$ the bias (or $-b$ is termed the threshold). The inequality constraints (1) can be combined to give

$y_i(w^{T}x_i + b) \geq 1$, for all $x_i \in P \cup N$   (3)

Without loss of generality the pair $(w, b)$ can be rescaled such that

$\min_{i=1,\ldots,l} |w^{T}x_i + b| = 1$,

this constraint defines the set of canonical hyperplanes on $\mathbb{R}^{N}$. In order to restrict the expressiveness of the hypothesis space, the SVM searches for the simplest solution that classifies the data correctly. The learning problem is hence reformulated as: minimize $\|w\|^{2} = w^{T}w$ subject to the
constraints of linear separability (3). This is equivalent to maximising the distance, normal to the hyperplane, between the convex hulls of the two classes; this distance is called the margin. The optimisation is now a convex quadratic programming (QP) problem
$\min_{w,b} \; \Phi(w) = \frac{1}{2}\|w\|^{2}$ subject to $y_i(w^{T}x_i + b) \geq 1$, $i = 1,\ldots,l$.   (4)
This problem has a global optimum; thus the problem of many local optima in the case of training e.g. a neural network is avoided. This has the advantage that parameters in a QP solver will affect only the training time, and not the quality of the solution. This problem is tractable but in order to proceed to the non-separable and non-linear cases it is useful to consider the dual problem as outlined below. The Lagrangian for this problem is
$L(w, b, \Lambda) = \frac{1}{2}\|w\|^{2} - \sum_{i=1}^{l} \lambda_i\left[y_i(w^{T}x_i + b) - 1\right]$   (5)

where $\Lambda = (\lambda_1, \ldots, \lambda_l)^{T}$ are the Lagrange multipliers, one for each data point.
The solution to this quadratic programming problem is given by maximising $L$ with respect to $\Lambda \geq 0$ and minimising with respect to $w, b$. Differentiating with respect to $w$ and $b$ and setting the derivatives equal to 0 yields

$\frac{\partial}{\partial w} L(w, b, \Lambda) = w - \sum_{i=1}^{l} \lambda_i y_i x_i = 0$

and

$\frac{\partial}{\partial b} L(w, b, \Lambda) = -\sum_{i=1}^{l} \lambda_i y_i = 0$   (6)

So that the optimal solution is given by (2) with weight vector

$w^{*} = \sum_{i=1}^{l} \lambda_i^{*} y_i x_i$   (7)

Substituting (6) and (7) into (5) we can write

$F(\Lambda) = \sum_{i=1}^{l} \lambda_i - \frac{1}{2}\|w\|^{2} = \sum_{i=1}^{l} \lambda_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} \lambda_i \lambda_j y_i y_j x_i^{T}x_j$   (8)

which, written in matrix notation, leads to the following dual problem
Maximize $F(\Lambda) = \Lambda^{T}\mathbf{1} - \frac{1}{2}\Lambda^{T} D \Lambda$   (9)

subject to $\Lambda \geq 0$, $\Lambda^{T}\mathbf{y} = 0$
where $\mathbf{y} = (y_1, \ldots, y_l)^{T}$ and $D$ is a symmetric $l \times l$ matrix with elements $D_{ij} = y_i y_j x_i^{T}x_j$. Note that the Lagrange multipliers are only non-zero when $y_i(w^{T}x_i + b) = 1$; vectors for which this is the case are called support vectors since they lie closest to the separating hyperplane. The optimal weights are given by (7) and the bias is given by

$b^{*} = y_i - w^{*T}x_i$ for any support vector $x_i$   (10)

(although in practice it is safer to average over all support vectors). The decision function is then given by

$f(x) = \mathrm{sgn}\left(\sum_{i=1}^{l} y_i \lambda_i^{*} x^{T}x_i + b^{*}\right)$   (11)

The solution obtained is often sparse since only those $x_i$ with non-zero Lagrange
multipliers appear in the solution. This is important when the data to be classified are very large, as is often the case in practical data mining situations. However, it is possible that the expansion includes a large proportion of the training data, which leads to a model that is expensive both to store and to evaluate. Alleviating this problem is one area of ongoing research in SVMs.
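The dual problem (9) and decision function (11) can be prototyped directly with a generic QP solver. The following sketch uses cvxopt on a toy linearly separable set and is only an illustration of the formulation above, not the experimental setup of this paper; the data and tolerance are assumed values.

```python
import numpy as np
from cvxopt import matrix, solvers

solvers.options["show_progress"] = False

# Toy linearly separable data: classes P (y = +1) and N (y = -1).
X = np.array([[2.0, 2.0], [2.5, 3.0], [0.0, 0.5], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
l = len(y)

# Dual (9): maximize 1^T L - (1/2) L^T D L  s.t.  L >= 0, L^T y = 0,
# written as the cvxopt minimization  (1/2) L^T D L - 1^T L.
D = matrix(np.outer(y, y) * (X @ X.T))
q = matrix(-np.ones(l))
G = matrix(-np.eye(l))              # -L <= 0  i.e.  L >= 0
h = matrix(np.zeros(l))
A = matrix(y.reshape(1, -1))
b = matrix(0.0)
lam = np.array(solvers.qp(D, q, G, h, A, b)["x"]).flatten()

w = (lam * y) @ X                   # Eq. (7): optimal weight vector
sv = lam > 1e-6                     # support vectors: non-zero multipliers
b_opt = np.mean(y[sv] - X[sv] @ w)  # Eq. (10), averaged over support vectors
print(w, b_opt, np.sign(X @ w + b_opt))   # Eq. (11) on the training points
```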
4 Backpropagation Neural Network Backpropagation neural networks have been extremely popular for their unique learning capability and have been shown to perform well in different applications in our previous research such as medical application and game playing. A typical backpropagation neural network consists of a three layer structure: input-layer nodes, output-layer nodes and hidden-layer nodes. In our study, we used financial variables as the input nodes and rating outcome as the output layer nodes. Backpropagation networks are fully connected, layered, feed-forward models. Activations flow from the input layer through the hidden layer, then to the output layer. A backpropagation network typically starts out with a random set of weights. The network adjusts its weights each time it sees an input–output pair. Each pair is processed at two stages, a forward pass and a backward pass. The forward pass involves presenting a sample input to the network and letting activations flow until they reach the output layer. During the backward pass, the network’s actual output is compared with the target output and error estimates are
computed for the output units. The weights connected to the output units are adjusted to reduce the errors (a gradient descent method). The error estimates of the output units are then used to derive error estimates for the units in the hidden layer. Finally, errors are propagated back to the connections stemming from the input units. The backpropagation network updates its weights incrementally until the network stabilizes.
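A minimal NumPy sketch of the forward and backward passes just described is given below, for a single hidden layer and sigmoid activations. The layer sizes and learning rate are illustrative assumptions, not the configuration used in the experiments of this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_in, n_hidden, n_out, lr = 14, 8, 1, 0.1   # e.g. 14 applicant attributes -> accept/reject
W1 = rng.normal(0, 0.1, (n_in, n_hidden))   # random initial weights
W2 = rng.normal(0, 0.1, (n_hidden, n_out))

def train_step(x, t):
    """One input-output pair: forward pass, then backward pass with gradient descent."""
    global W1, W2
    h = sigmoid(x @ W1)                        # activations flow input -> hidden
    o = sigmoid(h @ W2)                        # hidden -> output
    delta_o = (o - t) * o * (1 - o)            # error estimates for the output units
    delta_h = (delta_o @ W2.T) * h * (1 - h)   # errors propagated back to the hidden units
    W2 -= lr * np.outer(h, delta_o)            # adjust weights to reduce the error
    W1 -= lr * np.outer(x, delta_h)

train_step(rng.random(n_in), np.array([1.0]))  # one incremental weight update
```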
5 Experimental Studies The two real world data sets, the Australian and German credit data sets, are available from the UCI Repository of Machine Learning Databases (Murphy & Aha, 2001) and are adopted herein to evaluate the predictive accuracy. The Australian credit data consists of 307 instances of creditworthy applicants and 383 instances where credit is not creditworthy. Each instance contains 6 nominal, 8 numeric attributes, and 1 class attribute (accepted or rejected). This dataset is interesting because there is a good mixture of attributes: continuous, nominal with small numbers of values, and nominal with larger numbers of values. There are also a few missing values. To protect the confidentiality of data, the attributes names and values have been changed to meaningless symbolic data. The German credit scoring data are more unbalanced, and it consists of 700 instances of creditworthy applicants and 300 instances where credit should not be extended. For each applicant, 24 input variables describe the credit history, account balances, loan purpose, loan amount, employment status, personal information, age, housing, and job title. This data set only consists of numeric attributes.
6 Prediction Accuracy Analysis To evaluate the prediction performance, we followed the 10-fold cross-validation procedure, which has shown good performance in model selection. Because some credit data had a very small number of data points for both the Australian and German datasets, we also conducted the leave-one-out cross-validation procedure to assess the prediction performances. When performing the cross-validation procedures for the neural networks, 10% of the data was used as a validation set. Table 1 summarizes the prediction accuracies of the two models using both cross-validation procedures. The outcomes show that support vector machines outperformed the neural network model, and the 10-fold and leave-one-out cross-validation procedures obtained comparable prediction accuracies. We can draw several conclusions from the experimental results obtained. First, our results conform to prior research results indicating that analytical models based on publicly available financial information built by machine learning algorithms can provide accurate predictions. In our study, we obtained the highest
Table 1 Prediction accuracies (SVM: support vector machines, NN: neural networks)
Data                          10-fold cross validation         Leave-one-out cross validation
                              SVM (%)        NN (%)            SVM (%)        NN (%)
Australian credit data sets   81.63          76.54             79.87          75.26
German credit data sets       80.00          79.25             80.00          75.68
Fig. 1 ROC curve (SVM)
prediction accuracies of 81.63% for Australian data set and 79.25% for the German data set. Second, support vector machines slightly improved the credit scoring prediction accuracies.
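The evaluation protocol of Section 6 can be reproduced in outline as follows. The SVM hyper-parameters and the assumption that the UCI credit files have already been parsed into numeric arrays X and y are illustrative choices, not details reported in the paper.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, KFold, LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: applicant attribute matrix, y: accepted/rejected labels
# (e.g. the UCI Australian credit data loaded into numeric arrays beforehand).
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

acc_10fold = cross_val_score(clf, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0))
acc_loo = cross_val_score(clf, X, y, cv=LeaveOneOut())
print(f"10-fold CV accuracy:   {acc_10fold.mean():.4f}")
print(f"Leave-one-out accuracy: {acc_loo.mean():.4f}")
```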
7 Conclusion In this study, we applied a newly introduced learning method based on statistical learning theory, support vector machines, together with a frequently used high performance method, backpropagation neural networks, to the problem of credit scoring prediction. We used two data sets for Australian and German credit datasets as our experiment testbed. The results showed that support vector machines achieved accuracy comparable to that of backpropagation neural networks.
References 1. David, W.: Neural Network Credit Scoring Models. Computers & Operations Research 27, 1131–1152 (2000) 2. Reshmi, M., Malhotra, D.K.: Differentiating between Good Credits and Bad Credits Using Neuro-Fuzzy Systems. Computing, Artificial Intelligence and Information Technology 136, 190–211 (2002) 3. Malhotra, R., Malhotra, D.K.: Evaluating Consumer Loans Using Neural Networks. Omega 31, 83–96 (2003) 4. Baesens, B.: Benchmarking State-of-the-Art Classification Algorithms for Credit Scoring. Journal of the Operational Research Society 54(6), 627–634 (2003) 5. Haärdle, W.: Predicting Corporate Bankruptcy with Support Vector Machines Humboldt University and the German Institute for Economic Research (2003) 6. Schebesch, K.B., Stecking, R.: Support Vector Machines for Classifying and Describing Credit Applicants: Detecting Typical and Critical Regions. Journal of the Operational Research Society 56(9), 1082–1088 (2005) 7. Shin, K.S.: An Application of Support Vector Machines in Bankruptcy Prediction Model. Expert Systems and Applications 28, 127–135 (2005) 8. Syed, N.A.: Incremental Learning with Support Vector Machines. In: Proceedings of the International Joint Conference on Artificial Intelligence (1999) 9. Cauwenberghs, G., Poggio, T.: Incremental and Decremental Support Vector Machine Learning. In: Advances in Neural Information Systems, Cambridge, MA, pp. 409–415 (2000) 10. Hosmer, D.W., Lemeshow, S.: Applied Logistic Regression. Wiley & Sons, New York (1989)
Improving Short Text Clustering Performance with Keyword Expansion Jun Wang, Yiming Zhou, Lin Li, Biyun Hu, and Xia Hu*
Abstract. Most of traditional text clustering methods are based on bag of words representation, which ignore the important information on semantic relationship between key terms. To overcome this problem, researchers have recently proposed several new methods for improving short text clustering accuracy based on enriching short text representation. However, the computational costs of these methods based on expanding words appeared in short texts are usually time-consuming. In this paper, we improve previous work by enriching short text representation with keyword expansion. Empirical results show that the proposed method can greatly save time without sacrificing clustering accuracy. Keywords: Short text clustering, Keyword expansion, Text representation.
1 Introduction Text clustering plays an important role in text-related task and application area such as text mining, web information retrieval and search engine. However, short text clustering poses new challenges. Short texts commonly contain no more than fifty words. Because of the short length, the texts do not provide enough contexts for pattern matching. Conventional bag of words representation used for clustering methods is often unsatisfactory. In order to overcome this difficulty, researchers have recently proposed several new methods for improving the accuracy of short text clustering focused on enriching short text representation. Banerjee Somnath, Ramanathan Krishnan[1] proposed a method of improving the accuracy of clustering short texts by enriching their representation with additional features from Wikipedia. In their method, the conventional bag of words representation of text items is augmented with the titles of select Wikipedia articles. Hotho A., Staab S., Stumme.G[2] compiled background knowledge into the text document representation. They used concepts from WordNet[3] to expand text items in the conventional bag of words representation. In [1] and [2], both of them employed the term expansion method. However, expanding every term with Jun Wang . Yiming Zhou . Lin Li . Biyun Hu . Xia Hu School of Computer Science and Engineering, Beihang University, Beijing 100191, China
[email protected] *
H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 291–298. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
additional feature has two weak points. One is that the method will be rather timeconsuming. As the number of terms in bag of words representation is usually very high, the number of terms will become huge after expansion. Secondly, it may add noise to the representation and may incur a loss of information. In this paper, we improve previous work by enriching short text representation with keyword expansion. Our method is based on an initial text representation as a bag of words. First, we extract keywords from the short text corpus based on TF/IDF score. Secondly, after getting the keywords list, we compiled the background knowledge into the keywords list. The background knowledge is obtained from WordNet. WordNet assigns words of English language to sets of synonyms called ‘synsets’. We consider the ‘synsets’ as the background knowledge of a word. Thirdly, we use the keywords list with background knowledge to expand the initial text representation. The clustering is then performed with K-means. We presented an experimental comparison with previous work. The experiment was made on two short text corpus which both came with a set of categorizing labels attached to the documents, the Reuters-21578 Corpus[4], and OHSUMED test collection[5], a clinically-oriented MEDLINE subset. Results indicate that with our proposed method, the efficiency of the short text clustering can be greatly improved without sacrificing clustering accuracy.
2 Enriching Text Representation with Keyword Expansion Our method first extracts keywords as candidate words to be extended, then expands original short texts with keywords background knowledge.
2.1 Keyword Expansion
The first step is to extract keywords from the short texts. A popular algorithm for indexing is the TF/IDF measure, which extracts keywords that appear frequently in a document but not frequently in the remainder of the corpus [6]. In this paper, we extract keywords from the short text corpus based on the TF/IDF score. The principal idea is that for each short text, we select the top x percent of terms based on their TF/IDF score; the keywords list consists of these top x percent terms. The parameter x can be set depending on the application. In this paper, we select the top 45% of words from each short text, and then choose the nouns, verbs, adjectives and adverbs among them as candidate words to be expanded. After obtaining the keywords list, we use background knowledge to expand these keywords. The background knowledge is obtained from WordNet. WordNet is a semantically-oriented dictionary of English, similar to a traditional thesaurus but with a richer structure. WordNet assigns words of the English language to sets of synonyms called 'synsets'. We consider the 'synsets' as the background knowledge of the terms in the keywords list.
2.2 Compiling Background Knowledge into Text Representation
When compiling background knowledge into the text representation, we first construct a keywords list for each short text. Then, for every term in the list, we use the term's 'synsets' from WordNet to replace the original one. For example, consider the following short text segment:
1. Benz is credited with the invention of the motorcar.
First, we remove the stop words, do stemming, and represent the short text segment (1) as a bag of words:
2. BOW = {'Benz', 'credit', 'invention', 'motorcar'}
Suppose motorcar is a keyword of this text; we then select motorcar into the keyword list. Next, we find the 'synsets' of motorcar in WordNet. For instance, the 'synsets' of motorcar is:
3. Syn('motorcar') = ['car', 'auto', 'machine', 'automobile', 'motorcar']
Then the original short text is expanded with Syn('motorcar'):
4. Extended_BOW = {'Benz', 'credit', 'invention', 'motorcar', 'car', 'auto', 'machine', 'automobile'}
After we have processed all the short texts in the corpus, the TF/IDF score of each short text is calculated to obtain the representation of the corpus. Then, clustering algorithms can be employed based on this representation.
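The synset lookup in this example can be reproduced with NLTK's WordNet interface. The snippet below is a small illustration only; it assumes the WordNet corpus has been downloaded via nltk.download('wordnet') and is not the authors' implementation.

```python
from nltk.corpus import wordnet

def expand_with_synsets(bag_of_words, keywords):
    """Replace each keyword by the lemma names of its WordNet synsets."""
    expanded = set(bag_of_words)
    for word in keywords:
        for syn in wordnet.synsets(word):
            expanded.update(lemma.replace("_", " ") for lemma in syn.lemma_names())
    return expanded

bow = ["Benz", "credit", "invention", "motorcar"]
print(expand_with_synsets(bow, ["motorcar"]))
# -> {'Benz', 'credit', 'invention', 'motorcar', 'car', 'auto', 'automobile', 'machine'}
```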
2.3 Algorithm Description
This section provides the precise algorithm description. The enriched short text representation with keyword expansion is derived as follows:
Stop words removal and stemming. Let Dn be the set of N short texts; for each short text d in Dn, apply stop words removal and stemming.
Extract keywords. For each short text di in Dn, construct the term vector vi, where each element is a term tj and its weighting score wij. The weighting score is defined as follows:

$w_{i,j} = tf_{ij} \times \log\left(\frac{N}{df_{j}}\right)$   (1)

where $tf_{ij}$ is the term frequency of $t_j$ in $d_i$, $N$ is the total number of documents in Dn and $df_j$ is the document frequency of term $t_j$. According to the weighting score, we select the top x percent terms from the term vector. Here we set x to be 45%. Then we select the nouns, verbs, adjectives and adverbs among the top 45% words to construct a keywords list keywords-di.
Enrich the original short text with keyword expansion. For each short text di in Dn and each term tj in the keywords list keywords-di, let Syn_tj be tj's 'synsets' in WordNet, and use Syn_tj to replace tj in di.
Weighting the extended short text document. Let EX_Dn be the set of N extended short texts; for each ex_di in EX_Dn, construct the term vector ex_vi, where each element is the weighting score wij of term tj, defined as follows:

$w_{i,j} = tf_{ij} \times \log\left(\frac{N}{df_{j}}\right)$   (2)

where $tf_{ij}$ is the term frequency of $t_j$ in ex_di, $N$ is the total number of documents in EX_Dn for calculating document frequencies, and $df_j$ is the document frequency of term $t_j$.
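A compact sketch of the keyword-extraction and weighting steps (Eqs. (1) and (2)) follows, computing the TF/IDF scores directly from term and document frequencies. The 45% threshold follows the paper, while the tokenisation is left out and the helper names are assumptions.

```python
import math
from collections import Counter

def tfidf_scores(doc_tokens, corpus_tokens):
    """w_{i,j} = tf_{ij} * log(N / df_j), cf. Eqs. (1)-(2)."""
    N = len(corpus_tokens)
    df = Counter()
    for tokens in corpus_tokens:
        df.update(set(tokens))          # document frequency of each term
    tf = Counter(doc_tokens)            # term frequency within this document
    return {t: tf[t] * math.log(N / df[t]) for t in tf}

def extract_keywords(doc_tokens, corpus_tokens, top_fraction=0.45):
    scores = tfidf_scores(doc_tokens, corpus_tokens)
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return ranked[:k]   # POS filtering (noun/verb/adjective/adverb) would be applied on top
```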
3 Experiment and Discussion
3.1 Short Text Corpus Information
Our experiment data come from two corpora, the Reuters-21578 Corpus and the OHSUMED test collection. Some texts from the two corpora belong to multiple categories and some contain more than 50 words. To perform our method, we select those texts which belong to only one category and contain no more than 50 words to construct a subset from the Reuters-21578 Corpus and the OHSUMED test collection respectively. We refer to these two subsets as Reuters-subset and OHSUMED-subset.

Table 1 Short text corpora information
Corpus           Category                               Articles   Total articles
Reuters-subset   'earn'                                 1086       1715
                 'money-fx'                             63
                 'money-supply'                         61
                 'trade'                                27
                 'acq'                                  332
                 'interest'                             93
                 'crude'                                53
OHSUMED-subset   'Disorders of Environmental Origin'    15         57
                 'Neoplasm'                             30
                 'Musculoskeletal Diseases'             12
3.2 Experimental Methodology and Evaluation Metrics Baseline It has been observed previously that additional features from WordNet can improve clustering accuracy. For comparison, we let the method used in [1] as baseline and refer it as all-word-expansion based representation method and call our method as keyword-expansion based representation method.
Clustering Solution Evaluation Metrics
We focus our experiment on clustering accuracy and efficiency with the two different text representations, all-word expansion and keyword expansion. We have clustered the above two corpora using K-means and compared the pre-categorization with our clustering results using three different metrics. The first metric is the widely used F-measure, which combines recall and precision into a global measure of utility. The second is the Entropy, which looks at how the various classes of texts are distributed within each cluster, and the third is the Purity, which measures the extent to which each cluster contains texts from primarily one class [7,8]. The derivation of the formulas of F-measure, Purity and Entropy is beyond the scope of this paper; more details can be found in [7,8]. In general, a lower Entropy value indicates a better clustering solution, and higher values of Purity and F-measure mean the clustering results are more convincing.
Efficiency Evaluation Metrics
We employed the program executing time to evaluate the method's efficiency. Our program includes four parts: Extract-Words, Enrich-Text, TF/IDF-Weighting and Kmeans-Clustering. In the Extract-Words process, the program extracts words from each text to construct a keywords list whose words will be expanded with background knowledge from WordNet. In the all-words-expansion representation method, this process just removes stop words from the short texts, and all remaining words are selected into the keywords list. The Enrich-Text process uses the background knowledge to replace the words in a short text that also appear in the keywords list. The TF/IDF-Weighting process represents the enriched short texts with TF/IDF scores.
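To make the clustering and evaluation concrete, the sketch below clusters the enriched representation with K-means and computes the purity of the result. It uses scikit-learn purely as a stand-in for the implementation described above, and the purity computation follows the usual definition rather than any code from the paper; expanded_texts and the integer category ids true_labels are assumed to be available.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def purity(true_labels, cluster_labels):
    """Fraction of documents assigned to the majority class of their cluster."""
    true_labels = np.asarray(true_labels)
    total = 0
    for c in np.unique(cluster_labels):
        members = true_labels[cluster_labels == c]
        total += np.bincount(members).max()
    return total / len(true_labels)

# expanded_texts: short texts after keyword expansion; true_labels: category ids (ints)
X = TfidfVectorizer().fit_transform(expanded_texts)
pred = KMeans(n_clusters=len(set(true_labels)), n_init=10, random_state=0).fit_predict(X)
print("Purity:", purity(true_labels, pred))
```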
3.3 Results and Discussion
The performance of each representation method was explored by averaging its performance over twenty runs. The program executing time is measured by CPU time (the CPU time is measured in seconds on an Intel(R) Core(TM)2 CPU 1.73GHz running Python). The quality of the clustering solution was evaluated by F-measure, Purity and Entropy.
Experiment Results
Fig. 1 shows, for the OHSUMED-subset represented by the two methods, the executing time of the Extract-Words, Enrich-Text, TF/IDF-Weighting and Kmeans-Clustering processes. Fig. 2 shows the corresponding executing times for the Reuters-subset. Table 2 records the total executing time used by the entire program when the OHSUMED-subset and the Reuters-subset are represented by the two methods respectively. Table 3 shows the quality of the clustering solutions measured by F-measure, Purity and Entropy for the two methods respectively.
Fig. 1 Executing time (CPU time in seconds) of the Extract Words, Enrich Short Text, TF/IDF Weighting and Kmeans Clustering stages with the OHSUMED-subset represented by the All-words-expansion method and the Keyword-expansion method

Fig. 2 Executing time (CPU time in seconds) of the same four stages with the Reuters-subset represented by the All-words-expansion method and the Keyword-expansion method
Discussion
For convenience, we refer to the all-words-expansion based representation method as the first method and to our keyword-expansion based method as the second. From Fig. 1 and Fig. 2, we can see that when the corpus was represented by the first method, the executing time of the Extract-Words process was less than when the corpus was represented by the second method. The reason is that the second method has to extract keywords from the corpus while the first method does not. However, when executing the Enrich-Text, TF/IDF-Weighting and Kmeans-Clustering processes, the time used by the first method was much longer than that of the second method.
Table 2 Executing time
Corpus            Expansion    Total CPU Time (s)
OHSUMED-subset    All words    3.314539744
                  Keywords     2.6565999
Reuters-subset    All words    438.2793
                  Keywords     263.36194

Table 3 Clustering results evaluation
Corpus           Representation method   F-measure        Purity              Entropy
Reuters-subset   All words expansion     0.466704585096   0.761516034985      0.390147579326
                 Keywords expansion      0.473993525225   0.781535471331      0.389705295309
OHSUMED-subset   All words expansion     0.502852281363   0.56140350877192979 0.858800812261
                 Keywords expansion      0.492494239648   0.559122807018      0.86573901042695078
Table 3 shows the quality of the clustering solutions measured by F-measure, Purity and Entropy for the two methods respectively. From this table, we see that the quality of the clustering is almost the same. From the experimental results given by Fig. 1, Fig. 2, Table 2 and Table 3, we can see that our method saves time while obtaining almost the same clustering accuracy.
4 Conclusions
In this paper, we studied the short text clustering problem. We improve previous work on short text clustering by enriching the text representation. Compared to previous work, most of which is based on all-words expansion, our proposed keyword expansion method can greatly improve the clustering efficiency without sacrificing clustering accuracy. Although we use the 'synsets' from WordNet as the background knowledge for our expanded representation in this paper, an interesting direction of future work would be to use a variety of other sources of external text, such as web search results, query reformulation logs, Wikipedia, etc. Acknowledgement. The work is supported by the National Basic Research Program of China under Grant No. 2007CB310803. We are also grateful to the anonymous reviewers for their valuable comments.
References 1. Banerjee, S., Ramanathan, K., Gupta, A.: Clustering Short Texts Using Wikipedia. In: 30th Annual International ACM SIGIR Conference on Research and Development In Information Retrieval, pp. 787–788. ACM Press, New York (2007) 2. Hotho, A., Staab, S., Stumme, G.: Ontologies Improve Text Document Clustering. In: Third IEEE International Conference on Data Mining. IEEE Computer Society Press, Florida (2003)
3. Fellbaum, C.: An Electronic Lexical Database (Language, Speech, and Communication). MIT Press, Cambridge (1998) 4. The Reuters-21578 benchmark corpus, http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml 5. Hersh, W., Buckley, C., Leone, T.J.: OHSUMED: an Interactive Retrieval Evaluation and New Large Test Collection for Research. In: 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 192– 201. Springer, New York (1994) 6. Li, J.Z., Fan, Q., Kuo, Z.: Keyword Extraction Based on tf/idf for Chinese News Document. Wuhan University Journal of Natural Sciences (2007) 7. Zhao, Y., Karypis, G.: Empirical and Theoretical Comparisons of Selected Criterion Functions for Document Clustering. Machine Learning 55(3), 311–331 (2004) 8. George, H., Adam, S.: Agreement, the F-Measure, and Reliability in Information Retrieval. J. Am. Med. Inform. Assoc. 12, 296–298 (2005) 9. Diego, I., David, P., Paolo, R.: Evaluation of Internal Validity Measures in Short-Text Corpor. In: Gelbukh, A. (ed.) CICLing 2008. LNCS, vol. 4919, pp. 555–567. Springer, Heidelberg (2008) 10. Resnik, P.: Using Information Content to Evaluate Semantic Similarity in Taxonomy. In: Proceedings of the 14th International Joint Conference on Artificial Intelligence, Canada, pp. 448–453 (1995)
Nonnative Speech Recognition Based on Bilingual Model Modification at State Level Qingqing Zhang, Jielin Pan, Shui-duen Chan, and Yonghong Yan
Abstract. This paper presents a novel bilingual model modification approach to improve nonnative speech recognition accuracy when the variations of accented pronunciations occur. Each state of baseline nonnative acoustic model is modified with several candidate states from the auxiliary acoustic model, which is trained on speakers’ mother language. State mapping criterion and n-best candidates are investigated, and different numbers of Gaussian mixtures of the auxiliary acoustic model are compared based on a grammarconstrained speech recognition system. Using this bilingual model modification approach, compared to the nonnative acoustic model which has already been well trained by adaptation technique MAP, the Phrase Error Rate further achieves a 5.83% relative reduction, while only a small relative increase on Real Time Factor occurs.
1 Introduction With globalization, nonnative speech recognition is becoming a popular issue in automatic speech recognition (ASR). As opposed to native speech, current speech recognition is known to perform considerably worse when recognizing nonnative speech, which results in word error rates of two to three times the native error rates [1].
[email protected] H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 299–309. c Springer-Verlag Berlin Heidelberg 2009 springerlink.com
300
Q. Zhang et al.
The difficulty in recognizing nonnative speech results mainly from the mismatch between acoustic models automatically determined from training data and characteristics of test data. Usually, acoustic models are trained on native speech, which represent characteristics of native pronunciation. Nonnative speakers’ pronunciations of the test speech, however, different from those native speakers’ observed during training, dramatically decreases the recognition performance [2]. It is shown that when the acoustics of the accents are taken into account, large gains in performance can be achieved [3, 4]. This indicates that if acoustic models are trained with the nonnative speech covering the pronunciation variation, the performance of nonnative speech recognition can be improved. In fact, the variation among nonnative speakers, even with the same accent, is potentially very large [5]. The characteristics of the nonnative speakers’ pronunciation may differ in fluencies, levels of familiarity with the target language, individual tendencies in mapping unfamiliar sounds to their own native language sounds, etc. In order to build robust nonnative acoustic models, much more training data with different kinds of variations are required. However, to obtain such training data is very difficult, especially for those with strong accents, since most of the nonnative training data are recorded by speakers who are relatively familiar with the second language. In recent years, speaker adaptation techniques such as MAP and MLLR have been widely used to handle nonnative speech [6], by which native acoustic models can be adapted to nonnative ones with limited amounts of adaptation speech. In these methods the similarity between adaptation speech and test speech is the key point which determines the performance. Although these techniques gain in nonnative speech recognition accuracy, however, the typical nonnative accuracy following adaptation still falls substantially below that of native speech. [7] studies the limitations of adaptation. Results suggest that the target language phones that do not exist in the speakers’ own language as well as the large acoustic variability reduces the performance heavily. How to make acoustic models robust and tolerant with these pronunciation variations is the motivation of our research presented in this paper. [8] argues that nonnative speakers may produce speech sounds which are either part of their own native language or which are established via merging characteristics of a native sound with a nonnative speech sound. Thus it can be speculated that training data from the speakers’ mother language will be useful for nonnative speech recognition, especially when a heavy accent occurs (where speakers tend to use the phonemes of their own native language to substitute ones of the target language), since the training data represents pronunciation characteristics of the speakers’ own native language. Inspired by this, we examine the problem of Mandarin-accented English speech recognition when only a limited amount of nonnative training data is available. To make the baseline acoustic model better adapted to the characteristics of Mandarin-accent, the baseline nonnative English acoustic model is modified with an auxiliary Mandarin acoustic model at the state level. The
Nonnative Speech Recognition Based on Bilingual Model Modification
301
similarity between states from the nonnative English acoustic model and Mandarin acoustic model is investigated, and different numbers of state candidates are compared based on the designed testing database. Because computation cost is an important factor for real world applications, different numbers of Gaussian mixtures of the auxiliary acoustic model are also compared in our research to investigate their impacts on computation cost and recognition accuracy. The paper is structured as follows: Training and testing databases are presented in section 2. In section 3, we describe the baseline acoustic models of our experiments and in section 4 we document how state-level bilingual model modification can help improve the baseline nonnative recognizer performance efficiently. Section 5 gives a brief conclusion of this paper.
2 Database Description Our study is restricted to nonnative English spoken by native speakers of Mandarin. All the training and testing data described were recorded and digitized at 8 KHz sampling rate with 16-bit resolutions. The speech feature vector used throughout this paper consists of 36 components (12 PLP parameters, and their first and second order time derivatives), which is analyzed at a 10msec frame rate with a 25msec window size. Cepstral Mean Subtraction (CMS) is employed. Our training data are divided into three categories: native Mandarin database, native English database and Mandarin-accented English database. The native Mandarin training database consists of native Chinese speech database of National 863 Hi-Tech Project (DB863) [12]. It is a standard database published by governmental research program 863 for read speech in Mandarin. The English training database is Wall Street Journal (WSJ), etc. The Mandarin-accented English database was collected in our lab. It includes English read speech with text from everyday conversations, spoken by hundreds of Mandarin-accented speakers. Table 1 summarizes the main information about these three databases. In order to study different variations of nonnative pronunciations, a representative testing database is designed in the experiment. This testing database contains two types of phrases that are composed of words with different degrees of familiarity to speakers. Some of these phrases have the Table 1 Summary of three training databases Training Corpus Type TrainM Native Mandarin TrainE Native English Mandarin-accented TrainA English
Time Source (hours) DB863 500 WSJ 232 Lab
20
302
Q. Zhang et al.
most common and simple words in English such as ”baby”, ”hello”, so the speakers can pronounce them more easily; the other include some words that are rarely used in general conversations, such as names of English people: ”Fitzgerald”,” orchestra”, which are hard for Mandarin speakers, and in this situation speakers tend to pronounce them with heavier accents. This database consists of 1568 utterances in total, which were recorded by four Chinese female and six Chinese male. All the experiments use a grammar-constrained speech recognition system. The grammar consists of all of the phrases in the test set, which has about 400 different phrases in total. The English phone set used in the experiments is supplied by the ARPABET and the dictionary is based on the CMU pronunciation dictionary [9]. This dictionary consists of approximately 53,000 words with associated phonetic transcriptions. Performances measured in this paper are at the phrase level.
3 Baseline Models In this section, one native English acoustic model and different nonnative English acoustic models are investigated and compared. The acoustic model that has the best performance will be selected to be the baseline and modified by the state-level bilingual model modification approach in order to achieve further higher nonnative speech recognition accuracy. Additionally, a native Mandarin acoustic model is established in this section too. This model will be used as an auxiliary acoustic model in the model modification approach.
3.1 Native Acoustic Model For comparison, the native English acoustic model is established first. The native English acoustic model is trained on TrainE database only, which is labeled as Model Nat. This acoustic model contains 6000 tied HMM states with 32 Gaussian mixture components per state. Table 2 gives the Phrase Error Rate (PhrER) of the native English acoustic model. Additionally, we use the Mandarin training data TrainM to build native Mandarin acoustic model Model Mand, which will be used in the model modification approach as the auxiliary acoustic model. This acoustic model contains about 6000 tied HMM states with 32 Gaussian mixture components per state. The phone set of the model is supplied by 49 toneless Mandarin phones based on the phonetic inventory of the International Phonetic Association (IPA) [10].
3.2 Nonnative Acoustic Models In order to improve nonnative speech recognition accuracy, adding nonnative speech data into training may be the simplest way to reduce the mismatch
Table 2 Performances of the native and nonnative acoustic models on the testing database
Acoustic Models   PhrER (%)
Model Nat         46.9
Model ReT         39.3
Model MLLR        37.2
Model MAP         34.3
between training/testing data. We explored different acoustic modeling methods to add the nonnative speech data TrainA into training for fitting the characteristics of Mandarin accent. Table 2 shows performances of these nonnative acoustic models on the testing data and gives comparison with native one. Model ReT (short for ReTrained acoustic Model) is the acoustic model retrained by pooling non-native data TrainA and native data TrainE together. Model MLLR (short for MLLR adaptive acoustic Model) refers to acoustic model to apply adaptation technique MLLR on native acoustic model Model Nat with TrainA. Model MAP (short for MAP adaptive acoustic Model) is the acoustic model using adaptation technique MAP on native acoustic model Model Nat with TrainA. As expected, all of these nonnative models perform better than do the native one on the nonnative testing data. Compared to Model ReT and Model MLLR, Model MAP reaches the lowest PhrER. Therefore, Model MAP is selected to be the baseline nonnative acoustic model, and all of the improvements with state-level bilingual model modification approach below will be based on this adaptive nonnative acoustic model.
4 State-Level Model Modification Adaptation technique is shown to be effective for nonnative speech recognition. This technique depends on the consistency between the types of adaptive data and test data. In practical use, however, the nonnative speech recognizer may encounter speakers who are just beginning to learn the foreign language or who have heavy accents. In these cases, nonnative speakers tend to substitute sounds of their mother language for those foreign sounds they can not produce [11]. Usually, these pronunciation substitutions are difficult to predict and hard to capture in the training data. Due to lack of adaptive data, the gain achieved by adaptation technique will be limited. In our application, nonnative speakers may produce English sounds which are more likely as parts of Mandarin sounds. It can be speculated that modifying nonnative English acoustic models with Mandarin ones may be useful to capture the pronunciation substitutions. Since Mandarin have different phone sets from English, in order to implement the modification, a statelevel mapping criterion to define which English model state combines with
which Mandarin model states is required. In the following section, the statelevel mapping algorithm is presented first, and then different n-best state candidates are investigated and compared based on the nonnative test set. Lastly, different numbers of Gaussian mixtures for the auxiliary acoustic models used in modifying the nonnative one are also compared to measure computational cost.
4.1 State-Level Mapping Algorithm The state-level mapping algorithm used in this paper is an extension of a phone clustering algorithm based on a confusion matrix [2]. In the state-level mapping, each baseline acoustic model state is modified with its corresponding auxiliary model states by combining their Gaussian mixtures into a new mixture. In our application, the nonnative English acoustic model Model MAP is chosen as the baseline acoustic model and the Mandarin acoustic model is used as the auxiliary acoustic model. The detailed algorithm is described below. (For convenience, Mandarin and English are referred to as the source and target language, respectively.) 1. Target reference: Force-align the target language states on a small amount of target language speech data using the target language acoustic model in order to obtain time-label information. These alignments are taken as the target state reference. 2. Source hypothesis: A source language state recognizer using the source language acoustic model is applied to the same speech data to decode the source phonetic representation of each utterance. This yields parallel phonetic segmentations of the target language speech data in the source language state inventory. The source phonetic representation is taken as the source state hypothesis. 3. Co-occurrence criterion: Define a criterion for co-occurrence between the two phonetic labelings of the reference and hypothesis. In our system, when the number of overlapping frames between a reference state and a hypothesis state is more than sixty percent of the reference state duration, the co-occurrence is counted in a matrix whose (i, j) entry holds the number of co-occurrences between the ith state of the source language and the jth state of the target language. This matrix of co-occurrence counts is the confusion matrix. Figure 1 shows an example of the co-occurrence between state t_j (en) and state s_i (ch) when English is taken as the target language. 4. Confusion probability calculation: Let M and N be the numbers of states in the source and target language. Let A_{S,T}(M, N) be the confusion matrix and A_{i,j} the element in its ith row and jth column. Given the target language state t_j and the source language state s_i, the confusion probability can be computed as:
Fig. 1 Example of the co-occurrence between state t_j (en) and state s_i (ch) when English is taken as the target language. (Note: the Mandarin states and English states are labeled with the tags "ch" and "en", respectively)
A_{i,j} = count(s_i | t_j) / Σ_{n=1}^{M} count(s_n | t_j)    (1)
where A_{i,j} ∈ A_{S,T}(M, N), i = 1…M, j = 1…N. 5. State mapping information: Once the confusion probabilities A_{i,j} are obtained, the state mapping information can be derived from this matrix. If the element in the ith row and jth column has the maximal value in the jth column, the ith state of the source language is the best matching state for the jth state of the target language. If the kth row has the second maximal value in the jth column, the kth source language state is the 2nd-best matching state for the jth target language state, and so on. Based on this rule, every target language state can find its n-best (n < M) matching states from the source language. 6. State-candidate model modification: When the state mapping information and the number of n-best candidates are determined, the new output probability density p_j(o_t) of the jth target language state for the observation vector o_t is modified by the n-best state candidates from the source language as follows:

p_j(o_t) = α p_j(o_t)^{tar} + (1 − α) Σ_{l=1}^{n} a_{lj} p_l(o_t)^{sou},  where Σ_{l=1}^{n} a_{lj} = 1    (2)
Table 3 Performance of the nonnative acoustic model with the state-level model modification approach on the testing database

  Acoustic Models   PhrER (%)
  Model MAP         34.3
  Model MM C1       31.8
Here α denotes the weight of the state from the baseline acoustic model, a_{lj} is the confusion probability of the corresponding lth-best state candidate, and n denotes the number of n-best candidates. p_j(o_t)^{tar} and p_l(o_t)^{sou} refer to the output probability densities of the original jth target language state and the lth source language state, respectively. Table 3 shows the performance of the nonnative acoustic model with the state-level model modification approach on the testing database. Model MM C1 refers to the acoustic model modified by the state-level bilingual model modification approach when Model MAP is taken as the baseline nonnative English acoustic model and Model Mand as the auxiliary acoustic model. "C1", short for "1-best Candidate", means that each state of the baseline acoustic model is modified using only the best matching state from the Mandarin model states, by combining the Gaussian mixtures into a new mixture. As shown, Model MM C1 performs markedly better than the baseline nonnative English acoustic model, achieving a 7.3% relative PhrER reduction.
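As a rough illustration of this combination (not the authors' implementation: the state densities, counts and the weight α below are hypothetical), Eqs. (1) and (2) could be realized as follows:

import numpy as np

def confusion_probabilities(counts):
    # counts[i, j] = co-occurrence count between source state i and target state j;
    # Eq. (1) normalises each column over the source states.
    return counts / np.maximum(counts.sum(axis=0, keepdims=True), 1e-12)

def modify_state_density(p_target, p_sources, a_col, alpha=0.5, n_best=2):
    # Eq. (2): mix the baseline (target) state density with its n-best
    # source-language candidate densities, weighted by confusion probabilities.
    best = np.argsort(a_col)[::-1][:n_best]
    w = a_col[best] / a_col[best].sum()          # renormalised so the weights sum to 1
    def p_new(o):
        mixed = sum(wl * p_sources[l](o) for wl, l in zip(w, best))
        return alpha * p_target(o) + (1.0 - alpha) * mixed
    return p_new

# Toy usage with hypothetical one-dimensional Gaussian state densities.
gauss = lambda mu, var: (lambda o: np.exp(-0.5 * (o - mu) ** 2 / var) / np.sqrt(2 * np.pi * var))
counts = np.array([[40.0, 5.0], [10.0, 30.0], [2.0, 1.0]])   # 3 source states x 2 target states
A = confusion_probabilities(counts)
p_mod = modify_state_density(gauss(0.0, 1.0), [gauss(-1, 1), gauss(1, 1), gauss(3, 2)], A[:, 0])
print(p_mod(0.2))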
4.2 N-Best Candidate Selection Given the state mapping criterion, another aspect of model optimality that we investigated is the number of n-best candidates selected to modify the baseline nonnative acoustic model. An appropriate number of candidates can further improve the recognition accuracy. In our study, different candidate numbers were investigated and compared; the experiments show that 2-best candidates are the most appropriate choice. Table 4 presents the experimental results on the testing database when selecting different numbers of n-best candidates. The tags "C1", "C2" and "C3" refer to the three numbers of n-best candidates.

Table 4 Performances of the state-level model modification approach with different n-best candidates on the testing database

  Acoustic Models   PhrER (%)
  Model MAP         34.3
  Model MM C1       31.8
  Model MM C2       31.6
  Model MM C3       32.0

As can be seen, Model MM C2
outperforms Model MM C1 and Model MM C3, achieving a 7.9% relative PhrER reduction compared to the baseline nonnative acoustic model. Thus, the 2-best state candidates are regarded as the preferred number of state candidates to select in the state-level model modification approach.
4.3 Number of Gaussian Mixture Components The state-level bilingual model modification approach improves nonnative speech recognition accuracy significantly. However, adding the auxiliary acoustic model into the recognition process costs more decoding time. In speech recognition systems, acoustic computation often occupies most of the decoding time, so the number of Gaussian mixture components in the acoustic models must be controlled before the system can be put into practical use. To measure computation, the Real Time Factor (RTF) is used as the criterion in our experiments. If it takes time P to process an input of duration T, the RTF is defined as

RTF = P / T    (3)

Table 5 shows results on RTF and PhrER for the model modification approach with different numbers of Gaussian mixtures in the auxiliary acoustic models. The tags "4GMM", "8GMM" and "32GMM" refer to three different numbers of Gaussian mixtures for the auxiliary Mandarin acoustic model. As the number of Gaussian mixtures increases, the RTF increases remarkably. Even though Model MM 32GMM C2 achieves the best PhrER, it costs much more time to decode speech, with a 152% relative increase in RTF compared to the baseline; that is too slow for practical use. As can be seen in Table 5, when the number of Gaussian mixtures is reduced to 4, the computation cost decreases significantly, while over 5% relative reduction in PhrER can still be achieved. In fact, the auxiliary Mandarin acoustic model, which represents the characteristics of Mandarin pronunciations, is only used to modify the nonnative acoustic model and does not need to be very detailed. As shown in Table 5, 4 Gaussian mixture components for the auxiliary acoustic model are enough to capture most of the characteristics of Mandarin pronunciations.
Table 5 Comparison of the model modification approach with different numbers of Gaussian mixture components in the auxiliary acoustic models

  Acoustic Models       RTF    PhrER (%)
  Model MAP             1.05   34.3
  Model MM 4GMM C2      1.32   32.3
  Model MM 8GMM C2      1.53   32.7
  Model MM 32GMM C2     2.65   31.6
Therefore, considering both computation cost and recognition accuracy, the nonnative acoustic model whose states are modified with the corresponding 2-best state candidates from the auxiliary Mandarin model with 4 Gaussian mixture components achieves the best overall performance.
5 Conclusion This paper investigates how to improve nonnative speech recognition accuracy when large variations of accented pronunciations occur. In order to capture the characteristics of accents stemming from the speakers' own native language, a novel state-level bilingual model modification approach is proposed and presented. Each state of the baseline nonnative acoustic model is modified by the 2-best candidate states from the auxiliary acoustic model with 4 Gaussian mixture components, which is trained on the speakers' mother language. In our experiments, a significant improvement in recognition accuracy is achieved while only a small relative increase in computation cost occurs. This approach proves to be effective and efficient for developing nonnative speech recognition systems in which training data are too limited to cover large variations of accented pronunciations. Acknowledgements. This work is partially supported by MOST (973 program, 2004CB318106), the National Natural Science Foundation of China (10574140, 60535030), and The National High Technology Research and Development Program of China (863 program, 2006AA010102, 2006AA01Z195).
References 1. Tomokiyo, L.M., Waibel, A.: Adaptation Methods for Nonnative Speech. In: Proceedings of Multilinguality in Spoken Language Processing (2001) 2. Zhang, Q., Pan, J., Yan, Y.: Mandarin-English Bilingual Speech Recognition for Real World Music Retrieval. In: ICASSP 2008, paper 1147, Las Vegas, March 30 - April 4 (2008) 3. Humphries, J., Woodland, P., Pearce, D.: Using accent-specific pronunciation modeling for robust speech recognition. In: Proc. ICSLP 1996, Philadelphia, PA, October 1996, pp. 2324–2327 (1996) 4. Teixeira, C., Trancoso, C., Serralheiro, A.: Recognition of Non-native Accents. In: Proc. Eurospeech 1997, Rhodes, Greece, September 1997, pp. 2375–2378 (1997) 5. Livescu, K.: Analysis and Modeling of Non-native Speech for Automatic Speech Recognition. Master’s thesis, MIT (August 1999) 6. Wang, Z., Schultz, T., Waibel, A.: Comparison of Acoustic Model Adaptation Techniques on Non-native Speech. In: Proc. ICASSP (2003) 7. Clarke, Constance, Jurafsky, Daniel: Limitations of MLLR Adaptation with Spanish-accented English: an Error Analysis. In: INTERSPEECH 2006, paper 1611-Tue2BuP.7 (2006)
8. Bohn, O.-S., Flege, J.E.: The production of New and Similar Vowels by Adult German Learners of English. Stud. Second Lang. Acquis. 14, 131–158 (1992) 9. The CMU Pronouncing Dictionary v0.6, The Carnegie Mellon University, http://www.speech.cs.cmu.edu/cgi-bin/cmudict 10. IPA. The International Phonetic Association (revised to 1993) IPA Chart. Journal of the International Phonetic Association 23 (1993) 11. Flege, J.E.: Production and Perception of a Novel, Second-language Phonetic Contrast. Journal of the Acoustical Society of America 93, 1589–1608 (1993) 12. Li, A., Yin, Z., Wang, T., Fang, Q., Hu, F.: RASC863 - A Chinese Speech Corpus with Four Regional Accents. In: ICSLT-o-COCOSDA, New Delhi, India (2004)
Edge Detection Based on a PCNN-Anisotropic Diffusion Synergetic Approach Mario I. Chacon-Murguia, Mitzy Nevarez-Aguilar, Angel Licon-Trillo, Oscar Mendoza-Vidaña, J. Alejandro Martinez-Ibarra, Francisco J. Solis-Martinez, and Lucina Cordoba Fierro*
Abstract. A new synergetic model based on a pulse coupled neural network and the Perona-Malik anisotropic diffusion algorithm is introduced. The proposed model was developed because conventional edge detectors fail to extract geometric structures from low contrast edges and noisy assumed-uniform regions. The synergetic model is a variation of the Perona-Malik algorithm that incorporates a gradient update during its iteration process. The gradient update is intended to enhance real edges and to reduce the noise content in uniform regions. The proposed approach is compared with the Canny and watershed methods as well as with a variation of the Perona-Malik algorithm. Results indicate that the synergetic approach yields a more complete extraction of the geometric structures compared with the other methods.
1 Introduction Edge detection has been a challenging area in image processing. The importance of solving the edge detection problem correctly lies in the effect it has on the segmentation stage of image analysis. Different schemes have been proposed to tackle the edge detection problem, and many more are still being proposed because edge detection in general requires specific methods for specific situations. In fact, edge detection can be considered an open problem due to the variety of situations found in image processing applications. Although many edge detection methods have been reported in the literature [1-4], it is not common to find synergetic approaches aimed at improving the performance of edge detection methodologies in complex situations. One kind of complex situation for edge Mario I. Chacon-Murguia . Mitzy Nevarez-Aguilar . Oscar Mendoza-Vidaña Chihuahua Institute of Technology DSP & Vision Lab *
Angel Licon-Trillo . Francisco J. Solis-Martinez . Lucina Cordoba Fierro Facultad de Medicina, Universidad Autónoma de Chihuahua J. Alejandro Martinez-Ibarra Centro Universitario del Sur, Universidad de Guadalajara, Mexico
[email protected]
Fig. 1 Structure samples found on the Nose Bug eggs’ surface
detection is when the image under processing has low contrast edges and noisy assumed-uniform regions. In these cases most edge detection methods present a high rate of false positives from the edge-pixel point of view. An interesting approach for edge detection in low contrast images is the anisotropic diffusion filter [5-8]. A well known anisotropic filter methodology is the Perona-Malik method, P-M [5]. A modification of this filter has been reported to perform well in detecting edges in glass substrate images, which are low contrast images [6]. A specific situation of low contrast images with a low signal (edge) to noise ratio can be found in surface images of Meccus phyllosomus eggs. Meccus phyllosomus, or Nose Bug, is an insect that can infect humans with a parasite that may cause cardiovascular problems termed Chagas disease. For this reason the Nose Bug has been the topic of many studies for many years [9,10]. One of these studies concerns the taxonomy of the families and their possible hybrids. A possible solution proposed to determine the correct taxonomy is to relate the geometric structures found on the eggs' surface with a corresponding family or hybrid. Figure 1 illustrates two examples of such geometric structures. The images are acquired with a scanning electron microscope. Before the geometric structures of the egg can be analyzed for classification purposes, it is necessary to extract them from the egg's surface using image processing techniques, which is not a trivial task because of the low contrast and noisy composition of the images. This geometric structure extraction problem, approached through image edge detection, is the topic covered in this research. The proposed method to achieve this task is a synergetic approach: it incorporates information extracted with a Pulse Coupled Neural Network, PCNN, into an anisotropic diffusion method. The organization of this paper is as follows. Section 2 describes the possible solution using the Perona-Malik method as well as a proposed modification. Section 3 presents the PCNN methodology used to obtain the synergetic information. The proposed methodology combining the PCNN information with the P-M method is explained in Section 4. Finally, the results and conclusions are discussed in Section 5.
2 Anisotropic Filtering Anisotropic filtering is a method based on the solution of the heat, or diffusion, equation. Its main idea is to smooth assumed-uniform regions through a diffusion behavior, and to preserve the edges of a region by stopping the diffusion process there. The method developed by Perona and Malik [5] is described in the next subsection.
2.1 Perona-Malik Approach The Perona-Malik model, P-M, is derived from the continuous anisotropic diffusion defined by

∂I_t(x, y) / ∂t = div[ c_t(x, y) ∇I_t(x, y) ]    (1)
From this continuous model a discrete anisotropic diffusion model is derived, represented by

I_{t+1}(x, y) = I_t(x, y) + (1/4) Σ_{i=1}^{4} [ c_t^i(x, y) ∇I_t^i(x, y) ]    (2)

where I_t(x, y) is the image under processing at time t, c_t^i(x, y) is the diffusion coefficient at time t for direction i ∈ {North, South, East, West}, and ∇I_t^i(x, y) is the gradient at coordinates (x, y) in direction i, defined by
∇I_t^N(x, y) = I_t(x − 1, y) − I_t(x, y)    (3)
∇I_t^S(x, y) = I_t(x + 1, y) − I_t(x, y)    (4)
∇I_t^E(x, y) = I_t(x, y + 1) − I_t(x, y)    (5)
∇I_t^W(x, y) = I_t(x, y − 1) − I_t(x, y)    (6)
The diffusion coefficient c_t^i(x, y) can be defined as a function of ∇I_t^i(x, y):

c_t^i(x, y) = g( ∇I_t^i(x, y) )    (7)

where g( ∇I_t^i(x, y) ) is a non-negative, monotonically decreasing function such that

g(0) = 1,   lim_{|∇I_t^i(x, y)| → ∞} g( ∇I_t^i(x, y) ) = 0    (8)
which guarantees that edge regions (high gradient values) are preserved, while uniform regions are diffused, that is, smoothed.
A possible g() function in the P-M model is

g( ∇I_t^i(x, y) ) = 1 / ( 1 + ( |∇I_t^i(x, y)| / k )^2 )    (9)
where k is a parameter that controls the diffusion process: if k is too large the result is over-smoothing, and if k is too small the diffusion stops at early stages.
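For illustration only, a minimal NumPy sketch of the discrete diffusion of Eq. (2) with the g() of Eq. (9) follows; the periodic border handling via np.roll and the default choice of k are simplifications, not the exact configuration of Table 1:

import numpy as np

def perona_malik(img, n_iter=20, k=None, lam=0.25):
    # Discrete P-M diffusion, Eq. (2), with the decreasing function of Eq. (9).
    I = img.astype(float)
    for _ in range(n_iter):
        # Four nearest-neighbour differences (N, S, E, W), Eqs. (3)-(6);
        # np.roll wraps around at the borders, which is a simplification.
        dN = np.roll(I, 1, axis=0) - I
        dS = np.roll(I, -1, axis=0) - I
        dE = np.roll(I, -1, axis=1) - I
        dW = np.roll(I, 1, axis=1) - I
        kk = np.abs([dN, dS, dE, dW]).max() if k is None else k   # k = max gradient by default
        g = lambda d: 1.0 / (1.0 + (np.abs(d) / kk) ** 2)
        I = I + lam * (g(dN) * dN + g(dS) * dS + g(dE) * dE + g(dW) * dW)
    return I

smoothed = perona_malik(np.random.rand(64, 64) * 255, n_iter=10)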
2.2 Perona-Malik Variations and Parameter Estimation Chao and Tsai propose a modification of the P-M model, P-MM, to carry out edge sharpening besides the smoothing process [6]. This model is formulated as

I_{t+1}(x, y) = I_t(x, y) + (1/4) Σ_{i=1}^{4} [ g( ∇I_t^i(x, y) ) − v( ∇I_t^i(x, y) ) ] ∇I_t^i(x, y)    (10)
The sharpening function is defined as

v( ∇I_t^i(x, y) ) = α [ 1 − g( ∇I_t^i(x, y) ) ]    (11)
where α is a sharpening parameter with 0 ≤ α ≤ 1. The v() function is a non-negative, monotonically increasing function such that

v(0) = 0,   lim_{|∇I_t^i(x, y)| → ∞} v( ∇I_t^i(x, y) ) = 1    (12)
The parameters for the P-M and P-MM models were adjusted to process the eggs' images following the recommendations reported in the literature [5,6,8]. The best edge detection results were obtained with the values shown in Table 1.

Table 1 P-M and P-MM parameters

  P-M:   20 iterations, k = maximum gradient value, λ = 1/4
  P-MM:  10 iterations, k = 70, λ = 1/4, α = 0.03
2.3 Anisotropic Diffusion Results The P-M and P-MM models were tested on the edge detection problem. The findings indicate that edge detection in these images is problematic even for these two methods. The problem arises from the fact that the images combine low contrast edges with a high noise content. This leads us to a
new paradigm where the main idea of diffusion is preserved but a solution to the low contrast edges and noisy uniform regions is incorporated. This new paradigm based on the PCNN image preprocessing is described in the next section.
3 PCNN Preprocessing The PCNN has been used as a preprocessor in different situations reported in the literature [11-13]. The idea of incorporating the PCNN model is to try to generate uniform regions of the images. If this is possible then these regions may be used to reinforce the edges and diffuse the other pixels using the P-M approach.
3.1 PCNN Model The PCNN architecture has three main modules: the dendrite tree, the linking and the pulse generator [11]. The linking and feeding elements form the dendrite tree. The neuron element receives information from its neighborhood through the linking. The input signal information is obtained through the feeding. The pulse generator module compares the internal activity, U(t), with a dynamic threshold to decide if the neuron element fires or not. A PCNN model can be defined by (13)-(17):

F(t) = G_Feed e^{−α_F Δt} F(t − 1) + S + Y(t − 1) ∗ W    (13)
L(t) = G_Link e^{−α_L Δt} L(t − 1) + Y(t − 1) ∗ M    (14)
U(t) = F(t) [ 1 + β L(t) ]    (15)
θ(t) = e^{−1/α_θ} θ(t − 1) + V Y(t)    (16)
Y(t) = 1 if U(t) > θ(t), 0 otherwise    (17)
The feeding region is represented by Eq. (13), where GFeed is the feed gain, S is the input image, αFΔt is the time constant of the leakage filter of the feeding region, Y(t) is the neuron output at time t, and W is the feeding kernel. The outputs Y(t) of the PCNN can be observed as output images called pulsed images of the PCNN. Eq. (14) describes the linking activity. Here GLink is the linking gain, αLΔt is the time constant of the leakage filter of the linking region, and M is the linking kernel. Equation (15) corresponds to the internal activity of the neuron element. The internal activity depends on the linking and feeding activity. In (15) β is the linking coefficient. β defines the amount of modulation of the feeding due to
the linking activity. The dynamic threshold is implemented by (16), where αθ is the time constant of the leakage filter of the threshold and V is the threshold gain. Finally the output of the neuron is defined by (17). In the case of an image processing task, each pixel is related to a neural element.
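A compact sketch of one way to iterate Eqs. (13)-(17) on an image is given below; the 3x3 average kernels follow Table 2, while the handling of the time step and leakage terms is an assumption, not the authors' exact code:

import numpy as np
from scipy.signal import convolve2d

def pcnn(S, n_pulses=14, beta=1.0, g_feed=0.1, g_link=1.0, aF=0.1, aL=0.1, a_theta=5.0, V=5.0):
    # Iterate the PCNN of Eqs. (13)-(17) and return the last pulsed image Y.
    kern = np.ones((3, 3)) / 9.0                     # 3x3 average linking/feeding kernels
    F = np.zeros_like(S, dtype=float)
    L = np.zeros_like(S, dtype=float)
    Y = np.zeros_like(S, dtype=float)
    theta = np.ones_like(S, dtype=float)
    for _ in range(n_pulses):
        F = g_feed * np.exp(-aF) * F + S + convolve2d(Y, kern, mode='same')   # Eq. (13)
        L = g_link * np.exp(-aL) * L + convolve2d(Y, kern, mode='same')       # Eq. (14)
        U = F * (1.0 + beta * L)                                              # Eq. (15)
        theta = np.exp(-1.0 / a_theta) * theta + V * Y                        # Eq. (16)
        Y = (U > theta).astype(float)                                         # Eq. (17)
    return Y

Y14 = pcnn(np.random.rand(32, 32))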
3.2 Image Segmentation The generation of the information that will be included in the P-M model is presented in this section. The main interest is to generate information that may increase the performance of the P-M method; this information must help enhance real edges and reduce the noise content in uniform regions. In this respect, the PCNN model parameters are set to generate uniform regions; the parameter values used are indicated in Table 2. These parameters are derived from previous works [14,15]. Using these parameters the PCNN was tested, and it was found that the best information is generated at the 14th pulsation. Figure 2 illustrates an original image and the 14th pulsation, I14(x,y). Once the main information is generated by the PCNN, it is morphologically processed to yield the final information provided to the P-M algorithm. These morphological operations are listed next. The first operation on I14(x,y) is a closing with a structuring element B corresponding to an octagon of size 3, to generate uniform areas:

I_C(x, y) = I14(x, y) • B    (18)
Table 2 PCNN parameter settings to generate uniform regions

  β = 1.0        G_Feed = 0.1     α_F Δt = −0.1
  G_Link = 1.0   α_L Δt = −0.1    α_θ = 5      V = 5

The linking and feeding kernels are 3x3 average filters.
Fig. 2 a) Original image, b) Regions generated by the PCNN at pulsation 14th
Fig. 3 a) IC(x,y), b) IE(x,y), c) IH(x,y), d) Small holes dilated and added to IE(x,y), e) ITH(x,y)
Then remove connections between uniform areas through erosion, with C being a square of size 3:

I_E(x, y) = I_C(x, y) ⊖ C    (19)
Now find the small holes in I_C(x,y) in order to fill them in I_E(x,y); this gives the image I_H(x,y). Dilate the small holes found in I_C(x,y) with D equal to a disk of size 2, add them to I_E(x,y), and apply thinning to obtain the image I_TH(x,y):

I_TH(x, y) = thin[ ( I_H(x, y) ⊕ D ) + I_E(x, y) ]    (20)

The image I_TH(x,y) contains useful information about possible edge regions and uniform regions that can be incorporated into the P-M model. The complete process is illustrated in Figure 3.
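The morphological chain of Eqs. (18)-(20) could be approximated as in the sketch below; the structuring-element sizes are read loosely from the text and the scikit-image functions are stand-ins rather than the authors' tools:

import numpy as np
from scipy.ndimage import binary_fill_holes
from skimage.morphology import binary_closing, binary_erosion, binary_dilation, octagon, square, disk, thin

def pcnn_to_edge_map(I14):
    # Eq. (18): closing with an octagonal element to merge uniform areas.
    I_c = binary_closing(I14, octagon(3, 3))
    # Eq. (19): erosion with a 3x3 square to break thin connections.
    I_e = binary_erosion(I_c, square(3))
    # Small holes of I_C, dilated with a disk of size 2 and added to I_E.
    holes = binary_fill_holes(I_c) & ~I_c
    added = binary_dilation(holes, disk(2)) | I_e
    # Eq. (20): thinning yields the candidate edge map I_TH.
    return thin(added)

I14 = np.random.rand(64, 64) > 0.5
I_th = pcnn_to_edge_map(I14)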
4 Synergetic Model The PCNN-PM synergetic model consists of improving the quality of the edge information before it is used by the P-M algorithm. This improvement is achieved by
increasing the gradient values of possible edge pixels and decreasing the gradient values of pixels that are unlikely to be edges, using the information provided by I_TH(x, y).
4.1 Edge Estimation, Edge Reinforcement and Noise Reduction The information in I_TH(x,y) is used to estimate the possible edges in the original image, so that the P-M algorithm is reinforced by a better estimation of edge pixels. The edge pixel information is improved by modifying the gradient term of Eqs. (2) and (10) as indicated in Eq. (21):

∇I_t^i(x, y) = max∇(I_t^i(x, y)) if p(x, y) ∈ I_THE(x, y);  min∇(I_t^i(x, y)) if p(x, y) ∉ I_THE(x, y)    (21)
where max∇(I_t^i(x, y)) and min∇(I_t^i(x, y)) are the maximum and minimum gradient vectors in the image under processing, and I_THE(x, y) is the set of edge pixels in I_TH(x, y). This step has the effect of reducing the uncertainty in the information used by the P-M algorithm, improving the edge-to-noise ratio. Thus the synergetic anisotropic model can be expressed for the P-M and P-MM models by

I_{t+1}(x, y) = I_t(x, y) + (1/4) Σ_{i=1}^{4} [ c_t^i(x, y) ∇I_t^{i,n}(x, y) ]    (22)

I_{t+1}(x, y) = I_t(x, y) + (1/4) Σ_{i=1}^{4} [ g( ∇I_t^{i,n}(x, y) ) − v( ∇I_t^{i,n}(x, y) ) ] ∇I_t^{i,n}(x, y)    (23)

where n in the gradient term is the number of times Equation (21) is applied in the P-M process. In this work the best result was achieved with n = 2. Applying the anisotropic synergetic model yields a new algorithm that converges faster and defines edges better, due to the uncertainty reduction accomplished by the PCNN information.
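A simplified sketch of the PCNN-PM update of Eqs. (21)-(22) follows; the interpretation of the max/min gradient replacement and the parameter values are assumptions made for illustration:

import numpy as np

def synergetic_pm(img, I_the, n_iter=8, n_inject=2, k=70.0, lam=0.25):
    # During the first n_inject iterations the gradients of pixels flagged in I_THE
    # are replaced by the maximum gradient value and all others by the minimum
    # (one possible reading of Eq. (21)); afterwards plain P-M diffusion continues.
    I = img.astype(float)
    g = lambda d: 1.0 / (1.0 + (np.abs(d) / k) ** 2)
    shifts = [(1, 0), (-1, 0), (0, -1), (0, 1)]      # N, S, E, W neighbours
    for it in range(n_iter):
        update = np.zeros_like(I)
        for dy, dx in shifts:
            d = np.roll(I, (dy, dx), axis=(0, 1)) - I
            if it < n_inject:                        # Eq. (21), applied n = 2 times
                d = np.where(I_the, d.max(), d.min())
            update += g(d) * d
        I = I + lam * update
    return I

out = synergetic_pm(np.random.rand(64, 64) * 255, np.random.rand(64, 64) > 0.9)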
4.2 Edge Definition Once the PCNN-PM model is applied, the final edges are defined with the Canny edge detector. Some examples of the performance of the new model on the images of Figure 1 are shown in Figures 4 and 5, in comparison with other edge detection methods: Canny, watershed, P-M, and P-MM. It can be observed from these figures that the Canny detector is completely confounded by the noise present in the internal regions. The watershed method finds acceptable edges, but they are not completely defined. The P-M approaches deal better with the low contrast edges and the noise content. Among the P-M variants, the PCNN-PM and PCNN-PMM models yield more closed structures than P-M and P-MM. The PCNN-PM is selected over the PCNN-PMM because the PCNN-PMM
Fig. 4 a) Canny alone, b) Watershed, c) P-M normal 20 iterations K= maximum gradient, d) P-MM, e) PCNN-PM, 8 iterations, λ=1/4, n=2, f) PCNN-PMM, 8 iterations, λ=1/4, n=2
Fig. 5 a) Canny alone, b) Watershed, c) P-M normal 20 iterations K= maximum gradient, d) P-MM, e) PCNN-PM, 8 iterations, λ=1/4, n=2, f) PCNN-PMM, 8 iterations, λ=1/4, n=2
requires the computation of the sharpening function v(), which in this case does not provide significant advantages due to the noise content in the assumed-uniform regions.
5 Results and Conclusions 5.1 Experimental Results This work described a synergetic approach to determine the edges of the geometric structures found in scanning electron microscope surface images of Meccus phyllosomus eggs. The challenge is to extract those structures from low contrast edges and noisy information. The results of this research showed that the synergetic approach yields better results than ordinary edge detection operators, as well as better results than the anisotropic approach alone. Incorporating the PCNN information sped up the P-M algorithm, improved the detection of real edges, and decreased the number of false positives.
5.2 Conclusions and Future Work The work showed that common edge detection operators are not able to deal satisfactorily with this problem; instead, an anisotropic diffusion approach was followed. Although the anisotropic approach is a more robust way to face the problem, it was still necessary to incorporate information generated with a PCNN to increase the edge detection performance. Future work will address removing small isolated blobs, reducing false positive edges, and linking discontinuous edges.
References 1. Canny, J.: A Computational Approach to Edge Detection. IEEE Trans. on Pattern Analysis and Machine Intelligence 8, 679–698 (1986) 2. Walczak, A., Puzio, L.: Adaptive Edge Detection Method for Images. OptoElectronics Review 16, 60–67 (2008) 3. Yuan, W.Q., Li, D.S.: Edge Detection Based on Directional Space. Frontiers of Electrical and Electronic Engineering in China, 135–140 (2006) 4. Wang, H., Li, H., Ye, X., Gu, W.: Training a Neural Network for Moment Based Image Edge Detection Journal of Zhejiang University 1, 398–401 (2000) 5. Perona, P., Malik, J.: Scale-Space and Edge Detection Using Anisotropic Diffusion. IEEE Trans. on Pattern Analysis and Machine Intelligence 12, 629–639 (1990) 6. Chao, S.M., Tsai, D.M.: An Anisotropic Diffusion-based Defect Detection for LowContrast Glass Substrates. Image and Vision Computing 26, 187–200 (2008) 7. Alvarez, L., Pierre, L., Morel, J.: Image Selective Smoothing and Edge Detection by Nonlinear Diffusion. SIAM J. Numeric Analysis 29, 845–866 (1992) 8. Jahne, B., HauBecker, H.: Computer Vision and Applications. Academic Press, London (2000) 9. Lent, H., Wygodzinsky, P.: Revision of the Triatominae (Hemiptera, Reduviidae), and Their Significance as Vectors of Chagas’ Disease. Bull. Amer. Museum nat. Hist. 163, 125–520 (1979)
10. Cruz-Reyes, A., Pickering-López, J.M.: Chagas Disease in Mexico: an Analysis of Geographical Distribution During the Past 76 years - A Review. Mem Inst Oswaldo Cruz, Rio de Janeiro 101, 345–354 (2006) 11. Lindbland, T., Kinser, J.: Image Processing Using Pulse-Coupled Neural Netw. Springer, Heidelberg (1998) 12. Johnson, J.L., Padgett, M.L.: PCNN Models and Applications. IEEE Trans. on Neural Netw. 10, 480–498 (1999) 13. Gu, X., Zhang, L.: Orientation Detection and Attention Selection Based Unit-linking PCNN. In: International Conference on Neural Networks and Brain, vol. 3, pp. 1328– 1333 (2005) 14. Chacón, M., Zimmerman, A., Rivas, P.: Image Processing Applications with a PCNN. In: Liu, D., Fei, S., Hou, Z., Zhang, H., Sun, C. (eds.) ISNN 2007. LNCS, vol. 4493, pp. 884–893. Springer, Heidelberg (2007) 15. Chacón, M., Zimmerman, A.: License Plate Location Based on a Dynamic PCNN Scheme. In: Proceedings of the IEEE IJCNN (2003)
Automatic Face Recognition Systems Design and Realization Zhiming Qian, Chaoqun Huang, and Dan Xu*
Abstract. This paper addresses three issues in face recognition: it designs a complete face recognition system and gives the realization process; it presents an iris location method based on the K-means algorithm; and it provides a face recognition method based on Sobel and LBP. Experiments show that the system achieves good recognition results. Keywords: Face recognition, Adaboost algorithm, K-means clustering, Feature extraction, Feature analysis.
1 Introduction In recent years, with the development of artificial intelligence and e-commerce, face recognition has become one of the most promising biometric technologies and authentication means. Research on computer face recognition began in the late 1960s. The earliest face recognition system was created by Bledsoe using distances between facial feature points and ratio parameters [1]. Face recognition has developed steadily in recent decades, and in the 1990s it became a hot research topic. Face recognition is at the leading edge of the pattern recognition domain, but it is still largely at the research stage and has not yet become widespread in practical applications. Although humans can easily identify a familiar face, there are still many difficulties in automatically recognizing human faces from frontal views, for example: occlusion and corruption; varying illumination and expression; imaging perspective and distance; and so on. Although the classification criteria for face recognition methods may differ, current studies follow two main directions [2]. One is global analysis, which classifies and recognizes based on the whole face image and considers the attributes of the global pattern. The other is based on local feature information of the face image, such as the eyes, mouth and nose, and then classifies and recognizes according to certain criteria. Zhiming Qian . Chaoqun Huang . Dan Xu YunNan University, China
[email protected],
[email protected],
[email protected] *
Zhiming Qian Chuxiong Normal University, China
Subspace methods of face recognition such as PCA and Fisherfaces have been successfully tested in many situations [3,4] and have brought large progress to the face recognition field. However, from a practical angle these methods still have the following deficiencies. First, as new face samples keep arriving, there is no guarantee that the required subspace can always be found without trouble. Second, subspace methods cannot tackle the nonlinear problems in face recognition. For these reasons, local feature analysis is preferred in practical face recognition systems. At present there are many local face recognition methods in the literature, such as SIFT and differential features [5,6]. Unfortunately, these works only report results on a specific face database; moreover, most of the described face recognition systems are only semi-automatic, so no complete face recognition process is given. Focusing on these difficulties, this study presents a fully automatic face recognition system and gives its concrete realization process. The system adopts popular techniques such as Adaboost and LBP [7,8] and refines them to produce better recognition results.
2 System Overview In the system design, we mainly consider the following factors. 1. Speed: the system must maintain a high processing speed. In the detection stage we adopt the Adaboost algorithm to carry out face detection, chosen for two characteristics: it offers very fast detection and good detection capability. In the feature extraction stage, between the popular Gabor wavelet [9] and LBP, we choose the latter; LBP is a local feature descriptor that is fast to compute and has good recognition capability. 2. Accuracy: while keeping the system fast, we try our best to improve the recognition results. In the detection stage, to ensure accurate iris location, we apply K-means clustering to locate the iris. In the recognition stage, we add a weight analysis to the Sobel and LBP scheme so as to improve the recognition performance of LBP. Figure 1 shows the main framework of our face recognition system. Fig. 1 The frame map of the system
3 System Details According to Figure 1, there are six main parts in the face recognition system. We explain the specific process of each part below.
3.1 Face Finder In our face recognition system we choose the Adaboost algorithm to locate faces. Adaboost was first put forward by Freund and Schapire in 1995. The algorithm's basic idea is to boost many weak classifiers, each making only a simple classification, into a strong classifier. Theory shows that as long as each weak classifier performs better than random guessing, the error rate of the strong classifier tends to zero as the number of weak classifiers grows. In 2001, Viola presented a new face detection algorithm using a boosted cascade of Haar features [7].
3.2 Iris Finder In accordance with the distribution of facial organs, we can estimate the locations where the irises appear. Let the two initial iris points be (x1, y1) and (x2, y2), where x1 = 1.3w/4, y1 = 1.6h/5, x2 = 2.8w/4, y2 = 1.6h/5 (w and h are respectively the width and height of the face image). Next, a designated eye window is obtained around each of the two iris points; the window size is w/6, as Figure 2(a) shows. Afterwards, the colors of the iris and the skin are separated with the K-means clustering method. The algorithm is as follows:
(1) Initialize three classes whose centers are class1 = min(img) (the minimum gray value), class2 = max(img) (the maximum gray value) and class3 = (class1 + class2)/2, respectively. (2) Scan the window image and, by the nearest-neighbor principle, assign each pixel to the class whose center is closest to its gray value. (3) Calculate the new class centers and compare them with the old ones. If there is no difference, go to step (4); otherwise, go to step (2). (4) Conclude the classification. After clustering, class1, which is the dark class, may contain the iris. We display class1 as white, while class2 and class3, which are skin areas, are shown in black, as in Figure 2(b). Fig. 2 Eyes windows cluster analysis
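A minimal sketch of this three-centre clustering of an eye window is shown below; the array names and the iteration cap are illustrative, not part of the original system:

import numpy as np

def cluster_eye_window(win, max_iter=50):
    # Steps (1)-(4): K-means on gray values with three centres initialised to the
    # minimum, the maximum and their midpoint; returns the darkest class (possible iris).
    centers = np.array([win.min(), win.max(), (win.min() + win.max()) / 2.0], dtype=float)
    for _ in range(max_iter):
        labels = np.argmin(np.abs(win[..., None] - centers), axis=-1)
        new = np.array([win[labels == c].mean() if np.any(labels == c) else centers[c]
                        for c in range(3)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels == 0                               # class1: dark pixels

win = (np.random.rand(20, 20) * 255).astype(float)
iris_mask = cluster_eye_window(win)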
Fig. 3 Sub-window images Fig. 4 3x3-neighborhood operator
Fig. 5 Face and eyes localization
Within the white area, the eye center is found using the following conditions: (1) the heights of the eye blobs are smaller than their widths; (2) the eye blobs are symmetrical, and the two corresponding areas show no obvious difference; (3) if there are two qualified blobs in one window image, with centers (x3, y3) and (x4, y4), compare y3 and y4 and select the blob with the larger value as the eye center. Finally, the iris center is found from the eye center. We obtain a designated sub-window around each of the two eye points; the window size is 31×13 pixels, as Figure 3 shows. The operator of Figure 4 is applied to every pixel of the sub-window, replacing the center pixel value with the sum over its 3x3 neighborhood. Next, the maximum of the sub-window is computed and its coordinate is selected as the iris center. The final result is shown in Figure 5.
3.3 Normalization Based on the iris locations, the image is rotated so that the two iris centers are horizontally aligned. The image is then scaled until the distance between the two centers is 70 pixels. Next, the template in Figure 6(a) is used to crop the face sample to a normalized size of 150×180 pixels, as in Figure 6(b).
3.4 Preprocessing We use Tan and Triggs' preprocessing method [10], which eliminates most of the effects of changing illumination while preserving the essential appearance details needed for recognition. It includes three main steps.
Fig. 6 Face image normalization
(1) Gamma correction: this has the effect of enhancing the local dynamic range of the image in dark or shadowed regions, while compressing it in bright regions and at highlights. It is a non-linear gray-level transformation that replaces gray level I with I^γ, where γ ∈ [0,1] is a user-defined parameter. (2) Difference of Gaussian (DoG) filtering: DoG filtering is a convenient way to obtain the desired bandpass behavior; fine spatial detail is critically important for recognition. We use an inner Gaussian with σ0 = 1 and an outer Gaussian with σ1 = 3. (3) Contrast equalization: this globally rescales the image intensities to standardize a robust measure of overall contrast or intensity variation, using the following two-stage process:

I(x, y) ← I(x, y) / ( mean(|I(x', y')|^α) )^{1/α}    (1)
I(x, y) ← I(x, y) / ( mean( min(τ, |I(x', y')|)^α ) )^{1/α}    (2)
Here, α is a strongly compressive exponent that reduces the influence of large values, τ is a threshold used to truncate large values after the first phase of normalization, and the mean is taken over the whole image. By default we use α = 0.1 and τ = 10. The result is shown in Figure 7. Fig. 7 Face image preprocessing
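A compact sketch of the three preprocessing steps is given below, assuming γ = 0.2 as an example value inside the stated [0,1] range; it is an approximation of the cited method, not its reference implementation:

import numpy as np
from scipy.ndimage import gaussian_filter

def preprocess(img, gamma=0.2, sigma0=1.0, sigma1=3.0, alpha=0.1, tau=10.0):
    I = (img.astype(float) / 255.0) ** gamma                          # (1) gamma correction
    I = gaussian_filter(I, sigma0) - gaussian_filter(I, sigma1)       # (2) DoG band-pass
    I = I / (np.mean(np.abs(I) ** alpha) ** (1.0 / alpha))            # Eq. (1)
    I = I / (np.mean(np.minimum(tau, np.abs(I)) ** alpha) ** (1.0 / alpha))  # Eq. (2)
    return I

norm = preprocess(np.random.rand(180, 150) * 255)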
3.5 Sobel Edge Detection Starting from the preprocessed face image, if it belongs to the test set, the Sobel operator is used to perform edge detection on it. An adaptive threshold is then used to obtain the binary edge image, as shown in Figure 8(a). The edge binary image is divided into 63 sub-blocks of 21×15 pixels each, as shown in Figure 8(b). The pixels in each sub-block are then counted. Assuming there are n_w white pixels in the ith sub-block and the total number of pixels in the ith sub-block is N, the weight of this sub-block is P = n_w/N + 0.5. The weights of all sub-blocks are calculated one by one. Among them, the lower-left and lower-right areas (the red part in Figure 8(b)) are vulnerable to the interference of noise and are comparatively useless for recognition, so the weight of each of these sub-blocks is set to 0.25. A vector of 63 dimensions is thus obtained, representing the weights of the 63 sub-blocks.

Fig. 8 Sobel weight analysis
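For illustration, the sub-block weighting could be sketched as follows; the 9x7 grid (63 blocks of 21x15 pixels), the thresholding rule and the choice of noisy corner blocks are assumptions rather than the authors' exact procedure:

import numpy as np
from scipy.ndimage import sobel

def sobel_block_weights(face, grid=(9, 7), low_weight=0.25):
    I = face.astype(float)
    mag = np.hypot(sobel(I, axis=0), sobel(I, axis=1))   # Sobel edge magnitude
    edges = mag > mag.mean()                             # simple adaptive threshold
    rows, cols = grid
    h, w = edges.shape[0] // rows, edges.shape[1] // cols
    weights = np.empty(rows * cols)
    for i in range(rows):
        for j in range(cols):
            block = edges[i * h:(i + 1) * h, j * w:(j + 1) * w]
            weights[i * cols + j] = block.mean() + 0.5   # P = n_w / N + 0.5
    noisy = [rows * cols - cols, rows * cols - 1]        # hypothetical lower corner blocks
    weights[noisy] = low_weight                          # down-weight noisy regions
    return weights

w = sobel_block_weights(np.random.rand(189, 105) * 255)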
3.6 LBP Analysis The LBP operator is one of the best performing texture descriptors and has been widely used in various applications. It has proven to be highly discriminative, and its key advantages, namely its invariance to monotonic gray level changes and its computational efficiency, make it suitable for demanding image analysis tasks. The LBP operator was originally designed for texture description. The operator assigns a label to every pixel of an image by thresholding the 3x3 neighborhood of each pixel with the center pixel value and considering the result as a binary number. See Figure 9 for an illustration of the basic LBP operator. To be able to deal with textures at different scales, the LBP operator was later extended to use neighborhoods of different sizes (n, r), which means n sampling points on a circle of radius r. Another extension to the original operator is the definition of so-called uniform patterns. A local binary pattern is called uniform if the binary pattern contains at most two bitwise transitions from 0 to 1 or vice versa when the bit pattern is considered circular. For example, the patterns 00000000 (0 transitions), 01110000 (2 transitions) and 11001111 (2 transitions) are uniform, whereas the patterns 11001001 (4 transitions) and 01010011 (6 transitions) are not. Ahonen et al. found that about 90.6% of the patterns in the (8, 1) neighborhood are uniform in the case of the preprocessed FERET face database [8]. In our experiment, the face images are analyzed using the uniform patterns in the (8, 1) neighborhood, as shown in Figure 10. Fig. 9 Basic LBP operator
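A small sketch of the (8, 1) LBP operator with uniform-pattern labelling follows; the wrap-around border handling is a simplification:

import numpy as np

def lbp_8_1(img):
    # Basic LBP: threshold the 8 neighbours on a radius-1 circle against the centre,
    # then map each 8-bit code to one of 58 uniform labels plus one non-uniform bin (59 total).
    I = img.astype(float)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(I.shape, dtype=np.int32)
    for bit, (dy, dx) in enumerate(offsets):
        code |= (np.roll(I, (-dy, -dx), axis=(0, 1)) >= I).astype(np.int32) << bit
    def transitions(c):
        bits = [(c >> i) & 1 for i in range(8)]
        return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
    uniform = [c for c in range(256) if transitions(c) <= 2]
    lut = np.full(256, len(uniform), dtype=np.int32)     # non-uniform codes share one label
    for label, c in enumerate(uniform):
        lut[c] = label
    return lut[code]

labels = lbp_8_1(np.random.rand(64, 64) * 255)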
Fig. 10 Face image analyzed using LBP
Fig. 11 Face feature extraction
3.7 Feature Analysis and Matching After the LBP analysis, the image is divided into 63 equal sub-blocks of 21×18 pixels each. Each sub-block is weighted using the edge weight template obtained above, and a vector of length 59 is computed from its local histogram. The vectors of all sub-blocks are concatenated to form a reference feature of 63×59 = 3717 dimensions, which is stored in the database, as shown in Figure 11. For an image awaiting recognition, Equation (3) is used to compute the similarity between the test image and every registered image; the face image with the largest similarity value is then taken as the final recognition result.

S(img_1, img_2) = Σ_{i=1}^{63} ( q_i Σ_{n=1}^{59} min(h_n^1, h_n^2) )    (3)

where q_i denotes the weight of the ith sub-block and h_n denotes the nth bin of a sub-block's histogram.
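The per-block histograms and the weighted intersection of Eq. (3) could be sketched as below; the 9x7 grid for the 63 sub-blocks is an assumption:

import numpy as np

def block_histograms(lbp_labels, grid=(9, 7), n_bins=59):
    # One 59-bin histogram of uniform-LBP labels per sub-block, concatenated row by row.
    rows, cols = grid
    h, w = lbp_labels.shape[0] // rows, lbp_labels.shape[1] // cols
    hists = []
    for i in range(rows):
        for j in range(cols):
            block = lbp_labels[i * h:(i + 1) * h, j * w:(j + 1) * w]
            hists.append(np.bincount(block.ravel(), minlength=n_bins)[:n_bins])
    return np.array(hists, dtype=float)                  # shape (63, 59)

def similarity(h1, h2, weights):
    # Weighted histogram intersection, Eq. (3).
    return float(np.sum(weights * np.minimum(h1, h2).sum(axis=1)))

labels = np.random.randint(0, 59, size=(189, 126))
q = np.full(63, 1.0)                                     # hypothetical sub-block weights
print(similarity(block_histograms(labels), block_histograms(labels), q))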
4 Experiment The database adopted in this experiment includes two parts: color FERET face images and self-collected face images. The registered face images cover 150 persons with one frontal image per person (some images are shown in Figure 12). Tests are then made using three other images per person, which contain different poses and facial expressions, and the results are compared with the original LBP method. The final experimental results are shown in Table 1. The face database we collected contains 26 persons with 4 pictures each, differing in pose, facial expression and decoration (some images are shown in Figure 13). The frontal image of each person is registered, and the other three pictures are tested; the results are shown in Table 2.
Fig. 12 Some images in colorful Feret face database
Table 1 Comparison of recognition rates in the color FERET database

  Method       Top 1 recognition rate
  LBP          91.3%
  Sobel+LBP    94.6%
Fig. 13 Some images in our own face database
Table 2 Comparison of recognition rates in our own face database

  Method       Top 1 recognition rate
  LBP          93.2%
  Sobel+LBP    97.1%
5 Conclusion This study designed a complete face recognition system and described its realization process. It provides a clustering-based iris location method and an LBP method with Sobel weighted processing, both of which proved effective in the experiments. Our continuing work will focus on two aspects: 1. Further improving the accuracy of iris location. In our experiments, some faces could not be recognized, mainly because inaccurate iris location distorts the face picture after cropping. 2. The scale of the face database in our experiments is clearly not large enough. In the future we need to experiment on a large-scale database and compare the recognition rate of this method with other methods.
References 1. Bledsoe, W.: Man-Machine Facial Recognition. Technical Report PRI 22, Panoramic Research Inc., Palo Alto, CA (1966) 2. Hjelmas, E., Low, B.K.: Face Detection: A Survey. Computer Vision and Image Understanding 83, 236–274 (2001) 3. Turk, M., Pentland, A.: Eigenfaces for Recognition. Journal of Cognitive Neuroscience 13, 71–86 (1991) 4. Belhumeur, P., Hespanha, J., Kriegman, D.: Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Trans. on Pattern Analysis and Machine Intelligence 19, 711–720 (1997) 5. Bicego, M., Lagorio, A., Grosso, E., Tistarelli, M.: On the Use of SFIT Features for Face Authentication. In: Proceddings Of IEEE Int. Workshop on Biometrics, in association with CVPR, New York (2006) 6. Ravela, S., Hanson, A.: On Multi-Scale Differential Features for Face Recognition. In: Proc. Vision Interface, Ottawa, Canada, pp. 15–21 (2001) 7. Viola, P., Jones, M.: Robust Real Time Object Detection. In: 8th IEEE International Conference on Computer Vision, Vancouver (2001) 8. Timo, A., Abdenour, H., Matti, P.: Face Description with Local Binary Patterns: Application to Face Recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence 28, 2037–2041 (2006) 9. Liu, C., Wechsler, H.: Gabor Feature Based Classification Using the Enhanced Fisher Linear Discriminant Model for Face Recognition. IEEE Trans. on Image Processing 11, 467–476 (2002) 10. Tan, X.Y., Triggs, B.: Enhanced Local Texture Feature Sets for Face Recognition under Difficult Lighting Conditions. In: Proceedings of the 2007 IEEE International Workshop on Analysis and Modeling of Faces and Gestures. LNCS, vol. 4778, pp. 168–182. Springer, Heidelberg (2007)
Multi-view Face Detection Using Six Segmented Rectangular Features Jean Paul Niyoyita, Zhao Hui Tang, and Jin Ping Liu*
Abstract. This paper presents a multi-view face detection system which combines skin color detection and the adaptive boosting (Adaboost) algorithm. The aim of this combination is to satisfy accuracy and speed, the two important requirements of real-time face detection. The second contribution of this paper is a new type of rectangular feature for face detection, represented in a 2x3 matrix form. With these new features the training time becomes significantly shorter: five times faster than with the traditional feature set. The experimental results demonstrate the effectiveness of our method in detecting profile and rotated faces over a wide range of color variations. The method detects 97.5% of positive faces while 5% are declared as false positives. The system also detects occluded faces as well as tilted and rotated faces. Keywords: Face detection, rectangular filter, skin color information, Adaboost.
1 Introduction Face detection from images is currently a very active research area due to its potential applications, such as automated face recognition, surveillance and security systems, human-computer interaction, etc. [1], [2]. Although face detection is trivial for the human brain, it remains a challenging and difficult problem for a computer. Numerous methods have been proposed to detect faces in a single intensity or color image [3]. Among face detection methods, those based on learning algorithms have attracted much attention recently and have demonstrated excellent results [2]. With appearance-based methods, face detection is treated as a problem of classifying each scanned sub-window into one of two classes (face and non-face). Appearance-based methods avoid difficulties in modeling 3D structures of faces by considering possible face appearances under various conditions. A face/non-face classifier may be learned from a training set composed of face examples Jean Paul Niyoyita . Zhao Hui Tang . Jin Ping Liu School of Information Science and Engineering Central South University, Changsha 410083, China *
taken under all possible conditions and non-face examples. Building such a classifier is possible because pixels on a face are highly correlated, whereas those in a non-face sub-window present far fewer regularities. However, large variations caused by changes in facial appearance, lighting, and expression impede face detection, and changes in facial view (head pose) further complicate the situation. Speed is also an important issue for real-time performance, and image-based techniques are slow. Recently, several face detection systems based on image-based techniques have been built [4], [5]; these techniques require a large number of training samples and a long training time. To overcome this drawback, we combine the image-based technique with a skin color technique. We first address the problem of lighting conditions, which may change the skin color, by compensating for the illumination in the image. Since only the skin pixels are needed, the non-skin pixels are removed to save time. After extracting the skin tone, we have a set of face candidates which are scanned by the proposed six-segmented rectangular filter (see Section 4.2) to gain speed in detection and training. Using the adaptive boosting algorithm (Adaboost), these face candidates are classified as face or non-face. The paper is structured as follows. In the next section we present a summary of our system. Section 3 then presents the operations needed before carrying out face detection. With the face candidates obtained from this phase, Section 4 proposes a novel feature set to classify faces. The experimental results are presented in Section 5, and we conclude in Section 6.
2 System Overview We have summarized the structure of our algorithm in Fig. 1. Our system is similar to that of Hsu et al. [1] for lighting compensation and skin color detection. For processing, we apply a six-segmented rectangular filter to classify face regions. The system contains two main parts: image preparation and processing. In order to alleviate the variation of lighting conditions, a preprocessing procedure must be performed. To identify the presence of a human face, the image is scanned to detect skin color regions and remove unnecessary pixels. To reduce the search region, the possible face regions are located by classifying pixels as face color or non-face color based on their hue component only. After segmentation, we have face candidates. In the processing phase we use classifiers: the image is scanned by the six-segmented rectangular filter, and the adaptive boosting algorithm (Adaboost) is used for classification. If the input is a grayscale image instead of a color one, the preprocessing phase is skipped and we go directly to the classification step.
3 Preprocessing Phase Human skin has its own color distribution that differs from that of most non-face objects [1]. For this reason, it can be used to filter the input image to obtain candidate face regions. To achieve this filtering process, we need to transform the color space of the image. The following sections deal with these operations.
3.1 Lighting Compensation The appearance of skin tone color depends on lighting conditions. For this reason the first step in the preparation phase is to normalize the color appearance of the image. For this we use the "reference white" approach as proposed in [1]. We first sort the image pixels and take the top 5% of pixels with the highest gray level. If the number of these pixels is larger than 100, their gray level is set to 255 and the remaining pixel values are scaled by the same ratio, obtained by dividing 255 by the mean gray level of the top 5% of pixels. The color components are left unaltered if a sufficient number of reference white pixels is not detected.
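A minimal sketch of this reference-white normalisation follows; the gray-level definition and the per-channel scaling are assumptions, not the exact procedure of [1]:

import numpy as np

def reference_white(rgb):
    gray = rgb.mean(axis=2)
    top = gray >= np.percentile(gray, 95)                # top 5% brightest pixels
    if top.sum() <= 100:
        return rgb                                       # too few reference-white pixels
    ratio = 255.0 / gray[top].mean()                     # map their mean to 255
    return np.clip(rgb.astype(float) * ratio, 0, 255).astype(np.uint8)

balanced = reference_white((np.random.rand(120, 160, 3) * 255).astype(np.uint8))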
3.2 Color Transformation Most images acquired with a color video camera are in RGB space. The reason for transforming to another color space when aiming at skin color detection is to achieve invariance to changing illumination and/or skin tones. In the RGB space, the triple (R, G, B) represents not only color but also luminance. Luminance may vary across a person's face due to ambient lighting and is not a reliable measure for separating skin from non-skin regions [6]. Luminance can be removed from the color representation in a chromatic color space. Numerous color spaces have been developed and suggested for various applications and purposes; they are usually obtained via a transform from RGB or another basic color space. These color spaces can be classified into two categories: those that only use chrominance channels, such as normalized RG, HS and CbCr, and color vectors that consist of all color channels (RGB, HSV, YCbCr, CIE-Lab). In this paper we benefit from a color vector that uses all color channels since, as revealed in [7], segmentation performance degrades when only chrominance channels are used for pixel classification. For this we prefer the YCbCr color space, since it is widely used in video compression standards.
3.3 Skin Color Segmentation and Region Merging As we have seen in the previous section, human skin has its own color distribution that differs from that of most non-face objects. Although different people have different skin colors, several studies have shown that the major difference lies largely in their intensity rather than their chrominance [3].
Skin segmentation is commonly used in algorithms for face detection, hand gesture analysis, filtering of pornographic content on the internet, and other video applications. In these applications, the search space for objects of interest, such as faces or hands, can be reduced through the detection of skin regions. To this end, skin segmentation is very effective because it usually involves a small amount of computation, can be done regardless of pose, and is robust under partial occlusion and resolution changes. To segment the skin, we first have to detect it in order to build a decision rule that classifies skin and non-skin pixels; for this we take advantage of a skin modelling technique. Once the skin regions are known, the filters no longer have to scan the whole image but just those regions. After this segmentation, the detected skin regions are merged and the small skin regions are eliminated in order to facilitate the image scan.
4 Features and Classification Haar features are based on Haar wavelets, which are functions that consist of a brief positive impulse followed by a brief negative impulse. Features usually encode knowledge about the domain which is difficult to learn from the raw and finite set of input data. These Haar-like features are interesting for two reasons: (1) powerful face/non-face classifiers can be constructed based on them; and (2) they can be computed efficiently using the summed-area table or integral-image technique (see the next section). Previous image-based face detection systems have used several feature sets: four feature types were used by Viola and Jones [8], and Lienhart et al. [5] extended them. To get the value of a Haar-like feature, we compute the difference between the sums of the pixel gray-level values within the black and white rectangular regions.
4.1 Integral Image Technique As explained in [4], the technique consists in rapidly computing the rectangle features using an intermediate representation of the image. The integral image II(x, y) at location (x, y) contains the sum of the pixels above and to the left of (x, y), defined as follows.
II(x, y) = Σ_{x′ ≤ x, y′ ≤ y} I(x′, y′)   (1)
The integral image can be computed in one pass over the original image using the following pair of recurrences:
S(x, y) = S(x, y − 1) + I(x, y)   (2)

II(x, y) = II(x − 1, y) + S(x, y)   (3)

Fig. 1 The sum of the pixels within rectangle D can be computed with four array references
Where S(x, y) is the cumulative row sum, S(x, −1) = 0 and II (−1, y) = 0. Using the integral image, any rectangular sum can be computed in four array references, as illustrated in Fig. 3. The value of the integral image at location 1 is the sum of the pixels in rectangle A. The value at location 2 is A+B, at location 3 is A+C, and at location 4 is A+B +C +D. The sum within D can be computed as (4+1)-(2+3) [4]. The use of integral images leads to enormous savings in computation for features at varying locations and scales.
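The recurrences (1)–(3) and the four-reference rectangle sum can be sketched as follows in Python; the array layout and function names are ours, not the paper's.

import numpy as np

def integral_image(img):
    """II(x, y): sum of all pixels above and to the left of (x, y), inclusive."""
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle using four array references,
    as in '(4 + 1) - (2 + 3)'. Border cases treat the missing row/column as zero."""
    bottom, right = top + height - 1, left + width - 1
    total = ii[bottom, right]                       # point 4
    if top > 0:
        total -= ii[top - 1, right]                 # point 2
    if left > 0:
        total -= ii[bottom, left - 1]               # point 3
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]              # point 1
    return total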
4.2 Six Segmented Rectangular Filter In [4], Viola and Jones proposed the concept of a rectangular filter based on bright–dark relations. Sawettanusorn et al. [9] applied the concept to detect the face region around the point between the eyes. This point is a good face representative because it is common to most people and easy to find over a wide range of face orientations. As for its characteristics, the point between the eyes has dark parts (eyes and eyebrows) on both sides, and comparably bright parts on the upper side (forehead) and the lower side (nose and cheekbone). Inspired by this idea, Kawato, Tetsutani and Hosaka [10] developed real-time face detection and tracking with SSR filters and a support vector machine: their system extracts face candidates with a Six-Segmented Rectangular (SSR) filter and verifies faces with a support vector machine. They also use a motion cue in selecting face candidates, together with template matching. Although we share the use of the six-segmented filter with these works, our system differs in that, after the preprocessing steps, the face candidates are selected using the Adaboost algorithm. This differs from Kawato and Tetsutani, who used the support vector machine (SVM). Here Adaboost has an advantage over SVM because the latter depends on parameter settings. It has also been shown that Adaboost feature selection provides a final hypothesis model that can be easily interpreted, whereas the high-dimensional support vectors of the SVM approach do not [11]. Sawettanusorn et al., for their part, did not use trained data. From these features we calculate a feature response, which is a difference of the sums of pixels in neighboring regions. These responses then make it possible to distinguish
Fig. 2 A feature set of six segmented rectangular filters ((a), (b), (c), (d), (e) and (f)). (g) The nose area is brighter than the right and left eye area. (h) The region containing both eyes and eyebrows is relatively darker than the cheekbone area
positive and negative examples. The overall classification decision is made from the combined weighted decisions of a group of classifiers. In training, we learn the classifiers of the group one at a time. These weak classifiers are typically single-variable decision trees called “stumps”. During training, each decision stump learns its classification decision from the data and also learns a weight for its “vote” from its accuracy on the data. Between the training of successive classifiers, the data points are re-weighted so that more attention is paid to the data points on which errors were made. This process continues until the total error over the data set, arising from the combined weighted vote of the decision stumps, falls below a set threshold.
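A compact sketch of such a boosting loop over decision stumps is given below; it implements plain discrete AdaBoost for illustration, whereas the gentle AdaBoost variant used for training weights the votes differently, and all names here are illustrative.

import numpy as np

def train_stump(responses, labels, weights):
    """Pick the threshold/polarity on one feature response that minimizes weighted error."""
    best = (None, 1, np.inf)                       # (threshold, polarity, error)
    for thr in np.unique(responses):
        for polarity in (1, -1):
            pred = np.where(polarity * (responses - thr) >= 0, 1, -1)
            err = weights[pred != labels].sum()
            if err < best[2]:
                best = (thr, polarity, err)
    return best

def adaboost(feature_matrix, labels, n_rounds=10):
    """labels in {+1, -1}; feature_matrix[i, j] = response of feature j on example i."""
    labels = np.asarray(labels)
    n, m = feature_matrix.shape
    weights = np.full(n, 1.0 / n)
    classifiers = []
    for _ in range(n_rounds):
        # choose the single best (feature, stump) pair under the current weights
        candidates = [(j, *train_stump(feature_matrix[:, j], labels, weights)) for j in range(m)]
        j, thr, pol, err = min(candidates, key=lambda c: c[3])
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)      # vote weight learned from accuracy
        pred = np.where(pol * (feature_matrix[:, j] - thr) >= 0, 1, -1)
        weights *= np.exp(-alpha * labels * pred)  # re-weight: focus on mistakes
        weights /= weights.sum()
        classifiers.append((j, thr, pol, alpha))
    return classifiers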
4.3 The Motivation The feature types used in [5] and [8] have been used to track and detect human faces in images. However, they involve a large number of features, which increases the computation time and makes the training time very long. For this reason, we opted for a new feature type to reduce the training time and improve detection. These new features can also detect diagonally oriented faces because they can be applied in an oblique position. In the next paragraphs we deal with the calculation of the different feature types and the advantage of the new feature set. Number of features: as explained in [5], the number of features obtained from each feature prototype differs from prototype to prototype. In order to calculate the number of features, suppose that we have a window of size W × H as the basic unit for testing the presence of a face and a rectangle of size w × h that slides over it, as shown in Fig. 4, with 0 < w ≤ W and 0 < h ≤ H, and let X = W/w and Y = H/h denote the maximal scaling factors. An upright feature prototype then generates
F = XY · (W + 1 − w(X + 1)/2) · (H + 1 − h(Y + 1)/2)   (4)

features, while a rotated feature generates

XY · (W + 1 − z(X + 1)/2) · (H + 1 − z(Y + 1)/2), with z = w + h   (5)
Using the above formulas we can calculate the number of features of each type. The results show that the number of features decreases significantly if we use the new feature prototype (about 1.1/10 of the former feature count). For details about the feature calculations, we refer the reader to the paper by Lienhart et al. [5].
4.4 Classification in Cascade We know how to select a small number of critical features and combine them into a strong classifier. Now we want to increase the detection performance while reducing the computation time. That objective is achieved by constructing a cascade of classifiers. The principle is to reject quickly the majority of negative windows while keeping almost all positive examples, and then to focus on the more difficult sub-windows with more complete classifiers. To do that, the first stages in the cascade contain only a few features, which achieve very high detection rates (about 100%) but have a false-positive rate of roughly 40%. Clearly this is not acceptable for a face detection task, but combining many successively more discriminating stages makes it possible to reach the goal of fast face detection. This cascade structure can be compared with a degenerate decision tree. If a sub-window is classified as positive at one stage, it proceeds down the cascade and is evaluated by the next stage. This continues until the sub-window is found negative by some stage or all the stages classify it as positive; in the latter case it is finally considered a positive example. Fig. 3 shows the procedure, where each boosted classifier group is organized into a node of a rejection cascade. In the figure, each of the nodes (F1, F2, F3, ..., FN) contains a boosted group of decision stumps trained on the Haar-like features from faces and non-faces. Typically, the nodes are ordered from least to most complex so that computation is minimized (simple nodes are tried first) when rejecting easy regions of the image. At the earliest stages, the cascade attempts to reject as many negatives as possible. Although the boosting in each node is tuned to have a very high detection rate, it still produces many false positives; this is remedied by using many nodes. In run mode, a search window of different sizes is swept over the original image. This quick and early rejection vastly speeds up face detection.
Fig. 3 Rejection cascade: A series of classifiers is applied to every subwindow. The initial classifier eliminates a large number of negative examples with very little processing. Subsequent layers eliminate additional negatives but require additional computation
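The control flow of such a rejection cascade can be summarized by the following sketch; each stage is assumed to expose a score function and a per-stage threshold, and the names are placeholders rather than the paper's code.

def cascade_classify(window, stages):
    """stages: list of (stage_fn, threshold). A sub-window is accepted only if
    every stage in turn classifies it as a face; most negatives exit early."""
    for stage_fn, threshold in stages:
        if stage_fn(window) < threshold:
            return False            # rejected by this stage: stop immediately
    return True                     # survived all stages: report a face

def scan_image(windows, stages):
    """Sweep the detector over all candidate sub-windows and keep the accepted ones."""
    return [w for w in windows if cascade_classify(w, stages)]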
5 Experiments As explained with the definition of the integral image in Section 4.1, scanning an input image is quite simple and efficient with the integral-image representation. To detect faces of different sizes and positions in an image, we apply scaled and shifted detectors all over the image. Our basic detector is 20 × 20 pixels. All the images used to train the model, faces and non-faces, are of this size, and accordingly all the selected rectangular features that we apply to the windows are defined in this 20 × 20 basic window. Training and classification were done using the gentle Adaboost algorithm. During this operation, 20 cascade stages (layers) were created. We trained on a set of 4160 images divided into two subsets: the MIT-CBCL face recognition database, which contains 2000 face images, and a background set of 2160 images taken arbitrarily. The training took just eight hours on a PC with an Intel 3.2 GHz CPU and 1 gigabyte of RAM, whereas it takes forty-three hours to train on the same set on the same PC using the traditional features presented in [4] and [5]. On the same PC, our system takes 100 ms to process a 1024 × 768 image, i.e. about two thirds of the time required by the detector in [4]. We tested our detector on a new test set from the CMU face database containing 65 images with 187 faces. On that test set, 119 of 121 (98.3%) upright faces, 38 of 39 (97.4%) profile faces and 25 of 27 (92.6%) rotated faces were detected. Overall the detector finds 97.3% of the faces, with 5% declared as false positives. Unfortunately we could not compare our system directly with others because of differing test datasets; this is why we also use the MIT+CMU dataset, to allow a comparison with the system of Viola and Jones [4], as shown in Table 1. The system also detects occluded faces as well as tilted and oblique faces. In Fig. 5, we show detection results where occluded, rotated and tilted faces are detected.
Table 1 Detection results with MIT+CMU test set containing 130 images and 507 faces

Detector          Upright face   Profile face   Rotated face   Detection rate   False alarm
Paul Viola [4]    93.7%          –              –              93.7%            5%
Our detector      98.6%          97.5%          96%            97.3%            6%

Fig. 4 Upright and rotated rectangular features inside the testing window
Fig. 5 Pictures taken in different positions, orientations and lighting conditions
6 Conclusion In this paper, we have presented a method for detecting and tracking faces in video sequences in real time, based on skin-color detection and a learning
method. This method first compensates for lighting in the image, then selects skin-tone regions to obtain the face candidates. Our basic strategy is fast training with Six-Segmented Rectangular (SSR) features and face verification by the Adaboost algorithm. We have evaluated our algorithm on various images and face databases. The images were taken in different positions and lighting conditions. Not only has our algorithm improved the detection speed, it is also effective in detecting non-frontal and non-upright faces. Acknowledgments. This research has been supported by the National Natural Science Foundation of China under grant number 60634020.
References
1. Hsu, R.L., Abdel-Mottaleb, M., Jain, A.K.: Face Detection in Color Images. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(5), 696–706 (2002)
2. Li, S.Z., Jain, A.K.: Handbook of Face Recognition, pp. 371–390. Springer Science+Business Media, Inc., Heidelberg (2005)
3. Yang, M.H., Kriegman, D.J., Ahuja, N.: Detecting Faces in Images: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(1), 34–58 (2002)
4. Viola, P., Jones, M.J.: Robust Real-time Face Detection. International Journal of Computer Vision, 137–154 (2004)
5. Lienhart, R., Kuranov, A., Pisarevsky, V.: Empirical Analysis of Detection Cascades of Boosted Classifiers for Rapid Object Detection. MRL Technical Report (2002)
6. Cai, J., Goshtasby, A., Yu, C.: Detecting Human Faces in Color Images. In: International Workshop on Multimedia Database Management Systems, pp. 124–131 (1998)
7. Phung, S.L., Bouzerdoum, A., Chai, D.: Skin Segmentation Using Color Pixel Classification: Analysis and Comparison. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(1) (2005)
8. Viola, P., Jones, M.J.: Rapid Object Detection using a Boosted Cascade of Simple Features. In: IEEE CVPR (2001)
9. Sawettanusorn, O., Senda, Y., Kawato, S., Tetsutani, N., Yamauchi, H.: Real-Time Face Detection Using Six-Segmented Rectangular Filter. In: International Symposium on Intelligent Signal Processing and Communication Systems, Japan (2003)
10. Kawato, S., Tetsutani, N., Hosaka, K.: Scale-Adaptive Face Detection and Tracking in Real Time with SSR Filters and Support Vector Machine. The Institute of Electronics, Information and Communication Engineers, pp. 2857–2863 (2005)
11. Silapachote, P., Karuppiah, D.R., Hanson, A.R.: Feature Selection Using Adaboost for Face Expression Recognition. University of Massachusetts Amherst (2005)
Level Detection of Raisins Based on Image Analysis and Neural Network Xiaoling Li , Jimin Yuan, Tianxiang Gu, and Xiaoying Liu*
Abstract. Utilizing image processing technology, the researchers calculated the lengths of the long and short axes of each raisin, marked their locations and computed 7 parameters (chroma, length, width, etc.), 4 of which were chosen as the key input characteristics of a neural network that identifies the grade of raisins through analysis of their external characteristics. The method builds on traditional characteristic detection, using a boundary-tracking algorithm and a new long/short-axis length detection algorithm. The experimental results indicate that the calculation method and the grading of raisins are precise and accurate, with an average recognition rate of 92%. Therefore, the method has great practical value and can be applied to the classification of other agricultural products. Keywords: Image manipulation, Characteristic parameter, Neural network, Raisin, Level detection.
1 Introduction Image-based detection technology for identifying agricultural products has been extensively researched and applied. Rehkugler and He detected defects of apples by utilizing gray-scale value detection and color classification [1,2]; Miller used color and near-infrared images to analyze the bruised area of peaches and classify them [3]; Shearer developed new ways of color grading bell peppers from the perspective of machine vision, with a correct rate of up to 96% [4,5]; computer vision technology has been applied to test the quality and surface bruising of mangoes [6]; and image processing and neural network theory have been used to grade elongated fruit such as cucumbers, with an accuracy rate of over 96% [7]. Xiaoling Li Information College of Panzhihua University, Panzhihua 617000, SiChuan, China
Jimin Yuan . Tianxiang Gu Automatic College of UESTC, Chengdu 610054, SiChuan, China Xiaoying Liu Computer College of PanZhiHua University, PanZhiHua 617000, SiChuan, China
At present, raisins are graded either by photoelectric sorting or by manual separation. Manual separation relies mainly on human observation to determine the grade. Relying solely on color characteristics and the naked eye lacks objectivity because of the variation in raisin size and the complex surface conditions. Photoelectric separation is based primarily on color characteristics and uses surface testing to determine the grade; such a system is made up of material-delivery equipment, light boxes, electronically controlled lines and a pneumatic system to distinguish the colors of raisins [8]. Its control circuits are complicated and it requires highly trained users, which restricts its application. In this paper, image processing and analysis technology are combined with artificial neural networks to identify and grade raisins. The key of this approach lies in the image manipulation algorithms and characteristic detection, linked to artificial neural networks through the extraction of valid characteristic parameters. The experiments take Turpan seedless green raisins as samples [9].
2 Image Pre-manipulation
A COOLPIX4500 digital camera was used to take the photos, adjusting brightness and position. The calibration coefficients in X and Y are X_SCALE = 0.2135675 and Y_SCALE = 0.214416667. The calibration coefficient of the diagonal, XY_SCALE, can be worked out by formula 1.
XY_SCALE = √(X_SCALE² + Y_SCALE²)   (1)
The photos of raisins are 640 × 480 pixels, 24-bit true-color BMP, as shown in Figure 1. During image acquisition, the image quality declines because of uneven lighting, the transmission line and other factors. Therefore, the images need pre-treatment before the characteristic data can be extracted. The image pre-treatment process is shown in Figure 2:
Fig. 1 Raisin image (The left: First-grade raisin image; The right: Second-grade raisin image)
Fig. 2 Process of image pretreatment
When segmenting the background of the color images, the researchers first thresholded the gray image and then compared the original color image with the thresholded image. If a pixel of the thresholded image is 255 (i.e. the background is white), the corresponding pixel of the original image is set to 255 [10]. In this way, the non-background part of the original image is retained while the background becomes white. The color image of raisins can be represented by its R (red), G (green) and B (blue) channels. The thresholding is performed according to formula 2:
R(i, j) = R0(i, j) if R0(i, j) ≤ Tr, otherwise 255
G(i, j) = G0(i, j) if G0(i, j) ≤ Tg, otherwise 255
B(i, j) = B0(i, j) if B0(i, j) ≤ Tb, otherwise 255   (2)
R0(i, j), G0(i, j), B0(i, j) and R(i, j), G(i, j), B(i, j) are the gray values of the three-channel pixels before and after background segmentation; Tr, Tg and Tb are the background-segmentation thresholds for the three channels R, G and B. The result of the segmentation is shown in Figure 3. Comparing the three pictures, the researchers found that with segmentation in the B channel the raisin information remained intact, whereas with segmentation in the R and G channels part of the information was lost. Therefore, because the B-channel histogram has the largest distance
Fig. 3 Results of background segmentation in three channels(The left: Segmentation in R channel; The middle: Segmentation in G channel; The right: Segmentation in B channel)
between its two peaks, which makes the correct choice of threshold easier, background segmentation based on the B-channel histogram gives a good result.
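A minimal sketch of the per-channel thresholding of formula (2), assuming the image is held in a NumPy array and the three thresholds have already been chosen from the histograms; the names are illustrative, not the authors' code.

import numpy as np

def segment_background(rgb, tr, tg, tb):
    """Set to 255 (white background) every channel value that exceeds its
    threshold, as in formula (2); the raisin pixels keep their original values."""
    out = rgb.copy()
    for c, threshold in enumerate((tr, tg, tb)):
        channel = out[:, :, c]
        channel[channel > threshold] = 255
    return out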
3 Selection and Extraction of Characteristic Parameters In order to identify and grade raisins, the characteristics of a single raisin need to be extracted. The following experiment focuses on methods for extracting color, shape and other characteristics. The transformation from RGB to HSI is given by formulas 3–6:
I = (R + G + B)/3   (3)

S = 1 − 3·min(R, G, B)/(R + G + B)   (4)

H = arccos{ [(R − G) + (R − B)]/2 / [(R − G)² + (R − B)(G − B)]^(1/2) }   (if B ≤ G)   (5)

H = 360° − arccos{ [(R − G) + (R − B)]/2 / [(R − G)² + (R − B)(G − B)]^(1/2) }   (if B > G)   (6)
Moreover, the means of H, S and I are calculated using formulas 7–9:
H̄ = (1/n) Σ_{i=1}^{n} Hi   (7)

S̄ = (1/n) Σ_{i=1}^{n} Si   (8)

Ī = (1/n) Σ_{i=1}^{n} Ii   (9)
In the above formulas, Hi, Si and Ii are the hue, saturation and brightness of the i-th raisin pixel; H̄, S̄ and Ī are the means of hue, saturation and brightness; n is the number of raisin pixels.
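Formulas (3)–(9) translate directly into code; this sketch assumes R, G and B are floating-point arrays scaled to [0, 1], and all names are ours.

import numpy as np

def rgb_to_hsi(r, g, b):
    """Convert RGB (floats in [0, 1]) to HSI following formulas (3)-(6)."""
    eps = 1e-12
    i = (r + g + b) / 3.0                                                  # formula (3)
    s = 1.0 - 3.0 * np.minimum(np.minimum(r, g), b) / (r + g + b + eps)    # formula (4)
    num = ((r - g) + (r - b)) / 2.0
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    h = np.where(b <= g, theta, 360.0 - theta)                             # formulas (5)-(6)
    return h, s, i

def mean_hsi(h, s, i, mask):
    """Mean H, S, I over the raisin pixels selected by a boolean mask (formulas (7)-(9))."""
    return h[mask].mean(), s[mask].mean(), i[mask].mean()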
According to Green's theorem, the area feature is given by formula 10:
A = (1/2) ∮ (x dy − y dx)   (10)
The integral is taken along the closed boundary curve. After discretization, the researchers obtain formula 11 for calculating the area:
Table 1 Average of raisin features

Grade    H        S       I       P         A         L        M
First    44.7560  0.4811  0.3355  133.7500  140.2401  19.9255  9.6232
Second   36.1069  0.4506  0.3576  124.8903  102.6010  17.3508  7.9682
Third    32.1093  0.5245  0.4028  93.0393   67.4188   14.1359  6.3648
A = (1/2) Σ_{i=1}^{Nb} (xi·yi+1 − xi+1·yi)   (11)

In the above formula, Nb is the number of boundary points.
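Formula (11) is the discrete (shoelace) form of Green's theorem over the points returned by the boundary-tracking step; a sketch, assuming the points are given in order around a closed boundary (names are ours).

def boundary_area(xs, ys):
    """Discrete Green's-theorem area, formula (11): A = 1/2 * sum(x_i*y_{i+1} - x_{i+1}*y_i).
    xs, ys are the ordered boundary point coordinates; the boundary is treated as closed."""
    nb = len(xs)
    total = 0.0
    for i in range(nb):
        j = (i + 1) % nb                      # wrap around to close the curve
        total += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(total) / 2.0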
Raisin samples of the three grades (40 in each grade) were measured and the mean values of the 7 characteristics were obtained; the parameters hue, saturation, brightness, perimeter, area, length and width were identified as the recognition features, as shown in Table 1.
4 Raisin Grade-Level Checking with a BP Network The relation between the form features and the grade is comparatively complex and hard to express directly, so the researchers use a BP (back-propagation) neural network to establish the relationship between form features and grade, which helps distinguish the different grades. The BP network has 7 inputs, 3 output nodes and one hidden layer. The 7 inputs correspond to the feature parameters, while the 3 outputs correspond to the 3 grades of raisin. The initial learning rate is 0.01 and an adaptive learning rate is adopted. The target error is 0.001. The grading scheme of the neural network is shown in Figure 4:
Fig. 4 Grading sketch map of neural network
The 180 samples consist of 60 raisins of each of the 3 grades. The recognition samples are 120 raisins, 40 of each grade. In order to bring them into the same range, the data are normalized. PT is the input matrix, which contains the 7 feature parameters and 5 samples of the different grades.
T is the target matrix; each set of parameters corresponds to an output vector in which each element stands for a grade, and the element equal to 1 indicates the target grade.
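A sketch of this data preparation and of a plain BP network with one hidden layer is given below; the min–max normalization, the one-hot target matrix and the layer sizes follow the description above, while the implementation details (NumPy, sigmoid units, a fixed learning rate) are our assumptions rather than the authors' implementation.

import numpy as np

def normalize(features):
    """Min-max scale each column to [0, 1] so all parameters share the same range."""
    lo, hi = features.min(axis=0), features.max(axis=0)
    return (features - lo) / (hi - lo + 1e-12)

def one_hot(grades, n_classes=3):
    """Target matrix T: the element equal to 1 marks the raisin's grade (grades are 1..3)."""
    t = np.zeros((len(grades), n_classes))
    t[np.arange(len(grades)), np.array(grades) - 1] = 1.0
    return t

def train_bp(x, t, hidden=17, lr=0.01, epochs=1000):
    """Plain gradient-descent BP on an input-hidden-output sigmoid network."""
    rng = np.random.default_rng(0)
    w1 = rng.normal(0, 0.5, (x.shape[1], hidden)); b1 = np.zeros(hidden)
    w2 = rng.normal(0, 0.5, (hidden, t.shape[1])); b2 = np.zeros(t.shape[1])
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        h = sig(x @ w1 + b1)
        y = sig(h @ w2 + b2)
        dy = (y - t) * y * (1 - y)               # output delta (squared error + sigmoid)
        dh = (dy @ w2.T) * h * (1 - h)           # hidden delta
        w2 -= lr * h.T @ dy; b2 -= lr * dy.sum(axis=0)
        w1 -= lr * x.T @ dh; b1 -= lr * dh.sum(axis=0)
    return w1, b1, w2, b2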
5 Network Training and Recognition Results The experiment was first carried out with 7 inputs, 3 outputs and a hidden layer of 17 neurons, showing that the training speed is fast and the recognition rate is high. The results are shown in Table 2. Table 2 shows that chroma, area, length and width are relatively important parameters, and that these 4 features are more efficient than the full set of 7. Therefore, the final network structure adopts 4 inputs, 3 outputs and a hidden layer with 17 neurons.
Table 2 Recognition rate of different parameters
Parameter number   Epoch of learning   Recognition rate of sample (%)
7                  976                 91
6                  973                 91
5                  1181                91
5                  9302                86
4                  1961                90
4                  1183                88
4                  930                 88
4                  907                 92
6 Conclusion The separation of raisins is based on image manipulation technology, using a boundary-tracking algorithm. The researchers put forward a new calculation method, which measures the lengths of the long and short axes, marks their locations and calculates 7 parameters (chroma, length, width, etc.), 4 of which are chosen as the key characteristics for the input of the BP network used to identify the grade of raisins. The experimental results indicate that the calculation method and the grading of raisins are precise and accurate, with an average recognition rate of 92%.
References
1. Rehkugler, G.E., Throop, J.A.: Apple Sorting with Machine Vision. Transaction of the ASAE 29, 1388–1395 (1985)
2. He, D.J., Yang, Q.X., Shao, P.: Computer Vision for Color Sorting of Fresh Fruits. Transaction of the Chinese Society of Agricultural Engineering 14, 202–205 (1998) (in Chinese)
3. Miller, B.K., Delwiche, M.J.: A Color Vision System for Peach Grading. Trans. of the ASAE 32, 1484–1490 (1989)
4. Li, P., Zhu, J.Y., Liu, Y.D.: Application and Developing Trend of Computer Version Technology in Detection and Classification of Agricultural Products. Acta Agriculturae Universitis Jiangxiensis 27, 796–800 (2005) (in Chinese)
5. Shearer, S.A., Payne, F.A.: Color and Defect Sorting of Bell Peppers Using Machine Vision. Trans. of the ASAE 33, 2045–2050 (1990)
6. Wang, J.F., Luo, X.W., Hong, T.S.: Application of Computer Vision Technology in Detecting Mango Weight and Surface Bruise. Trans. of the Chinese Society of Agricultural Engineering 14, 186–189 (1998) (in Chinese)
7. Zhang, R.Y., Liu, S.S.: Researches and Applications of Computer Vision Technique in Fruit and Vegetables Commercialization After Harvesting. Journal of Yuzhou University 21, 497–501 (2004) (in Chinese)
8. Xie, F.Y.: On Photoelectric Technology of Colour Sorting for Raisin. Journal of Hunan Agricultural University 30, 71–73 (2004)
9. Liu, X.Y.: Study on Raisin Grading Technology Based on Image Analysis. Northwest A & F University, Yangling (2006) (in Chinese)
10. Ying, Y.B.: Study on Background Segment and Edge Detection of Fruit Image Using Machine Vision. Journal of Zhejiang Agricultural University 26, 35–38 (2000) (in Chinese)
English Letters Recognition Based on Bayesian Regularization Neural Network Xiaoli Huang and Huanglin Zeng*
Abstract. In this paper, we study the incorporation of Bayesian Regularization into constructive neural networks. The degree of regularization is automatically controlled within the Bayesian inference framework and hence does not require manual setting. Simulation shows that regularization with input training using a full Bayesian approach produces networks with better recognition performance and lower susceptibility to noise as the noise level increases. Regularization with input training under the gradient descent algorithm, however, does not produce significant improvement on the problems tested. Keywords: Bayesian regularization, Neural network, The gradient descent algorithm, English letters recognition.
1 Introduction Multi-layer feed-forward networks have been popularly used in many pattern classification and regression problems. The BP algorithm is one of the most fully developed algorithms. According to the Kolmogorov theorem, this type of network, consisting of input, hidden and output layers, can approximate any non-linear function with arbitrary precision. This conclusion holds strictly only for networks of unbounded size. Generally speaking, we do not know the exact capacity of a finite network to solve a given problem. Because different problems require different network capacity, it is important to select an appropriate size for the neural network when it is designed to solve a specific problem. If the network is too small, it cannot model the problem well. At the same time, over-fitting, or poor generalization capability, occurs when a neural network over-learns during the training period. As a result, such an over-trained model may not perform well on unseen data because of its lack of generalization capability. The first remedy is an early-stopping mechanism, in which the training process is concluded as soon as the overtraining signal appears. The signal can be observed when the Xiaoli Huang . Huanglin Zeng School of Automation and Electronic Information, Sichuan University of Science & Engineering, Zigong 643000, China
prediction accuracy of the trained network applied to a test set starts to worsen at that stage of the training. The second approach is Bayesian Regularization for regularization and model comparison, as described in the companion paper ‘Bayesian interpolation’ [1]. This approach reduces the over-fitting problem by taking into account the goodness-of-fit as well as the network architecture. Our work is based on the same probabilistic framework and extends it using concepts and techniques adapted from Gull and Skilling's Bayesian image reconstruction methods [2]. The Bayesian Regularization approach is considered in this study and demonstrated on letters of the alphabet to which noise has been added.
2 Overview of the BP Neural Network The BP ANN is a kind of multilayer feed-forward neural network based on the error back-propagation algorithm. The BP model consists of one input layer, one or more hidden layers and one output layer; every layer consists of neurons, neurons of different layers are connected by connection weights and thresholds, and there are no connections among neurons of the same layer. The sketch of the standard BP model is shown in Fig. 1. Error back-propagation learning consists of two passes through the different layers of the network: a forward pass and a backward pass. In the forward pass, an activity pattern is applied to the input neurons of the network and its effects propagate through the network, layer by layer, until an output is produced by the network. Weights between neurons of successive layers are initially assigned at random. In the backward pass, the error between the network output and the desired response is computed and used to amend the weights. The standard BP learning algorithm [3] modifies the network connection weights and thresholds to make the error function descend along the negative gradient direction. Let Y = [Y1, Y2, …, Ym] be the output of a network with i hidden units directly connected to the output units, O = [O1, O2, …, Om] be the expected output, and w be the set of values of the connections in the network; the error sum-of-squares function is
ED(w) = (1/2) Σ_{k=1}^{m} (Yk − Ok)²   (1)

Fig. 1 Structure of a three-layer BP neural network
Let ED(w) be the corresponding residual error. The task of ‘learning’ is to find a set of weights w which fits the training set well, i.e. has small error ED.
3 Bayesian Regularization It is also hoped that the learned connections will ‘generalize’ well to new examples. We now consider modifications to the standard back-propagation algorithm which implicitly or explicitly modify the objective function with decay terms or regularizers. Some of the ‘hints’ in [4] also fall into the category of additive weight-dependent energies. A sample weight-energy term is:
EW = (1/(2N)) Σ_i wi²   (2)
where N is the number of samples. Gradient-based optimization is then used to minimize the combined function:
H = α EW + β ED   (3)
where α is a decay rate or regularizing constant which controls the distribution of the other parameters (the weights and thresholds). The parameters α and β = 1 − α are optimized within the Bayesian framework of MacKay [7], [5]. If α << β, the training algorithm is driven to minimize the error; if α >> β, the training algorithm favors a smoother response, reducing the number of effective network parameters at the cost of a larger network error. Connections between probabilistic inference and neural networks have been discussed in [6]. Let us now review the probabilistic interpretation of network learning. A network with connections w is viewed as making predictions about the target outputs as a function of the input I in accordance with the probability distribution:
P(I, w, β) = exp(−β E(I, w)) / Si(β)   (4)

where Si(β) = ∫ exp(−β E) dI, E is the error for a single datum, and β is a
measure of the presumed noise included in I. If E is the quadratic error function then this corresponds to the assumption that I includes additive Gaussian noise with variance δ² = 1/β. A prior probability is assigned to alternative network connection strengths w, written in the form:
P(w, α) = exp(−α Ew(w)) / Sw(α)   (5)

where Sw(α) = ∫ exp(−α Ew) d^k w, and α is a measure of the characteristic expected connection magnitude. If Ew is quadratic as specified in (2) then the weights are expected to come from a Gaussian with zero mean and variance δw² = 1/α. The posterior probability of the network connections w is:

P(w, α, β) = exp(−(α Ew + β ED)) / S(α, β)   (6)
where S(α, β) = ∫ exp(−(α Ew + β ED)) d^k w. Under this framework, minimization of A = α Ew + β ED is identical to finding the (locally) maximum a posteriori parameters wMP; minimization of ED by back-propagation is identical to finding the maximum likelihood parameters wML. Thus an interpretation has been given to back-propagation's energy functions ED and Ew, and to the parameters α and β. By employing this new performance function, Bayesian Regularization keeps the effective weights as small as possible while ensuring that the network error remains as small as possible; this is equivalent to automatically reducing the size of the network. Bayesian methods adaptively adjust the parameters to their best values. In the MATLAB toolbox, Bayesian Regularization is implemented by the trainbr function.
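The combined objective of equation (3) can be sketched for a single linear layer as follows; fixed values of α and β are assumed here, whereas trainbr re-estimates them during training, and all names are illustrative.

import numpy as np

def train_regularized(x, t, alpha=0.01, beta=0.99, lr=0.05, epochs=500):
    """Minimize H = alpha*Ew + beta*Ed for a linear layer y = x @ w,
    with Ew = 0.5*sum(w^2) and Ed = 0.5*sum((y - t)^2). A small alpha/beta
    ratio emphasizes the data error; a large ratio smooths the weights."""
    w = np.zeros((x.shape[1], t.shape[1]))
    for _ in range(epochs):
        y = x @ w
        grad_ed = x.T @ (y - t)            # gradient of the data error Ed
        grad = beta * grad_ed + alpha * w  # alpha*w is the gradient of Ew
        w -= lr * grad
    return w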
4 Results and Discussion The neural network has 5 × 7 input neurons and 26 output neurons, because each English letter is represented by a 5 × 7 matrix and there are 26 English letters. Experience has shown that 15 neurons is the best choice for the hidden layer. In order to give the network some noise resistance, the ideal noise-free letters are first used to train the network, then two sets of letters with random noise, and finally the ideal noise-free letters are used again. The output of the network is passed through a competition layer to ensure that the target output value is 1 and the other outputs are 0. The network goal error is set to 0.0001 and the maximum number of training epochs to 1000. First, the Bayesian Regularization neural network is trained; the parameters after convergence are shown in Fig. 2. Then the gradient descent algorithm with momentum and self-adjusting learning rate is used to train
Fig. 2 The parameters after convergence by Bayesian regularization: (a) no noise; (b) with noise
the network at the same scale and under the same error conditions; the parameters after convergence are shown in Fig. 3. From Fig. 2 and Fig. 3, the two methods can be compared via the training results summarized in Table 1. From Table 1 we can see that the Bayesian Regularization method speeds up network convergence while improving the accuracy of the network compared with the gradient descent algorithm. Table 1 The results of training parameters of the two methods
                                        Sum squared error              Convergence number in training
                                        No noise      With noise       No noise      With noise
The Bayesian Regularization algorithm   25.060500     0.0000865078     135           40
The gradient descent algorithm          0.0961202     0.5898880000     217           87
Fig. 3 The parameters after convergence by the gradient descent algorithm: (a) no noise; (b) with noise
Finally, performance tests are carried out on the networks trained by both algorithms. Letters with random noise of amplitude 0.01 to 1.00 are used to generate 1000 input vectors, which are fed to each trained network to perform the recognition experiments. The relationship between the recognition error rate and the noise level is shown in Fig. 4 (a) and (b). From Fig. 4 we can see that: (1) the Bayesian Regularization algorithm's recognition error rate is 0% (the recognition rate reaches 100%) when the noise is less than 0.2, and the error rate rises slowly to 6% when the noise is between 0.2 and 1.0; no matter how much the noise increases, the largest error rate is 6%. (2) The gradient descent algorithm's recognition error rate is 0% (the recognition rate reaches 100%) when the noise is less than 0.15, rises slowly when the noise is between 0.15 and 0.4, and increases dramatically once the noise exceeds 0.4; the largest error rate is 20%. From the above, we can draw the conclusion that, at the same network scale and error conditions, the generalization ability of the Bayesian Regularization algorithm is superior to that of the gradient descent algorithm because it has faster convergence, better recognition and better fault tolerance.
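The noise test can be reproduced schematically as below; predict_letter stands in for either trained network, and the noise model (uniform noise of a given amplitude added to the 5 × 7 bitmap) is our reading of the description, with illustrative names.

import numpy as np

def add_noise(bitmap, level, rng):
    """Perturb a 5x7 letter bitmap (values in [0, 1]) with uniform noise of the given amplitude."""
    noisy = bitmap + level * rng.uniform(-1.0, 1.0, size=bitmap.shape)
    return np.clip(noisy, 0.0, 1.0)

def error_rate(predict_letter, letters, labels, level, n_trials=1000, seed=0):
    """Fraction of noisy presentations that the trained network classifies incorrectly."""
    rng = np.random.default_rng(seed)
    errors = 0
    for _ in range(n_trials):
        k = rng.integers(len(letters))
        noisy = add_noise(letters[k], level, rng)
        if predict_letter(noisy) != labels[k]:
            errors += 1
    return errors / n_trials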
Fig. 4 The relationship between the recognition error rate and the noise level: (a) by the Bayesian Regularization algorithm; (b) by the gradient descent algorithm
References
1. MacKay, D.J.C.: Bayesian Interpolation. Neural Computation (1991)
2. Skilling, J.: Maximum Entropy and Bayesian Methods, pp. 45–52. Kluwer Academic Publishers, Dordrecht (1989)
3. Wang, Z.W., ZhangLiu, R.P.: One Improved BP Neural Network Learning Algorithm. Mathematical Theory and Applications 25, 31–34 (2005)
4. Abu-Mostafa, Y.S.: Learning from Hints in Neural Networks. J. Complexity 6, 192–198 (1990)
5. MacKay, D.J.C.: Bayesian Interpolation. Neural Computation 4(3), 415–447 (1992)
6. Tishby, N., Levin, E., Solla, S.A.: Consistent Inference of Probabilities in Layered Networks: Predictions and Generalization. In: Proc. IJCNN, Washington (1989)
7. MacKay, D.J.C.: A Practical Bayesian Framework for Back Propagation Networks. Neural Computation 4(3), 448–472 (1992)
Iris Disease Classifying Using Neuro-Fuzzy Medical Diagnosis Machine Sara Moein, Mohamad Hossein Saraee, and Mahsa Moein*
Abstract. Disease diagnosis is an essential task in the medical world. The use of computers in the practice of medicine is becoming more and more crucial. In this paper, we propose an intelligent system to help diagnose the Iris disease. This system is based on an Artificial Neural Network (ANN) approach. In order to evaluate the proposed approach, we apply the system to a dataset which includes all related symptoms. A multilayer perceptron ANN is then trained to perform the classification. In order to obtain the best results we use different measure values. Finally, data fuzzification is employed to improve the system performance. Keywords: Artificial Neural Network, Medical Diagnosis, Fuzzification.
1 Introduction Medical diagnosis is an important problem for humans. Medical diagnosis deals with the human body and life. The amount of information a physician can memorize and recall is limited. In addition, there is an enormous number of different diseases, each with distinct features. Medical diagnosis is directly related to the experience and intelligence of a physician. However, even the most experienced physician, when facing a new disease, is in much the same position as a newly graduated one. Therefore, many medical experts and scientists are interested in computerized tools to diagnose disease in medical science [13], [14], [16]. Fuzzy logic, genetic algorithms, artificial neural networks and hybrid neuro-fuzzy systems have been applied to some medical diagnosis problems. Sara Moein Dept. of Computer Engineering, Islamic Azad University of Najafabad branch, Isfahan, Iran
Mohamad Hossein Saraee Dept. of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, Iran
[email protected] Mahsa Moein Dept. of Computer Engineering, Islamic Azad University of Najafabad branch, Isfahan, Iran
1.1 Intelligent Medical Diagnosis Intelligent medical diagnosis machines perform the same process that a physician performs for medical diagnosis. First, an interview with the patient is arranged to obtain appropriate knowledge after careful examination. Additional laboratory tests may also be required. After all this, the physician can diagnose the disease. Medical diagnosis is not a 100% reliable process: either a machine or a physician can make mistakes, and it is safe to assume that not all physicians make the same diagnostic decision. Likewise, not all intelligent machines have the same performance or the same reliability. The more intelligent and experienced the machine, the more reliable its medical decision making [3], [6], [9]. Artificial neural networks are helpful and powerful, since not only are they capable of recognizing patterns with the aid of an expert, they can also detect hidden information and striking features that cannot be seen directly in the data or even in video pictures. In fact, artificial neural networks show that expert experience alone is not enough in medical diagnosis, and nowadays physicians combine their experience with the opportunities offered by neural networks. Scientists have used ANNs for various diseases such as hepatitis and breast cancer and have achieved good results. The present paper is an effort to use an ANN for Iris medical diagnosis [10], [11]. Meanwhile, there are several points to consider when an intelligent machine is implemented:
• Comprehensibility: the symptoms should be comprehensible to the intelligent machine.
• Reliability: how reliable is the result of the machine's processing? Is it trustworthy?
• Performance: does the intelligent machine perform the medical diagnosis task with high quality?
1.2 Existing Systems WISER (Wireless Information System for Emergency Responders) is a system which has been designed to assist first responders in hazardous-material incidents. Developed by the National Library of Medicine, WISER provides a wide range of information on hazardous substances, including substance identification support, physical characteristics, human health information, and containment and suppression guidance [19], [20]. The following are some of the features of the WISER system:
• Mobile support, providing first responders with critical information in the palm of their hand.
• Comprehensive decision support, including assistance in identification of an unknown substance and, once the substance is identified, guidance on immediate actions necessary to save lives and protect the environment.
• Rapid access to the most important information about a hazardous substance through an intelligent synopsis engine and display called "Key Info".
• Radiological support, including radioisotope substance data, tools, and reference materials.
• Intuitive, simple, and logical user interface developed by working with experienced first responders.
2 Proposed Method The implementation of an artificial medical diagnosis machine involves a few steps. The first is to arrange an interview with a physician to collect standard data about the Iris disease. Iris diagnosis uses the structure, color, and shape of the iris and pupil of the eye to determine an individual's illness [1], [3], [5]. An illustration of the relevant symptoms and reference values for Iris diagnosis is given below, while Table 1 shows the maximum and minimum ranges of the symptom values. Attribute information:
1. Sepal length in cm
2. Sepal width in cm
3. Petal length in cm
4. Petal width in cm
5. Class: Iris Setosa, Iris Versicolour, Iris Virginica
In the next step, the selected features are measured for many visitors of a particular clinic over a period of 3 months.
2.1 Iris Dataset It is vital to have acceptable and valid data in the Iris dataset. We have obtained the dataset with the cooperation of an Iris clinic. The dataset consists of the four major symptoms used to diagnose the Iris disease for 150 patient records, which are to be classified into 3 classes: Iris Setosa, Iris Versicolour and Iris Virginica, where each class refers to a type of iris plant. Table 2 shows a part of the dataset used. As mentioned before, the complete dataset contains the measured features of 150 patients.
2.2 ANN Training Schema The ANN architecture we opted for is a three-layer perceptron, to keep the structure simple. We applied an ordinary back-propagation (GDR) training algorithm to it. The main point is to vary the number of training cycles and the number of hidden-layer nodes in order to examine which structure classifies most accurately. The effect of fuzzification of the symptom values is also investigated [18], [19]. Table 3 is the basis for assigning a label to each class. Fig. 2 shows the error
Table 1 Summary statistics for attributes (symptom values)

Attribute       Min   Max   Mean   SD     Class correlation
Sepal length    4.3   7.9   5.84   0.83   0.7826
Sepal width     2.0   4.4   3.05   0.43   -0.4194
Petal length    1.0   6.9   3.76   1.76   0.9490
Petal width     0.1   2.5   1.20   0.76   0.9565
Table 2 Part of provided dataset of Iris
Table 3 List of classes and assigned labels

Assigned label   Class name
1                Iris Setosa
2                Iris Versicolour
3                Iris Virginica
rate over a number of training cycles. The MATLAB package and the NETLAB toolbox are the simulation software that we used for ANN training. Fig. 1 is a snippet of the MATLAB code developed to train the ANN.
Fig. 1 Snippet of the MATLAB code developed to train the ANN
Cycle 1 Error 35.740504 Scale 1.000000e+000
Cycle 2 Error 20.335462 Scale 5.000000e-001
Cycle 3 Error 20.335462 Scale 2.500000e-001
Cycle 4 Error 20.335462 Scale 1.000000e+000
Cycle 5 Error 20.335462 Scale 4.000000e+000
Cycle 6 Error 20.335462 Scale 1.600000e+001
Cycle 7 Error 20.335462 Scale 6.400000e+001
Cycle 8 Error 20.335462 Scale 2.560000e+002
Cycle 9 Error 15.305778 Scale 1.024000e+003
Cycle 10 Error 12.937717 Scale 5.120000e+002
Cycle 11 Error 8.209793 Scale 2.560000e+002
Cycle 12 Error 7.977332 Scale 1.280000e+002
Cycle 13 Error 5.693968 Scale 6.400000e+001
Cycle 14 Error 5.470339 Scale 3.200000e+001
Cycle 15 Error 5.186173 Scale 1.600000e+001
Cycle 16 Error 5.081094 Scale 8.000000e+000
Fig. 2 Error rate over training cycles after running the MATLAB code
3 Experiment and Discussion K-folding is a well-established technique and is used in the proposed system. Based on the number of records, a k-folding scheme with k = 7 was applied: each time, 80% of the samples serve as the training dataset and the remaining 20% as the testing dataset, and the training procedure is repeated 7 times. Fig. 3
Fig. 3 Training cycle = 150, hidden layer node = 30, Above the output error, below the assigned label/ disease, Performance: 80%
Fig. 4 (a) The fuzzy toolbox
Fig. 4 (b, c) Fuzzy membership functions for two of the Iris attributes
illustrates the test results based on the non-fuzzy continuous data; the resulting performance is 80%. In the next stage, the fuzzified data are used for training: a fuzzy membership function is applied to all symptoms. Fig. 4 illustrates the use of the fuzzy toolbox and the fuzzy membership function for one of the symptoms. It is remarkable that the
Fig. 5 Training cycle = 500, hidden layer node = 20, Fuzzified data, Performance = 100%
Fig. 6 Training cycle = 1000, hidden layer node = 10, Fuzzified data, performance = 100%
performance rises to 100% after data fuzzification, and the error decreases to less than 0.3, as shown in Fig. 5 and Fig. 6. A trade-off between the number of training cycles and the number of hidden-layer nodes can also be noted. In each figure, the upper part shows the error rate and the lower part shows the difference between the training result and the real value [19].
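The k-folding scheme described at the beginning of this section can be sketched as follows; fit and score stand in for the network training and evaluation routines, and all names are assumptions rather than the authors' code.

import numpy as np

def k_fold_indices(n_samples, k=7, seed=0):
    """Split sample indices into k folds; each fold serves once as the test set
    while the remaining folds form the training set."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(idx, k)

def cross_validate(fit, score, features, targets, k=7):
    """Run the train/test procedure k times and return the per-fold scores."""
    folds = k_fold_indices(len(features), k)
    results = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(features[train], targets[train])
        results.append(score(model, features[test], targets[test]))
    return results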
4 Conclusion In this paper, we presented the power of the Artificial Neural Network (ANN) in artificial medical diagnosis. Our approach showed improved performance. Most of
the research work in this area has focused on the diagnosis of diseases like hepatitis and breast cancer. Iris is a new disease that has been classified using the ANN approach. In addition, we show that the performance improves enormously after fuzzification, although there are various parameters that affect the validity of the results. For future work, advanced ANNs can be trained with fuzzy or non-fuzzy data to evaluate the performance on unknown problems. Focusing on the process of data collection based on interviews with patients is also worthy of consideration toward an intelligent medical diagnosis system.
References
1. Laura, D., Camacho, A.C., Badr, A., Armando, D.G.: Images Compression for Medical Diagnosis using Neural Networks. Universidad Nacional de La Plata (1996)
2. Michie, D., Spiegelhalter, F.D., Taylor, S.: Machine Learning, Neural and Statistical Classification. Ellis Horwood (1994)
3. Salim, J.: Medical Diagnosis Using Neural Networks. International Journal of Health and Medical Computation (2004)
4. Pomi, A., Olivera, H.: BMC Medical Informatics and Decision Making: Context-sensitive auto associative memories as expert systems in medical diagnosis. BioMed Central (2006)
5. Sasikala, K.R., Petrou, J., Kittler, M.: Fuzzy Classification with a GIS as an aid to decision making. Department of Electronic and Electrical Engineering (2006)
6. Steimanm, F., Adlassnig, K.P.: Fuzzy Medical Diagnosis. Institut fur Rechnergestutzte Wissensverarbeitung, Universitat Hannover (2004)
7. Siganos, D.: Neural Networks in Medicine: the Public health care case study. Port-Royal Paris (1995)
8. Sordo, M.: Introduction to Neural Networks in Healthcare. Open Clinical: Knowledge Management for Medical Care (2002)
9. Ultsch, A., Korus, D., Kleine, T.O.: Integration of Neural Networks and Knowledge-Based Systems in Medicine. Hans-Meerwein-Straße / Lahnberge, Marburg (1995)
10. Wesley, K., Haisty, J.: Agreement between Artificial Neural Networks and Human Expert for the Electrocardiographic Diagnosis of Healed Myocardial Infarction. Journal of the American College of Cardiology 28, 1012–1016 (1996)
11. Wolff, J.G.: Medical Diagnosis as Pattern Recognition in a Framework of Information Compression by Multiple Alignment, Unification and Search (2005)
12. Weigand, S., Huberman, A., Rumelhart, D.E.: Predicting the Future: A Connectionist Approach. International Journal of Neural Systems (1995)
13. Yao, X., Liu, Y.: Involving Artificial Neural Network in Medical. The University of South Wales, Australia (1995)
14. Yao, X., Liu, Y.: Neural Network for Breast Cancer Diagnosis. Biomedical and Bio Engineering Journal, Birmingham (1999)
15. Zhou, Z.H., Jiang, Y., Yang, Y.B., Chen, S.F.: Lung Cancer Cell Identification Based on Artificial Neural Network Ensembles. Nanjing University, Nanjing (2001)
16. Zrimec, T., Kononenko, I.: Feasibility Analysis of Machine Learning in Medical Diagnosis from Aura Images. University of Ljubljana, Bangladesh (2004)
17. Moein, S., Monadjemi, S.A., Moallem, P.: A Novel Fuzzy Neural Based Medical Diagnosis System. In: WASET, Egypt, vol. 26 (February 2008)
18. Moein, S.: Hepatitis Diagnosis by Training a MLP Artificial Neural Network. Las Vegas, USA (July 2008)
19. Software for Neural Network Training (March 2006), http://www.Neuralware.com
20. Wireless Information System for Emergency Responders (2006), http://wiser.nlm.nih.gov/
21. An expert system for Medical decision (March 2006), http://www.Dxplain.com
An Approach to Dynamic Gesture Recognition for Real-Time Interaction Jinli Zhao and Tianding Chen*
Abstract. Recently, new Human Computer Interaction (HCI) technologies such as gesture recognition have drawn extensive attention and have been extended to fields such as the interaction between human and machine. In this paper, the critical technology of gesture recognition is discussed, and a method to locate the human hand at high speed is proposed, based on the combination of a skin-colour Gaussian model and a revised optical flow. Considering the characteristics of dynamic hand gestures, we adopt the Hidden Markov Model and use it in ‘the remote robot control system based on p2p’. Keywords: Gesture Recognition, Skin-colour model, Optical Flow Tracking, Hidden Markov Model.
1 Introduction As the personal computer becomes more pervasive in people's lives, the technology of Human Computer Interaction (HCI) receives more attention. The center of HCI has shifted from the computer to the human [1-3]. Gesture recognition will be applied to the interaction between human and robot besides the interaction between computer and human. Robots will be used extensively in dangerous work and in the service industry; the service industry in particular needs a natural human–computer interface [4]. Unlike systems where users are required to wear data gloves, which are not natural, vision-based gesture recognition has become the main direction of research. Its advantages are that it is glove-free and contact-free, and fast computing power makes real-time vision processing possible. Dynamic hand gesture recognition has been regarded as a highly difficult task mainly due to two aspects of the signal characteristics: segmentation ambiguity and spatio-temporal variability. The segmentation ambiguity problem concerns how to determine the start point and end point in a continuous hand trajectory. The other Jinli Zhao . Tianding Chen College of Information & Electronic Engineering, Zhejiang Gongshang University
No.18, Xuezheng Str., Xiasha University Town, Hangzhou 310018, P.R. China
difficulty comes from the fact that the same gesture varies dynamically in shape and duration, even for the same person. In this paper, a method is proposed for hand trajectory recognition based on HMM, which can model spatio-temporal information in a natural way. In order to reject undefined gestures, a modified threshold model is proposed, and a gesture spotting method is also given which works well.
2 System Overview The flow chart of the system is shown in Figure 1.
Fig. 1 The system block diagram
3 Dynamic Gesture Recognition Based on Vision 3.1 Hand Segmentation Based on a Skin Color Model The commonly used RGB color model contains both color and brightness information, which makes it sensitive to variations in illumination and environment. To make the skin color less affected by illumination, we use the YCbCr color model [6], where Y represents the brightness, Cb the blue chrominance difference and Cr the red chrominance difference. The relationship between the YCbCr model and the RGB model is as follows:
Y = 0.29900* R + 0.58700* G + 0.11400* B
(1)
Cb = −0.16874 * R − 0.33126 * G + 0.50000 * B + 128
(2)
Cr = 0.50000 * R − 0.41869 * G − 0.08131* B + 128
(3)
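Formulas (1)–(3) map directly to code; a sketch assuming an 8-bit RGB image stored in a NumPy array (names are ours, not the paper's).

import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an RGB image (uint8) to Y, Cb, Cr using formulas (1)-(3)."""
    r, g, b = [rgb[:, :, c].astype(np.float64) for c in range(3)]
    y  =  0.29900 * r + 0.58700 * g + 0.11400 * b
    cb = -0.16874 * r - 0.33126 * g + 0.50000 * b + 128.0
    cr =  0.50000 * r - 0.41869 * g - 0.08131 * b + 128.0
    return y, cb, cr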
Then we model the skin color with a Gaussian model M = (Σ, μCb, μCr),
where the mean values of Cb and Cr are given respectively by

μCb = (1/N) Σ_{i=0}^{N} Cbi,   μCr = (1/N) Σ_{i=0}^{N} Cri

and the covariance matrix is given by

Σ = [ σCb²  σCbCr ; σCrCb  σCr² ]

where σCb² = Σ_{i=0}^{N} (Cbi − μCb)², σCr² = Σ_{i=0}^{N} (Cri − μCr)², and σCbCr = σCrCb = Σ_{i=0}^{N} (Cri − μCr)(Cbi − μCb).
We use the Gaussian model to approximate the skin-color distribution from a certain number of skin samples. After training the Gaussian skin-color model [7], we can compute the probability that a pixel under test is skin, as follows:

P(Cb, Cr) = exp(−0.5 (Cr − μCr, Cb − μCb)ᵀ Σ⁻¹ (Cr − μCr, Cb − μCb))   (4)
So we can get the skin probability of every pixel of the image to be tested. We select an adaptive threshold by observation to obtain the skin segmentation: when the threshold is lowered, the skin region grows, but the rate of growth decreases. When the threshold is reduced to a certain value, the skin region grows significantly, since non-skin regions are also included. We record the minimum increment while the threshold is being reduced, and the corresponding threshold is the optimal one. In this paper, the threshold is reduced from 0.75 to 0.05 in steps of 0.1. Observing the minimum increment, if it occurs when the threshold is reduced from 0.45 to 0.35, we set the threshold to 0.4. An example of the segmentation is shown in Figure 2.
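As a minimal sketch of this step (assuming the trained model from above; the function names, the use of numpy, and the exact threshold grid are mine, not the paper's):

import numpy as np

def skin_probability(cb, cr, mu, cov):
    """Per-pixel skin probability from the trained Gauss model (Eq. 4).
    cb, cr: H x W chroma planes; mu = (mu_cr, mu_cb); cov: 2 x 2 covariance."""
    d = np.stack([cr - mu[0], cb - mu[1]], axis=-1)          # H x W x 2 differences
    quad = np.einsum('...i,ij,...j->...', d, np.linalg.inv(cov), d)
    return np.exp(-0.5 * quad)

def adaptive_threshold(prob, thresholds=np.arange(0.75, 0.04, -0.1)):
    """Pick the threshold just above the step with the smallest growth in skin area."""
    areas = [np.count_nonzero(prob >= t) for t in thresholds]
    increments = np.diff(areas)                               # growth when lowering threshold
    k = int(np.argmin(increments))                            # smallest increment
    return (thresholds[k] + thresholds[k + 1]) / 2            # e.g. 0.45 -> 0.35 gives 0.4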
Fig. 2 An example of the segmentation based on skin-color
3.2 Gesture Tracking Based on Optical Flow Optical flow describes the apparent motion pattern in the image. Optical flow tracking reflects the change in the image caused by motion during the interval dt. Therefore, optical flow is only related to changes caused by motion, not by other factors [8].
Fig. 3 Schematic of SAD
Optical flow is usually calculated based on the following assumptions: the brightness of any point on the observed object remains unchanged as time changes, and the motion of adjacent points is similar in the image plane. In the vision-based dynamic gesture recognition system, it is assumed that the brightness in the hand region and motion parameters such as direction and speed basically remain the same. By calculating the sum of the absolute values of the gray differences between adjacent frames at the pixels of interest, we can obtain the optical flow, as shown in Figure 3. The block R of frame f is the reference block (size 8*8 pixels); it is searched for within block S in the next frame. Block S is an expansion of the original reference block by 8 pixels at the top and left and 7 pixels at the bottom and right. The optical flow criterion is the SAD (Sum of Absolute Differences):

D(u, v) = ∑_{x,y=0}^{7} |R(x, y) − S(x + u + 8, y + v + 8)|    (5)
where R(x, y) denotes the gray value of point (x, y) in the reference block R, S(x + u + 8, y + v + 8) denotes the gray value of point (x + u + 8, y + v + 8) in the search block S, and D(u, v) is the sum of absolute gray-value differences between block R and block S. The range of (u, v) is −8 to 7. The displacement (u, v) corresponding to the smallest D(u, v) gives the center of block R in the next frame. The original image can be divided into many reference blocks, and the center of each block in the next frame can be calculated by the algorithm above.
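A minimal sketch of the SAD block matching of Eq. (5); it assumes an 8 x 8 reference block and a 23 x 23 search region (the 8-pixel top/left and 7-pixel bottom/right expansion described above), and the function name is mine.

import numpy as np

def sad_displacement(ref_block, search_block):
    """Find the displacement (u, v) minimising the SAD of Eq. (5)."""
    best, best_uv = None, (0, 0)
    for u in range(-8, 8):
        for v in range(-8, 8):
            cand = search_block[v + 8:v + 16, u + 8:u + 16]   # candidate 8 x 8 block
            d = np.abs(ref_block.astype(int) - cand.astype(int)).sum()
            if best is None or d < best:
                best, best_uv = d, (u, v)
    return best_uv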
3.3 Feature Extraction The features used for tracking are basically location, speed, and angle. We mainly consider planar movement, and can calculate the values in the polar
coordinate (ρ, φ). The suspension judgment can be seen in Figure 4 and Figure 5. According to the optical flow algorithm, we obtain the original data — the median point (x, y) of the hand in rectangular coordinates. To make the features insensitive to changes of location, we normalize (x, y).
(xc, yc) = ( (1/T) ∑_{t=1}^{T} xt , (1/T) ∑_{t=1}^{T} yt )    (6)

rt = sqrt( (xt − xc)² + (yt − yc)² )    (7)

φt = tan⁻¹( (yt − yc) / (xt − xc) ) / 2π    (8)

where (xc, yc) is the center of the gesture track and rt is the distance from (xc, yc) to (xt, yt). For φt, we apply a discretization with a 16-direction chain code.
To compute the discrete code dis_N: sita = 180*φt; for (i = 0; i <= 15; i++) if ((sita >= i*22.5) && (sita < (i+1)*22.5)) dis_N = i; Thus, a specific gesture G can be represented by a series of discrete feature codes [9].
G = {dis_N_i}, i = 1, 2, …, T, where T is the number of symbols in the quantified observation sequence. To achieve a high recognition rate, the number of quantified observations is very important. After several trials, the optimal quantified number was found to be 24.
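A small Python sketch of the direction-code feature extraction described above. It uses atan2 and maps angles to [0, 2π) before binning into 16 directions, which is an assumption about the exact bin boundaries rather than the paper's scaling; the function name is hypothetical.

import numpy as np

def gesture_features(xs, ys, n_dirs=16):
    """Convert a hand trajectory into a 16-direction chain code (Eqs. 6-8)."""
    xs, ys = np.asarray(xs, float), np.asarray(ys, float)
    xc, yc = xs.mean(), ys.mean()                    # center of the gesture track, Eq. (6)
    angles = np.mod(np.arctan2(ys - yc, xs - xc), 2 * np.pi)
    bin_width = 2 * np.pi / n_dirs                   # 22.5 degrees per direction
    return (angles // bin_width).astype(int)         # dis_N in {0, ..., 15}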
3.4 Suspension Point Judgment In the dynamic gesture recognition system, every movement of the hand causes the system to estimate [11] whether it is a suspension point. When a dynamic gesture is about to end, it has the corresponding maximum likelihood value. The start point can be obtained by the Viterbi algorithm whenever a candidate suspension point is met, and the optimal candidate suspension point is the one we need. We choose it according to the following principles:
a. choose the last candidate suspension point before pattern B when what follows pattern B is not a gesture;
b. when what follows pattern B is a gesture, we have two options:
Fig. 4 Trackview of the start gesture
Fig. 5 Schematic of the suspension judgement
A will be a part of the gesture when the start point of B is before the first candidate suspension point of A, including both A and B and neglecting all the candidate suspension points of A. Choose the last point to be the suspension point when B starts between the first and the last candidate suspension point of A.
3.5 The Application of the Gesture Recognition In this paper, we carry out a recognition test with the eight gestures shown in Table 1 for interaction between the robot and human, applied to the remote robot control system based on p2p [12-13]. It can be divided into 3 steps: firstly, choose a pattern of the hand; in this paper, we choose the center of gravity as the pattern of the hand. Secondly, design a recognition method; we adopt the discrete HMM with 8 states [14]. Finally, we train the HMM [15] on the training data. We can see from Table 2 and Table 3 that the recognition accuracy is above 90%, which can basically satisfy the remote robot control.
Table 1 Eight dynamic gestures in the application
Table 2 Training sample and testing data of the test
Table 3 The results of the recognition test
4 Conclusions In this paper, we considered a vision-based system that can interpret a user's gestures in real time. Hand segmentation utilized color and motion information, and we use the mathematical morphology operations of dilation and erosion to get an accurate hand region. The current location of the hand is represented by the coordinates of the center of gravity in the 2D plane. HMMs with discrete and continuous densities were trained and tested. The disadvantages of the system are that it is difficult to segment the hand when the color of the background is similar to the hand, and, when non-defined gestures occur, the system recognition rate drops to 88%.
References 1. Wu, Y., Huang, T.S.: Human Hand Modeling, Analysis and Animation in the Context of HCI. In: Proc. of the Int'l. Conf. on Image Processing, pp. 6–10 (1999) 2. Zhu, Y.X., Ren, H.B., Xu, G.Y., Lin, X.Y.: Toward Real-time Human-computer Interaction with Continuous Dynamic Hand Gestures. In: Proc. of the IEEE Int'l. Conf. on Automatic Face and Gesture Recognition, pp. 544–551 (2000) 3. Nielsen, M., Storring, M., Moeslund, T.B., Granum, E.: A Procedure for Developing Intuitive and Ergonomic Gesture Interfaces for HCI. In: Camurri, A., Volpe, G. (eds.) GW 2003. LNCS (LNAI), vol. 2915, pp. 409–420. Springer, Heidelberg (2003) 4. Mo, Z.Y., Lewis, J.P., Neumann, U.: SmartCanvas: A Gesture-driven Intelligent Drawing Desk System. In: Proc. of the IUI 2005, pp. 239–243 (2005) 5. Malik, S., Laszlo, J.: Visual Touchpad: A Two-handed Gestural Input Device. In: Proc. of the ACM ICMI 2004, pp. 289–296 (2004) 6. Kjeldsen, R., Kender, J.: Finding Skin in Color Images. In: Proceedings of the International Conference on Automatic Face and Gesture Recognition, Killington, pp. 312–317 (1996) 7. Yang, J., Lu, W., Waibel, A.: Skin-color Modeling and Adaptation. In: Proceedings of ACCV 1998, Hong Kong, pp. 687–694 (1998) 8. Cutler, R., Turk, M.: View-based Interpretation of Real-time Optical Flow for Gesture Recognition. In: IEEE Inter. Conf. on Automatic Face and Gesture Recognition, Nara, Japan (April 1998) 9. Zhu, Y.X., Huang, Y., Xu, G.Y., et al.: Motion-Based Segmentation Scheme to Feature Extraction of Hand Gesture. In: Zhou, J., Jain, A.K., Tian-xu, Z., et al. (eds.) Proceedings of SPIE, vol. 3545, pp. 228–231. SPIE, Washington (1998) 10. Ren, H.B., Xu, R.Y., Zhu, Y.X., Lin, X.Y., Tao, L.M.: Motion-and-color Based Hand Segmentation and Hand Gesture Recognition. In: Proceedings of the First International Conference on Image and Graphics; Journal of Image and Graphics (JIG), 5(suppl.), 384–388 11. Wren, C., Azarbayejani, A., Darrell, T., Pentland, A.: Pfinder: Real-Time Tracking of the Human Body. IEEE Trans. on Pattern Analysis and Machine Intelligence 19(7), 780–785 (1997) 12. Waldherr, S., Romero, R., Thrun, S.: A Gesture Based Interface for Human-robot Interaction. Autonomous Robots 9, 151–173 (2000)
13. Segen, J., Kumar, S.: Fast and Accurate 3D Gesture Recognition Interface. In: Proceedings of the International Conference on Pattern Recognition, Brisbane, Australia, pp. 86–91 (1998) 14. Wilson, A.D., Bobick, A.F.: Parametric Hidden Markov Models for Gesture Recognition. IEEE Trans. on Pattern Analysis and Machine Intelligence 21(9), 884–900 (1999) 15. Rabiner, L.R., Juang, B.H.: An Introduction to Hidden Markov Models. IEEE ASSP Magazine, 4–16 (January 1986)
Dynamic Multiple Pronunciation Incorporation in a Refined Search Space for Reading Miscue Detection Changliang Liu, Fuping Pan, Fengpei Ge, Bin Dong, Shuiduen Chen, and Yonghong Yan
Abstract. Error prediction is important for detecting reading miscues with a reading tutor. In order to incorporate error prediction into the decoder of a conventional speech recognizer, this paper proposes an algorithm of Dynamic Multiple Pronunciation Incorporation (DMPI). It solves the conflict between the coverage of errors and the increase in perplexity of the search space. A multiple pronunciation model (MPM) is developed to model the misreading errors. The pronunciation variants referred to in the current reference are extracted from the MPM and added to the search space of the recognizer – a refined state network – before recognizing. The original state network is redeveloped to reserve some redundant fan-in and fan-out nodes, which makes the merging of the original state network and the additional state network very easy. The experimental results prove the effectiveness of this algorithm: the EER is decreased by about 9.5%. Keywords: CALL, Reading tutor, Decoder, State network, Multiple pronunciation model.
1 Introduction Computer Assisted Language Learning (CALL) has been proved very effective for language learners. Some CALL systems concentrate on pronunciation assessment or pronunciation training (CAPT) [1, 2] and some concentrate on improving reading proficiency and comprehension of learners (reading tutor) Changliang Liu · Fuping Pan · Fengpei Ge · Bin Dong · Yonghong Yan ThinkIT Speech Lab, Institute of Acoustics, Chinese Academy of Sciences {chliu,fpan,fge,bdong,yyan}@hccl.ioa.ac.cn Shuiduen Chen Department of Chinese & Bilingual Studies, The Hong Kong Polytechnic University
[email protected]
[3, 4, 5, 6]. A reading tutor can listen to the learner's reading and provide help when the learner needs it or when it thinks the learner needs it. A reading tutor is a tool for helping language learners (i.e. children, foreigners) to learn a language more easily. One of its main tasks is to detect and classify miscues in the learner's reading speech given the reference, so that the tutor can provide reliable error analysis and appropriate feedback. Reading miscues consist of insertions, omissions, substitutions, etc. Most reading tutors use an automatic speech recognizer (ASR) to detect reading miscues [3, 4, 5, 6]. However, a broad range of pronunciation variations caused by misreading or accent makes it difficult to get sufficiently high speech recognition performance, because these variations have no definitions in a common pronunciation dictionary. The poor recognition result impairs the detection performance greatly. In this case, efficient error prediction becomes important for the system. It can help the recognizer to anticipate and remediate the reading miscues. In [5] and [4], error predictions help to expand the simple reference to a syllable or phone graph. The graph with error predictions is then aligned with the lattice resulting from the recognizer to generate the detection results. However, in the recognizing process, they do not consider the error predictions and only use a common pronunciation dictionary. As we have said, the missing definitions of the reading miscues will cause the decoder to make mistakes. We attempt to incorporate error predictions into the decoder of a large vocabulary continuous speech recognition (LVCSR) system. However, we do not simply add all pronunciation variants into the pronunciation dictionary, which would largely increase the perplexity of the search space and would not obtain a good performance [8]. For a reading tutor, what the reader will read is known beforehand (the reference), which is the biggest difference from conventional LVCSR. Therefore, we propose an algorithm of Dynamic Multiple Pronunciation Incorporation (DMPI). It only adds the pronunciation variants in the dictionary relevant to the reference and barely increases the perplexity of the search space. In practical LVCSR, the pronunciation dictionary is not an individual component; it and other knowledge sources such as the hidden Markov model and context dependent phones are pre-compiled into a search space. In our system, the search space is a memory-efficient state network [7]. Dynamically adding pronunciation variants to the dictionary means that the search space has to be re-compiled, which is very inconvenient and inefficient. DMPI directly modifies the search space itself and need not re-compile the whole search space. The rest of this paper is structured as follows: section 2 describes an overview of the whole system; section 3 introduces the construction of the multiple pronunciation model, and in section 4 the details of the algorithm DMPI are described; the experiments are presented in section 5 and the conclusion is given in section 6.
2 System Overview Our system for detecting reading miscues is based on the LVCSR framework, which is shown in figure 1. A decoder incorporates all the knowledge sources: language model (LM), acoustic model (AM), pronunciation dictionary (PD), reference and multiple pronunciation model (MPM), and translates the input speech into a hypothesis lattice. Then, the lattice is transformed into a confusion network by a lattice processing module. A confusion network is a new representation of a set of candidate hypotheses which specifies the sequence of syllable-level confusions in a compact lattice format [9]. It supplies many candidates for each word result, and can compensate for recognition errors when aligned with the reference. Finally, the confusion network is aligned with the reference by an improved dynamic programming algorithm (DP), and the reading miscues – insertions, omissions and substitutions – are extracted from the aligned path. Compared with a conventional LVCSR [7], the system has two additional knowledge sources: • Reference: what is supposed to be read by the reader, which is the special information for a reading tutor. • MPM: models the pronunciation variants and their probabilities.
Fig. 1 System overview of detecting reading miscues
3 Multiple Pronunciation Model The multiple pronunciation model contains all possible pronunciations (including the normative pronunciation and pronunciation variants) of words, and it quantifies the occurrence of each pronunciation v of word w with a probability PMP(v|w). It can be derived by linguistic analysis [3], data-driven approaches, or their combination [6]. In our system, the MPM is generated by a data-driven approach. An example of the multiple pronunciation model is shown in Fig.2. For the training of the multiple pronunciation model, a lot of reading speech by language learners is collected and then the actual pronunciations of
Fig. 2 An example of multiple pronunciation model
长 ch ang2 0.59
长 zh ang3 0.41
长城 ch ang2 ch eng2 0.85
长城 zh ang3 ch eng2 0.15
......
the corpus are transcribed by human experts. After aligning the transcriptions and the references by DP, we count all pronunciations of each word and compute its pronunciation probability by the following equation:

PMP(v|w) = (count of w with pronunciation v) / (count of all occurrences of word w)
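As an illustration of this counting rule, a minimal Python sketch; the function name and the input format (a list of (word, observed pronunciation) pairs obtained from the DP alignment) are assumptions for the example.

from collections import Counter, defaultdict

def train_mpm(aligned_pairs):
    """Estimate PMP(v|w) from (word, observed_pronunciation) pairs."""
    counts = defaultdict(Counter)
    for word, pron in aligned_pairs:
        counts[word][pron] += 1
    return {w: {v: n / sum(c.values()) for v, n in c.items()} for w, c in counts.items()}

# e.g. train_mpm([("长", "chang2"), ("长", "zhang3"), ("长", "chang2")])
# gives {"长": {"chang2": 0.667, "zhang3": 0.333}} (rounded)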
Several methods of exploiting the MPM are introduced in [8]. A common method is to add all pronunciation variants to the pronunciation dictionary. However, it will not work well here. In our system, the dictionary has more than 40,000 entries, and the multiple pronunciations vary widely because of readers' different knowledge backgrounds. If all pronunciation variants were added, the search space would become very large, and the increase in perplexity would reduce recognition accuracy and impair the performance of detecting reading miscues. Considering both the coverage of pronunciation variants and the perplexity of the search space, the algorithm DMPI is proposed.
4 Dynamic Multiple Pronunciation Incorporation In this algorithm, we do not add all pronunciation variants into the search space, but only those belonging to words of the current reference. Before decoding, they are extracted from the multiple pronunciation model based on an analysis of the current reference, and then added to the search space. Other pronunciation definitions in the search space remain unchanged. Thus, the perplexity increases only slightly. Before adding the pronunciation variants to the search space, which is a state network in our system, they are converted to state paths. Then, the state paths are merged into the original state network.
4.1 The State Network The state network is pre-compiled from a linear lexicon and phonetic decision trees. The linear lexicon defines a list of words with their monophone sequence. Together with phonetic decision trees which give a clustering of the HMM states, the linear lexicon is finally transferred into a state network. The state network is then optimized by forward and backward node-merging
Fig. 3 A three-word linear lexicon
Fig. 4 The state network for the three-word lexicon
process. A three-word linear lexicon and its final state network are shown in Fig.3 and Fig.4. The state network is a left-to-right graph structured in three parts: an individual mid-part, a shared initial part (fan-in triphones) and a shared final part (fan-out triphones). There are two types of nodes in the network: state nodes and dummy nodes. Each state node is associated with an HMM state index, while a dummy node is not related to any acoustic event and is used to compress the network or mark special situations. In Fig.4, symbol "" denotes a state node and symbol "" denotes a dummy node. Some important features of the state network are listed below: • Two layers of dummy nodes, FI nodes and FO nodes, are introduced to cluster the cross-word (CW) context dependent fan-in and fan-out triphones; • A so-called "WI layer" is introduced to store the word identities, and the nodes of this layer are put in the non-shared mid-part of the network; • The network is optimized at the state level by a sufficient forward and backward node-merging process; • A language model (LM) look-ahead model is built on the state network to factor the LM probabilities over the state nodes.
4.2 Algorithm of Dynamic Multiple Pronunciation Incorporation The linear lexicon added with an additional pronunciation is shown in Fig.5. The additional gray rectangle denotes that the word 北京 (Beijing) may also be pronounced as "bei3 jin1". The insertion of pronunciation variants into the state network is done one by one. The state path which is converted from the additional pronunciation in Fig.5 is shown in Fig.6. The state path is a mini state network. In order to merge the two state networks, we make some changes during the building of the state network. In the original state network, only the essential FI and FO nodes are added; that is to say, if a CW triphone does not exist in the linear lexicon, the corresponding FI or FO node will not appear in the state network. Thus, if the mini state network has some FI or FO nodes which do not exist in the original state network, merging them will be very difficult. Therefore, we reserve the whole set of FI and FO nodes when building the state network, no matter whether the corresponding triphones exist in the lexicon or not. Thus, some FI or FO nodes may be unconnected in the modified state network, such as nodes 46 and 49 in Fig.4. In our system for recognizing Chinese Mandarin, the total number of tonal monophones is 213, including 27 initials, 184 finals, a 'sp' and a 'sil'. The total number of FI or FO nodes is 4968. In Fig.4, not all the fan-in and fan-out nodes are listed due to space reasons.
Fig. 5 The linear lexicon which is inserted with an additional pronunciation of 北京 (Beijing)
Fig. 6 A pronunciation variant and its corresponding mini state network
When transferring a pronunciation definition to a mini state network, the whole set of FI and FO nodes is also built in. The derived state network contains all CW triphones. Similarly to the large state network, there are also some unconnected nodes in the mini state network, such as FI nodes 76, 77 and FO node 71. The corresponding triphones 'i2-b+ei3' and 'j-in1+n' do not exist in the pronunciation sequence of 'b-ei3-j-ing1'. They take the role of placeholders. With them, the merging of the large state network and the mini state network becomes very easy. The merging process is operated in three steps:
Step 1: For each FI node in the additional state network, move all its preceding and successive links to the same node in the original state network and remove the redundant links.
Step 2: For each FO node in the additional state network, move all its preceding and successive links to the same node in the original state network and remove the redundant links.
Step 3: Link the ROOT node to all the first state nodes of the additional state network.
The final merged state network is illustrated in Fig.7. The grey nodes belong to the additional state network and the dotted lines are the new links to the additional nodes. All multiple pronunciations referred to in the reference are added to the original state network following the steps above. Another problem of the additional state network is that it does not have a corresponding LM look-ahead model. To resolve this problem, the additional state network is not optimized by forward and backward merging, and the WI node of each pronunciation path is regularly put at the beginning of the state sequence.
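A loose Python sketch of the three merging steps above, under simplifying assumptions that are mine rather than the paper's: each network is represented as an adjacency map keyed by node name, the shared FI/FO placeholder nodes carry identical names in both networks, and the mini network keys its entry states under the ROOT node.

def merge_state_networks(main, extra, root="ROOT"):
    """Merge a mini state network into the main one (steps 1-3)."""
    for node, succs in extra.items():
        main.setdefault(node, set())
        for s in succs:
            main.setdefault(s, set())
            main[node].add(s)            # steps 1-2: re-point links onto the shared FI/FO nodes
    for first_state in extra.get(root, set()):
        main[root].add(first_state)      # step 3: link ROOT to the first states of the new path
    return main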
Fig. 7 The final state network after merging
When decoding, the decoder can obtain the LM probability as soon as it enters such a path and does not need the LM look-ahead probability. All pronunciation variants of the same word share the same language model probability during the decoding process because they stand for the same word identity. Their pronunciation probability PMP(v|w) is introduced, and the word probability is computed according to the equation below:

P(w|o) ∝ PAM(o|v) · PMP(v|w) · PLM(w)

where PAM(o|v) is the acoustic model probability of pronunciation v given the observation vector o, PMP(v|w) is the probability of pronunciation v given word w (referred to in section 3), and PLM(w) is the language model probability.
5 Experiment 5.1 Data Corpus and Evaluation Metrics We use a data corpus of Hongkong college students reading Chinese Mandarin to evaluate the algorithm DMPI. The reading materials are four passages from "Hongkong Putonghua Shuiping Kaoshi" (PSK). Each of 665 students read one of the four passages. All speech was transcribed with syllable sequences (pin yin) by Chinese native teachers. The whole corpus is divided into two parts: 100 paragraphs as the test set and the rest as the develop set. The develop set is used to adapt the acoustic model and train the multiple pronunciation model. The test set is used to evaluate the performance of the algorithm. We use three metrics to measure the performance of the system: the syllable recognition error rate (SER) is used to evaluate the accuracy of recognition; the miscue detection error rate (DER) and the false alarm rate (FAR) are used to evaluate the detection performance. SER is computed by comparing the 1-best recognition result with the transcription using DP. DER is defined as the number of miscues which have not been detected divided by the number of all miscues; FAR is defined as the number of words erroneously detected as reading miscues divided by the number of all miscues. The real miscue information is obtained by aligning the reference with the transcription. Different points of DER and FAR can be obtained by changing some parameters in the system. They can be plotted as a detection error tradeoff (DET) curve. For the convenience of comparison, the equal error rate (EER) is also used, which is the rate at which the DER equals the FAR.
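For concreteness, a minimal Python sketch of these metrics; the function names are mine, and the EER here is approximated by the operating point where DER and FAR are closest rather than by interpolating the DET curve.

def detection_rates(n_missed, n_false_alarms, n_miscues):
    """DER and FAR as defined above: both are normalised by the number of true miscues."""
    return n_missed / n_miscues, n_false_alarms / n_miscues

def equal_error_rate(curve):
    """Rough EER from a list of (DER, FAR) operating points on the DET curve."""
    der, far = min(curve, key=lambda p: abs(p[0] - p[1]))
    return (der + far) / 2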
5.2 Experiment Setup The front end of the system extracts a 39-dimensional MFCC feature, including 12 dimensions of static cepstrum and 1 dimension of energy, with their 1st and
2nd derivatives. The HMM acoustic model was trained on about three hundred hours of speech from Chinese native speakers, and then adapted on the develop set of the Hongkong corpus referred to in section 5.1 with a MAP algorithm. The final model has about five thousand states, and each state has 32 Gaussian mixtures. The language model is a general tri-gram model trained on about 2 GB of text from news and the internet. The dictionary has about forty thousand words, including all characters and most words of Chinese Mandarin.
5.3 Experimental Result In this section, the performance of the algorithm DMPI is investigated. The baseline system is a conventional LVCSR decoder followed by a confusion network alignment module, without the multiple pronunciation model. The multiple pronunciation model is trained on the develop set referred to in section 5.1. Table 1 shows the recognizing and detecting performance of DMPI compared with the baseline. The recognizing SER is reduced by 9.5% relatively and the detecting EER decreases by 9.3% respectively, which proves the effectiveness of DMPI. The DET curves of DMPI are illustrated in Fig.8. Compared with the baseline, the detecting performance is significantly improved. The learner's poor language level results in many arbitrary reading miscues.
Table 1 Recognizing and detecting performance of DMPI
          SER     EER
baseline  31.0%   46.3%
DMPI      27.8%   42%
Fig. 8 DET Performance of the algorithm DMPI compared with baseline
There are many pronunciations of some words that do not exist in the original pronunciation dictionary. They may be called out-of-pronunciations (OOP). The additional pronunciations from the multiple pronunciation model resolve these OOPs. They give the decoder more choices in the decoding process. The dynamic incorporation algorithm hardly increases the perplexity of the search space, but it can cover the most necessary error predictions. The decoding time after incorporating additional pronunciations does not increase compared with the baseline.
6 Conclusion This paper proposed an algorithm of Dynamic Multiple Pronunciation Incorporation (DMPI) to incorporate error prediction into the decoder of a conventional speech recognizer. The algorithm solves the conflict between the coverage of errors and the perplexity of the search space. It dynamically adds the pronunciation definition paths into the search space, which increases the perplexity only slightly. We redeveloped the original state network to reserve some redundant fan-in and fan-out nodes. These placeholder nodes make the merging of the original state network and the additional state network very easy. The experiment result proved the effectiveness of this algorithm. The EER of the miscue detection error rate and false alarm rate is decreased by 9.5%. Dynamic Multiple Pronunciation Incorporation is a novel method of applying the multiple pronunciation model. It is a general framework, not limited to a special multiple pronunciation model or a special search space (the state network). In our system, the multiple pronunciation model is a statistical multiple pronunciation dictionary. Some more complex models can also be used, such as decision trees, artificial neural networks (ANN), etc. [8]. The decoding space can also be of other types, such as phonetic graphs or weighted finite-state transducers (WFST), etc.
References 1. Witt, S.M., Young, S.J.: Phone-level Pronunciation Scoring and Assessment for Interactive Language Learning. Speech Communication 30, 95–108 (2000) 2. Leonardo, N., Horacio, F., Vassilios, D., Mitchel, W.: Automatic Scoring of Pronunciation Quality. Speech Communication 30, 83–93 (2000) 3. Jack, M., Steven, F.R., Alexander, G.H., Matthew, K.: A Prototype Reading Coach that Listens. In: Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI 1994), Seattle, WA (1994) 4. Daniel, B., Wayne, W., Sarel, V.V., Javier, G.: Syllable Lattices as a Basis for a Children's Speech Reading Tracker. In: InterSpeech (2007) 5. Jacques, D., Mari, W., Kris, D., Hugo, V.H.: A Flexible Recogniser Architecture in a Reading Tutor for Children. In: InterSpeech (2006) 6. Hongcui, W., Tatsuya, K.: Effective Error Prediction Using Decision Tree for Grammar Network in Call System. In: ICASSP (2008)
7. Shao, J., Li, T., Zhang, Q., Zhao, Q., Yan, Y.: A One-Pass Real-Time Decoder Using Memory-Efficient State Network. IEICE Transactions on Information and Systems 91, 529–537 8. Helmer, S., Catia, C.: Modeling Pronunciation Variation for ASR: A Survey of the Literature. Speech Communication 29, 225–246 (1999) 9. Lidia, M., Eric, B., Andreas, S.: Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks. Computer Speech and Language 14, 373–400 (2000)
Depicting Diversity in Rules Extracted from Ensembles Fabian H.P. Chan, A. Chekima, Augustina Sitiol, and S.M.A. Kalaiarasi 1
Abstract. Ensembles are committees of neural networks used to achieve better classification and generalization accuracy in contrast to using just a single neural network. Traditionally incomprehensible models such as artificial neural networks present a problem in their application to safety critical domains where not only accuracy but also model transparency is a requirement. This problem is not only inherited but further multiplied in ensembles. Furthermore, the aspect of diversity by which ensembles achieve improved classification ability cannot be reflected in rule extraction methods designed for single neural networks, hence the need for rule extraction methods specifically designed for ensembles. This paper presents a decompositional rule extraction algorithm for ensembles which is able to approximately decompose the neural networks in an ensemble and reflect their collective diversity to identify significantly contributing inputs. Keywords: Safety critical domain, Ensemble neural network, Rule extraction, Comprehensibility, Data mining.
1 Introduction Safety critical domains refer to areas where a mistake in the application of programmable systems could lead to injury, death or other serious adverse effects on people. Healthcare or medical systems are an obvious example of a safety critical domain [1]. In order to facilitate the application of advanced computational models such as neural networks, it is important that these models can be tested and verified. Verification typically involves examination of the underlying logic of such systems or models by human experts of the related domains. Hence the need for humanly comprehensible rules which can be extracted from incomprehensible Fabian H.P. Chan . A. Chekima . Augustina Sitiol . S.M.A. Kalaiarasi School of Engineering and Information Technology Universiti Malaysia Sabah 88999 Kota Kinabalu Sabah Malaysia
[email protected]
models such as neural networks via rule extraction algorithms, to allow transparency of the model and subsequent verification. Ensemble neural networks, which are made up of multiple single neural networks, achieve their improved classification and generalization ability through the diversity of their members [2,3]. As long as each single neural network in the ensemble is able to classify sufficiently well on different areas of the data, their combined output tends to suppress the individual errors of any particular network. This implies that the output of an ensemble for any instance of data may reflect the decision of all or only a subset of the ensemble members. The aspect of diversity in ensembles is a scope which single neural network rule extraction algorithms are not designed to handle. The rule extraction method presented in this paper attempts to address this aspect by extracting rules which retain the frequency of input usage across the positively voting components of the ensemble. The ensemble rule extraction approach used in the design of this algorithm aims for a decompositional rule extraction algorithm in which the criterion emphasized is the fidelity of the extracted rules. An explanation of common rule extraction algorithm categorizations and criteria is given in [4].
2 Ensemble Rule Extraction The ensemble rule extraction approach used in this paper divides the ensemble rule extraction process into two stages. The first stage called the component stage, deals with rule extraction for the individual network components used in the ensemble. The second stage is the combination stage which combines the rules obtained from the component stage in a relevant way to represent the voting combination method used in the ensemble. Because the components that contribute to the output of the ensemble can potentially vary for each instance of data, the rule extraction is performed on a case by case basis using one instance of data at a time. An explanation of the merit of this rule extraction approach for ensembles is given in [5].
2.1 Component Stage The basis for the rule extraction method employed here follows from the fact that artificial neural networks, being modeled after real neurons, have activation functions in each neuron with a firing threshold. The output of a neuron y can be written as expressed in equation (1).
y = S( ∑_{i=1}^{n} wi pi + b )    (1)
where S is the activation function, wi and pi are respectively the i-th weight and input to the current neuron, and b is the bias or threshold. The rule extraction algorithm employed here tries to find the minimum subset of links that contributes to a neuron firing for any given input. The algorithm does
this by incrementally eliminating the smallest absolute valued weight and input product shown in (2) which still allows the neuron to fire.
wi ⋅ pi .
(2)
By repeatedly finding the minimum subset of links from each neuron identified by a minimum subset of links in the previous layer, the significant input neurons that contribute most to the network output can be identified. This rule extraction algorithm is applied to individual networks at the component stage of the ensemble rule extraction. The algorithm is given in Fig 1.
1. Start with the output neuron in the output layer of the network.
2. Find the minimal subset of weight links that overcomes the neuron threshold by incrementally removing the least absolute valued input-weight product.
3. Use the minimal subset to identify contributing neurons in the hidden layer.
4. For each identified contributing neuron, repeat step 2 and represent the results in a binary vector indicating which inputs are used.
5. Combine the binary input vectors obtained in step 4 with a logical OR function.
Fig. 1 Component Stage Rule Extraction Algorithm
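As an illustration of step 2 of the algorithm above, a minimal Python sketch of the greedy minimal-subset search for a single neuron; the function name and the use of numpy are assumptions, and the routine returns the binary vector of retained links used in steps 4-5.

import numpy as np

def minimal_subset(weights, inputs, bias):
    """Drop the smallest |w_i * p_i| term while the pre-activation stays >= 0."""
    contrib = weights * inputs
    keep = np.ones(len(contrib), dtype=bool)
    while keep.any():
        idx = np.flatnonzero(keep)
        smallest = idx[np.argmin(np.abs(contrib[idx]))]
        if contrib[keep].sum() - contrib[smallest] + bias >= 0:
            keep[smallest] = False            # link can be removed, neuron still fires
        else:
            break                             # removing it would stop the neuron firing
    return keep.astype(int)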
In the following example, a neural network was trained for the Pima Indian Diabetes dataset [5]. The architecture of the network is shown in Fig. 2. Fig. 2 Network Architecture
The weights and biases for the network are given in the following, where wij represents the weights between the input and hidden layer and vjk represents the weights between the hidden and the output layer. bj and ck represent the biases for the neurons in the hidden and output layer respectively.
Input to hidden layer weights (wij):
w11 = -0.0363  w12 = -1.5289  w13 = -0.5679
w21 = 2.0053   w22 = 0.2514   w23 = 0.9485
w31 = 0.4351   w32 = 0.8049   w33 = -1.2278
w41 = -0.1003  w42 = -1.5084  w43 = 1.3426
w51 = 2.0678   w52 = -0.2926  w53 = 1.3842
w61 = 1.0435   w62 = -1.3746  w63 = 0.7555
w71 = 2.0320   w72 = 1.4391   w73 = 1.7399
w81 = 0.8230   w82 = -0.2552  w83 = -0.4560
Hidden to output layer weights (vjk): v11 = 3.3256 v21 = 2.3636 v31 = -2.5335 Hidden nodes bias (bj): b1 = 2.4522 b2 = -0.5033 b3 = -3.1981 Output node bias (ck): c1 = -3.2974 The neurons in the hidden and output layer use a log-sigmoid activation function as given in equation (3).
f(x) = 1 / (1 + e^(−x))    (3)
The following network inputs (Pi) are one instance of inputs taken from the Pima Indian Diabetes dataset which have already been bipolar normalized using the formula in (4).
f(x) = (1 − (−1)) (x − xmin) / (xmax − xmin) − 1    (4)
[p1, p2, p3, p4, p5, p6, p7, p8] = [-0.3846, 0.1980, 0.1475, -1, -1, 0.6182, -0.2701, - 0.7727]
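A one-line Python sketch of the bipolar normalization of Eq. (4), with a hypothetical function name:

def bipolar_normalize(x, x_min, x_max):
    """Bipolar normalization of Eq. (4): map [x_min, x_max] linearly onto [-1, 1]."""
    return (1 - (-1)) * (x - x_min) / (x_max - x_min) - 1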
Applying equation (1) to each of the neurons in the hidden layer (hj) we get the following,
hj = S( ∑_{i=1}^{n} wij · pi + bj )
h1 = S(0.014 + 0.397 + 0.0642 + 0.1003 − 2.0678 + 0.6451 − 0.5488 − 0.6359 + 2.4522) = S(0.4203) = 0.6036
h2 = 0.7336
h3 = 0.047
Similarly, applying the same equation to the neuron in the output layer (Ok), we get the following output for the network: O1 = S(2.007 + 1.733 − 0.1191 − 3.2974) = 0.5802. The rule extraction begins by looking at the output neuron, where the inputs are the outputs of the neurons in the hidden layer, as shown in the earlier feedforward calculations of the network. The next step attempts to find a minimal subset of absolute valued weight and input product links necessary to overcome the neuron threshold, which is expressed in equation (5). Find the minimal subset of links so that

∑_{j=1}^{n} vjk hj + ck ≥ 0    (5)
From the example, v11h1 + v21h2 + v31h3 + c1 = 2.007 + 1.733 + -0.1191-3.2974 = 0.3235 The algorithm attempts to find the minimal subset by incrementally removing the smallest absolute valued weight input product link. Smallest |vjkhj| is: |v31h3| = |-0.1191| = 0.1191 Removing v31h3 will yield: v11·h1 + v21h2 + c1 = 2.007 + 1.733 -3.2974 = 0.4426 Smallest | vjkhj | is: |v21h2| = 1.733 Removing v21h2 will yield: v11h1 + c1 = 2.007 + -3.2974 = -1.2904 When equation (5) is no longer satisfied the algorithm stops and uses the last subset which still satisfied the equation as the minimal subset of links. The minimal subset of links identifies the significantly contributing neurons in the previous layer which in the example are h1 and h2. The next step of the algorithm repeats the minimal subset search for each of the hidden neurons identified. Output at h1 is given by, w11p1 + w21p2 + w31p3 + w41p4 + w51p5 + w61p6 + w71p7 + w81p8 + b1 = 0.014 + 0.397 + 0.0642 + 0.1003 - 2.0678 + 0.6451 - 0.5488 - 0.6359 + 2.4522 = 0.4203 Smallest |wijpi| is: |w11p1| = 0.014 Removing w11p1 will yield: w21p2 + w31p3 + w41p4 + w51p5 + w61p6 + w71p7 + w81p8 + b1 = 0.397 + 0.0642 + 0.1003 - 2.0678 + 0.6451- 0.5488 - 0.6359 + 2.4522 = 0.4063 Smallest |wijpi| is: |w31p3| = 0.0642 Removing w31p3 will yield: w21p2 + w41p4 + w51p5 + w61p6 + w71p7 + w81p8 + b1 = 0.397 + 0.1003 - 2.0678 + 0.6451 - 0.5488 - 0.6359 + 2.4522 = 0.3421 Smallest |wij pi| is: |w41p4 | = 0.1003 Removing w41p4 will yield: w21p2 + w51p5 + w61p6 + w71p7 + w81p8 + b1 = 0.397 - 2.0678 + 0.6451 - 0.5488 - 0.6359 + 2.4522 = 0.2418 Smallest |wijpi| is: |w21p2| = 0.397 Removing w21p2 will yield:, w51p5 + w61p6 + w71p7 + w81p8 + b1 = -2.0678 + 0.6451 - 0.5488 - 0.6359 + 2.4522 = -0.1552
The algorithm stops here because equation (5) is no longer satisfied. In order to combine the results of different neurons in the hidden layer, the algorithm represents the significant inputs in a binary vector with 1 denoting significant and 0 as insignificant. For h1: [p1 p2 p3 p4 p5 p6 p7 p8] = [0 1 0 0 1 1 1 1] By applying the same procedure to H2 we obtain: [p1 p2 p3 p4 p5 p6 p7 p8] = [0 0 0 1 0 0 0 0] Combining both vectors with a logical OR function yields: [0 1 0 0 1 1 1 1] ∨ [0 0 0 1 0 0 0 0] = [0 1 0 1 1 1 1 1] The final output for the component rule extraction stage is this binary vector which represents significant inputs used in that particular neural network component.
2.2 Combination Stage The combination stage of an ensemble combines the outputs from the neural network components to produce the final output of the ensemble. The rule extraction in this stage attempts to take into consideration how the ensemble achieves this combination and reflect it in the final rules that are produced. The algorithm for the combination stage rule extraction is presented in Fig 3.
1. For each network component that votes positively towards the ensemble output, perform component stage rule extraction.
2. Sum the binary input vectors from the components to obtain an input usage indication.
3. Convert the input usage into rules.
Fig. 3 Combination Stage Rule Extraction Algorithm
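A minimal Python sketch of the combination stage above for one data instance; the argument names, the attribute labels and the IF-THEN string formatting are assumptions made for the example.

def combination_stage(votes, binary_vectors, attributes, values):
    """Sum the binary vectors of positively voting components and emit a rule."""
    usage = [0] * len(attributes)
    for vote, vec in zip(votes, binary_vectors):
        if vote == 1:                                   # only positively voting components
            usage = [u + b for u, b in zip(usage, vec)]
    antecedents = [f"{a}({u})={v}" for a, u, v in zip(attributes, usage, values) if u > 0]
    return "IF " + " AND ".join(antecedents) + " THEN 1"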
In the following example an ensemble with 5 network components is used. The ensemble uses network components trained for the Pima Indian Diabetes dataset with a majority vote as the combination method to produce the final output [5]. Table 1 shows the voting results for 5 data instances put into the ensemble and Table 2 shows the binary input matrices for each network component.
Table 1 Voting results of each component
Network Components   Votes for 5 data instances
1                    1 0 1 1 0
2                    1 0 1 1 0
3                    1 0 0 1 0
4                    1 0 1 1 0
5                    1 0 0 1 0
Table 2 Component binary input matrices
Network Components   Binary input matrices for the 5 data instances
1   11000110  11111111  00010001  11000010  11111111
2   11111111  11111111  00011010  00000000  11111111
3   11111111  11111111  11111111  00000000  11111111
4   11101111  11111111  11111111  00000000  11111111
5   00000000  11111111  11111111  00000000  11111111
Since the Pima Indian Diabetes dataset has a binary target, we are only interested in the positive outputs of the ensemble. Therefore the rules are only extracted when the ensemble outputs a positive result. Step 1 of the algorithm looks for components that vote positively towards the ensemble output. Table 1 shows that out of the 5 data instances entered into the ensemble only 3 of them produce positive outputs, which are data instances 1, 3 and 4. In the first instance of data entered into the ensemble all the components voted positively. The corresponding binary input vectors from Table 2 for each positively voting component in this instance are then summed to obtain an input usage indication.
Input usage indication at data instance 1
= Component1 + Component2 + Component3 + Component4 + Component5
= [1 1 0 0 0 1 1 0] + [1 1 1 1 1 1 1 1] + [1 1 1 1 1 1 1 1] + [1 1 1 0 1 1 1 1] + [0 0 0 0 0 0 0 0]
= [4 4 3 2 3 4 4 3]
Input usage indications are similarly obtained at each data instance which produces a positive result.
Input usage indication at data instance 3
= Component1 + Component2 + Component4
= [0 0 0 1 0 0 0 1] + [0 0 0 1 1 0 1 0] + [1 1 1 1 1 1 1 1]
= [1 1 1 3 2 1 2 2]
Input usage indication at data instance 4
= Component1 + Component2 + Component3 + Component4 + Component5
= [1 1 0 0 0 0 1 0] + [0 0 0 0 0 0 0 0] + [0 0 0 0 0 0 0 0] + [0 0 0 0 0 0 0 0] + [0 0 0 0 0 0 0 0]
= [1 1 0 0 0 0 1 0]
IF att1(4)=6 AND att2(4)=148 AND att3(3)=72 AND att4(2)=35 AND att5(3)=0 AND att6(4)=33.6 AND att7(4)=0.627 AND att8(3)=50 THEN 1
IF att1(1)=10 AND att2(1)=115 AND att3(1)=0 AND att4(3)=0 AND att5(2)=0 AND att6(1)=35.3 AND att7(2)=0.134 AND att8(2)=29 THEN 1
IF att1(1)=2 AND att2(1)=197 AND att7(1)=0.158 THEN 1
Fig. 4 Final extracted rules
Each of the input usage vectors is then converted into a rule showing the original data of that particular instance, with the frequency with which a particular attribute was referenced in the ensemble denoted in brackets. The rule set produced from the 3 data instances is given in Fig 4.
3 Experimentation
This section gives the experiment setup and results for the rule extraction algorithm applied on the diabetes dataset. The details of the dataset are given first, followed by the ensemble creation procedure and finally the rule extraction results on the ensembles. The dataset used in the experiment was obtained from the UCI Machine Learning Repository [6]. This dataset has been commonly used in other literature, and its availability to other researchers allows experiments to be replicated or compared. The details of a few relevant aspects of the dataset used in this experiment are given in Table 3.
Table 3 Dataset details
Aspect                                            Description/Value
Number of instances                               768
Class type                                        Binary
Class Distribution                                0 = 500, 1 = 268
Attributes                                        8
Division for Train, Test and Validation set (%)   Train = 60, Test = 20, Validation = 20
3.1 Ensemble Training The neural network components for the ensemble were trained using feedforward backpropagation neural networks with gradient descent and momentum. The activation functions used in the hidden and output layers were both log-sigmoid. Parameters were selected through successive network trainings using different sets of parameters. The final parameters used are given in Table 4.
Table 4 Network training parameters used
Parameter           Value
Hidden Neurons      10
Learning Rate       0.2
Momentum Constant   0.6
Network component generation used simple bootstrapping to increase diversity between components. A pool of 100 network components was generated. Genetic
algorithm was then used to optimize a selection of 5 components to constitute the ensemble. Genetic algorithm parameters are given in Table 5. The ensemble creation procedure was repeated 5 times for the diabetes dataset, thus giving 5 separately trained ensembles, each from its own pool of 100 network components.
Table 5 Genetic algorithm parameters
Parameter           Value
Generations         100
Population Size     20
Crossover Fraction  0.8
Migration Interval  0.8
Migration Fraction  0.2
Mutation            Adaptive
3.2 Ensemble Rule Extraction Results The rule extraction algorithm was applied on each of the ensembles created. The experimentation results are given in Table 6 and Table 7. Each table gives the overall accuracy of the ensemble, the classification accuracy of the rules, and the rule fidelity.
Table 6 Results for Pima Indian Diabetes
No   Ensemble Accuracy   Rule Accuracy   Rule Fidelity
1    80.599              73.8281         90.625
2    80.0781             69.9219         84.8958
3    80.0781             67.0573         78.125
4    79.9479             64.7135         74.6094
5    79.8177             76.1719         95.5729
The results from Table 6 show that the rule extraction algorithm can produce rule sets with relatively good accuracy, although this ability is not entirely consistent, as can be seen in the rule extraction results for the third ensemble. In some cases the algorithm removes too many links, thus reducing the fidelity and accuracy of the rules obtained. By implementing a limit on the minimum number of links to retain, however, the rule extraction results improved greatly, as can be seen in Table 7, which shows the rule extraction results with the algorithm modified to retain at least two links. Although the rule extraction results from Table 7 show near perfect results, the rule comprehensibility has suffered greatly and nearly all the rules have as many antecedents as there are inputs to the network.
Table 7 Results for Pima Indian Diabetes
No   Ensemble Accuracy   Rule Accuracy   Rule Fidelity
1    80.599              80.599          100
2    80.0781             80.0781         100
3    80.0781             80.0791         100
4    79.9479             79.9479         100
5    79.8177             79.8177         100
4 Conclusion This paper has presented and illustrated a novel ensemble rule extraction algorithm capable of addressing an important aspect of ensembles, which is diversity. The rules extracted by this algorithm depict diversity by showing the frequency of input usage by the ensemble's neural network components in producing its output. The results of applying this algorithm to the diabetes dataset are presented, and although the rule extraction results are not entirely consistent, further experimentation with simple constraints shows that sacrificing rule comprehensibility allows higher fidelity. A possible solution to address this issue would be to accept that comprehensibility and fidelity are inversely related and therefore design an approach to rule extraction that allows the user to choose between them.
References 1. Wall, R., Cunningham, P.: Exploring the Potential for Rule Extraction from Ensembles of Neural Networks. In: 11th Irish Conference on Artificial Intelligence & Cognitive Science, pp. 52–68 (2000) 2. Tsymbal, A., Pechenizkiy, M., Cunningham, P.: Diversity in Ensemble Feature Selection. Technical report, Trinity College Dublin (2003) 3. Opitz, D., Shavlik, J.: Actively Searching for an Effective Neural-Network Ensemble. Connection Science 8, 337–353 (1996) 4. Tickle, A., Golea, M., Hayward, R., Diederich, J.: The Truth Is in There: Current Issues in Extracting Rules from Feedforward Neural Networks. In: Proceedings of the International Conference on Neural Networks, vol. 4, pp. 2530–2534 (1997) 5. Wall, R., Cunningham, P., Walsh, W.: Explaining Predictions from a Neural Network Ensemble One at a Time. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) PKDD 2002. LNCS, vol. 2431, pp. 449–460. Springer, Heidelberg (2002) 6. Asuncion, A., Newman, D.J.: UCI Machine Learning Repository. University of California, Department of Information and Computer Science, Irvine (2007), http://www.ics.uci.edu/~mlearn/MLRepository.html
A New Statistical Model for Radar HRRP Target Recognition Qingyu Hou, Feng Chen, Hongwei Liu, and Zheng Bao 1
Abstract. To resolve the problem of how to determine the proper number of mixture models for radar high-resolution range profile (HRRP) target recognition, this paper develops a variational Bayesian mixture of factor analyzers (VBMFA) model. This method can automatically determine the optimal number of models by birth-death moves and can accurately describe the statistical characteristics of HRRP. So the VBMFA method should have better recognition performance than the factor analysis and mixtures of factor analyzers methods, and experimental results on measured data confirm this conclusion. Keywords: Radar automatic target recognition (RATR), High-resolution range profile (HRRP), Variational Bayesian mixtures of factor analyzers (VBMFA), Variational Bayesian (VB), Mixtures of factor analyzers (MFA).
1 Introduction A high-resolution range profile (HRRP) contains the target structures, such as target size, scatterer distribution, etc. Several issues have to be considered when HRRP is applied to radar automatic target recognition (RATR). The first one is the well-known target-aspect sensitivity of HRRP [1-4], and one effective approach to overcome this problem is to establish a template database by dividing HRRPs from all target aspects into several target-aspect frames. The second one is the time-shift sensitivity of HRRP. In order to decrease the computation complexity, an HRRP is only a part of received radar echo extracted by a range window, in which target signal is included. Thus the position of the target signal in an HRRP can vary with the measurement. The last one is the amplitude-scale sensitivity of HRRP. This comes from the fact that the intensity of an HRRP is a function of radar transmitting power, target distance, radar antenna gain, radar receiver gain, radar system losses and so on. Thus, HRRPs measured by different radars or under different conditions will have different amplitude-scales. Therefore, it is a prerequisite of HRRP-based statistical recognition to deal with the three sensitivity Qingyu Hou . Feng Chen . Hongwei Liu . Zheng Bao National Laboratory of Radar Signal Processing, XiDian University, 710071 Xi’an, China
[email protected]
problems of HRRP. After solving the above three sensitivity problems of HRRP, the main challenging task is how to accurately describe HRRPs' statistical characteristics. Based on the hypothesis that the range cells of an HRRP are approximately statistically independent, [3] developed a statistical model comprising two distribution forms, modeling the echoes of different types of range cells with the corresponding distribution forms. According to [4], this hypothesis is not totally true, and [4] discussed a more accurate statistical recognition method based on the jointly statistical characteristics of HRRP samples. Three different joint-Gaussian models, the Principal Component Analysis (PCA) model, the Probabilistic Principal Component Analysis (PPCA) model, and the Factor Analysis (FA) model, were discussed in [4]. The recognition performance of the PPCA model is better than that of the PCA model, as the PPCA model considers both the signal subspace and the noise subspace. The base vectors of the signal subspace must be orthogonal in both of the above two models. Since zero-mean Gaussian variables are independent if they are orthogonal, the latent variables of those two models are independent of each other. But independent variables are not always orthogonal, so the PCA model and the PPCA model may not be the best statistical models for HRRP. The FA model only requires that the latent variables be independent, so it can describe HRRPs' statistical characteristics accurately. A Mixture of Factor Analyzers (MFA) is applied in this paper, as radar HRRP data is usually multi-modal. But how to determine the proper number of the models is a problem [5], [6]. A simple solution is to fix the number of mixture models, which usually improves the recognition performance but is not rigorous. Via variational Bayesian theory [7], [6] presented an algorithm of variational inference for Bayesian Mixtures of Factor Analyzers (VBMFA) that can automatically determine the optimal number of models by birth-death moves. In this paper we apply the VBMFA to model radar HRRP data, so that HRRPs' statistical characteristics can be described accurately.
2 VBMFA in Radar HRRP Statistical Recognition Factor analysis (FA) is a method for modeling the joint-Gaussian distributed correlations in multidimensional data, but radar HRRP samples do not strictly satisfy the joint-Gaussian distribution. Therefore, an FA mixture model, which is a joint-Gaussian mixture model, can describe HRRPs' statistical characteristics accurately. Assume we have a mixture of m factor analyzers indexed by ws, s = 1, 2, ..., m. In radar automatic target recognition, the generative MFA model of a d-dimensional HRRP sample in the l-th (l = 1, 2, ..., Lk) aspect-frame of the target Tk (k = 1, 2, ..., K) is given by

p(xkli | Θkl) = ∑_{s=1}^{m} p(xkli | μkls, Akls, Ψkl) p(ws)    (1)
where the parameter set is Θ_kl = {(μ_kl^s, A_kl^s, π_s)_{s=1}^m, Ψ_kl}, and π_s is the mixing proportion, π_s = p(w_s). Here we define that every component shares the same noise covariance matrix Ψ_kl, and

p(x_{kli} \mid \mu_{kl}^{s}, A_{kl}^{s}, \Psi_{kl}) = (2\pi)^{-d/2}\, \big|\Psi_{kl} + A_{kl}^{s} A_{kl}^{s\,T}\big|^{-1/2} \exp\!\Big(-\tfrac{1}{2}\,(x_{kli}-\mu_{kl}^{s})^{T} \big(\Psi_{kl} + A_{kl}^{s} A_{kl}^{s\,T}\big)^{-1} (x_{kli}-\mu_{kl}^{s})\Big)    (2)
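The mixture density of equation (1), with the Gaussian components of equation (2), can be evaluated directly. The following Python sketch only illustrates that computation under the notation above; the variable names and array shapes are assumptions for illustration, not part of the original method description.

import numpy as np

def mfa_log_density(x, mus, As, Psi, pis):
    # x: (d,) HRRP sample; mus[s]: (d,) mean; As[s]: (d, q) factor loadings
    # Psi: (d, d) shared diagonal noise covariance; pis: (m,) mixing proportions
    log_terms = []
    for mu, A, pi_s in zip(mus, As, pis):
        C = Psi + A @ A.T                              # component covariance of eq. (2)
        diff = x - mu
        _, logdet = np.linalg.slogdet(C)
        quad = diff @ np.linalg.solve(C, diff)
        log_gauss = -0.5 * (x.size * np.log(2 * np.pi) + logdet + quad)
        log_terms.append(np.log(pi_s) + log_gauss)
    return np.logaddexp.reduce(log_terms)              # log of the sum in eq. (1)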
But how to determine the proper number of components m is a problem. We can fix the number of components in the MFA model as in [8]; this keeps the computation simple, but it is not well justified. Moreover, the method in [8] uses maximum-likelihood (ML) estimation, which fails to take model complexity into account, so it may lead to overfitting and to an inability to determine the best model size and structure without costly cross-validation procedures. To deal with the overfitting problem we adopt the Bayesian approach; however, the Bayesian integration is usually computationally intractable in practice, so several approximation methods have been developed [9], [10], such as Markov chain Monte Carlo (MCMC) methods and the Laplace approximation. Reference [6] developed a third method, variational Bayesian (VB) inference. Using Jensen's inequality we can obtain a lower bound on the log evidence:
L \equiv \ln p(x) = \ln \int d\Theta\, p(x,\Theta) \ge \int d\Theta\, q(\Theta) \ln \frac{p(x,\Theta)}{q(\Theta)} \equiv F    (3)
This converts the computation of the log evidence into an optimization problem: minimizing the Kullback-Leibler (KL) distance between q(Θ) and the posterior p(Θ | x), which is equivalent to maximizing a lower bound of ln p(x). To make the optimization problem tractable, it is assumed that the variational distribution q(Θ) is sufficiently simple - fully factorized, with each factorized component in the exponential family.
Fig. 1 A Bayesian formulation for MFA. Circles denote random variables, rectangles denote hyperparameters, and the plate notation is used to denote repetitions over the data x_kli and over the m analyzers in the generative model.
Under this assumption, an analytic solution for the optimal q(Θ) can be obtained by taking functional derivatives. We use Θ_kl = {α*u*, a*, b*, μ*, ν*, Ψ_kl} to denote the set of hyperparameters of the Bayesian MFA model. The directed acyclic graph of the generative model is shown in Fig. 1. Following reference [6], we obtain the log marginal likelihood of the HRRPs by using equation (3):

L \equiv \ln p(x_{kl}) = \ln\Big( \int d\Theta_{kl}\, p(\Theta_{kl}) \prod_{i=1}^{N} p(x_{kli} \mid \Theta_{kl}) \Big)
  = \ln\Big( \int d\pi\, p(\pi \mid \alpha^{*}u^{*}) \int d\nu\, p(\nu \mid a^{*},b^{*}) \int dA_{kl}\, p(A_{kl} \mid \nu) \int d\mu_{kl}\, p(\mu_{kl} \mid \mu^{*},\nu^{*}) \cdot \prod_{i=1}^{N}\Big[ \sum_{s_i=1}^{m} p(s_i \mid \pi) \int dy_{kli}\, p(y_{kli})\, p(x_{kli} \mid s_i, y_{kli}, A_{kl}, \mu_{kl}, \Psi_{kl}) \Big] \Big)
  \ge \int d\pi\, q(\pi) \ln \frac{p(\pi \mid \alpha^{*},u^{*})}{q(\pi)}
  + \sum_{s=1}^{m} \int d\nu^{s}\, q(\nu^{s}) \Big[ \ln \frac{p(\nu^{s} \mid a^{*},b^{*})}{q(\nu^{s})} + \int d\tilde{A}_{kl}^{s}\, q(\tilde{A}_{kl}^{s}) \ln \frac{p(\tilde{A}_{kl}^{s} \mid \nu^{s}, \mu^{*}, \nu^{*})}{q(\tilde{A}_{kl}^{s})} \Big]
  + \sum_{i=1}^{N} \sum_{s_i=1}^{m} q(s_i) \Big[ \int d\pi\, q(\pi) \ln \frac{p(s_i \mid \pi)}{q(s_i)} + \int dy_{kli}\, q(y_{kli} \mid s_i) \ln \frac{p(y_{kli})}{q(y_{kli} \mid s_i)} + \int d\tilde{A}_{kl}\, q(\tilde{A}_{kl}) \int dy_{kli}\, q(y_{kli} \mid s_i) \ln p(x_{kli} \mid s_i, y_{kli}, \tilde{A}_{kl}, \Psi_{kl}) \Big]
  \equiv F    (4)
Here Ã_kl = [A_kl, μ_kl], k = 1, 2, ..., K and l = 1, 2, ..., L_k. To make the VB tractable, we assume that q(π, ν, Ã_kl) is fully factorized: q(π, ν, Ã_kl) ≈ q(π) q(ν) q(Ã_kl). The variational posteriors q(·) are given in the Appendix, where the model hyperparameters that govern the priors are also estimated. The VBMFA method can infer the number of analyzers automatically via a heuristic approach [6].
3 Main Procedure of Radar HRRP Target Recognition Based on the Proposed Statistical Model
The main procedure of radar HRRP target recognition based on the proposed statistical model, the VBMFA model, is described in the following. The FA model and the MFA model follow the same procedure in our experiments.
A. Training Phase
Step 1) Divide the training samples of target T_k (k = 1, 2, ..., K) into HRRP frames {x_kl}_{l=1}^{L_k}, where x_kl = {x_kli | i = 1, 2, ..., N} denotes the l-th range-aligned and amplitude-normalized HRRP frame.
Step 2) Compute the mean m_{x|kl} (k = 1, 2, ..., K; l = 1, 2, ..., L_k) of each HRRP frame by m_{x|kl} = (1/N) Σ_{i=1}^{N} x_kli.
Step 3) Estimate the parameters A_kl and Ψ_kl in the FA model. Estimate the parameters {A_kl^s, μ_kl^s, Ψ_kl, π_s}_{k=1; l=1; s=1}^{K; L_k; m} in the MFA model and in the VBMFA model.
Step 4) Store both the mean vector and the parameters for each frame of each target.

B. Test Phase
Step 1) The amplitude-normalized test sample x is time-shift compensated with m_{x|kl}, the mean vector of HRRP frame x_kl (k = 1, 2, ..., K; l = 1, 2, ..., L_k), by slide correlation processing.
Step 2) Substitute the test sample x into the l-th (l = 1, 2, ..., L_k) frame model of target T_k (k = 1, 2, ..., K). The joint distribution of x, p(x | l; k), can be obtained from equation (1); the FA model is the special case of equation (1) with m = 1.
Step 3) The class-conditional density of target T_k (k = 1, 2, ..., K) is p(x | k) = max_l p(x | l; k). According to Bayes' rule, the posterior probability of target T_k is

p(k \mid x) = \frac{p(x \mid k)\, p(k)}{\sum_{k=1}^{K} p(x \mid k)\, p(k)}

Step 4) If C = arg max_k p(k | x), then x belongs to target T_C.
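A compact sketch of the test phase follows. It assumes the helpers slide_correlation_align (time-shift compensation by slide correlation) and frame_log_density (the frame likelihood of equation (1)) exist; both names are placeholders introduced here for illustration, not functions defined in the paper.

import numpy as np

def classify_hrrp(x, targets, priors):
    # targets[k]: list of frames, each with a stored mean vector and model parameters
    # priors[k]: prior probability p(k) of target k
    log_px_given_k = {}
    for k, frames in targets.items():
        scores = []
        for frame in frames:
            x_aligned = slide_correlation_align(x, frame['mean'])            # Test Step 1
            scores.append(frame_log_density(x_aligned, frame['params']))     # Test Step 2, eq. (1)
        log_px_given_k[k] = max(scores)                                      # Test Step 3: p(x|k) = max_l p(x|l;k)
    log_post = {k: log_px_given_k[k] + np.log(priors[k]) for k in targets}
    return max(log_post, key=log_post.get)                                   # Test Step 4: arg max_k p(k|x)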
4 Experimental Results
1) Data Description: The results presented in this paper are based on measured airplane data [2]-[4]. The parameters of the targets and of the radar are shown in Table 1, and the projections of the target trajectories onto the ground plane are shown in Fig. 2, from which the aspect angle of an airplane can be estimated from its position relative to the radar. As shown in Fig. 2(a-c), the measured data are segmented. Training data and test data are taken from different data segments, and the training data cover almost all of the target-aspect angles of the test data.
Fig. 2 Projection of target trajectories onto the ground plane: (a) Yak-42; (b) Cessna Citation S/II; and (c) An-26
Therefore, the second and the fifth segments of Yak-42, the fifth and the sixth segments of An-26, and the sixth and seventh segments of Cessna Citation S/II are taken as the training data, and the other data segments are taken as the test data in our experiments. In addition, each measured HRRP is a 256-dimensional vector. In our experiments, the aspect-frames in the training data are divided by the equal-interval partition approach, and the latent variables in all aspect-frames have the same dimensionality. There are 35 aspect-frames for Yak-42, 50 for Cessna Citation S/II, and 50 for An-26, and each frame has 1024 HRRP samples.
2) Recognition Experiments: The average recognition rates of the FA model, the MFA model and the VBMFA model are shown in Table 2. The dimensionality of the latent variables is the same in all experiments, q = 20. In the MFA model the number of components is a constant M = 2, while in the VBMFA model the number of components is not fixed, because this method can automatically determine the optimal number of components. From reference [4], we know that the FA model performs better than the PCA model and the PPCA model (in reference [4] both the training data and the test data are pre-processed with a power transformation, which we do not use in our experiments). The MFA model [8] has a better recognition performance than the FA model, which indicates that the statistical model of radar HRRP data is usually a non-Gaussian distribution rather than a single model like the FA model; the MFA model uses multiple joint-Gaussian components to describe the statistical characteristics of the non-Gaussian distributed correlations in multidimensional radar HRRP data.
Table 1 Parameters of Planes and Radar in the ISAR Experiment
radar parameters    center frequency   5520 MHz
                    bandwidth          400 MHz

planes                  length (m)   width (m)   height (m)
Yak-42                  36.38        34.88       9.83
Cessna Citation S/II    14.40        15.90       4.57
An-26                   23.80        29.20       9.83
Table 2 Confusion Matrices and Average Recognition Rates of FA Model, MFA Model, VBMFA Model

                               FA model (q=20)                  MFA model (q=20; M=2)            VBMFA model (q=20)
                               Yak-42   Cessna S/II   An-26     Yak-42   Cessna S/II   An-26     Yak-42   Cessna S/II   An-26
Yak-42                         100      2             17.75     100      2             15        100      0             8
Cessna Citation S/II           0        84.25         1.75      0        84.50         1.50      0        89            11
An-26                          0        13.75         80.50     0        13.50         83.50     0        11            81
Average recognition rate (%)   88.25                            89.30                            90.00
However, fixing the number of analyzers to a constant (i.e. M = 2 in our experiment) is not rigorous, and it does not guarantee that the fixed constant is the optimal number of analyzers. The VBMFA model does not fix the number of analyzers to a constant; on the contrary, it can infer the number of analyzers automatically via a heuristic approach. Therefore the VBMFA model has the best recognition performance, with an average recognition rate approximately 2 percentage points higher than that of the FA model.
5 Conclusion
From the theoretical analysis and the experiments with the FA model, the MFA model and the VBMFA model, we can conclude that a jointly non-Gaussian distributed HRRP frame can be described more accurately by a joint-Gaussian mixture model (i.e. the MFA model and the VBMFA model) than by a single model (i.e. the FA model). The VB method is introduced into radar HRRP statistical recognition for the first time, and it obtains a better recognition performance than the FA model and the MFA model. It is worth pointing out that there are only three targets in our experiments, so the performance evaluation results will degrade if more targets are considered. The VBMFA method also has a small drawback: it is somewhat sensitive to the random initialization of the parameters. Finding an algorithm that is more robust and has a better recognition performance for multi-class targets is our next work.
Acknowledgments. This work is partially supported by the Program for Cheung Kong Scholars and Innovative Research Team in University (PCSIRT, grant no. IRT0645) and by NSFC under grant 60772140.
References 1. Jocobs, S.P.: Automatic Target Recognition Using High-resolution Radar Rang Profiles. Ph. D. Dissertation. Washington University (1999) 2. Chen, F., Du, L., Liu, H.W., Bao, Z., et al.: An Amplitude-scale and Time-shift Jointly Optimization for Radar HRRP Recognition. In: Proceeding of 2008 International Conference on Radar, Adelaide, Australia, pp. 515–518 (2008) 3. Du, L., Liu, H.W., Bao, Z., et al.: A Two-Distribution Compounded Statistical Model for Radar HRRP target Recognition. IEEE Transactions on Signal Processing 54(6), 2226–2238 (2006) 4. Du, L., Liu, H.W., Bao, Z.: Radar HRRP Statistical Recognition Parametric Model and Model Selection. IEEE Transactions on Signal Processing 56(5), 1931–1944 (2008) 5. Ueda, N., Nakano, R., Ghahramani, Z., Hinton, G.E.: SMEM Algorithm for Mixture Models. Neural Computation 12(9), 2109–2128 (2000) 6. Ghahramani, Z., Beal, M.: Variational Inference for Bayesian Mixture of Factor Analysers. In: Advances in NIPS, vol. 12, pp. 449–455 (2000) 7. Attias, H.: Inferring Parameters and Structure of Latent Variable Models by Variational Bayes. In: Proceeding of 15th Conference on Uncertainty in Artificial Intelligence (1999) 8. Ghahramani, Z., Hinton, G.E.: The EM Algorithm for Mixtures of Factor Analyzers. Technical Report CRG-TR-96-1, Dept. of Computer Science, University of Toronto (1996) 9. Stephen, P.B.: Markov Chain Monte Carlo Method and Its Application. The Statistician 47(1), 69–100 (1998) 10. MacKay, D.J.C.: Probable Networks and Plausible Predictions—a Review of Practical Bayesian Methods for Supervised Neural Networks. Network: Computation in Neural Systems 6, 469–505 (1995)
Appendix: Optimal q(·) Distributions and Hyperparameters
Via the VB approximation, we can obtain the full variational posterior [6]

p(\pi, \nu, A_{kl}, \mu_{kl}, s, y_{kl} \mid x_{kl}) \approx q(\pi) \prod_{s=1}^{m} q(\nu^{s})\, q(\tilde{A}_{kl}^{s}) \cdot \prod_{i=1}^{N} \prod_{s_i=1}^{m} q(s_i)\, q(y_{kli} \mid s_i)    (5)

• q(y_{kli} \mid s_i) \sim N(\bar{y}_{kli}^{s}, \Sigma^{s}), with

\bar{y}_{kli}^{s} = \Sigma^{s} A_{kl}^{s\,T} \Psi_{kl}^{-1} (x_{kli} - \mu_{kl}^{s}), \qquad \Sigma^{s} = \big(I + A_{kl}^{s\,T} \Psi_{kl}^{-1} A_{kl}^{s}\big)^{-1}    (6)

• q(\tilde{A}_{kl}^{s}) = \prod_{q=1}^{d} q(\tilde{A}_{klq\cdot}^{s}) \sim \prod_{q=1}^{d} N\big(\langle \tilde{A}_{klq\cdot}^{s} \rangle, \Sigma^{q,s}\big), where \tilde{A}_{klq\cdot}^{s} = [A_{klq\cdot}^{s}, \mu_{klq}^{s}] and

\langle A_{klq\cdot}^{s} \rangle = \big[\langle A_{kl}^{s} \rangle\big]_{q} = \Big[\Psi_{kl}^{-1} \sum_{i=1}^{N} q(s_i)\, x_{kli} \langle y_{kli}^{s} \rangle^{T} \Sigma^{q,s}_{A_{kl}A_{kl}}\Big]_{q}    (7)

\langle \mu_{klq}^{s} \rangle = \big[\langle \mu_{kl}^{s} \rangle\big]_{q} = \Big[\Big(\Psi_{kl}^{-1} \sum_{i=1}^{N} q(s_i)\, x_{kli} + \nu^{*}\mu^{*}\Big) \Sigma^{q,s}_{\mu_{kl}\mu_{kl}}\Big]_{q}    (8)

\Sigma^{q,s\,-1}_{A_{kl}A_{kl}} = \Psi_{kl}^{-1} \sum_{i=1}^{N} q(s_i) \langle y_{kli} y_{kli}^{T} \rangle_{q(y_{kli} \mid s_i)} + \mathrm{diag}\,\langle \nu^{s} \rangle_{q(\nu^{s})}    (9)

\Sigma^{q,s\,-1}_{\mu_{kl}\mu_{kl}} = \nu_{q}^{*} + \Psi_{kl}^{-1} \sum_{i=1}^{N} q(s_i)    (10)

\Psi_{kl} = \mathrm{diag}\Big[ \frac{1}{N} \sum_{i=1}^{N} \Big\langle \Big(x_{kli} - \tilde{A}_{kl}^{s}\begin{bmatrix} y_{kli} \\ 1 \end{bmatrix}\Big) \Big(x_{kli} - \tilde{A}_{kl}^{s}\begin{bmatrix} y_{kli} \\ 1 \end{bmatrix}\Big)^{T} \Big\rangle_{q(\tilde{A}_{kl}^{s})\, q(s_i)\, q(y_{kli} \mid s_i)} \Big]    (11)

where diag denotes the operator that sets the off-diagonal terms to zero.

• q(\nu^{s}) = \prod_{j=1}^{k^{s}} q(\nu_{j}^{s}) \sim \prod_{j=1}^{k^{s}} \mathrm{Ga}(a_{j}^{s}, b_{j}^{s}), where Ga denotes the Gamma distribution and k^{s} is the number of columns of the s-th factor loading matrix A_{kl}^{s}, with

a_{j}^{s} = a^{*} + \frac{d}{2}, \qquad b_{j}^{s} = b^{*} + \frac{1}{2} \sum_{q=1}^{d} \big\langle A_{klqj}^{s\,2} \big\rangle_{q(A_{kl}^{s})}    (12)

• q(\pi) \sim \mathrm{Dir}(\alpha u), where Dir denotes the Dirichlet distribution, with

\alpha u^{s} = \frac{\alpha^{*}}{m} + \sum_{i=1}^{N} q(s_i)    (13)

• \ln q(s_i) = \big[\psi(\alpha u^{s}) - \psi(\alpha)\big] + \frac{1}{2} \ln \big|\Sigma^{s}\big| + \big\langle \ln p(x_{kli} \mid y_{kli}, s_i, \tilde{A}_{kl}^{s}, \Psi_{kl}) \big\rangle + c, where \psi(x) is the digamma function \psi(x) \equiv \frac{\partial}{\partial x} \ln \Gamma(x) and \Gamma(x) is the Gamma function, and

\pi^{s} = \frac{1}{N} \sum_{i=1}^{N} q(s_i)    (14)
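As an illustration of the VB E-step, the following Python sketch computes the posterior over the latent factors for one sample and one analyzer, equation (6); it assumes a diagonal Ψ_kl stored as a vector of precisions, and it is only a sketch of this single update, not of the full VBMFA iteration.

import numpy as np

def update_q_y(x, mu_s, A_s, psi_inv):
    # x: (d,) sample; mu_s: (d,) analyzer mean; A_s: (d, q) loadings
    # psi_inv: (d,) diagonal of Psi_kl^{-1}
    q = A_s.shape[1]
    Sigma_s = np.linalg.inv(np.eye(q) + A_s.T @ (psi_inv[:, None] * A_s))   # eq. (6)
    y_mean = Sigma_s @ A_s.T @ (psi_inv * (x - mu_s))                       # eq. (6)
    return y_mean, Sigma_s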
Independent Component Analysis of SPECT Images to Assist the Alzheimer's Disease Diagnosis
Ignacio Álvarez, Juan M. Górriz, Javier Ramírez, Diego Salas-Gonzalez, Miriam López, Carlos García Puntonet, and Fermin Segovia
Abstract. Finding sensitive and appropriate technologies for non-invasive observation and early detection of Alzheimer's Type Dementia (ATD) is of fundamental importance for developing early treatments. Single Photon Emission Computed Tomography (SPECT) images are commonly used by physicians to assist the diagnosis, rating them by visual evaluation. In this work, we present a computer-aided diagnosis method in which a selection of relevant features is extracted from each patient image by means of Independent Component Analysis (ICA). An average image is computed within the normal and the Alzheimer's disease brain image classes and later used to extract a set of independent sources that best represent the characteristics of each class. Each brain image is projected onto the space spanned by this basis of independent sources, and the extracted information is used to train an SVM classifier that can classify new subjects in an unsupervised manner.
1 Introduction
Alzheimer's Dementia (AD) is still incurable and the affected population, mostly elderly, has been growing over the past decades. The diagnosis of AD remains a challenge, especially during the early stage of the disease, which offers better opportunities to treat its symptoms or to test new medical treatments. Anatomical changes in the affected brains caused by AD dementia take a long time to occur, while physiological functions can give information about the damage earlier.
Ignacio Álvarez · Juan M. Górriz · Javier Ramírez · Diego Salas-Gonzalez · Miriam López · Fermin Segovia
Dept. of Signal Theory, Networking and Communications, University of Granada, Spain
Carlos García Puntonet
Dept. of Architecture and Computer Technology, University of Granada, Spain
A study of the regional cerebral blood flow (rCBF) of the brain is frequently used as a diagnostic tool in addition to the clinical findings. SPECT is a noninvasive, three-dimensional functional imaging modality that provides clinical information regarding biochemical and physiological processes in patients, i.e. the rCBF. Many studies have examined the predictive abilities of nuclear imaging with respect to AD and other dementia illnesses. The evaluation of these images is usually done through visual ratings performed by experts. However, statistical classification methods have not been widely used in this area, quite possibly because the images represent large amounts of data while most imaging studies have relatively few subjects (generally <100) [1], producing the small sample size problem. Previous approaches use the voxel intensities of the whole brain image as features for statistical learning methods [2]. Due to the large number of voxels, such approaches suffer from the so-called small sample size problem, which occurs when the number of training samples is much smaller than the dimensionality of the feature space [3]. However, there are approaches that successfully deal with the small sample size problem, such as the PCA application of [4]. Nevertheless, PCA only takes into account pairwise relationships between the voxels of the brain images. It seems reasonable to expect that important information may be contained in the higher-order relationships among pixels, for which ICA is a sensitive method; this is the motivation for using it. ICA has also been successfully applied to other fields that can suffer from the small sample size problem, such as face detection [5], fMRI brain imaging [6] and glaucoma detection in the retina [7]. On the other hand, an effective and robust method is also needed for the classification task. Support Vector Machines (SVMs) are an example of a classification technique with adequate generalization properties on small datasets [8], and they have marked the beginning of a new era in the learning-from-examples paradigm [9]. Recently, SVMs have attracted attention from the pattern recognition community due to a number of theoretical and computational merits derived from [9]. These techniques have been successfully applied to many fields including voice activity detection (VAD) [10], content-based image retrieval [11], and medical imaging diagnosis [2]. Nevertheless, the application of SVMs to high-dimensional, small-sample-size problems is still a challenge, and improving the accuracy of SVM-based approaches is still a field in development [12]. This paper presents a computer aided diagnosis (CAD) system for the early detection of ATD using SVM classifiers applied to the projection of each brain image onto the 'independent sources brain image space' (see Sect. 2). The proposed method, tested on SPECT images, is developed with the aim of reducing the subjectivity in the visual interpretation of these scans by clinicians, thus improving the accuracy of diagnosing Alzheimer's disease in its early stage.
2 Estimation of the ICA Model
The task in ICA is to find a solution to the noiseless blind source separation problem [13, 14], which can be stated as follows: let X be an observed random vector and A a full rank matrix such that: X = AS
(1)
where the source signals S are commonly assumed to be stochastically independent: p_S(S_1, ..., S_n) = p_{S_1}(S_1) ... p_{S_n}(S_n). Thus, ICA recovers both the sources S_j and the mixing process using the independence assumption. In the linear case, the latter task consists of finding the mixing matrix A. A popular approach is to find a demixing or separating matrix W so that the variables Y_j in Y = WX are estimates of S_j up to scaling and permutation. In the deflationary approach the sources are estimated one by one, by finding a column vector w_j (this will be stored as a row of W) such that Y_j = w_j^T X is an estimate of S_j. Hence W is an estimate of the (pseudo)inverse of A up to scaling and permutation of the rows of W. The estimation of the independent components and of the mixing matrix is done with the help of FastICA [15], an iterative fixed-point algorithm with the following update for w:
w ← E{X g(w^T X)} − E{g'(w^T X)} w
(2)
where w is one of the rows of the demixing matrix W, and g is the derivative of the contrast function, chosen to be a cubic polynomial. After each iteration step (2), w is normalized to have unit norm, ensuring that the rows wj of the demixing matrix are orthogonal.
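A minimal NumPy sketch of this one-unit fixed-point update, under the cubic contrast g(u) = u^3 mentioned above, is given below; whitening of X and the deflation step (orthogonalization against previously extracted rows of W) are assumed to be handled elsewhere, and all names here are illustrative.

import numpy as np

def fastica_unit(X, w, n_iter=200, tol=1e-6):
    # X: (n_signals, n_samples) whitened observations; w: (n_signals,) initial vector
    w = w / np.linalg.norm(w)
    for _ in range(n_iter):
        u = w @ X
        # update (2): E{X g(w^T X)} - E{g'(w^T X)} w with g(u) = u^3, g'(u) = 3 u^2
        w_new = (X * u**3).mean(axis=1) - 3.0 * (u**2).mean() * w
        w_new /= np.linalg.norm(w_new)            # keep unit norm after each step
        if abs(abs(w_new @ w) - 1.0) < tol:
            return w_new
        w = w_new
    return w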
2.1 ICA Application to Functional Brain Images
The approach taken in this work consists of searching for a basis with a few elements that describes the classes of our database in the most independent way. For brain images, the dataset is an ensemble of 3D brain images Γ_i, whose size M is typically 79 × 95 × 69 ∼ 5·10^5 voxels, each belonging to a class C^k, k = 1, 2, ..., K, where K is the total number of classes. Let the full 3D brain image database be Γ_1, Γ_2, ..., Γ_N, each understood as a vector of dimension M. A within-class average brain image X_k can be defined as

X_k = \frac{1}{N_k} \sum_{\{\Gamma_i \subset C^k\}} \Gamma_i, \qquad k = 1, 2, ..., K    (3)

where N_k denotes the number of images in class C^k. These average images, whose number equals the number of classes, are the original mixed signals X = [X_1, X_2, ..., X_K]. The ICA algorithm is expected to separate
Fig. 1 Separation process of four original signals into four independent sources. Three representative coronal brain slices are displayed
them into an independent set of sources through WX = Y, where W is the estimated separating matrix and Y is the estimated set of K independent sources Y = [Y_1, Y_2, ..., Y_K] (see Fig. 1). These latent variables are the essence of the different classes. Furthermore, these independent sources Y_k define an orthogonal basis which spans an independent-components subspace of the 'brain image space'. Thus, seen as a linear operator, Y can be used to project each brain image onto the independent-component subspace, yielding a coefficient matrix Ω:

\Omega_{ki} = Y_k \Gamma_i, \qquad k = 1, 2, ..., K, \quad i = 1, ..., N    (4)

These coefficients denote the contribution of each original source in representing a specific brain image (see Fig. 2). Thus, the matrix Ω contains the most significant information extracted from the independent component analysis, stacked in a K × N data ensemble. We use this matrix Ω as features for the classification task, that is, N K-dimensional patterns

x_i = [\Omega_{1i}, \Omega_{2i}, ..., \Omega_{Ki}], \qquad i = 1, 2, ..., N    (5)

each of them with its corresponding class label y_i ∈ {±1}.
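The feature-extraction pipeline of equations (3)-(5) can be sketched in Python with scikit-learn's FastICA as follows; the array layout, the use of the library estimator and the variable names are assumptions made for illustration rather than the authors' implementation.

import numpy as np
from sklearn.decomposition import FastICA

def ica_features(images, labels):
    # images: (N, M) array, one flattened, intensity-normalized brain volume per row
    # labels: (N,) integer class indices 0 .. K-1
    classes = np.unique(labels)
    X = np.stack([images[labels == c].mean(axis=0) for c in classes])  # eq. (3): K class averages
    ica = FastICA(n_components=len(classes), random_state=0)
    S = ica.fit_transform(X.T)        # (M, K): columns are the independent source images Y_k
    Omega = S.T @ images.T            # eq. (4): (K, N) coefficient matrix
    return Omega.T                    # eq. (5): one K-dimensional pattern x_i per image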
3 Background on SVMs
The classification is achieved through an SVM, which separates a given set of binary-labeled training data with a hyperplane that is maximally distant from
the two classes (known as the maximal margin hyperplane). The objective is to build a function f : IR^K → {±1} using training data consisting of K-dimensional patterns x_i and class labels y_i:

(x_1, y_1), (x_2, y_2), ..., (x_N, y_N) \in IR^K \times \{\pm 1\},    (6)

so that f will correctly classify new examples (x, y). When no linear separation of the training data is possible, SVMs can work effectively in combination with kernel techniques using the kernel trick, so that the hyperplane defining the SVM corresponds to a non-linear decision boundary in the input space [9]. In this way the decision function f can be expressed in terms of the support vectors only [9]:

f(x) = \mathrm{sign}\Big\{ \sum_{i=1}^{N_S} \alpha_i y_i K(s_i, x) + w_0 \Big\},    (7)
where K(·,·) is the kernel function, α_i is a weight constant derived from the SVM training process and the s_i are the support vectors [9]. Common kernels used by SVM practitioners for the nonlinear feature mapping are:
• Polynomial: K(x, y) = [γ(x · y) + c]^d    (8)
• Radial basis function (RBF): K(x, y) = exp(−γ ||x − y||^2)    (9)
as well as the linear kernel, in which K(·,·) is simply a scalar product.
4 Database
The database consists of a set of 3D SPECT brain images produced with an injected gamma-emitting 99mTc-ECD radiopharmaceutical and acquired by a three-head gamma camera Picker Prism 3000. Images of the brain cross sections are reconstructed from the projection data using the filtered backprojection (FBP) algorithm in combination with a Butterworth noise removal filter. The classification task is based on the assumption that the same position in the volume coordinate system within different images corresponds to the same anatomical position. The SPECT images are spatially normalized using the SPM software [16]. The preprocessing and spatial normalization procedure, described in detail in [8], is achieved using affine and non-linear spatial normalization [17], and essentially guarantees meaningful voxel-wise comparisons between images. SPECT images contain functional information, and direct comparison of the voxel intensities, even between different acquisitions of the same subject, is thus not possible without normalization of the
intensities. Intensity level of the images is normalized to the maximum intensity, and consequently the basic assumptions are met. The images were initially labeled by experienced physicians of the “Virgen de las Nieves” hospital (Granada, Spain), using 4 different labels: normal (NOR) for patients without any symptoms of ATD and possible ATD (ATD1), probable ATD (ATD-2) and certain ATD (ATD-3) to distinguish between different levels of the presence of typical characteristics for ATD. In total, the database consists of N =79 patients: 41 NOR, 20 ATD-1, 14 ATD-2 and 4 ATD-3. We considered the patient label positive when belonging to any of the ATD classes, and negative otherwise.
5 Experiments
Two different experiments were performed to construct the within-class average images (3):
• Method I: using 2 classes, NOR and ATD, combining the three ATD labels into one.
• Method II: using 4 classes, NOR, ATD-1, ATD-2 and ATD-3.
Once the matrix Ω was obtained using (4), an SVM was trained using 4 different kernels: linear, quadratic, Radial Basis Function (RBF) and polynomial, and was tested using a leave-one-out cross-validation strategy. It is to be noted that, in the second method, four classes were used for the purpose of constructing the matrix Ω, but for the classification task the feature vector class labels (5) were always reduced to two, combining the three ATD labels into one.
Fig. 2 ICA representation of an Alzheimer's affected patient following method I
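The evaluation protocol described above (one SVM per kernel, leave-one-out cross-validation over the Ω features) can be sketched with scikit-learn as follows; the kernel hyperparameters are left at library defaults here, which is an assumption for illustration.

from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

def evaluate_kernels(Omega, y):
    # Omega: (N, K) ICA coefficients from eq. (5); y: (N,) labels in {-1, +1}
    kernels = {
        'linear': SVC(kernel='linear'),
        'quadratic': SVC(kernel='poly', degree=2),
        'rbf': SVC(kernel='rbf'),
        'polynomial': SVC(kernel='poly', degree=3),
    }
    return {name: cross_val_score(clf, Omega, y, cv=LeaveOneOut()).mean()
            for name, clf in kernels.items()}      # leave-one-out accuracy per kernel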
Table 1 Statistical performance measures of the ICA model

                             Kernel function
Parameter (%)    Linear    Quadratic    RBF      Polynomial    VAF
Method I
  Accuracy       84.81     87.34        87.34    87.34         72.15
  Specificity    87.80     92.68        90.24    92.68         78.05
  Sensitivity    81.58     81.58        84.21    81.58         65.79
Method II
  Accuracy       83.54     86.07        91.14    88.60         74.68
  Specificity    85.37     87.80        92.68    90.24         82.93
  Sensitivity    81.58     84.21        89.47    86.84         65.79
5.1 Analysis
The results summarized in Table 1 reveal that the idea of finding an image basis with a few elements for characterizing AD is meaningful. The two described methods of constructing Ω yielded a significant improvement over other CAD systems based on the Voxels-As-Features (VAF) approximation [18], whose performance values are displayed as a reference. As expected from theoretical reasons [9], non-linear kernel methods generalize better than linear kernels when the number of features is small. Both Method I and Method II represent a high compression of the large amount of brain image data into a small number of features: 2 or 4 weight values, respectively. Method I exhibits interesting performance results, but the compression into 2 features may be too drastic, and brain image information may be lost. Method II proves to be more adequate for characterizing AD, and the best performance is obtained when it is combined with an RBF kernel, reaching 91.1% accuracy; specificity reached 92.7% and sensitivity 89.5% in that case.
6 Conclusions
ICA applied to SPECT brain images is an effective way of extracting the relevant information for classification. Different architectures are possible for applying the ICA algorithm, such as finding a number of sources equal to the number of observed signals, or performing the unmixing process on the image voxels to generate new images made up of independent information. Nevertheless, the original aim of the ICA application in this work was to find a set of independent sources that best represents each Alzheimer's disease stage. The final results show the success of this method when combined with an RBF kernel SVM, whose estimated error rate is lower than 9%, outperforming previous results such as the VAF approximation.
Acknowledgements. This work was partly supported by the MICINN under the PETRI DENCLASES (PET2006-0253), TEC2008-02113, NAPOLEON (TEC2007-68030-C02-01) and HD2008-0029 projects and the Consejería de Innovación, Ciencia y Empresa (Junta de Andalucía, Spain) under the Excellence Project TIC-02566.
References 1. Ishii, K., Kono, A.K., Sasaki, H., Miyamoto, N., Fukuda, T., Sakamoto, S., Mori, E.: Fully Automatic Diagnostic System for Early- and Late-onset Mild Alzheimer’s Disease Using FDG PET and 3D-SSP. European Journal of Nuclear Medicine and Molecular Imaging 33(5), 575–583 (2006) 2. Fung, G., Stoeckel, J.: SVM Feature Selection for Classification of SPECT Images of Alzheimer’s Disease Using Spatial Information. Knowledge and Information Systems 11(2), 243–258 (2007) 3. Duin, R.P.W.: Classifiers in Almost Empty Spaces. In: Proceedings 15th International Conference on Pattern Recognition, vol. 2, pp. 1–7. IEEE, Los Alamitos (2000) 4. Nobili, F., Salmaso, D., Morbelli, S., Girtler, N., Piccardo, A., Brugnolo, A., Dessi, B., Larsson, S.A., Rodriguez, G., Pagani, M.: Principal Component Analysis of fdg pet in Amnestic MCI. Eur. J. Nucl. Med. Mol. Imaging 35(12), 2191–2202 (2008) 5. Bartlett, M., Movellan, J., Sejnowski, T.: Face Recognition by Independent Component Analysis. IEEE Transactions on Neural Networks 13(6), 1450–1464 (2002) 6. Theis, F.J., Gruber, P., Keck, I.R., Lang, E.W.: Functional MRI analysis by a novel spatiotemporal ICA algorithm. In: Duch, W., Kacprzyk, J., Oja, E., Zadro˙zny, S. (eds.) ICANN 2005. LNCS, vol. 3696, pp. 677–682. Springer, Heidelberg (2005) 7. Fink, F., Worle, K., Gruber, P., Tome, A.M., Gorriz, J.M., Puntonet, C.G., Lang, E.W.: Ica Analysis of Retina Images for Glaucoma Classification. In: 30th Annual International Conference of the Engineering in Medicine and Biology Society, pp. 4664–4667. IEEE, Los Alamitos (2008) 8. Ram´ırez, J., G´ orriz, J.M., G´ omez-R´ıo, M., Romero, A., Chaves, R., Lassl, A., Rodr´ıguez, A., Puntonet, C.G., Theis, F., Lang, E.: Effective emission tomography image reconstruction algorithms for SPECT data. In: Bubak, M., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds.) ICCS 2008, Part I. LNCS, vol. 5101, pp. 741–748. Springer, Heidelberg (2008) 9. Vapnik, V.N.: Statistical Learning Theory. John Wiley and Sons, Inc., New York (1998) 10. Ram´ırez, J., Y´elamos, P., G´ orriz, J.M., Segura, J.C.: SVM-based Speech Endpoint Detection Using Contextual Speech Features. Electronics Letters 42(7), 877–879 (2006) 11. Tao, D., Tang, X., Li, X., Wu, X.: Asymmetric Bagging and Random Subspace for Support Vector Machines-based Relevance Feedback in Image Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(7), 1088– 1099 (2006)
12. G´ orriz, J.M., Ram´ırez, J., Lassl, A., Salas-Gonzalez, D., Lang, E.W., Puntonet, ´ C.G., Alvarez, I., L´ opez, M., G´ omez-R´ıo, M.: Automatic Computer Aided Diagnosis Tool Using Component-based SVM. In: Medical Imaging Conference, Dresden. IEEE, Los Alamitos (2008) 13. Comon, P.: Independent Component Analysis, a new concept? Signal Process. 36(3), 287–314 (1994) 14. Bingham, E.: Advances in Independent Component Analysis with Applications to Data Mining. PhD thesis, Helsinki University of Technology (2003) 15. Oja, E.: A Fast Fixed-point Algorithm for Independent Component Analysis. Neural Computation 9, 1483–1492 (1997) 16. Friston, K., Ashburner, J., Kiebel, S., Nichols, T., Penny, W. (eds.): Statistical Parametric Mapping: The Analysis of Functional Brain Images. Academic Press, London (2007) 17. Salas-Gonzalez, D., G´ orriz, J.M., Ram´ırez, J., Lassl, A., Puntonet, C.G.: Improved Gauss-newton Optimization Methods in Affine Registration of Spect Brain Images. IET Electronics Letters 44(22), 1291–1292 (2008) 18. Stoeckel, J., Malandain, G., Migneco, O., Koulibaly, P.M., Robert, P., Ayache, N., Darcourt, J.: Classification of SPECT Images of Normal Subjects Versus Images of Alzheimer’s Disease Patients. In: Niessen, W.J., Viergever, M.A. (eds.) MICCAI 2001. LNCS, vol. 2208, pp. 666–674. Springer, Heidelberg (2001)
The Multi-Class Imbalance Problem: Cost Functions with Modular and Non-Modular Neural Networks
Roberto Alejo, Jose M. Sotoca, R.M. Valdovinos, and Gustavo A. Casañ
Abstract. In this paper, the behavior of Modular and Non-Modular Neural Networks trained with the classical backpropagation algorithm in batch mode and applied to classification problems with Multi-Class imbalance is studied. Three different cost functions are introduced in the training algorithm in order to solve the problem in four different databases. The proposed strategies show an improvement in the classification accuracy with three different types of Neural Networks. Keywords: Multi-Class, imbalance, backpropagation, cost function.
1 Introduction
Typically, supervised learning methods are designed to work with reasonably balanced Training Sets (TS) [1], but many real-world applications have to face imbalanced data sets [2]. A TS is said to be imbalanced when several classes are under-represented (minority classes) in comparison with others (majority classes).
Roberto Alejo
Lab. Reconocimiento de Patrones, Instituto Tecnológico de Toluca, Av. Tecnológico s/n, 52140 Metepec, México and Centro Universitario UAEM Atlacomulco, Universidad Autónoma del Estado de México, Carretera Toluca-Atlacomulco Km. 60, 50450 Atlacomulco, México
R.M. Valdovinos
Centro Universitario UAEM Valle de Chalco, Universidad Autónoma del Estado de México, Hermenegildo Galena No. 3, Col. Ma. Isabel, 56615 Valle de Chalco, México
Jose M. Sotoca · Gustavo A. Casañ
Dept. Llenguatges i Sistemes Informàtics, Universitat Jaume I, Av. Sos Baynat s/n, 12071 Castelló de la Plana, Spain
A feed-forward Neural Network (NN) trained on an imbalanced dataset is not able to learn with sufficient discrimination among the classes [3]. In particular, in the backpropagation algorithm in batch mode, the majority class dominates the training process, and therefore the minority classes converge very slowly [4]. In the machine learning field, most of the work on imbalanced problems addresses the two-class problem [5], and only a few studies discuss the multi-class imbalance problem [4, 6]. This paper focuses mainly on the evaluation of different cost functions designed to improve the NN performance. Thus, the backpropagation algorithm is modified to deal with the multi-class imbalance problem. These cost functions are calculated in relation to the proportion of samples used to train the NN. The main contributions of this paper are the comparison between different approaches that apply cost functions directly to multi-class imbalanced learning, and the study of the effect of decoupling multi-class problems into two-class imbalance problems using a modular strategy.
2 Modular and Non Modular Neural Networks
Modular Neural Networks (Mod-NNs) represent a recent trend in NN architectural design [7]. They are motivated by the highly modular nature of biological networks and are based on the "divide and conquer" approach [8]. The use of Mod-NNs implies a significant improvement of the learning process in comparison with a Non-Modular NN (Non-Mod-NN) [8]. Non-Modular classifiers tend to introduce high internal interference because a strong coupling among their hidden-layer weights can appear [9]. Mod-NNs show the following computational advantages [4]: a) the number of iterations needed to train the individual modules is smaller than the number of iterations needed to train a Non-Mod-NN for the same task; b) the modules of a Mod-NN are smaller than a Non-Mod-NN; and c) the modules can be trained independently and in parallel. Here, we use the Mod-NN architecture to face the multi-class imbalance problem. In this Mod-NN architecture, each module is a single-output NN (see Fig. 1) which determines whether a pattern belongs to a particular class. Thereby, a K-class problem is reduced to a set of K two-class problems. The module for class c_k is trained to distinguish the patterns belonging to c_k from the patterns of the rest of the classes. So, in our Mod-NN, given a test instance x_i, the two-class network with the highest rating gives the class label for that instance. Non-Mod-NNs and Mod-NN modules are studied in this work with the Radial Basis Function NN (RBFNN), the Random Vector Functional Link Net Network (RVFLNN) and the Multilayer Perceptron (MLP). The MLP and the RBFNN are two well-known NNs in the pattern recognition field [10]. The main difference from the MLP is that the activations of the hidden neurons of the RBFNN depend on the distance of an input vector to a prototype vector, whereas the MLP calculates the inner product of the input vector and the weight vector [11].
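The one-per-class decomposition described above can be sketched as follows; make_module, fit_module and the module's predict method are placeholder interfaces standing in for whichever network (MLP, RBFNN or RVFLNN) is used as a module.

def train_modular(train_X, train_y, classes, make_module, fit_module):
    # One single-output network per class c_k, trained to separate c_k from the rest
    modules = {}
    for c in classes:
        targets = [1.0 if y == c else 0.0 for y in train_y]
        net = make_module()
        fit_module(net, train_X, targets)
        modules[c] = net
    return modules

def predict_modular(modules, x):
    # The two-class module with the highest rating gives the class label
    ratings = {c: float(net.predict(x)) for c, net in modules.items()}
    return max(ratings, key=ratings.get)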
(a) Non-Modular NN
(b) Modular NN
Fig. 1 The NN architectures
Both NNs can be trained by supervised methods [10], and all parameters are adapted simultaneously by an optimization procedure. The RVFLNN is a variant of the RBFNN: the RVFLN of Pao [12] is added to the RBFNN in order to obtain the latter, which gives the extra connectivity of the FLN along with any functions put into the offset hidden neurons. The addition of connections between the hidden neurons adds extra learning power [10].
3 The Backpropagation Algorithm and the Class Imbalance Problem
Empirical studies of the backpropagation algorithm [13] show that, under class imbalance, the classes do not contribute equally to the mean square error (MSE) during the training phase; the main contribution to the MSE is produced by the majority class. Let us consider a TS with two classes such that N = \sum_{i}^{m} n_i, where n_i is the number of samples of class i. The MSE per class may be expressed as

E_i(U) = \frac{1}{N} \sum_{n=1}^{n_i} \sum_{p=1}^{L} (y_p^n - F_p^n)^2,    (1)

so that the overall MSE can be expressed as

E(U) = \sum_{i=1}^{m} E_i = E_1(U) + E_2(U).    (2)

If n_1 << n_2 then E_1(U) << E_2(U) and ∇E_1(U) << ∇E_2(U), so ∇E(U) ≈ ∇E_2(U). Hence −∇E(U) is not always the best direction in which to minimize the MSE of both classes.
Considering that the imbalance of the TS affects the backpropagation algorithm negatively through these disproportionate contributions to the MSE, it is possible to introduce a cost function γ that balances the class imbalance of the TS as follows:

E(U) = \sum_{m=1}^{M} \gamma(m) E_m = \gamma(1) E_1(U) + \gamma(2) E_2(U) = \frac{1}{N} \sum_{m=1}^{M} \gamma(m) \sum_{n=1}^{n_m} \sum_{p=1}^{L} (y_p^n - F_p^n)^2,    (3)

where γ(1)∇E_1(U) ≈ γ(2)∇E_2(U), which avoids the minority class being ignored in the learning process [14]. The previous process can be generalized to multi-class problems [6] to obtain the following cost functions:
• Option 0: γ(m) = 1, the backpropagation algorithm without modifications.
• Option 1: γ(m) = n_max / n_m, where m = 1, ..., M and n_max is the number of samples of the majority class.
• Option 2: γ(m) = N / n_m, where m = 1, ..., M and N is the total number of samples.
• Option 3: γ(m) = ∇E_max(U) / ∇E_m(U), where ∇E_max(U) corresponds to the majority class. This function is a simplification of [13].
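A small Python sketch of these per-class weights and of the weighted error of equation (3) is given below; it only illustrates how the γ(m) values scale each class's contribution, and the function names are illustrative rather than the authors' code.

import numpy as np

def class_weights(counts, option, grad_norms=None):
    # counts[m]: number of training samples n_m of class m
    counts = np.asarray(counts, dtype=float)
    if option == 0:
        return np.ones_like(counts)                 # plain backpropagation
    if option == 1:
        return counts.max() / counts                # gamma(m) = n_max / n_m
    if option == 2:
        return counts.sum() / counts                # gamma(m) = N / n_m
    if option == 3:
        grad_norms = np.asarray(grad_norms, dtype=float)
        return grad_norms.max() / grad_norms        # gamma(m) = |grad E_max| / |grad E_m|
    raise ValueError("unknown option")

def weighted_mse(sq_errors_by_class, gamma):
    # sq_errors_by_class[m]: array of squared output errors for the samples of class m
    n_total = sum(len(e) for e in sq_errors_by_class)
    return sum(g * e.sum() for g, e in zip(gamma, sq_errors_by_class)) / n_total   # eq. (3)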
4 Data Sets
For the experimental phase, the Cayo, Ecoli6, Feltwell and Satimage databases, all with multiple classes, were used. Feltwell corresponds to an agricultural region near the village of Feltwell (UK) and is divided into training data (5124 samples) and test data (5820 samples). Cayo, which represents a particular region in the Gulf of Mexico, was partitioned using the holdout method (50% training and 50% test). Both are remote sensing images. Ecoli6 is obtained from Ecoli, a biological database created by the Institute of Molecular and Cellular Biology of Osaka University, Japan. It was originally distributed in eight classes, but in this work classes 7 and 8 have been eliminated since they only have two samples, which makes it difficult to apply cross validation. The Ecoli6 database was split using five-fold cross validation into 10 parts (5 training and 5 test), using an 80-20 proportion. Satimage was obtained from the UCI Machine Learning Database Repository and used
Table 1 A brief summary of some basic characteristics of the databases

Dataset    Size    Attr.   Class   Class distribution
Cayo       6019    4       11      838/293/624/322/133/369/324/722/789/833/772
Ecoli6     332     7       6       5/143/77/52/35/20
Feltwell   10944   15      5       3531/2441/896/2295/1781
Satimage   6430    36      6       1508/1531/703/1356/625/707
without changes. The data in Satimage is divided into: 4435 training and 200 testing samples. In Table 1, the most important characteristics of each database are summarized.
5 Methodology
The NNs were trained with the backpropagation algorithm in batch mode. This process was repeated five times and the reported results correspond to the average. The K-Hold-Out Paired t and K-Fold Cross-Validation Paired t statistical tests [15] were applied. We assume that the set of differences is an independently drawn sample from an approximately normal distribution; then, under the null hypothesis (equal classification accuracies), the corresponding statistic has a Student's t distribution with K − 1 degrees of freedom. The learning rate (η) was set to 0.0001 for the RBFNN and the RVFLNN, and to 0.9 for the MLP. Only one hidden layer was used in the latter case. The number of neurons in the hidden layer (for all NNs) was set to 16, 15, 6 and 12 for Cayo, Ecoli6, Feltwell and Satimage, respectively. In this study, Accuracy, g-mean and the Kappa coefficient are used as performance measures for the classifiers. These measures are obtained from the confusion matrix, where the real classes are in the columns and the predicted ones appear in the rows (Table 2). A table built in this way gives a general view of the assignments, both the correct ones (diagonal elements) and the wrong ones (elements outside the diagonal). From Table 2, the measure Accuracy = \sum_{i=1}^{k} n_{ii} / n is obtained, where n is the total number of samples, and n_{jj} / n_{+j} is the accuracy of class j. The proportion of samples p_{ij} in cell (i, j) corresponds to the number of samples n_{ij}, i.e. p_{ij} = n_{ij} / n. Define p_{i+} and p_{+j} as p_{i+} = \sum_{j=1}^{k} p_{ij} and p_{+j} = \sum_{i=1}^{k} p_{ij}. The Kappa coefficient is used as a quality parameter that takes into consideration the marginal distributions of the confusion matrix; its value gives an idea of the percentage of correct classifications once the random part has been eliminated. It is defined as Kappa = (p_o − p_c) / (1 − p_c), where p_o = \sum_{i=1}^{k} p_{ii} is the correctly predicted proportion and p_c = \sum_{i=1}^{k} p_{i+} p_{+i} is the random agreement coefficient.
Table 2 Confusion matrix

                            Real Classes
Predicted Classes    1       2       ...    k       total (n_i+)
1                    n_11    n_12    ...    n_1k    n_1+
2                    n_21    n_22    ...    n_2k    n_2+
...
k                    n_k1    n_k2    ...    n_kk    n_k+
total (n_+j)         n_+1    n_+2    ...    n_+k    n
Another measure used to quantify the classifier performance in the class imbalance problem is the geometric mean (g-mean) [14], defined as g-mean = (\prod_{i=1}^{k} p_{ii})^{1/k}, where p_{ii} is the accuracy of class i.
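The three measures can be computed from a confusion matrix as in the following Python sketch; here the per-class accuracies used in the g-mean are taken as n_ii / n_+i, which is an interpretation of the notation above rather than code from the paper.

import numpy as np

def performance_measures(cm):
    # cm[i, j]: number of samples of real class j assigned to predicted class i
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p = cm / n
    accuracy = np.trace(p)                             # sum of n_ii / n
    p_row, p_col = p.sum(axis=1), p.sum(axis=0)        # p_{i+} and p_{+j}
    p_c = (p_row * p_col).sum()
    kappa = (accuracy - p_c) / (1.0 - p_c)
    class_acc = np.diag(cm) / cm.sum(axis=0)           # accuracy of class i
    g_mean = float(np.prod(class_acc) ** (1.0 / cm.shape[0]))
    return accuracy, kappa, g_mean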
6 Experiments and Discussions
As can be seen in Table 1, the databases have different levels of TS imbalance, ranging from a moderate imbalance to a severe imbalance within the same data set. For example, in the Ecoli6 database, classes 1 and 2 present a severe imbalance between them (the majority class has 143 samples and the minority class only 5), while classes 5 and 6 show only a moderate imbalance. A similar situation is observed in Cayo, while in the Feltwell and Satimage databases only a moderate imbalance between classes can be noticed. In the classification performance tables, the first column shows the NN used and the second the evaluation criterion; the following columns contain the values observed for each of the applied strategies. The values in parentheses represent the standard deviation.
6.1 Results with Non-Mod-NN
The experience with the Cayo dataset shows the low performance of the classifier when the imbalanced samples are used in the training phase without a cost function (Option 0). In Table 3, a g-mean of zero is observed in the three network models when no cost function is applied. As expected, the classes that are not correctly identified are the minority classes (4 and 5), whose impact on the global performance is small. In the three network models, using cost functions improves the results when the g-mean has a low value in Option 0 (for the Cayo database it takes the value zero). The improvements in classifier accuracy are due to the increase in the accuracy of the minority classes when cost functions are applied in the training phase, and consequently these classes are identified correctly in the classification phase.
Table 3 Classification performance of Cayo dataset with Non Modular Networks
                   Option 0     Option 1     Option 2     Option 3
MLP      Acc       76.7(1.8)    85.2(0.4)    85.9(0.3)    85.1(0.2)
         g-mean     0.0(0.0)    82.6(0.3)    84.1(0.7)    80.2(0.4)
         Kappa      0.7(0.0)     0.8(0.0)     0.8(0.0)     0.8(0.0)
RBFNN    Acc       75.0(4.5)    75.6(3.9)    78.4(4.8)    81.0(1.5)
         g-mean     0.0(0.0)    72.7(4.5)    73.7(4.8)    74.9(2.4)
         Kappa      0.7(0.1)     0.7(0.0)     0.8(0.1)     0.8(0.0)
RVFLNN   Acc       70.8(4.3)    75.4(2.1)    78.8(2.5)    82.0(1.4)
         g-mean     0.0(0.0)    73.1(2.4)    73.5(3.1)    74.1(2.4)
         Kappa      0.7(0.1)     0.7(0.0)     0.8(0.0)     0.8(0.0)
Table 4 Classification performance of Ecoli6 dataset with Non Modular Networks
                   Option 0      Option 1     Option 2     Option 3
MLP      Acc       87.1(2.7)     82.8(6.8)    82.2(5.9)    85.8(5.2)
         g-mean    84.5(5.7)     82.1(6.2)    81.7(5.3)    84.4(6.7)
         Kappa      0.8(0.0)      0.8(0.1)     0.8(0.1)     0.8(0.1)
RBFNN    Acc       83.4(5.5)     83.1(5.4)    84.6(4.9)    84.6(5.5)
         g-mean    66.2(37.4)    84.1(3.6)    84.4(5.4)    84.3(7.3)
         Kappa      0.8(0.1)      0.8(0.1)     0.8(0.1)     0.8(0.1)
RVFLNN   Acc       84.9(4.3)     83.1(7.4)    84.9(4.4)    83.1(6.7)
         g-mean    62.0(35.4)    84.4(6.2)    84.1(6.2)    84.1(6.6)
         Kappa      0.8(0.1)      0.8(0.1)     0.8(0.1)     0.8(0.1)
The options with the best results are Option 2 for the MLP and Option 3 for the RBFNN and the RVFLNN. In the Ecoli6 data set (Table 4) the g-mean values of Option 0 are already rather high, especially for the MLP; thus, the use of cost functions does not bring an improvement in the results, and in some cases it gives worse results. This is probably related to the small size of the database. The Feltwell dataset presents a situation similar to Ecoli6, with high g-mean values, but when cost functions are used the accuracy results are similar or slightly better. The difference seems to come from the number of samples, which is significantly larger in Feltwell. Option 2 shows the best results for the MLP and RBFNN models, while Options 1 and 3 do the same for the RVFLNN. In the Satimage dataset (Table 6) all the implemented Options report improvements in the global classifier performance; important improvements in accuracy and g-mean can be observed (especially for the MLP) when Options 1, 2 and 3 are applied. The best results are obtained with Option 2 for the MLP and RBFNN and Option 3 for the RVFLNN.
Table 5 Classification performance of Feltwell dataset with Non Modular Networks
                   Option 0      Option 1     Option 2     Option 3
MLP      Acc       88.6(1.3)     88.8(0.7)    89.3(1.2)    88.8(0.4)
         g-mean    84.7(3.2)     87.0(0.8)    87.4(1.3)    86.6(0.4)
         Kappa      0.9(0.0)      0.9(0.0)     0.9(0.0)     0.9(0.0)
RBFNN    Acc       87.0(0.8)     86.5(1.8)    87.8(1.8)    84.1(3.0)
         g-mean    81.1(2.5)     84.9(2.2)    86.3(1.7)    78.2(7.2)
         Kappa      0.8(0.0)      0.8(0.0)     0.8(0.0)     0.8(0.0)
RVFLNN   Acc       86.9(1.8)     88.0(0.8)    87.0(2.9)    88.1(1.1)
         g-mean    74.0(15.6)    85.8(0.7)    84.5(4.1)    84.7(2.4)
         Kappa      0.8(0.0)      0.8(0.0)     0.8(0.0)     0.8(0.0)
Table 6 Classification performance of Satimage dataset with Non Modular Networks
                   Option 0     Option 1     Option 2     Option 3
MLP      Acc       82.2(0.2)    85.6(0.6)    87.4(0.5)    86.3(0.5)
         g-mean    50.8(2.8)    84.2(0.6)    86.3(0.4)    83.6(1.1)
         Kappa     0.78(0.0)     0.8(0.0)     0.8(0.0)     0.8(0.0)
RBFNN    Acc       83.2(0.7)    83.3(0.6)    85.6(0.5)    84.3(1.3)
         g-mean    76.8(1.8)    81.6(1.2)    84.6(0.5)    81.4(2.5)
         Kappa      0.8(0.0)     0.8(0.0)     0.8(0.0)     0.8(0.0)
RVFLNN   Acc       82.8(1.2)    83.2(1.3)    84.5(0.3)    85.3(0.3)
         g-mean    74.1(1.9)    81.1(1.2)    82.8(0.5)    82.2(0.5)
         Kappa      0.8(0.0)     0.8(0.0)     0.8(0.0)     0.8(0.0)
Table 7 Classification performance of Cayo with Modular Networks
                   Option 0      Option 1      Option 2     Option 3
MLP      Acc       71.4(1.4)     83.2(0.3)     83.4(0.4)    83.0(0.7)
         g-mean     0.0(0.0)     79.1(1.0)     79.8(0.3)    77.4(0.5)
         Kappa      0.7(0.0)      0.8(0.0)      0.8(0.0)     0.8(0.0)
RBFNN    Acc       76.6(4.2)     82.3(0.5)     80.6(1.3)    81.7(3.0)
         g-mean     0.0(0.0)     78.8(1.9)     77.8(2.4)    76.5(2.6)
         Kappa     0.74(0.0)      0.8(0.0)     0.78(0.0)     0.8(0.0)
RVFLNN   Acc       83.18(1.3)    79.8(5.9)     81.5(1.8)    83.19(1.2)
         g-mean    74.0(3.5)     68.1(13.9)    77.3(2.9)    77.5(2.0)
         Kappa      0.8(0.0)      0.8(0.1)      0.8(0.0)     0.8(0.0)
6.2 Results with Mod-NNs
For the Cayo dataset with Modular NNs, as happened with the Non-Modular NNs, the use of cost functions improved the accuracy results, except with the RVFLNN. It is important to note that the behavior of the Modular and Non-Modular NNs is different for the same database (compare the classification accuracy of the three different NNs with Option 0 in Tables 3 and 7). For the Cayo dataset with Modular NNs, the best results were obtained with Option 1 for the MLP and RBFNN, and with Option 3 for the RVFLNN. The Ecoli6 database with Modular NNs presents a behavior similar to the Non-Modular NNs: using cost functions does not bring an improvement in the results (Table 8). When Modular NNs and cost functions are used with the Feltwell dataset, several behaviors can be seen (Table 9). We can note that for the MLP the results are worse when the Options are applied, but for the RBFNN and RVFLNN there is a slight improvement in the global performance; only Option 3 reports a better classifier performance.
Table 8 Classification performance of Ecoli6 with Modular Networks
                   Option 0     Option 1     Option 2     Option 3
MLP      Acc       87.1(2.9)    84.9(3.5)    85.2(3.9)    85.5(3.9)
         g-mean    83.7(5.9)    83.3(3.5)    83.8(3.6)    83.3(5.9)
         Kappa      0.8(0.0)     0.8(0.0)     0.8(0.0)     0.8(0.0)
RBFNN    Acc       86.4(3.0)    85.2(7.4)    84.9(5.0)    85.2(6.0)
         g-mean    84.6(4.9)    84.7(8.2)    84.9(5.4)    86.0(4.8)
         Kappa      0.8(0.0)     0.8(0.1)     0.8(0.0)     0.8(0.0)
RVFLNN   Acc       87.9(2.8)    85.9(5.8)    85.2(3.4)    87.0(5.0)
         g-mean    85.0(5.7)    84.9(4.6)    85.6(3.6)    85.6(6.0)
         Kappa      0.8(0.0)     0.8(0.1)     0.8(0.0)     0.8(0.1)
Table 9 Classification performance of Feltwell with Modular Networks
                   Option 0     Option 1     Option 2     Option 3
MLP      Acc       89.8(0.1)    87.2(0.3)    86.8(0.4)    88.6(0.4)
         g-mean    87.1(0.2)    84.7(0.3)    84.3(0.4)    85.9(0.5)
         Kappa      0.9(0.0)     0.8(0.0)     0.8(0.0)     0.8(0.0)
RBFNN    Acc       89.5(0.6)    89.0(1.0)    89.0(1.0)    90.4(0.5)
         g-mean    87.4(1.0)    87.1(1.7)    87.0(1.1)    88.2(0.2)
         Kappa      0.9(0.0)     0.9(0.0)     0.8(0.0)     0.8(0.0)
RVFLNN   Acc       88.3(2.0)    88.7(1.0)    89.9(1.2)    90.7(0.6)
         g-mean    82.4(4.2)    86.1(1.0)    88.1(1.2)    88.2(1.0)
         Kappa      0.8(0.0)     0.9(0.0)     0.8(0.0)     0.8(0.0)
Table 10 Classification performance of Satimage with Modular Networks
                   Option 0     Option 1     Option 2     Option 3
MLP      Acc       81.8(0.8)    85.4(0.6)    84.7(0.5)    84.8(0.3)
         g-mean    47.5(2.8)    82.8(1.0)    82.8(0.9)    81.4(0.8)
         Kappa      0.8(0.0)     0.8(0.0)     0.8(0.0)     0.8(0.0)
RBFNN    Acc       83.9(1.5)    85.4(1.4)    86.2(0.8)    85.0(1.3)
         g-mean    73.9(5.9)    84.0(1.6)    84.4(0.9)    81.5(1.5)
         Kappa     0.8(0.02)     0.8(0.0)     0.8(0.0)     0.8(0.0)
RVFLNN   Acc       84.9(1.0)    86.0(0.4)    86.0(0.1)    85.9(1.0)
         g-mean    77.3(1.3)    83.9(0.2)    83.9(0.4)    82.5(1.0)
         Kappa      0.8(0.0)     0.8(0.0)     0.8(0.0)     0.8(0.0)
In Satimage, a better classifier performance is observed when the cost functions are used (Table 10). Option 1 reports better accuracy results for the MLP and RVFLNN models, and a similar improvement is obtained with Option 2 for the RBFNN.
7 Conclusions
In this work, the class imbalance problem for multiple classes is analyzed by means of Modular and Non Modular NNs trained with the backpropagation algorithm in batch mode on four different databases. Three strategies have been studied in order to balance the contribution of each class to the MSE; each strategy consists of using a different cost function in the training algorithm. The proposed strategies improve the performance of the Modular and Non Modular NNs (implemented for the MLP, RBFNN and RVFLNN models) on the less represented classes of the TS, reducing the class imbalance problem in the training process with the additional objective of improving the general performance during the classification phase. In some cases, however, they do not improve the global results; the clearest case is the Ecoli6 database, which has a very small sample set. We cannot decide which cost function (Option 1, 2 or 3) is better: with Non Modular NNs, Option 1 shows statistically significant improvements in 8.3% of the experiments, Option 2 in about 41.6% and Option 3 in 33.3%, while with Modular NNs the figures are 33.3% for Options 1 and 3 and 16.7% for Option 2. Future research should consider the relationship between TS imbalance and data complexity (overlapping, noise or decision frontiers) and include techniques to reduce the data complexity. Severe TS imbalance (for example in remote perception images) must also be further considered.
Acknowledgements. This work has been partially supported by grants DPI2006-15542-C04-03 from the Spanish CICYT, SEP-2003-C02-44225 from the Mexican CONACyT, Generalitat Valenciana under the project GV/2007/105, CB-2008-01-107085 from the Mexican CONACyT and PROMEP/103.5/08/3016 from the Mexican SEP.
References 1. Japkowicz, N., Stephen, S.: The Class Imbalance Problem: a Systematic Study. Intelligent Data Analysis 6, 429–449 (2002) 2. Kotsiantis, S., Pintelas, P.: Mixture of Expert Agents for Handling Imbalanced Data Sets. Annals of Mathematics and Computing & TeleInformatics 1, 46–55 (2003) 3. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: Synthetic Minority Over-Sampling Technique. J. Artif. Intell. Res. (JAIR) 16, 321–357 (2002) 4. Anand, R., Mehrotra, K., Mohan, C.K., Ranka, S.: Efficient Classification for Multiclass Problems Using Modular Neural Networks. IEEE Transactions on Neural Networks 6, 117–124 (1995) 5. Zhou, Z.H., Liu, X.Y.: Training Cost-Sensitive Neural Networks with Methods Addressing the Class Imbalance Problem. IEEE Transactions on Knowledge and Data Engineering 18, 63–77 (2006) 6. Bruzzone, L., Serpico, S.B.: Classification of Imbalanced Remote-Sensing Data by Neural Networks. Pattern Recognition Letters 18, 1323–1328 (1997) 7. Auda, G., Kamel, M.: Modular Neural Network Classifiers: A Comparative Study. Journal of Intelligent and Robotic Systems 21, 117–129 (1998)
8. Eric, R., Gawthrop, P.: Modular Neural Networks: A State of the Art. Technical Report CSC-95026, Centre for System and Control. Faculty of mechanical Engineering, University of Glasgow, UK (1995) 9. Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, S.J.: Adaptive Mixtures of Local Experts. Neural Computation 3, 79–87 (1991) 10. Looney, C.: Pattern Recognition Using Neuronal Networks - Theory and Algorithms for Engineers and Scientists, 1st edn. Oxford University Press, New York (1997) 11. Ding, C., Xiang, S.Q.: From Multilayer Perceptrons to Radial Basis Function Networks: A Comparative Study. In: EEE. Conference on Cybernetics and Intelligent Systems, vol. 1, pp. 69–74 (2004) 12. Pao, Y.H., Park, G.H., Sobajic, D.J.: Learning and Generalization Characteristics of the Random Vector Functional-Link Net. Neurocomputing 6, 163–180 (1994) 13. Anand, R., Mehrotra, K.G., Mohan, C.K., Ranka, S.: An Improved Algorithm for Neural Network Classification of Imbalanced Training Sets. IEEE Transactions on Neural Networks 4, 962–969 (1993) 14. Alejo, R., Garcia, V., Sotoca, J.M., Mollineda, R.A., Sanchez, J.S.: Improving the Performance of the RBF Neural Networks with Imbalanced Samples. In: 9th International Work-Conference on Artificial Neural Networks, pp. 162–169. Springer, Heidelberg (2007) 15. Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. WileyInterscience, Hoboken (2004)
Geometry Algebra Neuron Based on Biomimetic Pattern Recognition
Wenming Cao and Feng Hao
Abstract. Biomimetic Pattern Recognition aims at finding the best coverage of the distribution of each kind of sample in the feature space. It is based on the analysis of the relationship of the sample points in the feature space. According to the principle of "same source", studying the distribution of samples of the same kind in the feature space can yield eigenvector information with a low amount of data. This can be realized by the "coverage recognition method of complex geometric bodies in high-dimensional space". The self-adaptive topological structure of the high-dimensional geometrical neuron model offers a theoretical basis for its realization, but it has so far been investigated only in a 2D sample space. In this paper we extend it to the space of multispectral image samples by means of Clifford algebra, and propose the geometry algebra neuron based on biomimetic pattern recognition theory. The experimental results prove the effectiveness of our theory.
Keywords: Biomimetic Pattern Recognition, Geometry Algebra, Geometry Algebra Neuron, Multispectral Image sample.
1 Introduction

There are finite or infinite sample points between two different samples of the same kind of thing. We call this property "same source". In mathematical language, it is described by the set

B = {x_1 = x, x_2, ..., x_n = y | ρ(x_m, x_{m+1}) < ξ, m ∈ [1, n−1], m, n ∈ N, ξ > 0} ⊂ A,   (1)

where A is the point set consisting of the things belonging to class A in the feature space R^n. In this paper, based on biomimetic pattern recognition, we consider the concept of
color and multispectral image recognition [1]. A multispectral (multicolor, multicomponent) image contains more than one component. An RGB image is an example of a color image featuring three separate image components: R (red), G (green), and B (blue). In the classical approach, every multicolor pixel (in particular, every color pixel) is associated with a point of a kD multicolor vector space (a point of a 3D RGB vector space for color images) [2-4]. Our proposed approach is that each multicolor image pixel is considered not as a kD vector, but as a multiplet number, and the multicolor (color) space is identified with the so-called multiplet (triplet) algebra. The aim is to show that the use of Clifford algebras [5] fits the tasks of multicolor image processing and recognition of multicolor patterns more naturally than does the use of color vector spaces. We give algebraic models of multispectral images using different hypercomplex [6] and Clifford algebras and mention new methods of image recognition based on an algebraic-geometric theory of invariants. In this approach, each color or multicolor pixel is considered not as a kD vector, but as a kD hypercomplex number. The remainder of this paper is organized as follows. Section 2 describes Clifford algebras as models of physical spaces and the biomimetic pattern recognition theory based on Clifford algebra, and proposes the metric between two Clifford numbers. Section 3 introduces the geometric algebra neuron. Experimental results and analysis, and conclusions are given in Section 4 and Section 5, respectively.
2 Geometric Algebra

We suppose that some hypercomplex-valued invariants of an image are calculated when recognizing it. Hypercomplex algebras generalize the algebras of complex numbers, quaternions and octonions. Of course, the algebraic nature of hypercomplex numbers must correspond to the spaces with respect to geometrically perceivable properties. For recognition of 2D, 3D and nD images we turn the spaces R^2, R^3 and R^n into corresponding algebras of hypercomplex numbers. Let the "small" nD space R^n be spanned by the orthonormal basis of n hyperimaginary units I_i, i = 1, 2, ..., n. We assume

I_i^2 = +1,  i = 1, 2, ..., p,
I_i^2 = −1,  i = p+1, p+2, ..., p+q,                    (2)
I_i^2 = 0,   i = p+q+1, p+q+2, ..., p+q+r = n,

and I_i I_j = −I_j I_i for i ≠ j. Now, we construct the "big" 2^n-D hypercomplex space R^{2^n}. Let b = (b_1, b_2, ..., b_n) ∈ B_2^n be an arbitrary n-bit vector, where b_i ∈ B_2 = {0, 1} and B_2^n is the nD Boolean algebra. Let us introduce I_b := I_1^{b_1} I_2^{b_2} ... I_n^{b_n}. Then the 2^n elements I_b form a basis of the 2^n-D space, i.e., for all M ∈ R^{2^n} we have M := Σ_{b∈B_2^n} c_b I_b. We denote these algebras by A_{2^n}^{Sp(p,q,r)}(R | I_1, ..., I_n), A_{2^n}^{Sp(p,q,r)} or A_{2^n}^{Sp}, if I_1, ..., I_n, p, q, r are fixed.

The conjugation operation in A_{2^n}^{Sp(p,q,r)} maps every Clifford number e := c_0 I_0 + Σ_{b≠0} c_b I_b to the number conj(e) = c_0 I_0 − Σ_{b≠0} c_b I_b. The algebras A_{2^n}^{Sp(p,q,r)} are transformed into 2^n-D pseudo-metric spaces designated by eL_{2^n}^{Sp(p,q,r)} or eL_{2^n}^{p,q,r}, if the pseudo-distance between two Clifford numbers A and B is defined by ρ(A, B) = √((A − B) · conj(A − B)). The subspaces of pure vector Clifford numbers x_1 I_1 + ... + x_n I_n ∈ Vec_1(A_{2^n}^{Sp}) are nD spaces R^n := gR_n^{Sp(p,q,r)}. The pseudo-metrics constructed in eL_{2^n}^{Sp(p,q,r)} induce corresponding pseudo-metrics in gR_n^{Sp(p,q,r)}.

We can make A_{2^n}^{Sp} a ranked and Z/2Z-graded algebra. Let r(b) be the Hamming weight (= rank) of b, i.e., a functional r: B_2^n → [0, n−1] defined by r(b) := Σ_{i=1}^{n} b_i, and let ∂(b) = r(b) (mod 2) be the grade of b. Then A_{2^n}^{Sp} can be represented as the ranked and Z/2Z-graded sums A_{2^n}^{Sp} = ⊕_{r=0}^{n} A_{2^n}^{[r]} and R^{2^n} = ⊕_{∂=0}^{1} A_{2^n}^{{∂}}, where the dimension of the vector space A_{2^n}^{[k]} equals the binomial coefficient C_n^k and Σ_{k=0}^{n} C_n^k = 2^n. The dimensions of A_{2^n}^{{0}} and A_{2^n}^{{1}} are equal to 2^{n−1}. The subspaces A_{2^n}^{[k]} are spanned by the k-products of units I_{i_1} I_{i_2} ... I_{i_k} (i_1 < i_2 < ... < i_k), i.e., by all basis vectors with r(b) = k. Every element e := Σ_{b∈B_2^n} c_b I_b of A_{2^n}^{Sp} has the representations e = e^{[0]} + e^{[1]} + ... + e^{[n]} and e = e^{{0}} + e^{{1}}, where e^{[0]} ∈ A_{2^n}^{[0]} is the scalar part of the Clifford number, e^{[1]} ∈ A_{2^n}^{[1]} is its vector part, e^{[2]} ∈ A_{2^n}^{[2]} is its bivector part, ..., e^{[n]} ∈ A_{2^n}^{[n]} is its n-vector part, and, finally, e^{{0}} and e^{{1}} are the even and odd parts of the Clifford number e. If e ∈ A_{2^n}^{{l}}, we put ∂(e) = l and say that l is the degree of e. Multiplication of two Clifford numbers of ranks k and s gives the sum of Clifford numbers of ranks from |k − s| to p = min(k + s, 2n − k − s) with increment 2, i.e., A^{[k]} B^{[s]} = e^{[|k−s|]} + e^{[|k−s|+2]} + ... + e^{[p]}.

In A_2(R) we introduce a conjugation operation which maps every element z = x_1 + I x_2 to the element conj(z) = x_1 − I x_2. Now, the generalized complex plane is turned into the pseudo-metric space A_2(R) → ge_2^{Sp(p,q,r)} if one defines the pseudo-distance as

ρ(z_1, z_2) = √((x_2 − x_1)^2 + (y_2 − y_1)^2),  if z ∈ A_2(R | i);
ρ(z_1, z_2) = √((x_2 − x_1)^2 − (y_2 − y_1)^2),  if z ∈ A_2(R | e);          (3)
ρ(z_1, z_2) = |x_2 − x_1|,                        if z ∈ A_2(R | ε),

where z_1 := (x_1, x_2) = x_1 + I x_2 and z_2 := (y_1, y_2) = y_1 + I y_2. So, the plane of the classical complex numbers is the 2D Euclidean space ge_2^{Sp(2,0,0)}, the double numbers plane is the 2D Minkowskian space ge_2^{Sp(1,1,0)}, and the dual numbers plane is the 2D Galilean space ge_2^{Sp(1,0,1)}. When one speaks about all three algebras (or geometries) simultaneously, then the corresponding algebra (or geometry) is that of generalized complex numbers, denoted by A_2^{Sp(p,q,r)} (or ge_2^{Sp(p,q,r)}).
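To make the construction above concrete, the following sketch (Python; it is ours, not part of the paper) enumerates the 2^n basis blades I_b indexed by the bit vectors b ∈ B_2^n, returns the square of each generator under the signature (p, q, r) of (2), and evaluates the three pseudo-distances of (3). All function names are illustrative, and returning the magnitude of a negative Minkowskian interval is our choice.

```python
import itertools
import math

def blade_basis(n):
    """Enumerate the 2**n basis blades I_b, one per n-bit vector b in B_2^n."""
    return list(itertools.product((0, 1), repeat=n))

def generator_square(i, p, q, r):
    """Square of the i-th hyperimaginary unit (0-based index) under (p, q, r), cf. (2)."""
    assert i < p + q + r
    if i < p:
        return 1       # I_i^2 = +1
    if i < p + q:
        return -1      # I_i^2 = -1
    return 0           # I_i^2 = 0 (degenerate directions)

def blade_rank(b):
    """Hamming weight r(b), i.e. the grade of the blade I_b."""
    return sum(b)

def pseudo_distance(z1, z2, kind):
    """Pseudo-distance (3) on the generalized complex plane.
    kind: 'i' (complex), 'e' (double), 'eps' (dual); z1, z2 are coordinate pairs."""
    dx, dy = z2[0] - z1[0], z2[1] - z1[1]
    if kind == 'i':            # Euclidean plane ge_2^{Sp(2,0,0)}
        return math.sqrt(dx * dx + dy * dy)
    if kind == 'e':            # Minkowskian plane ge_2^{Sp(1,1,0)}
        return math.sqrt(abs(dx * dx - dy * dy))
    return abs(dx)             # Galilean plane ge_2^{Sp(1,0,1)}

if __name__ == "__main__":
    n = 3                                   # e.g. one generator per RGB channel
    blades = blade_basis(n)
    print(len(blades), "basis blades, grades:", sorted(blade_rank(b) for b in blades))
    print(pseudo_distance((1.0, 2.0), (4.0, 6.0), 'i'))   # 5.0
```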
3 Geometric Algebra Neuron

3.1 Geometric Algebra Neuron

A mathematical description of the GA neuron P of class A is

P = ∪_i P_i,   (4)

P_i = {X | ρ(X, Y) ≤ k, Y ∈ B_i, X = e + x, x ∈ G^n},   (5)

B_i = {X | X = α s_i + (1−α) s_{i+1}, α ∈ [0, 1]},   (6)

where k is a constant, G^n is the n-grade geometric algebra, and s_i is a sample of class A in the feature space. Let ρ(X, L) be the distance from X to the line vector L, where L = p ∧ v, p is a point on the line L, and v is the direction vector of L. Then
ρ^2(X, L) = ρ^2(X, p),      if ρ(X, p) ≤ ρ(X, p + v) and …;
ρ^2(X, L) = ρ^2(X, p + v),  if ρ(X, p + v) ≤ ρ(X, p) and …;          (7)
ρ^2(X, L) = ρ^2(X, X′),     otherwise,

where

i = (p ∧ v)/|p ∧ v|,   X′ = [x ∧ (exp(−iπ/2)v)] / [(exp(−iπ/2)v) ∧ v] · v + [x ∧ p] / [v ∧ (exp(−iπ/2)v)] · exp(−iπ/2)v.
And the GA neuron S(L; r) is

S(L; r) = {X | ρ^2(X, L) < r^2}.   (8)

If L is a point (i.e., v is a scalar) in the feature space, then d(X, L) is equivalent to d(x, p), and S(L; r) is equivalent to S(p; r) (the GA neuron becomes a hypersphere). If v is a vector, then the GA neuron S(L; r) is a compact coverage compared with a hypersphere. A new neuron model, the GA neuron, is defined by the input-output transfer function

f(X; L) = φ(ρ(X, L)),   (9)

where φ(·) is the non-linearity of the neuron, the input vector X = e + x, x ∈ G^n, and L = p ∧ v, p is a point on the line L, and v is the direction vector of L. A typical choice of φ(·) was the Gaussian function used in data fitting, and the threshold function was a variant of the Multiple Weights Neural Network (MWNN) [9, 10, 11]. The network used in data fitting or system control consisted of an input layer of source nodes, a single hidden layer, and an output layer of linear weights. The network implemented the mapping

f_s(X) = λ_0 + Σ_{i=1}^{n_s} λ_i φ(ρ(X, L_i)),   (10)

where λ_i, 0 ≤ i ≤ n_s, were weights or parameters. The output layer made a decision on the outputs of the hidden layer. One mapped

f_s(X) = max_{i=1,...,n_s} φ(ρ(X, L_i)),   (11)

where φ(·) was a threshold function. In this paper, n_s = 3.
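The GA neuron of (7)-(11) can be sketched in ordinary vector form. Because part of the conditions in (7) is illegible above, the sketch below uses the usual point-to-segment distance (project X onto the segment from p to p + v and clamp the projection); this reading, the Gaussian width r, and all function names are our assumptions rather than the paper's code.

```python
import numpy as np

def rho_to_segment(X, p, v):
    """Squared distance from X to the segment {p + a*v : a in [0, 1]}.
    One common reading of (7): distance to the nearer endpoint when the
    projection X' falls outside the segment, perpendicular distance otherwise."""
    X, p, v = map(np.asarray, (X, p, v))
    a = np.dot(X - p, v) / np.dot(v, v)        # position of the foot X' along v
    a = np.clip(a, 0.0, 1.0)
    X_prime = p + a * v
    return float(np.dot(X - X_prime, X - X_prime))

def phi(rho2, r=1.0):
    """Gaussian non-linearity of (9); r plays the role of the coverage radius."""
    return np.exp(-rho2 / (r * r))

def network_output(X, segments, lambdas, lam0=0.0):
    """Mapping (10): f_s(X) = lam0 + sum_i lambda_i * phi(rho(X, L_i))."""
    return lam0 + sum(l * phi(rho_to_segment(X, p, v))
                      for l, (p, v) in zip(lambdas, segments))

def max_output(X, segments):
    """Mapping (11): f_s(X) = max_i phi(rho(X, L_i))."""
    return max(phi(rho_to_segment(X, p, v)) for p, v in segments)

# toy usage: three segments (n_s = 3, as in the paper) covering one class
segments = [(np.zeros(4), np.ones(4)),
            (np.ones(4), np.ones(4)),
            (2 * np.ones(4), np.ones(4))]
print(max_output(0.5 * np.ones(4), segments))   # 1.0: X lies on the first segment
```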
3.2 GA Neuron Classifier

The function of a classifier in pattern recognition is to label the tested objects according to the eigenvectors produced by the feature extractor [8]. A classifier can be considered as a net or machine which computes r discriminant functions g_k(x), k = 1, ..., r, and selects the class corresponding to the maximum of these functions. There are many ways to describe a pattern classifier; the most usual is the following form: the classifier judges the eigenvector x as pattern k if

g_k(x) > g_j(x) for all j ≠ k.   (12)

The discriminant function of the GA neuron classifier is equation (12), that is, g_k(X) = f_s(X) = max_{i=1,...,n_s} φ(ρ(X, L_i)).
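A minimal multi-class decision rule following (12), under the same point-to-segment reading as the previous sketch; the rejection threshold is our addition (the text reports refused recognitions in Section 4 but does not give the rule explicitly), and all names are illustrative.

```python
import numpy as np

def phi_rho(X, p, v, r=1.0):
    """phi(rho(X, L)) for one segment L from p to p + v, with Gaussian phi."""
    a = np.clip(np.dot(X - p, v) / np.dot(v, v), 0.0, 1.0)
    d2 = np.sum((X - (p + a * v)) ** 2)
    return np.exp(-d2 / (r * r))

def classify(X, class_segments, reject_below=None):
    """Discriminant (12): g_k(X) = max_i phi(rho(X, L_i^k)); pick the largest g_k.
    If no g_k exceeds reject_below, the sample is refused (returned as -1)."""
    g = [max(phi_rho(X, p, v) for p, v in segs) for segs in class_segments]
    k = int(np.argmax(g))
    if reject_below is not None and g[k] < reject_below:
        return -1, g
    return k, g

# toy usage with two classes, each covered by one segment
classes = [[(np.array([0.0, 0.0]), np.array([1.0, 0.0]))],
           [(np.array([5.0, 5.0]), np.array([0.0, 1.0]))]]
print(classify(np.array([0.2, 0.1]), classes, reject_below=0.1))   # class 0
```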
3.3 GA Neuron Net Learning Algorithm

Let S denote the filtered set that contains the expression patterns which determine the network, and let X denote the original set that contains all the expression patterns in order.

Begin
1. Put the first expression pattern into the result set S and let it be the fiducial expression pattern s_b, against which the distances of the other patterns will be compared. Set S = {s_b}, s_max = s_b and ρ_max = 0.
2. If there is no expression pattern left in the original set X, stop filtering. Otherwise, take the next expression pattern s in X and compute its distance to s_b, i.e., ρ = ||s − s_b||.
3. If ρ > ρ_max, go to step 6. Otherwise continue to step 4.
4. If ρ < ε, set s_max = s and ρ_max = ρ, and go to step 2. Otherwise continue to step 5.
5. Put s into the result set: S = S ∪ {s}, and let s_b = s, s_max = s and ρ_max = ρ. Then go to step 2.
6. If |ρ_max − ρ| > ε_2, go to step 2. Otherwise put s_max into the result set: S = S ∪ {s_max}, and let s_b = s_max and ρ_max = ||s − s_max||; go to step 2.
End
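A direct transcription of steps 1-6 into Python (our naming; the Euclidean norm stands in for the distance ρ, and the handling of step 6, whose condition is only partly stated, follows our reading of it).

```python
import numpy as np

def filter_patterns(X_patterns, eps, eps2):
    """GA neuron net learning: keep only the expression patterns that determine
    the network, following steps 1-6 of Section 3.3."""
    patterns = [np.asarray(x, dtype=float) for x in X_patterns]
    s_b = patterns[0]                      # step 1: fiducial pattern
    S = [s_b]
    s_max, rho_max = s_b, 0.0
    for s in patterns[1:]:                 # step 2: scan the original set
        rho = np.linalg.norm(s - s_b)
        if rho <= rho_max:                 # steps 3-5
            if rho < eps:
                s_max, rho_max = s, rho    # step 4: remember the closest pattern
            else:
                S.append(s)                # step 5: s becomes the new fiducial pattern
                s_b, s_max, rho_max = s, s, rho
        else:                              # step 6
            if abs(rho_max - rho) <= eps2:
                S.append(s_max)
                s_b = s_max
                rho_max = np.linalg.norm(s - s_max)
    return S

print(len(filter_patterns(np.random.rand(50, 8), eps=0.2, eps2=0.05)))
```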
4 Experiment and Analysis

This experiment uses 300 space target samples from 6 classes for training and testing. Each class has 50 pictures taken from different angles, as illustrated in Fig. 1.

Fig. 1. Sample pictures: the United States "Cassini" probe, Chang-e IV, Mars Global Surveyor, 0809geoeye, Pierwszy polski satellite (Polish satellite), Spy satellite

40 samples in each class are selected as training samples for recognition, and meanwhile 5 new samples in each class are selected as testing samples, which means there are 270 (40*6 + 5*6) samples in total for recognition.

Table 1 Experiment result

Training samples number        20*6     25*6     30*6     35*6     40*6
Testing samples number          120      120      120      120      120
Accurately recognized number     92      103      108      110      115
Accuracy rate                 76.66%   85.81%   90.00%   91.66%    95.8%
Erroneously recognized number    12       11        5        5        4
Error rate                    10.00%   0.911%    4.16%    4.16%    3.83%
Refused recognition number        6        5        5        4        4
Refusal rate                      5%    4.81%    4.81%    3.33%    3.33%
Table 1 illustrates the recognition results with different numbers of training samples. The results prove the efficiency of our method. As Table 1 shows, some testing samples are refused recognition because of the threshold; this also illustrates the ability of the method to reject samples that do not belong to any trained class. The quality of the image and of the space target sample within the image can cause measurement error, which increases the distance between a testing sample and the sub-manifold centroid. This may be the reason for the error rate and the refusal rate.
5 Conclusions

Hypercomplex algebras and the geometric algebra neuron are used to analyze effectively the problems found in color and multispectral image processing and recognition. We study the properties of space color images, and propose biomimetic pattern recognition based on Clifford algebra with high-dimensional space theory [8]. The experimental results prove the efficiency of our theory and lay the foundation for future work.

Acknowledgment. This paper is supported by National Natural Science Foundation of China (NNSF) No. 60871093 and the Pre-Research and Defense Fund of China No. 9140C80002080C80.
References

1. Labunets, R.E.V., Labunets, V.G., Astola, J.: Is the Visual Cortex a Fast Clifford Algebra Quantum Compiler. In: A NATO Advanced Research Workshop Clifford Analysis and Its Applications, pp. 173–183 (1999)
2. Labunets, R.E.V.: Fast Fourier–Clifford Transforms Design and Application in Invariant Recognition. PhD Thesis, Tampere University of Technology, Tampere, Finland, pp. 262–300 (2000)
3. Labunets, V.G., Rundblad, E.V., Astola, J.: Is the Brain Clifford Algebra Quantum Computer. In: Proc. of SPIE Materials and Devices for Photonic Circuits, pp. 134–145 (2001)
4. Labunets, V.G., Rundblad, E.V., Astola, J.: Fast Invariant Recognition of Color 3D Images Based on Spinor-Valued Moments and Invariants. In: Proc. of SPIE Vision Geometry X, pp. 22–33 (2001)
5. Lasenby, A.N., Doran, C.J.L., Gull, S.F.: Lectures in Geometric Algebra. In: Clifford (Geometric) Algebras with Applications to Physics, Mathematics and Engineering, Birkhäuser, Boston, pp. 256–288 (1996)
6. Vela, M.: Explicit Solutions of Galois Embedding Problems by Means of Generalized Clifford Algebras. In: Symbolic Computation, pp. 811–842 (2000)
7. Wang, S.J.: A New Development on ANN in China – Biomimetic Pattern Recognition and Multi Weight Vector Neurons. LNCS (LNAI), pp. 35–43. Springer, Heidelberg (2003)
8. Wang, S.J., et al.: Multi Camera Human Face Personal Identification System Based on Biomimetic Pattern Recognition. Acta Electronica Sinica, 1–3 (2003)
A Novel Matrix-Pattern-Oriented Ho-Kashyap Classifier with Locally Spatial Smoothness Zhe Wang, Songcan Chen, and Daqi Gao
Abstract. The previous work, the matrix-pattern-oriented Ho-Kashyap classifier (MatMHKS), can directly deal with images in the matrix representation n1 × n2, so that the spatial information within these images is not destroyed. Although MatMHKS works with n1 × n2 per image, the spatial correlation captured by the matrix form n1 × n2 falls far short of reflecting the real number of degrees of freedom: MatMHKS only keeps the relationship of the pixels in the same row or column of an image. In this paper we further take into account that pixels close to each other may be correlated, and thus develop a new matrix-pattern-oriented Ho-Kashyap classifier named MatHKLSS that incorporates a locally spatial smoothness. The experimental results here demonstrate that the proposed MatHKLSS has an advantage over MatMHKS in terms of classification.

Zhe Wang · Daqi Gao
Department of Computer Science & Engineering, East China University of Science & Technology, Shanghai 200237, China

Songcan Chen
Department of Computer Science & Engineering, Nanjing University of Aeronautics & Astronautics, Nanjing 210016, China
1 Introduction

Existing classifier design strategies are generally based on vector patterns. When a non-vector pattern such as an image is input, the non-vector pattern first has to be vectorized by concatenating its feature elements [1]. The so-called vector-pattern-oriented classifier design indeed brings convenience when dealing with problems. However, such a vectorization, especially for images, not only increases the computational complexity [2], but also breaks down the original spatial or structural information between the elements of
the image [3]. Several researchers have paid attention to the problems caused by the vectorization, and independently proposed techniques that can directly operate on matrix patterns without the vectorization preprocessing. Firstly, a series of matrixized (two-dimensional) feature extraction methods [4, 5] were independently proposed. Two-Dimensional Principal Component Analysis (2DPCA) [5] can extract features directly from image matrices and is shown to be better than the classical Principal Component Analysis (PCA) [6] in terms of both image classification performance and the reduction of computational complexity. Two-Dimensional Linear Discriminant Analysis (2DLDA) [4] is shown to not only increase the classification performance, but also overcome the singularity problem implicit in classical Linear Discriminant Analysis (LDA) [7]. We found that during the classification processing of the matrixized or two-dimensional feature extraction [4, 5], the classifiers following these matrixized feature extractors still resort to the traditional vector-based technique. In other words, the operating pattern of the subsequent classifier is still of vector representation rather than matrix representation. Intuitively, directly manipulating images in the design of a classifier seems simpler and also does not lose too much spatial or local contextual information of the original image. To this end, our previous work proposed a Matrix-pattern-oriented Modified Ho-Kashyap classifier with Squared approximation of the misclassification errors (MatMHKS) [8]. The advantages of MatMHKS are: 1) preventing the structural information of images from being fully destroyed; 2) decreasing the memory required for the weight vector from n1 × n2 to n1 + n2; 3) avoiding overtraining. The objective function of our previous work MatMHKS [8] is given as follows

min I = R_emp + c R_reg,   (1)

where the empirical risk term R_emp considers an image with the matrix representation n1 × n2, and the regularization term R_reg tries to make the discriminant function smooth. However, even though MatMHKS takes n1 × n2 per image, this spatial correlation falls far short of revealing the real number of degrees of freedom for images. Since the R_emp of equation (1) adopts the mathematical form u^T A ṽ, where the given image A ∈ R^{n1×n2} and the weight vectors u ∈ R^{n1}, ṽ ∈ R^{n2}, R_emp only considers the relationship of the pixels in the same row or column of an image. The R_reg of MatMHKS plays the role of a generalization regularization term but only makes the discriminant function smooth; it still does not fully consider the image case, i.e., the spatial information of images. In a word, our previous work MatMHKS is spatially rough and fails to fully explore the spatial information. The aim of this paper is to remedy this disadvantage of our previous work MatMHKS and to propose a new Matrix-pattern-oriented Ho-Kashyap classifier with Locally Spatial Smoothness named MatHKLSS. The proposed
MatHKLSS not only considers the relationship of the pixels in the same row or column of images, but also incorporates the prior spatial information of the correlated neighboring pixels. Thus, while dealing with images, the proposed MatHKLSS can utilize the local spatial information that the neighboring pixels of an image should be similar. In practice, the MatHKLSS of this paper is designed by introducing a new regularization term R_loc into the objective function of MatMHKS, and the introduced term R_loc exactly encodes the prior information that the neighboring pixels of an image should be similar. The experiments here compare the proposed MatHKLSS with our previous work MatMHKS, and demonstrate the effective classification of the newly proposed MatHKLSS. The rest of this paper is organized as follows. Section 2 gives the formulation of the proposed new Matrix-pattern-oriented Ho-Kashyap classifier with Locally Spatial Smoothness (MatHKLSS). Section 3 reports all the experimental results. Finally, conclusion and future work are given in Section 4.
2 Matrix-Pattern-Oriented Ho-Kashyap Learning with Locally Spatial Smoothness (MatHKLSS)

Suppose that there is an image matrix sample set S = {(A_1, φ_1), ..., (A_N, φ_N)}, where N is the sample number, A_p ∈ R^{n1×n2}, and the corresponding class label φ_p ∈ {+1, −1}. In our previous work MatMHKS [8], the objective function is given as follows

min I(u, ṽ, v_0, b_p) = Σ_{p=1}^{N} (φ_p(u^T A_p ṽ + v_0) − 1 − b_p)^2 + c(u^T u + ṽ^T ṽ),   (2)

where the weight vectors u ∈ R^{n1}, ṽ ∈ R^{n2}, the bias v_0 ∈ R, c ≥ 0 is a regularization parameter, and b_p ≥ 0. It has been stated that, for a given image A_i, the first term on the right side of the optimization (2) just utilizes the relationship information between the pixels in the same row or column, and the second term just keeps the global smoothness of the discriminant function of MatMHKS. Thus, when dealing with images where the prior local smoothness needs to be emphasized more than the global one, a local regularized method or a local spatial smoothness should be introduced. In this paper, for a given image A ∈ R^{n1×n2}, we introduce the local spatial smoothness into the matrix-pattern-oriented classifier through the newly-proposed regularization term

R_loc = Σ_{(i,j)} Σ_{(s,t)∈N_ij} (u_i v_j − u_s v_t)^2,
where ui , us are the ith and sth elements of the weight vector u respectively, vj , vt are the jth and tth elements of the weight vector v ˜ respectively, (i, j) denotes the pixel position that lies in the coordinate with the ith row and
the jth column of the image A, and (s, t) ∈ N_ij denotes that all the (s, t) lie in the 4 or 8 neighbors of (i, j). Thus, minimizing the term R_loc directly suggests that the neighboring pixels of images have similar values. In other words, the corresponding element values should be similar if the elements are spatially near. Further, we give the objective function of the newly-proposed Matrix-pattern-oriented Ho-Kashyap learning with Locally Spatial Smoothness (MatHKLSS) as follows

min J(u, ṽ, v_0, b_p) = Σ_{p=1}^{N} (φ_p(u^T A_p ṽ + v_0) − 1 − b_p)^2 + c Σ_{(i,j)} Σ_{(s,t)∈N_ij} (u_i v_j − u_s v_t)^2.   (3)

The first term on the right side of equation (3) plays the same role as that of equation (2), i.e., it measures the number of misclassified samples and keeps the values of the pixels in the same row or column similar through the mathematical form u^T A_p ṽ, which also accords with the statement that the values in the same row or column of images have a common divisor [9]. As said above, the second term on the right side of equation (3) suggests the local spatial smoothness, i.e., that the element values of neighbors should be similar if the elements are spatially near. Here, to get a simple mathematical form of equation (3), we simplify the newly-proposed term R_loc as follows

R_loc = Σ_{(i,j)} Σ_{(s,t)∈N_ij} (u_i v_j − u_s v_t)^2 ≈ Σ_{i=2}^{n1−1} (u_{i−1} − 2u_i + u_{i+1})^2 + Σ_{j=2}^{n2−1} (v_{j−1} − 2v_j + v_{j+1})^2,   (4)

where

Σ_{i=2}^{n1−1} (u_{i−1} − 2u_i + u_{i+1})^2 = Σ_{i=2}^{n1−1} (u^T(e_{i−1} − 2e_i + e_{i+1}))^2 = u^T [Σ_{i=2}^{n1−1} (e_{i−1} − 2e_i + e_{i+1})(e_{i−1} − 2e_i + e_{i+1})^T] u = u^T R_u u,   (5)

R_u = Σ_{i=2}^{n1−1} (e_{i−1} − 2e_i + e_{i+1})(e_{i−1} − 2e_i + e_{i+1})^T, e_i = [0 ... 1 ... 0]^T ∈ R^{n1}, the ith element of e_i is 1, and u_i = u^T e_i. Similarly, we can also get

Σ_{j=2}^{n2−1} (v_{j−1} − 2v_j + v_{j+1})^2 = ṽ^T R_v ṽ.   (6)
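R_u and R_v are fixed second-difference matrices, so they can be pre-computed once. A small sketch (ours, not the authors' code) builds them exactly as in (5)-(6):

```python
import numpy as np

def second_difference_matrix(n):
    """R = sum_{i=2}^{n-1} d_i d_i^T with d_i = e_{i-1} - 2 e_i + e_{i+1} (1-based
    indices as in (5)); penalizing u^T R u makes neighbouring weights vary smoothly."""
    R = np.zeros((n, n))
    for i in range(1, n - 1):            # 0-based interior positions
        d = np.zeros(n)
        d[i - 1], d[i], d[i + 1] = 1.0, -2.0, 1.0
        R += np.outer(d, d)
    return R

R_u = second_difference_matrix(5)
u = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # perfectly linear weight profile
print(float(u @ R_u @ u))                  # 0.0: second differences of a linear profile vanish
```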
Consequently, by introducing equations (5) and (6) into the objective function (3), we get a simplified form as follows

min J(u, ṽ, v_0, b_p) = Σ_{p=1}^{N} (φ_p(u^T A_p ṽ + v_0) − 1 − b_p)^2 + c(u^T R_u u + ṽ^T R_v ṽ),   (7)

where
R_u = Σ_{i=2}^{n1−1} (e_{i−1} − 2e_i + e_{i+1})(e_{i−1} − 2e_i + e_{i+1})^T, e_i ∈ R^{n1},
R_v = Σ_{j=2}^{n2−1} (e_{j−1} − 2e_j + e_{j+1})(e_{j−1} − 2e_j + e_{j+1})^T, e_j ∈ R^{n2}.

It can be found that both R_u and R_v are constant once n1, n2 are given, which simplifies the optimization of the function (7). Then, by setting Y = [y_1, ..., y_N]^T, y_p = φ_p[u^T A_p, 1]^T, p = 1, ..., N, and v = [ṽ^T, v_0]^T, the function (7) can be rewritten in matrix form as follows

min J(u, ṽ, v_0, b) = (Y v − 1 − b)^T (Y v − 1 − b) + c(u^T R_u u + v^T R̃_v v),   (8)

where 1, b ∈ R^N and R̃_v = [ R_v 0; 0 0 ], i.e., R_v bordered by a zero row and column. From equations (7) or (8), we cannot directly find closed-form optimal weights; instead, we use the gradient descent technique to seek them iteratively. The gradients of the objective functions (7) and (8) with respect to u, v and b are

∂J/∂u = 2 Σ_{p=1}^{N} φ_p A_p ṽ [φ_p(u^T A_p ṽ + v_0) − 1 − b_p] + 2c R_u u,   (9)

∂J/∂v = 2 Y^T (Y v − 1 − b) + 2c R̃_v v,   (10)

∂J/∂b = −2(Y v − 1 − b).   (11)

Further, by setting ∂J/∂u = 0 and ∂J/∂v = 0, we can get

u = (Σ_{p=1}^{N} A_p ṽ ṽ^T A_p^T + c R_u)^{−1} [Σ_{p=1}^{N} φ_p (1 + b_p − φ_p v_0) A_p ṽ],   (12)

v = (Y^T Y + c R̃_v)^{−1} Y^T (1 + b).   (13)
446
Z. Wang, S. Chen, and D. Gao
Table 1 Algorithm MatHKLSS Input: S = {(A1 , ϕ1 ), ..., (AN , ϕN )} OutPut: the weight vectors u, v ˜, and the bias v0 1. fix c ≥ 0, 0 < ρ < 1; initialize b(1) ≥ 0 and u(1); set the iteration index k = 1; 2. Y = [y1 , ..., yN ]T , where yp = ϕp [u(k)T Ap , 1]T ; 3. v(k) = (Y T Y + cR˜v )−1 Y T (1 + b(k)); 4. e(k) = Y v(k) − 1 − b(k); 5. b(k + 1) = b(k) + ρ(e(k) + |e(k)|); 6. if b(k + 1) − b(k) > ξ, then go to Step 7, else stop; N N 7. u(k + 1) = ( Ap v ˜(k)˜ v(k)T ATp + cRu )−1 ( ϕp (1 + bp (k) − ϕp v0 )Ap v ˜(k)), p=1
p=1
k = k + 1, go to Step 2.
In Table 1, k denotes the iteration step, b(k+1) = b(k) + ρ(e(k) + |e(k)|) prevents the components of b from decreasing, and ξ is a preset termination parameter. Finally, the discriminant function of the proposed MatHKLSS for an input pattern A ∈ R^{n1×n2} can be given as

f(A) = u^T A ṽ + v_0   { > 0, A ∈ class +1;  < 0, A ∈ class −1 }   (14)

where the weight vectors u and ṽ and the bias v_0 are obtained in the process of training.
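A compact sketch (ours) of the iteration in Table 1 together with the decision rule (14); the initialisation of u, the reading of v_0 from the current v inside step 7, and the maximum iteration count are assumptions beyond what the table states.

```python
import numpy as np

def train_mathklss(A_list, labels, Ru, Rv, c=1.0, rho=0.99, xi=1e-4, max_iter=200):
    """Iterative solution of (12)-(13) following Table 1 (MatHKLSS)."""
    N = len(A_list)
    n1, n2 = A_list[0].shape
    Rv_tilde = np.zeros((n2 + 1, n2 + 1))
    Rv_tilde[:n2, :n2] = Rv                       # block matrix [[Rv, 0], [0, 0]]
    u = np.ones(n1)                               # step 1: initialise u(1), b(1)
    b = np.full(N, 1e-4)
    for _ in range(max_iter):
        Y = np.array([phi * np.append(u @ A, 1.0)                     # step 2
                      for A, phi in zip(A_list, labels)])
        v = np.linalg.solve(Y.T @ Y + c * Rv_tilde, Y.T @ (1 + b))    # step 3
        e = Y @ v - 1 - b                                             # step 4
        b_new = b + rho * (e + np.abs(e))                             # step 5
        if np.linalg.norm(b_new - b) <= xi:                           # step 6: stop
            b = b_new
            break
        v_tilde, v0 = v[:n2], v[n2]
        lhs = sum(A @ np.outer(v_tilde, v_tilde) @ A.T for A in A_list) + c * Ru
        rhs = sum(phi * (1 + bp - phi * v0) * (A @ v_tilde)           # step 7 uses b(k)
                  for A, phi, bp in zip(A_list, labels, b))
        u = np.linalg.solve(lhs, rhs)
        b = b_new
    v_tilde, v0 = v[:n2], v[n2]
    return u, v_tilde, v0

def decide(A, u, v_tilde, v0):
    """Discriminant (14): the sign of u^T A v~ + v0."""
    return 1 if u @ A @ v_tilde + v0 > 0 else -1

# toy usage: 20 random 4x3 "images" with labels +/-1
rng = np.random.default_rng(1)
A_list = [rng.standard_normal((4, 3)) for _ in range(20)]
labels = np.array([1] * 10 + [-1] * 10)
Ru, Rv = np.eye(4), np.eye(3)            # any fixed PSD regularizers work for the demo
u, v_t, v0 = train_mathklss(A_list, labels, Ru, Rv, c=0.1)
print(decide(A_list[0], u, v_t, v0))
```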
3 Experiments

In this section, we compare the newly-proposed matrix-pattern-oriented classifier with locally spatial smoothness, MatHKLSS, with our previous work MatMHKS [8] in terms of classification and computational complexity. The image data sets used are the Olivetti Research Laboratory (ORL) face database (available at http://www.cam-orl.co.uk) and the Letter text database (available at http://sun16.cecs.missouri.edu/pgader/CECS477/NNdigits.zip), respectively. The ORL data set has images of size 28 × 23 from 40 persons, each of whom provides ten different images. The main challenge on this data set is pose and expression variation. For the ORL data set, we have employed the first six images per person for training and the rest for testing. The Letter data set contains ten text classes consisting of the digits from '0' to '9', each of which provides 50 different images of size 24 × 18. For the Letter data set, we have employed 30 images per digit for training and the rest for testing. All computations are run on a Pentium IV 1.40-GHz processor running Windows XP Professional and the MATLAB environment. The involved parameters
of both MatHKLSS and MatMHKS are given as follows: b_p(1) = 10^{-4}, ρ = 0.99, ξ = 10^{-4}, and the regularization parameter c is selected from the set {10^{-4}, 10^{-3}, ..., 10^{3}, 10^{4}} by N-fold cross validation [10], which randomly splits the samples into two parts (the training and testing sets) and repeats the procedure N times. In our experiments, N is set to 10.

3.1 Classification Performance Comparison

This subsection compares the newly-proposed MatHKLSS with the previous MatMHKS in terms of classification. The difference between MatHKLSS and MatMHKS lies in the second regularization term of their objective functions (2) and (7). Since MatHKLSS emphasizes the neighboring relationship of the pixels of an image, its second regularization term (u^T R_u u + ṽ^T R_v ṽ) requires that the corresponding pixel values should be similar if the pixels are spatially near. In contrast, the previous work MatMHKS just requires the global smoothness of its discriminant function through (u^T u + ṽ^T ṽ). Intuitively, the local or neighboring information plays an important role in image classification. Thus, we implement MatHKLSS and MatMHKS on the image datasets Letter and ORL, respectively. Since both MatHKLSS and MatMHKS can directly deal with image samples, both Letter and ORL are kept at the original size, i.e., Letter with 24 × 18 and ORL with 28 × 23. Figure 1 gives the classification accuracies of MatHKLSS and MatMHKS on Letter and ORL, respectively. In this figure, the horizontal axis denotes the logarithm of the regularization parameter c and the vertical axis denotes the classification accuracy. From Figure 1, it can be found that as c decreases the accuracy of MatHKLSS becomes close to that of MatMHKS.
3.1 Classification Performance Comparison This subsection compares the newly-proposed MatHKLSS with the previous MatMHKS in terms of classification. The difference between MatHKLSS and MatMHKS is the second regularized term of their objective function of (2) and (7). Since MatHKLSS emphasizes the neighboring relationship of pixels of images, its second regularized term (uT Ru u + v ˜T Rv v ˜) requires that the corresponding pixel values should be similar if the pixels are spatially near. In contrast, the previous work MatMHKS just requires the global smoothness of its discriminant function through (uT u + v ˜T v ˜). Intuitively, the local or neighboring information plays an important role in the image classification. Thus, we implement MatHKLSS and MatMHKS on the image datasets Letter and ORL, respectively. Since both MatHKLSS and MatMHKS can directly deal with image samples, both Letter and ORL are kept the original size, i.e., Letter with 24 × 18 and ORL with 28 × 23. Figure 1 gives the classification accuracies of MatHKLSS and MatMHKS on Letter and ORL, respectively. In this figure, the horizontal axis denotes the logarithm value of the regularized parameter c and the vertical axis denotes the classification accuracies. From Figure 1, it can be found that while the c decreases the accuracies of MatHKLSS is close to that of MatMHKS. This phenomenon attributes to that both (2) and (7) would degenerate the same empirical risk
100
100
MatHKLSS MatMHKS
90
80 percentage of Recognition(%)
percentage of Recognition(%)
80 70
60 50 40
30
70 60
50 40 30
20
20
10
10
0 −4
MatHKLSS MatMHKS
90
−3
−2
−1 0 1 value of log(c) (Letter)
2
3
4
0 −4
−3
−2
−1 0 1 value of log(c) (ORL)
2
3
4
Fig. 1 The classification accuracy (%) comparison of MatHKLSS and MatMHKS with varying values of the regularized parameter c on Letter and ORL, respectively
This phenomenon is attributed to the fact that both (2) and (7) degenerate to the same empirical risk term Σ_{p=1}^{N} (φ_p(u^T A_p ṽ + v_0) − 1 − b_p)^2 if the value of c becomes zero. On the other hand, when the value of c becomes large, the objective functions (2) and (7) both focus on their respective regularization terms. In our opinion, since the R_loc of the newly-proposed MatHKLSS utilizes the prior information on neighboring pixels of images, MatHKLSS should outperform MatMHKS, which is validated by the left sub-figure of Figure 1. But the right sub-figure of Figure 1 shows that MatMHKS is better than MatHKLSS when c equals 10^2 and 10^3, which should be further discussed. It should be noted that Figure 1 shows that the accuracy of the previous MatMHKS decreases quickly as c becomes large, which demonstrates that the newly-proposed MatHKLSS introduces more information than MatMHKS.
3.2 Computational Complexity Analysis

In this subsection, we give a discussion of the running time of the newly-proposed MatHKLSS and MatMHKS. All the experiments are implemented in the same environment. Figure 2 gives the average training time of MatHKLSS and MatMHKS with the variation of the regularization parameter c on Letter and ORL, respectively. Since both R_u and R_v of the objective function (7) can be computed in advance, MatHKLSS has the same computational complexity as MatMHKS in theory. From this figure, it can also be found that MatHKLSS has a computational complexity comparable to that of MatMHKS for smaller values of c. But the training time of MatMHKS decreases quickly for larger values of c. At the same time, comparing Figure 1 with Figure 2, MatMHKS with a much shorter running time also has a much worse classification performance than the corresponding MatHKLSS.
Fig. 2 The running time (in s) comparison of MatHKLSS and MatMHKS with varying values of the regularized parameter c on Letter and ORL, respectively (horizontal axis: log(c); vertical axis: running time in s)
4 Conclusion and Future Work

In this paper, we propose a new matrix-pattern-oriented linear classifier MatHKLSS. The contributions of the proposed MatHKLSS are: 1) MatHKLSS utilizes the prior neighboring relationship between pixels of images through the newly-proposed regularization term R_loc and consequently introduces more information than our previous work MatMHKS [8]; 2) MatHKLSS does not need more computation than MatMHKS, since both R_u and R_v can be obtained before training. In the future, we should implement the proposed MatHKLSS on more image datasets and give more discussion. On the other side, the newly-proposed term R_loc here is given only in the form Σ_{(i,j)} Σ_{(s,t)∈N_ij} (u_i v_j − u_s v_t)^2, and in the future it can be generalized to another form if that form can suggest some intuitive information.

Acknowledgment. The authors thank the Natural Science Foundation of China under Grant No. 60575027, the High-Tech Development Program of China (863) under Grant No. 2006AA10Z315, and the High University's Doctor Foundation under Grant No. 200802870003 for their partial support.
References

1. Beymer, D., Poggio, T.: Image Representations for Visual Learning. Science 272, 1905–1909 (1996)
2. Chen, L., Liao, M., Lin, J., Yu, G.: A New LDA-based Face Recognition System Which Can Solve the Small Sample Size Problem. Pattern Recognition 33, 1713–1726 (2000)
3. Wang, H., Ahuja, N.: Rank-R Approximation of Tensors: Using Image-as-Matrix Representation. In: IEEE CVPR (2005)
4. Li, M., Yuan, B.: 2D-LDA: A Statistical Linear Discriminant Analysis for Image Matrix. Pattern Recognition Letters 26, 527–532 (2005)
5. Yang, J., Zhang, D., Frangi, A., Yang, J.: Two-Dimensional PCA: A New Approach to Appearance-based Face Representation and Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 131–137 (2004)
6. Jolliffe, I.: Principal Component Analysis. Springer, Heidelberg (1986)
7. Fisher, R.: The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics 7, 179–188 (1936)
8. Chen, S., Wang, Z., Tian, Y.: Matrix-pattern-oriented Ho-Kashyap Classifier with Regularization Learning. Pattern Recognition 40, 1533–1543 (2007)
9. Cai, D., He, X., Han, J., Huang, T.: Learning a Spatially Smooth Subspace for Face Recognition. In: CVPR (2007)
10. Xu, Q., Liang, Y.: Monte Carlo Cross Validation. Chemometrics and Intelligent Laboratory Systems 56, 1–11 (2001)
An Integration Model Based on Non-classical Receptive Fields

Xiaomei Wang and Hui Wei
Abstract. In this paper, we improve a new model of retinal ganglion cell receptive fields (RFs), which differs from the conventional model of classical receptive fields (CRFs), and take into account the characteristic that the sizes of receptive fields vary with eccentricity. According to the information processing mechanism of retinal ganglion cell RFs including the non-classical receptive fields (nCRFs), we use the model to integrate image information. The results show that the model can not only enhance the edge contrast, but also transfer the low spatial frequencies which are filtered out by the center/surround mechanism of the classical receptive field, so the image processing results are beneficial to the integration of image information. The results are consistent with the spatial summation characteristic of ganglion cell RFs.

Keywords: Ganglion cell, Non-classical receptive field, Size change, Integration.
1 Introduction

Machine vision is of great significance to intelligent computing systems. However, the vast majority of machine vision researchers have overlooked the principles of visual physiology and proposed various algorithms from the engineering point of view. Therefore, machine vision lacks a unified and effective algorithm to solve the problem of complex scene segmentation, and current machine vision can only deal with single pieces of local information, whereas most scenes are composed of a wide variety of local information. In contrast, the human vision system has a strong processing ability to adapt to different environments. When a person observes a scene, the brain combines the various pieces of local visual information well and produces visual perception, which gives a harmonious feeling and identifies objects of various sizes and characteristics from a very complex background. Therefore, by applying theories of human vision, neuropsychology and cognitive psychology, it is entirely possible to find some new breakthroughs that provide a new prototype of computer vision theory and solve the difficulties encountered in computer vision.

Xiaomei Wang · Hui Wei
School of Computer Science, Fudan University, Shanghai 200433, China
[email protected]
Receptive field is the basic unit of structure and function of visual information processing system. And an in-depth progress has been made on the receptive field research in recent years. Every visual cell has a classical receptive field (CRF), which is a small area in the retina that is responsible for eliciting the response of the neuron. The spatial summation property of CRF is to extract boundaries of images. Since the 1960's, many researchers [1, 2] have found that there is still a large scope of area beyond the CRF of a visual neuron, which is called as nonclassical receptive field (nCRF). Light spot stimuli in it can’t cause a direct reaction of the cell, but they can facilitate, inhibit or disinhibit the behavior of that cell. Li Chaoyi et al. studied the spatial summation property of disinhibitory nCRF of cat retinal ganglion cells [3] and lateral geniculate neurons [4]. They found that disinhibitory nCRF can not only enhance the bolder contrast but also compensate the loss of low-frequency which was caused by the antagonistic center/surround mechanism of the CRF. So, it plays an important role in transmitting the image information of area luminance contrast and luminance gradient [5]. In this paper, a receptive field is composed of a CRF and an nCRF. In addition, Li Chaoyi et al. also observed two phenomena: (1) the algebraic sum of the inhibition to the cell in the CRF induced by cells in different regions of the disinhibitory nCRF which are respectively stimulated by two independent light spots is bigger than that induced by those cells which are synchronously stimulated [6]; (2) the light spot array within the disinhibitory nCRF strongly influence the sensitivity of the cell response to the light spot within the CRF. The denser the array is, the lower the sensitivity is, vice versa. This fact shows that, there is mutual inhibition among the cell responses within a disinhibitory nCRF [7]. On the basis of the above fact, we improved the three Gaussians model proposed by Ghosh et al. in order to show the characteristic that mutual inhibition exists between any two cell responses within a disinhibitory nCRF. And we built a two-layer neural network model of retinal ganglion cell RFs in which all RFs were same in sizes [8]. In fact, the sizes of RFs are very different. The ganglion cell RFs in the fovea are much smaller than those in the periphery. And they increase with increasing eccentricities from fovea [9]. So, based on the two-layer neural network model, our study is to discuss the role of RFs with varying size in image information integration, which is of great significance to image segmentation, edge detection, object recognition, etc.
2 An Integration Model Based on Non-classical Receptive Fields

2.1 Neurophysiological Mechanism of the Retinal Ganglion Cell nCRF

Neurophysiological studies show [10] that the synchronous activities of horizontal cells over a large area affect the activities of photo-receptors by means of spatial summation and feedback, and then cause the ganglion cell to respond to the stimuli within the disinhibitory nCRF through bipolar cells. Moreover, Li Chaoyi et al. proposed that the bipolar cells far from the CRF center may form the large-scale nCRF of a ganglion cell via amacrine cells, and the RFs of these bipolar cells may form the subunits of the nCRF of a retinal ganglion cell, respectively [11].
Fig. 1 Structural diagram of the retinal ganglion cell RF, where the red area is the CRF center, the green area is the CRF surround, the blue area is the large-scale nCRF, and an area within a yellow circle is a subunit of the nCRF
Hence, horizontal cells, bipolar cells and amacrine cells all participate in the formation of the nCRF of a retinal ganglion cell. The structure of the retinal ganglion cell RF (including the nCRF) is shown in Fig. 1.
2.2 The Integration Model

Based on the above neurophysiological mechanism, we design a three-layer neural network model to integrate image information, as shown in Fig. 2. The first layer is composed of photo-receptors, the second of retinal ganglion cells, and the third of integration units. In this model, a RF is composed of an excitatory CRF center, an inhibitory CRF surround and a disinhibitory nCRF. The output of the second layer can be expressed as follows:
GC(X, Y) = Σ_{y∈σ1} Σ_{x∈σ1} W_1 · RC(x, y) − Σ_{y∈σ2} Σ_{x∈σ2} W_2 · RC(x, y) + log_B (1 + Σ_{y∈σ3} Σ_{x∈σ3} W_3 · RC(x, y)),   (1)

W_1 = (A_1 / (2πσ_1)) · exp(−((x − x_0)^2 + (y − y_0)^2) / (2σ_1^2)),   (2)

W_2 = (A_2 / (2πσ_2)) · exp(−((x − x_0)^2 + (y − y_0)^2) / (2σ_2^2)),   (3)

W_3 = (A_3 / (2πσ_3)) · exp(−((x − x_0)^2 + (y − y_0)^2) / (2σ_3^2)),   (4)
Fig. 2 The simulation model of retinal ganglion cell RFs, where the green units represent photo-receptors located in the CRF center, the red units represent the ones located in the CRF surround, the blue units represent the ones located in the nCRF, and the units within a yellow circle compose a subunit of the nCRF
where GC(x, y) is the response of a ganglion cell; RC(x, y) is the image stimulus within a RF ; W1, W2 and W3 are the weighting functions of the photo-receptors within the CRF center, the CRF surround and the disinhibitory nCRF respectively; A1, A2 and A3 are the sensitivity of the CRF center, the CRF surround and the nCRF respectively; σ1, σ2 and σ3 are the radii of the CRF center, the CRF surround and the nCRF respectively; x and y are position coordinates of a photo-receptor; x0 and y0 are the center coordinates of a RF; X and Y are the position coordinates of a retinal ganglion cell; B is the logarithmic base and B>1, which represents the mutual inhibition among the cell responses within a disinhibitory nCRF. The third layer is the integration layer, in which similar outputs of the second layer are integrated together according to the topological neighborhood and the output response of the second layer. The output of the third layer can be expressed as follow:
O_i(x, y) = 1/(2n),        if mean(GC_i) ≤ 1/n;
O_i(x, y) = 3/(2n),        if 1/n < mean(GC_i) ≤ 2/n;
   ...                                                                (5)
O_i(x, y) = (2n−1)/(2n),   if (n−1)/n < mean(GC_i) ≤ 1,

mean(GC_i) = (1/k) Σ_{y∈Ω_i} Σ_{x∈Ω_i} GC(x, y),   (i = 1, 2, ..., m)   (6)
where Ω_i is the i-th sub-region of the integration layer (i = 1, 2, ..., m); O_i(x, y) is the output of the units within Ω_i; mean(GC_i) is the input average of the units within Ω_i; x and y are the position coordinates of an integration unit; k is the number of units within Ω_i; and n is the number of integration sub-regions. For convenience of calculation, in the model, the layer of photo-receptors covers the range of 10 degrees of eccentricity from the fovea. Objects are imaged on the fovea, which is the central region of the retina and has the strongest sensitivity. The mean diameter of the fovea is about 5.2 degrees [12]; therefore, the 10-degree range can sufficiently cover the central field of vision. The first layer can receive input images of various sizes, so the number of photoreceptors in the first layer is not fixed and is the same as the number of pixels of each image. According to the fact that in physiology the number of photoreceptors is 100 times that of ganglion cells, the number of cells in the second layer is set to about one percent of that in the first layer. The number of units in the integration layer is the same as that in the ganglion cell layer. There is little published work with precise statistical data about the size change of ganglion cell RFs. According to the data in the paper by Croner, L.J. and Kaplan, E. [13], we perform a statistical analysis of the sizes of ganglion cell RFs located at different positions in the retina. The cells are grouped into five eccentricity ranges (0°–2°, 2°–4°, 4°–6°, 6°–8°, 8°–10°), and the size values of the CRF center within them are shown in Table 1. The ratio of the CRF surround radius to the CRF center radius does not vary systematically with eccentricity from the fovea, and for the vast majority of ganglion cells the ratio is more than 2. The sizes of ganglion cell nCRFs are 3–6 times those of the CRFs [14].

Table 1 Size values of ganglion cell CRF centers in the range of 10 degrees eccentricity from the fovea (in units of degrees)

Eccentricity range    0–2     2–4     4–6     6–8     8–10
Radius                0.03    0.04    0.05    0.06    0.07
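As a concrete illustration (our sketch, not the authors' code), the second-layer response (1)-(4) can be computed as below, with the CRF-centre radius looked up from Table 1 and the remaining parameters taken from the values quoted for Fig. 3 in Section 3 (A1 = 1, A2 = 0.18, A3 = 0.005, σ2 = 6.7σ1, σ3 = 4σ2, B = 1.2). Summing Gaussian-weighted pixels over the whole image rather than only within each radius is a simplification of the region sums in (1).

```python
import numpy as np

# CRF-centre radius (degrees) by eccentricity range, from Table 1
CENTRE_RADIUS = {(0, 2): 0.03, (2, 4): 0.04, (4, 6): 0.05, (6, 8): 0.06, (8, 10): 0.07}

def centre_sigma(ecc_deg):
    """Look up the CRF-centre radius for a given eccentricity (degrees)."""
    for (lo, hi), r in CENTRE_RADIUS.items():
        if lo <= ecc_deg < hi:
            return r
    return CENTRE_RADIUS[(8, 10)]

def gaussian_weight(A, sigma, xx, yy, x0, y0):
    """W_i of (2)-(4): A/(2*pi*sigma) * exp(-((x-x0)^2+(y-y0)^2)/(2*sigma^2))."""
    r2 = (xx - x0) ** 2 + (yy - y0) ** 2
    return A / (2 * np.pi * sigma) * np.exp(-r2 / (2 * sigma ** 2))

def ganglion_response(RC, x0, y0, sigma1, A1=1.0, A2=0.18, A3=0.005, B=1.2):
    """GC(X, Y) of (1): excitatory centre - inhibitory surround
    + logarithmically compressed disinhibitory nCRF contribution."""
    sigma2, sigma3 = 6.7 * sigma1, 4 * 6.7 * sigma1
    ys, xs = np.mgrid[0:RC.shape[0], 0:RC.shape[1]]
    centre = np.sum(gaussian_weight(A1, sigma1, xs, ys, x0, y0) * RC)
    surround = np.sum(gaussian_weight(A2, sigma2, xs, ys, x0, y0) * RC)
    ncrf = np.sum(gaussian_weight(A3, sigma3, xs, ys, x0, y0) * RC)
    return centre - surround + np.log(1 + ncrf) / np.log(B)

img = np.random.rand(64, 64)
print(centre_sigma(5.0))                       # 0.05 deg at 5 deg eccentricity
print(ganglion_response(img, 32, 32, sigma1=3.0))
```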
3 Results

We use the model to integrate image information, and the processing results of the ganglion cell layer are shown in Fig. 3. Fig. 3(a) is an original picture of a horse. Fig. 3(b) is the image processed using only the CRF, which retains only the image edge information (high spatial frequency components) while completely losing the information about slow changes in the brightness gradient (low spatial frequency components). But what we see is not a world composed only of edges, so our complex visual systems are not limited to edge enhancement when they process image information. On the basis of the edge processing, they should transfer image information to the brain as completely as possible.
Fig. 3 The image processing results of the ganglion cell layer. (a): the original image; (b): the image processed using only the CRF, with model parameters A1=1, A2=0.18, σ1=33, σ2=6.7×σ1; (c): the images processed using the whole RFs (CRFs + nCRFs), which have a constant size at different positions; the model parameters are A3=0.005, σ3=4×σ2, B=1.2; (d): the images processed using the whole RFs (CRFs + nCRFs), whose sizes vary with position; σ1 takes values according to Table 1, and the other parameters are the same as those in (c)
Fig. 3(c) and Fig. 3(d) are the images processed by the whole RF (CRF + nCRF). The RFs at different positions in Fig. 3(c) have a constant size, which is the same as that of the smallest RF in Fig. 3(d). In Fig. 3(d), the RFs have the smallest sizes in the center and increase with increasing distance from the center. Compared with Fig. 3(b), both Fig. 3(c) and Fig. 3(d) can efficiently restore the low spatial frequency information of the original picture, and this restoration does not counteract the edge enhancement produced by the center/surround mechanism of the CRFs, which is consistent with the spatial summation property of the nCRF. Similar inputs in the photoreceptor layer produce similar outputs in the ganglion cell layer. On the principles of topological neighborhood and output similarity, the integration layer groups the similar outputs of the ganglion cell layer together so as to achieve the integration of image information. In addition, by comparing Fig. 3(c) with Fig. 3(d) in the head and neck regions, we can see that Fig. 3(d) filters out more of the change information of the brightness gradient and better highlights the image edges. This is because the sizes of the RFs in the neck and head regions in Fig. 3(d) are larger than those in Fig. 3(c). Hence it is evident that a big RF is helpful for the integration of regional image information and a small RF is beneficial for the display of image detail information. Fig. 3(d) shows the characteristic that the RF increases with increasing eccentricity from the fovea, which accords with the physiological characteristics. The processing results of the integration layer are shown in Fig. 4.
Fig. 4 The image processing results of the integration layer. (a): the image processed by the ganglion cell layer; (b): the histogram of the figure (a); (c): the output of the integration layer, when the border values are taken into account, the model parameters are n=4; (d): the output of the integration layer, when the border values are not taken into account, the model parameters are n=2
Fig. 4(a) is the image processing results of the ganglion cell layer. Fig. 4(b) is the histogram of Fig. 4(a). Because the model enhances the edge contrast and results in Mach band effect, the bright belt and the dark band can be seen on both sides of the edge of the horse. So the highest and the lowest values in the histogram correspond respectively to the bright band and the dark band of the border. When we take into account the border in Fig. 4(a) and Fig. 4(b), the values in histogram can be divided into four integration sub-regions, namely, the bright belt of the edge, the dark band of the edge, the horse and the background. The integration result is shown in Fig. 4(c). When we don’t take into account the border in Fig. 4(a) and Fig. 4(b), the values in histogram can be divided into two integration regions, namely, the horse and the background. The integration result is shown in Fig. 4(d). Contrasting Fig. 4(c) with Fig. 4(d), we can see that the integration result is better when the border values are not taken into account.
4 Discussions

(1) Based on a new model of retinal ganglion cell receptive fields, we add the characteristic that the sizes of ganglion cell RFs increase systematically with increasing eccentricity from the fovea, and further discuss the role of RFs with changing sizes in image information integration. Using the model to process images, we can see that it can not only enhance the edges of images but also retain the low spatial frequency components of images. Obviously, this result is consistent with the spatial summation property of the nCRF and is beneficial to further image segmentation.
(2) Research shows that the ganglion RF can dynamically change its size [15]. According to different brightness levels, different durations of stimulation, different background images, or changes in the speed of moving objects, the size of the RF will change. For example, in a dark environment, visual neurons expand the RF and receive weak light by spatial summation at the cost of reducing the spatial resolution. When the fine structures of images need to be identified, RFs become smaller in order to improve the spatial resolution. Thus, visual systems can meet the needs of different tasks in different situations by adjusting themselves. In this paper, although the sizes of ganglion cell RFs increase with increasing eccentricity from the fovea, the size of the RF at a fixed position does not dynamically change with different tasks. We will improve this in further research: we will make RFs adjust their own sizes according to the nature of the stimulus. That is to say, RFs will expand in regions of image detail that need to be finely distinguished and shrink in continuous regions within which the image information changes only slightly. In this way, images are accurately characterized, i.e., not only are the image edges extracted, but the regional brightness contrasts are also retained. Finally, image integration is achieved through the spatial summation of the neurons' RFs.

Acknowledgment. Our research is financially supported by National Natural Science Foundation of China (NSFC) project 60303007 and Shanghai Science and Technology Development Fund 08511501703.
References

1. Ikeda, H., Wright, M.J.: The Outer Disinhibitory Surround of the Retinal Ganglion Cell Receptive Field. J. Physiol. 226, 511–544 (1972)
2. Krüger, J., Fischer, B.: Strong Periphery Effect in Cat Retinal Ganglion Cells. Excitatory Responses in ON- and OFF-center Neurons to Single Grid Displacements. Exp. Brain Res. 18, 316–318 (1973)
3. Li, C., Zhou, Y., Pei, X., Qiu, F., Tang, C., Xu, X.: Extensive Disinhibitory Region Beyond the Classical Receptive Field of Cat Retinal Ganglion Cells. Vision Res. 32, 219–228 (1992)
4. Li, C., He, Z.: Effects of Patterned Backgrounds on Responses of Lateral Geniculate Neurons in Cat. Exp. Brain Res. 67, 16–26 (1987)
5. Li, C., Pei, X., Zhou, Y.: Role of the Extensive Area Outside the X-cell Receptive Field in Brightness Information Transmission. Vision Res. 31, 1529–1540 (1991)
6. Li, C.: Mutual Interactions and Steady Background Effects within and Beyond the Classical Receptive Field of Lateral Geniculate Neurons. In: Yew, D.T., So, K.F., Tsang, D.S. (eds.) Vision: Structure and Function, Singapore, pp. 259–280 (1988)
7. Li, C., He, Z.: Effects of Patterned Backgrounds on Responses of Lateral Geniculate Neurons in Cat. Exp. Brain Res. 67, 16–26 (1987)
8. Wang, X., Wei, H.: Two Improved Models of Retinal Ganglion Cell Receptive Field. In: 2008 International Conference on Humanized Systems, pp. 384–387. Posts and Telecom Press, Beijing (2008)
9. Kolb, H.: How the Retina Works. American Scientist 91, 28–35 (2003)
10. Yang, X., Gao, F., Samuel, M.: Modulation of Horizontal Cell Function by GABA(A) and GABA(C) Receptors in Dark- and Light-adapted Tiger Salamander Retina. Vis. Neurosci. 16, 967–979 (1999)
11. Qiu, F., Li, C.: Mathematical Simulation of Disinhibitory Properties of Concentric Receptive Field. Acta Biophysica Sinica 11, 214–220 (1995)
12. Druid, A.: Vision Enhancement System – Does Display Position Matter? Master's Thesis, Department of Computer and Information Science, Linköping University, Sweden (2002)
13. Croner, L.J., Kaplan, E.: Receptive Fields of P and M Ganglion Cells Across the Primate Retina. Vision Res. 35, 7–24 (1995)
14. Li, C.: Integration Fields beyond the Classical Receptive Field: Organization and Function Properties. News Physiol. Sci. 11, 181–186 (1996)
15. Li, C.: New Advances in Neuronal Mechanisms of Image Information Processing. Bulletin of National Natural Science Foundation of China 3, 201–204 (1997)
Classification of Imagery Movement Tasks for Brain-Computer Interfaces Using Regression Tree

Chiman Wong and Feng Wan

Chiman Wong · Feng Wan
Department of Electrical and Electronics Engineering, Faculty of Science and Technology, University of Macau
Abstract. Classification of EEG (electroencephalographic) signals recorded during right and left motor imagery tasks is a technique for designing BCI (Braincomputer interfaces). In this paper, the regression tree is used to separate the right/left patterns that are extracted by ERD time courses. The regression tree is a statistical method to identify complex patterns without rigorous theoretical and distributional assumptions. The simulation result shows that the proposed BCI can provide satisfactory offline classification error rate and mutual information. Keywords: Brain-computer interface (BCI), Decision tree, Event-related desynchronization (ERD), Motor imagery.
1 Introduction

Since human EEG was first described by Hans Berger in 1929, people have speculated that it might be used for communication and control. In the last few decades, EEG-based BCI has attracted more and more research attention, since the EEG signal can reflect brain activity [1, 4]. At the first BCI international meeting, a BCI was defined as a communication system that does not depend on the brain's normal output pathways of peripheral nerves and muscles [4]. In other words, patients who are paralyzed or have other severe movement deficits can have an alternative communication method for acting on the world. Thus, a BCI translates the continuous EEG signal or brain activity into a discrete command for computer processing [3]. For this purpose, many research teams have studied many different algorithms and EEG signals to design BCIs. Present-day BCIs based on different EEG signals fall into 5 groups: those using VEPs (visual evoked potentials), slow cortical potentials, P300 evoked potentials, mu/beta rhythms, and cortical neuronal action potentials, respectively [1]. In addition, a great variety of classifiers have
been tried in BCI in order to find the best classification algorithm for BCIs based on different EEG signals. More details can be found in [2]. This paper proposes a BCI using mu/beta rhythm features from EEG signals that are recorded during left and right hand imagery movement by a healthy subject. The extracted features are then classified by the regression tree. The decision tree is one of the most popular classification algorithms in data mining and machine learning because it is easily interpreted and comprehended by humans. Furthermore, a decision tree can classify complex patterns using only several simple discriminants. The basic idea is to break up a complex decision into a union of several simpler decisions, thus providing a clear solution which is often easier to interpret. Note that the regression tree is the same kind of decision tree, used to predict numeric quantities. The performance of the proposed BCI is tested on the BCI competition 2003 dataset III. The main purpose of this work is to investigate whether the decision tree can be applied in BCI.
2 Methodology 2.1 Experimental Data The BCI competition 2003 dataset III is used to evaluate the proposed decision tree classifier in this work. The EEG dataset, provided by the Graz BCI group, was recorded from a 25-year-old female during a feedback session [13]. Three bipolar EEG channel pairs were positioned over C3, Cz and C4. The signal was sampled at 128 Hz and bandpass filtered between 0.5 and 30 Hz. This dataset consists of 280 trials (140 left trials and 140 right trials) of 9 s length. In each trial, the first 2 s were quiet; at 3 s, an arrow (left or right) was displayed as a cue while the subject was asked to imagine left or right hand movement according to the cue. One half of the dataset (140 trials: 70 left trials and 70 right trials) is used to train the BCI system, and the other half is used for testing.
2.2 Feature Extraction An internally or externally paced event results not only in the generation of an event-related potential (ERP) but also in a change in the ongoing EEG in form of an event-related desynchronization (ERD) or event-related synchronization (ERS). Note that these events, sensory stimuli, can induce changes in the activity of neuronal populations that are generally called event-related potentials (ERPs). Furthermore, it is known since Berger (1930) that the events can block or desynchronize the ongoing alpha activity. This phenomenon represents frequency specific changes of the ongoing EEG activity and may consist either of decreases or increases of power in the specific frequency bands. It also may be considered to be a decrease or an increase in synchrony of the underlying neuronal populations, respectively. This former case is called event-related desynchronization (ERD) and the latter is called event-related synchronization (ERS) [5, 6]. Event-related desynchronization (ERD) of mu and beta rhythms is a distinct characteristic for imagery movement classification task. One of the basic
measurements of ERD/ERS in the time domain, introduced in [6], is that the EEG power within identified frequency bands is displayed relative to the power of the same EEG derivations recorded during a reference or baseline period a few seconds before the event occurs. This is a very convenient, fast and simple method to extract the ERD time-course feature from filtered EEG data. The aforementioned EEG dataset was sampled from 3 channels, which are C3, Cz and C4. Only the C3 and C4 channel data are employed here, since the ERD time course in Cz is not clear. First, the C3 and C4 channel data are filtered by an FIR (Finite Impulse Response) bandpass filter (10~12 Hz). Then the two ERD time courses ec3(k) and ec4(k) are computed at the k-th sampling point by equation (1) [11], where xc3(k) and xc4(k) are the filtered data from the C3 and C4 channels, respectively:

e_{c3}(k) = \sum_{i=0}^{w-1} x_{c3}^{2}(k-i), \qquad e_{c4}(k) = \sum_{i=0}^{w-1} x_{c4}^{2}(k-i) \qquad (1)
where w is the length of the window. Each feature vector ec(k) is constructed from the two column vectors ec3(k) and ec4(k), and denoted as

e_{c}(k,m) = [e_{c3}(k,m), e_{c4}(k,m)] \qquad (2)
where k represents the k-th sampling point and m denotes the m-th trial. As each trial is 9 s long and the sampling frequency is 128 Hz, the total number of sampling points is 1152. Each trial may therefore generate (1152-w+1) feature vectors. For every sampling point, there are 140 feature vectors for growing the tree. In other words, there are (1152-w+1) trees for feature classification in total.
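For illustration, the following Python sketch computes these ERD band-power features under the stated settings (128 Hz sampling, 10~12 Hz FIR band-pass, window length w). The trial-array layout, the filter order and the use of SciPy are assumptions for the sketch, not part of the original description.

import numpy as np
from scipy.signal import firwin, lfilter

def erd_features(trials, fs=128, band=(10.0, 12.0), w=19):
    """Sketch of the ERD time-course features of equations (1)-(2).

    trials: array of shape (n_trials, 2, n_samples) holding the raw C3 and C4
    signals of each trial (an assumed layout). Returns an array of shape
    (n_trials, n_samples - w + 1, 2) with the features e_c3(k, m) and e_c4(k, m).
    """
    # FIR band-pass filter for the 10-12 Hz mu band (filter order is illustrative).
    taps = firwin(numtaps=65, cutoff=band, fs=fs, pass_zero=False)
    filtered = lfilter(taps, 1.0, trials, axis=-1)
    squared = filtered ** 2
    # Moving sum of the squared samples over a window of length w (equation (1)).
    kernel = np.ones(w)
    power = np.apply_along_axis(lambda x: np.convolve(x, kernel, mode="valid"),
                                -1, squared)
    # Feature vector e_c(k, m) = [e_c3(k, m), e_c4(k, m)] (equation (2)).
    return np.transpose(power, (0, 2, 1))

# Random data standing in for the 140 training trials of 9 s at 128 Hz.
fake_trials = np.random.randn(140, 2, 1152)
features = erd_features(fake_trials)
print(features.shape)  # (140, 1134, 2) since 1152 - 19 + 1 = 1134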
2.3 Feature Classification Decision tree T is made up of nodes and branches. A node t is designated as either an internal node or a terminal node (leaf node). An internal node splits into two children (tL and tR), while a terminal node does not have any children. A terminal node has a class label associated with it, such that observations that fall into a particular terminal node are assigned to that class [7, 8, 9, 12]. Decision trees are designed for predicting categories rather than numeric quantities. Regression trees are designed for predicting numeric quantities; the same kind of tree representation can be used, but each terminal node contains a numeric value, namely the average of all the training-set values that reach that leaf, which represents the average output for observations that fall into it. Because statisticians use the term "regression" for the process of computing an expression that predicts a numeric quantity, decision trees with averaged numeric values at the leaves are called "regression trees" [8]. A regression tree is constructed in a manner similar to a decision tree. Much of the work in designing decision trees focuses on deciding which property should be tested at each node. The fundamental principle underlying the tree is that of simplicity. The goal is to obtain a simple, compact decision
tree with few nodes. The method is to seek, at each node, a property test that makes the data reaching the child nodes as "pure" as possible. The impurity of a node is measured by the squared difference between the values predicted by the tree and the real values (or class labels) [12, 14]. During the growing of the tree, the minimum impurity of every split is guaranteed. As a result, the growing algorithm searches for a threshold value at each node that splits the data into two branches, left and right, whose total impurity is as small as possible. In this paper, for instance, the splitting at node t is determined by a threshold on one of the elements of the feature vector ec(k,m). First, one column of the feature vector, ec3(k) or ec4(k), is selected and sorted in ascending order, denoted by esc3(k) or esc4(k) respectively. The matrix of threshold values splitj(k,i) at node t is calculated by (3):

split_{j}(k,i) = es_{j}(k,i) + \frac{es_{j}(k,i+1) - es_{j}(k,i)}{2} \qquad (3)
where i=1,2,…,m-1, k=1,2,…,(1152-w+1) and j represents C3 or C4. This means that the feature vector ec(k,m) at node t is classified to the left (or right) branch if the feature ej(k,m) is larger (or smaller) than splitj(k,i). The impurity R(t) of node t, and likewise of its left and right child nodes tL and tR, can then be calculated by (4) and (5):

R(t) = \frac{1}{n(t)} \sum_{e_{c}(k,m) \in t} \big(y_{m} - \bar{y}(t)\big)^{2} \qquad (4)

\bar{y}(t) = \frac{1}{n(t)} \sum_{e_{c}(k,m) \in t} y_{m} \qquad (5)
where ym is the class label of ec(k,m) and n(t) represents the number of feature vectors that fall into node t. Thus, for a splitting threshold value splitj(k,i) and node t, the difference in the mean squared error before and after splitting is

\Delta R\big(split_{j}(k,i), t\big) = R(t) - R(t_{L}) - R(t_{R}) \qquad (6)
where tL and tR are the left and right child nodes of t. Obviously, only R(t) is independent of splitj(k,i). The smaller the total impurity of tL and tR, the larger the difference ΔR(splitj(k,i),t); the maximum of ΔR(splitj(k,i),t) therefore implies that the child nodes tL and tR are the purest. Hence, the optimal threshold value splitj(k,i) may be obtained by maximizing ΔR(splitj(k,i),t). The aim of splitting is to minimize the node's impurity. If the impurity of tL (tR) is still large, then the same splitting procedure continues; otherwise, the node tL (tR) becomes a terminal node, and the predicted value ŷ of the node is estimated by the average of ym. Note that the predicted value ŷ is also the output of the regression tree,
\hat{y}(t) = \frac{1}{n(t)} \sum_{e_{c}(k,m) \in t} y_{m}. \qquad (7)
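The split search of equations (3)-(6) can be illustrated with the following Python sketch for a single feature column; the function names and the toy data are illustrative and not taken from the paper.

import numpy as np

def best_split(feature_column, labels):
    """Sketch of the split search: for one feature column (e_c3 or e_c4 at a
    fixed sampling point), try the midpoints between sorted values as
    thresholds (equation (3)) and keep the one maximizing the impurity drop."""
    order = np.argsort(feature_column)
    es, y = feature_column[order], labels[order]

    def impurity(subset):
        # R(t) of equation (4): mean squared deviation within the node.
        return 0.0 if subset.size == 0 else float(np.mean((subset - subset.mean()) ** 2))

    r_parent = impurity(y)
    best = (None, -np.inf)
    for i in range(y.size - 1):
        split = es[i] + (es[i + 1] - es[i]) / 2.0              # equation (3)
        left, right = y[: i + 1], y[i + 1:]
        delta = r_parent - impurity(left) - impurity(right)    # equation (6)
        if delta > best[1]:
            best = (split, delta)
    return best

# Toy example: class labels 1 (left) and 2 (right) and one feature column.
feat = np.array([0.2, 0.4, 0.5, 1.1, 1.3, 1.4])
lab = np.array([1, 1, 1, 2, 2, 2])
print(best_split(feat, lab))   # threshold 0.8 cleanly separates the two classes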
The regression tree T is obtained by iteratively splitting nodes. The main problem is when to stop the splitting during the growing. One method is to set a small threshold value on the impurity: splitting is stopped when the impurity of a node is less than this threshold. Another method is to stop splitting when a terminal node represents fewer than some threshold number of data points, say 5; in that case, after growing the tree, each terminal node holds fewer than 5 training samples. However, it is often difficult to find the best threshold value for these two methods. An alternative, principled method is pruning. In this method, the tree is first grown fully, meaning that the impurity of every terminal node is zero or no terminal node contains more than 5 training data points. This large tree fits the training data very well; however, it does not generalize well to the testing data. Thus, it is necessary to prune the meaningless branches from this large tree. As a result, a sequence of subtrees can be obtained by successively pruning branches [14]. The N subtrees are denoted T1, T2, T3, …, TN (T1=T, TN={t1}, where t1 is the root node). The 140 training feature vectors ec(k) are used to train the regression trees Tk, and the tree T* with the least training error is chosen for testing. In addition, T* is pruned into N subtrees, T*,1, T*,2, T*,3, …, T*,N, before testing. All subtrees except T*,N are used to classify the 140 testing feature vectors etc(k), respectively. If the output of the tree, Y, is smaller than 1.5 (the mean of the class labels of the left and right trials), the corresponding feature vector is classified as a left trial; on the contrary, if Y is larger than 1.5, it is classified as a right trial.
Y_{*,n}(k,m) = T_{*,n}\big(e_{tc}(k,m)\big) \qquad (8)
where T*,n is the n-th subtree of T* (n=1,2,…,N). Y*,n(k,m) indicates the confidence of the classification result: the closer Y*,n(k,m) is to 1 (or 2), the more confident we are that the trial is a left (or right) imagination. Finally, the error rate of T* is the minimum error rate over the N subtrees.
3 Result and Discussion In general, the classification error rate and the mutual information are used to quantify a BCI's performance. [10] proposed a measurement of information transfer based on mutual information, which is shown in equations (9) and (10):
MI_{n}(k) = 0.5 \cdot \log_{2}\big(1 + SNR_{n}(k)\big) \qquad (9)

SNR_{n}(k) = \frac{2\,\operatorname{var}_{m \in \{L,R\}}\{Y_{*,n}(k,m)\}}{\operatorname{var}_{m_{L} \in \{L\}}\{Y_{*,n}(k,m_{L})\} + \operatorname{var}_{m_{R} \in \{R\}}\{Y_{*,n}(k,m_{R})\}} - 1 \qquad (10)
Fig. 1 Box plot of Mutual information and Classification error rate for different window length (w=4~33) by cross validation. When w=19, the error rate is 0.1714, 0.1643, 0.1357, 0.1357, 0.1571, 0.2143, 0.1571, 0.1357, 0.1643 and 0.1643, the mutual information is 0.4136, 0.5189, 0.5340, 0.5519, 0.4616, 0.3547, 0.6173, 0.5025, 0.5681 and 0.5393, respectively
where {L} and {R} are the sets of left and right trials, mL and mR are the corresponding indices, and var{∗} is the variance. The mutual information of T* is the maximum mutual information over the N subtrees. The performance of a regression tree is primarily determined by its training data (such as the order in which the training data are presented to the growing algorithm). Thus, each experiment is repeated 10 times for cross-validation to find the general performance. For this purpose, the training and testing data are selected randomly and are non-overlapping. In addition, the training data still consist of 70 left trials and 70 right trials. The regression tree T* with the least training error is then learned from the training feature vectors, and its error rate and mutual information can be obtained. The same procedure is iterated 10 times, yielding 10 error rates and 10 mutual information values. Furthermore, the parameter w in (1) also affects the classification result. Fig. 1 is the box plot of classification error rates and mutual information for different window lengths. The box plot displays the minimum, first quartile (Q1), median (Q2), third quartile (Q3) and maximum of the data. The "+" represents an outlier, since that point lies beyond the whiskers, whose length is the interquartile range (Q3-Q1). When w=19, the median value of the mutual information reaches its maximum, so the optimal w in Fig. 1 is 19. For w=19, the median value of the mutual information is about 0.52 and the median value of the error rate is 0.16. Fig. 2 shows the classification error rate and mutual information at every sampling point, and the averaged output for the left and right trials, when w=19. From Fig. 2(a) ~ (b), the minimum error rate is 0.1357 at
Fig. 2 The Classification error rate, Mutual information and Averaged tree output for the left or right trials at every sampling point. (a) Classification error rate; (b) Mutual information; (c) Averaged tree output for the left or right trial
4.75 s and the maximum mutual information is 0.5739, also at 4.75 s. This is reasonable because the subject began the motor imagery after the cue was given at 3 s. It is also a very good result compared with BCI competition 2003 [13]. Fig. 2(c) illustrates the averaged tree output for right and left motor imagery; a distinct pattern can be seen between imagination of left and right hand movement from 3.5 s to 6 s. The training time of a fully grown tree is definitely longer than the training time of a linear classifier (such as LDA). In the Matlab simulation, the training time of growing all (1152-w+1) regression trees for the 140 training trials using non-optimized code adds up to approximately 124 s, while the processing time of the regression trees is about 9 s, much shorter than the growing time, because the classification operation in a tree only involves testing a few properties. Fig. 3 shows the pruned tree, which only has several branches. Fig. 3 The pruned tree that is applied to classify the testing data. The corresponding testing result is illustrated in Fig. 2. x1 and x2 denote ec3(k,m) and ec4(k,m), respectively
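To make the evaluation measure concrete, the following Python sketch computes the SNR-based mutual information of equations (9)-(10) from the subtree outputs at one sampling point, assuming class labels 1 (left) and 2 (right); the toy values are illustrative.

import numpy as np

def mutual_information(outputs, labels):
    """Sketch of equations (9)-(10) for the tree outputs Y at one sampling point."""
    y_all = np.asarray(outputs, dtype=float)
    labels = np.asarray(labels)
    left, right = y_all[labels == 1], y_all[labels == 2]
    snr = 2.0 * np.var(y_all) / (np.var(left) + np.var(right)) - 1.0   # equation (10)
    return 0.5 * np.log2(1.0 + snr)                                     # equation (9)

# Toy example: outputs close to 1 for left trials and close to 2 for right trials.
outs = np.array([1.1, 0.9, 1.0, 1.9, 2.1, 2.0])
labs = np.array([1, 1, 1, 2, 2, 2])
print(mutual_information(outs, labs))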
4 Conclusion A classical tree classifier is applied to a BCI based on the time course of ERD features. The simulation results show that the proposed BCI has a basically satisfactory offline performance. The classification performance of the decision tree is mainly determined by the training data, and the generalization of the tree is related to the pruning technique. Moreover, this dataset is based on a single subject; accordingly, the cross-validation results in this paper may not completely reflect the real performance of the proposed BCI. In the future, the proposed BCI can be tested on more subjects' datasets to investigate its generalization.
References 1. Wolpaw, J.R., Birbaumer, N., McFarland, D.J., Pfurtscheller, G., Vaughan, T.M.: Brain-Computer Interface for Communication and Control. Journal of Clinical Neurophysiology 113, 767–791 (2002) 2. Lotte, F., Congedo, M., Lecuyer, A., Lamarche, F., Arnaldi, B.: A Review of Classification Algorithms for EEG-based Brain-Computer Interfaces. Journal of Neural Engineering 4, R1–R13 (2007) 3. Bashashati, A., Fatourechi, M., Ward, R.K., Birch, G.E.: A Survey of Signal Processing Algorithms in Brain-Computer Interfaces based on Electrical Brain Signals. Journal of Neural Engineering 4(2), R32–R57 (2007) 4. Wolpaw, J.R.: Brain-Computer Interface Techology: A Review of the First International Meeting. IEEE Transactions on Rehabilitation Engineering 8(2), 164–173 (2000) 5. Pfurtscheller, G., Neuper, C.: Motor Imagery Activates Primary Sensorimotor Area in Humans. Neuroscience Letters 239, 65–68 (1997) 6. Pfurtscheller, G., da Silva, F.H.L.: Evented-Related EEG/MEG Synchronization and Desynchronization: Basic Principles. Clinical Neurophysiology 110, 1842–1857 (1999) 7. Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and Regression Trees. Wadsworth (1984) 8. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann Publishers, San Francisco (2005) 9. Safavian, S.R., Landgrebe, D.: A survey of Decision Tree Classifier Methodology. IEEE Transactions on Systems, Man, and Cybernetics 22(3), 660–674 (1991) 10. Schlogl, A., Keinrath, C., Scherer, R., Pfurtscheller, G.: Information Transfer of an EEG-based Brain-Computer Interface. In: Proceedings of the 1st International IEEE EMBS Conference on Neural Engineering, pp. 641–644 (2003) 11. Jia, W.Y.: Classification of Single Trial EEG during Motor Imagery based on ERD. In: Proceedings of the 26th Annual International Conference of the IEEE EMBS, pp. 5–8 (2004) 12. Duda, R.O., Hart, P.E., David, G.S.: Pattern Classification, 2nd edn. John Wiley & Sons Inc., Chichester (2001) 13. BCI competition II, http:// ida.first.fraunhofer.de/projects/bci/competition_ii/ 14. Martinez, W.L., Martinez, A.R.: Computational Statistics Handbook with MATLAB. Chapman & Hall/CRC, Boca Raton (2002)
MIDBSCAN: An Efficient Density-Based Clustering Algorithm Cheng-Fa Tsai and Chun-Yi Sung
Abstract. This investigation presents a clustering algorithm that incorporates a refined neighbor-searching procedure into the density-based IDBSCAN algorithm. The resulting algorithm performs fewer searches than standard IDBSCAN. Experimental results indicate that the proposed MIDBSCAN algorithm has a lower execution time cost than DBSCAN, IDBSCAN or KIDBSCAN, with a maximum deviation in clustering correctness rate of 0.1% and a maximum deviation in noise data filtering rate of 0.3%. Keywords: Data mining, Data clustering, Density-based clustering algorithm.
1 Introduction Data mining is widely employed in business management and engineering. The major objective of data mining is to discover helpful and accurate information among a vast amount of data, providing a reference basis for decision makers. Data clustering is currently a very popular and frequently applied analytical method in data mining. Research in data clustering focuses mainly on increasing the accuracy and reducing the clustering time cost [1]-[17]. Clustering schemes are classified as partitioning, hierarchical, density-based, grid-based and mixed methods. Partitioning methods, such as K-means, are the most popular clustering algorithms. The advantage of partitioning approaches is fast clustering, while the disadvantages are the instability of the clustering result, and inability to filter noise data. Hierarchical methods, such as CURE and CHAMELEON, involve constructing a hierarchical tree structure, and adopting it to perform clustering. These methods have high Cheng-Fa Tsai · Chun-Yi Sung Department of Management Information Systems, National Pingtung University of Science and Technology, 91201, Pingtung, Taiwan {cftsai,n9656018}@mail.npust.edu.tw H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 469–479. c Springer-Verlag Berlin Heidelberg 2009 springerlink.com
clustering accuracy, but suffer from continuously repetitive merging and partitioning: each instance must compare the attribute of all objects, leading to a high calculation complexity. Density-based methods, including DBSCAN, IDBSCAN and KIDBSCAN perform expansion and clustering based on density. These approaches can filter noise, and perform clustering in disordered patterns, but take a long time to perform clustering. Grid-based clustering algorithms, such as STING, segment data space into various grids, where each data point falls into a grid, and perform clustering with the data points inside the grids, thus significantly reducing the clustering time. Section 2 of this work reviews literature about the K-means, DBSCAN, IDBSCAN and KIDBSCAN algorithms. Section 3 describes in detail the concept, algorithm and implementation steps of MIDBSCAN mentioned in this investigation. Section 4 illustrates the experimental and analytical results. A summary is presented in Section 5.
2 Related Works This section introduces the K-means algorithm and the DBSCAN, IDBSCAN and KIDBSCAN density-based clustering methods, which are related to the proposed MIDBSCAN algorithm.
2.1 K-Means Algorithm K-means, presented by McQueen in 1967 [5], was the first clustering algorithm, and only needs the number of centroids K to be set. The clustering method of K-means is as follows: (1) Generate K points randomly, and define these as the centroids. (2) Read all of the data from the database and assign each point to the cluster of its closest centroid. (3) Recalculate the centroids; if they are unchanged, then conclude the algorithm; otherwise, repeat step (2). K-means is a partitioning clustering method, and therefore executes quickly. However, K-means produces unstable clustering results, and does not perform noise filtering.
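A minimal NumPy sketch of these three K-means steps follows; the random initialization, the iteration cap and the toy data are illustrative choices, not prescribed by the text.

import numpy as np

def kmeans(data, k, max_iter=100, seed=0):
    """Minimal sketch of the three K-means steps described above."""
    rng = np.random.default_rng(seed)
    # (1) Generate K points randomly and define these as the centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iter):
        # (2) Assign every point to the cluster of its closest centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # (3) Recalculate the centroids; stop when they no longer change.
        new_centroids = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

points = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5.0])
labels, centers = kmeans(points, k=2)
print(centers)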
2.2 DBSCAN Algorithm DBSCAN, proposed by Ester et al. in 1996 [2], was the first clustering algorithm to employ density as a condition. It places data points into the same cluster when the density of points around them is higher than a set threshold, and uses this cluster as the seed for outward expansion. The algorithm requires two parameters, the radius (ε) and the minimum number of included points (MinPts). To search for clusters, DBSCAN scans all data points in the database, and draws a circle of radius
Fig. 1 Density-Reachability of Core Point [2]
(ε) around each data point Q. The number of data points within the circle is its neighborhood. If the neighborhood is greater than the set MinPts, then its data points are set as expansion seeds; if P is the core point and a point's own neighborhood is less than the set MinPts, then that data point is considered a Border Point, as depicted in Fig. 1. DBSCAN can conduct clustering on disordered patterns, has noise filtering capacity, and its clusters are stable. However, parameter setting is difficult, and the chosen parameters affect the clustering result.
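The expansion described above can be sketched as follows in Python; the linear-scan region query mirrors the O(n) neighborhood search discussed later, and the toy two-blob data and parameter values are illustrative.

import numpy as np

def dbscan(data, eps, min_pts):
    """Compact sketch of the DBSCAN expansion described above."""
    UNCLASSIFIED, NOISE = 0, -1
    labels = np.full(len(data), UNCLASSIFIED)
    cluster_id = 0

    def region_query(i):
        # All points within the circle of radius eps around point i.
        return np.where(np.linalg.norm(data - data[i], axis=1) <= eps)[0]

    for i in range(len(data)):
        if labels[i] != UNCLASSIFIED:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE                    # not a core point (border or noise)
            continue
        cluster_id += 1
        labels[i] = cluster_id                   # i is a core point
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id           # border point reached from a core
            if labels[j] != UNCLASSIFIED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:      # j is itself a core point: keep expanding
                seeds.extend(j_neighbors)
    return labels

pts = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 8.0])
print(np.unique(dbscan(pts, eps=1.0, min_pts=5)))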
2.3 IDBSCAN Algorithm IDBSCAN is a density-based data clustering scheme developed by Borah et al. in 2004 [1]. This method applies Marked Boundary Objects to determine which data points become expansion seeds when searching a neighborhood for seeds to add. Assuming that the core point is P(0, 0), the eight marked boundary objects may be defined as A(0, ε), B(ε/√2, ε/√2), C(ε, 0), D(ε/√2, -ε/√2), E(0, -ε), F(-ε/√2, -ε/√2), G(-ε, 0) and H(-ε/√2, ε/√2), as shown in Fig. 2, where every quadrant contains three boundary objects. If P is the core point and it satisfies the set density condition, then the algorithm finds, within the neighborhood, the point closest to each of these eight marked boundary objects, and sets these data points as the expansion seeds. Since one seed may be selected by multiple marked boundary objects, each such point needs to be input only once. The number of seeds added is at most (3^d - 1), where d represents the dimension of the database. Fig. 2 Eight Marked Boundary Objects of IDBSCAN [1]
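A small Python sketch of the eight marked boundary objects and the resulting seed selection, assuming 2-dimensional data; the helper names and the random neighborhood are illustrative.

import numpy as np

def marked_boundary_objects(p, eps):
    """The eight marked boundary objects A..H around a 2-D core point p."""
    s = eps / np.sqrt(2.0)
    offsets = np.array([[0, eps], [s, s], [eps, 0], [s, -s],
                        [0, -eps], [-s, -s], [-eps, 0], [-s, s]])
    return np.asarray(p) + offsets

def select_seeds(p, eps, neighbors):
    """For each marked boundary object, keep the neighborhood point closest to
    it; the union of these points is used as the set of expansion seeds."""
    mbos = marked_boundary_objects(p, eps)
    neighbors = np.asarray(neighbors)
    idx = {int(np.argmin(np.linalg.norm(neighbors - mbo, axis=1))) for mbo in mbos}
    return neighbors[sorted(idx)]

p = np.array([0.0, 0.0])
nbrs = np.random.uniform(-6, 6, size=(40, 2))
nbrs = nbrs[np.linalg.norm(nbrs, axis=1) <= 6.0]   # points inside the eps-circle
print(select_seeds(p, 6.0, nbrs))                   # at most 8 seed points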
2.4 KIDBSCAN Algorithm KIDBSCAN is a density-based clustering method presented by Tsai and Liu in 2006 [8]. While searching for marked boundary objects with IDBSCAN, they found that inputting data sequentially from a low-density part of the database causes remnant seed searching, resulting in poor expansion results. To decrease the number of sample instances, KIDBSCAN performs expansion by inputting elite points first. It has three parameters: the number of elite points, the radius and MinPts. The execution steps are as follows. (1) Adopt the K-means algorithm to find the K centroids within the database, then find the K data points closest to these centroids and define them as elite points, since K-means can discover these elite points quickly. (2) Move the K elite points to the very front of the database. (3) Execute the IDBSCAN algorithm. Experimental results show that KIDBSCAN performs data clustering quickly.
3 The Proposed MIDBSCAN Algorithm The procedure of searching for Neighbors (neighborhood data points) is very time consuming in the DBSCAN and IDBSCAN algorithms. Therefore, to shorten the time consumed, this study focuses on lowering the number of data points examined in this neighborhood-search procedure, thus reducing the time cost of searching for Neighbors. This section introduces the method, concept, implementation procedures and algorithm of MIDBSCAN. Searching for neighborhood data in density-based clustering algorithms is time-consuming. In DBSCAN this time is spent in SetofPoints.regionQuery(), such that the time complexity of searching for neighborhood data is O(n), where n denotes the database size. Data points that have already been clustered do not need to be considered again when searching for neighborhood points (see Fig. 3).
Fig. 3 Conceptual illustration of the neighborhood search: after each completed clustering task, the time cost of the neighborhood search is lowered by the share of points already clustered, from 100% of the database, to 50% once Cluster 1 (50%) is excluded, to 20% once Cluster 2 (30%) is also excluded, leaving only the noise (20%)
Thus, the index array required for the neighborhood search is readjusted after each completed clustering task. This neighborhood search index array is defined at the beginning of the algorithm; it stores, in Boolean form, which data points are still unclassified. MIDBSCAN therefore requires n additional bytes of memory for a database of n points. Although MIDBSCAN uses more memory space than conventional IDBSCAN, it reduces the time consumed: the time cost of the next neighborhood search following completion of a clustering task is O(n - c), where n denotes the database size and c represents the number of data points already assigned to clusters. Two parameters are set when implementing MIDBSCAN: (1) the radius (ε), and (2) the minimum number of included points (MinPts). The MIDBSCAN clustering algorithm can be described as follows:

Input: Datasets, Eps, MinPts
Output: Clusters

MIDBSCAN(Datasets, Eps, MinPts)
  Initialization;
  ClusterID := NextID(First);
  FOR i FROM 1 TO Datasets.Size DO
    Point := Datasets.Get(i);
    IF Point.CID = UNCLASSIFIED THEN
      IF ExpandCluster(Datasets, Point, ClusterID, Eps, MinPts) THEN
        ClusterID := NextID(ClusterID);
        UnclassfiedData.Adjust();
      END IF
    END IF
  END FOR
END;

ExpandCluster(Datasets, Point, CID, Eps, MinPts) : Boolean;
  Neighbors := UnclassfiedData.RegionQuery(Point, Eps);
  IF Neighbors.size < MinPts THEN
    Datasets.ChangeCIDs(Point, NOISE);
    RETURN False;
  ELSE
    Datasets.ChangeCIDs(Point, CID);
    Neighbors.AddMBOs();
    FOR i FROM 1 TO Neighbors.size DO
      neighborPoint := Neighbors.Get(i);
      IF neighborPoint.CID = UNCLASSIFIED || neighborPoint.CID = NOISE THEN
        Datasets.ChangeCID(neighborPoint, CID);
      END IF;
    END FOR;
    WHILE Seeds <> Empty DO
      seedPoint := Seeds.First();
      Seeds.Delete(seedPoint);
      Neighbors := UnclassfiedData.RegionQuery(seedPoint, Eps);
      IF Neighbors.size >= MinPts THEN
        Neighbors.AddMBOs();
        FOR i FROM 1 TO Neighbors.size DO
          neighborPoint := Neighbors.Get(i);
          IF neighborPoint.CID = UNCLASSIFIED || neighborPoint.CID = NOISE THEN
            Datasets.ChangeCID(neighborPoint, CID);
          END IF;
        END FOR;
      END IF;
    END WHILE;
    RETURN True;
  END IF
END;
The implementation steps of the MIDBSCAN algorithm are described as follows (a sketch of the unclassified-data index follows the list). Step 1. Initialize all parameters, and define a new Cluster ID. Step 2. Begin scanning all data points in the database. For each data point that is still unclassified, run the ExpandCluster processing procedure, where Datasets is the set of data points, Point is the core point, ClusterID is the current cluster ID, ε represents the radius, and MinPts denotes the minimum number of included points. Step 3. If the expansion procedure returns the core point as a noise data point, then go directly to Step 2 until the Datasets database has been fully scanned. If an expanded cluster is returned, then update the Cluster ID, adjust the index array of unclassified data, and go to Step 2. Step 4. End the algorithm when all data points have been processed. The implementation steps of the ExpandCluster processing procedure are as follows. Step 1. Search for Neighbors within the radius ε in the unclassified-data index. If the number of Neighbors is less than MinPts, then leave the procedure and return the core point as a noise data point; otherwise, go to Step 2. Step 2. Set the core point to the current cluster ID. Step 3. If the set of seeds is empty, then end the expansion procedure; otherwise go to Step 4. Step 4. Search for the marked boundary points within the Neighbors, and add them to the expansion Seeds. Step 5. Set all Neighbors that are unclassified or noise data points to the current cluster ID. Step 6. Extract the first seed from the expansion Seeds, define it as the core point, and delete it from the Seeds. Step 7. In the unclassified-data index, search for Neighbors within the radius ε of the core point. If the number of Neighbors is greater than MinPts, then go to Step 3.
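A minimal Python sketch of the Boolean unclassified-data index and the restricted region query described above; the class name and the NumPy-based distance computation are illustrative, not the authors' implementation.

import numpy as np

class UnclassifiedIndex:
    """Sketch of the Boolean index array: region queries are run only over
    points that are still unclassified, and the index is adjusted after each
    completed clustering task."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.unclassified = np.ones(len(self.data), dtype=bool)   # one flag per point

    def region_query(self, point, eps):
        # Distances are computed only for still-unclassified points, so the
        # cost is O(n - c) after c points have been clustered.
        candidates = np.where(self.unclassified)[0]
        d = np.linalg.norm(self.data[candidates] - point, axis=1)
        return candidates[d <= eps]

    def adjust(self, clustered_ids):
        # Counterpart of UnclassfiedData.Adjust() in the pseudocode above.
        self.unclassified[np.asarray(clustered_ids, dtype=int)] = False

idx = UnclassifiedIndex(np.random.rand(1000, 2))
hits = idx.region_query(np.array([0.5, 0.5]), eps=0.05)
idx.adjust(hits)
print(len(hits), int(idx.unclassified.sum()))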
4 Experimental Results The clustering algorithm was implemented in the C# language in Microsoft Visual Studio 2005 on a notebook computer with a 1.6 GHz Intel CPU, with 1G of RAM, running Windows XP. The clustering correctness rate and noise filtering rate experiments were performed with six different pattern datasets. An additional pattern dataset was designed for the time cost experiment. The pattern image size was 900x700 pixels. The number of datasets
Fig. 4 The experimental results obtained by MIDBSCAN
within Dataset 1 to Dataset 6 was 115,000 points of 2-dimensional data, including 15,000 (15%) noise data points, with a fixed radius (ε = 6). Dataset 7 contained 110,000-1,100,000 instances of data, in six different sizes (each containing 10% noise data points), with a fixed radius (ε = 16), for experimentation with datasets of varying sizes, as shown in Fig. 4. The proposed Neighbors-search processing was applied to the DBSCAN, IDBSCAN and KIDBSCAN algorithms, to form DBSCAN Plus, MIDBSCAN and KIDBSCAN Plus, so the experiments were performed on six algorithms in total. The radius (ε) was fixed, and MinPts was adjusted according to the pattern data density in each experiment. The experiments measured (1) execution time, (2) clustering correctness rate and (3) noise filtering rate. Table 1 lists the dataset sources and characteristics. Experimental results involving various pattern datasets comparing the original DBSCAN, IDBSCAN and KIDBSCAN algorithms and those with
Table 1 Benchmark Datasets

Benchmark   Cluster  Points        Noise Rate  Source
Dataset 1   10       115,000       15%         Guha et al., 1998
Dataset 2   4        115,000       15%         Ester et al., 1996
Dataset 3   14       115,000       15%         C.F. Tsai and C.C. Yen, 2007
Dataset 4   2        115,000       15%         C.F. Tsai and C.C. Yen, 2007
Dataset 5   4        115,000       15%         C.F. Tsai and C.C. Yen, 2007
Dataset 6   4        115,000       15%         C.F. Tsai and C.C. Yen, 2007
Dataset 7   5        110K-1,100K   10%         2010 Olympic Winter Games
Table 2 A comparison between DBSCAN, DBSCAN Plus, IDBSCAN, MIDBSCAN, KIDBSCAN and KIDBSCAN Plus algorithms, using 6 kinds of pattern datasets, with data size being 115,000 (containing 15% noise data). Item 1 represents Execution Time (in seconds); Item 2 denotes Clustering Correctness Rate; Item 3 indicates Noise Filtering Rate

Algorithm       Item  DataSet 1  DataSet 2  DataSet 3  DataSet 4  DataSet 5  DataSet 6
DBSCAN          1     1217.10    1210.50    1211.34    1211.86    1211.75    1246.51
                2     100%       99.99%     99.83%     99.96%     99.99%     99.97%
                3     95.06%     95.55%     92.43%     96.93%     95.72%     96.25%
DBSCAN Plus     1     807.18     877.39     770.95     920.03     874.25     857.40
                2     100%       99.99%     99.83%     99.98%     99.99%     99.96%
                3     95.02%     95.40%     92.31%     96.82%     95.62%     96.14%
IDBSCAN         1     278.84     422.95     425.81     465.43     381.06     435.17
                2     100%       99.97%     99.99%     99.96%     99.99%     99.95%
                3     95.42%     96.23%     92.88%     97.40%     95.94%     96.52%
MIDBSCAN        1     71.48      132.92     130.84     166.09     112.39     133.70
                2     100%       99.91%     99.94%     99.90%     99.97%     99.94%
                3     95.77%     96.78%     93.20%     97.66%     96.20%     96.81%
KIDBSCAN        1     275.54     415.95     424.73     459.00     379.17     436.06
                2     100%       99.98%     99.97%     99.96%     99.98%     99.95%
                3     95.43%     96.24%     92.91%     97.40%     95.96%     96.56%
KIDBSCAN Plus   1     124.64     248.53     233.17     299.25     216.82     255.92
                2     100%       99.99%     99.99%     99.96%     99.98%     99.96%
                3     95.38%     95.06%     92.75%     97.30%     95.90%     96.42%
Neighbors-search processing indicate a variation in the clustering error rate of about 0.1%, and the variation in the noise filtering rate never exceeded 0.3%, revealing that the proposed method does not influence the clustering quality and noise filtering capacity of the original algorithms. Experimental results on Dataset 1 demonstrate that the particularly high density of this dataset led to a large MinPts and a high execution time cost. Additionally, the execution time cost of the KIDBSCAN Plus algorithm on these datasets was also higher, since this algorithm also contains the K-means algorithm, which makes it likely to produce instability in the execution time cost. The time cost of the DBSCAN and DBSCAN Plus algorithms on the 440K dataset exceeded 10,000 seconds (see Table 3). Because the objective of the experiment was to determine the efficiency that could be achieved by incorporating the proposed method into each of the algorithms, no further experiments were performed with these algorithms on datasets larger than 660K. Hence, Table 3 does not list all of the simulation results for DBSCAN and DBSCAN Plus (N/A means that the simulations were not performed). Adding the proposed method to IDBSCAN and KIDBSCAN significantly changed their time costs on datasets of size up to 1,100K. The Neighbors-search processing reduced the time wasted on unnecessary searches, thus
Table 3 Using the Dataset 7 pattern to construct experiments with various dataset sizes to test the execution time cost (in seconds) of various clustering algorithms, with each dataset size containing 10% noise data points

Algorithm       110K     220K     440K      660K     880K     1,100K
DBSCAN          1116.46  4466.46  17840.89  N/A      N/A      N/A
DBSCAN Plus     769.25   3184.87  12444.76  N/A      N/A      N/A
IDBSCAN         133.23   439.65   1699.25   1098.75  1510.66  9794.62
MIDBSCAN        26.76    82.50    300.53    372.20   4976.26  1451.60
KIDBSCAN        133.57   459.92   1751.96   1077.65  1480.67  9631.90
KIDBSCAN Plus   43.10    188.04   335.29    1101.82  1465.18  1687.39
improving the efficiency of searching large databases and significantly reducing the execution time cost. The time cost of the KIDBSCAN Plus algorithm did not differ significantly from that of KIDBSCAN on the datasets of size 660K and 880K. The KIDBSCAN algorithm first uses the K-means algorithm to find the centroids, moves the data points closest to the centroids to the front end of the database, and then runs the IDBSCAN algorithm. Since the K-means algorithm uses random values to extract the K centroids, an unstable centroid extraction may compromise the implementation efficiency of IDBSCAN.
Fig. 5 Execution time cost (in seconds) of processing various dataset sizes by the MIDBSCAN, IDBSCAN, KIDBSCAN and KIDBSCAN Plus algorithms
5 Conclusion The proposed processing method for the Neighbors search was added to the DBSCAN, IDBSCAN and KIDBSCAN algorithms, and tested with datasets of various patterns and sizes. The IDBSCAN algorithm with the new method, called MIDBSCAN, had a better execution time cost, clustering correctness rate, noise filtering rate and clustering stability than IDBSCAN alone. Acknowledgement. The authors would like to thank the National Science Council of the Republic of China, Taiwan, for financially supporting this research under contract no. NSC 96-2221-E-020-027.
References 1. Borah, B., Bhattacharyya, D.K.: An Improved Sampling-Based DBSCAN for Large Spatial Databases. In: Proceedings of International Conference on Intelligent Sensing and Information, pp. 92–96 (2004) 2. Ester, M., Kriegel, H., Sander, J., Xu, X.: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In: Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining, pp. 226–231 (1996) 3. Guha, S., Rastogi, R., Shim, K.: CURE: An Efficient Clustering Algorithm for Large Data Bases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. 27(2), pp. 73–84 (1998) 4. Karypis, G., Han, E.H., Kumar, V.: CHAMELEON: Hierarchical Clustering Using Dynamic Modeling. IEEE Computers 32(8), 68–75 (1999) 5. McQueen, J.B.: Some Methods of Classification and Analysis of Multivariate Observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297 (1967) 6. Tsai, C.F., Chen, Z.C., Tsai, C.W.: MSGKA: An Efficient Clustering Algorithm for Large Databases. In: IEEE International Conference on Systems, Man, and Cybernetics, vol. 5, pp. 6–13 (2002) 7. Tsai, C.F., Lee, J.C.: DK-Means: A Robust New Clustering Technique in Data Mining for Databases. Electronic Commerce Studies 5(4), 419–438 (2007) 8. Tsai, C.F., Liu, C.W.: KIDBSCAN: A New Efficient Data Clustering Algorithm for Data Mining in Large Databases. In: Rutkowski, L., Tadeusiewicz, ˙ R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2006. LNCS(LNAI), vol. 4029, pp. 702–711. Springer, Heidelberg (2006) 9. Tsai, C.F., Shih, D.C., Liu, C.W.: FICA: A New Data Clustering Technique Based on Partitional Approach for Data Mining. In: IEEE International Conference on Machine Learning and Cybernetics, Hong Kong, vol. 2, pp. 739–744 (2007) 10. Tsai, C.F., Tsai, C.W., Wu, H.C., Yang, T.: ACODF: A Novel Data Clustering Approach for Data Mining in Large Databases. Journal of Systems and Software 73, 133–145 (2004)
11. Tsai, C.F., Wu, H.C., Tsai, C.W.: A New Data Clustering Approach for Data Mining in Large Databases. In: The 6th IEEE International Symposium on Parallel Architectures, Algorithms, and Networks, Manila, Philippine, pp. 278– 283 (2006) 12. Tsai, C.F., Yang, T.: An Intuitional Data Clustering Algorithm for Data Mining in Large Databases. In: IEEE International Conference on Informatics, Cybernetics and Systems, Taiwan, pp. 1487–1492 (2003) 13. Tsai, C.F., Yen, C.C.: ANGEL: A New Effective and Efficient Hybrid Clustering Technique for Large Databases. In: Zhou, Z.-H., Li, H., Yang, Q. (eds.) PAKDD 2007. LNCS(LNAI), vol. 4426, pp. 817–824. Springer, Heidelberg (2007) 14. Tsai, C.F., Yen, C.C.: G-TREACLE: A New Grid-Based and Tree-Alike Pattern Clustering Technique for Large Databases. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) PAKDD 2008. LNCS, vol. 5012, pp. 739–748. Springer, Heidelberg (2008) 15. Tsai, C.F., Yen, C.C.: Unsupervised Anomaly Detection Using HDG-Clustering Algorithm. In: Ishikawa, M., Doya, K., Miyamoto, H., Yamakawa, T. (eds.) ICONIP 2007, Part II. LNCS, vol. 4985, pp. 356–365. Springer, Heidelberg (2008) 16. Wang, W., Yang, J., Muntz, R.: STING: A Statistical Information Grid Approach to Spatial Data Mining. In: Proceedings of 23rd International Conference on Very Large Data Bases, pp. 186–195 (1997) 17. Xu, R., Wunsch, D.: Survey of Clustering Algorithm. Proceedings of IEEE Transactions on Neural Networks 16(3), 645–678 (2005) 18. Vancouver 2010 XXI Olympic Winter Games, International Olympic Committee, http://www.olympic.org/uk/games/vancouver/index_uk.asp
Detection and Following of a Face in Movement Using a Neural Network Jaime Pacheco Martínez, José de Jesús Rubio Avila, and Javier Guillen Campos
Abstract. In this paper, a system that reduces the time needed to detect a person in an environment with controlled lighting is proposed. To reach this result the following 3 steps are used: 1) detection of the movement, which consists in detecting the moving object (the object is a person); 2) detection of the face: the face is found in a moving image, using a camera and a fuzzy neural network; 3) following of the person in movement: the camera is moved in the direction of the moving person, the position being predicted with the Kalman filter algorithm. This system is implemented in real time.
1 Introduction Computer vision systems are used in several processes, for example in manufacturing plants or in military applications. In manufacturing plants they are used for inspection or for quality control of the products. In military applications they are used to find and follow the enemy in order to destroy it. In a computer vision system for following persons, the first step is the detection of the moving object, the second step is to use an algorithm to interpret the image, and the last step is the prediction and the movement of the camera. The first research on face recognition started in the sixties [3]. It consisted of a human-machine system where the computer saves and classifies faces in a database constructed from photos of humans [23]. This research showed that the eyes and the mouth are the first features to be recognized, and that the distances between the eyes, the nose and the mouth are the second set of parameters to be recognized.
[email protected]
Currently, there exist several methods to detect faces in images. Yang [25] classifies these methods into 4 categories: methods based on knowledge, on invariant characteristics, on template matching, and on facial appearance and textures. Wang [15] converts the images into a reduced image to eliminate the textures, and then takes the characteristics of the eyes of possible faces, obtaining 89.3% correct detection on 402 images. Froba and Ernst [8] detect frontal faces by taking local characteristics with a mask of 3x3 pixels. Hamouz [11] presents a method to detect and localize frontal faces: first, he looks for and filters 10 independent characteristics of the face, such as the corners and the centers of the eyes, the nose and the mouth. Hichem [14] presents a fast learning method to detect the color of the skin, using a method based on the form of the face and a neural network. Rowley [20] first looks for frontal faces and later detects faces inclined at some angle. Garcia [9] uses a neural network that deduces the best filter automatically; this network detects faces inclined at angles from 20 to 60 degrees. Malasne [17] uses a radial basis neural network to handle faces of different dimensions. In Rafael [19] and Cesar [5] a device is designed consisting of a camera that follows a white point based on several images and predicts its movement with the Kalman filter. In this paper, a system is proposed that follows an object (a person) using a camera mounted on a mechanical rotational system. It detects the movements using the difference between consecutive images, looks for the face using a fuzzy neural network, and finally predicts the next movement with the Kalman filter algorithm.
2 The General System The general problem is to find and follow a person. To reach this objective, the system given in Fig. 1 is proposed, and the flow diagram of the algorithm is given in Fig. 2. To implement the system it is necessary to capture consecutive images and to save these images on the hard disk for later analysis. The system includes a video capture card of 720x480 pixels and a video camera of 512x480 pixels. The images are taken in black and white. The flow diagram of Fig. 2 can be summarized in 3 steps: the first step is the capture of the image; the second step is the processing, segmentation and analysis of the image; and the last step is the recognition and interpretation of the image.
Fig. 1 The system proposed
2.1 Capture of the Image The time needed to capture the image depends on the scanning used in the standard TV formats NTSC and PAL, which capture only half of the horizontal lines in each field.
2.2 Looking for the Movement Looking for the movement of a person in an image is one of the first steps of the segmentation, as can be seen in Fig. 1. Let us consider the following equations [10]:

C = |A(t_{0}) - B(t_{1})| \qquad (1)

\text{if } C < \text{threshold, then } C = 0 \qquad (2)
where A is the image at time 0 and B is the image at time 1, and C is the absolute difference between the two images. Using the difference between two images of a person, called T0 and T1, with equations (1) and (2), the movement can be detected; in addition, noise can be eliminated.
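A minimal Python sketch of this frame-difference step follows; the explicit threshold value used to suppress small differences is an illustrative assumption, since the text itself only specifies the absolute difference and the clamping of equation (2).

import numpy as np

def motion_mask(frame_t0, frame_t1, threshold=30):
    """Absolute difference of two grey-level frames followed by thresholding."""
    a = frame_t0.astype(np.int16)
    b = frame_t1.astype(np.int16)
    c = np.abs(a - b)                 # C = |A(t0) - B(t1)|
    c[c < threshold] = 0              # suppress small differences / noise
    return c.astype(np.uint8)

# Toy example: a bright square moves a few pixels between the two frames.
t0 = np.zeros((480, 512), dtype=np.uint8); t0[100:150, 100:150] = 200
t1 = np.zeros((480, 512), dtype=np.uint8); t1[105:155, 108:158] = 200
diff = motion_mask(t0, t1)
print(int((diff > 0).sum()), "changed pixels")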
2.3 Segmenting of the Movement Sometimes, depending on the clothes and on the lighting, the result of the thresholded difference of two images shows the segmentation of the moving image only with holes. To overcome the problems related with the
Fig. 2 Flow diagram of the algorithm
illumination and the color, dilation is applied first and erosion afterwards, as given in the following equations [10]:

X \cdot B = (X \oplus B) \otimes B \qquad (3)

where:

X \oplus B = \{d \in E^{2} : d = x + b \text{ for some } x \in X \text{ and } b \in B\} \qquad (4)

X \otimes B = \{d \in E^{2} : d + b \in X \text{ for each } b \in B\} \qquad (5)

The result of the dilation and the erosion is a mask that will be used later.
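A small Python sketch of this dilation-followed-by-erosion (morphological closing) using SciPy; the structuring-element size and the toy mask are illustrative assumptions.

import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

def close_mask(mask, size=5):
    """Dilation followed by erosion (equation (3)) with a square structuring element B."""
    structure = np.ones((size, size), dtype=bool)
    dilated = binary_dilation(mask, structure=structure)    # X dilated by B
    closed = binary_erosion(dilated, structure=structure)   # then eroded by B
    return closed

# Toy example: a blob with a small hole in it; the closing fills the hole.
m = np.zeros((50, 50), dtype=bool)
m[10:40, 10:40] = True
m[21:24, 21:24] = False            # a hole left by the thresholded difference
print(int(close_mask(m).sum()) > int(m.sum()))   # True: the hole has been filled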
2.4 Analysis of the Area In the process of looking for a person in movement in an image, it is assumed that there is no other body or object in movement. In truth this may not always hold, which is why the mask is used to find the biggest area, which is taken to be the person.
2.5 The Mask The binary mask has two purposes: 1. performing the segmentation of the two images T0 and T1; 2. determining how big the body of the moving person is. To segment the images, the product operation is used: since the mask is an image containing only 0s and 1s, taking the product of an image (T0 or T1) with the mask segments the moving body.
2.6 Looking for the Face The vertical derivative of the mask combined with one of the images is computed, with the objectives of detecting the form of the person and taking a measure of the width of the face. The width of the face gives a measure of how far away the person is. This algorithm has the disadvantage that, if the person has long hair, the distance to the person can be mistaken. To look for the face, a fuzzy neural network with 12 inputs was used. The neural network was trained with 3 images of 3 kinds of skin: the first was a blond person, the second a brown-skinned person and the last a black person. If the tested part of the body is close to one of the 3 kinds of persons (blond, brown or black), it is taken to be the face of the person. The fuzzy neural network used in this study was the one found in the Matlab toolbox.
2.7 The Square After the face is found, its coordinate is saved; it will be the upper-left corner of the square. The square for the face is built using the following equations:
Scaling = \frac{\text{Coordinate of the mask}}{\text{Scale}} \qquad (6)

\text{Original coordinate} = \text{Original Image} + \text{Scaling} \qquad (7)
2.8 The Calculus and the Prediction After the original face is marked with a square, the position of the square and the image are sent to the Kalman filter algorithm. The Kalman filter algorithm finds the position and the velocity of the image from the origin of the square and the following images (T2 and T3), in order to follow the localized person. The square around the face of the person has two objectives: 1. localizing the person and determining the average position; 2. allowing a specific analysis of the person. To use the Kalman filter algorithm, a mathematical model of a person walking in a straight line is used. The state of the system is the position of the person (p) and the velocity (v); the input (u) is the acceleration and the output (y) is the position, which in this case is the center of the square. The velocity v of the object is given by the following equation:

v_{k+1} = v_{k} + T u_{k} \qquad (8)

The change of the velocity is affected by noise, i.e., the noise needs to be included in the equation as follows:

v_{k+1} = v_{k} + T u_{k} + \tilde{v}_{k} \qquad (9)

where \tilde{v}_{k} is the noise added to the velocity. A similar equation is obtained for the position p:

p_{k+1} = p_{k} + T v_{k} + \frac{1}{2} T^{2} u_{k} + \tilde{p}_{k} \qquad (10)

where \tilde{p}_{k} is the noise added to the position. Let us define the state of the system as:

x_{k} = \begin{bmatrix} p_{k} \\ v_{k} \end{bmatrix} \qquad (11)

Finally, the equations of the system are:

x_{k+1} = \begin{bmatrix} 1 & T \\ 0 & 1 \end{bmatrix} x_{k} + \begin{bmatrix} T^{2}/2 \\ T \end{bmatrix} u_{k} + w_{k}, \qquad y_{k} = \begin{bmatrix} 1 & 0 \end{bmatrix} x_{k} + z_{k} \qquad (12)

where z_{k} is the measurement noise produced by the instrumentation errors, which should be minimal.
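A minimal Python sketch of a Kalman filter for this model follows; the sampling period T, the noise covariances and the zero assumed acceleration input are illustrative values not given in the paper.

import numpy as np

# Model of equations (8)-(12).
T = 1.0 / 30.0                                  # sampling period (assumed)
A = np.array([[1.0, T], [0.0, 1.0]])            # state transition matrix
B = np.array([[T ** 2 / 2.0], [T]])             # input (acceleration) matrix
C = np.array([[1.0, 0.0]])                      # measurement matrix: y_k = p_k
Q = np.eye(2) * 1e-3                            # process noise covariance (assumed)
R = np.array([[1.0]])                           # measurement noise covariance (assumed)

x = np.zeros((2, 1))                            # state [position; velocity]
P = np.eye(2)

def kalman_step(x, P, u, y):
    # Prediction
    x_pred = A @ x + B * u
    P_pred = A @ P @ A.T + Q
    # Correction with the measured centre of the face square
    K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(2) - K @ C) @ P_pred
    return x_new, P_new

# Feed in a few measured square centres (pixels) with zero assumed acceleration.
for y_meas in [100.0, 103.0, 107.0, 112.0]:
    x, P = kalman_step(x, P, u=0.0, y=np.array([[y_meas]]))
print(x.ravel())    # estimated position and velocity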
Fig. 3 Results 1 and 2
2.9 Sending of the Data After the movement is predicted with the Kalman filter algorithm, the algorithm sends a datum proportional to the turn angle for the camera. This process uses communication between two applications: the first is Matlab, where a program handles the opening of the communication, the flow of the data and the end of the communication; the second is a microcontroller (PIC16F877) connected with a serial cable and a MAX232 circuit, reaching speeds of 921 Kbps.
3 Simulations The following figures show the final results of the faces found for several persons. Table 1 shows the time used by each stage of the program to finish its run.
Fig. 4 Results 3 and 4
Fig. 5 Results 5 and 6
Fig. 6 Results 7 and 8
Table 1

Stage                    Minimum times (sec)  Maximum times (sec)
Image T0                 0.333                0.333
Image T1                 0.333                0.333
Difference               0.016                0.047
Filter                   0.125                0.235
Dilatation and erosion   2.281                2.267
Area and center          0.547                1.016
Mixing                   0.0001               0.00016
Looking for the face     182.34               236.5
Kalman filter            0.192                0.92
Turn of the servomotor   0.002                0.28
Total                    186.1691             241.93
4 Conclusions In this study an algorithm for the detection and following of a person moving in images was proposed. It uses several techniques. The mask allows the segmentation of the image and provides the width and the area of the body. The fuzzy neural network determines whether a face exists, considering 12 averaged inputs. The Kalman filter algorithm gives a prediction of the movement of a person who moves forward in a straight line. The system was implemented in real time. Acknowledgement. The authors are thankful to the editor for inviting them to be part of the committee.
References 1. Baron, R.J.: Mechanisms of Human Facial Recognition. International Journal in Man 15, 137–178 (1981) 2. Barron, J.L., Fleet, D.J., Beauchemin, S.S.: Performance Optical Flow 3. Bledsoe, W.W.: Man Machine Facial Recognition. Technical Report, Palo Alto California (1966) 4. Castleman, K.R.: Digital Image Processing. Prentice-Hall, Englewood Cliffs (1996) 5. Cesar, D.: Seguimiento de Objetos Por Medio de Visi´ on activa. Technical Report of the INAOE (2002) 6. Davies, G.M., Ellis, H.D., Shepherd, J.W. (eds.): Perceiving and Remembering Faces. Academic Press Series in Cognition and Perception. Academic Press, London (1981) 7. Diccionario, L.E.: Diccionario de la Real Academia Espa˜ nola. Vig´esima Segunda Edici´ on (2001) 8. Froba, B., Ernst, A.: Face Detection with the Modified Census Transform. In: Sixth IEEE International Conference Automatic Face and Gesture Recognition, pp. 91–96 (2004) 9. Garcia, C., Delakis, M.: A Neural Architecture for Fast and Robust Face Detection. In: IEEE IAPR International Conference on Pattern Recognition, vol. 2, pp. 40–43 (2002) 10. Gonzalo, P.: Vision Por Computadora. Editorial Alfaomega (2002) 11. Hamouz, M., Kittler, J., Kamarainen, J.K., Palanen, P., Kalviainen, H.: Affine Invariant Face Detection and Localization Using Gmm-Based Feature Detector and Enhanced Aparence Model. In: Sixth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 67–72 (2004) 12. Haralick, R.M., Shapiro, L.G.: Computer and Robot Vision. Addison-Wesley, Reading (1992) 13. Hartley, R.I., Zisserman, A.: Multiple View Gometry in Computer Vision. Cambridge University Press, Cambridge (2000) 14. Hichen, S., Bougemma, N.: Coarce to Fine Face Detection Based on Kkin Color Adaptation. INRIA France, http://www-rocq.inria.fr/imedia
15. Kongqiao, W.: Automatical Face Detection in Images with Complex Background. Neural Networks and Signal Processing 2, 1027–1030 (2003) 16. Kuo, B.C.: Sistemas de Control Digital. Editorial CECSA (1996) 17. Malasne, N., Yang, F., Paindavoine, M.: Real Time Implementation of a Face Tracking. IEEE Transactions on Neural Networks 14 (2003) 18. Martines, R.K.: Control de Movimientos de Robots Manipuladores. Editorial Prentice Hall, Englewood Cliffs (2003) 19. Rafael, J.: Dise˜ no del Control de un Robot de dos Grados de Libertad Para Aplicaciones de Seguimiento de Objetos. Technical Report of the INAOE (2003) 20. Rowley, H.A., Baluja, S., Takeo, K.: Neural Network-Based Face Detection. IEEE Transactions on Analysis and Machine Intelligence 20, 23–38 (1998) 21. Shing, Y., Jang, R.: Neuro Fuzzy and Soft Computing. Editorial Prentice Hall, Englewood Cliffs (1997) 22. Siyan, K.: Windows 2000 TCP/IP. Prentice-Hall, Englewood Cliffs (2001) 23. Taylor, W.K.: Machine Learning and Recognition of Faces. Electronics Letters 3, 436–437 (1967) 24. Welch, G., Bishop, G.: An Introduction to the Kalman Filter. Department of Computer Science, University of North California at Chapel Hill 25. Yang, M., Kriegman, D.J., Narendra, A.: Detecting Faces in Images: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (2002)
Nonparametric Inter-Quartile Range for Error Evaluation and Correction of Demand Forecasting Model under Short Product Lifecycle Wen-Rong Li* and Bo Li
Abstract. In manufacturing, many important decisions are based on demand forecasting of products. However, many uncertainties lead to demand forecasting error. In particular, semiconductor manufacturing is characterized by its short product lifecycle, which means only limited historical data of a single product can be used to support error evaluation and correction of the demand forecasting model, while traditional forecasting error evaluation and correction methods need a large sample size to ensure the quality of the results. To solve this problem, a new method titled "Nonparametric Inter-Quartile Range" (NIQR), which combines nonparametric kernel density estimation with the cumulative probability distribution function, is proposed for error evaluation and model selection, and the second quartile is used to correct the model's forecasting error. Numerical experiments in semiconductor manufacturing are used to show the feasibility and effectiveness of the proposed NIQR method for the small sample sizes of short-lifecycle products. Keywords: Uncertainty, Demand forecasting error, Short product lifecycle, Nonparametric inter-quartile range, Evaluation, Correction, Small sample size.
1 Introduction Demand forecasting in manufacturing is often affected by uncertainty, and at the same time it is impossible to consider in the demand forecasting model all the factors that impact product demand. Demand forecasting error is the primary root of irrationality in production planning decision-making, so how to evaluate and correct the error of a demand forecasting model has been one of the hottest topics in the demand forecasting field. Research on error evaluation and correction of demand forecasting models has been carried out widely. In the Semiconductor Demand Forecast Accuracy Model Wen-Rong Li . Bo Li Institute of Astronautics & Aeronautics, University of Electronic Science and Technology of China, Chengdu 610054, China
[email protected],
[email protected] *
Corresponding author.
(SeDFAM) [1], variance and covariance were used to describe demand forecasting error. The Geometric Brownian Motion (GBM) process [2][3] has been borrowed to calibrate demand forecasting error, showing that demand forecasting error grows linearly with the length of the forecasting period. Forecasting errors have also been collected as a new data series for dynamic prediction of future forecasting errors using different forecasting models, such as the GM(1,1) model and neural networks, in order to correct the model's future forecasting error and improve forecasting precision [4][5]. The impacts of other important uncertainty factors besides the model's independent variables can be considered in the forecasting model using weight coefficients to increase forecasting precision [6]. However, the above methods are popular in the research field, while statistical methods are widely used in practice in manufacturing enterprises. The existing practical statistical methods are as follows. Measures of the spread of the forecast error, including variance, covariance, CV (Coefficient of Variation) [7], the sigma rule, range [8] and the Inter-Quartile Range (IQR) [9], are used to support forecasting model evaluation and better model selection. Measures of the center of the forecast error, such as the mean or median of the forecasting error or of the relative forecasting error, are often chosen to correct the forecasting error. In this paper, we focus on statistical methods. The median and IQR are robust statistics, resistant to extreme values, and are popular with manufacturing managers; they can be obtained easily using commercial statistical analysis software such as SPSS or JMP. It is well known that traditional statistical methods all need large samples to ensure the quality of the results. However, in semiconductor manufacturing, the historical data of a single product is limited because of its short product lifecycle. Moreover, these statistics may change dynamically over time, which leads to even smaller samples for median and IQR calculation within one short forecasting period. So far, there are no existing methods that handle small samples effectively, so it is high time to find an effective way to solve this problem. Nonparametric kernel density estimation [10] is borrowed to fit the unknown distribution using the small samples from a short-lifecycle product, and is then combined with the cumulative probability function to calculate the IQR and median of the forecasting errors, in order to evaluate and correct the error of demand forecasting models. This new method, called Nonparametric Inter-Quartile Range, is proposed for more effective error evaluation and correction of demand forecasting models, using relative forecast errors as the distribution statistic. It avoids the requirement of a large sample size, which is a breakthrough over traditional statistical methods. This paper is organized as follows. Section 2 proposes the framework of the NIQR method. Section 3 provides the details of the NIQR method and shows its benefits. In Section 4, numerical experiments are used to validate the distinctiveness of the proposed new method by comparison with widely used traditional methods. Finally, the conclusion is reported in Section 5.
2 The Framework of Nonparametric Inter-Quartile Range
This section proposes and describes the framework of the new Nonparametric Inter-Quartile Range method, which solves the problem of how to evaluate and correct the error of demand forecasting models with a small sample size.
In order to overcome the small-sample problem caused by short product lifecycles in semiconductor manufacturing, the Nonparametric Inter-Quartile Range is proposed. It exploits nonparametric kernel density estimation to generate additional simulated samples of the model's independent variable from the limited samples of a short-lifecycle product. The proposed NIQR method consists of three steps: nonparametric kernel density estimation, forecasting error evaluation using the nonparametric IQR, and forecasting error correction using the nonparametric median. The steps are shown in Fig. 1.
Step 1 (nonparametric kernel density estimation): choose the statistic and the sample size for error evaluation, select the optimal bandwidth h, and perform density estimation at each discrete sampling point to obtain a discrete probability density distribution.
Step 2: calculate the NIQR from the cumulative probability distribution function to evaluate the error of each forecasting model for better model selection.
Step 3: select the sample size for error correction through DOE and use Q2 (the median) to correct the error of the demand forecasting models, reducing their impact on production planning decisions.
Fig. 1 Steps of Nonparametric Inter-Quartile Range method
3 Details of Nonparametric Inter-Quartile Range and Its Benefits
In this section, the three steps of the NIQR method are presented in detail, and the benefits of the proposed method are then discussed through a comparison with traditional methods.
3.1 Nonparametric Kernel Density Estimation
3.1.1 Choice of Statistic and Sample Size
For different products and different market cycles, demand levels differ widely, e.g., 100 units versus 10,000 units. If the raw forecasting error were adopted as the statistic, it would lead to large gaps between the IQRs of different products, so the relative forecast error (RFE) of Equation 1 is used as the statistic instead. Because the denominator cannot be zero, a demand of zero is set to 1 unit to keep the calculation continuous:

\mathrm{RFE} = \frac{\text{forecast} - \text{actual}}{\text{actual}}.   (1)
The sample size used for error evaluation of the demand forecasting models (for better model selection) can be chosen according to the amount of historical data available.
3.1.2 Choice of Two Key Parameters
If nonparametric kernel density estimation is to fit the distribution of the demand forecasting error accurately, the (integrated) mean square error, M(I)SE [10], must first be minimized. The M(I)SE depends on the choice of kernel function K(·) and bandwidth h, so these two parameters determine the precision of the distribution fitting. Following long-term research both at home and abroad, the Epanechnikov kernel is chosen as the kernel function in this work because it minimizes the M(I)SE. The Epanechnikov kernel is given in Equation 2 [11]:

k(x) = \begin{cases} \frac{3}{4}\,(1 - x^2), & |x| \le 1 \\ 0, & |x| > 1 \end{cases}   (2)
Now, let’s introduce an automatic and convenient method ——cross validation method which selects bandwidth of nonparametric kernel density estimation. The foundational thought is to cancel one point at each calculation, and this cancelled point is used to validate. Combining parametric maximum likelihood with cross validation method, we can get the equation for optimization estimate of bandwidth as Equation 3 [10]: N
N
hˆ = arg max L(h) = arg max Π fˆh* (X j ) = arg max Π [ h
h
j =1
h
j =1
1 ( N − 1)h
N
∑ k( i≠ j
xi − x j h
)] ,
(3)
where X_j is the point left out in the j-th cross-validation step and L(h) is the likelihood when the bandwidth is h. The equation selects the bandwidth at which the cross-validated likelihood is maximal.
3.1.3 Probability Density Estimation
Having chosen the kernel function and the optimal bandwidth h, the probability density \hat{f}(x) of any point x in (−∞, +∞) can be estimated, and any unknown distribution can be fitted, using the density estimator of Equation 4 [10][11]:

\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right),   (4)

where K(·) is the kernel function, h is the optimized bandwidth, X_i is the i-th point in the sample X, and n is the sample size. In practice, the discrete sampling points are sorted in ascending order and a point is sampled every spacing Δ from −∞ to +∞; reasonably large finite values are used instead of ±∞. Equation 4 is applied to each discrete sampling point in turn, fitting the population distribution from samples with any unknown distribution. This procedure increases the number of samples available for distribution fitting by discretely sampling the model's independent variable. The resulting distribution is obtained not as an analytic function but as a series of discrete probability density values.
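To make Equations (2)–(4) concrete, the following minimal Python sketch (not part of the original paper) implements Epanechnikov kernel density estimation with leave-one-out likelihood bandwidth selection. The log-likelihood is maximised instead of the product in Equation (3) purely for numerical stability, and the RFE values and grid spacing used in the example are hypothetical.

```python
import numpy as np

def epanechnikov(x):
    """Epanechnikov kernel, Eq. (2): 3/4 (1 - x^2) for |x| <= 1, else 0."""
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x**2), 0.0)

def loo_log_likelihood(h, samples):
    """Leave-one-out log-likelihood corresponding to L(h) in Eq. (3)."""
    n = len(samples)
    diffs = (samples[:, None] - samples[None, :]) / h   # (x_i - x_j)/h
    k = epanechnikov(diffs)
    np.fill_diagonal(k, 0.0)                             # leave the j-th point out
    dens = k.sum(axis=0) / ((n - 1) * h)                 # f_h^*(X_j) for each j
    return np.sum(np.log(np.maximum(dens, 1e-12)))       # guard against log(0)

def select_bandwidth(samples, grid):
    """Pick the bandwidth on a candidate grid that maximises the LOO likelihood."""
    return max(grid, key=lambda h: loo_log_likelihood(h, samples))

def kde(samples, h, x_grid):
    """Discrete density estimate of Eq. (4) evaluated on the sampling points x_grid."""
    n = len(samples)
    u = (x_grid[:, None] - samples[None, :]) / h
    return epanechnikov(u).sum(axis=1) / (n * h)

# Example with 27 hypothetical RFE values and a grid from -0.9 to 1.7, spacing 0.01
rfe = np.random.normal(0.1, 0.2, size=27)
h_opt = select_bandwidth(rfe, np.linspace(0.05, 1.0, 40))
x_grid = np.arange(-0.9, 1.7, 0.01)
density = kde(rfe, h_opt, x_grid)
```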
3.2 Error Evaluation and Correction of Demand Forecasting Models
For the unknown distribution obtained by nonparametric kernel density estimation, the IQR can be computed by accumulating the discrete densities into a cumulative probability distribution. First, nonparametric kernel density estimation is used to fit the population distribution from the known samples; then the quartiles of the IQR are estimated with the discrete cumulative probability distribution function of Equation 5:

\hat{F}_n(x) = \sum_{-\infty}^{x} \hat{f}(x) \times \Delta.   (5)

In Equation 5, \hat{f}(x) is the series of discrete probability density values of the population distribution simulated by nonparametric kernel density estimation, and Δ is the spacing between sampling points. Equation 5 is used to calculate the quartiles of the NIQR: the first quartile (25th percentile) NIQR-Q1, the second quartile (50th percentile) NIQR-Q2 and the third quartile (75th percentile) NIQR-Q3. NIQR-Q1 is then subtracted from NIQR-Q3 to obtain the Nonparametric Inter-Quartile Range, NIQR = NIQR-Q3 − NIQR-Q1, which is used for model evaluation and selection. In the error correction step, the sample size for error correction can be chosen through DOE with the NIQR method, e.g., using 3 to 10 samples before the current time point to select
an optimal sample size that minimizes the forecasting mean square error (MSE) dynamically over a period of time. The second quartile NIQR-Q2 of the RFE can then be used to correct the future error of the demand forecasting model, as in Equation 6:

corrected forecast demand = model forecast result × (1 + NIQR-Q2 of RFE).   (6)
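A short sketch of Steps 2 and 3 under the same assumptions as the previous fragment: the quartiles are read off the discrete cumulative distribution of Equation (5), and the correction of Equation (6) is applied with the resulting NIQR-Q2. The forecast value in the usage comment is hypothetical.

```python
import numpy as np

def niqr_quartiles(density, x_grid, delta):
    """Q1, Q2, Q3 from the discrete CDF of Eq. (5): F(x) = sum f(x) * delta."""
    cdf = np.cumsum(density) * delta
    cdf = cdf / cdf[-1]                                   # normalise the discrete CDF to 1
    q1, q2, q3 = (x_grid[np.searchsorted(cdf, p)] for p in (0.25, 0.50, 0.75))
    return q1, q2, q3

def corrected_forecast(model_forecast, q2_rfe):
    """Error correction of Eq. (6): forecast * (1 + NIQR-Q2 of RFE)."""
    return model_forecast * (1.0 + q2_rfe)

# Using density and x_grid from the KDE sketch above (delta = 0.01):
# q1, q2, q3 = niqr_quartiles(density, x_grid, 0.01)
# niqr = q3 - q1                               # used for model evaluation and selection
# corrected = corrected_forecast(1200.0, q2)   # hypothetical forecast of 1200 units
```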
3.3 Comparison between Methods
Comparing traditional error evaluation and correction methods with the proposed NIQR, the obvious difference is the sample size used for distribution fitting: in traditional methods it equals the number of historical points, whereas the Nonparametric Inter-Quartile Range obtains a larger sample through its discrete sampling of the independent variable X, and an even larger sample size can be obtained by choosing a smaller sampling spacing.
4 Numerical Experiments and Results
4.1 Problem Presentation and Data Format
In semiconductor manufacturing enterprises, many important decisions are based on demand forecasting. Many models can be used for product demand forecasting, but which one is the better option, and how can the forecasting error be evaluated and corrected with small samples to support more reasonable decisions, given that semiconductor manufacturing is characterized by short product lifecycles? In this section, nine forecasting models are built for one particular product of an international semiconductor manufacturing company, and then traditional methods and NIQR are used to evaluate and correct the forecasting errors of these models. The data format used in the numerical experiments is as follows: T1, T2, …, T8, … are the forecasted times with equal intervals; A, B, …, H, … are the actual demand values at each forecasted time; A1, A2, …, B1, B2, …, H1, H2, … are the demand forecasts at each forecasted time produced by the different models. According to the discussion of the statistic in subsection 3.1.1, the RFE is chosen as the statistic, giving the data used for error evaluation and correction shown in Table 1.

Table 1 RFE data

Forecasted time | Model 1   | Model 2   | Model 3   | … | Model 9
T1              | (A1-A)/A  | (A2-A)/A  | (A3-A)/A  | … | (A9-A)/A
T2              | (B1-B)/B  | (B2-B)/B  | (B3-B)/B  | … | (B9-B)/B
…               | …         | …         | …         | … | …
T8              | (H1-H)/H  | (H2-H)/H  | (H3-H)/H  | … | (H9-H)/H
…               | …         | …         | …         | … | …
There are 33 samples for this product: 27 samples are used for error evaluation and model selection, while 6 samples are used for error correction. The optimal sample size for error correction is chosen by DOE from 3 to 10 samples.
4.2 Experiments on Model Evaluation and Selection
The first 27 samples are used to calculate the IQR by the traditional and the nonparametric methods, respectively. Fig. 2 and Fig. 3 show the results of the traditional IQR method and the nonparametric IQR method. Both methods select model 9 as the best demand forecasting model because it has the smallest IQR. The difference between the two methods is discussed in the next subsection.
4.3 Why Is NIQR Different?
We now examine why the NIQR method differs from traditional methods through the sample size used for distribution fitting. In Fig. 2, every model has only 27 samples available for distribution fitting, whereas, as discussed in subsection 3.1.3, the NIQR method discretely samples the independent variable X and therefore provides many more samples for fitting each model's distribution in Fig. 3; the resulting sample sizes and nonparametric distributions are shown in Table 2 and Fig. 4.
Fig. 2 Traditional IQR method for error evaluation and model selection (quartiles Q1, Q2 (median), Q3 and the IQR of the RFE for models 1–9)
Fig. 3 Nonparametric IQR method for error evaluation and model selection
Table 2 Sample size of Nonparametric IQR method for each model
Model No.     1     2     3     4     5     6     7     8     9
Sample size   122   134   117   241   197   235   180   140   116
Fig. 4 Nonparametric distributions of demand forecasting errors for models 1–9 (panels a–i); the x axis is the RFE and the y axis is the probability density
It is obvious that the number of samples is much larger than in the traditional method. Because NIQR has more samples than traditional methods, it can fit the population distribution and estimate the relevant parameters more accurately, and can therefore evaluate and correct the model forecasting error more effectively. The effectiveness of NIQR for error correction is validated in the next subsection.
4.4 Sample Size Selection for Correction and Comparison of Error Correction Methods
After model 9 has been chosen as the best demand forecasting model, the next 6 samples of this model are used to validate the NIQR error correction method. To choose an optimal sample size for error correction, the NIQR-Q2 of the RFE computed from 3 to 10 samples before the current sample is used for correction via Equation 6. The forecasting mean square error (MSE) is then used to compare the NIQR error correction method with the traditional median and mean correction methods. The results are shown in Fig. 5. It is clear that 6 samples with the NIQR method minimizes the MSE and is therefore the optimal sample size for correcting the forecasting error of model 9; since this may change over time, the DOE can be repeated over a period of time. After the sample size has been selected, future forecasts are also corrected via Equation 6.
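The sample-size selection by DOE described above can be sketched as a simple search over trailing windows of 3 to 10 samples. This is an illustrative reading of the procedure, not the authors' code, and it uses the plain sample median in place of the nonparametric Q2 for brevity; the nonparametric quartile from the earlier sketch can be substituted directly.

```python
import numpy as np

def select_window(forecasts, actuals, candidate_windows=range(3, 11)):
    """Pick the trailing-window size (3..10) whose corrected forecasts give the lowest MSE."""
    safe_actuals = np.where(actuals == 0, 1.0, actuals)      # zero demand treated as 1 unit, Eq. (1)
    rfe = (forecasts - actuals) / safe_actuals               # relative forecast error series
    best_w, best_mse = None, np.inf
    for w in candidate_windows:
        errs = []
        for t in range(w, len(forecasts)):
            q2 = float(np.median(rfe[t - w:t]))              # Q2 of the w samples before time t
            corrected = forecasts[t] * (1.0 + q2)            # correction of Eq. (6)
            errs.append((corrected - actuals[t]) ** 2)
        mse = float(np.mean(errs))
        if mse < best_mse:
            best_w, best_mse = w, mse
    return best_w, best_mse
```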
Fig. 5 Sample size selection for model 9 error correction (forecasting MSE of the NIQR-Q2, median and mean correction methods for sample sizes 3–10)
4.5 Results of the Numerical Experiment Comparisons
1. Model forecasting error evaluation and model selection: the NIQR method increases the number of samples for distribution fitting through its discrete sampling of the independent variable X, ensuring the quality of the results, so it can estimate the population distribution more accurately than traditional methods.
2. Error correction: two conclusions can be drawn from the DOE on sample size selection. First, NIQR-Q2 is more robust than the traditional methods for error correction regardless of the sample size; second, it can identify the optimal sample size that minimizes the forecasting MSE. The proposed NIQR avoids the requirement of large samples and fits the distribution more accurately from a small sample, so it is suitable for products with short lifecycles. Dynamic model evaluation and error correction can be carried out by collecting real-time data over time, which is a major improvement over traditional methods.
5 Conclusion
In this paper, a new method for evaluating and correcting model forecasting error, the Nonparametric Inter-Quartile Range (NIQR), is proposed. It combines distribution fitting based on nonparametric kernel density estimation with the cumulative probability distribution function, and is appropriate for error evaluation and correction with the small samples available for products with short lifecycles. This is a breakthrough over traditional methods: NIQR fits the distribution more reasonably by increasing the sample size through the discrete sampling of nonparametric kernel density estimation, so that forecasting models can be evaluated and selected more accurately. With the optimal sample size for error correction, NIQR corrects the forecasting errors more effectively, achieving the minimal forecasting mean square error. The new method can also be applied to the analysis of other factors besides demand forecasting.
Acknowledgements. This work is sponsored by the National Natural Science Foundation of China under Grant No. 70701007/G0109, supported by the Technology Support Plan of Sichuan Province under Grant No. 2008JY0060, and also supported by the Youth Foundation of the University of Electronic Science and Technology of China (UESTC).
References
1. Cakanyildirim, M., Roundy, R.O.: SeDFAM: Semiconductor Demand Forecast Accuracy Model. IIE Transactions (Institute of Industrial Engineers) 34, 449–465 (2002)
2. Liang, Y.Y., Chou, Y.: Option-based Capacity Planning for Semiconductor Manufacturing. In: 2003 IEEE International Symposium on Semiconductor Manufacturing (ISSM 2003), pp. 77–80 (2003)
3. Chou, Y., Cheng, C.T., Yang, F., Liang, Y.Y.: Evaluating Alternative Capacity Strategies in Semiconductor Manufacturing under Uncertain Demand and Price Scenarios. International Journal of Production Economics 105, 591–606 (2007)
4. Zhao, X., Liu, T., Zhou, B., Hu, Y.: The Smoothing Improvement and the Application of Grey Model GM(1,1). Journal of Northeast Dianli University 26(4), 63–66 (2006)
5. Pang, Z., Niu, Y.: Generalized Predictive Control Based on Predictive Error Correction by Neural Networks. Journal of Qingdao Technological University 27(4), 92–96 (2006)
6. Hu, H., Sheng, W., Li, G., Chen, Z.: Improvement of Linear Trend Estimation Method. Consumption Guide 143, 136 (2006)
7. Hopp, W.J., Spearman, M.L.: Factory Physics: Foundations of Manufacturing Management, 2nd edn., pp. 251–254, 380–389. McGraw-Hill; authorized reprint: Tsinghua University Press, Beijing (2002)
8. Shi, G., Yang, L., Gong, W.: Foundation of Quality Control and Reliability Engineering, pp. 74–84. Chemical Industry Press, Beijing (2005)
9. Ma, G.: Managing Statistics. Science Press, Beijing (2002)
10. Matthias, H.: Introduction to Nonparametric Econometrics. HEC Lausanne & FAME, 20–21 (2003)
11. Johnston, J., DiNardo, J.: Econometric Methods, 4th edn., pp. 370–375. McGraw-Hill; Chinese translation: China Economics Publishing House (2002)
Simulated Annealing and Crowd Dynamics Approaches for Intelligent Control Qingpeng Zhang 1
Abstract. This is a survey of two approaches to intelligent control. The approaches are based on the author's previous and ongoing projects on the Maximum Clique Problem (MCP) and on crowd dynamics, and the ideas come from computational intelligence and fluid dynamics. The first is the application of the Simulated Annealing (SA) algorithm and TABU search to the combinatorial search problems of intelligent control. The second comes from continuum dynamics, and this approach can achieve a kind of group intelligence. This article was originally produced as a proposal and then a survey; it is therefore incomplete and lacks simulation data. Keywords: Simulated Annealing, Crowd Dynamics, Intelligent Control.
1 Introduction
In the 1970s, King-Sun Fu and George N. Saridis introduced the idea that Intelligent Control (IC) is the intersection of Artificial Intelligence (AI), Automatic Control (AC) and Operations Research (OR). Following [1], this can be described as

IC = AI ∩ AC ∩ OR.   (1)
These relations and interactions are illustrated in Fig. 1. Since its emergence, owing to its fascinating prospects, intelligent control has attracted countless talented scientists and engineers to research in the relevant fields. After thirty years of endeavor, there are mainly six kinds of intelligent controller:
1) Hierarchically intelligent control systems;
2) Expert control systems;
3) Fuzzy control;
4) Neural network (NN)-based control;
5) Human-simulated intelligent control;
6) Integrated intelligent control, which combines two or more of the above areas.
Qingpeng Zhang Key Laboratory of Image Processing and Intelligent Control, Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
[email protected] H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 501–506. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
Fig. 1 [1] Interaction of AI, OR and Control Theory, and the resulting intelligent control
In this section, the author lists the basic definitions and functions of the different aspects and layers of intelligent control. The definitions are taken mainly from the Task Force on Intelligent Control (IEEE Control Systems Society, 1993) [2] and from the panel discussion held by Panos J. Antsaklis (Notre Dame) at the 38th Conference on Decision and Control, 1999. Although many years have passed, these ideas are still valid and full of vitality; most of them are classical and remain a guideline for new researchers. Ideas from recent publications are also included.
1.1 Control
Control means directing a system to a preassigned goal, or maximizing a preassigned measure of utility, under a set of specifications. A control methodology is the set of techniques and procedures used to construct and implement a controller for a dynamical system. The general control problem is how to construct a controller, given a model of the plant, so that the specifications on how we would like the closed-loop system to behave hold [1].
1.2 Intelligence and Artificial Intelligence We still do not have a satisfactory quantitative way to characterize the “intelligence” of a controller or of a system. [7] An acceptable definition is stated in [1]:
Intelligence is a (control) tool for fighting complexity, and it has emerged as a result of evolution. Intelligence grows via 1) the growth of computational power and 2) the accumulation of knowledge of how to sense. Artificial Intelligence is defined as the study of mental faculties through the use of computational methods. To achieve AI there are mainly two approaches: 1) using NNs to simulate entirely the capabilities that the human brain possesses; 2) employing other aspects of AI, such as the emulation of evolution, the annealing process or ant colony intelligence.
1.3 Intelligent System and Intelligent Control
The attributes and capabilities of an intelligent system are clarified in [3]: an intelligent system should have the ability to act appropriately in an uncertain environment, where an appropriate action is one that increases the probability of success, and success is the achievement of behavioral sub-goals that support the system's ultimate goal. There are various definitions of intelligent control from different perspectives; in [3], intelligent control is defined as a computationally efficient procedure for directing a complex system toward a goal with an incomplete and inadequate representation and under incomplete specifications of how to do so. Researchers are attempting to combine and extend theories and methods from control, computer science and operations research to form a brand-new kind of control that is able to solve problems of complex, nonlinear and uncertain systems. The biggest advantage of intelligent control, and the reason it was introduced, is that it can address control problems that cannot be formulated in the language of conventional control; the controller and the system to be controlled are treated together [2]. Compared to conventional control, the "control" in intelligent control is more general and closer to the way the word is used in everyday language [2]. The research areas relevant to intelligent control, in addition to conventional control, include planning, learning, combinatorial search, hybrid systems, fault diagnosis, reconfiguration, NNs, fuzzy logic, etc. Among these areas, combinatorial search holds a significant position, because computational complexity is the central issue and search algorithms are the main tool for fighting complexity. A variety of modern optimization algorithms have been developed in different domains (mentioned briefly in Section 2.2 and discussed in detail in Section 3). In this paper the topic is combinatorial search algorithms.
1.4 New Modern Optimization Algorithms Based Intelligent Control
Intelligent control encompasses many fields from conventional control, such as optimal control, robust control, stochastic control, linear control and nonlinear control, as well as the more recent fuzzy, genetic and neuro-control technologies [2]. Compared with the 1980s and 1990s, neural networks, genetic algorithms, fuzzy
logic and adaptive critics are no longer new ideas. In contrast, the more conventional linear and nonlinear control theories have been improved considerably over the years. Therefore, combinations of these techniques are used to capitalize on each technique's strong points and minimize the overall drawbacks [8]. For researchers, including the author, the challenge is to juxtapose all the methods mentioned above to develop controllers stronger than any single one. In this essay I introduce several new tools from two domains, statistical physics and fluid dynamics, based on my experience and understanding; I name them Simulated Annealing and Crowd Dynamics Based Intelligent Control.
2 Simulated Annealing and Crowd Dynamics Based Intelligent Control
In [5], Prof. K.M. Passino (Ohio State) stated that intelligent control achieves automation via the emulation of biological intelligence: it either seeks to replace a human who performs a control task (e.g., a chemical process operator) or it borrows ideas from how biological systems solve problems and applies them to the solution of control problems (e.g., the use of neural networks for control). Admittedly, bio-inspired computing and automation are intuitive and have given significant impetus to the development of AI and intelligent control. However, intelligent goal selection and path planning, pattern recognition, optimization, and the other attributes regarded as factors of intelligent control in unexpected and complex environments can be achieved not only by the emulation of biological intelligence. How does a metal freeze into a minimum-energy crystalline structure? How does a liquid form natural lanes when it encounters obstacles? Such questions have been answered by physicists and mathematicians, and the mathematical and statistical models they derived can give us a new way to reach the global optimum of a particular system. For the first question, the Metropolis model was invented by Metropolis in 1952–1953; it was introduced to combinatorial optimization in 1983 and became widely known as the Simulated Annealing (SA) algorithm [9]. TABU local search can be combined with SA very easily, and the combined algorithm obtains excellent capability for combinatorial optimization (described in Section 3.1). For the second question, fluid dynamics has provided an exact model (Section 3.2). In the author's view, both the SA-TABU combined algorithm and fluid dynamics can be applied to control engineering in a simple way. As the new control tools will use modern optimization algorithms that are not yet widely used in intelligent control, I would like to name the proposed control methods New Modern Optimization Algorithms Based Intelligent Control.
2.1 Simulated Annealing and TABU Based Intelligent Control We can develop a new type of controller based on SA and TABU local search. SA exploits an analogy between the way in which a metal cools and freezes into a minimum energy crystalline structure and the search for an optimum in a more
general system. It has been proved theoretically to converge to the global optimum with appropriate Markov chains. TABU search is another effective algorithm for combinatorial optimization, because the tabu list helps the algorithm escape from local optima. According to the author's project on the Maximum Clique Problem (see [11]), both of them are able to find the global optimum with a much higher probability than other algorithms such as NNs, but they consume longer running times, so the author combined them to achieve faster speed. The results showed that, although each computing step of the combined method is slower than SA alone because of the tabu list, the number of steps is greatly reduced (since the chance of success is higher), so that the overall speed increases noticeably. (The detailed process and results can be found in the author's unpublished paper [11].) The new combined algorithm can directly replace older algorithms for combinatorial search and optimization in intelligent control.
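As an illustration of the combination described above — a generic sketch only, not the author's MCP implementation — the following Python fragment couples the Metropolis acceptance rule of SA with a short tabu list over recently visited states. The cooling schedule, tabu length and neighbourhood function are all assumptions, and states are assumed to be comparable objects such as tuples.

```python
import math
import random

def sa_tabu(initial, neighbours, cost, t0=10.0, alpha=0.95, steps=2000, tabu_len=50):
    """Minimise `cost` with simulated annealing plus a short tabu list.
    `neighbours(x)` returns candidate states; recently visited states are skipped."""
    current, best = initial, initial
    temp, tabu = t0, []
    for _ in range(steps):
        candidates = [n for n in neighbours(current) if n not in tabu] or neighbours(current)
        cand = random.choice(candidates)
        delta = cost(cand) - cost(current)
        # Metropolis rule: always accept improvements, sometimes accept worse moves
        if delta <= 0 or random.random() < math.exp(-delta / max(temp, 1e-9)):
            current = cand
            tabu.append(cand)
            if len(tabu) > tabu_len:
                tabu.pop(0)
            if cost(current) < cost(best):
                best = current
        temp *= alpha          # geometric cooling schedule (assumed)
    return best
```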
2.2 Continuum Model Based Group Intelligence
Research on the simulation of crowd dynamics has been inspired by demand from a variety of fields and applied in numerous disciplines. Beyond computer graphics, crowd models also have important applications in psychology, disaster management and, eventually, intelligent control, where the application will focus on path planning and group goal selection. Most research projects in this field are multi-agent based; however, in large groups and complex environments the computing cost grows exponentially. In [10], a continuum-based model is developed from fluid dynamics: the model treats a group of individuals as a continuum. By carrying the functions of fluid dynamics over into the algorithms, realistic crowds can be reproduced naturally with limited computing cost compared to multi-agent models. The successful simulation of real crowds can have important applications in multi-robot systems, guidance and planning.
3 Expected Results
3.1 Results of Simulated Annealing and TABU Based Intelligent Control
As stated above, the greater chance of finding the global optimum is the major advantage of SA. The drawback of slow speed can be alleviated by combining it with other algorithms such as NN, GA and TABU. The prospect of an SA-based controller is encouraging.
3.2 Results of Continuum Model Based Group Intelligence
The ability to form natural flows and avoid obstacles at low computing cost will enable very large and complex multi-robot systems, transportation systems, air
traffic management systems and other intelligent control systems to choose optimal paths and reasonable goals more quickly, and will thus reduce the accident rate in such multi-systems. This attribute is extremely important for transportation and air traffic management systems. To conclude, the prospect of applying continuum crowds in control systems is promising.
Acknowledgments. The author thanks the anonymous reviewers for their constructive comments and suggestions. Prof. Linqiang Pan and Dr. Zehui Shao provided tutorial support for the author's research. Prof. Tsung-Chow Su opened the door of crowd dynamics for the author.
References
1. Saridis, G.N.: Intelligent Robotic Control. IEEE Transactions on Automatic Control 28, 547–557 (1983)
2. Antsaklis, P.J.: Defining Intelligent Control. Technical report, Task Force on Intelligent Control, IEEE Control Systems Society (1993)
3. Meystel, A.: Intelligent Control. Encyclopedia of Physics and Technology. Academic Press, London (1993)
4. Passino, K.M.: Bridging the Gap Between Conventional and Intelligent Control. Special Issue on Intelligent Control, IEEE Control Systems Magazine 13, 12–18 (1993)
5. Passino, K.M., Samad, T.: Perspectives in Control: New Concepts and Applications. IEEE Press, NJ (2001)
6. Xing, W., Xie, J.X.: Modern Optimization Algorithms, 2nd edn. Tsinghua University Press, Beijing (2005) (in Chinese)
7. Antsaklis, P.J., Baras, J.S., Doyle, J.C., Ho, Y.C., Johnson, T.L.: At the Gates of the Millennium: Are We in Control? In: Proceedings of the 38th Conference on Decision and Control (1999)
8. Neidhoefer, J.C., Krishnakumar, K.: Intelligent Control for Near-Autonomous Aircraft Missions. IEEE Transactions on Systems, Man, and Cybernetics—Part A: Systems and Humans 31, 14–29 (2001)
9. Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by Simulated Annealing. Science 220, 671–680 (1983)
10. Treuille, A., Cooper, S., Popović, Z.: Continuum Crowds. ACM Transactions on Graphics 25(3), SIGGRAPH (2006)
11. Zhang, Q.P.: A Combined Simulated Annealing Algorithm for the Maximum Clique Problem. Technical report, Huazhong University of Science and Technology (2008)
Accomplishing Station Keeping Mode for Attitude Orbit Control Subsystem Designed for T-SAT Montenegro Salomón and Amézquita Kendrick 1
Abstract. This paper addresses the station keeping mode (SKM) of the Attitude and Orbit Control Subsystem (AOCS); this mode is an operational approach to attitude dynamics control. The essential features of the design methodology are to research the basic theory and then follow an iterative design approach using a set of premises/assumptions: implementing the system simulation in the Matlab/Simulink software package, designing the required controllers, and monitoring and analyzing the responses until the design gives the best results within the required range. First, the thruster configuration is designed to obtain the torque parameters of the satellite. Next, the controllers, based on the well-known PID control law, optimize the attitude and are used during the SKM maneuvers. The simulation results are then presented to demonstrate the performance and validity of the AOCS design approach. Finally, the simulation results show that all requirements were accomplished and that the station keeping mode was successfully designed.
1 Introduction
Communication satellite systems have attracted great interest in recent decades. This is due to technological progress and the need to increase scientific possibilities and long-term telecommunication capacity, but also because they allow national governments to develop a complete satellite system at large scale. They also provide possibilities for telemedicine, tele-education and educational projects at moderate cost for a country, involving students and scientists in space missions and letting them acquire experience, knowledge and expertise for their future careers. In the context of a worldwide communications network, the role of satellite communications systems is very important: satellite links add capacity to existing communications capabilities and provide additional alternate routings for communications traffic. Satellite links, as one of several kinds of long-distance links, interconnect switching centers located strategically around the world.
Montenegro Salomón and Amézquita Kendrick
Beihang University of Aeronautics and Astronautics, Beijing, China
H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 507–516. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
T-SAT Project
Communication satellites are generally located in geostationary orbit (GEO) and used according to the configuration and specific design requirements. A GEO satellite is usually launched with a payload that allows different implementations; it is therefore also necessary to design a reliable and efficient launch vehicle that can place a satellite of rather large mass and dimensions into position in space. The communication satellite project investigated here, defined as T-SAT, is introduced below. The T-SAT satellite model was developed as a special project for student training and can be used in professional courses on satellite technology. The students are divided into several teams, each working on specific tasks of the complete design and simulation project. The aim is to carry out the necessary work in a concurrent engineering manner and to build all the joints and interfaces needed to improve the development of the design. This gives good experience of the whole satellite design process and allows all the knowledge learned during the professional training courses to be exercised in practice. It is primarily an educational project whose objective is to provide a realistic learning environment and hands-on experience for undergraduates, graduates and staff in the development of a communication satellite based on a conceptual platform. T-SAT is a virtual satellite that must meet the necessities stated in the client requirements documentation, which lists the points to be covered by the AOCS. For this paper we consider only those related to the performance requirements of station keeping mode (SKM) operation.
Function Requirements
• The subsystem shall meet the required performance for the correct and expected pointing accuracy and functioning of the payload subsystem.
Performance Requirements
The attitude and orbit control system (AOCS) is one of the key subsystems of a satellite, as it interfaces heavily with all other subsystems; it is therefore necessary to have devices that can accomplish all tasks with good performance. In this work the focus is on the transfer orbit and the geostationary orbit.
Geostationary Orbit
• The attitude error in station-keeping mode should be less than: roll ±0.08°, pitch ±0.08°, yaw ±0.20°.
• When the station position changes, the S/C will hold a long-term attitude bias to keep the beam coverage on the Earth unchanged; in such cases the attitude error of normal mode may be 1.2 times the nominal value.
• Station keeping errors: W/E ±0.1°, N/S ±0.1°.
This paper investigates the assessment, possibilities and performance of the AOCS subsystem in order to verify the attitude control of T-SAT during station keeping mode and, at the same time, to improve this kind of control. The design of the controller for the attitude actuators is the primary aim of this paper and is therefore explained in detail.
Station Keeping Mode
During acquisition of the ground station position in geostationary orbit, as well as during North-South Station Keeping (NSSK) and East-West Station Keeping (EWSK), the SKM is used to keep an earth-pointing three-axis attitude for payload operation. The 10 N thrusters are fired symmetrically in pairs in the drift orbit to execute the final ground station position acquisition maneuvers (the S/C is accelerated east or west) and the inclination correction maneuvers (the S/C is accelerated south or north). In SKM, earth sensors are used to measure the roll and pitch angles and sun sensors are used to measure the yaw angle. The control actuators are the 10 N thrusters, while the momentum wheel speed is kept constant. Since the propellant consumed in NSSK is about 80 percent of the orbit-correction propellant, the optimal time should be selected for the NSSK firing maneuvers in order to save propellant and extend the S/C service life. During NSSK it is necessary to switch from NM to SKM because of the larger ΔV requirements, and the 10 N thrusters are used to control the three-axis attitude. The AOCS needs attitude feedback as well as angular rate feedback; therefore the gyros must be warmed up before entering station keeping mode. When the integration of the angular rates from the gyros is used for the yaw measurement, it is necessary to calibrate the gyros by ground command to determine compensation values for gyro drift. When the gyros cannot provide the angular rates of the S/C accurately, and only the measurements of the S/C three-axis angles are used for attitude control in SKM, the AOCS shall operate in the back-up SKM with the telecommand selected according to the solar array characteristics.
Simulation
During the attitude simulation process, several mathematical models were used to emulate the physical behavior of the satellite and the space environment. These models are reviewed below.
Attitude Premises/Assumptions
In order to perform the simulations, the following values were assumed.
For the satellite's dynamics and the solar disturbance torque:
• Orbital elements: a = 42165.53 km, i = 0.6°, e = 0.001, Ω = 60°, ω = 25°, λ = 105.1°
• Inertia values: Ix = 3000, Iy = 1000, Iz = 3000
• Center of mass position: cx = 0.00, cy = 0.00, cz = 0.834
• Mass: 1250; launch time: 00:00, Jan 1st 2008
For the thruster configuration design and simulation:
• Body size: 2200×1700×1700 (mm); center of mass position: cx = 0.00, cy = 0.00, cz = 0.834
For the station-keeping controller design and simulation:
• Satellite dynamic model
• Solar disturbance torque model
• Thruster model
Satellite Mathematical Model: the simplified attitude dynamics equations of a rigid spacecraft are used. In these equations the inertia matrix of the satellite is assumed to be diagonal, diag(Ix, Iy, Iz); the angular momentum of the momentum wheels and the external torque applied to the satellite are both expressed in body coordinates; the orbit angular rate is the angular rate of the orbit coordinate frame with respect to the inertial frame; and φ, θ and ψ are the roll, pitch and yaw attitude angles of the satellite, respectively.
Thrusters Mathematical Model: for the station keeping simulations, the thruster mathematical model was developed taking into account all the torques produced by each thruster of the a or b branch (see Fig. 1). In NSSK the 6a and 7a thrusters generate the northward force, so they are activated throughout the NSSK maneuver. If the X-axis Pseudo Rate Modulator (PRM) output is +1, a +X torque should be generated, so 7a is turned off; this is called "off-modulation". When the X-axis PRM output is −1, 6a is turned off to generate a −X torque. For EWSK, 2a and 3a generate the eastward force that provides the eastward velocity increment; off-modulation is used when either 2a or 3a is needed to generate torque, in which case one is turned off while the other generates the appropriate torque. With these conditions, the combinational functions were obtained using combinational logic.
Fig. 1 Thrusters branches
Fig. 2 Single Thruster Model
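The off-modulation rule described above can be expressed as a small piece of combinational logic. The sketch below is not taken from the T-SAT implementation; it encodes the NSSK rule for the 6a/7a pair exactly as stated, while the sign convention assumed for the EWSK 2a/3a pair is not specified in the text and is therefore an assumption.

```python
def nssk_thruster_commands(prm_x):
    """Off-modulation of the 6a/7a pair during NSSK.
    prm_x is the X-axis PRM output (+1, 0 or -1). Both thrusters fire by default
    to produce the northward maneuver force; one is switched off to create roll torque."""
    t6a, t7a = True, True          # nominal: both on during the NSSK burn
    if prm_x == +1:                # +X torque requested -> turn 7a off
        t7a = False
    elif prm_x == -1:              # -X torque requested -> turn 6a off
        t6a = False
    return {"6a": t6a, "7a": t7a}

def ewsk_thruster_commands(prm):
    """Off-modulation of the 2a/3a pair during EWSK: both fire eastward; when a torque
    is requested, one of them is switched off (the sign convention here is an assumption)."""
    t2a, t3a = True, True
    if prm == +1:
        t3a = False
    elif prm == -1:
        t2a = False
    return {"2a": t2a, "3a": t3a}
```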
Thruster Configuration Design and Simulation
Thrusters' forces and torques: a single thruster is able to generate a torque of 10 N·m, and, with respect to the satellite's body coordinate frame, the forces and torques generated depend on the positions of the thrusters used.
Torque generation strategy: in SKM the total torque on the satellite is the sum of the torque components generated by each active thruster.
Station-Keeping Controller Design and Simulation
• Controller design: in station keeping mode the controllers are based on the well-known PID control law,

u(t) = K_p e(t) + K_i \int e(t)\,dt + K_d \dot{e}(t),

where the derivative of the error is taken directly from the gyro output (Fig. 3).
• PID Controller Simulink model
Fig. 3 PID Controller
• PRM Simulink model: This model was given by experts. The output of the PRM can be 1, 0, or -1. The switch on point is 0.02 and the switch off value is 0.01. Fig. 4 PRM Simulink Model
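A minimal sketch of the two blocks just described, assuming a discrete-time implementation: the PID law with the rate term fed from the gyro output, and the PRM modelled as a three-level hysteresis element using the stated switch-on (0.02) and switch-off (0.01) values. The exact internal logic of the Simulink PRM block is not given, so the hysteresis handling here is an assumption, as are the gains and sample time.

```python
class PID:
    """Discrete PID: u = Kp*e + Ki*integral(e) + Kd*(error rate).
    The error rate is taken directly from the gyro output, as stated in the text."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0

    def update(self, error, error_rate):
        self.integral += error * self.dt
        return self.kp * error + self.ki * self.integral + self.kd * error_rate


class PRM:
    """Pseudo Rate Modulator as a three-level Schmitt trigger:
    switches on when |u| >= 0.02 and back off when |u| <= 0.01."""
    def __init__(self, on=0.02, off=0.01):
        self.on, self.off, self.state = on, off, 0

    def update(self, u):
        if self.state == 0:
            if u >= self.on:
                self.state = +1
            elif u <= -self.on:
                self.state = -1
        elif abs(u) <= self.off or u * self.state < 0:
            self.state = 0          # drop back to zero inside the switch-off band
        return self.state
```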
• EW_SKM Simulink model: This model consists of the integration of the modules: satellite’s dynamic, solar disturbance torque, PID controllers, PRM, and EWSK Thrusters model. It can be seen in the following figure.
Fig. 5 East/West Station Keeping Attitude Matlab Simulink Model
• NS_SKM Simulink model: basically the same as the EW_SKM Simulink model, but the thruster model is based on the NSSK torque-generation requirements.
• Controller parameter selection: all the controllers used in EW_SKM and NS_SKM were tuned by the classical trial-and-error process.
Station Keeping Simulation Results
EWSK Attitude Simulation
The initial conditions for this simulation were (0.05, 0.05, 0.15), (0, 0, 0), (0, 93.9693, 0) and (0, 0, 0).
Fig. 6 First 50 seconds attitude response for EWSK
Fig. 7 Steady state attitude for EWSK (simulation time: 1 hr)
Fig. 8 EWSK Velocity change response
NSSK Attitude Simulation
The initial conditions for this simulation were the same as above.
Fig. 9 First 200 seconds attitude
Fig. 10 Steady state attitude (Simulation time: 1hr.)
Fig. 11 NSSK Velocity change response
2 Conclusions • The satellite’s attitude transient response in east/west station keeping can be seen in the figure 6, from this it can be noticed how the control system o brings the satellite’s attitude from the initial simulation values to ±0.02 in less than 15 seconds. This result shows that the control system manages the system transient response in east/west station keeping successfully. • The figure 7 shows the satellite’s attitude steady state during 1 hour of simulation. In this case the satellite’s attitude is kept within the required values gave to AOCS for station keeping mode along the simulation. • The last figure for EWSK exposes the evolution of the eastward velocity change during the east/west station keeping maneuver. The velocity change required was 0.1 ⁄ and, from the figure 8, this value is reached at 8.33 seconds after the maneuver beginning. • Also, in another graphics is easy to check that using these results, all the requirements for the T-SAT were accomplished satisfactory. • The figure 9 shows the satellite’s attitude transient response in north/south station keeping. The angles pitch and yaw are brought within the range o ±0.01 in less than 15 seconds but the roll angle has a slower response while the north/south station keeping maneuver is being performed. This is because the thrusters used to correct the roll error also generate the maneuver northward force. Once the maneuver is finished, in this case at 129.4 seconds, the roll angle responds quickly.
• The satellite’s attitude steady state is kept within the range ±0.01 as it can be seen in the figure 10 • The northward velocity change during the north/south station keeping maneuver is shown in the figure 11. The velocity change required was 1.78 ⁄ . This value is reached at 129.4 seconds after the maneuver beginning. o
From the simulation results it is remarkable that, even under the worst initial conditions, the attitude error in station-keeping mode stays inside the required range. The roll, pitch and yaw angles are therefore within the required values, which means that the controllers' performance fulfills the requirements of the AOCS.
Nonlinear System Identification Based on Recurrent Wavelet Neural Network Fengyao Zhao, Liangming Hu, and Zongkun Li*
Abstract. Based on the Elman network, a recurrent wavelet neural network (RWNN) is presented, and an extended Kalman filter training algorithm for the RWNN is given in this paper. The RWNN can be used successfully in nonlinear system identification. A practical example shows that the RWNN converges faster and achieves better calculation precision, and a good result on nonlinear system identification is obtained, which indicates a broad prospect for application. Keywords: Elman network, Recurrent wavelet neural network (RWNN), Extended Kalman filter (EKF), Nonlinear dynamical system, Identification.
1 Introduction
With the development of system identification and control technology, the application of recurrent neural networks (RNNs) has become more and more popular. In contrast to feedforward networks, an RNN has both feedforward and feedback connections, which provide it with nonlinear mapping capacity and dynamical characteristics, so it can be used to simulate dynamical systems and solve dynamic problems. One typical type of RNN is the Elman network [1], in which the past output values of the hidden neurons feed back into themselves. With the memory neurons of the context layer, recurrent neural networks have the ability to identify dynamical systems. In function approximation, neural networks combined with wavelets have shown good results, and recently wavelet neural networks have also been used for the identification of dynamical systems [2-4]. In this research, we integrate the Elman network and the wavelet neural network to construct a novel recurrent wavelet neural network (RWNN) to identify nonlinear dynamical systems. At the same time, the paper offers an effective extended Kalman filter algorithm to train it to act as an independent system simulator.
Fengyao Zhao, Liangming Hu, Zongkun Li
School of Water Conservancy and Environment, Zhengzhou University, Zhengzhou 450002, China
[email protected],
[email protected],
[email protected] *
H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 517–525. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
2 Recurrent Wavelet Neural Network (RWNN)
Fig. 1 depicts the architecture of the recurrent wavelet neural network (RWNN) with three layers of neurons. The first layer consists of two different groups of neurons: the external input neurons and the internal input neurons, also called context units. The second layer is the hidden layer and the third layer is the output layer. The inputs to the context units are the outputs of the hidden neurons; the outputs of the context units and of the external input neurons are fed to the hidden neurons. The context units are also known as memory units, as they store the previous output of the hidden neurons. This recurrent memory gives the network dynamical properties. In the Elman network, a commonly used activation function is the sigmoid. The architecture of the RWNN is similar to that of the Elman network, but in its hidden layer the activation function is a wavelet function. Every neuron of the RWNN hidden layer has two additional parameters a_i and b_i, which represent the dilation factor and the translation factor of the wavelet activation function, respectively. In Fig. 1, the RWNN has n+1 input neurons, one output neuron, m hidden neurons and the same number of context neurons. Let x(t) ∈ R^n denote the network's external input vector and y(t) ∈ R the network output at discrete time t, which provide the training vectors. H(t) ∈ R^m denotes the output of the hidden layer and x_c(t) ∈ R^m denotes the output of the context layer. Note that x_0(t) is not a real external input of the network: x_0(t) = 1, and w_{i0} = θ_{1i} is the bias of hidden neuron i.
Let the feedback gain coefficient be α, which means

x_{c,i}(t) = \alpha H_i(t-1).   (1)
Fig. 1 The architecture of RWNN
The RWNN is described by the following relations:

y(t) = \sum_{i=1}^{m} W_i^1(t)\, H_i(t),
H_i(t) = \varphi\!\left(\frac{h_i(t) - b_i(t)}{a_i(t)}\right),
h_i(t) = \sum_{j=1}^{n} W_{ij}^2(t)\, x_j(t) + \alpha \sum_{k=1}^{m} v_{ik}(t)\, H_k(t-1) + \theta_{1i}(t),   (2)

in which W_i^1(t) is the weight connecting the hidden and output layers, W_{ij}^2(t) is the weight connecting the input and hidden layers, and v_{ik}(t) is the weight connecting the context and hidden layers. \varphi(\cdot) is a wavelet function; the Morlet wavelet is chosen in this paper. a_i(t) is the dilation factor and b_i(t) the translation factor of the wavelet function. Letting t' = (h_i(t) - b_i(t))/a_i(t), the Morlet wavelet basis function is

\varphi(t') = \cos(1.75\, t')\, e^{-t'^2/2}.   (3)
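For clarity, the following is a minimal sketch of the forward pass of Equations (1)–(3); it is not from the paper, and the weight initialisation is an arbitrary assumption.

```python
import numpy as np

def morlet(t):
    """Morlet wavelet of Eq. (3)."""
    return np.cos(1.75 * t) * np.exp(-t**2 / 2.0)

class RWNN:
    """Forward pass of the recurrent wavelet network of Eq. (2): n inputs, m hidden units."""
    def __init__(self, n, m, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=m)        # hidden -> output weights
        self.W2 = rng.normal(scale=0.1, size=(m, n))   # input  -> hidden weights
        self.V = rng.normal(scale=0.1, size=(m, m))    # context -> hidden weights
        self.theta = np.zeros(m)                       # hidden biases
        self.a = np.ones(m)                            # dilation factors
        self.b = np.zeros(m)                           # translation factors
        self.alpha = alpha                             # feedback gain of Eq. (1)
        self.H_prev = np.zeros(m)                      # context = previous hidden output

    def step(self, x):
        h = self.W2 @ x + self.alpha * (self.V @ self.H_prev) + self.theta
        H = morlet((h - self.b) / self.a)
        y = float(self.W1 @ H)
        self.H_prev = H
        return y
```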
3 Application of the Extended Kalman Filter to RWNN
The use of the extended Kalman filter (EKF) for training recurrent networks was first explored by Matthews [5], and Williams provides a detailed analytical treatment of EKF training of recurrent networks [6,7]. It is well known that the EKF neural network training algorithm is superior to standard back-propagation, and the EKF has become a typical and effective method for training recurrent neural networks [8]. The behavior of the RWNN can be described by the following nonlinear discrete-time system:

\theta(t+1) = \theta(t) + \omega(t), \qquad y(t) = h(\theta(t)) + v(t).

In training the RWNN, the state vector \theta(t) is

\theta(t) = [\,W(t)^T, v(t)^T, a(t)^T, b(t)^T\,]^T.   (4)
Let y(t) denote the actual output of the RWNN and y_d(t) the desired output at time step t. Define ξ(t) to be the network error at time t:
ξ (t ) = y d (t ) − y (t ) .
(5)
Assume that the network starts to operate at time step 1 and operates up to the final time step N. Define the total squared error of each iteration as

E = \frac{1}{2}\sum_{t=1}^{N} \bigl(y_d(t) - y(t)\bigr)^2 = \frac{1}{2}\sum_{t=1}^{N} \xi(t)^2.   (6)
The objective of the learning algorithm is to minimize the total squared error E by adjusting the weights of the network. At the t-th time step, the input signals and recurrent node outputs are propagated through the network, the output y(t) is computed, and the error ξ(t) is calculated. The dynamic derivatives of the network output with respect to all trainable parameters form the matrix H(t):

H(t) = \frac{\partial y(t)}{\partial \theta(t)} = \left( \left(\frac{\partial y(t)}{\partial W(t)}\right)^{T}, \left(\frac{\partial y(t)}{\partial v(t)}\right)^{T}, \left(\frac{\partial y(t)}{\partial a(t)}\right)^{T}, \left(\frac{\partial y(t)}{\partial b(t)}\right)^{T} \right)^{T}.   (7)
Then θˆ(t ) and P(t) are updated by the following global EKF recursion: A(t ) = [(η (t ) S (t ))−1 + H (t )T P (t ) H (t )]−1
(8)
K (t ) = P(t ) H (t ) A(t ) .
(9)
θˆ(t + 1) = θˆ(t ) + K (t )ξ (t )
(10)
P(t + 1) = P(t ) − K (t ) H (t )T P(t ) + Q(t )
(11)
where η(t) is a scalar learning-rate parameter which, in conjunction with the weighting matrix S(t), establishes the learning rate. Computing the matrix A(t) requires inverting a matrix of the size of the network output. K(t) is the Kalman gain, computed at each time step and used to update the weight vector and the error covariance matrix. Finally, Q(t) is a diagonal covariance matrix that provides a mechanism by which the effects of artificial process noise are included in the Kalman recursion; its presence helps avoid numerical divergence of the algorithm and also helps the algorithm avoid poor local minima. In the RWNN, the components of the matrix H(t) are calculated by the following formulas:
\frac{\partial y(t)}{\partial W_i^1(t)} = H_i(t),   (12)

\frac{\partial y(t)}{\partial W_{ij}^2(t)} = W_i^1(t)\,\frac{\partial H_i(t)}{\partial W_{ij}^2(t)},   (13)

\frac{\partial y(t)}{\partial v_{ik}(t)} = W_i^1(t)\,\frac{\partial H_i(t)}{\partial v_{ik}(t)},   (14)

\frac{\partial y(t)}{\partial a_i(t)} = W_i^1(t)\,\frac{\partial H_i(t)}{\partial a_i(t)},   (15)

\frac{\partial y(t)}{\partial b_i(t)} = W_i^1(t)\,\frac{\partial H_i(t)}{\partial b_i(t)}.   (16)
In Equations (13)–(16), each partial derivative at time t depends on the corresponding partial derivative at the previous time t−1:

\frac{\partial H_i(t)}{\partial W_{ij}^2(t)} = \frac{\partial H_i(t)}{\partial h_i(t)} \left[ x_j(t) + \alpha \sum_{k=1}^{m} v_{ik}(t)\,\frac{\partial H_k(t-1)}{\partial W_{ij}^2(t-1)} \right],   (17)

\frac{\partial H_i(t)}{\partial v_{ik}(t)} = \frac{\partial H_i(t)}{\partial h_i(t)} \left[ \alpha H_k(t-1) + \alpha \sum_{k=1}^{m} v_{ik}(t)\,\frac{\partial H_k(t-1)}{\partial v_{ik}(t-1)} \right],   (18)

\frac{\partial H_i(t)}{\partial a_i(t)} = \varphi'(t') \left[ \frac{\partial t'}{\partial a_i(t)} + \alpha \sum_{k=1}^{m} v_{ik}(t)\,\frac{\partial H_k(t-1)}{\partial a_i(t-1)} \right],   (19)

\frac{\partial H_i(t)}{\partial b_i(t)} = \varphi'(t') \left[ \frac{\partial t'}{\partial b_i(t)} + \alpha \sum_{k=1}^{m} v_{ik}(t)\,\frac{\partial H_k(t-1)}{\partial b_i(t-1)} \right].   (20)

In Equations (19) and (20), from Equation (3) we can get

\varphi'(t') = -1.75\,\sin(1.75\,t')\, e^{-t'^2/2} - t'\cos(1.75\,t')\, e^{-t'^2/2},   (21)

and the partial derivatives \partial t'/\partial a_i(t) and \partial t'/\partial b_i(t) follow easily from t' = (h_i(t) - b_i(t))/a_i(t). At the beginning of the algorithm, x_{c,i}(0) and the partial derivatives are initialized to zero, i.e. x_{c,i}(t) = 0, \partial H_i(t)/\partial W_{ij}^2(t) = 0, \partial H_i(t)/\partial v_{ik}(t) = 0, \partial H_i(t)/\partial a_i(t) = 0 and \partial H_i(t)/\partial b_i(t) = 0 for all i, j, k when t = 0. The partial derivatives in H(t) can then be calculated in the sequence t = 1, 2, …, N. In general, the total error E decreases as the number of training iterations increases.
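A compact sketch of one EKF parameter update following Equations (8)–(11), written for the single-output case and assuming scalar η(t) and S(t). It is an illustrative reading, not the authors' implementation.

```python
import numpy as np

def ekf_step(theta, P, H, xi, eta=1.0, s=1.0, q=1e-4):
    """One global EKF update, Eqs. (8)-(11), for a single-output network.
    theta: parameter vector, P: covariance matrix, H: dy/dtheta (vector),
    xi: scalar output error, eta/s: learning-rate factors, q: artificial process noise."""
    H = H.reshape(-1, 1)
    A = np.linalg.inv(np.array([[1.0 / (eta * s)]]) + H.T @ P @ H)   # Eq. (8), 1x1 here
    K = P @ H @ A                                                    # Eq. (9)
    theta_new = theta + K.flatten() * xi                             # Eq. (10)
    P_new = P - K @ H.T @ P + q * np.eye(len(theta))                 # Eq. (11)
    return theta_new, P_new
```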
4 Simulation and Experimental Results
In this section, simulations and experiments are conducted to illustrate the validity of the proposed RWNN. The following one-input-one-output nonlinear process model from reference [9] is used in the simulation:

y(t+1) = f[y(t)] + g[u(t)] + \omega_y(t+1),
f[y(t)] = \frac{3\,y(t)}{6.2 + 2\,y(t) + y^2(t)},
g[u(t)] = u^2(t) + u^3(t),   (22)

where u(t) is the manipulated input variable and y(t) is the output of the system; ω_y(t) is the noise on y(t) in the simulation. This is a strongly nonlinear
process, as the dynamic relation between y(t) and u(t) is determined by both nonlinear functions f[·] and g[·], and it is a dynamic system where real-time implementation is critical. Assuming f[·] and g[·] are both unknown, an RWNN is built as in Fig. 1 as the system identifier, with the number of input, hidden, context and output layer neurons being 2, 6, 6 and 1, respectively. The network is first trained with u(t) and y(t), and then forecasts ỹ(t) according to another testing input ũ(t), which tests the effectiveness of the RWNN and its algorithm on the identification of nonlinear dynamical systems. In the learning phase u(t) is defined by the following equation:
u(t) = 1.0 + 0.6\sin(2k\pi/50) + 0.4\sin(2k\pi/75).
(23)
u(t) is shown in Fig. 2, and the desired output can be obtained from equation (22). The noise is not considered in the simulation. The number of time steps is N = 200, and the learning result of the RWNN is shown in Fig. 3. The training of the network adopts the EKF algorithm described above. Utilizing equations (8)-(11), a total squared error E of less than 0.001 can be achieved after about 200 training iterations. For the sake of analyzing the efficiency of the RWNN, the learning result of the Elman network is also presented in Fig. 3 for comparison. In this study the Elman network has the same number of input, hidden, context and output layer neurons as the RWNN, and the Elman network is also trained for 200 iterations by the EKF algorithm. It can be seen from Fig. 3 that the learning result of the RWNN is much better than that of the Elman network.
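For concreteness, the following sketch shows how the benchmark process (22) and the training input (23) could be generated; it is an illustrative reading of the reconstructed equations (the noise term is omitted, as in the paper's simulation), and the exact form of f[·] should be checked against reference [9].

```python
import numpy as np

def training_input(k):
    # Training input, equation (23)
    return 1.0 + 0.6 * np.sin(2 * np.pi * k / 50) + 0.4 * np.sin(2 * np.pi * k / 75)

def plant_step(y, u):
    # One step of the benchmark process (22), noise omitted.
    # The form of f follows the reconstructed equation and is an assumption here.
    f = 3.0 * y**2 / (6.2 + 2.0 * y + y)
    g = u**2 + u**3
    return f + g

N = 200
y = np.zeros(N + 1)
u = training_input(np.arange(N))
for t in range(N):
    y[t + 1] = plant_step(y[t], u[t])   # desired outputs for identifier training
```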
Fig. 2 Training input curve with respect to time
Fig. 3 The results of neural network learning phase
Fig. 4 Error curves of RWNN and Elman network
Fig. 4 depicts the total error curves of the RWNN and the Elman network with respect to the learning iterations. During the first 50 iterations, the error curve of the RWNN descends faster, which means the RWNN converges more quickly. Over the last 150 iterations, both curves descend slowly, which is a general behavior of the training algorithm, but the error value of the RWNN is clearly smaller than that of the Elman network, which means the RWNN has higher convergence precision. After training the network, the testing function ũ(t) shown in Fig. 5 is applied as input, and the testing output ỹ(t) of the network is illustrated in Fig. 6. It can be seen that the RWNN identification result tracks the desired signal quite well, indicating that the
Fig. 5 Testing input curve with respect to time
Fig. 6 The results of identifier during the testing phase
proposed RWNN algorithm has a good ability to identify nonlinear dynamical systems. The testing result of the Elman network is also shown in Fig. 6, and it is clearly not as good as that of the RWNN. It can be seen from the simulation results that the RWNN has an advantage over the Elman network in terms of convergence speed and identification precision. The RWNN performs rather well in identifying nonlinear dynamical systems, which is valuable for practical application.
5 Discussion (i) The wavelet function has a favorable localizing characteristic. For some localized regions of the input function space, only a few weights of the RWNN need to be adjusted, so the network can easily capture the local character of a dynamic system. Usually the RWNN attains much better convergence precision with fewer neurons and fewer training iterations [3]. (ii) There are many wavelets with good properties in the wavelet function family, such as the Meyer wavelet, the Morlet wavelet, the Gaussian wavelet, etc. Different wavelets can be chosen as the basis function of the RWNN according to the problem at hand. In the EKF learning algorithm, only the derivative φ'(t') needs to be modified for a different wavelet basis function. (iii) In this paper the RWNN is based on a multi-input-one-output structure; it can easily be extended to a multi-input-multi-output structure.
6 Conclusions In this research, a new dynamic neural network, the RWNN, integrating the Elman network and the wavelet neural network is proposed, and the extended Kalman filter (EKF) algorithm for training it is also given. The simulated performance of the RWNN on the identification of nonlinear dynamical systems is excellent. The RWNN shows extremely fast convergence and relatively high accuracy in the process of identifying nonlinear dynamical systems. It is believed that the proposed RWNN is a promising identification strategy for nonlinear dynamical systems. Acknowledgements. The authors acknowledge financial support from the National Science Foundation of China (50279003).
References 1. Elman, J.L.: Finding Structure in Time. Cognitive Science 14, 179–211 (1990) 2. Oussar, Y., Rivals, I., Personnaz, L., et al.: Training Wavelet Networks for Nonlinear Dynamic Input-output Modeling. Neurocomputing 20, 173–188 (1998) 3. Liang, F., Tan, Y.: Using Wavelet Neural Network in Non-linear System Identification. Journal of Guilin Institute of Electronic Technology 20(1), 18–22 (2000) (in Chinese) 4. Srivastava, S., Singh, M., Hanmandlu, M., Jha, A.N.: New Fuzzy Wavelet Neural Networks for System Identification and Control. Applied Soft Computing 6, 1–17 (2005)
5. Matthews, M.B.: Neural Network Nonlinear Adaptive Filtering Using the Extended Kalman Filter Algorithm. In: Proceedings of the International Neural Networks Conference, Paris, vol. I, pp. 115–119 (1990) 6. Williams, R.J.: Some Observations on the Use of the Extended Kalman Filter as a Recurrent Network Learning Algorithm. Technical Report NU-CCS-92-1. Northeastern University, College of Computer Science, Boston (1992) 7. Williams, R.J.: Training Recurrent Networks Using the Extended Kalman Filter. In: International Joint Conference on Neural Networks, Baltimore, vol. IV, pp. 241–246 (1992) 8. Puskorius, G.V., Feldkamp, L.A.: Neurocontrol of Nonlinear Dynamical Systems with Kalman Filter Trained Recurrent Networks. IEEE Trans. on Neural Networks 5, 279–297 (1994) 9. Yuan, X., Liu, S.: Study on Neural Network Simulation of Dynamic System. Water Resources and Hydropower Engineering 3, 38–42 (1998)
Approximation to Nonlinear Discrete-Time Systems by Recurrent Neural Networks Fengjun Li
Abstract. Neural networks are widely used to approximate nonlinear functions. In order to study their approximation capability, an approximation approach for nonlinear discrete-time systems is presented by using the concept of time-variant recurrent neural networks (RNNs) and the theory of two-dimensional systems. Both theoretical and simulation results show that the derived mathematical model of RNNs can approximate nonlinear dynamical systems to any degree of accuracy. Keywords: Approximation, nonlinear discrete-time system, recurrent neural networks.
1 Introduction There are two types of connections in neural networks. Neural networks with only feedforward connections are called feedforward networks, and neural networks with arbitrary connections are often called recurrent networks. Many of the applications of neural networks, particularly in the areas of nonlinear system identification and control, reduce to the problem of approximating unknown functions of one or more variables from discrete measurements [1-6]. A number of authors have established that multilayer feedforward neural networks, with a variety of activation functions, serve as universal approximators [7-9]. For example, in 1989 Hornik, Stinchcombe, and White [7] could show that any Borel-measurable function on a compact domain can be approximated by a three-layered feedforward network, i.e., a feedforward network with one hidden layer, with arbitrary accuracy. In the same year, Cybenko [8] and Fengjun Li School of Mathematics and Computer Science Ningxia University, Yinchuan 750021, China
Funahashi [9] found similar results, each with different methods. Whereas the proof of Hornik, Stinchcombe, and White [7] is based on the Stone-Weierstrass theorem, Cybenko [8] makes in principle use of the Hahn-Banach theorem. Funahashi [9] mainly applies the Irie-Miyake and the Kolmogorov-Arnold-Sprecher theorems. More generally, continuous nonlinear functionals can be approximated with feedforward neural networks, and this can be used to directly approximate the output of dynamical systems [10]. The nonlinear dynamical behavior of recurrent networks is suitable for spatio-temporal information processing. Theoretical studies on recurrent networks have been mainly concerned with the stability of convergence of the trajectory to the equilibria [11]. Recurrent neural networks (RNNs) are one type of artificial neural network with very successful applications [11-14]. Some work has already been done on the capability of RNNs to approximate measurable functions, e.g. [15]. This paper focuses on dynamical systems and proves that those can be approximated by RNNs in state space model form with arbitrary accuracy. The remainder of this article is organized as follows. A short introduction to dynamical systems and RNNs in state space model form is given in Section 2. The approximation idea and algorithm are given in Section 3. Using the previously obtained theoretical results, some numerical results are given in Section 4. Finally, in Section 5 we summarize our current research. Throughout the paper we will use boldface characters to stand for vectors, and regular characters for scalars.
2 Dynamical Systems and RNNs
A time-variant discrete dynamical system can be described by the equation
s(t+1) = f(s(t), u(t), t),  s ∈ R^n, u ∈ R^m, t ∈ R,
(1)
where s(t) ∈ RL and u(t) ∈ Rm are the state vectors and input vectors of neurons, respectively. (1) is the state transition that is a mapping from the present internal hidden state of the system s(t) and the influence of external inputs u(t) to the new state s(t + 1). The system can be viewed as a partially observable autoregressive dynamic state transition s(t) → s(t + 1) that is also driven by external forces u(t). Without the external inputs u(t), the system is called an autonomous system [15]. However, most real world systems are driven by a superposition of an autonomous development and external influences. If we assume that the state transition does not depend on s(t), we are back in the framework of feedforward neural networks[16]. However, the inclusion of the internal hidden dynamics makes the modeling task much harder, because allows varying inter-temporal dependencies. Theoretically, in the
Approximation to Nonlinear Discrete-Time Systems
529
current framework an event s(t+1) is explained by a superposition of external inputs u(t), u(t−1), · · · from all the previous time steps. In this paper, we propose to map nonlinear time-variant discrete dynamical systems (1) by a type of recurrent neural network in state space model form
s_L(t+1) = F(A(t)s_L(t) + B(t)u(t)),  s_L ∈ R^L, u ∈ R^m,   (2)
where A(t) and B(t) are weight matrices of appropriate dimensions at time t, and F(x) = (f(x_1), f(x_2), · · · , f(x_L))^T (with 0 < Ḟ(·) ≤ 1) is the so-called activation function of the network, which has a continuous derivative. A major advantage of RNNs written in the form of a state space model (2) is the explicit correspondence between equations and architecture.
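As an illustration, one forward step of the state space RNN (2) can be written as follows; the choice of tanh for f (which satisfies 0 < ḟ(·) ≤ 1) and the NumPy notation are assumptions of this sketch.

```python
import numpy as np

def rnn_step(A, B, s, u, f=np.tanh):
    """One step of the state space RNN (2): s_L(t+1) = F(A(t) s_L(t) + B(t) u(t))."""
    return f(A @ s + B @ u)
```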
3 Approximation Idea
For nonlinear time-variant discrete dynamical systems, the common case, we prove that their finite time trajectory can be approximated by the state vector of the output units of RNNs to any degree. Our approximation idea is that at every time in the definition of the nonlinear time-variant discrete dynamical systems, time-variant RNNs (2) are used to approximate them. Therefore, we must determine the weight matrices A(t) and B(t) at every time t. We will convert this approximation problem into a training problem for the RNNs (2). To be feasible, the training algorithm must be simple, efficient, and converge quickly. We find that in the process of training the weight matrices in RNNs (2), each variable depends on two other variables, the time t and the iteration number k. Therefore, using the notation of two-dimensional systems, (2) can be rewritten as
s_L(t+1, k) = F(A(t, k)s_L(t, k) + B(t, k)u(t)).   (3)
The training algorithm for the weight matrices A(t) and B(t) is denoted as
A(t, k+1) = A(t, k) + ΔA(t, k),  B(t, k+1) = B(t, k) + ΔB(t, k).
(4)
Next, we will determine ΔA and ΔB in (4). We use the state vector of the output units of the RNNs to approximate the finite time trajectory of the nonlinear time-variant discrete dynamical systems. Let e(t, k) = s_L(t, k) − s(t),
(5)
sL (0, k) = sL (0) = s(0)
(6)
and A(t, 0), B(t, 0), t = 0, 1, · · ·, are chosen at random. If δ(t+1, k) = s_L(t, k+1) − s_L(t, k),
(7)
530
F. Li
then δ(1, k) = 0, and
δ(t+1, k) = G(t−1, k)[A(t−1, k+1)s_L(t−1, k+1) + B(t−1, k+1)u(t−1) − A(t−1, k)s_L(t−1, k) − B(t−1, k)u(t−1)]
         = G(t−1, k)[A(t−1, k)δ(t, k) + ΔA(t−1, k)s_L(t−1, k+1) + ΔB(t−1, k)u(t−1)],
(8)
where G(t−1, k) = diag(ḟ(ξ_1), ḟ(ξ_2), · · · , ḟ(ξ_L)), and ξ = (ξ_1, ξ_2, · · · , ξ_L)^T lies between A(t−1, k+1)s_L(t−1, k+1) + B(t−1, k+1)u(t−1) and A(t−1, k)s_L(t−1, k) + B(t−1, k)u(t−1). From (2), we have
e(t, k+1) − e(t, k) = −G(t−1, k)[A(t−1, k)δ(t, k) + ΔA(t−1, k)s_L(t−1, k+1) + ΔB(t−1, k)u(t)].
(9)
Let
G(t−1, k)[A(t−1, k)δ(t, k) + ΔA(t−1, k)s_L(t−1, k+1) + ΔB(t−1, k)u(t)] = P(t, k)δ(t, k) + Q(t, k)e(t, k).
(10)
From (8), (9) and (10), we obtain the two-dimensional discrete Roesser model [17] as follows:
\begin{bmatrix} \delta(t+1, k) \\ e(t, k+1) \end{bmatrix} = \begin{bmatrix} P(t, k) & Q(t, k) \\ -P(t, k) & I - Q(t, k) \end{bmatrix} \begin{bmatrix} \delta(t, k) \\ e(t, k) \end{bmatrix},   (11)
where I is the identity matrix of appropriate dimensions, and P(t, k) and Q(t, k) are to be determined later; the initial conditions are δ(1, k) = 0, k = 0, 1, · · · , and e(t, 0) = s_L(t, 0) − s(t), t = 0, 1, · · · . As 0 < Ḟ(·) ≤ 1 and G(t−1, k) = diag(ḟ(ξ_1), ḟ(ξ_2), · · · , ḟ(ξ_L)) is nonsingular, from (10) we have
\begin{bmatrix} \Delta A(t-1, k) & \Delta B(t-1, k) \end{bmatrix} = \left\{ G^{-1}(t-1, k)[P(t, k)\delta(t, k) + Q(t, k)e(t, k)] - A(t-1, k)\delta(t, k) \right\} \begin{bmatrix} s_L(t-1, k+1) \\ u(t-1) \end{bmatrix}^{T} \left( \begin{bmatrix} s_L(t-1, k+1) \\ u(t-1) \end{bmatrix}^{T} \begin{bmatrix} s_L(t-1, k+1) \\ u(t-1) \end{bmatrix} \right)^{-1}.   (12)
From [17], we know that in the two-dimensional discrete time-variant system (11), the convergence property depends only on Q(t, k) and is independent of P(t, k). Without loss of generality, we take P(t, k) = 0. Because 0 < Ḟ(·) ≤ 1, we choose Q(t, k) = G(t, k−1). Shifting the time index t, (12) can be written as
\begin{bmatrix} \Delta A(t-1, k) & \Delta B(t-1, k) \end{bmatrix} = \left\{ e(t+1, k) - A(t, k)[s_L(t, k+1) - s_L(t, k)] \right\} \begin{bmatrix} s_L(t, k+1) \\ u(t) \end{bmatrix}^{T} \left( \begin{bmatrix} s_L(t, k+1) \\ u(t) \end{bmatrix}^{T} \begin{bmatrix} s_L(t, k+1) \\ u(t) \end{bmatrix} \right)^{-1} = \frac{ e(t+1, k) - A(t, k)[s_L(t, k+1) - s_L(t, k)] }{ \| s_L(t, k+1) \|^2 + u^2(t) } \begin{bmatrix} s_L(t, k+1) \\ u(t) \end{bmatrix}^{T}.   (13)
With this, we know that the approximation algorithm of the time-variant discrete RNNs consists of (4) and (13). At each time in the definition of the nonlinear time-variant discrete dynamical systems (1), we apply (4) and (13) to perform iterated computation until the error is sufficiently small, and then move on to the next time step and continue this process.
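A minimal sketch of this per-time-step training loop, in the spirit of equations (4) and (13) with the choices P(t, k) = 0 and Q(t, k) = G(t, k−1), is given below; the stopping rule, array shapes, and the simplified least-norm update are assumptions of the illustration rather than the paper's exact formula.

```python
import numpy as np

def train_step(A, B, s_prev, s_target, u, f=np.tanh, tol=1e-6, max_iter=100):
    """Adjust A(t), B(t) so that the RNN state approximates s_target = s(t+1).

    s_prev   : s_L(t), current network state
    s_target : s(t+1), state of the reference system (1)
    u        : u(t), external input
    """
    for k in range(max_iter):
        s_next = f(A @ s_prev + B @ u)          # RNN step, equation (3)
        e = s_next - s_target                    # approximation error
        if np.linalg.norm(e) < tol:
            break
        # Distribute the error over the stacked regressor [s_L; u] via its
        # pseudo-inverse (a simplification of the exact update (13)).
        z = np.concatenate([s_next, u])
        delta = np.outer(-e, z) / (z @ z)
        A = A + delta[:, : s_prev.size]          # equation (4)
        B = B + delta[:, s_prev.size :]
    return A, B
```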
4 Numerical Results
In order to demonstrate our theoretical result, in this section we give an example of approximation to the nonlinear time-variant discrete dynamical system
\begin{bmatrix} s_1(t+1) \\ s_2(t+1) \end{bmatrix} = \begin{bmatrix} 0.1 & 0.05t \\ 0.2 & -0.15t \end{bmatrix} \begin{bmatrix} s_1(t) \\ s_2(t) \end{bmatrix} + \begin{bmatrix} 0.3 \\ 0.1 \end{bmatrix} u^2(t)   (14)
by the time-variant discrete RNNs (3), and analyze the error bound of this approximation algorithm. Let the initial conditions be s_1(0) = 0, s_2(0) = 0.5 and take f(x) = 0.5x in (3). The number of iterations of the training algorithm (3) and (13) is fixed at two at each time t. We select u(t) = 0.25 for a numerical computation, giving the numerical results in Figures 1 and 2. If we use
Fig. 1 The trajectory of s1 (t) and its approximator
Fig. 2 The trajectory of s2 (t) and its approximator
Fig. 3 The relationship between the approximation error and iterated number
E = \| s_i(t) - s_{L,i}(t) \|_2 = \left( \int_0^T \left( s_i(t) - s_{L,i}(t) \right)^2 dt \right)^{1/2}, \quad i = 1, 2   (15)
as a metric of this approximation, we also analyze the relationship between the error bound and the iteration number (see Figure 3). On the basis of the numerical example, we can conclude that our numerical approximation results are in good agreement with the theory.
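For concreteness, the reference system (14) and a discrete analogue of the error metric (15) could be simulated as sketched below; the constant input u(t) = 0.25 and the initial conditions follow the example, while the horizon T and the NumPy implementation are illustrative assumptions.

```python
import numpy as np

T = 50
s = np.zeros((T + 1, 2))
s[0] = [0.0, 0.5]                      # initial conditions s1(0) = 0, s2(0) = 0.5
u = 0.25                               # constant input used in the example

for t in range(T):
    A_t = np.array([[0.1, 0.05 * t],
                    [0.2, -0.15 * t]])
    b = np.array([0.3, 0.1])
    s[t + 1] = A_t @ s[t] + b * u**2   # reference system (14), as reconstructed above

def error_metric(s_ref, s_rnn):
    # Discrete analogue of equation (15): E_i = sqrt( sum_t (s_i - s_{L,i})^2 )
    return np.sqrt(np.sum((s_ref - s_rnn) ** 2, axis=0))
```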
5 Conclusion In approximation by neural networks, there are two main problems: "density" and "complexity". In this paper, we study the density problem. We introduce nonlinear time-variant discrete dynamical systems and prove that their finite time trajectory can be approximated by the state vector of the output units of RNNs to any degree of accuracy. In a word, the results obtained in this paper precisely characterize the approximation ability of RNNs and clarify the relationship among the rate of approximation, the number of hidden units, and the properties of the approximated systems. Acknowledgements. This work was supported by NSFC project under contract No. 70531030 and a Ningxia University project under contract No. ZR200803.
References 1. Narendra, K.S., Parthasarathy, K.: Identification and Control of Dynamic Systems Using Neural Networks. IEEE Trans. Neural Networks 1, 4–27 (1990) 2. Chen, S., Billings, S.A.: Neural Networks for Nonlinear Dynamic System Modeling and Identification. Int. J. Contr. 56, 319–346 (1992) 3. Funahashi, K.I., Nakamura, Y.: Approximation of Dynamical Systems by Continuous Time Recurrent Neural Networks. Neural Networks 6(6), 801–806 (1993) 4. Chen, T.P., Chen, H.: Approximation of Continuous Functionals by Neural Networks with Application to Dynamic Systems. IEEE Trans. Neural Networks 4, 910–918 (1993) 5. Liu, G.P., Kadirkamanathan, V., Billings, S.A.: Variable Neural Networks for Adaptive Control of Nonlinear Systems. IEEE Trans. Syst., Man, Cybern. C 29, 34–43 (1999) 6. Chen, T.P., Amari, S.I.: New Theorems on Global Convergence of Some Dynamical Systems. Neural Networks 14, 251–255 (2001) 7. Hornik, K., Stinchcombe, M., White, H.: Multilayer Feedforward Networks Are Universal Approximators. Neural Networks 2, 359–366 (1989) 8. Cybenko, G.: Approximation by Superpositions of Sigmoidal Function. Math. of Control, Signals, and Systems 2, 303–314 (1989)
9. Funahashi, K.I.: On the Approximate Realization of Continuous Mappings by Neural Networks. Neural Networks 2, 183–192 (1989) 10. Schäfer, A.M., Zimmermann, H.-G.: Recurrent Neural Networks are Universal Approximators. In: Kollias, S.D., Stafylopatis, A., Duch, W., Oja, E. (eds.) ICANN 2006. LNCS, vol. 4131, pp. 632–640. Springer, Heidelberg (2006) 11. Gégout, C.: Stable and Convergent Dynamics for Discrete-Time Recurrent Networks. Nonlinear Analysis 30(3), 1663–1668 (1997) 12. Garzon, M., Botelho, F.: Dynamical Approximation by Recurrent Neural Networks. Neurocomputing 29, 25–46 (1999) 13. Hammer, B.: On the Approximation Capability of Recurrent Neural Networks. Neurocomputing 31, 107–123 (2000) 14. Liang, J., Nikiforuk, P.N., Gupta, M.M.: Approximation of Discrete-Time State-Space Trajectories Using Dynamic Recurrent Neural Networks. IEEE Trans. Auto. Contr. 40(7), 1266–1270 (1995) 15. Zimmermann, H.G., Grothmann, R., Schäfer, A.M., Tietz, C.: Identification and Forecasting of Large Dynamical Consistent Neural Networks. In: Haykin, S.J., Principe, T.S., McWhirter, J. (eds.) New Directions in Statistical Signal Processing: From Systems to Brain. MIT Press, Cambridge (2006) 16. Zimmermann, H.G., Neuneier, R.: Neural Network Architectures for the Modeling of Dynamical Systems. In: Kolen, J.F., Kremer, S. (eds.) A Field Guide to Dynamical Recurrent Networks, pp. 311–350. IEEE Press, Los Alamitos (2001) 17. Kaczorek, T., Klamka, J.: Minimum Energy Control of 2-D Linear Systems with Variable Coefficients. Int. J. Control 44, 645–651 (1986)
Model-Free Control of Nonlinear Noise Processes Based on C-FLAN Yali Zhou, Qizhi Zhang, Xiaodong Li, and Woonseng Gan*
Abstract. In practical active noise control (ANC) systems, nonlinear active controllers may be required in cases where the actuators used in ANC systems or the structures to be controlled exhibit nonlinear characteristics. In this paper, a Chebyshev functional link artificial neural network (C-FLANN) is used as a nonlinear controller. Compared with the multilayer perceptron (MLP) neural network, C-FLANN exhibits a much simpler structure, less training computation and faster convergence. The simultaneous perturbation stochastic approximation (SPSA) algorithm instead of the usual back-propagation method is applied as a learning rule of the network to adapt to the time-varying plant. Unlike the back-propagation method, the SPSA method does not require an estimation of the secondary path. Computer simulations have been carried out to demonstrate that the proposed algorithm outperforms the standard filtered-x least mean square (FXLMS) algorithm when the ANC system exhibits nonlinear and time-varying characteristics. Keywords: ANC, Nonlinear, Model-free, C-FLANN.
1 Introduction In recent years, active noise control (ANC) has been a relevant area of interest for many researchers. Well-established methods and corresponding applications have been reported in the literature [1,2]. The principle of ANC is the destructive interference at a definite location of an incoming acoustical noise using a signal, usually generated by a loudspeaker, with equal amplitude and opposite phase. The block diagram of one of the most widely adopted schemes, i. e., the so-called single-channel feed-forward ANC scheme is shown in Fig.1. In this scheme, x ( n ) is Yali Zhou . Qizhi Zhang School of Automation, Beijing Information Science and Technology University, Beijing 100192, China
Xiaodong Li Institute of Acoustics, Academia Sinica, Beijing 100080, China Woonseng Gan School of EEE, Nanyang Technological University, Singapore 639798, Singapore
the noise sensed by a reference microphone, d p ( n ) is the signal at the location where the noise has to be attenuated, y(n) is the signal generated by the controller, d s ( n ) is the interfering signal received at the same location where an error micro-
phone collects the error signal e ( n ) = d p ( n ) + d s ( n ) . This signal is used, together with the input signal, to adapt the controller, which is an adaptive filter driven by a suitable adaptation algorithm. Moreover, in Fig.1, P(Z) is the transfer function characterizing the primary path between the noise source and the error microphone, while S(Z) is the transfer function of the secondary path between the controller output and the error microphone. The most common form of linear adaptive algorithm/architecture combination is a transversal finite impulse response (FIR) filter using the filtered-X Least Mean Square (FXLMS) algorithm [1,2]. To deal with the noise cancellation, previous knowledge or modeling of the secondary path is required. While presently linear controllers constitute a well-established solution for ANC, it has been recently pointed out that various nonlinear effects may influence the behavior of an ANC structure [3,4]. Nonlinearities may be present in the primary path when the noise is propagating with high sound pressure [5] and the nonlinearity of the air is taken into account. Nonlinearities can also occur in the secondary path due to the use of A/D and D/A converters, power amplifiers, loudspeakers and transducers. Overdriving the electronics or loudspeakers gives rise to relevant nonlinear effects [6]. To deal with these effects, different structures for nonlinear controllers have been recently proposed in the literature [3-11]. Among them, structures based on functional link artificial neural network (FLANN) have attracted much research attention. The basic principle of FLANN is to expand the dimensionality of the input signal space by using a set of independent functions [10]. Several mapping functions have been used to achieve this purpose, such as Legendre, Chebyshev, and trigonometric polynomials [11].
Fig. 1 The block diagram of an ANC system
Fig. 2 The FLANN structure
In this paper, a Chebyshev functional link artificial neural network (C-FLANN) is used as a nonlinear controller; at the same time, the simultaneous perturbation stochastic approximation (SPSA) algorithm is used to update all weights of the C-FLANN simultaneously. Unlike the back-propagation method, this method does not require an estimation of the secondary path. Compared with the multilayer perceptron (MLP) neural network, C-FLANN exhibits a much simpler structure, less training computation and faster convergence [6]. It is therefore easier to implement in hardware and improves the performance-price ratio of the system.
2 Control Algorithm The structure of the FLANN controller is shown in Fig. 2. At any time instant n, the input signal vector X(n) is defined as
X(n) = [x(n), x(n-1), \ldots, x(n-N+1)]^T.
(1)
where N is the length of the input vector X(n) and [\cdot]^T denotes the transpose of a vector. Then the functionally expanded vector F(n) is
F(n) = [f_0(n), f_1(n), \ldots, f_{M-1}(n)]^T = [FE(x(n)), FE(x(n-1)), \ldots, FE(x(n-N+1))]^T.   (2)
where FE(·) stands for functional expansion and M is the length of the functionally expanded vector F(n). These M independent functions map the N-dimensional signal space into an M-dimensional space, that is, R^N → R^M, N < M. W(n) is the weight vector and is given by
W(n) = [w_0(n), w_1(n), \ldots, w_{M-1}(n)]^T.
(3)
It is common knowledge that if the secondary path of the ANC system is completely unknown, it is impossible to use the usual gradient method as a learning rule to update the controller coefficients [12]. In this case, an estimator of the gradient of the error function is needed. The SPSA, which was introduced by J. C. Spall [13], is a well-known gradient approximation approach that relies on measurements of the objective function, not on measurements of the gradient of the objective function. The objective of the following analysis is to develop the C-FLANN-based SPSA algorithm to improve the noise cancellation capability of a nonlinear ANC system. Step 1: Define the error function Note that in ANC systems, each sampled error signal does not contain enough information to serve as an evaluation function to be optimized. That is, the expectation of the error signal has to be used as the evaluation function. For practicality, the sum of the error signal over a certain interval is used to approximate the expectation of the error signal. Thus, the error function is defined as [14]
J(y(n)) = \frac{1}{2}\sum_{t=1}^{\lambda} e^2(n) = \frac{1}{2}\sum_{t=1}^{\lambda} [d_s(n) + d_p(n)]^2.
(4)
where t is the sampling number in a block interval, and λ is the total sampling number of one block interval.
Step 2: Functional expansion of the input pattern vector X ( n )
For functional expansion of the input pattern, we have chosen the Chebyshev expansion. Then the functionally expanded vector F(n) can be written as
F(n) = [f_0(n), f_1(n), \ldots, f_{M-1}(n)]^T
     = [T_0(n), T_1(n), T_2(n), \ldots, T_{P-1}(n), T_0(n-1), T_1(n-1), T_2(n-1), \ldots, T_{P-1}(n-1), \ldots, T_0(n-N+1), T_1(n-N+1), T_2(n-N+1), \ldots, T_{P-1}(n-N+1)]^T
     = [1, x(n), 2x(n)^2 - 1, \ldots, 2x(n)T_{P-2}(n) - T_{P-3}(n), 1, x(n-1), 2x(n-1)^2 - 1, \ldots, 2x(n-1)T_{P-2}(n-1) - T_{P-3}(n-1), \ldots, 1, x(n-N+1), 2x(n-N+1)^2 - 1, \ldots, 2x(n-N+1)T_{P-2}(n-N+1) - T_{P-3}(n-N+1)]^T   (5)
where P is the order of the functional expansion, so M = N × P.
Step 3: Compute the anti-noise signal d_s(n)
The output of the C-FLANN is
y(n) = W(n)^T F(n) = \sum_{j=0}^{M-1} w_j(n) f_j(n).
(6)
Then the anti-noise signal d_s(n) can be calculated using the following equation:
d_s(n) = S(n)^T Y(n) = \sum_{j=0}^{L-1} s_j(n)\, y(n-j).   (7)
where S(n) = [s_0(n), s_1(n), \ldots, s_{L-1}(n)]^T is the impulse response of the secondary path transfer function S(Z), L is the length of the secondary path, and Y(n) = [y(n), y(n-1), \ldots, y(n-L+1)]^T.
Step 4: Generation of the SP vector
The following perturbation vector Δ(n) is generated as independent Bernoulli random variables with outcomes of ±1 that give small disturbances to all weights:
Δ(n) = (Δ_0(n), Δ_1(n), \ldots, Δ_{M-1}(n))^T.
(8)
Step 5: Error function evaluations
Obtain two measurements of the error function J(·) based on the SP:
J(y(W(n) + c_k Δ(n))) and J(y(W(n) − c_k Δ(n))),
where c_k is a positive scalar and represents the magnitude of the perturbation.
Step 6: Gradient approximation
Generate the SP approximation to the unknown gradient ∂J(y(W(n)))/∂W(n) as
ΔW(n) = \frac{J(y(W(n) + c_k Δ(n))) - J(y(W(n) - c_k Δ(n)))}{c_k}\, Δ(n).
(9)
Step 7: Update the weight vector W(n) of the C-FLANN Weights of the C-FLANN are updated in the following manner:
W (n + 1) = W (n) − ak ΔW (n) .
(10)
where a_k is a positive learning coefficient. From (9) and (10), it can be seen that the weights of the C-FLANN controller are updated without the need to model the secondary path, so this algorithm is called a model-free (MF) control algorithm.
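A compact illustrative sketch of Steps 2-7 is given below; the function names, the NumPy implementation, and the way the error function J of equation (4) is supplied as a callable are assumptions of mine, not part of the paper.

```python
import numpy as np

def chebyshev_expand(X, P):
    """Chebyshev functional expansion of the input vector X, as in equation (5)."""
    F = []
    for x in X:
        T = [1.0, x]                          # T0 = 1, T1 = x
        for p in range(2, P):
            T.append(2 * x * T[-1] - T[-2])   # T_p = 2 x T_{p-1} - T_{p-2}
        F.extend(T[:P])
    return np.array(F)

def spsa_update(W, J, c_k, a_k):
    """One SPSA weight update (Steps 4-7, equations (8)-(10)).

    J : callable that evaluates the error function (4) for a given weight vector.
    """
    delta = np.random.choice([-1.0, 1.0], size=W.shape)              # SP vector, (8)
    grad = (J(W + c_k * delta) - J(W - c_k * delta)) / c_k * delta   # gradient approx., (9)
    return W - a_k * grad                                            # weight update, (10)
```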
3 Simulation Studies To demonstrate the effectiveness of the proposed algorithm, computer simulations are performed on a nonlinear ANC system. At the same time, a comparison between the proposed algorithm and the FXLMS algorithm is made. The noise to be attenuated is a sinusoidal signal at 300 Hz sampled at 3000 samples/s. For the C-FLANN-based SPSA algorithm the length of the input vector X(n) is 13 (N = 13), and the Chebyshev functional expansion of the input signal is of fourth-order type (P = 4), so M = N × P = 52. The total sampling number λ of one block interval is set as 100. c_k and a_k are set as 0.001 and 0.00007, respectively. For the FXLMS algorithm, a 13-tap finite impulse response (FIR) filter is used, and the convergence coefficient for the FXLMS algorithm is set at μ = 0.001. Case 1: In this case, the primary path is characterized by a nonlinear input-output relationship, which consists of a linear transfer function
P'(Z) = 0.8Z^{-6} + 0.6Z^{-7} - 0.2Z^{-8} - 0.5Z^{-9} - 0.1Z^{-10} + 0.4Z^{-11} - 0.05Z^{-12}   (11)
followed by the nonlinear combination of its outputs
d_p(n) = u(n-2) + 0.6u^2(n-2).
(12)
where
u(n) = (P'(n))^T X(n) = \sum_{j=0}^{H-1} p'_j(n)\, x_j(n)
(13)
where P'(n) = [p'_0(n), p'_1(n), \ldots, p'_{H-1}(n)]^T is the impulse response of the linear transfer function P'(Z) of the primary path. The secondary path is characterized by the non-minimum-phase transfer function and is assumed to be time-invariant:
S(Z) = 0.3Z^{-2} + 0.6Z^{-3} + 0.1Z^{-4} - 0.4Z^{-5} - 0.1Z^{-6} + 0.2Z^{-7} + 0.1Z^{-8} + 0.01Z^{-9} + 0.001Z^{-10}.   (14)
The simulation results of the canceling error in the frequency domain are shown in Fig.3. From the simulation results shown in Fig.3, it can be seen that when the secondary path model is deterministic and time-invariant, the proposed algorithm and the FXLMS algorithm can reduce the 300Hz sinusoidal signal effectively. But
Fig. 3 The error signal spectrum for case 1
Fig. 4 The error signal versus number of iterations for case 2
only the C-FLANN-based SPSA algorithm is effective on the harmonic noise signal caused by nonlinearity. Case 2: Next, we deal with a tracking problem. With the same set of parameters used for case 1, when the number of iterations reaches 25,000, the secondary path is altered by letting S(z) = −S(z). The adaptation continues for 50,000 iterations. The error signal at the error microphone versus the number of iterations is shown in Fig. 4. From the simulation results shown in Fig. 4, it can be seen that when the secondary path model is time-varying, the C-FLANN-based SPSA algorithm has a good ability to track the secondary path. After a short transient phase, the system settles down to a steady-state response. In contrast, the FXLMS algorithm cannot adapt itself to the change of the secondary path because of its model-based characteristics [1]. From the simulation results shown in Fig. 4, it is concluded that the SPSA-based MF controller can eliminate the need for modeling of the secondary path in the ANC system. Such an approach therefore has potential advantages in accommodating systems where the equations governing the system are unknown or have time-varying dynamics [15].
4 Conclusions In this paper, the SPSA algorithm is applied to the nonlinear controller equipped with C-FLANN. This approach optimizes the error function without using its derivative. Therefore, the presented ANC algorithm does not require any estimation of the secondary path. Computer simulations have been carried out to assess the performance of the proposed algorithm as a candidate for nonlinear ANC. Its performance, in terms of error power spectrum, has been compared to that of the standard FXLMS. It is shown that for nonlinear control, in terms of the error power spectrum, the proposed algorithm outperforms the standard FXLMS algorithm when the ANC system exhibits nonlinear and time-varying characteristics. This observation implies that the SPSA-based C-FLANN
algorithm can eliminate the need for modeling of the secondary path in the ANC system. Acknowledgments. This research is supported by Training Funds for Elitist of Beijing (20061D0500600164, 20051A0500603) and the Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality (PXM2008_014215_055942).
References 1. Kuo, S.M., Morgan, D.R.: Active Noise Control Systems—Algorithms and DSP Implementations. Wiley, New York (1996) 2. Nelson, P.A., Elliott, S.J.: Active Sound Control. Academic Press, London (1991) 3. Zhou, Y.L., Zhang, Q.Z., Li, X.D., Gan, W.S.: Analysis and DSP Implementation of an ANC System Using a Filtered-Error Neural Network. Journal of Sound and Vibration 285, 1–25 (2005) 4. Strauch, P., Mulgrew, B.: Active Control of Nonlinear Noise Processes in a Linear Duct. IEEE Trans. Signal Processing 46, 2404–2412 (1998) 5. Klippel, W.: Active Attenuation of Nonlinear Sound. Patent, US (1999) 6. Patra, J.C., Kot, A.C.: Nonlinear Dynamic System Identification Using Chebyshev Functional Link Artificial Neural Networks. IEEE Trans. Syst., Man, Cybernetics 32, 505–511 (2002) 7. Tan, L., Jiang, J.: Adaptive Volterra Filter for Active Control of Nonlinear Noise processes. IEEE Trans. Signal Processing 49, 1667–1676 (2001) 8. Tokhi, M.O., Wood, R.: Active Noise Control Using Radial Basis Function Networks. Contr. Eng. Practice 5, 1311–1322 (1997) 9. Snyder, S.D., Tanaka, N.: Active Control of Vibration Using a Neural Network. IEEE Trans. on Neural Networks 6, 819–828 (1995) 10. Zhang, J.W., Wang, K.Q., Yue, Q.: Data Fusion Algorithm based on Functional Link Arti-ficial Neural Networks. In: The 6th World Congress on Intelligent Control and Automation, China, pp. 2806–2810 (2006) 11. Weng, W.D., Yen, C.T.: Reduced-Decision Feedback FLANN Nonlinear Channel Equaliser for Digital Communication Systems. IEE Proc. Commu. 151, 305–311 (2004) 12. Maeda, Y., De Figueiredo, R.J.P.: Learning Rules for Neuro- Controller via Simultaneous Perturbation. IEEE Transactions on Neural Networks 8, 1119–1130 (1997) 13. Spall, J.C.: Multivariate Stochastic Approximation Using Simultaneous Perturbation Gradient Approximation. IEEE Transactions on Automatic Control 37, 332–341 (1992) 14. Maeda, Y., Yoshida, T.: An Active Noise Control without Estimation of Secondary Path. In: ACTIVE 1999, USA, pp. 985–994 (1999) 15. Spall, J.C., Cristin, J.A.: Model-free Control of Nonlinear Stochastic Systems with Discrete-time Measurement. IEEE Transactions on Automatic Control 43, 1198–1210 (1998)
An Empirical Study of the Artificial Neural Network for Currency Exchange Rate Time Series Prediction Pin-Chang Chen, Chih-Yao Lo, and Hung-Teng Chang*
Abstract. This paper applies an integrated artificial neural network approach to forecast foreign exchange rates between the US dollar and Chinese Renminbi. In order to obtain a better forecasting performance in foreign exchange rates, this study develops an integrated forecasting model, which applies the SPSS as data preprocess method and an artificial neural network application named Alyuda Neuro Intelligence as forecasting tool. The results of this study provide evidence on the effectiveness and efficiency of the integrated artificial neural network model. The findings of this study should contribute positively to the development of theory, methodology, and practice of using artificial neural network to develop a forecasting model with enhanced forecasting accuracy. Keywords: Artificial Intelligence, Neural Network, Forecasting, Exchange Rates.
1 Introduction Artificial neural network is one of the most famous approaches in forecasting financial time series. Artificial neural network is a universal and highly flexible function approximator for pattern recognition and classification. It requires no prior assumption on the behavior and function form of the related variables, but it still can capture the underlying dynamic and nonlinear relationships among variables. Several design factors significantly impact the forecasting accuracy of artificial neural network. These factors include selection of input variables, architecture of the neural network, and quantity of input data. The question of system architecture design has been widely researched, but the corresponding questions of input variables selection and how many input data is needed in producing a reliable forecasting model have not been adequately addressed. Artificial neural network belongs to the class of data driven approach, as opposed to model driven approach. The representation of data is a critical for a successful neural network Pin-Chang Chen . Chih-Yao Lo . Hung-Teng Chang Department of Information Management, Yu-Da College of Business, Miaoli County, Taiwan 361, R.O.C. {chenpc,jacklo,cht}@ydu.edu.tw
design. However, the movements of foreign exchange rates are generally nonstationary and quite random in nature. For most artificial neural networks, the price input is not a desirable set and it makes forecasting difficult. To overcome this problem, some types of data preprocess and data transformation are required to make the raw data become stationary and statistical property. In order to obtain a better forecasting performance in foreign exchange rates, the researcher of this study is motivated to develop an improved forecasting model. This study develops an integrated forecasting model, which applies the SPSS as data preprocess tool and an artificial neural network application named Alyuda Neuro Intelligence as forecasting model. This forecasting model is intended to take advantage of the SPSS, which can be used to select the correlated data and transform the original disorderly raw data into a dimensionless series in order to obtain an appropriate fundamental for the accurate mathematical relations. This forecasting model also takes advantage of the Alyuda Neuro Intelligence to provide forecasting results by using its neural network learning ability in nonlinear relationships inherent in the data. The proposed forecasting model is intended to outperform other forecasting models and obtain a better forecasting performance in foreign exchange rates forecasting.
2 Literature Review The forecasting of foreign exchange rates has been attracting increasing attention as a favorite domain to be examined by various methods. There exists a huge amount of related data and indicators with possible influence when examining a financial time series with the aim of forecasting [1]. Artificial neural network is one of the most famous approaches in forecasting financial time series. Artificial neural network is a universal and highly flexible function model for pattern recognition and classification. It requires no prior assumption on the behavior and function form of the related variables, but it still can capture the underlying dynamic and nonlinear relationships among variables [2]. There are several papers and articles that show the use of artificial neural network is promising in forecasting foreign exchange rates. Wu and Yang [3] investigated the relationship between exchange rate fluctuation and other economic indicators by applying the neural network to predict the yearly exchange rate of the New Taiwan currency. This study selected several economic indicators including the current account, consumer price index, GDP, and money supply, discount rate, as input to predict the exchange rate by the neural network system. The empirical experiment showed that the performance of neural networks was better than the other models discussed. The result demonstrated the predictive strength of the neural network and its potential for solving financial forecasting problems. Panda and Narasimhan [4] demonstrated a neural network to make one-stepahead prediction of weekly Indian rupee/US dollar exchange rate. It compared the forecasting accuracy of the neural network with linear autoregressive and random walk models. By using six forecasting evaluation criteria, this study found that the
neural network had superior in-sample forecasting abilities than linear autoregressive and random walk models. The findings in the study have provided evidence against the efficient market hypothesis and suggested that there exist always a possibility of extracting information hidden in the exchange rate market. Gradojevic and Yang [5] employed a non-parametric method to forecast highfrequency Canadian/US dollar exchange rate. The introduction of a microstructure variable, order flow, substantially improved the predicting abilities of both linear and non-linear models. The non-linear models outperformed random walk and linear models based on a number of recursive out-of-sample forecasts. Two main criteria applied to evaluate model performance were root mean squared error (RMSE) and the ability to predict the direction of exchange rate moves. The artificial neural network (ANN) model was consistently better in RMSE to random walk and linear models for the various out-of-sample set sizes. Moreover, ANN performed better than other models in terms of percentage of correctly predicted exchange rate changes. The empirical results suggested that optimal ANN architecture was superior to random walk and any linear competing model for highfrequency exchange rate forecasting. Chen and Leung [6] evaluated and compared the performance of models based on two competing neural network architectures, the multi-layered feed forward neural network (MLFN) and general regression neural network (GRNN). This study suggested that the selection of proper architectural design might contribute directly to the success in neural network forecasting. An auxiliary experiment was developed and confirmed the possible synergetic effect from combining forecasts made by the two different network architectures. Jamal [7] applied a neural network model for representing the exchange rate of the United States with two of its major trading partners, Canada and the Euro countries. A one-period lagged exchange rate was used as the explanatory variable. The results showed that this model was quite successful in predicting the exchange rates studied. It also indicated that models of the relationship of an exchange rate with its explanatory variable might be updated as often as necessary for estimation and forecasting. Dunis and Williams [8] examined and analyzed the use of neural network regression (NNR) models in foreign exchange (FX) forecasting and trading models. The NNR models were benchmarked against traditional forecasting techniques to ascertain their potential added value as a forecasting and quantitative trading tool. In addition to evaluation the various models using traditional forecasting accuracy measures, they were also assessed using financial criteria. Having constructed a synthetic EUR/USD series for the period up to January 4, 1999, the models were developed using the same in-sample data, leaving the remainder for out-of-sample forecasting, October 1994 to May 2000, and May 2000 to July 2001, respectively. The out-of-sample period results were tested in terms of forecasting accuracy, and in terms of trading performance via a simulated trading strategy. Transaction costs were also taken into account. It concluded that NNR models had the ability to forecast EUR/USD returns for the period investigated, and added value as a forecasting and quantitative trading tool.
In summary, the artificial neural network belongs to the class of data-driven approaches, as opposed to model-driven approaches. The representation of data is critical for a successful neural network design. The merit of the artificial neural network is its effective and efficient ability to learn the nonlinear relationships inherent in the data. Its ability to self-organize and to function without a pre-programmed knowledge base gives it an important advantage in forecasting sensitive information.
3 Methodology This study is intended to present both a theoretical viewpoint of the development of an artificial neural network, and a practical application of the proposed forecasting model to the forecasting of foreign exchange rates. The research methodology of this study consists of three stages. In the first stage, data collection and data preprocess will be conducted. In the second stage, data training, testing, and forecasting will be applied. In the third stage, forecasting performance will be evaluated by using the statistical measures. Stage One: Data Collection and Data Preprocess This study is intended to forecast the daily exchange rates between US dollar and Chinese Renminbi, which influenced by the daily movements of other different exchange rates including 21 different countries. This forecasting model uses the daily data from January 1, 2007 to December 31, 2007 for in-sample estimation, leaving the period from January 1, 2008 to January 31, 2008 for out-of-sample forecasting and evaluation. The foreign exchange rates data has been collected from the Federal Reserve Bank of New York database and organized by using Excel. In addition, the SPSS will be used to check the correlation between 22 different currencies. Some unrelated exchange rates will be removed and the rest of them will be used as input data. Data preprocess also will be applied, all input data will be transformed into the scale between -1 and +1 by using the artificial neural network application named Alyuda Neuro Intelligence. Stage Two: Data Training, Testing, and Forecasting There are many topologies possible for artificial neural networks. The backpropagation algorithm is one of the most commonly used neural network paradigms. The back-propagation algorithm is a general purpose learning algorithm, which consists of three basic steps: input signals go forward, errors are calculated from expected outputs, and these errors are sent to previous layers for weight adjustments. The back-propagation algorithm will be used to develop the artificial neural network model in this study. This study uses an artificial neural network application named Alyuda Neuro Intelligence as the foundation of a prediction model. After training and testing the input data, the forecast values obtained from this artificial neural network model will be considered as the results of this study.
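As an illustration of the preprocessing described in Stage One, the sketch below maps a series into [-1, 1] and back; min-max scaling is assumed here, since the paper delegates this step to the Alyuda software.

```python
import numpy as np

def scale_to_unit_interval(x):
    """Map a series into [-1, 1]; min-max scaling is assumed in this sketch."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    scaled = 2 * (x - x_min) / (x_max - x_min) - 1
    return scaled, (x_min, x_max)

def unscale(y, bounds):
    """Invert the scaling so forecasts can be read back in original units."""
    x_min, x_max = bounds
    return (y + 1) / 2 * (x_max - x_min) + x_min
```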
Stage Three: Evaluation of the Forecasting Performance To examine the performance of forecasting, it is necessary to evaluate the previously unseen data. It is likely to be the closest to a true forecasting or trading situation by using the direct comparison of out-of-sample data. Typically, forecasting models are optimized using a mathematical criterion, and subsequently analyzed using statistical measures. This study will use forecasting accuracy via statistical measures to evaluate the forecasting performance.
4 Data Analysis (1) Data Collection and Data Preprocess According to the Federal Reserve Bank of New York database, 22 different exchange rates have been collected (see Figure 1). By using the SPSS correlation analysis, 10 unrelated exchange rates have been removed. The remaining 12 exchange rates have been used as input data for the artificial neural network forecasting model named Alyuda Neuro Intelligence. All input data has been preprocessed into the scale between -1 and +1.
Fig. 1 Data Collection
(2) Data Training, Testing, and Forecasting
Fig. 2 Data Forecasting
The back-propagation algorithm has been used to run the data training and testing. By using the Alyuda Neuro Intelligence, the best network design has been found to be a 12-28-1 architecture. The diagram shows that the target line and the output line are perfectly matched (see Figure 2), which means that this artificial neural network model has been well trained. (3) Evaluation of the Forecasting Performance The out-of-sample data has been collected in order to compare the forecast values with the real exchange rates. The results demonstrate that the differences between the forecast values and the real exchange rates are very small. The correlation is 0.998462 and the R-squared is 0.99653 (see Figure 3). These statistical measures confirm the forecasting accuracy of this artificial neural network forecasting model.
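The two statistics quoted above can be reproduced for any forecast/actual pair as sketched below; this is a generic illustration, not the exact computation performed by the authors' software.

```python
import numpy as np

def evaluate_forecast(actual, forecast):
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    corr = np.corrcoef(actual, forecast)[0, 1]        # linear correlation
    ss_res = np.sum((actual - forecast) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    r_squared = 1 - ss_res / ss_tot                   # coefficient of determination
    return corr, r_squared
```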
Fig. 3 Data Evaluation
5 Conclusion Nowadays the forecasting of foreign exchange rates is more a necessity than an experimental simulation. Since future exchange rates are not certain, forecasts need to be made for speculation purposes that involve the spot and derivatives markets. Governments, international banks, multinational corporations, and individuals generate their expectations through all kinds of resources and develop forecasting tools and advisory services to predict short-term and long-term exchange rate movements. The foreign exchange market and foreign exchange rates affect every aspect of our daily personal and corporate financial lives, and influence the economic and political destiny of every country. This study provides evidence on the effectiveness and efficiency of the integration of SPSS and an artificial neural network model into a combined-expertise construction. The proposed model has processed the mix of variables effectively and produced estimations and forecasts efficiently. The findings of this study should contribute positively to the development of theory, methodology, and practice of using artificial neural networks to develop a forecasting model with enhanced forecasting accuracy.
References 1. Haykin, S.: Neural Networks: A comprehensive foundation, 2nd edn. Prentice Hall, Upper Saddle River (1999) 2. Medsker, L., Turban, E., Trippi, R.R.: Neural network fundamentals for financial analysts. In: Trippi, R.R., Turban, E. (eds.) Neural Networks In Finance and Investing, pp. 3–24. McGraw-Hill, New York (1996) 3. Wu, W., Yang, H.: Forecasting New Taiwan Dollar/United States Dollar exchange rate using neural network. The Business Review 7(1), 63–70 (2007) 4. Panda, C., Narasimhan, V.: Forecasting Exchange Rate Better with Artificial Neural Network. Journal of Policy Modeling 29(2), 227–236 (2007) 5. Gradojevic, N., Yang, J.: Non-linear, Non-parametric, Non-fundamental Exchange Rate Forecasting. Journal of Forecasting 25(4), 227–245 (2006) 6. Chen, A., Leung, M.: Performance evaluation of neural network architectures: the case of predicting foreign exchange. Journal of Forecasting 24(6), 403–420 7. Jamal, A.: The Neural Network and Exchange Rate Modeling. International Journal of Management 22(1), 28–31 (2005) 8. Dunis, C., Williams, M.: Modeling and Trading the Euro/US Dollar Exchange Rate: Do Neural Network Models Perform Better? Derivatives Use, Trading and Regulation 8(3), 211–240 (2002)
Grey Prediction with Markov-Chain for Crude Oil Production and Consumption in China Hongwei Ma and Zhaotong Zhang*
Abstract. The crude oil demand is growing rapidly in China, driven by its rapid industrialization and motorization. China has already become the second-largest oil importing nation in the world, after the United States. The dynamic GM(1,1) model of grey theory is used to forecast crude oil consumption and production in China. In order to improve the forecast accuracy, the original GM(1,1) models are improved by using a Markov chain. We analyze the data of crude oil consumption and production from 1990 to 2006 in China, and forecast China's crude oil consumption and production with this Grey-Markov forecasting model, which shows that the improved grey forecasting model has greater reliability and higher forecast accuracy than GM(1,1). The forecast results indicate that China's crude oil consumption and production will continue to increase rapidly in the period from 2007 to 2015. Keywords: Grey system theory, Markov-chain, Grey forecasting model.
1 Introduction Along with economic growth of nearly 10% per year over the last two decades, crude oil production and consumption has been rapidly increasing in China. By 2007, its crude oil consumption ranked the second in the world. In the meantime, along with the continuous economic development and the acceleration of industrialization and motorization processes, China’s crude oil consumption and production will increase even more rapidly. The energy system is an imperfect information system situated between the ‘Black Box’ and ‘White Box’. According to the definition, it is a typical grey system [1]. For such a system, the classic and modern control theories can not work well. The grey theory was developed to meet this demand and is a truly multidisciplinary and generic theory, which deals with the systems characterized Hongwei Ma . Zhaotong Zhang College of Engineering, Nanjing Agricultural University, Nanjing 210031, China *
by a lack of information. The fields covered by grey system theory include system analysis, data processing, modeling, prediction, decision-making and control. The accumulated generating operation (AGO) is one of the most important characteristics of grey system theory, and its main purpose is to reduce the randomness of data. In the past 30 years, studies on energy system forecasting models have made great progress, and many forecasting models have been developed. Because all of these models treat the energy system as a white system and adopt mathematical statistics and regression analysis to establish the forecasting models, we think that the forecast results of these models cannot completely reflect the real changing situation of the energy system [2]. This paper attempts to apply grey theory to the prediction of China's crude oil consumption and production. We believe that the GM(1,1) model and the Markov model can be integrated with each other to forecast crude oil consumption and production and enhance the prediction accuracy.
2 Research Methodology 2.1 GM (1,1) Grey Forecasting Model The key of grey system theory is the grey dynamics model, which is notable for its generating function and grey differential equation. The concepts of the grey block and differential similarity form the basis for establishing the grey forecast model. This model is generally described as GM(M, N), where M is the order of the differential equation and N is the number of variables [3]. The procedure for deriving a GM(1, N) grey model is as follows. Step 1: Suppose an original sequence with n entries is
X_i^{(0)} = \{x_i^{(0)}(1), x_i^{(0)}(2), \ldots, x_i^{(0)}(k), \ldots, x_i^{(0)}(n)\}, \quad i = 1, 2, \ldots, N.
(1)
where xi(0) (k ) is the value at time k .
Step 2: Based on the original sequence X_i^{(0)}, a new sequence X_i^{(1)} can be generated by the one-time accumulated generating operation (1-AGO), which is
X_i^{(1)} = \{x_i^{(1)}(1), x_i^{(1)}(2), \ldots, x_i^{(1)}(k), \ldots, x_i^{(1)}(n)\}, \quad i = 1, 2, \ldots, N.
(2)
where k
xi(1) (k ) = ∑ xi(0) ( j )
(3)
j =1
(1)
Obviously, the new sequence X i is monotonous increasing. Compared with the original sequence X i(0) , the regularity of data is enhanced and the randomness of data is reduced.
Step 3: Because of the exponentially increasing nature of the entries, the sequence $X_i^{(1)}$ is similar to the exponentially increasing solution of a first-order differential equation, which is accordingly used to express the entries, i.e.,

$$\frac{dx_1^{(1)}}{dt} + a x_1^{(1)} = b_2 x_2^{(1)} + b_3 x_3^{(1)} + \cdots + b_N x_N^{(1)}. \qquad (4)$$

Equation (4) is called a GM(1, N) model, where $a, b_2, b_3, \ldots, b_N$ are model parameters. The model parameters can be determined by least-squares regression:

$$\hat{a} = [a, b_2, b_3, \ldots, b_N]^T = (B^T B)^{-1} B^T Y, \qquad (5)$$

where

$$Y = \big(x_1^{(0)}(2), x_1^{(0)}(3), \ldots, x_1^{(0)}(k), \ldots, x_1^{(0)}(n)\big)^T,$$

$$B = \begin{bmatrix} -\tfrac{1}{2}[x_1^{(1)}(1) + x_1^{(1)}(2)] & x_2^{(1)}(2) & x_3^{(1)}(2) & \cdots & x_N^{(1)}(2) \\ -\tfrac{1}{2}[x_1^{(1)}(2) + x_1^{(1)}(3)] & x_2^{(1)}(3) & x_3^{(1)}(3) & \cdots & x_N^{(1)}(3) \\ \vdots & \vdots & \vdots & & \vdots \\ -\tfrac{1}{2}[x_1^{(1)}(n-1) + x_1^{(1)}(n)] & x_2^{(1)}(n) & x_3^{(1)}(n) & \cdots & x_N^{(1)}(n) \end{bmatrix}. \qquad (6)$$

According to differential equation theory, the solution of the GM(1, N) model (4) is

$$\hat{x}_1^{(1)}(k+1) = \Big[x_1^{(1)}(0) - \frac{1}{a}\sum_{i=2}^{N} b_i x_i^{(1)}(k+1)\Big]e^{-ak} + \frac{1}{a}\sum_{i=2}^{N} b_i x_i^{(1)}(k+1). \qquad (7)$$
Using the one-time inverse accumulated generating operation (1-IAGO), the predicted value of entry k + 1 of the original sequence $X_1^{(0)}$, $\hat{x}_1^{(0)}(k+1)$, can be obtained from (7):

$$\hat{x}_1^{(0)}(k+1) = \hat{x}_1^{(1)}(k+1) - \hat{x}_1^{(1)}(k). \qquad (8)$$
If N = 1, the GM(1, N) model reduces to the GM(1,1) model [4].
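To make the GM(1,1) special case concrete, the sketch below fits the model to a short series and produces forecasts. It is an illustrative implementation of Eqs. (1)-(8) for N = 1 with the least-squares estimate of Eq. (5); the helper names are ours, and the example series is simply the first eight entries of Table 2 rounded to whole numbers.

```python
import numpy as np

def gm11_fit(x0):
    """Fit a GM(1,1) model to the original series x0 (Eqs. (1)-(6) with N = 1)."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                      # 1-AGO sequence, Eq. (3)
    z1 = 0.5 * (x1[:-1] + x1[1:])           # background values -(1/2)[x1(k-1)+x1(k)]
    B = np.column_stack([-z1, np.ones(len(z1))])
    Y = x0[1:]
    a, b = np.linalg.lstsq(B, Y, rcond=None)[0]   # least-squares estimate, Eq. (5)
    return a, b, x0[0]

def gm11_forecast(a, b, x0_first, n_steps):
    """Predict x^(0)(k+1) from the whitened solution and 1-IAGO (Eqs. (7)-(8))."""
    k = np.arange(n_steps + 1)
    x1_hat = (x0_first - b / a) * np.exp(-a * k) + b / a
    return np.diff(x1_hat)                  # xhat^(0)(k+1) = xhat^(1)(k+1) - xhat^(1)(k)

# First eight entries of Table 2, rounded (10^4 tons of SCE).
x0 = [19745, 20130, 20271, 20768, 20896, 21420, 22545, 22907]
a, b, x0_first = gm11_fit(x0)
print("a = %.5f, b = %.2f" % (a, b))
print("forecast:", np.round(gm11_forecast(a, b, x0_first, len(x0) + 2), 1))
```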
2.2 Partition of States by the Markov-Chain Forecasting Model

The values of $X^{(0)}(k+1)$ are distributed in the region around the trend curve $\hat{Y}(k)$, which may be divided into a convenient number of contiguous intervals [5]. When $X^{(0)}(k+1)$ falls in interval i, one of S such intervals, it may be regarded as corresponding to a state $\otimes_i$ in an m-order Markov unstable sequence; $\otimes_i$ can be written as follows:
$$\otimes_i = [\otimes_{i1}, \otimes_{i2}], \qquad (9)$$

where i = 1, 2, ..., S and S is the number of states,

$$\otimes_{i1} = \hat{Y}(k) + A_i, \qquad (10)$$

$$\otimes_{i2} = \hat{Y}(k) + B_i. \qquad (11)$$

$\hat{Y}(k)$ is a function of time, so $\otimes_{i1}$ and $\otimes_{i2}$ vary with the time series, which means that the state $\otimes_i$ is dynamic. Establishing S (the number of states), $\otimes_{i1}$ and $\otimes_{i2}$ depends on the study object and the original data series.
2.3 Calculate the Transition Probability P

For a Markov-chain series, the transition probability from state $\otimes_i$ to state $\otimes_j$ can be established as follows:

$$P_{ij}(m) = \frac{M_{ij}(m)}{M_i}, \quad i, j = 1, 2, \ldots, S, \qquad (12)$$

where $P_{ij}(m)$ is the probability of transition from state $\otimes_i$ to state $\otimes_j$ in m steps, m is the number of transition steps, $M_{ij}(m)$ is the number of original data points transferred from state $\otimes_i$ to state $\otimes_j$ in m steps, and $M_i$ is the number of original data points in state $\otimes_i$. These $P_{ij}(m)$ values can be presented as a transition probability matrix $R(m)$:

$$R(m) = \begin{bmatrix} p_{11}(m) & p_{12}(m) & \cdots & p_{1j}(m) \\ p_{21}(m) & p_{22}(m) & \cdots & p_{2j}(m) \\ \vdots & \vdots & & \vdots \\ p_{i1}(m) & p_{i2}(m) & \cdots & p_{ij}(m) \end{bmatrix}, \quad i, j = 1, 2, \ldots, S. \qquad (13)$$
The state transition probability $P_{ij}(m)$ reflects the statistical law of each state transition in the system, which is the foundation of the Markov probability matrix forecast [6]. The future development of the system can be forecasted by studying the state transition probability matrix $R(m)$. Generally, it is necessary to observe the one-step transition matrix R(1). Suppose the object to be forecasted is in state $\otimes_q$ (1 ≤ q ≤ S); then row q of matrix R(1) should be considered. If $\max_j P_{qj}(1) = p_{ql}(1)$, j = 1, 2, ..., S, then what will most probably happen in the system at the next moment is the transition from state $\otimes_q$ to state $\otimes_l$. It is difficult to determine the future transition of the state if two or more transition probabilities in row q of the matrix R(1) are equal. Therefore the transition
probability matrix for two-step transitions, R(2), or the multi-step transition matrix R(m) should then be considered [7].
2.4 Calculate the Forecasting Data

After the determination of the future state transition of the system, i.e. the determination of the grey elements $\otimes_{i1}$ and $\otimes_{i2}$, the changing interval of the forecast value lies between $\otimes_{i1}$ and $\otimes_{i2}$. The most probable forecast value, $\hat{Y}(k+1)$, is taken to be the middle value of the determined state interval, that is

$$\hat{Y}(k+1) = \frac{1}{2}(\otimes_{i1} + \otimes_{i2}) = \hat{Y}(k) + \frac{1}{2}(A_i + B_i). \qquad (14)$$
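The state partition and one-step transition statistics of Sections 2.2-2.4 translate directly into code. The sketch below is illustrative rather than the authors' procedure: `assign_states` places each observation into a band around the trend curve (Eqs. (9)-(11)), `transition_matrix` counts one-step transfers (Eq. (12)), and the forecast is corrected with the mid-point of the most probable next state (Eq. (14)). The band widths of ±1% and ±3% of the series mean mirror Section 4.2 below, but the numerical series here is hypothetical.

```python
import numpy as np

def assign_states(actual, trend, edges):
    """Map each point to state i such that
    trend(k)+edges[i] <= actual(k) < trend(k)+edges[i+1]  (Eqs. (9)-(11))."""
    offsets = np.asarray(actual, float) - np.asarray(trend, float)
    return np.clip(np.searchsorted(edges, offsets, side="right") - 1, 0, len(edges) - 2)

def transition_matrix(states, n_states):
    """One-step transition probabilities P_ij(1) = M_ij(1)/M_i, Eq. (12)."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(states[:-1], states[1:]):
        counts[i, j] += 1
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)

def markov_correct(trend_next, current_state, R, edges):
    """Eq. (14): take the mid-point of the most probable next state interval."""
    nxt = int(np.argmax(R[current_state]))
    return trend_next + 0.5 * (edges[nxt] + edges[nxt + 1])

# Hypothetical series and trend, with 4 states at -3%, -1%, 0, +1%, +3% of the mean.
actual = np.array([100.0, 103.0, 101.0, 104.0, 107.0, 105.0])
trend  = np.array([100.5, 102.0, 102.5, 104.5, 106.0, 106.5])
edges = np.array([-0.03, -0.01, 0.0, 0.01, 0.03]) * actual.mean()
states = assign_states(actual, trend, edges)
R1 = transition_matrix(states, n_states=4)
print(markov_correct(trend_next=108.0, current_state=states[-1], R=R1, edges=edges))
```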
3 Evaluation of Forecasting Accuracy

The accuracy of forecasting is evaluated by a post-error test method. It uses four important indexes: the average relative error δ, the post-error ratio C, the probability of small error P, and the forecasting precision ρ. The details are as follows. Given the original sequence $X_i^{(0)} = \{x_i^{(0)}(1), x_i^{(0)}(2), \ldots, x_i^{(0)}(n)\}$ and its predicted equivalent $\hat{X}^{(0)} = \{\hat{x}^{(0)}(1), \hat{x}^{(0)}(2), \ldots, \hat{x}^{(0)}(n)\}$, the forecasting errors are

$$\varepsilon(k) = x^{(0)}(k) - \hat{x}^{(0)}(k), \quad k = 1, 2, \ldots, n.$$

The mean values and variances of the original sequence and of the forecasting errors, $\bar{x}$, $S_1$ and $\bar{\varepsilon}$, $S_2$, together with δ and ρ, are

$$\bar{x} = \frac{1}{n}\sum_{k=1}^{n} x^{(0)}(k), \quad S_1 = \frac{1}{n}\sum_{k=1}^{n}\big[x^{(0)}(k) - \bar{x}\big]^2, \quad k = 1, 2, \ldots, n, \qquad (15)$$

$$\bar{\varepsilon} = \frac{1}{n}\sum_{k=1}^{n} \varepsilon(k), \quad S_2 = \frac{1}{n}\sum_{k=1}^{n}\big[\varepsilon(k) - \bar{\varepsilon}\big]^2, \quad k = 1, 2, \ldots, n, \qquad (16)$$

$$\delta = \frac{1}{n}\sum_{k=1}^{n} \frac{\big|x^{(0)}(k) - \hat{x}^{(0)}(k)\big|}{x^{(0)}(k)}, \quad \rho = 1 - \delta. \qquad (17)$$

From (15) and (16), we can obtain
Table 1 The four grades of forecasting accuracy

Grade                   δ        P        C        ρ(%)
grade 1: very good      <0.01    >0.95    <0.35    >95
grade 2: good           <0.05    >0.80    <0.50    >90
grade 3: qualified      <0.10    >0.70    <0.65    >85
grade 4: unqualified    <0.20    >0.60    <0.80    >80
$$C = S_2 / S_1, \qquad P = p\big\{|\varepsilon(k) - \bar{\varepsilon}| < 0.6745\, S_1\big\}. \qquad (18)$$
Based on these values, the accuracy grade of the model can be judged according to the criteria listed in Table 1. The smaller the C value, the higher the accuracy grade of the model; likewise, the higher the P value, the higher the accuracy grade. The post-error ratio C indicates the dispersion of the forecasting error: a smaller C value corresponds to a larger S1 and a smaller S2. P expresses the probability of a small relative bias of the forecasting error: a higher P value indicates a larger proportion of data points whose residuals deviate from the average residual by less than 0.6745·S1 [8].
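The four accuracy indexes can be computed in a few lines. The sketch below follows Eqs. (15)-(18) as written, with S1 and S2 taken as the variances of the data and of the residuals; the sample arrays are the 2003-2006 reality and Grey-Markov values from Table 3, used only to show the calling convention (the values reported in Table 4 are presumably computed over the whole fitted series, so the numbers here will not reproduce them exactly).

```python
import numpy as np

def posterior_error_test(x, x_hat):
    """Average relative error delta, post-error ratio C, small-error probability P,
    and forecasting precision rho, per Eqs. (15)-(18)."""
    x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
    eps = x - x_hat
    S1 = np.mean((x - x.mean()) ** 2)            # variance of the original data
    S2 = np.mean((eps - eps.mean()) ** 2)        # variance of the residuals
    delta = np.mean(np.abs(eps) / x)
    rho = 1.0 - delta
    C = S2 / S1
    P = np.mean(np.abs(eps - eps.mean()) < 0.6745 * S1)
    return delta, C, P, rho

# 2003-2006 reality vs. Grey-Markov forecasts from Table 3.
x     = [24232.16, 25122.45, 25982.68, 26394.09]
x_hat = [24382.32, 25107.70, 25727.78, 26135.02]
print(posterior_error_test(x, x_hat))
```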
4 Model Checking and Forecast of Crude Oil Consumption and Production in China

There are many factors which can influence crude oil production and consumption, such as the industry structure, economic growth, governmental policy and so on. Some factors are clear, and others are not. Thus the time series of China's crude oil production shows random fluctuations. As Table 2 shows, the historical crude oil production from 1990 to 2005 is rising, but fluctuates randomly. Therefore this paper forecasts and analyzes the crude oil production of China with the Grey-Markov forecasting model [9].

Table 2 The crude oil production of China from 1990 to 2006 (unit: 10000 tons of SCE)

Number  Year  Amount     Number  Year  Amount
1       1990  19745.18   9       1998  22986.25
2       1991  20130.05   10      1999  22857.16
3       1992  20271.38   11      2000  23280.51
4       1993  20768.03   12      2001  23420.70
5       1994  20896.30   13      2002  23858.05
6       1995  21419.64   14      2003  24232.16
7       1996  22544.72   15      2004  25122.45
8       1997  22906.93   16      2005  25982.68
4.1 Build the GM(1,1) Grey Forecasting Model

Based on the historical data of crude oil production in China from 1990 to 2002, a trend curve equation is built by the GM(1,1) grey forecasting model. It is as follows:

$$\hat{Y}(k) = \hat{X}^{(0)}(k+1) = 1255703.674807\, e^{0.015975k} - 1235958.494807,$$

where k is the series number of the year, and k = 0 corresponds to 1990.
4.2 Partition of States by the Markov-Chain Forecasting Model

According to the actual crude oil production data, four states, i.e. four contiguous intervals, can be established about the curve of $\hat{X}^{(0)}(k+1)$. The four state intervals are as follows:

$$\otimes_1: \ \otimes_{11} = \hat{Y}(k) - 0.03\bar{Y}, \quad \otimes_{12} = \hat{Y}(k) - 0.01\bar{Y};$$
$$\otimes_2: \ \otimes_{21} = \hat{Y}(k) - 0.01\bar{Y}, \quad \otimes_{22} = \hat{Y}(k);$$
$$\otimes_3: \ \otimes_{31} = \hat{Y}(k), \quad \otimes_{32} = \hat{Y}(k) + 0.01\bar{Y};$$
$$\otimes_4: \ \otimes_{41} = \hat{Y}(k) + 0.01\bar{Y}, \quad \otimes_{42} = \hat{Y}(k) + 0.03\bar{Y},$$

where $\bar{Y}$ denotes the average value of the historical crude oil production from 1990 to 2002. The historical data series, the regressed curve and the state intervals are shown in Fig. 1.
Fig. 1 The forecasting curve of the crude oil production in China
4.3 Calculate the Transition Probability P

After examining the state intervals and the historical data series, the number of historical data points in each interval can be obtained. They are as follows:
$$M_1 = 3, \quad M_2 = 5, \quad M_3 = 1, \quad M_4 = 3,$$

where $M_i$ denotes the number of historical data points in interval i, i = 1, 2, 3, 4. The one-step transition probabilities for every state interval are then calculated and presented in the transition matrix R(1) as follows:

$$R(1) = \begin{bmatrix} \tfrac{1}{3} & \tfrac{2}{3} & 0 & 0 \\ \tfrac{3}{5} & \tfrac{1}{5} & 0 & \tfrac{1}{5} \\ 0 & 1 & 0 & 0 \\ 0 & \tfrac{1}{3} & 0 & \tfrac{2}{3} \end{bmatrix}.$$
The crude oil production of 2002 lies in the $\otimes_1$ state interval, so we examine the first row of the transition matrix. Since $\max_j P_{1j} = P_{12}$ (j = 1, 2, 3, 4), as the matrix R(1) shows, the crude oil production of 2003 most probably lies in the $\otimes_2$ state interval, so the most probable forecast value of the 2003 crude oil production can be calculated by Eq. (14) [10]:

$$\hat{Y}(14) = \frac{1}{2}(\otimes_{21} + \otimes_{22}) = 24384.32.$$
4.4 Comparison of Forecast Precision between the Grey-Markov Forecasting Model and the GM(1,1) Forecasting Model

Following the above steps, we obtain the forecast values from 2003 to 2006 calculated by the Grey-Markov forecasting model and by the GM(1,1) forecasting model [11]. From Table 2, we obtain the actual amounts of crude oil production in China. The forecast values of the two models are compared and the results are shown in Table 3 and Table 4.

Table 3 Comparison of forecast results with two different methods
Year   Reality     GM(1,1) value   Precision   Grey-Markov value   Precision
2003   24232.16    24493.97        98.92%      24382.32            99.37%
2004   25122.45    24888.40        99.07%      25107.70            99.94%
2005   25982.68    25289.19        97.33%      25727.78            99.02%
2006   26394.09    25696.43        97.36%      26135.02            99.02%
Table 4 Comparison of forecast results with two different methods

Model               Type                   δ       C       P    ρ(%)   Grade
GM(1,1) Model       Crude Oil Production   0.034   0.195   1    96.6   grade 2
Grey-Markov Model   Crude Oil Production   0.021   0.096   1    98.6   grade 1
Table 3 and Table 4 show that the forecast values of the Grey-Markov forecasting model are more precise and reliable than those of the original GM(1,1) model, because the Grey-Markov model makes full use of the information given by the historical data and greatly increases the forecasting precision for randomly fluctuating sequences [12].
4.5 The Future Forecast Results of Crude Oil Consumption and Production in China

The forecast values from 2007 to 2015 are calculated by the Grey-Markov forecasting model following the above steps. The forecast results of crude oil consumption and production from 2007 to 2015 obtained with the Grey-Markov forecasting model are shown in Table 5 and Fig. 2. As the prediction shows, the crude oil consumption and production in China will continue to increase rapidly over the next 10 years [13]. In addition, Table 5 and Fig. 2 show that from 2007 to 2015 the annual growth rate of crude oil production will reach 2.13%, while the annual growth rate of crude oil consumption will be 7.24%. The average crude oil consumption from 2007 to 2015 will reach 2.38 times the average crude oil production.

Table 5 The future forecast results from 2007 to 2015 (unit: 10000 tons of SCE)
       Grey-Markov forecast value
Year   Production    Consumption
2007   26483.0298    53046.9621
2008   26944.1947    56865.7286
2009   27413.3901    60959.4019
2010   27890.7559    65347.7724
2011   28376.4343    70052.0546
2012   28870.5702    75094.9906
2013   29373.3107    80500.9595
2014   29884.8058    86296.0956
2015   30405.2078    92508.4143
Fig. 2 The forecasting results of the crude oil production in China
In 2015, the crude oil consumption and production will be 925.0841 and 304.0521 MTCE respectively, and the former will be 3.0425 times the latter. Therefore China's crude oil supply will continue to depend heavily on imports [14].
5 Conclusions

This paper has applied the Markov chain to improve the GM(1,1) model for forecasting the crude oil consumption and production in China. The GM(1,1) model reflects the macroscopic regularity, while the Markov chain captures the fluctuating development of the microscopic system; combined, they complement each other and make full use of the information contained in the original time series [15]. The posterior check results show that the Grey-Markov forecasting models have higher forecast accuracy than the original ones.

Acknowledgments. This study was supported financially by the College of Engineering, Nanjing Agricultural University, and the Agricultural Machinery Bureau of Jiangsu Province (Project No. GXS06012). The authors would like to thank Prof. Wang Xiaohua of the College of Engineering, Nanjing Agricultural University, for his valuable comments and kind help.
References

1. Deng, J.L.: Grey System (Society and Economy). Publishing House of National Defense Industry, Wuhan (1985)
2. Wang, X.H., Feng, Z.M.: Rural household energy with the economic development in China: stages and characteristic indices. Energy Policy 29, 1391–1397 (2001)
3. He, Y., Bao, Y.D.: Grey-Markov forecasting model and its application. System Engineering - Theory & Practice 9, 59–63 (1992)
4. Deng, J.L.: Control Problems of Grey System. Huazhong University of Science and Technology Press, Wuhan (1990)
5. Huang, M., He, Y., Cen, H.: Predictive analysis on electric-power supply and demand in China. Renewable Energy 32, 1165–1174 (2007)
6. Liu, S.F., Lin, Y.: An Introduction to Grey Systems: Foundation, Methodology and Application, pp. 97–111. IIGSS Academic Publisher, Slippery Rock (1998)
7. National Bureau of Statistics of China: China Statistical Yearbook. China Statistics Press, Beijing (2007)
8. Hu, D.L.: Applying the grey system theory to evaluate the competitive competence of enterprises. Journal of Scientific-Technical Progress and Strategy 20, 159–161 (2003)
9. He, Y.: A new forecasting model for agricultural commodities. Agric. Eng. Res. 60, 227–235 (1995)
10. CCICED: Energy Strategies and Techniques. Publishing House of China Environmental Science (1997)
11. Hsu, C.I., Wen, Y.H.: Improved grey prediction models for the trans-Pacific air passenger market. Transportation Planning and Technology 22, 87–107 (1998)
12. Chao, H.W.: Predicting tourism demand using fuzzy time series and hybrid grey theory. Tourism Management 25, 367–374 (2004)
13. Lee, C.: Grey system theory wins application on earthquake forecasting. Journal of Seismology 4, 27–31 (1986)
14. He, Y., Bao, Y.D.: Grey-Markov forecasting model and its application. Sys. Eng. (Theory Practice) 9, 59–63 (1992)
15. Li, C.H.: Applying the grey prediction model to the global integrated circuit industry. Technological Forecasting and Social Change 70, 63–74 (2003)
Fabric Weave Identification Based on Cellular Neural Network Suyi Liu, Qian Wan, and Heng Zhang*
Abstract. Owing to its binary output property, the cellular neural network (CNN) is very useful for classification tasks in image processing. In this paper, the warp and weft interlacing points of a fabric image are expressed as a binary image. By adopting a CNN, each cell element represents a warp or weft interlacing point, and the relation between a cell element and its neighbouring cell elements is calculated; the weave structure of the fabric can then be identified. Weave-structure identification has been carried out on the three original fabric weaves (plain weave, twill weave and satin weave), and good results were obtained. Keywords: Cellular Neural Network, Fabric, Warp Interlacing Point, Weft Interlacing Point, Weave Identification.
1 Introduction

The cellular neural network (CNN), a locally connected network, consists of many cell elements. It features a small amount of data, fast processing speed and good performance, so it is widely used in image processing, signal processing and other fields. The analysis and identification of fabric weave structure is an important part of the textile production process. The main purpose of fabric weave identification is to identify the integrated structure of a fabric sample, at least one cycle period, together with its structural character, which can then guide further production. At present, the analysis and identification of fabric weave structure usually relies on Fourier analysis, the autocorrelation function method or the wavelet transform method. Exploiting the similarity between the fabric weave structure and the CNN structure, we adopt a CNN in which each cell element represents a weft or warp interlacing point; the fabric weave structure is then identified by calculating the relationship between each cell and its adjacent cells.

Suyi Liu, Qian Wan, Heng Zhang: College of Electronics and Information Engineering, Wuhan University of Science and Engineering, 430073 Wuhan, China
[email protected] *
2 Cellular Neural Network and Its Application in Image Processing

2.1 Basic Theory of the Cellular Neural Network

The basic unit of a cellular neural network is called an artificial cell; it is connected only with the surrounding neurons and is a continuous nonlinear dynamic system. Figure 1 shows a cellular neural network of size 4 × 4, where c(i, j) denotes the cell in the i-th row and the j-th column.

Fig. 1 A structure diagram of a 2-dimensional cellular neural network
A cellular neural network of dimension M × N is constituted by M × N cells c(i, j), 1 ≤ i ≤ M, 1 ≤ j ≤ N, and its dynamic process can be described by the following nonlinear differential equations.

(1) State equation:

$$\dot{v}_{xij}(t) = -v_{xij}(t) + \sum_{c(k,l)\in N_r(i,j)} A_{ij,kl}\, v_{ykl}(t) + \sum_{c(k,l)\in N_r(i,j)} B_{ij,kl}\, v_{ukl}(t) + I_{ij}, \qquad (1)$$
for 1 ≤ i ≤ M, 1 ≤ j ≤ N, where $N_r(i,j)$ is the neighbourhood of radius r of the neuron c(i, j), defined as

$$N_r(i,j) = \{C(k,l) \mid \max\{|k-i|,\ |l-j|\} \le r\}, \quad 1 \le k \le M,\ 1 \le l \le N;$$

$A_{ij,kl}$ is the feedback coefficient between a neuron and the outputs of its neighbours, $B_{ij,kl}$ is the control coefficient between a neuron and the inputs of its neighbours, and $I_{ij}$ is the external bias input.

(2) Output equation:
$$v_{yij}(t) = f(v_{xij}(t)) = 0.5\big(|v_{xij}(t) + 1| - |v_{xij}(t) - 1|\big). \qquad (2)$$
Fig. 2 Piecewise linear function
for 1 ≤ i ≤ M, 1 ≤ j ≤ N. This equation gives the relation between output and state, where the output function f(v) is a piecewise linear function, shown in Figure 2.

(3) Constraint condition:

$$|v_{xij}(t)| \le 1, \quad |v_{uij}(t)| \le 1. \qquad (3)$$

(4) Parametric hypothesis:

$$A(i, j; k, l) = A(k, l; i, j). \qquad (4)$$

Literature [2] has proved that the CNN is stable and gives the following result.

Theorem 1. If the network parameters satisfy

$$A(i, j; i, j) > \frac{1}{R_x}, \qquad (5)$$

then $\lim_{t\to\infty} |v_{xij}(t)| > 1$ and $\lim_{t\to\infty} v_{yij}(t) = \pm 1$.

Theorem 1 ensures that the cellular neural network has binary outputs. This property is of great significance for classification problems in image processing.
2.2 Cellular Neural Network Applied to Image Processing

Applying a CNN to process an image requires the following steps.

(1) Transform the differential equation into a difference equation. The first-order difference equation equivalent to Equation (1) is

$$v_{xij}(t+1) - v_{xij}(t) = -v_{xij}(t) + \sum_{c(k,l)\in N_r(i,j)} A_{ij,kl}\, v_{ykl}(t) + \sum_{c(k,l)\in N_r(i,j)} B_{ij,kl}\, v_{ukl}(t) + I_{ij},$$

and hence

$$v_{xij}(t+1) = \sum_{c(k,l)\in N_r(i,j)} A_{ij,kl}\, v_{ykl}(t) + \sum_{c(k,l)\in N_r(i,j)} B_{ij,kl}\, v_{ukl}(t) + I_{ij}. \qquad (6)$$
(2) Adjust the range of the pixel values. In order to satisfy the constraint condition (3) of the CNN, the image pixel values in [0, 255] are linearly mapped to the range [-1, 1], i.e.

$$v_{ukl} = \frac{2\, v_{ukl}}{255} - 1. \qquad (7)$$
When a CNN is applied to image processing, its size is identical to that of the image to be processed: if the image contains M × N pixels, the CNN contains M × N cell neurons, with a one-to-one correspondence between pixels and cells. In accordance with the constraint condition (3), the initial state value of the CNN is set to zero; each point of the network is then substituted into expression (1) and iterated until the whole network is stable, after which the output is binary, i.e. a classification pattern of the corresponding image.
3 Analysis of Fabric Weaves

There are numerous sorts of fabric weaves, including original weaves, fancy weaves, compound weaves and complex weaves. The elementary (original) weaves are the most basic weave forms and include plain weave, twill weave and satin weave; they are the basis of fancy and other weaves, as shown in Fig. 3. It is therefore very important to analyze and identify the elementary weaves.

Fig. 3 Schematic diagram of the three original weaves: (a) plain weave; (b) twill weave; (c) satin weave
A fabric consists of interlacing points formed where the weft and warp cross. Fig. 3(a) shows the arrangement of a plain weave. For each warp interlacing point, the interlacing points adjacent to it in the up, down, left and right directions are weft interlacing points; in the same way, the points adjacent to each weft interlacing point are warp interlacing points. We can therefore identify the weave structure of the fabric by analyzing the relationships among its interlacing points. Let the property of each interlacing point of the fabric image be expressed as a numerical value: weft interlacing points are expressed as "1" (represented as white in the binary image), and warp interlacing points are expressed as "0" (represented
as black in the binary image). By filling the original locations and regions of the interlacing points in this way, a binary image of the interlacing points is formed. By analyzing the fabric weave image, we found that it is similar in structure to a CNN. Therefore, by adopting a CNN in which each cell element represents a warp or weft interlacing point and calculating the relation between each cell element and its neighbouring cell elements, the weave structure of the fabric can be identified.
4 The Identification of Fabric Original Weaves Based on CNN

4.1 Design of Network Parameters

Referring to the cellular state equation (1), the dynamic mechanism of the CNN includes output feedback and input control. The effect of output feedback is determined by template A and the effect of input control by template B, so the key to image processing with a CNN is finding suitable templates. Choosing r = 1, the structure consists of C(i, j) and the eight surrounding neurons, namely $N_r(i,j) \in R^{3\times 3}$. Then A and B are both 3 × 3 templates; according to the features of the fabric original weaves, templates of the following form are chosen:

$$A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & a & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} -c & -c & -c \\ -c & b & -c \\ -c & -c & -c \end{bmatrix}.$$
4.2 Identification of Fabric Original Weaves

Plain weave, twill weave and satin weave fabrics are taken as the original images. As shown in Fig. 4, each is a bitmap with 256 gray levels.

Fig. 4 Fabric samples: (a) plain weave; (b) twill weave; (c) satin weave
Through multiple adjustment calculations using the MATLAB software, we obtain the network parameters I = 0.5, a = 2, b = 8, c = 1, namely

$$A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{bmatrix}.$$

The identification result for the fabric original weaves is obtained following the flow chart shown in Fig. 5. The convergence condition of the network is: if there are cells whose states lie between 1 and -1, the network has not converged and enters the cyclic iteration again; otherwise it has converged and the iteration can stop.
Fig. 5 Flow chart of the identification procedure: start, read in image, data processing, set initial parameters, calculation and iteration, judge whether the network has converged (if not, iterate again), output, end
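A compact way to see how the identification works is to iterate Eq. (6) directly with the templates and convergence rule above. The sketch below is our illustrative reading of the procedure, not the authors' MATLAB code: the 4 × 4 test image, the helper name `cnn_identify` and the use of SciPy's `convolve2d` are our own choices.

```python
import numpy as np
from scipy.signal import convolve2d

def cnn_identify(image, a=2.0, b=8.0, c=1.0, I=0.5, max_iter=200):
    """Iterate the CNN difference equation (6) with the 3x3 templates of Sec. 4.1."""
    A = np.array([[0, 0, 0], [0, a, 0], [0, 0, 0]], float)
    B = np.array([[-c, -c, -c], [-c, b, -c], [-c, -c, -c]], float)
    u = 2.0 * image / 255.0 - 1.0                               # Eq. (7): map [0,255] to [-1,1]
    x = np.zeros_like(u)                                        # zero initial state
    feed = convolve2d(u, B, mode="same", boundary="symm") + I   # B*u + I stays constant
    for _ in range(max_iter):
        y = 0.5 * (np.abs(x + 1) - np.abs(x - 1))               # output function, Eq. (2)
        x = convolve2d(y, A, mode="same", boundary="symm") + feed   # state update, Eq. (6)
        if np.all(np.abs(x) >= 1.0):                            # convergence rule of Fig. 5
            break
    y = 0.5 * (np.abs(x + 1) - np.abs(x - 1))
    return (y > 0).astype(int)                                  # 1 = weft point, 0 = warp point

# Hypothetical 4x4 plain-weave grey image (bright = weft, dark = warp).
img = np.array([[230,  30, 230,  30],
                [ 30, 230,  30, 230],
                [230,  30, 230,  30],
                [ 30, 230,  30, 230]], float)
print(cnn_identify(img))
```

For such a plain-weave pattern every warp point is surrounded by weft points, so the stabilised binary output alternates between 0 and 1, which matches the first row of Table 1 below.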
The identification results for the fabric original weaves are given in Table 1.

Table 1 Identification results of the three original weaves of fabric

Fabric categories   Weft interlacing point   Warp interlacing point   Meaning
Plain weave         1                        1                        a weft intersects a warp
Twill weave         2                        1                        two wefts intersect a warp
Satin weave         4                        1                        four wefts intersect a warp
5 Conclusion

Based on the similarity between the fabric weave structure and the CNN structure, this paper adopts a CNN in which each cell element represents a warp or weft interlacing point, calculates the relation between each cell element and its neighbouring cell elements, and thereby identifies the weave structure of the fabric. Identification experiments were carried out on the three original fabric weaves: plain weave, twill weave and satin weave. Good identification results were obtained, which shows that the method is effective.
References

1. Ravandi, S.A.H., Toriumi, K., Matsumoto, Y.: Fourier Transform Analysis of Plain Weave Fabric Appearance. Textile Research Journal 65, 65–69 (1995)
2. Li, Y.M., Xu, B.J., Gao, W.D.: Studies on Automatic Recognition System of Fabrics Weave Parameters. Beijing Textile 23(2), 54–57 (2002)
3. He, F., Li, L.Q., Xu, J.M.: Woven Fabric Density Measure Based on Adaptive Wavelets Transform. Journal of Textile Research 28(2), 32–35 (2007)
4. Chua, L.O.: CNN: A Paradigm for Complexity. Scientific Series on Nonlinear Science (1998)
5. Chua, L.O., Yang, L.: Cellular Neural Networks: Theory. IEEE Transactions on Circuits and Systems 35(10), 1257–1272 (1988)
6. Cai, B.X.: Fabric Structure and Design. Textile Industry Publishing House, Beijing (1990)
7. Chua, L.: CNN: A Paradigm for Complexity. World Scientific, Singapore (1998)
8. Wang, W.L., Li, Z.G., Xiao, Z.T.: Study on Image Processing Based on Cellular Neural Network. Computer Science 35(4A), 159–161 (2008)
Cutting Force Prediction of High-Speed Milling Hardened Steel Based on BP Neural Networks Yuanling Chen, Weiren Long, Fanglan Ma, and Baolei Zhang*
Abstract. Machining complex mould cavities with a micro ball-end mill is indispensable in high-speed machining. Because the tool is expensive and fragile, predicting the cutting force is significant for reducing tool damage, ensuring machining quality and improving efficiency. This paper establishes a cutting-force prediction model based on a BP neural network for high-speed machining of arc surfaces of hardened steel. A sample set of experimental results is used to train and test the neural network, and the resilient gradient-descent method is introduced to improve the convergence speed and precision, so that the cutting force in the cutting process can be predicted and simulated. Practice shows that most of the prediction errors are about 5%, apart from a few individual cases. Evidently, predicting the cutting force in the unstable and highly nonlinear cutting process with a nonlinear neural network is feasible. Keywords: BP neural networks, Micro-ball end mill, Hardened steel, Cutting force, Prediction.
1 Introduction

Physical modeling and simulation is one of the powerful ways to study the dynamic behaviour of the milling process. Many scholars have established milling-force models through cutting-theory analysis, experimental system identification and differential methods, and have built edge-line or solid models of ball-end mills that consider the effect of the tool's flexible bending [1,2,3]. Because the tool is slender, it deflects and vibrates easily, causing the cutting force and chip thickness to change accordingly. In fact, it is difficult to predict and measure the amount of deflection through analytical and experimental cutting-force models alone.

Yuanling Chen, Weiren Long, Fanglan Ma, Baolei Zhang: College of Mechanical Engineering, Guangxi University, Nanning 530004, China
[email protected]
The feedforward neural network has good learning ability and can capture nonlinear relationships; it can effectively model and simulate systems that are difficult to describe with mathematical formulas. Many scholars abroad have carried out research on and applications of neural networks in various fields, such as tool-condition identification, optimization of cutting conditions, and prediction and control of surface roughness, and have achieved good results [4,5,6].
2 Theory of the BP Neural Network

The learning of a neural network can be regarded as a process of approximating a specific function, and the answers to such problems are usually not exact, so both the accuracy and the function-approximation capability must be considered. This paper adopts the feedforward (BP) neural network. The activation function of a BP neural network must be monotonic; the sigmoid function is

$$f(x) = \frac{1}{1 + e^{-x}}. \qquad (1)$$
The back propagation of the error vector δ is the basis of BP. The weights and thresholds are adjusted using the squared network error and the derivatives with respect to the inputs of every layer, so as to reduce the sum of squared errors and obtain the expected outputs; linear neurons are therefore usually chosen for the output layer. When the value of f(x) is close to 0 or 1, the updates become much smaller, leading to greater stability. Therefore, before training, the sample inputs and outputs must be normalized so that the outputs lie between 0 and 1.
3 Experimental Setup

3.1 Experimental Conditions

3.1.1 Machine Tool, Cutter and Workpiece Material
The experiment was carried out on a YCM-V85A vertical-spindle milling center (maximum rotating speed 8000 RPM). The cutter is a 4-flute hard-alloy ball-end mill with a TiAlN coating and a diameter of 2 mm. The workpiece material is hardened steel 45 (HRC 52).

3.1.2 Force-Measuring System
The forces were measured using a YDX-III9720 dynamometer. The signals were amplified by a charge amplifier and then transmitted to a computer through a signal-acquisition card working with the software "general cutting force-measuring system".
3.2 Experimental Project

The experiment adopted the single-factor method to machine arc surfaces by slot milling; that is, only the levels of one factor were changed at a time. The experimental parameter setup is shown in Table 1.

Table 1 Test parameters

Factor                        Levels
Axial cutting depth Ad (mm)   0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.13
Spindle speed n (r/min)       2387, 3183, 3978, 4774, 5570, 6366
Feed F (mm/min)               50, 100, 150, 200, 250, 300, 350
Arc radius R (mm)             4.0, 4.5, 5.0, 5.5, 6.0
Tool cantilever L (mm)        13, 15, 17, 19, 21

The cutter's diameter: φ = 2 mm; the cutter's teeth: N = 4.
4 Establishing the Cutting Force's Neural Network Model

The neural network was designed and simulated with the MATLAB toolbox. For the cutter, the forces Fx and Fy in the XY plane bend the cutter and thereby cause machining error, while the Z-direction force Fz can damage the cutter through compression. The neural network is applied to predict Fx, Fy and Fz. Because the cutting force varies at every moment (see Fig. 1), we take the maximum force at each point to ensure the comparability of the experimental data. Neural-network learning can be divided into supervised learning, in which a teacher signal matching the input signal is offered and the connection between input and output is learned, and unsupervised learning, in which the network learns by itself from the input signal alone. This study carried out supervised training of the network with the cutting forces obtained in experiments in which the axial cutting depth, feed speed, rotation speed, surface radius and tool cantilever were varied.

Fig. 1 Milling along an arc surface
Table 2 The samples of net training

No.  Ad (mm)  F (mm/min)  n (r/min)  R (mm)  L (mm)  Fx (N)  Fy (N)  Fz (N)
1    0.07     200         6366       4.0     19      63.4    39.5    128.7
2    0.09     200         6366       4.0     19      88.3    47.6    173.8
3    0.11     200         6366       4.0     19      105.0   49.2    196.9
4    0.13     200         6366       4.0     19      115.2   48.9    225.3
5    0.10     50          6366       4.0     19      82.9    37.5    195.0
6    0.10     150         6366       4.0     19      88.7    43.8    185.3
7    0.10     250         6366       4.0     19      102.6   46.6    209.2
8    0.10     350         6366       4.0     19      117.7   46.1    240.8
9    0.10     200         5570       4.0     19      103.4   44.1    209.7
10   0.10     200         3978       4.0     19      114.6   46.0    224.9
11   0.10     200         2387       4.0     19      119.8   47.7    238.6
12   0.10     200         6366       5.0     19      107.8   52.3    195.8
13   0.10     200         6366       6.0     19      120.9   59.7    220.6
14   0.10     200         6366       4.0     13      80.5    47.2    134.9
15   0.10     200         6366       4.0     17      88.4    56.9    180.6
16   0.10     200         6366       4.0     21      68.5    40.6    140.4

Table 3 The samples of net checking

No.  Ad (mm)  F (mm/min)  n (r/min)  R (mm)  L (mm)  Fx (N)  Fy (N)  Fz (N)
17   0.08     200         6366       4.0     19      76.8    45.4    147.5
18   0.10     200         6366       4.0     19      93.9    45.8    188.3
19   0.10     100         6366       4.0     19      88.3    43.1    196.4
20   0.10     200         4774       4.0     19      109     42.3    220.3
21   0.10     200         3178       4.0     19      118.1   46.6    241.6
22   0.10     200         6366       4.5     19      100.4   40.6    183.8
23   0.10     200         6366       4.0     15      75.6    50.7    163.2
The training samples should consist of all the values used for prediction, i.e. they should cover the maximum and minimum values, so that the network has enough samples to achieve good approximation precision and a wide application scope.
4.1 Multilayer Network Design and Training
Fig. 2 is a sketch of the typical multilayer neural network. In this study there are 5 neurons in the input layer, 11 in the hidden layer and 3 in the output layer. The sigmoid (hyperbolic-tangent) function is adopted in the input and middle layers, while the output layer adopts a linear function.

Fig. 2 The typical sketch of a multilayer neural network
The samples in Table 2 are used to train the network with the resilient gradient-descent method; its flow chart is shown in Fig. 3. The training precision is set to 1e-5 and the maximum number of training epochs to 1e4; the momentum coefficient helps overcome oscillation of the system and the learning rate controls the pace of convergence. In this experiment the momentum coefficient is 2.5 and the learning rate is 0.8, and the cutting parameters and results of tests 1-16 are used as training samples. The network converges after 8045 steps. The error is defined as the absolute value of the difference between the predicted value and the experimental value divided by the absolute value of the experimental value.
Fig. 3 The flow chart of the BP network algorithm: initialize the weights and thresholds; present the training patterns to the network; compute the middle-layer inputs and outputs $S_j = \sum_i W_{ij} a_i - \theta_j$, $b_j = f(S_j)$; compute the output-layer inputs and outputs $L_t = \sum_j V_{jt} b_j - \gamma_t$, $C_t = f(L_t)$; if the total error satisfies $E < \varepsilon$, stop; otherwise compute the output-layer error terms $d_t = (y_t - C_t) f'(L_t)$ and the hidden-layer error terms $e_j = \big[\sum_t d_t V_{jt}\big] f'(b_j)$, modify the weights and thresholds with the gradients and the optimization algorithm, and iterate.
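To make the training procedure above concrete, the following sketch trains a 5-11-3 network of the kind described, using six rows of Table 2. It is not the authors' MATLAB implementation: plain gradient descent with a momentum term replaces the resilient-gradient variant, the learning rate and momentum values are our own, and the tanh hidden layer with a linear output layer follows the structure stated in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# A few rows of Table 2: inputs [Ad, F, n, R, L] -> targets [Fx, Fy, Fz].
X = np.array([[0.07, 200, 6366, 4.0, 19],
              [0.13, 200, 6366, 4.0, 19],
              [0.10,  50, 6366, 4.0, 19],
              [0.10, 200, 2387, 4.0, 19],
              [0.10, 200, 6366, 6.0, 19],
              [0.10, 200, 6366, 4.0, 21]], dtype=float)
T = np.array([[ 63.4, 39.5, 128.7],
              [115.2, 48.9, 225.3],
              [ 82.9, 37.5, 195.0],
              [119.8, 47.7, 238.6],
              [120.9, 59.7, 220.6],
              [ 68.5, 40.6, 140.4]], dtype=float)

# Min-max normalisation of inputs and targets to [0, 1].
x_lo, x_hi = X.min(axis=0), X.max(axis=0)
t_lo, t_hi = T.min(axis=0), T.max(axis=0)
Xn = (X - x_lo) / (x_hi - x_lo)
Tn = (T - t_lo) / (t_hi - t_lo)

# 5-11-3 network: tanh hidden layer, linear output layer.
W1 = rng.normal(0.0, 0.5, (5, 11)); b1 = np.zeros(11)
W2 = rng.normal(0.0, 0.5, (11, 3)); b2 = np.zeros(3)
lr, mom = 0.05, 0.8
vW1, vb1, vW2, vb2 = [np.zeros_like(p) for p in (W1, b1, W2, b2)]

for epoch in range(10000):
    H = np.tanh(Xn @ W1 + b1)              # hidden-layer output
    Y = H @ W2 + b2                        # linear output layer
    E = Y - Tn
    if np.mean(E ** 2) < 1e-5:             # training precision used in the paper
        break
    # Gradients of the mean squared error (constant factors folded into lr).
    gW2, gb2 = H.T @ E / len(Xn), E.mean(axis=0)
    dH = (E @ W2.T) * (1.0 - H ** 2)       # back-propagated error at the hidden layer
    gW1, gb1 = Xn.T @ dH / len(Xn), dH.mean(axis=0)
    # Gradient descent with a momentum term.
    vW2 = mom * vW2 - lr * gW2; W2 += vW2
    vb2 = mom * vb2 - lr * gb2; b2 += vb2
    vW1 = mom * vW1 - lr * gW1; W1 += vW1
    vb1 = mom * vb1 - lr * gb1; b1 += vb1

pred = (np.tanh(Xn @ W1 + b1) @ W2 + b2) * (t_hi - t_lo) + t_lo
print(np.round(pred, 1))   # denormalised predictions of Fx, Fy, Fz
```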
4.2 Results of Network Checking

The samples 17-23, which did not participate in training, are used as test samples for prediction with the trained BP network in order to verify its generalization capability (Table 3). The predicted results and errors are shown in Table 4. The prediction errors are within 10%, except for a very few larger ones, which shows that the network has good prediction capability and can be used to predict the cutting force.

Table 4 Testing results and errors

       Predicted value (N)           Error (%)
No.    Px       Py      Pz        Ex      Ey       Ez
1      76.7     44.3    154.8     0.16    2.46     4.94
2      97.2     49.0    186.2     3.48    6.93     1.09
3      84.8     36.3    185.2     3.99    15.80    5.72
4      109.3    44.3    220.5     0.23    4.65     0.11
5      118.9    47.8    229.2     0.70    2.58     5.15
6      105.0    44.6    191.3     4.57    9.96     4.06
7      82.4     50.9    142.2     9.02    0.31     12.85
5 Application

Having designed, trained and tested the network above, its capability satisfies our requirements and it can be used to predict the cutting force. The cutting parameters used for prediction and the predicted results obtained with the above network are shown in Table 5.

Table 5 Predicted results of cutting force

No.  Ad (mm)  F (mm/min)  n (r/min)  R (mm)  L (mm)  Px (N)  Py (N)  Pz (N)
1    0.12     200         6366       4.0     19      110.8   49.0    210.2
2    0.10     300         6366       4.0     19      111.6   46.2    227.8
3    0.10     200         6366       5.5     19      111.6   62.2    200.7
In order to verify the feasibility of predicting the cutting force with the network above, we compare the experimental cutting forces measured under the above parameters with the predicted forces; the resulting errors are shown in Table 6. Most of the prediction errors are about 5%, apart from some individual cases. The relatively large errors of individual predictions are mainly caused by several factors: the accumulation of random errors and systematic errors due to external influences when collecting the experimental data, and the accumulation of the network-training error. Since the prediction of the neural network is in fact an approximation of a function determined by the experimental data, the prediction result is stable and reliable only if the network structure converges.
Table 6 Experimental value and predicted error

       Experimental value (N)        Error (%)
No.    Fx       Fy      Fz        Ex      Ey       Ez
1      104.3    48.3    200.3     5.85    1.45     4.69
2      105.9    46.7    222.8     5.14    1.09     2.18
3      113.2    53.4    208.7     1.47    14.16    3.98
Based on the above research, the prediction can be extended to the cutting force over a larger range of conditions, and the cutting parameters can be pre-selected and the cutting process optimized according to the magnitude of the predicted values.
6 Conclusions

From the above analysis, the following conclusions can be drawn.

1) Most of the prediction errors are about 5%, apart from some individual cases.
2) A multilayer network trained with a large-scale sample set can predict the cutting force well, and the result is stable and reliable.
3) The research results show that predicting the cutting force in the unstable and highly nonlinear cutting process with a nonlinear neural network is reliable.
4) By back-calculating the cutting parameters through comparison with the neural-network model, the cutting parameters can be combined and optimized.
Acknowledgement. The research is supported by the National Natural Science Foundation of China (50665001) and the project of Guangxi Graduate Student Education Innovation Plan.
References

1. Ma, W., Lin, Z., Chen, K.: The Cutting Force Model of Rigid Ball End Mill. Mechanical Science and Technology 17, 422–424 (1998) (in Chinese)
2. Ma, W., Wang, N.: The Study of Cutting Force Model Considering the Elastic Distortion of Ball End Mill. Journal of Nanjing University of Aeronautics & Astronautics 30, 633–640 (1998) (in Chinese)
3. Ni, Q., Li, C., Ruan, X.: Cutting Forces Simulation of Ball-End Milling Based on Solid Modeling. Journal of Shanghai Jiao Tong University 35, 1003–1007 (2001) (in Chinese)
4. Dimla, E., Dimla, J.R., Paul, M., Nigel, J.: Automatic Tool State Identification in a Metal Turning Operation Using MLP Neural Networks and Multivariate Process Parameters 38, 343–352 (1998)
5. Franci, C., Uros, Z.: Approach to Optimization of Cutting Conditions by Using Artificial Neural Networks. Materials Processing Technology 173, 281–290 (2006)
6. Durmus, K.: Prediction and Control of Surface Roughness in CNC Lathe Using Artificial Neural Network. Materials Processing Technology 175, 321–329 (2008)
7. Ruan, X.: Artificial Intelligence's Way, Neural Calculational Science: the Simulation of Brain's Function Based on the Cell. National Defense Industry Publishing Company, Nanjing (2006) (in Chinese)
BP Neural Networks Based Soft Measurement of Rheological Properties of CWS Dong Xie and Changxi Li*
Abstract. The rheological properties of coal-water slurry (CWS) are very important for pumping and combustion, but their online measurement remains a problem because of the non-Newtonian character of CWS. Based on the special nonlinear mapping ability of BP neural networks, a soft-measurement technique is proposed to predict the rheological characteristics of CWS. From the solids concentration and one or two apparent viscosities of the suspension at certain shear rates, the BP neural network can predict the apparent viscosities at other shear rates, including the time-dependence history. The results show its effectiveness compared with the Yield Power Law equation. Keywords: Neural networks, Coal water slurry, Rheology measurement, Soft measurement.
1 Introduction Coal-water slurries (CWSs), also referred to coal-water mixtures (CWMs), coalwater fuels (CWFs) and coal-water pastes (CWPs), are the most promising of all alternative coal-based fuels for replacing oil by coal due to their lower cost and similarity to oil with respect to convenience in transporting and handling. CWS is a high concentrated suspension mixed with coal, water and about 1% chemical additives. It needs an appropriate yield value to maintain its stability during storage, a low apparent viscosity to be combusted satisfactorily and a high coal concentration for economic use [1]. Therefore, the characterization of rheological behavior in CWS is important for slurry preparation, storage, transfer and atomization. In practical terms, the slurry should have a low viscosity at the moderate shear rates for pumping (10-200s-1). When CWS is transported over long distances of kilometers or more through pipes, its rheological properties will change greatly as the settling of solids. It is necessary to measure CWS’s Dong Xie and Changxi Li Automatic Control Science & Engineering Dept., Huazhong University of Science & Technology, Wuhan 430074, China
[email protected]
rheological parameters in order to decide whether continue to pump or to do some pretreatment works first, such as chumming or heating, etc. Some studies have been directed towards relating rheological properties of CWS to atomization quality (5000-30000 s-1) [2,3]. It is found that low slurry viscosity leads to small droplet size during atomization of CWSs and hence an increase in the carbon conversion efficiency in a boiler or furnace. In order to implement automatic production, the online measurement technology for the rheological properties of CWS is in an urgently requirement. As a Non-Newtonian fluid, the characteristics of the CWSs make it complex to evaluate its detailed rheological properties. It is usually using sampling inspection offline. The aim of the present study was to investigate the possible ways of implement an online rheological measurement of the CWS by a soft measurement method based on a neural network prediction model.
2 Theory and Method 2.1 Rheological Characteristics of CWS In the former researches, it is concluded that CWSs generally possess yield points, and any single CWS can exhibit regions of Newtonian, dilatants and pseudo-plastic behavior. It is generally agreed that CWS exhibits a weak shearthinning behavior at low and intermediate shear rates, and becomes Newtonian at higher shear rate where atomization occurs [4]. And it also exhibits thixotropic and rheopectic behavior obviously [5]. Previous studies have revealed that the rheological characteristics of CWS would be varied with a number of factors [1,6]: 1) 2) 3) 4) 5) 6)
physicochemical properties of coal; volume fraction, Ø, of the suspension; particle size range and its distribution; interparticle interactions in the suspension, which are affected by the nature of surface groups, the pH and presence of electrolytes and chemical additives; temperature of the suspension; pretreatment of the suspension, such as microwave, ultrasonic wave and magnetic water;
In most cases the CWS suspension is too complex for a mechanistic rheological model and constitutive equations to be derived, and even when one is obtained it is usually difficult to use. Therefore, empirical models are more commonly used. In order to describe mathematically the relationship between shear stress and shear rate, equations have been developed to fit the flow curves obtained in experiments. The Ostwald-de Waele or Power Law equation [1,7],
$$\tau = k\gamma^{\,n}, \qquad (1)$$
is widely used. The τ is the shear stress, k is the consistency index, γ is the shear rate, n is the flow behavior index. The power law index (n) is a measure of the
Fig. 1 Time Dependency of Thixotropic & Rheopectic CWSs
deviation from Newtonian behavior of the fluid, i.e. n = 1 for Newtonian and k = μ (viscosity); if n < 1 the fluid is shear-thinning, and if n > 1 the fluid is dilatant. A more universally applicable equation is the Yield Power Law [8]:
$$\tau = \tau_0 + k\gamma^{\,n}, \qquad (2)$$
where τ and k are as described above and τ0 is the yield point. When τ0 = 0 it is the Power Law; when n = 1 it is a Bingham plastic fluid and k = η (the plastic viscosity). There are other equations, such as the Casson equation and the Sisko equation [8,9]. However, all the equations described above deal only with the effect of shear rate on viscosity, while the shear-rate history of the CWS is not included. As a thixotropic fluid, its rheology can change with time as the structure of the fluid is altered. Fig. 1 shows the effect of slowly changing the shear rate: for a thixotropic CWS the decreasing-shear curve is lower than the increasing-shear curve and the viscosity of the sample decreases with time; the opposite behavior is seen with a rheopectic fluid. Because of this time dependency of CWS, the Yield Power Law fit has a large error. According to the raw data of [10], the average error of the curve fitted by the Yield Power Law can be close to 50%.
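For reference, fitting Eq. (2) to a measured flow curve is a small nonlinear least-squares problem. The sketch below is illustrative only: it uses a coarse grid search over n with linear least squares for τ0 and k at each candidate, and the shear-rate and stress arrays are hypothetical, not data from [10].

```python
import numpy as np

def fit_yield_power_law(gamma, tau, n_grid=np.linspace(0.2, 1.5, 131)):
    """Fit tau = tau0 + k * gamma**n (Eq. (2)) by a grid search over n
    with linear least squares for (tau0, k) at each candidate n."""
    gamma, tau = np.asarray(gamma, float), np.asarray(tau, float)
    best = None
    for n in n_grid:
        A = np.column_stack([np.ones_like(gamma), gamma ** n])
        coef = np.linalg.lstsq(A, tau, rcond=None)[0]
        sse = np.sum((A @ coef - tau) ** 2)          # sum of squared residuals
        if best is None or sse < best[0]:
            best = (sse, coef[0], coef[1], n)
    return best[1:]                                  # tau0, k, n

# Hypothetical shear-thinning flow curve (shear rate in 1/s, stress in Pa).
gamma = np.array([50, 100, 200, 300, 400], float)
tau   = np.array([14.8, 23.2, 37.0, 48.5, 58.9])
tau0, k, n = fit_yield_power_law(gamma, tau)
print("tau0 = %.2f Pa, k = %.3f, n = %.2f" % (tau0, k, n))
```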
2.2 Rheological Measurement Viscosity and yield point are two of the most important rheological properties of CWMs. The most common measuring principle to measure rheological properties is the rotational rheometer which allows determining a list of properties in an accepted manner. The method is based on the measurement of the rotational torque when a defined speed is preset or the resulting speed is measured at a set torque level. It is obvious that we can only get apparent viscosity of the NonNewtonian fluid at a certain steady shear rate once a time. Using sinusoidal stress or a sudden jump, we could only get the quantity of the elastic properties which is not the most concerned. Other measure methods such as capillary rheometer have the same problem, we cannot get the apparent viscosity covered a shear rate range in a time short enough to implement online measurement. Even having several rheometers running at separate shear rates simultaneously, we would ignore the time dependency characteristic and have a big error.
582
D. Xie and C. Li
2.3 Soft Measurement and ANN

Soft measurement is a method of building a mathematical model for parameters that are difficult to measure directly. The model describes the relationship between the parameters to be measured and other parameters that are easy to obtain, so that by measuring the easy parameters the required ones can be estimated. The difficulty usually lies in finding the related parameters and the mathematical model. Artificial neural network (ANN) methods such as back-propagation (BP) neural networks have proved a highly feasible and creative technique for such problems: they can map highly nonlinear input-output relationships, and the experience they acquire includes not only human knowledge but also relations that may still be unknown to human experts. Another important property of an ANN is its ability to interpolate and extrapolate from its experience. For these reasons, much research on building soft-measurement models with ANNs has been carried out [11]. Since many factors influence the rheology of CWSs and the rheological properties are nonlinear, it is possible to use a 3-layer BP neural network to obtain the measurement result.
3 Numerical Example Analysis The experiment data came from [10]. There were 32 groups of rheology test data of CWS for autoclave sample. The rheological properties were determined based on the Yield Power Low equation (Eq.2) using the Haak RV 100 viscometer. In all the tests, every condition was fixed but the solids concentration which changed from 33.91 wt% to 55.39 wt%. In each test, it was got 10 apparent viscosities at 10 shear rates which were respectively 50Hz, 100Hz, 200Hz, 300Hz, 400Hz, 400Hz, 300Hz, 200Hz, 100Hz and 50Hz. The shear rate increased from 50Hz to 400Hz at first, and then reduced to the start point. For example in one test, the solids concentration was 43.51 wt%, the apparent viscosities were as Table 1 and Fig.2. It’s assumed that the physicochemical properties of coal were almost the same in all the samples; the particle size range and its distributions were similar; the interparticle interactions in the suspension were the same, and all the tests were carried out in the same temperature. So what only affects the rheology of the CWS was the solids concentration. To avoid the case that individual element was insensitive to the rheology, a point or two of the apparent viscosity was also selected as the input of the BP networks. Considering to the hysteresis loop of the rheology curve, when added one point of the apparent viscosity with all the shear rates, the one at 400Hz shear rate was selected, and the one at 50Hz was added for two points. It was easy to have rheometers running at these shear rate and read the viscosities at any time. The online concentration meter was also wildly used in industry. So if the ANN worked, the online measurement of the rheology of the CWS could be implemented by this soft measurement method. The shear rate sequence was also selected as the input. The output of the BP networks was the vector of 10 apparent viscosities at 10 shear rates.
Table 1 The apparent viscosities at 10 shear rates of 43.51 wt% CWS

Shear rate (Hz)   +50    +100   +200   +300   +400   -400   -300   -200   -100   -50
Viscosity (cP)    295.4  231.6  185.0  170.9  159.1  159.7  159.5  170.5  210.0  291.8
Fig. 2 The apparent viscosity at 10 shear rates
Fig. 3 Error curve of the 12 elements BP neural network
Two 3-layer structures were chosen for the neural networks, namely 12-30-10 and 13-30-10. Sixteen groups of rheological data were selected randomly to train the networks, 8 groups to validate the networks, and the remaining 8 to test them. Normalization was applied to all the input data, and the BP training algorithm was used. After training, the error curves of the networks approached the ideal level, as shown in Fig. 3 and Fig. 4; the training error curves converge quickly. The 13-input BP neural network had a smaller error than the 12-input one: the validation error of the 12-input network was somewhat large, while that of the 13-input network reached an acceptable level. The prediction errors of the BP neural networks and of the Yield Power Law equation (Eq. (2)) fitted to the raw data are shown in Fig. 5 and Fig. 6. The first 16 data groups are training data, groups 17 to 24 are validation data, and the last 8 are test data. Both neural networks are trained well, as they have little error on the first 16 groups. However, the 12-input network has errors of up to 37%, while the 13-input one has at most 24%. Compared with the Yield Power Law equation, the 12-input network has a similar error, and the error of the 13-input network is distinctly reduced, so the BP networks evaluate the rheological characteristics of CWS better than the equation. The reasons why the error still reaches 24% are that the training data set is not large enough and that some factors influencing the rheological properties, such as the pH and the ash content of the suspension, are not considered. After all, it is always difficult to obtain precise values of rheological parameters.
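As an illustration of how the 13-input soft-measurement network can be assembled, the sketch below builds the input vector (solids concentration, the 400 Hz and 50 Hz apparent viscosities, and the ten-point shear-rate sequence) and trains a 13-30-10 regressor. It is not the authors' code: scikit-learn's MLPRegressor stands in for the MATLAB toolbox, the normalisation constants are arbitrary, and the training data are generated from a hypothetical viscosity law rather than taken from [10].

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

SHEAR_RATES = np.array([50, 100, 200, 300, 400, 400, 300, 200, 100, 50], float)

def make_input(concentration, visc_400, visc_50):
    """13 inputs: solids concentration, the 400 Hz and 50 Hz apparent viscosities,
    and the 10-point shear-rate sequence (all crudely scaled to about [0, 1])."""
    return np.concatenate(([concentration / 100.0, visc_400 / 500.0, visc_50 / 500.0],
                           SHEAR_RATES / 400.0))

# Synthetic stand-in for 16 training samples: viscosity rises with concentration
# and falls with shear rate (a hypothetical law, not the data of [10]).
rng = np.random.default_rng(1)
X, Y = [], []
for cw in rng.uniform(34.0, 55.0, 16):
    visc = 40.0 * np.exp(0.04 * cw) * SHEAR_RATES ** -0.25
    visc += rng.normal(0.0, 2.0, visc.shape)          # measurement noise
    X.append(make_input(cw, visc[4], visc[0]))
    Y.append(visc / 500.0)                            # normalised 10-point target
net = MLPRegressor(hidden_layer_sizes=(30,), activation="logistic",
                   max_iter=5000, random_state=0).fit(np.array(X), np.array(Y))

# Predict the full flow curve of an unseen 48 wt% sample from two readings.
test_visc = 40.0 * np.exp(0.04 * 48.0) * SHEAR_RATES ** -0.25
pred = net.predict([make_input(48.0, test_visc[4], test_visc[0])])[0] * 500.0
print(np.round(pred, 1))
```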
Fig. 4 Error curve of the 13 elements BP neural network
Fig. 5 Comparison of the error of the 12-input ANN with the error of the Yield Power Law equation
0.45 0.4 0.35
Error
0.3 0.25 0.2 0.15 0.1 0.05 0
0
5
10
15 20 Data Groups
25
30
35
Fig. 6 Compare the error of the 13 elements ANN with the error of Yield Power Low equation
586
D. Xie and C. Li
4 Conclusion In this paper, a novel soft measurement method based on BP neural network for the rheological characteristic of CWS is presented. From the solids concentration and one or two apparent viscosities of the suspension at certain shear rate, the BP neural network can predicate the apparent viscosities at other shear rate with time dependence. The result show that the accuracy of the new soft measurement method exceeds that of the common used Yield Power Low equation. The online measurement of the rheological properties of CWS is possible to implement in this way. If we get more parameters of the CWS condition, such as the temperature, ash rate, particle size rate and its distribution, PH and the chemical additive concentrations, etc, it is probable to get an exacter model. And a Hopfield neural network would be more suitable than BP networks for the hysteresis [12]. All these need more study in the future.
References 1. Ogura, T., Tanoura, M., Hırakı, A.: Behaviour of Surfactants in a Highly Loaded CoalWater Slurry: 1. Effects of Surfactants Concentration on Its Properties. Beulleti of the Chemical Society of Japan 66, 1343–1349 (1993) 2. LaFlesh, R.C., Lachowicz, Y.V., McGown, J.G.: Combustion Characteristics of CoalWater Fuels. In: Proc. Eighth Int. Symp. On Coal Slurry Prep. and Utils., U.S. Dept. of Energy, P.E.T.C., Pittsburgh, PA, pp. 438–452 (1986) 3. Burdukov, A.P., Popov, V.I., Tomilov, V.G.: The Rheodynamics and Combustion of Coal-Water Mixtures. Fuel 14, 927–933 (2002) 4. Ken, D.K., Paul, D.: Dynamic Surface Tension of Coal-Water Slurry Fuels. Fuel 7, 295–300 (1995) 5. Lu, p., Zhang, M.: Rheology of Coal-Water Paste. Powder Technology 150, 189–195 (2005) 6. Roh, N.S., Shin, D.H., Kim, D.C., Kim, J.D.: Rheological Behavior of Coal Water Mixture: 2. Effect of Surfactants and Temperature. Fuel 74, 1313–1318 (1995) 7. Gürses, A., Açıkyıldız, M., Doğar, Ç.: An Investigation on Effecs of Various Parameters on Viscosities of Coal-Water Mixture Prepared with Erzurum-Aşkale Lignite Coal. Fuel Processing Technology 87, 821–827 (2006) 8. Raffi, M., Turian, Jamel, F., Attal, Sung, d., Liewis, E., Wedge, w.: Properties and Rheology of Coal-Water Mixtures Using Different Coals. Fuel 81, 2019–2033 (2002) 9. Trochet-Mignard, L., Taylor, P., Bognolo, G., Tadros, T.F.: Concentrated Coal-Water Suspensions Containing Nonionic Surfactants and Polyelectrolytes 2. Adsorption of Nonyl Phenyl Propylene Oxide-Ethylene Oxide on Coal and the Rheology of the Resulting Suspension. Colloids and Surfaces A: Physicochemical and Engineering Aspects 95, 37–42 (1995) 10. Chris, M., Mark, A., Brian, C.: Coal-Water Fuel Preparation and Gasification. ThailandTask 39, Topical Report. DE-FC21-93MC30098 (1996), http://www.OSTI.org 11. Xiong, G., Nybeg, T.R., Xu, X.: BP Based Soft Measurement of Flash Point of Lubrication Oil. In: Proceedings of the 3rd World Congress on Intelligent Control and Automation, vol. 2, pp. 1114–1118 (2000) 12. Sunil, B., Jerry, M.: The Hysteretic Hopfield Neural Network. IEEE Transactions on Neural Networks 11, 879–888 (2000)
A Parameters Self-adjusting ANN-PI Controller Based on Homotopy BP Algorithm Shuguang Liu and Mingyuan Liu*
Abstract. A back-propagation (BP) neural network can record fuzzy control rules efficiently and use this experience through associative memory. However, all existing feedforward-network learning algorithms inevitably suffer from the local-minimum problem. To solve this problem, a Homotopy continuation BP algorithm is adopted in this paper, which provides an effective method for the global convergence of a BP network and converges very fast. For some complex nonlinear control systems, a parameters self-adjusting fuzzy-PI controller has been adopted effectively. Because an ANN has a strong nonlinear mapping ability, an ANN based on the Homotopy BP algorithm can replace the fuzzy part to construct a new ANN-PI controller, which has a faster dynamic response, higher control accuracy, better disturbance-resisting ability, lower sensitivity to parameter changes, and good robustness.
Keywords: Homotopy, neural network, parameters self-adjusting, fuzzy rules.
1 Introduction
Because fuzzy control can simulate a human operator's adjustment behaviour and control strategy with a fair degree of intelligence, and because its adjustment rules are independent of an accurate mathematical model and easy to implement, it has received more and more attention [1-3]. In general, the mathematical model of the controlled object must be known before a control system is designed; however, in many manufacturing processes it is very difficult to obtain a model that is both accurate and convenient for design. The traditional fuzzy controller is a control system based on fuzzy rules, which are an induction of fuzzy information and a summary of control experience. In human thinking, an abstract concept corresponds to a fuzzy variable, and experiential knowledge is the mapping linking them. From the viewpoint of physiology, human knowledge is stored as neural memory in the brain, realized through different connection strengths between neurons. According to this view, the fuzzy rules can be stored completely in an artificial neural network [4].
The learning procedure of a BP neural network based on the gradient descent method always depends on the initial values of the weights connecting the neurons: only when these weights are appropriate does the network converge, otherwise it does not. Moreover, the convergence of a BP network is very slow; the number of iterations may reach several thousands or even more than ten thousand, and a satisfying result may still not be obtained. The Homotopy BP algorithm is not only convergent over a large region, but also has a fast convergence speed and a strong ability to overcome ill-conditioning. When the gradient descent method does not converge, it can still find a satisfying solution [5]. Because the input quantization factors and output proportional factors of a simple ANN controller are fixed and cannot be adjusted automatically, its control accuracy, dynamic response speed and anti-jamming capability are limited and cannot be further improved. Although an ANN-PI controller can speed up the dynamic response and improve the control accuracy, its parameters are fixed, so its adaptive capacity is not excellent. In response to this problem, on the basis of the ANN-PI controller we use a parameter self-tuning strategy to design a parameters self-adjusting ANN-PI controller for sizing machine warp unwinding tension autolevelling [6].

Shuguang Liu
Xi'an Polytechnic University, Xi'an 710048, China
[email protected]

Mingyuan Liu
Xi'an Jiaotong University, Xi'an 710049, China
2 Homotopy BP Algorithm
For the back-propagation (BP) neural network in Fig. 1, an error function can be defined by
\varepsilon(k) = \frac{1}{2}\sum_{j=1}^{N_3}\big[\hat{y}_j(k)-y_j(k)\big]^2 = \frac{1}{2}\sum_{j=1}^{N_3}\Big[\hat{y}_j(k)-f\Big(\sum_{i=1}^{N_2} W_{ij}(k)\,x_i(k)\Big)\Big]^2,   (1)
where \hat{y} is the expected output and y denotes the actual output.
Fig. 1 A typical structure of three layers BP network
In fact, the BP algorithm is a gradient algorithm, namely
W_{ij}(k+1) = W_{ij}(k) - \mu\nabla\varepsilon(k) = W_{ij}(k) - \mu\,\frac{\partial\varepsilon(k)}{\partial W_{ij}(k)}.   (2)
By using the nonlinear optimality K-T condition, equation (2) can be translated into the zero problem of a continuous function f, i.e.
f(x, W) = 0.   (3)
Let W and Y be non-empty subsets of R^n. If G, F : W \to Y are smooth mappings and (t, W) \in [0, 1] \times W satisfy
H(t, W) = (1-t)G(W) + tF(W),   (4)
then H : [0, 1] \times W \to Y is called a linear Homotopy between G and F, t \in [0, 1] is the Homotopy parameter, and the dimension of R^n is n = N_1 \times N_2 + N_2 \times N_3.
Finding the zeros of the nonlinear problem F means that an auxiliary mapping G : R^n \to R^n with known zeros is selected to construct a Homotopy mapping H : [0, 1] \times R^n \to R^n between F and G, with H(0, W) = G(W) and H(1, W) = F(W); then the zeros of the Homotopy H coincide with those of G at t = 0 and with those of F at t = 1. Let \nabla\varepsilon(k) = \nabla\varepsilon(W) = 0; the zero problem of equation (2) is equivalent to that of equation (3). So a Homotopy function can be constructed as follows:
H(t, W(t)) = (1-t)(W(t) - W(0)) + t\nabla\varepsilon(W),  t \in [0, 1],   (5)
where the objective function F(W) = \nabla\varepsilon(W) is the problem to be solved, the auxiliary function G(W) = W(t) - W(0) is a simple problem, and W(0) is an initial value of the weights. The zero set of the Homotopy mapping in equation (4) is
H^{-1}(0) = \{(t, W) \mid H(t, W) = 0\},  t \in [0, 1],  W \in R^n.   (6)
Let
H(t, W(t)) = 0,  t \in [0, 1].   (7)
If equation (7) is continuously differentiable on the interval [0, 1], the weights can be updated by the formula [5]
W(k+1) = W(k) - \frac{1}{L}\Big[\big(1-\tfrac{k}{L}\big)I + \tfrac{k}{L}\,\frac{\partial}{\partial W}\nabla\varepsilon(W(k))\Big]^{-1}\big[\nabla\varepsilon(W(k)) - (W(k) - W(0))\big],  k = 0, 1, \ldots, L-1,   (8)
where I is the identity matrix and L is the number of iterative steps. For an arbitrarily given W(0), a sufficiently accurate W(L) can be obtained after L iterative steps such that
W(k+1) = W(k) - \mu\Big[\frac{\partial}{\partial W}\nabla\varepsilon(W(k))\Big]^{-1}\big[\nabla\varepsilon(W(k)) - W(k)\big],  k = L, L+1, \ldots   (9)
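For illustration, the continuation update (8) can be sketched numerically as below. This is a minimal sketch on a toy single-layer sigmoid model with a finite-difference Jacobian of the gradient; the data, network size and step count L are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                                # toy inputs
t = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)     # toy targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_eps(w):
    """Gradient of the quadratic error eps(w) = 0.5*||sigmoid(Xw) - t||^2."""
    p = sigmoid(X @ w)
    return X.T @ ((p - t) * p * (1.0 - p))

def hess_eps(w, h=1e-5):
    """Finite-difference Jacobian of grad_eps (approximate Hessian)."""
    n = w.size
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = h
        H[:, j] = (grad_eps(w + e) - grad_eps(w - e)) / (2.0 * h)
    return H

def homotopy_bp(w0, L=50):
    """Euler steps along H(t, w) = (1-t)*(w - w0) + t*grad_eps(w) = 0, cf. Eq. (8)."""
    w = w0.copy()
    I = np.eye(w.size)
    for k in range(L):
        tk = k / L
        Jk = (1.0 - tk) * I + tk * hess_eps(w)
        w = w - np.linalg.solve(Jk, grad_eps(w) - (w - w0)) / L
    return w

w = homotopy_bp(rng.normal(size=3))
print("final error:", 0.5 * np.sum((sigmoid(X @ w) - t) ** 2))
```

The key design point, as in the paper, is that the step matrix blends the identity (t = 0, trivial problem) with the error Hessian (t = 1, the BP problem), so no good initial weights are required.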
3 Parameters Self-adjusting ANN-PI Controller Based on Homotopy BP Algorithm

3.1 Using a Three-Layer BP Network to Record the Fuzzy Rules
The fuzzy controller's control rules can be stated as follows:
if E = A_i and EC = B_j then U = C_{ij},  i = 1, 2, \ldots, m;  j = 1, 2, \ldots, n,   (10)
where A_i, B_j, C_{ij} are fuzzy sets on the domains X, Y, Z respectively, denoting fuzzy variables such as positive large (PL), positive middle (PM), negative large (NL), and so on. The sentences (10) can be described as a fuzzy relationship R from X \times Y to Z:
R = \bigvee_{i,j} A_i \wedge B_j \wedge C_{ij},   (11)
which yields
\mu_R(x, y, z) = \bigvee_{i,j} \mu_{A_i}(x) \wedge \mu_{B_j}(y) \wedge \mu_{C_{ij}}(z).   (12)
If the output error and error change of the controlled object are fuzzy sets A and B, the control action is obtained by the fuzzy reasoning composition rule [3]:
U = (A \times B) \circ R.   (13)
Thus,
\mu_U(z) = \bigvee_{x,y} \mu_R(x, y, z) \wedge \mu_A(x) \wedge \mu_B(y).   (14)
A centre-of-gravity method is used to obtain the controller's output:
U = \frac{\sum_{i=1}^{n} U_i \cdot \mu_U(U_i)}{\sum_{i=1}^{n} \mu_U(U_i)}.   (15)
The above fuzzy logic rules can be implemented by a three-layer BP neural network [4].
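A minimal sketch of the max-min composition (14) and the centre-of-gravity defuzzification (15) is given below. The triangular membership functions and the two example rules are illustrative stand-ins, not the controller's actual rule tables.

```python
import numpy as np

X = np.arange(-6, 7, 1.0)      # error domain
Y = np.arange(-6, 7, 1.0)      # error-change domain
Z = np.arange(-7, 8, 1.0)      # control-output domain

def tri(dom, centre, width=3.0):
    """Triangular membership function on a discrete domain."""
    return np.clip(1.0 - np.abs(dom - centre) / width, 0.0, 1.0)

# Two example rules: (E=NS, EC=O -> U=PS) and (E=PS, EC=O -> U=NS)
rules = [(tri(X, -3), tri(Y, 0), tri(Z, +3)),
         (tri(X, +3), tri(Y, 0), tri(Z, -3))]

# Fuzzy relation R (Eqs. (11)-(12)), stored as mu_R[x, y, z]
mu_R = np.zeros((X.size, Y.size, Z.size))
for a, b, c in rules:
    mu_R = np.maximum(mu_R, np.minimum.reduce(
        np.meshgrid(a, b, c, indexing="ij")))

# Fuzzified inputs composed through R (Eq. (14))
mu_A, mu_B = tri(X, -2), tri(Y, 1)
mu_U = np.max(np.minimum(mu_R, np.minimum(mu_A[:, None, None],
                                          mu_B[None, :, None])), axis=(0, 1))

# Centre-of-gravity output (Eq. (15))
U = np.sum(Z * mu_U) / np.sum(mu_U)
print("control output U =", U)
```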
Fig. 2 Three layers BP network controller
In Fig. 2, the 1st to 3rd layers of the Homotopy network implement the 'IF-THEN' fuzzy control rules, while the 3rd to 4th layers perform the output decision. The numbers of input and output nodes of the three-layer neural network are designed according to the required calculation accuracy, whereas the number of hidden nodes depends on the recorded samples. In general, s hidden nodes can record s+1 different samples exactly [7].
3.2 Parameters Self-adjusting ANN-PI Controller
As the proportional and integral actions of a linear PI controller enable the control system to reach higher accuracy and speed up the dynamic response, introducing the PI structure into the ANN can improve the controller's performance. The relationship between the input and output of the ANN-PI controller is
U(t) = K_P \cdot U + K_I \cdot \int_0^t U\,dt = K_P \cdot U_i + K_I \cdot T \cdot \sum_{i=0}^{t} U_i,   (16)
where T is the sampling time, K_P is the proportional gain and K_I is the integral gain. To further improve the controller performance, reference [6] put forward the idea of self-adjusting K1, K2, KP, KI on the basis of the ANN-PI controller, so that K1, K2, KP, KI can be adapted automatically for different values of e and ė. When e or ė is large, a 'rough-tuning' strategy is taken, i.e. K1 and K2 are decreased to reduce the resolution of e and ė, while KP and KI are enlarged to output a larger control quantity. Conversely, when e or ė is small, which means the system is close to its steady state, a 'fine-tuning' strategy is taken, i.e. K1 and K2 are enlarged to improve the resolution of e and ė, while KP and KI are decreased to output a smaller control quantity. The self-adjustment process is as follows: the original K1 and K2 are first used to quantize e and ė, the enlarging (or decreasing) multiple n of the parameters is obtained from ANN2, and then K1 = K1·n, K2 = K2·n, KI = KI/n, KP = KP/n are calculated as ANN1's new parameters. The system diagram of the parameters self-adjusting ANN-PI controller is shown in Fig. 3.
Fig. 3 Parameters self-adjusting ANN-PI controller
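One self-adjustment cycle described above can be sketched as follows. The threshold rule standing in for ANN2 and the initial gains are hypothetical assumptions; in the paper the multiple n is produced by the trained network ANN2 with levels {8, 4, 2, 1, 1/2, 1/4, 1/8}.

```python
def adjustment_multiplier(e, ec):
    """Hypothetical stand-in for ANN2: coarse tuning for large errors."""
    m = max(abs(e), abs(ec))
    if m > 4:
        return 1 / 4        # rough tuning: lower resolution, larger control output
    if m > 2:
        return 1 / 2
    if m > 1:
        return 1
    return 2                # fine tuning: higher resolution, smaller control output

def self_adjust(K1, K2, KP, KI, e, ec):
    """K1 = K1*n, K2 = K2*n, KP = KP/n, KI = KI/n, as in the paper."""
    n = adjustment_multiplier(e, ec)
    return K1 * n, K2 * n, KP / n, KI / n

K1, K2, KP, KI = 1.0, 1.0, 2.0, 0.5       # assumed initial gains
K1, K2, KP, KI = self_adjust(K1, K2, KP, KI, e=5.3, ec=-0.8)
print(K1, K2, KP, KI)
```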
The fuzzy sets and domains of E, EC, U and N are respectively:
E: {PB, PM, PS, PO, NO, NS, NM, NB} on {+6, +5, +4, +3, +2, +1, +0, -0, -1, -2, -3, -4, -5, -6};
EC: {PB, PM, PS, O, NS, NM, NB} on {+6, +5, +4, +3, +2, +1, 0, -1, -2, -3, -4, -5, -6};
U: {PB, PM, PS, NO, NS, NM, NB} on {+7, +6, +5, +4, +3, +2, +1, 0, -1, -2, -3, -4, -5, -6, -7};
N: {AB, AM, AS, OK, CS, CM, CB} on {8, 4, 2, 1, 1/2, 1/4, 1/8}.
The fuzzy control rules table of ANN1 and the parameter modification rules table of ANN2 are shown in Table 1 and Table 2 respectively. ANN1 has 26 input nodes, corresponding to all integers of E and EC from -6 to +6, 15 output nodes corresponding to all integers of U from -7 to +7, and 57 hidden nodes. ANN2 has 26 input nodes, 7 output nodes and 57 hidden nodes. For ANN1 and ANN2, arbitrary real numbers are selected from [-1, 1] to initialize the weights.
Table 1 Fuzzy control rules table (U)

E \ EC | NB  NM  NS  O   PS  PM  PB
NB     | PB  PB  PB  PB  PM  O   O
NM     | PB  PB  PB  PB  PM  O   O
NS     | PM  PM  PM  PS  O   NS  NS
NO     | PM  PM  PS  O   NS  NM  NM
PO     | PM  PM  PS  O   NS  NM  NM
PS     | PS  PS  O   NS  NM  NM  NM
PM     | O   O   NM  NB  NB  NB  NB
PB     | O   O   NM  NB  NB  NB  NB
Table 2 Parameters modified rules table (N)

E \ EC | NB  NM  NS  O   PS  PM  PB
NB     | CB  CM  CS  OK  CS  CM  CB
NM     | CM  CS  OK  OK  OK  CS  CM
NS     | CS  OK  OK  AS  OK  OK  CS
NO     | OK  OK  AM  AB  AM  OK  OK
PO     | OK  OK  AM  AB  AM  OK  OK
PS     | CS  OK  OK  AS  OK  OK  CS
PM     | CM  CS  OK  OK  OK  CS  CM
PB     | CB  CM  CS  OK  CS  CM  CB
4 Simulation and Application
Simulations are carried out in two cases: one with the conventional fuzzy controller and the other with the parameters self-adjusting ANN-PI controller. The simulation model is selected as follows:
G(s) = \frac{e^{-s\tau}}{5(s+5)}.   (17)
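As an illustration, the τ = 0 case of model (17) can be simulated directly with SciPy; the delayed case (τ = 3) would additionally require, for example, a Padé approximation of e^{-sτ}. The time grid below is an arbitrary choice.

```python
import numpy as np
from scipy import signal

# Step response of G(s) = 1 / (5(s + 5)), i.e. the tau = 0 case of Eq. (17).
G = signal.TransferFunction([1.0], [5.0, 25.0])
t = np.linspace(0.0, 2.0, 500)
t, y = signal.step(G, T=t)
print("steady-state value ~", y[-1])   # approaches 1/25 = 0.04
```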
The step response curves of the two different controllers applied to equation (17) are shown in Fig. 4 for τ = 0 and τ = 3. The simulation results show that the parameters self-adjusting ANN-PI controller is better than the conventional fuzzy controller in terms of rise time, overshoot, settling time, accuracy and hysteresis.
Fig. 4 Two kinds of controllers' step response curves (1 - the conventional fuzzy controller; 2 - the parameters self-adjusting ANN-PI controller)
In the textile process, the control of sizing machine warp unwinding tension is one of the problems of yarn tension levelling. The yarn's mechanical characteristics are not yet well understood [8]. Taking the sizing machine as an example, there
are many factors that affect the mechanical behaviour of the yarns, such as changes of the winding torque, improper temperature in the serous fluid and the drying room, changes of the yarn roll and grouting roll speed, poor moment of the yarn roll and tension roll, poor control of the shaft brake, and even wear of the braking zone between the wheel and the brake pads after long-term use, all of which cause variations of the yarn tension. These factors make it difficult to establish accurate mathematical models. Furthermore, from the research on the pneumatic unwinding control circuits using the proportional valve, the proportional valve has the feature of hysteresis and is a nonlinear element, which no doubt leads to nonlinearity in the shaft unwinding. So an ANN-PI control strategy based on the Homotopy BP network is more appropriate for sizing machine warp unwinding tension control. The yarn tension control curve of 45 tex count cotton is shown in Fig. 5. From the experimental results, the ascending time of the conventional fuzzy controller is about six sampling periods (sampling time T = 10 ms [7]), while that of the parameters self-adjusting ANN-PI controller is about three sampling periods, so its response speed is greatly improved. Compared with the conventional fuzzy controller, the parameters self-adjusting ANN-PI controller shows smaller fluctuation of the response curve, good disturbance-resisting ability and high accuracy. To test the disturbance-resisting capability of the control system, the performance of the parameters self-adjusting ANN-PI controller under random interference should be investigated. A random interference f = Us·RND (Us = 0.4 V, imposed on the system for 5 sampling periods) was added to the control system, as shown in Fig. 6.
Fig. 5 Unwinding tension control curve
Fig. 6 Self-adjusting parameters fuzzy-PI controller's unwinding tension control curve (1 - the conventional fuzzy controller; 2 - the parameters self-adjusting ANN-PI controller)
The results showed that this control system has better disturbance-resisting capability and strong adaptability to parameter changes.
References 1. Zadeh, L.A.: Fuzzy Sets. Inform. Contr. 8, 338–353 (1965) 2. Rutherford, D.A., Bloore, G.C.: The Implementation of Fuzzy Algorithms for Control. Proc. IEEE 6, 572–573 (1976) 3. Zadeh, L.A.: Outline of a New Approach to the Analysis of Complex Systems and Decision Processes. IEEE Trans. Syst. Man. Cybern. 3, 28–44 (1973) 4. Chang, S.K.: On the Execution of Fuzzy Programs Using Finite-State Machines. IEEE Trans. Computer 61, 211–253 (1972) 5. Li, B.S., Liu, Z.J.: The Application of Fuzzy Set Theory to the Design of a Class Controllers. Acta Automatica Sinica 6, 25–32 (1980) 6. Hu, J.Y., Wu, Z.Q., Song, S.S.: Parameters Self-Adjusting Fuzzy-PI Controller. Information and Control 6, 28–33 (1997) 7. Liu, S.G., Wei, J.M.: Application of Fuzzy Control to Yarn Tension Autolevelling. Journal of Textile Research 15, 4–8 (1993)
Study on Optimization of the Laser Texturing Surface Morphology Parameters Based on ANN Zhigao Luo, Binbin Fan, Xiaodong Guo, Xiang Wang, and Ju Li*
Abstract. This paper analyses the important impact of the laser texturing surface morphology on drawing forming. Based on artificial neural network theory, the nonlinear mapping relation between the morphology parameters and the quality of the drawn sheet is studied, an optimization model of the laser texturing morphology parameters is established and the optimization target is fixed; then a fitness function suitable for the genetic algorithm is determined and the laser texturing morphology parameters are optimized. Taking a Bendou as an example, a multi-parameter numerical simulation is carried out and the morphology shape parameters are obtained by the genetic algorithm. The results show that the parts have better formability, indicating that this method provides good optimization guidance.
Keywords: Laser texturing, Drawing forming, Artificial neural network, Genetic algorithm, Numerical simulation.
1 Introduction
In the sheet metal drawing process, the main forming defects of the product are rupture and wrinkle. With the geometric and process parameters of the mold set in advance, the main factor affecting product quality is friction and lubrication. The greater the friction, the greater the resistance to movement of the sheet metal and the greater the tensile stress in the sheet; the smaller the friction, the more uneven the tensile stress in the sheet, which makes it prone to wrinkle defects. Laser texturing technology changes the friction between the sheet metal and the mold by changing the surface topography parameters of the mold, and thereby indirectly improves the quality of drawing forming.
The drawing forming process involves nonlinear interactions of multiple parameters, and the nonlinear mapping from multiple inputs to multiple outputs can be achieved by an artificial neural network; the intelligent design of the laser texturing morphology has previously been carried out with artificial neural networks. This paper takes the drawing process of a Bendou as an example, establishes artificial neural network models between the drawing quality and the laser texturing morphology parameters, and at the same time uses a genetic algorithm to optimize the laser texturing morphology parameters.

Zhigao Luo · Binbin Fan · Xiaodong Guo · Xiang Wang · Ju Li
College of Mechanical Engineering, Jiangsu University, Zhenjiang 212013, China
[email protected], [email protected]
2 Establish the Models of the Morphology Parameters
2.1 Confirming the Optimization Variables
The factors influencing drawing include four aspects: the size and shape parameters of the part, the shape of the blank, the process conditions and the sheet performance. In this paper, the improvement of drawing quality by the laser texturing morphology on the mold surface is considered, so the relevant parameters of the morphology are selected as the variables: the micro-figure radius R, the distribution density ρ and the height h. The optimization variable model is
X = [R, ρ, h].   (1)
Owing to the restrictions of the laser texturing processing parameters, the micro-figure radius ranges from 250 to 425 μm and the height from 7.2 to 35.2 μm; since the distribution density of the micro figures affects the drawing process only slightly, the surface distribution density is optimized between 6% and 30%.
2.2 Confirming the Optimization Targets
The common defects in the sheet metal forming process are rupture and wrinkle, and the following drawing performance indicators can be chosen as measurements.
(1) Rupture factor P. There are various causes of rupture in sheet drawing, such as strength rupture, plastic rupture and so on. The forming limit diagram (FLD) of the sheet describes the limit strains under different strain paths, from which the occurrence of rupture can be determined. The rupture factor is defined according to the windage D in the FLD as
P(R, ρ, h) = (D + D_max)/D_max,   (2)
where D is the windage and D_max is the absolute maximum of the windage.
(2) Wrinkle factor Q. Many external factors lead to wrinkling, such as stress wrinkling, shear stress wrinkling, uneven tensile stress wrinkling, bending stress wrinkling and so on; in the final analysis, wrinkling is produced by the shear stress in the upper and lower surfaces, which leads to unstable plastic deformation of the sheet. Q is defined as
Q(R, ρ, h) = ε_2 − ε_1,   (3)
where ε_2 is the upper-surface tangential strain and ε_1 is the lower-surface tangential strain.
Study on Optimization of the Laser Texturing Surface Morphology Parameters
599
(3) Homogeneity of sheet thickness M. Rupture and wrinkle are both related to the thickness, so M is defined as
M(R, ρ, h) = J/J_max,   (4)
where J is the thickness distribution and J_max is the maximum of J, with
J = \Big(\sum_{i=1}^{n}(h_i - h)^p\Big)^{1/p},
where n is the number of units, h_i is the thickness of the ith unit, h is the initial thickness and p is the strengthening factor. In summary, the research target is to avoid rupture and wrinkle with a reasonable windage and thickness uniformity. A simple optimization objective function is constructed as follows:
f(R, ρ, h) = w_1[(D + D_max)/D_max] + w_2|ε_2 − ε_1| + w_3(J/J_max),   (5)
where w_1, w_2, w_3 are the weighting coefficients.
2.3 Determination of the Fitness Function
In this paper, the objective function of the design variables is forecast by the artificial neural network model, and proportional linear scaling is used for the genetic algorithm fitness function. The fitness function is the only measurement used to evaluate the performance of the genetic chromosomes: the greater the value of the fitness function, the closer the corresponding solution is to the optimal solution of the problem. According to the analysis above, the fitness function is constructed as follows:
eval(R, ρ, h) = \frac{f(R, ρ, h)_{max} − f(R, ρ, h)}{f(R, ρ, h)_{max} − f(R, ρ, h)_{min}},   (6)
where f(R, ρ, h) is the objective function value of an individual, and f(R, ρ, h)_{max} and f(R, ρ, h)_{min} stand for the maximum and minimum of the objective function respectively; the smaller f(R, ρ, h) is, the greater the fitness and the better the solution.
3 Method of Hybrid Optimization
During the sheet metal drawing forming process, the artificial neural network model is established from a limited number of numerical simulations, and the relationship between the surface topography parameters and the forming quality is trained by the neural network. Different sheet metal forming results can then be obtained quickly from this model. The input nodes chosen are the micro-figure radius R, the density ρ and the height h; the output nodes are the deviation D, the tangential strain ε and the thickness J. Part of the numerical simulation results is used as training samples, and the rest is used as test data.
Fig. 1 Hybrid optimization program (begin → numerical simulation → establish and train the neural network model → determine the group size → GA → convergence judgment → objective function value → optimized parameters)
A suitable set of neural network parameters is chosen, such as the number of layers, the number of neurons and the transfer function. A neural network model is selected using the training samples, and the test data are then used for testing; the model is accepted if the error is within the permitted scope. The objective function is predicted by the artificial neural network above, and the fitness value of each individual is obtained in real time during the optimization iterations, which reduces the numerical simulation computation. The genetic algorithm, combined with MATLAB 6.5, is used as the optimization algorithm to improve the efficiency and accuracy of the computation, and a laser texturing morphology optimization program is worked out to complete the algorithm solution in the optimization process.
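The hybrid loop described above can be sketched as follows. The finite-element stand-in function, the ANN/GA hyperparameters and the weights (w1, w2, w3) are illustrative assumptions; only the variable bounds and the weighted objective (5) and fitness scaling (6) follow the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
LOW, HIGH = np.array([0.25, 0.06, 7.2]), np.array([0.425, 0.30, 35.2])  # R, rho, h

def fem_stand_in(x):
    """Hypothetical smooth response standing in for the FEM simulation."""
    r, rho, h = x.T
    D   = (r - 0.38) ** 2 + 0.5 * (rho - 0.24) ** 2 + 0.001 * (h - 21.2) ** 2
    eps = 0.3 * np.abs(r - 0.35) + 0.2 * np.abs(h - 20) / 30
    J   = 0.1 + 0.2 * np.abs(rho - 0.2)
    return np.column_stack([D, eps, J])

# 1) build the ANN surrogate from sampled "simulations"
X_train = rng.uniform(LOW, HIGH, size=(25, 3))
surrogate = MLPRegressor(hidden_layer_sizes=(30,), max_iter=5000,
                         random_state=0).fit(X_train, fem_stand_in(X_train))

def objective(pop, w=(1.0, 1.0, 1.0)):      # weighted sum, cf. Eq. (5)
    D, eps, J = surrogate.predict(pop).T
    return w[0] * D + w[1] * np.abs(eps) + w[2] * J

def fitness(f):                             # proportional scaling, cf. Eq. (6)
    return (f.max() - f) / (f.max() - f.min() + 1e-12)

# 2) simple GA over the surrogate (blend crossover + Gaussian mutation)
pop = rng.uniform(LOW, HIGH, size=(100, 3))
for gen in range(100):
    fit = fitness(objective(pop))
    parents = pop[rng.choice(len(pop), size=len(pop), p=fit / fit.sum())]
    cross = 0.5 * (parents + parents[rng.permutation(len(pop))])
    mut = rng.normal(0.0, 0.05 * (HIGH - LOW), size=pop.shape)
    pop = np.clip(np.where(rng.random((len(pop), 1)) < 0.9, cross, parents) + mut,
                  LOW, HIGH)

best = pop[np.argmin(objective(pop))]
print("approximate optimum (R, rho, h):", best)
```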
4 Optimization Experiment
1) Computing model. The designed computing model is a Bendou with a flange; the blank is rectangular with a length of 300 mm and a width of 180 mm, and the drawing height is 60 mm. Steel 45 is selected as the sheet material, the sheet thickness is 1.0 mm, and considering the asperity of the friction conditions, the lubrication friction coefficient is taken as 0.12.
Fig. 2 The model of the artificial neural network (inputs R, ρ, h; 30 hidden neurons; outputs D, ε, J)
2) Modeling and training the artificial neural network. A three-layer neural network is used: the input layer has 3 neurons, namely the micro-figure radius, the density and the height; the hidden layer has 30 neurons; and the output layer has 3 neurons, namely the deviation, the tangential strain and the thickness uniformity, as shown in Fig. 2. All neurons use the sigmoid function. The minimum error expectation of the neural network training is 0.001, the largest number of cycles is 50000, the smoothing factor is 0.1 and the momentum factor is 0.25. The samples are taken from the orthogonal finite element simulation results of the micro figures, with a total of 25: the micro-figure radius values (in mm) are 0.25, 0.294, 0.338, 0.382, 0.425; the densities are 6%, 12%, 18%, 24%, 30%; the height values are 7.2, 14.2, 21.2, 28.2, 35.2. The system error reaches 0.001 after 23546 iterations. Another 4 data sets are chosen for testing, and the errors between the numerical simulation and the trained network are shown in Table 1. As the table shows, the error is within 5%, which indicates the reliability of the network model.
3) Genetic algorithm optimization. The population size is chosen as 100; the crossover operator uses two-point crossover with a crossover probability of 0.9; the mutation operator uses Gaussian mutation with a mutation probability of 0.1; the largest number of generations is 100; and the individual replacement percentage is 25%.
Table 1 The error between the numerical simulation and the trained network
Number | Windage D | Tangential strain ε | Thickness distribution J
1      | 3.08%     | 3.2%                | 1.67%
2      | 0.57%     | 0.64%               | 0.77%
3      | 0.44%     | 0.48%               | 0.60%
4      | 0.61%     | 0.83%               | 1.02%
Fig. 3 (a), (b) The comparison of results between conventional drawing and drawing with laser texturing
A set of approximate solutions is obtained after the genetic cycles: R = 0.382 mm, ρ = 24%, h = 21.2 mm. The numerical simulation is then carried out with this group, and the error is controlled within 1.5 percent; for the overall morphology, the value of D is reduced by 11.3 percent, the value of ε by 7.3 percent and the value of J by 3.5 percent. As the results in the figure show, when the approximate solution is used in the experiment, rupture and wrinkle do not occur in the parts, so the quality of the parts is considered to meet the requirements.
5 Conclusion
(1) In this paper, the surface topography parameters of the drawing mold for a Bendou are optimized by combining numerical simulation, the artificial neural network and the genetic algorithm. The results show that, with the optimized laser texturing micro-figure radius, density and height, the parts can be drawn with a better thickness distribution and a smaller maximum strain deviation than with the conventional process.
(2) Using the powerful function approximation ability of artificial neural networks, the relationship between the mold surface laser texturing parameters (micro-figure radius, density and height) and the forming quality is obtained from a small number of numerical simulations, which not only ensures the accuracy but also reduces the number of complex numerical simulations. On the basis of the neural network model, the sheet metal forming parameters are optimized with the global search capability of the genetic algorithm. The examples show that the genetic algorithm optimization achieves good results and demonstrate the applicability of the model.
References 1. Pan, J.F., Zhong, Y.X., Yuan, C.L.: Process Parameters Optimization for Sheet Metal Forming During Drawing with a Multi-Objective Genetic Algorithm. Journal of Tsinghua University, Science and Technology 47(8), 1267–1269 (2007)
2. Samya, E., Jacques, T., Bassem, B.: Genetic Algorithms to Solve the Cover Printing Problem. Computers & Operations Research 34(11), 3346–3361 (2007) 3. Carlos, C.: Theoretical and Numerical Constraint-Handling Techniques Used with Evolutionary Algorithms: a Survey of the State of the Art. Computer methods in applied mechanics and engineering 191(11), 1245–1287 (2002) 4. Dai, H., Li, Z., Xia, J.: Processing Parameter Optimization and Experimental Study on Drawing Hole Forming Zhongguo Jixie Gongcheng. China Mechanical Engineering 17(15), 1627–1634 (2006) 5. Singh, S.K., Kumar, D.R.: Application of a Neural Network to Predict Thickness Strains and Finite Element Simulation of Hydro-Mechanical Deep Drawing. International Journal of Advanced Manufacturing Technology 25(1-2), 101–107 (2005) 6. Wang, X., Cao, L.: GA Theory and Application Software. Xi’an Jiaotong University Press (2002)
A Combined Newton Method for Second-Order Cone Programming Xiaoni Chi and Jin Peng
Abstract. Based on the Fischer-Burmeister smoothing function, a combined Newton method is proposed for solving the second-order cone programming (SOCP). The algorithm combines the techniques in both non-smoothing Newton methods and smoothing Newton methods. Without any restrictions regarding its starting points, the algorithm needs to perform at most one line search at each iteration and is shown to be globally convergent. Numerical results indicate the effectiveness of our algorithm. Keywords: Second-order cone programming, Non-smoothing Newton method, Smoothing Newton method, Global convergence.
1 Introduction
The second-order cone programming (SOCP) problem is to minimize or maximize a linear function over the intersection of an affine space with the Cartesian product of a finite number of second-order cones. Consider the following SOCP problem [1]
(P)  min \Big\{ \sum_{i=1}^{n} c_i^T x_i : \sum_{i=1}^{n} A_i x_i = b,\; x_i \in K_i,\; i = 1, \ldots, n \Big\}
and its dual problem
(D)  max \big\{ b^T y : A_i^T y + s_i = c_i,\; s_i \in K_i,\; i = 1, \ldots, n \big\}.
Xiaoni Chi · Jin Peng College of Mathematics and Information Science, Huanggang Normal University, Huangzhou 438000, China
[email protected]
Here A_i ∈ R^{m×k_i}, c_i ∈ R^{k_i}, i = 1, . . . , n, b ∈ R^m are the data, and x_i ∈ K_i, s_i ∈ K_i, i = 1, . . . , n, y ∈ R^m are the variables. The set K_i (i = 1, . . . , n) is the second-order cone (SOC) of dimension k_i, i.e., K_i := {x_i = (x_{i0}, x_{i1}) ∈ R × R^{k_i−1} : x_{i0} − ‖x_{i1}‖ ≥ 0}, where ‖·‖ denotes the Euclidean norm. It is not difficult to show that the SOC K_i (i = 1, . . . , n) is self-dual. Let k = k_1 + · · · + k_n, K = K_1 × · · · × K_n, A = (A_1, · · · , A_n) ∈ R^{m×k}, c = (c_1, · · · , c_n) ∈ R^k, x = (x_1, · · · , x_n) ∈ K, s = (s_1, · · · , s_n) ∈ K, where we use x = (x_1, · · · , x_n) for the column vector x = (x_1^T, · · · , x_n^T)^T. Thus, problems (P) and (D) can be simply written as
(P) min {c^T x : Ax = b, x ∈ K},
(1)
(D) max {bT y : AT y + s = c, s ∈ K}.
(2)
Without loss of generality we may assume that n = 1 and k = k_1 in the following analysis, because our analysis can be easily extended to the general case. Then the sets of strictly feasible solutions of (1) and (2) are
F^0(P) = {x : Ax = b, x ∈ K^0},  F^0(D) = {(y, s) : A^T y + s = c, s ∈ K^0}
respectively, where K^0 denotes the interior of the SOC K defined as K^0 := {x = (x_0, x_1) ∈ R × R^{k−1} : x_0 − ‖x_1‖ > 0}. Throughout this paper, we assume that F^0(P) × F^0(D) ≠ ∅. Under this assumption, it can be shown that both (1) and (2) have optimal solutions and their optimal values coincide [1]. Recently great attention has been paid to SOCP problems, since they have a wide range of engineering applications [2]. As novel algorithms for solving optimization problems, smoothing Newton methods [3, 4, 5] perform very well in both theory and practice. However, the available smoothing Newton methods are mostly for solving complementarity problems [3, 5] and variational inequality problems [4], whereas there is little work on smoothing Newton methods for the SOCP. Moreover, some algorithms [5] need to perform two or three line searches at each iteration. In this paper, we present a combined Newton method for the SOCP. By using the techniques in both non-smoothing Newton methods and
smoothing Newton methods, we give a novel and effective way to solve the SOCP problems. The rest of this paper is organized as follows. Section 2 presents our new algorithm. In Section 3, we analyze the convergence of the algorithm. Numerical results are given in Section 4 to illustrate the effectiveness of the algorithm. Section 5 concludes this paper.
2 Algorithm Description
It is well known that solving the SOCP problems (1) and (2) is equivalent to [1] finding (x, y, s) ∈ R^k × R^m × R^k such that
Ax = b, x ∈ K;  A^T y + s = c, s ∈ K;  x ∘ s = 0.   (3)
Here the Euclidean Jordan algebra [1, 6] for the SOC K is the algebra defined by x ∘ s = (x^T s, x_0 s_1 + s_0 x_1), ∀ x, s ∈ R^k. The element e = (1, 0, · · · , 0) ∈ R^k is the unit element of this algebra. Let φ : R^k × R^k × R → R^k denote the Fischer-Burmeister smoothing function [7]
φ(x, s, μ) = x + s − (x^2 + s^2 + 2μ^2 e)^{1/2}.
By Proposition 4.2 in [7], the pointwise limit φ(x, s) = lim_{μ→0^+} φ(x, s, μ) satisfies
φ(x, s) = 0 ⟺ x ∘ s = 0, x ∈ K, s ∈ K.   (4)
Let u := (x, y, s) ∈ R^k × R^m × R^k, z := (u, μ) ∈ R^{m+2k+1} and define
Φ(u) := \begin{pmatrix} Ax − b \\ A^T y + s − c \\ φ(x, s) \end{pmatrix},   (5)
H(z) := \begin{pmatrix} Ax − b \\ A^T y + s − c \\ φ(x, s, μ) \\ μ \end{pmatrix}.   (6)
Let ∂Φ stand for the generalized Jacobian of Φ in the sense of Clarke [8]. On account of (3), (4) and (6), z ∗ := (x∗ , y ∗ , s∗ , 0) is a root of H(z) = 0 if and only if (x∗ , y ∗ , s∗ ) is the optimal solution of (1) and (2).
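The smoothing function above can be evaluated through the spectral decomposition of the Jordan algebra. The sketch below is an illustration for a single cone; the test vectors are arbitrary.

```python
import numpy as np

def jordan_prod(x, s):
    """x o s = (x^T s, x0*s1 + s0*x1)."""
    return np.concatenate(([x @ s], x[0] * s[1:] + s[0] * x[1:]))

def jordan_sqrt(z):
    """Square root in the SOC via the spectral decomposition
    z = lam1*u1 + lam2*u2 with lam_{1,2} = z0 -/+ ||z1||."""
    nz1 = np.linalg.norm(z[1:])
    lam1, lam2 = z[0] - nz1, z[0] + nz1
    w1 = z[1:] / nz1 if nz1 > 0 else np.zeros_like(z[1:])
    u1 = 0.5 * np.concatenate(([1.0], -w1))
    u2 = 0.5 * np.concatenate(([1.0],  w1))
    return np.sqrt(lam1) * u1 + np.sqrt(lam2) * u2

def phi_fb(x, s, mu):
    """phi(x, s, mu) = x + s - (x o x + s o s + 2*mu^2*e)^{1/2}, cf. the FB function."""
    e = np.zeros_like(x); e[0] = 1.0
    w = jordan_prod(x, x) + jordan_prod(s, s) + 2.0 * mu ** 2 * e
    return x + s - jordan_sqrt(w)

x = np.array([2.0, 1.0, 0.5])
s = np.array([1.5, -0.3, 0.2])
print(phi_fb(x, s, mu=0.1))
```

Note that for μ > 0 the argument of the square root always lies in the interior of the cone, so the spectral square root is well defined.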
Algorithm 1
Step 0. Choose σ, δ, η_1, η_2 ∈ (0, 1), μ_0 ∈ (0, ∞) and let z̄ = (0, μ_0) ∈ R^{m+2k+1}. Let u_0 := (x_0, y_0, s_0) ∈ R^k × R^m × R^k be an arbitrary point and z_0 := (u_0, μ_0). Choose γ ∈ (0, 1) such that γμ_0 < 1. Let β_0 := ‖Φ(u_0)‖. Set k := 0.
Step 1. If H(z_k) = 0, stop. Otherwise, let ρ(z_k) = γ min{1, ‖H(z_k)‖^2}.
Step 2. (Non-smooth Newton Step) Choose V_k ∈ ∂Φ(u_k). If V_k is singular, go to Step 3. Otherwise, compute Δû_k ∈ R^{m+2k} from
V_k Δû_k = −Φ(u_k).   (7)
If Φ(u_k + Δû_k) = 0, stop. If the following two conditions are satisfied:
‖Φ(u_k + Δû_k)‖ ≤ η_1 β_k,   (8)
‖H(u_k + Δû_k, μ_k)‖ ≤ η_2 ‖H(z_k)‖,   (9)
set u_{k+1} := u_k + Δû_k, μ_{k+1} := μ_k, β_{k+1} := ‖Φ(u_k + Δû_k)‖, k := k + 1, and go to Step 1. Otherwise, go to Step 3.
Step 3. (Smoothing Newton Step) Compute a solution Δz_k := (Δu_k, Δμ_k) ∈ R^{m+2k+1} of
H(z_k) + H'(z_k)Δz_k = ρ(z_k) z̄.   (10)
Let λ_k = max{δ^l | l = 0, 1, 2, . . .} such that
‖H(z_k + λ_k Δz_k)‖ ≤ [1 − σ(1 − γμ_0)λ_k] ‖H(z_k)‖,   (11)
and set z_{k+1} := z_k + λ_k Δz_k, β_{k+1} := ‖Φ(u_k + λ_k Δu_k)‖, k := k + 1. Go to Step 1.
By following the proof of Proposition 6.2 in [7] and using Corollary 5.4 in [7], we can obtain the following properties of the function H(z) defined as in (6) to establish the well-definedness of Algorithm 1.
Theorem 1. (i) The function H(z) is globally Lipschitz continuous in R^{m+2k+1}. If μ > 0, H(z) is continuously differentiable with its Jacobian
H'(z) = \begin{pmatrix} A & 0 & 0 & 0 \\ 0 & A^T & I & 0 \\ M(z) & 0 & N(z) & P(z) \\ 0 & 0 & 0 & 1 \end{pmatrix},
where M(z) = I − L_w^{-1} L_x, N(z) = I − L_w^{-1} L_s, P(z) = −2μ L_w^{-1} e,
w := (x^2 + s^2 + 2μ^2 e)^{1/2},  L_x := \begin{pmatrix} x_0 & x_1^T \\ x_1 & x_0 I \end{pmatrix}.
(ii) If A has full row rank, H (z) is nonsingular for any μ > 0. By Theorem 1(i), is is not difficult to show that if A has full row rank Algorithm 1 is well-defined.
3 Convergence Analysis Lemma 1. Suppose that A has full row rank and that {zk = (uk , μk )} is the iteration sequence generated by Algorithm 1. If Φ(uk ) = 0 for all k ≥ 0, we have that {zk } is an infinite iteration sequence, {μk } is monotonically decreasing and zk ∈ Ω for any k ≥ 0, where " Ω = z = (x, y, s, μ) ∈ Rm+2k+1 : μ ≥ ρ(z)μ0 . Proof. It is not difficult to show that if Φ(uk ) = 0 for all k ≥ 0, {zk } is an infinite iteration sequence with μk > 0. Next we prove zk ∈ Ω for any k ≥ 0 by induction on k. It is obvious that z0 ∈ Ω, because ρ(z0 ) ≤ γ < 1. Suppose that zk ∈ Ω. On account of Algorithm 1, the iteration sequence {zk } includes {z;k } generated by Step 2 and {zk } generated by Step 3. If the sequence {zk } is infinite, we prove zk+1 ∈ Ω and {μk } is monotonically decreasing by following the proof of Proposition 8 in [4] and using the relation (11). If the sequence {z;k } is infinite, we prove zk+1 ∈ Ω by considering the following three cases. Case (i) If H(zk ) > 1 and H(zk+1 ) > 1, we have ρ(zk+1 )μ0 = γμ0 = ρ(zk )μ0 ≤ μk = μk+1 . Case (ii) If H(zk ) > 1 and H(zk+1 ) ≤ 1, we obtain ρ(zk+1 )μ0 = γμ0 H(zk+1 )2 ≤ γμ0 = ρ(zk )μ0 ≤ μk = μk+1 . Case (iii) If H(zk ) ≤ 1, it follows from (9) that ρ(zk+1 )μ0 = γμ0 H(zk+1 )2 ≤ γμ0 H(zk )2 = ρ(zk )μ0 ≤ μk = μk+1 . Hence we have that zk ∈ Ω for any k ≥ 0 and {μk } is monotonically decreasing. ! Theorem 2. Suppose that A has full row rank and that z ∗ = (u∗ , μ∗ ) is an accumulation point of the iteration sequence {zk = (uk , μk )} generated by Algorithm 1. If Φ(uk ) = 0 for all k ≥ 0, at least one of Φ(u∗ ) = 0 and H(z ∗ ) = 0 holds. Proof. Since Φ(uk ) = 0 for all k ≥ 0, we have H(zk ) = 0 for all k ≥ 0. Thus {zk } is an infinite iteration sequence, which includes {z;k } generated by
610
X. Chi and J. Peng
Step 2 and {zk } generated by Step 3. We consider the following two cases separately. Case (i) If the sequence {zk } is finite, the sequence {z;k } is infinite. Then by Step 2 there exists a sufficiently large k0 such that zk = z;k for any k > k0 . Therefore, we have from (8) that Φ(uk ) ≤ η1k−k0 βk0 for any k > k0 , which implies Φ(u∗ ) = 0. Case (ii) If the sequence {zk } is infinite, by taking a subsequence if necessary, suppose that {zk } converges to z ∗ = (u∗ , μ∗ ) as k → ∞. In view of (9), (11) and Lemma 1 that the two sequences {H(zk )} and {μk } are monotonically decreasing. Then {H(zk )} and {μk } converge to H(z ∗ ) and μ∗ respectively as k → ∞. Furthermore, {H(zk )}, {μk } and {ρk } converge to H(z ∗ ), μ∗ and ρ∗ := γ min{1, H(z ∗)2 } respectively. If H(z ∗ ) = 0, we obtain the desired result. On the contrary, suppose that H(z ∗ ) > 0. It follows from Lemma 1 that 0 < ρ∗ μ0 ≤ μ∗ . Thus by (11) and Theorem 1, we have lim λk = 0, lim H(zk ) = H(z ∗ ), lim H (zk ) = H (z ∗ ).
k→∞
k→∞
k→∞
On the one hand, the Armijo condition (11) is not satisfied for λk := λk /δ by Step 3 in Algorithm 1, i.e., H(zk + λk Δzk ) − H(zk ) > −σ(1 − γμ0 )H(zk ). λk Taking the limit k → ∞ in the above inequality and using λk → 0, we obtain H(z ∗ )T H (z ∗ )Δz ∗ ≥ −σ(1 − γμ0 )H(z ∗ )2 .
(12)
On the other hand, we have from (10) H(z ∗ )T H (z ∗ )Δz ∗ = −H(z ∗ )2 + ρ∗ H(z ∗ )T z ≤ −H(z ∗)2 + γμ0 H(z ∗)2 . Substituting the last relation into (12) yields (1 − σ)(1 − γμ0 ) ≤ 0, which contradicts the fact σ ∈ (0, 1), γμ0 < 1. Hence we obtain H(z ∗ ) = 0. !
4 Numerical Results We implemented Algorithm 1 in MATLAB 7.0.1 to see the performance of the combined Newton method for the SOCP. All the experiments were performed on an Intel(R) Pentium(R) 4 CPU 3.00 GHz and 512 MB memory desktop computer under Windows XP.
A Combined Newton Method for Second-Order Cone Programming
611
Table 1 Performances of Algorithm 1 on SOCPs

m    k    iter  cpu(s)
10   20   3     0.031
25   50   3     0.032
50   100  3     0.125
75   150  3     0.234
100  200  4     0.765
125  250  3     1.122
We use randomly generated problems with size n = 1 and k (= 2m) from 20 to 250. Let x_0 = 0 ∈ R^k, y_0 = 0 ∈ R^m, s_0 = c be the initial points. The parameters used in Algorithm 1 were as follows: σ = 0.25, δ = 0.95, μ_0 = 0.75, η_1 = 0.8, η_2 = 0.99 and γ = 0.99. We used ‖H(z)‖ ≤ 10^{-6} as the stopping criterion. Table 1 shows the effectiveness of Algorithm 1.
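Strictly feasible random test instances of (1)-(2) can be generated as sketched below; this construction (pick interior points, then set b and c from them) is a common recipe and an assumption about how such instances may be built, not the paper's generator.

```python
import numpy as np

def interior_point(k, rng):
    """A point in the interior of the SOC: x0 - ||x1|| = 1 > 0."""
    v = rng.normal(size=k)
    v[0] = np.linalg.norm(v[1:]) + 1.0
    return v

def random_socp(m, k, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(m, k))
    x0 = interior_point(k, rng)     # strictly feasible primal point
    s0 = interior_point(k, rng)     # strictly feasible dual slack
    y0 = rng.normal(size=m)
    return A, A @ x0, A.T @ y0 + s0  # A, b, c

A, b, c = random_socp(m=10, k=20)
print(A.shape, b.shape, c.shape)
```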
5 Conclusions By using the techniques in both non-smoothing Newton methods and smoothing Newton methods, we propose a combined Newton method for the SOCP in this paper. Based on the Fischer-Burmeister smoothing function, our algorithm is shown to possess the following good properties: (i)our algorithm needs to perform at most one line search at each iteration; (ii) the algorithm does not have any restrictions regarding its starting points; (iii)our algorithm is shown to be globally convergent. Acknowledgements. This work is supported by the National Natural Science Foundation (Grant No. 70671050), the Major Research Program (Grant No. Z20082701) and the Group Innovation Project of Hubei Provincial Department of Education, the Excellent Youth Project of Hubei Provincial Department of Education, and the Doctorial Foundation of Huanggang Normal University (No. 08cd158), China.
References 1. Alizadeh, F., Goldfarb, D.: Second-order cone programming. Math. Program. 95, 3–51 (2003) 2. Lobo, M.S., Vandenberghe, L., Boyd, S., Lebret, H.: Applications of second-order cone programming. Linear Algebra Appl. 284, 193–228 (1998)
612
X. Chi and J. Peng
3. Chen, B., Xiu, N.: A global linear and local quadratic non-interior continuation method for nonlinear complementarity problems based on Chen-Mangasarian smoothing functions. SIAM J. Optim. 9, 605–623 (1999) 4. Qi, L., Sun, D., Zhou, G.: A new look at smoothing Newton methods for nonlinear complementarity problems and box constrained variational inequalities. Math. Program. 87, 1–35 (2000) 5. Tseng, P.: Analysis of a non-interior continuation method based on ChenMangasarian smoothing functions for complementarity problems. In: Reformulation –Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods, pp. 381–404. Kluwer Academic Publishers, Boston (1999) 6. Faraut, J., Kor´ anyi, A.: Analysis on Symmetric Cones. Oxford University Press, Oxford (1994) 7. Fukushima, M., Luo, Z.Q., Tseng, P.: Smoothing functions for second-order-cone complementarity problems. SIAM J. Optim. 12, 436–460 (2001) 8. Clarke, F.H.: Optimization and nonsmooth analysis. John Wiley and Sons, New York (1983)
MES Scheduling Optimization and Simulation Based on CAPP/PPC Integration Yan Cao, Ning Liu, Lina Yang, and Yanli Yang*
Abstract. Confronted with the complexity of the manufacturing environment, the manufacturing system and the manufacturing process, how to tame this complexity is an important problem for realizing production scheduling and control effectively. Among the modules involved in production activities, CAPP and PPC are the two main ones. Firstly, the existing approaches to CAPP/PPC integration are analyzed. Then, a conceptual model for concurrent distributed integration of CAPP and PPC is presented. The model can achieve the integration of CAPP and PPC and solve production planning and control in a virtual enterprise. According to the conceptual model of the functional integration of CAPP/PPC, a method to realize the functional integration of CAPP/PPC is discussed. Then, according to the requirements of single-piece and small-batch production, MES scheduling optimization can be realized based on CAPP/PPC integration. The structure of the scheduling sub-system of MES is also presented, which consists of three core modules, namely the operation planning, operation scheduling and material tracing modules. Finally, based on Witness, a simulation model of a specific workshop is established. Through the comparison of scheduling schemes, the optimal MES scheduling scheme is obtained.
Keywords: CAPP, PPC, MES, Scheduling, Integration, Simulation.
Yan Cao
Advanced Manufacturing Engineering Institute, School of Mechatronic Engineering, Xi'an Technological University, Xi'an 710032, China
[email protected]
Ning Liu
China First Heavy Industries, Fularji 161042, China
Lina Yang
Xi'an University of Science and Technology, Xi'an 710054, China
Yanli Yang
Shenzhen University, Shenzhen 518060, China
1 Introduction
Confronted with the complexity of the manufacturing environment, the manufacturing system and the manufacturing process, how to tame this complexity is an important problem for realizing workshop production scheduling and control effectively. Production scheduling and decision-making form a multi-objective optimization problem, where local decision-making and optimization exist at each phase of production scheduling and control [1, 2]. The gap between CAPP and PPC often leads to stochastic resource bottlenecks, non-availability of tools and personnel, breakdown of machine tools, and so on. The reasons for these phenomena are as follows:
• Process planners assume unlimited and idle resources in a workshop.
• CAPP and PPC have conflicting objectives. CAPP emphasizes the technological requirements of products, while PPC mainly considers the products' time to market.
• Since the dynamic production environment is not considered, an objective time delay between production planning and its execution phase is inevitable.
• CAPP may not consider the dynamic nature of workshop production scheduling and control. Hence, it cannot predict the load status of the resources in a workshop.
In recent years, with the development of information technology and the demand for agile production, the Manufacturing Execution System (MES), a management information system at the workshop layer, has become a bridge that spans and fills the gap between the upper layer of plan management and the lower layer of industrial control in an enterprise [3, 4, 5, 6]. Workshop scheduling is the core module of MES, which directly affects the efficiency, operation and management of workshop production. Effective scheduling methods can improve benefit and reduce cost [7, 8, 9, 10]. Moreover, MES scheduling performance can be greatly improved through CAPP/PPC integration.
2 CAPP/PPC Integration Approaches The integration of CAPP and workshop scheduling arose in the middle of the 1980s. After about ten years, the ideas of the integration of CAPP and PPC were suggested. So far, several approaches for integration of CAPP and PPC have been discussed in terms of nonlinear process planning, flexible process planning, closed loop process planning, dynamic process planning, alternative process planning, as well as just-in-time process planning. According to the review and analysis of references, there are mainly three types of general integration approaches.
• Interfaced integration.
• Modular integration.
• Unified integration.
3 A Model of Concurrent Distributed CAPP/PPC Integration
Research on the integration of CAPP and PPC must be carried out from the point of view of enterprise integration. In this paper, a model of concurrent distributed CAPP/PPC integration is constructed. The structure is illustrated in Fig. 1, where FU represents a functional unit.
Fig. 1 A model of concurrent distributed integration (CAPP, PPS, and functional units of alternative routes, space and time)
4 A Hierarchical Distributed Resource Model of a Virtual Enterprise A hierarchical distributed resource model of a virtual enterprise is shown in Fig. 2.
5 Realization Method of CAPP/PPC Integration The flow chart of the functional integration of CAPP/PPC is shown in Fig. 3. As capacity unbalance leads to long lead-time, high levels of work-in-process (WIP), as well as rising cost, it affects the synthetic objective to minimize cost and lead time.
Fig. 2 A hierarchical resource distribution model of a virtual enterprise
Fig. 3 CAPP/PPC integration flow chart
6 MES Scheduling Optimization Based on CAPP/PPC Integration
According to the requirements of single-piece and small-batch production, MES scheduling optimization can be realized based on CAPP/PPC integration. Consequently, the information from MES can reflect production changes in time, and production planning can respond to the changes rapidly. The data flow among production planning, MES and production control is shown in Fig. 4.
Fig. 4 The data flow among production planning, MES, and production control (production planning → MES: product demand, BOM, order sheet, ...; MES → production planning: task status, system status, ...; MES → production control: production plan, analysis report, inventory, ...; production control → MES: production progress, equipment status, ...)
7 MES Scheduling Sub-system
The scheduling sub-system of MES consists of three core modules, namely the operation planning, operation scheduling and material tracing modules, as shown in Fig. 5. Operation scheduling adopts optimization algorithms such as intelligent algorithms, heuristic algorithms, and so on.
8 Workshop Simulation for MES Scheduling Optimization Based on Witness
In this paper, workshop operation simulation is adopted to seek the optimal MES scheduling scheme. Based on Witness, a simulation model of the workshop operations is established, and through the comparison of scheduling schemes the workshop operations are optimized. In the workshop there are five groups of machines, and three products are to be machined; the products need four, three and five working procedures respectively. The initial simulation model is shown in Fig. 6, and its statistical information is shown in Tables 1-3.
Table 1 Product statistic information
Name | No. Entered | No. Shipped | W.I.P. | Avg W.I.P. | Avg Time
A    | 3466        | 3417        | 49     | 22.03      | 1113.59
B    | 5940        | 5889        | 51     | 47.57      | 1403.09
C    | 2292        | 2241        | 51     | 27.86      | 2129.43
Fig. 5 The functional modules of MES scheduling sub-system
Fig. 6 The initial simulation model
Table 2 Machine statistic information
Name     | % Idle | % Busy | No. of Operations
Machine1 | 3.98   | 96.02  | 11616
Machine2 | 4.94   | 95.06  | 5682
Machine3 | 27.58  | 72.42  | 11595
Machine4 | 2.08   | 97.92  | 8144
Machine5 | 21.22  | 78.78  | 5672
Table 3 Buffer statistic information
Name     | Total In | Total Out | Now In | Max | Min | Avg Size | Avg Time
Buffers1 | 11620    | 11619     | 1      | 73  | 0   | 18.65    | 281.15
Buffers2 | 5757     | 5684      | 73     | 91  | 0   | 17.46    | 531.47
Buffers3 | 11607    | 11599     | 8      | 30  | 0   | 1.39     | 20.91
Buffers4 | 8194     | 8147      | 47     | 142 | 0   | 44.25    | 946.17
Buffers5 | 5682     | 5673      | 9      | 35  | 0   | 4.31     | 132.78
Table 4 Product statistic information after optimization
Name | No. Entered | No. Shipped | W.I.P. | Avg W.I.P. | Avg Time
A    | 3409        | 3403        | 6      | 4.90       | 251.66
B    | 5813        | 5812        | 1      | 6.77       | 204.17
C    | 2286        | 2277        | 9      | 4.91       | 376.28
The system model after optimization is shown in Fig. 7. The statistic information after model optimization is shown in Table 4-6.
Fig. 7 The system model after optimization
Table 5 Machine statistic information after optimization
Name     | % Idle | % Busy | No. of Operations
Machine1 | 29.03  | 70.97  | 11497
Machine2 | 36.04  | 63.96  | 5686
Machine3 | 28.37  | 71.63  | 11496
Machine4 | 27.37  | 72.63  | 8091
Machine5 | 22.80  | 77.20  | 5684
Table 6 Buffer statistic information after optimization
Name    | Total In | Total Out | Now In | Max | Min | Avg Size | Avg Time
Buffer1 | 11501    | 11501     | 0      | 14  | 0   | 0.75     | 11.37
Buffer2 | 5693     | 5689      | 4      | 15  | 0   | 0.74     | 22.90
Buffer3 | 11499    | 11499     | 0      | 15  | 0   | 0.84     | 12.82
Buffer4 | 8091     | 8091      | 0      | 18  | 0   | 1.00     | 21.64
Buffer5 | 5686     | 5685      | 1      | 21  | 0   | 1.95     | 60.08
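Witness models are built interactively, but the workshop flow simulated in this section can also be sketched as a discrete-event model for illustration. The routings, processing times and arrival rates below are hypothetical assumptions, not the Witness model's data; only the structure (five machine groups, three products with four, three and five operations) follows the paper.

```python
import simpy
import random

random.seed(0)
ROUTES = {"A": [0, 2, 3, 4], "B": [1, 0, 3], "C": [0, 1, 2, 3, 4]}  # 4 / 3 / 5 operations
PROC_T = {"A": 4.0, "B": 2.5, "C": 5.0}                             # mean processing times
shipped = {"A": 0, "B": 0, "C": 0}

def job(env, name, machines):
    for m in ROUTES[name]:
        with machines[m].request() as req:          # queue in front of machine group m
            yield req
            yield env.timeout(random.expovariate(1.0 / PROC_T[name]))
    shipped[name] += 1

def generator(env, name, machines, interarrival):
    while True:
        yield env.timeout(random.expovariate(1.0 / interarrival))
        env.process(job(env, name, machines))

env = simpy.Environment()
machines = [simpy.Resource(env, capacity=1) for _ in range(5)]
for name, ia in [("A", 10.0), ("B", 6.0), ("C", 15.0)]:
    env.process(generator(env, name, machines, ia))
env.run(until=10000)
print("shipped:", shipped)
```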
9 Conclusions
According to the requirements of single-piece and small-batch production, a model for CAPP/PPC integration is presented, based on which MES scheduling performance can be optimized. Although the emphasis is focused on the application to a virtual enterprise, the approach can also be applied to other manufacturing enterprises. A scheduling sub-system of MES is presented that consists of three core modules, namely the operation planning, operation scheduling and material
tracing module. Based on Witness, a simulation model for a specific workshop is established. Through comparison of scheduling schemes, workshop operations are optimized. The future researches are mainly focused on intelligent scheduling algorithms to deal with the complexity and incidents in practical production more effectively. What is more, the functions of the scheduling sub-system of MES need to be perfected and extended. Acknowledgments. The paper is supported by Shaanxi Major Subject Construction Project and President Fund of Xi’an Technological University.
References 1. Zhang, X.D., Yan, H.S.: Integrated Optimization of Production Planning and Scheduling for Multi-Stage Workshop. Chinese Journal of Mechanical Engineering 41, 98– 105 (2005) 2. He, Y., Liu, F., Shi, J.L.: A Framework of Scheduling Models in Machining Workshop for Green Manufacturing. Journal of Advanced Manufacturing Systems 7, 319–322 (2008) 3. Wei, Y., Li, D.B., Yuan, M.H.: Research on Manufacturing Execution System for Agile Assembling Resource Reconfiguration Based on Agent. Machine Tool & Hydraulics 36, 44–47, 52 (2008) 4. Wang, Q.F., Liu, F., Huang, H.L.: Service-Oriented Reconfigurable Manufacturing Execution System for Discrete Workshop. Computer Integrated Manufacturing Systems 14, 737–743 (2008) 5. Pan, Y., Zhang, W.X.: Research on Multi-Agent-Based MES Structure of Discrete Manufacturing Industry. Application Research of Computers 26, 244–246, 249 (2009) 6. Cheng, Z.L., Fan, Y.Q.: Design of component-based flexible MES in metallurgical industry. Computer Integrated Manufacturing Systems 13, 490–496 (2007) 7. Zhou, W.K., Zhu, J.Y.: Application of Workflow Technology for Workshop Scheduling. Journal of Harbin Institute of Technology (New Series) 12, 105–110 (2005) 8. Chu, H.Y., Cao, Q.J., Fei, R.Y.: Production Scheduling Research for Workshop Based on Manufacturing Cell. Journal of Beijing University of Technology 32, 730–736 (2006) 9. Liu, M., Yan, J.W.: An Adaptively Annealing Genetic Algorithm Based Scheduling Method of Workshop Daily Operating Planning. Chinese Journal of Computers 30, 1164–1172 (2007) 10. Yu, X.Y., Sun, S.D., Chu, W.: Parallel Cooperative Evolutionary Genetic Algorithm for Multi-Workshop Planning and Scheduling Problems. Computer Integrated Manufacturing Systems 14, 991–1000 (2008)
An Improved Diversity Guided Particle Swarm Optimization Dongsheng Xu and Xiaoyan Ai*
Abstract. Particle swarm optimization (PSO) is a new population based stochastic search algorithm, which has shown good performance on well-known numerical test problems. However, on strongly multimodal test problems the PSO easily suffers from premature convergence. In this paper, an improved diversity guided PSO is proposed, namely IARPSO, which combines a diversity guided PSO (ARPSO) and a Cauchy mutation operator. The purpose of IARPSO is to enhance the global search ability of ARPSO by Conducting a Cauchy mutation on the global best particle. Experimental results on 6 multimodal functions with many local minima show that the IARPSO outperforms the standard PSO, ARPSO and ATRE-PSO on all test functions. Keywords: Particle swarm optimization (PSO), Diversity, Cauchy mutation, Function optimization.
1 Introduction
Particle swarm optimization is a population based stochastic search technique first introduced by Kennedy and Eberhart in 1995 [1]. Since then it has been used to solve many optimization problems. Its performance has been compared with stochastic search algorithms such as simulated annealing (SA) and the genetic algorithm (GA) [2]. Although PSO has shown good performance in many optimization problems, it easily gets trapped in local optima in multimodal optimization problems. A major problem with PSO in multimodal optimization is premature convergence, which results in great performance loss and sub-optimal solutions. In order to overcome this drawback of PSO, many techniques have been developed to improve its performance, such as ARPSO [3], ATRE-PSO [4], etc. Riget [3] has proposed a diversity-guided PSO, called ARPSO, which uses a diversity measure to control the search of the swarm. The ARPSO defines an attraction phase and a repulsion phase. The former exists in the standard PSO algorithm, and the latter modifies the velocity updating formula of the standard PSO.
Dongsheng Xu · Xiaoyan Ai
Department of Information Technology, Yulin University, Yulin 719000, China
[email protected],
[email protected]
In the attraction phase the swarm is contracting, and consequently the diversity decreases. When the diversity drops below a predefined constant number dlow, the repulsion mechanism is applied for expanding the swarm. Finally, when the diversity reaches another predefined constant number dhigh, the ARPSO switches back to the attraction phase. The experimental results have shown that the ARPSO prevents premature convergence to a high degree. On the basis of ARPSO, Pant [4] has presented ATRE-PSO, in which a middle phase between attraction and repulsion is introduced. In ARPSO, if the diversity is above dhigh the particles attract each other, and if it is below dlow the particles repel each other until they meet the required diversity dhigh. When the diversity lies between dlow and dhigh, the ATRE-PSO employs a middle phase called the phase of positive conflict. In this phase there is neither a complete attraction nor a complete repulsion: each particle in the swarm is attracted by its own previous best particle and is repelled by the global best particle. In this way the diversity may be kept at a balanced level, because there is neither total attraction nor total repulsion but a middle phase between the two. The simulation results have shown that the middle phase could improve the exploring and exploiting abilities of ARPSO. In this paper, an improved ARPSO is proposed, namely IARPSO, which combines the ARPSO and a Cauchy mutation operator conducted on the global best particle. The Cauchy mutation is more likely to generate a larger jump than Gaussian mutation, which could help the global best particle jump out of local optima. Experimental results on 6 well-known multimodal functions with many local minima show that the IARPSO outperforms the standard PSO, ARPSO and ATRE-PSO. The rest of the paper is organized as follows. In Section 2, the standard PSO, ARPSO, ATRE-PSO, and our proposed algorithm IARPSO are introduced. In Section 3, the benchmark functions, parameter settings, experimental results and discussions are presented. Conclusions as well as further works are given in Section 4.
2 An Improved Diversity Guided PSO (IARPSO)

2.1 The Standard PSO

Like other evolutionary algorithms, PSO is also a population-based search algorithm and starts with an initial population of randomly generated solutions called particles [5]. Each particle in PSO has a velocity and a position. PSO remembers both the best position found by all particles and the best positions found by each particle in the search process. For a search problem in an n-dimensional space, a particle represents a potential solution. The velocity v_{ij} and position x_{ij} of the jth dimension of the ith particle are updated according to Eqs. (1) and (2):

v_{ij}(t+1) = w \cdot v_{ij}(t) + c_1 \cdot rand1_{ij} \cdot (pbest_{ij}(t) - x_{ij}(t)) + c_2 \cdot rand2_{ij} \cdot (gbest_j(t) - x_{ij}(t))   (1)

x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1)   (2)
where i = 1,2,…, is the particle’s index, Xi = (xi1, xi2,…,xin) is the position of the ith particle; Vi = (vi1, vi2,…, vin) represents velocity of particle i. pbesti = (pbesti1, pbesti2,…, pbestin) is the best previous position yielding the best fitness value for the ith particle; and gbest = (gbest1, gbest2,…., gbestn) is the global best particle found by all particles so far. The inertia factor w was proposed by Shi and Eberhart [6], rand1ij and rand2ij are two random numbers independently generated within the range of [0,1], c1 and c2 are two learning factors which control the influence of the social and cognitive components, and t = 1,2,…, indicates the iterations.
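A minimal sketch of the update in Eqs. (1)-(2) is given below, assuming NumPy arrays for the swarm; the coefficient values are placeholders taken from the parameter settings reported later in the paper, and this is an illustration rather than the authors' implementation.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.72984, c1=1.49618, c2=1.49618):
    """One standard PSO update following Eqs. (1)-(2).
    x, v, pbest have shape (swarm_size, n); gbest has shape (n,)."""
    r1 = np.random.rand(*x.shape)   # rand1_ij drawn uniformly from [0, 1]
    r2 = np.random.rand(*x.shape)   # rand2_ij drawn uniformly from [0, 1]
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # Eq. (1)
    x_new = x + v_new                                               # Eq. (2)
    return x_new, v_new
```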
2.2 The ARPSO

Riget [3] has presented a diversity-guided PSO (ARPSO) which defines a repulsion phase based on a modified velocity updating model. The basic PSO algorithm only employs an attraction phase which attracts each particle. When the diversity in the swarm drops below a predefined constant dlow, ARPSO switches to the repulsion phase, in which particles repel each other due to the modified velocity updating formula, and then the diversity increases. When the diversity reaches another predefined constant dhigh, ARPSO switches back to the attraction phase. The switching search model is defined as [3]:

v_{ij}(t+1) = w \cdot v_{ij}(t) + dir \cdot [c_1 \cdot rand1_{ij} \cdot (pbest_{ij}(t) - x_{ij}(t)) + c_2 \cdot rand2_{ij} \cdot (gbest_j(t) - x_{ij}(t))]   (3)

where the sign dir satisfies

if (dir > 0 && diversity < dlow)  then dir = -1;
if (dir < 0 && diversity > dhigh) then dir = 1;

dir is set to 1 at the beginning, and the diversity is computed as:

diversity(t) = \frac{1}{|S| \cdot |L|} \sum_{i=1}^{|S|} \sqrt{\sum_{j=1}^{n} (x_{ij}(t) - \bar{x}_j(t))^2}   (4)

\bar{x}_j(t) = \frac{1}{|S|} \sum_{i=1}^{|S|} x_{ij}(t)   (5)
where |S| is the swarm size, |L| is the length of the longest diagonal in the search space, n is the dimension, t = 1, 2, … indicates the iterations, x_{ij} is the jth value of the ith particle, and \bar{x}_j is the jth value of the average point of all particles in the swarm.
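The diversity measure of Eqs. (4)-(5) and the attraction/repulsion switch used in Eq. (3) can be sketched as follows; this is a NumPy illustration with the dlow/dhigh thresholds quoted in Section 3.2, not the authors' code.

```python
import numpy as np

def swarm_diversity(x, diag_len):
    """Eq. (4): average distance of the particles to the mean point (Eq. (5)),
    normalised by the swarm size |S| and the longest diagonal |L| of the search space."""
    mean_point = x.mean(axis=0)
    dists = np.sqrt(((x - mean_point) ** 2).sum(axis=1))
    return dists.sum() / (x.shape[0] * diag_len)

def switch_direction(dir_, diversity, d_low=5.0e-6, d_high=0.25):
    """Sign dir of Eq. (3): +1 means attraction, -1 means repulsion."""
    if dir_ > 0 and diversity < d_low:
        return -1
    if dir_ < 0 and diversity > d_high:
        return 1
    return dir_
```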
2.3 The ATRE-PSO

In ARPSO [4], the particles alternate between the attraction and repulsion phases when the diversity is above dhigh or below dlow. However, diversity < dlow and diversity > dhigh may
not be the only two possibilities for deciding the search model of the swarm; many times the diversity may lie between dlow and dhigh. For this reason, Pant proposes an improved ARPSO algorithm, namely ATRE-PSO, which introduces into ARPSO a middle phase called the phase of positive conflict. In the middle phase there is neither complete attraction nor complete repulsion: each particle in the swarm is attracted by its own previous best position and repelled by the global best particle. The middle phase is defined as [4]:

v_{ij}(t+1) = w \cdot v_{ij}(t) + c_1 \cdot rand1_{ij} \cdot (pbest_{ij}(t) - x_{ij}(t)) - c_2 \cdot rand2_{ij} \cdot (gbest_j(t) - x_{ij}(t))   (6)
2.4 Our Proposed Method IARPSO

Although ARPSO enhances the diversity of the swarm and prevents premature convergence to a certain extent, it still suffers from local optima on some multimodal functions. In this paper, an improved ARPSO is proposed, called IARPSO, which employs a Cauchy mutation operator. The hope is that the long jumps generated by the Cauchy mutation of the global best particle can improve the global search ability of ARPSO. The motion of the particles is dominated by their previous best positions and the global best particle, so changes of the global best particle will influence the movement of the particles. In IARPSO, a random number generated by a Cauchy mutation operator is added to the global best particle in every generation. Such slight changes of the global best particle extend its search space and, thanks to the long flat tails of the Cauchy distribution, may help the particles jump to a better position and finally escape from local optima. Some studies [7-8] have explained why the Cauchy mutation performs better than the traditional Gaussian mutation for most function optimization problems: the Cauchy mutation is more likely to generate larger jumps than the Gaussian mutation and is thus more likely to succeed. The Cauchy mutation operator is defined by [8]:

gbest(t+1) = gbest(t) + cauchy()   (7)
where cauchy() is a random number generated by a Cauchy distribution with a scale parameter t =1. The main steps of IARPSO are given in Table 1.
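A sketch of the mutation in Eq. (7) together with the acceptance rule listed in Table 1 (the mutated gbest is kept only if it improves the fitness); NumPy's standard Cauchy generator is used here as an assumed implementation detail.

```python
import numpy as np

def mutate_gbest(gbest, fitness, f):
    """Eq. (7): add a standard Cauchy random number (scale parameter 1) to every
    dimension of the global best particle; keep the mutant only if it is better."""
    candidate = gbest + np.random.standard_cauchy(size=gbest.shape)
    candidate_fitness = f(candidate)
    if candidate_fitness < fitness:   # minimisation, as for all test functions here
        return candidate, candidate_fitness
    return gbest, fitness
```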
3 Experimental Studies on IARPSO

3.1 Test Functions

The 6 well-known multimodal functions with many local minima used in [7] have been chosen for our experimental studies. All the functions used in this paper are to be minimized. The descriptions of the benchmark functions and their global optima are listed in Table 2.
Table 1 The main steps of IARPSO
Begin
  PopSize = population size; P = current population;
  eval = the number of evaluations; MaxEvaluation = the maximum number of evaluations;
  diversity = the swarm diversity;
  while (eval < MaxEvaluation)
    Select a search model according to equation (3);
    for i = 1 to PopSize
      Calculate the velocity of particle Pi according to the selected search model;
      Update the position of particle Pi according to equation (2);
      Calculate the fitness value of particle Pi;
    end for
    Mutate gbest according to equation (7);
    if gbest(t+1) is better than gbest(t)
      Update gbest with gbest(t+1);
    end if
    Update pbest, gbest in P if needed;
    Calculate the swarm diversity according to equation (4);
    eval++;
  end while
End
Table 2 The 6 multimodal test functions used in our experimental studies, where n is the dimension of the functions, fmin is the minimum value of the function, and X ⊆ R^n is the search space

f1(x) = \sum_{i=1}^{n} -x_i sin(\sqrt{|x_i|});  n = 30, X = [-500, 500]^n, fmin = -12569.5

f2(x) = \sum_{i=1}^{n} [x_i^2 - 10 cos(2π x_i) + 10];  n = 30, X = [-5.12, 5.12]^n, fmin = 0

f3(x) = -20 exp(-0.2 \sqrt{(1/n) \sum_{i=1}^{n} x_i^2}) - exp((1/n) \sum_{i=1}^{n} cos(2π x_i)) + 20 + e;  n = 30, X = [-32, 32]^n, fmin = 0

f4(x) = (1/4000) \sum_{i=1}^{n} x_i^2 - \prod_{i=1}^{n} cos(x_i / \sqrt{i}) + 1;  n = 30, X = [-600, 600]^n, fmin = 0

f5(x) = (π/n) {10 sin^2(π y_1) + \sum_{i=1}^{n-1} (y_i - 1)^2 [1 + 10 sin^2(π y_{i+1})] + (y_n - 1)^2} + \sum_{i=1}^{n} u(x_i, 10, 100, 4), with y_i = 1 + (x_i + 1)/4;  n = 30, X = [-50, 50]^n, fmin = 0

f6(x) = 0.1 {sin^2(3π x_1) + \sum_{i=1}^{n-1} (x_i - 1)^2 [1 + sin^2(3π x_{i+1})] + (x_n - 1)^2 [1 + sin^2(2π x_n)]} + \sum_{i=1}^{n} u(x_i, 5, 100, 4);  n = 30, X = [-50, 50]^n, fmin = -1.1428

where the penalty function is u(x_i, a, k, m) = k (x_i - a)^m for x_i > a; 0 for -a ≤ x_i ≤ a; k (-x_i - a)^m for x_i < -a.
3.2 Parameter Settings

There are four PSO variants, including the proposed IARPSO, used in the following experiments. The algorithms and parameter settings are listed below:
- The standard PSO (PSO);
- ARPSO [3];
- ATRE-PSO [4];
- Our approach (IARPSO).
For the standard PSO, w = 0.72984, c1 = c2 = 1.49618, and the maximum velocity Vmax is set to the half range of the search space on each dimension. For ARPSO, ATRE-PSO and IARPSO, w is linearly decreasing from 0.9 to 0.4, c1 = c2 = 2.0, dlow = 5.0e–6 and dhigh = 0.25 by the suggestions of [3–4]. For all algorithms, the population size is set to 10 and the maximum number of evaluations is set to 100,000 [8]. All the experiments in this paper are conducted 30 times with different random seeds, and the average results throughout the optimization runs are recorded.
3.3 Experimental Results and Discussions

The results of the comparison among PSO, ARPSO, ATRE-PSO and IARPSO are given in Table 3, where "Mean" indicates the mean best function value found in the last generation and "Dev" stands for the standard deviation. The convergence characteristics, in terms of the best fitness value of the median run of each algorithm on each test function, are presented in Fig. 1.

Table 3 The comparison results among PSO, ARPSO, ATRE-PSO and IARPSO

F     PSO Mean   PSO Dev   ARPSO Mean   ARPSO Dev   ATRE-PSO Mean   ATRE-PSO Dev   IARPSO Mean   IARPSO Dev
f1    –6093      462       –7259        568         –6508           763            –11055        324
f2    69.65      23.7      42.78        9.56        38.8            14.3           37.8          10.6
f3    6.42       3.51      1.47e–12     1.91e–13    2.74e–03        1.04e–03       9.84e–13      8.23e–13
f4    0.16       0.70      4.43e–02     3.06e–02    6.13e–02        4.27e–02       1.23e–02      5.37e–02
f5    1.25       0.79      3.02e–17     4.37e–25    2.18e–07        2.45e–07       3.02e–17      7.61e–26
f6    1.49       5.06      –1.142       4.63e–03    –1.139          3.14e–02       –1.142        3.79e–03

Fig. 1 The performance comparisons among PSO, ARPSO, ATRE-PSO and IARPSO on f1–f6 (convergence curves)

From the above results, IARPSO achieves better performance than the standard PSO and ATRE-PSO on all test functions. IARPSO outperforms ARPSO on functions 1, 2, 3 and 4; on functions 5 and 6, ARPSO and IARPSO have nearly the same performance. Especially on function 1, IARPSO shows good global search ability, even though the test function has many local minima. This can be attributed to the Cauchy mutation, which is more likely to generate larger jumps than the Gaussian mutation and helps the global best particle escape from
local optima. From the results of Figure 1, IARPSO converges faster than the other three PSO algorithms on all test functions.
4 Conclusions

The purpose of IARPSO is to improve the global search ability of ARPSO by incorporating a Cauchy mutation operator. The hope is that the long jumps generated by the Cauchy mutation can help the particles jump out of local optima. Experimental studies on 6 well-known multimodal functions with many local minima show that IARPSO outperforms PSO, ARPSO and ATRE-PSO on all test functions. However, there are still a few cases where IARPSO falls into local optima, as happens on function 2, which suggests that the proposed method might not be enough to prevent premature convergence in that case. Future work will focus on further improving the performance of ARPSO.
References 1. Kennedy, J., Eberhart, R.C.: Particle Swarm Optimization. In: Proceedings of IEEE International Conference on Neural Networks, pp. 1942–1948 (1995) 2. Eberhart, R.C., Shi, Y.: Comparison Between Genetic Algorithms and Particle Swarm Optimization. In: Porto, V.W., Waagen, D. (eds.) EP 1998. LNCS, vol. 1447, pp. 69– 73. Springer, Heidelberg (1998) 3. Riget, J., Vesterstom, J.S.: A Diversity-guided Particle Swarm Optimizer – the arPSO. Technical report, EVAlife, Denmark (2002) 4. Pant, M., Radha, T., Singh, V.P.: A Simple Diversity Guided Particle Swarm Optimization. In: Proceedings of Congress Evolutionary Computation, pp. 3294–3299 (2007) 5. Hu, X., Shi, Y., Eberhart, R.C.: Recent Advance in Particle Swarm. In: Proceedings of Congress Evolutionary Computation, pp. 90–97 (2004) 6. Shi, Y., Eberhart, R.C.: A Modified Particle Swarm Optimization. In: Proceedings of Congress Evolutionary Computation, pp. 69–73 (1998) 7. Yao, X., Liu, Y., Lin, G.: Evolutionary Programming Made Faster. IEEE Trans. Evol. Comput. 3, 82–102 (1999) 8. Wang, H., Liu, Y., Li, C.H., Zeng, S.Y.: A Hybrid Particle Swarm Algorithm with Cauchy Mutation. In: Proceedings of IEEE Swarm Intelligence Symposium, pp. 356– 360 (2007)
Research on Intelligent Diagnosis of Mechanical Fault Based on Ant Colony Algorithm Zhousuo Zhang, Wei Cheng, and Xiaoning Zhou*
Abstract. The ant colony algorithm is an evolutionary optimization algorithm that simulates the foraging behavior of ants in nature; it is distributed, parallel, robust and based on positive feedback. The basic principle of the ant colony algorithm is introduced, and an adaptive clustering algorithm based on a multi-ants parallel mechanism is constructed in this paper. The multi-ants parallel and adaptive clustering algorithm is applied to fault classification of locomotive wheel-set bearings, and the classification accuracy is 87%. The research results show that the algorithm is effective for practical fault diagnosis. Keywords: Intelligent diagnosis, Ant colony algorithm, Supervised learning, Unsupervised learning, Clustering.
1 Introduction

The ant colony algorithm was first proposed by Dorigo in 1991 [1] and was then used successfully to deal with the traveling salesman problem (TSP) [2]. It is now widely used in clustering [3, 4] and feature extraction [5, 6]. As an evolutionary optimization algorithm, the ant colony algorithm (ACO) has the advantages of positive feedback, distributed and parallel computing, robustness and so on [7, 8]. Therefore, it is promising to construct mechanical fault clustering models by combining the ant colony algorithm with clustering methods for fault classification and recognition when prior information is lacking, for example when the categories of the training samples are unknown. This paper first introduces the basic principle of the ant colony algorithm and takes the pheromone concentration as the main basis for classifying samples to construct a multi-ants parallel and adaptive clustering model based on the ant colony
algorithm and the clustering algorithm. In Section 3 we give a detailed analysis and the calculation process of the model, and in Section 4 we apply the clustering method to fault classification of locomotive wheel set bearings.
2 Basic Principles of Ant Colony Algorithm

Let b_i(t) represent the number of ants located in element i at time t, let τ_{ij}(t) represent the pheromone content of path (i, j) at time t, let n be the scale of the TSP and m the total number of ants. Then

m = \sum_{i=1}^{n} b_i(t);   Γ = {τ_{ij}(t) | c_i, c_j ⊂ C}   (1)

is the set of residual pheromone contents on the connections l_{ij} between two elements in set C at time t. At the initial time the pheromone content of each path is equal, and we let τ_{ij}(0) = const; thus the optimization of the basic ant colony algorithm is implemented by means of the directed graph g = (C, L, Γ). Ant k (k = 1, 2, …, m) determines its transfer direction according to the pheromone content of each path during its motion. Let the tabu list tabu_k (k = 1, 2, …, m) record the cities that ant k has currently passed; this set is dynamically adjusted along with the evolution process. In the searching process, the ants calculate the state transition probability according to the pheromone content and the heuristic information of each path. Let p_{ij}^{k}(t) represent the state transition probability of ant k from element (city) i to element (city) j at time t; p_{ij}^{k}(t) can be calculated by the following formula:

p_{ij}^{k}(t) = \frac{[τ_{ij}(t)]^{α} \cdot [η_{ij}(t)]^{β}}{\sum_{s ∈ allowed_k} [τ_{is}(t)]^{α} \cdot [η_{is}(t)]^{β}}  if j ∈ allowed_k;   p_{ij}^{k}(t) = 0  otherwise   (2)

where α is an information heuristic factor, β is an expectation heuristic factor, and allowed_k = {C − tabu_k} represents the cities that ant k may select next. α represents the relative importance of the path: the bigger α is, the stronger the collaboration between ants will be. β represents the relative importance of visibility. η_{ij}(t) is the heuristic factor, whose expression is η_{ij} = 1/d_{ij}, where d_{ij} represents the distance between two adjacent cities. For ant k, the smaller d_{ij} is, the bigger η_{ij}(t) will be, and the bigger p_{ij}^{k}(t) also will be.
To avoid the heuristic information being submerged by an excess of residual pheromone, it is necessary to update the residual pheromone after each ant finishes one step or travels through all n cities. Therefore, the pheromone content of path (i, j) at time t + n can be adjusted by the following formula:

τ_{ij}(t+n) = (1 − ρ) \cdot τ_{ij}(t) + Δτ_{ij}(t),   Δτ_{ij}(t) = \sum_{k=1}^{m} Δτ_{ij}^{k}(t)   (3)

where ρ is the volatilization coefficient of the pheromone and 1 − ρ is the residual factor of the pheromone. To avoid infinite accumulation of pheromone, the value range of ρ should be ρ ∈ [0, 1). At the initial time, Δτ_{ij}(0) = 0. Δτ_{ij}^{k}(t) represents the pheromone increment that ant k leaves on path (i, j) in one cycle. One of the commonly used rules for updating the pheromone is the Ant-Cycle algorithm:

Δτ_{ij}^{k}(t) = Q / L_k  if ant k passes (i, j) in this cycle;   Δτ_{ij}^{k}(t) = 0  otherwise   (4)

where L_k represents the total length that ant k travels in this cycle and Q is the pheromone strength, which influences the convergence rate of the algorithm. This is the basic form of the ant colony algorithm.
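A small sketch of the two operations above — the transition probability of Eq. (2) and the Ant-Cycle pheromone update of Eqs. (3)-(4) — is given below; the parameter values (alpha, beta, rho, Q) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def transition_probabilities(tau, eta, current, allowed, alpha=1.0, beta=2.0):
    """Eq. (2): probability of an ant moving from `current` to each city in `allowed`."""
    weights = (tau[current, allowed] ** alpha) * (eta[current, allowed] ** beta)
    return weights / weights.sum()

def ant_cycle_update(tau, tours, tour_lengths, rho=0.5, Q=100.0):
    """Eqs. (3)-(4): evaporate all pheromone, then let every ant deposit Q / L_k
    on each edge (i, j) of the tour it completed in this cycle."""
    tau = (1.0 - rho) * tau
    for tour, length in zip(tours, tour_lengths):
        for i, j in zip(tour, tour[1:] + tour[:1]):   # closed tour: return to start
            tau[i, j] += Q / length
    return tau
```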
3 Multi-ants Parallel and Adaptive Clustering Algorithm

This algorithm combines the ant colony algorithm with clustering analysis and takes the pheromone concentration as the main basis for classifying samples [9]. In the initial stage the algorithm constructs some initial solutions randomly and modifies them according to a local search mechanism. Then it constructs the pheromone matrix according to the optimized solutions and executes new cycles. When the new solutions satisfy the termination condition, the algorithm stops. The main steps of this algorithm can be described as follows:
3.1 Coding Solutions and Establishing Objective Function

Suppose that a dataset with v dimensions has n objects {x_1, x_2, …, x_n} and that there are R ants. The clustering analysis can be described as follows: the R ants divide the n objects into K classes so that the clustering evaluation function J_c is minimized. At first the algorithm codes the solutions; the coding method can be described by means of simple strings such as s = {c_1, c_2, …, c_n}, where c_i (i = 1, …, n) is the class label of object i. If the classes are marked with integers, c_i ∈ {1, 2, …, K}; c_i = c_j represents that objects x_i and x_j belong to the same class, and c_i ≠ c_j represents that they do not. Thus the solution matrix S = {s_1, s_2, …, s_R} can be constructed from the solution set of the R ants. The evaluation function J_c is defined as the sum of squared Euclidean distances between the objects and the clustering centers of their clusters; it can be described by the following formula:

J_c = min F(ω, m) = \sum_{j=1}^{K} \sum_{i=1}^{n} \sum_{v=1}^{V} ω_{ij} ||x_{iv} − m_{jv}||^2   (5)

where x_{iv} is the vth attribute value of object i and m_{jv} is the mean value of attribute v over all samples in cluster j. m is a K × n matrix of clustering centers, and ω_{ij} is an n × K matrix of relation weights:

ω_{ij} = 1 if x_i ∈ j;   ω_{ij} = 0 if x_i ∉ j   (6)

and m_{jv} can be calculated according to the following formula:

m_{jv} = \frac{\sum_{i=1}^{N} ω_{ij} x_{iv}}{\sum_{i=1}^{N} ω_{ij}},   j = 1, …, K,  v = 1, …, n   (7)
According to formula (5), we evaluate the solutions generated by the ants repeatedly until the function J_c satisfies the stopping condition, or J_c is less than the minimum function value set beforehand. Then the optimum clustering centers related to the minimum J_c can be selected.
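The evaluation of a coded solution by Eqs. (5)-(7) can be sketched as follows (a NumPy illustration only): the cluster centers are the per-cluster attribute means and J_c is the summed squared distance of every object to its own center.

```python
import numpy as np

def evaluate_solution(labels, data, K):
    """Eqs. (5)-(7): return J_c and the K x v matrix of cluster centers for a
    solution string `labels` (labels[i] is the cluster index of object i)."""
    labels = np.asarray(labels)
    centers = np.zeros((K, data.shape[1]))
    for j in range(K):
        members = data[labels == j]
        if len(members):                         # Eq. (7): mean of the assigned samples
            centers[j] = members.mean(axis=0)
    J_c = ((data - centers[labels]) ** 2).sum()  # Eq. (5)
    return J_c, centers
```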
3.2 Constructing Initial Solution of Pheromone

At first let all the ants' solutions be empty. For the pheromone matrix tao with n × K dimensions, the element tao_{ij} represents the pheromone concentration of sample i relative to cluster j. All the elements of the pheromone matrix are initialized to equal values before executing the cycles; in this paper, all the initial values are 0.01. In the process of constructing the initial solutions, the initial clustering centers are constructed for the R (R < n) ants by means of the k-means algorithm, and the main steps of the algorithm are:
(1) The initial clustering centers are selected for each ant randomly.
(2) The distances between each ant (data) and the K initial clustering centers are calculated.
(3) Each ant chooses the nearest clustering center and allocates itself to that cluster.
3.3 Modifying Solutions Using Local Search Mechanism

The R ants construct some initial solutions and then search locally for optimized versions of all the initial solutions, so that the quality of the solutions is further improved. First we calculate the value of the evaluation function J_c of each ant as the basis for judging whether an initial solution is accurate or not. The local search strategy is to vary solutions randomly in order to generate new solutions. If a new solution is more accurate than the former solution, the old solution is replaced by the new one; otherwise the old solution is preserved.
3.4 Updating Pheromone

Updating the pheromone can dynamically reflect the information generated by the ants in motion. Therefore, the pheromone matrix is updated according to the optimized solutions generated by L ants after every cycle. Suppose ρ ∈ [0, 1] is the pheromone volatilization coefficient, which represents the degree to which the pheromone evaporates as time lapses. The pheromone updating formula can be expressed as follows:

tao_{ij}(t+1) = (1 − ρ) × tao_{ij}(t) + \sum_{l=1}^{L} Δtao_{ij}^{l}   (8)

Δtao_{ij}^{l} = 1 / F_l  if object i belongs to cluster j;   Δtao_{ij}^{l} = 0  otherwise   (9)

The updated pheromone matrix will be the basis for the next iterative search of the ant colony, and new solutions are calculated by means of this method.
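A sketch of the pheromone update of Eqs. (8)-(9): the n × K matrix is evaporated and each of the L retained solutions deposits 1/F_l on the (sample, cluster) entries it uses; here F_l is assumed to be the solution's evaluation-function value from Section 3.1, and rho is illustrative.

```python
import numpy as np

def update_clustering_pheromone(tao, solutions, F_values, rho=0.1):
    """Eqs. (8)-(9): tao is n x K; `solutions` is a list of label arrays and
    F_values the corresponding evaluation-function values of the L selected ants."""
    tao = (1.0 - rho) * tao
    for labels, F_l in zip(solutions, F_values):
        for i, j in enumerate(labels):
            tao[i, j] += 1.0 / F_l
    return tao
```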
4 Engineering Applications of Multi-ants Parallel and Adaptive Clustering Algorithm

Wheel set bearings are a weak link in locomotives and are prone to malfunction, so classifying their faults correctly has important significance in engineering applications. In this section we apply the multi-ants parallel and adaptive clustering algorithm to fault classification of locomotive wheel set bearings. The experiments are conducted on the JL-501A test bench for locomotive wheel set bearings. To simulate the actual working condition, we use a hydraulic system to drive and load, and set the speed and load with the control cabinet. With this method we realize continuously variable transmission and obtain a load of up to 20000 N. The acceleration sensor is attached to the bottom of the load module; although the vibration signals are weakened there, this placement is the most consistent with the actual situation. Fig. 1 shows the overall test system for the locomotive rolling bearing, and Fig. 2 gives a detailed structural diagram of the experimental device.

Fig. 1 Locomotive rolling bearing test system (experimental control cabinet and bench, Sony EX data acquisition module, data recording module)

Fig. 2 Structural diagram of the experimental device (motor, coupling, axle, test bearing, loadable module, acceleration sensor, speed sensor, hydraulic cylinder, test bench)

In the experiment, we acquire vibration acceleration signals under three different conditions: bearings without faults (normal condition), a slight friction fault of the outer ring, and a compound fault of outer-ring peeling and rolling-element pitting. Fig. 3 (a) and (b) show the physical pictures of the outer-ring and rolling-element faults. The detailed parameters of our experiment are listed in Table 1, and the original time-domain signals in these three conditions are shown in Fig. 4.

Fig. 3 Bearing fault physical pictures: (a) early fault (scratches) of the outer ring; (b) rolling body abrasion fault

Fig. 4 Original time-domain signals: (a) normal condition; (b) outer ring minor fault; (c) outer ring and rolling compound fault

Table 1 Experimental parameters of locomotive wheel bearings

Fault type                      Motor spindle speed (n/rpm)   Sampling frequency (Hz)
Normal condition                499                           25.6k
Outer ring minor fault          506                           25.6k
Outer ring and rolling fault    519                           25.6k

In our experiment, we chose three time-domain characteristics, the effective value, the peak value and the standard deviation, as the feature attributes to classify the data. Let signal x = (x_1, x_2, …, x_T), where T is the length of x and x̂ is the mean value of x; then the effective value x_rms, the peak value x_peak and the standard deviation x_s can be calculated as follows:
x_rms = [ \frac{1}{T-1} \sum_{i=1}^{T} x_i^2 ]^{1/2}   (10)

x_peak = max(x_i)   (11)

x_s = [ \frac{1}{T-1} \sum_{i=1}^{T} (x_i − x̂)^2 ]^{1/2}   (12)
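The three features of Eqs. (10)-(12) can be computed as follows (a NumPy sketch; note that the paper's RMS and standard deviation both use the 1/(T−1) normalisation).

```python
import numpy as np

def time_domain_features(x):
    """Eqs. (10)-(12): effective (RMS) value, peak value and standard deviation."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    x_rms = np.sqrt((x ** 2).sum() / (T - 1))
    x_peak = x.max()
    x_s = np.sqrt(((x - x.mean()) ** 2).sum() / (T - 1))
    return x_rms, x_peak, x_s
```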
For each condition we use 100 data samples: 50 of them are used as training samples to train the classifier, and the others are used to test the performance of the classifier. The main steps of fault classification of the locomotive bearings by means of the multi-ants parallel and adaptive clustering algorithm are as follows:
1) The number of ants is initialized as R = 150/2 = 75, the initial solutions are emptied, and the cycle number is initialized as NCmax = 10.
2) Some initial clustering centers are constructed randomly with the number of clustering centers set to k = 3; then the 75 clustering evaluation functions J generated by the 75 ants are calculated, and the result is Jmax = 3.92 in our experiment.
3) The 10 most accurate solutions are selected according to the value of J, and a local search is made to obtain more optimized solutions.
4) The pheromone is updated according to the new solutions and new cycles are executed until the solutions satisfy the terminal conditions.
5) The fault categories of the data are classified according to the pheromone concentration matrix, and the average clustering center is calculated to construct the classifier.
6) 150 test samples are used to test the classifier.
The above classifier is used to test the 150 samples, and the results are shown in Fig. 5. Fig. 5 (a) shows the 3D fault classification of the locomotive wheel set bearings and indicates the spatial classification of the different types of faults. Fig. 5 (b) shows that all the features are in the same plane and that the distribution of every feature is fairly concentrated, so good classification boundaries can be given in the x1-x2 plane. Fig. 5 (c) and (d) indicate that the projections in the x2-x3 plane and the x1-x3 plane are the same; this is because all the features are in the same plane and the 3D figure of the fault classification is symmetrical. According to Fig. 5, the faults are classified into three types, and 131 of the 150 samples are classified correctly, which means we obtain a classification accuracy of 87%. So fault classification by means of the multi-ants parallel and adaptive clustering algorithm is effective.
Fig. 5 3D view and projections of the locomotive wheel set bearings fault classification: (a) 3D figure of the fault classification; (b) projections in the x1-x2 plane; (c) projections in the x2-x3 plane; (d) projections in the x1-x3 plane. Legend: compound fault of outer ring and rolling element; normal state; slight fault of outer ring; wrongly classified sample
5 Conclusion

At present, the intelligent diagnosis methods that are commonly used mostly belong to supervised methods. However, in some engineering applications it is difficult, or even impossible, to determine the categories of the training samples. Therefore, unsupervised clustering learning methods are needed to deal with fault classification under these conditions. This paper constructs a multi-ants parallel and adaptive clustering algorithm based on the ant colony algorithm and applies the algorithm to fault classification of locomotive wheel set bearings. The classification results show that 87% of the faults can be recognized correctly, so classifying faults by means of the multi-ants clustering algorithm has important engineering significance.
Acknowledgements. This paper is supported by the National Natural Science Foundation of China“Research on Hybrid Intelligent Technique and Its Application in Fault Diagnosis Based on Granular Computing” (Grant No. 50875197) and the Project Sponsored by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry.
References 1. Colorni, A., Dorigo, M., Maniezzo, V.: Distributed Optimization by Ant Colonies. In: Proceedings of the 1st European Conference on Artificial Life, pp. 134–142 (1991) 2. Dorigo, M., Maniezzo, V., Colorni, A.: Ant System: Optimization by a Colony of Cooperating agents. IEEE Trans. Syst., Man, Cybern. B 26, 29–41 (1996) 3. Prakash, S., Jayaraman, V.: An Ant Colony Approach for Clustering. Analytica Chimica Acta 509, 187–195 (2004) 4. Mehemet, K., Ali, N.: A New Arrhythmia Clustering Technique Base on Ant Colony Optimization. Journal of Biomedical Informatics 41, 874–881 (2008) 5. Rahul, K., Screeram, R.: A Hybrid Approach for Feature Subset Selection Using Neural Networks and Ant Colony Optimization. Expert Systems with Applications 33, 49–60 (2007) 6. Hamidreza, R., Karim, F.: An Improved Feature Selection Method Based on Ant Colony Optimization (AOC) Evaluated on Face Recognition System. Applied Mathematics and Computation 205, 716–725 (2008) 7. Dorigo, M.: Optimization, Learning and Natural Algorithms. Ph.D. Dissertation, Department of Electronics, Politecnico di Milano, Italy (1992) 8. Liu, H.: Principle and Application of Ant Colony Algorithm. Science Press, Beijing (2005) 9. Zhang, W.: Adaptive Ant Colony Optimized Clustering Algorithm Based on Multi Agent Architecture. Computer Engineering and Applications 15, 17–19 (2005)
A New Supermemory Gradient Method without Line Search for Unconstrained Optimization June Liu, Huanbin Liu, and Yue Zheng
Abstract. In this paper, we present a new supermemory gradient method without line search for unconstrained optimization problems. The new method guarantees a descent at each iteration. It makes full use of the previous multi-step iterative information at each iteration and avoids the storage and computation of matrices associated with the Hessian of the objective function, so it is suitable for solving large-scale optimization problems. We also prove its global convergence under some mild conditions. In addition, we analyze the linear convergence rate of the new method when the objective function is uniformly convex and twice continuously differentiable. Keywords: Unconstrained optimization, Memory gradient method, Global convergence, Convergence rate.
1 Introduction

Consider the unconstrained optimization problem

min f(x),  x ∈ R^n,   (1)
where f(x) : R^n → R is continuously differentiable and its gradient is available. Usually we use an iterative method for solving (1), whose form is given by
xk+1 = xk + αk dk ,
(2)
where α_k is a positive step size and d_k is a search direction of f(x) at x_k. Throughout this paper, we denote f(x_k) by f_k, ∇f(x_k) by g_k and f(x*) by f*, respectively. There are many traditional methods for solving (1), such as the Newton method, the quasi-Newton method, the steepest descent method and the conjugate gradient method. It is well known that the Newton method and the quasi-Newton method are most effective iterative methods. However, they need to store and update matrices associated with the Hessian of the objective function. As problem size and computational expense increase, so does the difficulty of finding solutions. To further reduce the computational expense associated with Newton-like methods, the conjugate gradient method has recently attracted more attention. The most important features of this method are that storage requirements are minimal and that the computational expense is relatively small. Consequently, the conjugate gradient method is particularly useful when the dimension of (1) is large. Generally it has the form (2) with

d_k = -g_k  for k = 1;   d_k = -g_k + β_k d_{k-1}  for k ≥ 2   (3)

where β_k is a scalar which determines the different conjugate gradient methods. The choice of β_k should be such that (2)-(3) reduces to the linear conjugate gradient method when f(x) is a strictly convex quadratic and α_k is the exact one-dimensional minimizer. Well-known formulas for β_k are the Fletcher-Reeves (FR), Polak-Ribiere-Polyak (PRP), Hestenes-Stiefel (HS), Conjugate Descent (CD) and Dai-Yuan (DY) formulas, which are given by

β_k^{FR} = \frac{\|g_k\|^2}{\|g_{k-1}\|^2},

β_k^{PRP} = \frac{g_k^T (g_k - g_{k-1})}{\|g_{k-1}\|^2},

β_k^{HS} = \frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T (g_k - g_{k-1})},

β_k^{CD} = -\frac{\|g_k\|^2}{d_{k-1}^T g_{k-1}},

β_k^{DY} = -\frac{\|g_k\|^2}{d_{k-1}^T (g_k - g_{k-1})},

respectively. Their convergence properties have been reported by many authors [1]-[8].
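As an illustration of the direction update (3), the following sketch computes d_k for two of the β_k choices listed above (FR and PRP); it is a generic NumPy example, not code from the paper.

```python
import numpy as np

def beta_fr(g, g_prev):
    """Fletcher-Reeves: ||g_k||^2 / ||g_{k-1}||^2."""
    return (g @ g) / (g_prev @ g_prev)

def beta_prp(g, g_prev):
    """Polak-Ribiere-Polyak: g_k^T (g_k - g_{k-1}) / ||g_{k-1}||^2."""
    return (g @ (g - g_prev)) / (g_prev @ g_prev)

def cg_direction(g, d_prev=None, beta=0.0):
    """Eq. (3): d_1 = -g_1 and d_k = -g_k + beta_k d_{k-1} for k >= 2."""
    return -g if d_prev is None else -g + beta * d_prev
```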
Similar to the conjugate gradient method are the memory gradient method and the supermemory gradient method. Not only is their basic idea simple, but they also avoid the computation and storage of some matrices, so they are suitable for solving large-scale optimization problems. The main difference between them is that the latter can use the information of the previous multi-step iterations more fully, which helps in designing algorithms with a fast convergence rate. Many authors have studied the global convergence properties of memory gradient methods and obtained substantial results [9]-[12]. However, it is difficult or time-consuming to implement an exact line search for seeking the step length in practical computation. We should use a suitable search direction and some available inexact line search to choose a step size at each iteration for memory gradient methods and guarantee global convergence. Generally, the Wolfe line search, the Goldstein line search and the Armijo line search are used for finding a step size. However, for large-scale problems these line search rules need expensive computation of function and gradient values at each iteration. Recently, Sun and Zhang [13] proposed a particular choice of step size which requires no line search and established its global convergence. Chen and Sun [14], Li and Chen [15], Narushima [16], Yu [17] and others made further investigations from other aspects. Based on these, we study the convergence properties of a new supermemory gradient method in this paper. We prove that, without any line search, the new method can also guarantee a descent at each iteration. Finally, the global convergence and the linear convergence rate are proved under some assumptions.
2 New Method

For future reference, we formally assume the following.
(H1) The objective function f(x) is bounded below on the level set Ω = {x ∈ R^n | f(x) ≤ f(x_1)}.
(H2) In a neighborhood N of Ω, f(x) is differentiable and its gradient g(x) is Lipschitz continuous; namely, there exists a constant L > 0 such that

\|g(x) − g(y)\| ≤ L \|x − y\|,  ∀x, y ∈ N.   (4)
Algorithm (A)
Step 0  Given constants ε > 0, ρ ∈ (1/2, 1), a positive integer m, and k := 1;
Step 1  If \|g_k\| < ε, then stop; else go to Step 2;
Step 2  Compute d_k which satisfies the following formula:

d_k = -g_k  for k ≤ m − 1;   d_k = -g_k + β_k \sum_{i=1}^{m} d_{k-i}  for k ≥ m   (5)

where β_k = ρ \|g_k\| / \sum_{i=1}^{m} \|d_{k-i}\|;
Step 3  Compute α_k = -\frac{g_k^T d_k}{L \|d_k\|^2};
Step 4  Let x_{k+1} = x_k + α_k d_k;
Step 5  Let k := k + 1 and go to Step 1.
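The whole of Algorithm (A) fits in a few lines; the sketch below is an illustration under the assumption that the Lipschitz constant L of the gradient is known (as the algorithm requires), with m, rho and eps chosen arbitrarily.

```python
import numpy as np

def supermemory_gradient(grad, x0, L, m=3, rho=0.8, eps=1e-6, max_iter=10000):
    """Sketch of Algorithm (A): the direction of Eq. (5), beta_k of Step 2, and the
    fixed step size alpha_k = -g_k^T d_k / (L ||d_k||^2) of Step 3 (no line search)."""
    x = np.asarray(x0, dtype=float)
    dirs = []                                   # previously generated directions
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < eps:             # Step 1
            break
        if len(dirs) < m:                       # Step 2: too few stored directions, use -g
            d = -g
        else:                                   # Step 2: supermemory direction
            beta = rho * np.linalg.norm(g) / sum(np.linalg.norm(di) for di in dirs[-m:])
            d = -g + beta * sum(dirs[-m:])
        alpha = -(g @ d) / (L * (d @ d))        # Step 3
        x = x + alpha * d                       # Step 4
        dirs.append(d)
    return x
```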
Lemma 1. For all k ≥ 1, we have that

g_k^T d_k ≤ -(1 − ρ) \|g_k\|^2.   (6)

Proof. For obtaining (6), we consider the following two cases.
Case (i): k ≤ m − 1. It is obvious that (6) holds.
Case (ii): k ≥ m.

g_k^T d_k = -\|g_k\|^2 + β_k \sum_{i=1}^{m} g_k^T d_{k-i}
          ≤ -\|g_k\|^2 + β_k \|g_k\| \sum_{i=1}^{m} \|d_{k-i}\|
          = -(1 − ρ) \|g_k\|^2.

We can immediately draw the conclusion of Lemma 1.

Lemma 2. For all k ≥ 1, we have

\|d_k\| ≤ (1 + ρ) \|g_k\|.   (7)

Proof. If k ≤ m − 1, then \|d_k\| = \|g_k\| ≤ (1 + ρ)\|g_k\|. If k ≥ m, then

\|d_k\|^2 = \|g_k\|^2 − 2β_k \sum_{i=1}^{m} g_k^T d_{k-i} + β_k^2 \|\sum_{i=1}^{m} d_{k-i}\|^2
          ≤ \|g_k\|^2 + 2β_k \|g_k\| \sum_{i=1}^{m} \|d_{k-i}\| + β_k^2 (\sum_{i=1}^{m} \|d_{k-i}\|)^2
          = \|g_k\|^2 + 2ρ\|g_k\|^2 + ρ^2\|g_k\|^2 = (1 + ρ)^2 \|g_k\|^2.

Consequently, we can obtain \|d_k\| ≤ (1 + ρ)\|g_k\|.

Lemma 3. Suppose that assumptions (H1)-(H2) hold. Let {x_k} be generated by Algorithm (A). Then we have

f(x_k) − f(x_{k+1}) ≥ \frac{1}{2L} \frac{(g_k^T d_k)^2}{\|d_k\|^2}.   (8)
Proof. By the mean value theorem and (H2) we have

f(x_{k+1}) − f(x_k) = α_k \int_0^1 g(x_k + t α_k d_k)^T d_k dt
                    = α_k g_k^T d_k + α_k \int_0^1 (g(x_k + t α_k d_k) − g_k)^T d_k dt
                    ≤ α_k g_k^T d_k + α_k \int_0^1 \|g(x_k + t α_k d_k) − g_k\| \|d_k\| dt
                    ≤ α_k g_k^T d_k + \frac{1}{2} L α_k^2 \|d_k\|^2
                    = -\frac{1}{2L} \frac{(g_k^T d_k)^2}{\|d_k\|^2}.
3 Convergence Analysis

By using Lemmas 1-3, we can easily obtain the following result.

Theorem 1. Suppose that assumptions (H1)-(H2) hold. Let {x_k} be generated by Algorithm (A). Then we have that

lim_{k→∞} \|g_k\| = 0.

In order to analyze the convergence rate, we further assume that
(H3) f(x) is uniformly convex and twice continuously differentiable.

Lemma 4. Suppose that (H3) holds; then f(x) has the following properties.
(1) f(x) has a unique minimizer x* on R^n.
(2) The level set Ω = {x ∈ R^n | f(x) ≤ f(x_1)} is bounded.
(3) There exist 0 < m_1 < m_2 such that

\frac{1}{2} m_1 \|x − x*\|^2 ≤ f(x) − f(x*) ≤ \frac{1}{2} m_2 \|x − x*\|^2,   (9)

m_1 \|x − x*\| ≤ \|g(x)\| ≤ m_2 \|x − x*\|.   (10)

(4) Assumptions (H1) and (H2) hold.

Proof. The results of this lemma are obtained from [18].

Theorem 2. Suppose that (H3) holds and Algorithm (A) generates an infinite sequence {x_k}. Then x_k converges to x* at least R-linearly.

Proof. According to Lemma 4, we may assume that L = m_2. It follows from Lemma 3 and from (6) and (7) that

f(x_{k+1}) − f(x_k) ≤ -\frac{1}{2L} (\frac{1-ρ}{1+ρ})^2 \|g_k\|^2.
Combining the above inequality with (9) and (10), we can obtain

f(x_{k+1}) − f(x_k) ≤ -(\frac{m_1}{m_2})^2 (\frac{1-ρ}{1+ρ})^2 (f(x_k) − f(x*)).   (11)

Define

θ = (\frac{m_1}{m_2})^2 (\frac{1-ρ}{1+ρ})^2.

It is obvious that 0 < θ < 1. By (9) and (11), we have

m_1 \|x_k − x*\|^2 ≤ f(x_k) − f(x_{k-1}) + f(x_{k-1}) − f(x*)
                   ≤ (1 − θ)(f(x_{k-1}) − f(x*))
                   ≤ · · · ≤ (1 − θ)^{k-1} (f(x_1) − f(x*)),

which shows that x_k converges to x* at least R-linearly. This completes the proof.
4 Conclusion

In this paper, we presented a new supermemory gradient method without line search for unconstrained optimization problems. The new method guarantees a descent at each iteration. We proved its global convergence under mild conditions. We also analyzed the linear convergence rate of the new method when the objective function is uniformly convex and twice continuously differentiable. However, more numerical tests should be done on practical problems.
References 1. Dai, Y.H., Yuan, Y.X.: Convergence Properties of the Fletcher-Reeves Method. IMA J. Numer. Anal. 16, 155–164 (1996) 2. Hu, Y.F., Storey, C.: Global Convergence Result for Conjugate Gradient Methods. JOTA 71, 399–405 (1991) 3. Grippo, L., Lucidi, S.: A Globally Convergent Version of the Polak-Ribiere Conjugate Gradient Method. Math. Prog. 78, 375–391 (1997) 4. Fletcher, R.: Practical Methods of Optimization. Unconstrained Optimization, vol. 1. John Wiley Sons, New York (1987) 5. Dai, Y.H., Yuan, Y.X.: Convergence Properties of the Conjugate Descent Method. Advances in Mathematics 25, 552–562 (1996) 6. Dai, Y.H., Yuan, Y.X.: Nonlinear Conjugate Gradient Methods. Shanghai Scientific and Technical Publishers (2000) 7. Dai, Y.H., Yuan, Y.X.: A Nonlinear Conjugate Gradient Method with a Strong Global Convergence Property. SIAM J. Optim. 10, 177–182 (1999)
8. Gilbert, J.C., Nocedal, J.: Global Convergence Properties of Conjugate Gradient Methods for Optimization. SIAM J. Optim. 2, 21–42 (1992) 9. Wolfe, M.A., Viazminsky, C.: Supermemory Descent Methods for Unconstrained Minimization. J. Optim. Theory Appl. 18, 455–468 (1976) 10. Zhenjun, S.: Supermemory Gradient Method for Unconstrained Optimization. J. Eng. Math. 17, 99–104 (2000) 11. Zhenjun, S.: A New Memory Gradient under Exact Line Search. Asia-Pacific J. Oper. Res. 20, 275–284 (2003) 12. Zhenjun, S.: A New Supermemory Gradient Method for Unconstrained Optimization. Advances in Mathematics 35, 265–274 (2006) 13. Sun, J., Zhang, J.: Global Convergence of Conjugate Gradient Methods without Line Search. Annals of Operations Research 103, 161–173 (2001) 14. Chen, X., Sun, J.: Global Convergence of a Two-parameter Family of Conjugate Gradient Methods without Line Search. Journal of Computational and Applied Mathematics 146, 37–45 (2002) 15. Li, X., Chen, X.: Global Convergence of Shortest-residual Family of Conjugate Gradient Methods without Line Search. Asia-Pacific Journal of Operational Research 22, 529–538 (2005) 16. Narushima, Y.: A Memory Gradient Method without Line Search for Unconstrained Optimization. SUT Journal of Mathematics 42, 191–206 (2006) 17. Yu, Z.S.: Global Convergence of a Memory Gradient Method without Line Search. J. Appl. Math. Comput. 26, 545–553 (2008) 18. Cohen, A.I.: Stepsize Analysis for Descent Methods. J. Optim. Theory Appl. 33, 187–205 (1981)
A Neural Network Approach for Solving Linear Bilevel Programming Problem Tiesong Hu, Bing Huang, and Xiang Zhang*
Abstract. A novel neural network approach is proposed for solving linear bilevel programming problem. The proposed neural network is proved to be Lyapunov stable and capable of generating optimal solution to the linear bilevel programming problem. The numerical result shows that the neural network approach is feasible and efficient. Keywords: Linear bilevel programming, Neural network, Asymptotic stability, Optimal solution.
1 Introduction

Bilevel programming (BLP) problems arise in a wide variety of scientific and engineering applications including resource allocation, finance budgeting, price control, transportation and network design, and so on. A BLP problem is characterized by the existence of two optimization problems in which the constraint region of the first-level problem is implicitly determined by another optimization problem. The BLP problem has been proved to be NP-hard [1]. Since McCulloch and Pyne [2, 3] utilized logical calculus to emulate nervous activities, various types of analogue neural networks have been proposed for computation, and recent research has shown that the neural network approach has many computational advantages over traditional digital computers [4-9]. However, so far little has been published on the application of neural networks to the BLP problem. In this paper, we present an approach to constructing a neural network which is different from the methods given in [10, 11] for the linear BLP problem. Here, following the Kuhn-Tucker optimality conditions of the lower-level problem, we reduce the linear BLP problem to a regular linear program with complementary constraints. Then we propose a novel neural network approach for the linear programming problem with complementary constraints and get the approximate
optimal solution of the linear BLP problem. It is noted that the neural network proposed here can also be used to solve linear programming with complementary constraints. Towards these ends, the rest of the paper is organized as follows. In Section 2, we first introduce the smoothing method for linear programming with complementary constraints. Then in Section 3, we propose a neural network for solving the smoothed problem and derive the conditions for asymptotic stability, solution feasibility and solution optimality. Numerical examples are given in Section 4. Finally we conclude the paper.
2 Linear BLP Problem and Smoothing Method

Let x ∈ X ⊂ R^n, y ∈ Y ⊂ R^m, F : X × Y → R^1, f : X × Y → R^1. The general model of the linear BLP can be written as [2]:

min_{x ∈ X} F(x, y) = c_1 x + d_1 y
s.t.  A_1 x + B_1 y ≤ b_1,
      min_{y ∈ Y} f(x, y) = c_2 x + d_2 y   (1)
      s.t.  A_2 x + B_2 y ≤ b_2

where c_1, c_2 ∈ R^n, d_1, d_2 ∈ R^m, b_1 ∈ R^p, b_2 ∈ R^q, A_1 ∈ R^{p×n}, B_1 ∈ R^{p×m}, A_2 ∈ R^{q×n}, B_2 ∈ R^{q×m}. According to the above introduction, we can replace the lower-level problem with its Kuhn-Tucker optimality conditions and get the following one-level problem:

min c_1 x + d_1 y
s.t.  A_1 x + B_1 y ≤ b_1,
      A_2 x + B_2 y ≤ b_2,
      u B_2 − v = −d_2,   (2)
      u (b_2 − A_2 x − B_2 y) + v y = 0,
      x ≥ 0, y ≥ 0, u ≥ 0, v ≥ 0

where u ∈ R^q and v ∈ R^m are (row) vectors.
Problem (2) is called a mathematical program with complementary constraints, for which the regularity assumptions needed for successfully handling smooth optimization programs are never satisfied. We therefore reformulate problem (2) into the following non-smooth equivalent form [12]:
min c_1 x + d_1 y
s.t.  A_1 x + B_1 y ≤ b_1,
      A_2 x + B_2 y ≤ b_2,
      u B_2 − v = −d_2,   (3)
      −2 min(u, b_2 − A_2 x − B_2 y) = 0,
      −2 min(v, y) = 0,
      x ≥ 0
: R 2 → R by
φμ ( a , b) = ( a − b ) 2 + 4 μ 2 − ( a + b) Then, we can have the following proposition. Proposition 1 [12]. For every μ ∈ R , we have
φμ (a, b) = 0 ⇔ a ≥ 0, b ≥ 0, ab = μ 2 Note also that for μ for every
μ≠0
every (a, b) ,
,
= 0 , we have φμ (a, b) = 0 ⇔ a ≥ 0, b ≥ 0, ab = 0 . While,
φμ ( a, b)
is differential for every
(a, b) . Moreover, for
lim φμ (a, b) = −2 min(a, b) . The function φμ (a, b) is therefore μ →0
a smooth perturbation of the complementary conditions. Then, problem (3) can be approximated by: min c1 x + d1 y s.t.
A1 x + B1 y ≤ b1 , uB2 − v = −d 2 , [ui − (b2 − A2 x − B2 y )i ]2 + 4μ 2 − ui − (b2 − A2 x − B2 y )i = 0, (v j − y j ) 2 + 4μ 2 − v j − y j = 0, x≥0
j = 1,… , m
i = 1,… , q
(4)
652
T. Hu, B. Huang, and X. Zhang
Using the problem (4), we overcome the difficulty that the problem (2) does not satisfy any regularity assumptions which are needed for successfully handling smooth optimization problems, and pave the way for using neural network approach to solve the problem (2). To simply the discussion, we introduce the following notations.
⎛ A x + B1 y − b1 ⎞ G ( x, y , u , v ) = ⎜ 1 ⎟, −x ⎝ ⎠ ⎛ ⎞ uB2 − v + d 2 ⎜ ⎟ H ( x, y, u, v) = ⎜ φμ (ui , (b2 − A2 x − B2 y )i ), i = 1,… , q ⎟ ⎜ ⎟ φμ (v j , y j ), j = 1,… , m ⎝ ⎠
Let x ' = ( x, y , u , v ) , we can write problem (4) equivalently as the following problem.
min f ( x ') = c1 x + d1 y
,
s.t. Gl ( x ') ≤ 0 l = 1,… , n + p (5) H k ( x ') = 0 k = 1,… , 2m + q Definition 2. Let x ' be a feasible point of problem (5) and L = {l : Gl ( x ' ) = 0, l = 1,… , n + p} . We say that x ' is a regular point of problem (5) if the gradients ∇H k ( x ') , ∇Gl ( x '), l ∈ L are linearly independent.
,
Similar to the main result in [3](Theorem 6.11), we can directly have the following theorem.
{( x ') } be a sequence of solutions to problem (5). Suppose the } converges to some x for μ → 0 . If x is the regular point of μ
Theorem 1. Let sequence
{( x ') μ
_
_
_
the problem (5), then
x solves the problem (2).
3 Neural Network for Linear BLP Problem The Lagrange function of problem (4) can be defined by
L( x ', Y , λ , μ ) = f ( x ') +
2m+q
n+q
k =1
l =1
∑ λk H k ( x ') + ∑ μl [Gl ( x ') + Yl 2 ]
where the term Y is slack variable, and the terms λ , μ are referred as Lagrange multiplier. Then, our aim now is to design a neural network that will settle down to the equilibrium, which is also a stationary point of the Lagrange function
A Neural Network Approach for Solving Linear Bilevel Programming Problem
653
L( x ', Y , λ , μ ) . The transient behavior of the neural network can be defined by the following equations.
⎧ dx ' ⎪ dt = −∇ x ' L( x ', Y , λ , μ ) ⎪ ⎪ dY = −∇ L( x ', Y , λ , μ ) Y ⎪ dt (LBPNN) ⎨ ⎪ d λ = ∇ L( x ', Y , λ , μ ) λ ⎪ dt ⎪ dμ ⎪ = ∇ μ L( x ', Y , λ , μ ) ⎩ dt
(6)
Now we will study the relationship between the equilibrium of LBPNN and the approximate optimal solution of the problem (1) for ε → 0 + . We have the following theorem. *
Theorem 2. Let (( x ')
, Y * , λ * , μ * ) be the equilibrium of the neural network (6),
*
and assume that ( x ') is a regular point of problem (4). Then the equilibrium of the neural network solves problem (4). Proof. The proof of theorem 2 can be divided into two steps. Firstly, same to the proof of theorem 3 in [6], we can get that μl > 0, l = 1,… , n + p and the equilibrium of the neural network is the Kuhn-Tucker point of the problem. Secondly, we note that it is easy to verify that ∇ x ' x ' L(( x ') 2
Let
*
, Y * , λ * , μ * ) is positive definite.
Z = {z | ∇H k ( x 'k ) z = 0, k = 1,… , 2m + q; ∇Gl ( x 'k ) z = 0, ∀l ∈ L} ,
we have that z ∇ x ' x ' L(( x ') T
2
*
, Y * , λ * , μ * ) z > 0 , then following the sufficiency
optimality conditions of second order for the problem (4), we can get that (( x ') , Y , λ , μ ) solves the problem (4). While for the neural network to be of practical sense, the neural network should be of asymptotically stable, so that the neural network will always converge to *
*
*
*
(( x ')* , Y * , λ * , μ * ) from an arbitrary initial point within the attraction domain *
of (( x ')
, Y * , λ * , μ * ) . With this in mind, we state and prove the following theo-
rem, which in other words represents the local stability of the network.
(( x ')* , Y * , λ * , μ * ) be the equilibrium of the neural network (6). * * * * * If ( x ') is the regular point of problem (4), then (( x ') , Y , λ , μ ) is an asymp-
Theorem 3. Let
totically stable point of the neural network.
Proof. Let the following equation denote the Lyapunov function of the network LBPNN:

E(x', Y, λ, μ) = \frac{1}{2}\|∇_{x'} L\|^2 + \frac{1}{2}\|∇_{Y} L\|^2 + \frac{1}{2}\|∇_{λ} L\|^2 + \frac{1}{2}\|∇_{μ} L\|^2

where all gradients are evaluated at (x', Y, λ, μ). Differentiating E(x', Y, λ, μ) with respect to time t, we have

dE/dt = (∂E/∂x')·(dx'/dt) + (∂E/∂Y)·(dY/dt) + (∂E/∂λ)·(dλ/dt) + (∂E/∂μ)·(dμ/dt)
      = [∇_{x'}L·∇²_{x'x'}L + ∇_{λ}L·∇²_{λx'}L + ∇_{μ}L·∇²_{μx'}L]·(dx'/dt)
        + [∇_{Y}L·∇²_{YY}L + ∇_{μ}L·∇²_{μY}L]·(dY/dt)
        + [∇_{x'}L·∇²_{x'λ}L]·(dλ/dt)
        + [∇_{x'}L·∇²_{x'μ}L + ∇_{Y}L·∇²_{Yμ}L]·(dμ/dt)
      = −∇_{x'}L·∇²_{x'x'}L·∇_{x'}L − ∇_{Y}L·∇²_{YY}L·∇_{Y}L.

As ∇²_{YY} L((x')*, Y*, λ*, μ*) = Diag(2μ_1*, …, 2μ*_{n+p}) and, following the proof of Theorem 2, μ_l* > 0, l = 1, …, n + p, the matrices ∇²_{YY} L and ∇²_{x'x'} L((x')*, Y*, λ*, μ*) are both positive definite. Hence dE/dt ≤ 0, which shows that ((x')*, Y*, λ*, μ*) is an asymptotically stable point of the neural network.
4 Numerical Studies In this section we will present some linear bilevel programming problems to illustrate the validity of the neural network approach for the linear bilevel programming. Example 1. Consider the following linear BLP problem [2],
x ∈ R1 , y ∈ R1 .
min_{x≥0} F(x, y) = x − 4y
s.t.  min_{y≥0} f(x, y) = y
      s.t.  −x − y ≤ −3
            −2x + y ≤ 0
            2x + y ≤ 12
            3x − 2y ≤ 4
After applying the Kuhn-Tucker transformation and the smoothing method, the above problem reduces to a problem similar to problem (3). Then, similarly to (6), we can obtain a set of ordinary differential equations which describes the transient behavior of the neural network, and we adopt the classical fourth-order Runge-Kutta method to solve these equations. We implemented the program in Microsoft Visual C++ 6.0 and used a personal computer (CPU: Intel Pentium 1.7 GHz, RAM: 256 MB) to execute it. Following Theorem 1, we let ε take different small values; Table 1 presents the optimal solutions of Example 1 over the different values of ε.
ε
(Example 1)
Different optimal solutions corresponding to different ε
ε = 0.01
ε = 0.001
ε = 0.0001
(3.99,4.00)
(4.00,4.00)
(4.00,4.00)
Fig. 1 The transient behavior of the variables in Example 1
656
T. Hu, B. Huang, and X. Zhang
over the different ε . The initial point is ( x, y ) = (1.0,1.0) and the other variables are zero. Moreover, Figure 1 shows the transient behavior of the variables corresponding to ε = 0.001 . Example 2. Consider the following linear BLP problem. x ∈ R
1
, y ∈ R2 .
max 4 x1 + y1 + y 2 x≥0
s.t.
max x + 3 y1 y ≥0
s.t.
x + y1 + y 2 ≤
25 9
x + y1 ≤ 2 8 9 x, y1 , y 2 ≥ 0 y1 + y 2 ≤
Table 2 The optimal solutions with different values of parameter
ε
(Example 2)
Different optimal solutions corresponding to different ε
ε = 0.01
ε = 0.001
ε = 0.0001
(1.834,0.892,0.004)
(1.833,0.891,0)
(1.833,0.891,0)
Fig. 2 The transient behavior of the variables in Example 2
A Neural Network Approach for Solving Linear Bilevel Programming Problem
657
For Example 2, following the same procedure of dealing with Example 1, we also study the different optimal solutions over the different ε > 0 . The solution trajectory with
x, y1 , y2 versus the number of iterations is shown in Figure 2 and the
influences of the value of ε respect to the optimal solutions are tabulated in Table 2. From above Tables and Figures, it can be found that the computed results converge to the optimal solution with the decreasing of ε . It shows that the neural network approach is feasible to the linear BLP problem.
5 Conclusion In this paper we present a novel neural network approach for the linear BLP problem, and the numerical results show that the computed results converge to the optimal solution with the decreasing of ε , which corresponds to the result in theorem 1. It deserves pointing out that the initial point of the neural network is the key factor of influencing the transient behavior of the proposed neural network. An appropriate initial point can get perfect transient behavior of the variables. The reason why such thing happens is that the neural network proposed only has asymptotic stability. In order to get the optimal solution rapidly, we should choose the point, which satisfies the constraints (3) possibly, as the initial point. How to design neural network with global stability for BLP problems is still a challenge topic. Acknowledgments. The work was supported by the National Natural Science Foundation of China under the grant No. 70771082.
References 1. Deng, X.: Complexity Issues in Bilevel Linear Programming, pp. 149–164. Kluwer Academic Publishers, Dordrecht (1998) 2. McCulloch, W.S., Pitts, W.A.: A Logical Calculus of the Ideas Immanent in Nervous Activity. Bulletin of Mathematics and Biophysics 5, 115–133 (1943) 3. Pyne, I.B.: Linear Programming on an Electronic Analog. Computer Transactions of the American Institute Electrical Engineers 75, 139–143 (1956) 4. Zhang, S., Constantinides, A.G.: Lagrange Programming Neural Networks. IEEE Transaction on Circuits and Systems 39, 441–452 (1992) 5. Hu, T.S., Lam, K.C., Ng, S.T.: River Flow Time Series Prediction with RangeDependent Neural Network. Hydrol. Sci. J. 46, 729–745 (2001) 6. Hu, T.S., Lam, K.C., Ng, S.T.: A Modified Neural Network for Improving River Flow Prediction. Hydrol. Sci. J. 50, 299–318 (2005) 7. Hu, T.S.: Neural Optimization and Prediction. Dalian Maritime University Press, Dalian (1997) (in Chinese) 8. Hu, T.S., Guo, Y.: Neural Network for Multi-Objective Dynamic Programming. ACTA Electronica Sinica 27, 70–72 (1999)
658
T. Hu, B. Huang, and X. Zhang
9. Sheng, Z., et al.: A New Algorithm Based on the Frank-Wolfe Method and Neural Network for a Class of Bilevel Decision Making Problems. ACTA Automatica Sinica 22, 657–665 (1996) 10. Shih, H.S., Wen, U.P., et al.: A Neural Network Approach to Multi-Objective and Multilevel Programming Problems. Computers and Mathematics with Applications 48, 95–108 (2004) 11. Lan, K.M., Wen, U.P., et al.: A Hybrid Neural Network Approach to Bilevel Programming Problems. Applied Mathematics Letters 20, 880–884 (2007) 12. Facchinei, F., Jiang, H., Qi, L.: A Smoothing Method for Mathematical Programs with Equilibrium Constraints. Mathematical Programming 35, 107–134 (1999)
Fuzzy Solution for Multiple Targets Optimization Based on Fuzzy Max-Min Neural Network Pengfei Peng, Jun Xing, and Xuezhi Fu*
Abstract. After summarizing general solution methods for multiple targets optimization and studying the basic principle of fuzzy solution for multiple targets optimization, this paper proposes a fuzzy solution method for multiple targets optimization based on the fuzzy min-max (FMM) neural network. Simulation results indicate that the multiple targets optimization algorithm based on the FMM neural network is simple, has strong nonlinear mapping capability and can express fuzzy membership functions well. It thus resolves the difficulty that the membership function is hard to define reasonably when a fuzzy solution for multiple targets optimization is formulated.

Keywords: Multiple targets optimization, Fuzzy set, Fuzzy membership function, Neural network.
1 Introduction

In problems of weapon control, engineering design and so on, several target aims are frequently expected to attain optimal values at the same time. In such multiple targets optimization problems, minimizing one subgoal often worsens the optimal value of another subgoal, so the optimal values of the individual subgoals have to be coordinated to obtain an optimal plan. In general, only an efficient solution (also called a Pareto or non-inferior solution) can be obtained for a multiple targets optimization problem.

Research on multiple targets optimization has developed greatly in recent years, and many solution methods have been advanced. Among them, target-combination methods were widely used in the early development of the field, such as the weighted-sum method, goal programming and the ε-constraint method. The characteristic of these methods is that the multiple targets are combined into a single target by various combination schemes and then solved by a single target optimization method.

Pengfei Peng · Jun Xing · Xuezhi Fu
Naval University of Engineering, Wuhan 430033, China
H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 659–667. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
Solutions obtained in this way are at least weakly efficient. However, such methods are improvements of single target optimization: a single initial point (or one group of points) converges to a single point, so they lack the character of vector optimization. In 1989 Goldberg first advanced the idea of using a Pareto-based fitness function in genetic algorithms [1], and suggested that ranking and selection of non-inferior solutions be used to drive the population toward the optimal solution set in multiple targets optimization. He also proposed that, through niching techniques, the population can be maintained evenly over a solution set instead of converging to one point. These ideas greatly influenced subsequent studies. At present, evolutionary computation is widely used for multiple targets optimization and many excellent techniques have appeared, such as the multi-objective genetic algorithm [2] and the Pareto genetic algorithm with niching [3]. However, the optimal values of the multiple targets are related to those of the subgoals, and the relationship between them is ambiguous, so solution methods that avoid this ambiguity are all unsatisfactory [4]. Therefore, this paper studies the principle of fuzzy solution for multiple targets optimization and puts forward a fuzzy solution method based on the FMM neural network.
2 Method of Fuzzy Solution for Multiple Targets Optimization

The general mathematical model of multiple targets optimization can be written as

X = [x_1, x_2, \dots, x_n]^T,
\min F(X) = [F_1(X), F_2(X), \dots, F_i(X), \dots, F_m(X)]^T, \quad i = 1, 2, \dots, m,
\text{s.t. } g_j(X) \le 0, \ j = 1, 2, \dots, p; \qquad h_k(X) = 0, \ k = 1, 2, \dots, q.    (1)
The elementary idea of the fuzzy solution method for multiple targets optimization is as follows: first the constrained optimal solutions of every subgoal are obtained; these optimal values are then used to fuzzify every subgoal function; finally the solution that maximizes the membership function of the intersection is found, and this is taken as the optimal solution of the multiple targets optimization. The detailed steps are as follows.

(I) The constrained extrema of each subgoal function are computed:

\min F_i(X), \quad i = 1, 2, \dots, m, \quad \text{s.t. } g_j(X) \le 0, \ j = 1, \dots, p; \ h_k(X) = 0, \ k = 1, \dots, q; \ X = [x_1, \dots, x_n]^T    (2)
and

\max F_i(X), \quad i = 1, 2, \dots, m, \quad \text{s.t. } g_j(X) \le 0, \ j = 1, \dots, p; \ h_k(X) = 0, \ k = 1, \dots, q; \ X = [x_1, \dots, x_n]^T    (3)

Here F_i^{\max} and F_i^{\min} are the maximum and minimum of each subgoal, respectively.

(II) Each subgoal function is fuzzified:

\mu_{\tilde F_i}(X) = \left( \frac{F_i^{\max} - F_i(X)}{F_i^{\max} - F_i^{\min}} \right)^r    (4)

where \mu_{\tilde F_i}(X) is the satisfaction level of realizing the subgoal F_i(X), and r > 0; generally r = 1, 1/2, 1/3, \dots
(III) The fuzzy decision is constructed:

\tilde D = \bigcap_{i=1}^{m} \tilde F_i    (5)

and its membership function is

\mu_{\tilde D}(X) = \bigwedge_{i=1}^{m} \mu_{\tilde F_i}(X)    (6)

(IV) The optimal solution X^* of the multiple targets optimization is obtained from

\mu_{\tilde D}(X^*) = \max \mu_{\tilde D}(X) = \max \bigwedge_{i=1}^{m} \mu_{\tilde F_i}(X)    (7)

Obviously, different forms of \mu_{\tilde F_i}(X) affect the optimal solution of the multiple targets optimization, and different forms of \mu_{\tilde F_i}(X) can be constructed deliberately to reflect the characteristics of the problem and the designer's subjective preferences, so as to obtain good optimization plans. Besides formula (4), membership functions can generally be constructed in Γ-shaped, parabolic, normal-distribution and other forms. A small numerical sketch of steps (I)–(IV) is given below.
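To make steps (I)–(IV) concrete, the following Python fragment sketches the max–min decision of Eqs. (4)–(7) on a toy problem. The two subgoal functions, the one-dimensional search grid and the exponent r = 1 are invented for illustration only and do not come from the paper.

```python
import numpy as np

# Hypothetical subgoal functions of one design variable x (illustration only).
f1 = lambda x: (x - 1.0) ** 2          # subgoal 1: small near x = 1
f2 = lambda x: (x + 1.0) ** 2          # subgoal 2: small near x = -1

xs = np.linspace(-2.0, 2.0, 401)       # feasible grid standing in for the constraints
F = np.vstack([f1(xs), f2(xs)])        # m x N matrix of subgoal values

# Step (I): restricted extrema of each subgoal over the feasible set.
Fmin = F.min(axis=1, keepdims=True)
Fmax = F.max(axis=1, keepdims=True)

# Step (II): fuzzify each subgoal, Eq. (4) with exponent r = 1.
mu = (Fmax - F) / (Fmax - Fmin)

# Steps (III)-(IV): intersection by min, then maximize, Eqs. (5)-(7).
mu_D = mu.min(axis=0)
best = np.argmax(mu_D)
print("x* =", xs[best], " memberships =", mu[:, best])
```

Because both subgoals are fuzzified against their own range, the max–min solution lands where the two satisfaction levels balance.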
3 Principle and Learning Algorithm of the FMM Neural Network

Simpson proposed a kind of fuzzy neural network in 1992, named the fuzzy max-min neural network [5]. The characteristics of this neural network are that
the input consists of points in a multi-dimensional unit cube, super-boxes (multi-dimensional rectangles) are used in the computation, the mapping has strong nonlinear capability, and its learning speed is faster than that of multi-layer perceptrons. This network has therefore been applied to pattern recognition, multiple targets decision making and so on.
3.1 Input Space of the FMM Neural Network

In general, an input pattern can be represented as a multi-dimensional vector X = (x_1, x_2, \dots, x_n)^T, which can also be regarded as a point of the space R^n. In neural network applications, the input pattern X is represented as a feature vector. The absolute value of each component is not very important; what matters is the relative information among the components, so X is usually normalized. Therefore, the input pattern of the FMM neural network is an n-dimensional vector X = (x_1, x_2, \dots, x_n)^T with 0 \le x_i \le 1, i = 1, 2, \dots, n, after normalization, and X always lies in the n-dimensional unit cube I^n = [0, 1]^n.
3.2 Topological Structure and Neurons of the FMM Neural Network

The topological structure of the FMM neural network is comparatively simple: a fully connected three-layer feed-forward network, as shown in Fig. 1. The bottom layer is the input layer F_A, which has n nodes and receives the n-dimensional input vector after normalization.
Fig. 1 Topological structure of FMM neural network
The middle layer is the super-box node layer F_B; every node in it represents one super-box, and the number of super-boxes can grow during supervised learning (which makes the FMM neural network self-adaptive). Suppose the number of F_B neurons is m; the connections between F_A and F_B are described by the matrix

V = \begin{bmatrix} V_1 \\ \vdots \\ V_m \end{bmatrix} = \begin{bmatrix} v_{11} & v_{12} & \cdots & v_{1n} \\ \vdots & & & \vdots \\ v_{m1} & v_{m2} & \cdots & v_{mn} \end{bmatrix}    (8)

and the matrix

W = \begin{bmatrix} W_1 \\ \vdots \\ W_m \end{bmatrix} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ \vdots & & & \vdots \\ w_{m1} & w_{m2} & \cdots & w_{mn} \end{bmatrix}    (9)

which are composed of the minimum and maximum points of the super-boxes. The F_C layer is the output layer, which has p neurons. The connection matrix between F_B and F_C is

U = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1m} \\ \vdots & & & \vdots \\ u_{p1} & u_{p2} & \cdots & u_{pm} \end{bmatrix}    (10)

U is a two-valued matrix: in every column of U exactly one element equals 1 and all others are 0 (so a row may contain several 1s).
3.3 Supervised Learning Algorithm of the FMM Neural Network

The supervised learning algorithm of the FMM neural network can be divided into three steps: super-box expansion, overlap check and super-box contraction. (A simplified sketch of the expansion step is given after this subsection.)

(I) Super-box expansion. A training sample {X_h, Y_v} is presented, where X_h is the input vector and Y_v is the corresponding output. Among the existing super-boxes of class Y_v, the one giving the maximum membership value for X_h is found, and it is checked whether this super-box is allowed to expand. If it is not allowed to expand, another super-box is sought among the remaining ones of class Y_v. If no super-box meets the expansion condition, or none has been constructed yet, a new super-box is created whose maximum and minimum points both equal X_h.

(II) Overlap check. After a super-box is expanded it may overlap with other super-boxes, which is not allowed, so an overlap check must be performed on every super-box after expansion.

(III) Super-box contraction. The dimension with the minimum overlap between overlapping super-boxes is identified, and only this dimension is contracted, so that the shapes of the super-boxes change as little as possible and the most robust configuration is kept.

After all training samples have been presented once, the matrices V, W and U are determined and learning is finished.
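The expansion step above can be sketched as follows. This is only a schematic illustration: the membership formula below is a simplification of Simpson's original function, and the size bound THETA and the sensitivity gamma are assumed parameters, not values from the paper.

```python
import numpy as np

THETA = 0.3   # assumed maximum super-box edge length (expansion criterion)

def membership(x, v, w, gamma=4.0):
    """Simplified degree to which point x falls in the super-box [v, w]."""
    below = np.clip(v - x, 0.0, None)       # distance below the min point per dimension
    above = np.clip(x - w, 0.0, None)       # distance above the max point per dimension
    return 1.0 - np.mean(np.minimum(1.0, gamma * (below + above)))

def try_expand(x, v, w):
    """Expand [v, w] to include x if every edge stays within THETA."""
    v_new, w_new = np.minimum(v, x), np.maximum(w, x)
    if np.all(w_new - v_new <= THETA):
        return v_new, w_new, True
    return v, w, False

# toy usage: one existing super-box and a new sample
v, w = np.array([0.2, 0.2]), np.array([0.3, 0.4])
x = np.array([0.35, 0.25])
print(membership(x, v, w))
v, w, ok = try_expand(x, v, w)
print(ok, v, w)
```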
4 Method of Fuzzy Solution for Multiple Targets Optimization Based on FMM Neural Network As had been said before, the key of fuzzy solution for multiple targets optimization was to select right membership grade function. The reason was that fuzzy optimization solutions for multiple targets were different along with changes of forms of membership grade function. In common design of fuzzy optimization of multiple targets, membership grade functions showed by obvious expression were used widely. In fact, it was difficult to confirm right obvious expression of membership grade function, and membership grade functions which were selected by person were not reasonable. So, a new method of fuzzy solution for multiple targets optimization based on fuzzy max-min neural network was advanced, which used fuzzy max-min neural network to map membership grade function and transform problem of multiple targets optimization to one of single target optimization. Detailed steps of its were showed as followed: (I) Network structure was confirmed and study samples were provided. (II) Neural Network was trained and expression of its of membership grade function was constructed. (III) Initial solution as X 0 was given, and ω0 was defined.
(IV) The subgoal function values F_i(X_0) (i = 1, 2, \dots, m) at X_0 are calculated.

(V) The membership values \mu_{\tilde F_i}(X_0) corresponding to F_i(X_0) are obtained from the trained neural network, and

\omega_i = \mu_{\tilde F_i}(X_0) / \eta_i, \qquad \omega = \min(\omega_1, \omega_2, \dots, \omega_m)

are calculated. Here \eta_i is a weighting coefficient that reflects how much attention the designer pays to the subgoal F_i(X); important targets should be given larger weights.

(VI) If |\omega - \omega_0| \le \varepsilon_p (where \varepsilon_p is an iteration precision defined in advance), the iteration stops and the optimization solution is output; otherwise step (VII) is carried out.

(VII) Let F_r(X_0) be the subgoal whose \omega_r is the minimum, and take F_r(X) as the optimization objective. A single target optimization method is then used to solve the following problem:

\min F_r(X), \quad X = [x_1, x_2, \dots, x_n]^T,
\text{s.t. } g_j(X) \le 0, \ j = 1, \dots, p; \quad h_k(X) = 0, \ k = 1, \dots, q; \quad F_i(X) \le (1 + \xi) F_i(X_0), \ i = 1, \dots, m, \ i \ne r    (11)

where \xi is a looseness (relaxation) coefficient. After the optimal solution is obtained, it is taken as the new X_0 and step (IV) is carried out again. (A sketch of this iteration is given below.)
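The iteration of steps (IV)–(VII) can be outlined in Python as below. This is a sketch under several assumptions: the trained FMM network is replaced by the analytic membership of Eq. (4), the two subgoal functions, their ranges, η_i, ξ and ε_p are illustrative values, and the single target subproblem (11) is handed to a generic SLSQP solver rather than to the authors' method.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed two-objective toy problem standing in for Eq. (11).
funcs = [lambda x: (x[0] - 1) ** 2 + x[1] ** 2,
         lambda x: x[0] ** 2 + (x[1] - 1) ** 2]
bounds = [(-2, 2), (-2, 2)]
F_lo, F_hi = np.array([0.0, 0.0]), np.array([13.0, 13.0])  # assumed subgoal ranges for Eq. (4)
eta = np.array([1.0, 1.0])        # designer weighting coefficients
xi, eps_p = 0.05, 1e-3            # looseness coefficient and iteration precision

mu = lambda i, x: (F_hi[i] - funcs[i](x)) / (F_hi[i] - F_lo[i])   # stand-in for the trained net

x0, omega0 = np.array([0.0, 0.0]), 0.0
for _ in range(50):
    omegas = np.array([mu(i, x0) / eta[i] for i in range(2)])     # step (V)
    omega = omegas.min()
    if abs(omega - omega0) <= eps_p:                               # step (VI)
        break
    r = int(np.argmin(omegas))                                     # step (VII): worst subgoal
    cons = []
    for i in range(2):
        if i != r:
            lim = (1 + xi) * funcs[i](x0)                          # F_i(X) <= (1 + xi) F_i(X_0)
            cons.append({'type': 'ineq', 'fun': lambda x, i=i, lim=lim: lim - funcs[i](x)})
    x0 = minimize(funcs[r], x0, bounds=bounds, constraints=cons, method='SLSQP').x
    omega0 = omega
print("solution:", x0)
```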
5 Analysis of an Example Application

A three-bar truss is shown in Fig. 2. The material density ρ is 1.0×10^4 kg/m^3, the allowable stress [σ] is 200 MPa, and the cross-sectional areas of the members are required to satisfy 1×10^{-5} m^2 ≤ x_i ≤ 5×10^{-4} m^2 (i = 1, 2). The cross-sectional areas of the bars are to be determined so that the weight of the truss is smallest and the displacement at the node is smallest [6, 7].
Fig. 2 Three-bar truss (bar cross-sectional areas x_1, x_2, x_1; bar spacing 1 m; applied load 20 kN)
This multiple targets optimization problem can be described as follows:

X = [x_1, x_2]^T,
\min F_1(X) = 2\sqrt{2}\,x_1 + x_2, \qquad \min F_2(X) = \frac{20}{x_1 + 2\sqrt{2}\,x_2},
\text{s.t. } \frac{20(\sqrt{2}\,x_1 + x_2)}{\sqrt{2}\,x_1^2 + 2 x_1 x_2} - 20 \le 0, \quad
\frac{20\sqrt{2}\,x_1}{\sqrt{2}\,x_1^2 + 2 x_1 x_2} - 20 \le 0, \quad
\frac{20\sqrt{2}\,x_2}{\sqrt{2}\,x_1^2 + 2 x_1 x_2} - 15 \le 0,
\quad 1.0\times10^{-5} \le x_i \le 5.0\times10^{-4} \ (i = 1, 2)    (12)
The training samples are shown in Table 1. With the fuzzy solution method for multiple targets optimization based on the fuzzy max-min neural network, the optimization solution obtained is

X_n^* = [1.32\times10^{-4}, 1.01\times10^{-4}]^T, \quad F_1(X_n^*) = 4.74, \quad F_2(X_n^*) = 4.79.

With the fuzzy integrated judgement method, the optimization solution obtained is

X_f^* = [1.28\times10^{-4}, 1.11\times10^{-4}]^T, \quad F_1(X_f^*) = 4.73, \quad F_2(X_f^*) = 4.53.

By comparison, the results of the fuzzy solution for multiple targets optimization based on the FMM neural network meet the requirements, and the error in the design variables is small.
Table 1 Training samples

np   F1(X) (input)   μF1(X) (output, satisfaction)   F2(X) (input)   μF2(X) (output, satisfaction)
1    2.2             1.0                             1.4             1.0
2    4.1             0.9                             3.7             0.9
3    4.9             0.8                             4.7             0.8
4    5.6             0.7                             5.6             0.7
5    6.3             0.6                             6.5             0.6
6    7.0             0.5                             7.3             0.5
7    7.7             0.4                             8.1             0.4
8    8.5             0.3                             9.1             0.3
9    9.5             0.2                             10.4            0.2
10   10.0            0.1                             12.0            0.1
6 Conclusion

The fuzzy solution is an effective method for solving multiple targets optimization problems. However, the selection of the membership function in the optimization process directly affects the optimization solution, and the designer's subjective preferences are hard to reflect with membership functions given by explicit expressions. A new fuzzy multiple targets optimization algorithm based on the FMM neural network is advanced in this paper, which solves the problem of describing the membership function and should find good applications.
References

1. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley Publishing Company, New York (1989)
2. Fonseca, C.M.: Genetic Algorithms for Multi-objective Optimization: Formulation, Discussion and Generalization. In: Proceedings of the Fifth International Conference on Genetic Algorithms, San Mateo, pp. 416–423 (1993)
3. Quagliarella, D., Vicini, C.: Coupling Genetic Algorithms and Gradient Based Optimization Techniques. In: Genetic Algorithms and Evolution Strategies in Engineering and Computer Science, Recent Advances and Industrial Applications, Michigan, pp. 289–309 (1997)
4. Rao, S.S.: Multi-objective Optimization of Fuzzy Structural Systems. International Journal for Numerical Methods in Engineering 6, 1157–1171 (1987)
5. Patrick, K.S.: Fuzzy min-max Neural Networks – Part 1: Classification. IEEE Transactions on Neural Networks 2, 776–786 (1992)
6. Deng, B., Wang, J., Huang, H.: Fuzzy Optimization of Mechanical Engineering and Fuzzy Integration Judgement in Design of Multiple Targets Optimization. Mechanical Science and Technology 1(suppl.), 13–17 (1996)
7. Huang, H.: Principle and Application of Fuzzy Optimization in Mechanical Design. Science Publishing Company, Beijing (1997)
Fixed-Structure Mixed Sensitivity/Model Reference Control Using Evolutionary Algorithms Pitsanu Srithongchai, Piyapong Olranthichachat, and Somyot Kaitwanidvilai*
Abstract. This paper proposes a mixed sensitivity/model reference control using evolutionary algorithms. The proposed technique can solve the problem of complicated and high order controller of conventional H∞ optimal control. In addition, time domain specifications such as overshoot, undershoot, rise time can be incorporated in the design by formulating the appropriate fitness function of compact genetic algorithms. By the proposed approach, robustness and performance in terms of frequency domain and time domain specifications can be achieved simultaneously. Simulation results in a servo system verify the effectiveness of the proposed technique. Keywords: Fixed-structure robust
H∞ control, Compact genetic algorithm, Model reference control.
1 Introduction Recently, there has been much research that intends to develop a robust controller for system under conditions of uncertainty, parameter changes, and disturbances. As shown in previous work, H∞ optimal control is a powerful technique to design a robust controller; however, the order of controller designed by this technique is much higher than that of the plant. It is not easy to implement this controller in practical applications. To solve this problem, the design of a fixed-structure robust controller has been proposed and has become an interesting area of research because of its simple structure and acceptable controller order. A more recent control technique uses computational intelligence such as genetic algorithms (GA’s) or Particle Swarm Optimization (PSO) in adaptive or learning control. Karr and Gentry [1], [2] applied GA in the tuning of fuzzy logic control which was applied to a pH control process and a cart-pole balancing Pitsanu Srithongchai . Piyapong Olranthichachat . Somyot Kaitwanidvilai Faculty of Engineering, King Mongkut’s Institute of Technology Ladkrabang, Ladkrabang, Bangkok 10520 Thailand
[email protected] *
H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 669–676. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
system. Hwang and Thomson [3] used GA to search for optimal fuzzy control rules with prior fixed membership functions. Somyot and Manukid [4] proposed a GA based fixed structure H∞ loop shaping control to control a pneumatic servo plant. To obtain parameters in the proposed controller, genetic algorithm is proposed to solve a specified-structure H∞ loop shaping optimization problem. Infinity norm of transfer function from disturbances to states is subjected to be minimized via searching and evolutionary computation. The resulting optimal parameters make the system stable and also guarantee robust performance. However, all of past developed techniques mentioned above are based on the frequency domain specifications. The reason is that it is convenient to design a robust controller by considering frequency domain. In fact, time domain specifications are also very important to be considered in the design. Unfortunately, the relation between time domain and frequency domain is currently unclear. Moreover, incorporating time domain specifications into the robust control synthesis by using analytical methods is very difficult. To simplify the problem, we propose a compact genetic algorithm for solving the pre-specified controller that can achieve both the time and frequency domain specifications. In the future work, the proposed technique will be used for controlling the servo system in the HDD visual inspection machine. The remainder of this paper is organized as follows. Section 2 presents the proposed cGA based robust/model reference control. The compact genetic algorithm for designing a fixed structure controller is also described in this section. Section 3 presents simulation results in a servo system when the proposed technique is applied. Section 4 concludes the paper.
2 cGA Based Robust/Model Reference Control

According to the standard procedure of robust control [5], there are many techniques for designing a robust controller for a general plant, for example the mixed sensitivity approach, mu-synthesis, H∞ loop shaping, etc. However, controllers designed by these techniques have a complicated structure and high order; the order of the controller depends on the order of both the nominal plant and the weighting functions. It is well known that a high-order or complicated controller is not desired in practical work. To overcome this problem, a fixed-structure robust controller is designed by using cGA, which solves the H∞ fixed-structure control problem that is difficult to solve analytically. The details of the proposed technique are as follows.

Controller structure selection. K(x) is a structure-specified controller. In most cases this controller has a simple structure such as a PID or lead-lag configuration. In this paper we select K(x) as the lead-lag controller shown in (1):
K(x) = \frac{a s + b}{s + c}    (1)
The set of controller parameters x is evaluated so as to maximize the objective function. In this case the controller parameter set is

x = [a, b, c]    (2)
In the proposed technique, the objective function is formulated as

F = w_1 F_1 + F_2, when J_{cost1} < 1 and the plant is stabilized by the controller;
F = 0.001 (or another small number), when J_{cost1} \ge 1 or the plant is not stabilized by the controller.    (3)

where F_1 = 1/J_{cost1}, F_2 = 1/J_{cost2}, and w_1 is a specified weighting factor. J_{cost1} is the cost function obtained from the mixed-sensitivity approach,

J_{cost1} = \left\| \begin{matrix} W_1 T \\ W_2 S \end{matrix} \right\|_\infty    (4)
where W_1 is the weighting function used to specify the plant perturbation and W_2 is used to specify the performance and disturbance attenuation of the system; T is the plant's complementary sensitivity function and S is the plant's sensitivity function. J_{cost2} is the cost function obtained from the model reference approach. Assume that a repeated input command, which is usual in industrial applications, is applied to the plant. The model reference cost function can then be written as

J_{cost2} = \int_0^{T_p} e_r^2 \, dt    (5)
where T_p is the period of the repeated reference input and e_r = (y_r - y) is the difference between the desired response and the actual output y. The desired response y_r is determined by a reference model; generally the reference model is specified as a first- or second-order filter. In this paper a first-order reference model is adopted:

G_r(s) = \frac{y_r}{r} = \frac{1}{\tau s + 1}    (6)
where r is the input command, τ is a time constant, and G_r(s) is the transfer function of the reference model. Assume that the reference input, a step input command, is given a priori; this command is common in servo systems since the plant repeatedly performs a specific motion that ends within a fixed duration. After specifying the structure of the controller K(x), the cGA is used to tune the controller parameters to maximize the fitness function in (3). Based on the principle of the compact genetic algorithm, an optimal solution can be evolved. The compact genetic algorithm and the proposed technique are summarized as follows:
Step 1. Specify the controller structure. In this paper the structure is selected as the lead-lag controller; the optimal controller parameters are the unknowns that cGA attempts to evaluate.

Step 2. Initialize the probability vector p. The number of elements of p is the number of unknown parameters times the number of bits per unknown. For example, with 9 unknown parameters and 8 bits per unknown, the length of the probability vector is m = 9×8 = 72. The initial probability of every element is set to 0.5.

Step 3. Generate s individuals from the probability vector, where s is the tournament selection size. In this paper s is selected as 10, and S denotes an unknown controller parameter vector:

for i = 1 to s do S[i] = generate(p)

where generate is the procedure that creates a new individual from the probability vector p.

Step 4. Use (3) to compute the fitness value of each S. Keep the S with the maximum fitness as the winner and the S with the minimum fitness as the loser:

winner, loser = compete(S[1], S[2], …, S[s])

where compete is the comparison procedure.

Step 5. Update the probability vector p from the winner and loser. The following pseudo code describes the update:

for i = 1 to m do
  if winner[i] ≠ loser[i]
    if winner[i] = 1 then p[i] = p[i] + (1/n)
    else p[i] = p[i] − (1/n)
  if p[i] > 1 then p[i] = 1
  if p[i] < 0 then p[i] = 0

where n is the population size and m is the chromosome length.

Step 6. Check convergence:

for i = 1 to m do, if p[i] > 0 and p[i] < 1 then return to Step 3

If the vector has converged, p represents the optimal solution. More details of the compact genetic algorithm can be found in [8].
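A minimal Python sketch of Steps 2–6 follows. The fitness used here (counting ones) merely stands in for the controller fitness of Eq. (3), and the chromosome length, virtual population size and tournament size are illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, s = 24, 100, 10               # chromosome length, virtual population size, tournament size
fitness = lambda bits: bits.sum()   # stand-in fitness; the paper would decode bits and apply Eq. (3)

p = np.full(m, 0.5)                      # Step 2: initialize probability vector
while np.any((p > 0.0) & (p < 1.0)):     # Step 6: loop until every bit has converged
    pop = (rng.random((s, m)) < p).astype(int)        # Step 3: generate s individuals
    fits = np.array([fitness(ind) for ind in pop])    # Step 4: evaluate
    winner, loser = pop[np.argmax(fits)], pop[np.argmin(fits)]
    diff = winner != loser                            # Step 5: move p toward the winner
    p[diff & (winner == 1)] += 1.0 / n
    p[diff & (winner == 0)] -= 1.0 / n
    p = np.clip(p, 0.0, 1.0)
print("converged bit pattern:", p.astype(int))
```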
3 Control of a Servo System

The servo system in [4] is used to illustrate the effectiveness of the proposed technique. The servo plant can be written as
G(s) = \frac{10 K_p K_a K_e}{N s (1 + 0.05 s)}    (7)

where K_a, K_p, K_e and N are the amplifier gain, phase detection gain, encoder gain and counter gain, respectively. In this paper the nominal parameters of the servo plant are set as follows [4, 9]:
K_a = 20, \quad K_p = 0.06, \quad K_e = 5.73, \quad N = 1.

The two weighting functions for the mixed-sensitivity approach are specified as in [4]:
W_1 = \frac{0.5 s + 0.05}{s^2 + 0.2 s + 6.3265}, \qquad W_2 = \frac{0.6 s + 0.2}{s^2 + 8}    (8)
To illustrate the control performance of the proposed controller, computer simulations were performed. The reference model in this simulation study is selected as

G_r(s) = \frac{1}{0.1 s + 1}    (9)
The time constant of the reference model is here selected as 0.1. The controller parameters are unknown, but the structure of the controller and the ranges of the parameters are specified in the design. In this paper the controller structure in (1) is adopted, and the optimal controller parameters are assumed to lie in the following ranges:

x: \quad 0 \le a \le 10, \quad 0 \le b \le 100, \quad 0 \le c \le 1000    (10)
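For orientation, the mixed-sensitivity part of the fitness can be approximated numerically as sketched below. This is not the authors' implementation: the infinity norm is approximated on a frequency grid, the stacked-norm convention and the reconstructed W2 denominator are assumptions, the stabilization check required by Eq. (3) is omitted, and Jcost2 would additionally require a time-domain simulation against Gr(s).

```python
import numpy as np

def G(s):    # nominal servo plant, Eq. (7) with Ka=20, Kp=0.06, Ke=5.73, N=1
    return 10 * 0.06 * 20 * 5.73 / (s * (1 + 0.05 * s))

def W1(s):   # robustness weight, Eq. (8)
    return (0.5 * s + 0.05) / (s ** 2 + 0.2 * s + 6.3265)

def W2(s):   # performance weight, Eq. (8) (denominator assumed to be s^2 + 8)
    return (0.6 * s + 0.2) / (s ** 2 + 8)

def jcost1(a, b, c, w=np.logspace(-2, 3, 2000)):
    """Grid approximation of ||[W1*T; W2*S]||_inf for K(s) = (a s + b)/(s + c)."""
    s = 1j * w
    L = G(s) * (a * s + b) / (s + c)
    S = 1 / (1 + L)
    T = L * S
    return np.max(np.sqrt(np.abs(W1(s) * T) ** 2 + np.abs(W2(s) * S) ** 2))

def fitness(a, b, c, w1=0.25, jcost2=1.0):
    """Eq. (3); jcost2 would come from simulating the closed loop against Gr(s)."""
    j1 = jcost1(a, b, c)
    return 0.001 if j1 >= 1 else w1 / j1 + 1.0 / jcost2

# evaluate the controller reported in (11); the exact value depends on the assumptions above
print(jcost1(3.6243, 21.3082, 205.02))
```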
The weight w_1 is selected as 0.25. We select the genetic parameters as population size = 100, maximum generations = 50 and individual (tournament) size s = 10. After running the compact genetic algorithm for 9 generations, an optimal solution is obtained. The optimal controller evolved by the proposed technique is

K(x) = \frac{3.6243 s + 21.3082}{s + 205.0200}    (11)

The obtained optimal solution has an infinity norm (J_{cost1}) of 0.9598 and a fitness value of 0.9552, which means that the optimal controller is robust. Fig. 1 shows the convergence of the fitness function; clearly, the solution converges at the 9th generation. For comparison purposes, the time domain responses of the proposed controller, the robust controller in [4], the reference model, and a conventional lead-lag controller are investigated. The parameters of the conventional lead-lag controller are tuned by considering only the time domain response, using a trial-and-error method.
Fig. 1 Fitness Value versus Iteration in compact Genetic Algorithm
Fig. 2 Step response of the proposed optimal controller, controller in [4], conventional lead-lag controller at the nominal plant and the reference model
Fig. 2 shows the step responses of all controllers. As seen in this figure, the settling times of the proposed and conventional controllers are almost the same and are better than that of the robust controller in [4]; also, there is no overshoot in any of the responses. To study the robust performance of the controllers, the step responses of the perturbed plant, in which the parameters K_p, K_a, K_e and N are changed to 0.072, 24, 6.876 and 0.8 respectively (a 20% perturbation), are investigated. Fig. 3 shows the step responses of all controllers when the plant is perturbed. As seen in this figure, there is overshoot in the responses of the robust controller in [4] and of the conventional controller. Clearly, the response of the proposed technique is over-damped and shows no oscillation. This indicates that the proposed technique is robust and its response is better than that of the other controllers.
Fig. 3 Step response of the proposed optimal controller, controller in [4], and conventional lead-lag controller at the perturbed plant

Table 1 Comparison of the controller of the proposed technique and the GA based robust controller in [4]

                           Proposed controller                  Controller in [4]                Conventional lead-lag controller
Controller                 (3.6243s + 21.3082)/(s + 205.0200)   (0.8878s + 2.0508)/(s + 19.7244) (0.145s + 0.85)/(s + 8.5)
Infinity norm (Jcost1)     0.9598                               0.8323                           1.0401

Table 1 shows the controller parameters and the infinity norm (J_{cost1}). As shown in this table, both the proposed controller and the controller in [4] are robust, since their infinity norms are less than 1. In contrast, the conventional controller is not robust, as shown by its infinity norm.
4 Conclusions

The proposed technique is applied to control a servo system. By incorporating the concepts of robust control and model reference control, a robust controller with good performance can be achieved. In the robustness analysis, the infinity norm of the proposed system is less than one, which means that the designed system is robust. The response of the proposed controller at the perturbed plant is much better than those of the robust controller in [4] and the conventional lead-lag controller. From the above results it can be concluded that time and frequency domain performance as well as robustness can be achieved by the proposed controller. In future work the proposed technique will be adopted to control a servo system of an automatic visual inspection machine.
Acknowledgements. This research work is financially supported by Industrial/University Cooperative Research Center in Data Storage Technology and Applications, King Mongkut's Institute of Technology Ladkrabang and National Electronics and Computer Technology Center, National Science and Technology Development Agency (Project No. HDDB51-004) and Belton Industrial (Thailand) Co., Ltd.
References

1. Karr, C.L., Gentry, E.J.: Fuzzy Control of pH Using Genetic Algorithms. IEEE Trans. Fuzzy Syst. 1, 46–53 (1993)
2. Karr, C.L., Weck, B., Massart, D.L., Vankeerberghen, P.: Least Median Squares Curve Fitting Using a Genetic Algorithm. Eng. Application Artif. Intell. 8(2), 177–189 (1995)
3. Hwang, W.R., Thompson, W.E.: An Intelligent Controller Design Based on Genetic Algorithm. In: Proc. 32nd IEEE Conf. Decision Control, December 15-17, pp. 1266–1267 (1993)
4. Chen, B.S., Cheng, Y.M.: A Structure-Specified Optimal H∞ Control Design for Practical Applications: A Genetic Approach. IEEE Trans. Control Systems Technology 6(6) (November 1998)
5. Kaitwanidvilai, S., Parnichkun, M.: Genetic Algorithm Based Fixed-Structure Robust H Infinity Loop Shaping Control of a Pneumatic Servo System. International Journal of Robotics and Mechatronics 16(4) (2004)
6. MATLAB Robust Control Toolbox, Mathworks Co., Ltd., http://www.mathworks.com
7. MATLAB Genetic Algorithms Toolbox, http://www.ie.ncsu.edu/mirage/GAToolBox/gaot
8. Harik, G.R., Lobo, F.G., Goldberg, D.E.: The Compact Genetic Algorithm. IEEE Trans. on Evolutionary Computation 3(4) (November 1999)
9. Kuo, B.C.: Automatic Control Systems. Prentice-Hall, Englewood Cliffs (1991)
ANN-Based Multi-scales Prediction of Self-similar Network Traffic Yunhua Rao, Lujuan Ma, Cuncheng Zhao, and Yang Cao*
Abstract. Self-similarity is a ubiquitous phenomenon spanning diverse network environments and has a great effect on network performance. The long-range dependence (LRD) structure in self-similar network traffic can be exploited for traffic prediction, which is very useful for resource allocation, but such prediction is difficult because of the multi-scale and nonlinear features of the traffic. Taking these features into account, a prediction algorithm based on an ANN is proposed in this paper. First, an ANN for multi-scale traffic prediction is constructed; the processing of the input/output vectors, the parameter selection and the training scheme are discussed. Then artificial traces are generated with the Fractional-ARIMA model and used in the multi-scale prediction experiments. The results show that the algorithm can predict self-similar network traffic at multiple scales, which is useful for optimizing network control schemes and improving network performance.

Keywords: ANN, self-similar, network traffic.
1 Introduction

Since Leland reported self-similarity (i.e. long-range dependence, LRD) in network traffic, it has been found that the LRD character has a significant effect on network performance, especially through the persistent burstiness that manifests itself at different time scales, for example its impact on TCP congestion control and on the overflow probability of queueing systems [1], [2], [3]. Because of the significant influence of self-similarity on network performance and the feasibility of traffic prediction based on the LRD structure, it is very meaningful to exploit this character in resource allocation to guarantee QoS and improve network performance.

Yunhua Rao · Lujuan Ma · Yang Cao
School of Electronic Information, Wuhan University, Wuhan 430072, Hubei, China
Yunhua Rao National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China Cuncheng Zhao Yichang Testing Technique Research Institute, Yichang 443003, Hubei, China
[email protected] H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 677–683. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
There has been some research on these problems. A prediction algorithm for self-similar network traffic is incorporated in the SAC (selective aggressiveness control) congestion control framework [2]; SAC can reclaim bandwidth when the network is predicted to be idle and then adjusts the degree of aggressiveness according to a confidence function. The more self-similar the network traffic is, the more effective this algorithm becomes. This prediction-based congestion control lessens the effect of delayed feedback information and enhances the performance of TCP in large-scale networks with a large delay-bandwidth product. Sang-Jo Yoo improved the multiplexing gain of broadcasting through prediction of VBR MPEG video traffic [5]. N.G. Duffield et al. predicted QoS performance by employing the fluctuations in self-similar network traffic [4]. All this research shows that the prediction of self-similar network traffic is useful for resource utilization and performance improvement.

Research on self-similar traffic prediction has been carried out widely. The prediction formula for fractional Brownian motion proposed by Gripenberg and Norros is the theoretical basis of current self-similar traffic prediction algorithms; it shows that the most recent data have the greatest influence on the next-step prediction. However, this forecast formula needs the Hurst parameter in advance and does not consider the SRD in network traffic, so it is not practical. Shu et al. [6] proposed a prediction algorithm based on the Fractional-ARIMA model, which possesses SRD and LRD features simultaneously; this algorithm is complex and less accurate because it needs to estimate the Hurst parameter of network traffic on-line. Tuan et al. [2] predicted network traffic with a heuristic scheme based on its self-similarity; this scheme is very simple, but its precision is strongly affected by the selected time scale, the number of samples and the Hurst parameter. The Wiener-Kolmogorov algorithm is direct and simple, but it does not take self-similarity into account.

The difficulty of self-similar network traffic prediction stems mainly from its nonlinearity and the coexistence of LRD and SRD; the prediction scale is also an issue in practical applications. In this paper we study the multi-scale prediction of self-similar network traffic with an ANN, which also considers the relation between the impact of LRD and SRD traffic on system performance. The Fractional-ARIMA (fractional autoregressive integrated moving average) model, which captures both LRD and SRD characteristics, is adopted to validate the prediction algorithm. The results show that the ANN prediction algorithm can overcome the nonlinearity and predict network traffic accurately at multiple time scales.

This paper is organized as follows: Section 2 introduces the concepts of self-similar network traffic used in the later discussion; Section 3 presents the multi-scale ANN prediction algorithm and the experiments; Section 4 concludes the paper.
2 Self-similar Process and Queueing Model

A self-similar process is a stochastic process that is statistically scale-invariant. If a continuous-time stochastic process X(t) (t = 0, 1, …) satisfies

X(t) \overset{D}{=} a^{-H} X(at), \quad a > 0, \ t \ge 0, \ 0 < H < 1    (1)

then it is statistically self-similar, where \overset{D}{=} denotes equality in distribution and H is the Hurst parameter. If Eq. (1) holds for all finite-dimensional distributions, X(t) is an exactly self-similar stochastic process; if Eq. (1) holds only for the mean and variance of the process, it is an asymptotically self-similar stochastic process.

Let X(t) (t = 0, 1, …) be a stationary stochastic process with mean m and variance σ². If its autocorrelation function r(k) satisfies r(k) ~ k^{-d} L_1(k) as k → ∞, where 0 < d < 1 and \lim_{t\to\infty} L_1(tx)/L_1(t) = 1 for all x > 0, then X(t) is said to be long-range dependent, i.e. \sum_{k=0}^{\infty} r(k) → ∞.

Two stochastic processes are often used to model self-similarity: fractional Brownian motion (FBM) and the FARIMA process. If a process Y(t) (t = 0, 1, …) is Gaussian and has self-similar stationary increments with Hurst parameter H (H-sssi), it is said to be FBM with Hurst parameter H (0 < H < 1). The FARIMA(p, d, q) process Y_k is defined by

\varphi(B) \Delta^d Y_k = \theta(B) \varepsilon_k    (2)

where \varepsilon_k is a white Gaussian process, B is the backward (backshift) operator, the fractional difference operator is \Delta^d = (1 - B)^d = \sum_k C_d^k (-B)^k, \varphi(B) = 1 - \varphi_1 B - \dots - \varphi_p B^p and \theta(B) = 1 - \theta_1 B - \dots - \theta_q B^q, with p and q integers. If d = 0, Y is a short-range dependent ARMA(p, q) (autoregressive moving average) process. Thus the FARIMA(p, d, q) process includes LRD and SRD at the same time.
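The scaling behaviour behind Eq. (1) can be checked numerically with the classical aggregated-variance method, in which the variance of the m-aggregated series decays like m^(2H−2) for a self-similar process. The sketch below is a standard estimator, not part of the paper; the block sizes are arbitrary.

```python
import numpy as np

def hurst_aggregated_variance(x, block_sizes=(1, 2, 4, 8, 16, 32, 64, 128)):
    """Estimate H from the variance of block-averaged series: Var(X^(m)) ~ m^(2H - 2)."""
    x = np.asarray(x, dtype=float)
    sizes, variances = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        if n_blocks < 2:
            continue
        blocks = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        sizes.append(m)
        variances.append(blocks.var())
    slope = np.polyfit(np.log(sizes), np.log(variances), 1)[0]   # slope = 2H - 2
    return 1.0 + slope / 2.0

# white noise has no long-range dependence, so the estimate should be near H = 0.5
print(hurst_aggregated_variance(np.random.default_rng(1).normal(size=20000)))
```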
3 The Multi-scale Prediction Algorithm Based on ANN

The prediction of self-similar network traffic differs from ordinary prediction problems. First, LRD and SRD coexist in the traffic, and they call for different prediction mechanisms. Second, the nonlinear, heavy-tailed character of the traffic is difficult to fit with exponential functions. Third, applications require the algorithm to predict adaptively at different time scales. Finally, the prediction algorithm must meet the demands of real-time applications.
Considering the above difficulties, the BP neural network is regarded as a workable method for traffic prediction. It has been proved in theory that a three-layer BP neural network can approximate any nonlinear function, and it has been applied successfully to nonlinear system identification. For training samples {x_i, y_i} (i = 1, 2, …, k) there exists a mapping G such that y_i = G(x_i); training a BP neural network with these samples is a process of adaptively searching for a mapping that approximates G to a predefined accuracy. The hidden layer nodes can preserve the characteristics of both LRD and SRD, so a BP neural network with enough hidden nodes can realize the nonlinear mapping from input to output and automatically adapt the prediction at different time scales.
3.1 Construction of the BP Neural Network

In this paper the BP neural network is constructed with three layers: input, hidden and output. To realize the nonlinear mapping, the hidden layer nodes must store the parameters of LRD and SRD at the same time. The next-step prediction of the SRD part mainly depends on the latest several values and is a linear polynomial, so the number of hidden layer nodes devoted to the SRD feature equals the order of the SRD. Since the LRD can be expressed by the Hurst parameter alone, it needs one node. In addition, the network must preserve the mean and variance of the self-similar traffic, so two more hidden layer nodes are needed.

The number of input nodes should also take the requirement of multi-scale prediction into account. At small time scales the effect of SRD dominates the traffic, and the input vector must hold all the information relevant to the next-step prediction, so the number of input nodes is chosen equal to the order of the SRD. At large time scales the effect of SRD decreases and LRD dominates; according to the prediction formula for fractional Brownian motion, the most recent data have the strongest impact on the next-step prediction, so only the mean of the latest aggregated traffic needs to be input, and the number of input nodes is set to one at large time scales.

For BP neural networks the usual activation is the sigmoid function, which reflects the nonlinear relationship between the input and output vectors. But self-similar network traffic has correlation structure over several time scales, and a rapidly decaying exponential function cannot capture its heavy-tailed behavior. Since only one hidden node is needed to store the Hurst parameter information, we select a sub-exponential function as the activation of this node, while the other nodes keep the mapping function f(x) = 1/(1 + e^{-\ln x}). The LRD and SRD information are thus preserved in separate nodes.
3.2 Experiments

To validate the BP neural network based prediction algorithm, experiments are carried out with artificial traces. First, the traces are generated with
the Fractional-ARIMA model, which captures LRD and SRD at the same time. Then the BP neural network is trained with these samples at different time scales. Finally, the real and predicted data are compared and some useful results are obtained.

3.2.1 Generation of the Trace
With the Fractional-ARIMA(1, d, 0) model (d = H − 0.5), 200 000 synthetic traffic samples are generated for Hurst parameters H of 0.55, 0.65, 0.70, 0.80 and 0.85 respectively, with mean zero and variance one. When the time scale of the aggregated traffic is not larger than the order of the SRD, the number of input nodes equals the order of the SRD; when it is larger, the LRD part is predicted and there is only one input node. At every time scale, the input value of each node is the mean of the aggregated traffic. (A sketch of generating such a trace is given below.)
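A trace of the kind used here can be generated, for example, by truncating the MA(∞) representation of the fractional integration (1 − B)^(−d) with d = H − 0.5. The sketch below produces the simpler FARIMA(0, d, 0) case (the AR(1) part of FARIMA(1, d, 0) would be added by an extra recursive filter); the truncation length is an assumed parameter.

```python
import numpy as np

def farima_0d0(n, H, trunc=1000, seed=0):
    """Approximate FARIMA(0, d, 0) series with d = H - 0.5 via truncated
    MA(inf) weights of the fractional integration (1 - B)^(-d)."""
    d = H - 0.5
    psi = np.empty(trunc)
    psi[0] = 1.0
    for k in range(1, trunc):
        psi[k] = psi[k - 1] * (k - 1 + d) / k          # recursion for the MA weights
    eps = np.random.default_rng(seed).normal(size=n + trunc)
    # x_t = sum_k psi_k * eps_{t-k}; take only outputs with a full window of weights
    x = np.convolve(eps, psi, mode='full')[trunc - 1: trunc - 1 + n]
    return (x - x.mean()) / x.std()                    # zero mean, unit variance as in the experiments

trace = farima_0d0(200_000, H=0.8)
```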
3.2.2 The Training Process

The training process is as follows. First, the initial values are selected: the weights are random values between 0 and 0.05 and the thresholds are between 0.5 and 1.0. Second, the sample data are fed in at different time scales. Third, the output and the error are calculated; if the error is less than 0.2 the process goes to the fifth step, otherwise to the next step. Fourth, the weights are modified and the process returns to the second step. Fifth, the weights and related information are stored and training ends. The training is carried out with the network traffic aggregated at different time scales.

3.2.3 The Prediction Result
The trained BP neural network is used to predict Fractional-ARIMA time sequences with different Hurst parameters. The results are shown in Fig. 1, Fig. 2 and Fig. 3. From these figures it can be seen that the error is large for the first 500 data points and then decreases as more data are predicted, because the network adjusts itself to the incoming data. The prediction error for the sequence with H = 0.65 differs from that with H = 0.85 in Fig. 2 and Fig. 3, because the burstiness of self-similar traffic is expressed by the Hurst parameter: the larger H is, the more intense the burstiness. There is a salient increase in the burstiness of self-similar traffic when H = 0.75, called jumping burstiness, and its presence has a remarkable impact on queueing performance, to which enough attention should be paid.
Fig. 1 The real data and predicted data

Fig. 2 The prediction error with H = 0.65

Fig. 3 The prediction error with H = 0.85
4 Conclusions

In self-similar network traffic, the presence of LRD influences prediction at large time scales, but ordinary prediction algorithms cannot capture this feature and their performance is weak. In this paper an ANN based multi-scale prediction algorithm for self-similar network traffic is discussed. It shows that both the long-range and short-range dependence of the traffic influence the performance of the prediction algorithm. Experiments validate the algorithm and some useful results are obtained.
Acknowledgments. This work was supported by the open research fund of the National Mobile Communications Research Laboratory, Southeast University, under grant No. W200705 and by the China Postdoctoral Science Foundation under grant No. 20070410952.
References

1. Park, K., Willinger, W.: Self-Similar Network Traffic and Performance Evaluation. Wiley Interscience, New York (2000)
2. Tuan, T., Park, K.: Multiple Time Scale Congestion Control for Self-Similar Network Traffic. Performance Evaluation 36(1), 359–386 (1999)
3. Field, A.J., Harder, U., Harrison, P.G.: Measurement and Modeling of Self-Similar Traffic in Computer Networks. IEE Proc. Commun. 151(4), 335–363 (2004)
4. Duffield, N.G., Lewis, J.T., Neil, C.: Predicting Quality of Service for Traffic with Long-Range Fluctuations. In: Proceedings of IEEE International Conference on Communications, pp. 473–477 (1995)
5. Yoo, S.J.: Efficient Traffic Prediction Scheme for Real-Time VBR MPEG Video Transmission Over High-Speed Networks. IEEE Trans. on Broadcasting 48(1), 10–18 (2002)
6. Shu, Y., Jin, Z.: Traffic Prediction Using FARIMA Models. In: Proc. ICC, Canada, pp. 22–26 (1999)
Application of DM and Combined Grey Neural Network in E-Commerce Data Transmission Zhiming Qu*
Abstract. Using grey system (GS) theory, data mining (DM) and the radial basis function (RBF) neural network method, a new model, the combined grey neural network (CGNN) model, is set up, which aims at analyzing users' data transmission in E-commerce. The results show that, for short-term prediction of data transmission, the grey model (GM) is effective and the RBF network has a strong ability to learn and map. Combined with the time-dependent sequence data, the CGNN model captures both the trend and the fluctuation of the data. It is concluded that, compared with simple trend prediction and single-factor models, the CGNN model greatly improves the analysis of data transmission.

Keywords: DM, GS, CGNN, E-commerce, Data transmission.
1 Introduction

DM extracts useful information and knowledge from data that are abundant but incomplete, ambiguous and random [1, 2]; in practice, the extracted knowledge is implicit, previously unknown and potentially valuable. The data used by DM change swiftly, which requires DM to respond rapidly and to support decision-making efficiently [3, 4]. Grey system technology deals with the uncertainty of small samples and poor information, in which some pieces of information are known and others are unknown. By developing and generating the known information, the real world can be recognized and the operating behavior and evolution of the system can be grasped and described properly. Through preprocessing of the original data, the changing law of the grey system can be found, and the integrated functions of the grey system appear as an inner law. It is well known that implicit, unknown and useful information and knowledge are extracted by DM from largely incomplete and ambiguous application data, while the study objects of GST are based on the poor information generated from part of the known information, in order to extract valuable data and to properly recognize and effectively control the system behavior [5].

Zhiming Qu
School of Civil Engineering, Hebei University of Engineering, Handan 056038, China
[email protected]
H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 685–692. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
As a kind of DM method, neural network technology models a system from its inner relations and is well self-organized and self-adaptive. Neural networks can overcome the difficulties of traditional quantitative prediction and avoid the interference of subjective judgement. Although the objective system may be expressed in a complicated way, its development and change still follow logical laws, and the different functions within the whole system are coordinated and unified; thus, finding the inner developing regularities from the dispersed data is important. In the light of the description above, the combination of GST and DM can take great effect in data safety analysis. A grey neural network model is therefore built to solve data safety analysis problems, and by comparing it with the GM(1, 1) and RBF models and their results, the feasibility and advantages of combining DM and GST are demonstrated [6-8].

Neural network technology combines self-learning and self-adaptation seamlessly. It can be used to build real-time intelligent E-commerce applications with self-learning capability, which make real-time judgements and respond to E-commerce needs, enabling enterprises to build and deploy smart components rapidly. By monitoring the E-commerce process, the inherent laws, relation-detection patterns and priority principles of the discrete data can be analyzed and established in order to forecast changes in business results, so that E-commerce enterprises can seize market opportunities and provide their customers with unparalleled service [9].

E-commerce is the inevitable result of the development of modern information technology and an inevitable choice for future business operation. Under the global economic situation, the network infrastructure should be strengthened, the E-commerce activities of enterprises actively promoted, E-commerce safety legislation improved and the logistics distribution system perfected, so as to create a good environment for E-commerce development. At the same time, multimedia data mining, text data mining and web data mining should be strengthened [10, 11], and problems such as data quality, data security and confidentiality, and the integration of data mining with other commercial software should be solved. Using data warehouses, data mining and other modern information technologies, businesses can give full play to their unique advantages to promote technological and management innovation and gain an invincible position in the tide of E-commerce.
2 Grey Neural Network Model

Using GM(1, 1) to predict a sequence is one of the most frequent applications of the grey system. Because the grey model acquires its regularities from the data themselves, prediction errors may appear, and many independent models have to be set up for related sequences, which cannot sufficiently take the relations among the data sequences into account. Generally, the shortcoming of a single model can be made up by building combined models, such as with the RBF network [8].
2.1 RBF Neural Network

In the usual grey neural network the BP network is used; in this paper, however, the RBF neural network is adopted. In the RBF neural network, radial basis functions are used as the hidden neural units to form a three-layer network. The input layer nodes only transfer the input signal to the hidden layer, and the number of input nodes is determined by the dimension of the related factors, m. The hidden layer nodes, called RBF nodes, are composed of radial basis functions, and the output layer nodes are usually linear functions. Because the RBF neural network is used here to predict data safety analysis problems, there is one output node, which gives the prediction value. The topological structure of the network is shown in Fig. 1.
Fig. 1 Model of RBF network
In Fig. 1, X = (x_1, x_2, \dots, x_n)^T is the input sample, y the single output, U = (u_1, u_2, \dots, u_m)^T the hidden node output, and W = (w_1, w_2, \dots, w_m)^T the weights connecting the hidden and output nodes. The connection weight between the input and hidden nodes is 1, because the input nodes simply transmit the signal to the hidden layer. The output of the i-th hidden node is

u_i = R\left( \| X - C_i \| / \sigma_i \right), \quad i = 1, 2, \dots, m

where m is the number of hidden nodes, R is the radial function, X is the input sample, C_i is the center of the radial basis function of the i-th neuron, \sigma_i is its width parameter, and \| X - C_i \| is the Euclidean norm. The activation function of the hidden node can take different forms; the Gaussian kernel R(x) = \exp(-x^2) is usually used, and the output of the RBF network is

y = \sum_{i=1}^{m} w_i \exp\left( - \| X - C_i \|^2 / \sigma_i^2 \right).

Training the RBF network consists of two stages. In the first stage, C_i and \sigma_i of all hidden nodes are computed by the k-means clustering algorithm from all input samples. Then, according to the training samples and the least squares method, the weights w_i are solved after the hidden layer parameters have been determined. With reasonable input parameters and prediction principles, the input and output data of the radial basis function network can be computed, trained and predicted with the functions in the MATLAB toolbox. Combining this with the neural network, GM(1, 1) is used to set up the grey neural network prediction model. (A simplified two-stage fitting sketch is given below.)
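The two-stage training just described (clustering for the centers and widths, least squares for the output weights) can be sketched in Python as follows; the width heuristic and the use of scipy's kmeans2 are assumptions for illustration, not the authors' MATLAB implementation.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def fit_rbf(X, y, m=6, seed=0):
    """Two-stage RBF fit: k-means for centers/widths, least squares for output weights."""
    centers, labels = kmeans2(X, m, minit='++', seed=seed)
    # width heuristic: mean distance from each center to its points (assumed, not from the paper)
    sigmas = np.array([np.linalg.norm(X[labels == i] - centers[i], axis=1).mean() + 1e-6
                       for i in range(m)])
    Phi = np.exp(-np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) ** 2
                 / sigmas[None, :] ** 2)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)      # output weights by least squares
    return centers, sigmas, w

def predict_rbf(X, centers, sigmas, w):
    Phi = np.exp(-np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) ** 2
                 / sigmas[None, :] ** 2)
    return Phi @ w
```

A robust implementation would also guard against empty clusters; the sketch omits that for brevity.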
2.2 CGNN Prediction Model

When GM(1, 1) models are set up for several series, a series of prediction values can be obtained for the raw data, but a certain deviation still exists, related to the irregularity of the raw series. Thus, the relations among the series and the deviation between the predictions and the original data should be taken into account. The prediction values are taken as the input samples of the neural network and the original data as the output samples; with this data structure the network is trained and a set of well-trained weights and thresholds is obtained. The predictions of the different GM(1, 1) models at one or more time points form the input of the trained network, from which the final prediction for the next time point (or several points) is produced. The CGNN prediction algorithm is introduced in detail in reference [8].

In data safety analysis, it is complicated to decide which variables of the data system should enter the model at the beginning of the model setup. The explanatory variables must be selected correctly, which relies on the one hand on the model builder's knowledge of the system and on the other hand on quantitative analysis. The grey relational principle is an effective way to solve this problem.
2.3 Prediction with Grey System Model and CGNN Let y be the system variable, z1 , z2 , , zn are the positive or negative correlated factors. ε i is the relation on the basis of zi to y . Given the lower threshold value, ε 0 , zi can be deleted while ε i < ε 0 , in which parts of explaining variables relating to the weak relation can be deleted in the data safety system. To the network and using the method above, the input variables of network are selected, which can simplify the input samples greatly. Let λ1 be grey prediction value, λ2 the prediction value by neural network, λc prediction value by optimal combined model. The prediction errors are η1 ,η 2 and ηc respectively. The corresponding weighted coefficients are w1 , w2 and wc , and w1 + w2 = 1 , α c = w1α1 + w2α 2 . Thus, the errors and variations [8] are ηc = w1η1 + w2η 2 and Var (η c ) = Var ( w1η1 + w2η 2 ) = w12Var (η1 ) + w22Var (η 2 ) + 2w1 w2 Cov (η1 ,η 2 ) . As to w1 , in order to determine the functional minimum value, let
$$\operatorname{Var}(\eta_c) = w_1^2\operatorname{Var}(\eta_1) + (1-w_1)^2\operatorname{Var}(\eta_2) + 2w_1(1-w_1)\operatorname{Cov}(\eta_1,\eta_2),$$

set

$$\frac{\partial \operatorname{Var}(\eta_c)}{\partial w_1} = 0,$$

and note that

$$\frac{\partial^2 \operatorname{Var}(\eta_c)}{2\,\partial w_1^2} = \operatorname{Var}(\eta_1) + \operatorname{Var}(\eta_2) - 2\operatorname{Cov}(\eta_1,\eta_2), \qquad \frac{\partial^2 \operatorname{Var}(\eta_c)}{\partial w_1^2} \ge 0.$$

Then

$$w_1 = \frac{\operatorname{Var}(\eta_2) - \operatorname{Cov}(\eta_1,\eta_2)}{\operatorname{Var}(\eta_1) + \operatorname{Var}(\eta_2) - 2\operatorname{Cov}(\eta_1,\eta_2)} \quad\text{and}\quad w_2 = 1 - w_1.$$

Because $\operatorname{Cov}(\eta_1,\eta_2) = 0$, let $\operatorname{Var}(\eta_1) = \gamma_{11}$ and $\operatorname{Var}(\eta_2) = \gamma_{22}$; then the weighting coefficients of the combined prediction are

$$w_1 = \frac{\gamma_{22}}{\gamma_{11} + \gamma_{22}}, \qquad w_2 = \frac{\gamma_{11}}{\gamma_{11} + \gamma_{22}},$$

and

$$\alpha_c = \frac{\gamma_{22}}{\gamma_{11} + \gamma_{22}}\,\alpha_1 + \frac{\gamma_{11}}{\gamma_{11} + \gamma_{22}}\,\alpha_2 .$$
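A minimal numerical sketch of this weighting scheme (illustrative names; sample variances stand in for $\gamma_{11}$ and $\gamma_{22}$, and the covariance term is dropped as in the derivation above):

```python
import numpy as np

def combine_predictions(pred_grey, pred_nn, actual):
    """Weights proportional to the other model's error variance; returns w1, w2
    and the combined forecast series."""
    eta1 = np.asarray(actual, float) - np.asarray(pred_grey, float)  # grey-model errors
    eta2 = np.asarray(actual, float) - np.asarray(pred_nn, float)    # neural-network errors
    g11, g22 = eta1.var(ddof=1), eta2.var(ddof=1)
    w1 = g22 / (g11 + g22)
    w2 = g11 / (g11 + g22)
    return w1, w2, w1 * np.asarray(pred_grey) + w2 * np.asarray(pred_nn)
```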
3 Comparison between GM (1, 1) and CGNN Model 3.1 Prediction by GM (1, 1) For a given user, safety problems exist in the data he or she uploads and receives. In order to verify the grey neural network model, part of the data passing through the receiver is analyzed. The raw data are shown in Table 1 (MB denotes megabytes). The calculation steps of GM (1, 1) are not repeated here.
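For reference, the omitted GM (1, 1) computation can be sketched as follows; this is the textbook formulation (accumulated generating operation, least-squares estimation of the development coefficient and grey input, inverse accumulation), not code taken from the paper.

```python
import numpy as np

def gm11_fit_predict(x0, horizon=1):
    """Standard GM(1,1): fit on the raw series x0 and forecast `horizon` steps ahead."""
    x0 = np.asarray(x0, float)
    x1 = np.cumsum(x0)                           # accumulated generating operation (AGO)
    z1 = 0.5 * (x1[1:] + x1[:-1])                # background values
    B = np.column_stack([-z1, np.ones(len(z1))])
    (a, b), *_ = np.linalg.lstsq(B, x0[1:], rcond=None)
    k = np.arange(len(x0) + horizon)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    x0_hat = np.r_[x0[0], np.diff(x1_hat)]       # inverse AGO restores the series
    return x0_hat[:len(x0)], x0_hat[len(x0):]    # fitted values, forecasts
```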
3.2 Prediction by CGNN Model In the CGNN model, the selected vector of parameters is input into the safety system. Using the method of section 2.1, the data samples from Nov. 2004 to Dec. 2005 are trained, and the predicted data are given in Table 2. Comparing the prediction data with the actual data gives the residual data sequence
p = (-6.1, -8.3, -10.3, -8.2, -5.5, -3.6, -7.0, 4.2, 5.8, 8.2, 6.9, 9.0, 8.7),

Table 1 Raw data of safety analysis by GM (1, 1)

SN  Date       Data (MB)   SN  Date       Data (MB)
1   Nov. 2004  7382.67     7   May 2005   6583.11
2   Dec. 2004  7201.78     8   Jun. 2005  6505.78
3   Jan. 2005  7057.33     9   Jul. 2005  6442.67
4   Feb. 2005  6896.44     10  Aug. 2005  6352.00
5   Mar. 2005  6736.00     11  Sep. 2005  6275.11
6   Apr. 2005  6644.89     12  Oct. 2005  6227.11
Table 2 Prediction data by CGNN model

Date       Real data (MB)  Prediction (MB)   Date       Real data (MB)  Prediction (MB)
Nov. 2004  7382.67         7355.56           Jun. 2005  6505.78         6524.44
Dec. 2004  7202.22         7164.89           Jul. 2005  6442.67         6468.44
Jan. 2005  7057.33         7011.56           Aug. 2005  6352.00         6388.44
Feb. 2005  6896.44         6860.00           Sep. 2005  6275.11         6305.78
Mar. 2005  6736.00         6712.00           Oct. 2005  6227.11         6267.11
Apr. 2005  6644.89         6628.44           Nov. 2005  6178.22         6216.44
May 2005   6583.11         6552.00           Jan. 2006  6169.78         6210.22
and $\operatorname{Var}(\eta) = \dfrac{1}{n-1}\sum_{k=1}^{n}\left(p(k) - \bar{p}\right)^2 = 52.8$.
4 Results Analysis The prediction data obtained from the grey system model and from the CGNN model are compared in Table 3 (all data in MB). Table 3 shows that the prediction of the combined grey-system and neural-network model improves on that of either the GM (1, 1) or the RBF model alone. GM (1, 1) provides a good tool for data safety prediction: relying only on partial or poor information, it allows the received data to be predicted and tracked accurately, so the uncertain risk caused by false information and hacker activity is effectively avoided. GM is an effective method especially for short-term prediction of data safety. The RBF network, in turn, has an excellent ability to learn and to map; its output is an important and valuable reference for judging short-term safety, which demonstrates the reliability and effectiveness of data safety prediction with the RBF network model.

Table 3 Data comparison

Date       Real data  GM (1, 1)  RBF      Combined model
Nov. 2004  7382.67    7382.67    7355.56  7374.67
Dec. 2004  7202.22    7113.33    7164.89  7128.44
Jan. 2005  7057.33    7012.44    7011.56  7012.00
Feb. 2005  6896.44    6912.89    6860.00  6897.33
Mar. 2005  6736.00    6814.67    6712.00  6784.89
Apr. 2005  6644.89    6717.78    6628.44  6692.00
May 2005   6583.11    6622.22    6552.00  6601.78
Jun. 2005  6505.78    6528.44    6524.44  6527.11
Jul. 2005  6442.67    6435.56    6468.44  6445.33
Aug. 2005  6352.00    6344.44    6388.44  6356.89
Sep. 2005  6275.11    6254.22    6305.78  6269.33
Oct. 2005  6227.11    6165.33    6267.11  6194.67
Nov. 2005  6178.22    6166.67    6216.44  6181.33
Fig. 2 Data comparison
However, the combined grey-system and neural-network model possesses, to some degree, both trend and fluctuation properties when it is combined with the time-dependent sequence data, which is a great improvement over any single trend-prediction or single-factor method.
5 Conclusions By combining the grey prediction model with the neural network model, the CGNN model integrates GM (1, 1) and the RBF neural network and plays a very important role in analyzing the safety of the data a user receives. The grey system is applicable over a broad range and captures increasing or decreasing trends well, while the neural network is a self-adaptive, nonlinear dynamic method with a powerful ability to approximate nonlinear mappings, used here to model and describe the nonlinear characteristics of the system. DM and grey system theory compensate for each other and thereby overcome a series of problems in application. The combination of the grey system and the neural network method is effective for analyzing data safety and protecting users from attack. For short-term data safety analysis, the CGNN model proves to be more effective than any single model. However, problems such as the applicable conditions and the parameters of the model deserve further study.
References 1. Li, K., Liu, Y.A.: Agent Based DM Framework for the High Dimensional Environment. Journal of Beijing Institute of Technology 14, 113–116 (2005) 2. Qu, C.D., Xu, K., Han, Z.H., et al.: Approach to DM Based on Rough Sets and Decision Tree. Journal of Northeastern University 27, 481–484 (2006)
3. He, B.B., Fang, T., Guo, D.Z.: Uncertain Spatial DM Algorithms. Journal of China University of Mining & Technology 36, 121–125 (2007) 4. Pan, D., Shen, J.Y., Zhou, M.X.: Incorporating Domain Knowledge into DM Process: An Ontology Based Framework. Journal of Wuhan University 11, 165–169 (2006) 5. Liu, S.F.: Grey System Theory and Its Application. Science Press, Beijing (2000) 6. Xiong, H., Chen, D.J.: DM Technology Based on Grey System Theory. System Engineering and Electronic Technology 26, 77–85 (2004) 7. Wang, Y.F.: Mining Stock Price Using Fuzzy Rough Set System. Expert Systems with Applications 24, 13–23 (2003) 8. Peng, Y.: The Applications of Stock Analysis Based on Grey System Theory of DM. Changsha University of Science & Technology Press, Changsha (2006) 9. Hu, Y.T., Wang, Y.D.: Application of Artificial Neural Network in E-Commerce. Computer and Application 4, 4–6 (2003) 10. Qian, Y.Y., Huang, H.: Application of Data Mining in E-Commerce. Journal of Hefei Normal College 26, 62–64 (2008) 11. Jiang, L.X., Cai, Z.H.: Data Mining and Its Application in Electronic Commerce. Computer Engineering and Design 24, 74–77 (2004)
Application of Prediction Model in Monitoring LAN Data Flow Based on Grey BP Neural Network Zhiming Qu*
Abstract. The grey model is characterized by requiring little data and computing quickly, while the BP neural network gives better prediction results for nonlinear systems. By amending the grey prediction model with a BP neural network, a grey BP neural network model is established, which improves the prediction accuracy of LAN data flow. Actual LAN data flow collected over a certain period is used for simulation experiments with the grey BP neural network model, a single grey model and a single BP neural network model. The experimental results show that the grey BP neural network model is superior to either single model. The grey BP neural network model gives full play to the advantages of each single model, whereas a single model cannot avoid its own shortcomings: because a grey model predicts the system trend well only over an adjacent period, its accuracy deviates from the actual sequence as time goes on. For short-term LAN data flow the grey BP neural network model is adopted because of the characteristics of grey prediction. In conclusion, the grey BP real-time prediction model can forecast while guaranteeing the prediction accuracy; it determines a reasonable number of samples so that the accuracy of the neural network model is further improved while a fast running speed is maintained. Keywords: Grey BP neural network, Data flow, Network, Prediction model.
1 Introduction In predicting short-term LAN data streams, grey theory is used to establish the grey BP neural network model. Because LAN data flow is highly uncertain, the prediction of a grey model alone is not satisfactory, while a BP neural network can predict the residuals from the residual sequence between the actual values and the predictions. For nonlinear mappings the BP neural network achieves very good results in amending the model, so the prediction accuracy can be greatly improved.
Zhiming Qu, School of Civil Engineering, Hebei University of Engineering, Handan 056038, China
[email protected]
As a result, a combined model is created, which not only retains the fast computing of the grey model but also uses the neural network to predict the nonlinear behaviour of the system very well, so that satisfactory final results are obtained. In constructing the grey BP neural network, to meet the requirement of high speed over the prediction period of LAN data streams, the BP neural network algorithm is improved with a momentum term and an adaptive learning rate to speed up the convergence of the network and to keep the error from settling at a local minimum [1-3].
2 Prediction Model of LAN Data Flow Based on Grey System and Neural Network 2.1 Integration of Grey System and Neural Network With only a small amount of data, grey models such as GM (1, 1) and SCGM (1, 1) can predict the data sequences of nonlinear, uncertain systems. However, the prediction error is often high, especially when a mutation, switch, failure or disturbance occurs in the system: it interferes with the predicted sequence, causes abnormal data, undermines the stability of the projections, and increases the prediction error significantly. In a practical system some of these anomalies cannot be anticipated, but some mutation points are predictable from prior data, and a grey prediction model, even with error compensation, cannot make accurate projections for such data [1, 4, 5]. Compared with the grey prediction model, the artificial neural network has powerful features: it can learn from the predictable mutation data in order to predict certain special circumstances. But the artificial neural network has its own weakness of needing a large amount of training data which, more importantly, must be widely representative; in practice this requirement is difficult to meet. If an artificial neural network without adequate training is used for prediction, the error will be very large. In addition, if grey prediction were completely replaced by an artificial neural network, the vast majority of the smooth data would have to be trained on; too many training patterns would require a larger network structure, which reduces learning efficiency and consumes too many resources, and the small number of special data points could be annihilated by the large number of normal data.
2.2 Prediction Model of LAN Data Flow Based on Grey BP Neural Network From the analysis above it can be seen that neither grey prediction nor neural network prediction alone is appropriate for a mainly smooth process containing mutation points. As a result, grey neural network prediction is mainly used to compensate for the single grey model or the single neural network model. Take a discrete SISO system as an example: the original series is $X(t) = \{x(t-k+i)\}$ ($i = 1, 2, \ldots, k$), the length of the original series is $k$, and the prediction is made $l$ steps ahead. Grey prediction uses GM (1, 1); after $l$ iterations the prediction is $\hat{x}(t+l)$, and the
prediction error is $e(t+l) = x(t+l) - \hat{x}(t+l)$ [1, 6]. In the learning process of the grey neural network model, the neural network is a multiple-input, multiple-output BP network trained with the BP algorithm. For the smooth-oriented process, the raw data at time $L$ is expressed as $x^{(0)}(L)$, and the residual between it and the value $\hat{x}^{(0)}(L)$ simulated by the GM (1, 1) model is recorded. Firstly, a BP network model is established on the residual sequence: the residual sequence serves as the input sample of BP network training. With enough training, different input vectors produce the corresponding output vectors, the weights and thresholds being learned self-adaptively by the network, so the network can then make predictions from the residual sequence. Through the trained BP neural network the error can be predicted; correcting the old (grey) prediction with this predicted error yields the new prediction, which is the prediction of the grey BP neural network.
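A rough sketch of this residual-correction scheme is given below. The window length, hidden-layer size and the use of scikit-learn's MLPRegressor are illustrative assumptions rather than details from the paper; the momentum term and adaptive learning rate echo the improvements mentioned in Section 1.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def grey_bp_forecast(actual, grey_fit, grey_next, window=3):
    """Correct a grey forecast with a BP network trained on the residual sequence.
    actual / grey_fit: historical values and their GM(1,1) fits; grey_next: the raw
    grey forecast for the next step."""
    resid = np.asarray(actual, float) - np.asarray(grey_fit, float)
    X = np.array([resid[i:i + window] for i in range(len(resid) - window)])
    y = resid[window:]
    bp = MLPRegressor(hidden_layer_sizes=(8,), solver="sgd", momentum=0.9,
                      learning_rate="adaptive", max_iter=5000)
    bp.fit(X, y)                                   # learn the residual dynamics
    resid_next = bp.predict(resid[-window:].reshape(1, -1))[0]
    return grey_next + resid_next                  # corrected grey BP prediction
```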
3 Realization of Prediction Model of LAN Data Flow Based on Grey BP Neural Network According to the combined prediction model, a section of actual LAN data is used for simulation. The collected data are observations from a peak period, sampled at an interval of about 20 seconds. The grey BP neural network is a three-layer feed-forward network, each layer with its own number of neurons; the input and hidden layers use the Tansig function and the output layer uses the Purelin function, with a maximum of about 5000 iterations. The prediction is not intended to emphasize a single point or moment but to predict the LAN data flow; more precisely, it is a prediction of the trend over a time period. The reason is that the LAN data stream itself is volatile: viewed at a larger time scale, the LAN data flow shows peaks and troughs [1, 8, 9], so it is not accurate to simply predict the LAN data at a single instant. The grey BP neural network model, a single BP neural network model and a single grey model are each tested and their predictions are compared. For the fitting experiment, the first 24 data points are selected to predict the trend. In fitting, the combined model and the single grey model give more or less similar accuracy. Furthermore, the series of data points in the next period is predicted.
4 Application Analysis of Prediction Model of LAN Data Flow Based on Grey BP Neural Network First of all, the first 24 data points are used to train the model. The data are divided into two groups: the first 12 points are used as the input data and the following 12 as the output data, and the latter 12 points are then fitted. Taking 10 data points per fitting period, 3 fitted values are obtained after 200 seconds, and the prediction outcomes are shown in Fig. 1 to Fig. 4.
Fig. 1 Comparison of original data and prediction
Fig. 2 Initiation of original error and prediction error
Fig. 3 Prediction and original data in individual points
Fig. 4 Initiation of original error and prediction error

Table 1 Error analysis of prediction and original value

Data (point i)       1      2      3      4      5      6
Original value       2.89   2.945  3.05   3.1    3.16   3.215
Prediction           2.8    2.89   3.09   3.175  3.295  3.35
Relative error (%)   3.1    1.9    -1.3   -2.4   -4.3   -4.2
Average error (%)    2.87
The relative error and average error of the data are also calculated: the relative error is the error of the predicted value relative to the original value, and the average error is the mean error over all fitted points, as shown in Table 1. Using the trained model, data points 24 to 45 at different times are predicted and compared with the original values; the relative error and average error are used to evaluate the prediction outcome.
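With the usual sign convention (original minus predicted, relative to the original) these two measures can be computed as below; this convention reproduces the average error of 2.87% in Table 1.

```python
import numpy as np

def error_metrics(original, predicted):
    """Per-point relative error (%) and the average absolute relative error (%)."""
    original = np.asarray(original, float)
    predicted = np.asarray(predicted, float)
    rel = (original - predicted) / original * 100.0
    return rel, float(np.mean(np.abs(rel)))
```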
4.1 Comparison of Grey BP Model, Grey Model and BP Model For the three models of section 3, the fitting of the single grey model is the best, followed by the combined model, with the single BP neural network model the least accurate; the average errors are 3.01%, 3.35% and 4.12% respectively. For prediction, the combined model is the best, followed by the single grey model and finally the single BP neural network model; the average errors are 2.87%, 16.9% and 9.28%. The experiment above shows that the combined model and the single grey prediction model predict well, while the single BP neural network is less satisfactory than either of them. In the combined model the design parameters of the BP neural network are the same as those of the single BP neural network, from which it follows that the combined prediction model is more effective than a single BP neural network model. For a single BP neural network model, further improving the accuracy would require increasing the number of cycles, which at the same time increases the network's running time.
For the single grey model, the predicted results hardly differ from those of the combined model, so a further experiment is carried out to obtain a clearer comparison. Using the two models, the LAN data flow is predicted over a separated interval: the fitted data form the first period, the adjacent period is the prediction period, and the following period is the interval of interest. The reason is that the prediction of a single grey model is not very good when the system is volatile, whereas the grey BP neural network model can adjust to the system's volatility. The results of this experiment are shown in Fig. 5 to Fig. 8 and Table 2.
Fig. 5 Prediction by grey BP model in next period
Fig. 6 Error prediction by grey BP model in next period

Table 2 Error analysis of prediction and original value by grey BP neural network

Data (point i)       1      2      3      4       5      6
Original value       3.1    3.05   3      3.2     3.4    3.54
Prediction           3.25   3.35   3.45   3.525   3.6    3.9
Relative error (%)   -4.8   -9.8   -15    -10.1   -5.8   -10.2
Average error (%)    9.28
Fig. 7 Data prediction by grey model only in next period
Fig. 8 Error prediction by grey model only in next period
4.2 Further Comparison of Grey BP Model and Single Grey Model The experimental results show that the average error of the combined model is 9.28% while that of the single grey model is 16.9%. With respect to the volatility of the system, the combined model adjusts better than a single model; the shortcoming of a single grey model is that as time goes on its error relative to the system keeps increasing. Based on the above data, the grey BP neural network model has the following advantages compared with a single BP neural network and a single grey model: the model can be established quickly and satisfies the required prediction accuracy. The experiment shows that the combined model obtains better predictions when the parameters of its neural network and those of the single BP neural network model are set under similar circumstances, and that in the same number of cycles the average error of the single BP neural network is greater.
5 Conclusions Using actually collected LAN data streams from a time section, simulation and prediction tests are carried out with the grey BP neural network model and the LAN data flow, and compared with a single BP neural network model and a single grey prediction model. Each model is simulated and tested, and the errors of the simulation and prediction results are analyzed. It can be concluded that the grey BP neural network model is better than either single model. To improve the prediction accuracy of the grey BP neural network, the maximum number of network cycles can be increased, which increases the network computing time. Compared with a single grey model over the adjacent period, the gain in prediction accuracy is not obvious; over separated periods, however, the grey BP neural network model predicts better and is more adaptable. A single grey model mainly predicts the system trend; when the system is volatile this prediction is not enough and the trend of system change is not stable, whereas by adjustment through the BP neural network the combined model can effectively reduce the error. Therefore, using the grey model to predict the system trend and the neural network to adjust for the changing trend, it is effective and feasible to establish the grey BP neural network model, which has high accuracy and the advantage of needing few samples.
References 1. Gang, C.: Prediction of Traffic Flow Based on Grey Theory and BP Neural Networks. Press of Harbin Institute of Technology, Harbin (2006) 2. Li, G.: The Prediction of the Electric Power Load Based on Grey Theory and BP Neural Network. Press of Harbin University of Science and Technology, Harbin (2005) 3. Tang, N.: The Study of Stock Price Index Prediction Based on Grey Theory and Neural Network Theory. Press of Wuhan University of Science and Technology, Wuhan (2007) 4. Mi, Y., Jiang, X.: Applied Predictive Model of Settlement Based on BP Neural Network. Journal of Kunming University of Science and Technology 32, 65–69 (2007) 5. Zhou, H.: Grey Neural Network and Application in Lifetime Prediction of Concrete Structure. Wuhan University of Science and Technology, Wuhan (2004) 6. Yun, S., Namkoong, S.: A Performance Evaluation of Neural Network Models in Traffic Volume Forecasting. Mathl. Comput. 27, 293–310 (1998) 7. Yin, H., Wong, S.C., Xu, J., Wong, C.K.: Urban Traffic Flow Prediction Using a Fuzzyneural Approach. Transportation Research (Part C) 10, 85–98 (2002) 8. Mozolin, M., Thill, J.C., Lynn, U.E.: Trip Distribution Forecasting with Multilayer Perception Neural Network: A Critical Evaluation. Transportation Research (Part B) 34, 53–73 (2000) 9. Qiao, F., Yang, H., William, H.K.: Intelligent Simulation and Prediction of Traffic Flow Dispersion. Transportation Research (Part B) 35, 843–863 (2001)
Monitoring ARP Attack Using Responding Time and State ARP Cache Zhenqi Wang and Yu Zhou*
Abstract. ARP cache poisoning is considered one of the easiest and most dangerous attacks in local area networks. This paper proposes a solution for monitoring the ARP poisoning problem by extending the current ARP protocol implementation. Instead of the traditional stateless ARP cache, a state ARP cache is used in order to manage and secure the cache. We also use a novel approach of monitoring the responding time, which differs between normal and malicious ARP replies: the method records cache entries that may come from malicious ARP replies, and from these records we discover attackers whose responding time is abnormal compared with the others. Because network conditions are not smooth, the responding time is modelled with the normal distribution, and hypothesis testing is used to determine who the attacker is. Keywords: ARP cache poisoning, Stateful ARP cache, Normal Distribution Function, Hypothesis test.
1 Introduction ARP works in the LAN (Local Area Network) on the basis that hosts trust each other. ARP is a stateless protocol that was not designed to cope with malicious hosts, and it has several loopholes, so ARP cache poisoning is one of the most dangerous attacks in the LAN. ARP poisoning attacks are often used as part of other serious attacks: the Man-in-the-Middle (MiM) attack and the Denial of Service (DoS) attack. With a MiM attack, traffic between two hosts is redirected to a third host, usually the attacker's host, which allows the attacker to sniff the traffic exchanged between the two victim hosts. With a DoS attack, a target host is prevented from communicating with other hosts. This paper proposes a monitoring method for the ARP poisoning problem by extending the existing ARP protocol. The new extension includes (1) a state ARP cache, (2) a cross-layer design, (3) updating of the records, and (4) using the normal distribution and hypothesis testing to monitor who the attacker is.
Zhenqi Wang, Yu Zhou, Network Management Center, North China Electric Power University, Baoding 071000, China
The rest of the paper is organized as follows: Section 2 provides some background about ARP attacks. Section 3 provides an overview of the related work done in this area. Section 4 discusses the proposed approaches. Section 5 concludes the paper and presents future research directions.
2 Background ARP is built on the basis of mutual trust between hosts. According to the ARP mechanism, the following loopholes can be exploited by attackers: (1) the ARP cache is dynamically updated at any time based on received ARP packets; (2) the ARP protocol is not connection-oriented, and a host updates its cache when receiving an ARP reply even if it never sent any ARP request; (3) ARP includes no authentication mechanism, so as long as a received packet has the right format, the host updates its local ARP cache according to the contents of the packet without checking its legitimacy. Based on this mechanism, attackers send packets of the following forms: (1) fake ARP reply packets (unicast); (2) fake ARP request packets (broadcast). With the second form, a Man-in-the-Middle attack leads to IP address conflicts, and since the packet is a broadcast packet it is easily detected; in practice, therefore, ARP spoofing for MiM attacks rarely uses this form and always aims at a specific target host [1]. The core idea of ARP spoofing is to send a fake ARP response to the specific host, using the mapping of a fake IP address and MAC address to update the host's ARP cache. ARP attacks are often used as part of other serious attacks: DoS attacks, host impersonation, MiM attacks, and cloning attacks (MAC spoofing).
3 Related Work Specialized tools like Arpwatch [2] can be used to detect suspicious ARP traffic. The main problem is that this depends on the network administrator being able to differentiate between non-malicious events and ARP cache poisoning attacks, and also on his/her ability to take appropriate and timely measures when an attack occurs. Intrusion Detection Systems (IDSs) like Snort [3] are usually able to detect ARP attacks. The main problem with IDSs is that they tend to generate a high number of false positives; also, their ability to detect ARP poisoning is limited [4], as they may not be able to detect all forms of the attack. Several further detection approaches are used in practice:
1) Passive inspection. When a host receives an ARP request packet in the LAN, it checks whether the address in the packet is the same as its own address. If it is, ARP spoofing is likely, or another host may be configured with the same address. To detect ARP spoofing with this method, the IP/MAC mapping table of every legitimate host in the LAN must be kept.
2) Active inspection. When an ARP response packet is received, in order to confirm its authenticity the MAC address is taken from the packet and used to construct an RARP request packet, which queries the IP address corresponding to that MAC address. If the IP addresses in the two response packets differ, someone may have forged the response packet.
3) Monitoring the DNS server. Much network monitoring software attempts reverse address resolution. If ARP spoofing occurs, an increase in resolution requests can be observed in the DNS system, because the attacker keeps trying to resolve addresses and perform reverse resolution in order to act as the gateway; the attacker sends large amounts of repeated resolution packets, so the ARP spoofing can be detected in the LAN.
4) Monitoring with a Ping probe. When a host carries out an ARP attack, its network interface card is in promiscuous mode and it monitors all broadcast packets. An ICMP packet can be forged whose hardware address matches no host in the local area network; normal hosts discard the packet and only the attacker responds to it, so the ARP spoofing is found.
5) Monitoring with an ARP probe. This is a variant of the Ping method that uses an ARP packet in place of the above-mentioned ICMP packet.
4 State ARP and Dynamic Detection The following sections describe a mechanism based on a state ARP cache and on monitoring the responding time. We determine who the attacker is by hypothesis testing based on the normal distribution.
4.1 State ARP Cache The proposed prevention mechanism is based on the use of a state ARP cache. When host A generates an ARP request to get the MAC address of host B, an entry is created in its state ARP cache, with the status of ”Waiting”. Host A waits for an ARP reply, within a predefined timeout. If an ARP reply comes, then host A waits another timeout in order to collect other possible ARP replies sent by other hosts in the network. Note that if host A receives more than one ARP reply, then this means that more than one host has replied. Therefore, among those hosts, only one host is an honest host, which is host B. The others are probably malicious hosts, performing ARP cache poisoning attack to corrupt the ARP cache of host A. The main differences between the current stateless ARP cache and the proposed state ARP cache are: 1) When a host receives an ARP reply, the current stateless ARP cache will update the corresponding entry if it exists already in the ARP cache. However, the state ARP cache will not update the corresponding entry unless an ARP request has been generated before for that entry, even if the entry exists already in the cache. The state ARP cache will not update its entries using ARP requests. It is important to mention that all the tested OSs update their ARP caches once they receive ARP
requests [3]. By doing this, the ARP cache is better protected from ARP cache poisoning attacks.
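The behaviour of such a cache can be sketched as follows. This is a simplified illustration: the field names and timeout handling are assumptions, and the collection of several replies within a second timeout, described in Section 4.1, is omitted.

```python
import time

class StatefulArpCache:
    """State ARP cache: an entry is only updated if a request was sent for it first."""
    def __init__(self, timeout=1.0):
        self.timeout = timeout
        self.entries = {}                          # ip -> {"state", "mac", "asked"}

    def sent_request(self, ip):
        self.entries[ip] = {"state": "Waiting", "mac": None, "asked": time.time()}

    def on_reply(self, ip, mac):
        entry = self.entries.get(ip)
        if entry is None or entry["state"] != "Waiting":
            return False                           # unsolicited reply: ignored
        if time.time() - entry["asked"] > self.timeout:
            return False                           # reply arrived after the timeout
        entry.update(state="Resolved", mac=mac)
        return True
```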
4.2 Types of Possible ARP Replies The types of ARP reply that host A may receive depend on the type of attack (MiM, DoS, or Cloning) that the malicious host intends to perform on host A. If within the timeout host A receives only one ARP reply, then we can assume that host B generated it and that it does not contain fake IP and MAC addresses. In this case host A updates its ARP cache and changes the status of the entry corresponding to host B's IP address to "Resolved". The content of the non-fake ARP reply packet generated by host B is shown in figure 1.
Fig. 1 Host B sends a legal ARP reply to Host A via the network
If within the timeout host A receives more than one ARP reply, then we can assume that one packet came from host B and the remaining packets came from malicious hosts. We assume that host C is a malicious host. Depending on the type of attack host C intends to perform on host A, the possible ARP reply packets that host C may generate have the following contents: 1)
In case of a MiM attack, two possible ARP reply packets can be generated by host C. Figure 2(a) presents a case where host C inserts its MAC address as the source MAC address. Figure 2(b) presents the case where host C hides its MAC address in order to avoid any potential detection, and inserts a fake MAC address.
Fig. 2 Host C performing a MiM attack against host A. (a) and (b) present two strategies for the attack
Fig. 3 Host C performing a DoS attack against host A.(a), (b) and (c) present three different strategies for the attack
2) In case of a DoS attack, three possible ARP reply packets can be generated by host C, as shown in figure 3. Figure 3(a) shows host C inserting its MAC address as the source MAC address. Figure 3(b) shows host C hiding its MAC address in order to avoid any potential detection, and inserting a fake MAC
address (x); here the source MAC address (x) in the Ethernet header is the same as the source MAC address (x) in the ARP header. Figure 3(c) shows host C hiding its MAC address in order to avoid any potential detection and inserting a fake MAC address (x); in this case the source MAC address (x) in the Ethernet header differs from the source MAC address (y) in the ARP header. 3) In case of a Cloning attack, two identical ARP reply packets are generated, one by host B and the other by host C. The contents of the two packets are the same (figure 1).
4.3 Monitoring Mechanism Depending on the nature of the received ARP reply packets, host A uses a mechanism based on the state ARP cache and the responding time to monitor abnormal responses and decide whether ARP spoofing is occurring. 4.3.1 Cross Layer Controller The current ARP protocol implementation performs no cross-layer check between the ARP layer and the Ethernet layer to verify whether the source MAC addresses in these two layers are the same. An extension of the ARP protocol implementation would perform this verification before accepting any ARP reply packet: any ARP packet whose source MAC address in the Ethernet layer differs from the source MAC address in the ARP header is considered a fake packet and must be discarded. Consequently, the packets shown in figures 2(b), 3(a) and 3(c) will be discarded by host A, since in those packets the source MAC address in the Ethernet header differs from the source MAC address in the ARP header. In case of a MiM attack, the packets shown in figure 2(a) or 2(b) may be received by host A, in addition to the non-fake ARP reply packet received from host B (figure 1). The cross layer controller will discard the packet shown in 2(b); only the packet shown in 2(a) is accepted by host A for further processing, so host A can accept the packets shown in figures 1 and 2(a). In case of a DoS attack, the packets shown in figures 3(a), 3(b) and 3(c) may be received by host A, in addition to the non-fake ARP reply packet received from host B (figure 1). The cross layer controller will discard the packets shown in figures 3(a) and 3(c); only the packet shown in figure 3(b) is accepted by host A for further processing, so host A can accept the packets shown in figures 1 and 3(b). If host A receives two similar ARP reply packets that both appear to come from host B, it is most likely that a malicious host is performing the cloning attack; consequently, the two packets are ignored and host A does not update its ARP cache.
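A cross-layer check of this kind might be sketched with scapy as follows. This is only an illustration: the paper describes a modification inside the protocol stack, not a user-space sniffer, and accepted replies would then be handed to the state ARP cache for the further checks of Section 4.3.2.

```python
from scapy.all import ARP, Ether, sniff

def cross_layer_check(pkt):
    """Drop ARP replies whose Ethernet source MAC differs from the ARP-header source MAC."""
    if pkt.haslayer(ARP) and pkt[ARP].op == 2:        # op 2 = ARP reply (is-at)
        if pkt[Ether].src != pkt[ARP].hwsrc:
            print("fake ARP reply discarded:", pkt[ARP].psrc, pkt[ARP].hwsrc)
            return                                    # discard the packet
        # otherwise pass the reply on to the state ARP cache

# example invocation:
# sniff(filter="arp", prn=cross_layer_check)
```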
Table 1 MAC addresses in the responding packets, the last responding time, and the number of responding packets within N timeouts

MAC   Last responding time   Reply times within N timeouts
B     Tb                     Nb
C     Tc                     Nc
4.3.2 Monitor by Responding Time After filtering by the cross layer controller, the remaining ARP packets all have the same IP and MAC addresses in the ARP header and in the Ethernet header. These packets are then filtered by the state ARP cache mechanism. When a host sends an ARP request packet, it waits within two predefined timeouts; if two ARP reply packets are received, one of them may come from an attacker. The responding time is the difference between the receiving time and the sending time, and the timeout is a little longer than the responding time. If host A sends an ARP request and receives two or more replies (in this paper we consider only two ARP replies, from B and C for example), it creates a table (Table 1). The host records the relevant information, namely the MAC addresses, and then waits N timeouts; within these timeouts, whenever it receives another packet whose IP address is the same as B's, it updates the table. In normal traffic the reply counts do not differ by a large gap, so if Nc is far larger than Nb, host C is the attacker: because host C wants to replace host B in host A's ARP cache, it sends ARP replies more frequently than a normal host. If Nb and Nc are not very different, host A can send a large number of ARP request packets into the LAN. On the basis of the replies of host B and host C, some further information is recorded: the reply counts of host B and host C within M timeouts and the average response time. The response time of a normal system does not change too obviously, but because of the ARP spoofing the system accepts many junk packets, so the response is likely to take longer. Under normal conditions the responding time follows a normal distribution; the average value differs between systems, so we define the average responding time as Tavg. A large number of ARP request packets (for example N = 1000) is sent in the LAN and the responding times (T1, T2, …, Tn) are recorded. The variance is $\varphi^2$, and the significance level is $\alpha = 0.05$.
Table 2 NUM is the index of the reply; we assume that N responding packets were received, with the MAC addresses taken from the responding packets

NUM   B's responding time   C's responding time
1     Tb1                   Tc1
…     …                     …
N     Tbn                   Tcn
$$\delta = T_{avg} = \frac{1}{n}\sum_{i=0}^{999} T_i , \qquad (1)$$

$$\varphi^2 = \frac{1}{n}\sum_{i=0}^{999} \left(T_i - \delta\right)^2 . \qquad (2)$$
According to the received ARP reply packets, Table 2 is updated. We then have two sample spaces, $T_b = (T_{b1}, T_{b2}, \ldots, T_{bn})$ and $T_c = (T_{c1}, T_{c2}, \ldots, T_{cn})$. We define the hypotheses $H_{b0}$ and $H_{b1}$, $H_{c0}$ and $H_{c1}$, where $H_{b0}$ and $H_{c0}$ assume that $T_b$ and $T_c$ follow the normal distribution whose mean responding time is $\delta$ and whose variance is $\varphi^2$. The test procedure is the same for $T_b$ and $T_c$, so only the test of $T_b$ is described. The detection steps are as follows. If $T_b$ follows the normal distribution, we assume $\delta_b = \delta$ and use the standard normal distribution for the hypothesis test:

$$U = \frac{\delta_b - \delta}{\varphi} \sim N(0,1) . \qquad (3)$$

Because the significance level is $\alpha = 0.05$, the critical value of the standard normal distribution satisfying $P\{|U| > C_\alpha\} = \alpha$ is $C_\alpha = 1.96$. Using formulas (1) and (2) we obtain $\delta_b$; substituting the values into formula (3) gives the value $u_b$. If $u_b > 1.96$, we consider MAC B to be the attacker's MAC. Using this method we can detect the suspicious packets, and even if many ARP reply packets are received within the two timeouts, the same method can be applied.
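A direct transcription of this test is sketched below. It follows formula (3) literally (a textbook z-test on a sample mean would additionally divide by the square root of the sample size); the one-sided comparison against 1.96 matches the rule stated above, and all names are illustrative.

```python
import numpy as np

def is_suspicious(reply_times, delta, phi, c_alpha=1.96):
    """Flag a host whose mean responding time deviates from the baseline delta, phi."""
    delta_b = float(np.mean(reply_times))
    u_b = (delta_b - delta) / phi      # statistic U of formula (3)
    return u_b > c_alpha, u_b          # one-sided test at alpha = 0.05, as in the paper
```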
5 Conclusion In this paper we proposed a monitoring method for the ARP poisoning problem by extending the existing ARP protocol. The new extension includes (1) a state ARP cache, (2) the normal distribution, (3) a cross-layer design, and (4) a hypothesis test. The state ARP cache assists in the decision making when monitoring the ARP cache, especially when some updates are intended to poison it. The cross-layer design is a very fast way to check some of the naive ARP attacks, and the normal distribution and hypothesis test are used to determine which host is malicious. The method may give wrong results for two reasons: first, owing to the complexity of network traffic, many factors affect network performance; second, the mathematical model may lead to a wrong test. Therefore, this approach remains to be studied further.
References 1. Guo, W., Liu, X.: Monitoring Method of MiM Attack Based on Timeout of ARP Cache. Computer Engineering 23, 566–571 (2002) 2. Ren, X.: The Principle Agreement to Deceive the Analysis and Methods to Resist. Computer Engineer 29, 403–440 (2003) 3. Arpwatch, ftp://ftp.ee.lbl.gov/arpwatch.tar.gz 4. Demuth, T., Leitner, A.: Arp Spoofing and Poisoning. TrafficTricks Linux Magazine 56, 26–31 (2005) 5. Gouda, M., Huang, C.-T.: A Secure Address Resolution Protocol. Computer Networks 41, 57–71 (2003)
A Study of Multi-agent Based Metropolitan Demand Responsive Transport Systems Jin Xu, Weiming Yin, and Zhe Huang*
Abstract. Multi-agent based modeling is regarded as an efficient tool for large-scale systems; it can be integrated with other AI-based and conventional approaches and thereby greatly enhanced. The resulting hybrid systems offer a flexible modeling environment that exploits the benefits of each individual approach in a synergistic fashion, as in urban transport systems. This research presents a multi-agent based demand responsive transport (DRT) services model, which adopts a multi-agent planning approach for the control of metropolitan traffic services. The proposed model has three layers: a transport-admin agent layer, a node-station agent layer and a taxi agent layer. The agents for the stations and for each vehicle have a planning domain and select a route by cooperation among the agents in that planning domain. Based on this model, a simplified multi-agent based demand responsive transportation services system can be developed that is effective for reducing traffic congestion and air pollution. The effectiveness of the proposed method is examined through computational experiments. Keywords: Multi-agent system, Demand responsive transport, ITS, Agent-based simulation.
1 Introduction In recent years, urban traffic congestion and air pollution have become huge problems in many cities around the world. For example, one big metropolis has about 6,100,000 residents in its metropolitan area and about 12,000 taxis on the street. In order to reduce congestion, governments have invested in improving city infrastructure; however, infrastructure improvements are very costly to undertake and do little to reduce air pollution. Hence, the existing infrastructure and vehicles have to be used more efficiently. The application of new information technologies, such as multi-agent technologies, to urban traffic information control has made it possible to create and deploy more intelligent traffic management.
Jin Xu, Weiming Yin, Faculty of Mechanical & Electronic Information, China University of Geosciences (Wuhan), Wuhan 430074, China
Zhe Huang, Hubei Cen-Tronic Import and Export Co., LTD, Wuhan 430070, China
To reduce traffic congestion, CO2 emissions, air pollution, accidents, financial costs and other environmental damage, it is necessary to conduct further research on the various characteristics of traffic flow patterns. In general, a road traffic system consists of many autonomous participants, such as vehicle users, public transportation systems, traffic lights and the traffic management centre, which are distributed over a large area and interact with one another while each pursuing an individual goal. Our objective is to increase the efficiency of every vehicle's trips while at the same time reducing the number of vehicles on the street; this could reduce vehicle-caused air pollution, traffic congestion and financial cost. Demand responsive transport services are planning computer systems in charge of the assignment and scheduling of clients' traffic requests, using the different vehicles available for these purposes. DRT services can provide rapid-response transport 'on demand' from the passengers and offer their clients greater flexibility in time and location. Moreover, they could also increase the number of passengers in every vehicle, thereby helping to reduce environmental pollution, traffic congestion and financial cost. A multi-agent system is an autonomous intelligent computer system in which every agent has a certain level of intelligence, ranging from pre-determined roles and responsibilities to a learning entity. A multi-agent system is an aggregate of agents, with the objective of decomposing a larger system into several smaller agents; the resulting agents can engage in flexible, highly detailed interactions, and this decomposition offers advantages in modeling complex systems. This paper is organized as follows: the next section describes related work on urban traffic simulation. Section 3 describes the framework we have designed for traffic information control based on MAS. In section 4 we define the agents for our problem domain. Section 5 introduces the agent planning sequence model. Section 6 defines the planning algorithm. Section 7 shows the experimental results. Finally, Section 8 concludes the paper.
2 Related Work The application of new multi-agent technologies to urban traffic information systems has made it possible to create intelligent systems for traffic control and management: the so-called Intelligent Traffic Systems (ITS) [1] or Advanced Traffic Management Systems (ATMS). The basic task of ITS is to support road managers in traffic management tasks [2]. Because urban traffic networks have interrupted traffic flow, they have to manage effectively a high quantity of vehicles in many small road sections; on the other hand, they also have to deal with non-interrupted flow and use traffic sensors for the integration of traffic data. These features make real-time traffic management difficult. Urban traffic simulators can be classified into two main kinds, macroscopic and microscopic: macroscopic simulators use mathematical models that describe the flows of all vehicles, while in a microscopic simulator each element is modeled separately, which allows it to interact with the other elements.
(Figure 1 shows the DRT user booking a journey with the Travel Dispatch Centre (TDC), which handles booking and dispatching towards the vehicle's on-board unit (OBU) and a "smart" bus stop meeting point.)
Fig. 1 Traditional Telematics Based DRT
Multi-agent systems are an efficient tool as the basis of an urban traffic simulator, and many researchers have studied this subject. In order to alleviate the problems encountered in traditional transit service, several flexible services have been studied and offered. Telematics-based DRT systems [3], built on traditional telecommunication technology, have played a role in providing equitable transportation service to elderly and handicapped persons who have difficulty in accessing regular public transit systems. Telematics-based DRT systems are organized around a Travel Dispatch Centre using booking and reservation systems which can dynamically assign passengers to vehicles and optimize the routes. A schematic representation of telematics-based DRT services is shown in Fig. 1. Because they are based on traditional telecommunication technology, telematics-based DRT services respond slowly to the client, sometimes find it difficult to identify the best solution for the client, and can be unstable. We propose an approach based on multiple agents; we have already carried out some experiments in a small city and would now like to apply our model to the metropolitan transportation problem. In the following sections we present our agent-based hybrid model for intelligent control of demand responsive transportation information.
3 Framework of System This section describes the agent framework used for urban demand responsive transportation information intelligent control. The system agent framework is composed of three layers. The first layer is an agent platform (A-globe platform), on top of it is the multi-agent architecture and finally there is the urban demand responsive transportation information system (See Fig. 2). In the lowest layer, the A-globe agent platform [4] provides a distributed environment organized in containers where agents can reside and communicate. The A-globe platform provides an agent management system (AMS) in charge of agent identification and localization, a directory facilitator (DF) for identifying agents by their offered services and a message transport system devoted to support
Fig. 2 System Agent Framework
the communication between agents and containers. A-globe platform is FIPA(the Foundation for Intelligent Physical Agents) compliant on the ACL(Agent Communication Language) level. A-globe is suitable for real-world simulations including both static and mobile units, where the core platform is extended by a set of services provided by Geographical Information System (GIS) and Environment Simulator (ES) agent. On top of the A-globe agent platform resides the multi-agent architecture [5], providing the base agents, structures and semantics for implementing urban demand responsive transportation information intelligent control system [6]. The agents are of four kinds: the user agent, the node-station agent, the transportadmin agent, the taxi agent. Finally, over the multi-agent architecture layers an urban demand responsive transportation system layer is implemented. By extending and implementing the agent-based system provided by the architecture, a dynamic trips planning and traffic control system can be developed.
4 The Definition of Agents This section describes the main agents in the multi-agent architecture used for urban demand responsive transportation information intelligent control system.
4.1 User Agent This agent represents a human user and their interactions with system. It is responsible for capturing all the client’s requirements. It provides communication and interoperability between the end user and the urban demand responsive transportation intelligent system.
4.2 Node-Station Agent This agent is in-charge of processing, assigning and scheduling the received trip requests. It is responsible for coordination with the other agents and the administration of the transportation service. It helps in the collaboration among agents
that provide support to related functions, such as the matching of request to vehicles, the geographical data access, the accountability of the transactions and the service payment among others.
4.3 Taxi Agent This agent is in-charge of modeling each real vehicle in the system. It processes the trip plan of the vehicle and provides interoperability between the vehicle and the DRT system. Taxi agents make proposals for the actual plan, process change information with the node-station agent, update the plan and reschedule the remaining trip requests.
4.4 Transport-Admin Agent This agent is in charge of all the other agents (taxis, node-stations, users). It can set how many taxi agents are started and which route-planning and intercommunication methods they use for matching transport requests with available vehicles, and it manages the service descriptions coming from the vehicle side and from the client side.
5 Agent Planning Sequence Model Design In this section we present our multi-layer distributed hybrid planning model. As Fig. 3 shows, each node-station agent and each taxi agent has a planning domain, according to which the taxi agent plans its path. Because of traffic unpredictability (traffic jams, etc.), taxi agents need to communicate with other agents and exchange information; each taxi agent therefore keeps its own statistics of travel times and updates them from its own experience and the experience of others, transmitted among the agents by messages. In our approach every taxi agent is moving and the trip requests arrive at random, so the situation is dynamic and distributed. If all the agents communicated together online, the system would be slow and unstable; hence each taxi agent and node-station agent has a planning domain of limited range and need not communicate with all taxi agents. A taxi agent communicates only with the taxi agents or node-station agents in its planning domain; likewise, a node-station agent communicates only with the taxi agents in its planning domain, and information is exchanged when other agents enter the planning domain. Information is also exchanged at each meeting, so the agents can change their plans and use another path. The taxi planning system is responsible for route planning and for the communication used to improve the planning. It consists of two main parts: the first is the taxi agents planning and re-planning paths with the other taxi agents and node-station agents within their planning domains; the second is the node-stations communicating in real time, which ensures the correct exchange and use of information about the traffic situation and the passengers' trip requests (see Fig. 4).
(Figure 3 sketches node-station agents N1–N5 and taxi agents T1–T9, the routes between the node stations, and each taxi agent's planning domain; Ni denotes node-station agent i and Ti denotes taxi agent i.)
Fig. 3 Agent Planning Domain Framework
(Figure 4 shows the three layers — the transport-admin agent layer with the user agent, the node-station agent layer with N1–N5, and the taxi agent layer with T1–T9 — and the communication between the agents; Ni denotes node-station agent i and Ti denotes taxi agent i.)

Fig. 4 Agent Multi-Layer Planning Framework
As mentioned above, in the hybrid model we have:
• A multi-layer distributed hybrid structure.
• The node-station agent applies filtering policies.
• The taxi agent makes proposals for the actual plan, processes change information with the node-station agent in its planning domain, updates the plan, reschedules the remaining requests, and transports passengers.
In the transport-admin agent layer, the User agent's job is to represent the client and his decisions about the transportation request, so it is responsible for the result. Since the real client does not communicate directly with the taxi agent, the User agent also constitutes a kind of client towards the transportation system, which is represented by the node-station agent. When clients change their requests in a dynamic scenario to deal with unexpected situations, the User agent is also responsible for informing the client about any subsequent changes to the original deal. These unexpected situations must also be communicated to the other agents through the transport-admin agent: the User agent must inform the transport-admin agent about changes in the clients' desires (e.g. a change of location, delays, or trip cancellations), and the transport-admin agent then negotiates the changed request with the node-station agent. In the node-station agent layer, the node-station agent gives the client (through the taxi agent) the most suitable trip solution offered by the joint planning after filtering a list of proposals. At the same time, the node-station agent holds a request profile containing the client's preferences concerning the trip, and it also has some negotiation capabilities with the other node-station agents in real time. The node-station agent processes all the client requests coming through the User agent; it is the agent in charge of executing the negotiation role in this layer. In addition, the node-station agent is in charge of implementing the assignment through filtering policies and the negotiation process, and it holds a list of the trip requests being processed and a list of filtering policies to apply to the trip solutions. Before the taxi agent gives its proposals to the node-station agent, it communicates with the other agents that manage the trip plan of the vehicle and other traffic information in its planning domain. In the taxi agent layer, the taxi agents communicate and plan with the other agents in their planning domains to find the best trip solution.
6 Planning Algorithm In the demand responsive transportation system, a general formulation of a client’s demand function is
$$Y = F(x_1, x_2, x_3, \ldots, x_n) , \qquad (1)$$
where Y is the dependent variable (level of demand) and xi (i = 1, ..., n) are the explanatory variables. The demand function reflects the behavior of an individual client, whose preferences dictate the particular functional form of the relationship,
or it may be formulated to explain the behavior of an aggregate group of individual clients. A particular specification of the demand function often used to analyze a client's travel demand is expressed in terms of 'generalized costs' (time, money, …). This concept attempts to summarize the 'cost' of a journey by adding together the various components of time and money spent. The operation of the system can be formulated as the following optimization problem: Minimize
$$AC_T = \beta \sum_{i=1}^{M} a_i \cdot q_i + m + c_0 , \qquad (2)$$
where $AC_T$ is the total generalized cost of the journey at time $T$, $\beta$ is the value of the coefficient, $M$ is the set of all passengers, $m$ is the monetary cost of the journey, $a_i$ is the value of time associated with time component $i$, and $q_i$ is the time required to complete the journey, divided into the various components $i$ of travelling time, such as the boarding delay of passenger $i$, the alighting delay of passenger $i$, and the travel time. $c_0$ is the residual component of the 'cost' of making a journey, which is not a function of the monetary cost or the time involved but a 'cost' associated with the user's special requirements. The User agent receives the passengers' requests at time $T$, transfers them to all the other agents, and holds the information of all passengers' requests. A taxi agent exists for each vehicle and determines its route by planning and cooperation with the node-station agents and taxi agents in its planning domain. The route is determined as follows: 1) find some candidate routes by local search; 2) decide each route by planning and cooperation with the agents in the planning domain; 3) repeat the above steps until every route is chosen; 4) when the system receives a new trip request, set T = T + 1 and repeat the above steps.
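A small sketch of how a taxi agent might score a candidate route with this generalized-cost formulation is given below; this is one reading of formula (2), and the weights and example numbers are purely illustrative.

```python
def generalized_cost(a, q, m, c0, beta=1.0):
    """beta times the weighted sum of the time components a_i * q_i (boarding delay,
    alighting delay, travel time, ...) plus the monetary cost m and the residual c0."""
    return beta * sum(ai * qi for ai, qi in zip(a, q)) + m + c0

def route_cost(requests, beta=1.0):
    """Total generalized cost over all passenger requests served by one candidate route;
    the candidate with the smallest total would be preferred in step 2)."""
    return sum(generalized_cost(r["a"], r["q"], r["m"], r["c0"], beta) for r in requests)

# e.g. generalized_cost(a=[0.5, 0.5, 1.0], q=[3.0, 2.0, 15.0], m=12.0, c0=0.0)
```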
7 Experiment As an example, we consider the situation of a real large metropolis with about 6,100,000 residents in the metropolitan area and about 12,000 taxis on the streets. The test considered 10,000 random trip requests in one hour, and the capacity of each vehicle (total number of seats) is 4. The scenario has 120 stations, and we set the number of taxis to 1000, 2000, 3000, 4000, and 5000. At first, we give each taxi agent a limited planning domain, as introduced above. The results (see Fig. 5) show the number of waiting passengers depending on the number of taxis. The approach gives better results for the clients and provides an acceptable balance between cost, fleet size, demand coverage, and service quality. In this scenario, about 4000 taxis spread over all stations are necessary. We then give each taxi agent an unlimited planning domain, so that it can communicate with all the other agents. The results (see Fig. 6) show the number of
Fig. 5 Simulation Result with Limited Planning Domain (number of waiting passengers vs. number of taxis)
Fig. 6 Simulation Result with Unlimited Planning Domain (number of waiting passengers vs. number of taxis)
waiting passengers depending on the number of taxis. In this scenario, about 5000 taxis spread over all stations are necessary. Fig. 7 compares the two results.
Fig. 7 Simulation Result Compared (number of waiting passengers vs. number of taxis for the two planning domains)
The model with the limited planning domain is more effective than the unlimited one. On the other hand, we can see that the planning-domain-based algorithm also performs better from the vehicles' perspective.
8 Conclusion and Future Work In this paper we have proposed a new multi-agent, multi-layer distributed hybrid planning model for metropolitan DRT systems. As future work, we plan to continue optimizing the model for urban demand responsive transportation systems. Acknowledgment. This research is supported by the Educational Research Fund of CUG under Grant No. 200639, the Natural Science Foundation of Hubei Province of China under Grant No. 2005075014, and the Scientific Research Foundation for Returned Overseas Chinese Scholars of the State Education Ministry of China.
References 1. Cascetta, E.: Transportation Systems Engineering: Theory and Methods. Kluwer Academic Press, London (2001) 2. McQueen, B.: Intelligent transportation systems architecture. Artech House Books (1999) 3. Nelson, J.D.: Recent Developments in Telematics-based Demand Responsive Transport. In: IVT seminar University of Newcastle upon Tyne (2003) 4. Sislak, D., Rehak, M., Pechoucek, M., Rollo, M., Pavlıcek, D.: A-globe: Agent Development Platform with Inaccessibility and Mobility Support. In: Software Agent-Based Applications, Platforms and Development Kits, Berlin, pp. 21–46 (2005) 5. Weiss, G.: Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. MIT Press, Massachusetts (1999) 6. Li, X., Wang, F.: Study of City Area Traffic Coordination Control on The Basis of Agent. In: Proceedings of the IEEE Intelligent Transportation Systems Conference, vol. 6, pp. 758–761 (2002)
The Diagnosis Research of Electric Submersible Pump Based on Neural Network Ding Feng, Cheng Yang, Bianyou Tan, Guanjun Xu, Yongxin Yuan, and Peng Wang*
Abstract. Many down-hole failures of electric submersible pumps are difficult to diagnose in the process of oil production, and their fault diagnosis has become a focus of current study. In oil field production, the diagnosis of electric submersible pumps is important for keeping the equipment working efficiently and for saving production cost. A method based on neural network pattern recognition and data acquisition is presented in this paper. Furthermore, software that can distinguish the operation mode and draw the behavior graph and the trajectory characteristic graph is developed based on this method. Feature extraction is then studied with a time-series model according to the different current curves on the current cards. Moreover, the method can also form a characteristic repository of current cards and continuously improve it. The diagnosis range and accuracy are greatly improved by this method, which is an extension of the traditional methods. Practice shows that this technology has a very wide application prospect. Keywords: Neural network, Electric submersible pump, Pattern recognition, Diagnosis.
1 Introduction The electrical submersible pump oil recovery technique is a technique that has emerged in recent years and is finding wider and wider application due to its large capacity, high power, simple way of transferring energy between the surface equipment and the down-hole equipment, and convenient management. It nevertheless has a relatively high overall fault rate because of its complicated structure and adverse working environment. How to diagnose the electrical submersible pump and how to judge and analyze the reasons for malfunctions are therefore important for tapping the potential of an oil well, keeping the equipment working effectively, prolonging the pump inspection cycle, and improving the overall economic benefit of oilfield production [1]. Neural network technology has been used to diagnose the electrical submersible pump; it is a fault diagnosis method developed in the last decade. It is based on the
Ding Feng . Cheng Yang . Bianyou Tan . Guanjun Xu . Yongxin Yuan . Peng Wang School of Mechanical Engineering, Yangtze University, Jingzhou, Hubei, 434023, China
[email protected],
[email protected] *
contemporary advanced computer hardware technology and focuses on information and intelligence. Compared with the traditional diagnostic methods, its diagnostic precision is much higher and its diagnosis range is much wider. In this paper, software that can distinguish the operation mode and draw the behavior chart and the trajectory characteristic graph is developed based on this method, thus forming a comprehensive diagnosis system for the electrical submersible pump.
2 Neural Network and Pattern Recognition 2.1 Neural Network A neural network is a parallel information processing system composed of simple processing units called "neurons"; these neurons are arranged and connected in different topological ways according to the function to be realized. Its basic structure is shown in Figure 1 [2].
Fig. 1 Neural network structure
The reason that a neural network is well suited to fault section estimation is that it offers massively parallel distributed processing, self-adaptability, self-learning, associative memory, fault tolerance, the ability to treat complicated models, and so on. Moreover, it can adjust its topological units to solve problems in cluttered environments with great uncertainty.
2.2 Pattern Recognition and Data Acquisition Pattern recognition covers a wide range of information processing tasks that humans solve almost effortlessly but that are very difficult for computers, such as recognizing the characteristics of voices and words. Pattern recognition has two aspects. One is classification, that is, assigning the input to one of a discrete set of classes; the other is regression, in which the output represents a continuous variable. Here we mainly classify the operation status of the electrical submersible pump.
Data acquisition includes the automated analysis of mass data, the collection of relationships, and the recognition and establishment of non-obvious trends. The process of knowledge discovery based on data collection can be divided into four basic stages: selection (collecting the target data), pre-treatment (preparing the data for analysis), data collection (obtaining the data and analyzing it with data collection software), and result description.
3 Diagnosis of Electric Submersible Pump Based on the Neural Network Pattern Recognition 3.1 Model Distinguishing for the Electric Submersible Pump with the Optimization Back Propagation Theory OBP is an operational rule based on an optimization model established between the layers of a multilayer feed-forward network; the viewpoint of Optimization Back Propagation (OBP) was first put forward in the paper "BP Study on The Fast Learning Algorithm of Networks" [2]. OBP is a novel learning algorithm for multilayer feed-forward neural networks and is very useful when designing a concrete algorithm. The high-dimensional data are replaced by a general two-dimensional drawing, and the relationships among the data are revealed while their topological relations are preserved. We design a two-layer OBP network here, consisting of an input layer and a topology-preserving competitive layer composed of hexagons. We input the data of three operation states of the electrical submersible pump into the network, and the analysis results are shown in Figure 2: the regions formed by the result data in the chart are consistent with the different operation states represented in the network and are separated by bright boundary lines representing low-density points. It is worth mentioning that there are two different blocks rather than only one in the air-lock areas of the chart. A possible explanation is that the network has discerned a regularity that is not perceived by the present diagnostic methods, such as the current cards. So it is necessary to further analyze and treat some variables in order to further synthesize the operation state of the electrical submersible pump.
Fig. 2 Electric Submersible Pump Operating Conditions Classification
3.2 The Behavior Chart of the Electrical Submersible Pump As previously mentioned, application software that analyzes and diagnoses electrical submersible pump systems based on neural networks already exists. However, only one variable (namely the motor current) is considered in this software, so many factors affecting the system's operation are most likely not taken into account. A new interpretation tool for the running state of the electrical submersible pump, called the behavior chart, was developed by PDVSA (Petróleos de Venezuela) in order to test the analysis results; from it we can obtain an approximate value through the related variables, and it provides much more information than the current cards or any other trend chart. The behavior chart is formed by a special network model called the recurrent neural network (RNN) [3]. The behavior chart defined for the electrical submersible pump system in the under-load condition, and the behavior chart formed by a neural network trained on field data of the under-load system, are shown in Figs. 3 and 4, respectively. A recurrent neural network is an internal feedback connection network with memory. Its outputs depend not only on its input parameters but also on the history it has processed in the past. The introduction of dynamic feedback connections makes training a recurrent neural network more complex than training a simple feed-forward network. It is worth mentioning that, thanks to its ability to process process-state information, a recurrent neural network only requires the instantaneous values of the variables as inputs in real-time applications; without such a memory function, the network would have to be trained on the change trends of the instantaneous values accumulated from the past history. The behavior diagram of the electrical submersible pump lifting method is shown in Figure 5. It is a natural extension of the pattern-recognition behavior described above.
Fig. 3 Behavior Chart
Fig. 4 Characteristic Trajectory
Fig. 5 BC generation methodology
3.3 The Diagnosis of the Characteristic Extraction Condition of Electric Submersible Pump Current Card The current card is one of the main pieces of evidence by which administrators manage electric submersible pump wells and analyze the working conditions of the downhole units. Under different working conditions, the current curves on the current cards are different; that is, different curves correspond to different conditions. The current curves can therefore reflect most of the faults of the electric submersible pump system, and they are the most reliable information for diagnosing and analyzing the conditions of electric pump wells. Here we use time-series models to extract features. From the standard current cards we can draw a conclusion: under different conditions, the current values and the extent of the current fluctuations are different. We establish proper coordinates on the standard current cards, and then select appropriate points on the current curves in accordance with the required precision. The overall current values on the current cards correspond to the different conditions. Here we use the sum of squares of the current values as a representative feature and the distance as the discrimination function to make the classification:

M = Σ_{i=1}^{n} I_i²,   i = 1, 2, 3, …, n,    (1)
where M is the square sum of the current values (A²), I_i is the current value (A), and n is the number of points taken on the current curve.
A standard current card is shown in Fig. 6; the electrical submersible pump operating condition is marked by the thick line, namely the current curve, which is
Fig. 6 Reading the current cards
Fig. 7 The result extracted from current curves
extracted from the current card by image processing; the result is shown in Fig. 7. The characteristic value M under every working condition is obtained by reading the coordinate values corresponding to the pixels on the curve. The working condition of a current card is then determined by comparing this characteristic value with the thresholds of the related characteristic values in the database of standard current cards. We add working conditions that occur frequently in actual production to an open current-card database as new standards, or refine the existing standards, so as to enrich the database and improve the diagnosis accuracy continuously.
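As a small illustration of this comparison step, the following Python sketch computes the characteristic value M of Eq. (1) from sampled current values and assigns the working condition whose stored characteristic value is nearest; the sample readings, the standard values, and the nearest-distance rule are illustrative assumptions rather than the authors' implementation.

def characteristic_value(currents):
    # M = sum of squared current values over the n sampled points (Eq. 1)
    return sum(i * i for i in currents)

def classify_card(currents, standard_cards):
    # pick the working condition whose standard characteristic value is closest to M
    m = characteristic_value(currents)
    return min(standard_cards, key=lambda name: abs(standard_cards[name] - m))

standard_cards = {"normal": 16000.0, "under-load": 9000.0, "gas lock": 4000.0}   # hypothetical M values (A^2)
sampled_currents = [56.0, 57.1, 55.8, 56.5, 57.0]                                # current readings (A)
print(classify_card(sampled_currents, standard_cards))                           # -> normal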
4 Economic and Social Benefits Having fully considered the pump unit's working parameters, the well trajectory, the physical and chemical properties of the output liquid, the formation conditions, and other factors, the concepts and methods of comprehensive diagnosis are put forward on this basis, and a comprehensive diagnostic model is established. Since 2004, this comprehensive diagnostic method and model has been used in 430 wells of the Bohai and Jianghan oilfields; it extended the pump inspection cycle by 60 days, increased oil output by about 435,000 barrels, and created a cumulative direct economic benefit of more than 300 million Yuan. At present, the comprehensive condition diagnostic techniques for submersible pump units are being applied in the Daqing, Dagang, and Shengli oilfields.
5 Conclusion A method of electrical submersible pump diagnosis through neural network pattern recognition is presented in this paper, and software used in field production has been developed based on it. The following work has been done: 1. The comprehensive structure of electrical submersible pump operation mode recognition is introduced, and the processing procedure is supplemented and described using the behavior chart, including the instantaneous status (behavior chart) and its evolution over time (trajectory characteristic).
2. The working condition of electrical submersible pump is diagnosed by means of extracting and recognizing the character of the current cards, and the open database about working condition diagnosis is established.
References 1. Huang, X., Shi, B., Fan, X.: Introduction to The Diagnosis Technology of Electrical Submersible Pump. Oil Well Testing 9(4), 59–61 (2000) 2. Feng, D., Zhou, D.: BP Study on The Fast Learning Algorithm of Networks. Computer Engineering 22(6), 445–451 (1996) 3. Xiong, S.: Nonlinear Time Series Model for Shape Recognition Using Neural Networks. Acta Automatica Sinica 25(4), 467–475 (1999) 4. Aponte, H., Toussaint, L., Ramos, M.: Experiences Using an Electric Submersible Pump Application on Heavy-Oil Cold-Production Automation in Eastern Venezuela Fields. SPE 69708 5. Leonardo, O., Alexander, R.: Artificial-Lift Systems Pattern Recognition Using Neural Networks. SPE 69405 6. Chen, Z., Feng, D.: A Character Pick-up Method of Current Cards Based on Pattern Recognition. Petroleum Machinery 32(2), 38–41 (2004)
The Application of BP Feedforward Neural Networks to the Irradiation Effects of High Power Microwave Tingjun Li*
Abstract. BP feedforward neural networks are applied to the study of the effects of High Power Microwave (HPM) irradiation on radio fuses. In order to improve the capacity for detecting HPM effects, we analyze the damage threshold value under HPM irradiation and combine it with the experimental results using a BP feedforward neural network. The simulation results show that this method is valid. Keywords: BP feedforward neural networks, High Power Microwave (HPM), Irradiation effects.
1 Introduction Recently, HPM has been applied in the military on a large scale. It can be used as a directed-energy weapon, irradiating targets with high radiant intensity to destroy their electrical equipment. It can also be used as an interference source that couples into a system through many paths to disturb the enemy's electrical systems. It is therefore necessary to study this kind of electromagnetic interference. No important military action can do without fuses, but a fuse with microelectronics at its core is so easily attacked by HPM that it makes much sense to effectively evaluate the damage to fuses caused by HPM. Digital circuits are usually used to test the HPM effects on a fuse, but many quantities have to be tested and many measuring instruments are needed, so this approach is constrained in many respects. Applying neural networks, which address exactly the problem presented above, to the detection of HPM interference is a feasible method.
2 BP Feedforward Neural Networks A neural network model that uses the BP algorithm is called a BP network. The topology of the multi-layer neural network model is shown in Fig. 1.
Tingjun Li Naval Aeronautical and Astronautical University, Yantai 264001, China
[email protected]
Fig. 1 The Structure of Multi-layer BP Feedforward Neural Networks
2.1 δ Learning Rule of Feedforward Networks Having Hidden Layer The number of training samples is P; that is, there are P input-output pairs (X_k, T_k), k = 1, 2, …, P, with X_k = (x_{k1}, x_{k2}, …, x_{kM}) and T_k = (t_{k1}, t_{k2}, …, t_{kN}). The actual output vector of the network is O_k = (O_{k1}, O_{k2}, …, O_{kN})^T, and w_{ji} is the weight from neuron i in the upper layer to neuron j in the lower layer. For the k-th sample, the state of neuron j is defined as

Net_{kj} = Σ_i w_{ji} o_{ki},    (1)

so the output of neuron j is

o_{kj} = f_j(Net_{kj}).    (2)
The activation function is a semi-linear function, and the training objective function is E_k = (1/2) Σ_j (t_{kj} − o_{kj})², E = Σ_k E_k.

(1) Obtain the gradient-descent change Δ_k w_{ji} in each training cycle. According to the gradient algorithm for min f(x), x ∈ R^n, we search from x in the direction of the negative gradient of f(x):

x_{k+1} = x_k + λ_k d_k,    (3)

where d_k is the search direction, the steepest-descent direction starting from x_k,

d_k = −∇f(x_k),    (4)

and λ_k is the step length of the one-dimensional search along d_k from x_k. Correspondingly, if we take

E_k = (1/2) Σ_j (t_{kj} − o_{kj})²,   E = Σ_k E_k    (5)
as the objective function, then the weight variation will be proportional to the negative gradient of the objective function; that is,

Δ_k w_{ji} ∝ −∂E_k/∂w_{ji}.    (6)

Divide ∂E_k/∂w_{ji} into the product of two factors:

∂E_k/∂w_{ji} = (∂E_k/∂Net_{kj}) · (∂Net_{kj}/∂w_{ji}).    (7)

Noticing that Net_{kj} = Σ_i w_{ji} o_{ki}, we get

∂Net_{kj}/∂w_{ji} = (∂/∂w_{ji}) Σ_m w_{jm} o_{km} = o_{ki}.    (8)

Let

δ_{kj} = −∂E_k/∂Net_{kj},    (9)

and we get

−∂E_k/∂w_{ji} = δ_{kj} o_{ki}.    (10)

Finally,

Δ_k w_{ji} = η δ_{kj} o_{ki}.    (11)

(2) Find δ_{kj} for the output layer of the network:

δ_{kj} = −∂E_k/∂Net_{kj} = −(∂E_k/∂o_{kj}) · (∂o_{kj}/∂Net_{kj}).    (12)
=
∂ f j ( Netkj ) = f ' j ( Netkj ) ∂Netkj
(13)
Now we calculate the first factor. It is discussed in 2 cases. First, we take a neural cell unit u j as output unit, at this time:
∂Ek ∂ 1 = [ ∑ (tkj − okj ) 2 ] = −(tkj − okj ) ∂okj ∂okj 2 j
(14)
732
T. Li
As a result, in output layer of networks:
δ kj = (tkj − okj ) f 'k ( Netkj )
(15)
(3) Hidden layer δ kj : when a neural cell u j is a hidden unit, we find the partial derivative of it: ∂Ek ∂Ek ∂Netkm ∂Ek ∂ =∑ ⋅ =∑ ⋅ ∂okj m ∂Netkm ∂okj m ∂Netkm ∂okj
∑w
o =∑
mi ki
i
m
∂Ek ⋅ wmj = −∑δkm wmj (16) ∂Netkm m
Substitute appropriate part of (12) with (16) we can get :
δ kj = f ' k ( Netkj )∑ δ km wmj
(17)
m
2.2 BP Algorithm of Sigmoid Inspiring Function We take sigmoid function as inspiring function: f ( Netkj ) =
1 1+ e
− Netkj
(18)
in which Net kj is the state of networks unit u j : Netkj = ∑ w ji oki + θ j
(19)
the output of unit is: okj =
1 1 = 1 + exp(−∑ w ji oki − θ j ) 1 + e − Netkj
(20)
i
in which θ j is the threshold value of unit u j . In the condition of inspiring function: f ' j ( Netkj ) =
∂okj ∂Netkj
= okj (1 − okj )
(21)
for output layer units:
δ kj = (tkj − okj ) ⋅ okj (1 − okj )
(22)
δ kj = okj (1 − okj )∑ δ km wmj
(23)
for hidden layer units: m
The Application of BP Feedforward Neural Networks to the Irradiation Effects
the adjustment of weight is:
733
Δw ji (t + 1) = ηδ kj oki
(24)
In order to be quicker and do not surge, we add a “state term”: Δw ji (t + 1) = ηδ kj oki + α Δw ji (t )
(25)
in which α is a constant which decide the effect of old weight value variation to recent weight value variation.
2.3 Steps of Algorithm (1) Set the initial value of each weight value and threshold value: w ji (0), θ (0) are small random values.
,
;
(2) Provide training samples: input X k k=1,2,…,P we iterate from (3) to (5) for each input sample (3) Actual output of networks and state of hidden layer units: okj = f j (∑ w ji oki + θ j ) i
(4) Find training error δ kj = okj (1 − okj )(tkj − okj ) , δ kj = okj (1 − okj )∑ δ km wmj m
(5) Adjust weight value and threshold value: w jt (t + 1) = w ji (t ) + ηδ j oki + α [ w ji (t ) − w ji (t − 1)]
θ j (t + 1) = θ j (t ) + ηδ j + α [θ j (t ) − θ j (t − 1)] (6) After k experiences from 1 to p, we make opinion whether it meet need of precision: E ≤ ε , ε is the precision. (7) End
3 Study of BP Feedforward Neural Networks on the HPM Irradiation Effect 3.1 Experiment Equipment and Test Results As shown in Fig. 2, in the HPM experiment system in a microwave anechoic chamber (darkroom), we use an S-band microwave source to irradiate a wireless radio fuse.
Fig. 2 Experiment System Sketch Map
The parameters of the experiment system are as follows: the shielding capacity of the chamber is 100 dB; its size is 10 m (L) × 10 m (W) × 4 m (H); the long side of the horn irradiation antenna is 38 cm and the short side is 25 cm; the antenna gain is G = 18.8 dB. The irradiation direction of the irradiating horn antenna points straight ahead into the chamber. The output frequency of the microwave source is 2.865 GHz, the pulse width is 80 ns–2 μs, and the repetition frequency of the output pulse train is 10–500 pps. The electromagnetic wave in the chamber is a vertically polarized TEM wave. The microwave power density at distance R in front of the irradiating antenna is P_W = PG/(4πR²) = 6.04P/R². During the experiment, we adjust the HPM irradiation power from the control platform; the output power of the microwave source is measured by a power meter, and we observe whether the radio fuse is blasted. The results of the experiment are shown in Table 1.
Table 1 HPM (before/after) Irradiation Effect on Working Radio Fuse 6 Meters Far from the Antenna Fuse Setting Vertical Vertical Vertical Vertical Vertical Vertical Vertical Parallel Parallel Parallel Parallel Parallel
Working Current /mA before after 39 40 42 42 34 >50 42 43 39 41 41 40 44 36 >50 >50 42 43 39 41 41 40 34 >50
Demodulation Voltage/V before after 2.5 2.4 2.9 2.8 2.5 2.0 5.4 5.7 2.5 4.3 2.4 1.2 5.5 1.9 2.1 2.0 5.4 5.7 2.5 4.3 2.4 1.2 2.5 2.0
Irradiation Frequency/GHz before after 1.001 1.141 1.028 1.030 1.010 1.035 1.076 1.077 1.001 1.026 1.029 1.031 1.008 1.008 1.011 1.012 1.076 1.077 1.001 1.026 1.029 1.031 1.010 1.035
Sensitive Irradiation Degree/cm Effect before after 17 17 √830 40 60 √600 20 8 √850 44 8 17 100 100 40 35 40 34 56 √400 44 8 17 100 100 40 20 8
△ △ △ △ △ △ △ △
Note: Fuse is irradiated in vertical state and parallel state. There are 10 samples. Only samples have obvious effect on irradiation are listed in the chart and others have no variation. √ indicates fuse blast after irradiation (parameter after √ is blasting condition); indicates parameters of fuse are varied; × indicates fuse is damaged.
△
3.2 Construction and Adjustment of the Feedforward BP Neural Network We construct a feedforward BP neural network according to the different situations of the HPM irradiation effect. The initial state of the fuse is used as the input layer of the BP network.
In order to make the dimension of the input features the same as the number of input nodes, the number of input nodes should correspond to the feature vector of the fuse state, and the outputs of the output layer correspond to the states of the fuse after irradiation. The experimental data are processed in the hidden layers. In general, the complexity of the problem is proportional to the number of hidden layers and their nodes; that is, the more complex the decision boundary and the more hidden nodes, the more learning time is needed, which depresses the generalization capacity and worsens the practical applicability of the network. But if there are too few nodes, the conditions for judgment become too few and the training target cannot be reached. We usually compare training and learning results from easy to difficult, or from difficult to easy, to decide the numbers of hidden nodes and hidden layers. We now construct a feedforward BP neural network. The input-layer nodes are: the fuse mounting state, the working current before irradiation, the demodulation voltage before irradiation, the irradiation frequency before irradiation, and the sensitivity before irradiation. The output-layer nodes are: the working current after irradiation, the demodulation voltage after irradiation, the irradiation frequency after irradiation, the sensitivity after irradiation, and the irradiation effect. According to the conditions of this experiment, we take 30 nodes in the middle layer. For easier processing, the vertical and parallel mounting states are encoded as 1 and 0 at the input. Comparing the working current, demodulation voltage, and sensitivity before and after irradiation, if there is a large variation (more than 1/3 of the normal value) the output is 0; otherwise it is 1. For the irradiation effect, if the fuse is damaged or blasts, the output is 1; if only the fuse parameters vary, the output is 0. This not only favors the normalization of the state samples but is also convenient for the hardware circuit design. For the data above, we write a MATLAB program (a small encoding sketch is also given below), train the network, and diagnose the variation of the fuse under HPM. The target is reached when the error is ≤ 0.25. The simulation result is shown in Fig. 3.
Fig. 3 Simulation Result
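The encoding rules described above can be sketched as follows; this Python fragment is illustrative only (the field names and helper functions are assumptions), showing how one before/after measurement pair such as the first row of Table 1 might be turned into the 0/1 values used by the network.

def encode_setting(setting):
    # vertical mounting -> 1, parallel mounting -> 0 (input encoding)
    return 1 if setting.lower() == "vertical" else 0

def encode_change(before, after, limit=1.0 / 3.0):
    # 1 if the parameter stayed within 1/3 of its pre-irradiation value, else 0
    return 1 if abs(after - before) <= limit * abs(before) else 0

def encode_outputs(record):
    # record holds before/after values and whether the fuse blasted or was damaged
    return [encode_change(record["current_before"], record["current_after"]),
            encode_change(record["voltage_before"], record["voltage_after"]),
            encode_change(record["freq_before"], record["freq_after"]),
            encode_change(record["sens_before"], record["sens_after"]),
            1 if record["blasted_or_damaged"] else 0]

sample = {"current_before": 39, "current_after": 40,
          "voltage_before": 2.5, "voltage_after": 2.4,
          "freq_before": 1.001, "freq_after": 1.141,
          "sens_before": 17, "sens_after": 17,
          "blasted_or_damaged": True}
print(encode_setting("Vertical"), encode_outputs(sample))   # -> 1 [1, 1, 1, 1, 1]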
The main part of the program is as below:

% create the feedforward network: layer sizes [5,30,5], trained with Levenberg-Marquardt
net = newff(minmax(P'), [5,30,5], {'tansig','purelin','purelin'}, 'trainlm');
inputWeights = net.IW{1,1};      % input-layer weights
inputbias    = net.b{1};         % input-layer biases
layerWeights = net.LW{2,1};      % hidden-layer weights
layerbias    = net.b{2};         % hidden-layer biases
net.trainParam.show   = 10;      % display interval
net.trainParam.lr     = 0.01;    % learning rate
net.trainParam.mc     = 0.8;     % momentum constant
net.trainParam.epochs = 100;     % maximum number of epochs
net.trainParam.goal   = 0.02;    % training error goal
[net, tr] = train(net, P', T');  % train on inputs P and targets T
A   = sim(net, P');              % simulate the trained network
E   = T' - A;                    % output error
MSE = mse(E);                    % mean squared error
P is the input matrix and T is the target matrix. As shown in Fig. 3, the network reaches the 0.25 error target after about 60 training epochs. The training performance can also be changed by adjusting the number of hidden layers and their nodes, so this method can also be used for diagnosis with more data than in this experiment.
4 Conclusion The experimental data prove that applying BP feedforward neural networks, with their high precision and strong nonlinear approximation capacity, to the study of HPM irradiation effects on fuses is viable. Using them makes the study of HPM irradiation effects more convenient. The method has the advantages of a clear and simple structure and good expandability towards a digital hardware implementation.
References 1. Yang, J.: Artificial Neural Networks Practical Tutorial. Zhe Jiang University Press, HanZhou (2000) 2. Xu, D.: MATLAB 6.x System Analysis and Design. Xi’an Electronic Science and Technology University Press, Xi’an (2002) 3. Wei, G.: Study on HPM Irradiation Effect on Radio Fuse. Journal of Central Plains Engineering College 12, 145–150 (2003) 4. Li, T.J.: Data Acquiring System Based on Vxi bus. In: Proceedings of the Second International Conference on Active Media Technology, vol. 5, pp. 688–692 (2004)
5. Li, T.J.: Design of Computer Management System. In: Proceedings of the Third International Conference on Wavelet Analysis and Applications, vol. 5, pp. 744–749 (2004) 6. Li, T.J.: Design of Boot Loader in Embedded System. In: Proceedings of the 6th International Progress Wavelet Analysis and Active Media Technology, vol. 6, pp. 458–463 (2005) 7. Li, T.J., Lin, X.Y.: Research on Integrated Navigation System by Rubidium Clock. Journal on Communication 8, 144–147 (2006)
A Novel Model for Customer Retention Yadan Li, Xu Xu, and Panida Songram
Abstract. The prevention of customer churn through customer retention is a core issue of Customer Relationship Management (CRM). By minimizing customer churn a company can maximize its profit. This paper proposes a novel churn model to deal with customer retention problems. It does not only through churn probability to classify the customers, but also by the achieved pattern and rules to make policies.With the help of intuitionistic fuzzy set theory, α-cuts, expert knowledge, data mining technique is employed to construct the model. This study’s experiments show that the proposed model has validated its efficiency. In short, the proposed model provides a new route to guide the further research concerning customer retention. Keywords: Customer retention, Fuzzy set theory, Data mining.
1 Introduction Customer relationship management (CRM) comprises a set of processes and enabling systems supporting a business strategy to build long term, profitable relationships with specific customers [1]. Customer data and information technology tools shape into the foundation upon which any successful CRM strategy is built. In addition, the rapid growth of the Internet and its associated technologies has greatly increased the opportunities for Yadan Li · Xu Xu School of Economics and Management, Tongji University, Shanghai 200092, China
[email protected] Panida Songram Department of Computer Science, Faculty of Informatics, Mahasarakham University, Thailand 44150, Thailand H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 739–747. c Springer-Verlag Berlin Heidelberg 2009 springerlink.com
marketing and has transformed the relationships between companies and their customers are managed [2]. Although CRM has become widely recognized as an important business strategy, there is no widely accepted definition of CRM. Reference [3] defined CRM as the strategic use of information, processes, technology, and people to manage the customers relationship with the company across the whole customer life cycle. Reference [4] defined CRM as a company approach to understanding and influencing customer behavior through meaningful communications in order to improve customer acquisition, customer retention, customer loyalty, and customer profitability. These definitions emphasize the importance of viewing CRM as a comprehensive process of retaining customers, with the help of business intelligence, to maximize the customer value to the organization. According to References [5] and [6], CRM consists of four dimensions: Customer Identification, Customer Attraction, Customer Retention, and Customer Development. They share the common goal of creating a deeper understanding of customers to maximize customer value to the organization in the long term. Customer retention is the central concern for CRM. Customer satisfaction, which refers to the comparison of customers expectations with his or her perception of being satisfied, is the essential condition for retaining customers [7]. As such, elements of customer retention include one-to-one marketing, loyalty programs and complaints management. Loyalty programs involve campaigns or supporting activities which aim at maintaining a long term relationship with customers. One-to-one marketing refers to personalized marketing campaigns which are supported by analyzing, detecting and predicting changes in customer behaviors [8]. Customer retention has a significant impact on firm profitability. Reference [8] found that a 1% improvement in retention can increase firm value by 5%. Churn refers to the tendency for customers to defect or cease business with a company. Marketers interested in maximizing lifetime value realize that customer retention is a key to increasing long-run firm profitability. A focus on customer retention implies that firms need to understand the determinants of customer churn and are able to predict those customers who are at risk of defection at a particular point in time. Customer churn is the loss of existing customers to a competitor. The phenomenon has the potential to result in considerable profit loss for a company. The prevention of customer churn, as such, is a core CRM issue. It is, therefore, prudent to find better methods of ensuring that customers remain loyal. To do this it is crucial to predict customers’ behavior. Accurate prediction may help carriers minimize churning by building lasting relationships with customers. Some carriers have begun looking into their customer churn data, typically by singling out a small number of variables and searching for dependencies between churned clients and company policies. This has typically been done using traditional statistical models. Some companies have
even gone one step further by employing data mining techniques in hopes of obtaining better results. References [9] and [10] are concerned with the discovery of interesting association relationships, which are above an interesting threshold, hidden in databases. Selected association rules can be used to build a model for predicting the value of a future customer. Previous investigations have highlighted the impact of many policies on customer retention. Yet, almost all studies have focused on increasing the accuracy of predicting churn without using the resulting analysis to make policies that prevent it. Simply predicting churn cannot reduce its rate of occurrence. In order to limit the rate of churn, more analysis is needed. It must be recognized that churn can be the result of a number of factors. While competitive pricing typically drives churn, factors such as customer service, service quality and regional coverage also cause customer defection. This paper proposes a novel model to approach the customer retention problem. The model contains two functions, i.e. classification and making policies. In the classification function, firstly, it constructs a database which concludes a number of satisfaction factors, then it forms a new database based on the original database by combining intuitionistic fuzzy set theory and αcuts. Secondly, it gets the churn probability and classifies the customers into different groups. In the making policies, it employs data mining technique to find the interesting pattern and association rules to each customer group. This is then used to create appropriate policies for different customer group. The most significant feature of this model is that it not only predicts churning but also makes proactive attempts to decision makers.
2 Proposed Model and Approach As the nature of research in customer retention model, data mining is difficult to confine to specific disciplines. Intuitionistic fuzzy set theory, α-cuts, and expert knowledge are employed to our model.
2.1 Model Architecture The proposed novel model works in two functions: classification and making policies.
2.2 Intuitionistic Fuzzy Set Theory Intuitionistic fuzzy set theory(IFS) is an extension of fuzzy set theory that defies the claim that from the fact that an element x belongs to a given degree μA (x) to a fuzzy set A, naturally follows that x should not belong to A to the extent 1-μA (x), an assertion implicit in the concept of a fuzzy set. On
Table 1 Attributes table Attributes description Enterprise offers competitive price(D1 ) Enterprise is abreast of developing new products(D2 ) Complains are taken by enterprise’s employees(D3 ) It is easy to get enterprise’s contact with the right person at call center (D4 ) The employees at enterprise’s center are competent and professional(D5 ) The enterprise utilizes e-mail to communicate with customers(D6 ) The enterprise’s sales representative is competent and has profound knowledge(D7 ) The enterprise offers gifts to customers in special days(D8 ) The enterprise’s society responsibility (D9 ) · · ·· · ·
the contrary, an IFS assigns to each element x of the universe both a degree of membership μA (x) and one of non-membership vA (x) such that μA (x) + vA (x) ≤ 1
(1)
Thus relaxing the enforced duality μA (x) = 1 − vA (x) from fuzzy set theory. Obviously, when μA (x)+vA (x) = 1 for all elements of universe, the traditional fuzzy set concept is recovered. IFS can satisfy the customers needs and feelings effectively. So the proposed model utilizes IFS to help construct the original satisfaction database. In this proposed model, the satisfaction values are adopted to build the database. Table 1 shows some of attributes description, and the attributes values satisfy IFS.
2.3 α-Cuts An element x ∈ X that typically belongs to a fuzzy set A, when its membership value to be greater than some threshold α ∈[0,1]. The ordinary set of each element is the α-cut Aα of A. Aα = {x ∈ X, μA (x) ≥ α}
(2)
Equation (2) is employed in this paper, Reference [11] also defines the strong α-cut Aα = {x ∈ X, μA (x) > α} (3) The membership function of a fuzzy set can be expressed in terms of the characteristic function of its α-cuts according to the formula: 1 if f x ∈ Aα ; uA (α) = (4) 0 otherwise.
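A small illustration of this α-cut transformation, which turns a row of the original satisfaction table into a row of the transformed table: the Python sketch below is only an example (the threshold α = 0.6 is the value used later in Section 3.1, and the sample row corresponds to the first customer in Table 2).

ALPHA = 0.6

def alpha_cut(row, alpha=ALPHA):
    # map each membership value mu to 1 if mu >= alpha, else 0 (Eqs. 2 and 4)
    return [1 if mu >= alpha else 0 for mu in row]

original_row = [0.3, 0.6, 0.7, 0.8, 0.6, 0.8, 0.7, 0.6, 0.4]   # attributes D1..D9 of one customer
print(alpha_cut(original_row))                                  # -> [0, 1, 1, 1, 1, 1, 1, 1, 0]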
2.4 Data Mining Data mining combines the statistic and artificial intelligence to find out the rules that are contained in the data, letters, and figures [12]. The central idea of data mining for CRM is that data from the past that contains information that will be useful in the future. So as to acquire and retain potential customers and maximize customer value. Appropriate data mining tools, which are good at extracting and identifying useful information and knowledge from enormous customer databases, are one of the best supporting tools for making different CRM decisions. There are many methods of data mining including classification, estimation, prediction, clustering, and association rules. Among these, association rules can discover the high frequency pattern and discover which things appear frequently and simultaneously. In this novel model, association rules are guided to make policies to different customers.
2.5 Expert Knowledge Expert knowledge and percentage of customer satisfaction(P, churn probability ) are combined to classify the customers. Experts of this fields are employed to confirm the boundary of customer satisfaction. For example, Case one: 0 1 0 1 0 1 1 1 0, P=5/9=56%; Case two: 1 1 1 1 0 0 0 1 1, P=6/9=65%; Case three: 1 1 1 1 1 1 1 0 1, P=8/9=89%; Case four: 1 1 1 1 1 1 1 1 1, P=100%. The customers can be divided into different groups according to the values of P. Group one: P<60%; Group two: 60% ≤ P< 80%; Group three: 80% ≤ P <100%; Group four: P=100%.
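The grouping rule above can be written down directly; the following Python sketch is illustrative (the helper names are assumptions) and reproduces the first worked case.

def satisfaction_percentage(bits):
    # P = fraction of satisfied attributes in a customer's 0/1 vector
    return sum(bits) / len(bits)

def customer_group(p):
    if p < 0.60:
        return "Group one"
    elif p < 0.80:
        return "Group two"
    elif p < 1.00:
        return "Group three"
    return "Group four"

case_one = [0, 1, 0, 1, 0, 1, 1, 1, 0]                      # P = 5/9 ~ 56%
print(customer_group(satisfaction_percentage(case_one)))     # -> Group one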
3 Model Explanation The steps of our proposed model are described below: − Construct the database of customer retention with the support of IFS. − Set the value of α to form the transformed database. − Combine expert knowledge and the percentage of satisfaction to divide the customers into different groups. − For the different groups, select the entries whose value is one to form transaction tables with the help of α-cuts. − Discover the patterns of customer retention with the data mining technique.
Table 2 Original table D1 0.3 0.3 0.6 0.7 0.4 0.7 0.5 0.3 0.5 0.4 0.5 0.6 0.6 0.6 0.6 0.7 0.6 0.4 0.7 0.1
D2 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.7 0.3 0.3 0.6 0.7 0.3 0.3 0.3 0.2 0.7 0.5 0.6 0.2
D3 0.7 0.8 0.8 0.8 0.8 0.6 0.6 0.6 0.5 0.4 0.3 0.3 0.2 0.2 0.1 0.1 0.8 0.8 0.3 0.6
D4 0.8 0.5 0.6 0.6 0.6 0.6 0.6 0.4 0.4 0.5 0.5 0.4 0.4 0.3 0.3 0.3 0.5 0.6 0.2 0.6
D5 0.6 0.4 0.6 0.6 0.6 0.6 0.7 0.3 0.2 0.2 0.1 0.1 0.1 0.2 0.1 0.2 0.6 0.6 0.6 0.7
D6 0.8 0.3 0.7 0.6 0.6 0.6 0.6 0.2 0.4 0.3 0.4 0.3 0.2 0.3 0.3 0.3 0.6 0.6 0.6 0.6
D7 0.7 0.5 0.8 0.8 0.8 0.8 0.8 0.4 0.4 0.3 0.3 0.3 0.2 0.4 0.3 0.4 0.8 0.8 0.8 0.8
D8 0.6 0.6 0.6 0.6 0.6 0.6 0.8 0.5 0.4 0.4 0.6 0.3 0.5 0.6 0.6 0.7 0.6 0.6 0.6 0.8
D9 0.4 0.8 0.8 0.9 0.7 0.9 0.9 0.9 0.7 0.6 0.5 0.6 0.3 0.6 0.5 0.6 0.9 0.7 0.8 0.6
3.1 Classification Part The following steps explicate the processes of classification part. Step one: Table 2 shows the original table. Step two: Suppose α=0.6, the transformed table is shown in table 3. Step three: Combine expert knowledge and the percentage of satisfaction to divide customers into different groups(see 2.5).
3.2 Making Policies The customers can be divided into four groups: emergency customers, demanding customers, price sensitive customers, permanent customers according to the values of P. Here the following emergency customers(P is less than 60%) are employed to illustrate the function. Table 4 shows the emergency customers table based on table 3. The following steps explicate the processes of making policies part. Step one: The transaction table is shown in table 5. Each transaction is assigned a transaction identifer(TID). Step two: Calculate the appearing times of every transaction item and show in Table 6.
Table 3 Transformed table D1 0 0 1 1 0 1 0 0 0 0 0 1 1 1 1 1 1 0 1 0
D2 1 1 1 1 1 1 1 1 0 0 1 1 0 0 0 0 1 0 1 0
D3 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 0 1
D4 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 1
D5 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1
D6 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1
D7 1 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1
D8 1 1 1 1 1 1 1 1 0 0 1 0 0 1 1 1 1 1 1 1
D9 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0 1 1 1 1 1
Step three: Acquire the support of every appearing item. D1 =0.5, D2 =0.3, D3 =0.2, D8 =0.5, D9 =0.7. Step four: Compare the support of every appearing item with min-support (0.3). It is obviously to find frequent item set I1 , I2 . It is obviously to find frequent item set I1 , I2 . I1 = { D1 , D2 , D8 , D9 }, I2 = { D1 D8 , D8 D9 }. Step five: The min-confidence is say, 60%, then the association rules can be yielded: D1 ⇒D8 (60%); D8 ⇒D1 (60%); D8 ⇒D9 (60%). Table 4 Transformed table of emergency group D1 0 0 0 0 0 1 1 1 1 1
D2 1 0 0 0 1 1 0 0 0 0
D3 1 1 0 0 0 0 0 0 0 0
D4 0 0 0 0 0 0 0 0 0 0
D5 0 0 0 0 0 0 0 0 0 0
D6 0 0 0 0 0 0 0 0 0 0
D7 0 0 0 0 0 0 0 0 0 0
D8 1 0 0 0 1 0 0 1 1 1
D9 1 1 1 1 0 1 0 1 0 1
Table 5 Transaction table TID 1 4 7 10
Items D2 D3 D8 D9 D9 D1 D1 D8 D9
TID Items TID Items 2 D3 D9 3 D9 5 D2 D8 6 D1 D2 D9 8 D1 D8 D9 9 D1 D8
Table 6 Appearing times of some items Item D1 D3 D9
Appearing times Item Appearing times 5 D2 3 2 D8 5 7
The useful patterns are presented below: − If customers value the competitive price offered by the enterprise, they will like the gifts provided by the enterprise in special days. − If customers like the gifts provided by the enterprise in special days, they will value the competitive price offered by the enterprise. − If customers like the gifts provided by the enterprise in special days, they will concern social responsibility of the enterprise.
4 Experiment Results This model of customer retention with the cooperative corporation started in 2008. In such a situation, the novel model has been tested with the corporation recently. Some interesting pattern and rules are achieved in our research. Different policies are appropriate for different customers.
5 Conclusions This paper has described a model architecture to deal with the complete customer retention problem. This is accomplished not only through dividing customers into different groups, but also by proposing retention policies. The model works in two modes, namely, classification and making polices. In the classification part, firstly, the model builds the transformed database from the original database combined IFS and α-cuts. At a second step, the model divides customers into different groups from the transformed database based on expert knowledge and percentage of customer satisfaction. In the making policies part, different policies are established to different groups through data mining technique. The experiments show that the proposed model has
an efficiency to the corporation. Owing the percentage of satisfaction equals to churn probability in this model, the model process signifies an interesting and important approach toward a better support in retaining possible churners. Acknowledgements. The research is supported by the National High-Tech. R&D Program for CIMS, China (No.2007AA04Z151), Program for New Century Excellent Talents in University, China (No. NCET-06-0377), Shanghai Leading Academic Discipline Project(No.B310) and the National Natural Science Foundation, China (No.70531020). Thanks to the cooperative enterprises for their help in making this study possible.
References 1. Chi, H.L., Phill, K.R.: Web Personalization Expert with Combining Collaborative Filtering and Association Rule Mining Technique. Expert Systems with Applications 21, 131–137 (2001) 2. Ngai, E.W.T.: Customer Relationship Management Research (1992-2002): An Academic Literature Review and Classification, Marketing Intelligence, Planning 23, 582–605 (2005) 3. Parvatiyar, J.N.: Customer Relationship Management: Emerging Practice, Process, and Discipline. Journal of Economic and Social Research 36, 1–34 (2001) 4. Kincaid, J.W.: Customer Relationship Management: Getting it Right, Upper Saddle River, New York (2003) 5. Swift, R.S.: Accelarating Customer Relationships: Using CRM and Relationship Technologies, Upper Saddle River, New York (2001) 6. Chen, I.J., Popovich, K.: Understanding Customer Relationship Management (CRM): People, Process and Technology. Business Process Management Journal 9, 672–688 (2001) 7. Kivetz, R., Simonson, I.: Earning the Right to Indulge: Effort as a Determinant of Customer Preferences toward Frequency Program Rewards. Journal of Marketing Research 39, 155–170 (2002) 8. Kim, E., Kim, W., Lee, Y.: Combination of Multiple Classifiers for the Customer’s Purchase Behavior Prediction. Decision Support Systems 34, 167–175 (2001) 9. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York (2001) 10. Wang, K., Zhou, S., Yang, Q., Yeung, M.S.: Mining Customer Value: From Association Rules to Direct Marketing. Data Mining and Knowledge Discovery 11, 58–79 (2005) 11. Didier, D., Henri, P.: Fuzzy Set Theory and its Applications. Kluwer-Nijhoff Publishing, London (2001) 12. Xu, X., Jie, L., Feng, W., Yi, J.: Utilize a Novel Approach to Find the Relative Pattern of Patients of a Disease. In: 7th World Congress on Intelligent Control and Automation, pp. 4201–4205. IEEE Press, Piscataway (2008)
Neural Network Ensemble Approach in Analog Circuit Fault Diagnosis Hong Liu, Guangju Chen, Guoming Song, and Tailin Han*
Abstract. Neural network (NN) based approaches are widely used strategies for analog circuit fault diagnosis at present. In this paper, an NN-ensemble-based strategy is introduced into the field of analog circuit fault diagnosis. Efficient, accurate, and diverse fault feature sets are obtained by resampling the original feature set with the Bagging algorithm in order to train individual RBF neural networks as component classifiers simultaneously; a plurality voting strategy is then employed to isolate the actual faults of the analog Circuit Under Test (CUT). Experimental results indicate that, compared with any of its individual RBF neural networks, the NN ensemble effectively improves the generalization ability of the analog circuit fault classifier and increases the fault diagnosis accuracy. Keywords: Neural network ensemble, Bagging, Analog circuit, Fault diagnosis.
1 Introduction With the increase of analog circuits scale and integration, and application of plentiful mixed analog-digital circuits, diagnosis and test in analog circuits is becoming a difficult and urgent task. For analog circuits, the lack of simple fault models, the presence of component tolerances, noise, and circuit nonlinearities make the diagnosis automation of analog circuits very complex [1-3]. Hong Liu . Guangju Chen . Guoming Song School of Automation Engineering, University of Electronic Science and Technology of China *
,Chengdu 610054, China
Hong Liu School of Computer Science and Technology, Changchun University of Science and Technology Changchun 130022, China
,
Guoming Song Computer Engineering Dept., Chengdu Electromechanical College Chengdu 610031, China Tailin Han College of Electronic Science and Engineering, Changchun University of Science and Technology, Changchun 130022, China
[email protected] H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 749–757. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
Faults in analog circuits can be classified into two categories: catastrophic and parametric. Catastrophic faults include open nodes, shorts between nodes, and other topological changes in a circuit. Parametric faults refer to any change in the value of an element with respect to its nominal value outside the tolerance limits, without affecting its connectivity. According to the faults number, faults in analog circuits are often categorized into two types: single fault and multiple faults. The number of single fault in electronic devices accounts for 70-80% compared to total fault number, and for some multiple faults always correlate, they can be seen as a single fault. So in this paper, we only take single soft fault in analog circuits into account [1]. Along with the development of the artificial intelligence technology, NN-based analog circuits fault diagnosis methods have been the hotspot for many scholars. NN’s great advantage in function fitting and self-learning ability, good robustness and prominent classification ability make it a superb classifier in analog circuit fault diagnosis. But a common criticism for NN-based analog circuits fault diagnosis comes up subsequently because the tedious process of selecting the proper NN architecture and the over-fitting, under-fitting, local minima and poor generalization phenomenons. In order to improve the generalization ability of analog circuit fault classifier and make it more practical, one solution is combining the decisions of several classifiers rather than using the output of the best classifier in the ensemble. And the other is Support Vector Machine. Since the problem of analog circuit fault diagnosis is equivalent to pattern recognition problems [1], in this paper, the existing NN ensemble strategy in pattern recognition field is selected. In this paper, a method for diagnosing analog circuit fault based on NN ensemble is presented. Firstly we focus on the theoretical analysis, then the NN ensemble method is presented, after obtained the original fault feature sets from the response of CUT, the resampled training sets for training every component networks simultaneously, and the plurality voting strategy which is employed to diagnosis the unknown faults of the CUT. At last, experimental results verify the validity of the proposed method and the fault diagnosis accuracy has been increased.
2 Theoretical Analysis and Implement Steps 2.1 Theoretical Analysis ANN ensemble is a group of classifiers which are combined together in order to obtain a better generalization ability than that gained by a single classifier. It’s a very successful technique where the outputs of a set of separately trained neural network are combined to form one unified prediction. The basic framework of NN ensemble is shown in Fig. 1. Many works have been done in investigating why and how neural network ensemble works. The most famous and classical one is Krogh and Vedelsby’s work [4]. The formulation for ensemble error in case of regression using a linearly
Neural Network Ensemble Approach in Analog Circuit Fault Diagnosis
weighted ensemble has been proved [5]. Assume the task is to learn a function
751
f,
and the training samples are drawn randomly from the distribution p (x ) . Suppose that the ensemble consists of N networks and the output of network α is called
V α (x) , then the final output of the ensemble is defined as (1): V ( x) = ∑ ωα V α ( x ) α
(1)
x of an individual network is defined as a = (V ( x) − V ( x)) . Then the ensemble diversity on input x is:
The α
diversity on
input
α
2
a ( x) = ∑ ωα aα = ∑ ωα (V α ( x) − V ( x)) 2 α
α
The quadratic errors of the network and (4):
(2)
α and of the ensemble are respectively in (3)
ε α ( x) = ( f ( x) − V α ( x)) 2
(3)
e( x) = ( f ( x) − V ( x)) 2
(4)
Another form for e( x) can be conducted according to (2):
e( x ) = ∑ ω α ε α ( x ) − a ( x ) α
(5)
E α ( x) , Aα ( x) and E ( x) to be the averages, over the input distribution, of ε α ( x) , a α ( x) and ε ( x ) respectively, shown in (6) to (8).
We define
E α ( x) = ∫ dxp ( x)ε α ( x)
(6)
Aα ( x) = ∫ dxp ( x)a α ( x)
(7)
E ( x) = ∫ dxp ( x)ε ( x)
(8)
From Eq. (6), the ensemble generalization error
E=E−A
E can be formulated as: (9)
(9) clearly demonstrates that the generalization ability of ensemble is determined by the average generalization ability and the average ambiguity of the individual neural networks that constituted the ensemble. It means that there are two ways to increase the generalization ability of the ensemble. One way is to decrease the generalization error to make E smaller, another is to increase the difference
752
H. Liu et al.
Output Combine network outputs O1 Network 1
On
O2 Network 2
Network n
Input Fig. 1 Basic framework of NN ensemble
degrees among the individual neural networks to make A larger [5]. That is the reason why the errors of the individual ensemble components will be counteracted when their predictions are combined. (9) also shows that the generalization error of the ensemble is always smaller than the average of the individual errors, that is E < E . In particular for uniform weights:
E≤
1 N
Eα ∑ α
(10)
2.2 The Selection of the Component Networks According to the theoretical analysis, a NN ensemble is constructed by two steps, one is to design a number of individual neural networks and the other is to combine their predictions according to a certain rules. As far as it goes, Back Propagation (BP) NN and Radial basis function (RBF) NN are the most prevailing classifiers to diagnose faults in analog and mixedsignal circuits. RBF NN is a two-layer network whose output nodes form a linear combination of the Gaussian kernel functions computed by the hidden layer nodes. The basis functions in the hidden layer produce a localized response to input stimulus. That is, they produce a significant nonzero response only when the input falls within a small localized region of the input space. And theory has been verified, for a given nonlinear function, using the RBF neural network can approximate it with any accuracy. And more important thing is that the RBF neural network can avoid tedious redundancy computing of reverse direction propagation between the input layer and the hidden layer. And the speed of learning is 103-104
Table 1 Hypothetical resampling of Bagging method (a sample of Bagging on the same data set)

Original Training Set:      {1 2 3 4 5 6 7 8}
Resampled Training Set 1:   {2 7 8 3 7 6 3 1}
Resampled Training Set 2:   {3 6 2 7 5 6 2 2}
Resampled Training Set 3:   {4 5 1 5 6 4 3 8}
Resampled Training Set 4:   {7 8 5 6 4 2 7 1}
2.3 Ensemble Technique

Among ensemble methods, the most prevalent training techniques are Bagging and Boosting. In this paper, we concentrate on the Bagging method, which generates disagreement among the classifiers by altering the training set and the network parameters that each classifier employs. Bagging was proposed by Breiman based on bootstrap sampling [7]. It generates several training sets from the original training set, and each training set is an independent sample of the data; thus, some examples are missing and others occur multiple times. For example, in Table 1 there are eight training examples in the original training set; by resampling, we form four training sets for four component classifiers. In this case, the original and resampled training sets contain the same number of examples. Because the Bagging method gives every classifier a different training set and different network parameters, it guarantees the diversity of the component networks and makes better generalization ability possible (a resampling sketch is given below). Boosting [8] encompasses a family of methods whose focus is to produce a series of classifiers. The Boosting training sets are also samples of the original data set, but incorrectly predicted examples occur more often in later training sets, since Boosting concentrates on predicting them correctly. Thus Boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In [7], results indicate that in cases where there is noise, Bagging's error rate will not increase as the ensemble size increases, whereas the error rate of Boosting methods may indeed increase with ensemble size; that is to say, Boosting sometimes causes over-fitting. In this paper, Bagging is chosen as the ensemble technique, and the Boosting technique will be explored in further research in the field of analog circuit fault diagnosis.
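The bootstrap resampling performed by Bagging can be sketched in a few lines of Python; the set sizes below simply mirror the hypothetical example in Table 1 and are not the circuit data used later:

```python
import numpy as np

def bagging_sets(n_examples, n_classifiers, rng=None):
    """Draw one bootstrap training set (sampling with replacement,
    same size as the original set) for each component classifier."""
    rng = np.random.default_rng(rng)
    return [rng.integers(0, n_examples, size=n_examples) for _ in range(n_classifiers)]

# Eight original examples and four component classifiers, as in Table 1.
for k, idx in enumerate(bagging_sets(n_examples=8, n_classifiers=4, rng=1), start=1):
    print(f"Resampled Training Set {k}: {{{' '.join(str(i + 1) for i in idx)}}}")
```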
3 Experiments Analysis

When NNs are used to diagnose analog faults, the test stimuli differ among the published approaches. Reference [9] uses an impulse test signal, and fault features of the
Fig. 2 Sallen-Key Bandpass Filter
response in the time domain are input to a BP NN to diagnose faults in a differential amplifier. Reference [10] uses a white noise test signal and a BP NN for time-domain response analysis and fault diagnosis in a band-pass filter. The work in [11] exploits a DC test signal to extract fault features from a few accessible nodes, and RBF and BP NNs are considered and compared for a purely resistive circuit. In our work, a Sallen-Key band-pass filter is chosen as the analog CUT, and a sinusoidal wave with a constant amplitude of 5 V is employed as the input signal of the CUT. The nominal values of the components in the CUT are shown in Fig. 2, where each resistor has a tolerance of 5% and the capacitors have a tolerance of 10%. A sensitivity analysis of the discrete components has shown that the CUT response is sensitive to R2, R3, C1 and C2. When the resistor and capacitor values vary within their tolerance limits, the CUT response is considered fault-free. We assume that R2 and R3 have soft faults in the intervals ±(5%, 70%) of the nominal value, and that C1 and C2 have soft faults in the intervals ±(10%, 70%) of the nominal value. The faults can then be classified into 8 fault models: R2↓, R2↑, R3↓, R3↑, C1↓, C1↑, C2↓, C2↑, where ↓ and ↑ stand for values lower or higher than the nominal ones, respectively. Therefore 9 fault models, including the fault-free state, are obtained. To generate training data for the different fault classes, we set faulty component values in the circuit while the other components' values are varied within their tolerances. The set of single soft deviation faults is given in Table 2. In our experiments, the CUT is modeled and the response waveforms of the faulty and fault-free states are simulated with OrCAD 10.5 PSpice software. To construct the fault dictionary, the response waveforms of the corresponding faulty or fault-free states are sampled, and the fault feature data of the responses form the original data set. For every fault pattern, 100 Monte Carlo runs are performed.
Table 2 A list of single soft faults of CUT

Element | Nominal value | Fault value | Fault index
R2↓     | 3 kΩ          | 2.4 kΩ      | 1
R2↑     | 3 kΩ          | 4.8 kΩ      | 2
R3↓     | 2 kΩ          | 1.2 kΩ      | 3
R3↑     | 2 kΩ          | 2.8 kΩ      | 4
C1↓     | 5 nF          | 3.8 nF      | 5
C1↑     | 5 nF          | 6.1 nF      | 6
C2↓     | 5 nF          | 2.1 nF      | 7
C2↑     | 5 nF          | 7.5 nF      | 8
For the 9 patterns, 900 examples are obtained in total. We then choose 540 examples to construct the original training set, and the remaining 360 samples form the test set. Afterwards, the original feature data are preprocessed and normalized: the purpose of preprocessing is to reduce the number of inputs to the neural network so as to reduce the computational time, and the aim of normalization is to speed up the convergence of the neural networks. In our experiments, the results show that most of the reduction in error for the ensemble method occurs with the first few additional classifiers, especially the first six. Accordingly, 15 RBF neural networks are chosen as the component classifiers; with the Bagging resampling technique, we obtain 15 different resampled training sets for the component networks and train them in parallel. These resampled training sets are generated by randomly drawing from the original data set. The whole diagnosis process is simulated in MATLAB R2008. The first step is to preprocess and normalize the original feature data, and the data are then divided into the original training set and the test set. After these two steps, the training and testing process in one run is as follows (a code sketch follows the steps):
Step 1. Generate the Bagging samples for the individual networks by randomly drawing samples from the original training set.
Step 2. Train an RBF neural network on each Bagging sample and save the parameters of the trained network.
Step 3. Predict with the trained RBF neural networks on the test data and save the results.
Step 4. Combine the results of the individual RBF neural networks by using a voting mechanism.
Step 5. Calculate the fault diagnosis accuracy.
In combining the different decisions of the component classifiers, the plurality voting rule is chosen as the voting mechanism.
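The five steps above can be sketched in Python. This is only an illustrative NumPy re-implementation, not the paper's MATLAB code: the RBF networks here are trained with a simple least-squares output layer, the arrays `X_train`, `y_train` (integer class labels) and `X_test` are assumed to be already preprocessed and normalized, and the sizes (15 networks, 9 fault classes) follow the description above.

```python
import numpy as np

class SimpleRBFNet:
    """A small RBF classifier: Gaussian hidden units centred on training points,
    linear output layer fitted by least squares on one-hot targets."""
    def __init__(self, n_centers=20, rng=None):
        self.n_centers = n_centers
        self.rng = np.random.default_rng(rng)

    def fit(self, X, y, n_classes):
        idx = self.rng.choice(len(X), size=min(self.n_centers, len(X)), replace=False)
        self.centers = X[idx]
        d = np.linalg.norm(self.centers[:, None, :] - self.centers[None, :, :], axis=-1)
        self.sigma = max(d.mean(), 1e-6)            # simple common-width heuristic
        H = self._hidden(X)
        T = np.eye(n_classes)[y]                    # one-hot targets
        self.W, *_ = np.linalg.lstsq(H, T, rcond=None)
        return self

    def _hidden(self, X):
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * self.sigma ** 2))

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.W, axis=1)

def bagged_rbf_diagnosis(X_train, y_train, X_test, n_classes=9, n_nets=15, rng=0):
    """Steps 1-4: train n_nets RBF networks on bootstrap resamples and
    combine their decisions by plurality voting."""
    rng = np.random.default_rng(rng)
    votes = np.zeros((len(X_test), n_classes), dtype=int)
    for _ in range(n_nets):
        idx = rng.integers(0, len(X_train), size=len(X_train))        # Bagging sample
        net = SimpleRBFNet(rng=rng.integers(1 << 31)).fit(X_train[idx], y_train[idx], n_classes)
        pred = net.predict(X_test)
        votes[np.arange(len(X_test)), pred] += 1
    return votes.argmax(axis=1)                                        # plurality vote
```

Step 5 would then amount to `np.mean(bagged_rbf_diagnosis(X_train, y_train, X_test) == y_test)` for a labeled test set.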
Table 3 Fault diagnosis results of the single RBF and the RBF NN ensemble (diagnosis correction rate, %)

Component | Single RBF | RBF NN ensemble
R2        | 80         | 90
R3        | 84         | 91
C1        | 88         | 95
C2        | 100        | 99
In our experiments, we perform 10 runs, and the overall performance of the diagnosis system is obtained by averaging the performance over the 10 runs. From Table 3, compared with the single RBF network of reference [1], we can see that the fault diagnosis correction rate increases as the theory predicts, and the ensemble diagnosis accuracy is also higher than that of any individual network in the ensemble. In this paper, a Bagging ensemble fault diagnosis system is constructed, and the above results show the superior performance and high classification accuracy of the proposed approach. This approach is also suitable for the fault diagnosis of other analog circuits.
4 Conclusion

An NN-ensemble-based analog circuit fault diagnosis method is presented here. The examples in the original feature set describe the behavior of the CUT; they consist of circuit measurements representing the fault-free state and the different fault scenarios. RBF NNs are the individual classifiers employed to build the ensemble, and Bagging is used to obtain different random training sets to train the individuals in the ensemble. The trained ensemble of NNs is then employed to isolate the CUT faults, and finally the fault diagnosis correction rate of the ensemble of individual classifiers is calculated. Experimental results indicate that this scheme is feasible and that higher fault classification accuracy is obtained by employing this approach. As a next step, testability analysis of the analog CUT and a new feature extraction method based on entropy will be considered to increase the accuracy of analog fault diagnosis, and the Boosting ensemble technique will also be investigated in further research.
References
1. Wang, C., Xie, Y.L., Chen, G.J.: Fault Diagnosis Based on Radial Basis Function Neural Network in Analog Circuits. In: International Conference on Communications, Circuits and Systems Proceedings, pp. 1183–1185 (2004)
2. Catelani, M., Fort, A.: Fault Diagnosis of Electronic Analog Circuits Using a Radial Basis Function Network Classifier. Measurement 28, 147–158 (2000)
3. Wang, P., Yang, S.: A New Diagnosis Approach for Handling Tolerance in Analog and Mixed-signal Circuits by Using Fuzzy Math. IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications 52, 2118–2127 (2005)
4. Krogh, A., Vedelsby, J.: Neural Network Ensembles, Cross Validation, and Active Learning. In: Advances in Neural Information Processing Systems, vol. 8, pp. 231–238 (1995)
5. Zhao, Y., Gao, J., Yang, X.Z.: A Survey of Neural Network Ensembles. In: International Conference on Neural Networks and Brain, pp. 438–442 (2005)
6. Liu, Y., Wang, Y., Zhang, B.F.: Ensemble Algorithm of Neural Networks and Its Application. In: Proceedings of the Third International Conference on Machine Learning and Cybernetics, pp. 3464–3467 (2004)
7. Opitz, D., Maclin, R.: Popular Ensemble Methods: An Empirical Study. Journal of Artificial Intelligence Research 11, 169–198 (1999)
8. Schapire, R.: The Strength of Weak Learnability. Machine Learning 5, 197–227 (1990)
9. Maidon, Y., Jervis, B.W.: Using Artificial Neural Networks or Lagrange Interpolation to Characterize the Faults in an Analog Circuit: An Experimental Study. IEEE Trans. on Instrumentation and Measurement 48, 932–938 (1999)
10. Spina, R., Upadhyaya, S.: Linear Circuit Fault Diagnosis Using Neuromorphic Analyzers. IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing 44, 188–196 (1997)
11. Mahammadi, K., Monfared, A.R.M.: Fault Diagnosis of Analog Circuits with Tolerances By Using RBF and BP Neural Networks. In: IEEE Student Conference on Research and Development Proceedings, vol. 2, pp. 317–321 (2002)
Research on Case Retrieval of Case-Based Reasoning of Motorcycle Intelligent Design Fanglan Ma, Yulin He, Shangping Li, Yuanling Chen, and Shi Liang*
Abstract. A case retrieval model based on neural networks is presented to enhance the efficiency and quality of case retrieval in the case-based reasoning system for motorcycle intelligent design. In the retrieval model, an adaptive resonance theory neural network is used to dynamically cluster the cases in the case base so as to narrow the search range, and a back propagation neural network is applied to memorize the index of cases so that the similar case can be retrieved quickly from the narrowed case base. Thus the efficiency and quality of case retrieval are improved. Finally, an example of plan selection in motorcycle general design is given, and its result is contrasted with that of case retrieval based on the nearest neighbor method to demonstrate the effectiveness of the case retrieval model. The research shows that using the adaptive resonance theory and BP neural networks to model the reasoning mechanism is practicable and effective.

Keywords: Motorcycle, Case-based reasoning, Adaptive resonance theory, Back propagation neural network, Case retrieval.
1 Introduction

In the typical general design of a motorcycle, many enterprises refer to available mature products to design a new one. During the design procedure, after the parameters of an available mature motorcycle are modified, experiments are carried out and the performance of the new design is evaluated repeatedly. With this method, experienced designers are needed to select, modify and evaluate a suitable product from the numerous motorcycles, and young designers find it difficult to make a decision among the numerous plans because they are short of experience.

Fanglan Ma · Yuanling Chen · Shi Liang
College of Mechanical Engineering, Guangxi University, Nanning, China
Yulin He
College of Mechanical Engineering, Chongqing University, Chongqing, China
Shangping Li
Mathematics and Science Department of Qinzhou University, Qinzhou, China
{FanglanMa,YulinHe,ShangpingLi,YuanlingChen,ShiLiang, lan_mfl}@163.com
Also, the reuse of design knowledge is low, and the design plans for the same product vary widely. Recently, case-based reasoning (CBR) applied in intelligent design systems has proven suitable for solving the above-mentioned problem. A CBR system finds a previous successful case for the new problem from the case base. However, such systems usually use the nearest neighbor method to retrieve the similar case [1], which requires the weights of the characteristic attributes to be determined; the weight determination is affected by subjectivity to a great extent, and the retrieval time increases linearly with the number of cases. For this reason, the connection weights of an artificial neural network have been used to determine the weights of the characteristic attributes and lessen the influence of subjectivity [2], and the self-organizing feature map neural network has been developed to cluster and index the cases to improve the precision and speed of case retrieval [3]. It has been shown that the artificial neural network is superior to the traditional CBR system in case retrieval. The adaptive resonance theory (ART) neural network has an advantage over the back propagation (BP) neural network because it has incremental on-line learning ability and a new input pattern does not affect the already clustered cases, so it has better stability and plasticity. In view of the above, in the intelligent design of motorcycles the ART1 neural network is integrated with the BP neural network to lessen the subjective influence on the weights and to increase the efficiency of case retrieval. The ART1 neural network is used to dynamically cluster the motorcycle cases; according to the clustered case templates, it can shorten the retrieval range in the case base. The BP neural network is applied to memorize the clustered cases to build the case index. The reasoning system is then built by means of identification and matching with the neural networks. At the same time, the retrieval results are contrasted with those based on the nearest neighbor method to test the effectiveness of the case-based reasoning system based on the neural network.
2 Index Modeling Based on Neural Networks

In the overall plan design of a motorcycle, the plan design provides the detailed design stages, such as the overall disposition design and the moulding design, with the relevant parameters to guide the development work. Therefore, the purpose of the index model based on neural networks is to retrieve the optimal parameter combination reflecting the customer's performance demands for the motorcycle, according to the requirement characteristics of the plan design. The designers can then enhance the design efficiency and quality with the help of the system.
2.1 Hierarchical Case Organization Based on the Overall Design Characteristics

The available overall design plans of motorcycles are divided into three layers (Fig. 1) in order to index the cases on the basis of the neural network. In the normal overall design procedure, the motorcycle type determines the types of the frame and engine to a great extent, and it also affects the installation and arrangement
Fig. 1 Hierarchical structure of motorcycle plan design
of the other assemblies. So the first layer is divided by the motorcycle-type characteristic, and the nodes of this layer are straddle frame and autoscooter. Because the classification of the motorcycle type is relatively stable, it is retrieved through a guide menu during the case-based reasoning. The second layer is classified by the requirement characteristics of the plan design: each motorcycle type in the first layer is divided into several clustered templates, and the design plans located in different templates reflect the purpose and performance requirements of different uses. Therefore characteristics such as the engine displacement, the highest speed and the power of the motorcycle are extracted from the motorcycle products to produce the clustered templates for the different purposes. The number of these templates is not fixed, since it results from the classification rule, so the incremental on-line learning ability of the ART1 neural network is used to carry out this open-ended classification. On the basis of the clustered templates, each case is identified by detailed characteristics such as the engine displacement, the highest speed and the power, forming the third layer. The classification of this layer is to memorize all the cases located in a certain clustered template, so it can be realized with a supervised neural network. In accordance with the above hierarchical organization, the ART1 neural network in the CBR system for motorcycle intelligent design is used to cluster the design plans, and the BP neural network is used to memorize the clustered plans to obtain the index model.
2.2 ART1 Clustering Algorithm Based on Characteristics

The ART1 neural network makes use of the self-excitation and lateral inhibition of neurons to guide learning. The input pattern is compared and identified through the two-way connection weights, and the neural network resonates to memorize it; the network recalls a pattern in the same way [6]. The structure of the ART1 neural network is shown in Fig. 2.
Fig. 2 Structure of ART1 neural network
The clustering of the ART1 neural network takes advantage of the feedback connection weights tij to memorize the learned patterns, that is, the neural index of the cases. The number of cluster groups is adjusted by the vigilance parameter: the larger it is, the more groups are formed and the greater the similarity among the cases within a group; conversely, the smaller it is, the fewer the groups and the lower the within-group similarity. The input vectors of ART1 are the characteristic attributes describing the product information, and the output indicates the cluster group. The algorithm is described below (a clustering sketch follows the steps).
(1) Code the case characteristics. The characteristics of the cases need to be coded because the inputs of the ART1 neural network must be binary, that is, 0 or 1. A categorical characteristic can be transformed directly into a 0/1 pattern: 1 means the case has the characteristic, and 0 means it does not. A numerical characteristic needs region splitting: the possible value range is divided into several regions, and the number of regions determines the length of the characteristic code.
(2) Input the characteristic codes into the ART1 neural network, and set a suitable vigilance parameter to produce a suitable number of cluster groups.
(3) Save the structural parameters of the ART1 neural network, and produce the ART1-based clustered indexes of the cases.
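A compact sketch of the clustering step is given below. It uses a simplified ART1 with fast learning (templates stored as binary vectors); the vigilance value and the binary characteristic codes are placeholders rather than the paper's actual samples.

```python
import numpy as np

def art1_cluster(patterns, vigilance=0.7, beta=1.0):
    """Simplified ART1 with fast learning: each cluster is a binary template;
    an input joins the best-matching template that passes the vigilance test,
    otherwise a new cluster (template) is created."""
    templates = []                         # one binary template per cluster
    labels = []
    for x in np.asarray(patterns, dtype=float):
        # rank existing clusters by the choice value |x AND t_j| / (beta + |t_j|)
        order = sorted(range(len(templates)),
                       key=lambda j: -(np.minimum(x, templates[j]).sum()
                                       / (beta + templates[j].sum())))
        chosen = None
        for j in order:
            match = np.minimum(x, templates[j]).sum() / max(x.sum(), 1e-12)
            if match >= vigilance:                            # vigilance test
                templates[j] = np.minimum(x, templates[j])    # fast-learning update
                chosen = j
                break
        if chosen is None:                 # no resonance: commit a new cluster
            templates.append(x.copy())
            chosen = len(templates) - 1
        labels.append(chosen)
    return labels, templates

# Toy binary characteristic codes (illustrative only, not the paper's samples).
codes = [[1,0,0, 0,1, 1,0], [1,0,0, 0,1, 1,0], [0,0,1, 1,0, 0,1], [0,1,0, 0,1, 1,0]]
labels, _ = art1_cluster(codes, vigilance=0.8)
print(labels)   # -> [0, 0, 1, 2]: similar codes fall into the same cluster
```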
2.3 Index Model of the BP Neural Network Based on the Clustered Templates

A BP neural network has three or more layers of neurons and is trained by means of supervised learning. When the neural network is presented with a pair of learning patterns, the activation values of the neurons are propagated from the input layer to the middle (hidden) layer, and finally to the output layer, where the neurons output the corresponding response. The BP neural network is therefore suitable for memorizing each clustered case.
A BP neural network needs to be built for each cluster template because the cases clustered into different templates do not have completely similar characteristics. The index model of the BP neural network for the motorcycle design plan therefore has three layers, which means the network has only one hidden layer. The number of input-layer neurons is the number of characteristic attributes. In order to memorize the index of each case, the output neurons correspond to the order numbers of the cases clustered in a given template. For example, if 3 cases are clustered in a certain template, these 3 cases are used as training samples, the corresponding output layer has three neurons, and the targets can be expressed as {1,0,0}, {0,1,0}, {0,0,1}. The numbers of input and output neurons depend on the dimensions of the input and output vectors, whereas the number of hidden neurons needs to be determined by repeated trials.
3 Example of Indexing the Overall Design Plan Based on Neural Networks

During the overall design of a motorcycle, representative key characteristic attributes are selected from the numerous parameters to form the design plan. For this reason, the input vectors of the neural network are the requirement characteristic attributes, namely the motorcycle type, the engine displacement, the highest speed, the highest power, the maximum torque, the minimum turning radius and the climbing capacity. When a new design requirement is input into the reasoning system, the neural network retrieves the similar cases.
3.1 Training and Clustering of ART1

The characteristic attributes selected from certain motorcycle products are coded into the input vectors of the ART1 neural network, and the network is then trained to obtain the index model of the cluster templates. The training procedure of ART1 is divided into the phases of initialization, identification, comparison, learning and renewal; details can be found in reference [6]. According to reference [7], the engine displacement p of the different straddle-frame motorcycle types can be divided into three regions: the displacement of the trail bike is between 0 and 50 mL, that of the general purpose motorcycle (or the cross-country motorcycle) is between 51 and 250 mL, and that of the race motorcycle is between 251 and 1100 mL. Hence, the code length for the engine displacement characteristic is three. In a similar way, the other characteristics, such as the highest speed vmax, the highest power wmax, the maximum torque Tmax, the minimum turning radius rmin and the climbing capacity c, are split as in Table 1, and the codes of some of the samples are shown in Table 2. 40 training samples and 25 testing samples are selected from the available motorcycle product database. The final training and testing results, obtained according to the learning procedure of reference [6], are shown in Table 3.
Table 1 Range splitting of characteristic attributes

Attribute    | Straddle frame                               | Autoscooter
p /mL        | 0~50, 51~250, 251~1100                       | 0~60, 61~250
vmax /km·h-1 | 0~60, 61~85, 85~120, 121~250                 | 0~50, 51~110
wmax /kW     | 3~5.5, 5.6~10, 11~22, 23~29, 30~80           | 3~6, 7~15
Tmax /N·m    | 4.3~6.5, 6.6~7.7, 7.8~25, 26~30, 31~115      | 5~8, 8.1~22
rmin /m      | 1.6~1.8, 1.9~2.5, 2.6~3                      | 1.6~1.8, 1.9~2.2
c /(°)       | 6~9, 18~22, ≥30                              | 6~9, 10~22
Table 2 Code of partial samples (value / code)

Sample | Motorcycle type     | p /mL     | vmax /km·h-1 | wmax /kW     | Tmax /N·m    | rmin /m   | c /(°)
1      | autoscooter / 01    | 49 / 10   | 60 / 01      | 5.0 / 10     | 6.96 / 10    | 1.8 / 10  | 7.5 / 10
2      | straddle frame / 10 | 124 / 010 | 130 / 0001   | 16.2 / 10000 | 17.6 / 00100 | 2.2 / 100 | 21 / 010
3      | straddle frame / 10 | 398 / 001 | 165 / 0001   | 39 / 00001   | 35.3 / 00001 | 2.6 / 001 | 30 / 001
Table 3 Classification results of straddle frame based on ART1

                 | trail bike | general purpose | cross-country | race
Training samples | 4          | 8               | 8             | 20
  Right          | 4          | 8               | 7             | 20
  Error          | 0          | 0               | 1             | 0
Testing samples  | 4          | 8               | 8             | 5
  Right          | 4          | 7               | 7             | 5
  Error          | 0          | 1               | 1             | 0
3.2 Training of the BP Neural Network

The training procedure of the BP neural network is composed of four stages: forward propagation of the input pattern, backward propagation of the output error, repeated memory training, and identification of the trained results. The output error converges to a very small value after the input pattern has been repeatedly propagated forward and the output error repeatedly propagated backward.
Table 4 Partial training samples of the straddle-frame general-purpose motorcycle based on the BP neural network

Input samples:
Sample | Motorcycle type | p /mL | vmax /km·h-1 | wmax /kW | Tmax /N·m | rmin /m | c /(°)
1      | Straddle frame  | 79    | 70           | 4.92     | 8         | 1.8     | 18
2      | Straddle frame  | 85    | 80           | 5.22     | 7.35      | 1.9     | 18
3      | Straddle frame  | 88    | 75           | 4.85     | 7.74      | 1.9     | 19
4      | Straddle frame  | 97    | 80           | 5.88     | 8.13      | 1.9     | 20

Output targets and trained results (only part of each output vector is listed):
Sample | Output  | Trained results
1      | 1 0 0 0 | 0.9986 0.0002 0.0037 0.0001
2      | 0 1 0 0 | 0      0.9946 0.0048 0.0003
3      | 0 0 1 0 | 0.0001 0.0051 0.9923 0.0059
4      | 0 0 0 1 | 0      0      0.0038 0.9928
On the basis of the cluster templates, each case is identified by its detailed characteristics, i.e., the parameter values of the characteristic attributes. Therefore a BP neural network is built for each template to index each case and is trained according to the above learning procedure. Table 4 gives the trained results for the motorcycles clustered in the straddle-frame general-purpose motorcycle template (only part of the results is listed). There are 16 cases clustered in this template, so the neural network has 16 output nodes, each node standing for one case. The structure of the BP neural network is 7×17×16 (the numbers of input, hidden and output nodes). The training samples are normalized so that they lie between 0 and 1, and the network is then trained with these samples. The learning rate is 0.4, the momentum parameter is 0.8, and the overall error target is 0.00001. After 380,680 training iterations, the error function becomes stable, and the final trained results are given in Table 4. From Table 4 it can be seen that the index model based on the BP neural network has memorized the input-output learning patterns. For example, the output vector of sample No. 1 is [1 0 0 0], and the trained result of the BP neural network is [0.9986 0.0002 0.0037 0.0001]; the two are very close. The results also show that the trained network meets the required precision. The connection weights and biases determined by the neural network now constitute the index of the cases, and the network can be used to retrieve the similar case.
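A minimal sketch of such an index network is shown below: a one-hidden-layer sigmoid BP network trained by batch gradient descent with momentum to memorize one-hot case indices. The training data here are synthetic placeholders, and the hyper-parameters simply echo the values quoted above (learning rate 0.4, momentum 0.8, error target 0.00001).

```python
import numpy as np

def train_bp_index(X, T, n_hidden=17, lr=0.4, momentum=0.8,
                   target_mse=1e-5, max_epochs=200000, rng=0):
    """One-hidden-layer sigmoid BP network with momentum, memorizing
    one-hot case indices (a sketch of the 7x17x16 index network)."""
    rng = np.random.default_rng(rng)
    n_in, n_out = X.shape[1], T.shape[1]
    W1 = rng.uniform(-0.5, 0.5, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.uniform(-0.5, 0.5, (n_hidden, n_out)); b2 = np.zeros(n_out)
    vW1 = np.zeros_like(W1); vb1 = np.zeros_like(b1)
    vW2 = np.zeros_like(W2); vb2 = np.zeros_like(b2)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    mse = np.inf
    for _ in range(max_epochs):
        H = sig(X @ W1 + b1)                       # forward pass
        Y = sig(H @ W2 + b2)
        err = Y - T
        mse = np.mean(err ** 2)
        if mse <= target_mse:
            break
        dY = err * Y * (1.0 - Y)                   # backward pass (sigmoid derivative)
        dH = (dY @ W2.T) * H * (1.0 - H)
        # momentum updates of weights and biases
        vW2 = momentum * vW2 - lr * (H.T @ dY) / len(X); W2 += vW2
        vb2 = momentum * vb2 - lr * dY.mean(axis=0);     b2 += vb2
        vW1 = momentum * vW1 - lr * (X.T @ dH) / len(X); W1 += vW1
        vb1 = momentum * vb1 - lr * dH.mean(axis=0);     b1 += vb1
    return (W1, b1, W2, b2), mse

# Synthetic example: 16 cases, 7 normalized attributes each, one-hot targets.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, (16, 7))
T = np.eye(16)
params, final_mse = train_bp_index(X, T)
print(final_mse)
```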
3.3 Generation of the Overall Design Plan

The case retrieval procedure based on the ART1 and BP neural networks is as follows.
Table 5 Retrieved results based on the neural network

             | Motorcycle type | p /mL | vmax /km·h-1 | wmax /kW | Tmax /N·m | rmin /m | c /(°) | similarity
requirement  | autoscooter     | 50    | 60           | 5        | 6         | 1.9     | 7.5    |
Similar case | autoscooter     | 49    | 60           | 5.29     | 6.37      | 1.9     | 7.5    | 0.9607
requirement  | Straddle frame  | 125   | 110          | 10       | 10        | 1.9     | 20     |
Similar case | Straddle frame  | 124   | 110          | 10.3     | 10.9      | 2.1     | 20     | 0.9752
requirement  | Straddle frame  | 400   | 160          | 30       | 35        | 2.8     | 30     |
Similar case | Straddle frame  | 398   | 160          | 31.6     | 33.3      | 2.6     | 30     | 0.9627
(1) Code the new case according to the coding rule.
(2) Fetch the structural parameters and weights of the neural network, and input the characteristic code; the neural network then automatically produces the index of the case (the cases are classified into the different templates).
(3) Search the case base along the index route to find the corresponding clustered template, that is, to find a certain number of cases related to the new case.
(4) If the new case cannot be allotted to any template, the system cannot produce an index route. However, the ART1 neural network can automatically build a new index, that is, the system generates a new cluster template, with which the new case can be added to the case base.
(5) Retrieve the optimal case with the forward operation of the BP neural network according to the detailed characteristics of the case.
From the above it can be seen that the reasoning system relies on the structural parameters of the ART1 and BP neural networks to retrieve a similar case whenever a new design requirement is input into the neural network. Table 5 gives the results of case retrieval based on the neural network. From the results it can be found that the most similar case (the one with the maximum similarity) retrieved by the reasoning system meets the initial design requirement. After the similar case is evaluated and revised, the final design plan can be formed. Therefore, the index model based on the neural network can be used as the case retrieval mechanism of the CBR system.
3.4 Comparison with the Traditional Index Model

In the case retrieval of the traditional CBR system, the similarity between cases is determined by means of the nearest neighbor method, with the following equation:
sim(T, S) = Σ_{i=1}^{n} w_i · sim(T_i, S_i)    (1)
In the equation, T is the target case, S is the source case, n is the number of attributes of a case, sim is the function calculating the similarity, and w_i is the weight of the i-th attribute. In order to evaluate the effectiveness of the index model based on the neural network, the same design requirements are input into a CBR system built on the traditional method, which retrieves the similar case according to the similarity value between the old case and the new one. The attribute weights are determined with the step analysis method. The retrieved results are shown in Table 6. From the table it can be found that the results retrieved by the nearest neighbor method tally with those retrieved by the neural network, which also shows that building the index model based on the neural network is practicable and effective.

Table 6 Retrieved results based on the nearest neighbor method

             | Motorcycle type | p /mL | vmax /km·h-1 | wmax /kW | Tmax /N·m | rmin /m | c /(°) | similarity
requirement  | autoscooter     | 50    | 60           | 5        | 6         | 1.9     | 7.5    |
Similar case | autoscooter     | 49    | 60           | 5        | 6.96      | 2.1     | 7.5    | 0.9379
requirement  | Straddle frame  | 125   | 110          | 10       | 10        | 1.9     | 20     |
Similar case | Straddle frame  | 124   | 110          | 10.3     | 10.9      | 2.1     | 20     | 0.9510
requirement  | Straddle frame  | 400   | 160          | 30       | 35        | 2.8     | 30     |
Similar case | Straddle frame  | 398   | 160          | 31.6     | 33.3      | 2.6     | 30     | 0.9887
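For reference, equation (1) is straightforward to implement. In the sketch below the attribute weights and the per-attribute similarity function (a range-normalized distance) are illustrative choices only; they are not the weights produced by the step analysis method used in the paper.

```python
def attribute_sim(t, s, value_range):
    """Per-attribute similarity in [0, 1] based on a range-normalized distance."""
    return 1.0 - abs(t - s) / value_range

def case_similarity(target, source, weights, ranges):
    """Weighted nearest-neighbor similarity sim(T, S) = sum_i w_i * sim(T_i, S_i)."""
    return sum(w * attribute_sim(t, s, r)
               for w, t, s, r in zip(weights, target, source, ranges))

# Requirement vs. stored case (values loosely follow the first pair in Table 6).
target  = [50, 60, 5.0, 6.0, 1.9, 7.5]       # p, vmax, wmax, Tmax, rmin, c
source  = [49, 60, 5.0, 6.96, 2.1, 7.5]
weights = [0.3, 0.2, 0.15, 0.15, 0.1, 0.1]   # illustrative weights summing to 1
ranges  = [250, 110, 15, 22, 0.6, 16]        # illustrative attribute value ranges
print(round(case_similarity(target, source, weights, ranges), 4))
```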
4 Conclusion

(1) The case base of motorcycle overall design plans is dynamically clustered on the basis of the characteristics with the ART1 neural network, and the clustered cases are memorized with the BP neural network. As a result, the search range of cases is narrowed, the retrieval efficiency is enhanced, and frequent searching through the whole database is avoided.
(2) With retrieval based on the neural network, the weights of the characteristic attributes do not need to be determined for the new plan during the retrieval procedure. Therefore the subjective effect on the weights is avoided, and the quality of case retrieval is increased.
(3) The comparison with the traditional case retrieval shows that the CBR system can quickly and correctly retrieve the similar case
from the case base. Thus the case retrieval mechanism achieves a better retrieval effect and higher practicality.

Acknowledgment. This project is supported by the National Natural Science Foundation of China (No. 50375161).
References
1. Watson, I.: Case-based Reasoning Is a Methodology not a Technology. Knowledge-Based Systems 12, 303–308 (1999)
2. Lee, S., Ryu, J.H., Won, J.S.: Determination and Application of the Weights for Landslide Susceptibility Mapping Using an Artificial Neural Network. Engineering Geology 71, 289–302 (2004)
3. Kim, K.S., Han, I.: The Cluster-indexing Method for Case-based Reasoning Using Self-organizing Maps and Learning Vector Quantization for Bond Rating Cases. Expert Systems with Applications 21, 147–156 (2001)
4. Fung, W.K., Liu, Y.H.: Adaptive Categorization of ART Networks in Robot Behavior Learning Using Game-theoretic Formulation. Neural Networks 16, 1403–1420 (2003)
5. Yang, B.S., Han, T., An, J.L.: ART-KOHONEN Neural Network for Fault Diagnosis of Rotating Machinery. Mechanical Systems and Signal Processing 18, 645–657 (2004)
6. Wang, X., Wang, H., Wang, W.H.: Artificial Neural Network Principle and Applications. Northeastern University Press, Shenyang (2000)
7. Automobile Engineering Manual (Motorcycle Parts). People's Communication Press, Beijing (2001)
8. Wang, H., Wang, H., Zhou, X.H.: Use Artificial Neural Network to Facilitate Case-Retrieval Course in Injection Mould Case-based Reasoning Decision-making System. Mechanical Science and Technology 22, 474–479 (2003)
9. Li, H.G., Gao, G.A.: A Hybrid Intelligent CAPP System Based on Neural Indexing Case and Knowledge. Journal of Computer Aided Design and Computer Graphics 12, 317–320 (2000)
10. Yuan, F.C., Chiu, C.C.: A Hierarchical Design of Case-based Reasoning in the Balanced Scorecard Application. Expert Systems with Applications 36, 333–342 (2009)
11. Li, H., Sun, J., Sun, B.L.: Financial Distress Prediction Based on OR-CBR in the Principle of K-nearest Neighbors. Expert Systems with Applications 36, 643–659 (2009)
12. Li, H., Sun, J.: Ranking-order Case-based Reasoning for Financial Distress Prediction. Knowledge-Based Systems 21, 868–878 (2008)
Improving Voice Search Using Forward-Backward LVCSR System Combination Ta Li, Changchun Bao, Weiqun Xu, Jielin Pan, Yonghong Yan
Abstract. Voice search is the technology that enables users to access information using spoken queries. The automatic speech recognizer (ASR) is one of the key modules of voice search systems. However, the high error rate of state-of-the-art large vocabulary continuous speech recognition (LVCSR) is the bottleneck for most voice search systems. In this paper, we first build a baseline system using a language model (LM) with domain-specific information. To improve our system, we propose a forward-backward LVCSR system combination method to decrease the search errors in speech recognition, which also helps to improve the spoken language understanding (SLU) performance. Experimental results show that our proposed method improves the performance of speech recognition by a 5.7% relative CER reduction and increases the F1-measure of SLU by 1.5% absolute on our test set.

Keywords: Voice search, LVCSR, SLU, forward-backward, system combination.
1 Introduction

Voice search is the technology that enables users to access information using spoken queries. In recent years, a number of practical voice search services have emerged, mostly in the domain of local search [1]. With these services, users can retrieve requested information, such as phone numbers, addresses and other information, of a point of interest (POI) by speaking to an automated agent.

Ta Li · Changchun Bao · Weiqun Xu · Jielin Pan · Yonghong Yan
ThinkIT Speech Lab, Institute of Acoustics, Chinese Academy of Sciences, Beijing, P.R. China
{tli,baochangchun,xuweiqun,jpan,yonghong.yan}@hccl.ioa.ac.cn
Typical voice search systems usually consist of three modules: automatic speech recognizer (ASR), spoken language understanding (SLU), and dialog manager (DM) [2]. Some practical voice search applications contain only two of these modules. For example, the automated directory assistance system (ADAS), which is one of the most popular voice search applications [3], takes two modules in two steps as follows. First, a spoken query O is converted into a text query Q using ASR. Then, based on the text query Q, the most relevant information is retrieved from the database using SLU and returned to the user. Apparently ASR, being the first module in a voice search system, plays an important role in the overall system performance: a high error rate in the ASR results certainly makes SLU more difficult and thus weakens the overall voice search performance. Hence, the first step in establishing a good voice search system is to build a good ASR system. A large vocabulary continuous speech recognition (LVCSR) engine is usually used in voice search systems as the ASR module in order to recognize more flexible spoken queries. The decoding in LVCSR can be done using Bayes' rule [4] by

Q̂ = arg max_Q P(O|Q) P(Q)    (1)
where P(O|Q) is the acoustic model (AM) and P(Q) is the language model (LM). Most previous voice search systems attempted to adapt the AM and LM to the voice search task, and significant improvements have been achieved in the last few years [3, 5]. Besides the AM and LM, the performance of ASR is also influenced by the decoding strategy. Most state-of-the-art LVCSR systems use the Viterbi search algorithm as the decoding strategy, and the typical Viterbi search for LVCSR is performed in the forward direction, i.e., from left to right in a time-synchronous fashion. Backward search has been used in second-pass decoding for LVCSR based on the information stored in the forward pass, and improves the decoding speed with no increase in search errors [6]. On the basis of the optimized AM and LM, we propose a novel decoding strategy to further improve the performance of LVCSR. In this paper, we first introduce our voice search system and impose domain-specific information on the LM of our system; with the optimized LM, our baseline system achieves relatively good performance. In an LVCSR system, search errors are inevitably produced by the pruning algorithm, and different decoding strategies lead to different search errors, so combining two systems with different decoding strategies may reduce the search errors. We propose the forward-backward LVCSR system combination method, which consists of two different decoding strategies (the forward and backward Viterbi searches). With different search spaces and decoding orders, the forward and backward Viterbi searches make use of complementary information, which helps to reduce the search errors. Combining the system outputs using domain-specific information also helps to improve the SLU performance. Compared with the baseline
system, our approach improves both the speech recognition and the spoken language understanding performance. This paper is organized as follows. In Section 2, we introduce the baseline system for our voice search task. In Section 3, the forward-backward LVCSR system combination approach is presented in detail. Section 4 describes the experiments and results. Finally, conclusions are given in Section 5.
2 Baseline System of Voice Search

Given a user's spoken query O, it is first converted into a text query Q by the ASR engine. Then Q is sent to the SLU module, and the relevant information is retrieved and provided to the user. In this section, details about the ASR and SLU modules are given.
2.1 ASR Implementation in Voice Search

In recent years, some grammar-constrained ASR systems have been successfully used in voice search systems [5]. However, to deal with more natural and more semantically complicated spoken queries, we choose to use LVCSR in our system. The AM is one of the most important parts of the LVCSR. The widely used Hidden Markov Model (HMM) is adopted in our AM training, and the acoustic models are trained with the minimum phone error (MPE) training criterion [7]. All the triphone HMMs adopt a 3-state left-to-right topology, and robust state clustering with two-level phonetic decision trees is used [8].

The LM is another important part of the LVCSR, especially in the domain-specific voice search task. The POIs in the database of our system are limited, so the POI list is considered an important information source for the LM. For this reason, the LM probability can be estimated as below:

P(W) = (1 − λ) P_g(W) + λ P_l(W)    (2)

where P_g(W) is the LM built using a general spontaneous speech corpus, P_l(W) is the LM built using the POI list, and λ is the interpolation weight. However, using the POI list in LM building still cannot cover most syntactic patterns in the domain of our task. To further impose domain-specific information at the syntax level, we collected some real spoken queries in the related domain. With the transcriptions of the real spoken queries, the new LM is built as below:

P_n(W) = (1 − λ_r) P(W) + λ_r P_r(W)    (3)

where P_r(W) is the LM built using the real spoken queries, and λ_r is the interpolation weight. P_n(W) is the new LM used in our baseline system. The LMs used in this paper were all built using the SRILM tools [9].
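The two-stage interpolation in (2) and (3) can be illustrated with toy probability tables; in practice the interpolation is done over full n-gram models with the SRILM tools, so the words, probabilities and weights below are purely for illustration.

```python
def interpolate(p_a, p_b, lam):
    """Linear interpolation of two word-probability tables:
    p(w) = (1 - lam) * p_a(w) + lam * p_b(w)."""
    vocab = set(p_a) | set(p_b)
    return {w: (1.0 - lam) * p_a.get(w, 0.0) + lam * p_b.get(w, 0.0) for w in vocab}

# Toy unigram tables: a general-speech LM, a POI-list LM, and a real-query LM.
p_general = {"find": 0.4, "bank": 0.1, "weather": 0.5}
p_poi     = {"bank": 0.6, "cinema": 0.4}
p_query   = {"find": 0.5, "cinema": 0.3, "bank": 0.2}

p   = interpolate(p_general, p_poi, lam=0.3)    # equation (2), lambda = 0.3
p_n = interpolate(p, p_query, lam=0.2)          # equation (3), lambda_r = 0.2
print(p_n)
```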
Besides the AM and LM, an efficient LVCSR decoder is also important for the voice search task, so we use the real-time forward Viterbi beam search decoder described in [10].
2.2 SLU Implementation in Voice Search

The Spoken Language Understanding (SLU) module is designed to detect and extract the information of POI entities as the key element of shallow understanding. For the voice search application, we define three types of entities: the first type is related to the location of a POI, such as location and address; the second type includes all the classes of POIs, such as bank, cinema and service station; and the third type concerns additional information about a POI, such as the price, the rank of a hotel, and the cuisine style. Inspired by the Named Entity Recognition (NER) technique in Information Retrieval (IR), POI entity detection and extraction is based on sequence tagging. A maximum entropy (ME) model [11] is adopted here for POI entity detection. The input word sequence w is converted into feature vectors x, which contain lexical, contextual and application-specific knowledge. By sending these features into the ME model, we obtain the optimal tag sequence, which is used for POI entity extraction. The recognized POI entities are then sent to the dialogue management (DM) module for further interaction.
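Since a maximum entropy classifier is equivalent to multinomial logistic regression, a per-token tagger in this spirit might look like the sketch below. The tokens, tags and features are toy examples; the actual system operates on Chinese queries and uses richer lexical, contextual and application-specific features.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def token_features(tokens, i):
    """Simple lexical/contextual features for token i (a toy feature set)."""
    return {"word": tokens[i].lower(),
            "prev": tokens[i - 1].lower() if i > 0 else "<s>",
            "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"}

# Toy training queries with per-token tags (O = outside, POI-TYPE, LOC).
train = [
    (["find", "a", "bank", "near", "Zhongguancun"], ["O", "O", "POI-TYPE", "O", "LOC"]),
    (["cheap", "hotel", "in", "Beijing"],           ["O", "POI-TYPE", "O", "LOC"]),
]
feats, tags = [], []
for tokens, labels in train:
    for i, lab in enumerate(labels):
        feats.append(token_features(tokens, i))
        tags.append(lab)

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000)      # multinomial logistic regression = ME model
clf.fit(vec.fit_transform(feats), tags)

query = ["find", "a", "hotel", "near", "Beijing"]
X = vec.transform([token_features(query, i) for i in range(len(query))])
print(list(zip(query, clf.predict(X))))
```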
3 Forward-Backward LVCSR System Combination

Although the baseline system obtains good results on our task by imposing domain-specific information on the LM, the overall system performance can still be improved by a different decoding strategy in LVCSR. Forward Viterbi search, which is one of the most popular decoding strategies, has been widely used in state-of-the-art LVCSR systems; it is performed in the forward direction, from left to right, in a time-synchronous fashion.
3.1 Backward Viterbi Search

The backward Viterbi search is implemented in the direction opposite to the forward Viterbi search, which proceeds from left to right in a time-synchronous fashion. Because of the reversed decoding order of the backward Viterbi search, we need to change most of the knowledge sources involved in LVCSR. First, we need to build a reversed state network as the search space for our backward Viterbi search. Compared to the forward Viterbi search [10], there are only a few changes in the pronunciation dictionary and the triphone HMMs.
Fig. 1 Triphone HMM in both forward and backward decoder (3-state left to right)
The pronunciation dictionary needs to be reversed by reversing the order of all phonemes in each word; for example, for the Chinese word "北京" (Beijing):
• normal pronunciation: b ei3 j ing1
• reversed pronunciation: ing1 j ei3 b
The triphone HMMs need to be reversed as shown in Fig. 1. Second, a reversed LM needs to be used in the backward Viterbi search: basically, we only need to reverse the training corpus used in Section 2 and train the reversed LM on the reversed corpus using the same training approach described in Section 2. Finally, we obviously need to reverse the order in which the feature vectors are sent to the ASR because of the right-to-left decoding order. For example, we send the feature vectors O1, O2, ..., On into the forward decoder, while we send the reversed feature vectors On, On−1, ..., O1 into the backward decoder.
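The reversal of these knowledge sources is mechanical; a small Python sketch (with an illustrative one-entry dictionary) is:

```python
def reverse_dictionary(pron_dict):
    """Reverse the phoneme order of every pronunciation for the backward decoder."""
    return {word: list(reversed(phones)) for word, phones in pron_dict.items()}

def reverse_features(frames):
    """Send frames O_n, O_{n-1}, ..., O_1 to the backward decoder."""
    return list(reversed(frames))

pron_dict = {"Beijing": ["b", "ei3", "j", "ing1"]}
print(reverse_dictionary(pron_dict))   # {'Beijing': ['ing1', 'j', 'ei3', 'b']}
frames = ["O1", "O2", "O3"]
print(reverse_features(frames))        # ['O3', 'O2', 'O1']
```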
3.2 System Combination

In recent years, system combination technology has been widely used in many ASR systems [12, 13, 14]. The combined output results are usually generated from multiple recognizers' 1-best results, n-best results, or output confusion networks (CN). In [15], LM information was used for LVCSR system combination and the performance was improved. In our voice search task, the domain of the application is limited, so the LM trained on a large in-domain corpus in our LVCSR system can cover most of the n-gram grammar and semantics in the spoken queries, which makes the LM contain more heuristic information for our task. Given an in-domain LM, the perplexity of an utterance reflects its relationship with the domain, so a recognized text query with lower perplexity leads to better performance in both ASR and SLU. For this
reason, we combine the forward and backward systems' hypothesis outputs by using the perplexity. After forward-backward decoding in our LVCSR system, we obtain the forward hypothesis output word sequence q1 q2 ... qn and the backward hypothesis output word sequence q'm q'm−1 ... q'1. First we reverse the order of the backward hypothesis output to q'1 q'2 ... q'm. Then we combine the two hypothesis outputs into the final query as follows:

PP_f = [ ∏_{i=1}^{n} P(q_i | q_{1:i−1}) ]^(−1/n)    (4)

PP_b = [ ∏_{i=1}^{m} P(q'_i | q'_{1:i−1}) ]^(−1/m)    (5)

Q = q1 q2 ... qn  if PP_f ≤ PP_b;   Q = q'1 q'2 ... q'm  if PP_f > PP_b    (6)

where PP_f is the perplexity of the forward hypothesis output q1 q2 ... qn under our in-domain LM, and PP_b is the perplexity of the reversed backward hypothesis output q'1 q'2 ... q'm under our in-domain LM. Considering that the in-domain LM used in our system is a trigram, we can simplify the formulas in (4) and (5) as

P(q_i | q_{1:i−1}) = P(q_i | q_{i−2:i−1})    (7)

P(q'_i | q'_{1:i−1}) = P(q'_i | q'_{i−2:i−1})    (8)

Finally, the combined output Q is obtained and sent to the SLU module.
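A sketch of the selection rule in (4)-(8) is given below. It assumes a trigram scoring function `trigram_prob(w, h1, h2)` for the in-domain LM is available (a hypothetical interface, standing in for an SRILM lookup), and that the backward hypothesis has already been re-reversed into left-to-right order.

```python
import math

def perplexity(words, trigram_prob):
    """PP = [ prod_i P(q_i | q_{i-2}, q_{i-1}) ] ** (-1/len(words)), cf. (4)-(5), (7)-(8)."""
    logp = 0.0
    for i, w in enumerate(words):
        h1 = words[i - 2] if i >= 2 else "<s>"
        h2 = words[i - 1] if i >= 1 else "<s>"
        logp += math.log(max(trigram_prob(w, h1, h2), 1e-12))  # floor to avoid log(0)
    return math.exp(-logp / len(words))

def combine(forward_hyp, backward_hyp, trigram_prob):
    """Pick the hypothesis with the lower in-domain perplexity, cf. equation (6)."""
    ppf = perplexity(forward_hyp, trigram_prob)
    ppb = perplexity(backward_hyp, trigram_prob)
    return forward_hyp if ppf <= ppb else backward_hyp
```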
4 Experiments and Results

4.1 Evaluation Data

Spoken queries were collected when users were asked to try our voice search system to find POI information in the area of ZhongGuanCun (ZGC), Beijing. The details of the POI information are listed in Table 1. The data are divided into a development set (D-SET) and a test set (T-SET): the D-SET consists of 816 spoken queries, while the T-SET consists of 1411 spoken queries.
4.2 ASR and SLU Results

First, the forward-backward LVCSR system combination results are listed in Table 2. Although the backward system's performance is not as good as that of the baseline system, we obtain better performance with the system combination. Compared with the baseline system, a relative 9.8% character error rate
Table 1 Main classes in our domain

POI type:                        bank, cinema, restaurant, hotel, hospital, service station, stadium and gymnasium
Detailed information for a POI:  address, toponym, price, rank for a hotel, cuisine style, phone number, bus number
Table 2 The ASR improvements of our system combination

Set   | ASR system         | CER   | SER
D-SET | baseline system    | 13.3% | 45.7%
D-SET | backward system    | 14.3% | 49.4%
D-SET | system combination | 12.0% | 43.0%
T-SET | baseline system    | 8.8%  | 37.4%
T-SET | backward system    | 10.2% | 41.7%
T-SET | system combination | 8.3%  | 35.2%
(CER) reduction and a relative 5.9% sentence error rate (SER) reduction are obtained on the D-SET. We also obtain a relative 5.7% CER reduction and a relative 5.9% SER reduction on the T-SET. From these results, we find that the complementary information between the forward and backward LVCSR systems helps to decrease the search errors, and that the in-domain LM provides heuristic information for effective system combination. We then send the ASR outputs into the SLU, and Table 3 shows the SLU improvements with our method. The evaluation of SLU is measured with three metrics: precision, recall and F1-measure (F1), defined as follows:

precision = (# of correctly recognized POI / # of all recognized POI) × 100%

recall = (# of correctly recognized POI / # of all POI) × 100%

F1 = 2 × precision × recall / (precision + recall)
Table 3 The SLU improvements of our system combination

Set   | ASR system         | precision | recall | F1
D-SET | baseline system    | 87.4%     | 78.8%  | 82.9%
D-SET | system combination | 88.3%     | 82.6%  | 85.4%
T-SET | baseline system    | 87.8%     | 83.6%  | 85.6%
T-SET | system combination | 88.1%     | 86.0%  | 87.1%
From Table 3, the system combination also improves the SLU performance over the baseline system: the precision, recall, and F1 of our method are all better than those of the baseline system on both the D-SET and the T-SET.
5 Conclusions

Building a high-performance voice search system is a challenging task. It involves a combination of ASR, SLU, DM and other additional modules, depending on the application. In this paper we focused on ASR, which is the first and a very important module in voice search systems. We first introduced our experience in building the baseline system, and we then proposed the forward-backward LVCSR system combination approach for our ASR module; the experiments show an improvement in accuracy for both speech recognition and spoken language understanding.

Acknowledgements. This work is partially supported by MOST (973 program, 2004CB318106), the National Natural Science Foundation of China (10574140, 60535030), and the National High Technology Research and Development Program of China (863 program, 2006AA01010, 2006AA01Z195).
References 1. Miller, D.: Speech-enabled Mobile Search Marches On. Speech Technology Magazine (2007) 2. Wang, Y., Yu, D., Ju, Y., Acero, A.: An Introduction to Voice Search. Signal Processing Magazine, IEEE 25(3), 28–38 (2008) 3. Yu, D., Ju, Y., Wang, Y., Zweig, G., Acero, A.: Automated Directory Assistance System–from Theory to Practice. In: Proceedings of Interspeech (2007) 4. Rabiner, L., Juang, B.: Fundamentals of Speech Recognition, pp. 200–238. Prentice-Hall International Inc., Englewood Cliffs (1999) 5. Gao, Y., Ramabhadran, B., Chen, J., Erdogan, H., Picheny, M., Center, I., Heights, Y.: Innovative approaches for large vocabulary name recognition. In: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2001), vol. 1 (2001) 6. Austin, S., Schwartz, R., Placeway, P.: The Forward-backward Search Algorithm. In: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 1991), pp. 697–700 (1991) 7. Povey, D., Woodland, P.: Minimum Phone Error and I-smoothing for Improved Discriminativetraining. In: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2002) (2002) 8. Liu, C., Yan, Y.: Robust State Clustering Using Phonetic Decision Trees. Speech Communication 42(3), 391–408 (2004) 9. Stolcke, A.: SRILM-an Extensible Language Modeling Toolkit. In: Seventh International Conference on Spoken Language Processing (2002)
10. Shao, J., Li, T., Zhang, Q., Zhao, Q., Yan, Y.: A One-Pass Real-Time Decoder Using Memory-Efficient State Network. IEICE Transactions on Information and Systems 91(3), 529 (2008) 11. Ratnaparkhi, A., et al.: A Maximum Entropy Model for Part-of-speech Tagging. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 133–142. Association for Computational Linguistics (1996) 12. Sinha, R., Gales, M., Kim, D., Liu, X., Sim, K., Woodland, P.: The CU-HTK Mandarin broadcast news transcription system. In: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006) (2006) 13. Hoffmeister, B., Plahl, C., Fritz, P., Heigold, G., Loof, J., Schluter, R., Ney, H.: Development of the 2007 RWTH Mandarin GALE LVCSR system. In: IEEE Automatic Speech Recognition and Understanding Workshop, Kyoto, Japan (December 2007) 14. Ng, T., Zhang, B., Nguyen, K., Nguyen, L.: Progress in the BBN 2007 Mandarin Speech to Text system. In: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008), pp. 1537–1540 (2008) 15. Schwenk, H., Gauvain, J.: Combining Multiple Speech Recognizers Using Voting and Language Model Information. In: Sixth International Conference on Spoken Language Processing, ISCA (2000)
Agent Oriented Programming for Setting Up the Platform for Processing EEG / ECG / EMG Waveforms Tholkappia Arasu Govindarajan, Mazin Al-Hadidi, and Palanisamy V.*
Abstract. An agent can be defined as a component that, given a goal, can act in the place of a user within its domain of knowledge. Agents are also called intelligent agents, as intelligence is a key component of agency. The agent-oriented approach can be viewed as the next step of the object-oriented approach. The paper attempts to demonstrate the concept of developing a multi-agent platform for the processing of bio-signals. It also demonstrates the concept of developing and deploying agents using JADE – the Java Agent DEvelopment framework. The technical goal is to develop a multi-agent platform for the processing of bio-signals aimed at assisting medical practitioners in developing standard examination procedures. The intelligent agents interact with each other and with the expert system to produce the report for the given bio-signal.

Keywords: Agent, JADE, Bio-Signals, EEG, ECG, EMG.

1 Introduction

An agent can be defined as a component that, given a goal, can act in the place of a user within its domain of knowledge. Agents are also called intelligent agents, as intelligence is a key component of agency. The paper attempts to demonstrate the concept of developing a multi-agent platform for the processing of bio-signals. It also

Tholkappia Arasu Govindarajan
Assistant Professor, Department of Computer Science & Engineering, Jayam College of Engineering & Technology, Dharmapuri, Tamil Nadu, India
1 Introduction Agent can be defined as a component that, given a goal could act in the place of a user within its domain knowledge. Agents are also called intelligent agents, as intelligence is a key component of agency. The paper attempts to demonstrate the concept of developing Multi-Agent platform for processing of Bio-signals. It also Tholkappia Arasu Govindarajan Assistant Professor, Department of Computer Science & Engineering, Jayam College of Engineering & Technology, Dharmapuri, Tamil Nadu, India
[email protected] *
Mazin Al-Hadidi Assistant Professor, Department of Computer Engineering, AL-Balqa’ Applied University, Al-Salt, Jordan 19117
V. Palanisamy
Principal, Info Institute of Engineering, Coimbatore, Tamil Nadu, India
demonstrates the concept of developing agents using JADE – the Java Agent DEvelopment framework. Agent-based software engineering is a relatively new field and can be thought of as an evolution of object-oriented programming. Though agent technology provides a means to effectively solve problems in certain application areas where other techniques may be deemed lacking or cumbersome, there is currently a lack of mature agent-based software development methodologies. This deficiency has been pointed out as one of the main barriers to the large-scale uptake of agent technology. Thus, the continued development and refinement of methodologies for the development of multi-agent systems is imperative and, consequently, an area of agent technology deserving significant attention. The proposed methodology for multi-agent systems does not attempt to extend object-oriented techniques, instead focusing specifically on agents and the abstractions provided by the agent paradigm.
2 Features of JADE

The following is the list of features that JADE offers to the agent programmer:
• Distributed agent platform. The agent platform can be split among several hosts.
• Graphical user interface to manage several agents and agent containers from a remote host.
• Debugging tools to help in developing multi-agent applications based on JADE.
• Intra-platform agent mobility, including transfer of both the state and the code (when necessary) of the agent.
• Support for the execution of multiple, parallel and concurrent agent activities via the behaviour model. JADE schedules the agent behaviours in a non-preemptive fashion.
• FIPA-compliant Agent Platform, which includes the AMS (Agent Management System), the DF (Directory Facilitator), and the ACC (Agent Communication Channel).
• Many FIPA-compliant DFs can be started at run time in order to implement multi-domain applications, where a domain is a logical set of agents whose services are advertised through a common facilitator. Each DF inherits a GUI and all the standard capabilities defined by FIPA.
• Efficient transport of ACL messages inside the same agent platform.
• Library of FIPA interaction protocols ready to be used.
• Automatic registration and deregistration of agents with the AMS.
• FIPA-compliant naming service: at start-up agents obtain their GUID (Globally Unique Identifier) from the platform.
• Support for application-defined content languages and ontologies.
• In-Process Interface to allow external applications to launch autonomous agents.
3 Agent Based Bio-signal Processing Scenario
The paper attempts to demonstrate the concept of developing a multi-agent platform for the processing of bio-signals. It also demonstrates the concept of developing agents using JADE – the Java Agent DEvelopment framework. The agents are trained, intelligent systems capable of setting up the platform for processing EEG / ECG / EMG waveforms. The agents communicate with each other in the decision-making process. The technical goal is to develop a multi-agent platform for the processing of bio-signals aimed at assisting medical practitioners in developing standard examination procedures. If a medical practitioner wants an expert opinion about the EEG / ECG / EMG of his patient, the Generic Agent can be invoked, to which he has to specify the SSN (Social Security Number) of the patient, the type of the signal and the corresponding data file. The Generic Agent in turn will search the network for the specific agent – EEG Agent, ECG Agent or EMG Agent, based on the signal type – and, if one is found, the corresponding information will be passed to the specific agent by the Generic Agent. For example, if the medical practitioner wishes to have an expert opinion on an EMG, the EMG Agent, with all the necessary information, will look for an EMG expert system (HINT, DARE, CANDID, MIYOSYS-II). After obtaining the expert knowledge, the interpretation will be sent back to the Generic Agent through the EMG Agent. The expert opinion will be displayed on the user side and will also be stored in the database by the DB Agent for further reference.
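To make the discovery step concrete, the following sketch shows one way a Generic Agent could query the JADE Directory Facilitator (DF) for an agent advertising a signal-specific service. It is only an illustrative sketch: the service-type string (e.g. "EMG-processing"), the class name GenericAgentSketch and the method findSpecificAgent are assumptions made for the example and are not part of the system described in the paper.

import jade.core.Agent;
import jade.core.AID;
import jade.domain.DFService;
import jade.domain.FIPAException;
import jade.domain.FIPAAgentManagement.DFAgentDescription;
import jade.domain.FIPAAgentManagement.ServiceDescription;

public class GenericAgentSketch extends Agent {
  // Looks up an agent that registered a service of the given type, e.g. "EMG-processing".
  protected AID findSpecificAgent(String serviceType) {
    DFAgentDescription template = new DFAgentDescription();
    ServiceDescription sd = new ServiceDescription();
    sd.setType(serviceType);                // illustrative service type
    template.addServices(sd);
    try {
      DFAgentDescription[] result = DFService.search(this, template);
      if (result.length > 0) {
        return result[0].getName();         // AID of the first matching specific agent
      }
    } catch (FIPAException fe) {
      fe.printStackTrace();
    }
    return null;                            // no matching agent found on the platform
  }
}

The returned AID can then be used as the receiver of an ACL message carrying the patient's SSN, signal type and data-file reference.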
4 Analysis
The analysis phase aims to clarify the problem to a sufficient level of detail, with minimal concern about the solution. The steps in the analysis phase are summarized below:
• Step 1: Use Cases. The system requirements are analyzed and a use case diagram created based on these requirements.
• Step 2: Initial Agent Types Identification. By applying a set of rules, an initial diagram of the multi-agent system called the agent diagram is produced.
• Step 3: Responsibilities Identification. By observing the agent types produced in the agent diagram and applying a set of rules, an initial table of responsibilities is produced, called the responsibility table, for those agents whose responsibilities are clear initially.
• Step 4: Acquaintances Identification. The obvious acquaintances between agents are identified, and subsequently the agent diagram and responsibility table are updated.
• Step 5: Agent Refinement. The agent diagram and responsibility table are updated by applying a number of considerations related to support, discovery, and management and monitoring.
• Step 6: Agent Deployment Information. The agent deployment diagram is produced, where the agents and the physical hosts/devices on which the agents are going to be deployed are indicated.
• Iterate Steps 1-6.
The important elements gained from carrying out the above steps are the artifacts. These artifacts form the basis for the design phase.
5 Design
Once the problem has been clarified to a sufficient level of detail, a move is made from the analysis to the design phase, which aims to specify the solution. From this point on, the proposed methodology focuses on the JADE platform (and hence, the constructs provided by it). Carrying out the design phase allows one to reach a level of detail that is sufficient for a relatively straightforward transition to the implementation, with the possibility of a significant amount of code being generated. The steps in the design phase are summarized below:
• Step 1: Agent Splitting/Merging/Renaming. By considering system performance and complexity in relation to the agent deployment diagram produced in analysis, it is determined whether agents should be split, merged or left as is.
• Step 2: Interaction Specification. All responsibilities in the responsibility table related to an acquaintance relation with another agent are considered, and the interaction table is produced for each agent type.
• Step 3: Ad-Hoc Interaction Protocol Definition. In the case that an existing interaction protocol cannot be used for an interaction, an ad-hoc interaction protocol is defined using a suitable formalism.
• Step 4: Message Templates. The interaction table is updated to specify suitable MessageTemplate objects in behaviours to receive incoming messages.
• Step 5: Description to be Registered/Searched (Yellow Pages). The naming conventions and the services registered/searched by agents in the yellow pages catalogue maintained by the JADE directory facilitator are formalized. A class diagram form is used as a representation.
• Step 6: Agent-Resource Interactions. Based on the agent diagram produced in analysis, passive and active resources in the system are identified, and it is determined how agents will interact with these resources.
• Step 7: Agent-User Interactions. Based on the agent diagram produced in analysis, agent-user interactions are identified and detailed.
• Step 8: Internal Agent Behaviours. Based on the responsibility table produced in analysis, the agent responsibilities are mapped to agent behaviours. Different types of responsibilities (including interactions) require different types of agent behaviours to be specified.
• Step 9: Defining an Ontology. An appropriate ontology for the domain is specified by making a number of considerations.
• Step 10: Content Language Selection. By following some rules, a suitable content language is selected.
• Iterate Steps 1-10. Move back and forth between analysis and design whenever necessary.
6 Creating a Multi-agent System with JADE
This section describes the JADE classes that support the development of multi-agent systems. JADE warrants syntactical compliance and, where possible, semantic compliance with the FIPA specifications.
6.1 The Agent Platform
The standard model of an agent platform, as defined by FIPA, is represented in Fig. 1. The Agent Management System (AMS) is the agent that exerts supervisory control over access to and use of the agent platform. Only one AMS exists in a single platform. The AMS provides the white-page and life-cycle service, maintaining a directory of agent identifiers (AIDs) and agent states. Each agent must register with the AMS in order to get a valid AID. The Directory Facilitator (DF) is the agent that provides the default yellow-page service in the platform. The Message Transport System, also called the Agent Communication Channel (ACC), is the software component controlling all the exchange of messages within the platform, including messages to/from remote platforms. JADE fully complies with this reference architecture, and when a JADE platform is launched, the AMS and DF are immediately created and the ACC module is set up to allow message communication. The agent platform can be split across several hosts. Only one Java application, and therefore only one Java Virtual Machine (JVM), is executed on each host. Each JVM is a basic container of agents that provides a complete run-time environment for agent execution and allows several agents to execute concurrently on the same host. The main container, or front-end, is the agent container where the AMS and DF live and where the RMI registry, which is used internally by JADE, is created. According to the FIPA specifications, the DF and AMS agents communicate by using the FIPA-SL0 content language, the FIPA-agent-management ontology, and the FIPA-request interaction protocol. JADE provides compliant implementations of all these components.
Fig. 1 Reference architecture of a FIPA agent platform (Agent Platform comprising the Agent Management System, the Directory Facilitator and the Message Transport System)

6.2 The Agent Class
The Agent class represents a common base class for user-defined agents. Therefore, a JADE agent is simply an instance of a user-defined Java class that extends
the base Agent class. This implies the inheritance of features to accomplish basic interactions with the agent platform (registration, configuration, remote management, …) and a basic set of methods that can be called to implement the custom behaviour of the agent (e.g. send/receive messages, use standard interaction protocols, register with several domains, …). The computational model of an agent is multitasking, where tasks (or behaviours) are executed concurrently. Each functionality/service provided by an agent should be implemented as one or more behaviours. A scheduler, internal to the base Agent class and hidden from the programmer, automatically manages the scheduling of behaviours.
6.2.1 Agent Life Cycle
The Agent class provides public methods to perform transitions between the various states; these methods take their names from the corresponding transitions in the Finite State Machine defined in the FIPA Agent Management specification. For example, the doWait() method puts the agent into the WAITING state from the ACTIVE state, and the doSuspend() method puts the agent into the SUSPENDED state from the ACTIVE or WAITING state. Notice that an agent is allowed to execute its behaviours (i.e. its tasks) only when it is in the ACTIVE state. Care must be taken that if any behaviour calls the doWait() method, then the whole agent and all its activities are blocked, not just the calling behaviour. Instead, the block() method is part of the Behaviour class in order to allow suspending a single agent behaviour. Starting the agent execution: The JADE framework controls the birth of a new agent according to the following steps: the agent constructor is executed, the agent is given an identifier, it is registered with the AMS, it is put in the ACTIVE state, and finally the setup() method is executed. According to the FIPA specifications, an agent identifier has the following attributes:
• A globally unique name.
• A set of agent addresses. Each agent inherits the transport addresses of its home agent platform.
• A set of resolvers, i.e. white-page services with which the agent is registered.
The setup() method is therefore the point where any application-defined agent activity starts. The setup() method needs to be implemented in order to initialise the agent. When the setup() method is executed, the agent has already been registered with the AMS and its agent platform state is ACTIVE. This initialisation procedure should be used to (a sketch follows the list):
• (optional) if necessary, modify the data registered with the AMS;
• (optional) set the description of the agent and its provided services and, if necessary, register the agent with one or more domains, i.e. DFs;
• (necessary) add tasks to the queue of ready tasks using the method addBehaviour().
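The following minimal sketch shows how these initialisation steps might look for one of the signal-processing agents; the class name SignalAgentSketch, the service strings and the stub behaviour are illustrative assumptions, not the code of the deployed system.

import jade.core.Agent;
import jade.core.behaviours.CyclicBehaviour;
import jade.domain.DFService;
import jade.domain.FIPAException;
import jade.domain.FIPAAgentManagement.DFAgentDescription;
import jade.domain.FIPAAgentManagement.ServiceDescription;

public class SignalAgentSketch extends Agent {
  protected void setup() {
    // (optional) describe the provided service and register it with the DF (yellow pages)
    DFAgentDescription dfd = new DFAgentDescription();
    dfd.setName(getAID());
    ServiceDescription sd = new ServiceDescription();
    sd.setType("EEG-processing");            // illustrative service type
    sd.setName("EEG-signal-analysis");       // illustrative service name
    dfd.addServices(sd);
    try {
      DFService.register(this, dfd);
    } catch (FIPAException fe) {
      fe.printStackTrace();
    }
    // (necessary) add at least one behaviour to the queue of ready tasks
    addBehaviour(new CyclicBehaviour(this) {
      public void action() {
        block();                             // placeholder: wait for incoming requests
      }
    });
  }

  protected void takeDown() {
    // clean-up before the agent is destroyed: deregister from the DF
    try {
      DFService.deregister(this);
    } catch (FIPAException fe) {
      fe.printStackTrace();
    }
  }
}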
These behaviours are scheduled as soon as the setup() method ends; the setup() method should add at least one behaviour to the agent. At the end of the setup() method, JADE automatically executes the first behaviour in the queue of ready tasks and then switches to the other behaviours in the queue by using a round-robin non-preemptive scheduler. The addBehaviour(Behaviour) and removeBehaviour(Behaviour) methods of the Agent class can be used to manage the task queue. Stopping agent execution: Any behaviour can call the Agent.doDelete() method in order to stop agent execution. The Agent.takeDown() method is executed when the agent is about to go to the DELETED state, i.e. it is going to be destroyed. The takeDown() method can be overridden in order to implement any necessary clean-up.
6.2.2 Inter-agent Communication
The Agent class also provides a set of methods for inter-agent communication. According to the FIPA specification, agents communicate via asynchronous message passing, where objects of the ACLMessage class are the exchanged payloads. The Agent.send() method allows an agent to send an ACLMessage. The value of the receiver slot holds the list of the receiving agents' IDs. The method call is completely transparent with respect to where the receiving agent resides, i.e. be it local or remote, it is the platform that takes care of selecting the most appropriate address and transport mechanism.
6.2.3 Agents with a Graphical User Interface (GUI)
An application which is structured as a multi-agent system still needs to interact with its users. So, it is often necessary to provide a GUI for at least some agents in the application. This need raises some problems, though, stemming from the mismatch between the autonomous nature of agents and the reactive nature of ordinary graphical user interfaces. When JADE is used, the thread-per-agent concurrency model of JADE agents must work together with the Swing concurrency model. Performing an ACL message exchange in response to a GUI event: When an agent is given a GUI, it often happens that the agent is requested to send a message because of a user action (e.g., the user clicks a push button). The ActionListener of the button will be run within the Event Dispatcher thread, but the Agent.send() method should be called within the agent thread. In the event listener, add a new behaviour to the agent which performs the necessary communication. If the communication to perform is simply a message send operation, the SenderBehaviour class can be used, and the event handler will contain a line such as:
myAgent.addBehaviour(new SenderBehaviour(msgToSend));
If the communication operation is a message receive, the ReceiverBehaviour class can be used in the same way:
myAgent.addBehaviour(new ReceiverBehaviour(msgToRecv));
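The ACLMessage objects mentioned above are built by the application code before being handed to the platform. As a hedged illustration (the receiver's local name "EMGAgent" and the content string are assumptions for the example, not the paper's actual message format), a one-shot behaviour that prepares and sends such a request could look like this:

import jade.core.AID;
import jade.core.behaviours.OneShotBehaviour;
import jade.lang.acl.ACLMessage;

public class SendRequestBehaviour extends OneShotBehaviour {
  public void action() {
    ACLMessage msg = new ACLMessage(ACLMessage.REQUEST);      // FIPA REQUEST performative
    msg.addReceiver(new AID("EMGAgent", AID.ISLOCALNAME));     // target agent's local name (assumed)
    msg.setContent("SSN=...;signal=EMG;file=...");             // illustrative payload describing the job
    myAgent.send(msg);                                         // asynchronous delivery by the platform
  }
}

Such a behaviour would be registered from the GUI event handler with myAgent.addBehaviour(new SendRequestBehaviour()), so that the actual send happens in the agent thread rather than in the Swing Event Dispatcher thread.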
More generally, some complex conversation (e.g. a whole interaction conforming to an Interaction Protocol) could be started when the user acts on the GUI. In the multi-agent system, the GenericGui class and the FileGui class let the user choose the required set of values during the processing.
// GenericGui for getting the basic details
public class GenericGui extends JFrame implements ActionListener {
  // GUI components
  public void actionPerformed(ActionEvent ae) {
    // task to perform
  }
}
// FileGui for accessing the required data file for processing
public class FileGui extends JFrame implements ActionListener {
  // GUI components
  public void actionPerformed(ActionEvent ae) {
    // task to perform
  }
}
6.2.4 Class Behaviour
This abstract class provides an abstract base class for modelling agent tasks, and it sets the basis for behaviour scheduling as it allows for state transitions (i.e. starting, blocking and restarting a Java behaviour object). The block() method allows a behaviour object to be blocked until some event happens (typically, until a message arrives). This method leaves the other behaviours of an agent unaffected, thereby allowing finer-grained control over agent multitasking. This method puts the behaviour in a queue of blocked behaviours and takes effect as soon as action() returns. All blocked behaviours are rescheduled as soon as a new message arrives. Moreover, a behaviour object can block itself for a limited amount of time by passing a timeout value, expressed in milliseconds, to the block() method. A behaviour can be explicitly restarted by calling its restart() method. Summarizing, a blocked behaviour can resume execution when one of the following three conditions occurs:
1. An ACL message is received by the agent this behaviour belongs to.
2. A timeout associated with this behaviour by a previous block() call expires.
3. The restart() method is explicitly called on this behaviour.
// Implementation of the Behaviour class in GenericAgent
class AnalysisRequest extends Behaviour {
  public void action() {
    // task to be done
  }
  public boolean done() {
    return true;
  }
}
The Behaviour class also provides two placeholder methods, named onStart() and onEnd(). These methods can be overridden by user-defined subclasses when some actions are to be executed before and after the behaviour execution. onEnd() returns an int that represents a termination value for the behaviour. It should be noted that onEnd() is called after the behaviour has completed and has
been removed from the pool of agent behaviours. Therefore, calling reset() inside onEnd() is not sufficient to cyclically repeat the task represented by that behaviour; in addition, the behaviour must be added again to the agent, as in the following example:
public int onEnd() {
  reset();
  myAgent.addBehaviour(this);
  return 0;
}
6.2.5 Class CyclicBehaviour
This abstract class models atomic behaviours that must be executed forever, so its done() method always returns false.
// Implementation of CyclicBehaviour in EEGAgent
public class EEGAgent extends Agent {
  public void setup() {
    addBehaviour(new EEGRequest());
  }
  protected void takeDown() {
    System.out.println("EEG Agent " + getAID().getName() + " is terminating");
  }
  class EEGRequest extends CyclicBehaviour {
    public void action() {
      if (msg != null) {
        // task to be done
      } else {
        block();
      }
    }
  }
}
6.2.6 Class WakerBehaviour
This abstract class implements a one-shot task that must be executed only once, just after a given timeout has elapsed. The implementation of WakerBehaviour is as follows:
// WakerBehaviour in GenericAgent
protected void setup() {
  addBehaviour(new WakerBehaviour(this, 3000) {
    protected void handleElapsedTimeout() {
      myAgent.addBehaviour(new AnalysisRequest());
    }
  });
}
After the specified time has elapsed, the AnalysisRequest behaviour will be invoked.
7 Deployment and Testing
After completing the steps of analysis and design, the features of the JADE platform have been explored for the implementation of the multi-agent system. With reference to the design phase, the following agents have been developed using the Java language: Generic Agent, DB Agent, EEG Agent, ECG Agent and EMG Agent. The required behaviours and actions were implemented as per the design guidelines and FIPA recommendations. The required table has been created in the database, which is accessible through Java DataBase Connectivity (JDBC).
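As an illustration of this last step, the sketch below shows how the DB Agent could insert a finished report through JDBC. The connection URL, the credentials and the table/column names (reports, ssn, signal_type, interpretation) are assumptions made only for the example.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ReportStoreSketch {
  // Persists one interpretation report produced for a given patient and signal type.
  public void saveReport(String ssn, String signalType, String interpretation) throws Exception {
    try (Connection con = DriverManager.getConnection(
            "jdbc:mysql://localhost/biosignals", "user", "password");   // assumed connection details
         PreparedStatement ps = con.prepareStatement(
            "INSERT INTO reports (ssn, signal_type, interpretation) VALUES (?, ?, ?)")) {
      ps.setString(1, ssn);
      ps.setString(2, signalType);
      ps.setString(3, interpretation);
      ps.executeUpdate();                    // one row per stored expert opinion
    }
  }
}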
8 Conclusion and Future Work
The JADE platform is a popular, FIPA-compliant platform for the development of multi-agent systems. However, prior to this, no formal work on bio-signal processing had been proposed for the analysis and design of multi-agent systems using the JADE platform. The multi-agent system for processing bio-signals will help medical practitioners to follow a standard examination procedure. As the agents in the JADE environment run on threads, the response time is very short, which helps the medical practitioner to make a quick diagnosis. As the health care industry is booming, the latest developments in technology can be integrated with it. In this direction, there are several issues remaining for future work, which include:
• More emphasis on agent internal structure and mechanisms.
• Development of a multi-agent system deployed on a mobile network, with which patients can be monitored through wireless media.
• Developing and accessing expert systems for specific applications.
• Bio-metric based applications can also be developed by means of the mobility of agents.
A Forecasting Model of City Freight Volume Based on BPNN
Peina Wen and Zhiyong Zhang
Abstract. City Freight Volume (FV) is a critical factor in city logistics planning. However, because of the complicated factors affecting city FV, accurate prediction of city FV is not easy. In this paper, we analyze nine factors affecting city FV and build a forecasting model of Beijing FV based on a BP Neural Network. Solving the model with MATLAB, we obtain the forecasting results for Beijing FV. The numerical results show that the use of a BP neural network model to predict city FV is reasonable. Keywords: City logistics planning, City freight volume, BP neural network, Forecasting model.
Peina Wen, Management Science and Engineering, Department of Postgraduate, Beijing Wuzi University, Beijing 101149, China; [email protected]
Zhiyong Zhang (corresponding author), Ph.D. Candidate, Management School, Hefei University of Technology, Hefei 230009, China; [email protected]

1 Introduction
With the development of the modern logistics industry, many Chinese provinces and cities have developed their logistics strategies. City logistics planning is an important part of a city's strategic planning. The governments hope to boost the city's economy, improve the city's investment environment, attract foreign capital and lessen the pressure of employment by developing the logistics industry. But the sudden growth of logistics may lead to an imbalance between the supply and the actual demand [1]. So how to predict the logistics demand accurately is the primary goal of logistics planners. However, the development of logistics in China started later than in the developed countries, which led to a lack of correct cognition of modern logistics development and operation as well as a lack of complete and scientific data for forecasting, so the policy-making of logistics development and the
feasibility studies on the infrastructure construction of logistics lack a quantitative basis. Accordingly, finding a suitable forecasting model (or method) that predicts logistics demand more accurately is crucial. In the existing literature, logistics demand is measured either by value or by volume. When measured by value, logistics demand should take into account all of the values in the logistics process, such as the expenses of logistics, the income of logistics, and the added values of the supply chain. When measured by volume, only a specific activity volume is considered, such as freight volume, inventory volume, or delivery volume [2]. City logistics meets the needs of economic activities and residents' living. What is more, the subjects of city logistics include all of the activities over long periods and wide ranges; all of these make logistics demand difficult to measure by value. As we know, the core activities of city logistics are freightage and storage, and most expenses come from freightage. In this sense, freight volume can reflect the scale of logistics [1]. So it is very significant to predict the amount of city FV. There are several methods to predict city FV, but the following methods are in common use: econometric models, regression analysis, the elastic coefficient method, trend analysis, the grey system model, neural network models, and combination forecasting models [3]. The former four methods are based on time-series data, so they can compare the data of different periods and are precise. However, the need to understand the relationship between the forecasting objects and the variables, as well as the trends of the variables, makes these methods difficult to apply. The grey system model, though it avoids the complicated interrelationships, has the problem of establishing a continuous differential equation from scattered data, so it is only suitable for short-term forecasting. Besides, the lack of precise data and its limited scope of application mean that the precision of its results is not high. The Artificial Neural Network (ANN) model is an information processing system based on the imitation of the structure and function of the human neural network. It is also a nonlinear dynamic system which can realize certain dynamic functions. The model is good at learning and generalizing. It is able to learn experience automatically from the samples without repeated searching and expression, so it works well with unstable, uncertain or incomplete data. Besides, it can approach any continuous function with any precision, reveal the nonlinear relationships hidden in the samples and solve complicated, multivariable, nonlinear regression problems with high precision. It should be noted that the model has more advantages than common statistical methods when the samples are small and contain "white noise" (i.e. random error) [4], [5]. Using the practical data of Beijing, the paper analyzes nine economic factors and builds a forecasting model of city FV for this city based on a BP Neural Network (BPNN).
2 BP Neural Network
The BPNN is a typical type of ANN that is widely applied. The BP neural network is a multi-layered, feed-forward network which follows one-way communication. It includes an input layer, several hidden layers and an output layer. Its structure is shown in Fig. 1 [6].
Fig. 1 The structure of BP Neural Network
The learning process of the BPNN consists of two processes: the forward propagation of data streams and the backward propagation of error signals. In the former, the input samples are introduced at the input layer, processed by the hidden layers, and then passed to the output layer. If the actual output differs from the expected output (teaching signal) of the output layer, the process switches to the latter. The output error is propagated back along the original path and the weights of each neuron are modified accordingly. This iterative process continues until the error is reduced to an acceptable level [7].
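As an illustration of these two passes, the following minimal sketch performs one forward pass and one weight update for a tiny single-hidden-layer sigmoid network. The layer sizes, the learning rate and the dummy training pair are illustrative assumptions only and do not reproduce the network designed later in the paper (which is built with the MATLAB toolbox).

import java.util.Random;

public class BpSketch {
  static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

  public static void main(String[] args) {
    int nIn = 9, nHid = 5, nOut = 3;               // e.g. nine factors -> three freight volumes
    double eta = 0.1;                              // learning rate (assumed)
    Random rnd = new Random(0);
    double[][] w1 = new double[nHid][nIn];         // input -> hidden weights
    double[][] w2 = new double[nOut][nHid];        // hidden -> output weights
    for (double[] row : w1) for (int i = 0; i < nIn; i++) row[i] = rnd.nextGaussian() * 0.1;
    for (double[] row : w2) for (int j = 0; j < nHid; j++) row[j] = rnd.nextGaussian() * 0.1;

    double[] x = new double[nIn];                  // one normalized training sample (dummy values)
    double[] t = new double[nOut];                 // its normalized target (dummy values)
    java.util.Arrays.fill(x, 0.5);
    java.util.Arrays.fill(t, 0.7);

    // forward propagation
    double[] h = new double[nHid];
    for (int j = 0; j < nHid; j++) {
      double s = 0; for (int i = 0; i < nIn; i++) s += w1[j][i] * x[i];
      h[j] = sigmoid(s);
    }
    double[] y = new double[nOut];
    for (int k = 0; k < nOut; k++) {
      double s = 0; for (int j = 0; j < nHid; j++) s += w2[k][j] * h[j];
      y[k] = sigmoid(s);
    }

    // backward propagation of the error signal
    double[] deltaOut = new double[nOut];
    for (int k = 0; k < nOut; k++) deltaOut[k] = (t[k] - y[k]) * y[k] * (1 - y[k]);
    double[] deltaHid = new double[nHid];
    for (int j = 0; j < nHid; j++) {
      double s = 0; for (int k = 0; k < nOut; k++) s += deltaOut[k] * w2[k][j];
      deltaHid[j] = s * h[j] * (1 - h[j]);
    }

    // weight updates (one gradient-descent step)
    for (int k = 0; k < nOut; k++)
      for (int j = 0; j < nHid; j++) w2[k][j] += eta * deltaOut[k] * h[j];
    for (int j = 0; j < nHid; j++)
      for (int i = 0; i < nIn; i++) w1[j][i] += eta * deltaHid[j] * x[i];
  }
}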
3 The Forecasting Model of CFV in Beijing
3.1 Factors and Pretreatment
The factors affecting city FV are plentiful and complex; in this paper, we only consider economic factors. Generally speaking, the primary economic factors which affect city FV include total population and GDP (they influence the volume of city FV), industrial value, agricultural value, primary industry, secondary industry and tertiary industry (they reflect the economic structure), level of consumption (it reflects personal logistics), and retail sales of consumer goods (they reflect the speed of circulation) [8]. The data of this paper come from the "Beijing Statistical Annals (2008)" (Table 1). Based on the principle of selecting sample data, we select the data from 1991 to 2002 as the training samples, and the data from 2003 to 2007 as the testing samples. As the dimensions of the data are different, they need to be normalized. Generally speaking, the data are confined to [0, 1]. The normalization formula is [9]:
x̄_i = (x_i − x_min) / (x_max − x_min)    (1)
In this paper we use MATLAB to execute the normalization procedure.
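For completeness, the min–max normalization of Eq. (1) can be written as a few lines of code. This is only a language-neutral sketch of the operation (the paper itself performs it in MATLAB); the class and method names are assumptions for the example.

public class MinMaxNormalizer {
  // Scales every value of one indicator (one column of Table 1) into [0, 1] as in Eq. (1).
  public static double[] normalize(double[] column) {
    double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
    for (double v : column) { min = Math.min(min, v); max = Math.max(max, v); }
    double[] out = new double[column.length];
    for (int i = 0; i < column.length; i++) out[i] = (column[i] - min) / (max - min);
    return out;
  }
}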
Table 1 The statistics of freight volume and other correlative factors
Year | P | Level of consumer | GDP | Primary industry | Secondary industry | Tertiary industry | Indus. value | Agri. value | Cons. goods | Freight volume (total) | Freight volume (railway) | Freight volume (highway)
1991 | 1094 | 1348 | 598.9 | 45.8 | 291.5 | 261.9 | 730.2 | 39.6 | 357.8 | 26804 | 2983 | 23739
1992 | 1102 | 1513 | 709.1 | 49.1 | 345.9 | 314.5 | 860 | 43.2 | 430.0 | 29888 | 2912 | 26881
1993 | 1112 | 1977 | 886.2 | 53.7 | 419.6 | 412.9 | 1166.6 | 51.1 | 611.2 | 29865 | 3048 | 26730
1994 | 1125 | 2505 | 1145.3 | 67.5 | 517.6 | 560.2 | 1576.6 | 72.5 | 766.6 | 30825 | 3006 | 27700
1995 | 1251.1 | 3519 | 1507.7 | 73.5 | 645.8 | 788.4 | 1493.3 | 86.8 | 950.4 | 32185 | 2974 | 29087
1996 | 1259.4 | 4208 | 1789.2 | 75.0 | 714.7 | 999.5 | 1590.6 | 89.2 | 1061.6 | 32907 | 2851 | 29960
1997 | 1240 | 4557 | 2075.6 | 75.7 | 781.9 | 1218.0 | 1819.7 | 86.8 | 1208.5 | 32351 | 2883 | 29360
1998 | 1245.6 | 5178 | 2376.0 | 76.7 | 840.6 | 1458.7 | 1947.0 | 88.3 | 1373.6 | 30127 | 2563 | 27490
1999 | 1257.2 | 5784 | 2677.6 | 77.1 | 907.3 | 1693.2 | 2183.5 | 89.4 | 1509.3 | 28275 | 2583 | 25635
2000 | 1363.6 | 7326 | 3161.0 | 78.6 | 1033.3 | 2049.1 | 2842.0 | 88.1 | 1658.7 | 30717 | 2612 | 28010
2001 | 1385.1 | 8197 | 3710.5 | 80.8 | 1142.4 | 2487.3 | 3270.1 | 84.7 | 1831.4 | 30607 | 2505 | 28007
2002 | 1423.2 | 9291 | 4330.4 | 84.0 | 1250.0 | 2996.4 | 3620.2 | 83.5 | 2005.2 | 30961 | 2348 | 28375
2003 | 1456.4 | 10584 | 5023.8 | 89.8 | 1487.2 | 3446.8 | 4410.8 | 80.9 | 2296.9 | 30925 | 2265 | 28361
2004 | 1492.7 | 12200 | 6060.3 | 95.5 | 1853.6 | 4111.2 | 5733.3 | 83.1 | 2626.6 | 31700 | 1959 | 29256
2005 | 1538 | 14835 | 6886.3 | 98.0 | 2026.5 | 4761.8 | 6946.2 | 91 | 2902.8 | 32509 | 1976 | 30050
2006 | 1581 | 16770 | 7870.3 | 98.0 | 2191.4 | 5580.8 | 8210.0 | 104.5 | 3275.2 | 33547 | 1956 | 30953
2007 | 1633 | 18911 | 9353.3 | 101.3 | 2509.4 | 6742.6 | 9648.4 | 115.5 | 3800.2 | 20770 | 1925 | 17872
Note: The data on retail sales of consumer goods come from past years' issues of the "Beijing Statistical Annals"; the other data stem from the "2008 Beijing Statistical Annals".
3.2 The Design of the BPNN
The Nodes of Input and Output. According to the analysis of factors above, we take nine factors, including population, level of consumption, GDP, primary industry, secondary industry, tertiary industry, industrial value, agricultural value, and retail sales of consumer goods, as input nodes, and freight volume (comprising total, railway and highway) as output nodes.
Hidden Level and Hidden Neurons. As a single-hidden-layer BPNN has a strong ability in non-linear mapping, this model is adopted in this paper. The number of neurons of the hidden layer should be decided by testing results. Based on the Kolmogorov theorem and the number of input neurons, we set 19 at first, and then chose 17 and 24 for comparison. Finally, we confirmed 19 as the best number. The comparison curves of forecasting error between 17, 19 and 24 are shown in Fig. 2 [10], [11].
Fig. 2 The comparison curve of forecasting error between 17, 19 and 24
From Fig. 2, we learn that the forecasting error is lowest when the hidden layer has nineteen neurons.
Transfer Function. Considering the actual circumstances, in this paper we select "tansig" as the transfer function for the middle layer and "logsig" for the input layer.
Training Algorithm. Because "trainlm" has a high convergence speed and a lower training error, we adopt the LM algorithm as the training algorithm.
Table 2 Training parameters
Training times: 1000 | goal: 0.001
Training Parameters. The training of the network uses the neural network toolbox in MATLAB. The training parameters are shown in Table 2; the other parameters take their default settings.
3.3 Numerical Results
The training took about 0.49 seconds. The error goal of the network was achieved after eleven epochs, and the MSE is 0.000955637 (goal 0.001). The final results are shown in Fig. 3, Fig. 4 and Table 3. As the initial conditions of training are different each time, the results are not identical. In order to make the prediction closer to the raw data, we could train more times.
Fig. 3 The results of training (hidden layer with 19 neurons)
From Fig. 3, Fig. 4, Table 3 and Table 4, we can see that, compared with the SPSS model, the BPNN model is more accurate. The relative error between the prediction and the raw data ranges only from 0.14452% to -2.01996%. The large margin of error in 2007 is due to the restriction of vehicles, the suspension of production in factories and
Fig. 4 Forecasting error

Table 3 The comparison between the prediction and the raw data
Year | CFV raw data (total) | raw (railway) | raw (highway) | CFV prediction (total) | prediction (railway) | prediction (highway) | Average relative error (%)
2003 | 30925 | 2265 | 28361 | 31413 | 2019 | 29276 | -2.0199
2004 | 31700 | 1959 | 29256 | 31637 | 1992 | 29934 | 1.2719
2005 | 32509 | 1976 | 30050 | 32209 | 1977 | 30186 | -0.1445
2006 | 33547 | 1956 | 30953 | 32796 | 1965 | 30299 | -1.3001
2007 | 20770 | 1925 | 17872 | 33284 | 1949 | 30527 | 44.1051
Table 4 The comparison between BPNN model and SPSS model
Year | Raw data | BPNN model prediction | BPNN error (%) | SPSS model prediction | SPSS error (%)
2003 | 30925 | 31413.24 | 1.5788 | 413003.30 | 32.5895
2004 | 31700 | 31636.84 | -0.1992 | 48239.34 | 52.1746
2005 | 32509 | 32209.25 | -0.9221 | 40967.87 | 26.0201
2006 | 33547 | 32795.71 | -2.2395 | 16940.42 | -49.5024
2007 | 20770 | 33283.79 | 60.2494 | -1833.55 | -108.8279
other country’s policies, which all caused by the 2008 Beijing Olympic Games. As a result, the use of BPNN model to predict the freight volume of Beijing is fairly accurate.
4 Conclusion
This paper builds a forecasting model of freight volume with time series in Beijing using a BP neural network and solves the model using MATLAB. From the forecasting results, we learn that the forecasting accuracy of this model is high, with the biggest error being -2.01996% and the average error 0.54816%. If some special factors are considered, the forecasting will be more accurate. Besides, this model is easy to calculate and implement without building complex mathematical equations and programs. What is more, the forecasting law of the model is affected by the nature of the samples. As long as the samples are selected properly, the accuracy of forecasting is determined by the network structure. However, as the forecasting model is based on historical data, it is easily affected by the original data and tends to neglect future development trends. This model has strong robustness and broad applicability. It can also be used to solve similar prediction problems, such as stocks, water flow, and rainfall. ANN has been used in all fields of society. Its encouraging accomplishments benefit the development and survival of society as well as the whole of human beings.
Acknowledgments. This paper was supported by the Funding Project for Academic Human Resources Development in Institutions of Higher Learning under the Jurisdiction of Beijing Municipality-PHR (IHLB).
References 1. Zhang, L.X.: Study on the Method of Urban Logistics Demand Forecasting. Southeast University, Jiangsu (2006) (in Chinese) 2. Cai, D.P.: Study on the Forecasting of Logistics Market Demands. Jiangxi University of Finance & Economics, Jiangxi (2006) (in Chinese) 3. Wang, X.Z.: The Study of the Predictive Method about the Freight Volume. Wuhan University of Technology, Hubei (2005) (in Chinese) 4. Han, L.Q.: The Theory, Design and Application of Neural Network. Chemical Industry Press, Beijing (2007) (in Chinese) 5. Hou, F.J.: Prediction Application in the Time Array of the Market of Railway Passengers Transport based on BP Neural Network. In: Planning and Management, Beijing (2003) (in Chinese) 6. Hechi-Nielsen, R.: Theory of the Back Propagation Neural Network. In: Proceedings of the International Joint Conference on Neural Networks, Washington, DC (1989) 7. Bottaci, L., Drew, P.J., Hartley, J.E.: Artificial Neural Networks Applied to Outcome Prediction for Colorectal Cancer Patients in Separate Institution. Lancet, 350–352 (1997) 8. Wang, X.L.: The Application of the Econometrics Model in Estimation of Logistics Demand. In: Logistics Technology, Beijing (2005) (in Chinese) 9. Zhou, K.L., et al.: Neural Network Models and Simulation of MATLAB Programming. Tsinghua University Press, Beijing, (2005) (in Chinese) 10. Baxt, W.G.: Application of Artificial Neural Networks to Clinical Medicine. Lancet, 346–352 (1995) 11. Flying Synopsys R & D Center: The Theory of Neural Network and the Realization of MATLAB7. Publishing House of Electronics Industry, Beijing (2005) (in Chinese)
The Estimations of Mechanical Property of Rolled Steel Bar by Using Quantum Neural Network
Jen-Pin Yang, Yu-Ju Chen, Huang-Chu Huang, Sung-Ning Tsai, and Rey-Chue Hwang
Abstract. In this paper, the estimation of the mechanical properties of rolled steel bar using a quantum neural network (QNN) is proposed. Based on the learning capability of the neural network, the nonlinear, complex relationships among the steel bar, the billet materials and the control parameters of production could be automatically developed. Such an artificial intelligence (AI) estimator can then help the operation technician to set the related control parameters of the rolling process. Not only could the quality of steel bars be improved, but the cost of bar production could also be greatly reduced. Keywords: Estimations, Mechanical property, Rolled steel bar, Quantum neural network.
Jen-Pin Yang · Rey-Chue Hwang, Electrical Engineering Department; Sung-Ning Tsai, General Education Center, I-Shou University, Kaohsiung County 840, Taiwan, China; [email protected]
Yu-Ju Chen, Information Management Department, Cheng Shiu University, Kaohsiung 833, Taiwan, China
Huang-Chu Huang, Electric Communication Department, Kaohsiung Marine University, Kaohsiung 811, Taiwan, China

1 Introduction
As we know, steel bar is a necessary and important material widely used in many engineering constructions, including buildings, bridges, and roads. Its quality is
highly related to the safety of constructions and human life. In fact, the degree to which a construction can withstand an earthquake is closely linked with the quality of its steel bars. In 1999, the 921 Earthquake not only took away many human lives, but also caused very serious damage to the economy of Taiwan. From then on, the government of Taiwan has set a new policy for controlling the quality of steel bars. All disqualified steel bars are not allowed to be sold and must be melted and reproduced. Any failed steel bar will certainly increase the cost to the steel manufacturing company. Therefore, how to achieve good control of the manufacturing process of steel bars becomes a very important issue for the manufacturers. Usually, in the rolling process of steel bar, the related control parameters, such as size, rolling speed, hydraulic pump and section, are mainly determined by a technician with full experience, in accordance with the compositions of the billet [1-2]. Basically, among all the compositions of the billet, carbon equivalent (C.E.), carbon (C) and manganese (Mn) are the three major reference factors adopted by the experienced technician to set the control parameters. The block diagram of the manufacturing process of steel bars is shown in Fig. 1. As we know, the compositions of the billet include many chemical elements. Some of them are even unknown, especially when the sources of metal scrap come from different countries. Such a simple way of setting the control parameters based on human experience easily leads to the produced steel bars being disqualified. In other words, it also implies that the cost to the steel company will undoubtedly be increased. Recently, due to its powerful learning and adaptive capabilities, NN technology has been widely applied in many areas, such as control systems, system identification, decision making, pattern recognition and so on [3-9]. Through simple training, an NN model can automatically develop the complex and nonlinear relationships between the input and output pairs of the training data provided. Such a well-trained NN model can then be used to perform a specific task designed by the operator. In this study, the mechanical property estimator of rolled steel bar using a QNN was developed. It is well known that the QNN can deal with signals with fuzziness because its units have various graded levels which are capable of classifying the features of the signal [10-13]. Section 2 presents the backbone of the developed estimator, i.e., the QNN model. Some experiments performed by using the QNN are reported in Section 3. Section 4 gives the conclusion of this research.
Fig. 1 The conventional manufacturing process of steel bars (billet material → manufacturing process → steel bar, with the control parameters set based on experience)
2 QNN Estimator and Its Learning Algorithm
In this paper, a QNN model of size 20-12-12-1 is used for all experiments. The diagram of the QNN structure is shown in Fig. 2. The ns-level sigmoid function is taken as the transfer function of the units in the hidden layers. Its mathematical form is expressed as
sgm(x) = (1/n_s) Σ_{r=1}^{n_s} 1 / (1 + exp(−(x − θ^r)))
In the output layer, we still use the sigmoid function as the node's transfer function. The major steps of the QNN learning algorithm are summarized as follows [10, 13].
Update the synaptic weights: Denote by δ_j the error term for all nodes, and let d_j, O_j, Y_j, and X_i be the desired value of the j-th output node, the computed value of the j-th node, the output value of hidden node j, and the input signal from the i-th node in the layer below node j, respectively. For nodes of the output layer:
δ_j = (d_j − O_j) O_j (1 − O_j)    (1)
For nodes of the hidden layer:
δ_j = ((1/n_s) Σ_{r=1}^{n_s} Y_j^r (1 − Y_j^r)) Σ_k δ_k w_jk    (2)
The weights can be adjusted by
w_ij(n+1) = w_ij(n) + η δ_j X_i + ζ (w_ij(n) − w_ij(n−1))    (3)
where n+1, n, and n−1 index the next, present, and previous iterations, respectively; η is the learning step and ζ is the momentum. In our studies, η = 0.2, ζ = 0.5 and n_s = 12.
Update the quantum intervals: In each training cycle, we calculate the following outputs for each hidden node j:
h_j = Σ_{i=0}^{n_i} w_ij X_i    (4)
where n_i is the number of input signals.
h_j^r = sgm(h_j − θ_j^r)    (5)
h̃_j = (1/n_s) Σ_{r=1}^{n_s} h_j^r    (6)
v_j^r = h_j^r (1 − h_j^r)    (7)
Take the average values of h̃_j and v_j^r for each class C_m. For the m-th class C_m:
h̃_{j,C_m} = (1/|C_m|) Σ_{x_m ∈ C_m} h̃_{j,m}    (8)
and
v^r_{j,C_m} = (1/|C_m|) Σ_{x_m ∈ C_m} v^r_{j,m}    (9)
where |C_m| denotes the cardinality of C_m. The quantum intervals can be adjusted by
Δθ_j^r = η_θ (1/n_s) Σ_m Σ_{x_m ∈ C_m} (h̃_{j,C_m} − h̃_j) × (v^r_{j,C_m} − ṽ_j)    (10)
θ_j^r = θ_j^r + Δθ_j^r    (11)
where η_θ is the learning rate. In our studies, η_θ = 0.005 and C_m = 8.
Fig. 2 Architecture of a four-layer QNN (input layer, two hidden layers, and output layer)
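As an illustration of the multi-level transfer function above, the sketch below computes sgm(x) for one hidden node whose ns quantum intervals are stored in an array theta. It is an assumption-laden example, not the authors' implementation; the values of ns and of the quantum levels are illustrative only.

public class QuantumSigmoid {
  // sgm(x) = (1/ns) * sum_{r=1..ns} 1 / (1 + exp(-(x - theta[r]))), averaged over the graded levels
  public static double sgm(double x, double[] theta) {
    double sum = 0.0;
    for (double t : theta) sum += 1.0 / (1.0 + Math.exp(-(x - t)));
    return sum / theta.length;
  }

  public static void main(String[] args) {
    double[] theta = new double[12];                          // ns = 12 as stated in the paper
    for (int r = 0; r < 12; r++) theta[r] = -3.0 + 0.5 * r;   // illustrative quantum levels
    System.out.println(sgm(0.4, theta));
  }
}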
3 Experiments
In our experiments, 1400 sets of data, including billet compositions, control parameters of the rolling process and mechanical properties of rolled steel bars, provided by Hai-Kwang Corporation, Taiwan, were analyzed and simulated. Examples of the data are listed in Table 1. The data information includes steel type, yield strength, tensile strength, Ts/Ys, percentage of elongation, Wt, Wt%, C, Si, Mn, P, S, Cu, Sn, Ni, Cr, Mo, W, V, Al, Pb, Nb, CE, size, rolling speed, hydraulic pump and section. The mechanical properties of rolled steel bars, including yield strength, tensile strength and percentage of elongation, will be modeled and
Table 1 The examples of steel bar data Steel Type, Yield Strength, Tensile Strength, Ts/Ys, Percentage of Elongation , Wt, Wt C, Si, Mn, P, S, Cu, Sn, Ni, Cr, Mo, W, V, Al, Pb, Nb, CE, Size, Rolling Speed, Hydraulic Pump, Section SD420
50.6
0.2778 0.0080
0.0682 0.7091 0.00200.0013
0.2613 0.0073
0.0719 0.0022
0.2930 0.0080
0.0746 0.0023
0.2994 0.0074
0.0730 0.0021
0.2604 0.0058
0.0372 0.0018
0.2862 0.0160
0.1298 0.0060
SD420
54.3
0.7203 0.0000
SD420
53.0
0.7199 0.0013
SD420
53.6
0.7195 0.0000
SD420
51.4
0.6219 0.0000
SD420
51.8
0.8570 0.0000
67.6 0.0083 0.0083
69.8 0.0055 0.0092
69.3 0.0072 0.0093
68.9 0.0078 0.0090
67.0 0.0136 0.0072
67.8 0.0332 0.0016
1.34 0.0159 0.0012
21.7 0.0237 0.3997
1.29 0.0123 0.0014
22.8 0.0244 0.3850
1.31 0.0198 0.0015
MAPE 0.9073%
Table 3 The error statistics of tensile strength MAE 0.8187
3.96 0.0090 D25
20.10 0.4259 0.4594
Table 2 The error statistics of yield strength MAE 0.6183
3.98 0.0018 D25
21.1 0.0000 0.3643
1.31 0.0294 0.0022
3.98 0.0018 D25
22.4 0.0249 0.4230
1.30 0.0184 0.0008
4.04 0.0018 D25
16.5 0.0262 0.4167
1.29 0.0250 0.0012
3.99 0.0018 D25
MAPE 1.5014%
3.95
0.0288 D25
0.30 0.0215 8.9
0.0237 0.0085 3 4
1.40 0.0210 8.9
0.0237 0.0086 3 4
0.10 0.0213 8.9
0.0236 0.0089 3 4
0.10 0.0215 8.9
0.0235 0.0087 3 4
0.50 0.0067 8.9
0.0028 0.0068 3 4
0.70 0.0683 9.6
0.1719 0.0136 2 3
Table 4 The error statistics of percentage of elongation MAE 0.5202
MAPE 3.1132%
Fig. 3 The performance results of yield strength. (Solid line: actual values. Dotted line: estimated values)
Fig. 4 The performance results of tensile strength. (Solid line: actual values. Dotted line: estimated values)
Fig. 5 The performance results of percentage of elongation. (Solid line: actual values. Dotted line: estimated values)
estimated based on the corresponding billet compositions and the relevant influencing factors. To demonstrate the AI estimator we developed, the data are divided into two sets in the simulations. The first 1000 sets of data are used for training the neural models and the other 400 sets of data are used for testing. For each QNN model, there are twenty inputs, including C, Si, Mn, P, S, Cu, Sn, Ni, Cr, Mo, W, V, Al, Pb, Nb, CE, size, rolling speed, hydraulic pump and section. Tables 2, 3 and 4 list the testing error statistics of yield strength, tensile strength and percentage of elongation, respectively. The mean absolute error (MAE) and the mean absolute percentage error (MAPE) over the overall testing data are used as the performance measures. Fig. 3, Fig. 4 and Fig. 5 show these three QNN models' performances in graphical form. From the simulation results, we find that the QNN models developed do have the capability to capture the very complex relationships among the mechanical properties of the steel bar and their influencing factors, including the billet materials and the control parameters of the rolling process. Such a well-trained NN estimator can then be used to help the technician to set the related control parameters of the rolling process.
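For reference, the two performance measures used above follow their standard definitions, with y_k the actual value, ŷ_k the estimated value and N the number of testing samples:
MAE = (1/N) Σ_{k=1}^{N} |y_k − ŷ_k|,    MAPE = (100%/N) Σ_{k=1}^{N} |(y_k − ŷ_k) / y_k|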
4 Conclusion
In this paper, an artificial intelligence estimator of the mechanical properties of rolled steel bar, based on three independent QNN models, is developed. From the simulation results, the nonlinear and very complex relationships among the mechanical properties, billet materials and control parameters of the steel bar rolling process could be automatically developed. Such an AI tool certainly can help the technician
without full experience to set the proper control parameters before the steel bar goes into real-line production. Not only could the quality of the steel bar be strictly controlled and improved, but the cost of production due to a defective manufacturing process could also be greatly reduced.
Acknowledgments. This work is supported by the National Science Council of the Republic of China under contract No. NSC-97-2221-E-214-069.
References [1] The Handbook of CNS 560 Chemical Compositions and Mechanical Properties of Rolled Steel Bars [2] Hai-Kwang Corporation ISO Q09-02-W01 Thermex Operation Handbook [3] Chen, S., Billings, S., Grant, P.: Non-linear System Identification Using Neural Networks. International Journal of Control 51, 1191–1214 (1990) [4] Khotanzad, A., Hwang, R.C., Abaye, A., Maratukulam, D.: An Adaptive Modular Artificial Neural Network: Hourly Load Forecaster and Its Implementation at Electric Utilities. IEEE Transactions on Power Systems 10, 1716–1722 (1995) [5] Zhang, B., Fu, M., Yan, H., Jabri, M.A.: Handwritten Digit Recognition by AdaptiveSubspace Self-Organizing Map (ASSOM). IEEE Transactions on Neural Networks 10 (1999) [6] Huang, H.C., Hwang, R.C., Hsieh, J.G.: A New Artificial Intelligent Peak Power Load Forecaster Based on Non-fixed Neural Networks. International Journal of Electrical Power and Energy Systems 24, 245–250 (2002) [7] Shen, C.Y., Hsu, C.L., Hwang, R.C., Jeng, J.S.: The Interference of Humidity on a Shear Horizontal Surface Acoustic Wave Ammonia Sensor. Sensors & Actuators: B. Chemical 122, 457–460 (2007) [8] Weng, P.H., Chen, Y.J., Huang, H.C., Hwang, R.C.: Power Load Forecasting by Neural Models. Engineering Intelligent Systems for Electrical Engineering and Communications 15, 33–39 (2007) [9] Shen, C.Y., Huang, H.C., Hwang, R.C.: Ammonia Identification Using Shear Horizontal Surface Acoustic Wave Sensor And Quantum Neural Network Model. Sensors & Actuators: A. Physical 147, 464–469 (2008) [10] Purushothaman, G., Karayiannis, N.B.: Quantum Neural Networks (QNN’s): Inherently Fuzzy Feed-forward Neural Networks. IEEE Transactions on Neural Networks 8 (1997) [11] Zhou, J., Gan, Q., Krzyzak, A., Suen, C.Y.: Recognition of Handwritten Numerals by Quantum Neural Network with Fuzzy Features. International Journal on Document Analysis and Recognition 2, 30–36 (1999) [12] Behrman, E.C., Nash, L.R., Steck, J.E., Chandrashekar, V.G., Skinner, S.R.: Simulations of Quantum Neural Networks. Information Sciences 128, 257–269 (2000) [13] Lee, C.D., Chen, Y.J., Huang, H.C., Hwang, R.C., Yu, G.R.: The Non-Stationary Signal Prediction by Using Quantum NN. In: Proceedings of 2004 IEEE International Conference on Systems, Man and Cybernetics, pp. 3291–3295 (2004)
Diagnosis of Epilepsy Disorders Using Artificial Neural Networks Anupam Shukla, Ritu Tiwari, and Prabhdeep Kaur*
Abstract. Epilepsy is a common chronic neurological disorder that is characterized by recurrent unprovoked seizures. About 50 million people worldwide have epilepsy at any one time. This paper presents an Intelligent Diagnostic System for Epilepsy using Artificial Neural Networks (ANNs). In this approach the feed-forward neural network has been trained using three ANN algorithms: the Back Propagation Algorithm (BPA), the Radial Basis Function (RBF) and Learning Vector Quantization (LVQ). The simulator has been developed using MATLAB, and performance is compared by considering metrics like accuracy of diagnosis, training time, number of neurons, number of epochs, etc. The results obtained clearly show that the presented methods have improved the inference procedures and are advantageous over conventional architectures in both efficiency and accuracy. Keywords: Artificial Neural Networks, Back-Propagation Networks (BPN), Radial Basis Function Networks (RBFN), Learning Vector Quantization Network (LVQN), Epilepsy and Diagnosis.
1 Introduction
The validity and usefulness of Artificial Neural Networks (ANNs) depend on whether an appropriate measure is used to assess their accuracy and whether an ANN is significantly more accurate than traditional statistical models for the medical task. Although there has been research using ANNs in medical fields, they have not been used routinely in hospitals or clinics to a significant extent [1]. The reason is that people do not consider machines to be very reliable when it comes to the diagnosis of a disease. But soft computing tools like ANNs, Fuzzy Logic and Genetic Algorithms can do well to ease and complement the work of medical experts [2]. They can help to filter out the real patients, which will reduce the costs and time required for diagnosis. The doctors can then give all their attention to the actual patients. By properly using these techniques, trust can be established between them and the patients [3].
Anupam Shukla · Ritu Tiwari · Prabhdeep Kaur, ABV-IIITM, Gwalior, India
[email protected],
[email protected],
[email protected] *
Epilepsy is the most common disease of the central nervous system (CNS). These epileptic disorders are transient signs and/or symptoms due to abnormal, excessive or synchronous neuronal activity in the brain. It is not cured but usually controlled with medication, although surgery may be considered in difficult cases [4]. The prevalence of this disease in some African countries is about 10 per 1000 population. In rural areas of South Africa, the prevalence of epilepsy in children aged 2-9 years is 7.3 per 1000. The mean age at onset was 23.7 years for motor partial epileptic seizures and 12.3 years for generalized seizures [5]. Information about existing resources available within countries to tackle the huge medical, social, and economic burden caused by epilepsy is lacking. To fill this information gap, a survey of country resources available for epilepsy care was conducted within the framework of the ILAE/IBE/WHO Global Campaign Against Epilepsy. Data were collected from 160 countries representing 97.5% of the world population. The data reinforce the need for urgent, substantial, and systematic action to enhance resources for epilepsy care, especially in low-income countries [6]. Epilepsy has serious social and economic consequences too. People with epilepsy continually face social stigma and exclusion. A fundamental part of ridding the world of this stigma is to raise public and professional awareness. The cost and burden of epilepsy vary between countries. In 1990, WHO identified that, on average, the cost of the anti-epileptic drug phenobarbitone could be as low as US$ 5 per person per annum [7].
2 Methodology In this work, after preprocessing the data, three ANNs (BPN, RBFN & LVQN) were trained with the data of epilepsy and generated four models for each of the three networks. Then, the best models for each of the ANNs are chosen, and the three models were compared to find the overall best model for diagnosis of a disease. Fig. 1 shows the overall methodology of the work performed in this paper. All neural networks have been simulated using the software package MATLAB [8]. BPA uses a gradient descent approach to minimize output error in a feedforward network [9, 10, 11], and hence uses supervised learning to train the network. RBFN uses Unsupervised and supervised approaches for decision making [12, 13, 14]. Learning vector quantization is a method for training competitive layers in a supervised manner. A competitive layer will automatically learn to classify input vectors. [15, 16, 17, 18] Data-set for Epilepsy was taken from UCI repository of machine learning databases [19]. The nine input values were derived from a 1/4 second "SPIKE" event that is recorded on a single channel of an EEG monitor. A team of neurologists were asked to classify each spike as an epileptic event (True or False). The file contains 100 TRUE values and 165 FALSE values; total 265 patterns each having nine attributes. The patterns are in random order. First, all the data has been normalized so that the value of every attribute is between 0 and 1. Out of 265 instances, 200 instances have been used for training the system and 65 have been
Diagnosis of Epilepsy Disorders Using Artificial Neural Networks
809
Collection of Data (UCI repository of machine learning databases)
Pre-processing 1.Remove Instances with missing attribute values. 2. Data splitting, i.e., splitting data into training and testing set.
Train network using different ANN algorithms.
Simulate BPA, RBF & LVQ models with the testing set and training set
Find best BPA, RBF & LVQ models
Performance Comparison Compare best BPA, RBF and LVQ models on the basis of accuracy of diagnosis, neurons used, training time, error etc. and choose the best overall model for diagnosis Fig. 1 Overall Block Diagram of the methodology
Class 1 of the output signifies FALSE (not epileptic) and Class 2 signifies TRUE (epileptic).
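As a concrete illustration of this preprocessing step, the following MATLAB sketch performs the min-max normalisation and the 200/65 split described above; the data file name and all variable names are editorial assumptions, not the authors' code.

load epilepsy_data                                     % assumed to provide X (265x9) and target (265x1, classes 1/2)
Xn = (X - repmat(min(X), size(X,1), 1)) ./ ...
     repmat(max(X) - min(X), size(X,1), 1);            % scale every attribute into [0, 1]
Ptrain = Xn(1:200, :)';    Ttrain = target(1:200)';    % 200 patterns for training
Ptest  = Xn(201:265, :)';  Ttest  = target(201:265)';  % 65 patterns for testing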
3 Simulated Models The simulated models are described in three sections for different ANN algorithms.
3.1 Diagnosis Using BP Network In our experiment, a BP Network with two hidden layers was used (Fig. 2). The transfer function used in the first and second hidden layer neurons was tansig, and the output layer neurons used the purelin transfer function. The values of momentum, learning rate and number of hidden neurons were changed to get different BP models and to choose the best one. The maximum number of allowable epochs was 35000.
Fig. 2 Architecture of Back-Propagation Networks

Table 1 Experimental Results for BP Networks

No. of Hidden Neurons | Momentum | Learning rate | No. of epochs | Training Time (sec.) | %age Accuracy of Diagnosis on training set | %age Accuracy of Diagnosis on testing set
28 (16+12) | 0.7 | 0.06 | 8869  | 39.47  | 99.50 | 86.2
30 (20+10) | 0.8 | 0.06 | 6546  | 37.05  | 98.50 | 93.85
30 (18+12) | 0.7 | 0.06 | 8044  | 37.86  | 99.00 | 93.85
30 (20+10) | 0.8 | 0.07 | 16632 | 89.14  | 99.50 | 95.38
32 (22+10) | 0.7 | 0.06 | 6372  | 35.36  | 99.50 | 90.77
32 (22+10) | 0.7 | 0.06 | 19215 | 105.89 | 99.00 | 90.31

No. of Hidden Layers = 2, Function in Hidden Layer = Tan-Sigmoid, Function in Output Layer = Purelin, Mean sum-squared error (MSE) = 0.01.
Table 1 shows the experimental results of diagnosis using BP Networks. From the table, it is clear that the BP network with 30 (20+10) hidden neurons, which has a diagnosis accuracy of 95.38% on the testing set, is the best BP network for diagnosis of Epilepsy. Fig. 3 shows the training curve for the best BP Network and Fig. 4 shows the graphical representation of the accuracy of the best BP Network for diagnosis of Epilepsy; the diagnosis is correct in 62 out of 65 cases.
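A hedged MATLAB sketch of the best configuration in Table 1 (two hidden layers of 20 and 10 tansig neurons, a purelin output, momentum 0.8, learning rate 0.07, at most 35000 epochs and an MSE goal of 0.01) is given below, using the classic Neural Network Toolbox syntax and the Ptrain/Ttrain matrices from the preprocessing sketch above; the choice of 'traingdm' as the momentum-based training function is an assumption, since the paper does not state it.

net = newff(minmax(Ptrain), [20 10 1], {'tansig','tansig','purelin'}, 'traingdm');
net.trainParam.lr     = 0.07;      % learning rate
net.trainParam.mc     = 0.8;       % momentum constant
net.trainParam.epochs = 35000;     % maximum allowable epochs
net.trainParam.goal   = 0.01;      % mean sum-squared error goal
net = train(net, Ptrain, Ttrain);
accBP = mean(round(sim(net, Ptest)) == Ttest) * 100;   % % accuracy of diagnosis on the testing set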
3.2 Diagnosis Using RBF Network In Radial Basis Function Networks (Fig. 5), we need to specify the input values, target values, error goal and spread value. The performance function used was the mean sum-squared error (MSE).
Fig. 3 Training curve of the best BPN
Fig. 4 Percentage accuracy of best BPN
The value of spread was changed to get different models. The experimental results of diagnosis using RBF Networks are shown in Table 2. From this table, it is clear that the RBF network model with a spread value of 5.2, 150 hidden neurons and a testing-set accuracy of 87.69% is the best RBF model for diagnosis of Epilepsy.
Fig. 5 Architecture of Radial Basis Function Network

Table 2 Experimental Results for RBF Networks

Spread | No. of epochs | Mean squared error (MSE) | Training Time (sec.) | %age Accuracy of Diagnosis on training set | %age Accuracy of Diagnosis on testing set
2.5 | 150 | 0.03  | 4.2  | 100 | 86.20
4.9 | 150 | 0.11  | 6.08 | 100 | 80.00
5.2 | 150 | 0.02  | 5.97 | 100 | 87.69
6   | 150 | 0.025 | 5.53 | 100 | 76.92

Goal = 0.01, Radial Basis Neurons = 150, Function Used = newrb.
Fig. 6 Training curve of the best RBFN
Fig. 7 Percentage accuracy of best RBFN
Fig. 6 shows the training curve for the best RBF Network and Fig. 7 shows the graphical representation of the accuracy of the best RBF Network for diagnosis of Epilepsy; the diagnosis is correct in 57 out of 65 cases.
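The best RBF model of Table 2 can be reproduced in outline with the toolbox function newrb named in the table footnote; the sketch below is illustrative only, and the display-frequency argument is an assumption.

net_rbf = newrb(Ptrain, Ttrain, 0.01, 5.2, 150, 25);   % goal, spread, max. neurons, display step
accRBF  = mean(round(sim(net_rbf, Ptest)) == Ttest) * 100;   % % accuracy on the testing set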
3.3 Diagnosis Using LVQ Network Different LVQ models were obtained by changing the number of hidden neurons. The Kohonen learning rate was set to 0.06. The performance function used was the Mean Sum-Squared Error (MSE). Fig. 8 shows the architecture of the LVQ network and Table 3 shows the experimental results on the basis of the number of hidden neurons used. The network with 10 hidden neurons and an accuracy of 95.38% is the best LVQ Network. Fig. 9 shows the training curve for the best LVQ Network.
Fig. 8 Architecture of LVQ Network
Table 3 Experimental Results for LVQ Networks

No. of Hidden Neurons | No. of epochs | Mean sum-squared error (MSE) | Training Time (sec.) | %age Accuracy of Diagnosis on training set | %age Accuracy of Diagnosis on testing set
10 | 300 | 0.05 | 65.13 | 96.00 | 95.38
13 | 300 | 0.04 | 65.72 | 96.00 | 90.77
15 | 300 | 0.04 | 65.70 | 96.50 | 90.77
18 | 300 | 0.35 | 65.38 | 96.50 | 92.31
20 | 300 | 0.04 | 68.56 | 90.77 | 92.31

Learning rate = 0.06, Learning Function = learnlv1.
Fig. 9 Training curve of the best LVQN
Fig. 10 Percentage accuracy of best LVQN
Fig. 10 shows the graphical representation of the accuracy of the best LVQ Network for diagnosis of Epilepsy; the diagnosis is correct in 62 out of 65 instances.
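A corresponding hedged sketch of the best LVQ model (10 hidden neurons, learning rate 0.06, learnlv1, 300 epochs) is shown below; the class-proportion vector passed to newlvq is an assumption estimated from the 165 FALSE and 100 TRUE instances in the data set.

net_lvq = newlvq(minmax(Ptrain), 10, [0.62 0.38], 0.06, 'learnlv1');
net_lvq.trainParam.epochs = 300;
net_lvq = train(net_lvq, Ptrain, ind2vec(Ttrain));             % targets as 1-of-2 class vectors
accLVQ  = mean(vec2ind(sim(net_lvq, Ptest)) == Ttest) * 100;   % % accuracy on the testing set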
3.4 Performance Comparison of ANNs Table 4 shows the performance comparison of the ANNs to find the best diagnostic system for Epilepsy. In this case, 200 patterns of epilepsy-related data were used for training and 65 were used for testing. For the networks trained with the specified data, BPA and LVQ produce the same accuracy of diagnosis on the testing set. Although the accuracy of diagnosis of BPA is better than that of LVQ on the training set, real data will never be exactly the same as the data on which the network is trained. Moreover, LVQ takes much less training time than BPA for producing the same result on the testing set. If we compare the time required to train the three networks, it is clear that RBF takes the least time for training.
Table 4 Performance comparison to find the best diagnostic system for Epilepsy (using full training data set)

Network | %age accuracy of diagnosis (on testing set) | Training Time (in seconds) | Mean sum-squared error (MSE)
BPA | 95.38 | 89.14 | 0.01
RBF | 87.69 | 5.97  | 0.02
LVQ | 95.38 | 65.13 | 0.05
Table 5 Accuracy Comparison of the best models when trained on reduced training set

Network | %age accuracy of diagnosis (on testing set)
BPA | 87.88
RBF | 83.63
LVQ | 92.73
The time required by LVQ was about eleven times that required by the RBF network, and the BP Network took about 15 times the time required by the RBF network. The difference in time can be attributed to the fact that the Back-Propagation code uses a number of loops, and loops are not very efficient in MATLAB, whereas the RBF code uses large matrix operations, which are quite efficient in MATLAB. Experiments were also conducted using a reduced training set (100 patterns) and a larger testing set (165 patterns), as shown in Table 5. The results of these experiments were obtained using the same training parameters as for the best BPA, RBF and LVQ models obtained by training on the full training data set. In these experiments, LVQ produces the best accuracy of diagnosis on the testing set and BPA produces the second best accuracy. Thus, the LVQN is the best diagnostic model for Epilepsy.
4 Conclusions The best diagnostic system for Epilepsy is the LVQN, with 95.38% accuracy when trained on the full training set and 92.73% accuracy when trained on the reduced training set with a comparatively larger testing set. Thus, the LVQN provides the best generalization in diagnosis: even when trained on a smaller training set, it provides the best accuracy on a larger testing set. The RBFN is the quickest in training, but does not match the capabilities of the LVQN. The performance of each diagnostic system is basically application dependent, and depends a lot on the kind of data set being used for training and testing. All the neural networks have their own advantages and disadvantages, so we conducted experiments with different neural networks to find the best system for the diagnosis. Only the best model will be able to cater to the needs of medical experts; a bad model will only complicate the work of doctors and be an unnecessary overhead. This work can be extended to other diseases and, finally, to a complete medical expert system.
References 1. Aliev, R.A., Aliev, R.R.: Soft Computing and its Applications. World Scientific Publishing Co. Pvt. Ltd., Singapore (2001) 2. Lisboa, P.J.G., Vellido, A., Wong, H.: Outstanding Issues for Clinical Decision Support with Neural Networks. J. Artificial Neural Networks in Medicine and Biology, 63–71 (2000) 3. Setiono, R., Huan, L.: Neurolinear: From Neural Networks to Oblique Decision rules. J. Neurocomputing, 1–24 (1997) 4. http://en.wikipedia.org/wiki/Epilepsy/ (accessed October 2007) 5. Del Rio, R.A., Foyaca-Sibat, H., LdeF Ibañez –Valdes: Neuroepidemiological Survey For Epilepsy, Knowledge About Epilepsy, Neurocysticercosis And HIV/Aids At The Makaula Village In South Africa. Internet Journal of Neurology 7, 2, ISSN: 1531295X 6. Dua, T., De Boer, H.M., Prilipko, L.L., Saxena, S.: Epilepsy care in the world: Results of an ILAE/IBE/WHO global campaign against epilepsy survey. CODEN EPILAK 47(7), 1225–1231 (2006) 7. http://www.who.int/mediacentre/factsheets/fs166/en/ 8. Neural Networks Toolbox in MATLAB 9. Freeman, J.A., Skapura, D.M.: Neural Networks. Addison-Wesley, Reading (1999) 10. Hecht–Nielsen, R.: Theory of the Backpropagation. In: Proceedings of the International Joint Conference on Neural Networks, vol. 1, pp. 593–606 (1989) 11. Widrow, B., Lehr, M.A.: 30 Years of Adaptive Neural Networks: Perceptron, Madaline and Backpropagation. Proc. IEEE 78(9), 1415–1441 (1990) 12. Orr, M.J.L.: Regularisation in the Selection of Radial Basis Function Centres, J. Neural Computation 7(3), 606–623 (1995) 13. Toh, K.A.[Kar-Ann], Mao, K.Z.: A Global Transformation Approach to RBF Neural Network Learning. In: ICPR 2002, vol. II, pp. 96–99 (2002) 14. Jankowski, N.: Approximation with RBF-type Neural Networks using flexible local and semi local transfer functions. In: 4th Conference on Neural Networks and Their Applications, pp. 77–82 (1999) 15. Ghosh, A., Biehl, M., Freking, A., Reents, G.: A Theoretical Framework for Analyzing the Dynamics of LVQ: A Statistical Physics approach, Technical Report 2004-9-02, Mathematics and Computing Science, University Groningen, P.O. Box 800, 9700 AV Groningen, Netherlands (2004), http://www.cs.rug.nl/~biehl 16. Hammer, B., Villmann, T.: Generalized Relevance Learning Vector Quantization. J. Neural Networks 15(8-9), 1059–1068 (2002) 17. Neural Networks Research Centre, Helsinki, Bibliography on the self-organizing maps (SOM) and learning vector quantization (LVQ), Helsinki Univ. of Technology, Otaniemi (2002), http://liinwww.ira.uka.de/bibliography/Neural/SOM.LVQ.html 18. Ghosh, A., Biehl, M., Freking, A., Reents, G.: A theoretical framework for analyzing the dynamics of LVQ: A statistical physics approach, University Groningen, P.O. Box 800, 9700 AV Groningen, The Netherlands, Technical Report 2004-9-02, Mathematics and Computing Science (2004) 19. UCI repository of machine learning databases (2007), http://archive.ics.uci.edu/ml/datasets.html
Neural Forecasting Network for the Market of Pleione Formosana Hayata Orchid Chih-Yao Lo, Cheng-I Hou, and Tian-Syung Lan*
Abstract. Pleione Formosana Hayata Orchid is one of Taiwan's native plants. Growing high in the mountains at an elevation of 1500-2000 meters, it requires a temperature range of 15-20 degrees Celsius. This perennial is a member of the family Orchidaceae. It has one bulb and only one leaf. It is sold in bulb form, and it blooms before its leaf forms. At harvesting time, nursery personnel have always had to invest much capital to stock Pleione Formosana bulbs for the traditional orchid industry. However, the price of Pleione Formosana bulbs changes daily based on market supply and demand. This fluctuation makes it difficult to know how many bulbs to stock at any given time. If information technology could be used to assist operating personnel to forecast the demand for the flower in the near future, they could buy at a low price and achieve the objective of short-term stocking according to short-term demand, without misjudging the amount to be bought. Thus, they not only could increase their profit, but could also enable customers to get fresh bulbs at a low price, thereby assisting them to reduce material costs. This research presents a market demand forecasting system for the Pleione Formosana Hayata Orchid product to assist traditional market personnel to forecast customer demand in the near future. The back propagation neural network algorithm is used in the Pleione Formosana Hayata Orchid product market demand forecasting system so that future order demands can be forecast on the basis of information on existing orders. Keywords: Forecasting system, Back propagation neural network, Pleione formosana Hayata Orchid.
Chih-Yao Lo · Tian-Syung Lan, Department of Information Management, Yu-Da College of Business, Taiwan 361, R.O.C.
Cheng-I Hou, Department of Leisure Management, Yu-Da College of Business, Taiwan 361, R.O.C.
{jacklo,cheng,tslan}@ydu.edu.tw

1 Introduction
The transportation and sale of the Pleione Formosana Hayata Orchid (PFH Orchid) product signifies the economic activity that occurs when PFH Orchid products are
sent to consumers or to subsequent manufacturers, including: the cleaning procedures from cultivation to collection, classification, transportation, storage, processing, sale, financing, and market information collection. Price is an important factor; in the PFH bulb stock and sales industry it has as great an influence on operations as product freshness. The price of the PFH Orchid product reflects the orchid market demand. Therefore, if the short-term demand of customers in the future could be forecast, there would be many benefits for the operating personnel. The Pleione Formosana Orchid grows at an elevation of 1500-2500 meters in Taiwan's mountainous areas. Reference [1] pointed out that natural environmental factors are the major growth factors affecting PFH Orchids, including production technology, elevation, soil, rainfall, temperature, and other factors. Many people collect PFH Orchids from mountain areas, so these orchids are becoming increasingly rare in the wild. Forecasting entails many uncertainties and risk factors. If the forecast is inaccurate, then production and marketing will cost the business more and may even bring about an operational crisis. There are many forecasting methods, such as the statistical forecast method, qualitative analysis, the cause and effect method, and time series analysis. Depending on the subject or purpose, different methods have been used to solve related problems. The NN is one of the methods used in artificial intelligence, and has several advantages over other methods, including fault tolerance, learning ability, and non-linear conversion capacity. The BPN network is currently one of the most representative and most extensively used neural network learning models. This research used a BPN network as its forecasting model.
2 Theoretical Background
Werbos and Parker both developed the basic concept of the BPN network, but it was not until 1985, when Rumelhart, Hinton, and Williams proposed the back-propagation learning rule, or generalized delta learning rule, that this theory and algorithm were definitively defined [2-3]. The basic principle of a BPN network uses the gradient descent method to minimize errors, and thus derives a delta rule. Its idea is to reduce the difference between the actual and expected outputs by means of successive corrections. From the viewpoint of mathematics, the correction of the synapse values is in direct proportion to the first derivative of the error, and the network converges to a stable state in the process of learning, which is equivalent to reaching the minimum of a curve on the error surface. The process of the BPN network includes a forward pass and a backward pass, and the error can be reduced and the expected learning achieved through these two stages [4]. According to related studies, a supervised learning network must first obtain training samples from the same field it is intended to examine, including the input and output values. The objective of supervised learning is to reduce the difference between the actual and expected output values, and it can be used for forecasting and classification. Therefore, the well-known BPN network algorithm, a supervised learning method, can be used to forecast the PFH Orchid's market demand and supply in the future. The application methodology is detailed as follows.
Table 1 Comparison of Input Matrix Conversion

Input Parameter | Meaning
X1 | Income
X2 | Market prices
X3 | Related item prices
X4 | Number of research projects
X5 | Number of consumers
X6 | Per capita consumption
A. Parameter Conversions
Each year, the cultivation and demand of the PFH Orchid product differ, based on the investigation and analysis of the orchid market. Firstly, it is necessary to convert the PFH Orchid product order of each cycle into the input matrix (X) according to the relative factors of income, market price, etc. Secondly, the output matrix is obtained from the properly converted amounts of different sizes of PFH Orchid bulbs or flowers in the next cycle. For the conversion method, refer to Table 1.
B. Forecasting Algorithm
The BPN network order demand forecasting module is built according to the BPN network algorithm, after the proper input matrix and output matrix have been obtained. The module structure adopts one hidden layer (see Fig. 1), and can be denoted as Eq. (1), given below.
Fig. 1 Back Propagation Neural Network Order Demand Forecasting Structure
In this figure, X in the input layer indicates the input matrix; this matrix is the parameter matrix converted from each cycle according to the relative factors of income, market price, and so forth. $a_1$ is the output matrix of the hidden layer, and $a_2$ is the output matrix of the output layer, which represents the order demand matrix to be forecast.

$a_2 = f\left(W_2 \cdot f\left(W_1 \cdot X - b_1\right) - b_2\right)$   (1)

in which $f(\cdot)$ indicates the conversion function. The $W_1$, $b_1$, $W_2$, $b_2$ of the hidden and output layers are obtained from the calculation of the BPN network's training algorithm, which is detailed below. Based on the relevant literature, the conversion function logsig is used in this research.
The learning and training process for the weight matrix $W_1$ and threshold vector $b_1$ of the hidden layer, and the weight matrix $W_2$ and threshold vector $b_2$ of the output layer, is detailed as follows:

1) Input a set of training sample vectors $X$ and expectation object vectors $T$.

2) Calculate the actual output vector $a_2$ of the network.

1. Calculate the output vector $a_1$ of the hidden layer:

$a_1 = f\left(\sum W_1 \cdot X - b_1\right) = \mathrm{logsig}\left(\sum W_1 \cdot X - b_1\right) = \dfrac{1}{1 + e^{-\left(\sum W_1 \cdot X - b_1\right)}}$   (2)

2. Calculate the actual output vector $a_2$:

$a_2 = f\left(\sum W_2 \cdot a_1 - b_2\right) = \mathrm{logsig}\left(\sum W_2 \cdot a_1 - b_2\right) = \dfrac{1}{1 + e^{-\left(\sum W_2 \cdot a_1 - b_2\right)}}$   (3)

3) Calculate the difference $E$ between the output vector $a_2$ and the object vector $T$:

$E = T - a_2$   (4)

4) Update the weight values $W$ and the threshold values $b$.

1. Update the weight value $W_2$ and the threshold value $b_2$ of the output layer:

$W_2(\mathrm{new}) = W_2 + \Delta W_2, \quad \Delta W_2 = \eta\,\delta_2\,a_1^{T}$   (5)

$b_2(\mathrm{new}) = b_2 + \Delta b_2, \quad \Delta b_2 = -\eta\,\delta_2$   (6)

2. Update the weight value $W_1$ and the threshold value $b_1$ of the hidden layer:

$W_1(\mathrm{new}) = W_1 + \Delta W_1, \quad \Delta W_1 = \eta\,\delta_1\,X^{T}$   (7)

$b_1(\mathrm{new}) = b_1 + \Delta b_1, \quad \Delta b_1 = -\eta\,\delta_1$   (8)

where the sensitivities are $\delta_2 = -2\,F_2(m)\,(T - a_2)$ and $\delta_1 = F_1(n)\,(W_2)^{T}\,\delta_2$, with the diagonal derivative matrices

$F_2(m) = \mathrm{diag}\big(\dot f_2(m_1), \dot f_2(m_2), \ldots, \dot f_2(m_i)\big), \quad \dot f_2(m_i) = (1 - a_2)(a_2)$   (9)

$F_1(n) = \mathrm{diag}\big(\dot f_1(n_1), \dot f_1(n_2), \ldots, \dot f_1(n_j)\big), \quad \dot f_1(n_i) = (1 - a_1)(a_1)$   (10)
Repeat the above steps until the error E no longer changes appreciably: that is, convergence is achieved and the learning is finished. After learning is finished, the stored weight values and threshold values represent the order features derived from the former orders. Substitute the weight value matrices and the threshold value matrices into Eq. (1) of the subsequent BPN network order demand forecasting module to calculate the order demand.
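As an illustration of Eqs. (2)-(10), the following MATLAB sketch performs one learning pass for a single training pair; the variable names and learning rate eta are editorial assumptions rather than the authors' implementation, and the constant factor -2 in the text's definition of the output sensitivity is folded into the learning rate so that the updates move down the error gradient.

logsig = @(n) 1 ./ (1 + exp(-n));           % conversion function used in the paper
a1 = logsig(W1*X - b1);                     % Eq. (2): hidden-layer output
a2 = logsig(W2*a1 - b2);                    % Eq. (3): actual network output
E  = T - a2;                                % Eq. (4): output error
d2 = (a2 .* (1 - a2)) .* E;                 % output-layer delta, cf. F2(m) in Eq. (9)
d1 = (a1 .* (1 - a1)) .* (W2' * d2);        % hidden-layer delta, cf. F1(n) in Eq. (10)
W2 = W2 + eta * d2 * a1';  b2 = b2 - eta * d2;   % Eqs. (5)-(6)
W1 = W1 + eta * d1 * X';   b1 = b1 - eta * d1;   % Eqs. (7)-(8)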
3 Network Construction
In order to create an effective forecasting model, its accuracy must also be improved. This research uses the BPN network, one of the NN models, as the forecasting model. The first step is to analyze the impact factors related to the supply of and demand for the PFH Orchid. Second, the BPN network technique is used to identify all relevant factor weights. Third, the factors whose weights are relatively less important are removed, and the key factors are identified. Finally, the weights of the key factors are entered into the model to estimate the next phase of supply and demand. Therefore, a system framework is constructed for the forecasting model for the PFH Orchid, as detailed below. The impact factors of this study were chosen from the main factors affecting supply and demand as well as other relevant factors based on economic theory.
a. Factors affecting demand
1. Income (X1) This study set the income as Taiwan's national income [5].
2. Market prices for the Pleione Formosana Hayata Orchid (X2) This study used the average market value as the PFH Orchid market price because the higher the market price, the less the demand. The average market price came from the Taipei market, the market in Taichung, the Changhua market, and the Tainan and Kaohsiung markets [6].
3. Related item prices (X3) This section is divided into complementary and substitute products. The complementary products for the PFH Orchid are flower pots. However, such pots are also other plants' complementary goods, so it is very difficult to use the quantity of pots to determine the demand for the PFH Orchid. Therefore, this study did not use complementary goods, but rather substitute products, as impact factors. The Phalaenopsis Orchid is used as the substitute product because, like the PFH Orchid, the Phalaenopsis sells at relatively low prices in Taiwan, and, based on interviews with sellers, consumers in Taiwan often use price as their top consideration. This study used the average market price as its value.
4. The number of research projects for the Pleione Formosana Hayata Orchid (X4) The volume of research projects on the PFH Orchid is identified every year to show the preference of scholars [7-9].
5. The number of consumers (X5) This section can be divided into two groups: the domestic market and foreign markets. However, this study conflated these two groups because the foreign market consumer statistics are often inaccessible. Therefore, the annual value of
exports is divided by the price of the PFH Orchid to estimate the number of consumers [10]. Foreign market consumers = annual export values / PFH Orchid export price; domestic market consumers = annual domestic sales / PFH Orchid domestic price.
6. The annual per capita consumption for flowers (X6) Taiwan flower gardening consumption estimate = (total production of flowers + flower import values - flower export values) x 2 (Committee on Agriculture).
b. Factors affecting the supply
1. Production Technology (Y1) This study classified cultivation techniques into four ratings: flat bed cultivation planting, elevated bed cultivation planting, high bench cultivation plate frame planting, and high bench cultivation plate frame planting with screening and an automatic sprinkling system [11]. The smaller the rating number, the better the production technology.
2. Pleione Formosana Hayata Orchid market price (Y2) From the suppliers' point of view, the higher the market price, the more willing they are to produce. This study used the average market value as the PFH Orchid market price [5].
3. Raw material and production element prices (Y3) This study added together the prices of raw materials and production elements to compute the figure used as the cost of production, including the amounts needed for seed, fertilizer, labor, pesticides, energy, rent, interest on capital, and buildings and equipment depreciation charges [11].
4. Other related product prices (Y4) The other related product is set as the Lycoris Aurea Herb, whose growth environment is similar to that of the PFH Orchid. Therefore, if the Lycoris Aurea Herb market price increases, a farmer could grow the Lycoris Aurea Herb instead. The Lycoris Aurea Herb flower's cutting period is from August to October, which does not overlap with the PFH Orchid's production time [6].
5. The government's Agricultural Research and Development Budget (Y5) [12].
6. The number of suppliers (Y6) The PFH Orchid is one of Taiwan's native plants. Long ago, indigenous people collected these orchids and sold them all at the markets, so it was very difficult to calculate the number of suppliers at that time. Therefore, this study set the number of market suppliers according to the government's policy of permitting domestic suppliers only [11].
The framework of the NN is often specified by the number of layers. The NN usually has more than two layers: the input layer, one or more hidden layers, and the output layer. There is no standard number of hidden layers; usually, one or two hidden layers give the best performance for the NN [4]. The main impact parameters are the learning rate, momentum, and epoch ("epoch" is the number of the training cycle). This study used the pattern mode, which trains on one training sample at a time. The error and weights must be calculated for
every training sample; this is called a "learning cycle." For the learning rate, this study used the default value of 0.01, adjusted depending on the network under study. Momentum can reduce the network's sensitivity to local gradients of the error surface, effectively inhibiting the network from being trapped in a local minimum. The initial momentum values tested in this study were 0.001, 0.01 and 0.05. Inputting all the data into the network for training once is called one training cycle, so the epoch value is then equal to 1. An appropriately trained network will generalize well. There are two main methods to determine how many training cycles should be run for the network. One is to set the number of training cycles in advance, in which case the network stops learning after the preset number of training cycles; this may result in incomplete training if the network has not yet reached convergence. The other is to set a convergence criterion, wherein the network stops if the value of the error is less than the one originally set. This study set 100 as the default convergence setting and set the number of training cycles (epochs) to 300.
4 System Development The forecasting algorithm was implemented on the MS SQL database management system within the Visual Basic Programming environment that uses the BPN network theory to set the values for the support platforms. The Visual Basic programming software package and SQL database management system have very good matrices and vector operational capabilities, and they provide many toolbox functions. The developed software system uses the BPN network theory as the forecasting tool to calculate the production and demand for the PFH Orchid. Fig. 2 shows the system’s Graphical User Interface (GUI). Fig. 2 The Pleione Formosana Orchid BPN System’s GUI
The NN operation can be divided into learning and testing processes [4]. The learning process involves searching for rules among the learning samples. The testing processes input data gathered from the testing samples. The NN infers the relative output. Then it compares the output with the original data and calculates the error in order to understand the effectiveness of the network’s learning result. This study used 20 tests and 20 learning samples.
There are many different types of neural networks. At present, over a dozen well-known NN types have been built, including the BPN, Hopfield Network, Kohonen Network, Boltzmann Machine Network, and Adaptive Resonance Theory, among others. This study used the BPN network type. Hebb [13] proposed a learning rule that explained how a network of neurons learns. The learning rule is one of the most important functions of an NN because it determines how to adapt connection weights in order to optimize network performance.
5 Results and Discussions
The current study, involving NN-related parameters, used past production and sales data for the PFH Orchid market demand, together with other related information, as an example, and used the Visual Basic programming software package to develop the system as an experimental tool. This study obtained the following results. Fig. 3 shows the convergence after the NN training had concluded. Tables 2 and 3 show the results after the neural network's training. The Mean Absolute Error (MAE) [14] is computed as $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|t_i - \hat{t}_i\right|$, in which $t_i$ is the output target, $\hat{t}_i$ is the forecast output target, n is the number of samples, and i is the sample index starting from the initial value. Fig. 3 also shows how the beginning weights were randomly distributed with values between 0 and 1; this results in an initial MAE of more than 1. After running the training cycle 3500 to 4000 times, the BPN network output a group of effective forecast weights.
Table 2 The results after the BPN process (Demand values for 2007)
   | W1       | b1       | W2      | b2
X1 | 19.859   | -0.14051 | -15.728 | -12.6
X2 | 7.3083   | 10.373   | 7.4675  | -13.1
X3 | -4.531   | -13.129  | 5.7161  | 6.511
X4 | -3.5318  | 7.3305   | -6.4688 | 5.351
X5 | -0.80918 | -11.335  | 4.2921  | -6.94
X6 | 5.0717   | -0.81747 | 14.501  | -2.35
QD = 835,376

Table 3 The results after the BPN process (Supply values for 2007)

   | W1      | b1      | W2      | b2
Y1 | 17.4921 | -1.375  | -12.582 | -2.386
Y2 | 5.3248  | 3.1385  | -7.288  | -1.233
Y3 | 11.4834 | -0.2842 | 3.2894  | -0.2318
Y4 | -4.237  | 1.4852  | 1.348   | -5.238
Y5 | 4.6238  | 5.891   | 11.482  | 0.3913
Y6 | -11.932 | -0.392  | 3.238   | -0.581
QS = 526,847
Fig. 3 BPN network training figures (MAE of the demand and supply networks versus the number of training cycles)
The forecast weights have been tested with 20 sets of data. The Cost Percentage Error (CPE) (National Statistics) and the Mean Absolute Percentage Error (MAPE) (National Statistics) have been used to verify the back propagation neural network's accuracy.
After the CPE calculation for the PFH Orchid market demand quantity, there are two sets of data with error values within ±2%, 4 sets within ±3%, 4 sets within ±4%, another 4 sets within ±6%, 3 sets within ±7%, and 3 further sets within ±10%. The MAPE for the demand quantity is 5.21%. After the CPE calculation for the PFH Orchid market supply quantity, there is 1 set of data with an error value within ±1%, 2 sets within ±2%, 4 sets within ±3%, another 7 sets within ±4%, and 6 sets within ±6%. The MAPE value for the supply quantity is 3.37%. The overall MAPE value is 4.47%; therefore, the accuracy of the BPN for this case is 95.53%. Tables 2 and 3 above give the forecasting results for demand and supply, where Xi and Yi are the decision variables, Wi is the weight value, bi is the threshold value, QD is the estimated demand quantity, and QS is the estimated supply quantity. According to the above NN training and the study results, the analysis is as follows. 1. The PFH Orchid bulbs have a short storage life, so the grower is greatly dependent on the accuracy of forecasts. The predictive model from this study has been shown to have a 95.53% accuracy; clearly, this model can be of great value to the growers. 2. The impact factors on the market demand forecasts are consumer income, the PFH Orchid's market price, and the annual per capita consumption of these flowers. The impact factors on the market supply forecasts are production technology, raw material prices, and production element prices. 3. The related products' prices and the number of suppliers have no significant influence on the demand and supply forecasts; therefore, those factors can be disregarded during the forecast process.
4. This study forecast that the PFH Orchid market demand is about 300,000 higher than the actual supply. The reasons are that PFH Orchid seedlings need three years to mature, and that the PFH Orchid greenhouse production technology is not up to date with current research and knowledge. 5. According to economic theory, if there is an excess of demand, market prices tend to rise, and the actual data support this. However, based on in-depth interviews with PFH Orchid growers, it was found that this is not really the case in practice, possibly because profits do not go directly to growers; this unfortunate situation leads to lowered production levels. Another reason is that some dealers do not source their products from legally registered suppliers, thus diluting the legitimate industry's profits. 6. The majority of PFH Orchid sales are provided by growers in the Nan-Chuan County area. Therefore, dealers and growers should cooperate closely with each other in promotion, marketing, and so on, so that everybody involved can share in the profits they have earned. 7. It is difficult to obtain confidential information. Therefore, this study deliberately disregarded such information and considered only the manufacturers' standpoint; it was not feasible to take into account dealers' or distributors' data. As a result, the true sales data and the characteristics described in the study do not show the whole PFH Orchid-selling picture. 8. When products are defective, for example because of damaged packaging or an imperfect production process, those data are unavailable and unrecorded. Therefore, when making a production prediction, the safety stock considerations are slightly inaccurate. 9. The ultraviolet index may affect whether consumers go out shopping, and this factor may influence the sales quantity. Unfortunately, there is no meteorological database in Taiwan that records such data, so this study could not take the ultraviolet index as an impact factor, which skews the results somewhat.
6 Conclusions
The BPN network mainly uses historical sample data to learn the behavior of the studied subject. In addition to its nonlinear conversion capability and fault tolerance, it does not require restrictive assumptions to be imposed on the information, so it can be applied to prediction in many areas. The amount of information required is very flexible. The NN can also simplify the model development process, and it is capable of dealing with seasonal trends and with data that move up and down. The PFH Orchid has seasonal and trend-like characteristics. This paper uses flower sales data, through the BPN network, to identify the factors affecting the product's supply and demand. It also uses the neural network's training and verification capabilities to build the forecasting model. The aim of the proposed method is to develop a new forecasting model that helps estimate the supply of and demand for the PFH Orchid in Taiwan's market. An algorithm is
also developed for this purpose. This system is expected to be viewed as a prototype for other orchid models in future work.
References 1. Li, N.: Orchids. Taiwan Flower Development Association, 115–320 (2003) 2. Wang, J., Xiao, D.: Introduction of Neural Network and Fuzzy Control Theory. Quanhua Science & Technology Books Co., Ltd. (2002) 3. Ye, Y.: Application and Practice of Neural Network Mode, 7th edn. Taipei Scholars Books Co., Ltd. (2000) 4. Wong, B., Bodnovich, T.A., Selvi, Y.: Neural Network Applications in business: A Review and Analysis of the Literature. Decision Support Systems 19, 301–320 (1997) 5. National Statistics, R.O.C (Taiwan), http://www.stat.gov.tw/ 6. Agriculture and Food Agency Council of Agriculture, the Executive Yuan 7. Taiwan Floriculture Development Association 8. Government Research Information Systems (GRB) Database 9. Agricultural Research and Development Database 10. Ministry of Finance Customs Office, http:// web.customs.gov.tw/statistic/statistic/mnhStatistic.asp/ 11. Nan-Chuan County Agriculture Council 12. Council of Agriculture Unit Budgets 13. Hebb, D.O.: The Organization of Behavior. Wiley, New York (1946) 14. Wang, H.S.: Application of BPN with Feature-based Models on Cost Estimation of Plastic Injection Products. Computers & Industrial Engineering 53, 79–94 (2007)
Harmonic Current Detection Based on Neural Network Adaptive Noise Cancellation Technology* Ziqiang Xi , Ruili Tang, Wencong Huang, Dandan Huang, Lizhi Zheng, and Pan Shen *
Abstract. An approach to measuring harmonics in power systems, based on the self-adaptive noise cancellation method and artificial neural network (ANN) theory, is put forward in this paper. By training the ANN online and using a two-level filter, the designed system can accurately perform dynamic detection of harmonics. Its validity and practicality are proved by simulation. Keywords: Artificial neural network (ANN), Adaptive noise filter, BP algorithm, Harmonic detection.
1 Introduction
As power electronic devices become more widely used, the harmonic pollution of the power supply is getting worse. It is necessary to detect the harmonic content and to know its situation in the power system, as well as to prevent harmonic harm and maintain the safe operation of power systems. Currently, the harmonic current detection method based on the instantaneous reactive power theory is commonly used, but this method is complicated, has a low accuracy and little adaptive capacity. These disadvantages can be overcome by using a current detection method based on artificial neural networks (ANN), which demonstrates a strong learning ability and simultaneously enhances the accuracy of detection; current detection based on neural networks therefore has great potential. In this paper, a harmonic current detection method based on adaptive noise cancellation technology is presented.
Ziqiang Xi · Ruili Tang · Wencong Huang · Dandan Huang · Lizhi Zheng · Pan Shen, School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, 430068 China
Ziqiang Xi Professor at Hubei University of Technology, China, research area: power systems and automation, automatic control, power electronics.
2 Adaptive Noise Cancellation Theory
The adaptive noise cancellation method can separate the signal s(t) from the noise n(t); the principle is shown in Fig. 1.

Fig. 1 Adaptive noise cancellation schematic (original input s(t)+n(t); reference noise input; adaptive filter output y(t); summing junction producing the error e(t))
The system has two inputs: the original input s(t)+n(t) and the reference input n′(t). The reference input n′(t) is adjusted by the adaptive filter to produce the output y(t). Here s(t) is uncorrelated with n(t) and with n′(t), while n(t) and n′(t) are correlated. The noise n(t) obtains the best rejection ratio when the filter output y(t) is closest, in the minimum mean-square error (MMSE) sense, to the main-channel noise; at the same time, the system output e(t) is closest to the signal s(t), so s(t) can be detected. The error signal is fed back to adjust the parameters of the adaptive filter. In this way, the adaptive noise cancellation method can extract the needed signal from the mixed signal with little or no a priori statistical knowledge of the noise.
3 The Principle of Harmonic Current Detection Based on BP Algorithm Neural Network Self-adaptive Noise Cancellation Technology
3.1 Harmonic Current Analysis of the Power System
In the power system, waveform distortion is generated by typical non-linear loads, especially non-linear rectifier loads, in which the harmonic proportion is small and mostly of odd order. In general, the amplitude of any odd harmonic is no more than 50 percent of the fundamental one. This value is theoretical; in practice the amplitude of the harmonics is usually smaller, and the higher the harmonic order the smaller the amplitude, so only the odd harmonics need be detected in actual measurement. When forming the training samples, mainly the odd harmonics are considered, which narrows the scope of the samples. Expanding the periodic non-sinusoidal current generated by a non-linear load in the power system as a Fourier series, we obtain:
$i(t) = I_1 \sin(\omega t + \varphi_1) + \sum_{n=2}^{\infty} I_n \sin(n\omega t + \varphi_n) = i_1(t) + \sum_{n=2}^{\infty} i_n(t)$   (1)

From Eq. (1), $i_1(t)$ is the fundamental current and $i_n(t)$ is the harmonic current of order n. Each can be divided into two parts, sine and cosine, which can be expressed as

$i_1(t) = I_1 \cos\varphi_1 \sin\omega t + I_1 \sin\varphi_1 \cos\omega t = i_{1p}(t) + i_{1q}(t)$   (2)

$i_n(t) = I_n \cos\varphi_n \sin(n\omega t) + I_n \sin\varphi_n \cos(n\omega t) = i_{ns}(t) + i_{nc}(t)$   (3)
$i_{1p}(t)$ and $i_{1q}(t)$ defined by Eq. (2) are, respectively, the fundamental active and reactive currents; $i_{ns}(t)$ and $i_{nc}(t)$ defined by Eq. (3) are, respectively, the active and reactive components of the n-th harmonic. To detect harmonics with adaptive noise cancellation, we take $i_L$ as the original input; the components $i_1 + i_2 + \dots + i_n$ to be cancelled play the role of the current "noise", while the remaining higher-order harmonic currents are the signal to be detected. We take $\sin\omega t$, $\cos\omega t$ and their 3rd, 5th, 7th, ... harmonic-frequency counterparts as reference inputs; each reference is correlated with the sine or cosine component of the corresponding order in the noise current, but not with the higher harmonics.
3.2 Harmonic Current Detection Based on ANN
The principle of harmonic current detection based on ANN is shown in Fig. 2. The method can detect odd, even, specific-order and total harmonics, as well as their phases, with accurate real-time dynamic detection.
Fig. 2 Harmonic current detection schematic (original input $i_L$; reference inputs $\sin\omega t$, $\cos\omega t$, ..., $\sin(n\omega t)$, $\cos(n\omega t)$ weighted by $w_{1s}$, $w_{1c}$, ..., $w_{ns}$, $w_{nc}$; summed filter output y(t); error output e(t))
From Fig. 2, we can see that nodes 1 and 2 correspond to the fundamental adaptive filter. When detecting the total harmonic current, we take $\sin\omega t$ and
$\cos\omega t$ as the reference inputs, so that the outputs $w_{1s}\sin\omega t$ and $w_{1c}\cos\omega t$ separately approach the related $i_{1p}(t)$ and $i_{1q}(t)$. After the ANN learning is over, the system output gives the total harmonic current. When detecting the odd harmonic currents, we take $\sin\omega t$, $\cos\omega t$, $\sin(2k+1)\omega t$ and $\cos(2k+1)\omega t$ ($3 \le 2k+1 \le n$, k a positive integer) as reference inputs; after the ANN learning is over, we obtain

$i_{2k+1}(t) = w_{(2k+1)s}\cdot\sin(2k+1)\omega t + w_{(2k+1)c}\cdot\cos(2k+1)\omega t$   (4)

and the corresponding odd-harmonic current is characterized by Eq. (4). When detecting a specific-order harmonic and its phase, we take $\sin\omega t$, $\cos\omega t$, $\sin k\omega t$ and $\cos k\omega t$ ($k \in [2, n]$, k a positive integer) as reference inputs; after the ANN learning is over, we have

$i_k(t) = w_{ks}\cdot\sin k\omega t + w_{kc}\cdot\cos k\omega t$   (5)

and the k-th order harmonic is derived from Eq. (5). The ANN output approaches the noise current through minimization of the mean square error; we take the output $i_h^*(t)$ of the detecting circuit as the error signal e(t) used to adjust the weights w, $e(t) = i_h^*(t) = i_h(t) + i(t) - i^*(t)$. Taking the mathematical expectation after squaring both sides of this formula, we derive:
$E[e^2(t)] = E[i_h^2(t)] + E\{[i(t) - i^*(t)]^2\}$   (6)

since $i_h(t)$ and $i(t) - i^*(t)$ are uncorrelated. When the neuron weights are adjusted so that $E[e^2(t)]$ is minimized, $E\{[i(t) - i^*(t)]^2\}$ is also minimized (the term $E[i_h^2(t)]$ does not depend on the weights), and so is $E\{[i_h^*(t) - i_h(t)]^2\}$. Ideally, after several iterations the weights w approach their optimal values and the output of the adaptive filter equals the weighted sine and cosine components of every order contained in the noise current of the reference input; each weight then corresponds to the peak value of the respective component. In order to achieve dynamic detection of the harmonic currents, the fundamental displacement factor $\cos\varphi = w_{1s}\big/\sqrt{w_{1s}^2 + w_{1c}^2}$ is derived from the weights $w_{1s}$ and $w_{1c}$. In order to meet the requirement of real-time dynamic detection, we should further improve the convergence rate while ensuring the stability of the ANN. Assume that $v(t) = w(t) - w_{opt}(t)$ is the distortion of the weight value, where $w(t)$ is the ideal weight value and $w_{opt}(t)$ is the best approximation value. Increasing the learning rate η improves the convergence rate, but v(t) also increases when the weights are adjusted, which can result in output distortion of the neural network. Taking the fundamental self-adaptive filter as an example, under ideal circumstances

$w = w_{opt}(t)$   (7)

$w_{1s}\cdot\sin\omega t = i_{1p}, \quad w_{1c}\cdot\cos\omega t = i_{1q}$   (8)
Fig. 3 Schematic of the second-stage fundamental self-adaptive filter (reference inputs $\sin\omega t$ and $\cos\omega t$; first-stage weights $w_{1s}$, $w_{1c}$; second-stage weights $w'_{1s}$, $w'_{1c}$; summing junction and error e(t))
If v(t) is not zero, then $w_{1s}\cdot\sin\omega t = i_{1p} + i_{dp}$ and $w_{1c}\cdot\cos\omega t = i_{1q} + i_{dq}$, where $i_{dp}$ and $i_{dq}$ are the distortion currents of the ANN fundamental sine and cosine components; $i_{dp}$ and $i_{dq}$ are uncorrelated with $i_{1p}$ and $i_{1q}$. A secondary ANN self-adaptive filter is therefore added, using the self-adaptive noise cancellation principle, so that the output current suffers less of the distortion caused by increasing η. The structure is shown in Fig. 3. By the same token, secondary ANN self-adaptive filters should be added to the remaining filters corresponding to the higher harmonics, so as to speed up convergence and improve the detection results.
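The following MATLAB sketch illustrates the detection principle of Fig. 2 for the simplest case, in which only the fundamental references are used and the error signal therefore returns the total harmonic current; the test current, learning rate and sampling step are assumed values chosen for illustration, not parameters from the paper.

f = 50; w = 2*pi*f; Ts = 1e-4; t = 0:Ts:0.2;
iL  = sin(w*t + 0.3) + 0.2*sin(3*w*t) + 0.1*sin(5*w*t);  % assumed distorted load current
eta = 0.05;                                              % learning rate (assumed)
w1s = 0;  w1c = 0;                                       % weights of the fundamental sine/cosine references
e   = zeros(size(t));
for k = 1:numel(t)
    y    = w1s*sin(w*t(k)) + w1c*cos(w*t(k));            % adaptive-filter output (estimated fundamental)
    e(k) = iL(k) - y;                                    % error = detected harmonic current
    w1s  = w1s + eta*e(k)*sin(w*t(k));                   % LMS-type weight adjustment by the error signal
    w1c  = w1c + eta*e(k)*cos(w*t(k));
end
cos_phi = w1s / sqrt(w1s^2 + w1c^2);                     % fundamental displacement factor of Sec. 3.2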
4 Simulation and Analysis
We take f = 50 Hz as the fundamental frequency of the input reference in the simulation. The current flowing through the load is a sine wave with a period of 0.02 s and an amplitude of 1 A.
Fig. 4 Untrained network output and the ideal output
Fig. 5 The process of change for error after training
Fig. 6 Training output
We use the function newff() to establish the BP network structure, with 10 hidden-layer neurons and one output neuron. The transfer functions of the hidden and output layer neurons are tansig and purelin respectively, and the network training algorithm is Levenberg-Marquardt. The untrained network output and the ideal output waveform are shown in Fig. 4. When the network has just been created by newff(), the weights of the neurons are initialized randomly, so the output is poor and there is a difference between the detected output and the ideal sine wave. The function train() is then applied, with the number of training epochs set to 50 and the training precision (goal) set to 0.01, the remaining parameters using their default values. The change of the error during training is shown in Fig. 5.
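A hedged reconstruction of the toolbox calls just described is given below; since the paper does not list the network input and target, the choice of time samples over one period as input and the ideal 50 Hz sine wave as target is an editorial assumption.

t   = 0:1e-4:0.02;                       % one fundamental period (0.02 s)
P   = t;  T = sin(2*pi*50*t);            % assumed input and ideal 1 A target wave
net = newff(minmax(P), [10 1], {'tansig','purelin'}, 'trainlm');
net.trainParam.epochs = 50;              % training cycles
net.trainParam.goal   = 0.01;            % training precision
net = train(net, P, T);
plot(t, T, t, sim(net, P), '--')         % ideal wave versus trained network output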
From Fig. 5 we can see that the network trains quickly and the required accuracy is achieved within one cycle of the iterative process. The ideal sine wave and the measured wave are shown in Fig. 6 (the solid line is the ideal wave). The simulation shows that, within one cycle of the fundamental frequency, the output of the two-stage ANN approaches its theoretical value until convergence; the amplitude and phase errors are then almost zero, and the selection of the initial weight values has little impact on convergence, so the adaptive capacity of the system is strong. The simulation thus confirms that the ANN adaptive harmonic measurement is accurate and has good real-time performance, as shown in Fig. 6.
5 Conclusion
According to the basic characteristics of adaptive noise cancellation technology and the single neuron, and combining them with dynamic harmonic current detection, we design a harmonic current dynamic detection system based on a two-level (secondary) adaptive filter. The structure of this system is simple, and the algorithm is easy to implement. Simulation studies have shown that this system can detect the harmonic currents of non-linear loads online with high precision, which confirms the validity of the proposed method.
Study on Dynamic Relation between Share Price Index and Housing Price: Co-integration Analysis and Application in Share Price Index Prediction Jin Peng*
Abstract. According to international experience, there is a dynamic relation between the share price index and the housing price. In this paper, aiming at recent facts in China, a new cointegration analysis is used to research this relation and further to predict the share price index. Firstly, the H-P filter technique is adopted to decompose the fluctuant components from the series of the share price index and the housing price. Secondly, the stationarity of the time series is verified, and a cointegration relation between the share price index and the housing price is found. The result of the Granger causality test shows that the fluctuation of the housing price has a remarkable influence on the share price index. Finally, on the basis of the above analysis, we construct the error correction model and apply it to predict the share price index. Keywords: Cointegration Analysis, H-P filter technique, Granger causality test, Error correction model, Share price index prediction.
Jin Peng, School of Economics and Management, Wuhan University, Wuhan, 430072, China
[email protected]

1 Introduction
The housing market and the share market are important carriers of risk. Experience in many countries shows that there is a close relation between the share price index and the housing price. For example, in the period from 1989 to 1990 the Nikkei index of Japan nose-dived from 38916 points to 20222 points, and in the same term the housing price slumped by 70%; an economic crisis followed. Thus, the study of the relation between the share price index and the housing price is of important significance to governments and investors. Recently, researchers have obtained many results on the relation between the share price index and the housing price. Quan and Titman (1999) researched this relation for 17 countries over 14 years, and found that there is a close correlation
between the share price index and the housing price, but that in some countries of Asia this correlation appears faint. Another study took the share price index and housing price in Taiwan from the third quarter of 1973 to the first quarter of 1992 as the research sample, applied a bivariate VAR model to analyze the correlation, and reached the conclusion that there is a Granger causality from housing price to share price index. There is much other related research on this topic. Based on the above analysis, in this paper, in order to reflect that the cointegration relation between the share price index and housing price time series is related to influencing variables, the H-P filter technique is used to eliminate the fluctuant components. First, the unit root test and the cointegration test are used to analyze the stationarity of the time series. Then, the Granger causality test is utilized to research the lead-lag relation between the share price index and the housing price. Finally, according to this relation, an error correction model is studied, yielding results useful for predicting the share price index.
2 Econometric Analysis of Research Data
2.1 Research Data and Process
In this study, two representative indices are selected: the share price index (Shanghai bourse) and the housing price. These time series cover the period from 1998 to 2007. Because the two indices are defined differently (the housing price index is a chain, period-on-period index, while the share price index is a fixed-base index), this paper standardizes the two indices as the same type of fixed-base price index. In order to eliminate the heteroscedasticity of the original data, a logarithmic transformation is used to preprocess the data of the share price index and the housing price. The evolving trends of the new time series, the fixed-base price indices preprocessed by the logarithmic transformation (abbreviated LNSP and LNHP), are shown in Figure 1.
Fig. 1 The evolving trend of LNSP and LNHP
According to Figure 1, the two time series follow a similar tendency, which suggests that there is a distinct dynamic relation between the share price index and the housing price.
2.2 Stationarity Test of Time Series Based on ADF Test
In econometrics, a cointegration relation is defined as a long-term stationary relation among time series. Macroeconomic series are usually non-stationary owing to their many influencing factors. Many empirical analyses indicate that the moments (mean value, variance, covariance, etc.) of such time series change with time, but a linear combination of the series is often stationary. This implies that, before using cointegration theory to process integrated time series, the macroeconomic series need to be examined using a unit-root test. The Augmented Dickey-Fuller (ADF) technique is at present the most effective tool for testing the stationarity of a time series. The regression equation of the ADF test is:

$\Delta Y_t = \beta_1 + \beta_2 T + \phi Y_{t-1} + \sum_{i=1}^{m} \alpha_i \Delta Y_{t-i} + \mu_t$   (1)

where $Y_t$ denotes the time series to be tested; $\beta_1$ denotes a constant; $T$ denotes the time trend; $m$ denotes the number of lags; $\mu_t$ denotes the random error. Fuller described the distribution of the test statistic. The series is a stationary I(0) series without a unit root if the t-statistic of $\phi$ is smaller than the critical value in the test on the undifferenced series; it is a non-stationary I(1) series with a unit root if the unit-root hypothesis cannot be rejected for the undifferenced series but is rejected once the first difference is taken.
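A minimal MATLAB sketch of this test regression, estimated by ordinary least squares, is given below; the series y (a column vector), the lag order m and all variable names are illustrative assumptions, and the resulting t-statistic must still be compared with the Dickey-Fuller critical values rather than the normal ones.

m  = 2;                                       % number of lagged differences (assumed)
n  = numel(y);
dy = diff(y);                                 % first differences
Y  = dy(m+1:n-1);                             % dependent variable: Delta y_t, t = m+2..n
X  = [ones(n-m-1,1), (m+2:n)', y(m+1:n-1)];   % constant, time trend, lagged level y_{t-1}
for i = 1:m
    X = [X, dy(m+1-i:n-1-i)];                 % lagged differences Delta y_{t-i}
end
b     = X \ Y;                                % OLS estimates
res   = Y - X*b;
s2    = (res'*res) / (numel(Y) - size(X,2));
se    = sqrt(s2 * diag(inv(X'*X)));
t_phi = b(3) / se(3);                         % ADF t-statistic on the lagged level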
2.3 Cointegration Test of LNSP and LNHP Time Series
If two or more non-stationary series are considered, a cointegration relation may exist between them. According to Johansen and Juselius's cointegration theory, their linear relation can be tested with the maximum likelihood estimator within a vector autoregression (VAR) framework. The approach is explained as follows:
$\left|\lambda S_{kk} - S_{k0} S_{00}^{-1} S_{0k}\right| = 0$   (2)
where $S_{00}$ is the residual matrix of the least squares regression of $\Delta x_t$ on $\Delta x_{t-1}, \Delta x_{t-2}, \ldots, \Delta x_{t-k+1}$; $\Delta x_t$ is the differenced sequence of $x_t$, and $\Delta x_{t-1}, \Delta x_{t-2}, \ldots, \Delta x_{t-k+1}$ denote its lag-1 to lag-(k-1) sequences; $S_{kk}$ denotes the residual matrix of the least squares regression of $\Delta x_{t-k}$ on $\Delta x_{t-1}, \Delta x_{t-2}, \ldots, \Delta x_{t-k+1}$; $S_{0k}$ and $S_{k0}$ are the cross-product matrices. The cointegrating vector $\beta$ can be calculated using the maximum eigenvalue algorithm:
$L_{\max} = -T \ln\left(1 - \lambda_{r+1}\right)$   (3)

where $T$ is the sample size and $\lambda_{r+1}$ is the calculated eigenvalue. Under the null hypothesis there are $r$ cointegrating vectors; under the alternative hypothesis there are $r+1$ cointegrating vectors. Due to many influencing factors, the time series may deviate from the balanced state in the short run, but they will return to balance under the push of economic forces. That is, these variables may be unstable in the short period but apt to move together evenly in the long run. Therefore, if the time series have a cointegration relation, there is a long-run balanced growth relation between the economic variables. That is the economic significance of cointegration.
2.4 Granger Causality Test and Error Correction Model
The basic idea of the Granger causality test is that if a change in X causes a change in Y, the change in X should precede the change in Y; the test can therefore distinguish the lead-lag relationship between time series. In general, the causality of cointegrated series is examined through an error correction model, which can be written as

\Delta y_t = \alpha_2 + \beta_2 E_{t-1} + \sum_i \alpha_{21}(i)\,\Delta y_{t-i} + \sum_i \alpha_{22}(i)\,\Delta x_{t-i} + \varepsilon_{2t} \qquad (4)

\Delta x_t = \alpha_1 + \beta_1 E_{t-1} + \sum_i \alpha_{11}(i)\,\Delta y_{t-i} + \sum_i \alpha_{12}(i)\,\Delta x_{t-i} + \varepsilon_{1t} \qquad (5)

where \alpha_{st}(i)\ (s,t = 1,2) are autoregressive parameters, \alpha_1, \alpha_2 are constants, \varepsilon_{1t}, \varepsilon_{2t} are uncorrelated disturbance terms, and E_{t-1} denotes the error correction term (ECT). The error correction model combines level values with difference values: the short-term change \Delta y_t of the variable y_t is determined jointly by the long-run trend (the cointegration relationship) and the short-term fluctuation \Delta x_t. In the short run, the deviation of the system from the balanced state determines the degree of fluctuation; in the long run, the cointegration relationship acts as an equalizer. According to the Granger representation theorem, if a cointegration relationship exists between economic variables it can be represented by an error correction model; that is, if two variables are cointegrated, there is Granger causality in at least one direction. In equation (4), if the coefficients on \Delta x are significant in an F-test, there is short-term causality from x to y; if the coefficient \beta_2 is significant in a t-test, there is long-term causality from x to y; and if the two are jointly significant in an F-test, there is strong causality.
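The short-run part of this test (whether lagged \Delta x helps explain \Delta y) can be sketched as follows. This is an illustration under assumed synthetic data, not the paper's estimation:

```python
# Hedged sketch of the short-run Granger-causality check behind Eqs. (4)-(5).
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(2)
d_hp = rng.normal(size=300)
d_sp = 0.4 * np.roll(d_hp, 1) + rng.normal(size=300)   # synthetic: ΔSP driven by lagged ΔHP
d_sp[0] = rng.normal()

# Column order: [effect, cause]; H0 is "ΔHP does not Granger-cause ΔSP".
results = grangercausalitytests(np.column_stack([d_sp, d_hp]), maxlag=4)
f_stat, p_value, _, _ = results[1][0]["ssr_ftest"]
print(f"lag 1: F = {f_stat:.2f}, p = {p_value:.4f}")
```

A significant F statistic here corresponds to the short-term causality reported later in Table 3; the t-test on the ECT coefficient would be read from the fitted error correction regression itself.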
2.5 H-P Filter Technique
The H-P filter is a technique for decomposing a time series in state space. The purpose of using the H-P filter in this study is to separate the fluctuating (cyclical) components from the time series of the share price index and the housing price. For a time series y_t\ (t = 1, 2, \dots, T), the H-P filter selects the trend component T_{y_t} that minimizes

\min_{T_{y_t}} \sum_{t=1}^{T}\left[\left(y_t - T_{y_t}\right)^2 + \varphi\left(\Delta^2 T_{y_t}\right)^2\right] \qquad (6)
where T_{y_t} is the trend component and the smoothing coefficient \varphi weights the trend component against the fluctuating component, satisfying

\varphi = \sigma_0^{2} / \sigma_1^{2} \qquad (7)

where \sigma_0^{2} and \sigma_1^{2} denote the variances of the trend component and the fluctuating component respectively. The usual rule for choosing \varphi is \varphi = 100 for annual data, \varphi = 1600 for quarterly data, and \varphi = 14400 for monthly data. After the data are processed with the H-P filter, the fluctuating component is

C_{y_t} = y_t - T_{y_t}, \quad t = 1, 2, \dots, T \qquad (8)
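As a small, hedged illustration of Eqs. (6)-(8), the decomposition can be reproduced with the statsmodels H-P filter on a synthetic annual series (the paper's own data are not reproduced here):

```python
# Hedged sketch of the H-P decomposition; `lnsp` is a hypothetical annual log series,
# so the smoothing weight 100 for annual data is used.
import numpy as np
from statsmodels.tsa.filters.hp_filter import hpfilter

rng = np.random.default_rng(3)
lnsp = np.cumsum(rng.normal(0.05, 0.1, 10))      # synthetic annual LNSP, 1998-2007

cycle, trend = hpfilter(lnsp, lamb=100)          # cycle = C_yt, trend = T_yt in Eq. (8)
print(np.round(trend, 3))
print(np.round(cycle, 3))
```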
3 Cointegration Analysis of the Share Price Index and the Housing Price
Conventional econometric regression requires stationary time series; otherwise it may produce spurious regression. Figure 1 suggests that neither series meets this requirement. We therefore use cointegration theory and the error correction model, methods developed for non-stationary data over roughly the last decade, to carry out the analysis.
3.1 Unit Root Test of the Time Series
The stationarity of the share price index and the housing price is examined with the ADF method; the outcome is shown in Table 1. At the 5% level of significance, the logarithm of the share price index (LNSP) fails to reject the unit root in both the ADF and the PP test, while the logarithm of the housing price (LNHP) fails to reject the unit root in the PP test. These results indicate that the LNSP and LNHP series are non-stationary and need to be differenced to achieve stationarity. The first-order differences of LNSP and LNHP are then tested with the PP test and are stationary at the 5% level of significance. Thus the cointegration test can be performed on the LNSP and LNHP series.

Table 1 Results of the ADF and PP unit root tests

Variable             ADF test (level)   ADF test (first diff.)   PP test (level)   PP test (first diff.)
LSI                  -0.56              -3.59                    -0.35             -2.97
LHP                  —                  —                        1.08              -3.09
5% critical value    -2.9903            -2.981

Table 2 Cointegration estimation results between the share price index and housing price series

SI~HP   Cointegration number   Eigenvalue   Trace statistic   5% critical value   1% critical value
        m ≤ 0 (**)             0.663094     29.98974          12.52               16.29
        m ≤ 1 (*)              0.240512     6.054327          3.83                6.48
Standard equation: LSI = 0.876*LHP
Note: m represents the number of cointegration equations; the trace statistic indicates a cointegration relation. Asterisks denote significance at: ** 5% level, * 1% level.
3.2 Cointegration Test of the LNSP and LNHP Time Series
The above analysis shows that LNSP and LNHP both become stationary after first-order differencing, i.e. they are I(1) variables, so a cointegration relationship between them is possible. The cointegration test is performed on LNSP and LNHP with the Johansen method in Eviews 3.1; the results are shown in Table 2. For cointegration number m ≤ 0, the trace statistic is greater than the 1% critical value, so the null hypothesis is rejected at the 1% significance level; for m ≤ 1, the trace statistic is smaller than the 5% critical value, so the null hypothesis is accepted. Thus LNSP and LNHP are cointegrated, i.e. they maintain a long-term equilibrium relation.
3.3 Error Correction Model and Granger Causality Test
To analyze the Granger causality between the LSI and LHP series, this study constructs an error correction model based on equations (4) and (5); the test results are shown in Table 3, where ΔSP and ΔHP denote the differenced LNSP and LNHP series respectively. According to the AIC information criterion, the lag order of the autoregressive vectors is set to 4.

Table 3 Estimation results of the error correction model (Granger causality)

Equation   F statistic (short-term)    t statistic of ECT   F statistic (joint hypothesis)
           ΔSP        ΔHP                                    ΔSP and ECT   ΔHP and ECT
ΔSP        —          2.75*            1.763**              —             2.91*
ΔHP        0.66       —                -0.3496              0.75          —

Note: Asterisks denote significance at: ** 5% level, * 10% level.
The results in Table 3 indicate a short-term unidirectional causality from the housing price to the share price index; that is, in the short run, a rise in the housing price drives movements of the share price index. Moreover, the t statistic of the ECT is significant in the share price index equation, meaning that the deviation of the share price index and the housing price from their long-term equilibrium has a marked effect on the short-term movement of the share price index. In conclusion, from 1998 to 2007 there is a cointegration relation between the share price index and the housing price, with strong unidirectional Granger causality from the housing price to the share price index.
4 Experiment and Analysis
4.1 Application of the H-P Filter Technique
Using the H-P filter, this study separates the trend components and the fluctuating components of the LNSP and LNHP series. The results are shown in Figure 2 and Figure 3, where HPSP and HPHP denote the trend components of LNSP and LNHP respectively, and CSP and CHP denote their cyclical components. Figures 2 and 3 show that, after the fluctuating components are filtered out, the LNSP and LNHP series have a similar changing trend.
Fig. 2 Trend components of the time series processed by the H-P filter
Fig. 3 Fluctuating components of the time series processed by the H-P filter
4.2 Cointegration Test of the Trend and Fluctuation Components
The trend and fluctuation components of LNSP and LNHP are inspected with the unit-root test; the results indicate that they are stationary sequences, so the cointegration relation between the two variables can be analyzed. The test results are shown in Table 4 and Table 5. According to these results, there is still a cointegration relation between the trend components of LNSP and LNHP, with a long-term elastic coefficient of 0.913, noticeably larger than the elastic coefficient of 0.876 for the unfiltered series. The cointegration equation also reveals a cointegration relation between the fluctuating components of LNSP and LNHP, but one with an opposite, mutually restraining sign. This short-term fluctuation is built into the long-term equilibrium and automatically lowers the actual elastic coefficient between the share price index and the housing price. In brief, the test results show that the share price index and the housing price have similar evolution features.
4.3 Error Correction Model
Based on the analysis above, the estimated equation relating the share price index and the housing price is

\Delta SP = 0.3386\,SP(-1) + 0.7264\,Z + 0.4925\,u - 0.1358\,T
(t = -6.38;\ t = 12.67;\ t = 8.08;\ t = -7.124)
Z = 0.2113\,\Delta HP + 0.3895\,\Delta SP(-1) + 0.2254\,\Delta HP(-1), \qquad R^2 = 0.942547

Table 4 Johansen cointegration estimation between the trend components of the LNSP and LNHP series

HPSP~HPHP   Cointegration number   Eigenvalue   Trace statistic   5% critical value   1% critical value
            m ≤ 0 (*)              0.48257      22.49603          12.52               16.29
            m ≤ 1 (*)              0.30459      7.991534          3.83                6.52
Standard equation: HPHP = 0.913*HPSI
Note: m represents the number of cointegration equations; the trace statistic indicates a cointegration relation. Asterisks denote significance at the 1% level.

Table 5 Johansen cointegration estimation between the fluctuating components of the LNSP and LNHP series

CSP~CHP   Cointegration number   Eigenvalue   Trace statistic   5% critical value   1% critical value
          m ≤ 0 (*)              0.566872     28.76455          19.94               24.57
          m ≤ 1                  0.305196     8.742658          9.21                12.95
Standard equation: CHP = -1.475*CSI
Note: m represents the number of cointegration equations; the trace statistic indicates a cointegration relation. Asterisks denote significance at the 5% level.
Table 6 Prediction results for the share price index

Year   Actual value   Cof        ErC(%)    RDP-5      ErR(%)
2001   1573.742       1384.736   -0.1201   1897.618   0.2058
2002   1661.846       1543.19    -0.0714   1500.148   -0.0973
2003   1557.653       1820.585   0.1688    2321.838   0.4906
2004   1354.171       1554.859   0.1482    1590.068   0.1742

Note: Cof and ErC denote the predicted value and error rate of the cointegration model respectively; RDP-5 and ErR denote the predicted value and error rate of the RDP-5 model respectively.
Here Z is the cointegration vector, ΔSP denotes the growth rate of the share price in year t, ΔSP(-1) is its one-year-lagged value, u is the random error, and T is the year index of the time series.
4.4 Prediction of the Share Price Index
In this paper, the share price index (Shanghai bourse) and the housing price over 1998 to 2007 are used to construct the dynamic relation, and the share price index is then predicted from this relation. The Shanghai stock index over 2001 to 2004 is selected as the test sample; the prediction results are shown in Table 6. Table 6 shows that the prediction accuracy of the cointegration model is higher than that of RDP-5: for the test sample, the error rates of the cointegration model are 0.1201, 0.0714, 0.1688 and 0.1482 in the four years respectively. These results imply that, after the fluctuating component is eliminated, it is feasible to use the cointegration relation for prediction.
5 Conclusion and Future Work
This paper employs the H-P filter to eliminate the fluctuating component, analyzes the dynamic relation between the share price index and the housing price, and predicts the share price index from this relation. The findings are as follows: (1) there is cointegration between the share price index and the housing price, and changes in the housing price have a marked effect on the share price index; (2) after the fluctuating component is eliminated, the cointegration relation can be used to predict the share price index, and the results show that the proposed hybrid approach is effective.
References
1. Chen, N.K.: Asset Price Fluctuations in Taiwan: Evidence from Stock and Real Estate Prices 1973 to 1992. Journal of Asian Economics 12, 215–232 (2001)
2. Engle, R.F., Granger, C.W.J.: Cointegration and Error Correction: Representation, Estimation and Testing. Econometrica 55, 251–276 (1987)
3. Johansen, S., Juselius, K.: Maximum Likelihood Estimation and Inference on Cointegration with Applications to the Demand for Money. Oxford Bulletin of Economics and Statistics 52, 169–210 (1990)
4. Basdevant, O.: On Application of State-space Modeling in Macroeconomics. Discussion Paper Series, Reserve Bank of New Zealand (2003)
Application of RBF and Elman Neural Networks on Condition Prediction in CBM Chao Liu, Dongxiang Jiang, and Minghao Zhao
Abstract. Maintenance strategies are developing quickly under the requirement of near-zero-downtime running of equipment. Condition Based Maintenance (CBM), which makes the maintenance strategy by monitoring the equipment's condition and correcting abnormalities before failure, is attracting increasing attention. However, equipment running processes differ greatly, and the parameters that can signal fault onsets also differ. This paper attempts to find a uniform rule for condition prediction. Artificial neural networks play an increasingly important role in time series prediction, since they can achieve the desired output without an exact mathematical model. This paper presents the application of neural networks to condition prediction. For different concerns of condition prediction, an RBF neural network and an Elman neural network are selected, and both achieve good accuracy.
Keywords: Condition based maintenance, Condition prediction, Fault distance, RBF neural network, Elman neural network.
Chao Liu, Dongxiang Jiang, Minghao Zhao: Department of Thermal Engineering, Tsinghua University, Beijing 100084, China

1 Introduction
More and more complex systems are used in industry, which requires every component to work well in the running process. Considering the damage that a tool failure can cause to a machine tool and its peripheral components, it becomes increasingly important in contemporary manufacturing to predict and prevent machine failures, instead of allowing the machine to fail and then reacting to the failure. The need to achieve near-zero-downtime performance has been driving the shift from the traditional "fail and fix" (FAF) practice to the "predict and prevent" (PAP) paradigm [1-3]. As the need
to develop and demonstrate technologies that can monitor and predict the remaining service life of key elements of a country's civil infrastructure grows, monitoring the equipment's condition and correcting abnormalities before failure has attracted more and more research interest, and predictive maintenance has developed quickly in recent years. Predictive maintenance is based on the anticipated future condition of equipment: its remaining time before failure (or before reaching an unacceptable level of performance), the rate of degradation, and the nature of the failure if it were to occur. Thus Condition Based Maintenance (CBM), one significant method of predictive maintenance, is considered in this paper; its basic concept is to make the maintenance strategy by observing the condition of the equipment [4]. The condition of the equipment is quantified by specific, continuously monitored parameters that can signal the equipment's condition.
2 Condition Prediction for a Gas Engine
The maintenance strategy of CBM is based on the equipment's condition, which is often signified by several parameters. Consider a parameter as depicted in Fig. 1: the equipment degenerates as its running time increases. At first its condition is normal; as time passes, the parameter becomes higher and off-normal, and when its value reaches some limit it reflects a fault. Usually, the signs of a fault are not captured by one parameter, and faults with similar degradation mechanisms are difficult to distinguish by one parameter alone. Moreover, the parameters that can signify the condition of equipment differ with the equipment's type, structure, service lifetime, service history and so on. In this paper, a uniform rule combining the characters of different parameters is chosen to reflect the equipment's condition. Different parameters have similar or opposite degradation mechanisms, in that parameters depart from their normal levels (higher or lower) when a fault appears (Fig. 1), so a uniform rule for fault diagnosis can
Fig. 1 Parameter degradation mechanism
be established through proper analysis. Careful work is needed to select parameters that are sensitive to faults and can distinguish faults well. In this paper, three parameters are chosen to signify the gas engine's faults and are shown to satisfy these requirements; further analysis is presented in Section 4.
3 Principles of Artificial Neural Networks
Artificial Neural Networks (ANNs) are systems inspired by biological neural networks. They are suitable for processing data for which the relationship between a cause and its result cannot be exactly defined [5]. The ability of ANNs to learn complex nonlinear input-output relationships has motivated researchers to apply them to model-free nonlinear problems in various fields [6]. In this paper, an RBF neural network and an Elman neural network are selected for condition prediction.
3.1 RBF Neural Network
The radial basis function neural network (RBFNN) has a feedforward structure consisting of three layers: an input layer, a nonlinear hidden layer and a linear output layer, as shown in Fig. 2. The hidden nodes are radial basis function units and the output nodes are simple summations. The numbers of input, output and hidden nodes are n_l, n_o and n_h respectively. This architecture of the RBFNN has been shown to improve the training and performance of the network [7]. The response of the j-th hidden neuron to the input x_k can be expressed as [7]

\phi_j(x_k) = \exp\!\left(-\frac{1}{2\sigma_j^2}\,\lVert x_k - \mu_j \rVert^2\right), \quad 1 \le j \le n_h \qquad (1)

Fig. 2 Architecture of RBF Neural Network
Fig. 3 Architecture of Elman Neural Network
where \mu_j is the centre of the j-th hidden neuron, \sigma_j is the spread of the Gaussian function and \lVert\cdot\rVert denotes the Euclidean norm. The output of each node in the output layer is defined by [7]

f_i(X) = \sum_{j=1}^{n_h} \phi_j\!\left(\lVert X - \mu_j \rVert\right)\,\omega_{ji}, \quad 1 \le i \le n_o \qquad (2)

where X is the input vector and \omega_{ji} represents the weight from the j-th hidden node to the i-th output node. RBF networks have been used in signal processing, system identification and control applications. It has been shown that networks using RBFs as their transfer function give better performance in terms of accuracy and speed than networks based on the MLP concept [5].
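A minimal numpy sketch of the forward pass in Eqs. (1)-(2) is given below; the centres, spreads and weights are random placeholders rather than values from the paper:

```python
# Hedged sketch of an RBF network forward pass (Eqs. (1)-(2)); all parameters are synthetic.
import numpy as np

def rbf_forward(x, centres, sigmas, weights):
    # Hidden layer: Gaussian response of each hidden node, Eq. (1).
    phi = np.exp(-np.sum((x - centres) ** 2, axis=1) / (2.0 * sigmas ** 2))
    # Output layer: weighted sum of hidden responses, Eq. (2).
    return phi @ weights

rng = np.random.default_rng(0)
centres = rng.normal(size=(8, 10))     # n_h = 8 hidden nodes, n_l = 10 inputs
sigmas = np.full(8, 1.0)
weights = rng.normal(size=(8, 1))      # n_o = 1 output node
print(rbf_forward(rng.normal(size=10), centres, sigmas, weights))
```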
3.2 Elman Neural Network
The Elman neural network is a globally feedforward, locally recurrent network model proposed by Elman [8, 9]. Following the general principle of recurrent networks, there is feedback from the outputs of some neurons in the hidden or output layer to neurons in the context layer, which acts as an additional input layer. In Elman's network these feedback connections run from the outputs of the hidden-layer neurons to the context-layer units, called context nodes [10]. This part of the input layer, the context layer, stores the internal states of Elman's net. The architecture of the Elman neural network is shown in Fig. 3. Using the weight matrices, the outputs of the hidden-layer neurons at the s-th iteration can be computed as [10]

y_j^{(s)} = f\!\left(\sum_{i=1}^{n} w1_{ij}\,x_i^{(s)} + \sum_{j=1}^{m} w3_{ij}\,y_j^{(s-1)}\right) + t_j \qquad (3)

and the output layer at the s-th iteration can be computed as [10]

z_k^{(s)} = f\!\left(\sum_{j=1}^{m} w2_{jk}\,y_j^{(s-1)}\right) + t_j \qquad (4)

where W1 is the weight matrix between the input layer and the hidden layer, W3 the weight matrix between the context layer and the hidden layer, W2 the weight matrix between the hidden layer and the output layer, X the input vector, and i, j, k index the input, hidden and output nodes respectively. The outputs of the Elman network depend not only on the current input data but also on historical inputs because of the context layer; that is, the output is a function of the previous activation states as well as the current inputs, which is useful in time series prediction.
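A minimal numpy sketch of one Elman recurrent step following Eqs. (3)-(4) is shown below; the sizes and weights are arbitrary placeholders, and the logistic function stands in for the activation f:

```python
# Hedged sketch of one Elman step: the context layer carries the previous hidden state.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elman_step(x, context, W1, W2, W3):
    # Hidden layer sees the current input and the previous hidden state (context layer), Eq. (3).
    hidden = sigmoid(x @ W1 + context @ W3)
    output = sigmoid(hidden @ W2)                # output layer, Eq. (4)
    return output, hidden                        # the new hidden state becomes the next context

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 10, 6, 1
W1, W3, W2 = rng.normal(size=(n_in, n_hid)), rng.normal(size=(n_hid, n_hid)), rng.normal(size=(n_hid, n_out))
context = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):             # feed a short hypothetical sequence
    y, context = elman_step(x, context, W1, W2, W3)
print(y)
```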
4 Applications of Neural Networks to Condition Prediction of a Gas Engine
4.1 Data Preprocessing
As mentioned in Section 2, a uniform rule signifying the equipment's condition is desired in the condition monitoring process, since it simplifies the decision process of the maintenance strategy. Several preprocessing steps are applied to the practical data presented in Fig. 4 and Fig. 5. First, the normal levels are detected from the early data, before the parameters depart from normal to abnormal in the running process, and the parameter abnormalities are calculated online against the normal level:

p_{ik} = \frac{x_{ik} - \bar{x}_i}{x_{ik}}, \quad i = 1, \dots, N \qquad (5)

where x_{ik} is the current value of the selected parameter x_i, \bar{x}_i is its normal level, and p_{ik} is the abnormality signifying the degradation condition.
Fig. 4 The original temperature data
Fig. 5 The original pressure data
Fig. 6 The distance function
Second, a fault distance is computed to obtain a uniform rule for condition detection:

d_k = \left(\sum_{i=1}^{N} p_{ik}^{2}\right)^{1/2}, \quad k = 1, \dots, m t_0 \qquad (6)

where d_k is the distance function, t_0 is the sampling time and m is the number of sampling steps. The fault distance signifies the equipment's condition and reflects the parameters' impacts on faults. After these steps, the fault distance shown in Fig. 6 is obtained.
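A hedged numpy sketch of the preprocessing in Eqs. (5)-(6) follows; the "normal levels" and the drifting samples are synthetic placeholders, not the gas-engine data of the paper:

```python
# Per-parameter abnormality relative to a normal level, then the Euclidean fault distance.
import numpy as np

def fault_distance(samples, normal_level):
    # samples: (time steps, N parameters); normal_level: (N,) values from the healthy period.
    p = (samples - normal_level) / samples        # abnormality p_ik, Eq. (5)
    return np.sqrt(np.sum(p ** 2, axis=1))        # distance d_k, Eq. (6)

rng = np.random.default_rng(0)
normal = np.array([450.0, 1.2, 0.8])              # hypothetical temperature/pressure levels
drift = np.linspace(0, 0.2, 50)[:, None] * normal # slow degradation away from normal
data = normal + drift + rng.normal(0, 0.01, (50, 3)) * normal
print(np.round(fault_distance(data, normal), 3))
```

The resulting d_k grows as the parameters drift away from their normal levels, which is the behaviour sketched in Fig. 6.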
4.2 Application of the RBF and Elman Neural Networks to Condition Prediction
The basics of the RBF and Elman neural networks for prediction were described in Section 3. In this paper, the fault distance defined above is used for condition prediction. To evaluate the prediction, an error function is introduced:

e_k = \frac{d_k^{pred} - d_k}{\mathrm{peak}(d)}, \quad k = 1, \dots, m t_0 \qquad (7)

where d_k^{pred} is the prediction at the k-th time step, d_k is the actual value at the k-th time step, and \mathrm{peak}(d) is the fault level in the running operation. For both the RBF and the Elman network, the input contains the ten samples before the current condition and the output contains one node that predicts the next time step. Training is performed online, with the real-time data used to update the network.
Condition Prediction Using the RBF Network. Prediction results are obtained after training the RBF NN (Fig. 7), and the relative error is calculated with the error function defined above (Fig. 7). Generally, the prediction follows the practical data well and has a good time response. However, the prediction departs from the
Fig. 7 Prediction result and relative error using RBF NN
practical data noticeably when the practical data vary rapidly. The maximum relative error is 35 percent and the average error level is about 2 percent.
Condition Prediction Using the Elman Neural Network. The prediction results are obtained after training the Elman neural network online (Fig. 8). The Elman prediction shows no sudden jumps away from the practical data, but it exhibits a time delay when the condition changes rapidly. In general, the prediction follows the practical data well; the maximum relative error is 32 percent and the average error level is about 2 percent.
4.3 Comparison
Of the two networks, the RBF NN performs well in time response, but larger departures occur in its predictions when the practical data change rapidly. The Elman NN does not have as good a time response, but its predictions are smoother than those of the RBF NN, because the context layer makes its output depend not only on the current input but also on historical inputs; on the other hand, training the Elman NN takes more time. As for the relative error, the maximum errors are almost the same, while the RBF NN has more discontinuous points and a slightly higher error; the average error levels are also comparable. From the definition of the distance function, once the distance exceeds the alert level an alarm is issued to the maintenance strategy, and the prediction gives the fault onset in advance, which can prevent failure. The RBF NN prediction has more discontinuous points that depart from the practical data; if such a departure exceeds the alert level, the prediction will trigger a wrong decision, whereas a similar mistake is unlikely with the Elman NN. The drawback of the Elman NN is its delay in time response, which also weakens the condition prediction. However, these situations do not appear in this work: the predictions follow the practical data well and no false decision is issued.
Fig. 8 Prediction results and relative error using Elman NN
If the maintenance strategy places stronger emphasis on equipment reliability, the RBF NN is suitable for condition prediction. If the maintenance strategy pays more attention to continuous running and wants to avoid unnecessary shutdowns caused by false decisions, condition prediction with the Elman NN is more suitable.
5 Conclusion
In this paper, condition prediction methods based on neural networks are presented. From the condition monitoring concept of CBM, condition prediction can give better support to the maintenance strategy. Usually the condition is signified by several crucial parameters, which vary between different pieces of equipment, so prediction methods put forward in one field are hard to apply in another. This paper presents the fault distance, which signifies how far the equipment's condition departs from the normal level; it unifies the relevant parameters into one quantity for condition detection and therefore adapts better to different kinds of applications. RBF and Elman neural networks are then applied to condition prediction. The RBF NN achieves a good time response, but its maximum relative error is higher and more discontinuous points appear in the results, which increases the possibility of a wrong maintenance decision. The Elman NN performs well in the smoothness of the results and decreases the possibility of a wrong maintenance decision. The two methods suit the different emphases of maintenance strategy discussed in Section 4: prediction with the RBF NN is suitable when preventing any fault onset and equipment reliability rank first, while prediction with the Elman NN is suitable for maximizing the continuous running time without failure. Once the condition of the equipment is confirmed, the maintenance strategy can be determined. Furthermore, fault prognosis can be introduced using neural networks or other methods.
Acknowledgements. This work is supported by the National Basic Research (973) Program of China (No. 2007CB210304).
References 1. Gang, Y., Hai, Q., Dragan, D., Jay, L.: Feature Signature Prediction of a Boring Process Using Neural Network Modeling with Confidence Bounds. J. Adv. Manuf. Technol. 30, 614–615 (2006) 2. Rangangath, K., Samuel, H.H., Verduin, W.H.: System Health Monitoring and Prognostics - a Review of Current Paradigms and Practices. J. Adv. Manuf. Technol. 28, 1012–1017 (2006)
3. Srinivas, K., Michael, R.B.: Methods for Fault Detection, Diagnosis, and Prognostics for Building Systems - A Review, Part I. HVAC&R Research 11, 3–6 (2005)
4. Tse, P.W., Atherton, D.P.: Prediction of Machine Deterioration Using Vibration Based Fault Trends and Recurrent Neural Networks. J. Vibration and Acoustics 121, 355–361 (1999)
5. Li, R.H., Meng, G.X., Gao, N.K., Xie, H.K.: Combined Use of Partial Least-squares Regression and Neural Network for Residual Life Estimation of Large Generator Stator Insulation. J. Meas. Sci. Technol. 18, 2074–2075 (2007)
6. Mahanty, R.N., Dutta Gupta, P.B.: Application of RBF Neural Network to Fault Classification and Location in Transmission Lines. IEE Proc. Transm. Distrib. 151, 201–204 (2004)
7. Song, Y.H., Xuan, Q.Y., Johns, A.T.: Protection Scheme for EHV Transmission Systems with Thyristor Controlled Series Compensation Using Radial Basis Function Neural Networks. Electr. Mach. Power Syst. 25, 553–565 (1997)
8. Gao, X.Z., Ovaska, S.J.: Genetic Algorithm Training of Elman Neural Network in Motor Fault Detection. Neural Comput. & Applic. 11, 37–39 (2002)
9. Elman, J.: Finding Structure in Time. Cognitive Science 14, 179–211 (1990)
10. Seker, S., Ayaz, E., Turkcan, E.: Elman's Recurrent Neural Network Applications to Condition Monitoring in Nuclear Power Plant and Rotating Machinery. Engineering Applications of Artificial Intelligence 16, 647–656 (2003)
Judging the States of Blast Furnace by ART2 Neural Network Zhiling Lin, Youjun Yue, Hui Zhao, and Hongru Li*
Abstract. An improved unsupervised ART2 neural network is proposed to judge the pattern of blast furnace states. In this method six variables, viz. charging speed, air flow, air temperature, air pressure, permeability index and the Si content of the liquid iron, are chosen to express the blast furnace states in the smelting process. The values of these variables are obtained from sliding windows in order to overcome their time-varying character. The pattern of blast furnace states is classified by ART2's competitive learning and self-stabilizing mechanism. Simulation shows that the method is effective.
Zhiling Lin, Youjun Yue, Hui Zhao: School of Electrical Engineering and Automation, Tianjin University of Technology, Tianjin 300384, China; Tianjin Key Laboratory for Control Theory & Applications in Complicated Systems, Tianjin 300384, China
Hongru Li: College of Information Science and Engineering, Northeast University, Shenyang 110004, China

1 Introduction
A blast furnace (BF) smelting process involves gas, liquid and solid phases. It is complex because of the transfer of mass, heat and momentum and the large number of chemical reactions, so it is difficult to master the inner changing rules of the BF. Operators usually judge the BF states from experience backed by only a few simple theories, which means that different operators bring different levels of operation; this often leads to abnormal changes of the BF states and even operation exceptions [1]. With the development of technology, expert systems [2,3,4] and neural networks [5] have been applied to BF control. An expert system, adopted at the Baoshan and Anshan steelworks, determines the states of the BF by empirical rules. It is effective in guiding BF operation, but it has the disadvantages of difficult knowledge acquisition and poor self-adaptation, and some abnormal BF states
cannot be deduced from a few simple rules. A neural network succeeds in building the BF model because it has a strong ability to map input spaces to output spaces, but the training samples must be selected and abnormal data removed, and abnormal data usually mean changes of BF states. So a neural network can give a good prediction for steady BF states but cannot give a correct judgment for abnormal BF states. An unsupervised ART2 neural network is therefore proposed to judge the BF states from several important influencing factors.
2 ART2 Neural Network and Improvement
2.1 ART2 Neural Network
The ART2 network is an unsupervised neural network that can process both continuous-valued and binary-valued vectors [6]. Detailed derivations of the ART2 technique can be found in Carpenter and Grossberg [7] and Chen et al. [8]; they are summarized here for the sake of clarity and completeness. A typical ART2 network is made up of two sub-systems, the attentional sub-system and the orienting sub-system; its architecture is shown in Fig. 1.
Fig. 1 The typical architecture of ART2
In the attentional sub-system, an input pattern s is first presented to the F1 layer which consists of six kinds of unit, viz. W, X, U, V, P and Q cells. It then undergoes a process of activation, including normalization, noise suppression and updating. This results in an output pattern p from the F1 layer. Responding to this output pattern, an activation is produced across F2 layer through bottom-up weights bij. As the F2 layer is a competitive layer with a winner-take-all mode, only one stored pattern is a winner. It also represents the best matching pattern with the input pattern from the F1 layer. Furthermore, the pattern of activation on the F2 layer brings about an output pattern that is sent back to the F1 layer via top-down weights tji.
In the orienting sub-system, a reset mechanism R and a vigilance parameter ρ are used to check the similarity between the output pattern from the F2 layer and the original input pattern from the F1 layer. If both patterns are concordant, the network enters a resonant state and the relevant stored pattern is selected. Otherwise, the network assigns an uncommitted (inhibitory) node on the F2 layer to this input pattern and then learns it and transforms it into a new stored pattern. The training algorithm can be described by the following steps.
Step 1. Initialize the parameters a, b, c, d, e, θ, α and ρ, where a, b are the fixed weights in the F1 layer, c is the fixed weight used in testing for reset, d is the activation of the winning F2 unit, e is a small parameter that prevents division by zero when the norm of a vector is zero, θ is the noise suppression parameter, α is the learning rate, and ρ is the vigilance parameter. An input vector s is randomly selected; then proceed to Step 2.
Step 2. Update the F1 unit activations using Eqs. (1)-(7). Initially let u_i = 0, p_i = 0 and q_i = 0 (where i = 1, 2, \dots, n indexes the input units); thereafter, update the F1 unit activations again:

U_i = \frac{V_i}{\lVert V \rVert + e} \qquad (1)
W_i = S_i + a U_i \qquad (2)
P_i = U_i + d\, t_{Ji} \qquad (3)
X_i = \frac{W_i}{\lVert W \rVert + e} \qquad (4)
Q_i = \frac{P_i}{\lVert P \rVert + e} \qquad (5)
V_i = f(X_i) + b\, f(Q_i) \qquad (6)

where the activation function is

f(x) = \begin{cases} x & x \ge \theta \\ 0 & x < \theta \end{cases} \qquad (7)
Step 3. Compute the signals to the F2 units and find the unit Y_J with the largest signal (assuming y_J \ge y_j for j = 1, 2, \dots, m, where m is the number of output patterns):

Y_J = \max\Big\{\,Y_j \;\Big|\; Y_j = \sum_{i} b_{ij}\, p_i,\ j = 1, \dots, m \Big\} \qquad (8)
Step 4. Check for reset by updating u_i according to Eq. (1) and using Eqs. (9) and (10):

P_i = U_i + d\, t_{Ji} \qquad (9)
R_i = \frac{U_i + c P_i}{\lVert U \rVert + c \lVert P \rVert + e} \qquad (10)

If \lVert R \rVert \ge \rho - e, update the other F1 units according to Eqs. (2), (4), (5) and (6) and continue with the following steps. Otherwise, return to Step 3 to find the second largest signal and check again. If no pattern concords, an uncommitted node on the F2 layer is learnt and transformed into a new stored pattern.
Step 5. Update the weights of the winning unit J for a number of iterations until the weight changes fall below some specified tolerance, and update the F1 activations according to Eqs. (1), (2), (4), (5), (6) and (9):

t_J(k+1) = t_J(k) + \alpha\, d(1-d)\left[\frac{U_J(k)}{1-d} - t_J(k)\right] \qquad (11)
b_J(k+1) = b_J(k) + \alpha\, d(1-d)\left[\frac{U_J(k)}{1-d} - b_J(k)\right] \qquad (12)
According to formulas (11) and (12), after the system becomes stable t_J(k+1) = t_J(k) and b_J(k+1) = b_J(k), so

t_J = \frac{U_J(k)}{1-d} \qquad (13)
b_J = \frac{U_J(k)}{1-d} \qquad (14)

This means that the long-term memory t_J, b_J changes with U_J. When the input mode S changes while remaining within the similarity \lVert R \rVert (greater than the vigilance parameter ρ), the stored template has to keep adjusting during the ART2 learning process and finally deviates from the original template.
2.2 Improvement
After the J-th long-term memory mode has been activated k times by the node mode U(k) of the F1 layer, the ideal set template U_J^* satisfies

\min \sum_{i=1}^{k}\left(U_J^* - U_J^{(i)}\right)^2 \qquad (15)

in which U_J^* is the ideal set template mode, U_J^* = [u_{J1}^*, u_{J2}^*, \dots, u_{Jm}^*]; U_J^{(i)} is the template mode of the i-th activation of the J-th long-term memory mode, U_J^{(i)} = [u_{J1}^{(i)}, u_{J2}^{(i)}, \dots, u_{Jm}^{(i)}]; and m is the number of mode neurons. Setting the derivative to zero gives 2\big(\sum_{i=1}^{k} U_J^* - \sum_{i=1}^{k} U_J^{(i)}\big) = 0, so

U_J^* = \frac{1}{k}\sum_{i=1}^{k} U_J^{(i)} \qquad (16)
Substituting Eq. (16) into Eqs. (11) and (12), the long-term memory coefficients of ART2 are improved as follows:

t_J(k+1) = t_J(k) + \alpha\, d(1-d)\left[\frac{\sum_{i=1}^{k} U_{Ji}^{(i)}}{k(1-d)} - t_J(k)\right] \qquad (17)
b_J(k+1) = b_J(k) + \alpha\, d(1-d)\left[\frac{\sum_{i=1}^{k} U_{Ji}^{(i)}}{k(1-d)} - b_J(k)\right] \qquad (18)
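A hedged numpy sketch of this improved update is given below: the winner's top-down and bottom-up weights are pulled toward the average of all templates that have activated node J, instead of the latest one only. All values are arbitrary placeholders:

```python
# Illustrative sketch of the improved long-term-memory update of Eqs. (17)-(18).
import numpy as np

def improved_update(t_J, b_J, U_history, alpha=0.6, d=0.82):
    U_mean_term = np.mean(U_history, axis=0) / (1.0 - d)       # (1/k)·ΣU_J^(i) / (1-d)
    t_J = t_J + alpha * d * (1.0 - d) * (U_mean_term - t_J)    # Eq. (17)
    b_J = b_J + alpha * d * (1.0 - d) * (U_mean_term - b_J)    # Eq. (18)
    return t_J, b_J

rng = np.random.default_rng(0)
U_history = rng.random((4, 6))           # k = 4 past activations, 6-dimensional patterns
t_J = b_J = np.zeros(6)
print(improved_update(t_J, b_J, U_history))
```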
3 Extracting the Characteristic Vector of BF States
3.1 Variable Selection
BF smelting runs as a relative movement between the rising gas and the falling charge. The qualified product is obtained through heating, deoxidizing, smelting, slagging, desulfurizing and cementing. In this process the heat quantity is the foundation of keeping the BF in a good state and directly reflects the running state of the furnace. The variable that indicates the level of BF temperature is the Si content: the higher the Si content of the liquid iron, the higher the temperature of the liquid iron, and vice versa. The air flow affects the descent of the charge, the gas flow distribution and the BF temperature in the smelting process; with a lower or similar fuel rate, the output of raw iron rises with more air flow and faster charge descent, and the air flow brings most of the energy into the BF. The temperature of the air flow affects the energy of the blast, the furnace temperature, the initial distribution of the gas flow and the effectiveness of fuel injection. The permeability index reflects the permeability of the stock column, the blast pressure indicates the match between the gas and the permeability of the stock column, and the stock rod indicates the descending position of the charge material. These variables are therefore selected as the characteristic vector of the BF smelting process.
In conclusion the characteristic variables of BF states are determined as the charging speed, the air flow, the air temperature, the air pressure, the permeability indices and the Si composition of liquid iron.
3.2 Data Acquisition and Preprocessing
The smelting process is time-varying. One effective method is to divide the process into a series of time segments and use sliding windows to extract the character of each segment. Generally the window width is 3-5 times the number of process variables and the sliding step is one. Since the initial values of the process variables differ greatly in magnitude, the data obtained from the windows must be normalized so that small-valued variables are not submerged. The normalization function is
x_i'(l) = \frac{x_i(l) - \bar{x}(l)}{\sigma_l} \qquad (19)

where

\bar{x}(l) = \frac{1}{N}\sum_{i=1}^{N} x_i(l), \qquad \sigma_l = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\big[x_i(l) - \bar{x}(l)\big]^2}

l = 1, 2, \dots, m is the dimension of the sample and i = 1, 2, \dots, N is the index of the sample. The average of the samples is chosen as the character vector of the k-th window:

S(k) = \frac{1}{N}\sum_{i=1}^{N} X_i' \qquad (20)

After training, the output of the network is recovered according to

\hat{x} = \bar{x} + \sigma \hat{x}' \qquad (21)

where \hat{x} is the actual value after recovery and \hat{x}' is the network output before recovery.
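A hedged numpy sketch of this window preprocessing (Eqs. (19)-(21)) is shown below; the window contents are synthetic placeholders rather than real BF measurements:

```python
# Z-score normalisation inside a sliding window, the window-average character vector,
# and recovery of the network output.
import numpy as np

def window_features(window):
    mean, std = window.mean(axis=0), window.std(axis=0, ddof=1)   # statistics of Eq. (19)
    normalised = (window - mean) / std                            # x'_i(l)
    s_k = normalised.mean(axis=0)                                 # character vector S(k), Eq. (20)
    return s_k, mean, std

def recover(output, mean, std):
    return mean + std * output                                    # Eq. (21)

rng = np.random.default_rng(0)
window = rng.normal(size=(24, 6))        # window width about 4x the six BF state variables
s_k, mean, std = window_features(window)
print(np.round(recover(s_k, mean, std), 3))
```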
4 Simulations
Field data confirmed by operators are used to check the effectiveness of the method. After several experiments and trials the parameters of ART2 are taken as a = 0.58, b = 0.62, c = 0.1, d = 0.82, θ = 0.4, ρ = 0.95, and the weights of the character vector are taken as ω1 = 0.1, ω2 = 0.15, ω3 = 0.15, ω4 = 0.25, ω5 = 0.2, ω6 = 0.15. The improved ART2 is then applied and the results are given in Table 1.

Table 1 Results by the improved ART2

Time   Results by ART2   Presumed by operators
8:10   1                 Normal
8:20   1                 Normal
8:30   2                 Hang-ups/warm running
8:40   3                 Hang-ups/warm running
8:50   4                 Hang-ups/warm running/slip
9:00   5                 Slip/cool running
9:10   5                 Slip/cool running
9:20   1                 Normal
9:30   1                 Normal

Table 1 shows that the results given by ART2 conform to those presumed by the operators, but more details need to be discussed. The results obtained from the ART2 network are related to several parameters, viz. a, b, c, d, θ, ρ. Parameters a and b reflect the degree of fusion between the input signal s and the memory pattern q: the bigger a and b are, the stronger the error suppression, so the network is more stable and the sensitivity is lower.
Vice versa. c and d satisfy c·d/(1-d) ≤ 1. θ is the noise suppression coefficient and reflects the noise suppression ability of the nonlinear function f(·): the bigger θ is, the stronger the noise suppression, but the more useful signal is removed; usually θ ≈ 1/n. ρ is the parameter that determines the fineness of the classification: the bigger ρ is, the more precise the classification, but too large a value makes the sensitivity too high for practical use. How to determine the range of ρ needs further analysis.
5 Conclusions
Keeping the BF states stable is an important task in the smelting process; it is a premise for normal production, good technical and economic indices, high efficiency, low cost and a long furnace life. ART2 is a good classifier that can recognize different patterns under complex conditions and learn unknown patterns quickly, so the BF pattern in the smelting process can be determined with ART2 as an effective tool. The method has two advantages: (1) it has a learning ability that can identify new patterns arising in the smelting process; (2) it has a self-stabilizing mechanism that supports online monitoring. Its disadvantage is that the parameters are difficult to adjust.
Acknowledgments. This work is supported by national natural science foundation of P. R. China Grant 60674063 and the "863" project of P. R. China #2007AA0414 and Tianjin natural science foundation #08JCZDJC18600.
References
1. Liang, D., Bai, C., Qiu, G.: Research on Intelligent Diagnosis Methods for Blast Furnace Operation. Journal of Iron and Steel Research 18, 56–58 (2006)
2. Huang, B., Wang, W.: Multivariable Intelligent Furnace Temperature Control System Based on Blast Furnace Expert System. Iron and Steel 40, 21–23 (2005)
3. Liu, J., Gong, B., Wang, S.: Research on Blast Furnace Distributed Intelligent Control System. Journal of Zhejiang University 34, 194–200 (2000)
4. Li, H., Zeng, Y.: Blast Furnace Temperature Control Based on Judgment of Expert Rules. Information of Microcomputer 23, 96–98 (2007)
5. Lu, H., Gao, B., Zhao, L.: Neural Network Expert System for Forecasting Blast Furnace Operational Conditions. Journal of University of Science and Technology 24, 276–279 (2002)
6. Zhang, D., Wang, F., He, J.: ART2 Neural Network for Growth Phase Classification in Batch Fermentation Process. Chinese Journal of Scientific Instrument 27, 1378–1382 (2006)
7. Carpenter, G.A., Grossberg, S.: ART 2: Self-Organization of Stable Category Recognition Codes for Analog Input Patterns. Appl. Optics 26, 4919–4930 (1987)
8. Chen, C., Khoo, P., Wei, Y.: A Strategy for Acquiring Customer Requirement Patterns Using Laddering Technique and ART2 Neural Network. Advanced Engineering Informatics 16, 229–240 (2002)
Research on Dynamic Response of Riverbed Deformation Based on Theory of BP Neural Network Qiang Zhang, Xiaofeng Zhang, and Juanjuan Wu*
Abstract. In order to study the dynamic response of riverbed deformation to changes in the water and sediment conditions, this article uses a BP neural network to establish mathematical models that predict riverbed deformation in two respects: riverbank deformation and river cross-section deformation. The model is trained and verified with the measured data of the Shishou bend of the Jingjiang River and is then used to predict the riverbed deformation of section Jing-92 and section Jing-96 in 2008, after the operation of the Three Gorges Reservoir. The results reflect the evolutionary tendency of the riverbed deformation, which shows that the model is feasible for predicting riverbed deformation.
Keywords: BP neural network, Riverbed deformation, Training and verification, Three Gorges Reservoir.
Qiang Zhang, Xiaofeng Zhang, Juanjuan Wu: State Key Laboratory of Water Resources and Hydropower Engineering Science, Wuhan University, Wuhan 430072, China

1 Introduction
The water and sediment conditions of the lower reach have changed since the Three Gorges Reservoir came into operation: the sediment-carrying capacity of the river downstream of the dam is unsaturated, which results in riverbed deformation and may cause adjustment of the river regime. Data show that after the operation of the Three Gorges Reservoir the Jingjiang riverbed will be subject to severe erosion, which threatens the safety of the Jingjiang dike [1,2]. Research on and forecasting of the deformation of the middle and lower reaches of the Yangtze River are therefore very significant after the operation of the Three Gorges Reservoir. In previous studies of channel evolution, much research and exploratory work has been done and several research methods have been put forward, including physical river models, mathematical models and analysis of channel evolution. Artificial neural networks are an emerging international
science, which has achieved very good results in pattern recognition and automatic control. It has also been applied successfully, at home and abroad, to planning, water resources, water environment assessment and hydrological time series studies [3], which broadens its application areas. In this paper, a BP neural network model is used to analyze the changes of a typical riverbed shoreline and the riverbed deformation after the operation of the Three Gorges Reservoir.
2 BP Neural Network Model of Riverbed Deformation
The working principle of the BP neural network is described in detail in the related articles [4,5,6]. For a training pattern (X_k, Y_k) (k = 1, 2, \dots, N), in which X_k is the input pattern, expressed as an n_0-dimensional vector (x_{k1}, x_{k2}, \dots, x_{kn})^T, and Y_k is the corresponding expected output, expressed as an n_0-dimensional vector (y_{k1}, y_{k2}, \dots, y_{kn})^T, the net input to a neuron is computed as the sum of all inputs to it:
I_{kj}^{l} = \sum_{i} W_{ji}^{l}\, O_{ki}^{l-1} + \theta_{j}^{l}
(1)
i
where
W jil is the weight of the link between neuron u lj in layer l and iron u il −1 in
layer 1-1.
Okil −1 is the output of the neuron u il −1 . θ lj is the threshold of the
l
ne-uron u j . The net input to a neuron is used by its activation function, here the most commonly used sigmoid function, to produce an output
O_{kj}^{l} = f\!\left(I_{kj}^{l}\right) = \left[1 + \exp\!\left(-I_{kj}^{l}\right)\right]^{-1}
(2)
where f(.) is the transfer function. The transfer function has several forms, such as the hyperbolic tangent, step or sigmoid function; the sigmoid function is the most common in practical applications and is used here. The BP neural network is characterized by its hidden layers. Generally, a BP network with one hidden layer is enough for most applications, and additional hidden layers tend to make the network too complicated, which can produce more local minima, lower convergence speeds and larger errors; theoretically, a BP network with two hidden layers is enough to represent complicated systems with more general classification boundaries and has faster convergence. In the BP learning scheme, the calculated outputs in the output layer,
y kj' are
compared with the desired outputs y kj , to find the error, before the error signals are propagated backward through the network. The error function fined as
E k can be de-
Research on Dynamic Response of Riverbed Deformation
Ek =
1 m ( y kj − y kj' ) ∑ 2 j =1
867 2
(3)
where m is the total number of the output neurons. In the BP algorithm, each training pattern is presented once and the weight correction is calculated, but the weights are not actually adjusted; calculated weight corrections for each weight are added together for all the patterns and then weights are adjusted only once using the cumulative correction. The momentum strategy implements a variable learning rate coefficient implicitly by adding a fraction of movement in the weight change to the current direction of movement in the weight apace. The new equation for weight change is given by
W_{ji}^{l}[T+1] = W_{ji}^{l}[T] - \eta\,\frac{\partial E}{\partial W_{ji}^{l}[T]} + \alpha\left(W_{ji}^{l}[T] - W_{ji}^{l}[T-1]\right)
(4)
where η is the step size or learning rate coefficient, the index T labels the iteration number in the learning process, and α is a momentum coefficient with a value between 0 and 1. This article establishes mathematical models to simulate and predict river deformation in two respects, riverbank deformation and river cross-section deformation. In these models the key point is the choice of the input vectors: when the input vectors are reasonable and representative, the output vectors of the model will be satisfactory.
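A hedged numpy sketch of one BP training loop following Eqs. (1)-(4) is given below: a sigmoid forward pass and a gradient-descent weight update with a momentum term. The network sizes and the training patterns are arbitrary placeholders, not the riverbed data used later in the paper:

```python
# Minimal BP sketch: sigmoid layers, squared-error gradient, momentum update (Eq. (4)).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(30, 5)), rng.normal(size=(30, 1))    # synthetic training patterns
W1, W2 = rng.normal(size=(5, 8)) * 0.1, rng.normal(size=(8, 1)) * 0.1
dW1_prev, dW2_prev = np.zeros_like(W1), np.zeros_like(W2)
eta, alpha = 0.1, 0.8                                        # learning rate and momentum

for _ in range(200):
    H = sigmoid(X @ W1)                                      # hidden layer, Eqs. (1)-(2)
    Yhat = sigmoid(H @ W2)
    err = Yhat - Y                                           # from the error function, Eq. (3)
    grad_out = err * Yhat * (1 - Yhat)
    grad_hid = (grad_out @ W2.T) * H * (1 - H)
    dW2 = -eta * H.T @ grad_out + alpha * dW2_prev           # momentum update, Eq. (4)
    dW1 = -eta * X.T @ grad_hid + alpha * dW1_prev
    W2, W1, dW2_prev, dW1_prev = W2 + dW2, W1 + dW1, dW2, dW1
print(float(np.mean(err ** 2)))
```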
2.1 Riverbank Deformation Model
Through analysis of the model input vector, the cross-section shape of the reach, the location of the thalweg, the water and sediment factors and the length of the time interval are selected as influencing factors. The factors affecting river deformation include not only the multi-year average discharge (equivalent to the median discharge) but also the deformation caused by overbank flow at large discharges, which cannot be ignored, so the duration of large discharges is added to the factors. Because the bank-failure width varies along the river under the influence of the water-sediment conditions and the river boundary conditions, the bank-failure width at typical cross sections is often selected as the model output, in order to reflect the basic situation of bank failure.
2.2 River Cross Section Deformation Model The input and output vectors of the changes on the river cross sections are similar to the riverbank deformation model, the cross sections shape on the reaches, the location of the talweg, the water-sediment gene, and the length of the time interval
were selected as the input vectors in the mode, and the period of the big flow rate was selected as the influencing factors. The river cross sections were selected as the model output goal.
3 Training and Verification of BP Neural Network Model The Jingjiang River is the flood control focus in the middle and lower reaches of the Yangtze River, and the stability of the its riverbank and embankment is very important. In the 1990s, the evolution of Shishou River bend was very rapid, especially the cut-bank and thrown-away-elbow in the Xiangjia shoal, which caused a large area bank failure near the Beimen inlet [7]. After the application of the Three Gorges Reservoir, the Jingjiang River often had the phenomena of bank failure in recent years. The model selected river water-sediment and the river boundary conditions as the input vectors, and it selected the riverbed shorelines and the riverbed sections for the output vectors. To study the establishment of the riverbank deformation and the river cross sections deformation of the BP network prediction model, this paper simulates and predicts the riverbed deformation of the Shishou River bend. According to the available information on the terrain which can be found in the figure 1, the amplitude on the left bank of section Jing-92 which locates at the upper reach of the Shishou River bend was very big, the right bank of section Jing-96 which locates at the middle reach of Shishou River bend also had the same rule as section Jing-92, those rules can reflect the situation of riverbed deformation at Shishou River bend. At the same time, the section Jing-92 and section Jing-96 are near the inlet reach of the Shishou River bend, they may be largely influenced by the upstream and downstream of the bend changes. Therefore the model selects the section Jing-92 and section Jing-96 as the research objects which can reflect the riverbed deformation.
3.1 Riverbank Deformation For section Jing-92, the model selected the left bank shoreline positions of +25 m, +30 m, +35 m as the research objects, and the thalweg positions, width-to-depth ratio, average discharge of the Xinchang station as well as the sediment discharge of section Jing-84, section Jing-89, section Jing-90 and section Jing-95 were selected as the input vectors. The riverbank of 10 periods, 1970 to 1975, 1975 to 1980, 1980 to 1987, 1987 to 1991, 1991 to 1993, 1993 to 1996, 1996 to 1998, 1998 to 2000, 2000 to 2002, 2002 to 2004, were selected as the model training samples. And the riverbank of 2004 to 2006 was used for the model prediction sample. For Jing section 96, the model selected the left bank shoreline positions of +25 m and +30 m as the research objects, and it selected the data of section Jing-90, section Jing-92, section Jingshi, section Jing-95 and section Jing-97 as the research objects. And the riverbed shoreline of 2004 to 2006 was used for the model prediction sample.
Fig. 1 Sketch of Shishou River bend
The calculated and measured shoreline locations of section Jing-92 and section Jing-96 can be seen in table 1. Table 1 shows that the relative error was less than 2%, the prediction results of the BP neural network model are very accurate. So this method can be used for the research of riverbed shoreline deformation. Table 1 Comparison between the verification and measured shoreline locations of section Jing-92 and section Jing-96 on year 2006 Section Jing-92
Section Jing-96
Left coastline
Measured data / m
Verification data / m
Relative error/ %
Right coastline
Measured data/ m
Verification data / m
Relative error/ %
+25m
-2000
-2006.39
0.32
+25m
2271
2257.07
-0.61
+30m
-2019
-2039.65
1.02
+30m
2306
2257.35
-2.16
+35m
-2043
-2053.65
0.52
3.2 River Cross Section Deformation For section Jing-92, the thalweg positions, the generalized sections’ absolute height and the water- sediment conditions as well as the corresponding period of
section Jing-84, section Jing-89, section Jing-90 and section Jing-95 were selected as the input vectors in order to estimate section Jing-92’s generalized absolute height for the model output vector. The riverbed deformation data of 9 periods, 1975 to 1980, 1980 to 1987, 1987 to 1991, 1991 to 1993, 1993 to 1996, 1996 to 1998, 1998 to 2000, 2000 to 2002, 2002 to 2004, were selected as the model training samples. And the riverbed deformation of 2004 to 2006 was used for the model prediction sample. For Section Jing-96, the data of section Jing-84, section Jing-89, section Jing90 and section Jing-95 were selected as the input vectors; the riverbed deformation of 2004 to 2006 was used for the model prediction sample. 45 40
[Fig. 2 plot: elevation (m) versus distance from the initial point (m); series: prediction section of 2006, measured section of 2006, measured section of 2004.]
Fig. 2 Comparison between the verification and measured cross section Jing-92
[Fig. 3 plot: elevation (m) versus distance from the initial point (m); series: prediction section of 2006, measured section of 2006, measured section of 2004.]
Fig. 3 Comparison between the verification and measured cross section Jing-96
The simulation results of the BP neural network are compared with the actual cross sections of section Jing-92 and section Jing-96 in Figure 2 and Figure 3. The figures show that the model predictions for 2004-2006 are close to the measured sections, so this method can be used for the study of riverbed deformation.
4 Riverbed Deformation Prediction with the BP Neural Network Model

To predict the riverbank deformation and the cross-section deformation of sections Jing-92 and Jing-96 in 2008, the water-sediment conditions of 2007 were added to the inputs. Since the water-sediment data of 2008 were not available, the data of 2005 were used instead. The prediction results are shown in Table 2, Figure 4 and Figure 5. The predictions indicate that both the left bank of section Jing-92 and the right bank of section Jing-96 tend to collapse backwards, so it should be noted that large bank failures may occur at these two sections.

Table 2 Comparison between the predicted and measured shoreline locations of sections Jing-92 and Jing-96 in 2008

Section Jing-92 (left coastline)
  Position   Measured / m   Prediction / m
  +25 m      -2000          -2005.78
  +30 m      -2019          -2041.33
  +35 m      -2043          -2055.73

Section Jing-96 (right coastline)
  Position   Measured / m   Prediction / m
  +25 m      2271           2275.55
  +30 m      2306           2307.58
Fig. 4 The prediction result of cross section Jing-92 (elevation in m against distance from the initial point in m; curves: prediction section of 2008, measured section of 2006)
Fig. 5 The prediction result of cross section Jing-96 (elevation in m against distance from the initial point in m; curves: prediction section of 2008, measured section of 2006)
According to the river terrain data of many years, the prediction results of the BP network models are in line with the basic law of river-bend evolution, and they agree with the observed fact that the left bank at section Jing-92 and the right bank at section Jing-96 frequently suffer erosion.
5 Conclusions

In this paper, the riverbank deformation and the river cross-section deformation of sections Jing-92 and Jing-96 were predicted with a BP neural network. The results are in good agreement with the actual situation, which shows that the method is feasible for predicting riverbed deformation after the impoundment of the Three Gorges Reservoir. The accuracy of the BP neural network model should nevertheless be improved further. When the input vectors contain several types of variables and some types have many more components than others, the more numerous components weaken the influence of the other variable types on the output. The existing BP network model cannot treat every input variable equally because the variable types differ in number, which makes accurate prediction of the output vectors difficult, so other methods need to be introduced to address this problem. The influence of the Three Gorges Reservoir on the evolution of the downstream river is a long-term process. In future forecasts of riverbed deformation, the BP neural network, as one suitable method for studying these problems, can be used for further tracking calculations.

Acknowledgments. This work is supported by the National Natural Science Foundation of China (No. 50579054) and the 973 Program of China (No. 2007CB714106).
References

1. Pan, Q.: Sediment Study of the Three Gorges Project. China Water Conservancy and Hydropower Press, Beijing (1999)
2. Shi, S., Lin, C., Yang, G.: Governing and Exploiting the Water Front Resources at Middle and Lower Reaches of the Yangtze River. Scientia Geographica Sinica 22, 700-704 (2002)
3. Hu, T.: Neural Network Prediction and Optimization. Dalian Maritime University Press, Dalian (1997)
4. Zhang, X., Xu, Q., Pei, Y.: Preliminary Research on the BP Networks Forecasting Model of Watershed Runoff and Sediment Yielding. Advances in Water Science 12, 17-22 (2001)
5. Shang, G., Zhong, L., Chen, L.: Discussion about BP Neural Network Structure and Choice of Samples Training Parameter. Journal of Wuhan University of Technology 19, 108-110 (1997)
6. Zhang, X., Hu, X., Xie, Z.: Research on BP Network Prediction Model for Diversion Flow of 3 Diversion Outlets of Jingjiang River. Yangtze River 34, 33-34 (2003)
7. Pan, Q., Lu, J.: Analysis of the Near River Evolution on the Middle Reaches of the Yangtze River. Yangtze River 30, 32-34 (1999)
Adaboosting Neural Networks for Credit Scoring Ligang Zhou and Kin Keung Lai*
Abstract. The credit scoring model is a popular tool for financial institutions (FIs) to assess their customers' credit risk. Because of the large amount of money involved in the credit granting business, an improvement in the accuracy with which a scoring model recognizes good and bad customers, even by a fraction of one percent, can significantly reduce losses. Existing research suggests that adaboost models can improve the classification accuracy of base classifiers. In this paper, two adaboost models with different sample weighting strategies are introduced for credit scoring. A multilayer perceptron neural network trained with back-propagation is employed as the base classifier. The models are tested on one real-world dataset, and the experimental results show that the adaboosting neural network model outperforms both a single neural network and the traditional adaboost model. Keywords: Adaboost, Neural network, Credit scoring.
1 Introduction

In the USA, there were 163.3 million credit card holders, accounting for about 75% of adults, and 617.1 million general-purpose credit cards in circulation at the end of 2002; the latter number was expected to reach 712.0 million in 2007 [1]. In the UK, 1687 million transactions were made by credit card in 2002, and between 1999 and 2002 the number of UK adults using cards to pay for internet purchases increased from 1.3 to 11.8 million, with a total transaction value of £9 billion [2]. A large number of credit customers contribute to the profit of credit granting institutions, but they are also a source of risk. With the rapid growth of the credit market, it has become impossible to handle such a large number of customers with traditional subjective judgment methods, so most credit granting institutions now use credit scoring models to assess customers' credit risk and make faster and better credit granting decisions.

Ligang Zhou . Kin Keung Lai
Department of Management Sciences, City University of Hong Kong, Kowloon Tong, Hong Kong
{mszhoulg, mskklai}@cityu.edu.hk *
H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 875–884. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
Credit scoring is the set of decision models and their underlying techniques that aid lenders in the granting of consumer credit [3]. It uses quantitative methods and data from previous customers with observed credit performance (default or nondefault) to construct scoring models, which are then applied to data from new customers to predict their probability of default. In this sense, a credit scoring model classifies customers into a good (nondefault) or bad (default) class. Compared with traditional methods based on experts' experience, credit scoring brings efficiency and time savings, reduces subjectivity in the loan approval process and provides consistent decisions. In the USA, more than 90% of the largest financial institutions use scores to make billions of credit decisions annually. Because of the large amount of money in the credit business, an improvement in the classification accuracy of a scoring model, even by a fraction of one percent, can help lenders avoid losses of millions of dollars.

Many quantitative methods from different disciplines have been used to build credit scoring models: linear regression, logistic regression, decision trees and k-nearest neighbours from statistics, linear programming from operations research, neural networks (NN), support vector machines (SVM) and genetic algorithms from artificial intelligence, as well as hybrid methods that combine two or more techniques to overcome the disadvantages of a single method. In theory, neural networks can approximate any function arbitrarily closely, and they have been successfully applied to a variety of real-world classification tasks in industry, business and science [4]. They have been introduced to credit scoring in many previous studies. Jensen [5] built a credit scoring model with a standard back-propagation neural network whose output neurons encode the customer's payment history (delinquent, charged-off or paid-off) to describe the loan outcome. West [6] investigated the credit scoring accuracy of five neural network models, including the multilayer perceptron, with results benchmarked against several traditional methods. Baesens et al. [7] discussed and contrasted statistical and neural network approaches for survival analysis, focusing on when customers will default. Bensic et al. [8] extracted important features for credit scoring in small-business lending from a relatively small dataset under specific transitional economic conditions and tested four different neural networks with a forward nonlinear variable selection strategy. Desai and Crook [9] compared neural networks and linear scoring models in the credit union environment. Although neural networks are reported to outperform some statistical methods in most cases, attempts to improve them have never stopped: principal component analysis (PCA) has been employed to select important features so as to improve accuracy and reduce computational time [10]; genetic programming has been adopted to optimize the neural network architecture [11]; and hybrid models combine neural networks with fuzzy logic systems, self-organizing maps (SOM) or decision trees. Boosting neural networks is another direction for improving NN performance [12]. Adaboosting neural networks (AdNN) use a neural network instead of a decision tree as the base classifier of the traditional adaboost model and are expected to
provide more accurate generalization than a single model. Schwenk and Bengio [12] reported that AdNN is significantly better than boosted decision trees in terms of accuracy on a dataset of online handwritten digit recognition. West et al. investigated three ensemble strategies (cross-validation, bagging and adaboost) with a multilayer perceptron neural network as the base classifier for bankruptcy prediction; their results on three real-world financial decision applications show that the generalization ability of the neural network ensemble is superior to that of the single best model. However, Tsai and Wu [13] compared a single classifier, used as the baseline of the ensemble model, with ensemble models based on neural networks and found that ensemble neural network classifiers do not outperform a single best neural network classifier in many cases. In their study, they considered only the simple majority-voting ensemble strategy and used different classifier parameters to create diversity among the ensemble members. In this study, we take the neural network as the base classifier of the adaboost model for credit scoring and discuss the effect of two different training sample weighting strategies and of the neural network architecture. The rest of this paper is organized as follows. The next section describes adaboosting neural networks and the basic issues in constructing neural network base classifiers. In Section 3, an empirical study on a real-world dataset is reported and analyzed. Section 4 gives a short conclusion and discussion.
2 Adaboosting Neural Networks

Adaboost is a popular boosting algorithm which constructs a composite classifier by sequentially training classifiers while putting more and more emphasis on certain patterns [12]. There are usually two ways to implement this idea. One is to train the base classifier with respect to a weighted cost function (WCF) that assigns larger weights to incorrectly classified samples; the other is to sample with replacement (SWR) according to a probability distribution, which approximates a weighted cost function. In the second way, samples with high probability may occur several times in the training set, while those with low probability may not occur at all. For the credit scoring problem, we are given a training data set S = {x_k, y_k}, k = 1, ..., N, where the input data x_k ∈ R^m and the corresponding output y_k ∈ {1, -1}. In the SWR adaboost method, each sample in S is initially assigned an equal weight of 1/N, so every sample has the same chance of being selected in the first step. Generating T neural network classifiers for the adaboost model requires T rounds of training with T different training sample groups S_t (t = 1, 2, ..., T). In round t, the weight of sample k is given by the function D_t(k). After the construction of classifier M_t, which provides a function F_t mapping x to {1, -1}, the value of D_t(k) is adjusted according to how the samples are classified by M_t, and the training sample group S_{t+1} is then generated from S according to D_t by sampling with replacement. The details of this
algorithm (SWRAdNN) are given in pseudo-code below, slightly adjusted from the traditional version:

Algorithm: SWRAdNN
Input: S, a set of N training samples; T, the number of rounds used to construct the adaboosting model; the settings for training the base neural network classifier.
Output: the adaboost model.
Method:
Step 1: initialize the weight of each sample in S to 1/N, i.e. D_1(k) = 1/N, k = 1, 2, ..., N;
Step 2: for t = 1 to T do
  2.1 Sample S with replacement according to D_t to obtain S_t;
  2.2 Train a neural network on S_t to obtain the classifier model M_t;
  2.3 Compute the weighted error ε_t of model M_t on S as (1):

    \varepsilon_t = \sum_{k=1}^{N} D_t(k) \times err_t(x_k),                      (1)

  where err_t(x_k) is the misclassification indicator of x_k under model M_t, defined as (2):

    err_t(x_k) = \begin{cases} 1 & \text{if } F_t(x_k) \ne y_k \\ 0 & \text{otherwise} \end{cases}                      (2)

  2.4 If ε_t > 0.5, reset D_t(k) = 1/N, k = 1, 2, ..., N, and repeat until ε_t < 0.5;
      if ε_t = 0, set T = t and break;
      else update the weight function D_t(k) by formulas (3) and (4):

    D_{t+1}(k) = \begin{cases} D_t(k) \times \beta_t & \text{if } F_t(x_k) = y_k \\ D_t(k) & \text{otherwise} \end{cases}                      (3)

    D_{t+1}(k) = \frac{D_{t+1}(k)}{\sum_{k=1}^{N} D_{t+1}(k)},                      (4)

  where (4) normalizes the weights so that D_{t+1} is a probability distribution function, and β_t is obtained from formula (5):

    \beta_t = \frac{\varepsilon_t}{1 - \varepsilon_t}                      (5)

  endif
endfor

Following the above method, we obtain a set of neural network classifiers M_t, which defines a set of functions {F_t | t = 1, 2, ..., T}; the final decision of the adaboost model is defined by (6):

    y(x) = \sum_{t=1}^{T} F_t(x) \times \log \frac{1}{\beta_t}.                      (6)
From (3) it can be observed that the weights of correctly classified samples are decreased, while through the normalization the weights of misclassified samples are increased. The structure of the neural network is shown in Fig. 1; the two nodes in the output layer indicate the probability that the customer is good and the probability that the customer is bad, respectively. For a good customer the completely correct output of the neural network is [1 0], and for a bad customer it is [0 1], indicating that the probability of being good is zero and that of being bad is one. The output function F_t(x) is defined as (7):
    F_t(x) = \begin{cases} +1 & \text{if } P_t^g > P_t^b \\ -1 & \text{otherwise} \end{cases}                      (7)
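A compact sketch of the whole SWRAdNN loop, using a multilayer perceptron from scikit-learn as the base classifier, is given below. It only illustrates equations (1)-(6) under assumed data and parameter names; the authors trained a 26-3-2 back-propagation network, and the substitution of MLPClassifier and its settings here is an assumption for illustration, not the paper's implementation.

import numpy as np
from sklearn.neural_network import MLPClassifier

def swr_adnn_fit(X, y, T=10, hidden=(3,), seed=0):
    """Train T MLP base classifiers by sampling with replacement (eqs. (1)-(5))."""
    rng = np.random.default_rng(seed)
    N = len(y)
    D = np.full(N, 1.0 / N)                      # D_1(k) = 1/N
    models, betas = [], []
    for _ in range(T):
        idx = rng.choice(N, size=N, replace=True, p=D)      # SWR according to D_t
        clf = MLPClassifier(hidden_layer_sizes=hidden, max_iter=500,
                            random_state=seed).fit(X[idx], y[idx])
        F = clf.predict(X)
        err = (F != y).astype(float)
        eps = np.dot(D, err)                     # eq. (1)
        if eps >= 0.5:                           # restart from uniform weights
            D = np.full(N, 1.0 / N)
            continue
        if eps == 0:                             # near-perfect model: give it a large vote and stop
            models.append(clf); betas.append(1e-10)
            break
        beta = eps / (1.0 - eps)                 # eq. (5)
        D = np.where(err == 0, D * beta, D)      # eq. (3): shrink weights of correct samples
        D /= D.sum()                             # eq. (4): renormalize
        models.append(clf); betas.append(beta)
    return models, betas

def swr_adnn_predict(models, betas, X):
    """Combine the base classifiers by weighted voting (eq. (6))."""
    score = sum(m.predict(X) * np.log(1.0 / b) for m, b in zip(models, betas))
    return np.where(score >= 0, 1, -1)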
In the SWRAdNN algorithm, the training data set for each neural network is obtained from S by resampling with replacement based on the weight distribution function D_t. Another way is to combine D_t with the cost function used in the network training procedure [12], which guides the optimization of the weights in the neural network. This adaboost neural network algorithm, WCFAdNN, is described as follows.

Fig. 1 Example of the neural network architecture in the adaboost NN model (the two output nodes give P_t^g and P_t^b)
Algorithm: WCFAdNN
Input: S, a set of N training samples; T, the number of rounds used to construct the adaboosting model; the settings for training the base neural network classifier.
Output: the adaboost model.
Method:
Step 1: initialize the weight of each sample in S to 1/N, i.e. D_1(k) = 1/N, k = 1, 2, ..., N;
Step 2: for t = 1 to T do
  2.1 Train a neural network with Levenberg-Marquardt back-propagation on S with respect to the weight distribution D_t to obtain model M_t, which corresponds to the function F_t;
  2.2 Compute the weighted error ε_t of model M_t on S as (8):

    \varepsilon_t = \frac{1}{2} \sum_{k=1}^{N} D_t(k) \times \left[ 1 - \left( P_t^g(x_k) - P_t^b(x_k) \right) \times y_k \right].                      (8)

  When y_k = 1 the completely correct output of the NN should be [1 0]; since the actual output is [P_t^g  P_t^b], the error is the sum of the errors of the two output neurons, i.e. (1 - P_t^g) + P_t^b, and the same holds for y_k = -1.
  2.3 If ε_t > 0.5, reset D_t(k) = 1/N, k = 1, 2, ..., N, and repeat until ε_t < 0.5;
      if ε_t = 0, set T = t and break;
      else update the weight function D_t(k) by formulas (9) and (10):

    D_{t+1}(k) = D_t(k) \times \beta_t^{\frac{1}{2}\left[ 1 + \left( P_t^g(x_k) - P_t^b(x_k) \right) \times y_k \right]},                      (9)

    D_{t+1}(k) = \frac{D_{t+1}(k)}{\sum_{k=1}^{N} D_{t+1}(k)},                      (10)

  where (10) normalizes the weights so that D_{t+1} is a probability distribution function, and β_t is obtained from formula (11):

    \beta_t = \frac{\varepsilon_t}{1 - \varepsilon_t}.                      (11)

  endif
endfor
The final decision function is the same as in SWRAdNN. In (9), since 0 < ε_t < 0.5 we have 0 < β_t < 1, so the weights of correctly classified samples with ideal output are reduced by the factor β_t, while the weights of misclassified samples remain unchanged.
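The weight update (9) can be written directly in a few lines. The fragment below is only a sketch under assumed array names (probs_good and probs_bad holding P_t^g(x_k) and P_t^b(x_k)); it is not the authors' code.

import numpy as np

def wcf_update(D, probs_good, probs_bad, y, eps):
    """Update the sample weights according to eqs. (9)-(11)."""
    beta = eps / (1.0 - eps)                        # eq. (11)
    margin = (probs_good - probs_bad) * y           # close to +1 when correctly classified
    D_new = D * beta ** (0.5 * (1.0 + margin))      # eq. (9)
    return D_new / D_new.sum(), beta                # eq. (10): renormalize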
3 Experimental Results

In this section, experimental results on one credit dataset from the UCI Machine Learning Repository are presented. The experiments compare SWRAdNN and WCFAdNN and explore the effect of the sample weighting strategy and of the neural network architecture on the performance of the adaboost models. The German credit dataset consists of 700 instances of creditworthy applicants and 300 instances of noncreditworthy applicants. Each instance has 20 features, including the status of the existing account, credit history, loan purpose, credit amount, employment status, etc. All nominal variables are transformed into binary variables, while ordinal and continuous variables are kept. The original dataset of 20 symbolic attributes is thus transformed into a numerical dataset with 26 numerical variables, including the attribute describing the installment rate as a percentage of disposable income, which is ignored in the numerical version of the dataset on UCI. All input attributes are linearly scaled to [0, 1] as in (12) to avoid the dominance of attributes with large numeric values over those with small values:

    x_{ij} = \frac{x_{ij} - \min_{k \in \{1,...,N\}} x_{kj}}{\max_{k \in \{1,...,N\}} x_{kj} - \min_{k \in \{1,...,N\}} x_{kj}}, \quad i = 1, ..., N, \; j = 1, ..., m.                      (12)

Let the number of creditworthy cases classified as good be denoted by GG and the number classified as bad by GB, and let the number of default cases classified as good be denoted by BG and the number classified as bad by BB. Three commonly used evaluation criteria measure the performance of the classification:

    Sensitivity (Se) = \frac{GG}{GG + GB} \times 100\%

    Specificity (Sp) = \frac{BB}{BG + BB} \times 100\%

    Percentage Correctly Classified (PCC) = \frac{GG + BB}{GG + GB + BG + BB} \times 100\%
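The preprocessing and the three criteria translate directly into code. The following sketch, under assumed array names and not tied to the authors' scripts, scales the attributes as in (12) and computes Se, Sp and PCC from the counts GG, GB, BG and BB.

import numpy as np

def min_max_scale(X):
    """Linearly scale every column of X to [0, 1] as in eq. (12)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (X - lo) / span

def se_sp_pcc(y_true, y_pred):
    """y_true / y_pred use +1 for good (creditworthy) and -1 for bad (default)."""
    GG = np.sum((y_true == 1) & (y_pred == 1))
    GB = np.sum((y_true == 1) & (y_pred == -1))
    BG = np.sum((y_true == -1) & (y_pred == 1))
    BB = np.sum((y_true == -1) & (y_pred == -1))
    se = 100.0 * GG / (GG + GB)
    sp = 100.0 * BB / (BG + BB)
    pcc = 100.0 * (GG + BB) / (GG + GB + BG + BB)
    return se, sp, pcc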
The experimental results of the two adaboost methods with different numbers of ensemble members are shown in Table 1. Each number is the average of 3 replications of 10-fold cross validation, with 200 epochs used to train each neural network and a 26-3-2 network architecture. To compare the adaboosting neural network models with single neural network models and with the traditional adaboost model, the average results of 3 replications of 10-fold cross validation for a single neural network with different numbers of neurons in the hidden layer and for the traditional adaboost model with a decision tree as the base classifier are shown in Table 2.
Table 1 The results obtained by the adaboosting neural network models on the German credit dataset with different numbers of members in the models

Number of    SWRAdNN                       WCFAdNN
members      Se      Sp      PCC           Se      Sp      PCC
10           86.24   47.22   74.53         86.43   52.44   76.23
30           87.90   47.00   75.63         87.48   63.41   76.73
50           88.10   47.78   76.00         87.62   51.89   76.90
Table 2 The results obtained by a single neural network with different architectures and by traditional adaboost with a decision tree as base classifier on the German credit dataset

BP neural network
Number of neurons
in hidden layer    Se      Sp      PCC
3                  84.81   46.44   73.30
10                 81.86   40.22   69.37
30                 84.38   37.22   70.23
50                 83.05   39.44   69.97

Adaboost           Se 92.48   Sp 28.44   PCC 73.27
Fig. 2 PCC of the four models (SWRAdNN, WCFAdNN, single NN, Adaboost) over the 30 groups of 3 replications of 10-fold cross validation
From the results it can be seen that, in terms of Sp and PCC, the two adaboost strategies perform better than a single BP neural network model and than the traditional adaboost model, and that WCFAdNN outperforms SWRAdNN on these two measures. Traditional adaboost classifies most instances as good, so it has good performance on Se but poor performance on Sp. The results in Table 1 mildly support the observation that more members in the adaboost ensemble help to increase performance. Although previous research finds
that accurate boosting models require a relatively large ensemble membership [14], whether this still holds when the number of members becomes very large remains an open question. Fig. 2 shows the PCC of SWRAdNN, WCFAdNN, the single BP neural network model with the best PCC, and the adaboost model on the 30 groups of testing data. In most cases the adaboosting neural networks are better than the single NN and than adaboost, which demonstrates the good robustness of the adaboosting NN.
4 Conclusion

In this study, we compared the performance of two different sample weighting strategies for adaboost and explored the effect of the architecture of the base neural network. The empirical results show that the adaboosting neural network models outperform a single neural network and the traditional adaboost with a decision tree as the base classifier. The idea of adaboost is to increase the weights of the instances misclassified in one round, but how this strategy affects the classifier's performance on the previously correctly classified instances is still unclear; the theory behind adaboosting neural networks therefore needs further investigation. In addition, the effect of the diversity of the ensemble members and the form of the weight updating function will be studied in future research.

Acknowledgments. This research is supported by a grant from City University of Hong Kong (Strategic Research Grant No. 7002253).
References

1. The Nilson Report. Oxnard, California (October 2003)
2. Thomas, L.C., Oliver, R.W., Hand, D.J.: A Survey of the Issues in Consumer Credit Modelling Research. Journal of the Operational Research Society 56, 1006-1015 (2005)
3. Thomas, L.C., Edelman, D.B., Crook, J.N.: Credit Scoring and Its Applications. SIAM, Philadelphia (2002)
4. Zhang, G.: Neural Networks for Classification: a Survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 30, 451-462 (2000)
5. Jensen, H.L.: Using Neural Networks for Credit Scoring. Managerial Finance 18, 15-26 (1992)
6. West, D.: Neural Network Credit Scoring Models. Computers & Operations Research 27, 1131-1152 (2000)
7. Baesens, B., Van Gestel, T., Stepanova, M., Van den Poel, D., Vanthienen, J.: Neural Network Survival Analysis for Personal Loan Data. Journal of the Operational Research Society 56, 1089-1098 (2005)
8. Bensic, M., Sarlija, N., Zekic-Susac, M.: Modelling Small-business Credit Scoring by Using Logistic Regression, Neural Networks and Decision Trees. International Journal of Intelligent Systems in Accounting, Finance & Management 13, 133-150 (2005)
9. Desai, V.S., Crook, J.N.: A Comparison of Neural Networks and Linear Scoring Models in the Credit Union Environment. European Journal of Operational Research 95, 24-37 (1996)
10. Wang, W., Xu, Z., Lu, J.: Three Improved Neural Network Models for Air Quality Forecasting. Engineering Computations 20, 192-210 (2003)
11. Ritchie, M., White, B., Parker, J., Hahn, L., Moore, J.: Optimization of Neural Network Architecture Using Genetic Programming Improves Detection and Modeling of Gene-gene Interactions in Studies of Human Diseases 4, 285-301 (2003)
12. Schwenk, H., Bengio, Y.: Boosting Neural Networks. Neural Computation 12, 1869-1887 (2000)
13. Tsai, C.-F., Wu, J.-W.: Using Neural Network Ensembles for Bankruptcy Prediction and Credit Scoring. Expert Systems with Applications 34, 2639-2649 (2008)
14. Breiman, L.: Prediction Games and Arcing Algorithms. Neural Computation 11, 1493-1517 (1999)
An Enterprise Evaluation of Reverse Supply Chain Based on Ant Colony BP Neural Network Ping Li, Xuhui Xia, and Zhengguo Dai
Abstract. The evaluation of reverse supply chain enterprises is an important foundation of reverse supply chain management. This paper analyses the factors that affect the comprehensive strength of a reverse supply chain enterprise and establishes an evaluation index system. On this basis, an evaluation model for reverse supply chain enterprises based on an ant colony BP neural network is built, and the evaluation process is described. A simulation example verifies the validity of the model. Keywords: Reverse supply chain, Evaluation index, BP neural network, Ant colony algorithm.
1 Introduction

With increasing environmental awareness, more attention is being paid to the reverse supply chain, which can be used to enhance the utilization of waste. However, the complexity of the reverse supply chain process means that a corporation lacking experience pays a much higher price to establish an effective reverse supply chain network. As a result, those who are not familiar

Ping Li · Xuhui Xia
College of Machinery and Automation, Wuhan University of Science and Technology, Wuhan, 430081, China
[email protected],
[email protected] Ping Li College of Mathematic and Information Science, Huanggang Normal University, Huanggang, 438000, China Zhengguo Dai Beijing Jixiang Digital Wave CO.,LTD, Beijing, 100101, China
[email protected] H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 885–891. c Springer-Verlag Berlin Heidelberg 2009 springerlink.com
with reverse logistics operations wish to outsource this part of the task to third-party professional reverse supply chain enterprises, and doing so has become a major trend in the business community. In this process there is an important question: how should the right reverse supply chain enterprise be chosen? The evaluation of an enterprise is not only complex but also nonlinear. A good evaluator may make a correct judgment from experience, but subjective error can never be completely avoided. We therefore use an artificial neural network to carry out the evaluation: on the one hand, the existing information can be used fully; on the other hand, subjective error can be avoided, so this is an effective evaluation method. The back-propagation algorithm for multi-layer networks (the BP neural network) is currently the most widely used artificial neural network, but the basic back-propagation algorithm suffers from long training times and from easily becoming trapped in local optima. In practical problems, other methods are therefore often combined with it to increase the convergence rate and avoid local optima. In this paper, the ant colony algorithm is adopted to optimize the initial weights of the back-propagation network in order to avoid local optima. The paper is organized as follows. First, we discuss the evaluation of reverse supply chain enterprises and some important indicators; then we design the basic structure of the BP network from the selected indexes and give the basic steps of the solution and the algorithm; later, we illustrate the algorithm with an example; finally, we present a summary and some further improvements to be addressed in future work.
2 Selected Important Factors of Enterprises

In order to compare different enterprises, we first choose some key factors that determine the qualification level of a reverse supply chain enterprise, and then perform further calculations on them. Based on an analysis of the reverse supply chain and taking into account the factors used in choosing logistics service providers, the following properties are taken as the key factors in choosing reverse supply chain companies. The first property of a company is the level of management, which includes the speed of recovery, inventory turns, the average efficiency of the dismantling and processing centre, and the proportion of managers. The second is the level of technology, which includes the proportion of technical personnel, the degree of advancement of equipment and technology, the ratio of scientific research cost, and the number of patents. The third property is the level of information, including the standardization of information, the speed of information processing and information security. The fourth is cost, including the cost of recovery, the transfer price of recycled products, information cost, distribution costs and opportunity cost. The fifth is transport capability, which includes the
vehicle condition, punctuality and applicability. The last important factor is experience and network coverage, including the employment history, the number of recycling centres and their locations. In the actual evaluation process, easily quantifiable factors should be chosen as evaluation indexes wherever possible, such as the inventory turnover rate, and a quantization criterion can be given for a particular indicator, for example using the failure rate of vehicles as the indicator of vehicle condition.
3 ACO Back-Propagation (BP) Network Model

Optimizing the network weights is very important in a BP network. If the initial weights are chosen randomly, the optimization is likely to end in a local optimum, so it is important to choose proper initial weights. The ant colony algorithm is an effective global optimization algorithm, and this paper uses it to find proper initial network weights.
3.1 Basic Idea

First, the ant colony algorithm is used to optimize the initial weights of the BP network and narrow the search space; the optimum is then obtained with the BP network in the narrowed solution space. The concrete steps are as follows: first, a certain number of initial weights are produced randomly; second, ACO is used to select the best values, and the optimum weights are obtained from these optimized initial weights with the BP algorithm; finally, the trained BP network is used to complete the appraisal of the reverse supply chain enterprises.
3.2 ACO-BP Network

Designing the Structure of the BP Network. Considering the size of the problem, a three-layer network is used in this paper, consisting of an input layer, a hidden layer and an output layer. The numbers of nodes are determined by the dimensions of the data. For example, there are twenty-two factors in the evaluation of a supply chain enterprise, so the corresponding input layer has twenty-two nodes; sometimes, to simplify the model, eight factors can be chosen and the number of corresponding input nodes is then eight. The number of output-layer nodes depends on the needs of the problem; since the comprehensive strength is to be expressed as a real number in the closed interval [0, 1], one output node is used. The hidden layer is more complex; it can be concluded that the efficiency is best when the number of hidden nodes is between 8 and
11 according to the method used in this paper; the number of hidden-layer nodes is therefore set to 11. According to the features of the problem, the S-shaped hyperbolic function is chosen as the transfer function of the hidden layer, and a hard-limit linear function as the transfer function of the output layer.

Design of the Ant Colony Algorithm. Assume that the performance function of the BP network is the mean square error function F(x), where the weight vector x satisfies x_min ≤ x ≤ x_max, x ∈ R^n. The problem then becomes the following optimization problem:

    \min J = F(x), \quad \text{subject to } x_{min} \le x \le x_{max}, \; x \in R^n.

The steps of the algorithm are as follows. First, for each network weight x_i (1 ≤ i ≤ m), N random non-zero candidate values are generated and form a set IP_i. The ant colony then sets out from the nest to find food; that is, every ant chooses one candidate weight from each set IP_i, so that together the chosen elements form one group of neural network weights. The number of ants is h, and τ_j(IP_i) denotes the pheromone of the j-th node in the set IP_i (1 ≤ i ≤ m). Different ants search the next node independently: every ant moves through the sets IP_i and chooses the next node according to the pheromone of every node in the current set, using the formula

    prob(\tau_j^k(IP_i)) = \frac{\tau_j(IP_i)}{\sum_{g=1}^{N} \tau_g(IP_i)}.
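As an illustration only (not the authors' code), the roulette-wheel choice implied by this probability can be written as follows, assuming a pheromone matrix tau of shape (m, N) with one row per weight set IP_i.

import numpy as np

def choose_candidates(tau, rng):
    """For each weight set IP_i, pick one candidate index j with probability
    tau[i, j] / sum_g tau[i, g]."""
    probs = tau / tau.sum(axis=1, keepdims=True)
    return np.array([rng.choice(tau.shape[1], p=p) for p in probs])

rng = np.random.default_rng(0)
# tau = np.full((m, N), c)   # initial pheromone, as in Step 1 of the algorithm below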
After an ant has chosen a node from every set, it has reached the food source. The pheromone of the elements in all sets is then adjusted and the process is repeated. When all the ants converge to the same route, or the given number of iterations is completed, the search ends. The main steps of the algorithm are as follows:

Step 1: Let t denote time and NC the iteration counter, and let NC_max be the maximum number of iterations. Set the pheromone of the nodes in every set to τ_j(IP_i)(t) = c with the pheromone increment Δτ_j(IP_i) = 0, and place every ant at the starting line.

Step 2: All the ants begin to search a path, each choosing nodes with probability

    prob(\tau_j^k(IP_i)) = \frac{\tau_j(IP_i)}{\sum_{g=1}^{N} \tau_g(IP_i)}.

Step 3: Repeat Step 2 until all the ants have reached the food source.

Step 4: Set t ← t + m and NC ← NC + 1. Compute the neural network output with the weights chosen by each ant, compare the errors, and record the current optimal solution. Then update the pheromone of each node according to the pheromone adjustment rule below.
The pheromone adjustment rule: as time increases, the remaining pheromone takes effect. The parameter ρ (0 ≤ ρ ≤ 1) represents the persistence of the pheromone and 1 − ρ represents its degree of volatilization. After m units of time the ants have arrived at the food source from the starting point, and the pheromone of every path is adjusted by the following formulas:

    \tau_j(IP_i)(t+1) = \rho\, \tau_j(IP_i)(t) + \Delta\tau_j(IP_i), \qquad \Delta\tau_j(IP_i) = \sum_{k=1}^{h} \Delta\tau_j^k(IP_i).

Here Δτ_j^k(IP_i) is the pheromone left in node P_j(IP_i) by the k-th ant in this iteration. It is computed by the following formula:

    \Delta\tau_j^k(IP_i) = \begin{cases} Q / e_k & \text{if the } k\text{-th ant passed the } j\text{-th node} \\ 0 & \text{otherwise.} \end{cases}

Here Q is a constant used to adjust the rate of change of the pheromone, and e_k = |O − O_q|, where O and O_q are the actual output and the expected output of the BP neural network, respectively. Obviously, the smaller the error, the faster the pheromone increases.

Step 5: If all the ants converge to the same path, or the number of iterations satisfies NC > NC_max, the iteration stops and the system outputs the best path; otherwise, return to Step 2.
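A short sketch of this update, again under assumed names rather than the authors' implementation, could look like the following; Q and rho are the constants introduced above, and errors[k] plays the role of e_k for the k-th ant.

import numpy as np

def update_pheromone(tau, chosen, errors, rho=0.7, Q=1.0):
    """Evaporate and reinforce pheromone.
    tau     : (m, N) pheromone matrix
    chosen  : (h, m) index of the node each ant picked in every set IP_i
    errors  : (h,)   e_k = |O - O_q| for each ant's weight vector
    """
    delta = np.zeros_like(tau)
    for k, path in enumerate(chosen):
        for i, j in enumerate(path):
            delta[i, j] += Q / max(errors[k], 1e-12)   # better ants deposit more pheromone
    return rho * tau + delta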
3.3 Computation Process and Specific Algorithm of the BP Network

The training of the ant colony BP network consists of two stages: first, the initial weights of the network are optimized with the ant colony algorithm; second, the weights are refined iteratively with the BP algorithm. The specific steps are as follows:
(1) determine the structure of the BP network;
(2) generate N codings as the vertex set S of the ant colony algorithm;
(3) decode the codings to obtain network weights, and use the error between the actual network output and the expected value as the fitness function to evaluate them;
(4) compute the paths of the ant colony using the fitness values;
(5) update the pheromone concentrations;
(6) repeat steps (3) to (5) with the updated parameters, iterating continually;
(7) check whether the termination condition is satisfied; if it is, go to (8), otherwise go to (4);
(8) use the optimal values obtained by the ant colony algorithm as the initial network weights, and train the network with the BP algorithm until the specified precision or number of iterations is reached.

4 A Simulation Example

Considering the complexity of the model, we assume that a customer enterprise considers the following eight indexes: inventory turnover, proportion of technicians, net asset value, return rate of assets, vehicle fault rate, on-time arrival rate, employment time and network coverage. This customer has seven candidate enterprises to choose from, as listed in Table 1. The concrete parameters of the ant colony BP network model are set as follows: in the BP network, the number of training samples is 4 (X1 to X4), the number of hidden nodes is 11, the training function is the one provided in the Matlab toolbox, the initial weights are random numbers in [-10, 10], and the maximum number of iterations is 1000; in the ant colony algorithm, the number of ants is 25 and the volatilization rate is 0.7. The training results of the ant colony BP network are given in Table 2.
Table 1 Integrated strength evaluation factors of the candidate enterprises

Number  Inventory   Skilled       Net capital      Return rate   Failure rate   On-time        Work    Network         Evaluation
        velocity    personnel     (ten thousand)   of assets     of vehicles    arrival rate   years   coverage rate   result
                    proportion
X1      15.80       0.103         1080.0           20.82         0.5            0.98           8       0.75            0.90
X2      11.29       0.069         350.0            9.85          1.7            0.87           3       0.30            0.67
X3      8.37        0.057         290.0            7.90          1.9            0.79           2       0.23            0.51
X4      12.54       0.083         550.0            17.36         1.1            0.92           5       0.46            0.76
X5      13.79       0.091         590.0            18.92         0.9            0.94           6       0.61
X6      12.95       0.088         440.0            17.04         1.2            0.92           6       0.57
X7      12.63       0.074         480.0            16.54         1.5            0.90           5       0.55

Table 2 Training and evaluation outputs of the ant colony BP network

                            X1       X2       X3       X4       X5       X6       X7
Evaluation objective ȳ      0.90     0.67     0.51     0.76
Output result y             0.9019   0.6408   0.5573   0.7412   0.8019   0.7181   0.7056
It can be concluded from the tables that the actual output of the model is very close to the expected output, so the model can evaluate reverse supply chain enterprises accurately.
5 Conclusion

In this paper, by taking advantage of the ant colony algorithm, the BP network can avoid local optima during optimization. Based on this work, we
evaluated the actual strength of several reverse supply chain enterprises and obtained very good results. Since the convergence of the BP network becomes very slow for large-scale problems, a further improvement would be to optimize its iteration rate with numerical methods or heuristic algorithms.

Acknowledgements. This paper and the authors are supported by the Hubei Province science and technology research project under Grant 2007AA101C45.
Ultrasonic Crack Size Estimation Based on Wavelet Neural Networks Yonghong Zhang, Lihua Wang, and Honglian Zhu*
Abstract. Pattern recognition can be used for crack size and type classification in ultrasonic nondestructive evaluation. This paper presents a novel approach, based on a wavelet neural network, for estimating crack size and crack location in a steel plate from ultrasonic signals. The feature indicators extracted from the collected ultrasonic signals include 14 signatures characterizing the signals in the time, frequency, and joint time-frequency domains. We develop a wavelet neural network model to estimate crack size and compare the results with a conventional BP neural network. Our data indicate that the wavelet neural network model performs better than the BP model. Keywords: Wavelet neural network, Ultrasonic, NDT, Estimation, Crack size.
1 Introduction

The ultrasonic nondestructive testing (NDT) technique is the most widely used method for detecting and identifying defects located in the walls of pipelines and petrol tanks. Conventional ultrasonic testing for industrial applications uses pulse-echo inspection techniques, which rely on the echo amplitude to size the crack [1]. Though simple and inexpensive, this approach suffers from poor resolution for crack sizing because echoes may be severely attenuated. Furthermore, the amplitude of the echoes may be influenced by factors such as surface roughness, particles in the specimen, transparency, and crack orientation. In recent years, artificial neural networks have been shown to be effective for defect classification and crack sizing [2-5]. In this paper, a method based on wavelet analysis and neural networks is used to realize the estimation and automatic classification of cracks in steel, involving input feature selection, wavelet transformation, and neural networks. In the following sections, we first describe the experimental data, and then the different crack echo signals are analyzed by multi-resolution decomposition using the wavelet transform.

Yonghong Zhang . Lihua Wang . Honglian Zhu
School of Information and Automation Engineering, Nanjing University of Information Science and Technology, Nanjing, 210044, China
[email protected] *
H. Wang et al. (Eds.): The Sixth ISNN 2009, AISC 56, pp. 893–899. springerlink.com © Springer-Verlag Berlin Heidelberg 2009
Features are extracted from the reconstructed wavelet coefficients, which represent the characteristics of the corresponding crack. The proposed neural network architecture and the training results are then introduced, and finally the performance of the proposed approach is summarized.
2 Experimental Data Collection

The experimental data were obtained using the pulse-echo inspection method with a 2.25 MHz angle-beam transducer. The experimental system, shown in Fig. 1, consists of the following main subsystems: the OmniScan UT, the Bi-slide linear positioning system, the fixture, the specimen, and the ultrasonic transducer. The cracks on the nine specimens were produced by electrical discharge machining (EDM) with the following depths (in mm): 0, 0.1, 0.3, 0.5, 1.0, 1.5, 2.0, 2.5, and 3.0.
Fig. 1 The experimental system setup
Fig. 2 Photos of experiment system
Two photographs of the experimental system are given in Fig. 2. In the experiment, as shown in Fig. 1, the transducer serves as both transmitter and receiver. The ultrasonic signal is transmitted into the specimen, propagates through it in a certain direction, and is altered by the back wall, the crack corner, and the crack tip if they lie in its path. The signal reflected by the crack is received by the same transducer.
3 Feature Selection

Selecting features of ultrasonic signals means capturing and quantifying information relevant to cracks of different sizes at different locations. Besides the peaks frequently used in ultrasonic waveform testing, meaningful signatures characterizing the signals in the time, frequency, and joint time-frequency domains can be extracted from the complete waveforms. These features may not be easy to interpret visually, but they are very useful for automatically identifying, sizing and locating flaws. Features can be extracted from both the time domain and the frequency domain. In the time domain, the features selected to describe an ultrasonic signal or pulse include statistics of the waveform amplitude, the pulse duration, and local and global rise and fall indexes. By applying the Fourier transform, a time-domain signal can be transferred into the frequency domain, where parameters and statistics of the power spectrum amplitude, such as structural parameters and local and global relative rise and fall indexes, can be extracted to characterize the signal. Researchers have used hundreds of features based on different signal processing techniques for classification purposes [2]; in reference [3], 69 features were used. For the classification in our experiments, we have selected the 14 features listed in Table 1. Normally, the peak amplitude of the radio frequency (RF) waveform in the time domain is used to identify the size of cracks, but this feature may be contaminated by factors such as the couplant between the transducer and the specimen and the pressure with which the transducer is applied to the specimen. To improve classification accuracy, we have therefore included features describing the shape of the complete waveform. These features are informative, especially when the crack depth is comparable to or smaller than the wavelength of the waves propagating in the solid. Some of the features listed in Table 1 are defined below, where s(t) is the signal, ŝ(t) its Hilbert transform (so that s(t) + jŝ(t) is the analytic signal whose magnitude gives the envelope), S(Ω) its spectrum, and E its energy:

    F4: mean of s(t) + j\hat{s}(t)                      (1)

    F5: variance of s(t) + j\hat{s}(t)                      (2)

    F8: T_0 = \frac{1}{E}\int t\, |s(t)|^2\, dt                      (3)

    F11: \Omega_0 = \frac{1}{E}\int_{-\infty}^{\infty} \Omega\, |S(\Omega)|^2\, d\Omega                      (4)

    F12: B = \frac{\pi}{E}\int_{-\infty}^{\infty} (\Omega - \Omega_0)^2\, |S(\Omega)|^2\, d\Omega                      (5)

    F13: skewness of S(\Omega)                      (6)

    F14: kurtosis of S(\Omega)                      (7)
Table 1 Features in the time and frequency domains for crack classification
Feature label   Feature description
F1              mean value of the normalized RF waveform amplitude values
F2              variance of the normalized RF waveform amplitude values
F3              energy in the time domain
F4              mean value of the normalized envelope function
F5              variance of the normalized envelope function
F6              rise time from 25% level to peak
F7              fall time from peak to 25% level
F8              time center
F9              pulse duration
F10             peak frequency
F11             center frequency
F12             band width
F13             skewness of magnitude spectrum
F14             kurtosis of magnitude spectrum
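To make the feature definitions concrete, the sketch below computes a few of them (F3, F4, F8 and F11) for one RF waveform with NumPy and SciPy. It is an illustrative reading of the formulas and table above under assumed sampling parameters, not the code used in the experiments.

import numpy as np
from scipy.signal import hilbert

def some_features(s, fs):
    """Compute F3, F4, F8 and F11 for a single RF waveform s sampled at fs Hz."""
    t = np.arange(len(s)) / fs
    envelope = np.abs(hilbert(s))               # |s(t) + j s_hat(t)|
    energy = np.sum(s ** 2)                     # F3: energy in the time domain
    f4 = np.mean(envelope / envelope.max())     # F4: mean of the normalized envelope
    f8 = np.sum(t * s ** 2) / energy            # F8: time center T_0, eq. (3)
    S = np.fft.rfft(s)
    freq = np.fft.rfftfreq(len(s), d=1.0 / fs)
    f11 = np.sum(freq * np.abs(S) ** 2) / np.sum(np.abs(S) ** 2)   # F11: center frequency, eq. (4)
    return energy, f4, f8, f11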
4 Wavelet Neural Networks

In this work, a wavelet neural network is used for classification. It can be considered an extended perceptron consisting of two parts, as shown in Fig. 3. The first part contains the so-called wavelet nodes, in which the classical sigmoidal functions are replaced by wavelet functions [2]; this part is used for transient detection and feature extraction. The classification is performed by the second part, a traditional single-layer or multi-layer perceptron. During the training stage, the wavelet neural network is able to learn arbitrarily complex decision regions defined by the weight coefficients, and at the same time it can look for the parts of the time-frequency plane that are best suited for a reliable classification of the input transients [2]. For each class of transients, a time-domain analysis has to be conducted to establish which mother wavelet best fits the transients of that class. The Morlet wavelet was chosen as the mother wavelet because it proved the most appropriate compared with the other mother wavelets available in the literature [4]. The Morlet wavelet can be expressed as:
Fig. 3 Wavelet neural network structure
    \psi\!\left(\frac{t-\tau}{a}\right) = \cos\!\left(\Omega_0 \times \frac{t-\tau}{a}\right) \exp\!\left(-\frac{1}{2}\left(\frac{t-\tau}{a}\right)^2\right)                      (8)
where Ω_0 is equal to 5.33 [4]. The output of the wavelet neural network is:

    y(k) = \sigma\!\left( \sum_{p=1}^{P} v_{pk}\, \sigma_p\!\left( \sum_{m=1}^{M} \omega_{mp} \sum_{n=1}^{N} x(n)\, \psi\!\left(\frac{t - \tau_m}{a_m}\right) \right) \right)                      (9)

where x(n) is the input and t is the sampling time. The number of input samples is N, the number of wavelet nodes is M, the number of hidden-layer nodes is P, and k indexes the outputs.
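The following sketch shows how the Morlet node of (8) and the forward pass of (9) could be written. The dilation and translation parameters a and tau, the weight matrices, and the sigmoid choice are assumed names and choices made only for illustration; the paper does not specify these details at code level.

import numpy as np

def morlet(u, omega0=5.33):
    """Morlet mother wavelet of eq. (8), evaluated at u = (t - tau) / a."""
    return np.cos(omega0 * u) * np.exp(-0.5 * u ** 2)

def wnn_forward(x, t, a, tau, W1, W2, sigma=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """Forward pass of eq. (9).
    x   : (N,)  input samples            a, tau : (M,) wavelet dilations / translations
    W1  : (M, P) wavelet-to-hidden weights      W2 : (P, K) hidden-to-output weights
    """
    wavelet_out = np.array([np.sum(x * morlet((t - tau[m]) / a[m]))
                            for m in range(len(a))])          # M wavelet nodes
    hidden = sigma(wavelet_out @ W1)                          # P hidden nodes
    return sigma(hidden @ W2)                                 # K outputs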
5 Results and Discussion

To show the potential performance of the proposed method, we trained the wavelet neural network on the chosen feature vectors [x(1), x(2), ..., x(14)], dividing the available experimental data into 80% for training, 10% for testing and 10% for validation. The performance of the proposed method was compared with that of a BP neural network; the two training curves are shown in Fig. 4. The same number of neurons was used in every layer, the only difference being the basic functioning of the two networks, as described in the previous sections. The BP neural network took longer to train than the wavelet neural network. The test and validation results of the wavelet neural network for crack size estimation are given in Fig. 5; it can be seen that the proposed model produces very good estimates of the crack size.
Fig. 4 Training results for wavelet neural network and BP neural network model
Fig. 5 Test data performance for wavelet neural network
In conclusion, data-driven approaches such as neural networks have found wide application in pattern recognition and regression. In this paper, a list of potential feature vectors was first prepared, and a wavelet neural network model was then trained on these data. Preliminary investigations using the ultrasonic data collected in the laboratory for different crack sizes in a steel plate specimen show promising results. Future work will include a deeper investigation of the intrinsic properties of the various neural network architectures, the incorporation of advanced signal processing techniques for feature generation, data fusion with features from different knowledge domains, and the application of neural network models to field data analysis.
Acknowledgement. This work was supported by Natural Science Foundation of University in Jiangsu Province.
References

1. Sahoo, A.K., Zhang, Y.H., Zuo, M.J.: Estimating Crack Size and Location in a Steel Plate Using Ultrasonic Signals and CFBP Neural Networks. In: Proceedings of IEEE Canadian Conference on Electrical and Computer Engineering, pp. 1571-1574 (2008)
2. Leopoldo, A., Pasquale, D., Massimo, D.A.: Wavelet Network-Based Detection and Classification of Transients. IEEE Transactions on Instrumentation and Measurement 50(5), 1425-1435 (2001)
3. Miao, C.X., Zhang, Y.H., Ming, J.Q., Zuo, J.: A SVM Classifier Combined with PCA for Ultrasonic Crack Size Classification. In: Proceedings of IEEE Canadian Conference on Electrical and Computer Engineering, pp. 1627-1630 (2008)
4. Liu, C.L., Wang, X.: New Adaptive Wavelet Neural Network for ECG Recognition. Journal of System Simulation 19(14), 3281-3282 (2007)
5. Yang, J., Cheng, J.C.: Determination of the Elastic Constants of a Composite Plate Using Wavelet Transforms and Neural Networks. Journal of Acoustic Society of America 113(4), 1245-1250 (2002)
Author Index
Ai, Xiaoyan 623 Ak¸ca, Haydar 153 Al-Hadidi, Mazin 779 Alejo, Roberto 421 Altmayer, Kumud Singh 153 ´ Alvarez, Ignacio 213, 411 Bao, Changchun 769 Bao, Shuping 59, 69 Bao, Zheng 401 Bi, Gexin 101 Bian, Jiawen 185 Campos, Javier Guillen 481 Cao, Wenming 433 Cao, Yan 613 Cao, Yang 677 Casa˜ n, Gustavo A. 421 Chacon-Murguia, Mario I. 311 Chan, Fabian H.P. 391 Chan, Shui-duen 255, 299 Chang, Hung-Teng 543 Chekima, A. 391 Chen, Feng 401 Chen, Guangju 749 Chen, Ling 165 Chen, Pin-Chang 543 Chen, Shuiduen 379 Chen, Songcan 441 Chen, Tianding 369 Chen, Yu-Ju 205, 799 Chen, Yuanling 571, 759 Cheng, Quanxin 39 Cheng, Wei 631
Chi, Xiaoni 605 Covachev, Val´ery
153
Dai, Zhengguo 885 Dong, Bin 255, 379 Dong, Fang 101 Du, Jingyi 235 Er, Meng Joo
121
Fan, Binbin 597 Feng, Ding 721 Fierro, Lucina Cordoba Fu, Xuezhi 659
311
Gan, Woonseng 535 Gao, Daqi 441 Ge, Fengpei 255, 379 Gong, Qingwu 111 Gong, Sheng-yu 81 G´ orriz, Juan M. 213, 411 Govindarajan, Tholkappia Arasu Gu, Tianxiang 343 Guo, Weizhao 175 Guo, Xiaodong 597 Guo, XiaoPing 267 Han, Tailin 749 Hao, Feng 433 He, Jingjing 197 He, Yulin 759 Hou, Cheng-I 817 Hou, Qingyu 401 Hu, Biyun 291 Hu, Liangming 517
779
902 Hu, Shigeng 91 Hu, Tiesong 649 Hu, Xia 291 Hu, Yong 175 Huang, Bing 649 Huang, Chaoqun 323 Huang, Chih-Chien 205 Huang, Dandan 829 Huang, Huang-Chu 205, 799 Huang, Wencong 829 Huang, Xiaoli 351 Huang, Zhe 711 Hui, Xiaofeng 283 Hwang, Rey-Chue 205, 799 Jiang, Dongxiang 847 Jones, Samia 131 Kaitwanidvilai, Somyot 669 Kalaiarasi, S.M.A. 391 Kao, Yonggui 59 Kaur, Prabhdeep 807 Kendrick, Am´ezquita 507 Lai, Kin Keung 875 Lan, Tian-Syung 817 Li, Bo 491 Li, Changxi 579 Li, Fengjun 527 Li, Hongru 857 Li, Hongwei 185 Li, Ju 597 Li, Lin 291 Li, Minyong 7 Li, Ping 885 Li, Shangping 759 Li, Ta 769 Li, Tingjun 1, 729 Li, Wen-Rong 491 Li, Xiaodong 535 Li, Xiaoling 343 Li, Yadan 739 Li, Yuan 267 Li, Zongkun 517 Liang, Shi 759 Licon-Trillo, Angel 311 Lin, Feng 31 Lin, Zhiling 857 Liu, Changliang 255, 379 Liu, Chao 847
Author Index Liu, Desheng 275 Liu, Fan 121 Liu, Hong 749 Liu, Hongwei 401 Liu, Huanbin 641 Liu, Jin Ping 333 Liu, June 137, 143, 641 Liu, Mei 175 Liu, Mingyuan 587 Liu, Ning 613 Liu, Shuguang 587 Liu, Suyi 563 Liu, Xian-rong 81 Liu, Xiaoying 343 Liu, Zhong 7 Lo, Chih-Yao 543, 817 Long, Weiren 571 L´ opez, Miriam 213, 411 Lu, Xiaochun 19 Luo, Zhigao 597 Ma, Fanglan 571, 759 Ma, Hongwei 551 Ma, Lujuan 677 Mart´ınez, Jaime Pacheco 481 Martinez-Ibarra, J. Alejandro 311 Mendoza-Vida˜ na, Oscar 311 Moein, Mahsa 359 Moein, Sara 359 Nevarez-Aguilar, Mitzy 311 Niyoyita, Jean Paul 333 Olranthichachat, Piyapong Pan, Fuping 255, 379 Pan, Jielin 299, 769 Peng, Huiming 185 Peng, Jin 605, 837 Peng, Pengfei 659 Puntonet, Carlos Garc´ıa
669
213, 411
Qi, Jianxun 49 Qian, Zhiming 323 Qiu, Jianlong 39 Qu, Zhiming 685, 693 Ram´ırez, Javier 213, 411 Rao, Yunhua 677 Rubio Avila, Jos´e de Jes´ us
481
Author Index Salas-Gonzalez, Diego 213, 411 Salom´ on, Montenegro 507 Saraee, Mohamad Hossein 359 Segovia, Fermin 213, 411 Shen, Pan 829 Shen, Yidong 223 Shi, Qingjun 275 Shuai, Haiyan 111 Shukla, Anupam 807 Sitiol, Augustina 391 Solis-Martinez, Francisco J. 311 Song, Guoming 749 Songram, Panida 739 Sotoca, Jose M. 421 Srithongchai, Pitsanu 669 Su, Bai 223 Sung, Chun-Yi 469 Tan, Bianyou 721 Tang, Da-wei 81 Tang, Zhao Hui 333 Tang, Ruili 829 Tiwari, Ritu 807 Tsai, Cheng-Fa 469 Tsai, Sung-Ning 799 V., Palanisamy 779 Valdovinos, R.M. 421 Wan, Feng 461 Wan, Qian 563 Wang, Jun 291 Wang, Kejing 197 Wang, Li 267 Wang, Lihua 893 Wang, Mei 235 Wang, Peng 721 Wang, Qiang 49 Wang, Xiang 597 Wang, Xiaomei 451 Wang, Xinzheng 31 Wang, Zhe 441 Wang, Zhenqi 701 Wei, Hui 451 Wen, Peina 791 Weng, Pin-Hsuan 205 Wong, Chiman 461 Wu, Juanjuan 865 Wu, Jun 111 Wu, Weibing 143
903 Xi, Ziqiang 829 Xia, Xuhui 885 Xie, Dong 579 Xie, Kang 175 Xing, Jing 185 Xing, Jun 659 Xing, Liying 31 Xu, Dan 323 Xu, Dongsheng 623 Xu, Guanjun 721 Xu, Jianhua 245 Xu, Jin 711 Xu, Shuxiang 165 Xu, Wei 223 Xu, Weiqun 769 Xu, Xu 739 Xu, Yong 91 Xu, Zhangyan 143 Xu, Zhiru 275 Yan, Yonghong 255, 299, 379, 769 Yang, Cheng 721 Yang, Jen-Pin 799 Yang, Lina 613 Yang, Xiaobo 175 Yang, Yanli 613 Yin, Jian 175 Yin, Weiming 711 Yuan, Jimin 343 Yuan, Jingling 197 Yuan, Yongxin 721 Yue, Youjun 857 Zeng, Huanglin 351 Zhang, Baolei 571 Zhang, Chao 267 Zhang, Heng 563 Zhang, Leilei 283 Zhang, Meng 7 Zhang, Qiang 865 Zhang, Qingfang 137 Zhang, Qingpeng 501 Zhang, Qingqing 299 Zhang, Qizhi 535 Zhang, Xiang 649 Zhang, Xiaofeng 865 Zhang, Xiaorui 7 Zhang, Yonghong 893
904 Zhang, Zhaotong 551 Zhang, Zhiyong 791 Zhang, Zhousuo 631 Zhao, Cuncheng 677 Zhao, Fengyao 517 Zhao, Hui 857 Zhao, Jinli 369 Zhao, Minghao 847 Zheng, Lizhi 829 Zheng, Yue 641 Zhong, Luo 197
Author Index Zhou, Jian-lan 81 Zhou, Jingguo 275 Zhou, Ligang 875 Zhou, Xiaoning 631 Zhou, Xin 245 Zhou, Yali 535 Zhou, Yiming 291 Zhou, Yu 701 Zhu, Honglian 893 Zhu, Song 91 Zhu, Xinhua 255