Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen University of Dortmund, Germany Madhu Sudan Massachusetts Institute of Technology, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Moshe Y. Vardi Rice University, Houston, TX, USA Gerhard Weikum Max-Planck Institute of Computer Science, Saarbruecken, Germany
4684
Lishan Kang Yong Liu Sanyou Zeng (Eds.)
Evolvable Systems: From Biology to Hardware 7th International Conference, ICES 2007 Wuhan, China, September 21-23, 2007 Proceedings
Volume Editors

Lishan Kang, China University of Geosciences, School of Computer Science, Wuhan, Hubei 430074, China, E-mail: kangw [email protected]

Yong Liu, The University of Aizu, Tsuruga Ikki-machi, Aizu-Wakamatsu City, Fukushima 965-8580, Japan, E-mail: [email protected]

Sanyou Zeng, China University of Geosciences, School of Computer Science, Wuhan, Hubei 430074, China, E-mail: [email protected]
Library of Congress Control Number: 2007933938
CR Subject Classification (1998): B.6, B.7, F.1, I.6, I.2, J.2, J.3
LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues
ISSN 0302-9743
ISBN-10 3-540-74625-0 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-74625-6 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media springer.com © Springer-Verlag Berlin Heidelberg 2007 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12115266 06/3180 543210
Preface
We are proud to introduce the proceedings of the 7th International Conference on Evolvable Systems: From Biology to Hardware (ICES 2007), held in Wuhan, China, September 21–23, 2007. ICES 2007 attracted 123 submissions. After rigorous reviews, 41 high-quality papers were included in the proceedings, representing an acceptance rate of 33%. The ICES conferences form the first series of international conferences on evolvable systems. The idea of evolvable systems, whose origins can be traced back to the cybernetics movement of the 1940s and 1950s, has recently led to bio-inspired systems with self-reproduction or self-repair of the original hardware structures, and to evolvable hardware with autonomous reconfiguration of hardware structures by evolutionary algorithms. Following the workshop Towards Evolvable Hardware, held in Lausanne, Switzerland, in October 1995, the 1st International Conference on Evolvable Systems: From Biology to Hardware (ICES 1996) was held in Tsukuba, Japan. Subsequent ICES conferences were held in Lausanne, Switzerland (1998), Edinburgh, UK (2000), Tokyo, Japan (2001), Trondheim, Norway (2003), and Barcelona, Spain (2005), where it was decided that China University of Geosciences, Wuhan, would be the location of ICES 2007, with Lishan Kang as the General Chair. ICES 2007 addressed the theme "From Laboratory to Real World" by examining how to bridge the gap between evolvable hardware research and design for real-world applications in semiconductor engineering and mechanical engineering. ICES 2007 featured the most up-to-date research and applications in digital hardware evolution, analog hardware evolution, bio-inspired systems, mechanical hardware evolution, evolutionary algorithms in hardware design, and hardware implementations of evolutionary algorithms.
ICES 2007 also provided a venue to foster technical exchanges, renew friendships, establish new connections, and present Chinese cultural traditions, helping to overcome cultural barriers. On behalf of the Organizing Committee, we would like to warmly thank the sponsors, China University of Geosciences and the Chinese Society of Astronautics, who helped in one way or another to achieve our goals for the conference. We wish to express our appreciation to Springer for publishing the proceedings of ICES 2007 in the Lecture Notes in Computer Science series. We would also like to thank the authors for submitting their work, as well as the Program Committee members and reviewers for their enthusiasm, time, and expertise. The invaluable help of active members of the Organizing Committee, including Xuesong Yan, Qiuming Zhang, Yan Guo, Siqing Xue, Ziyi Chen, Xiang Li, Guang Chen, Rui Wang, Hui Wang, and Hui Shi, in setting up and maintaining the online submission systems, assigning the papers to the reviewers, and
preparing the camera-ready version of the proceedings was highly appreciated, and we would like to thank them personally for their efforts to make ICES 2007 a success.

September 2007
Lishan Kang Yong Liu Sanyou Zeng
Organization
ICES 2007 was organized by the School of Computer Science and the Research Center for Space Science and Technology, China University of Geosciences, and sponsored by China University of Geosciences and the Chinese Society of Astronautics.
Honorary Conference Chair
Yanxin Wang, China University of Geosciences, China

General Chair
Lishan Kang, China University of Geosciences, China

Program Chair
Yong Liu, University of Aizu, Japan
Tetsuya Higuchi, National Institute of Advanced Industrial Science and Technology, Japan

Local Chair
Sanyou Zeng, China University of Geosciences, China
Program Committee
Elhadj Benkhelifa, University of the West of England, UK
Peter J. Bentley, University College London, UK
Stefano Cagnoni, Università degli Studi di Parma, Italy
Carlos A. Coello Coello, Depto. de Computación, Mexico
Peter Dittrich, Friedrich Schiller University, Germany
Marco Dorigo, Université Libre de Bruxelles, Belgium
Rolf Drechsler, University of Bremen, Germany
Marc Ebner, Universität Würzburg, Germany
Manfred Glesner, Darmstadt University, Germany
Darko Grundler, University of Zagreb, Croatia
Pauline C. Haddow, The Norwegian University of Science and Technology, Norway
Alister Hamilton, Edinburgh University, UK
Morten Hartmann, Norwegian University of Science and Technology, Norway
Jingsong He, University of Science and Technology of China, China
Arturo Hernandez Aguirre, Tulane University, USA
Francisco Herrera, University of Granada, Spain
Tetsuya Higuchi, National Institute of Advanced Industrial Science and Technology, Japan
Masaya Iwata, National Institute of Advanced Industrial Science and Technology, Japan
Yaochu Jin, Honda Research Institute Europe, Germany
Didier Keymeulen, Jet Propulsion Laboratory, USA
Jason Lohn, NASA Ames Research Center, USA
Michael Lones, Department of Electronics, University of York, UK
Wenjian Luo, University of Science and Technology of China, China
Juan Manuel Moreno Arostegui, Technical University of Catalonia (UPC), Spain
Karlheinz Meier, University of Heidelberg, Germany
Julian Miller, Department of Electronics, University of York, UK
Masahiro Murakawa, National Institute of Advanced Industrial Science and Technology, Japan
Michael Orlov, Ben-Gurion University, Israel
Marek Perkowski, Portland State University, USA
Eduardo Sanchez, Logic Systems Laboratory, Switzerland
Lukas Sekanina, Brno University of Technology, Czech Republic
Moshe Sipper, Ben-Gurion University, Israel
Adrian Stoica, Jet Propulsion Lab, USA
Kiyoshi Tanaka, Shinshu University, Japan
Gianluca Tempesti, University of York, UK
Christof Teuscher, University of California, San Diego (UCSD), USA
Yann Thoma, École d'ingénieurs de Genève, Switzerland
Adrian Thompson, University of Sussex, UK
Jon Timmis, University of York, UK
Jim Torresen, University of Oslo, Norway
Jochen Triesch, J.W. Goethe University, Germany
Edward Tsang, University of Essex, UK
Gunnar Tufte, The Norwegian University of Science and Technology, Norway
Andy Tyrrell, University of York, UK
Youren Wang, Nanjing University of Aeronautics and Astronautics, China
Xin Yao, University of Birmingham, UK
Ricardo Zebulum, Jet Propulsion Lab, USA
Sanyou Zeng, China University of Geosciences, China
Qingfu Zhang, University of Essex, UK
Shuguang Zhao, Xidian University, China
Steering Committee
Pauline C. Haddow, The Norwegian University of Science and Technology, Norway
Tetsuya Higuchi, National Institute of Advanced Industrial Science and Technology, Japan
Julian F. Miller, University of Birmingham, UK
Jim Torresen, University of Oslo, Norway
Andy Tyrrell (Chair), University of York, UK
Table of Contents
Digital Hardware Evolution

An Online EHW Pattern Recognition System Applied to Sonar Spectrum Classification . . . . . . 1
Kyrre Glette, Jim Torresen, and Moritoshi Yasunaga

Design of Electronic Circuits Using a Divide-and-Conquer Approach . . . . . . 13
Guoliang He, Yuanxiang Li, Li Yu, Wei Zhang, and Hang Tu

Implementing Multi-VRC Cores to Evolve Combinational Logic Circuits in Parallel . . . . . . 23
Jin Wang, Chang Hao Piao, and Chong Ho Lee

An Intrinsic Evolvable Hardware Based on Multiplexer Module Array . . . . . . 35
Jixiang Zhu, Yuanxiang Li, Guoliang He, and Xuewen Xia

Estimating Array Connectivity and Applying Multi-output Node Structure in Evolutionary Design of Digital Circuits . . . . . . 45
Jie Li and Shitan Huang

Research on the Online Evaluation Approach for the Digital Evolvable Hardware . . . . . . 57
Rui Yao, You-ren Wang, Sheng-lin Yu, and Gui-jun Gao

Research on Multi-objective On-Line Evolution Technology of Digital Circuit Based on FPGA Model . . . . . . 67
Guijun Gao, Youren Wang, Jiang Cui, and Rui Yao

Evolutionary Design of Generic Combinational Multipliers Using Development . . . . . . 77
Michal Bidlo
Analog Hardware Evolution

Automatic Synthesis of Practical Passive Filters Using Clonal Selection Principle-Based Gene Expression Programming . . . . . . 89
Zhaohui Gan, Zhenkun Yang, Gaobin Li, and Min Jiang

Research on Fault-Tolerance of Analog Circuits Based on Evolvable Hardware . . . . . . 100
Qingjian Ji, Youren Wang, Min Xie, and Jiang Cui

Analog Circuit Evolution Based on FPTA-2 . . . . . . 109
Qiongqin Wu, Yu Shi, Juan Zheng, Rui Yao, and Youren Wang
Bio-inspired Systems

Knowledge Network Management System with Medicine Self Repairing Strategy . . . . . . 119
JeongYon Shim

Design of a Cell in Embryonic Systems with Improved Efficiency and Fault-Tolerance . . . . . . 129
Yuan Zhang, Youren Wang, Shanshan Yang, and Min Xie

Design on Operator-Based Reconfigurable Hardware Architecture and Cell Circuit . . . . . . 140
Min Xie, Youren Wang, Li Wang, and Yuan Zhang

Bio-inspired Systems with Self-developing Mechanisms . . . . . . 151
André Stauffer, Daniel Mange, Joël Rossier, and Fabien Vannel

Development of a Tiny Computer-Assisted Wireless EEG Biofeedback System . . . . . . 163
Haifeng Chen, Ssanghee Seo, Donghee Ye, and Jungtae Lee

Steps Forward to Evolve Bio-inspired Embryonic Cell-Based Electronic Systems . . . . . . 174
Elhadj Benkhelifa, Anthony Pipe, Mokhtar Nibouche, and Gabriel Dragffy

Evolution of Polymorphic Self-checking Circuits . . . . . . 186
Lukas Sekanina
Mechanical Hardware Evolution

Sliding Algorithm for Reconfigurable Arrays of Processors . . . . . . 198
Natalia Dowding and Andy M. Tyrrell

System-Level Modeling and Multi-objective Evolutionary Design of Pipelined FFT Processors for Wireless OFDM Receivers . . . . . . 210
Erfu Yang, Ahmet T. Erdogan, Tughrul Arslan, and Nick Barton

Reducing the Area on a Chip Using a Bank of Evolved Filters . . . . . . 222
Zdenek Vasicek and Lukas Sekanina
Evolutionary Design

Walsh Function Systems: The Bisectional Evolutional Generation Pattern . . . . . . 233
Nengchao Wang, Jianhua Lu, and Baochang Shi

Extrinsic Evolvable Hardware on the RISA Architecture . . . . . . 244
Andrew J. Greensted and Andy M. Tyrrell

Evolving and Analysing "Useful" Redundant Logic . . . . . . 256
Asbjoern Djupdal and Pauline C. Haddow

Adaptive Transmission Technique in Underwater Acoustic Wireless Communication . . . . . . 268
Guoqing Zhou and Taebo Shim

Autonomous Robot Path Planning Based on Swarm Intelligence and Stream Functions . . . . . . 277
Chengyu Hu, Xiangning Wu, Qingzhong Liang, and Yongji Wang

Research on Adaptive System of the BTT-45 Air-to-Air Missile Based on Multilevel Hierarchical Intelligent Controller . . . . . . 285
Yongbing Zhong, Jinfu Feng, Zhizhuan Peng, and Xiaolong Liang

The Design of an Evolvable On-Board Computer . . . . . . 292
Chen Shi, Shitan Huang, and Xuesong Yan
Evolutionary Algorithms in Hardware Design

Extending Artificial Development: Exploiting Environmental Information for the Achievement of Phenotypic Plasticity . . . . . . 297
Gunnar Tufte and Pauline C. Haddow

UDT-Based Multi-objective Evolutionary Design of Passive Power Filters of a Hybrid Power Filter System . . . . . . 309
Shuguang Zhao, Qiu Du, Zongpu Liu, and Xianghe Pan

Designing Electronic Circuits by Means of Gene Expression Programming II . . . . . . 319
Xuesong Yan, Wei Wei, Qingzhong Liang, Chengyu Hu, and Yuan Yao

Designing Polymorphic Circuits with Evolutionary Algorithm Based on Weighted Sum Method . . . . . . 331
Houjun Liang, Wenjian Luo, and Xufa Wang

Robust and Efficient Multi-objective Automatic Adjustment for Optical Axes in Laser Systems Using Stochastic Binary Search Algorithm . . . . . . 343
Nobuharu Murata, Hirokazu Nosato, Tatsumi Furuya, and Masahiro Murakawa

Minimization of the Redundant Sensor Nodes in Dense Wireless Sensor Networks . . . . . . 355
Dingxing Zhang, Ming Xu, Wei Xiao, Junwen Gao, and Wenshen Tang

Evolving in Extended Hamming Distance Space: Hierarchical Mutation Strategy and Local Learning Principle for EHW . . . . . . 368
Jie Li and Shitan Huang
Hardware Implementation of Evolutionary Algorithms

Adaptive and Evolvable Analog Electronics for Space Applications . . . . . . 379
Adrian Stoica, Didier Keymeulen, Ricardo Zebulum, Mohammad Mojarradi, Srinivas Katkoori, and Taher Daud

Improving Flexibility in On-Line Evolvable Systems by Reconfigurable Computing . . . . . . 391
Jim Torresen and Kyrre Glette

Evolutionary Design of Resilient Substitution Boxes: From Coding to Hardware Implementation . . . . . . 403
Nadia Nedjah and Luiza de Macedo Mourelle

A Sophisticated Architecture for Evolutionary Multiobjective Optimization Utilizing High Performance DSP . . . . . . 415
Quanxi Li and Jingsong He

FPGA-Based Genetic Algorithm Kernel Design . . . . . . 426
Xunying Zhang, Chen Shi, and Fei Hui

Using Systolic Technique to Accelerate an EHW Engine for Lossless Image Compression . . . . . . 433
Yunbi Chen and Jingsong He

Author Index . . . . . . 445
An Online EHW Pattern Recognition System Applied to Sonar Spectrum Classification

Kyrre Glette1, Jim Torresen1, and Moritoshi Yasunaga2

1 University of Oslo, Department of Informatics, P.O. Box 1080 Blindern, 0316 Oslo, Norway, {kyrrehg,jimtoer}@ifi.uio.no
2 University of Tsukuba, Graduate School of Systems and Information Engineering, 1-1-1 Ten-ou-dai, Tsukuba, Ibaraki, Japan, [email protected]
Abstract. An evolvable hardware (EHW) system for high-speed sonar return classification has been proposed. The system demonstrates an average accuracy of 91.4% on a sonar spectrum data set. This is better than a feed-forward neural network and previously proposed EHW architectures. Furthermore, this system is designed for online evolution. Incremental evolution, data buses and high level modules have been utilized in order to make the evolution of the 480 bit-input classifier feasible. The classification has been implemented for a Xilinx XC2VP30 FPGA with a resource utilization of 81% and a classification time of 0.5μs.
1 Introduction
High-speed pattern recognition systems applied in time-varying environments, and thus needing adaptability, could benefit from an online evolvable hardware (EHW) approach [1]. One EHW approach to online reconfigurability is the Virtual Reconfigurable Circuit (VRC) method proposed by Sekanina in [2]. This method does not change the bitstream to the FPGA itself; rather, it changes the register values of a circuit already implemented on the FPGA, thus obtaining virtual reconfigurability. This approach has a speed advantage over reconfiguring the FPGA itself, and it is also more feasible because proprietary formats prevent direct FPGA bitstream manipulation. However, the method requires considerable logic resources. An EHW pattern recognition system, Logic Design using Evolved Truth Tables (LoDETT), has been presented by Yasunaga et al. Applications include face image and sonar target recognition [3,4]. This architecture is capable of classifying large input vectors (512 bits) into several categories. The classifier function is directly coded in large AND gates. The category module with the highest number of activated AND gates determines the classification. Incremental evolution is utilized such that each category is evolved separately. The average recognition accuracy for this system, applied to the sonar target task, is 83.0%. However, evolution is performed offline and the final system is synthesized.

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 1–12, 2007. © Springer-Verlag Berlin Heidelberg 2007

This approach
gives rapid (< 150ns) classification in a compact circuit, but lacks run-time reconfigurability. A system proposed earlier by the authors addresses the reconfigurability by employing a VRC-like array of high-level functions [5]. Online/on-chip evolution is attained, and therefore the system seems suited to applications with changes in the training set. However, the system is limited to recognizing one category out of ten possible input categories. A new architecture was then proposed by the authors to allow for the high classification capabilities of the LoDETT system, while maintaining the online evolution features from [5]. This was applied to multiple-category face image recognition and a slightly higher recognition accuracy than the LoDETT system was achieved [6]. While in LoDETT a large number of inputs to the AND gates can be optimized away during circuit synthesis, the run-time reconfiguration aspect of the online architecture has led to a different approach employing fewer elements. The evolution part of this system has been implemented on an FPGA in [7]. Fitness evaluation is carried out in hardware, while the evolutionary algorithm runs on an on-chip processor. In this paper the architecture, previously applied to face image recognition, has been applied to the sonar target recognition task. The nature of this application has led to differences in the architecture parameters. Changes in the fitness function were necessary to deal with the higher difficulty of this problem. The sonar target dataset was presented by Gorman and Sejnowski in [8]. A feed-forward neural network was presented, which contained 12 hidden units and was trained using the back-propagation algorithm. A classification accuracy of 90.4% was reported. Later, better results have been achieved on the same data set, using variants of the Support Vector Machine (SVM) method. An accuracy of 95.2% was obtained in a software implementation presented in [9]. 
There also exist some hardware implementations of SVMs; for example, [10] performs biometric classification in 0.66 ms using an FPGA. The next section introduces the architecture of the evolvable hardware system. Then, the sonar return-specific implementation is detailed in section 3. Aspects of evolution are discussed in section 4. Results from the experiments are given and discussed in section 5. Finally, section 6 concludes the paper.
2 The Online EHW Architecture
The EHW architecture is implemented as a circuit whose behaviour and connections can be controlled through configuration registers. By writing the genome bitstream from the genetic algorithm (GA) to these registers, one obtains the phenotype circuit which can then be evaluated. This approach is related to the VRC technique, as well as to the architectures in our previous works [11,5].

2.1 System Overview
A high-level view of the system can be seen in figure 1. The system consists of three main parts – the classification module, the evaluation module, and the
CPU. The classification module operates stand-alone except for its reconfiguration, which is carried out by the CPU. In a real-world application one would imagine some preprocessing module providing the input pattern and possibly some software interpretation of the classification result. The evaluation module operates in close cooperation with the CPU for the evolution of new configurations. The evaluation module accepts a configuration bitstring, also called a genome, and calculates its fitness value. This information is in turn used by the CPU for running the rest of the GA. The evaluation module has been implemented and described in detail in [7].

Fig. 1. High level system view

Fig. 2. EHW classification module view

2.2 Classification Module Overview
The classifier system consists of K category detection modules (CDMs), one for each category Ci to be classified – see figure 2. The input data to be classified is presented to each CDM concurrently on a common input bus. The CDM with the highest output value will be detected by a maximum detector, and the identifying number of this category will be output from the system. Alternatively, the system could also state the degree of certainty of a certain category by taking the output of the corresponding CDM and dividing by the maximum possible output. In this way, the system could also propose alternative categories in case of doubt.

2.3 Category Detection Module
Each CDM consists of M "rules" or functional unit (FU) rows – see figure 3. Each FU row consists of N FUs. The inputs to the circuit are passed on to the inputs of each FU. The 1-bit outputs from the FUs in a row are fed into an N-input AND gate. This means that all outputs from the FUs must be 1 in order for a rule to be activated. The 1-bit outputs from the AND gates are connected to a counter which counts the number of activated FU rows.
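The counting-and-voting behaviour just described can be mimicked in software. The following is a simplified behavioural sketch, not the hardware implementation; modelling each FU abstractly as a predicate on the input pattern is an assumption made here for illustration.

```python
# Behavioural model of the classifier: K category detection modules (CDMs),
# each with M FU rows of N functional units (FUs). Each FU is modelled
# abstractly as a predicate on the input pattern; its concrete form is
# set by the evolved configuration.

def row_activated(fu_row, pattern):
    # The N-input AND gate: a row fires only if every FU in it outputs 1.
    return all(fu(pattern) for fu in fu_row)

def cdm_output(fu_rows, pattern):
    # The counter: number of activated FU rows for this category.
    return sum(row_activated(row, pattern) for row in fu_rows)

def classify(cdms, pattern):
    # The maximum detector: index of the CDM with the highest count
    # (ties resolve to the lowest index in this model).
    scores = [cdm_output(rows, pattern) for rows in cdms]
    return scores.index(max(scores))
```

The degree of certainty mentioned above would correspond to dividing `cdm_output` by the number of rows M in that CDM.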
As the number of FU rows is increased, so is the output resolution from each CDM. Each FU row is evolved from an initial random bitstream, which ensures a variation in the evolved FU rows. To draw a parallel to the LoDETT system, each FU row represents a kernel function. More FU rows give more kernel functions (with different centers) that the unknown pattern can fall into.

Fig. 3. Category detection module. N functional units are connected to an N-input AND gate.

Fig. 4. Functional unit. The data MUX selects which of the input data to feed to the functions f1 and f2. The f MUX selects which of the function results to output.

2.4 Functional Unit
The FUs are the reconfigurable elements of the architecture. This section describes the FU in a general way, and section 3.2 will describe the applicationspecific implementation. As seen in figure 4, each FU behavior is controlled by configuration lines connected to the configuration registers. Each FU has all input bits to the system available at its inputs, but only one data element (e.g. one byte) of these bits is chosen. One data element is thus selected from the input bits, depending on the configuration lines. This data is then fed to the available functions. Any number and type of functions could be imagined, but for clarity, in figure 4 only two functions are illustrated. The choice of functions for the sonar classification application will be detailed in section 3.1. In addition, the unit is configured with a constant value, C. This value and the input data element are used by the function to compute the output from the unit. The advantage of selecting which inputs to use, is that connection to all inputs is not required. A direct implementation of the LoDETT system [4] would have required, in the sonar case, N = 60 FUs in a row. Our system typically uses N = 6 units. The rationale is that not all of the inputs are necessary for the pattern recognition. This is reflected in the don’t cares evolved in [4].
3 Implementation
This section describes the sonar return classification application and the corresponding application-specific implementation of the FU module. The evaluation module, which contains one FU row and calculates fitness based on the training vectors, is in principle equal to the description in [7] and will not be further described in this paper.

3.1 Sonar Return Classification
The application data set has been found in the CMU Neural Networks Benchmark Collection1, and was first used by Gorman and Sejnowski in [8]. This is a real-world data set consisting of sonar returns from underwater targets of either a metal cylinder or a similarly shaped rock. The number of CDMs in the system then becomes K = 2. The returns have been collected from different aspect angles and preprocessed, based on experiments with human listeners, such that the input signals are spectral envelopes containing 60 samples, normalized to values between 0.0 and 1.0 – see figure 5.

Fig. 5. The sonar return spectral envelope, which is already a preprocessed signal, has its 60 samples scaled to 8-bit values before they are input to the CDMs

There are 208 returns in total which have
been divided into equally sized training and test sets of 104 returns each. The samples have been scaled by the authors to 8-bit values ranging between 0 and 255. This gives a total of 60 × 8 = 480 bits input to the system for each return. Since the data elements of the input are 8-bit scalars, the functions available to the FU elements have been chosen to be greater than and less than or equal. Through experiments these functions have been shown to work well; intuitively, they allow detection of the presence or absence of frequencies in the signal, and of their amplitude. The constant is also 8 bits, and the input is compared to this value to give true or false as output. This can be summarized as follows, with I being the selected input value, O the output, and C the constant value:

f   Description           Function
0   Greater than          O = 1 if I > C, else 0
1   Less than or equal    O = 1 if I ≤ C, else 0

1 http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/neural/bench/cmu/
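The function table above can be sketched directly in software (the function name fu_function is ours, not from the paper):

```python
def fu_function(f, I, C):
    # f = 0: "greater than"; f = 1: "less than or equal".
    # I is the selected 8-bit input sample, C the 8-bit configured constant.
    if f == 0:
        return 1 if I > C else 0
    return 1 if I <= C else 0
```

Note that the two functions are exact complements for any I and C, which is what allows the hardware to implement only a single comparator.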
3.2 Functional Unit Implementation
Based on the choice of data elements and functions above, the application-specific implementation of the FU can be determined. As described in the introduction, the VRC technique used for reconfiguration of circuits has the disadvantage of requiring considerable logic resources. This is especially the case when one needs to select the input to a unit from many possible sources, which is common in EHW. The problem is even bigger when working with data buses as inputs instead of bit signals.

Fig. 6. Implementation of the FU for the sonar spectrum recognition case
Instead of using a large amount of multiplexer resources for selecting one 8-bit sample from 60 possible, we have opted for a "time multiplexing" scheme, where only one bit is presented to the unit at a time. See figure 6. The 60 samples are presented sequentially, one for each clock cycle, together with their identifying sample number. The FU checks the input sample number for a match with the sample number stored in the configuration register, and in the case of a match, the output value of the FU is stored in a memory element. This method thus requires a maximum of 60 clock cycles before an FU has selected its input value. The sample input value is used for comparison with the constant value C stored in the configuration. Since the two functions greater than and less than or equal are opposite, only a greater than-comparator is implemented, and the function bit in the configuration decides whether to choose the direct or the negated output.
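The sequential input-selection scheme can be sketched behaviourally as follows (one loop iteration per clock cycle; the function and parameter names are assumptions, not signal names from the design):

```python
def fu_select_and_compare(samples, cfg_sample_no, cfg_constant, cfg_negate):
    # Behavioural model of the FU in figure 6. The 60 samples arrive
    # sequentially with their sample numbers; when the incoming sample
    # number equals the configured one, the comparator result is latched
    # into the output register. Only a single ">" comparator exists;
    # "less than or equal" is obtained by negating its output.
    output_reg = 0
    for sample_no, value in enumerate(samples):   # one sample per cycle
        if sample_no == cfg_sample_no:            # sample-number match
            gt = value > cfg_constant             # the > comparator
            output_reg = int(not gt) if cfg_negate else int(gt)
    return output_reg
```

With cfg_negate set, this reproduces the less than or equal function of section 3.1 without a second comparator.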
4 Evolution

This section describes the evolutionary process. Although the base mechanisms are the same as in [6], there are important changes to the fitness function. The GA implemented for the experiments follows the Simple GA style [12]. The algorithm is written to run on the PowerPC 405 hard processor core in the Xilinx Virtex-II Pro (or better) FPGAs [11], or on the MicroBlaze soft processor core available for a greater number of FPGA devices [5]. Running the GA in software instead of implementing it in hardware gives increased flexibility.
An Online EHW Pattern Recognition System
The GA associates a bit string (genome) with each individual in the population. For each individual, the fitness evaluation circuit is configured with the associated bit string, and training vectors are applied to the inputs. By reading back the fitness value from the circuit, the individuals can be ranked and used in selection for a new generation. When an individual with the maximum possible fitness value has been created (or the maximum limit of generations has been reached), the evolution run is over and the bit string can be used to configure a part of the operational classification circuit.
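A minimal sketch of this evaluate-and-rank step (names are ours; in the real system the GA runs on the on-chip processor and reads the fitness back from the evaluation circuit, modelled here by the `evaluate` callable):

```python
# Hypothetical host-side ranking step: configure the fitness
# evaluation circuit with each genome, apply the training vectors,
# read back the fitness, and sort the population best-first.
def rank_population(population, evaluate):
    """population: list of genomes; evaluate(genome) models one
    configure-and-read-back cycle and returns a fitness value."""
    return sorted(population, key=evaluate, reverse=True)
```

Selection for the next generation then operates on the head of this ranked list.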
4.1 Genome
The encoding of each FU in the genome string is as follows:

Spectrum sample address (6 bits) | Function (1 bit) | Constant (8 bits)

This gives a total of Bunit = 15 bits for each unit. The genome for one FU row is encoded as follows:

FU1 (15b) | FU2 (15b) | ... | FUN (15b)

The total number of bits in the genome for one FU row is then, with N = 6, Btot = Bunit × N = 15 × 6 = 90. In the implementation this is rounded up to 96 bits (3 words).
4.2 Incremental Evolution of the Category Detectors
Evolving the whole classification system in one run would give a very long genome; therefore, an incremental approach is chosen. Each category detector CDMi can be evolved separately, since there is no interdependency between the different categories. This is also true for the FU rows each CDM consists of. Although the fitness function changes between the rows, as will be detailed in the next section, the evolution can be performed on one FU row at a time. This significantly reduces the genome size.
4.3 Fitness Function
The basic fitness function, applied in [6], can be described as follows: a certain set of the available vectors, Vt, is used for training of the system, while the remaining vectors, Vv, are used for verification after the evolution run. Each row of FUs is fed with the training vectors (v ∈ Vt), and the fitness is based on the row's ability to give a positive (1) output for vectors v belonging to its own category (Cv = Ci), while giving a negative (0) output for the rest (Cv ≠ Ci). In the case of a positive output when Cv = Ci, the value 1 is added to the fitness sum. When Cv ≠ Ci and the row gives a negative output (value 0), 1 is added to the fitness sum. The other cases do not contribute to the fitness value.
The basic fitness function FB for a row can then be expressed in the following way, where o is the output of the FU row:

FB = Σ(v∈Vt) xv, where xv = o if Cv = Ci, and xv = 1 − o if Cv ≠ Ci.  (1)
While in the face image application each FU row within the same category was evolved with the same fitness function, the increased variation of the training set in the current application made it sensible to divide the training set between different FU rows. Each FU row within the same category is evolved separately, but by changing the fitness function between the evolution runs one can make different FU rows react to different parts of the training set. The extended fitness function FE can then be expressed as follows:

FE = Σ(v∈Vt) xv, where xv = o if Cv = Ci and v ∈ Vf,m, and xv = 1 − o if Cv ≠ Ci.  (2)
Here, Vf,m is the part of the training set that FU row m will be trained to react positively to. For instance, if the training set of 104 vectors is divided into 4 equally sized parts, FU row 1 of the CDM would receive an increase in fitness for the first 26 vectors, if the output is positive for vectors belonging to the row's category (i.e., "rock" or "metal"). In addition, the fitness is increased for giving a negative output to vectors not belonging to the row's category, for all vectors of the training set.
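The extended fitness function FE of Eq. (2) can be sketched in Python as follows; `row_output` stands in for the evolved FU row's 0/1 output, and the partition labels are an assumed bookkeeping device:

```python
# Sketch of F_E (Eq. 2): row m gains fitness for a positive output
# only on its assigned partition V_{f,m} of its own category, but for
# a negative output on all vectors of other categories.
def extended_fitness(training_set, category_i, part_m, row_output):
    """training_set: list of (vector, category, partition) tuples."""
    fitness = 0
    for vector, category, partition in training_set:
        o = row_output(vector)
        if category == category_i and partition == part_m:
            fitness += o          # reward positive output on own part
        elif category != category_i:
            fitness += 1 - o      # reward negative output on the rest
    return fitness
```

Setting `part_m` so that every own-category vector qualifies recovers the basic function FB of Eq. (1).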
5 Results

This section presents the results of the implementation and the experiments undertaken. The classification results are based on a software simulation of the EHW architecture, which has identical functionality to the proposed hardware. Results from a hardware implementation of the classifier module are also presented.
5.1 Architecture and Evolution Parameters
The architecture parameters N and M, that is, the number of FUs in an FU row and the number of FU rows in a CDM, respectively, have been evaluated. From experiments, a value of N = 6 has been shown to work well. Increasing the number of FU rows for a category leads to an increase in the recognition accuracy, as seen in Fig. 7. However, few FU rows are required before the system classifies with relatively good accuracy; thus, the system could be considered operational before the full number of FU rows is evolved. It is worth noting that the training set classification accuracy rises relatively quickly to a high value and then grows slowly, compared to the test set accuracy, which increases more steadily as FU rows are added. For the evolution experiments, a population size of 32 is used. The crossover rate is 0.9. Linear fitness scaling is used, with 4 expected copies of the best individual. In addition, elitism is applied. The maximum number of generations
Fig. 7. Average classification accuracy on the training and test sets as a function of the number of FU rows per CDM. N = 6.

Fig. 8. Average classification performance obtained using fitness functions FB and FE. N = 6. Results for generation limits of 1000 and 20000.
allowed for one evolution run is 1000. Fig. 7 shows that this produces better results than a limit of 20000 generations, even though fewer FU rows are then evolved to a maximum fitness value. The classification accuracies obtained by using the basic fitness function FB and the extended fitness function FE, with the training set partitioned into 4 parts of equal size, have been compared; see Fig. 8. The use of FE allows for a higher classification accuracy on both the training and the test set.
5.2 Classification Accuracy
Ten evolution runs were conducted, each with the same training and test set as described in Sect. 3.1, but with different randomized initialization values for the genomes. The extended fitness function FE is used. The results can be seen in Table 1. The highest average classification accuracy, 91.4%, was obtained at M = 58 rows. The best system evolved showed an accuracy of 97.1% at M = 66 rows (however, the average value there was only 91.2%). The results for M = 20, a configuration requiring fewer resources, are also shown. These values are higher than the average and maximum values of 83.0% and 91.7%, respectively, obtained in [4], although different training and test sets have been used. The classification performance is also better than the average value of 90.4% obtained from the original neural network implementation in [8], but lower than the results obtained by SVM methods (95.2%) [9].

Table 1. Average classification accuracy

FU rows (M)   training set   test set
20            97.3%          87.8%
58            99.6%          91.4%
5.3 Evolution Speed
In the experiments, several rows did not achieve maximum fitness before reaching the limit of 1000 generations per evolution run. The average number of generations required for each evolution run (that is, one FU row) was 853. This gives an average of 98948 generations for the entire system. The average evolution time for the system is 63 s on an Intel Xeon 5160 processor using one core. This gives an average of 0.54 s for one FU row, or 4.3 s for 8 rows (the time before the system has evolved 4 FU rows for each category and thus is operational). A hardware implementation of the evaluation module has been reported in [7]. It was reported to use 10% of the resources of a Xilinx XC2VP30 FPGA, and, together with the GA running on an on-chip processor, the evolution time was equivalent to the time used by the Xeon workstation. Similar results, or better ones because of software optimization, are expected for the evolution in the sonar case.
5.4 Hardware Implementation
An implementation of the classifier module has been synthesized for a Xilinx XC2VP30 FPGA in order to obtain an estimate of speed and resource usage. The resources used for two configurations of the system can be seen in Table 2. While the M = 58 configuration uses 81% of the FPGA slices, the M = 20 configuration uses only 28% of the slices. Both configurations classify at the same speed, due to the parallel implementation. Given the post-synthesis clock frequency estimate of 118 MHz and the delay of 63 cycles before one pattern is classified, the classification time is 0.5 μs.

Table 2. Post-synthesis device utilization for two configurations of the 2-category classification module implemented on an XC2VP30

Resource          M = 20   M = 58   Available
Slices            3866     11189    13696
Slice Flip Flops  4094     11846    27392
4-input LUTs      3041     8793     27392
5.5 Discussion
Although a good classification accuracy was achieved, it became apparent that there were larger variations within each category in this data set than in the face image recognition data set applied in [6]. The fitness function was therefore extended such that each row would be evolved with emphasis on a specific part of the training set. This led to increased classification accuracy, due to the possibility for each row to specialize on certain features of the training set. However, the partitioning of the training set was fixed, and further investigation into the partitioning could be useful. The experiments also showed that increasing the number of FU rows per category yields better generalization abilities. The fact that the generalization became better when the evolution was cut off at an earlier stage
could indicate that a CDM consisting of less "perfect" FU rows has a higher diversity and is thus less sensitive to noise in the input patterns. The main improvement of this system over the LoDETT system is the aspect of online evolution. As a bonus, the classification accuracies are also higher. The drawback is the slower classification speed, 63 cycles, whereas LoDETT only uses 3 cycles for one pattern. It can be argued that this slower speed is negligible in the case of the sonar return application, since the time used for preprocessing the input data in a real-world system would be higher than 63 cycles. In this case, even an SVM-based hardware system such as the one reported in [10] could be fast enough. Although the speeds are not directly comparable since the applications differ, that system has a classification time of 0.66 ms, roughly 1000 times slower than our proposed architecture. Therefore, the proposed architecture would be better suited to pattern recognition problems requiring very high throughput. The LoDETT system has been successfully applied to genome informatics and other applications [13,14]. It is expected that the proposed architecture could also perform well on similar problems, if suitable functions for the FUs are found.
6 Conclusions

The online EHW architecture proposed has so far proven to perform well on a face image recognition task and a sonar return classification task. Incremental evolution and high-level building blocks are applied in order to handle the complex inputs. The architecture benefits from good classification accuracy at a very high throughput. The classification accuracy has been shown to be higher than that of an earlier offline EHW approach. Little evolution time is needed to get a basic working system operational. Increased generalization can then be added through further evolution. Further, if the training set changes over time, it would be possible to evolve better configurations in parallel with a constantly operational classification module.
Acknowledgment. The research is funded by the Research Council of Norway through the project Biological-Inspired Design of Systems for Complex Real-World Applications (project no. 160308/V30).
References

1. Yao, X., Higuchi, T.: Promises and challenges of evolvable hardware. In: Higuchi, T., Iwata, M., Weixin, L. (eds.) ICES 1996. LNCS, vol. 1259, pp. 55–78. Springer, Heidelberg (1997)
2. Sekanina, L., Ruzicka, R.: Design of the special fast reconfigurable chip using common FPGA. In: Proc. of Design and Diagnostics of Electronic Circuits and Systems - IEEE DDECS'2000, pp. 161–168. IEEE Computer Society Press, Los Alamitos (2000)
3. Yasunaga, M., et al.: Evolvable sonar spectrum discrimination chip designed by genetic algorithm. In: Proc. of 1999 IEEE Systems, Man, and Cybernetics Conference (SMC'99). IEEE Computer Society Press, Los Alamitos (1999)
4. Yasunaga, M., Nakamura, T., Yoshihara, I., Kim, J.: Genetic algorithm-based design methodology for pattern recognition hardware. In: Miller, J.F., Thompson, A., Thompson, P., Fogarty, T.C. (eds.) ICES 2000. LNCS, vol. 1801, pp. 264–273. Springer, Heidelberg (2000)
5. Glette, K., Torresen, J., Yasunaga, M., Yamaguchi, Y.: On-chip evolution using a soft processor core applied to image recognition. In: Proc. of the First NASA/ESA Conference on Adaptive Hardware and Systems (AHS 2006), pp. 373–380. IEEE Computer Society Press, Los Alamitos (2006)
6. Glette, K., Torresen, J., Yasunaga, M.: An online EHW pattern recognition system applied to face image recognition. In: Giacobini, M., et al. (eds.) EvoWorkshops 2007. LNCS, vol. 4448, pp. 271–280. Springer, Heidelberg (to appear, 2007)
7. Glette, K., Torresen, J., Yasunaga, M.: Online evolution for a high-speed image recognition system implemented on a Virtex-II Pro FPGA. In: The Second NASA/ESA Conference on Adaptive Hardware and Systems (AHS 2007) (accepted, 2007)
8. Gorman, R.P., Sejnowski, T.J.: Analysis of hidden units in a layered network trained to classify sonar targets. Neural Networks 1(1), 75–89 (1988)
9. Frieß, T.T., Cristianini, N., Campbell, C.: The Kernel-Adatron algorithm: a fast and simple learning procedure for Support Vector machines. In: Proc. 15th International Conf. on Machine Learning, pp. 188–196. Morgan Kaufmann, San Francisco, CA (1998)
10. Choi, W.-Y., Ahn, D., Pan, S.B., Chung, K.I., Chung, Y., Chung, S.-H.: SVM-based speaker verification system for match-on-card and its hardware implementation. ETRI Journal 28(3), 320–328 (2006)
11. Glette, K., Torresen, J.: A flexible on-chip evolution system implemented on a Xilinx Virtex-II Pro device. In: Moreno, J.M., Madrenas, J., Cosp, J. (eds.) ICES 2005. LNCS, vol. 3637, pp. 66–75. Springer, Heidelberg (2005)
12. Goldberg, D.: Genetic Algorithms in Search, Optimization, and Machine Learning. Addison Wesley, Reading (1989)
13. Yasunaga, M., et al.: Gene finding using evolvable reasoning hardware. In: Tyrrell, A., Haddow, P., Torresen, J. (eds.) ICES 2003. LNCS, vol. 2606, pp. 228–237. Springer, Heidelberg (2003)
14. Yasunaga, M., Kim, J.H., Yoshihara, I.: The application of genetic algorithms to the design of reconfigurable reasoning VLSI chips. In: FPGA '00: Proceedings of the 2000 ACM/SIGDA Eighth International Symposium on Field Programmable Gate Arrays, pp. 116–125. ACM Press, New York (2000)
Design of Electronic Circuits Using a Divide-and-Conquer Approach

Guoliang He1, Yuanxiang Li1, Li Yu2, Wei Zhang2, and Hang Tu2

1 State Key Laboratory of Software Engineering, Wuhan University, Wuhan 430072, China
2 School of Computer Science, Wuhan University, Wuhan 430072, China
[email protected],
[email protected],
[email protected],
[email protected]
Abstract. Automatic design of electronic logic circuits has become a new research focus over the past twenty years through the combination of FPGA technology and intelligent algorithms. However, as logic circuits become larger and more complex, it has become difficult for automatic design methods to obtain valid and optimized circuits. Based on a divide-and-conquer approach, a two-layer encoding scheme is devised for the design of electronic logic circuits. During evolution, the two layers are evolved in parallel while interacting with each other. Moreover, in order to simulate and evaluate the evolved circuits, a two-step simulation algorithm is proposed to reduce the computational complexity of simulating circuits and to improve simulation efficiency. Finally, a random number generator is automatically designed with this encoding scheme and the proposed simulation algorithm, and the result shows that the method is efficient.

Keywords: Evolvable Hardware, Divide-and-Conquer Approach, Simulation Algorithm.
1 Introduction

In recent years the evolution of digital circuits has been intensively studied; it is expected to allow large and efficient electronic circuits to be produced automatically in applications instead of being designed by hand. However, a central issue in the evolutionary design of electronic circuits is the problem of scale, which has not yet been fully overcome [1,2]. A possible way of tackling the problem is to use building blocks that are higher-level functions rather than two-input gates. It has been observed that the evolutionary design of circuits is easier with this method, compared with the original methods in which the building blocks are merely gates [3,4,5]. However, identifying suitable building blocks, and thus evolving efficient electronic circuits, remains a difficult task to be further investigated. Another efficient method is to divide a larger circuit into

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 13–22, 2007. © Springer-Verlag Berlin Heidelberg 2007
several sub-circuits and then evolve each sub-circuit separately [6,7], but dividing an unknown circuit into sub-circuits in terms of its abstract description is also an open problem. Other methods, including variable-length chromosome coding [8], compressed chromosome encoding, biological development [9] and so on, have also been presented to address the problem of circuit scale. In this paper, a two-layer encoding method for designing electronic circuits is introduced in terms of a divide-and-conquer approach, and a two-step simulation algorithm is then proposed to simulate and evaluate the evolved circuits [10,11]. It adopts the event-driven idea and carries out the simulation of logic circuits stage by stage to reduce the computational complexity of simulating circuits.
2 Design Logic Circuits with a Two-Layer Encoding Method

2.1 Circuit Encoding Scheme

At present, one of the most important issues in designing a complex hardware system is how to deal with the scalability of the chromosome representation of electronic circuits. Since the length of chromosomes increases exponentially with the complexity of the circuit, it is inefficient to evolve a valid circuit with current evolutionary techniques.
Fig. 1. The overview of the evolution process (flowchart: an initialized circuit is represented, the directed graph and the genetic trees are evolved in parallel, simulation and fitness calculation follow, and the loop repeats until the ending condition is satisfied)
Based on the theory of divide-and-conquer and evolvable hardware, a method is introduced in which a circuit is divided into modules before evolution; every module and the connections among the modules are then evolved in parallel. An overview of the evolution process is depicted in Fig. 1. In accordance with this principle, a two-layer encoding method is introduced to evolve circuits. Fig. 2 shows the general architecture of this two-layer encoding scheme. The first layer is an acyclic directed graph, formed by connecting the modules through links. The second layer consists of a population of trees within each module, the modules being predefined before evolution. First, each module acts as a genetic tree after an initialized circuit is modularized; then these modules are connected by links to form an acyclic directed graph as a circuit. A population is thus obtained from the different links among the genetic trees. The nodes in each genetic tree are classified as terminal nodes and nonterminal nodes. The former are elements of the set of constant values ("0" or "1") and variables. The latter are fundamental logic gates or evolved modules; in this paper only the elementary logic gates listed in Table 1 are considered.
Fig. 2. The two-layer encoding scheme

Table 1. Allowed cell functions

Letter  Function    Letter  Function
1       !a          6       a ⊕ b
2       a • b       7       !(a ⊕ b)
3       a + b       8       a • !c + b • c
4       !(a • b)    9       D-Trigger
5       !(a + b)
2.2 Evolving Scheme

In terms of this encoding method for representing evolving circuits, several strategies can be devised. At the beginning of the evolution, there is only one tree inside each module
and the acyclic directed graph connecting the modules is also unique. In the process of evolution, a population of trees is produced in each module according to the evolutionary operators discussed later, but within a module only a main tree connects its leaf nodes and root node to the outside of the module. At the same time, this strategy describes the links among the modules as acyclic directed graphs, and the many different connection patterns form the higher-level population of directed graphs. From this evolving strategy, it can be seen that the two-layer encoding scheme is efficient and flexible for designing more complex circuits. On the one hand, the modules and the directed graphs are evolved separately, so the short chromosome representing any one of them is well suited to an evolutionary algorithm. At the same time, the same modules are reused to form a population of directed graphs for evolution. Therefore, while the complexity of the evolvable circuit and the population for evolution become larger, the length of the chromosome representing the circuit and the evolutionary computation increase to a smaller extent. On the other hand, different evolution strategies, each efficient for the code structure of its level, can be used flexibly to rapidly obtain a valid and optimized circuit.

2.3 Evolutionary Operations

Due to this two-layer encoding method, the evolutionary operators are implemented differently in each layer to obtain valid circuits. For the first layer, the operators on graphs are as follows:

Crossover: randomly select two graphs and pick up their differing parts to combine into two new graphs.

Mutation: add, delete or change parts of the connections among nodes within a graph to form a new graph.

Selection: insert the newly produced individuals into the former population, so that a new population is created at the end of each generation of evolution. When the number of individuals reaches the upper limit, delete the individuals with the worst fitness values. The selection probability is a function of the fitness value of the genetic tree and a value predefined at the beginning.

For the evolution of the genetic trees in each module, which is the second layer, the operators are as follows:

Exchange: this operator has two forms: the first swaps the order of subtrees within the same tree; the second exchanges subtrees between two trees, with the precondition that the inputs and outputs of the two subtrees match.

Mutation: change one or several non-terminal nodes in a tree according to the mutation probability.

Upgrade: pick out a subtree of a certain tree and make it a tree of its own.

Deletion: delete a certain branch of a tree.

Generation: create subtrees according to the probability of appearance of nodes on different layers.
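As an illustration of the second-layer mutation operator (the tree representation as nested lists is our simplification, not the paper's):

```python
import random

# Gate codes from Table 1 restricted to two-input gates, as an example set.
GATES = [2, 3, 4, 5, 6, 7]

def mutate_tree(tree, p_mut, rng=random):
    """tree: a terminal (string) or a nested list [gate, child, ...].
    Change each non-terminal (gate) node with probability p_mut,
    per the second-layer mutation operator."""
    if not isinstance(tree, list):
        return tree                      # terminal node: left unchanged
    gate, *children = tree
    if rng.random() < p_mut:
        gate = rng.choice([g for g in GATES if g != gate])
    return [gate] + [mutate_tree(c, p_mut, rng) for c in children]
```

With `p_mut = 0` the tree is copied unchanged; with `p_mut = 1` every gate node is replaced by a different allowed gate.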
3 Simulation and Evaluation

Simulation and evaluation are very important processes for guiding the strategy of evolution. Existing emulation software can emulate circuits well, but it cannot simulate and evaluate logic circuits efficiently for the proposed two-layer scheme. For digital circuits, the function is commonly validated by simulating test data. In the process of validating a circuit function, a change of test data does not always cause the output value of every logic gate to change. Based on this observation, a circuit simulation algorithm is introduced that avoids repeatedly simulating macro blocks whose output values have not changed as the test data change for the same circuit. The proposed simulation algorithm can thus shorten the runtime and improve the efficiency of simulation.

3.1 Simulation Algorithm of Electronic Logic Circuits

For the simulation of logic circuits, it is necessary to handle the parallelism and delay times of logic gates and modules. This paper introduces a "time and incident" table to record the signal lines and the signal values they are going to take at a certain moment in the future. Because the emulated circuit has to be downloaded onto the FPGA board to perform real-time simulation, a standard delay model is used throughout the simulation process (assuming that the delay time of a signal line is zero) [12]. The data structures of the "Time" and "Incident" records are as follows:

Structure of the "Time" record {
    Time;    time variable, records a certain moment of the system
    Pevent;  incident pointer, points to the head of the array of incidents that will happen at moment "Time"
    Next;    pointer to the next "Time" record
}

Structure of the "Incident" record {
    Signal_ID;     identifies the signal that triggers the current incident
    Signal_Value;  records the value the signal is going to take at that moment
    Next;          pointer to the next "Incident" record
}
In line with the treatment of the circuit characteristics above, the main idea of the simulation algorithm is as follows. The emulation process of logic circuits is divided into two stages according to the topological structure of the circuit. The first stage is the structurization of the evolved circuit. Each macro entity is a sub-circuit without feedback and is described as an oriented tree. Because not every macro entity needs emulation in every emulation clock cycle, each macro entity in the circuit is set to the activated or suspended state depending on whether its input value has changed. The second stage is to emulate the activated macro entities. The program traverses every node (operator, variable or constant) of a macro entity and calculates the output values for these nodes according to their type and child nodes. These values are then stored in a corresponding place (within the macro entity the nodes are arranged in breadth-first order, and the emulation proceeds layer by layer backward from the leaf nodes). For each basic macro entity (or logic gate) the output value and delay time are determined when the input signal is given. The main process of the simulation algorithm is shown in Fig. 3. It can be seen from this process that the emulation of the macro entities is the key point, which decides the time complexity and efficiency of the whole algorithm. Macro entities correspond to genetic trees in the evolutionary operations; that is to say, the emulation of a macro entity is the emulation of a genetic tree. This paper utilizes the event-driven idea and does not emulate those macro entities that do not need emulation. Compared with algorithms that have to traverse every macro entity, this algorithm saves unnecessary emulation time and improves the efficiency of simulation.
3.2 Evaluation of Electronic Logic Circuits

To design digital circuits effectively, the circuit and the emulation result must be evaluated to guide the strategy for further evolution. According to the characteristics of logic circuits, the evaluation can be carried out from two aspects:

Evaluation of the circuit: this includes the evaluation of its function and its scale. The evaluation of the function asks whether the circuit meets the demands of the design and reaches the designated function. As for the evaluation of circuit scale, when the function of the circuit is fulfilled, the fewer the logic gates and the higher the degree of parallelism, the better. In fact, a smaller scale means higher parallelism when the performance and speed of the circuit are the same.

Evaluation of complexity: here the complexity is defined as the time, obtained through the simulation software, needed to get the output result of the circuit, or to reach a certain state.

The fitness function of logic circuits is as follows:

fitness = g^c1 − c2 · (s / p) − c3,

where ci (i = 1, 2, 3) are nonnegative constants. In this formula, g denotes the extent of optimization of the circuit's performance, s denotes the runtime and p denotes the circuit's scale. A good circuit has good performance, a short runtime and a small scale.
Fig. 3. The simulation algorithm of digital circuits. The flowchart proceeds as follows: initialize the population and its parameters; read the values of the input signals at the current time; if the value of an input signal changes, add the macro entities connected to it to the set S; if there is an event for the current time in the time table, modify the value of the corresponding signal, add the macro entities connected to that signal to the set S, and delete the event; simulate the macro entities in S, and if the output value of a macro entity changes, add it to the event table as an event; delete the record of the current time in the time table; advance the simulation by one clock unit; repeat until the ending condition is satisfied, then end the simulation.
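The event-driven core of one simulation cycle can be sketched as follows (a simplification of the algorithm in Fig. 3, with our own names; `evaluate(entity, signals)` stands in for emulating one macro entity and returning its output signal and value):

```python
from collections import deque

def simulate_cycle(changed_inputs, signals, fanout, evaluate):
    """Propagate one cycle's input changes: only macro entities
    connected to a changed signal (the set S) are re-emulated;
    unchanged entities stay suspended."""
    pending = deque(changed_inputs.items())
    while pending:
        sig, val = pending.popleft()
        if signals.get(sig) == val:
            continue                        # no change: nothing to re-emulate
        signals[sig] = val
        for entity in fanout.get(sig, []):  # activated entities (set S)
            out_sig, out_val = evaluate(entity, signals)
            if signals.get(out_sig) != out_val:
                pending.append((out_sig, out_val))  # schedule as new event
    return signals
```

The real algorithm additionally keeps the time and incident tables so that events can be scheduled at future moments; this sketch collapses everything into one cycle.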
Moreover, the derivation trees generated in the process of evolution need evaluation. Because a tree is not the whole circuit, we use the root node to represent the corresponding tree. The fitness of the nodes is then calculated as follows:

1. Find the m circuits with the highest fitness and add their fitness to every node that the directed graph of the circuit is connected to. The higher the fitness sum of a certain node, the more important its effect in the connections of the directed graph.
2. Evaluate each module separately. This method is employed for a specific hardware object.

3. The key nodes determined by rules 1 and 2 require a special fitness calculation. That is, evaluate the change in the directed graph's performance when a certain node mutates, and increase the fitness of the nodes that lead to a large variation in the directed graph's fitness.
4 Experiment Using this two-layer encoding method and the simulation algorithm, a practical pseudo-random number generator is automatically designed to validate its efficiency. At the beginning of evolution, a basic embryonic circuit has to be given as the primitive individual. There are 65 modules in the embryonic circuit of the pseudo-random number generator. Among these modules, 64 modules are connected in series which are made up of single D triggers and the 65th module is a combinational circuit whose input is connected to the D output terminals of the former 64 triggers and its output is connected to the input terminal of the first D trigger. In fact, this is a typical LFSR pseudo-random number generator. Table 2. The evaluation results of the optimized circuit
Test                                 Gen 1      Gen 1000   Gen 2000
Monobit Test                         1.000000   0.912541   0.773936
Birthday Spacings Test               1.000000   0.827782   0.612698
Overlapping Permutations Test        1.000000   0.720198   0.515017
Ranks of 31x31 and 32x32 Matrices    1.000000   0.210932   0.321032
Ranks of 6x8 Matrices                1.000000   1.000000   0.999164
Monkey Tests on 20-bit Words         1.000000   1.000000   1.000000
Monkey Tests OPSO, OQSO, DNA         1.000000   1.000000   1.000000
Count the 1's in a Stream of Bytes   1.000000   1.000000   1.000000
Count the 1's in Specific Bytes      1.000000   1.000000   1.000000
Parking Lot Test                     1.000000   0.999613   0.602602
Minimum Distance Test                1.000000   1.000000   0.946229
Random Spheres Test                  1.000000   0.973520   0.534159
The Squeeze Test                     1.000000   1.000000   0.999676
Overlapping Sums Test                1.000000   1.000000   0.653748
Runs Test                            1.000000   1.000000   0.659004
The Craps Test                       1.000000   1.000000   0.322741
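The 65-module embryonic circuit described in Section 4 (64 D flip-flops in series plus one combinational feedback module) is a Fibonacci-style LFSR. A minimal software model follows; the XOR tap positions are illustrative assumptions, since the paper evolves the feedback logic rather than fixing it:

```python
# Software model of the embryonic circuit: 64 D flip-flops in series
# plus one combinational feedback module. The XOR tap positions below
# are illustrative assumptions; the paper evolves this feedback logic.
def lfsr_bits(seed, n_bits, taps=(63, 62, 60, 59)):
    """Yield n_bits pseudo-random bits from a 64-stage LFSR."""
    state = list(seed)                      # state[i] = output of flip-flop i
    assert len(state) == 64 and any(state)  # all-zero state would lock up
    for _ in range(n_bits):
        yield state[-1]                     # output of the last flip-flop
        fb = 0
        for t in taps:                      # combinational module: XOR of taps
            fb ^= state[t]
        state = [fb] + state[:-1]           # shift through the flip-flop chain

seq = list(lfsr_bits([1] + [0] * 63, 10))
```

In the paper the feedback module is the evolved part, so `taps` would be replaced by the evolved combinational function of all 64 flip-flop outputs.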
Design of Electronic Circuits Using a Divide-and-Conquer Approach
21
Table 3. The values of the parameters and the fitness

           Gen 1      Gen 1000   Gen 2000
g          0          6          11
s          95         2756       3340
p          2100       21992      49183
Fitness    0.000000   0.166469   1.142326
For the individuals of generation j, the evaluation function evaluates the nj-bit binary random string generated by emulating these individuals. The g, s, and p in f(g, s, p) are obtained synchronously during emulation. nj is a monotone increasing function of j, with n0 = 200000. The FIPS-140 standard and the well-known Diehard test suite are adopted to test randomness. The performance of the best individuals from three generations is shown in Table 2, and the values of the parameters and the fitness are shown in Table 3 (parameters: C1 = 4.0, C2 = 0.5, C3 = 0.5). Table 2 shows that after 2000 generations of evolution, this random number generator passes most of the randomness tests, which indicates that the proposed method can design random number generators with good performance and is capable of the automatic design of both combinational and sequential circuits.
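The FIPS-140 screening mentioned above can be illustrated with its simplest check, the monobit test. The bounds below are those of FIPS 140-2, and the helper name is ours:

```python
def monobit_test(bits):
    """FIPS 140-2 monobit test: a 20,000-bit sample passes when
    the number of ones lies strictly between 9725 and 10275."""
    if len(bits) != 20000:
        raise ValueError("monobit test is defined on 20,000-bit samples")
    ones = sum(bits)
    return 9725 < ones < 10275

# A perfectly balanced stream passes this test (though it would fail
# others, e.g. the runs test); an all-zero stream fails it.
balanced = [1, 0] * 10000
```

The Diehard tests in Table 2 are far stricter; a generator must pass the full battery, not just this one count.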
5 Conclusion
Based on the divide-and-conquer method, a two-layer encoding scheme is presented for the evolution of logic circuits. In the evolution process, the genetic trees and the links among modules are evolved separately but remain coupled. According to this encoding method, a simulation algorithm is also proposed to simulate and evaluate electronic logic circuits and improve the efficiency of simulation. However, some issues about this divide-and-conquer approach remain. Automatically dividing a circuit into several sub-circuits that act as modules in the two-layer encoding should be considered, instead of the pre-division used in this paper. In future work we hope to distill principles from the evolution of circuits that can be used as fundamental building blocks for designing larger logic circuits efficiently and easily. It is also necessary to improve the evolution and simulation algorithms in order to evolve circuits quickly and efficiently.
Acknowledgments This work is supported by the National Natural Science Foundation of China under grant No. 60442001 and National High-Tech Research and Development Program of China (863 Program) No. 2002AA1Z1490.
Implementing Multi-VRC Cores to Evolve Combinational Logic Circuits in Parallel
Jin Wang1, Chang Hao Piao2, and Chong Ho Lee1
1 Department of Information & Communication Engineering, Inha University, Incheon, Korea
[email protected]
2 Department of Automation Engineering, ChongQing University of Posts and Telecommunications, Chongqing, China
Abstract. To conquer the scalability issue of evolvable hardware, this paper proposes a multi-virtual reconfigurable circuit (VRC) cores-based evolvable system to evolve combinational logic circuits in parallel. The basic idea behind the proposed scheme is to divide a combinational logic circuit into several sub-circuits, each of which is evolved independently as a subcomponent by its corresponding VRC core. The virtual reconfigurable circuit architecture is designed for implementing real-world applications of evolvable hardware (EHW) in common FPGAs. In our approach, all the VRC cores are realized in a Xilinx Virtex xcv2000E FPGA as an evolvable system to achieve parallel evolution. The proposed method is evaluated on the evolution of a 3-bit multiplier and adder and compared to direct evolution and incremental evolution in terms of computational effort and hardware implementation cost. Keywords: Intrinsic evolvable hardware, scalability, parallel evolutionary algorithm, incremental evolution.
1 Introduction
As an alternative to the conventional specification-based circuit design method, EHW has been introduced as an important paradigm for automatic circuit design over the last decade. However, there is still a long way to go before EHW becomes a real substitute for human designers. One of the problems is that most evolved circuits are limited in size [1, 2]; this is known as the scalability issue of EHW. In [2], Yao indicated that existing evolvable systems are generally not scalable for two reasons: (1) The chromosome string length of EHW, which increases with the target circuit size. A long chromosome string is required to represent a complex system, which often makes the search space so large that it is difficult to explore even with evolutionary techniques. (2) The computational complexity of the evolutionary algorithm (EA), which is an even more pivotal factor than chromosome string length. Generally, the number of individual evaluations required to find a desired solution can increase drastically with the complexity of the target system. This paper focuses on the scalability issue as it applies to the evolutionary design of combinational logic circuits. In our proposal, a combinational logic circuit is decomposed into several sub-circuits.
L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 23–34, 2007. © Springer-Verlag Berlin Heidelberg 2007
An evolvable system
including several VRC cores is employed to evolve the separate sub-circuits in parallel. Finally, the evolved sub-circuits interact to perform the expected top-level logic function. Our proposed method approaches the scalability issue of EHW by speeding up the EA computation, shortening the chromosome length, and reducing the computational complexity of the task. Experiments on evolving a 3-bit multiplier and adder are conducted in this paper to compare the execution time and hardware cost of the proposed evolutionary strategy with direct evolution and incremental evolution [3, 4, 5]. The rest of this paper is organized as follows: Section 2 briefly reviews existing approaches to the scalability issue of EHW. The proposed scheme implementing parallel evolution with multi-VRC cores is presented in Section 3. Section 4 describes the hardware realization of the proposed evolvable system. Experimental results are summarized in Section 5 and discussed in Section 6. Section 7 concludes our work.
2 Previous Scalable Approaches to EHW
Various approaches have been proposed to solve the scalability issue of EHW. Murakawa et al. [6] tackled the problem of scale in evolved circuits by using function-level evolution. By employing higher-level functions as building blocks rather than multi-input gates, the size of the chromosome remains limited while the complexity of circuits can grow arbitrarily. This approach is reasonable in itself, and it has also been applied to evolving spatial image operators [7, 8]. While higher-level functions allow the designer to reduce the EA search space and make evolution easier, a disadvantage is that the evolved solutions do not exhibit any innovation in their structure [9]. Moreover, identifying suitable function blocks for evolving efficient electronic circuits is a difficult and time-consuming task. Parallel genetic algorithms, one of the most promising ways to improve the computational ability of EAs, have been presented in different forms [10, 11]. Using parallelism to increase the speed of evolution seems to be an answer to the high computational cost. While parallel evolution does offer limited relief, it does not provide any new capabilities from the standpoint of computational complexity theory. For example, the computational complexity of evolving combinational logic circuits, which grows exponentially with the number of circuit inputs under a traditional genetic algorithm, still grows exponentially under a parallel genetic algorithm. When billions of candidate circuits must be evaluated to evolve even small combinational logic circuits (e.g. a 4-bit multiplier), relying on sheer speedup of the EA itself is not a reasonable solution to the computational complexity issue of EHW.
A possible way to reduce the computational complexity of EHW is incremental evolution, which is based on the principle of divide-and-conquer. Incremental evolution was first introduced by Torresen [3] as a scalability approach to EHW. Following this strategy, different non-trivial circuits have been successfully evolved using both extrinsic [3, 4] and intrinsic EHW [5]. In this approach, a circuit is evolved through its smaller components. This means the
evolution is first undertaken individually and serially on a set of sub-circuits, and the evolved sub-circuits are then employed as building blocks for further evolution or for structuring the target circuit. A variation of incremental evolution, the multiobjective genetic algorithm, has been suggested by Coello Coello et al. [12]. In their case, each output of a combinational logic circuit is handled as an objective which is evolved independently in its corresponding subcomponent. Another scheme related to the idea of system partitioning is cooperative coevolution, proposed by Potter and De Jong [13]. It consists of the serial evolution of coadapted subcomponents which interact to perform the function of the top system.
3 Description of the Proposed Approach
Our proposed scheme approaches the scalability issue of EHW from three aspects: speeding up the EA, limiting the chromosome length, and decreasing the computational complexity of the problem. The main idea is to use parallel intrinsic evolution to handle subcomponents, which are the decomposed versions of the top system. The circuit decomposition and assembly are inspired by the principle of divide-and-conquer, which has been introduced in [3, 4, 5, 12, 14] to limit the computational complexity of EHW. A generalized hardware architecture for evolving decomposed subcomponents in parallel is introduced in this paper. This architecture, which we call multi-VRC cores, can realize parallel evolution in a single commercial FPGA. Parallel intrinsic evolution of subcomponents is the most significant difference between our approach and the existing extrinsic EHW approaches [3, 4, 12, 13, 14], in which subcomponents were evolved serially by software simulation.
3.1 Decomposition of Logic Circuit
Two strategies have commonly been used to decompose combinational logic circuits: Shannon decomposition and system output decomposition [4, 14]. In this paper, to simplify the hardware implementation, only the second scheme, decomposition of the system output, is employed. Fig. 1 shows the system output decomposition approach for evolving a 2-bit multiplier, which has a 4-bit input and a 4-bit output. In this scenario, the 4-bit output of the multiplier is assigned to two groups, as the vertical line in the truth table indicates. Each partitioned 2-bit output is used to evolve its corresponding subcomponent (subcomponents 1 and 2), each of which is a 4-in/2-out circuit. The evolved subcomponents can then be assembled (as shown in Fig. 1) to perform the correct multiplier function.
Although this particular illustration shows two subcomponents, the actual number may be larger. For example, we could evolve a separate circuit for each 1-bit output, in which case four subcomponents with 4-bit input and 1-bit output would be required.
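The output partition described above operates on the truth table itself. A small sketch, assuming (as in Fig. 1) that the product bits are split into a high pair and a low pair; the helper names are ours:

```python
# System-output decomposition of a 2-bit multiplier (cf. Fig. 1):
# the 4-bit product is split into two 2-bit output groups, each of
# which becomes the target function of one 4-in/2-out subcomponent.
def multiplier_truth_table():
    """Full truth table: (a, b) -> 4-bit product for 2-bit operands."""
    return {(a, b): a * b for a in range(4) for b in range(4)}

def partition_outputs(table, bit_groups):
    """Split each output word into one sub-table per group of bit
    indices (bit 0 = LSB): one target truth table per subcomponent."""
    subtables = []
    for bits in bit_groups:
        sub = {inp: tuple((out >> i) & 1 for i in bits)
               for inp, out in table.items()}
        subtables.append(sub)
    return subtables

hi, lo = partition_outputs(multiplier_truth_table(), [(3, 2), (1, 0)])
```

Passing four single-bit groups instead of two pairs yields the four 4-in/1-out subcomponents mentioned in the text.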
Fig. 1. Partitioning output function of 2-bit multiplier
3.2 Evolutionary Algorithm
In this work, a kind of intrinsic EHW built on the multi-VRC cores structure is used to evolve the partitioned subcomponents in parallel. The virtual reconfigurable circuit architecture was first proposed by Sekanina [8] for implementing real-world applications of EHW in common FPGAs. The structure of the VRC is flexible and can be designed for a given problem to fit the application requirements. For evolving combinational logic circuits, a Cartesian genetic programming (CGP)-based geometric structure was implemented by Sekanina on a VRC [15]. CGP was first introduced by Miller et al. in [16]; its phenotype is a two-dimensional array of logic cells. In our approach, to reduce the chromosome length, a revised two-dimensional gate array is employed which introduces more connection restrictions than standard CGP. Whereas in CGP each cell can take its inputs from the external inputs of the cell array or from cell outputs in any previous layer, each gate in our proposed array may only connect to the gate outputs of the immediately preceding layer. Very similar gate array structures have been reported by Torresen [3] and Coello Coello [12] for learning combinational logic circuits. The basic frame of the parallel evolutionary algorithm employed in our evolvable system is illustrated in Fig. 2. Though this particular illustration shows a parallel evolutionary algorithm designed for evolving two subcomponents, the actual number of subcomponents can be larger. In this model, the population is divided into two subpopulations to maintain the two decomposed subcomponents. The evolution of each subcomponent is performed according to the 1+λ evolution strategy, where λ = 2. Evolutionary operations are based only on selection and mutation operators. At the beginning, each subpopulation, including λ individuals, is created at random.
Once the fitness of each individual is evaluated, the fittest individual is selected as the parent chromosome. The next generation of the subpopulation is generated from the fittest individual and its λ mutants. This process repeats until one of the stop criteria of each subpopulation is met: (1) the EA finds the expected solution for its corresponding subcomponent; or (2) the predefined generation limit is exhausted. In the evolutionary process, each VRC core maintains the evolution of its corresponding subpopulation independently.
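The 1+λ strategy with mutation-only variation can be sketched as follows; the function abstracts over the fitness evaluation (performed in hardware in the paper), and all names are ours:

```python
import random

def one_plus_lambda(fitness, chrom_len, lam=2, mut_rate=0.008,
                    max_gens=10000, target=None, rng=random):
    """1+lambda evolution strategy with mutation as the only variation
    operator, as used per subpopulation in the paper (lambda = 2)."""
    def mutate(chrom):
        # flip each bit independently with probability mut_rate
        return [b ^ (rng.random() < mut_rate) for b in chrom]

    parent = [rng.randint(0, 1) for _ in range(chrom_len)]
    best_f = fitness(parent)
    for _ in range(max_gens):
        if target is not None and best_f >= target:
            break                       # stop criterion (1): solution found
        for child in (mutate(parent) for _ in range(lam)):
            f = fitness(child)
            if f >= best_f:             # ties accepted: neutral drift helps
                parent, best_f = child, f
    return parent, best_f

# Toy run: maximise the number of ones in a 16-bit chromosome
# (a higher mutation rate than the paper's 0.8% is used for speed).
best, best_f = one_plus_lambda(sum, 16, mut_rate=0.05, target=16,
                               max_gens=5000, rng=random.Random(1))
```

Accepting offspring whose fitness ties the parent (the `>=` comparison) allows neutral drift across fitness plateaus, a common choice in CGP-style evolution.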
Fig. 2. Flow diagram of proposed parallel evolutionary algorithm for evolving two subcomponents
Each subcomponent processes a different decomposed system output function, so the fitness value of each individual in a given subpopulation is calculated by comparing the subcomponent output with its corresponding partitioned desired system output as follows:
Fitness = Σ_vector Σ_output x,   where x = 1 if output = expect, and x = 0 if output ≠ expect.   (1)
For each partitioned output vector, each processed single-bit output is compared with its corresponding expected system output (labeled expect). If they are equal, the variable x is set to 1 and added to the fitness. Fitness is thus the sum of the comparison results over all outputs (output) in the set of processed partitioned output vectors (vector) from the truth table.
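Eq. (1) amounts to a bit-match count over the training vectors, which can be sketched as (function name ours):

```python
def circuit_fitness(outputs, expected):
    """Fitness of eq. (1): count every output bit that equals its
    partitioned expected value, summed over all training vectors.
    Both arguments are lists (one entry per input vector) of bit tuples."""
    return sum(1
               for out_vec, exp_vec in zip(outputs, expected)
               for out, exp in zip(out_vec, exp_vec)
               if out == exp)
```

For a 6-in/3-out subcomponent, the maximum fitness is 64 vectors × 3 bits = 192, matching the figure given in Section 4.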
4 Hardware Implementation
A Celoxica RC1000 PCI board [17] equipped with a Xilinx Virtex xcv2000E FPGA [18] (see Fig. 3), which has been successfully applied as a high-performance, flexible, and low-cost FPGA-based platform for various computationally intensive applications [19, 20], is employed as our experimental platform for implementing and verifying the proposed multi-VRC cores architecture. The proposed evolvable system is composed of two main components (as shown in Fig. 3): the control and interface, and several VRC cores. All operations of the VRC cores are controlled by the control and interface, which executes commands from the host PC and connects to the on-board 8-Mbyte SRAM. Each VRC core in the proposed evolvable system corresponds to a decomposed subcomponent as defined in the previous section. A VRC core consists of a virtual reconfigurable circuit unit, an EA unit, and a fitness unit. The EA unit implements the genetic operations and generates configuration bit strings (chromosomes) to
configure the virtual reconfigurable circuit unit. The virtual reconfigurable circuit unit, whose function is virtually reconfigurable, processes the input data from four memory banks. The fitness unit calculates individual fitness by comparing the output of the virtual reconfigurable circuit unit with its corresponding partitioned output in the truth table.
Fig. 3. Organization of the proposed evolvable system with multi-VRC cores
Every virtual reconfigurable circuit unit in a VRC core can be considered a digital circuit which acts as a decomposed subcomponent of the top system. Fig. 4 presents, as an example, a virtual reconfigurable circuit unit designed for evolving a 6-input/3-output subcomponent. It consists of 43 uniform function elements (FEs) allocated in an array of 6 columns and 8 rows. In the last column of the FE array, the number of FEs equals the number of system outputs, and each FE corresponds to one system output. Every FE has two inputs and can be configured to perform one of the 8 logic functions listed in Fig. 4. The input connections of each FE are selected by its two multiplexers. An input of an FE in column 2 (3, 4, 5, or 6) can be connected to any output of an FE in the immediately preceding column. In column 1, each FE input can be connected to any of the system inputs or defined as a constant 1 or 0. Each FE is equipped with a flip-flop to support pipelined processing; a column of FEs forms a single pipeline stage. Each FE needs 9 bits to determine its input connections (3+3 bits) and its function (3 bits). Although the number of configuration bits required in column 6 is lower than 72, the configuration memory still employs 72 bits per column to simplify the hardware design. Hence, the configuration memory comprises 72 × 6 = 432 bits. In our approach, the eight FEs in the same column are configured simultaneously, so the 432-bit configuration is divided and stored in 6 configuration banks (cnfBank) of 72 bits each.
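The 9-bit-per-FE layout above can be made concrete with a small decoder. The concrete 8-function set is defined only in Fig. 4, which is not reproduced in the text, so the function list below is a typical assumption, and the helper names are ours:

```python
# Decoding one function element (FE) from the 432-bit VRC configuration:
# 9 bits per FE = 3 bits (input A mux) + 3 bits (input B mux) + 3 bits
# (function code); 8 FEs per column x 9 bits = 72 bits per column,
# 6 columns = 432 bits. The 8-function set below is an assumption.
FUNCS = ["AND", "OR", "XOR", "NAND", "NOR", "XNOR", "NOT_A", "WIRE_A"]

def decode_fe(config_bits, col, row):
    """Extract (input_a_sel, input_b_sel, function) for the FE at
    (col, row) from a 432-bit configuration string of '0'/'1' chars."""
    assert len(config_bits) == 432
    base = col * 72 + row * 9
    field = lambda off: int(config_bits[base + off: base + off + 3], 2)
    return field(0), field(3), FUNCS[field(6)]
```

Because all fields are fixed-width, the same indexing works in hardware: one 72-bit configuration bank drives one column's eight FEs in a single clock.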
Fig. 4. Virtual reconfigurable circuit unit for the evolution of the 6-in/3-out subcomponent
Fig. 5. Architecture of EA unit
Fig. 5 describes the architecture of the EA unit designed for the evolution of a 6-input/3-output subcomponent. When the EA is activated, the population memory is filled with two chromosomes, which are mutated versions of two 6×72-bit random numbers generated by the Random Number Generator (RNG) based on linear cellular automata [21]. The mutation unit processes the 6×72-bit data in 6 clocks, 72 bits per clock. Only randomly selected bits are inverted, and the number of mutated bits is determined by the predefined mutation rate (0.8% in this work). After all chromosomes of the initial population have been evaluated, the best chromosome is chosen as the parent chromosome and stored in the best chromosome memory. The new population is generated from the parent chromosome and its 2 mutants. If the fitness
of an offspring chromosome is better than that of the parent, the parent chromosome stored in the best chromosome memory is replaced. Fitness calculation is realized in the fitness unit. The input training vector set is loaded from the on-board SRAM and processed as the input of the VRC unit. The output vectors of the VRC unit are sent to the fitness unit and compared with the partitioned expected output set specified in the truth table (also stored in on-board SRAM). The fitness value is increased for every matching output bit. The maximal fitness value is therefore 64 × 3 = 192 (the decomposed system output in this scenario is 3 bits wide, evaluated over 64 input vectors).
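The cellular-automata RNG mentioned above follows [21], whose exact rule is not given in the text. A rule-90 linear CA (each cell becomes the XOR of its two neighbours) is used below purely as an illustrative stand-in:

```python
def ca_rng_stream(state, n_steps):
    """Linear cellular-automaton RNG sketch. Rule 90 (each cell is the
    XOR of its two neighbours, with wrap-around) stands in for the
    hardware RNG of [21]; the centre cell is tapped after each step."""
    n = len(state)
    state = list(state)
    centre = n // 2
    for _ in range(n_steps):
        state = [state[(i - 1) % n] ^ state[(i + 1) % n] for i in range(n)]
        yield state[centre]
```

In the EA unit, such a generator would be clocked 6 × 72 times to produce one random chromosome, 72 bits at a time.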
5 Experimental Results
Our proposed multi-VRC cores-based evolvable system was designed in VHDL and synthesized into the Virtex xcv2000E FPGA using Xilinx ISE 8.1. According to our synthesis reports, in all cases the proposed evolvable system can operate at more than 90 MHz. However, the actual hardware experiments were run at 30 MHz for easier synchronization with the PCI interface, which operates correctly with PCI bus clocks from zero to 33 MHz. In this paper, a 3-bit multiplier and a 3-bit adder were employed as evolutionary targets to illustrate the proposed scheme. Three evolutionary strategies were used in the experiments: (1) direct evolution, as employed in [15, 22]; (2) incremental evolution proposed by Torresen [4], with the partitioned training vector strategy only; (3) our proposed multi-VRC cores-based scheme. The maximum number of generations of one EA run was set to 2^27 in all strategies. To achieve reasonable hardware cost and performance, a uniform 8×6 FE array was employed to evolve all the decomposed subcomponents in both incremental evolution and the proposed strategy. A 12×6 FE array was selected for the direct evolution of the 3-bit multiplier, based on our previous experiments: no feasible 3-bit multiplier could be evolved with smaller FE arrays (e.g. 10×6, 8×8). To simplify the hardware design, this 12×6 FE array was also employed to directly evolve the 3-bit adder. In the evolution of the 3-bit multiplier with the proposed scheme, partitions into two and three subcomponents were implemented individually. To achieve more symmetrical computational complexity in each decomposed subcomponent, the system outputs were partitioned as follows: (1,3,5), (2,4,6) for 2 subcomponents and (1,4), (2,5), (3,6) for 3 subcomponents. The same system decomposition rule was also employed in incremental evolution.
Comparisons of the device cost, the chromosome length of each subcomponent, the number of successful EA runs that found feasible logic circuits (times of success), the average and standard deviation of the number of generations, and the average total evolution time for direct evolution, incremental evolution, and the multi-VRC cores-based approach are shown in Table 1. We performed 100 runs for each case. The evolvable 3-bit adder has 6 inputs and 4 outputs; an input carry is not considered in this work. Only the two-subcomponent system decomposition was implemented in this experiment. Table 2 summarizes all results for evolving the 3-bit adder under different settings. All average results are from 100 independent EA runs for each case.
Table 1. The results of evolving 3-bit multiplier with different evolution strategies
EA type                Divided   Device cost  Chromosome  Number of generations     Total evolution  Times of
                       outputs   (slices)     (bits)      avg.          std. dev.   time (avg.)      success
Direct evolution       1-6       4274         792         18081238      19461400    77.147 sec       50
Incremental evolution  1,3,5     2984         432         1526060       1803625     15.569 sec       58
                       2,4,6                              2122824       2297710                      61
                       1,4       3423         432         505963        494748      4.400 sec        70
                       2,5                                391404        353602                       67
                       3,6                                133943        165753                       60
Multi-VRC cores        1,3,5     4505         432         2450171       2326822     10.454 sec       61
                       2,4,6
                       1,4       6422         432         610772        409938      2.606 sec        57
                       2,5
                       3,6
Table 2. The results of evolving 3-bit adder with different evolution strategies
EA type                Divided   Device cost  Chromosome  Number of generations     Total evolution  Times of
                       outputs   (slices)     (bits)      avg.          std. dev.   time (avg.)      success
Direct evolution       1-4       4130         792         380424        465868      1.623 sec        47
Incremental evolution  1,3       2948         432         48011         63766       0.455 sec        56
                       2,4                                58684         70317                        57
Multi-VRC cores        1,3       4460         432         69256         62631       0.295 sec        51
                       2,4
6 Discussion
We have presented the results of our initial experiments on the multi-VRC cores-based evolvable system. The analysis is conducted on the two examples presented in this paper. In all cases, the numbers of successful EA runs finding feasible logic circuits under direct evolution, incremental evolution, and our proposed approach are comparable. Since the main motive of this paper is to develop an efficient evolvable system that conquers the scalability issue of EHW, we also compare the computational cost required by the three approaches. The computational cost of each evolutionary strategy can be evaluated by the average total system evolution time. It is clear that the multi-VRC cores-based EHW outperforms the other approaches in all cases. The results indicate that the EA execution time depends significantly on the chosen level of system decomposition. For the evolution of the 3-bit multiplier, the speedup obtained by multi-VRC cores with two decomposed subcomponents is 7.4 (against direct evolution) and 1.5 (against incremental evolution with two subcomponents). With the three-subcomponent
decomposition, the speedup of the multi-VRC cores-based EHW is 29.6 (against direct evolution) and 1.7 (against incremental evolution with three subcomponents). Similar comparative results are obtained in the evolution of the 3-bit adder. The proposed multi-VRC cores-based EHW can be considered a hybridization of parallel intrinsic evolution and divide-and-conquer-based incremental evolution. The better performance in computational cost obtained by our approach is mainly due to two features of the proposed evolvable system: (1) Parallel multi-VRC cores with powerful computational ability. In our approach, all the VRC cores are implemented in one FPGA, which executes the evolution of subcomponents (e.g. fitness evaluation and genetic operations) in parallel. The most obvious advantage of this implementation is that the evolution of each subcomponent is completely pipelined and parallel. The proposed hardware implementation promises to overcome the overhead introduced by slow inter-processor communication and setup-time issues in general multi-processor-based parallel evolution. (2) The system decomposition strategy. The main advantage of decomposing the system output is that evolution is performed on smaller subcomponents with fewer outputs than the top system. The number of gates required to implement each subcomponent can be reduced because of the smaller system output, and a shorter chromosome can be employed to represent each subcomponent. For example, in our experiments, the chromosome length of each subcomponent was reduced from 792 to 432 bits. Decomposing the output function also reduces the computational complexity of the problem to be solved. Therefore, by partitioning the system output, a simpler and smaller EA search space is achieved in the evolution of each subcomponent. Another interesting observation concerns the hardware implementation cost.
With the introduction of the system decomposition strategy, the device cost is larger than that of the traditional direct evolution approach, since more VRC cores are required than in direct evolution (which employs only one VRC). However, the increase in device cost is not very significant in multi-VRC cores-based EHW. From our synthesis results, the hardware cost of the two-VRC cores-based approach is very close to that of direct evolution. In the three-VRC cores-based approach, the device cost is 1.5 times that of direct evolution. This is because smaller input/output combinations are employed in the decomposed subcomponents: in line with Shannon's effect [15], we can employ a smaller FE array and chromosome memory to implement each partitioned subcomponent. On the other hand, we should recall that our original motive is to conquer the scalability issue of EHW. Device cost is not considered a serious issue in our work, because the number of transistors available in circuits keeps increasing according to Moore's Law.
7 Conclusion
In this paper, we have presented a novel scalable evolvable hardware architecture, known as the multi-VRC cores architecture, for synthesizing combinational logic circuits. The proposed approach uses a divide-and-conquer-based technique to decompose the top system into several subcomponents. All subcomponents are then evolved independently and in parallel by their corresponding VRC cores. The experimental results show
that our proposed scheme performs significantly better than direct evolution and incremental evolution in terms of EA execution time. Both the 3-bit multiplier and the adder can be evolved in less than 3 seconds, which has not been matched by any other reported evolvable system. Future work will be devoted to applying this scheme to other, more complex real-world applications.
Acknowledgments. This work was supported by the Korean MOCIE under the research project 'Super Intelligent Chip Design'.
References 1. Torresen, J.: Possibilities and Limitations of Applying Evolvable Hardware to Real-world Application. In: Proc. of the 10th International Conference on Field Programmable Logic and Applications, FPL-2000, Villach, Austria, pp. 230–239 (2000) 2. Yao, X., Higuchi, T.: Promises and Challenges of Evolvable Hardware. IEEE Transactions on Systems, Man, and Cybernetics 29(1), 87–97 (1999) 3. Torresen, J.: A Divide-and-Conquer Approach to Evolvable Hardware. In: Sipper, M., Mange, D., Pérez-Uribe, A. (eds.) ICES 1998. LNCS, vol. 1478, pp. 57–65. Springer, Heidelberg (1998) 4. Torresen, J.: Evolving Multiplier Circuits by Training Set and Training Vector Partitioning. In: Tyrrell, A.M., Haddow, P.C., Torresen, J. (eds.) ICES 2003. LNCS, vol. 2606, pp. 228–237. Springer, Heidelberg (2003) 5. Wang, J., et al.: Using Reconfigurable Architecture-Based Intrinsic Incremental Evolution to Evolve a Character Classification System. In: Hao, Y., Liu, J., Wang, Y.-P., Cheung, Y.-m., Yin, H., Jiao, L., Ma, J., Jiao, Y.-C. (eds.) CIS 2005. LNCS (LNAI), vol. 3801, pp. 216–223. Springer, Heidelberg (2005) 6. Murakawa, M., et al.: Hardware Evolution at Function Level. In: Ebeling, W., Rechenberg, I., Voigt, H.-M., Schwefel, H.-P. (eds.) PPSN IV 1996. LNCS, vol. 1141, pp. 62–71. Springer, Heidelberg (1996) 7. Zhang, Y., et al.: Digital Circuit Design Using Intrinsic Evolvable Hardware. In: Proc. Of the 2004 NASA/DoD Conference on the Evolvable Hardware, pp. 55–63. IEEE Computer Society Press, Los Alamitos (2004) 8. Sekanina, L.: Virtual Reconfigurable Circuits for Real-World Applications of Evolvable Hardware. In: Tyrrell, A.M., Haddow, P.C., Torresen, J. (eds.) ICES 2003. LNCS, vol. 2606, pp. 186–197. Springer, Heidelberg (2003) 9. Sekanina, L.: Evolutionary Design of Digital Circuits: Where Are Current Limits? In: Proc. of the First NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2006, pp. 171–178. IEEE Computer Society Press, Los Alamitos (2006) 10. 
Gordon, V.S., Whitley, D.: Serial and Parallel Genetic Algorithms as Function Optimizers. In: Proc. of the Fifth International Conference on Genetic Algorithms, pp. 177–183. Morgan Kaufmann, San Mateo, CA (1993) 11. Cantu-Paz, E.: A Survey of Parallel Genetic Algorithms. Calculateurs Parallels 10(2), 141–171 (1998) 12. Coello Coello, C.A., Aguirre, A.H.: Design of Combinational Logic Circuits Through an Evolutionary Multiobjective Optimization Approach. Artificial Intelligence for Engineering, Design, Analysis and Manufacture 16(1), 39–53 (2002) 13. Potter, M.A., De Jong, K.A.: Cooperative Co-evolution: An Architecture for Evolving Coadapted Subcomponents. Evolutionary Computation 8(1), 1–29 (2000)
J. Wang, C.H. Piao, and C.H. Lee
14. Kalganova, T.: Bidirectional Incremental Evolution in Extrinsic Evolvable Hardware. In: Proc. of the 2nd NASA/DoD Workshop on Evolvable Hardware, pp. 65–74. IEEE Computer Society Press, Los Alamitos (2000) 15. Sekanina, L., et al.: An Evolvable Combinational Unit for FPGAs. Computing and Informatics 23(5), 461–486 (2004) 16. Miller, J.F., Thomson, P.: Cartesian Genetic Programming. In: Poli, R., Banzhaf, W., Langdon, W.B., Miller, J., Nordin, P., Fogarty, T.C. (eds.) EuroGP 2000. LNCS, vol. 1802, pp. 121–132. Springer, Heidelberg (2000) 17. Celoxica Inc., RC1000 Hardware Reference Manual V2.3 (2001) 18. http://www.xilinx.com 19. Martin, P.: A Hardware Implementation of a Genetic Programming System Using FPGAs and Handel-C. Genetic Programming and Evolvable Machines 2(4), 317–343 (2001) 20. Bensaali, F., et al.: Accelerating Matrix Product on Reconfigurable Hardware for Image Processing Applications. IEE proceedings-Circuits, Devices and Systems 152(3), 236–246 (2005) 21. Wolfram, S.: Universality and Complexity in Cellular Automata. Physica 10D, 1–35 (1984) 22. Miller, J.F., et al.: Principles in the Evolutionary Design of Digital Circuits–Part I. Journal of Genetic Programming and Evolvable Machines 1(1), 7–35 (2000)
An Intrinsic Evolvable Hardware Based on Multiplexer Module Array
Jixiang Zhu, Yuanxiang Li, Guoliang He, and Xuewen Xia
State Key Laboratory of Software Engineering, Wuhan University
[email protected]
Abstract. Traditionally, designing analog and digital electronic circuits has been a hard engineering task, but with the emergence of Evolvable Hardware (EHW) and many researchers' significant work in this domain, EHW has been established over the last decade or so as a promising approach to the automatic design of digital and analog circuits. At present, research in the EHW field is focused mainly on extrinsic and intrinsic evolution; in this paper we fix our attention on intrinsic evolution. Work on implementing intrinsic evolution mainly follows three approaches: first, evolving the configuration bitstream directly and then recomposing it; second, amending the contents of Look-Up Tables (LUTs) with appropriate tools; third, setting up a virtual circuit on a physical chip and then evolving its "parameters", which are defined by the designer, so that when the parameters change, the corresponding circuit evolves. This paper sets aside the first and second approaches and proposes a virtual circuit based on a Multiplexer Module Array (MMA), implemented on a Xilinx Virtex-II Pro (XC2VP20) FPGA.
Keywords: intrinsic, digital, multiplexer, FPGA.
1 Introduction
Evolvable Hardware (EHW) is the application of Genetic Algorithms (GA) and Genetic Programming (GP) to electronic circuits and devices [3]. Research on EHW is mainly concentrated on extrinsic evolution and intrinsic evolution. In extrinsic evolution, the fitness of individuals is evaluated by software simulation during the evolution, and only the final best individual is downloaded to the physical chip. For digital circuits, extrinsic evolution is essentially a means of evolving certain functionalities; in other words, the intention of research on extrinsic evolution is to find a better algorithm that helps us obtain an optimized and correct circuit more easily than previous algorithms do. Some such algorithms have been proposed by Tatiana Kalganova et al. in [8,9], who have successfully evolved digital circuits by input decomposition and output decomposition, among other methodologies. In intrinsic evolution, by contrast, every individual is downloaded to the physical chip during the evolution; the chip evaluates the corresponding circuit and returns its fitness to the GA process.
L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 35–44, 2007. © Springer-Verlag Berlin Heidelberg 2007
There are many methods of implementing intrinsic evolution. A methodology for direct evolution is described in [1]; it is the most intuitive form of intrinsic evolution. However, this approach requires a very thorough understanding of the given FPGA as well as familiarity with the structure of its configuration bitstream. Although the author introduces the bitstream composition of the XC2V40 and shows how to locate the LUT contents in a configuration bitstream, the approach has two fatal limitations in flexibility. On one hand, even FPGAs produced by the same corporation differ in configuration composition between device types; if the experimental environment changes, we must spend unwanted effort familiarizing ourselves with the new environment, re-parsing the configuration, locating the LUT contents among millions of configuration bits, and linking them into a gene for evolution, which is an obvious lack of portability. On the other hand, an illegal bitstream may destroy the FPGA; evolving the configuration bitstream directly easily generates illegal bitstreams, and when these are downloaded to the FPGA, the chip may stop working or even be damaged. Another methodology, introduced in [2,3,4], makes use of the JBits SDK or other third-party tools provided by Xilinx. JBits is a set of Java classes that provide an Application Program Interface (API) into the Xilinx Virtex FPGA family bitstream; it offers the capability of designing and dynamically modifying circuits in Xilinx Virtex series FPGA devices. Compared with the previous method, this method can modify the LUT contents more securely, and it has proved to be a feasible method for intrinsic evolution. However, it still has some limitations [2]: the largest drawback of the JBits API is its manual nature; everything must be explicitly stated in the source code, including the routing.
Another equally important limitation is that the JBits API also requires users to be very familiar with the architecture of a specific FPGA; hence, this method lacks flexibility. A virtual EHW FPGA based on the "sblock" has been proposed in [5]. The functionality of an sblock and its connectivity to its neighbors are achieved by configuring its LUT; to alter the function of an sblock or change its connectivity, the LUT is reprogrammed, and the virtual EHW FPGA is then mapped to the physical FPGA. The virtual EHW FPGA has many of the characteristics of the physical FPGA; it is able to solve genome complexity issues and enables the evolution of large, complex circuits. Ultimately, however, this method still operates on the LUT contents at a low level, which raises concerns about its safety. Putting the above three methodologies in perspective: ultimately, they all change the LUT contents, directly or indirectly, for intrinsic evolution, and their shared flaws are deficiencies in flexibility, portability and security. The virtual-circuit methodology does not operate on the LUT contents directly; it is a middle-level structure between the FPGA and the GA, so we do not need to spend excessive effort becoming familiar with the given FPGA. Some virtual circuits have been introduced in [6,7]. In this paper, we propose a new virtual circuit, the MMA; compared with previous virtual circuits, its primary advantage is convenience in encoding and decoding. The MMA is designed in VHDL and implemented on a Xilinx Virtex-II Pro series FPGA, the XC2VP20. By setting up an MMA in a physical chip, we only need to evolve a data structure defined by ourselves to realize intrinsic evolution. During the GA process,
we need neither manipulate the bitstream nor change the LUT contents, so the evolution is safe. In addition, this method only requires us to grasp the general approach of designing a simple circuit in VHDL or Verilog HDL, instead of mastering the internal structure of a specific FPGA like a professional hardware engineer. Since VHDL is a universal hardware description language, in this sense the method is "evolution friendly" [5] and in possession of flexibility. In the following sections, a detailed description of the MMA methodology is laid out.
2 The Structure of MMA
In this section, we introduce the structure of the MMA in a bottom-up manner.
2.1 Function Styles
The function styles are defined by ourselves. They may be basic logic gates such as AND, OR and NOT, or more complex function modules, depending on whether gate-level or function-level EHW is to be carried out. In our experiment, we defined ten function styles, including basic logic gates and some simple functions; every function style has an opposite function style, see Table 1. During the evolution, the numbers '0' to '9' represent the corresponding functions; this table is looked up when decoding chromosomes to circuits.

Table 1. Function styles in our experiment

  0: a            5: !(a + b)
  1: !a           6: a ⊕ b
  2: a • b        7: !(a ⊕ b)
  3: a + b        8: a • !c + b • c
  4: !(a • b)     9: !(a • !c + b • c)
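As a software reference for Table 1, the ten function styles can be modelled as follows (a minimal Python sketch, not part of the VHDL design; the bit-level encoding and function names are our own assumptions):

```python
# Function styles 0-9 from Table 1; a, b, c are single bits (0 or 1).
# Two-input styles simply ignore c.
FUNCTION_STYLES = [
    lambda a, b, c: a,                              # 0: a (buffer)
    lambda a, b, c: 1 - a,                          # 1: !a
    lambda a, b, c: a & b,                          # 2: a AND b
    lambda a, b, c: a | b,                          # 3: a OR b
    lambda a, b, c: 1 - (a & b),                    # 4: !(a AND b)
    lambda a, b, c: 1 - (a | b),                    # 5: !(a OR b)
    lambda a, b, c: a ^ b,                          # 6: a XOR b
    lambda a, b, c: 1 - (a ^ b),                    # 7: !(a XOR b)
    lambda a, b, c: (a & (1 - c)) | (b & c),        # 8: a•!c + b•c
    lambda a, b, c: 1 - ((a & (1 - c)) | (b & c)),  # 9: !(a•!c + b•c)
]

def evaluate_style(style, a, b, c=0):
    """Evaluate function style `style` on bit inputs a, b, c."""
    return FUNCTION_STYLES[style](a, b, c)
```

Note that each style pairs with its complement (0/1, 2/4, 3/5, 6/7, 8/9), matching the remark above that every function style has an opposite, and that styles 8 and 9 are the only three-input styles.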
2.2 Multiplexer Module
Figure 1 illustrates the internal structure of a single multiplexer module in our system. The blocks marked "MUX1", "MUX2", "MUX3" and "MUX4" represent the multiplexers; those marked "reg1", "reg2", "reg3" and "reg4" represent the control ports of the corresponding multiplexers; and those marked "Function Style 0", "Function Style 1", ..., "Function Style i" represent the corresponding function modules defined in the previous section. The input signals of each multiplexer module are introduced in Section 2.3; the output of "MUX4" is the output of the whole multiplexer module. Every multiplexer module contains several input-switch multiplexers and one function-switch multiplexer. The number of input-switch multiplexers depends on the function styles: if some function style has a maximum of n inputs, then n input-switch multiplexers are needed even though the other function styles have fewer than n inputs, because all possible function styles must be ensured sufficient inputs.
In the proposed system, each multiplexer module contains four multiplexers: three for input switching and one for function switching. As shown in Figure 1, "MUX1", "MUX2" and "MUX3" are used for input switching, because there are two three-input functions in our experiment, and "MUX4" is used for function switching. Each of the four multiplexers is connected to a register, and the contents of these registers determine which input becomes the output. The input signals of a single multiplexer module are all connected to "MUX1", "MUX2" and "MUX3"; each of these three multiplexers selects one input signal as an input of the following function modules, so all function modules receive three inputs. If a function style needs fewer than three input signals, the redundant inputs are left unused. The output of "MUX4", selected from the function modules' outputs according to "reg4", is the output of the whole multiplexer module, and the module then has the functionality of the corresponding function module. In conclusion, if the circuit needs to be changed in logical structure or functionality, one only needs to change the contents of these four registers.

Fig. 1. Structure of a Multiplexer Module
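The behaviour of a single multiplexer module can be sketched in software: reg1 to reg3 steer MUX1 to MUX3 to pick the three function inputs from the incoming signal list, and reg4 steers MUX4 to pick which function style drives the output (a hedged Python sketch; the style table follows Table 1, and the register semantics are our reading of Fig. 1):

```python
# Function styles from Table 1; two-input styles ignore c.
STYLES = [
    lambda a, b, c: a,                              # 0: a
    lambda a, b, c: 1 - a,                          # 1: !a
    lambda a, b, c: a & b,                          # 2
    lambda a, b, c: a | b,                          # 3
    lambda a, b, c: 1 - (a & b),                    # 4
    lambda a, b, c: 1 - (a | b),                    # 5
    lambda a, b, c: a ^ b,                          # 6
    lambda a, b, c: 1 - (a ^ b),                    # 7
    lambda a, b, c: (a & (1 - c)) | (b & c),        # 8
    lambda a, b, c: 1 - ((a & (1 - c)) | (b & c)),  # 9
]

def multiplexer_module(input_signals, reg1, reg2, reg3, reg4):
    """One multiplexer module: MUX1..MUX3 (controlled by reg1..reg3)
    select three inputs from the signal list; MUX4 (controlled by
    reg4) selects which function style produces the module output."""
    a = input_signals[reg1]  # MUX1 output
    b = input_signals[reg2]  # MUX2 output
    c = input_signals[reg3]  # MUX3 output, unused by two-input styles
    return STYLES[reg4](a, b, c)
```

For instance, with registers (7, 12, 0, 6) the module outputs signals[7] XOR signals[12], which is exactly the decoding walked through in Section 3.3.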
2.3 Multiplexer Module Array
Figure 2 illustrates the structure of an MMA prepared for a circuit with m inputs (I0, I1, I2, ..., Im-1) and n outputs (O0, O1, O2, ..., On-1); the blanks represent the multiplexer modules and the arrowheads represent the routing. The size of the MMA is determined by the scale of the target circuit: the number of columns can be chosen freely, while the number of rows cannot be less than the larger of the input and output counts, i.e. max(m, n). When the MMA is described in VHDL, all multiplexer modules are connected to the previous column of modules and to the primal inputs of the circuit, so during the GA process the physical structure of the circuit is not actually changed; the only thing we need to do is evolve the contents of the registers. As mentioned in Section 2.2, these registers determine the logical structure and functionality of the circuit; hence, when the registers evolve, the circuit changes.
Take designing a 4-bit adder as an example: a 4-bit adder has 8 inputs and 5 outputs, so we design an 8*8 MMA. Each of the 8 multiplexer modules in the first column is connected directly to the 8 primal inputs, while from the second column to the eighth column, every multiplexer module is connected to the 8 primal inputs and to the 8 outputs of the previous column. That is to say, the modules in the first column have only 8 inputs, whereas all other modules have 16 inputs; the 5 outputs are selected from the outputs of the 8 modules in the last column.

Fig. 2. The structure of MMA
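Putting the module and array wiring together, a configured MMA can be simulated in software for one input vector (a hedged sketch: the ordering of modules within the chromosome is our assumption, the wiring follows the 8*8 adder description above):

```python
# Function styles from Table 1 (two-input styles ignore c).
STYLES = [
    lambda a, b, c: a,            lambda a, b, c: 1 - a,
    lambda a, b, c: a & b,        lambda a, b, c: a | b,
    lambda a, b, c: 1 - (a & b),  lambda a, b, c: 1 - (a | b),
    lambda a, b, c: a ^ b,        lambda a, b, c: 1 - (a ^ b),
    lambda a, b, c: (a & (1 - c)) | (b & c),
    lambda a, b, c: 1 - ((a & (1 - c)) | (b & c)),
]

def evaluate_mma(chromosome, primal_inputs, rows=8, cols=8):
    """Evaluate a configured MMA on one input vector.

    chromosome: 4 integers (reg1..reg4) per module; modules are
    assumed ordered column by column, row by row within a column.
    Column 0 sees only the primal inputs (numbers 0..rows-1); later
    columns also see the previous column's outputs appended after
    them (numbers rows..2*rows-1). Returns the outputs of the last
    column, from which the circuit outputs are selected."""
    prev = []
    for col in range(cols):
        signals = list(primal_inputs) + prev
        outputs = []
        for row in range(rows):
            base = 4 * (col * rows + row)
            r1, r2, r3, r4 = chromosome[base:base + 4]
            a, b, c = signals[r1], signals[r2], signals[r3]
            outputs.append(STYLES[r4](a, b, c))
        prev = outputs
    return prev
```

A chromosome in which every module is a buffer (style 0) of the primal input of its own row simply propagates the input vector to the last column, which gives a quick sanity check of the wiring.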
3 The Proposed System
In this section, we introduce the proposed system.
3.1 The Experiment Platform
The platform of our research is an ADM-XPL card carrying an XC2VP20 device; it is installed in a PC via PCI, and the XC2VP20 is ready for use once the driver is installed correctly. The ADM-XPL is an advanced PCI Mezzanine Card (PMC) supporting Xilinx Virtex-II Pro (V2PRO) devices; an on-board high-speed multiplexed address and data local bus connects the bridge to the target FPGA, and the on-board memory resources include DDR SDRAM, pipelined ZBT and Flash, all of which are optimized for direct use by the FPGA using IP and toolkits provided by Xilinx. More detailed specifications can be found in [10,11,12]. Figure 3 illustrates the proposed system explicitly: the FPGA Space is allocated from usable memories or registers in the FPGA, and the arrowheads reflect the direction of data flow. The GA process on the PC is written in C++, while the modules in the XC2VP20 are described in VHDL; a Counter is necessary for generating the primal inputs of the MMA. During the GA process, the chromosomes are written to the FPGA Space one by one. Once a single chromosome has been written, the MMA configures the circuit according to the current chromosome; after that, the Counter will
generate the primal inputs, from "000...0" (all 0s) to "111...1" (all 1s). At the same time, the corresponding outputs for every input combination are evaluated by the configured circuit. When the Counter reaches the all-1 state, the evaluation stops; the FPGA Space then returns the truth table of the current circuit to the GA process, where it is compared with the target truth table and the fitness of the current chromosome is evaluated. The next chromosome then goes through the same stages and obtains its fitness, until the GA process terminates.
Fig. 3. Intrinsic Evolution Framework
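The Counter's sweep and the resulting truth table can be mimicked on the host side (a Python sketch; in the real system this loop runs inside the FPGA, and the table is read back from the FPGA Space):

```python
def truth_table(circuit, n_inputs):
    """Enumerate every input combination, as the Counter does, and
    record the configured circuit's outputs.

    circuit: a function taking a tuple of bits and returning a tuple
    of output bits. Returns the truth table as a list of output
    vectors, indexed by the integer value of the input combination
    (bit i of the value drives input i)."""
    table = []
    for value in range(2 ** n_inputs):   # from 000...0 to 111...1
        bits = tuple((value >> i) & 1 for i in range(n_inputs))
        table.append(tuple(circuit(bits)))
    return table
```

For example, applying this sweep to a half adder (sum, carry) yields the 4-row table that would be compared against the target.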
In each such cycle, the GA process calls the relevant API functions to write data to and read data from the FPGA Space, so the C++ compiler must be configured to use the API header files and libraries, following the approach introduced in [10].
3.2 VHDL Design
In the proposed system, designing the virtual circuit in VHDL is a preliminary task. Referring again to Figure 3, the virtual circuit contains three modules. The first, marked "FPGA Space", is a location allocated from the usable memory resources mentioned in the previous section; the second is the MMA; and the third is a Counter. The FPGA Space module receives chromosomes and returns the truth table; it is the interface between the GA process and the MMA: the GA process writes every chromosome to the FPGA Space module, and the FPGA Space module returns the truth table to the GA process. The MMA module reads data from the FPGA Space module and then maps them to a real functional circuit. The Counter module's outputs are the inputs of the MMA module; it starts generating the input combinations of the MMA module as soon as the circuit has been mapped. Once an input combination has been generated, the corresponding output combination is written to an appointed location of the FPGA Space module; when all the
input combinations have been completed, these locations constitute the truth table of the circuit. Once this preliminary work is finished, the design is universal for evolving different target circuits, or needs only tiny modifications, such as widening the MMA or shifting its routing, to evolve more complex target circuits.
3.3 Encoding and Decoding
A pivotal problem in the EHW domain is how to encode circuits into chromosomes, which directly influences both the course of the evolution and the final outcome. A good coding should satisfy at least three conditions: first, its length should be kept within a range the GA can handle; second, it should decode to a practical circuit conveniently; third, the evolution of the coding should reflect the evolution of both functionality and routing. The methodology proposed in this paper has an advantage in obtaining such a coding. In Section 2.2, we mentioned that every multiplexer module has four control ports, connected to four registers respectively, so we can describe a multiplexer module with four integers and then join the four integers of all multiplexer modules together, row by row, into a chromosome. In the experiment, we evolved a 4-bit adder and designed an 8*8 MMA; the modules in the first column had 8 inputs numbered from '0' to '7', while those in the other 7 columns had 16 inputs numbered from '0' to '15', where the first 8 sequence numbers represented the 8 primal inputs and the last 8 represented the 8 outputs of the previous column. As an example (see Figure 4), suppose a multiplexer module is described by the four integers 7, 12, 0, 6:
Fig. 4. An example
We decode this single module as follows: the number 7 means that the last primal input is the first input of this module; the number 12 means that the output of the fifth-row module in the previous column is the second input; the number 0 means that the first primal input is the third input; and the number 6 selects function style 6, which by Table 1 is a ⊕ b. This function has only two inputs, so, as explained in Section 2.2, the third input is ignored, and the module has the functionality of XOR. The 8*8 MMA has 64 such modules; joining their 64*4 integers together as a chromosome gives a chromosome length of 256 in our experiment.
3.4 Fitness Evaluation
The fitness is evaluated by the GA: once the GA process obtains the truth table of the current circuit from the appointed locations of the FPGA, it compares this truth table with the truth table of the target circuit, checks them bit by bit and counts the matching bits;
the percentage of matching bits is the fitness of the current circuit. There are some trivial differences in fitness evaluation between different GA processes; see [8,9] for more detailed descriptions of fitness evaluation.
3.5 Experiment Results
In our experiment, the maximum number of GA generations was 2,000,000 and the program was executed 10 times; nine runs fell into the "fitness stalling effect" [9], and only once did the fitness reach 100%. The low success rate is mainly attributable to the following reasons. First, theoretically an 8*8 MMA is large enough to evolve a 4-bit adder, but in practice it is difficult to evolve a fully functional 4-bit adder within a finite number of generations at this size, so increasing the size of the MMA would help improve the success rate. Second, the routing between multiplexer modules in our experiment was rigid: every module could only connect to the previous column's modules or the primary inputs, and each module has only one output, so many modules were not fully utilized; a more flexible routing would increase the likelihood of successful evolution at a limited size. Third, the function styles defined in the proposed system are too simple (see Table 1: all of them have only one output); extending them to multi-output or more complex function styles would improve the efficiency of evolution. In addition, the GA operators in our experiment were not efficient enough, because we did not try hard to seek a better algorithm for evolution; this may be another reason for the low success rate.
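The match-bit measure of Section 3.4 reduces to a bitwise comparison of two truth tables (a minimal Python sketch; the tables are assumed to be equal-length lists of output bit-vectors, as produced by the counter sweep):

```python
def fitness(current_table, target_table):
    """Percentage of matching bits between the evolved circuit's
    truth table and the target circuit's truth table."""
    matches = 0
    total = 0
    for cur_row, tgt_row in zip(current_table, target_table):
        for cur_bit, tgt_bit in zip(cur_row, tgt_row):
            matches += (cur_bit == tgt_bit)
            total += 1
    return 100.0 * matches / total
```

A chromosome is fully evolved when this value reaches 100%, i.e. every output bit of every input combination matches the target.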
How to improve the success rate of the evolution will be our next research step; the main point of this paper is to propose a model for intrinsic evolution, and the experimental result shows that it is a feasible methodology. Table 2 shows a fully evolved chromosome, in which every four integers represent a single multiplexer module. The shaded modules are not used in the final circuit; there are 25 shaded modules, so the other 39 modules constitute the final circuit.
Table 2. A fully evolved chromosome
5 9 3 11 8 7 7 10 1 13 6 3 1 0 9 7 3 10 06 4 1 8 15
7 5 6 8 5 2 3 0
12 5 8 3 6 2 10 4 2 6 14 3 4 11 6 9 6 4 10 2 13 8 9 0 10 5 7 6 4 14 3 5
6 8 12 4 9 7 2 7 6 2 11 2 9 6 10 1 3 7 15 4 10 8 12 4 11 3 9 0 1 9 7 2 11 13 5 2 10 3 2 0 5 1 11 3 13 11 10 0 13 10 9 2 5 7 13 6 9 3 5 1 12 8 2 3 9 12 5 3 1 5 13 2 5 9 10 7 11 9 1 0 14 12 8 0 9 10 1 5 6 10 5 10 5 1 13 3
5 1 8 4 0 4 8 3 15 11 7 4 10 13 9 3 8 14 12 8 4 0 15 7 12 7 9 0 1 5 10 7
8 10 2 4 9 11 7 2 2 6 13 6 15 12 7 4 13 2 4 1 14 5 6 0 12 1511 3 0 4 11 2
9 15 6 12 8 3 14 11 0 13 10 7 7 3 0 15 4 1 2 9 10 4 13 0
3 6 4 6 6 5 8 2
Decoding this chromosome into a practical circuit following the method described in Section 3.3 gives the circuit shown in Figure 5. The 25 invalid modules are not included in the circuit diagram while the other 39 valid modules are, but we
can see only 32 modules in this diagram; this is because 7 modules have function style 0 (see Table 1), meaning they act as buffers, so we draw them as wires for the sake of clarity.
Fig. 5. The corresponding circuit diagram of the fully evolved chromosome
4 Conclusion
The main purpose of this paper is to propose an intrinsic evolution methodology based on the MMA. As an example, we introduced the proposed system by evolving a 4-bit adder; of course, the 4-bit adder has been evolved successfully by other methodologies, and we only want to show that ours is a feasible methodology and to illustrate that, compared with some previous methodologies, it has advantages in circuit encoding and decoding, flexibility, portability, etc. In this experiment, we did not take any special measures to control the evolution in order to improve the efficiency and success rate of the GA process; that is not the focus of this paper, but it will be our next research step.
Acknowledgement
The first author would like to thank the State Key Laboratory of Software Engineering, Wuhan University, for supporting our research work in this field, and the National
Natural Science Foundation under Grant No. 60442001 and the National High-Tech Research and Development Program of China (863 Program) under Grant No. 2002AA1Z1490. The authors also thank the anonymous referees.
References 1. Upegui, A., Sanchez, E.: Evolving Hardware by Dynamically Reconfiguring Xilinx FPGAs. In: Moreno, J.M., Madrenas, J., Cosp, J. (eds.) ICES 2005. LNCS, vol. 3637, pp. 56–65. Springer, Heidelberg (2005) 2. Guccione, S., Levi, D., Sundararajan, P.: JBits: Java Based Interface for Reconfigurable Computing. Xilinx Inc. (1999) 3. Hollingworth, G., Smith, S., Tyrrell, A.: Safe Intrinsic Evolution of Virtex Devices. In: The Second NASA/DoD Workshop on Evolvable Hardware, pp. 195–202 (2000) 4. Hollingworth, G., Smith, S., Tyrrell, A.: The Intrinsic Evolution of Virtex Devices Through Internet Reconfigurable Logic. In: Miller, J.F., Thompson, A., Thompson, P., Fogarty, T.C. (eds.) ICES 2000. LNCS, vol. 1801, p. 72. Springer, Heidelberg (2000) 5. Haddow, P.C., Tufte, G.: Bridging the Genotype-Phenotype Mapping for Digital FPGAs, pp. 109–115. IEEE Computer Society Press, Los Alamitos (2000) 6. Sekanina, L.: On Dependability of FPGA-Based Evolvable Hardware Systems That Utilize Virtual Reconfigurable Circuits, pp. 221–228. ACM, New York (2006) 7. Glette, K., Torresen, J.: A Flexible On-Chip Evolution System Implemented on a Xilinx Virtex-II Pro Device. In: Moreno, J.M., Madrenas, J., Cosp, J. (eds.) ICES 2005. LNCS, vol. 3637, pp. 66–75. Springer, Heidelberg (2005) 8. Kalganova, T.: Bidirectional Incremental Evolution in Extrinsic Evolvable Hardware. In: Proc. of the Second NASA/DoD Workshop on Evolvable Hardware (EH'00), pp. 65–74 (2000) 9. Stomeo, E., Kalganova, T., Lambert, C.: Generalized Disjunction Decomposition for Evolvable Hardware. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 36(5), 1024–1043 (2006) 10. Alpha Data Parallel Systems Ltd. ADM-XRC SDK 4.7.0 User Guide (Win32), Version 4.7.0.1 (2006) 11. Alpha Data Parallel Systems Ltd. ADM-XRC-PRO-Lite (ADM-XPL) Hardware Manual, Version 1.8 (2005) 12. Alpha Data Parallel Systems Ltd. ADC-PMC2 User Manual, Version 1.1 (2006)
Estimating Array Connectivity and Applying Multi-output Node Structure in Evolutionary Design of Digital Circuits
Jie Li and Shitan Huang
Xi'an Microelectronics Technology Institute, 710071 Xi'an, Shaanxi, China
[email protected]
Abstract. Array connectivity is an important feature for measuring the efficiency of evolution. Generally, connectivity is estimated from the array geometry and the level-back separately. In this paper, a connectivity model based on the number of paths between the first node and the last node is established. With the help of multinomial coefficient expansion, a formula for estimating array connectivity is presented. With this technique, the array geometry and level-back are taken into account simultaneously, and comparison of connectivity between arrays of different geometries and level-backs becomes possible. Guided by this approach, a multi-output node structure is developed; this structure promotes connectivity without increasing the array size. A multiobjective fitness function based on the power consumption and critical delay of circuits is also proposed, which enables evolved circuits to meet the requirements of applications. Experimental results show that the proposed approach offers flexibility in constructing circuits and thus improves the efficiency of the evolutionary design of circuits. Keywords: Evolvable Hardware, array connectivity, multinomial coefficient, multi-output node structure.
1 Introduction
Graphical rectangular arrays are commonly used in evolutionary design to map a digital circuit into a genotype [1][2][3][4][5]. In this representation, each input of a node can be connected to one output of the nodes in the previous columns; feedback and connections within the same column are not allowed. The array is characterized by its geometry and level-back [6]. The array geometry refers to the number of columns and rows of the array [7]. The level-back indicates the maximum number of columns preceding the current node whose outputs may be connected to the inputs of that node [6]. The values of the array geometry and level-back determine the connectivity of the array: the larger the size of an array and the larger the level-back, the greater the array connectivity. As the array connectivity increases, the algorithm gains more flexibility in constructing circuits that match the functional requirements, which increases the evolutionary efficiency. Miller studied the effects on efficiency caused by array
L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 45–56, 2007. © Springer-Verlag Berlin Heidelberg 2007
geometry [8]. He argued that an array of smaller size may tend to produce higher-fitness solutions, provided it is sufficient to realize the circuit. Kalganova applied different array geometries and level-backs to evolving multiple-valued combinational circuits [9] and investigated their influence on the average fitness and success ratio; the conclusion is that by carefully selecting the geometry and level-back it is possible to improve EA performance. She also pointed out that the effect of the number of rows is less important in the evolutionary process. In [4], Kalganova aimed to improve the solution quality by dynamically changing the array geometry. Other studies took another route and obtained maximum connectivity by employing a maximum level-back equal to the number of columns [3][6][10]. These published works evaluated array connectivity by geometry and by level-back separately. Connectivity comparison can conveniently be performed between arrays of the same level-back and different geometries, or of the same geometry and different level-backs. However, for arrays that differ in both geometry and level-back, the comparison is ineffective and therefore cannot help determine the best choice of these parameters efficiently. In this paper, a model for estimating connectivity, based on the number of paths between the first and the last node of an array, is established. Combined with the expansion of multinomial coefficients, a formula for calculating the array connectivity is presented. The advantages of this technique are that the level-back and geometry are taken into account simultaneously, and comparison between arrays of different level-backs and geometries becomes possible. Based on this model, a multi-output node structure is developed; the structure enables a node to act as a functional cell as well as a routing cell.
A fitness function based on power consumption and critical delay is also given, which can lead the evolved circuits close to the requirements of applications. This article is organized as follows: Section 2 deals with the model and the connectivity formula, while Section 3 describes the node structure. Section 4 introduces the fitness function and Section 5 the evolutionary algorithm. The experimental results are reported in Section 6. Conclusions are given in Section 7.
2 Estimating Array Connectivity

Suppose there is an n×m array, and suppose each node takes a structure of 1 input and 1 output. The array connectivity can be summarized as the total number of paths that connect the nodes in the first column to those in the last column. Since every node in the last column has the same ability to connect to the first column, we can simplify the definition as follows: given an n×m array with a level-back of lb, the total number of paths that connect the input of the last node (No. n×m − 1) to the output of the first node (No. 0) is defined as the array connectivity, denoted by C(n, m, lb). Any column in the array, except the first and the last, is in one of two opposite states: a path passes through the column, or no path passes through it. Let 1 represent the state in which a column is passed by a path, and 0 the state in which a column is skipped. The array connection state can then be encoded as a binary string. In an n×m array, there are n possible entries for a path to enter the next column. Fig. 1 shows the different connection states of a 3×4 array with lb = 3, and the path number of each state.
Estimating Array Connectivity and Applying Multi-output Node Structure
Fig. 1. Array connection states and path numbers. (a) State coding: 1111, path number: 9; (b) State coding: 1011, path number: 3; (c) State coding: 1101, path number: 3; (d) State coding: 1001, path number: 1.
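The path numbers listed in Fig. 1 can be cross-checked by brute force. The sketch below is our illustration, not part of the paper: it counts every route from the first node's output to the last node's input, where each hop spans at most lb columns and every intermediate column offers n entry nodes; for the 3×4 array with lb = 3 it reproduces the total 9 + 3 + 3 + 1 = 16.

```python
def count_paths(n, m, lb):
    """Brute-force array connectivity: count the routes from the
    output of the first node (column 0) to the input of the last
    node (column m-1). A route may jump forward by 1..lb columns,
    and each intermediate column offers n possible entry nodes."""
    def paths_from(col):
        if col == m - 1:
            return 1
        total = 0
        for d in range(1, lb + 1):
            nxt = col + d
            if nxt == m - 1:
                total += 1                    # directly reaches the target input
            elif nxt < m - 1:
                total += n * paths_from(nxt)  # n entry nodes in column nxt
        return total
    return paths_from(0)
```

With n = 3, m = 4, lb = 3 the function returns 16, the sum of the four state counts in Fig. 1.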
As shown in Fig. 1, the number of array connection states is determined by the level-back and the number of columns. When lb = 1, a node can only connect to the nodes in its neighboring column, as in Fig. 1(a). For lb = 2, a node can connect both to the nodes in the neighboring column and to those in the column at a distance of 2; the array states then include the situations shown in Fig. 1(a), (b), (c). As lb increases to 3, the state shown in Fig. 1(d) is also included. Apparently, the number of array states grows as the number of columns and the level-back increase. The number of array states is irrelevant to the number of rows, whereas the specific number of routing paths does depend on the number of rows. Following this analysis, we investigated the connection states of arrays with different geometries and level-backs in order to find the relationships between them. The statistical results are listed in Table 1, where m is the number of columns; lb represents the level-back; Si denotes the number of states that have i columns skipped by the routing paths; in other words, Si is the number of binary strings admissible under a given level-back. When lb = 1, there is only one connection state, counted by S0. The particular number of paths is determined by the number of rows and the number of outputs of a node. By comparing the data in Table 1 with the expanded coefficients of the binomial, trinomial and quadrinomial, we can examine the relationships between them. Fig. 2 shows the comparison results. Similar results are obtained as m and lb increase. Thereby, an expression for calculating the connectivity of an n×m array with lb ≥ 2 can be derived from Fig. 2 as follows:

C(n, m, lb) = Σ_{i=0}^{1} (m−2 ¦ i)_lb · n^(m−2−i) + Σ_{i=2}^{⌈(m−2)/2⌉+lb−2} (m−i−1 ¦ i)_lb · n^(m−2−i)    (1)

where (a ¦ i)_lb represents the i-th coefficient of the lb-nomial expansion of order a, i.e., the coefficient of x^i in (1 + x + ⋯ + x^(lb−1))^a, and ⌈(m−2)/2⌉ denotes the least integer greater than or equal to (m−2)/2.
Table 1. Statistical results of array connection states

lb = 2    m:   3   4   5   6   7   8   9  10  11
    S0         1   1   1   1   1   1   1   1   1
    S1         1   2   3   4   5   6   7   8   9
    S2         –   –   1   3   6  10  15  21  28
    S3         –   –   –   –   1   4  10  20  35
    S4         –   –   –   –   –   –   1   5  15
    S5         –   –   –   –   –   –   –   –   1

lb = 3    m:   3   4   5   6   7   8   9  10  11
    S0         1   1   1   1   1   1   1   1   1
    S1         1   2   3   4   5   6   7   8   9
    S2         –   1   3   6  10  15  21  28  36
    S3         –   –   –   2   7  16  30  50  77
    S4         –   –   –   –   –   6  19  45  90
    S5         –   –   –   –   –   –   3  16  51
    S6         –   –   –   –   –   –   –   1  10

lb = 4    m:   3   4   5   6   7   8   9  10  11
    S0         1   1   1   1   1   1   1   1   1
    S1         1   2   3   4   5   6   7   8   9
    S2         –   1   3   6  10  15  21  28  36
    S3         –   –   1   4  10  20  35  56  84
    S4         –   –   –   –   3  12  31  65 120
    S5         –   –   –   –   –   2  12  40 101
    S6         –   –   –   –   –   –   1  10  44
    S7         –   –   –   –   –   –   –   –   6
As mentioned previously, the node is predefined as a 1-input 1-output structure, so there are n possible entries in each column except the first and the last. Now consider a multi-output node structure and assume that each node has k outputs. The number of possible entries per column then becomes kn. As can be seen from expression (1), the array connectivity can thus be enhanced greatly without increasing the size of the array, improving the performance of the EA. This analysis leads to the final formula for estimating the array connectivity:
C(n, m, lb) =
    0,                      if m = 1
    1,                      if m = 2
    (kn)^(m−2),             if lb = 1 and m > 2
    Σ_{i=0}^{1} (m−2 ¦ i)_lb (kn)^(m−2−i) + Σ_{i=2}^{⌈(m−2)/2⌉+lb−2} (m−i−1 ¦ i)_lb (kn)^(m−2−i),
                            if lb = 2, 3, …, m−1 and m > 2        (2)
where k ≥ 1. In this formula, the calculation considers the effect of the array geometry together with that of the level-back. With this technique, connectivity comparison between arrays of different geometries and level-backs becomes possible. As can be seen from the formula, the number of columns has a greater effect on the connectivity than the number of rows, and the connectivity reaches its peak when the level-back takes the maximum value it can. These conclusions agree with those of the published works.
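As a concrete illustration (the code is our sketch, not part of the paper), formula (2) can be evaluated directly by expanding the lb-nomial (1 + x + ⋯ + x^(lb−1))^a to obtain its coefficients:

```python
from math import ceil

def lb_nomial(a, i, lb):
    """Coefficient of x**i in (1 + x + ... + x**(lb-1))**a."""
    coeffs = [1]
    for _ in range(a):
        nxt = [0] * (len(coeffs) + lb - 1)
        for j, c in enumerate(coeffs):
            for s in range(lb):
                nxt[j + s] += c
        coeffs = nxt
    return coeffs[i] if 0 <= i < len(coeffs) else 0

def connectivity(n, m, lb, k=1):
    """Array connectivity C(n, m, lb) per formula (2);
    k is the number of outputs per node."""
    if m == 1:
        return 0
    if m == 2:
        return 1
    if lb == 1:
        return (k * n) ** (m - 2)
    upper = ceil((m - 2) / 2) + lb - 2
    return (sum(lb_nomial(m - 2, i, lb) * (k * n) ** (m - 2 - i)
                for i in range(2)) +
            sum(lb_nomial(m - i - 1, i, lb) * (k * n) ** (m - 2 - i)
                for i in range(2, upper + 1)))
```

The function reproduces the values used later in the paper: C = 16 for the 3×4 array with lb = 3 of Fig. 1, and C = 3 versus C = 9 for the 3×3, lb = 1 array with 1-output and 3-output nodes, respectively.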
Fig. 2. Comparison results. (a) Binomial coefficients vs. lb = 2; (b) Trinomial coefficients vs. lb = 3; (c) Quadrinomial coefficients vs. lb = 4.
3 Representation of Multi-output Node Structure

Multi-output node structures have been introduced to satisfy the requirements of multi-output logic functions [11][12][13]. In those studies, the outputs of a node are used entirely as the outputs of the defined function, and the number of outputs of a node varies as the function represented by the node changes. In contrast, we propose a multi-output structure with a fixed number of outputs. In this work, the output number takes a value of 3, which is slightly larger than the output number of the most complex function (e.g., a 1-bit full adder). Redundant outputs of a node are randomly connected to one of the node's inputs. In this way, a node can be not only a functional node that implements the required logic function, but also a switching node that transmits data directly from its inputs to its outputs. The multi-output node structure is represented in a hierarchical way, as shown in Fig. 3(a). The external connection describes how the inputs are connected to the outside world. The internal connection specifies the implemented function, the function outputs and the connections of the redundant outputs. Fig. 3(b) shows a diagram of a 3-input 3-output structure and the detailed representation of the node, in
which the implemented function is a 1-bit full adder. The first and the second output indicate the carry and the sum of the adder, respectively. The third output is randomly connected to the second input of the node.
<(a, b, c), (FA, c, s, b)>

Fig. 3. Hierarchical representation of node structure. (a) Hierarchical representation; (b) Example of a 1-bit adder node.
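The textual form above can be decoded mechanically. The following parser is our illustrative sketch (the tuple format follows Fig. 3; the function itself is not from the paper): the first group lists the external inputs, and the second gives the implemented function followed by its output assignments.

```python
def parse_node(rep: str):
    """Parse a hierarchical node representation such as
    '<(a, b, c), (FA, c, s, b)>' into (inputs, function, outputs)."""
    body = rep.strip('<> ')
    ext, internal = body.split('), (')
    inputs = [s.strip() for s in ext.strip('( ').split(',')]
    func, *outputs = [s.strip() for s in internal.strip(') ').split(',')]
    return inputs, func, outputs
```

For the 1-bit full adder node of Fig. 3(b), this yields external inputs (a, b, c), function FA, and outputs (c, s, b), the last being the redundant output routed back from input b.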
4 Multi-objective Fitness Function

Generally, the fitness function for combinational circuits is separated into two stages: the first stage evolves a fully functional circuit; the second finds an optimal solution based on the previous result. In this work, we use the power consumption and the critical delay to evaluate circuits in the second stage. The power consumption refers to the total power consumed by all the MOSFETs in the circuit. The critical delay is the propagation delay of the critical path. The new fitness function is formulated as follows:

fitness = fit_number,                          if fit_number < max_fit
fitness = max_fit + α / (power × max_delay),   if fit_number = max_fit        (3)

where fit_number is the number of bits that match the truth table; max_fit is the total number of output bits of the truth table; power denotes the power consumption and max_delay the critical delay; α is a self-defined regulator for normalization. Suppose that the power consumption and the delay of a p-type MOSFET are the same as those of an n-type MOSFET. We can then determine the power consumption and the critical delay of each building block according to the CMOS-based circuits of primitive gates [15]. The calculation results are shown in Table 4, where HA and FA refer to the 1-bit half adder and 1-bit full adder, respectively. The data for HA and FA are obtained from the evolved circuits.

Table 4. Power consumption and critical delay of building blocks

No.  Building block  Critical delay  Power consumption
1    WIRE            0               0
2    NOT             1               2
3    AND             2               6
4    NAND            1               4
5    OR              2               6
6    NOR             1               4
7    XOR             2               8
8    NXOR            2               8
9    MUX             2               6
10   HA              2               14
11   FA              4               22
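A minimal sketch of the two-stage fitness of Eq. (3) follows; this is our illustration, and the value of α is a placeholder assumption:

```python
def fitness(fit_number, max_fit, power, max_delay, alpha=10.0):
    """Eq. (3): reward functionality first; once the truth table is
    fully matched, rank solutions by low power x delay product."""
    if fit_number < max_fit:
        return fit_number
    return max_fit + alpha / (power * max_delay)
```

Since the bonus term α/(power × max_delay) is strictly positive, any fully functional circuit always outranks any partially functional one, which is the intended two-stage behavior.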
5 Evolutionary Algorithm

In this paper, Cartesian Genetic Programming (CGP) is applied as the evolutionary algorithm for evolving circuits. CGP was developed by Miller and Thomson for the automatic evolution of digital circuits [2]. Typically, one-point mutation is adopted in CGP. To obtain higher diversity, an extended operator, inverted mutation, is employed in this work. The inverted mutation randomly selects two points within a chromosome and exchanges the internal connections of the symmetrical nodes one by one about the center of these two points. This operation can generate a new gene sequence while partially preserving the existing structure. From this point of view, the inverted operation possesses some features of crossover. However, the inversion acts only on a single chromosome; information exchange between chromosomes does not occur, so it still falls within the scope of mutation. The combination of standard CGP, the 3-input 3-output node structure, the inverted mutation and the new fitness function is called modified CGP (MCGP), which is used in the following experiments.
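A sketch of the inverted-mutation operator follows. It is our interpretation of the description above: the genes between two randomly chosen points are mirrored about their midpoint, preserving the gene values while reordering them; the flat-list chromosome layout is an assumption.

```python
import random

def inverted_mutation(chromosome, rng=random):
    """Pick two points and mirror the genes between them about the
    midpoint; returns a new chromosome, leaving the input intact."""
    a, b = sorted(rng.sample(range(len(chromosome)), 2))
    child = list(chromosome)
    child[a:b + 1] = reversed(child[a:b + 1])
    return child
```

Because the operator only reorders genes, the resulting chromosome contains exactly the same gene values as its parent, mirroring the "holding the existing structure in part" property described above.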
6 Experimental Results and Analysis

In the following experiments, the point-mutation ratio was set to guarantee that the number of modified genes is at least 1 and at most 4. The ratio of the inverted mutation was fixed at 0.3. The level-back took the value of 1.

6.1 1-Bit Full Adder

The results produced in this experiment and those reported in [6] are listed in Table 5. One of the optimal circuits found in this experiment is shown in Fig. 4(a). The best solution obtained in [6] and a human-designed circuit are shown in Fig. 4(b) and (c). A performance comparison is given in Table 6, in which Max_delay denotes the critical delay and Power stands for the power consumption. Note that if the 1-output structure were applied, the 3×3 array in Table 5 would only have a connectivity of 3; with the 3-output structure, the connectivity becomes 3 times larger. It can be seen from Table 6 that the performances of the circuits generated by MCGP and CGP are equivalent. However, by comparing Fig. 4(a) and (b), we can see that in the MCGP circuit the sum and the carry are generated simultaneously, while in the CGP circuit they appear at different moments.

6.2 2-Bit Adder

The results produced in this experiment and those reported in [6] are listed in Table 7. The optimal circuit found in this experiment is shown in Fig. 5(a). The best solution
Table 5. Evolutionary results of the 1-bit full adder in 100 runs

Algorithm             MCGP     CGP
Maximum generation    10,000   10,000
Level-back            1        2
Array geometry        3×3      1×3
Connectivity          9        2
Successful cases      100      84
Fig. 4. Circuits of 1-bit full adder. (a) MCGP; (b) CGP; (c) Human-designed.

Table 6. Performance comparison of 1-bit full adder circuits

               Outputs                                   Max_delay  Power
MCGP           c′ = (c⊕x1)·x0 + ¬(c⊕x1)·c                    4        22
               s  = (c⊕x1)·¬x0 + ¬(c⊕x1)·x0
CGP            c′ = (x0⊕x1)·c + ¬(x0⊕x1)·x1                  4        22
               s  = c⊕x0⊕x1
Human design   c′ = x0·x1 + c·(x0⊕x1)                        6        34
               s  = c⊕x0⊕x1
obtained in [6] and one possible carry look-ahead circuit are shown in Fig. 5(b) and (c). The carry look-ahead circuit is obtained from [15]. A performance comparison is given in Table 8. MCGP took advantage of functional building blocks and achieved a relatively high success ratio, even though the array connectivity in MCGP was smaller than that in CGP.

Table 7. Evolution results of the 2-bit adder in 100 runs

Algorithm             MCGP     CGP
Maximum generation    50,000   50,000
Level-back            1        5
Array geometry        3×3      1×6
Connectivity          9        16
Successful cases      98       62
Fig. 5. Circuits of 2-bit adder. (a) MCGP; (b) CGP; (c) Carry look-ahead.

Table 8. Performance comparison of 2-bit adder circuits

                   Outputs                                                 Max_delay  Power
MCGP               c′ = ((c⊕y0)·x0 + ¬(c⊕y0)·c)·(x1⊕y1) + x1·¬(x1⊕y1)          6        44
                   s0 = c⊕x0⊕y0
                   s1 = ((c⊕y0)·x0 + ¬(c⊕y0)·c) ⊕ x1 ⊕ y1
CGP                c′ = y1·¬(x1⊕y1) + (x0·¬(x0⊕y0) + c·(x0⊕y0))·(x1⊕y1)        6        44
                   s0 = c⊕x0⊕y0
                   s1 = (x0·¬(x0⊕y0) + c·(x0⊕y0)) ⊕ x1 ⊕ y1
Carry look-ahead   c′ = x1·y1 + (x1⊕y1)·(x0·y0 + (x0⊕y0)·c)                    8        78
                   s0 = c⊕x0⊕y0
                   s1 = x1 ⊕ y1 ⊕ (x0·y0 + (x0⊕y0)·c)
6.3 2-Bit Multiplier

The best solution is shown in Fig. 6, and a performance comparison is given in Table 9. As shown in Fig. 6, the best solution took a maximum of 7 gates, as mentioned by Miller [6], and was equivalent to the most efficient circuit reported in [6]. Compared with the circuit produced in [6], the circuit evolved in this work had a significantly reduced delay due to its use of primitive gates such as NAND and NOR.

Table 9. Performance comparison of 2-bit multiplier circuits

               Outputs                                            Max_delay  Power
MCGP           p0 = x0·y0;  p1 = (x0·y1) ⊕ (x1·y0)                    3        34
               p2 = (x1·y1)·¬(x0·y0);  p3 = (x1·y1)·(x0·y0)·(x1·y0)
CGP            p0 = x0·y0;  p1 = x1·y0 ⊕ x0·y1                        7        48
               p2 = x1·y1·¬(x0·y0);  p3 = x0·x1·y0·y1
Human design   p0 = x0·y0;  p1 = (x1·y0) ⊕ (x0·y1)                    4        48
               p2 = x1·y1·¬(x0·y0);  p3 = x1·y0·x0·y1
Fig. 6. Evolved optimal circuit of 2-bit multiplier

6.4 4-Bit Adder

The array took a geometry of 5×5, and the maximum generation was limited to 1,000,000. Four successful cases were achieved in 20 runs. The most efficient circuit obtained in the experiments is shown in Fig. 7. The evolved circuit tended to use fine-grained building blocks (e.g., NAND, NXOR, MUX) rather than coarse-grained ones (e.g., half adder, full adder). Table 10 shows the performance comparison. The ripple-carry and carry look-ahead circuits follow those given in [15].

Table 10. Performance comparison of 4-bit adder circuits

            MCGP   Ripple-carry   Carry look-ahead
Max_delay   8      16             8
Power       102    88             204
Fig. 7. Evolved optimal circuit of 4-bit adder
According to Table 10, the optimal solution found by MCGP had a much shorter delay than the ripple-carry circuit, and its power consumption was only half of that consumed by the carry look-ahead circuit. From these results, the
performance of the evolved circuit is therefore better than that of the ripple-carry and carry look-ahead circuits.
7 Conclusion

A formula for estimating the array connectivity is developed in this work. From the presented formula, four possible ways of increasing the connectivity can be identified: the first is to increase the number of array columns; the second, to take a large level-back; the third, to increase the number of rows. These methods have been studied by other researchers, but in their works the features are taken into account separately, while in this paper their effects on the connectivity are considered as a whole. The presented technique can greatly benefit the determination of parameters for the evolutionary design of digital circuits. The last way is to employ a multi-output node structure, which can increase the array connectivity and improve the efficiency of evolution without increasing the size of the array. This structure offers more flexibility in constructing circuits, in that it allows a node to be a functional cell as well as a routing cell. This feature of the developed structure is also beneficial to the requirements of fault tolerance; further investigation will be carried out on this issue. Experiments on combinational circuits show that the proposed approach is able to increase the array connectivity and evolvability, and thus improves the evolutionary efficiency.

Acknowledgments. The authors would like to thank the anonymous reviewers for their helpful comments.
References

1. Coello, C.A.C., Christiansen, A.D., Aguirre, A.H.: Use of Evolutionary Techniques to Automate the Design of Combinational Circuits. International Journal of Smart Engineering System Design 2(4), 299–314 (2000)
2. Miller, J.F., Thomson, P.: Cartesian Genetic Programming. In: Poli, R., Banzhaf, W., Langdon, W.B., Miller, J., Nordin, P., Fogarty, T.C. (eds.) EuroGP 2000. LNCS, vol. 1802, pp. 121–132. Springer, Heidelberg (2000)
3. Sekanina, L.: Design Methods for Polymorphic Digital Circuits. In: Proceedings of the 8th IEEE Design and Diagnostics of Electronic Circuits and Systems Workshop, Sopron, Hungary, pp. 145–150. IEEE Computer Society Press, Los Alamitos (2005)
4. Kalganova, T., Miller, J.F.: Evolving More Efficient Digital Circuits by Allowing Circuit Layout Evolution and Multi-objective Fitness. In: Stoica, A., et al. (eds.) Proceedings of the 1st NASA/DoD Workshop on Evolvable Hardware, pp. 54–65. IEEE Computer Society Press, Los Alamitos (1999)
5. Vassilev, V.K., Job, D., Miller, J.F.: Towards the Automatic Design of More Efficient Digital Circuits. In: Lohn, J., et al. (eds.) Proceedings of the 2nd NASA/DoD Workshop on Evolvable Hardware, pp. 151–160. IEEE Computer Society Press, Los Alamitos, CA (2000)
6. Miller, J.F., Job, D., Vassilev, V.K.: Principles in the Evolutionary Design of Digital Circuits – Part I. Genetic Programming and Evolvable Machines 1(1), 7–35 (2000)
7. Kalganova, T., Miller, J.F.: Circuit Layout Evolution: An Evolvable Hardware Approach. In: IEE Colloquium on Evolvable Hardware Systems, IEE Colloquium Digest (1999)
8. Miller, J.F., Thomson, P.: Aspects of Digital Evolution: Geometry and Learning. In: Sipper, M., Mange, D., Pérez-Uribe, A. (eds.) ICES 1998. LNCS, vol. 1478, pp. 25–35. Springer, Heidelberg (1998)
9. Kalganova, T., Miller, J.F., Fogarty, T.: Some Aspects of an Evolvable Hardware Approach for Multiple-valued Combinational Circuit Design. In: Sipper, M., Mange, D., Pérez-Uribe, A. (eds.) ICES 1998. LNCS, vol. 1478, pp. 78–89. Springer, Heidelberg (1998)
10. Sekanina, L., Vašíček, Z.: On the Practical Limits of the Evolutionary Digital Filter Design at the Gate Level. In: Rothlauf, F., Branke, J., Cagnoni, S., Costa, E., Cotta, C., Drechsler, R., Lutton, E., Machado, P., Moore, J.H., Romero, J., Smith, G.D., Squillero, G., Takagi, H. (eds.) EvoWorkshops 2006. LNCS, vol. 3907, pp. 344–355. Springer, Heidelberg (2006)
11. Miller, J.F., Thomson, P., Fogarty, T.C.: Designing Electronic Circuits Using Evolutionary Algorithms. In: Quagliarella, D., et al. (eds.) Genetic Algorithms and Evolution Strategies in Engineering and Computer Science: Recent Advancements and Industrial Applications. Wiley, Chichester (1997)
12. Kalganova, T.: An Extrinsic Function-level Evolvable Hardware Approach. In: Proceedings of the 3rd European Conference on Genetic Programming (EuroGP 2000), Springer, London (2000)
13. Vassilev, V.K., Miller, J.F.: Scalability Problems of Digital Circuit Evolution: Evolvability and Efficient Designs. In: Proceedings of the 2nd NASA/DoD Workshop on Evolvable Hardware, IEEE Computer Society Press, Los Alamitos (2000)
14. Sekanina, L.: Evolutionary Design of Gate-level Polymorphic Digital Circuits.
In: Rothlauf, F., Branke, J., Cagnoni, S., Corne, D.W., Drechsler, R., Jin, Y., Machado, P., Marchiori, E., Romero, J., Smith, G.D., Squillero, G. (eds.) EvoWorkshops 2005. LNCS, vol. 3449, pp. 185–194. Springer, Heidelberg (2005) 15. Uyemura, J.P.: Introduction to VLSI Circuits and Systems. Wiley, Chichester (2001)
Research on the Online Evaluation Approach for the Digital Evolvable Hardware

Rui Yao, You-ren Wang, Sheng-lin Yu, and Gui-jun Gao

College of Automation and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, 210016, China
[email protected],
[email protected],
[email protected]
Abstract. An issue that arises in evolvable hardware is how to verify the correctness of the evolved circuit, especially in online evolution. The traditional exhaustive evaluation approach has made evolvable hardware impractical for real-world applications. In this paper an incremental evaluation approach for online evolution is proposed, in which the immune genetic algorithm is used as the search engine. The evolution is performed in an incremental way: some small seed-circuits are evolved first; these seed-circuits are then employed to evolve larger module-circuits; and the module-circuits are in turn used to build still larger circuits. Circuits for an 8-bit adder, an 8-bit multiplier and a 110-sequence detector have been evolved successfully. The evolution speed of the incremental evaluation approach is much higher than that of the exhaustive evaluation method; furthermore, the incremental evaluation approach can be applied to sequential logic circuits as well as combinational ones.

Keywords: Evolvable hardware, online evolution, incremental evaluation, digital circuit.
1 Introduction

In evolvable hardware (EHW), there are two methods of performing evolution, i.e., online evolution and offline evolution [1]. So far most EHW researchers prefer the offline approach, in which the circuits are evaluated using their software models [2, 3]. However, software simulation is too slow, modeling is very difficult in some cases, and moreover it is unable to repair the hardware's local faults online [4]. Online evolution is the prerequisite for online hardware repair. Nevertheless, for online evolution, the fitness evaluation and circuit verification turn out to be even more difficult [5]. How to guarantee the correctness of the evolved circuits is one of the puzzles of EHW research, especially for online evolution. On the one hand, a test set must be designed to guarantee the correctness of the circuit's logic function, but a self-contained test set cannot be acquired in some cases. In theory, a combinational logic circuit's correctness can be guaranteed if all of its possible input combinations have been tested. Whereas for
L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 57–66, 2007. © Springer-Verlag Berlin Heidelberg 2007
a sequential logic circuit, the output at a given moment depends not only on its inputs at that time, but also on its previous state. Therefore the online verification of a sequential circuit is very difficult, and the fitness evaluation can hardly be performed at times. On the other hand, the exhaustive evaluation method will not work for circuits with a large number of inputs, since the number of possible input combinations increases exponentially with the number of inputs. For instance, there are 2^64 = 18,446,744,073,709,551,616 input combinations in a 64-bit adder. If the self-contained test is adopted, and supposing that evaluating one test vector takes 1 ps (10^-12 s), then some 2^64 × 10^-12 ≈ 18,446,744 seconds ≈ 213 days are needed to test all the input combinations, i.e., 213 days are needed to evaluate one individual. Suppose that there are 50 individuals in the population and 2000 generations are needed to find a correct solution; then the evolution time needed is 2000 × 50 × 213 ≈ 21,300,000 days ≈ 58,494 years! Aiming at this fitness-evaluation puzzle due to the complexity of the circuits, an approach of partitioning the training set as well as each training vector was proposed by Jim Torresen to realize relatively large combinational logic circuits [6]; a random partial evaluation approach was presented by Hidenori Sakanashi et al. for lossless compression of very high resolution bi-level images [7]; and a fast evolutionary algorithm that estimates the fitness value of the offspring from that of its parents was proposed by Mehrdad Salami et al. to speed up the evaluation process [8].
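The back-of-the-envelope arithmetic above can be reproduced directly (our sketch; the 1 ps per test vector is the paragraph's assumption):

```python
# Exhaustive evaluation cost of a 64-bit adder at 1 ps per test vector.
vectors = 2 ** 64                                    # input combinations
seconds_per_individual = vectors * 1e-12             # 1 ps per vector
days_per_individual = seconds_per_individual / 86_400
# 50 individuals over 2000 generations:
total_years = 2000 * 50 * days_per_individual / 365
```

This yields about 213 days per individual and roughly 58,000 years in total, matching the figures quoted in the text.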
However, Torresen's approach speeds up the evolution process at the expense of resources, and it is suitable only for the evolution of combinational logic circuits; Sakanashi's approach cannot guarantee 100% correctness of the digital logic circuits; Salami's approach is only suitable for applications with slow fitness evaluation, and it loses its usefulness if the estimation time of an individual exceeds its evaluation time. In this paper the idea of partitioning verification from electronic design automation is introduced into the evaluation of EHW circuits, namely the incremental evaluation approach. The approach is used in the online evolutionary design of digital circuits, and circuits for an 8-bit adder, an 8-bit multiplier and a 110-sequence detector have been evolved successfully.
2 The Incremental Evaluation Approach

The basic idea of the incremental evaluation approach is to perform evolution in an incremental way: some small basic modules used to construct a target circuit, namely seed-circuits, are evolved first; then they are used to evolve larger module-circuits; and the module-circuits are used to build still larger circuits. 1) When a seed-circuit is evolved, it is evaluated using a test set that only covers its own function. Because in incremental evaluation most of the evolution time is spent on the evolution of the seed-circuits, the overall evolution time decreases dramatically.
2) When the newly developed circuits are evaluated, the test set is designed in an incremental way. Because the correctness of the seed-circuits has already been evaluated, only the newly developed function is evaluated. For instance, for a 3-bit serial adder, a2a1a0 + b2b1b0, after the least significant bit (a0 + b0) and the second-least significant bit (a1 + b1) have been evolved, when the most significant bit (a2 + b2) is evaluated its output only depends on the local bits a2, b2 and the adjacent carry c1, so only 2^3 = 8 test vectors are needed; in exhaustive evaluation, 2^6 = 64 test vectors are needed. By analogy, 16 × 8 = 128 test vectors are needed in the incremental evaluation of a 16-bit adder, while in the exhaustive evaluation the number is 2^32 = 4,294,967,296. This ensures that the number of test vectors needed increases linearly rather than exponentially with the number of the circuit's inputs. While still ensuring the circuit's correctness, the test set shrinks dramatically and the evolution speed is improved greatly. 3) When evolving relatively large circuits from the module-circuits, only the interconnections between them are taken into account in designing the test sets.
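The test-set sizes quoted above can be summarized as follows (our sketch; the uniform 8 vectors per bit slice follows the 2^3 count in the text):

```python
def exhaustive_vectors(bits):
    """All input combinations of an n-bit adder: two n-bit operands,
    matching the 2**(2n) counts used in the text."""
    return 2 ** (2 * bits)

def incremental_vectors(bits, per_stage=8):
    """Incremental evaluation: each bit slice is checked in isolation
    against its local inputs a_i, b_i and the incoming carry,
    i.e. 2**3 = 8 vectors per stage."""
    return bits * per_stage
```

For the 16-bit adder this gives 128 incremental vectors versus 4,294,967,296 exhaustive ones, which is the linear-versus-exponential growth the approach relies on.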
3 The Technical Scheme

3.1 The Online EHW Evolutionary Platform

The block diagram of our online evolutionary platform, built around a Xilinx Virtex FPGA, is shown in figure 1.
Fig. 1. Block diagram of the online evolutionary platform
The co-processing environment of a CPU and an FPGA is adopted in the platform. The hardware platform comprises the FPGA, an input SRAM and an output SRAM, while the software platform is provided by the Java-based API JBits. Here the Xilinx FPGA XCV1000 is adopted, which is capable of partial dynamic reconfiguration [9] and has 1M internal equivalent system gates. The total storage capacity of the input SRAM and output SRAM is 8 Mb, and the SRAM may be divided into 4 banks of 2 Mb SRAM blocks.
3.2 Coding
The building blocks in the evolution area are modules with different granularities. They are modeled as modules with 2 input ports and 1 output port, as shown in figure 2.
Fig. 2. Sketch map of the modules in the evolution area
Since both the function of the modules with different granularities and the interconnections between them need to be coded, the multi-parameter cascading and partitioning coding method is used. The whole chromosome is divided into two sections, Section A and Section B, as shown in figure 3(a). Section A determines the interconnection, as shown in figure 3(b); Section B determines the module's function, as shown in figure 3(c).
Fig. 3. Multi-parameters cascading and partitioning coding
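The two-section layout can be sketched as follows. This is our illustrative assumption: the paper only states that Section A encodes the interconnections (two input links per module) and Section B the functions, so the exact field layout below is hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ModuleGene:
    """One building block: two input links (from Section A)
    and a function code (from Section B)."""
    in_a: int
    in_b: int
    func: int

def decode(section_a: List[int], section_b: List[int]) -> List[ModuleGene]:
    """Pair up the two chromosome sections into module genes:
    Section A holds two link genes per module, Section B one
    function gene per module."""
    assert len(section_a) == 2 * len(section_b)
    return [ModuleGene(section_a[2 * i], section_a[2 * i + 1], f)
            for i, f in enumerate(section_b)]
```

Keeping the interconnection and function genes in separate sections makes it easy to apply different mutation rates to wiring and functionality, which is one common motivation for partitioned codings.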
3.3 Immune Genetic Algorithm

The immune genetic algorithm (IGA) can effectively tackle the premature convergence of the genetic algorithm and the slow run speed of the immune algorithm. In the early 1990s, Forrest et al. proposed the framework of the immune genetic algorithm by combining the mechanism of antibody recognition of antigens with
GA [10]. Since then, many new approaches have appeared for designing the affinities and the selection probability; e.g., the concentration and affinity of the antibody were used to design the expected reproduction rate; immune operators such as vaccination and immune selection were designed and combined with the GA's operators; and the concentration of the antibody was determined using information entropy [11-14]. In this paper a modified IGA, namely MMIGA, is used as the evolutionary algorithm. The performance index of the circuit is regarded as the antigen, the solutions representing the circuit's topology are viewed as the antibodies, and the coherence between the actual response and the expected response of the circuit is taken as the affinity of the antibody to the antigen. The MMIGA algorithm can be described as follows:
Step 1. Initialization. Randomly generate N antibodies to form the original population Am.
Step 2. Calculate the affinities. View the fitness of each antibody as its affinity. Calculate the affinity of each antibody, namely di. If the optimal individual is found, stop; otherwise, continue.
Step 3. Sort the antibodies in Am by affinity. The one with the highest affinity is viewed as the monkey-king.
Step 4. Select the S highest-affinity antibodies from Am to form a new set Bm.
Step 5. Clone the S antibodies in Bm to form Cm of N antibodies. The number of clones of the ith antibody is:

    ni = Round( di / Σ_{j=1}^{S} dj · N ),   i = 1, 2, ..., S        (1)
where Round() is the operator that rounds its argument toward the closest integer.
Step 6. Perform mutation on Cm with probability 0.95 and obtain Dm. The mutation bits of each antibody adjust adaptively according to its affinity. The number of mutation bits of the ith antibody is:

    mbi = Int( (dmax − di) / dmax · rmax · L ),   i = 1, 2, ..., N        (2)

where dmax is the maximum affinity, L is the length of the antibody, and rmax is the maximum mutation rate (here 0.5). Simulated annealing is used to determine whether to keep a mutation: if the affinity increases, the mutation is kept; otherwise, it is kept with a small probability.
Step 7. Select the N−S lowest-affinity antibodies from Am to form a temporary set Em.
Step 8. Perform mutation on Em and obtain Fm.
Step 9. Sort the antibodies in Dm by affinity and determine the dissimilarity matrix NL. Considering the symmetry of the similarity measure, to reduce the computational cost only the upper triangular matrix is calculated, as shown in formula (3):

    NL = ⎡ nl11  nl12  ...  nl1N ⎤
         ⎢       nl22  ...  nl2N ⎥        (3)
         ⎣                  nlNN ⎦

Step 10. Determine the fuzzy selection array SL. Select S−1 dissimilar antibodies according to NL, in the order nl11 → nl12 → ··· → nl1N → nl22 → ··· → nlNN, to form SL.
Step 11. Keep the elite of Am; select S−1 antibodies from Dm according to SL, which, together with the N−S antibodies in Fm, form the new antibody set Am.
Step 12. Repeat Steps 2-11 until the optimal individual is found or the terminating generation is reached.
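The two adaptive quantities of Steps 5 and 6 can be sketched directly from Eqs. (1) and (2); the code below is our illustration of those formulas, not the authors' implementation:

```python
def clone_counts(affinities, N):
    """Eq. (1): clones per selected antibody, proportional to its
    share of the total affinity of the S selected antibodies."""
    total = sum(affinities)
    return [round(d / total * N) for d in affinities]

def mutation_bits(d_i, d_max, L, r_max=0.5):
    """Eq. (2): adaptive number of mutation bits; low-affinity
    antibodies mutate more, the best antibody not at all."""
    return int((d_max - d_i) / d_max * r_max * L)
```

High-affinity antibodies therefore receive more clones but fewer mutated bits, which is the exploitation/exploration balance the MMIGA steps describe.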
4 Results and Discussion

4.1 Online Evolutionary Design Results of the Combinational Logic Circuits
Results of the Serial Adder. The 4-bit serial adder and the 8-bit serial adder are taken as examples in this paper. Incremental evaluation and exhaustive evaluation are used respectively in the contrast experiments. The parameters of the algorithm are as follows: the size of the antibody set is N=150, the size of the highest-affinity antibody set is S=5, and the mutation probability of the lowest-affinity antibody set is 0.9. The maximum terminating generation is 5000. Each evaluation approach runs 50 times, and the generations and time needed are shown in Figures 4 to 7. From Figures 4 and 6 we can see that, when evolving the 4-bit adder and the 8-bit adder with incremental evaluation and exhaustive evaluation respectively, the number of generations needed shows no notable difference; this is because incremental evolution is used in both cases. But Figures 5 and 7 clearly show that the evolution time of exhaustive evaluation is much longer than that of incremental evaluation. For the 4-bit adder the difference in evolution time is small, because the difference between the test sets is small: the test set of the exhaustive evaluation has 500 vectors, while that of the incremental evaluation has 200. But the evolution time of the 8-bit adder in Figure 7 differs distinctly, because the test set of the exhaustive evaluation has 66000 vectors, while that of the incremental evaluation has 200. It is obvious that the degree of improvement in evolution speed obtained by the incremental evaluation approach is governed by the size of the test set.

Results of the 8-Bit Multiplier. The 8-bit multiplier is evolved using the seed-circuits of the 8×2 pipelining multiplier and the 16-bit adder, whose topology is shown in Fig. 8. The size of the test set is 4096; the other parameters are the same as those in the serial adder experiments.
Research on the Online Evaluation Approach
63
Fig. 4. Generations needed for the 4-bit adder (exhaustive vs. incremental evaluation; y-axis: generations, x-axis: runs)

Fig. 5. Time needed for the 4-bit adder (exhaustive vs. incremental evaluation; y-axis: time in minutes, x-axis: runs)
The seed-circuits in Figure 8 use the look-up tables (LUTs) and the routing resources, as well as the fast carry chain with its dedicated AND, MUX and XOR logic. Thus not only is the operation speed improved, but hardware resources are also saved: only 5 slices are used in the 8×2 pipelining multiplier, and 8 slices in the 16-bit adder. Only four 8×2 pipelining multipliers and three 16-bit adders, 44 slices in total, i.e., 22 CLBs, are used in the optimum solution of the 8-bit multiplier.

4.2 Online Evolutionary Design Results of the Sequential Logic Circuits
The circuit of the 110-sequence detector has an input 'x' and an output 'z', in addition to two flip-flops. Its input is a random signal sequence; its output 'z' is a
Fig. 6. Generations needed for the 8-bit adder (exhaustive vs. incremental evaluation; y-axis: generations, x-axis: runs)

Fig. 7. Time needed for the 8-bit adder (exhaustive vs. incremental evaluation; y-axis: time in minutes, x-axis: runs)
'1' whenever a 110-sequence appears. Three states (00, 01 and 11) are used out of its four states. When evolving the 110-sequence detector using traditional evaluation, LUTs are used as the basic modules. The evolution runs 20 times, among which 10 runs are successful, i.e., a success rate of 0.5. The generations and time needed in the 10 successful runs are shown in Table 1. The mean evolution generation in Table 1 is 932, and the mean evolution time is 120.4 minutes. When evolving the 110-sequence detector using incremental evaluation, the flip-flop module, whose function has already been verified, together with the LUTs of the combinational logic, are taken as the basic modules. The evolution runs 30 times; the generations and time needed are shown in Table 2.
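For reference, the behaviour of the 110-sequence detector can be captured by a small Mealy machine; the sketch below is illustrative (the state names S0/S1/S2 stand for the paper's encodings 00, 01 and 11) and emits z = 1 exactly when the input sequence ...110 has just been seen:

```python
# Mealy-style 110-sequence detector: three states track how much of
# the pattern "110" has been matched so far.
TRANSITIONS = {
    # (state, x) -> (next_state, z)
    ("S0", 1): ("S1", 0),   # seen "1"
    ("S0", 0): ("S0", 0),
    ("S1", 1): ("S2", 0),   # seen "11"
    ("S1", 0): ("S0", 0),
    ("S2", 1): ("S2", 0),   # still "11...1"
    ("S2", 0): ("S0", 1),   # "110" completed -> z = 1
}

def detect(bits):
    state, out = "S0", []
    for x in bits:
        state, z = TRANSITIONS[(state, x)]
        out.append(z)
    return out

print(detect([1, 1, 0, 1, 1, 1, 0]))  # -> [0, 0, 1, 0, 0, 0, 1]
```

The evolved hardware realizes the same transition table with two flip-flops holding the state and combinational logic (LUTs) computing the next state and z.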
Fig. 8. Seed-circuits of the 8-bit multiplier (built from G1&G2 and F1&F2 LUT pairs with carry-chain MUX and XOR logic)

Table 1. Generations and time needed to evolve the 110-sequence detector using the traditional method

Runs:        1    2     3    4     5    6    7    8     9     10
Generation:  860  1889  358  1026  236  735  301  2173  1234  508
Time (min):  136  169   52   111   36   78   54   249   193   126
Table 2. Generations and time needed to evolve the 110-sequence detector using incremental evaluation

Runs:        1   2   3   4    5    6   7   8    9   10   11  12  13  14  15
Generation:  5   7   35  101  149  96  25  135  32  230  78  6   23  50  21
Time (min):  1   1   9   14   17   11  5   15   4   26   9   1   2   6   2

Runs:        16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
Generation:  17  64  14  11  37  29  30  13  12  26  47  67  33  44  16
Time (min):  2   7   1   1   4   3   3   2   2   3   4   7   4   5   1
The mean evolution generation in Table 2 is 48.4, which is about 5.2% of that of the traditional evolution; and the mean evolution time is 5.7 minutes, which is about 4.7% of that of the traditional method. So we can conclude that incremental evaluation not only increases the success rate of the evolution, but also speeds up the evolution dramatically and decreases the evolution time.
5 Conclusion
An incremental evaluation approach for the online evolution of evolvable hardware has been proposed in this paper. The seed-circuits are evolved using a partial test set; then the newly developed circuit is evaluated using a test set designed in an incremental way. The immune genetic algorithm is used as the evolutionary algorithm. It is shown that the incremental evaluation approach can simplify the online verification and evaluation process of digital circuits and accelerate the evolutionary design process. It has opened a new avenue to
66
R. Yao et al.
tackling the slow evolution speed and the difficulty of online evaluation caused by the complexity of the circuits.

Acknowledgments. The work presented in this paper has been funded by the National Natural Science Foundation of China (60374008, 90505013) and the Aeronautical Science Foundation of China (2006ZD52044, 04I52068).
References

1. Yao, X., Higuchi, T.: Promises and Challenges of Evolvable Hardware. IEEE Trans. on Systems, Man and Cybernetics - Part C: Applications and Reviews 29(1), 87–97 (1999)
2. Zhao, S.-g., Jiao, L.-c.: Multi-objective evolutionary design and knowledge discovery of logic circuits based on an adaptive genetic algorithm. Genetic Programming and Evolvable Machines 7(3), 195–210 (2006)
3. Liu, R., Zeng, S.-y., Ding, L., et al.: An Efficient Multi-Objective Evolutionary Algorithm for Combinational Circuit Design. In: Proc. of the First NASA/ESA Conference on Adaptive Hardware and Systems, pp. 215–221 (2006)
4. Wang, Y.-r., Yao, R., Zhu, K.-y., et al.: The Present State and Future Trends in Bio-inspired Hardware Research (in Chinese). Bulletin of National Natural Science Foundation of China 5, 273–277 (2004)
5. Xu, Y., Yang, B., Zhu, M.-c.: A new genetic algorithm involving a mechanism of simulated annealing for digital FIR evolving hardware (in Chinese). Journal of Computer-Aided Design & Computer Graphics 18(5), 674–678 (2006)
6. Torresen, J.: Evolving Multiplier Circuits by Training Set and Training Vector Partitioning. In: Tyrrell, A.M., Haddow, P.C., Torresen, J. (eds.) ICES 2003. LNCS, vol. 2606, pp. 228–237. Springer, Heidelberg (2003)
7. Sakanashi, H., Iwata, M., Higuchi, T.: Evolvable hardware for lossless compression of very high resolution bi-level images. Computers and Digital Techniques 151(4), 277–286 (2004)
8. Salami, M., Hendtlass, T.: The Fast Evaluation Strategy for Evolvable Hardware. Genetic Programming and Evolvable Machines 6(2), 139–162 (2005)
9. Upegui, A., Sanchez, E.: Evolving Hardware by Dynamically Reconfiguring Xilinx FPGAs. In: Moreno, J.M., Madrenas, J., Cosp, J. (eds.) ICES 2005. LNCS, vol. 3637, pp. 56–65. Springer, Heidelberg (2005)
10. Forrest, S., Perelson, A.S.: Genetic algorithms and the immune system. In: Schwefel, H.-P., Männer, R. (eds.) PPSN 1990. LNCS, vol. 496, pp. 320–325. Springer, Heidelberg (1991)
11. Fukuda, T., Mori, K., Tsukiama, M.: Parallel Search for Multi-modal Function Optimization with Diversity and Learning of Immune Algorithm. In: Dasgupta, D. (ed.) Artificial Immune Systems and Their Applications, pp. 210–220. Springer, Berlin (1999)
12. Jiao, L.-c., Wang, L.: A Novel Genetic Algorithm Based on Immunity. IEEE Transactions on Systems, Man and Cybernetics - Part A: Systems and Humans 30(5), 552–561 (2000)
13. Cao, X.-b., Liu, K.-s., Wang, X.-f.: Solve Packing Problem Using An Immune Genetic Algorithm. Mini-Micro Systems 21(4), 361–363 (2000)
14. Song, D., Fu, M.: Adaptive Immune Algorithm Based on Multi-population. Control and Decision 20(11), 1251–1255 (2005)
Research on Multi-objective On-Line Evolution Technology of Digital Circuit Based on FPGA Model

Guijun Gao, Youren Wang, Jiang Cui, and Rui Yao

College of Automation and Engineering, Nanjing University of Aeronautics and Astronautics, 210016 Nanjing, China
guijun
[email protected]
Abstract. A novel multi-objective evolutionary mechanism for digital circuits is proposed. Firstly, each CLB of the FPGA is configured as a minimum evolutionary structure cell (MESC); the two-dimensional array consisting of MESCs is encoded with integer values, and the functions and interconnections of the MESCs are reconfigured. Secondly, the circuit function, the number of active CLBs and the circuit response speed are chosen as the evolutionary objectives. The fitness of the circuit function is evaluated by an on-line test; the fitness of the number of active CLBs and of the response speed is evaluated by searching the evolved circuit in the reverse direction. The digital circuits are then designed by multi-objective on-line evolution with these evaluation methods. Thirdly, a multi-objective optimization algorithm is improved, which quickens the convergence speed of the on-line evolution. Finally, a Hex-BCD code conversion circuit is taken as an example. The experimental results prove the feasibility and availability of the new on-line design method of digital circuits.

Keywords: Evolvable Hardware, Digital Circuit, On-line Evolution, Multi-objective Evolutionary Method, FPGA Model.
1 Introduction
The key principle of Evolvable Hardware (EHW) [1,2] is to realize hardware self-adaptation, self-combination and self-repair based on an evolutionary algorithm and a programmable logic device (PLD). There are two realization methods: off-line evolution [3] and on-line evolution [4]. At present, a lot of research is done on off-line evolution. Firstly, the structure sketches of the circuits are designed by an evolutionary algorithm and a mathematical model. Secondly, placing and routing is implemented by special design software. Thirdly, the physical structures are mapped into the FPGA. Finally, the logic functions of the designed circuits are evaluated according to the test results of the FPGA. But the disadvantages of off-line evolution are also evident: the evaluation speed is slow and the evolutionary time is overlong; it is difficult to comprehensively evaluate the circuit function, active area, time-delay, power dissipation and so on; and real-time repair of hardware faults cannot be implemented. EHW on-line evolution can configure the functions and

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 67–76, 2007. © Springer-Verlag Berlin Heidelberg 2007
interconnections of the MESCs in the FPGA directly. In the process of evolutionary design, we can realize the real-time evaluation of each chromosome in every generation, obtain evaluation indices of the circuits' comprehensive capability, and implement hardware on-line repair in the case of local faults. So it is important to do research on EHW on-line evolution technology; it will become one of the research hotspots, whose key technologies include the coding method, on-line evaluation, the multi-objective evolutionary mechanism, and FPGA chips suitable for evolution. Today, the on-line evolution of EHW is usually based on a binary coding method [5,6], whose shortcoming is an overlong chromosome. This makes the search space huge, so that a satisfactory result may not be found within a given time. If min-term or function-level coding methods are used, decoding is necessary, and the evolved logic circuits based on these two methods do not map one-to-one onto the physical circuits. These methods are unfavorable to hardware on-line repair. Some researchers have studied multi-objective evolution in the field of digital circuits for many years. Zhao Shuguang [7] integrated several objectives into a single objective in the form of a "sum of weights" based on a netlist-level representation. He Guoliang [8] and Soliman [9] researched the multi-objective evolution of two objectives, the circuit function and the number of active logic gates: they evolved the 100% functional circuit first and then optimized the active logic gates. But these multi-objective evolutionary designs are off-line. Considering the inner characteristic structure of the FPGA, we propose a method of digital circuit on-line multi-objective evolution which is suitable for FPGAs, by coding the MESCs and evolving three objectives: the evolutionary aim (a 100% functional circuit) and two restricted aims (the number of active CLBs and the circuit response speed). Based on this, a novel multi-objective on-line evolutionary algorithm is presented.
2 Multi-objective Design Principle of Digital Circuit On-Line Evolution
Generally, if the input/output characteristic of the expected circuit is known, we can use a multi-objective evolutionary algorithm to implement the circuit function and optimize the structure. At first, the initial chromosome representing the circuit structure is created randomly. Secondly, the chromosome is downloaded into the FPGA and then tested and evaluated. Thirdly, the multi-objective evolutionary algorithm evolves the old chromosomes and creates new ones according to the test results. The second and third parts run repeatedly and stop when a certain chromosome satisfies the multi-objective evolutionary conditions. Finally, the expected circuit is realized. Fig. 1 shows the multi-objective on-line evolutionary process of digital circuit.

2.1 Multi-objective Optimization Model and Evolutionary Algorithm
Multi-objective optimization algorithms find an important application in the on-line evolutionary design of circuits. Therefore, how to construct
Fig. 1. Multi-objective on-line design process of digital circuit
multiple objectives and design the evolutionary algorithm needs urgent research. For the sake of the multi-objective on-line evolutionary application in EHW, we improve the multi-objective optimization model based on the typical restriction method in this paper. The multi-objective optimization problem of digital circuit design can be expressed as follows:

    primary function:     y = fitness(x) = F
    subsidiary functions: power(x) ≤ D,  timedelay(x) ≤ P          (1)

The functional fitness value of the evolved circuit is calculated as follows:

    F = Σ_{i=0}^{n−1} Σ_{j=0}^{m−1} C_{i,j}                        (2)

    C_{i,j} = 1 if outdata = epdata;  C_{i,j} = 0 if outdata ≠ epdata   (3)
where outdata is the output value of the current evolved circuit, epdata is the output value of the expected circuit, n is the total number of circuit output pins, and m is the total number of test signals. The fitness value of the circuit function directly reflects how closely the current circuit approaches the expected circuit. Power dissipation is approximated by the number of active CLBs in this paper, signified by D. Time-delay is defined as the maximal signal transmission time from input to output, signified by P. The values of power dissipation and time-delay can be obtained by searching the evolved circuit in the reverse direction. Fig. 2 shows the realization process of the multi-objective evolutionary algorithm. The steps are as follows:
STEP 1. Create the initial population randomly.
STEP 2. Judge whether each chromosome of the population satisfies the constraint conditions and the niche qualification. If yes, it is preserved in the registered pool. The best chromosome of the population is marked.
STEP 3. Evolve the population. If the registered pool is not full, go to STEP 2; otherwise, go to the next step.
STEP 4. Judge whether there are excellent genes in the population. If so, mark and lock them.
STEP 5. Evolve the population by the multi-objective evolutionary algorithm based on the strategies of mutation, preservation of excellent genes, and so on.
STEP 6. Evaluate each objective. If all of the multi-objective evolutionary conditions are satisfied, the evolution stops; otherwise, go to STEP 4.
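The evaluation in equations (1)-(3) can be sketched in software by modelling the circuit under test as a plain function from an input vector to its output bits. This is a stand-in for downloading a chromosome to the FPGA and testing it on-line; all names below are illustrative:

```python
def functional_fitness(circuit, test_vectors, expected):
    """Equations (2)/(3): count output bits matching the expected circuit."""
    F = 0
    for j, vec in enumerate(test_vectors):
        outdata = circuit(vec)      # n output bits of the evolved circuit
        epdata = expected[j]        # n output bits of the expected circuit
        F += sum(1 for o, e in zip(outdata, epdata) if o == e)  # C_ij terms
    return F

def satisfies_objectives(circuit, test_vectors, expected,
                         active_clbs, delay_levels, D, P):
    """Equation (1): full circuit function plus the two subsidiary constraints."""
    n = len(expected[0])            # number of output pins
    m = len(test_vectors)           # number of test signals
    full_function = functional_fitness(circuit, test_vectors, expected) == n * m
    return full_function and active_clbs <= D and delay_levels <= P

# Toy example: a 1-bit "circuit" computing the AND of two inputs.
and_circuit = lambda v: [v[0] & v[1]]
vectors = [[0, 0], [0, 1], [1, 0], [1, 1]]
truth = [[0], [0], [0], [1]]
print(functional_fitness(and_circuit, vectors, truth))   # -> 4
print(satisfies_objectives(and_circuit, vectors, truth,
                           active_clbs=1, delay_levels=1, D=5, P=3))  # -> True
```

In the real system, active_clbs and delay_levels would come from the reverse search described in Section 2.2 rather than being passed in directly.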
Fig. 2. Flow chart of the multi-objective evolutionary algorithm (initial population → evolution with mutation and preservation of excellent genes → local, aim, niche and subsidiary evaluation → restricted register pool → end when the pool is full)
2.2 Encoding and Searching Methods of Digital Circuit
In the design of digital circuits there are many encoding methods, such as binary code, min-term code and function-level code. We propose an integer coding method based on an FPGA model. This method includes coding the functions of the CLBs and the interconnections between the CLBs. Fig. 3 shows the encoding and searching methods for the 2-D array of the FPGA. We encode the 2-D array with a cascaded multi-parameter encoding method. In this paper one LUT is used in every CLB, and two input pins (F1 and F2) and one output pin (XQ) of every LUT are used to evolve the circuit. Every chromosome can be expressed by an 80-bit code as shown in Fig. 3: the first 55 bits of the whole code express the interconnections between the CLBs, and the remaining bits express the functions of the CLBs. It is important for the multi-objective evaluation to search the active CLBs of the 2-D evolutionary array in the reverse direction. The searching direction is marked in
Fig. 3. Schematic of the structure of the FPGA, the code and the searching direction of the digital circuit evolutionary design (input column, evolutionary columns and output column; test signals enter on the left, test results leave on the right, and the search proceeds from the output column back towards the input column)
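The reverse (output-to-input) search over the CLB array can be sketched as a backward reachability pass. The grid representation and helper names below are hypothetical; in the actual system the connectivity would be read from the FPGA configuration:

```python
def active_clbs(outputs, sources):
    """Backward reachability from the output column.

    `sources` maps a CLB position (col, row) to the list of CLBs feeding
    its inputs.  Returns the set of active CLBs and the time-delay level,
    i.e. the longest chain from an output back to the input column
    (column 0), where the search stops.
    """
    active, depth = set(), {}
    stack = [(clb, 1) for clb in outputs]       # start at the output column
    while stack:
        clb, level = stack.pop()
        if level <= depth.get(clb, 0):
            continue                            # a longer path already marked it
        active.add(clb)
        depth[clb] = level
        if clb[0] > 0:                          # input column: do not go further
            for src in sources.get(clb, []):
                stack.append((src, level + 1))
    return active, max(depth.values())

# Toy 3-column array: output CLB (2,0) <- (1,0) <- input CLB (0,0);
# any CLB that feeds no output is simply never marked.
sources = {(2, 0): [(1, 0)], (1, 0): [(0, 0)]}
acts, delay = active_clbs(outputs=[(2, 0)], sources=sources)
print(sorted(acts))   # -> [(0, 0), (1, 0), (2, 0)]
print(delay)          # -> 3
```

The size of the returned set gives the active-CLB count (the power-dissipation surrogate D) and the maximum depth gives the time-delay level P used in equation (1).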
Fig. 3. Firstly, the CLBs connected to the output column are marked by searching this column. Secondly, if a searched CLB has been marked, the CLBs connected to it are marked by continuing the search in the reverse direction. When the input column has been searched, the search stops. Consequently, the positions of all the active CLBs are marked, and the number of active CLBs and the time-delay can be calculated by this method.

2.3 Improved Multi-objective Evolutionary Strategy of Digital Circuit Design
The evolutionary design process of a digital circuit is as follows: confirm the logical function of the designed circuit, design the fitness evaluation function, search for the optimal structure by the evolutionary algorithm, download each chromosome of each generation to the FPGA, evaluate the current circuit, and stop the evolution when the evolved circuit satisfies the expected function. By analyzing evolved circuits, we find that partial genes of some chromosomes are already best (that is, the sub-functions of the circuit corresponding to these genes satisfy some of the evolutionary conditions) even though the whole chromosomes are not. However, these best genes may be destroyed and have to be evolved again in the next generation, which increases the evolutionary time accordingly. On the basis of this analysis, we present an improved idea for circuit design: when some sub-circuits satisfy the performance requirements, we preserve the best genes corresponding to these sub-circuits and only evolve the other functions and their interconnections. A new strategy for accelerating the convergence of circuit evolutionary design is thus presented. For example, we make use of n individuals of a single
Fig. 4. Improved evolutionary strategy for accelerating the convergence of the evolution process (inputs I0..In, outputs O0..On; excellent parts of the circuit are locked as the evolution proceeds)
population to design the circuit by parallel evolution. The improved strategy is realized as follows: when the outputs of one or more pins of the FPGA satisfy the test conditions, the CLBs influencing these outputs are marked and locked in order to avoid being destroyed, and the locked genes are used for learning in the subsequent evolution to quicken the convergence. This idea of preserving partial excellent gene sections is applied to the multi-objective evolutionary design of digital circuits. The search space is reduced by preserving the locked genes based on the multi-objective evaluation of every generation, which heightens the speed of the multi-objective on-line evolutionary design. The idea is similar to the design thinking of human experts.
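A minimal sketch of the locking strategy, under the assumption that the gene positions influencing each output pin can be identified (e.g. by the reverse search of Section 2.2); all names and the mapping from outputs to genes are illustrative:

```python
import random

def lock_satisfied_genes(locked, output_ok, genes_of_output):
    """Lock the gene positions driving outputs that already pass the test."""
    for pin, ok in enumerate(output_ok):
        if ok:
            locked |= set(genes_of_output[pin])
    return locked

def mutate_unlocked(chromosome, locked, rate=0.05):
    """Mutate only positions that are not locked."""
    return [g ^ 1 if i not in locked and random.random() < rate else g
            for i, g in enumerate(chromosome)]

chrom = [0, 1, 1, 0, 1, 0]
# Suppose output pin 0 (driven by genes 0-2) already matches the target,
# while pin 1 (driven by genes 3-5) does not:
locked = lock_satisfied_genes(set(), [True, False], {0: [0, 1, 2], 1: [3, 4, 5]})
child = mutate_unlocked(chrom, locked, rate=1.0)
print(child[:3] == chrom[:3])   # -> True: locked genes are preserved
print(child[3:] != chrom[3:])   # -> True: with rate=1.0 every unlocked gene flips
```

Because the locked portion of the chromosome is never touched, the already-satisfied sub-circuit cannot regress, and the effective search space shrinks to the unlocked genes only.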
3 Experimental Results

3.1 Experimental Program and Results Analysis
A Virtex FPGA is adopted as the experimental hardware model and JBits 2.8 as the software platform. Four basic logic functions are used to configure the CLBs of the FPGA: F1&F2, F1&(~F2), F1^F2 and F1|F2. A HEX-BCD code conversion circuit is taken as the experimental example. The evolutionary area is a 5×7 array; because the first column works as inputs and the last as outputs, these two columns do not participate in the evolution, so the actual evolutionary area is a 5×5 array. The multi-objective evaluation function is given by equation (1) in Section 2.1. Fig. 5 shows 30 groups of comparative results between multi-objective evolution and single-objective evolution. The results show that the number of active CLBs and the time-delay levels of multi-objective evolution are smaller than those of single-objective evolution, but the convergence speed is obviously slower. We therefore use the accelerating convergence strategy to quicken the convergence of the multi-objective evolution. Fig. 6 shows the contrast of convergence speed among the single-objective algorithm, the common multi-objective algorithm and the accelerating multi-objective algorithm. The experimental results indicate that the convergence speed of the accelerating multi-objective algorithm is faster than that of the
Fig. 5. Comparison of experimental results between multi-objective and single-objective evolution: (a) number of active CLBs, (b) time-delay levels, (c) evolutionary time

Fig. 6. Comparative chart of the convergence curves of the different algorithms in the process of evolution (percentage of convergence vs. evolutionary generations, for accelerating multi-objective, common multi-objective and single-objective evolution)

Fig. 7. Histogram of the evolutionary time of the two multi-objective evolutionary algorithms
common algorithm. Local degeneration is caused by the constraint conditions, but the evolutionary process as a whole still converges. Ten groups of evolution time values of the two methods are shown in Fig. 7.

3.2 Circuit Structures Analysis
A typical circuit structure obtained by single-objective evolution is shown in Fig. 8 and one obtained by multi-objective evolution in Fig. 9. The gray parts show the obvious redundancies, and the white parts could still be simplified further. From the two circuits we find that the number of active CLBs is 18 and the time-delay level is 5 in Fig. 8, while the number of active CLBs is 10 and the time-delay level is 3 in Fig. 9. Obviously, the circuit structure of multi-objective evolution is better than that of single-objective evolution and the redundancies are reduced greatly. So the multi-objective evolution approach has the advantages of less resource consumption and less time-delay.

Fig. 8. Circuit structure based on single-objective evolution (inputs a4, b4, ..., a0, b0)
Fig. 9. Circuit structure based on multi-objective evolution
4 Conclusions
According to the structural characteristics of FPGA, multi-objective on-line evolution of digital circuits, based on coding the MESCs by a two-level integer method and a multi-objective evolutionary algorithm, is realized in this paper. An improved multi-objective evolutionary strategy based on locking the excellent gene sections is presented. The growing evolution of a digital circuit is then achieved by evolving several sub-modules, into which the whole circuit is divided, and preserving the excellent gene sections. Finally, the experimental results prove the feasibility of the new design method. Compared with the circuit designed by single-objective evolution, the multi-objective evolutionary circuit has fewer active CLBs, fewer redundancies and a lower time-delay level. The improved multi-objective evolutionary strategy also shortens the evolutionary time greatly. In this paper the proposed method is only applied to combinational logic circuits; more research is still needed on multi-objective on-line evolutionary methods for sequential circuits in the future.

Acknowledgments. The work presented in this paper has been funded by the National Natural Science Foundation of China (60374008, 90505013) and the Aeronautical Science Foundation of China (2006ZD52044, 04I52068).
References

1. Yao, X., Higuchi, T.: Promises and challenges of evolvable hardware. IEEE Transactions on Systems, Man and Cybernetics - Part C 29, 87–97 (1999)
2. Lohn, J.D., Hornby, G.S.: Evolvable hardware: using evolutionary computation to design and optimize hardware systems. Computational Intelligence Magazine 1, 19–27 (2006)
3. Anderson, E.F.: Off-line Evolution of Behaviour for Autonomous Agents in Real-Time Computer Games. In: Proc. of the 7th International Conference on Parallel Problem Solving from Nature, Granada, Spain, pp. 689–699 (2002)
4. Damiani, E., Tettamanzi, A.G.B., Liberali, V.: On-line Evolution of FPGA-based Circuits: A Case Study on Hash Functions. In: Proc. of the First NASA/DoD Workshop on Evolvable Hardware, California, USA, pp. 26–33 (1999)
5. Yao, X., Liu, Y.: Getting most out of evolutionary approaches. In: Proc. of the NASA/DoD Conference on Evolvable Hardware, Washington DC, USA, pp. 8–14 (2002)
6. Hartmann, M., Lehre, P.K., Haddow, P.C.: Evolved digital circuits and genome complexity. In: Proc. of the NASA/DoD Conference on Evolvable Hardware, Washington DC, USA, pp. 79–86 (2005)
7. Shuguang, Z., Yuping, W., Wanhai, Y., et al.: Multi-objective Adaptive Genetic Algorithm for Gate-Level Evolution of Logic Circuits. Chinese Journal of Computer-Aided Design & Computer Graphics 16, 402–406 (2004)
8. Guoliang, H., Yuanxiang, L., Xuan, W., et al.: Multiobjective simulated annealing for design of combinational logic circuits. In: Proc. of the 6th World Congress on Intelligent Control and Automation, Dalian, China, pp. 3481–3484 (2006)
9. Soliman, A.T., Abbas, H.M.: Combinational circuit design using evolutionary algorithms. In: IEEE CCECE 2003 Canadian Conference on Electrical and Computer Engineering, pp. 251–254 (2003)
Evolutionary Design of Generic Combinational Multipliers Using Development

Michal Bidlo

Brno University of Technology, Faculty of Information Technology, Božetěchova 2, 61266 Brno, Czech Republic
[email protected]
Abstract. Combinational multipliers represent a class of circuits that is usually considered to be hard to design by means of the evolutionary techniques. However, experiments conducted under the previous research demonstrated (1) a suitability of an instruction-based developmental model to design generic multiplier structures using a parametric approach, (2) a possibility of the development of irregular structures by introducing an environment which is considered as an external control of the developmental process – inspired by the structures of conventional multipliers and (3) an adaptation of the developing structures to the different environments by utilizing the properties of the building blocks. These experiments have represented the first case when generic multipliers were designed using an evolutionary algorithm combined with the development. The goal of this paper is to present an improved developmental model working with the simplified building blocks based on the concept of conventional generic multipliers, in particular, adders and basic AND gates. We show that this approach allows us to design generic multiplier structures which exhibit better delay in comparison with the classic multipliers, where adder represents a basic component.
1 Introduction
The design of combinational multipliers has often been considered a non-trivial task for demonstrating the capabilities of evolutionary systems. A gate-level representation has usually been utilized, whose search space is typically rugged and hard to explore using evolutionary algorithms. When a direct encoding (a non-developmental genotype-phenotype mapping) is applied, it is extremely difficult to achieve scalability of the evolved solutions, i.e. to obtain larger instances of the circuits, for example when the traditional Cartesian Genetic Programming (CGP) is utilized [1]. Therefore, more effective representations have been investigated in order to overcome these issues and, in general, improve the scalability and evolvability of digital circuits, as summarized in the following paragraph. Miller et al. outlined the principles of the evolutionary design of digital circuits and showed some results of evolved combinational arithmetic circuits, including multipliers, in [2]. A detailed study of the fitness landscape in the evolutionary design of combinational circuits using Cartesian Genetic Programming is

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 77–88, 2007. © Springer-Verlag Berlin Heidelberg 2007
78
M. Bidlo
proposed in [3]. 3 × 3 multipliers constitute the largest and most complex circuits designed by means of traditional CGP in these papers. Vassilev et al. utilized a method based on CGP which exploits the redundancy contained in the genotypes; larger (up to 4 × 4 bits) and more efficient multipliers were evolved by means of this approach in comparison with the conventional designs [4]. Vassilev and Miller studied the evolutionary design of 3 × 3 multipliers by means of evolved functional modules rather than only two-input gates [5]. Their approach is based on Murakawa's method of evolving sub-circuits as the building blocks of the target design in order to speed up and improve the scalability of the design process [6]. Torresen applied the partitioning of the training vectors and of the training set (so-called increased complexity evolution, or incremental evolution) to the design of multiplier circuits. His approach was focused on improving the evolution time and evolvability rather than optimizing the target circuit; 5×5 multipliers were evolved using this method [7]. Stomeo et al. devised a decomposition strategy for evolvable hardware which makes it possible to design large circuits [8]. Among others, 6×6 multipliers were evolved by means of this approach. Aoki et al. introduced an effective graph-based evolutionary optimization technique called Evolutionary Graph Generation [9]. The potential capability of this method was demonstrated through the experimental synthesis of arithmetic circuits at different levels of abstraction; 16 × 16 multipliers were evolved using word-level arithmetic components (such as one-bit full adders or one-bit registers). An instruction-based developmental system was introduced in [10] for the design of arbitrarily large multipliers. A genetic algorithm was utilized to evolve a program for the construction of generic multipliers using a parametric approach.
Basic AND gates and higher building blocks based on a one-bit adder were utilized. A concept of environment (an external control of the developmental process) was introduced in order to design irregular structures, and an interesting phenomenon of adaptation to different environments was observed. The results presented in [10] represent the first case in which generic multipliers were evolved using development. In this paper an improved developmental model is presented that is based on the system introduced in [10]. Simplified building blocks are utilized in order to clarify the circuit structure and avoid restricting the search space (only basic AND gates and pure half and full adders are considered). The adaptation is exploited (as discussed in [10]) for the design of different circuit structures depending on the environment chosen. In particular, the experiments are devoted to the design of carry-save multipliers, which exhibit a shorter delay than the classic multipliers described in [11]. Note that the evolutionary development of classic generic multipliers was introduced in [10].
2 Biologically Inspired Development
In nature, development is the biological process of ontogeny representing the formation of a multicellular organism from a zygote. It is influenced by the genetic information of the organism and by the environment in which the development takes place.
Evolutionary Design of Generic Combinational Multipliers
In the area of computer science, and evolutionary algorithms in particular, computational development has been inspired by this biological phenomenon. Computational development is usually considered a non-trivial and indirect mapping from genotypes to phenotypes in an evolutionary algorithm. In such a case the genotype has to contain a prescription for the construction of a target object. While the genetic operators work on the genotypes, the fitness calculation (evaluation of the candidate solutions) is applied to the phenotypes created by means of the development. The environment in computational development may be understood as external information (in addition to the genetic information included in the genotype) and as an additional control mechanism of the development. The principles of computational development, together with a brief biological background, are summarized in [12].
3 Development of Efficient Generic Multipliers
The method of development is inspired by the construction of conventional combinational multipliers, for which generic algorithms exist. Figure 1 shows two typical designs of a 4 × 4 multiplier constructed by means of the conventional approach [11]. It is evident that the circuits contain parts which differ from the rest of the circuit, i.e. they represent a kind of irregularity. In particular, this is the case for the first level ("row") of AND gates occurring in both multipliers in Fig. 1a,b and the last level of adders occurring in the carry-save multiplier (Fig. 1b). However, the rest of the circuit structure exhibits a high level of regularity that can be expressed by means of an iterative algorithm utilizing variables and parameters related to a given instance of the multiplier. For example, the number of bits of the operands determines the number of AND gates and adders in the appropriate circuit level and the number of levels of the multiplier. Therefore, this concept is assumed to be convenient for the design of generic multipliers using development and an evolutionary algorithm. Experiments were conducted in previous research dealing with the evolutionary design of generic multipliers using an instruction-based developmental model [10]. The building blocks utilized in that method include an adder combined with a basic AND gate, which, however, may be an unsuitable approach preventing the evolution from finding better solutions. Nevertheless, general programs were evolved for the construction of multipliers whose structure corresponds to that of the classic combinational multiplier shown in Fig. 1a. The obtained results showed the possibility of designing generic multipliers using an evolutionary algorithm with development, which gave rise to an interesting area deserving future investigation.
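The regularity described above can be illustrated with a short sketch (a plain Python simulation of the classic array multiplier of Fig. 1a, not the paper's developmental system): each level ANDs one bit of operand B with all bits of A and accumulates the partial products through a row of one-bit adder cells, so the whole circuit is generated by a loop parameterized only by the operand width.

```python
def array_multiply(a_bits, b_bits):
    """Simulate the classic combinational array multiplier: one level of
    AND gates per bit of B, each followed by a row of one-bit adders.
    Bits are little-endian lists of 0/1."""
    n = len(a_bits)
    acc = [0] * (2 * n)                 # accumulated product bits
    for j, b in enumerate(b_bits):      # j-th level of the multiplier
        carry = 0
        for i, a in enumerate(a_bits):
            pp = a & b                  # AND gate: partial-product bit
            s = acc[i + j] + pp + carry # one-bit adder cell
            acc[i + j] = s & 1          # sum output
            carry = s >> 1              # carry to the next column
        acc[j + n] = carry              # final carry of the level
    return acc

def bits(x, n):
    """Little-endian bit list of x."""
    return [(x >> i) & 1 for i in range(n)]

def to_int(bs):
    return sum(b << i for i, b in enumerate(bs))
```

For w = 4 the loop instantiates w levels of w AND gates plus adders, which is exactly the regular structure a developmental program has to reproduce parametrically.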
Therefore, new features of the developmental model are introduced in this paper in order to design more efficient multipliers than those evolved in [10]. The approach presented herein is based on the structure of the carry-save multiplier (see Fig. 1b), which exhibits a shorter delay than the classic multiplier shown in Fig. 1a [11]. In particular, simplified building blocks will be introduced, including only basic AND gates and pure one-bit adders, and an enhanced instruction for creating the
Fig. 1. 4 × 4 conventional multipliers: (a) classic combinational multiplier, (b) more efficient carry-save multiplier. a0, . . . , a3 and b0, . . . , b3 denote the bits of the first and second operand, respectively; p0, . . . , p7 represent the bits of the product.
circuit structure will be presented that is able to generate two building blocks at a time — in addition to the single-block generative instruction introduced in [10] — due to the increased complexity of the construction algorithm needed for the design of generic multiplier structure using the simplified building blocks.
4 Instruction-Based Developmental System
A simple two-dimensional grid consisting of a given number of rows and columns was chosen as a suitable structure for the development of the target circuits. The building blocks are placed into this grid by means of a developmental program. In order to handle irregularities, an external control of the developmental process, called an environment, has been introduced. A building block represents a basic component of the circuit to be developed. The general structure of a block is shown in Figure 2a. Each building block has three inputs, of which one or two may be unused depending on the type of the block. Each building block has two outputs, of which one may be meaningless, i.e. permanently set to logic 0, depending on the block type. The outputs are denoted symbolically as out0 and out1. In the case of a block with only one output, out0 represents the effective output and out1 is permanently set to logic 0. The circuit is developed inside a grid (rectangular array), which proved to be a suitable structure for the design of combinational multipliers (see Figure 2b). Figure 3 shows the set of building blocks utilized for the experiments presented in this paper. For the interconnection of the blocks, the position (row, col) in the grid is utilized. The inputs of the blocks are connected
to the outputs of neighboring blocks by referencing the symbolic names of the outputs, or to the primary inputs of the circuit via indices, depending on the block type. Feedback is not allowed. For example, out1(row, col − 1) means that the input of the block at position (row, col) in the grid is connected to the output denoted out1 of the block on its left-hand side. The connections to the primary inputs are determined by the indices v0 and v1. Let A = a0 a1 a2 and B = b0 b1 b2 represent the primary inputs (operands A and B) of a 3 × 3 multiplier. For instance, an AND gate with v0 = 1 and v1 = 2 has its inputs connected to the second bit (a1) of operand A and the third bit (b2) of operand B. In the case of building blocks at the borders of the grid (when row = 0 or col = 0), where no blocks with valid outputs occur (for row − 1 or col − 1), the appropriate inputs of the blocks at (row, col) are set to logic 0. The development of the circuit is performed by means of a developmental program. This program, which is the subject of evolution, consists of simple application-specific instructions. The instructions make use of numeric literals 0, 1, . . . , max value, where max value is specified by the designer at the beginning of evolution. In addition to the numeric literals, a parameter and some
Fig. 2. (a) Structure of a building block. (row, col) determines the position of the block in the grid – see part (b). The connection of the inputs depends on the type and position of the block. (b) A grid of the building blocks with m rows and n columns for the development of generic multipliers.
(Figure 3 diagrams: identity blocks ID-1 and ID-2, the AND gate with input indices v0 and v1, half adders HA-1 and HA-2, and full adders FA-1, FA-2 and FA-3; the inputs of each block are connected to out0/out1 of neighboring blocks at positions such as (row − 1, col), (row − 2, col) and (row, col − 1).)
Fig. 3. Building blocks for the development of generic multipliers: (a, b) buffers – identity functions, (c) AND gate, (d, e) half adders, (f, g, h) full adders. (row, col) denotes the position in the grid. v0 and v1 determine the indices of primary input bits. The connections of the inputs of the blocks are shown. Unused inputs and outputs are not depicted (set to logic 0). Note that the full adders (g, h) are new to this paper and are inspired by, and intended for, the design of carry-save multipliers.
variables of the developmental system can be utilized. The parameter represents the width (the number of bits) of the operands – the inputs of the multiplier. The parameter is referenced by its symbolic name w, and its value is specified by the designer at the beginning of the evolutionary process. For example, when designing a 4 × 4 multiplier, w = 4. The value of the parameter is invariable during the evolutionary process. There are four variables integrated into the developmental system, denoted v0, v1, v2 and v3, whose values are altered by the appropriate instructions during the execution of the program (the developmental process). Table 1 describes the instruction set utilized for the development. The SET instruction assigns a value determined by a numeric literal, the parameter or another variable to a specified variable. The instructions INC and DEC increase and decrease, respectively, the value of a given variable. The difference can be specified only by a numeric literal. Simple loops inside the developmental program are provided by the REP instruction, whose first argument determines the repetition count and whose second argument states the number of instructions after the REP instruction to be repeated. Inner loops are not allowed, i.e. REP instructions inside the repeated code are interpreted as NOP (no operation) instructions. The GEN instruction generates one or two building blocks of the types specified by its arguments. If (row, col) does not exceed the grid boundaries, the block is generated at that position. In the case of generating two blocks, the second one is placed at (row + 1, col). If the grid boundary is exceeded, no block is generated outside the grid. The possibility of generating two building blocks during an execution of the GEN instruction is a new feature introduced in this paper in comparison with the system presented in [10], where only one block could be generated.
This variant has been chosen in order to reduce the complexity of the developmental process now that the simplified building blocks have been introduced. Note that this approach in no way restricts the capabilities of the construction algorithm, because the number and types of the blocks to be generated are determined independently by means of the evolutionary algorithm. In the case of generating an AND gate, its inputs are connected to the primary inputs indexed by the actual values of variables v0, v1 as shown in Figure 3c. If v0 or v1 exceeds the bit width of the operands, the appropriate input of the AND gate is connected to logic 0. The inputs of the other building blocks are determined by the position (row, col) in the grid where they are generated. After executing GEN, col is increased by one. The developmental program may consist of several parts, which may consist of different numbers of instructions. Let us define the length of a program (or a part of a program) as the number of instructions it is composed of. These parts are executed on demand with respect to an environment. A single execution of a part of the program is referred to as a developmental step. The purpose of the environment is to enable the system to develop more complex structures which may not be fully regular. The environment is represented by a finite sequence of values specified by the designer at the beginning of the evolution, e.g. env = (0, 1, 2, 2). The number of different values in the environment corresponds to the number of parts of the developmental program. In addition, there is an
Table 1. Instructions utilized for the development

0: SET variable, value – Assign value to variable. variable ∈ {v0, v1, v2, v3}, value ∈ {0, 1, . . . , max value, w, v0, v1, v2, v3}.
1: INC variable, value – Increase variable by value. variable ∈ {v0, v1, v2, v3}, value ∈ {0, 1, . . . , max value}.
2: DEC variable, value – If variable ≥ value, then decrease variable by value. variable ∈ {v0, v1, v2, v3}, value ∈ {0, 1, . . . , max value}.
3: REP count, number – Repeat count times the number following instructions. All REP instructions among the number following ones are interpreted as NOP instructions (inner loops are not allowed).
4: GEN block1, block2 – Generate block1 at the actual position (row, col). If block2 is a non-empty block, generate block2 at (row + 1, col). Increase col by 1.
5: NOP – An empty operation.
environment pointer (let us denote it e) determining a particular value in the environment at each point of the developmental process. Each part of the program is executed deterministically, sequentially and independently of the others, according to the environment values. However, the parameter and the variables of the developmental system are shared by all parts of the program. At the beginning of the evolutionary process the value of the parameter w and the form of the environment env are defined by the designer. The developmental program, whose number of parts and their lengths are also specified a priori by the designer, is intended to operate over these data in order to develop a multiplier of a given size. Evidently, different sizes of multipliers are created by setting the parameter and adjusting the environment. Hence the circuit of a given size is always developed from scratch; it is a case of parametric developmental design. The following algorithm is defined in order to handle the developmental process.

1. Initialize row, col, v0, v1, v2, v3 and e to 0.
2. Execute the env(e)-th part of the program.
3. If a GEN instruction was executed, increase row by 2 if two building blocks were generated simultaneously, or by 1 if only single blocks were generated. Increase e by one and set col to 0.
4. If neither e nor row exceeds its bound, go to step 2.
5. Evaluate the resulting circuit.
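The developmental process can be sketched as a small interpreter (a hypothetical Python rendering of Table 1 and the algorithm above; the instruction encoding, block names and the restriction of REP counts to numeric literals are illustrative assumptions, not the author's exact implementation):

```python
ROWS, COLS = 8, 8                     # grid limits used in the experiments

def expand(part):
    """Flatten REP loops; inner REPs act as NOP (no nesting allowed)."""
    out, pc = [], 0
    while pc < len(part):
        op, a, b = part[pc]
        if op == 'REP':               # a = repetition count, b = body length
            body = [ins if ins[0] != 'REP' else ('NOP', 0, 0)
                    for ins in part[pc + 1:pc + 1 + b]]
            out.extend(body * a)
            pc += 1 + b
        else:
            out.append((op, a, b))
            pc += 1
    return out

def develop(parts, env, w):
    """Run the program parts selected by the environment; return the grid."""
    var = {'v0': 0, 'v1': 0, 'v2': 0, 'v3': 0}   # shared by all parts
    grid, row = {}, 0
    for e in env:
        if row >= ROWS:                            # step 4: row exceeded
            break
        col, two_rows = 0, False
        for op, a, b in expand(parts[e]):
            if op == 'SET':
                var[a] = w if b == 'w' else var.get(b, b)
            elif op == 'INC':
                var[a] += b
            elif op == 'DEC' and var[a] >= b:
                var[a] -= b
            elif op == 'GEN':
                if row < ROWS and col < COLS:
                    grid[(row, col)] = a           # first block
                if b is not None:
                    two_rows = True
                    if row + 1 < ROWS and col < COLS:
                        grid[(row + 1, col)] = b   # second block below
                col += 1                           # GEN advances the column
        row += 2 if two_rows else 1                # step 3 of the algorithm
    return grid
```

A part such as [('REP', 4, 2), ('GEN', 'AND', None), ('INC', 'v0', 1)] generates one level of four AND gates, analogous to part 0 of the evolved program discussed in Section 6.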
5 Evolutionary System Setup
A chromosome consists of a linear array of the instructions, each of which is represented by the operation code and two arguments (the utilization of the arguments depends on the type of the instruction). The array contains n parts of the developmental program stored in sequence, whose lengths (the number of
instructions) correspond to l0, l1, . . . , ln−1. The number of parts and their lengths are determined by the designer. In general, the structure of a chromosome can be expressed as i0,0 i0,1 . . . i0,l0−1; . . . ; in−1,0 in−1,1 . . . in−1,ln−1−1, where ij,k denotes the k-th instruction of the j-th part of the program for k = 0, 1, . . . , lj − 1 and j = 0, 1, . . . , n − 1. During the application of the genetic operators the parts of the program are not distinguished, i.e. the chromosome is handled as a single sequence of instructions. The chromosomes have constant length during the evolution. The population consists of 32 chromosomes, which are generated randomly at the beginning of evolution. A tournament selection operator of base 2 is utilized. Mutation of a chromosome is performed by a random selection of an instruction followed by a random choice of a part of the instruction (the operation code or one of its arguments). If the operation code is to be mutated, an entirely new instruction replaces the original one; otherwise one argument is mutated. The mutation algorithm ensures proper values of arguments depending on the instruction type (see Table 1). The mutation is performed with probability 0.03, and only one instruction per chromosome is mutated. A special crossover operator is applied with probability 0.9, working as follows. Two parent chromosomes are selected and an instruction is selected randomly in each of them (i1, i2). A position (index) is chosen randomly in each of the two offspring (c1, c2). After the crossover, the first offspring contains the original instructions of the first parent, except that i2 (taken from the second parent) is placed at position c1; analogously, the second offspring contains the instructions of the second parent with i1 placed at position c2. The fitness function is calculated as the number of bits processed correctly by the multiplier developed by means of the program stored in the chromosome.
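The special crossover can be written out as follows (one possible Python reading of the prose description; the paper gives no pseudocode, so the exact index handling is an assumption):

```python
import random

def instruction_crossover(parent1, parent2, rng=random):
    """Each offspring copies one parent; a randomly chosen instruction
    from the *other* parent is then written into a randomly chosen
    position, so the chromosome length stays constant."""
    i1 = rng.randrange(len(parent1))   # donor instruction in parent 1
    i2 = rng.randrange(len(parent2))   # donor instruction in parent 2
    c1 = rng.randrange(len(parent1))   # target position in offspring 1
    c2 = rng.randrange(len(parent2))   # target position in offspring 2
    off1, off2 = list(parent1), list(parent2)
    off1[c1] = parent2[i2]             # i2 placed at c1 in offspring 1
    off2[c2] = parent1[i1]             # i1 placed at c2 in offspring 2
    return off1, off2
```

Because only a single instruction migrates, the operator is much less disruptive than standard one-point crossover on instruction sequences.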
The experiments were conducted with the evolution of programs for the construction of 4 × 4 multipliers, i.e. the parameter w = 4. There are 2^(4+4) = 256 possible test vectors and the multipliers produce 8-bit results. Therefore, the maximum fitness, representing a working solution, equals 256 · 8 = 2048. After the evolution, the resulting program is verified in order to determine whether it is able to create larger multipliers, typically up to the size of 14 × 14 bits. This size of circuit was determined experimentally, allowing a sufficient number of developmental steps to be performed to demonstrate the correctness of the evolved program while keeping a reasonable verification time. If a program shows this ability, it is considered general.
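The fitness computation can be restated compactly (a Python sketch; `candidate` stands for the multiplier function obtained by developing a chromosome, which is assumed here rather than implemented):

```python
def fitness(candidate, w=4):
    """Number of correctly computed output bits over all 2^(w+w)
    input pairs; 2^(2w) * 2w = 2048 is a perfect score for w = 4."""
    correct = 0
    for a in range(2 ** w):
        for b in range(2 ** w):
            out = candidate(a, b)              # developed circuit's output
            for bit in range(2 * w):           # compare all 2w product bits
                correct += (out >> bit) & 1 == ((a * b) >> bit) & 1
    return correct
```

Counting individual output bits rather than whole test vectors gives the evolution a smoother gradient toward a fully working multiplier.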
6 Experimental Results and Discussion
The experiments were devoted to the design of carry-save multipliers, which exhibit better properties in comparison with the classic multipliers. In [10], external information called an environment was introduced into the development for additional control of the developmental process. Moreover, an ability of the design to adapt to different environments was observed, allowing different multiplier structures to be created. This feature was utilized in the experiments presented
herein in order to investigate the ability of the evolutionary design system to construct carry-save multipliers. The selection of the evolved programs and circuits presented in this paper is based on their generality (i.e. the ability to construct generic multipliers) and the resemblance to the carry-save multiplier structure with respect to the circuit delay and the number of building blocks the developed multipliers are composed of. In the first set of experiments a subset of the building blocks from Fig. 3 was chosen for the design of the carry-save multipliers (see Fig. 1b). Therefore, only the blocks (a, b, c, d, g, h) were involved in the design process. Considering the irregular structure of the conventional carry-save multiplier, a program consisting of four parts is to be evolved. The parts of the program are executed according to the environment env = (0, 1, 2, 2, 3), which is specified a priori with respect to the structure of the carry-save multiplier. The construction of the circuit is therefore performed as follows. Considering Fig. 1b, the first level of AND gates is created using part 0. The second level of AND gates, together with the following level of adders, is constructed by means of part 1. According to the environment, the next levels of AND gates and adders are created by means of a double application of part 2. Finally, part 3 is utilized to design the last level of adders. Two hundred independent runs of the evolutionary algorithm were conducted, of which 18% evolved a correct program for the construction of 4 × 4 multipliers. 60% of the evolved programs were classified as general, i.e. able to create arbitrarily large multipliers. Figure 4 shows (a) one of the best evolved general programs and (b) a 4 × 4 multiplier constructed by means of that program.
At the beginning of the development, the system is initialized: the variables are set to 0, the parameter is set to 4, the row and column positions are initialized to 0, and the number of rows and columns is limited to 8 – no gate may be generated
Fig. 4. (a) An evolved general program, (b) 4 × 4 multiplier exhibiting the carry-save structure created by means of this program. Note that blank rectangles represent empty blocks (not generated by any instruction) whose outputs are considered as logic 0.
behind the grid boundaries. According to the first element of the environment (0), part 0 of the evolved program is executed (see Fig. 4a). The first REP instruction initiates a loop repeating 4 times (for w = 4 – designing a 4 × 4 multiplier) the two instructions after the REP instruction. In each pass, an AND gate (code 2 in the argument of the GEN instruction at line 2) is generated with its inputs connected to the primary inputs of the circuit indexed by the values of variables v0, v1. Moreover, v0 is increased by 1 (line 3) so that the AND gates generated in different passes possess different first inputs. After executing a GEN instruction, the column position is increased by 1. After finishing part 0, the row position is increased by 1 and the column position is set to 0. According to the next element of the environment (1), part 1 is executed. Note, however, that the GEN instruction at line 2 of part 1 generates two building blocks into the actual column, the second block "under" the first one: a full adder is generated into the second row of the first column (code 5 in the first GEN argument) and the identity function is generated into the third row of the first column (code 0 in the second argument). Since building blocks have been generated into two rows, the row position is increased by 2 after finishing part 1. In the case of executing part 3, only the full adders are generated (code 5 of the GEN instructions at lines 4 and 5), as there is no space left in the grid for the second level of blocks specified by the second argument of the GEN instructions – the number of rows of the grid was limited to 8 for this experiment. It is evident that the multiplier shown in Fig. 4 could be optimized with respect to the inputs of some building blocks (e.g. adders possessing only one non-zero input could be replaced by identity functions, as demonstrated in [10]). After this optimization the circuit corresponds to the carry-save multiplier shown in Fig. 1b.
The second set of experiments was devoted to the design of multipliers using the full set of building blocks shown in Fig. 3 and the same form of environment as in the previous experiment. Therefore, this setup corresponds to both variants of the multipliers from Fig. 1. The prefix (0, 1, 2, 2) of the environment may be utilized for the evolution of the classic multiplier structure shown in Fig. 1a. Again, 200 independent experiments were conducted, from which 37% working programs were obtained, and 54% of them were classified as general. However, the experiments showed that the evolution of efficient carry-save multipliers is extremely difficult using this setup. Although all the resources of the first set of experiments were available, no valid carry-save structure was obtained. The evolution generated the carry-save components very rarely, and not at positions where they could be usefully utilized during the circuit operation. The classic structures (Fig. 1a) were evolved instead. An example of a general program together with a 4 × 4 multiplier is shown in Fig. 5, which represents the same type of classic multiplier structure as was evolved in [10]. The experiments presented in this section represent a continuation of the successful research in the field of the evolutionary design of generic multipliers using development. The phenomenon of adaptation of the developmental process to different environments during the evolution, introduced in [10], enabled
Fig. 5. (a) An evolved general program, (b) a 4 × 4 multiplier based on the structure of the classic combinational multiplier. Blank rectangles represent empty blocks with the outputs possessing logic 0.
us to design various multiplier structures. In particular, the carry-save structure, exhibiting a shorter delay than the classic multiplier, was rediscovered in this paper, which was the main goal of our new experiments. Although the carry-save multipliers proved to be very hard to evolve, the evolutionary developmental system demonstrated the ability to design this class of multipliers using a reduced set of building blocks. Moreover, simplified building blocks were introduced in this paper, together with an improved developmental model in comparison with [10]. The evolutionary process is therefore less constrained, which, however, makes the evolution more difficult because of the lower level of abstraction in the circuit representation. The success of the evolution of carry-save multipliers demonstrates the ability of the experimental system to design different circuit structures with more complex interconnections of their components, which represents a promising area for future research.
7 Conclusions
In view of the successful experiments, there is great potential for the application of this model to other classes of well-scalable circuits, e.g. adders, median and sorting networks, etc. Therefore, future research will focus on adjusting the existing system to specific circuit structures in order to investigate evolution in designs involving other building blocks and environments with respect to the construction of generic combinational circuits. Acknowledgements. This work was supported by the Grant Agency of the Czech Republic under contract No. 102/07/0850 Design and hardware implementation of a patent-invention machine, No. 102/05/H050 Integrated Approach
to Education of PhD Students in the Area of Parallel and Distributed Systems and the Research Plan No. MSM 0021630528 Security-Oriented Research in Information Technology.
References

1. Miller, J.F., Thomson, P.: Cartesian genetic programming. In: Poli, R., Banzhaf, W., Langdon, W.B., Miller, J., Nordin, P., Fogarty, T.C. (eds.) EuroGP 2000. LNCS, vol. 1802, pp. 121–132. Springer, Heidelberg (2000)
2. Miller, J.F., Job, D.: Principles in the evolutionary design of digital circuits – part I. Genetic Programming and Evolvable Machines 1(1), 8–35 (2000)
3. Miller, J.F., Job, D.: Principles in the evolutionary design of digital circuits – part II. Genetic Programming and Evolvable Machines 3(2), 259–288 (2000)
4. Vassilev, V., Job, D., Miller, J.: Towards the automatic design of more efficient digital circuits. In: Proc. of the Second NASA/DoD Workshop on Evolvable Hardware, Palo Alto, CA, pp. 151–160. IEEE Computer Society Press, Los Alamitos (2000)
5. Vassilev, V., Miller, J.F.: Scalability problems of digital circuit evolution. In: Proc. of the 2nd NASA/DoD Workshop on Evolvable Hardware, Los Alamitos, CA, US, pp. 55–64. IEEE Computer Society Press, Los Alamitos (2000)
6. Murakawa, M., Yoshizawa, S., Kajitani, I., Furuya, T., Iwata, M., Higuchi, T.: Hardware evolution at function level. In: Ebeling, W., Rechenberg, I., Voigt, H.-M., Schwefel, H.-P. (eds.) PPSN IV 1996. LNCS, vol. 1141, pp. 206–217. Springer, Heidelberg (1996)
7. Torresen, J.: Evolving multiplier circuits by training set and training vector partitioning. In: Tyrrell, A.M., Haddow, P.C., Torresen, J. (eds.) ICES 2003. LNCS, vol. 2606, pp. 228–237. Springer, Heidelberg (2003)
8. Stomeo, E., Kalganova, T., Lambert, C.: Generalized disjunction decomposition for evolvable hardware. IEEE Transactions on Systems, Man and Cybernetics – Part B 36, 1024–1043 (2006)
9. Aoki, T., Homma, N., Higuchi, T.: Evolutionary synthesis of arithmetic circuit structures. Artificial Intelligence Review 20, 199–232 (2003)
10. Bidlo, M.: Evolutionary development of generic multipliers: Initial results. In: Proc.
of the 2nd NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2007. IEEE Computer Society Press, Los Alamitos (2007)
11. Wakerly, J.F.: Digital Design: Principles and Practice. Prentice Hall, New Jersey, US (2001)
12. Kumar, S., Bentley, P.J. (eds.): On Growth, Form and Computers. Elsevier Academic Press, Amsterdam (2003)
Automatic Synthesis of Practical Passive Filters Using Clonal Selection Principle-Based Gene Expression Programming

Zhaohui Gan¹, Zhenkun Yang¹, Gaobin Li¹, and Min Jiang²

¹ College of Information Science and Engineering, Wuhan University of Science & Technology, 430081 Wuhan, China
[email protected], [email protected]
² College of Computer Science, Wuhan University of Science & Technology, 430081 Wuhan, China
[email protected]
Abstract. This paper proposes a new method to synthesize practical passive filters using Clonal Selection principle-based Gene Expression Programming and a binary tree representation. The circuit encoding of this method is simple and efficient. Using this method, both the circuit topology and the component parameters can be evolved simultaneously. Discrete component values are used in the algorithm for practical implementation. Two kinds of filters were synthesized to verify the effectiveness of our method; the experimental results show that this approach can generate passive RLC filters quickly and effectively.
1 Introduction Automatic design of electronic circuit is the dream of electronic engineers. Many scholars have done a lot of research on this direction. Until now, automatic design of digital circuit has made great progress. However, analog circuit synthesis, including topology and sizing, is a complex problem. Most analog circuits were designed by skilled engineer who uses conventional methods based on domain-specific knowledge. However, recent significant development of computer technology and circuit theory made it possible for us to take some approaches to automatically synthesize analog circuits by computers. Analog circuit synthesis involves both the sizing (component value) and topology (circuit structure) optimization. Recently, remarkable progress has been made in analog circuit synthesis. Horrocks successfully applied Genetic Algorithms (GA) into component value optimization for passive and active filter using preferred component values [1, 2]. Parallel Tabu Search Algorithm (PTS) was also applied into the same problem by Kalinli [3]. GA was also applied to select circuit topology and component values for analog circuit by Grimbleby [4]. Koza and his collaborators have done lots of research on automatic analog circuit synthesis by means of Genetic Programming (GP) [5, 6], it maybe the most notable progress in this field. Based on GP, they developed circuit-construction program trees which have four kinds of circuit-construction L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 89–99, 2007. © Springer-Verlag Berlin Heidelberg 2007
90
Z. Gan et al.
functions. The behavior of each circuit was evaluated by SPICE (Simulation Program with Integrated Circuit Emphasis). A main drawback of this technique is that it is very complex to implement and requires a large computing time. Some topology-restricted approaches were proposed to reduce the computing time. Lohn and Colombano developed a compact circuit topology representation that decreases the complexity of evolutionary analog circuit synthesis [7]. Such representation methods exclude some potential topologies and decrease the running time; using a parallel genetic algorithm, they allow circuit size, circuit topology, and component values to be evolved [8]. In [9, 10], a novel tree representation method was proposed to synthesize basic RLC filter circuits; this representation is restricted to series-parallel topologies. Compared with general-topology methods, this approach is more efficient for passive circuit synthesis. Gene Expression Programming (GEP), a new evolutionary-algorithm technique for the automatic creation of computer programs, was first invented by Candida Ferreira in 1999 [11]. The main difference among GA, GP and GEP lies in the form of individual encoding: individuals in GA are linear strings of fixed length, GP individuals are trees of different shapes and sizes, and GEP individuals are also trees of different shapes and sizes but are encoded as linear strings of fixed length using Karva notation [12]. GEP combines the advantages of both GA and GP while overcoming some of their shortcomings. The Clonal Selection Algorithm (CSA), inspired by natural immunological mechanisms, has been successfully applied to several challenging domains, such as multimodal optimization, data mining and pattern recognition [13, 14]. CSA can enhance the diversity of the population and has a fast convergence speed.
Clonal Selection principle-based Gene Expression Programming (CS-GEP) [15] was proposed by us as an evolutionary algorithm that combines the advantages of the Clonal Selection Algorithm and GEP. In this paper, based on the binary tree representation method, CS-GEP is successfully applied to practical passive filter synthesis. This method allows both the circuit topology and the sizing to be evolved simultaneously. Furthermore, taking practical conditions into account, discrete component values are used in the passive filter synthesis, which makes engineering implementation convenient. Two kinds of passive filter design tasks are carried out to evaluate the proposed approach; the experimental results demonstrate that our method can synthesize passive filters effectively. This paper is organized as follows. Section 2 gives an overview of related work, including the circuit representation method and an overview of GEP. Section 3 explains the circuit encoding method in detail and applies the CS-GEP algorithm to practical passive filter synthesis. The experiments on a low-pass and a high-pass filter design are covered in Section 4. Conclusions and future work are given in Section 5.
2 Related Work and Motivations
2.1 An Overview of Gene Expression Programming
Gene Expression Programming is a new evolutionary algorithm for the automatic creation of computer programs. Similar to GP, GEP also uses a randomly generated
Automatic Synthesis of Practical Passive Filters
91
population and applies genetic operators to this population until the algorithm finds an individual that satisfies some termination criterion. The main difference between GP and GEP is that GEP individuals are encoded as linear strings of fixed length using Karva notation; the resulting chromosome is simple, linear, and compact. In GEP, each gene is composed of a fixed-length string of symbols consisting of a head and a tail. The head contains symbols representing both functions and terminals, whereas the tail contains only terminals. For each problem, the length of the head h is predetermined by the user, whereas the length of the tail t is given by:
t = h × (n − 1) + 1    (1)
where n is the largest arity of the functions in the function set. For example, take the function set F = {E, Q, S, +, −, *, /} and the terminal set T = {x, y}, and suppose the algebraic expression is:

sin(xy) + y + e^x    (2)

If we define h = 10, then n = 2, so t = 11. The gene is shown in Fig. 1:
Head (positions 0–9): Q + S + * y E x y x   Tail (positions 10–20): y x x x y y x y x y x
Fig. 1. The gene of GEP for equation (2)

[Figure: (a) the expression tree of Eq. (2), with Q at the root; (b) a four-gene chromosome whose sub-trees (Gene 1 to Gene 4) are linked by + nodes]
Fig. 2. The expression tree of a gene and of a chromosome
Here Q represents the square-root function, E the exponential function, and S the sine function. The gene can be represented by the expression tree shown in Fig. 2(a). GEP chromosomes are usually composed of several genes of equal length; the interaction between the genes is specified by a linking function. An example of a four-gene chromosome linked by addition is shown in Fig. 2(b).
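The breadth-first decoding of a Karva gene described above can be sketched in Python (the helper names are ours; the paper gives no code). The gene of Fig. 1 is read level by level, and only the expressed part of the string is used; the decoded tree has Q at the root, as in Fig. 2(a).

```python
import math

# Arity of each function symbol; the terminals x and y have arity 0.
ARITY = {'Q': 1, 'E': 1, 'S': 1, '+': 2, '-': 2, '*': 2, '/': 2}
OPS = {'Q': math.sqrt, 'E': math.exp, 'S': math.sin,
       '+': lambda a, b: a + b, '-': lambda a, b: a - b,
       '*': lambda a, b: a * b, '/': lambda a, b: a / b}

def tail_length(h, n):
    """Eq. (1): t = h * (n - 1) + 1."""
    return h * (n - 1) + 1

def karva_tree(gene):
    """Build the expression tree by reading the gene breadth-first
    (Karva notation); symbols beyond the expressed region stay unused."""
    root = {'s': gene[0], 'c': []}
    queue, i = [root], 1
    while queue:
        node = queue.pop(0)
        for _ in range(ARITY.get(node['s'], 0)):
            child = {'s': gene[i], 'c': []}
            i += 1
            node['c'].append(child)
            queue.append(child)
    return root

def evaluate(node, env):
    if node['s'] in env:                      # terminal symbol
        return env[node['s']]
    args = [evaluate(c, env) for c in node['c']]
    return OPS[node['s']](*args)

# The gene of Fig. 1: head Q + S + * y E x y x, tail y x x x y y x y x y x.
gene = "Q+S+*yExyxyxxxyyxyxyx"
tree = karva_tree(gene)
# The expressed head decodes to sqrt(sin(x*y) + y + exp(x)), the tree of Fig. 2(a).
value = evaluate(tree, {'x': 0.0, 'y': 1.0})  # sqrt(0 + 1 + 1) = sqrt(2)
```

Note that the decoding stops once every expressed node has its children; with h = 10 only the head is expressed here, and the eleven tail symbols act as a reservoir of terminals for other genes.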
[Figure: (a) schematic: VS with RS 1 kΩ driving L1 22 µH, C1 0.1 µF, L2 0.33 µH, C2 82 µF, L3 100 µH, L4 47 µH, C3 0.22 µF, L5 68 µH and the load RL 1 kΩ, with Vout taken across RL; (b) the corresponding binary tree of + and // nodes]
Fig. 3. An example of an RLC circuit and its tree representation
2.2 Circuit Representation and Analysis
A binary tree can be used to represent the structure of series-parallel RLC circuits [9]; an example of an RLC circuit and its corresponding binary tree representation is shown in Fig. 3. There are two kinds of nodes in the binary tree: connection nodes and component nodes. Two types of connection nodes represent series and parallel connections, denoted by + and //, respectively. The component nodes consist of the three types of passive electrical components: R (resistor), L (inductor) and C (capacitor). Compared to other evolutionary approaches that rely on SPICE simulation, the circuit analysis algorithm for this representation is very simple. First, the impedance of each node is calculated from the leaves to the root. Second, starting from the root node, the current flows down through the tree, and the current at each node is obtained from circuit theory [16]: if a node is a series node, the current flowing out equals the current flowing in; if a node is a parallel node, the incoming current is divided between its children inversely proportional to their impedances. Finally, the voltage across RL is calculated by multiplying its current by its impedance, and the voltage gain is obtained by dividing the voltage across RL by the source voltage.
2.3 Motivation from Binary Tree Representation and GEP
The binary tree representation of RLC circuits is compact and efficient for synthesizing practical passive filters. GEP retains the benefits of GA and GP, and the GEP chromosome is simple and efficient due to its linear encoding. These advantages motivated us to develop an algorithm that combines the binary tree representation method and GEP to synthesize practical passive filters.
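The two-pass analysis described in Sect. 2.2 (impedance bottom-up, then current top-down) can be sketched as follows. Nodes are lists such as `['R', 1000.0]` or `['+', left, right]`; the function and variable names are ours, not the paper's.

```python
import math

def impedance(node, w):
    """Impedance of a tree node at angular frequency w (bottom-up pass)."""
    kind = node[0]
    if kind == 'R':
        return complex(node[1], 0.0)
    if kind == 'L':
        return complex(0.0, w * node[1])
    if kind == 'C':
        return complex(0.0, -1.0 / (w * node[1]))
    z1, z2 = impedance(node[1], w), impedance(node[2], w)
    return z1 + z2 if kind == '+' else z1 * z2 / (z1 + z2)

def current_through(node, i, w, load):
    """Top-down pass: propagate current i and return the current in `load`."""
    if node is load:
        return i
    kind = node[0]
    if kind in ('R', 'L', 'C'):
        return None
    z1, z2 = impedance(node[1], w), impedance(node[2], w)
    if kind == '+':                    # series: same current in both children
        i1, i2 = i, i
    else:                              # parallel: current divider rule
        i1, i2 = i * z2 / (z1 + z2), i * z1 / (z1 + z2)
    found = current_through(node[1], i1, w, load)
    return found if found is not None else current_through(node[2], i2, w, load)

def voltage_gain(tree, load, w, vs=1.0):
    """Gain = V(load) / Vs, as in Sect. 2.2."""
    i_total = vs / impedance(tree, w)
    i_load = current_through(tree, i_total, w, load)
    return abs(i_load * impedance(load, w)) / vs

# Trivial check: RS = RL = 1 kOhm in series forms a divider with gain 0.5.
rl = ['R', 1000.0]
tree = ['+', ['R', 1000.0], rl]
gain = voltage_gain(tree, rl, w=2 * math.pi * 1000.0)
```

In a real synthesis run, `voltage_gain` would be evaluated at each frequency sampling point to build the frequency response compared against the target.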
3 Synthesizing RLC Filters Using CS-GEP
3.1 Circuit Encoding Method
The template of the RLC passive filter is shown in Fig. 4, where VS is the input signal, Vout is the output signal, and RS and RL are both 1 kΩ. Fig. 4(a) is the schematic of the circuit, in which the part enclosed by the dashed line is evolved by our algorithm; Fig. 4(b) is the corresponding tree representation.

[Figure: (a) schematic: VS in series with RS = 1 kΩ, the evolved RLC circuit, and the load RL = 1 kΩ across which Vout is taken; (b) the tree template containing RS, the evolved RLC subtree, and RL]
Fig. 4. The template of the evolved RLC circuit

The most difficult part of analog circuit synthesis is how to encode the circuit (both topology and sizing); the main issue is to find a rule that describes an analog circuit correctly. In this paper, the binary tree representation method combined with GEP is employed to encode series-parallel RLC circuits. The circuit of Fig. 3 is encoded in Fig. 5. Fig. 5(a) shows that a gene consists of three parts: head, tail and value domain. The topology of a circuit is represented by the head and tail, and the sizing of the circuit is determined by the value domain. The structure of the circuit is encoded in Fig. 5(b); only two kinds of functions, namely + and //, exist in the GEP encoding of a circuit. The component values of the circuit are listed in Fig. 5(c).

[Figure: (a) gene structure: head and tail (topology) followed by the value domain (sizing); (b) head: + RS + L1 // C1 + L2 + // // L3 C2 + + L4 C3 L5 RL, tail: R C L L R C R L C R R R L C C L R L L R; (c) value domain: 5 11 18 14 2 9 63 25 34 7 4 54 12 36 50 48 2 19 12 58]
Fig. 5. The gene of the circuit in Fig. 3

Manufactured components typically come in preferred values classified as the E12 series, with twelve values per decade: 10, 12, 15, 18, 22, 27, 33, 39, 47, 56, 68, 82, 100, and so on. Each component value ranges over five decades, so sixty values (the Component Value Set) are available for each component. The numbers in the value domain denote indices into the Component Value Set.
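The Component Value Set described above can be sketched directly. The starting decade and unit (µH, µF, and so on) are per component and are an assumption of this sketch, as is taking value-domain indices modulo the set size (the paper does not spell out its indexing convention):

```python
# The twelve E12 preferred values per decade, as listed in the text.
E12 = [10, 12, 15, 18, 22, 27, 33, 39, 47, 56, 68, 82]

def component_value_set(decades=5):
    """The 60-entry Component Value Set: E12 values over five decades."""
    return [m * 10 ** d for d in range(decades) for m in E12]

CVS = component_value_set()

def decode(index):
    """Map a value-domain index, as in Fig. 5(c), to a preferred value."""
    return CVS[index % len(CVS)]
```

Restricting sizing-mutation to indices into this set is what guarantees that every evolved circuit uses only off-the-shelf component values.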
Each individual of CS-GEP consists of one gene, encoded as in Fig. 5, which creates a valid circuit. CS-GEP evolves a population of such individuals until it finds a circuit that satisfies the predefined specification.
3.2 The Framework of CS-GEP
The Artificial Immune System (AIS) is a kind of evolutionary algorithm inspired by the natural immune system to solve real-world problems. The Clonal Selection Principle (CSP) establishes the idea that only those cells that recognize the antigens are selected to proliferate. When an antibody recognizes an antigen with a certain affinity, it is selected to proliferate asexually. During asexual reproduction, the B-cell clones undergo a hypermutation process; together with a high selection pressure, their affinities are improved. At the end of this process, the B-cells with the highest antigenic affinity are selected to become memory cells with long life spans. Based on the CSP, the Clonal Selection Algorithm (CLONALG) was proposed by de Castro and Von Zuben to solve complex problems [14]. Combining the advantages of CSA and GEP, Clonal Selection principle-based Gene Expression Programming (CS-GEP) is proposed as a new evolutionary algorithm, briefly described as follows:
Step 1: Initialization. An initial population P of N individuals is generated randomly.
Step 2: Evaluation. Evaluate the fitness of each individual. The fitness function is based on the mean squared error (MSE) of the individual, expressed as:

E = (1/n) · Σᵢ₌₁ⁿ Wᵢ (Aᵢ − Tᵢ)²    (3)
where n is the number of frequency sampling points and i is the index of a sampling point; Wᵢ is the weight of sampling point i, and Aᵢ and Tᵢ are the actual and target frequency responses of the circuit at sampling point i. The fitness of the individual is then:

fitness = 1000 · 1/(1 + E)    (4)
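Equations (3) and (4) together give a fitness that is 1000 for a perfect match and decays toward 0 as the weighted error grows. A minimal sketch (function names are ours):

```python
def mse(actual, target, weights):
    """Weighted mean squared error over the n frequency samples, Eq. (3)."""
    n = len(actual)
    return sum(w * (a - t) ** 2
               for w, a, t in zip(weights, actual, target)) / n

def fitness(actual, target, weights):
    """Eq. (4): fitness = 1000 / (1 + E)."""
    return 1000.0 / (1.0 + mse(actual, target, weights))
```

The weights Wᵢ let the designer emphasize, for example, the band edges over the middle of the pass band.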
Step 3: Selection. Select the n (n < N) individuals with the highest fitness from P and clone them to form a population PC; the number of clones for each selected individual is

Nc = round(β · N / i)    (5)
where Nc is the number of clones generated for each individual, β is a multiplying factor, N is the total number of individuals, i is the index of the current individual in the fitness-ranked population, and round() rounds its argument toward the closest integer.
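With Eq. (5), higher-ranked individuals receive more clones. A one-line sketch, evaluated with the parameter values later listed in Table 1 (β = 2, N = 50, n = 10):

```python
def clone_counts(n, N, beta):
    """Eq. (5): Nc = round(beta * N / i) for each selected individual,
    with i the 1-based rank in the selected set."""
    return [round(beta * N / i) for i in range(1, n + 1)]

counts = clone_counts(n=10, N=50, beta=2)
# The best-ranked individual gets 100 clones, the second 50, and so on.
```

Note that Python's `round` uses banker's rounding at exact .5 ties, which may differ by one clone from other round-half-up conventions; the paper does not specify the tie-breaking rule.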
Automatic Synthesis of Practical Passive Filters
95
Step 4: Mutation. The population PC is submitted to a mutation process with two types of mutation: topology-mutation and sizing-mutation. In topology-mutation, any node in the head of a gene can be mutated to a connection node (function) or a component node (terminal), whereas a component node in the tail can only be mutated to another component node. In sizing-mutation, the component values in the value domain can be changed to other values from the Component Value Set. After mutation, a population PM is generated.
Step 5: Re-selection. Re-select the m individuals with the highest fitness from PM to compose the memory set; members of population P with lower fitness are replaced by them.
Step 6: Replacement. Replace the d lowest-fitness individuals with new random ones to maintain population diversity.
Steps 2 to 6 are repeated until a maximal generation is reached or a satisfactory circuit is found.
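The two mutation types of Step 4 can be sketched as follows. We read the "component node can only be mutated to component node" constraint as applying to the tail; all names and the list-based gene format are ours:

```python
import random

FUNCS = ['+', '//']          # connection nodes (series, parallel)
TERMS = ['R', 'L', 'C']      # component nodes
CVS_SIZE = 60                # size of the Component Value Set

def mutate(head, tail, values, rate):
    """Topology-mutation on head and tail, sizing-mutation on the value domain."""
    # head symbols may become any function or terminal
    head = [random.choice(FUNCS + TERMS) if random.random() < rate else s
            for s in head]
    # tail symbols are component nodes and may only mutate to component nodes
    tail = [random.choice(TERMS) if random.random() < rate else s
            for s in tail]
    # sizing-mutation: pick another index into the Component Value Set
    values = [random.randrange(CVS_SIZE) if random.random() < rate else v
              for v in values]
    return head, tail, values
```

Because the tail never receives a function symbol, every mutated gene still decodes to a valid series-parallel circuit tree, which is the point of the GEP head/tail split.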
4 Experiment Results
To verify the effectiveness of our approach, a low-pass and a high-pass filter were synthesized using the proposed method; all experiments were run on a

Table 1. The parameters of CS-GEP for the two experiments

Maximum number of generations: 2000
Population size N: 50
Head length h: 12
Mutation rate: 0.2
Number of individuals n for cloning: 10
Clone factor β: 2
Number of individuals m being replaced: 40
Number of new individuals d: 10
[Figure: (a) schematic: VS with RS 1 kΩ driving L1 220 mH, L2 270 mH, L3 180 mH, C1 0.18 µF, C2 3.3 nF, C3 18 nF, C4 0.18 µF and the load RL 1 kΩ; (b) the corresponding binary tree of + and // nodes]
Fig. 6. The evolved low-pass filter. (a) The circuit schematic generated by our method. (b) The binary tree representation of the generated circuit.
[Figure: gain (dB) versus frequency (Hz) from 1 Hz to 100 kHz; the marked point is X: 2000, Y: −59.79. The detail view of the pass band spans −5 dB to −10 dB up to 10 kHz.]
Fig. 7. (a) Frequency response of the low-pass filter shown in Fig. 6. (b) The detail section of the pass band.
personal computer with the following configuration. CPU: Intel Celeron 2.66 GHz; memory: 512 MB; operating system: MS Windows XP Professional SP2; compiler: Visual C++ 6.0. Table 1 lists the parameters of CS-GEP used in the two experiments.
4.1 Experiment A: Low-Pass Filter Synthesis
The specification of the low-pass filter is given below:
Pass-band edge: 1.0 kHz
Stop-band edge: 2.0 kHz
Maximum pass-band gain: −6 dB
Minimum pass-band gain: −7 dB
Maximum stop-band gain: −52 dB
Fig. 6 shows one of the circuit schematics generated by our approach and its corresponding binary tree representation. The frequency response of the circuit is shown in Fig. 7. We can clearly see that the gain is −59.79 dB at the stop-band edge (2 kHz), which meets the predefined specification. The ripple in the pass band, which lies between −6 dB and −7 dB, also meets the specification.
4.2 Experiment B: High-Pass Filter Synthesis
The specification of the high-pass filter is given below:
Stop-band edge: 1.0 kHz
Pass-band edge: 2.0 kHz
Maximum pass-band gain: −6 dB
Minimum pass-band gain: −7 dB
Maximum stop-band gain: −52 dB
One of the resulting circuit schematics and its corresponding binary tree representation are shown in Fig. 8(a) and Fig. 8(b), respectively. The frequency response of the circuit is shown in Fig. 9. The gain is −53.07 dB at the stop-band edge (1 kHz), which is below the specified −52 dB. The pass-band gain, which stays very close to −6 dB, also satisfies the predefined specification.

[Figure: (a) schematic: VS with RS 1 kΩ, C1 56 nF, C2 47 nF, C3 0.47 µF, C4 0.1 µF, L1 56 mH, L2 68 mH and the load RL 1 kΩ; (b) the corresponding binary tree of + and // nodes]
Fig. 8. The evolved high-pass filter. (a) The circuit schematic generated by our method. (b) The corresponding binary tree representation of the circuit.
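The pass-band/stop-band checks applied in both experiments can be sketched as a single predicate over sampled response points. The `(frequency_Hz, gain_dB)` sample format and the function name are our assumptions:

```python
def meets_spec(response, pass_band, stop_band, pass_range_db, stop_max_db):
    """Check sampled (frequency_Hz, gain_dB) pairs against a band specification."""
    lo, hi = pass_range_db
    for f, g in response:
        in_pass = pass_band[0] <= f <= pass_band[1]
        in_stop = stop_band[0] <= f <= stop_band[1]
        if in_pass and not (lo <= g <= hi):
            return False          # pass-band ripple out of [-7, -6] dB
        if in_stop and g > stop_max_db:
            return False          # insufficient stop-band attenuation
    return True

# The low-pass specification of Sect. 4.1, with hypothetical sampled points
# consistent with the reported response of the evolved circuit.
lowpass_ok = meets_spec(
    [(100.0, -6.5), (1000.0, -6.9), (2000.0, -59.79), (100000.0, -120.0)],
    pass_band=(0.0, 1000.0), stop_band=(2000.0, 1e5),
    pass_range_db=(-7.0, -6.0), stop_max_db=-52.0)
```

In practice this predicate is implicit in the weighted fitness of Eq. (3): sampling points near the band edges carry the largest weights.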
It should be mentioned that, for every independent run, the average time required to generate a suitable circuit is below thirty minutes. Compared with other methods that take hours of computing time, our method is more efficient. This efficiency can be attributed to the use of discrete component values, the simple encoding method, and the powerful CS-GEP.
[Figure: gain (dB) versus frequency (Hz) from 1 Hz to 100 kHz; the marked point is X: 1000, Y: −53.07. The detail view of the pass band from 1 kHz to 100 kHz marks X: 2000, Y: −6.16.]
Fig. 9. (a) Frequency response of the high-pass filter shown in Fig. 8. (b) The detail section of the pass band.
5 Conclusion
A novel method based on binary tree representation and CS-GEP has been applied to RLC passive filter design. Both the circuit structure and the component values can be generated automatically by this method. Experimental results show that our method can generate RLC passive filters efficiently, and the use of discrete component values makes it convenient for practical implementation. The success of the algorithm owes to the suitable encoding method and the efficient CS-GEP. In future work, we will take more practical conditions into account for engineering applications and extend these strategies to the synthesis of other types of circuits.
References
1. Horrocks, D.H., Spittle, M.C.: Component Value Selection for Active Filters Using Genetic Algorithms. In: Proc. IEE/IEEE Workshop on Natural Algorithms in Signal Processing, Chelmsford, UK, vol. 1, pp. 13/1–13/6. IEEE Computer Society Press, Los Alamitos (1993)
2. Horrocks, D.H., Khalifa, Y.M.A.: Genetic Algorithm Design of Electronic Analogue Circuits Including Parasitic Effects. In: Proc. First On-line Workshop on Soft Computing (WSC1), pp. 71–78. Nagoya University, Japan (1996)
3. Kalinli, A.: Component Value Selection for Active Filters Using Parallel Tabu Search Algorithm. AEU International Journal of Electronics and Communications 60, 85–92 (2006)
4. Grimbleby, J.B.: Automatic Analogue Network Synthesis Using Genetic Algorithms. In: Proc. 1st Int. Conf. Genetic Algorithms in Engineering Systems: Innovations and Applications, pp. 53–58 (1995)
5. Koza, J.R., Bennett III, F.H., Andre, D., Keane, M.A., Dunlap, F.: Automated Synthesis of Analog Electrical Circuits by Means of Genetic Programming. IEEE Trans. on Evolutionary Computation 1(2), 109–128 (1997)
6. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA (1992)
7. Lohn, J.D., Colombano, S.P.: A Circuit Representation Technique for Automated Circuit Design. IEEE Trans. on Evolutionary Computation 3(3), 205–219 (1999)
8. Lohn, J.D., Colombano, S.P.: Automated Analog Circuit Synthesis Using a Linear Representation. In: Sipper, M., Mange, D., Pérez-Uribe, A. (eds.) ICES 1998. LNCS, vol. 1478, pp. 125–133. Springer, Heidelberg (1998)
9. Chang, S.J., Hou, H.S., Su, Y.K.: Automated Passive Filter Synthesis Using a Novel Tree Representation and Genetic Programming. IEEE Trans. on Evolutionary Computation 10(1), 93–100 (2006)
10. Hou, H.S., Chang, S.J., Su, Y.K.: Practical Passive Filter Synthesis Using Genetic Programming. IEICE Trans. Electron. E88-C(6), 1180–1185 (2005)
11. Ferreira, C.: Gene Expression Programming: A New Adaptive Algorithm for Solving Problems. Complex Systems 13(2), 87–129 (2001)
12. Ferreira, C.: Gene Expression Programming: Mathematical Modeling by an Artificial Intelligence. Springer, Heidelberg (2006)
13. de Castro, L.N., Von Zuben, F.J.: Learning and Optimization Using the Clonal Selection Principle. IEEE Trans. on Evolutionary Computation, Special Issue on Artificial Immune Systems 6(3), 239–251 (2002)
14. de Castro, L.N., Von Zuben, F.J.: The Clonal Selection Algorithm with Engineering Applications. In: GECCO 2000, Workshop on Artificial Immune Systems and Their Applications, pp. 36–37 (2000)
15. Gan, Z.H., Yang, Z.K., Li, G.B., Jiang, M.: Automatic Modeling of Complex Functions with Clonal Selection-based Gene Expression Programming. In: 3rd International Conference on Natural Computation (ICNC 2007) (submitted, 2007)
16. Sedra, A.S., Brackett, P.O.: Filter Theory and Design: Active and Passive. Matrix Publishers (1978)
Research on Fault-Tolerance of Analog Circuits Based on Evolvable Hardware Qingjian Ji, Youren Wang, Min Xie, and Jiang Cui College of Automation and Engineering, Nanjing University of Aeronautics and Astronautics, 210016 Nanjing, China
[email protected]
Abstract. For electronic devices, especially those used in extreme environments, it is very important to ensure high-reliability and long-lifetime operation, so it is significant to develop fault-tolerance mechanisms based on evolutionary algorithms. Building on EHW (evolvable hardware), this paper presents a new FPACA (field programmable analog cell array), an evolution-oriented reconfigurable architecture that can implement the evolution of both analog and digital functions. Adopting a single-chromosome evolutionary algorithm, we establish an evolutionary reconfiguration mechanism to study the fault tolerance of evolutionary analog circuits, such as amplifiers, filters or DACs (digital-to-analog converters). Compared to the FPTA, the FPACA has the advantages of low hardware cost and convenience of software analysis and simulation. By implementing a typical amplifier circuit, we illustrate the fault tolerance of FPACA analog circuits, and the experimental results show the correctness and feasibility of the FPACA. Keywords: EHW, FPACA, Fault-tolerance, Evolutionary algorithm, Amplifier circuit.
1 Introduction
EHW is based on the idea of combining reconfigurable hardware devices with evolutionary algorithms that execute reconfiguration autonomously [1]. It offers the characteristics of self-organization, self-adaptation and self-recovery, so it has wide application prospects in aeronautics and astronautics, industrial automation, intelligent sensors, robot control and so on. At present, research on EHW, which includes analog and digital models, is a hot topic; however, most research focuses on digital EHW. With the rapid development of mixed-signal integrated circuits and reconfigurable technology, it is necessary and significant to research analog EHW [2]. In the field of fault-tolerance techniques, traditional methods rely on additional redundant components and thorough testing, either at the point of manufacture or on-line, and add considerable cost and design complexity. So it is necessary to develop new mechanisms and techniques suited to integrated electronic systems. Fortunately, the characteristics of EHW provide an ideal alternative to traditional redundancy design, for several reasons. Firstly, EHW can
L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 100–108, 2007. © Springer-Verlag Berlin Heidelberg 2007
preserve existing circuit functionalities under conditions where the hardware is subject to faults, aging, temperature drifts and high-energy radiation damage. Secondly, new functions can be generated when needed (more precisely, new hardware configurations can be synthesized to provide the required functionality) [3]. Moreover, it is important to reduce the weight and volume of redundancy in integrated electronic systems, which can be achieved with EHW. In a word, to operate under complex and unpredictable environments, the abilities of self-adaptation, reconfiguration and self-recovery are necessary; that is to say, research on the fault tolerance of EHW is of great significance. The Jet Propulsion Laboratory (JPL) performs research in fault-tolerant, long-life, and space-survivable electronics for the National Aeronautics and Space Administration (NASA). FPTA chips were designed at JPL and particularly targeted for EHW experiments; their outstanding characteristic is a largely "sea of transistors" architecture, which makes the hardware reconfigurable at the transistor level [5]. Based on FPTA chips, JPL has conducted experiments illustrating evolutionary hardware recovery from degradation due to extreme temperatures and radiation environments. Measurement results demonstrate that the original functions of some evolved circuits, such as half-wave rectifiers and low-pass filters, could be recovered by reusing the evolutionary algorithm that altered the circuit topologies [3], [4]. It has also been verified that the function of a 4-bit DAC could be recovered from scratch after a fault was detected [6]. In this paper, based on the FPTA [7], we propose a new FPACA model. Compared to the FPTA, which has additional capacitors and programmable resistors, the FPACA has the advantages of low hardware cost and convenience of analysis and simulation.
The architecture of the FPACA is an evolution-oriented reconfigurable architecture that can implement the evolution of both analog and digital functions at the transistor level. Adopting a single-chromosome evolutionary algorithm, we establish an evolutionary reconfiguration mechanism to study the fault tolerance of FPACA analog circuits, such as amplifiers, filters or DACs. By implementing a typical amplifier, we illustrate the fault tolerance of FPACA analog circuits, and the experimental results show the correctness and feasibility of the FPACA. This paper is organized as follows: Section 2 presents the architecture and properties of the FPACA. Section 3 describes the evolutionary design of FPACA analog circuits. Section 4 illustrates fault-tolerance experiments through the evolution of a typical amplifier circuit. Concluding remarks are given in Section 5.
2 Architecture of FPACA
Normally, the evolutionary search for a circuit solution is performed either by software simulation, as in extrinsic evolution, or directly in hardware on reconfigurable chips. In extrinsic evolution, accurate models are important for evolving circuit topology solutions. Fig. 1 (left) shows the FPACA model used to implement extrinsic evolution. The model consists of an N×M array of reconfigurable cells (N and M are the numbers of columns and rows). The cells are connected through different programmable switches. The numbers of external inputs, external outputs and cells depend on the nature and complexity of the objective circuits. These properties have the merits of flexibility and
expansibility and are especially suitable for extrinsic evolution. Fig. 1 (right) provides a detailed view of a FPACA cell. As shown, the cell can be programmed at the transistor level: it contains 14 transistors connected through 44 programmable switches and can implement different building blocks for analog signal processing. It also includes three capacitors, Cm1, Cm2 and Cm3, of 100 fF, 100 fF and 5 pF, respectively. This architecture differs from the FPTA [7], which has additional capacitors and programmable resistors, so the FPACA needs fewer hardware resources and allows a simpler software design.

[Figure: left, an N×M array of reconfigurable cells; right, the cell schematic between VDD and GND, with transistors M1–M14 interconnected by the numbered programmable switches 1–44 and the capacitors Cm1, Cm2 and Cm3]
Fig. 1. Architecture of the FPACA model (left) and schematic of a FPACA cell (right)
3 Evolutionary Design of FPACA
EHW is reconfigurable hardware whose configuration is under the control of an evolutionary algorithm. The search for an electronic circuit realizing a desired transfer characteristic can be made in software, as in extrinsic evolution, or in hardware, as in intrinsic evolution. In extrinsic evolution by software simulation, the final solution is downloaded to (or becomes a blueprint for) the hardware, while in intrinsic evolution the hardware actively participates in the circuit evolutionary process [10].
[Figure: the VB platform (PSpice simulation control, FPACA modeling control, evolutionary operation) exchanges the FPACA circuit model and the circuit responses with the PSpice platform]
Fig. 2. Evolutionary reconfiguration mechanism of FPACA
In this work, the FPACA cell circuit model is established on the PSpice platform, and the reconfiguration of the FPACA is controlled by an evolutionary algorithm implemented on the Visual Basic 6.0 (VB) platform. Fig. 2 shows the evolutionary reconfiguration mechanism of the FPACA. The main aspects of the evolutionary design are as follows.
3.1 HereBoy: A Fast Evolutionary Algorithm
The evolutionary algorithm determines how the hardware structure should be reconfigured when a new hardware structure is needed for improved performance. In this work, we adopted HereBoy [8] as the evolutionary algorithm. HereBoy combines Genetic Algorithms (GA) with Simulated Annealing and is more efficient than either of them.
3.2 Coding of the FPACA
As shown in Fig. 1 (right), the FPACA cell is an array of transistors interconnected by programmable switches. The status of the switches (OFF or ON) determines a circuit topology, which in turn determines a specific response. Thus, the circuit topology can be described as a function of the switch status, which can be represented by a binary sequence [9]. Consequently, the individual chromosome of the evolutionary algorithm is easily coded as a binary string. For example, in Fig. 1 (right), the 44 programmable switches can be coded as a 44-bit string such as "10011…", where by convention '1' denotes a switch turned ON and '0' a switch turned OFF.
3.3 Fitness Function
The fitness function used to evaluate the chromosomes determines both the operations carried out by the evolutionary algorithm and the search direction, so it is very important to design an appropriate fitness function. The fitness is generally derived from the error between the actual output and the ideal output, as shown in formula (1):
Fitness = 1 / [ (1/(N−1)) · Σᵢ₌₁ᴺ (u′o(i) − uo(i))² ]    (1)
In formula (1), u′o(i) is the actual output voltage of the current circuit topology at time i, and uo(i) is the ideal output voltage at time i; N is the number of samples. The more similar the actual response is to the objective response, the bigger the fitness. Based on the description above, the main steps of the evolutionary process are illustrated in Fig. 3. The process is repeated for many generations to produce better circuit topologies, and usually ends when a given number of generations is reached or a response sufficiently close to the target has been found.
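The loop of Fig. 3 together with formula (1) can be sketched as a single-chromosome search in the spirit of HereBoy; the annealing-style acceptance of occasionally worse candidates is omitted here, and all names are ours. In a real run, `evaluate` would invoke the PSpice simulation of the decoded circuit; for a quick self-check one can substitute any toy fitness.

```python
import random

def fitness_eq1(actual, ideal):
    """Formula (1): reciprocal of the mean squared error between the
    actual and the ideal output samples (N - 1 divisor as in the paper)."""
    N = len(actual)
    err = sum((a - u) ** 2 for a, u in zip(actual, ideal)) / (N - 1)
    return 1.0 / err if err > 0 else float('inf')

def hereboy(evaluate, n_bits=44, p_mut=0.09, target=30.0, max_gen=10000):
    """Mutate the 44-bit switch configuration, keep the candidate when it
    is no worse, and stop at the target fitness or generation limit."""
    chrom = [random.randint(0, 1) for _ in range(n_bits)]
    best = evaluate(chrom)
    for _ in range(max_gen):
        if best >= target:
            break
        cand = [b ^ (random.random() < p_mut) for b in chrom]  # per-bit flip
        f = evaluate(cand)
        if f >= best:
            chrom, best = cand, f
    return chrom, best
```

The default parameters (44 bits, mutation probability 0.09, maximum fitness 30, 10000 generations) are the experimental values reported in Section 4.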
[Figure: flowchart — initialization of the chromosome, conversion to a circuit description, PSpice simulation of the circuit model, response and fitness evaluation, then a check whether the requirements are met; if not, an evolutionary operation produces a new chromosome and the loop repeats, otherwise the process ends]
Fig. 3. Main steps for the evolutionary process of FPACA
4 Fault-Tolerance Experiments of FPACA Analog Circuits
The aim of this experiment is to recover the functionality of a typical amplifier circuit using only one FPACA cell, which is shown in Fig. 1 (right). First, based on the evolutionary design of the FPACA described in Section 3, we evolve amplifier circuits. The excitation input is a 1 kHz sine wave of amplitude 0.01 V, and the evolutionary objective is an amplifier circuit with a voltage gain of 50. The sine wave input point is M5g, and the signal output point is M6d. The experiment uses the evolutionary algorithm with the following parameters: uniform mutation probability 0.09, maximum fitness 30, and maximum number of generations 10000. By evolving we obtain an optimal evolved amplifier whose fitness is 19.577; its sine response curve is shown in Fig. 4. The measured amplification is 0.50035/0.01 = 50.035, which is within the error tolerance and satisfies the requirement.

Fig. 4. Sine response curve of an optimal solution

Then, based on this optimal evolved amplifier, we introduce local faults and carry out fault-tolerance experiments. The circuit faults include three types: 14 single-transistor faults, 9 double-transistor faults and 8 three-transistor faults. The
faults are introduced by setting all switches that are directly connected to the corresponding faulty transistor OFF. The fault-tolerance experimental method is as follows: first, we evolve the FPACA cell to an amplifier topology; second, the three types of faults are introduced into the FPACA cell respectively; then the evolutionary process is carried out to recover an amplifier topology with the same functionality. In order to evaluate the fault-tolerance ability of the FPACA cell, three technical indexes are defined here: convergence rate, average fitness, and average evolutionary generations. The convergence rate is defined as the proportion of runs, out of every 10 evolutions, in which the functionality is recovered. The average fitness describes the average error between the actual response and the objective response over every 10 evolutions; the bigger the average fitness, the better the fault-tolerance ability. The average evolutionary generations, which reflects the self-recovery speed for each fault type, denotes the average number of generations needed to implement self-recovery over every 10 evolutions.
4.1 Single-Transistor Fault
In the following experiments, the 14 single-transistor faults are injected into the FPACA cell respectively. Fault-tolerance experiments are then carried out with the method above, and the results are illustrated in Fig. 5. As shown in Fig. 5, the FPACA cell has good fault-tolerance ability for single-transistor faults. In particular, when the fault is injected into M1, M2, M9, M11, M13 or M14, the convergence rate is 100%; that is to say, the amplifier circuit can recover from these single-transistor faults completely. Only when the fault is injected into M5 is the evolution hard to converge, i.e., the optimal circuit topology is difficult to reach. This is because M5 is the input position: when M5 fails, the input signal cannot enter the circuit topology.
Fig. 5. Experimental results with the faults injected to 14 single-transistors respectively
4.2 Multi-transistor Faults

In some cases, faults may occur in several transistors at once, so it is also necessary to test the fault-tolerance ability under multi-transistor faults.
Q. Ji et al.
Fig. 6. Experimental results with the faults injected to 9 double-transistors respectively
Fig. 7. Experimental results with the faults injected to 8 three-transistors respectively
Here double- and three-transistor faults are injected into the FPACA circuit to illustrate the fault-tolerance ability of the amplifier circuit. Following the experimental method, the faults are injected by setting all switches directly connected to the faulty transistors OFF, and the corresponding evolutionary experiments are then carried out to obtain the optimal circuit topologies. The double- and three-transistor fault-tolerance results are shown in Fig. 6 and Fig. 7 respectively. From Fig. 6 and Fig. 7 we can see that the fault-tolerance ability of the FPACA cell drops markedly as the number of faulty transistors increases. In particular, when three-transistor faults occur, the average convergence rate over the 8 kinds of three-transistor faults is no more than 30%; the mean of the average fitness values diminishes noticeably, and the average evolutionary generations increase rapidly. In Fig. 7, three arrowheads mark the corresponding values.

4.3 Analysis of Results

From the experimental results and analysis above, we can see that the number of faulty transistors is closely related to the fault-tolerance ability: as the number of faulty transistors increases, evolutionary recovery of the same circuit needs more generations on average, and the
average fitness and convergence rate decline markedly. The reason is that a growing number of faulty transistors leaves fewer signal paths able to transfer signals accurately; evolving the target circuit topology therefore becomes more difficult, and the fault-tolerance ability suffers accordingly. We also find that once 4 transistors are faulty, the optimal amplifier circuit topology can no longer be evolved; that is, 4 faulty transistors is the limit of tolerance. Additionally, the location of the faulty transistor has a crucial impact on fault-tolerance ability: for example, in Fig. 5, when transistor M3, M5, M4 or M10 fails, the convergence rate drops sharply.
5 Conclusions

In this paper, we first present a new FPACA architecture, a form of analog EHW, and design its evolutionary reconfiguration mechanism in order to study fault-tolerance ability. Second, by evolving a typical amplifier circuit, we illustrate the relation among the number of faulty transistors, evolutionary efficiency, and fault-tolerance ability. The following conclusions are drawn.
(1) The functionality of an FPACA analog circuit can be recovered from local transistor faults by re-evolution.
(2) The functionality of amplifier circuits evolved from the FPACA cell can be recovered from both single- and multi-transistor faults, up to a limit of four faulty transistors.
(3) As the number of transistor faults increases, evolutionary recovery of the same target circuit topology needs more generations on average, while the average fitness and convergence rate decline markedly.
(4) For a fixed number of FPACA cells, the fault-tolerance ability of the evolved circuit is closely related to the circuit function. For other evolved circuits, such as filters and DACs, more FPACA cells are needed, but the conclusions are similar.

Acknowledgments. The work presented in this paper has been funded by the National Natural Science Foundation of China (60374008, 60501022) and by the Aeronautic Science Foundation of China (2006ZD52044, 04I52068).
References
1. Yao, X., Higuchi, T.: Promises and Challenges of Evolvable Hardware. IEEE Trans. on Systems, Man, and Cybernetics, Part C: Applications and Reviews 29, 87–97 (1999)
2. Wang, Y., Zhang, Z., Cui, J., Chen, Z.: The Architecture and Circuit Implementation Scheme of a New Cell Neural Network for Analog Signal Processing. In: Pre-proceedings of the International Conference on Bio-Inspired Computing: Theory and Applications, Wuhan, China, pp. 374–380 (2006)
3. Stoica, A., Zebulum, R.S., Keymeulen, D., Daud, T.: Transistor-Level Circuit Experiments Using Evolvable Hardware. In: Mira, J.M., Álvarez, J.R. (eds.) IWINAC 2005. LNCS, vol. 3562, pp. 366–375. Springer, Heidelberg (2005)
4. Stoica, A., Keymeulen, D., Arslan, T., Duong, V., Zebulum, R.S., Ferguson, I., Guo, X.: Circuit Self-Recovery Experiments in Extreme Environments. In: Proceedings of the 2004 NASA/DoD Conference on Evolvable Hardware, Seattle, WA, USA, pp. 142–145 (2004)
5. Stoica, A., Keymeulen, D., Zebulum, R.S., Thakoor, A., Daud, T., Klimeck, G., Jin, Y., Tawel, R., Duong, V.: Evolution of Analog Circuits on Field Programmable Transistor Arrays. In: Proc. of the Second NASA/DoD Workshop on Evolvable Hardware, pp. 99–108. IEEE Computer Society Press, Los Alamitos (2000)
6. Zebulum, R.S., Keymeulen, D., Duong, V., Guo, X., Ferguson, M.I., Stoica, A.: Experimental Results in Evolutionary Fault-Recovery for Field Programmable Analog Devices. In: Proceedings of the 2003 NASA/DoD Conference on Evolvable Hardware, Chicago, IL, USA, pp. 182–186 (2003)
7. Stoica, A., Zebulum, R., Keymeulen, D.: Progress and Challenges in Building Evolvable Devices. In: Proceedings of the Third NASA/DoD Workshop on Evolvable Hardware, Long Beach, CA, USA, pp. 33–35 (2001)
8. Levi, D.: HereBoy: A Fast Evolutionary Algorithm. In: Proceedings of the Second NASA/DoD Workshop on Evolvable Hardware, Palo Alto, CA, USA, pp. 17–24 (2000)
9. Stoica, A., Zebulum, R., Keymeulen, D., Tawel, R., Daud, T., Thakoor, A.: Reconfigurable VLSI Architectures for Evolvable Hardware: From Experimental Field Programmable Transistor Arrays to Evolution-Oriented Chips. IEEE Transactions on VLSI Systems 9, 227–232 (2001)
10. Stoica, A., Keymeulen, D., Tawel, R., Salazar, C., Li, W.T.: Evolutionary Experiments with a Fine-Grained Reconfigurable Architecture for Analog and Digital CMOS Circuits. In: Proceedings of the First NASA/DoD Workshop on Evolvable Hardware, Pasadena, pp. 76–84. IEEE Computer Society Press, Los Alamitos (1999)
Analog Circuit Evolution Based on FPTA-2 Qiongqin Wu, Yu Shi, Juan Zheng, Rui Yao, and Youren Wang College of Automation, Nanjing University of Aeronautics and Astronautics, Jiangsu 210016, China
[email protected]
Abstract. FPTA-2 with a feedback structure is evolved to achieve an effective amplifier. Results in both the time and frequency domains show this to be more effective than using an open-loop circuit. A new kind of fitness function based on a square-error threshold is put forward; with this evaluation function, particular points of a sine wave evolve better. A new multi-cell circuit structure is also designed, and experiments show that the new structure makes evolution easier.
1 Introduction

Evolvable hardware is reconfigurable hardware placed under the control of an evolutionary algorithm. It embodies a novel hardware design method and offers self-adaptation and self-repair capabilities. John R. Koza, David Andre, et al. designed a 96-decibel operational amplifier with low distortion using genetic programming [1]. Ricardo Salem Zebulum, Marco A. Pacheco, et al. evolved CMOS operational amplifiers with a genetic algorithm based on the Miller OTA cell, and proposed adaptive weights for multi-objective evolution [2]. In this paper, amplifier evolution experiments are performed on FPTA-2 using a genetic algorithm. The fitness function for amplifier evolution is adjusted, and the four-cell circuit arrangement is rearranged. Simulation results show that the feedback structure is important for amplifier circuits, that the fitness-function adjustment works effectively for multi-objective evolution, and that multi-cell circuits, especially four-cell circuits, can be evolved faster after the cell arrangement is adjusted. The remainder of the paper is organized as follows: Section 2 describes the structure of FPTA-2 in detail; Section 3 introduces the HereBoy algorithm and the improved fitness function; Section 4 presents simulation results and compares them with those in [3], [4]; Section 5 presents a parallel 4-cell structure and compares its results with the serial structure; Section 6 summarizes the paper and puts forward some thoughts for future research.
2 FPTA-2

FPTA is reconfigurable at the transistor level: the parameters and connection states of each transistor are programmable. Digital, analog and mixed circuits can be evolved on
The project was supported by the National Natural Science Foundation of China (60501022).
L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 109–118, 2007. © Springer-Verlag Berlin Heidelberg 2007
FPTA, since different functions can be realized with different transistor structures. FPTA-2 is the second reconfigurable chip developed by JPL. Each FPTA-2 cell contains a transistor array together with a set of programmable resources: 14 CMOS transistors (M1-M14) connected by 44 voltage-controlled switches, and 3 capacitances, Cm1, Cm2, and Cc, whose values are 100 fF, 100 fF and 5 pF respectively. More details about FPTA-2 can be found in [5], [6]. An open-loop structure is usually used to evolve amplifiers, but the evolution results are poor: the gain struggles to reach the desired value, and only the gain and the FFT characteristic are used to evaluate the evolved circuits [4]. Amplifiers have also been evolved in two stages: the open-loop part is evolved first and the optimized result is kept, then the feedback circuit is evolved on its own in the second stage [3]. This method works better than open-loop evolution, but evolving in two stages weakens the unity of the open-loop structure and the feedback circuit. In this paper, the two parts of the amplifier are evolved simultaneously, and the evolution process is easier than with the two-stage method. FPTA-2 with a feedback structure is used in the experiments. The feedback circuit contains two reconfigurable resistances R1 and R2 and two static capacitances C1 and C2, connected in parallel with R1 and R2 respectively. Each RC circuit can be bypassed by reconfigurable switches. Details of the circuit are shown in Fig. 1.
3 HereBoy Algorithm and Fitness Function

3.1 HereBoy Algorithm

The HereBoy algorithm is an evolutionary algorithm combining a genetic algorithm with simulated annealing. In the algorithm, several genes are selected randomly to perform
Fig. 1. Structure of FPTA-2 cell
mutation at a certain probability in the search for the best chromosome. More details about the HereBoy algorithm can be found in [7]. Several aspects of the algorithm must be considered, such as the mutation probability (Pm) and the fitness function used to evaluate evolved results. The validity of the search space and the search speed are related to Pm: if it is too large, the search becomes random; if it is too small, the search slows down and the probability of convergence drops. Empirically, Pm lies in the range [0.06, 0.2] [7].

3.2 Fitness Adjustment

The reciprocal of the square error per data point is usually taken as the fitness function [8], [9]. Take two cycles of a sine wave as an example: the target gain Am0 is 5, the frequency is 1 kHz, and the peak voltage v0 is 0.01 V. Nodes Vin and Vout are marked in Fig. 1. The input signal Ui(t) and target output Uo'(t) are:

    Ui(t) = 0.01 × sin(2000πt),  Uo'(t) = 0.05 × sin(2000πt)    (1)
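The sampled versions of the signals in equation (1) can be generated as follows; this sketch uses only the standard library and assumes the 100-points-per-cycle sampling (n = 201 over two cycles) described in the text.

```python
import math

# Sample the input and target signals of equation (1): a 1 kHz sine with
# 10 mV amplitude and target gain 5, 100 points per cycle over two cycles
# (n = 201, both endpoints included).
f, v0, gain, n = 1000.0, 0.01, 5.0, 201
dt = 2.0 / f / (n - 1)                       # 10 us step over the 2 ms window
t = [i * dt for i in range(n)]
u_in = [v0 * math.sin(2000.0 * math.pi * ti) for ti in t]    # U_i(t)
u_target = [gain * u for u in u_in]                          # U_o'(t)
```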
The fitness function is given in equation (2):

    Fitness = 1 / [ (1/(n−1)) Σi=1..n (Uo(i) − Uo'(i))² ]    (2)
Uo(i) and Uo'(i) are the i-th actual and target output voltage values in the sampled sequence, and n is the number of sampled data points. 100 points are sampled per cycle, so n = 201 in this experiment. The fitness grows as the square error shrinks, i.e. as the output approaches the target. But not every point of the output signal evolves well, particularly certain specific points such as the peaks of the sine wave, so the fitness function needs adjusting to satisfy the evolution demands of these special points. In this paper a constraint is added to the fitness function. At the beginning of evolution, the reciprocal of the square error is still taken as the fitness. Pa and Pb are then the weights of the main fitness term and the additional fitness term respectively, sq0 is a predetermined square-error threshold, and sq is the square error of the current evolution. Pa and Pb are determined by equation (3):

    Pa = 1 if sq > sq0,  0.95 if sq ≤ sq0
    Pb = 0 if sq > sq0,  0.05 if sq ≤ sq0    (3)
The adjusted fitness function is given in equation (4):

    Fitness' = Pa / [ (1/(n−1)) Σi=1..n (Uo(i) − Uo'(i))² ] + Pb / |Am − Am0|    (4)
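A sketch of equations (2)-(4) in code; the 0.95/0.05 weights and the threshold behaviour follow the text, while the measured gain Am would in practice come from circuit simulation (a small epsilon guards the gain term against division by zero).

```python
# Baseline fitness (equation 2) and threshold-adjusted fitness (equations 3-4).

def square_error(u_out, u_target):
    """(1/(n-1)) * sum of squared differences, as used in equations (2) and (4)."""
    n = len(u_out)
    return sum((a - b) ** 2 for a, b in zip(u_out, u_target)) / (n - 1)

def fitness_basic(u_out, u_target):
    """Equation (2): reciprocal of the per-data square error."""
    return 1.0 / square_error(u_out, u_target)

def fitness_adjusted(u_out, u_target, am, am0, sq0):
    """Equations (3)-(4): the gain term joins once sq drops to sq0 or below."""
    sq = square_error(u_out, u_target)
    if sq > sq0:
        pa, pb = 1.0, 0.0           # early stage: square error only
    else:
        pa, pb = 0.95, 0.05         # late stage: also reward gain accuracy
    gain_term = pb / max(abs(am - am0), 1e-9) if pb else 0.0
    return pa / sq + gain_term
```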
In equation (4), only the square error is evaluated in the early stage of evolution; the gain joins the evaluation only once the square error is no larger than sq0. The flowchart of the extrinsic evolution of analog circuits is shown in Fig. 2.
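The evolution loop of the flowchart can be sketched as a simplified HereBoy-style search. The `evaluate` function below is a placeholder for circuit simulation (it merely counts '1' genes), and the annealed acceptance of occasionally worse solutions used by the full HereBoy algorithm is omitted for brevity.

```python
import random

# Simplified HereBoy-style loop: a single chromosome is mutated gene-by-gene
# with probability Pm, and the mutant replaces the parent only if its
# fitness improves.

def evaluate(chrom):
    return chrom.count("1")          # placeholder for simulated-circuit fitness

def hereboy(length=57, pm=0.08, max_gen=2000, target=57, seed=1):
    rng = random.Random(seed)
    best = "".join(rng.choice("01") for _ in range(length))
    best_fit = evaluate(best)
    for _ in range(max_gen):
        if best_fit >= target:       # stop once the fitness target is met
            break
        # flip each gene independently with probability Pm
        mutant = "".join(b if rng.random() > pm else ("1" if b == "0" else "0")
                         for b in best)
        fit = evaluate(mutant)
        if fit > best_fit:           # keep the mutant only if it improves
            best, best_fit = mutant, fit
    return best, best_fit
```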
Fig. 2. Flowchart of extrinsic evolution of analog circuits
4 Experiments

4.1 Evolution of Feedback Amplifier

Sine wave and square wave are used as input signals in the experiment. The sine wave is designed as in equation (1). The square wave parameters are: desired gain (Am0) 5, frequency 1 kHz, and peak voltage (v0) 0.01 V.
Table 1. Evolved results of amplifiers

Number | Sine wave, Open-loop | Sine wave, Feedback | Square wave, Open-loop | Square wave, Feedback
1  | 2.562 | 10.365 | 1.147 | 3.348
2  | 3.134 | 8.581  | 1.095 | 4.143
3  | 2.307 | 14.012 | 1.954 | 3.316
4  | 5.046 | 12.955 | 2.075 | 4.324
5  | 1.497 | 11.231 | 2.034 | 3.357
6  | 5.164 | 9.021  | 1.845 | 5.549
7  | 2.45  | 13.357 | 1.157 | 3.348
8  | 6.358 | 15.025 | 1.034 | 5.051
9  | 1.754 | 14.344 | 1.125 | 3.167
10 | 7.46  | 9.348  | 1.462 | 4.921
11 | 3.011 | 16.127 | 1.039 | 3.654
12 | 4.046 | 16.696 | 1.535 | 4.254
13 | 2.132 | 8.395  | 2.014 | 2.395
14 | 1.963 | 10.184 | 1.951 | 2.168
15 | 5.486 | 9.946  | 1.674 | 3.214
In the experiments, the mutation probability is 0.08, the maximum generation count is 10000, and the maximum fitness value is 40. 15 experiments are performed. An FPTA-2 cell without the feedback structure is also evolved as an amplifier, with the other parameters unchanged. The evolution results are shown in Table 1. As Table 1 shows, the fitness values of the open-loop circuits are much smaller than those of the feedback amplifiers: FPTA-2 cells with the feedback structure are easier to evolve into an amplifier than cells without it. The optimized results for the two circuit types are shown in Fig. 3 to Fig. 6. For the circuits without feedback, the best result appears in the 10th experiment. The maximum fitness value is 7.46, and the corresponding chromosome is: 110001110001000110001111110010110101111011101111010101101. As shown in Fig. 3(a), the peak-to-peak values differ between the two cycles of the output; the larger is 30.1 mV. The gain is therefore about 3.01, and the frequency is about 1 kHz, as can be estimated from Fig. 3(b). There is a comparatively large frequency component at 0 Hz.
That is because a bias voltage exists at the output of the transistor circuit; it appears in the other results as well. Although the output curve looks smooth, the result is still poor since the voltage gain is much smaller than desired. The transmission band is restricted to a small range around 1 kHz, as in Fig. 3(c). The results obtained from the circuit with the feedback structure (Fig. 4) are much better than those in Fig. 3. The maximum fitness value, 16.696, is found in the 12th experiment, and the corresponding chromosome is: 101011011101100110010011111110010111001001001000110110101. The peak-to-peak value is approximately 49 mV, so the gain is about 4.9. The period of the output signal is 1 ms and the curve is very smooth. Though the curve still shows slight distortion, there are no harmonics in the signal, which means the distortion coefficient of the circuit is small. The transmission band in Fig. 4(c) is much wider than that in Fig. 3(c); the amplifier behaves as a low-pass filter. The total time consumption is a little larger in the feedback experiments than in the open-loop ones, because the chromosome contains more genes. However, it takes less time to reach a given fitness value in the feedback experiments, so convergence is faster in Fig. 4 than in Fig. 3. The results in Fig. 5 and Fig. 6 largely mirror those in Fig. 3 and Fig. 4 respectively: the open-loop circuit cannot obtain results as good as the feedback circuit's. There are harmonics in the square-wave output of the circuit with feedback, which degrades that result.
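The chromosomes above are raw bit strings. As a purely hypothetical decoding (the paper does not give the bit-to-switch mapping), the first 44 bits can be read as the ON/OFF states of the cell's 44 switches, with the remaining bits configuring the feedback network:

```python
# Hedged sketch: decode an evolved chromosome bit string into switch states.
# The assignment of bit positions to particular switches is hypothetical;
# the paper only states that each cell has 44 voltage-controlled switches.
def decode_chromosome(bits, n_switches=44):
    assert set(bits) <= {"0", "1"}, "chromosome must be a bit string"
    switches = [b == "1" for b in bits[:n_switches]]   # cell switch states
    feedback_cfg = bits[n_switches:]                   # remaining config bits
    return switches, feedback_cfg

chrom = "110001110001000110001111110010110101111011101111010101101"
sw, fb = decode_chromosome(chrom)
print(len(sw), len(fb))   # 44 switch states, 13 extra configuration bits
```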
Fig. 3. Result of open-loop structure and sine wave input: (a) optimized output; (b) amplitude frequency; (c) gain-frequency curve; (d) max fitness value
Fig. 4. Result of feedback structure and sine wave input: (a) optimized output; (b) amplitude frequency; (c) gain-frequency curve; (d) max fitness value
Fig. 5. Result of open-loop structure and square wave input: (a) optimized output; (b) amplitude frequency; (c) gain-frequency curve; (d) max fitness value
Fig. 6. Result of feedback structure and square wave input: (a) optimized output; (b) amplitude frequency; (c) gain-frequency curve; (d) max fitness value
Fig. 7. Optimized output after fitness adjustment: (a) optimized output; (b) amplitude frequency; (c) gain-frequency curve; (d) max fitness value
4.2 Fitness Adjustment

The fitness function is adjusted based on the feedback circuits. As an example, the input sine wave is the same as before and 15 experiments are carried out. The output of the best evolved circuit is shown in Fig. 7. The curves in the two cycles are essentially the same, because the average gain over both cycles is taken into account in the fitness. In this run, the square error reaches sq0 at generation 7593, where the gain is about 4.8, and the maximum fitness occurs at generation 9205, where the gain is about 5. The results show that the fitness adjustment is effective for evolving sine-wave amplification.
5 Parallel Cell Structure

As circuits become more and more complex, the number of FPTA-2 cells increases rapidly. The arrangement of, and connections between, the cells is an important factor influencing the evolution results. Taking a 4-cell structure as an example, the cells can be arranged serially as in Fig. 8(a). This chain is easily separated into two parts if certain switches are configured off, which yields zero output from the circuit; such configurations occupy a large proportion of the evolution, wasting time and making evolution ineffective. Another 4-cell structure is put forward in Fig. 8(b): the four cells are arranged in two rows and two columns, and cells b and c are both connected to the output cell d. If
Fig. 8. Two types of the four-cell arrangement: (a) serial arrangement; (b) parallel arrangement
Fig. 9. Best fitness of two types of circuit arrangement: (a) serial arrangement; (b) parallel arrangement
one of these two cells is separated, the other is still probably connected to cell d, so the probability of zero output is smaller. Taking an integrator circuit as an example, the two types of cell arrangement are tested and compared. The circuit input is DC at 1 V, lasting 1 ms. The max-fitness curves are presented in Fig. 9(a) and Fig. 9(b). The proportion of zero-output generations to total evolved generations is 0.5267 in Fig. 9(a) and 0.3629 in Fig. 9(b): the parallel arrangement of cells works more efficiently than the serial one. Because the chromosome is longer and multi-cell evolution is more complex, further research on multi-cell circuit evolution is needed.
6 Conclusion

A method for analog circuit evolution is discussed in this paper. Evolving a feedback amplifier works much better than evolving an open-loop circuit. The fitness function is adjusted using a square-error threshold, and the four-cell circuit arrangement is rearranged; the results show that these changes speed up the evolution and make multi-cell circuit evolution more effective.
References
[1] Koza, J.R., Andre, D., et al.: Design of a 96 Decibel Operational Amplifier and Other Problems for Which a Computer Program Evolved by Genetic Programming Is Competitive with Human Performance. In: 1996 Japan-China Joint Int. Workshop on Information Systems, Ashikaga Inst. Technol., Ashikaga, Japan (1996)
[2] Zebulum, R.S., Pacheco, M.A., Vellasco, M.: Synthesis of CMOS Operational Amplifiers through Genetic Algorithms. In: Proc. XI Brazilian Symp. on Integrated Circuit Design, pp. 125–128 (1998)
[3] Zhiqiang, Z., Youren, W.: A New Analog Circuit Design Based on a Reconfigurable Transistor Array. In: 2006 8th International Conference on Solid-State and Integrated Circuit Technology Proceedings, Shanghai, China, pp. 1745–1747 (2006)
[4] Yuehua, Z., Yu, S.: Analog Evolvable Hardware Experiments Based on FPTA. Users of Instruments 14(3), 10–12 (2006)
[5] Kaiyang, Z., Youren, W.: Research on Evolvable Circuits Based on a Reconfigurable Analog Array. Computer Testing and Control 13(10) (2005)
[6] Stoica, A., Zebulum, R., Keymeulen, D., Tawel, R., Daud, T., Thakoor, A.: Reconfigurable VLSI Architectures for Evolvable Hardware: From Experimental Field Programmable Transistor Arrays to Evolution-Oriented Chips. IEEE Transactions on VLSI Systems, Special Issue on Reconfigurable and Adaptive VLSI Systems 9(1), 227–232 (2001)
[7] Levi, D.: HereBoy: A Fast Evolutionary Algorithm. In: Proceedings of the Second NASA/DoD Workshop on Evolvable Hardware, Washington, pp. 17–24 (2000)
[8] Vieira, P.F., Sá, L.B., Botelho, J.P.B., Mesquita, A.: Evolutionary Synthesis of Analog Circuits Using Only MOS Transistors. In: Proceedings of the 2004 NASA/DoD Conference on Evolvable Hardware (EH'04) (2004)
[9] Langeheine, J., Trefzer, M., Brüderle, D., Meier, K., Schemmel, J.: On the Evolution of Analog Electronic Circuits Using Building Blocks on a CMOS FPTA. In: Deb, K., et al. (eds.) GECCO 2004. LNCS, vol. 3102, pp. 1316–1327. Springer, Heidelberg (2004)
Knowledge Network Management System with Medicine Self Repairing Strategy JeongYon Shim Division of General Studies, Computer Science, Kangnam University San 6-2, Kugal-Dong, Kihung-Gu,YongIn Si, KyeongKi Do, Korea, Tel.: +82 31 2803 736
[email protected]
Abstract. In today's complex information environment, the role of intelligent systems is growing, and it is essential to develop smarter and more efficient intelligent systems that can acquire knowledge automatically, structure memory for efficient storage, retrieve related information, and repair themselves automatically. Focusing on self-repair, this study designs a Medicine Self Repairing Strategy for knowledge network management. The concepts of Self Type, Internal Entropy, and medicine treatment are defined to model the Self Repairing System. The proposed system is applied to a virtual memory consisting of a knowledge network, and the results are tested. Keywords: Knowledge network management strategy, Internal entropy, medicine.
1 Introduction

Living things are exposed to a dynamic, complex and dangerous environment. To survive, they have developed their own means, such as intelligent systems, of overcoming the difficulties they face. They protect the body from external stimuli or the attack of disease and keep the body in balance through an autonomic self-repairing system. When this system cannot overcome external attacks, the body loses its balance. In oriental medicine, the broken state of balance is regarded as disease, and the main role of medicine treatment is to help the autonomic self-repairing system recover the body's balance: medicine helps remove harmful factors and make up for the body's deficiencies. As computer technology develops rapidly, the information environment, like the real world, becomes more and more complex. In particular, the internet has entered human society and changed the paradigm of the modern age. Against this background, the role of intelligent systems is growing, and it is essential to develop smarter and more efficient intelligent systems for automatic knowledge acquisition, structuring memory for efficient storage, retrieving related information, and repairing the system automatically. Many studies of intelligent systems adopting ideas from the life sciences have been made and applied to practical areas in recent decades.

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 119–128, 2007. © Springer-Verlag Berlin Heidelberg 2007
Focusing on the self-repairing system, this study designs a Medicine Self Repairing Strategy for knowledge network management. The concepts of Self Type, Internal Entropy, and medicine treatment are defined to model the Self Repairing System. We applied the proposed system to a virtual memory consisting of a knowledge network and tested the results.
2 Intelligent Knowledge Network Management System

An intelligent knowledge network management system was previously designed by the author to assemble the functions of knowledge selection, knowledge acquisition using modular neural networks and symbolic learning, memory structuring, perception and inference, and knowledge retrieval [1]. As shown in Fig. 1, data flow through the system as follows. Raw data that have passed the preprocessing module are temporarily stored in the Input Knowledge Pool for efficient information processing. Data in the Input Knowledge Pool are filtered by the Reactive layer and distributed to ALM (Adaptive Learning Module) or SKAM (Symbolic Knowledge Acquisition Module). Training data are used for the learning process in ALM, whose output nodes are connected to nodes in the higher Associative Knowledge layer; ALM consists of a three-layer neural network trained by the BP algorithm. Symbolic data activate SKAM, which analyzes knowledge and associative relations logically; its output nodes are likewise connected to the Associative Knowledge layer. The Associative Knowledge layer constructs a knowledge net from the knowledge propagated from the previous step, and the constructed knowledge net is selectively stored in connected memory. For efficient memory retention, maintenance, and knowledge retrieval, the memory is designed as a knowledge network, specially composed of knowledge cells according to their associative
Fig. 1. Intelligent Knowledge Network Management System
relations and concepts. The system performs knowledge network management; during this processing, discarded data and useless dead knowledge are sent to the Pruning pool and removed.

2.1 Knowledge Network Management Strategy

We designed the knowledge cells of the knowledge network to carry several properties, i.e. ID, T (Self Type), IE (Internal Entropy) and C (contents), for efficient memory maintenance and knowledge retrieval. As shown in Fig. 2, each knowledge cell is connected to other knowledge cells, and the collection of knowledge cells composes the knowledge network, which is represented in the form of a Knowledge Associative list.
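A sketch of the knowledge-cell record and the associative list, with field names chosen for illustration (the paper specifies only ID, T, IE and C); the sample values are taken from Table 4:

```python
from dataclasses import dataclass

# Illustrative knowledge-cell record: ID, Self Type, Internal Entropy, contents.
@dataclass
class KnowledgeCell:
    id: str
    self_type: str      # one of M, F, S, E, K (or D/P/C when abnormal)
    ie: float           # internal entropy, a value in [-1, 1]
    contents: str = ""

# Knowledge Associative list: (source id, target id, relation strength).
associative_list = [("K1", "K2", 0.9), ("K1", "K7", 0.4), ("K2", "K3", 0.8)]
cells = {c.id: c for c in [KnowledgeCell("K1", "M", 1.0),
                           KnowledgeCell("K2", "F", 0.8),
                           KnowledgeCell("K3", "F", 0.6),
                           KnowledgeCell("K7", "S", 1.0)]}
```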
Fig. 2. Knowledge cell, Knowledge network and Knowledge Associative list
The knowledge network management strategy is in charge of maintaining an efficient memory by discriminating and removing dead cells. Its main function is to prune dead or negative cells and keep the memory constantly in a fresh state. In this process, the system checks for negative cells and calculates the Internal Entropy of the whole memory. If it finds a dead cell, it sends a signal and starts the pruning process; for a negative cell, it decides whether the cell should be destroyed. When a negative cell is destroyed, the memory is reconfigured: after removal, a new associative relation must be created by calculating a new strength. In the example of Fig. 3, node kj is removed because it is a dead cell; afterwards, ki is connected directly to kk with a new strength, calculated by equation (1):

    Rik = Rij * Rjk    (1)

More generally, if there are n dead cells between ki and kj, the new strength Rij is the product of the strengths along the chain, as in equation (2):

    Rij = Ri,i+1 * Ri+1,i+2 * ... * Rj-1,j    (2)
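As a small illustrative sketch (not the author's implementation), the strength recomputation of equations (1) and (2) reduces to a product along the chain of removed dead cells:

```python
# Strength recomputation after pruning (equations 1-2): removing a chain of
# dead cells between ki and kj links them directly with the product of the
# intermediate relation strengths.
def prune_chain(chain_strengths):
    """New direct strength R_ij = R_i,i+1 * R_i+1,i+2 * ... * R_j-1,j."""
    r = 1.0
    for s in chain_strengths:
        r *= s
    return r

# Example: ki -> kj (dead) -> kk with strengths 0.8 and 0.5 gives R_ik = 0.4.
print(prune_chain([0.8, 0.5]))
```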
Fig. 3. Knowledge nodes with dead cells
3 Medicine Self Repairing Strategy

3.1 Self Maintenance
The Medicine Self Repairing Strategy is designed for self-maintenance of the system. As shown in Fig. 4, the system periodically checks the SEG (Surviving Energy Gauge) balancing factor to see whether it is in the balanced state. All the knowledge cells composing the knowledge network in memory are represented as objects including ID, Self Type, Self Internal Entropy and contents. Among these attributes, Self Type and Self Internal Entropy are used by the Medicine Self Repairing Strategy. The main role of medicine is to select a bad knowledge cell, one harmful or useless for intelligent processing, and remove it. The medicine type matching mechanism is a memory-cleaning process that removes the knowledge cells whose type matches the medicine type.
Fig. 4. Self maintenance
Self Type. Self Type is defined as a cell's own property representing its characteristics, and takes one of five values, M, F, S, E and K, whose meanings can be determined by the application area. Between types there exist attracting and rejecting relationships, as shown in Table 1 and Table 2. If a pair of types is in an attracting relation, the two types are associated and their relational strength increases; on the contrary, if it is in a rejecting relation, an expelling force works on the two types and their strength decreases.
Table 1. Type Matching Rule: Attracting Relation
M⊕F, F⊕E, E⊕K, K⊕S, S⊕M
Table 2. Type Matching Rule: Rejecting Relation
M-E, E-S, S-F, F-K, K-M
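As a sketch, the matching rules of Tables 1 and 2 can be encoded as directed pairs (whether the relations are symmetric is not stated in the paper, so this treats the listed orderings literally):

```python
# Five-type matching rules from Tables 1 and 2. An attracting pair
# strengthens the link between two cells; a rejecting pair weakens it.
ATTRACT = {("M", "F"), ("F", "E"), ("E", "K"), ("K", "S"), ("S", "M")}
REJECT = {("M", "E"), ("E", "S"), ("S", "F"), ("F", "K"), ("K", "M")}

def relation(a, b):
    """Classify an ordered pair of Self Types."""
    if (a, b) in ATTRACT:
        return "attract"
    if (a, b) in REJECT:
        return "reject"
    return "none"
```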
3.2 Self Internal Entropy and SEG Balancing Factor

Every knowledge cell has a Self Internal Entropy (IE, Ei) representing the strength of the cell. It takes a value in [-1, 1]: a minus value means negative energy, a plus value means positive energy, and zero represents the state of no energy.

    Ei = (1 − exp(−σx)) / (1 + exp(−σx))    (3)
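The squashing function of equation (3) is mathematically equivalent to tanh(σx/2); together with the SEG average defined next in the text (equation (4)), it can be sketched as follows (the value of σ is not given in the paper, so it is a parameter here):

```python
import math

# Equation (3): squash a raw activation x into [-1, 1].
def internal_entropy(x, sigma=1.0):
    """E_i = (1 - exp(-sigma*x)) / (1 + exp(-sigma*x)) = tanh(sigma*x/2)."""
    return (1.0 - math.exp(-sigma * x)) / (1.0 + math.exp(-sigma * x))

# Equation (4): SEG(t) is the mean internal entropy over the n cells.
def seg(entropies):
    return sum(entropies) / len(entropies)
```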
SEG (Surviving Energy Gauge) is a measure evaluating the state of the system's surviving ability in its environment, and this value is used for maintaining the system. SEG is calculated by equation (4):

    SEG(t) = (Σi=1..n Eti) / n    (4)

3.3 Medicine Self Repairing Mechanism
In this system, every knowledge cell has a Self Type with a value of M, F, E, K, or S. However, the Self Type of a knowledge cell can change to another type, or to an abnormal type, according to the situation. For example, if a state of starvation continues for a long time, the self-maintenance mechanism changes the cell to a dead cell, whose type is D and whose Self Internal Entropy is zero. During memory structuring, abnormally
typed knowledge cells can be added to the knowledge network. If many useless cells are included in the knowledge network, the efficiency of the system drops; because the value of SEG also drops, the self-maintenance system can easily detect the abnormal state. To prevent this situation and recover from it, repairing treatment is essential. In this study, we propose a Medicine Self Repairing Mechanism for recovering and removing the bad sectors of the knowledge network. The main idea is medicine type matching removal: the medicine searches for cells of the matching type and removes them from the knowledge network. Its precondition is that the system can detect the abnormal types and find the appropriate medicine matched to them. Empirical knowledge is used to select the medicine, and the system must hold a medicine matching type list for medicine self-repairing.
Fig. 5. Medicine type matching remove
Knowledge Network Management System
125
The following represents the medicine Self Repairing algorithm.

Algorithm 1: Medicine Self Repairing Algorithm
Start
  STEP 1: Check SEG and the Repairing Signal.
  STEP 2: Input the medicine type, MED.
  STEP 3: while (!EOF) begin
            if (SEG ≤ θ and Repairing control = ON) then
              Input the medicine type, MED.
              Search for the type T in each cell matched with MED,
              referring to the medicine type matching table.
              if (found) Remove the found cell.
          end.
  STEP 4: Call the Knowledge Network Restructuring module.
Stop
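A minimal Python sketch of Algorithm 1 follows; the data layout and all names here are our illustrative assumptions, not the paper's implementation:

```python
def medicine_self_repair(cells, med, med_table, seg_value, theta,
                         repairing_on=True):
    """Sketch of Algorithm 1.  `cells` is a list of (self_type, knowledge)
    pairs, `med_table[med]` is the set of Self Types the medicine removes
    (its rejecting relation), and `theta` is the SEG threshold."""
    # STEP 1: check SEG and the repairing signal
    if not (seg_value <= theta and repairing_on):
        return cells
    # STEP 2-3: look up the medicine's matching types and remove those cells
    targets = med_table[med]
    survivors = [c for c in cells if c[0] not in targets]
    # STEP 4: call the Knowledge Network Restructuring module
    return restructure(survivors)

def restructure(cells):
    """Placeholder for the Knowledge Network Restructuring module."""
    return cells
```

For example, applying MED-DPC (which rejects the types D, P, and C) removes every D-, P-, and C-typed cell and leaves the rest for restructuring.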
4 Experiments
The prototype of the Knowledge Network management system was applied to a virtual memory, and the Medicine Self Repairing Strategy was tested using the knowledge associative list and the medicine lookup table of Table 3. Figure 6 shows the initial state of the knowledge network, and Table 4 shows its knowledge list containing Self Type, knowledge nodes, IE, and Relation. In this experiment, the medicine type matching repairing strategy was processed in four steps, as shown in the graph of Figure 7. MED-D means D type cell removing, MED-DP is D and P type
Fig. 6. Knowledge network
Table 3. Medicine Lookup table

MED: Attracting Relation: F; Rejecting Relation: D, P, C
Table 4. Knowledge associative list of initial knowledge network Type Knowledge i IE Rel. Knowledge i + 1 M
K1
1.0 0.9
K2
M
K1
1.0 0.4
K7
F
K2
0.8 0.8
K3
F
K2
0.8 0.6
K5
F
K3
0.6 0.2
K4
F
K3
0.6 0.8
K6
C
K4
-1.0 0.0
NULL
S
K5
0.7 0.0
NULL
D
K6
0.0 0.0
NULL
S
K7
1.0 0.7
K8
S
K7
1.0 0.3
K9
S
K7
1.0 0.5
NULL
C
K8
-1.0 0.0
NULL
M
K9
0.2 0.1
K10
F
K10
0.3 0.0
NULL
D
K11
0.0 0.6
K10
D
K11
0.0 0.1
K12
D
K11
0.0 0.9
K13
M
K12
0.2 0.0
NULL
P
K13
-0.3 0.6
K14
M
K14
0.1 0.0
NULL
removing, and MED-DPC is D, P and C type removing. MEDA-F means adding energy to F type cells, because the F type has an Attracting relation with this medicine according to the Medicine Lookup table. As shown in Figure 8, the value of SEG rises as Medicine Type Matching Repairing proceeds. As a result, the memory was successfully updated to the optimal state.
Fig. 7. Change of Entropy by medicine type matching repairing
Fig. 8. Change of SEG by medicine type matching repairing
5 Conclusion
A knowledge network management system with the Medicine Type Matching Repairing strategy was designed. The concepts of Self Type, Internal Entropy, and SEG were defined and used for the self repairing system. We applied the proposed system to a virtual memory consisting of a knowledge net and tested the results. From the tests, we found that the memory was successfully updated and maintained by the medicine type matching repairing strategy. The proposed system can also be usefully applied to many areas requiring efficient memory management.
References

1. Shim, J.-Y.: Knowledge Retrieval Using Bayesian Associative Relation in the Three Dimensional Modular System. In: Yu, J.X., Lin, X., Lu, H., Zhang, Y. (eds.) APWeb 2004. LNCS, vol. 3007, pp. 630–635. Springer, Heidelberg (2004)
2. Anderson, J.R.: Learning and Memory. Prentice Hall, Englewood Cliffs
3. Fausett, L.: Fundamentals of Neural Networks. Prentice Hall, Englewood Cliffs
4. Haykin, S.: Neural Networks. Prentice Hall, Englewood Cliffs
5. Shim, J.-Y., Hwang, C.-S.: Data Extraction from Associative Matrix Based on a Selective Learning System. In: IJCNN'99, Washington D.C. (1999)
6. Anderson, J.R.: Learning and Memory. Prentice Hall, Englewood Cliffs
7. Shim, J.-Y.: Automatic Knowledge Configuration by Reticular Activating System. In: Wang, L., Chen, K., Ong, Y.S. (eds.) ICNC 2005. LNCS, vol. 3610. Springer, Heidelberg (2005)
Design of a Cell in Embryonic Systems with Improved Efficiency and Fault-Tolerance

Yuan Zhang, Youren Wang, Shanshan Yang, and Min Xie

College of Automation and Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
[email protected]
Abstract. This paper presents a new design of cells for constructing embryonic arrays, whose function unit can act in three different operating modes. Compared with cells based on a LUT with four inputs and one output, the new architecture displays improved flexibility and resource utilization ratios. The configuration memory employed by embryonics can implement 1-bit error correcting and 2-bit error checking by using an extended Hamming code. Two-level fault-tolerance is achieved in the embryonic array by the error correcting mechanism of the memory at cell-level and the column-elimination mechanism at array-level, which is triggered by cell-level fault detection. The implementation and simulation of a 4-bit adder-subtracter circuit is presented as a practical example to show the effectiveness of embryonic arrays in terms of functionality and two-level fault-tolerance.

Keywords: Embryonic systems, Cellular arrays, Two-level self-repair, Extended Hamming code, Fault tolerance of configuration memory.
1 Introduction
The requirements for reliability and fault tolerance of electronic systems become higher and higher as the complexity of digital systems increases. Traditional fault-tolerant design cannot provide a solution for an integrated SOC (System On a Chip), owing to its complex architecture, large volume, and difficulty of integration. There is therefore a need for new architectures and mechanisms for on-line self-repair of VLSI (Very Large Scale Integration) circuits. Embryonic systems [1] are planar cellular arrays with self-test and self-repair properties. Properties similar to those found in biological organisms, such as multi-cellular organization, cellular replication, cellular differentiation and cellular division, are used to build digital circuits with high reliability and robust fault tolerance. The early cellular architecture was based on multiplexers [2], which are unsuitable for implementing large scale circuits because of their large resource consumption and routing difficulty. In recent years, the LUT is mostly used in the architecture of POE (POE = phylogenesis, ontogenesis, epigenesis) [3]. However, it does not solve the problem of low

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 129–139, 2007.
© Springer-Verlag Berlin Heidelberg 2007
130
Y. Zhang et al.
resource utilization ratio. In this paper an improved cellular architecture is proposed, capable of implementing an arithmetic operation or two gates in three operational modes, to improve flexibility and resource utilization ratio. It is also necessary to study the reliability and fault-tolerant ability of the configuration memory within embryonic arrays, as errors in the memory will render the system useless. A simple off-line test, the addition of a complex artificial immune system [4,5] for on-line self-test, and a Hamming code for 1-bit error correcting [6] have been proposed previously. In this paper we apply an extended Hamming code [7] to memory fault tolerance, which can correct a 1-bit error at cell-level and raise a 2-bit error signal to trigger array-level self-repair, obtaining enhanced fault-tolerance ability.
2 Architecture of Cells in Embryonic Systems
The embryonic array is a symmetrical electronic cellular array. Fig. 1 gives an abstract view of the basic components of the embryonic cell. Every cell has the identical circuit structure, containing a co-ordinate generator, memory, function unit, switch box and control unit [8]. Input multiplexers controlled by the co-ordinate are contained in the southmost and northmost cells so that the inputs of the system can be sourced from the west as well as from the south and north.
Fig. 1. Architecture of cells in embryonic systems
2.1 Co-ordinate Generator and Memory
Each cell chooses the configuration bits stored in its memory according to its co-ordinate. As column-elimination is adopted, each cell needs only one co-ordinate, and the memory only has to store the configuration bits of the cells within the same row. When the cell is in the normal state, the configuration bits are chosen from the memory according to its co-ordinate, which is calculated by adding 1 to the co-ordinate of the west cell. When it is in the transparent state, the cell passes the co-ordinate of the west cell directly to the east and selects the transparent configuration from the memory.

2.2 Function Unit
Fig. 2 shows the function unit in some detail. A cell has three different operational modes, which make it flexible enough to map all kinds of application circuits onto the array conveniently.

– 4-LUT mode: when MODE(1:0) is "11", the function unit acts as a 16-bit LUT which supplies an output depending on its four inputs. It is suitable for implementing a circuit with a larger logic function.
– 3-LUT mode: when MODE(1:0) is "01", the function unit is split into two 8-bit LUTs, each supplying a result depending on three inputs. The first result can go through the flip-flop as the first output; the second one can be used as a second output. A cell can perform an arithmetic operation, such as a full adder or a full subtracter.
– 2-LUT mode: when MODE(1:0) is "00", the function unit can be regarded as two 4-bit LUTs with two inputs and one output each. A cell can implement two gates.

The inputs of the function unit can be chosen from fixed logic values, any of the outputs of the neighboring cells, or the cell's own output, according to the configuration bits. The configuration bit EN_REG can enable sequential operation.
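The mode splitting can be sketched as follows; the bit ordering and the assignment of inputs to each sub-LUT are our illustrative assumptions, since the real unit is wired by the cell's configuration bits:

```python
def function_unit(lut16, mode, a, b, c, d):
    """Sketch of the three operating modes of the 16-bit LUT.
    Returns (fun0_out, fun1_out); fun1_out is None in 4-LUT mode."""
    if mode == 0b11:                       # 4-LUT: one 16-bit table on (a, b, c, d)
        idx = (a << 3) | (b << 2) | (c << 1) | d
        return (lut16 >> idx) & 1, None
    if mode == 0b01:                       # 3-LUT: two 8-bit tables on (b, c, d)
        idx = (b << 2) | (c << 1) | d
        lo, hi = lut16 & 0xFF, (lut16 >> 8) & 0xFF
        return (lo >> idx) & 1, (hi >> idx) & 1
    # 2-LUT: two 4-bit tables, LUT0 on (a, b) and LUT1 on (c, d)
    lut0, lut1 = lut16 & 0xF, (lut16 >> 4) & 0xF
    return (lut0 >> ((a << 1) | b)) & 1, (lut1 >> ((c << 1) | d)) & 1
```

Under these assumptions a full adder fits one cell in 3-LUT mode: packing the parity table 0x96 in the low byte and the majority table 0xE8 in the high byte yields the sum and carry of the three inputs.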
Fig. 2. Architecture of function unit
2.3 Switch Box
The switch box is designed for propagating information between cells. As shown in Fig. 3, the outputs in each direction can select from the function unit's outputs and the cell's inputs in the other three directions, according to the configuration bits.
Fig. 3. Architecture of switch box
2.4 Control Unit
The control unit is responsible for controlling the other blocks in the cell by combining fault flags into a cell state signal which triggers the cell's elimination. A cell has four states. The array's behavior is determined by the transitions from one state to the next, at discrete time steps, according to the fault flags from the self-test logic within the cell and the states of the neighboring cells in the same column. The cell works in the normal state when no fault is detected. With a 2-bit error detected in the memory, the cell goes into the non-reparable transparent state, from which it can never return to the normal state. When a fault is detected in the function unit or in any neighboring cell of the same column, the cell goes into the reparable transparent state and returns to the normal state when the fault disappears. When the cell is in either of the transparent states, the control unit sets the transparent signal to '1', and the self-test logic continues to test the transparent configuration bits. If a 2-bit error is detected in the transparent configuration bits, the cell goes into the backup state and sets the backup signal to '1'.
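The four-state behaviour described above can be sketched as a transition function; the signal and state names are illustrative, not taken from the VHDL:

```python
def next_state(state, mem_2bit_err=False, fun_fault=False,
               column_fault=False, transp_2bit_err=False):
    """Sketch of the control unit's state transitions: 'normal',
    'reparable' / 'nonreparable' transparent, and 'backup'."""
    if state == 'normal':
        if mem_2bit_err:
            return 'nonreparable'          # can never return to normal
        if fun_fault or column_fault:
            return 'reparable'
        return 'normal'
    if state == 'reparable':
        if transp_2bit_err:
            return 'backup'                # transparent bits corrupted too
        if not (fun_fault or column_fault):
            return 'normal'                # the fault disappeared
        return 'reparable'
    if state == 'nonreparable' and transp_2bit_err:
        return 'backup'
    return state                           # nonreparable/backup are sticky
```

In either transparent state the transparent signal would be driven to '1', and in the backup state the backup signal as well.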
3 Fault Tolerance Mechanism of Embryonic Systems
In order to improve the fault tolerance ability of embryonic systems, we are developing a hierarchical approach for fault tolerance, covering cell-level and array-level, that allows the system to keep operating normally in the presence of multiple faults with less redundancy resources. The "stuck-at" fault model, which covers the most common faults, is used.

3.1 Fault Detection and Fault Tolerance at Cell-Level
Duplicating the function unit gives it fault-detection ability, since a fault will cause the outputs of the two function unit copies to differ. This discrepancy can be detected by an XOR gate comparing the cell function outputs.
When the outputs of the two function units are different, the fault signal from the comparator is set to '1', indicating a fault occurring in the cell. The advantages of this fault-detection method are that it is simple and operates on-line. However, fault location and fault tolerance cannot be obtained by dual modular redundancy alone. Errors within the memory will lead to the failure of the function unit and of the connections between cells, which will make the cell abnormal. To achieve high reliability and robust fault tolerance of the configuration memory, the extended Hamming code, which can correct a 1-bit error and detect a 2-bit error, provides a solution for the design of memory fault tolerance. The memory can correct a single error so that the cell keeps operating normally; when a 2-bit error occurs, a fault signal is given to trigger array reconfiguration.

Extended Hamming Code. It is known that the Hamming code is a perfect 1-bit error correcting code obtained through extra check bits. The extended Hamming code is slightly more powerful: by adding a parity check bit it is able to detect that a 2-bit error has occurred, as well as to correct any single error. The code word is created as follows:

1. Mark all bit positions that are powers of two as check bits (positions 1, 2, 4, etc.).
2. All other bit positions are for the data to be encoded (positions 3, 5, 6, 7, etc.).
3. Each check bit calculates the parity of some of the bits in the code word. The position of the check bit determines the sequence of bits that it alternately checks and skips. For example: position 1: check 1 bit, skip 1 bit, check 1 bit, skip 1 bit, etc. (1, 3, 5, 7, 9, etc.); position 2: check 2 bits, skip 2 bits, check 2 bits, skip 2 bits, etc. (2, 3, 6, 7, etc.).
4. Set a check bit to 1 if the total number of ones in the positions it checks is odd; set it to 0 if that number is even.
5. Set the parity check bit to 1 if the total number of ones over all data bits and check bits is odd; set it to 0 if that number is even.

Each data bit is checked by several check bits, and an error causes the check bits covering it to change in response. Check each check bit and add up the positions of those that are wrong; the sum indicates the location of the bad bit. The parity check bit helps to detect a 2-bit error.

Error Checking and Correcting Circuits for the Memory. As shown in Fig. 4, the co-ordinate is passed to the configuration memory to select the cell's configuration, from which the corresponding check bits and a parity check bit are generated according to the coding process above; it is also passed to the standard check bits and parity check bit memory to read the standard ones. The discrepancy caused by errors can be detected by an XOR gate comparing their outputs. The error detecting and correcting unit judges the number of errors from the result of the comparison: it corrects the bad bit when one error occurs, or raises the 2-bit error signal when a 2-bit error is detected. If the generated check bits are different from
Fig. 4. Fault detection and fault tolerance for the configuration memory
standard ones, this indicates at least one error in the configuration bits. The comparison of the parity check bit is then necessary to identify the number of errors. A discrepancy of the parity check bit indicates a single error; the bad bit is then located according to the comparison of the check bits and is inverted to ensure that correct configuration bits are exported. Otherwise a 2-bit error is detected, which cannot be corrected at cell-level. The control unit receives the 2-bit error signal and activates array-level self-repair. All the cells in the faulty column select transparent configuration bits for testing. If a 2-bit error arises in the transparent configuration bits, the control unit generates the backup signal and the backup transparent configuration bits are used to keep the cell correctly transparent.

3.2 Fault Tolerance at Array-Level
A reconfiguration mechanism based on column-elimination (Fig. 5) is adopted for array-level fault tolerance in this paper.

Fig. 5. The process of array-level fault tolerance based on cell recombination
When a 2-bit error occurs in the memory or a fault is detected in the function unit, the control unit exports the transparent signal. The signal propagates along the column through an OR-gate network, and all the cells within that column are signaled to move to the transparent state. Consequently, the faulty column is removed from the array and its function is taken over by the spare column.
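The column-elimination step can be sketched as follows; the `any` along each column mirrors the OR-gate network, while the data layout and names are our assumptions:

```python
def column_eliminate(fault):
    """fault[r][c] is True if cell (r, c) reports a fault.  Returns the
    per-column transparent flags and the new co-ordinates, which skip
    transparent columns as cells pass the co-ordinate through west-to-east."""
    rows, cols = len(fault), len(fault[0])
    transparent = [any(fault[r][c] for r in range(rows)) for c in range(cols)]
    coords, x = [], 0
    for c in range(cols):
        if transparent[c]:
            coords.append(None)    # transparent cell forwards the co-ordinate
        else:
            coords.append(x)       # west co-ordinate plus 1 in the hardware
            x += 1
    return transparent, coords
```

A fault anywhere in a column thus makes the whole column transparent, and the columns to its east take over its co-ordinates, which is how the spare column inherits the eliminated column's function.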
4 Example
To illustrate the two-level fault tolerance property within embryonic systems, the design of a 4-bit adder-subtracter is presented. Fig. 6 shows the circuit structure. It is composed of four full adders and four XOR gates. Add_sub is the control signal: when it is '0', the circuit performs as a 4-bit adder; when it is '1', the circuit performs as a 4-bit subtracter.

Fig. 6. Schematic diagram of 4-bit adder-subtracter
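The circuit of Fig. 6 can be sketched bit by bit: when add_sub is 1, the XOR gates invert y and the initial carry-in completes the two's-complement negation, so the same adder chain performs x - y.

```python
def add_sub4(x, y, add_sub):
    """Gate-level sketch of the 4-bit adder-subtracter of Fig. 6.
    Returns (s, c): the 4-bit result and the final carry."""
    carry = add_sub                       # initial carry-in is add_sub
    s = 0
    for i in range(4):
        xi = (x >> i) & 1
        yi = ((y >> i) & 1) ^ add_sub     # one of the four XOR gates
        si = xi ^ yi ^ carry              # full adder sum
        carry = (xi & yi) | (carry & (xi ^ yi))  # full adder carry
        s |= si << i
    return s, carry
```

In subtract mode the final carry is 1 when no borrow occurred, the usual convention for a two's-complement subtracter.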
According to the cell architecture in Section 2, a cell is capable of implementing a full adder or two gates. Six cells are enough for this logic circuit, which can be mapped onto a 3*3 array with a column of spare cells.

Fig. 7. Implementation of 4-bit adder-subtracter in an embryonic array
The placing and routing in the embryonic array and the corresponding data in the memory are shown in Fig. 7. The four cells implementing the four full adders work in 3-LUT mode, and the two cells implementing the four XOR gates work in 2-LUT mode. The rightmost column of spare cells is ready for array-level fault tolerance based on reconfiguration.
If the circuit were designed with a function unit based on a LUT with four inputs and one output, at least twelve cells would be needed and the routing would be more difficult. Here six cells are enough, thanks to the reconfigurable function unit with its three operational modes. The improved function unit thus provides enhanced cellular function with reduced hardware resources.
5 Simulation and Results
The whole system, described in VHDL, is synthesized, placed and routed by the XST and P&R tools from XILINX, then post-simulated with ModelSim SE 6.0c.

5.1 Verification of Cell-Level Fault Tolerance
We take cell11 as the fault injection cell here. As shown in Fig. 8, cell11_configbits "10248E08CC0" represents the configuration exported from the memory with a 1-bit error. cell11_corrected_configbits "10248E08C40" represents the configuration bits corrected by the extended Hamming code. From the result we can see that the bad bit is corrected and the right configuration bits "10248E08C40" are exported to configure the function unit and switch box, ensuring that the sum s and carry c of adding x to y are right.

5.2 Verification of Array-Level Fault Tolerance
When a 2-bit error occurs in the memory or a fault is detected in the function unit, array-level self-repair is triggered by the fault signal from the cell-level self-test logic. The logic function is then mapped onto the non-faulty cells by column-elimination.
Fig. 8. The simulation result of memory fault tolerance with 1-bit error occurring
Fig. 9. The simulation result of the system with 2-bit error occurring in the memory
Fig. 10. The simulation result of the system with a fault occurring in the function unit
2-bit Error Occurring in the Memory. As shown in Fig. 9, when the configuration bits of cell11 from the memory are "10248E08C80", the 2-bit error signal cell11_memory_fault is high, indicating a 2-bit error. At this time the sum s and carry c of adding x to y are wrong, indicating the invalidation of cell-level self-repair. Then the transparent configuration bits "00000041C41", also carrying a 2-bit error, are chosen, and the system again exports a wrong result. The cell then goes into the backup state, setting the backup signal high, and works with the correct transparent configuration "00000041041". Notice that after some reconfiguration time, the result is right. The system returns to normal, the co-ordinates of the last column cells cell02_x, cell12_x, and cell22_x change from "10" to "01", and the spare column replaces the function of the faulty column.
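The 1-bit correction and 2-bit detection exercised in these simulations follow the extended Hamming construction of Sect. 3.1. A list-based Python sketch of that coding (not the VHDL implementation, and with our own function names):

```python
def secded_encode(data):
    """Extended Hamming encoder: check bits at power-of-two positions,
    data bits elsewhere, plus an overall parity bit stored at index 0."""
    m = len(data)
    r = 0
    while (1 << r) < m + r + 1:            # enough check bits for m data bits
        r += 1
    n = m + r
    code, it = [0] * (n + 1), iter(data)   # 1-based Hamming positions 1..n
    for pos in range(1, n + 1):
        if pos & (pos - 1):                # not a power of two: data bit
            code[pos] = next(it)
    for i in range(r):                     # check bit at position 2**i
        p = 1 << i
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    code[0] = sum(code) % 2                # overall parity bit
    return code

def secded_decode(code):
    """Returns ('ok' | 'corrected' | '2bit_error', code), fixing a
    single flipped bit in place."""
    n = len(code) - 1
    syndrome = 0
    for i in range(n.bit_length()):
        p = 1 << i
        if sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2:
            syndrome |= p                  # wrong check positions add up
    parity = sum(code) % 2
    if syndrome == 0 and parity == 0:
        return 'ok', code
    if parity == 1:                        # odd flip count: a single error
        code[syndrome if syndrome else 0] ^= 1
        return 'corrected', code
    return '2bit_error', code              # even parity, nonzero syndrome
```

The '2bit_error' branch is what would drive a signal like cell11_memory_fault high and hand control to the array-level repair.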
A Fault Occurring in the Function Unit. We introduce a signal cell11_fun_fault to simulate failures in the function unit. When it is '0', the inputs of the function unit are connected to the normal input signals; otherwise they are tied to the fixed logic value '1', simulating a stuck-at-1 fault. As shown in Fig. 10, when cell11_fun_fault goes to logic 1 at 200 ns, there is a small time interval during which the output of the circuit is unknown, just before returning to normal behavior. Then cell11 selects the transparent configuration bits "00000041041" and the co-ordinates of the last column cells cell02_x, cell12_x, and cell22_x change from "10" to "01": the spare column replaces the function of the faulty cell's column. When the fault disappears at 400 ns, reconfiguration is triggered and the correct result of subtracting y from x is obtained again.
6 Conclusions and Future Work
In this paper the architecture of embryonic systems is introduced. A controllable multi-mode function unit enhances the cell function and reduces the hardware resources. The configuration memory is endowed with 1-bit error correcting and 2-bit error detecting through an extended Hamming code. The fault-tolerant ability is thus improved by the two-level self-repair strategy, including memory fault tolerance at cell-level and cell recombination by column-elimination at array-level.
The direction of our future research is optimizing the cell's architecture and improving the resource utilization ratios and the ability of fault location at cell-level, in order to achieve automatic design of large scale digital circuits.

Acknowledgments. The work presented in this paper has been funded by the National Natural Science Foundation of China (60374008, 90505013) and the Aeronautic Science Foundation of China (2006ZD52044, 04I52068).
References

1. Mange, D., Sanchez, E., Stauffer, A., Tempesti, G., Marchal, P., Piguet, C.: Embryonics: A New Methodology for Designing Field-Programmable Gate Arrays with Self-Repair and Self-Replicating Properties. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 6(3), 387–399 (1998)
2. Ortega, C., Tyrrell, A.: Design of a Basic Cell to Construct Embryonic Arrays. IEE Proceedings on Computers and Digital Techniques 145(3), 242–248 (1998)
3. Moreno, J., Thoma, Y., Sanchez, E.: POEtic: A Prototyping Platform for Bio-inspired Hardware. In: Proceedings of the Eighth International Conference on Evolvable Systems: From Biology to Hardware, Barcelona, Spain, pp. 177–187 (2005)
4. Zhang, X., Dragffy, G., Pipe, A.G., Zhu, Q.M.: Artificial Innate Immune System: an Instant Defence Layer of Embryonics. In: Proceedings of the 3rd International Conference on Artificial Immune Systems, Catania, Italy, pp. 302–315 (2004)
5. Bradley, D.W., Tyrrell, A.M.: Immunotronics: Novel Finite-State-Machine Architectures with Built-In Self-Test Using Self-Nonself Differentiation. IEEE Transactions on Evolutionary Computation 6(3), 227–238 (2002)
6. Prodan, L., Udrescu, M., Vladutiu, M.: Self-Repairing Embryonic Memory Arrays. In: Proc. of the Sixth NASA/DoD Workshop on Evolvable Hardware, Seattle, USA, pp. 130–137 (2004)
7. Zhang, J., Zhang, X.: Extended Hamming Code Algorithm and Its Application in FLASH/EEPROM Driver. Ordnance Industry Automation 22(3), 52–54 (2003)
8. Ortega-Sanchez, C., Mange, D., Smith, S., Tyrrell, A.M.: Embryonics: A Bio-Inspired Cellular Architecture with Fault-Tolerant Properties. Genetic Programming and Evolvable Machines 1(3), 187–215 (2000)
Design on Operator-Based Reconfigurable Hardware Architecture and Cell Circuit

Min Xie, Youren Wang, Li Wang, and Yuan Zhang

College of Automation and Engineering, Nanjing University of Aeronautics and Astronautics, 210016 Nanjing, China
[email protected]
Abstract. Due to its generic and highly programmable nature, gate-based FPGA provides the ability to implement a wide range of applications. However, its small cells and complex interconnection network cause problems of low hardware resource utilization ratio and long interconnection time-delay in the compute-intensive information processing field. The PMAC (Programmable Multiply-Add Cell) presented in this article ensures high speed and flexibility by adding programmability to the multiply-add structure. The PMAC array architecture resolves these problems and greatly increases the resource utilization ratio and the efficiency of information processing. After establishing and simulating the PMAC model, the PMAC array is realized on the Virtex-II Pro series XC2VP100 device. By implementing the FFT butterfly operation and a 4th order FIR filter on the PMAC array, the flexibility and correctness of the architecture are proved. The results also show an average increase of 28.3% in resource utilization ratio and a decrease of 15.5% in interconnection time-delay.

Keywords: Reconfigurable computing, Reconfigurable hardware, FPGA, Operator-based programmable cell circuit, Information processing.
1 Introduction

There are three types of information processors: microprocessors, ASICs, and PLDs. Microprocessors, which are based on software programs, are limited in performance by their lack of customizability, and their processing speed is slower than that of an ASIC. ASICs are fast enough to satisfy real-time processing demands, but they have the disadvantages of a fixed architecture and no programmability. FPGA strikes a balance between speed and flexibility, but it has the defects of low resource utilization ratio, long interconnection time-delay and high power dissipation [1], [2]. Recently there has been a significant increase in the size of gate-based FPGAs. Architectures contain tens of thousands to hundreds of thousands of logic elements, providing a logic capacity in the millions of gates, which can implement a wide range of applications. However, when a gate-based FPGA is used for algorithm-level evolvable hardware, such an architecture suffers from too long a chromosome and too big a search space, which prevents on-line evolution of the operator-based circuit. Therefore it is necessary to use the operator-based circuit as the fundamental unit for evolving the algorithm-level circuit.

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 140–150, 2007.
© Springer-Verlag Berlin Heidelberg 2007
Design on Operator-Based Reconfigurable Hardware Architecture and Cell Circuit
141
In recent years, more and more coarse-grain programmable cell circuits have been researched [3], [4], [5]. Paul Franklin [6] presented the architecture of RaPiD (Reconfigurable Pipelined Data-path); its special data-path provides high-performance assistance with regular calculations for DSP processors, but it is a linear structure which can achieve high performance only in linear systems. Higuchi [7] used a DSP processor as the cell circuit, but this architecture occupied too much hardware resource to be implemented on a single chip. Aiming at the disadvantages of gate-based FPGA in the information processing field, this article presents a new architecture based on the operator-based PMAC, which balances speed, flexibility, power consumption and resource utilization ratio.
2 Reconfigurable Hardware Architecture

2.1 The General Architecture

The general architecture is based on a 2-D cell array structure, as shown in Fig. 1. The architecture contains five main parts: the reconfigurable cell circuit PMAC, the programmable interconnection network (Configurable Interconnection), embedded memory (BlockRAM), the I/O programmable resource (I/O routing pool), and the I/O blocks (not described in Fig. 1). The hard core of the architecture is a 16*16 cell array. Based on the characteristics of information algorithms, a hierarchical architecture is adopted. The 1st level is the 8-bit PMAC, which implements byte-based data processing. The 2nd level is the CLB, which is composed of 4 PMACs; such a design makes it easy to expand the byte-based cell PMAC into the word-based cell CLB, and the CLB array makes word-based information processing available. If expansion is unnecessary, the 4 PMACs in a CLB can implement a byte-based FFT butterfly operation or a 4th order FIR filter, etc. The 3rd level is the Hunk, which is composed of 4 CLBs. When the CLB is a word-based cell, the Hunk can implement the word-based FFT butterfly operation or 4th order FIR filter, etc.
Fig. 1. The general reconfigurable hardware architecture
In the architecture, the hierarchy of cells determines the hierarchy of the programmable interconnection network: PMAC, CLB and Hunk each have their own programmable interconnection network. The BlockRAM, which is adjacent to the Hunk,
142
M. Xie et al.
satisfies the memory demands of information processing. The I/O routing pool is the important resource for interconnection between the inner logic and the I/O blocks.

2.2 Design of the Operator-Based Programmable Cell PMAC

The design of a cell circuit for highly efficient information processing depends on an analysis of the inherent characteristics of the target algorithms. When information processing algorithms are implemented in hardware, not only the calculation volume and complexity, but also the regularity and modularity should be taken into account. Analyzing a range of algorithms, such as FFT/IFFT, FIR/IIR, DCT/IDCT, convolution and correlation, we conclude that these algorithms use large numbers of multiplications and additions, and that the ratio of additions to multiplications is approximately 1:1. So the PMAC adopts an 8-bit CLA (Carry Look-ahead Adder) and an 8-bit Booth multiplier as its core, as shown in Fig. 2.
Fig. 2. PMAC cell circuit architecture
Most coarse-grained cells are based on 4-bit or 8-bit ALUs, which have powerful functions, but their logic cells are more complex and the 1:1 rule is not sufficiently taken into account. The functional and technical features and advantages of the PMAC, the hard core of the architecture, are as follows:

(1) Configuration of the multiplier input registers: the multiplier input registers can be configured as shift registers or general registers by setting MUX1 and MUX2. Data are shifted into the registers from ShiftinA and ShiftinB, or input from DataA and DataB. By setting MUX3 and MUX4 the registers can be bypassed.
(2) Configuration of the multiplier output registers: the registers between the adder and the Booth multiplier make pipelined data processing available. If the registers are unnecessary, they can be bypassed by setting MUX5. Setting MUX5, MUX6 and MUX7 changes the connection modes of the adder and multiplier; these modes can accomplish the addition of any 2 operands among the multiplier output MULin, Add, and the external inputs DataC and DataD.
(3) Configuration of multiplier expansion [8]: sometimes the precision of byte-based processing is insufficient and word-based processing is needed. Based on the distributive law of multiplication, the 4 byte-based multipliers in a PMAC can be expanded into one word-based multiplier in the CLB.
Design on Operator-Based Reconfigurable Hardware Architecture and Cell Circuit
143
(4) Configuration of the adder module: a_s is the addition/subtraction control signal; the module is an adder when a_s is high. The inputs of the module can be MULin, DataC, DataD or Add. The module can accomplish the following operations: DataC plus/minus MULin, which is the most common information processing operator; DataD plus/minus DataC, which means the adder can be used separately; and, by setting MUX8 and MUX9, the adder can be configured as an accumulator or inverse accumulator, which saves much resource when the volume of additions exceeds that of multiplications.
(5) Configuration of adder expansion: the byte-based adder can be expanded into a word-based adder. Saturation arithmetic is adopted for the adder: the sum is clamped to the maximal value on overflow and to the minimal value on underflow.
(6) Handshaking signals among PMACs: input-effective new_data, trigger-enable en, current-level-computing-finished done, current-level-computing-now busy, begin-to-compute start, and ready-to-accept-data rdyfdata. There are also some control and logic signals: clock clk, partial product input part_pro_in, partial product output part_pro_out, carry in carryin, and carry out carryout.

PMAC is a powerful cell. A single PMAC can accomplish byte-based multiplication, addition/subtraction, accumulation, inverse accumulation, multiply-accumulation, inverse multiply-accumulation, etc.

2.3 Design of the Programmable Interconnection Network

In the CLB, the programmable interconnection of the 4 PMACs is shown in Fig. 3.
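The saturation behavior described in point (5) above clamps an overflowing sum to the representable range instead of letting it wrap around. A minimal sketch of this rule (our illustrative Python, not the authors' VHDL; the function name is ours):

```python
def sat_add8(a: int, b: int, signed: bool = True) -> int:
    """8-bit saturating add: clamp the sum to the representable range
    instead of wrapping around on overflow/underflow."""
    lo, hi = (-128, 127) if signed else (0, 255)
    return max(lo, min(hi, a + b))

print(sat_add8(100, 50))    # 127, not the wrapped two's-complement value -106
print(sat_add8(-100, -50))  # -128
```

Clamping keeps a large filter output at the nearest representable extreme, which distorts far less than the sign flip a wrapping adder would produce.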
Fig. 3. Programmable interconnection network in CLB

Fig. 4. Testing and scanning chain for shift registers in CLB
Small circles denote programmable bits. In a generic FPGA, in order to satisfy the highly programmable nature, the programmable interconnection network is distributed symmetrically and a switch matrix is adopted. Because information processing dataflows are regular, we adopt a direct interconnection mode; such a design, without a switch matrix, saves much chip area. A simply configured CLB can form a shift register that tests the shift registers in the PMAC array, as shown in Fig. 4. Black dots denote effective connections. The other parts of the PMAC array can of course be tested in the same way.
3 Analysis of Experimental Results

3.1 Hardware Resource Consumption of PMAC

PMAC is modeled on the ISE 6.2 development platform of Xilinx and simulated on the ModelSim SE 6.0 simulation platform of Mentor Graphics. The hardware platform is a XC2VP100 FPGA. We used VHDL to describe the cell circuit PMAC and the general architecture, synthesized the design with Xilinx Synthesis Technology (XST), and finally configured the programmable bit file into the XC2VP100. Table 1 gives the resource consumption of the main units in the PMAC. Add_Sub, Booth_Multi and Shif_Regi*2 denote the 8-bit adder/subtracter with registered output, the 8-bit Booth multiplier with registered output, and two 8-bit shift registers, respectively.

Table 1. The resource consumption of PMAC on the XC2VP100

Inner units    Total Equivalent Gates    Slices    Slice FF
Add_Sub        160                       4         8
Booth_Multi    643                       33        29
Shif_Regi*2    131*2                     4*2       8*2
PMAC           1235                      40        53
The PMAC, which consumes only 1235 equivalent NAND gates in total, occupies 40 slices. The hardware resource utilization ratio is only 66%, and that of the registers is only 65% [9]. The reason is that the PMAC needs many 1-bit adders in the slices but rarely needs the other logic resources. On the other hand, the 40 separate slices need to be interconnected, and the interconnection lines would be long and complex; such interconnection wastes much power and increases the delay. If the embedded MULT18x18 block is adopted to implement the 8-bit multiplier, 33 more slices are needed to form a PMAC; employing the embedded block neither increases the resource utilization ratio nor decreases the power consumption. If the operator-based PMAC is adopted, several cells with simple interconnections can implement certain information processing algorithms. Such an architecture has three main advantages. First, the PMAC can implement many algorithms on the cell array flexibly. Second, the Booth multiplier and CLA in the PMAC are both compact and high-speed computing parts, and the interconnection between PMACs is short and simple, which ensures a short delay, so a high clock frequency can be achieved. Third, complex interconnection is one of the main reasons why a generic FPGA consumes much power; simplifying the interconnection also reduces the power. If the Booth multiplier, CLA and shift registers in the PMAC are fully used, in other words, if the ratio of multiplication to addition volume is 1:1, the inner resource utilization ratio is about 97.25%, exceeding that of a generic FPGA by 33%. The PMAC thus balances flexibility, speed, power consumption and resource utilization ratio, with good results.
3.2 Performance Analysis of PMAC
Fig. 5 shows the post-place-and-route simulation of the main functions of the PMAC. The definitions of the input/output pins correspond to Fig. 2. In the simulation, sh_ctrl denotes the MUX1 and MUX2 configuration bits; when sh_ctrl is high, shina and shinb are shifted by 1 bit and output on shouta and shoutb. a_s_rol denotes the MUX6, MUX7, MUX8 and MUX9 configuration bits. When (a_s)&(a_s_rol) = 10, the PMAC accomplishes the most common operation, multiply-add; when (a_s)&(a_s_rol) = 11, the PMAC implements multiply-accumulation. Table 2 shows the combinational control functions of a_s and a_s_rol.

Table 2. The add/subtract mode control

           a_s_rol = 0             a_s_rol = 1
a_s = 0    data_c-shina*shinb      add(i)-mul(i-1)
a_s = 1    data_c+shina*shinb      add(i)+mul(i-1)
In the simulation, the data format is Q7Q6Q5Q4Q3Q2Q1Q0. All data are scaled at Q4, that is, the last 4 bits Q3Q2Q1Q0 denote the fractional part while the first 4 bits Q7Q6Q5Q4 denote the integer part, so the data precision is 0.0625 (1/16). For example, the first group of inputs is shina=57=>3.5625, shinb=25=>1.5625, data_c=16=>1, and the corresponding outputs are mul=89=>5.5625, add=105=>6.5625. The sign "=>" denotes the scaled value of the data; the average error between the outputs and the actual calculation results 5.5664 and 6.5664 is 0.0039. In a concrete application, the scaling scheme can be selected; for example, the rotating factor W in the FFT can be scaled at Q7, so its precision is 0.0078 (1/128).
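The Q4 arithmetic of this first input group can be reproduced in a few lines (a Python sketch of the scaling only, not the authors' VHDL; the helper names are ours):

```python
def to_q4(x: float) -> int:
    """Quantize a real value to Q4 fixed point (4 fractional bits)."""
    return int(round(x * 16))

def from_q4(n: int) -> float:
    """Recover the real value represented by a Q4 integer."""
    return n / 16

def q4_mul(a: int, b: int) -> int:
    # Q4 * Q4 yields a Q8 product; shift right by 4 to return to Q4.
    return (a * b) >> 4

# First input group from the paper's simulation
shina, shinb, data_c = 57, 25, 16       # 3.5625, 1.5625, 1.0
mul = q4_mul(shina, shinb)              # multiply stage
add = data_c + mul                      # multiply-add (a_s=1, a_s_rol=0)

print(mul, from_q4(mul))   # 89 5.5625
print(add, from_q4(add))   # 105 6.5625
```

The 0.0039 average error quoted above is exactly the truncation introduced by the `>> 4` step.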
Fig. 5. The Post-Place&Route simulation of PMAC implemented on XC2VP100
Fig. 5 also shows that the PMAC has three levels of registers; the data in the rectangles belong to the 1st group. The 1st level is the shift registers: shouta and shoutb lag shina and shinb by 1 cycle. The 2nd level is the multiplier output registers: the products lag shina and shinb by 2 cycles. The 3rd level is the adder output registers: the sum lags data_c by 3 cycles. As shown in Fig. 5, the functions of the PMAC are correct and stable.
3.3 Implementation of Information Processing Algorithms on the PMAC Array

3.3.1 Implementation of the FFT Butterfly Operation on the PMAC Array

The FFT, which transforms signals from the time domain to the frequency domain, is an important information processing algorithm. Its general formulas are:

    X(k) = FFT[x(n)] = \sum_{n=0}^{N-1} x(n) W_N^{kn}                    (1)

    x(n) = IFFT[X(k)] = (1/N) \sum_{k=0}^{N-1} X(k) W_N^{-kn}            (2)

where 0 ≤ k ≤ N-1 and the rotating factor W_N = e^{-j 2\pi / N}. Formulas (1) and (2) show that the FFT and IFFT are very similar: both are based on many regular butterfly operations. The butterfly operation dataflow is shown in Fig. 6.
Fig. 6. Butterfly operation dataflow
A, B and W are all complex numbers: A = a1 + a2i, B = b1 + b2i, W = w1 + w2i. The butterfly operation can be divided into two parts, each containing 4 real additions and 4 real multiplications. By changing the a_s signal of the cells periodically, the butterfly operation can easily be implemented on 4 PMACs (Fig. 7).
Fig. 7. The interconnection mode of butterfly operation implemented on PMAC array
In Fig. 7, the solid lines denote the effective interconnection while the shadow ones ineffective. Every butterfly operation is completed in two cycles, that is to say, each group of inputs keeps two cycles; in the first cycle, a_s_1, a_s_3 and a_s_4 are in high level, and a_s_2 is in low level; the 4 PMACs accomplish the operation of formulas (3) and (4) which is the A+BW branch. In the second cycle, the 4 a_s signals inverse; the 4 PMACs implement the operation of formulas (5) and (6) which is the A-BW branch.
Design on Operator-Based Reconfigurable Hardware Architecture and Cell Circuit
147
    Out_re1 = (a1 + b1*w1) + b2i*w2i                                     (3)

    Out_im1 = (a2i + b2i*w1) + b1*w2i                                    (4)

    Out_re2 = (a1 - b1*w1) - b2i*w2i                                     (5)

    Out_im2 = (a2i - b2i*w1) - b1*w2i                                    (6)
In this process, the rotating factor W is scaled at Q7 and the other data are scaled at Q4. The simulation is shown in Fig. 8. For example, for the 1st group of data, a1=96, a2=64, b1=48, b2=56, w1=48, w2=64; out_re1=91=>5.6875≈5.625, out_re2=97=>6.0625≈6.025, out_im1=108=>6.75≈6.8125, out_im2=21=>1.3125≈1.2875. The average error between the outputs and the actual calculation results is about 0.0406.
Fig. 8. The Post-Place&Route simulation of butterfly operation implemented on PMAC array
We implemented the same butterfly operation on the (FPGA)Slices array and on the MULT&Slice resources and made a comparison. Table 3 shows the performance and efficiency of the butterfly operation implemented on the different resources.

Table 3. Performance and efficiency comparison of the butterfly operation implemented on different hardware resources

Logic resource   Frequency (MHz)   Utilization ratio   Logic time-delay   Interconnection time-delay
(FPGA)Slices     122.654           65%                 70.7%              29.3%
MULT&Slice       174.368           <65%                85.4%              14.6%
PMAC             225.887           97.25%              84.8%              15.2%
(FPGA)Slices denotes that only FPGA slices are used to accomplish the butterfly operation; MULT&Slice denotes that the embedded MULT18x18 cores and slices are both used; PMAC denotes that the operation is implemented on the PMAC array. The comparison results prove that the PMAC array has high speed, a high resource utilization ratio, and a short interconnection delay, especially compared with (FPGA)Slices.

3.3.2 Implementation of a 4th-Order FIR on the PMAC Array

The FIR is one of the most important filters and has a wide range of applications in digital image processing, noise removal, and data compression. Its formula is:

    y[n] = x[n] * h[n] = \sum_{k=0}^{N-1} x[k] h[n-k]                    (7)
Here h[0], h[1], ..., h[N-1] are the coefficients of the filter. When the PMAC array is adopted to implement the FIR, the shift registers can be fully utilized. A 4th-order FIR can be implemented on 4 PMACs, as shown in Fig. 9.
Fig. 9. The interconnection mode of 4th order FIR implemented on PMAC array
The PMACs accomplish formulas (8)-(11) in sequence:

    fir1 = init + h0 * x[n]                                              (8)

    fir2 = fir1 + h1 * x[n-1]                                            (9)

    fir3 = fir2 + h2 * x[n-2]                                            (10)

    fir = fir3 + h3 * x[n-3] = init + \sum_{k=0}^{3} h[k] x[n-k]         (11)
fir1, fir2, fir3 and fir denote the 1st-, 2nd-, 3rd- and 4th-order FIR outputs, respectively. The scaling scheme in the FIR is similar to that in the FFT: the constant coefficients are scaled at Q7 and the other data at Q4; so h0, h1, h2 and h3 are scaled at Q7 and the other data at Q4. The simulation of the 4th-order FIR is shown in Fig. 10; the data in the rectangle show the dataflow of the 4th-order FIR. We also implemented the same 4th-order FIR on the three kinds of hardware resources. Table 4 gives the performance and efficiency comparison of the 4th-order FIR implemented on the different resources. Results similar to the FFT simulations are achieved: the PMAC array obtains a high frequency, a high hardware resource utilization ratio, and a short interconnection delay.
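The tap cascade of formulas (8)-(11) can be sketched in plain floating point (our illustrative Python, not the Q4/Q7 hardware; names like `fir4` are ours):

```python
def fir4(x, h, init=0.0):
    """Direct-form 4-tap FIR: y[n] = init + sum_{k=0}^{3} h[k]*x[n-k].

    Mirrors the cascade fir1 -> fir2 -> fir3 -> fir of formulas (8)-(11):
    each stage adds one tap, as one PMAC does in the chain.
    """
    out = []
    delay = [0.0, 0.0, 0.0]           # x[n-1], x[n-2], x[n-3] (zero history)
    for xn in x:
        taps = [xn] + delay
        acc = init
        for hk, xk in zip(h, taps):   # four multiply-accumulate stages
            acc += hk * xk
        out.append(acc)
        delay = [xn] + delay[:2]      # the shift registers advance
    return out

# Impulse response recovers the coefficients
print(fir4([1, 0, 0, 0, 0], [0.5, 0.25, 0.125, 0.0625]))
# [0.5, 0.25, 0.125, 0.0625, 0.0]
```

The `delay` list plays the role of the PMAC shift registers: each new sample pushes the oldest one out, so the four multiply-accumulate stages always see x[n] down to x[n-3].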
Fig. 10. The post-place-and-route simulation of the 4th-order FIR implemented on the PMAC array

Table 4. Performance and efficiency of the 4th-order FIR implemented on different hardware resources

Logic resource   Frequency (MHz)   Utilization ratio   Logic time-delay   Interconnection time-delay
(FPGA)Slices     88.320            65%                 72.8%              27.2%
MULT&Slice       174.657           <65%                83.3%              16.7%
PMAC             188.332           89.39%              81.8%              18.2%
4 Conclusion

Aiming at multimedia information processing applications, this article presents an operator-based reconfigurable hardware architecture and introduces the functional and technical features of the PMAC in detail. The implementations of the FFT butterfly operation and the 4th-order FIR on the PMAC array show that, compared with a gate-based FPGA, this architecture achieves an average increase of 28.3% in resource utilization ratio and an average decrease of 15.5% in interconnection delay. The PMAC-based reconfigurable architecture can serve as the evolutionary model of evolvable hardware for online algorithm-level evolution. However, because this architecture adopts fixed-point operations, which trade off data precision against data range, keeping large values from overflowing and small values from being lost needs further research. The main future work is to perfect and optimize the PMAC cell circuit, the interconnection resources, and the memory block.

Acknowledgments. The work presented in this paper has been funded by the National Natural Science Foundation of China (60374008, 90505013) and the Aeronautic Science Foundation of China (2006ZD52044, 04I52068).
References

1. Kuon, I., Rose, J.: Measuring the gap between FPGAs and ASICs. In: ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays, pp. 21–30 (2006)
2. Gao, H.: Study on Design Techniques of a SRAM-Based Field Programmable Gate Array (in Chinese), pp. 20–26. Xidian University, Xi'an (2005)
3. Singh, H., et al.: MorphoSys: An Integrated Reconfigurable System for Data-Parallel and Communication-Intensive Applications. IEEE Transactions on Computers 49(5), 465–481 (2000)
4. Komuro, T., Ishii, I., Ishikawa, M., Yoshida, A.: A digital vision chip specialized for high-speed target tracking. IEEE Transactions on Electron Devices 50(1), 191–199 (2003)
5. Galanis, M.D.: A Reconfigurable Coarse-Grain Data-Path for Accelerating Computational Intensive Kernels. Circuits, Systems and Computers 14(9), 887–893 (2005)
6. Ebeling, C., Cronquist, D.C., Franklin, P.: RaPiD: Reconfigurable Pipelined Datapath. In: 6th International Workshop on Field-Programmable Logic and Applications, pp. 126–135 (1996)
7. Higuchi, T., Murakawa, M., Iwata, M., Kajitani, I., Liu, W., et al.: Evolvable hardware at function level. In: Proceedings of the IEEE International Conference on Evolutionary Computation, pp. 187–192 (1997)
8. Wu, H., Zhang, L.: Design of a Novel Reconfigurable Multiplier (in Chinese). Microelectronics & Computer 21(9), 161–163 (2004)
9. Huang, Z.-j., Zhou, F., Tong, J.-r., et al.: A new FPGA logic block suitable for datapath circuits (in Chinese). Computer Engineering and Design 21(3), 29–34 (2000)
Bio-inspired Systems with Self-developing Mechanisms

André Stauffer, Daniel Mange, Joël Rossier, and Fabien Vannel

Ecole polytechnique fédérale de Lausanne (EPFL), Logic Systems Laboratory, CH-1015 Lausanne, Switzerland
[email protected]
Abstract. Bio-inspired systems borrow three structural principles characteristic of living organisms: multicellular architecture, cellular division, and cellular differentiation. Implemented in silicon according to these principles, our cellular systems are endowed with self-developing mechanisms like configuration, cloning, cicatrization, and regeneration. These mechanisms are made of simple processes such as growth, load, branching, repair, reset, and kill. The hardware simulation and hardware implementation of the self-developing mechanisms and their underlying processes constitute the core of this paper.
1 Introduction

Borrowing three structural principles (multicellular architecture, cellular division, and cellular differentiation) from living organisms, we have already shown how embryonic hardware [2] [3] [1] is able to grow a bio-inspired system in silicon thanks to two algorithms: an algorithm for cellular differentiation, based on coordinate calculation, and an algorithm for cellular division, the Tom Thumb algorithm [4]. Implemented as a data and signals cellular automaton (DSCA) [8], this embryonic hardware is endowed with self-developing properties like configuration, cloning, cicatrization, and regeneration [6]. In a previous work [7], the configuration mechanisms (structural and functional growth), the cloning mechanism (cellular and organismic growth), the cicatrization mechanism (cellular self-repair), and the regeneration mechanism (organismic self-repair) were already devised as the result of simple processes like growth, load, branching, repair, reset, and kill. The goal of this paper is to implement these processes in hardware in order to organize systems and deal with their faults in a fully automatic way. Starting with a very simple cell made of only six molecules, Section 2 introduces hardware simulations of its DSCA implementation to describe these self-developing mechanisms. We then define a small organism made of three cells, the "SOS" acronym, as an application example for the simulation of our mechanisms (Section 3). The hardware implementation of the "SOS" acronym is realized thanks to CONFETTI, a reconfigurable hardware platform for bio-inspired architectures (Section 4). A brief conclusion (Section 5) summarizes the paper and opens new research avenues.

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 151–162, 2007.
© Springer-Verlag Berlin Heidelberg 2007
152
A. Stauffer et al.

2 Self-developing Mechanisms

2.1 Structural Configuration
The goal of the structural configuration mechanism is to define the boundaries of the cell as well as the living or spare mode of its constituent molecules. This mechanism is made up of a structural growth process followed by a load process. For a better understanding of these processes, we apply them to a minimal self-developing cell. This cell is made up of six molecules arranged as an array of two rows by three columns, the third column involving two spare molecules dedicated to self-repair. The growth process starts when a growth signal is applied to the lower left molecule of the cell at time t = i (Fig. 1). Every four time steps (t = i+1, i+5, ...), according to the structural configuration data or structural genome, a molecule of the cell selects one of its four data inputs (Fig. 2) in order to create a data path among the molecules of the cell. This path allows one copy of the structural data to be trapped in the memory positions of the cell while another copy moves around the cell. At time t = i+24, when the connection path between the molecules is about to close, the lower left molecule delivers a close signal to the nearest left neighbor cell. The path actually closes at time t = i+25.
Fig. 1. Structural growth process of a minimal self-organizing cell made up of six molecules when a growth signal is applied to the lower left molecule at time t = i; when the path is about to close (t = i + 24), the lower left molecule delivers a close signal
Fig. 2. Data input selection. (1) Northward. (2) Eastward. (3) Southward. (4) Westward.
The load process is triggered by the close signal applied to the lower right molecule of the cell at time t = i (Fig. 3). At each time step (t = i+1 to t = i+3), a load signal propagates westward and northward through the cell, and each of its molecules acquires a molecular mode (Fig. 4) and a molecular type (Fig. 5). At time t = i+4, we finally obtain a homogeneous tissue of molecules defining both the boundaries of the cell and the positions of its living-mode and spare-mode molecules. This tissue is ready to be configured by the functional configuration data.

Fig. 3. Triggered by the close signal of the nearest right neighbor cell (t = i), the load process stores the molecular types and modes of the artificial cell

Fig. 4. Molecular modes. (1) Living. (2) Spare. (3) Faulty. (4) Repair. (5) Dead.

Fig. 5. Molecular types. (1) Internal. (2) Top. (3) Top-left. (4) Left. (5) Bottom-left. (6) Bottom. (7) Top-right. (8) Right. (9) Bottom-right.

2.2 Functional Configuration
The goal of the functional configuration mechanism is to store, in the homogeneous tissue that already contains the structural data (Fig. 3, t = i+4), the functional data needed by the specifications of the current application. This mechanism is a functional growth process performed only on the molecules in the living mode, while the molecules in the spare mode are simply bypassed. Every four time steps (t = i+1, i+5, ...), according to the functional configuration data or functional genome, a path is created among the molecules of the cell in order to trap a copy of the functional data in the memory positions of the cell and to move another copy of the functional data around the cell (Fig. 6). At time t = i+17, the final cell is made up of four living molecules organized as an array of two rows by two columns, while the column of two spare molecules is bypassed. The final specifications of the cell under construction are now stored as functional data in its living molecules.
Fig. 6. Functional configuration of the living molecules
2.3 Cloning

The cloning mechanism, or self-replication mechanism, is implemented at the cellular level in order to build a multicellular organism, and at the organismic level in order to generate a population of organisms. This mechanism supposes that there exists a sufficient number of molecules in the array to contain at least one copy of the additional cell or of the additional organism. It corresponds to a branching process, which takes place when the structural and functional configuration mechanisms deliver northward and eastward growth signals on the borders of the cell during the corresponding growth processes (Fig. 7).
Fig. 7. Northward and eastward growth signals triggering the cloning mechanism. (a) Structural branching processes. (b) Functional branching processes.
2.4 Cicatrization

Fig. 6, at time t = i+17, shows the normal behavior of a healthy minimal cell, i.e. a cell without any faulty molecule. A molecule is considered faulty, or in the faulty mode, when some built-in self-test detects a lethal malfunction. Starting from the normal behavior of Fig. 6 (t = i+17), we suppose that two molecules suddenly become faulty (Fig. 8, t = i): (1) the lower left molecule, which is in the living mode, and (2) the upper right molecule, which is in the spare mode. While nothing changes for the upper right molecule, which is simply no longer able to play the role of a spare molecule, the lower left one triggers a cicatrization mechanism. This mechanism is made up of a repair process involving repair signals (Fig. 8, t = i+1 to t = i+3) followed by a reset process performed with reset signals (t = i+4 to t = i+6). The tissue, now comprising two molecules in the faulty mode and two molecules in the repair mode, is ready to be reconfigured by the functional configuration data. This implies a functional growth process bypassing the faulty molecules (Fig. 9).

Fig. 8. Cicatrization mechanism performed as a repair process (t = i + 1 to i + 3) followed by a reset process (t = i + 4 to i + 6)

Fig. 9. Functional reconfiguration of the living and repair molecules

2.5 Regeneration
Our minimal self-developing cell comprises a single spare molecule per row and therefore tolerates only one faulty molecule in each row. A second faulty molecule in the same row triggers the death of the whole cell and the start of a regeneration mechanism. Fig. 10 illustrates the repair and kill processes involved in this mechanism. Starting from the normal behavior of the cicatrized cell (Fig. 9, t = i+17), a new molecule, the upper middle one, becomes faulty. In a first step, the new faulty molecule sends a repair signal eastward in order to look for a spare molecule able to replace it (t = i+1). In a second step, the supposed spare molecule, which is in fact a faulty one, enters the lethal dead mode and triggers kill signals northward, westward and southward (t = i+2). In the next three steps, the other molecules of the array enter the dead mode. At time t = i+5, the original minimal cell is dead.
Fig. 10. Regeneration mechanism performed as a repair process (t = i + 1) followed by a kill process (t = i + 2 to i + 5)
3 SOS Acronym Application

3.1 Structural Configuration, Functional Configuration and Cloning

Even if our final goal is the self-development of complex bio-inspired systems, we will use an extremely simplified application example, the display of the "SOS" acronym, in order to illustrate the basic mechanisms. The system that displays the acronym can be considered as a one-dimensional artificial organism composed of three cells (Fig. 11). Each cell is identified by an X coordinate ranging from 1 to 3. For the coordinate values X = 1 and X = 3, the cell implements the S character; for X = 2, it implements the O character. Such a cell, capable of displaying either the S or the O character, is a totipotent cell comprising 4 × 6 = 24 molecules. An incrementer implementing the X coordinate calculation is embedded in the final organism.
Fig. 11. One-dimensional organism composed of three cells resulting from the structural configuration, functional configuration and cloning mechanisms applied to a totipotent cell
In order to build the multicellular organism of Fig. 11, the structural configuration mechanism, the functional configuration mechanism, and the cloning mechanism are applied at the cellular level. Starting with the structural and functional configuration data of the totipotent cell, these mechanisms successively generate the three cells X = 1 to X = 3 of the organism "SOS".
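The combination of coordinate calculation and a shared totipotent genome can be sketched in a few lines (a Python sketch of the idea, not the DSCA hardware; the function names and the cyclic increment X+ = X mod 3 + 1 used below are ours):

```python
def differentiate(x: int) -> str:
    """Totipotent cell: expresses 'S' or 'O' depending on its X coordinate."""
    return "O" if x == 2 else "S"

def grow_organism(n_cells: int) -> str:
    """Grow cells left to right; each cell derives its coordinate from
    its left neighbor, then differentiates from the same genome."""
    x = 1
    cells = []
    for _ in range(n_cells):
        cells.append(differentiate(x))
        x = x % 3 + 1        # cyclic coordinate: 1 -> 2 -> 3 -> 1
    return "".join(cells)

print(grow_organism(3))   # SOS
print(grow_organism(6))   # SOSSOS (cloning of the organism)
```

Every cell holds the full genome; only the coordinate decides which part is expressed, which is exactly what makes the cell totipotent.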
3.2 Cicatrization and Functional Reconfiguration
The cicatrization mechanism (cellular self-repair) results from the introduction in each cell of one column of spare molecules (Fig. 11), defined by the structural configuration of the totipotent cell, and from the automatic detection of faulty molecules. Thanks to this mechanism, each of the two faulty molecules of the middle cell (Fig. 12) is deactivated, isolated from the network, and replaced by its nearest right molecule, which is itself replaced by its nearest right molecule, and so on until a spare molecule is reached. The functional reconfiguration mechanism then takes place in order to regenerate the O character of the organism "SOS". As shown in Fig. 12, the regenerated character presents some graphical distortion.
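The rightward replacement rule can be sketched per row (our illustrative Python, not the molecular implementation; the names are ours):

```python
def cicatrize(functions, n_cols, faulty):
    """Map the living functions of one row onto its non-faulty molecules,
    shifting rightward into the spare columns. functions: the row's living
    functions, left to right; n_cols: physical molecules in the row (the
    extra ones are spares); faulty: set of faulty column indices.
    Returns (column, function) pairs, or None when spares are exhausted,
    which triggers the regeneration mechanism (death of the cell)."""
    healthy = [c for c in range(n_cols) if c not in faulty]
    if len(healthy) < len(functions):
        return None
    return list(zip(healthy, functions))

# 2 living functions, 1 spare column, as in the minimal cell of Section 2
print(cicatrize(["f0", "f1"], 3, faulty={0}))      # [(1, 'f0'), (2, 'f1')]
print(cicatrize(["f0", "f1"], 3, faulty={0, 1}))   # None -> kill the cell
```

The graphical distortion of Fig. 12 is visible in this model too: after repair, each function sits one column to the right of its original position.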
Fig. 12. Graphical distortion resulting from the cicatrization and reconfiguration mechanisms applied to the middle cell of the organism
3.3 Regeneration

The totipotent cell of the organism "SOS" has only one spare column and thus tolerates only one faulty molecule per row. When a second one is detected, the regeneration mechanism (organismic self-repair) takes place: the entire column of cells to which the faulty cell belongs is considered faulty and is deactivated (column X = 2 in Fig. 13; in this simple example, the column of cells is reduced to a single cell). All the functions (X coordinate and configuration) of the cells to the right of column X = 1 are shifted by one column to the right. Obviously, this process requires as many spare cells to the right of the array as there are faulty cells to repair. As shown in Fig. 13, the reparation of one faulty cell needs one spare cell to the right and leaves a scar in the organism "SOS".

3.4 Basic Processes
As defined in Section 2, the self-developing mechanisms are made up of basic processes like growth, load, repair, reset and kill. Fig. 14 illustrates these processes as they are performed on the middle cell of the organism "SOS". During the structural configuration mechanism, the growth of the data path of the middle cell (Fig. 14a) leads to the definition of the boundaries as well as the living or spare mode of its constituent molecules (Fig. 14b). The growth process of the functional configuration mechanism expresses the O character, which is part of the specifications of the current application (Fig. 14c). The reset process following the repair of the cicatrization mechanism (Fig. 14d) allows a functional regrowth process that bypasses the faulty molecules (Fig. 14e). The detection of a second faulty molecule in the third lower row of the middle cell triggers the kill process of the regeneration mechanism (Fig. 14f).

Fig. 13. Scar resulting from the regeneration mechanism applied to the organism

Fig. 14. Processes performed on the middle cell. (a) Structural growth. (b) Load. (c) Functional growth. (d) Repair and reset. (e) Functional regrowth. (f) Kill.
4 Hardware Implementation

4.1 CONFETTI Platform

CONFETTI, which stands for CONFigurable ElecTronic Tissue [9] [5], is a reconfigurable hardware platform for the implementation of complex bio-inspired architectures. The platform is built hierarchically by connecting elements of increasing complexity. The main hardware unit is the EStack (Fig. 15), a stack of four layers of PCBs:

– The ECell boards (18 per EStack) represent the computational part of the system and are composed of an FPGA and some static memory. Each ECell is directly connected to a corresponding routing FPGA in the subjacent ERouting board.
– The ERouting board (1 per EStack) implements the communication layer of the system. Articulated around 18 FPGAs, the board materializes a routing network based on a regular grid topology, which provides inter-FPGA communication as well as communication with other routing boards.
– Above the routing layer lies the EPower board, which generates the different required power supplies.
– The topmost layer of the EStack, the EDisplay board, consists of an RGB LED display to which a touch-sensitive matrix has been added.

The complete CONFETTI platform is made up of an arbitrary number of EStacks seamlessly joined together (through border connectors in the ERouting boards) in a two-dimensional array. The connection of several boards potentially allows the creation of arbitrarily large surfaces of programmable logic. The platform used to implement the "SOS" acronym application example consists of six EStacks in a 3 by 2 array (Fig. 16).

4.2 SOS Application
The 3x2 EStack platform comprises a total of 108 ECells organized as 6 rows of 18 units. Using one ECell for each molecule, the complexity of the platform thus allows the implementation of four and a half totipotent cells of the “SOS” application. Fig. 17 shows the cloning of the totipotent cell in order to build a first multicellular organism “SOS” and sketches the cloning of this organism in order to define a population of them. The cloning of the organism rests on two assumptions: (1) There exists a sufficient number of spare cells in the array to contain at least one copy of the additional organism, an assumption which is only
A. Stauffer et al.
Fig. 15. Schematic of the EStack
Fig. 16. Schematic of the 3x2 EStack platform
partially satisfied here. (2) The calculation of the coordinates produces a cycle X = 1 → 2 → 3 → 1, implying X+ = (X mod 3) + 1. Given a sufficiently large space, the cloning of the organism could be repeated for any number of specimens along the X and/or Y axes. Fig. 18 illustrates cicatrization, or repair at the cellular level, as well as regeneration, or repair at the organismic level. The cicatrization of the cells having at most one faulty molecule in each of their rows causes the graphical distortion of the characters S and O. The regeneration of the cell having more than one faulty molecule in one of its rows leaves a scar in the organism “SOS”.
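The coordinate cycle used when cloning the organism along the X axis can be sketched as follows (an illustrative Python fragment, not part of the original hardware description; note that in these 1-based coordinates the cycle X = 1 → 2 → 3 → 1 corresponds to X+ = (X mod 3) + 1):

```python
def next_x(x):
    """Coordinate update for cloning along the X axis: cycles 1 -> 2 -> 3 -> 1."""
    return x % 3 + 1

# Walking the cycle from X = 1 visits 2, then 3, then returns to 1.
cycle = [1]
for _ in range(3):
    cycle.append(next_x(cycle[-1]))
```

The same update, applied along Y as well, would let the cloning be repeated for any number of specimens, as stated above.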
Fig. 17. Cloning of the “SOS” acronym, totally realized at the cellular level and partially achieved at the organismic level, on a CONFETTI substrate
Fig. 18. Cicatrization and regeneration of the “SOS” acronym on a CONFETTI substrate
5 Conclusion
The self-developing mechanisms are mainly based on the Tom Thumb algorithm. These mechanisms are made of simple processes like growth, load, branching, repair, reset, and kill. They allow the cellular systems to possess bio-inspired properties such as:
– Cloning or self-replication at the cellular and organismic levels.
– Cicatrization or self-repair at the cellular level.
– Regeneration or self-repair at the organismic level.
Starting with a very simple cell made of only six molecules, we realized hardware simulations of its DSCA implementation in order to describe these self-developing mechanisms. The “SOS” acronym, a small organism made of three cells, was introduced as an application example for the simulation of our mechanisms. Finally, the hardware implementation of this application example was accomplished thanks to CONFETTI, a reconfigurable hardware platform for bio-inspired architectures.
In order to improve our systems, we intend to study additional hardware features such as:
– Automatic detection of faulty molecules, erroneous configuration data, and application dysfunction.
– Asynchronous implementation at the organismic level and synchronous implementation at the cellular level.
References
1. Canham, R., Tyrrell, A.M.: An embryonic array with improved efficiency and fault tolerance. In: Lohn, J., et al. (eds.) Proceedings of the NASA/DoD Conference on Evolvable Hardware (EH’03), pp. 265–272. IEEE Computer Society Press, Los Alamitos, CA (2003)
2. Mange, D., Sipper, M., Stauffer, A., Tempesti, G.: Toward robust integrated circuits: The Embryonics approach. Proceedings of the IEEE 88(4), 516–541 (2000)
3. Mange, D., Stauffer, A., Petraglio, E., Tempesti, G.: Embryonics machines that divide and differentiate. In: Ijspeert, A.J., Murata, M., Wakamiya, N. (eds.) BioADIT 2004. LNCS, vol. 3141, Springer, Heidelberg (2004)
4. Mange, D., Stauffer, A., Petraglio, E., Tempesti, G.: Self-replicating loop with universal construction. Physica D 191(1-2), 178–192 (2004)
5. Mudry, P.-A., Vannel, F., Tempesti, G., Mange, D.: Confetti: A reconfigurable hardware platform for prototyping cellular architectures. In: Proceedings of the 14th Reconfigurable Architectures Workshop (RAW 2007), IEEE Computer Society Press, Los Alamitos, CA (2007)
6. Stauffer, A., Mange, D., Tempesti, G.: Embryonic machines that grow, self-replicate and self-repair. In: Lohn, J., et al. (eds.) Proceedings of the 2005 NASA/DoD Conference on Evolvable Hardware (EH’05), pp. 290–293. IEEE Computer Society Press, Los Alamitos, CA (2005)
7. Stauffer, A., Mange, D., Tempesti, G.: Bio-inspired computing machines with self-repair mechanisms. In: Ijspeert, A.J., Masuzawa, T., Kusumoto, S. (eds.) BioADIT 2006. LNCS, vol. 3853, Springer, Heidelberg (2006)
8. Stauffer, A., Sipper, M.: The data-and-signals cellular automaton and its application to growing structures. Artificial Life 10(4), 463–477 (2004)
9. Vannel, F., Mudry, P.-A., Mange, D., Tempesti, G.: An embryonic array with improved efficiency and fault tolerance. In: Proceedings of the Workshop on Evolvable and Adaptative Hardware (WEAH07), IEEE Computational Intelligence Society, Los Alamitos (2007)
Development of a Tiny Computer-Assisted Wireless EEG Biofeedback System
Haifeng Chen, Ssanghee Seo, Donghee Ye, and Jungtae Lee
Department of Computer Science & Engineering, Pusan National University, San-30, Jangjeon-Dong, Geumjeon-Gu, Busan, 609-735, Republic of Korea
{chenhaifeng,shseotwin,ydh11,jtlee}@pusan.ac.kr
Abstract. This paper describes on-going research to develop a Brain-Computer Interface (BCI) with which to conduct biofeedback training. A convenient, portable, wireless, two-channel tiny Electroencephalogram (EEG) acquisition device based on Radio Frequency (RF) technology has been developed for this study, and on top of it we developed a computer-assisted EEG biofeedback system using Virtual Reality, which provides an ideal medium to represent the spatial and temporal environment of electrical activity emanating from the brain. A system prototype has been implemented with the proposed device for attention enhancement training in a Virtual Reality (VR) environment, and the test results of 3 volunteers are reported in this paper. With the proposed system, many kinds of EEG biofeedback training can be designed easily and done conveniently at home in daily life.
Keywords: Electroencephalogram, EEG biofeedback, Attention Enhancement Training.
1 Introduction
The EEG is a particularly powerful clinical tool and has been the gold standard for neurology and psychology research for decades. It is a relatively simple, inexpensive and completely harmless method for analyzing brain activity. Based on different frequency bands, EEG signals are categorized into 4 specific types of brain activity commonly discussed in the EEG literature: Alpha, Beta, Theta and Delta waves, as shown in Table 1. EEG biofeedback is a technique used mainly in behavioral medicine and as an adjunct to psychotherapy. An electronic device records EEG activity at a particular scalp location, extrapolates physiological measurements from the signals and converts them to visual and/or auditory representations that vary dynamically with the brain signals. The aim of this study is to develop a portable EEG biofeedback system for enhancing the human ability to self-regulate brain electric activity; with the proposed system, computer-aided EEG biofeedback training can be done conveniently at home in daily life. EEG biofeedback is a learning strategy that enables people to alter their brain waves. When information about a person’s own brain wave characteristics is made available to him or her, he or she can learn to change them. EEG biofeedback is like a
L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 163–173, 2007. © Springer-Verlag Berlin Heidelberg 2007
mirror to us, telling us how we look at a given instant in terms of how well our brains are working. We can think of it simply as “exercise” for our brain. The most common applications of EEG biofeedback are currently Attention Deficit Hyperactivity Disorder (ADHD), sleep problems in children, teeth grinding, and chronic pain. The training is a painless, non-invasive procedure, and it is also helpful with the control of mood disorders such as anxiety, depression, medically uncontrolled seizures, minor traumatic brain injury or cerebral palsy.
Table 1. Overview of specific brainwave types and their associated state of consciousness
Brainwave Frequency    State of Consciousness
Beta (13-30Hz)         Fully-Awake, Alert, Excitement, Tension
Alpha (8-12Hz)         Deeply-Relaxed, Passive-Awareness, Composed
Theta (4-7Hz)          Drowsiness, Unconscious, Deep-Tranquility, Optimal Meditative State
Delta (below 4Hz)      Sleep, Unaware, Deep-Unconsciousness
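The band boundaries of Table 1 can be expressed as a simple frequency lookup. The sketch below is purely illustrative (band limits vary slightly across the EEG literature; the values used here are those listed in Table 1):

```python
def band_of(freq_hz):
    """Map a frequency in Hz to the brainwave band of Table 1, or None if outside."""
    if freq_hz < 4:
        return "Delta"
    if freq_hz <= 7:
        return "Theta"
    if freq_hz <= 12:
        return "Alpha"
    if freq_hz <= 30:
        return "Beta"
    return None
```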
Over the years, many studies of EEG biofeedback treatment have reported promising results for ADHD, depression, and epilepsy, and many EEG biofeedback training systems have been designed and developed. Zhang Zuoseng and Chen Weiming developed an Alpha frequency band EEG biofeedback system in 1988 [1]. They designed an EEG acquisition device for a biofeedback system, and their experimental results showed the feasibility of using EEG biofeedback training to enhance a person’s alpha component. Allanson et al. developed an EEG biofeedback system using virtual environments for training a subject to maintain his or her EEG component signal amplitude for a predetermined period in 1999 [2]. Othmer and Kaiser proposed the use of EEG biofeedback with Virtual Reality in 2000 [3]. Grin’-Yatsenko et al. reported the effect of EEG biofeedback training for enhancement of attention in 2001, which showed that the ratio between trained sensorimotor rhythm power and the power of the rest of the EEG spectrum increased by 30-100% during the 4 minutes of training [4]. In 2002, an attention enhancement system using EEG biofeedback in a virtual environment was reported by B.H. Cho et al. [5]. They proposed a virtual reality method for treating ADHD children and increasing the attention span of children who have attention difficulty, and their experimental results showed that their system can improve the attention span of ADHD children and is useful in helping them learn to focus on tasks. Marco Congedo et al. proposed low-resolution electromagnetic tomography neurofeedback to enhance the human ability to self-regulate brain electric activity in 2004 [6]. Moreover, Maricic et al. introduced a music therapy and computer game-based EEG biofeedback training system to help students de-stress and improve academic performance in 2005 [7], and Vincent J. Monastra et al. gave a review of EEG biofeedback in the treatment of ADHD [8].
Fig. 1 shows a commonly used framework of an EEG biofeedback system.
Fig. 1. Framework of EEG biofeedback system
As mentioned above, Virtual Reality (VR) technology has recently been employed in the area of EEG biofeedback [2, 3, 5]. There is no doubt that VR can capture subjects’ attention more easily and increase their ability to concentrate, since it has three characteristic “I”s: Immersion, Interaction, and Imagination [9]. In our system prototype application, we also designed a VR environment, in which the EEG collected from a subject’s scalp is translated into movement and interaction within the VR environment. Most of the systems developed so far are constructed around a commercial EEG device. This makes the desired system somewhat bulky and expensive, as functions useless for the specified system might be included in some commercial devices, and it is difficult to make a more flexible and efficient biofeedback system with commercial devices. Therefore, a tiny two-channel wireless system named “wEEG”, based on RF technology, has been developed for EEG signal acquisition in this study; it extends the acquisition device of our former 16-channel portable EEG monitoring project using WLAN technology. The structure of wEEG is described in the next section in detail. Furthermore, we also implemented a system prototype application that builds a virtual environment for attention enhancement training, which takes the low fast Beta wave (12-15Hz, the sensorimotor rhythm (SMR)) and Theta wave (4-8Hz) into account as the biofeedback training frequency parameters, and our experiments aim at studying the ability of a person to enhance his or her attention for a given task.
2 Design of the System
2.1 The Structure of wEEG
In order to build a more flexible and efficient EEG biofeedback system, we have developed the wEEG, a wireless, portable, two-channel tiny EEG acquisition device, based on our past work. The wEEG is divided into two sections: a mobile station and a base station, in which the mobile station is our proposed wireless EEG device and the base station is used for computer-based training, communicating with the mobile station wirelessly. Fig. 2 shows the systematic block diagram of the wEEG’s Mobile Station, and Fig. 3 shows the wEEG’s Base Station. Since the focus of this paper is on the design of the EEG biofeedback system, we first introduce the structure of the wEEG in detail. First, the simple base station includes an 8051-based microprocessor and an RF receiver that communicates with the host PC over a USB port. The RF transceiver used in this
system is the CC1020 from Texas Instruments [10], which is a low power, narrow band, 915MHz ISM band RF chip. As shown in Fig. 4, the CC1020 has a simple interface to connect with a microprocessor in a typical system, and it can provide a data rate of up to 153.6kBaud. The microprocessor drives the RF receiver to get data from the mobile station, and then delivers the EEG data to our VR program, which executes on the host PC.
Fig. 2. Block diagram of wEEG Mobile Station
Fig. 3. Block diagram of wEEG Base Station
Fig. 4. Microprocessor interface of CC1020
Second, the mobile station consists of a two-channel EEG amplifier & filter, microprocessor, RF transmitter and power supply. The total gain is more than 20,000
times, and the default programmable sample rate is 256Hz with a resolution of 12 bits for the two-channel EEG signals.
a) EEG amplifier and filter: As stated earlier, EEG signals typically have an amplitude in the range of 10~100uV, thereby requiring amplification prior to any signal processing. The skin typically presents a source impedance on the order of 15Mohm. To acquire the signal effectively, the amplifier must match or have a greater input impedance than the source impedance. Furthermore, the amplifier must reject 60Hz (or 50Hz) power line interference from the signal. Consequently, a relatively high Common Mode Rejection Ratio (CMRR) is desired. Under these conditions, a high input impedance, high CMRR and moderately high gain instrumentation amplifier is selected as the preamplifier component of the EEG amplifier. The chip selected is the INA128 from Texas Instruments [11], which is a low power, general purpose instrumentation amplifier that requires low input bias current. It also features a high CMRR of 120dB and a differential input impedance of 10Gohm || 2pF.
Fig. 5. The structure of amplifier and filter
In order to cover the whole useful EEG band (0.15Hz-100Hz), we have to use amplifiers and filters to condition the EEG signal. From the instrumentation amplifier, a 4th order Bessel LPF (Low Pass Filter) attenuates frequencies above 100Hz. The signal is then filtered by a 2nd order Butterworth HPF (High Pass Filter) to attenuate frequencies below 0.15Hz. After pass band filtering, the signal goes to an operational amplifier serving as the main amplifier, which amplifies the filtered signal by 40dB. After that, the signal goes to a 60Hz notch filter to reject the 60Hz interference. Finally, the signal is amplified by a 2nd amplifier with a gain of 12dB. The operational amplifier chip used here is the LMC6464 from National Semiconductor [12], which is a low power operational amplifier (OpAmp) with Rail-to-Rail input and output. The structure of this part is shown in Fig. 5. The specifications of the wEEG’s amplifier and filter part are given in Table 2.
b) Microprocessor: An inexpensive 8-bit 8051-based microcontroller with a 12-bit ADC is employed in this design to acquire EEG signals and drive the RF transmitter.
c) RF transmitter: This is the most important part in the mobile station; it transmits the digital EEG signals wirelessly. We use the same RF transceiver chip
with the base station. It can be employed in the wEEG for transmitting the EEG signal wirelessly.
d) Power supply: The mobile station is temporarily supplied by a 9V battery; we are considering a small Li-Ion rechargeable battery instead. The base station is supplied through the USB cable.
Table 2. Specifications of wEEG’s amplifier & filter
Part name         Character                                                 Gain
Pre-amplifier     Differential input impedance: 10Gohm||2pF, CMRR: 120dB,   25dB
                  input bias current: 1nA, input offset voltage: 25uV
Low pass filter   4th-order Bessel filter, cutoff frequency: 100Hz          0dB
High pass filter  2nd-order Butterworth filter, cutoff frequency: 0.1Hz     0dB
Main amplifier    Inverting amplifier                                       40dB
Notch filter      60Hz Twin-T notch filter                                  0dB
2nd amplifier     Inverting amplifier                                       12dB
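For readers who wish to prototype the conditioning chain in software before committing to hardware, the 60Hz notch stage can be approximated digitally. The sketch below is illustrative only: it uses the common RBJ biquad notch coefficient formulas rather than the analog Twin-T circuit of Table 2, and the sampling rate and Q value are assumptions, not the paper's parameters:

```python
import math

def notch_coeffs(f0, fs, q):
    """Biquad notch coefficients (RBJ audio-EQ-cookbook form), normalized by a0."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = [1 / a0, -2 * math.cos(w0) / a0, 1 / a0]
    a = [1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0]
    return b, a

def biquad(x, b, a):
    """Direct-form I filtering of the sequence x with a second-order section."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y
```

A 60Hz sine fed through this filter decays to nearly zero after the initial transient, while frequencies inside the EEG band pass almost unchanged.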
Additionally, the PCB (Printed Circuit Board) artwork for the designed wEEG is ongoing; the expected size of the two-channel mobile station is about 30mm×50mm×20mm without battery. The base station is smaller than the mobile station, and its power is supplied by the host PC via the USB cable.
2.2 VR Program and EEG Biofeedback Training Protocol
We have developed a 3D VR environment for the EEG biofeedback training, which resembles a car racing game. In this program, each one-second segment of EEG signals is first processed by DFT (Discrete Fourier Transform) to obtain its power spectrum, and then the powers of the EEG components (Alpha or Beta) are translated directly into the control signal. A changing EEG component power value is thus always visible and corresponds to the speed or position of the virtual car. In this way, each subject continually gets feedback about the current power value (or proportion) of his or her EEG components. In particular, in order to raise the subjects’ interest, we adopt a racing mode to stimulate the subjects to pay more attention during training. There are two racing cars in the VR world; one is controlled by a subject, and the other is controlled by the computer or another subject. If a subject races against the computer, the computer-controlled car’s speed is randomly increased with reference to the subject’s speed. It is also possible for two subjects to race each other in our training system; in this case, the two racing cars are controlled by the two subjects’ EEG component parameters respectively. The VR program consists of four parts: device initiation, EEG signal acquisition and filtering, SMR-band power value analysis, and results output. The most important part is to calculate the power value (or proportion) of the SMR band, as it links
the VR program and the EEG signal well. The ratio of the integral of powers in the SMR band and all EEG bands over each time interval is defined and calculated as follows:

SMR(t)\% = \frac{\sum_{f \in SMR} P_n(f)}{\sum_{f=4}^{30} P_n(f)} \times 100\%    (1)

where P_n(f) is the power spectral density between times t − 1 and t. SMR(t) is thus the instantaneous ratio of the integral of power in the SMR band (12-15Hz) and the whole pass frequency band (4-30Hz). Fig. 6 shows the procedure for estimating the EEG’s SMR band power.
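The computation of Eq. (1) can be sketched as follows. This is an illustrative fragment, not the authors' program: a naive DFT is used for clarity instead of the 512-point FFT of the actual implementation, and the function names are invented:

```python
import math

def power_spectrum(x, fs):
    """Naive DFT power spectrum of a real signal; returns {frequency_Hz: power}."""
    n = len(x)
    spec = {}
    for k in range(n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        spec[k * fs / n] = re * re + im * im
    return spec

def smr_ratio(x, fs=256):
    """SMR(t)% of Eq. (1): power in 12-15Hz over power in 4-30Hz, in percent."""
    spec = power_spectrum(x, fs)
    smr = sum(p for f, p in spec.items() if 12 <= f <= 15)
    total = sum(p for f, p in spec.items() if 4 <= f <= 30)
    return 100.0 * smr / total
```

For a pure 12Hz sine the ratio approaches 100%, while a Theta-band tone yields a ratio near zero.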
Fig. 6. Procedure for calculating SMR band power
For controlling the car’s speed using the subject’s EEG signals, we have to undertake a short measurement session before training to find a threshold for the subject; in this study, the SMR training, we can use equation (1) to obtain the SMR threshold SMR_{threshold}. Finally, we can define the car’s speed as follows:

V(t) = \begin{cases} 0, & SMR(t-1) < SMR_{threshold} \\ \left( \frac{SMR(t-1)}{SMR_{threshold}} - 1 \right) \times 100, & SMR(t-1) \ge SMR_{threshold} \end{cases}    (2)

where V(t) is the calculated racing speed of the subject at time t, displayed in the VR program on the computer screen, and SMR(t − 1) is the subject’s SMR band power at time t − 1. Since whether the racing car can run depends on the above SMR band
power levels of each subject, it will be helpful in leading the subject to concentrate on the game task instinctively and try to win. Fig. 7 shows an example VR world. There are two racing cars in the middle of the computer screen; one is controlled by the subject’s “brain wave”, and the other is controlled by the computer. If the controlling feature (SMR ratio) reaches the appointed threshold, the subject-controlled car speeds up, otherwise it slows down. The speed is defined by equation (2). The filtered EEG signals and the ratio of SMR and Theta are displayed at the bottom of the screen.
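The speed rule of Eq. (2) translates directly into code; the following is an illustrative sketch with invented names:

```python
def racing_speed(smr_prev, smr_threshold):
    """Car speed V(t) of Eq. (2): zero below the SMR threshold,
    otherwise proportional to the excess over the threshold."""
    if smr_prev < smr_threshold:
        return 0.0
    return (smr_prev / smr_threshold - 1.0) * 100.0
```

A subject whose SMR ratio sits exactly at the threshold gets speed 0, and doubling the threshold value yields a speed of 100.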
Fig. 7. Interface of the VR environment
Finally, our computer settings are as follows: Pentium 4 2.66GHz CPU, 512MB of RAM, and Windows XP OS. The VR environments are designed using 3D MAX and programmed using Microsoft Visual C++ 6.0 with the DirectX 9.0 SDK.
3 Implementation and Experiment Results
The developed EEG device has been tested in our laboratory to ensure overall functionality and robustness. In order to verify its performance in gathering human scalp EEG signals, we selected 3 seconds of data received by the base station and analyzed it using the DFT to check its frequency components. Fig. 8 shows the EEG signals received from the wEEG mobile station for three seconds, and its DFT power spectrum is shown in Fig. 9.
Fig. 8. Received three seconds of EEG signals from wEEG mobile station
We use the wEEG mobile station as the EEG acquisition device to build our current EEG biofeedback system. We use a single channel of the wEEG with three electrodes, A2, C4 and Pz (International 10-20 Electrode System, as shown in Fig. 10), in which A2 is the reference electrode and C4 is the sensorimotor rhythm (SMR) sensitive area; they are used to collect scalp EEG signals in bipolar mode. The system hardware configuration is shown in Fig. 11.
Fig. 9. DFT power spectrum of sample EEG signals
Fig. 10. Overview of international 10-20 electrodes system
Fig. 11. System hardware configuration
Before a training session commences, the subject’s SMR threshold is measured in a prior session, and the subject is asked not to move his or her body during the experiments to minimize movement artifacts. After that, the training session starts; at this time, the subject is encouraged to concentrate on the game task and race against the computer. We have tested our system on three members of our laboratory, two males and one female. The sampling rate of the EEG is set to 256Hz, and the collected EEG signal is pre-filtered into the 4-30Hz band. The signal is windowed with a 2-second (512 points) Hanning window with 50% overlap to calculate the SMR’s power. The
following Fig. 12 shows the average SMR power ratio of 3 different states (stable state, attention with VR, and attention without VR) for each subject; each state lasts 3 minutes. Because the SMR band (12-15Hz) is a very narrow band, its power ratio is not very large, only about 7% of the whole 4-30Hz band. Although enough experiments have not yet been done, our current test results show that every subject’s SMR power in the VR environment is higher than without VR, and about 1% higher than in the stable state. This means that the VR environment is helpful for enhancing the SMR band, and with the proposed system this can be realized conveniently.
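The 2-second (512-point) Hanning windowing with 50% overlap described above can be sketched as follows (a pure-Python illustration, not the authors' implementation; the function names are invented):

```python
import math

def hanning(n):
    """Hanning (Hann) window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def windowed_segments(signal, size=512, overlap=0.5):
    """Split a signal into Hanning-windowed segments with the given overlap."""
    step = int(size * (1 - overlap))
    win = hanning(size)
    return [[s * w for s, w in zip(signal[i:i + size], win)]
            for i in range(0, len(signal) - size + 1, step)]
```

Each windowed segment would then feed the SMR power calculation of Eq. (1).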
Fig. 12. Average SMR power ratio of each subject
4 Conclusions and Future Work
In order to develop a portable and more convenient EEG biofeedback system, a tiny 2-channel wireless EEG acquisition device using RF technology, named wEEG, has been designed, based on our former 16-channel wireless EEG project, and a virtual reality environment has been developed for EEG biofeedback training. Several experiments have been done to evaluate the wEEG and the biofeedback system, and the test results show that the VR environment is helpful for enhancing/inhibiting EEG components in an EEG biofeedback system. We hope the proposed portable EEG biofeedback system can offer powerful and easy-to-use tools with which users can create flexible treatment programs at home. We are now planning to do more experiments to validate our system, and our further interests focus on developing more interesting and more flexible training programs using the wEEG, such as developing a PDA interface for real-time EEG biofeedback training and extending the wEEG mobile station with auditory feedback.
References
1. Zousen, Z., Weiming, C.: Development of EEG biofeedback system and research of the biofeedback in the alpha frequency band. In: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, New Orleans LA, USA, pp. 1482–1483. IEEE Computer Society Press, Los Alamitos (1988)
2. Allanson, J., Mariani, J.: Mind Over Virtual Matter: Using Virtual Environments for Neurofeedback Training. In: Proceedings of the IEEE Virtual Reality Annual International Symposium, pp. 270–273 (1999)
3. Othmer, S., Kaiser, D.: Implementation of Virtual Reality in EEG Biofeedback. Cyberpsychology & Behavior 3(3), 415–420 (2000)
4. Grin’-Yatsenko, V.A., Kropotov, Y.D., Ponomarev, V.A., Chutko, L.S., Yakovenko, E.A.: Effect of Biofeedback Training of Sensorimotor and β1 EEG Rhythms on Attention Parameters. Human Physiology 27(3), 259–266 (2001)
5. Cho, B.H., Lee, J.M., Ku, J.H., Jang, D.P., Kim, J.S., Kim, I.Y., Lee, J.H., Kim, S.I.: Attention Enhancement System using Virtual Reality and EEG Biofeedback. In: Proceedings of IEEE Virtual Reality 2002, Orlando FL, USA, pp. 156–163. IEEE Computer Society Press, Los Alamitos (2002)
6. Congedo, M., Lubar, J.F., Joffe, D.: Low-Resolution Electromagnetic Tomography Neurofeedback. IEEE Transactions on Neural Systems and Rehabilitation Engineering 12(4), 387–397 (2004)
7. Maricic, A., Leang, H.P.: Biofeedback Computer Game-based Training. In: Proceedings of the 47th International Symposium ELMAR, Zadar, Croatia, pp. 185–188 (2005)
8. Monastra, V.J., Lynn, S., Linden, M., Lubar, J.F., Gruzelier, J., LaVaque, T.J.: Electroencephalographic Biofeedback in the Treatment of Attention-Deficit/Hyperactivity Disorder. Applied Psychophysiology and Biofeedback 30(2), 95–114 (2005)
9. Burdea, G.: Virtual Reality Systems and Applications. In: Proceedings of Electro ’93 International Conference, NJ, USA, pp. 164–167 (1993)
10. CC1020 datasheet, Texas Instruments, focus.ti.com/docs/prod/folders/print/cc1020.htm
11. INA128 datasheet, Texas Instruments, focus.ti.com/docs/prod/folders/print/ina128.html
12. LMC6464 datasheet, National Semiconductor, www.national.com/pf/LM/LMC6464.html
Steps Forward to Evolve Bio-inspired Embryonic Cell-Based Electronic Systems
Elhadj Benkhelifa 1, Anthony Pipe 2, Mokhtar Nibouche 3, and Gabriel Dragffy 4
1, 2 Bristol Robotics Laboratory, University of the West of England (UWE), Frenchay Campus, Coldharbour Lane, Bristol, BS16 1QY, UK
3, 4 Bristol Institute of Technology, University of the West of England (UWE), Frenchay Campus, Coldharbour Lane, Bristol, BS16 1QY, UK
[email protected]
Abstract. EHW is the acronym used to denote an emerging and relatively new research field in digital hardware design; it stands for Evolvable Hardware. This technique has attracted many researchers since the 1990s. EHW aims at the automatic design and optimisation of a reconfigurable hardware system using Evolutionary Algorithms (EAs), such as Genetic Algorithms, Genetic Programming, etc. This article is published as part of a three-year research project. The objective of this project is to apply the above method to a specific target hardware, the Embryonics Hardware System, which requires large hardware resources. Thus, in this project, EAs will be used to evolve the Embryonics Hardware System to discover novel designs with reduced complexity. The new design must first ensure correct functionality. Hence, to verify the concept of Evolvable Hardware, the authors focus in this paper on the design of relatively simple combinational logic circuits using Genetic Algorithms with multi-objective fitness.
1 Introduction
The complexity of the electronic circuit industry is increasing very rapidly. With new generations of hardware, the demand for more complex behaviours is consequently growing. These circuits are typically designed either to be part of a system, or to be the system itself: computers, digital watches, nuclear power stations, to mention just a few. Without any doubt, these systems outperform human beings in many respects. However, despite these achievements, failures of hardware to fulfil some real-life tasks are common, and even current leading computer systems fall short in performing them when a changing environment is considered. These failures can be explained by limitations of the traditional circuit design methodologies, which are based on rules that have been expanded over the years and depend only on human
L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 174–185, 2007. © Springer-Verlag Berlin Heidelberg 2007
capabilities. Ontogeny and Phylogeny signify the basic concepts of the cell and evolution theories in living beings in nature, respectively. Nature, which achieves a high level of functional integrity, has proved to be the best source of inspiration for researchers and engineers in the field of digital hardware design to tackle complexity and reliability issues. Thus, under the umbrella of Artificial Life (ALife) [27], research in biologically inspired hardware has emerged, and scientists and engineers have been enthused by the above theories. Ontogeny, for instance, has motivated researchers to develop the Embryonics Hardware System (embryological electronics) [1] [4]. This system was built using conventional design methods, relying one hundred percent on human capabilities. It is a homogeneous array that consists of a number of identical embryonic cells. Together, these cells incorporate the self-repair and self-healing mechanisms developed during the embryonic phase in living beings. On one hand, this method allows the hardware device to be relatively more reliable; on the other hand, it has proved to be very inefficient as far as complexity is concerned. The evolutionary theory, or Darwinian theory, in nature is concerned with the study of the development of organisms through the evolution of genetics and the relationships between these organisms. It emphasises the adaptation of living beings to their environments and the survival of the fittest through natural selection. It was about a decade ago that researchers in the field of hardware design started exploring evolution theory in order to adopt it as an unconventional approach towards autonomous circuit design, which may result in novel hardware solutions with minimal complexity and maximum reliability [26]. Since then, research within this field has taken various names, such as Hardware Evolution, Evolutionary Electronics, and Evolvable Hardware (EHW). The term EHW was first used by Hugo de Garis in the early 90s.
It was proposed as a rival to conventional methods of hardware design. Since then, and especially after the introduction of Programmable Logic Devices such as FPGAs, this field has been investigated thoroughly by different researchers, Adrian Thompson [2], Julian Miller [5] and Higuchi [17], to mention just a few names. This technique appears to be successful and promising, as it can automatically design digital circuits through the use of Evolutionary Algorithms. However, researchers in this field still face some real challenges, such as scalability, maintainability and generalisation issues [9] [7]. Therefore, is Evolvable Hardware (EHW) a technology that can be used as a compaction methodology to reduce the complexity of the Embryonics hardware system, or of electronic circuits in general? Under the umbrella of evolutionary computation, the authors endeavour to learn how to implement such parallel cellular hardware systems, the Embryonics, autonomously, with simpler circuitry and better performance than the human-made one produced with conventional design methodologies. For this purpose, and in order to verify the concept, this paper introduces some promising results from experiments conducted to evolve relatively simple combinational circuits (with no memory elements) using Genetic Algorithms with a multi-objective fitness.
2 Evolvable Hardware
2.1 Definition
The first thing that comes to mind as a solution to the complexity and reliability problems is either to add more rules and parameters or to employ more designers with
E. Benkhelifa et al.
greater expertise. Together, these two options could result in simpler circuit designs and more reliable systems. However, they can only be considered temporary solutions to a greater problem: such systems might still not cope in dynamic environments where fuzziness and approximation rule daily activities and the surrounding world [8]. Moreover, these solutions could prove expensive and resource demanding, as well as limited in scope, since it is difficult for human designers to discover all possible circuit solutions for any desired behaviour. As a subset of artificial evolution, research on Evolvable Hardware (EHW) [2] has taken place to resolve the stated problem on silicon. EHW refers to reconfigurable electronic systems which can evolve under the control of Evolutionary Algorithms (EAs) in order to solve real-world tasks [10] [11]. Evolvable Hardware is a technology that allows hardware to evolve until an optimal or near-optimal solution to a certain problem or desired behaviour is achieved. It can be proposed as an alternative to traditional design paradigms, which show drawbacks for very complex systems. The technique has seen considerable advances in the last few years.
Fig. 1. EHW combines Evolutionary Algorithms and Reconfigurable Hardware: a GA process generates configuration bit strings that program the reconfigurable hardware, yielding an evolved circuit
2.2 Reconfigurable Hardware
Primarily, any reconfigurable hardware device, coarse-grained or fine-grained, can be used for EHW applications. The following are four different types of reconfigurable devices:
• Field Programmable Gate Array (FPGA): commercially available digital arrays that allow digital circuits to be programmed at gate level.
• Field Programmable Analogue Array (FPAA): a cellular programmable analogue array used to implement analogue circuits, e.g. oscillators.
• Field Programmable Transistor Array (FPTA): a cellular array of reconfigurable transistors. The FPTA is able to implement both digital and analogue hardware.
• Artificial Neural Network (ANN): programmable ANN devices where the Evolutionary Algorithm can either adjust the synaptic weights, if the structure is predetermined, or create the structure while the learning algorithm adjusts the synaptic weights of the network.
2.3 Evolutionary Algorithms
Inspired by nature and biology, Evolutionary Algorithms (EAs) have emerged to mimic aspects of natural evolution and natural selection [8]. These algorithms are randomised generative search techniques. Classical analytical, numerical or combinatorial techniques are superior to EAs when certain well-known (problem-specific) conditions are satisfied; in addition, hybrid classical techniques have been developed which are also superior to EAs for mildly ill-conditioned problems. However, most real-world problems are non-linear, discontinuous, highly cross-coupled and ill-defined. The real system must be heavily simplified for the classical techniques to work; if heavy simplification is unsuitable, EAs are most usefully applied. For these reasons, EAs have come forward as an alternative to conventional methodologies when complex systems requiring intensive computation and parallelism are considered. EAs have been applied to many engineering optimisation problems, such as neural networks, machine learning and hardware design, to mention just a few. Genetic Algorithms (GA) [14], Evolution Strategies [24], Evolutionary Programming [23] and Genetic Programming (GP) [13] are the four best-known EAs. All these techniques are similar in spirit, but differ in the details of their implementation and the nature of the problems to which they have been applied. They all share a common conceptual base of simulating evolution via the processes of selection, mutation, and crossover or recombination. These processes are often known as "genetic operators" or "search operators".
2.3.1 Genetic Algorithms
Genetic Algorithms were first proposed by the American scientist John Holland in the 1960s [25]. These algorithms are now widely adopted, studied and experimented with in many engineering fields where search and optimisation are required; Digital Electronics (Evolutionary Electronics) is one of them.
The strength of GAs comes from having a very simple randomised beam search algorithm that harnesses the combinatorial power of a computer to search a large solution space in parallel, while being probabilistically directed at each iteration towards the desired solution or solutions. GAs outperform traditional techniques in many cases. The following are some advantageous features of GAs:
1. GAs involve a search from a population of individuals, rather than from a single point as in traditional methods.
2. Evolutionary techniques specify what the circuit should implement; the conventional design approach, on the other hand, states how to design and implement a circuit.
3. Evolutionary design approaches do not require a priori knowledge.
4. Flexibility in modifying the GA parameters allows it to adapt to new requirements.
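The generational cycle behind these properties can be sketched in Python. This is an illustrative assumption rather than code from the paper: the function names, the per-gene replacement mutation (the paper's own experiment uses a restricted gene-swap mutation) and the parameter defaults are all ours.

```python
import random

def evolve(fitness, chrom_len, random_gene, pop_size=30,
           generations=100, p_mut=0.02):
    """One plain GA cycle: evaluate, select proportionally to fitness,
    recombine with one-point crossover, mutate, repeat."""
    pop = [[random_gene() for _ in range(chrom_len)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        total = sum(scores) or 1.0

        def pick():  # roulette-wheel (fitness-proportional) selection
            r, acc = random.uniform(0, total), 0.0
            for ind, s in zip(pop, scores):
                acc += s
                if acc >= r:
                    return ind
            return pop[-1]

        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = random.randrange(1, chrom_len)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(chrom_len):            # simple replacement mutation
                if random.random() < p_mut:
                    child[i] = random_gene()
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Toy use: maximise the number of 1s in an 8-gene binary chromosome.
best = evolve(sum, 8, lambda: random.randint(0, 1))
```

Any problem-specific detail (chromosome legality, fitness evaluation) is plugged in through `fitness` and `random_gene`.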
2.3.2 GA Mechanism
The working of a GA is simple: the algorithm maintains a population of encoded individuals, where each individual represents a potential solution to the problem at hand. The population is usually randomly generated. The GA then evaluates each candidate according to a fitness function, which measures how well a candidate solves
Fig. 2. Intrinsic, Extrinsic and Mixtrinsic EHW within the EA Mechanism
that problem. Purely by chance, a few of those candidate solutions may hold promise towards solving the problem, and so they are selected. These preferred candidates are allowed to reproduce and are then subjected to random changes: crossover/recombination, which swaps parts of one solution with another, or mutation, which makes a random (normally small) change to single elements of a solution. These genetic operators cause the formation of a new generation in the pool, which goes through a similar cycle of artificial evolution. After several iterations, each involving a competitive selection that rejects and discards poor solutions based on the survival-of-the-fittest paradigm, the average fitness of the population is expected to increase with each generation. A desired solution can therefore be extracted from the pool at some point. This process makes GAs well suited to combinatorial and continuous problems. Figure 2 shows a graphical representation of the GA mechanism; the same process applies to all Evolutionary Algorithms.
2.4 EHW Classification
Researchers have approached the field of EHW from two distinct, intimately related angles [9]. The first is the use of EAs for circuit synthesis and design as an alternative to the traditional methods applied in man-made circuitry; Adrian Thompson is a pioneer researcher in this field [2]. The second, as an ultimate goal, is the use of EAs to develop a new generation of hardware (self-reconfigurable, evolvable and environment-aware) which can adaptively reconfigure to attain a desired behaviour optimally, survive and recover from faults and degradation, and improve its performance over its lifetime of operation. The latter is often known as "hardware capable of online adaptation to dynamically changing environments"; NASA is one of the leading research groups in this field [11]. Evolvable Embryonics concentrates on the former concept.
Experiments performed for this purpose make use of the Genetic Algorithm [13] [14] to automatically design an Embryonics cell, seeking an optimised version of an existing man-made one. Evolvable Hardware has been applied at three hierarchical levels: the transistor [15] [16], gate [17] and functional [18] levels. EHW can also be direct or indirect, according to the level of
the chromosome representation [19]. Thus, grammar and tree representations are used with indirect EHW, and an architecture-bit representation is used with direct EHW. Moreover, EHW is categorised into extrinsic, intrinsic and mixtrinsic [20]. This classification is based on how solutions are evaluated and tested: offline with a simulator (extrinsic) or online in the actual hardware (intrinsic). The third category was proposed to solve the portability problem between the real hardware and the simulator and vice versa; it keeps only solutions that work on both the hardware and the simulator and eliminates the rest.
3 Evolvable Embryonics
3.1 Motivations
Researchers within the Bristol Robotics Laboratory at the University of the West of England have attempted research in the field of Embryonics. Although they have succeeded in building a working Embryonics cell with self-repair mechanisms, the system requires approximately 7000 transistors [3]. Human designers have failed to produce a more compact Embryonics cell; by conducting this research, one can therefore investigate the power of Evolutionary Algorithms to design less resource-demanding Embryonics. This is hoped to be an answer to the scalability problem in Evolvable Hardware. Also, the advantage of evolving a self-repairing hardware system over an ordinary one can resolve the issue of maintainability of evolved circuits, mentioned in [9]. The global objective of Evolvable Embryonics is to discover new design solutions for Embryonics using Evolutionary Algorithms. The new design has to be a 100% functional system with a minimal number of gates/hardware resources. In this paper, the authors focus on the design of a combinational logic circuit using Genetic Algorithms.
3.2 The Genetic Algorithm
3.2.1 Genetic Encoding
The genetic encoding adopted in this paper is similar to the one suggested in [22]. The logic circuit is organised into a two-dimensional array of cells. Each cell accepts two inputs and produces one output. The cells in the first column of the array are fed with defined inputs; for the purpose of this experiment, the combinational circuit takes n inputs, so each input of the circuit is represented over the 2^n combinations of logic constants ('0' or '1'). Cells in the following columns receive inputs from cells in the previous column. The chromosome is a string of integers where each three successive alleles embody a cell. Each triplet in the chromosome encodes, respectively, a cell's two inputs and the cell's type. Hence the chromosome length is calculated as:
3 * (number of columns * number of rows)
(1)
In the experiment presented in this paper, the array used is of a fixed size of 5*5 cells; thus the length of the chromosome equals 75 genes. Each cell's inputs of the
Table 1. Cell Type Definition

Logical function of cell   Cell type
AND                        5
OR                         6
XOR                        7
NOT(in1)                   8
NOT(in2)                   9
WIRE(in1)                  10
WIRE(in2)                  11
NAND                       12
NOR                        13
first column of the array can be any integer value in the range [0, MAX_NUM_INPUTS − 1]. Cells in the other columns, on the other hand, can take any integer value in the range [0, NUM_ROWS − 1]. The third gene of the triplet, the cell type, is defined in Table 1. A typical chromosome in this paper's experiment therefore consists of triplets of the form ( [0-3], [0-3], [5-13] ), ( [0-4], [0-4], [5-13] ), …, ( [0-4], [0-4], [5-13] ).
3.2.2 The Fitness Function
The fitness function in this experiment aims to accept only solutions with 100% correctness with respect to the target circuit, and then to report a solution with a minimal number of functional gates. The first part of the fitness function, F1, compares the output response of each cell in the evolved circuit with the desired one(s). If all match, the fitness value for correctness is 100%. Checking every cell's output in the evolved circuit against the desired ones enhances the algorithm and minimises the time taken by the evolution process. The second fitness, F2, searches for the most optimal correct solution in terms of required gates. This is done by identifying gates of types 10 and 11. Cells of types 5 and 6 which are fed from the same source can also be considered as a WIRE, since they pass the same input through. The fitness of individuals is calculated as follows:
F1 = (c * 100) / 2^n
(2)
where 'c' is a variable which is incremented by one for each correct output and n is the number of desired outputs.
If F1 = 100:
F2 = (t * 100) / ng
(3)
where t is the number of gates of type 10 or 11 and ng is the total number of gates in the array.
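As a concrete illustration, the array evaluation and the correctness fitness F1 might be coded as below. The column-major gene ordering and the choice to read outputs from the last column are assumptions of this sketch (the paper also checks intermediate cell outputs), and the helper names are invented:

```python
# Cell types per Table 1.
CELL_FUNCS = {5: lambda a, b: a & b,         # AND
              6: lambda a, b: a | b,         # OR
              7: lambda a, b: a ^ b,         # XOR
              8: lambda a, b: 1 - a,         # NOT(in1)
              9: lambda a, b: 1 - b,         # NOT(in2)
              10: lambda a, b: a,            # WIRE(in1)
              11: lambda a, b: b,            # WIRE(in2)
              12: lambda a, b: 1 - (a & b),  # NAND
              13: lambda a, b: 1 - (a | b)}  # NOR

def simulate(chrom, rows, cols, inputs):
    """Propagate one input vector through the cell array, column by column."""
    prev = list(inputs)
    for c in range(cols):
        base = 3 * c * rows
        prev = [CELL_FUNCS[chrom[base + 3*r + 2]](
                    prev[chrom[base + 3*r]], prev[chrom[base + 3*r + 1]])
                for r in range(rows)]
    return prev  # outputs of the last column

def f1(chrom, rows, cols, n_in, desired):
    """F1 = 100*c / total checks (eq. 2): bit matches over all 2^n_in
    input vectors; 'desired' maps an input index to the expected bits."""
    c = total = 0
    for x in range(2 ** n_in):
        bits = [(x >> i) & 1 for i in range(n_in)]
        for got, want in zip(simulate(chrom, rows, cols, bits), desired[x]):
            total += 1
            c += (got == want)
    return 100 * c // total

# A 1x1 array whose single cell ANDs the two circuit inputs: fully correct.
and_truth = {x: [(x & 1) & ((x >> 1) & 1)] for x in range(4)}
```

On the tiny AND example, `f1([0, 1, 5], 1, 1, 2, and_truth)` reaches 100, i.e. full correctness.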
3.2.3 Selection
Selection is based on the principle of survival of the fittest, which means that better candidate solutions get more offspring carrying the same or similar genetic code. Selection in a GA is probabilistic. There are several selection techniques: proportional, rank-based, exponential and tournament selection. In this experiment the proportional technique, also known as "roulette wheel" selection, is used.
3.2.4 Initial Population
After restricting the chromosomal representations to valid possible solutions, an initial population has to be created as a starting point for the GA. This can be generated at random or predefined by the GA designer. It is typically recommended to keep the population size within the range of 30 to 100 individuals. For the purpose of this experiment, a large population size of 2000 individuals is used. The population is created randomly.
3.2.5 Crossover and Mutation
These genetic operators are used analogously to natural crossover, mutation and reproduction, in order to expand the current generation into the next one and maintain genetic diversity. Both crossover and mutation are probabilistic, and their probabilities are defined by the GA's designer. Crossover combines individuals at random, producing offspring. There are several different crossover techniques: one-point crossover, two-point crossover, cut-and-splice, and uniform and half-uniform crossover. These techniques vary in how the parents' strings are combined. For this experiment, one-point crossover with a probability rate of 1.0 is used. Mutation involves changing genes in chromosomes randomly, in order to create diversity within the pool. In this experiment, a simple gene-swap mutation with a probability rate of 0.0213 is used. Mutation is restricted to avoid illegal chromosomes.
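The encoding constraints of Sect. 3.2.1 and the restricted swap mutation could be realised as below. The paper does not spell out how illegal swaps are avoided, so the legality rule used here (swap only genes whose values remain in range in each other's slots, and never mix input genes with cell-type genes) is one plausible interpretation, and all names are ours:

```python
import random

N_IN, ROWS, COLS = 4, 5, 5  # the 5*5 array of the paper's experiment

def slot_limits():
    """Exclusive upper bound of each gene slot: first-column inputs index the
    circuit inputs, later columns index the previous column's rows, and
    type genes come from Table 1's range [5, 13]."""
    lims = []
    for c in range(COLS):
        lim = N_IN if c == 0 else ROWS
        lims += [lim, lim, 14] * ROWS
    return lims

def random_chromosome():
    lims = slot_limits()
    return [random.randrange(5, 14) if i % 3 == 2 else random.randrange(lims[i])
            for i in range(len(lims))]

def swap_mutation(chrom, p=0.0213):
    """Gene-swap mutation restricted so no illegal chromosome can arise."""
    lims, out = slot_limits(), chrom[:]
    for i in range(len(out)):
        if random.random() < p:
            j = random.randrange(len(out))
            same_role = (i % 3 == 2) == (j % 3 == 2)  # type genes stay type genes
            if same_role and out[j] < lims[i] and out[i] < lims[j]:
                out[i], out[j] = out[j], out[i]
    return out
```

With these ranges every chromosome has 3 * 5 * 5 = 75 genes, matching eq. (1), and mutated chromosomes stay within the legal triplet form.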
4 The Experiment
The libga100 library [21] was used as the GA platform. As mentioned above, this experiment aims at evolving combinational circuits. For this purpose, the functionality of a 2bit*2bit multiplier is evolved. The stopping criterion of the evolution process is reaching 1000 generations, regardless of convergence. After a few runs, the following most optimal solutions were reported:
Best: 1 3 5 0 2 5 1 1 10 0 3 5 2 1 5 2 4 5 4 3 7 1 0 5 3 1 11 0 3 8 1 4 11 0 4 10 4 1 10 3 2 11 3 4 5 3 2 6 2 0 9 4 2 10 3 2 10 1 1 10 4 0 10 3 1 11 3 2 11 4 0 10 1 2 11 (56%) (a)
Best: 2 0 5 1 2 5 0 3 5 1 3 5 3 0 5 3 2 10 1 2 7 1 4 5 0 0 10 4 2 10 0 2 11 3 0 11 0 3 10 3 4 11 3 1 10 3 2 11 0 4 7 3 0 11 1 4 10 1 3 10 2 3 10 1 0 5 4 4 11 3 3 10 0 4 11 (64%) (b)
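The reported optimisation figures can be checked directly from the genomes. A short script (the parsing is ours, but the counting follows eq. (3): t = number of type-10/11 WIRE cells, ng = 25 cells) reproduces the 56% of chromosome (a):

```python
def f2(chromosome):
    """F2 = 100 * t / ng (eq. 3): t counts WIRE cells (types 10 and 11),
    ng is the total number of cells in the 5*5 array."""
    types = [int(g) for g in chromosome.split()][2::3]
    return 100 * sum(ty in (10, 11) for ty in types) // len(types)

chrom_a = ("1 3 5 0 2 5 1 1 10 0 3 5 2 1 5 2 4 5 4 3 7 1 0 5 3 1 11 0 3 8 "
           "1 4 11 0 4 10 4 1 10 3 2 11 3 4 5 3 2 6 2 0 9 4 2 10 3 2 10 "
           "1 1 10 4 0 10 3 1 11 3 2 11 4 0 10 1 2 11")
print(f2(chrom_a))  # → 56, the figure reported for chromosome (a)
```

Running the same count on chromosome (b) gives 64, consistent with the (64%) reported above.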
Fig. 3. Initial Array of 5*5 to evolve 2bit*2bit Multiplier
Fig. 4. Graphical Representation of chromosome (a) with 56% optimised solution
Fig. 5. Graphical Representation of chromosome (b) with 64% optimised solution
These solutions are 100% correct; (56%) and (64%) represent the optimisation fitness F2. Figure 4 shows the graphical representation of chromosome (a), which fulfils the 2bit*2bit multiplier functionality with 56% optimisation. One can easily notice that the GA found one of the desired outputs matching the output of a cell from the first
column of the array. Figure 5 shows the graphical representation of chromosome (b), which fulfils the 2bit*2bit multiplier functionality with 64% optimisation. Chromosome (b) represents a more optimal solution, with only 7 gates rather than the 8 in (a).
5 Conclusion and Further Work
This paper presents a step forward in evolving a complex hardware system. It proposes a novel approach, based on the use of Evolutionary Algorithms, to autonomously design complex logic circuits. The main objective is to minimise the total number of gates of an Embryonics cell while preserving total correctness. The results will be compared against the Embryonics cell produced by human designers. The importance of this technique and its contribution to the world of digital electronics is apparent: it directly addresses one of, if not the, most crucial issues of all electronic systems, namely their level of complexity. The initial experimental results presented in this manuscript proved promising and advantageous over traditional methodologies. Unlike the human design approach, the evolutionary design methodology searches a broad range of design alternatives; thus, steps towards Evolvable Hardware hold great promise to minimise the time and cost required by conventional approaches to design hardware systems. By conducting this research, one can investigate the power of Evolutionary Algorithms on larger-scale digital circuits, in this case the Embryonics cell, which consists of up to approximately 7000 transistors [3]. This approach is hoped to provide an answer to the scalability issue in the design of very complex electronic systems [9]. The divide-and-conquer methodology is an attractive approach to break a complex problem down into sub-problems [29]. Also, by conducting this research, one can learn more about the way evolutionary algorithms work when evolving hardware; these techniques can be taught to human designers for use in circuit design. Another challenge researchers face is to produce an efficient chromosome representation for complex architectures, and an efficient fitness function. These will also be investigated in this research.
To further support these early conclusions and strengthen the initial observations, this experiment will be repeated on more complex circuit functionalities, with the ultimate aim of automatic design of a self-repairing Embryonics hardware system.
References
1. Zhang, X., Dragffy, G., Pipe, A.G., Zhu, Q.M.: Ontogenetic Cellular Hardware for Fault Tolerant Systems. In: Proceedings of ESA'03: The 2003 International Conference on Embedded Systems and Applications, June 2003, Las Vegas, USA (2003)
2. Thompson, A.: Silicon Evolution. In: Koza, J.R., et al. (eds.) Proceedings of Genetic Programming 1996 (GP96), pp. 444–452. MIT Press, Cambridge (1996)
3. Zhang, X.: Biologically Inspired Highly Reliable Electronic Systems with Self-Healing Cellular Architecture. PhD Thesis, University of the West of England, Bristol, UK (2005)
4. Mange, D., Sipper, M., Stauffer, A., Tempesti, G.: Towards Robust Integrated Circuits: The Embryonics Approach. Proceedings of the IEEE 88(4) (2000)
5. Miller, J.F., Thomson, P., Fogarty, T.C.: Designing Electronic Circuits Using Evolutionary Algorithms. Arithmetic Circuits. In: Quagliarella, D., Periaux, J., Poloni, C., Winter, G. (eds.) Genetic Algorithms and Evolution Strategies in Engineering and Computer Science: Recent Advancements and Industrial Applications, ch. 6. Wiley, Chichester (1997)
6. Zhang, X., Dragffy, G., Pipe, A.G., Zhu, Q.M.: A Reconfigurable Self-healing Embryonic Cell Architecture. In: Proceedings of ERSA'03: The 2003 International Conference on Engineering of Reconfigurable Systems and Algorithms, June 2003, Las Vegas, USA (2003)
7. Miller, J.F., Vassilev, V.K.: Scalability Problems of Digital Circuit Evolution: Evolvability and Efficient Designs. In: Lohn, J., Stoica, A., Keymeulen, D. (eds.) The Second NASA/DoD Workshop on Evolvable Hardware, pp. 55–64. IEEE Computer Society, Los Alamitos (2000)
8. Tomassini, M.: Evolutionary Algorithms. In: International Workshop Towards Evolvable Hardware: The Evolutionary Engineering Approach, Logic Systems Laboratory, Switzerland (1995)
9. Yao, X., Higuchi, T.: Promises and Challenges of Evolvable Hardware. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 29(1) (1999)
10. Haddow, P.C., Tufte, G.: Evolving a Robot Controller in Hardware. In: Norwegian Computer Science Conference (NIK99), The Norwegian University of Science and Technology (1999)
11. Stoica, A.: Toward Evolvable Hardware Chips: Experiments with a Programmable Transistor Array. In: Proceedings of the 7th International Conference on Microelectronics for Neural, Fuzzy and Bio-Inspired Systems, Granada, Spain, April 7–9, 1999. IEEE Comp. Sci. Press (1999)
12. Teuscher, C., Mange, D., Stauffer, A., Tempesti, G.: Bio-Inspired Computing Tissues: Towards Machines that Evolve, Grow, and Learn. Biosystems 68(2-3), 235–249 (2003)
13. Holland, J.H.: Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor (1975)
14.
Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Kluwer Academic Publishers, Boston, MA (1989)
15. Stoica, A., Zebulum, R., Keymeulen, D., Tawel, R., Daud, T., Thakoor, A.: Reconfigurable VLSI Architectures for Evolvable Hardware: from Experimental Field Programmable Transistor Arrays to Evolution-Oriented Chips. IEEE Transactions on VLSI Systems, Special Issue on Reconfigurable and Adaptive VLSI Systems 9(1), 227–232 (2001)
16. Stoica, A., Zebulum, R.S., Ferguson, M.I., Keymeulen, D., Duong, V., Daud, T., Guo, X.: Rapid Evolution of Analog Circuits Configured on a Field Programmable Transistor Array. In: Proceedings of Smart Engineering System Design
17. Higuchi, T., et al.: Evolvable Hardware with Genetic Learning. In: Proc. of Simulated Adaptive Behaviour. MIT Press, Cambridge (1993)
18. Higuchi, T., et al.: Evolvable Hardware at Functional Level. In: IEEE International Conference on Evolutionary Computation (1997)
19. Lohn, J.D., Hornby, G.S.: Evolvable Hardware: Using Evolutionary Computation to Design and Optimize Hardware Systems. IEEE Computational Intelligence Magazine (February 2006)
20. Stoica, A., Zebulum, R., Keymeulen, D.: Mixtrinsic Evolution. In: Miller, J.F., Thompson, A., Thompson, P., Fogarty, T.C. (eds.) ICES 2000. LNCS, vol. 1801. Springer, Heidelberg (2000)
21. Corcoran, A.L., Wainwright, R.L.: Using LibGA to Develop Genetic Algorithms for Solving Combinatorial Optimization Problems. In: Chambers, L. (ed.) Practical Handbook of Genetic Algorithms: Applications, vol. I. CRC Press (1995)
22. Soliman, A.T., Abbas, H.M.: Combinational Circuit Design Using Evolutionary Algorithms. In: CCECE, May 2003, Montreal (2003)
23. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge (1992)
24. Rechenberg, I.: Evolution Strategy. In: Zurada, J., Marks, R. (eds.) Computational Intelligence: Imitating Life, pp. 147–159 (1994)
25. Holland, J.: Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI (1975)
26. Gordon, T.G.W., Bentley, P.J.: On Evolvable Hardware. In: Soft Computing in Industrial Electronics. University College London, UK (2001)
27. Von Neumann, J.: Theory of Self-Reproducing Automata. Burks, A.W. (ed.). University of Illinois Press, Urbana (1966)
28. Ortega-Sanchez, C.A.: Embryonics: A Bio-Inspired Fault-Tolerant Multi-Cellular System. PhD Thesis, The University of York, UK (May 2000)
29. Torresen, J.: A Divide-and-Conquer Approach to Evolvable Hardware. University of Oslo, Norway
Evolution of Polymorphic Self-checking Circuits
Lukas Sekanina
Faculty of Information Technology, Brno University of Technology
Božetěchova 2, 612 66 Brno, Czech Republic
[email protected]
Abstract. This paper presents elementary circuit components which exhibit self-checking properties, yet do not utilize any additional signals to indicate a fault. The fault is indicated by generating specific values at some of the standard outputs of a given circuit. In particular, various evolved adders containing conventional as well as polymorphic gates are proposed with less-than-duplication overhead; they are able to detect a reasonable number of stuck-at faults through oscillations at the carry-out output when the control signal of the polymorphic gates oscillates.
1 Introduction
The use of self-checking circuits is important in various applications nowadays. As it is assumed that fault-tolerance issues will be even more important in the era of nanoelectronics, the use of this type of circuit will certainly grow. Self-checking circuits are conventionally constructed by adding checking logic around an unmodified original output of the circuit. Garvie utilized an evolutionary algorithm to design small self-checking circuits which merge the original function with the checker logic [1]. He reported better results than conventional approaches when the total area of the circuit is measured. However, conventional as well as evolutionary approaches require special signals which indicate that a fault is present in the circuit. When more complex circuits are composed of these elementary self-checking circuits, these special signals have to be interconnected and aggregated to obtain global information about the fault. The goal of this work is to propose elementary circuit components which exhibit self-checking properties but do not utilize any additional signals to indicate the fault; the fault is indicated by generating specific values at some of the standard outputs of a given circuit. This research is motivated by the fact that some future computing architectures will probably contain large massively parallel arrays of locally interacting computing elements, in which the issue of efficient wiring will be very important; the existence of extra signals, or even global signals, will be problematic [2]. In order to reduce the overall cost and wiring, new circuits are proposed in which the user function (e.g. addition) is completely merged with the test procedure. Furthermore, as the circuits indicate faulty behavior by oscillations at one of their output signals, no additional testing input/output signal wires are needed.
L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 186–197, 2007.
© Springer-Verlag Berlin Heidelberg 2007
This paper
describes and analyzes some evolved adders which exhibit this property. These circuits utilize conventional as well as polymorphic gates. Polymorphic gates perform two or more logic functions, which are activated under certain conditions by changing control parameters (such as temperature, Vdd, light, external control voltage, etc.) of the circuit [3, 4]. Similarly to [5], instead of two functions, the proposed circuits perform only one function in both modes of their polymorphic gates. These circuits are designed in such a way that if a fault is present in the circuit, one of its outputs oscillates when the control signal of the polymorphic gates oscillates; thus, the fault can be detected. The plan for this paper is as follows: Section 2 summarizes basic principles of digital circuit testing. Section 3 introduces the concept of polymorphic electronics and examples of existing polymorphic gates and circuits. In Sections 4 and 5 the proposed method is described, which allows obtaining self-checking circuits with a reasonable overhead and without any supporting diagnostic wires. Evolved circuits are presented in Section 6 and discussed in Section 7. Conclusions are given in Section 8.
2 Self-checking Circuits
A built-in self-test (BIST) mechanism within an integrated circuit is a function which verifies all or a portion of the internal functionality of the integrated circuit [6]. Typically, a pseudo-random sequence generator produces the input signals for a section of combinational circuitry and a signature analyzer observes the output signals and produces a syndrome. The syndrome is compared with the correct syndrome to determine whether the circuit is performing according to a specification. BIST techniques can be classified as concurrent and non-concurrent. Concurrent BIST uses embedded checkers for on-line testing during normal operation. Non-concurrent BIST autonomously performs off-line testing of the device in which it is built, before normal operation. Circuits with Concurrent Error Detection (CED) are capable of detecting transient and permanent faults and are widely used in systems where dependability and data integrity are important [7]. CED techniques can also enhance off-line testability and reduce BIST overhead. Their importance grows as shrinking process technology makes logic sensitive to radiation hazards such as Single Event Upsets. Almost all CED techniques decompose circuits into two modules: the functional logic and the checker [8, 9]. The functional logic provides output encoded with an error-detecting code and the checker determines whether the output is a codeword. The checker itself traditionally provides a two-rail signal in which an error is indicated by both rails being at the same value. A classic CED technique is duplication, in which the output function generator and the check symbol generator are functionally identical and the checker simply compares their outputs. One well-defined class of self-checking circuits, called totally self-checking (TSC), offers two desirable properties in the presence of faults from the set of likely faults F [9]: (1) it is fault-secure (FS) with respect to F, i.e., no fault from
F can cause an undetected error for any input codeword; and (2) it is self-testing with respect to F, i.e., for every fault f from F, the circuit produces a non-code-space output for at least one code-space input. In a TSC circuit, the occurrence of the first error can be immediately signaled by activating the error signal. Automatic synthesis methods for TSC circuits with less-than-duplication overhead have been proposed (see, e.g., [10, 11, 12]). Garvie has presented a method based on evolutionary design capable of adding logic around an unmodified original output-generating-function circuit to make it TSC with less-than-duplication overhead. Moreover, he proposed a method capable of generating TSC circuits with unconstrained structure, i.e. circuits not strictly adhering to the function-checker modular decomposition [1].
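The classic duplication scheme described above is easy to model in software. The sketch below is our illustration, not a circuit from the paper: a gate-level full adder with an optional stuck-at fault injected on an internal node, compared against a fault-free duplicate by the checker.

```python
def full_adder(a, b, cin, stuck=None):
    """Gate-level full adder; 'stuck' optionally forces the internal node s1
    to a constant, modelling a stuck-at fault on that node."""
    s1 = a ^ b
    if stuck is not None:
        s1 = stuck                       # injected stuck-at fault on s1
    return s1 ^ cin, (a & b) | (s1 & cin)  # (sum, carry-out)

def ced_adder(a, b, cin, stuck=None):
    """Duplication CED: the (possibly faulty) functional copy is compared
    against a fault-free duplicate; the comparator output is the error flag."""
    out = full_adder(a, b, cin, stuck)
    ref = full_adder(a, b, cin)          # duplicate check-symbol generator
    return out, out != ref
```

A stuck-at-1 on s1 is caught for a = b = 1: `ced_adder(1, 1, 0, stuck=1)` raises the error flag, while the fault-free call does not. This doubles the logic, which is exactly the "duplication overhead" that TSC synthesis methods try to beat.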
3
Polymorphic Circuits
The concept of polymorphic electronics was proposed by Stoica et al. [3]. Polymorphic circuits are, in effect, multifunctional circuits. The change of their behavior comes from modifications in the characteristics of the components involved in the circuit (e.g. in the transistor's operating point) in response to controls such as temperature, power supply voltage, light, etc. [4, 3]. Table 1 gives examples of the polymorphic gates reported in the literature. The NAND/NOR gate is the most famous example [13]. This circuit consists of six transistors and operates as NAND when Vdd is 3.3 V and as NOR when Vdd is 1.8 V. The circuit was fabricated in a 0.5-micron CMOS technology. It is stable for ±10% variations of Vdd and for temperatures in the range of −20 °C to 200 °C.

Table 1. Examples of existing polymorphic gates and their implementation cost

Gate                Control values   Control           Transistors  Ref.
AND/OR              27/125 °C        temperature       6            [3]
AND/OR/XOR          3.3/0.0/1.5 V    external voltage  10           [3]
AND/OR              3.3/0.0 V        external voltage  6            [3]
AND/OR              1.2/3.3 V        Vdd               8            [4]
NAND/NOR            3.3/1.8 V        Vdd               6            [13]
NAND/NOR/NXOR/AND   0/0.9/1.1/1.8 V  external voltage  11           [14]
The use of polymorphic gates as building blocks offers an opportunity to design multifunctional digital modules at the gate level. Once the circuit is designed at the gate level (thus abstracting from the electrical level), it does not matter whether this circuit is “reconfigured” by the level of Vdd, by temperature or by light. For example, using the polymorphic NAND/NOR gate and some standard gates, it is possible to create a circuit which operates as a three-bit multiplier in the first environment and as a six-input sorter in the second environment [15].
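The gate-level abstraction can be sketched as a Boolean model of the NAND/NOR gate of [13], whose function is selected by the Vdd level (the 3.0 V threshold below is an illustrative assumption, not a measured switching point):

```python
# Behavioural sketch of the six-transistor polymorphic NAND/NOR gate [13]:
# the Boolean function it computes is selected by the supply voltage.
def nand_nor(a, b, vdd=3.3):
    if vdd >= 3.0:              # ~3.3 V supply: behaves as NAND
        return 1 - (a & b)
    return 1 - (a | b)          # ~1.8 V supply: behaves as NOR

# The two modes agree exactly when the inputs are equal, since
# NAND(x, x) = NOR(x, x) = NOT x; they differ only for unequal inputs.
assert all(nand_nor(x, x, 3.3) == nand_nor(x, x, 1.8) for x in (0, 1))
assert nand_nor(0, 1, 3.3) != nand_nor(0, 1, 1.8)
```

The final property, that the two modes disagree only on unequal inputs, is what the self-checking scheme of the following sections exploits.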
Evolution of Polymorphic Self-checking Circuits
4
Proposed Method
Similarly to [5], the goal is to propose a polymorphic circuit whose output Yosc oscillates when a fault is present in the circuit and the polymorphic mode is simultaneously being switched. In other words, it is required that the polymorphic circuit performs the same function in each mode of its polymorphic gates. Note that Yosc is not a special testing output signal; Yosc is one of the functional circuit outputs.

4.1
Polymorphic Self-checking
In contrast to [5], the presented circuits utilize conventional as well as polymorphic gates. Consider that only one type of polymorphic gate – the NAND/NOR gate (controlled by Vdd or by some external voltage Vs) – is utilized. Let Vc denote this control voltage, regardless of whether it is Vdd or Vs. The NAND/NOR gates can be switched simultaneously between the NAND and NOR functions by changing the control value Vc. If the inputs were not changed and the control signal were switched at the frequency kf (k is a positive integer and f is the operational frequency of the circuit), then the circuit outputs would remain steady (neglecting some switching spikes), which is the normal operation of the circuit. However, it is required that the circuit operates in such a way that if a stuck-at fault is present within the circuit, then the Yosc output will oscillate between 0 and 1 at the same frequency as the control signal oscillates between its NAND and NOR levels. In addition to its primary function, this output thus works as the indicator of a stuck-at fault in the circuit. The control signal is activated for the polymorphic gates whenever the circuit should be tested. The test can be performed either before the system is put into operation or in some time slots devoted to testing of the system. If it is unproblematic with respect to the system function, the proposed scheme allows testing the system permanently, during circuit operation. Common input values of the circuit are then considered as test vectors. In this mode, the control signal has to oscillate permanently.

4.2
Analysis of Self-checking Capabilities
In order to investigate the basic self-checking properties of the proposed circuits, we will analyze them at the gate level. The following procedure is applied for two types of faults (stuck-at-0 and stuck-at-1) injected at the outputs of all K gates within the circuit.

1. For i = 1 to K do
2. begin
(a) Inject a stuck-at fault at the output of gate i.
(b) Set polymorphic gates to mode 1.
(c) Calculate R1 – the vector of Yosc values for all possible combinations of input values.
(d) Set polymorphic gates to mode 2.
(e) Calculate R2 – the vector of Yosc values for all possible combinations of input values.
(f) Calculate Mgi – the set of input vectors that have induced different Yosc values for gate i.
end

This algorithm allows us to calculate the fault coverage for all possible test vectors (i.e. for the trivial test). In addition, we are able to find those input vectors which induce oscillations at the Yosc output for a particular stuck-at fault. It is unrealistic to expect that the oscillations will always appear when only a single test vector is applied. In most cases several test vectors have to be applied in order to indicate a stuck-at fault by oscillations at Yosc. The goal is to find a minimal subset, Mmin, of all possible input vectors which, when applied simultaneously with changing Vc, will cause oscillations at Yosc in the case that a stuck-at fault is present in the circuit. However, the approach has one disadvantage: when a stuck-at fault is present at Yosc itself, no oscillations are observable and the fault has to be detected in another way. It can also happen for some gates that a stuck-at fault at those gates produces no detectable oscillations at Yosc for any input vector. The goal is to avoid the use of such gates and subcircuits in the circuit because they decrease the efficiency of the approach.

4.3
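The procedure above can be sketched in a few lines (a hypothetical gate-level encoding of ours; the paper itself does not specify an implementation). A fault oscillates at Yosc exactly when the two polymorphic modes produce different values there:

```python
# Sketch of the fault-injection analysis: a circuit is a list of
# (gate, in1, in2) triples; signals 0..n_inputs-1 are primary inputs and
# gate i drives signal n_inputs + i. 'P' is the polymorphic NAND/NOR gate
# (NAND in mode 1, NOR in mode 2).
from itertools import product

def evaluate(circuit, inputs, mode, fault=None):
    vals = list(inputs)
    for i, (gate, a, b) in enumerate(circuit):
        x, y = vals[a], vals[b]
        if gate == 'P':
            out = 1 - (x & y) if mode == 1 else 1 - (x | y)
        elif gate == 'XOR':
            out = x ^ y
        else:                        # 'NOT': second input is ignored
            out = 1 - x
        if fault is not None and fault[0] == i:
            out = fault[1]           # (gate index, stuck value 0 or 1)
        vals.append(out)
    return vals

def oscillating_vectors(circuit, n_inputs, y_osc, fault=None):
    """M_gi: input vectors for which Y_osc differs between the two modes."""
    return [inp for inp in product([0, 1], repeat=n_inputs)
            if evaluate(circuit, inp, 1, fault)[y_osc]
            != evaluate(circuit, inp, 2, fault)[y_osc]]

# A fault stuck directly at the observed output can never oscillate,
# mirroring the limitation noted above for gates driving Y_osc directly.
assert oscillating_vectors([('P', 0, 1)], 2, 2, fault=(0, 0)) == []
```

Collecting the sets M_gi for all gates and both stuck values then reduces the search for Mmin to a small set-cover problem over the input vectors.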
Test Problem: Full Adder
The approach will be tested on a standard 1-bit full adder which has two operands, A and B, and an input carry, Cin. It generates the sum S = A ⊕ B ⊕ Cin and the output carry Cout = AB + BCin + ACin. A standard optimized static CMOS VLSI implementation of the 1-bit full adder costs 24 transistors [16]. In order to construct adders with specific properties, standard 1-bit full adders are usually equipped with several additional input/output signals. For example, carry-look-ahead adders use 1-bit full adders that generate two additional signals – propagate, P = A ⊕ B, and generate, G = AB – utilized to look ahead and predict the carry-out [17]. Self-checking adders contain parity prediction logic [10, 12]. In our case, no additional signals have to be utilized to indicate a stuck-at fault. Either S or Cout will perform this task. As Section 6 will show, it is useful to utilize Cout (i.e., Yosc = Cout) because the oscillations can propagate through the next adders connected in the carry-propagate chain. Thus, it could be possible to create an n-bit self-checking adder which indicates a stuck-at fault by oscillations at its most significant Cout. A self-checking adder consisting of fourteen NAND/NOR gates was proposed in [5] (denoted as FA-14 in Table 2). The authors conjectured that polymorphic circuits performing an identical function in both modes would be more tolerant to radiation-induced faults, as their function could be restored by
changing the mode of operation. It seems that the adder was not designed with the aim of detecting stuck-at faults by oscillations at Cout. We analyzed this circuit using the proposed method and found that a stuck-at fault is not recognized for nine gates (when oscillations are measured at Cout, i.e. at gate 13) and for six gates (when oscillations are measured at S, i.e. at gate 14). Therefore, this adder is not suitable for our purposes. Moreover, its implementation cost is relatively high (14 × 6 = 84 transistors).
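The adder specification from Section 4.3 can be captured as a small reference Boolean model (ours, for illustration), including the carry-look-ahead propagate/generate signals mentioned above:

```python
# Boolean reference model of the 1-bit full adder of Sec. 4.3, together with
# the propagate (P = A xor B) and generate (G = AB) signals of [17].
def full_adder(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)
    p, g = a ^ b, a & b
    return s, cout, p, g

assert full_adder(1, 1, 0) == (0, 1, 0, 1)   # generate raises the carry
assert full_adder(1, 0, 1) == (0, 1, 1, 0)   # propagate passes Cin through
```

Such a model provides the golden responses against which the two polymorphic modes of an evolved adder are compared.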
5
Evolutionary Design of Polymorphic Circuits
In order to evolve gate-level polymorphic circuits which perform an identical function in both modes, we will use an evolutionary algorithm (EA) inspired by Cartesian genetic programming [18, 15]. A candidate digital circuit is represented using a two-dimensional array of nr × nc programmable nodes. Each programmable node has two inputs and a single output and can be programmed to implement one of the functions given in the function set Γ. The role of the EA is to find the interconnection of the nodes and the functions performed by the nodes for a given specification expressed by means of a truth table. A candidate circuit is encoded as an array of integers of size 3·nr·nc + no, where no is the number of circuit outputs. As only combinational circuits are evolved, no feedback links are allowed in candidate circuits. Hence a node input can be connected either to an output of a node placed in one of the preceding columns or to a primary circuit input. The EA uses one genetic operator – mutation – which modifies m integers of the chromosome. Either a node or an output connection is modified. The EA operates with a population of λ individuals. The initial population is randomly generated. Every new population consists of a parent (the fittest individual from the previous population) and its mutants. If two or more individuals have received the same fitness score in the previous generation, the individual which did not serve as the parent in the previous population is selected as the new parent. The fitness value is defined as follows:

fitness = B1 + B2 + (W − z)    (1)

where B1 (resp. B2) is the number of correct output bits for the first (resp. second) polymorphic mode obtained as the response to all possible input combinations, z denotes the number of transistors utilized in a particular candidate circuit and W is a suitable constant (W = 10·nr·nc). The last term is considered only if the circuit behavior is perfect in both modes; otherwise W − z = 0. The number of transistors is calculated for the nodes used in the phenotype as follows: the NAND/NOR gate costs 6 transistors [13], the XOR gate costs 8 transistors and the inverter costs 2 transistors. We intentionally restricted Γ to contain only the gates NAND/NOR, XOR and NOT because these gates were recognized during experiments as the most useful for our task.
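A minimal sketch of the fitness of Eq. (1), assuming the responses of a candidate circuit have already been computed for every input combination in both polymorphic modes (function and constant names are ours):

```python
# Fitness of Eq. (1): B1 + B2, with the transistor-saving bonus (W - z)
# granted only once the behaviour is perfect in both modes.
COST = {'NAND/NOR': 6, 'XOR': 8, 'NOT': 2}   # transistor counts from Sec. 5

def fitness(resp_mode1, resp_mode2, target, phenotype_gates, W):
    b1 = sum(r == t for r, t in zip(resp_mode1, target))
    b2 = sum(r == t for r, t in zip(resp_mode2, target))
    score = b1 + b2
    if b1 == len(target) and b2 == len(target):
        z = sum(COST[g] for g in phenotype_gates)  # only gates in the phenotype
        score += W - z
    return score

assert fitness([1, 0], [1, 0], [1, 0], ['NOT'], W=100) == 102  # 2 + 2 + (100 - 2)
assert fitness([1, 1], [1, 0], [1, 0], ['NOT'], W=100) == 3    # imperfect: no bonus
```

This staged reward first drives the search toward functional correctness in both modes and only then toward smaller transistor counts.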
6
Results
This section presents properties of selected self-checking circuits that we evolved using the following basic setup: population size is 15, one randomly selected gene is mutated and 100k generations are produced in each run. Other parameters are described in the particular subsections. Table 2 summarizes some features of the evolved circuits. The FA-14 is shown for comparison.

Table 2. Parameters of self-checking circuits. The last column lists the gates with unrecognizable faults when measured at Cout or S (see also Fig. 4). The overhead is calculated in relation to the number of transistors in optimized conventional solutions.

Circuit    Gates  Trans.  Delay  Overhead  Unrecognizable faults at gates:
FA-14 [5]  14     84      5      250%      (3, 4, 5, 8, 9, 11, 12, 13, 14)-Cout; (1, 2, 3, 10, 13, 14)-S
FA3n       6      36      5      50%       (6)-Cout
FA3 [19]   7      38      4      58%       (5, 7)-Cout
FA2        8      52      3      116%      (3, 6, 8)-Cout
FA1        9      54      6      125%      (6, 7, 8, 9)-Cout; (7, 9)-S
HA         4      28      3      100%      (4)-Cout
ANDFA1     10     64      7      113%      (5, 10)-Cout
6.1
Full Adders
Best-Evolved Adder (FA3n). In this experiment, we utilized 20 programmable nodes organized in an array of 20 × 1 elements and Γ = {NAND/NOR, XOR, NOT}. Figure 1 shows the FA3n adder, which exhibits the best self-checking properties out of all experiments. It consists of two XOR gates, an inverter and three polymorphic NAND/NOR gates. The sum calculation is quite standard. On the other hand, the carry output is calculated unconventionally using the three NAND/NOR gates, which are connected to the circuit inputs as well as to the inverted sum S. Independently of the level of the control signal of the polymorphic gates, this logic network always generates a correct carry-out signal when there is no fault present in the circuit. The proposed adder costs 36 transistors, i.e. the overhead is 50% in comparison with a conventional transistor-level implementation. The overhead is similar to that of conventionally designed self-checking circuits [20, 1] (for example, the self-checking adder reported in [11] also consists of 36 transistors); however, the proposed adder does not require any additional wires when Vdd controls the polymorphic gates. It is assumed that standard gates (the XORs and inverters) operate correctly in both modes of the polymorphic gates (for example, for Vdd = 1.8 V as well as Vdd = 3.3 V). Note that the other adders reported in this paper exhibit an overhead of more than 100%. Figure 4 shows the fault coverage for all possible test vectors (i.e. for the trivial test). Test vectors are indexed 0–7, which corresponds to the circuit inputs ordered as (Cin, B, A). The symbol “x” means that a corresponding test vector is
Fig. 1. FA3n – the best evolved 1-bit self-checking adder and its utilization in the carry-propagate adder
able to induce oscillations at the carry-out output for the particular stuck-at fault. We can observe that a stuck-at fault cannot be detected only when it is injected at gate 6. The reason is that this gate is connected directly to the primary output of the adder. It is easy to derive from Fig. 4 that by applying the test vectors Mmin = {1, 2, 3, 5} or {1, 2, 5, 6} or {2, 4, 5, 6}, all single faults can be detected. In other words, at least four test vectors have to be applied in order to initiate oscillations at the carry-out output when a single fault is present in the adder. The probability of fault detection is 0.325 when only a single randomly generated test vector is applied.

The n-bit Self-Checking Adder. As Fig. 1 shows, by cascading 1-bit self-checking adders FA3n we can construct carry-propagate adders. Consider a 2-bit adder. When a fault is present in STA1, the fault is indicated by oscillations at Cout1, as explained for the 1-bit full adder in the previous paragraph. In order to detect at Cout1 a fault which is present in STA0, Cout0 has to propagate through STA1. This can only be achieved by setting A1 = B1. Similarly to the previous section, we can observe that a stuck-at fault cannot be detected when injected at gates 6 or 12. Only four test vectors, for example Mmin = {10, 14, 20, 21}, are needed to detect all single faults at the remaining gates and thus to initiate the oscillations at Cout1. In general, four test vectors have to be used to perform this task for an n-bit self-checking adder. For the 2-bit self-checking adder, the probability of fault detection in STA1 is 0.325 when only a single randomly generated test vector is applied. However, the probability of fault detection is only 0.1625 when a fault is present in STA0. Recall that A1 = B1 has to be ensured to propagate the oscillations. In general, this probability halves with every further 1-bit adder closer to the least significant bit.
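The halving argument can be checked with a quick calculation (a hypothetical helper of ours): detecting a fault in stage k of an n-bit carry-propagate adder requires Aj = Bj in every more significant stage j, which halves the single-vector detection probability once per stage above the faulty one.

```python
# Single-random-vector detection probability for a fault in a given stage
# of an n-bit FA3n carry-propagate adder (stage 0 = least significant).
def detection_probability(p_single, stage, n_bits):
    return p_single / 2 ** (n_bits - 1 - stage)

assert detection_probability(0.325, 1, 2) == 0.325    # fault in STA1
assert detection_probability(0.325, 0, 2) == 0.1625   # fault in STA0
```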
Therefore, it is much easier to detect faults in more significant bits of the adder than in less significant bits when only a single randomly chosen test vector is applied. However, this problem could be overcome by also observing the internal carry-out signals.

Shortest-Delay Adder (FA2). In this experiment, we utilized nr = 8, nc = 3 and Γ = {NAND/NOR, XOR, NOT}. Figure 2 shows an evolved self-checking adder which exhibits the shortest delay. As Fig. 4 shows, some stuck-at faults cannot be recognized by oscillations at Cout.
Fig. 2. FA1 – self-checking adder solely composed of NAND/NOR gates (left), FA2 – the shortest delay self-checking adder (right)
An Adder Composed of NAND/NOR Gates Only (FA1). In this experiment, we utilized nr = 2, nc = 6 and Γ = {NAND/NOR}. Figure 2 shows an evolved self-checking adder which consists of nine NAND/NOR gates. Its main feature is that it contains fewer gates than the existing FA-14 adder. As many stuck-at faults cannot be recognized by oscillations at Cout (see Fig. 4), this adder is not practically useful for creating carry-propagate adders. However, almost all stuck-at faults are recognizable at the S output.

6.2
Half Adder (HA)
Figure 3 shows the evolved self-checking half adder. It is implemented using 28 transistors (see Table 2), which means a 100% overhead. As Fig. 4 shows, at least two test vectors have to be applied in order to detect all recognizable stuck-at faults. In this experiment, we utilized 24 programmable nodes organized in an array of 3 × 8 elements and Γ = {NAND/NOR, XOR, NOT}.

6.3
Extended Adder (ANDFA1)
By the extended adder we mean a circuit with four inputs A, B, C and D which first calculates H = A AND B. Then, the sum S and the carry-out are calculated
Fig. 3. HA – self-checking half adder (left), ANDFA1 – self-checking extended adder (right)
FA1 - Cout stuck-at-0 gi 01234567 g1 XX..X... g2 ..X.X... g3 .XX..... g4 ...X.XX. g5 X.X..... g6 ........ g7 ........ g8 ........ g9 ........ stuck-at-1 g1 ...X..XX g2 ...X.X.. g3 .....XX. g4 .XX.X... g5 .....X.X g6 ........ g7 ........ g8 ........ g9 ........
FA1 - S stuck-at-0 gi 01234567 g1 XXXXX.X. g2 X.X.XXXX g3 XXXX.X.X g4 XXXXXXXX g5 ..X....X g6 ...X..X. g7 ........ g8 X....X.. g9 ........ stuck-at-1 g1 .X.XXXXX g2 XXXX.X.X g3 X.X.XXXX g4 XXXXXXXX g5 X....X.. g6 .X..X... g7 ........ g8 ..X....X g9 ........
FA2 stuck-at-0 gi 01234567 g1 .X.XX.X. g2 .XX..XX. g3 ........ g4 ..XXXX.. g5 .X.XX.X. g6 ........ g7 .X.XX.X. g8 ........ stuck-at-1 g1 .X.XX.X. g2 .XX..XX. g3 ........ g4 ..XXXX.. g5 .X.XX.X. g6 ........ g7 .X.XX.X. g8 ........
FA3n stuck-at-0 gi 01234567 g1 ..XXXX.. g2 .X..X... g3 .XX.X... g4 ...X.XX. g5 X.X..... g6 ........ stuck-at-1 g1 .X....X. g2 ...X..X. g3 ...X.XX. g4 .XX.X... g5 .....X.X g6 ........
HA stuck-at-0 gi 0123 g1 .XX. g2 .XX. g3 .XX. g4 ........ stuck-at-1 g1 X..X g2 .XX. g3 .XX. g4 ....
ANDFA1 stuck-at-0 gi 0123456789012345 g1 .....XX..XX..... g2 .....XX..XX..... g3 ....XXXXXXXX.... g4 .....XX..XX..... g5 ................ g6 ...X....XXX..... g7 ...XXXX.XXX..... g8 .......X...XXXX. g9 XXX.XXX......... g10 ................ stuck-at-1 g1 ....X..XX..X.... g2 .....XX..XX..... g3 ...X........XXX. g4 .....XX..XX..... g5 ................ g6 .......X....XXX. g7 .......X...XXXX. g8 ...XXXX.XXX..... g9 ...........X...X g10 ................
Fig. 4. Fault coverage for evolved circuits. Symbol “x” means that a corresponding test vector is able to induce oscillations at the output for the particular stuck-at-fault.
using H, C and D as inputs. This circuit serves as a building block of combinational multipliers [17]. Figure 3 shows the evolved ANDFA1 adder, which contains 10 gates (64 transistors). This represents an overhead of 113%. In this experiment, we utilized 28 programmable nodes organized in an array of 7 × 4 elements and Γ = {NAND/NOR, XOR, NOT}.
7
Discussion
All evolved circuits exhibit the required logic behavior in both modes of the polymorphic gates. Regarding testability properties, only the FA3n and HA circuits can indicate all stuck-at faults at the Cout output. However, the problem is that stuck-at faults located directly at the Cout output cannot, in principle, be detected within this scheme. This represents the main weakness of the proposed approach. As the area overhead of the FA3n and HA circuits is 50% and 100%, respectively, they are also good alternatives to dual-module redundancy. These circuits can be utilized as building blocks of larger adders and multipliers in which a stuck-at fault can be indicated by changes at one of their outputs when the mode of the polymorphic gates is changed. In the case of multipliers, more effort will be needed to find cheaper implementations of extended adders. Breaking the carry chain in any of these circuits represents a serious problem for the method. Another problem is related to the decreasing probability of detecting stuck-at faults at less significant bits in carry-propagate adders. The proposed adders show very promising behavior, especially when we consider that the requirement on testability was not included in the fitness function in this initial study. We just evolved polymorphic adders and then verified their
testability. In our future research, we will include the requirement on testability directly in the fitness function. The computational time for this problem is not high. For the 1-bit full adder we performed 370 experiments. On average, 11 thousand generations were produced in each experiment, which takes approx. 0.25 s on a standard PC equipped with an Athlon64 X2 4800+ processor.
8
Conclusions
We presented low-cost implementations of self-checking adders which indicate a stuck-at fault by oscillations at the carry-out output. When it is possible for a target application to switch the control signal of the polymorphic gates (e.g. Vdd) either in some time slots or during the entire operation, the adder checks itself. In fact, the adder can be permanently under test because its common input values can serve as test vectors. If these input values are diverse and updated very often, the probability of fault detection increases. The proposed circuits can be utilized as components for creating larger self-checking adders and multipliers which do not require any additional test signals.
Acknowledgements This work was partially supported by the Grant Agency of the Czech Republic under contract No. 102/06/0599 Methods of polymorphic digital circuit design and the Research Plan No. MSM 0021630528 - Security-Oriented Research in Information Technology.
References

[1] Garvie, M.: Reliable Electronics through Artificial Evolution. PhD thesis, University of Sussex (2005)
[2] Frank, M.: Reversibility for Efficient Computing. PhD thesis, Massachusetts Institute of Technology (1999)
[3] Stoica, A., Zebulum, R.S., Keymeulen, D.: Polymorphic electronics. In: Liu, Y., Tanaka, K., Iwata, M., Higuchi, T., Yasunaga, M. (eds.) ICES 2001. LNCS, vol. 2210, pp. 291–302. Springer, Heidelberg (2001)
[4] Stoica, A., Zebulum, R.S., Keymeulen, D., Lohn, J.: On polymorphic circuits and their design using evolutionary algorithms. In: Proc. of the IASTED International Conference on Applied Informatics AI2002, Innsbruck, Austria (2002)
[5] Zebulum, R.S., Stoica, A.: Multifunctional Logic Gates for Built-In Self-Testing. NASA Tech Briefs 30(3), 10 (2006)
[6] Novak, O., Gramatova, E., Ubar, R.: Handbook of Testing Electronic Systems. Czech Technical University Publishing House (2005)
[7] Pradhan, D.K.: Fault-Tolerant Computer System Design. Prentice-Hall, Englewood Cliffs (1996)
[8] Diaz, M., Azéma, P., Ayache, J.M.: Unified design of self-checking and fail-safe combinational circuits and sequential machines. IEEE Trans. Computers 28(3), 276–281 (1979)
[9] Piestrak, S.J.: Feasibility study of designing TSC sequential circuits with 100% fault coverage. In: 17th IEEE Int. Symposium on Defect and Fault-Tolerance in VLSI Systems, pp. 354–364. IEEE Computer Society Press, Los Alamitos (2002)
[10] Touba, N.A., McCluskey, E.J.: Logic synthesis of multilevel circuits with concurrent error detection. IEEE Trans. on CAD of Integrated Circuits and Systems 16(7), 783–789 (1997)
[11] Marienfeld, D., Ocheretnij, V., Gössel, M., Sogomonyan, E.S.: Partially duplicated code-disjoint carry-skip adder. In: Proc. of the 17th IEEE Int. Symposium on Defect and Fault Tolerance in VLSI Systems, pp. 78–86. IEEE Computer Society Press, Los Alamitos (2002)
[12] Kakaroudas, A.P., Papadomanolakis, K., Kokkinos, V., Goutis, C.E.: Comparative study on self-checking carry-propagate adders in terms of area, power and performance. In: Soudris, D.J., Pirsch, P., Barke, E. (eds.) PATMOS 2000. LNCS, vol. 1918, pp. 187–194. Springer, Heidelberg (2000)
[13] Stoica, A., Zebulum, R., Guo, X., Keymeulen, D., Ferguson, I., Duong, V.: Taking Evolutionary Circuit Design From Experimentation to Implementation: Some Useful Techniques and a Silicon Demonstration. IEE Proc.-Comp. Digit. Tech. 151(4), 295–300 (2004)
[14] Zebulum, R.S., Stoica, A.: Four-Function Logic Gate Controlled by Analog Voltage. NASA Tech Briefs 30(3), 8 (2006)
[15] Sekanina, L., Starecek, L., Gajda, Z., Kotasek, Z.: Evolution of multifunctional combinational modules controlled by the power supply voltage. In: Proc. of the 1st NASA/ESA Conference on Adaptive Hardware and Systems, pp. 186–193. IEEE Computer Society Press, Los Alamitos (2006)
[16] Weste, N., Harris, D.: CMOS VLSI Design: A Circuits and Systems Perspective, 3rd edn. Addison-Wesley, Reading (2004)
[17] Wakerly, J.: Digital Design: Principles and Practices. Prentice-Hall, Englewood Cliffs (2000)
[18] Miller, J., Job, D., Vassilev, V.: Principles in the Evolutionary Design of Digital Circuits – Part I. Genetic Programming and Evolvable Machines 1(1), 8–35 (2000)
[19] Sekanina, L.: Design and Analysis of a New Self-Testing Adder Which Utilizes Polymorphic Gates. In: Proc. of the 10th IEEE Design and Diagnostics of Electronic Circuits and Systems Workshop, pp. 1–4. IEEE Computer Society Press, Los Alamitos (2007)
[20] Ocheretnij, V., Marienfeld, D., Sogomonyan, E.S., Gössel, M.: Self-checking code-disjoint carry-select adder with low area overhead by use of add1-circuits. In: 10th IEEE Int. On-Line Testing Symposium, pp. 31–36. IEEE Computer Society Press, Los Alamitos (2004)
Sliding Algorithm for Reconfigurable Arrays of Processors

Natalia Dowding and Andy M. Tyrrell

Intelligent Systems Research Group, Department of Electronics, University of York, United Kingdom
Abstract. Electronic systems with intrinsic adaptive and evolvable features can potentially increase the functionality of a system significantly. To achieve a high level of adaptivity, the system must be able to modify its internal configuration under changing environmental conditions without interrupting operation. This can be achieved through dynamic reconfiguration. Dynamic reconfiguration of arrays of processors often relies on specialized architectures with built-in reconfiguration capabilities. Specialized architectures suffer from lack of flexibility and high cost. Reconfiguration algorithms for highly practical general-purpose architectures, such as a rectangular grid of processors, are highly complex and thus unsuitable for dynamic reconfiguration. This paper proposes a systematic approach to reconfigurable architectures. A general framework for the design of reconfiguration algorithms is presented, based on discrete Morse functions and discrete vector fields on cellular complexes.
Introduction

Recent advances in VLSI (Very Large Scale Integration) technologies make it possible to fabricate circuits with extremely high densities of transistors. The high speed, computational power and especially high complexity achieved by the silicon industry in recent decades have brought to light new requirements and new potential. High complexity makes it desirable to endow a system with the built-in properties of self-repair, self-replication and adaptation. These properties are intrinsic to higher multi-cellular organisms, which demonstrate amazing survival and adaptive qualities. Recent research has demonstrated significant progress in implementing some of these biologically inspired features in hardware (e.g. [15], [14]). In order to introduce such properties in silicon, it is necessary that the system is able to perform a number of actions at run-time. These actions may include changing the status of internal components of the system as required, deactivating some of the working components and activating others, performing self-tests and self-diagnostics in an efficient manner and maintaining a number of spare components.
The authors would like to thank the EPSRC (grant number GR/R74512/01) and the MoD for funding this project.
L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 198–209, 2007. © Springer-Verlag Berlin Heidelberg 2007
Taking into account the increasing complexity of electronic systems, self-repair becomes vital for reliable and effective operation. The self-repair property can be provided by a built-in run-time reconfiguration mechanism. Reconfiguration is important for adaptive systems for two reasons. Firstly, reconfiguration can be thought of as the ability of the system to change its internal structure to adapt to a changing environment. Secondly, reconfiguration can be used to supply the system with fault-tolerance capability to ensure reliable operation, since the probability of faults increases in complex systems with a high degree of integration. Reconfiguration techniques have been studied by a number of researchers including [13], [11], [7], [8], [9]. The previous work on reconfiguration can be subdivided into several classes depending on the time of reconfiguration, on the topology of the basic array of processors and on the general approach to reconfiguration (generic or architecture-specific). Regarding the time of reconfiguration, there are generally two different classes of reconfiguration strategies: fabrication-time and run-time reconfiguration. Most of the work is devoted to fabrication-time reconfiguration (see [13], [9]) while only a small fraction of research studies run-time techniques (see [11]). Different topologies of arrays of processors include the mesh, hypercube, tree and torus. Regarding the structure of the array of processors, most research has been devoted to regular mesh-connected arrays [13], [11], [9]. The reason for this is their practical importance. For many applications this is the natural and most convenient topology. These applications include image and signal processing, operations on matrices and many others. In addition, many applications that are suitable for a rectangular array of processors allow a wavefront implementation.
In a wavefront array, processors operate in an asynchronous and distributed manner, making it very attractive for reconfiguration algorithms. Structures with data dependency, such as binary trees, are the least favorable for run-time reconfiguration. Processors in a tree operate in a synchronized manner and failure of a single processor may lead to the failure of the whole system. Still, binary trees are useful structures and are used in many applications such as dictionary and database machines, task scheduling and logic function representation. The previous work on reconfigurable binary trees included the design of specialized architectures adapted to the specifics of the binary tree topology and the reconfiguration requirements. The reconfiguration capabilities are normally provided by introducing additional communication lines, switches and modules (see [11], [7], [8]). However, specialized architectures are suitable for a very narrow class of applications and are not suitable for re-use. This makes them very costly and inflexible. In this paper a general framework for reconfiguration for fault-tolerance on the basis of discrete Morse functions will be presented, and it will be shown how discrete Morse functions can be used in the search for a replacement node and for establishing the route to it. A binary tree embedded into a mesh will be considered and the reconfiguration algorithm for the embedded binary tree will be described.
1
Background
In order to establish consistent terminology throughout the paper, the following terms will be used. The host array is the original physical array of processors, which may contain faulty processors. The target array is a fault-free subset of the host array capable of performing the desired task correctly. The aim of reconfiguration is to build the target array. The purpose of run-time or dynamic reconfiguration algorithms is to enable the system to operate in the case of failure of processing units during execution. The ability of the system to tolerate faults is a critical feature for adaptive evolvable systems because their advantages can be appreciated to full extent only in the case of uninterrupted operation over a long period of time. Run-time reconfiguration encounters several challenges which make this feature very difficult to implement in integrated circuits. These challenges include:

– The time of reconfiguration should be as small as possible.
– When reconfiguration is complete, the processing units should be in a synchronized status which corresponds to the status of the system at the moment preceding the failure.
– The algorithm should be distributed to ensure acceptable time and reliability.

The efficiency of the specialized architectures lies in the fact that they use to their advantage full information about the topological structure of the target array. In the case of general-purpose arrays, the target array must be embedded into the host array, and the structure of the host array is not always ideally suited for this. The general framework proposed in this paper is aimed at providing a systematic methodology for the design of reconfiguration algorithms which brings together the efficiency of the specialized schemes and the flexibility of general-purpose schemes. In this section the theoretical background behind the proposed strategy, and the associated terminology, will be introduced.
Since this material is rather mathematical, it is important to lay a firm foundation before applying it to processing arrays.

1.1 Discrete Morse Function
Discrete Morse functions are mappings defined on discrete domains called cell complexes (see [1]). A complete treatment of the subject can be found in [1], [3], [4]. The notation adopted in [2], [3], [4] is used to present the definitions and concepts below. In order to understand the concept of a discrete Morse function it is necessary to introduce cell complexes. To do so, consider a unit ball B^m of dimension m. The boundary of the ball B^m is the sphere S^{m−1}. A topological space which is homeomorphic to the unit ball B^m is called an m-cell.
Sliding Algorithm for Reconfigurable Arrays of Processors
A cell C^m can be glued to a cell complex K^{m−1} by means of an attaching map, which sends each point of the boundary of C^m to a point of K^{m−1}: ϕ: S^{m−1} −→ K^{m−1}. A collection of cells of dimension d, 0 ≤ d ≤ m, together with the attaching maps is called a cell complex (see [2]). Examples of cell complexes are a sphere, a torus and two spheres with a common point. An arbitrary graph is an example of a cell complex which consists of cells of dimension 0 and 1 (see figure 1). Let K be an arbitrary cell complex and denote by σ^n an n-cell of K. A set of cells of the complex K of dimension d ≤ k is called a k-sub-complex of K. The following notation and definitions are adopted in connection with cell complexes:

1. The boundary of the cell τ^m is denoted ∂τ.
2. If cells σ^{m−1}, τ^m ∈ K and σ^{m−1} ⊂ ∂τ, that is, σ^{m−1} lies in the boundary of the cell τ^m, then we write σ^{m−1} < τ^m.
3. If σ^{m−1} < τ^m then σ^{m−1} is called a face of the cell τ^m.
4. Card(N) is the cardinality of the set N.

Let f be a real function on the cell complex K assigning a real number to each cell. Following [2], the next definitions describe the concepts of a discrete Morse function and of critical cells:

Definition 1. The real function f on K is called a discrete Morse function if the following conditions are satisfied:
1. If N = {τ^{m+1} > σ^m | f(τ^{m+1}) ≤ f(σ^m)}, then Card(N) ≤ 1.
2. If M = {υ^{m−1} < σ^m | f(υ^{m−1}) ≥ f(σ^m)}, then Card(M) ≤ 1.

Definition 2. The cell σ^m is called a critical cell of index m in the complex K if the following conditions are satisfied:
1. If N = {τ^{m+1} > σ^m | f(τ^{m+1}) ≤ f(σ^m)}, then Card(N) = 0.
2. If M = {υ^{m−1} < σ^m | f(υ^{m−1}) ≥ f(σ^m)}, then Card(M) = 0.

An example of a discrete Morse function is given in figure 1. Following [4], a discrete vector field is defined on a cell complex with a discrete Morse function as follows.

Definition 3. Let K be an arbitrary cell complex.
The discrete vector field V on K is a mapping V: K −→ K ∪ {∅} such that the following conditions are satisfied:
1. If V(σ) ≠ ∅ then dim V(σ) = dim σ + 1.
2. If V(σ) = τ then V(τ) = ∅.
3. For all σ ∈ K, Card({υ ∈ K | V(υ) = σ}) ≤ 1.

In other words, the discrete vector field gives rise to sequences of cells σ1, τ1, σ2, τ2, ..., σn, τn, called vector chains. In each pair σi, τi the cell σi is a face of τi. The second
Fig. 1. Discrete Morse function on the cell complex and the corresponding discrete vector field
and third conditions state that each cell can serve as an image or a pre-image of no more than one cell and, therefore, V is well-defined. A particularly useful class of discrete vector fields connected with Morse functions are the gradient-like discrete vector fields. A gradient-like discrete vector field is associated with a discrete Morse function and can be constructed in several steps. Start with the sub-complex K^1: for every non-critical 1-cell σ^1, if a vertex v ∈ ∂σ^1 and f(v) > f(σ^1), establish a vector which goes from v to σ^1. Proceed for higher dimensions until all cells and their boundaries have been examined: if σ^m < τ^{m+1} and f(τ^{m+1}) ≤ f(σ^m), then a vector starts at σ^m and finishes at τ^{m+1}. The discrete vector field shown in figure 1 belongs to this class of gradient-like vector fields. If K is a cell complex and f: K −→ R is a discrete Morse function, then a unique gradient-like vector field corresponding to f can be defined on K. The discrete vector field establishes the rules of transition from one arbitrary cell of the cell complex to another until a critical cell is reached, thus determining a discrete flow on the cell complex. Hence, the discrete gradient-like vector field leads one away from a zero-valued critical vertex towards a maximal critical vertex. Specifying a location for the replacement processor and establishing a valid route to it represent the most challenging problems for a distributed reconfiguration strategy (see [9]). In Section 2 it will be shown how discrete Morse functions can be used to discover an appropriate location for the replacement processor and to establish a valid path to this location.
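As an illustration of Definitions 1 and 2, the Morse conditions and the critical cells can be checked programmatically on a one-dimensional cell complex, i.e. a graph with a value on every vertex (0-cell) and every edge (1-cell). This is a sketch, not from the paper; the function names and the small example complex below are invented for illustration.

```python
# A minimal sketch (not from the paper) of Definitions 1 and 2 on a
# one-dimensional cell complex: vertices and edges carry real values.

def is_discrete_morse(f_vertex, f_edge):
    """Definition 1: every cell has at most one 'exceptional' neighbour."""
    for (u, v), fe in f_edge.items():
        # For an edge, at most one face (endpoint) may carry a value >= f(edge).
        if sum(1 for w in (u, v) if f_vertex[w] >= fe) > 1:
            return False
    for w, fw in f_vertex.items():
        # For a vertex, at most one coface (incident edge) may carry a value <= f(vertex).
        if sum(1 for (a, b), fe in f_edge.items() if w in (a, b) and fe <= fw) > 1:
            return False
    return True

def critical_cells(f_vertex, f_edge):
    """Definition 2: cells with no exceptional neighbour at all."""
    crit = [w for w, fw in f_vertex.items()
            if all(fe > fw for (a, b), fe in f_edge.items() if w in (a, b))]
    crit += [e for e, fe in f_edge.items()
             if all(f_vertex[w] < fe for w in e)]
    return crit
```

On the path a–b–c with f(a) = 0, f(b) = 2, f(c) = 1, f(ab) = 1, f(bc) = 3, the function is a discrete Morse function with critical cells a, c and the edge (b, c); the remaining non-critical pair (b, (a, b)) is exactly the kind of pair that the gradient-like vector field joins with a vector.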
2 Sliding Tree Algorithm
A successful reconfiguration strategy for embedded binary trees should ideally have the following properties:

– It should be able to construct new routes bypassing the faulty vertices.
– It should preserve the mutual disposition of the tree nodes.
Algorithm 1. Sliding Tree Algorithm
1: if k is not a critical vertex then
2:   Update distance function on grid G
3: end if
4: if k is in H-tree then
5:   Remove subtree Sb rooted at k and all nodes between k and its parent p
6:   Update distance function on G
7:   Compute the area consumed by the subtree Sb
8:   Compute the target value of distance function C
9:   Find node v with LPV(v) = C using Path Searching Algorithm
10:  if v is not found then
11:    STOP
12:  end if
13:  Construct path from p to v
14:  Plant subtree Sb at node v
15: end if
The second property maintains the integrity of the tree, and preserving its layout guarantees the construction of optimal routes and the possibility of repeated reconfiguration. Assume that an arbitrary operational node v of the embedded tree fails, and denote the subtree rooted at v by Sb(v). Then communication between the root and all nodes of Sb(v) is broken. In order to recover from the fault, the subtree Sb(v) is removed together with the path between v and its parent node p. A replacement node is then searched for, and the subtree Sb is restored at the new position. If the algorithm fails to find an appropriate position, reconfiguration fails.

2.1 Distance Function
Let G be the graph representation of the mesh-connected array of processors, called the grid graph. The discrete Morse function proposed in this section is defined in two steps: first, its values on vertices are defined; then they are complemented by values on edges. Let C be the set of vertices of the grid that should be excluded from the routing process. These vertices are called critical. All vertices that correspond to faulty processors are included in the class C, as well as the vertices of the embedded tree and the vertices that lie on the border of the grid. The distance function is defined as follows:

Definition 4. For each non-critical vertex v of the graph G with the set of critical vertices C = {C1, C2, ..., Cn}, define the distance function as

LPV(v) = min_{Ci ∈ C} [ min_{c ∈ Ci} length(v, c) ]    (1)
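On the grid graph, the distance function of Definition 4 can be computed by a breadth-first search started from all critical vertices simultaneously. This is a sketch assuming a 4-connected mesh; the function name `lpv` is ours.

```python
from collections import deque

# Sketch of Definition 4 on a 4-connected width x height mesh: a
# multi-source BFS from the critical vertices yields, for every vertex,
# the length of a shortest path to the nearest critical vertex.
def lpv(width, height, critical):
    dist = {c: 0 for c in critical}      # critical vertices are at distance 0
    queue = deque(critical)
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in dist:
                dist[(nx, ny)] = dist[(x, y)] + 1
                queue.append((nx, ny))
    return dist
```

Running `lpv(5, 5, [(2, 2)])` reproduces the diamond-shaped wavefronts mentioned below: all vertices with the same value form a diamond around the critical vertex.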
Note that the values of the distance function on the strong critical vertices are zero. An example of the distance function for one critical vertex is given in figure 1. Vertices which are equidistant from the critical one form the well-known diamond-shaped wavefronts (e.g. [10], [11]). With the function defined in this way, another set of critical vertices of the grid can be identified. For these vertices the following inequality holds:

LPV(u) ≤ LPV(v) for all u ∈ I(v),    (2)
where I(v) is the set of vertices incident to the vertex v. Collectively these vertices are called the maximal critical vertices of the graph G corresponding to the function LPV(v). They subdivide the graph into a collection of subgraphs known as neighborhood subgraphs, or N-subgraphs, of G. The distance function is a Morse function on each separate N-subgraph; hence, it can be called a locally discrete Morse function. Locally discrete Morse functions are studied in [5].

2.2 Discrete Vector Fields
With the discrete function LPV defined on the set of vertices, a discrete vector field can be defined in a non-unique way. In order to maintain the structural integrity of the embedded tree, it is desirable that the discrete flow emanating from a vertex of the grid graph have a minimal number of twists. To achieve this, it is necessary to choose the direction of the flow. More precisely, the discrete vector field should be built in the following manner. Let G be a grid graph and let C be the set of critical vertices. Assume that the distance function is computed for each vertex and the N-subgraphs are defined for each critical vertex. Let F: G −→ G ∪ {∅} be a discrete vector field defined as follows:

– If v is a critical vertex, then the image of v under F is the empty set: Im(v) = ∅.
– If v is not critical, let the edge g = (u1, u2) define the priority direction of the vector field on the N-subgraph H (see figure 2), and let v be a vertex of H. Let Ev = {(v, wi)}, 1 ≤ i ≤ 4, be the set of edges incident to v such that LPV(wi) > LPV(v). Then the edge ej ∈ Ev is assigned to be the image of v under F if the following equation holds: α(g, ej) = min_k α(g, ek), where α(g, ek) is the oriented angle between the edges g and ek.

The method of defining the discrete vector field for a given discrete Morse function can vary and may be chosen in the way that best suits the task. For example, different vector fields are obtained when a different local priority vector is chosen.

2.3 Path Searching Algorithm
In this section it is shown how the distance function and the direction-preserving vector field can be used to search for an appropriate replacement vertex and to establish a valid route.
Fig. 2. Discrete vector field on the grid. Assume that f indicates the priority direction. Numbers in the boxes show the value of the distance function at each node. Vertex W has three candidate vertices with higher values of the distance function. The vertex B is chosen because the edge (W, B) is co-directed with f.
Due to the regularity of the H-tree layout, the area of the grid consumed by a tree of level d can be easily computed using a simple recursive algorithm. Knowing the area required for a subtree, it is easy to conclude that the minimal value of the distance function for a candidate node u for the root of a subtree should satisfy equation 3, where h and w denote the height and the width of the H-tree expressed in numbers of vertices:

LPV(u) = max{h/2 + 1, w/2 + 1}    (3)

The priority direction is chosen to be pnt − cld, where cld is the root of the subtree to be relocated and pnt is its parent node. The Path Searching Algorithm is aimed at finding a vertex which satisfies equation 3. The search is directed by the distance function and the discrete vector field defined in Section 2.2. The Path Searching Algorithm follows the discrete vector field until it hits a maximal critical vertex, where the search stops. The discrete vector field can conveniently direct the search within the domain of a neighborhood subgraph. But N-subgraphs constitute rather small portions of the grid, especially when a large number of critical points is present. When the path reaches a maximal critical vertex w and LPV(w) is less than required by equation 3, it is desirable that the algorithm continue the search. This can be done by a simple extension of the Basic Scheme of the Path Searching Algorithm in which an extended family of nearest nodes is examined when a maximal critical vertex is found: the algorithm continues the search along the non-growing vertices for a finite number of steps. This enhanced algorithm is called the Extended Scheme. In this research two types of algorithms have been tested. The first, the basic search algorithm, completes execution upon meeting a maximal critical point. The second, the extended search algorithm, continues the search until a satisfactory vertex is found or until a specified number of search steps has elapsed.
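Equation 3 can be evaluated once the H-tree dimensions are known. The recursion below is a sketch under the assumption that the side length of the layout roughly doubles every two tree levels (the standard H-tree recurrence); it is not the paper's exact recursive algorithm, and both function names are ours.

```python
# Assumed H-tree recurrence (illustrative): a tree two levels deeper
# occupies a square of side 2*s + 1 when the current side is s.
def htree_size(d):
    if d <= 1:
        return (1, 1)                 # a single node
    h, w = htree_size(d - 2)
    return (2 * h + 1, 2 * w + 1)

# Target value of the distance function from Eq. (3).
def target_lpv(d):
    h, w = htree_size(d)
    return max(h // 2 + 1, w // 2 + 1)
```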
Algorithm 2. Path Searching Algorithm (Basic Scheme)
Input: target value of distance function C; parent node p
1: v = p
2: PATH ⇐ v
3: while LPV(v) < C do
4:   list L ⇐ incident vertices(v)
5:   M = max_{u∈L}(LPV(u))
6:   for all u ∈ L do
7:     if LPV(u) = M then
8:       S ⇐ u
9:     end if
10:  end for
11:  if L not empty then
12:    ω = 0
13:    for all u ∈ L do
14:      if ω < angle[(v, u), direction] then
15:        ω = angle[(v, u), direction]
16:        best = u
17:      end if
18:    end for
19:    PATH ⇐ best
20:    v = best
21:  else
22:    STOP
23:  end if
24: end while
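Stripped of the angle-based tie-breaking, the Basic Scheme reduces to a greedy ascent along the distance function. The sketch below is our own condensation (the tie is broken by the largest LPV value rather than by the priority angle); as in Algorithm 2, it stops when a maximal critical vertex is reached before the target value C.

```python
# A condensed sketch of the Basic Scheme: climb the distance function
# until the target value C is reached or no higher neighbour exists.
def basic_path_search(dist, neighbours, start, C):
    """dist: vertex -> LPV value; neighbours: vertex -> iterable of vertices."""
    path, v = [start], start
    while dist[v] < C:
        ups = [u for u in neighbours(v) if dist[u] > dist[v]]
        if not ups:
            return None                      # maximal critical vertex: STOP
        v = max(ups, key=lambda u: dist[u])  # simplified tie-breaking
        path.append(v)
    return path
```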
3 Results
In order to test the reconfiguration scheme based on the Sliding Tree Algorithm, a simulation has been designed and the reliability of the system (and, hence, the effectiveness of the algorithm) has been estimated. The simulation was written in C++ and run under Microsoft Visual Studio 6; the LEDA 4.4.1 library by Algorithmic Solutions Software GmbH was used to support graph manipulations. The simulation has been run under the following assumptions:

– all processors are fault-free at the initial moment of time;
– processors are statistically independent, that is, the failure of a single processor does not affect the status of other processors in the array;
– faults are fatal and faulty processors cannot be used again;
– buses and communication lines are considered to be fault-free.

These assumptions allow one to design a realistic reliability model of the array without excessive complications. Figure 3 compares the search techniques of the Basic and Extended Schemes.
Fig. 3. Survivability coefficient for the basic scheme and the extended scheme for the trees of different depth
The reliability of a system which consists of N components over the time period t is defined as ρ(t) = S(t)/N (see [16]), where S(t) is the number of surviving components at time t and N is the total number of components. The reliability of a single processing element over a period of time t is given by the expression R(t) = e^{−λt}, where λ is the failure rate, usually expressed as a number of failures per 10^6 hours. In order to get a more realistic picture, assume that all processing elements of the grid have an equal probability of failure. To estimate the reliability of the system, the method considered in [11] is employed. Let Ci be the survivability coefficient, defined as the probability that the array survives with i faults. Ci is determined as Ci = Si/K, where Si is the number of successful recoveries of the system from i faults and K is the number of tested patterns with i faulty nodes. The reliability of the system can then be expressed as

R_sys(t) = Σ_{i=0}^{K} Ci \binom{M}{i} R^{M−i} (1 − R)^i    (4)

where M is the total number of processing elements and R = R(t) is the element reliability. The survivability coefficient Ci is determined experimentally using the simulation. A uniform random number generator was used to generate two random integers x, y ∈ {1, ..., max(h(d), w(d))} and simulate the failure of the processor with coordinates (x, y). Then the Sliding Tree Algorithm is used to reconfigure the tree. In order to understand the level of reliability of the proposed algorithm, it is useful to compare it with previous work on reconfigurable binary trees (see [7], [8]). All these schemes are specialized reconfiguration schemes for binary trees. Such a comparison is difficult because of differences in characteristics such as the number of spares, and the parameters of the experiments (time range and the depth of the tree) also differ substantially. However, it is still possible to compare the absolute numbers of the reliability evaluation. Figure 4 compares the reliability of the basic and extended versions of the Sliding Tree Algorithm with two specialized
Fig. 4. Comparison of the reliability of the Basic and Extended schemes with two specialized schemes, the modular scheme and the RAE scheme
schemes, the modular tree scheme and the RAE scheme, presented in [8] and [7] respectively. Figure 4 shows that both the basic and the extended scheme demonstrate reliability higher than 0.9 up to 0.25·10^6 h. During this period of time they show reliability close to that of the RAE and modular schemes. Moreover, the extended scheme outperforms the RAE scheme up to 0.35·10^6 h. This is a promising result for a general architecture measured against specialized ones.
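The reliability model of Eq. (4) is straightforward to evaluate once the survivability coefficients have been measured. A sketch (function and variable names are ours; any coefficient values supplied to it are illustrative, not the measured ones):

```python
from math import comb, exp

# Eq. (4): reliability of an array of M elements with element failure
# rate lam, given survivability coefficients C[i] = Pr(array survives i faults).
def system_reliability(t, lam, M, C):
    R = exp(-lam * t)                    # element reliability R(t) = e^(-lambda*t)
    return sum(C[i] * comb(M, i) * R**(M - i) * (1 - R)**i
               for i in range(len(C)))
```

Two sanity checks: with C = [1.0] (only the fault-free state is counted) the sum collapses to R^M, and with Ci = 1 for all i up to M the binomial theorem gives a reliability of exactly 1.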
4 Conclusion
The framework presented in this paper proposes a general method for developing distributed reconfiguration algorithms based on discrete Morse functions and discrete vector fields. The general framework is called the General Design Environment for Reconfiguration Strategies (GDERS). GDERS represents a generic and systematic approach to the problem of designing reconfiguration algorithms. The method of discrete functions on cell complexes used in GDERS can be applied to any topology of the host array. Moreover, the class of target arrays can be extended to the class of planar graphs. This can be done by considering the minimal spanning tree of the target graph during the process of reconfiguration and reconnecting the nodes of the target graph when reconfiguration is complete. Another advantage of the proposed reconfiguration strategy is its distributed operation. The Sliding Tree Algorithm demonstrated the capabilities of the distributed approach on the tree structure, which is characterized by strong data dependency, and serves as an example of how GDERS can be used in practice. In addition, the reliability demonstrated by the Sliding Tree Algorithm shows good performance even when compared to the specialized schemes. The Sliding Tree Algorithm has considerable potential for future work. The most promising directions are methods of decreasing the number
of spare nodes and further study of discrete Morse functions with the aim of improving the Path Searching Algorithm within the main Sliding Tree Algorithm. The main outcome of this paper is the design of a distributed and flexible reconfiguration algorithm for mesh-connected arrays of processors which is capable of recovery after repeated faults. The importance of the Sliding Tree Algorithm for the further development of adaptive and evolvable systems follows from the advantages which the algorithm can deliver. First of all, applying the algorithm for fault-tolerance can provide uninterrupted operation of the system, thus allowing the system to evolve new features and to adapt to a new environment. Moreover, the algorithm can be used generally for restructuring the target array, hence assisting in performing adaptation and self-development under specific conditions. In addition, the generic nature of the proposed approach allows one to apply the algorithm to different target arrays and different discrete functions.
References

1. Milnor, J.: Morse Theory. Annals of Mathematics Studies, Princeton University Press (1973)
2. Forman, R.: A User's Guide to Discrete Morse Theory. Séminaire Lotharingien de Combinatoire (2002)
3. Forman, R.: Morse Theory for Cell Complexes. Advances in Mathematics 134, 90–145 (1998)
4. Forman, R.: Combinatorial Vector Fields and Dynamical Systems. Mathematische Zeitschrift 228, 629–681 (1998)
5. Goresky, M., MacPherson, R.: Stratified Morse Theory. Springer, Berlin and Heidelberg (1988)
6. Leiserson, C.E.: Area-Efficient Graph Layouts (for VLSI). In: Proc. 21st Ann. IEEE Symp. on Foundations of Computer Science (1980)
7. Raghavendra, C.S., Avizienis, A., Ercegovac, M.D.: Fault Tolerance in Binary Tree Architectures. IEEE Trans. Computers 33, 568–572 (1984)
8. Hassan, A., Agarwal, V.: A Fault-Tolerant Modular Architecture for Binary Trees. IEEE Trans. Computers 35, 356–361 (1986)
9. Jigang, W., Srikanthan, T.: An Improved Reconfiguration Algorithm for Degradable VLSI/WSI Arrays. Journal of Systems Architecture 49, 23–31 (2003)
10. Lee, C.Y.: An Algorithm for Path Connections and Its Applications. IRE Trans. Electronic Computers EC-10, 346–365 (1961)
11. Kung, S.-Y., Jean, S.-N., Chang, C.-W.: Fault-Tolerant Array Processors Using Single-Track Switches. IEEE Trans. Computers 38, 501–514 (1989)
12. Abachi, H., Walker, A.-J.: Reliability Analysis of Tree, Torus and Hypercube Message Passing Architectures. In: Proc. 29th Southeastern Symp. on System Theory, pp. 44–48. IEEE Computer Society Press, Los Alamitos (1997)
13. Chean, M., Fortes, J.A.B.: A Taxonomy of Reconfiguration Techniques for Fault-Tolerant Processor Arrays. IEEE Computer 23, 55–69 (1990)
14. Ortega, C., Mange, D., Smith, S.L., Tyrrell, A.M.: Embryonics: A Bio-Inspired Cellular Architecture with Fault-Tolerant Properties. Genetic Programming and Evolvable Machines 1, 187–215 (2000)
15. Greensted, A.J., Tyrrell, A.M.: An Endocrinologic-Inspired Hardware Implementation of a Multicellular System. In: Proc. NASA/DoD Conf. on Evolvable Hardware, Seattle (2004)
16. Lala, P.K.: Fault-Tolerant and Fault-Testable Hardware Design. Prentice Hall International, Englewood Cliffs (1985)
System-Level Modeling and Multi-objective Evolutionary Design of Pipelined FFT Processors for Wireless OFDM Receivers

Erfu Yang¹, Ahmet T. Erdogan¹, Tughrul Arslan¹, and Nick Barton²

¹ School of Engineering and Electronics
² School of Biological Sciences
The University of Edinburgh, King's Buildings, Edinburgh EH9 3JL, United Kingdom
{E.Yang,Ahmet.Erdogan,T.Arslan,N.Barton}@ed.ac.uk

Abstract. The precision and power consumption of pipelined FFT processors are highly affected by the wordlengths in fixed-point application systems. Because the design space is nonconvex, wordlength optimization under multiple competing objectives is a complex, time-consuming task. This paper proposes a new approach to the multi-objective evolutionary optimization design of pipelined FFT processors for wireless OFDM receivers. In our new approach, the number of design variables can be significantly reduced. We also fully investigate how the internal wordlength configuration affects the precision and power consumption of the FFT by setting the wordlengths of the input and the FFT coefficients to 12 and 16 bits in fixed-point number type. A new system-level model for representing the power consumption of the pipelined FFT is also developed and utilized in this paper. Finally, simulation results are provided to validate the effectiveness of applying the nondominated sorting genetic algorithm to the multi-objective evolutionary design of a 1024-point pipelined FFT processor for wireless OFDM receivers.
1 Introduction
Precision and power consumption are of prime importance in fixed-point DSP (digital signal processing) applications. The FFT (fast Fourier transform) has emerged as one of the main DSP algorithms and has been widely applied to wireless communication systems. Since the FFT is the key component in wireless OFDM (Orthogonal Frequency-Division Multiplexing) receivers [1, 2], it is chosen as a benchmark system in this paper to investigate system-level modeling and evolutionary optimization issues under multiple objectives. Determining the optimum wordlength plays an important role in a complex DSP system [2]; in a complex system, more than half of the total design time may be spent on this process. Since the design space is nonconvex, there are often many local optimum solutions [2]. Though it has been reported in [3] that a good trade-off between precision and complexity can be found by using different wordlengths for the stages, the wordlengths obtained by the simulation-based strategy may be far from globally optimal or Pareto-optimal in the domain of multi-objective optimization.

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 210–221, 2007.
© Springer-Verlag Berlin Heidelberg 2007
To make a trade-off design between conflicting objectives such as power consumption and precision when designing a full-custom pipelined FFT processor with variable wordlength, a number of wordlength configurations for the different stages of the FFT processor should be explored in a rapid and efficient manner. The conventional method for exploring different wordlength configurations is based on simulation and offline analysis, which often requires much runtime and unavoidably results in a very low design efficiency. Multi-objective evolutionary algorithms, however, can automatically generate all the optimal configurations under the given design constraints, which significantly improves the design efficiency under multiple objectives. The aim of this paper is to present an automatic approach for efficiently generating all the Pareto-optimal solutions under two competing design objectives, i.e., power consumption and precision. Toward this end, a multi-objective evolutionary algorithm is selected and applied to the multi-objective optimization of pipelined FFT processors for wireless OFDM receivers. Compared with the commonly used simulation-based wordlength optimization methods, the evolution-based multi-objective optimization approach is capable of exploring the entire design space. Moreover, all the solutions can be generated simultaneously by running the algorithm. Therefore, the time for determining optimum wordlengths can be significantly reduced.
2 Related Work
The precision and hardware complexity of fixed-point FFT processors are affected by the wordlengths. In recent years wordlength-based optimization has received considerable attention [4, 5, 6, 7, 2]. In [4], the genetic algorithm (GA) was used to optimize the wordlength in a 16-point radix-4 pipelined FFT processor. However, only the wordlengths for the input data and the FFT coefficients were optimized by the GA, and the work can be classified as a single-objective optimization problem. In designing DSPs and VLSI systems, multi-objective evolutionary algorithms have been applied [8, 9, 10]. Multi-objective optimization for pipelined FFT processors was particularly investigated in [5, 6, 7], where a multi-objective genetic algorithm was employed to find solutions for the FFT coefficients with optimum performance in terms of signal-to-noise ratio (SNR) and power consumption. In these works, the authors focused only on the impact of the wordlength of the fixed-point FFT coefficients on performance and power consumption. The research objective was to find a solution for the FFT coefficients with optimum performance, so the FFT coefficients were directly used to encode the chromosomes in [5, 6, 7]. Due to this representation, the size of the chromosome used for the multi-objective genetic algorithms in [5, 6, 7] depends strongly on the size of the FFT. For example, for a 1024-point pipelined FFT, the number of variables required in [5, 6, 7] will be more than 2048. Together with a large population, the computation requirement and algorithm complexity in [5, 6, 7] are extremely high.
E. Yang et al.
Compared with these existing works, this paper presents a new method to deal with the multi-objective optimization problems arising from designing pipelined FFT processors in fixed-point applications. In our new multi-objective optimization strategy, the number of design variables is the same as the number of stages in the pipelined FFT processor. For example, there are only 10 design variables to be optimized in a 1024-point FFT. Unlike [5, 6, 7] we fully investigate how the internal wordlength configuration affects the precision and power consumption of the FFT by setting the wordlengths of the input and the FFT coefficients to 12 and 16 bits in fixed-point number type. To perform the multi-objective optimizations, full fixed-point operations are used in this study, whereas in [5, 6, 7] only partial fixed-point computation was used. Furthermore, we adopt a new system-level model to represent power consumption rather than a lower-level dynamic-switching-based power consumption model [6, 7]. There are also other wordlength-based optimizations. In [2] a fast algorithm for finding an optimum wordlength was presented, using the sensitivity information of the hardware complexity and system output error with respect to the signal wordlengths. However, a complexity-and-distortion measure is needed to update the search direction, and it may be very difficult to obtain such a measure for a complex system. It is also hard to extend such direction-information-based optimization methods to other, more complex applications. To find the optimum wordlengths for all the stages of an FFT processor, simulation-based results have been reported in [3], where the wordlengths of an 8k-point pipelined FFT processor were optimized using a C model. However, the computation time and complexity increase exponentially with the size of the FFT processor.
As a result, this simulation-based wordlength optimization method can only search a partial solution space. In addition, the power consumption was not explicitly dealt with in [3]. In [2] the wordlength optimization of an OFDM demodulator was investigated as a case study. Only 4 wordlength variables, i.e., the FFT input, equalizer right input, channel estimator input, and equalizer upper input, were selected; the internal wordlengths of the FFT were assumed to have already been decided. In their simulations, only the inputs to the FFT and the other blocks were constrained to be in fixed-point type, whereas the blocks themselves, including the FFT, were simulated in floating-point type. In this study, by contrast, we deal with the multi-objective optimization issues by selecting all the internal wordlengths of the FFT as design variables, and the entire FFT processor is simulated in fixed-point type. Thus, the results will be closer to the real DSP hardware environment.
3 Pipelined FFT Processors and Wordlength Optimizations
The precision of pipelined FFT processors relies on the wordlength. To increase the precision, a longer wordlength is desirable; however, a short wordlength is preferable for reducing the cost and power consumption. Although the optimum wordlength may be obtained in a trial-and-error manner or by computer-based simulations, the process
itself is extremely time-consuming. It has been reported that in a complex system it may take 50% of the design time to determine the optimum wordlength; see [2] and the references therein. For pipelined FFT processors it is very difficult to obtain an analytical equation to determine the optimum wordlength, even for the case where all the stages of the FFT processor have the same wordlength. When the optimum wordlength of each stage needs to be configured independently, determining the optimum wordlengths for the whole processor becomes particularly challenging if there is no efficient method to automatically generate all the trade-off solutions under multiple competing objectives and constraints.
4 Statement of the Multi-objective Optimization Problem
Let x denote the design variable vector in multi-objective optimizations. In this paper the output SQNR (Signal-to-Quantization-Noise-Ratio) is used to measure the precision performance of fixed-point pipelined FFT processors. The first objective to be optimized is defined by min f1 (x) = D − SQN R(x)
(1)
where D is a positive constant representing the desired SQNR. The SQNR is calculated as follows:

SQNR(x) = 10 log10 { Σ_{i=1}^{2N} [(R_i^{fl})^2 + (I_i^{fl})^2] / Σ_{i=1}^{2N} [(R_i^{fl} − R_i^{fix})^2 + (I_i^{fl} − I_i^{fix})^2] }    (2)

where N is the data length of the FFT, R_i^{fl} and I_i^{fl} are the real and imaginary parts of the output of the floating-point FFT, and, correspondingly, R_i^{fix} and I_i^{fix} are the real and imaginary parts of the output of the fixed-point FFT. The second objective is to minimize the power consumption, i.e.,
min f2(x) = Power(x)    (3)

where Power(x) will be detailed in the next section. We are now in a position to state the multi-objective optimization problem under consideration in this paper:

Problem 1 (multi-objective optimization problem). Find the design variable vector x* ∈ Ω under some constraints to minimize the objective vector function

min f(x) = [f1(x), f2(x)]    (4)

in the sense of Pareto optimality, i.e., for every x ∈ Ω either fi(x) = fi(x*) (∀i ∈ {1, 2}) or there is at least one i ∈ {1, 2} such that fi(x) > fi(x*).
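As an illustration of the precision objective, Eq. (2) can be sketched numerically with NumPy. The `quantize` helper is a hypothetical fixed-point model that rounds only the final FFT output; the processor studied here quantizes every pipeline stage, so real SQNR figures will differ.

```python
import numpy as np

def quantize(x, wordlength, frac_bits):
    """Hypothetical fixed-point model: round to a signed grid with the given
    total wordlength and number of fractional bits, with saturation."""
    scale = 2.0 ** frac_bits
    lo = -2.0 ** (wordlength - 1)
    hi = 2.0 ** (wordlength - 1) - 1
    return np.clip(np.round(x * scale), lo, hi) / scale

def sqnr_db(fft_float, fft_fixed):
    """SQNR per Eq. (2): output signal power over quantization-noise power."""
    signal = np.sum(fft_float.real ** 2 + fft_float.imag ** 2)
    noise = np.sum((fft_float.real - fft_fixed.real) ** 2
                   + (fft_float.imag - fft_fixed.imag) ** 2)
    return 10.0 * np.log10(signal / noise)

rng = np.random.default_rng(0)
x = rng.standard_normal(1024) + 1j * rng.standard_normal(1024)
X = np.fft.fft(x)                                   # floating-point reference
X_fix = quantize(X.real, 12, 4) + 1j * quantize(X.imag, 12, 4)
print(round(sqnr_db(X, X_fix), 1))
```

The wordlength and fractional-bit split (12/4) are illustrative only.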
214
E. Yang et al.
So for the multi-objective optimization problem, we are interested in finding all the solutions which satisfy the following nondominance conditions:
– 1) ∀i ∈ {1, 2}: fi(x1) ≤ fi(x2);
– 2) ∃j ∈ {1, 2}: fj(x1) < fj(x2),
where x1 and x2 are two solutions (x1 dominates x2 when both conditions hold). The Pareto-optimal set is defined as the set of all solutions that are nondominated within the entire search space. The objective function values of the Pareto-optimal set constitute the Pareto front, which we intend to find and use to assess the performance of a multi-objective optimization algorithm. As wordlength variables in a wireless OFDM receiver, we choose the wordlength of each stage in the pipelined FFT processor, since these wordlengths have the most significant effect on precision and power consumption in the OFDM system. The inputs and outputs of the FFT are set to 12-bit fixed-point numbers, since 12 bits are enough to represent the numbers required in this study. So we can focus on the effect of the internal wordlengths on the performance measurements of the FFT. It should be noted that in this case the number of variables is determined only by the size of the FFT processor. For example, there are 10 wordlength variables for a 1024-point FFT since there are 10 stages in total.
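The nondominance conditions above translate directly into code. A minimal sketch for minimization (function names are illustrative):

```python
def dominates(f1, f2):
    """f1 dominates f2: no worse in every objective (condition 1)
    and strictly better in at least one (condition 2)."""
    return (all(a <= b for a, b in zip(f1, f2))
            and any(a < b for a, b in zip(f1, f2)))

def nondominated(points):
    """Keep only the objective vectors not dominated by any other."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

pts = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
print(nondominated(pts))   # (3.0, 4.0) is dominated by (2.0, 3.0)
```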
5 System-Level Modeling of Pipelined FFT Processors
To solve the multi-objective fixed-point optimization problems for pipelined FFT processors with different numbers of data points, a new system-level modeling approach is needed. By using such a system-level approach, a computational model can be developed to give the main performance metrics of pipelined FFT processors, such as power consumption, precision, and area. In this paper we focus on developing a system-level model to compute the power consumption of fixed-point FFT processors under different numbers of data points and wordlengths. For the precision metric, a relatively simple relation can be derived, as shown in (2). A parameterized system-level model is desired in order to reduce the computational complexity and resource requirements in power-efficient embedded applications. The model developed by a system-level approach also needs to provide some flexibility and scalability. Toward this end, in this study we propose four parameterized system-level models to represent the power consumption of FFT processors, i.e., single exponential fitting, bi-exponential fitting, single polynomial fitting, and bi-polynomial fitting. For the exponential fitting methods, the following modeling formula is used:

Power = a·e^{b·Msize} + c·e^{d·Msize}    (5)

where Msize is the FIFO memory size required by the FFT. For the polynomial fitting methods, the cubic polynomial function is utilized, i.e.:

Power = a·Msize^3 + b·Msize^2 + c·Msize + d    (6)
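The coefficient fitting for the polynomial model of Eq. (6) can be sketched with NumPy's least-squares polynomial fit. The data set below is hypothetical (generated from a known cubic so the recovery is exact); in the paper this data comes from the SystemC-based simulator.

```python
import numpy as np

# Hypothetical modeling data: FIFO memory size (KB) vs. power (mW).
msize = np.array([0.0625, 0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
power = 0.01 * msize**3 - 0.2 * msize**2 + 8.0 * msize + 5.0

# Fit the cubic model of Eq. (6): Power = a*M^3 + b*M^2 + c*M + d
a, b, c, d = np.polyfit(msize, power, 3)

def predict(m):
    """Evaluate the fitted system-level power model at memory size m (KB)."""
    return a * m**3 + b * m**2 + c * m + d
```

The bi-polynomial and bi-exponential variants would repeat this fit separately on the two memory-size sections described below.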
For the bi-exponential and bi-polynomial modeling, the modeling data were grouped into two sections, i.e., memory sizes in [0.0625, 0.5] KB and in [1, 16] KB. To determine the coefficients of the system-level models (5) and (6), a set of data representing the power consumption of pipelined FFT processors under different FIFO memory sizes is needed. For this purpose, both the modeling and validating data sets can be obtained by using an accurate SystemC-based system-level simulator for FFT processors. In the SystemC-based system-level simulator, a gate-level power analysis is performed for each component by selecting a variety of parameters and representative input vectors [11]. The power information obtained for each component is then back-annotated to its SystemC object. After executing the complete SystemC-based simulator, sufficiently accurate power estimates can be obtained.
[Figure: power consumption (mW) plotted against FIFO memory size (KB), comparing the original data with the cubic polynomial, bi-cubic polynomial, exponential, and bi-exponential fits]
Fig. 1. Modelling FFT power consumption with FIFO memory size
The flexibility and scalability of the proposed system-level models lie in their coefficients, which can easily be re-determined by using different modeling and validating data sets representing different architectures of FFT processors. To model the relationship between the dominating FIFO memory size and the system-level power consumption, the only requirement is to present the necessary data set to the parameterized system-level models. The advantage of this parameterized system-level modeling is that it does not influence the multi-objective optimization algorithms, which will be detailed later in this paper. As an example, Fig. 1
shows a system-level modeling case using a set of data representing the system-level power consumption of FFT processors under different FFT sizes. This figure also shows the comparative results of the four modeling methods based on the same modeling and validating data set. From this figure we observed that the bi-exponential fitting method gives the best modeling performance, so it is used in the simulation example reported later in this paper.
6 Multi-objective Evolutionary Design
Since the wordlength optimization problem is a discontinuous optimization problem with a nonconvex search space, it is hard to find a global optimum solution [2]. Conventional optimization techniques, such as gradient-based and simplex-based methods, cannot be efficiently applied to this kind of optimization problem because there is no easy way to obtain the gradient information or analytically calculate the derivative of the complex objective functions f1(x) and f2(x). Hence, non-gradient optimizers dominate in these applications. Many popular non-gradient optimization methods are currently available, such as simulated annealing (SA), evolutionary algorithms (EAs) including genetic algorithms, random search, Tabu search, and particle swarm optimization (PSO). Among these methods, evolutionary algorithms have been recognized as being among the best suited to multi-objective optimization problems, see [12]. In this study, multi-objective evolutionary algorithms (MOEAs) are therefore chosen as the main optimizers due to their several advantages compared with other optimization methods. In particular, we are interested in applying NSGA-II [13] to the multi-objective optimization problem under consideration in this paper. In comparison to NSGA [14], NSGA-II has many advantages and has been applied in many applications. In particular, NSGA-II uses a fast nondominated sorting algorithm and incorporates elitism, and there is no need to specify a sharing parameter. In the NSGA-II algorithm, the initial population of size N is first created with random values. The population is then sorted based on nondomination into different fronts. The first front is the completely nondominated set in the current population, and the second front is dominated only by the individuals in the first front.
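The fast nondominated sorting step can be sketched as follows; this is a simplified reading of the procedure in [13] for minimization, not the authors' implementation:

```python
def _dominates(p, q):
    """p dominates q: no worse in all objectives, strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def fast_nondominated_sort(objs):
    """Partition objective vectors into ranked fronts: front 0 is the
    nondominated set, front 1 is dominated only by front 0, and so on."""
    n = len(objs)
    dominated = [[] for _ in range(n)]   # indices each solution dominates
    counts = [0] * n                     # how many solutions dominate i
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i != j and _dominates(objs[i], objs[j]):
                dominated[i].append(j)
            elif i != j and _dominates(objs[j], objs[i]):
                counts[i] += 1
        if counts[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        nxt = []
        for i in fronts[k]:
            for j in dominated[i]:
                counts[j] -= 1
                if counts[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
        k += 1
    return fronts[:-1]

fronts = fast_nondominated_sort([(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)])
print(fronts)
```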
Each individual in each front is assigned a fitness (or rank) value equal to its nondomination level (the fitness in the first level is 1, 2 is for the next-best level, and so on). Once the nondominated sort is completed, the usual tournament selection, recombination, and mutation operators are used to generate an offspring population. After the initial generation, elitism is introduced by combining the current and previous population members, so the size of the combined population is doubled. This intermediate population is then sorted according to nondomination by using a fast sorting algorithm. After generating a new population of size N, selection is performed to produce the individuals of the next generation. The offspring are generated from the selected parents using the usual crossover and
mutation operators. In NSGA-II, the diversity among nondominated solutions is maintained by a crowding comparison procedure, see [13] for more details. One disadvantage of NSGA-II in [13] is that it does not provide a method or metric to measure how good or bad the final solutions are, particularly for a very complex engineering multi-objective optimization problem. So, in this study we also have to find appropriate metrics to assess the quality of the multi-objective optimal design of pipelined FFT processors. Generally, the metrics used to measure the performance of heuristic evolutionary algorithms for multi-objective optimization include Generational Distance (GD), Spacing (SP), Maximum Spread (MS), etc. For more details on these performance metrics, the reader is referred to [15] and the references therein. In this paper we choose only the SP and MS metrics to measure the performance of NSGA-II for the multi-objective optimization problem of the pipelined FFT processor under consideration. The SP is used to measure how well the solutions are distributed (spaced) in the nondominated sets found so far by MOEAs. It is computed from the variance of the distances between neighboring vectors in the nondominated sets. The SP is defined by

SP = sqrt( (1/(n−1)) Σ_{i=1}^{n} (d_i − d̄)^2 )    (7)

where n is the number of nondominated vectors found so far, d_i = min_{j=1,...,n, j≠i} Σ_{m=1}^{M} |f_m^i − f_m^j|, and d̄ = Σ_{i=1}^{n} d_i / n, in which M is the total number of objective functions. Since SP measures the standard deviation of the distances among the solutions found so far, the smaller it is, the better the distribution of the solutions. A value of zero indicates that all the points on the Pareto front are evenly spaced. The MS is used to represent the maximum extension between the farthest solutions in the nondominated sets found so far. MS is defined as

MS = sqrt( Σ_{m=1}^{M} (max_{i=1,...,n} f_m^i − min_{i=1,...,n} f_m^i)^2 )    (8)

Unlike the metric SP, a bigger MS value indicates a better performance.
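The two metrics can be sketched as a direct reading of Eqs. (7) and (8):

```python
import math

def spacing(front):
    """SP, Eq. (7): standard deviation of the nearest-neighbour Manhattan
    distances in objective space; 0 means the points are evenly spaced."""
    d = [min(sum(abs(a - b) for a, b in zip(p, q))
             for j, q in enumerate(front) if j != i)
         for i, p in enumerate(front)]
    d_bar = sum(d) / len(d)
    return math.sqrt(sum((di - d_bar) ** 2 for di in d) / (len(d) - 1))

def maximum_spread(front):
    """MS, Eq. (8): extent between the extreme solutions in each objective."""
    M = len(front[0])
    return math.sqrt(sum(
        (max(p[m] for p in front) - min(p[m] for p in front)) ** 2
        for m in range(M)))

front = [(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]   # perfectly even spacing
print(spacing(front), maximum_spread(front))
```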
7 Simulation Results
In this section we demonstrate a design example applying the nondominated sorting genetic algorithm (NSGA-II) to the multi-objective optimization of a 1024-point pipelined FFT processor. In the simulation the control parameters of NSGA-II were set as follows: maximum number of generations = 250, population size = 100, probability of crossover = 0.85, and probability of mutation = 0.15. These control parameters are held constant during one run of the design.
The other initial settings for the entire design are summarized as follows:
– 1) The inputs and outputs of the FFT were set to be 12-bit fixed-point numbers in the specification.
– 2) D was set to be 40 dB.
– 3) The FFT coefficients had a wordlength of 16 bits and were assumed to be stored in a ROM.
– 4) There was no scaling for any of the stages of the FFT.
– 5) There were 10 design variables in total, ranging from 8 to 24.
– 6) The bi-exponential model for power consumption was used in the whole simulation.
– 7) The desired SP and MS were 0.25 and 35.0, respectively.
We ran the complete program more than 10 times; the success rate under the design requirements above was 100%. The results obtained from one typical run are shown in Figs. 2-4. In Fig. 2 the x-axis represents the SQNR error defined by (1) and the y-axis is the power consumption (mW). In this design example the SP and MS performance metrics are 0.1910 and 39.6434, respectively. Figure 3 illustrates the Pareto-optimal set, from which a trade-off design of the internal wordlength configuration can easily be made in terms of further design requirements and the user's preferences. The number of individuals on each ranked front over the generations is plotted in Fig. 4. From this figure we can also observe that the multi-objective optimization algorithm worked well. It should be noted that the results are only for the purpose of demonstration; if we change the settings of the control parameters of the MOEA, the performance of the trade-off design may be further improved.
[Figure: the nondominated front, power consumption (mW) plotted against SQNR error (dB)]
Fig. 2. Pareto front obtained by MOEA for 1K FFT
[Figure: ten panels, (a) to (j) for stage no. 1 to 10, each plotting the wordlength of that stage against the solution index across the Pareto-optimal set]
Fig. 3. Pareto-optimal set
[Figure: ranking-based multi-objective evolutions, number of individuals plotted against ranking no. and generation]
Fig. 4. Number of individuals on each ranked front
Unlike the commonly used simulation-based wordlength optimization methods, the approach proposed in this paper is capable of exploring the entire search space. Most importantly, all the trade-off solutions can be generated simultaneously. Therefore, the time for determining the optimum wordlengths can be expected to be significantly reduced. In this
example, the computation time was only 111.76 seconds (Windows XP platform, DELL D810 laptop). The trade-off design can now easily be made in terms of the Pareto-optimal front and set.
8 Conclusion
Determining the optimum wordlength for pipelined FFT processors under multiple competing objectives is a complex, time-consuming task. A new approach to the multi-objective evolutionary optimization design of pipelined FFT processors for wireless OFDM receivers has been proposed in this paper. How the internal wordlength configuration affects the precision and power consumption of the FFT has been fully investigated by setting the wordlengths of the input and of the FFT coefficients to 12 and 16 bits in fixed-point number type, respectively. A new system-level model for representing the power consumption of pipelined FFT processors has also been developed and utilized. Simulation results have been provided to validate the effectiveness of applying the nondominated sorting genetic algorithm to the multi-objective evolutionary design of a 1024-point pipelined FFT processor for wireless OFDM receivers.
Acknowledgment. This research is funded by the UK Engineering and Physical Sciences Research Council (EPSRC) under grant EP/C546318/1. The authors thank all of the team members of the ESPACENET1 project, which involves the Universities of Edinburgh, Surrey, Essex, and Kent, Surrey Satellite Technology (SSTL), NASA Jet Propulsion Laboratory (JPL), EPSON, and Spiral Gateway. The support from Stefan Johansson, Peter Nilsson, Nasri Sulaiman, and Ali Ahmadinia is greatly appreciated.
References
1. Fechtel, S.A., Blaickner, A.: Efficient FFT and equalizer implementation for OFDM receivers. IEEE Transactions on Consumer Electronics 45(4), 1104–1107 (1999)
2. Han, K., Evans, B.L.: Optimum wordlength search using sensitivity information. EURASIP Journal on Applied Signal Processing 5, 1–14 (2006)
3. Johansson, S., He, S., Nilsson, P.: Wordlength optimization of a pipelined FFT processor. In: Proceedings of the 42nd Midwest Symposium on Circuits and Systems, Las Cruces, NM, pp. 501–503 (August 1999)
4. Sulaiman, N., Arslan, T.: A genetic algorithm for the optimisation of a reconfigurable pipelined FFT processor. In: Proceedings of the 2004 NASA/DoD Conference on Evolvable Hardware, Seattle, WA, June 24-26, 2004, pp. 104–108 (2004)
1 Evolvable Networks of Intelligent and Secure Integrated and Distributed Reconfigurable System-On-Chip Sensor Nodes for Aerospace Based Monitoring and Diagnostics.
5. Sulaiman, N., Arslan, T.: A multi-objective genetic algorithm for on-chip real-time optimisation of word length and power consumption in a pipelined FFT processor targeting a MC-CDMA receiver. In: Proceedings of the 2005 NASA/DoD Conference on Evolvable Hardware, Washington, D.C., June 29 - July 1, 2005, pp. 154–159 (2005)
6. Sulaiman, N., Erdogan, A.T.: A multi-objective genetic algorithm for on-chip real-time adaptation of a multi-carrier based telecommunications receiver. In: Proceedings of the 1st NASA/ESA Conference on Adaptive Hardware and Systems, Istanbul, Turkey, June 15-18, 2006, pp. 424–427 (2006)
7. Sulaiman, N., Arslan, T.: A multi-objective genetic algorithm for on-chip real-time adaptation of a multi-carrier based telecommunications receiver. In: Proceedings of the 2006 IEEE Congress on Evolutionary Computation (CEC 2006), Vancouver, BC, Canada, July 16-21, 2006, pp. 3161–3165. IEEE Computer Society Press, Los Alamitos (2006)
8. Bright, M., Arslan, T.: Multi-objective design strategies for high-level low-power design of DSP systems. In: Proceedings of the IEEE International Symposium on Circuits and Systems, Orlando, Florida, pp. 80–83. IEEE Computer Society Press, Los Alamitos (June 1999)
9. Palermo, G., Silvano, C., Zaccaria, V.: Multi-objective design space exploration of embedded systems. Journal of Embedded Computing 1(11), 1–9 (2002)
10. Talarico, C., Rodriguez-Marek, E., Sung Koh, M.: Multi-objective design space exploration methodologies for platform based SOCs. In: Proceedings of the 13th Annual IEEE International Symposium and Workshop on Engineering of Computer Based Systems, Potsdam, Germany, March 27-30, 2006, pp. 353–359. IEEE Computer Society Press, Los Alamitos (2006)
11. Ahmadinia, A., Ahmad, B., Arslan, T.: System level reconfigurable FFT architecture for system-on-chip design. In: Proceedings of the 2nd NASA/ESA Conference on Adaptive Hardware and Systems, Edinburgh, UK, August 5-7, 2007 (to appear, 2007)
12. Deb, K.: Multi-Objective Optimization Using Evolutionary Algorithms, 1st edn. John Wiley & Sons, Ltd, Chichester (2002)
13. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast elitist multi-objective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2), 182–197 (2002)
14. Srinivas, N., Deb, K.: Multiobjective optimization using nondominated sorting in genetic algorithms. Evolutionary Computation 2(3), 221–248 (1994)
15. Salazar-Lechuga, M., Rowe, J.E.: Particle swarm optimization and fitness sharing to solve multi-objective optimization problems. In: Proceedings of the IEEE Swarm Intelligence Symposium 2006, Indianapolis, Indiana, May 12-14, 2006, pp. 90–97. IEEE Computer Society Press, Los Alamitos (2006)
16. Han, K., Evans, B.L.: Wordlength optimization with complexity-and-distortion measure and its applications to broadband wireless demodulator design. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, Canada, May 17-21, 2004, pp. 37–40. IEEE Computer Society Press, Los Alamitos (2004)
17. Jenkins, W.K., Mansen, A.J.: Variable word length DSP using serial-by-modulus residue arithmetic. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Minneapolis, MN, pp. 89–92. IEEE Computer Society Press, Los Alamitos (May 1993)
18. Pappalardo, F., Visalli, G., Scarana, M.: An application-oriented analysis of power/precision trade-off in fixed and floating-point arithmetic units for VLSI processors. In: Circuits, Signals, and Systems, pp. 416–421 (2004)
Reducing the Area on a Chip Using a Bank of Evolved Filters

Zdenek Vasicek and Lukas Sekanina

Faculty of Information Technology, Brno University of Technology
Božetěchova 2, 612 66 Brno, Czech Republic
[email protected],
[email protected]
Abstract. An evolutionary algorithm is utilized to find a set of image filters which can be employed in a bank of image filters. This filter bank exhibits at least comparable visual quality of filtering to a sophisticated adaptive median filter when applied to remove salt-and-pepper noise of high intensity (up to 70% corrupted pixels). The main advantage of this approach is that it requires four times fewer resources on a chip than the adaptive median filter. The solution also exhibits very good behavior for the impulse burst noise which is typical of satellite images.
1 Introduction
L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 222–232, 2007. © Springer-Verlag Berlin Heidelberg 2007

As low-cost digital cameras have found their way into almost every place, the need for high-quality, high-performance and low-cost image filters is of growing interest. In this paper, a new approach to the design of impulse noise filters is proposed. The aim is to introduce a class of simple image filters that utilize small filtering windows and whose performance is at least comparable to existing well-tuned algorithms devised for common processors. Furthermore, an area-efficient hardware implementation is required because these filters have to be implemented on off-the-shelf hardware, such as field programmable gate arrays (FPGAs). In most cases, impulse noise is caused by malfunctioning pixels in camera sensors, faulty memory locations in hardware, or errors in data transmission (especially in satellite images [1]). We distinguish two common types of impulse noise: salt-and-pepper noise (commonly referred to as intensity spikes or speckle) and random-valued shot noise. For images corrupted by salt-and-pepper noise, the noisy pixels can take only the maximum or minimum values (i.e. 0 or 255 for 8-bit grayscale images). In the case of random-valued shot noise, the noisy pixels have an arbitrary value. We deal with salt-and-pepper noise in this paper. Traditionally, salt-and-pepper noise is removed by median filters. When the noise intensity is less than approx. 10%, a simple median utilizing a 3×3 or 5×5-pixel window is sufficient. Evolutionary algorithms (EAs) have been applied to image filter design problems in recent years [2,3,4]. The EA is utilized either to find some coefficients of a pre-designed filtering algorithm or to devise a complete structure
of a target image filter. As the first approach only allows tuning of existing designs, the use of the second approach has led to the introduction of completely new filtering schemes, unknown so far [4]. The images filtered by evolved filters are not as smudged as the images filtered by median filters. Moreover, evolved filters occupy only approx. 70% of the area needed to implement the median filter on a chip. When the intensity of noise increases (10-90% of pixels corrupted), simple median filters are not sufficient and more advanced techniques have to be utilized. Various approaches have been proposed (see a survey of the methods, e.g. in [5]). Among others, adaptive medians provide good results [6]. However, they utilize large filtering windows, and additional values (such as the maximum and minimum value of the filtering window) have to be calculated. This makes them expensive in terms of hardware resources. Other algorithms are difficult to accelerate in hardware for real-time processing of images coming from cameras. Unfortunately, the evolutionary design approach stated above, which works up to 10% noise intensity, does not work for higher noise intensities. The method proposed in this paper combines simple evolved filters with human-designed components to create a bank of 3 × 3 filters which provides sufficient filtering quality for high noise intensities (up to 70%) and simultaneously a very low implementation cost in hardware.
2 Conventional Image Filters
Various approaches have been proposed to remove salt-and-pepper noise from grayscale images [7,8,9,5]. As linear filters have an inclination to smoothing, most of the proposed approaches are based on a nonlinear approach. The median filter is the most popular nonlinear filter for removing impulse noise because of its good denoising power, computational efficiency and reasonably inexpensive implementation in hardware [10]. The median filter utilizes the fact that original and corrupted pixels are significantly different, and hence the corrupted pixels can easily be identified as non-medians. However, when the noise level (the number of corrupted pixels) increases, some pixels remain corrupted and unfiltered [11]. Median filters which utilize larger filtering windows are capable of removing noise of high intensity, but the filtered images do not exhibit a sufficient visual quality. Adaptive median filters produce significantly better resulting images than conventional medians [6]. The filter operates with a kernel of Smax × Smax pixels. The kernel is divided into subkernels of size 3 × 3, 5 × 5, . . . , Smax × Smax inputs. For each subkernel, the minimum, maximum and median values are calculated. In order to obtain the filtered pixel, the calculated values are processed by the algorithm described in [6]. With the aim of visually comparing images filtered by the conventional median filter and the adaptive median filter, Figure 1 provides some examples for 40% salt-and-pepper noise (PSNR stands for the peak signal-to-noise ratio). We can observe that there are many unfiltered shots in the image obtained by the median filter. We used the 3 × 3 kernel in order to easily compare the results described in the next sections. Note that the use of larger kernels implies that many details
Fig. 1. Images obtained by using conventional filters. a) Original image. b) Noisy image corrupted by 40% salt-and-pepper noise, PSNR: 9.364 dB. c) Filtered by a median filter with kernel size 3 × 3, PSNR: 18.293 dB. d) Filtered by a median filter with kernel size 5 × 5, PSNR: 24.102 dB. e) Filtered by an adaptive median with kernel size up to 5 × 5, PSNR: 26.906 dB. f) Filtered by an adaptive median with kernel size up to 7 × 7, PSNR: 27.315 dB.
are lost in the image. On the other hand, the image obtained by the adaptive median filter is sharp and preserves details. However, the adaptive median (with 5x5 filtering window) costs approx. eight times more area on a chip in comparison to a conventional 3x3 median filter. Even better visual results can be achieved by using more specialized algorithms [9], but their hardware implementation leads to area-expensive and slow solutions.
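The adaptive median algorithm of [6] can be sketched as follows; boundary handling and the exact termination rule are simplified here, so this is an illustrative model rather than the reference algorithm.

```python
def adaptive_median_pixel(img, x, y, smax):
    """Filter one pixel: grow the window from 3x3 up to smax x smax until the
    window median is not an impulse; keep the centre pixel if it is itself
    not an impulse, otherwise output the median."""
    zmed = img[y][x]
    for s in range(3, smax + 1, 2):
        r = s // 2
        win = sorted(img[j][i]
                     for j in range(y - r, y + r + 1)
                     for i in range(x - r, x + r + 1))
        zmin, zmed, zmax = win[0], win[len(win) // 2], win[-1]
        if zmin < zmed < zmax:                 # median is not an impulse
            z = img[y][x]
            return z if zmin < z < zmax else zmed
    return zmed                                # fall back to the last median

# A 255-valued shot surrounded by 100s is replaced by the local median.
img = [[100] * 7 for _ in range(7)]
img[3][3] = 255
print(adaptive_median_pixel(img, 3, 3, 7))
```

The min/max/median per growing subkernel is what makes this filter costly in hardware, as noted above.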
3 Evolutionary Design of Image Filters

3.1 The Approach
This section describes the evolutionary method which can be utilized to create innovative 3 × 3 image filters [4]. These filters will be utilized in the proposed bank of filters. Every image filter is considered as a function (a digital circuit, in the case of hardware implementation) of nine 8-bit inputs and a single 8-bit output, which processes grayscale (8 bits/pixel) images. As Fig. 2 shows, every pixel value of the filtered image is calculated using the corresponding pixel and its eight neighbors in the processed image. In order to evolve an image filter which removes a given type of noise, we need an original (training) image to measure the fitness values of candidate filters. The goal of the EA is to minimize the difference between the original image and the filtered image. The generality of the evolved filters (i.e., whether the filters also operate sufficiently well for other images with the same type of noise) is tested by means of a test set.
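The sliding-window scheme can be sketched as a generic helper (not the authors' hardware pipeline); border pixels are left unchanged, matching the fitness evaluation described later in this section:

```python
def apply_filter_3x3(img, f):
    """Apply a nine-input filter function f to every interior pixel of a
    grayscale image given as a list of rows; border pixels stay unchanged."""
    H, W = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = f(window)
    return out

median3x3 = lambda w: sorted(w)[4]     # the classic 3x3 median as the filter
img = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
print(apply_filter_3x3(img, median3x3))
```

Any evolved filter can be plugged in as `f` in place of the median.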
3.2 EA for Filter Evolution
The method is based on Cartesian Genetic Programming (CGP) [12]. A candidate filter is represented using a graph which contains nc (columns) × nr (rows) nodes placed in a grid. The role of the EA is to find the interconnection of the programmable nodes and the functions performed by the nodes. Each node represents a two-input function which receives two 8-bit values and produces an 8-bit output. Table 1 shows the functions we consider as useful for this task [4]. We can observe that these functions are also suitable for hardware implementation (i.e., there are no such functions as multiplication or division). A node input may be connected either to an output of another node placed anywhere in the preceding columns, or to a primary input. Filters are encoded as arrays of integers of size 3 × nc × nr + 1. For each node, three integers are utilized which encode the connections of the node inputs and its function. The last integer encodes the primary output of a candidate filter.

Table 1. List of functions implemented in each programmable node

code | function     | description
0    | 255          | constant
1    | x            | identity
2    | 255 − x      | inversion
3    | x ∨ y        | bitwise OR
4    | x̄ ∨ y        | bitwise x̄ OR y
5    | x ∧ y        | bitwise AND
6    | ¬(x ∧ y)     | bitwise NAND
7    | x ⊕ y        | bitwise XOR
8    | x ≫ 1        | right shift by 1
9    | x ≫ 2        | right shift by 2
A    | swap(x, y)   | swap nibbles
B    | x + y        | + (addition)
C    | x +S y       | + with saturation
D    | (x + y) ≫ 1  | average
E    | max(x, y)    | maximum
F    | min(x, y)    | minimum
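The node functions of Table 1 can be sketched as 8-bit operations; the reading of swap(x, y) below is one plausible interpretation of "swap nibbles", not confirmed by the source.

```python
FUNCS = {
    0x0: lambda x, y: 255,                      # constant
    0x1: lambda x, y: x,                        # identity
    0x2: lambda x, y: 255 - x,                  # inversion
    0x3: lambda x, y: x | y,                    # bitwise OR
    0x4: lambda x, y: (x ^ 0xFF) | y,           # bitwise (NOT x) OR y
    0x5: lambda x, y: x & y,                    # bitwise AND
    0x6: lambda x, y: (x & y) ^ 0xFF,           # bitwise NAND
    0x7: lambda x, y: x ^ y,                    # bitwise XOR
    0x8: lambda x, y: x >> 1,                   # right shift by 1
    0x9: lambda x, y: x >> 2,                   # right shift by 2
    0xA: lambda x, y: ((x & 0x0F) << 4) | (y >> 4),  # swap nibbles (assumed)
    0xB: lambda x, y: (x + y) & 0xFF,           # addition (mod 256)
    0xC: lambda x, y: min(x + y, 255),          # addition with saturation
    0xD: lambda x, y: (x + y) >> 1,             # average
    0xE: lambda x, y: max(x, y),                # maximum
    0xF: lambda x, y: min(x, y),                # minimum
}

print(FUNCS[0xC](200, 100), FUNCS[0xD](100, 50), FUNCS[0x2](200, 0))
```

Each function maps two 8-bit operands to an 8-bit result, which is what makes the set cheap to realize in an FPGA node.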
Fig. 2. The concept of image filtering using a 3 × 3 filter (left). An example of evolved filter (right).
The EA uses a single genetic operator – mutation – which modifies 5% of the chromosome (this value was determined experimentally). No crossover operator is utilized in this type of EA because no suitable crossover operator has been proposed so far [13]. Mutation modifies either a node or an output connection. The EA operates with a population of λ individuals (typically, λ = 8). The initial population is randomly generated. Every new population consists of a parent (the fittest individual from the previous population) and its mutants. In case two or more
individuals have received the same fitness score in the previous generation, the individual which did not serve as the parent in the previous population is selected as the new parent. This strategy has proven to be very useful [12]. The evolution is typically stopped (1) when the current best fitness value has not improved over the recent generations, or (2) after a predefined number of generations.

3.3 Fitness Function
The design objective is to minimize the difference between the filtered image and the original image. Usually, the mean difference per pixel (mdpp) is minimized. Let u denote a corrupted image and let v denote a filtered image. The original (uncorrupted) version of u will be denoted as w. The image size is K × K (K = 128) pixels, but only the area of 126 × 126 pixels is considered, because the pixel values at the borders are ignored and thus remain unfiltered. The fitness value of a candidate filter is obtained as

fitness = 255·(K − 2)² − ∑_{i=1}^{K−2} ∑_{j=1}^{K−2} |v(i, j) − w(i, j)|.
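Putting Sections 3.2 and 3.3 together, the (1 + λ) loop and the fitness above can be sketched as follows. This is an illustration, not the authors' implementation: the candidate encoding, `mutate` and `random_candidate` are placeholder callables supplied by the caller.

```python
import random

K = 128  # image size; only the inner (K-2) x (K-2) area is scored

def fitness(filtered, original):
    """255*(K-2)^2 minus the summed absolute pixel difference over
    the inner area (border pixels remain unfiltered and unscored)."""
    total = 0
    for i in range(1, K - 1):
        for j in range(1, K - 1):
            total += abs(filtered[i][j] - original[i][j])
    return 255 * (K - 2) ** 2 - total

def evolve(evaluate, mutate, random_candidate, lam=8, generations=100):
    """(1 + lambda) strategy from the text: the parent plus lam-1
    mutants; on a fitness tie, an individual that was not the
    previous parent is preferred as the new parent."""
    parent = random_candidate()
    parent_fit = evaluate(parent)
    for _ in range(generations):
        pop = [(parent, parent_fit, True)]
        for _ in range(lam - 1):
            child = mutate(parent)
            pop.append((child, evaluate(child), False))
        best_fit = max(f for _, f, _ in pop)
        non_parents = [(c, f) for c, f, was_parent in pop
                       if f == best_fit and not was_parent]
        if non_parents:  # tie or improvement: prefer a non-parent
            parent, parent_fit = non_parents[0]
    return parent, parent_fit
```

Because the parent is always kept when no mutant matches it, the best fitness found never decreases between generations.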
3.4 Design Examples
This approach was utilized to evolve efficient image filters for Gaussian noise and 5% salt-and-pepper noise and to create novel implementations of edge detectors [4]. Examples of filtered images for the 40% salt-and-pepper noise are given in Fig. 3. When compared with the common median filter (see Fig. 1 and PSNR), evolved filters preserve more details and generate sharper images. Note that these filtered images represent the best outputs that can be obtained by a single 3 × 3-input filter evolved using the described method for the 40% noise. Figure 2 shows an example of an evolved filter. We can observe that the EA can create only combinational behavior and that the filter utilizes only 3 × 3 pixels at the input. These filters are not able to compete with adaptive median filters, which operate with larger kernels in a sophisticated way. A way to improve evolved filters could be to increase the kernel size; however, this will lead to smoothing and losing details in images.
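Applying a single evolved 3 × 3 filter amounts to sliding the window over the image while leaving the one-pixel border untouched; a sketch (the evolved combinational function itself is a placeholder, illustrated here with a plain median):

```python
def apply_3x3_filter(image, node_fn):
    """Slide a 3x3 window over the image and apply the (evolved)
    combinational function to each window; border pixels are
    copied through unfiltered, as described in the text."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # borders keep original values
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            window = [image[i + di][j + dj]
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            out[i][j] = node_fn(window)
    return out

# Stand-in for an evolved filter: the ordinary 3x3 median.
def median9(window):
    return sorted(window)[4]
```

An evolved filter would replace `median9` with the combinational circuit decoded from the chromosome.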
a) evolved filter1
b) evolved filter2
c) evolved filter3
Fig. 3. A corrupted image (see Fig. 1) filtered by evolved filters a) evf1 (PSNR: 18.868 dB), b) evf2 (PSNR: 18.266 dB) and c) evf3 (PSNR: 18.584 dB)
Reducing the Area on a Chip Using a Bank of Evolved Filters
227

4 Proposed Approach
In order to create a salt-and-pepper noise filter which generates filtered images of the same quality as an adaptive median filter and which is suitable for hardware implementation, we propose to combine several simple image filters utilizing the 3 × 3 window that are designed by an evolutionary algorithm according to Section 3. As Figure 4(a) shows, the procedure has three steps: (1) reduction of the dynamic range of the noise, (2) processing using a bank of n filters and (3) deterministic selection of the best result. We analyzed various filters evolved according to the description in Section 3 and recognized that they have problems with the large dynamic range of corrupted pixels (0/255). A straightforward solution to this problem is to create a component which inverts all pixels with value 255, i.e., all shots are transformed to have a uniform value.

Fig. 4. a) Proposed architecture for salt-and-pepper noise removal and b) training image
The preprocessed image then enters a bank of n filters that operate in parallel. Since we repeated the evolutionary design of salt-and-pepper noise filters (according to Section 3) many times, we gathered various implementations of this type of filter. We selected the n different evolved filters which exhibit the best filtering quality and utilized them in the bank. Note that all these filters were designed by the EA using the same type of noise and training image and with the same aim: to remove 40% salt-and-pepper noise. Finally, the outputs coming from filters 1 … n are combined by an n-input median filter, which can easily be implemented using comparators [14]. As the proposed system naturally forms a pipeline, the overall design can operate at the same frequency as a simple median filter when implemented in hardware.
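The three-stage pipeline described above (shot inversion, n parallel filters, n-input median) can be sketched as follows. The per-filter functions are hypothetical stand-ins; this illustrates the dataflow, not the authors' hardware implementation:

```python
def invert_shots(image):
    """Pre-processing: map pixels of value 255 to 0 so that all
    salt-and-pepper shots share one uniform value."""
    return [[0 if p == 255 else p for p in row] for row in image]

def median(values):
    # n-input median (n odd, e.g. 3 or 5)
    return sorted(values)[len(values) // 2]

def bank_filter(image, filters):
    """Run n evolved filters in parallel on the pre-processed
    image and combine their outputs pixel-wise with an n-input
    median, as in Figure 4(a)."""
    pre = invert_shots(image)
    outputs = [f(pre) for f in filters]  # each f: image -> image
    h, w = len(image), len(image[0])
    return [[median([o[i][j] for o in outputs]) for j in range(w)]
            for i in range(h)]
```

In hardware, each stage corresponds to a pipeline segment, which is why the bank runs at the clock frequency of a single simple filter.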
5 Experimental Results

5.1 Quality of Filtering
The filters utilized in the bank were evolved using the method described in Section 3. These filters use a kernel of 3 × 3 pixels and contain up to 8 × 4 programmable nodes with functions according to Table 1.
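At the functional level, the 8 × 4 grid can be modelled as a feed-forward Cartesian genotype: each node picks two preceding signals and one Table 1 function. The flat genome layout below is an assumption for illustration only; the authors' actual encoding may differ.

```python
def evaluate_cgp(genome, inputs):
    """Evaluate a feed-forward CGP-style grid on one 3x3 window.
    genome is a list of (src_a, src_b, fn) node triples followed by
    a single output index; sources index the 9 window pixels or any
    previously computed node. Hypothetical layout for illustration."""
    signals = list(inputs)  # the 9 pixels of the 3x3 window
    *nodes, out_idx = genome
    for src_a, src_b, fn in nodes:
        signals.append(fn(signals[src_a], signals[src_b]))
    return signals[out_idx]
```

Mutation then amounts to rewriting one node triple or the output index, which matches the single-operator EA of Section 3.2.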
a) bridge, b) goldhill, c) lena, d) bridge with 40% noise, e) goldhill with 40% noise, f) lena with 40% noise

Fig. 5. Examples of test images

Table 2. PSNR for the adaptive median filter with kernel sizes up to 5 × 5 and 7 × 7

                        5 × 5                                   7 × 7
image/noise   10%     20%     40%     50%     70%     10%     20%     40%     50%     70%
goldhill      31.155  30.085  26.906  24.290  15.859  31.155  30.085  27.315  25.961  20.884
bridge        29.474  28.064  24.993  22.567  14.781  29.474  28.058  25.177  23.710  19.060
lena          33.665  31.210  27.171  24.433  15.468  33.655  31.207  27.529  25.984  20.455
pentagon      32.767  31.460  28.235  25.217  16.315  32.767  31.460  28.621  27.175  21.654
camera        30.367  28.560  25.145  22.675  14.973  30.367  28.560  25.298  23.852  19.242
A training 128 × 128-pixel image was partially corrupted by 40% salt-and-pepper noise (see Fig. 4(b)). The EA operates with an eight-member population. The 5% mutation is utilized. A single run is terminated after 196,608 generations. Results will be demonstrated for 5 test images of size 256 × 256 pixels which contain salt-and-pepper noise with intensities of 10%, 20%, 40%, 50% and 70% corrupted pixels. Figure 5 shows some examples of test images. Table 2 summarizes the results obtained for the adaptive median filter, which serves as a reference implementation. All results are expressed in terms of

PSNR = 10 log₁₀ ( 255² / ( (1/MN) ∑_{i,j} (v(i, j) − w(i, j))² ) ),
where N × M is the size of the image.

Table 3 summarizes the results for the images filtered using banks of size 3 and 5. The output pixel is calculated by a 3-input (5-input, respectively) median circuit. Surprisingly, only three filters utilized in the bank are needed to obtain a bank filter which produces images of at least comparable visual quality to the adaptive median filter. This fact is demonstrated by Figure 6, where the visual quality of the images filtered by the adaptive median and the 3-bank filter is practically indistinguishable.

Table 3. PSNR for the bank filter

                        3-bank                                  5-bank
image/noise   10%     20%     40%     50%     70%     10%     20%     40%     50%     70%
goldhill      33.759  30.619  27.716  25.867  19.091  34.392  31.131  27.966  25.965  19.079
bridge        31.458  28.992  25.83   24.282  18.333  32.321  29.714  26.124  24.441  18.327
lena          30.304  28.162  25.684  24.137  18.324  30.393  28.424  25.881  24.203  18.314
pentagon      34.631  31.89   28.681  26.577  18.437  35.201  32.411  28.945  26.683  18.435
camera        30.576  28.185  25.284  23.72   17.85   31.091  28.74   25.576  23.919  17.845

Table 4. Result of synthesis for different filters

                 optimal median filter   adaptive median   evolved filters utilized in bank   proposed bank filter
filter           3×3    5×5    7×7       5×5    7×7        fb1    fb2    fb3    fb4    fb5    3-bank   5-bank
no. of slices    268    1506   4426      2024   6567       156    199    137    183    148    500      843
area [%]         1.1    6.4    18.7      8.6    27.8       0.7    0.8    0.6    0.8    0.6    2.1      3.6
fmax [MHz]       305    305    305       303    298        316    318    308    321    320    308      305

5.2 Implementation Cost
In order to compare the implementation cost of median filters, adaptive median filters, evolved filters and the bank of filters, all these filters were implemented in an FPGA [15]. Results of synthesis are given for the relatively large Virtex II Pro XC2VP50-7 FPGA, which contains 23,616 slices (configurable elements of the FPGA). This FPGA is available on our experimental Combo6x board. Table 4 shows that the proposed bank filters require a considerably smaller area on the chip in comparison to adaptive median filters, whose implementation is based on area-demanding sorting networks. In order to implement the proposed 3-bank filter in a small and cheap embedded system, a smaller FPGA, the XC3S50, is sufficient (it contains 768 slices). However, a larger and more expensive XC3S400 FPGA (containing 3,584 slices) has to be utilized to implement the adaptive median filter.

5.3 Other Properties of Evolved Bank Filters
Figure 7 shows another interesting feature we observed for the bank of evolved filters. This kind of filter is relatively good at removing impulse burst noise; much better than the adaptive median filters. Impulse bursts usually corrupt images during the data transmission phase, when impulse noise occurs. The main reason for the occurrence of bursts is the interference of frequency
a) bridge, adaptive median filter; b) goldhill, adaptive median filter; c) lena, adaptive median filter; d) bridge, 3-bank filter; e) goldhill, 3-bank filter; f) lena, 3-bank filter

Fig. 6. Comparison of resulting images filtered using the adaptive median filter with kernel size up to 7 × 7 (a, b, c) and the 3-bank filter (d, e, f)
a) image with 40% noise, b) adaptive median filter, c) evolved filter

Fig. 7. a) Image corrupted by 40% impulse noise (bursts); images filtered using b) adaptive median with kernel size up to 5 × 5 (PSNR: 11.495 dB) and c) 3-bank filter (PSNR: 22.618 dB)
modulated carrier signals with the signals from other sources of emission. Reliable elimination of this type of noise by means of standard robust filters can be achieved only by using sliding windows that are large enough. However, e.g., the 5 × 5 median filter leads to significant smearing of useful information [1]. Note that the images shown in Figure 7 were obtained by the bank filter, which was not trained for the impulse burst noise at all. This solution represents a promising area for our future research.
6 Discussion
The proposed approach was evaluated on a single class of images. Future work will be devoted to testing the proposed filtering scheme on other types of images. In any case, the results obtained for this class of images are quite promising from the
Fig. 8. Example of an evolved filter utilized in the 3-bank filter
application point of view. We can reach the quality of adaptive median filtering using a 3-bank filter; however, four times fewer resources are utilized. This can potentially lead to a significant reduction in the power consumption of a target system. Moreover, Table 4 does not consider the implementation cost of the supporting circuits (i.e., the FIFOs) needed to correctly read the filtering windows from memory. This cost can be significant, since adaptive median filters require larger filtering windows than the bank filter. Currently we do not know exactly why three (five, respectively) filters evolved with the aim of removing 40% salt-and-pepper noise are able to suppress salt-and-pepper noise with an intensity of up to 70%. Moreover, none of these filters works sufficiently well on its own in the task it was trained for (the 40% noise). We can speculate that although these filters perform the same task, they operate in different ways. While a median filter gives as its output one of the pixels of the filtering window, evolved filters sometimes produce new pixel values. By processing these n values in the n-input median, the shot can be suppressed. We tested several variants of evolved filters in the bank but never observed a significant degradation in image quality. The existence of several filters in the bank offers an opportunity to permanently evolve one of them while the remaining ones could still be sufficient to achieve correct filtering. A possible incorrect behavior of the candidate filter will probably not influence the system significantly. Therefore, this approach could lead to on-line adaptive filtering, especially in the case when the EA can modify different filters of the bank. Note that a solution which uses only a single filter cannot be utilized in an on-line adaptive system in which the image processing must not be interrupted.
7 Conclusions
In this paper a new class of image filters was introduced. The proposed bank filter consists of a set of evolved filters equipped with simple pre-processing and post-processing units. Our solution provides the same filtering capability as a standard adaptive median filter while using much fewer resources on a chip. The solution also exhibits very good behavior for the impulse burst noise which is typical for satellite images. In particular, evolutionary design of image filters for this type of noise will be investigated in our future research.
Acknowledgements. This research was partially supported by the Grant Agency of the Czech Republic under No. 102/07/0850 "Design and hardware implementation of a patent-invention machine" and the Research Plan No. MSM 0021630528 "Security-Oriented Research in Information Technology".
References

1. Koivisto, P., Astola, J., Lukin, V., Melnik, V., Tsymbal, O.: Removing Impulse Bursts from Images by Training-Based Filtering. EURASIP Journal on Applied Signal Processing 2003(3), 223–237 (2003)
2. Dumoulin, J., Foster, J., Frenzel, J., McGrew, S.: Special Purpose Image Convolution with Evolvable Hardware. In: Oates, M.J., Lanzi, P.L., Li, Y., Cagnoni, S., Corne, D.W., Fogarty, T.C., Poli, R., Smith, G.D. (eds.) EvoWorkshops 2000. LNCS, vol. 1803, pp. 1–11. Springer, Heidelberg (2000)
3. Porter, P.: Evolution on FPGAs for Feature Extraction. PhD thesis, Queensland University of Technology, Brisbane, Australia (2001)
4. Sekanina, L.: Evolvable Components: From Theory to Hardware Implementations. Natural Computing. Springer, Heidelberg (2004)
5. Schulte, S., Nachtegael, M., Witte, V.D., Van der Weken, D., Kerre, E.E.: Fuzzy impulse noise reduction methods for color images. In: Computational Intelligence, Theory and Applications, International Conference 9th Fuzzy Days in Dortmund, pp. 711–720. Springer, Heidelberg (2006)
6. Hwang, H., Haddad, R.A.: New algorithms for adaptive median filters. In: Tzou, K.-H., Koga, T. (eds.) Proc. SPIE, Visual Communications and Image Processing '91: Image Processing, vol. 1606, pp. 400–407 (1991)
7. Yung, N.H., Lai, A.H.: Novel filter algorithm for removing impulse noise in digital images. In: Proc. SPIE, Visual Communications and Image Processing '95, vol. 2501, pp. 210–220 (1995)
8. Bar, L., Kiryati, N., Sochen, N.: Image deblurring in the presence of salt-and-pepper noise. In: Scale Space, pp. 107–118 (2005)
9. Nikolova, M.: A variational approach to remove outliers and impulse noise. J. Math. Imaging Vis. 20(1-2), 99–120 (2004)
10. Ahmad, M.O., Sundararajan, D.: A fast algorithm for two-dimensional median filtering. IEEE Transactions on Circuits and Systems 34, 1364–1374 (1987)
11. Dougherty, E.R., Astola, J.T.: Nonlinear Filters for Image Processing. SPIE/IEEE Series on Imaging Science & Engineering (1999)
12. Miller, J., Job, D., Vassilev, V.: Principles in the Evolutionary Design of Digital Circuits – Part I. Genetic Programming and Evolvable Machines 1(1), 8–35 (2000)
13. Slany, K., Sekanina, L.: Fitness landscape analysis and image filter evolution using functional-level CGP. In: EuroGP 2007. LNCS, vol. 4445, pp. 311–320. Springer, Heidelberg (2007)
14. Knuth, D.E.: The Art of Computer Programming: Sorting and Searching, 2nd edn. Addison-Wesley, Reading (1998)
15. Vasicek, Z., Sekanina, L.: An area-efficient alternative to adaptive median filtering in FPGAs. In: Proc. of the 17th Conf. on Field Programmable Logic and Applications, pp. 1–6. IEEE Computer Society Press, Los Alamitos (to appear, 2007)
Walsh Function Systems: The Bisectional Evolutional Generation Pattern

Nengchao Wang1, Jianhua Lu2, and Baochang Shi1,*

1 Department of Mathematics, Huazhong University of Science and Technology, Wuhan, 430074, China
2 State Key Laboratory of Coal Combustion, Huazhong University of Science and Technology, Wuhan, 430074, China
[email protected], [email protected]
Abstract. In this paper, the concept of evolution is introduced to examine the generation process of Walsh function systems. By considering the generation process of Walsh function systems as the evolution process of certain discrete dynamic systems, a new unified generation pattern for Walsh function systems, called the Bisectional Evolutional Generation Pattern (BEGP for short), is proposed, combined with their properties of symmetric copying. As a byproduct of this pattern, an ordering for Walsh function systems called the quasi-Hadamard ordering is found naturally.

Keywords: Walsh function, Quasi-Hadamard ordering, Bisectional Evolutional Pattern, Symmetric copying.
1 Introduction

Walsh function systems are a kind of closed orthogonal function systems which take only the two values +1 and −1. The first such system was introduced by J. L. Walsh in 1923 [1] and is now called the Walsh function system of Walsh ordering. The mathematical expression of the i-th Walsh function of the k-th family in this system has the following form:

w_{k,i}(x) = ∏_{r=0}^{k−1} sgn[cos(i_r 2^r π x)],   0 ≤ x < 1,   (1)

with k = 0, 1, 2, … and i = 0, 1, …, 2^k − 1, where

sgn[x] = +1 if x ≥ 0, and −1 if x < 0,

* Corresponding author.

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 233–243, 2007.
© Springer-Verlag Berlin Heidelberg 2007
234
N. Wang, J. Lu, and B. Shi
and i_r represents the r-th bit of the binary code of i:

i = ∑_{r=0}^{k−1} i_r 2^r
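Equation (1) and the binary expansion of i can be evaluated directly. The sketch below samples each function at the midpoints of its 2^k constant intervals; for k = 2 this reproduces the matrix W4 shown later in Section 3.1.

```python
import math

def walsh(k, i, x):
    """w_{k,i}(x) per Eq. (1): the product over the bits i_r of i
    of sgn(cos(i_r * 2^r * pi * x)), for 0 <= x < 1."""
    sgn = lambda v: 1 if v >= 0 else -1
    prod = 1
    for r in range(k):
        i_r = (i >> r) & 1  # r-th bit of the binary code of i
        prod *= sgn(math.cos(i_r * (2 ** r) * math.pi * x))
    return prod

def walsh_matrix(k):
    """Sample the 2^k functions of the k-th family at interval
    midpoints; the rows come out in Walsh (sequency) ordering."""
    n = 2 ** k
    xs = [(2 * j + 1) / (2 * n) for j in range(n)]
    return [[walsh(k, i, x) for x in xs] for i in range(n)]
```

Sampling at interval midpoints avoids the zeros of the cosine factors, so no floating-point sign ambiguity arises.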
Fig. 1 shows the functional images of the 4th family Walsh functions (the first 16 Walsh functions) in this system.

Fig. 1. The 4th family Walsh functions (the first 16 Walsh functions; 0–15 represent the ordinal numbers of the functions) in the Walsh function system of Walsh ordering
As shown in the expression and the figure, we can easily see that Walsh function systems combine simplicity and complexity. As for the simplicity, Walsh function systems are closed orthogonal function systems which take only the two values +1 and −1. This simplicity has made them widely used in various fields, especially digital signal processing. As for the complexity, the functions are so "singular" that it seems hard to analyze them with the traditional tools based on calculus, because they are composed of products of a series of sgn functions. In addition, the images of Walsh functions also look more complex than those of traditional orthogonal functions such as trigonometric functions. This complexity, on the contrary, has always been an impediment to Walsh function systems being understood more deeply and used more widely. Therefore it would be good news if the simplicity and the complexity could be connected by some simple method for Walsh function systems. In this paper, such a method is found by introducing the concept of evolution into the generation process of Walsh function systems. By considering the generation process of Walsh function systems as the evolution process of certain discrete dynamic systems, a new unified generation method which is called the Bisectional Evolutional
Generation Pattern (BEGP for short) is proposed for Walsh function systems, combined with their properties of symmetric copying. As we will see below, the complexity can, in a certain sense, be seen as the repetition of the simplicity. As a byproduct of this generation pattern, an ordering for Walsh function systems called the quasi-Hadamard ordering is deduced naturally. It is worthwhile to point out that the Walsh matrix of this ordering is also symmetric, the same as those of the three traditional orderings: Walsh ordering, Paley ordering and Hadamard ordering. The paper is arranged as follows: Part 2 briefly introduces the Bisectional Evolution Pattern for arbitrary discrete dynamic systems; Part 3 proposes the BEGP by examining the generation of the Walsh function system of Walsh ordering as an example; Part 4 extends the evolutional generation method to Walsh function systems of other orderings and deduces the quasi-Hadamard ordering; Part 5 provides a summary and conclusions.
2 The Bisectional Evolution Pattern

Evolution may be the most fundamental way in which things generate and develop in nature. This evolution process can be described simply by the discrete dynamic system shown in Fig. 2.

Fig. 2. A simple discrete dynamic system which describes the evolution process, in which 0 represents the initial state of the system, and k and k+1 represent two states in the evolution process, corresponding to the initial and final state of a certain procedure in the total process
A dynamic system cannot sustain the evolution process without a force, and the process cannot be deeply understood without knowing the pattern of the evolutional rule. However, in Fig. 2, both the force and the pattern of the evolutional rule are vague, because the evolutional rule Φ is a "black box". Therefore, in the nineties of the 20th century, Nengchao Wang proposed a new dynamic system, called the "Bisectional Evolution Dynamic System", to substitute for the one above; see Fig. 3. Unlike the above system, the evolutional rule Φ is now more specific: it is constituted by two duality decomposition rules (the 0-rule and the 1-rule, respectively) and one synthesis rule (the 0-1 rule). In this new system, every evolution state in an evolutional cycle is first decomposed into two duality parts by the decomposition rules, and then composed by the synthesis rule to complete one evolution cycle. We call this evolution pattern the Bisectional Evolution Pattern. The advantages of this evolution pattern lie
Fig. 3. Bisectional evolution dynamic system which describes the evolution process. Unlike the dynamic system in Fig. 2, one evolution cycle is completed by decomposing the evolution state into two duality parts using two decomposition rules and then composing them using the synthesis rule.
in its clear exposition of the fundamental evolution principle which is implied in many dynamic systems [2–4].
3 BEGP for Walsh Function Systems

3.1 The Matrix Presentation for Walsh Function Systems

It is natural to think that representing a certain family of a Walsh function system by a matrix is the simplest method, because all Walsh functions are step functions and take only the two values +1 and −1. If we denote +1 and −1 by + and − respectively, and represent the matrix formed by the k-th family of the Walsh function system in Walsh ordering by W_N, where N = 2^k, from Fig. 1 we can get that:

W1 = [+],

W2 = [+ +]
     [+ −],

W4 = [+ + + +]
     [+ + − −]
     [+ − − +]
     [+ − + −],

W8 = [+ + + + + + + +]
     [+ + + + − − − −]
     [+ + − − − − + +]
     [+ + − − + + − −]
     [+ − − + + − − +]
     [+ − − + − + + −]
     [+ − + − − + − +]
     [+ − + − + − + −], …

We call W_N the Walsh matrix of Walsh ordering.
Owing to the equivalence between the matrices and the families of the Walsh function system, if we can find the relationships between these matrices, the simplicity and the complexity of Walsh function systems can be connected. Now the key problem becomes studying the relationships between these matrices. In other words, we must answer questions such as whether there exists a route, and how we can find it, from W1 to W2, from W2 to W4, and so on. As we will see below, there exists such a route from W_{2^k} to W_{2^{k+1}}, where k = 0, 1, 2, …. The route can be considered as an evolutional process of a certain discrete dynamic system and describes the generation process of the Walsh function system of Walsh ordering in a simple way. Because the process obeys a certain Bisectional Evolution Pattern, we call it the BEGP for the Walsh function system of Walsh ordering; it is discussed in detail in 3.2.

3.2 BEGP for Walsh Function System of Walsh Ordering
Now we examine the relationships between the W_N in detail, where N = 2^k, k = 0, 1, 2, …. Through careful observation, we find that W_{2^{k+1}} can be obtained from W_{2^k} by a symmetric copying procedure. For the sake of simplicity, we take the case from W2 to W4 as an example. We first apply a positive mirror symmetric copying procedure on W2 along the horizontal direction; we get the result denoted by W(0):

W(0) = [+ + # + +]
       [+ − # − +]

Secondly, we apply a negative mirror symmetric copying procedure on W2 along the horizontal direction; we get the result denoted by W(1):

W(1) = [+ + # − −]
       [+ − # + −]

Finally, we construct a matrix whose even rows are composed of W(0) and whose odd rows are composed of W(1). The matrix has the following form:

[+ + + +]
[+ + − −]
[+ − − +]
[+ − + −]

It is easy to see that this matrix is precisely W4. These procedures are shown in Fig. 4. If we repeat such procedures on W4, we can generate W8 easily; Fig. 5 shows
these procedures. In fact, the above procedures obey the Bisectional Evolution Pattern, as shown in Fig. 6. In this dynamic system, the 0-rule is the positive mirror symmetric copying procedure while the 1-rule is the negative one; the 0-1 rule is the parity mixing procedure. Following the Bisectional Evolution Pattern, we can gradually generate the whole Walsh function system of Walsh ordering by the following route:

W1 ⇒ W2 ⇒ W4 ⇒ … ⇒ W_{N/2} ⇒ W_N ⇒ …

We call this new generation pattern the BEGP for the Walsh function system of Walsh ordering.
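One cycle of this BEGP translates directly into code: the 0-rule appends a mirrored copy of each row with the same sign, the 1-rule appends a mirrored copy with the opposite sign, and the 0-1 rule interleaves the two results by parity. A sketch with +1/−1 entries:

```python
def begp_step(W):
    """One bisectional evolution cycle for Walsh ordering:
    W(0) = positive mirror copy, W(1) = negative mirror copy,
    then interleave: even rows from W(0), odd rows from W(1)."""
    W0 = [row + row[::-1] for row in W]                 # 0-rule
    W1 = [row + [-v for v in row[::-1]] for row in W]   # 1-rule
    out = []
    for r0, r1 in zip(W0, W1):                          # 0-1 rule
        out.append(r0)
        out.append(r1)
    return out

def walsh_ordering(k):
    """Follow the route W1 => W2 => ... => W_{2^k}."""
    W = [[1]]  # W1
    for _ in range(k):
        W = begp_step(W)
    return W
```

Each call doubles the matrix size, so `walsh_ordering(k)` performs exactly k evolution cycles to reach W_{2^k}.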
Fig. 4. Bisectional evolution from W2 to W4
Remark 1. The BEGP for the Walsh function system of Walsh ordering can also be understood as an evolution process of row-mirror symmetric copying. We again take the case from W2 to W4 as an example. If we first apply the positive mirror symmetric copying and then the negative one on W2 row by row, rather than on W2 as a whole, we get W4 directly. We call copying row by row "row copying" and copying as a whole "block copying".

Remark 2. As we know, there is not only mirror symmetry but also shift symmetry. Combined with the two copying styles, this gives four methods to generate Walsh function systems. As we will see in the next part, the four methods correspond naturally and precisely to four kinds of orderings for Walsh function systems.
Fig. 5. Bisectional evolution from W4 to W8
Fig. 6. BEGP for the Walsh function system of Walsh ordering. The 0-rule here is the positive mirror symmetric copying procedure and the 1-rule is the negative one. The 0-1 rule is the parity mixing procedure.
4 BEGPs and Orderings for Walsh Function Systems

4.1 Ways of Ordering for Walsh Function Systems

As a feature of Walsh function systems, ways of ordering play a key role not only in theoretical studies but also in applications. Traditionally, only three orderings for Walsh function systems are familiar to us: Walsh ordering, Paley ordering and Hadamard ordering. Each ordering is related to a certain kind of binary code for Walsh functions. In this section we will find that different orderings vary only in their BEGPs. As revealed in Remark 2, we have four kinds of BEGPs for Walsh function systems according to different ways of symmetric copying. Specifically, we can show them in Table 1:

Table 1. Four BEGPs for Walsh function systems according to different ways of symmetric copying

BEGP    symmetric copying procedure
I       mirror symmetry + row copying
II      shift symmetry + row copying
III     mirror symmetry + block copying
IV      shift symmetry + block copying
As we saw in the above section, BEGP I generates the Walsh function system of Walsh ordering. If we follow the method which generates the Walsh matrix of Walsh ordering but substitute BEGP II for BEGP I, we can easily get the Walsh matrix of Paley ordering. Now we take BEGP IV as an example and see what can be deduced. For the sake of convenience, we denote the matrix series to be generated as H1, H2, H4, H8, …. It can easily be seen that

H1 = [+],   H2 = [+ +]
                 [+ −]

Now we generate H4 from H2. First, applying the positive shift symmetric copying on H2, we get

H(0) = [+ + # + +]
       [+ − # + −]

Secondly, applying the negative shift symmetric copying on H2, we get

H(1) = [+ + # − −]
       [+ − # − +]

Considering the block copying applied in BEGP IV, we can get H4 in the following form:

H4 = [H(0)] = [+ + + +] = [H2   H2]
     [H(1)]   [+ − + −]   [H2  −H2]      (2)
              [+ + − −]
              [+ − − +]
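Eq. (2) shows that BEGP IV is the familiar Sylvester-style recursion for Hadamard matrices; a sketch with +1/−1 entries:

```python
def begp_iv_step(H):
    """One BEGP IV cycle: positive/negative shift copies plus
    block copying, i.e. H_{2N} = [[H, H], [H, -H]] (Eq. (2))."""
    top = [row + row for row in H]                   # H(0): positive shift copy
    bottom = [row + [-v for v in row] for row in H]  # H(1): negative shift copy
    return top + bottom                              # block copying

def hadamard_ordering(k):
    """Follow the route H1 => H2 => ... => H_{2^k}."""
    H = [[1]]  # H1
    for _ in range(k):
        H = begp_iv_step(H)
    return H
```

Compared with the Walsh-ordering cycle, only the copy style (shift instead of mirror) and the composition (block instead of parity interleaving) change, which is exactly the point of Table 1.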
Repeating the same procedures on H4, we can get H8. It is easy to see that the H_N, N = 1, 2, 4, …, are a series of Hadamard matrices. Thus BEGP IV generates the Walsh function system of Hadamard ordering. Up to now, we have obtained the Walsh function systems of all three traditional orderings naturally using BEGPs. However, one type of BEGP remains: BEGP III. Naturally, it too generates a Walsh function system of a certain ordering. We call this ordering the quasi-Hadamard ordering; it is discussed in detail in 4.2.

4.2 Walsh Function System of Quasi-Hadamard Ordering
In this section, we generate the Walsh function system of quasi-Hadamard ordering using BEGP III. For the sake of convenience, we denote the matrix series to be generated as G1, G2, G4, G8, …. It can easily be seen that

G1 = [+],   G2 = [+ +]
                 [+ −]

Now we generate G4 from G2. First, applying the positive mirror symmetric copying along the horizontal direction (Procedure I) on G2, we get

G(0) = [+ +]
       [− +]

Secondly, applying the positive mirror symmetric copying along the vertical direction (Procedure II) on G2, we get

G(1) = [+ −]
       [+ +]

Thirdly, applying the negative mirror symmetric copying along the vertical direction on G(0), or the negative mirror symmetric copying along the horizontal direction on G(1) (Procedure III), we get

G(01) = [+ −]
        [− −]
Considering the block copying applied in BEGP III, we can finally get G4 in the following form:

G4 = [+ + + +] = [G2    G(0) ] = [G2     G2^PH  ]
     [+ − − +]   [G(1)  G(01)]   [G2^PV  −G2^PHV]
     [+ − + −]
     [+ + − −]

where G2^PH, G2^PV and −G2^PHV represent the results of the above three procedures, respectively. Repeating the same procedures on G4, we can get G8. Generally, the recursive relationship between G_N and G_{N/2} can be written as follows:

G_N = [G_{N/2}      G_{N/2}^PH  ]
      [G_{N/2}^PV  −G_{N/2}^PHV],   N = 2^k, k = 0, 1, 2, …    (3)
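Eq. (3) likewise translates directly into code: PH reverses the column order (mirror along the horizontal direction), PV reverses the row order (mirror along the vertical direction), and PHV does both; a sketch:

```python
def begp_iii_step(G):
    """One BEGP III cycle per Eq. (3):
    G_{2N} = [[G, G^PH], [G^PV, -G^PHV]], where PH = mirror along
    the horizontal direction (reverse columns) and PV = mirror
    along the vertical direction (reverse rows)."""
    ph = [row[::-1] for row in G]               # G^PH
    pv = G[::-1]                                # G^PV
    phv = [row[::-1] for row in G[::-1]]        # G^PHV
    top = [g + p for g, p in zip(G, ph)]
    bottom = [p + [-v for v in q] for p, q in zip(pv, phv)]
    return top + bottom

def quasi_hadamard_ordering(k):
    """Follow the route G1 => G2 => ... => G_{2^k}."""
    G = [[1]]  # G1
    for _ in range(k):
        G = begp_iii_step(G)
    return G
```

As the text notes, the resulting matrices are symmetric, a property the assertions below check for G4.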
Thus we can finally get the Walsh function system of quasi-Hadamard ordering using BEGP III. It is worth pointing out that the G_N also form a symmetric matrix series, like the Walsh matrix series of the other three orderings. However, it should be stressed that the quasi-Hadamard ordering is in fact not introduced here for the first time: the related literature [5] calls the Walsh function system of this ordering Gray code ordered Walsh functions, generated by a different method.

Remark 3. The Walsh function system generated using BEGP III is named the Walsh function system of quasi-Hadamard ordering owing to the similarity between BEGP III and BEGP IV. If we represent H4 as

H4 = [H2  A]
     [B   C]

then according to (2) we can find that A and B are obtained by applying the positive shift-symmetric copying procedure on H2 along the horizontal and vertical directions, respectively. Then C is obtained by applying the negative shift-symmetric copying on B along the horizontal direction, or on A along the vertical direction. Now, if we substitute G2 for H2 and mirror for shift in the above process generating H4, we get G4. Owing to this similarity, the name quasi-Hadamard ordering is proposed.
5 Conclusions

In this paper, a new unified method, the BEGP, for generating Walsh function systems is proposed by introducing the concept of evolution into the generation process, combined with the symmetric copying properties of Walsh functions. The Walsh function systems of the three traditional orderings are generated naturally and precisely using this method. As a byproduct of the generation method, the Walsh function system of quasi-Hadamard ordering can also be deduced naturally. The greatest advantage of this method is that it can be understood easily while still capturing the fundamental properties of Walsh function systems. As we know, most applications of Walsh function systems to date rely on the ability to design fast algorithms for Walsh transforms. The BEGP for Walsh function systems reveals that the process of designing fast algorithms can be considered, in a certain sense, as the inverse of the bisectional evolutional generation process for Walsh function systems. This topic will be discussed in our next paper.

Acknowledgments. The study is supported by the National Science Foundation of China under Grant No. 60473015.
References

1. Walsh, J.L.: A Closed Set of Normal Orthogonal Functions. Amer. J. Math. 45, 5–24 (1923)
2. Wang, N.: Walsh Function Generated with Evolution. Journal of Image and Graphics 3, 225–231 (1996)
3. Wang, N.: The Mathematical Beauty of Walsh Functions (I). Journal of Yunnan University (19 supp.), 306–314 (1997)
4. Wang, N.: The Mathematical Beauty of Walsh Functions (II). Journal of Yunnan University (19 supp.), 315–323 (1997)
5. Falkowski, B.J., Perkowski, M.A.: Algorithms and Architecture for Gray Code Ordered Fast Walsh Transform. In: Proc. IEEE Int. Symp. Circ. & Syst. (23rd ISCAS), New Orleans, USA, pp. 2913–2916 (1990)
Extrinsic Evolvable Hardware on the RISA Architecture

A.J. Greensted and A.M. Tyrrell

Intelligent Systems Research Group, Department of Electronics,
University of York, Heslington, York, UK YO10 5DD
{ajg112,amt}@ohm.york.ac.uk
http://www.bioinspired.com
Abstract. The RISA Architecture is a novel reconfigurable hardware platform containing both hardware and software reconfigurable elements. This paper describes the architecture and the features that make it suitable for implementing biologically inspired systems such as the evolution of digital circuits. Some of the architecture’s capabilities are demonstrated with the results of evolving a simple combinatorial circuit using one of the fabricated RISA devices.
1 Introduction
Evolvable hardware provides a unique challenge to hardware engineers. In a field that uses well-defined components, design tools and procedures, evolving electronic circuitry requires the ability to instantiate and connect circuit elements in a random fashion. This is of course quite contrary to the intended role of most standard electronic devices, and those wishing to investigate evolving electronic circuits must work outside these constraints in order to achieve their goals. Field Programmable Gate Arrays (FPGAs) [1] provide a useful platform for evolving circuits. Their reconfigurability provides a method to repeatedly evaluate the fitness of the candidate solution circuits encountered during an evolutionary process. However, commercial FPGAs still impose difficulties when implementing evolvable hardware systems. At the heart of these difficulties is the proprietary configuration bitstream used to configure commercial FPGAs. Without knowledge of the bitstream's construction, it is difficult to use an evolutionary process to create and test low-level bitstream-based candidate solutions without risking device damage. Without resorting to well-documented, simplistic or out-of-date technology, such as the Xilinx® XC6200 [2], or proprietary bitstream manipulation tools, such as the likewise out-of-date JBits API [3], users must utilise a reconfigurable superplatform on top of an existing FPGA [4,5]. Although this strategy works well, the inefficiency of creating one reconfigurable architecture using another reduces the size of circuit that may be created.

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 244–255, 2007. © Springer-Verlag Berlin Heidelberg 2007
An alternative strategy is to design and fabricate a new reconfigurable architecture specifically with biologically-inspired systems, like evolvable hardware, in mind. Projects such as POEtic [6,7] have taken this route. The main advantages are a greater efficiency in circuit use and modes of operation appropriate to the required task. A third alternative is to remove the constraints of electronics entirely and evolve systems using alternative substances [8]. The work described in this paper takes the first of these alternative approaches: a novel digital electronic architecture, designed for implementing bio-inspired systems such as the evolution of digital circuits. The Reconfigurable Integrated System Array (RISA) [9] addresses a number of the shortfalls of using commercial reconfigurable devices for evolution: the ability to safely apply randomly generated configuration bitstreams, fine-grained partial reconfiguration and a fully documented bitstream structure. The paper is organised as follows. Section 2 provides an introduction to the RISA architecture. Sections 2.1-2.2 and 2.3 describe the two main constituent parts of the RISA architecture, a custom FPGA fabric and a microcontroller core respectively. Fabrication details of the initial RISA device are given in Section 2.4. An initial evolvable hardware experiment using the RISA platform is described in Section 3. Ideas for future development are outlined in Section 4. Conclusions are drawn in Section 5.
2 RISA
The Reconfigurable Integrated System Array (RISA) is a digital electronic architecture designed for exploring biologically inspired systems. RISA extends the standard FPGA paradigm by integrating distributed reconfigurable software elements with reconfigurable hardware resources. The architecture's name reflects that each element of the array comprises a whole integrated system: the combination of a microcontroller with an area of gate array fabric. Figure 1 illustrates the RISA architecture. The structure of the RISA cell allows a number of different system configurations to be implemented. Each cell's FPGA fabric may be combined into a single area and used for traditional FPGA applications. Similarly, the separate microcontrollers can be used in combination for multi-processor array systems such as systolic arrays [10]. However, the intended operation is to use the cell parts in conjunction, allowing each microcontroller to control its adjoining FPGA fabric's configuration. This cell structure is inspired by that of biological cells. As illustrated in Figure 2, the microcontroller provides functionality similar to a cell nucleus. By storing and manipulating different FPGA configurations, which are analogous to DNA, the cell's overall functionality, performed by the FPGA fabric, can be controlled and altered.

2.1 The RISA FPGA Fabric
The RISA FPGA fabric uses an island-style architecture [11]. The fabric design does not attempt to compete with commercial FPGA designs in terms of
Fig. 1. The RISA Architecture comprises an array of RISA Cells. Each cell contains a microcontroller and a section of FPGA fabric. Input/Output (IO) Blocks provide interfaces between FPGA sections at device boundaries. Inter-cell communication is provided by dedicated links between microcontrollers and FPGA fabrics.
Fig. 2. The structure of the RISA cell is based upon biological cells. The microcontroller operates as a centre of cell operations, controlling the cell functionality implemented in the FPGA fabric. FPGA fabric configuration bitstreams may be stored and manipulated in the microcontroller.
density or flexibility of implementable circuits. However, the RISA fabric does provide fine-grained partial reconfiguration, allowing small areas of the fabric to be targeted for reconfiguration. The fabric is also random-configuration safe,
so evolved bitstreams will not cause physical damage to the device. Equally importantly, the design enables full access to the format of the configuration bitstream. This provides the opportunity for custom bitstream generation and for the reverse engineering and inspection of evolved bitstreams. The lowest level of the RISA FPGA fabric is the Function Unit, shown in Figure 3. Its main parts are a Function Generator, a 2:1 multiplexer and a D flip-flop. Combining these units provides the functionality listed below.

– 4-input, 1-output Look-Up Table (LUT)
– 16x1 bit RAM block
– 1 to 17 bit variable-length shift register (via external local routing)
– 4:1 multiplexer (when used with the 2:1 multiplexer)
– 1-bit full adder (when used with the carry chain)
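The LUT mode of the Function Unit is straightforward to illustrate in software: 16 configuration bits form a truth table and the four inputs select one of them. A minimal sketch (the bit ordering here is an illustrative assumption, not the RISA bitstream layout):

```python
def lut4(config_bits, a, b, c, d):
    """Evaluate a 4-input, 1-output LUT: the 16 configuration bits are the
    truth table, and the input vector selects one entry."""
    assert len(config_bits) == 16
    index = (d << 3) | (c << 2) | (b << 1) | a  # assumed bit ordering
    return config_bits[index]

# Example configuration: 4-input parity (XOR of all inputs).
parity_config = [bin(i).count("1") % 2 for i in range(16)]
```

For instance, `lut4(parity_config, 1, 1, 0, 0)` returns 0 because two inputs are high.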
The Cluster is the next level of FPGA structure. Each Cluster contains four Function Units, local routing, and connections to inter-cluster routing. Clusters are arranged in a square grid to form the FPGA fabric. Figure 4a shows this arrangement. The figure also illustrates a key part of the fabric’s routing scheme. Each of the Cluster’s four Function Units is assigned to a different routing direction. The direction dictates how a Function Unit’s circuitry may be connected.
Fig. 3. The RISA Function Unit is the lowest level of the FPGA fabric structure
Fig. 4. The multiplexer based FPGA routing design can be randomly configured without risk of forming combinatorial feedback paths or signal contentions. Combinatorial paths may only be connected within their assigned directions. Registered paths can connect to all signal directions.
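The connection rule in the caption can be stated as a simple predicate. A sketch over a hypothetical netlist representation (signal endpoints as (kind, direction) pairs; not RISA's actual data structures):

```python
def connection_allowed(src, dst):
    """Check the directional routing rule: a combinatorial output may only
    drive a combinatorial input of the same direction, which rules out
    unregistered feedback loops; any connection with a registered endpoint
    is allowed."""
    src_kind, src_dir = src
    dst_kind, dst_dir = dst
    if src_kind == "comb" and dst_kind == "comb":
        return src_dir == dst_dir
    return True
```

For example, `connection_allowed(("comb", "east"), ("comb", "north"))` is disallowed, while routing the same signal through a register first makes the connection legal.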
In order to achieve a random-configuration-safe architecture, two configuration scenarios must be avoided: signal contention and the creation of combinatorial feedback paths. The former causes a high current to flow between wire endpoints driven to opposite values, which can lead to device damage. The latter can create fast oscillations, which unnecessarily consume power, cause noise in neighbouring signals via crosstalk, and can cause metastability issues in connected registers. The directed Function Unit scheme and multiplexer-based routing prevent both these problems. Figure 4b illustrates the input and output routing of a single Function Unit, in this case the Function Unit assigned to the east direction. Table 1 summarises how signals may be routed. The only restriction is that connections between combinatorial signals must be of the same direction. This stops an unregistered loop from being created.

2.2 Fabric Configuration
The configurable resources of the FPGA fabric are divided into either routing or logic; routing being the FPGA connectivity, logic being the functional circuitry. The RISA configuration process supports the features listed below.
Table 1. Function Unit connectivity. The routing of combinatorial signals is limited to remove the problem of combinatorial feedback paths.

                          Outputs
  Inputs          Combinatorial           Registered
  Combinatorial   Of the same direction   Any
  Registered      Any                     Any

– Concurrent configuration of routing and logic resources.
– Reconfiguration without disrupting the currently operating configuration.
– Single clock cycle configuration switching.
– Configuration readback with device state inspection.
Two pairs of serial data chains, one pair for routing, the other for logic, are used for loading configuration information into the fabric. Each pair comprises a chain for selecting a target for configuration and a chain for shifting in the actual configuration data. It is this approach that provides the partial reconfiguration feature of the RISA architecture. Figure 5 illustrates the configuration circuitry for a single chain pair. The same circuit design is used for both routing and logic configuration.
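The pairing of a selection chain with a data chain can be modelled abstractly as follows; the class and region layout are illustrative only, not the actual RISA bitstream format:

```python
class ConfigChain:
    """Toy model of a paired selection/data chain: only regions whose select
    bit is set receive shifted-in configuration data, so bypassed regions
    keep operating undisturbed (partial reconfiguration)."""

    def __init__(self, region_sizes):
        self.regions = [[0] * size for size in region_sizes]
        self.selected = [False] * len(region_sizes)

    def select(self, targets):
        """Load the selection chain: mark which regions are configurable."""
        self.selected = [i in targets for i in range(len(self.regions))]

    def shift_in(self, bits):
        """Shift configuration data through the selected regions only."""
        bits = list(bits)
        for i, region in enumerate(self.regions):
            if self.selected[i]:
                n = len(region)
                self.regions[i], bits = bits[:n], bits[n:]
```

For example, selecting only region 1 of a three-region chain and shifting in four bits reconfigures the middle region while leaving the other two untouched.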
Fig. 5. Routing and Logic configuration can be configured separately, each using a pair of serial data chains. Each pair comprises a target selection chain and a configuration chain. (a) Shows the target selection circuitry. (b) Shows the two types of configurable element.
Figure 5a shows the ConfigSelectUnits that are used to select or bypass sections of the configuration chain and thus areas of configurable fabric. Figure 5b
Fig. 6. The first fabricated version of the RISA architecture contains a single RISA Cell. The FPGA fabric accounts for the majority of the die area. The die is 5mmx5mm in size and fabricated using a 0.18μm process.
shows the two types of element that make up the configuration chain. The ConfigurableRegister is a D-type flip-flop that can have its initial value set and its state read back. The ConfigurableBit is used for setting single-bit control lines, such as multiplexer selection bits and signal inverter states. Examples of these configurable elements can be seen in the Function Unit shown in Figure 3. The lowest level unit selectable for configuration, for either routing or logic, is the Cluster. Using the selection chains, any arrangement of Clusters can be targeted.

2.3 SNAP, the RISA Microcontroller
A microcontroller core is incorporated into each RISA cell. Within the RISA architecture, these microcontrollers form a processor array. A custom microcontroller, the Simple Networked Application Processor (SNAP), was designed specifically for the RISA architecture. The SNAP core is, in summary, a 4-stage pipelined RISC design with a Von Neumann memory architecture and a 16-bit data width and address space. It includes a set of interfacing links dedicated to inter-core communication. Furthermore, the core is tightly integrated with the RISA FPGA fabric. The core includes a number of modules similar to those found in many commercial microcontrollers [12], such as timers, UARTs, a watchdog timer and general purpose IO ports. A hardware random number generator is also included.
One set of features that makes the SNAP design novel is the set of techniques used for integration with the FPGA fabric. First, configurable IO ports that connect directly to the FPGA fabric are built into the microcontroller's register file. Second, signals routed from the fabric can directly determine instruction execution; condition codes controlled by these signals can be encoded directly into instructions. Using this technique, blocks of code can be enabled depending on FPGA signal values without the need for more time-consuming test code. Last, the SNAP microcontroller has direct access to the FPGA's configuration chains. Four access ports connect to each configuration chain, all being operable concurrently.
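The second technique amounts to predicating instructions on fabric signals. A sketch of the idea (the instruction format and signal names are hypothetical, not SNAP's actual ISA):

```python
def execute(program, fpga_signals, regs):
    """Run a toy instruction list; an instruction tagged with a condition is
    executed only when the named FPGA-fabric signal is high, so no separate
    test-and-branch code is needed."""
    for op, args, cond in program:
        if cond is not None and not fpga_signals[cond]:
            continue  # condition code from the fabric gates execution directly
        if op == "mov":
            dst, value = args
            regs[dst] = value & 0xFFFF  # 16-bit data width
        elif op == "add":
            dst, a, b = args
            regs[dst] = (regs[a] + regs[b]) & 0xFFFF
    return regs

program = [("mov", (0, 5), None),
           ("add", (0, 0, 0), "fabric_ready")]  # doubles r0 only if signal high
```

With the fabric signal high, r0 ends as 10; with it low, the conditional add is skipped and r0 stays 5.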
2.4 The RISA Device
So far the RISA project has produced one run of fabricated RISA devices. A cell-based 0.18μm process was used; the die layout is shown in Figure 6. Each device of this initial revision contains a single RISA cell. The FPGA fabric contains 36 Clusters in a 6x6 grid. As can be seen, the FPGA fabric accounts for the larger part of the device area. As with commercial FPGAs, routing takes up a great deal of the die area [13], especially in this case because the routing is multiplexer based.
3 Evolvable Hardware Using the RISA Platform
A simple evolvable hardware experiment has been undertaken to demonstrate the RISA platform in operation. The initial experiment uses one RISA device, containing a single RISA cell. The cell's FPGA fabric was used to evolve a simple circuit. Figure 7 shows the experimental setup. The device motherboard, shown in Figure 7a, contains a Xilinx® Spartan FPGA which is used to apply test vectors to the RISA device straddled above, as shown in Figure 7b.
Fig. 7. (a) The RISA device motherboard without RISA device attached. (b) The experiment setup, with RISA device connected (Serial cable not shown).
Fig. 8. (a) Circuits are evolved within the RISA FPGA fabric under the control of the motherboard’s Xilinx® FPGA. (b) Input vectors are applied to the FPGA fabric via the west IO blocks. The output data, for this experiment parity data, is read from the east IO blocks.
For the experiment presented here, the evolutionary algorithm is performed extrinsically on the Xilinx® FPGA. RISA FPGA fabric configurations are downloaded from the Xilinx FPGA to the RISA device via the Configuration Access Port. Input vectors are applied to, and circuit outputs read from, the RISA FPGA fabric via the IO Blocks. This setup is shown in Figure 8a. The Xilinx Embedded Development Kit (EDK) was used to implement the evolutionary algorithm running on a MicroBlaze soft core. With future development the aim is to move the evolutionary algorithm into the on-board SNAP microcontroller. Furthermore, multiple RISA cells will be used to reduce evolution time by performing multiple fitness evaluations in parallel. The evolutionary algorithm is designed to evolve a 4-bit parity generator, calculating both odd and even parity. The input and output assignments are shown in Figure 8b. The aim at this early stage in system development is to prove that the RISA architecture can be used for evolving circuits, rather than to increase the complexity bounds of evolved hardware. The evolutionary algorithm parameters are as follows: a population size of 32; a tournament selection size of 4; a generation limit of 400. Figures 9a and 9b show the mean fitness with positive and negative standard deviations over 100 runs. Figure 9a shows the results for a mutation rate of 4 out of 256, or 1.56%; 86 out of 100 runs found a successful solution. Figure 9b shows the results for a mutation rate of 2 out of 256, or 0.78%; here, 89 out of 100 runs were successful. The evolutionary algorithm makes use of only the eastbound direction of cluster Function Units. Furthermore, only the combinatorial circuitry is utilised. A novel, and completely feasible, approach to accelerating evolution would be to make use of all four Function Unit directions concurrently. This would allow four candidate configurations to be loaded and tested together.
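The extrinsic loop with the stated parameters can be sketched in software. The genome below is a small feed-forward gate netlist standing in as a software substitute for a RISA configuration bitstream, and the fitness is the bitwise agreement with the 4-bit even/odd parity truth table over all 16 input vectors (maximum 32, matching the fitness axis of Figure 9). Everything about the encoding is an illustrative assumption:

```python
import random

OPS = ["and", "or", "xor", "nand"]

def evaluate(genome, inputs):
    """Feed-forward evaluation of a linear gate netlist; the last two
    signals are taken as the even- and odd-parity outputs."""
    signals = list(inputs)
    for op, a, b in genome:
        x, y = signals[a % len(signals)], signals[b % len(signals)]
        signals.append({"and": x & y, "or": x | y,
                        "xor": x ^ y, "nand": 1 - (x & y)}[op])
    return signals[-2], signals[-1]

def fitness(genome):
    """Output-bit agreement with 4-bit even/odd parity (0..32)."""
    score = 0
    for v in range(16):
        bits = [(v >> i) & 1 for i in range(4)]
        odd = sum(bits) % 2
        even_out, odd_out = evaluate(genome, bits)
        score += (even_out == 1 - odd) + (odd_out == odd)
    return score

def mutate(genome, rate=4 / 256):
    """Per-gene mutation, analogous to the 4-in-256 bit mutation rate."""
    return [(random.choice(OPS), random.randrange(32), random.randrange(32))
            if random.random() < rate else gene for gene in genome]

def tournament(scores, k=4):
    """Tournament selection of size 4, as in the experiment."""
    return max(random.sample(range(len(scores)), k), key=lambda i: scores[i])

def evolve(pop_size=32, genes=12, generations=400):
    pop = [[(random.choice(OPS), random.randrange(32), random.randrange(32))
            for _ in range(genes)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(g) for g in pop]
        if max(scores) == 32:  # fully functional parity generator found
            break
        pop = [mutate(pop[tournament(scores)]) for _ in range(pop_size)]
    return max(pop, key=fitness)
```

A hand-built genome such as `[("xor", 0, 1), ("xor", 2, 3), ("xor", 4, 5), ("nand", 6, 6), ("and", 6, 6)]` scores the maximum of 32 under this fitness function.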
Fig. 9. Fitness against generation, averaged (mean) over 100 runs when evolving a 4-bit parity generator. Also shown are positive and negative standard deviations. (a) has a mutation rate of 4 out of 256, (b) 2 out of 256.
4 Future Development for the RISA Architecture
The RISA architecture was designed for array-based systems. Although, as Section 3 demonstrates, interesting experiments can be performed using a single RISA cell, the architecture's true power lies in multicellular system operation. The next stage in the architecture's development is the implementation of different multicellular systems, such as fault-tolerant Embryonic Arrays [14,15]. Work is currently underway to develop a novel distributed rerouting algorithm for circumventing faults in array-based circuits. Such a system can be realised by implementing system functionality in the RISA cells' FPGA fabric and the routing algorithm in the SNAP microcontrollers. Other fault-tolerant multicellular systems, such as developmental systems [16,17] and endocrinologic architectures [18], are also fully implementable using RISA devices. The design and manufacture of a multi-RISA device motherboard to support these systems is underway. As the RISA architecture is a completely custom design, the suite of supporting software tools also needs developing. The experiment outlined in Section 3 provided a practical way of developing and testing these tools. So far an assembler, a configuration API and a bitstream manipulation API have been created. A major future development is the fabrication of a second revision RISA device. The main goal of this step will be to incorporate multiple RISA cells within a single device. As illustrated in Figure 6, the FPGA fabric uses a considerable area. By moving the ASIC design from cell-based to full-custom, the area required by the fabric could be significantly reduced.
5 Conclusions
A custom architecture ASIC, called RISA, has been produced to simplify the task of creating bio-inspired systems. A random-configuration-safe reconfiguration
system is used, making the RISA platform well suited to the evolution of digital circuits. An embedded microprocessor array provides a reconfigurable software element suitable for implementing the rerouting algorithms used in cellular fault-tolerant systems. The experiment outlined in Section 3, although simplistic, clearly demonstrates the operation of the architecture's hardware-reconfigurable fabric. While the authors do not consider evolutionary algorithms a particularly useful approach for creating digital circuits, their use in these experiments demonstrates the architecture's ability to perform extrinsically controlled reconfiguration. This functionality is believed to be critical in many applications that require adaptation and fault tolerance. Further information on the RISA device, such as full HDL and schematics, can be found on the project website: http://www.bioinspired.com/users/ajg112.
Acknowledgements The authors would like to thank the EPSRC (grant number GR/R74512/01) and the MoD for funding this project. Many thanks also to the members of Europractice’s Microelectronics Support Centre for their expert assistance with ASIC development.
References

1. Wolf, W.: FPGA Based System Design. Prentice-Hall, Englewood Cliffs (2004)
2. Xilinx: XC6200 Field Programmable Gate Arrays - datasheet (1997)
3. Xilinx: JBits SDK web site (2007), http://www.xilinx.com/labs/projects/jbits/
4. Sekanina, L.: Towards evolvable IP cores for FPGAs. In: Proceedings of the 3rd NASA/DoD Conference on Evolvable Hardware, EH-03, Washington, DC, USA, pp. 145–154. IEEE Computer Society Press, Los Alamitos (2003)
5. Sekanina, L.: Virtual reconfigurable circuits for real-world applications of evolvable hardware. In: Tyrrell, A.M., Haddow, P.C., Torresen, J. (eds.) ICES 2003. LNCS, vol. 2606, pp. 186–197. Springer, Heidelberg (2003)
6. POEtic: Project web site (2007), http://www.poetictissue.org/
7. Tyrrell, A.M., Sanchez, E., Floreano, D., Tempesti, G., Mange, D., Moreno, J.M., Rosenberg, J., Villa, A.E.: Poetic tissue: An integrated architecture for bio-inspired hardware. In: Tyrrell, A.M., Haddow, P.C., Torresen, J. (eds.) ICES 2003. LNCS, vol. 2606, pp. 129–140. Springer, Heidelberg (2003)
8. Harding, S., Miller, J.: Evolution in materio: A tone discriminator in liquid crystal. In: Proceedings of the Congress on Evolutionary Computation 2004, vol. 2, pp. 1800–1807 (2004)
9. Greensted, A., Tyrrell, A.: RISA: A hardware platform for evolutionary design. In: Proceedings of 2007 IEEE Workshop on Evolvable and Adaptive Hardware, IEEE Computer Society Press, Los Alamitos (2007)
10. Hwang, K., Briggs, F.: Computer Architecture and Parallel Processing. McGraw-Hill, New York (1984)
11. Betz, V., Rose, J., Marquardt, A.: Architecture and CAD for Deep-Submicron FPGAs. Kluwer Academic Publishers, Norwell, MA, USA (1999)
12. ATMEL: AVR ATmega128 - datasheet (2006), http://www.atmel.com/dyn/resources/prod_documents/doc2467.pdf
13. Ahmed, E., Rose, J.: The effect of LUT and cluster size on deep-submicron FPGA performance and density. In: FPGA '00: Proceedings of the 2000 ACM/SIGDA eighth international symposium on Field programmable gate arrays, New York, NY, USA, pp. 3–12. ACM Press, New York (2000)
14. Mange, D., Sipper, M., Stauffer, A., Tempesti, G.: Towards robust integrated circuits: The embryonics approach. Proceedings of the IEEE 88(4), 516–541 (2000)
15. Ortega-Sánchez, C., Tyrrell, A.: A hardware implementation of an embryonic architecture using Virtex FPGAs. In: Miller, J.F., Thompson, A., Thompson, P., Fogarty, T.C. (eds.) ICES 2000. LNCS, vol. 1801, pp. 155–164. Springer, Heidelberg (2000)
16. Miller, J.F.: Evolving a self-repairing, self-regulating, French flag organism. In: Deb, K., et al. (eds.) GECCO 2004. LNCS, vol. 3102, Springer, Heidelberg (2004)
17. Liu, H., Miller, J., Tyrrell, A.: An intrinsic robust transient fault-tolerant developmental model for digital systems. In: Deb, K., et al. (eds.) GECCO 2004. LNCS, vol. 3102, Springer, Heidelberg (2004)
18. Greensted, A., Tyrrell, A.: Implementation results for a fault-tolerant multicellular architecture inspired by endocrine communication. In: Proceedings of EH 2005, 7th NASA/DoD Conference on Evolvable Hardware, IEEE Computer Society Press, Los Alamitos (2005)
Evolving and Analysing "Useful" Redundant Logic

Asbjoern Djupdal and Pauline C. Haddow

CRAB Lab, Department of Computer and Information Science,
Norwegian University of Science and Technology
{djupdal,pauline}@idi.ntnu.no
http://crab.idi.ntnu.no
Abstract. Fault tolerance is an increasing challenge for integrated circuits due to semiconductor technology scaling. This paper looks at how artificial evolution may be tuned to the creation of novel redundancy structures which may be applied to meet this challenge. An experimental setup and results for creating "useful" redundant structures are presented.
1 Introduction
As the semiconductor feature size decreases and the number of transistors on a single chip increases, one of the growing challenges facing the electronic design community is faulty behaviour [1]. This challenge may be met by improved fault tolerance methods. The semiconductor fault challenge may be, in general, a long term challenge, but it is here today for large ICs like FPGAs. Mass production enables FPGAs to be produced in the newest technologies. The Xilinx Virtex 5 [2] is an example of a new FPGA series from Xilinx, produced in 65nm technology with up to 330,000 logic cells. If faults are expected to occur in a digital circuit, fault tolerance, i.e. the ability to function correctly in the presence of faults, may be achieved by incorporating redundancy (additional resources) in some form. These additional resources may be in the form of additional hardware, in which case it is called hardware redundancy [3], the focus of this paper. To find new redundancy techniques it is important to free oneself from the constraints brought upon us by thinking in terms of traditional redundancy techniques. The way one thinks when designing circuits is influenced by the way one is taught electronics, has designed electronics, and by the tools used in the design process. One way of freeing oneself from these human and design-automation constraints is to search for ideas using some sort of heuristic search process. One such process is that of evolutionary algorithms [4]. The application of evolutionary algorithms to the design of hardware is termed evolvable hardware [5]. The goal is either to explore for unique solutions or to optimise existing solutions. However, in both cases, the goal is usually to obtain a given behaviour, e.g. a binary adder [6]. Further, evolution may be applied when seeking some sort of structure, such as evolving the French flag [7]. In both these cases the goal may be explicitly defined and given to the evolutionary algorithm

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 256–267, 2007. © Springer-Verlag Berlin Heidelberg 2007
for comparison between the evolving solutions and the sought solution. In the former case it is the functionality that needs to be explicitly defined, whereas in the latter case it is the structure. In this work, the goal is to push evolution to find useful redundant structures for achieving fault tolerance whilst retaining full functionality. However, these redundant structures are unknown, unlike in the case of the earlier mentioned French flag problem. It is not possible to explicitly describe the structure that one is seeking, only the functionality of the sought circuit, perhaps in terms of a truth table. Section 2 gives an overview of necessary background material. Section 3 presents relevant previous work. The experimental setup is found in Section 4, with results and discussion in Section 5. The paper concludes in Section 6.
2 Background

2.1 Fault Models and Simulated Faults
Two fault models are considered in this work: the gate reliability model and the single fault model. In the gate reliability model, each gate has a certain probability of failing. A fault scenario is one possible configuration of faulty gates for a given circuit. When a fault scenario for the gate reliability model is created, each gate in the circuit is tested against a random number generator and selected to be faulty or not based on a chosen gate reliability. This may be said to be a reasonable model of reality, as the probability of having failing gates in a circuit is directly proportional to the number of gates in the circuit. In the single fault model, a circuit can have exactly one fault at any time. When a fault scenario for the single fault model is created, one and only one of the gates is selected to fail. A failing gate can be modelled in several ways. This paper models a failing gate by inverting its output, which can be said to be a worst-case scenario. Although an inverted output is not a realistic fault for a defective CMOS gate, this fault model is useful for simulation and analysis purposes because it ensures a wrong output for all possible input values.

2.2 Redundancy
A redundant gate in a circuit is a gate that may fail without corrupting the circuit's outputs. To find out whether a gate in a circuit is redundant, a gate redundancy test may be performed in which the gate is temporarily made defective. If this does not affect the circuit outputs, the gate is redundant. Finding all redundant gates in a circuit involves applying the redundancy test to all gates one by one. The ultimate goal of this work is not redundancy, but reliability. Some forms of redundancy are known to enhance a circuit's reliability, while other forms of redundancy consist of "dead meat" that does not contribute and should be optimised away from the circuit. In this paper the term useful redundancy is used for redundant gates that have a useful purpose in the circuit, while fake redundancy is used for gates that have no useful purpose.
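The gate redundancy test described above can be sketched directly: temporarily invert each gate's output (the paper's worst-case fault model) and check every input vector. The netlist representation here is a hypothetical stand-in, not the paper's actual circuit encoding:

```python
def simulate(netlist, inputs, faulty=None):
    """Evaluate a feed-forward gate netlist; the faulty gate, if any, has
    its output inverted, as in the paper's fault model."""
    signals = list(inputs)
    for idx, (op, a, b) in enumerate(netlist):
        x, y = signals[a], signals[b]
        out = {"and": x & y, "or": x | y, "xor": x ^ y}[op]
        signals.append(1 - out if idx == faulty else out)
    return signals[-1]  # single-output circuit for simplicity

def redundant_gates(netlist, n_inputs):
    """A gate is redundant if making it defective never changes the
    circuit output, for any input vector."""
    vectors = [[(v >> i) & 1 for i in range(n_inputs)]
               for v in range(2 ** n_inputs)]
    return [g for g in range(len(netlist))
            if all(simulate(netlist, vec, faulty=g) == simulate(netlist, vec)
                   for vec in vectors)]
```

For example, in `[("and", 0, 1), ("xor", 0, 1)]` the AND gate never reaches the output, so it is reported redundant (an instance of fake redundancy), whereas a lone XOR gate is not.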
2.3 Measuring Functionality and Reliability
The functionality of a circuit is found by trying all possible input values and recording the respective output values of the circuit. If all recorded output values correspond exactly to the desired truth table for the function, the circuit is working perfectly; otherwise 100% functionality is not achieved. Traditionally, the result of such a test for functionality is either "not working" (0) or "working" (1), referred to as f_bool herein. When using artificial evolution to create circuits, f_bool is too coarse grained to guide evolution towards a working circuit. One way of giving evolution more information about how far an individual is from a working solution is to measure the Hamming distance between the circuit output and the desired output, i.e. the number of bits that differ between the two. This is then normalised to the interval [0, 1], where 1 is 100% working. This measure of functionality is called f_ham in this paper. A reliability metric measures how well a circuit functions in the presence of faults. The traditional reliability metric used in this paper is called R_trad and is the average of all f_bool results after having tested a number of randomly selected fault scenarios. The possible fault scenarios depend on the fault model chosen. In this paper the traditional reliability metric R_trad is used together with the single fault model and is named R_trad_single. A reliability metric may also be based on f_ham and is called R_ehw. R_ehw is the average of all f_ham results after having tested a number of randomly selected fault scenarios.
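These metrics can be written down compactly. The helpers below take the circuit-under-fault as a caller-supplied function; the names and signatures are illustrative, not the paper's implementation:

```python
import random

def f_bool(outputs, target):
    """Boolean functionality: 1 only for an exact truth-table match."""
    return 1 if outputs == target else 0

def f_ham(outputs, target):
    """Hamming-based functionality, normalised to [0, 1]."""
    return sum(o == t for o, t in zip(outputs, target)) / len(target)

def reliability(metric, run_with_faults, target, scenarios):
    """R_trad uses f_bool as the metric, R_ehw uses f_ham; both average
    the metric over a set of fault scenarios."""
    return sum(metric(run_with_faults(s), target)
               for s in scenarios) / len(scenarios)

def single_fault_scenarios(n_gates):
    """Single fault model: exactly one faulty gate per scenario."""
    return [[g] for g in range(n_gates)]

def gate_reliability_scenario(n_gates, p_fail, rng=random):
    """Gate reliability model: each gate fails independently with p_fail."""
    return [g for g in range(n_gates) if rng.random() < p_fail]
```

Note how, in the gate reliability model, the expected number of faulty gates grows with circuit size, which is exactly the pressure discussed in Section 3.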
3 Previous Work
In the earlier work of Hartmann and Haddow [8], circuits were evolved with an R_ehw based fitness function using the gate reliability model. The results provided clear evidence that evolution traded off functionality for reliability. Instead of making 100% functional circuits and tolerating faults through redundancy, evolution shrank the circuits. Under the gate reliability model, the probability of having a faulty gate in a circuit is directly proportional to the number of gates in the circuit. Evolution took the easiest path to tolerating the faults: it avoided many of them by removing gates, to the point where the circuit was no longer 100% functional. While [8] only looked at R_ehw, [9] investigated and compared both the R_ehw and R_trad reliability metrics for evolved and traditional circuits. In traditional electronics, 100% functionality is considered essential. In previous work [10], the problem of evolving 100% functional circuits with redundancy was investigated. As in this paper, reliability in itself was not the main goal, but rather the creation of redundant structures. To ensure 100% functionality, the fitness function was designed such that f_ham was the only contributor to fitness unless functionality was 100%. Thus reliability only affected fitness after 100% functionality was reached. Several experimental setups were tried, using both the gate reliability model and the single fault model. When using the gate reliability model, no form of
Evolving and Analysing "Useful" Redundant Logic

Fig. 1. Structures evolved in [10]
redundancy was achieved, as the simplest solution for evolution was to minimise the number of gates used in implementing a fully functional circuit. The single fault model experiments, on the other hand, created larger circuits containing redundant gates. It was concluded that the single fault model does not discourage large circuits, and evolution can therefore more easily introduce new redundant structures. The first evolved structure in [10] containing redundant gates had the form shown in figure 1(a). The subcircuit marked f implements the desired function and the subcircuit marked r implements any function. All gates in r are redundant. The three gates in the figure make sure that r has no impact on the output at all: the output of gate 1 is constant 1 no matter what r evaluates to. This gate is called unreachable because no input vector has any impact on the output of the gate. This structure was evolved using the fitness function f = k_1 · f_ham + k_2 · R_trad_single, and evolution achieved high fitness by making r as large as possible and f as small as possible, thus scoring high on R_trad_single. The redundant gates in r are fake and thus not useful for any purpose. They do not, in any way, influence the output and could just as well be removed. One way of avoiding the structure in figure 1(a) is to detect unreachable gates. This was also tried in [10]. Any subcircuit with unreachable gates as its only outputs can be excluded when R_trad_single is calculated. In this way, such structures do not contribute to fitness, i.e. to R_trad_single, and evolution is encouraged to find another way to improve fitness. The result is typically a structure as in figure 1(b). Here there are no unreachable gates, but the redundant gates in r are still just as useless for the same reason: r contains only fake redundancy and could just as well be removed from the circuit without affecting functionality or reliability.
The work in [10] managed to create several circuits with redundant gates. However, the method used did not manage to evolve any circuits with useful redundancy. It was concluded that evolution chooses the easiest way to solve the problem, and the easiest way in the experimental setup in [10] was fake redundancy. When the fitness function is not good enough at separating circuits with useful redundancy from circuits with fake redundancy, the result is large amounts of fake redundancy and no useful redundancy. The goal of this paper is to tune the evolutionary process further in order to be able to evolve useful redundancy.
Fig. 2. Circuit partition after selecting any gate g
4 Experiments
This paper builds on the lessons learned in [10]. In [10], a fitness function using R_trad_single seemed most promising with regard to introducing redundancy, and R_trad_single is therefore chosen in this paper. A key point for improving on the previous experiments is to correctly separate useful redundancy from fake redundancy, and to include only useful redundant gates when R_trad_single is calculated. Detecting known unwanted structures, like the unreachable-gate subcircuits in figure 1(a), is not the answer. Experiments in [10] show that evolution will only come up with new ways of cheating by introducing new forms of fake redundancy. The solution chosen in this paper is a more general way of classifying redundancy as useful or fake. Instead of detecting unwanted structures, a gate is simply classified as useful redundant if it has some observable influence on the circuit's output. More specifically, a gate is said to be useful redundant if, when the gate becomes defective, some other redundant gate becomes non-redundant in order to maintain correct circuit functionality.

4.1 Algorithm for Classifying Redundant Gates
Algorithm 1, FindFake, is a heuristic for classifying the redundant gates in a given circuit as either useful redundant or fake redundant. The algorithm works on a given circuit. First, all redundant gates are marked as useful redundant. Then a gate g is selected. For the selected gate g, the circuit can be partitioned into two sets of gates X and Y, either of which may be empty, as shown in figure 2. The gates in Y can be disconnected from the circuit by changing the chosen gate g to either Vcc or Gnd, both of which are tried. If this change does not damage the output of the circuit, the number of redundant gates in X after the change is compared to the number of redundant gates in X before the change. If the number of redundant gates in X is unchanged, the gates in Y have no impact on the output and are useless. They are then marked as fake. This is repeated for all the gates in the circuit.

4.2 R_trad_single Based on Measured Redundancy
A measure like R_trad_single depends on the function the circuit is supposed to perform: R_trad_single is 0 when functionality is not 100%. To encourage redundancy early during evolution, before the individuals reach 100% functionality,
Algorithm 1. Classifying redundant gates as useful or fake

procedure FindFake(circuit)
    markAllRedundantGatesAsUseful
    for all gates g do
        partitionCircuit(X, Y, g)          ▷ find gate sets X and Y given g
        redundantInX ← numberRedundant(X)
        g ← Vcc                            ▷ disconnect Y by substituting g with Vcc
        if outputsUnchanged then           ▷ if the circuit is still working
            redundantVcc ← numberRedundant(X)
            if redundantInX ≥ redundantVcc then
                markAsFake(Y)
            end if
        end if
        g ← Gnd                            ▷ disconnect Y by substituting g with Gnd
        if outputsUnchanged then
            redundantGnd ← numberRedundant(X)
            if redundantInX ≥ redundantGnd then
                markAsFake(Y)
            end if
        end if
        restoreCircuit                     ▷ change the circuit back to the original
    end for
end procedure
the current behaviour of the individual is measured. The measured behaviour is then used when calculating R_trad_single instead of the desired target behaviour, resulting in a score for R_trad_single even when 100% functionality is not reached.

4.3 Experimental Setup
All experiments are conducted on simulations of circuits in a digital feed-forward circuit simulator. Only Boolean logic is allowed and the following gates are available: AND, OR, NAND, NOR, NOT. Cartesian genetic programming [11] is applied with the following GA parameters:

– Maximum number of gates: 100
– Population size: 20
– Tournament selection with elitism (g = 3, p = 0.7)
– Crossover rate: 0.2
– Mutation rate: 0.05 (mutation applied at the gate level)
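The parameter list can be read as the skeleton of a generational loop. The sketch below is our own minimal interpretation, not the paper's setup: a flat gate-list genome stands in for the full CGP grid, elitism is omitted, and we read g and p as the tournament size and the probability that the tournament winner is actually taken.

```python
# Illustrative gate-level mutation and tournament selection with the listed
# parameters. The genome encoding (a list of (op, src_a, src_b) triples over
# 2 primary inputs, identified by negative ids) is our own assumption.
import random

OPS = ('AND', 'OR', 'NAND', 'NOR', 'NOT')
N_INPUTS, MAX_GATES = 2, 100
MUTATION_RATE, TOURNAMENT_P, TOURNAMENT_G = 0.05, 0.7, 3

def random_gate(position):
    """A gate may use the primary inputs (negative ids) or any earlier gate,
    which keeps the circuit feed-forward."""
    sources = list(range(-N_INPUTS, position))
    return (random.choice(OPS), random.choice(sources), random.choice(sources))

def random_genome():
    return [random_gate(i) for i in range(MAX_GATES)]

def mutate(genome):
    """Gate-level mutation: each gate is replaced with probability 0.05."""
    return [random_gate(i) if random.random() < MUTATION_RATE else g
            for i, g in enumerate(genome)]

def tournament(population, fitness):
    """Size-3 tournament: the best of 3 random picks wins with p = 0.7."""
    group = sorted(random.sample(population, TOURNAMENT_G),
                   key=fitness, reverse=True)
    return group[0] if random.random() < TOURNAMENT_P else random.choice(group[1:])

pop = [random_genome() for _ in range(20)]
child = mutate(tournament(pop, fitness=lambda g: 0.0))  # dummy fitness here
```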
The experiments in this paper use the single fault model. The algorithm explained in section 4.1 classifies redundant gates as either useful or fake, and only useful redundant gates are included when R_trad_single is calculated. R_trad_single is calculated based on the current measured behaviour and not the target behaviour.
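The classification of Algorithm 1 can also be sketched as runnable code. The netlist encoding, the helper functions and the toy example circuit below are all our own illustration (the paper applies the algorithm to CGP-evolved circuits); the comparison redundantInX ≥ redundantAfter follows the algorithm as printed.

```python
# Sketch of FindFake (Algorithm 1) on a toy netlist. Gates are named entries
# ('OP', input_a, input_b); primary inputs are 'i0', 'i1', ...
from itertools import product

OPS = {'AND': lambda a, b: a & b, 'OR': lambda a, b: a | b,
       'NAND': lambda a, b: 1 - (a & b), 'NOR': lambda a, b: 1 - (a | b),
       'NOT': lambda a, b: 1 - a}

def outputs(net, out, n_in, forced={}):
    """Truth table of gate `out`; `forced` maps gate names to stuck values."""
    def val(name, ins):
        if name in forced:
            return forced[name]
        if name.startswith('i'):
            return ins[int(name[1:])]
        op, a, b = net[name]
        return OPS[op](val(a, ins), val(b, ins) if b else 0)
    return tuple(val(out, ins) for ins in product((0, 1), repeat=n_in))

def redundant(net, out, n_in, forced={}):
    """Gates that can be stuck at 0 and at 1 without changing the outputs."""
    ref = outputs(net, out, n_in, forced)
    return {g for g in net if g not in forced and all(
        outputs(net, out, n_in, {**forced, g: v}) == ref for v in (0, 1))}

def part_y(net, out, g):
    """The set Y: gates whose every path to the output runs through g."""
    fanout = {n: set() for n in net}
    for n, (_, a, b) in net.items():
        for src in (a, b):
            if src in net:
                fanout[src].add(n)
    def reaches(node, blocked):
        seen, stack = set(), [node]
        while stack:
            n = stack.pop()
            if n == blocked or n in seen:
                continue
            if n == out:
                return True
            seen.add(n)
            stack.extend(fanout[n])
        return False
    return {n for n in net
            if n != g and reaches(n, None) and not reaches(n, g)}

def find_fake(net, out, n_in):
    """Mark as fake every part Y that can be cut off (g <- Vcc or Gnd)
    without damaging the outputs or reducing the redundancy left in X."""
    fake, ref = set(), outputs(net, out, n_in)
    for g in net:
        y = part_y(net, out, g)
        x = set(net) - y - {g}
        red_in_x = len(redundant(net, out, n_in) & x)
        for v in (1, 0):  # g <- Vcc, then g <- Gnd
            if outputs(net, out, n_in, {g: v}) != ref:
                continue  # outputs damaged: this substitution is not allowed
            if red_in_x >= len(redundant(net, out, n_in, {g: v}) & x):
                fake |= y  # Y contributed nothing: fake redundancy
    return fake

# Fake redundancy in the style of figure 1(a): one = OR(r, NOT r) is constant
# 1, so out = AND(f, one) never depends on r.
net = {'f': ('AND', 'i0', 'i1'), 'r': ('OR', 'i0', 'i1'),
       'nr': ('NOT', 'r', None), 'one': ('OR', 'r', 'nr'),
       'out': ('AND', 'f', 'one')}
fake = find_fake(net, 'out', 2)  # {'r', 'nr'}
```

On this toy circuit, forcing the constant-1 gate to Vcc leaves the outputs and the redundancy in X intact, so its exclusive fan-in cone (r and its inverter) is correctly marked fake.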
Evolving function and redundancy at the same time. For experiments evolving functionality and redundancy at the same time, the following fitness function is used:

f_1 = 0.7 · f_ham + 0.3 · R_trad_single    (1)

Three sets of experiments are performed using the fitness function in equation (1), differing in target functionality: two-input AND, two-input OR and two-input XOR.

Evolving function first, then redundancy. If 100% functionality is required before evolving redundancy, the following fitness function is used:

f_2 = 0.7 · f_ham + 0.3 · { 0 if f_ham < 1.0 ; R_trad_single if f_ham = 1.0 }    (2)

The fitness function in (2) is used when evolving CIR4, a four-input, one-output function with the truth table "1001011101100110" (bit zero to the right).

Evolving with unspecified function. If the target functionality is not specified but instead evolved together with the circuits, the following fitness function is used:

f_3 = R_trad_single    (3)
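Written out as code, equations (1)-(3) are direct one-liners; f_ham and r_trad_single below stand for already-computed measures in [0, 1], and the function names are ours:

```python
# The three fitness functions of the experiments, equations (1)-(3).

def fitness_f1(f_ham, r_trad_single):
    """Equation (1): evolve function and redundancy at the same time."""
    return 0.7 * f_ham + 0.3 * r_trad_single

def fitness_f2(f_ham, r_trad_single):
    """Equation (2): reliability only counts once functionality is 100%."""
    return 0.7 * f_ham + 0.3 * (r_trad_single if f_ham == 1.0 else 0.0)

def fitness_f3(r_trad_single):
    """Equation (3): no target function; reliability is the entire fitness."""
    return r_trad_single

a = fitness_f1(1.0, 0.5)   # 0.85
b = fitness_f2(0.9, 1.0)   # 0.63: reliability ignored below 100% functionality
c = fitness_f3(0.4)        # 0.4
```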
5 Results and Discussion
The results and their discussion are separated into three subsections, based on the complexity and type of target behaviour.

5.1 Simple Functionality
The chosen functionality for the simple experiments is a two-input Boolean function that can be implemented with a single-gate circuit. Both AND2 and OR2 have been tried. The reason for evolving these very simple functions is to see what redundancy structures emerge when the function requires little effort to evolve. Table 1 shows the best individuals after running ten independent experiments for both AND2 and OR2. The best results from these experiments all share the same basic idea behind the introduced redundancy: a voter structure similar to figure 3(a) is introduced just before the output of the circuit. Four independent circuit modules, all performing the desired function, are connected to this voter. If three of the four modules work correctly, the voter outputs the correct value. This voter structure is created by the evolutionary algorithm to solve the problem; nothing in the experimental setup predefines a voter as the preferred result. This design may be compared to the best-known traditional fault tolerance method, Triple Modular Redundancy (TMR), which has three modules and a majority voter. It is interesting to see that evolution in fact finds a voter as the
Table 1. Results, simple functionality. "Type" indicates redundancy type: voter or something else. "Red." is the number of redundant gates. "Non-red." is the number of non-redundant gates.

(a) AND2 (runs 0 to 9): Red.: 23, 32, 33, 37, 50, 39, 40, 38, 35, 23; Non-red.: 3, 3, 5, 7, 4, 3, 3, 3, 5, 7; five of the ten runs are of type Voter.

(b) OR2 (runs 0 to 9): Red.: 18, 21, 38, 17, 28, 23, 33, 29, 33, 29; Non-red.: 7, 3, 3, 3, 5, 5, 6, 4, 6, 5; five of the ten runs are of type Voter.
best solution. Of all the possible solutions that evolution could have found, it chose something close to the traditional one. The evolved voter is smaller than the TMR voter (three gates as opposed to four), but needs more working modules. This is no disadvantage when simulating with the single fault model; in fact, a three-gate, four-input voter is the best solution in this case. In the more realistic gate reliability model, TMR is better, as it requires fewer gates in total and therefore has fewer gates that may fail. It is also clear from table 1 that once evolution has managed to create redundancy, the redundant subcircuits are expanded. This can be explained by the use of the single fault model. It is favourable for fitness to have as many redundant gates as possible, because R_trad_single equals the number of redundant gates divided by the total number of gates. Analysis of Evolved Voter. Why is the voter structure in figure 3(a) successful at hiding single defects in the modules connected to the voter's inputs? The voter can be explained by a don't care (DC) analysis of the circuit. If one input to an AND gate is zero, the other input is DC because, no matter what it is, the output of the AND gate is zero. Likewise, if one input to an OR gate is one, the other input is DC. In addition, an input DC is in most cases propagated to the subcircuit connected to this input, meaning that all gates in the subcircuit have a DC for this specific case. This is not true for all possible circuits, but it is true for the voter in figure 3(a). These simple rules can now be used to explain the voter. All four modules connected to the voter should perform the same function, so every wire in figure 3(a) carries the same value. The purpose of the voter is to make sure that any single fault in any of the modules is tolerated. The voter should therefore be designed such that if any three of the four inputs to the voter are correct, the fourth input is DC.
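Before walking through the two cases in detail, the masking property itself can be confirmed exhaustively. Reading the three-gate voter off figure 3(a) as AND(OR(a, b), OR(c, d)), with a, b, c, d the four module outputs (variable names are ours):

```python
# Exhaustive check of the evolved voter from figure 3(a): out = (a|b) & (c|d).
# We verify that whenever any three of the four modules produce the correct
# bit and one module produces the wrong bit, the voter output is correct.

def voter(a, b, c, d):
    return (a | b) & (c | d)

masks_single_fault = all(
    voter(*[correct if m != faulty else 1 - correct for m in range(4)]) == correct
    for correct in (0, 1)
    for faulty in range(4)
)
```

With two faulty modules the property can fail: for example, when the correct value is 0 and one module on each OR gate wrongly outputs 1, the voter outputs 1. This matches the single fault model under which the structure evolved.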
To see if the voter fulfils this requirement, one should separately examine the two possible
Fig. 3. Evolved voter: (a) structure, (b) zero-case, (c) one-case
cases of voter operation: when the voter output should be zero, and when it should be one.

Zero-case: This case is illustrated in figure 3(b). When the result of the voter should be zero, only one input to the AND gate of the voter needs to be zero. This means the other input, and both modules that are indirectly connected to it, are DC.

One-case: This case is illustrated in figure 3(c). When the result of the voter should be one, both inputs to the AND gate must be one. This case must therefore be handled by the OR gates. For each OR gate to output one, only one of its inputs needs to be one. This means the other input, and the module connected to it, are DC.

These two cases show that the voter outputs the correct value even when one of the four modules connected to it fails. Note the symmetry in figures 3(b) and 3(c). For example, in figure 3(b) it is just as correct to mark the lower two modules as having a DC output and the upper two modules as having output 0. It can now be seen that if a single module is faulty, the output will still be correct as long as the three other modules work correctly.

5.2 Complex Functionality
If the functionality of the circuit is more complex, it becomes harder to evolve a functional circuit. How does this affect the redundancy structures that are evolved? XOR2 is a step up in functionality. XOR is not among the gates available to evolution and requires a minimum three-gate implementation. In the XOR case, evolution has a much harder time finding a solution as efficient as the voter in figure 3(a). Table 2(a) shows the best individuals after running ten independent experiments for XOR2. The same kind of voter was observed in one of the evolved XOR circuits, but in most cases functionality and the structures used for introducing redundancy were intricately mixed together. An example of this is given later in this section. The most complex functionality evolved in this paper is the four-input CIR4 circuit, which requires a nine-gate minimum implementation. To ensure 100% functionality, it was necessary to apply the fitness function in equation (2), which forces evolution of functionality first and then redundancy. Table 2(b) shows the best
Table 2. Results, complex functionality. Same layout as in table 1.

(a) XOR2
#  Type   Red.  Non-red.
0         22    6
1         38    8
2         19    10
3         22    6
4  (not working)
5         32    7
6  Voter  43    3
7         19    10
8         42    11
9         36    7

(b) CIR4
#  Type   Red.  Non-red.
0         17    13
1         24    21
2         24    15
3         21    16
4         11    14
5         24    15
6         25    18
7         15    17
8         18    13
9         32    16

Fig. 4. Redundant XOR2 without voter. IN0 and IN1 are the main circuit inputs
individuals after running ten independent experiments for CIR4. In this case no voters evolved and, as in most of the XOR2 evolutionary runs, functionality and redundancy are mixed together. As can be seen from the number of non-redundant gates in the evolved circuits, the introduced redundancy is not very efficient. Although not as efficient as the voter solutions, these solutions are still interesting. The purpose of this work is not to evolve the voter but to find new ways of introducing redundancy into a circuit, and the solutions in table 2 do represent new redundancy solutions. The inefficiency might come from the fact that the fitness function forces 100% functionality before redundancy. The evolutionary runs were also stopped after a certain amount of time; more efficient redundancy might have resulted if the experiments had been allowed to run longer. Example of Non-Voter Based Redundant Circuit. What does a non-voter based redundant circuit look like? An example of such a circuit is XOR circuit number nine in table 2(a), illustrated in figure 4. The four rounded boxes are subcircuits with the truth table written inside the box (bit zero to the
Table 3. Results, evolving function together with redundancy. Same layout as in table 1, with the addition of the column "Function", which is the evolved functionality. IN0 and IN1 are the circuit inputs.

#  Function  Red.  Non-red.
0  IN0       28    3
1  ¬IN0      39    5
2  AND       23    4
3  ¬IN1      32    3
4  IN1       59    3
5  IN0       28    3
6  IN0       49    5
7  IN1       40    3
8  ¬IN1      17    3
9  IN1       31    3

Seven of the ten runs are of type Voter.
right). All gates in region A (to the left of the dotted line) are redundant, while all gates in region B are non-redundant. The redundant gates in figure 4 are useful redundant: they do have an impact on the circuit output. The XOR functionality is, however, not produced exclusively in the redundant part of the circuit. None of the rounded boxes in the redundant part of the circuit represents XOR. Instead, XOR is formed by a combination of the redundant and non-redundant gates. An analysis similar to the DC analysis for the voter in section 5.1 can be used to understand why the gates in region A are redundant.

5.3 No Specified Functionality
From the previous experiments in this paper it is clear that the functionality affects how redundancy is achieved and how effective this redundancy is. As the complexity of the functionality increases, more focus is placed on getting a circuit working and it becomes harder to find an efficient way of creating a redundant version of the circuit. The evolved redundancy structures, not a specific functionality, are the goal of this paper. A set of experiments is therefore performed that does not explicitly state what function the evolved circuits should perform. The only requirement is that the circuit must have two inputs and one output. Evolution is thus free to create any function and focus all efforts on creating circuits with redundancy. This is accomplished by using the fitness function in equation (3). As R_trad_single is the only factor in this fitness function, and because R_trad_single is based on the current measured functionality of an individual, the target functionality of the circuits is evolved together with the redundant circuits themselves. It is likely that the resulting function is something that can easily be made redundant in an efficient way. This is backed up by the results. Table 3 shows the best individuals after running ten independent experiments where the target functionality is not specified. The evolved functions are very simple (typically cloning an input or being
the equivalent of a single Boolean gate) and most individuals use a voter similar to figure 3(a).
6 Conclusion and Further Work
This paper has presented an experimental setup that successfully uses artificial evolution to create digital circuits with useful redundancy. The purpose of this experimental setup is to find new ways of building redundant circuits. The results show that although there is no explicit guiding towards creating a voter structure, evolution does in some cases create a voter resembling the voter used in traditionally designed reliable circuits. This is typically the result when evolving circuits with simple functionality. The voter is a known way of making redundant structures, and while it is interesting that evolution creates voter-like structures, the main goal is to find new ways of introducing redundancy. When evolving more complex functions, the result is non-voter based redundancy. Although not as efficient as a voter-based solution, these results are interesting examples of how to achieve redundancy without the traditional voter. Planned further work includes experiments where evolution is allowed to leave the strict Boolean logic domain and exploit the analogue properties of CMOS technology.
References

1. ITRS: International Technology Roadmap for Semiconductors. Technical report, ITRS (2005)
2. Xilinx: Xilinx Virtex 5 overview. http://www.xilinx.com/virtex5
3. Lala, P.K.: Self-Checking and Fault Tolerant Digital Design. Morgan Kaufmann Publishers (2001)
4. Eiben, A.E., Smith, J.E.: Introduction to Evolutionary Computing. Springer (2003)
5. Higuchi, T., Niwa, T., Tanaka, T., Iba, H., de Garis, H., Furuya, T.: Evolving hardware with genetic learning: a first step towards building a Darwin machine. In: Proc. Int. Conf. From Animals to Animats: Simulation of Adaptive Behavior, pp. 417–424 (1993)
6. Hemmi, H., Mizoguchi, J., Shimohara, K.: Development and evolution of hardware behaviors. In: Artificial Life IV: Proc. 4th Int. Workshop on the Synthesis and Simulation of Living Systems, pp. 371–376. MIT Press (1994)
7. Miller, J.F.: Evolving a self-repairing, self-regulating, French flag organism. In: Genetic and Evolutionary Computation (GECCO), pp. 129–139 (2004)
8. Hartmann, M., Haddow, P.C.: Evolution of fault-tolerant and noise-robust digital designs. IEE Proc. - Computers and Digital Techniques 151(4), 287–294 (July 2004)
9. Haddow, P.C., Hartmann, M., Djupdal, A.: Addressing the metric challenge: evolved versus traditional fault tolerant circuits. In: Adaptive Hardware and Systems (2007)
10. Djupdal, A., Haddow, P.C.: Evolving redundant structures for reliable circuits – lessons learned. In: Adaptive Hardware and Systems (2007)
11. Miller, J.F., Job, D., Vassilev, V.K.: Principles in the evolutionary design of digital circuits – Part I. Journal of Genetic Programming and Evolvable Machines 1(1), 8–35 (2000)
Adaptive Transmission Technique in Underwater Acoustic Wireless Communication

Guoqing Zhou and Taebo Shim

Underwater Acoustic Communication Institute, Soongsil University, 511 Sangdo-dong, Dongjak-gu, Seoul 156-743, Korea
[email protected],
[email protected]
Abstract. The underwater acoustic channel (UACh) requires robust techniques to achieve high-speed data transmission for reliable communication. If the channel can be estimated and this estimate sent back to the transmitter, the transmission scheme can be adapted to the channel variation. In this paper, we assume a fixed source and receiver configuration over a slowly varying UACh, where the instantaneous signal-to-noise ratio (SNR) is constant over a large number of transmissions and then changes to a new value based on the Rayleigh fading distribution. A theoretical derivation of the channel capacity (ChC) shows that we can optimize the data rate by allowing the transmit power to vary with the SNR, subject to an average power constraint. A simulation study also shows that, using this adaptive technique, we can adapt to the channel variation in UACh communication. We also find that the variability of the UACh capacity due to the sloping condition seems not to be negligible.

Keywords: Underwater Acoustic Channel Capacity, Adaptive Transmission Technique, Rayleigh Fading, Sloping Condition.
1 Introduction

The UACh is a time-varying block fading channel: it is constant over some transmissions, after which the UACh changes to a new independent state. It is critical to capture the distribution of the channel gain when designing a UACh communication system. In past work, determining the ChC usually assumes a linear communication channel with a Gaussian source and a perfectly known channel impulse response function [1], or assumes a time-invariant UACh model [2]. In this paper, we develop practical variable-power and variable-data-rate techniques for the Rayleigh fading UACh. We will see that the Rayleigh fading channel capacity is achieved when the transmitter adapts its power and data rate to the channel variation. The optimal power allocation is a water-filling in time, where power and data rate are increased when channel conditions are favorable and decreased when they are not. The outline of this paper is as follows. Section 2 provides a theoretical derivation of the Rayleigh fading UACh capacity and an adaptive technique for optimizing the channel capacity. Section 3 gives a theoretical experiment on optimizing the ChC for the Rayleigh

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 268–276, 2007. © Springer-Verlag Berlin Heidelberg 2007
fading UACh at low frequency and at high frequency. Section 4 presents the variability of the UACh capacity in the sloping condition. The summary and conclusion are in Section 5.
2 Theoretical Derivation of Channel Capacity

For a Rayleigh fading UACh, we consider the system model shown in Fig. 1.

Fig. 1. UACh System Model
From Fig. 1, we assume the UACh system model is composed of three parts: transmitter, channel and receiver. The transmitter consists of an encoder and the power control S(i), and the receiver consists of a decoder and a channel estimator. The channel is assumed to be a Rayleigh fading channel, where x[i] is the channel input at time i, y[i] is the corresponding channel output, and n[i] is a random UACh noise process. The channel power gain g[i] is assumed to follow a Rayleigh fading probability density function (pdf) at time i; it is constant over some block length T, after which g[i] changes to a new independent value drawn from the pdf. The Rayleigh fading pdf of the SNR is given by an exponential distribution:

p_γ(γ) = (1/γ̄) e^(−γ/γ̄)    (1)

where γ̄ is the average SNR and γ is the instantaneous SNR. We also assume that the channel power gain g[i] can be estimated at the receiver and sent back to the transmitter, so we can use the power control scheme S(i) to adapt to the channel variation: power and data rate are increased when channel conditions are favorable and decreased when they are not. Let S̄ denote the average transmit signal power and B the received signal bandwidth. The capacity of an additive white Gaussian noise (AWGN) channel with received SNR γ is
C = B log₂(1 + γ)    (2)
The ChC with the Rayleigh fading pdf can then be obtained as the maximum of the integral

C = max_{S(γ)} ∫₀^∞ B log₂(1 + γ S(γ)/S̄) p(γ) dγ    (3)
subject to the source power constraint

∫₀^∞ S(γ) p(γ) dγ = S̄    (4)
To find the optimal power allocation [7], we form the Lagrangian

J(S(γ)) = ∫₀^∞ B log₂(1 + γ S(γ)/S̄) p(γ) dγ − λ ∫₀^∞ S(γ) p(γ) dγ    (5)
Next we differentiate the Lagrangian and set the derivative equal to zero:

∂J(S(γ))/∂S(γ) = [ (B/ln 2) · (γ/S̄) / (1 + γ S(γ)/S̄) − λ ] p(γ) = 0    (6)
Solving for S(γ) with the constraint that S(γ) > 0 yields the optimal power adaptation

S(γ)/S̄ = 1/γ₀ − 1/γ,   γ ≥ γ₀
S(γ)/S̄ = 0,            γ < γ₀    (7)
where γ₀ is the cut-off value of the SNR. Substituting equation (7) into equation (4), the cut-off value of the SNR must satisfy

∫_{γ₀}^∞ (1/γ₀ − 1/γ) p(γ) dγ = 1    (8)

Since γ is time varying, the maximising power adaptation policy is a water-filling formula in time, as illustrated in Fig. 2.
Fig. 2. Optimal Power Allocation: Water-filling
This curve shows how much power is allocated to the channel for a given instantaneous SNR. When channel conditions are good, more power and a higher data rate are sent over the channel. As channel quality degrades, less power and a lower rate are sent over the channel. If the instantaneous channel SNR falls below the cut-off value, the channel is not used.
Substituting equation (7) into equation (3) then yields the maximum ChC of the Rayleigh fading UACh:

C_max = ∫_{γ₀}^∞ B log₂(γ/γ₀) p(γ) dγ    (9)

So we can use the optimal power allocation technique to adapt to the channel variation in UACh communication.
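Equation (8) is transcendental in γ₀ for the pdf of equation (1), so the cut-off and the capacity of equation (9) have to be obtained numerically. The sketch below uses plain trapezoidal quadrature and bisection; the truncation point, bandwidth and average SNR are our own illustration choices, not values from the paper:

```python
# Numerical sketch of equations (7)-(9) for the Rayleigh fading pdf of
# equation (1). Plain trapezoidal quadrature and bisection; the bandwidth
# and average SNR below are arbitrary illustration values.
import math

def pdf(g, g_avg):
    """Equation (1): exponential pdf of the instantaneous SNR, mean g_avg."""
    return math.exp(-g / g_avg) / g_avg

def integrate(f, a, b, n=4000):
    """Composite trapezoidal rule on [a, b]."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        s += f(a + i * h)
    return s * h

def cutoff_snr(g_avg):
    """Solve equation (8) for the cut-off SNR gamma_0 by bisection."""
    upper = 40.0 * g_avg            # truncation of the infinite upper limit
    def excess(g0):
        return integrate(lambda g: (1.0 / g0 - 1.0 / g) * pdf(g, g_avg),
                         g0, upper) - 1.0
    lo, hi = 1e-6, g_avg            # excess() decreases with gamma_0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if excess(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

def max_capacity(bandwidth, g_avg):
    """Equation (9): capacity with the optimal power adaptation of (7)."""
    g0 = cutoff_snr(g_avg)
    c = bandwidth * integrate(lambda g: math.log2(g / g0) * pdf(g, g_avg),
                              g0, 40.0 * g_avg)
    return c, g0

C, g0 = max_capacity(bandwidth=100.0, g_avg=10.0)  # B = 100 Hz, mean SNR 10
```

As in Fig. 2, the resulting policy transmits nothing below the computed cut-off and spends more power and rate as the instantaneous SNR improves.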
3 Theoretical Experiment of UACh Capacity

3.1 Theoretical Experiment of UACh Capacity at Low Frequency

Data for the simulation were provided by KODC (Korea Oceanographic Data Center). Fig. 3 shows the sound speed profile for line number 104 in the East Sea of Korea, measured in February 2004.
Fig. 3. Sound Speed Profile
Based on the theoretical derivation of the UACh capacity in section 2, for a low-frequency simulation we assume a fixed source and receiver configuration, where the source depth is 10 m, the receiver depth is 30 m, and the range between source and receiver is 10 km. The source center frequency is assumed to be 300 Hz with a bandwidth of 100 Hz. In this condition, we can use the KRAKEN model [12] to obtain a transmission loss (TL) of 88.7 dB. The source average power is assumed to be 193 dB//uPa²/Hz. The ambient noise level is approximately 67 dB at sea state 3, after R.J. Urick [8]. For an array of 50 sensors, the DI is 17 dB [10], so we can obtain the average SNR at the receiver in dB:

S/N = SL − TL − (NL − DI)    (10)
where NL is the ambient noise level and DI is the directivity index of the receiver array. All the assumptions and calculation data are shown in Table 1.
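Equation (10) is plain dB arithmetic; the following reproduces the two average SNR values of Table 1 from the stated SL, TL, NL and DI figures:

```python
# Equation (10) as arithmetic, reproducing the average SNR values of Table 1.

def average_snr_db(SL, TL, NL, DI):
    """S/N = SL - TL - (NL - DI), all quantities in dB."""
    return SL - TL - (NL - DI)

snr_low = average_snr_db(SL=193.0, TL=88.7, NL=67.0, DI=17.0)     # 54.3 dB
snr_high = average_snr_db(SL=193.0, TL=60.067, NL=42.0, DI=17.0)  # 107.933 dB
```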
Using equation (1) we can obtain the Rayleigh fading pdf, and using equation (8) we can obtain the cut-off value of the SNR. When the instantaneous channel SNR is estimated at the receiver and sent back to the transmitter, we can let the power control S(i) at the transmitter vary with the instantaneous SNR following equation (7) to adapt to the channel variation, as shown in Fig. 4(a), with the corresponding data rates shown in Fig. 4(b). When the estimated instantaneous SNR gets bigger, more power is applied and a higher data rate is sent over the channel. As the estimated instantaneous SNR gets smaller, less power is applied and a lower data rate is sent over the channel. If the estimated instantaneous SNR is below the cut-off value of 0.9233, no data is transmitted for a period.

Table 1. Assumptions and calculation data
No.  f       B       SL (dB)  TL (dB)  NL (dB)  DI (dB)  Average SNR (dB)  γ₀      C_max (kbps)
1    300 Hz  100 Hz  193      88.7     67       17       54.3              0.9233  0.570
2    30 kHz  3 kHz   193      60.067   42       17       107.93            0.9544  18.146

Fig. 4. Adaptive Method at Low Frequency: (a) variable power, (b) variable data rate
The simulation results show that we can use the optimal power allocation adaptive method to adapt to the UACh variation in low-frequency transmission.

3.2 Theoretical Experiment of UACh Capacity at High Frequency

In this section, we simulate the UACh capacity at high frequency, assuming a fixed source and receiver configuration, where the source depth is 10 m, the receiver depth is
30 m, and the source-receiver range is 1 km. The source center frequency is assumed to be 30 kHz with a bandwidth of 3 kHz. In such a high-frequency situation, we can use ray theory [11] to obtain the TL, where the TL at range r is approximated by

TL = 10 log( r·Δh / Δθ )    (11)

where Δθ is the initial angular separation of the two rays and Δh is the vertical separation of the rays. The average source power is assumed to be 193 dB//uPa^2/Hz. The ambient noise level is approximately 42 dB at sea state 3 [8]. For an array of 50 sensors, the DI is 17 dB [10], so we can obtain the average SNR using equation (10). All the assumptions and calculated data are also shown in Table 1. We can use the same method to obtain the Rayleigh fading distribution and use the variable-power, variable-rate technique to adapt to the UACh variation at high frequency, as shown in Fig. 5. When channel conditions are favorable, power and data rate are increased; when channel conditions are unfavorable, power and data rate are decreased. If the channel is too poor, i.e., the SNR is below the cutoff value 0.9544, the channel is not used for that period.
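Equation (11) is straightforward to evaluate once the ray-trace quantities are known; the numbers below are purely illustrative, since Δh and Δθ come from the ray model [11] and are not given in this excerpt.

```python
import math

def tl_ray_db(r_m, dh_m, dtheta_rad):
    """Transmission loss from equation (11): TL = 10 * log10(r * dh / dtheta)."""
    return 10.0 * math.log10(r_m * dh_m / dtheta_rad)

# Illustrative ray quantities only (not from the paper):
print(tl_ray_db(1000.0, 0.1, 0.001))  # 10*log10(1e5) = 50.0 dB
```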
[Figure omitted: (a) Variable Power — source power (dB) versus SNR (dB); (b) Variable Data Rate — data rate (Kbps) versus SNR (dB)]
Fig. 5. Adaptive Method At High Frequency
The simulation results also show that we can use the optimal power allocation adaptive method to adapt to the UACh variation for high-frequency transmission.

3.3 Summary of Section 3

The theoretical experiment results show that we can use an optimal power allocation adaptive method to adapt to UACh variation, where power and data rate are increased when channel conditions are favorable and decreased when they are not. If the estimated instantaneous SNR is below the cutoff value, the channel is not used for that period.
4 Variability of UACh Capacity in the Sloping Condition

4.1 UACh Capacity in the Sloping Condition at Low Frequency

We use the same data and sound-speed file as in Section 3. In the sloping condition, shown in Fig. 6, the slope begins at a depth of 1000 m, the horizontal range of the slope is 10 km, and the slope angle varies from one to five degrees. In the simulation, we again assume that the source is at a depth of 10 m, the receiver is at a depth of 30 m, the source level (SL) is 193 dB/uPa^2, and the center frequency is 300 Hz with a bandwidth of 100 Hz.
Fig. 6. Sloping Condition
Given these assumptions, we can also use the KRAKEN model [12] to obtain the TL in the sloping condition. The ambient NL is approximately 67 dB at sea state 3, following R.J. Urick [8]. For an array of 50 sensors, the DI is 17 dB [10], as assumed in Section 3. We can therefore use the same method as in the low-frequency case of Section 3 to obtain the ChC, as shown in Fig. 7.
[Figure omitted: channel capacity (Kbps) versus range (m)]

Fig. 7. Channel Capacity at low frequency
From Fig. 7, the results show that the ChC in the low-frequency case is strongly influenced by the change of the slope angle.

4.2 UACh Capacity in the Sloping Condition at High Frequency

We use the same sloping condition at high frequency, except that the center frequency is 30 kHz with a bandwidth of 3 kHz; the slope angle is considered at both one and five degrees. In such a high-frequency case, we can also use ray theory [11] to obtain the TL at ranges of 200 m, 400 m, 600 m, 800 m, and 1000 m, respectively. We can then use the same method as in the high-frequency case of Section 3 to obtain the ChC, as shown in Fig. 8.
[Figure omitted: data rate (Kbps) versus range (m)]
Fig. 8. Channel Capacity at high frequency
From Fig. 8, the results show that the ChC in the high-frequency case appears to be unaffected by the slope condition.

4.3 Summary of Section 4

The simulation of ChC in the sloping condition shows that the ChC increases with the slope angle at low frequency, presumably due to reflection of acoustic energy in the sloping condition. Simulation at high frequency over short ranges shows that the slope has no effect on the ChC.
5 Summary

Under the assumptions of the source and receiver configuration and a Rayleigh fading UACh function, the theoretical derivation shows that we can optimize the data rate by using an optimal power allocation adaptive technique. With constrained bandwidth, the theoretical experiment results show that we can use the adaptive transmission technique to adapt to channel variation in UACh communication. Simulation of the ChC in the sloping condition shows that the variability of the UACh capacity caused by the sloping condition is not negligible.
276
G. Zhou and T. Shim
References

1. Shannon, C.E.: A Mathematical Theory of Communication. The Bell System Technical Journal (1948)
2. Hummels, D.R.: The Capacity of a Model for Underwater Acoustic Channel. IEEE Transactions on Sonics and Ultrasonics SU-19(3) (1972)
3. Hayward, T.J., Yang, T.C.: Underwater Acoustic Communication Channel Capacity: A Simulation Study. Naval Research Laboratory, Washington, DC 20375
4. Sklar, B.: Rayleigh Fading Channels in Mobile Digital Communication Systems. Communications Engineering Services. IEEE Communications Magazine (1997)
5. Junli, A.B., Zhao, Y.Q.: Rayleigh Flat Fading Channel's Capacity. School of Mathematics and Statistics, Carleton University (2005)
6. Zhou, G., Cho, J., Shim, T.: Underwater Acoustic Communication Channel Capacity in the Sloping Condition. In: KASI-ASK Joint Conference on Acoustics 2007 (January 28, 2007)
7. Goldsmith, A.J.: Wireless Communication. Stanford University (2004)
8. Urick, R.J.: Principles of Underwater Sound (1983)
9. Burdic, W.S.: Underwater Acoustic System Analysis (1991)
10. Nielsen, R.O.: Sonar Signal Processing. Interstate Electronics Corporation, Anaheim, California (1991)
11. Bowlin, J.B., Spiesberger, J.L., Duda, T.F.: Ocean Acoustic Ray-Tracing Software RAY (1992)
12. Porter, M.B.: The KRAKEN Normal Mode Program. SACLANT Undersea Research Centre (May 17, 2001)
Autonomous Robot Path Planning Based on Swarm Intelligence and Stream Functions Chengyu Hu1,2, Xiangning Wu2, Qingzhong Liang2, and Yongji Wang1 1 Department of Control Science and Engineering, Huazhong University of Science & Technology, Wuhan, China 430074
[email protected],
[email protected] 2 School of Computer, China University of Geosciences, Wuhan, China 430074
[email protected],
[email protected]
Abstract. This paper addresses a new approach to navigate mobile robot in static or dynamic surroundings based on particle swarm optimization (PSO) and stream functions (or potential flows). Stream functions, which are introduced from hydrodynamics, are employed to guide the autonomous robot to evade the obstacles. PSO is applied to generate each optimal step from initial position to the goal location; furthermore, it can solve the stagnation point problem that exists in potential flows. The simulation results demonstrate that the approach is flexible and effective.
1 Introduction

In the last few decades we have witnessed rapidly increasing interest in mobile robot navigation and path planning, as it has many applications such as assembly, manufacturing, transportation, and services. The definition of robot path planning given by most researchers is typically formulated as follows: given a robot and a description of an environment, plan a path from an initial location to the goal location that is collision-free and satisfies optimization criteria [1]. The path-planning problem can be divided into two sub-problems: one is to establish the surroundings model and generate a path by some traditional approach; the other is to avoid obstacles in the surroundings. There are many traditional approaches to establishing the surroundings model. According to the knowledge of the environment, the methods can be classified as local methods and global methods. The visibility graph [4], the free-space approach [5], and grids [6] are regarded as global methods; the artificial potential field [8], neural network algorithms [2], and fuzzy logic algorithms [19] are regarded as local methods. Among these approaches, the artificial potential field (APF) is widely used to solve the path-planning problem, as it is simple and easily implemented. However, it has drawbacks such as local minima and failure to reach the goal when an obstacle is nearby, and many researchers have proposed improved APF methods [9]-[11]. S. Waydo and R. M. Murray [14][16] borrowed a concept from hydrodynamic analysis and presented the potential-flow algorithm in 2003. This algorithm has been proved effective in generating a smoother path (i.e., one more suited to aircraft-like vehicles) than other methods and L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 277–284, 2007. © Springer-Verlag Berlin Heidelberg 2007
278
C. Hu et al.
solves the local minima of the artificial potential field, but it brings with it the stagnation point problem, where the velocity of the fluid is zero [16]. Recently, many heuristic approaches have been employed to solve path-planning problems. To find an optimal path, researchers take natural life such as birds, ants, or other social insects as examples and develop algorithms inspired by their strictly self-organized behaviors. Such algorithms, including ant colony optimization (ACO) and particle swarm optimization (PSO), can be subsumed under the concept of "swarm intelligence". For instance, Zhang R.B. [12] proposed a global path-planning method based on ACO, first using grids to establish the surroundings model and then applying ACO to path optimization; Kang F. [13] combined an improved artificial potential field approach with a genetic algorithm to navigate a mobile robot; Qin Y.Q. [17] used a MAKLINK graph to build the working space of the mobile robot, obtained the shortest path from the start point to the goal point with the Dijkstra algorithm, and finally adopted PSO to find the best path. However, these heuristic algorithms are mainly used to find the best path among existing paths generated under the concept of configuration space [7]. In this work, we propose a new method for mobile robot path planning. In the approach, stream functions are used to push the robot away from obstacles, while PSO is employed to generate each optimal forward step from the initial position to the goal location; all the steps make up the whole path. The stagnation point that exists in potential flows can easily be handled by the introduction of swarm intelligence. The rest of the paper is structured as follows. In Section 2, how to use stream functions to avoid obstacles is reviewed and the drawback of stream functions is briefly described. The PSO algorithm and the proposed approach are introduced and developed in detail in Section 3. In Section 4, simulation results are presented.
Finally, a conclusion is drawn in Section 5.
2 Obstacles Avoidance in Potential Flows

As mentioned above, stream functions can be used to establish the potential area and navigate a mobile robot collision-free. Consider the robot as a fish that follows the streamline passing around a rock. Consider a uniform flow with strength U, and suppose that a single stationary obstacle with radius a is placed at (b_x, b_y); let b = b_x + i·b_y and b̄ = b_x − i·b_y. The complex potential then becomes the following equation [15]:

ω = U·z + U·( a^2/(z − b) + b̄ )    (1)

The complex velocity can then be calculated using the following formula:

ω′(z) ≡ u − i·v    (2)

For simplicity, suppose the circular obstacle is located at the origin (0, 0). We then have the following equation [15]:

ω′(z) = U·[ 1 − (a^2/r^4)·(x^2 − y^2 − i·2xy) ]    (3)
Assume that robot i is currently at (x_1^i, x_2^i). Then

dx_1^i/dt = U·[ 1 − (a^2/(r^i)^4)·((x_1^i)^2 − (x_2^i)^2) ],   dx_2^i/dt = U·(a^2/(r^i)^4)·2·x_1^i·x_2^i    (4)

where

r^i = sqrt( (x_1^i − b_x)^2 + (x_2^i − b_y)^2 )
We can use equation (4) to evade the obstacle: a collision-free path is generated when the streamline passes around the obstacle. Undoubtedly, the technique for a single obstacle can be extended to cases with multiple obstacles. In Fig. 1, the left and right figures show how the robot passes a single obstacle and multiple obstacles using the model above.
Fig. 1. The left figure shows a uniform flow with strength U passing a single stationary obstacle with radius a; the right figure shows a uniform flow with strength U passing multiple stationary obstacles with different radii
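A minimal sketch of this flow model in Python (the paper's own simulations are in Matlab): it evaluates the complex velocity ω′(z) = U·(1 − a^2/(z − b)^2) of equation (3), superposing each obstacle's perturbation — the usual approximation for the multiple-obstacle case.

```python
def flow_velocity(x, y, obstacles, U=1.0):
    """Velocity (u, v) of a uniform flow of strength U past circular obstacles.

    Each obstacle is (bx, by, a). Implements w'(z) = U * (1 - a^2/(z - b)^2),
    the complex velocity of equation (3), superposing the single-obstacle
    perturbations for the multiple-obstacle case.
    """
    z = complex(x, y)
    w = complex(U, 0.0)
    for bx, by, a in obstacles:
        d = z - complex(bx, by)
        w -= U * a * a / (d * d)       # one obstacle's perturbation
    return w.real, -w.imag             # w = u - i*v by convention (equation (2))

# The stagnation points of Fig. 2 sit where the axis streamline meets the circle:
obs = [(0.0, 0.0, 1.0)]
print(flow_velocity(-1.0, 0.0, obs))   # ~ (0.0, 0.0): the robot would stall here
```

Far from the obstacle the velocity returns to the uniform flow (U, 0), while on the circle's upstream and downstream axis points it vanishes — exactly the stagnation points discussed next.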
Though the stream functions are very simple and a much smoother path can be generated by equation (4), the stagnation point (SP) problem mentioned above exists nonetheless. Fig. 2 shows the stagnation points in the potential flows.
Fig. 2. The left figure shows where the stagnation points occur; the right figure shows a simulation in which the robot plunges into a stagnation point in multiple-obstacle surroundings
The stagnation points may occur at points A and C when the streamline passes through the center of a circle, as shown in the left panel of Fig. 2. The right panel shows the stagnation point in multiple-obstacle surroundings, simulated with Matlab 6.0. We can see that once a robot moves onto a stagnation point, it stops and cannot reach the target.
3 Path Planning Based on Swarm Intelligence and Stream Functions

3.1 Particle Swarm Optimization (PSO)

Particle swarm optimization was first introduced by Kennedy and Eberhart in 1995 [18]. The algorithm is a stochastic optimization method based on the social behavior of individuals living together in groups. Each individual tries to improve itself by observing other group members and imitating the better ones. In that way, the group members perform an optimization, which can be described with the following model: every particle has a position, present, in the search space, is able to calculate the corresponding objective function value, and moves in the search space with a velocity V. During the whole optimization process, all particles stay alive; they just change their position and velocity. The best position a particle has reached so far is called its private guide pBest, and the best position ever visited by one of its neighbors is the particle's local guide gBest. In each iteration t, the particles' velocities are updated, directing each particle towards its local and its private guide while keeping a proportion of the old velocity. The velocity and position of each particle are updated with the following equations:
V_t = w·V_{t−1} + C1·Rand·(pBest − present_{t−1}) + C2·rand·(gBest − present_{t−1})    (5)

present_t = present_{t−1} + V_t    (6)
Here V_t is the particle velocity at time t, present is the current position of the particle, pBest is the particle's best position, gBest is the global best position, Rand and rand are random numbers in (0, 1), w is the inertia weight, and C1, C2 are learning factors. Much work has shown that there is no single fixed best value for the inertia weight w [3]. The inertia weight plays an important role in the convergence behavior of the technique: it controls the impact of the previous history of velocities on the current velocity of each particle, regulating the trade-off between the global and local exploration abilities of the swarm, since large values of w facilitate global exploration of the search space (visiting new regions) while small values facilitate local exploration. To perform a more refined search of already detected promising regions, the common practice is to use large values of w in the first steps of the algorithm and gradually decrease them during the optimization process. In this paper we set the initial value of w to 0.9 and the ending value to 0.4, and then decrease w linearly each epoch. The equation is:
w = w_end + (t_max − t)·(w_initial − w_end)/t_max    (7)

Here t is the current iteration and t_max is the maximum iteration.
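Equations (5)–(7) in code form — a generic PSO step with the linearly decreasing inertia weight; the function and parameter names are ours:

```python
import random

def inertia(t, t_max, w_init=0.9, w_end=0.4):
    """Equation (7): w decreases linearly from w_init to w_end over t_max iterations."""
    return w_end + (t_max - t) * (w_init - w_end) / t_max

def pso_step(pos, vel, pbest, gbest, w, c1=2.0, c2=2.0):
    """One particle update per equations (5) and (6); lists hold one entry per dimension."""
    new_vel = [w * v
               + c1 * random.random() * (pb - x)
               + c2 * random.random() * (gb - x)
               for x, v, pb, gb in zip(pos, vel, pbest, gbest)]
    new_pos = [x + v for x, v in zip(pos, new_vel)]
    return new_pos, new_vel

print(inertia(0, 200), inertia(200, 200))  # w runs from ~0.9 at the start to 0.4 at the end
```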
3.2 Model Based on PSO and Stream Functions

Regarding the robot as a fish, we consider the following scenario: a school of fish passes through reefs to find food; each fish follows the leader, which has the best position, and each fish remembers its own best position; at the same time, the reefs affect the fish, and all the fish simply follow the streamlines, evading the obstacles. Suppose that the working space of the mobile robot is as follows: ① the mobile robot moves in a two-dimensional space; ② there are a number of static or dynamic obstacles in the space, and the obstacles can be abstracted as circles with a radius (if some obstacles are polygons, we replace them with their circumcircles); ③ the robot can be seen as a particle. Then the current position and velocity of the robot can be taken as a particle's. Set the position of particle i as X(x_i, y_i), its velocity as (Vx_i, Vy_i), and the global extreme value as G(g_x, g_y). We use the distance from the current position to the goal location to evaluate each particle's performance, i.e., we use the sphere function to guide the particle:

F(X) = (1/2)·(X − G)^2    (8)
In the process of the particles moving toward the goal, each best position (i.e., gBest) in an iteration acts as a node that the robot passes through, and the distance between two adjacent nodes is one motion step of the robot; connecting the nodes therefore yields the planned path. The path-planning criteria may include distance, time, and energy; distance is the most common criterion. We use equation (9) to compute the lengths of all the virtual robots' paths and select the shortest one as the planned path:
min S = Σ_{i=1}^{t_max} step_i    (9)
Here step_i is the distance between two adjacent nodes, and t_max is the maximum iteration. The distance must be restricted within the maximum gait the robot can move, i.e., the maximum flying distance of a particle in each iteration cannot exceed the gait of the robot. The new approach is described in detail as follows:

Step 1: Select parameters to initialize the particle swarm with initial positions. Set the range that the sensors can detect.
Step 2: Compute the fitness by equation (8) and judge whether the swarm particles have reached the goal location; if they arrive at the specified position, go to Step 5.
Step 3: Multiple particles begin to fly and adjust their own "flying" according to their own experience as well as the experience of the other particles. They update their positions and velocities by equations (5) and (6).
Step 4: At the end of a particle's flight, the stream functions act on the velocity of each particle and push it away from the obstacle; if some particles are stuck at a stagnation point, they can pull themselves out by the experience of the other virtual robots.
Step 5: Judge whether the stopping criterion is satisfied. If satisfied, all virtual robots stop flying; compute the distances of all the robots' paths by equation (9) and select the shortest path as the planned path. If not, go to Step 3.
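Steps 1–5 can be sketched in Python as below (the paper's own implementation is in Matlab 6.0). The obstacle influence zone, the goal coordinates, and the way the flow velocity of equation (4) is blended into the PSO velocity are our simplifications; the PSO parameters are the ones listed in Section 4. Selecting the shortest of several runs, as in equation (9), is omitted for brevity.

```python
import math
import random

def plan_path(start, goal, obstacles, U=2.0, n=10, t_max=200,
              c1=2.0, c2=2.0, w_init=0.9, w_end=0.4, gait=0.5, err=0.01):
    """Sketch of Steps 1-5: PSO drives the swarm toward the goal while the
    stream function of equation (4) deflects particles around obstacles."""
    random.seed(1)                                   # Step 1: initialize the swarm
    pos = [[start[0] + random.uniform(-1.0, 1.0),
            start[1] + random.uniform(-1.0, 1.0)] for _ in range(n)]
    vel = [[0.0, 0.0] for _ in range(n)]
    pbest = [p[:] for p in pos]
    dist = lambda p: math.hypot(p[0] - goal[0], p[1] - goal[1])
    gbest = min(pos, key=dist)[:]
    path = [gbest[:]]
    for t in range(t_max):
        w = w_end + (t_max - t) * (w_init - w_end) / t_max     # equation (7)
        for i in range(n):
            for d in range(2):                                 # equations (5), (6)
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (gbest[d] - pos[i][d]))
            for bx, by, a in obstacles:        # Step 4: flow deflection near obstacles
                dx, dy = pos[i][0] - bx, pos[i][1] - by
                r2 = dx * dx + dy * dy
                if 1e-9 < r2 < (3.0 * a) ** 2:                 # influence zone (assumed)
                    vel[i][0] += U * (1.0 - a * a * (dx * dx - dy * dy) / r2 ** 2)
                    vel[i][1] += U * (a * a * 2.0 * dx * dy / r2 ** 2)  # equation (4)
            s = math.hypot(vel[i][0], vel[i][1])   # one step may not exceed the gait
            if s > gait:
                vel[i][0] *= gait / s
                vel[i][1] *= gait / s
            pos[i][0] += vel[i][0]
            pos[i][1] += vel[i][1]
            if dist(pos[i]) < dist(pbest[i]):      # fitness of equation (8)
                pbest[i] = pos[i][:]
        gbest = min(pbest, key=dist)[:]            # Steps 2 and 5: best node so far
        path.append(gbest[:])
        if dist(gbest) < err:
            break
    return path

route = plan_path((-8.0, -8.0), (8.0, 8.0),
                  [(-2.0, 6.0, 1.0), (-4.0, -2.0, 0.6), (4.0, 2.0, 0.8)])
print(len(route), route[-1])   # the final node approaches the goal
```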
4 Simulation Results

Simulations are implemented with Matlab 6.0, supposing that the robot can detect all obstacles in the space. The results show the effectiveness of the approach discussed in Section 3. Fig. 3 shows snapshots of simulation results of the mobile robot navigated by PSO and stream functions. In the simulations, the left figure shows only one static circular obstacle with radius 1 in the surroundings, centered at (2, 2); the right figure is the case of multiple-obstacle avoidance, with three obstacles centered at (−2, 6), (−4, −2), and (4, 2) and with radii 1, 0.6, and 0.8, respectively. The strength U of the uniform flow is 2. The initial positions of all the particles are random positions within a square area centered at (−8, −8). Regarding the parameters of the PSO algorithm during the experiments: C1 = C2 = 2, the size of the particle swarm is 10, the maximum number of iterations is 200, the stopping criterion ErrGoal is 0.01, and w is set according to equation (7).
[Figure omitted: left panel "navigate in single obstacle surrounding", right panel "navigate in multiple obstacles surroundings"; X-Y plots marking the initial position, goal location, and static obstacles]
Fig. 3. Simulation of robot navigated by PSO and stream functions with single or multiple obstacles in the surroundings
In Fig. 3, the blue circle represents the obstacle; if an obstacle is not a circle (for example, a polygon), we can use its circumscribed circle to substitute for it. The green circle is the goal. The blue line with red dots on it is the optimal path, and every red dot represents one forward gait of the mobile robot during the evolution. The approach can also be used in dynamic surroundings. Suppose that, in Fig. 4, one hollow circular obstacle with radius 0.8 moves from position (4, −7) to (4, 2) at a fixed speed, which should not exceed the robot's motion speed. From the results, we can see that no matter what the surroundings are, the proposed approach can lay out an optimal path. The contribution of PSO can be summarized in two aspects: first, it guides the robot to evade the obstacles, move to the target location, and finally optimize the paths it has generated; second, it can solve the stagnation point problem that exists in potential flows. When some particles plunge into a stagnation point, they can escape by their inertia and other
[Figure omitted: "navigate in dynamic surroundings"; X-Y plot marking the initial position, goal location, two static obstacles, and a moving obstacle]
Fig. 4. Simulation of robot navigated by PSO and stream functions in dynamic surroundings
neighbor particles' traction. The stream functions exert a repulsive force on the robot, pushing it away from obstacles.
5 Conclusions and Future Work

In this paper, we present a new approach to navigating a mobile robot in a static or dynamic environment based on PSO and stream functions. Extensive simulation results illustrate the effectiveness of the proposed method. One key development is the introduction of PSO, which effectively guides the robot to the goal location and deals with the stagnation point problem that exists in potential flows. Another key development is the employment of stream functions, which guarantee that the particle swarms evade the multiple obstacles. The combination of PSO and stream functions is an alliance of a global method and a local method; they draw advantages from each other and effectively navigate the mobile robot. Research is underway for further in-depth analysis of the proposed technique. We will focus future work on the following issues: one is how to compute the repulsive force if the obstacle cannot be abstracted as a simple circle with a radius; the other is whether the model is applicable when objects move at a variable velocity rather than a fixed one.
References 1. Thomaz, C.E., Pacheco, M.A.C., Vellasco, M.M.B.R.: Mobile Robot Path Planning Using Genetic Algorithms. In: International Conference on Evolutionary Computation in Engineering (1999) 2. Balakrishnan, K., Honavar, V.: Some Experiments in evolutionary Synthesis of Robotic Neurocontrollers. In: Proceeding of the World Congress on Neural Networks, San Diego, CA, September 15-20, 1996, pp. 1035–1040 (1996)
284
C. Hu et al.
3. Shi, Y.H., Eberhart, R.C.A.: Modified Particle Swarm Optimizer [A]. In: International Conference on Evolutionary Computation [C], pp. 69–73. IEEE, Anchorage (1998) 4. Pere: Automatic planning of manipulator movements. IEEE Transactions on Systems, Man, and Cybernetics 11(11), 681–698 (1981) 5. Habib, M K, Asama, H.: Efficient method to generate collision free path for autonomous mobile robot based on new free space structuring approach. In: Proc. IEEE/RSJ IROS, pp. 563–567 (1991) 6. Zhaoqing, M., Zenren, Y.: Mobile robot real-time navigation and avoidance based on grids. Robot 61, 344–348 (1996) 7. Liu, Y.H., Arimoto, S.: Proposal of tangent graph and extended tangent graph for path planning of mobile robots. In: Proceedings of IEEE International Conference on Robotics and Automation, vol. 1, pp. 312–317 (1991) 8. Khatib, O.: Real-time obstacle avoidance for manipulators and mobile robots. Int. J. Robot. Res. 5(1), 90–98 (1986) 9. Geand, S.S., Cui, Y.J.: New potential function for mobile robot path planning. IEEE Transactions on Robotics and Automation 16(10), 615–619 (2000) 10. Koren, Y., Borenstein, J.: Potential field methods and their inherent limitations for mobile robot navigation. In: Proc. IEEE Conf. Robotics and Automation, Sacramento, CA, April 7–12, 1991, pp. 1398–1404 (1991) 11. Chuang, J.H., Ahuja, N.: An analytically tractable potential field model of free space and its application in obstacle avoidance. IEEE Trans. Syst., Man, Cybern. B 28, 729–736 (1998) 12. Zhang, R.B., Guo, B.X., Xiong, J.: Research on global path planning for robots based on ant colony algorithm. Journal of Harbin Engineering University 25(6) (December 2004) 13. Fei, K., Yao-nan, W.: Robot Path Planning Based on Hybrid Artificial Potential Field Genetic Algorithm. Journal of System Simulation 18(3) (March 2006) 14. Waydo, S., Murray, R.M.: Vehicle motion planning using stream functions. In: Proc. IEEE International Conference on Robotics and Automation, Taipei, Taiwan, pp. 
2484–2491. IEEE Computer Society Press, Los Alamitos (2003) 15. Waydo, S., Murray, R.M.: Vehicle motion planning using stream functions. CDS Technical Report 2003 -001, California Institute of Technology (2003) 16. Ye, G.H., Wang, H., Tanaka, K.: Coordinated Motion Control of Swarms with Dynamic Connectivity in Potential Flows[DB/OL] (2005) 17. Qin, Y.Q., Sun, D.B., Li, N., et al.: Path Planning for mobile Robot Based on Particle Swarm Optimization [J]. Robot 26(3), 222–225 (2004) 18. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of. IEEE International Conference on Neural Networks, pp. 1942–1948. IEEE Computer Society Press, Los Alamitos (1995) 19. Nelson, H.C., Ye, Y.C.: An intelligent mobile vehicle navigator based on fuzzy logic and reinforcement learning [J]. IEEE Trans on Systems, Man and Cybernetics, Part B: Cybernetics 29(2), 314–321 (1999) 20. Clerc, M: The swarm and queen: Towards a deterministic and adaptive particle swarm optimization [A]. In: Proceeding of Congress on Evolutionary Computation[C], Washington, D.C. (1999)
Research on Adaptive System of the BTT-45 Air-to-Air Missile Based on Multilevel Hierarchical Intelligent Controller Yongbing Zhong, Jinfu Feng, Zhizhuan Peng, and Xiaolong Liang The Engineering Institute, Air Force Engineering University. 710038 Xi’an, China
[email protected]
Abstract. This paper presents an adaptive control system suitable for the control technology of the BTT-45 air-to-air missile. It resolves the problem of the BTT-45 missile's channel coupling through the application of an idea similar to "reversing design". The proposed system has the following features: (1) adaptive robust control; (2) self-decoupling. A simulation example is used to demonstrate the excellent performance of the proposed system. Keywords: BTT, STT, Air-to-Air Missile, Reversing Design, "IC".
1 Introduction

The control and guidance system is the key to an air-to-air missile accurately hitting its target, and current air-to-air missiles adopt skid-to-turn (STT) control. Along with the improvement of maneuverability and loading and the progress of airborne computers and control technology, research on the control of bank-to-turn (BTT) missiles is being actively developed worldwide to improve missile performance1,2. Though the BTT missile has much superiority, there are many key technical problems to solve before it can replace the current STT missile, which is mature in engineering design and application, including3,4,5,6: i) The design of the missile's control system: it is improper to adopt the three-channel independent control scheme of the STT missile, because the BTT missile is a multivariable system including kinematic coupling, inertia coupling, aerodynamic coupling, and control action coupling. ii) The disadvantageous influence on the homing loop's stability caused by the missile's violent rolling must be restrained. iii) The harmony control: the BTT missile requires the system to have a coordinated control function keeping the angle of sideslip at zero in flight. iv) The uncertain problem of controlling the angle of rolling. Therefore, this paper puts forward a multilevel hierarchical intelligent controller system, analyzes its technical characteristics, and studies its application to the BTT-45 air-to-air missile.
2 The Design Ideas

The multilevel hierarchical intelligent controller system comprises an organization level, a coordination level, an executive level, a CDB, etc. Each level is an intelligence L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 285–291, 2007. © Springer-Verlag Berlin Heidelberg 2007
286
Y. Zhong et al.
including a characteristic identifier (CI), an illation engine (IE), and a knowledge base (KB). After an instruction enters the controller and is identified by the CI, the instruction, KB, and DB are input into the IE to make judgments and produce the output control signal. Centralizing the a-priori knowledge of the designer and control experts in the intelligent controller's KB allows qualitative analysis to be combined with quantitative analysis, reducing the dependence on the mathematical model of the controlled object; the KB can then be perfected by accumulating experience in use. Adopting the direct inference form of "if ... then ..." enables the search for optimization or self-adaptive control. The CDB is the exchange location for the multi-agent control system: it records the information and hypotheses needed by all levels so that this intelligence can be shared7. All levels exchange information through this blackboard, and the CDB's hypotheses are continuously modified until the solution is finally obtained. The multilevel hierarchical intelligent controller system takes on the mixed multilevel "IC" understanding/parsing cognitive method. The low level must execute partial control tasks with high accuracy to satisfy local performance goals; the coordination level mostly assists every sub-task, and its running precision is not high, but it has higher decision-making and definite learning ability; the high level has low precision but corresponding learning and high-level decision-making ability. Therefore, the high level is the cerebrum of the "IC" and follows the principle that "intelligence increases as accuracy descends"8,9. It is obvious that the system is like a weapon system, which must make overall decisions and synthetically coordinate its sub-systems to concretely execute high-performance combat tasks.
3 Intelligence Control's Application to the BTT-45 Air-to-Air Missile

3.1 The Design of the BTT-45 Air-to-Air Missile's Organization Level

The organization level's precision requirement is not high, but it needs corresponding learning and high decision-making ability. At this level, one establishes not only the deterministic mathematical model but also an uncertain broad-sense model; only a mixed control combining the non-mathematical broad-sense model expressed as knowledge with the mathematical model expressed analytically is fit for it. We can obtain the BTT-45 missile's mathematical model by taking the missile's angle of attack α, angle of sideslip β, the three angular velocities ω_x, ω_y, ω_z about the axes, the three rudder-deflection angles δ_x, δ_y, δ_z, and the roll angle φ as the state variables of the missile's three-channel mathematical model10,11:
α =ω
57 . 3 qsC α 1 −ω xβ − ( 57 . 3 qsC y + P )α − mv mv
z
y
δ
z
(1)
δy
•
β =ω
δz
y
57 . 3 qsC z δ β 1 + ω xα − ( P − 57 . 3 qsC z ) β + mv mv
ω
• x
=
ω qsL (m x Iz
x
δx L ω x + 57 . 3 m x δ x ) 2v
y
(2) (3)
Research on Adaptive System of the BTT-45 Air-to-Air Missile •
$$\dot{\omega}_y = \frac{qsL}{I_y}\left(m_y^{\omega_y}\frac{L}{2v}\omega_y + m_y^{\dot\beta}\frac{L}{v}\dot\beta + 57.3\,m_y^{\beta}\beta + 57.3\,m_y^{\delta_y}\delta_y\right) + \frac{I_x - I_z}{I_y}\omega_x\omega_z \qquad (4)$$

$$\dot{\omega}_z = \frac{qsL}{I_z}\left(m_z^{\omega_z}\frac{L}{2v}\omega_z + m_z^{\dot\alpha}\frac{L}{v}\dot\alpha + 57.3\,m_z^{\alpha}\alpha + 57.3\,m_z^{\delta_z}\delta_z\right) + \frac{I_x - I_z}{I_z}\omega_x\omega_y \qquad (5)$$

$$\dot{\delta}_x = -17\delta_x - \delta_{xc} \qquad (6)$$

$$\dot{\delta}_y = -17\delta_y - \delta_{yc} \qquad (7)$$

$$\dot{\delta}_z = -17\delta_z - \delta_{zc} \qquad (8)$$

$$a_y = \frac{qs}{m}\left[\left(C_x + 57.3\,C_y^{\alpha}\right)\alpha + 57.3\,C_y^{\delta_z}\delta_z\right] \qquad (9)$$

$$a_z = \frac{qs}{m}\left[\left(57.3\,C_z^{\beta} - C_x\right)\beta + 57.3\,C_z^{\delta_y}\delta_y\right] \qquad (10)$$

$$\dot{\varphi} = \omega_x \qquad (11)$$
From this equation group we can see that the rolling channel can be solved alone, whereas the pitching and crab channels are coupled, with a coupling intensity directly proportional to the rolling angular velocity ω_x. Generalized model: it is necessary to establish self-adaptation, self-learning, self-organizing models, etc. [12].

3.2 The Design of the BTT-45 Air-to-Air Missile's Coordination Level

The coordination level mainly coordinates the completion of sub-tasks. Its precision requirement is not high, but it has corresponding learning and stronger decision-making ability. Because the demand that a BTT missile keep its sideslip angle small cannot otherwise be satisfied, a system with both coordination and control is needed. In addition, the missile's pitching and crab channels are strongly coupled because a BTT missile rolls rapidly about its velocity vector [13,14,15]. To settle this conflict, a cooperation module and an evaluating module may be designed to achieve the cooperative behaviour of the intelligent-control missile. These two parts are described below.

The Design of the Cooperation Module. In 1987 Kovach proposed the following method: first, neglect the coupling between the three channels; then design each channel's autopilot independently by the frequency method and the root-locus method. The method requires the pitching and rolling channels to satisfy the design demand, and the response speed of the crab channel to be no less than that of the rolling channel [16]. In this way the system's sideslip angle stays small and BTT control is achieved. Based on this idea we may assume the sideslip angle to be zero, which both keeps the sideslip angle small and removes the coupling between the pitching and crab channels. We can then solve the rolling channel first, then
Y. Zhong et al.
the pitching channel, and finally the crab channel. The guidance system therefore issues only pitch-acceleration and roll-angle commands [17,18,19,20].

The model of the rolling channel:

$$\dot{\omega}_x = \frac{qsL}{I_x}\left(m_x^{\omega_x}\frac{L}{2v}\omega_x + 57.3\,m_x^{\delta_x}\delta_x\right) \qquad (12)$$

$$\dot{\delta}_x = -17\delta_x - \delta_{xc} \qquad (13)$$

$$\dot{\varphi} = \omega_x \qquad (14)$$
The model of the pitching channel:

$$\dot{\alpha} = \omega_z - \frac{1}{mv}\left(57.3\,qsC_y^{\alpha} + P\right)\alpha - \frac{57.3\,qsC_y^{\delta_z}}{mv}\delta_z \qquad (15)$$

$$\dot{\omega}_z = \frac{qsL}{I_z}\left(m_z^{\omega_z}\frac{L}{2v}\omega_z + m_z^{\dot\alpha}\frac{L}{v}\dot\alpha + 57.3\,m_z^{\alpha}\alpha + 57.3\,m_z^{\delta_z}\delta_z\right) + \frac{I_x - I_z}{I_z}\omega_x\omega_y \qquad (16)$$

$$\dot{\delta}_z = -17\delta_z - \delta_{zc} \qquad (17)$$

$$a_y = \frac{qs}{m}\left[\left(C_x + 57.3\,C_y^{\alpha}\right)\alpha + 57.3\,C_y^{\delta_z}\delta_z\right] \qquad (18)$$
The model of the crab channel:

$$\dot{\beta} = \omega_y + \omega_x\alpha - \frac{1}{mv}\left(P - 57.3\,qsC_z^{\beta}\right)\beta + \frac{57.3\,qsC_z^{\delta_y}}{mv}\delta_y \qquad (19)$$

$$\dot{\omega}_y = \frac{qsL}{I_y}\left(m_y^{\omega_y}\frac{L}{2v}\omega_y + m_y^{\dot\beta}\frac{L}{v}\dot\beta + 57.3\,m_y^{\beta}\beta + 57.3\,m_y^{\delta_y}\delta_y\right) + \frac{I_x - I_z}{I_y}\omega_x\omega_z \qquad (20)$$

$$\dot{\delta}_y = -17\delta_y - \delta_{yc} \qquad (21)$$

$$a_z = \frac{qs}{m}\left[\left(57.3\,C_z^{\beta} - C_x\right)\beta + 57.3\,C_z^{\delta_y}\delta_y\right] \qquad (22)$$
The Design of the Evaluating Module. The evaluating module decides which actions should be executed locally and which should be executed cooperatively. The former are reported to the seeking level; the latter are passed to the cooperation module [21].

3.3 The Design of the BTT-45 Air-to-Air Missile's Executive Level
This level must ensure that the control of the executing mechanisms can actively adapt to changes in the overload characteristics, in its own parameters, and in the running environment. The level should execute its partial control tasks with high precision to satisfy the local performance goals. It is the interface between the coordination level and the domain task: it receives the commands of the evaluating module, monitors mission completion, and feeds the result back to the evaluating module. In reality, completing a mission demands a complicated sequence of actions, and a mission cannot be suspended or cancelled at discretion; otherwise the system becomes inconsistent. Interruption of a running mission should proceed according to priorities and constraints [22,23].
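The priority-governed interruption described above might be sketched as a simple priority queue of mission commands; the class and task names below are hypothetical.

```python
# Illustrative sketch (names invented): executive-level commands are
# admitted by priority, so a running task is never cancelled at
# discretion - the most urgent pending command is served first.

import heapq

class Executive:
    def __init__(self):
        self.queue = []            # min-heap of (priority, seq, name)
        self.seq = 0               # tie-breaker preserving arrival order

    def submit(self, priority, name):
        heapq.heappush(self.queue, (priority, self.seq, name))
        self.seq += 1

    def next_task(self):
        """Pop the most urgent pending task (lowest priority number)."""
        return heapq.heappop(self.queue)[2] if self.queue else None

ex = Executive()
ex.submit(2, "telemetry downlink")
ex.submit(0, "attitude hold")      # most urgent
ex.submit(1, "sensor calibration")
print(ex.next_task())              # -> attitude hold
```

A real executive level would also enforce the constraints the text mentions (no mid-sequence cancellation), which this sketch omits.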
4 Implementation and Experiments of the Control System

The control system of the three-axis turntable for flight simulation adopts high-resolution photoelectric shaft-angle encoders as measurement components, forming a fully digital system with a velocity loop and a position loop. The control system achieves instruction transmission, data acquisition, surveillance management, and turntable driving through combined software and hardware operation [24,25].

The Composition of the Control System. The control system of the three-axis turntable for flight simulation consists of three control subsystems: the crab system, the pitching system, and the rolling system. The three
[Fig. 1. The structure of the control system: an industry control computer (with keyboard, mouse, and monitor) connects through serial, PWM, and parallel interface cards to power switch devices that drive the crab, pitching, and rolling torque motors; an encoder on each frame feeds position back, and a serial port links the surveillance management to the tested equipment.]
control subsystems have the same structure, each made up of an industry control computer, interface circuits, an inner clock, a power switch device, an encoder, and a torque motor (Fig. 1).

The Result of the Decoupling Experiment. According to Eqs. (12)-(22), the three subsystems are independent and were designed separately. The sine and cosine functions and the add, subtract, and multiply operations in the decoupling network are easy to realize. The computer collects the angle-position data measured by the encoders and obtains the angular velocities by differentiating the angle positions. Simulation experiments show that the three subsystems are clearly decoupled. This paper takes the rolling and pitching channels as an example, examining the influence of the rolling channel on the pitching channel. Figures 2-4 show the changing curves of the roll angle φ, and Fig. 5 shows the unchanged curve of the angle of attack α. The change of the rolling channel has no effect on the pitching channel, which confirms the soundness of the design in which the three channels are decoupled.
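The rate measurement described above (angular velocity recovered by differentiating sampled encoder angles) can be sketched as a backward difference; the sample rate and test signal below are illustrative assumptions.

```python
# Sketch of the velocity-loop measurement: angular velocity is
# recovered from sampled encoder angles by a backward difference.
# The 1 kHz sample period and the 1 Hz sine test angle are invented.

import math

dt = 0.001                                   # hypothetical sample period [s]
angles = [math.sin(2 * math.pi * 1.0 * k * dt) for k in range(1001)]

def backward_diff(samples, dt):
    """Backward-difference rate estimate for each sample after the first."""
    return [(samples[k] - samples[k - 1]) / dt for k in range(1, len(samples))]

rates = backward_diff(angles, dt)
# For a 1 Hz sine the true peak rate is 2*pi rad/s; the estimate is close.
print(max(rates))
```

In practice the difference would be filtered, since differentiation amplifies encoder quantization noise; the sketch shows only the basic operation.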
Fig. 2. The changing curve of φ
Fig. 3. The changing curve of φ
Fig. 4. The changing curve of φ
Fig. 5. The changing curve of α
5 Conclusion

This paper has put forward a multilevel hierarchical intelligent controller system. The control scheme and the principles of the system are provided in detail, and the pivotal problems of its application are discussed in depth. Of course, the paper only presents the principle; much work remains for the engineering application.
References

1. Zheng, J., Yang, D.: The Application of Robust Control Theory to Bank-to-Turn Missile. National Defence Industry Press, Beijing (2001)
2. Luo, J., Xie, S., Jiang, Z., Wang, Y.: Intelligence Control Engineering and its Applied Example. Chemical Industry Press, Beijing (2005)
3. Cai, Z.: Intelligence Control. Electronic Industry Press, Beijing (2004)
4. Grayson, M.: Principles of Guided Missile Design. Van Nostrand, New York (1960)
5. Eichblatt Jr., E.J. (ed.): Test and Evaluation of the Tactical Missile. AIAA, Washington, DC (1989)
6. Lin, C.F.: Modern Navigation, Guidance, and Control Processing. Prentice-Hall, Englewood Cliffs, NJ (1991)
7. Eberhardt, R., Wise, K.A.: Automated gain schedules for missile autopilots using robustness theory. In: Proc. 1st IEEE Conf. Contr. Applicat., Dayton, OH, pp. 234–250 (1992)
8. White, D.P., Wozniak, J.G., Lawrence, D.A.: Missile autopilot design using gain scheduling technique. In: Proc. 26th Southeastern Symp. Syst. Theory, Athens, OH, pp. 606–610 (1994)
9. Lin, C.L., Su, H.W.: Adaptive fuzzy gain scheduling in guidance system design. AIAA J. Guid., Contr., Dyna. 24, 683–692 (2001)
10. Tan, S., Hang, C.C., Chai, J.S.: Gain scheduling: From conventional to neuro-fuzzy. Automatica 33, 411–419 (1997)
11. Takagi, T., Sugeno, M.: Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. Syst., Man, Cybern. SMC-15, 116–132 (1985)
12. Seng, T.L., Khalid, M.B., Yusof, R.: Tuning of a neuro-fuzzy controller by genetic algorithm. IEEE Trans. Syst., Man, Cybern. B 29, 226–236 (1999)
13. Siarry, P., Guely, F.: A genetic algorithm for optimizing Takagi-Sugeno fuzzy rule bases. Fuzzy Sets Syst. 99, 37–47 (1999)
14. de Sousa, A.T., Madrid, M.K.: Optimization of Takagi-Sugeno fuzzy controllers using a genetic algorithm. In: Proc. 9th IEEE Int. Conf. Fuzzy Syst., San Antonio, TX, pp. 30–35 (2000)
15. Goldberg, D.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, MA (1989)
16. Hull, R.A., Johnson, R.W.: Performance enhancement of a missile autopilot via genetic algorithm optimization techniques. Proc. Amer. Contr. Conf., 1680–1684 (1994)
17. Kim, Y.S., Han, H.W.G., Kuo, T.Y.: An intelligent missile autopilot using genetic algorithm. In: Proc. 1997 IEEE Int. Conf. Syst., Man, Cybern. (1997)
18.
Takagi, T., Sugeno, M.: Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. Syst., Man, Cybern. SMC-15, 116–132 (1985)
19. Sugeno, M.: On stability of fuzzy systems expressed by fuzzy rules with singleton consequents. IEEE Trans. Fuzzy Syst. 7, 201–224 (1999)
20. Teixeira, M.C.M., Zak, S.H.: Stabilizing controller design for uncertain nonlinear systems using fuzzy models. IEEE Trans. Fuzzy Syst. 7, 133–142 (1999)
21. Begovich, O., Sanchez, E.N., Maldonado, M.: T–S scheme for trajectory tracking of an underactuated robot. In: Proc. IEEE Fuzzy Conf., San Antonio, TX, May 2000, pp. 798–803 (2000)
22. Slotine, J.E., Li, W.: Applied Nonlinear Control. Prentice-Hall, Englewood Cliffs, NJ (1991)
23. Joo, Y.H., Shieh, L.S., Chen, G.R.: Hybrid state space fuzzy model based controller with dual-rate sampling for digital control of chaotic systems. IEEE Trans. Fuzzy Syst. 7, 394–408 (1999)
24. Ogata, K.: Discrete-Time Control Systems. Prentice-Hall, Englewood Cliffs, NJ (1987)
25. Chen, C.T.: Linear System Theory and Design, 3rd edn. Oxford Univ. Press, London, UK (1999)
The Design of an Evolvable On-Board Computer

Chen Shi1, Shitan Huang1, and Xuesong Yan2,3

1 Xi'an Microelectronics Technology Institute, Xi'an 710054, China
2 School of Computer Science, China University of Geosciences, Wuhan 430074, China
3 Research Center for Space Science and Technology, China University of Geosciences, Wuhan 430074, China
[email protected]
Abstract. This paper argues in favor of evolvable hardware as a technique to improve the capability of on-board computers for deep space exploration. From an architectural point of view, it analyzes the role of EHW in the multi-level structure of an intelligent system and describes the two main characteristics of a computer system built with EHW, emphasizing the importance of supercomputing capability and self-determination ability. To illuminate the implementation of an evolvable computer system, it describes a demo structure for object recognition that realizes an image-matching function for space applications.
1 Introduction

The on-board computer for deep space exploration to the Moon, Mars, and beyond faces new challenges, including the extreme environment (EE), long communication delays, the expensive cost of launch and transportation, unknown situations during space flight, and so on [1,2]. Environments with large temperature swings are one typical factor inducing drifts, degradation, or damage in electronic devices: between -180°C and 120°C at the initial landing sites on the Moon, low temperatures of -220°C to -230°C during the polar/crater Moon missions, a high temperature of 460°C for a Venus surface exploration and sample-return mission, etc. [3]. The current approach to on-board computer design is to use commercial/military-range electronics protected through passive (insulation) or active thermal control, with heavy shielding for radiation reduction. This only partly solves the problems, while adding sizable weight and volume, compounded by power loss and additional mission cost. More importantly, as missions target operations with smaller instruments/rovers and operations in areas without solar exposure, these approaches become infeasible. We must therefore explore a new technology suitable for on-board computer design for deep space exploration, with advantages including lower cost, less power, higher reliability, higher flexibility, and longer life-span. Evolvable hardware (EHW) is an emerging technology whose reconfigurability and evolvability suit space applications. The core of the EHW technology is the use of advanced search algorithms to automatically design, reconfigure, adapt, or otherwise manipulate hardware or software models of hardware, typically evolutionary algorithms (EAs), which are stochastic

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 292–296, 2007. © Springer-Verlag Berlin Heidelberg 2007
search methods that mimic the metaphor of natural biological evolution: selection, recombination, mutation, migration, locality, and neighborhood. Normally EHW is built on software-reconfigurable logic devices such as PLDs (Programmable Logic Devices) and FPGAs (Field Programmable Gate Arrays).
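As a minimal sketch of such an evolutionary search (not the authors' implementation), the loop below evolves a bitstring standing in for a device configuration, using truncation selection, one-point crossover, and bit-flip mutation. The all-ones target, population size, and rates are arbitrary illustrative choices.

```python
# Minimal evolutionary-search sketch: a bitstring stands in for a
# device configuration, and fitness counts how many bits match a
# target behaviour (here, all ones). Parameters are illustrative.

import random

random.seed(1)
TARGET_LEN = 32

def fitness(cfg):
    return sum(cfg)                       # matches to the all-ones target

def mutate(cfg, rate=0.02):
    return [b ^ (random.random() < rate) for b in cfg]

def crossover(a, b):
    cut = random.randrange(1, len(a))     # one-point recombination
    return a[:cut] + b[cut:]

pop = [[random.randint(0, 1) for _ in range(TARGET_LEN)] for _ in range(20)]
for gen in range(200):
    pop.sort(key=fitness, reverse=True)
    if fitness(pop[0]) == TARGET_LEN:
        break
    parents = pop[:10]                    # truncation selection (elitist)
    pop = parents + [mutate(crossover(random.choice(parents),
                                      random.choice(parents)))
                     for _ in range(10)]

best = max(pop, key=fitness)
```

In intrinsic EHW the `fitness` call would be replaced by downloading the candidate configuration into the reconfigurable device and measuring its response.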
2 Intelligent Systems for Deep Space Applications

Building electronic systems for deep space applications with the EHW technology has already produced many achievements, for example NASA's "Intelligent Decision-Making for Autonomous Rover Operations", intended for Mars exploration in 2009. The main objective of that research is to find the best routing strategy even in an unknown environment, sufficiently taking all kinds of factors into account, such as the requirements of the exploration mission, the performance of the electronic systems, the condition of the space environment, and so on. These instances of EHW enabling a fully autonomous space exploration mission share one representative characteristic: intelligent ability. The main tasks of the intelligent system fall into two categories: those that handle the high-level heuristic functions of decision-making, and those that control the low-level autonomic reactive behaviors. For example, during mission operation it supports the planning and scheduling of mission activities to accomplish the overall mission goals and objectives, taking into account the operational status of flight and ground mission elements, health and safety constraints, optimal resource allocation, and science prioritization.
Fig. 1. The multi-level structure of an intelligent system
Normally, the architecture has multi-level interconnects and interfaces, which offer extreme flexibility and are radically reconfigurable using both evolvable hardware and intelligent software at the system level as well as the module level. The whole intelligent system is a highly integrated hardware-software system, analogous to the integration of the neural system with the body in biological organisms. For convenient design and analysis, it can be separated into three basic levels: the system level, the sub-system level, and the module level, as illustrated in Fig. 1. This hierarchy can provide large-scale failsafe redundancy and adaptability to a
broad range of operational environments. The system level realizes swarm reconfiguration and reallocation of the space system with multi-agent coordination technology. Current reconfigurable space-system concepts generally achieve different system configurations by rearranging system modules with fixed functional characteristics, and the system-level configuration is allowed to evolve with science priorities and with system degradation and attrition. The functional groupings and the operational status of the swarm are continuously evaluated throughout the mission, and the constituents are optimized to meet the mission goals. The subsystem level includes an innate set of functional subsystems connected by programmable, adaptable, and evolvable interfaces to provide functional plasticity. The concept of the module level is similar to EHW: the basic functional modules evolve with "learning" and sensor feedback. For example, the calibration of a detector will self-adjust as its gain changes throughout the mission. Usually the two upper layers are realized by intelligent software, which utilizes evolutionary programming and evolution strategies to improve the optimization techniques. The lowest layer mainly realizes fine-granularity reconfiguration and provides high-performance computing capability.
3 EHW in On-Board Computers

From the above analysis we find that EHW is the lowest level of the whole multi-level structure: the foundation of the intelligent system, providing the computing power and the self-reconfiguration ability. Traditionally, an on-board computer is made up of hardware and software. As EHW technology develops, the hardware of the on-board computer becomes EHW and the software becomes the corresponding intelligent software, forming a self-organizing, self-diagnosing, and self-recovering computer system. In the early days of EHW research, effort concentrated on optimizing the circuit design of electronic systems, including on-board computer designs, with EAs [4], for example the research on the description language OHDL [5]. The idea is to employ a search/optimization algorithm that operates in the space of all possible circuits and determines solution circuits with the desired functional response. This can be done on software models using circuit simulators on traditional computers, in which case the evolution is called extrinsic or off-line evolution. The EHW of an on-board computer is implemented directly in reconfigurable hardware, in which case it is called intrinsic or on-line evolution. Built with EHW, the computer system has two main characteristics. The first is strong computing power: the basic components of EHW provide supercomputing capabilities that are not available on traditional computers. The current state of the art for processing data on board spacecraft, aside from purely logistical data-handling duties, is fairly limited. The current lack of on-board scientific computing capability is a major obstacle to optimizing the scientific return of science missions. Many analysis tasks that are routinely performed on researchers' desktop computers cannot be done on board.
Mission operational efficiency is severely hampered, as these analysis results are necessary to guide operational decisions and to achieve on-board science autonomy. Certainly, there are different approaches to attaining high-performance on-board computing, including a hardware approach, a software approach for fixed conventional computer architectures, and a hybrid hardware-software approach.
But their effectiveness is limited, because the decisive factor is the state of development of integrated-circuit design and manufacturing technology. With the development of computer science, we can form a colony system from familiar Commercial-Off-The-Shelf (COTS) microprocessors, which are conventional stored-program computers of von Neumann architecture, to attain supercomputing power in space to support intelligent system operations. On the basis of this idea, and with the maturation of FPGA technology, EHW becomes another effective approach to attaining computing power. In one large-scale FPGA chip we can even build all the computing resources for one intelligent system using the fine-granularity configurability of the FPGA. The second main characteristic of a computer system built with EHW is self-determination: the system can deal with many pivotal missions or emergency situations by itself, for example fault recovery. Most evolutionary approaches to fault recovery in FPGAs focus on evolving alternative logic configurations and evolving the intra-cell routing [6,7]. With the appearance of partially reconfigurable FPGA technology, the system can realize self fault-recovery while it is running, which guarantees the continuity of important missions.
4 Example

Currently, many image-processing applications are implemented on general-purpose processors such as Pentiums. Effective processing of images, and especially of image sequences, in space requires computing architectures that are less complicated, highly flexible, and more cost-effective. Using the EHW technology, we have built an evolvable computer system for an interplanetary navigation application that mainly performs an object-recognition mission. In this section we briefly describe one demo structure of our
[Fig. 2. The structure of an evolvable computer system: inside and outside monitors observe the inside and outside states; a collection and pre-processing stage and a storage and transmission stage form the traditional part, while evolutionary algorithms drive the reconfiguration of the image-matching module.]
on-going efforts to build an evolvable computer for space applications. Theoretically speaking, its kernel is to realize image-matching algorithms effectively. As can be seen from Fig. 2, the system is composed of two main parts: the traditional component and the evolvable component. The first part is similar to the computing unit of a general-purpose processor and looks after the fixed operations such as image-data collection, pre-processing, storage, and transmission. The kernel of the second part is a functional evolvable module, which can be implemented with reconfigurable field-programmable gate arrays (FPGAs). The configurations that bring the device's output responses closest to the desired response are combined to make better configurations until an optimal architecture is achieved, and the evolvable hardware architecture continues to reconfigure itself in order to achieve better performance. As the environment changes, the outside and inside monitors supply real-time parameters to the reconfigurable hardware module, which runs evolutionary algorithms and searches for the optimal configuration strategy in the feature space; this has been implemented on a Xilinx Virtex-II FPGA.
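A toy version of this reconfiguration loop (purely illustrative; the FPGA implementation is not described at this level of detail) can treat candidate configurations as template offsets in an image and keep the best-matching candidate each generation, in the style of a (1+4) evolution strategy:

```python
# Hedged sketch of the evolvable image-matching loop: candidate
# "configurations" are template offsets, fitness is the sum of
# absolute pixel differences, and a (1+4) evolutionary step keeps
# the best of parent and offspring. The 2-D "image" is synthetic.

import random

random.seed(0)
W = H = 32
image = [[(x * y) % 7 for x in range(W)] for y in range(H)]
TX, TY, T = 11, 5, 4                     # hidden template location and size
template = [row[TX:TX + T] for row in image[TY:TY + T]]

def cost(ox, oy):
    """Mismatch between the template and the image patch at (ox, oy)."""
    return sum(abs(image[oy + j][ox + i] - template[j][i])
               for j in range(T) for i in range(T))

best = (random.randrange(W - T), random.randrange(H - T))
for _ in range(500):
    # (1+4)-ES step: mutate the offset four times, keep the best.
    kids = [((best[0] + random.randint(-3, 3)) % (W - T),
             (best[1] + random.randint(-3, 3)) % (H - T)) for _ in range(4)]
    best = min([best] + kids, key=lambda p: cost(*p))

# The search should end on a zero-cost match (possibly a periodic copy).
print(cost(*best))
```

In the hardware version each candidate would be an FPGA configuration and `cost` would be measured on the device itself rather than computed in software.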
5 Conclusions

We have identified the role of EHW in the multi-level structure of intelligent systems for deep-space applications and the two main characteristics of a computer system built with EHW. We have begun efforts to attain high-performance computing in space to support intelligent system operations based on FPGA reconfigurable technology, and we have described one demo structure of an evolvable computer system, with its evolvable hardware and software, for object recognition.
References

1. Von Neumann, J.: Theory of Self-Reproducing Automata. University of Illinois Press, Urbana, IL (1966)
2. de Garis, H.: Evolvable Hardware: The Genetic Programming of Darwin Machines. In: Proc. of Int. Conf. on Artificial Neural Nets and Genetic Algorithms, Innsbruck, pp. 441–449 (1993)
3. Proceedings of the NASA/JPL Conference on Electronics for Extreme Environments, February 9-11, 1999, Pasadena, CA (1999), http://extremeelectronics.jpl.nasa.gov/conference
4. Thompson, A.: Hardware Evolution: Automatic Design of Electronic Circuits in Reconfigurable Hardware by Artificial Evolution. Doctoral thesis, University of Sussex, UK (1996)
5. Hirst, A.J.: Notes on the Evolution of Adaptive Hardware. In: Proc. of Adaptive Computing in Engineering Design and Control, Plymouth, UK, pp. 212–219 (1996)
6. Ortega, C., Tyrrell, A.: Biologically Inspired Fault-tolerant Architectures for Real-time Control Applications. Control Engineering Practice 7(5), 673–678 (1999)
7. Kajitani, I., Hoshino, T., Iwata, M., et al.: Variable Length Chromosome GA for Evolvable Hardware. In: Proc. of the 1996 IEEE International Conference on Evolutionary Computation (ICEC96), Nagoya, Japan, pp. 443–447 (1996)
Extending Artificial Development: Exploiting Environmental Information for the Achievement of Phenotypic Plasticity Gunnar Tufte and Pauline C. Haddow The Norwegian University of Science and Technology Department of Computer and Information Science Sem Selandsvei 7-9, 7491 Trondheim, Norway
[email protected],
[email protected]
Abstract. Biological organisms have an inherent ability to respond to environmental changes. The response can emerge as organisms that develop into structurally and behaviourally different phenotypes, where the phenotypic property to express is cued by the environment. This implies that the information necessary for a single genotype to develop into different phenotypes is the genome itself together with the information provided by the environment, i.e. phenotypic plasticity. This concept is incorporated in the development model presented herein so as to demonstrate how an evolved genome can express different phenotypes depending on the environment in which the phenotype has to develop and survive. An experimental approach is used to show that the concept can evolve robust behaviour in different environments and can evolve genomes that are triggered to express different behaviour depending on the present environment.
1 Introduction
In a direct mapping, i.e. a one-to-one mapping, there is a single phenotype, defined by the genotype. This implies that each element in the phenotype is represented explicitly in the genotype. Introducing developmental mechanisms into evolutionary computation is motivated by a need to overcome limitations of such direct mapping approaches, e.g. scaling [1], and/or to mimic natural development in artificial systems [2]. The processing of the genome in a developmental mapping may be based on gene regulation [3]. Gene regulation implies that different parts of the genome are expressed in different cells at different times in the emerging phenotype. The organism emerges as a result of an interplay between the genome and the intermediate phenotypes, e.g. a cell's next state may depend on the DNA (genome) and its neighbours' current states (intermediate phenotypes). In biology, the environment may also affect the phenotype emerging from the development process. Phenotypic plasticity [4] is a property of organisms which enables adaptation or response to the environment. The adaptation or

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 297–308, 2007. © Springer-Verlag Berlin Heidelberg 2007
response is expressed as changes in the phenotypic structure and/or behaviour. It is important to note that this adaptation occurs during the development phase. That is, the genome develops in an environment where the emerging phenotype is tuned by the environment in which it develops. This implies that developing organisms may adapt their structure and/or functionality according to information provided by the external environment, i.e. external stimuli. Organisms that have an environment-organism interaction exploitable by the mapping process may be said to be systems with mutual perturbatory channels [5]. This is often termed "embodiment" [6]. However, in most artificial developmental models the environment is not included in the gene-regulation process. In an earlier version of Miller's development model [7], a dynamic initial environment was implemented. A cell had what was called an "external environment" bit as input to its cell program. The value of this bit could be changed during/after development by an external stimulus to the cell, enabling the cell program to respond to the state of the environment bit. Such a bit enables a change in the cell's developmental path, and thus the emerging organism responds to this change. However, no external environment in terms of the definition presented in [8] was present. Herein, a cellular development model is presented where a cell's developmental action depends on external environmental information in addition to inter- and intra-cellular communication. Including the possibility of phenotypic plasticity can be a way to achieve organisms that can actually respond to environmental changes during development. Such information can be used to achieve a sought behaviour in different environments, i.e. robustness to environmental fluctuations. Another possibility is to evolve genomes for organisms that can change their behaviour based on the environment present during the organism's development.
The focus of this paper is to demonstrate how external influences can be exploited to achieve organisms that exploit phenotypic plasticity. The target is a functional organism implementing digital sequential-logic behaviour. Two scenarios are investigated. First, a single genome develops into structurally different organisms whilst retaining functionality in different environments. Second, a genome develops into organisms with different behaviour, using the environment as a trigger for the desired behaviour, i.e. environmental cues.
2 Environmental Information
An important feature of natural development is that the organism develops within an environment. But what is the environment for artificial systems? In [8] the environment was discussed at different levels. The intra-cell environment is what the DNA resides in, also referred to as the cell's metabolism [9,10]. The second level of environment, found in most development models, is the neighbour environment, referring to the inter-cell environment that enables communication between neighbouring cells. Cells may communicate their protein levels [10] or cell type [11,9,12], or chemicals may diffuse between neighbours [11,9,13].
The "environment" of a naturally developing organism usually refers to the external environment affecting the developing organism. In [8] this environment was expressed as a combination of an initial environment and an external environment. When a cell is grown, it is affected by the environment at that place in the environment, the initial environment. The status of the initial environment thus affects the developmental path of any given cell and hence the organism as a whole. However, when the organism is developed it has to survive in an environment, and thus it is important that the environment beyond the growing organism can affect the developing organism. Such an environment is defined as the external environment. Empty cells, external to the organism but available to it, are also given an initial status so that their status may be included in fitness evaluation. As such, both the growing organism and its "external environment" can be measured during evaluation [8]. A further implication of emerging organisms is that the phenotype may be evaluated at a given step of development, defined as the finalised phenotype, as in [10], or at each or any stage during development [12]. The latter takes the actual process of developing the emergent structure [14] or functionality [15] into the evaluation process, i.e. life-time evaluation. Both evaluation strategies may include the organism and its external environment in the evaluation.
Fig. 1. Evolution of developmental genomes with indirect and direct exploitation of environmental information. (a) Indirect environmental influence through evolution: the EA, genome, mapping, emerging organism, and external environment are linked only through fitness. (b) Direct environmental influence exploitable by the mapping process.
One notable feature of the work presented in [8] is that although an external environment is introduced, it affects the developing phenotype only indirectly, i.e., through evolution. There is no direct influence on the developing phenotype, unlike the initial environment, which directly affects the development path of all cells. However, in biology the external environment has a direct effect on the developing phenotype, and such an effect is sought herein. Figure 1(a) illustrates the effect of the external environment as implemented in [8]. The organism emerges from an interplay between the genome and the emerging organism. This interplay is represented as the "mapping" box, where at any point in time the information about the genome and the organism (at that point in time) is available to the mapping process, as shown. Fitness measures the emerging organism together with its environment, as shown, at each stage of
300
G. Tufte and P.C. Haddow
the development process. The accumulated fitness, after the mapping process has stopped, is fed back to the evolutionary algorithm (EA). As such, the external environment does not influence the outcome of the development process (mapping) but only the fitness evaluation, providing an indirect dependence on the external environment, i.e., a system with no mutual perturbatory channels. Figure 1(b) describes a similar mapping process, except that the external environment information is available to the development process. As shown, the mapping process can exploit external environment information in addition to the information coded in the genome and provided by the developing organism. As such, the emerging organism is a product of an interplay between the genome, the organism (at that point in time) and the present environment, i.e., mutual perturbatory channels exist. In such systems, a genome can develop into different organisms depending on the environment present, i.e., phenotypic plasticity is present. Such a system is implemented herein. In the work presented, an inter-cell environment is present, exchanging cell types and cell states between the 2D von Neumann neighbours, together with an external environment that can influence the development process. The result of this approach is a development system that can exhibit the phenomenon of phenotypic plasticity.
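The difference made by direct environmental influence can be sketched in a few lines (the `mapping` function is an assumed stand-in for the rule-based development process, not the paper's implementation):

```python
def develop(genome, environment, steps, mapping):
    """Development with direct environmental influence, as in Figure 1(b):
    at every step the mapping sees the genome, the organism so far and the
    present environment. One genome can therefore yield different organisms
    in different environments (phenotypic plasticity)."""
    organism = ["zygote"]
    for _ in range(steps):
        organism = mapping(genome, organism, environment)
    return organism
```

In the Figure 1(a) scheme, `mapping` would take only `(genome, organism)`, so the same genome always yields the same organism and the environment acts only through fitness.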
3
Development Model
The development model is based on cellular development. This implies that the genome is present and processed autonomously in every cell. In the model, the cell also contains the functional building blocks. For the experiments herein, the application sought is a digital circuit (phenotype). Figure 2(a) illustrates the developmental system: the cell. The cell is divided into three parts: the genome (the building plan), the development process (mechanisms for cell growth and differentiation) and the functional component of the cell. The information in the functional component represents the type of the cell, and the cell's state is described by the outputs of the functional component. The genome consists of a set of rules. Rules are restricted to expressions consisting of the type and state of the target cell and the types and states of the cells in its von Neumann neighbourhood. There are two types of rules: change rules and growth rules. Cell growth is a mechanism to expand the organism. A growth rule result provides the direction of growth: grow from north (Gn), east (Ge), south (Gs) or west (Gw). It is important to note that these rules are expressed in terms of where the source of the cell growing into the target cell lies. In many development systems, growth rules are instead expressed in terms of which direction the target cell is going to grow. Describing where a cell grows from enables a fully parallel implementation of the system whilst retaining the possibility that cells may, in effect, grow in all four directions simultaneously. Growth rules have two restrictions. First, the target cell must be empty; this prevents growing over an existing cell and thus respecialising it with a new cell type. Secondly, the cell to be copied into the target cannot be empty.
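The grow-from semantics and the two restrictions can be sketched as follows (an illustrative reconstruction, not the authors' parallel hardware implementation; the `EMPTY` constant and grid layout are assumptions):

```python
EMPTY = 0  # assumed encoding for the empty cell type

def apply_growth(grid, x, y, direction):
    """Grow a neighbouring cell's type into the target cell (x, y).

    The direction names where the SOURCE lies: 'Gn' means the target is
    grown into from its north neighbour, and so on. Returns True if
    growth happened, False if a restriction blocked it."""
    offsets = {"Gn": (0, -1), "Ge": (1, 0), "Gs": (0, 1), "Gw": (-1, 0)}
    dx, dy = offsets[direction]
    if grid[y][x] != EMPTY:        # restriction 1: target must be empty
        return False
    source = grid[y + dy][x + dx]
    if source == EMPTY:            # restriction 2: source must hold a type
        return False
    grid[y][x] = source            # copy the source's type into the target
    return True
```

Because each rule names its source rather than its destination, every target cell can evaluate its own growth independently, which is what permits the fully parallel update.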
[Figure: (a) Components of the cell: the genome (rules whose condition part holds "Active? Type state" genes for the cell and its N, E, S, W neighbours, plus a result/action part), the development process, and the functional components (a LUT feeding a D flip-flop, with N, E, S, W inputs and an output sent to each neighbour). (b) Regulation of genes as interplay between the genome, cell types and the cell state of the environment or the emerging phenotype.]

Fig. 2. The basic cell and a rule showing the gene regulation in the cellular development model
Differentiation changes a cell's type, i.e., its functionality. The result part of a change rule states the type the target cell is to be changed into. Cells have the following types: valid cell types, don't care (DC) or empty. However, empty is not a valid target cell type. Each rule consists of a result and a condition. The conditional part provides information about the cell itself and each of the neighbouring cells. In the development model presented in [12], only the type of a cell was applied to describe these cells. However, to introduce the external environment, state information is also needed. State information provides a way to include information relating to the functionality of the organism at a given point in time, as well as information about the external environment: the empty cells in the environment also have state information. As such, a cell is represented in the condition of a rule by two genes representing its type and its state. However, a target cell is only represented by one gene: its type for change rules, or the growth direction for growth rules. The state of a cell may be 0, 1 or DC. DC is introduced to provide the possibility of turning this environmental influence on or off. Firing of a rule can cause the target cell to change type, die (implemented as a change of type) or have another cell grow into it. Figure 2(b) illustrates the process of evaluating a rule. For each cell condition, the cell type and state are compared, and if the condition holds then that part of the rule is active. If all conditions are active then the result becomes active and the rule fires. Activation of the result gene is expressed in the emerging phenotype according to the action specified. In a development genome, multiple rules are present. Multiple rules imply that more than one rule of a given cell may be activated at the same time if their
conditions hold. To ensure unambiguous rule firing, rule regulation is part of the development process: if the first rule is activated, the second rule cannot be activated; activation of the second rule prevents activation of the third rule, and so on. The functional component of the cell is an Sblock [16]. The content of the look-up table (LUT) defines the functionality and is, herein, also used to define the cell type. The LUT is the combinatorial component and the flip-flop is the memory element, capable of storing the cell state. The output value of an Sblock is synchronously updated and sent to all four of its neighbours and fed back to itself. One update of the cell's type under the execution of the development process is termed a development step (DS). A development step is thus a synchronous update of all cells in the cellular array. One update of the cell's functional components, i.e., one clock pulse on the flip-flop, is termed a state step (SS). A development step is thus made up of a number of state steps. The initial condition is applied before development starts. This means that all empty cells are set or reset depending on the given initial condition. To avoid empty cells updating their output values from their von Neumann neighbourhood, all cells of type Empty are set to update their outputs based only on their own output value at the previous clock pulse. As such, a given empty cell will retain its initial state, i.e., the environmental information, until the emerging organism grows into it.
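The condition matching over (type, state) gene pairs and the priority-based rule regulation just described can be sketched as follows (a hypothetical illustration, not the FPGA implementation; DC is modelled as None):

```python
DC = None  # "don't care": matches any type or state

def condition_matches(condition, cells):
    """A rule condition holds if, for the cell itself ('C') and each von
    Neumann neighbour ('N', 'E', 'S', 'W'), both the type gene and the
    state gene match the cell, or are DC. Both `condition` and `cells`
    map positions to (type, state) pairs."""
    for pos, (want_type, want_state) in condition.items():
        have_type, have_state = cells[pos]
        if want_type is not DC and want_type != have_type:
            return False
        if want_state is not DC and want_state != have_state:
            return False
    return True

def select_rule(rules, cells):
    """Rule regulation: rules are tried in order and the first one whose
    condition holds fires, so an earlier active rule blocks later ones.
    Returns the firing rule's result (a new type, or a growth direction),
    or None if no rule fires."""
    for condition, result in rules:
        if condition_matches(condition, cells):
            return result
    return None
```

Setting a state gene to DC in every rule reproduces the type-only model of [12]; non-DC state genes are what let the external environment reach gene regulation.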
4
Experiments
In [8], the goal was to evolve developmental genomes that could function in different environments. Although the environmental information was given to the development mapping, its role was in the evaluation of the individual at each development step. One or more environments were tested so as to assess the effect of evolving individuals under different environmental influences. The environmental information provides a direct influence on fitness and thus on evolution. As such, environmental influences were only exploitable by evolution. In the model applied herein, the development process itself can also exploit environmental information, in the form of state information in the rules. As such, environmental information directly affects gene regulation, thereby supporting phenotypic plasticity. Figure 3(a) shows a setup for exploring phenotypic plasticity. In the case shown, the genome can be exposed to two different environments, A or B, shown in Figures 3(b) and 3(c). The black and white colours represent states "1" and "0" respectively for the cells of the empty grid. If the organism is developed in environment A, the resulting organism is phenotype A. The other possibility is an organism developed in environment B, resulting in phenotype B. Phenotypic plasticity may be said to apply to plasticity with respect to structure whilst retaining functionality, or plasticity with respect to functionality, both in the presence of different environments. An experimental approach is taken to
[Figure: (a) Experimental setup: the EA supplies a genotype to the mapping; development in environment A yields the fitness of phenotype A, and development in environment B yields the fitness of phenotype B. (b) Environment A. (c) Environment B.]

Fig. 3. The experimental platform and set of possible environments
demonstrate each of the two. In the experiments conducted herein, two environments were chosen: "A" and "B". The first set of experiments exposes the genome to the two environments during development. The fitness considers how well the genome functions in both environments. The application is a sequential counter, where counting is based on the state information of the entire cellular space and the sequential operation of the functional components of the cells. The application thus places a requirement on the tuning of the development genome (by evolution) and the emerging phenotype (by development) for such sequential digital circuit behaviour. A counting sequence is defined in the cellular array as the number of logical "1"s in the cellular array increasing by one for each state step. The goal is to achieve the counting behaviour in both environments, i.e., the same functionality. In this case, a life-time fitness evaluation [12], extended for environmental adaptation [8], was applied. This experiment is similar to those performed in [8], where different environments were selected, but in this case the environment affects the developing phenotype directly. The second set of experiments exposes the genome to the same two environments. However, a fitness function is connected to each given environment. The same genome needs to produce all "1"s if developed in environment A and all "0"s if developed in environment B to achieve high fitness. This may be seen as a way of simulating environmental adaptation requiring different behaviour depending on the organism's environmental factors. In this case, fitness is based on the organism's functional output at the last state step of the last development step, i.e., a static behaviour [15].

4.1
Experimental Setup
In all experiments the development process was apportioned 100 development steps. Each development step was set to include 100 state steps. The maximum size of the organism was set to 1024 cells, in an array of 32 by 32 cells. Note that the size of the array is important with respect to the number of cells outside the emerged organism, i.e., the number of cells in the environment.
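Under the assumptions of Section 3 (synchronous updates, empty cells holding their initialised state), the development-step/state-step scheduling might be sketched as follows (the `lut` and `apply_rules` helpers are assumptions; toroidal edges are used for brevity):

```python
def state_step(types, states, lut):
    """One synchronous clock pulse: every non-empty cell's output is
    recomputed from its neighbours' and its own previous outputs via its
    type's LUT (lut(cell_type, inputs) -> 0 or 1). Empty cells (type 0)
    read only their own previous output, so they retain the environmental
    state they were initialised with."""
    h, w = len(states), len(states[0])
    nxt = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if types[y][x] == 0:                  # Empty: hold own state
                nxt[y][x] = states[y][x]
            else:
                inputs = (states[(y - 1) % h][x],  # N
                          states[y][(x + 1) % w],  # E
                          states[(y + 1) % h][x],  # S
                          states[y][(x - 1) % w],  # W
                          states[y][x])            # feedback to itself
                nxt[y][x] = lut(types[y][x], inputs)
    return nxt

def development_step(genome, types, states, apply_rules, lut, state_steps=100):
    """One development step: a synchronous rule-based update of all cell
    types, followed by `state_steps` state steps."""
    types = apply_rules(genome, types, states)
    for _ in range(state_steps):
        states = state_step(types, states, lut)
    return types, states
```

A full run in these experiments would then be 100 calls to `development_step`, each performing 100 state steps on the 32 by 32 array.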
Table 1. Definition of cell types and their functionality

Cell type   LUT hex       Function name
0           0xFFFF0000    Empty (no change)
1           0x66666666    XORd  W ⊕ S
2           0x3D3D3D3D    XORc  E ⊕ S
3           0x0FF00FF0    XORb  N ⊕ E
4           0x55AA55AA    XORa  W ⊕ N
5           0x55FF55FF    NAND  W • N
6           0xFF00FF00    ↓ DownPropagation
7           0xCCCCCCCC    ↑ UpPropagation
8           0xF0F0F0F0    ← LeftPropagation
9           0xAAAAAAAA    → RightPropagation
10          0xE8808000    T ≥ 4
11          0xFEE8E880    T ≥ 3
12          0xFFFEFEE8    T ≥ 2
The environments applied were "A": a single-state environment consisting of a single cell (the zygote) set to output a logical "1" (see Figure 3(b)); and "B": an alternating-state environment where every other empty cell was set to output a logical "1" (see Figure 3(c)). The number of available cell types was set to thirteen, including the empty cell type. The available cell types were based on Sipper's universal non-uniform CA cell [17] and threshold elements [18]. Table 1 provides the set of available cell types, together with their functional LUT definitions and graphical symbols. The zygote was defined to be of type 5 (NAND). The evolutionary algorithm chosen was a Genetic Algorithm (GA), a modified version of a straightforward GA found in [19]. The GA's crossover operator was modified such that genes are left undisturbed, and a variable number of crossover points was implemented. The genome size was set to 32 rules and the population size to 16. The initial population consisted of randomly generated valid rules. However, invalid rules may arise through the application of genetic operators. The crossover rate was set to 0.5 and the mutation rate for each gene to 0.0017. The GA was set to terminate after 100 000 generations. Ten runs were conducted for each experiment. The experiments were executed on a cPCI machine including a cPCI PC running the GA. Each genome was transferred over the cPCI bus to an FPGA card [20]. The development process and functional behaviour of the cellular array were executed on the FPGA [21].

4.2
Environmental Influence Exploited to Retain Functionality
In this experiment, the goal was to show that the phenomenon of phenotypic plasticity, in terms of maintaining functionality, could be achieved through the exploitation of external environmental information. Such functional maintenance is highly relevant in the context of achieving robust behaviour. The counting behaviour described in Section 4 was applied as the application sought and thus the behaviour to be maintained.
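The counting-behaviour measure used as fitness here can be sketched as follows (a hypothetical reconstruction of the measure described in Section 4, not the authors' code):

```python
def longest_count_sequence(ones_per_step):
    """Length of the longest run of state steps in which the number of
    logical '1's in the cellular array increases by exactly one per step.
    Input is the sequence of '1'-counts, one entry per state step."""
    if not ones_per_step:
        return 0
    best = run = 1
    for prev, cur in zip(ones_per_step, ones_per_step[1:]):
        run = run + 1 if cur == prev + 1 else 1
        best = max(best, run)
    return best
```

In the life-time scheme, such a measurement would be taken at each development step and accumulated over the run, once per environment.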
[Figure: six panels for one genome developed in environment A (a–c) and environment B (d–f). (a)/(d): type plots of the grid at development step 100. (b)/(e): gene activation patterns, with rule number (left Y-axis) and number of active rules (right Y-axis) versus development step. (c)/(f): count sequence length versus development step.]

Fig. 4. Environmental influence exploited to retain functionality. Result of a genome developed in two different environments.
The output data format of the experiments can be seen in Figure 4. First comes a type plot, illustrating the type of each cell in the grid at development step 100, i.e., the grid represents the emerged organism and its environment. Note that only this graph depicts activity at a single point in time. The following two graphs depict the results for each development step. The second graph presents the gene activation pattern during development, together with the number of active rules at each development step. Rule numbers from 0 to 31 are placed on the left Y-axis. A mark (+) in the plot indicates that the rule was activated at the given development step. The right Y-axis shows the number of cells in the organism with an active rule at a given development step, illustrated by the plotted line. Finally, the third graph shows the behaviour of the emerging organism at each development step, e.g., the longest counting sequence at each step. Figure 4(a) presents the type plot of the given genome at development step 100 in environment A. The same genome applied to environment "B" results in the type plot of Figure 4(d). As illustrated, the two environments influence the developing phenotype, resulting in the emergence of different organisms with respect to the cell types at different points in the grid. However, the functionality, i.e., the counting behaviour of the emerged organisms (illustrated in Figures 4(c) and 4(f)), is almost identical, indicating that phenotypic plasticity in terms of retaining functionality in different environments is present. If the gene activation plots in Figures 4(b) and 4(e) are examined, the origin of the different phenotypes is clear. The development process exploits the environment by activating
different rules depending on the environment. In addition, the rules common to both environments are to some extent exploited at different times during development. In conclusion, a single genome has resulted in two structurally different phenotypes by allowing external information to affect gene regulation.

4.3
Environmental Influence Exploited to Adapt Functionality
In this experiment, the goal was to evolve developmental genomes that could respond to different environments by producing phenotypes with different functional properties. The experimental setup was such that if the genome was developed in environment "A", the target organism should output a logical "1" at each cell. On the other hand, if developed in environment "B", the target organism should output a logical "0" at each cell. Figure 5 presents the result of one of the genomes from the ten runs. The results are presented in the type grid and active-rules plots, as in the experiments of Subsection 4.2, for environment "A" and environment "B" respectively. However, here the functionality plot is plotted with respect to state steps rather than development steps. Figure 5(c) provides the result of the chosen genome developed in environment "A". As shown, the sought functionality is achieved: all cells output a logical "1". That is, an organism has emerged with such a functionality under the influence of environment "A". Further, when the same genome is exposed
[Figure: six panels for one genome developed in environment A (a–c) and environment B (d–f). (a)/(d): type plots. (b)/(e): gene activation patterns, with rule number (left Y-axis) and number of active rules (right Y-axis) versus development step. (c)/(f): number of set cells versus state step.]

Fig. 5. Environmental influence exploited to adapt functionality. Result of a genome developed in two different environments.
to environment "B", the developing phenotype's behaviour responds quite differently, i.e., an organism emerges where the cells output a logical "0" (see Figure 5(f)). Further, Figures 5(a) and 5(d) illustrate how the environmental information forms the structure of the phenotype to achieve the specified target behaviour. That is, two distinctly different phenotypes emerge, both structurally and functionally. Thus phenotypic plasticity is achieved in terms of varying functionality in response to environmental influences. In conclusion, a single genome has resulted in two functionally different phenotypes. The structural and functional differences of the phenotypes are a result of environmental influence on gene regulation. The results selected for presentation herein are taken from the ten GA runs conducted for each experiment. In all runs the phenomenon of environmental influence on gene regulation was present. However, the effect on structural plasticity may be present at different stages of development.
5
Conclusions and Further Work
This paper sought to investigate the notion of phenotypic plasticity and to see whether it was possible to incorporate environmental information into the developing organism such that this property, inherent in biological development, could be achieved. The results clearly show that it is possible both to maintain functionality in the presence of differing environments and to allow the emerging phenotype to adapt its functionality to different environments. The experiments demonstrate how the evolved developmental genomes can exploit environmental information in their gene activation to achieve the goal behaviour sought. In the experiments presented, the environmental information was limited to a set of two different external environments. To further investigate environmental influence on robust behaviour, extended work is needed. Ongoing work includes an extension of the set of external environments so as to run more thorough experiments along the lines of the work conducted in [8]. Regarding the experiment exploiting the environment to cue different behaviour, ongoing work includes the incorporation of more information into the environment so as to induce several environmental cues exploitable by the developing organism.
References

1. Kitano, H.: Designing neural networks using genetic algorithms with graph generation systems. Complex Systems 4(4), 461–476 (1990)
2. Sipper, M., Sanchez, E., Mange, D., Tomassini, M., Pérez-Uribe, A., Stauffer, A.: A phylogenetic, ontogenetic, and epigenetic view of bio-inspired hardware systems. IEEE Transactions on Evolutionary Computation 1(1), 83–97 (1997)
3. Lantin, M., Fracchia, F.: Generalized context-sensitive cell systems. In: Proceedings of Information Processing in Cells and Tissues, University of Liverpool, pp. 42–54 (1995)
4. Larsen, E.W.: A view of phenotypic plasticity from molecules to morphogenesis. In: Environment, Development, and Evolution: Toward a Synthesis, pp. 117–124. MIT Press, Cambridge (2004)
5. Quick, T., Dautenhahn, K., Nehaniv, C.L., Roberts, G.: On bots and bacteria: Ontology independent embodiment. In: Floreano, D., Mondada, F. (eds.) ECAL 1999. LNCS, vol. 1674, pp. 339–343. Springer, Heidelberg (1999)
6. Beer, R.D.: A dynamical systems perspective on agent-environment interaction. Artificial Intelligence 72(1–2), 173–215 (1995)
7. Miller, J.F.: Evolving developmental programs for adaptation, morphogenesis, and self-repair. In: Banzhaf, W., Ziegler, J., Christaller, T., Dittrich, P., Kim, J.T. (eds.) ECAL 2003. LNCS (LNAI), vol. 2801, pp. 256–265. Springer, Heidelberg (2003)
8. Tufte, G., Haddow, P.C.: Achieving environmental tolerance through the initiation and exploitation of external information. In: Congress on Evolutionary Computation (CEC 2007). IEEE Computer Society Press, Los Alamitos (2007)
9. Federici, D.: Evolving a neurocontroller through a process of embryogeny. In: SAB 2004. LNCS, pp. 373–384. Springer, Heidelberg (2004)
10. Gordon, T.G.W., Bentley, P.J.: Development brings scalability to hardware evolution. In: The 2005 NASA/DoD Conference on Evolvable Hardware (EH'05), pp. 272–279. IEEE Computer Society Press, Los Alamitos (2005)
11. Miller, J.F.: Evolving a self-repairing, self-regulating, french flag organism. In: Deb, K., et al. (eds.) GECCO 2004. LNCS, vol. 3102, pp. 129–139. Springer, Heidelberg (2004)
12. Tufte, G., Haddow, P.C.: Identification of functionality during development on a virtual Sblock FPGA. In: Congress on Evolutionary Computation (CEC 2003), pp. 731–738. IEEE Computer Society Press, Los Alamitos (2003)
13. Bongard, J.C., Pfeifer, R.: Evolving complete agents using artificial ontogeny. In: Morpho-functional Machines: The New Species (Designing Embodied Intelligence), pp. 237–258. Springer, Heidelberg (2003)
14.
Viswanathan, S., Pollack, J.B.: How artificial ontogenies can retard evolution. In: Genetic and Evolutionary Computation Conference (GECCO 2005). ACM Press, New York (2005)
15. Tufte, G.: Cellular development: A search for functionality. In: Congress on Evolutionary Computation (CEC 2006). IEEE Computer Society Press, Los Alamitos (2006)
16. Haddow, P.C., Tufte, G.: An evolvable hardware FPGA for adaptive hardware. In: Congress on Evolutionary Computation (CEC 2000), pp. 553–560 (2000)
17. Sipper, M.: Evolution of Parallel Cellular Machines: The Cellular Programming Approach. Springer, Heidelberg (1997)
18. Beiu, V., Quintana, J.M., Avedillo, M.J.: VLSI implementations of threshold logic: a comprehensive survey. IEEE Transactions on Neural Networks 14(5), 1217–1243 (2003)
19. Spears, W.M.: GA archives source code collection webpage, http://www.aic.nrl.navy.mil/galist/src/ (1991)
20. Nallatech: BenERA User Guide, NT107-0072 (issue 3), 09-04-2002 edn. (2002)
21. Tufte, G., Haddow, P.C.: Biologically-inspired: A rule-based self-reconfiguration of a Virtex chip. In: Bubak, M., van Albada, G.D., Sloot, P.M.A., Dongarra, J.J. (eds.) ICCS 2004. LNCS, pp. 1249–1256. Springer, Heidelberg (2004)
UDT-Based Multi-objective Evolutionary Design of Passive Power Filters of a Hybrid Power Filter System*

Shuguang Zhao, Qiu Du, Zongpu Liu, and Xianghe Pan

College of Information Sciences and Technology, Donghua University, 2999 North Renmin Road, Songjiang, Shanghai, 201620, China
[email protected]
Abstract. Passive Power Filters (PPFs) are vital equipment for harmonic pollution control, but the optimal design of PPFs is a rather difficult problem of multi-objective optimization and evaluation. On the basis of our previous work in evolvable hardware, especially evolutionary design of circuits, a Uniform Design Technique (UDT) based multi-objective genetic algorithm was developed for the optimal design of PPFs. It is characterized by an efficient and effective encoding-decoding scheme based on the standard series of component values; a mechanism to integrate and evaluate multiple objectives based on the UDT, weight-vector adjustment and PSpice simulation; a UDT-based multi-parent crossover operator to improve offspring quality and computation cost; and an adaptation technique for genetic parameters to maintain individual diversity and track the GA process. Simulation results show that it is capable of searching out a set of effective PPF designs, which meet the main optimization objectives and roughly reflect the interactions between them.
1 Introduction

Although born just 15 years ago, evolvable hardware (EHW) has so far seen much significant progress, which argues that EHW is not only promising but also applicable [1-3]. During the last decade, we have continuously researched EHW, especially the evolutionary design of circuits. The outcomes we obtained include minterm-based efficient encoding and function-level evolution of digital circuits, approaches to scalable EHW that combine evolutionary design with data mining, and PSpice-based multi-objective optimization of analog circuits [3]. On these bases we recently studied multi-objective optimization of power filters, especially Passive Power Filters (PPFs), which are fundamental and vital to harmonic pollution control. In this paper, an evolutionary approach to the optimal design of PPFs is proposed. It is also an effort to narrow the gap between research and real-world applications of evolvable hardware. The experimental results presented are promising and argue that evolvable hardware is essentially applicable. The remainder of this paper is organized as follows. In Section 2, the technical background and current status of optimal design of PPFs are briefed. The evolutionary
* This work was supported by the National Natural Science Foundation of China (No. 60672026).
L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 309–318, 2007. © Springer-Verlag Berlin Heidelberg 2007
310
S. Zhao et al.
approach we developed, i.e., a UDT-based multi-objective adaptive Genetic Algorithm (GA), is introduced in detail in Section 3. Some experimental results are presented in Section 4. Section 5 concludes this paper.
2 Background and Problem Statement

Mainly due to the wide use of nonlinear loads, harmonic pollution has become an increasingly serious problem, causing many other problems such as low power utilization efficiency, malfunction, shortened life or even destruction of electric facilities, and widespread electromagnetic interference (EMI). Several types of equipment are now available to control harmonic pollution, e.g., PPFs, Active Power Filters (APFs), and Hybrid Power Filter Systems (HPFSs). Mainly consisting of (electric) capacitors and inductors, PPFs are simple, cost-effective and nearly maintenance-free. As a result, PPFs were applied first of all and have been commonly used so far, although they lack stability of performance and security of operation when the source impedance and/or load vary. In contrast with PPFs, APFs have better and more stable filtering performance but higher installation costs and smaller filter capacities. Arising from combinations of PPFs and APFs, HPFSs inherit the advantages of both while mitigating their drawbacks, making them the most promising and applicable option. However, the overall cost and performance of an HPFS are still ultimately affected by how well its constituent PPFs are optimized [4, 5]. In other words, PPFs, either used alone or comprised in HPFSs, are vital to harmonic pollution control. Thus, it is still meaningful to research the optimal design of PPFs for the sake of lower cost, higher performance and more practical equipment for harmonic pollution control. Unfortunately, the optimal design of PPFs is a quite difficult problem of multi-objective nonlinear optimization with constraints [4, 5]. It must meet not only technical specifications but also many other requirements including cost and safety (stability), under many constraints arising from practical factors, e.g., the impedance and frequency of the power system, harmonic source characteristics, component tolerance, and even variation of the environment temperature.
The complexity of PPF performance evaluation makes the problem harder still. As a result, there are few effective approaches to the optimal design of PPFs at present, although some relevant papers can be found in major journals and proceedings. Traditional methods of PPF design are generally based on specific engineering experience or simplified models and aimed at just partial specifications; consequently, they are incapable of finding optimal solutions [4]. Most existing methods of PPF optimization actually aim at just harmonic suppression, and they usually comprise iterative steps derived from nonlinear programming, which work only when the gradients or partial derivatives of the objective functions are continuous [5, 6, 7]. Consequently, they are generally incapable of global multi-objective optimization. Some recently proposed methods that apply novel optimization techniques (e.g., GAs) could also be improved in terms of multi-objective optimization, efficiency and success rate [8, 9]. Thus, our effort in the evolutionary design of PPFs is meaningful and noticeable.
3 The UDT-Based Multi-objective Adaptive GA

By modifying a framework of an Elitist-reserve Genetic Algorithm (EGA), which is theoretically guaranteed to converge, with the set of techniques introduced hereafter, the evolutionary approach we developed is promising for efficient overall optimization of PPFs.

3.1 Sequence-Number-Based Encoding

While PPFs have several different circuit topologies, which mainly differ in the numbers and types of the tuning filters contained, they have little difference in operation principles and optimization requirements. As illustrated in Figure 1, a typical PPF usually consists of several Single Tuning Filters (STFs) and a High Pass Filter (HPF) [1, 2]. For the GA to adopt a short and efficient chromosome, a standard-value series of 1% precision class, which has just 96 discrete values in each interval of one decade (e.g., [0.1, 1], [1, 10]), is chosen to encode a PPF. That is, the value of each PPF component (resistor, capacitor or inductor) is encoded with a 9-bit binary number (sequence encoding) that equals its sequence number in the standard-value table. Consequently, the chromosome is assembled in the form of a circuit net-list,

X = [Cb1]2 & [Lb1]2 & [Rb1]2 & ... & [Ch]2 & [Lh]2 & [Rh]2        (1)
where [C_b1]_2, [L_b1]_2, and [R_b1]_2 indicate the sequence encodings of the capacitor, inductor, and resistor of the first STF, respectively; the components of the other STFs are encoded in the same way; and [C_h]_2, [L_h]_2, and [R_h]_2 indicate the sequence encodings of the corresponding HPF components. Since 9 bits suffice for a sequence encoding to span a value range of five orders of magnitude, just one half of the length required by direct binary encoding schemes, the chromosome length of a PPF having M STFs is Lc = 27M + 27. Because the computation complexity of decoding (i.e., looking up the standard-value
Fig. 1. Diagram of a 3-phase Parallel hybrid power filter system
312
S. Zhao et al.
table) is negligible and the computation complexity of the genetic operations is about O(Lc²), with such an encoding-decoding scheme the GA computation complexity can be reduced by a factor of about 4, while tolerance to component-value errors, and consequently the practicability and economy of the design results (PPFs), can be largely improved.

3.2 UDT-Based Integration of Multi-objectives

For problems of multi-objective optimization like circuit design [3], a frequently used fitness function is a weighted sum of the objective functions, which converts such a problem into a one-objective equivalent that an evolutionary algorithm (EA) can handle:

Maximize  f(X) = Σ_{i=1}^{k} w_i · f_i(X)    (2)
An approach with such a fitness function is simple and easy to implement, but its search direction is confined to the one determined by the weight vector {w_i}. As a result, each run can find at most one Pareto-optimal solution, and the solutions obtained from multiple runs have no expected distribution uniformity. Approaches that treat less important objectives as constraints have similar deficiencies. Recently, some efforts were made to solve this problem by using multiple weight vectors to obtain multiple search directions and multiple solutions, but the requirement of distribution uniformity often remains unsatisfied, mainly because the weight values are designated arbitrarily or generated randomly. We designed a novel uniform-design-based multi-objective adaptive genetic algorithm (UMOAGA) on the basis of the Uniform Design Technique (UDT) [10] and our previous work [3, 11], and applied it to the evolutionary design of circuits. The UDT is a powerful experiment design technique [10]. For an experiment with n factors (i.e., variables) and q levels (i.e., numbers of possible values) per factor, the UDT helps select q representative combinations out of the q^n possible ones. The selected combinations correspond to points uniformly scattered in the combination space, and partial experiments based on them, which satisfactorily reflect each factor's effect and the factors' interactions, have outcomes similar to those of a complete experiment. Many uniform design tables and matched application tables have been constructed to ease application of the UDT [10]. It has been proved that a weight vector derived from the combinations selected by the UDT has a desirable property: its components, or weight coefficients, have the expected proportion relationships with each other in the sense of uniformity and integrality [3].
Hence, the task of composing m fitness functions fit_i from k objective functions f_i(X) can be regarded as an experiment design problem with k factors and m levels per factor. By looking up the UDT tables indexed by k and m we can obtain a uniform design matrix U(k, m), each row of which corresponds to a selected combination, and then calculate normalized weight coefficients as follows:

w_{i,j} = U_{i,j} / Σ_{j=1}^{k} U_{i,j},   i = 1, …, m;  j = 1, …, k    (3)
Obviously, the weight coefficients are all positive and normalized, Σ_{j=1}^{k} w_{i,j} = 1. Thus, after normalizing the objective functions to prevent any of them from dominating, we can compose m normalized fitness functions fit_i by combining Equation (2) with Equation (3):

fit_i = Σ_{j=1}^{k} w_{i,j} · h_j(X),   i = 1, …, m    (4)
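A small Python sketch of Equations (3)-(4). The 5 × 2 matrix U below is only illustrative; a real application would take U(k, m) from the published uniform design tables [10], and the toy functions stand in for the normalized objectives h_j:

```python
# Illustrative uniform design matrix: m = 5 weight vectors (rows)
# over k = 2 objectives (columns).
U = [[1, 2],
     [2, 4],
     [3, 1],
     [4, 3],
     [5, 5]]

def udt_weights(U):
    """Row-normalised weight coefficients, Eq. (3)."""
    return [[u / sum(row) for u in row] for row in U]

def make_fitness(weights, objectives):
    """One scalar fitness function per weight vector, Eq. (4)."""
    return [lambda X, w=w: sum(wi * h(X) for wi, h in zip(w, objectives))
            for w in weights]

W = udt_weights(U)                       # each row sums to 1
# two toy normalised objectives standing in for h_1, h_2
fits = make_fitness(W, [lambda x: x, lambda x: 1 - x])
```

Each element of `fits` drives one GA run (or sub-population) along its own search direction.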
With such uniformly composed fitness functions fit_i, the genetic search of the GA is guided along m search directions scattered uniformly toward the Pareto frontier, as exemplified in Figure 2.
Fig. 2. A set of weight vectors and search directions corresponding to U(2,5)
3.3 PSpice-Based Multi-objective Evaluation of PPF Candidates

In PPF design, because the objective functions involved vary in dimension and magnitude, they must be normalized before being integrated into a fitness function. First, the difference between the ideal magnitude-frequency curve and that of an individual, e_f(X, t), is used to estimate the filtering effect of the PPF:

e_f(X, t) = Σ_{k=1}^{P} a(f_k) · [V_goal(f_k) − V_real(X, t, f_k)]²    (5)

where f_k indicates a sample frequency, k = 1, …, P; V_goal(f_k) and V_real(X, t, f_k) indicate the ideal value and the actual one at f_k, respectively; and a(f_k) is a weight factor reflecting the importance of the error at f_k. Thus, a normalized objective function of harmonic suppression can be defined as

F_1(X, t) = exp(−K_F1 · e_f(X, t))    (6)
Second, the installation cost of a PPF mainly depends on that of its electric components, including capacitors, inductors, and resistors, which is ultimately determined by the relevant electric specifications (e.g., the rated capacities of the capacitors). Considering that the cost of the capacitors has the largest impact on the installation
cost, and that there exists a positive correlation between the rated capacity, C_p (kvar), and the capacitance, C (μF), of an electric capacitor,

C_p = ω · C · U_m²    (7)
where U_m (kV) is the rated voltage and ω is the angular frequency of the mains, in this paper the cost of the inductors and resistors is ignored and the cost of the capacitors is estimated by their capacitances. Thus, a normalized objective function of installation cost can be defined as

F_2(X, t) = 1 − K_F2 · Σ_{j=1}^{M+1} C_j(X, t)    (8)
where K_F2 is a positive constant and M is the number of STFs, so the sum runs over the M STF capacitors plus the HPF capacitor. Then the total optimization objective and the corresponding fitness function can be expressed as

max F_S(X, t) = Σ_{i=1}^{2} w_i(t) · F_i(X, t)    (9)
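Equations (5)-(9) can be sketched as follows. The sample values, the weight factors a(f_k), and the constants K_F1 and K_F2 are all illustrative; in the actual method V_real comes from the PSpice simulation:

```python
import math

def harmonic_objective(v_goal, v_real, a, kf1=0.01):
    """F1 = exp(-KF1 * e_f), Eqs. (5)-(6); closer curves -> F1 nearer 1."""
    e_f = sum(ak * (g - r) ** 2 for ak, g, r in zip(a, v_goal, v_real))
    return math.exp(-kf1 * e_f)

def cost_objective(capacitances_uF, kf2=1e-3):
    """F2 = 1 - KF2 * sum(Cj), Eq. (8); cheaper designs -> F2 nearer 1."""
    return 1.0 - kf2 * sum(capacitances_uF)

def total_fitness(f1, f2, w=(0.5, 0.5)):
    """Weighted-sum fitness FS, Eq. (9)."""
    return w[0] * f1 + w[1] * f2

# toy magnitude-frequency samples and the capacitances of one candidate
f1 = harmonic_objective(v_goal=[0, 0, 0], v_real=[1, 2, 1], a=[1, 1, 1])
f2 = cost_objective([22.6, 17.4, 255.0])
fs = total_fitness(f1, f2)
```

Both objectives are mapped into comparable, dimensionless ranges before the weighted sum, which is exactly what the normalization above is for.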
On these bases, since PPFs are similar to electronic passive filters in operation principle, PSpice, a well-known circuit simulation package, is adopted to evaluate the main performance of PPFs. The evaluation process goes as follows: first, decode the individual to be evaluated into components and compose a net-list file (.cir) in accordance with the PSpice conventions; second, execute a PSpice simulation using the net-list file as its data source; third, read the result file generated by the simulation and calculate the fitness functions for the individual; finally, after evaluating the fitness of all individuals in the current population, adapt the genetic probabilities according to the relevant equations. In this way the PPF is evaluated as a whole, so more effective design results can be obtained than with other methods, which treat each sub-filter independently.

3.4 UDT-Based Multi-parent Crossover

Crossover is a primary and computation-intensive way for a GA to breed individuals. Traditional crossover operators, which mate two chosen individuals to make new individuals, tend to produce degenerate offspring and have low operation efficiency. Considering these defects, we designed a novel UDT-based crossover operator, which chooses multiple individuals and mates them to make multiple offspring. For binary-coded GAs, provided that an individual is encoded as P_i = b_{i,1} & b_{i,2} & … & b_{i,L}, b_{i,j} ∈ {0, 1}, j = 1, …, L, the crossover operation can be outlined as follows:

Step 1. After determining the number of parents m, choose a natural number n < m. Look up the UDT tables indexed by m and n to obtain a uniform design matrix U(n, m) = [U_{i,j}]_{m×n}.

Step 2. Randomly divide the set of gene positions S = {1, …, L} into n subsets S_i that satisfy S_i ∩ S_j = ∅ for i ≠ j and ∪_i S_i = S, i = 1, …, n.

Step 3. Randomly choose m parents in a proper manner. Sort them in decreasing order of fitness and associate each of them with its sequence number, which ranges from 1 to m.
Step 4. Partition the gene positions of each of the m parents in accordance with S_j (j = 1, …, n), i.e., divide each parent into n sub-strings.

Step 5. Iteratively produce m offspring by using each row of U(n, m) to compose one offspring: for the i-th offspring, copy the j-th partition of the parent indicated by the element U_{i,j} as its j-th partition, where the subset S_j indicates which gene positions are involved.

For real-coded or integer-coded GAs, the idea and operations of multi-parent crossover are very similar to those for binary-coded GAs, but each of the m offspring is produced by computing a weighted sum of the n parents on the basis of the uniform design matrix U(n, m) obtained, i.e.,
X'_i = Σ_{j=1}^{n} (U_{i,j} · X_j) / Σ_{j=1}^{n} U_{i,j},   i = 1, …, m    (10)
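A Python sketch of the real-coded variant, Eq. (10); the matrix U below is illustrative, not a published uniform design table:

```python
# Illustrative U(n, m): m = 4 offspring (rows) from n = 3 parents (cols).
U = [[1, 2, 3],
     [2, 3, 1],
     [3, 1, 2],
     [1, 3, 2]]

def udt_crossover(parents, U):
    """Each offspring is a U-weighted average of the parents, Eq. (10)."""
    genes = len(parents[0])
    children = []
    for row in U:
        s = sum(row)
        children.append([sum(u * p[g] for u, p in zip(row, parents)) / s
                         for g in range(genes)])
    return children

parents = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
kids = udt_crossover(parents, U)
# every offspring gene lies inside the parents' convex hull
assert all(0.0 <= g <= 1.0 for kid in kids for g in kid)
```

Because the weights are positive and normalized, each offspring is a convex combination of its parents, which is what keeps the operator from producing out-of-range genes.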
In order to keep a constant population size, it is convenient to choose m from the factors of Ps, the population size; otherwise an extra random or tournament selection is needed.

3.5 Adaptation Strategy for GA Parameters

The optimal values of some GA parameters, e.g., the crossover probability Pc and the mutation probability Pm, vary with the specific application, the population distribution, and the stage of the GA process. A reasonable and generally effective way to improve a GA's performance is to make Pc and Pm self-adaptive. In this paper, Pc and Pm adapt to both the stage of the genetic procedure and the individuals' diversity. The former is estimated by the relative generation number, the ratio of the current generation number t to the maximal generation number allowed, t_max. The latter is measured by the concentration degree of the individuals in the population, estimated as

f_d(t) = f_avg(t) / [ε + f_max(t) − f_min(t)]    (11)

where f_avg(t), f_max(t), and f_min(t) are the average, maximal, and minimal fitness of all individuals in the population, respectively, and ε is a small positive constant. Obviously the index f_d(t) satisfies 0 < f_d(t) < +∞, and it varies with the individuals' diversity in the reverse direction: the more concentrated the population, the larger f_d(t). On this basis, Pc and Pm adapt themselves as follows:

Pc = Pc0 · e^(−b1·t/t_max) / f_d(t),   0 ≤ t ≤ t_max    (12)

Pm = Pm0 · e^(−b2·t/t_max) · f_d(t),   0 ≤ t ≤ t_max    (13)
where Pc0 and Pm0 are the initial values of Pc and Pm, respectively; t0 and t1 are generation numbers that satisfy 0 ≤ t0 ≤ t1 ≤ t_max; and b1 and b2 are positive constants. With the above equations and properly selected parameters, like those adopted in this
paper (Pc0 = 0.8, Pm0 = 0.1, t0 = 0.2·t_max, and t1 = 0.8·t_max), an increase of f_d(t), which reflects a concentrating tendency of the population (i.e., premature convergence), leads to an increase of Pm and a decrease of Pc, whereas a decrease of f_d(t) leads to a decrease of Pm and an increase of Pc. Meanwhile, Pc and Pm slowly decrease as the GA process develops. Thus, Pc and Pm remain nearly optimal throughout the GA execution.

3.6 Summary of the Algorithm

Combining the EGA framework with the set of techniques introduced above, the GA procedure can be outlined as follows (parameters: Ps = population size, t_max = maximum generation allowed, (k, m) for the UDT-based integration of multiple objectives, (s, q) for the UDT-based multi-parent crossover):

Step 1. Initialization: obtain the uniform design matrices U(k, m) and U(s, q) by looking up the tables; compose the fitness functions according to Equations (5)-(9); t ← 0.

Step 2. Initial population: randomly generate N·Ps individuals (N ≥ 1; N = 2 in this paper); with each of the m fitness functions, choose the Ps/m fittest individuals, making up an initial population of Ps individuals uniformly distributed in the search space.

Step 3. Genetic iteration: repeat the following substeps until t ≥ t_max:
1) According to Equations (11)-(13), compute Pc and Pm;
2) Using the uniform design matrix U(s, q), execute the multi-parent crossover described above;
3) On all offspring, execute one-point mutation with probability Pm;
4) For each individual, execute in order decoding, PSpice simulation, and fitness evaluation;
5) Update the population: with each of the m fitness functions, choose the Ps/m fittest individuals, making up a new population of Ps individuals;
6) t ← t + 1.

Step 4. According to the k objective functions, identify the Pareto-optimal solutions in the last-generation population and output them.
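The outlined procedure can be sketched as a compact Python skeleton. This is purely illustrative: the PSpice-based evaluation is replaced by a dummy fitness, individuals are plain scalars, the crossover and mutation are simple stand-ins, and all constants are arbitrary:

```python
import math
import random

random.seed(1)
PS, T_MAX, N = 20, 30, 2

def evaluate(x):
    # stand-in for decode -> PSpice simulation -> Eq. (9)
    return math.exp(-(x - 0.3) ** 2)

# Step 2: oversample N*PS random individuals, keep the PS fittest
pop = sorted((random.random() for _ in range(N * PS)),
             key=evaluate, reverse=True)[:PS]
best0 = pop[0]

for t in range(T_MAX):                                        # Step 3
    fits = [evaluate(x) for x in pop]
    fd = (sum(fits) / PS) / (1e-9 + max(fits) - min(fits))    # Eq. (11)
    pm = min(1.0, 0.1 * math.exp(-3.0 * t / T_MAX) * fd)      # Eq. (13)
    kids = [(a + b) / 2 for a, b in zip(pop, reversed(pop))]  # crossover stand-in
    kids = [min(1.0, max(0.0, k + random.gauss(0, 0.1)))
            if random.random() < pm else k for k in kids]     # mutation
    pop = sorted(pop + kids, key=evaluate, reverse=True)[:PS] # elitist update

# elitism guarantees the best individual never worsens
assert evaluate(pop[0]) >= evaluate(best0)
```

The elitist update (keeping the Ps fittest of parents plus offspring) is what makes the fitness of the best individual monotone non-decreasing, mirroring the convergence guarantee of the underlying EGA framework.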
4 Experiments and Discussion

A GA program was developed according to the above ideas, and several experiments were completed with it. For an HPFS to be used with resistor-inductor loads and harmonic sources such as rectifiers, which feature harmonics of orders 6n ± 1, the PPF embedded in each of the three phases consists of an STF for the 5th harmonic, an STF for the 7th harmonic, and an HPF. By taking the APF as an ideal voltage source with the inner resistors of its transformer coils (i.e., R5 and R7), the simulation model shown in Figure 3 is applicable. Both harmonic suppression and installation cost were taken as optimization objectives. Five search directions, or weight vectors, were chosen with Equation (3) and the uniform design matrix U(2,5) (refer to Figure 2), each of which
Fig. 3. Structure and simulation model of the PPF

Table 1. A typical group of PPF design results obtained with the approach

Parameters                Result 1   Result 2   Result 3   Result 4   Result 5
5th STF   C5 (μF)         22.6       29.6       28.7       31.6       29.4
          L5 (mH)         17.8       15.2       14.0       13.0       13.7
          R5 (Ω)          0.10       0.01       0.10       0.01       0.01
7th STF   C7 (μF)         17.4       15.0       16.7       22.6       20.5
          L7 (mH)         12.1       13.7       13.7       9.09       10
          R7 (Ω)          0.10       0.20       0.01       0.02       0.01
HPF       Ch (μF)         255        301        335        348        389
          Lh (μH)         576        453        410        392        348
          Rh (Ω)          1.62       1.21       1          1.10       1
THD with PPF              4.5%       4.1%       3.5%       3.2%       3.0%
Total capacitor capacity  295.0 μF   345.6 μF   380.4 μF   402.2 μF   438.9 μF
was used for five executions of the GA program. The other experimental parameters were: population size Ps = 100, maximum generation allowed t_max = 200, Pc0 = 0.8, Pm0 = 0.1, b1 = b2 = 3, and (s, q) = (3, 4). Typical design results are listed in Table 1, where THD is short for Total Harmonic Distortion, which is 28% without the PPF. It can be concluded from Table 1 that the influence of the initial weight vectors on the PPF design results, regarding preferences among the sub-objectives, is observable, and that the listed design results not only meet different preferences but also approximately reflect the tradeoff relationship between the sub-objectives, which can be described more precisely by increasing the number of experiment levels, m.
5 Conclusions

Toward synthetically optimized PPFs, and consequently HPFSs, a UDT-based multi-objective adaptive GA was proposed in this paper. Both theoretical analyses and experimental results show that the GA reduces computation cost, obtains better crossover results (offspring), and finds a set of representative Pareto-optimal solutions, mainly by virtue of the extensive application of the UDT to each
important step of the algorithm. Moreover, the strategy of multi-objective evaluation and optimization, based on sequence-number encoding and PSpice simulation, which takes into account both component tolerances and the interaction between the sub-filters, helps obtain more effective and representative PPF design results. Thus, this paper argues that evolvable hardware techniques are essentially applicable to this domain.
References

[1] Yao, X., Higuchi, T.: Promises and Challenges of Evolvable Hardware. IEEE Trans. on Systems, Man, and Cybernetics, Part C 29(1), 87–97 (1999)
[2] Higuchi, T., Iwata, M., Keymeulen, D., et al.: Real-World Applications of Analog and Digital Evolvable Hardware. IEEE Trans. on Evolutionary Computation 3(3), 220–235 (1999)
[3] Zhao, S.: Study of the Evolutionary Design Methods of Electronic Circuits. Ph.D. thesis, Xidian University, Xi'an, China (2003)
[4] Jing-chang, W.: Power Supply System Harmonics. Chinese Electric Power Publishing Company, Beijing (1998)
[5] Kelly, A.W., Yadusky, W.F.: Rectifier Design for Minimum Line-Current Harmonics and Maximum Power Factor. IEEE Trans. on Power Electronics 7, 332–341 (1992)
[6] Bhattacharya, S., Cheng, P.T., Divan, D.M.: Hybrid Solutions for Improving Passive Filter Performance in High Power Applications. IEEE Trans. on Industry Applications 33(3), 732–747 (1997)
[7] Moo, C.S., Cheng, H.L., Guo, S.J.: Designing Passive LC Filters with Contour Maps. In: Proc. Int. Conf. on Power Electronics and Drive Systems, pp. 834–838 (1997)
[8] Chen, Y.-M.: Passive Filter Design Using Genetic Algorithms. IEEE Trans. on Industrial Electronics 50(1), 202–207 (2003)
[9] Wang, X., Gao, X.Z., Ovaska, S.J.: A Hybrid Optimization Algorithm in Power Filter Design. In: IECON 2005, 32nd Annual Conference of the IEEE Industrial Electronics Society, pp. 1335–1340 (2005)
[10] Fang, K.-T., Ma, C.-Q.: Orthogonal and Uniform Experimental Design. Science Press, Beijing (2001)
[11] Zhao, S., Jiao, L., Zhao, J., et al.: Evolutionary Design of Analog Circuits with a Uniform-Design Based Multi-Objective Adaptive Genetic Algorithm. In: Proceedings of EH-2005, pp. 26–29 (2005)
Designing Electronic Circuits by Means of Gene Expression Programming Ⅱ

Xuesong Yan, Wei Wei, Qingzhong Liang, Chengyu Hu, and Yuan Yao

School of Computer Science, China University of Geosciences, Wu-Han, 430074, China
Research Center for Space Science and Technology, China University of Geosciences, Wu-Han, 430074, China
[email protected]
Abstract. A major bottleneck in the evolutionary design of electronic circuits is the problem of scale: the number of gates used in the target circuit grows very fast as the number of inputs of the evolved logic function increases. Another, related obstacle is the time required to calculate the fitness value of a circuit. In this paper we propose a new means, Gene Expression Programming, for designing electronic circuits, and introduce the encoding of a circuit as a chromosome, the genetic operators, and the fitness function. The case studies show that this approach is efficient for electronic circuit design and that its evolution speed is fast; the experimental results show that it attains better results.
1 Introduction

Evolutionary Electronics applies the concepts of genetic algorithms to the evolution of electronic circuits (Thompson, 1996). The main idea behind this research field is that each possible electronic circuit can be represented as an individual or chromosome of an evolutionary process, which performs standard genetic operations over the circuits. Due to the broad scope of the area, researchers have been focusing on different problems, such as placement (Esbensen and Mazumder, 1994), routing (Esbensen, 1994), Field Programmable Gate Array (FPGA) mapping (Kommu and Pomenraz, 1993), optimization of combinational and sequential digital circuits (Miller and Thomson, 1995), synthesis of digital circuits (Miller et al., 1997), synthesis of passive and active analog circuits (Koza et al., 1996), synthesis of operational amplifiers (Kruiskamp and Leenaerts, 1995), and transistor size optimization (Rogenmoser et al., 1996). Of great relevance are the works focusing on "intrinsic" hardware evolution (Layzell, 1998; Thompson et al., 1996), in which fitness evaluation is performed in silicon, allowing a higher degree of exploration of the physical properties of the medium. This particular area is frequently called Evolvable Hardware [1]. Continuing this line of work, Coello, Christiansen and Aguirre (1996) presented a computer program that automatically generates high-quality circuit designs [5]. They use five possible types of gates (AND, NOT, OR, XOR and WIRE) with the objective of finding a functional design that minimizes the use of gates other than WIRE (essentially a logical no-operation). L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 319–330, 2007. © Springer-Verlag Berlin Heidelberg 2007
Miller, Thompson and Fogarty (1997) applied evolutionary algorithms to the design of arithmetic circuits. The technique was based on evolving the functionality and connectivity of a rectangular array of logic cells, with a model of the resources reflecting the Xilinx 6216 FPGA device [6]. Kalganova, Miller and Lipnitskaya (1998) proposed another technique for designing multiple-valued circuits. The EH is easily adapted to the distinct types of multiple-valued gates, associated with operations corresponding to different types of algebra, and can include other logical expressions [7]. This approach is an extension of the EH method for binary logic circuits proposed in [6]. In order to handle complex systems, Torresen (1998) proposed the method of increased-complexity evolution. The idea is to evolve a system gradually, as a kind of divide-and-conquer method: evolution is first undertaken individually on a large number of simple cells, and the evolved functions become the basic blocks adopted in the further evolution or assembly of a larger and more complex system [8]. More recently, Hollingworth, Smith and Tyrrell (2000) described the first attempts to evolve circuits using the Virtex family of devices. They implemented a simple 2-bit adder, where the inputs to the circuit are the two 2-bit numbers and the expected output is the sum of the two input values [9]. A major bottleneck in the evolutionary design of electronic circuits is the problem of scale: the number of gates used in the target circuit grows very fast as the number of inputs of the evolved logic function increases. This results in a huge search space that is difficult to explore even with evolutionary techniques. Another related obstacle is the time required to calculate the fitness value of a circuit [10]. A possible way around this problem is to use building blocks rather than simple gates.
Nevertheless, this technique leads to another difficulty: how to define building blocks that are suitable for evolution. Timothy Gordon (2002) suggests an approach that allows evolution to search for good inductive biases for solving large-scale complex problems. This scheme inherently generates the modular and iterative structures that exist in many real-world circuit designs but, at the same time, allows evolution to search innovative areas of the design space [11]. Current evolvable hardware research concentrates on the hardware representation and on concrete applications rather than on the algorithm itself, so the hardware representation and its encoding are the most important research topics. Whether one uses evolvable hardware to design and optimize electronic circuits, or studies evolvable hardware itself, the basic problems are the domain knowledge and the hardware representation. For an evolutionary algorithm, an electronic circuit can be represented in two ways: the encoding can be defined over the circuit solution space, or over the problem space. The quality of a representation can be judged by two questions. First, the encoding should be as complete as possible; that is, every meaningful circuit, and in particular the optimal solution, in the problem space should be representable by the encoding. Second, the encoding should speed up the convergence of the algorithm's search. Following this line of research, this paper proposes a new means for designing electronic circuits. This paper is organized as follows. Section 2 introduces gene expression programming (GEP). Section 3 introduces the new means to design circuits, as well as the encoding of the circuit as a chromosome, the genetic operators
and the fitness function. Section 4 presents the simulation results. Finally, Section 5 presents the main conclusions.
2 Gene Expression Programming Gene expression programming (GEP) is, like genetic algorithms (GAs) and genetic programming (GP), a genetic algorithm as it uses populations of individuals, selects them according to fitness, and introduces genetic variation using one or more genetic operators [4]. The fundamental difference between the three algorithms resides in the nature of the individuals: in GAs the individuals are linear strings of fixed length (chromosomes); in GP the individuals are nonlinear entities of different sizes and shapes (parse trees); and in GEP the individuals are encoded as linear strings of fixed length (the genome or chromosomes) which are afterwards expressed as nonlinear entities of different sizes and shapes (i.e., simple diagram representations or expression trees). The interplay of chromosomes (replicators) and expression trees (phenotype) in GEP implies an unequivocal translation system for translating the language of chromosomes into the language of expression trees (ETs). The structural organization of GEP chromosomes presented in this work allows a truly functional genotype/phenotype relationship, as any modification made in the genome always results in syntactically correct ETs or programs. Indeed, the varied set of genetic operators developed to introduce genetic diversity in GEP populations always produces valid ETs. Thus, GEP is an artificial life system, well established beyond the replicator threshold, capable of adaptation and evolution. The advantages of a system like GEP are clear from nature, but the most important should be emphasized. First, the chromosomes are simple entities: linear, compact, relatively small, easy to manipulate genetically (replicate, mutate, recombine, transpose, etc.). Second, the ETs are exclusively the expression of their respective chromosomes; they are the entities upon which selection acts and, according to fitness, they are selected to reproduce with modification. 
During reproduction it is the chromosomes of the individuals, not the ETs, which are reproduced with modification and transmitted to the next generation. On account of these characteristics, GEP is extremely versatile and greatly surpasses the existing evolutionary techniques. Indeed, in the most complex problem presented in this work, the evolution of cellular automata rules for the density-classification task, GEP surpasses GP by more than four orders of magnitude. 2.1 Gene Representation GEP genes are composed of a head and a tail. The head contains symbols that represent both functions (elements from the function set F) and terminals (elements from the terminal set T), whereas the tail contains only terminals. Therefore two different alphabets occur at different regions within a gene. For each problem, the length of the head h is chosen, whereas the length of the tail t is a function of h and the number of arguments of the function with the most arguments n, and is evaluated by the equation: t = h (n-1) + 1
(1)
Consider a gene composed of symbols from {Q, *, /, -, +, a, b}. In this case n = 2. For instance, for h = 15 and t = 16, the length of the gene is 15 + 16 = 31. One such gene is shown below (the tail is shown in bold):

0123456789012345678901234567890
*b+a-aQab+//+b+babbabbbababbaaa

and it codes for the following ET:
In this case the open reading frame ends at position 7, whereas the gene ends at position 30. GEP chromosomes are usually composed of more than one gene of equal length. For each problem or run, the number of genes, as well as the length of the head, is chosen. Each gene codes for a sub-ET, and the sub-ETs interact with one another to form a more complex multi-subunit ET.

2.2 Genetic Operators

Selection: In GEP, individuals are selected according to fitness by roulette-wheel sampling coupled with the cloning of the best individual. The fitter the individual, the higher the probability of leaving more offspring. Thus, during replication the genomes of the selected individuals are copied as many times as the outcome of the roulette. The roulette is spun as many times as there are individuals in the population, always maintaining the same population size.

Mutation: Mutations can occur anywhere in the chromosome. However, the structural organization of chromosomes must remain intact: in the heads any symbol can change into any other (function or terminal); in the tails terminals can only change into terminals. This way the structural organization of chromosomes is maintained, and all new individuals produced by mutation are structurally correct programs. It is worth noticing that GEP places no constraints on either the kind or the number of mutations in a chromosome: in all cases the newly created individuals are syntactically correct programs.

Transposition: The transposable elements of GEP are fragments of the genome that can be activated and jump to another place in the chromosome. In GEP there are three
kinds of transposable elements: (1) short fragments with a function or terminal in the first position that transpose to the head of genes, except to the root (insertion sequence, or IS, elements); (2) short fragments with a function in the first position that transpose to the root of genes (root IS, or RIS, elements); (3) entire genes that transpose to the beginning of chromosomes.

Recombination: In GEP there are three kinds of recombination: one-point, two-point, and gene recombination. (1) One-point: the chromosomes cross over at a randomly chosen point to form two daughter chromosomes. (2) Two-point: the chromosomes are paired and two recombination points are randomly chosen; the material between the recombination points is then exchanged between the two chromosomes, forming two new daughter chromosomes. (3) Gene recombination: an entire gene is exchanged during crossover; the exchanged genes are randomly chosen and occupy the same position in the parent chromosomes. It is worth noting that this operator cannot create new genes: the individuals created are merely different arrangements of existing genes.
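The gene structure of Section 2.1 can be checked mechanically; the sketch below computes the tail length of Eq. (1) and locates the end of the open reading frame of the example gene by Karva-notation arity counting:

```python
# Arities of the symbols used in the example gene of Section 2.1.
ARITY = {'Q': 1, '*': 2, '/': 2, '-': 2, '+': 2, 'a': 0, 'b': 0}

def tail_length(h, n):
    """Eq. (1): t = h(n - 1) + 1, where n is the maximum arity."""
    return h * (n - 1) + 1

def orf_end(gene):
    """Last position of the open reading frame: read symbols breadth-
    first, each function adding as many required symbols as its arity."""
    needed, i = 1, 0
    while i < needed:
        needed += ARITY[gene[i]]
        i += 1
    return i - 1

gene = "*b+a-aQab+//+b+babbabbbababbaaa"    # h = 15, t = 16, length 31
assert tail_length(15, 2) == 16
assert orf_end(gene) == 7                  # matches the text above
```

Because the tail is long enough to terminate any head (that is what Eq. (1) guarantees), every such string decodes to a valid expression tree, which is why GEP's operators never produce syntactically invalid individuals.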
3 Circuit Evolution Using Gene Expression Programming

3.1 Chromosome Representation

For the chromosome representation we use Cartesian Genetic Programming (CGP), proposed by Miller [13, 14]. CGP is Cartesian in the sense that the method considers a grid of nodes addressed in a Cartesian coordinate system. In CGP the genotype is represented as a list of integers that are mapped to directed graphs rather than trees; one motivation for this is that graphs are more general than trees. In CGP the genotypes are of fixed length, but the phenotypes are of variable length, depending on the number of unexpressed genes. This representation is based on the Xilinx Virtex-II FPGA. The starting point in this technique is to consider, for each potential design, a geometry (a fixed-size array) of uncommitted logic cells between a set of desired inputs and outputs. The uncommitted logic cells are typical of the resources provided on the Xilinx FPGA part under consideration. An uncommitted logic cell is a two-input, single-output logic module with no fixed functionality; the functionality may be chosen, at implementation time, to be any two-input logic function. In this technique, a chromosome is defined as a set of interconnections and gate-level functionalities for these cells, from the outputs back toward the inputs, based upon a numbered rectangular grid of the cells themselves, as in Fig. 1. The available inputs are logic '0', logic '1', all primary inputs, and the primary inputs inverted. To illustrate this, consider a 3 × 3 array of logic cells between two required primary inputs and two required outputs. The lines 0 and 1 are standard within the chromosome and represent the fixed values logic '0' and logic '1', respectively. The inputs (two in this case) are numbered 2 and 3, with 2 being the most significant. The lines 4 and 5 represent the inverted inputs 2 and 3, respectively.
The logic cells which form the array are numbered column-wise from 6 to 14. The outputs are numbered 13 and 11, meaning that the most significant output is connected to the output of cell 13 and the least significant output is connected
X. Yan et al.
Fig. 1. A 3 × 3 geometry of uncommitted logic cells with inputs, outputs and netlist numbering
to the output of cell 11. These integer values, whilst denoting the physical location of each input, cell or output within the structure, also represent connections or routes between the various points. In other words, this numbering system may be used to define a netlist for any combinational circuit. Thus, a chromosome is merely a list of these integer values, where the position in the list tells us the cell or output being referred to, while the value tells us the connection (to a cell or input) to which that point is connected, and the cell's functionality. Each logic cell is capable of assuming the functionality of any two-input logic gate or, alternatively, a 2-1 multiplexer (MUX) with a single control input. For the geometry shown in Fig. 1, a sample chromosome is given in Fig. 2:
Fig. 2. A typical netlist chromosome for the 3 × 3 geometry of Fig. 1
Notice that in this arrangement the chromosome is split into groups of three integers. The first two values give the connections of the two inputs to the gate or MUX. The third value may be either positive, in which case it is taken to represent the control input of a MUX, or negative, in which case it is taken to represent a two-input gate, where the modulus of the number indicates the function according to Table 1 below. For convenience, the first input to the cell is called A and the second input B. For the logic operations, the C language symbols are used: (i) & for AND, (ii) | for OR, (iii) ^ for exclusive-OR, and (iv) ! for NOT. There are only 12 entries in this table out of a possible 16, as 4 of the combinations, (i) all zeroes, (ii) all ones, (iii) input A passed straight through, and (iv) input B passed straight through, are considered trivial: they are already included among the possible input combinations, and they do not affect subsequent network connections in cascade. This means that the first cell, with output number 6 and characterised by the triple {0, 2, -1}, has its A input connected to '0' and its B input connected to input 2, and since the third value is -1, the cell is an AND gate (thus in this case always producing a logical output of 0). Picking out the cell whose output is labelled 9, characterised by the triple {2, 6, 7}, its A input is connected to input 2 and its B input to the output of cell 6, while since the third number is positive the cell is a MUX with control input connected to the output of cell 7.
Designing Electronic Circuits by Means of Gene Expression Programming Ⅱ
Table 1. Cell gate functionality according to negative gene value in chromosome

Gene Value   Gate Function
-1           A & B
-2           A & !B
-3           !A & B
-4           A ^ B
-5           A | B
-6           !A & !B
-7           !A ^ B
-8           !A
-9           A | !B
-10          !B
-11          !A | B
-12          !A | !B
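The triple decoding just described can be sketched in code. This is an illustrative sketch rather than the authors' implementation: the dictionary is indexed by the modulus of the gene value as in Table 1, `cell_output` is our own name, and the MUX select polarity (control 0 selects input A) is our assumption, since the text does not fix it.

```python
# Gate functions for negative third values, keyed by |gene value| (Table 1).
GATES = {
    1: lambda a, b: a & b,         2: lambda a, b: a & (1 - b),
    3: lambda a, b: (1 - a) & b,   4: lambda a, b: a ^ b,
    5: lambda a, b: a | b,         6: lambda a, b: (1 - a) & (1 - b),
    7: lambda a, b: (1 - a) ^ b,   8: lambda a, b: 1 - a,
    9: lambda a, b: a | (1 - b),  10: lambda a, b: 1 - b,
   11: lambda a, b: (1 - a) | b,  12: lambda a, b: (1 - a) | (1 - b),
}

def cell_output(triple, value_of):
    """Evaluate one cell triple; value_of maps a point number to its bit."""
    a_src, b_src, f = triple
    a, b = value_of(a_src), value_of(b_src)
    if f > 0:                     # positive: 2-1 MUX, control read from point f
        return a if value_of(f) == 0 else b
    return GATES[-f](a, b)        # negative: two-input gate from Table 1
```

For the example from the text, the triple {0, 2, -1} always yields 0: its A input is tied to logic '0' and -1 selects A & B.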
3.2 Genetic Operation
Mutation Operation: In this algorithm, mutation acts on the triples: each or some of the genes in a triple are mutated. The mutation probability is 0.7, that is, each triple undergoes mutation with this probability, and the mutation domain is 20.
Crossover Operation: The fittest individual in the population is selected as the parent for the next generation, and every other individual is crossed over with this parent.
Fitness Measure: Whether the evolved circuits perform the desired logic translation of inputs to outputs is tested by running all test inputs through the network and comparing the results with the desired functionality in a bit-wise fashion.
Fig. 3. One-bit full adder's truth table and a circuit individual's functionality table
A PLA file (truth table) contains the target function, and this is used as the basis for comparison. The percentage of total correct outputs in response to the appropriate inputs is then used as the fitness measure for the genetic algorithm. In other words, the nearer the evolved circuit comes to the desired functionality, the fitter it is deemed to be. In our algorithm we use the following method: all the outputs of an individual are compared with the desired outputs. For a given input row of the truth table, if even one output bit of the circuit individual differs from the desired value, the circuit's functionality is deemed useless for that input; when all outputs equal the desired outputs, the fitness value is incremented by 1. For example, suppose the one-bit full adder's truth table and a circuit individual's functionality table are as in Fig. 3. In the individual's functionality table, five of the eight instances are correct, so the circuit's fitness value is 5.
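This row-wise scoring can be sketched as follows; the function name and the dictionary representation of truth tables are our own choices, not the paper's code.

```python
def circuit_fitness(evolved_rows, truth_rows):
    """Count truth-table rows whose entire output vector matches:
    one wrong bit makes the whole row score zero."""
    return sum(1 for ins, want in truth_rows.items()
               if evolved_rows.get(ins) == want)

# One-bit full adder: inputs (a, b, cin) -> outputs (sum, carry).
adder = {(a, b, c): ((a + b + c) & 1, (a + b + c) >> 1)
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}
```

A candidate that matches the adder on five of the eight rows scores 5, as in the Fig. 3 example.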
3.3 Algorithm Framework
The GEP method solves a problem using the following steps:
Step 1: randomly generate the chromosomes of the initial population;
Step 2: express the chromosomes and evaluate the fitness of each individual; if an individual satisfies the optimization criteria, output the fittest individual and the solution and finish the computation; otherwise go to Step 3;
Step 3: select individuals according to fitness to reproduce; during reproduction the genome is copied and transmitted to the next generation;
Step 4: recombine individuals according to a probability to generate new individuals;
Step 5: modify individuals according to a probability to generate new individuals;
Step 6: mutate individuals according to a probability to generate new individuals;
Step 7: form the new population through recombination, modification and mutation, and go to Step 2.
Our circuit design algorithm is based on the GEP method. It obtains the fittest solution through population evolution; in the evolutionary process, the genetic operators guarantee individual diversity in the population, which enables the population to converge rapidly. The automated design of electronic circuits has its particularities: compared with other optimization problems, its ultimate objective is explicit, namely to obtain a circuit solution whose outputs fit the truth table values.
Following this idea, we designed our algorithm as follows:
Step 1: randomly generate the initial population P0 and initialize the counter t=0;
Step 2: select individuals from the population according to fitness and cross them over to obtain the new individuals Q;
Step 3: apply the mutation operator to the new individuals according to a probability to generate the new population Pt;
Step 4: t=t+1;
Step 5: if the termination conditions are satisfied, go to Step 6; otherwise go to Step 2;
Step 6: output the final population Pt.
This framework shows that the algorithm maintains the population's diversity and keeps the population from being trapped in local optima.
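The steps above can be sketched as a small generic driver. This is our own sketch, not the paper's code: `fitness`, `crossover_with_best` and `mutate` are placeholders to be supplied by the caller.

```python
def evolve(init_pop, fitness, crossover_with_best, mutate,
           max_fitness, max_gens):
    """Skeleton of the circuit-design loop: cross every individual with
    the current fittest one, then mutate, until a full-fitness circuit
    appears or the generation budget runs out."""
    pop, t = init_pop, 0
    while t < max_gens:
        best = max(pop, key=fitness)
        if fitness(best) >= max_fitness:                # termination test
            return best, t
        mated = [crossover_with_best(best, ind) for ind in pop]  # Step 2
        pop = [mutate(ind) for ind in mated]                     # Step 3
        t += 1                                                   # Step 4
    return max(pop, key=fitness), t
```

As a toy usage, maximizing the number of ones in a bit list with an OR-style crossover and no mutation converges quickly, since each generation accumulates the best individual's set bits.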
4 Case Study
This section describes our experimental cases: a one-bit full adder, a two-bit half adder and a two-bit full adder. All cases use the same parameters, given in Table 2.

Table 2. The parameters for the algorithm

Number of generations          2000
Population size                30
Head length                    6
Number of genes                3
Chromosome length              39
Mutation rate                  0.7
One-point recombination rate   1.0
Selection range                100
4.1 Case 1: One-Bit Full Adder
A one-bit full adder has a truth table with 3 inputs and 2 outputs. In this case the matrix has a size of 2×2, and our algorithm uses this small geometry to find fully functional solutions. The resulting circuit is shown in Fig. 4; this is a gratifying result to obtain, as it is clear that the design is an optimum solution.
Fig. 4. The evolved optimum one-bit full adder circuit
4.2 Case 2: Two-Bit Half Adder
A two-bit half adder has a truth table with 4 inputs and 3 outputs. In this case the matrix has a size of 3×3, and our algorithm uses this small geometry to find fully functional solutions. We experimented with two half adders, one without carry and one with carry. The resulting circuits are shown in Fig. 5 (the two-bit half adder without carry) and Fig. 6 (the two-bit half adder with carry). These are gratifying results to obtain, as it is clear that the designs are optimum solutions.
Fig. 5. The evolved optimum two-bit half adder (no carry) circuit
Fig. 6. The evolved optimum two-bit half adder (with carry) circuit
4.3 Case 3: Two-Bit Full Adder
A two-bit full adder has a truth table with 5 inputs and 3 outputs. In this case the matrix has a size of 3×3, and our algorithm uses this small geometry to find fully functional solutions. The resulting circuit is shown in Fig. 7; this is a gratifying result to obtain, as it is clear that the design is an optimum solution.
Fig. 7. The evolved optimum two-bit full adder circuit
5 Conclusion
Evolutionary Electronics applies the concepts of genetic algorithms to the evolution of electronic circuits. The main idea behind this research field is that each possible electronic circuit can be represented as an individual or a chromosome of an evolutionary process, which performs standard genetic operations over the circuits. In this paper we proposed a new means of designing electronic circuits given a set of logic gates. The final circuit is optimized in terms of complexity (with the minimum number of gates). For the case studies this means has proved efficient, and the experimental results show that better results have been attained. There are still many avenues for further work. Other ways of representing rectangular arrays of logic cells may be devised. There are also many wider issues to be considered relating to the problem of evolving much larger circuits. It is a feature of the current technique that one has to specify the functionality of the target circuit using a complete truth table; however, this is impractical for circuits with large numbers of inputs.
Acknowledgements
This work was supported by the National High-Tech Research and Development Plan of China under Grant NO.2005AA737090, the National Natural Science Foundation of China (NO.60473081), the Foundation of the State Key Laboratory of Geological Processes and Mineral Resources, China University of Geosciences (No.GPMR200617), and the Natural Science Foundation of China University of Geosciences (NO.CUGQNL0516).
References
1. Zebulum, R.S., Pacheco, M.A., Vellasco, M.M.: Evolutionary Electronics: Automatic Design of Electronic Circuits and Systems by Genetic Algorithms. CRC Press (2001)
2. Thompson, A., Layzell, P.: Analysis of unconventional evolved electronics. Communications of the ACM 42, 71–79 (1999)
3. Louis, S.J., Rawlins, G.J.: Designer Genetic Algorithms: Genetic Algorithms in Structure Design. In: Proceedings of the Fourth International Conference on Genetic Algorithms (1991)
4. Ferreira, C.: Gene Expression Programming: A New Adaptive Algorithm for Solving Problems. Complex Systems 13, 87–129 (2001)
5. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge (1992)
6. Coello, C.A., Christiansen, A.D., Aguirre, A.H.: Using Genetic Algorithms to Design Combinational Logic Circuits. Intelligent Engineering through Artificial Neural Networks 6, 391–396 (1996)
7. Miller, J.F., Thompson, P., Fogarty, T.: Algorithms and Evolution Strategies in Engineering and Computer Science: Recent Advancements and Industrial Applications, Chapter 6. Wiley, Chichester (1997)
8. Kalganova, T., Miller, J.F., Lipnitskaya, N.: Multiple-Valued Combinational Circuits Synthesised using Evolvable Hardware. In: Proceedings of the 7th Workshop on Post-Binary Ultra Large Scale Integration Systems (1998)
9. Torresen, J.: A Divide-and-Conquer Approach to Evolvable Hardware. In: Sipper, M., Mange, D., Pérez-Uribe, A. (eds.) ICES 1998. LNCS, vol. 1478, pp. 57–65. Springer, Heidelberg (1998)
10. Hollingworth, G.S., Smith, S.L., Tyrrell, A.M.: The Intrinsic Evolution of Virtex Devices Through Internet Reconfigurable Logic. In: Miller, J.F., Thompson, A., Thompson, P., Fogarty, T.C. (eds.) ICES 2000. LNCS, vol. 1801, pp. 72–79. Springer, Heidelberg (2000)
11. Vassilev, V.K., Miller, J.F.: Scalability Problems of Digital Circuit Evolution. In: Proceedings of the Second NASA/DOD Workshop on Evolvable Hardware, pp. 55–64 (2000)
12. Gordon, T.G., Bentley, P.: Towards Development in Evolvable Hardware. In: Proceedings of the 2002 NASA/DOD Conference on Evolvable Hardware, pp. 241–250 (2002)
13. Miller, J.F.: Designing Electronic Circuits Using Evolutionary Algorithms. Dept. of Computer Studies, Napier University (2003)
14. Miller, J.F., Thomson, P.: Cartesian Genetic Programming. In: Poli, R., Banzhaf, W., Langdon, W.B., Miller, J., Nordin, P., Fogarty, T.C. (eds.) EuroGP 2000. LNCS, vol. 1802, pp. 121–132. Springer, Heidelberg (2000)
15. Yan, X.S., Wei, W., et al.: Design Electronic Circuits by Means of Gene Expression Programming. In: Proceedings of the First NASA/ESA Conference on Adaptive Hardware and Systems, pp. 194–199. IEEE Press, Los Alamitos (2006)
Designing Polymorphic Circuits with Evolutionary Algorithm Based on Weighted Sum Method

Houjun Liang1, Wenjian Luo1,2, and Xufa Wang1,2

1 Nature Inspired Computation and Applications Laboratory, Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, Anhui, China
2 Anhui Key Laboratory of Software in Computing and Communication, University of Science and Technology of China, Hefei 230027, China
[email protected], {wjluo,xfwang}@ustc.edu.cn
Abstract. A polymorphic circuit is a kind of multifunctional circuit that can perform two or more functions under different conditions. Those functions can be activated by changing control parameters such as temperature, power supply voltage, illumination and so on. Polymorphic circuits provide a novel approach to building multifunctional circuits and can be used in many fields. However, polymorphic circuits cannot be designed with conventional methods and are hard to evolve with evolutionary algorithms directly. A novel evolutionary algorithm based on the weighted sum method is proposed in this paper, which can be used to evolve polymorphic circuits at the gate level. The experimental results demonstrate that this algorithm can increase the success ratio and decrease the number of evolutionary generations needed to evolve a polymorphic circuit.

Keywords: Polymorphic Circuit, Evolutionary Algorithm, Weighted Sum.
1 Introduction
A polymorphic circuit is a special kind of multifunctional circuit composed of polymorphic gates. Generally, a polymorphic gate exhibits two or more logic functions, which may be activated by changes in Vdd, temperature, light and so on [1]. Polymorphic circuits are different from traditional multifunctional electronics based on switches or multiplexers, and they also cannot be designed through traditional methods. Stoica and his colleagues have designed and fabricated some basic polymorphic gates [1-5] using evolutionary algorithms. Polymorphic circuits provide a novel approach to building multifunctional circuits, allowing engineers to build adaptable digital circuits which can perform different desired functions under different circumstances. However, polymorphic circuits cannot be designed with conventional methods, and it is difficult to evolve them with evolutionary algorithms directly. Sekanina and his colleagues have adopted evolutionary algorithms to design some small-scale polymorphic circuits at the gate level [6-10].

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 331–342, 2007. © Springer-Verlag Berlin Heidelberg 2007

The most complex circuit
evolved has up to 6 inputs and 6 outputs and consists of several tens of gates [10]. This circuit is a 3-bit multiplier/6-bit sorting net, which was found in generation 10,244,650 using 9 kinds of different gates [10]. In this paper, an evolutionary algorithm based on the weighted sum method is proposed, with which polymorphic digital circuits can be designed more effectively. Simulated experiments were done to design several polymorphic circuits. The experimental results demonstrate that the evolutionary algorithm based on the weighted sum method can increase the success ratio and decrease the number of evolutionary generations. It is also demonstrated that this algorithm can be used to design relatively large polymorphic circuits faster: in this paper, a three-bit multiplier-adder circuit with 6 inputs and 6 outputs is generated in 2,514,043 generations. The rest of this paper is organized as follows. Section 2 introduces related works. Section 3 describes the weighted sum approach. Section 4 gives the experimental results. Some discussions are given in Section 5. Finally, a brief conclusion is given in Section 6.
2 Background
Polymorphic electronics have two or more built-in functions and exhibit different functional behaviors under certain control signals. The concept of polymorphic electronics was first proposed by Stoica and his colleagues [1]. As described in the previous section, Stoica and his colleagues have designed and fabricated some basic polymorphic gates [1-5]. Experiments on self-recovery in extreme environments have also been done [11]. The existing polymorphic gates implemented by Stoica and his colleagues were listed in Table 1 in [9]. The existence of polymorphic gates enables us to use them as primary components to design multifunctional circuits at the gate level. Sekanina and his colleagues have evolved several kinds of gate-level polymorphic digital circuits by using polymorphic gates as the basic building blocks [7-10].

Table 1. Typical examples of polymorphic circuits reported in [7]. The details of the evolutionary algorithms used to generate these polymorphic circuits can be found in [7].

Circuits          Gates used              Gate Array    Success Ratio
                                          (Row × Col)   (Suc/Run*)
5b-parity-median  NAND/NOR, XOR/XOR       1 × 24        16/200
5b-parity-median  NAND/XOR, XOR/NOR       1 × 20        48/900
Mult2b-sn4b       NAND/NOR, AND/AND       1 × 40        8/1000
Mult2b-sn4b       (a ∨ b)/XOR, XOR/(a ∧ b) 1 × 40       1/50
2b-mult-add       NAND/NOR, OR/XOR        1 × 40        11/200
2b-mult-add       NAND/NOR, AND/AND       1 × 40        5/200

*"Suc" means the number of runs that found the expected polymorphic circuits successfully, and "Run" means the total number of independent runs.
Table 1 lists some typical polymorphic circuits reported in [7]. All these circuits were designed by evolutionary algorithms. The details of evolutionary algorithms that
Designing Polymorphic Circuits with Evolutionary Algorithm
333
were used to generate these polymorphic circuits can be found in [7]. In [7], it is assumed that the suitable polymorphic gates used in Table 1 are available. Here 5b-parity-median means a 5-bit parity circuit vs. a 5-bit median circuit, mult2b-sn4b means a 2-bit multiplier vs. a 4-input sorting network, and 2b-mult-add means a 2-bit multiplier vs. a 2-bit adder. From Table 1, it can be observed that the success rate is very low; for example, in the case of Mult2b-sn4b the success rate is only 0.8%. Current research results reveal that it is very difficult to evolve nontrivial polymorphic circuits. As described in Section 1, the most complex polymorphic circuit evolved is a circuit that behaves as a 3-bit multiplier in mode 1 and as a 6-bit sorting net in mode 2 [10]; this circuit was found in 10,244,650 generations. More complex circuits can hardly be evolved. In those experiments, for each polymorphic circuit, the two corresponding functions in the two different modes are treated equivalently when the fitness values are calculated. However, the two functions in the two modes are intrinsically different and have different difficulties to be evolved. Therefore, in order to increase the success ratio, they should be treated with unequal priority.
3 Evolutionary Design with Weighted Sum Method
In this section, an evolutionary algorithm based on the weighted sum method is proposed to design polymorphic circuits. The basic model adopted in this paper is Cartesian Genetic Programming (CGP) [7, 12, 13]. CGP, which was introduced into Evolvable Hardware (EHW) by Miller and Thomson [12, 13], has already demonstrated its efficiency for the design of digital circuits. Sekanina and his colleagues also adopted CGP to evolve polymorphic circuits [7-10]. In this section, for convenience, CGP is first introduced in brief, and then the weighted sum method is given in detail.
3.1 Brief Introduction to CGP
In CGP, an array of u (columns) × v (rows) gates (programmable elements) is used to construct the expected circuit [12, 13]. The numbers of circuit inputs and outputs, denoted by ni and no, are determined by the expected circuit. The inputs of each gate can only be connected to previous columns of the array or to some of the circuit inputs, not to later columns; that is to say, feedback is not allowed. In addition, the L-back parameter defines the level of connectivity: if L is set to 1, each gate can connect only to the immediately preceding column; if L is set to 3, only the previous three columns can be connected; and if L is set to u, i.e. the number of columns, all previous columns can be connected. An example of the CGP model is shown in Fig. 1 [7]. For this example, v = 1 and u = L. It is noted that we set v = 1 and u = L for all experiments in this paper.
Fig. 1. An example of a 3-input and 2-output circuit [7]. The gates used in this array are NAND and NOR, denoted by "0" and "1". CGP parameters are given as follows: L=6, u=6, v=1. The corresponding chromosome is "2,1,0 0,3,1 4,1,1 2,3,0 5,4,0 0,1,1 7,6".
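To make the genotype-to-circuit mapping concrete, the Fig. 1 chromosome can be decoded and simulated directly. The sketch below is ours, not code from the paper; it assumes the three circuit inputs occupy node numbers 0-2 and that each triple (a, b, f) appends one gate node, as the figure suggests.

```python
from itertools import product

NAND, NOR = 0, 1   # gate codes "0" and "1" from Fig. 1

def run_cgp(gates, outputs, inputs):
    """Evaluate a 1-row CGP genotype: nodes 0..ni-1 hold the circuit
    inputs, and each triple (a, b, f) appends one gate node."""
    vals = list(inputs)
    for a, b, f in gates:
        x, y = vals[a], vals[b]
        vals.append(1 - (x & y) if f == NAND else 1 - (x | y))
    return tuple(vals[o] for o in outputs)

# Genotype from Fig. 1: six gate triples, then the output nodes 7 and 6.
genes = [(2, 1, 0), (0, 3, 1), (4, 1, 1), (2, 3, 0), (5, 4, 0), (0, 1, 1)]
truth = {ins: run_cgp(genes, (7, 6), ins)
         for ins in product((0, 1), repeat=3)}
```

Enumerating all eight input combinations in this way yields the complete truth table of the encoded circuit.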
The evolutionary design approach adopted in this paper is based on the CGP model. The framework of the ( μ, λ ) ES adopted in this paper to evolve polymorphic circuits can be summarized as follows [7, 12]: (1) The ES population consists of λ individuals. (2) The initial λ individuals are generated randomly. (3) Every new population is composed of the offspring of the best μ individuals. (4) Only the mutation operator is adopted in CGP. (5) The evolution process terminates when a solution that produces correct outputs for all input combinations is found or a maximum number of generations is reached. However, it is very difficult to evolve polymorphic circuits with this ( μ, λ ) ES directly. To increase the success ratio and decrease the evolutionary generations for evolving a polymorphic circuit, a weighted sum method for computing the fitness value is proposed and described in Section 3.2 in detail.
3.2 Weighted Sum Method
Polymorphic circuits can perform different functions under different environments. Without loss of generality, we assume that a polymorphic circuit performs two different functions under two different modes. In general, the circuit in mode 1 or in mode 2 can be regarded as a traditional digital circuit, and for different gates the circuits in mode 1 and mode 2 have different complexity to be evolved. Therefore, when an EA is designed for generating polymorphic circuits, the characteristics of polymorphic circuits should be considered in order to generate the expected polymorphic circuits efficiently. In this paper, a fitness computation method based on the weighted sum is given in Fig. 2. Different from the fitness computation method by Sekanina in [7], the method in Fig. 2 involves weighted values: W1 and W2 denote the weighted values adopted for mode 1 and mode 2 respectively. Therefore, how to select both W1 and W2 in advance is an important question. For convenience, the combination of W1 and W2 is called the RWR (Real Weight Ratio) and denoted by W1/W2. The RWR should be set appropriately in order to increase the chance of getting a solution within a fixed number of runs and to reduce the evolution generations while evolving a polymorphic circuit. The following phenomena should be considered when setting the RWR: (1) With identical gates and identical algorithms, two traditional circuits with different
(0) Suppose the expected polymorphic circuit behaves as function f1 in mode 1 and as function f2 in mode 2.
(1) Given a candidate circuit, initialize the fitness value F=0 and temporary variables T1=0, T2=0.
(2) For each input combination C do
(2.1) Set all gates of the candidate circuit into mode 1 and calculate the output vector Vc.
(2.2) Compare the output vector Vc with the desired output vector Dc of function f1. For each index i of the output vector, if Vc[i]=Dc[i], then T1=T1+1.
(2.3) Set all gates of the candidate circuit into mode 2 and calculate the output vector Vc.
(2.4) Compare the output vector Vc with the desired output vector Dc of function f2. For each index i of the output vector, if Vc[i]=Dc[i], then T2=T2+1.
(3) Calculate the fitness as F=W1*T1+W2*T2.
(4) If F reaches the maximum fitness value determined by the truth table of the desired circuit, the expected circuit has been generated.
Fig. 2. The fitness computation method based on the weighted sum. The weighted values are introduced in Step (3): W1 and W2 denote the corresponding weighted values of T1 and T2, and they have much effect on the performance of the evolutionary algorithm.
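The procedure of Fig. 2 can be written as a short function. This is our own sketch: `eval_in_mode` stands for the candidate-circuit simulator (a placeholder we assume), and truth tables are represented as dicts from input tuples to desired output tuples.

```python
def weighted_fitness(eval_in_mode, truth1, truth2, w1, w2):
    """Fig. 2 as code: count per-bit matches in each mode (T1, T2),
    then combine them as the weighted sum F = W1*T1 + W2*T2."""
    t1 = t2 = 0
    for ins, want in truth1.items():
        got = eval_in_mode(ins, 1)            # all gates in mode 1
        t1 += sum(g == w for g, w in zip(got, want))
    for ins, want in truth2.items():
        got = eval_in_mode(ins, 2)            # all gates in mode 2
        t2 += sum(g == w for g, w in zip(got, want))
    return w1 * t1 + w2 * t2
```

For a hypothetical candidate that behaves as 2-input AND in mode 1 and OR in mode 2, a perfect match scores 4 bits per mode, so the fitness is W1*4 + W2*4.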
functions often have different difficulties to be evolved. (2) With different gates and identical algorithms, the same traditional circuit will also have different difficulties to be evolved. Therefore, for a polymorphic circuit, the difficulties of evolving the function circuits in the different modes are surely not equivalent. In order to set the RWR appropriately, the difficulty of evolving the function circuit in each mode of a polymorphic circuit should be evaluated first. The functional circuits in the different modes may be evolved separately with the corresponding gates for a certain number of independent runs, such as 100 runs, and their average evolution generations obtained, denoted by G1 (mode 1) and G2 (mode 2). Define

OWR = G1/G2.    (1)
where OWR means the Original Weight Ratio. If OWR is greater than 1, the circuit in mode 1 is more difficult to evolve than the circuit in mode 2; thus W1 should be greater than W2 when the polymorphic circuit is evolved. If OWR is less than 1, W1 should be smaller than W2.

Table 2. Some polymorphic circuits and their OWRs. Gate1 and Gate2 mean the polymorphic gates used in the gate array. "Max Gen" means the maximum number of generations set in the experiments. "Runs" means the number of independent experiments done for each circuit.

Circuit             Gate Array  Gate1         Gate2         Max Gen  Runs  OWR
5b-parity-median-1  1 × 50      NAND/NOR      XOR/XOR       300,000  300   1/3495.8
5b-parity-median-2  1 × 40      NAND/XOR      XOR/NOR       300,000  200   1/6551.6
2b-mult-add-1       1 × 40      NAND/NOR      OR/XOR        300,000  100   1.7/1
2b-mult-add-2       1 × 40      NAND/NOR      AND/AND       300,000  100   1/7.7
Mult2b-sn4b-1       1 × 40      NAND/NOR      AND/AND       300,000  300   1/2.5
Mult2b-sn4b-2       1 × 60      (a ∨ b)/XOR   XOR/(a ∧ b)   300,000  300   1/2.3
Table 2 shows some polymorphic circuits and the corresponding OWRs. Notably, it is assumed here that suitable polymorphic gates exist and can be used to build the circuits; the works of Stoica and his colleagues introduced in Section 2 make this assumption possible. In addition, the OWR is highly related to both the gate array and the gate pair Gate1/Gate2. For example, for the circuit 5b-parity-median-1, when the gate array is set to 1 × 50, its OWR is only 1/3495.8; however, its OWR reaches about 1/20000 when the gate array is set to 1 × 24.
4 Experiments
In this section, experiments are done to evaluate the performance of the algorithm given in this paper. Both small combinational circuits (up to 5 inputs and 4 outputs) and relatively large circuits (up to 6 inputs and 6 outputs) are tested. The small polymorphic circuits and polymorphic gates adopted in the experiments are listed in Table 2. The relatively large circuit is the three-bit multiplier-adder. A (4,128)-ES is used: every new population consists of mutants of the best 4 individuals, as in [7, 12]. Only the mutation operator is utilized, and every gene in an individual is mutated with probability 5%. The computation terminates when a correct solution is found or the maximum evolutionary generation is reached. The maximum evolutionary generation is 300,000. It is noted that both the mutation operator and the termination conditions differ from those in [7, 12].

Table 3. The experimental results of 5b-parity-median-1 against different RWRs. The OWR of 5b-parity-median-1 is 1/3495.8.

RWR      Ave.Gen.  STD     Suc.Rate
20/1     188088    100204  0.68
10/1     181398    99633   0.66
1/1      181654    106159  0.65
1/20     142122    97800   0.72
1/32     141764    104997  0.7
1/48     156419    103021  0.68
1/2000   152915    102429  0.74
1/4000   143546    96296   0.73
1/6000   149902    97694   0.69
1/10000  158563    105482  0.71
Table 4. The experimental results of 5b-parity-median-2 against different RWRs. The OWR of 5b-parity-median-2 is 1/6551.6.

RWR       Ave.Gen.  STD     Suc.Rate
1024/1    242304    91239   0.37
64/1      248605    87386   0.34
16/1      268857    68003   0.24
1/1       248480    91685   0.3
1/4       246826    88915   0.34
1/8       230365    102342  0.37
1/16      216855    108144  0.45
1/2048    164191    111176  0.67
1/4096    177874    120268  0.58
1/8192    145926    114312  0.73
1/16384   170875    116822  0.66
1/65536   172579    112936  0.64
1/131072  165254    116804  0.68
Table 5. The experimental results of 2b-mult-add-1 against different RWRs. The OWR of 2b-mult-add-1 is 1.7/1.

RWR    Ave.Gen.  STD     Suc.Rate
128/1  150650    108328  0.77
64/1   169940    102934  0.69
48/1   123375    84513   0.84
32/1   156734    104482  0.69
16/1   137621    90742   0.79
12/1   140884    86195   0.78
8/1    152120    93497   0.71
6/1    150621    91163   0.72
1/1    207557    106129  0.59
1/3    209752    118685  0.51
1/8    193351    110286  0.58
1/16   215622    131843  0.43
Table 6. The experimental results of 2b-mult-add-2 against different RWRs. The OWR of 2b-mult-add-2 is 1/7.7.

RWR    Ave.Gen.  STD     Suc.Rate
16/1   192372    104919  0.64
8/1    217154    103318  0.54
6/1    195381    103914  0.58
1/1    203779    105874  0.57
1/3    184348    90730   0.74
1/6    166539    93052   0.76
1/12   158348    90752   0.8
1/16   137054    81271   0.83
1/24   146672    86960   0.82
1/32   143805    80231   0.82
1/48   157281    90440   0.77
1/64   164816    85011   0.77
1/128  159009    89003   0.78
Table 7. The experimental results of Mult2b-sn4b-1 against different RWRs. The OWR of Mult2b-sn4b-1 is 1/2.5.

RWR   Ave.Gen.  STD     Suc.Rate
16/1  199400    101549  0.63
8/1   194338    96882   0.65
6/1   169612    85035   0.75
3/1   182922    94757   0.67
1/1   204399    104416  0.57
1/3   167172    89435   0.71
1/6   190390    99406   0.66
1/10  204007    107307  0.59
1/16  202183    92059   0.65
1/24  163021    88315   0.72
1/32  195862    102017  0.6
1/64  186398    99853   0.65
Table 8. The experimental results of Mult2b-sn4b-2 against different RWRs. The OWR of Mult2b-sn4b-2 is 1/2.3.

RWR    Ave.Gen.  STD       Suc.Rate
64/1   253875.8  78948.44  0.33
16/1   233658    126417    0.4
10/1   253320    128644    0.31
3/1    250027    129485    0.32
1/1    251233    117700    0.34
1/3    239058    123572    0.38
1/6    244596    127677    0.34
1/12   239905    138189    0.36
1/18   232767    125463    0.42
1/32   243369    125247    0.37
1/48   238087    117177    0.43
1/64   221166    124627    0.45
1/256  215805.3  105613.5  0.47
The experimental results for the small combinational circuits are given in Tables 3-8. In these tables, "Suc.Rate" denotes the success rate over 100 independent runs, and "Ave.Gen." gives the average number of generations over all runs; when "Ave.Gen." is computed, the maximum generation count is used for runs that do not find a correct solution. For convenience, Figs. 3-8 illustrate these results graphically, showing the relationship of "Suc.Rate" and "Ave.Gen." against the RWRs of the different circuits.
Fig. 3. The relationship between the success rate, the average generation and the RWR for 5b-parity-median-1
338
H. Liang, W. Luo, and X. Wang
Fig. 4. The relationship between the success rate, the average generation and the RWR for 5b-parity-median-2

Fig. 5. The relationship between the success rate, the average generation and the RWR for 2b-mult-add-1

Fig. 6. The relationship between the success rate, the average generation and the RWR for 2b-mult-add-2
Fig. 7. The relationship between the success rate, the average generation and the RWR for Mult2b-sn4b-1

Fig. 8. The relationship between the success rate, the average generation and the RWR for Mult2b-sn4b-2
For the 5b-parity-median-1 circuit, the success rate increases slightly but does not change much as the RWR changes. However, compared with RWR=1/1, the average number of generations decreases by about 12.7%–21.9% when RWR<1/1. For the 5b-parity-median-2 circuit, the success rate increases by about 43% at most, and the average number of generations decreases by about 29.1% at most. For the 2b-mult-add-1 problem, the success rate increases by about 10%–25% when RWR>1/1, and the average number of generations decreases from 207,557 to 123,375 as the RWR changes from 1/1 to 48/1. For the 2b-mult-add-2 problem, the success rate increases by about 17%–26% and the average number of generations drops by about 9%–32% when the RWR is set in the correct direction according to its OWR. For these circuits, the effect of the proposed approach is significant. For both the 2b-mult-sort-1 and 2b-mult-sort-2 circuits, there is no significant improvement no matter how the RWR changes. However, when the RWR changes according
to the OWR, the experimental results are no worse than those of traditional EAs without the weighted sum method (i.e., RWR=1/1). In addition, as Fig. 7 shows, both the success rate and the average number of generations rise and fall randomly as the RWR changes. In Fig. 7, the number of gates used to evolve the circuit is 40. When the number of gates is increased to 60, the effect of the weighted sum method becomes evident, as shown in Fig. 8: the success rate increases and the average number of generations decreases slightly. Although they do not change as much as in Fig. 3–Fig. 6, the trend is still clear. Another interesting observation is that the success rate can sometimes also rise slightly when the RWR is set in the reverse direction of its OWR, as in Fig. 7. Therefore, when the RWR is set to some multiple of the corresponding OWR, it has a positive effect on the evolutionary process for evolving a polymorphic circuit. Fig. 9 shows a three-bit multiplier-adder polymorphic circuit evolved by the algorithm in this paper. The parameters were as follows: u=90, RWR=18/1, Max Gen=5,000,000. This solution was found at generation 2,514,043 in a single run. It is among the most complex polymorphic circuits that have been evolved so far. Therefore, the algorithm proposed in this paper is also helpful for evolving more complex polymorphic circuits.
Fig. 9. A polymorphic three-bit multiplier-adder circuit. Inputs: A (0-2), B (3-5); outputs: 0-5 (0-3 in the case of the adder); gates: 0-NAND/NOR, 1-OR/XOR.
The above experimental results demonstrate that the proposed method can improve the success rate and reduce the number of evolutionary generations. Thus the algorithm can be used to evolve relatively large polymorphic circuits more efficiently.
5 Discussions

Recent research has demonstrated that polymorphic electronics can be used in many fields [1, 3, 7, 9]. For example, a polymorphic circuit could be designed to control power consumption automatically, to implement a hidden function that is activated only under a specific condition, or to serve in low-cost adaptive systems. The experimental results reveal that for some kinds of circuits, such as 5b-parity-median-1, 5b-parity-median-2, 2b-mult-add-1 and 2b-mult-add-2, the method proposed in this paper works very well. For some other circuits, such as Mult2b-sn4b-1 and Mult2b-sn4b-2, it does not make much difference no matter how the RWR changes. Nevertheless, when the RWR is set to some multiple of its corresponding OWR, the experimental results are no worse than those of traditional EAs with RWR=1/1. The proposed weighted sum method can be used to evolve relatively large polymorphic circuits by selecting a suitable RWR according to the rule given in Section 3. It is important that the RWR point in the same direction as its corresponding OWR. In addition, the RWR is not a very sensitive parameter: for circuits with a small OWR, the selected RWR does not need to be a large amplification of the OWR, and likewise, for circuits with a large OWR, the RWR need not be very large either. For example, the OWR of 5b-parity-median-1 is 1/3495.8, yet there is no distinct difference between RWR=1/20 and RWR=1/4000.
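As a concrete illustration of how a weight ratio combines the fitness of a circuit's two operating modes, consider the sketch below. The function name and the normalization w1 + w2 = 1 are our own illustrative assumptions, not the paper's exact fitness definition (which is given in its method section, outside this excerpt).

```python
def weighted_sum_fitness(f_mode1, f_mode2, rwr):
    """Combine the per-mode fitness values of a polymorphic circuit.

    f_mode1, f_mode2: fitness of the circuit in each operating mode
                      (e.g. number of correct output bits).
    rwr: the ratio w1/w2 of the mode weights, chosen to point in the
         same direction as the observed OWR of the target circuit.
    """
    w1 = rwr / (1.0 + rwr)   # normalize so that w1 + w2 = 1
    w2 = 1.0 / (1.0 + rwr)
    return w1 * f_mode1 + w2 * f_mode2
```

With RWR=1/1 both modes contribute equally; an RWR above 1/1 biases selection pressure toward the first mode, which is the effect the tables above measure.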
6 Conclusions

Polymorphic circuits have special characteristics and can be applied in a wide range of areas, but they are hard to design: so far, only a few small-scale multifunctional circuits have been designed exclusively by evolutionary approaches. In this paper, an evolutionary design approach based on the weighted sum method has been introduced for designing polymorphic combinational circuits, and how to set an appropriate weight ratio has been discussed. The experimental results show that the weighted sum method is effective for the design of polymorphic circuits. A relatively large polymorphic circuit with 6 inputs and 6 outputs has also been found at the gate level. When combined with decomposition strategies, the proposed algorithm can be used to evolve more complex polymorphic circuits and applied in real-world applications. How to decompose a polymorphic module and how to design larger polymorphic circuits are our future work.

Acknowledgements. This work is partly supported by the National Natural Science Foundation of China (No. 60404004) and the open foundation of the Anhui Key Laboratory of Software in Computing and Communication.
References

1. Stoica, A., Zebulum, R., Keymeulen, D.: Polymorphic Electronics. In: Liu, Y., Tanaka, K., Iwata, M., Higuchi, T., Yasunaga, M. (eds.) ICES 2001. LNCS, vol. 2210, pp. 291–301. Springer, Heidelberg (2001)
2. Stoica, A., Zebulum, R., Keymeulen, D.: Evolvable Hardware Solutions for Extreme Temperature Electronics. In: Proc. of the Third NASA/DoD Workshop on Evolvable Hardware, pp. 93–97. IEEE, Los Alamitos (2001)
3. Stoica, A., Zebulum, R., Keymeulen, D.: On Polymorphic Circuits and Their Design Using Evolutionary Algorithms. In: Proceedings of the IASTED International Conference on Applied Informatics (AI 2002), Innsbruck, Austria (2002)
4. Stoica, A., Zebulum, R., Keymeulen, D.: Taking Evolutionary Circuit Design from Experimentation to Implementation: Some Useful Techniques and a Silicon Demonstration. IEE Proceedings on Computers and Digital Techniques 151, 295–300 (2004)
5. Stoica, A., Zebulum, R.S., Keymeulen, D., Ramesham, R.: Temperature-Adaptive Circuits on Reconfigurable Analog Arrays. In: Proceedings of the First NASA/ESA Conference on Adaptive Hardware and Systems, Istanbul, Turkey, pp. 28–31 (2006)
6. Bidlo, M., Sekanina, L.: Providing Information from the Environment for Growing Electronic Circuits through Polymorphic Gates. In: Proceedings of the Genetic and Evolutionary Computation Conference, New York, US, pp. 242–248 (2005)
7. Sekanina, L.: Evolutionary Design of Gate-Level Polymorphic Digital Circuits. In: Rothlauf, F., Branke, J., Cagnoni, S., Corne, D.W., Drechsler, R., Jin, Y., Machado, P., Marchiori, E., Romero, J., Smith, G.D., Squillero, G. (eds.) EvoWorkshops 2005. LNCS, vol. 3449, pp. 185–194. Springer, Heidelberg (2005)
8. Sekanina, L.: Design Methods for Polymorphic Digital Circuits. In: Rothlauf, F., Branke, J., Cagnoni, S., Corne, D.W., Drechsler, R., Jin, Y., Machado, P., Marchiori, E., Romero, J., Smith, G.D., Squillero, G. (eds.) EvoWorkshops 2005. LNCS, vol. 3449, pp. 185–190. Springer, Heidelberg (2005)
9. Sekanina, L., Martinek, T., Gajda, Z.: Extrinsic and Intrinsic Evolution of Multifunctional Combinational Modules. In: IEEE Congress on Evolutionary Computation (2006)
10. Sekanina, L., Starecek, L., Gajda, Z., Kotasek, Z.: Evolution of Multifunctional Combinational Modules Controlled by the Power Supply Voltage. In: Proceedings of the First NASA/ESA Conference on Adaptive Hardware and Systems, Istanbul, Turkey, pp. 186–193 (2006)
11. Stoica, A., Keymeulen, D., Arslan, T.: Circuit Self-Recovery Experiments in Extreme Environments. In: Proceedings of the 2004 NASA/DoD Conference on Evolvable Hardware, pp. 142–145 (2004)
12. Miller, J.F., Thomson, P.: Cartesian Genetic Programming. In: Poli, R., Banzhaf, W., Langdon, W.B., Miller, J., Nordin, P., Fogarty, T.C. (eds.) EuroGP 2000. LNCS, vol. 1802, pp. 121–132. Springer, Heidelberg (2000)
13. Miller, J., Job, D.: Principles in the Evolutionary Design of Digital Circuits - Part I. Genetic Programming and Evolvable Machines 1, 7–35 (2000)
Robust and Efficient Multi-objective Automatic Adjustment for Optical Axes in Laser Systems Using Stochastic Binary Search Algorithm

Nobuharu Murata¹, Hirokazu Nosato², Tatsumi Furuya¹, and Masahiro Murakawa²

¹ Graduate School of Toho University, 2-2-1 Miyama, Funabashi, Chiba, Japan
² National Institute of Advanced Industrial Science and Technology (AIST), 1-1-1 Umezono, Tsukuba, Ibaraki, Japan

[email protected], [email protected], [email protected], [email protected]
Abstract. The adjustment of optical axes is crucial for laser systems. We have previously proposed an automatic adjustment method using genetic algorithms to adjust the optical axes. However, two problems remained to be solved: (1) long adjustment times, and (2) limited adjustment precision due to observation noise. In order to solve these problems, we propose a robust and efficient automatic multi-objective adjustment method using a stochastic binary search algorithm. Adjustment experiments for optical axes with 4-DOF in a noisy environment demonstrate that the proposed method can robustly adjust the positioning and the angle of the optical axes in about 12 minutes.

Keywords: optical axes, automatic adjustment, stochastic binary search, multi-objective optimization, noisy environment.
1 Introduction
L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 343–354, 2007.
© Springer-Verlag Berlin Heidelberg 2007

Laser systems are currently essential in various industrial fields. For laser systems, the adjustment of the optical axes is crucial, because the performance of the laser system deteriorates when the optical axes deviate from their specified settings due to disturbances, such as vibrations. However, it is very difficult to adjust the optical axes, because adjustment both requires high-precision positioning with μm resolution and involves multiple degrees-of-freedom (DOF) that have an interdependent relationship. Thus, adjustment costs are a major problem due to the significant amount of time required for a skilled engineer to adjust the optical axes. In order to overcome this problem, we have previously proposed automatic adjustment methods for optical axes using Genetic Algorithms (GA) [1,2,3]. For example, our method has been successfully applied to the automatic adjustment of a femto-second laser that has 12-DOF [2]. However, there were two problems with the proposed methods that needed to be solved. First, it has been necessary to reduce the adjustment time. This is because a laser system should ideally be re-adjusted every time it is used, so, for practical
considerations, adjustment time must be as short as possible. Second, because adjustment of optical axes is usually undertaken in very noisy environments, adjustment precision can vary widely. In order to overcome these problems, we have proposed a robust and efficient automatic adjustment method for the positioning of the optical axes [4]. This method has two characteristics:

1. The method adopts a Binary Search Algorithm (BSA) [5]. The BSA gradually changes from an exploration phase to an exploitation phase. While the exploration phase searches regions that have not previously been searched using a binary search tree, the exploitation phase searches regions around good points.
2. The fitness value adopted is a weighted average of sampled fitness values based on a search history.

In this study, we improve the proposed method to cope with multi-objective adjustment. In general, adjustment of the optical axes must simultaneously satisfy the positioning and the angle of the optical axes, which have a trade-off relationship. With the improved method, we can realize robust and efficient automatic multi-objective adjustment of the optical axes. There are two advantages to the proposed method: (1) adjustment times can be reduced, because the method does not search in regions that have previously been searched, and because it is not necessary to re-evaluate the fitness function to mitigate the influence of noise; (2) the method provides robust automatic adjustment, because instances of premature convergence or of falling into local optima do not occur when the adjustment is less influenced by noise. Experiments involving 4-DOF adjustment with our method demonstrate that (1) adjustment times can be reduced by 93% compared to conventional adjustment times, and (2) the improved method achieves robust automatic multi-objective adjustment that mitigates the influence of noise.
This paper is organized as follows: In section 2, we explain the multi-objective automatic adjustment of optical axes. Section 3 describes our proposed method, and in section 4, we present experimental results obtained for the proposed method. Finally, a summary of this study and future investigations are provided in section 5.
2 Multi-objective Automatic Adjustment of Optical Axes
A laser system typically consists of a pumping laser, as the source laser, and a laser cavity in which the beam from the pumping laser is amplified. If the output optical axes from the pumping laser are not precisely positioned and parallel (i.e., the incidence angle of the optical axes to the laser cavity is not 0 degrees), the laser cavity is adversely affected. Adjustment of the optical axes that control the output from the pumping laser is crucial, because the performance of a laser system deteriorates when the optical axes deviate from their precise positioning and parallelism is lost due to long-term use and disturbances, such as vibrations.
Fig. 1. Degrees of freedom (DOF) for optical axes: (a) DOF for the positioning of the optical axes; (b) DOF for the angle of the optical axes

Fig. 2. An automatic adjustment system for optical axes with four DOF (pumping source, two mirrors, optical sensor, motor controller, and PC)
Therefore, the positioning and the angle of the optical axes must be simultaneously adjusted to a target position and a target angle. To do so, it is necessary to adjust four degrees-of-freedom (DOF). The 4-DOF in the adjustment of optical axes are shown in Fig. 1. Specifically, the optical axes have two DOF for movement in the vertical and horizontal planes when positioning the optical axis, and similarly two DOF for the angle in the vertical and horizontal planes.

2.1 Automatic Adjustment System
In this paper, we adjust the positioning and the angle of the optical axes using the most basic adjustment system. The structure of the adjustment system is illustrated in Fig. 2. The system consists of two mirrors, each with two stepping motors; an evaluation detector to detect the positioning and the angle of the optical axis; a motor controller to control the stepping motors for the mirrors; and a PC to execute the calculations. The mirrors can be adjusted according to 4-DOF to adjust the positioning and the angle of the optical axis. An adjustment system [3] developed according to this structure is shown in Fig. 3. This system consists of the following elements.

1. Motor controller
The motor controller is a stepping motor driver, which can move the mirror holder system with a resolution of 0.075 μm/step. With this controller, the
Fig. 3. The developed automatic adjustment system for optical axes with 4-DOF
time to move the motors in evaluating each individual is at most about 4 seconds.

2. Evaluation detector
The detector consists of two-dimensional position sensitive detectors (S2044) and a signal processing circuit (C9069) produced by HAMAMATSU Photonics K.K. This detector, with a resolution of 1.39 μm, can detect the X-Y coordinates of the optical axes on the detector front.

3. PC
This executes the flow explained in Section 3.

4. Light source
The light source is a He-Ne gas laser (1125P) produced by JDS Uniphase. The beam diameter is 2 mm.

2.2 Noise Sources in the Adjustment System
In this adjustment system, there are two sources of noise that influence output evaluations: (1) observational noise from the evaluation detector that evaluates the state of the optical axes, and (2) driving precision noise due to the precision of the stepping motors. Although the stepping motors are moved according to constant displacements, the actual axial displacements are not constant. Accordingly, the optical axes can deviate from the desired state, even if the motors are moved according to the displacement settings in seeking to adjust toward the target state. Observational results for these noise sources are shown in Table 1. The values are root-mean-square (RMS) errors observed with the adjustment system.

Table 1. Two kinds of noise sources

                         Positioning error   Angle error
Error due to noise (1)   18.1 μm             0.5°
Error due to noise (2)   12.9 μm             0.4°
Fig. 4. Flowchart for the proposed method: t := 0 → calculate P(t) → choose exploration or exploitation (exploitation: calculate the weighted average g(x) and select a good point) → division of region → fitness evaluation → update the search history → t := t + 1 → on termination, obtain the Pareto optimal solutions
In terms of noise (1), the positioning RMS error was 18.1 μm and the angle RMS error was 0.5°. In terms of noise (2), the positioning RMS error was 12.9 μm and the angle RMS error was 0.4°.
3 Proposed Adjustment Method
We propose a robust and efficient automatic multi-objective adjustment method for optical axes using a stochastic binary search algorithm. The flowchart of the proposed method is shown in Fig. 4. The method utilizes a weighted averaged fitness within the BSA. We refer to the proposed method as a stochastic binary search algorithm utilizing a weighted averaged evaluation value (BSW). We improve on the single-objective adjustment method proposed in [4] to realize a multi-objective adjustment method that can adjust both the positioning and the angle of the optical axes; in particular, the 'good point' selection technique is improved for the exploitation phase. The BSW algorithm is explained in more detail below.

3.1 Stochastic Binary Search Algorithm
The strategy of the BSA is to use a binary search tree [5] to divide the search space into empty regions, allowing the largest empty regions to be approximated. The search tree is constructed by generating a point xt at random within a chosen hypercube, and then dividing that hypercube along the dimension that yields the most 'cube-like' subspaces. The basic algorithm for constructing the
Fig. 5. Probability of exploration P(t) against the number of fitness evaluations t, for C=0.02, K=0.1, σ=0.05 and N=500
binary search tree works by repeatedly choosing either an exploration or an exploitation step:

1. Exploration: the next point x_{t+1} is generated within the largest empty region.
2. Exploitation: the next point x_{t+1} is generated within the largest empty region that is within a small distance of a 'good point'.

The coordinates of the point x_t, the positioning error of the optical axes f_1^t, and the angle error of the optical axes f_2^t at x_t are stored in a search history F(t), as represented in (1):

F(t) = {(x_1, f_1^1, f_2^1), (x_2, f_1^2, f_2^2), ..., (x_i, f_1^i, f_2^i), ..., (x_t, f_1^t, f_2^t)}   (1)
The characteristics of the BSA are that (1) the search gradually shifts from exploration to exploitation, and (2) the BSA stores a complete search history for all regions that have been searched. Based on these characteristics, the BSA can reduce adjustment times, because it does not undertake exploration in the final phase and does not search in regions that have previously been searched. The flow is explained in more detail below.

1. Determination of the basic algorithm
The decision of whether to perform exploration or exploitation is made based on a probability distribution P(t) that varies with the number of fitness evaluations. P(t), which is calculated using Eq. (2), is illustrated graphically in Fig. 5:

P(t) = (C − 1) · [tanh((t/N − K)/σ) − tanh(−K/σ)] / [tanh((1 − K)/σ) − tanh(−K/σ)] + 1   (2)

where C is the minimum probability of performing the exploration step, σ is the rate at which the probability of exploration decays, K is the decay mid-point, and N is the maximum number of trials to be performed.
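Eq. (2) is easy to check numerically. The sketch below (function and argument names are ours) implements P(t) with the Fig. 5 parameters and reproduces its limiting behaviour: P(0) = 1 (pure exploration) and P(N) = C (the exploration floor).

```python
import math

def exploration_probability(t, C=0.02, K=0.1, sigma=0.05, N=500):
    """Probability of taking an exploration step at fitness evaluation t (Eq. 2).

    Decays smoothly from 1 at t=0 to the floor C at t=N; K is the decay
    mid-point (as a fraction of N) and sigma is the decay rate.
    """
    num = math.tanh((t / N - K) / sigma) - math.tanh(-K / sigma)
    den = math.tanh((1 - K) / sigma) - math.tanh(-K / sigma)
    return (C - 1) * num / den + 1
```

At each step, exploration is chosen when a uniform random draw falls below this probability; otherwise an exploitation step is taken.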
Fig. 6. Exploration: a new search point x is generated at random in the largest unsearched region ('A')

Fig. 7. Placement of a small hypercube (side 2dp) around a 'good point' during exploitation
2. Exploration
First, the largest region that has not previously been searched is identified from the search history, and a search point x_t is generated at random within that region. Exploration in a two-dimensional search is illustrated in Fig. 6: region 'A' is selected because it is the largest region that has not already been searched, and the search point x_t is generated at random within region 'A'.

3. Exploitation
First, Np points are selected at random from the search history and the weighted averaged fitness, explained in subsection 3.2, is calculated for those points. According to the calculated weighted averaged fitness, non-dominated solutions are chosen from the Np searched points, and a 'good point' is selected at random from among these non-dominated solutions. Next, a small offset distance dp is used to generate a hypercube of interest about the 'good point'; the small hypercube is placed around the 'good point' simply to provide an efficient means of identifying neighbouring regions. A search point x_t is then generated at random in the largest region that intersects the hypercube. Exploitation in a two-dimensional search is shown in Fig. 7: the small hypercube is placed around the 'good point' selected from the search history, the largest region 'B' is selected among the three regions intersecting the hypercube, and the search point x_t is generated at random within region 'B'.

4. Division of search region
After the generation of the search point x_t, the search region selected in either the exploration or the exploitation phase is divided into two subspaces along a dimensional axis through the search point x_t. The dimensional axis that yields the most 'cube-like' subspaces is selected. The definition of 'cube-like' employed here is the split that minimizes Eq. (3), where dmax is the maximum side length over the sides of the two subspaces,
Fig. 8. Updating the search history: each node of the binary tree stores a point x_i together with its positioning error f_1^i and angle error f_2^i
and dmin is the overall shortest side length. If H is 1, the two subspaces become hypercubes:

H = dmax / dmin   (3)
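The split-axis choice of Eq. (3) can be sketched as follows. The representation of a region by per-dimension lower and upper bounds, and the function name, are our assumptions for illustration; the point is assumed to lie strictly inside the region.

```python
def choose_split(region_low, region_high, point):
    """Pick the dimension whose axis-aligned cut at `point` yields the most
    'cube-like' pair of subspaces, i.e. minimizes H = dmax / dmin (Eq. 3).

    region_low, region_high: per-dimension bounds of the region being divided.
    Returns the index of the chosen split dimension.
    """
    best_dim, best_h = None, float("inf")
    for d in range(len(point)):
        # Collect the side lengths of both subspaces if dimension d is cut.
        sides = []
        for cut_lo, cut_hi in ((region_low[d], point[d]), (point[d], region_high[d])):
            for i in range(len(point)):
                sides.append(cut_hi - cut_lo if i == d else region_high[i] - region_low[i])
        h = max(sides) / min(sides)
        if h < best_h:
            best_dim, best_h = d, h
    return best_dim
```

For a unit square with the point at (0.5, 0.9), cutting dimension 0 gives two 0.5×1 halves (H=2) while cutting dimension 1 gives a 1×0.1 sliver (H=10), so dimension 0 is chosen.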
5. Fitness evaluation
In this adjustment, the positioning error and the angle error of the optical axes are obtained after displacement of the four stepping motors. The unit for f1 is μm and the unit for f2 is degrees. These fitness values are calculated from the X-Y coordinates on the front of the optical sensor and the power of the laser beam.

6. Updating of search history
The search history is updated by adding information about the search point x_t, the fitness values, and the two subspaces. Updating of the binary tree that serves as the search history is illustrated in Fig. 8: when the search point is generated in the region marked by node 2 and that region is divided into two subspaces, the information about the search point and the fitness is stored in node 2, and the two nodes for the two divided subspaces become child nodes of node 2. The adjustment terminates after N repetitions of the above steps.

3.2 Weighted Averaged Fitness
The conventional method of coping with noisy fitness functions is to evaluate the fitness several times for each individual and to adopt the average of the sampled values [6,7]. However, this conventional method is not practical for adjusting laser systems because of the increase in adjustment time: if moving the motors and detecting the two fitness values were performed N times, the adjustment time would increase N-fold.
In order to solve this problem, a weighted average value, calculated from the search history F(t), is used as the fitness value. For the laser system, we assume that the detected values f_1^t, f_2^t at x_t increase or decrease in proportion to the distance d_t from a given point x to the point x_t, and that the noise varies according to a normal distribution. Thus, a maximum-likelihood estimation g_i(x) for f_i(x) can be obtained as follows:

g_i(x) = [ f_i(x) + Σ_{t=1}^{T−1} (1 / (1 + k × d_t²)) f_i^t ] / [ 1 + Σ_{t=1}^{T−1} 1 / (1 + k × d_t²) ]   (i = 1, 2),   (4)

d_t = ‖x − x_t‖,   (5)

where g_1(x), g_2(x) are the weighted averaged values for the positioning and the angle of the optical axes, f_1(x), f_2(x) are the values evaluated with the detector for the positioning and the angle of the optical axes, f_1^T = f_1(x), f_2^T = f_2(x), x_T = x, k is the proportionality constant, and d_t is the distance from the sampled point. The 'good point' in the exploitation step of the BSA is decided using the weighted averaged fitness values g_1(x) and g_2(x). There are two advantages to this method. The first is that it can prevent premature convergence during the exploitation phase due to observational and driving-precision noise. The second is that only one evaluation is needed for each individual. Thus, this method is capable of robustly and efficiently adjusting the optical axes in noisy environments.
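A minimal sketch of the weighted averaged fitness of Eqs. (4)–(5); the list-of-tuples representation of the search history F(t) and the function name are our assumptions for illustration.

```python
def weighted_average_fitness(x, history, k=10.0):
    """Distance-weighted estimate (g1, g2) of the two noisy objectives (Eq. 4).

    The current observation at x gets weight 1, and each past sample
    (x_t, f1_t, f2_t) gets weight 1 / (1 + k * ||x - x_t||^2)  (Eq. 5).

    history: list of (point, f1, f2) tuples; the last entry must be the
             observation taken at x itself.
    """
    *past, (x_cur, f1_cur, f2_cur) = history
    num1, num2, den = f1_cur, f2_cur, 1.0
    for x_t, f1_t, f2_t in past:
        d2 = sum((a - b) ** 2 for a, b in zip(x, x_t))
        w = 1.0 / (1.0 + k * d2)   # nearby samples count more
        num1 += w * f1_t
        num2 += w * f2_t
        den += w
    return num1 / den, num2 / den
```

Because the estimate reuses samples already stored in F(t), the motors need not be moved again to average out noise, which is what keeps the per-individual evaluation count at one.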
4 Adjustment Experiments

4.1 Experimental Details
Three experiments were conducted to examine the effectiveness of the proposed method: a multi-objective automatic adjustment experiment using NCGA [8] as the MOGA, and two multi-objective automatic adjustment experiments using the BSA and the BSW, respectively. The adjustment goal in each case was to bring the laser system to its ideal state (i.e., both the positioning error and the angle error of the optical axes are 0). The initial conditions were states obtained by randomly altering the position within ±5 mm and the angle within ±5°. Adjustment started from these initial states, terminated when a predetermined number of evaluations had been performed, and was conducted over 5 trials. After adjustment, the optical axes are re-set to a state based on the Pareto optimal solutions, and the positioning and the angle of the optical axes are re-evaluated. By using an average over 10 evaluations as the re-evaluation value for the positioning and the angle of the optical axes, the reliability of the obtained results is increased.¹

¹ This process is not carried out during actual adjustment with BSW.
Table 2. Average adjustment times for five trials for the three adjustment methods
Method                       Adjustment time (min)
MOGA (conventional method)   180
BSA                          12.5
BSW (proposed method)        11.9
The parameters in these experiments were as follows:

– NCGA: The population size was 20. The probabilities of crossover and mutation were 0.7 and 0.05, respectively. The number of fitness evaluations was about 4820 (the iteration count for the genetic operators was about 240).
– BSA: The parameters of P(t) were C = 0.02, K = 0.15, σ = 0.05 and N = 500. The number of fitness evaluations was 300.
– BSW: The parameters of P(t) were the same as for the BSA, with k = 10. The number of fitness evaluations was 300.

We used attainment surfaces [9] to compare the Pareto optimal solutions obtained by each adjustment method. An attainment surface is defined as the boundary between the region that is dominated by the Pareto optimal solutions and the region that is not. By comparing the obtained attainment surfaces, we can verify the precision that each adjustment method attains. A 50%-attainment surface [9] indicates the precision attainable by each adjustment method with a probability of 50%; in our experiments, it is the 3rd of the 5 attainment surfaces obtained from the Pareto optimal solutions of the 5 trials.

4.2 Experimental Results
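The dominance test that underlies both the 'good point' selection in Section 3.1 and the Pareto fronts compared below can be sketched as follows (a naive O(n²) filter, with our own function name; smaller is better in both objectives):

```python
def non_dominated(points):
    """Return the non-dominated subset of (positioning_error, angle_error)
    pairs. A point p is dominated if some other point q is no worse in both
    objectives and strictly better in at least one (q != p guarantees this
    when q <= p holds component-wise).
    """
    front = []
    for p in points:
        if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points):
            front.append(p)
    return front
```

For instance, among the pairs (100 μm, 0.2°), (200 μm, 0.1°), (300 μm, 0.3°) and (150 μm, 0.25°), only the first two are mutually non-dominated.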
Figs. 9-12 present the results of each trial for each adjustment method. The 50%-attainment surfaces for each adjustment method are illustrated in Fig. 9. Figs. 10-12 present the 50%-attainment surfaces formed by the Pareto optimal solutions obtained before and after the re-setting of the optical axes for each adjustment method. Table 2 presents the average adjustment times over the 5 trials for each adjustment method. First, in terms of adjustment time, adjustment using the BSW could be performed in about 12 minutes, representing an adjustment time reduction of 93%. As shown in Table 2, the conventional method using a MOGA took 3 hours, while the BSA and the BSW required only 12.5 and 11.9 minutes, respectively. The reason for these results is that the BSW does not execute exploration in the final phase and does not search in regions that have previously been searched. Secondly, the BSW could adjust the optical axes with high precision. For example, if the optical axes are re-set to the state based on the point in Fig. 9, the positioning error would be 100 μm and the angle error 0.2°. This positioning precision is 5% of the 2 mm beam diameter. Thirdly, the BSW provides robust adjustment of the optical axes. This can be seen by comparing the 50%-attainment surfaces obtained by the three
Fig. 9. Comparison of the 50%-attainment surfaces (positioning error (μm) vs. angle error (degrees)) obtained by the three adjustment methods after re-setting

Fig. 10. Adjustment results for the MOGA before and after re-setting

Fig. 11. Adjustment results for the BSA before and after re-setting

Fig. 12. Adjustment results for the BSW before and after re-setting
adjustment methods, as shown in Fig. 9. Clearly, the precision of the 50%-attainment surface with BSW is higher than with both the MOGA and BSA in terms of both evaluation functions, positioning error and angle error. This result is also supported by comparing the 50%-attainment surfaces before and after re-setting for each adjustment method. The precision of the 50%-attainment surfaces for the MOGA and for BSA deteriorated after re-setting due to the influence of noise, as shown in Figs. 10 and 11. In contrast, BSW was able to retain the high precision of the 50%-attainment surface even after the optical axes were re-set, as shown in Fig. 12. The reason for these results is that BSW uses a weighted average as the fitness value.
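The noise robustness attributed here to a weighted averaged fitness can be illustrated with a small sketch. The measurement model, the weights, and the number of repetitions below are invented for illustration; the paper does not specify the values BSW uses:

```python
import random

def noisy_measure(x):
    # Hypothetical noisy evaluation of one candidate adjustment ``x``
    # (a stand-in for a real positioning/angle-error measurement).
    true_value = (x - 0.3) ** 2
    return true_value + random.gauss(0.0, 0.05)

def weighted_average_fitness(x, weights=(0.5, 0.3, 0.2)):
    # Fitness as a weighted average of repeated noisy measurements;
    # averaging damps the noise that would otherwise mislead the search.
    samples = [noisy_measure(x) for _ in weights]
    return sum(w * s for w, s in zip(weights, samples))
```

Averaging several measurements trades evaluation time for a lower-variance fitness estimate, which is the effect exploited above.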
5 Conclusion
In this paper, we have proposed a robust and efficient multi-objective automatic adjustment method to adjust the positioning and the angle of optical axes. The results of adjustment experiments using the proposed method demonstrate that
354
N. Murata et al.
adjustment could be performed in about 12 minutes, which represents an adjustment time reduction of 93% compared to the conventional method. Moreover, our method could achieve robust multi-objective automatic adjustment of the optical axes. A future investigation will be the application of the proposed method to optical systems involving multiple components with several DOF, in order to verify its effectiveness for more difficult adjustment tasks. The proposed method can be employed not only with optical systems but also with other kinds of engineering systems. With the proposed method, it is possible to execute adjustment robustly and efficiently in noisy environments.
Acknowledgement

This work was supported in 2004 by the Industrial Technology Research Grant Program of the New Energy and Industrial Technology Development Organization (NEDO) of Japan and by a Grant-in-Aid for JSPS Fellows in 2005.
Minimization of the Redundant Sensor Nodes in Dense Wireless Sensor Networks

Dingxing Zhang1,2, Ming Xu1, Wei Xiao3, Junwen Gao4, and Wenshen Tang1

1 School of Computer, National University of Defense Technology, Changsha, China
2 Guangdong Tech. College of Water Resources & Electric, Guangzhou, China
3 College of Mathematics and Computer Science, Hunan Normal University, Changsha, China
4 School of Mechanical Engineering, South China University of Technology, Guangzhou, China
[email protected]
Abstract. Most sensor networks are deployed with high density, and node duty cycles are then dynamically managed in order to prolong the network lifetime. In this paper, we address the issue of maintaining sensing coverage of surveillance targets in high-density wireless sensor networks and present an efficient technique for selecting active sensor nodes. First, the At Most k-Coverage Problem (AM k-Coverage) is modeled as a nonlinear integer program. Then a Genetic Algorithm is designed to solve the multi-objective nonlinear integer program; this is a quasi-parallel method. Using the Genetic Algorithm, a central algorithm is then designed to organize a sensor network into coverage sets. Considering that the central base station consumes a great deal of energy when it collects the coverage information from every node, we also propose a localized variant on the basis of the proposed central algorithm. Finally, experimental results show that the proposed algorithm can construct the coverage sets reliably and reduce the number of active sensor nodes, which helps to reduce system energy consumption and prolong the network lifespan.

Keywords: coverage sets, multi-objective optimization, genetic algorithm, Pareto-optimal, local central node, maximal remaining energy.
1 Introduction

A sensor network is often densely deployed over a hostile or remote environment. One of the fundamental tasks of a wireless sensor network is to provide full monitoring and pertinent data from the physical world. Once scattered over sensitive areas, the sensor nodes become single-use, since their batteries cannot easily be replaced or recharged. Therefore, energy is the most important resource. Density control keeps the density of the working sensors at a certain level to avoid redundancy: it ensures that only a subset of sensor nodes works in the active mode while the given covering task is still fulfilled. That is, density control is a promising approach to conserving system energy and extending the lifetime of wireless sensor networks. Most previous research on density control focuses on dividing the sensor network into disjoint subsets such that every subset completely covers all sensing objects.

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 355–367, 2007. © Springer-Verlag Berlin Heidelberg 2007
356
D. Zhang et al.
In [5], [7], the authors designed dominating set algorithms for the coverage of individual targets. Cardei and Du [3] address the maximum disjoint coverage sets problem and model the disjoint sets as disjoint coverage sets such that every set can completely monitor all the target points. They proposed an efficient method to extend the sensor network lifetime by dividing the sensors into a maximal number of disjoint coverage sets. For many sensor network applications, such as forest environment surveillance, it is also preferable to cover the target objects only as fully as possible while minimizing energy consumption, due to the following factors. First, since these coverage subsets are in active mode alternately, the coverage quality can be guaranteed statistically by setting an appropriate number of subsets, even if a few blinds occur in the coverage subsets. For dense sensor networks, the coverage blinds are not static and can be covered by another subset at another time, as long as they lie within the sensing range of some sensor nodes. Second, as the sensing task always correlates with a particular application, redundant target objects are often set in practice to improve the coverage quality when a target is very important. Besides, some previous research on coverage problems focuses on constructing a mathematical model of the problem. Chakrabarty et al. [2] adopted the first ILP (Integer Linear Programming) model to study grid sensing field problems, in which the ILP model contains quadratic constraints. Cardei et al. [4] improved the network lifetime by further relaxing the constraints of disjoint set covers, i.e., one node can be in multiple set covers. In this paper, we limit the upper bound of the coverage degree of each target to guarantee a lower redundancy. Since we do not limit the lower bound of the coverage degree, coverage blinds (i.e., targets not covered by any sensor node in a coverage subset) may occur when a coverage subset is constructed.
In this problem, we want minimal numbers of sensor nodes and blinds in the coverage subsets. One possible way to handle these two goals is to consider the joint optimization of the size of a coverage subset and the number of coverage blinds. Thus, an Integer Nonlinear Programming model with (n+m) variables and 2m constraints is presented in this paper. Due to the multi-objective optimality conditions, the resulting optimization problem gives rise to a set of optimal solutions instead of a single optimal solution. Since a GA deals with a population of points and can capture multiple Pareto optima in the population, we apply a GA to find these solution vectors in this paper. The major contributions of this paper are as follows. First, we construct the maximal number of coverage sets based on the At Most k-Coverage Problem (AM k-Coverage); the degree of coverage is flexible in this framework. Second, a central algorithm is proposed to divide the dense wireless sensor network into coverage subsets based on the GA, which is a quasi-parallel method. Finally, considering that the central base station obviously consumes a great deal of energy when it collects the coverage information from every node, we also propose a localized variant on the basis of the proposed central algorithm. The rest of the paper is organized as follows. In Section 2, we give the basic hypotheses and the problem definition. Section 3 presents multi-objective optimization using a GA to solve the problem raised in Section 2. Section 4 presents the central algorithm based on the GA, and a local solution based on the central algorithm is designed in Section 5. Detailed results of performance evaluations are presented in Section 6. Finally, Section 7 concludes the paper.
2 The Basic Hypothesis and Problem Definition

2.1 Basic Hypothesis and Problem Definition

In this paper, a large wireless sensor network is considered in which all sensor nodes are randomly deployed close to target objects whose coordinates are known. The sensor nodes gather data which are sent to base stations (i.e., central data collector nodes). The sensing data might be processed at the base stations or at the local sensor nodes. We make the following assumptions about the sensor network: (1) all the sensor nodes are static and have the same computation capabilities; (2) each sensor node has two power states, active and asleep, and energy dissipation is negligible in the sleeping state; (3) devices can be time-synchronized so that activity decisions can occur in rounds; (4) as in most existing algorithms, sensor nodes know their respective positions, since the positioning issue has already been addressed [6]. Given a set of sensors, a set of targets, and the sensor-target coverage map, we define At Most k-Coverage (AM k-Coverage) as follows:

Definition 1: AM k-Coverage. Given a sensor network with n sensor nodes and a target set T with m targets, find a family of coverage sets such that (1) the number of sensor nodes in the coverage sets and the number of coverage blinds are minimized, and (2) each target is covered by at most k sensor nodes.

As wireless sensor networks are often deployed randomly by aircraft, it is difficult to guarantee a uniform distribution of sensor nodes in the real world. Thus, a flexible framework for constructing coverage sets has to be designed. In the AM k-Coverage definition, we limit the upper bound of the coverage degree of each target to guarantee a lower redundancy: each target object is covered by at most k sensor nodes, which differs from previous work in this field [12].
We are first given a set of n sensor nodes S={s1,s2,… ,sn} and a set of targets T={ t1,t2,… ,tm }. In addition, a relational matrix C=(cij)n×m between sensor nodes and targets is given, where cij is a Boolean variable, for i=1…n, j=1…m, if the target tj is covered by sensor si, then cij=1, otherwise cij=0. Let define the Boolean variable xk (k=1…n) as follows: xk=1, if the sensor node sk is selected in a coverage set, otherwise, xk=0. Using the variables above, we were able to formulate the relational problem. Definition 2: Total Overlapping Coverage of Single Target (TOC). Let S = {si| i=1…n} be a wireless sensor network and a target set T with m targets tk (k=1…m), the Total Overlapping Coverage of the target tj is defined as:
∑_{i=1}^{n} c_{ij} x_i , (j = 1…m)    (1)
In the TOC definition, the term ∑_{i=1}^{n} c_{ij} x_i is the coverage overlap of the target tj. We also define the Boolean variable yp (p = 1…m), which indicates whether or not target tp is covered by at least one active sensor node (yp = 0 if target tp is covered by at least one active sensor node; otherwise yp = 1). Thus, in a certain coverage subset, the total cost of the coverage blinds is denoted as:
∑_{p=1}^{m} v_p y_p    (2)
In the term (2), v_p is the weight of the target point p. Based on the above, AM k-Coverage can be formally defined as follows:

Definition 3: AM k-Coverage. Given a sensor network with n sensor nodes and a target set T, we find a family of sensor nodes to construct a coverage set that minimizes the total coverage blinds ∑_{p=1}^{m} v_p y_p and the total number of sensor nodes in the coverage set ∑_{i=1}^{n} w_i x_i, while the TOC of each target is at most k, i.e., ∑_{i=1}^{n} c_{ij} x_i ≤ k, j = 1…m, where k is some given integer.

In the AM k-Coverage definition, each target, if covered, is covered by at most k sensor nodes. Without loss of generality, we assume that the cost of keeping a node awake is the same for all sensor nodes. We can then formulate the combinatorial optimization problem AM k-Coverage as the following multi-objective integer programming problem:

Objective:
  Minimize f_1(x_1, x_2, …, x_n) = ∑_{i=1}^{n} w_i x_i
  Minimize f_2(y_1, y_2, …, y_m) = ∑_{j=1}^{m} v_j y_j
Subject to:
  y_j + ∑_{i=1}^{n} c_{ij} x_i ≥ 1, j = 1…m
  y_j ∑_{i=1}^{n} c_{ij} x_i = 0, j = 1…m    (3)
  ∑_{i=1}^{n} c_{ij} x_i ≤ k, j = 1…m
  x_i, y_j ∈ {0,1}, i = 1…n, j = 1…m

where w_i and v_k are the weights of sensor node i and target object k, respectively. The constraints y_j + ∑_{i=1}^{n} c_{ij} x_i ≥ 1 and y_j ∑_{i=1}^{n} c_{ij} x_i = 0, j = 1…m, guarantee that y_j = 0 does not occur simultaneously with ∑_{i=1}^{n} c_{ij} x_i = 0. That is, if y_j ≠ 0, then ∑_{i=1}^{n} c_{ij} x_i = 0, and vice versa. The constraint ∑_{i=1}^{n} c_{ij} x_i ≤ k for all j = 1…m guarantees that each target is covered by at most k active sensors during the working duration of each coverage set. The value of TOC is restricted in this model in order to obtain more coverage sets. Formula (3) contains 3m constraints, of which 2m are linear and m are nonlinear. However, a careful analysis of the constraints y_j + ∑_{i=1}^{n} c_{ij} x_i ≥ 1 and y_j ∑_{i=1}^{n} c_{ij} x_i = 0, j = 1…m, shows that they only express a dependence of the vector y on the vector x. That is, once we obtain the solution vector x = (x_1, x_2, …, x_n), the vector y = (y_1, y_2, …, y_m) follows. In this paper, we first determine the vector x; the vector y is then obtained by applying these two constraints. Thus, we can reduce the number of constraints in problem (3) by pre-treating the constraints y_j ∑_{i=1}^{n} c_{ij} x_i = 0 and y_j + ∑_{i=1}^{n} c_{ij} x_i ≥ 1, j = 1…m.
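Under this pre-treatment, a candidate vector x fully determines y and both objective values. A minimal sketch of evaluating a candidate for problem (3) follows; the toy coverage matrix and weights are invented for illustration:

```python
def evaluate(x, C, w, v, k):
    # Evaluate a candidate selection vector x for problem (3).
    # C is the n-by-m coverage matrix, w/v the sensor and target weights.
    n, m = len(C), len(C[0])
    toc = [sum(C[i][j] * x[i] for i in range(n)) for j in range(m)]  # eq. (1)
    y = [0 if toc[j] > 0 else 1 for j in range(m)]   # pre-treated y: blind iff uncovered
    f1 = sum(w[i] * x[i] for i in range(n))          # cost of selected sensors
    f2 = sum(v[j] * y[j] for j in range(m))          # cost of coverage blinds, eq. (2)
    feasible = all(t <= k for t in toc)              # at-most-k constraint
    return f1, f2, feasible
```

Deriving y inside the evaluation is exactly the constraint reduction described above: only x is searched, and the 2m linking constraints never appear explicitly.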
In the parlance of multi-criterion decision-making, the multiple optimal solutions are Pareto-optimal [1]. Among the Pareto-optimal solutions in the search space, we cannot say that any single one is optimal; we can only choose better solutions from the set of obtained Pareto-optimal solutions according to the application requirements. Let us illustrate this aspect with the two-objective optimization problem shown in Figure 1. The figure considers two objectives, Objective 1 and Objective 2, both of which are to be minimized. The point A represents a solution with a near-minimal Objective 2 but a large Objective 1. On the other hand, the point E represents a solution with a near-minimal Objective 1. If both objectives are important design goals, we cannot really say whether solution A is better than solution E, or vice versa. In fact, there exist many such solutions (like solution C) which also belong to the Pareto-optimal set, and one cannot establish an absolute hierarchy among solutions A, B, D, or any other solution in the set without further information. All these solutions are known as Pareto-optimal solutions.
Fig. 1. Illustration of the concept of Pareto-optimal solutions
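The dominance relation behind Figure 1 can be sketched directly (for two minimized objectives; the sample points are invented):

```python
def dominates(a, b):
    # a dominates b: no worse in every objective, strictly better in at least one.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    # Keep exactly the points no other point dominates.
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

For example, among the points (1,5), (2,3), (3,4), (4,1), (5,5), the point (3,4) is dominated by (2,3) and (5,5) by (1,5), so only the other three are Pareto-optimal.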
3 Multi-objective Optimization Using Genetic Algorithms

3.1 Individual Representation and Population Sorting
In formula (3), since the vector y is determined by the vector x, although a solution vector is (x, y), where x = (x1, x2, …, xn) with xi ∈ {0,1} (i = 1…n) and y = (y1, …, ym) with yj ∈ {0,1} (j = 1…m), we only define the binary vector x as the individual of the population. As the first step in devising the GA, we consider the usual 0-1 binary representation as the obvious choice. If we use an n-bit binary string as the chromosome structure, a value of 1 for the i-th bit implies that sensor node si is in the solution. Figure 2 illustrates a typical example of a chromosome. In the generation process, an initial population with M chromosomes is randomly generated using a uniform random generator. After applying crossover and mutation operators to the old set of chromosomes on the basis of fitness, a new population is generated. In this paper, we use the approach proposed by Deb [8] and Srinivas and Deb [9], whose final solutions are close to the true optimum [8], [10]. In this method, the population of the GA is composed of different Pareto optimal fronts, including infeasible individuals. Since Pareto optimality defines how to determine the
1 0 0 0 1 0 0 1 1 1 0 … 0 0 0 1 1 1

Fig. 2. Binary representation of an individual's chromosome
set of optimal solutions, the main idea of our algorithm is to find multiple Pareto optimal solutions simultaneously, so that the decision maker can choose the most appropriate solution for the current application. To present our algorithm, we use the following terminology. A solution x is said to dominate another solution y if the objective vector (f_1(x), …, f_k(x)) dominates the vector (f_1(y), …, f_k(y)), i.e., ∀i ∈ {1, …, k}: f_i(x) ≤ f_i(y) and ∃j ∈ {1, …, k}: f_j(x) < f_j(y). An individual x is feasible if

feasible(x) = max { ∑_{i=1}^{n} c_{ij} x_i − k , j = 1…m } ≤ 0    (4)

Further, we define the constraint violation of an individual x for the j-th target object (i.e., the amount by which the coverage degree of the j-th target object exceeds k) as follows:

violate(x, t_j) = ∑_{i=1}^{n} c_{ij} x_i − k, if ∑_{i=1}^{n} c_{ij} x_i − k > 0; 0, otherwise    (5)

Then an individual x has a total constraint violation of ∑_{j=1}^{m} violate(x, t_j).
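The feasibility test (4) and the violation measure (5) can be sketched together as follows; the coverage matrix in the test is invented:

```python
def total_violation(x, C, k):
    # Total constraint violation, eqs. (4)-(5): sum over all targets of
    # how far the coverage degree exceeds the bound k.
    n, m = len(C), len(C[0])
    total = 0
    for j in range(m):
        degree = sum(C[i][j] * x[i] for i in range(n))
        total += max(0, degree - k)
    return total
```

An individual is feasible in the sense of (4) exactly when this total is zero.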
Now, we determine each individual's rank and crowding distance according to its position in the front to which it belongs. All individuals in the first front are given rank 1, the individuals in the second front are assigned rank 2, and so on. After
assigning ranks, the crowding distance within each front is calculated, following the literature [10]; we do not repeat its description here.

3.2 Selection, Crossover and Mutation
We use the tournament selection technique to generate a new population. The simulated binary crossover (SBX) operator is designed to simulate the operation of a single-point binary crossover directly on real variables; we use the SBX [11] operator for crossover together with polynomial mutation. To apply them to binary variables, we slightly modify the genetic crossover and mutation operators acting on the original individuals p_i. First, a random number μ, distributed uniformly between 0 and 1, is created. Then, according to the equation ∫_0^β p(x) dx = μ, we obtain the sample β, where

p(x) = 0.5 (η + 1) x^η, if 0 ≤ x ≤ 1;  0.5 (η + 1) x^{−(η+2)}, otherwise    (6)

and η is a given crossover distribution index. After finding β from the above probability distribution, the children solutions c^1, c^2 are calculated as follows:

c_i^1 = 1 if t_i^1 ≥ 1, 0 otherwise;  c_i^2 = 1 if t_i^2 ≥ 1, 0 otherwise    (7)

where

t_i^1 = 0.5 [(1 + β) p_i^1 + (1 − β) p_i^2],  t_i^2 = 0.5 [(1 − β) p_i^1 + (1 + β) p_i^2]    (8)

For mutation, choose a sample δ from the equation ∫_0^δ p(x) dx = μ, where

p(x) = 0.5 (φ + 1) (1 − x)^φ    (9)

and φ is a given mutation distribution index. After finding δ, the mutated children solutions c_i = (c_ij)_{1×n} are calculated as c_ij = 1 if y_ij ≥ 1, 0 otherwise, where y_ij = p_ij + δ. To summarize our modified GA for AM k-Coverage, the following steps are used.
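Sampling β from the density in (6) does not require numerical integration: the CDF inverts in closed form, which is how SBX is usually implemented. A sketch:

```python
def sample_beta(mu, eta):
    # Invert the CDF of the density in eq. (6): solve
    # integral_0^beta p(x) dx = mu for beta in closed form.
    # For beta <= 1 the CDF is 0.5 * beta**(eta+1); beyond 1 it is
    # 1 - 0.5 * beta**(-(eta+1)).
    if mu <= 0.5:
        return (2.0 * mu) ** (1.0 / (eta + 1.0))
    return (1.0 / (2.0 * (1.0 - mu))) ** (1.0 / (eta + 1.0))
```

With μ = 0.5 the sample is exactly β = 1, i.e., children coincide with their parents before the thresholding of (7).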
Algorithm 1: Genetic Algorithm Overview
1: Given the maximal number of iterations max_gen, a collection S of sensors, a collection T of targets, and the sensor-target coverage map.
2: Generate an initial population P of size M; every individual is chosen uniformly at random from the decision space X = {0, 1}^n.
3: for gen = 1 to max_gen
4:   Sort the population;
5:   Select individuals using the tournament selection technique according to individual fitness;
6:   Apply the SBX operator for crossover and polynomial mutation to obtain offspring;
7:   gen = gen + 1;
8: end for
9: return the Pareto Optimal Set.
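Step 5 of Algorithm 1 can be sketched as a binary tournament in the style of [10]: lower Pareto rank wins, and ties are broken by the larger crowding distance. The population, ranks, and distances below are placeholders:

```python
import random

def crowded_tournament(pop, rank, crowd, rng=random):
    # Binary tournament: pick two random indices; the individual with
    # the lower Pareto rank wins, ties go to the larger crowding distance.
    a, b = rng.sample(range(len(pop)), 2)
    if rank[a] != rank[b]:
        return pop[a] if rank[a] < rank[b] else pop[b]
    return pop[a] if crowd[a] >= crowd[b] else pop[b]
```

Favouring large crowding distance on ties is what maintains a spread of solutions along the Pareto front.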
4 Central Algorithm Based on GA

In an effective centralized approach, the sensors first send their coordinates and coverage information to the base station. The base station divides the sensor network into coverage sets and then broadcasts the sensor schedules back. With this static network structure, the central algorithm needs to be run only once. When a solution to the problem is available, it is transmitted to each sensor in the form of a coverage set index. The index is used for duty scheduling, i.e., it tells a sensor whether to be in the sleep mode or in the active mode. Clearly, the disadvantage of centralized algorithms is that they rely on the network's ability to transmit data from every single node to the base station and vice versa. Fortunately, some work has already been done on this subject, so we do not consider it here. The idea of Algorithm 2 is as follows. Call Algorithm 1 to obtain a Pareto optimal set. Each element in the set denotes a coverage set; for instance, given a sensor network with 10 nodes, an element X = (1,1,0,0,0,1,0,1,0,0) denotes that s1, s2, s6, s8 are in the same coverage set. A Pareto optimal set may contain several elements, and we have to choose suitable elements according to the current application; this choice is straightforward, so we do not discuss it here. The process is continued until all sensor nodes are assigned to at least one coverage set. Specifically, we associate each coverage subset with a unique id, called subsetid, and all nodes belonging to the same coverage subset share its subsetid. According to Algorithm 2, a sensor node may have multiple subsetids if it belongs to several coverage subsets.
Algorithm 2: Central Algorithm for Dividing Coverage Sets Based on Algorithm 1
1: Initialize the coverage set index, i.e., c_index = 1
2: While (S ≠ ∅) do
3:   Call Algorithm 1 to obtain the Pareto Optimal Set P = {x_1 … x_p}
4:   Choose a suitable subset P_sub = {X_s1, X_s2, …, X_st} ⊆ P according to the current application
5:   for i = 1 to t
6:     each X_si determines a coverage set C(c_index)
7:     S ← S − C(c_index)
8:     c_index ← c_index + 1
9:     i ← i + 1
10:  End for
11: End while
12: Return {C_1 … C_c_index}.

In Algorithm 2, the initial S denotes the sensor node set, i.e., S = (s_1, s_2, …, s_n).
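The outer loop of Algorithm 2 can be sketched as follows. Here `run_ga` stands in for Algorithm 1 and `choose_subset` for the application-specific choice of Pareto elements; both are placeholders supplied by the caller:

```python
def divide_into_coverage_sets(sensors, run_ga, choose_subset):
    # Repeatedly call the GA on the remaining sensors, turn each chosen
    # Pareto element into a coverage set, and remove its members
    # (S <- S - C(c_index)) until every sensor is assigned.
    remaining = set(sensors)
    cover_sets = []
    while remaining:
        pareto = run_ga(remaining)
        for members in choose_subset(pareto):
            cover_sets.append(set(members))
            remaining -= set(members)
    return cover_sets
```

The loop terminates because every chosen element removes at least its own members from `remaining`; the GA and the selection policy determine how many sets are produced per iteration.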
5 A Local Solution

Since the centralized approach obviously consumes a great deal of energy when it collects and issues the coverage information from and to every node, the Election of Local Central Node Based on Maximal Remaining Energy (ELCNBMRE) algorithm, which is based only on m-hop neighborhood information, is proposed in this section. ELCNBMRE divides the network into several groups, each of which has a local central sensor node that implements Algorithm 2.
5.1 Detailed Description of ELCNBMRE
In this subsection, we describe the local central node election protocol in detail. The protocol attempts to select a sensor node with maximal energy as the local central node. At any time, a sensor node is in one of three states: "INITIAL", "ELECTED", "UNELECTED". Time is divided into two phases: the local central node election phase and the network working phase. At the beginning of the local central node election phase, all the nodes wake up, set their states to "INITIAL", and carry out the operation of electing the local central node. By the end of the phase, all the sensor nodes have changed their states to either "ELECTED" or "UNELECTED". We assume the time length of the local central node election phase is t0. The protocol is described as follows.
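The election in Step 1 below hinges on an energy-proportional backoff timer; a minimal sketch, with invented energy values:

```python
def backoff_delay(e_current, e_initial, t0):
    # Backoff before announcing candidacy: (1 - Ei/E0) * t0, so the node
    # with the most remaining energy broadcasts ELECTING first.
    return (1.0 - e_current / e_initial) * t0

# Illustrative remaining energies (E0 = 1000, t0 = 100 s, values invented):
nodes = {"a": 800.0, "b": 950.0, "c": 400.0}
winner = min(nodes, key=lambda n: backoff_delay(nodes[n], 1000.0, 100.0))
```

Because the delay decreases monotonically with remaining energy, the first node to fire (and so win the election) is always the one with the most energy in its m-hop neighborhood.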
Step 1. At the beginning of the local central node election phase, every sensor node sets a backoff timer of (1 − Ei/E0) t0, where Ei and E0 are respectively the current remaining energy and the initial energy of the i-th sensor node. When the timer expires, the sensor node broadcasts an ELECTING message to all its m-hop neighbors and sets its state to "ELECTED". If a sensor node hears another ELECTING message before its timer expires, it cancels its timer, does not become a local central node, and changes its state to "UNELECTED", which indicates that the sensor node is not a local central node. The ELECTING message contains the node ID. Through the backoff timers, the sensor node with the maximal current remaining energy broadcasts the ELECTING message first.

Step 2. When a sensor node receives an ELECTING message, if the sensor node is already "ELECTED" or "UNELECTED", or is more than m hops away from the sender, it ignores the message; otherwise it adds the sender to its local central node list and sets its state to "UNELECTED". Every sensor node in state "UNELECTED" broadcasts a JOINING message to the local central node to which it belongs. The JOINING message contains (1) the node ID, (2) coverage information, and (3) the local central node ID.

Step 3. If a sensor node is elected as the local central node, it sets a waiting timer Tw to receive the JOINING messages. The value of Tw is a tradeoff between performance and latency: it must be chosen so that the local central node can receive all JOINING messages from its group members. When the waiting timer Tw expires, the local central node has added each sender and its coverage information to its group member list and local coverage information list, respectively.

5.2 Local Algorithm Based on the ELCNBMRE Algorithm
In this subsection, a local algorithm based on ELCNBMRE is developed.

Algorithm 4: Local Algorithm Based on the ELCNBMRE Algorithm
Step 1. Use the ELCNBMRE algorithm to elect local central nodes.
Step 2. Use Algorithm 2 to divide each group into several coverage subsets, marking every coverage subset Ci, i = 2, …, p.

Specifically, we associate each local central node with a unique id, called groupid, and all nodes with the same local central node share its groupid. A sensor node may
Fig. 3. Target objects on the boundary between Group A and Group B covered more than k = 2 times
have multiple groupids if it belongs to more than one coverage subset. When the algorithm is implemented, the following two cases may occur: (1) There may exist isolated sensor nodes that have received neither ELECTING messages nor JOINING messages, even though they sent an ELECTING message. In this case, the isolated node first finds the closest sensor node and then joins the group to which that sensor belongs. (2) There may exist target objects covered more than k times on the boundary between groups (see Fig. 3). However, this case is negligible when the algorithm runs in practice, because the coverage subsets within the same group work by turns.
6 Implementations and Experiments

In this section, different case studies are presented to show the efficiency of the algorithm. Throughout these experiments, average results are reported over different problem settings. The deployment problem parameters are randomly generated using a uniform random generator. In all experiments, the crossover distribution index η and the mutation distribution index φ are set to 20.
Fig. 4. Different convergence results vs. iterative times
Table 1. Detailed results of running Algorithm 2

Number of Sensors  Iterative Times  Maximal Parallel Degree  Minimal Parallel Degree  Average Parallel Degree
200                3.74             5                        2                        3.27
300                5.13             7                        2                        4.61
400                6.2              7                        2                        5.3
500                6.71             8                        2                        5.72
600                9.31             11                       2                        5.47
The convergence of Algorithm 1 is tested first. In the experiment, we assume that 400 sensor nodes with a sensing radius of 80 m and 80 target points are randomly located in a 400 m × 400 m area to form a randomly distributed map. We restrict each target object to be covered by at most 4 sensor nodes. The initial population size is set to 50 and the maximum number of generations to 200. Algorithm 1 is run 50 times with a crossover probability of 0.9. Figure 4 shows the efficiency of Algorithm 1 for different numbers of iterations. As shown in Figure 4, the Pareto front moves towards the Pareto-optimal front; however, after about 30 iterations the Pareto front hardly improves with further iterations. The experimental results show that the algorithm is very robust in searching the feasible region and maintains a representative sampling of solutions along the Pareto-optimal surface. Table 1 presents a set of run results for Algorithm 2. The parallel degree in the table is the number of coverage sets produced per iteration. In Table 1, the whole family of coverage sets completely covers all the target objects, with at most 3 blinds in the coverage sets. The simulation results show that Algorithm 2 can produce coverage subsets in parallel. To validate and evaluate the proposed Algorithm 4, we have implemented it in Matlab 7. The number of GA iterations is set to 50. Each data point reported below is an average of 50 simulation runs. Parameters used: the initial energy is 1000 units of energy, and the remaining energy of each sensor node is uniformly randomly distributed within [100, 100]. Each sensor node has a sensing range of r = 80 m. In this
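Generating a random instance of this experimental setting amounts to building the coverage matrix C of Section 2.2 from a random deployment. A sketch (the seed and the small instance sizes in the test are arbitrary):

```python
import math
import random

def coverage_matrix(n_sensors, n_targets, area=400.0, radius=80.0, seed=1):
    # Build the Boolean sensor-target matrix C: c_ij = 1 iff target j
    # lies within the sensing range of sensor i.  Defaults mirror the
    # experimental setting (400 m x 400 m field, r = 80 m).
    rng = random.Random(seed)
    sensors = [(rng.uniform(0, area), rng.uniform(0, area)) for _ in range(n_sensors)]
    targets = [(rng.uniform(0, area), rng.uniform(0, area)) for _ in range(n_targets)]
    return [[1 if math.dist(s, t) <= radius else 0 for t in targets] for s in sensors]
```

The resulting matrix is the only problem-specific input the GA of Section 3 needs.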
Fig. 5. Average total number of blinds vs. the number of blinds in the coverage subsets of each group
Fig. 6. Number of working nodes vs. deployed sensor nodes
simulation, we do not consider any communication cost, such as the time to send and receive a message. We also do not consider any potential congestion due to using a flooding protocol. The timer values are set to t0 = 100 seconds and Tw = 200 seconds, respectively. We use the transmission range R = 100 m in the simulation, unlike the customary transmission range R = 2r. We restrict each target object to be covered by at most 4 sensor nodes. Figure 5 shows the average of the total blinds with 400 sensor nodes when the number of blinds in the coverage subsets of each group is 3, 4, 5, and 6, respectively. Although there are blinds in every coverage subset of each group, the total number of blinds is close to 0, because they are remedied by coverage subsets in different groups. Figure 6 compares the number of working nodes for the proposed Algorithm 2 and Algorithm 4 with different numbers of deployed sensor nodes when the blind number is 0. It shows that Algorithm 2 and Algorithm 4 have equivalent performance in the number of working nodes. Obviously, Algorithm 4 performs better than Algorithm 2 if we take the communication cost into account.
7 Conclusion
In this paper, we study the coverage subset problem under redundancy cover constraints using a discrete target model. We design a GA-based method to organize a sensor network into a set of coverage sets; the approach can produce solutions in parallel. In large sensor networks, a centralized approach obviously consumes a great deal of energy when it collects and issues coverage information from and to every node, so we also propose a localized algorithm based on the proposed centralized algorithm. Experimental results show that the proposed algorithms construct the coverage sets reliably and reduce the number of active sensor nodes, which helps reduce system energy consumption and prolong the network lifespan.
Minimization of the Redundant Sensor Nodes in Dense Wireless Sensor Networks
Evolving in Extended Hamming Distance Space: Hierarchical Mutation Strategy and Local Learning Principle for EHW Jie Li and Shitan Huang Xi'an Microelectronics Technology Institute 710071 Xi'an, Shaanxi, China
[email protected]
Abstract. In this paper the extended Hamming distance is introduced to construct the search space. According to the features of this space, a hierarchical mutation strategy is developed to enlarge the search area with less computational effort. A local learning principle is proposed; it ensures that no mutation operates on the same locus of a chromosome within one generation. An evaluation method called fitness effort, which measures the computational effort per unit of increased fitness, is also given. Experimental results show that the proposed hybrid approach of hierarchical mutation and local learning achieves better performance than traditional methods. Keywords: Evolvable Hardware, extended Hamming distance space, hierarchical mutation, local learning, fitness effort.
L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 368–378, 2007. © Springer-Verlag Berlin Heidelberg 2007
1 Introduction
Evolutionary algorithms are applied as the inherent inspiring mechanism that leads EHW to satisfy the requirements of its environment. Generally, evolution is considered as a movement performed on a fitness landscape [1]. An evolutionary operator such as mutation changes the fitness value of an individual, and thus moves its position on the fitness landscape away from or closer to a peak or a trough that indicates an acceptable solution. To obtain better performance, some strategies for determining evolutionary parameters, such as the mutation ratio, have been devised from the fitness-landscape point of view. L. Sekanina used standard CGP, which modifies one randomly selected gene of an individual, to evolve polymorphic digital circuits [2]. Miller argued that it would be suitable to adjust the mutation ratio according to the length of the genotype, with a fixed number of modified genes such as 2 or 3 genes per chromosome [3]. A mutation with a fixed number of modified genes operating on a population can be viewed as parallel hill climbing with a fixed step size, but it lacks search efficiency. D. Levi presented an algorithm, HereBoy, in which an adaptive mutation was applied [4]. The algorithm adjusts the mutation ratio by calculating the difference between the current individual's score and the maximum score. H. Liu applied a method based on a predefined segment function of how close the current fitness is to the maximum
fitness [5]. These strategies treat the fitness landscape as a continuous surface or curve, and use information learned from the landscape to guide the evolution. For example, an individual with a higher fitness value on the landscape is treated as if it were closer to a solution in the search space than one with a lower fitness value. The mutation ratio often begins high and drops off as the population converges on the solution [4]. However, the adopted evaluation methods carry no information about location in the search space: an individual with a higher fitness is not guaranteed to be closer to an optimal solution. If there is no individual better than the current one within the search range, the evolution may be trapped in a local optimum [1]. In this case, a fixed mutation ratio or a descending mutation ratio offers little help in moving the evolution away from the trap. Several researchers addressed this problem by applying neutral mutation [1][6]; here we deal with it in another way. In this paper we investigate the behavior of evolution in the search space and then develop a strategy for determining the mutation. An extended concept of distance derived from the Hamming distance for real-number strings is introduced. All individuals in the search space are organized by this extended Hamming distance, a hierarchical mutation strategy based on the extended distance is developed, and a local learning principle, which guarantees that no mutation operates repetitively on the same locus within one generation, is then proposed. Fitness effort, a method for evaluating the computational effort per unit of fitness, is also given. CGP [7], an elitist genetic programming algorithm without a crossover operator, is taken as the basic frame of the algorithm in this paper.
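As an illustration of the descending-ratio behaviour discussed above, here is a minimal sketch of a score-difference-driven mutation ratio in the spirit of HereBoy [4]. The linear form and the `max_ratio` constant are our assumptions for illustration, not the published algorithm's exact formula.

```python
def score_gap_mutation_ratio(fitness, max_fitness, max_ratio=0.05):
    """Descending adaptive mutation ratio (sketch): scaled by the gap
    between the current score and the maximum score, so it starts high
    and shrinks as the population converges on the solution."""
    return max_ratio * (max_fitness - fitness) / max_fitness

# Near a solution the ratio almost vanishes, which is precisely when an
# evolution trapped in a local optimum would need a *larger* step.
print(score_gap_mutation_ratio(0, 8))  # 0.05
print(score_gap_mutation_ratio(7, 8))  # 0.00625
```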
The rest of this paper is structured as follows: Section 2 introduces the distance concept and details the extended Hamming distance space; the next section deals with the hierarchical mutation strategy and the local learning principle; the evaluation technique, experimental settings and analysis of results follow; the final section presents the conclusions.
2 Extended Hamming Distance Space
A real-world search space contains all possible individuals and is usually of high dimensionality [8]. In this section, the search space is restructured as a distance space to make the behavior of evolution easier to understand.
2.1 Extended Hamming Distance
The Hamming distance describes the difference between two binary strings by counting the number of bits in which they differ. This concept is extended here to characterize the difference between real-number strings: for two real-number strings of equal length, the number of positions holding different values is called the extended Hamming distance (eHD). An extended Hamming distance space can be established using this definition.
2.2 Constructing the Search Space as an Extended Hamming Distance Space
In CGP, phenotypes are represented as graphical building blocks with n inputs and m outputs as well as the function presented. Genotypes are usually encoded as real-number strings, where each gene represents an input or a function. The search space is highly dimensional, and it is not easy to illustrate such a complex space clearly; however, by applying the concept of eHD, the behavior of evolution can be investigated easily. The search space takes the current best individual as its center, and the center moves as the current best individual changes. In such a space, all individuals are separated from their nearest neighbors by an eHD of one. Individuals at the same eHD from the current best individual lie on the surface of the same sphere, which takes the current best individual as its center and the eHD as its radius. Thus the eHD space can be characterized as a series of concentric spheres, and the number of spheres is determined by the length of an individual. An example of an eHD space with 4 individuals is illustrated in Fig. 1. Assume that A, B, C, D are real-number strings of length 3 taking the forms {1 2 3}, {5 2 3}, {1 2 7} and {6 2 3}, respectively. As shown in Fig. 1(a), A is the current best individual, and B, C, D are its nearest neighbors, distributed on a unit sphere. Suppose that after a mutation B takes the place of the best individual. Then the eHD space changes its structure as the center moves to B, and all other individuals are automatically rearranged on different spheres according to their distances to the new center. In this example, A and D are on the sphere of radius 1, while C has to move to the sphere of radius 2.
(a) before mutation
(b) after mutation
Fig. 1. Extended Hamming distance space varies as the current best individual changes
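The definition of eHD and the recentering of Fig. 1 can be checked directly. A short sketch (the function name `eHD` is ours) using the four individuals from the text:

```python
def eHD(a, b):
    """Extended Hamming distance: the number of positions at which two
    equal-length real-number strings hold different values."""
    assert len(a) == len(b), "strings must have equal length"
    return sum(1 for x, y in zip(a, b) if x != y)

A, B, C, D = [1, 2, 3], [5, 2, 3], [1, 2, 7], [6, 2, 3]

# Space centred on A: B, C and D are all nearest neighbours (eHD 1).
print([eHD(A, x) for x in (B, C, D)])  # [1, 1, 1]

# After B becomes the best individual the space recentres: A and D stay
# on the radius-1 sphere while C moves out to the sphere of radius 2.
print([eHD(B, x) for x in (A, C, D)])  # [1, 2, 1]
```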
Detailed information about how individuals are distributed in the search space might significantly help the evolution find the right direction towards the target; however, little is known about this distribution yet, and currently it is ignored in evolution. Hence the 3D eHD space formed by concentric spheres can be mapped into a 2D space composed of concentric circles. Fig. 2 shows the 2D space containing the same 4 individuals as Fig. 1. Individuals are located on different concentric circles according to their distances to the center, and the detailed distribution along a circle is ignored. No individual can
(a) before mutation
(b) after mutation
Fig. 2. 2D extended Hamming distance space represents the same search space shown in Fig. 1
appear in the empty region other than the circles and the center. Although how individuals are located on the circles is unknown, one thing is clear: all individuals lie on a limited set of concentric circles. In such a distance space, the behavior of evolution is to move the space center away from the current best individual towards a better one by repeatedly applying the mutation operator. This causes the structure of the space to be reconstructed dynamically and automatically. The final target of evolution is to place the space center on an individual that represents an acceptable solution.
3 Hierarchical Mutation and Local Learning
Traditional mutation strategies can be divided into two branches: fixed mutation and adaptive mutation. In eHD space, a fixed mutation keeps the evolution operating on a specific circle; for example, a mutation of 2 modified genes can only evolve individuals on the circle of radius 2. An adaptive mutation, on the other hand, tends to move the search region closer to the current best individual as its fitness approaches the maximum fitness value. Neither has the ability to enlarge the search area when the evolution is trapped. Take a 1-bit half adder as an example. Assume the applied function set is {2-NAND, 3-AND, 4-OR, 5-NOR, 6-XOR, 7-NXOR}. The two primary inputs are encoded as 0 and 1, respectively. The length of an individual is limited to 6 and the string is arranged as {input1 input2 function input1 input2 function}. Mutation is a one-gene modification. Fitness is the number of bits that satisfy the truth table, with a maximum of 8. Only an individual with a higher fitness value can take the place of the current best individual. Suppose the first randomly selected individual is {1 0 2 1 0 7}, with a fitness of 0; the resulting initial space is shown in Fig. 3. As Fig. 3 shows, after four mutation operations the center of the space moving along path 1 successfully reaches an acceptable solution, {0 1 3 1 0 6}. On path 2, though the center finally lands on an individual with only one incorrect bit, no individual with higher fitness exists around it within an eHD of one. Thus it can never move away from this point under the preconditions that one-gene mutation is used and only
Fig. 3. Extended Hamming distance space viewed from the initial individual {1 0 2 1 0 7}. Path 1 and path 2 show two possible routes that the space center moves along.
a better individual is accepted. In this example, either a fixed mutation or an adaptive one explores only a limited scope, restricting the ability to find better individuals.
3.1 Hierarchical Mutation
According to the example above, a mutation in eHD space should be able to exploit within a specific circle, as fixed mutation does, and also be able to explore an area covering circles of different radii with little computational effort. This requirement gives rise to the hierarchical mutation strategy for evolving hardware in eHD space. In previous works, offspring are generated from their parent directly. With the hierarchical mutation strategy, things are different: only a small portion of the offspring is generated from the parent directly, and the others are produced by the pioneer
Evolving in Extended Hamming Distance Space
373
individuals hierarchically. The number of individuals generated by a pioneer influences the search performance. A multi-offspring generating policy results in a tree-like hierarchical mutation, which causes exponential growth of the population and computation time. To balance search ability against computation time, a single-offspring generating policy is adopted.
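Before turning to the strategy itself, the 1-bit half-adder example from the start of this section can be checked numerically. A sketch with one assumption beyond the text: the first gate's output is compared against the carry bit and the second against the sum bit, since the output ordering is not stated explicitly.

```python
FUNCS = {2: lambda x, y: 1 - (x & y),   # NAND
         3: lambda x, y: x & y,         # AND
         4: lambda x, y: x | y,         # OR
         5: lambda x, y: 1 - (x | y),   # NOR
         6: lambda x, y: x ^ y,         # XOR
         7: lambda x, y: 1 - (x ^ y)}   # NXOR

def fitness(ind):
    """Truth-table bits of the half adder matched (max 8).
    Assumed output order: gate 1 -> carry, gate 2 -> sum."""
    score = 0
    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        sig = [a, b]                       # primary inputs encoded as 0, 1
        g1 = FUNCS[ind[2]](sig[ind[0]], sig[ind[1]])
        g2 = FUNCS[ind[5]](sig[ind[3]], sig[ind[4]])
        score += (g1 == (a & b)) + (g2 == (a ^ b))
    return score

print(fitness([0, 1, 3, 1, 0, 6]))  # 8: the acceptable solution reached on path 1
print(fitness([1, 0, 2, 1, 0, 7]))  # 0: the initial individual
```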
Fig. 4. The proposed hierarchical mutation strategy. Note that individual 13 and individual 15 are duplicative.
The proposed hierarchical mutation strategy is shown in Fig. 4. Assume a population of n individuals. In the first step, the current best individual produces m (m < n) individuals directly (individual 1 ~ individual 4). Each of these m individuals creates a single new individual in the next step, bringing another m individuals (individual 5 ~ individual 8). The operation is performed again on the m individuals generated in the second step, and the process continues until the population is filled. Since λ = 4 individuals as used in CGP has been proven practical in EHW research [9][10][11], m, the number of individuals generated in each step, is set to 4 to maintain the exploiting ability within a specific circle. Moreover, as a building block is represented as a 3-input 3-output structure, 4 genes are needed to describe a node in the array. The maximum number of modified genes of an individual is therefore limited to 4, which means only 4 steps are required in this article, and the population becomes 16, as shown in Fig. 4. Compared with other mutations, the search area of hierarchical mutation within one generation covers a range from an eHD of 1 to an eHD of 4, which helps increase the possibility of finding better individuals. The cost is only 16 one-point mutation operations.
3.2 Local Learning Principle
In stochastic search, as shown in Fig. 4, it is not easy to avoid repeated evolutionary operations on the same locus of a chromosome; for example, the mutations act on individual 2 and individual 14. This may lead to duplicative individuals and repeated computation. However, by applying an additional strategy such as the local learning principle, this negative influence can be removed.
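The single-offspring chain of Section 3.1 can be sketched as below. The helper name and the random seed are ours; note that without the local-learning principle nothing prevents two steps from touching the same locus, which is exactly how duplicates such as individuals 13 and 15 in Fig. 4 arise.

```python
import random

def hierarchical_mutation(best, m=4, steps=4, n_values=8, seed=0):
    """Single-offspring hierarchical mutation (sketch of Fig. 4).
    Step 1: the current best produces m offspring directly by one-point
    mutation. Each later step mutates every individual of the previous
    step once more, so step-k individuals lie at an eHD of up to k from
    the centre. Total population: m * steps."""
    rng = random.Random(seed)
    population, parents = [], [list(best)] * m
    for _ in range(steps):
        children = []
        for p in parents:
            child = list(p)
            locus = rng.randrange(len(child))
            child[locus] = rng.randrange(n_values)  # one-point mutation
            children.append(child)
        population.extend(children)
        parents = children                          # pioneers for the next step
    return population

pop = hierarchical_mutation([1, 0, 2, 1, 0, 7])
print(len(pop))  # 16 individuals, reaching from eHD 1 up to eHD 4
```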
Fig. 5. The hybrid approach of hierarchical mutation and local learning principle. No duplicative individuals can be produced within one generation.
Learning means obtaining useful knowledge from experience and storing it for future use; in other words, learning is remembering [12]. However, it is costly and impracticable to remember all the information learned from practice during a lifetime, so it is reasonable to focus on the information that helps the current task perform better. A local learning mechanism is introduced here to learn and remember information obtained from the evolutionary process and use it to improve evolutionary performance. The principle can be described as follows: remember the positions of successful gene modifications, and guarantee that no mutation is performed repeatedly on the same locus within one generation. The combination of hierarchical mutation and local learning is described in Fig. 5. Take the production of individual 6 as an example of the hybrid approach. In the beginning, individual 2 tries to modify the gene at locus 8 to produce individual 6, but that position has already been modified by individual 4, and this information is kept in memory. Individual 2 therefore has to find another position to mutate; it finds that locus 16 is free, the 16th gene of individual 2 is modified successfully, and individual 6 is generated. The learning process is performed independently in each generation. Compared with the approach without local learning, this technique requires additional global storage and comparison operations. The storage serves as a memory remembering the positions of successful mutations within one generation, and is cleared before the next generation begins. The influence of the extra comparisons on the amount of computation is negligible. With the help of the local learning principle, the 16 individuals generated in one generation are guaranteed to be different from each other.
This greatly reduces the computational effort and increases the search efficiency, as the experimental results show.
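The locus-reservation idea behind local learning can be sketched as follows. The class name and the generation-wide memory object are our illustrative assumptions; for brevity the sketch only guarantees distinct loci, eliding the value re-draw that would additionally guarantee distinct individuals.

```python
import random

class GenerationMemory:
    """Local learning (sketch): remember which loci were mutated this
    generation and refuse to mutate the same locus twice."""
    def __init__(self, seed=0):
        self.used = set()
        self.rng = random.Random(seed)

    def mutate(self, parent, n_values=8):
        free = [i for i in range(len(parent)) if i not in self.used]
        if not free:
            raise RuntimeError("every locus already mutated this generation")
        locus = self.rng.choice(free)
        self.used.add(locus)                # remembered until generation end
        child = list(parent)
        child[locus] = self.rng.randrange(n_values)
        return child

    def clear(self):                        # called between generations
        self.used.clear()

mem = GenerationMemory()
parent = list(range(20))
children = [mem.mutate(parent) for _ in range(16)]
print(len(mem.used))  # 16 distinct loci touched in one generation
```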
4 Experimental Settings and Results Analysis
Miller suggested that the hit effort, defined as the total number of evaluations divided by the number of hits, is a reliable measure of computational
efficiency [7]. However, this method can only be used to evaluate algorithms that have successful runs. Another measure is introduced here to describe the computational effort per unit of fitness for evolutions both with and without successful runs. This measure, called fitness effort, is defined as the average number of evolution operations divided by the average fitness of the final best individuals, where the average number of evolution operations is calculated from the average number of modified genes during the evolution. The smaller the fitness effort, the more efficient the algorithm.
4.1 Experimental Settings
In this experiment, the phenotype is represented by a graphic gate array, as in CGP. The function set consists of 11 functions: {WIRE, NOT, AND, NAND, OR, NOR, XOR, NXOR, MUX, HA, FA}, where HA denotes a 1-bit half adder and FA a 1-bit full adder. Each node in the gate array is redefined as a 3-input 3-output structure, which allows a node to present a logic function and to serve as a connection path simultaneously. In contrast to the usual settings, the gate array has many columns and few rows, and is arranged as 11×5 in the experiments. Our other work shows that relatively many columns benefit the parallel use of primary inputs as well as connection paths, thereby increasing the possibility of finding a high-quality solution. The maximum generation is limited to 200,000.

Table 1. Six algorithms applied in experiments

Algorithm | Population size | Mutation strategy | Modified genes per mutation | Modified genes per generation | Local learning
CGP_1 | 16 | one point | 1 | 16 | no
ACGP | 16 | adaptive | 1–4, average 2 | 32 | no
LCGP | 16 | one point | 1 | 16 | yes
ALCGP | 16 | adaptive | 1–4, average 2 | 32 | yes
RLCGP | 16 | random | 1–4, average 2.5 | 40 | yes
HLCGP | 16 | hierarchical | 1 | 16 | yes
The algorithms used in the experiments are based on standard CGP; combined with different strategies for mutation and local learning, they behave differently from one another. The settings of the algorithms are listed in Table 1. The results produced in [3] cannot be compared directly with the results generated in this paper, owing to differences in node structure, array geometry, function sets, etc. In Table 1, CGP_1 denotes a standard CGP algorithm with a population of 16. ACGP is a CGP algorithm that adopts adaptive mutation, using a predefined segment function to determine the mutation ratio; it takes two as the average number of modified genes when counting evolution operations. LCGP is CGP plus the local learning principle, and ALCGP likewise extends ACGP. RLCGP is based on LCGP, with the number of modified genes per mutation ranging from one to four with equal probability; an average of 2.5 modified genes is assumed for RLCGP. HLCGP applies the hierarchical mutation
and local learning described earlier in Section 3; it represents the hybrid approach proposed in this paper. The target of the experiments is to find a fully functional solution rather than the global optimum. The evolution process terminates as soon as it finds an acceptable solution, or continues until it reaches the maximum generation.
4.2 Experimental Results
Derived from the truth tables of the 3-bit multiplier and the 4-bit adder, the maximum numbers of bits to be matched are 384 and 1280, respectively. Table 2 shows the experimental results.

Table 2. Experimental results over 30 runs

Circuit | Algorithm | Average fitness / ratio | Average operations | Successful cases / ratio | Hit effort / 1000 | Fitness effort
3-bit mul | CGP_1 | 256.7 / 66.8% | 3,200,000 | 0 / 0 | NA | 12,465.9
3-bit mul | ACGP | 310.1 / 80.8% | 6,400,000 | 0 / 0 | NA | 20,638.5
3-bit mul | LCGP | 380.5 / 99.1% | 3,022,531.2 | 5 / 16.7% | 18,135.2 | 7,943.6
3-bit mul | ALCGP | 382.6 / 99.6% | 5,863,401.6 | 10 / 33.3% | 8,795.1 | 15,325.1
3-bit mul | RLCGP | 382.1 / 99.5% | 6,522,160 | 8 / 26.7% | 9,783.2 | 17,069.2
3-bit mul | HLCGP | 383.6 / 99.9% | 2,425,248 | 19 / 63.3% | 3,829.3 | 6,322.3
4-bit adder | HLCGP | 1277.7 / 99.8% | 1,764,406.9 | 24 / 80% | 2,205.5 | 1,380.9
For evolving the 3-bit multiplier, as shown in Table 2, CGP_1 required a fitness effort of 12,465.9, meaning that on average 12,465.9 genes had to be modified for each unit of increased fitness. ACGP obtained a higher average fitness than CGP_1, but its fitness effort was the largest in Table 2 owing to its large number of modified genes and relatively low average fitness. No solution was found in any run of CGP_1 or ACGP. Compared with CGP_1, LCGP achieved an average fitness of 380.5, an improvement of 48.2%, and its fitness effort was smaller than that of CGP_1; this progress is clearly due to the adoption of local learning. For RLCGP, a random large search step helps avoid evolution stalls and increases the possibility of finding a target, but the amount of computation was so large that the fitness effort became much higher. HLCGP obtained the highest average fitness and the smallest fitness effort. Compared with LCGP, HLCGP had a much higher success ratio, which means that HLCGP found solutions much more easily than LCGP; this implies that hierarchical mutation is of great help for convergence in EHW. From the experimental results we can see that the hierarchical mutation strategy and the local learning principle play different roles in the search: local learning significantly improves the fitness effort, while hierarchical mutation greatly affects the hit effort. Interestingly, a success ratio as high as 80% with a low fitness effort of 1,380.9 was obtained when evolving the 4-bit adder. This surprising performance may be explained by the use of 1-bit adders in the function set. This
effect corresponds to the argument that the choice of gates can have a dramatic influence on the ease of evolution [3]. Fig. 6 compares the hit effort and fitness effort when evolving the 3-bit multiplier. The figure shows that, by applying hierarchical mutation and the local learning principle, HLCGP achieves the best performance among the six algorithms, indicating that the hybrid approach of the two proposed techniques can greatly improve the efficiency of CGP for EHW.
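The two effort measures can be written down directly and checked against Table 2. A minimal sketch; the function names and the NaN convention for a zero hit count (where Table 2 prints "NA") are ours.

```python
def fitness_effort(avg_operations, avg_final_fitness):
    """Fitness effort: average number of evolution operations (modified
    genes) divided by the average final best fitness. Lower is better,
    and the measure is defined even when no run succeeds."""
    return avg_operations / avg_final_fitness

def hit_effort(total_evaluations, hits):
    """Miller's hit effort: total evaluations per successful run;
    undefined (NaN here) when there are no hits."""
    return total_evaluations / hits if hits else float("nan")

# Reproducing one Table 2 entry: CGP_1 on the 3-bit multiplier.
print(round(fitness_effort(3_200_000, 256.7), 1))  # 12465.9
```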
Fig. 6. Hit effort and fitness effort comparisons for evolving 3-bit multiplier
5 Conclusion
The work presented in this paper treats the search space of EHW as an extended Hamming distance space and investigates the behavior of evolution in this space with the aim of improving evolutionary performance. All individuals in such a space lie on a series of concentric circles; the radius of a circle is given by the extended Hamming distance of its individuals to the space center, and the number of circles is determined by the length of a chromosome, which is coded as a real-number string. The behavior of evolution in such a space is to move the center of the space to the location of a better individual. The hierarchical mutation strategy and the local learning principle are developed accordingly. Compared with the mutation strategies in other papers, hierarchical mutation enlarges the search area with less computational effort, which may help the evolution escape traps and move towards a better solution more easily. The local learning principle ensures that no mutation operates on the same locus of a chromosome within one generation, thereby reducing computation. The combination of these two techniques allows information sharing among individuals, whereas traditionally in CGP individuals are independent of each other. Fitness effort is proposed as an evaluation method for calculating the computational effort per unit of increased fitness; this measure is applicable to algorithms both with and without successful runs. Experimental results indicate that the proposed hybrid approach can be an efficient technique for EHW.
In this paper, the hierarchical mutation was fixed to a relatively small size of 4×4 without considering the complexity of the circuits; an adaptive mechanism based on circuit complexity may be an effective way to enhance hierarchical mutation when evolving more complex circuits. Additionally, only two small circuits were used as evolution targets in the experiments. Further work is required to address these issues. Acknowledgments. The authors would like to thank the anonymous reviewers for their helpful comments.
References
1. Shipman, R., Shackleton, M., Harvey, I.: The Use of Neutral Genotype-phenotype Mappings for Improved Evolutionary Search. BT Technology Journal 18, 103–111 (2000)
2. Sekanina, L.: Evolutionary Design of Gate-Level Polymorphic Digital Circuits. In: Rothlauf, F., Branke, J., Cagnoni, S., Corne, D.W., Drechsler, R., Jin, Y., Machado, P., Marchiori, E., Romero, J., Smith, G.D., Squillero, G. (eds.) EvoWorkshops 2005. LNCS, vol. 3449, pp. 185–194. Springer, Heidelberg (2005)
3. Miller, J.F., Job, D., Vassilev, V.K.: Principles in the Evolutionary Design of Digital Circuits – Part I. In: Banzhaf, W. (ed.) Genetic Programming and Evolvable Machines, vol. 1(1/2), pp. 7–35. Kluwer Academic Publishers, Netherlands (2000)
4. Levi, D.: HereBoy: A Fast Evolutionary Algorithm. In: Proceedings of the 2nd NASA/DoD Evolvable Hardware Workshop. IEEE Computer Society Press, Los Alamitos, CA (2000)
5. Liu, H., Miller, J.F., Tyrrell, A.M.: Intrinsic Evolvable Hardware Implementation of a Robust Biological Development Model for Digital System. In: Deb, K., et al. (eds.) GECCO 2004. LNCS. Springer, Heidelberg (2004)
6. Yu, T., Miller, J.F.: The Role of Neutral and Adaptive Mutation in an Evolutionary Search on the OneMax Problem. In: Cantú-Paz, E. (ed.) Late Breaking Papers at the Genetic and Evolutionary Computation Conference (GECCO 2002), New York, pp. 512–519 (2002)
7. Miller, J.F.: Cartesian Genetic Programming. In: Poli, R., Banzhaf, W., Langdon, W.B., Miller, J., Nordin, P., Fogarty, T.C. (eds.) EuroGP 2000. LNCS, vol. 1802, pp. 121–132. Springer, Heidelberg (2000)
8. Layzell, P.: Visualizing Evolutionary Pathways in Real-World Search Spaces. Technical Report 308, Hewlett Packard (2002)
9. Sekanina, L.: Design Methods for Polymorphic Digital Circuits. In: Proc. of the 8th IEEE Design and Diagnostics of Electronic Circuits and Systems Workshop, Sopron, HU, pp. 145–150. IEEE Computer Society Press, Los Alamitos (2005)
10. Sekanina, L., Vašíček, Z.: On the Practical Limits of the Evolutionary Digital Filter Design at the Gate Level. In: Rothlauf, F., Branke, J., Cagnoni, S., Costa, E., Cotta, C., Drechsler, R., Lutton, E., Machado, P., Moore, J.H., Romero, J., Smith, G.D., Squillero, G., Takagi, H. (eds.) EvoWorkshops 2006. LNCS, vol. 3907, pp. 344–355. Springer, Heidelberg (2006)
11. Walker, J.A., Miller, J.F.: Evolution and Acquisition of Modules in Cartesian Genetic Programming. In: Keijzer, M., O'Reilly, U.-M., Lucas, S.M., Costa, E., Soule, T. (eds.) EuroGP 2004. LNCS, vol. 3003, pp. 187–197. Springer, Heidelberg (2004)
12. Székely, G.: Learning is Remembering. Behavioral and Brain Sciences 20, 577–578. Cambridge University Press, Cambridge (1997)
Adaptive and Evolvable Analog Electronics for Space Applications Adrian Stoica, Didier Keymeulen, Ricardo Zebulum, Mohammad Mojarradi, Srinivas Katkoori, and Taher Daud Jet Propulsion Laboratory (JPL), California Institute of Technology, 4800 Oak Grove Drive, Pasadena, California, 91109, USA
[email protected]
Abstract. Development of analog electronic solutions for space avionics is expensive and lengthy. The lack of flexible analog devices, counterparts to digital Field Programmable Gate Arrays (FPGA), prevents analog designers from rapid prototyping and forces them into expensive custom design, fabrication, and qualification of application-specific integrated circuits (ASIC). The limitations come from two directions: first, commercial Field Programmable Analog Arrays (FPAA) offer little variability in their on-chip components; second, these are qualified only for military-grade temperatures, at best, whereas more variability is needed to cover many sensing and control applications. Furthermore, to save mass, energy and wiring, there is strong interest in developing extreme-environment electronics and avoiding thermal and radiation protection altogether; this means electronics that maintain correct operation while exposed to temperature extremes, e.g., on the Moon (−180°C to +125°C). This paper describes a recent version of an FPAA design, the JPL Self-Reconfigurable Analog Array (SRAA). It overcomes both limitations, offering a variety of analog cells inside the array together with the possibility of self-correction at extreme temperatures. A companion digital ASIC designed for algorithmic control, including a genetic algorithm implementation, is currently under fabrication. Keywords: Adaptive Hardware, Field Programmable Arrays.
1 Introduction

In contrast to fixed circuit solutions such as application-specific integrated circuits (ASICs), field programmable gate and analog arrays (FPGA/FPAA) are advantageous for space applications because they can be programmed after launch. While FPGAs are already being utilized in space applications, their analog counterparts have yet to be tested in space. Planetary exploration and long-term satellite missions require radiation- and extreme-temperature-hardened electronics to survive the harsh environments beyond Earth's atmosphere. Failure to take appropriate precautions in this area could lead to disastrous consequences, potentially jeopardizing the vehicle, the mission and, indeed, future missions as well.

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 379–390, 2007. © Springer-Verlag Berlin Heidelberg 2007
There are two main approaches to achieving the needed survivability of electronics in extreme environments. The first is to use materials/devices that are less affected by environmental conditions such as temperature. The National Aeronautics and Space Administration (NASA) has invested in SiGe solutions, which have shown good response and stability all the way down to very low temperatures (-230°C and below) [1]. The second approach is to allow devices to vary with temperature, but to compensate for the variations at the circuit and sub-system level. This is the approach of a NASA project at JPL described in the following. Reconfiguration at a fine level includes programming current and voltage sources that inject compensation signals into the circuits (tunability may be a more appropriate term). Reconfigurability at a coarser (block) level, on the other hand, allows encapsulation of high-performance circuits in cells and provides a good number of resources that can be interconnected as needed to form a variety of circuits and subsystems. The limitation of current FPAAs, a reduced type or small number of cells, is overcome by employing a larger number of cells/building blocks (BB) with greater diversity. Thus, compared to commercial solutions, this has advantages in terms of (1) larger variability of the cells, (2) total number of cells, and (3) extended operational temperature range (from -180°C to +125°C). While in our prior work BBs were of very fine granularity (as in field programmable transistor arrays, FPTA, most refined to the transistor level in FPTA-0 [2]), the SRAA uses building blocks of common encapsulated granularity, with advantages in frequency response because of fewer parasitic losses. Compared to our FPTA architecture [3], [4], which would only map circuits operating in the ~10 MHz range, the SRAA is expected to operate beyond ~5 MHz.
We have implemented a number of test chips with operational transconductance amplifiers (OTA) and have demonstrated that an OTA can be compensated algorithmically. These tests were conducted using appropriate equipment for temperature control (-196°C to +130°C range) [5]. Higuchi [6] demonstrated the use of OTA cells in filter design by exploiting programmable current biases to tune circuits with evolutionary algorithms, compensating for fabrication variations. In our case of circuits implemented with OTAs, the tuning successfully compensated not for parameter variation due to mismatch, but for temperature-induced deviations within the -180°C to +120°C range. The most recent SRAA allows for temperature compensation and offers a variety of elementary cells. This provides sufficient flexibility to map a number of specific circuits for which mission-oriented ASICs were designed in the past. The SRAA cells were designed with built-in tunable knobs offering degrees of tuning and programmability to compensate for effects caused by temperature variations. A special characteristic of the architecture is that normal on-line operation can continue uninterrupted while optimal compensation is determined on a set of reference cells. The result of the compensation is then transferred to the main array. This is the first field-programmable architecture that allows on-line, real-time adaptation while its main function is performed, providing a means of adaptation with hardware in the loop that does not jeopardize system safety. Currently, the algorithmic control for adaptation and reconfiguration resides on a separate digital ASIC, which in the future could be integrated into selected SRAA versions. The paper is organized as follows. Section 2 presents an analysis of NASA-designed ASICs implementing functions for sensing and actuation. The analysis
results in a selection of a common set of BBs that has been incorporated in multiple designs. Section 3 presents the SRAA architecture incorporating the selected set of BBs in an array of operational cells. For each group of cells of one type there is also a reference/test cell of the same type, on which adjustments and calibration can be performed even during run-time. It also describes the digital controls that permit multiple functions and hierarchical adaptation. Section 4 illustrates examples of mapping a circuit using SRAA cells and shows various simulation results.
2 Reconfigurable Analog Array

2.1 Analog Circuits for Sensing and Actuation in Space Avionics

An analysis of NASA-designed ASICs implementing functions for sensing and actuation reveals that a number of BBs have been incorporated in multiple designs. Table 1 below is part of a larger table that reviews circuit BBs utilized in a variety of NASA-designed sensors and power control chips. Based on the analysis of a number of ASICs, the following types of BBs (and their numbers in circuits) were selected for the first-generation SRAA:

1. Operational Amplifiers (OpAmps)
2. Low-Offset OpAmps
3. Current Sources
4. High Voltage OpAmps
5. Comparators
6. High Speed Comparators
Different implementations of the same building block were selected as needed for the functions. For instance, two kinds of OpAmps were implemented: a conventional OpAmp and a Low-Offset (or Ping-Pong) OpAmp. The latter presents an input offset voltage below 100 microvolts. The comparator has a conventional and also a high-speed (more than 5 MHz) implementation.
Table 1. A reduced table summarizing the function of each ASIC (rows: Pulse Width Modulator (PWM), Power Switch Control, Shaft Encoder, Instrumentation Amplifier) and the main building blocks used (columns: OpAmp, Low-Offset OpAmp, Current Source, HV OpAmp, High-speed Comparators, Passive Components, Digital Interface, Bandgap Reference)
Fig. 1. SRAA block diagram. Digital controls, now implemented in an FPGA/ASIC, may be included on-chip in the next version of the SRAA.
Fig. 2. Reference Analog Cell

Fig. 3. Functional Analog Cell
Based on the requirements of each application, a building block is replicated 4 times, resulting in a 4×6 analog array. In addition, an extra copy of each cell serves as a reference for adjustment, as explained in the next sections.

2.2 SRAA Architecture

The main components of the SRAA architecture are the analog cell arrays (divided into reference and functional arrays), the switch box, and the test fixture (Figure 1). The SRAA is externally controlled by an FPGA and a digital ASIC, described below.

2.3 Analog Array

The analog array is divided into the functional and the reference analog arrays. The reference analog cells are individually excited/probed through the test fixture to check
for degradation. The functional array is configured to implement the system functions described in Section 2. Figures 2 and 3 show block diagrams of the reference and functional cells. Both types of cells include Digital-to-Analog Converters (DACs) used to provide bias voltages or currents that tune the analog cell response (see Section 3.4). In the current implementation, these reconfiguration DACs are programmed by external controls originating in an FPGA. The I/O of the reference analog cells are provided/read by the test fixture, whereas those of the functional analog cells are routed to other functional cells or to I/O pads.

2.4 Reconfiguring Circuit Topologies by Signal Switch Box (On-Chip Controls)

The switch box allows for the configuration of different functions in the functional analog array, as well as providing an interface between the reference analog cells and the test fixture. The signal switch box is logically divided into 8 sub-blocks, or chains, each of them associated with the configuration of a system-level function. Figure 4 shows a block diagram representation of an individual chain, as well as the control signals. Each chain is implemented as a shift-register structure, in which each flip-flop controls the state of a transmission-gate switch. A select signal enables 1 out of 8 chains; the clock signal shifts the data, which is latched at the end of the configuration.
Fig. 4. Configuration of a switch box chain. The programming values are shifted in a long register, advancing on each clock cycle to the next cell.
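The shift-register chain behavior described above can be sketched as a small software model. This is a minimal, illustrative sketch, not the chip's actual interface: the class and signal names are assumptions, and a real chain is a hardware structure, not Python.

```python
# Hypothetical software model of one switch-box chain: a select line
# enables the chain, each clock edge shifts one data bit in, and the
# latch signal copies the register contents onto the transmission-gate
# switches. Names and the 8-bit length are illustrative only.
class SwitchChain:
    def __init__(self, length):
        self.shift_reg = [0] * length   # flip-flops of the chain
        self.switches = [0] * length    # latched transmission-gate states

    def clock(self, data_bit, select):
        if select:                      # only the enabled chain shifts
            self.shift_reg = [data_bit] + self.shift_reg[:-1]

    def latch(self):
        self.switches = list(self.shift_reg)

chain = SwitchChain(8)
for bit in [1, 0, 1, 1, 0, 0, 0, 0]:    # configuration bits, first bit
    chain.clock(bit, select=True)       # shifted in ends up deepest
chain.latch()
```

After the latch, the switch states mirror the shifted-in pattern; the most recently shifted bit sits at index 0, which is one natural way to model the "data advancing to the next cell on each clock" behavior in the caption.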
2.5 Tuning Cells by Programmable Controls

All analog cells have one or more extra input points that allow circuit calibration for extreme-temperature recovery. As an example, a ping-pong OpAmp circuit was designed with two current-mode calibration points that are used to bias the preamps. Other cells use voltage-mode calibration points. The biases are obtained from DACs, as shown in Figures 2 and 3.

2.6 Digital Programmable Controls for Function Select and SRAA Operation

The SRAA is initialized through the configuration of one or more system applications in the functional array. In the current implementation, the switch box interface signals shown in Figure 4 are obtained from an external FPGA. The associated DACs of the
functional cells (Figure 3) being utilized are programmed to provide the default (room-temperature) calibrating voltages/currents for the functional cells. These DACs are also programmed by the FPGA. Each cell of the functional array includes 1 or 2 reconfiguration DACs, for a total of 40 reconfiguration DACs in the 6×4 functional array. These reconfiguration DACs are programmed sequentially and selected through a decoder. The reconfiguration DACs of the reference cells are programmed using a similar interface, with the default (room-temperature) calibration values for the analog cells. Once the functional and reference arrays are programmed, the reference cells are sequentially monitored to check for behavior degradation, particularly when the chip is exposed to extreme temperatures or radiation. The reference cells are tested using the test fixture, which consists of two (excitation) DACs and two ADCs, as shown in Figure 1. These data converters are also controlled through the FPGA. As an example, the slew rate of an OpAmp cell can be monitored by applying a step voltage through an excitation DAC and evaluating the response read back through an ADC. Two excitation DACs are needed to provide up to 2 inputs to the reference cells; one ADC probes the cell output and the second probes one of the inputs. Once an analog cell response goes out of specification, possibly due to temperature or radiation effects, the FPGA recalibrates the corresponding reference cell by reprogramming its reconfiguration DACs (these are of low resolution and thus largely unaffected by temperature, which usually degrades converters of more than 8 bits). This involves a search over the possible calibrating voltages/currents of the particular analog cell. The search is controlled by a digital ASIC that implements a search algorithm.
Once new values are found that recover the behavior of the reference cell, the same values are used to re-program the configuration DACs of the associated functional cells. This tuning process assumes that, for each type of cell, the functional and reference versions display the same behavior. This assumption usually holds when the SRAA chip is exposed to extreme temperatures; nevertheless, it will probably not hold when the chip is exposed to transient radiation effects (single event upsets, SEU), since their effect is not uniform across the chip. In that case, the functional cells also have to be probed individually for correction. Since the chip is fabricated in a rad-hard process (Honeywell), the SRAA will tolerate permanent radiation effects (Total Ionization Dose, TID) up to 300 krad. Finally, the data converters are also assumed to be temperature insensitive, based on simulations. However, due to limitations in the simulation models, the actual behavior may vary. Small temperature-induced variations in the data converters (such as offset or a small decrease in resolution) can possibly be compensated by the inherent feedback control performed by the search algorithm.
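The monitor-then-recalibrate cycle above can be sketched in a few lines. This is an illustrative sketch only: the slew-rate fitness, the `cell_response` callback standing in for the DAC/ADC round trip, and the exhaustive scan standing in for the on-chip search algorithm are all assumptions, not the actual FPGA/ASIC implementation.

```python
# Illustrative sketch of the reference-cell monitoring loop: excite the
# cell with a step, estimate the slew rate from the sampled response,
# and search the bias codes until the specification is met again.
def slew_rate(samples, dt):
    """Approximate slew rate (V/s) from the steepest segment of a step response."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:])) / dt

def monitor_and_compensate(cell_response, spec, bias_codes, dt=1e-6):
    """cell_response(bias) -> sampled step response for a given bias code.
    Exhaustive scan stands in for the ASIC's gradient/GA search."""
    best = None
    for bias in bias_codes:
        sr = slew_rate(cell_response(bias), dt)
        if sr >= spec:
            return bias                 # first bias code meeting the spec
        if best is None or sr > best[1]:
            best = (bias, sr)
    return best[0]                      # otherwise, the best bias found

# Toy cell whose slew rate scales with the bias code:
chosen = monitor_and_compensate(
    lambda b: [0.0, 0.5 * b, 1.0 * b, 1.0 * b], spec=8e5, bias_codes=[1, 2, 3])
```

In the real system, the value returned for the reference cell would then be written to the configuration DACs of the associated functional cells, as described in the text.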
3 Mapping a Variety of Circuits by Selective Interconnect of SRAA Cells

3.1 Mapping Circuits into SRAA

This section illustrates examples of circuit implementations using the SRAA cells: an instrumentation amplifier circuit (not shown) and a pulse width modulator (PWM) circuit (Figure 5), which uses two functional analog cells, the high-speed (HS)
Fig. 5. Comparators in Pulse-Width Modulator (PWM) circuit
Fig. 6. (Left) Differential and common-mode input signals (top); common-mode signal (middle); and circuit output (bottom). (Right) PWM circuit response: the circuit takes three inputs (ramp, clock, and control) and produces one output; the pulse width at the output (bottom) increases with the control input.
comparator and the conventional comparator. This circuit can be used to control a switched power supply. The switch chain configuration allows the selection of one or the other comparator to implement the circuit. Note that the analog cells are reused in other applications.

3.2 Functional Simulations

This section shows simulation results of circuits mapped into the SRAA cells. Figure 6 (left) depicts the transient simulation results of the instrumentation amplifier, while
Figure 6 (right) illustrates the PWM response. The instrumentation amplifier amplifies the differential input (the higher-frequency signal in this case) and rejects the common-mode signal (the lower-frequency one).
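The PWM behavior in Figure 6 (a pulse whose width grows with the control input) follows directly from the comparator stage: the output is high while the control voltage exceeds the ramp. A minimal behavioral sketch, with purely illustrative voltage values:

```python
# Minimal behavioral model of the PWM comparator stage of Figure 5:
# output is high while the control voltage exceeds the ramp, so the
# pulse width increases with the control input.
def pwm_output(ramp, control):
    return [1 if control > r else 0 for r in ramp]

ramp = [i * 0.5 for i in range(8)]      # 0.0 .. 3.5 V sawtooth samples
low = pwm_output(ramp, control=1.0)     # narrow pulse
high = pwm_output(ramp, control=3.0)    # wider pulse
```

Counting the high samples in `low` versus `high` reproduces the qualitative result in Figure 6 (right): a larger control voltage yields a wider pulse.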
4 Digital Controls: FPGA and ASIC Implementations

The digital control monitors and compensates the functionality of the analog cells (Figure 7). In monitoring mode, digital excitation signals (sine wave, step voltage) are generated by the digital ASIC and provided to a Reference Analog Cell (RAC) through DACs on the analog ASIC. The RAC response is sent back to the digital ASIC through an ADC on the analog ASIC. The fitness function associated with each RAC type is computed by the fitness processing block on the digital ASIC and compared to the expected value by the monitoring block. If, during monitoring, the fitness of one of the RACs does not meet the requirement, the compensation algorithm is activated. The compensation uploads new bias voltages and currents for the RAC until it finds a RAC configuration for which the requirement is met. The same configuration is then applied to the corresponding Functional Analog Cell (FAC).
Fig. 7. Digital control monitors/compensates the reconfigurable analog array
The digital control is currently distributed between an FPGA and a digital ASIC. Although the functionality of the FPGA will be implemented in the ASIC during the next phase of the project, at this stage the FPGA was chosen as the interface between the digital ASIC, the analog ASIC, and a user-controlled PC, as a risk-mitigation approach against an ASIC that is not first-time correct. The FPGA has already been validated to perform correctly at the extreme temperatures of interest.

4.1 FPGA Functions

The FPGA implements the user interface (UART), the FFT fitness functions, and the digital and analog ASIC interface (Digital Control). The Digital Control block provides the control signals for the DACs, ADCs, and the signal switch box. The digital ASIC
controls the FPGA in monitoring or compensation mode. In debugging mode, the FPGA is user controlled via the UART. The FPGA implementation used a Virtex-II Pro based on the OPB/PLB bus, a UART core, on-chip ROM/RAM, and one PowerPC 405 core with a 25 MHz clock. The user can program the switch box, the voltage and current biases, and the DAC excitation signal. The user can also choose the fitness function, analyze the ADC response and the fitness response, choose the compensation algorithm, and perform a soft reset.

4.2 Digital ASIC Implementing a Genetic Processor

The digital ASIC implements the compensation algorithms and the control of the analog ASIC during monitoring and compensation modes. The main controller initiates the monitoring, the compensation, and the I/O commutations. The system monitor cycles through each reference analog cell (RAC). A model-based compensation algorithm stores the voltage and current bias correction values for each RAC and for each temperature value. The GA block implements a generational GA (all the members of the current population are replaced) with roulette wheel selection and one-point crossover. The fitness evaluation module (FEM) implements a slew-rate evaluation function, providing a voltage step as the excitation input and measuring the rise time of the response. Each module of the digital ASIC is programmable. For example, the number of cells to monitor and compensate can be changed depending on the SRAA application. The model compensation data as well as the genetic algorithm parameters (population size, crossover rate, mutation rate) are provided at the initialization of the digital ASIC. The digital ASIC was fabricated using the same process as the analog ASIC. It has an area of 6 × 6 mm and a clock speed of 50 MHz.
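The generational GA described above (roulette wheel selection, one-point crossover, full population replacement) can be sketched in software. This is a hedged software sketch, not the ASIC's hardware implementation: the bit-string encoding of bias codes, all parameter defaults, and the toy fitness are illustrative assumptions.

```python
import random

# Sketch of a generational GA with roulette-wheel selection and
# one-point crossover; every generation the whole population is
# replaced, matching the "all members replaced" description above.
def roulette(pop, fitnesses):
    total = sum(fitnesses)              # fitnesses must be positive
    r = random.uniform(0, total)
    acc = 0.0
    for ind, fit in zip(pop, fitnesses):
        acc += fit
        if acc >= r:
            return ind
    return pop[-1]

def one_point_crossover(a, b):
    cut = random.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def evolve(fitness, n_bits=8, pop_size=16, generations=30, p_mut=0.05):
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        new_pop = []
        while len(new_pop) < pop_size:  # build a full replacement population
            c1, c2 = one_point_crossover(roulette(pop, fits), roulette(pop, fits))
            new_pop += [[1 - g if random.random() < p_mut else g for g in c]
                        for c in (c1, c2)]
        pop = new_pop[:pop_size]
    return max(pop, key=fitness)

random.seed(0)
best = evolve(fitness=lambda ind: sum(ind) + 1)  # toy fitness: count ones (+1 keeps it positive)
```

In the real system, the genome would encode the reconfiguration DAC bias codes and the fitness would come from the FEM's slew-rate measurement rather than from a toy function.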
5 Adaptation and Evolution Under Digital Controls

We adopted a hierarchical approach to calibration/compensation. The first level of compensation involves mapping corrections that were predetermined, either via model-based computation or by testing devices in the actual environment. The second level involves finding solutions through gradient-descent searches, which are less computationally expensive and faster. The third level is employed if the previous levels did not succeed: a global search using evolutionary algorithms. In our earlier work we demonstrated a variety of circuits that can be evolved under the control of evolutionary algorithms implemented in software on a PC, or running on a DSP [2,3,5]. Here the focus is on porting these algorithms into a digital design implementation. The entire system, both the analog array and its digital control, must be immersed in the extreme environment. In preliminary experiments with test chips using operational transconductance amplifiers (OTA), we demonstrated that the basic component of the analog array, the OTA, can be compensated algorithmically using gradient descent under temperature control (-196°C to +130°C range). These experiments did not use the genetic algorithm, which was not yet available as a digital hardware implementation at that time, but exercised the control on a smaller search space using a gradient-descent algorithm implemented in digital hardware. This proved that self-recovery can take place with the entire system implemented in
Fig. 8. (Left) 1st-order GmC filter response at +23°C for a 50 kHz sine-wave excitation signal when sweeping the bias voltages Vbias0 and Vbias1. (Right) 1st-order GmC filter response at -180°C for a 50 kHz sine-wave excitation signal when sweeping the bias voltages Vbias0 and Vbias1.
Fig. 9. Evolution of the peak-to-peak amplitude of the 1st-order GmC response through gradient descent at T = -180°C
hardware, in the extreme environment. Figure 8 (left) shows oscilloscope traces of the 1st-order GmC response (using OTAs) for all possible values of the bias voltages (Vbias0 and Vbias1). It represents the entire search space of the 1st-order GmC filter and shows how different the responses of the filter are at +23°C and at -180°C. We invoke the algorithm when the temperature in the chamber is T = -180°C, with the Vbias kept at their room-temperature optimal values. The gain drops from +20 dB at room temperature to -7 dB at -180°C for a 50 kHz sine-wave input signal (Start Point 1) (Figure 9). If the new Vbias0 and Vbias1 yield a better response than the current best one, we keep them and restart the exploration around that point. The algorithm was able to increase the gain of the 1st-order filter at 50 kHz from -7 dB to +10 dB by changing Vbias0 and Vbias1 (Figure 10, right). The algorithm
Fig. 10. (Left) 1st-order GmC filter response at -180°C before optimization of the gain at 50 kHz. (Right) Best 1st-order GmC filter response at -180°C after 21 evaluations, using Start Point 1 for the Vbias voltages.
continues with a new starting point for the Vbias voltages (Start Point 2) and was able to increase the gain of the filter at 50 kHz from -1 dB to +9.1 dB by changing Vbias0 and Vbias1. These preliminary experiments demonstrate the need for a compensation algorithm that adapts the voltage or current biases to the environment.
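The search strategy described above (keep any (Vbias0, Vbias1) pair that improves the response, then restart the exploration around that point) is essentially a neighborhood hill climb. The following is an illustrative sketch under stated assumptions: `gain` is a stand-in for the measured filter response, and the 8-bit bias range and toy response surface are not from the experiments.

```python
# Illustrative sketch of the bias-compensation search: probe the
# neighbours of the current (Vbias0, Vbias1) point, adopt any neighbour
# that improves the measured response, and restart the exploration
# around the new best point until no neighbour improves it.
def hill_climb(gain, start, step=1, lo=0, hi=255):
    best = start
    best_gain = gain(*best)
    evaluations = 1
    improved = True
    while improved:
        improved = False
        v0, v1 = best
        for d0 in (-step, 0, step):
            for d1 in (-step, 0, step):
                cand = (min(max(v0 + d0, lo), hi),
                        min(max(v1 + d1, lo), hi))
                g = gain(*cand)
                evaluations += 1
                if g > best_gain:       # keep the better point and restart
                    best, best_gain, improved = cand, g, True
    return best, best_gain, evaluations

# Toy response surface with a single peak at (100, 150):
peak = lambda v0, v1: -((v0 - 100) ** 2 + (v1 - 150) ** 2)
best, g, n = hill_climb(peak, start=(90, 140), step=2)
```

On a unimodal surface like the toy one the climb reaches the peak; on the real multimodal response surfaces of Figure 8, this is exactly why the experiments restart from a second start point when the first one stalls.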
6 Conclusions

The SRAA offers a solution to the need for flexible, survivable extreme-environment electronics. The flexibility offered by the presence of a wide variety of cells allows the implementation of a number of functions for sensing and actuation applications. Simulations show that the chosen architecture is capable of implementing on the same chip, by subsequent configuration/reconfiguration, a variety of topologies that were previously implemented with specialized ASICs. The SRAA is designed to maintain stable operation over a wide temperature range of more than 300°C, from -180°C to +125°C. Over this range, parameters of interest vary by less than 1–5% from their 27°C values, depending on the circuit. This is achieved by temperature-oriented designs and by the exploitation of tuning points, additionally included in the circuits, which allow algorithmically driven, digitally programmable compensation through programmable current/voltage biases. The availability of the SRAA as an off-the-shelf component for space avionics designs would greatly reduce development/integration cost and time. Special innovations contributed by this SRAA include:
• A solution for on-line adaptation through the use of functional cells and reference cells. While the functional cells are actively involved in the mapped function, the reference cells are monitored, and calibrations are determined in the case of deviations. The calibration/tuning is then transferred to the functional cells without stopping the operation.
• A hierarchical approach to calibration/compensation. The first level involves mapping corrections that were predetermined, either through model-based computation or by testing the devices in the actual environment. The second level involves finding solutions through gradient-descent searches, which are less computationally expensive and hence faster. Finally, a third level is employed in case the previous levels did not succeed, when a global search, using evolutionary algorithms, is performed.
Acknowledgment The work described in this paper was performed at the Jet Propulsion Laboratory, California Institute of Technology and was sponsored by the National Aeronautics and Space Administration.
References

1. Cressler, J.D.: Issues and Opportunities for Complementary SiGe HBT Technology. In: Proceedings of the 2006 Electrochemical Society Symposium on SiGe and Ge: Materials, Processing, and Devices, pp. 893–912 (2006)
2. Stoica, A., Zebulum, R., Keymeulen, D., Tawel, R., Daud, T., Thakoor, A.: Reconfigurable VLSI Architectures for Evolvable Hardware: From Experimental Field Programmable Transistor Arrays to Evolution-Oriented Chips. IEEE Trans. VLSI Systems, Special Issue on Reconfigurable and Adaptive VLSI Systems 9(1), 227–232 (2001)
3. Stoica, A., Zebulum, R.S., Ferguson, M.I., Keymeulen, D., Duong, V.: Evolving Circuits in Seconds: Experiments with a Stand-Alone Board-Level Evolvable System. In: 2002 NASA/DoD Conference on Evolvable Hardware, July 15–18, 2002, Alexandria, VA. IEEE Computer Society Press, Los Alamitos (2002)
4. Zebulum, R.S., et al.: Automatic Evolution of Tunable Filters Using SABLES. In: Tyrrell, A.M., Haddow, P.C., Torresen, J. (eds.) ICES 2003. LNCS, vol. 2606, pp. 286–296. Springer, Heidelberg (2003)
5. Stoica, A., Zebulum, R.S., Keymeulen, D., Ramesham, R., Neff, J., Katkoori, S.: Temperature-Adaptive Circuits on Reconfigurable Analog Arrays. In: First NASA/ESA Conference on Adaptive Hardware and Systems (AHS 2006), June 15–18, 2006, pp. 28–31 (2006)
6. Higuchi, T., et al.: Real-World Applications of Analog and Digital Evolvable Hardware. IEEE Transactions on Evolutionary Computation 3(3), 220–235 (1999)
Improving Flexibility in On-Line Evolvable Systems by Reconfigurable Computing Jim Torresen and Kyrre Glette Department of Informatics, University of Oslo P.O. Box 1080 Blindern, N-0316 Oslo, Norway
[email protected] http://www.ifi.uio.no/∼jimtoer
Abstract. Reconfigurable logic is a promising technology for adaptable systems – often called reconfigurable computing. However, one of the main challenges with autonomous adaptable systems is flexibility. The paper starts by giving an overview of reconfigurable computing and the different approaches to implementing it. Then, we outline how these can be applied in on-line evolvable systems to improve flexibility in the hardware. The challenge in the latter is to include flexibility without re-synthesis while avoiding too large a logic-gate overhead. An architecture based on system-on-chip and partial reconfiguration is proposed in the paper.
1 Introduction
Evolvable systems have the potential to become important in future autonomous adaptable systems. This could imply that both software and hardware are adaptable. However, commercial dynamic computer systems have so far mainly been based on context-switching software – i.e., switching software processes on a processor. With the introduction of Field Programmable Gate Arrays (FPGAs), hardware too can be modified at run-time. Few embedded systems are designed today without containing one or more FPGAs. The technology has progressed from earlier being used only as glue logic to now also being widely applied for fast data processing. However, substituting the configuration at run-time has so far not seen much use. Like software many years ago, the same code/configuration remains static in the device when a system is put into operation. Swapping processes has not yet reached FPGA application designers. There are naturally some reasons for this, including long reconfiguration times and reliability issues. The recent progress of the technology has, however, reduced these problems, and dynamic FPGAs are now more applicable than before, as will be described in this paper. Much research on reconfigurable computing is not related to evolvable systems. Thus, in this paper we emphasize presenting the reconfigurable computing alternatives in order to improve their applicability in evolvable systems. Since much of the FPGA configuration bit-string coding (i.e., the signal routing) is secret, it is impossible for an evolvable system to directly change the FPGA content unless re-synthesis
L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 391–402, 2007. © Springer-Verlag Berlin Heidelberg 2007
is undertaken. Re-synthesis is not normally applicable, and evolvable hardware architectures often end up being regular and of limited flexibility. This challenge is addressed in this paper by introducing some architectures with inherent reconfigurability. In the next section, an overview of the status of alternatives for run-time reconfigurable systems is given. This is followed by an introduction to how this technology can be applied in evolvable systems in Section 3. Finally, Section 4 concludes the paper.
2 Reconfigurable Computing
Processing in run-time reconfigurable hardware systems is often named Reconfigurable Computing (RC). There is no common unique definition of this expression, since there is some variation in the way researchers define it. Below we give an overview of the scope of reconfigurable computing as most researchers seem to perceive it. It refers to systems incorporating some form of hardware re-programmability. There seem to be three main degrees of reconfiguration:

– Static: The configuration within the FPGA is the same throughout the lifetime of the system. This means no adaptivity at run-time.
– Upgrade: The configuration is changed from time to time for bug fixes or functional upgrades. This represents rare adaptation.
– Run-time: A set of configurations (multi-context) is available, which the FPGA switches between at run-time. This can provide several benefits, as described below.

Most applications are implemented using the static approach – i.e., no adaptivity. However, upgrading of systems has recently become more common. This allows the configuration to be upgraded when bugs are found or when the functionality of the system is to be changed. Several requirements for such a system exist [1]:

– Fallback to the old configuration must be possible.
– Switching should ideally take one clock cycle.
– On-line upgrade over the Internet must be supported.
– The configuration bitstream must be encrypted to avoid reverse engineering.
In the future, automatic dynamic products will probably arrive. These could autonomously upgrade the hardware as the environment (or data) changes or when bugs are detected in the system. One promising approach based on this idea is evolvable hardware [2]. The application areas for run-time reconfigurable systems are:

– Space/cost/power reduction
– Speeding up computation
– Incorporating new data/patterns realized in reconfigurable logic
If not all functions in a system are needed at the same time, we can substitute a part of the configuration at run-time, as seen in Figure 1. Function A contains the parts of the system that always need to be present. Parts B and C, however, are not needed concurrently and can be assigned to the same resources (location) in the FPGA. An example of such an application is a multifunctional handheld device with, e.g., a mobile phone, MP3 player, radio, and camera. For most purposes, a user would normally not use more than one of these functions at a time. Thus, instead of having custom hardware for each function, it could be efficient to have a reconfigurable system where only the active function is configured. This would allow for a smaller hardware device, which leads to reduced cost and, for some systems, reduced power consumption. Such benefits are important in a competitive market. One of the benefits of FPGAs is their parallel structure, allowing for parallel matching/searching. This is exploited in an evolvable classification architecture presented in Section 3.
Fig. 1. Illustration of run-time reconfiguration of FPGA
The application area of run-time reconfiguration for computational speedup is depicted in Figure 2. Rapid swapping between successive configurations can give an RC-based system considerably higher throughput than a general static FPGA configuration. If a task A can be partitioned into a set of separate tasks (A.1, A.2 and A.3 in the example in the figure) to be executed one after the other, an FPGA configuration can be designed for each of them. Thus, each configuration is optimised for one part of the computation. During run-time, context switching (CSW) is undertaken, and the total execution time for the task in the given example is reduced. The context switching time would have to
394
J. Torresen and K. Glette
Fig. 2. Illustration of a run-time reconfigurable FPGA compared to a static FPGA
be short to reduce the overhead of switching between the different configurations. A run-time reconfigurable device has the advantage that its configuration can be modified according to the application at hand. In this way, it has the potential of achieving even higher performance than an ASIC [3]. There are differences between the goals for run-time reconfiguration:
– Space/power/cost optimisation:
• Reconfigure for a change in function, protocol, standard, etc.
• Infrequent reconfiguration
– Speed optimisation:
• Reconfigure within a function or task
• Frequent reconfiguration
Infrequent reconfiguration would normally be easier to undertake than frequent reconfiguration. Configuration of systems can be further classified into [4]:
– Deterministic configurations: Allocations in the FPGA can be pre-planned.
– Non-deterministic configurations: An operating system is needed to schedule tasks and control reconfigurable logic fragmentation.
An operating system for run-time context switching is necessary for non-deterministic configurations. Although some work has been published on such operating systems, it is far from being ready for the commercial market. The main classes of reconfigurable devices available are [3]:
– Single context (most commercial FPGAs)
– Partially reconfigurable (e.g. Xilinx Virtex FPGAs)
– Multi-context (no commercial FPGAs)
In single context devices, full re-programming is required for any change in the configuration. In partially reconfigurable devices, only the part of the configuration that is changed is written to the FPGA. The undisturbed portion of the device may continue executing, allowing the overlap of computation with reconfiguration [3]. A new configuration should be placed onto the FPGA where it will cause minimum conflict with other configurations already present in the device. De-fragmentation could be necessary to consolidate the unused area by moving valid configurations to new locations.
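The break-even condition behind the speedup discussion of Figure 2 can be expressed with a small timing model (a sketch with illustrative numbers, none taken from the paper): context switching pays off only when the summed sub-task times plus the switching overhead stay below the static execution time.

```python
# Timing model for the static-vs-switching trade-off of Figure 2.
# All numbers below are illustrative assumptions.

def total_time_switched(subtask_times, t_csw):
    """Partitioned task: sub-task times plus one context switch (CSW)
    between each pair of consecutive sub-tasks."""
    return sum(subtask_times) + (len(subtask_times) - 1) * t_csw

def worthwhile(t_static, subtask_times, t_csw):
    """True if run-time reconfiguration beats the static design."""
    return total_time_switched(subtask_times, t_csw) < t_static

# Task A: 90 time units on a general static configuration; specialised
# configurations run A.1-A.3 in 20 units each.
print(worthwhile(90, [20, 20, 20], t_csw=5))   # 70 < 90  -> True
print(worthwhile(90, [20, 20, 20], t_csw=20))  # 100 < 90 -> False
```

This makes explicit why the context switching time "would have to be short": with a switching cost comparable to the sub-task times themselves, the partitioned design loses.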
In multi-context devices, there are multiple memory bits for each programming bit location [3], providing context switching in one or a few clock cycles. No major FPGA vendor provides such devices yet. However, some IP cores are available [5,6,7]. These are typically based on an array of processing elements (PEs) with multiple context layers holding the routing between PEs and the program code for the PEs. Context switching is typically undertaken in a single clock cycle.

2.1 Approaches to Reconfigurable Computing with FPGAs
General challenges of run-time reconfiguration in FPGAs:
– Reducing the long time required for reconfiguration
– Keeping the system from being inactive during reconfiguration (safe and robust reconfiguration)
– Interfacing between modules belonging to different configurations
– Predictability (reliability and testability) of system operation
Since the configuration bit string is serially loaded into the device, the main problem with switching configurations is the long reconfiguration time. At the moment there seem to be three possible approaches: smaller devices, virtual FPGAs and partial reconfiguration.

Smaller Devices. Since the full reconfiguration time is shorter for smaller devices, reconfiguration time can be reduced by applying smaller devices. Moreover, by applying context switching, we may be able to implement a full system in a smaller device, with the benefit of reduced cost and power consumption. The drawback is that the system would have to be inactive during reconfiguration.

Virtual FPGAs. The virtual FPGA approach is based on designing a “virtual” FPGA inside an ordinary FPGA [8]. We have so far introduced an architecture for context switching based on a multi-context “virtual” FPGA [9,10]. The architecture provides switching between 16 different configurations in a single clock cycle. Such a system would never achieve as high a clock frequency as a leading-edge processor. However, by applying massively parallel processing, the execution time can still be shorter [11]. Even though fast processing can be achieved, the context switching architecture requires many reconfigurable resources (in that way, this architecture prioritises speed over cost and power consumption).

Partial Reconfiguration. As FPGA devices are getting bigger, the configuration bitstream becomes longer and programming time increases. Thus, run-time reconfigurable designs would benefit from having only a limited part of the FPGA context switched by partial reconfiguration.
This feature is available in some FPGAs where a selected number of neighbouring columns are programmed. This requires detailed consideration to avoid interruption at context switching [12]. One approach that we have started to look at is pipelined downloading of the configuration bitstream [13].
(The figure shows one device row; a configuration frame is 1312 bits high and 1 bit wide, spanning 20 CLBs in height.)
Fig. 3. Illustration of partial reconfiguration in Virtex-4 and Virtex-5 devices
Another challenge is to limit the inter-partition data transfer, that is, to achieve efficient communication between context-switched tasks. While the first FPGAs offering partial reconfiguration required complete columns of the device to be programmed, the more recent ones – including Virtex-4/5 – require only a part of each column to be programmed – see Figure 3. This makes interfacing between tasks and having uninterrupted operation easier, since some rows can be used for permanent configurations. The smallest Virtex-5 device (LX30) consists of 4 rows, while the largest (LX330) consists of 12 rows. Further, tools like PlanAhead have been introduced that make partial reconfiguration easier. It is possible to reconfigure the Virtex devices internally using the Internal Configuration Access Port (ICAP). This will be applied in the architecture presented in the next section. Some work has been undertaken on real-time partially reconfigurable systems, e.g. [14,15,16]. Below we include proposals for how some of the RC principles can improve the flexibility of evolvable hardware systems. This will be targeted at an earlier proposed classification architecture.
3 A Flexible Classifier Architecture
To be able to provide run-time adaptation, some scheme for automatic design is necessary. So far, evolution has been a much explored method. However, to keep evolution from becoming too slow, many systems have been based on a virtual FPGA implementation rather than FPGA reconfiguration. One of the problems related to virtual FPGAs is the large gate overhead, especially for routing signals. A common way to implement routing is by using multiplexers. However, these become large as the signal resolution and the size of the architecture increase. To reduce this problem and make systems more scalable, we introduce alternative architectures in this paper.

3.1 The Original Classification Module
We have earlier developed an architecture that has been shown to give high classification performance both for an image application [17] and a signal processing application [18]. The system consists of three main parts – a classification module, an evaluation module, and a processor. The complete system is implemented
Fig. 4. EHW classification module
in a single FPGA. The processor is running the evolution algorithm and configures the other modules. The evaluation module is used for fitness computation and is based on evaluating only a small part of the classifier at a time, since incremental evolution is applied. The classification module, however, would have to be complete unless time multiplexing is applied. Therefore, this module, with its typically large structure, is difficult to make on-line adaptable without a large logic gate overhead (mainly for routing). Thus, in this paper the focus is on the classification module rather than the evaluation module. The classification module operates stand-alone except for its reconfiguration, which is carried out by the processor. The classification module consists of one category detection module (CDM) for each category to be classified – see Figure 4. The input data to be classified is presented to each CDM concurrently on a common input bus. The CDM with the highest output value will be detected by a maximum detector, and the identifying number of this category will be output from the system. Each CDM consists of M “rules” or functional unit (FU) rows – see Figure 5. Each FU row consists of N FUs. The inputs to the circuit are passed on to the inputs of each FU. The 1-bit outputs from the FUs in a row are fed into an N-input AND gate. This means that all outputs from the FUs must be 1 in order for a rule to be activated. The outputs from the AND gates are connected to a counter which counts the number of activated FU rows. The FUs are the reconfigurable elements of the architecture. Each FU behaviour is controlled by connected configuration lines (not shown in Figure 5). Each FU has all input bits
Fig. 5. Category detection module (CDM)
to the system available at its inputs, but only one data element (e.g. one byte) of these bits is chosen, depending on the configuration lines.

3.2 A Flexible Classification Module
The evolution is undertaken for one or a few FUs at a time; thus, flexibility is only needed in the classification module, which is applied after evolution. Flexibility could be achieved either by including it in the design (virtual FPGA) or by changing the design itself, either by partial re-synthesis or by having a number of pre-synthesized configurations. The two latter approaches could be realised with partial reconfiguration. For most applications, re-synthesis would not be applicable due to the long time needed. Further, resource planning could be difficult as the flexibility increases. Thus, we would like to look into how flexibility in an architecture can be increased with pre-synthesized configurations. The classification architecture introduced above is based on using predefined values for N and M. To be able to change them, re-synthesis is required. Keeping N and M fixed leads to the most efficient hardware architecture, since flexibility
Fig. 6. Example of flexible connection of FUs to AND gates
could often not be effectively implemented. However, with a system-on-chip implementation, there will typically be a limit on the total number of FUs that can be implemented in the device. On the other hand, since the data set changes over time, it would often be impossible to predict the optimal selection of N and M at design time. The following flexibilities could be explored:
– Variance in the number of FUs in each row.
– The number of FUs in each row (N) versus the number of FU rows (M).
– The total number of units assigned to each CDM, which could be different for each category.
Below, we first present how variance in the number of FUs in each row could be implemented. Then, we present an architecture with the ability to select the combination of N and M that maximises the performance. The latter approach could be combined with the first to get variance in the number of FUs among different rows.

Flexible AND Gate Connections. To allow variance in the number of FUs in each row, the architecture in the example in Figure 6 is proposed. Some of the FUs are here connected to several AND gates. Thus, some AND gates may connect to more FUs than others. This would have to be controlled by an input control in each AND gate that allows or blocks shared FUs. That is, an FU connected to several AND gates could be input to one unique AND gate or to several. If evolution is undertaken on one AND gate at a time, one unique AND gate connection would be the most appropriate. The more globally the FU sharing is implemented, the larger the flexibility in the number of FUs connected to each AND gate. However, this comes at the cost of increased routing overhead (increased use of logic gates). In any case, the processor would have to coordinate the assignment of FUs during evolution.

Pre-synthesized Configurations. The most intuitive approach to applying pre-synthesized configurations would probably be to store a number of configurations with different values for N and M. However, all have the same total number (N × M) of FUs, based on the amount of available logic gate resources in the given device. This could be undertaken by the architecture in Figure 7. In this architecture, the CDMs are programmable by partial reconfiguration through the ICAP interface. After the processor (CPU) has evolved a new classifier, the CDMs are configured with the configuration corresponding to the best found combination of N and M. The Input and Output interfaces (as well as the Max Detector) are static, but can be updated with data like the chosen values of N and M. A more active variant would be to change the LUT (look-up table) content available in the partial configuration bit string. However, this could require low-level study of the design and low-level configuration. We will look more closely at this as a part of our future work.

Fig. 7. The complete evolvable system with partial reconfiguration of the CDMs in the classifier module
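The selection among pre-synthesized shapes can be sketched as follows (the function names and the stand-in evaluation are our own illustrations, not from the paper): enumerate the (N, M) pairs that fit a fixed FU budget and keep the one the evaluation scores highest.

```python
# Sketch: choosing among pre-synthesised (N, M) classifier shapes that
# all use the same FU budget N * M. 'evaluate' is a placeholder for
# running evolution with a given shape and measuring classification
# performance; here an illustrative stand-in is used.

def candidate_shapes(budget):
    """All (N, M) with N FUs per row, M rows, and N * M == budget."""
    return [(n, budget // n) for n in range(1, budget + 1) if budget % n == 0]

def best_shape(budget, evaluate):
    return max(candidate_shapes(budget), key=lambda nm: evaluate(*nm))

print(candidate_shapes(12))
# [(1, 12), (2, 6), (3, 4), (4, 3), (6, 2), (12, 1)]
print(best_shape(12, lambda n, m: -abs(n - m)))  # (3, 4)
```

Only one pre-synthesized partial bitstream per shape needs to be stored; the CPU then writes the winning configuration through ICAP.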
4 Conclusions
In this paper, an introduction to reconfigurable computing and different approaches to implementing it has been given. Further, it has been outlined how this technology can be applied in on-line evolvable systems to improve flexibility in the hardware. The challenge of the latter is to include flexibility without re-synthesis while avoiding too large a logic gate overhead. An architecture based on system-on-chip and partial reconfiguration is proposed in the paper, which allows for increased flexibility.
References 1. Prophet, G.: FPGAs + the Internet = upgradable Product. In: EDN Europe, pp. 28–38 (2000) 2. Torresen, J.: An evolvable hardware tutorial. In: Becker, J., Platzner, M., Vernalde, S. (eds.) FPL 2004. LNCS, vol. 3203, pp. 682–691. Springer, Heidelberg (2004) 3. Compton, K., Hauck, S.: Reconfigurable computing: A survey of systems and software. ACM Computing Surveys 34(2), 171–210 (2002) 4. Steiger, C., Walder, H., Platzner, M.: Operating systems for reconfigurable embedded platforms: Online scheduling of real-time tasks. IEEE Trans. on Computers 53(11), 1393–1407 (2004) 5. http://www.ipflex.com 6. http://www.elixent.com 7. http://www.pactcorp.com 8. Sekanina, L., Ruzicka, R.: Design of the special fast reconfigurable chip using common FPGA. In: Proc. of Design and Diagnostics of Electronic Circuits and Systems - IEEE DDECS’2000, pp. 161–168 (2000) 9. Torresen, J., Vinger, K.A.: High performance computing by context switching reconfigurable logic. In: Proc. of the 16th European Simulation Multiconference (ESM’2002), June 2002, pp. 207–210. SCS Europe (2002) 10. Vinger, K.A., Torresen, J.: Implementing evolution of FIR-filters efficiently in an FPGA. In: Proc. of the 2003 NASA/DoD Workshop on Evolvable Hardware (2003) 11. Torresen, J., Jakobsen, J.: An FPGA implemented processor architecture with adaptive resolution. In: Proc. of 1st NASA/ESA Conference on Adaptive Hardware and Systems (AHS-2006), IEEE Computer Society Press, Los Alamitos (2006) 12. Two flows for partial reconfiguration: module based or small bit manipulation, Application Note 290. Xilinx (2004) 13. Torresen, J.: Reconfigurable logic applied for designing adaptive hardware systems. In: Proc. of the International Conference on Advances in Infrastructure for eBusiness, e-Education, e-Science, and e-Medicine on the Internet (SSGRR’2002W). Scuola Superiore G. Reiss Romoli (2002) 14. 
Hubner, M., et al.: New 2-dimensional partial dynamic reconfiguration techniques for real-time adaptive microelectronic circuits. In: Proc. of IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures (ISVLSI’06), pp. 97–102. IEEE Computer Society Press, Los Alamitos (2006) 15. Upegui, A., Sanchez, E.: Evolving hardware by dynamically reconfiguring Xilinx FPGAs. In: Moreno, J.M., Madrenas, J., Cosp, J., et al. (eds.) ICES 2005. LNCS, vol. 3637, pp. 56–65. Springer, Heidelberg (2005)
16. Upegui, A., Sanchez, E.: Evolving hardware with self-reconfigurable connectivity in Xilinx FPGAs. In: Stoica, A., et al. (eds.) Proceedings of the 1st NASA/ESA Conference on Adaptive Hardware and Systems (AHS-2006), Los Alamitos, CA, USA, pp. 153–160. IEEE Computer Society Press, Los Alamitos (2006) 17. Glette, K., Torresen, J., Yasunaga, M.: An online EHW pattern recognition system applied to face image recognition. In: Giacobini, M., et al. (eds.) EvoWorkshops 2007. LNCS, vol. 4448, pp. 271–280. Springer, Heidelberg (2007) 18. Glette, K., Torresen, J., Yasunaga, M.: An online EHW pattern recognition system applied to sonar spectrum classification. In: ICES 2007 (to be published, 2007)
Evolutionary Design of Resilient Substitution Boxes: From Coding to Hardware Implementation

Nadia Nedjah1 and Luiza de Macedo Mourelle2

1 Department of Electronics Engineering and Telecommunications, 2 Department of System Engineering and Computation, Engineering Faculty, State University of Rio de Janeiro, Brazil
Abstract. S-boxes constitute a cornerstone component in symmetric-key cryptographic algorithms, such as the DES and AES encryption systems. In block ciphers, they are typically used to obscure the relationship between the plaintext and the ciphertext. Non-linear and non-correlated S-boxes are the most secure against linear and differential cryptanalysis. In this paper, we focus on a two-fold objective: first, we evolve a regular S-box with high non-linearity and low auto-correlation properties using evolutionary computation; then we automatically generate evolvable hardware for the obtained S-box. Targeting the former, we use the Nash equilibrium-based multi-objective evolutionary algorithm to optimise regularity, non-linearity and auto-correlation, which constitute the three main desired properties of resilient S-boxes. Pursuing the latter, we exploit genetic programming to automatically generate evolvable hardware designs of substitution boxes that minimise hardware space, encryption/decryption time and dissipated power, which form the three main hardware characteristics. We compare our results against existing and well-known designs, which were produced using conventional methods as well as through evolution.
1 Introduction
In cryptography, confusion and diffusion are two important properties of a secure cipher, as identified in [15]. Confusion allows one to make the relationship between the encryption key and the ciphertext as complex as possible, while diffusion allows one to reduce as much as possible the dependency between the plaintext and the corresponding ciphertext. Substitution (a plaintext symbol is replaced by another) has been identified as a mechanism primarily for confusion. Conversely, transposition (rearranging the order of symbols) is a technique for diffusion. In modern cryptography, other mechanisms are used as well, such as linear transformations. Product ciphers use alternating substitution and transposition phases to achieve both confusion and diffusion. Here we concentrate on confusion using non-linear and non-correlated substitution boxes, or simply S-boxes.

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 403–414, 2007. © Springer-Verlag Berlin Heidelberg 2007
404
N. Nedjah and L. de Macedo Mourelle
It is well known that the more non-linear and the less auto-correlated an S-box is, the more resilient the cryptosystem that uses it. However, engineering a regular S-box that has the highest non-linearity and lowest auto-correlation properties is an NP-complete problem. Evolutionary computation is an ideal tool to deal with this type of problem. As there are three objectives that need to be reached – maximal regularity and maximal non-linearity, yet minimal auto-correlation – we propose to use multi-objective evolutionary optimisation. Therefore, we exploit game theory [12], and more specifically the well-known Nash equilibrium strategy [13], to engineer such resilient substitution boxes. Generally, the result of a cryptographic process, i.e. the plaintext or ciphertext, is required on a real-time basis. Therefore, the computation performed needs to be efficiently implemented. When the time requirement is a constraint, a hardware implementation of the computation is usually needed. As S-boxes are omnipresent in almost all of today's cryptosystems, it is very interesting to have an optimised hardware implementation of the S-box used. Therefore, here we take advantage of genetic programming to yield efficient evolvable hardware for a given S-box coding. The rest of this paper is organised in seven sections. First, in Section 2, we define S-boxes more formally, as well as their desirable properties. Subsequently, in Section 3, we present the multi-objective Nash equilibrium-based evolutionary algorithm [12], used to evolve resilient S-box codings, and we give a brief description of the principles of evolvable hardware. Thereafter, in Section 4, we describe the S-box encoding and give the definition and implementation of the fitness evaluation of an S-box coding with respect to all three considered properties: regularity, non-linearity and auto-correlation.
In the sequel, in Section 5, we present the methodology we employed to evolve new compact, fast and less demanding hardware for S-boxes and define the related fitness evaluation with respect to all three hardware characteristics: area, time and power consumption. Then, in Section 6, we assess the quality of the evolved S-boxes codes together with the corresponding hardware. Also, we compare the characteristics of the engineered S-boxes to those used by Data Encryption Standard (DES) [11]. Last but not least, in Section 7, we summarise the content of the paper and draw some useful conclusions.
2 Preliminaries for Substitution Boxes
S-Boxes play a basic and fundamental role in many modern block ciphers. In block ciphers, they are typically used to obscure the relationship between the plaintext and the ciphertext. Perhaps the most notorious S-boxes are those used in DES [11]. S-boxes are also used in modern cryptosystems based on AES and Kasumi. All three are called Feistel cryptographic algorithms [8] and have the simplified structure depicted in Fig. 1. An S-box can simply be seen as a Boolean function of n inputs and m outputs, often with n > m. Considerable research effort has been invested in designing resilient S-boxes that can resist the continuous cryptanalyst’s attacks. In order
Evolutionary Design of Resilient Substitution Boxes
405
Fig. 1. The simplified structure of Feistel cryptographic algorithm
to resist linear and differential cryptanalysis [2,7], S-boxes need to be confusing, or non-linear, and diffusing, or non-auto-correlated. S-boxes also need to be regular. In the following, we introduce some useful formal definitions of S-box properties, which are used later in the fitness evaluation of an evolved S-box coding.

Definition 1. A simple S-box $S$ is a Boolean function defined as $S: B^n \longrightarrow B$.

Definition 2. A linear simple S-box $L_\beta$ is defined in (1):
$$L_\beta(x) = \sum_{i=0}^{m} \beta_i \cdot L(x_i) \qquad (1)$$

Definition 3. The polarity of a simple S-box $S$ is defined in (2):
$$\hat{S}(x) = (-1)^{S(x)} \qquad (2)$$

Definition 4. The non-correlation factor of two simple S-boxes $S$ and $S'$ is defined in (3):
$$U_{S,S'} = \sum_{x \in B^n} \hat{S}(x) \times \hat{S}'(x) \qquad (3)$$

Definition 5. Two simple S-boxes $S$ and $S'$ are said to be non-correlated if and only if $U_{S,S'} = 0$.
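Definitions 3–5 translate directly into code. The sketch below is our own illustration; the truth-table representation of a simple S-box (a list of $2^n$ output bits) is an assumption, not the paper's implementation:

```python
# Definitions 3-5 in code: a simple S-box is represented by its truth
# table, a list of 2**n output bits indexed by the integer value of x.

def polarity(S, x):
    """S^(x) = (-1)^S(x)  (Definition 3)."""
    return (-1) ** S[x]

def noncorrelation(S1, S2, n):
    """U_{S,S'} = sum over x in B^n of S^(x) * S'^(x)  (Definition 4)."""
    return sum(polarity(S1, x) * polarity(S2, x) for x in range(2 ** n))

# Example with n = 2: S = x1 XOR x2 and S' = x1 are non-correlated.
xor = [0, 1, 1, 0]  # truth table of x1 ^ x2
x1 = [0, 0, 1, 1]   # truth table of the first input bit (x = 2*x1 + x2)
print(noncorrelation(xor, x1, 2))  # 0 -> non-correlated (Definition 5)
```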
Definition 6. The non-linearity of a simple S-box $S$ is measured by its non-correlation factor with all possible linear simple S-boxes, as defined in (4):
$$N_S = \frac{1}{2}\left(2^n - \max_{\alpha \in B^n} |U_{S,L_\alpha}|\right) \qquad (4)$$

Definition 7. The auto-correlation of a simple S-box $S$ is measured by its non-correlation factor with the derivative S-boxes $D_\alpha(x) = S(x \oplus \alpha)$, for all $\alpha \in B^n \setminus \{0^n\}$, as defined in (5):
$$A_S = \max_{\alpha \in B^n \setminus \{0^n\}} |U_{S,D_\alpha}| \qquad (5)$$

Note that $U_{S,D_\alpha}$ is also called the Walsh Hadamard transform [1].

Definition 8. A simple S-box is said to be balanced if and only if the number of combinations $x \in B^n$ such that $S(x) = 0$ and the number of combinations $y \in B^n$ such that $S(y) = 1$ are the same. The balance of a simple S-box is measured using its Hamming weight, defined in (6):
$$W_S = \frac{1}{2}\left(2^n - \sum_{x \in B^n} \hat{S}(x)\right) \qquad (6)$$
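Definitions 6–8 can likewise be computed by brute force for small $n$ (a sketch of our own; for realistic sizes one would use a fast Walsh–Hadamard transform rather than this exhaustive scan):

```python
# Brute-force implementation of Definitions 6-8 for a simple S-box
# given as a truth table (a list of 2**n bits). Fine for small n only.

def pol(bit):
    return (-1) ** bit  # polarity (Definition 3)

def U(S1, S2, n):
    return sum(pol(S1[x]) * pol(S2[x]) for x in range(2 ** n))

def linear_sbox(alpha, n):
    """Truth table of the linear S-box L_alpha(x) = parity(alpha & x)."""
    return [bin(alpha & x).count("1") % 2 for x in range(2 ** n)]

def nonlinearity(S, n):
    """N_S = (2^n - max_alpha |U_{S,L_alpha}|) / 2  (Definition 6)."""
    worst = max(abs(U(S, linear_sbox(a, n), n)) for a in range(2 ** n))
    return (2 ** n - worst) // 2

def autocorrelation(S, n):
    """A_S = max over nonzero alpha of |U_{S,D_alpha}|, D_alpha(x) = S(x ^ alpha)."""
    return max(abs(sum(pol(S[x]) * pol(S[x ^ a]) for x in range(2 ** n)))
               for a in range(1, 2 ** n))

def hamming_balance(S, n):
    """W_S of Definition 8; equals 2^(n-1) when S is balanced."""
    return (2 ** n - sum(pol(S[x]) for x in range(2 ** n))) // 2

xor = [0, 1, 1, 0]  # x1 XOR x2: balanced (W = 2) but linear (N = 0, A = 4)
print(nonlinearity(xor, 2), autocorrelation(xor, 2), hamming_balance(xor, 2))
# 0 4 2
```

The example makes the tension between the objectives concrete: XOR is perfectly balanced yet, being linear, has the worst possible non-linearity and auto-correlation.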
3 Evolutionary Algorithms: Nash Strategy and Evolvable Hardware
Starting from a random set of solutions, generally called the initial population, an evolutionary algorithm breeds a population of chromosomes through a series of steps, called generations, using the Darwinian principle of natural selection, recombination (also called crossover) and mutation. Individuals are selected based on how much they adhere to the specified constraints. Each evolved solution is assigned a value, generally called its fitness, that mirrors how good it is at solving the problem in question. Evolutionary computation proceeds by first randomly creating an initial population of individuals and then iteratively evolving to the next generation, which consists of going through two main steps, as long as the constraints are not met. The first step in a generational evolution assigns each chromosome in the current population a fitness value that measures its adherence to the constraints, while the second step creates a new population by applying the three genetic operators – selection, crossover and mutation – to some selected individuals. Selection is performed on the basis of the individual fitness. The fitter an individual is, the more probable it is that it is selected to contribute to the new generational population. Crossover recombines two chosen solutions to create two new ones, using single-point crossover, double-point crossover or other kinds of crossover operators that lead to population diversity [5]. Mutation yields a new individual by changing some randomly chosen genes in the selected one. The number of genes to be mutated is called the mutation degree, and how many individuals should undergo mutation is called the mutation rate.
Fig. 2. Multi-objective optimisation using Nash strategy
3.1 Nash Equilibrium-Based Evolutionary Algorithm
This approach is inspired by the Nash strategy from economics and game theory [12,13]. The multi-objective optimisation process based on this strategy is non-cooperative in the sense that each objective is optimised separately. The basic idea consists of associating an agent or player with every objective. Each agent attempts to optimise the corresponding objective while fixing the other objectives at their best values so far. As proven by Nash in [12], the Nash equilibrium point should be reached when no player can further improve the corresponding objective. Let $m$ be the number of objectives $f_1, \dots, f_m$. The multi-objective genetic algorithm based on the Nash strategy assigns the optimisation of objective $f_i$ to $player_i$, each of which has its own population. The process is depicted in Figure 2. Basically, it is a parallel genetic algorithm [4] with the exception that there are several criteria to be optimised. When a player, say $player_i$, completes an evolution generation, say $t$, it sends the locally best solution reached, $f_i^t$, to all $player_j$, $j \in \{1, \dots, m\} \setminus \{i\}$, which is then used to fix objective $f_i$ to $f_i^t$ during the next generation $t+1$. This evolutionary process is repeated iteratively until no player can further improve the associated criterion.
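The per-player loop just described can be sketched for two players; all names below are illustrative, and a tiny random search stands in for each player's genetic algorithm:

```python
import random

# Minimal sketch of the Nash-strategy loop for two players: player i
# optimises objective f_i over its own variable while the other
# player's variable is frozen at that player's latest best. A small
# random search replaces each player's full GA.

def nash_optimise(f1, f2, generations=50, seed=0):
    rng = random.Random(seed)
    x, y = 0.0, 0.0  # each player's current best "chromosome"
    for _ in range(generations):
        for cand in (x + rng.uniform(-1, 1) for _ in range(20)):
            if f1(cand, y) < f1(x, y):  # player 1 improves x, y fixed
                x = cand
        for cand in (y + rng.uniform(-1, 1) for _ in range(20)):
            if f2(x, cand) < f2(x, y):  # player 2 improves y, x fixed
                y = cand
    return x, y

# Two coupled quadratics whose Nash equilibrium is x = y = 1.
f1 = lambda x, y: (x - 1) ** 2 + 0.1 * (x - y) ** 2
f2 = lambda x, y: (y - 1) ** 2 + 0.1 * (y - x) ** 2
x, y = nash_optimise(f1, f2)
print(round(x, 2), round(y, 2))  # both close to 1
```

The loop terminates in practice when neither variable changes over a generation, i.e. when no player can further improve its criterion, which is the Nash equilibrium condition.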
3.2 Evolvable Hardware
Evolutionary hardware [14] consists simply of hardware designs evolved using genetic algorithms, wherein chromosomes represent circuit designs. In general, evolutionary hardware design offers a mechanism to get a computer to provide a circuit design without being told exactly how to do it. In short, it allows one to automatically create circuits. It does so based on a high-level statement of the constraints the yielded circuit must obey. The input/output behaviour
Fig. 3. Four-point crossover of S-boxes
of the expected circuit is generally considered as an omnipresent constraint. Furthermore, the generated circuit should have a minimal size. Designing hardware that fulfils a certain function consists of deriving, from specific input/output behaviours, an architecture that is operational (i.e. produces all the expected outputs from the given inputs) within a specified set of constraints. Besides the input/output behaviour of the hardware, conventional designs are essentially based on knowledge and creativity, which are two human characteristics that are too hard to automate. Evolutionary hardware is a design that is generated using simulated evolution as an alternative to conventional electronic circuit design.

3.3 Crossover Operators for S-Box Codings and Hardware Implementations
For both evolutionary processes (the coding and the hardware implementation of S-boxes), the crossover operator is implemented using four-point crossover, as described in Fig. 3. The four-point crossover can degenerate to a triple-, double- or single-point crossover, and these can moreover be either horizontal or vertical; this creates a better opportunity for population diversity, which leads to faster convergence. For the coding evolutionary process, the square represents the matrix of bytes, while for the hardware evolution, it represents the matrix of gates and routing.
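Read as a 2D operator, the four-point crossover of Fig. 3 amounts to swapping an axis-aligned rectangular region, delimited by two horizontal and two vertical cut points, between two parents. The sketch below assumes that interpretation:

```python
import random

# Sketch of the four-point crossover of Fig. 3: the rectangular region
# delimited by two horizontal and two vertical cut points is swapped
# between two parent matrices (bytes for the S-box coding, cells for
# the hardware evolution).

def four_point_crossover(a, b, rng=random):
    rows, cols = len(a), len(a[0])
    r1, r2 = sorted(rng.sample(range(rows + 1), 2))
    c1, c2 = sorted(rng.sample(range(cols + 1), 2))
    child1 = [row[:] for row in a]
    child2 = [row[:] for row in b]
    for r in range(r1, r2):          # cuts landing on the matrix border
        for c in range(c1, c2):      # degenerate to fewer-point crossover
            child1[r][c], child2[r][c] = b[r][c], a[r][c]
    return child1, child2

p1 = [[0] * 4 for _ in range(4)]
p2 = [[1] * 4 for _ in range(4)]
c1, c2 = four_point_crossover(p1, p2, random.Random(42))
```

Every matrix entry ends up in exactly one child, so the two children are complementary with respect to their parents.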
4
Evolutionary Coding of Resilient S-Boxes
In general, two concepts are crucial to any evolutionary computation: individual encoding and fitness evaluation. One needs to know how to
evaluate the solutions with respect to each of the multiple objectives. In this first evolutionary process, the coding, we encode an S-box simply as a matrix of bytes, which allows for an efficient application of the genetic operators: mutation and crossover. The mutation operator chooses an entry randomly and changes its value, using a fresh randomised byte. The crossover operator is handled in the same way for both evolutionary processes, i.e. the S-box coding and the hardware implementation, as described in Section 3.3. In the remainder of the section, we concentrate on how to evaluate the fitness of an obtained S-box. For this purpose, in the next section, we give some necessary definitions that help us implement the fitness function.
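The mutation operator just described can be sketched as follows (the `degree` parameter follows the mutation-degree terminology of Section 3; the function itself is our own illustration):

```python
import random

# Sketch of the mutation operator on the byte-matrix S-box encoding:
# 'degree' entries are picked at random and overwritten with fresh
# random bytes.

def mutate(sbox, degree=1, rng=random):
    out = [row[:] for row in sbox]  # leave the parent untouched
    for _ in range(degree):
        r = rng.randrange(len(out))
        c = rng.randrange(len(out[0]))
        out[r][c] = rng.randrange(256)  # a fresh randomised byte
    return out

sbox = [[0x00] * 4 for _ in range(4)]
mutant = mutate(sbox, degree=2, rng=random.Random(7))
```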
Fitness of S-Box Coding
Now, let us generalise the definitions of balance, non-linearity and auto-correlation to non-simple S-boxes, i.e. S-boxes defined as S: B^n → B^m.

Definition 9. An S-box S defined as S: B^n → B^m is a concatenation of m simple S-boxes S_i with 1 ≤ i ≤ m, as in (7):

    S(x) = S_1(x) S_2(x) . . . S_m(x)    (7)

Definition 10. The non-linearity of S-box S is measured by N*_S defined in (8):

    N*_S = min_{β ∈ B^m \ {0^m}} N_{S_β}, wherein S_β(x) = ⊕_{i=1}^{m} β_i S_i(x)    (8)

Definition 11. The auto-correlation of S-box S is measured by A*_S as in (9):

    A*_S = max_{β ∈ B^m \ {0^m}} A_{S_β}, wherein S_β(x) = ⊕_{i=1}^{m} β_i S_i(x)    (9)

Definition 12. An S-box S is said to be regular if and only if for each ω ∈ B^m there exists exactly the same number of x ∈ B^n such that S(x) = ω. The regularity of an S-box can be measured by W*_S defined in (10):

    W*_S = max_{β ∈ B^m \ {0^m}} W_{S_β}, wherein S_β(x) = ⊕_{i=1}^{m} β_i S_i(x)    (10)

Note that a regular S-box S has W*_S = 2^{n−1}. The optimisation objectives consist of maximising the regularity of the S-box as well as its non-linearity while minimising its auto-correlation. These are stated in (11):

    max_S W*_S,  max_S N*_S,  min_S A*_S    (11)
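Definition 10 reduces the non-linearity of S to the non-linearities of the 2^m − 1 non-trivial linear combinations S_β of its output bits. A minimal sketch of this computation is given below, using the standard Walsh-spectrum definition of the non-linearity of a single Boolean function, N_f = 2^{n−1} − (1/2) max_ω |W_f(ω)| (the paper's Definitions 1–8, not reproduced here, are assumed to match this convention); function names are ours.

```python
def walsh_spectrum(truth, n):
    """W_f(w) = sum over x of (-1)^(f(x) XOR <w,x>), for all w."""
    spec = []
    for w in range(1 << n):
        total = 0
        for x in range(1 << n):
            parity = bin(w & x).count("1") & 1   # inner product <w,x> mod 2
            total += -1 if (truth[x] ^ parity) else 1
        spec.append(total)
    return spec

def nonlinearity(truth, n):
    """N_f = 2^(n-1) - max_w |W_f(w)| / 2 for a single Boolean function."""
    return (1 << (n - 1)) - max(abs(v) for v in walsh_spectrum(truth, n)) // 2

def sbox_nonlinearity(sbox, n, m):
    """N*_S of Definition 10: minimum over all non-zero output masks beta."""
    return min(
        nonlinearity([bin(sbox[x] & beta).count("1") & 1 for x in range(1 << n)], n)
        for beta in range(1, 1 << m))
```

The auto-correlation measure A*_S of Definition 11 would follow the same pattern, replacing the per-function Walsh computation with an auto-correlation one.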
N. Nedjah and L. de Macedo Mourelle

5 Evolvable Hardware Implementation of S-Boxes
In this section, an individual is a circuit design of an S-box, which is pre-specified using its truth table. We encode circuit schematics using a matrix of cells that may be interconnected. A cell may or may not be involved in the circuit schematics. A cell consists of a two-input logical gate (or a three-input one in the case of a MUX) and a single output. A cell may draw its input signals from the output signals of gates in previous rows. The gates in the first row draw their input signals from the circuit global inputs or their complements. The circuit global output signals are the output signals of the gates in the last row of the matrix.

5.1 Fitness of an S-Box Circuit
A circuit design is said to be fit if, and only if, it satisfies the imposed input/output behaviour. In single-objective optimisation, a circuit design is considered fitter than another if it has a smaller size, a shorter response time or consumes less power, depending on whether the single objective is size, time or power consumption minimisation respectively. In multi-objective optimisation, however, the concept of fitness is not that obvious. It is extremely rare that a single design optimises all objectives simultaneously. Instead, there normally exist several designs that provide the same balance, compromise or trade-off with respect to the problem objectives. Here, we consider three objectives: hardware area (H), response time (T) and power dissipation (P). Of course, the evolved circuit needs to be fit (F). Objective H is estimated by the total number of gate-equivalents required to implement the evolved circuit and objective T by the maximum delay occasioned by it. Objective P is evaluated by approximating the switching activity of each gate and the respective fanout [10]. Let C be a digital circuit that uses a subset (or the complete set) of the given gates. Let gates(C) be a function that returns the set of all gates of circuit C and levels(C) be a function that returns the set of all the gates of C grouped by level. Notice that the number of levels of a circuit coincides with the cardinality of the set returned by function levels. On the other hand, let B(x_i) be the Boolean value that circuit C propagates for row x_i of the input Boolean matrix X: 2^{n_in} × n_in, assuming that the number of input signals required for circuit C is n_in. The fitness function, which allows us to determine how much an evolved circuit adheres to the specified constraints, is given in (12):

    min_{∀C} ( F(C) + ω_1 H(C) + ω_2 T(C) + ω_3 P(C) ), wherein

    F(C) = Σ_{j=1}^{n_out} Σ_{i | B(x_i) ≠ y_{i,j}} ξ
    H(C) = Σ_{g ∈ gates(C)} gateEquiv(g)
    T(C) = Σ_{l ∈ levels(C)} max_{g ∈ l} delay(g)
    P(C) = Σ_{g ∈ gates(C)} switch(g) · fanout(g)    (12)
In (12), y_{i,j} represents the expected value of output signal j for input combination x_i, and n_out denotes the number of output signals of circuit C. For a gate g, functions gateEquiv, delay, switch and fanout return its number of gate-equivalents, propagation delay, number of switches and fanout respectively. For each error in the evolved circuit, the individual pays a penalty ξ. Constants ω_1, ω_2 and ω_3 are weighting coefficients that allow us to specify the importance of each of the three objectives (area, response time and power dissipation) and thus evaluate the fitness of an evolved circuit. Note that we always have ω_1 + ω_2 + ω_3 = 1. For our implementation, we minimised the fitness function in (12) with ω_1 = 0.34, ω_2 = 0.33 and ω_3 = 0.33.
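A minimal sketch of evaluating (12) for a candidate circuit, given per-gate data; the penalty value ξ and the data layout are our own assumptions (the paper does not state ξ):

```python
W1, W2, W3 = 0.34, 0.33, 0.33   # weights used in the paper
XI = 10.0                        # penalty per truth-table error (hypothetical)

def circuit_fitness(errors, gates, levels):
    """Weighted-sum fitness of equation (12).

    errors : number of (input row, output signal) mismatches, i.e. F(C)/xi
    gates  : list of (gate_equiv, switching_activity, fanout) per gate
    levels : list of per-level lists of gate delays
    """
    F = errors * XI                               # functionality penalty
    H = sum(ge for ge, _, _ in gates)             # area in gate-equivalents
    T = sum(max(level) for level in levels)       # sum of per-level critical delays
    P = sum(sw * fo for _, sw, fo in gates)       # switching activity x fanout
    return F + W1 * H + W2 * T + W3 * P
```

Because ω_1 + ω_2 + ω_3 = 1, the weighted part is a convex combination of the three objectives, with the error penalty F dominating until the circuit is functionally correct.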
6 Performance Results
In this section, we present some performance figures for the two evolutionary processes described in the previous sections. The evolutionary coding process presented here is compared to related work from [3,9], while the evolvable hardware implementation described earlier is compared to the designs obtained using conventional methods for the S-boxes of DES [11].

6.1 Performance of S-Box Evolutionary Coding
The Nash algorithm [12,13], described in Section 3, was implemented using the multi-threading available in Java™. As we have a three-objective optimisation (maximising regularity, maximising non-linearity and minimising auto-correlation), our implementation has three agents, one per objective, all running in parallel on a computer with a 3.2 GHz Hyper-Threaded Pentium IV processor. We used S-boxes of different sizes (i.e. numbers of input and output bits) to evaluate the performance of our approach. All the fittest S-boxes we obtained through the evolutionary process are regular. The non-linearity and auto-correlation criteria for the best S-boxes yielded are given in Table 1, together with those obtained in [9] and [3]. For all the benchmarks, our approach performed better, producing S-boxes that are more non-linear and less auto-correlated than those presented in [9] and [3]. The best solutions were always found after at most 500 generations. In order to compare the solutions produced by the three approaches, we introduce the concept of a dominance relation between two S-boxes in multi-objective optimisation.

Definition 13. An S-box S1 dominates another S-box S2, denoted S1 ≻ S2 (or, interchangeably, solution S2 is dominated by solution S1), if and only if S1 is no worse than S2 with respect to all objectives and S1 is strictly better than S2 in at least one objective. Otherwise, S-box S1 does not dominate solution S2, or interchangeably, S2 is not dominated by S1.

So, in the light of Definition 13, we can see that all the S-boxes evolved by our approach dominate those yielded by Millan, Burnett et al. [9]. Furthermore, the 8 × 4, 8 × 5
Table 1. Characteristics of the best S-boxes S+ by Millan et al. [9], Clark et al. [3] and our approach

    input×output   Millan et al.    Clark et al.     Our approach
                   N*_S+   A*_S+    N*_S+   A*_S+    N*_S    A*_S
    8×2            108     56       114     32       116     34
    8×3            106     64       112     40       114     42
    8×4            104     72       110     48       110     42
    8×5            102     72       108     56       110     56
    8×6            100     80       106     64       106     62
    8×7             98     80       104     72       102     70
and 8 × 6 S-boxes we produced dominate those generated by Clark et al. too. Nevertheless, the 8 × 2, 8 × 3 and 8 × 7 S-boxes are non-dominated.

6.2 Performance of S-Box Evolvable Hardware
For comparison purposes, we evolved the S-boxes of the Data Encryption Standard (DES) and obtained the characteristics (area, time and power) of the evolved circuits. However, for existing work on designing hardware for DES S-boxes, we could only obtain the size in terms of gate-equivalents. In Table 2, we give the characteristics of the S-boxes of the fastest implementation of DES, known as bitslice DES [6], alongside the characteristics of the evolved DES S-boxes.

Table 2. Characteristics of the bitslice DES S-boxes

    S-box   conventional design       evolutionary design
            area    time     power    area    time     power
    S1      167     2.2010    981     124     1.2880   1071
    S2      149     3.8290    761     117     1.1005    981
    S3      153     2.4675    992     102     1.7145    412
    S4      119     1.5505    571      92     0.7660    771
    S5      161     2.1170    884     126     1.2760    514
    S6      162     2.2395    831     111     1.9115    959
    S7      148     2.6180    716     108     1.2220    801
    S8      152     2.7915   1009     137     0.9895    897
The parameters used in the evolutionary process were a mutation rate of 0.9, a mutation degree of 16 and a population of 100 circuits. It took about a couple of hours to evolve the designs of DES S-boxes S1, S2, S3 and S4. The evolved hardware for S-boxes S3 and S4 is given in the appendix. However, we believe that, given more time, the circuit designs for the S-boxes will be much more
Fig. 4. Performance factor of DES S-boxes: bitslice DES S-boxes vs. evolutionary S-boxes
efficient in all three aspects: hardware area, response time and power consumption (switching activity only). The chart of Fig. 4 relates the performance factor of the bitslice DES S-boxes to that of the S-boxes obtained by the proposed evolutionary process. The performance factor is the product area × time × power. It is clear that the evolutionary S-box designs are far better than those designed using conventional methods.
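As a concrete check of this claim, the performance factor for S-box S1 can be computed from the two corresponding rows of Table 2 (a minimal sketch; names are ours):

```python
# Table 2 entries for S-box S1: (area, time, power)
bitslice_s1 = (167, 2.2010, 981)    # conventional bitslice design
evolved_s1 = (124, 1.2880, 1071)    # evolved design

def performance_factor(area, time, power):
    # smaller is better: the product of the three objectives
    return area * time * power

ratio = performance_factor(*bitslice_s1) / performance_factor(*evolved_s1)
# ratio > 1 means the evolved S-box wins on the combined metric
```

For S1 the ratio is a little over 2, i.e. the evolved circuit is better on the combined metric despite its higher power figure.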
7 Conclusion
In the first part of this paper, we used a multi-objective evolutionary algorithm based on the concept of Nash equilibrium to evolve innovative resilient S-boxes. The produced S-boxes are regular in the sense of Definition 12. Moreover, considering the dominance relationship between solutions in multi-objective optimisation as defined in Definition 13, the generated S-boxes are better than those obtained by both Millan et al. and Clark et al. This is encouraging for pursuing the further evolution of more complex S-boxes. In the second part of this paper, we proposed a methodology based on evolutionary computation to automatically generate data-flow based specifications for hardware designs of S-boxes. Our aim was to evolve minimal hardware specifications, i.e. hardware that minimises the three main characteristics of a digital circuit: space (i.e. required gate count), time (i.e. encryption and decryption time) and power dissipation. We compared our results against the fastest existing design. The hardware evolved for the S-boxes is more efficient in terms of required hardware area and response time, but consumes more power. Overall, however, considering the trade-off between these three hardware characteristics, the evolved S-boxes are far better than the conventionally designed ones.
Acknowledgements

We are grateful to CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico) for its continuous financial support.
References

1. Beauchamp, K.G.: Walsh Functions and Their Applications. Academic Press, New York (1975)
2. Biham, E., Shamir, A.: Differential Cryptanalysis of DES-like Cryptosystems. In: Menezes, A.J., Vanstone, S.A. (eds.) CRYPTO 1990. LNCS, vol. 537, pp. 2–21. Springer, Heidelberg (1991)
3. Clark, J.A., Jacob, J.L., Stepney, S.: The Design of S-Boxes by Simulated Annealing. New Generation Computing 23(3), 219–231 (2005)
4. Dorigo, M., Maniezzo, V.: Parallel Genetic Algorithms: Introduction and Overview of Current Research. In: Stender, J. (ed.) Parallel Genetic Algorithms. IOS Press, Amsterdam (1993)
5. Haupt, R.L., Haupt, S.E.: Practical Genetic Algorithms. John Wiley, Chichester (1998)
6. Kwan, M.: Reducing the gate count of bitslice DES (2000), http://eprint.iacr.org/
7. Matsui, M.: Linear cryptanalysis method for DES cipher. In: Helleseth, T. (ed.) EUROCRYPT 1993. LNCS, vol. 765, pp. 386–397. Springer, Heidelberg (1994)
8. Menezes, A.J., Van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied Cryptography. CRC Press (1996)
9. Millan, W., Burnett, L., Carter, G., Clark, J.A., Dawson, E.: Evolutionary Heuristics for Finding Cryptographically Strong S-Boxes. In: Varadharajan, V., Mu, Y. (eds.) Information and Communication Security. LNCS, vol. 1726, pp. 263–274. Springer, Heidelberg (1999)
10. Monteiro, J., Devadas, S., Ghosh, A., Keutzer, K., White, J.: Estimation of average switching activity in combinational logic circuits using symbolic simulation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 16(1), 121–127 (1997)
11. National Institute of Standards and Technology: Data Encryption Standard. Federal Information Processing Standards 46 (November 1977)
12. Nash, J.F.: Equilibrium points in n-person games. Proceedings of the National Academy of Sciences 36, 48–49 (1950)
13. Nash, J.F.: Non-cooperative games. Annals of Mathematics 54(2), 286–295 (1951)
14. Rhyne, V.T.: Fundamentals of Digital Systems Design. Prentice-Hall Electrical Engineering Series (1973)
15. Shannon, C.E.: Communication theory of secrecy systems. Bell Sys. Tech. J. 28(4), 656–715 (1949)
A Sophisticated Architecture for Evolutionary Multiobjective Optimization Utilizing High Performance DSP

Quanxi Li and Jingsong He

Department of Electronic Science and Technology, Nature Inspired Computation and Applications Laboratory, University of Science and Technology of China, Hefei, 230026, China
[email protected], [email protected]

Abstract. Constructing an evolutionary engine platform in evolvable hardware (EHW) is one of the most important topics, and a sophisticated architecture for adaptive hardware applications is the key to such a platform. In the real world, most applications are multi-objective, so it is highly desirable to solve multi-objective problems (MOPs) by implementing evolutionary multi-objective optimization (EMO) on a special hardware platform. At present, there have been very few attempts concerned with this theme. In this paper, we present an adaptive hardware platform that implements EMO algorithms utilizing a high-performance digital signal processor (DSP) device. In this design, we mainly address the speedup of evolutionary search by using parallel constructs to implement such an EMO algorithm on the DSP. Experimental results show that our platform works quite well. We obtain a speedup of nearly 100 times in the condition that the CPU host frequency is 1810 MHz and the hardware clock frequency is 150 MHz, which suggests that by using a higher-frequency DSP we will get a better speedup, and we may further solve real-world MOPs in real time.

Keywords: Evolutionary Multi-objective Optimization, Digital Signal Processor, Evolvable Hardware.
1 Introduction
Over the last ten years, there has been an increasing interest in applying evolutionary algorithms (EAs) to MOPs. This research is highly relevant to real-world applications, as there are many MOPs in the fields of real-world engineering design, scientific practice, production allocation, and economics management. These MOPs often involve several conflicting objectives for which a good trade-off must be found. Examples of such applications are hardware/software partitioning in System-on-Chip design, control and ordering during the production
This work is supported by the National Natural Science Foundation of China through Grant No. 60573170.
L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 415–425, 2007. © Springer-Verlag Berlin Heidelberg 2007
process, and business portfolio management, etc. The combination of EAs, whose evolutionary search has high computational complexity, with real-world MOPs, which often have real-time constraints, makes using evolutionary algorithms to solve real-world MOPs impractical. So, seeking a real-time means to speed up evolutionary search is vitally important for solving real-world MOPs. Evolvable hardware, which mainly consists of a core of EAs and has the distinct characteristic of speeding up evolutionary search, has now become a research hotspot. There are three principal research areas, Extrinsic EHW, Intrinsic EHW, and Complete Hardware Evolution (CHE) [1,2,3], which are categorized based on the extent of the implementation in hardware. Extrinsic EHW performs the fitness computation only in software simulation [4]. Intrinsic EHW uses programmable or reconfigurable hardware to speed up the fitness computation [5]. And CHE directly implements all aspects of the evolution in programmable or reconfigurable hardware [3]. However, most EHW research is concerned with evolving circuits for various functions [6]; little of it concerns the implementation of EAs in hardware, and even less the implementation of EMO algorithms in particular. The reason is that the computational complexity of evolutionary search often makes real-time capability impossible when all aspects of the EMO are implemented in hardware. So, a speedup in execution has become the main motivation to construct a Complete Hardware Evolution, where all aspects of the EMO are implemented, which may contribute to dealing with real-world MOPs in real time. At present, constructing an evolutionary engine platform in EHW is one of the most important topics, and it is vital to find a felicitous hardware platform to implement the EMO in real time. A DSP is a type of device with a special configuration, one that is extremely fast and powerful. A DSP is unique because it processes data in parallel and in real time.
This real-time capability makes a DSP perfect for applications where we cannot tolerate any delays. Meanwhile, there are many MOPs that must be solved in real time; one such application is wireless layout in 3G and beyond-3G projects [7]. The layout must consider, in real time, the conflicts between carrier signal, base-station sectors, flexibility and scalability, so that both cost and communication quality reach a good trade-off. Some researchers have attempted to implement intelligent algorithms on DSPs. Murakawa et al. [8] developed a dynamically reconfigurable chip consisting of 15 DSPs to execute neural processing. Ferguson et al. [9] presented an evolvable hardware platform based on the use of a stand-alone DSP and a field programmable transistor array (FPTA). However, Murakawa et al.'s [8] self-recovery mechanisms for neural networks have not been studied sufficiently, and the evolvable hardware platform presented by Ferguson et al. [9] was apt only for the implementation of a (single-objective) EA. This shows that designing a sophisticated DSP-based framework, wherein evolutionary multi-objective optimization can be implemented to solve multi-objective problems in real time, is much needed. In this paper, we focus on the implementation of the non-dominated sorting genetic algorithm II (NSGA-II) [10] utilizing a high-performance DSP device.
NSGA-II is a classical EMO algorithm based on the Pareto-optimal front. The Pareto-optimal front is a set of optimal solutions, from which the expert can choose the best one. Our work is directed toward utilizing a high-performance DSP device as a means of implementing the EMO algorithm. It does not refer to either Extrinsic or Intrinsic EHW, but more closely follows the ideas of Complete Hardware Evolution. Meanwhile, our goal is not to evolve circuits, but to implement all aspects of the evolution on the DSP. This may open a door for dealing with real-world MOPs in real time. In the remainder of the paper, we present a brief description of NSGA-II and show the distinct characteristics of its implementation device in Section 2. Section 3 describes in detail the implementation of the EMO algorithm utilizing the DSP, including the description language, chromosome representation, overall design architecture, and algorithm modules. In Section 4, we present the test problem, experimental environment, and experimental results, and also give a performance comparison, under corresponding conditions, between implementations of the EMO algorithm in software, on DSP without adopting parallelism, and with our design on DSP. Finally, we present conclusions for our work.
2 Background

2.1 Brief Description of the NSGA-II
Pseudo-code for the NSGA-II [10] is given as follows:

    NSGA-II()
    {
        initialize: generate P0 at random;
        fitness evaluation: P0;
        F = fast-non-dominated-sort(P0);
        for all Fi ∈ P0: crowding-distance-assignment(Fi);
        t = 0;
        while (t ≤ T)        // T is the maximal iteration
        {
            selection: using a binary tournament selection Pt;
            crossover and mutation: Qt;
            fitness evaluation: Qt;
            recombination: Rt = Pt ∪ Qt;
            F = fast-non-dominated-sort(Rt);
            for ith front: crowding-distance-assignment(Fi);
            replacement: Pt;
            t = t + 1;
        }
        return Pt;
    }
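The crowding-distance-assignment step used above can be sketched as follows (the standard NSGA-II formulation: boundary solutions get infinite distance, interior ones the normalised size of the cuboid spanned by their neighbours; names are ours):

```python
def crowding_distance_assignment(front):
    """front: list of objective vectors; returns one distance per solution."""
    n = len(front)
    if n == 0:
        return []
    m = len(front[0])
    dist = [0.0] * n
    for obj in range(m):
        # sort solution indices by this objective's value
        order = sorted(range(n), key=lambda i: front[i][obj])
        fmin, fmax = front[order[0]][obj], front[order[-1]][obj]
        # boundary solutions are always preserved
        dist[order[0]] = dist[order[-1]] = float("inf")
        if fmax == fmin:
            continue
        # interior solutions accumulate the normalised neighbour gap
        for k in range(1, n - 1):
            dist[order[k]] += (front[order[k + 1]][obj]
                               - front[order[k - 1]][obj]) / (fmax - fmin)
    return dist
```

Solutions with larger crowding distance are preferred by the crowded-comparison operator, which spreads the population along the front.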
By using binary tournament selection, crossover and mutation, a parent population Pt of size N creates an offspring population Qt of size N. Then, a combined population Rt = Pt ∪ Qt of size 2N is formed. Next, the population Rt is sorted according to non-domination. The first front F1 is created as the set of solutions not dominated by any solution in the population. These solutions are emphasized more than any other solutions in the combined population, and are temporarily removed from it. Solutions from the set F2 are chosen next, followed by solutions from the set F3, and so on. This is repeated until no more sets can be accommodated. After each front has been created, its members are assigned crowding distances. Suppose the count of individuals from the sets F1 to Fk is less than N, while that from F1 to Fk+1 is not less than N. In general, the count of solutions in all sets from F1 to Fk+1 would be larger than the population size. To choose exactly N population members, we sort the solutions of the set Fk+1 using the crowded-comparison operator in descending order and choose the best solutions needed to fill all population slots. Finally, the new population Pt+1 of size N is formed based on the fronts and the crowding distances.

2.2 The Chip Used in This Design
In order to implement the EMO algorithm in real time, we chose the TMS320C6711 chip [11,12,13], a floating-point device developed by Texas Instruments (TI). Its high performance and advanced very long instruction word architecture make it an excellent choice for multi-channel and multi-function applications. With performance of up to 900 million floating-point operations per second at a clock rate of 150 MHz and a memory space of 4G, the C6711 device offers cost-effective solutions to the programming challenges of high-performance, complex algorithms. The eight functional units provide four floating-point arithmetic logic units and two floating-point multipliers. All of these features of the C6711 show that it is well suited for implementing the EMO algorithm to solve real-world MOPs in real time.
3 Evolutionary Multi-objective Optimization on DSP

3.1 Description Language
We chose the C programming language to program the DSP. C has a number of desirable characteristics: it is expressive, fast to debug, and easy to read, which may greatly shorten the development cycle. Also, one does not need to fully understand the architecture of the DSP, as the compiler does all the laborious work of instruction selection, parallelizing, pipelining, and register allocation [13]. This virtue also extends to utilizing the C6711 code generation tools to aid in optimization.
3.2 Chromosome Representation
In this design, we represent the chromosome with real values. The coding of the chromosome is perhaps the most important feature of an EA: it affects the crossover/mutation operators to a large degree, and a bad coding may prevent the Pareto-optimal set from converging. Whether to use fixed-point or floating-point values to represent the chromosome depends on the target problems. In the real world, MOPs are often modeled with real values, so it is natural to solve real-world MOPs by representing the chromosome with floating-point values. Generally, the speed of a floating-point DSP is lower than that of a fixed-point DSP. However, because it uses a floating-point data format, a floating-point DSP performs better than a fixed-point DSP when implementing high-precision, complex algorithms, and that provides assurance for achieving the real-time capability of the EMO algorithm.

3.3 Design Architecture
The overall architecture of the implementation of the EMO algorithm on DSP is shown in Fig. 1.

Fig. 1. Architecture of the EMO algorithm on DSP (modules: Random Number Generator, Selection, Parent_POP, Crossover/Mutation, Fitness Evaluation, Combination Generator, Non-dominance Filter, Crowding Distance, Archive, Reselection)

Based on the function of the design, we can construct seven modules: a random number generator module, a non-dominance filter module, a selection module, a crossover/mutation module, a fitness evaluation module, a combination generator module, and a crowding distance module. Among these, the random number generator and the non-dominance filter are the two major modules. The random number generator can generate a new floating-point or integer random number in a given cycle and store it into data RAM; it can then be used for selection or crossover/mutation in any cycle. The combination generator combines the parent and child populations into the mixed population so that they can be processed by the non-dominance filter. The non-dominated solutions are stored in the archive; they can then be reselected as a new population after the crowding distance operation. The random number generator, non-dominance filter,
crossover/mutation, and fitness evaluation are executed in parallel, which greatly speeds up evolutionary search. In the next subsection, we mainly present the random number generator and the non-dominance filter.

3.4 Algorithm Modules
In order to implement EMO algorithms on DSP, we construct seven modules, of which the random number generator and the non-dominance filter are the two key modules. The random number generator must be able to produce floating-point and integer random numbers with uniform distributions for use in initialization, selection, and crossover/mutation. In fact, floating-point random numbers are used for initialization and crossover/mutation, and integer random numbers for selection. A random number generator is usually specified by an equation of the form (1):

    s_n = f(s_{n−1})    (1)

where s_n is the state of the generator after generating n random numbers. The initial state s_0 is called the seed and is supplied by the user. In our design, Lagged Fibonacci Generators (LFGs) [14] are used, since they offer a simple method for obtaining very long periods, and they can be very fast. The sequence is defined by equation (2):

    x_n = x_{n−p} ⊗ x_{n−p+q} mod m    (2)

where p and q are the lags, p > q, ⊗ denotes the operation, which can be any of +, −, ∗ or ⊕, and m is usually a power of 2 (m = 2^M). With a proper choice of p and q, the period of this generator is (2^p − 1)2^{M−1}. In this paper, we study the parallelization of LFGs with ⊗ = −, p = 17, q = 12, and M = 31. Thus, the period is about 2^47, which is enough for this design. Equation (2) can then be written as equation (3):

    x_n = (x_{n−17} − x_{n−5}) mod 2^31    (3)

It is known that a combination of independent, uniformly distributed random sequences has a more uniform distribution than either of its component sequences, and the terms of the resulting sequence tend to be more statistically independent. This observation led to the design of combination generators, which are formed by combining two or more generators. An example of such a generator is given in equation (4):

    z_i = (x_i ± y_i) mod m    (4)

where x_i and y_i are the i-th random numbers produced by two random number generators, preferably of different types. The parallel computation of LFGs can be potentially useful in designing parallel combination generators. The other major module of the EMO algorithm implementation on DSP is the non-dominance filter. In the worst case, the complexity of non-dominated sorting is O(M(2N)^2), where M is the number of objectives and N is the population size. In the process of non-dominated sorting, each individual must be compared to
every other individual. So, it needs to operate on chromosomes time after time, which takes much memory and greatly degrades the performance of the design. To reduce the space taken by non-dominated sorting, we make an index for each individual. Then, we just operate on the indices in the non-dominance filter, which yields a reasonable speedup. To get more speedup, the non-dominance filter compares each individual to every other individual in parallel. Meanwhile, since only N individuals are chosen when the algorithm applies the reselection operator, the non-dominance filter needs to sort the individuals of the set Fk+1, under the assumption that the number of individuals from the sets F1 to Fk is less than N, while that from F1 to Fk+1 is not less than N. To choose exactly N population members, we sort the solutions of the last front using the results computed in the crowding distance module. The individuals selected from the parent population are processed in the crossover/mutation module. Also, we adopt parallelism in these modules to facilitate multiplication.
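A minimal sketch of the lagged Fibonacci generator of equation (3) follows (p = 17 and q = 12, so the second lag is p − q = 5). The seeding scheme — a small LCG warm-up of the 17-word lag buffer — is our own assumption; the paper only says that the seed is user-supplied.

```python
class LaggedFibonacci:
    """x_n = (x_{n-17} - x_{n-5}) mod 2^31, as in equation (3)."""
    M = 1 << 31

    def __init__(self, seed=1):
        # fill the 17-word lag buffer with a simple LCG warm-up (hypothetical)
        state, buf = seed, []
        for _ in range(17):
            state = (1103515245 * state + 12345) % self.M
            buf.append(state)
        self.buf, self.i = buf, 0

    def next_int(self):
        # buf is circular: buf[i] holds x_{n-17}, buf[(i+12) % 17] holds x_{n-5}
        x = (self.buf[self.i] - self.buf[(self.i + 12) % 17]) % self.M
        self.buf[self.i] = x         # overwrite the oldest word with x_n
        self.i = (self.i + 1) % 17
        return x

    def next_float(self):
        # uniform float in [0, 1), for initialization and crossover/mutation
        return self.next_int() / self.M
```

Because each output depends only on two lag-buffer reads and one write, several such generators can run side by side and be combined as in equation (4).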
4 Experimental Results
In this part, we first present the test problem. Then, we describe the experimental environment. Finally, in order to validate the performance of the adaptive hardware platform, we present the results of the EMO algorithm implemented in MATLAB under an equivalent condition, as well as the results obtained on the same DSP without adopting parallelism, and compare the results.

4.1 Test Problem
In this design, our motivation is not to improve on the EMO algorithm, but to present an adaptive hardware platform on which to implement it. To sufficiently validate the performance of the adaptive hardware platform, we select the sixth test problem of Zitzler et al. [15] and call it ZDT6. The test problem includes two difficulties caused by the non-uniformity of the search space: first, the Pareto-optimal solutions are non-uniformly distributed along the global Pareto front; second, the density of the solutions is lowest near the Pareto-optimal front and highest away from the front. The problem definition is given in (5):

    f1(x) = 1 − exp(−4x_1) sin^6(6πx_1)
    f2(x) = g(x) [1 − (f1(x)/g(x))^2]
    g(x) = 1 + 9 ( Σ_{i=2}^{6} x_i / 4 )^{0.25}
    0 ≤ x_i ≤ 1, i = 1, 2, · · · , 6.    (5)
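The evaluation of (5) is straightforward; the sketch below follows the formula as printed (in particular the denominator 4 in g — note that the canonical ZDT6 of Zitzler et al. divides the sum by the number of summed variables instead):

```python
import math

def zdt6(x):
    """ZDT6 objectives for a 6-variable decision vector x, per equation (5)."""
    f1 = 1.0 - math.exp(-4.0 * x[0]) * math.sin(6.0 * math.pi * x[0]) ** 6
    g = 1.0 + 9.0 * (sum(x[1:6]) / 4.0) ** 0.25
    f2 = g * (1.0 - (f1 / g) ** 2)
    return f1, f2
```

On the Pareto-optimal front x_2 = · · · = x_6 = 0, so g = 1 and f2 = 1 − f1², regardless of the denominator in g.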
4.2 Experimental Environment
We utilize the DSK6711 as the experimental environment for the implementation of the EMO algorithm. The DSK6711, developed by TI, is a hardware platform for real-time debugging. It uses the floating-point device TMS320C6711 as the core processor, with a frequency of 150 MHz. In order to benchmark the adaptive hardware platform, we also present the results of the EMO algorithm implemented in MATLAB under an equivalent condition. MATLAB runs on a computer with an AMD 3000+ CPU of 1810 MHz clock frequency and 1G of memory. We also present the implementation of the EMO algorithm on the same DSK6711 without adopting parallelism to validate the performance of the adaptive hardware platform.

4.3 Experimental Results
On DSK6711, we use the simulated binary crossover (SBX) operator and polynomial mutation [16] for real-coded EMO algorithm. The crossover probability of pc = 0.9 and a mutation probability of pm = 0.2 are used. For real-coded EMO algorithm, we use distribution index [10] for crossover mutation operators as ηc = 20 and ηm = 20 respectively. Then we make the population of size 100 evolving, with the number of generations is 100. In our experiment, the results are real value with the number of 100*10, so it is not easy to represent the performance of our design. For solving this problem, we load the object results into MATLAB, and use MATLAB to plot the Pareto-optimal front of the ZDT6. We have done the experiments on ZDT6 ten times, and got almost the same results. We plot one of the results, which are shown in Fig.2. Our design only takes a relatively small percentage of the memory in the C6711 device. The C6711 device has a memory of 4G, and our design only uses 1.8 DSP 1.6
Fig. 2. The DSP found nearly the same convergence and spread of solutions as the software on ZDT6
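The SBX and polynomial-mutation operators used in the runs above can be sketched per variable as follows (a minimal sketch following Deb and Agrawal [16]; the variable bounds and per-variable application policy are assumptions, not the authors' exact implementation):

```python
import random

def sbx_pair(p1, p2, eta_c=20.0):
    """Simulated binary crossover (SBX) on one real-valued variable pair."""
    u = random.random()
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta_c + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta_c + 1.0))
    c1 = 0.5 * ((1 + beta) * p1 + (1 - beta) * p2)
    c2 = 0.5 * ((1 - beta) * p1 + (1 + beta) * p2)
    return c1, c2      # children are symmetric about the parents' midpoint

def poly_mutate(x, low=0.0, high=1.0, eta_m=20.0):
    """Polynomial mutation of one real variable, clamped to [low, high]."""
    u = random.random()
    if u < 0.5:
        delta = (2.0 * u) ** (1.0 / (eta_m + 1.0)) - 1.0
    else:
        delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta_m + 1.0))
    return min(high, max(low, x + delta * (high - low)))
```

A useful invariant of SBX is that the two children preserve the parents' sum, which is what makes the operator a contracting/expanding spread around the parents.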
A Sophisticated Architecture for EMO Utilizing High Performance DSP
only 31,156 bits, i.e., less than one hundred-thousandth of it. This suggests that a much more complex fitness function could be accommodated when solving MOPs, which opens a new way of tackling real-world MOPs in real time.

4.4 Performance Comparison
To benchmark the performance of our design, we also implemented the test problem in MATLAB under equivalent conditions; the Pareto front obtained with MATLAB is shown in Fig.2. Comparing the run-time of the EMO algorithm on the DSP against the software implementation, we obtain a speedup of nearly 100 times. We also implemented the EMO algorithm on the same DSK6711 without parallelism to validate the performance of the adaptive hardware platform. We ran each of the three implementations ten times: software, DSP without parallelism (DSP1), and our design on DSP (DSP2). Table 1 summarizes the differences among the three implementations, reporting the mean over the ten runs on ZDT6.

Table 1. Comparison between software, DSP without adopting parallelism (DSP1), and our design on DSP (DSP2)
                          Software      DSP1         DSP2
Population size           100           100          100
Quantization bits (bit)   N/A           32           32
Evolution generation      100           100          100
Clock frequency (MHz)     1,810         150          150
Cycles                    N/A           5.8789e+8    4.9465e+7
Run-time (sec)            32.192983     3.919282     0.329767
Speedup                   1             8.214        97.62
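The run-time and speedup figures in Table 1 can be cross-checked from the cycle counts and clock rates (a consistency check only; all constants below are taken from the table):

```python
# Consistency check of Table 1: run-time = cycles / clock, speedup = t_sw / t_dsp.
CLOCK_DSP_HZ = 150e6                  # DSK6711 clock frequency
t_sw = 32.192983                      # MATLAB run-time on the 1810 MHz host (s)

t_dsp1 = 5.8789e8 / CLOCK_DSP_HZ      # DSP without parallelism (DSP1)
t_dsp2 = 4.9465e7 / CLOCK_DSP_HZ      # the parallel design (DSP2)

speedup1 = t_sw / t_dsp1              # ≈ 8.21
speedup2 = t_sw / t_dsp2              # ≈ 97.6
```

The recomputed run-times agree with the table to within rounding of the cycle counts.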
Our motivation in this design is not to improve the EMO algorithm itself, but to present an adaptive hardware platform on which to implement it. As Fig.2 shows, our platform reproduces the performance of the EMO algorithm well: experimental results show that it achieves nearly the same convergence and spread of solutions on ZDT6, and ten repeated experiments gave almost the same results. We also implemented the EMO algorithm on the same DSP without parallelism to isolate the contribution of the adaptive hardware platform. As Table 1 shows, the DSP without parallelism achieves a speedup of a bit over 8, while our design achieves a speedup of nearly 100 times even though the CPU host frequency was 1810 MHz and the hardware clock frequency only 150 MHz; this demonstrates that our platform exploits parallelism very well. It also suggests that a higher-frequency DSP would yield an even better speedup, bringing real-time solution of real-world MOPs closer.
5 Conclusions
The focus of our research is directed toward using a high-performance DSP to speed up evolutionary search. To deal with real-world MOPs through EMO algorithms in the spirit of evolvable hardware, we proposed an adaptive hardware platform that implements evolutionary multi-objective optimization on a high-performance DSP device. In the design, we addressed the speedup of evolutionary search mainly by implementing the EMO algorithm with parallelism in hardware. Experimental results showed a speedup of nearly 100 times even though the CPU host frequency was 1810 MHz and the hardware clock frequency only 150 MHz, which offers a route to solving real-world MOPs in real time. In constructing the sophisticated architecture for EMO algorithms, the random number generator and the non-dominance filter are the two key modules for parallelization. By combining two or more Lagged Fibonacci Generators, we obtained a parallel random number generator that can supply random numbers for selection and crossover/mutation in any cycle. In the non-dominance filter, we gained a large speedup of evolutionary search by indexing each individual and parallelizing the comparisons. The initial experiments indicate that the adaptive hardware platform with the sophisticated architecture works very well.
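The combined lagged-Fibonacci scheme mentioned above can be sketched in software (the lags, the seeding routine, and the XOR combination are illustrative assumptions; the paper does not specify them):

```python
from collections import deque

class LaggedFib:
    """Additive lagged Fibonacci generator: x_n = (x_{n-r} + x_{n-s}) mod 2^32.
    The lags (r, s) = (17, 5) are illustrative, not the paper's choice."""
    def __init__(self, seed, r=17, s=5):
        self.r, self.s = r, s
        state, x = [], seed & 0xFFFFFFFF
        for _ in range(r):                    # crude LCG-based seeding of the lag buffer
            x = (1664525 * x + 1013904223) & 0xFFFFFFFF
            state.append(x)
        self.state = deque(state, maxlen=r)

    def next(self):
        v = (self.state[-self.r] + self.state[-self.s]) & 0xFFFFFFFF
        self.state.append(v)                  # maxlen drops the oldest lag automatically
        return v

def combined(g1, g2):
    """Combine two generators (here by XOR), in the spirit of the paper's
    parallel RNG built from two or more LFGs."""
    return g1.next() ^ g2.next()
```

Because each LFG needs only two table look-ups and one addition per output, the structure maps naturally onto hardware that must deliver a fresh random word every cycle.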
References

1. Garis, D.H.: CAM-BRAIN: Growing an Artificial Brain with a Million Neural Net Modules Inside a Trillion Cell Cellular Automata Machine. Journal of the Society of Instrument and Control Engineers 33(2) (1994)
2. Zebulum, R.S., Pacheco, M.A., Vellasco, M.: Evolvable Systems in Hardware Design: Taxonomy, Survey and Applications. In: Proceedings of the First International Conference on Evolvable Systems: From Biology to Hardware, pp. 344–358 (1996)
3. Tufte, G., Haddow, P.C.: Prototyping a GA Pipeline for Complete Hardware Evolution. In: Proceedings of the First NASA/DoD Workshop on Evolvable Hardware, pp. 18–25 (1999)
4. Hemmi, H., Shimohara, K.: Development and Evolution of Hardware Behaviors. In: Sanchez, E., Tomassini, M. (eds.) Towards Evolvable Hardware. LNCS, vol. 1062, pp. 250–265. Springer, Heidelberg (1996)
5. Gwaltney, D.A., Ferguson, M.I.: Intrinsic Hardware Evolution for the Design and Reconfiguration of Analog Speed Controllers for a DC Motor. In: Proceedings of the 2003 NASA/DoD Conference on Evolvable Hardware, pp. 81–90 (2003)
6. Stoica, A., Keymeulen, D., Zebulum, R., Thakoor, A., Daud, T., Klimeck, G., Jin, Y., Tawel, R., Duong, V.: Evolution of Analog Circuits on Field Programmable Transistor Arrays. In: Proceedings of the Second NASA/DoD Workshop on Evolvable Hardware, pp. 99–108 (2000)
7. Jo, G.D., Sheen, M.J., Lee, S.H., Cho, K.R.: A DSP-Based Reconfigurable SDR Platform for 3G Systems. IEICE Transactions on Communications 88(2), 678–686 (2005)
8. Murakawa, M., Yoshizawa, S., Kajitani, I., Yao, X., Kajihara, N., Iwata, M., Higuchi, T.: The GRD Chip: Genetic Reconfiguration of DSPs for Neural Network Processing. IEEE Transactions on Computers 48(6), 628–639 (1999)
9. Ferguson, M.I., Stoica, A., Keymeulen, D., Zebulum, R., Duong, V.: An Evolvable Hardware Platform Based on DSP and FPTA. In: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 145–152 (2002)
10. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2), 182–197 (2002)
11. Texas Instruments: TMS320C6000 CPU and Instruction Set Reference Guide. Literature Number: SPRU189F (2000)
12. Texas Instruments: TMS320C621x/C671x DSP Two-Level Internal Memory Reference Guide. Literature Number: SPRU609A (2003)
13. Texas Instruments: TMS320C6000 Programmer's Guide. Literature Number: SPRU198G (2002)
14. Coddington, P.D.: Random Number Generators for Parallel Computers. NHSE Review 1(2) (1997)
15. Zitzler, E., Deb, K., Thiele, L.: Comparison of Multiobjective Evolutionary Algorithms: Empirical Results. Evolutionary Computation 8(2), 173–195 (2000)
16. Deb, K., Agrawal, R.B.: Simulated Binary Crossover for Continuous Search Space. Complex Systems 9(2), 115–148 (1995)
FPGA-Based Genetic Algorithm Kernel Design

Xunying Zhang, Chen Shi, and Fei Hui

Xi'an Institute of Microelectronics Technology, 710054, Xi'an, Shaanxi, China
[email protected]
Abstract. Research in Evolutionary Computation has shifted some of its focus to applications in Electrical Engineering problems, leading to the field of study called Evolvable Hardware (EHW). The final goal is the creation of complete evolvable hardware systems that can adapt to changing environments and increase system performance during operation. Such a system has three main components: the Genetic Algorithm, response evaluation, and configurable hardware. Though the interpretation of the binary chromosome varies from one optimization problem to another, the manipulation of the chromosomes using reproduction operators such as crossover and mutation stays consistent. In this paper, we design a hardware-based architecture, called the FPGA-based Genetic Algorithm Kernel, to perform the Genetic Algorithm in this system. The modular architecture of the Genetic Algorithm ensures ease of modification and suitability for different applications. Keywords: EHW, Genetic Algorithm Kernel, Fitness Calculator, FPGA.
1 Introduction

Research in Evolutionary Computation has shifted some of its focus to applications in Electrical Engineering problems. Scientists and engineers have begun to use Genetic Algorithms to aid in the design and optimization of electrical circuits, leading to the field of study called Evolvable Hardware (EHW). In the most abstract sense, EHW refers to the use of Evolutionary/Genetic Algorithms to aid in some aspect of the hardware design process. The final goal is the creation of complete evolvable hardware systems that can adapt to changing environments and increase system performance during operation [1]. Stoica and Thompson [1][2] defined two strategies: extrinsic and intrinsic evolution. In extrinsic evolution, each chromosome is evaluated on a computer using a circuit simulator such as SPICE or SMASH, and the solution may eventually be downloaded into some kind of configurable device. Intrinsic evolution involves converting chromosomes into a configuration bit string that is downloaded into a programmable device; circuit responses are compared against target responses in order to assess and rank each individual. Fig.1 [2] illustrates this concept. Fig.2 shows the integration of an intrinsic evolutionary environment into a system-on-a-chip proposed by Stoica. In this system, the evolutionary unit and the application portion are both implemented inside the FPGA. In order to reconfigure the

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 426–432, 2007. © Springer-Verlag Berlin Heidelberg 2007
Fig. 1. Evolution of electronic circuits

Fig. 2. Evolution of a SoC
application portion, the chromosomes used by the evolutionary unit map directly to a configuration. The Evolutionary Algorithm collects the system responses and characterizes them as fitness values. As Fig.2 shows, there are three main components in this system: the Genetic Algorithm, response evaluation, and configurable hardware. Though the interpretation of the binary chromosome varies from one optimization problem to another, the manipulation of the chromosomes using reproduction operators such as crossover and mutation stays consistent. The Genetic Algorithm performs initialization, selection, reproduction, and population replacement, and checks the termination criteria. The response evaluation component applies each chromosome to the configurable application and compares the application responses to target responses in order to characterize the individual's fitness value. The third component is the application itself.
In this paper, we design a hardware-based architecture, called the FPGA-based Genetic Algorithm Kernel, to perform the Genetic Algorithm in this system. The modular architecture of the Genetic Algorithm ensures ease of modification and suitability for different applications.
2 System Overview

Fig.3 illustrates a practical hardware-based architecture that performs the complete evolutionary system. When it is time to start the evolution, the Genetic Algorithm Kernel accepts the operation parameters, such as the probabilities for crossover and mutation, the maximum number of generations, and the seed value for the random number generator. Once the input data are ready, the Genetic Algorithm Kernel executes the Genetic Algorithm. When it is time to evaluate the population, it sends each chromosome to the Fitness Calculator in order to assess the individual's fitness. The Fitness Calculator receives each chromosome and applies it to configure the Configurable Hardware. It collects the application response for each chromosome, compares this response with the target response, and characterizes the individual's fitness. The Genetic Algorithm Kernel receives the fitness sent by the Fitness Calculator and continues the current evolution process. Since the Fitness Calculator is application-dependent, this paper concentrates on the design of a hardware-based Genetic Algorithm for synthesis on FPGAs. For any optimization application that can be realized in hardware, the Fitness Calculator only needs to meet the Genetic Algorithm Kernel's interface; the two can then work together to carry out the complete evolution.
Fig. 3. Genetic Algorithm Kernel architecture
3 System Design

The Genetic Algorithm Kernel is divided into eight main functional components and four memory components, as Fig.4 illustrates. For hardware simplicity, the Genetic Algorithm Kernel implements a modified Genetic Algorithm.

3.1 Functional Components

There are eight functional components in the Genetic Algorithm Kernel: Centric Controller, Random Number Generator, Initialization, Parent Selector, Genetic
Operation, Individual Selector, Bus Controller and Population Replacement. Following the flow of the Genetic Algorithm, each functional component is described as follows:

Centric Controller: The Centric Controller accepts the operation parameters. It sends the probabilities for crossover and mutation to the Genetic Operation component and the seed value to the Random Number Generator, and uses the maximum number of generations as the termination criterion. As arbiter and synchronizer, the Centric Controller activates the other functional components, controls memory access for each activated component, and checks the maximum number of generations. When it receives a start signal, it activates the Initialization component and sets the appropriate control signals of the Bus Controller so that the Initialization component can access the Population Memory and Fitness Memory. When the Initialization component sends a completion signal back, the Controller deactivates its memory access, sends a startup signal to the Parent Selector, and activates memory access for the Parent Selector. This process repeats for the Genetic Operation, Individual Selector and Population Replacement components. The Controller then checks whether the maximum number of generations has been reached. If the termination criterion is not met, it activates the Parent Selector and repeats the process; if it is met, the Controller deactivates all components and their memory access and sends a done signal indicating that the optimization has completed.

Random Number Generator: The Random Number Generator accepts its seed value and generates random bit vectors ranging from 2 to 32 bits. Other components communicate with the Random Number Generator and obtain bit vectors of the length they need.
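The Random Number Generator's behaviour — producing bit vectors of any requested width from 2 to 32 bits — can be sketched with a maximal-length LFSR (a common hardware choice; the kernel's actual generator internals are not specified in the paper):

```python
class LFSR32:
    """32-bit Galois LFSR; taps 32,22,2,1 (mask 0x80200003) give a maximal-length
    sequence. Illustrative only -- the kernel's real generator is unspecified."""
    def __init__(self, seed=0xACE1):
        self.state = (seed & 0xFFFFFFFF) or 1   # a zero state would lock the LFSR up

    def step(self):
        """Advance one clock and return the output bit."""
        lsb = self.state & 1
        self.state >>= 1
        if lsb:
            self.state ^= 0x80200003
        return lsb

    def bits(self, width):
        """Return a random bit vector of the requested width (2..32 bits)."""
        assert 2 <= width <= 32
        v = 0
        for _ in range(width):
            v = (v << 1) | self.step()
        return v
```

In hardware the same register would simply be tapped at the requested width each cycle, so any component can draw vectors of the length it needs.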
The Initialization component obtains these bit vectors from the Random Number Generator and uses them as the initial individuals. When the Parent Selector component is activated, it requests two random bit vectors from the Random Number Generator; these serve as the addresses of the two parents in the Population Memory and the Fitness Memory. When the Genetic Operation component is activated, it obtains a bit vector and takes part of it as the crossover point; it also obtains bit vectors serving as random crossover and mutation probabilities.

Initialization: When the Initialization component is activated, it receives a random bit vector of chromosome length from the Random Number Generator and simply sends it to the Fitness Calculator. When the fitness value of this individual is returned, the Initialization component writes the chromosome to the Population Memory and the fitness value to the Fitness Memory, repeating until the maximum address in the Population Memory is reached. In this process, the address of each chromosome in the Population Memory is associated with the address of its fitness in the Fitness Memory.

Parent Selector: The Parent Selector implements a stochastic selection. When activated, it receives two random addresses from the Random Number Generator and reads the corresponding entries in the Population Memory and Fitness Memory. It then sends these two chromosomes and their associated fitness values to the Genetic Operation.

Genetic Operation: The Genetic Operation component executes the reproduction, crossover and mutation genetic operations.
When the Genetic Operation is activated, it receives a random bit vector from the Random Number Generator and produces the crossover and mutation probabilities by decoding the vector. If the crossover probability is less than the operation threshold PC, the Genetic Operation receives the two parents and their associated fitness values from the Parent Selector, obtains a random crossover point from the Random Number Generator, and performs one-point crossover to produce two offspring. Otherwise, the component copies the two parents as offspring. If the mutation probability is less than the operation threshold PM, one bit of each offspring is mutated; if it is greater, mutation is not performed. The component then sends the two offspring, the two parents, and their associated fitness values to the Individual Selector component.

Individual Selector: The Individual Selector component sends the offspring to the external Fitness Calculator and receives their fitness values. It then compares the fitness values of the parents and offspring, selects the individuals with the greatest and second-greatest fitness values, and sends them to the Inter_Population Memory and Inter_Fitness Memory, respectively. If the maximum addresses of the Inter_Population Memory and Inter_Fitness Memory have not been reached, this process repeats for the Parent Selector, the Genetic Operation and the Individual Selector.
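The Genetic Operation and Individual Selector steps just described can be sketched on integer-coded chromosomes (the probabilities, bit width, and helper names are illustrative, not the kernel's exact datapath):

```python
import random

def genetic_operation(p1, p2, pc=0.9, pm=0.05, nbits=16):
    """One-point crossover and single-bit mutation on nbits-bit integer-coded
    chromosomes, mirroring the component's description."""
    mask = (1 << nbits) - 1
    if random.random() < pc:
        pt = random.randrange(1, nbits)       # random crossover point
        hi, lo = ~0 << pt, (1 << pt) - 1
        c1 = (p1 & hi) | (p2 & lo)
        c2 = (p2 & hi) | (p1 & lo)
    else:
        c1, c2 = p1, p2                       # copy parents as offspring
    if random.random() < pm:                  # mutate one bit of both offspring
        b = 1 << random.randrange(nbits)
        c1, c2 = c1 ^ b, c2 ^ b
    return c1 & mask, c2 & mask

def select_best_two(pairs):
    """Individual Selector: keep the two highest-fitness of parents + offspring.
    `pairs` is a list of (chromosome, fitness) tuples."""
    return sorted(pairs, key=lambda cf: cf[1], reverse=True)[:2]
```

Keeping the best two of the four candidates is what makes each generation's intermediate population elitist with respect to the pairs it processes.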
Fig. 4. FPGA-based Genetic Algorithm Kernel architecture
Population Replacement: The Population Replacement component uses the data in the Inter_Population Memory and Inter_Fitness Memory to replace the data in the Population Memory and Fitness Memory. When the maximum addresses of the Inter_Population Memory and Inter_Fitness Memory are reached, the Population Replacement component performs these replacements and sends a signal back to the Centric Controller to indicate that one generation of evolution has completed.

Bus Controller: The Centric Controller sends control signals to the Bus Controller to grant Population Memory and Fitness Memory access to the activated functional component.

3.2 Memory Components

There are four memory components in the Genetic Algorithm Kernel. The Population Memory component stores the individual chromosomes, and the Fitness Memory component stores each chromosome's fitness value. We use two separate RAM blocks to implement these two components and associate the memory address of each chromosome in the Population Memory with the location of its fitness value in the Fitness Memory. The Inter_Population Memory component stores the new individuals produced during a generation, and the Inter_Fitness Memory component stores the fitness values associated with them. When these two memory components are full, the Population Replacement component reads each individual from the Inter_Population Memory and its fitness value from the Inter_Fitness Memory and writes them to the Population Memory and Fitness Memory, respectively; this repeats until the two memory components are empty. These two components are implemented as two FIFOs. When the Individual Selector completes its process, it sends each selected individual and its fitness value to the Inter_Population Memory and the Inter_Fitness Memory, respectively.
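The FIFO-based hand-off between the intermediate memories and the main population/fitness memories can be sketched behaviourally (memory sizes, names, and contents below are illustrative only):

```python
from collections import deque

def population_replacement(pop_mem, fit_mem, inter_pop, inter_fit):
    """Drain the two intermediate FIFOs into the population and fitness
    memories, as the Population Replacement component does once the FIFOs
    are full. Chromosome and fitness stay associated by position."""
    addr = 0
    while inter_pop:                       # repeat until the FIFOs are empty
        pop_mem[addr] = inter_pop.popleft()
        fit_mem[addr] = inter_fit.popleft()
        addr += 1

# Hypothetical 4-entry memories to exercise the hand-off.
pop, fit = [0] * 4, [0.0] * 4
ipop = deque([0xA, 0xB, 0xC, 0xD])
ifit = deque([0.1, 0.2, 0.3, 0.4])
population_replacement(pop, fit, ipop, ifit)
```

Double-buffering the generation this way lets the selector keep writing new individuals while the previous population is still being read.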
4 Synthesis for FPGA

The main purpose behind this design is to create a Genetic Algorithm for hardware synthesis that can be ported to any kind of FPGA. All components in the Genetic Algorithm Kernel are written as state machines in Verilog HDL. A Genetic Algorithm Kernel with a chromosome length of 16 bits, a fitness length of 16 bits, and a population size of 128 was synthesized on a Xilinx Spartan2E XC2S300EFG456. The entire system uses 714 slices, and the maximum clock frequency is 104.2 MHz at chip speed grade -6.
5 Conclusion and Future Works

We take several different three-dimensional polynomial functions as the external Fitness Calculator and use the Genetic Algorithm Kernel to find either the minimum
or maximum point in each of these functions' search space. Most of these experiments show the proper functionality of the system. In the future, we will analyze the system carefully and modify the architecture with pipelining in order to improve performance. On the other hand, we will study the use of this system in aerospace electronics. Aiming at a satellite application, we will design the complete Fitness Calculator and configurable hardware, and then integrate the Genetic Algorithm Kernel with the Fitness Calculator and the application to realize an online adaptive hardware system.
References

1. Stoica, A.: Evolvable Hardware: From On-Chip Circuit Synthesis to Evolvable Space Systems. In: Proceedings of the 30th IEEE Symposium on Multi-Valued Logic, May 23-25, 2000, IEEE Press, Portland, USA (2000)
2. Thompson, A., Layzell, P., Zebulum, R.: Explorations in Design Space: Unconventional Electronics Design Through Artificial Evolution. IEEE Transactions on Evolutionary Computation 3(3), 167–196 (1999)
3. Azzini, A., Betoni, M., Liberali, V., Rossi, R., Tetamanzi, A.: Evolutionary Design and FPGA Implementation of Digital Filters. In: VLSI Circuits and Systems – Proc. SPIE, May 2003, Maspalomas, Spain, vol. 5117, pp. 485–496 (2003)
4. Garis, H.: Evolvable Hardware: The Genetic Programming of Drawing Machines. In: Proc. of Int. Conf. on Artificial Neural Nets and Genetic Algorithms, pp. 441–449 (1993)
5. Higuchi, T.: Evolvable Hardware at Function Level. In: Proc. 1997 IEEE Int. Conf. Evolutionary Computation (ICES'97), pp. 187–192 (1997)
6. Torresen, J.: A Divide-and-Conquer Approach to Evolvable Hardware. In: Sipper, M., Mange, D., Pérez-Uribe, A. (eds.) ICES 1998. LNCS, vol. 1478, pp. 57–65. Springer, Heidelberg (1998)
7. Miller, J.F., Thompson, A., Thompson, P., Fogarty, T.C. (eds.): ICES 2000, Evolvable Systems: From Biology to Hardware. LNCS, vol. 1801. Springer, Heidelberg (2000)
8. Torresen, J.: A Scalable Approach to Evolvable Hardware. In: Banzhaf, W. (ed.) Genetic Programming and Evolvable Machines, vol. 3(3), pp. 259–283. Kluwer Academic Publishers, Dordrecht, The Netherlands (2002)
9. IEEE Standard Hardware Description Language Based on the Verilog Hardware Description Language, IEEE Computer Society, IEEE Std 1364-1995, p. 47, section 5.4.1 Determinism
10. Cummings, C.E.: Coding and Scripting Techniques for FSM Designs with Synthesis-Optimized, Glitch-Free Outputs. In: SNUG'2000 Boston (Synopsys Users Group Boston, MA, 2000) Proceedings (September 2000)
11. FAQs for Spartan-II. http://www.xilinx.com/company/press/kits/spartan2/faq_sp2.htm
Using Systolic Technique to Accelerate an EHW Engine for Lossless Image Compression

Yunbi Chen and Jingsong He

Department of Electronic Science and Technology, Nature Inspired Computation and Applications Laboratory, University of Science and Technology of China, Hefei, 230026, China

[email protected], [email protected]
Abstract. Combining intelligent technology with hardware technology to study real-world applications is one of the most important methodologies in the field of EHW. This paper designs a novel evolvable hardware engine for predictive lossless image compression from a hardware perspective, and implements the whole engine on reconfigurable hardware for the first time. Thanks to the high-speed pipeline architecture, all the modules of this engine can process data in parallel. For the most time-consuming unit, the fitness evaluation, a systolic array is employed, which substantially accelerates the evaluation. Experimental results show that the proposed evolvable hardware engine reduces the computing time remarkably (a speedup ratio of approximately 500) and fully utilizes the hardware resources. The systolic technique adopted here also promises to scale up to larger images while power consumption grows comparatively slowly. Keywords: Evolvable Hardware, Fitness Evaluation, Systolic Array, Image Compression.
1 Introduction

Evolvable hardware (EHW) has attracted increasing attention since the early 1990s with the advent of easily reconfigurable hardware such as field programmable gate arrays (FPGAs). It promises a novel approach to adaptive hardware construction: according to a changing environment, adaptive hardware can change its architecture and behavior dynamically [1,2]. Most studies on applications of EHW indicate that seeking valuable applications, and discovering new problems and their corresponding solutions, may advance EHW's development effectively [3,4,5]. Indeed, studying real-world applications by combining intelligent technology with hardware technology is one of the most important methodologies in the field of EHW. For the design of adaptive hardware, fully utilizing sophisticated hardware technology helps in finding a better solution as soon as possible. Among the various applications of EHW, adaptive lossless image compression is a typical one. T. Higuchi et al. [6,7] first presented an

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 433–444, 2007. © Springer-Verlag Berlin Heidelberg 2007
evolvable chip, implemented on a special functional FPGA (F²PGA) with a special variable-length genetic algorithm, for compressing images. A. Fukunaga et al. [8,9] presented a prototype system based on genetic programming to address the difficulty of such an implementation on a conventional FPGA. Both models achieve good compression ratios, but the problems behind these extrinsic EHW models are serious: they need to be executed on a host computer, and the compilation time is long [7,9]. In this sense, an intrinsic EHW model may be more suitable for real-time applications [1], especially for the task of predictive lossless image compression. J. He et al. [10] proposed a novel intrinsic EHW model (BinEP) for this task, which lends itself to hardware realization. However, to process data at high speed in practice, the method must be adapted for a complete hardware implementation. Unfortunately, implementation in hardware is more complicated than in software: in most cases, an algorithm that executes effectively in software is difficult to implement directly in hardware, so the algorithm has to be adjusted to fit a pipelined hardware structure [11,12,13]. For most intelligent hardware engines, the complicated fitness evaluation, as the most time-consuming part, has become the main bottleneck; the best solution is to speed up the fitness evaluation and enhance its scalability [14,15]. With scalability in mind, the fitness evaluation unit should be easy to scale up in parallel with little increase in system resources and power consumption. Thus, the design of a fast, reliable, easily scalable fitness evaluation is the most important step in an intelligent hardware engine. In our work, a systolic array with highly regular modules is adopted for the fitness evaluation design, and adaptive lossless image compression based on an intrinsic EHW model is realized completely in hardware for the first time.
Built from identical cells operating in pipeline, the systolic array is highly regular and modular; it not only utilizes the hardware resources fully but also reduces the run time remarkably. Experimental results show that the hardware acceleration ratio approximates 500 and that the fitness evaluation unit can be efficiently scaled up in parallel. The paper is structured as follows. Section 2 gives a brief presentation of the Extended BinEP (EBinEP) algorithm and a detailed analysis of the fitness evaluation. Section 3 designs a framework wherein fast evolutionary operations are interfaced with fast, reliable fitness evaluations accelerated by the systolic technique; the parallel extended structure of the fitness evaluation is also described. Section 4 presents experimental results on the evolvable hardware (the EHW engine), with a detailed performance analysis of the application. Section 5 concludes the paper with directions for future work.
2 Extended BinEP and Fitness Evaluation Analysis

2.1 Extended BinEP
In the extrinsic EHW mode, a circuit coded into individuals is evolved continuously until it reaches the specified functions. The advantage of the extrinsic mode is that many commercial compilers and circuit resources from
conventional design can be employed directly. Since the extrinsic mode needs an extra device to host the compiler software and consumes much compiling and download time, it may be detrimental to real-time adaptive applications. For the hardware implementation of adaptive lossless image compression, if the control parameters can be described as a set of switches, evolving the switch states can easily be achieved inside a chip through the intrinsic mode. Based on [10], we propose the EBinEP algorithm, which is more suitable for hardware implementation. EBinEP simplifies the evolutionary model, and software simulation results indicate that satisfactory parameters can be obtained with only a few individuals and generations.
Fig. 1. EBinEP flowchart
Fig. 2. Template used by EBinEP
The EBinEP flowchart is shown in Fig.1, wherein the components surrounded by the dashed line are completely implemented in hardware. First, the image data flow into the intelligence engine for feature analysis (path 1), and the optimal predictive parameters are obtained through learning and evolution. Second, both the original image data and the optimal parameters are used to compute the predictive errors (path 2). Finally, the required results are obtained by encoding the predictive errors (path 3). Performance evaluation shows that lossless image compression based on EBinEP achieves better performance than traditional compression methods such as Huffman coding.

2.2 Fitness Evaluation Analysis
Many studies show that lossless image compression predicted by an exponential interpolating function can achieve satisfactory performance. As shown
in Fig.2, pixel x can be predicted by its four neighboring pixels {x1, x2, x3, x4} as

x̂ = Σ_{i=1}^{4} xi · e^(−ai(bi+1)),  (1)
where ai ∈ [0, 1] and bi ∈ [0, 3]. Different images have different predictive parameters ai and bi. Through learning and analyzing the image features, the evolutionary algorithm can obtain the optimal compression parameters within the prescribed time. A satisfactory compression ratio can be achieved by the exponential interpolating function, but the computational complexity of the fitness evaluation increases. For an m × n image, let xij denote the original pixel located at (i, j), and let x̂ij and eij denote the predicted pixel and the predictive error, respectively. The formulations related to fitness evaluation are written as
e_ij = |x̂_ij − x_ij| = |s_ij + t_ij| ,  (2)

s_ij = x_{i−1,j−1} · w1 + x_{i−1,j} · w2 + x_{i−1,j+1} · w3 ,  (3)

t_ij = x_{i,j−1} · w4 − x_ij ,  (4)

w_k = e^(−a_k·(b_k+1)) ,  (5)

err = (Σ_{i,j} e_ij²) / ((m − 2)(n − 1)) ,   err′ = Σ_{i,j} e_ij ,  (6)

where i = 2, 3, . . . , n; j = 2, 3, . . . , m − 1; and k = 1, 2, 3, 4. The fitness value is measured by err: the greater err is, the smaller the fitness value. When only the fitness order is taken into account, err′ is used in place of err to simplify the fitness evaluation. All in all, software simulations show that EBinEP outperforms traditional compression methods. Furthermore, in order to realize the benefits of adaptive lossless image compression on chip, the most effective scheme is to speed up the fitness evaluation, which is the most time-consuming part.
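For reference, the prediction and the simplified fitness err′ of Eqs. (2)–(6) can be checked with a short sketch; the tiny constant image and the parameter values below are illustrative assumptions, not data from the paper.

```python
import math

def weights(a, b):
    """Eq. (5): w_k = exp(-a_k * (b_k + 1)), with a_k in [0,1], b_k in [0,3]."""
    return [math.exp(-ak * (bk + 1)) for ak, bk in zip(a, b)]

def err_prime(img, w):
    """Eqs. (2)-(4) plus the simplified fitness err' = sum of prediction errors."""
    n = len(img)       # rows,    i = 2 .. n     (1-based, as in the text)
    m = len(img[0])    # columns, j = 2 .. m - 1 (1-based)
    total = 0.0
    for i in range(1, n):              # 0-based: row i uses row i-1 above it
        for j in range(1, m - 1):
            s = (img[i-1][j-1] * w[0] + img[i-1][j] * w[1]
                 + img[i-1][j+1] * w[2])                  # Eq. (3)
            t = img[i][j-1] * w[3] - img[i][j]            # Eq. (4)
            total += abs(s + t)                           # Eq. (2)
    return total
```

With a_k = b_k = 0 all weights are 1, so a constant image of value 1 gives an error of exactly 3 per interior pixel, which is an easy sanity check of the template.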
3  The EHW Engine with Systolic Array Acceleration
Adaptive lossless image compression is a real-time embedded application whose full hardware implementation is constrained by limited chip area and power. In other words, while the EHW engine is accelerated, its hardware resources and power consumption must not increase too much. A pipeline technique is therefore adopted so that all the modules process data in parallel, and the system speed then depends only on the fitness evaluation. A highly regular and modular systolic array built from identical pipelined cells is employed here, which not only speeds up the fitness evaluation but also lends itself to VLSI implementation.
Using Systolic Technique to Accelerate an EHW Engine
3.1  Architecture
Fig. 3 shows the EHW engine architecture, which at the function level can be divided into the optimization module (surrounded by the dashed line) and the compression module. The two modules are independent of each other: while one image is being optimized, the previously optimized image can be compressed. The optimization module is the most important part of the system; it obtains the optimal predictive parameters by learning the features of different images within the prescribed time. It consists of the population registers, the mutation module, the fitness evaluation module and the selection module. Pipeline registers are inserted between them so that all modules process data in parallel to speed up the optimization. The following subsections concentrate on the systolic-array design of the fitness evaluation module, which substantially speeds up the fitness evaluation while satisfying the system's resource and power constraints.
[Figure area: in Fig. 3, the image data and an RNG feed the optimization loop (asexual mutation → prediction → fitness evaluation → selection/replacement → population), whose best parameters drive the compression module producing the output.]
Fig. 3. The EHW engine architecture
3.2  Systolic Array Design at Function Level
The weight function is written as

w_k = e^(−a_k·(b_k+1)) ,  k = 1, 2, 3, 4 ,  (7)

where each pair (a_k, b_k) is coded as a whole into an L-bit binary string, from which the weight w_k can be calculated. To make Eq. (7) easy to implement by intrinsic EHW, the value of the exponential function can be obtained on chip by querying an embedded look-up table (LUT). Eq. (2) is the absolute value of the sum of Eq. (3) and Eq. (4), whose inputs are sequences of pixels, line by line. The dependences in the pixel-error calculation are shown in Fig. 4. In the dependence graph there are three basic edges corresponding to Eq. (3): the data are imported upward along direction (0, 1), (1, 0) denotes the rightward shift of the weight values, and the results move along direction (1, 1). There are likewise three edges corresponding to Eq. (4), but the directions of the input data, the weight values and the results are (0, −1), (1, 0) and (1, 1). In order to implement the application in hardware,
Fig. 4. Dependence graph
this 2-D dependence graph should be converted into a space-time form. Only the structure with broadcast inputs, shifted results and static weights is considered here. The projection vector d, the processor vector p^T and the schedule vector s^T are defined as

d = (1, 0)^T ,  p^T = (0 1) ,  s^T = (1 0) .  (8)

According to these definitions:

◆ Any node marked I^T = (i j) maps to processor

p^T · I = (0 1)(i j)^T = j ,  (9)

so all nodes with the same j map to the same processor.

◆ The execution time of any node marked I^T = (i j) is

s^T · I = (1 0)(i j)^T = i .  (10)
◆ The hardware utilization efficiency is

HUE = 1/|s^T · d| = 1 .  (11)
◆ The weights, the input data and the results correspond to the ports of the systolic array as shown in Table 1. From these results, the systolic array can be designed as indicated in Fig. 5, each systolic cell consisting of a multiplier and an adder.
Table 1. Edge mapping in the systolic array

edge      e                 p^T · e    s^T · e
w_k       (1, 0)            0          1
i/p       (0, 1) or (0, −1) 1 or −1    0
result    (1, 1)            1          1
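The space-time mapping of Eqs. (8)–(11) and the edge mapping of Table 1 can be verified mechanically. The sketch below is illustrative glue code; only the vector values come from Eq. (8).

```python
d  = (1, 0)   # projection vector d
pT = (0, 1)   # processor vector p^T
sT = (1, 0)   # schedule vector s^T

def dot(u, v):
    """2-D dot product."""
    return u[0] * v[0] + u[1] * v[1]

def processor(node):          # Eq. (9): node (i, j) runs on processor j
    return dot(pT, node)

def time_step(node):          # Eq. (10): node (i, j) executes at time i
    return dot(sT, node)

# Table 1: weight edges stay in one processor (p^T·e = 0) with a one-cycle
# delay, input edges are broadcast within a time step (s^T·e = 0, the
# (0, -1) input edge maps symmetrically), and result edges hop one
# processor per cycle.
edges = {"weight": (1, 0), "input": (0, 1), "result": (1, 1)}
mapping = {name: (dot(pT, e), dot(sT, e)) for name, e in edges.items()}

hue = 1 / abs(dot(sT, d))     # Eq. (11): hardware utilization efficiency
```

Every node of a graph row executes on one processor, one row per clock step, which is why the utilization in Eq. (11) comes out as exactly 1.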
Fig. 5. Systolic array at function level
3.3  Systolic Array Design at Bit Level
The systolic cell (shown in Fig. 6a), which consists of a multiplier and an adder, can be realized by various structures. When scalability is considered, a bit-level systolic structure is adopted so that pipeline registers can easily be inserted between the bit-operation units, which markedly improves the system throughput and meets different speed requirements. As an example, with a 3-bit-wide multiplier input and a 6-bit-wide adder input, the computing process is shown in Fig. 6b. The corresponding hardware structure is shown in Fig. 7, where HA denotes a half-adder and FA a full-adder; the critical path comprises two half-adders and four full-adders. In general, if the multiplier input is n bits wide and the adder input 2n bits wide, the critical path comprises two half-adders and 2n − 2 full-adders.
Fig. 6. Systolic cell and computing process
Fig. 7. Systolic array at bit level
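The arithmetic that Figs. 6 and 7 realize with half- and full-adders can be cross-checked in software: the cell computes s = a·b + d by summing the shifted partial products a_i·b_j together with the addend d, as in Fig. 6b. The function below is a behavioral sketch (word widths follow the 3-bit example; the name is our own).

```python
def systolic_cell(a, b, d, n=3):
    """Return s = a*b + d, built from shifted partial products a_i * b_j.

    With n-bit a, b and a 2n-bit d, the sum fits in 2n + 1 bits,
    matching the s[6:0] output of the 3-bit example in Fig. 6.
    """
    assert 0 <= a < 2**n and 0 <= b < 2**n and 0 <= d < 2**(2 * n)
    s = d
    for i in range(n):          # bit a_i of the multiplicand
        for j in range(n):      # bit b_j of the multiplier
            s += (((a >> i) & 1) * ((b >> j) & 1)) << (i + j)
    return s
```

The worst case of the 3-bit example is 7 × 7 + 63 = 112, which still fits in the 7-bit result s[6:0], confirming the chosen output width.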
3.4  Scalability Design in Parallel
For high-rate coding, it is more efficient to employ several systolic arrays. As shown in Fig. 8, each systolic array has two input ports, corresponding to sequences A and B, and one output port for the results. Sequence B of one array is simply sequence A of the next array delayed by three clock cycles, so two neighboring arrays can share an input sequence. Taking a four-array structure as an example, except for the pixels of lines 4k + 1 (k = 1, 2, . . .), which have to be accessed twice, all other pixels need to be accessed only once. In contrast, with a single-array structure, all pixels must be accessed twice except those of the first and last lines. Since memory access is very time-consuming, this data-reuse technique reduces the number of accesses, which not only raises the system speed but also reduces power consumption.
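The access counts above can be captured by a back-of-envelope model; the model below is our own simplification of the data-reuse scheme (each pass predicts `layers` consecutive lines and must also read the line above them, so only the boundary lines between passes are fetched twice).

```python
def memory_accesses(m, n, layers):
    """Approximate pixel reads for an image of m lines of n pixels each."""
    predicted = m - 1                  # lines 2..m are predicted
    passes = -(-predicted // layers)   # ceil(predicted / layers)
    return (predicted + passes) * n    # each pass re-reads one extra line
```

With one layer this gives 2(m − 1)·n reads, i.e. every line but the first and last is read twice, and as the number of layers grows the count approaches m·n, matching the trend reported for Fig. 9.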
[Figure area: in Fig. 8, input sequences 1–5, each delayed by z^−3 relative to the previous one, feed the 1st–4th parallel systolic arrays, which produce result sequences 1–4.]
Fig. 8. Systolic arrays in parallel
4  Performance Evaluation
To evaluate the performance of the EHW engine implemented on an FPGA, sixteen scalable systolic arrays are adopted for the fitness evaluation, which can predict sixteen pixels from different lines at the same time. The images used are ten typical planetary images from the NASA/DoD planetary gallery. The experiment was implemented on an Altera Cyclone EP1C12Q240 FPGA; the compression results are shown in Table 2 and the performance in Table 3. The results on images of different sizes show that the EHW engine using the systolic technique achieves the same compression performance as the software implementation but with far less run time: the average speedup ratio reaches 498. The systolic array utilizes the hardware resources fully (100%), and the system is greatly sped up with small power consumption; furthermore, the high regularity and modularity of the systolic array make it easy to implement in VLSI. Dispersed Reference Compression (DRC) [7] is a newly developed method derived from [6]. Essentially, the ideas of the proposed method and DRC are quite different. [7] reported a number of compression-ratio results but gave no details about the computing time, so it is hard to make an exact experimental comparison between the EHW engine and DRC. As is
Table 2. Performance comparison between software and the EHW engine for images of different sizes. From top to bottom, the image sizes are 3000 × 2400, 3000 × 2400, 2400 × 2400, 2841 × 1846, 3000 × 1688, 2104 × 1726, 1320 × 1840, 1374 × 889, 1065 × 771, and 1239 × 805 respectively.

image      software                      hardware                       speedup
name       time     length   entropy    time       length   entropy    t_soft/t_hard
           (sec.)   (bits)   (bits)     (sec.)     (bits)   (bits)
PIA07335   148.53   2.435    2.400      297.54e-3  2.446    2.411      499
PIA07217   148.70   2.104    2.053      297.54e-3  2.161    2.110      500
PIA05578   118.55   2.630    2.580      238.03e-3  2.576    2.530      498
PIA07225   107.68   4.149    4.111      216.73e-3  4.178    4.150      497
PIA07096   103.81   2.699    2.647      209.27e-3  2.727    2.674      496
PIA07343    74.02   3.010    2.961      150.07e-3  2.992    2.941      493
PIA07227    49.76   5.134    5.110      100.37e-3  5.111    5.083      496
PIA04349    25.31   4.603    4.567       50.48e-3  4.656    4.620      501
PIA05202    17.07   5.250    5.225       33.93e-3  5.196    5.172      503
PIA06322    20.60   3.172    3.132       41.22e-3  3.233    3.192      500
Table 3. Average performance

           population  precision  computation  frequency  run time    accelerate
           size                   effort       (Hz)       (sec.)      ratio
software   1           N/A        100          2.0G       81.39       1
hardware   1           16 bits    100          50M        163.52e-3   498
well known, how to reduce the computing time is the most important issue for evolvable hardware. In this sense, intrinsic EHW is more attractive than extrinsic EHW for real-time applications. The reason is simple: in extrinsic EHW, the optimization procedure has already been executed by the host, and only the final compression step is done in hardware, which is of little practical significance beyond research. Therefore, the proposed method has a clear advantage over DRC. For large images, the scalability design not only increases the parallelism of the fitness evaluation but also helps to reduce the system power consumption. Take the images PIA07335, PIA05202, PIA05578 and PIA06322 shown in Fig. 9 as examples: the x-coordinate is the number of systolic-array layers and the y-coordinate is the number of memory accesses. The memory access cycle is usually longer than the system clock cycle; frequent memory accesses increase the system power consumption and limit its operating speed, since the other high-speed modules have to wait until the data access is finished. The scalability design exploits the fact that the fitness evaluation module needs to read the pixel values repeatedly, so each following systolic array takes its input directly from the delayed data of the previous array and doesn't
[Figure area: Fig. 9 comprises four panels, one each for PIA07335, PIA05578, PIA05202 and PIA06322, plotting the number of memory accesses (y-axis) against the number of systolic-array layers, 0–20 (x-axis); in every panel the access count falls as the number of layers increases.]
Fig. 9. Access times of scalable structure
need to access the memory at all. As shown in Fig. 9, the scalable structure never increases the number of memory accesses; rather, the access count falls as the number of layers increases, approaching n × m.
5  Conclusion
This paper proposes an EHW engine for predictive lossless image compression that is implemented completely on chip, unlike the extrinsic EHW models used in [7,8,9]. With the high-speed pipeline architecture, all system modules can operate in parallel. To further realize the benefits of evolutionary computation in hardware, a highly regular systolic array is adopted to accelerate the fitness evaluation, the most time-consuming component. For large images, the fitness evaluation unit can easily be scaled up with the systolic-array technique to meet real-time requirements, which not only reduces the number of memory accesses but also saves power. Experimental results show that the proposed EHW engine is well suited to real-time adaptive lossless image compression. Without loss of compression performance, the average hardware speedup ratio reaches 498. Furthermore, for PIA05202 (1065 × 771), the compression
time of the intelligence engine is only 33.93e−3 s. Since typical video frames are 1024 × 768 at 24 frames per second, the proposed engine is also promising for video compression.
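A quick arithmetic check of this claim, using the PIA05202 frame time reported in Table 2:

```python
# Throughput implied by the 33.93 ms compression time of a 1065 x 771
# frame (PIA05202), compared with the 24 fps needed for 1024 x 768 video.
frame_time = 33.93e-3          # seconds per frame
fps = 1.0 / frame_time         # achievable frame rate, about 29.5 fps
assert fps > 24                # exceeds the 24 fps video requirement
```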
Acknowledgement. This work is supported by the National Natural Science Foundation of China through Grant No. 60573170.
References

1. Yao, X., Higuchi, T.: Promises and challenges of evolvable hardware. IEEE Transactions on Systems, Man and Cybernetics, Part C 29(1), 87–97 (1999)
2. Hirst, A.J.: Notes on the evolution of adaptive hardware. In: Parmee, I. (ed.) Proceedings of the Second International Conference on Adaptive Computing in Engineering Design and Control, pp. 212–219. University of Plymouth, UK (1996)
3. Higuchi, T., Iwata, M., Kajitani, I., Iba, H., Hirao, Y., Furuya, T., Manderick, B.: Evolvable Hardware and Its Applications to Pattern Recognition and Fault-Tolerant Systems. In: Sanchez, E., Tomassini, M. (eds.) Towards Evolvable Hardware. LNCS, vol. 1062, pp. 118–135. Springer, Heidelberg (1996)
4. Koza, J., Bennett III, F., Andre, D., Keane, M., Dunlap, F.: Automated synthesis of analog electrical circuits by means of genetic programming. IEEE Transactions on Evolutionary Computation 1(2), 109–128 (1997)
5. Thompson, A., Koza, J., Goldberg, D., Fogel, D., Riolo, R.: Silicon Evolution. In: Genetic Programming 1996: Proceedings of the First Annual Conference, pp. 444–452 (1996)
6. Higuchi, T., Murakawa, M., Iwata, M., Kajitani, I., Liu, W., Salami, M.: Evolvable hardware at function level. In: IEEE International Conference on Evolutionary Computation, pp. 187–192 (1997)
7. Sakanashi, H., Iwata, M., Higuchi, T.: A Lossless Compression Method for Halftone Images Using Evolvable Hardware. In: Proceedings of the 4th International Conference on Evolvable Systems: From Biology to Hardware, pp. 314–326 (2001)
8. Fukunaga, A., Hayworth, K., Stoica, A.: Evolvable hardware for spacecraft autonomy. In: 1998 IEEE Aerospace Conference, Aspen, CO, pp. 135–143 (1998)
9. Fukunaga, A., Stechert, A.: Evolving Nonlinear Predictive Models for Lossless Image Compression with Genetic Programming. In: Koza, J., Banzhaf, W., Chellapilla, K., Deb, K., Dorigo, M., Fogel, D., Garzon, M., Goldberg, D., et al. (eds.) Genetic Programming 1998: Proceedings of the Third Annual Conference, pp. 95–102 (1998)
10. He, J., Yao, X., Tang, J.: Towards intrinsic evolvable hardware for predictive lossless image compression. In: Wang, T.-D., Li, X., Chen, S.-H., Wang, X., Abbass, H., Iba, H., Chen, G., Yao, X. (eds.) SEAL 2006. LNCS, vol. 4247, pp. 632–639. Springer, Heidelberg (2006)
11. Shackleford, B., Snider, G., Carter, R., Okushi, E., Yasuda, M., Seo, K., Yasuura, H.: A High-Performance, Pipelined, FPGA-Based Genetic Algorithm Machine. Genetic Programming and Evolvable Machines 2(1), 33–60 (2001)
12. Tufte, G., Haddow, P.: Prototyping a GA Pipeline for complete hardware evolution. In: Proceedings of the First NASA/DoD Workshop on Evolvable Hardware, pp. 18–25 (1999)
13. Glette, K., Torresen, J.: A Flexible On-chip Evolution System Implemented on a Xilinx Virtex-II Pro Device. In: Moreno, J.M., Madrenas, J., Cosp, J. (eds.) ICES 2005. LNCS, vol. 3637, pp. 66–75. Springer, Heidelberg (2005)
14. Koza, J.R., Bennett III, F.H., Hutchings, J.L., Bade, S.L., Keane, M.A., Andre, D.: Evolving sorting networks using genetic programming and rapidly reconfigurable field-programmable gate arrays. In: Higuchi, T. (ed.) Workshop on Evolvable Systems, International Joint Conference on Artificial Intelligence, Nagoya, pp. 27–32 (1997)
15. Yamaguchi, Y., Miyashita, A., Maruyama, T., Hoshino, T.: A co-processor system with a Virtex FPGA for evolutionary computation. In: FPL 2000: Proceedings of the 10th International Workshop on Field-Programmable Logic and Applications, London, UK, pp. 240–249. Springer, Heidelberg (2000)
Author Index
Arslan, Tughrul 210
Barton, Nick 210 Benkhelifa, Elhadj 174 Bidlo, Michal 77 Chen, Haifeng 163 Chen, Yunbi 433 Cui, Jiang 67, 100 Daud, Taher 379 Djupdal, Asbjoern 256 Dowding, Natalia 198 Dragffy, Gabriel 174 Du, Qiu 309 Erdogan, Ahmet T.
210
244
Ji, Qingjian 100 Jiang, Min 89
Lee, Chong Ho 23 Lee, Jungtae 163 Li, Gaobin 89 Li, Jie 45, 368 Li, Quanxi 415
Nedjah, Nadia 403 Nibouche, Mokhtar 174 Nosato, Hirokazu 343
Rossier, Joël
Haddow, Pauline C. 256, 297 He, Guoliang 13, 35 He, Jingsong 415, 433 Hu, Chengyu 277, 319 Huang, Shitan 45, 292, 368 Hui, Fei 426
Katkoori, Srinivas Keymeulen, Didier
Mange, Daniel 151 Mojarradi, Mohammad 379 Mourelle, Luiza de Macedo 403 Murakawa, Masahiro 343 Murata, Nobuharu 343
Pan, Xianghe 309 Peng, Zhizhuan 285 Piao, Chang Hao 23 Pipe, Anthony 174
Feng, Jinfu 285 Furuya, Tatsumi 343 Gan, Zhaohui 89 Gao, Gui-jun 57, 67 Gao, Junwen 355 Glette, Kyrre 1, 391 Greensted, Andrew J.
Li, Yuanxiang 13, 35 Liang, Houjun 331 Liang, Qingzhong 277, 319 Liang, Xiaolong 285 Liu, Zongpu 309 Lu, Jianhua 233 Luo, Wenjian 331
379 379
151
Sekanina, Lukas 186, 222 Seo, Ssanghee 163 Shi, Baochang 233 Shi, Chen 292, 426 Shi, Yu 109 Shim, JeongYon 119 Shim, Taebo 268 Stauffer, André 151 Stoica, Adrian 379 Tang, Wenshen 355 Torresen, Jim 1, 391 Tu, Hang 13 Tufte, Gunnar 297 Tyrrell, Andy M. 198, 244 Vannel, Fabien 151 Vasicek, Zdenek 222 Wang, Jin 23 Wang, Li 140 Wang, Nengchao
233
Wang, Xufa 331 Wang, Yongji 277 Wang, Youren 57, 67, 100, 109, 129, 140 Wei, Wei 319 Wu, Qiongqin 109 Wu, Xiangning 277 Xia, Xuewen 35 Xiao, Wei 355 Xie, Min 100, 129, 140 Xu, Ming 355 Yan, Xuesong 292, 319 Yang, Erfu 210 Yang, Shanshan 129 Yang, Zhenkun 89 Yao, Rui 57, 67, 109
Yao, Yuan 319 Yasunaga, Moritoshi 1 Ye, Donghee 163 Yu, Li 13 Yu, Sheng-lin 57
Zebulum, Ricardo 379 Zhang, Dingxing 355 Zhang, Wei 13 Zhang, Xunying 426 Zhang, Yuan 129, 140 Zhao, Shuguang 309 Zheng, Juan 109 Zhong, Yongbing 285 Zhou, Guoqing 268 Zhu, Jixiang 35