John Billingsley • Robin Bradbeer (Eds.)
Mechatronics and Machine Vision in Practice
With 245 Figures
Prof. Dr. John Billingsley Faculty of Engineering and Surveying University of Southern Queensland Toowoomba, QLD Australia
[email protected]
Prof. Dr. Robin Bradbeer Department of Electrical Engineering City University of Hong Kong 88 Tat Chee Avenue Kowloon, Hong Kong P.R. China
[email protected]
ISBN 978-3-540-74026-1
e-ISBN 978-3-540-74027-8
DOI 10.1007/978-3-540-74027-8

Library of Congress Control Number: 2007933848

© 2008 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: Erich Kirchner, Heidelberg

Printed on acid-free paper

9 8 7 6 5 4 3 2 1

springer.com
Foreword
Research papers on the subject of mechatronics cover a great variety of topics. Among them are those that explore new techniques and applications, but all too often there are others that seek to paint old, tired techniques with a patina of new jargon. You will find none of the latter here. There is a heavy emphasis on the ‘in Practice’ that completes the title of the conference series from which these papers have been drawn. The papers were originally reviewed as full manuscripts, and a selection of authors have now been invited to rewrite their work for inclusion in this volume.

In the first section, papers with an educational theme have been gathered. Most of them focus on practical experiments that will reinforce a mechatronics course. A variety of techniques for vision analysis form the next section, again stressing a practical emphasis. The third section focuses on practical applications of machine vision, several of which have been implemented in industry, while the fourth is concerned with techniques within robotics other than vision. Some of the medical applications of the fifth section might not be for the squeamish. The book is completed with a number of applications that have an agricultural theme.

University of Southern Queensland, Australia    John Billingsley
City University, Hong Kong                      Robin Bradbeer

November 2007
Contents
Education

Emergent Behaviour Real-time Programming of a Six-Legged Omni-Directional Mobile Robot: Planning of Viennese Waltz Behaviour .......... 3
Frank Nickols

The Hong Kong Underwater Robot Challenge .......... 17
Robin Bradbeer

Dynamics and Control of a VTOL Quad-Thrust Aerial Robot .......... 27
Joshua N. Portlock and Samuel N. Cubero

Project-oriented Low Cost Autonomous Underwater Vehicle with Servo-visual Control for Mechatronics Curricula .......... 41
C. A. Cruz-Villar, V. Parra-Vega, and A. Rodriguez-Angeles

Coordination in Mechatronic Engineering Work .......... 51
James Trevelyan

Vision Techniques

A Vision System for Depth Perception that Uses Inertial Sensing and Motion Parallax .......... 65
Vlatko Bečanović and Xue-Bing Wang

Rate Shape Identification Based on Particle Swarm Optimization .......... 77
P.W.M. Tsang and T.Y.Y. Yuen

Advanced 3D Imaging Technology for Autonomous Manufacturing Systems .......... 87
A. Pichler, H. Bauer, C. Eberst, C. Heindl, J. Minichberger

Vision Based Person Tracking and Following in Unstructured Environments .......... 99
Mahmoud Tarokh and John Kuo

Simple, Robust and Accurate Head-Pose Tracking Using a Single Camera .......... 111
Simon Meers, Koren Ward and Ian Piper

Vision Applications

Machine Vision for Beer Keg Asset Management .......... 125
Michael Lees, Duncan Campbell, Andrew Keir

Millimetre Wave Radar Visualisation System: Practical Approach to Transforming Mining Operations .......... 139
E. Widzyk-Capehart, G. Brooker, S. Scheding, A. Maclean, R. Hennessy, C. Lobsey and M. Sivadorai

An Underwater Camera and Instrumentation System for Monitoring the Undersea Environment .......... 167
Kenneth K.K. Ku, Robin Bradbeer and Katherine Lam

Visual Position Estimation for Automatic Landing of a Tail-Sitter Vertical Takeoff and Landing Unmanned Air Vehicle .......... 181
Allen C. Tsai, Peter W. Gibbens and R. Hugh Stone

Minutiae-based Fingerprint Alignment Using Phase Correlation .......... 193
Weiping Chen and Yongsheng Gao

Robotic Techniques

A Snake-like Robot for Inspection Tasks .......... 201
Bin Li, Li Chen and Yang Wang

Modelling Pneumatic Muscles as Hydraulic Muscles for Use as an Underwater Actuator .......... 209
Kenneth K.K. Ku and Robin Bradbeer

Automated Tactile Sensory Perception of Contact Using the Distributive Approach .......... 219
X. Ma, P. Tongpadungrod and P.N. Brett

Blind Search Inverse Kinematics for Controlling All Types of Serial-link Robot Arms .......... 229
Samuel N. Cubero

Distributive Tactile Sensing Applied to Discriminate Contact and Motion of a Flexible Digit in Invasive Clinical Environments .......... 247
Betty Tam, Peter Brett, David Holding, and Mansel Griffiths

Medical Applications

Intelligent Approach to Cordblood Collection .......... 255
S.L. Chen, K.K. Tan, S.N. Huang and K.Z. Tang

An Autonomous Surgical Robot Applied in Practice .......... 261
P.N. Brett, R.P. Taylor, D. Proops, M.V. Griffiths and C. Coulson

Development of an Intelligent Physiotherapy System .......... 267
S.L. Chen, W.B. Lai, T.H. Lee and K.K. Tan

Visual Prostheses for the Blind: A Framework for Information Presentation .......... 275
Jason Dowling, Wageeh Boles and Anthony Maeder

Computer-based Method of Determining the Path of a HIFU Beam Through Tissue Layers from Medical Images to Improve Cancer Treatment .......... 289
E. McCarthy and S. Pather

Agricultural Applications

On-the-go Machine Vision Sensing of Cotton Plant Geometric Parameters: First Results .......... 305
Cheryl McCarthy, Nigel Hancock and Steven Raine

Robotics for Agricultural Systems .......... 313
Mario M. Foglia, Angelo Gentile, and Giulio Reina

More Machine Vision Applications in the NCEA .......... 333
John Billingsley

Authors .......... 345

Index .......... 347
Education
Mechatronics education has been a constant source of interest at nearly all engineering conferences, and the first chapter in this book is dedicated to this most important subject. The first paper discusses a very original idea from one of the more innovative pioneers of mechatronics education. It looks at the movement and control of a six-legged omni-directional mobile robot, and determines how it can be programmed to perform the movements needed for a Viennese waltz. This is followed by something completely different: a description of an underwater robot competition held in Hong Kong, mainly for school students, whose winners would attend the World Championships in the USA. From under the sea into the air, the next paper describes an aerial robot propelled by four rotors. It contains details of the control system needed to keep the platform stationary and in position. It’s a pity that videos can’t be shown in books (yet?), as the demonstration of this robot is very impressive. Then it’s back underwater again, with a paper from Portugal about a small project-based vehicle which is used to teach the basics of a mechatronics syllabus. Finally, James Trevelyan has an interesting opinion piece concerning mechatronics engineering work, and how some sort of coordination between those institutions offering mechatronics courses may be beneficial.
Emergent Behaviour Real-time Programming of a Six-Legged Omni-Directional Mobile Robot: Planning of Viennese Waltz Behaviour
Frank Nickols Mechanical Engineering Dept. Dhofar University, Salalah, Sultanate of Oman, Middle East.
1 Introduction
Viennese waltz behaviour means simultaneous translation and rotation of a robot body when viewed from above. Emergent behaviour is concerned with how complex motion is computed in robots. The author has designed a six-legged, omnidirectional robot, Fig. 1, in order to investigate emergent behaviour real-time algorithms. The robot is programmed to carry out behaviour patterns using nine on-board, parallel-processing, 16-bit microcomputers. The group of microcomputers is programmed to independently control the leg step amplitude and step angle for each of the robot’s six legs. The Viennese waltz is a very good starting point for demonstrating emergent behaviour because of (i) its usefulness and (ii) the complexity of programming an omnidirectional, six-legged robot to carry out such motion. The property of omnidirectionality is imparted to the robot body by each robot leg possessing three degrees of freedom (3dof), actuated by three servos. One microcomputer is dedicated to each leg in order to compute the displacement of
Fig. 1. Six-legged, 3dof each leg, omni-directional robot designed to investigate emergent behaviour algorithms
Fig. 2. Planar sheet of elemental mathematical functions, (emf’s), computing one output value as a function of inputs and coefficients. The idea is to produce continuous computation
each of the three servos. The remaining three microcomputers compute the Viennese waltz motion.

This paper is the first of three papers concerned with (i) the planning (this paper), (ii) the strategy [1] and, finally, (iii) the real-time computational implementation (to be written) of Viennese waltz behaviour in the six-legged omnidirectional walking robot. The three papers are concerned with emergent behaviour and how complex robot motion behaviour patterns are computed from flat-sheet standardised computer architectures, Fig. 2. The computer architecture uses a minimum number of elementary mathematical functions (emf’s), which are multiplication, division, addition and subtraction. The four emf’s enable low-level elementary computing that can be executed quickly and give the promise of real-time computing with minimum computing power. The challenge is how to carry out complex mathematical equations modeled with just these four emf’s. The solution will be described in the third and final paper, which is currently work in progress.

The robot body can be thought of as a mobile Stewart platform where, alternately, three of the six legs are used to locomote the body. It is known that the solutions of the servomechanisms that displace such a platform require significant computational power to solve in real time (in this case every 20 ms), e.g. [2]. The reader can see this if the following points are appreciated.

1. The robot has six legs.
2. Each leg is actuated by three servos, so there are 18 servos.
3. Each servo is to be updated with a new angle demand at a rate of 50 times per second to place each leg tip at the correct 3dof cylindrical coordinate, (R, θ, Z).
4. The inverse kinematics equations for the leg tip placement are non-linear and complex. So there are 18 servos × 50 equations/second per servo = 900 complex inverse kinematics equations to be solved every second, i.e. one equation to be solved approximately every 1 ms.
5. Each leg tip must follow a plane rectangular locus that gives locomotion. This locus needs additional computational power and, to make things worse, it is subject to change both in height and in length; worse still, the locus has to be a curved plane whose curvature is variable when the robot is turning a corner or rotating.
6. The leg tips should be able to keep the body of the robot level whilst walking on uneven terrain.
7. The robot should be able to walk with, and change between, any of the gaits, i.e. 1 leg up/5 legs down, 2 legs up/4 legs down, or 3 legs up/3 legs down (the double tripod gait).
8. The body of the robot should possess 6dof motion, albeit with limited displacement about some axes. In other words the body of the robot should be able to rotate about an instantaneous axis of rotation located at any position and in any direction in 3dof space. The rotation is continuous or limited depending on the leg tips’ limited range of motion. For example, rotation can be continuous if the axis of rotation is perpendicular to the ground plane because the robot can keep walking, e.g. continuous rotation on the spot about a central vertical axis. However, if the axis of rotation is parallel to the ground plane then the robot can only rotate by a limited amount about that axis, e.g. limited roll left/right, limited pitch up/down, limited yaw left/right.

Thus Viennese waltz behaviour presents an interesting challenge, and even more so because, in the interests of computational efficiency, the author wishes to use no trig functions, no inverse trig functions, no square root, no log functions, no piece-wise linear functions, no look-up tables, no floating point numbers, no callable subroutines, no “if-then-else” instructions and no functions that are enclosed in parentheses. Instead, a significant number of coefficients are used together with 16-bit/32-bit integer numbers and integer power series to replace traditional mathematical functions. The advantages of such computation relate to efficiency and standardisation of computing hardware which can be designed to implement emergent behaviour. Such behaviour can be the Viennese waltz or many other behaviour patterns, all of which emerge from an identical computational structure but with different coefficients.

Figure 2 shows that the emf’s are arranged in a regular standardised physical layout such that information flows, wave-like, through a parallel pipe-lining processing architecture from one end of the layout to the other, resulting in one computed output value at the other end. It is to be noted that the equation being computed by the standardised planar sheet of emf’s is changed by changing the values of the coefficients and not by changing the wiring or format of the emf’s, because this is a fixed standardised sheet. Biological systems may work like this, which is quite different from digital computers that solve mathematical equations with a serial stepping computer program. There should be an efficient method by which biological systems elegantly compute behaviour patterns utilising a minimum number of elemental mathematical functions (emf’s). Furthermore, Nature would have evolved a computational architecture that is common to all biological computational processes, for example those for (i) processing vision information from a retina, (ii) processing acoustic information from a cochlea, (iii) causing the heart
to beat, (iv) the gut muscles to massage food through the digestive system and (v) solving inverse kinematics equations for muscle contraction that enable limb movement and body locomotion. In fact the driving force behind this research is to show if it is possible that a standardised, generic, biological-like, parallel processing computer architecture can be used to solve most robot computational problems more efficiently than a serial digital computer. Inverse kinematics equations for the computation of each 3dof robot leg system have already been worked out. These equations will be described in forthcoming publications. The next step is to extend the concept to higher level computational intelligence processes, e. g. the Viennese waltz. These equations are almost complete. It is intended to implement these equations in a standardised field programmable gate array and use pre-calculated coefficients to obtain the required behaviour pattern. We now move on and use the important omnidirectional robot behaviour pattern, i. e. the Viennese waltz, as an application example in order to create an efficient emergent behaviour computing architecture.
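To give a concrete feel for the "integer power series instead of trig functions" idea described above, the following is a minimal illustrative sketch, not the author's firmware: a sine approximation built only from multiplication, division, addition and subtraction on 32-bit integers. The Q15 fixed-point scaling (32768 representing 1.0) and the truncated three-term series are assumptions chosen for the example; the right-shifts are simply divisions by the scale factor.

#include <stdint.h>
#include <stdio.h>

/* Illustrative only: approximate sin(x) for |x| <= pi/2 using the power
 * series x - x^3/6 + x^5/120, evaluated with multiply, divide, add and
 * subtract on 32-bit integers.  Angles and results use an assumed Q15
 * fixed-point format (32768 == 1.0); this scaling is not from the paper. */
static int32_t sin_q15(int32_t x_q15)
{
    int32_t x2 = (x_q15 * x_q15) >> 15;   /* x^2 in Q15 (shift = divide by 32768) */
    int32_t x3 = (x2 * x_q15) >> 15;      /* x^3 in Q15 */
    int32_t x5 = (x3 * x2) >> 15;         /* x^5 in Q15 */
    return x_q15 - x3 / 6 + x5 / 120;     /* truncated power series */
}

int main(void)
{
    int32_t x = 25736;                                  /* ~pi/4 rad in Q15 */
    printf("sin(pi/4) ~ %f\n", sin_q15(x) / 32768.0);   /* prints ~0.7071   */
    return 0;
}

In a scheme like the one the paper proposes, the series coefficients would simply be more entries in the planar sheet's coefficient set, so changing the function being approximated changes only data, not wiring.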
2 The Viennese Waltz (see reference [3] for a movie clip)
The Viennese Waltz is a ballroom dance that requires a human dancing couple, joined as a coordinated four-legged paired unit, to rotate quickly whilst simultaneously translating, Fig. 3. Furthermore the paired unit has to negotiate the extremities of the dance floor so, added to the fast rotation and simultaneous translation, there is a less intense rotation around the dance floor. The less intense rotation means that the translation vector is constantly changing its direction. In fact ballroom dancing represents a very interesting study for the analysis and synthesis of legged and wheeled robot behaviour patterns. So also does the study of motion behaviour patterns and motion strategies of players in a premier league football match. It will be shown in this paper that the Viennese waltz behaviour pattern is a highly useful motion, possibly a fundamental motion, for omnidirectional robots. For example the behaviour pattern can be used to program
Fig. 3. Plan view of dancing couple doing the Viennese Waltz
Fig. 4. Sun and planet wheel modeling robot Viennese Waltz behaviour
Fig. 5. Planet wheel on outside of sun wheel producing opposite rotation of planet wheel
a robot to back out of a tight corner whilst rotating, or, for a robot football player to circulate around an opponent whilst tracking the translating opponent. The classic “sun and planet wheel”, Fig. 4, is used to model Viennese Waltz behaviour. The planet wheel is a disc attached to the robot. There is no slip between the planet and the sun wheels. The planet wheel could be on the outside of the sun wheel, Fig. 5, in which case the rotation of the planet wheel is reversed.
3 Applications of Viennese Waltz Behaviour

3.1 Retreating Motion in a Corridor
The robot is shown, Fig. 6, backing out of a corridor representing retreating motion, i. e. translating, and simultaneously rotating such that the head of the robot is ready to face an opponent in the direction of its exit path.
Fig. 6. Application of Viennese Waltz behaviour. Retreating motion in a corridor
3.2 Rotating on the Spot
If the radius of the virtual planet wheel is set to zero then the robot will turn on the spot, Fig. 7.
Fig. 7. Application of Viennese Waltz behaviour. Rotating on the spot
3.3 Rotating About a Fixed Point
If the radius of the sun wheel is set to zero and the planet wheel rotates on the outside of the sun wheel then the robot will rotate about a fixed point which is the sun wheel, Fig. 8.
Fig. 8. Application of Viennese Waltz behaviour. Rotating about a fixed point
3.4 Motion in a Straight Line with no Rotation, i.e. Pure Translation
This occurs when the radii of the sun and planet wheels are set to infinite radius, Fig. 9.
Fig. 9. Application of Viennese Waltz behaviour
4 Analysis of Sun and Planet Wheel Model
Viennese Waltz motion is programmed into an omnidirectional robot by using a model of a virtual sun and planet wheel, Fig. 10. The radii of each wheel depend on the required behaviour pattern. The contact point between the sun and planet wheels is an instantaneous centre of rotation, IC of R.
Fig. 10. Sun and planet wheels that are used as a model to produce Viennese Waltz behaviour in an omnidirectional robot
In order to program Viennese waltz behaviour it is only necessary to specify the following three variables (see Fig. 11 for explanation):

1. The range coordinate, RP, of the instantaneous centre of rotation, IC of R, with respect to the robot body.
2. The angle coordinate, θP, of the instantaneous centre of rotation, IC of R, with respect to the robot body.
3. The radius of the sun wheel, RS.

Legend for Fig. 11:
RP = radius of planet wheel
RS = radius of sun wheel
θplanet = incremental rotation of the planet wheel about the contact point between the sun and planet wheels; the contact point is the IC of R of the planet wheel
θICR = incremental rotation of the IC of R w.r.t. the robot body
θsun = incremental rotation of the instantaneous centre of rotation about the sun wheel centre
θP1, θP2 = angular positions of the IC of R w.r.t. the robot body datum at positions 1 and 2 respectively
Note that: θP2 – θP1 = θICR

Fig. 11. Analysis of Viennese Waltz motion with a sun and planet wheel. In this diagram the planet wheel is on the inside of the sun wheel but the opposite could be the case
Fig. 12(a). Simplified analysis of the sun and planet wheels; (i) planet wheel on inside of sun wheel
Fig. 12(b). Simplified analysis of the sun and planet wheels; (ii) planet wheel on outside of sun wheel
Figure 11 above represents a detailed analysis of an incremental rotational displacement of the planet wheel. However, it is rather complicated, so the important angles featured in Fig. 11 are shown in Figs. 12(a) and 12(b). Analysing Fig. 12(a), we obtain (note the minus sign, which indicates CCW rotation):

δθICR = −δθplanet × [1 / (1 − RP/RS)]    (1)

The minus sign indicates that the planet wheel is on the inside of the sun wheel.
Analysing Fig. 12(b), we obtain (note the plus sign):

δθICR = +δθplanet × [1 / (1 + RP/RS)]    (2)

The plus sign indicates that the planet wheel is on the outside of the sun wheel.
Equations (1) and (2) are used to compute the coordinates of the new position, newθICR, of the IC of R with respect to the robot body, like this:

newθICR = oldθICR + δθICR    (3)
where δθICR is given by Eq. (1) or Eq. (2), and RP remains the same unless a new trajectory is required. In practice the “very small” angles δθICR and δθplanet are not small: δθICR enlarges to ΔθICR, and δθplanet enlarges to Δθplanet. Hence Eqs. (1) and (2) lead to errors in the robot body following the required path. However, these errors are compensated by inserting a gain factor, k > 1, into Eqs. (1) and (2) like this:

ΔθICR = ±Δθplanet × k × [1 / (1 ± RP/RS)]    (4)

where k > 1.
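As a concrete illustration of Eqs. (1)–(4), the sketch below advances the angular position of the IC of R by one planet-wheel increment; the sign selects inside or outside rolling and k > 1 is the compensating gain. The function and variable names are invented for the example, and the use of floating point and branching is purely for clarity; it deliberately departs from the paper's integer, emf-only computing scheme.

#include <stdio.h>

/* Illustrative sketch of Eqs. (1)-(4): one incremental update of the
 * IC of R angle, theta_ICR, for a given planet-wheel increment. */
double update_theta_icr(double theta_icr_old,   /* current IC of R angle (rad)   */
                        double dtheta_planet,   /* planet-wheel increment (rad)  */
                        double r_planet,        /* RP: planet wheel radius       */
                        double r_sun,           /* RS: sun wheel radius          */
                        int planet_inside,      /* 1: inside sun wheel, 0: outside */
                        double k)               /* compensating gain, k > 1      */
{
    double dtheta_icr;
    if (planet_inside)
        dtheta_icr = -dtheta_planet * k / (1.0 - r_planet / r_sun);  /* Eqs. (1),(4) */
    else
        dtheta_icr = +dtheta_planet * k / (1.0 + r_planet / r_sun);  /* Eqs. (2),(4) */
    return theta_icr_old + dtheta_icr;                               /* Eq. (3)      */
}

int main(void)
{
    double theta = 0.0;
    for (int i = 0; i < 10; ++i)    /* ten increments of 0.05 rad, example values only */
        theta = update_theta_icr(theta, 0.05, 0.2, 0.5, 1, 1.1);
    printf("theta_ICR after 10 steps: %f rad\n", theta);
    return 0;
}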
5 Application of Sun and Planet Wheel Model to the Robot
A schematic view of the six-legged omni-directional robot is shown below in Fig. 13. Locomotion of the robot is produced by the leg tips moving in rectangular curved plane shapes. A plan view of the robot is shown in Fig. 14, which shows the scale of the computational problem to achieve Viennese waltz behaviour because for each leg the value of the leg tip walking angles, (P), and the amplitude of step, (amp), must be computed in real time. The computational problem is made more challenging because the leg tip loci are curved planes whose radii of curvature are the distances, L, from the IC of R.
Fig. 13. Illustration showing the six-legged robot and the leg tip locus plane shapes that will achieve robot body rotation about an instantaneous centre of rotation, IC of R
Fig. 14. Plan view of the robot during rotation about an instantaneous centre of rotation, IC of R
6 Summary
A plan has been worked out, using a sun and planet wheel, for obtaining Viennese waltz behaviour, which means combined translation and rotation of the robot body. Equations have been developed using this model so that a strategy can be worked out for the eventual real-time computational implementation of Viennese waltz behaviour in the robot. Further details are worked out in two forthcoming papers: firstly reference [1], which outlines the strategy for obtaining Viennese waltz behaviour, and secondly, because this is work in progress, a paper yet to be written concerning the real-time computational implementation of Viennese waltz behaviour.
References

[1] Nickols F.M.J., “Emergent Behaviour Real-time Programming of a Six-Legged Omni-Directional Mobile Robot: Strategy of Viennese Waltz Behaviour”, forthcoming conference paper, ICARCV, Singapore, 5–8 December 2006
[2] Merlet J.-P., “Parallel Robots: Open Problems”, http://www-sop.inria.fr/coprin/equipe/merlet/Problemes/isrr99-html.html
[3] For a movie clip: http://www.franknickolsrobots.com/movie1.htm and click on “Baby beetle performing Viennese Waltz” (allow time to download)
The Hong Kong Underwater Robot Challenge
Robin Bradbeer Director, Hoi Ha Wan Marine Science and Engineering Laboratory, Department of Electronic Engineering, City University of Hong Kong, Kowloon Tong, Hong Kong
1 Introduction
The aims of the project were: • To introduce technological concepts into marine environment and conservation education programmes. • To publicise the work currently being carried out by WWF and City University of Hong Kong in this area. • To enhance the awareness of Hong Kong teachers and students to marine conservation. • To provide a platform for design and technology students to partake in a practical design exercise with observable objectives. • To promote the development of technical, problem solving, critical thinking, and teamwork skills. The contest in Hong Kong was advertised widely in the press in December 2005, and 20 schools applied to join; 16 were accepted after some preliminary discussions to ascertain whether the students would be capable of performing in the contest, as well as determining the commitment of the school and its teachers. This was part of an international contest organised by the Marine Advanced Technology Education Center (MATE), Monterey, CA, USA in cooperation with the Marine Technology Society (MTS) [1].
2 The Robot Kit
All teams were supplied with a kit of parts to build a very simple robot. The concept of an inexpensive kit of parts, made available without charge, was based upon an original idea developed by MATE Center and a number of other educational
groups in the US [2]. There are also resources on the Internet with information about building underwater robots, and a book by Harry Bohm [3]. Providing some of the basic parts to the student teams makes entering the competition less daunting. As many of the components in the original rov kits designed in the US were not available in Hong Kong (as well as being specified in non-metric units!), a new kit was designed which synthesised some of these other ideas. A completed robot built with these parts can be seen in Figs. 1 and 2. The cost of each kit came to around HK$894, or US$115. This did not include a camera module, and each robot would need at least one to be able to complete the tasks in the competition; cameras were among the extra components that the students would need to buy. To introduce the students and their teachers to the contest, a series of workshops was held at CityU in the Underwater Systems Laboratory to allow each team to build the rov from their kit. As most schools in Hong Kong do not have access to a mechanical workshop, all tools and accessories, as well as access to a test tank, were made available. Figures 3 and 4 show the workshop activities. Fourteen of the sixteen teams were able to build their rov and test it in the water tank within the three-hour period allocated.
Fig. 1. Constructed ROV made from kit of parts supplied – rear view
Fig. 2. Side view of built ROV
Fig. 3. Workshop in the CityU lab
Fig. 4. Testing in the lab water tank
3 The Competition
The competition was based upon that designed by MATE Center for the MATE/MTS ROV Contest 2006. The teams from Hong Kong, being school students, were competing in the Ranger class contest. There is also an Explorer class contest, which is more advanced, and is for experienced schools and/or colleges. This year the Ranger mission tasks that the robots had to perform included: • Transporting the electronics module from the surface to the trawl-resistant frame. • Placing the electronics module in the frame. • Opening the door of the trawl-resistant frame adjacent to the submarine cable. • Retrieving the submarine power/communications cable connector from the seafloor. • Inserting the power/communications connector into the appropriately labelled open port on the electronics module. [5] There was also another task which involved locating and attaching to the acoustic transponder’s release loop and removing the release loop from the acoustic transponder to free the instrument. These two tasks had to be carried out in 20 minutes. Teams were allowed two attempts, and the highest scoring run was the one marked. The rovs had to be able to work at a depth of 5 m and at a distance of 7 m from the pool edge. They could only be powered by 12 v dc and with a maximum total current of 25 A.
4 The Workshops
A series of workshops to introduce the concepts of designing and building an underwater robot was organised for the teams, as shown in the photos above. These workshops allowed the teams to complete the kit robot, which had been designed by CityU based on a combination of two designs from Bohm’s book and the availability of parts in Hong Kong. However, it was not only the robot construction that interested the students. As the contest was sponsored by WWF Hong Kong, there was an environmental aspect that needed to be addressed too. Therefore a number of workshops were held at the Marine Life Centre in Hoi Ha Wan Marine Park – a marine environmental centre run by WWF, and where CityU also has a Marine Science and Engineering Laboratory. As well as having an introduction to the marine environment around Hong Kong, the teams also took a ride in the Centre’s glass bottom boat to see the coral reef at first hand. Then they were given the opportunity to drive the laboratory’s commercial rov. Figures 5 to 7 show some of these workshop activities.
Fig. 5. On the glass bottom boat
Fig. 6. Driving the commercial ROV
Fig. 7. In the Marine Science and Engineering lab
The commercial rov is used at the laboratory to survey the coral reefs in the marine park. The laboratory also has permanently deployed video cameras and instrumentation on the reef for monitoring purposes. The students were able to witness real-life applications in these workshops, which gave them a better understanding of why they were taking part in the competition.
5 The Hong Kong Finals
The Hong Kong finals, to select the team to go to Houston, TX for the International Contest, were held in the swimming pool at CityU. Fourteen teams made it through to the final, all with rovs that were based on the original kit, although some had changed considerably. As the original design was scalable, most had been made larger to accommodate the electronics module, which was around 400 mm × 400 mm × 550 mm and weighed 0.5 kg in water. The final contest was held in the 3.3 m deep middle of the pool. The mission props were placed around 3 m from the side of the pool. Each team was given 15 minutes to complete the two mission tasks. At the same time, each team was given a notice board to display a poster of their work, and a table and power supply to work on their robots before and/or after their runs. They also had to exhibit an
Fig. 8. A pool full of robots – showing the underwater mission props
Fig. 9. Judging the engineering and documentation
Fig. 10. Making final adjustments
Fig. 11. Manoeuvring the modules
engineering report which was judged during the contest for content, as well as their robot being graded for its engineering. The engineering and report/poster marks made up half the total marks. Extra marks could be gained for completing the tasks in the time allowed. No team was able to finish the tasks, but two came very close. Figures 8 to 11 show some of the robots and how they performed in the pool.
6 The Hoi Ha Wan Trials
The three top teams in the swimming pool contest were taken to the WWF Marine Life Centre in Hoi Ha Wan Marine Park two weeks later. They were then given the chance to operate their robots in the sea and try to locate an object suspended from the glass-bottom boat. This allowed them to see the difference between operating a robot in the swimming pool or test tank and operating one in the sea. All three teams were able to locate the object successfully in a short time. In fact, it was surprising how well such simple rovs could perform in a real-life environment! Figures 12 to 14 show the Hoi Ha Wan trials.
Fig. 12. Dropping the robot over the side of the GBB
Fig. 13. Looking for the object in the sea
Fig. 14. Searching
7 The International Finals, Houston, Texas
The winning team from Hong Kong was invited to attend the International Finals held at the NASA Neutral Buoyancy Lab., Houston, Texas in June 2006. Twenty-five teams from all over the USA and Canada took part in the Ranger contest. The Hong Kong team was the sole representative from outside North America. The team had rebuilt their robot so that it consisted of two parts: a ‘lander’ module that could be easily controlled and that would open the cage door and insert the cable connector, and a larger ‘carrier’ module that was streamlined and could carry the lander very fast, as it had 6 motors. It was less manoeuvrable, but could release the pin of the buoy. The team took three of the 13 major prizes, including best poster and highest scoring rookie. Figures 15 and 16 show this rov.
Fig. 15. The HK ROV and Spacelab

Fig. 16. The final assembled ROV

8 The Educational Objectives
The main objective of holding the competition was to introduce technology students to the concept of marine conservation, as well as to give them an opportunity to design an underwater robot. These two main objectives were fully met, and feedback from the teachers involved has been very positive. None of the students had seen an underwater robot before the contest started, so the fact that 14 of the 16 original teams were not only able to build a fully functioning robot, but one that, in most cases, could also carry out at least one of the mission tasks, was beyond our initial expectations. The availability of the basic kit certainly helped, as did the hands-on workshops staffed with graduate students working in underwater robotics research. The contestants could easily relate to what they were trying to do, as it was not an abstract task. Certainly, the visits to the Marine Park at Hoi Ha Wan gave an insight into practical applications of using technology to assist marine scientists, as well as a chance to see real rovs in action. The relationships between technology and conservation are not usually obvious – especially when technology is sometimes seen as causing many of our environmental problems. However, the close collaboration between the Department of Electronic Engineering and the Department of Biology and Chemistry at CityU, especially the use of rovs to monitor the reef at Hoi Ha Wan [7] and the development of underwater instrumentation for ocean observation [8], [9], meant that the real-life and practical aspects could be demonstrated. At the same time the competitive nature of the contest, with schools competing to go to the USA for the International Finals in Houston, TX, added some excitement. This was shown by the number of teams who came to the Underwater Systems Lab at CityU during the periods when there were no workshops, so that they could test their robots, as well as use some of the specialist facilities. Again, the presence of graduate students working in underwater robotics and instrumentation meant that much was learned by the contestants, which probably could not be found in books or on the web!
9 Conclusions
The First Hong Kong Underwater Robot Challenge was run from January to April 2006. It was a joint effort between WWF Hong Kong and the City University of Hong Kong and was designed not only to raise awareness of marine conservation issues amongst technologically oriented senior school students, but also to give them a competitive situation in which to design, build and operate an underwater vehicle. These aims were successfully accomplished, with 14 out of the initial 16 teams entering the finals of the contest. During the competition the students not only learned about the marine environment around Hong Kong, they also learned how technology is being used to conserve that environment. At the same time, they learned about underwater robotics and, initially using a simple kit supplied by the organisers, eventually designed quite complex robots to carry out a series of tasks stipulated by the organisers of the International ROV Contest in the USA. The students experienced how to work as part of a team, and how to organise a complex project. The judging of the contest combined not only the completion of the tasks and the speed at which they were completed but also the documentation and engineering aspects. For further information please see the contest web site www.ee.cityu.edu.hk/rovcontest.
Acknowledgements

I would like to thank WWF Hong Kong for their financial and logistic support for the Hong Kong Underwater Robot Challenge 2006, as well as the Deputy President of City University of Hong Kong, the Dean of the Faculty of Science and Engineering, and the Head of Department of Electronic Engineering for their financial and logistics support. Also, the team from the Student Development Services Sports Centre for making the swimming pool available and putting up with our strange requests; Katherine Lam and Paul Hodgson for the photos; Jill Zande at MATE Center for her patience in answering our questions (and for documenting all the rules/regulations etc. so clearly); and finally, Cyrus Wong and Kenneth Ku, without whom none of this would have been possible.
References

[1] http://www.wwf.org.hk/eng/index.php
[2] http://www.cityu.edu.hk
[3] http://www.wwf.org.hk/eng/hoihawan/
[4] http://robotchallenge.com/index1.html
[5] http://www.marinetech.org/rov_competition/index.php
[6] H. Bohm and V. Jensen, Build Your Own Underwater Robot and Other Wet Projects. Vancouver, Canada: Westcoast Words, 2003
[7] K.Y. Lam, P.K.S. Chin, R.S. Bradbeer, D. Randall, K.K. Ku, P. Hodgson, and S.G. Cheung, “A comparison of video and point intercept transect methods for monitoring subtropical coral communities”, Journal of Experimental Marine Biology and Ecology (in publication)
[8] R.S. Bradbeer, K.K.Y. Lam, L.F. Yeung, K.K.K. Ku, “Real-time Monitoring of Fish Activity on an Inshore Coral Reef”, Proceedings, OCEANS 2005, Washington, DC, USA, 19–23 October 2005, Paper #050216–04, IEEE, NJ
[9] K.K. Ku, R.S. Bradbeer, K.Y. Lam, L.F. Yeung and Robin C.W. Li, “An underwater camera and instrumentation system for monitoring the undersea environment”, Proceedings 10th IEEE International Conference on Mechatronics and Machine Vision in Practice, pp. 189–194, Macau, December 2004
Dynamics and Control of a VTOL Quad-Thrust Aerial Robot
Joshua N. Portlock and Samuel N. Cubero Mechanical & Mechatronic Engineering, Curtin University of Technology, Perth
1 Introduction
Some possible useful applications for Vertical Take-Off & Landing (VTOL) Unmanned Aerial Vehicles (UAVs) include remote video surveillance by security personnel, scouting missions or munitions delivery for the military, filming sports events or movies from almost any angle, and transporting or controlling equipment. This paper describes the design, control and performance of a low-cost VTOL quadrotor UAV, known as the QTAR (Quad Thrust Aerial Robot). The QTAR is capable of stationary hover and omnidirectional flight, whereby pitch angle, roll angle, yaw rate and thrust can be controlled independently, while translation is subsequently controlled by these four primary inputs (tilting the thrust vector in the desired direction). The QTAR project succeeded in developing and implementing a novel “attitude estimator” controller using very low cost components, which provides sufficiently accurate tilt angles and state information for very responsive closed-loop feedback control of all flight degrees of freedom.
Fig. 1. QTAR Prototypes built at Curtin University of Technology
The Attitude Control System (ACS) of the QTAR serves to automatically control all four motor thrusts simultaneously to stabilize all the main flight degrees of freedom (translation forwards/backwards and left/right, and rotation on the spot, i.e. yaw) except for altitude control. Thus, the QTAR saves a remote operator a great deal of adjustment and control effort, allowing the user to focus more on navigation and performing tasks rather than on manually and continuously adjusting several motor speeds to maintain stability and control. The quadrotor configuration employs four independent fixed-pitch rigid propellers for both propulsion and control. Each propeller is powered by its own electric motor, symmetrically positioned on each end of a “+” shape. The photos in Fig. 1 show two prototypes of the QTAR UAV that were designed, built, programmed and successfully flown at Curtin University of Technology, Western Australia, in 2005. A demonstration video can be viewed online [10].
2 Current “State of the Art” in Quadrotor UAVs
Triple and quadrotor configurations are the only types of VTOL UAV that employ rotor speed for control. Therefore control is actuated with no extra mechanical complexity, weight penalty or energy losses, commonly associated with swash plates, control surfaces or tail rotors. A triple rotor VTOL UAV, like the Tribelle [5], is the mechanically simplest configuration, however it cannot achieve independent control over roll and yaw moves, as they are coupled. At the time of initiating this project in 2005, the DraganflyerTM by RC Toys [5] was the only commercially available quad-rotor, selling at over $1300 AUD [12]. Many other VTOL UAV researchers have used this platform for their research [1], [2], [11], [3], [7]. In early 2005, the Draganflyer only had rate-gyro feedback for damping its attitude rates and little attitude stabilization or correction capabilities, hence a human operator had to focus much attention on maintaining stability and control. Later in 2005, RC Toys released their Ti (Thermal Intelligence) system. This performs some angular feedback to level out the Draganflyer when no user input is given; however the thermal horizon sensors are only accurate outdoors, at altitudes above the urban canopy [12]. As well as this attitude control limitation, the Draganflyer was limited to only 10 minutes of flight time, a small 1.5:1 thrust to weight ratio and a payload of less than 100 grams. These performance limitations were the key motivators to develop a more powerful quadrotor platform with an attitude control system capable of functioning indoors. Using low-cost commercially available “off the shelf” components, the goals of the QTAR project were to achieve a 2:1 thrust/weight ratio for improved control, flight endurance greater than 15 minutes and a 200 gram payload capacity, enough to carry an onboard wireless camera and additional equipment or sensors. These capabilities would satisfy many VTOL UAV applications.
3 Design of the Propulsion System
Electrical DC motor drives were chosen in preference to Internal Combustion (IC) engines, which are quite noisy and involve high maintenance and operating costs. It was desirable to keep the maximum span of the QTAR within the width of a typical doorway to allow flight transitions into and through buildings. Therefore a propeller diameter of 10" (10 inches, or 254 mm) was selected to maintain a maximum span under 750 mm. Dual-bladed propellers (props) were selected because they have much lower inertia and thus respond faster to thrust command signals than four-bladed props. Two different 10" diameter props, one with an 8" pitch and another with a 4.5" pitch, were compared in tests. It was found that the 4.5" prop was more efficient, as it produced more thrust for the same amount of power. The GWS 380 brushed motor (rated at 70 Watts continuous) with a 5.33:1 gearbox was determined to be suitable for the 10" by 4.5" prop. This was compared with two different types of brushless motors (a gear-boxed in-runner and a direct-drive out-runner). The brushless motors both performed marginally better than the brushed motor; however, the brushed motors were chosen to simplify the controller and minimize costs. The thrust versus voltage (duty cycle) relationship for the brushed motor was close to linear, making simple open-loop speed control possible. The plot in Fig. 2 illustrates the QTAR propulsion performance compared to two other commercially available quadrotor aircraft: the RC Toys DraganflyerTM and the Silverlit X-UFOTM. This data illustrates QTAR’s superior efficiency, while the Draganflyer and X-UFO both had similar, lower thrust/power characteristics. This plot also illustrates the maximum collective thrusts of 510 grams for the X-UFO and 620 grams for the Draganflyer. The QTAR system was capable of producing more than 2 kg
Fig. 2. Quadrotor Propulsion Performance Comparison
of collective thrust. The final QTAR prototype weighs about 450 grams. The energy density of Lithium Polymer (Li-Po) batteries at the time was calculated to be 145 mWh/gram, so the maximum battery capacity while retaining a 2:1 thrust/ weight ratio and carrying a 200 gram payload was 2490 mAh. This gave a theoretical endurance of 18 minutes. Even using the 2100 mAh battery we received from [13], the QTAR achieved flight times greater than 15 minutes.
4 Dynamic Modelling of Attitude
The dynamic attitude model was derived from Newton’s Laws. The gyroscopic procession of each rotor cancels out due to the counter-rotating pairs, which removes any coupling between the pitch and roll dynamics. Due to the low rotor inertia relative to the craft’s rotational inertia, the response of the electric motor was significantly faster than the attitude dynamics, so the motor response was assumed negligible in this model. The total collective thrust, FT , is the sum of all four rotor forces. (subscripts are T = Total, F = Front, B = Back, L = Left, R = Right )
FT = FF + FB + FL + FR
(1)
This collective thrust is nominally equal to the gravitational force when hovering, however, it can be varied by the pilot with the throttle input up to a maximum of 2×Fg, due to the 2:1 thrust/weight ratio. When QTAR is in a stationary hover, FT equals the weight force of the entire aircraft due to gravity.

4.1 Yaw Dynamics
A quadrotor has two sets of counter-rotating propellers, therefore, the net yaw moment generated from aerodynamic drag is cancelled out in neutral flight. This eliminates the need for a tail rotor that normally wastes 12% of the power in a conventional helicopter [4], [9]. Furthermore, a yaw moment is induced on a quadrotor by proportionally varying the speeds of the counter-rotating pairs, as illustrated in Fig. 7. The thrust variation, Vψ, is given by
Vψ ≤ τmax / k  (k = 2 or 4 to avoid motor saturation, τmax = maximum torque)    (2)

From τψ = Iz × ψ̈, where Iz is the mass moment of inertia, the yaw acceleration is

ψ̈ = τψ / Iz  (where ψ = yaw angle)    (3)
Yaw moment is the sum of all rotor torques (CW = Clockwise, CCW = counter CW)
τψ = Στr = τCW − τCCW = (τL + τR) − (τF + τB)    (4)
In Fig. 3, the magnitudes of thrust forces are set so that FL = FR are both larger than FF = FB. The increased drag of the motors with higher thrust will create a net reaction moment that will rotate the body in one yaw direction. Similarly, the body can be rotated in the opposite yaw direction by reversing the relative magnitudes of the above pairs of thrust forces, where the thrusts of FF = FB are greater than the thrusts of FL = FR. Note that during yaw movement of the QTAR τψ ≠ 0 (net torque on the body), i.e. the sum of reaction moments is non-zero. Note that the size of each thrust is proportional to the size of each arrow, where the largest arrow represents a high thrust, the medium sized arrow represents a medium thrust (idling thrust for zero net rise or fall for each motor) and the smallest arrow represents a weak thrust. When the QTAR body is not rising or dropping in altitude, the sum of all thrusts equals the weight force due to gravity. The torque on each rotor, caused by aerodynamic drag, is proportional to the thrust by a scalar constant kτ. Therefore, Eq. (4) becomes
τψ = (kτVψ + kτVψ) − (−kτVψ − kτVψ) = 4kτVψ    (5)
The z-axis “Moment Of Inertia” (MOI) of the QTAR is the sum of all point mass inertias about the z-axis (assuming battery and controller inertia is negligible due to their masses being located predominantly at the “Centre Of Gravity”, or COG).
Iz = Σ Im = 4 mm l²  (where mm is the mass of a single motor and arm)    (6)

Fig. 3. Plus “+” configuration for flight control (size of arrow is proportional to thrust)
Therefore, substituting Eqs. (5) and (6) into (3) gives the equation of motion for yaw acceleration:

ψ̈ = τψ / Iz = 4kτVψ / (4 mm l²) = kτVψ / (mm l²)    (7)
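To connect Eq. (7) to the individual motor commands, the sketch below shows one plausible way the yaw deviation Vψ could be applied to the four rotors of the “+” configuration (left/right pair increased, front/back pair decreased, as in Fig. 3) and the resulting yaw acceleration evaluated. The function names, numerical values and use of floating point are assumptions made for illustration; they are not the QTAR firmware.

#include <stdio.h>

/* Illustrative sketch only: apply a yaw thrust deviation V_psi to the four
 * rotors of the '+' configuration (Fig. 3) and evaluate Eq. (7). */
typedef struct { double front, back, left, right; } thrusts_t;

thrusts_t apply_yaw_deviation(double f_hover, double v_psi)
{
    thrusts_t t;
    t.left  = f_hover / 4.0 + v_psi;   /* CW pair speeds up               */
    t.right = f_hover / 4.0 + v_psi;
    t.front = f_hover / 4.0 - v_psi;   /* CCW pair slows down             */
    t.back  = f_hover / 4.0 - v_psi;
    return t;                          /* collective thrust is unchanged  */
}

/* Eq. (7): yaw acceleration produced by the thrust deviation. */
double yaw_accel(double k_tau, double v_psi, double m_motor, double arm_len)
{
    return k_tau * v_psi / (m_motor * arm_len * arm_len);
}

int main(void)
{
    thrusts_t t = apply_yaw_deviation(4.4, 0.1);   /* example numbers only */
    printf("F=%.2f B=%.2f L=%.2f R=%.2f, yaw accel=%.3f\n",
           t.front, t.back, t.left, t.right,
           yaw_accel(0.02, 0.1, 0.1, 0.3));
    return 0;
}

Because equal and opposite deviations are applied to the counter-rotating pairs, the net lift is unchanged and only the reaction torque imbalance of Eq. (5) remains.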
4.2 Pitch and Roll Dynamics
Due to the symmetrical nature of the quadrotor configuration, pitch and roll can be represented by the same model. Figure 3 illustrates the thrust variations required to induce a moment about the y-axis for rolling. The thrust deviation limit is thus

Vφ,θ ≤ Fmax / k  (k = 2 or 4 to avoid motor saturation, Fmax = maximum force)    (8)
The equation of motion for this pitching or rolling moment is derived from the sum of moments about the y-axis:
Στθ = Iy × θ̈    (9)
The thrust deviation for one motor can be calculated as
Vθ = (FB − FF) / 2    (10)
Therefore the sum of the moments is
Στθ = 2Vθ l    (11)
The y-axis moment of inertia of QTAR is the sum of the two point mass inertias
Iy = Σ Im = 2 mm l²    (12)
We now substitute Eqs. (11) and (12) into (9) to find pitch acceleration.
Στθ = Iy × θ̈
2Vθ l = 2 mm l² × θ̈
θ̈ = 2Vθ l / (2 mm l²) = Vθ / (mm l)    (13)
Due to symmetry of the QTAR body, this also represents pitch dynamics. The dynamic equations discussed so far have treated the QTAR as a flying “+” structure. Alternatively, Professor John Billingsley from the University of Southern Queensland proposed a different control strategy involving the aircraft controlled as a flying “X” structure, whereby pairs of motors are controlled. Figure 4 shows another method for controlling the thrusts of the QTAR. Note that motors “a” and
Fig. 4. “X” configuration for flight control (size of arrow is proportional to thrust)
“b” are at the “front” side, “c” and “d” are at the “back” side, “a” and “c” are on the “left” side and “b” and “d” are on the “right” side of the vehicle (imagine this as a form of “diagonal flying” for the “+” structure in Fig. 3). Either “+” or “X” configurations can be used to control the QTAR successfully. For both configurations, the dynamic equation for vertical altitude acceleration will be the same, but the equation for pitch acceleration will be slightly different due to pairs of motors being controlled for the “X” configuration control method.
5 Attitude Controller Design and Simulation
It is evident from the developed models that the pitch/roll dynamics are linear (if actuator saturation is avoided), time-invariant and 2nd order. Furthermore, aerodynamic drag is assumed to be negligible; therefore this system model has no natural damping, no zeros and only poles at the origin. This means that the open-loop system is unstable without feedback.
Fig. 5. Pitch/Roll Controller Block Diagram
Fig. 6. Yaw Rate Controller Block Diagram
With no natural damping, a proportional-only feedback controller would not adequately stabilise the attitude of the system; rather, the system requires active damping. The pitch/roll controller illustrated in Fig. 5 was implemented. Yaw required a different controller configuration because it was a 1st order system without a global bearing angle reference. The user input was angular rate, which was adequate for remotely piloted control. This control method dampens the yaw rate and thus maintains a relatively constant bearing while the user yaw input is left neutral. The yaw rate controller is illustrated below in Fig. 6. After establishing appropriate attitude controller designs, simulations were performed using MATLAB™ (by Mathworks) to evaluate the dynamic response of these controllers and sensor requirements such as states, ranges and resolutions.
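As a rough sketch of the structure suggested by Figs. 5 and 6 (the exact block diagrams and gains are not reproduced here, so the form, names and values below are assumptions), the pitch/roll loop can be read as proportional feedback on the estimated tilt angle plus rate feedback from the gyro for active damping, while the yaw loop simply damps the measured yaw rate toward the commanded rate:

#include <stdio.h>

/* Assumed sketch of the controller structure implied by Figs. 5 and 6:
 * a proportional-plus-rate (PD-like) law for pitch/roll and a rate-only
 * law for yaw.  Gains are placeholders, not the tuned QTAR values. */
typedef struct {
    double kp;   /* proportional gain on tilt-angle error    */
    double kd;   /* rate gain (active damping from the gyro) */
} tilt_gains_t;

/* Pitch or roll: thrust deviation from angle command, estimate and rate. */
double tilt_control(tilt_gains_t g, double angle_cmd,
                    double angle_est, double rate_gyro)
{
    return g.kp * (angle_cmd - angle_est) - g.kd * rate_gyro;
}

/* Yaw: damp the measured yaw rate toward the pilot's commanded rate. */
double yaw_rate_control(double k_yaw, double rate_cmd, double rate_gyro)
{
    return k_yaw * (rate_cmd - rate_gyro);
}

int main(void)
{
    tilt_gains_t g = { 0.5, 0.1 };                       /* example gains only */
    printf("V_theta = %.3f\n", tilt_control(g, 0.26, 0.0, 0.0));  /* 15 deg step */
    printf("V_psi   = %.3f\n", yaw_rate_control(0.2, 0.0, 0.5));
    return 0;
}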
Fig. 7. QTAR Control Electronics Block Diagram
6 Control Electronics
Each system module illustrated in the block diagram of Fig. 7 has its own removable circuit board or sub-assembly, making the system modular and upgradeable.
6.1 Inertial Measurement Modules
Since beginning the QTAR project, two different quadrotor aircraft have become available with attitude sensing onboard; however, they both have their limitations. The mechanical reference gyro on the SilverlitTM X-UFO is suitable for a toy, but for a UAV it cannot operate for extended periods of time without drifting or becoming unstable. The thermal sensors on the DraganflyerTM only operate outdoors above the urban canopy in Visual Meteorological Conditions (VMC). To avoid these limitations and operate both indoors and outdoors, the QTAR system implemented low-cost inertial sensors and fused their complementary characteristics to estimate attitude in software. Micro Electro-Mechanical Sensor (MEMS) gyroscopes (gyros) measure angular rate/velocity around one axis. Theoretically, a set of three orthogonally mounted gyros could be recursively integrated to continuously track attitude. Unfortunately, sensor bias drift and signal noise behaviour for low-cost gyros make this unrealisable. High-performance ring-laser gyros are far more accurate; however, their cost and weight make them impractical for the QTAR system. The result of integrating (discretely summing) MEMS gyros is an accumulation of the bias and signal noise errors, consequently increasing the uncertainty of the attitude estimation. Without bounding this accumulated error, the estimation becomes
Fig. 8. Electronics Modules Laid Out Prior to Mounting
unstable or unusable. The magnitude of this uncertainty is linearly proportional to the integration time, making these gyro sensors only good for short term estimation (i.e. high frequency performance). The Tokin gyros were used in the QTAR (the metal rectangular prism sensors in Fig. 8) because they were the cheapest angular rate sensors at the time. MEMS accelerometers are implemented to complement the gyros and bound the estimation error. Accelerometers measure both static acceleration due to gravity and dynamic acceleration due to forces on the vehicle. In steady state, a set of three orthogonally mounted accelerometers can accurately measure the pitch and roll tilt angles relative to the gravity vector. In mid flight, they also measure the collective thrust and any external disturbances like wind. Significant acceleration due to gearbox chatter and vibration introduces severe signal noise. At the time of developing the QTAR Inertial Measurement Unit (IMU), the Analog DevicesTM biaxial ADXL202 (±2 g range) accelerometers were the best solution. Two were mounted perpendicularly to sense all three axes. The maximum angles of incline were relatively small (±15°), so a linear approximation was used to relate the horizontal accelerations to the respective tilt angles, thus avoiding complex and time-consuming trigonometric functions in firmware.
7 Attitude Estimation
As mentioned before, integrating the gyros to estimate tilt angle is only accurate for a short period before the estimate eventually drifts. Accelerometer data is not always a precise measurement of tilt but remains stable and bounded over an extended period of time. Therefore, a discrete recursive complementary filter was implemented in software to benefit from both sensor characteristics and estimate the tilt angle states. Since this was being performed on a microcontroller, it was developed using scaled integer arithmetic and without the aid of matrix operations in order to minimise processing time. Figure 9 illustrates the final angular tilt state estimator, including the integer scaling factors used to maintain high accuracy without using floating point arithmetic (the accelerometer output a denotes acceleration). The result of this compensating process is a calculated angle, dominated in the short term by the gyro sensor and bounded over the long term by the accelerometer data, where Kest determines these time scales. Mathematically, this recursive discrete state estimator can be written as
$$\theta_{est} = \left(\theta_{previous} + \Delta\theta_{gyro}\right) + K_{est}\left[\frac{\sum a}{2} - \left(\theta_{previous} + \Delta\theta_{gyro}\right)\right] \tag{16}$$
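A minimal scaled-integer implementation of this recursion might look like the sketch below: the gyro rate is integrated to form the short-term prediction, and the accelerometer-derived angle pulls the estimate back over the long term through Kest. The scale factors and the example Kest value are assumptions for illustration only.

```c
/* Discrete recursive complementary filter following Eq. (16):
 * theta_est = predicted + Kest * (accel_angle - predicted).
 * All scale factors and the Kest value are illustrative assumptions. */
#include <stdint.h>

#define KEST_NUM   1          /* Kest = 1/64, assumed example value */
#define KEST_DEN   64

static int32_t theta_est;     /* estimated tilt, scaled centi-degrees */

/* gyro_rate:   angular rate, scaled centi-degrees per second
 * accel_angle: accelerometer-derived tilt, scaled centi-degrees
 * dt_ms:       sample period in milliseconds (50 ms at 20 Hz)        */
void complementary_update(int32_t gyro_rate, int32_t accel_angle, int32_t dt_ms)
{
    int32_t predicted = theta_est + (gyro_rate * dt_ms) / 1000; /* gyro path  */
    theta_est = predicted
              + (KEST_NUM * (accel_angle - predicted)) / KEST_DEN; /* accel pull */
}
```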
Fig. 9. Tilt Angle State Estimator
This angle estimator was simulated in MATLAB™ using inertial sensor data from flight tests. It was then implemented in firmware on the microcontroller with the scaled integer arithmetic. The experimental angle estimation data plotted in Fig. 10 demonstrates the effectiveness of the estimator in practice.
Fig. 10. Experimental Angle Estimation Data
The test was performed with the motors running to ensure the accelerometers were experiencing typical vibration noise. The plot compares the estimated angle with the uncompensated integrated gyro angle and the raw low-pass filtered accelerometer angle. It can be seen that a steady state error will occur on the gyro integration if it is not bounded by the accelerometers. The estimator also rejects most high frequency accelerometer disturbances while responding faster than the lagging low-pass filtered accelerometer. An adaptive scheme was implemented for the estimator gain Kest in Fig. 9. It was determined that higher rates of change of acceleration (jerk) meant that the accelerometer was predominantly sensing dynamic acceleration. To improve the tilt angle estimation, the estimator gain was adapted to give less credibility to the accelerometer when jerk was high, and more when jerk was low.
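One possible realisation of that jerk-based adaptation is sketched below: when the accelerometer signal is changing quickly (high jerk) it is assumed to be dominated by dynamic acceleration and is trusted less; when jerk is low it is trusted more. The thresholds and gain limits are invented for illustration and are not the values used on QTAR.

```c
/* Jerk-based adaptation of the complementary-filter gain Kest.
 * Thresholds and gain limits are illustrative assumptions.         */
#include <stdint.h>
#include <stdlib.h>

#define KEST_MIN    1      /* least trust in the accelerometer      */
#define KEST_MAX    8      /* most trust in the accelerometer       */
#define JERK_LOW    20     /* assumed thresholds, sensor counts     */
#define JERK_HIGH   200

int32_t adapt_kest(int32_t accel_now, int32_t accel_prev)
{
    int32_t jerk = abs(accel_now - accel_prev);  /* rate of change of accel. */
    if (jerk > JERK_HIGH) return KEST_MIN;
    if (jerk < JERK_LOW)  return KEST_MAX;
    /* linear interpolation between the two limits */
    return KEST_MAX - ((jerk - JERK_LOW) * (KEST_MAX - KEST_MIN))
                      / (JERK_HIGH - JERK_LOW);
}
```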
8 Attitude Controller Implementation
The tilt angle controller gains determined from simulation were first evaluated experimentally with a 15° step input for the tilt command, but the response was underdamped. The rate gain was increased slightly and the proportional gain was lowered for improved performance. After tuning, step responses for tilt angle commands like those shown in Fig. 11 were obtained. With these tuned controller gains the system would no longer overshoot or oscillate; however, greater stability or damping comes at the cost of slower response times. High level code for the QTAR ACS was written to target the Atmel™ AVR ATmega32 8-bit microcontroller using the signals shown in Fig. 12. A 4-channel (2-joystick) radio transmitter was used to send Pulse Position Modulated (PPM) signals for yaw, pitch, thrust and roll to QTAR's 6-channel radio receiver shown in Fig. 8.
Fig. 11. Step Response with Tuned Controller Gains
Fig. 12. Signal flow diagram for the QTAR ACS (Attitude Control System)
9 Conclusions
The QTAR attitude control system successfully estimated and controlled attitude both indoors and outdoors, allowing stable hover and easily controllable omnidirectional flight as described in Fig. 3. To the best of the authors' knowledge at the time of writing, the jerk-based adaptive tilt estimator gain method described in this paper had not been described in previous attitude estimation literature. The final QTAR prototype was capable of carrying a 200 gram payload while maintaining a 2:1 thrust/weight ratio and achieving flight times of around 15–20 minutes. The total cost of parts and materials for the QTAR was about AUD $870, making it suitable for mass production and many lightweight VTOL UAV applications. The authors would like to thank Andre Turner from www.radiocontrolled.com.au [13] for sponsoring the QTAR project.
References
[1] Altug, E., Ostrowski, J. P., et al. (2002). Control of a quadrotor helicopter using visual feedback. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '02).
[2] Altug, E., Ostrowski, J. P., et al. (2003). Quadrotor control using dual camera visual feedback. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '03).
[3] Castillo, P., Dzul, A., et al. (2004). "Real-time stabilization and tracking of a four-rotor mini rotorcraft." IEEE Transactions on Control Systems Technology 12(4): 510–516.
[4] Coleman, C. P. (1997). A Survey of Theoretical and Experimental Coaxial Rotor Aerodynamic Research. Ames Research Center, California.
[5] Dienlin, D. S. and S. Dolch (2002). "TriBelle – The Innovative Helicopter." http://braunmod.de/etribelle.htm.
[6] Innovations, D. (2005). "RC Toys Website." http://www.rctoys.com.
[7] McKerrow, P. (2004). Modelling the Draganflyer four-rotor helicopter. Proceedings of the 2004 IEEE International Conference on Robotics and Automation (ICRA '04).
[8] Microdrones GmbH (2006). http://www.microdrones.com.
[9] Petrosyan, E. (2003). "Aerodynamic Features of Coaxial Configuration Helicopter." Retrieved 2006, from http://www.kamov.ru/market/news/petr11.htm.
[10] Portlock, J. (2005). QTAR: Quad Thrust Aerial Robot 2005 Video. Perth. http://www.youtube.com/watch?v=MLxe3FuQ3v0.
[11] Suter, D., Hamel, T., et al. (2002). Visual servo control using homography estimation for the stabilization of an X4-flyer. Proceedings of the 41st IEEE Conference on Decision and Control.
[12] Taylor, B., Bil, C., et al. (2003). Horizon Sensing Attitude Stabilisation: A VMC Autopilot. 18th International UAV Systems Conference, Bristol, UK.
[13] Turner, A. (2006). "Radio Controlled" website. www.radiocontrolled.com.au.
Project-oriented Low Cost Autonomous Underwater Vehicle with Servo-visual Control for Mechatronics Curricula
C. A. Cruz-Villar¹, V. Parra-Vega², and A. Rodriguez-Angeles¹
¹ Center for Research and Advanced Studies (CINVESTAV I.P.N.), Electrical Engineering Department, Mechatronics Group
² CINVESTAV I.P.N., Saltillo, Robotics and Advanced Manufacture Group
1 Introduction
This paper describes a project-oriented Autonomous Underwater Vehicle (AUV) that integrates three different courses in a Mechatronics MSc academic program. The postgraduate program in Mechatronics at CINVESTAV includes the 64-hour courses Real Time Programming, CAD/CAM/CAE, and Modeling and Simulation of Mechatronics Systems, which are taken simultaneously. Students are normally expected to develop a final project for each course; however, during the first term of 2006 it was proposed to integrate a single final project for the three courses. For mechatronics integration purposes, it was suggested to take a radio control system which could be modified to apply reverse engineering and real time control, as well as advanced modeling techniques. A micro submarine was selected, as it requires a small working area and presents challenging issues such as hydrodynamic effects and under-actuation in some degrees of freedom (Gianluca 2003). Moreover, the remote control unit included with the commercial system can be modified to interface with a PC through the parallel port in order to implement real time control. A major advantage of a radio control system is the autonomy afforded by the absence of cables and physical connections between the controlling and controlled systems; to preserve this autonomy, a servo-visual closed loop system was chosen. Thus the original "human" radio control system was modified to the servo-visual control layout of Fig. 1.
Fig. 1. Human and visual servoing RC systems
The project was presented to the students with the commitment that it be accomplished in 8 weeks. It was divided into several subsystems, and each subsystem was assigned to a group drawn from the 10 enrolled students. The groups were encouraged to be multidisciplinary according to the students' undergraduate backgrounds (electronics, mechatronics, systems and mechanical engineering). A minimum set of goals was also provided, to be fulfilled both for the individual evaluation of each course and for the whole project.
2 General Layout of the Project
The challenges and goals of the project were designed to include the three involved courses. The minimum project goals requested of the students were:
• Reverse engineering of the commercial radio control AUV system to characterize it and be able to propose modifications.
• Modeling, design and validation of the servo-visual control using MATLAB as well as CAD/CAE tools.
• Implementation of a real time controller in C on a Debian GNU/Linux RTAI platform (gcc version 3.3.5-1).
• Modification of the electronics of the AUV to design an interface with the PC.
• Position and orientation regulation of the AUV system in the X-Y plane.

Due to the limited performance of the commercial AUV system and the goals of the project, the students faced the following challenges:
• Image processing to determine the position and orientation of the AUV.
• Under-actuation and limited response and control of the motors of the AUV.
• Limitations of the commercial radio control system provided with the AUV.
• Design of the multi-task real time control implementation.
• Servo-visual problems in fluid environments, such as reflection and distortion.
The project was divided into the following subsystems: AUV, Radio-control and electronics, real time control and visual servoing.
3 AUV Visual Servoing Project Description
The main component of the project is the Sea Scout, a micro radio-controlled submarine from Hobbico™ formed by 53 assembled pieces; see the CAD disassembly of Fig. 2, which was obtained by reverse engineering.
Fig. 2. CAD disassembly of the HobbicoTM, Sea Scout mini-submarine
The Sea Scout has a radio control unit working at 49 MHz, and is provided with one DC motor to generate the up/down motion and two coupled DC motors to provide the left/right and forward/backward motions; thus the AUV is under-actuated. The power electronics in the AUV locks the signals such that only one of the left/right and forward/backward motions can be generated at a time. Another major constraint is that the DC motor power electronics works as an on/off controller, which limits the resolution and accuracy of the closed loop system. Thus, the performance and trajectories are highly constrained. To preserve its water-tightness the AUV was not modified; for project purposes it was modeled and characterized by disassembling a second AUV purchased for reverse engineering.
3.1 Model of the AUV
One of the major goals of the project was to model the AUV. The kinematic and dynamic models of the AUV were used for control design and validation through simulations with MATLAB™ SIMULINK™ and Visual Nastran™. In the course
Modelling and Simulation of Mechatronics Systems, several works related to the modelling and parameter identification of underwater vehicles were reviewed, comparing different approaches and simplification hypotheses, particularly for the modelling of the hydrodynamic effects. In the next subsections the model of a general AUV and the particular model of the Sea Scout AUV are presented.

AUV's General Kinematics and Dynamics Model

A rigid body that moves in 3D space has 6 degrees of freedom denoted by $q = [x\; y\; z\; \phi\; \theta\; \psi]^T$, where x, y and z are the position coordinates of the mass center with respect to a fixed coordinate frame; φ, θ and ψ are the Euler angles, which define the roll-pitch-yaw, see Fig. 3. Notice that the X axis is defined along the movement of the AUV and Z is defined downward, which follows the notation of the SNAME (Society of Naval Architects and Marine Engineers). By considering the Newton–Euler approach, see (Gianluca 2003), (Smallwood et al. 2004), the 3D body dynamics is given by

$$M\ddot q + C(\dot q)\dot q = \tau, \qquad M, C \in \mathbb{R}^{6\times 6},\; \tau \in \mathbb{R}^{6\times 1} \tag{1}$$
where M is the constant, symmetric and positive definite inertia matrix, $C(\dot q)$ is antisymmetric and represents the Coriolis and centripetal forces, and τ is the vector of external forces and torques. However, the AUV, being immersed in a water environment, is subject to hydrodynamic and buoyancy effects (Jordán et al., 2005). For the study of a rigid body moving in an incompressible fluid, an analysis based on the Navier–Stokes equations is considered. Since the density of the fluid is comparable to that of the body, additional inertia effects must be taken into account; for that, an added inertia matrix $M_A > 0$ is introduced, which depends on the geometry of the body and the velocities of the system. This matrix is generally represented as follows

$$M_A = -\operatorname{diag}\{X_x, Y_y, Z_z, K_\phi, M_\theta, N_\psi\} \tag{2}$$

Fig. 3. Generalized coordinates and reference frames
Underwater Vehicle with Servo-visual Control
45
where the entries $X_x, Y_y, Z_z, K_\phi, M_\theta, N_\psi$ depend on the geometry of the particular submarine and its velocities. Due to the fluid there is also a centripetal and Coriolis contribution, represented by the added Coriolis matrix $C_A(\dot q) = -C_A^T(\dot q)$, see (Gianluca 2003):

$$C_A = \begin{bmatrix}
0 & 0 & 0 & 0 & -Z_z\dot z & Y_y\dot y\\
0 & 0 & 0 & Z_z\dot z & 0 & -X_x\dot x\\
0 & 0 & 0 & -Y_y\dot y & X_x\dot x & 0\\
0 & -Z_z\dot z & Y_y\dot y & 0 & -N_\psi\dot\psi & M_\theta\dot\theta\\
Z_z\dot z & 0 & -X_x\dot x & N_\psi\dot\psi & 0 & -K_\phi\dot\phi\\
-Y_y\dot y & X_x\dot x & 0 & -M_\theta\dot\theta & K_\phi\dot\phi & 0
\end{bmatrix}$$
The viscosity of the fluid generates dissipative effects on the AUV, such as hydrodynamic damping and dragging forces, which are represented by a damping matrix $D(\dot q)$ (Jordán et al., 2005). The hydrodynamic damping and dragging forces act against the movement of the AUV and are collinear to the fluid direction. For simplicity only dragging forces are considered; furthermore, the effects are assumed to be decoupled along the degrees of freedom of the AUV, such that the matrix $D(\dot q)$ is diagonal. Since the dragging forces are external, they are included like the vector of external torques τ in the form of a dragging force vector $\tau_D$. Therefore the dynamics of the AUV of Eq. (1) can be rewritten as
$$M\ddot q + C(\dot q)\dot q = \tau + \tau_D \tag{3}$$

where $\tau_D = [\tau_{Dx}\; \tau_{Dy}\; \tau_{Dz}\; \tau_{D\phi}\; \tau_{D\theta}\; \tau_{D\psi}]^T$ with

$$\tau_{Di} = -12\,\frac{A_i\,\mu}{d_i}\,u_i \;-\; \frac{1}{2}\,C_{Di}\,A_i\,\rho\,u_i\left|u_i\right| \tag{4}$$

where $u_i$ is the incidence velocity (Olguin, 1999) in the direction $i = x, y, z, \phi, \theta, \psi$; $A_i$ and $d_i$ are the transversal section and the characteristic dimension perpendicular to $u_i$, respectively. The fluid density is denoted by ρ, $C_{Di}$ is a damping coefficient depending on the Reynolds number, and μ denotes the fluid dynamic viscosity. The hydrodynamic effects depend on the fluid regime and thus on the incidence velocity $u_i$. The first term in (4) corresponds to the laminar regime and the second term to the turbulent regime (Olguin, 1999), one of them being zero depending on the regime. Then, in matrix form, $\tau_D$ is rewritten as the matrix D given by

$$D = -\operatorname{diag}\left[\tau_{Dx}\; \tau_{Dy}\; \tau_{Dz}\; \tau_{D\phi}\; \tau_{D\theta}\; \tau_{D\psi}\right] \tag{5}$$
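As a hedged illustration of Eq. (4), the sketch below evaluates the drag term in C, using the laminar term below an assumed Reynolds-number threshold and the turbulent term above it; the paper only states that one of the two terms is zero depending on the regime, so the specific regime test used here is an assumption.

```c
/* Illustrative evaluation of the drag term of Eq. (4).
 * The laminar/turbulent switch via a Reynolds-number threshold is an
 * assumption for illustration; the paper does not give the criterion.   */
#include <math.h>

double drag_force(double u,    /* incidence velocity u_i            */
                  double A,    /* transversal section A_i [m^2]     */
                  double d,    /* characteristic dimension d_i [m]  */
                  double Cd,   /* damping coefficient C_Di          */
                  double rho,  /* fluid density [kg/m^3]            */
                  double mu)   /* dynamic viscosity [Pa s]          */
{
    double Re = rho * fabs(u) * d / mu;          /* Reynolds number       */
    if (Re < 2000.0)                             /* assumed laminar limit */
        return -12.0 * A * mu * u / d;           /* laminar term of (4)   */
    return -0.5 * Cd * A * rho * u * fabs(u);    /* turbulent term of (4) */
}
```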
Finally, the fluid effects related to buoyancy and gravity of an immersed body have to be taken into account. Considering the gravity acceleration $g = [0\; 0\; 9.81]^T$, it follows that the buoyancy is given by $B = \rho V g$, where V represents the body volume, and the body weight is $W = m g$, where m is the body mass. The gravity forces are external forces included in the same way as the external torques τ, through the vector G(q) given by

$$G(q) = \begin{bmatrix}
(W-B)\sin(\theta)\\
-(W-B)\cos(\theta)\sin(\phi)\\
-(W-B)\cos(\theta)\cos(\phi)\\
-(y_G W - y_B B)\cos(\theta)\cos(\phi) + (z_G W - z_B B)\cos(\theta)\sin(\phi)\\
(z_G W - z_B B)\sin(\theta) + (x_G W - x_B B)\cos(\theta)\cos(\phi)\\
-(x_G W - x_B B)\cos(\theta)\sin(\phi) - (y_G W - y_B B)\sin(\theta)
\end{bmatrix} \tag{6}$$

Thus, the general dynamics (1) transforms into

$$M'\ddot q + C'(\dot q)\dot q + D(\dot q) + G(q) = \tau, \qquad M' = M + M_A,\quad C' = C + C_A \tag{7}$$
Dynamic Model of the Sea Scout AUV

The Sea Scout is limited to three DC motors, which allow the up/down, forward/backward and left/right turn motions, so the allowed movements are highly constrained. Moreover, only one camera is used for visual feedback, such that only X-Y plane movements (horizontal position and yaw orientation) are covered and controlled, i.e. the camera stands over the working area of the AUV, see Fig. 4. Thus, the generalized coordinates become $q = [x\; y\; \psi]^T$. There are some assumptions in the modeling of the Sea Scout. First of all, it has a maximum forward velocity of approximately 0.25 m/s, and its body is completely immersed; thus a laminar regime is considered. For geometry and parameter estimation, a cylindrical approximation of the AUV is taken, see Fig. 5. Finally, since the camera covers the X-Y plane, the Z-motion is left out, so that the AUV
Fig. 4. Generalized coordinates and reference frames
Fig. 5. Cylindrical approximation
works at its neutral buoyancy point, thus neglecting the gravity and buoyancy effects G(q) given by the vector (6).
The control forces acting on the AUV, i.e. τ, are related to the forces exerted by the DC actuators through a Jacobian B (Morel et al. 2003). Because of the construction of the AUV, the two coupled DC motors imply that the left/right turn depends on the backward/forward motors; as a result, the yaw angle ψ and only one motion direction, X or Y, can be controlled at any time t. In this work the X direction is considered independent, and Y is dependent, resulting from a combination of the x and ψ motions. It is thus obtained that the dynamics (7) reduces for the Sea Scout to

$$\begin{bmatrix} m+X_x & 0 & 0\\ 0 & m+Y_y & 0\\ 0 & 0 & I_{zz}+N_\psi \end{bmatrix}\begin{bmatrix}\ddot x\\ \ddot y\\ \ddot\psi\end{bmatrix} + \begin{bmatrix} 0 & 0 & -2m\dot y\\ 0 & 0 & 2m\dot x\\ 2m\dot y & -2m\dot x & 0 \end{bmatrix}\begin{bmatrix}\dot x\\ \dot y\\ \dot\psi\end{bmatrix} + \begin{bmatrix}\tau_{Dx}\dot x\\ \tau_{Dy}\dot y\\ \tau_{D\psi}\dot\psi\end{bmatrix} = B\begin{bmatrix}\tau_x\\ \tau_y\end{bmatrix}$$

where m = 0.065 kg is the AUV mass, and L = 0.118 m and R = 0.019 m are the dimensions of the cylindrical approximation, see Fig. 5. Following (Olguin-Díaz 1999) it is obtained that
$$I_{zz} = \frac{1}{12}m\left(3R^2+L^2\right), \quad X_x = -0.1\,m, \quad Y_y = -\pi\rho R^2 L, \quad N_\psi = -\frac{1}{12}\pi\rho R^2 L^3$$

where $\rho = 1000\ \mathrm{kg/m^3}$ is the water density. The Jacobian B has been determined experimentally and by geometric projections, and it is given by

$$B = \begin{bmatrix} 1 & 0.182\\ -0.153 & 0.116\\ -0.223 & L \end{bmatrix}$$
The dragging forces $\tau_{Dx}$, $\tau_{Dy}$ and $\tau_{D\psi}$ are obtained from Eq. (4), taking into account the geometric dimensions $A_x = \pi R^2$, $d_x = 2R$, $A_y = 2LR$, $d_y = L$, $A_\psi = 2L^3R/8$, $d_\psi = L/2$ and the water dynamic viscosity $\mu = 0.001\ \mathrm{Pa\,s}$.
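The short C program below is only a worked numerical check of the formulas above, plugging in the values quoted in the text (m = 0.065 kg, R = 0.019 m, L = 0.118 m, ρ = 1000 kg/m³); it is not project code and the printed comparisons are approximate.

```c
/* Worked numerical check of the Sea Scout model parameters given above. */
#include <stdio.h>

int main(void)
{
    const double PI = 3.14159265358979;
    const double m = 0.065, R = 0.019, L = 0.118, rho = 1000.0;

    double Izz  = m * (3.0 * R * R + L * L) / 12.0;       /* ~8.1e-5 kg m^2 */
    double Xx   = -0.1 * m;                               /* -0.0065 kg     */
    double Yy   = -PI * rho * R * R * L;                  /* ~-0.134 kg     */
    double Npsi = -PI * rho * R * R * L * L * L / 12.0;   /* ~-1.6e-4       */

    printf("Izz=%g  Xx=%g  Yy=%g  Npsi=%g\n", Izz, Xx, Yy, Npsi);
    return 0;
}
```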
3.2 Radio Control and Electronics System
The commercial AUV includes a 49 MHz amplitude-modulated radio control transmitter based on the TX6 ATS306T integrated circuit, which sends coded 2 Hz PWM signals corresponding to the on/off control of the actuators. The receiver in the AUV works with the RX6 ATS306R integrated circuit, which demodulates the PWM signals for the motor drivers, based on transistors working in the cut-off and saturation regimes. The transmitter was modified to be interfaced to the PC through the parallel port. An amplification stage based on a 2N3866 bipolar transistor and an NE5539N operational amplifier was also implemented to increase the power of the transmitted signal.

3.3 Real Time Control System
At this stage of the curriculum, the students only have a background in linear controller design; thus, it was proposed that PD controllers be implemented for each actuation motor. A strong limitation is that a single motor can be controlled during each 250 ms time interval. Moreover, the on/off limitation of the DC motor power electronics led the students to implement Pulse Width Modulation (PWM) based controllers, which are fully digital in the sense that one PWM output for each motor is obtained as a bit of the parallel port. To obtain a well-synchronized carrier signal, it was decided to use a hard real time system, which was implemented on an RTAI platform. The real time system was designed to include two real-time threads (one for each DC motor controller) and two user tasks, one for image acquisition and another for image processing. The controllers were validated through simulations in SIMULINK (MATLAB) and Visual Nastran. Fig. 6 presents simulation results of the switched PD controllers for the position error and the yaw angle error, respectively.
Fig. 6. x-position [pixels] and yaw angle ψ [rad] errors for a simulated scenario
3.4 Visual Servoing Control
A SONY DFWVL500 CCD camera with 640 × 480 pixels and 8-bit gray scale resolution was used for visual feedback of the position and orientation. This camera was given to the students with UNIX routines based on the libdc1394 library for capturing and processing images; therefore no particular knowledge of visual control was required. Once the image was captured and processed to differentiate the tones, an algorithm to compute the centroid and orientation of the AUV was implemented. The PD control for regulation of the position and orientation of the AUV was programmed in pixel coordinates. The camera covers an area of 86 cm × 64.5 cm, giving a ratio of approximately 7 pixels/cm.
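The paper does not detail the centroid and orientation algorithm; one standard way to obtain both from a thresholded grey-scale frame is via image moments, as sketched below. The threshold value is an assumption, and the frame dimensions follow the camera resolution stated above.

```c
/* Sketch of centroid and orientation extraction via image moments on a
 * thresholded 8-bit grey frame.  The threshold is an assumed parameter;
 * this is one standard approach, not necessarily the students' algorithm. */
#include <math.h>
#include <stdint.h>

#define W 640
#define H 480

/* img: 8-bit grey image, thresh: intensity separating AUV from background.
 * Outputs centroid (cx, cy) in pixels and orientation angle in radians.   */
void auv_pose_from_image(const uint8_t img[H][W], uint8_t thresh,
                         double *cx, double *cy, double *angle)
{
    double m00 = 0, m10 = 0, m01 = 0, mu20 = 0, mu02 = 0, mu11 = 0;

    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            if (img[y][x] > thresh) { m00 += 1; m10 += x; m01 += y; }

    if (m00 == 0) { *cx = *cy = *angle = 0; return; }  /* no target found */

    *cx = m10 / m00;                        /* centroid from first moments */
    *cy = m01 / m00;

    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            if (img[y][x] > thresh) {
                double dx = x - *cx, dy = y - *cy;
                mu20 += dx * dx; mu02 += dy * dy; mu11 += dx * dy;
            }

    /* principal-axis orientation from second-order central moments */
    *angle = 0.5 * atan2(2.0 * mu11, mu20 - mu02);
}
```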
4 Results
Notice that the AUV has singularities at ψ = 90° and 270°, which correspond to the AUV being parallel to the X axis. These singularities were experimentally confirmed and they presented the largest position error for the system: 50 pixels (6.5 cm) in position and 3° in orientation. Far from these singularities the average errors were 25 pixels (3.25 cm) in position and 3° in orientation, which are at the limits of resolution and performance of the servo-visual AUV system. Fig. 7 presents a snapshot composition of an experiment video with initial conditions x(0) = 67 pixels and ψ(0) = 182°. The desired values are xd = 400 pixels and ψd = 20°, while the achieved values are x = 425 pixels and ψ = 39°. These results are highly satisfactory since the goal was to integrate the AUV visual-servoing platform rather than to design a high performance controller, which nonetheless could be considered as an extension for further development of the AUV system. Concerning the original goals of the project, they were fully satisfied since the integration of the AUV, the real time control and the visual servoing was achieved. Furthermore, the students went farther than the original
Fig. 7. Results from an experiment and GUI for the AUV system
goals by introducing an algorithm for determining the minimum error direction when regulating the position of the AUV, and by designing a GUI (graphical user interface) for the system, which is shown in Fig. 7.
5 Conclusions
From the results it is concluded that this project achieved the integration of three courses in a Mechatronics curriculum. The project is low cost and gives the students the opportunity to practice the theory reviewed in the classroom. The students commented that the project was very useful and gave suggestions to improve it, such as modifying the electronics of the AUV to obtain proportional control actions at the DC motors, or even designing and building their own AUV. They also concluded that the servo-visual feedback loop could be applied to other non-invasive autonomous applications such as mobile robots and ship systems.
References
[1] Gianluca A (2003) Underwater robots. Motion and force control of vehicle-manipulator systems. Springer.
[2] Olguin-Díaz E (1999) Modelisation et Commande d'un Systeme Vehicule/Manipulateur Sous-Marin. These pour le grade de Docteur de l'Institut National Polytechnique de Grenoble, Grenoble, France.
[3] Morel Y and Leonessa A (2003) Adaptive Nonlinear Tracking Control of an Underactuated Nonminimum Phase Model of a Marine Vehicle Using Ultimate Boundedness. Proceedings of the 42nd IEEE CDC, Maui, Hawaii, USA.
[4] Jordán MA et al (2005) On-line identification of hydrodynamics in underwater vehicles. Proceedings of the IFAC World Congress, Prague.
[5] Smallwood A and Whitcomb LL (2004) Model-Based Dynamic Positioning of Underwater Robotic Vehicles: Theory and Experiment. Journal of Oceanic Engineering, vol. 29, no. 1, January.
[6] SNAME, Society of Naval Architects and Marine Engineers, http://www.sname.org/.
Coordination in Mechatronic Engineering Work
James Trevelyan
School of Mechanical Engineering, The University of Western Australia
E-mail: [email protected]
1 Abstract
This paper shows that little has been written on the roles that people actually perform in the course of mechatronics engineering work. The paper reports empirical results of interviews with several engineers working in mechatronics: this is part of a larger study of engineering work in several disciplines. The paper argues that coordinating the work of other people is the most significant working role, both in mechatronic engineering and also other engineering disciplines. This role has not been explicitly identified before. While coordination appears to be a generic nontechnical role, close examination reveals that technical knowledge is very important for effective coordination in engineering work. The coordination role is not mentioned in engineering course accreditation criteria. The absence of explicit references to this role in previous literature suggests that the research methods used for this study could provide better guidance for engineering educators on course requirements.
2 Introduction
This paper presents some of our research on what industrial mechatronic engineers really do in their work: part of a larger study of many aspects of engineering work. Most engineering researchers are interested in seeing and learning about new technology and hardware, novel ideas and how these can be applied to solve specific technical problems or used for improving existing processes and work practices. A study of the behaviour of engineers and technicians working in industry might seem to be irrelevant in this context, belonging more in a non-technical conference on psychology or management. Yet researchers rely on engineers and technicians for the application of their ideas, mostly in private firms. Whether we
like it or not, people are an essential part of nearly all engineering processes and work practices. Many engineering researchers work for private firms and research tells us that private industry R&D can be a frustrating career for many (Manners, Steger et al. 1983; Allen and Katz 1995; Lam 1997; Vinck 2003; Lam 2005). Some social scientists have studied this, but it is difficult for them to understand engineering work (the paper reviews some of their results). This paper argues that engineers can gain useful insights into R&D work practices using contemporary social science research methods combined with their personal experience, particularly to understand why, for example, Japanese engineers can be particularly successful with certain mechatronics products. We can also learn how technical experts can be used effectively to maintain high productivity in engineering design and development. Most other engineering researchers are also educators. It is valuable for educators to know something about the work their students will be expected to perform and the environment in which they will do that. First they have a duty to convey this to students. Second, engineering students will be more likely to work harder learning methods they know will be useful in their career. Third, engineering students need to learn how human behaviour constrains engineering processes. It is tempting to draw a line in the sand and dismiss all human behavioural issues as an issue for social scientists, of no concern to engineers and technologists. Many people, especially social scientists, see engineering as applied science. Engineering academics have recently described engineering practice in terms of “specialist technical problem solving” (Sheppard, Colby et al. 2006). Yet we cannot separate people from engineering. Our research shows that young engineers have to work with and through other people and it is human behaviour that constrains their work right from the start of their careers. Given the intrinsic involvement of people in engineering processes, it is surprising that so little attention has been devoted to this in engineering research. One explanation for this gap is that it is difficult for engineering researchers to acquire appropriate research methods and background knowledge. It is possible that many engineering researchers assume that the necessary research is already being done by social scientists. This paper shows that this is not the case: a background knowledge of engineering is just as essential as social science methods for effective research on this issue. Engineering work is largely unknown except by engineers themselves and much of their know how is knowledge that they do not know they have (Polyani 1962). While there are many anecdotal accounts, the research literature has remarkably little to offer that would help an inquiring mind learn more about it. Only a handful of reports have appeared in the last three decades. Mechatronic engineering, as a relatively new discipline, is even less known. Few companies, even now, would admit that they employ mechatronic engineers. Instead they know them as instrumentation engineers, systems engineers, control engineers, automation engineers and several other titles. A recent survey of about 100 research publications (Tilli and Trevelyan 2005) reveals two major groups of reports. Investigations on engineering education have stimulated about half of all the reports on engineering work. The other half has
emerged from researchers who have been interested in engineering management. However, the vast majority of these results were obtained using presumed lists of engineering roles which do not seem to be based on empirical observations (e.g. Deans 1999; Bons and McLay 2003). Only a small number of empirical studies can be relied on, mostly using anthropological approaches based on qualitative research. Most of these were written by social scientists or students with little engineering experience. There are significant industrial problems in mechatronics that this research could help to solve. Recently published data from Scandinavia confirms considerable anecdotal evidence that major industries incur costs ranging between 10% and 50% of turnover resulting from maintenance and operator failures. Research on the roles performed by people in maintenance work is almost non-existent (Trevelyan, Gouws et al. 2006). (Orr 1996) is the only report we have uncovered so far, coincidentally of significant relevance in mechatronics as the focus was photocopier repair technicians.
3 Empirical Research Method
The empirical research followed well-established qualitative research methods used by contemporary social science researchers (e.g. Zussman 1985). Most data came from transcripts of semi-structured interviews performed by the author who has 20 years of engineering work experience in different fields of engineering. The sampling was partly opportunistic and partly purposeful for typical cases and maximum variation. About 50 engineers happily agreed to be interviewed, including several mechatronic engineers with experience ranging from 1.5 to 35 years. Three of the interview subjects were female and most have engineering degree qualifications. Each interview took between one and two hours. Open-ended questions encouraged the respondent to talk about the details of the work he or she performs. Field studies were also part of the survey: a limited number of subjects were shadowed for 1–2 days to test interview data. We use standard ethnographic analysis techniques on interview transcripts, field notes and other reference texts (e.g. Strauss 1987; Patton 1990; Miles and Huberman 1994; Huberman and Miles 2002). Working papers at our web site provide full details on interview questions and analysis methods (http://www.mech.uwa.edu.au/jpt/pes.html).
4 Coordination in Mechatronic Engineering Work
The coordination role emerged from the interview data unexpectedly. Initially we considered formal and informal "supervision" as an engineering work role, and there were two questions in the interview to explore supervision relationships. However, the huge number of references was unexpected. These were mostly one-on-one situations involving a coordination or supervision role. Most references
were in response to questions unrelated to supervision. Finally an insightful firsthand comment about C. Y. O’Connor, the engineer responsible for the Perth to Kalgoorlie pipeline in 1895–1899, led us to understand that obtaining willing cooperation is an important part of this role. “He was neither dogmatic nor arrogant [and] possessed an extraordinary capacity to win the interest and cooperation of men.” (Evans 2001, p 135). Coordination in this context means influencing other people so that they conscientiously perform some necessary work at an agreed time. This usually requires three different interactions. First it is necessary to reach an agreement on what has to be done and when it has to be performed. Usually it is necessary to be present, at least intermittently, while the work is being done to check that the results (perhaps partial) agree with expectations. The final result needs to be checked to make sure no further work or rectification is needed. The coordinator may not know how to do the work, nor even what the final result might look like, and may learn some of this through the coordination process. In other situations the coordinator may need to provide some of the knowledge and help the worker develop necessary skills. Willing cooperation increases the chance that the work will be performed conscientiously with less chance of mistakes and re-work. A small selection of quotes from interviews with mechatronic engineers (or mechatronics related engineering disciplines) is a useful way to illustrate some aspects of the role. Explicit references to particular companies or individuals in quotations from the interviews have been changed to preserve the anonymity of the interview subjects. Quotation 1. A recently graduated mechatronic engineer on his first assignment, working on site with little real authority, and learning about progress reporting: On my previous job I had to work with electricians and technicians and tradesmen. They were reporting to me on the previous job, not on this job. [How many?] Maybe three to five. They also reported to other engineers. For example I had to show them how particular wiring is done. I had to make sure that things are done the way we specified in the cable schedules… and in the design diagrams and carried out in a timely manner. [How did you know how the cabling was supposed to be done?] That was easy. You just follow the cable diagrams. Actually, one of my first tasks was to look at the cable schedules and make some modifications. That sort of got me to know how the cabling is done. It wasn’t that difficult to pick up. The cabling is installed by electricians but the engineer is responsible for making sure the installation is correct. The electricians probably can read the drawings, but the engineer needs to take them through each aspect shown in the drawings and relate the drawings to the actual situation on-site. The engineer needs to know how the cabling should be installed, the meanings of all the symbols used in the drawings (documentation standards and methods), and he also needs to understand how the cabling is part of a larger system so he can understand the function performed by the cables and hence
help to identify any mistakes in the drawings. This is one aspect of technical coordination work, working as a site engineer. He continues…. [And how did you know how long it should take, how could you tell when they were going slowly?] Okay, I was doing a very hands-on job when I started, I was working alongside the electricians, I was working as an electrician. Like a tradesman. I know how long it takes to commission a bus station. In a bus station we have PA systems, the communication system, we have 32 stations along the busways and they are all similar. So once you have done one you know how long each one should take. His initial first hand experience would have involved learning about working methods and techniques from the tradesmen. He may not have reached their proficiency levels, but he has learned how long the work should take under normal conditions. You can also know by the quality of their work. They always make mistakes because some of them just cannot be bothered to follow the schedules. Some of them don’t mark the cables correctly, others do not terminate the cables correctly. You can just go in and find out the quality of their work when you just start turning things on. Sometimes you can spend just as much time fixing up their mistakes. But that was all very valuable experience on how you handle this kind of commissioning work because you learn from their mistakes. Spotting mistakes is frequently mentioned in reports of site engineering experience. Rectifying the mistakes means discussing problems with the tradesmen and arranging for the rework to be done. …you would ask them “Hey, have you done this task?” They would say “yes, yes, it’s fine, it’s all done.” And then it you would go out there and you would find that it is 30% completed. There is very little control over the electricians because they are unionised and they get paid a lot more than us. They are direct employees but because they are governed by union guidelines they are not really controlled or managed. Quotation 2. A highly experienced control systems engineer, commenting on what graduate engineers need to learn early in their career: The business of working with other people. It is amazing how much you don’t know about that. Like the fact that you can’t order people to do things, you have to ask people to do things and they will tell you to get stuffed if they don’t like it. This illustrates the importance of winning willing cooperation. The next sentence reveals that winning cooperation may require some sharing of information… The fact that when you share information there is currency involved and the fact that to the extent that you will be taken to the extent that you will contribute…
He goes on to describe his own view of the engineering process that graduates need to learn. This is useful because it reveals how extensive coordination is needed to make the whole process work. The progressive process of creation, how does it work, how do you progress a vision through to a strategy through to a design through to a plan, how you get the work done, how you control the quality how you control the safety and how you liaise with clients, the whole process. Teaching people about differential equations and thermodynamics….. about finance and that sort of thing, the whole business how you translate a vision into a finished structure and a whole number of processes involved, how you involve other people, how you do it within an economic framework, all these sorts of things, that’s what engineering is about. That is the core of engineering. And, how you do that process and at the same time preserve some magic in it, without changing it into a completely mechanistic process, because there is a lot of magic all the way through. Quotation 3. A recently graduated mechatronic engineer in the building services industry: With sub contractors the person who gets the best service is the one who shouts the loudest. If you’re nice to them you drop to the bottom of the stack. You have to scream and yell. We prefer a different technique. We monopolise each sub contract or so he has no time to work for anyone else. If he tries to work for somebody else he will have to shuffle his own work around and squeeze it in and then he will have to keep on changing it as we change our schedule. Then he will do the job but find he has to go down to Albany to do one of our jobs first. In recent years, there has been a transformation away from large organisations employing all their required labour to a widespread use of small subcontractors, sometimes a single individual. Building a cooperative relationship takes time and we see one approach that combines the advantage of a long term relationship with the flexibility of a contracting arrangement. Most sub contractors are very good. But, there are some guys coming in just for the money. Our reputation and standing is very high. For example, we make sure that our wiring is installed in rigid conduit because it looks good. Flexible conduit looks bad, scrappy. We win work just for that reason alone. That means that we have to make sure that our sub contractors don’t cut corners. Again, technical requirements lie behind this as well: rigid conduit eliminates repeated bending stresses. Quotation 4. A project manager in a small specialist mechatronic engineering company: I had a group of electronic engineers who were doing the electronics and a group of mechanical engineers who were doing the mechanical aspects and a software engineer who was senior and who was waiting until everything was ready so he could do it, so that he could do his software. But, one of the items
that was assigned to a young engineer, which is... he was a graduate, I think he was a second-year graduate and he hasn’t got the sort of experience to say for this isolation you need this and you need the other. So, here we were on the last day and we were actually having a review and we said okay let’s have a look at this equipment, is it going to work? And we found on the last day that it wouldn’t because we were getting these echoes which were wrong from what we were expecting so we had to actually... I had to ask that young engineer to get his senior guy to help him to fix it but then I had to go to my customer who is expecting this kit to sort of arrive and buy ourselves time. So, here on one hand I am at the design level helping this young engineer by saying, look, we got noise, we need some isolation, we need this, sort of screened etc and on the other hand now I need to sort of jump a few ranks not to tell the customer, “Look I have noise etc etc, I can’t do it” but to give an excuse which is plausible which will then gain time and actually it was good because he couldn’t provide the sort of vehicle which we wanted so it was him who said “Look, I can’t get you the vehicle, is it okay if you come in a weeks time instead?” So I said “well, it might be a bit hard...” (laughs heartily) Here the project manager has to coordinate the graduate engineer and provide some guidance on reducing electromagnetic noise in a circuit, and at the same time coordinate a client’s schedule. The client needs an understandable reason for a delivery delay. Quotation 5. An instrumentation project manager working on process plant construction: So, yeah, there aren’t really any direct technical (responsibilities) but there is still the requirement that you have a good working knowledge of everything that’s going on in the project and, to a certain extent, everything that all the other disciplines are doing as well. Because you have got to be able to interface with what everyone else is doing so it’s a lot of coordination, a certain amount of handholding with individuals, once you’ve started getting a group of people together you get all the different personalities, different traits, different people with different strengths and different problems that all have to be addressed. Here the engineer uses the term “hand holding” to describe the second interaction in coordination: intermittent face to face contact to make sure that the work proceeds as expected and to provide guidance and help when needed. The engineer will normally anticipate problems and provide timely advice to help avoid them. Coordination is the most commonly referenced engineering work role in our interviews, both across all engineering disciplines and mechatronics. Qualitative analysis software provides an easy method to count interview references to particular roles. This does not imply a corresponding proportion of work time, but it does provide strong evidence for the significance of coordination roles in engineering work. In the interviews analysed so far, the most commonly referenced role is coordinating people in the same organisation. On average, each interview produced 24 references to technical roles, and 70 references to generic roles in-
Fig. 1. Average number of references per interview to generic, non-technical roles. The average number of references to technical roles was 24
cluding coordination, financial, legal, project and operations management, business development and personal career development. Of the generic non-technical roles, coordinating people dominates as shown in the graph below (Fig. 1). One must be careful not to place too much significance on the numerical results. The numerical values can be strongly influenced by the way the interviews and analysis were conducted. Even though only a small number of interview quotations have been included in this paper, they illustrate several significant factors. First, even recently graduated engineers report significant coordination roles, mostly without real authority. Most respondents mention the importance of maintaining good working relationships and some comment directly on the counterproductive results of resorting to authoritarian or confrontational relationships. This aspect of engineering work does not seem to be mentioned in well-known works on engineering management (e.g. Badawy 1995) except in the context of cross-functional teams. Most reports of coordination work in the interviews did not refer to team work. A good example is coordination with clients, either directly or through staff employed by the client. A further factor is the significance of technical expertise in the coordination role. Again this is illustrated in some of the quotations reproduced above. Technical knowledge can be a significant factor that confers ‘informal authority’ in working relationships. Spotting mistakes in drawings and specifications is an important aspect of site engineering work: this draws on technical expertise. Engineering work is commonly separated into “real engineering” or “hard technical” working roles and “soft skills” including communication. The results of this study tend to support a different view: that coordinating people, gaining their willing cooperation, is the most significant working role for many engineers, and that this role relies on technical knowledge and expertise as much as interpersonal communication skills. The level of interview analysis has been far more detailed than reported in this paper. As well as engineering working roles, we have also analysed many aspects of technical knowledge used in engineering work and we plan to report on this in forthcoming publications.
5 Implications for Engineering Education
Engineering education is traditionally viewed as necessarily technical with “some management” or “soft skills” content. The prevailing social view reflects this and sees engineering as a technical discipline, or as applied science. Some engineering academics view the discipline as mainly specialized technical problem-solving (Sheppard, Colby et al. 2006). The data from this study shows that coordination work is an important aspect in the work of engineers, even at the start of their careers. Engineers devote little of their time to hands-on technical work: that is largely performed by other people. What we see in the evidence is that engineering work is coordinated and driven by engineers, but the end results are delivered through the hands of other people. The strength of this evidence leads one to conclude that engineering needs to be treated as a technical and a social discipline at the same time. Engineering educators rely strongly on accreditation criteria such as the ABET engineering education outcomes (ABET 2005) and Engineers Australia generic attributes (Engineers Australia 2005) to provide guidance in course design. These accreditation criteria are supposed to be based on the attributes that graduates need for engineering work. Neither mentions coordination or gaining willing cooperation specifically. While one might interpret “team work” to mean the same thing, and there are obvious parallels, working in teams is a different experience. Most of the engineer’s coordination work reported in the interview data occurs outside the context of a particular team. There is no doubt that effective communication is required to win willing cooperation and this probably explains why accreditation criteria and job advertisements place strong emphasis on communication skills. However, defining “communication skills” as an educational objective can lead to many different interpretations, possibly not relevant in coordination roles. Powerpoint presentations can be useful on occasions, but will probably not be very helpful in arranging well-timed concrete deliveries at a construction site. We can therefore raise legitimate questions on whether current accreditation criteria mentioning team work and communication skills are promoting effective acquisition of coordination skills in engineering students. Of course, effective coordination relies on a hierarchy of several other groups of more fundamental skills such as interpersonal verbal and non-verbal communication, written communication (verbal and visual), selecting appropriate communication strategies, mentoring, and informal leadership. It also relies on an accurate appreciation of both individual and shared technical knowledge domains and ways to represent technical knowledge in the working context. Traditionally, engineering schools rely on industry advisory committees with experienced engineers to provide guidance on course content and feedback on graduate attributes. The author has many years of experience working with such committees. This research has produced results that have not emerged either from these committees or national engineering accreditation requirements. This research, therefore, shows significant shortcomings in the way that most engineering
schools gain their knowledge of engineering work requirements to design appropriate courses for their graduates.
Acknowledgements

This work would not have been possible without the support of my family, in Australia, the UK and Pakistan. Nor would it have been possible without enthusiastic support from my colleague Sabbia Tilli. Thanks are also due to my faculty colleagues Ruza Ostrogonac and Lee O'Neill and students Leonie Gouws, Sally Male, Adrian Stephan, Ernst Krauss, Emily Tan, Katherine Custodio, Nathan Blight, Tim Maddern and Brad Parks. Finally I need to provide anonymous thanks to all the engineers and others who have contributed, knowingly and unknowingly, through their interview responses, comments, voluntary contributions and suggestions.
References
[1] ABET (2005). Criteria for Accrediting Engineering Programs 2006–2007.
[2] Allen, T. J. and R. Katz (1995). "The project-oriented engineer: A dilemma for human resource management." R&D Management 25(2): 129–140.
[3] Badawy, M. K. (1995). Developing Managerial Skills in Engineers and Scientists: Succeeding as a Technical Manager. Van Nostrand Reinhold.
[4] Bons, W. and A. McLay (2003). Re-engineering Engineering Curricula for Tomorrow's Engineers. 14th Annual Australasian Association for Engineering Education Conference, Melbourne, Australia.
[5] Deans, J. (1999). "The Educational Needs of Graduate Mechanical Engineers in New Zealand." European Journal of Engineering Education 24(2): 151–162.
[6] Engineers Australia (2005). Accreditation Criteria Guidelines. A. Bradley, Engineers Australia. 2006.
[7] Evans, A. G. T. (2001). C. Y. O'Connor: His Life and Legacy. University of Western Australia Press.
[8] Huberman, A. M. and M. B. Miles, Eds. (2002). The Qualitative Researcher's Companion. Thousand Oaks, California, Sage Publications.
[9] Lam, A. (1997). "Embedded Firms, Embedded Knowledge: Problems of Collaboration and Knowledge Transfer in Global Cooperative Ventures." Organization Studies 18(6): 973–996.
[10] Lam, A. (2005). "Work Roles and Careers of R&D Scientists in Network Organisations." Industrial Relations 44(2): 242–275.
[11] Manners, G. E., J. A. Steger, et al. (1983). Motivating Your R&D Staff. In: Managing Professionals in Innovative Organizations. R. Katz. Cambridge, Massachusetts, Ballinger: 19–26.
[12] Miles, M. and A. Huberman (1994). Qualitative Data Analysis: An Expanded Sourcebook. Thousand Oaks, California, Sage Publications.
[13] Orr, J. (1996). Talking About Machines: An Ethnography of a Modern Job. Ithaca, New York, Cornell University Press.
[14] Patton, M. Q. (1990). Qualitative Evaluation and Research. Newbury Park, California, Sage.
[15] Polyani, M. (1962). Personal Knowledge: Towards a Post-Critical Philosophy. New York, Harper Torchbooks.
[16] Sheppard, S., A. Colby, et al. (2006). "What is Engineering Practice?" International Journal of Engineering Education 22(3): 429–438.
[17] Strauss, A. (1987). Qualitative Analysis for Social Scientists. Cambridge University Press.
[18] Tilli, S. and J. Trevelyan (2005). Published Research on the Nature of Engineering Work. School of Mechanical Engineering, The University of Western Australia.
[19] Trevelyan, J. P., L. Gouws, et al. (2006). The nature of engineering maintenance work: a blank space on the map. 2006 World Conference on Engineering Asset Management, Surfer's Paradise, Queensland, Maintenance Engineering Society of Australia.
[20] Vinck, D., Ed. (2003). Everyday Engineering: An Ethnography of Design and Innovation. Inside Technology. Boston, MIT Press.
[21] Zussman, R. (1985). Mechanics of the Middle Class: Work and Politics Among American Engineers. Berkeley, University of California Press.
Vision Techniques
Estimation of distance by means of binocular vision is a well trodden area. Even the use of a single camera with techniques such as visual streaming is well known. However, the authors of the paper from Japan that opens this section have gone one step further. They have implemented their method on an analogue VLSI chip. Indeed, two versions are presented. The first gives a linear motion estimate that is virtually independent of contrast and illumination. The second one estimates motion in a spatially coarser manner, favoring certain directions as it mimics the receptive fields of the retina. To put the methods to a practical test, the sensors have been used in the RoboCup robot soccer competition. Some Hong Kong research concerns the use of a genetic algorithm for determining shapes. The 'particle swarm optimisation' technique can identify contours within the image of an object; the research aims to reduce the number of iterations required for convergence. The following article is inspired by altogether more practical considerations. Autonomous operations such as spray painting or sandblasting need a 3D vision system. The work is set in the context of an Austrian firm that specialises in spray painting. It concerns the recognition of the identity and orientation of a part from a library of parts. The paper includes results from numerous practical tests. What sort of vision system will enable your mobile robot to follow you around? A paper from San Diego investigates special applications of techniques such as frame-differencing, color thresholding, region growing and shape identification. The aim is to cause the robot to follow at a more-or-less constant distance with no other source of information. Video of the system in action was shown when the paper was presented. An Australian paper concerns the tracking of position and orientation of a head in real time. Rather than indulge in computationally intensive methods of feature recognition, the authors have taken the simple expedient of attaching LED beacons to the head, mounted on a special pair of spectacles. A single camera can provide the data for all the necessary calculations.
A Vision System for Depth Perception that Uses Inertial Sensing and Motion Parallax
Vlatko Bečanović and Xue-Bing Wang
Kyushu Institute of Technology, 2-4 Hibikino, Wakamatsu, Kitakyushu, Japan
1 Introduction
There are several ways for a vision system to obtain depth information from a complex visual scene. Some extract absolute distance information and others only relative information from the scene. Commonly used methods that give absolute distance measurements are triangulation, suitable for medium to far distances, and binocular disparity used in stereo vision, which is a robust method for determining distance information at close range. Other cues that are commonly used, but that only give relative distance information, are scale, perspective, precedence of non-occlusion etc. There is however another important cue for distance, or depth perception, not as commonly used as stereo vision, and that can reveal depth information from a monocular motion field. It is referred to as structure from motion or kinetic depth effect when the object that is observed is moving. When the observer is moving it is referred to as motion parallax (Mallot 1998, p 201). The former gives rise to local depth information and the latter can determine distances to objects. There is however an important requirement for determining absolute distances to objects, i.e. that the motion of the observer needs to be known. The motion of the observer will in our case correspond to the measured inertial change of the observer. The distance to an object point can then be obtained by augmenting the additional inertial cue, i.e. the observer motion, with the perceived motion of any object point present in the scene (Huster 2003, Karri et al. 2005). In a practical scenario the object points belong to image structures of anything from object contours, corners, to local textured regions, e.g. patches of the scene that can be distinguished from the background. An experiment was performed where a static high contrast object was the primary target and the observer was accelerated with the help of gravity. The experimental data was compared with a Monte-Carlo simulation of the algorithm in order to help us build a noise distribution model of the system dynamics. The hardware setup in our experiments consisted of an inertial sensor, a high contrast imager and a sliding pad mounted on a rail. The design of the particular vision
sensor was inspired by the early vision processing stages of the vertebrate retina (Kameda et al. 2003). This sensor calculates a contrast enhanced input image together with a temporal change output based on difference images, i.e. it is a brain-inspired neuromorphic sensor. We plan to implement this approach in an embedded system that consists of hardware based on the Intel PXA27x processor described in previous work (Misawa 2006, Bečanović et al. 2006).
2 Experimental Set-up
The experiment is performed by letting the vision system fall along a sliding rail with a constant acceleration of up to 9.4±0.4 m/s². The sliding rail is equipped with a braking mechanism that dampens the fall after an initial period of acceleration, in order to protect the vision system from being damaged during the deceleration (cf. Fig. 1). Data was collected continuously during each experimental trial, which consisted of a trajectory of up to seven position estimates, giving up to five perceived acceleration estimates at 20 Hz. The response of the vision sensor was recorded along a single row for the two parallel images provided, the contrast enhanced (sustained) image and the difference image, as presented in Figs. 2 and 3. The contrast enhanced image improves object centroid estimation; the temporal image is not used at present. The observer acceleration is measured simultaneously at a higher update rate and then down-sampled (low-pass filtered) to fit the update rate of our imager. A characteristic example of the inertial information corresponding to the image sequence is presented in Fig. 4. The inertial sensor used is presently considered to be one of the smallest commercially available inertial sensors, measuring only 5.6 × 5.6 × 1.4 mm and giving inertial measurements in the range from –2G to 2G. Priced at a reasonable 25 USD per piece and consuming only about 5 mW, it is our preferred choice for performing inertial measurements¹. It is a true 3D sensor, although in our measurements only the component in the direction of motion is considered.

Fig. 1. The vision system mounted on the sliding rail. A view of the experimental setting can be seen in the background

Fig. 2. Sustained image output along a single row. The parameter xC is the image centroid position as calculated from the corresponding 2D image

Fig. 3. Difference image output where bright to dark edges give a positive, and dark to bright edges a negative, contribution (cf. Fig. 2)
¹ cf. model: HAAM-313B, HDK Hokuriku company web page: http://www.hdk.co.jp/
Fig. 4. The inertial sensor output for the transversal movements in vertical (x-), horizontal (y-) and depth (z-) direction. The period of constant acceleration is marked with an arrow
3 Distance from Motion Parallax
There are several ways to obtain depth information. Motion parallax has the advantage that the depth to objects can be determined using monocular vision only. If the observer motion is unknown, only relative depth can be estimated; but if the observer motion is known, the absolute distance (as seen by the observer) can be estimated. If the velocity of the observer is known, the distance to static objects can be determined. If the acceleration of the observer is also known, the distance can be estimated to objects moving with constant velocity; that is, if the observer motion is known to a higher derivative than the object motion, then the absolute distance to the object can be estimated. It should be noted that only translational motion contributes to the motion parallax effect; the rotational component of the observer motion does not contribute at all. The distance algorithm is derived here for the special case where the observer has constant acceleration in the plane perpendicular to the optical axis (which corresponds to our inertial sensor measurements) and where the object is moving with constant velocity or is static (the motion derivative is assumed to be constant). The depth formula can then be derived for an object point that is moving along a line in the x-y plane perpendicular to the optical axis of the observer, i.e. the z-axis (cf. Fig. 5). By assuming that the thin lens approximation holds:

$$\frac{1}{f} = \frac{1}{z} + \frac{1}{z'} \;\Rightarrow\; z' = \frac{zf}{z-f} = M(z)\,z \tag{1}$$

Fig. 5. Thin lens derivation of the distance from motion parallax formula
and $M(z) = \frac{f}{z-f}$, then for an object at distance $z = d$ the perceived object point is projected as:

$$(x', y') = \big(M(d)\,x,\; M(d)\,y\big) \tag{2}$$
where the prime indicates coordinates as perceived on the focal plane image. The relative velocity in the direction of the observer motion is Δv = v_obs + v_0 and the acceleration difference is Δa = a_obs + a_0, where a_0 and v_0 are components of object motion and a_obs and v_obs are components of observer motion, all parallel to the direction of observer motion. The parameters v_obs and a_0 are assumed to be zero, i.e. v_obs = a_0 = 0, meaning that the initial velocity of the accelerating observer is zero and that the perceived object is moving with constant velocity. The relative velocity to the object can be obtained by integration of the relative acceleration:

$$\Delta v(t) = \int_0^t \Delta a(t')\,dt' = a_{obs}\,t + v_0 \tag{3}$$
The relative object position along the direction of observer motion, s(t), is a linear path in the x-y plane, i.e. s(t) := ||(x(t), y(t))||. It can be obtained by integration of the velocity difference:

$$s(t) = \int_0^t \Delta v(t')\,dt' = \tfrac{1}{2} a_{obs}\,t^2 + v_0 t + s_0 \tag{4}$$
where the perceived position (at the focal plane) can be calculated from (4) by using the relation (2) so that:
$$s_p(t) = M_d \left( \tfrac{1}{2} a_{obs}\,t^2 + v_0 t + s_0 \right) \tag{5}$$
The acceleration as it is perceived by the observer can be obtained by first calculating the velocities at the two instances t + Δt and t + 2Δt:
$$v_p(t+\Delta t) \approx \frac{s_p(t+\Delta t) - s_p(t)}{\Delta t} = M_d \left( \tfrac{1}{2} a_{obs}(2t - \Delta t) + v_0 \right) \tag{6}$$

$$v_p(t+2\Delta t) \approx \frac{s_p(t+2\Delta t) - s_p(t+\Delta t)}{\Delta t} = M_d \left( \tfrac{1}{2} a_{obs}(2t + \Delta t) + v_0 \right) \tag{7}$$
Then, by using the perceived velocities at t + Δt and t + 2Δt the perceived acceleration can be calculated as:
$$a_p(t+2\Delta t) \approx \frac{v_p(t+2\Delta t) - v_p(t+\Delta t)}{\Delta t} = M_d\, a_{obs} \tag{8}$$
which in turn can be written as:
$$a_p = \frac{f}{d-f}\, a_{obs} \tag{9}$$
and finally the distance as perceived by the observer can be obtained by solving for d so that:
$$d = \frac{a_{obs}}{a_p} f + f \approx \frac{a_{obs}}{a_p} f \tag{10}$$
The perceived distance is thus obtained as a quotient between the inertially and optically perceived accelerations. Both are measured quantities that introduce considerable imprecision into the estimate. The perceived acceleration is especially critical for distant objects, i.e. when a_p « 1, since d ∝ 1/a_p.
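To make the chain from Eqs. (5)–(10) concrete, the following minimal Python sketch (an illustration, not the authors' implementation; the focal length, frame rate and object distance in the usage example are hypothetical) forms the finite differences of the perceived position and recovers the distance from the ratio of accelerations.

```python
import numpy as np

def distance_from_parallax(s_p, dt, a_obs, f):
    """Estimate distance from three consecutive perceived positions s_p
    (metres on the focal plane), the frame interval dt (s), the measured
    observer acceleration a_obs (m/s^2) and the focal length f (m)."""
    s_p = np.asarray(s_p, dtype=float)
    # Perceived velocities, cf. Eqs. (6) and (7)
    v1 = (s_p[1] - s_p[0]) / dt
    v2 = (s_p[2] - s_p[1]) / dt
    # Perceived acceleration, Eq. (8)
    a_p = (v2 - v1) / dt
    # Distance, Eq. (10); the trailing +f is negligible for d >> f
    return a_obs / a_p * f + f

# Hypothetical example: 4 mm lens, 20 Hz frame rate, free fall (9.4 m/s^2),
# static object at 2.0 m.
f, dt, a_obs, d_true = 4e-3, 0.05, 9.4, 2.0
M_d = f / (d_true - f)                   # magnification, cf. Eq. (9)
t = np.array([0.0, dt, 2 * dt])
s_true = M_d * 0.5 * a_obs * t ** 2      # Eq. (5) with v0 = s0 = 0
print(distance_from_parallax(s_true, dt, a_obs, f))   # ~2.0 m
```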
4 Results from Simulation and Experiment
The first task was to simulate the algorithm in order to validate its performance for our inertial and vision sensors. The algorithm was simulated with a Monte-Carlo approach where two kinds of errors were introduced: the inertial measurement error, estimated to be ±0.4 m/s², and the perceived pixel error, estimated to be ±0.223 mm, i.e. half the inter-pixel distance of 0.445 mm. Both errors were simulated as normally distributed noise sources, and a quantization artefact was modelled for the resolution of 40 pixels, which is the focal plane resolution of our vision sensor prototype in the direction of object motion. The distance estimate was improved by the use of a Kalman filter, and the corresponding model matrix F, the state vector X and the mixture matrix H were selected as follows:
$$F = \begin{pmatrix} 1 & \Delta t & \Delta t^2/2 & 0 \\ 0 & 1 & \Delta t & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \quad X(k) = \begin{pmatrix} s_p(t_k) \\ v_p(t_{k+1}) \\ a_p(t_{k+2}) \\ a_{obs}(t_{k+2}) \end{pmatrix}, \quad H = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \varepsilon & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{11}$$
where the iteration step was tk = t0 + kΔt and ε « 1. Note that n observations correspond to k + 2 position estimates, since the acceleration is calculated from the position estimate. The simulated results are shown in Fig. 6 where the single observation points are presented together with the corresponding Kalman and mean estimates after 3 observations.
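A bare-bones predict/update cycle using the F and H of Eq. (11) could look as follows. This is only a sketch: the process and measurement noise covariances Q and R, and the initial state, are not specified in the paper and the values below are assumptions.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter with the
    state-transition matrix F and measurement matrix H of Eq. (11)."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with measurement z
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

dt, eps = 0.05, 1e-3
F = np.array([[1, dt, dt**2 / 2, 0],
              [0, 1,  dt,        0],
              [0, 0,  1,         0],
              [0, 0,  0,         1]])
H = np.diag([1.0, 1.0, eps, 1.0])
Q = np.eye(4) * 1e-4                       # assumed process noise
R = np.diag([2e-4, 1e-2, 1e-2, 0.4**2])    # assumed measurement noise
```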
Fig. 6. Distance estimates for a simulated run, where individual estimates, Kalman estimates and averages over observations in each trajectory are shown
There is a rich literature describing the Kalman filter and a more detailed description will not be given here. A good review of the Kalman filter, together with sample code, can be found in the book by Grewal and Andrews (Grewal et al. 2001). Experimental data were obtained for 10 runs at each distance, comprising trajectories of up to five observations for objects at far distances and down to a single observation at close range. This was expected, since the field of view decreases at close range and the perceived object motion is high. This makes the error in the experimental data decrease with greater distance, since at far distances the imprecision due to low perceived velocity (close to sub-pixel) is compensated by the increased number of observation points. At close range there are very few observations, which instead increases the spread of the estimates. On average, the relative error of the Kalman estimate is about 3% for distances greater than 2.0 m, 6% for distances between 1.0 and 2.0 m, and up to 15% at closer range. The experimental data is presented in Fig. 7 and Table 1 for a freely falling observer with an acceleration of 9.4±0.4 m/s². The experiment was repeated twice with decreasing observer accelerations of 7.3±1.0 m/s² and 6.2±1.3 m/s², respectively. The experimental set-up was not deemed appropriate for lower accelerations because of too high friction between the sliding pad and the rail at low inclinations, cf. Figs. 8 and 9.
Fig. 7. Distance estimates for the experimental data. The number of observations per trajectory increases with distance. Estimates at close range might only have a single observation

Table 1

d [m]   n *2   a_p [cm/s²]   σ_a [cm/s²]   d_p [m]   σ_d [m]   σ_d/d_i *3
0.5     1.1    21.8          4.41          0.55      0.08      0.12
0.7     2.3    16.9          2.66          0.68      0.05      0.06
0.9     2.8    12.7          3.13          0.94      0.14      0.15
1.1     3.5    11.0          2.90          1.11      0.14      0.06
1.3     3.9    10.1          3.37          1.25      0.13      0.06
1.5     3.3    8.29          2.26          1.50      0.10      0.02
1.7     3.7    6.34          1.82          2.07      0.20      0.05
1.9     3.9    5.79          1.19          2.07      0.15      0.06
2.1     4.2    5.42          0.88          2.15      0.12      0.03
2.3     4.1    4.89          0.54          2.32      0.08      0.03
2.5     4.6    4.53          0.63          2.52      0.06      0.02
3.0     4.1    3.68          0.73          3.16      0.11      0.04

*2 Average number of observations along unique trajectories.
*3 Relative error obtained from standard deviation.
Fig. 8. Distance estimates for an observer accelerated with 7.3±1.0 m/s2
Fig. 9. Distance estimates for an observer accelerated with 6.2±1.3 m/s2
Fig. 10. Relative error as a function of distance for an observer accelerated with 9.4±0.4 m/s2 (freely falling), 7.3±1.0 m/s2 (falling at 30 degrees inclination) and 6.2±1.3 m/s2 (falling at 45 degrees inclination) respectively
For lower accelerations the uncertainty increases somewhat at farther distances (greater than 2.0 m), as can be seen in the result presented in Fig. 10. At close distances the best result was obtained with an observer acceleration of 7.3±1.0 m/s² (sliding pad falling at 30 degrees inclination). Thus the result seems to depend not only on the acceleration but also on the actual inclination of the sliding rail. This could be explained as an effect of non-idealities in the experimental set-up. The friction and vibrations introduced when the pad slides along the rail are highly non-linear and give effects not accounted for in the simple 1D geometry and dynamics of our model. This is something that needs to be addressed in future designs. Nevertheless, the simple approach used at present gives surprisingly accurate results, which shows that it is a fairly robust principle that deserves to be considered in vision systems, especially real-time systems that utilize motion information and where several sensing strategies are combined, e.g. inertial, visual, etc.
5 Conclusions
A monocular optical distance sensing system was investigated that was based on an algorithm that utilizes motion parallax. The algorithm was suitable for low precision hardware and showed promising results in simulation and experiments.
The simulation showed that the distance could be robustly calculated when the rotational component of the observer motion was very small, and this was experimentally confirmed using a relatively ideal experimental set-up. Most probably this would not be the case in a real application, e.g. on a wheeled robot or vehicle, so the efficiency of the method would need to be validated further, especially under the influence of rotational motion. It would also be advantageous to complement this method with other distance sensing strategies for when the observer is static, because in this case the system would be blind in the sense that no distance cues can be calculated. All in all, exploiting the mechanisms of motion parallax by fusing visual and inertial cues could prove to be an efficient way to sense the depth in a scene with lower cost, less weight and lower power consumption compared to other alternatives.
Acknowledgements

This work was supported by the Kyushu Institute of Technology 21st Century Center of Excellence Program, financed by the Ministry of Education, Culture, Science and Technology in Japan, which is gratefully acknowledged.
References

[1] Bečanović, V., Matsuo, T., Stocker, A.A. (2006), An embedded vision system based on an analog VLSI optical flow vision sensor, ICS 1291: Brain-Inspired IT II, pp. 249–252.
[2] Grewal, M.S., Andrews, P.S. (2001), Kalman Filtering Theory and Practice using Matlab, John Wiley and Sons Inc.
[3] Huster, A. (2003), Relative Position Sensing by Fusing Monocular Vision and Inertial Rate Sensors, PhD Dissertation, Stanford University, USA.
[4] Kameda, S., Yagi, T. (2003), A silicon retina system that calculates direction of motion, ISCAS, pp. 792–795.
[5] Karri, S.S., Titus, A.H. (2005), Toward an analog very large scale integration system for perceiving depth through motion parallax, Optical Engineering, vol. 44, no. 5.
[6] Mallot, H.A. (1998), Computational Vision – Information Processing in Perception and Visual Behavior, MIT Press (2000), 2nd Ed (translated by J.S. Allen).
[7] Misawa, H. (2006), An embedded vision system based on analog VLSI motion sensors, COE Report, Kyushu Institute of Technology, Japan.
[8] Wang, X.B. (2006), Determining distance from motion parallax with a silicon retina, Master thesis, Kyushu Institute of Technology, Japan.
Rate Shape Identification Based on Particle Swarm Optimization
P.W.M. Tsang and T.Y.Y. Yuen
Department of Electronic Engineering, City University of Hong Kong, Hong Kong
1 Introduction
When a near planar object is viewed from different directions with a camera placed sufficiently far away, its images can be mathematically related with the Affine Transformation as given by
$$\begin{bmatrix} d_{si}(x) \\ d_{si}(y) \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} d_{ri}(x) \\ d_{ri}(y) \end{bmatrix} + \begin{bmatrix} e \\ f \end{bmatrix} \tag{1}$$
where (d_si(x), d_si(y)) and (d_ri(x), d_ri(y)) are the co-ordinates of pixels in the scene object O_scene and reference object O_ref, respectively. The set of parameters A = {a, b, c, d, e, f} are the coefficients of the Affine Transformation relating the two images. As a result, matching of object shapes can be encapsulated as a process of searching for the existence of a mapping between the pixels of each pair of images to be compared. This can be achieved by defining an objective function E as the area of the non-overlapping region between the scene and the transformed reference object images, followed by a search algorithm to locate the Affine Transform that gives the smallest value of E (the global minimum) in the vector space.

The effectiveness of this simple approach is severely limited by the enormous amount of time taken to locate the correct state (if it exists) in the six-dimensional search space governed by the transform parameters. In addition, as there are numerous local minima in the objective function, fast searching algorithms based on Hill Climbing or Gradient Descent are not applicable, as the search path can easily be trapped in sub-optimal solutions. To overcome this drawback, attempts have been made using Genetic Algorithms to determine the optimal state in the vast search space. Amongst these techniques, [1] has exhibited satisfactory performance in identifying isolated object shapes and well defined contours. The method is also applicable to shape alignment [2] as well as matching of broken contours [3].
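As an illustration of this search formulation, the sketch below (helper names and the rasterisation on a fixed binary grid are assumptions of this sketch, not the method of [1]) applies a candidate affine transform to the reference contour and evaluates the objective E as the non-overlapping area against the scene contour.

```python
import numpy as np

def affine_transform(points, a, b, c, d, e, f):
    """Apply Eq. (1) to an (N, 2) array of contour points."""
    A = np.array([[a, b], [c, d]])
    return points @ A.T + np.array([e, f])

def rasterise(points, shape):
    """Mark contour points on a binary image grid (nearest pixel)."""
    img = np.zeros(shape, dtype=bool)
    ij = np.clip(np.rint(points).astype(int), 0, np.array(shape) - 1)
    img[ij[:, 1], ij[:, 0]] = True
    return img

def non_overlap_area(ref, scene, params, shape=(256, 256)):
    """Objective E: area of the symmetric difference between the scene
    contour and the affine-transformed reference contour."""
    ref_img = rasterise(affine_transform(ref, *params), shape)
    scene_img = rasterise(scene, shape)
    return np.logical_xor(ref_img, scene_img).sum()
```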
An important assumption made in [1] is that some of the chromosomes in the initial population should carry partial characteristics of the optimal descendant. By exchanging and modifying these useful qualities through crossover, mutation and reproduction, the majority of members in the population will finally evolve to a state which exhibits a high fitness value for a given objective function. Despite the success of this method, little has been said on how the algorithm behaves if the entire population is ill-formed (i.e., there is no genetic information in the population that is related to the targeted solution). Experimental results reveal that the evolution, which then depends heavily on mutation, can be very long if the initial states are far away from the optimal solution in the search space. In the extreme case, the process will fail to converge or become trapped in a local minimum with moderate fitness values. A straightforward approach is to repeat the genetic search several times with different initial populations [4]. However, application of this simple method increases the computation loading significantly. An alternative method is to build an initial population with individuals that exhibit fitness values above a pre-defined threshold. Although this selection criterion may be effective in other engineering applications, it is not always effective in object shape matching, where the search space is unpredictable (depending on the pair of images to be matched), highly non-linear and generally contains a large number of local minima. A chromosome with a large fitness value does not necessarily imply the presence of genes that are related to the optimal solution. Recently, the above problem has been partially alleviated by integrating the "Migrant Principle" [5][6] into the genetic algorithm. However, the average number of generations required to accomplish the task is considerably higher than with the original scheme, and a complete success rate is not achieved in general.

In this paper we propose a novel technique to overcome the above problems with the use of Particle Swarm Optimization (PSO). Experimental evaluations show that, for the majority of cases, our method is capable of identifying contours of the same object within a small number of iterations. Based on this important finding we have further enhanced the method by conducting multiple trials on a reduced evolution timeframe. The revised algorithm has demonstrated a 100% success rate with only a slight increase in computation time. The organization of the paper is as follows. In section 2 a brief summary of the Affine Invariant Object Matching scheme reported in [1] is outlined. Next, the basic principle of PSO is described in section 3. Our method of applying PSO to object shape matching is presented in section 4. This is followed by a report on the experimental results and a conclusion summarizing the essential findings.
2 Outline of Genetic Algorithm Based Affine Invariant Object Matching
Details of the genetic algorithm for object shape matching have been described in [1] and only a brief outline will be given in this paper. Given a reference contour Oref = {pr1, pr2, ....prM} and a scene contour Oscene = {ps1, ps2, ....psM} the task of the genetic search was to determine whether an Affine Transform existed that maps
one contour to the other. The algorithm is summarized as follows. To start with, three seed points S = [s1, s2, s3] are selected on the reference contour. An initial population Pop of I individuals is generated, each formed by a triplet of randomly selected test points T = [t1, t2, t3] on the scene contour. Each seed point or test point is represented in the form of an index with reference to a fixed start point on the corresponding object contour. The pair of triplet point sets defines an Affine Transform mapping S to T, which is applied to the reference contour to give a normalized contour. The fitness value of the chromosome is determined by the fraction of overlap between the normalized contour and Oscene. Individuals from the initial population are selected, with probabilities according to their fitness values, into a mating pool in which they are allowed to crossover or mutate to produce the next generation of offspring. The reproduction cycle is repeated until a number of epochs have elapsed or the maximum fitness value in the population has exceeded a predefined threshold. If Oref and Oscene belong to the same object, a set of test points corresponding to S can be found which defines an Affine Transform mapping the majority of points from Oref to Oscene, resulting in a maximum fitness value that is close to unity.
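The affine transform defined by a seed/test triplet pair can be recovered by solving a small linear system; the sketch below is a generic illustration of that step and is not taken from [1].

```python
import numpy as np

def affine_from_triplets(S, T):
    """Solve for A = {a, b, c, d, e, f} in Eq. (1) given three seed points S
    (reference contour) and three test points T (scene contour), each a
    (3, 2) array of (x, y) coordinates."""
    S, T = np.asarray(S, float), np.asarray(T, float)
    # Each point correspondence gives two linear equations in the six unknowns.
    M = np.zeros((6, 6))
    rhs = np.zeros(6)
    for k, ((xr, yr), (xs, ys)) in enumerate(zip(S, T)):
        M[2 * k]     = [xr, yr, 0, 0, 1, 0]   # x_s = a*x_r + b*y_r + e
        M[2 * k + 1] = [0, 0, xr, yr, 0, 1]   # y_s = c*x_r + d*y_r + f
        rhs[2 * k], rhs[2 * k + 1] = xs, ys
    a, b, c, d, e, f = np.linalg.solve(M, rhs)
    return a, b, c, d, e, f
```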
3 Particle Swarm Optimization (PSO)
Particle Swarm Optimization (PSO) was first initiated by Kennedy and Eberhart [7][8] to study and model the social behaviour of a population of individuals. In their theory, they suggest that when individuals are grouped together in a transparent community, each member will tend to improve its condition by learning from past experience and following the trend of other successful candidates. This postulation is found to provide a good description of the movement of flocks of birds, and could also be extended to unravel solutions to many real world problems. The generalization can be explained as follows. Suppose a problem is represented by an objective function f(x0, x1, …, xN−1) with N independent variables, defining a multidimensional space with each point corresponding to a state of the problem. The task is to deduce the set of variables that constitutes the best solution, reflected by the highest (in certain cases the lowest) value of the objective function. When a group of individuals is placed in the space, each of them will be located at a position that may score differently from the others. Since the individuals all have the urge to migrate to better positions, they will move around with the hope that one or more of them will finally reach the best point in the space. In the paradigm of Particle Swarm, each individual conducts this kind of search by referring to the results achieved by its neighbours and also by the population at large. The success of this kind of collaborative effort has instigated the development of PSO for determining global solutions to hard problems in science and engineering applications. PSO is similar to SGA in many ways. A problem is modelled as an objective function defined by a set of variables, forming a multi-dimensional parametric
space. A state (or position) in the latter is encoded into a mathematical expression such as a binary string. Initially, a swarm of individuals known as particles, each representing a state of the problem, is established. The population is then evolved into the next generation through the repositioning of its individuals. The process repeats until a certain number of epochs has elapsed, or until the objective function value of the best candidate has exceeded a threshold indicating successful location of the optimal solution. Evolution of the population from one generation to the next is attained in two major steps. First, suppose $\vec{x}_i(n)$ denotes the position of the i-th particle at the n-th generation; its velocity $\vec{v}_i(n)$ is given by

$$\vec{v}_i(n) = w \times \vec{v}_i(n) + c_1 \times rand(1) \times \big[pbest[i] - \vec{x}_i(n)\big] + c_2 \times rand(1) \times \big[pbest[G_{best}] - \vec{x}_i(n)\big] \tag{2}$$

where

$\vec{v}_i(n) \in [-V_{max}, V_{max}]$ is the velocity,
$\vec{x}_i(n) \in [0, X_{max}]$ is the current position of the particle,
$w \in [w_{min}, w_{max}]$ is the time-varying inertia weight,
rand(1) is a random real number in the range [0, 1],
pbest[i] is the best position attained so far by particle i,
$G_{best}$ is the particle that has attained the best position $pbest[G_{best}]$ in the whole swarm.

Second, the velocity determines the direction and displacement of where the particle will be relocated in the next instance, as given by

$$\vec{x}_i(n+1) = \vec{x}_i(n) + \vec{v}_i(n) \tag{3}$$
A new child population is generated after all the parent individuals have migrated to their new positions. It can be seen from Eq. (2) that we have adopted the “Global Variant” of PSO. In this approach, each particle will move towards its best previous position and towards the best particle in the entire swarm.
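A compact global-variant PSO loop following Eqs. (2) and (3) is sketched below. It is an illustrative implementation only: the fitness function is a placeholder, the position range x_max and the per-particle handling of rand(1) are assumptions, and the time-varying inertia weight is taken as a linear decrease from w_max to w_min in the spirit of [9].

```python
import numpy as np

def pso(fitness, dim, m=30, generations=100,
        v_max=20.0, x_max=255.0, c1=4.0, c2=4.0,
        w_max=1.4, w_min=0.9, target=0.65, rng=None):
    """Global-variant PSO following Eqs. (2) and (3).  `fitness` maps a
    particle position (e.g. a triplet of contour indices) to [0, 1]."""
    rng = rng or np.random.default_rng()
    x = rng.uniform(0, x_max, (m, dim))
    v = rng.uniform(-v_max, v_max, (m, dim))
    pbest = x.copy()
    pbest_fit = np.array([fitness(p) for p in x])
    for n in range(generations):
        w = w_max - (w_max - w_min) * n / generations   # time-varying inertia
        g = np.argmax(pbest_fit)                        # global best particle
        if pbest_fit[g] >= target:
            break
        r1, r2 = rng.random((2, m, 1))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (pbest[g] - x)  # Eq. (2)
        v = np.clip(v, -v_max, v_max)
        x = np.clip(x + v, 0, x_max)                                  # Eq. (3)
        fit = np.array([fitness(p) for p in x])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = x[improved], fit[improved]
    g = np.argmax(pbest_fit)
    return pbest[g], pbest_fit[g]
```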
4 Proposed Method: Affine Invariant Object Matching Based on PSO
Our method is based on the same matching scheme as in [1], but with the Genetic Algorithm replaced by PSO. An initial population P0 of M particles is established, each defining a randomly selected triplet of test points. Following the terminology in section 3, we have

$$P_n = \{\vec{x}_0(n), \vec{x}_1(n), \vec{x}_2(n), \ldots, \vec{x}_{M-1}(n)\} \tag{4}$$
The parameter n denotes the epoch of evolution, which is set to 0 for the initial population. Subsequent generations are established by repeated application of Eqs. (2) and (3). The pair of objects is considered matched if the maximum fitness value in the population attains a value of 0.65, reflecting a similarity of no less than 65%. If the fitness value fails to reach the threshold within the maximum allowable number of generations, the objects are considered different. The expression in (2), though simple, involves a number of variables that have to be preset properly to effectuate the optimization. To begin with, we have adopted the time-varying inertia weight suggested in [9] to induce global to local exploration. As for the rest of the unknown parameters, we have conducted repetitive trials in matching different pairs of object contours and arrived at a set of values that gives favourable results, as listed in Table 1.

Table 1. Parameter settings for the PSO scheme

Vmax                            20
Vmin                            –20
c1                              4
c2                              4
wmin                            0.9
wmax                            1.4
Population size (M)             30
Maximum allowable generation    100
Experimental results reveal that our method is capable of identifying matched object shapes within 30 generations for about 90% of test cases (i.e. a success rate of 90%). For the remaining samples, the PSO generally fails to converge within the maximum allowable number of iterations. The success rate is similar to that of the scheme employing the Migrant Principle [6], but the computation time is significantly shortened. In view of the above observation, we have further enhanced the method by conducting multiple trials on a reduced evolution timeframe. The maximum allowable iteration count is set to 30 instead of 100. For the object shapes that fail to match after 30 generations, the PSO is repeated up to two more times. Following the analysis in [4], the success rate is given by:
$$\big[1 - (0.1)^3\big] \times 100\% = 99.9\% \tag{5}$$
The revised algorithm has demonstrated almost 100% success rate. In the best case, a pair of object shapes could be identified in 30 generations or less which is significantly faster than that attained in [6]. A slight decrease in computation time is also noted under the worst condition when all three repeated trials (a maximum of 90 epochs) are required.
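The multiple-trial strategy amounts to a simple wrapper around the reduced-length PSO run; a minimal sketch is given below (the function names are hypothetical, and `run_pso_once` stands for one 30-generation search returning whether the 0.65 threshold was reached).

```python
def match_with_retries(run_pso_once, max_trials=3):
    """Repeat a 30-generation PSO run up to `max_trials` times, declaring a
    match as soon as one trial reaches the fitness threshold."""
    fitness = 0.0
    for trial in range(max_trials):
        matched, fitness = run_pso_once()
        if matched:
            return True, fitness, trial + 1
    return False, fitness, max_trials

# With an assumed 90% per-trial success rate, three trials give the overall
# rate of Eq. (5):
p_single = 0.9
print(1 - (1 - p_single) ** 3)   # 0.999
```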
5 Experimental results
The performance of the proposed scheme in Affine invariant object recognition, as compared with the Migrant Principle, was illustrated with the use of four model object contours: a scissors, a spanner, a wrench and a hammer. Two Affine transformed scene contours are associated with each model. The models together with their pairs of transformed contours are shown in Figs. 1a to 1d. For a fair comparison, all the contours adopted in the experiments are identical to those employed in [6].
Fig. 1a. Model and Transformed Scissors Contours
Fig. 1b. Model and Transformed Spanner Contours
Rate Shape Identification Based on Particle Swarm Optimization
83
Fig. 1c. Model and Transformed Wrench Contours
Fig. 1d. Model and Transformed Hammer Contours
The method reported in [6] is employed to match each model contour to its pair of scene contours with the settings given in Table 2.

Table 2. Setting of the Genetic Algorithm

Maximum generations             100
Total population size (P)       30
Mutation rate (m)               0.2
Condition of successful match   Fitness ≥ 0.65
Number of trials                30
The number of successful matches for each model contour after 30 trials is shown in the third row of Table 3. It shows that the success rate is high but less than 100% for the contours Sci-A, Sci-B and Span-A. The results obtained with the proposed scheme after 30 trials are shown in the fourth row of the same table. It can be seen that a 100% success rate is achieved for all cases with the same population size.
Table 3. Successful Match results based on [6] and the proposed scheme

Model contour                       Scissors        Spanner          Wrench          Hammer
Scene contour                       Sci-A   Sci-B   Span-A   Span-B  Wre-A   Wre-B   Ham-A   Ham-B
Migrant Principle (P=30, m=0.2)     27      29      28       30      30      30      30      30
Proposed Scheme (P=30)              30      30      30       30      30      30      30      30
At first sight it seems that the improvement is small. However, most of the successful matches in our scheme are attained in fewer than 30 generations, whereas in [6] the same conclusion could rarely be reached within 60 generations.
6 Conclusion
Recently it has been demonstrated that a Genetic Algorithm, with the incorporation of the Migrant Principle, can attain a good success rate in matching pairs of object contours grabbed from different viewpoints. In this paper we have explored the feasibility of further enhancing this benchmark with the use of Particle Swarm Optimization. Our initial finding was that, under the same population size and maximum allowable generations, both approaches have similar performance. However, we also observed that in the majority of cases PSO could identify a pair of matched contours within a considerably smaller number of iterations. Based on this important finding we further enhanced the method by conducting multiple trials on a reduced evolution timeframe. The revised algorithm demonstrated a 100% success rate with a reasonable reduction in computation load compared with the work reported in [6]. At present we have not explored further usage of PSO in other contexts of contour matching. However, we believe that our favourable findings could be taken as a useful basis for overcoming more complicated problems in computer vision.
References

[1] P.W.M. Tsang, "A Genetic Algorithm for Affine Invariant Object Shape Recognition", Proc. Instn. Mech. Engrs., vol. 211, part 1, pp. 385–392, 1997.
[2] P.W.M. Tsang, "A Genetic Algorithm for Aligning Object Shapes", Image and Vision Computing, 15, Elsevier, pp. 819–831, 1997.
[3] P.W.M. Tsang, "A genetic algorithm for invariant recognition of object shapes from broken boundaries", Pattern Recognition Letters, vol. 18, issue 7, pp. 631–639, 1997.
[4] S.Y. Yuen, "Guaranteeing the Probability of Success using Repeated Runs of Genetic Algorithm", Imag. Vis. Comput., 2001, 19, pp. 551–560.
[5] P.W.M. Tsang (2001), "A Novel Approach to Reduce the Effects of Initial Population on Simple Genetic Algorithm", PDPTA '2001, Las Vegas, pp. 457–462.
[6] P.W.M. Tsang, "Enhancement of a Genetic Algorithm for Affine Invariant Planar Object Shape Matching using the Migrant Principle", IEE Proc. Vis. Image Sig. Process., vol. 150, no. 2, pp. 107–113, April 2003.
[7] J. Kennedy and R.C. Eberhart, "Particle Swarm Optimization", Proc. IEEE Int. Conf. on Neural Networks, pp. 1942–1948, Perth, 1995.
[8] J. Kennedy, "The particle swarm: Social adaptation of knowledge", Proc. Int. Conf. on Evol. Comput., Indianapolis, pp. 303–308, 1997.
[9] Y. Shi and R.C. Eberhart (1998), "Parameter selection in particle swarm optimization", in Evolutionary Programming VII: Proc. EP98, New York: Springer-Verlag, pp. 591–600.
Advanced 3D Imaging Technology for Autonomous Manufacturing Systems
A. Pichler, H. Bauer, C. Eberst, C. Heindl, J. Minichberger PROFACTOR Research, Im Stadtgut A2, 4407 Steyr-Gleink, Austria. {apichl,hbauer,cebers,cheindl,jminich}@profactor.at
1 Introduction
Today’s markets and economies are becoming increasingly volatile and unpredictable; they are changing radically and even the speed of innovation is accelerating. Manufacturing and production technology and systems must keep pace with this trend. The impact of novel 3D imaging technology in countering these radical changes is shown here using the robot paint process as an example. One has to keep in mind that investments in automatic painting lines are considerably high and, as the painting line often is the bottleneck in production, it is imperative to prevent non-productive times and maximize the use of the expensive equipment. Highly flexible, scalable and user-friendly production equipment is needed, including robotic systems for painting – a common process in production. The presented work argues that an intelligent 3D imaging system is mandatory in order to step towards an autonomous painting system, or production systems in general. As shop-floor scenes (in the context of computer vision) in industrial environments are of high complexity, traditional vision systems are constrained to well-defined tasks and lack adaptation and flexibility. To counter this issue a highly dynamic 3D vision system is required. The paper gives an overview of a novel vision system used for manufacturing applications.

In contrast to high-level uncertainties (incl. non-stable and non-deterministic production disturbances such as rush orders) that challenge scheduling algorithms within the (digital) factory, low-level uncertainties can largely be managed locally at the equipment by incorporation of sensory information. While vision-based control to compensate small pose deviations (usually in 2D or 2 1/2D) and calibration errors is state of the art, large deviations in object shape or 3D/6DOF pose corrections for complex objects are beyond it. The range image processing to compensate for missing data or pose uncertainties described in this paper includes segmentation, feature extraction and recognition/localisation. Related work on segmentation is presented in [9], and on finding features with defined geometric
properties in [3,16]. Recognition and localisation of objects in 3D based on range images has been described in [20]. Related approaches to compensating large uncertainties using planning based on sensory data have been presented in the original “FlexPaint” project [1], in the “Fibrescope” project (Flexible inspection of Bores with a robotic Endoscope) and in [21], where a sensory approach to deal with uncertainties in turbine-blade repair was introduced. There are numerous 3D object recognition methods that are either global, like eigenpictures [17] or eigenshapes [4], or that rely on an initial segmentation of the object [8, 5]. Those methods obtain good results on noise-free images, but their reliance on global properties makes them vulnerable to occlusions. A more generic approach to the object recognition problem is posed by spin images [12], which have been shown to yield striking results with cluttered or occluded objects. As this method is based on finding correspondences between image and model regions, it is rather time intensive, though. [11] gives a good overview of current global and local approaches on range images.

The first part of the paper describes an autonomous robot painting system and its evaluation in industrial manufacturing lines. The second part describes the 3D imaging workflow integrating adaptive vision, object recognition and localization, and demonstrates its impact on compensating for incomplete data and disturbances that are induced by the industrial environment.
2 System Overview
The main concept is to derive the robot process programming (in this paper, for the painting process) automatically, even if the product is unspecified [19, 23]. The proposed robot system analyses sensor data of the products to be painted. The system processes a chain of subsequent modules, as seen in Fig. 1: (i) 3D image processing, (ii) process planning and (iii) generation of collision-free robot motions. As the scene in the task space is unknown, 3D sensors are integrated to build a 3D world model. A novel 3D imaging system has been developed combining 3D sensors, object modeling and data analysis to create a 3D model of a robot working area. Next, the idea of the employed paint process planning is to link geometries to a process model. This link is established with a flexibility that takes into account that the precise painting strategy mapped to the geometries may vary from customer to customer. Scheduling of individual strokes follows specific criteria, such as cycle time [23]. Feature-specific paint strokes for the feature-sets and sequences are converted to collision-free tool and robot motions and complemented with air motions to form a full speed-optimized paint application. Automatically generated executable programs are first simulated and then executed on the paint robots.
Fig. 1. System overview of sensor-based robot painting system
Fig. 2. Integrated 3D sensor cell and products hooked up in conveyor system (left); automatic paint process planning result showing paint strokes generated on retrieved 3D world model (right)
3 Inline 3D Image Processing

3.1 Overview
The proposed system combines 3D sensors, object modeling and data analysis to create a 3D model of a robot working area. This paradigm, known as task space scene analysis, provides a much richer understanding of complex, interior work environments than that gleaned from conventional 2-D camera images, allowing a process planning tool to plan and simulate robotic actions within the virtual world model. In order to retrieve a world model, the 3D imaging system proceeds as follows (Fig. 3). Range images of the products, scanned during their transport on the conveyor, are taken using 3D laser range scanners. To retrieve most of the object geometry several sensor systems are necessary. Alignment of the data into a common world coordinate system is achieved using a 3D registration method [11]. A meshing algorithm transforms the aligned point data fragments into a surface model. In a final step the reconstructed data is analysed by a 3D object recognition procedure. Resorting to a product database containing prepared CAD data, the system identifies product models in the reconstructed sensor data. Additionally, the algorithm provides accurate retrieval of the corresponding position and orientation of the products.

Fig. 3. Flow Diagram 3D Image Processing
3.2 3D Scanning Process
In response to small product lot sizes and to compensate for uncertainties (position uncertainties, non-stable and non-deterministic production disturbances such as rush orders), a highly flexible sensor cell has been integrated into the production line. As the dimensions and shape complexity of products vary over a large scale, an adaptive approach to setting the viewpoints of the sensors has been chosen. Figure 4 shows a standard setup of the proposed system. Products are carried on a conveyor system which has a defined process flow direction. The scanning process is carried out iteratively. Starting from a standard position, all systems are triggered to take initial scans. Aligning the output to a common world coordinate system and transforming it into a surface model reveals boundary points or holes in the data, which indicate missing data points in the original data set (a sketch of detecting such boundary regions is given below Fig. 4). These positions are fed back into the loop to the scanning process. As each sensor system is equipped with its own kinematics, all systems are repositioned to new scanning positions to compensate for incomplete surface models.
Fig. 4. Inline 3D sensor cell; the setup is made up of 5 sensor systems, each of which is capable of reconfiguring its kinematic configuration in order to adapt to the product geometry
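One generic way to obtain the fed-back positions is to look for boundary edges of the reconstructed mesh, i.e. edges referenced by only one triangle, since holes and open boundaries are delimited by such edges. The sketch below only illustrates this idea and is not the system's actual implementation.

```python
from collections import Counter
import numpy as np

def boundary_points(vertices, triangles):
    """Return vertices lying on holes or open boundaries of a triangle mesh.
    `vertices` is an (N, 3) array; `triangles` is an (M, 3) array of vertex
    indices.  An edge that belongs to exactly one triangle is a boundary edge."""
    edge_count = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            edge_count[tuple(sorted((u, v)))] += 1
    boundary_idx = {i for edge, n in edge_count.items() if n == 1 for i in edge}
    return vertices[sorted(boundary_idx)]
```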
3.3 Registration Process
Given the correct sensor calibration (intrinsic and extrinsic parameters) resulting from the geometric configuration of the laser plane and sensor, the sensor data is transformed to 3D coordinates in a local sensor coordinate system. In order to give the sensor systems a certain freedom to adapt themselves to the product geometry, a simple robot kinematics has been adapted. Using forward kinematics and a registration algorithm for fine alignment, such as the Iterative Closest Point (ICP) algorithm, a perfectly aligned data set can thus be retrieved.
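For illustration, a bare-bones point-to-point ICP iteration (nearest neighbours plus the SVD-based rigid alignment) is sketched below; production systems typically add outlier rejection and convergence tests, and the use of SciPy's kd-tree here is purely an implementation convenience, not a statement about the system's actual code.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iters=30):
    """Bare-bones point-to-point ICP: returns (R, t) aligning `source`
    to `target` (both (N, 3) arrays of 3D points)."""
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(target)
    src = source.copy()
    for _ in range(iters):
        _, idx = tree.query(src)               # nearest neighbours in target
        matched = target[idx]
        mu_s, mu_t = src.mean(0), matched.mean(0)
        H = (src - mu_s).T @ (matched - mu_t)  # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])
        R_step = Vt.T @ D @ U.T                # optimal rotation (Kabsch)
        t_step = mu_t - R_step @ mu_s
        src = src @ R_step.T + t_step
        R, t = R_step @ R, R_step @ t + t_step
    return R, t
```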
3.4 Reconstruction Process
Having an aligned 3D point cloud, one of the major issues is to find a realistic topological structure. That means finding a transformation from point data to a surface representation. Numerous approaches have been developed over the last decades. The marching triangles algorithm [24] gives excellent meshing results on fragmented data at low computational cost. The idea is to polygonize the implicit surface by growing a triangulated mesh according to the local geometry and topology.
3.5 Object Recognition Process
Recognition of three dimensional objects in noisy and cluttered scenes is a challenging problem. A successful approach in past research has been the use of regional shape descriptors. Given a reconstructed 3D surface model of the scene produced by a set of 3D sensors, the goal is to identify objects in the scene (in this case, e.g. products in the production line) by comparing them to a set of candidate objects (product CAD database). Several issues make the process of 3D recognition challenging: (1) the target object is often obscured due to self-occlusion or occluded by other objects; (2) close objects in the scene act as background clutter and thus interfere with the recognition process; (3) many objects (from the same product spectrum) are very similar in dimension and shape; and (4) laser range scanners have limited resolution to produce detailed discriminative features. Generally the recognition task is considered as a matching task between two surfaces. The proposed 3D object recognition scheme is based on regional shape descriptors (e.g. spin images, shape context, point signatures) which do not impose a parametric representation on the data, so they are able to represent surfaces of general shape. The identification of products in the task space of a production line requires knowledge of the product CAD data. This kind of representational data, stored in the model library, must be prepared for the recognition task. In an offline step all models are re-meshed to the same mesh resolution. Shape descriptor signatures of all objects are established and the most distinctive ones of each object are stored in the model library.
Finding correspondences using the correlation coefficient is computationally expensive, and therefore a different way of managing the information conveyed by shape descriptor signatures is needed. In order to make the process of matching efficient, dimensionality reduction was achieved by projecting shape signatures, represented as n-tuples, to a space of dimension d < n, using principal component analysis (PCA). Shape descriptor signatures provide a comfortable way of handling generic 3D data. Unfortunately, even with PCA-compressed data, matching becomes an extensive task when dealing with hundreds of 3D models, as is common in industry. Furthermore, regional shape descriptor signatures are quite redundant and tend to provide ambivalent results. With these known issues in mind, the algorithm has been extended by a distinctive point detection algorithm to find the most representative shape descriptor signatures of a 3D model. The approach measures the retrieval performance of all shape descriptor signatures of a model against all remaining models in the database. Identified distinctive shape descriptor signatures showing high retrieval performance to the correct model are selected.

A common way of determining similarity between surfaces is the Mahalanobis distance, which is used to calculate the likelihood of each descriptor. Shape descriptor signatures of a model are assigned distinctive if their similarity is least likely to the models in the database. As the likelihood is computed with respect to all shape descriptors in an entire database, similarity measures are accumulated in a histogram. Basically, distinctive features are the outliers of the histogram. Histograms of the similarity measure will typically have a single mode corresponding to similar shape descriptor signature matches. Distinctive shape descriptor signature matches will be upper outliers in the histogram of the similarity measure. For well behaved distributions (i.e., single mode), a standard statistical way of detecting outliers is to determine the fourth spread of the histogram (fs = upper fourth – lower fourth = median of largest N/2 measurements – median of smallest N/2 measurements) [25]. Statistically, moderate outliers are 1.5fs units above (below) the upper (lower) fourth, and extreme outliers are 3fs units above (below) the upper (lower) fourth. Figure 5 shows a histogram of distinctiveness measures. Four extreme outliers (points with similarity measure greater than 3fs above the upper fourth) are detected. Through detection of extreme outliers in the histogram of distinctiveness measures, an automatic method for finding distinctive shape descriptor signatures has been created.

The actual recognition process is carried out in an online step. The database contains a set of models, each of which is represented by a vector of n distinctive shape descriptor signatures. Recognition consists of finding the model which corresponds to a given scene, that is, the model which is most similar to the scene. Randomly selected points are taken from the scene and nearest neighbours are evaluated in the model database. If the distance of a model to a scene point is below a threshold, the corresponding model gets a vote. The idea of the voting algorithm is to sum the number of times each model is selected. The model that is selected most often is considered to be the best match.
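The outlier test on the histogram of similarity measures, and the subsequent voting, can be summarised by the following sketch. It is an illustration only: plain Euclidean distances stand in for the Mahalanobis distance used in the paper, and the data layout (one signature vector per row) is an assumption.

```python
import numpy as np

def distinctive_signatures(similarities, factor=3.0):
    """Indices of descriptors whose similarity measure is an extreme upper
    outlier, using the fourth spread fs [25]."""
    s = np.sort(np.asarray(similarities, float))
    lower_fourth = np.median(s[: len(s) // 2])
    upper_fourth = np.median(s[-(len(s) // 2):])
    fs = upper_fourth - lower_fourth
    return np.where(np.asarray(similarities) > upper_fourth + factor * fs)[0]

def vote_for_models(scene_descriptors, model_descriptors, threshold):
    """Count a vote for every model whose nearest distinctive descriptor lies
    within `threshold` of a sampled scene descriptor; return the best match."""
    votes = {name: 0 for name in model_descriptors}
    for d in scene_descriptors:
        for name, descs in model_descriptors.items():
            if np.min(np.linalg.norm(descs - d, axis=1)) < threshold:
                votes[name] += 1
    return max(votes, key=votes.get), votes
```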
In a verification step all selected models are tested against the scene by matching labeled scene points to the corresponding models using Iterative Closest Point [11]. The Hausdorff distance between scene and model points has been used as a measure of the quality of the recognition result.

Fig. 5. Distinctiveness computation: distinctiveness histogram (left), distinctive points on 3D model (right)

Fig. 6. Flow diagram 3D object recognition process
4 Experiments
The system has been tested on a variety of industrial use cases. The database contains 200 industrial parts, mainly from the automotive sector. Figure 7 shows an excerpt of the database. In an offline stage all models have been prepared to a homogeneous mesh resolution and the distinctive points of every model have been retrieved. Experiments have been carried out on scenes where objects are hooked up in a conveyor system, which is common in industry. Two representative experiments are shown in the paper. The results obtained from experiments 1 and 2 are listed in Table 1 and graphically presented in Fig. 8.
Fig. 7. Excerpt of model database (200 CAD models of industrial parts)
Fig. 8. Recognition result: scene 1 (left) and scene 2 (right); reconstructed data (green, solid) and superimposed recognized CAD model (grey, wireframe)

Table 1. Experimental results of 3D recognition performance

Exp.   Object   Model Overlap   Match Correlation   Recognition Time [ms]
1      1        0.41            0.93                954
1      2        0.46            0.92                865
1      3        0.51            0.95                856
1      4        0.45            0.94                765
1      5        0.67            0.92                967
1      6        0.53            0.93                841
1      7        0.59            0.96                745
1      8        0.65            0.94                618
1      9        0.61            0.93                831
2      1        0.42            0.98                432
2      2        0.35            0.97                218
2      3        0.49            0.99                342
The scenes feature the typical amount of clutter and background noise. They have been processed as described in the sections above. The experiments consist of acquiring 3D sensor data, running the recognition algorithm on the scenes and then measuring the recognition rate in each scene. The first experiment was carried out on a scene with 11 objects. Figure 8 (left) shows the simultaneous recognition of 9 models from the library of 200 models. The reconstructed mesh is visualized as a solid and the superimposed CAD models are displayed as wire-frame models. The scene addresses a few interesting problems in object recognition. Part of the conveyor system is visible in the scene and is handled as background noise. There are obscured objects due to occlusion and self-occlusion. Basically, the recognition algorithm performs very well. In Table 1 the recognition rate for the detected objects is above 90%. The recognition time for each of the objects is less than 1 second on a Pentium 4, 3 GHz, 1 GB RAM. The recognition algorithm failed when only a small portion of the model was visible. The second experiment shows a smaller scene. The recognition rate for all objects is above 95%. Again, recognition time is below 500 ms for each object. As can be seen from the results, the system shows excellent performance in an industrial setup. This major step towards more robustness, stability and minimal computation time makes the system applicable for industrial applications.
5 Conclusion
In this paper a novel 3D imaging system has been proposed for integration into a production system on the shop floor. It enables inline 3D recognition and localization of the products to be processed. Having a world model of the products in the task space, sophisticated process-specific planning systems are used to generate robot motions. The developed 3D imaging workflow comprising adaptive sensing, reconstruction and recognition is tightly coupled to compensate for uncertainties and disturbances, which are ubiquitous in an industrial surrounding. To handle arbitrary object shapes a highly flexible 3D sensor cell has been integrated. The sensor systems adapt to the product geometry based on initial scans to cover most of the object geometry. The computation of distinctive shape descriptor signatures contributes to more robustness and lower computation costs in the recognition process. Furthermore, it assigns unique complex feature points to CAD models, getting rid of redundant data. Robust 3D object recognition and pose estimation contributes to the realization of small lot-size robotic applications where part or scene complexity is very high. With a flexible 3D imaging system paired with automatic process planners, the approach has proven to be a promising alternative to conventional teach-in and OLP based programming and has proven feasible especially for high-variant, low-volume parts. Additionally, it contributes to the realization of autonomous manufacturing systems.
The 3D imaging system has been integrated into several robotic paint lines in North America and Europe. Future work will focus on classification of 3D models, allowing recognition of new product variants on the basis of prototypical product shapes. Furthermore, other fields of automatic process planning (inspection, sand blasting) will be investigated.
References

[1] “Flexpaint.” [Online]. Available: www.flexpaint.org.
[2] Autere, “Resource allocation between path planning algorithms using meta a*,” in ISRA, 1998.
[3] N.W.C. Robertson, R.B. Fisher and A. Ashbrook, “Finding machined artifacts in complex range data surfaces,” in Proc. ACDM2000, 2000.
[4] R.J. Campbell and P.J. Flynn, “Eigenshapes for 3D object recognition in range data,” pp. 505–510. [Online]. Available: citeseer.ist.psu.edu/137290.html.
[5] O. Camps, C. Huang, and T. Kanungo, “Hierarchical organization of appearance-based parts and relations for object recognition,” 1998. [Online]. Available: citeseer.ist.psu.edu/camps98hierarchical.html
[6] E. Freund, D. Rokossa, and J. Rossmann, “Process-oriented approach to an efficient off-line programming of industrial robots,” in IECON 98: Proceedings of the 24th Annual Conference of the IEEE Industrial Electronics Society, 1998.
[7] P. Hertling, L. Hog, L. Larsen, J. Perram, and H. Petersen, “Task curve planning for painting robots – part i: Process modeling and calibration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 5, pp. 324–330, April 1996.
[8] R. Hoffman and A.K. Jain, “Segmentation and classification of range images,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 9, no. 5, pp. 608–620, 1987.
[9] A. Hoover, G. Jean-Baptiste, X. Jiang, P.J. Flynn, H. Bunke, D.B. Goldgof, K.K. Bowyer, D.W. Eggert, A.W. Fitzgibbon, and R.B. Fisher, “An experimental comparison of range image segmentation algorithms,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 7, pp. 673–689, 1996. [Online]. Available: citeseer.csail.mit.edu/hoover96experimental.html
[10] N. Jacobsen, K. Ahrentsen, R. Larsen, and L. Overgaard, “Automatic robot welding in complex ship structures,” in 9th Int. Conf. on Computer Application in Shipbuilding, 1997, pp. 410–430.
[11] R.J. Campbell and P.J. Flynn, “A survey of free-form object representation and recognition techniques,” Comput. Vis. Image Underst., vol. 81, no. 2, pp. 166–210, 2001.
[12] A. Johnson and M. Hebert, “Using spin images for efficient object recognition in cluttered 3D scenes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 5, pp. 433–449, May 1999.
[13] T. Kadir and M. Brady, “Scale, saliency and image description,” International Journal of Computer Vision, vol. 45, no. 2, pp. 83–105, 2001.
[14] K.K. Gupta and A.D. Pobil, “Practical motion planning in robotics: Current approaches and future directions,” 1998.
[15] K. Kwok, C. Louks, and B. Driessen, “Rapid 3-d digitizing and tool path generation for complex shapes,” in IEEE International Conference on Robotics and Automation, 1998, pp. 2789–2794.
[16] D. Marshall, G. Lukacs, and R. Martin, “Robust segmentation of primitives from range data in the presence of geometric degeneracy,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 3, pp. 304–314, 2001.
[17] H. Murase and S.K. Nayar, “Visual learning and recognition of 3-d objects from appearance,” Int. J. Comput. Vision, vol. 14, no. 1, pp. 5–24, 1995.
[18] M. Olsen and H. Petersen, “A new method for estimating parameters of a dynamic robot model,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 1, pp. 95–100, 2001.
[19] A. Pichler, M. Vincze, O.M.H. Anderson, and K. Haeusler, “A method for automatic spray painting of unknown parts,” in IEEE Intl. Conf. on Robotics and Automation, 2002.
[20] R.B. Fisher, A.W. Fitzgibbon, M. Waite, M. Orr, and E. Trucco, “Recognition of complex 3-d objects from range data,” in CIAP93, 1993, pp. 509–606.
[21] X. Sheng and M. Krömker, “Surface reconstruction and extrapolation from multiple range images for automatic turbine blades repair,” in IEEE IECON Conference, 1998, pp. 1315–1320.
[22] W. Tse and Y. Chen, “A robotic system for rapid prototyping,” in IEEE International Conference on Robotics and Automation, 1997, pp. 1815–1820.
[23] C. Eberst, H. Bauer, H. Nöhmeyer, J. Minichberger, A. Pichler, G. Umgeher, “Self-programming Robotized Cells for Flexible Paint-Jobs”, International Conference on Mechatronics and Robotics 2004, Aachen, Germany.
[24] A. Hilton, A.J. Stoddart, J. Illingworth and T. Windeatt, “Marching Triangles: Range Image Fusion for Complex Object Modelling”, IEEE 1996 International Conference on Image Processing.
[25] J. Devore, Probability and Statistics for Engineering and Sciences. Brooks/Cole, Belmont, CA, 1987.
Vision Based Person Tracking and Following in Unstructured Environments
Mahmoud Tarokh and John Kuo Department of Computer Science, San Diego State University, San Diego, CA 92124, USA.
1 Introduction
Vision based tracking and following of a person by a robot equipped with a vision system has many applications, such as surveillance, motion capture, and detecting and following intruders. Such a robot can also be used as a human assistant for carrying tools and equipment and for helping the elderly. The major requirement in these applications is the ability to track and follow a moving person through non-predetermined, unstructured and often rough environments. Vision based robotic person following consists of two main tasks – providing sensory (visual) feedback about the location of the person relative to the robot, and issuing signals to robot actuators, e.g. steering and wheel motors, to follow the person. When the camera is fixed, the simplest and fastest method for detecting moving objects is frame differencing, which compares consecutive image frames (Cai 1995; Richards 1995). However, the major challenge in the tracking task is the detection of the person's motion by a camera mounted on a moving robot, as the two motions are blended together. A number of approaches have been proposed to address this issue, e.g. tracking features (Censi 1999; Zoghlami 1997; Foresti 2003) and computing optical flow (Srinivasan 1997; Irani 1994). In (van Leeuwen 2002) a method is proposed to track cars in front using a camera mounted on the pursuing car. A color based tracking system capable of tracking color blobs in real time is implemented on a mobile robot (Schlegel 2000), but it requires the person to wear a shirt of a specified color and does not consider shape. An approach to recognition of a moving person by a camera mounted on a robot is provided in (Tanawongsuwan 1999), which also uses color recognition. These approaches are only effective in environments that do not contain objects whose color is similar to that of the person to be tracked. More recently, a probabilistic approach has been proposed which is based on frame differencing with a compensation for the robot mounted camera motion (Jung 2004).
Considerable research work has been reported in the area of autonomous robot navigation, but very little of it addresses person following. In particular, numerous fuzzy-logic based approaches have been developed for navigation (e.g. see Saffiotti 1997 for a review). Fuzzy logic has been applied to the wall following and obstacle avoidance problem (Braunstingl 1995). Research reported in (Weng 1998) uses vision to guide a mobile robot by comparing images to a database of images that are created during an initialization tour of the environment. Regardless of the approach, navigation and tracking using maps require that the environment be known prior to application, which limits flexibility and is not a valid approach to person following. We recently reported a simple vision based robotic person following method for flat environments using a grey-scale camera that was fixed to a mobile robot platform (Tarokh and Ferrari 2003). The purpose of the present paper is to enable robust person following in rough terrain. In this work, we employ color and shape for person identification, and an articulated camera with pan and tilt actuators for robust person following.
2 Person Identification
One of the main tasks in person following is the detection and segmentation of the person from the scene. This task consists of two subtasks, namely, training a detection system and recognition of the person as he/she moves in the environment. Both subtasks employ color and shape characteristics. In our system, the person appears in front of the camera at the start of a tour, and images of the person are captured automatically as the person takes several poses, i.e. back and side views. The system is then trained to recognize the shape and color of the person's upper body. We use the H (hue or color), S (saturation or color depth), B (brightness or lightness) color model, as HSB is based on a direct interpretation of colors and provides a better characterization for this application than other color models such as RGB. The averages of the H, S and B components over the poses are recorded, providing the nominal values H_nom, S_nom and B_nom. However, these values will change during the motion due to changes in lighting. We therefore allow deviations ΔH, ΔS, ΔB from the nominal values, which are found experimentally. Thus, during person following, if an object in the image has color components within the reference ranges H_ref = H_nom ± ΔH, S_ref = S_nom ± ΔS and B_ref = B_nom ± ΔB, then the object is a candidate for the person's image, and its shape measures are checked. The shape identification system is trained with the above mentioned poses. Shape measures must be invariant to the mass (area) changes of the person's image since the mass changes with the distance of the robot to the person. The three measures that satisfy this requirement are compactness C, circularity Q and eccentricity E. Equations for computing these shape measures are given in (Tarokh 2003),
where the normalized values of the three measures lie between 0 and 1. During the training, each of these measures is evaluated for the person in each of the above two poses (k = 1, 2) and their values C_k,ref, Q_k,ref and E_k,ref are stored for the person following phase. This completes the training of the system, which takes a few seconds on a standard PC and can be considered an off-line phase. During person following, the camera takes images of the scene and the system performs several operations to segment (isolate) the person from other objects in the image. The first operation is to scan every pixel and mark it as belonging to the person's image, e.g. set it to white, if all three of its color components are within the reference ranges H_ref, S_ref and B_ref. Checking all pixels is time consuming, and we therefore speed it up by exploiting two observations. First, since the person's image occupies a large portion of the image, it is sufficient to check pixels on every other row and every other column for color verification. In this manner, only a quarter of the pixels are checked and marked white if they satisfy the color range. The skipped pixels are marked white if the checked pixels around them have been marked white. The second observation is that there is a maximum distance that the person can move between two consecutive frames. As a result, the person's pixels in the current frame must all lie within a circle centered at the centroid (to be defined shortly) of the previous frame. These two observations limit the number of pixels to be checked and speed up the marking of the pixels that belong to the person's image. The final operation is to perform a standard region growing on the marked pixels so that connected regions can be formed. Regions smaller in area than a specified value are considered noise and are removed. The shape measure values C_i, Q_i and E_i for the remaining regions are computed, where i = 0, 1, 2, …, m−1 denotes the region number. Rather than checking each shape parameter against its corresponding reference value, we define a single measure for the closeness of a detected region to the reference region, i.e. the person's image during the training. A possible function σ is given in Tarokh (2003). The closeness function produces 1 if all shape measures of the region are the same as the reference values, and approaches zero if the region shape measures are completely different. It is noted that for each detected region, two values of the closeness measure are found, i.e. one for each pose. The region that has the largest value of closeness σ is selected, and if this value is close to 1, the selected region is assumed to represent the person. If all the regions have small values of σ, then none is chosen and another image is taken and analyzed. The above method of distinguishing the region corresponding to the person from other detected regions in the image is simple and yet quite effective. There are several reasons for this effectiveness. One is that the robot is controlled to stay reasonably close to the person being followed and in the direction of the person's motion, as will be seen in the next section. This allows only a few objects into the camera's view, making the person identification reasonably easy.
Furthermore, the simplicity of the image processing tasks allows fast computation, making it possible to achieve relatively high sample rates. We must now determine several characteristics of the detected region representing the person in the image. These characteristics will be used for the robot control. The area or mass of the region is important since it gives a measure of how close the person is to the camera mounted on the robot. A large mass is indicative of a person that is close to the camera, whereas a small mass implies that the person is far away. The mass (area) M is simply equal to the total number of pixels in the region. The coordinates of the center of mass, denoted by (x_c, y_c), are defined as

$x_c = \frac{1}{M}\sum_{\forall(x,y)\in R} x \,, \qquad y_c = \frac{1}{M}\sum_{\forall(x,y)\in R} y$  (1)
where x, y are the coordinates of a pixel in the region, and the summation is taken over all pixels in the region R. It is noted that we assign the x-axis across the field of camera view, and the y-axis along the field of view, i.e. along the path of the person. The center of mass is of importance for person tracking because it provides the coordinates of the point to be tracked by the robot.
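To make the segmentation quantities above concrete, the following is a minimal sketch of the HSB range test and the mass/centroid computation of Eq. (1). It is illustrative NumPy code, not the authors' implementation; all function and parameter names are our own, and the vectorised form (and the simplification of ignoring hue wrap-around) are assumptions.

```python
import numpy as np

def person_mask(hsb_image, nominal, deltas):
    """Mark pixels whose (H, S, B) components all lie in the reference ranges.

    hsb_image -- array of shape (rows, cols, 3)
    nominal   -- (H_nom, S_nom, B_nom) obtained from the training poses
    deltas    -- (dH, dS, dB) allowed deviations, found experimentally
    Note: hue wrap-around is ignored here for simplicity.
    """
    nominal = np.asarray(nominal, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    return np.all(np.abs(hsb_image - nominal) <= deltas, axis=-1)

def mass_and_centroid(region_mask):
    """Mass M and centre of mass (x_c, y_c) of a binary region, per Eq. (1)."""
    ys, xs = np.nonzero(region_mask)     # pixel coordinates of region R
    M = xs.size                          # mass = number of region pixels
    if M == 0:
        return 0, None
    return M, (xs.sum() / M, ys.sum() / M)
```

In the actual system the candidate mask would be refined by the row/column-skipping scan, the distance-circle constraint and the region growing described above, and the closeness measure σ would be applied to each connected region before its centroid is passed to the controllers.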
3 Fuzzy Tracking and Following Control
The objective of the robot control is to follow the person and keep a reasonably constant distance to him/her. There are four actuation quantities to be controlled, as shown in Fig. 1. These are the camera pan or yaw angle β, the camera tilt or pitch angle θ, the robot forward/backward speed v, and the robot steering angle ϕ. We explored standard PID controllers for these tasks. However, due to ambiguities and imprecision in the image information, PID controllers required frequent gain tuning, which was not practical. We therefore use a fuzzy control paradigm, which is effective in dealing with such imprecision and provides a natural and intuitive rule base specification for this application. The image information, namely the person's mass M, the center of mass (x_c, y_c) and their derivatives (dx_c/dt, dy_c/dt), are the sensed/computed quantities. Note that a derivative (e.g. dx_c/dt) is computed as the change in the quantity between two samples (e.g. Δx_c) divided by the sample time, which is taken as the unit time. Thus in what follows, we use the derivative and the difference interchangeably. For reasons that will become clear shortly, instead of the current values β, θ and ϕ, the changes in these quantities from their last values, i.e. Δβ, Δθ and Δϕ, are implemented.
Fig. 1. Robot actuation quantities
Fig. 2. Normalized membership functions
Each of the sensed and actuation quantities is treated as a fuzzy (linguistic) variable with five normalized membership functions, as given in Fig. 2. The steering is not included in this table, and its value will be determined using the average of the camera pan (yaw), as will be described later. The fuzzy sets Set 1, Set 2, …, Set 5 are given specific names for each fuzzy variable, as listed in Table 1, where the fuzzy variables are shown with a tilde. For example, the fuzzy sets for the x-axis of the center of mass fuzzy variable x̃_c, which describes motion across the field of view of the camera, are named Far Left, Center, etc. Similarly, the fuzzy sets for the y-axis of the mass are called Down, Up, etc., depending on where the person appears in the image.
Table 1. Definition of fuzzy variables and associated sets
We propose the following scheme, which decomposes the control task into three controllers for pan, tilt and speed. Steering control will be discussed later. The main tasks of the camera pan and tilt controllers are to position the camera so that the person is in the camera's sight, from which the person's whereabouts can be deduced. The purpose of the robot speed controller is to keep a nearly constant distance between the robot and the person. Consider first the pan (yaw) controller. When the person moves to the left, the image of the person will be shifted to the left of the frame along the image x-axis if the camera and the robot are stationary. Thus the person's center of mass in the x-direction, x_c, is an indication of the location of the person across the field of view. Furthermore, Δx_c = x_c(k) − x_c(k−1) gives the amount and direction of the change from the last sample, where k denotes the current sample (frame) and (k−1) the previous one. The speed controller takes two inputs, namely the person's image mass M and the change in the camera tilt Δθ. The mass is a measure of the person's distance to the camera: the larger this mass, the closer the person is to the camera, and vice versa. The tilt is used to account for hilly terrain. When Δθ is positive, as in the case of the person starting to climb a hill, the robot must slow down, and when Δθ is negative, as in the case of the person starting to descend a hill, it must speed up. These considerations lead to the rule matrix in Table 4. The center of gravity defuzzification is used to determine the crisp values of the camera pan and tilt, and the robot speed. The final control quantity is the steering. Although it is possible to employ fuzzy rules for determining the steering control similar to the other three quantities, it is simpler and more reasonable to base the robot steering on the pan (yaw) of the camera.
This is due to the observation that the camera rotates to keep the person in its view and thus essentially follows the person's turning motions, which must eventually cause the rotation (steering) of the robot. It would be unnecessary and undesirable to steer the robot at the same rate as the camera pan. In other words, the camera must track the relatively fast and fine motions of the person, whereas the robot must follow the gross motion of the person, which is the average motion taken over a time period. As a result of this averaging, the steering is computed as ϕ = K ∫ β dt, where K is a proportionality constant.

Table 2. Fuzzy rule matrix for camera pan control.
Table 3. Fuzzy rule matrix for camera tilt control.
Table 4. Fuzzy rule for robot speed control
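Since the entries of the rule matrices in Tables 2–4 are not reproduced in this text, the sketch below is only an illustration of the mechanics of the control scheme described above: five triangular membership functions on a normalised universe, min-based rule firing, a centre-of-gravity style defuzzification over singleton output centres, and steering derived from the accumulated camera pan. This is our own Python, not the authors' code; the set centres, widths, placeholder rule matrix and the constant K are assumptions.

```python
import numpy as np

CENTRES = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])   # Set 1 ... Set 5

def memberships(x, width=0.5):
    """Degrees of membership of x in the five triangular fuzzy sets."""
    return np.clip(1.0 - np.abs(x - CENTRES) / width, 0.0, 1.0)

def fuzzy_controller(e1, e2, rule_matrix):
    """Two-input fuzzy controller with centre-of-gravity defuzzification.

    e1, e2      -- normalised inputs in [-1, 1] (e.g. x_c and dx_c/dt for pan)
    rule_matrix -- 5x5 array of output set indices (0..4), one per rule
    Returns a crisp normalised output (e.g. the pan change, delta_beta).
    """
    m1, m2 = memberships(e1), memberships(e2)
    num, den = 0.0, 0.0
    for i in range(5):
        for j in range(5):
            w = min(m1[i], m2[j])                   # rule firing strength
            num += w * CENTRES[rule_matrix[i][j]]   # weighted output centre
            den += w
    return num / den if den > 0 else 0.0

def steering(pan_history, K=0.5):
    """phi = K * integral of camera pan, approximated by a running sum."""
    return K * sum(pan_history)

# Example use for the pan controller (PAN_RULES would hold Table 2's entries):
# delta_beta = fuzzy_controller(x_c_norm, dx_c_norm, PAN_RULES)
```

The same controller function can be reused for tilt (inputs y_c and dy_c/dt) and speed (inputs M and Δθ) by supplying the corresponding rule matrices.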
4 Indoor and Outdoor Experiments
The robot base used in the experiments was an ActiveMedia Pioneer 2 All-Terrain rover, as shown in Fig. 3. The robot measures 50 × 49 × 26 cm, has four motorized wheels, and can travel at a top speed of 1.6 m/s. The Pioneer 2 is capable of carrying 40 kg and has a battery life of 10–12 hours. The robot has a sonar ring with 8 sensors, which has an operating range of 15 cm to 7 m. The sonar sensors, seen in Fig. 3 as circles, are used for obstacle detection. If obstacles are detected in the robot's path, a collision avoidance procedure is executed. This procedure will not be discussed in this paper for the sake of brevity. A Canon VC-C4 camera is installed on the Pioneer 2 (Fig. 3) and permits color image capture at a maximum resolution of 640 × 480 pixels in the NTSC format. It is connected to a laptop computer through an Imperx VCE-B5A01 PCMCIA frame grabber, which is specifically designed for laptops. The frame grabber can achieve capture rates of 30 frames per second at the lowest resolution of 160 × 120 in NTSC format, and 5 frames per second at the highest resolution of 640 × 480. The laptop mounted on the base (Fig. 3) is an IBM T40 running the Windows XP operating system. It contains an Intel Centrino processor running at 1.5 GHz.
Fig. 3. Pioneer 2 all-terrain rover used in experiments
Fig. 4. Examples of person following in unstructured environments
The application uses a variety of third-party software libraries for creating the user interface and enabling device control. The libraries for the user interface are written in Java, whereas the libraries for low-level motor control are in C++. As a result, our person following code was written in both Java and C++. The person following application uses the client-server, distributed callback and model-view-controller patterns. The cycle (sample) time for performing the various tasks was found to be 0.13 s, or about 8 frames per second. Extensive indoor and outdoor trials were conducted with the person following system. Indoor trials included passing through a door (Fig. 4a) and a hallway (Fig. 4b). Outdoor trials included following up a steep and winding dirt trail (Fig. 4c), over rocky terrain (Fig. 4d) that involved shaking of the rover, and
following with partial occlusion and identification of the person to be followed in the presence of another person (Fig. 4e and Fig. 4f). The successful experiments on rough terrain and with partial occlusion demonstrate that the person detection and fuzzy controllers are able to cope with shaky images and imprecise or incomplete information. The system even handles full occlusion in cases where the person does not quickly change direction or disappear behind other objects for an extended period of time.
5 Summary and Conclusions
The paper has presented an intelligent control method for person following in previously unknown environments. It consists of simple person identification using both color and shape, and fuzzy controllers for the camera and the robot. It is shown through various experiments that the system can function both indoors and outdoors. The system has a number of features, which include robustness to the noise caused by rough terrain traversal and to partial occlusion. It can perform well in difficult locations such as hallways with tight turns and winding hilly outdoor trails. A video has been prepared showing person following in various environments, and can be viewed or downloaded from (Tarokh 2005). The system has two limitations. First, it is unable to perform satisfactory person following when the person moves fast. The main bottlenecks are the image capture/save and thresholding routines, which in combination take more than half of the total cycle time. The other limitation is that in bright outdoor light with distinct shadows, the person identification system can become confused since it treats the shadows as objects/obstacles. We are currently investigating these issues to improve the robustness of the system.
References
[1] Braunstingl, R., P. Sanz and J.M. Ezkerra (1995). Fuzzy logic wall following of a mobile robot based on the concept of general perception, Proc. 7th Int. Conf. on Advanced Robotics, pp. 367–376, Spain.
[2] Cai, Q., A. Mitiche and J.K. Aggarwal (1995). Tracking human motion in an indoor environment, 2nd Int. Conf. on Image Processing.
[3] Censi, A., A. Fusiello and V. Roberto (1999). Image stabilization by feature tracking, Proc. 10th Int. Conf. Image Analysis and Processing, pp. 665–667, Venice, Italy.
[4] Foresti, G.L. and C. Micheloni (2003). A robust feature tracker for active surveillance of outdoor scenes, Electronic Letters on Computer Vision and Image Analysis, vol. 1, no. 1, pp. 21–34.
[5] Irani, M., B. Rousso and S. Peleg (1994). Recovery of ego-motion using image stabilization, Proc. IEEE Computer Vision and Pattern Recognition, pp. 454–460.
[6] Jung, B. and G. Sukhatme (2004). Detecting moving objects using a single camera on a mobile robot in an outdoor environment, Proc. 8th Conf. Intelligent Autonomous Systems, pp. 980–987, Amsterdam, The Netherlands.
[7] Richards, C., C. Smith and N. Papanikolopoulos (1995). Detection and tracking of traffic objects in IVHS vision sensing modalities, Proc. 5th Annual Meeting of ITS America.
[8] Saffiotti, A. (1997). The uses of fuzzy logic in autonomous robot navigation: a catalogue raisonné, Technical Report 2.1, IRIDIA, Université Libre de Bruxelles, Brussels, Belgium.
[9] Schlegel, C., J. Illmann, H. Jaberg, M. Schuster and R. Worz (2000). Integrating vision based behaviors with an autonomous robot, Videre: Journal of Computer Vision Research, 1 (4), pp. 32–60.
[10] Srinivasan, S. and R. Chellappa (1997). Image stabilization and mosaicking using overlapped basis optical flow field, Proc. IEEE Int. Conf. Image Processing.
[11] Tanawongsuwan, R., A. Stoytchev and I. Essa (1999). Robust tracking of people by a mobile robotic agent, College of Computing Report, Georgia Institute of Technology.
[12] Tarokh, M. and P. Ferrari (2003). Robotic person following using fuzzy logic and image segmentation, J. Robotic Systems, vol. 20, no. 9, pp. 557–568.
[13] Tarokh, M. (2005). www-rohan.sdsu.edu/~tarokh/lab/research-person following.html.
[14] Van Leeuwen, M.B. and F.C. Groen (2002). Motion interpretation for in-car vision systems, Proc. IEEE/RSJ Conf. Intelligent Robots and Systems, Lausanne, Switzerland.
[15] Weng, J. and S. Chen (1998). Vision-guided navigation using SHOSLIF, Neural Networks, 1, pp. 1511–1529.
[16] Zoghlami, I., O. Faugeras and R. Deriche (1997). Using geometric corners to build a 2D mosaic from a set of images, Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 420–42.
Simple, Robust and Accurate Head-Pose Tracking Using a Single Camera
Simon Meers, Koren Ward and Ian Piper University of Wollongong, Australia.
1 Introduction
Tracking the position and orientation of the head in real time is finding increasing application in avionics, virtual reality, augmented reality, cinematography, computer games, driver monitoring and user interfaces for the disabled. While developing a computer interface for blind computer users, we encountered the need for a robust head-pose tracking system for accurately monitoring the gaze position of the user on a virtual screen. Although many head-pose tracking systems and techniques have been developed, we found most existing systems either added considerable complexity and cost to our application or were not accurate enough for our requirements. For example, systems described in (Horprasert et al. 1996), (Kaminski et al. 2006) and (Newman et al. 2000) use feature detection and tracking to monitor the position of the eyes, nose and/or other facial features in order to determine the orientation of the head. Unfortunately these systems require considerable processing power, additional hardware or multiple cameras to detect and track the facial features in 3D space. Although monocular systems like (Horprasert et al. 1996), (Kaminski et al. 2006) and (Zhu et al. 2004) can reduce the cost of the system, they generally performed poorly in terms of accuracy when compared with stereo or multi-camera tracking systems (Newman et al. 2000). Furthermore, facial feature tracking methods introduce inaccuracies and the need for calibration or training into the system due to the inherent image processing error margins and diverse range of possible facial characteristics of different users. To avoid the cost and complexity of facial feature tracking methods a number of head-pose tracking systems have been developed that track LEDs or infrared reflectors mounted on the user’s helmet, cap or spectacles (see (NaturalPoint 2006), (Foursa 2004), (Foxlin et al. 2004), and (Hong et al. 2005)). However we found the pointing accuracy of systems utilising reflected infrared light (NaturalPoint 2006) to be insufficient for our application. The other LED-based systems, like (Foursa 2004), (Foxlin et al. 2004), and (Hong et al. 2005), still require multi-
ple cameras for tracking the position of the LEDs in 3D space, which adds cost and complexity to the system as well as the need for calibration. In order to overcome much of the cost and the deficiencies of existing head-pose tracking systems, we have been developing accurate methods for pinpointing the position of infrared LEDs using an inexpensive USB camera, together with low-cost algorithms for estimating the 3D coordinates of the LEDs based on known geometry. Our system is comprised of a single low-cost USB camera and a pair of spectacles fitted with three battery-powered LEDs concealed within the spectacle frame. Judging by our results, we believe our system to be the most accurate low-cost head-pose tracking system developed. Furthermore, our system is robust and requires no calibration. Experimental results are provided demonstrating a head-pose tracking accuracy of less than 0.5 degrees when the user is within one meter of the camera.
2 Hardware
The prototype of our infrared LED-based head-pose tracking spectacles is shown in Fig. 1(a). Figure 1(b) shows our experimental rig for testing the system, which incorporates a laser pointer (mounted below the central LED) for testing the 'gaze' accuracy. The baseline distance between the outer LEDs is 147 mm; the perpendicular distance of the front LED from the baseline is 42 mm. Although the infrared light cannot be seen with the naked eye, the LEDs appear quite bright to a digital camera. Our experiments were carried out using a low-cost, standard 'Logitech QuickCam Express' USB camera (Logitech 2006), providing a maximum resolution of 640×480 pixels with a horizontal lens angle of approximately 35°. The video captured by this camera is quite noisy compared with more expensive cameras, though this proved useful for testing the robustness of our system. We filtered out most visible light by fitting the lens with a filter comprising several layers of developed, fully-exposed colour photographic negative. We found it unnecessary to remove the camera's internal infrared filter. The filtering, combined with appropriate adjustments of the brightness, contrast and exposure settings of the camera, allowed the raw video image to be completely black, with the infrared LEDs appearing as bright white points of light. Consequently, the image processing task is simplified considerably.
Fig. 1. (a) Prototype LED Spectacles (b) LED testing hardware
The requirement for the user to wear a special pair of spectacles may appear undesirable when compared to systems which use traditional image processing to detect facial features. However, the advantages of a robust, accurate and low-cost system which is independent of individual facial variations, plus the elimination of any training or calibration procedures, can outweigh any inconvenience caused by wearing special spectacles. Furthermore, the LEDs and batteries could be mounted on any pair of spectacles, headset, helmet, cap or other head-mounted accessory, provided that the geometry of the LEDs is entered into the system.
3 Processing
The data processing involved in our system comprises two stages: 1) determining the two-dimensional LED image blob coordinates, and 2) the projection of the two-dimensional points into three-dimensional space to derive the real-world locations of the LEDs in relation to the camera.
3.1 Blob Tracking
Figure 2(a) shows an example raw video image of the infrared LEDs, which appear as three white blobs on a black background. The individual blobs are detected by scanning the image for contiguous regions of pixels above an adjustable brightness threshold. Initially, we converted the blobs to coordinates simply by calculating the centre of the bounding box; however, the sensitivity of the three-dimensional transformations to even single-pixel changes proved this method to be unstable and inaccurate. Consequently, we adopted a more accurate method: calculating the centroid of the area using the intensity-based weighted average of the pixel coordinates, as illustrated in Fig. 2(b). This method provides a surprisingly high level of accuracy even with low-resolution input and distant LEDs.
Fig. 2. (a) Raw video input (showing the infrared LEDs at close range – 200 mm). (b) Example LED blob (with centroid marked) and corresponding intensity data
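A compact sketch of this blob detection and centroid step is given below. It is illustrative only, not the authors' code: the threshold and minimum blob size are assumed values, and SciPy's connected-component labelling stands in for whatever region-scanning routine the real system uses.

```python
import numpy as np
from scipy import ndimage

def led_centroids(gray, threshold=200, min_pixels=3):
    """Return sub-pixel (x, y) centroids of bright blobs in a grey image."""
    mask = gray > threshold                        # adjustable brightness threshold
    labels, n = ndimage.label(mask)                # contiguous bright regions
    centroids = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labels == i)
        if xs.size < min_pixels:                   # ignore single-pixel noise
            continue
        w = gray[ys, xs].astype(float)             # pixel intensities as weights
        centroids.append((np.sum(w * xs) / w.sum(),
                          np.sum(w * ys) / w.sum()))
    return centroids
```

The intensity weighting is what gives the sub-pixel stability described above; a plain bounding-box centre would jump by whole pixels as the blob boundary flickers.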
3.2 Head-Pose Calculation
Once the two-dimensional blob coordinates have been calculated, the points must be projected back into three-dimensional space in order to recover the original LED positions. This problem is not straightforward. Fig. 3 illustrates the configuration of the problem. The camera centre (C) is the origin of the coordinate system, and it is assumed to be facing directly down the z-axis. The ‘gaze’ of the user is projected onto a ‘virtual screen’ which is also centred on the z-axis and perpendicular to it. The dimensions and z-translation of the virtual screen are controllable parameters and do not necessarily have to correspond with a physical computer screen, particularly for blind users and virtual reality applications. In fact, the virtual screen can be easily transformed to any size, shape, position or orientation relative to the camera. Figure 3 also displays the two-dimensional image plane, scaled for greater visibility. The focal length (z) of the camera is required to perform the three-dimensional calculations. The LED points are labelled L, R and F (left, right and front respectively, ordered from the camera’s point of view). Their two-dimensional projections onto the image plane are labelled l, r and f. L, R and F must lie on vectors from the origin through their two-dimensional counterparts. Given our knowledge of the model, we are able to determine exactly where, on the projection rays, the LEDs lie. We know that the front LED is equidistant to the outer LEDs, thus providing Eq. (1).
$d(L, F) = d(R, F)$  (1)
We also know the ratio r between these distances and the baseline distance.
$d(L, F) = r \, d(L, R)$  (2)
These constraints are sufficient for determining a single solution orientation for the model. Once the orientation has been calculated, we can also derive the exact physical coordinates of the points, including the depth from the camera, by utilising our model measurements (provided in Section 2).
Fig. 3. Perspective illustration of the virtual screen (located at the camera centre), the 2D image plane, the 3D LED model and its projected ‘gaze’
The distance of the model from the camera is irrelevant for determining the model's orientation, since it can simply be scaled in perspective along the projection vectors. Thus it is feasible to fix one of the points at an arbitrary location along its projection vector, calculate the corresponding coordinates of the other two points, and then scale the solution to its actual size and distance from the camera. We use parametric equations to solve the problem. Thus the position of point L is expressed as:

$L_x = t\,l_x$  (3a)
$L_y = t\,l_y$  (3b)
$L_z = t\,z$  (3c)
Since z is the focal length, a value of 1 for the parameter t will position L on the image plane. Thus there are only three unknowns – the three parameters of the LED points on their projection vectors. In fact one of these unknowns is eliminated, since we can fix the location of one of the points – we chose to fix the location of R to be at depth R_z = z, thus making its x- and y-coordinates equal to r_x and r_y respectively. The position of the point F is expressed as:

$F_x = u\,f_x$  (4a)
$F_y = u\,f_y$  (4b)
$F_z = u\,z$  (4c)
Substituting these six parametric coordinate equations for L and F into Eq. (1) yields:

$(t\,l_x - u\,f_x)^2 + (t\,l_y - u\,f_y)^2 + (t\,z - u\,z)^2 = (r_x - u\,f_x)^2 + (r_y - u\,f_y)^2 + (z - u\,z)^2$  (5)
which can be rewritten as:

$u(t) = \dfrac{z^2(t^2 - 1) + l_x^2 t^2 + l_y^2 t^2 - r_x^2 - r_y^2}{2\left(z^2(t - 1) + l_x f_x t + l_y f_y t - r_x f_x - r_y f_y\right)}$  (6)
Figure 4 shows a plot of Eq. (6). It should be noted that the asymptote is at:

$t = \dfrac{r_x f_x + r_y f_y + z^2}{l_x f_x + l_y f_y + z^2}$  (7)
and that the function has a root after the asymptote. Now we can calculate the point on the front-point projection vector which is equidistant to L and R, given a value for t. Of course, not all of these points are valid – the ratio constraint specified in Eq. (2) must be satisfied. Thus we need to also calculate the dimensions of the triangle formed by the three points and find the parameter values for which the ratio matches our model.
Fig. 4. Relationship between parameters t and u
Fig. 5. Triangle Baseline Distance
The baseline distance of the triangle is given by Eq. (8) and plotted in Fig. 5.

$b(t) = \sqrt{(r_x - t\,l_x)^2 + (r_y - t\,l_y)^2 + (z - t\,z)^2}$  (8)
The height of the triangle is given by:

$h(t) = \sqrt{\bigl(u(t)\,f_x - t\,l_x\bigr)^2 + \bigl(u(t)\,f_y - t\,l_y\bigr)^2 + \bigl(u(t)\,z - t\,z\bigr)^2 - \bigl(b(t)/2\bigr)^2}$  (9)
Fig. 6. Triangle Height
Figure 6 shows a plot of Eq. (9). It should be noted that this function, since it is dependent on u(t), shares the asymptote defined in Eq. (7). At this stage we are not interested in the actual baseline distance or height of the triangle – only their relationship. Figure 7 shows a plot of h(t)/b(t). The function has a near-invisible 'hump' just after it reaches its minimum value after the asymptote (around t = 1.4 in this case). This graph holds the key to our solution, and can tell us the value of t for which the triangle has a ratio which matches our model. Unfortunately, it is too complex to be analytically inverted, so we must resort to root-approximation techniques to find the solution. Thankfully, we can reduce the solution range by noting two more constraints inherent in our problem. Firstly, we know that we are looking for a solution in which the head is facing toward the camera. Rearward facing solutions are considered to be invalid as the user's head would obscure the LEDs. Thus we can add the constraint that:

$F_z < M_z$  (10)

where M is the midpoint of the line LR. This can be restated as:

$u(t)\,f_z < (t\,l_z + z)/2$  (11)

Fig. 7. Triangle Height/Baseline Ratio
Fig. 8. z-coordinates of F and M
Figure 8 shows the behaviour of the z-coordinates of F and M as t varies. It can be seen that Eq. (10) holds true only between the asymptote and the intersection of the two functions. Thus these points form the limits of the values for t which are of interest. The lower limit allows us to ignore all values of t less than the asymptote, while the upper limit crops the ratio function nicely to avoid problems with its 'hump'. Hence we now have a nicely behaved, continuous piece of curve on which to perform our root approximation. The domain could be further restricted by noting that not only rearward-facing solutions are invalid, but also solutions beyond the rotational range of the LED configuration; that is, the point at which the front LED would occlude one of the outer LEDs. Our prototype LED configuration allows rotation (panning) of approximately 58° to either side before this occurs. The upper limit (the intersection between the F_z and M_z functions) can be expressed as:

$t \le \dfrac{-S - \sqrt{S^2 - 4\bigl(-l_x^2 - l_y^2 + l_x f_x + l_y f_y\bigr)\bigl(r_x^2 + r_y^2 - r_x f_x - r_y f_y\bigr)}}{2\bigl(-l_x^2 - l_y^2 + l_x f_x + l_y f_y\bigr)}$  (12)
where S = f_x(l_x − r_x) + f_y(l_y − r_y). Note that this value is undefined if l_x and l_y are both zero (l is at the origin), or if one of them is zero and the other is equal to the corresponding coordinate of f. This follows from the degeneracy of the parametric equations which occurs when the projection of one of the control points lies on one or both of the x- and y-axes. Rather than explicitly detecting this problem and solving a simpler equation for the specific case, we have chosen instead to jitter all two-dimensional coordinates by a very small amount so that they never lie on the axes. We have determined that the lower limit is bounded by the asymptote; however, we can further restrict the domain by noting that all parameters should be positive so that the points cannot appear behind the camera. Note that the positive
Fig. 9. Triangle ratio graph with limits displayed
root of Eq. (6) (illustrated in Fig. 4) lies after the asymptote. Since u must be positive, we can use this root as the new lower limit for t. Thus the lower limit is now:

$t \ge \sqrt{\dfrac{r_x^2 + r_y^2 + z^2}{l_x^2 + l_y^2 + z^2}}$  (13)
Figure 9 illustrates the upper and lower limits for root-approximation in finding the value of t for which the triangle ratio matches the model geometry. Once t has been approximated, u can be easily derived using Eq. (6), and these parameter values substituted into the parametric coordinate equations for L and F. Thus the orientation has been derived. Now we can simply scale the solution to the appropriate size using the dimensions of our model. This provides accurate three-dimensional coordinates for the model in relation to the camera. Thus the user’s ‘gaze’ (based on head-orientation) can be projected onto a ‘virtual screen’ positioned relative to the camera.
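The following is a minimal Python sketch of the root-approximation described above, assembling Eqs. (6), (8), (9), (12) and (13) and bisecting for the value of t whose triangle ratio matches the model of Section 2 (42 mm height over a 147 mm baseline). It is not the authors' C++ implementation: the function names are ours, the jittering of on-axis projections and the final scaling to physical coordinates are omitted, and it assumes the ratio curve actually crosses the model value between the two limits.

```python
import math

def solve_head_pose(l, r, f, z, model_ratio=42.0 / 147.0, iters=40):
    """Find the parameters (t, u) locating L and F on their projection rays.

    l, r, f -- (x, y) image coordinates of the left, right and front LEDs
    z       -- focal length in the same (pixel) units; R is fixed at depth z
    """
    lx, ly = l; rx, ry = r; fx, fy = f

    def u_of_t(t):                                   # Eq. (6)
        num = z*z*(t*t - 1) + lx*lx*t*t + ly*ly*t*t - rx*rx - ry*ry
        den = 2*(z*z*(t - 1) + lx*fx*t + ly*fy*t - rx*fx - ry*fy)
        return num / den

    def baseline(t):                                 # Eq. (8)
        return math.sqrt((rx - t*lx)**2 + (ry - t*ly)**2 + (z - t*z)**2)

    def height(t):                                   # Eq. (9), guarded for rounding
        u = u_of_t(t)
        d2 = (u*fx - t*lx)**2 + (u*fy - t*ly)**2 + (u*z - t*z)**2
        return math.sqrt(max(d2 - (baseline(t) / 2)**2, 0.0))

    def ratio(t):
        return height(t) / baseline(t)

    # Search limits from Eqs. (13) and (12)
    t_lo = math.sqrt((rx*rx + ry*ry + z*z) / (lx*lx + ly*ly + z*z))
    S = fx*(lx - rx) + fy*(ly - ry)
    a = -lx*lx - ly*ly + lx*fx + ly*fy
    c = rx*rx + ry*ry - rx*fx - ry*fy
    t_hi = (-S - math.sqrt(S*S - 4*a*c)) / (2*a)

    # Bisection on ratio(t) - model_ratio over [t_lo, t_hi]
    for _ in range(iters):
        t_mid = 0.5 * (t_lo + t_hi)
        if (ratio(t_lo) - model_ratio) * (ratio(t_mid) - model_ratio) <= 0:
            t_hi = t_mid
        else:
            t_lo = t_mid
    t = 0.5 * (t_lo + t_hi)
    return t, u_of_t(t)
```

Given (t, u), the three-dimensional LED positions follow from Eqs. (3) and (4), after which the solution is scaled to the known 147 mm baseline to obtain metric coordinates relative to the camera.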
4 Experimental Results
Even using as crude a method of root-approximation as the bisection method, our prototype system implemented in C++ on a 1.3 GHz Pentium processor took less than a microsecond to perform the entire three-dimensional transformation, from two-dimensional coordinates to three-dimensional head-pose coordinates. The t parameter was approximated to ten decimal place precision, in approximately thirty bisection approximation iterations. To test the accuracy of the system, the camera was mounted in the centre of a piece of board measuring 800 mm × 600 mm. A laser-pointer was mounted just below the centre LED position to indicate the ‘gaze’ position on the board. The system was tested over a number of different distances, orientations and video resolutions. The accuracy was monitored over many frames in order to measure the system’s response to noise introduced by the dynamic camera image. Table 1 and Fig. 10 report the variation in calculated ‘gaze’ x- and y-coordinates when the
position of the spectacles remained static. Note that this variation increases as the LEDs are moved further from the camera, because the resolution effectively drops as the blobs become smaller (see Table 2). This problem could be avoided by using a camera with optical zoom capability, provided the varying focal length could be determined.

Table 1. Horizontal and vertical 'gaze' angle (degrees) resolution

Resolution        320×240 pixels                      640×480 pixels
Distance (mm)     500      1000     1500     2000     500      1000     1500     2000
Avg. x-error      0.09°    0.29°    0.36°    1.33°    0.08°    0.23°    0.31°    0.98°
Max. x-error      0.13°    0.40°    0.57°    2.15°    0.12°    0.34°    0.46°    1.43°
Avg. y-error      0.14°    0.32°    0.46°    2.01°    0.10°    0.20°    0.38°    1.46°
Max. y-error      0.22°    0.46°    0.69°    2.86°    0.15°    0.29°    0.54°    2.15°
Fig. 10. Horizontal and vertical 'gaze' angle (degrees) resolution graphs

Table 2. LED 'blob' diameters (pixels) at different resolutions and camera distances

Distance     640×480 pixels    320×240 pixels
500 mm       20                7
1000 mm      13                5
1500 mm      10                4
2000 mm      8                 3
To ascertain the overall accuracy of the system’s ‘gaze’ calculation, the LEDs were aimed at fixed points around the test board using the laser pointer, and the calculated gaze coordinates were compared over a number of repetitions. The test unit’s base position, roll, pitch and yaw were modified slightly between readings to ensure that whilst the laser gaze position was the same between readings, the positions of the LEDs were not. The averages and standard deviations of the coordinate differences were calculated, and found to be no greater than the variations caused by noise reported in Table 1 and Fig. 10 at the same distances and resolutions. Consequently it can be deduced that the repeatability accuracy of the system
is approximately equal to, and limited by, the noise introduced by the sensing device. As an additional accuracy measure, the system's depth resolution was measured at a range of distances from the camera. As with the 'gaze' resolution, the depth resolution was limited by the video noise. In each case, the spectacles faced directly toward the camera. These results are tabulated in Table 3.

Table 3. Distance from Camera Calculation Resolution

Distance from Camera          500 mm      1000 mm     1500 mm    2000 mm
Accuracy at 320×240 pixels    ±0.3 mm     ±2 mm       ±5 mm      ±15 mm
Accuracy at 640×480 pixels    ±0.15 mm    ±1.5 mm     ±3 mm      ±10 mm

5 Conclusion
The experimental results demonstrate that the proposed LED-based head-pose tracking system is very accurate considering the quality of the camera used for the experiments. At typical computer operating distances the accuracy is within 0.5 degrees using an inexpensive USB camera. If longer range or higher accuracy is required a higher quality camera could be employed. The computational cost is also extremely low, at less than one microsecond processing time per frame on an average personal computer for the entire three-dimensional calculation. The system can therefore easily keep up with whatever frame rate the video camera is able to deliver. The system is independent of the varying facial features of different users, needs no calibration and is immune to changes in illumination. It even works in complete darkness. This is particularly useful for human-computer interface applications involving blind users as they have little need to turn on the room lights. Other applications include scroll control of head mounted virtual reality displays or any application where the head position and orientation is to be monitored.
Acknowledgements

Equations (6), (7), (12) and (13) were derived with the assistance of the Mathematica (Wolfram 2006) software package.
References
[1] Foursa, M. (2004) Real-time infrared tracking system for virtual environments. In Proceedings of the 2004 ACM SIGGRAPH International Conference on Virtual Reality Continuum and its Applications in Industry, pages 427–430, New York, USA. ACM Press.
[2] Foxlin, E., Altshuler, Y., Naimark, L. and Harrington, M. (2004) FlightTracker: A novel optical/inertial tracker for cockpit enhanced vision. In ISMAR '04: Proceedings of the Third IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR'04), pages 212–221, Washington, DC, USA. IEEE Computer Society.
[3] Hong, S.K. and Park, C.G. (2005) A 3D motion and structure estimation algorithm for optical head tracker system. In American Institute of Aeronautics and Astronautics: Guidance, Navigation, and Control Conference and Exhibit.
[4] Horprasert, T., Yacoob, Y. and Davis, L.S. (1996) Computing 3-D head orientation from a monocular image sequence. In Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, pages 242–247.
[5] Kaminski, J.Y., Teicher, M., Knaan, D. and Shavit, A. (2006) Head orientation and gaze detection from a single image. In Proceedings of the International Conference of Computer Vision Theory and Applications.
[6] Logitech (2006) QuickCam Express. http://www.logitech.com.
[7] NaturalPoint Inc. (2006) TrackIR. http://www.naturalpoint.com/trackir.
[8] Newman, R., Matsumoto, Y., Rougeaux, S. and Zelinsky, A. (2000) Real-time stereo tracking for head pose and gaze estimation. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, pages 122–128.
[9] Wolfram Research Inc. (2006) Mathematica 5.2. http://www.wolfram.com.
[10] Zhu, Z. and Ji, Q. (2004) Real time 3D face pose tracking from an uncalibrated camera. In First IEEE Workshop on Face Processing in Video, in conjunction with IEEE International Conference on Computer Vision and Pattern Recognition, Washington, DC.
Vision Applications
This section contains five papers on various applications of machine vision. The advent of low cost cameras and high speed video processing has led to a plethora of devices and applications in this area. The ones described in this section are some of the more interesting to come along. The first is a most novel application, using machine vision to determine the state of beer kegs as they arrive at the brewery, determine what their usage has been, and then feed the information into the keg management system. Next, we go underground to look at the use of millimetre wave radar as a vision system to aid mining operations; machine vision does not necessarily mean using visible light. Then, from underground to underwater again, to see how underwater cameras and instrumentation can aid marine biologists in monitoring fish behaviour on coral reefs. Back up into the air for a paper on how machine vision can be used to estimate the position of, and then automatically land, a vertical take-off aircraft. Finally, at a different scale altogether, we see how the latest techniques in fingerprint identification are performed.
Machine Vision for Beer Keg Asset Management
Michael Lees¹, Duncan Campbell², Andrew Keir²
¹ Foster's Australia, Yatala Operations
² School of Engineering Systems, Queensland University of Technology
1 Abstract
A typical large brewery could have a keg fleet size in the order of hundreds of thousands. For some breweries, the asset value of this fleet is second only to the fixed plant. The annual rate of attrition within the fleet can range from 5% to 20%, a sizable figure in dollar terms with a stainless steel keg costing around USD100. There is a business case for a keg asset management system that can help to reduce the annual rate of attrition and supply chain cycle time. Established solutions such as bar codes and RFID tags are costly as they require a modification to every keg. The feasibility of a machine vision tracking system based on the optical character recognition (OCR) of the keg’s existing serial number is explored. With prospective implementation in the keg filling line, a process is proposed which is based on neural network OCR. A recognition rate of 97% was achieved for kegs with non-occluded serial numbers, with realistic scope for further improvement.
2 Introduction
The modern day stainless steel keg has proven to be a robust and reliable means of transferring large quantities of beer from the brewery to the various destinations of consumption. However, it has been estimated that for every pouring tap, a total of up to eight kegs are required in support of that tap to cater for supply chain latencies and keg losses (Bryson 2005). A typical large brewery could have a fleet size in the hundreds of thousands, but fleet sizes can range from tens of thousands (Till 1996) to millions (Schneider 2003) of kegs. In terms of asset value this can be very high; for some breweries it will be second only to the fixed plant (Perryman 1996). Despite the kegs being able to survive upward of 20 to 30 years (Clarke and Simmons 2005), the annual rate of attrition (including loss and theft) can range from 5% to 20% depending on location (Bry-
son 2005). With a stainless steel keg costing around USD100 each, this becomes a considerable annual financial burden for the brewery (Bryson 2005). Clearly there is a business case for a keg asset management system that can help to reduce the annual rate of attrition and supply chain cycle time. Due to a keg's cycle time in the trade, it will only spend a fraction of its life on the actual company premises. These relatively brief periods are primarily for activities such as cleaning, filling and repair (when necessary). This is the only time available for collecting the information that is required to effectively manage the fleet. The type of information required includes the current size of the fleet, and the average age and condition of the kegs. Currently these are estimates based on visual assessments and counts of kegs moving through the plant. The brewery's ability to manage keg assets could be significantly improved if each keg could be automatically identified once it enters the keg line. Due to the financial dimensions of keg fleet management, the concept of keg tracking has become popular (Till 1996, 1997). In recent times a variety of different techniques and solutions have been proposed. These include regular bar codes, advanced laser-etched two-dimensional bar codes (Clarke and Simmons 2005), as well as RFID tags (Perryman 1996; Pankoke et al. 2005). The RFID tag option has proven to be popular with third party keg management companies.¹ However, most of the proposed techniques, including those mentioned, require each and every keg to be modified in some way. For a typical fleet size, this means that the application of even the cheapest of barcodes or RFID tags (which can cost up to USD5² depending on the type of transponder and keg) to each and every keg would present a significant cost. It is a task that at best would only ever asymptote towards completion. Another proposed solution to the challenge of keg asset management is the use of non-returnable disposable kegs.³ As each keg already has a unique ID number stamped into the metal dome (the top of the keg), machine vision techniques have attracted particular attention in addressing this problem. Syscona⁴ offers a product that is designed to read the ID number on the top of brand new kegs. This system has a niche market: it is required by customs in some countries to provide a human-readable but automated verification of the keg's RFID tag. This is an example of where the application of these two technologies can be complementary. Despite Syscona being able to read the ID number on new kegs, there is still a need for a machine vision system that can read the ID number on the regular (often aged and weathered) kegs in circulation in the trade. By identifying ID numbers in real-time through optical character recognition, coupled to a national keg asset database, a meaningful keg audit could be conducted at any time simply by querying the database. This system would provide the mechanism to establish the keg fleet age distribution and the history of each keg, to identify kegs not seen for a prolonged time, to identify and remove foreign kegs from the production line, and to enable preventative maintenance to be carried out on
¹ www.trenstar.com
² www.schaeferkegs.com
³ www.ecokeg.com
⁴ www.syscona.de
each keg. An automated visual inspection of the keg dome with the filler valve will identify many of the faults that would lead to product loss. This offers a significant saving in lost beer and the transport costs for replacing a customer’s leaking keg. Most importantly, a machine vision OCR solution does not require costly alterations to each and every keg.
3 Problem Statement
The primary research objective is to develop machine vision based techniques which non-invasively identify individual beer kegs prior to filling on the production line. The nature of the problem and the solution methodology offers the further prospective benefit (a secondary objective) of detecting visible keg deformations which are likely to lead to subsequent product loss. Deformed kegs must be detected in real-time so that they can be automatically removed from the production line prior to filling. The feasibility of machine vision techniques is investigated as a potential solution to the serial number identification problem in the first instance and is the subject of this paper. One key measure for the determination of success in terms of serial number recognition is set as a successful identification rate of 98%.
4 Methodology
The challenges in addressing this problem are captured through a set of constraints and parameters, some of which are pre-defined due to the physical properties of the kegs, and some defined in consideration of the operational requirements within the context of a production line. In consideration of such, the first stage of the research methodology was that of an off-line configuration (laboratory) used to determine lighting configurations, adequate image capture requirements and the development of the machine vision techniques using a sample collection of kegs.
4.1 Key Characteristics and Constraints
The keg fleet of interest comprises four different brands of kegs: Spartanburg, Rheem and Thielmann (all of which have a spoked pattern on the dome), and Blefa (which does not have a spoked pattern). Two examples are shown in Fig. 1. They demonstrate two different ID number positions and one dome which has spokes, and one which does not.
Fig. 1. ID number positioning for two different keg brands: Blefa (left) and Thielmann (right)
The kegs have the following relevant properties:
• Each keg has a unique ID number stamped on the top dome
• Each brand of keg has the number stamped in the same orientation
• Different brands of kegs use different fonts and have different orientations
• ID numbers can be obscured by foreign substances (dust, dirt, etc.) or even partly corroded

Within the context of a production line, the following considerations are made:
• Placement on the production line (prior to filling)
• Successful serial number identification and the determination of keg removal from the production line prior to filling must be made within one second (the definition of real-time in this instance)
4.2 Illumination and Image Capture
A key consideration in machine vision applications is appropriate illumination of the area under inspection. In this instance, the need is to establish adequate illumination to best highlight the ID numbers and to minimise the impact of dome sheen and potential soiling. The proposed illumination configuration for a production line implementation is shown in Fig. 2 (left). The initial laboratory configuration used is shown in Fig. 2 (right). The lighting array modules are Nerlite 51 mm × 203 mm, 636 nm wavelength LED arrays. Images were captured with an effective spatial resolution of 4 pixels per millimetre across the keg dome.
Fig. 2. Schematic of lighting array concept (left) and simulation of lighting via domed array setup (right)
The following is a list of the key equipment used:

• Sony DFW-700 Firewire Camera
• Avenir 3.5 mm F1.4 lens (HTCH0354)
• Navitar Zoom 7000 lens
• Canon PowerShot G2 Digital Camera
The two lenses were required to examine two widely different aspects of the kegs. The Avenir is a wide angle lens with a field of view, when coupled with the Sony camera, of 112.0° × 94.6° × 73.5° (D × H × V). This allows imaging of the keg rim and dome from a distance of not more than 280 mm directly above the surface. The Navitar, alternatively, is a zoom lens capable of providing sufficiently resolved images of areas such as the filler valve assembly. Both lenses have manual iris adjustments. The two different cameras were used to examine the resolution differences and the subsequent requirements for the process. The Sony Firewire camera integrates with the software used (Halcon) and can be controlled via commands given within the program. However, with limited dynamic range and a resolution of only 728×1024 pixels, it was found not to meet the clarity requirements for the application of the OCR algorithms. In order to eliminate camera resolution as an inhibiting factor, still shots were taken with a Canon PowerShot G2 digital camera, which has a resolution of 2272×1704 pixels. These images formed the basis of the subsequent processing in this developmental phase, and were treated as equivalent to images captured in real-time by a large format camera.
5
Keg ID Number Recognition
The image processing methodology that was developed is shown in Fig. 3. The pre-processing and classification stages are discussed in greater detail in this section.
Fig. 3. Image capture, conditioning and serial number recognition process
5.1
Pre-processing
Due to the symmetric nature of the kegs, and the inability to know the specific orientation of the kegs on a production line, it was necessary to develop a mechanism whereby the serial number region could be located on the dome and then targeted for application of the OCR algorithm for classification. Following initial image pre-processing to convert to greyscale (thereby reducing the pixel depth of the image), global edge enhancement and noise attenuation, a two step approach is applied to segment the image and to extract serial numbers. Reduced computational time was a consideration in selecting the processes within these steps, also in consideration of the computing architecture that would ultimately be commissioned. The first step is one of locating text areas on the dome and extracting likely candidate regions for serial numbers. The second step establishes and extracts serial numbers within the candidate regions. A Laplacian of Gaussian (LoG) operation (σ = 2) is applied prior to segmentation for noise suppression. This filter combines the two-dimensional second derivative of the Gaussian function with Gaussian low pass filtering for noise suppression (Klette and Zamperoni 1996). The LoG operator computes the Laplacian Δg(x,y) for an arbitrary smoothing of the Gaussian function σ, giving an expression for the kernel:

\Delta G_{\sigma}(x, y) = \frac{1}{2\pi\sigma^{4}}\left(\frac{x^{2} + y^{2}}{2\sigma^{2}} - 1\right)\exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right)   (1)
Edges are defined by zero-crossings following application of the LoG filter. This operator tends to be insensitive to noise and provides faithful representation of edge positions even with low edge gradients (Gonzalez and Wintz 1987; Russ 1995). A dilation circle is applied to the thresholded image following the LoG filtering operation. Areas containing stamped text, or those with significant scratching, are broadened by a relatively large radius dilating circle. The circle radius is chosen such that sequences of digits making up a serial number appear as a single, elongated area, as seen in Fig. 4. The filler valve assembly is an easily locatable reference point corresponding to the centre of the dome from a longitudinal perspective. This effectively addresses any translational shift of the image from true centre along the longitudinal axis. The serial number stamp location varies from brand to brand; however, the radial location is relatively consistent within each brand, and there are not a large number of brand types. In each case, a mask is constructed corresponding to the serial number stamping location. The mask(s) can then be digitally rotated about the keg centre point, as defined by the centre of the filler valve, such that as the mask passes over the broadened serial number, a positive detection is made. Analysis of the size of the text present on the dome of the keg showed that the serial number would return a clearly distinguishable pattern quite unlike that of any of the other areas. The Spartanburg, Rheem and Thielmann brands have a spoked pattern with the serial number stamped consistently in relation to the spokes. The spoke pattern is easily distinguishable and can be used as a landmark for six discrete rotational steps of the mask. The serial number must appear within one of the six segments contained within the arms of the spokes (see Fig. 4). This decreases the number of rotational steps required to locate the serial number. This spoke detection can be used as a first pass stage for image rotation and serial number detection. Should the spoke pattern not be detected, the keg is either a Blefa or it is overblown. Assuming it is a Blefa, the serial number location mask is incrementally rotated at relatively small angles (e.g. 2.5°). The broadened serial number is detected as it passes through the mask.
Fig. 4. Auto-location of serial number using density profiles (left) and location of serial number within a spoked segment (right)
If a serial number is not detected through a complete digital rotation of the mask (in either the spoked or non-spoked case), then it is poorly discriminated from the background. This can be due to extreme damage such as widespread abrasions, foreign substances such as paint coating the dome, or the keg being overblown. This sequence of operations and decisions is summarised in Fig. 6. It is noted that a patch of rust or scratches could return a false positive for the serial number region. While further investigation is required to exclude such an occurrence from interfering with the data, the likelihood of such a patch being confined to a small area, rather than enveloping the whole of the keg dome, would be quite small. If a full keg is placed in a cellar that is so cold that the beer freezes, it will expand and stretch the keg itself. This could impact upon brand determination and serial number extraction. This stretching almost always results in an increase in the height of the top dome of the keg, and the keg is then classified as being overblown (see Fig. 5). Given that this condition creates some complications for the machine vision process, it prompted further consideration of how to measure whether or not a keg is indeed overblown. It is proposed that three or four high intensity light sources could be focused at predefined spots on the keg dome. The illuminated “spots” would intersect within a pre-defined region of interest, thereby giving an indication as to whether the radius of curvature of the keg dome is within acceptable bounds.
Fig. 5. Overblown Thielmann keg (left) and planar diagram of an overblown dome (right)
Fig. 6. Flow chart of serial number detection and extraction
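The sequence of Fig. 6 can be illustrated in code. The following is a minimal sketch, assuming OpenCV and NumPy; the smoothing scale, threshold, dilation radius and mask handling are illustrative choices rather than the values used in this study, and the zero-crossing step is approximated by a simple magnitude threshold.

# Minimal sketch of the pre-processing / localisation steps described above.
# Parameters and function names are illustrative, not taken from the study.
import cv2
import numpy as np

def candidate_text_regions(image_bgr, sigma=2.0, dilation_radius=15):
    grey = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # LoG approximated as Gaussian smoothing followed by the Laplacian (Eq. 1)
    smoothed = cv2.GaussianBlur(grey, (0, 0), sigma)
    log = cv2.Laplacian(smoothed, cv2.CV_32F)
    # Edges correspond to zero-crossings; a magnitude threshold stands in here
    edges = (np.abs(log) > 4.0).astype(np.uint8) * 255
    # Broaden stamped text so a digit sequence merges into one elongated blob
    kernel = cv2.getStructuringElement(
        cv2.MORPH_ELLIPSE, (2 * dilation_radius + 1, 2 * dilation_radius + 1))
    return cv2.dilate(edges, kernel)

def find_serial_by_rotating_mask(blobs, centre, mask, step_deg=2.5):
    """Rotate a brand-specific mask (uint8 image) about the filler-valve
    centre (x, y) and return the angle of strongest overlap with the
    dilated text regions."""
    best_angle, best_score = None, 0
    h, w = blobs.shape
    for angle in np.arange(0.0, 360.0, step_deg):
        rot = cv2.getRotationMatrix2D(centre, angle, 1.0)
        rotated_mask = cv2.warpAffine(mask, rot, (w, h))
        score = int(np.count_nonzero(cv2.bitwise_and(blobs, rotated_mask)))
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle, best_score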
5.2
Classification
Serial number digits are classified using optical character recognition methods. Two approaches were investigated: one based on traditional template matching and one using neural networks.

Template Matching

The basis of the template matching approach to OCR is to match the input pattern image to a template either derived from many exemplars or artificially created. Font tolerance is easily catered for through the appropriate construction of the template. The degree of coincidence (or the total sum of differences) between the pattern and the template gives a metric by which a specific character can be identified.
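As an illustration, a template match for a single segmented digit block can be scored with a normalised correlation, one possible realisation of the coincidence metric described above. The block size and the function name below are illustrative assumptions.

# Sketch of template-matching OCR for one segmented digit block.
# templates: dict mapping digit -> float array of the same shape as the block.
import numpy as np

def classify_digit(block, templates):
    """Return (best_digit, score) by normalised cross-correlation."""
    b = (block - block.mean()) / (block.std() + 1e-9)
    best_digit, best_score = None, -np.inf
    for digit, tmpl in templates.items():
        t = (tmpl - tmpl.mean()) / (tmpl.std() + 1e-9)
        score = float(np.mean(b * t))      # degree of coincidence
        if score > best_score:
            best_digit, best_score = digit, score
    return best_digit, best_score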
Table 1. Experimental recognition rates of template matching based OCR

Overall true recognition rate   72%
False positives                 25%
False negatives                  3%
The template was created specifically for the Blefa font type, the Blefas being the newer of the kegs, so as to determine the accuracy of the template matching approach with minimal impact from variations in sheen, scratches and surface damage. Template matching based recognition achieved a 72% overall successful recognition rate – an unsatisfactory result. The false positive cases are defined as those for which a digit was classified and either it did not exist (9%), or it did exist but was incorrectly classified (16%). The false negative cases are those where a valid digit was not recognized at all (3%). This recognition performance appears to be strongly related to three factors.

1. Visually malformed digits that would defy human visual inspection. This was due to instances of poor embossing (stamping) of the digit, and markings occluding digits. The former instance will always present a challenge. The latter instance can in part be dealt with through caustic cleaning of the keg prior to inspection.
2. The digit registration method used to delineate digits relied on the existence of clear space between digits. There were instances where extraneous pixels (by-products of the pre-processing) effectively joined adjacent digits and thereby eliminated the required inter-digit spacing. Some further tuning of the pre-processing stage, as well as the use of a predefined digit pitch spacing, may assist in reducing the false classifications.
3. Occasionally, extraneous pixels with space on either side were taken as being candidate digits. This contributed to false positives.

Template matching relies heavily on spatial correlation (or coincidence) and is therefore very sensitive to translational and rotational variations. Given the nature of the keg serial number identification problem, variations in translation and rotation are expected. A recognition method with a high degree of tolerance is preferable.

Neural Network

Computationally intelligent (CI) methods, which are more likely to successfully extract digits that are visually discernible, and are less dependent on digit registration processes, are suggested to be more appropriate. Candidate CI methods include neural networks, fuzzy logic and hybrids thereof. Neural networks were examined as a precursor to potential future development of computationally intelligent keg serial number OCR. A three layer feed-forward neural network (one input layer, one hidden layer and one output layer) was used (Hagan et al. 1996). The number of input layer nodes corresponds to the number of pixels in the scanning pixel array. The number of output neurons corresponds to the number of output classes. It was found that a single hidden layer was sufficient to achieve the results described below.
The main consideration in selecting the number of hidden layers and the number of hidden neurons is that of having sufficient neurons to achieve acceptable performance whilst maintaining generalisation within the solutions. In all, the neural network had 6000 input neurons (corresponding to a 60×100 pixel block), 10 output neurons to identify the ten digits per font, and a single hidden layer of 32 neurons. The activation function for all neurons was a log-sigmoid function. The network was trained using a template of the digits arranged such that each numeral was represented by a 60×100 array corresponding to the input digit block. A set of idealised data vectors was first used to train the network until it reached a predefined and acceptably low sum-squared error. The training vectors were then augmented with a set of noisy vectors to assist with the generalisation process and to facilitate correct identification of noisy digits as well as clean digits. These noisy vectors were created via the addition of random noise with standard deviations of 0.1 and 0.2 to the idealised vectors.

Table 2. Experimental recognition rates of neural network based OCR

Overall true recognition rate   92%
False positives                  8%
False negatives                  0%
The neural network achieved an improved recognition rate of 92%. No classifications were made for non-existent digits and 8% of existing digits were incorrectly classified. Taking into consideration those digits which were malformed in the stamping process, and those occluded by dirt and markings (as cited above in the template matching case), which present significant challenges even for human visual inspection, an effective recognition rate of 97% can be argued.
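For illustration, the network described above (6000 inputs for a 60×100 digit block, one hidden layer of 32 log-sigmoid neurons, 10 outputs) can be sketched as follows in NumPy. The weights shown are random placeholders, and training by back-propagation to a target sum-squared error is omitted; only the forward pass and the noisy-exemplar augmentation are shown.

# Sketch of the three-layer feed-forward classifier described in the text.
import numpy as np

rng = np.random.default_rng(0)

def logsigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative random weights; in practice these come from training on the
# idealised templates and then on the noise-augmented training set.
W1 = rng.normal(scale=0.01, size=(32, 6000)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.01, size=(10, 32));   b2 = np.zeros(10)

def classify(block_60x100):
    x = block_60x100.reshape(6000)
    h = logsigmoid(W1 @ x + b1)
    y = logsigmoid(W2 @ h + b2)
    return int(np.argmax(y))          # digit 0..9

def noisy_copies(template, stds=(0.1, 0.2)):
    """Augment an idealised 60x100 digit template with Gaussian noise,
    mirroring the augmentation described above."""
    return [np.clip(template + rng.normal(scale=s, size=template.shape), 0, 1)
            for s in stds]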
5.3
Discussion
Neural network classification appears to provide reduced sensitivity to digit translation and rotation. Template matching is particularly susceptible to missing, or extraneous, inter-digit spacing. Given the nature of the keg ID number recognition problem, and production line implementations, it is apparent that neural network classification is the more robust method. Multiple fonts could be included in the neural network training process, relying on the training to tolerate the variations. This does place a greater emphasis on correctly configuring and training the neural network. It also relies on developing an extensive set of exemplars for training to ensure the entire solution space is covered. Neuro-fuzzy systems have the advantage that their internal configuration can be defined by humans based on knowledge, rather than the somewhat black-box approach of neural networks. Neuro-fuzzy systems can be trained using exemplars either to assist in exploring the ideal configuration or to tune the internal parameters. In the context of multiple fonts, a neuro-fuzzy system could
be manually configured to directly represent all digits in all fonts and then tuned with exemplar data.
5.4
Conclusion
Given the justified need to implement a beer keg tracking and management system, machine vision based techniques offer a solution that does not require tagging or other intrusive modifications to be made to the keg fleet. Conceivably, a vision based inspection station could be placed after the keg external cleaning stage and prior to filling. A vision based system also offers the ability to inspect kegs for visible deformities, providing an opportunity to remove a keg from circulation for repair or discarding without loss of product. It is proposed that the inspection station comprise adequate keg handling to ensure keg travel within a field of view encompassing the keg dome. The dome should be illuminated using a circular arrangement of red LED arrays angled such that the contrast between the embossed ID numbers and the dome sheen is greatest. A digital camera with an appropriate lens could be mounted inside the centre of the domed light array. The camera should have a resolution of at least 1704×1704 pixels with the keg dome filling the field of view (hence the need to mechanically guide the keg through the inspection station). The camera should have a dynamic range of at least 16 bits and can be monochrome. Neural network based recognition provides a feasible classification method, although it did not achieve the target 98% classification rate required in the first instance. It does, however, demonstrate greater robustness to the artifacts (rotation and translation variations) which will be inherent in a production line installation. Classification rates can be further improved through refined image pre-processing techniques, refinement of the neural network architecture and the employment of hybrid techniques such as neuro-fuzzy based classification.
Acknowledgements The authors wish to thank: Dr Peter Rogers and the management of Foster’s Australia for both support of this project and for permission to publish this material, and Queensland University of Technology (QUT) for the loan of camera equipment and lenses that were used within this project.
References

[1] Bryson L (2005) Brewers, Do You Know Where Your Kegs Are? The New Brewer, Sept/Oct, http://www.beertown.org
[2] Clark D, Simmons K (2005) A dynamic 2D laser mark. Industrial Laser Solutions, Aug, pp 19–21
[3] Gonzalez RC, Wintz P (1987) Digital Image Processing, 2nd edn. Addison-Wesley, Reading, Massachusetts
[4] Hagan MT, Demuth HB, Beale MH (1996) Neural Network Design. PWS Publishing, Boston, MA
[5] Klette R, Zamperoni P (1996) Handbook of Image Processing Operators. John Wiley, Chichester, England
[6] Pankoke I, Heyer N, Stobbe N, Scharlach A, Fontaine J (2005) Using RFID technology to optimize traceability. Proceedings of the European Brewing Convention, Prague, pp 1–7
[7] Perryman M (1996) Practical use of RF tags for tracking beer containers. Brewer’s Guardian, Nov, pp 29, 33
[8] Russ JC (1995) The Image Processing Handbook, 2nd edn. CRC Press, Boca Raton, Florida
[9] Schneider M (2003) Radio Frequency Identification (RFID) Technology and its Application in the Commercial Construction Industry. M.Sc. Thesis, University of Kentucky
[10] Till V (1996) Keg Tracking – a Method to Control the Keg Fleet. Proceedings of the Twenty Fourth Convention of the Institute of Brewing Asia Pacific Section, pp 170–173
[11] Till V (1997) Keg tracking – a method to control a keg fleet; experiences and advantages. Proceedings of the European Brewing Convention, pp 737–746
Millimetre Wave Radar Visualisation System: Practical Approach to Transforming Mining Operations
E. Widzyk-Capehart¹, G. Brooker², S. Scheding², A. Maclean², R. Hennessy², C. Lobsey² and M. Sivadorai¹

¹ CSIRO, Brisbane, Australia (formerly CRCMining, The University of Queensland, Brisbane, Australia)
² CRCMining and Australian Centre for Field Robotics, University of Sydney, Sydney, Australia
1
Introduction
Over the last three decades, mining operations have undergone a massive transformation to mechanized, semi-automated and automated mining systems; manual labor has been gradually replaced by machine operations and processes have become more integrated. This change was possible due to technological advances in sensing techniques, improved excavation methods, bigger and more reliable mining machines and a better understanding of geological conditions. Yet, with all these technological advances, the majority of mining operations still rely on a human operator to achieve production goals, whose performance, in turn, is influenced by the accuracy of the information provided by various data gathering systems and by variable, sometimes unmanageable or unpredictable, environmental conditions. In order to achieve and maintain a high level of performance of man-machine systems, the information acquired using various technologies must be accurate and must be provided in time for uninterrupted operation. One such technology, which promises to revolutionise the way mining is conducted and bring it even closer to automation, is the millimetre wave radar visualisation system which, by mapping the working environment of mining equipment, acquiring production data and supplying information to the operator, is well positioned to improve safety, increase productivity and reduce operational and maintenance costs. Depending on the application, the radar visualization systems, developed by CRCMining in conjunction with the Australian Centre for Field Robotics at the University of Sydney, are currently at various stages of development, from prototype
to fully functional commercial units. These systems are continually being improved through ongoing research. The implementation and performance verification of the radar sensor for range measurement and 3D profiling were undertaken in underground and surface mines. Cavity, stope fill and orepass monitoring, as well as dragline environmental mapping and rope shovel bucket and dig face imaging, were the main areas of radar testing. Excellent performance results show the mm wave radar to be unaffected by the dust, vapor, high temperature, vibration or corrosive environments encountered in mining operations. The benefits of the radar technology are associated with increased productivity, decreased maintenance costs, reduction in machine wear and tear and improved safety.
1.1
Application
Underground applications of the mm wave radar system include range measurements in ore-passes and imaging of the internal structures of large mined out cavities. In open cut mines, the uses of radar extend from 3D surface mapping for volume reconciliation and slope stability monitoring to measurements of machine position, bucket fill and bucket position. Most underground mines operate by moving rock from the higher levels, where the mining takes place, through near vertical tunnels called ore-passes or silos to crusher stations at the lowest levels. This is illustrated schematically in Fig. 1a, which shows rock being dumped into the pass through a coarse lattice called a grizzly that restricts the maximum size to less than one meter across. In some mines, the rock travels through an inclined finger pass that feeds into the side of the main pass (Fig. 1a); in others, it goes through a grizzly that covers the top of the main pass. Pass diameters vary between 3 m and 6 m while silo and stope diameters can be up to ten times this size. An accurate measurement of the range to the ore in a pass allows for the checking of “hung” passes, monitoring of the volume of ore stored, maintenance of a rock buffer above the loading station by the operator and automation of some of the ore flow process. A “hung” pass occurs when the rock plugs the pass, creating an ever widening void below the hung rock as broken material is drawn from the bottom of the pass. If this condition is not detected in good time, thousands of tonnes of rock can fall onto the loading station when the plug releases, with potentially catastrophic consequences. For example, the kinetic energy of a single 2 t boulder after falling 100 m is 2 MJ. Considering that 1 m of rock in a 4 m diameter pass has a mass of 30 t, the resulting kinetic energy would be 30 MJ when the rock is dislodged 100 m above the draw point. This could have disastrous consequences, with damage to infrastructure and production stoppages. Many underground mines, where massive deposits of ore are found, produce large underground voids called stopes. Generally, geotechnical constraints limit the stope’s size in plan to about 40×40 m but vertically the limitation is determined by the orebody outline, which can reach up to 300 m in extent. The void created by the mining operation must be filled with a competent material capable
of supporting the walls and the roof while the adjacent stope is being mined. The materials of choice are Cemented Aggregate Fill (CAF, a weak concrete) and/or rock, depending on the duty required. The cost difference between CAF and rock fill makes it desirable to maximize the amount of rock fill without compromising the strength of the filled stope. Since the cavities are often filled with dust and water vapor, an ideal tool to monitor the filling process and fill levels is a “real-time” remote device that can see through the mist and vapor, as shown in Fig. 1b. Surface coal mining operations rely heavily on the performance of mining equipment, especially draglines and rope shovels. The primary function of draglines is to remove the overburden and uncover the coal seam, which can then be excavated by another dragline, a shovel-truck system or a front-end loader and transported to crushers for further processing. The excavation process is complicated by the requirement that these massive machines, some weighing in excess of 3,500 tonnes, retain access to and from the working area. The environments around draglines and rope shovels are often dusty or, in some cases, shrouded in mist, smoke or even steam, to the extent that the operator is unable to see the working area. This makes digging impossible for significant periods of time, which is extremely expensive for a machine that costs up to $500,000 per day to operate. Once again, the ideal tool for viewing the dig and fill process is a “real-time” remote device that can see through the opaque environment and can present the operator with an image of the terrain surface or the bucket contents, as illustrated in Fig. 1c and 1d. These visual feedback sensors can also be used for the partial and, ultimately, for the complete automation of the mining process.
2
Sensor Requirements
2.1
Signal Dynamic Range
All sensors described in Fig. 1 must be capable of measuring the range to the rock or backfill through an extremely dusty volume to an accuracy better than 1% of the maximum range. Some stopes and passes are longer than 300 m and the range requirements for open-cut visualisation easily exceed this distance. Therefore, for a minimum range requirement of 3 m, the ratio of maximum to minimum range will be at least 100. To determine the sensor’s relative received power, a simplified version of the radar range equation can be applied:
S_{dB} = P_{t\,dB} + 2G_{dB} + 10\log_{10}\!\left(\frac{\lambda^{2}}{(4\pi)^{3}}\right) + \sigma_{dB} - L_{dB} - 40\log_{10} R   (1)
where: SdB – Received power (dBW), PtdB – Transmitted power (dBW), GdB – Antenna Gain (dB), λ – carrier wavelength (m), σdB – Target radar cross section (dBm2), LdB – Losses (dB), R – Range (m).
Most of these parameters are determined by the radar system design with the exception of the target radar cross section, σdB, which is a function of the terrain reflectivity, σ°, and the beam footprint. The reflectivity of a typical distributed target (rock surface) can be estimated using rough surface scattering models (Beckmann and Spizzichino 1987) or the following well known relationship (Nelson 2001)
\sigma^{o} = \left|\frac{\cos\theta - \sqrt{\varepsilon_{rock} - \sin^{2}\theta}}{\cos\theta + \sqrt{\varepsilon_{rock} - \sin^{2}\theta}}\right|   (2)

where θ is the angle of incidence and the complex permittivity of the rock is εrock. For a rough surface, where the average incidence angle θ = 0°, the equation reduces to

\sigma^{o} = \left|\frac{1 - \sqrt{\varepsilon_{rock}}}{1 + \sqrt{\varepsilon_{rock}}}\right| = 0.34   (3)
where εrock = 4.21 − j0.156 for coal at X-band (Nelson 2001). From measurements, it has been found that the target structure, which can vary from a huge boulder to piles of blasted rock and paste, rather than the rock permittivity, is dominant in determining the “effective” reflectivity. The measure shown in (3) is, therefore, not accurate in isolation and the radar cross-section, σdB, is usually determined experimentally. Figure 2 shows the results of more than 200 measurements made over a 24 hour period in a working pass using a pulsed 94 GHz radar, in which variations of 30 dB to 40 dB in target reflectivity can be seen (Brooker et al. 2005) for dry rock with a reasonably constant permittivity.
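For illustration, the reduction from Eq. (2) to Eq. (3) can be checked numerically; the snippet below is a minimal sketch using NumPy with the permittivity value quoted above.

# Numerical check of Eqs. (2) and (3) for coal at X-band.
import numpy as np

eps_rock = 4.21 - 0.156j          # coal at X-band (Nelson 2001)

def sigma0(theta_rad, eps=eps_rock):
    """Magnitude of Eq. (2) at incidence angle theta (radians)."""
    root = np.sqrt(eps - np.sin(theta_rad) ** 2)
    return abs((np.cos(theta_rad) - root) / (np.cos(theta_rad) + root))

print(round(sigma0(0.0), 2))      # 0.34, matching Eq. (3)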
Fig. 1. Radar applications for (a) Ore-pass showing the rock fill (b) Stope showing the CAF and crushed rock fill, (c) Dragline monitoring of the dig area and the bucket position and (d) Rope shovel monitoring of the bucket fill and tooth integrity
Fig. 2. Measured 94 GHz reflectivity histogram made in a vertical ore-pass showing the large dynamic range that can be expected in signal level
As the radar cross section is the product of the target reflectivity and the area of the beam footprint, Afoot, for a symmetrical antenna with a half power beamwidth, θ, at range, R, it is

\sigma_{dB} = 10\log_{10}(\sigma^{o} A_{foot}) = 10\log_{10}\!\left(\sigma^{o}\,\frac{\pi}{4}(R\theta)^{2}\right) = \sigma^{o}_{dB} + 10\log_{10}\!\left(\frac{\pi\theta^{2}}{4}\right) + 20\log_{10} R   (4)
Substituting into (1) results in
S_{dB} = P_{t\,dB} + 2G_{dB} + 10\log_{10}\!\left(\frac{\lambda^{2}}{(4\pi)^{3}}\cdot\frac{\pi\theta^{2}}{4}\right) + \sigma^{o}_{dB} - L_{dB} - 20\log_{10} R   (5)
It is clear from (5) that a combination of 40 dB of variation in the reflectivity and a range ratio of 100 to 1 (40 dB) results in a signal dynamic range of 80 dB. To process this received power, the signal is generally digitised and hence an analog to digital converter (ADC) with a dynamic range of at least 80 dB is required. As most common ADCs with the required throughput are 12 bit devices, with a dynamic range of less than 72 dB, some form of gain control prior to conversion is required.
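As an illustration of this dynamic-range argument, the following sketch evaluates Eq. (5) at the extremes of the operating envelope. The transmit power, gain, losses and beamwidth are placeholder values (the gain and beamwidth happen to match the ore-pass column of Table 3); only the range and reflectivity terms matter for the difference.

# Sketch of the dynamic-range argument following Eq. (5): the echo spans
# roughly 80 dB between a close, bright target and a distant, dim one.
import numpy as np

def received_power_dB(R_m, sigma0_dB, Pt_dB=10.0, G_dB=42.0, L_dB=3.0,
                      freq_Hz=94e9, beamwidth_rad=np.radians(1.5)):
    """Eq. (5); Pt, G and L are illustrative placeholder values."""
    lam = 3e8 / freq_Hz
    k = 10 * np.log10(lam**2 * np.pi * beamwidth_rad**2 / ((4 * np.pi)**3 * 4))
    return Pt_dB + 2 * G_dB + k + sigma0_dB - L_dB - 20 * np.log10(R_m)

strongest = received_power_dB(R_m=3.0,   sigma0_dB=0.0)    # close, reflective
weakest   = received_power_dB(R_m=300.0, sigma0_dB=-40.0)  # distant, dim
print(strongest - weakest)   # 80 dB: 40 dB of reflectivity + 40 dB of range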
2.2
Attenuation Through Clouds of Dust and Water Droplets
Atmospheric attenuation through clear air is determined by frequency dependent molecular interactions with the electromagnetic radiation. In Fig. 3 (Preissner 1978), it can be seen that the attenuation increases with increasing frequency in the microwave and millimetre wave bands before dropping off sharply into the IR and visible bands. In the millimetre wave band, windows of relatively low attenuation occur at 35 and 94 GHz between oxygen absorption lines. It is within these windows that most radar activity occurs, with the frequency around 77 GHz earmarked for automotive sensors and that around 94 GHz reserved for defence and experimental applications. For the relatively short range applications considered here, atmospheric attenuation is not significant; however, signal attenuation is affected by particulates suspended in the atmosphere. Although no quantitative studies of millimetre wave signal propagation through dust on mines have been undertaken, some work has been done on propagation through dust storms (Gillett 1979), which showed that the visibility can be related to the mass of dust per cubic meter of air (Table 1). Since the wavelengths of most radar signals, including those in the millimetre wave band, are much larger than the diameter of the dust particles, Rayleigh scattering is used to determine signal attenuation as a function of the visibility (Goldhirsh 2001). This relationship (Fig. 4) can be used to determine the performance of the radar in dusty or misty conditions (Brooker 2005).
Fig. 3. Atmospheric attenuation of electromagnetic radiation in clear air as a function of frequency
Table 1. Relationship between Visibility and Mass Loading

Visibility (m)    Mass Loading (g/m³)
0.5               78
1                 37
2                 17.8
4                 8.5
8                 4.0
12.75             2.44*

*Average for sand storms
Fig. 4. Relationship between visibility and attenuation for coal dust and water droplets at 94 GHz
In contrast to the poor transmission at visible and IR wavelengths, attenuation at a wavelength of 3.2 mm (94 GHz) is practically negligible for short range operation through the dust, though the losses do become significant in a water vapor environment if the visibility is very poor. For a visibility of 4 m (extremely thick fog), which corresponds to a mass loading of 8.5 g/m3, an attenuation of about 12 dB/km can be expected.
2.3
Backscatter from Clouds of Dust and Water Vapour
The effectiveness of a laser or radar system is dependent not only on the actual signal level returned from the target of interest, but also on the relative level of this signal in comparison to other competing returns at the same range.
Fig. 5. Backscatter from coal dust and water with identical particle size distributions as a function of the visibility at 94 GHz
These competing signals are generally referred to as clutter. The most common sources of clutter are returns from the dust or water droplets within the radar beam, or large (high reflectivity) returns that enter through the sidelobes of the antenna. Figure 5 shows the reflectivity plotted for coal dust and water spray with identical particle/droplet size distributions. It can be seen that the magnitude of the backscatter at 94 GHz is very small even at extremely high dust or water droplet levels.
2.4
Other Considerations
In some applications, spurious targets must be discriminated against, as partial blockages may occur in ore passes and falling rocks and/or falling water may be present in both passes and cavities. The environmental effects of high ambient temperature, shock, vibration and dramatic changes in pressure as a result of blast concussion must be taken into consideration during sensor development, while dust and water ingress as well as potential rock falls are given special consideration when designing the protective housing. The configuration setup for the system should be kept to a minimum. Systems should be light, versatile and easy to install and align, as they are often mounted in inaccessible and dangerous areas. The sensors’ extreme reliability (MTBF > 1 year
of continuous operation) and long term accuracy are necessary requirements to reduce or eliminate a large component of system maintenance. Multi-sourcing of components with a robust electrical design allows for component tolerances of up to 20% and provides for module replacement without recalibration. Simple and quick assembly as well as fast testing and calibration of modules and of the complete units should be ensured. From a safety perspective, any radiation produced at the operational frequency should be well within statutory limits. This is a tall order given the environmental extremes encountered in mining operations: blast concussion, machine vibration and the likelihood of rock falls.
3
Selection of Technology
For many years, laser, acoustic, visual and microwave radar systems have been the workhorse technologies for industrial level measurement and imaging. However, since the availability of moderately low-cost K band and millimetre wave band components, these higher frequency options have become more popular (Kielb et al. 1999). Using the results presented here and data from numerous studies on electromagnetic (Bhartia and Bahl 1984; Comparetto 1993; Perry and Baden 2000; Goldhirsh 2001) and acoustic (Goodsit 1982; Kue 1984) propagation through dust, criteria can be presented that can be used to select the most cost-effective candidate technology for a particular application. The pros and cons of the four sensor types are summarized in Table 2, while visual system issues are addressed in the next paragraph. It should be noted that, in addition to the properties listed in Table 2, the performance of acoustic sensors is affected by extraneous noise and air currents, while laser technology, though robust and well established, exhibits poor sensitivity to low reflectivity materials such as coal. As with laser systems, CCTV systems are affected by environmental conditions: dust and rain, as well as variable illumination (due to changing weather conditions), which makes visualisation and object recognition (rocks) difficult to perform. CCTV would require special illumination for night vision and multiple units for stereovision for real-time data processing and operator feedback. Target recognition using video cameras is a very difficult problem, compounded by the need to reliably estimate small differences in shape when dealing with face/bucket imaging. Shape recognition is a largely unexplored field in this context. In their current application in bucket tooth detection systems (Motion Metrics 2006), the image acquisition process is supported by a high power illumination assembly and the imaged objects are large and distinctive. Even then, there is minimal or no data available on their performance to substantiate their application in other areas of object recognition. In many above ground installations, the size requirement is not an issue and a number of quite large microwave and millimetre wave systems have been constructed for imaging and slope stability applications (Reeves et al. 2000; Macfarlane et al. 2004). Thus, for dragline and shovel based imaging systems, both millimetre wave and microwave frequencies can be used. However, the
smaller aperture requirements, and hence lower overall size and weight, make the millimetre wave option more practical, albeit, at present, more expensive. In the ore-pass application, reliable and repeatable measurements can only be made if the beam is sufficiently narrow that it does not illuminate the walls even at the longest operational range (Brooker et al. 2005). Similarly, to produce accurate measurements in a cavity requires an extremely narrow beam. This is only available from either a large aperture or high frequency operation. As large aperture devices are generally cumbersome and heavy, the only alternative is to use the highest possible frequency. Ultimately, millimetre wave radar is the only viable candidate for these applications.

Table 2. Comparison of sensors for mining application

Property                                         Laser     Acoustic     Microwave radar    Millimetre wave radar
Beam width                                       Narrow    Wide         Wide               Narrow
Beam function in enclosed regions                Good      Too wide     Too wide           Good
Ease of scanning with mirrors                    High      Impossible   Low                High
Dust/water vapor penetration                     Poor      Poor         Good               Good
Dust operational effect on antenna/transducer    High      Moderate     Very low           Very low
Cost                                             Low       Low          Low                High
4
Radar Operational Technique and Specifications
A number of different millimetre wave radar techniques have been applied to measure range in industrial processes (Brooker et al. 2005). Of these, the Frequency Modulated Continuous Wave (FMCW) technique proved to be conceptually simple, the most robust and the lowest cost, despite chirp linearity and reflected power sensitivity issues which have to be addressed during operation. The FMCW technique has only recently been adapted for longer range applications by increasing the operational frequency to the 24 GHz ISM band (Zimmermann et al. 1996). Most FMCW radars operate by transmitting a linear frequency chirp of long duration. At any time, the received echo is shifted in frequency from the transmitted signal by the product of the round-trip time to the target and the rate of change of frequency. If the received signal is mixed with a portion of the transmitted signal and filtered, the resulting output will be a constant beat frequency. Two
factors limit the range resolution of these systems: the first is a function of the chirp bandwidth and the second is the actual linearity that is achieved for the transmitted chirp. These limitations notwithstanding, it has been confirmed that transmitted powers of only 10 mW are sufficient for all of the applications discussed here. The basic structure of the radar systems and operational principle are shown in Fig. 6, a and b, respectively. The specifications of the radar systems discussed here are summarized in Table 3, with the Stope and Dragline radars having similar properties.
Fig. 6. FMCW radar (a) schematic diagram and (b) operational principles
Table 3. Radar specifications

Property             Ore-pass        Bucket-Fill     Stope Fill & Dragline
Transmit Power       10 mW           10 mW           10 mW
Centre Frequency     94 GHz          94 GHz          77 GHz
Sweep Bandwidth      250 MHz         1 GHz           600 MHz
Sweep Linearity      <1%             <1%             ≈0.1%
Sweep Time           1 ms            100 µs          1 ms
Range Resolution     1.2 m @ 100 m   0.2 m @ 10 m    0.27 m @ 100 m
Antenna Gain         42 dB           42 dB           47 dB
Antenna Beamwidth    1.5 deg         1.5 deg         1.12 deg
Receiver Bandwidth   2 kHz           13 kHz          4.8 kHz
Noise Floor          –121 dBm        –103 dBm        –117 dBm
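The Table 3 parameters can be related to the beat frequency and theoretical range resolution of an FMCW radar with a short calculation. This is an illustrative sketch only; the quoted resolutions are somewhat coarser than the ideal c/(2B) figure, consistent with the windowing and chirp-linearity limits noted above.

# Sketch relating FMCW parameters (Table 3) to beat frequency and resolution.
c = 3e8  # speed of light, m/s

def beat_frequency_Hz(range_m, sweep_bw_Hz, sweep_time_s):
    # f_b = (2R/c) * (B / T): round-trip delay multiplied by the chirp slope
    return (2 * range_m / c) * (sweep_bw_Hz / sweep_time_s)

def theoretical_resolution_m(sweep_bw_Hz):
    return c / (2 * sweep_bw_Hz)

# Ore-pass radar (250 MHz sweep in 1 ms) observing a target at 100 m:
print(beat_frequency_Hz(100.0, 250e6, 1e-3))   # ~167 kHz
print(theoretical_resolution_m(250e6))         # 0.6 m ideal (1.2 m quoted)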
5
Building the Radars
To facilitate installation, mounting and alignment, the ore-pass and bucket-fill sensors were each built as two modules with a front end and a signal processor connected by a shielded umbilical cable (Fig. 7a). The bucket-fill sensor used the same robust 150 mm diameter radar housing as the ore-pass unit with the addition of a mirror scanner head (Fig. 7b). The signal processor, power supplies and displays were built into a commercially available housing with power and signal access via cable glands and a mil-circular connector. To allow for installation by a single technician, the masses of the two modules were both kept as low as possible, with the processor box weighing in at 4.7 kg and the front end (without mirror scanner) below 5 kg.

Scanner Design

The stope and dragline radars required good angular resolution for long range operation and a mechanism to scan the beam in angle. To achieve this, a 250 mm diameter horn lens antenna and a 2-axis mirror scanner were developed. A mechanism was designed to achieve an azimuth scan rate of up to 5 rps and an elevation scan rate of 20°/s, with servo systems designed to position the mirror with an accuracy of better than 0.1° in both axes. The bucket-fill radar required a high speed mirror scanner to produce 2D scans across the bucket, which was implemented using a standard swash-plate design shown in Fig. 7b. With careful balancing, this configuration was capable of scanning a 1.5° beam at up to 20 rps.
Fig. 7. Complete Ore-pass radar sensor (a) and bucket-fill radar with mirror scanner (b) used for evaluation purposes
5.1
Signal Processing
The beat signal output by the radar is first filtered by an anti-aliasing filter, after which it is sampled and digitized at the appropriate rate. A block of data corresponding to samples taken over a single sweep period is shaped by a Hamming window to reduce range sidelobe levels before the range spectrum is obtained by processing with a Fast Fourier Transform (FFT). Multiple sweeps can be integrated to improve the available signal to noise ratio prior to the application of the target detection, peak interpolation and discrimination algorithms. The measured target range is then available on a 4–20 mA loop or RS232 serial connection in the case of the Ore-pass radar, while a client-server architecture is used by the other radars. The client (the visualization software) requests data from the radar server in order to visualize it. This architecture allows the visualisation client(s) to be physically remote and, since Ethernet is the main transport protocol used, they may be located anywhere in the world.
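A minimal sketch of this processing chain is given below, assuming NumPy. The explicit detection threshold and discrimination logic are omitted, and the function and variable names are illustrative rather than those of the deployed system.

# Sketch of the FMCW processing chain described above: window each sweep,
# FFT to a range spectrum, integrate sweeps and convert the peak to range.
import numpy as np

def estimate_range(sweeps, fs_Hz, sweep_bw_Hz, sweep_time_s):
    """sweeps: list of 1-D sample arrays, one per sweep, sampled at fs_Hz."""
    n = len(sweeps[0])
    window = np.hamming(n)                        # suppress range sidelobes
    spectra = [np.abs(np.fft.rfft(s * window)) for s in sweeps]
    profile = np.mean(spectra, axis=0)            # non-coherent integration
    k = int(np.argmax(profile[1:])) + 1           # skip the DC bin
    if 1 <= k < len(profile) - 1:                 # parabolic peak interpolation
        a, b, c = profile[k - 1], profile[k], profile[k + 1]
        k = k + 0.5 * (a - c) / (a - 2 * b + c)
    f_beat = k * fs_Hz / n                        # beat frequency (Hz)
    slope = sweep_bw_Hz / sweep_time_s            # chirp slope (Hz/s)
    return 0.5 * 3e8 * f_beat / slope             # R = c * f_beat / (2 * slope)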
6
System Implementation and Results
6.1
Underground Application: Ore Pass and Cavity Monitoring
Underground applications were initially evaluated at Western Mining’s Olympic Dam Mine in South Australia and since then the Ore-pass radar has also been trialled elsewhere in Australia, the USA, Canada and Indonesia.
Ore-pass Radar

Olympic Dam has a long history of trialling range measurement devices in ore-passes, thus the infrastructure is in place for quick installation. Previously, ultrasonic, laser and microwave units had been installed in the trial pass at different times, where all three techniques had proven to be unreliable. The ultrasonic units were found to be too fragile, the laser units could not penetrate the dust and the microwave units often returned signals from the walls of the pass rather than the bottom. In the initial evaluation, the millimetre wave radar was installed in a working pass and connected to the mine SCADA system for 10 months, during which time its output was compared to the manually “dipped” range on a regular basis. The radar’s long-range performance was evaluated to ranges in excess of 170 m during a subsequent test in a deeper pass. That millimetre wave radiation can penetrate extremely thick dust was confirmed during tests in a crusher pocket, where the radar was still able to function perfectly, albeit at short range, through dust densities of up to 4 kg/m3. Since then the radars have been installed at many sites worldwide, where they have proved to be both accurate and reliable. Figure 8 shows the capability of the radar, mounted over the grizzly, to measure both the depth of the pass and LHD activity.
Fig. 8. Measured ore-pass data as a function of time showing (a) reflections from the grizzly, the LHD bucket, falling rock and the bottom of the pass and (b) measured range and signal levels in a working pass
Stope Fill Radar

Installation of the stope fill radar involved the erection of a gantry that extended out over the open stope, as shown in Fig. 9a. A fiber-optic link was deployed to communicate the high speed image data to a dedicated computer in the control room on the surface, a few kilometers away from the experimental site. A snapshot of the output from the radar early in the trials is shown as a point-cloud image in Fig. 9b, which was able to capture both the CAF pile on the stope floor and a column of CAF as it discharged into the stope.
Fig. 9. Stope radar trials at Olympic Dam mine showing (a) the radar suspended over the stope on a fixed gantry (b) Point cloud data measured during the fill process
From these trials it was found that the angle of repose of rock and CAF are sufficiently different to identify the fill material and that the echo amplitude reduces over time as the CAF cures. This could be used to determine when the material is sufficiently cured to continue filling. Additionally, snapshot measurements made every couple of days were processed to reconcile the volume of CAF or rock poured into the stope with that logged by the backfill plant. It can be seen from Fig. 10, which shows a pile of CAF, that the measurements were made with sufficient resolution to identify differences in the angle of repose of the different fill materials as required for their identification. Subsequent trials in other stopes and mines with the same unit over the last four years without a single failure have shown that, notwithstanding the corrosive atmosphere, the high temperatures and the radiation within the stopes, this radar and scanner is also extremely robust. As regards its accuracy, in the latest trials, a comparison was made between the surface measured by the stope radar and that measured using a cavity monitoring system, which found excellent alignment between the two sets of data. It may be possible, from images of the stope floor shown in Fig. 10, to automate the fill process to some extent to select the type of fill used: rock or CAF. At present, the system operates as a visual, real-time feedback mechanism to allow the backfill operators to optimize the process manually.
Fig. 10. Stope floor image processing showing (a) 3D Plot of the floor of a stope during the fill process and (b) a cross section through the pile of CAF from which the angle of repose, and hence the fill type can be determined
Borehole Deployable Stope Radar

After the initial success of the stope radar, the research was directed towards the development of a 94 GHz radar deployable into stopes and other inaccessible underground cavities through a 150 mm diameter borehole. The process of designing a suitable antenna, integrating all components into a small diameter unit and manufacturing a prototype was completed in April 2006 (Fig. 11). Initial tests conducted using a simulated borehole at the ACFR indicated that friction from the push-rods and from the radar head would make manual insertion and retrieval difficult, even through a relatively short borehole, so the radar was fitted with low-friction wear rings (visible in Fig. 12a) and castor-clusters were manufactured to support the push-rods at 2 m intervals. The test radar required a minimum of additional infrastructure. A pair of car batteries supplied the 24 V required for the radar, while a small inverter, driven from these batteries, supplied the laptop with power. Ten 2 m long glass fibre push-rods and a 20 m umbilical cable connected the radar to the display. Trials were conducted at Olympic Dam in late April 2006. A custom hole had been drilled and then reamed to size through about 13 m of rock into an irregularly shaped underground stope which was being filled. To ensure that nothing within the stope could snag the radar, a custom built survey tool consisting of a tilt sensor and a backward facing wide-angle video camera was first inserted through the borehole into the stope. This indicated a partial blockage towards the end of the borehole, which required further reaming and ultimately blasting to clear before it was safe to insert the radar (Fig. 12b).
Fig. 11. Complete borehole deployable radar with covers removed
As shown in Fig. 13a, by chance, the point of radar deployment was within 5 m of the CAF column pouring into the stope. This proved to be a fortuitous accident as it confirmed our predictions that the technology could operate through extremely dense CAF spray, as can be seen in the quality of the images shown in Fig. 13. It also highlighted issues with the mechanical scanning mechanism,
Fig. 12. Borehole deployable radar being inserted into an oversize 13 m long test hole into the stope
Fig. 13. Radar images made of the interior of the stope while it was being refilled with CAF (a) horizontal cross section image and (b) vertical point cloud image
which was easily jammed by small stones from the CAF spray, which could in turn result in the radar becoming stuck within the borehole. These successful outcomes have allowed us to start the development of a second generation radar with more robust mechanics, to be trialled towards the end of 2007.
6.2
Surface Application: Dragline and Electric Rope Shovel
The dragline radar visualisation system was evaluated at Rio Tinto’s Hunter Valley Operation, the BMA Peak Downs Mine and the Anglo Coal Kleinkopje Colliery in South Africa. The electric rope shovel radar sensor was tested at Bracalba Quarry.

Dragline Radar

The radar was installed on the roof of the operator’s cabin on production Dragline DRE23 at Peak Downs Mine for several months. This location provided a clear view of the work area in front of the machine. The 3D radar scanner frame had an adjustable tilt over a 45 degree range in five degree intervals. The angle settings were determined from the limits imposed by the dragline boom and the ground. The mirror servos were configured for a scan angle of 30 deg in elevation and 120 deg in azimuth, and a full raster scan was completed in approximately 2 minutes. The radar unit was linked with the GPS Navigation Unit installed on the dragline (Fig. 14), which received signals from the GPS base station positioned near a power sub-station approximately 3 km away from the dragline. A Stochastic Environment Representation technique was selected for data processing due to its capability of modeling the uncertainty in the sensing processes using Bayesian estimation of the environment (Leal 2003). Data from the radar was geo-referenced to produce a set of points, which were analyzed to eliminate extraneous variables. Once the points were cleaned, a topological surface was
built, an example of which is shown in Fig. 15. This image shows two benches contoured at 2.5 m intervals, where the apparent holes in the walls are regions not illuminated by the radar (shadows). The surface generation can occur in a fairly automated fashion and data can be produced in several output formats suitable for subsequent rendering and analysis. The radar testing showed that images of the terrain could be produced continuously with graphical display, regardless of weather conditions and dust generation. Clean measurements of the terrain surface were made using the radar out to ranges of 300 m. Once the initial surface was generated by the radar system, which can be performed while the dragline is stationary, the subsequent acquisition and processing of the radar data to update the surface took just over one second. This provides an excellent basis for real-time feedback to the operator and engineering personnel on changes in the dragline environment. Radar data output can be imported into any mine planning software, with successful integration achieved using the 3d-Dig software (Widzyk-Capehart et al. 2006). A volume estimation technique was developed, with testing showing that, for smooth objects of a known volume, the results closely approximated the theoretical volume if the grid spacing was sufficiently small.
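As an illustration of a grid-based volume estimate of the kind described above, the following sketch bins geo-referenced points into square cells and sums the mean cell heights. The cell size, the reference level and the gridding rule are illustrative assumptions, not details taken from the deployed system.

# Sketch of a grid-based volume estimate from geo-referenced radar points.
import numpy as np

def grid_volume(x, y, z, z_ref=0.0, cell=0.5):
    """x, y, z: 1-D arrays of point coordinates (m). Returns the sum of
    (mean height above z_ref) * cell area over all occupied grid cells."""
    ix = np.floor((x - x.min()) / cell).astype(int)
    iy = np.floor((y - y.min()) / cell).astype(int)
    heights = {}
    for i, j, h in zip(ix, iy, z - z_ref):
        heights.setdefault((i, j), []).append(h)
    return sum(np.mean(v) for v in heights.values()) * cell * cell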
Fig. 14. 3D Radar and the GPS Navigation units
Fig. 15. Geo-referenced surface generated at 00:52 UTC on 25 July 2004 by the radar mounted on DRE23
Allowing the dragline operator to continue operation during periods of low visibility could save the mining industry millions of dollars in productivity losses. Towards that end, the development and proof-of-principle testing of the mm wave radar was commissioned by Anglo Coal at their Kleinkopje Colliery in South Africa. A lightweight radar based on the high-speed scanner, discussed in the following section, was mounted approximately half way up the boom of a Marion 8200 dragline. This radar was capable of producing 3D image slices through the bucket and cables for real-time display to the operator, allowing operation to continue even when the bucket could not be seen through thick mist or dust. Figure 16 shows radar images taken during the dump (Fig. 16a) and swing (Fig. 16b) cycles, with a bucket schematic overlay confirming that the bucket position can be determined with good accuracy from the rigging orientation. A photograph of the dragline taken with a perspective and scale similar to the radar view was used as reference (Fig. 17).
Fig. 16. Radar images taken during (a) the dump cycle and (b) the swing cycles with bucket schematic overlay
Fig. 17. A photograph of the dragline taken with a perspective and scale similar to radar view is used as reference for radar imaging
Shovel Radar

The shovel radar trials were undertaken on a P&H 2100BLE shovel at Bracalba Quarry. The sensor was mounted on a gantry suspended from the boom (Fig. 18) to test both bucket and face imaging configurations. Both the signal processor and the overlay module were mounted within weather-proof housings on the guide rails on the first landing of the shovel boom, with a 24 V power supply and the video recorder mounted on shock-mounts inside the shovel housing. Measurements during the trials were undertaken with the radar beam scanning a line at right-angles to the shovel axis over an angle that was greater than the width of the bucket. The range and angles to the bucket surface were measured while the bucket was drawn through the beam (Fig. 19), scanning the entire surface. Both the video and the Ethernet data were logged during the trial for later analysis.
Fig. 18. High speed radar installed on the shovel boom: (a) radar attachment to the frame and (b) camera enclosure attachment to the frame (camera centred on the radar axis)
Fig. 19. Schematic showing the process of moving the bucket through the radar beam
A single video frame with an overlay, shown in Fig. 20, demonstrates that the radar resolution and registration are sufficiently good to rebuild an accurate representation of the fill surface, which fits onto a computer model of the bucket. The measurements also show that the scanner has the ability to identify returns from the teeth, which confirms the theoretical principles of radar operation, i.e. the capability to receive an echo from narrow structures (Fig. 20b). The timestamp was used to synchronize the video data with the comprehensive radar and crowd data to reconstruct the fill surface. Material height in the bucket (elevation) was color coded and measured in mm from the bail pin (Fig. 20c). Other parameters corresponding to the images presented in Fig. 20 are listed in Table 4 (Widzyk-Capehart et al. 2005).
Fig. 20. Radar bucket imaging: (a) video recording; (b) rock texture processed image; (c) material fill superimposed on bucket image with elevation (blue – low to red – high)
Table 4. Shovel imaging parameters

Parameter                       Value
Total imaging scan time [s]     66.0
Scanner mirror rate [Hz]        4.6
Scan angle range [degree]       90
Data rate [points/sec]          173
Maximum bucket speed [m/s]      0.8
Number of bucket passes         7
Total number of data points     28,996
2.0
Benefits to the Mining Industry
Ultimate benefits to the mining industry are associated with increased productivity, decreased maintenance costs, reduction in machine wear and tear and improved safety. Knowing the depth of material in orepasses is critical from both safety and productivity perspectives. Collapse of “hung” passes and dumping rock into empty passes cause millions of dollars of damage to mine infrastructure and threaten the safety of personnel. An accurate measure of the pass depth (ore volume) is useful for stock assessment and allows for the automation of the fill and drawing processes, with the associated improvements in both productivity and safety. Pass diameter (and hence scaling) can be inferred from the changes in pass depth with drawn volume, and productivity can be accurately determined from the rate of level change. Knowing the cavity volume and fill surface contours in near real-time adds a new dimension to the back-fill process: it minimizes fill costs by minimizing CAF and maximizing rock fill without compromising the structural integrity of the process; it measures fill volume accurately to determine fill costs; it optimizes fill rates by relating paste cure times to allowable hydrostatic pressures on barriers; and it optimizes the mix to maximize the stope fill with minimum residual voids. Knowledge of the terrain and how it changes with time allows the measurement of the real productivity of a dragline operation and may show the spoil location in 3D, the actual volume of overburden removed and the extent of rehandle. This information would assist the dragline operator to excavate the pit to plan, maximize production and minimize delays. The continual updating of the DTM of an excavation and the direct feedback to operational and planning personnel will also allow for real-time plan reconciliation, re-planning and re-scheduling, increasing the efficiency of the dragline operation. The display of the overall dig sequence would reinforce the operator’s mental model of the task, whereas the micro views would provide real-time feedback on progress against plan. This should enable operators to recognize when they are deviating from plan and manage or recover their error before they digress even further. Three-dimensional images may also be used for training purposes, as actual dig sequences may be “replayed” to operators as part of ongoing training and development or performance management.
Knowledge of the material distribution in the shovel bucket provides a basis for achieving consistent and optimum bucket payloads and thus increased productivity. When combined with a shovel payload system, accurate bucket payloads can be achieved. Improved safety during loading operations would come from providing real-time information on the location of hazardous rocks in the bucket, while reduced machine duty would be achieved by minimising the amount of energy used to fill the bucket. The bucket imaging system (in its present form) is an operational enhancement tool, as it provides information towards improved estimation of material bulk density, dipper–truck matching and enhanced operator performance via real-time feedback on the material distribution in the bucket. Dig face imaging during shovel operation would provide accurate positioning of the shovel with respect to the dig face, ensuring optimum bucket fill for the given shovel configuration. Knowledge of the shovel configuration can be combined with real-time feedback to the shovel operator for proper machine positioning to guarantee consistent and desirable bucket fill. Bucket tooth detection benefits the mining industry via a reduction in maintenance costs and production losses, as it eliminates tooth dumping into the crusher system. Early warning to the truck operator (that a bucket tooth is in the truck bed) would allow sufficient time to re-direct the truck to a disposal location other than the crusher. Production improvements could be achieved by instantly informing the shovel operator of changed bucket fillability parameters (altered material flow into the bucket) and of requirements for bucket maintenance. Increased safety would be realized by reducing workers’ exposure to risk during the removal of bucket teeth from the crusher.
8
Conclusions
Numerous stages of the millimetre wave radar development and its successful application in surface and underground mining have been presented. The system has proved to be robust, reliable and capable of withstanding harsh mining conditions without loss of accuracy. Fast data processing capabilities make it an ideal tool for real-time imaging and operator feedback. A narrow beam, for good resolution, and variable range operation make it suitable for a wide range of mining applications, from stope imaging in underground mines to dragline and shovel environmental mapping in surface operations. A new line of light-weight, high-speed scanners with an extra degree of freedom is being developed. These visualisation sensors will provide for the possibility of mounting the radar on the dragline boom to map the entire region around the machine and, for the shovel application, to examine the bucket contents during the dig phase of the cycle as well as providing real-time on-line operator feedback during the loading component of the cycle. Increased data processing and improved visualisation techniques, when combined with information on machine status, would further advance the operational advantages of excavating machines.
9
Acknowledgements
The authors would like to thank ACARP, BHP Billiton, Rio Tinto, Western Mining Corporation (now BHP Billiton), Bracalba Quarry, P&H and CRCMining for supporting the development of these radar applications.
References

[1] Beckmann P and Spizzichino A (1987) The Scattering of Electromagnetic Waves from Rough Surfaces, Artech House.
[2] Bhartia P and Bahl I (1984) Millimeter Wave Engineering and Applications, John Wiley & Sons.
[3] Brooker G (2005) The Feasibility of using a Millimetre-Wave Radar to find Voids and Slumps in a Longwall. CRCMining Internal Report, The University of Queensland, Brisbane.
[4] Brooker G, Scheding S, Bishop M, and Hennessy R (2005) Development and application of millimetre wave radar sensors for underground mining. IEEE Sensors Journal, vol. 5, no. 6, pp 1270–1280.
[5] Comparetto G (1993) The Impact of Dust and Foliage Penetration on Signal Attenuation in the Millimeter Wave Regime. Journal of Space Communication, vol. 11, no. 1, pp 13–20.
[6] Gillett D (1979) Environmental factors affecting dust emission by wind erosion. In: Morales C (ed) Saharan Dust. Wiley & Sons, New York.
[7] Goodsit M (1982) Field Patterns of Pulsed, Focussed, Ultrasonic Radiators in Attenuating and Non-attenuating Media. Acoustic Society America, vol. 71, no. 2, pp 318–329.
[8] Goldhirsh J (2001) Attenuation and backscatter from a derived two-dimensional dust storm model. IEEE Trans. on Antennas and Propagation, vol. 49, no. 12, pp 1703–1711.
[9] Kielb J and Pulkrabek M (1999) Application of a 25 GHz FMCW radar for industrial control and process level measurement. Microwave Symposium Digest, IEEE MTT-S, pp 281–284.
[10] Kue R (1984) Estimating Ultrasonic Attenuation from Reflected Ultrasonic Signals, Comparison of Spectral Shift and Spectral Difference Approach. IEEE Trans. on Acoustic Speech and Signal Processing, vol. 32, no. 1, pp 1–6.
[11] Leal J (2003) Stochastic Environmental Representation. Ph.D. Thesis, University of Sydney, ACFR, Australia.
[12] Macfarlane D and Robertson D (2004) A 94 GHz dual-mode active/passive imager for remote sensing. SPIE Passive Millimetre-Wave and Terahertz Imaging and Technology, London.
[13] Motion Metrics International Corp. (2006) Personal Communication with President & CEO Dr S Tafazoli.
[14] Nelson S (2001) Measurement and Calculation of Powdered Mixture Permittivities. IEEE Transactions on Instrumentation and Measurement, vol. 50, no. 5, pp 1066–1070.
[15] Perry B and Baden J (2000) Effectiveness of MMW Aerosols in Defeating Battlefield Surveillance Radar: Field Demonstration Preliminary Results. IEEE AES Systems Magazine, pp 11–20.
[16] Preissner J (1978) The Influence of the Atmosphere on Passive Radiometric Measurements. AGARD Conference Reprint No. 245: Millimeter and Submillimeter Wave Propagation and Circuits.
[17] Reeves B, Stickley G, Noon D, and Longstaff D (2000) Developments in monitoring mine slope stability using radar interferometry. Proc. of the Geoscience and Remote Sensing Symposium, IGARSS 2000, Honolulu.
[18] Widzyk-Capehart E, Brooker G, Hennessy R, and Lobsey C (2005) Rope shovel environment mapping for improved operation using millimetre wave radar. Proc. of the 2005 AMT Conference, Fremantle, Australia.
[19] Widzyk-Capehart E, Brooker G, Scheding S, Maclean A, Sheppard G, McDonald A, and Lever P (2006) Real-time dragline production enhancement system. ACARP Report C13042, Brisbane.
[20] Zimmermann B and Wiesbeck W (1996) 24 GHz microwave close range sensors for industrial measurement applications. Microwave Journal, pp 228–238.
An Underwater Camera and Instrumentation System for Monitoring the Undersea Environment
Kenneth K.K. Ku, Robin Bradbeer and Katherine Lam Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
1
Introduction
It is impossible to use human divers carrying underwater cameras, with limited battery and recording capacity, to monitor marine life and coral behaviour and to record for 24 hours a day over three months. As a result, the use of an instrumentation platform, remotely controlling a deployed underwater camera system and sensors, is an alternative approach that can provide long-term monitoring and image recording. The qualities and functions of an underwater camera are important when it is used to monitor coral behaviour such as spawning. A high-resolution, remotely powered and real-time controlled camera is necessary for such a task. In addition, this camera needs functions such as zoom, pan and tilt for a larger viewing range. A real-time capturing and recording system, which can capture real-time pictures and store a large amount of video on PCs, was installed in the shore station of the Marine Science and Engineering Laboratory in the Marine Park at Hoi Ha Wan, Hong Kong. To support high quality imaging, optical fibre cable was used to prevent loss of data and image quality. The video signal was sent directly over a fibre optic cable to the laboratory. This decreased distortion of the video signal and at the same time increased the speed of transmission. A five-in-one real-time remote control underwater Seabird sensor, which measured water temperature, pressure, PAR, dissolved oxygen and salinity, could also be connected to the camera system. A block diagram of the system is shown in Fig. 1. The authors have also previously reported on the design of a camera for in situ recording of fish activity (Ku et al. 2004; Bradbeer et al. 2005; Bradbeer et al. 2006). Most of the results reported previously on fish behaviour have only considered day-time and good-weather conditions (Burke 1995; Colton and Levizon 1981; Robblee and Zieman 1984; Sogard et al. 1989). Night observations on reef fish by SCUBA divers have been much rarer, and these were restricted in terms of the number of samplings (Hobson 1972; Santos et al. 2002). As will be shown, being able to monitor fish behaviour on a 24-hour basis in any weather conditions provides a very powerful tool for more accurate observations. In this paper, we review the design and installation of the underwater camera. Preliminary results on fish species behaviour are reported herein. The system proved able to capture on video rare biological events such as coral spawning.

Fig. 1. Block diagram of the underwater system
2
Structure of the System
2.1
Underwater Cable
The configuration of the underwater cable is shown in Fig. 2. The cable, which contained two copper wires and two multimode fibre optic cables (Harry 1999), was about 210 m long and connected the Marine Life Centre to the underwater system. The waterproof jacket was made of PVC. In addition, aramid yarn was placed inside the jacket to prevent the ingress of water. This cable was designed and manufactured specifically for this underwater camera system.
Fig. 2. Configuration of the underwater cable
Fig. 3. Block diagram of the on-land system
2.2
On-land System
The on-land system was placed at the Marine Life Centre to control the deployed underwater system, including the camera. At the same time, this system processed the received video signal and sensor data for analysing the marine environment. As shown in Fig. 3, the whole system used bidirectional communication via two optical fibre cables. Control commands in RS-232 format were sent from one of the PCs, through the PC’s COM port, to the underwater system. Those commands could be for camera control, lighting control or sensor control. The user could select one of the two PCs to control the underwater system by using the switch. The RS-232 command signals were converted to optical signals and sent down one of the fibre optic cables to the underwater system. Once the underwater system had received the command signal, it sent the sensor data and video signals back to the on-land system down the other fibre optic cable. The optical signal sent back from the underwater system was a combination of video (at a wavelength of 820 nm) and sensor data (at a wavelength of 1300 nm); a WDM filter was applied to split the signals. After the optical signals were separated, the two signals of different wavelengths were converted to a video signal and a data signal by the fibre-to-video converter and the fibre-to-RS232 converter respectively. The data signal was sent back to the PC, and the video was processed through video-capturing hardware and displayed on the screens of the two PCs. The main power to the underwater system was 110 V ac, stepped down through an isolation transformer from the 220 V ac mains supply.
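As a rough illustration of this command path only, the sketch below shows how an on-land PC might send an RS-232 command through a serial port and read back a reply; the port name, baud rate and command strings are hypothetical assumptions, since the paper does not specify the command protocol.

import serial

# Hypothetical serial settings; the actual COM port, baud rate and command
# format used by the system are not given in the paper.
PORT = "COM1"
BAUD = 9600

def send_command(cmd: bytes) -> bytes:
    """Send one control command to the underwater unit and return its reply."""
    with serial.Serial(PORT, BAUD, timeout=2) as ser:
        ser.write(cmd + b"\r\n")   # RS-232 command going down the fibre link
        return ser.readline()      # e.g. Seabird sensor data coming back

if __name__ == "__main__":
    # Example (hypothetical) commands: pan the camera and request sensor data.
    print(send_command(b"CAM PAN +10"))
    print(send_command(b"SENSOR READ"))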
2.3
Underwater System
Once the optical signal was received by the underwater system (Fig. 4), the control board converted it into different signal formats for the different components, e.g., RS232 for lighting brightness control, RS422 for camera motion control and
Fig. 4. Block diagram of the underwater system
RS232 for Seabird sensor control. As the video image was needed in real time, the video signal was converted to optical form continuously. When retrieval of Seabird sensor data was requested from the on-land PC through the appropriate command, the Seabird sensor sent its data in RS232 format to the RS232-to-fibre converter. The optical coupler then combined the optically converted sensor data with the converted video signal and sent them back to the on-land system for processing. The power supply board regulated the 110 V ac supply to the proper power levels for the different circuit boards and components. Photographs of the system are shown in Fig. 5.
Fig. 5. System photos showing the electronics setup and the camera exterior
2.4
Housing
The housing was built to protect the camera and all the electronics placed undersea. Even stainless steel corrodes easily if it is left in seawater for a long time, due to the chemical reaction of the screws and housing with the salt water. Other materials were therefore considered for the housing; acrylic and nylon are better choices for three months of underwater use. The main body thus comprised three materials: (1) a nylon cylinder body to house the electronics, shown in Fig. 6A; (2) two acrylic blocks, on the top and at the bottom of the nylon body, shown in Figs. 6B and 6C, to clamp the cylinder, with one of them acting as a sealed cover carrying the attached underwater cables, and an acrylic dome acting as a window for the camera. The acrylic dome was screwed onto the top of the main body with an O-ring seal (Fig. 6D); and (3) stainless steel bars and nuts to screw all the parts together, shown in Fig. 6E.

Fig. 6. A, Hollow nylon cylinder; B, disassembled housing; C, bottom acrylic block with cable connectors; D, acrylic dome and O-ring; and E, whole housing

3

Data Collection and Analysis

The video camera was deployed on a hard coral community (~5 m in depth) at the Marine Life Centre Bay of the Hoi Ha Wan Marine Park, which is one of the few areas in Hong Kong where the habitat is nominally conserved; being a sub-tropical area, corals grow in patches instead of the extensive formations elsewhere. City University of Hong Kong has a laboratory attached to the WWF Marine Life Centre in Hoi Ha, and this is used for a number of activities related to marine science and engineering. Once the video was received by the PCs in the laboratory, it was stored either as a compressed file in real time on the HDD, or recorded directly onto DVD using an attached DVD recorder, which was programmed so that recording was timed. The burned DVDs could then be examined for analysis. The video data on the PCs were used as source material for an automated fish species identification and counting system currently under development. Some underwater photos captured by the system are shown in Fig. 7. Ten minutes of video footage per hour were recorded round the clock in July 2004. The dataset of the fish assemblage representing fine days was collected over 5 days, i.e., on the 4th, 7th, 8th, 10th and 15th of July. Another dataset representing days of storm and heavy rainfall, which occurred between the 16th and 18th (n = 3), was also obtained. The visibility of the water was approximately 2 m. The camera recorded video footage of a water volume of ~2 m³. The species present were identified and their numbers in each piece of footage were recorded. The number of fish was quantified as the number of fish occurring in each 10-minute video footage. The density of fish was obtained by dividing the number of fish by the volume of water, i.e., 2 m³.

Fig. 7. Underwater photos captured by the system during daylight
4
Results of Observation
A list of fish species identified from the footage recorded by the underwater surveillance camera is shown below.

Class Actinopterygii
  Family Pomacentridae (Damselfishes)
    Neopomacentrus bankieri (Chinese demoiselle)
    Abudefduf bengalensis (Bengal sergeant)
  Family Labridae (Wrasses)
    Halichoeres nigrescens (Bubblefin wrasse, Diamond wrasse)
    Stethojulis interrupta (Cut-ribbon wrasse)
    Thalassoma lunare (Moon wrasse)
  Family Gobiidae (Gobies)
    Amblygobius phalaena (Banded goby)
  Family Siganidae (Rabbitfishes)
    Siganus canaliculatus (Seagrass rabbitsfish, Pearlspot or White-spotted spinefoot)
  Family Apogonidae (Cardinalfishes)
    Apogon pseudotaeniatus (Doublebar or Twobanded cardinalfish)
  Family Mullidae (Goatfishes)
    Parupeneus biaculeatus (Pointed goatfish)
    Upeneus tragula (Freckled goatfish)
  Family Gerridae (Mojarras)
    Gerres macrosoma (Bulky mojarra)
  Family Scaridae (Parrotfishes)
    Scarus ghobban (Blue-barred parrotfish, Bluestriped parrotfish)
  Family Blenniidae (Blennies)
    Aspidontus dussumieri (Lance blenny)
  Family Serranidae (Groupers)
    Cephalopholis boenak (Chocolate hind, brown coral-cod)
    Epinephelus quoyanus (Longfin grouper)
  Family Tetraodontidae (Pufferfishes)
    Takifugu alboplumbeus (Hong Kong pufferfish)
  Family Scorpaenidae (Scorpionfishes)
    Sebasticus marmoratus (Marbled rockfish, Common rockfish)
  Family Syngnathidae (Seahorses and Pipefish)
    Hippocampus kuda (Spotted seahorse, yellow seahorse)
    Syngnathus schlegeli (Seaweed pipefish)
The abundance and species of fish were identified from this footage and the results are shown in Fig. 8 and Fig. 9 for fine weather days. Figure 10 shows the number of fish species and Fig. 11 the density for stormy and rainy days. The number of fish species and the level of fish density generally increased during daylight and decreased to almost zero at night. These two parameters were also highest during dawn and dusk. These figures show that the coral fishes are most active during dawn and dusk on fine days. During the night, they tend to hide in crevices among the coral colonies for shelter. There is also an increase in fish density in the morning of stormy days; this indicates that the fishes tend to feed actively within only a brief period.
Fig. 8. Mean number of species
Fig. 9. Mean density of the fish (Number of fish · m⁻³)
Fig. 10. Mean number of species
Fig. 11. Mean density of fish (Number of fish · m⁻³)
Diurnal activity patterns of some common species have been identified and are shown in Fig. 12. These indicate that different species were active at different periods of the day. Most common species, such as the Chinese demoiselle, Bengal sergeant, Bubblefin wrasse, Moon wrasse and Seagrass rabbitsfish, were active during dawn and dusk. The Pointed goatfish was active only during dusk. All these common species are generally active during daylight and inactive at night. The camera has been deployed in a similar fashion since the middle of June 2005. Preliminary results cover slightly different weather conditions.
Fig. 12. Number of occurrences of fish in 10-minute video footage for six common fish species
Fig. 13. Coral spawning-egg release, 20:00 30 June 2005
Fig. 14. Coral spawning-sperm cloud, 21:00 30 June 2005
However, on the night of 30th June, at ~20:00, egg masses were observed being ejected from coral colonies (Family Faviidae) in front of the camera (Fig. 13). Ejection of sperm occurred subsequently, giving the water column its translucent appearance at ~21:00 (Fig. 14).
5
Discussion
Traditional diver-based fish censuses have to be carried out on fine and warm days. The camera described in this paper, however, can run continuously through both fine and stormy days. The collected footage expands our knowledge of fish activities during bad weather, such as days with storms and heavy rain or low sea water temperatures, which occur regularly in subtropical coral areas such as Hong Kong.
Using an underwater camera for a fish census has the advantage of increasing the sampling size in terms of time, but it is restricted in terms of the spatial sampling area. The diver method is the reverse, i.e., the temporal sampling size is restricted but the sampling area can be increased thanks to the mobility of the diver. In this study, the camera observed fish only over ~5 m³ of coral area, whereas the diver observed over ~400 m³, i.e., four segments × 20 m long × 5 m wide.

When the camera was used in a coral area with a high sediment load, the dome tended to become masked by sediment and ‘marine snow’. This ‘marine snow’ consisted of tiny lumps of fine silt and sediment stuck together by plankton or by mucus secreted by marine organisms. After the camera had been deployed for a few days, a biofilm consisting of bacteria and algae formed on the dome. This biofilm was the first stage of fouling and attracted the settlement of more larvae of fouling organisms. The settled sediment, silt and fouling had to be cleaned off regularly by a diver using a cotton cloth so as to obtain a clear image.

There were other technical problems encountered during the use of this camera in the undersea environment. These are discussed as follows. As the camera used in this project could pan and tilt, a hemispherical dome was used to obtain a larger range of view. However, when the camera was placed underwater, focusing while zooming was worse than on land. The camera lens had a focal length of 4 to 48 mm in air but could not focus on a zoomed picture underwater, due to the distortion of the dome-shaped cover. A 3-dioptre macro correction lens was added so that the camera could zoom onto an object as close as 25 cm.

Initially, the camera could not produce a clear zoomed picture because of water vapour condensing on the underside of the acrylic dome cover. When the camera was put underwater, the temperature difference between the running electronics, at ∼35°C, and the surrounding seawater, at ∼20°C, generated fog on the dome. The fog was removed by adding water-absorbing silica gel packs inside the camera body.

Underwater lights, with the beam parallel to the length of the camera body, were installed on the seabed and were switched on at night. Photophilic marine organisms were attracted to the underwater system at night and blocked the light falling on the monitored objects, i.e., corals and fishes. The position of the lights was adjusted so that the light path was not parallel to the camera view, but off to one side.
6
Conclusion
This instrumentation system worked well during three months of continuous underwater use, observing a coral bed at a depth of 3–4 m. It even survived a typhoon! The system was fixed to the sea bed using rebars hammered into the hard sediment under the sand, and was clamped to the rebars. The fibre optic/power cable was buried under a thin layer of sand for protection. After three months the system was raised and cleaned to clear away all the marine growth.
The camera system thus achieved the following:

• a high-resolution underwater camera that can zoom, pan and tilt as well as being controlled remotely via optical fibre;
• fibre optic communication for underwater use, especially for the underwater camera. In addition, the system could communicate with the Seabird sensor system via optical fibre; through the system, real-time analysis of the underwater environment could be achieved;
• an environmentally friendly system, kind to the marine environment, with real-time underwater recording by remote control rather than by diver.
Acknowledgement

This project was financed with support from the City University of Hong Kong Strategic Research Grant Number 7001418.
References

[1] R.S. Bradbeer, K.K.K. Ku, L.F. Yeung, K. Lam (2005) An Underwater Camera for Security and Recreational Use. In: R.S. Bradbeer, Y.H. Shum (eds) Proceedings, 9th International Symposium on Consumer Electronics. Institute of Electrical and Electronic Engineers, Inc, New Jersey, pp 364–368
[2] R.S. Bradbeer, K.K.K. Ku, L.F. Yeung (2006) Using an Underwater Camera to Monitor Real-Time Fish Activities. Sea Tech 47: 13–19
[3] N.C. Burke (1995) Nocturnal foraging habitats of French and Bluestriped Grunts, Haemulon flavolineatum and H. sciurus, at Tobacco Caye, Belize. Environ Biol Fishes 42: 365–374
[4] D.E. Colton, W.S. Levizon (1981) Diurnal variability in a fish assemblage of a Bahamian coral reef. Environ Biol Fishes 6: 341–345
[5] E.S. Hobson (1972) Activity of Hawaiian reef fishes during the evening and morning transitions between daylight and darkness. Fish Bull 70: 715–740
[6] K.K. Ku, R.S. Bradbeer, K. Lam, L.F. Yeung, R.C.W. Li (2004) An underwater camera and instrumentation system for monitoring the undersea environment. In: Bradbeer RS (ed) Proceedings 10th IEEE International Conference on Mechatronics and Machine Vision in Practice. City University of Hong Kong, Hong Kong, pp 189–194
[7] M.B. Robblee, J.C. Zieman (1984) Diel variation in the fish fauna of a tropical seagrass feeding ground. Bull Mar Sci 34: 335–345
[8] M.N. Santos, C.C. Monteiro, M.B. Gaspar (2002) Diurnal variations in the fish assemblage at an artificial reef. ICES J Mar Sci 59: S32–S35
[9] S.M. Sogard, G.V.N. Powell, J.G. Holmquist (1989) Utilization by fishes of shallow, seagrass-covered banks in Florida Bay: 2. Diel and tidal patterns. Environ Biol Fishes 24: 81–92
Visual Position Estimation for Automatic Landing of a Tail-Sitter Vertical Takeoff and Landing Unmanned Air Vehicle
Allen C. Tsai, Peter W. Gibbens and R. Hugh Stone
School of Aerospace, Mechanical and Mechatronic Engineering, University of Sydney, N.S.W., 2006, Australia
{allen.tsai, pwg, hstone}@aeromech.usyd.edu.au
1
Introduction
People gain a physical sense of the environment surrounding them via visual information from the eyes, but from these observations alone we are not capable of determining the exact dimensions of the environment. Field robots use sensors such as the Global Positioning System (GPS) or Inertial Measurement Units (IMU) to make accurate estimates of position and attitude, but these instruments cannot provide accurate relative measurements with respect to a specific site without prior surveying. Computer vision techniques, i.e. using cameras as sensors, offer visual information that gives a physical sense of a robotic platform’s pose with respect to some targeted site, and are capable of making accurate estimates of relative position and attitude. This paper focuses on how a tail-sitter Vertical Takeoff and Landing (VTOL) Unmanned Air Vehicle (UAV) can use visual cues to aid in navigation or guidance, concentrating on the 3D position estimation of the flight vehicle, especially during landing, where accurate estimates of the vehicle attitude and position offset from a pre-designed landing site are required to achieve a safe landing. Using computer vision techniques to estimate the states of a flight platform, specifically for a VTOL UAV to aid in landing, has been pursued by a number of research groups around the world over the last decade and a half. Work of particular interest is the U.S.C. AVATAR project [1]; landing in an unstructured 3D environment using visual cues has been reported, but the assumption that the UAV always flies parallel to the ground surface was made, restricting the output to 2D information. Work done by the University of California, Berkeley [2] focused on ego-motion estimation, where at least two or more images are required to determine the 3D attitude and position of a VTOL UAV. Yang and Tsai [3] have looked at position and attitude determination of a helicopter undergoing 3D motion using a single image, but
did not present a strategy for target identification. Amidi and Miller [4, 5] used a visual odometer which requires two cameras and is not capable of determining attitude information. This paper expands on the authors’ recent work [6] and looks into how, from a single camera image and IMU information, 3D position estimates of the vehicle can be ascertained from the appearance of the landing target, which is subject to perspective transformation. The position of the vehicle is estimated via Yang and Tsai’s method [3], with slight improvements to deal with the more standard forms of presenting vehicle states in the aerospace field [7]. Paper outline: Section 2 briefly discusses the image processing techniques undertaken, the set-up of the video camera on the “T-Wing” UAV [11], and the design of the landing pad; Section 3 describes the target detection method; Section 4 presents the mathematical techniques used to carry out 3D position estimation for a flight vehicle; and Section 5 presents and discusses the results of position estimates from images taken on a UAV. Section 6 gives some concluding remarks and directions for future work.
2
Image Processing
The idea of the vision algorithm is to accurately maintain focus on an object of interest, which is the landing pad with a pre-designed marking (hereafter referred to as the landing target). The elimination of unwanted objects is achieved by a series of transformations from colour to a binary image, noise filtering, image segmentation and object identification. In this section the image acquisition hardware set-up and the set-up of the landing pad are first introduced, followed by a brief description of the vision algorithm.

2.1
Landing Pad Design and Image Acquisition Hardware
The capital block letter “T” is used as the marking on the landing pad to make up the landing target. Reasons for using a “T” can be found in [6]. The width and the length of the “T” are both 60 cm and the thickness of the vertical and horizontal arms are both 15 cm. Knowing the dimensions of the landing target, the position of a UAV with respect to this target can be estimated; this will be expanded on later. The camera parameters can be found in [6]. The set up of the camera on the flight vehicle is shown in the simulation drawing and photographs in Fig. 1.
Fig. 1. Simulation drawing and photos of the camera set-up on the T-Wing
2.2
Image Processing Algorithm
The low level image processing for target identification and position estimation requires transformations starting from the color image to gray-scale, then noise filtering, binary transformation, geometric feature based object elimination, target detection and line detection; the full details of this part of the task can again be found in [6]. The following figure shows the stages of image processing:
Fig. 2. Image processing steps (from left to right and down): the grayscale transformation, median filtering, binary transformation, rejection of small and large objects and target identification with line detection
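A rough sketch of this kind of pipeline is given below using OpenCV; the median kernel size and the blob-area limits are illustrative assumptions only, since the actual parameter values and the subsequent line-detection step are detailed in [6].

import cv2
import numpy as np

def preprocess(frame_bgr, min_area=200, max_area=20000):
    """Grayscale -> median filter -> binary -> size-based object rejection (illustrative)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    filtered = cv2.medianBlur(gray, 5)                       # noise filtering
    _, binary = cv2.threshold(filtered, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Connected-component labelling; drop blobs that are too small or too large.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    mask = np.zeros_like(binary)
    for i in range(1, n):                                    # label 0 is the background
        if min_area <= stats[i, cv2.CC_STAT_AREA] <= max_area:
            mask[labels == i] = 255
    return mask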
3
Target Identification
The landing target identification procedure is accomplished by using the geometric properties of the target. This involves investigating the moment of inertia of the target shape. Hu’s invariant moments [8] are known to be invariant under translation, rotation and scaling in the 2D plane. The invariant moments are derived from the moment of inertia of the shape of the landing target. This technique is well suited to tasks associated with identifying landing targets. As VTOL UAVs approach the landing target, the flight vehicle will undergo 3D motion; the invariant moment method mentioned above is only known to be invariant under 2D scaling, translation and rotation. However, as shown by Sivaramakrishna and Shashidharf [9], it is possible to identify objects of interest, even from fairly similar shapes and even under perspective transformation, by tracking the higher order moments as well as the lower order ones. This has also been made possible by the fact that the vertical flight of the “T-Wing” seldom has pitch and yaw angles greater than 20 degrees. Further details of this method and the results of target detection can be found in [6]. This method is applied to detect the landing pad and hence determine the position estimates of the landing target.
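For illustration only, the Hu moments of a candidate binary blob can be computed as below; the comparison tolerance against the stored “T” template signature is a placeholder, not a value used in [6].

import cv2
import numpy as np

def hu_signature(binary_blob):
    """Seven Hu invariant moments of a binary shape, log-scaled for easier comparison."""
    m = cv2.moments(binary_blob, binaryImage=True)
    hu = cv2.HuMoments(m).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)   # common log-scaling trick

def looks_like_target(blob, template_sig, tol=0.5):       # tol is a placeholder value
    return np.all(np.abs(hu_signature(blob) - template_sig) < tol)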
4
State Estimation
Due to the inherent instability of VTOL UAVs near the ground during landing, it is necessary to be able to accurately determine the 3D position and attitude of the flight vehicle relative to the landing target, to assist in the control of the vehicle for a successful autonomous landing. This section discusses the attitude and position estimation techniques used.
4.1
Coordinate Axes Transformation
Transformations between several coordinate systems need to be defined so that any position and attitude estimates are properly referenced. The first coordinate system to be defined is the image plane, denoted the Image Coordinate System (ICS); then the Camera Coordinate System (CCS), which represents the camera mounted on the flight vehicle; and lastly the flight vehicle body axes, denoted the Vehicle Coordinate System (VCS). To describe the relative orientation between the landing target and the flight platform, the Global Coordinate System (GCS) is also required, as is the more conventional ground reference frame of North, East and Down (NED). The following diagram shows the relationships between the NED, GCS, VCS and CCS.
Fig. 3. Relationships between all the Coordinate Systems
4.2
Flight Vehicle Orientation Estimation
The presence of parallel lines in the target shape can be used to deduce the 3D attitude of the flight vehicle; it is well known that 3D parallel lines intersect at a vanishing point on the image plane due to perspective transformation. The vanishing point is a property that indicates the 3D line direction of the set of parallel lines [10]. This technique ultimately gives the direction cosine matrix C_{G.C.}, which is the transformation matrix from the CCS to the GCS. The details of the parallel-lines technique used to obtain C_{G.C.} can be found in [6]. The C_{G.C.} matrix can also easily be generated from the angles of the flight vehicle given by the IMU; these angles can be transformed into the camera’s pan, tilt and roll angles and hence the C_{G.C.} matrix can be determined. For the purpose of testing the position algorithm, the IMU angle outputs are used. The IMU angles are used instead of angles obtained from the vanishing point technique as they are more accurate, down to the 0.05 degrees quoted by the manufacturer [12]. The use of IMU angles ensures that the position estimates are not affected by errors introduced in the vanishing point technique, and this is the more practical and accurate approach for future autonomous tasks.
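A minimal sketch of building a direction cosine matrix from three Euler angles is shown below; the rotation order (Z–Y–X) and sign conventions are assumptions for illustration, as the exact pan/tilt/roll convention of the camera mount is defined in [6] and [7].

import numpy as np

def euler_to_dcm(roll, pitch, yaw):
    """Direction cosine matrix from Z-Y-X Euler angles in radians (assumed convention)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx          # rotate about x, then y, then z

# Example: hypothetical IMU angles (degrees) converted to a camera DCM.
C = euler_to_dcm(*np.deg2rad([2.0, -1.5, 45.0]))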
4.3
Flight Vehicle Position Estimation
The geometric properties of the landing target “T” can be used to determine the position of the landing target with respect to the vehicle using one single image. The position of the landing target in the CCS is first determined and can then easily be transformed into north, east and altitude information for the vehicle. The position estimation technique is adopted from Yang and Tsai [3]. This technique accounts for the perspective distortion of images due to the 3D motion of UAVs. Let us first consider a 3D line L represented by a set of points:

$$ L = \{(x, y, z) \mid (x, y, z) = (p_1, p_2, p_3) + \lambda (d_1, d_2, d_3) \ \text{for real } \lambda \} \quad (1) $$

The line L passes through the point (p_1, p_2, p_3) and has line direction (d_1, d_2, d_3) in the CCS. The image-plane point of a point on the line L is

$$ (u, v) = \left( \frac{f x}{z}, \frac{f y}{z} \right) = \left[ \frac{f (p_1 + \lambda d_1)}{p_3 + \lambda d_3},\ \frac{f (p_2 + \lambda d_2)}{p_3 + \lambda d_3} \right] \quad (2) $$

where f is the focal length of the camera and λ is the line parameter. Transforming Eq. (2) into matrix form results in the linear system

$$ A p = b, \quad (3) $$
where

$$
A = \begin{pmatrix}
f & \cdots & f & 0 & \cdots & 0 \\
0 & \cdots & 0 & f & \cdots & f \\
-u_0 & \cdots & -u_{n-1} & -v_0 & \cdots & -v_{n-1}
\end{pmatrix}^T,
\qquad p = (p_1, p_2, p_3)^T
$$

and

$$
b = \big( \delta_0 (u_0 d_3 - f d_1), \ldots, \delta_{n-1} (u_{n-1} d_3 - f d_1),\;
\delta_0 (v_0 d_3 - f d_2), \ldots, \delta_{n-1} (v_{n-1} d_3 - f d_2) \big)^T,
$$

where δ is the distance between successive collinear points and the index n refers to the (n+1)st point on the edge of the target “T”. The collinear points on the edges of the “T” are used to determine the positions; they are easily picked out as the intersections between the edges of the horizontal arm and the edges of the vertical arm of the “T”, which were obtained from line detection. Four such intersection points are expected and are used for position estimation. p = (p_1, p_2, p_3)^T is the landing target position in the camera frame, taken as the average over the four intersection points. The linear system can be solved with the QR-decomposition technique by treating it as a least-squares problem. Only the results of this procedure are shown; the full derivation can be found in Yang and Tsai’s paper [3]. The A matrix in normalized QR-decomposition form is

$$ A = QR, \quad (4) $$

where

$$
Q = \begin{pmatrix}
\tfrac{1}{\sqrt{n}} & \cdots & \tfrac{1}{\sqrt{n}} & 0 & \cdots & 0 \\
0 & \cdots & 0 & \tfrac{1}{\sqrt{n}} & \cdots & \tfrac{1}{\sqrt{n}} \\
\tfrac{U/n - u_0}{q} & \cdots & \tfrac{U/n - u_{n-1}}{q} & \tfrac{V/n - v_0}{q} & \cdots & \tfrac{V/n - v_{n-1}}{q}
\end{pmatrix}^T,
\qquad
R = \begin{pmatrix}
\sqrt{n}\, f & 0 & -\tfrac{U}{\sqrt{n}} \\
0 & \sqrt{n}\, f & -\tfrac{V}{\sqrt{n}} \\
0 & 0 & q
\end{pmatrix}
$$

and

$$
q = \sqrt{S_U + S_V - \frac{U^2}{n} - \frac{V^2}{n}}, \qquad
U = \sum_{i=0}^{n-1} u_i, \quad V = \sum_{i=0}^{n-1} v_i, \qquad
S_U = \sum_{i=0}^{n-1} u_i^2, \quad S_V = \sum_{i=0}^{n-1} v_i^2 .
$$

Substituting Eq. (4) into Eq. (3) gives

$$ QRp = b \;\Rightarrow\; p = R \backslash Q^T b, \quad (5) $$

where

$$
Q^T b = \begin{pmatrix}
\frac{1}{\sqrt{n}} \sum_{i=0}^{n-1} \delta_i (u_i d_3 - f d_1) \\[4pt]
\frac{1}{\sqrt{n}} \sum_{i=0}^{n-1} \delta_i (v_i d_3 - f d_2) \\[4pt]
\frac{1}{q} \sum_{i=0}^{n-1} \Big[ \delta_i (u_i d_3 - f d_1)\big({-u_i} + \tfrac{U}{n}\big)
 + \delta_i (v_i d_3 - f d_2)\big({-v_i} + \tfrac{V}{n}\big) \Big]
\end{pmatrix}
= (q_1, q_2, q_3)^T
$$

$$
\Rightarrow\; p = (p_1, p_2, p_3)^T
= \pm \left( \frac{q_1}{\sqrt{n}\, f} + \frac{U q_3}{n f q},\;
\frac{q_2}{\sqrt{n}\, f} + \frac{V q_3}{n f q},\;
\frac{q_3}{q} \right)^T .
$$

If q_3 < 0, the result R\Q^T b must be multiplied by −1, which is the reason for the ± sign [3]. Knowing that p is the position of the landing target in the CCS, the position of the vehicle in the GCS is obtained firstly by translating the camera position vector to the VCS origin and transforming the position vector from the CCS into the VCS. This is followed by the translation of the landing target position in the VCS to the GCS and the transformation of the VCS into the GCS. The GCS is not aligned with the North-East-Down frame, hence another transformation is required to finally obtain the flight vehicle’s position in the North, East and Down frame. Let p = X_C; then

$$
X_V = C_{V.C.}\, X_C + T_V, \qquad
X_G = -C_{G.V.}\, X_V, \qquad
X_{NED} = C_{NED.G.}\, X_G, \quad (6)
$$

where C_{G.C.} = [d_x, d_y, d_z]^T, C_{G.V.} = C_{G.C.} × C_{C.V.},

$$
C_{G.NED} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix}, \qquad
C_{V.C.} = \begin{bmatrix} 0 & 0 & -1 \\ 1 & 0 & 0 \\ 0 & -1 & 0 \end{bmatrix}, \qquad
C_{NED.G.} = C_{G.NED}^T .
$$
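As a rough numerical illustration of Eqs. (3)–(6), the sketch below solves the linear system with a generic least-squares call rather than the closed-form Q and R given above, and then applies the coordinate transformations; the image points, δ spacing, camera offset T_V and the attitude matrix C_{G.V.} are hypothetical placeholders, while C_{V.C.} and C_{G.NED} are taken from the text.

import numpy as np

def target_position_ccs(uv, delta, d, f):
    """Least-squares solution of Ap = b for the target position in the CCS.
    uv: (n, 2) image points on one edge; delta: spacing of the collinear points;
    d: 3D line direction (d1, d2, d3); f: focal length."""
    u, v = uv[:, 0], uv[:, 1]
    n = len(u)
    A = np.zeros((2 * n, 3))
    A[:n, 0] = f;  A[:n, 2] = -u            # rows f*p1 - u_i*p3
    A[n:, 1] = f;  A[n:, 2] = -v            # rows f*p2 - v_i*p3
    b = np.concatenate([delta * (u * d[2] - f * d[0]),
                        delta * (v * d[2] - f * d[1])])
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p if p[2] > 0 else -p            # keep the target in front of the camera

# Hypothetical data: four image points, uniform spacing, line direction, focal length.
p_ccs = target_position_ccs(np.array([[10., 5.], [20., 5.], [30., 5.], [40., 5.]]),
                            delta=np.array([0.15, 0.15, 0.15, 0.15]),
                            d=(1.0, 0.0, 0.0), f=500.0)

# Transformation chain of Eq. (6); C_GV and T_V are placeholders here.
C_VC = np.array([[0, 0, -1], [1, 0, 0], [0, -1, 0]], float)
C_GNED = np.array([[0, 1, 0], [1, 0, 0], [0, 0, -1]], float)
C_GV = np.eye(3)                            # placeholder attitude
T_V = np.zeros(3)                           # placeholder camera offset in the VCS
X_V = C_VC @ p_ccs + T_V
X_G = -C_GV @ X_V
X_NED = C_GNED.T @ X_G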
5
Experimental Results
Flight images taken during hover flight testing of the “T-Wing”, a tail-sitter VTOL UAV, were used to test the accuracy and robustness of the aforementioned 3D position estimation techniques. Flight images were extracted from a 5 second period of video, approximately 100 images, of the flight vehicle hovering over the landing target with the target always within the camera viewing angles. Figure 4 shows the North, East and Altitude coordinate estimates relative to the target location from the vision system and compares them with corresponding relative position estimates from the GPS, and a ‘filtered’ estimate from a fusion algorithm internal to the GPS/IMU system. In terms of accuracy, it cannot be said that either the GPS or the IMU–GPS filtered outputs of position are 100% accurate; GPS measurements are known to be affected by ionospheric conditions, multi-path effects, satellite geometry and so on, and have different accuracies for different operational modes [12]. The manufacturer’s quoted precision [12] is 0.45 cm CEP (Circular Error Probable) for the mode that the GPS was in (equivalent to DGPS) for this particular part of the flight. The figure clearly shows coarseness in the GPS measurement that would be inadequate for feedback into a guidance/control system. The ‘filtered’ estimate is sufficiently smooth for control but evidently suffers from filter bias/lag resulting from the coarseness of the raw GPS position estimate. In comparison, the vision estimate gives smooth results that otherwise lie within the GPS error bounds. Subsequent fusion of the vision-based position estimate with the IMU should provide a smooth and accurate relative position measurement suitable for guidance and control to touchdown.
Fig. 4. Position Estimates from G.P.S., Filtered, Vision and Simulation
GPS cannot provide the absolute truth of vehicle position. However, the vision system can clearly give position estimates that potentially converge toward zero as the target is approached. To understand the nature of this characteristic, and for comparison purposes, an error analysis is justified. A test was devised that investigated the sensitivity of the position estimates from the vision algorithm to uncertainties in the vehicle attitude components. The IMU system has a precision of ±0.05 degrees in each of the three attitude angles. Perturbations of ±0.05 degrees in the IMU attitude angles were applied in all combinations to test the sensitivity of the vision-based position estimation technique to these uncertainties. The image processing algorithm was tested on images at two altitudes: one low down at 1.98 m and the other higher up at 4.14 m. This demonstrates how pixel accuracy and line detection affect the position estimate as the target appears smaller with distance. The low and high altitude images used for the sensitivity testing, along with their line detection, are shown in Fig. 5, and the results of the sensitivity analysis are given in Table 1.
Fig. 5. The high and low altitude images with their line detection results

Table 1. Position Estimates of Perturbed Angle Inputs and the R.M.S. Error

                         North (m)           East (m)            Altitude (m)
Test Case                Low       High      Low       High      Low       High
Unperturbed Estimate     0.5856    –0.7148   0.6266    0.7741    1.9805    4.1414
Minimum perturbation     0.5827    –0.7201   0.6243    0.7692    1.9783    4.1395
Maximum perturbation     0.5884    –0.7095   0.6288    0.7789    1.9827    4.1433
R.M.S. error             0.0019    0.0037    0.0015    0.0034    0.0015    0.0017
The results are promising, as they show that IMU attitude errors contribute only 2 mm to the errors in position estimates at an altitude of 2 m and 4 mm at an altitude of 4 m. Thus the most significant inaccuracies in the vision-based position estimates will be due to the height dependence of the image processing procedures and the integrity of the camera calibration. Errors of less than 10 cm show that at altitudes of about 5 m the vision system can almost always have better accuracy than GPS, and that this accuracy improves as the target is approached. With regard to computational time, the filtering and thresholding of the images took approximately 11.92% of the time; component labeling and segmentation took about 54.99%; the position estimation algorithm needed 4.933%; and the
invariant moments’ calculations required 28.16% of the computational time when dealing with two objects. The reported computational time for the position algorithm is less than 0.001 s [3]; this shows that, using one single image, accurate position estimates accounting for perspective projection can be obtained with less computational power than the more favoured multiple-view techniques.
6
Conclusion
In this paper, results of vision-based relative position estimation for a tail-sitter VTOL UAV undergoing 3D motion just before landing have been presented. This method of 3D position estimation requires only a single camera and IMU angle data, and is computationally more efficient than motion analysis based on two cameras. Further development of this technique could see autonomous landing of manned helicopters onto a helipad and autonomous guidance of fixed-wing aircraft onto runways during landing without GPS aiding. A major issue requiring investigation is the estimation of position when the landing target is only partially visible in the image plane. In the future we intend to fuse attitude, position and velocity estimates for use in the control of the “T-Wing” during landing.
References

[1] S. Saripalli, J.F. Montgomery, and G.S. Sukhatme, “Vision-based autonomous landing of an unmanned aerial vehicle,” presented at IEEE International Conference on Robotics and Automation, 2002.
[2] R.O.V. Shakernia, C.S. Sharp, Y. Ma, S. Sastry, “Multiple view motion estimation and control for landing an unmanned aerial vehicle,” presented at IEEE International Conference on Robotics and Automation, Washington, DC, 2002.
[3] Z.F. Yang and W.H. Tsai, “Using parallel line information for vision-based landmark location estimation and an application to automatic helicopter landing,” Robotics and Computer-Integrated Manufacturing, vol. 14, pp. 297–306, 1998.
[4] T. Amidi and K. Fujita, “Visual odometer for autonomous helicopter flight,” Robotics and Autonomous Systems, vol. 28, pp. 185–193, 1999.
[5] T. Amidi and J.R. Miller, “Vision-Based Autonomous Helicopter Research at Carnegie Mellon Robotics Institute 1991–1997,” presented at American Helicopter Society International Conference, Heli, Japan, 1998.
[6] A. Tsai, P. Gibbens, and R. Stone, “Terminal Phase Vision-Based Target Recognition and 3D Pose Estimation for a Tail-Sitter, Vertical Takeoff and Landing Unmanned Air Vehicle,” presented at Pacific-Rim Symposium on Image and Video Technology, Hsin-Chu, Taiwan, 2006.
[7] B.L. Stevens and F.L. Lewis, Aircraft Control and Simulation, 2nd ed. Hoboken, New Jersey: John Wiley & Sons, 2003.
[8] M. Hu, “Visual Pattern Recognition by Moment Invariants,” IRE Transactions on Information Theory, 1962.
[9] R. Sivaramakrishna and N.S. Shashidharf, “Hu’s moment invariants: how invariant are they under skew and perspective transformations?” presented at IEEE WESCANEX 97: Communications, Power and Computing, 1997.
[10] R.M. Haralick and L.G. Shapiro, Computer and Robot Vision, vol. II: Addison-Wesley, 1993.
[11] R.H. Stone, “Configuration Design of a Canard Configuration Tail Sitter Unmanned Air Vehicle Using Multidisciplinary Optimization.” N.S.W.: PhD Thesis, University of Sydney, 1999.
[12] “SPAN Technology System Characteristics and Performance,” NovAtel Inc., 2005.
Minutiae-based Fingerprint Alignment Using Phase Correlation
Weiping Chen and Yongsheng Gao School of Engineering, Faculty of Engineering and Information Technology, Griffith University, Australia
[email protected],
[email protected]
1
Introduction
Fingerprints have been used as a method of personal identification for over a century. They are widely used in biometric authentication at present because of their uniqueness and permanence. A fingerprint consists of ridges and valleys. There are two basic features used in fingerprint recognition, i.e. ridge endings and ridge bifurcations; other features are also used. According to the features used, automatic fingerprint recognition techniques are classified into minutiae-based, image-based and ridge feature-based approaches (Maltoni et al. 2003). The ridge feature-based approach (Jain et al. 2000) is used when minutiae are difficult to extract from very low-quality fingerprint images, whereas other features of the fingerprint ridge pattern (e.g., local orientation and frequency, ridge shape, texture information) may be extracted more reliably than minutiae, even though their distinctiveness is generally lower. The image-based approach (Ito et al. 2004) uses the entire gray-scale fingerprint image as a template to match against input fingerprint images. This approach needs a large amount of storage space, and fingerprint images are illegal to store in some nations. The minutiae-based approach attempts to measure the degree of similarity between two minutiae sets. Jain et al. (1997) proposed alignment-based minutiae matching algorithms which use corresponding ridges or minutiae neighbours to find the best registration. However, these methods make the computation more complicated and need to search for the best correspondence of minutiae pairs or ridge pairs (Jain et al. 1997), or use core or delta minutiae points to estimate the alignment (Chan et al. 2004). The problem of lost or false minutiae always occurs during the minutiae detection process; hence, the corresponding pairs may not be found under such conditions.
The performance of fingerprint recognition relies heavily on the accuracy of fingerprint alignment. This paper proposes a novel minutiae-based fingerprint alignment approach using phase correlation. A new representation called the Minutiae Direction Image (MDI) is introduced, which is generated by converting minutiae point sets into image space. The rotation and translation values are calculated using phase correlation between the input MDI and the template MDI. Our approach does not need to search for corresponding pairs between the two fingerprints; the alignment parameters are obtained directly through phase correlation between two MDIs. The phase correlation method (Brown et al. 1992) provides a straightforward estimation of the rigid translation between two images. It has been applied in image-based fingerprint recognition (Ito et al. 2004). However, fingerprint recognition is widely applied in embedded systems or portable devices, which require a small storage space. Since that algorithm processes entire fingerprint images, the large storage requirement for all fingerprint images limits its applicability. The proposed minutiae-based approach stores merely a small number of minutiae points, greatly reducing the storage requirement. The paper is organized as follows: Section 2 gives the definition of phase correlation. The minutiae direction image is introduced in Section 3. Section 4 describes a fingerprint alignment algorithm using phase correlation. Section 5 presents the experiments and preliminary results. Conclusions are drawn in Section 6.
2
Phase Correlation
The phase correlation (PC) method is a popular choice for image registration because of its robust performance and computational simplicity (Hoge 2003). It is based on the well-known Fourier shift theorem. Suppose two images f_1 and f_2 differ only by a translation (dx, dy). The relationship between these two images is given by

$$ f_2(x, y) = f_1(x - dx,\, y - dy). \quad (1) $$

Their corresponding Fourier transforms F_1 and F_2 are related by

$$ F_2(u, v) = e^{-j 2\pi (u\,dx/M + v\,dy/N)}\, F_1(u, v). \quad (2) $$

In other words, the Fourier magnitudes of the two images are the same and the only difference between them is in phase. In addition, this phase difference is directly related to the displacement. The phase difference equals the cross-phase spectrum (or normalized cross-phase spectrum) P(u, v), which is given by

$$ P(u, v) = \frac{F_1^*(u, v)\, F_2(u, v)}{\left| F_1^*(u, v)\, F_2(u, v) \right|} = e^{-j 2\pi (u\,dm/M + v\,dn/N)}, \quad (3) $$

where F_1^*(u, v) denotes the complex conjugate of F_1(u, v). The 2D inverse Fourier transform of the cross-phase spectrum is given by

$$ p(m, n) = \frac{1}{MN} \sum_{u,v} P(u, v)\, e^{\,j 2\pi (u m/M + v n/N)}. \quad (4) $$

p(m, n) is a delta function centred at the displacement. Hence, the displacements are determined from the location of the peak in the inverse cross-phase spectrum.
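A minimal NumPy sketch of this peak search is given below; it assumes two equally sized single-channel arrays and ignores windowing and sub-pixel refinement.

import numpy as np

def phase_correlation(f1, f2):
    """Estimate the integer translation (dx, dy) between two equal-size images."""
    F1 = np.fft.fft2(f1)
    F2 = np.fft.fft2(f2)
    cross = np.conj(F1) * F2                   # F1* F2, as in Eq. (3)
    P = cross / (np.abs(cross) + 1e-12)        # normalized cross-phase spectrum
    p = np.fft.ifft2(P).real                   # Eq. (4)
    dy, dx = np.unravel_index(np.argmax(p), p.shape)
    # Map peaks in the wrapped half of the spectrum back to negative shifts.
    if dy > f1.shape[0] // 2:
        dy -= f1.shape[0]
    if dx > f1.shape[1] // 2:
        dx -= f1.shape[1]
    return dx, dy, p.max()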
3
New Representation: MDI
Phase correlation cannot be used to align two point sets directly. We present a new representation called the Minutiae Direction Image (MDI), which is generated by converting a minutiae point set into a 2D image space. The alignment parameters are then determined using phase correlation between two MDIs. Let M = ((x_1, y_1, α_1), …, (x_N, y_N, α_N)) denote the set of N minutiae in a fingerprint image. The image size is C × R and (x_i, y_i, α_i) are the three features (spatial position and orientation) associated with the ith minutia in set M. Define the MDI of set M as M^M(m, n), m ∈ [0, C − 1], n ∈ [0, R − 1]. It contains the angles of the minutiae directions at the positions of the minutiae points and 0 otherwise, which is written as

$$
M^M(m, n) =
\begin{cases}
\alpha_i & m = x_i,\; n = y_i,\; (x_i, y_i, \alpha_i) \in M \\
0 & \text{otherwise}
\end{cases} \quad (5)
$$

The size of the MDI M^M is the same as that of the fingerprint image. The intensity of each pixel in an MDI is the angle of the minutiae direction if the pixel’s coordinates are the same as the location of a minutiae point; otherwise, the intensity is set to 0.
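A direct sketch of this construction is shown below; the minutiae list format is assumed to match the (x, y, α) triples above, and any smoothing of the sparse image is omitted.

import numpy as np

def build_mdi(minutiae, width, height):
    """Minutiae Direction Image: angle alpha at each minutia location, 0 elsewhere."""
    mdi = np.zeros((height, width), dtype=float)     # R rows (y) by C columns (x)
    for x, y, alpha in minutiae:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < width and 0 <= yi < height:     # ignore points outside the image
            mdi[yi, xi] = alpha
    return mdi

# Example with three hypothetical minutiae (x, y, angle in radians).
mdi = build_mdi([(12, 40, 0.3), (100, 85, 1.2), (200, 150, 2.7)], width=328, height=364)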
4
Fingerprint Alignment Approach
In this section, we propose a minutiae-based fingerprint alignment approach using PC function. Alignment parameters (displacement and rotation) are calculated between the template fingerprint and the input fingerprint. We assume the scaling is constant because the images in most applications are acquired at the same resolution. After converting minutiae sets into MDIs, the alignment parameters are estimated using phase correlation.
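A rough end-to-end sketch of this search, reusing the build_mdi and phase_correlation sketches above and anticipating the peak-picking formalised in Eqs. (6) and (7) below, could look as follows; the rotation range, step, centre of rotation and the shift applied to the stored angles are assumptions for illustration only.

import numpy as np

def rotate_minutiae(minutiae, theta_rad, cx, cy):
    """Rotate (x, y, alpha) minutiae about (cx, cy); alpha is shifted by the same angle (assumed)."""
    c, s = np.cos(theta_rad), np.sin(theta_rad)
    out = []
    for x, y, a in minutiae:
        dx, dy = x - cx, y - cy
        out.append((cx + c * dx - s * dy, cy + s * dx + c * dy, a + theta_rad))
    return out

def align(template, query, width, height, theta_max=30.0, d_theta=1.0):
    """Search over rotations; for each, phase-correlate the MDIs and keep the best peak."""
    mdi_t = build_mdi(template, width, height)
    best = (-np.inf, 0.0, 0, 0)                       # (peak, theta, dx, dy)
    for theta in np.arange(-theta_max, theta_max + d_theta, d_theta):
        rotated = rotate_minutiae(query, np.deg2rad(theta), width / 2, height / 2)
        mdi_i = build_mdi(rotated, width, height)
        dx, dy, peak = phase_correlation(mdi_t, mdi_i)
        if peak > best[0]:
            best = (peak, theta, dx, dy)
    return best[1:]                                   # (theta_r, dx, dy)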
Given two sets of minutiae points T and I, extracted from the template fingerprint image and the input fingerprint image respectively, we use the following notation for convenience:

$$
T = \big( (x_1^T, y_1^T, \alpha_1^T), \ldots, (x_p^T, y_p^T, \alpha_p^T) \big), \qquad
I = \big( (x_1^I, y_1^I, \alpha_1^I), \ldots, (x_q^I, y_q^I, \alpha_q^I) \big),
$$

where (x_i^T, y_i^T, α_i^T) are the three attributes (spatial position and orientation) associated with the ith minutia in set T.

Firstly, the template minutiae set T is converted to a template MDI M^T. Suppose the possible rotation angle ranges from −θ_max to θ_max with angle spacing Δθ. For every θ ∈ [−θ_max, θ_max], the input minutiae set I is rotated into a new input minutiae set I_k, and the rotated input minutiae set is converted into an input MDI M^I_k. A set of input MDIs M^I_k is thus obtained, where k = 0, 1, 2, …, 2θ_max/Δθ. The highest correlation peak value v_k in the kth inverse phase correlation space is calculated by

$$
v_k(dx_k, dy_k) = \max_{m,n} \{ p_k(m, n) \}, \qquad
(dx_k, dy_k) = \arg\max_{m,n}\, p_k(m, n), \quad (6)
$$

where p_k(m, n) is the inverse phase correlation between the kth input MDI M^I_k and the template MDI M^T, and k = 0, 1, 2, …, 2θ_max/Δθ. (dx_k, dy_k) is the coordinate of v_k. If the size of the MDI is C × R, then m ∈ [0, C − 1] and n ∈ [0, R − 1]. A set of highest correlation peak values V = { v_0(dx_0, dy_0), …, v_M(dx_M, dy_M) } is thus obtained, where v_k(dx_k, dy_k) is the maximum inverse phase correlation value between T and I_k and k = 0, 1, 2, …, M; M = 2θ_max/Δθ. The displacement (Δx, Δy) between the two fingerprint images is the (dx_k, dy_k) with the maximal value of v_k in the set V, and the rotation angle θ_r corresponds to the index k with the maximal value of v_k. The transformation parameters are calculated using Eq. (7):

$$
\theta_r = k \cdot \Delta\theta - \theta_{max}; \qquad \Delta x = dx_k; \qquad \Delta y = dy_k. \quad (7)
$$

5
Experiment and Preliminary Results
The experiments make use of the databases provided by FVC2004 (Maio et al. 2004). In our experiments, the database DB2 set A is used. This database contains 100 fingers and 8 impressions per finger. The size of each image is 328 × 364 (width × height). We did three tests to evaluate the performance of the proposed algorithm.
5.1
Independent Sensitivity Experiment
In this experiment, we randomly select one impression of each finger in the database as the template, manually rotate and translate the templates by given angles and by given numbers of pixels in the x- and y-directions, and then estimate the rotation and the translations in the x- and y-directions using our algorithm. Rotation angles range from –30° to 30° with a step size of 1°. Displacements in the x and y directions range from –60 pixels to 60 pixels with a step size of 2 pixels. The average error and standard deviation are calculated over all images. The results are shown in Table 1.

Table 1. Results of independent sensitivity experiment

                            Average Error      Standard deviation
Rotation                    0.6138 (degree)    2.4777 (degree)
Translation (x direction)   0.0902 (pixel)     0.6370 (pixel)
Translation (y direction)   0.0277 (pixel)     0.1803 (pixel)

5.2
Combined Sensitivity Experiment
In the above experiment, we tested the proposed approach’s sensitivity to rotation and to translations in the x and y directions separately. In this experiment, the input images are generated by both rotating and translating the template images. 50 rotation angles and translation values in both the x and y directions are randomly created using a Gaussian distribution. The average error and standard deviation between the templates and the new inputs are shown in Table 2.

Table 2. Results of combined sensitivity experiment

                      θ (degree)   Δx (pixel)   Δy (pixel)
Average Error         0.44284      2.3446       2.2959
Standard Deviation    1.7235       4.5618       7.0096

5.3
Using All Images in Database
The previous two experiments only compared template and input images related by a linear transformation, so no non-linear distortion existed between the two fingerprint images. However, non-linear distortion always exists in reality. In this experiment, all impressions in the database (FVC2004 DB2_A) are used. A manual alignment method is used as the ground truth against which the proposed approach is compared. The third impression of each finger is used as the template. Each template in the database is aligned against the remaining impressions of the same
finger. The average error and standard deviation are calculated over all impressions. The results are shown in Table 3.

Table 3. Results of using all images in database A (Maio et al. 2004)

                       θ (degree)    Δx (pixel)    Δy (pixel)
Average Error          2.9605        4.5727        4.6944
Standard Deviation     3.6459        6.3980        9.2796

6
Conclusion
This paper proposes a novel minutiae-based fingerprint alignment approach, which utilizes phase correlation to calculate the alignment parameters between two minutiae sets. The computation of the proposed method is simple and does not require searching for corresponding minutiae pairs, nor the additional feature information required by conventional minutiae-based matching techniques. Our algorithm uses only the locations and directions of the sparse minutiae points in the fingerprints, which greatly reduces the storage space compared to current phase-based fingerprint matching techniques. Experimental results show that the proposed approach performs well in aligning fingerprint minutiae sets while greatly improving the economy of storage space.
References

[1] Brown LG (1992) A survey of image registration techniques. ACM Computing Surveys, vol. 24, no. 4, pp 325–376.
[2] Chan KC et al. (2004) Fast fingerprint verification using subregions of fingerprint images. IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 1, pp 95–101.
[3] Hoge WS (2003) A subspace identification extension to the phase correlation method. IEEE Transactions on Medical Imaging, vol. 22, no. 2, pp 227–80.
[4] Ito K et al. (2004) A fingerprint matching algorithm using phase-only correlation. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E87-A, no. 3, pp 682–691.
[5] Jain AK et al. (1997) On-line fingerprint verification. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 4, pp 302–314.
[6] Jain AK et al. (1997) An identity authentication system using fingerprints. Proceedings of the IEEE, vol. 85, no. 9, pp 1365–1388.
[7] Jain AK et al. (2000) Filterbank-based fingerprint matching. IEEE Transactions on Image Processing, vol. 9, no. 5, pp 846–859.
[8] Maio D et al. (2004) FVC2004: The third fingerprint verification competition. Hong Kong, China.
[9] Maltoni D et al. (2003) Handbook of Fingerprint Recognition. New York.
Robotic Techniques
Mobile robots can take many shapes, with wheels, legs and even wings or rotors. Here we meet a robot in the shape of a snake. Although the snake idea is decades old, this paper from China presents practical applications for inspection that are certainly worth reading. Underwater robots present actuation problems. One form of motor that is impervious to being submerged is the ‘hydraulic muscle’. This paper from Hong Kong presents the theory and results of adapting a tubular muscle intended for pneumatic operation for their very practical underwater work. From actuators we move to sensors. A Birmingham (UK) medical research group has a particular interest in touch sensors. Here they are concerned with identifying the ‘shape’ of the pressure that is applied. They present practical examples including the analysis of a golf swing and gait on a treadmill. The accent of the papers in this book is almost exclusively on practical applications, but here we have one on kinematics. However, the tutorial value in this paper is in avoiding the trigonometric acrobatics of inverse kinematics calculations and instead finding a solution by a practical iterative method akin to hill-climbing.
A Snake-like Robot for Inspection Tasks
Bin Li, Li Chen and Yang Wang Robotics Laboratory Shenyang Institute of Automation Chinese Academy of Sciences, Nanta Street, Shenyang, 110016 P.R. China
[email protected]
1
Introduction
On the earth there are many kinds of snakes. The movement of a real snake is very flexible and can be adapted to various environments. Snakes are able to move on rough surfaces, they can cross obstacles, and they can creep into areas that are very difficult to reach with any other kind of movement. This means that a snake-like robot with the same properties would be an ideal inspection system. As an example of an unconventional robot, a snake-like robot has been developed at the Shenyang Institute of Automation (SIA). The goal of this development was to imitate the movement of a biological snake as closely as possible and to use the snake-like robot for inspection tasks. Several snake-like robots that emulate a snake's motion have been developed. The first serpentine robot was built by Hirose (Hirose, 1993), whose group later carried out gliding experiments on ice to show that creeping motion follows the same principle as skating (G. Endo, 1999). How such mechanisms can locomote in a plane was studied in (Hirose, 1993; J. Ostrowski, 1995; Ma, 2001). Snake robot locomotion theory based on geometric mechanics has also been discussed for the serpentine robot (O. Takanashi, 1996). NEC developed a 3-dimensional motion robot for the purpose of search and rescue of survivors in collapsed buildings (R. Worst, 1996), GMD built another 3-dimensional motion robot with a tendon-driven mechanism, SIA built a 3-dimensional motion robot to study the mechanism of locomotion, and Hirose's group has also developed a 3-dimensional robot equipped with large passive wheels (M. Mori, 2001). In this paper, we describe the development of a snake-like robot for inspection tasks and the corresponding inspection experiments. A snake-like robot system for inspection tasks is also presented.
2
The Locomotion Mechanism
The snake is a vertebrate, an animal with a backbone, and has the largest number of vertebrae of any animal: between 100 and 400, depending on the species. Snake skeletons have only three types of bones: skull, vertebrae and ribs. The snake's skeletal form and structure are quite simplified in number and type. The interesting lessons from snake skeletons are the simplicity of a repeated structure and the relatively limited motions between adjacent pieces. These aspects are worth examining in a mechanism design. Our snake-like robot is made of many identical joint units; each joint unit has one degree of freedom, as shown in Fig. 1. Two adjacent joint units are assembled vertically and form a module. Concatenation of these modules produces a locomotion mechanism.
Fig. 1. The joint unit of the snake-like robot
3
The Locomotion Mode
As shown in Fig. 2, we develop a mathematical model for an articulated snake robot consisting of n rigid links with torque actuators at n−1 joints.
Fig. 2. The model of snake robot
We define the two DOF of each joint as a rotation about the pitch axis, represented by φi, and a rotation about the yaw axis, represented by θi. αθ0 and αφ0 are the initial winding angles of the two waves, nθ and nφ are the numbers of links in each locomotion plane, s is the displacement of the tail along the serpenoid curve, Kn is the number of wave shapes, i is the ith link, L is the whole length of the robot body, δφ is the phase difference between the two waves, and K1 is a constant. Based on these, in this study the 3-dimensional locomotion is described by the composition of the horizontal serpentine curve from the bending angle around the Z axis and the vertical serpentine curve from the bending angle around the X axis. Serpentine movement is obtained when we hold φi constant and vary θi as follows:

θi(s) = −2α0 sin(Knπ/n) · sin(2Knπ s/L + 2Knπ i/n) + K1 l    (1)
Concertina movement is obtained when we hold θi constant and vary φi as follows:

φi(s) = −2α0 sin(Knπ/n) · sin(2Knπ s/L + 2Knπ i/n) + K1 l    (2)
Sidewinding movement is a three-dimensional locomotion. This gait is driven by three-dimensional rolling of each joint without using the special friction condition between the body and the ground that is required for lateral undulation. The body-shape curve is described by the composition of the bending motions about the X axis and the Z axis with a phase difference, and has the following form:

θi(s) = −2αθ0 sin(Knπ/nθ) · sin(2Knπ s/L + 2Knπ i/nθ)
φi(s) = −2αφ0 sin(Knπ/nφ) · sin(2Knπ s/L + 2Knπ i/nφ + δφ)    (3)
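The three gaits amount to evaluating Eqs. (1)–(3) for each link index i at a given body displacement s. The sketch below is an illustration only, not the authors' controller code; angles are taken in radians and the example parameter values are arbitrary placeholders (the 16 joints and 1.6 m length are taken from Table 1 later in the paper).

```python
import math

def wave(s, i, alpha0, n, K_n, L, offset=0.0, K1=0.0, l=0.0):
    """Generic serpenoid wave term of Eqs. (1)-(3); all angles in radians."""
    return (-2.0 * alpha0
            * math.sin(K_n * math.pi / n)
            * math.sin(2.0 * K_n * math.pi * s / L
                       + 2.0 * K_n * math.pi * i / n
                       + offset)
            + K1 * l)

def serpentine(s, i, alpha0, n, K_n, L, K1=0.0, l=0.0):
    # Eq. (1): phi_i is held constant and theta_i varies. Concertina motion,
    # Eq. (2), applies the same expression to phi_i instead.
    return wave(s, i, alpha0, n, K_n, L, K1=K1, l=l)

def sidewinding(s, i, alpha_theta0, alpha_phi0, n_theta, n_phi, K_n, L, delta_phi):
    # Eq. (3): both bending angles vary, with a phase difference delta_phi.
    theta_i = wave(s, i, alpha_theta0, n_theta, K_n, L)
    phi_i = wave(s, i, alpha_phi0, n_phi, K_n, L, offset=delta_phi)
    return theta_i, phi_i

# Example: serpentine joint angles for a 16-joint, 1.6 m robot at s = 0.1 m
# (alpha0, K_n and s are arbitrary placeholder values).
angles = [serpentine(0.1, i, alpha0=0.5, n=16, K_n=2, L=1.6) for i in range(1, 17)]
```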
4
Configuration of the Control System
The snake's locomotion is achieved by controlling the relative angles of the many units so as to attain the desired gait. Each unit is driven by a separate processor (slave) that controls the movement of its joint. A centralized controller is located in the head section. All slaves are linked via a single serial bus (CAN bus) to the centralized controller, which coordinates their movement, as shown in Fig. 3.
Fig. 3. Control system of the snake-like robot
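As a rough picture of the master–slave arrangement in Fig. 3, the following Python sketch broadcasts joint set-points over a CAN bus using the python-can library. It is purely illustrative: the message identifiers, the payload format (one signed 16-bit set-point in hundredths of a degree per slave) and the channel name are assumptions, not the robot's actual protocol.

```python
import struct
import can  # python-can

def send_joint_setpoints(bus, angles_deg, base_id=0x100):
    """Broadcast one set-point per slave joint controller over the CAN bus.

    Each slave is assumed to listen on (base_id + joint index) and to accept
    a signed 16-bit angle in hundredths of a degree; both are assumptions.
    """
    for i, angle in enumerate(angles_deg):
        payload = struct.pack('<h', int(round(angle * 100)))
        msg = can.Message(arbitration_id=base_id + i,
                          data=payload, is_extended_id=False)
        bus.send(msg)

# Example (requires a configured SocketCAN interface, e.g. 'can0'):
# bus = can.interface.Bus(channel='can0', bustype='socketcan')
# send_joint_setpoints(bus, [10.0] * 16)
```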
The centralized controller and every performing unit of the robot share the same hardware configuration: by downloading a different program, a board becomes either the centralized controller or a performing unit, which makes manufacture and modification easy. The dimensions of the controller are 50 mm × 48 mm, as shown in Fig. 4.
Fig. 4. The controller
5
The Snake-like Robot
According to the design described above, a snake-like robot with adaptability to the ground has been built, as shown in Fig. 5. The specifications of the robot are given in Table 1. Using the control methods explained above, the snake-like robot can move with three locomotion modes (serpentine movement, concertina movement and sidewinding movement).
Fig. 5. The snake-like robot

Table 1. Specifications of robot

No. of joints    16
Dimension        1.6 m
Weight           2.1 kg
6
Configuration of the Inspection System
A compact video camera with a built-in transmitter is fixed in the head of the snake-like robot, and its video can be viewed on a monitor. Even though people do not go to the place being inspected, they can see the image from there. In order to carry out inspection work in areas that are difficult or dangerous for humans to access, we have built an inspection system based on the snake-like robot. The inspection system consists of the console and the snake-like robot, as shown in Fig. 6. Using this system, when the snake-like robot has moved into a hole, we can watch the video on the monitor and learn what is in the hole, as shown in Fig. 7.
Fig. 6. Configuration of the inspection System
Fig. 7. The inspection work in a hole
Fig. 8. Raise head for observing
7
Inspection of a Car
In a car park there are many kinds of cars and trucks. If something dangerous is placed under a car or truck, it is dangerous for the police to go there and look for it. We think that the snake-like robot can help the police to do this work. Let us assume it is necessary to inspect a car where it is difficult and dangerous for police to go underneath. We are convinced that the snake-like robot could be very useful in this case. An operator who controls the snake-like robot would send the robot under the car. During inspection the TV camera at the robot's head sends pictures, and a TV monitor may be used to observe them. A laptop is also helpful for controlling the robot: instructions to the snake-like robot are input via the menu displayed on the LCD screen of the laptop. For various reasons the radio contact between the robot and the controlling laptop may be lost. The robot can then act fully autonomously until the radio contact is established again.
Fig. 9. The inspection work under a car
The result of the experiment indicates that our snake-like robot system is a new system for inspection tasks under a car. This is a new application for snake-like robots.
8
Future Works
Methods of autonomous path-planning and navigation for the snake-like robot will be studied. One method will be for the robot to turn back onto the path it took under the car and then follow this path in the backward direction. It would be much more complicated if the robot had to continue the inspection on its own. For this purpose the robot should be able to process, or at least store, pictures which could then be processed later.
Acknowledgments This work is supported by the National High Technology Research and Development Program of China (863 Program).
References

[1] G. Endo, K. Togawa, and S. Hirose (1999) Study on Self-contained and Terrain Adaptive Active Cord Mechanism. IEEE/RSJ International Conference on Intelligent Robots and Systems.
[2] S. Hirose (1993) Biologically Inspired Robots – Snake-like Locomotors and Manipulators. Oxford University Press.
[3] http://borneo.gmd.de/~worst/snake-collection.html
[4] J. Ostrowski, J.W. Burdick (1995) Geometric Perspectives in the Mechanics and Control of Robotic Locomotion. Proc. 7th Int. Symp. Robotics Research.
[5] M. Mori, S. Hirose (2001) Development of Active Cord Mechanism ACM-R3 with Agile 3D Mobility. IEEE/RSJ International Conference on Intelligent Robots and Systems.
[6] S. Ma (2001) Analysis of Creeping Locomotion of a Snake-like Robot. Int. J. of Advanced Robotics, 15(2), 205–224.
[7] O. Takanashi, K. Aoki, and S. Yashima (1996) A Gait Control for the Hyper-redundant Robot O-RO-CHI. Proc. 8th JSME Annual Conference on Robotics and Mechatronics.
[8] R. Worst, R. Linnemann (1996) Construction and Operation of a Snake-like Robot. IEEE International Joint Symposia on Intelligence and Systems.
Modelling Pneumatic Muscles as Hydraulic Muscles for Use as an Underwater Actuator
Kenneth K.K. Ku and Robin Bradbeer Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong
1
Introduction
Pneumatic muscles have been used for a number of years as actuators in robotic systems, usually for those that mimic human actions. They are most commonly used in systems designed to aid physically handicapped people. Air muscles consist of an inflatable tube, usually neoprene rubber, that is constrained by a nylon mesh. When compressed air is passed into the muscle, which is blocked at one end, the tube inflates, but the action of the enclosing mesh forces the tube to shorten. The resultant force is used as a linear actuator. The pneumatic muscle was invented in the 1950s by physician Joseph L. McKibben to motorise pneumatic arm orthotics (Schulte, 1961; Nicket et al. 1963) to help control handicapped hands. The artificial muscle, which is simple in design, was made of a rubber inner tube covered with a shell braided according to helical weaving. The muscle was closed at both ends, one being the air input and the other the force attachment point. When the inner tube was pressurized, the muscle inflated and contracted. The open-loop control of the artificial muscle by a simple pressure variation made this orthotic system very easy for those with disabilities to use. Although this pneumatic artificial muscle theme became rather active (Matsushita, 1968), this actuator type was finally replaced in the 1960s by electric motors, which do not need heavy and bulky pressurized gas tanks. Recently, there has been renewed interest in the original actuation mode among industrial robotics researchers. In the 1980s, engineers of the Japanese tyre manufacturer Bridgestone proposed a redesigned and more powerful version of the pneumatic muscle called the Rubbertuator, i.e. an actuator in rubber (E P W, 1984), intended to motorise soft yet powerful robot arms. These arms, called Soft-Arms (Bridgestone Corp., 1987), were commercialized, and applications in service robotics (Pack et al., 1997) have also been studied. Bridgestone's research revived interest in studies on pneumatic artificial muscles and new types have been developed (Immega, 1986; Tanaka and Okada,
1992), but designs based on the original McKibben design have been made by a number of companies, including Shadow (Shadow Robot Group, 1987) in the UK, whose 'air-muscles' have been used at City University of Hong Kong for a number of years. It has become clear that if the muscles could be powered hydraulically instead of pneumatically, they would provide a very efficient actuator for underwater use. This has been confirmed by the manufacturers. Although a number of muscle-based actuators have been developed for underwater use, they either use expensive composites and/or compound hydraulic cylinders. There have been no reports so far of anyone using rubber artificial muscles in hydraulic mode.
2
Introduction of Shadow Muscle
The Shadow Muscle is a powerful means of providing a pulling force. It plays a role in robotics comparable to that of biological muscles in humans. When a supply of compressed air or liquid is provided, the muscle can contract by up to 40% of its original length. The force it provides decreases as it contracts, and the first few percent of the contraction is very powerful indeed.
2.1
Shadow Muscle Construction
The core of a Shadow muscle is an inflatable rubber tube wrapped in a tough plastic weave (Fig. 1) which acts with a scissor action. When compressed air is passed into the muscle, it is forced to expand. A small Shadow Muscle, at 6 mm in diameter, has the strength, speed and fine stroke of a finger muscle in a human hand. A pneumatic muscle 30 mm in diameter is capable of lifting more than 70 kg at a pressure of only four bars, while a large muscle (50 mm) has enough power to pull down a brick wall.
Fig. 1. Rubber tube wrapped in a plastic weave
3
Static Pressure-Contraction Relationships
A Shadow muscle acts as an actuator which converts pneumatic (or hydraulic) power into mechanical form to provide a pulling force by inflating the rubber tube. To establish the relationship between contraction length and pressure, a theoretical analysis and several experimental results are presented in the following sections.
3.1
Static Theoretical Considerations of the Shadow Muscle
In order to study the characteristics of the Shadow muscle, a theoretical approach is introduced, without considering the detailed geometric structure, modified from previous work (Guihard and Gorce, 1999; Chou and Hannaford, 1996; Caldwell et al., 1995). The input work (Win) done on the Shadow muscle when compressed gas pushes against the inner bladder surface is

dWin = ∫Si (P − P0) dli · dSi = (P − P0) ∫Si dli · dSi = P′ dV    (1)
where P is the absolute internal gas pressure, P0 the environment pressure, P′ the relative pressure, Si the total inner surface, dSi the area vector, dli the inner surface displacement, and dV the volume change. The output work (Wout) is that done when the actuator shortens in association with the volume change:

dWout = −F dLX    (2)
where F is the axial tension and dLX the axial displacement. Considering energy conservation, the input work should equal the output work if the system is lossless and stores no energy. Assume the actuator is in this ideal condition. By the “virtual work” argument,

dWout = dWin    (3)
thus, from (1) and (2), −F dLX = P′ dV, so that

F = −P′ dV/dLX    (4)
Consider the braided structure of the external nylon shell as a series of two-dimensional trapezoids. These trapezoids are pressurized by the inner shell, and this forms the drive plate, comparable to the piston area in a cylinder, as described in Fig. 2.
Fig. 2. Geometry of the Shadow Muscle
If the length of the inflated muscle is given by

LX = 2LB sin θ    (5)
and the circumference of the muscle by

C = 2LA cos θ = πD    (6)

where A is the number of trapezoids in the Y direction, B is the number of trapezoids in the X direction, and D is the diameter of the nylon shell of the inflated muscle, then the volume of the cylinder is

V = π (D/2)² LX = 2L³A²B sin θ cos²θ / π    (7)
By the “virtual work” argument, dWout = dWin, where P′ is the relative pressure, so

F = −P′ dV/dLX = −P′ (dV/dθ)(dθ/dLX) = (P′L²A²/π)(3 sin²θ − 1)    (8)
From (5),

sin θ = LX / (2LB)    (9)
Substituting (9) into (8), we have

F = (P′L²A²/π) [ 3 (LX / 2LB)² − 1 ]

so that

LX² = (4L²B²/3) ( Fπ/(P′L²A²) + 1 )
At equilibrium position for the static state, the following force diagram is considered.
Fig. 3. Force diagram of static modelling at equilibrium position
F = T = W = mg
where g = 9.81 m s⁻² is the acceleration due to gravity, so

∴ LX² = (4L²B²/3) ( mgπ/(P′L²A²) + 1 )    (10)

where Lmin < LX < Lmax. Equation (10) will be the main equation used to investigate the relationship between the contraction and pressure of the muscle under different loads W.
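A minimal numerical sketch of Eq. (10) is given below, assuming SI units throughout; the geometric constants L, A and B in the example call are placeholders rather than measured Shadow muscle parameters, and the result should be checked against the physical range Lmin < LX < Lmax.

```python
import math

def contracted_length(m, P_rel, L, A, B, g=9.81):
    """Static muscle length LX from Eq. (10) for a suspended mass m (kg) and
    relative pressure P_rel (Pa); valid only while Lmin < LX < Lmax."""
    Lx_sq = (4.0 * L**2 * B**2 / 3.0) * (m * g * math.pi / (P_rel * L**2 * A**2) + 1.0)
    return math.sqrt(Lx_sq)

# Placeholder geometry (not measured values): L = 5 mm, A = 40, B = 60,
# with a 5 kg load at 2 bar relative pressure.
print(contracted_length(m=5.0, P_rel=2e5, L=0.005, A=40, B=60))
```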
3.2
Static Test Rig
Three sizes of the Shadow muscle (6 mm, 20 mm and 30 mm diameter) were tested to study the contraction percentage at different pressures (0 bar to about 4 bar) with different suspended loads (0.5 kg to 20 kg), using the setup shown in Fig. 4. For each of the three muscle sizes, two samples were taken for comparison. The graphs plotted below show the contraction of the muscle as the pressure was increased from 0 bar to between about 3.5 bar and 4.5 bar by adjusting the pressure regulator (lower line), then decreased back to 0 bar, under several static loads (upper line).
4
Results
Three sizes of the Shadow muscle (6 mm, 20 mm and 30 mm diameter) were tested to study the contraction percentage at different pressures (0 bar to about 4 bar) with different suspended loads (0.5 kg to 20 kg), using the setup shown in Fig. 4. For each of the three muscle sizes, two samples were taken to carry out the experiment
Fig. 4. Experiment setup for static modelling
for comparison. The graphs plotted below show the contraction of the muscle as the pressure was increased from 0 bar to between about 3.5 bar and 4.5 bar by adjusting the pressure regulator (lower line), then decreased back to 0 bar, under several static loads (upper line).
4.1
Results for 20 mm Muscle as an Example
Figure 5 shows the average plot of the static modelling for the 20 mm muscle under different loads. The shape of each curve is very similar for the different loads. When the air pressure is increased, initially the muscle needs a larger pressure increase to contract before reaching a roughly linear region in which the muscle contraction is proportional to pressure. However, once the pressure exceeds about 3 bar, the muscle contracts less for a given pressure increase compared with its behaviour under
Fig. 5. Trend plot of all average plots of 20 mm muscle for static modelling
3 bars. When the muscle is allowed to expand again, the contraction initially decreases only slowly as the pressure is reduced. As the pressure is decreased further, the curve returns to being roughly linear. Finally, at about 1.5 bar the muscle contraction drops away rapidly as the pressure is decreased.
4.2
Time Response of the 20 mm Muscle as an Example
Figure 6 shows all the average plots of the response of the 20 mm muscle under different loads on the same graph. The shape of each curve is similar for the different loads; the difference between the curves lies in the contraction and expansion times. The muscle takes 0.7 s, 0.9 s and 1 s to attain maximum contraction under 5 kg, 10 kg and 12 kg loads respectively. This shows that the time for the muscle to attain maximum contraction increases with the suspended load. However, the relationship between the expansion time and the load is not obvious.
Fig. 6. Time response of 20 mm muscle
4.3
Comparison Between Pneumatic and Hydraulic Result for 20 mm Muscle as an Example
Figure 7 shows the difference in behaviour between the 20 mm muscle tested hydraulically underwater and the same muscle tested pneumatically in air. The significant difference shown on the graph is that the hydraulic muscle can provide more pulling force than the pneumatic one.
Fig. 7. Comparison between pneumatic and hydraulic result for 20 mm muscle tested at 2 bar
5
Conclusion
Taking the 20 mm muscle as an example, the shapes of the theoretical plot shown in Fig. 8 and of the experimental results shown in Fig. 5 look very similar. However, it is clear from the consolidated results for the muscle that the part of the curve around the origin does not fully reflect the model derived from theory, such as
Fig. 8. Theoretical plot for 20 mm muscle
the hysteresis effect at the beginning of the curves in Fig. 8. This difference could be due to:

• the different designs of muscle from different manufacturers;
• the fact that the area around each end of the muscle is not a cylinder;
• the effects of the rubber elasticity on the contractile force.

This result is essential for comparing the Shadow muscles with the characteristics of muscles from other manufacturers and for comparing their models. It will also enable us to model hydraulically activated muscles in the future.
Acknowledgement The work described in this paper was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China [Project No. CityU 1146/04E].
References

[1] Bridgestone Corp. (1987) Tokyo, Japan, Soft Arm ACFAS Robot System.
[2] D.G. Caldwell, G.A. Medrano-Cerda and M.J. Goodwin (1995) Control of Pneumatic Muscle Actuators. IEEE Control Systems Journal, vol. 15, no. 1, pp 40–48.
[3] P. Chou and B. Hannaford (1996) Measurement and Modeling of McKibben Pneumatic Artificial Muscles. IEEE Trans. on Robotics and Automation, vol. 12, no. 1.
[4] E.P.W. (1984) Rubber muscles take robotics one step further. Rubber Develop., vol. 37, no. 4, pp 117–119.
[5] M. Guihard and P. Gorce (1999) Dynamic control of an artificial muscle arm. Proc. IEEE Conf. on Systems, Man, and Cybernetics, vol. 4, pp 813–818.
[6] G.B. Immega (1986) ROMAC muscle powered robots. Proc. RI-SME Conf. Robotics Research, Scottsdale, AZ, pp 112–117.
[7] M. Matsushita (1968) Synthesis of Rubber Artificial Muscles. J. Soc. Instrum. Contr. Eng., vol. 7, no. 12, pp 110–116.
[8] V.L. Nicket, M.D.J. Perry and A.L. Garret (1963) Development of Useful Function in the Severely Paralysed Hand. J. Bone Joint Surgery, vol. 45a, no. 5, pp 933–952.
[9] R.T. Pack, J.L. Christopher and K. Kawamura (1997) A Rubbertuator-based structure-climbing inspection robot. Proc. IEEE Intl. Conf. on Robotics and Automation, pp 1869–1874.
[10] H.F. Schulte (1961) The characteristics of the McKibben artificial muscle. The Application of External Power in Prosthetics and Orthotics, Appendix H, Publication 87, pp 94–115.
[11] Shadow Robot Group, London, UK (1987) The Shadow Air Muscle, http://www.shadow.org.uk
[12] Y. Tanaka and T. Okada (1992) Fundamental study of artificial muscle made of rubber. Trans. Jpn. Soc. Mech. Eng. (C), vol. 58, no. 545, pp 217–224.
Automated Tactile Sensory Perception of Contact Using the Distributive Approach
X. Ma¹, P. Tongpadungrod² and P.N. Brett¹
¹ Clinical Biomedical Engineering Research Group, School of Engineering & Applied Science, Aston University, Birmingham B4 7ET, UK
² Department of Production Engineering, Faculty of Engineering, King Mongkut's Institute of Technology North Bangkok, 1518 Piboonsongkram Rd, Bangsue, Bangkok 10800, Thailand.
1
Introduction
In its most simple form tactile sensing could be described as the process of detecting and interpreting a single force to detect the intensity of contact between surfaces. It is often a useful control parameter used in the control of tool feed force. On more than one axis, contact force is measured to control surface following processes. More complex systems have been demonstrated in research studies. These aim to identify a range of parameters to determine more information on the conditions of contact. Often the objective is to provide additional information over that of vision for machine perception; to identify contacting objects through their surface characteristics and properties and to enable discrimination between types. There are further aims to determine a suitable construction that is robust and not complex to manufacture. With such qualities, solutions would be appropriate to the variable processes found in healthcare applications. There are some interesting examples and methods in the application of tactile sensing in variable environments. (Khodabandehloo 1990) controlled a manipulator to cut meat from a carcass using a powered knife. This used forces detected at the handle of the knife to deduce trajectories relative to the invisible skeletal structure. Other types of tactile sensing in industrial processes use the force data to guide tool points over a surface. Where the size or shape of imprint of the object is to be retrieved, a matrix of point sensors has been used. (Raibert and Tanner 1989) describe an early integrated circuit type. Arrays are usually complex constructions, with the notable exception of the more recent innovation of (Holweg 1996) that reduced complexity. However, there is still the need to retrieve large volumes of data with many connections, and the data interpretation process can be computationally intensive. In contrast to the above examples, the scheme described in this
paper is able to discriminate between different loading conditions and offers the potential to minimise data processing. By detecting a change in the estimated state of the load, the scheme can be used to determine motion, deformation and slip (Tam et al. 2004). It can also be applied to recognise static objects and to discriminate between different groups of objects (Brett et al. 2003). Most recently the method has been applied to investigate the dynamic motion of human subjects. The method is referred to as ‘distributive’ (as opposed to point-to-point) as it utilises the response of a continuous surface element, monitored at multiple points within the surface, from which to estimate the nature of the contact anywhere on the surface. An important characteristic is that sensor outputs are coupled through the deformation of the surface. This is illustrated in Figs. 1a and 1b, which show the advantages, from a mechanical viewpoint, in contrast with array sensors. The distributive approach utilises few sensory points and consequently there are few connections and a reduced computational load with respect to reading sensory values. As the surface is a continuous element, it responds with deformations over the entire area when loads are applied at any point. The deformation can be detected at strategically positioned sensing points. As few as four active sensing positions may be required in some applications, and many more descriptors of contact can be derived than the number of sensing points deployed. Through this simple mechanical construction, descriptions of contact can be derived through a computationally efficient process of discrimination between different conditions. The advantage of the technique is the ability to classify a range of features describing contact. First impressions may suggest that the approach is limited; however, from the many interpretations of the combined static and transient signals it is possible to identify many states that can be used as descriptors correlating with objects and behaviour. The descriptors can be used to retrieve prominent values of the object, behaviour and motion, as well as to describe shape, size and orientation. The computational process to derive information from the few sensor points is efficient. Such a technique has been used to sense contact force distribution in gripping devices (Ellis et al. 1994), and (Stone and Brett 1996) use this approach and employ a closed-form interpretation algorithm to derive contact force information from the sensory data. (Evans and Brett 1996) and (Brett and Stone 1997) extend the application to detecting normal force distributions acting on soft materials in manufacturing and minimally invasive surgical devices, exploiting further opportunities found in the approach. More recently, the approach has been used to infer
Fig. 1a. Tactile array sensors
Fig. 1b. Distributive tactile sensing system
descriptions of contacting conditions at the tip of a flexible digit, as could be envisaged at the tip of a steerable endoscope (Tam et al. 2004). The shape of the digit and its relative motion, with respect to the tissues, as sliding contact can also be interpreted using the same three sensing elements. Building on potential advantages of the approach, this paper reviews the progress of this research study and describes some of the results achieved.
2
One Dimensional Performance Study for a Static Load Distribution
In an initial study to understand the factors influencing performance, a simple 1-D version of a distributive sensor was constructed. The simply-supported beam sensing system shown in Fig. 2 was used to examine the effects of prominent design parameters on the performance of the approach. A variety of types of algorithm can be used to interpret the sensory data; both fuzzy inference and neural networks have been applied successfully. It is often found that a cascaded series of neural networks can be trained more readily than a single multi-input multi-output neural network (Table 1). To test the system, known descriptions of forces were applied such that errors could be derived directly. For the sensing system there is an advantage if a force can be represented by simple means, and therefore the function f(z) given in Eq. (1) was selected to describe the load as a function of the non-dimensional axial beam position z. The choice of function was arbitrary and convenient to apply in this application. The function incorporates the load position p, total load T and load width index c as follows:
f(z) = fmax e^(−c^n (z − p)^n)    (1)
fmax is the peak load amplitude and z is defined non-dimensionally in terms of the position
Fig. 2. A one dimensional surface example
Table 1. Systems error for two configurations of neural networks

Neural Network Configuration    Output Load Parameters        Normalized System Error %
Single Neural Network           Position, Width, Amplitude    4.2
Cascaded Neural Networks
  N.Network A                   Position                      0.1
  N.Network B                   Width                         0.63
  N.Network C                   Amplitude                     0.1
x along the beam of length L:

z = x / L    (2)
The total load T is given by the integral over the beam in Eq. (3):

T = L ∫₀¹ f(z) dz    (3)
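For concreteness, the load profile of Eq. (1) and the total load of Eq. (3) can be evaluated numerically. The sketch below uses the example parameters quoted in the text for Fig. 3 and assumes a unit beam length purely for illustration.

```python
import numpy as np

def f(z, f_max=290.0, p=0.2, c=12.0, n=8):
    """Distributed load of Eq. (1) at non-dimensional position z = x / L."""
    return f_max * np.exp(-(c ** n) * (z - p) ** n)

# Total load T of Eq. (3): integrate f(z) over z in [0, 1] and scale by the
# beam length L (L = 1.0 m here is a placeholder purely for illustration).
L = 1.0
z = np.linspace(0.0, 1.0, 2001)
T = L * np.mean(f(z))   # mean over a uniform grid on [0, 1] approximates the integral
print(T)
```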
The other parameters are best described with reference to the typical distributed load function shown in Fig. 3. This function uses the parameters fmax=290 N/m, p=0.2 m, c=12 and n=8. The sharpness of the curve is given by n, and the value of n=8 produces a sufficiently distinct step to define the boundary of the load. By choosing a greater value for n the sharpness can be increased further. The position of the load is centered at the value of p=0.2 along the beam. By applying the cascaded Back Propagation Neural Networks (Fig. 4) to interpret the load position from the sensory inputs provided by the 8 proximity sensors, detecting the static response of the beam, it was shown (Ma and Brett 2002) that the system could be used to determine the position of a point load to within an error of 3% over more than 80% of the length of the beam. Also the width and intensity of the load could be determined to within 4% of correct values. The results describing performance in evaluating these descriptors are shown in Figs. 5a, 5b and 5c for load position, width and value respectively.
Fig. 3. Force distribution qmax = 290 N/m, p = 0.2 m, c = 12, n = 8
Fig. 4. A cascaded chain of neural networks to determine load parameters
Fig. 5a. Error ηp in the determination of load center position
Fig. 5b. Error ηw in the determination of the width of the load
Fig. 5c. Error ηT in the determination of the total value of the load
The errors shown are plotted as functions of non-dimensional position on the beam. Furthermore the tests showed that this performance could be achieved by using only 4 strategic sensing positions. The study has also been extended to a cantilever beam, where different distributions of load were also interpreted with great resolution (Ma et al. 2004) using three sensing elements. These investigations revealed the sensitivity that could be achieved with the approach in interpreting and evaluating a variety of descriptors from which information describing contact can be derived directly. These successful results have led to 2-D surfaces and most recently surfaces for discriminating dynamic motion of human subjects.
3
Two Dimensional Performance Study for a Static Load Distribution
In two dimensions there are more aspects to be examined. In a number of studies, surfaces were simply supported at the edges over a sparse array of proximity sensors in a similar configuration to the one-dimensional surface of Fig. 2. Figure 6 shows two contoured plan diagrams of the surface to illustrate the variation in positional error of a point load; the diagrams show the error in the x and y ordinates respectively. As expected, errors were most prominent at the edges of the surface, although typically positional accuracy was within 4% of the surface range. Subsequent research on optimisation has shown that the array of sensing elements can be reduced to four placed in optimised positions (Tongpadungrod et al. 2004). Provided the sensing elements have similar output characteristics, the error achieved is in the range of 5%. Studies have also shown that the system can be used to discriminate different types and sizes of objects placed on the surface while determining their position and orientation (Fig. 7). The least sensitivity is in evaluating the magnitude of applied loads.
Fig. 6. Positional accuracy on a 2 dimensional surface (Dimensions in mm)
Fig. 7. Accuracy in discriminating object orientation
These studies on discriminating types of objects, sizes, locations and orientation (Tongpadungrod and Brett 2002) have shown that acceptable performance can be achieved within a range suited to the evaluation of human behaviour or motion. Both static and transient attributes are relevant. In this application, the challenge is to define a tactile sensing system able to distinguish the different parameters describing the objective of the evaluation when presented with subjects that are similar in properties or behaviour and different in parameter values. Therefore care must be taken in defining the objectives. The sizing of feet is one simple possibility: a study revealed that a suitable performance can be achieved with 95% accuracy (Trace 1996). This compares with the performance achieved in the discrimination of fruit by type, where accuracy greater than 90% was achieved (Tongpadungrod and Brett 2000). These static examples have shown that contacting objects can be identified, and even categorised by the shape of contact. Furthermore, the studies have demonstrated that it is possible to infer such information over 80% of a surface, such that the evaluation is independent of position and orientation. To achieve this, it is most important to place the sensors in optimal positions. This is likely to be dependent both on the interpreting algorithm used and on the nature of the parameters to be resolved (Tongpadungrod et al. 2004).
4
Dynamic Loading Studies
The dynamic response of surfaces is more complex to handle than the static case. There are different situations to consider, and these depend on the type of disturbance triggering the response of the surface. There can be periodic disturbances at the same point or at differing positions; there are also single-event disturbances. The transient response of the surface can be captured at discrete positions and used to derive a description of the disturbance. Where the surface response is considered quasi-steady, the strategies implemented in the static case will apply. In addition there is the time dimension, where one can observe differences in the nature of captured transients and their relative timing to discriminate different conditions. Quasi-steady applications have been used to discriminate differences in human motion in both simulated health and some sport applications. Here, rather than attempting to derive parameters associated only with spatial information, features captured in the time dimension have been used to delineate variations in motion or to size moving parts of the body, rather than to determine static parameters such as the size of feet. Successful examples have been applied to sport, determining variations in motion from the ideal or the norm. The examples selected were those where the human subject did not move their point of contact during the motion, as illustrated in Figs. 8a and 8b. These figures illustrate a sport application, and sway corresponding with the balance of the subject; both applications use the same surface. In a further study, the more complex application to intermittent contact, such as the discrimination of parameters in gait (Fig. 9), or indeed the grouping of gait by type, is showing promise. To achieve an acceptable response, there is a need to
Fig. 8a. Sport Application
Fig. 8b. Monitoring Sway
Fig. 9. A surface as part of a treadmill used to discriminate gait
match the dynamics of the sensing structure to the application. To illustrate the performance that has been achieved in terms of a tangible value, the positional error associated with foot impact position is less than 7% of the full surface range (Elliott et al. 2007).
5
Conclusions
This paper has illustrated the merits and types of application of a distributive tactile surface. The applications demonstrate a sensing system with appropriate attributes for retrieving information on patients in health monitoring and measurement. It is an ideal technology from the viewpoint of mechanical simplicity and robustness. This is an early stage of the technology, at which design strategies with respect to performance and the nature of applications are only beginning to be understood.
References

[1] Brett, P.N. and Stone, R.S. (1997) A technique for measuring the contact force distribution in minimally invasive procedures. Proc IMechE Part H4, Vol. 211, pp 309–316.
[2] Brett, P.N., Tam, B., Holding, D.J. and Griffiths, M.V. (2003) A flexible digit with tactile feedback for invasive clinical applications. 10th IEEE Conference on Mechatronics & Machine Vision in Practice, Perth, Dec 2003.
[3] Ellis, R.E., Ganeshan, S.R. and Lederman, S.J. (1994) A tactile sensor based on thin-plate deformation. Robotica, vol. 12, pp 343–351.
[4] Elliott, M., Ma, X. and Brett, P.N. (2007) Determining the location of an unknown force moving along a plate using the distributive sensing method. Sensors and Actuators A: Physical, 2007.
[5] Evans, B.S. and Brett, P.N. (1996) Computer simulation of the deformation of dough-like materials in a parallel plate gripper. Proc IMechE Part B, Vol. 210, pp 119–130.
[6] Holweg, E. (1996) Autonomous control in dextrous gripping. PhD thesis, Delft University of Technology, Delft, Netherlands.
[7] Khodabandehloo, K. (1990) Robotic Meat Cutting. IMechE Symposium on Mechatronics, Cambridge, UK.
[8] Ma, X. and Brett, P.N. (2002) A novel distributive tactile sensing technique for determining the position, width and intensity of a distributed load. IEEE Transactions on Instrumentation and Measurement, vol. 51, no. 2, April 2002.
[9] Ma, X., Brett, P.N., Wright, M.T. and Griffiths, M.V. (2004) A flexible digit with tactile feedback for invasive clinical applications. Proc IMechE Part H, Vol. 218, No. H3, pp 151–157.
[10] Raibert, M.H. and Tanner, J.E. (1989) Design and implementation of a VLSI tactile sensing computer. In: Robot Sensors: Tactile and Non-vision, Vol. 2, ed. E. Pugh, Kempston, UK, IFS Publications, pp 145–155.
[11] Stone, R.S. and Brett, P.N. (1996) A novel approach to distributive tactile sensing. Proc IMechE Part B4, vol. 210.
[12] Tam, B., Brett, P.N., Holding, D.J. and Griffiths, M. (2004) The experimental performance of a flexible digit retrieving tactile information relating to clinical applications. Proc. 11th IEEE Int. Conf. Mechatronics and Machine Vision in Practice, Macao, 30 Nov–1 Dec 2004.
[13] Trace, M. (1996) The sizing of feet by the distributive approach to tactile sensing. Thesis, Mechanical Engineering, University of Bristol, UK.
[14] Tongpadungrod, P., Rhys, D. and Brett, P.N. (2004) An approach to optimise the critical sensor locations in a 1-dimensional novel distributive tactile surface to maximise performance. International Journal of Sensors and Actuators, Pergamon.
[15] Tongpadungrod, P. and Brett, P.N. (2002) Orientation detection and shape discrimination of an object on a flat surface using the distributive tactile sensing technique. Proc. 9th IEEE Conference on Mechatronics and Machine Vision in Practice, Thailand, 2002.
[16] Tongpadungrod, P. and Brett, P.N. (2000) The performance characteristics of a novel distributive method for tactile sensing. Proc. 7th IEEE Conference on Mechatronics and Machine Vision in Practice, Sept 2000, Hervey Bay, Australia.
Blind Search Inverse Kinematics for Controlling All Types of Serial-link Robot Arms
Samuel N. Cubero University of Southern Queensland, Toowoomba, Australia.
1
Introduction
The main objective of “Inverse Kinematics” (IK) is to find the joint variables of a serial-link manipulator that achieve a desired position and orientation relationship between the end-effector frame and a base (or reference) frame. This paper describes a general-purpose IK method for solving all the joint variables of any type of serial-link robotic manipulator using its Forward Kinematic (FK) solution. The method succeeds in finding an IK solution for any design of articulated, serial-link robot arm, regardless of the number or types of joints or degrees of freedom (rotary and/or translational), and it is simple enough to be implemented in robot arm design and simulation software, even automatically, without any need for complex mathematics or custom-derived equations. Known as the “Blind Search” method, it also works on robots with redundant joints and with workspace internal singularities, and will not become unstable or fail to achieve an IK solution. Robot arm design and 3D simulation software has been written and has successfully demonstrated that the “Blind Search” algorithm can be used as a general-purpose IK method capable of controlling all types of robot arm designs and even 3D animated objects and characters. The speed of solving IK solutions numerically is dependent on the software design, the selected search parameters and the available processing power.
2
Background of Inverse Kinematics
The majority of modern serial-link robotic arms and manipulators are designed to manipulate an end-effector tool to follow a predefined trajectory or path in 3D space (i.e. a series of known or calculated cartesian-coordinate positions and roll-pitch-yaw angular orientations of the tool). For example, a 6 degree-of-freedom
Fig. 1. Coordinate frames for the links of a 3 degree-of-freedom robot leg (example)
welding robot must have its 6 individual joint variables (or joint angles) controlled simultaneously so that the tip of a MIG welding tool attached at its tool-plate can be placed accurately at known positions along a welding path (e.g. a straight line). Using the Denavit Hartenberg (D-H) convention for obtaining an A-matrix for each link, the Forward Kinematic (FK) solution for the leg shown in Fig. 1 is given by the xB, yB, zB coordinates for the foot, measured from Frame B’s origin. xB = cos θ1 cos θ2 ( l3 cos θ3 + l2 ) + sin θ1 ( d2 – l3 sin θ3 ) + l1 cos θ1
(1)
yB = sin θ1 cos θ2 ( l3 cos θ3 + l2 ) – cos θ1 ( d2 – l3 sin θ3 ) + l1 sin θ1
(2)
zB = sin θ2 ( l3 cos θ3 + l2 )
(3)
(where l1, l2 and l3 are link lengths according to the D-H convention). These FK equations were derived by multiplying together three A-matrices:

BTF = 0T3 = 0A1 · 1A2 · 2A3 = transform relating frame 3 to frame 0    (4)
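Equations (1)–(3) can be evaluated directly, as in the short sketch below; the link lengths and joint offset used in the example call are placeholders, not dimensions taken from the paper.

```python
import numpy as np

def leg_fk(theta1, theta2, theta3, l1, l2, l3, d2):
    """Foot position (xB, yB, zB) from Eqs. (1)-(3); angles in radians."""
    c1, s1 = np.cos(theta1), np.sin(theta1)
    c2, s2 = np.cos(theta2), np.sin(theta2)
    c3, s3 = np.cos(theta3), np.sin(theta3)
    xB = c1 * c2 * (l3 * c3 + l2) + s1 * (d2 - l3 * s3) + l1 * c1
    yB = s1 * c2 * (l3 * c3 + l2) - c1 * (d2 - l3 * s3) + l1 * s1
    zB = s2 * (l3 * c3 + l2)
    return np.array([xB, yB, zB])

# Placeholder link dimensions (metres), purely for illustration.
print(leg_fk(0.1, 0.2, -0.3, l1=0.05, l2=0.10, l3=0.12, d2=0.02))
```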
The two most common “general-purpose” methods for obtaining the IK solution of serial-link robotic arms or manipulators, are: (1) Using “Inverse Transforms”: Paul’s method [1] of obtaining a “closed-form” explicit solution for each joint variable; and (2) Using the “Jacobian Inverse” method to obtain an incremental IK solution [2]. Using Paul’s method, we face the problem of being unable to express a joint angle as an explicit function of (xB, yB, zB ) and the D-H link parameters only, independent of the other joint angles. Hence, there appears to be no algebraic, closed-form IK solution for this example manipulator since the joint
angles are mathematically dependent on each other. Closed-form IK solutions cannot be found for several other designs of robot manipulators using this method. Hence, Paul’s “Inverse Transform” method is not a truly “general purpose” IK method that works for all manipulator designs. A commonly used technique for obtaining an IK solution is by inverting the manipulator’s Jacobian matrix, either symbolically or numerically. The Jacobian, J, is a special type of transformation matrix which relates incremental changes in joint angles, Δjθ = (Δθ1, Δθ2, Δθ3), to the robot end-effector’s incremental movements in cartesian space, ΔBpF = (ΔxB, ΔyB, ΔzB), relative to the base or reference frame. The relationship between incremental 3D cartesian displacements of the end-effector frame, ΔBpF, and incremental changes in joint angles, Δjθ, is given as ΔBpF = BJθ Δjθ
(5)
from McKerrow [2], where the Jacobian matrix, J, is given by
Jθ = ⎡ ∂xB/∂θ1   ∂xB/∂θ2   ∂xB/∂θ3 ⎤
     ⎢ ∂yB/∂θ1   ∂yB/∂θ2   ∂yB/∂θ3 ⎥    (6)
     ⎣ ∂zB/∂θ1   ∂zB/∂θ2   ∂zB/∂θ3 ⎦
If we know the small changes in joint variables for all links of a robot, Δjθ = (Δθ1, Δθ2, Δθ3), the Jacobian allows us to calculate the small “incremental” changes in the position of the end-effector or foot, ΔBpF = (ΔxB, ΔyB, ΔzB). Hence, the “incremental” IK solution is:

Δjθ = BJθ⁻¹ ΔBpF    (7)
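The incremental scheme of Eqs. (5)–(7) can be illustrated with a numerically estimated Jacobian. This is a convenience sketch rather than the method advocated in this paper: the finite-difference Jacobian and the pseudo-inverse (used so the step is defined for non-square J) are substitutions, and they do not remove the singularity problems discussed next.

```python
import numpy as np

def numerical_jacobian(fk_position, joints, eps=1e-6):
    """Finite-difference estimate of the Jacobian of Eq. (6): rows are the
    end-effector x, y, z coordinates, columns are the joint variables."""
    joints = np.asarray(joints, dtype=float)
    p0 = np.asarray(fk_position(joints), dtype=float)
    J = np.zeros((p0.size, joints.size))
    for i in range(joints.size):
        q = joints.copy()
        q[i] += eps
        J[:, i] = (np.asarray(fk_position(q), dtype=float) - p0) / eps
    return J

def incremental_ik_step(fk_position, joints, delta_p):
    """One incremental step in the spirit of Eq. (7): delta_q = J^+ delta_p."""
    J = numerical_jacobian(fk_position, joints)
    return np.asarray(joints, dtype=float) + np.linalg.pinv(J) @ np.asarray(delta_p, dtype=float)

# Example with a planar two-link arm (link lengths are arbitrary placeholders).
fk = lambda q: np.array([np.cos(q[0]) + 0.8 * np.cos(q[0] + q[1]),
                         np.sin(q[0]) + 0.8 * np.sin(q[0] + q[1]),
                         0.0])
q_new = incremental_ik_step(fk, [0.3, 0.4], [0.01, -0.02, 0.0])
```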
The Jacobian matrix cannot be inverted when the foot or the end-effector frame is near a workspace internal singularity, which is a position where there are an infinite number of possible joint variable solutions. There are two types of singularities. (1) A workspace internal singularity occurs inside the 3D workspace (volume of reachable positions), often when two or more joint axes or link frame axes line up; and (2) A workspace boundary singularity occurs at or beyond the outermost surface of the robot’s 3D workspace, when the manipulator is fully extended or fully retracted and attempts to move outside of its workspace (e.g. outside of the joint angle ranges defined by the minimum and maximum permissible joint angles for each link). In Fig. 2, there are infinite possible values for the solution of θ2 when the robot’s foot (end-effector frame origin) is situated on the “internal singularity curve”, where θ3= –132.4°, or if the foot position (origin of frame F or 3) is in line with the z1 basis axis as shown in Fig. 1. At target positions near or on this internal singularity curve, the Jacobian matrix cannot be inverted and hence, no IK solution can be found unless additional algorithms are implemented to deal with this problem. Such singularity features can be
Fig. 2. Workspace of robot in Fig. 3 showing internal singularity curve [Cubero, 13]
identified by plotting the surfaces described by the FK equations. Most programmers of robot controllers are aware of the problems caused by internal singularities and, hence, do their best to avoid letting goal points or trajectory paths of endeffectors go anywhere near these problem areas. The stability and accuracy of this IK method depends on the sensitivity of J to particular joint angle values. Note that there are several other IK techniques that have been proposed in the past, many more than can be described in detail here. The reader may wish to examine and compare other IK methods, such as: (a) “Screw algebra” by Kohli & Sohni [6], “Dual matrices” by Denavit [7], “Dual quaternian” by Yang & Freudenstein [8], “Iterative method” by Uicker et al. [9] and Milenkovic & Huang [10], geometric or vector approaches by Lee & Ziegler [11], “Iterative method” by Goldenberg & Lawrence [14, 15] and Pieper’s approach for solving Euler angles [12]. Not all of these IK methods do not work for every possible type of serial-link robot manipulator design, especially those with one or more redundant degrees of freedom or those with one or more internal singularities. Also, most of these IK methods require a great deal of complicated trigonometric analysis and equation derivations which are unique for each type of robot manipulator design, hence, their approaches cannot be universally applied to all types of serial-link manipulators. The above methods can be demonstrated to work on a few, particular examples
Blind Search Inverse Kinematics for Controlling All Types of Serial-link Robot Arms 233
of manipulators, but they do not offer a simple, methodical, procedural approach that will always achieve an IK solution for every type of serial-link manipulator. Many of these methods are also not simple enough to automate or use immediately in 3D manipulator design and control software and they require the derivation of unique IK equations or algorithms for different classes of robot arms. In many cases, IK solutions cannot be found because the end-effector frame is located near an internal singularity within the workspace, causing IK equations or Jacobian inversions to become unsolvable. Such problems are described by McKerrow [2]. The “Blind Search” IK method overcomes all of these limitations.
3
A New “Blind Search” Incremental IK Method
After considering the limitations of popular IK methods, the author [Cubero, 13] proposed a general purpose IK method in 1997 which relies heavily on a “trial and error” or error minimizing approach to solving the joint variables of any robot arm, given the desired origin position and orientation of the end-effector frame. Only the Forward Kinematic (FK) solution for a manipulator or its overall manipulator transform matrix is needed, hence, this method can be applied to any type of robot with one degree of freedom per link. This IK method solves all the correct joint variables needed which correspond to an incremental change in the displacement and axis orientations of a robot’s end-effector frame in 3D space.
3.1
How the “Blind Search” IK Method Works
Consider a 3 dof serial-link manipulator with only rotary joints, such as the example robot leg in Fig. 1. Assuming that a small change in foot position is required, each joint angle (or displacement, or joint variable) can do one of the following:

1. Increase (+) by a small displacement Δθ (rotary) or Δd (sliding);
2. Decrease (–) by a small displacement Δθ (rotary) or Δd (sliding); or
3. Remain the same (0), no change.

With the forward kinematic equations, we can calculate the end-effector frame position error and/or basis axes alignment errors which arise for all possible combinations of small joint variable adjustments. For any serial-link manipulator with n degrees of freedom, there are a total of 3^n − 1 different possible combinations of joint variable adjustments which can be made for a given Δθ (for rotary joints), or Δd for translational joints (excluding the “no changes” solution). For example, a 2-link robot arm would have 3² − 1 = 8 possible combinations (excluding “no changes”). In the following discussion, we will consider solving the 3 joint angles for the example robot leg in Fig. 1, given the desired target position BtT = (xB, yB, zB) for the foot relative to the base frame B. The following analysis can also be extended to 6 degree of freedom industrial manipulators, where the end-effector (or tool) frame origin position and basis axes
orientations are specified and must be achieved through small changes in joint variables. For simplicity, we will limit our discussion to an IK solution for achieving a desired (target) end-effector frame origin position. Later, we will extend this discussion to obtaining IK solutions which achieve a desired end-effector frame orientation and a desired (target) end-effector frame origin position, simultaneously. For the example robot shown in Fig. 1, there are N = 3^3 – 1 = 26 possible FK solutions for the foot position if each joint angle changes by +Δθ, –Δθ or 0° (no change at all). We can calculate all the resulting position errors of these FK solutions from the desired foot position using simple vector subtraction (i.e. the target position vector minus the current position vector of the foot). We can then select the "best" combination of joint angle changes, which produces the smallest computed foot position error, as our incremental IK solution after doing a simple "minimum value search" on an array of error magnitudes from all combinations of joint angle changes. If the computed position error from this "best" combination of joint angle changes is not satisfactory, then the entire process can be repeated using the "best" or smallest error vector found so far (which tends to become smaller with more iterations), until the error tolerance is met. Note that the incremental change in foot position to the next target position must be small enough to be achieved, within tolerance, so that the target may be reached by any one or more of the possible combinations of joint angle changes. The test joint angles (subscript small "t") will tend to converge towards the correct solution if we continually select the combination of joint angle changes that minimizes foot position error relative to the target position. This algorithm will now be considered in detail. As shown in Fig. 3, the target position vector for the foot, BtT, starts from the base frame origin B and points to the target frame origin T. The foot position vector BpF starts from the base frame B and points to the foot F.
Fig. 3. Position vectors for foot position F and target point T
The starting error vector, FeT = BtT – BpF, starts from the foot F and points to the target frame origin T. The goal is to find a suitable set of joint angle changes that brings the foot position F to the target position T, within an acceptable error range (or tolerance). We need to find the combination of joint angle changes that achieves an error vector magnitude |FeT| less than an acceptable error tolerance within a reasonably short time, for quick real-time performance. We may need to compute the FK solution for each and every possible combination of joint angle changes and select the combination that gives the smallest position error from the target position and satisfies the required error tolerance. Table 1 and Fig. 4 show all the possible displacements that can be made for the example robot leg shown in Figs. 1 and 4. The "Blind Search" IK algorithm that will now be discussed is suitable for finding accurate joint angle solutions as long as the end-effector position always remains within its workspace, as shown in Fig. 2, and the search parameters Δθ and |FeT| are kept small enough so that the magnitude of the error vector converges towards 0.

Table 1. Angular displacement combinations for the example robot of Fig. 1
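The 26 candidate adjustments listed in Table 1 are simply every combination of (+Δθ, –Δθ, 0) over the three joints, minus the all-zero case. A minimal sketch of how such a table can be generated programmatically (illustrative only; the names are not from the paper):

from itertools import product

def displacement_combinations(n_joints, delta):
    # All 3**n - 1 joint adjustment combinations: +delta, -delta or 0 for each joint
    return [c for c in product((+delta, -delta, 0.0), repeat=n_joints)
            if any(step != 0.0 for step in c)]   # drop the "no changes" case

table1 = displacement_combinations(3, delta=1.0)  # 3 joints -> 26 combinations
print(len(table1))                                # 26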
Fig. 4. Displacement combinations for all joints of a 3 degree-of-freedom manipulator
If, after many iterations, none of the error magnitudes calculated from all of the combinations satisfies the accepted error tolerance (i.e. the "best" IK error magnitude |FeTb| > etol), then the initial incremental step (the magnitude of the foot error vector) may need to be lowered, because the target position T could be too far from the current foot position F. In summary, the following steps are executed:
{1} A new target end-effector position/orientation is issued; this must be quite near to the current position/orientation of the end-effector frame.
{2} All the FK solutions and position/angle errors for each of the displacement combinations shown in Fig. 4 are calculated and stored in an array.
{3} A minimum value search is conducted to find which displacement combination produced the smallest error, and this combination is marked as the current "best IK solution".
{4} If this error satisfies the acceptable error tolerance for a match, then the IK solution is found; otherwise, the process is repeated from step {2}, using the most recent "best IK solution" as the starting or current robot configuration, until the best FK error set (joint angles and/or displacements) satisfies the error tolerance criteria.
In effect, this is a form of algorithmic or iterative "feedback control" used to eliminate the difference between the actual current position/orientation of the robot manipulator and the target position/orientation. This method searches for small joint angle/displacement changes so that the current joint angles/displacements move closer and closer to the correct IK solution (within a set tolerance). The "Blind Search" algorithm can also be extended to find the IK solution that aligns the axes of both the end-effector frame and the target frame, while bringing together both their origins so that they coincide, within an acceptable error tolerance. Orienting the end-effector frame's x, y and z-axes will be described later, so for now, we will consider achieving an IK solution for position only.
Algorithm 1. Blind Search IK method for the example robot shown in Fig. 1

1. Set the etol variable to an acceptable tolerance value for the computed foot position error (e.g. let a 0.02 mm FK error be acceptable for an "exact" position match), i.e. the position of F must be within this distance from the target position T for an IK solution to be accepted. This is the worst possible error for the IK solution. Set Δθ to be proportional to the magnitude of the starting position error |FeT| or expected displacement of the end-effector (foot), where Δθ = k |FeT|, e.g. k = 0.2°/mm (the scale factor k depends on the lengths of the robot's links; the longer the link lengths ln, the larger k should be). Try to keep displacements small, e.g. Δθ < 2° but nonzero.
2. Initialize the iteration counter variable c = 0 and set the counter limit cmax = 50, which is the maximum number of iterations that will be executed in search of an IK solution whose error satisfies etol (error tolerance). Clear the error flag ef = 0. (The error flag is only used if the magnitudes of the search parameters (initial error vector |FeT| or Δθ) are too large, or if the target position T is outside the robot's workspace.) These values must be carefully selected.
3. The trajectory planning algorithm supplies the next target foot position BtT. The trajectory planning algorithm defines 3D path points for foot F. Keep all target positions close, within 10 mm of each other or perhaps even smaller, depending on the size of Δθ. Note: the initial step size |FeT| should be proportional to Δθ. The error flag ef is noted, acted upon and cleared.
4. The initial "best" error is |FeTtb| = |FeT| = |BtT – BpF| and the initial "best" test IK solution is jθtb = (θ1tb, θ2tb, θ3tb) = jθ = (θ1, θ2, θ3) = the currently incorrect model joint angles. A test foot position vector BpFt (model of position), the test error vector and its magnitude |FeTt| are calculated for each combination and stored. (|FeTtb| should decrease as c increases.) This step may have to be repeated 3^3 – 1 = 26 times, once for each possible combination of joint angle changes or no changes, e.g. FOR i = 1 to 26 (loop).
5. Calculate the test angles for each combination of joint angle changes, as shown in Table 1 (e.g. θ1t1, θ2t1, θ3t1, where i = 1) and check that each test joint angle lies within its valid range of motion (between the joint's minimum and maximum displacement limits, θmin and θmax for rotary joints, or dmin and dmax for translational joints). If any test joint angle exceeds its minimum or maximum acceptable limit, then set it to that nearest limit, i.e. do not allow any test angle outside of its valid range of motion (every θ must always stay between θmin and θmax for that joint). Calculate a test foot position, BpFti, for this combination "i" using the test joint angles just found, jθti = (θ1ti, θ2ti, θ3ti) from Table 1, with FK (Eqs. 1–3) and store each test position into an array. Also store the test joint angles jθti. Calculate a test position error vector for this combination, FeTti = BtT – BpFti, and its magnitude |FeTti|, and record this into an array. Run an optional check to see if the magnitude of the test error for each combination satisfies the error tolerance. If the error |FeTti| < etol, this is an acceptable IK solution, so record it as |FeTtb|, store the "best" joint angles jθtb = (θ1tb, θ2tb, θ3tb) = (θ1ti, θ2ti, θ3ti) and jump to Step 8, skipping Steps 6 and 7 to save time. If not, repeat Step 5 (next "i") and test the next combination of changes.
6. Search for the smallest error magnitude in the |FeTt| (1 to 26 element) array using a simple "minimum value search" algorithm. The smallest error in this complete array of 26 error magnitudes is |FeTt|(s); its index number is remembered as "s" and its joint variables (test IK solution) are recorded as jθts = (θ1ts, θ2ts, θ3ts). The smallest error magnitude found so far from all the iterations (since c = 0) is recorded as the "best" error, |FeTtb|. If |FeTt|(s) from this array is less than the previously calculated "best" error, then set the new "best" error to equal the current error, i.e. if |FeTt|(s) < |FeTtb|, then set |FeTtb| = |FeTt|(s), and update the best joint angles jθtb = (θ1tb, θ2tb, θ3tb) = (θ1ts, θ2ts, θ3ts).
7. Check that the test error is converging towards 0. If the smallest error |FeTt|(s) > |FeTtb|, the last "minimum value search" did not find a better IK solution producing a smaller "best" error than the one found from the previous pass of this algorithm. This could be caused by any of the search parameters Δθ, k or the displacement |FeT| being too large, so they may be reduced or halved here if necessary. If c > cmax (e.g. after 50 iterations) and the error is still greater than the acceptable error tolerance, |FeTtb| > etol, then set the "error flag" ef = 1 to inform the trajectory planning algorithm to take corrective action or modify its planned trajectory, e.g. print "No solution found because the initial movement of the end-effector |FeT| was too large, or Δθ or k are too large, or the target point T is outside the robot's workspace.", set c = 0 and go to Step 3 to retry this incremental move again. (The search parameters may be reduced automatically to avoid this error message.) Otherwise, if c < cmax and the "best" error so far |FeTtb| > etol, another iteration is necessary to find a smaller |FeTtb| < etol, so calculate BpFt using FK (Eqs. 1–3) with the best "test" angles so far, jθtb = (θ1tb, θ2tb, θ3tb), increment the loop counter c = c + 1 and go to Step 5. If |FeTtb| < etol, then the IK solution is found, so proceed to Step 8.
8. If the error flag is clear, ef = 0, and the best test angles found so far produce an error magnitude |FeTtb| < etol, then update the joint angles by equating them with the best test angles: θ1 = θ1tb, θ2 = θ2tb, θ3 = θ3tb. (etol must be large enough to obtain a fast solution.)
9. Send this angle data to the position controllers of the actuators for links 1, 2 and 3, then return to Step 3 to get the next target position BtT. (Note that the new FK solution for BpF is calculated with the newest joint angles jθ = (θ1, θ2, θ3), which are accurate joint variables. The best test position BpFtb should converge towards BtT, or tend to get closer to the target with more passes of this algorithm, until the best error |FeTtb| < etol (error tolerance).) Note: this example describes "position control" only. To implement orientation control, use |FeTtb| = etotal from Eq. (16) to also force the orientation of the end-effector's frame E to match the x, y and z basis axes of the target frame T. Additional code is necessary to control or guide the shape of redundant links towards a preferred posture or to approach the target from a particular 3D direction.
If the “best” error does not keep on improving and the error tolerance is not met after several iterations have completed, search parameters may be reduced in magnitude and the entire procedure can be repeated. This is not a problem that is normally encountered if these search parameters are selected carefully.
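The following sketch condenses Algorithm 1 into code. It is a minimal illustration rather than the author's original implementation: it assumes a forward-kinematics routine fk() (such as the one sketched earlier) returning the end-effector position, ignores joint limits and the error flag handling for brevity, and handles position error only (orientation error is added in Section 3.2 via etotal).

import numpy as np
from itertools import product

def blind_search_ik(fk, joints, target, delta=0.02, e_tol=2e-5, c_max=50):
    # Incremental Blind Search IK: adjust joints until the FK position is within e_tol of target
    best_joints = np.asarray(joints, dtype=float)
    best_error = np.linalg.norm(np.asarray(target) - fk(best_joints))
    # The 3**n - 1 joint-change combinations of Table 1 (+delta, -delta or 0 per joint)
    steps = [np.array(c) for c in product((+delta, -delta, 0.0), repeat=len(best_joints)) if any(c)]
    for _ in range(c_max):
        if best_error < e_tol:                            # acceptable IK solution found (Step 8)
            break
        trials = [best_joints + s for s in steps]         # Step 5: apply every combination
        errors = [np.linalg.norm(np.asarray(target) - fk(j)) for j in trials]
        s = int(np.argmin(errors))                        # Step 6: minimum value search
        if errors[s] < best_error:                        # Step 7: accept only improvements
            best_error, best_joints = errors[s], trials[s]
        else:
            steps = [0.5 * step for step in steps]        # otherwise halve the search step (Step 7)
    return best_joints, best_error

Here errors, e_tol and delta play the roles of |FeTt|, etol and Δθ in Algorithm 1; repeated calls, one per closely spaced target point, implement the incremental trajectory-following behaviour described above.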
3.2
Forcing the End-effector Frame to Match the Target Frame’s Origin Position and Orientation
The discussion in the previous section dealt mainly with finding an IK solution for a serial-link manipulator given a target position (origin of target frame T) which is very close to the origin of the end-effector frame E. We have so far only described how to move the origin of end-effector frame E to the origin of target frame T. We will now consider an extension to this incremental IK method and try to orient the xE and yE basis axes of end-effector frame E so that they point in the same directions as the corresponding xT and yT basis axes of the target frame T respectively. Note that only two of the three corresponding basis axes need alignment as long as both frames are "right handed". If we can find the robot's joint variables to move the origin of frame E to the origin of frame T, which is only a short distance away, while the corresponding x and y (and consequently z) basis axes of both frames are made to point in the same directions respectively (i.e. xE becomes colinear with xT and yE becomes colinear with yT) within an acceptable error tolerance, then the complete incremental IK solution is found for any type of serial-link arm. Consider the end-effector frame E for any type of multi-degree-of-freedom robot arm manipulator, as shown in Fig. 5. The robot arm shown in Fig. 5 is for illustrative purposes only and this discussion applies to any serial-link robot arm design with one degree of freedom per link. Note that spherical joints (or "ball joints") can be treated like two rotary links, where one link has a zero link length (l = 0), so each link has one rotation/displacement about one axis.
Fig. 5. Base, End-effector and Target frames for a serial-link manipulator
For example, the human shoulder joint has two rotary degrees of freedom and can be considered as two links: one with a shoulder-to-elbow link length attached to an invisible link with a zero link length, each with a different rotation. The Target frame can be specified relative to the Base frame of the robot using a standard 4×4 transformation matrix BTT. Note that the xT, yT and zT unit vectors representing the basis axes of frame T relative to the base frame B are obtained from the first 3 vertical columns of the BTT "target" frame 4×4 matrix.
BTT = ⎡ aT  dT  gT  pT ⎤     ⎡ xT  yT  zT  BtT ⎤
      ⎢ bT  eT  hT  qT ⎥  =  ⎣  0   0   0    1 ⎦          (8)
      ⎢ cT  fT  iT  rT ⎥
      ⎣  0   0   0   1 ⎦

where the direction vector of the xT basis axis with respect to frame B is given by
xT = aT xB + bT yB + cT zB
and likewise
yT = dT xB + eT yB + fT zB
zT = gT xB + hT yB + iT zB
The point vector of the frame T origin point relative to the frame B origin is
BtT = pT xB + qT yB + rT zB
Similarly, the xE, yE and zE unit vectors representing the basis axes of frame E relative to the base frame B are obtained from the first 3 columns of the BTE matrix, which is the FK transformation matrix of the entire manipulator (similar to the type found in Eq. 4, obtained by combining all the A-matrices for the manipulator). The manipulator transform for an n-link manipulator is thus
BTE = 0A1 1A2 … n-1An   (for any serial-link manipulator with n links > 3)
BTE = ⎡ aE  dE  gE  pE ⎤     ⎡ xE  yE  zE  BpE ⎤
      ⎢ bE  eE  hE  qE ⎥  =  ⎣  0   0   0    1 ⎦          (9)
      ⎢ cE  fE  iE  rE ⎥
      ⎣  0   0   0   1 ⎦

where the direction vector of the xE basis axis with respect to frame B is
xE = aE xB + bE yB + cE zB
and likewise,
yE = dE xB + eE yB + fE zB
zE = gE xB + hE yB + iE zB
Fig. 6. Superimposing the origins of both E and T frames to measure α & β
The point vector of the frame E origin point relative to the frame B origin is BpE = pE xB + qE yB + rE zB. Figure 6 shows the error vector and angular differences between the x and y basis axes of frames E and T, the magnitudes of which all need to be "driven towards 0" or reduced below an acceptable error tolerance in order to obtain an acceptable IK solution. In order to achieve a suitable IK solution, the magnitude of the error vector |EeT| and the angles between pairs of corresponding xE, xT and yE, yT basis axes (i.e. α & β respectively) must be calculated and then combined into a "total error" value |etotal| which can be used to search for the best combination of joint variable changes. We may use the "Scalar" or "Dot Product" operation on basis vectors xE and xT to find the angle between them, α. Likewise, we can perform the same operation on vectors yE and yT to find β. There is no need to do this for the 3rd pair of axes, vectors zE and zT, because if the other two axes line up, the z axes will automatically be aligned relative to the x-y planes because both frames are "right handed". The solutions for both α and β can each range anywhere from 0° to 180°. Also, the magnitude of any basis (unit) vector is 1, thus, |xE| = |xT| = |yE| = |yT| = 1. The inner or "Dot Product" operations are now used to find α and β.
xE • xT = | xE | | xT | cos α = cos α
(10)
yE • yT = | yE | | yT | cos β = cos β
(11)
It is useful to note that if α or β lies between 90° and 180°, the cosine function will return a negative value. The “worst case” alignment between any two basis axes vectors is 180°, which gives cos (180°) = –1. If α or β are between 0° and 90°, the cosine function will return a positive value. The “best case” for alignment between any two basis axis vectors is 0°, which gives cos(0°) = +1. Hence, angular alignment “error” between the xE and xT axes can be measured using a positive value like 1-cos (α). If the angle α = 0, cos (0°) = 1, so the alignment error is 1 – 1 = 0 (meaning zero alignment error). If α = 180°, cos (180°) = –1 so 1 – (–1) = +2 which
gives the largest or maximum value for alignment error. Note that “alignment error” is an artificial term that ranges from 0 (perfect alignment) to +2 (worst alignment) and is simply used as a measure of how poorly a pair of basis axes line up. We will designate eax as the angular alignment error between xE and xT, and eay as the angular alignment error between yE and yT. These values are useful for calculating an overall “total error” which also includes position error. eax = 1 – cos α = 1 – xE • xT = 1 – (aE aT + bE bT + cE cT )
(12)
eay = 1 – cos β = 1 – yE • yT = 1 – (dE dT + eE eT + fE fT )
(13)
An equation for “total error”, etotal, can be created to combine the position error and angular alignment error terms so that the best combination of joint variable changes can be found to minimize this “total error”. Total error can now be formulated using two “weighting factors” which can be adjusted to scale the importance of each source of error. Kp is the factor which adjusts the contribution of the initial error vector magnitude (or incremental step size to the next target position) |EeT| towards the “total error”. Ka is the factor which adjusts the contribution of both eax and eay angular misalignment (error) values. These weighting factors are like “gains” for a PID algorithm, but they must always remain positive. i.e. etotal, eax and eay are always positive. We will call epos the error term due to position error (distance between E and T) and eang will be the error term due to the sum of angular alignment error values. epos = Kp |EeT| ( > 0 positive)
(14)
eang = Ka (eax + eay ) ( > 0 positive)
(15)
etotal = epos + eang = Kp |EeT| + Ka (eax + eay )
( > 0)
(16)
The values for Kp and Ka need to be adjusted so that a fair balance can be obtained between the contribution of position error and the contribution of axis misalignment errors. The worst value for epos should be equal to the worst case value of eang if position error is just as important as alignment error for the axes. Accuracy of the “Blind Search method” depends largely on the value of the error tolerance for an acceptable IK solution, however, higher precision solutions may require more iterations. The variable etotal can be used instead of a “test” error vector |FeTt|, which can be calculated for each and every possible combination of joint variable changes. (See Step 9 in Algorithm 1.) The same methods used in the previous section may be used to search for the best combination of joint variable changes which produce smaller etotal values as more iterations are executed. A smallest “total error” value can be found for each iteration using a minimum value search and this can be compared to the best “total error” found so far. The “Blind Search” algorithm searches for incremental IK solutions by reducing the “total error” etotal with more passes of the algorithm,
until the etotal value is below a satisfactory error tolerance value etol for an acceptable FK and IK solution. Hence, an IK solution is found when etotal < etol. The error tolerance for the IK solution must be set by the programmer, along with carefully selected values for search parameters such as the initial position displacement or step |EeT| and Δθ (or Δd for translational joints), although this selection can even be automated.
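A short sketch of the combined error measure of Eqs. (12)–(16), computed directly from the two 4×4 base-frame transforms of frames E and T (a minimal illustration; the weighting values shown are arbitrary placeholders, not values from the paper):

import numpy as np

def total_error(T_E, T_T, Kp=1.0, Ka=1.0):
    # etotal = Kp*|EeT| + Ka*(eax + eay) for 4x4 transforms of frames E and T in the base frame
    e_pos = np.linalg.norm(T_T[:3, 3] - T_E[:3, 3])        # |EeT|, the origin distance
    e_ax = 1.0 - float(np.dot(T_E[:3, 0], T_T[:3, 0]))     # Eq. (12): x-axis misalignment
    e_ay = 1.0 - float(np.dot(T_E[:3, 1], T_T[:3, 1]))     # Eq. (13): y-axis misalignment
    return Kp * e_pos + Ka * (e_ax + e_ay)                 # Eqs. (14)-(16)

Substituting this value for the position-only error magnitude in Algorithm 1 yields the combined position-and-orientation search described above.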
4
Conclusion
This paper has presented a practical and robust inverse kinematics method which can solve the joint variables for any type of serial-link robot arm or manipulator, regardless of the number and types of degrees of freedom the manipulator has, or the number or location of internal singularities within the workspace of the robot. The "Blind Search" IK method described in this paper searches for the small joint angle changes that are necessary to minimise the origin position error and/or alignment error between the axes of the End-effector frame and the Target frame. The speed, stability and reliability of "Blind Search" IK solutions depend heavily on the selection of suitable search parameters, such as the step size to the next target point |EeT|, the Δθ or Δd incremental displacement magnitudes for each link, the Kp and Ka "weighting factors" for calculating "total error", and an error tolerance etol defining the acceptable error of an IK solution. These variables need to be adjusted for each type of robot manipulator. Performance of these algorithms can be "tuned" by trial and error, or perhaps automatically, to achieve a satisfactory balance between solution accuracy, search stability (i.e. reliability of convergence toward an acceptable solution), and computation time for real-time control. Much time and effort can be saved by using this "Blind Search" IK method because complicated mathematical derivations are avoided and only the FK solution or overall manipulator transform matrix is needed. The "Blind Search" method has been tested successfully for controlling the end-effectors of 3D simulated robot arms assembled from standard generic link types, as defined by McKerrow [2], without the need to derive any equations.
References

[1] Paul R. P. (1981). Robot manipulators – Mathematics, programming and control. Massachusetts Institute of Technology, USA, ISBN 0-262-16082-X
[2] McKerrow P. J. (1991). Introduction to Robotics (Chapters 3 and 4). Addison-Wesley, ISBN 0-201-18240-8
[3] Klafter R. D., Chmielewski T. A., Negin M. (1989). Robotic engineering – an integrated approach. Prentice-Hall, ISBN 0-13-782053-4
[4] Fu K. S., Gonzalez R. C., Lee C. S. G. Robotics: Control, Sensing, Vision and Intelligence. McGraw-Hill, ISBN 0-07-100421-1
[5] Ranky P. G., Ho C. Y. (1985). Robot modelling: control and applications with software. IFS Publications Ltd., UK, ISBN 0-903608-72-3, and Springer-Verlag, ISBN 3-540-15373-X
[6] Kohli D., Soni A. H. (1975). Kinematic Analysis of Spatial Mechanisms via Successive Screw Displacements. J. Engr. for Industry, Trans. ASME, vol. 2, series B, pp. 739–747
[7] Denavit J. (1956). Description and Displacement Analysis of Mechanisms Based on 2x2 Dual Matrices. Ph.D thesis, Mechanical Eng'g, Northwestern University, Evanston, Ill.
[8] Yang A. T., Freudenstein F. (1964). Application of Dual-Number Quaternion Algebra to the Analysis of Spatial Mechanisms. Trans. ASME, Journal of Applied Mechanics, vol. 31, series E, pp. 152–157
[9] Uicker J. J. Jr, Denavit J., Hartenberg R. S. (1964). An Iterative Method for the Displacement Analysis of Spatial Mechanisms. Trans. ASME, Journal of Applied Mechanics, vol. 31, series E, pp. 309–314
[10] Milenkovic V., Huang B. (1983). Kinematics of Major Robot Linkages. Proc. 13th Intl. Symp. Industrial Robots, Chicago, Ill., pp. 16–31 to 16–47
[11] Lee C. S. G., Ziegler M. (1984). A Geometric Approach in Solving the Inverse Kinematics of PUMA Robots. IEEE Trans. Aerospace and Electronic Systems, vol. AES-20, No. 6, pp. 695–706
[12] Pieper D. L. (1968). The Kinematics of Manipulators under Computer Control. Artificial Intelligence Project Memo No. 72, Computer Science Department, Stanford University, Palo Alto, Calif., USA
[13] Cubero S. N. (1997). Force, Compliance and Position Control for a Pneumatic Quadruped Robot. Ph.D Thesis, Faculty of Engineering, University of Southern Queensland, Australia
[14] Goldenberg A. A., Benhabib B., Fenton R. G. (1985). A complete generalized solution to the inverse kinematics of robots. IEEE Journal of Robotics and Automation, RA-1, 1, pp. 14–20
[15] Goldenberg A. A., Lawrence D. L. (1985). A generalized solution to the inverse kinematics of redundant manipulators. Journal of Dynamic Systems, Measurement and Control, ASME, 107, pp. 102–106
Medical Applications
Robotics and mechatronic sensing allow surgeons to reach parts in a way that was hitherto extremely difficult or even impossible. The Birmingham group put their haptic sensing techniques to use for tactile internal examinations. This is a form of telepresence of a very short-range variety. With the widespread attention and controversy that surrounds stem-cell research, the collection of the blood that contains stem-cells from the umbilical cord of a newborn baby has special significance. This paper from Singapore concerns the practical aspects of the design for an automated machine. Another paper from the Birmingham group reports on a combination of actuators and sensors that performed an operation (literally) that would require great surgical skill. In drilling through a bone of the skull in preparation for inserting a cochlear implant, it is important to cease drilling when the bone is penetrated but before a delicate membrane is pierced. In a paper that is somewhat less challenging to the squeamish, the Singapore group describes the use of mechatronic principles in equipment for physiotherapy. Loads and forces can be programmed to suit the condition of the user, while a full record of performance can be captured automatically. What sort of display is appropriate when the user is blind? An array of stimulating electrodes must be of limited resolution and update frame-rate. This paper from a Brisbane group describes practical experiments to determine the response of users to displays of various qualities, while negotiating a course strewn with obstacles. A well-established treatment for cancer is the destruction of cells by beams of radiation, or in this case by high intensity ultrasound. The core problem is to ensure that the beams are brought to a focus at the required location. This location may move with the respiration of the patient. A paper from Toowoomba describes the automatic prediction of beam paths to correct variations.
Distributive Tactile Sensing Applied to Discriminate Contact and Motion of a Flexible Digit in Invasive Clinical Environments
Betty Tam+, Peter Brett, PhD+, David Holding, PhD+, and Mansel Griffiths, FCRS*
+ School of Engineering & Applied Science, Aston University, Birmingham, B4 7ET, UK
* Dept of Otolaryngology, St Michael's Hospital, UBHT, Bristol, UK
Abstract While minimal access procedures in surgery offer the benefits of reduced patient recovery time and post-operative pain, for the surgeon the task is more complex as both tactile and visual perception of the working site are reduced. In this paper, experimental evidence of the performance of a novel sensing system embedded in an actuated flexible digit element is presented. The digit is envisaged to be a steerable tip element of devices such as endoscopes and laparoscopes. This solution is able to retrieve tactile information relating to contact with tissues and to feed back information on the shape of the flexible digit. As such, the scheme is able to detect forces acting over the digit surface and can discriminate different types of contact, as well as evaluate force level, force distribution and other quantifiable descriptors. These factors, in terms of perception, could augment processes of navigation and investigation through palpation in minimal access procedures. The solution is pragmatic and, by virtue of its low mechanical complexity and polymer construction, offers a real opportunity for maintaining high sterility through disposability and for application in magnetic environments such as MR scanners.
1
Introduction
Micro-surgical procedures and procedures of minimal access are increasing in frequency in surgery. A major factor limiting the performance of the surgeon is the reduction in sensory information feedback compared with traditional open methods of surgery. Vision feedback is provided by microscope or is relayed from a miniature camera to a display screen. By comparison, tactile feedback is more difficult to retrieve from a working site. Tools such as laparoscopes, endoscopes
and other micro-surgical tools are used manually with little tactile sensation, as changing forces are not readily detected through the long mechanisms involved, and often the forces are below the threshold of human sensation. This paper describes the performance of a flexible digit that is able to retrieve tactile parameters to aid the guidance of tools and the control, and interpretation, of contact relating to palpation and surface avoidance. Steerable endoscopic devices, and devices for health delivery in lumen, have been explored with the view to detect force [1, 2] and to control movement [3]. In this work, the intention is to derive information relating to touch rather than the value of force, and actuation is used to control the interaction between the digit and the surface of the flexible tissue. The digit embodies a distributive tactile sensing element [6] that is utilised as part of the mechanism of the device. Actuation is by fluid pressure and the whole mechanism structure is efficient in terms of conserving space across the section of the digit. With the view to producing devices of acceptable cost, the sensing element is of efficient construction, with few sensing elements and connections. Research studies have shown that the sensing element can satisfactorily be produced using strain gauges or optical fibre sensing elements to detect deformation and reactive forces. In the latter case it is possible to envisage a device that can be constructed from non-metallic materials with suitability for magnetic environments, such as MR scanners. The construction of the digit is described and the results of laboratory tests illustrate the potential to retrieve tactile information, as would be needed to assist in diagnosis or control in navigation of the digit. The experimental results show the ability to discriminate levels of stiffness through palpation, contact force orientation, and sliding displacement of the digit with respect to surface features of the tissue.
2
Tactile Information Feedback Needs in Clinical Procedures
In clinical practice, steerable endoscopes are clinical tool delivery systems used for tasks in diagnosis and therapy. In surgical therapy, endoscopic devices are used to remove, repair or palliate (alleviate) and, depending on the course of treatment, different combinations of tool tips can be used. In diagnosis, there are three applications for which an endoscope could be used: visual examination, tactile examination and obtaining biological specimens for later diagnosis. In tactile examination, palpation is used to assess tissue properties. Kinaesthetic information can be used in navigation and to search for obstructions or pathways. This information is used in conjunction with visual examination. Tactile information could be used for navigation and to augment local investigation, such as by palpation. Tactile information from such a steerable digit needs to discriminate contacting conditions. For example, the shape of the digit, the direction and position of contact, the length of contact, and multiple or single contact points are important factors for aiding navigation. These may be applied in conjunction with visual feedback.
Fig. 1. Approximate size of a range of devices used within lumen of the human body
Lumen of the body in which steerable probes can be deployed vary in cross-section. As an approximate guide to the range of size of access, Fig. 1 illustrates the variation. Techniques for operating devices need to be applicable to such a range. The approach of this study is applicable over these sizes as the techniques are scalable.
3
The System Functions and Method of Operation of the Digit
The digit is assumed to be part of a master-slave system, providing the slave servo function and the means to retrieve tactile information corresponding with tissue interaction. It is a mechanical element integrating actuator, mechanism and sensor. This is illustrated in Fig. 2, where the functions provided by the digit element and integrated software are shown in relation to the functions of the complete system. In previous reports of this work, the accuracy of the approach to sensing force parameters was shown to be within 5% of the full range, which is acceptable for the intended application. Here, the success in retrieving information in the context of minimally invasive surgical application is described.
Fig. 2. The functions of the digit and of the master-slave system
Fig. 3. The mechanism of the digit
The construction of the digit has been described in previous publications [7, 8]; however, in the interest of completeness, the illustration of Fig. 3 shows the function of the sensing element as sensor and as part of the mechanism, causing increasing curvature of the digit when subjected to rising internal pressure. The 12 mm diameter concertina tube and sensing elements are polypropylene, and the digit described in these tests is 120 mm in length. The sensor is able to detect the shape of the digit and the forces applied to it, and is able to discriminate contacting conditions and the relationship between force and displacement, relating to palpation for example. Strain gauge sensor elements were positioned along the axis of the restraining element at positions 20 mm, 60 mm and 85 mm from the fixed root of the digit. The deformed shape of the digit is dependent upon the applied internal pressure, the stiffness of the structure of the digit and the reactive contacting loads encountered by the deforming digit. Signals from the sensors were input to an FFBP (feed-forward back-propagation) neural network as the input vector; the output vector comprises parameters defined to describe the contacting conditions or the shape and deformation of the digit. Different neural networks were used in parallel to discriminate a variety of potential contacting conditions likely to be encountered.
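As a rough illustration of this mapping (not the authors' NETLAB/MatLab implementation), the sketch below trains a small feed-forward network to classify contact conditions from the three strain-gauge signals; the data, shapes and class labels are hypothetical placeholders.

import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical training data: rows are [gauge_20mm, gauge_60mm, gauge_85mm] strain signals,
# labels are contact-condition classes (e.g. 0 = tip load, 1 = side load, 2 = no contact).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = rng.integers(0, 3, size=300)

# A small feed-forward classifier, in the spirit of the classification networks used here
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000).fit(X, y)
print(net.predict(X[:5]))   # predicted contact classes for new gauge readings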
4
The Experimental Set-up
Experimental evaluation of the performance of the digit to detect tactile information of relevance to clinical use required consistent motion and position of the digit with the possibility of repeating tests. The digit was fixed at the root on a horizontal axis and the target phantom sample to interact with the digit was attached to a linear actuation system, moving on a horizontal axis. This configuration is shown in Fig. 4. The target was then moved on this axis, and the digit actuated against different target samples. In this set-up it was possible to verify absolute values of position and force corresponding with information retrieved when using the sensor in the actuated digit. Target samples used in experiments were edges, curved tubes and soft surfaces of known stiffness.
Fig. 4. The experimental configuration of the digit with respect to the target
5
Experimental Demonstration of Performance
The neural networks used for these experiments are based around the NETLAB software toolbox of MatLab. Various types of neural network can be employed; for these experiments a single-layer feed-forward neural network was used. The sensor and a classification-type neural network were applied to discriminate the orientation of the contacting loads applied both around the axis of the digit and at the tip. 100% accuracy was achieved on the 1375 random samples. The capability to discern the direction of the reactive load could be used to aid navigation of the advancing digit in lumen. Moving the digit along its axis and rubbing it against an obstacle (in this case a fixed flexible pin) showed that the position of the side load can be determined with acceptable accuracy (within five per cent of the length of the digit) at velocities of 20 mm/sec.
Fig. 5. Displacement transients derived from a rubbing contact
This is illustrated in Fig. 5, where absolute and inferred position are plotted as a function of time. The digit is seen to be moving with a peak-to-peak amplitude of 50 mm and the following error is 3 mm. This shows that the digit could self-reference its axial position relative to surface features using tactile information. This is a useful capability as absolute axial displacements are not reliable for referencing tool position to non-rigid tissues that are subjected to the tool forces, and is particularly important at the smaller scale, such as in microsurgical processes [9]. The digit was used to discriminate stiffness, as in the case of palpation. The combination of force and displacement was resolved by the neural network from the difference in the responding shape of the digit. The sensing system was able to grade stiffness as high, medium or low spring rates in the following ranges: High: 60 N/mm or greater; Medium: from 30 to 60 N/mm; Low: less than 30 N/mm. The results were particularly accurate within these ranges, as illustrated in the table of Fig. 6. A more sensitive set of ranges is a subject of current research.

Real \ Predicted    Hard    Medium    Soft
Hard                 181         0       0
Medium                 0       208       0
Soft                   0         0     236
Fig. 6. Discrimination of surface stiffness
6
Conclusions
A flexible digit for use as a surgical tool in minimal access therapy has been constructed and deployed experimentally. Phantom studies have shown that the digit can be used to discern some mechanical and physical descriptions of a surface medium locally. The digit can discriminate the orientation of contact about the digit and the position of the reactive load. This is encouraging as the digit is of efficient construction in terms of sensing elements and structure. It has also been shown that the digit can be utilised to register its axial position with respect to tactile features. There is a following error associated with the approach, and at a surface velocity of 20 mm/s this amounted to 3 mm. Working in conjunction with visual feedback, it is expected that the principles described can be used effectively to assist the surgeon in the difficult working conditions of minimal access therapy. The digit was actuated and displaced into soft surfaces of known stiffness. Discrimination of the level of stiffness was 100% accurate across three broad categories. This capability is useful for palpation in diagnosis and tissue identification.
Acknowledgements The authors wish to acknowledge the support of the EPSRC of the UK in this research.
References

[1] Lazeroms, M., La Haye, A., Sjoerdsma, W., Schreurs, W., Jongkind, W., Honderd, G. and Grimbergen, C., 'A hydraulic forceps with force-feedback for use in minimally invasive surgery', Int J of Mechatronics, Vol 6, No 4, Permagon, 1996, ISSN 0957 4158
[2] Dario, P., Carrozza, M., Marcacci, M., D'Ahanasio, S., Magnami, B., Tonet, O., Megali, G., 'A novel mechatronic tool for computer assisted arthroscopy', IEEE Trans on ITIB, Vol 4, No 1, March 2000
[3] Dario, P., 'Micromechatronics in Medicine', IEEE Trans on Mechatronics, Vol 1, No 2, June 1996
[4] Rabenstein, T., Krauss, N., Hahn, E.G., Konturek, P., 'Wireless capsule endoscopy – beyond the frontiers of flexible gastrointestinal endoscopy', Med Sci Monit, 2002; 8(6): RA128–132
[5] Anderson, V.H., Roubin, G.S., Leimgruber, P.P., Douglas Jr, J.S., King III, S.B., Gruentzig, A.R., 'Primary angiographic success rates of percutaneous transluminal coronary angioplasty', The American Journal of Cardiology, Volume 56, Issue 12, 1 November 1985, Pages 712–717
[6] Tongpadungrod, P., Brett, P.N., 'An efficient distributive tactile sensor for recognising contacting objects', Proc M2VIP, Thailand, Sept 2002
[7] Brett, P.N., 'A Flexible Digit with tactile feedback for invasive clinical applications', M2VIP, Perth, Dec 2003
[8] Ma, X., Brett, P.N., Wright, M.T. and Griffiths, M.V., 'A flexible digit with tactile feedback for invasive clinical applications', Proc IMechE, part H, No H3, Vol 218, pp 151–157, ISSN 0954-4119
[9] Brett, P.N., Ma, X. and Tritto, G., 'The potential of robotic technology applied to meet requirements for tools to support microsurgery and cellular surgery', Int J of Cellular and Molecular Biology, No 50, part 3, pp 275–280, 2004, ISSN 1165-158X
Intelligent Approach to Cordblood Collection
S.L. Chen1, K.K. Tan1, S.N. Huang1 and K.Z. Tang1
1 Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore, 117576
1
Introduction
In the past, following the birth of a baby, the umbilical cord was usually discarded along with the placenta. It is now known that blood retrieved from the umbilical cord, commonly known as umbilical cord blood (UCB), is an increasingly important and rich source of stem cells (Cairo and Wagner, 1997). These cells obtained from UCB can produce all other blood cells, including blood-clotting platelets, and red and white blood cells. In a way, UCB is similar in primary functions to donated bone marrow. The cells that are harvested from UCB or bone marrow can be used for the treatment of over 45 malignant and non-malignant diseases, including certain cancers such as leukemia, and immune and genetic disorders. UCB provides a readily available source of stem cells for transplantation in many situations where bone marrow is now used. More critically, there are many advantages to using UCB instead of other sources of stem cells such as bone marrow and peripheral blood:
• There is no risk involved in collection of UCB.
• UCB is much easier to collect and harvest without the risks of general anesthesia required to harvest bone marrow.
• UCB is readily available when needed if it is properly collected and stored at birth.
• UCB is often more compatible when used in transplants.
• UCB has lower procurement costs.
• UCB has demonstrated broader potential clinical applications for improving neural repair and bone and tissue growth.
The importance of UCB is now widely recognized. Blood centers worldwide now collect and store UCB after the delivery of a baby upon the parents' request. The UCB may become extremely useful and indispensable at a later stage in saving the life of the newborn baby. However, one problem associated with UCB is that its collection is a one-time possibility and the amount of blood that can be
collected is limited using current ways of blood collection, which include syringe-assisted and gravity-assisted methods (Bertolini et al., 1995). These methods are mainly carried out manually. Apart from being a tedious and difficult process, the current ways of extracting the blood also carry a high risk of unnecessary contamination. In this paper, an intelligent UCB collection system is presented to automate the process of harvesting stem cells. The proposed system is easy to use, can be readily sterilized and yields an improved number of useful cells collected for adult transplantation compared to prior art. It is designed with the constraints of a typical delivery room in mind. The paper will elaborate on the hardware and software aspects of the system and the operational function associated with each of the components.
2
Construction of the Intelligent UCB Collection System
The intelligent system comprises the following four components:
• a placenta tray with umbilical cord positioner,
• an air-tight chamber with a controlled and distributed pneumatic pressure application system,
• an adjustable and integrated vibratory structure,
• an open-architecture software control system fulfilling the functionalities of the overall system.
These four components are modular in nature so that each can be modified or replaced, while the other components remain in use. This feature facilitates repeated operations using the same device. Collectively, the four components form an electro-mechanical apparatus which is able to manipulate the placenta via a combination of high frequency vibration and controlled pneumatic pressure, to maximise the flow of blood from the placenta to a collection tube. In addition, all the key components which may be directly or indirectly in contact with the placenta can be readily sterilised and are also designed to filter contaminants from the collected blood. In this section, the hardware and software aspects of the intelligent UCB collection system will be described in further detail. The user interface for the overall system will also be elaborated.
2.1
Hardware Architecture
The hardware architecture and the pictorial diagram of the UCB collection system are shown in Fig. 1 and Fig. 2, respectively. Each of these components is carefully designed to fulfill a specific function contributing to the overall operation of the UCB collection system.
Fig. 1. Schematic diagram of UCB collection system
Fig. 2. Pictorial diagram of UCB collection system
Placenta Tray and Umbilical Cord Positioner
The placenta tray is a modular, slightly-reverse-sloping (cone-shaped) component which is an integral part of the overall stainless steel support structure. It serves as the support base for the placenta, with the maternal side facing upward, the fetal side facing downward, and the umbilical cord, originating from the fetal side, passing through the venturi of the funnel-shaped element which is formed by a plastic umbilical cord positioner.
Pneumatic Pressure Application System
The pneumatic pressure application system comprises a pressure application lid which is snapped onto the placenta tray. Together with the placenta tray and positioner, they form an air-tight chamber. The lid houses three interfaces to standard air tubings.
Vibratory Structure
The stainless steel structure is integrated with a vibrator which applies high frequency vibration to the entire structure. In this way, the placenta is kept in a naturally-inclined position at all times, and bottlenecks and clots impeding blood flow can be reduced. The amount of vibration is adjustable via the vibration controller.
DSP Board
The dSPACE DS1102 DSP board (dSPACE DS1102 User's Guide, 1996), manufactured by dSPACE Digital Signal Processing and Control Engineering GmbH, is used in the automated cord blood collection system as the main computing unit. It is a single-board system which is specifically designed for the development of high-speed multivariable digital controllers and real-time simulations in various fields.
2.2
Software Development Platform
The DS1102 DSP board is well supported by popular software design and simulation tools, including MATLAB and SIMULINK (The Mathematics Laboratory, 2006), which offer a rich set of standard and modular design functions for both classical and modern control algorithms. Upon successful automatic code generation from the SIMULINK control block diagram, the controller is downloaded and executed on the dSPACE DSP board. The user interface allows for user-friendly parameter tuning/changing and data logging during operations. The control parameters can be changed on-line, while the system is in operation. All the changes made by the user can be observed simultaneously on the display.
2.3
Intelligent User Interface
An open-architecture software control system fulfills the overall functions of the system. It is programmed to give different chamber pressure profiles. Closed-loop control ensures the pressure is precisely controlled to track desired profiles. Alarm and safety features are implemented to maintain the chamber pressure within acceptable thresholds. The aim of the user interface is to enable the operation of the UCB collection system and to monitor the whole process via a fuzzy fusion controller.
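As a rough sketch of the closed-loop pressure tracking described here (illustrative only; the read_pressure() and set_valve() interfaces, the PI gains, the sample time and the safety threshold are hypothetical, not taken from the actual controller):

import time

def track_pressure_profile(profile, read_pressure, set_valve,
                           kp=0.8, ki=0.2, dt=0.05, p_max=20.0):
    # Simple PI loop that drives chamber pressure (kPa) along a desired profile
    integral = 0.0
    for p_ref in profile:                      # one set-point per control period
        p_meas = read_pressure()
        error = p_ref - p_meas
        integral += error * dt
        u = kp * error + ki * integral         # valve command from the PI law
        set_valve(max(0.0, min(u, 1.0)))       # saturate the actuator command
        if p_meas > p_max:                     # safety threshold: vent and stop
            set_valve(0.0)
            raise RuntimeError("chamber pressure exceeded safe limit")
        time.sleep(dt)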
Fig. 3. User interface of UCB collection system
The user interface is designed as a virtual instrument panel based on the dSPACE COCKPIT instrumentation tool (COCKPIT User’s Guide, 1996). COCKPIT (a peripheral design tool for graphical output and interactive modification of variables, for applications running on dSPACE DSP boards) is a comprehensive design environment where designers can intuitively manage, instrument, and automate their experiments and operations. Figure 3 shows the user interface customised for the intelligent UCB collection system.
3
Test Results
The aim of UCB collection is to obtain a sufficient number of useful blood cells for adult transplantation. According to McCullough et al. (1998), the current ways of UCB collection typically recover only 20–40 ml. Apart from obstetric factors such as infant weight and time of collection, the procedure involved and the equipment used to perform the collection influence the final yield. Some critical cells, such as the CD34+ cell, have a high proliferation capacity, several hundredfold greater than similar cells from the adult bone marrow (Rogers et al. 2001). Our experimental results of UCB collection are shown in Table 1. The apparatus has been tested on six delivered placentas. The manual collection of UCB is first performed by syringe. The average number of mononuclear cells (MNCs) collected is 4.4 × 10^10 per placenta, while the average number of CD34+ cells obtained is 2.29 × 10^8, which is far less than the minimum requirement for adult transplantation. Following the manual collection, the automated device is used for further extraction of UCB from the same placenta, with injection of 6% ACD-A, perfusion medium and digest media to assist in the extraction of the UCB. The mean numbers of MNCs and CD34+ cells collected are 1.64 × 10^12 and 2.03 × 10^10 per placenta, respectively. The number of CD34+ cells obtained is about 2 times greater than the essential value for adult transplantation.
Table 1. Experimental results

Exp    Wt. of          Collection using manual method (syringe)         Collection using automated device
Index  placenta (g)    Volume (ml)   MNCs ×10^8/ml   CD34+ ×10^8/ml     Volume (ml)^1   MNCs ×10^8/ml   CD34+ ×10^8/ml
1      525             90.75         5.3             0.02438            145             17.64           0.360
2      750             85.66         5.78            0.027744           520             63.8            0.2235
3      500             Cordblood sent to KL: no results                 530             57.114          0.1564
4      600             No cord banking                                  470             38.745          1.1223
5      600             No cord banking                                  168             22.575          0.6377
6      525             46            7.45            –                  460             22.95           0.71718
1 The UCB collected by automated machine is mixed with the injected medium. Thus, the volume is greatly increased. However, the key objective of our experiments is to improve the no. of MNCs and CD34+ cells being collected, rather than the volume of UCB.
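The per-placenta averages quoted above follow from Table 1 by multiplying each collected volume by its cell concentration and averaging over the placentas with data; a short check of this arithmetic (illustrative only):

# Automated-device rows of Table 1: (volume in ml, MNCs x10^8 per ml)
auto = [(145, 17.64), (520, 63.8), (530, 57.114), (470, 38.745), (168, 22.575), (460, 22.95)]
totals = [v * c * 1e8 for v, c in auto]   # total MNCs per placenta
print(sum(totals) / len(totals))          # ~1.64e12, matching the reported mean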
4
Conclusions
An intelligent UCB collection system has been developed to enable an efficient UCB collection process. The hardware architecture, software development platform, user interface, and all constituent control components are elaborated on in the paper. The overall control system is comprehensive, and designed with the constraints of the typical delivery room in mind. The control and instrumentation components, integrated within a hardware architecture centred around a dSPACE DS1102 DSP processor board, collectively achieve the objective and yield an improved number of MNCs and CD34+ cells collected, according to real tests carried out on freshly delivered placentas.
References

[1] Cairo MS, and Wagner JE (1997) Placental and/or umbilical cord blood: an alternative source of hematopoietic stem cells for transplantation. Blood, vol. 90, no. 1, pp 4665–4678
[2] Bertolini F, Lazzari L, and Lauri E (1995) Comparative study of different procedures for the collection and banking of umbilical cord blood. J. Hematother Stem Cell Res., vol. 4, no. 1, pp 29–36
[3] dSPACE DS1102 User's Guide (1996) Document Version 3, dSPACE
[4] The Mathematics Laboratory (2006) http://www.mathworks.com
[5] COCKPIT User's Guide (1996) Document Version 3.2, dSPACE
[6] McCullough J, Herr G, and Lennon S (1998) Factors influencing the availability of umbilical cord blood for banking and transplantation. Transfusion, vol. 38, no. 1, pp 508–510
[7] Rogers I, Sutherland DR, Holt D, Macpate D, Lains A, Hollowell S, Cruickshank B, and Casper RF (2001) Human UC-blood banking: impact of blood volume, cell separation and cryopreservation on leukocyte and CD34+ cell recovery. Cytotherapy, 3:269–276
An Autonomous Surgical Robot Applied in Practice
P.N. Brett¹, R.P. Taylor¹, D. Proops², M.V. Griffiths³ and C. Coulson¹
¹ Aston University, Birmingham, UK
² Queen Elizabeth Hospital, Birmingham, UK
³ Saint Michaels Hospital, Bristol, UK
Keywords: Robotics, surgery, autonomous, sensory guided
1
Introduction
Over the last 20 years, robotic surgery has made its mark as a precise means of tool deployment in surgical procedures. The majority of applications have focused on the control of tools on trajectories defined using pre-operative scan data. These pre-determined trajectories are appropriate where tissue movement between scanning and surgical therapy can be considered insignificant, or within acceptable limits. This level of assistance has its value in many procedures; however, more complex tool paths and variations in strategy are required in many other procedures that will benefit from the precise nature of robotic manipulation technology. Example systems are described by Dario et al. (2000) and Davies (2000). To an extent this has been achieved by introducing the surgeon operator into the control loop, where master-slave systems have attempted to combine the surgeon's interpretation of the state of tissue-tool interaction and formulation of strategy with the response and accuracy of the robotic device. Unfortunately there is always a dilemma associated with the perception of interaction with the tissue at the tool point. This is particularly true in minimal access procedures or in procedures requiring microscopic tool interaction, where information based on visual perception is compromised and the sense of tactile information is lost. In addition to automatic and master-slave robotic systems in surgery there is a need for sensor-guided robotic devices that interpret or react to tissues in order to control the state of interaction relative to the target tissue. These can be fully automatic, or automatic as part of a master-slave system, to enable precise operation of tool points with respect to tissue targets and interfaces. Sensory-guided robots can be used to control penetration through flexible tissues and to control relative
motion to moving or deforming tissue targets and interfaces as in micro-surgery. In such applications, precision would otherwise be compromised by deflection induced by the action of tool forces. The micro-drilling robotic system is the first example of this new breed of autonomous surgical tool applied in practice. This paper covers a brief description of the cochleostomy procedure used to demonstrate the technique, the design of elements of the micro-drilling system, the sensing technique and a description of trials.
2
Preparing a Cochleostomy
Cochlear implantation has become the standard treatment for severely to profoundly deaf patients over the last 20 years (Young et al. 2003). The implant is placed within the inner-ear hearing organ. Cochleostomy is one of the key steps in the procedure for installing an implant: it is the hole through which the electrode implant is inserted into the cochlea, and its location with regard to the anatomy of the ear is shown in Fig. 1. When drilling through the bone tissue of the cochlea, inadvertent protrusion of the drill through the delicate internal structures can lead to complications. Protrusion, together with contamination of the internal fluids with bone dust, will lead to a reduction in residual hearing and will increase the risk of post-operative infection. Using the new drilling system it is possible to drill through the wall of the cochlea and complete the hole without penetrating the membrane at the inner wall interface. This minimises trauma to the hearing organ and increases the likelihood of retaining residual hearing. It also maintains a high level of sterility.
Fig. 1. Diagram illustrating the anatomy of the ear and location of a cochleostomy
Access to the surface of the cochlea is prepared by the surgeon. Normally this is behind the ear and typically results in a hole 10 mm in diameter and 15 mm deep, narrowing towards the drilling site. Of importance is the need to maintain the visual focus of the binocular surgical microscope at the working site, and access to the drilling point, while avoiding contact with various anatomical structures. This is reflected in the design of the mechanical elements of the drill. The implant is finally inserted through a pool of antiseptic gel at the cochleostomy to maintain sterile conditions. By using the micro-drill, it is possible to avoid the ingress of drilling debris and to avoid penetration into the cochlea before the antiseptic gel is applied. In the operating theatre, the drilling system has been constructed to observe the high standard of sterility expected of invasive surgical instruments. Other practical measures include earth linking, minimising the cabling to a single USB connection, and minimising set-up time in the operating theatre.
3
The Autonomous System
The micro-drilling system consists of the five principal parts shown in Fig. 2: the drill unit, comprising a precision linear feed actuator, the drill drive system and the sensing elements; the flex-and-lock arm, incorporating fine and coarse adjustment; the hard-wired unit integrating sensing and control functions; the hand-held remote unit; and the computer display screen. The control system and sensory functions operate in hardware. The computer is used to relay information to the surgeon on the state of the tool tip. The drilling process is controlled through the hard-wired unit either by the computer or by using the hand-held remote unit. The controller implements the drill feed and drill bit rotation in response to the interaction between tissue and tool point and the state of the drilling process. Working under a surgical microscope, the drill unit is aligned by the surgeon in close proximity to the drilling site on the correct trajectory by using the flex arm,
Fig. 2. The five principal parts of the autonomous system
fine adjustment mechanism and the hand-held remote unit. It is then locked in position. Automatic operation of the system is then triggered using the hand-held remote unit. Drilling feed is controlled at a constant rate, typically 0.5 mm/min, with the drill bit rotating at 10 rev/s until a limiting force level is reached. The approach to the medial surface is detected by automatically identifying drilling force characteristics that occur simultaneously as this point is reached. Drilling then stops and the drill bit is retracted until the feed force reduces to zero, such that the drill tip rests on the base of the drilled aperture. The choice is then made by the surgeon to penetrate by the minimum displacement to achieve a fully formed hole, or to retract leaving a minimum thickness of bone tissue. At any point the drill bit can be retracted for visual inspection and drilling can then recommence at the same point.
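The drilling sequence described above can be summarised in a short sketch. The following Python outline is illustrative only: the real system realises this behaviour in hardware, and the sensor and actuator functions passed in (read_feed_force, approach_detected, advance_drill, retract_drill) are hypothetical placeholders, as is the force limit value.

# Illustrative outline of the automatic drilling sequence; not the hardware implementation.
FEED_RATE_MM_PER_MIN = 0.5     # constant feed rate quoted in the text
DRILL_SPEED_REV_PER_S = 10     # drill bit rotation speed quoted in the text
FORCE_LIMIT_N = 1.0            # assumed limiting feed force (no value is given in the text)

def drill_to_medial_surface(read_feed_force, approach_detected, advance_drill, retract_drill):
    """Feed at a constant rate until the limiting force is reached or the approach to the
    medial surface is detected, then retract until the feed force falls to zero so that
    the drill tip rests on the base of the drilled aperture."""
    while True:
        if read_feed_force() >= FORCE_LIMIT_N or approach_detected():
            break
        advance_drill(FEED_RATE_MM_PER_MIN, DRILL_SPEED_REV_PER_S)
    while read_feed_force() > 0.0:
        retract_drill()
    # the surgeon then chooses minimal penetration to complete the hole, or retraction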
4
Sensing the Medial Surface
The system monitors the force and torque transients at the tool point and interprets these in real time. The relationship between the transients can be used to distinguish between different states and phenomena, such as patient or tool movement, the approach to tissue boundaries, tissue hardness and stiffness, and drill breakthrough. Using part of this information it is possible to identify the critical breakthrough event before it occurs and to automatically control the drill penetration with minimum protrusion. A typical set of force and torque curves measured when drilling a cochlea is shown in Fig. 3. In these trials a 0.8 mm diameter burr drill bit was used. The approach to the medial surface of the cochlea is indicated by a progressively falling force transient occurring simultaneously with a rising torque transient. These occur as the thickness of bone tissue centrally in front of the drill tip reduces, such that it becomes particularly flexible locally. As a result the feed force is reacted at an increasing radius of the drill tip. A full description of this process is given by Brett et al. (2004) and Baker et al. (1996). The method is reliable, is independent of force level and is able to compensate for axial deflection of the tissue, patient or drilling unit.
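The detection criterion, a falling force transient occurring at the same time as a rising torque transient, can be expressed as a simple test on recent samples. The sketch below is a simplified illustration of that principle rather than the detector implemented in the system's hardware; the window length and slope tolerance are assumed values.

import numpy as np

def approach_detected(force_samples, torque_samples, window=20, slope_tol=0.0):
    """Return True when, over the most recent window of samples, the feed force trends
    downwards while the drilling torque trends upwards: the simultaneous signature
    described for the approach to the medial surface. Thresholds are assumptions."""
    if len(force_samples) < window or len(torque_samples) < window:
        return False
    t = np.arange(window)
    force_slope = np.polyfit(t, force_samples[-window:], 1)[0]    # least-squares slope
    torque_slope = np.polyfit(t, torque_samples[-window:], 1)[0]
    return force_slope < -slope_tol and torque_slope > slope_tol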
Fig. 3. Typical drilling force and torque transients
5
Micro-drilling in Practice
Leading up to the clinical trials, other tests were carried out on porcine and cadaver specimens. These tested the system in tissues with properties similar to those in live patients and tested the suitability of the configuration of the flex arm and micro-drill unit in a set-up similar to that of the operating room. Trials under the microscope are shown in Fig. 4. Figure 4a shows the drill bit entering the cochlea and Fig. 4b shows the resulting cochleostomy, where the tissue has been fully perforated, leaving the inner membrane intact. Following the proving trials, the robotic micro-drill was used autonomously in theatre. It is the first example of a robotic surgical tool being used in a totally autonomous mode of operation.
Fig. 4. The preparation of a cochleostomy in pre-theatre trials. (a) The drill tip at the cochlear. (b) The resulting cochleostomy
6
Conclusion
The configuration and method of the first autonomous surgical robot applied in clinical practice have been described in this paper. The sensing and control functions of the drilling device are implemented in hardware. The method for sensing the critical state of breakthrough at flexible tissue interfaces operates by detecting reliable and simultaneous features in the feed force and torque transients.
Fig. 5. The set-up of the flex arm and drilling unit in the operating theatre
The drilling system has been applied in cochleostomy. This paper has shown a resulting cochleostomy produced by the drill, which is able to penetrate the bone tissue of the cochlea leaving the inner membrane intact. This maintains sterility and avoids the ingress of drilling debris. Fewer complications and higher performance in patients' residual hearing are expected as a result of this approach compared with conventional methods.
Acknowledgements The authors wish to acknowledge the support of Queen Elizabeth Hospital, Birmingham, UK in this work and the advice and enthusiasm of Mechtron Design Ltd.
References
[1] Baker, D., Brett, P.N., Griffiths, M.V. and Reyes, L. (1996) A mechatronic drilling tool for ear surgery: A case study of design characteristics. Int J of Mechatronics, vol. 6, no. 4, pp. 461–478.
[2] Brett, P.N., Baker, D.A., Taylor, R. and Griffiths, M.V. (2004) Controlling the penetration of flexible bone tissue using the stapedotomy micro-drill. Proc IMechE, Part I, vol. 218, no. I4.
[3] Dario, P., Carrozza, M.C., Marcacci, M., D'Attanasio, S., Magnani, B., Tonet, O. and Megali, G. (2000) A novel mechatronic tool for computer assisted arthroscopy. IEEE Trans on Information Technology in Biomedicine, vol. 4, no. 1, pp. 15–29.
[4] Davies, B.L. (2000) Hands-on robots for surgery. Seminar on Surgical Robotics and Computer Assisted Surgery, IMechE, London.
[5] Young, N., Nguyen, T. and Wiet, R. (2003) Cochlear implantation. Operative Techniques in Otolaryngology, 14(4): 263–267.
Development of an Intelligent Physiotherapy System
S.L. Chen, W.B. Lai, T.H. Lee and K.K. Tan Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore, 117576.
1
Introduction
Physiotherapy (also known as physical therapy) is a health profession concerned with the assessment, diagnosis and treatment of disease and disability through physical means, with the aim of restoring, maintaining and promoting overall fitness and health (Wikipedia, 2006). Physiotherapy patients include accident victims and individuals with disabling conditions such as lower back pain, arthritis, heart disease, fractures, head injuries and cerebral palsy. The practice is based upon principles of medical science, and is generally held to be within the sphere of conventional medicine. Other patrons of physiotherapy can be expected to be individuals with disabilities, baby boomers entering prime age for heart attacks and strokes, and gym goers seeking general well-being. There is also an increasing interest in health promotion. This paper describes the process of developing the Intelligent Physiotherapy system in two phases. First, the individual components of the physiotherapy system are assembled, and programming and testing are done on a computer. Thereafter, work is carried out using National Instruments' (NI) CompactRIO (cRIO) to make the system a convenient and portable standalone application, which is one of the key features of this project. The cRIO is the critical component of the system: it runs the program stored in its flash memory, reads in signals keyed in by the user, displays the results, and controls the reaction of the system in response to the human activity. Many electromechanical systems have been developed for physiotherapy (Coote and Stokes (2005), Intelligent Systems Group (2006) and CORDIS (2006)). These systems are designed to target different parts of the body (i.e., upper limbs, legs and back) requiring physiotherapy attention, but they are developed using customised hardware and are quite bulky. Furthermore, these systems are not easily reconfigurable to suit the specific needs of different users at the same time.
This paper states the problem and describes the design methodology as well as the detailed electrical and control software design. The results from the experiments, which validate the design methodology, are presented at the end.
2
Objective
The main objectives of this work are:
• To develop a physiotherapy system which makes it possible to have a continuous and varying weight profile over the strokes, to match the patient's problem and therapy objectives, and to facilitate dynamic weight variation over a physiotherapy routine cycle (a simple position-to-force profile is sketched after this list). Normal gymnasium equipment based on a discrete-weights system is unable to achieve this function.
• To make use of computers to record and keep track of the results of the physiotherapy session. Currently, either the physiotherapist or the patient has to record the results of the physiotherapy routine personally. This can be a monotonous and boring task, and there is also no feedback: the part of the routine where the patient faces problems, and how far he is from overcoming them, is unknown or a matter of subjective opinion from the physiotherapist. It is not possible to have a more objective analysis of the patient's exact areas of weakness, and the physiotherapist's experience and professionalism are heavily depended upon. With this system, in addition to his own judgment, the physiotherapist can make use of feedback from the system to analyse the recovery progress of the patient.
• To develop a compact, portable version of the Intelligent Physiotherapy system which achieves the major functions of the computer-aided version.
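To make the idea of a continuously varying weight profile concrete, the sketch below maps arm position to a commanded resistive force. The profile shape and the numerical values are illustrative assumptions only; in the actual system the profile would be set by the physiotherapist to suit the patient.

import math

def resistive_force(position_deg, base_force_n=50.0, variation_n=20.0, stroke_deg=90.0):
    """Continuously varying resistance over one stroke: lighter at the ends of the range
    of movement and heaviest mid-stroke. Illustrative profile only."""
    fraction = min(max(position_deg / stroke_deg, 0.0), 1.0)
    return base_force_n + variation_n * math.sin(math.pi * fraction)

# Commanded resistance at the start, middle and end of an assumed 0-90 degree stroke
for angle in (0, 45, 90):
    print(angle, round(resistive_force(angle), 1))   # 50.0, 70.0, 50.0 N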
3
Hardware and Software Components
From the hardware considerations of this work, a 40 Nm servo motor is used. The sizing of the motor was determined after evaluating feedback provided by the Physiotherapy Department of the National University Hospital. This motor replaces discrete weights as the chief source of resistance. The key advantage of using a motor compared with discrete weights is that the force provided by the motor can easily be changed by varying the voltage output to it. A servo motor is used for its efficiency and stability under low-speed conditions and its four-quadrant operation, making it suitable for physiotherapy treatment. Also, no probes need to be attached to the patient. A force sensor reads the force exerted by the patient; the sensor used in this project utilises a specialised piezoresistive micromachined silicon sensing element and provides precise, reliable force sensing performance. The hardware components of this system are shown in Fig. 1. The control system (Fig. 1(b)) consists of one each of the following: a reconfigurable chassis, a real-time processor, an analog input module, an analog output module, a digital input module and compatible LabVIEW software.
Fig. 1. (a) Mechanical components of the system. (b) Standalone controller
4
Phase One Design
The Intelligent Physiotherapy system (Fig. 2) phase one design is made up of three key parts: the mechanism, the control system (cRIO) and the man-machine interface (the computer). The control system and the computer are connected to each other through an internal network using a TCP/IP cable. The user can input commands to the physiotherapy system via the keyboard. The results of the treatment routine can be displayed in real time on the computer screen using LabVIEW software. The reaction or force exerted by the user is measured using a force sensor. This is then analysed by the Intelligent Physiotherapy system, which provides an appropriate resistive force output based on the settings determined by the user. The features introduced here are: (i) User Set Force: a safety feature to
Fig. 2. Overview of the intelligent physiotherapy system
stop the motor output when the patient is unable to overcome the force exerted by the machine. (ii) Maximum Force Test: to detect the maximum force exerted by the patient during the physiotherapy routine; it also permits varying force profiles to be exerted over the range of physiotherapy movement. (iii) Positional Count: to enable the programmer/physiotherapist to analyse the patient's force profile over a range of movement (a simplified monitoring loop combining these features is sketched below).
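The three features can be combined in a single monitoring loop. The sketch below is illustrative: the actual system realises this logic in LabVIEW on the cRIO, and the sensor-access callables, the default force limit and the stop condition are simplifying assumptions.

def physiotherapy_cycle(read_force, read_position, stop_motor, user_set_force=150.0):
    """Illustrative monitoring loop for one routine: track the maximum force (Maximum
    Force Test), log force against positional count, and stop the motor when the patient
    can no longer overcome the User Set Force limit (simplified trigger)."""
    max_force = 0.0
    routine_started = False
    force_vs_position = []                  # (positional count, applied force) pairs
    while True:
        force = read_force()                # force exerted by the patient (N)
        position = read_position()          # positional count of the arm
        force_vs_position.append((position, force))
        max_force = max(max_force, force)
        if force >= user_set_force:
            routine_started = True
        elif routine_started:               # User Set Force safety feature
            stop_motor()
            break
    return max_force, force_vs_position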
4.1
FPGA Program
The phase one design consists of two parts: the Field Programmable Gate Array (FPGA) program, which runs on the CompactRIO (cRIO), and the host program, which communicates with the FPGA program from a computer via an Ethernet network. The cRIO permits the programmer to utilise a user-programmable FPGA for high performance and customisation. It is used here to handle the inputs (force applied by the patient, safety force limit, strength of motor) and outputs (positional counts, graphical displays) required by the design of the Intelligent Physiotherapy system described in the overview. These inputs and outputs can be adjusted and changed using the interactive front panel of LabVIEW. However, it is to be noted that the amount of logic that can be fitted onto the FPGA is limited, depending on the device used; more complicated data operations must be carried out on the host computer.
4.2
Host Program
The host program is used to interact with the FPGA program and runs on the host computer. The host computer enables additional data processing functions such as floating-point arithmetic and data logging, which are not available on the FPGA-only system (National Instruments, 2004). It is also used for debugging, trial runs and testing, so that it can facilitate the standalone system design in the following step. The safety feature of the Intelligent Physiotherapy system is functional, such that the maximum force that can be applied to the system can be set by the user. The force profile applied to the system over the range of movement is also displayed on the host program interface. During the test, when the user is unable to overcome the User Set Force limit, the machine stops as instructed. The Force Chart displays the variation of the applied force in real time. The maximum force applied is also reflected on the Position vs Force Chart as well as on the Max Force Gauge. The position count works accurately and correctly outputs the current position of the Intelligent Physiotherapy arm. The results of the test may be saved into a text file for future analysis if desired. A successful phase one design has been achieved.
5
Phase Two Design – Stand Alone Application
The objective here is to build the Intelligent Physiotherapy system as a standalone application. The key challenges are removing the personal computer as the human-machine interface, re-programming the Intelligent Physiotherapy program and finding a suitable replacement human-machine interface. No host program is created for the standalone system design, as the Intelligent Physiotherapy system is expected to run the program stored in the flash memory of the cRIO at power-up and on any reset. Without the host program running on the computer, the program for the standalone Intelligent Physiotherapy system is based solely on the FPGA. The comparison between the two phases of design is shown in Table 1. A simple external controller is assembled to control the Intelligent Physiotherapy system and to provide the user with an interface after the computer is detached. To infuse the element of intelligence into the system, work is under way to incorporate the operation of the whole system and the monitoring of the whole process via a fuzzy fusion controller. This enables the system to alert the user to abnormalities encountered during the operation of the machine. The system is also able to recommend different training profiles based on certain inputs from the user. Figure 3(a) shows the components of this simple external controller, consisting of two sliding potentiometers, two LEDs, one toggle switch and one digital multimeter. A casing is constructed to house the components, as shown in Fig. 3(b). The components are described in detail in Table 2.

Table 1. Comparison of the design of the two phases

Feature               Phase 1   Phase 2
User set force        Yes       Yes
Speed of arm          Yes       Yes
Position of arm       Yes       Yes – on FPGA front panel; No – on actual system
Graphical display     Yes       No
Maximum force test    Yes       No
Output to file        Yes       No
Portability           No        Yes
Fig. 3. External controller: (a) Components. (b) Appearance.

Table 2. Components of the external controller

Component                      Uses                                                       Pros                                                                    Cons
Sliding Potentiometer (10 KΩ)  To control User Set Force and Speed of Arm                Cheap and easy to use.                                                  Inaccurate and prone to noise.
LEDs                           Lights up when the sliding potentiometers are in use.      –                                                                       –
Toggle Switch                  To switch the LCD display to show different outputs.       –                                                                       –
Digital Multimeter             To display the analog outputs of the system numerically.   Versatile and easy to use. Cheaper than building customized display.    The values displayed are not scaled, i.e. 150 N is displayed by a value of 10 on the LCD.
6
Field Trial Results
The completed system was tested on a few selected individuals. The various functionalities of the system were tested in different operating conditions. The completed system is able to fulfill the objectives stated in the earlier part of the paper. A screen capture of one of the trial tests conducted is shown in Fig. 4. Some of the observations from the field tests help us in fine-tuning this system. The main strengths of this system are the portability, the re-configurability and the safety features embedded within. Without discrete and bulky weights, the whole system is more portable as compared to the usual weight training equipment. The training profile of each user can be stored within the system.
Fig. 4. Screen capture of one of the trial tests
7
Conclusions
This paper describes the process of designing a standalone Intelligent Physiotherapy system in two phases. The first phase serves to help define the features of the Intelligent Physiotherapy system in the second phase, as well as to carry out tests and refinement. Further developing the Intelligent Physiotherapy system in either phase will lead to different systems with features catering to the diverse needs of the healthcare industry. As the Intelligent Physiotherapy system is connected to a network, there is potential for remote control of the system, whereby the physiotherapist does not have to be physically present to monitor the patient's routine. In the second phase, the Intelligent Physiotherapy system is developed as a standalone system; a working version has been successfully produced.
References
[1] S. Coote and E.K. Stokes (2005) Effect of robot-mediated therapy on upper extremity dysfunction post-stroke – a single case study. Physiotherapy, 91(1), pp. 250–256.
[2] Wikipedia, the free encyclopedia (2006), http://en.wikipedia.org/wiki/physiotherapy
[3] Intelligent Systems Group (2006) School of Mechanical Engineering, University of Leeds, http://mech-eng.leeds.ac.uk/res-group/is/intmech.htm
[4] Community Research & Development Information Service (CORDIS) (2006) http://cordis.europa.eu/en/home.html
[5] National Instruments (2004) LabVIEW FPGA Module User Manual. March 2004 Edition, pp 1–19.
Visual Prostheses for the Blind: A Framework for Information Presentation
Jason Dowling¹, Wageeh Boles¹ and Anthony Maeder²
¹ Queensland University of Technology, Brisbane, Australia
² e-Health Research Centre / CSIRO ICT Centre, Brisbane, Australia
1
Introduction
A number of research teams are investigating the partial restoration of sight to blind people through electrical stimulation of a component of the visual system. In 1929 Otfrid Foerster noted that stimulating the human visual cortex led to the perception of spots of 'light' (Hambrecht, 1990), referred to as phosphenes. With recent advances in technology, progress has been made toward building a useful visual prosthesis or Artificial Human Vision (AHV) system to present phosphene information to a blind person. However, there are currently a number of constraints in prosthesis systems, including limitations in the number of electrodes which can be implanted and in the perceived spatial layout and frame rate of phosphenes. The development of computer vision techniques that maximize the value of the limited number of phosphenes would be useful in compensating for these constraints. There are also a limited number of people who have received an implant; therefore much visual prosthesis research is currently conducted with normally sighted research participants. Three main functional requirements for blind users of a visual prosthesis are the ability to read text (Dagnelie et al., 2006), the ability to recognize faces (Thompson et al., 2003) and mobility (Cha et al., 1992). Although reading and face recognition have received attention in simulation studies, there has been less research conducted on mobility. This chapter consists of two main parts. Firstly, a framework for the adaptive display of mobility information for visual prosthesis recipients is presented. In the second part the application of this framework in practice is demonstrated with a simulated visual prosthesis mobility experiment.
2
A Framework for Information Presentation
Although the development of a visual prosthesis involves research from a diverse range of specialists, there is currently no unifying framework combining the requirements of blind end-users with the different system components. This chapter presents a novel framework (shown in Fig. 1) which is composed of the different influences affecting how information from a prosthesis system (or other mobility device, such as a long cane) is perceived by a blind traveler. The main interrelated components of the framework are: dynamic and external factors, computer vision methods, display type and other display modalities, and finally, mobility performance.

In this framework, dynamic factors are those which relate to the current situation and goals of a mobile person. As a person moves, these inter-related factors can change rapidly. The first identified dynamic factor is context, used here to describe the expectations that a person will have in different situations, which will in turn affect the type of mobility information required. The extraction of information from captured scenes is affected by a number of properties, such as lighting, glare, low contrast, clutter and texture; therefore the second factor involves scene properties. Different information is required depending on the current task: a road-crossing task may emphasize a straight path to the opposite curb (to prevent veering), whereas a task involving identifying a set of keys on a cluttered table may involve zooming or object recognition. Sensory information changes dynamically for blind travelers: auditory cues (such as the sound of an approaching object) are particularly important for blind mobility and navigation, as are tactile cues (such as hand-rails or Braille strips on a footpath). The final dynamic factor is the environment, which includes properties of the physical environment such as the weather, landmarks or people.

External factors refer to a group of factors which are important for displaying information but which are static while a person is moving. The first set of external factors relates to the components of a visual prosthesis system which affect the amount of information that can be obtained (by camera properties such as frame rate, resolution and field of view), processed and displayed (by the limited number of electrodes). Individual psychological and physical differences between people, or human factors, may also affect the required information display. In addition to image information captured using a camera, information about the environment can be provided by non-image sensors such as ultrasound, laser and Global Positioning System technology. These non-image sensors have been used in a number of electronic travel aids for the blind.

Computer vision methods are an important component of the framework. Images acquired from a camera will need to be processed before they can be used for display (e.g. to reduce spatial resolution). The use of additional methods will depend on the dynamic and external factors discussed above. A simple example might be a color filter and edge detection applied to images to highlight a sign. The computer vision component may also combine information from non-image sensors.
Fig. 1. Proposed mobility display framework
Different types of visual prosthesis display may be useful. In a Standard Display, captured images are reduced to a lower resolution and each pixel in the reduced-resolution image is used to control a single electrode. The resizing may be combined with a smoothing filter (to reduce noise) and edge detection (as in the Dobelle cortical implant (Dobelle, 2000)). It may also be beneficial if a visual prosthesis could continually search for hazardous features of the current scene and provide an Alert Display. Finally, a Symbolic Display could extract salient objects from captured images and display a symbolic or cartoon-like representation (for example to highlight a doorway or table).

Other Display Modalities. Although the primary method of displaying information would be via the electrodes, additional information (particularly a warning) could be presented using auditory channels. However, sensory substitution may overload an existing sensory input. A benefit of including non-electrode displays in the framework is the possibility of objectively comparing traditional and ETA (electronic travel aid) mobility aids with a visual prosthesis.

Mobility Performance. The final component of the mobility framework represents the dependent variables used to measure mobility. This component allows the experimenter to assess the effect of framework components on an individual's mobility effectiveness. Three of the most common mobility performance measures for low-vision and blind mobility are the Percentage of Preferred Walking Speed (PPWS), the number of times veering has occurred, and contact with obstacles. The PPWS is calculated as SMC/PWS × 100, where a person's speed walking through the mobility course (SMC) is defined as distance (m)/time (s). To normalise walking speeds between participants, an initial assessment of Preferred Walking Speed (PWS) (distance travelled (m)/time (s)) is made over a shorter, obstacle-free course (Soong, 2001).

The conceptual framework is significant as it supports and guides the development of an adaptive visual prosthesis system, and enables the dynamic adjustment of display properties in real time. Benefits of the conceptual framework include:
• Experimental control of different factors: By manipulating and controlling different factors from the framework the effectiveness of different prosthesis system displays can be measured (such as altering display temporal resolution while using a standard mobility assessment technique).
• Common language: The framework provides a common language for visual prosthesis users, medical specialists, engineers, scientists, software developers, O&M specialists and other groups.
• Standardised requirements: Research on the effects of different factors (such as display types) can lead to a standardised set of requirements for prosthesis system components (for example a common display interface used for menus). This may lead to interchangeable components and a standardised testing methodology.
• Training: The effects of different factors impacting on mobility effectiveness (such as age of onset of blindness) can be examined. Using these results, different training strategies and training assessment methods can be developed and compared.
• Adaptive systems: Finally, the framework supports the development of systems which alter their method of computer vision processing depending on a number of external factors (for example, depending on the current task being performed by an end-user).
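As a worked example of the mobility performance measure defined above, the sketch below computes PPWS from a course walk and a baseline preferred walking speed; the distances and times used are illustrative numbers, not experimental data.

def ppws(course_distance_m, course_time_s, pws_distance_m, pws_time_s):
    """Percentage of Preferred Walking Speed: (SMC / PWS) x 100, where both speeds are
    distance (m) divided by time (s)."""
    smc = course_distance_m / course_time_s     # speed through the mobility course
    pws = pws_distance_m / pws_time_s           # preferred walking speed (obstacle free)
    return 100.0 * smc / pws

# Illustrative figures: a 30 m course walked in 250 s against a 10 m baseline in 9 s
print(round(ppws(30.0, 250.0, 10.0, 9.0), 1))   # about 10.8 % of preferred walking speed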
3
Application of the Framework to Mobility Assessment
The second part of this chapter presents experimental work supporting the use of the proposed mobility framework. In this application a number of dynamic factors have been controlled by use of a custom-built indoor artificial mobility course. Two computer vision methods (discussed below) from the framework are treated as independent variables. There are reported limits to the temporal resolution at which phosphenes can be perceived (for example, Dobelle has reported that 4 frames per second (FPS) was the most effective temporal resolution for his cortical device (Dobelle, 2000)). Also, although faster stimulation has been reported in the literature, much research on temporal resolution (for example, Eckhorn et al. (2006)) is currently based on animal experiments, and the effects of chronic electrical stimulation on the human visual system may cause a reduction in temporal resolution. In addition, sensory substitution devices for the blind also provide information at low frame rates (for example, the vOICe auditory-based device provides soundscapes at 1 FPS (Meijer, 1992)). Although a current focus of much visual prosthesis research is to increase the number of implantable electrodes and therefore increase perceived spatial resolution, the effect of frame rate on mobility with a visual prosthesis display has not yet been examined. It is hypothesised that mobility performance should increase with increased spatial resolution and also with frame rate. This chapter investigates the interaction between display frame rate (1, 2 and 4 FPS) and spatial resolution (32×24 and 16×12 phosphenes).
3.1
Method
For this experiment custom hardware and software were developed to simulate a phosphene display. An indoor mobility course was set up in a large civil engineering concrete laboratory.
Fig. 2. The visual prosthesis simulation device developed for this experiment. All external light is blocked by the white curtain block-out material, therefore the participant must use the phosphene display provided by the HMD for mobility related information captured from the head mounted camera
Simulation Hardware. The Head Mounted Display (HMD) used in this study was the i-O Display Systems (Sacramento, CA) i-glasses PC/SVGA, which provided a selected resolution of 640×480 and a total field of view of 26.5° at a 60 Hz refresh rate. This display was chosen due to its low cost (AU$1230) and simple interface to a laptop PC. An external lithium polymer battery (cost AU$215) powered the HMD. To block out external light, a custom shroud was constructed from block-out curtain material and sewn onto the HMD (with slots to allow ventilation). A Swann Netmate Universal Serial Bus (USB) camera was attached, at eye level, to the front of the HMD. This camera was selected due to its low cost (AU$53), small size, light weight and simple integration with the Windows operating system. The camera used a 1/7 inch CMOS sensor, with automatic gain compensation, exposure and white balance. The field of view (FOV) of this camera was manually calculated as 34° horizontal and 27° vertical. A Toshiba Tecra laptop (1.6 GHz Centrino processor) was either worn by participants in a backpack or carried by the experimenter. The camera was powered from the USB port of this computer.

Simulation Software. The main requirement for the visual prosthesis simulation software was to convert input from the USB camera into an on-screen phosphene display. To be representative of current prototype devices and to maintain the aspect ratio of the display device, the simulation reduced the resolution of captured images from 160×120 RGB colour to 32×24 or 16×12 simulated phosphenes. In this simulation it is assumed that eight grey levels can be displayed for each phosphene. The simulation software was written in Visual C++ 6.0 (Microsoft, Redmond, WA), using the Microsoft Video for Windows library to capture incoming video images. These images were sub-sampled (using the mean grey level of contributing
pixels) to a lower-resolution image, which was then converted to 8 grey levels. To simulate a perceived electrode response, the low-resolution image was displayed as a phosphene array using the DirectDraw component of Microsoft DirectX. Each phosphene was generated from an original circle, 40 pixels in diameter, filled with the matching grey level and blurred with a Gaussian filter (radius = 10). Examples of the simulation display are shown in Fig. 4. These simulated phosphenes are similar to those generated by Thompson et al. (2003) and Dagnelie et al. (2006).

Mobility Course. To assess mobility performance, an indoor mobility course (Fig. 3) was constructed within an emptied 30×40 m civil engineering laboratory at the Queensland University of Technology. The mobility course consisted of a winding path, approximately 1 m wide and 30 m long. Path boundaries were marked with 48 mm black duct tape. The floor of the course was concrete, painted light grey; however, a 3 m² section had been painted white for a previous experiment. Grey office partitions, approximately 200 cm tall, were placed on either side of the path to reduce visual clutter and to prevent participants from confusing a neighbouring path with the current path. Eight obstacles, painted in different shades of matt grey, were placed through the course (see the lower portion of Fig. 3). Two of the obstacles were suspended from the ceiling to a height of 1.2 m above floor level. All obstacles along the path were made from empty packing boxes (450×410×300 mm). These obstacles were designed to replicate obstacles which a blind person could encounter in the real world. A straight, unobstructed 10 m section of the course (shown in Fig. 3) was used to measure the Preferred Walking Speed (PWS) of each participant.
Fig. 3. Map of the 30 m mobility course built for this study. The grey shaded area is the path identified by black tape on the floor. The numbers refer to the placement of obstacles and the black lines denote office partitions. The different types of grey shading on each obstacle are shown in the lower image
During each mobility session participants were randomly allocated to one frame rate (1, 2 or 4 FPS) and one display type level (16×12 or 32×24 phosphenes) and were asked to walk through the mobility course starting at one of two randomly allocated starting positions. The end of the 30 m path was marked with a high-contrast 1 m² paper sheet attached to a partition. During the mobility trials, a single experimenter recorded walking speed, obstacle contacts and the number of times participants veered outside the path boundary. Participants' Preferred Walking Speed (PWS) over 10 m was recorded at the start and end of the mobility trials and used to calculate each participant's PPWS.

Participants. Ten female and 50 male volunteers were recruited from staff and students at different faculties of the Queensland University of Technology (QUT) and from the CSIRO e-Health Research Centre. The method of recruitment involved emails and posters placed around the three QUT campuses. The age and gender distribution of participants is shown in Table 1. All participants had normal or corrected-to-normal vision.

Table 1. Gender and age groups of experiment participants

Age      0–19   20–29   30–39   40–49   50+   Total
Male     3      27      11      6       3     50
Female   1      5       1       3       0     10
Total    4      32      12      9       3     60
Fig. 4. An image from the mobility course showing the effect of reduced resolution on image quality. The original 160×120 image (a) is shown reduced to 32×24 (b) and 16×12 simulated phosphenes (c)
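The reduction shown in Fig. 4 can be reproduced with a few lines of array code. The sketch below is not the Visual C++/DirectDraw implementation described above; it is a Python illustration, and the blob profile standing in for the Gaussian-blurred 40-pixel circles is an assumption.

import numpy as np

def to_phosphenes(gray_frame, rows=24, cols=32, levels=8):
    """Block-average a grey-level camera frame (e.g. 120x160) down to rows x cols
    phosphenes, then quantise to the assumed eight grey levels (0..levels-1)."""
    h, w = gray_frame.shape
    bh, bw = h // rows, w // cols
    blocks = gray_frame[:rows * bh, :cols * bw].reshape(rows, bh, cols, bw)
    means = blocks.mean(axis=(1, 3))                      # mean grey level per block
    return np.clip((means / 256.0 * levels).astype(int), 0, levels - 1)

def render(phosphenes, cell=40, levels=8):
    """Draw each phosphene as a soft circular blob on a dark background and return
    the resulting grey-level image."""
    rows, cols = phosphenes.shape
    yy, xx = np.mgrid[0:cell, 0:cell] - (cell - 1) / 2.0
    blob = np.exp(-(xx ** 2 + yy ** 2) / (2 * (cell / 5.0) ** 2))   # assumed blob shape
    out = np.zeros((rows * cell, cols * cell))
    for r in range(rows):
        for c in range(cols):
            grey = phosphenes[r, c] / (levels - 1) * 255.0
            out[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = grey * blob
    return out.astype(np.uint8)

# Example with a synthetic 120x160 frame standing in for a captured camera image
frame = (np.random.rand(120, 160) * 255).astype(np.uint8)
display_image = render(to_phosphenes(frame, rows=24, cols=32))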
Procedure Each participant was randomly allocated to one frame rate (1, 2 or 4 fps) and one display type level (16×12 or 32×24 phosphenes) and commenced their first trial with one of the two course start locations (marked ‘A’ or ‘B’ in Fig. 3). One hour
was allocated for testing each individual. Study participants were met in a corridor outside the lab, read a consent sheet and filled out the questionnaire. The simulation headgear was then explained and fitted before the participant was led into the lab. Each participant was then allowed two minutes to familiarize themselves with the display. The guided PWS was then recorded over 10 m. After this the participant was led to the trial starting location and the first mobility trial was conducted. Participants were offered a short break of approximately one minute before the second trial was conducted. Finally, the PWS was measured for the second time. During the mobility trials, a single experimenter recorded walking speed, obstacle contacts, the number of times participants were told they were walking backwards and the number of times participants veered outside the path boundary.
4
Results
Tables 2 and 3 summarize the obstacle contacts and veering results which occurred during the experiment. Overall veering was significantly less with the higher level of spatial resolution (F(1,54) = 21.25, p < 0.01). There was no significant difference found between the two levels of display spatial resolution and overall obstacle contacts (F(1,54) = 0.08, p = 0.78). The walking speed results are shown in Fig. 5 and Table 4. Predictably, participants spent significantly less time walking through the course during the second trial (F(1,54) = 4.40, p < 0.05). Using PPWS as the dependent variable, frame rate was not related to improved performance on the first (F(2,54) = 1.80, p = 0.18) or second trials (F(2,54) = 2.33, p = 0.11). However, time spent walking through the mobility course was significantly affected by frame rate on both the first trial (F(2,54) = 3.86, p < 0.05) and the second trial (F(2,54) = 3.24, p < 0.05). There was also a marginally significant relationship between frame rate and overall veering on both trials (F(2,54) = 2.68, p = 0.08).

Table 2. Mean number of obstacle contacts (with standard deviations) for different resolutions and frame rates

Resolution   Frame rate   Trial 1       Trial 2       Total
16×12        1            4.30 (1.70)   3.70 (1.57)   8.00 (2.67)
16×12        2            4.10 (1.73)   3.20 (1.55)   7.30 (2.26)
16×12        4            3.40 (1.35)   4.30 (1.34)   7.70 (1.95)
32×24        1            3.90 (1.37)   2.70 (1.16)   6.80 (2.04)
32×24        2            3.70 (1.77)   4.10 (1.29)   7.80 (2.44)
32×24        4            3.10 (1.10)   2.80 (1.62)   5.90 (2.42)
Table 3. Mean number of veering errors (with standard deviations) for different resolutions and frame rates

Resolution   Frame rate   Trial 1       Trial 2       Total
16×12        1            4.30 (1.70)   3.70 (1.57)   8.00 (2.67)
16×12        2            4.10 (1.73)   3.20 (1.55)   7.30 (2.26)
16×12        4            3.40 (1.35)   4.30 (1.34)   7.70 (1.95)
32×24        1            3.90 (1.37)   2.70 (1.16)   6.80 (2.04)
32×24        2            3.70 (1.77)   4.10 (1.29)   7.80 (2.44)
32×24        4            3.10 (1.10)   2.80 (1.62)   5.90 (2.42)
Table 4. Mean scores (with standard deviations) for the amount of time spent walking through the mobility course during each trial, and for PPWS (calculated using the combined PWS) during each trial

Resolution   Frame rate   Time (s) Trial 1   Time (s) Trial 2   PPWS Trial 1    PPWS Trial 2
16×12        1            326.40 (190.66)    317.80 (207.14)    27.87 (15.49)   29.76 (18.05)
16×12        2            353.20 (142.64)    376.80 (208.75)    24.50 (11.83)   26.07 (15.27)
16×12        4            245.30 (61.91)     237.50 (55.93)     31.79 (6.34)    32.88 (7.20)
32×24        1            306.20 (86.82)     251.80 (112.80)    25.55 (8.90)    32.39 (11.27)
32×24        2            266.10 (78.50)     264.10 (122.14)    29.85 (9.68)    31.74 (10.43)
32×24        4            204.60 (79.80)     178.70 (68.93)     35.84 (10.34)   40.00 (12.62)
Fig. 5. Percentage of Preferred Walking Speed (PPWS) results for trials 1 (PPWS1) and 2 (PPWS2) displayed by resolution type and frame rate (FPS)
5
Discussion
An increase in spatial resolution from 16×12 phosphenes to 32×24 phosphenes was associated with a significant reduction in veering errors between participants. However, frame rate, during the second of the two trials for each participant, was significantly related to increased walking speed. The variability of results for the first PPWS trial could be due to learning effects and to mixed levels of comfort and confidence among participants. The results from this study indicate that spatial resolution is more useful than increased frame rate for following a path without veering; however, the display frame rate has a significant effect on a person's preferred walking speed. These findings suggest the development of an adaptive visual prosthesis system which could provide a lower-resolution, faster display mode while a person is moving, and a higher-resolution, slower display when a person has ceased movement. It would be interesting to assess the effect of resolution and frame rate on mobility over a number of repeated trials. However, it would be difficult and time-consuming to maintain a sufficient number of participants for reasonable statistical results over a period of time. Learning effects have been found in many prosthesis simulation studies (for example, Cha (1992), Chen (2005) and Fu (2006)). Mean scores generally improved between the first and second trials in the current experiment; however, an extraneous variable could be the level of confidence each participant felt while being effectively blindfolded in a strange environment. Some participants also required time to adjust to the location of the camera and the associated difference in display viewing angle from their usual vision. The following suggestions, drawn from participant feedback and observation during the sessions, may enhance future visual prosthesis mobility research:
• During training, to assist in obstacle avoidance, allow participants to observe the increasing rate of expansion of a high-contrast looming object as they walk toward it.
• Advise participants to adjust their walking speed to the speed of the display (for example, 1 step per display update).
• Demonstrate the width of the camera field of view (FOV) by showing an object of a known width (for example, a doorway) and allow the participant to touch the object.
• To reduce veering, show participants the black tape marking the path boundaries and ask them to touch it.
• Suggest using slow head movements to compensate for the narrow displayed FOV, explaining that faster head movements may result in image corruption due to motion blur.
In addition, some participants tended to point the head-mounted camera too high to locate the path boundaries; therefore, an artificial horizon indicator may be useful to assist with camera orientation. The mean PPWS results for this experiment range from 24.5 for 16×12 phosphene resolution at 2 FPS to 40.0 for 32×24 resolution at 4 FPS. Participants
generally moved at a slow pace, and spent time scanning for both obstacles and the edges of the path. However, these results are similar to those reported by Jones et al. (2006), who recorded PPWS while investigating eight visually impaired adults and the effectiveness of an image-based electronic travel aid. The experimental hardware and software performed reliably. No participants reported nausea during the experiment, although two required a rest between trials. The front of the HMD sometimes became warm during the experiment, due to the shroud attached to block external light. One hardware constraint in this study was the narrow 34° field of view (FOV) of the Swann USB camera, which is a constraint similar to that of current-generation night-vision goggles (Hartong, 2004). However, an image captured with a wider FOV may not necessarily enhance mobility, as the spatial resolution would still need to be greatly reduced for an electrode array. It would be useful in future work to compare the effect of different camera fields of view on mobility.
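The adaptive display suggested earlier in this Discussion amounts to a simple mode selection. The sketch below is an illustration of that idea only; the motion test and the two mode settings are assumptions.

def select_display_mode(is_moving):
    """Favour temporal resolution while the wearer is walking and spatial resolution
    once movement has ceased. The specific settings are illustrative assumptions."""
    if is_moving:
        return {"resolution": (16, 12), "frame_rate_fps": 4}   # coarser but faster
    return {"resolution": (32, 24), "frame_rate_fps": 1}       # finer but slower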
6
Conclusions
In this chapter a novel framework for the display of mobility information to assist blind mobility has been presented. This framework includes the main factors which impact on blind mobility, including the current context, scene properties, the task undertaken, available sensory information and environmental factors, in addition to human factors (such as level of training) and prosthesis technology (such as camera and electrode array technology). The benefits of using this framework include enhanced communication between visual prosthesis researchers and the ability to experimentally explore and compare different factors (such as different types of computer vision methods, and different mobility environments). A mobility experiment has been presented which demonstrates how this framework can be applied to investigate different factors influencing mobility. Time spent walking through the mobility course, combined with veering and obstacle contacts, forms the basis for an objective method to assess the effects of different image processing methods in both simulated and real visual prosthesis systems. This method of assessment could also be extended to comparing different blind mobility aids with an implanted system, for example comparing the freeware vOICe auditory electronic aid for the blind (which is limited to presenting one frame per second (Meijer, 2003)) with phosphene simulations.
References
[1] Cha, K., Horch, K.W., and Normann, R.A. (1992). Mobility Performance with a Pixelised Vision System. Vision Research, 32(7), 1367–1372.
[2] Chen, S.C., Hallum, L., Lovell, N., and Suaning, G.J. (2005). Visual acuity measurement of prosthetic vision: a virtual-reality simulation study. Journal of Neural Engineering, 2, 135–145.
[3] Dagnelie, G., Barnett, D., Humayun, M.S., and Thompson Jr., R.W. (2006). Paragraph Text Reading Using a Pixelized Prosthetic Vision Simulator: Parameter Dependence and Task Learning in Free-Viewing Conditions. Investigative Ophthalmology and Visual Science, 47, 1241–1250.
[4] Dobelle, W. (2000). Artificial Vision for the Blind by Connecting a Television Camera to the Brain. ASAIO Journal, 46(1), 3–9.
[5] Eckhorn, R., Wilms, M., Schanze, T., Eger, M., Hesse, L., Eysel, U.T., Kisvárday, Z.F., Zrenner, E., Gekeler, F., and Schwahn, H. (2006). Visual resolution with retinal implants estimated from recordings in cat visual cortex. Vision Research, in press.
[6] Fu, L., Cai, S., Zhang, H., Hu, G., and Zhang, X. (2006). Psychophysics of reading with a limited number of pixels: Towards the rehabilitation of reading ability with visual prosthesis. Vision Research, 46(8–9).
[7] Hambrecht, F.T. (1990). The history of neural stimulation and its relevance to future neural prostheses. In W.F. Agnew & D.B. McCreery (Eds.), Neural Prostheses: Fundamental Studies. New Jersey: Prentice Hall.
[8] Hartong, D.T., Jorritsma, F.F., Neve, J.J., Melis-Dankers, B.J.M., and Kooijman, A.C. (2004). Improved mobility and independence of night-blind people using night-vision goggles. Investigative Ophthalmology & Visual Science, 45(6).
[9] Jones, T. and Troscianko, T. (2006). Mobility performance of low-vision adults using an electronic mobility aid. Clinical and Experimental Optometry, 89(1).
[10] Meijer, P.B.L. (1992). An Experimental System for Auditory Image Representations. IEEE Transactions on Biomedical Engineering, 39(2), 112–121.
[11] Soong, G.P., Lovie-Kitchin, J.E., and Brown, B. (2001). Preferred walking speed for assessment of mobility performance: Sighted guide versus non-sighted guide techniques. Clinical and Experimental Optometry, 82(5).
[12] Thompson, R.W., Barnett, G.D., Humayun, M.S., and Dagnelie, G. (2003). Facial recognition using simulated prosthetic pixelized vision. Investigative Ophthalmology & Visual Science, 44(11), 5035–5042.
Computer-based Method of Determining the Path of a HIFU Beam Through Tissue Layers from Medical Images to Improve Cancer Treatment
E. McCarthy and S. Pather University of Southern Queensland, Toowoomba, Australia
1
Introduction
The accurate prediction of the path of high intensity focused ultrasound (HIFU) is essential for the destruction of cancerous cells. Knowledge of the layers of tissue traversed by the ultrasound beam offers a means of predicting the path and hence allows precise alignment of the HIFU transducers. The effectiveness of cell destruction by multiple HIFU beams is dependent on the foci of all the beams being incident at a single point. The path of an ultrasound (US) beam, like that of all sound waves travelling through several media, is deviated away from the 'line-of-sight' path from the source (transducer) to its target (the tumour cells). The degree of deviation is influenced by the angle of incidence of the beam at the tissue boundaries and the relative refractive indices of the tissues traversed. Determining the tissue layers will aid in predicting the (deviated) path of the therapeutic US beam, and hence in specifying the correct alignment of the transducers. This will ensure that the foci are coincident, resulting in the correct intensity of energy being applied to the tumour cells to guarantee their destruction. Knowledge of the tissue boundaries and tissue thicknesses, determined from medical images, is therefore crucially important to an effective HIFU cancer treatment protocol. This paper describes the determination of the tissue boundaries and the associated prediction of the path of the US beam from information gained from medical images. In the early stages of the project it was decided to develop two methodologies for predicting the path; hence the project was divided broadly into two parts: Part A, which developed a manual method, and Part B, which is a more automatic method of predicting the beam deviations. Although the first is described as a 'manual method', both methods are computer based.
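The refraction that causes the deviation can be illustrated with Snell's law for acoustic waves at a planar tissue boundary. The sketch below is a simple illustration, not part of either prediction method; the speed-of-sound values are typical textbook figures used here only as assumptions.

import math

def refracted_angle_deg(incidence_deg, c1, c2):
    """Snell's law for sound, sin(t2)/sin(t1) = c2/c1: return the transmitted angle in
    degrees, or None beyond the critical angle (total internal reflection)."""
    s = math.sin(math.radians(incidence_deg)) * c2 / c1
    return None if abs(s) > 1.0 else math.degrees(math.asin(s))

# Assumed speeds of sound (m/s): fat about 1450, muscle about 1580
angle_in_muscle = refracted_angle_deg(20.0, 1450.0, 1580.0)
print(round(angle_in_muscle, 1))   # the beam bends away from the normal, to about 21.9 degrees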
In Section 2 of this paper, an overview of the application of HIFU and the need for medical images is given. Section 3 discusses the manual method of path prediction, which is followed by the discussion of the automatic method in Section 4. The current state of the research is discussed in Section 5, which provides some details of the experimentation being developed to verify the prediction methods and also describes some concepts for determining the scaling factor to be used in the specification of the tissue thickness. Finally, Section 6 gives some concluding remarks and directions for future work.
2
An Overview of HIFU and the Need for Medical Images
One of the cancer treatment modalities currently under worldwide research is the use of high intensity focused ultrasound (HIFU). This method uses the high temperature created at a small focal region to destroy tumour tissue. The use of ultrasound to heat (and thus destroy) tissue has been under investigation since early in the 20th century, with focused ultrasound proposed by Lynn (Lynn et al. 1944) as a potential surgical tool for neurosurgery. Recent studies have found new applications for HIFU in ophthalmology (Lizzi et al. 1978), urology (Watkin et al. 1995), oncology and acoustic haemostasis (Martin et al. 1999). There are two modes in which this modality can be applied: (i) as an adjunct to chemo- and radiotherapy, to cause regional heating (40–50 °C) (van der Zee et al. 2000), and (ii) as a surgical tool to destroy deep-seated tumour tissue (>50 °C) (Watkin et al. 1995). Extracorporeal HIFU is ideally suited to the non-invasive, trackless destruction of deep-seated tissue without the need for general (or local) anaesthesia or pre- or post-operative care. The bowl-shaped extracorporeal US transducer (used outside the body and hence non-invasive) produces short pulses of intense US energy, which pass into the body via a water-bag couplant. 'Trackless' describes the manner in which the beam passes through healthy tissue without causing any damage; only the cells in the focal region are destroyed (ter Haar 1989). Theoretically, it is only at the focal region, where the beam is concentrated, that the temperature is sufficiently high and the energy sufficient to cause the destruction of cells. The mechanism of cell destruction is a combination of heat (up to 70 °C) and stasis, which destroys the structure of the cell. HIFU administered by a single large transducer has several adverse effects, such as off-focus hot spots and skin burns (Chauhan 1999). This is due to healthy tissue being repeatedly exposed to a large cone of US energy. The implementation of a multiple-transducer system, where the total energy is divided among smaller-diameter transducers, is reported to prevent these adverse effects (Davies et al. 1998). A prototype robotic manipulator has been developed to move the multiple transducers in a pre-determined manner to ensure that the common focal region is scanned through a required volume representing a tumour (Pather et al. 2002).
Fig. 1. Deviation of beam away from intended path
Results from experiments undertaken on a large block of homogeneous phantom tissue have shown that the temperature at the focal point is sufficient to destroy tissue (Pather 2001). The effective use of three transducers, set at predetermined angles, is complicated by the non-uniform profile of the patient's abdominal region, with the beams from each transducer being incident at different angles. The first deviation of the US beam will occur at the fat layer; the beam will then need to pass through layers of muscle and liver tissue. Figure 1 illustrates the deviation of the US beams passing through layers of fat, muscle and soft tissue to the location of the tumour cells. Figure 2 demonstrates the compensation required to ensure that the foci of the three transducers are co-incident. This figure, in essence, illustrates the aim of the research described in this paper.
Fig. 2. Correct alignment of transducer (path exaggerated)
Hence, to be able to predict the actual US beam path, prior knowledge of the tissue boundaries, the tissue layer thicknesses and the refractive indices of the tissues traversed by the beam is required. Radiologists routinely retrieve information relating to internal anatomical structures by examining medical scans, such as computed tomography (CT, also known as computerized axial tomography (CAT)), magnetic resonance imaging (MRI) and ultrasound scans. These imaging modalities have also found many applications in which they play an active role in guiding surgeons during minimally invasive surgery and internal tissue biopsies (Vaezy 2001). The medical scans are available in digital format, thus widening their application base to include computer-aided surgery. The implementation of medical imaging in computer systems provides surgeons with anatomical model generation, as in augmented surgery, and with image processing capabilities for feature tracking, registration and identification (Soler et al. 2001).

Each of the medical imaging modalities exploits the attenuation properties of the tissue to produce images. A visual inspection of the scans reveals anatomical structures represented by regions of non-homogeneous intensity, demonstrating the natural variation of properties occurring within different tissues. Anatomical structures are displayed as greyscale pixel intensities ranging from 0 (black) to 255 (white), with pixel intensity darkening with decreasing tissue density. A medical image is therefore displayed as a matrix of pixels, each representing a shade of grey, which together form an image recognisable to the human eye. This matrix is an ideal format from which to retrieve information about the image.
3 Manual Detection of Boundaries and Prediction of Beam Path
The manual and automatic methods both begin with loading a medical image that is stored as a JPEG file. The processing code and associated GUIs were written and executed in MATLAB (version 6, release 12). To make this and the automatic method easier to explain, only one transducer and its associated path will be discussed; the theory is applicable to any number of transducers in the system. The manual method is best described by explaining the progressive GUI windows that require some form of user input. The first interface (Fig. 3) requests the name of the image file to be opened and displays it in the image window. The interface also requests the following information, entered either by a mouse click at the pointer location on the image or as text input into the appropriate boxes:
• Location of the source: the position of the transducer relative to the surface of the skin;
• Location of the target: the location of the treatment site (tumour cells);
• Area of interest: the portion of the image that is relevant to the beam path analysis, indicated by "cropping" away the surrounding area. This region must include the positions of the source and target.
Fig. 3. Opening screen of the Manual Method
Fig. 4. Second screen of the Manual Method – Boundary Profiles
The next window opens with the cropped image and requests further information to process the tissue boundaries. As the refractive indices are programmed as constants for the different tissue types, the user specifies the type of tissue between the boundaries using a menu box. The user is prompted to click the pointer at four locations on a single boundary between two distinctly different tissue layers; ideally a radiologist would provide this input on the screen. The program registers these points and fits a "best-fit" second-order polynomial curve through them, so that the boundary between the two tissue layers is described by an equation. This process is repeated until all the required boundaries have been registered and can be described by equations.

The final window (Fig. 5) displays the path that the beam will need to take from the source to the target point. Prior to this window being displayed, the program undertakes a number of iterative calculations to arrive at the best route for the beam. The process is best described by the series of illustrations in Figs. 6–8. The key to the solution is the description of the boundary profile as an equation. With knowledge of the locations of the source and target, the "line-of-sight" path from source to target can also be described by the equation of a straight line. During every iteration of the calculations, the first position of interest is the point of intersection between the "line-of-sight" straight line and the curve of the first boundary (illustrated by the thick solid line in Fig. 6). From the equation, the gradient of the curve at this point can be determined, and hence the angle of incidence and the angle of deviation (refraction) of the beam can be deduced. The angle of deviation is determined from Snell's Law, which relates the ratio of the sines of the angles of incidence and refraction to the inverse ratio of the refractive indices of the media. A new path vector is drawn based on the angle of refraction (illustrated in Fig. 7). The above process is repeated (i.e. determine the point of intersection at the next boundary, determine the gradient, determine the angle of refraction, draw the new path) until all the boundaries are traversed and the beam intersects either the target's horizontal or vertical axis (as illustrated in Fig. 8). The location of the beam's intersection point on the horizontal or vertical axis provides the input for the next iteration of the program: based on the distance away from the target point, the program resets the transducer angle (by a few degrees) and so initiates the next loop in determining the path.
Fig. 5. The final screen showing the beam path
Fig. 6. Equation of the curve is calculated from the points on the boundary
Fig. 7. The angle of refraction is determined from the gradient at the point of intersection
Fig. 8. The transducer angle is re-aligned based on the measured offset
The direction of rotation of the transducer will depend on the axis with which the beam intersects: an intersection with the horizontal axis requires a counter-clockwise rotation, and an intersection with the vertical axis a clockwise rotation. (Following a number of trials, analysis of the results could yield a look-up table for the best choice of reset angle.) Figure 5 shows the path of the beam in the final window of the program.
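As an illustration of the refraction step described above, the following Python sketch fits a second-order polynomial to user-selected boundary points, evaluates the boundary gradient at the intersection and applies Snell's law in vector form to obtain the refracted direction. It is only a minimal sketch of the published approach (the authors' implementation is in MATLAB); the function names, the vector form of Snell's law and the example index values are our own choices, not values from the paper.

```python
import numpy as np

def fit_boundary(points):
    """Fit a second-order polynomial y = p(x) through the clicked boundary points."""
    xs, ys = zip(*points)
    return np.polyfit(xs, ys, 2)                    # coefficients [a, b, c]

def refract(direction, poly, x_int, n1, n2):
    """Refract a beam direction at the boundary y = p(x) evaluated at x = x_int.

    n1, n2 are the refractive indices of the incident and transmitting tissue.
    Returns the refracted unit direction, or None for total internal reflection.
    """
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)
    slope = np.polyval(np.polyder(poly), x_int)     # boundary gradient dp/dx
    normal = np.array([-slope, 1.0])
    normal /= np.linalg.norm(normal)
    if np.dot(d, normal) > 0:                       # make the normal oppose the beam
        normal = -normal
    eta = n1 / n2                                   # Snell's law: n1 sin(i) = n2 sin(r)
    cos_i = -np.dot(d, normal)
    k = 1.0 - eta**2 * (1.0 - cos_i**2)
    if k < 0:
        return None                                 # total internal reflection
    return eta * d + (eta * cos_i - np.sqrt(k)) * normal

# Example: a beam heading down and to the right crosses a gently sloping boundary;
# the index values are placeholders, not measured tissue properties.
boundary = fit_boundary([(10, 40), (30, 42), (50, 45), (70, 43)])
print(refract([0.7, -0.7], boundary, 45.0, 1.10, 1.05))
```

In the manual method this calculation would be performed once per boundary, with the returned direction used to draw the next path vector towards the following boundary.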
4 Automatic Prediction of Beam Path
This method requires minimum input from the user. Once the image is loaded into the program, the refractive indices are specified, and the positions of the source and target are located (Fig. 9), the program automatically determines the beam path from source to target (Fig. 10). A number of subroutines were developed to address specific aspects of this method. Before running the automatic beam prediction program, the medical image must be prepared in a form that the program can readily interact with. This requires that the image be resolved into a format that provides a clear distinction between the tissue layers. The following sub-section provides some detail of the image processing techniques undertaken.
Fig. 9. The opening screen of the Automatic Method
Fig. 10. The final screen showing the beam path

4.1 Image Processing Techniques
The code retrieving information about the tissue boundaries is written in MATLAB, an interpreted numerical environment specialising in matrix manipulation. The program is initialised by loading a scanned image in JPEG format. The approach taken was to represent any image of resolution M×N pixels as an M×N×P matrix of pixel intensities, where P is 1 (for greyscale images) or 3 (for colour, red-green-blue images). For the purposes of this project, the non-homogeneous regions of the scanned image are smoothed to uniform pixel intensities using an image look-up table. The look-up table alters the image's intensity distribution to a user-defined distribution by providing a new intensity value for each pixel (Crane 1997). Smoothing of the image intensity is achieved by setting a reference value for a range of intensity values, thus reducing the number of levels present in the image (shown in Fig. 11).
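A look-up table of this kind can be expressed very compactly. The short Python sketch below quantises an 8-bit greyscale image into a small number of intensity bands by indexing a 256-entry table; it illustrates the general technique only (the paper's implementation is in MATLAB), and the band width of 32 grey levels is an arbitrary choice for the example.

```python
import numpy as np

def make_lut(band_width=32):
    """256-entry table mapping each grey level to the midpoint of its band."""
    levels = np.arange(256)
    return ((levels // band_width) * band_width + band_width // 2).astype(np.uint8)

def smooth_intensities(image, band_width=32):
    """Apply the look-up table to every pixel in one vectorised indexing step."""
    lut = make_lut(band_width)
    return lut[image]                 # image must be a uint8 array

# Example with a small synthetic "scan"
scan = np.random.randint(0, 256, (4, 4), dtype=np.uint8)
print(smooth_intensities(scan))
```

Because the table always has 256 entries, the cost of applying it is independent of how many intensity bands are retained, which is the property exploited in the comparison with mask-based edge detection below.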
Fig. 11. Look up table technique. The original CT image is on the left with the processed image on the right. Time indicated is the processing time in seconds
Fig. 12. Comparison of pixel density and processing time for a medical image. (a) processing of image resampled to 10% of original size, (b) processing of image resampled to 50% of original size, (c) processing of image resampled to 100% of original size, and (d) plot of processing times for resampled images
The effect of reducing the resolution of the image on the computational time and on the processed image was also explored. The processing time for each technique was recorded as the period between the image being fully loaded and the completion of the image processing, as measured by the computer clock. A selection of images was re-sampled to pixel densities from 10% to 90% of the original in 10% increments, and the processing times were plotted. Figure 12 shows some of these images together with a time plot, which indicates that the image processing time grows exponentially with increasing pixel density.

Judging the quality of the output image is an experience-based assessment that cannot readily be made by a computer. To assess the quality of the processed images, a radiologist from the Toowoomba Base Hospital was consulted for his professional opinion on a sample of them. The radiologist highlighted that varying the spatial resolution yields results in which specific anatomical regions are detected better than others. Specifically, in the CT scans examined, the muscles were better represented at lower resolution, at the expense of the boundaries of the other anatomical structures, whereas the higher spatial resolution provided excellent definition of the inner structures of the body.
Fig. 13. Edge detection mask technique. The original CT image is shown on the left and the edge detection mask processed image is shown on the right. Time indicated is the processing time in seconds
The application of weighted-coefficient masks, such as the Laplacian of a Gaussian, to highlight the edges present in the medical images was trialled for comparative purposes. A coefficient mask is a predefined matrix that is convolved with the image, i.e. a weighted sum of each pixel's neighbourhood is computed (Gonzalez et al. 2004). In edge detection techniques, the pixel intensity is set to either black or white based on a comparison of this weighted sum against a predefined threshold. However, this method proved computationally expensive compared to the look-up table method. The look-up table method requires only the 256 values defining the new pixel intensity distribution, regardless of the image size. In the edge detection method, a mask of order N requires N² multiplications and additions for each of the L×W pixels in the image. Mask dimensions depend on the application; large masks cannot resolve fine detail and require more calculations, while small masks are more susceptible to noise (Crane 1997). In summary, the look-up table performed the best image processing, producing the better quality image with the least computational expense compared to mask-based edge detection.
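For comparison, an edge map of the kind described above can be produced with a Laplacian-of-Gaussian filter. The sketch below uses SciPy's gaussian_laplace rather than explicit coefficient masks, so it should be read as an illustration of the operation rather than a reproduction of the authors' trial; the sigma and threshold values are arbitrary example settings.

```python
import numpy as np
from scipy import ndimage

def log_edge_mask(image, sigma=2.0, threshold=2.0):
    """Binary edge map: Laplacian-of-Gaussian response thresholded to black/white."""
    response = ndimage.gaussian_laplace(image.astype(float), sigma=sigma)
    return (np.abs(response) > threshold).astype(np.uint8) * 255

# Each output pixel requires a full neighbourhood convolution, whereas the
# look-up table above touches each pixel exactly once.
scan = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
edges = log_edge_mask(scan)
print(edges.shape, edges.dtype)
```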
4.2 Prediction of Beam Path
This section explains the method used to determine the path taken by the US beams from the source (transducer) to the target (tumour cells). Again, for clarity, only the determination of the path from one of the transducers to the target will be given. To further aid understanding, the following sub-sections describe the subroutines used to achieve specific goals (these were implemented as "function calls" in the program).
4.2.1 Find Boundary
The positions of the source and target points are input onto the image, in a similar manner to that described for the manual method.
Fig. 14. Point of intersection of Incident Beam with 1st Boundary
The first iteration of the prediction uses a "line-of-sight" path from the source to the target, resolved as a straight-line equation. The program interrogates each pixel on this straight line, starting at the source, and compares its intensity with that of the next pixel on the line. These data are saved and analysed until a decreasing trend in the intensity is noticed, which implies that the point under analysis has passed a tissue boundary, as illustrated in Fig. 14. The brightest point of the array thus far denotes the position of the boundary, and this provides the point of intersection of the beam with the first boundary.
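A minimal Python sketch of this pixel-walking search is given below; it samples the greyscale image along the source-target line and reports the brightest sample seen before a sustained drop in intensity. The sampling step, the drop margin and the run length used to declare a "decreasing trend" are illustrative choices, not values taken from the paper.

```python
import numpy as np

def find_first_boundary(image, source, target, drop=10, run=5):
    """Walk from source to target and return the (x, y) of the first boundary.

    The boundary is taken as the brightest pixel encountered before the
    intensity has fallen below (running maximum - drop) for `run` samples.
    """
    src = np.asarray(source, float)
    tgt = np.asarray(target, float)
    n_samples = int(np.hypot(*(tgt - src))) + 1
    ts = np.linspace(0.0, 1.0, n_samples)
    pts = src + np.outer(ts, tgt - src)               # points along the line (x, y)
    intensities = image[pts[:, 1].astype(int), pts[:, 0].astype(int)]

    best, falling = 0, 0
    for i, value in enumerate(intensities):
        if value > intensities[best]:
            best, falling = i, 0
        elif value < intensities[best] - drop:
            falling += 1
            if falling >= run:                        # sustained decreasing trend
                return tuple(pts[best].astype(int))
    return None                                       # no boundary before the target
```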
4.2.2 Determine Boundary Gradient
The program generates a square frame (each side of the frame is 35 pixels long and 1 pixel wide) around the point of intersection (Fig. 15). The pixels constituting the frame are now interrogated, in a similar manner as above, seeking the point where the tissue boundary intersects the frame. Typically, the boundary will intersect at points on opposite sides of the frame. These two points, together with the central point of intersection of the beam and the boundary, provide the input for a “best-fit” function to determine the linear equation through these points.
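The gradient estimate can be sketched as follows: collect the perimeter pixels of the 35×35 frame centred on the intersection, pick the brightest perimeter pixel on each side of the centre as the boundary crossing points, and fit a straight line through those two points and the centre. This is an illustrative reconstruction of the step described above, with np.polyfit standing in for the paper's unspecified best-fit function.

```python
import numpy as np

def boundary_gradient(image, centre, half=17):
    """Estimate the local boundary slope dy/dx around `centre` = (x, y)."""
    cx, cy = centre
    xs = np.arange(cx - half, cx + half + 1)
    ys = np.arange(cy - half, cy + half + 1)
    perimeter = np.array([(x, cy - half) for x in xs] + [(x, cy + half) for x in xs] +
                         [(cx - half, y) for y in ys] + [(cx + half, y) for y in ys])

    # Brightest perimeter pixel on each side of the centre, taken as the two
    # points where the boundary crosses the frame.
    brightness = image[perimeter[:, 1], perimeter[:, 0]]
    left_mask = perimeter[:, 0] < cx
    p1 = perimeter[left_mask][np.argmax(brightness[left_mask])]
    p2 = perimeter[~left_mask][np.argmax(brightness[~left_mask])]

    pts = np.array([p1, (cx, cy), p2], dtype=float)
    slope, _intercept = np.polyfit(pts[:, 0], pts[:, 1], 1)   # best-fit line
    return slope
```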
Fig. 15. The equation of the boundary is determined from the point of intersection of the beam and boundary and the points of intersection of the boundary and the frame
Fig. 16. Based on the equation of the boundary curve, the angles of incidence and refraction are calculated
4.2.3 Determine Angle of Refraction
With knowledge of the equation of the straight line of the beam from the source to the first boundary, and of the equation (and hence gradient) of the boundary, the angle of incidence and the angle of refraction can be calculated. A vector representing the refracted beam is then developed, which extends to the next boundary (Fig. 16). The above processes are repeated until the beam intersects either the horizontal or the vertical axis passing through the target point. The difference between the beam position and the target provides the input to reset the angle of the source, and the complete process is repeated until the beam coincides with the target. Figure 10 shows the final window of the automatic detection of boundaries and beam path.
5 Current Research and Development
Presently, there are two specific areas that are being addressed. These are:
5.1 Determination of the Scaling Factor
Every scanned medical image is accompanied by a scale grid which is unique to the imaging modality, the scanning machine, and the region of the anatomy being scanned. Radiologists use this reference grid to measure features on the scan. However, it is difficult to read this scaling grid reliably using the image processing methods described above. The seemingly random placement (in position and orientation) of the scaling grid on the scan makes it difficult for the image processing software to quickly and automatically determine the scaling to be used for a particular scan without any intervention by the user. Hence, it is considered expedient
to have the user input information about the scaling grid. This requires the user to register cursor points, in a perpendicular straight line, on two consecutive scaling grid marks, and enter the scale factor from the scanned image. The software registers the pixel locations of these points and determines the number of pixels between the two points and hence specifies the number of pixels per millimetre (or the size of each pixel in millimetres). This scaling factor is now used for all pixel measurements in terms of millimetres. This information would be required in future research where the length of the beam path will be investigated. The intensity of the HIFU beam is at a maximum at the focal point which is approximately at the focal length of the transducer. Hence the knowledge of the length of the path would be used to accurately place the focal point at the target.
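The scaling calculation itself is a simple ratio; the sketch below shows it as a small helper that converts two user-registered grid points and the grid spacing into a millimetres-per-pixel factor. The function and its names are illustrative only, not part of the authors' MATLAB code.

```python
import math

def mm_per_pixel(p1, p2, grid_spacing_mm):
    """Scale factor from two cursor points placed on consecutive grid marks.

    p1, p2 are (x, y) pixel locations; grid_spacing_mm is the spacing of the
    scale grid printed on the scan (e.g. 10 mm between marks).
    """
    pixels = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    return grid_spacing_mm / pixels

# Example: grid marks 10 mm apart registered 42 pixels apart
scale = mm_per_pixel((120, 300), (120, 342), 10.0)
path_length_mm = 510 * scale        # a 510-pixel beam path expressed in mm
print(round(scale, 3), round(path_length_mm, 1))
```

With the beam path expressed in millimetres, the transducer stand-off can then be chosen so that the focal length coincides with the target depth, as described above.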
5.2 Experimentation to Verify Beam Path Prediction
A test rig and test procedures are being developed to quantify the accuracy of the beam path prediction from the manual and automatic modes of operation. For simplicity (and because of financial constraints), laser pointers are used to represent the HIFU transducers, a digital camera (representing the medical scanning machine) is used to capture the image of the setup, and variously shaped glass prisms (from a laboratory optics set) are used to represent a layer of tissue. The laser pointers are mounted at fixed locations relative to the plane of the target. Each pointer is connected to a stepper motor, which allows control over the direction of the beam (Fig. 17).
Fig. 17. Test rig (currently being developed). This picture was taken by the digital camera that is integral to the rig. The image will be processed by the software to determine the boundaries and the source and target positions
The proposed test procedure is being developed to test both path prediction methods. For the manual method, the procedure will be as follows:
• Capture an image with the digital camera
• Provide all input information (as discussed in Section 3)
• The program will provide the predicted path and new source (transducer) angle
• The new source angle is set by manually activating the stepper motor to rotate the transducers to the required angle
• The distance between the laser point and the target will be measured. This will provide a measure of the accuracy.

For the automatic method, the procedure will be as follows:
• Capture an image with the digital camera
• Provide all input information (as discussed in Section 4)
• The program will provide the predicted path of the beam
• The program will then activate the controller drives to move the stepper motors (and hence the pointers) to the required angle
• The measure of accuracy will be the difference between the laser point and the actual target.

Careful consideration will be given to the causes of errors in the path. These may include the accuracy of construction of the setup rig, alignment of the laser pointers, step resolution of the motors, backlash and errors in measurement. Further refinement of the test rig may eliminate most of these errors.
6 Conclusion
This paper has described ongoing research and development into a computer-based method of determining tissue boundaries from medical images to assist in the prediction of the path of US beams. This research is essential to the effective use of HIFU to destroy deep-seated tumour cells. Two methods were developed: the first, a manual method, requires significant user input, while the second is more automated. A test rig and test procedures, presently under development, were discussed; these will provide a measure of the accuracy of the two methods. The paper also provided some details of present and future development of the code to specify the thickness of the tissue layers once the scaling factor of the images is determined. The final, fully integrated system will provide a valuable tool for any medical procedure using extra-corporeal ultrasound for therapeutic purposes.
References
[1] Chauhan, S. (1999). The application of HIFU & Robotic Technology in Surgery. Mechanical Engineering. London, Imperial College, University of London.
[2] Crane, R. (1997). A Simplified Approach to Image Processing. New Jersey, Prentice Hall.
[3] Davies, B. L., S. Chauhan, et al. (1998). A Robotic Approach to HIFU Based Neurosurgery. MICCAI '98, Cambridge MA, USA.
[4] Gonzalez, R. C., R. E. Woods, et al. (2004). Digital Image Processing using MATLAB. New Jersey, Pearson Prentice Hall.
[5] Lizzi, F. L., A. J. Packer, et al. (1978). "Experimental cataract production by high frequency ultrasound." Ann Ophthalmol 10(7): 934–42.
[6] Lynn, J. G. and T. J. Putnam (1944). "Histology of Cerebral Lesions Produced by Focused Ultrasound." American Journal of Pathology 20: 637–647.
[7] Martin, R. W., S. Vaezy, et al. (1999). "Hemostasis of punctured vessels using Doppler-guided high-intensity ultrasound." Ultrasound Med Biol 25(6): 985–990.
[8] Pather, S. (2001). A Robotic System for the Application of HIFU Applied to Liver Tumours. Mechanical Engineering. London, Imperial College of Science, Technology and Medicine.
[9] Pather, S., B. L. Davies, et al. (2002). The Development of a Robotic System for HIFU Surgery Applied to Liver Tumours. ICARCV 2002, Singapore.
[10] Soler, L., et al. (2001). "Fully automatic anatomical, pathological, and functional segmentation from CT scans for hepatic surgery." Computer Aided Surgery 6(3): 131–142.
[11] ter Haar, G. R. (1989). "High intensity focused ultrasound – a surgical technique for the treatment of discrete liver tumours." Phys. Med. Biol. 34(11): 1743–1750.
[12] Vaezy, S. (2001). "Image-Guided Acoustic Therapy." Annual Review of Biomedical Engineering 3: 375–390.
[13] van der Zee, J., D. G. Gonzalez, et al. (2000). "Comparison of radiotherapy alone with radiotherapy plus hyperthermia in locally advanced pelvic tumours: a prospective, randomised, multicentre trial." The Lancet 355: 1119–1125.
[14] Watkin, N. A., G. R. ter Haar, et al. (1995). "The Urological Applications of Focused Ultrasound Surgery." Br. J. of Urology 75(1): 1–8.
Agricultural Applications
I must confess to a personal interest in this application area, since agriculture is the dominant industry surrounding the small city of Toowoomba, situated on top of the Australian Great Dividing Range. In the growing of cotton, the cost of irrigation is an important factor in the profitability of the crop. Assessment by machine vision of the crop's growing performance, as described in the first paper from a Toowoomba group, can be a valuable guide to the farmer. An Italian paper describes a harvesting machine and a post-harvest cutter. A form of chicory must be cut a centimetre below ground, while fennel must be cut to separate its foliage and root from the marketable portion. A vision system is fundamental to each of these applications. The final paper draws together summaries of a variety of projects undertaken at the National Centre for Engineering in Agriculture, here in Toowoomba. They range from the visual identification of animal species for controlling access to waterholes to the location of the coordinates of individual macadamia nuts that have fallen to the ground, in order to assist in the selection of trees to propagate.
On-the-go Machine Vision Sensing of Cotton Plant Geometric Parameters: First Results
Cheryl McCarthy, Nigel Hancock and Steven Raine
Cooperative Research Centre for Irrigation Futures
National Centre for Engineering in Agriculture, Faculty of Engineering and Surveying, University of Southern Queensland, Toowoomba, Australia
Abstract Plant geometrical parameters such as internode length (i.e. the distance between successive branches on the main stem) indicate water stress in cotton. This paper describes a machine vision system that has been designed to measure internode length for the purpose of determining real-time cotton plant irrigation requirement. The imaging system features an enclosure which continuously traverses the crop canopy and forces the flexible upper main stem of individual plants against a glass panel at the front of the enclosure, hence allowing images of the plant to be captured in a fixed object plane. Subsequent image processing of selected video sequences enabled detection of the main stem in 88% of frames. However, node detection was subject to a high false detection rate due to leaf edges present in the images. Manual identification of nodes in the acquired imagery enabled measurement of internode lengths with 3% standard error.
1 Introduction
An increasingly essential factor in irrigated agriculture is the efficient use of water, that is, the application of water only as required. In addition to weather variations, there are topographic, soil and plant-to-plant variations within fields, which mean that the local irrigation requirement varies in both time and space. Lateral move and centre pivot irrigation machines can be configured to apply time- and spatially-varied irrigation. However, variable-rate application systems currently rely on historical mapping of spatial differences rather than the actual water requirement of the crop. Further significant water savings are possible using
site-specific water application that responds to real-time, local crop irrigation requirement, but real-time sensors of crop water stress have yet to be developed. Plant geometrical parameters such as internode length (i.e. the distance between successive branches on the main stem) indicate water stress in cotton. In other crops, plant properties such as height, biomass and spacing have been successfully measured on-the-go in the field (such as Praat et al. 2004), but measurement of plant structure, including leaf area and internode length, has been restricted to laboratory environments (for example Lin et al. 2001). This paper describes a machine vision system that has been designed to measure internode length and other cotton plant parameters on-the-go in the field, and that may potentially be used in conjunction with a variable-rate centre pivot or lateral move irrigation machine.
2 Measurement of Plant Structure Using Machine Vision
An imaging system has been constructed that features a camera mounted in an enclosure with a transparent glass panel that forms the camera’s field of view (Fig. 1). The enclosure continuously traverses the crop canopy and makes use of the flexible upper main stem of the cotton plants to force individual plants against the glass window, and then smoothly and non-destructively guide each plant under the curved bottom surface of the enclosure. By forcing the plant against the glass window, the glass window becomes a fixed object plane which enables derivation of reliable geometrical data without the need for binocular vision.
Fig. 1. Diagram and photo of moving image capture apparatus
3 Image Processing
Stem and node features must be identified from the acquired imagery. This is achieved in a multi-step process that first estimates the position of the plant's main stem and then identifies candidate branches. Stem colour may range from green to red and plants are densely populated; hence factors complicating main stem identification include green cotton leaves and branches from other plants. Borland® Delphi 6 with shareware DirectShow components (Mitov Software 2006) was used for software development.

The camera was orientated in portrait so that the long dimension of the image coincided with the long (vertical) edge of the window (and any upright stems). Every second row of each frame was discarded to remove interlacing effects, which halved the resolution of the image in the direction perpendicular to an upright stem (henceforth called the horizontal direction). Lens distortion in the window area, which was inset from the image boundaries, was assumed to be negligible. By comparing pixel locations of window scale marks in acquired images, the resolution of the image on the window surface was found to be 1.0 pixels/mm in the horizontal direction and 0.6 pixels/mm in the vertical direction.

Acquired images included the window area as well as the box's dark interior (Fig. 2a), so the image was segmented based on intensity to isolate the window area. A mask of the window area (Fig. 2b) was applied to all subsequent image processing steps to prevent image features outside the window area from influencing results. The first step in extracting the main stem from the image was to detect edges using an adaptive threshold; a mask size of 15×5 pixels was used to accentuate stem edges (Fig. 2c). Following this, a morphological opening with a mask of size 2×8 pixels was applied, which retains only thin, rectangular (and hence main stem-like) elements in the image (Fig. 2d). McDonald et al. (1990) describe morphological operators for detecting particular shapes in images of leaves. Finally, the Hough transform (Duda et al. 1972), which uses a voting system to identify collinear points in an image, was applied to the opened image to estimate the main stem's position (Figs. 2e and 2f). This approach assumes that the main stem is close to vertical, is partly visible and is the single most significant linear structure in the edge map; a sketch of this main-stem pipeline is given below.

Using a similar process to detect other branches in the edge map would require a series of masks with different orientations to be applied to the image. However, due to the large number of leaf edges in the image and the variation in stem edge strength, this method was deemed unsuitable for the acquired images. Detection of roads in aerial mapping images is a computer vision problem with potential application to the detection of stems in plant images: stems and roads have similar properties in their respective images, such as constant width and the presence of junctions and occlusions. Waksman et al. (1997) used a line detection technique employed in automatic extraction of roads from aerial images to detect petioles (leaf stems) in vine images, for the purpose of estimating average petiole incline angle. An example technique for automatic road extraction from aerial images is Steger's line detection technique (Steger 1996).
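The sketch below illustrates the adaptive-threshold, morphological-opening and Hough-transform sequence in Python with OpenCV; the original system was written in Delphi, so this is a hedged re-expression rather than the authors' code. OpenCV's adaptive threshold only accepts a square (odd) block size, so a 15×15 block stands in for the 15×5 mask, and the vote threshold, offset and angle tolerance are illustrative values.

```python
import cv2
import numpy as np

def estimate_main_stem(grey, window_mask):
    """Return (rho, theta) of the dominant near-vertical line, or None.

    grey: deinterlaced greyscale frame; window_mask: 0/255 mask of the window area.
    """
    # 1. Edge detection with an adaptive threshold (square block as an approximation).
    edges = cv2.adaptiveThreshold(grey, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                  cv2.THRESH_BINARY, 15, -5)
    edges = cv2.bitwise_and(edges, window_mask)

    # 2. Morphological opening with a 2x8 rectangle keeps thin vertical elements.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 8))
    opened = cv2.morphologyEx(edges, cv2.MORPH_OPEN, kernel)

    # 3. Hough transform: vote for collinear points, keep the strongest
    #    near-vertical candidate (theta is the angle of the line's normal).
    lines = cv2.HoughLines(opened, 1, np.pi / 180, threshold=80)
    if lines is None:
        return None
    for rho, theta in lines[:, 0]:
        if theta < np.radians(20) or theta > np.radians(160):
            return rho, theta
    return None
```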
Fig. 2. Image processing steps: (a) Sample deinterlaced image captured from enclosure; (b) Mask for window area; (c) Adaptive threshold; (d) Morphological opening; (e) Hough transform line superimposed on (d); and (f) Hough transform line superimposed on (c)
Fig. 2. cont. Image processing steps: (g) Steger lines superimposed on (a); (h) Labelled Steger lines; and (i) Steger lines and candidate nodes superimposed on (a), where A, B, C, D and E are node predictions based on the extrapolation of lines 1, 6, 8, 9, 11 and 15 from (h) onto the main stem respectively
Steger’s line detection technique was used to detect candidate branch segments in acquired images (Fig. 2g). Steger’s technique convolves an image with derivatives of Gaussian kernels. For each image pixel, the local line direction is given by the pixel’s maximum second directional derivative, and a pixel is declared a line point if the magnitude of the maximum second directional derivative (or strength of the line) is within user-specified thresholds, and if the centre of the line lies within the pixel’s boundaries. Image features detected by Steger’s technique may include stems of the target plant as well as stems from other plants and leaf edges (e.g. lines 12 and 16 respectively in Fig. 2h). A single branch may be returned as several smaller, disjointed line segments, such as lines 2, 4 and 9 in Fig. 2h. Therefore candidate nodes were identified as the intersection of the main stem with those lines that meet the following criteria: the line has a slope that rises away from the main stem; the line exhibits smoothness; and the line represents a unique branch that has not already been projected onto the main stem (resulting candidate nodes shown in Fig. 2i). The distance between nodes was then calculated on all frames in which the main stem and two successive nodes were detected, with the maximum value for each internode distance corresponding to the frame in which the nodes were closest to the window. This maximum distance was declared the actual internode length.
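The final internode-length rule (for each pair of successive nodes, take the maximum of the per-frame distances, corresponding to the frame in which the nodes lie flattest against the window) can be written compactly. The sketch below assumes per-frame node positions are already available in millimetres along the stem and that nodes are indexed consistently from frame to frame, both simplifications of the detection pipeline described above.

```python
from collections import defaultdict

def internode_lengths(frames):
    """frames: list of per-frame node positions (mm along the main stem),
    each sorted from the lowest detected node upwards.

    Returns {node_pair_index: internode length in mm}, using the maximum
    distance observed for each pair across all frames.
    """
    best = defaultdict(float)
    for nodes in frames:
        for i in range(len(nodes) - 1):
            gap = nodes[i + 1] - nodes[i]
            best[i] = max(best[i], gap)        # closest-to-window frame wins
    return dict(best)

# Example: three frames of the same plant; the second frame shows the stem
# fully flattened, so its gaps are taken as the internode lengths.
print(internode_lengths([[10.0, 52.0, 90.0], [12.0, 58.0, 101.0], [11.0, 55.0]]))
```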
4 Field Equipment and First Trials
The imaging system was trialled on four cotton plants (cultivar: ‘Sicot 80 B’) ten weeks after planting in the 2005/2006 cotton season. A Sony TRV19E camcorder (resolution 720×576 pixels) was mounted in a fibreglass camera enclosure with overall dimensions 520×290×520 mm (Fig. 1). The camera enclosure was suspended from a sliding door track and was able to rotate such that different approach angles of the enclosure could be tested. Manual measurements of plant geometry included the top five internode lengths for each plant targeted by the vision system. The collected data, four sequences (of varying length) comprising 252 frames in total, were post-processed using the image processing method described above. The selected plants exhibited many visible stems and a minimum of leaf edges close to the main stem.
5 Results and Discussion
Using the image processing approach described above, the main stem was detected in 88% of frames. The factors that most influenced incorrect detection were the presence of more than one main stem in the image and misalignment between the window area and the target main stem; other factors included curvature of the main stem and occlusion of the main stem by leaves. In future work the detection rate is expected to be improved by providing for more than one main stem per image, and by combining information from several frames to identify potential main stems.

In individual frames, both correct and incorrect ('false positive') nodes were detected, with correctness (or otherwise) determined by visual frame-by-frame inspection (Table 1). Branches of other plants, leaf edges close to the main stem and inaccurate projection of actual branch line segments onto the main stem caused a high number of incorrectly detected nodes, on average 49% of all nodes detected (although the proportion varied greatly from frame to frame). While actual nodes were correctly detected in over half of the frames, the number of frames in which two sequential nodes were accurately detected was far smaller. Visual inspection revealed that this was due to the variation in stem width and intensity in the images. It is clear that further refinement of the image processing approach is required before internode length can be routinely measured. The processing time was approximately 400 ms per frame on an Intel® Celeron® 1.40 GHz processor.

Manual identification of nodes in images yielded internode distances with relative errors of up to 25%. However, visual observation of each sequence revealed that the larger errors always occurred when the main stem had not completely flattened against the window, which caused a reduced apparent internode distance. Hence these smaller values – which occurred at the start of each sequence – may be disregarded. Applying this criterion, the standard error in the determination of the internode distance was reduced to approximately 3%.
Table 1. Statistics of node detection in the four image sequences

Sequence number                            1    2    3    4
Total number of frames                    73   80   60   39
Frames with ≥1 node detected              57   66   39   31
Frames with 2 adjacent nodes detected     27   17    9   13
Frames with 3 sequential nodes detected    8    6    2    6
Frames with false positives detected      53   46   36   27

6 Conclusions
The possibility for automatic, real-time, single-camera plant geometric measurement has been demonstrated. A camera enclosure that moves within the crop canopy is an effective and non-destructive method of collecting images suitable for analysis of plant geometry. For the dataset presented, the described image processing approach was effective at identifying the main stem but further work is required to improve node detection before fully-automated internode length measurement is achieved. However, with the aid of some not-yet-automated procedures based on visual inspection, measurement of internode lengths to 3% standard error has been demonstrated.
Acknowledgements The authors are grateful to the Queensland cotton farms ‘Macquarie Downs’ and ‘Adelong’ for providing field trial sites and to our colleague Simon White for assistance in collecting field data. The senior author is grateful to the Australian Research Council and to the Cooperative Research Centre for Irrigation Futures for funding support.
References
[1] Duda, R & Hart, P (1972), 'Use of the Hough transformation to detect lines and curves in pictures', Communications of the ACM, vol. 15, no. 1, pp. 11–15
[2] Lin, T-T, Liao, W-C & Chien, C-F (2001), '3D graphical modeling of vegetable seedlings based on a stereo machine vision system', ASAE Meeting Paper No. 013137, Sacramento, California, ASAE
[3] McDonald, T & Chen, Y (1990), 'Application of morphological image processing in agriculture', Transactions of the ASAE, vol. 33, no. 4, pp. 1345–1352
[4] Mitov Software (2006), 'VideoLab 2.2', Moorpark, viewed 1 March 2006
[5] Praat, J & Bollen, F (2004), 'New approaches to the management of vineyard variability in New Zealand', in The 12th Australian Wine Industry Technical Conference, Managing Vineyard Variation (Precision Viticulture), pp. 24–30
[6] Steger, C (1996), 'Extracting curvilinear structures: a differential geometric approach', in: Buxton, B & Cipolla, R (eds), Fourth European Conference on Computer Vision, Lecture Notes in Computer Science, Volume 1064, Springer Verlag, pp. 630–641
[7] Waksman, A & Rosenfeld, A (1997), 'Assessing the condition of a plant', Machine Vision and Applications, vol. 10, no. 1, pp. 35–41
Robotics for Agricultural Systems
Mario M. Foglia¹, Angelo Gentile¹, and Giulio Reina²
¹ Politecnico of Bari, Department of Mechanical and Management Engineering, Viale Japigia 182, 70126 Bari, Italy
² University of Lecce, Department of Innovation Engineering, via per Arnesano, 73100 Lecce, Italy
[email protected],
[email protected],
[email protected].
1 Introduction
In the last few years robotics has been increasingly adopted in agriculture to improve productivity and efficiency. Most of the efforts in this research area have been devoted to fresh-market fruit and vegetable harvesting tasks, which are generally time consuming, tiring, and particularly demanding. For many crops, harvest labor accounts for as much as one-half to two-thirds of the total labor costs. Moreover, harvesting is expected to be automated because of a decreasing farmer population. Extensive research has been conducted in applying robots to a variety of agricultural harvesting tasks: apples, asparagus, citrus, cucumbers, grapes, lettuce, tomatoes, melons, watermelons, oranges, and strawberries. Some notable examples of agricultural robotic systems can be found in (Arima et al., 2004; Brown, 2002; Edan et al., 2000; Hannan and Burks, 2004; Murakami et al., 1995; Peterson and Wolford, 2003b; Van Henten et al., 2002). Specific work on robotic end-effectors for agricultural operations such as harvesting, spraying, transplanting, and berry thinning has been developed in recent years (Ling et al., 2004; Monta et al., 1992; Monta et al., 1998). Computer vision has also been widely employed in agriculture for developing visual guidance systems (Benson et al., 2003; Pilarski et al., 1999), for fruit recognition on trees (Peterson et al., 2003a), for grade judgment of fruits (Nagata and Cao, 1998) and for weed control (2002; Downey et al., 2003; Jeon et al., 2005). Specific research on vision-based harvesting can be found for asparagus in (Humburg and Reid, 1992), for melons in (Dobrusin et al., 1992), and for tomatoes in (Chi and Ling, 2004).
This chapter describes two examples of robotic systems dealing with the harvest of radicchio and the post-harvest processing of fennel, respectively. These cultivations are widely grown in Italy and their market value and production rates justify automation of the process. Radicchio, which is a red, broad-leaf, heading form of chicory, requires a stem cut approximately 10 mm underground in order to avoid early spoilage and to ensure appropriate product storage. Similarly, fennel requires a cutting operation to remove the root and the upper leaves after its harvest. The quality of the ready-to-market product largely depends on the accuracy of this operation.

In Section 2, a robot for the harvesting of red radicchio is presented, comprising a chain of two four-bar linkages as a manipulator and an optimized gripper. The robotic harvester autonomously performs its task using a vision-based module to detect and localize the plants in the field; we call it the radicchio visual localization (RVL) module. Section 3 presents a robotic system for the automated cutting of just-harvest fennel employing an innovative mechanism controlled by a vision-based inspection system, which we call the fennel visual identification (FVI) module. The FVI module is designed to analyze fennels traveling on a conveyor in sparse order and to accurately detect the root and leaves, which are automatically removed. Both visual algorithms are based on intelligent morphological and color filtering, optimized in each of the two cases to gain computational efficiency and real-time performance. Section 4 concludes this chapter with experimental results and discussion to validate our systems and assess their performance.
2 Radicchio Harvester
The robotic harvester was designed with both efficiency and cost effectiveness in mind (Foglia and Reina, 2006). It consists of a double four-bar manipulator and a gripper for the harvesting of radicchio. Radicchio, which is typically 120–130 mm in diameter with a 10–12 mm diameter stem, requires the stem of the plant to be cut approximately 10 mm under the soil surface. Both manipulator and gripper are pneumatically actuated. While pneumatic actuators are difficult to control compared to electric actuators, they have a high power-to-weight ratio, which makes them suitable for agricultural applications. Furthermore, the gripper is designed to work with pneumatic muscles, which are inexpensive, light, robust and easy to maintain. Pneumatic actuation also provides good compliance with the plant, due to the compressibility of air, which helps compensate for small errors in the measurement of the radicchio position in the field (Kondo and Ting, 1998).
2.1 Grippers
All grasping devices should fulfill the following requirements: low-cost, robustness and simplicity in the mechanical design, and easy implementation.
Fig. 1. The two-finger gripper (a), and its cutting sequence (b)
Figure 1 shows the two-finger gripper prototype designed for our application; it is made of aluminum with an overall weight of approximately 16 kg, and employs two bucket-like fingers featuring a linear blade attached to their tips to perform the cutting operation. The driving linkage is actuated by two pneumatic muscles connected between the fixed plate F and the vertical slider S, as indicated in Fig. 1(a, b); the two fingers operate simultaneously with symmetrical behavior. Note that the vertical stroke a of the slider S translates into a horizontal and a vertical displacement of the fingertips, denoted respectively by b and c in Fig. 1(a). In the same figure, the fingertip paths are also shown by a dashed line. The closure of the gripper starts when all four limit switches touch the soil. Afterwards, the fingers cut the stem at about 10 mm underground and simultaneously pull the plant out of the terrain, as shown in Fig. 1(b) by the successive configurations of the gripper during the whole operation.

In order to improve manipulation ability, a three-finger gripper was also considered. It is shown in Fig. 2 and features three bucket-like fingers, actuated independently by pneumatic muscles and driven by a four-bar mechanism, which allows the required stem cut to be performed at about 13 mm underground. This is shown in Fig. 2(b) by the cutting sequence followed by one of the fingers. The prototype is made of aluminum with an overall weight of approximately 14 kg. Note that the fingers are mounted at 120° and that the height of the upper support for the muscles, denoted by B in Fig. 2(b), can be shifted in order to adjust the relationship between the pneumatic muscle stroke and the displacement of the fingertip.
Fig. 2. The three-finger gripper (a), and its cutting sequence (b)
2.2 Manipulator
The manipulator provides mobility to the gripper in order to approach the plant, perform the harvesting task, and deliver the radicchio to a container on the carrier. The requirements for the manipulator design are: a carrier (tractor) velocity of about 0.4 km/h, a distance between plants of about 700 mm along the field lines, and a minimum height of 800 mm from the ground, required by the CCD camera attached to the gripper for efficient identification of the plant in the field during the targeting stage. The architecture of the manipulator is based on four-bar parallel links, which allow the gripper to stay level. Three candidate configurations, employing one, two and three pneumatic actuators, are analyzed; their functional schemes are collected in Fig. 3. Note that in all solutions a moving delivery tray is used to reduce the harvesting cycle time. Generally speaking, the number of actuated degrees of motion of a manipulator corresponds to the number of independent degrees of freedom of the end-effector: the more degrees of motion, the greater the flexibility, but also the higher the cost of the manipulator (Sciavicco and Siciliano, 2000). The one-actuator architecture (Fig. 3(a)) would allow the lowest cost but is less versatile; thus, the two-actuator configuration (Fig. 3(b)) offers the best trade-off and has been chosen for our system. Figure 4 shows a typical harvesting cycle path followed by the gripper with the two-actuator manipulator. The same path is referred to a ground reference frame in Fig. 4(a) and to a coordinate system embedded in the carrier in Fig. 4(b). The gripper moves forward horizontally until the plant is localized by the vision-based module (point B' in Fig. 4(a, b)).
Fig. 3. Four bar-based manipulator employing: one (a), two (b), and three actuators (c)
Then, the gripper starts its downward course towards the plant (point C), where it performs the cutting operation while maintaining zero velocity with respect to the ground. Point D marks the return of the gripper towards the starting configuration, where the plant is dropped onto the delivery tray (point E) and the system can start the cycle again.
Fig. 4. Harvesting cycle expressed in terms of path of the end-effector with respect to a ground frame (a), and to a carrier-embedded coordinate system (b)
2.3 The Radicchio Visual Localization
A vision-based algorithm was developed with the aim of detecting and localizing the plants in the field. The RVL module is based on intelligent color filters and morphological operations, which differentiate the radicchio within the images grabbed at a frame rate of 5 Hz by a CCD color camera mounted on the end-effector. Typically, the algorithm consists of the following steps:
1. Image acquisition in the Hue Saturation Luminance (HSL) space in order to enhance the thresholding operation described below (Fig. 5(b)).
2. Hue and Luminance plane extraction in order to obtain two images in which the radicchio is visually distinct from its surroundings (Fig. 6).
3. Independent thresholding in the Hue and Luminance planes in order to obtain two binary images comprising the radicchio purple pixels and white pixels, respectively (Fig. 7). The thresholds are experimentally determined by analyzing the histograms of the two planes. Specifically, the threshold for the Hue plane was found to be well defined as

   $$\frac{T_{max} + T_{min}}{2} \qquad (1)$$

   where $T_{max}$ and $T_{min}$ are the maximum and minimum intensity values, respectively. The threshold for the Luminance plane is set, instead, at the highest grey levels, since the white parts of the plant correspond to the pixels with the largest luminance.
4. Or-operation of the two binary images in order to combine the information into a unique image (Fig. 8(a)).
5. Morphological and particle filtering (Fig. 8(b)). Specifically, a morphological opening, i.e. an erosion followed by a dilation, is applied using a 5×5 pixel structuring element to open up touching features and remove isolated background pixels. Then, objects with an area, i.e. a total number of pixels, smaller than a threshold value (2000 pixels) are filtered out. Finally, a morphological closing, i.e. a dilation followed by an erosion, reconstructs the shape of the plant by bridging the remaining small gaps.
6. Convex hull generation of the extracted feature (Fig. 9(a)).
7. Definition of the minimum enclosing circle (Fig. 9(a)), i.e. the smallest circle which encloses the extracted set of points (Xu et al., 2003). This geometrical algorithm helps to detect the plant even when only a relatively small, uncentered portion of the radicchio is extracted, as demonstrated later in Section 4.1.
Figure 9(b) shows the overall result of the localization algorithm. The minimum enclosing circle is overlaid over the original image with its center indicating the coordinates of the centroid of the plant in the image reference frame.
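The RVL steps map naturally onto standard OpenCV calls; the sketch below is a hedged Python re-expression of the pipeline above (the chapter does not state which toolkit was used), and the luminance threshold of 200 and the direction of the hue comparison are chosen purely for illustration.

```python
import cv2
import numpy as np

def localize_radicchio(bgr):
    """Return ((cx, cy), radius) of the minimum enclosing circle, or None."""
    hls = cv2.cvtColor(bgr, cv2.COLOR_BGR2HLS)        # OpenCV stores H, L, S
    hue, lum = hls[:, :, 0], hls[:, :, 1]

    # Step 3: hue threshold at (Tmax + Tmin)/2, luminance threshold near white.
    t_hue = (int(hue.max()) + int(hue.min())) // 2
    purple = (hue > t_hue).astype(np.uint8) * 255
    white = (lum > 200).astype(np.uint8) * 255        # illustrative value

    # Steps 4-5: OR-combination, opening, small-blob removal, closing.
    mask = cv2.bitwise_or(purple, white)
    k = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, k)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_AREA] < 2000:
            mask[labels == i] = 0
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, k)

    # Steps 6-7: convex hull and minimum enclosing circle of the plant pixels.
    points = cv2.findNonZero(mask)
    if points is None:
        return None
    hull = cv2.convexHull(points)
    (cx, cy), radius = cv2.minEnclosingCircle(hull)
    return (int(cx), int(cy)), int(radius)
```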
Fig. 5. Sample image of the radicchio in the RGB (a), and HSL (b) color space
Fig. 6. Hue (a), and Luminance plane (b)
Fig. 7. Thresholding in the Hue (a), and Luminance (b) plane
Fig. 8. Or-operation between binary images (a), and morphological and particle filtering (b)
Fig. 9. Convex hull and minimum enclosing circle generation (a), radicchio localization in the image plane (b)
3 Fennel Cutting System
This section describes a robotic system for the automated cutting of just-harvest fennel traveling at high speed and without any preordered orientation on a conveyor. This operation is commonly required during the post-harvest processing of fennel to produce ready-to-market products. An automated cutting device would be beneficial to increase the manufacturing rate and the quality of the final product.
3.1 Cutting Mechanism

The proposed mechanism is shown in Fig. 10 and consists of two four-bar linkage cutters, the so-called fore and backing cutters respectively, which operate asynchronously along the same path, shown by a black solid line in Fig. 10.
Fig. 10. Mechanism for automated cutting of just-harvest fennel
Fig. 11. Configurations of the mechanism during a typical cutting cycle: starting configuration (a), first cut performed by the fore cutter (b), second cut performed by the backing cutter (c), and return of the two cutters to the starting configuration as the ready-to-market fennel is dropped (d)
The time delay between the two cutters is proportional to the desired length of the fennel, as estimated by the visual identification module described later in this section. A third four-bar linkage feeder picks up the plant from the conveyor and provides the necessary support to perform the double cutting operation. Note that the cutters and the feeder follow approximately a straight line during the cutting stage, providing a stable and strong cutting action. Figure 11 shows some typical configurations of the mechanism during its working cycle.
3.2 The Fennel Visual Identification
The FVI system processes fennels traveling at high speed on a conveyor, detecting the parts of the plant, such as the root and leaves, which need to be cut off to produce a high-quality market product. Figure 12 shows the just-harvest fennel and the final market product, with the cutting lines issued by the FVI module overlaid (dashed lines in Fig. 12(a)). The FVI module operates in real time with a sampling rate of 10 Hz and a processing rate of 60–80 plants per minute. Typically, the algorithm consists of the following steps (Milella et al., 2006):
1. Image acquisition and RGB to HSL conversion.
2. Extraction of the plant from the image by processing the Hue plane.
3. Individuation of the orientation of the plant on the conveyor.
4. Calculation of appropriate lines for the cutting operation.

Each step is discussed in detail in the remainder of this section.
Fig. 12. The FVI module establishes the cutting lines (a) in order to produce the ready-to-market fennel (b)
Image conversion in the HSL space − The use of the HSL space in place of the RGB space allows color and luminance information to be treated separately, thus improving the subsequent image segmentation. Figure 13(a) shows a sample image of the fennel plant in the HSL color space.

Hue plane processing − The fennel can be neatly distinguished from the belt background by thresholding in the Hue plane (see Fig. 13(b)), as suggested by the related histogram (Fig. 14). The threshold value is experimentally determined through an initial calibration.
Fig. 13. Sample image of the fennel in the HSL space (a), and in the Hue plane (b)
Fig. 14. Typical Hue plane histogram
An image of the belt without any fennel is acquired at the beginning of the process, and the threshold for segmentation is fixed as μ − 3σ, where μ and σ are, respectively, the mean and the standard deviation of the grey-level distribution of the hue image of the belt. The grey levels of the belt are thereby isolated and the pixels of the plant can be selected. Then, an appropriate combination of erosion and dilation operations eliminates isolated pixels and joins connected pixels to obtain a unique object representing the fennel (Fig. 15).

Individuation of the plant orientation and cutting lines − In order for the FVI to work properly, it is necessary to know the orientation of the plant. Two strategies are adopted, which are described below. The first strategy aims at detecting the relative position of the so-called "white" and "green" parts of the plant. The green part can be detected by thresholding in the saturation plane, since it corresponds to the pixels with the highest intensity values in this plane (Fig. 16(a)).
Fig. 15. Hue plane thresholding with morphological filtering
Fig. 16. The “green” (a), and “white” (b) part of the fennel with overlaid the cutting lines
Fig. 17. The root (a), and leaves part (b) template
Similarly, the white part of the plant can be isolated by thresholding in the luminance plane, since it is composed of the pixels with the highest intensity values in this plane, as shown in Fig. 16(b). By comparing the position of the centroid of the green part with that of the white part, it is possible to derive the orientation of the plant. The cutting lines can then be fixed considering both the dimensions and the orientation of the plant. Specifically, the root cutting line is fixed at either the left side or the right side of the bounding box contouring the white part, according to the plant orientation, as shown in Fig. 16(b); the cutting line to remove the leaves is set, instead, at a distance from the root cutting line of two-thirds of the box length (Fig. 16(b)).

The second approach for estimating the plant orientation is based on color matching combined with grey-scale pattern matching, and relies mostly on a normalized cross-correlation strategy. Two templates are defined, one for the upper leaves and one for the root, which serve as masks (see Fig. 17). Each template is moved around the image while the normalized correlation coefficient is estimated; its peak represents the position of the best match of the mask in the image. Note that the comparison is not performed over the entire image but only at locations previously selected on the basis of color and shape matching with the templates, which dramatically reduces the computational requirements. At the end of the cross-correlation search, the algorithm is able to detect the root and the upper-leaves templates, as shown in Fig. 18, and it issues the appropriate cutting lines, which are set at the end of the root template and at two-thirds of the distance between the centres of the two templates.

It is worth mentioning that both the color filtering-based and the template-based approach start working only when the entire plant is in the camera field of view. A software flag detects first when the plant enters the image and then when the plant is fully in the field of view, thereby initiating the line detection process.
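The first (color filtering) strategy can be sketched in a few lines of Python with OpenCV. The chapter does not specify an implementation language, so this is an illustrative re-expression; the saturation and luminance thresholds are placeholder values rather than calibrated ones, and the direction of the hue comparison against the belt calibration is an assumption.

```python
import cv2
import numpy as np

def fennel_cutting_lines(bgr, belt_mu, belt_sigma):
    """Return (root_cut_x, leaf_cut_x) image columns, or None if no plant is found."""
    hls = cv2.cvtColor(bgr, cv2.COLOR_BGR2HLS)
    hue, lum, sat = hls[:, :, 0], hls[:, :, 1], hls[:, :, 2]

    # Plant extraction: reject the belt using the mu - 3*sigma hue calibration
    # (assumes the plant hue lies below the belt's hue distribution).
    plant = (hue.astype(float) < belt_mu - 3 * belt_sigma).astype(np.uint8) * 255
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    plant = cv2.morphologyEx(plant, cv2.MORPH_OPEN, kernel)

    # "Green" part: highest saturation; "white" part: highest luminance.
    green = (((sat > 150) & (plant > 0)) * 255).astype(np.uint8)
    white = (((lum > 180) & (plant > 0)) * 255).astype(np.uint8)
    white_pts = cv2.findNonZero(white)
    mg, mw = cv2.moments(green, True), cv2.moments(white, True)
    if white_pts is None or mg["m00"] == 0 or mw["m00"] == 0:
        return None
    gx = mg["m10"] / mg["m00"]              # centroid x of the green (leaf) part
    wx = mw["m10"] / mw["m00"]              # centroid x of the white (bulb) part

    # Root lies on the side of the white bounding box away from the leaves;
    # the leaf cut is two-thirds of the box length further along the plant.
    x, _, w, _ = cv2.boundingRect(white_pts)
    if wx < gx:
        root_cut, leaf_cut = x, x + 2 * w // 3
    else:
        root_cut, leaf_cut = x + w, x + w - 2 * w // 3
    return root_cut, leaf_cut
```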
Fig. 18. Cutting lines (white dash lines) as derived by the template-based approach
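For the template-based alternative, the normalized cross-correlation can be computed with standard tools. The sketch below is illustrative rather than the authors' implementation: it assumes that candidate windows have already been pre-selected by the color/shape filter, and it uses OpenCV's TM_CCOEFF_NORMED score as the normalized correlation coefficient; the helper names are hypothetical.

```python
import cv2

def best_match(grey, template, candidates):
    """Best normalized cross-correlation peak over the pre-selected candidate windows only.
    Returns the centre of the matched template and its score."""
    best_score, best_xy = -1.0, None
    th, tw = template.shape
    for (x, y, w, h) in candidates:                  # windows pre-selected by color/shape matching
        window = grey[y:y + h, x:x + w]
        if window.shape[0] < th or window.shape[1] < tw:
            continue
        score_map = cv2.matchTemplate(window, template, cv2.TM_CCOEFF_NORMED)
        _, score, _, loc = cv2.minMaxLoc(score_map)
        if score > best_score:
            best_score = score
            best_xy = (x + loc[0] + tw // 2, y + loc[1] + th // 2)
    return best_xy, best_score

def cutting_lines_from_templates(root_centre, leaves_centre, root_template_w):
    """Cut at the end of the root template and at two-thirds of the centre-to-centre distance."""
    (rx, _), (lx, _) = root_centre, leaves_centre
    root_cut = rx - root_template_w // 2 if rx < lx else rx + root_template_w // 2
    leaf_cut = int(round(rx + 2.0 * (lx - rx) / 3.0))
    return root_cut, leaf_cut
```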
4 Experimental Results
In this section, experimental and field results obtained with our vision systems are presented, to assess their performance in terms of accuracy, repeatability, and robustness to disturbances and variations in lighting.

4.1 Radicchio Harvesting
A feasibility study of the system was performed through laboratory tests, followed by field validation of the vision-based module. The laboratory experiments were conducted on a prototype operating in a test bed that simulated quasi-real working conditions; this stage helped to set up and optimize the components of the system. The performance of the RVL module was then validated in field tests. The prototype is shown in Fig. 19: three 3 mm thick steel plates provide the connections between the two four-bar linkages (built from 20 mm diameter, 2 mm thick aluminum tubes), between the manipulator and the carrier, and between the manipulator and the gripper. All the revolute joints use DryLin® bushings. A preliminary two-finger gripper was mounted at the end of the manipulator. The overall weight of the robotic arm is about 25 kg without the gripper. Figure 19(b) shows the laboratory test bed, set up with typical agricultural terrain and sparse fist-size rocks.
Fig. 19. The robotic harvester prototype operating in the laboratory test bed
4.1.1 Laboratory Tests
A set of experiments was performed to assess the performance of the robotic harvester in identifying and picking up radicchios that were randomly placed along the harvesting line. The plant position was derived by the RVL module; a localization error E_j was defined for the measurement of plant position j as

E_j = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| x_j - v_{j,i} \right|}{x_j}    (2)
where n is the number of runs (n = 5), x_j is the actual plant position measured with a ruler, and v_{j,i} is the i-th vision-derived measurement. Ten different plant positions were analyzed along the 1-meter-long harvesting line using radicchios of various shapes and sizes; the results are collected in Fig. 20, where E_j is reported along with an indication of its statistical spread. The average error was always below 5% and, for the worst-case measurement, it was less than 6.3%. No false localization was detected, and no significant influence of the size and shape of the radicchio on the RVL performance was observed. In all experiments, the robotic harvester consistently succeeded in picking up the targeted plant. The average time for a complete harvesting operation was about 6.5 s, as detailed in Table 1. The robustness of the algorithm to variations in lighting was also tested. Figure 21 shows the result of image segmentation under three different lighting conditions obtained with an adjustable video light. The RVL works very well even with a reduction of the environmental illumination level by as much as 80% of the optimal value (L = 0.8), as shown in the bottom image of Fig. 21.
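As a worked example of Eq. (2), the following sketch computes E_j for a single plant position from n = 5 measurements; the numbers are illustrative only, not experimental data.

```python
import numpy as np

def localization_error(x_true, v_measured):
    """Mean relative localization error E_j of Eq. (2) for one plant position."""
    v = np.asarray(v_measured, dtype=float)
    return float(np.mean(np.abs(x_true - v) / x_true))

# Illustrative numbers only: actual position 0.50 m, five vision-derived measurements.
print(localization_error(0.50, [0.49, 0.51, 0.52, 0.48, 0.50]))  # ~0.024, i.e. about 2.4%
```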
Table 1. Average time for a complete harvesting operation

Operation                                     Time (s)
Radicchio Visual Localization                 0.2
Plant approaching and harvesting              4.0
Plant delivering and arm reconfiguration      2.5

Fig. 20. Error estimation for the RVL module
Fig. 21. Lighting influence on the RVL module
4.1.2 Field Tests
The RVL was extensively tested by performing several position estimation measurements in a field of radicchio ready for harvest on a cloudy day. Figure 22 shows a typical result of a plant recognition test: the RVL correctly detected all six radicchio plants, always with an error of less than 5%. In all experiments, the actual distance was measured with a portable laser distance-measuring tool, using the center of the plant as the reference point. The radicchio partially hidden by leaves (the first plant of the lower row) was also detected accurately, with an error of only 4.2%. The performance of the system was consistent with the laboratory results, with even better accuracy and robustness, and no false detection was observed in the experiments. Note, however, that if a plant were completely obscured by leaves, correct image thresholding would not be possible and the RVL module would fail. The system would also provide poor plant localization if the extracted part of the plant were too small and off-center. These are, however, very unlikely field conditions for a fully or almost fully germinated plant, and no such case was observed in the investigated radicchio field.
4.2 Fennel Cutting
To test the performance of the FVI system, a laboratory test bed was set up to simulate a typical post-harvest environment, using a fixed FireWire camera pointing straight down from 1 m above a 2 m × 0.8 m conveyor.
Fig. 22. Plant detection in the field: acquired image (a), plant thresholding (b), and radicchio localization (c)
The FVI was extensively tested in detecting the root and leaf parts of 30 cm long fennel plants traveling on the conveyor with random orientation at a speed of about 1.5 m/s. In these experiments, the color filtering-based approach was adopted to estimate the orientation of the plants. The system set the cutting lines correctly for all the plants, as shown in Fig. 23 for a typical measurement, and no false detection was observed. The influence of the lighting conditions on the performance of the FVI system was also tested; the results are collected in Fig. 24 and show good robustness up to a light reduction of as much as 50% of the optimal value (L = 0.5). However, lighting is not expected to be a critical factor for this application, as post-harvest operations are usually performed in structured environments.
Fig. 23. Typical measurement obtained from the FVI module
Fig. 24. Lighting influence on the FVI module
Experiments were also performed with the template-based approach to estimating fennel orientation; in this case the FVI detected the cutting lines correctly for only 80% of the plants. The lower performance is due to the differences in shape and size between plants, which the template-based approach cannot account for.
5 Summary
Two applications of robotics in agricultural automation have been presented. The first dealt with the harvesting of radicchio: a robotic harvester was discussed, with details of its functional and mechanical design, and a vision module that accurately detects and localizes the plants in the field was presented and validated in laboratory and field tests. The second case study addressed the automation of post-harvest cutting of fennel. An innovative cutting mechanism was described, controlled by a vision module that detects and removes the parts of the plant unfit for the market.
Both systems proved effective in experimental trials and robust to noise and variations in lighting. They could potentially be adopted in agricultural automation to improve quality and efficiency.
Acknowledgements This work was funded by the Italian Department of Education, University and Research through a grant from PRIN 2001.
Biography

Mario M. Foglia received the Laurea and Research Doctorate degrees in Mechanical Engineering from the Politecnico of Bari, Italy, in 1995 and 2000, respectively. Since 2000 he has been an Assistant Professor at the Department of Mechanical and Management Engineering, Politecnico of Bari. His research interests include path planning, agricultural robotics, and the localization and kinematic design of mobile robots.

Angelo Gentile received the Laurea degree in Mechanical Engineering from the Politecnico of Bari in 1981. He is a Full Professor of Applied Mechanics at the Politecnico of Bari and teaches Robot Mechanics at the University of Lecce. His areas of research are mobile robotics, mechatronic systems and industrial automation.

Giulio Reina received the Laurea and Research Doctorate degrees from the Politecnico of Bari, Bari, Italy, in 2000 and 2004, respectively, both in mechanical engineering. From 2002 to 2003 he worked at the Mobile Robotics Laboratory, University of Michigan, as a Visiting Scholar. He is currently an Assistant Professor in Applied Mechanics with the Department of Innovation Engineering, University of Lecce, Lecce, Italy. His research interests include autonomous ground vehicles, mobility and localization on rough terrain, and agricultural robotics.
More Machine Vision Applications in the NCEA
John Billingsley National Centre for Engineering in Agriculture University of Southern Queensland Toowoomba, QLD AUSTRALIA
1 Introduction
In the early nineties, the research team at the National Centre for Engineering in Agriculture established a reputation for vision-based automated guidance of agricultural vehicles. [4] This work has gained a new lease of life with recent funding. A succession of further vision projects has been somewhat unusual, ranging from the visual identification of animal species for the culling of feral pigs to vision-based counting of macadamia nuts. A unifying feature is the easy availability of low-cost cameras and a framework for integrating analysis software using DirectX ‘filters’. Machine vision has changed from its earlier status as a sophisticated and expensive technology to a low-cost solution for more general instrumentation. For the rapid solution of ad hoc problems, it is easy to exploit the convergence between computing and media entertainment. A system including camera, interfacing, gigabytes of data storage, display and an embedded PC card can cost less than a single conventional high-resolution industrial camera and interface. However, we are also pursuing a more fundamental approach, building systems around image-sensor components interfaced by means of RISC processors.
2 Identification of Animal Species
In the Australian Great Artesian Basin, there has been a programme of capping bores and piping the outflow to watering points. [13] Access to water can therefore be controlled for feral and native animals, in addition to the farmed livestock. The objective is to allow normal access to both farmed and protected species, while ‘undesirables’ such as feral pigs are directed to a second water supply in an enclosure from which they cannot escape until they are ‘harvested’. [8] Feral pigs do hundreds of millions of dollars’ worth of damage per year, but there is a lucrative export market for wild pork.
Fig. 1. Goat and boundary
Shortly, all Australian farmed animals will carry tags under the National Livestock Identification Scheme. This would make the task simple were it not for the need to allow kangaroos, emus and some other wild species equal access to water. The NLIS tag will give a reassuring double check, but the task of species identification remains. The animals must approach the water through a narrow fence-wire corridor, at the end of which a gate is controlled to direct them into one or other of two compounds; one of these has a one-way exit while the other is closed. Initial experiments used a blue tarpaulin behind the corridor, to give easy colour-based discrimination. Identification of the animals is based on their boundary shape. Many edge-detection methods lose important information by locating a boundary as a scatter of points, with no regard to the order in which they should be linked to form a curve that circumscribes the shape. We therefore use a chain method that ‘draws’ the boundary, while a convention of ‘animal is to the left of the boundary’ defines the direction of drawing to be anticlockwise. A search is first performed on a coarse grid until the first ‘animal’ pixel is detected; adjacent pixels are then searched upwards to find the boundary.
Fig. 2. Sheep and boundary
The boundary is then traced, using a strategy first proposed in 1983. [2, 3] Points are ‘stitched’ around the boundary in a routine that can most easily be likened to a dance step. Assume that we are ‘dancing’ to the left, starting from a point that has been identified as ‘animal’. We step forward and test the new point. If it has changed to ‘not animal’, we note the midpoint as a boundary point and step diagonally backwards to the left. If, on the other hand, it is ‘animal’ again, we rotate our body and direction of progress forty-five degrees to the right and take a further step forward. A JavaScript demonstration of the full algorithm can be found at http://www.jollies.com/stitch.htm. Points are found in a sequence that traces the boundary of the animal, giving a chain of vectors in the manner of a Freeman chain; each vector takes us further around the boundary and has a direction, psi. When we plot psi against the boundary distance s, we obtain an ‘s-psi’ curve relating tangent direction to the distance advanced around the boundary. A complex two-dimensional image has been reduced to a simple one-dimensional function that can be matched against a set of shape templates of the same form. If the object is rotated, a constant is added to all the points of the curve, and the distance can optionally be normalised to allow shapes of different sizes to be matched. Matching is simplified if we can choose a unique starting point on both curve and template. Instead of the full circumference of an animal, the ‘top half’ is usually sufficient for identification and avoids the confusion that leg movements can give: the edge is traced from the upwards vertical tangent at the nose, along the back, to the downwards vertical tangent near the tail. It was felt that the blue tarpaulin might deter many animals in the wild from approaching the system, so further experiments discriminated between animals and background on the basis of the difference between the frame and a ‘remembered’ image of the background. It was harder to get a complete circumference this way, but a ‘bounding box’ was easy to construct, and an elementary algorithm based on the two arrays of distances from the box to the front and to the rear of the animal gave good discrimination.
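The stitching tracer itself is demonstrated at the URL above. The sketch below is not the NCEA code: it assumes an ordered boundary is already available (from any contour follower) and shows how the s-psi curve and a simple rotation-tolerant template comparison can be formed; the sample count and the mean-removal trick for rotation are assumptions.

```python
import numpy as np

def s_psi(boundary_xy, n_samples=256):
    """Tangent direction psi as a function of distance s along an ordered boundary,
    resampled to a fixed number of points so shapes of different sizes can be compared."""
    pts = np.asarray(boundary_xy, dtype=float)
    d = np.diff(pts, axis=0)
    psi = np.unwrap(np.arctan2(d[:, 1], d[:, 0]))               # tangent direction of each step
    s = np.concatenate(([0.0], np.cumsum(np.hypot(d[:, 0], d[:, 1]))))[:-1]
    s_resampled = np.linspace(0.0, s[-1], n_samples)            # optional length normalisation
    return np.interp(s_resampled, s, psi)

def match_score(psi_curve, psi_template):
    """Rotating the shape only adds a constant to psi, so compare after removing the means."""
    a = psi_curve - psi_curve.mean()
    b = psi_template - psi_template.mean()
    return float(np.sqrt(np.mean((a - b) ** 2)))                # lower means a better match
```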
Fig. 3. Natural background and bounding box
The project is at a field testing stage (literally) and sheep and goats can be separated with 100% success, except when animals overlap seriously.
3 The Counting of Macadamia Nuts
One of the objectives of breeding varietal strains is to produce a tree with maximum yield. The harvesting method is simple: nuts fall from the trees and are picked up from the ground. The trees are planted close together, however, so the task of segregating the catch from the individual ‘drop zones’ is important. [1] The accepted harvester technology consists of a ‘bristle roller’ several metres wide. The nuts, in their husks, are trapped between the bristles and carried upwards, where they are stripped from the roller by ‘fingers’. They fall into an auger that carries them across the machine to another auger at the side, which in turn carries them to a bin at the back of the machine. As they move to the left in the auger at the front of the machine, nuts that have been gathered from the right-hand side of the swathe are joined by nuts gathered progressively further to the left. At any instant, a point in the delivery auger will thus contain nuts gathered from a diagonal stripe of the width of the roller. If this were the basis of a count, it would be impossible to deconvolve the ‘Green's function’ to ascribe the count rate to a fine enough location to assess the yield of individual trees. Machine vision has been adopted as the solution. The bristle roller has been coloured blue, to obtain good colour contrast with nuts at all stages of maturing, and cameras have been mounted inside the housing. First the RGB pixel components must be analysed to determine ‘roller’ or ‘not roller’; it was found by experiment that (red > blue) gave good discrimination. Unfortunately, leaves of a similar colour to the nuts are also picked up, so the second stage must involve shape discrimination to select the circular nuts and reject other shapes. Figure 6 shows the discriminated image when all pixels are processed, but the chosen algorithm requires only a small proportion of the pixels to be sampled. The formal Hough transform approach would involve applying a filter to the discriminated data to identify all boundary points; triplets of boundary points can then contribute ‘votes’ on the circle centres.
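As a minimal sketch of the colour stage (not the NCEA implementation), the (red > blue) roller test can be written directly on a frame held as a numpy array:

```python
import numpy as np

def not_roller_mask(rgb):
    """True where a pixel is 'not roller': the blue roller fails the (red > blue) test.
    Assumes the frame is in RGB channel order (use index 2 for red if it is BGR)."""
    r = rgb[:, :, 0].astype(np.int16)
    b = rgb[:, :, 2].astype(np.int16)
    return r > b   # nuts, husks and leaves; leaves are rejected later by the shape test
```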
Fig. 4. Harvester photo, diagram
Fig. 5. Camera locations
Because analysis must be performed in real time, a much simpler circle-detection algorithm was chosen. Pixels are examined on a coarse grid, with spacing half the size of a nut; columns of points are examined downwards, the columns progressing from left to right. When a pixel is found that is ‘not blue’, adjacent pixels are explored from this point up and down to find two points on the boundary. From the midpoint of this chord, pixels are explored horizontally to find boundary points; the centre is located and the radius is determined. To check the circularity of the object that has been found, we inspect points on the vertices of two octagons, which lie on two circles, one inside and one outside the circle we are testing. Although rough and ready, this algorithm is robust and rapid. If the initial search grid has intervals of five pixels, only four percent of the pixels are initially inspected, and for each nut or leaf detected a further thirty to forty pixels are tested. Having determined the coordinates of nut-centres in one frame of the video stream, it is necessary to collate the sequence of frames to ensure that nuts are neither counted twice nor omitted. The result is a map of coordinates relative to the location of the tractor. Of course that is only half of the story: in order to aggregate the totals tree by tree, we must determine the location of the tractor relative to each tree. Machine vision provides a solution once again. At first glance, it would appear that precision GPS satellite navigation could give the tractor location; in practice, the canopy of trees degrades the accuracy to a point where it cannot be used. Instead, a side-looking camera observes the tree trunks as they stream past, some of them carrying an identifying marker so that the count cannot lose step. Trees in more distant rows are recognised by their ‘streaming rate’, so that they do not confuse the issue.
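A sketch of the grid-and-chord circle test along these lines is given below. It is illustrative rather than the NCEA code: it assumes a boolean ‘not roller’ mask such as the one from the earlier snippet, and the grid spacing, radius limits and octagon scale factors (0.7 and 1.4 times the radius) are assumptions. The merging of repeated detections of the same nut is omitted for brevity.

```python
import numpy as np

def _span(m, y, x, axis):
    """Walk outwards from (y, x) along one axis while the mask stays true; return both ends."""
    lo = hi = (y, x)
    dy, dx = (1, 0) if axis == 0 else (0, 1)
    while m[lo[0] - dy, lo[1] - dx]:
        lo = (lo[0] - dy, lo[1] - dx)
    while m[hi[0] + dy, hi[1] + dx]:
        hi = (hi[0] + dy, hi[1] + dx)
    return lo, hi

def find_nuts(mask, grid=5, r_min=4, r_max=30):
    """Coarse-grid scan, vertical then horizontal chords, then an octagon circularity check."""
    m = mask.astype(bool).copy()
    m[0, :] = m[-1, :] = False                 # clear the border so the span walks cannot wrap
    m[:, 0] = m[:, -1] = False
    h, w = m.shape
    centres = []
    for x in range(grid, w - grid, grid):      # columns progress left to right
        for y in range(grid, h - grid, grid):  # each column is scanned downwards
            if not m[y, x]:
                continue
            (y0, _), (y1, _) = _span(m, y, x, axis=0)        # vertical chord
            ym = (y0 + y1) // 2
            (_, x0), (_, x1) = _span(m, ym, x, axis=1)       # horizontal chord through its midpoint
            cx, cy = (x0 + x1) // 2, ym
            r = (x1 - x0) / 2.0
            if not (r_min <= r <= r_max):
                continue
            if cx - 1.5 * r < 0 or cx + 1.5 * r >= w or cy - 1.5 * r < 0 or cy + 1.5 * r >= h:
                continue                       # too close to the edge for the octagon test
            angles = np.arange(8) * np.pi / 4.0
            inner = all(m[int(cy + 0.7 * r * np.sin(a)), int(cx + 0.7 * r * np.cos(a))]
                        for a in angles)       # octagon just inside the candidate circle
            outer = all(not m[int(cy + 1.4 * r * np.sin(a)), int(cx + 1.4 * r * np.cos(a))]
                        for a in angles)       # octagon just outside it
            if inner and outer:
                centres.append((cx, cy, r))
    return centres
```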
Fig. 6. Detected nuts and leaves
Fig. 7. Dunnarts and detected movement
4 Animal Behaviour
There is a breeding programme to improve the survival chances of a species of dunnart, Sminthopsis douglasi. A serious hazard is the aggressive nature of the small mouse-like marsupials. If the female is not in oestrus when introduced into a cage with the male, there is a danger that they will fight to the death. A first step was to mount a camera where it could capture video from two adjoining cages containing male and female. Background discrimination enabled the animals to be located as they moved, easing the task of the student who had to monitor the recorded video to judge when it would be safe to put the animals together. The next step will be to encode the movement to obtain an automatic assessment of the animals’ behaviour. Initially this will merely alert the animal breeders, but eventually automatic operation of a gateway between the cages might be possible.
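A background-difference locator of the kind described might look like the following sketch (OpenCV 4 call signatures; the threshold and minimum area are assumptions, not values from the project):

```python
import cv2
import numpy as np

def locate_movement(frame_bgr, background_bgr, thresh=30, min_area=200):
    """Locate moving animals as bounding boxes of large differences from a stored background."""
    diff = cv2.absdiff(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY),
                       cv2.cvtColor(background_bgr, cv2.COLOR_BGR2GRAY))
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))  # suppress noise
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]
```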
5 Texture Analysis
One of the factors used to determine citrus quality is the texture of the skin. The skin texture of citrus fruit is a combination of three different types of spatial variation. Sub-millimetre wrinkles cover the entire skin and are irregular, but have relatively constant coverage. Small dimples 1 mm–5 mm in depth are randomly spaced around the fruit; it is the depth and quantity of these that have the greatest impact on the skin texture grade. The third type of variation is deformation from the normal spheroid shape: these lumps or flat spots can be caused by rough handling, or may be due to variety. The texture can be measured directly from a fruit using an expensive stylus instrument in which, much like a record player, a needle touches the skin of the fruit as it revolves. [11] The changes in position are amplified and recorded. A serious problem is that this method gives only a single sample from one ‘latitude’ around the fruit, which may or may not be representative.
Fig. 8. Orange, illumination and camera
The machine-vision solution is to illuminate the fruit from the side, so that to a camera mounted in front of it the fruit appears as a ‘half moon’. [7] The ‘terminator’, dividing the lit portion from the portion in shadow, appears as a ragged vertical line, with a statistical distribution of horizontal ‘roughness’ that is readily related to the texture. As the fruit rotates, a sequence of measures can be accumulated to give an assessment of the entire fruit. One such measure involves detecting the shadow edge position as a function x(y) of the scan line y on which it is seen. This function can be doubly low-pass filtered by executing code equivalent to

   xs = x(ymin)                      ' initialise the running average
   for y = ymin to ymax              ' forward low-pass pass
      xs = xs + (x(y) - xs) / k
      xsmooth(y) = xs
   next
   for y = ymax to ymin step -1      ' backward pass cancels the phase lag
      xs = xs + (xsmooth(y) - xs) / k
      xsmooth(y) = xs
   next

where k is a smoothing parameter, a sort of ‘distance constant’. The high-pass ‘roughness’ signal is given by x(y) − xsmooth(y), and a measure of texture can be obtained by squaring and summing it. We can tune k to remove lumps and deformations while preserving dimples. A second project involving vegetation texture analysis is not really agricultural: there is a need for a fast and simple way of analysing the ground cover of a football pitch, to ensure that there is no bald and slippery patch where an expensive player might suffer damage. [9]
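For completeness, an array-based equivalent of the double filter and the squared-and-summed roughness measure (a sketch, not the NCEA implementation; the default k is an assumption) is:

```python
import numpy as np

def roughness(x, k=8.0):
    """Forward/backward first-order low-pass of the shadow-edge profile x(y),
    then the summed squared high-pass residual as a texture measure."""
    x = np.asarray(x, dtype=float)
    xsmooth = np.empty_like(x)
    xs = x[0]
    for i in range(len(x)):                 # forward pass
        xs += (x[i] - xs) / k
        xsmooth[i] = xs
    for i in range(len(x) - 1, -1, -1):     # backward pass
        xs += (xsmooth[i] - xs) / k
        xsmooth[i] = xs
    return float(np.sum((x - xsmooth) ** 2))
```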
6 Measuring the Density of Dingo Teeth
A biologist colleague needed to measure the density of porous dingo teeth, in order to establish the validity of using the relationship between tooth density and age to estimate the age of the animal. [5]
Fig. 9. Building image from left and right tangents
Skulls containing canine teeth from 68 ‘known-age’ animals, either field-captured or captive-reared, were borrowed from CSIRO Sustainable Ecosystems, Canberra, Australia. Weighing the teeth was of course not a problem, but their very porosity ruled out an ‘Archimedes’ immersion method for measuring their volumes. The biologist seized on the suggestion that machine vision could be the answer. The original intention was to use ‘structured light’ to map each tooth; then came a surprise announcement that the museum required the return of the teeth by the end of the week, and attention turned to a means of capturing the data for later analysis. The result was almost certainly superior to the original intention. The canine tooth, shaped rather like a banana, was attached to the vertical axis of a small stepper motor that had been ‘recovered’ from a discarded floppy-disk drive, and a ‘Smartcam Pro’ web camera was mounted firmly to observe it. It was decided that the 320 by 240 pixel resolution would be best employed by mounting the camera in ‘portrait’ position. The off-white tooth was illuminated in front of a black background. Only 50 images were to be captured per revolution; even so, the prospect of saving and later processing over eleven megabytes of data for each of 160 teeth was daunting. So just the green signal was captured, yielding a clear binary silhouette, and the data was further reduced before saving.
Fig. 10. Left and right estimates of the slice.
For each of the 320 ‘slices’ of the image, the locations of the minimum and maximum tooth boundary were found. If no white appeared, because the slice was beyond the end of the tooth, two values of 0 were recorded; otherwise two comma-separated numbers were written to file. The file size for each tooth was reduced to just over 100 kilobytes. Some industrious work by a biology student saw the teeth scanned and returned to the museum on time. The problem of processing the data still remained. Each of the number-pairs in the data file represents a line section of the perceived tooth image, in other words a ‘left tangent’ and a ‘right tangent’ to the tooth for a given elevation angle from the camera lens. The ‘perspective effect’ can readily be accommodated by drawing the tangents through a single point representing the camera. The method adopted was similar to that of the tomograph. A planar array of points was set up, initially deemed to be ‘occupied’. Each tangent then becomes a line that separates occupied from unoccupied points, and those that are unoccupied are eliminated. For the next image in the sequence, the camera position is rotated 1/50 of a revolution with respect to the plane and the process is repeated. At the conclusion, the survivors are counted to give the area of the slice. The left and right tangents give two distinct estimates of the area; by comparing them, corrections can be made for errors in the location and angle of the image of the axis. Totalling the areas gives a measure of the volume.
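The carving of a single slice can be sketched as follows. This is not the original code: the camera distance, the pixel-to-angle scale, the image centre column, the grid resolution and the small-angle conversion of pixel offsets to tangent bearings are all assumptions, and slices recorded as (0, 0) are assumed to be skipped by the caller.

```python
import numpy as np

def slice_area(tangent_pairs, n_views=50, cam_dist=200.0, px_scale=0.002,
               centre_px=120.0, grid_half=60, cell=0.25):
    """Carve one cross-section from its silhouette tangents.
    tangent_pairs[i] = (left_px, right_px) boundary columns seen from view i."""
    xs = np.arange(-grid_half, grid_half) * cell
    gx, gy = np.meshgrid(xs, xs)                    # candidate points in the tooth frame
    occupied = np.ones_like(gx, dtype=bool)         # start with every point 'occupied'
    for i, (xl, xr) in enumerate(tangent_pairs):
        theta = 2.0 * np.pi * i / n_views           # tooth rotated 1/50 revolution per view
        # Rotate grid points into this view's camera frame (camera on the +x axis).
        rx = gx * np.cos(theta) + gy * np.sin(theta)
        ry = -gx * np.sin(theta) + gy * np.cos(theta)
        bearing = np.arctan2(ry, cam_dist - rx)     # angle of each point as seen from the camera
        lo, hi = sorted(((xl - centre_px) * px_scale, (xr - centre_px) * px_scale))
        occupied &= (bearing >= lo) & (bearing <= hi)   # carve away points outside the silhouette
    return occupied.sum() * cell * cell             # survivors give the slice area

# Volume estimate: sum the slice areas over the 320 slices, times the slice thickness (assumed).
```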
7 Vision Guidance
The original vision guidance method had a number of patented features, which have been carried through into the new version of the system. The basis of the strategy is to form ‘keyholes’ in the image, each of which embraces a single row. A regression is used to fit a line to the ‘plant’ pixels in the keyhole, updating the estimate of the row's location and direction; this in turn updates the keyhole to be used for the next frame. The vegetation or furrow to be followed must be discriminated from the background, by comparing either brightness or colour components. Since the lighting is apt to change, this could give problems. To solve this, the threshold level is manipulated to give a certain proportion of ‘plant’ pixels in the keyhole; at any stage of growth, this proportion will be fairly constant. Figure 11 shows a typical image from the row-following algorithm.
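A minimal sketch of one keyhole update along these lines (not the patented implementation; the target proportion, gain and the choice of regressing column on row are assumptions) is:

```python
import numpy as np

def fit_row(keyhole_grey, threshold, target_fraction=0.3, gain=50.0):
    """Fit a line to the 'plant' pixels inside one keyhole and nudge the threshold so that
    roughly a fixed proportion of keyhole pixels is classed as plant in the next frame."""
    plant = keyhole_grey > threshold                 # brightness (or colour-index) discrimination
    fraction = plant.mean()
    new_threshold = threshold + gain * (fraction - target_fraction)  # proportional correction
    rows, cols = np.nonzero(plant)
    if rows.size < 2:
        return None, new_threshold                   # not enough plant pixels to fit a line
    slope, intercept = np.polyfit(rows, cols, 1)     # column = slope * row + intercept
    return (slope, intercept), new_threshold         # line updates the keyhole for the next frame
```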
Fig. 11. Fitting lines to crop rows
8 Conclusions
A vision approach can be applied to an ever broader range of instrumentation tasks. It has become simple to exploit the media-motivated interfacing and video-stream processing tools that are now readily available. Projects not described here include a refractometer for measuring sugar cane juice, based on a line-scan camera. [12] The interfacing task would have been simplified if a cannibalised webcam had been used instead. There is a divergence within computer peripheral systems: analogue-to-digital converters for instrumentation have become increasingly complicated and costly, while sound cards with high-performance ADCs are available at give-away prices. Transducers for absolute position are extremely costly, and we have given serious thought to using a webcam to inspect a measuring tape. It may be technological overkill, but it is a low-cost alternative to a commercial sensor. There will be many occasions on which it is easier to tailor an elaborate consumer product than to craft a simpler engineering solution. The projects presented in this paper may seem strange, but future applications may well push the bounds of imagination much further.
Acknowledgments This review draws heavily from the work of Mark Dunn, Kerry Withers and others in the NCEA and the University of Southern Queensland.
References

[1] J. Billingsley, “The Counting of Macadamia Nuts”, Mechatronics and Machine Vision 2002: Current Practice, Research Studies Press Ltd, ISBN 0-86380-278-8, pp. 221–227.
[2] J. Billingsley and A.A. Hosseinmardi, “A low-cost vision system for robotics”, Proc. British Association meeting, Brighton, August 1983.
[3] J. Billingsley and A.A. Hosseinmardi, “Low cost vision for robots – a pragmatic approach”, IEE Colloquium (Digest), n. 1983/65, 1983, pp. 3.1–3.2.
[4] J. Billingsley and M. Schoenfisch, “Vision and Mechatronics Applications at the NCEA”, Fourth IARP Workshop on Robotics in Agriculture and the Food Industry, Toulouse.
[5] J. Billingsley and K. Withers, “Measuring the density of dingo teeth with machine vision”, Sensor Review, v. 24, n. 4, 2004, pp. 361–363.
[6] E.R. Davies, Machine Vision: Theory, Algorithms, Practicalities (3rd ed.), Elsevier.
[7] M. Dunn and J. Billingsley, “A Machine Vision System for Surface Texture Measurements of Citrus”, Proceedings 11th IEEE Conference on Mechatronics and Machine Vision in Practice, Macau, November 2004, pp. 73–76.
[8] M. Dunn, J. Billingsley and N. Finch, “Machine Vision Classification of Animals”, Mechatronics and Machine Vision 2003: Future Trends, Research Studies Press Ltd, Baldock, UK, ISBN 0-86380-290-7, pp. 157–163.
[9] M. Dunn, J. Billingsley, S. Raine and A. Piper, “Using Machine Vision for Objective Evaluation of Ground Cover on Sporting Fields”, Proceedings 11th IEEE Conference on Mechatronics and Machine Vision in Practice, Macau, November 2004, pp. 88–91.
[10] V. Kindratenko, “On Using Functions to Describe the Shape”, Journal of Mathematical Imaging and Vision, vol. 18, no. 3, pp. 225–245.
[11] R. Leach, The Measurement of Surface Texture using Stylus Instruments, National Physical Laboratory, London, 2001, p. 79.
[12] S. McCarthy and J. Billingsley, “A sensor for the sugar cane harvester topper”, Sensor Review, v. 22, n. 3, 2002, pp. 242–246.
[13] J. Seccombe, “Sustainability of the Great Artesian Basin”, Proc. of Academy Symposium, Canberra, Australia.
Authors
Bauer, H., PROFACTOR Research, Austria 87
Bečanović, Vlatko, Kyushu Institute of Technology, Japan 65
Billingsley, John, University of Southern Queensland 333
Boles, Wageeh, Queensland University of Technology 275
Bradbeer, Robin, City University of Hong Kong 17, 167, 209
Brett, P.N., Aston University, Birmingham 219, 247, 261
Brooker, G., University of Sydney, Australia 139
Campbell, Duncan, Queensland University of Technology 125
Chen, Li, Shenyang Institute of Automation 201
Chen, S.L., National University of Singapore 255, 267
Chen, Weiping, Griffith University 193
Coulson, C., Aston University, Birmingham 261
Cruz-Villar, C., CInvEstAv I.P.N., Mexico 41
Cubero, Samuel N., University of Southern Queensland 27, 229
Dowling, Jason, Queensland University of Technology 275
Eberst, C., PROFACTOR Research, Austria 87
Foglia, Mario M., Politecnico of Bari, Italy 313
Gao, Yongsheng, Griffith University 193
Gentile, Angelo, Politecnico of Bari, Italy 313
Gibbens, Peter W., University of Sydney 181
Griffiths, M.V., Saint Michaels Hospital, Bristol 247, 261
Hancock, Nigel, University of Southern Queensland 305
Heindl, C., PROFACTOR Research, Austria 87
Hennessy, R., University of Sydney, Australia 139
Holding, David, Aston University, Birmingham 247
Huang, S.N., National University of Singapore 255
Keir, Andrew, Queensland University of Technology 125
Ku, Kenneth K.K., City University of Hong Kong 167, 209
Kuo, John, San Diego State University, USA 99
Lai, W.B., National University of Singapore 267
Lam, Katherine, City University of Hong Kong 167
Lee, T.H., National University of Singapore 267
Lees, Michael, Foster's Australia, Brisbane 125
Li, Bin, Shenyang Institute of Automation 201
Lobsey, C., University of Sydney, Australia 139
Ma, X., Aston University, Birmingham 219
Maclean, A., University of Sydney, Australia 139
Maeder, Anthony, Queensland University of Technology 275
McCarthy, Cheryl, University of Southern Queensland 305
McCarthy, E., University of Southern Queensland 289
Meers, Simon, University of Wollongong, Australia 111
Minichberger, J., PROFACTOR Research, Austria 87
Nickols, Frank, Dhofar University, Sultanate of Oman 3
Parra-Vega, V., CInvEstAv I.P.N., Mexico 41
Pather, S., University of Southern Queensland 289
Pichler, A., PROFACTOR Research, Austria 87
Piper, Ian, University of Wollongong, Australia 111
Portlock, Joshua N., Curtin University of Technology 27
Proops, D., Queen Elizabeth Hospital Birmingham 261
Raine, Steven, University of Southern Queensland 305
Reina, Giulio, University of Lecce, Italy 313
Rodriguez-Angeles, A., CInvEstAv I.P.N., Mexico 41
Scheding, S., University of Sydney, Australia 139
Sivadorai, M., University of Sydney, Australia 139
Stone, R. Hugh, University of Sydney 181
Tam, Betty, Aston University, Birmingham 247
Tan, K.K., National University of Singapore 255, 267
Tang, K.Z., National University of Singapore 255
Tarokh, Mahmoud, San Diego State University, USA 99
Taylor, R.P., Aston University, Birmingham 261
Tongpadungrod, P., King Mongkut's Institute of Technology, Bangkok 219
Trevelyan, James, University of Western Australia 51
Tsai, Allen C., University of Sydney 181
Tsang, P.W.M., City University of Hong Kong 77
Wang, Xue-Bing, Kyushu Institute of Technology, Japan 65
Wang, Yang, Shenyang Institute of Automation 201
Ward, Koren, University of Wollongong, Australia 111
Widzyk-Capehart, E., CSIRO, Brisbane, Australia 139
Yuen, T.Y.Y., City University of Hong Kong 77
Index
3D machine vision 87
3D object recognition 88, 89, 91, 93, 95
3D object reconstruction 91
3D scanning 90
Aerial robot 27
agricultural crop canopy 306
artificial human vision 275
asset management 125, 126
AUV (Autonomous Underwater Vehicle) 41
beam path 292
blind search 229, 233
blindness 279
blob tracking 113
bounding box 325
CD34+ cell 259
cochleostomy 262, 263
CompactRIO 267, 270
continuous computation architecture 4
Coordination in Mechatronic engineering work 51, 53
deformation 127
drilling bone tissue 262
education 17
Engineering education, implications 59
fibre-optic cable 167, 169
field camera enclosure 310, 311
Field Programmable Gate Array (FPGA) 270
fingerprint 193–196
fixed object plane 306
flexible digit 248, 252
FMCW 149
framework 275, 276
Freeman chain 335
fuzzy control 102, 108
gaze-tracking 111
genetic algorithm 78, 80, 83, 84
grasping 314
harvesting 316
head-pose 111, 112, 119
hydraulic operation 210
illumination 128
inertial sensor 65, 66, 68
infrared LEDs 112, 113
inspection 201, 205
intelligent physiotherapy 267
internode length 305, 306
invariant moment 183
invasive clinical environment 247
inverse kinematics 229
Kalman filter 70, 71, 75
LabVIEW 268
landing 183
livestock identification 334
localisation 318
macadamia nuts 333, 336
marine conservation 17, 23, 24
Mechatronics curricula 41, 50
Mechatronics integration 41
millimetre wave 139
mining 148, 159
minutiae direction image 194, 195
mobility 275
monocular 65, 68, 74
mononucleotide cell (MNC) 259
Monte-Carlo 65, 70
motion parallax 65, 68, 74, 75
neural network 125, 134–136
neuromorphic 66
object matching 78, 80
optical character recognition 125, 126, 133
outdoor navigation 108
particle swarm 77
PD control 49
person identification 101, 108
phase correlation 194, 196
placenta 255, 256
plant node detection 311
pneumatic muscles 209
pneumatic pressure application system 258
position estimation 181, 182, 190
Quad rotor 28
radar 139–142
robot planning 88
robot workspace 231, 232, 243
robotic person following 99, 100, 107
sensing 139
sensing dynamic loading 226
serpentine movement 203
Shadow group air muscles 210
simulation 275
simultaneous translation and rotation 3
six-legged robot rotation about instantaneous centre 13
snake locomotion 201
spectacles 111, 112, 120
s-psi 335
standalone system 270, 271
sun and planet wheel model 9
tactile sensing array 219
template 325
template matching 133–135
tomography 292
UAV 181, 184, 188, 190
UAV dynamics 28
ultrasound (HIFU) 289
umbilical cord blood (UCB) 255
underwater monitoring 167
underwater robots 18
Viennese waltz pattern 6
vision guidance 341
visual prosthesis 275, 276
visualisation 139, 141
Visual-servoing 43, 49