Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Alfred Kobsa University of California, Irvine, CA, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen TU Dortmund University, Germany Madhu Sudan Microsoft Research, Cambridge, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Gerhard Weikum Max Planck Institute for Informatics, Saarbruecken, Germany
7066
Halimah Badioze Zaman Peter Robinson Maria Petrou Patrick Olivier Timothy K. Shih Sergio Velastin Ingela Nyström (Eds.)
Visual Informatics: Sustaining Research and Innovations Second International Visual Informatics Conference, IVIC 2011 Selangor, Malaysia, November 9-11, 2011 Proceedings, Part I
Volume Editors Halimah Badioze Zaman Universiti Kebangsaan Malaysia, Bangi, Malaysia;
[email protected] Peter Robinson University of Cambridge, UK;
[email protected] Maria Petrou Imperial College, London, UK;
[email protected] Patrick Olivier Newcastle University, Newcastle upon Tyne, UK;
[email protected] Timothy K. Shih National Central University, Jhongli City, Taiwan;
[email protected] Sergio Velastin Kingston University, UK;
[email protected] Ingela Nyström Uppsala University, Sweden;
[email protected]
ISSN 0302-9743 e-ISSN 1611-3349 ISBN 978-3-642-25190-0 e-ISBN 978-3-642-25191-7 DOI 10.1007/978-3-642-25191-7 Springer Heidelberg Dordrecht London New York Library of Congress Control Number: 2011940133 CR Subject Classification (1998): I.4, I.5, I.2.10, I.3.5, I.3.7, I.7.5, F.2.2 LNCS Sublibrary: SL 6 – Image Processing, Computer Vision, Pattern Recognition, and Graphics
© Springer-Verlag Berlin Heidelberg 2011 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Preface
Visual informatics is currently a multidisciplinary field that is well accepted among researchers and industry in computer science, information technology and engineering. The basic areas of research, such as virtual image processing and engineering; computer vision and simulation; visual computing; and visualization and social computing, have been applied in various domains such as education, medical and health, finance, agriculture and security. We currently also see various Centres of Excellence (CoE) in the field of visual informatics being established in various institutions of higher learning (IHLs) around the world (Europe, USA and UK). Malaysia has just established a similar CoE called the Institute of Visual Informatics (IVI) at Universiti Kebangsaan Malaysia (UKM) or The National University of Malaysia. It is therefore important that researchers from these various CoEs, research institutes, technology centers and industry form networks and share and disseminate new knowledge in this field for the benefit of society. It is for this reason that the Institute of Visual Informatics (IVI) at UKM decided to host the Second International Visual Informatics Conference (IVIC 2011), to bring together experts in this very important research area so that more concerted efforts can be undertaken not just locally but globally. The first IVIC held in 2009 brought together experts from Asia, the UK, Oceania and the USA. This time we also managed to bring in an expert from Sweden. Like the first IVIC, this time too the conference was conducted collaboratively by the visual informatics community from various public and private universities and industry. The second conference was co-sponsored by the Malaysian Information Technology Society (MITS), Multimedia Development Corporation (MDeC), and the Malaysian Research Education Network (MyREN). The conference was co-chaired by seven professors from four different countries (UK, Sweden, Taiwan and Malaysia). The theme of the conference, ‘Visual Informatics: Sustainable Innovations for Wealth Creation,’ reflects the importance of bringing research from ‘laboratories to the market.’ It also portrayed the shared belief of the organizers (both locally and globally) of the importance of creating a seamless value-chain R&D ecosystem: from fundamental, applied research to ‘proof of concept’ and commercialization. With the slow economic trend experienced around the world today, research and innovation are more important than ever before in creating high-income jobs in order to accelerate economic growth. Thus, the relevance of the theme of the conference was apt and timely. The conference focused on four tracks related to the basic areas of visual informatics over two days (November 9 and 10, 2011) and ended with a one-day workshop (November 11, 2011). There were four keynote speakers and 75 paper presentations based on topics covered by the four main tracks mentioned earlier.
The reviewing of the papers was conducted by experts who represented a 150-member Program Committee from Asia, Europe, Oceania and North America. Each paper was reviewed by three reviewers and the rejection rate was 60%. The reviewing process was managed using an electronic conference management system (CoMS™) created by the Institute of Visual Informatics, UKM. On behalf of the Organizing and Program Committees of IVIC 2011, we thank all authors for their submissions and camera-ready copies of papers, and all participants for their thought-provoking ideas and active participation in the conference. We also thank the Vice-Chancellor of UKM (host University), and Vice-Chancellors and Deans of all IT faculties of the IHLs for their support in organizing this conference. We also acknowledge the sponsors, members of the Organizing Committees, Program Committee members, support committees and individuals who gave their continuous help and support in making the conference a success. We fervently believe that IVIC will grow from strength to strength and we also hope that one day it will be held in different host countries in Asia, Europe, Oceania or North America. We also hope that IVIC will continue to provide a stimulating and enriching platform for research and innovations that will transcend religions, cultures, race and beliefs to contribute to better general human well-being.
November 2011
Halimah Badioze Zaman Peter Robinson Maria Petrou Patrick Olivier Timothy Shih Sergio Velastin Ingela Nyström
Organization
The Second International Visual Informatics Conference (IVIC 2011) was organized by the Institute of Visual Informatics and Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, in collaboration with 13 local public and private universities in Malaysia, the Malaysian Information Technology Society (MITS), Multimedia Development Corporation (MDeC), Malaysian Institute of Microelectronic Systems (MIMOS) and Malaysian Research Educational Network (MYREN).
Local Executive Committee General Chair Deputy Chair Secretary Assistant Secretary I Assistant Secretary II Treasurer Assistant Treasurer I Assistant Treasurer II
Halimah Badioze Zaman (UKM) Fatimah Dato’Ahmad (UPNM) Azlina Ahmad (UKM) Nazlena Mohamad Ali (UKM) Mohd M. Kadhum (UUM) Haslina Arshad (UKM) Rabiah Abdul Kadir (UPM) Syaimak Abd Shukor (UKM)
Program Committee Program Co-chairs Halimah Badioze Zaman Peter Robinson Patrick Olivier Ingela Nyström Maria Petrou Timothy Shih Sergio Velastin
Universiti Kebangsaan Malaysia, Malaysia University of Cambridge, UK University of Newcastle Upon-Tyne, UK Uppsala University, Sweden Imperial College, UK National Central University, Taiwan Kingston University, UK
Members/Referees Europe Ahmad Khurshid Alan Smeaton Burkhard Wuensche Daniel Thalmann
Edie Rasmussen Gregor Rainer Hassan Ugail Ingela Nyström
Jian Jun Zheng Jonathon Furner Ligang He Neil Andrew Gordon Peter Robinson Rainer Malaka Sergio Velastin Tony Pridmore Ann Blandford Carol Peters Donatella Castelli Gerald Schaefer
Harold Thimbleby Ingeborg Solvberg Jian J. Zhang John Wilson Keith van Rijsbergen Maria Petrou Patrick Olivier Qingde Li Roy Sterritt Stephen McKenna Wenyu Liu
USA Archan Misra Dick Simmons Hsinchun Chen Josep Torrellas Michael H. Hinchey Per-Ake (Paul) Larson Vicky Markstein
Carl K. Chang Eric Wong James Hughes Joseph Urban Paul R. Croll T. Kesavadas
Asia and Oceania Abd Razak Yaakub Abdul Razak Hamdan Abdul Samad Hasan Basari Abdul Samad Shibghatullah Abdullah Gani Aboamama Atahar Ahmed Alex Orailoglu Amirah Ismail Ang Mei Choo Anup Kumar Asim Smailagic Azizah Jaafar Azizi Abdullah Azlina Ahmad Azreen Azman Azurah Abu Samah
Bahari Belaton Bryon Purves Burairah Hussin Burhanuddin Mohd Aboobaider Chen Chwen Jen Choo Wou Onn Choo Yun Huoy Christopher C. Yang Chung Jen-Yao Dayang Norhayati Abg Jawawi Dayang Rohaya Awang Rambli Dhanesh Ramachandram Dzulkifli Mohamad Edwin Mit Effirul Ikhwan Ramlan
Elankovan A. Sundararajan Faaizah Shahbodin Faieza Abdul Aziz Farid Ghani Fatimah Dato’ Ahmad Faudziah Ahmad Hajah Norasiken Bakar Halimah Badioze Zaman Hamid Abdalla Jalab Hanspeter Pfister Haslina Arshad Hwee Hua Pang Jane Labadin Jie-Wu Juan Antonio Carballo Khairulmizam Samsudin Lai Jian Ming Li Jian Zhong
Lili Nurliyana Abdullah Ling Teck Chaw Li-Zhu Zhou M. Iqbal Bin Saripan Maizatul Hayati Mohamad Yatim Maryam Nazari Masatoshi Yoshikawa Masnizah Mohd Mazleena Salleh Md. Nazrul Islam Mohamad Ishak Desa Mohammad Khatim Hasan Mohd Faizal Abdollah Mohd Khanapi Abdul Ghani Mohd Shafry Mohd Rahim Mohd. Taufik Abdullah Mun-Kew Leong Muriati Mukhtar Narhum Gershon Nazlena Mohamad Ali Nazlia Omar Ng Giap Weng Ning Zhong Nor Aniza Abdullah Nor Azan Hj Mat Zin Nor Faezah M. Yatim Nor Hasbiah Ubaidullah Norafida Ithnin Noraidah Sahari Ashaari
Norasikin Fabil Norrozila Sulaiman Norshahriah Abdul Wahab Nur’Aini Abdul Rashid Nurazzah Abd Rahman Nursuriati Jamil Osman Ghazali Patricia Anthony Phillip C.-Y. Sheu Puteh Saad Rabiah Abd Kadir Ramlah Mailok Reggie Caudill Riaza Rias Ridzuan Hussin Riza Sulaiman Roselina Sallehuddin Shahrin Sahib Shahrul Azman Mohd Noah Shahrul Azmi Mohd. Yusof Sharifah Mumtazah Syed Ahmad Abdul Rahman Sharlini R. Urs Sim Kok Swee Siti Mariyam Hj Shamsuddin Sobihatun Nur Abdul Salam Suliman Hawamdeh
Syaimak Abdul Shukor Syamsiah Mashohor Syed Nasir Alsagoff Tan Tien Ping Teddy Surya Gunawan Tengku Siti Meriam Tengku Wook Timothy Shih Tutut Herawan Wai Lam Wan Abdul Rahim Wan Mohd Isa Wan Azlan Wan Zainal Abidin Wan Fatimah Wan Ahmad Wan Mohd Nazmee Wan Zainon Wee Mee Chin Wei Zhao Willian Hayward Wong Kok Sheik Yin Chai Wang Yin-Leng Theng Zailani Mohd Nordin Zainab Abu Bakar Zainul Abidin Zaipatimah Ali Zarinah Mohd Kasirun Zulikha Jamaluddin Zulkarnain Md Ali
Local Arrangements Committee Technical Committee Head Members
Halimah Badioze Zaman (UKM) Azlina Ahmad (UKM) Muriati Mukhtar (UKM) Riza Sulaiman (UKM) Nazlena Mohamad Ali (UKM) Mohd M. Kadhum (UUM) M. Iqbal Saripan (UPM)
Haslina Arshad (UKM) Syaimak Abd Shukor (UKM) Rabiah Abdul Kadir (UPM) Elankovan A. Sundararajan (UKM) Norazan Mat Zin (UKM) Mohammad Khatim Hassan (UKM) Wan Mohd Nazmee Wan Zainon (USM) Tengku Siti Meriam Tengku Wook (UKM) Azizah Jaafar (UKM) Bahari Belaton (USM) Wan Fatimah Wan Ahmad (UTP) Fatimah Ahmad (UPNM) Noraidah Sahari Ashaari (UKM) Ang Mei Choo (UKM) Nursuriati Jamil (UiTM) Syed Nasir Alsagoff (UPNM) Azreen Azman (UPM) Dayang Rohaya Bt Awang Rambli (UTP) Suziah Sulaiman (UTP) Riaza Mohd Rias (UiTM) Faaizah Shahbodin (UTeM) Hajah Norasiken Bakar (UTeM) Norshahriah Wahab (UPNM) Nurazzah Abdul Rahman (UiTM) Publicity Head Members
Logistic Head Members
Elankovan A. Sundararajan (UKM) Norazan Mat Zin (UKM) Mohammad Khatim Hassan (UKM) Wan Mohd Nazmee Wan Zainon (USM) Tengku Siti Meriam Tengku Wook (UKM) Azlina Ahmad (UKM) Aidanismah Yahya (UKM) Nurdiyana Mohd Yassin (UKM) Ang Mei Choo (UKM) Nursuriati Jamil (UiTM) Syed Nasir Alsagoff (UPNM) Riaza Mohd Rias (UiTM) Maslina Abdul Aziz (UiTM) Norshahriah Wahab (UPNM) Mohd Hanif Md Saad (UKM)
Financial Head Members
Workshop Head Members
Secretariat
Azizah Jaafar (UKM) Halimah Badioze Zaman (UKM) Azlina Ahmad (UKM) Wan Fatimah Wan Ahmad (UTP) Fatimah Dato Ahmad (UPNM) Riza Sulaiman (UKM) Wan Mohd Nazmee Wan Zainon (USM) Faaizah Shahbodin (UTeM) Choo Wou Onn (UTAR) Nurul Aini Kasran (UKM) Aw Kien Sin (UKM)
Conference Management System (CoMS™)
Institute of Visual Informatics, Universiti Kebangsaan Malaysia
Sponsoring Institutions Universiti Kebangsaan Malaysia (UKM) Universiti Putra Malaysia (UPM) Universiti Sains Malaysia (USM) Universiti Teknologi PETRONAS (UTP) Universiti Teknologi MARA (UiTM) Universiti Pertahanan Nasional Malaysia (UPNM) Universiti Teknologi Malaysia (UTM) Universiti Malaysia Sarawak (UNIMAS) Universiti Malaya (UM) Universiti Utara Malaysia (UUM) Universiti Teknikal Malaysia (UTeM) Universiti Tunku Abdul Rahman (UTAR) Multimedia University (MMU) Malaysian Information Technology Society (MITS) Multimedia Development Corporation (MDeC) Malaysian Research Educational Network (MyREN) Malaysian Institute of Microelectronics (MIMOS)
Table of Contents – Part I
Keynotes Visualization and Haptics for Interactive Medical Image Analysis: Image Segmentation in Cranio-Maxillofacial Surgery Planning . . . . . . . . . Ingela Nyström, Johan Nysjö, and Filip Malmberg
1
Evaluation of Unsupervised Segmentation Algorithms for Silhouette Extraction in Human Action Video Sequences . . . . . . . . . . . . . . . . . . . . . . . Adolfo Martínez-Usó, G. Salgues, and S.A. Velastin
13
Video Forgery and Motion Editing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Joseph C. Tsai and Timothy K. Shih
23
Computer Vision and Simulation Improved Incremental Orthogonal Centroid Algorithm for Visualising Pipeline Sensor Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A. Folorunso Olufemi, Mohd Shahrizal Sunar, and Normal Mat Jusoh 3D Visualization of Simple Natural Language Statement Using Semantic Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rabiah Abdul Kadir, Abdul Rahman Mad Hashim, Rahmita Wirza, and Aida Mustapha Character Recognition of License Plate Number Using Convolutional Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Syafeeza Ahmad Radzi and Mohamed Khalil-Hani Simulation Strategy of Membrane Computing to Characterize the Structure and Non-deterministic Behavior of Biological Systems: A Case Study with Ligand-Receptor Network of Protein TGF-β . . . . . . . Muniyandi Ravie Chandren and Mohd. Zin Abdullah Development of 3D Tawaf Simulation for Hajj Training Application Using Virtual Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mohd Shafry Mohd Rahim, Ahmad Zakwan Azizul Fata, Ahmad Hoirul Basori, Arief Salleh Rosman, Tamar Jaya Nizar, and Farah Wahida Mohd Yusof A Grammar-Based Process Modeling and Simulation Methodology for Supply Chain Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mohsen Mohammadi, Muriati Bt. Mukhtar, and Hamid Reza Peikari
24
36
45
56
67
77
A Parallel Coordinates Visualization for the Uncapaciated Examination Timetabling Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . J. Joshua Thomas, Ahamad Tajudin Khader, and Bahari Belaton A Modified Edge-Based Region Growing Segmentation of Geometric Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nursuriati Jamil, Hazwani Che Soh, Tengku Mohd Tengku Sembok, and Zainab Abu Bakar Comparison on Performance of Radial Basis Function Neural Network and Discriminant Function in Classification of CSEM Data . . . . . . . . . . . . Muhammad Abdulkarim, Afza Shafie, Radzuan Razali, Wan Fatimah Wan Ahmad, and Agus Arif Simulation for Laparoscopy Surgery with Haptic Element for Medical Students in HUKM: A Preliminary Analysis . . . . . . . . . . . . . . . . . . . . . . . . . A.R. Norkhairani, Halimah Badioze Zaman, and Azlina Ahmad
87
99
113
125
Virtual Image Processing and Engineering Detection and Classification of Granulation Tissue in Chronic Ulcers . . . Ahmad Fadzil M. Hani, Leena Arshad, Aamir Saeed Malik, Adawiyah Jamil, and Felix Yap Boon Bin
139
New Color Image Histogram-Based Detectors . . . . . . . . . . . . . . . . . . . . . . . . Taha H. Rassem and Bee Ee Khoo
151
Digital Training Tool Framework for Jawi Character Formation . . . . . . . . Norizan Mat Diah and Nor Azan Mat Zin
164
Empirical Performance Evaluation of Raster to Vector Conversion with Different Scanning Resolutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bilal Abdulrahman T. Al-Douri, Hasan S.M. Al-Khaffaf, and Abdullah Zawawi Talib
176
Visualizing the Construction of Incremental Disorder Trie Itemset Data Structure (DOSTrieIT) for Frequent Pattern Tree (FP-Tree) . . . . . . . . . . . Zailani Abdullah, Tutut Herawan, and Mustafa Mat Deris
183
The Gradient of the Maximal Curvature Estimation for Crest Lines Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pan Zheng, Bahari Belaton, Iman Yi Liao, and Zainul Ahmad Rajion
196
AdaBoost-Based Approach for Detecting Lithiasis and Polyps in USG Images of the Gallbladder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Marcin Ciecholewski
206
Assessing Educators’ Acceptance of Virtual Reality (VR) in the Classroom Using the Unified Theory of Acceptance and Use of Technology (UTAUT) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Niwala Haswita Hussin, Jafreezal Jaafar, and Alan G. Downe A Fuzzy Similarity Based Image Segmentation Scheme Using Self-organizing Map with Iterative Region Merging . . . . . . . . . . . . . . . . . . . Wooi-Haw Tan, Gouenou Coatrieux, Basel Solaiman, and Rosli Besar Enhancing Learning Experience of Novice Surgeon with Virtual Training System for Jig an Fixture Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . Intan Syaherra Ramli, Haslina Arshad, Abu Bakar Sulong, Nor Hamdan Mohd. Yahaya, and Che Hassan Che Haron
216
226
238
Improved Gait Recognition with Automatic Body Joint Identification . . . Tze-Wei Yeoh, Wooi-Haw Tan, Hu Ng, Hau-Lee Tong, and Chee-Pun Ooi
245
A Real-Time Vision-Based Framework for Human-Robot Interaction . . . Meng Chun Lam, Anton Satria Prabuwono, Haslina Arshad, and Chee Seng Chan
257
Automated Hemorrhage Slices Detection for CT Brain Images . . . . . . . . . Hau-Lee Tong, Mohammad Faizal Ahmad Fauzi, and Su-Cheng Haw
268
CBIR for an Automated Solid Waste Bin Level Detection System Using GLCM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Maher Arebey, M.A. Hannan, R.A. Begum, and Hassan Basri Image Enhancement of Underwater Habitat Using Color Correction Based on Histogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Norsila bt Shamsuddin, Wan Fatimah bt Wan Ahmad, Baharum b Baharudin, Mohd Kushairi b Mohd Rajuddin, and Farahwahida bt Mohd A New Application for Real-Time Shadow and Sun’s Position in Virtual Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hoshang Kolivand and Mohd Shahrizal Bin Sunar A New and Improved Size Detection Algorithm for Acetabular Implant in Total Hip Replacement Preoperative Planning . . . . . . . . . . . . . . . . . . . . . Azrulhizam Shapi’i, Riza Sulaiman, Mohammad Khatim Hasan, Anton Satria Prabuwono, Abdul Yazid Mohd Kassim, and Hamzaini Abdul Hamid Visual Application in Multi-touch Tabletop for Mathematics Learning: A Preliminary Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Khoo Shiang Tyng, Halimah Badioze Zaman, and Azlina Ahmad
280
289
300
307
319
Analysing Tabletop Based Computer Supported Collaborative Learning Data through Visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ammar Al-Qaraghuli, Halimah Badioze Zaman, Patrick Olivier, Ahmed Kharrufa, and Azlina Ahmad High Order Polynomial Surface Fitting for Measuring Roughness of Psoriasis Lesion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ahmad Fadzil M. Hani, Esa Prakasa, Hurriyatul Fitriyah, Hermawan Nugroho, Azura Mohd Affandi, and Suraiya Hani Hussein Modelling of Reflectance Spectra of Skin Phototypes III . . . . . . . . . . . . . . M.H. Ahmad Fadzil, Hermawan Nugroho, Romuald Jolivot, Franck Marzani, Norashikin Shamsuddin, and Roshidah Baba
329
341
352
Virtual Method to Compare Treatment Options to Assist Maxillofacial Surgery Planning and Decision Making Process for Implant and Screw Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yuwaraj Kumar Balakrishnan, Alwin Kumar Rathinam, Tan Su Tung, Vicknes Waran, and Zainal Ariff Abdul Rahman
361
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
369
Table of Contents – Part II
Visual Computing Capturing Mini Brand Using a Parametric Shape Grammar . . . . . . . . . . . Mei Choo Ang, Huai Yong Chong, Alison McKay, and Kok Weng Ng Development and Usability Evaluation of Virtual Environment for Early Diagnosis of Dementia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Syadiah Nor Wan Shamsuddin, Hassan Ugail, and Valerie Lesk Usability Study of Mobile Learning Course Content Application as a Revision Tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ahmad Sobri Hashim, Wan Fatimah Wan Ahmad, and Rohiza Ahmad Game Design Framework: A Pilot Study on Users’ Perceptions . . . . . . . . Ibrahim Ahmad and Azizah Jaafar
1
13
23
33
The Development of History Educational Game as a Revision Tool for Malaysia School Education . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A.A.R. Hadi, Wan Mohd Fazdli Wan Daud, and Nurul Huda Ibrahim
39
Ontology Construction Using Computational Linguistics for E-Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . L. Jegatha Deborah, R. Baskaran, and A. Kannan
50
Eye Tracking in Educational Games Environment: Evaluating User Interface Design through Eye Tracking Patterns . . . . . . . . . . . . . . . . . . . . . Nurul Hidayah Mat Zain, Fariza Hanis Abdul Razak, Azizah Jaafar, and Mohd Firdaus Zulkipli Use of Content Analysis Tools for Visual Interaction Design . . . . . . . . . . . Nazlena Mohamad Ali, Hyowon Lee, and Alan F. Smeaton Improving Accessibility through Aggregative E-Learning for All Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Khairuddin Kamaludin, Noor Faezah Mohd Yatim, and Md. Jan Nordin Exploiting the Query Expansion through Knowledgebases for Images . . . Roohullah and J. Jaafar Usability Evaluation for ‘Komputer Saya’: Multimedia Courseware for Slow Learners . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Norfarhana Abdollah, Wan Fatimah Wan Ahmad, and Emelia Akashah Patah Akhir
64
74
85
93
104
Reconstruction of 3D Faces Using Face Space Coefficient, Texture Space and Shape Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sheng Hung Chung and Ean Teng Khor Research Finding for Usability Testing on ILC-WBLE . . . . . . . . . . . . . . . . Ming Chee Hoh, Wou Onn Choo, and Pei Hwa Siew Factors Affecting Undergraduates’ Acceptance of Educational Game: An Application of Technology Acceptance Model (TAM) . . . . . . . . . . . . . . Roslina Ibrahim, Rasimah Che Mohd Yusoff, Khalili Khalil, and Azizah Jaafar
114 123
135
Usability of Educational Computer Game (Usa ECG): Applying Analytic Hierarchy Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hasiah Mohamed Omar and Azizah Jaafar
147
Visual Learning through Augmented Reality Storybook for Remedial Student . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hafiza Abas and Halimah Badioze Zaman
157
Visualisation and Social Computing Preliminary Study on Haptic Approach in Learning Jawi Handwriting Skills . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Maizan Mat Amin, Halimah Badioze Zaman, and Azlina Ahmad Scaffolding in Early Reading Activities for Down Syndrome . . . . . . . . . . . Rahmah Lob Yussof and Halimah Badioze Zaman
168 180
EduTism: An Assistive Educational System for the Treatment of Autism Children with Intelligent Approach . . . . . . . . . . . . . . . . . . . . . . . . . . I. Siti Iradah and A.K. Rabiah
193
Investigating the Roles of Assistance in a Digital Storytelling Authoring System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jumail, Dayang Rohaya Awang Rambli, and Suziah Sulaiman
205
MYNDA - An Intelligent Data Mining Application Generator . . . . . . . . . Zulaiha Ali Othman, Abdul Razak Hamdan, Azuraliza Abu Bakar, Suhaila Zainudin, Hafiz Mohd Sarim, Mohd Zakree Ahmad Nazri, Zalinda Othman, Salwani Abdullah, Masri Ayob, and Ahmad Tarmizi Abdul Ghani
217
Scaffolding Poetry Lessons Using Desktop Virtual Reality . . . . . . . . . . . . . Nazrul Azha Mohamed Shaari and Halimah Badioze Zaman
231
Augmented Reality Remedial Worksheet for Negative Numbers: Subtraction Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Elango Periasamy and Halimah Badioze Zaman
242
Developing Conceptual Model of Virtual Museum Environment Based on User Interaction Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Normala Rahim, Tengku Siti Meriam Tengku Wook, and Nor Azan Mat Zin
253
Use of RSVP Techniques on Children’s Digital Flashcards . . . . . . . . . . . . . Siti Zahidah Abdullah and Nazlena Mohamad Ali
261
Cultural Learning in Virtual Heritage: An Overview . . . . . . . . . . . . . . . . . . Nazrita Ibrahim, Nazlena Mohamad Ali, and Noor Faezah Mohd Yatim
273
i-JEN: Visual Interactive Malaysia Crime News Retrieval System . . . . . . Nazlena Mohamad Ali, Masnizah Mohd, Hyowon Lee, Alan F. Smeaton, Fabio Crestani, and Shahrul Azman Mohd Noah
284
Measurement Model to Evaluate Success of E-Government Applications through Visual Relationship . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Norshita Mat Nayan, Halimah Badioze Zaman, and Tengku Mohd Tengku Sembok
295
A Conceptual Design for Augmented Reality Games Using Motion Detection as User Interface and Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . Azfar Bin Tomi and Dayang Rohaya Awang Rambli
305
FaceSnap: Game-Based Courseware as a Learning Tool for Children with Social Impairment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Y.Y. Chen, Wan Fatimah Wan Ahmad, and Nur Zareen Zulkarnain
316
A Visual Measurement Model on Human Capital and ICT Dimensions of a Knowledge Society (KS) Framework for Malaysia towards an Innovative Digital Economy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Halimah Badioze Zaman, A.H. Norsiah, Azlina Ahmad, S. Riza, M.A. Nazlena, J. Azizah, and M.C. Ang
323
Visualization of the Hadith Chain of Narrators . . . . . . . . . . . . . . . . . . . . . . Zarina Shukur, Norasikin Fabil, Juhana Salim, and Shahrul Azman Noah
340
A New Framework for Phylogenetic Tree Visualization . . . . . . . . . . . . . . . . Wan Mohd Nazmee Wan Zainon, Abdullah Zawawi Talib, and Bahari Belaton
348
Technical Skills in Developing Augmented Reality Application: Teachers’ Readiness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Norabeerah Saforrudin, Halimah Badioze Zaman, and Azlina Ahmad
360
Scenario-Based Learning Approach for Virtual Biology Laboratory (VLab-Bio) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Murniza Muhamad, Halimah Badioze Zaman, and Azlina Ahmad
371
Towards a Multimodality Ontology Image Retrieval . . . . . . . . . . . . . . . . . . Yanti Idaya Aspura Mohd Khalid, Shahrul Azman Noah, and Siti Norulhuda Sheikh Abdullah A Visual Art Education Tool to Create Logo (APH-Pensil) Based on the Fundamental Design Theory Approach . . . . . . . . . . . . . . . . . . . . . . . . . . Halimah Badioze Zaman, H. Ridzuan, Azlina Ahmad, S. Riza, M.A. Nazlena, J. Azizah, M.C. Ang, and Haslina Arshad Different Visualization Types in Multimedia Learning: A Comparative Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Riaza Mohd Rias and Halimah Badioze Zaman Optimal Command and Control Method for Malaysian Army Small Units in a Malaysian Forest Environment: Small Unit Tactical Management System (SUTaMs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Syed Nasir Alsagoff Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
382
394
408
419
429
Visualization and Haptics for Interactive Medical Image Analysis: Image Segmentation in Cranio-Maxillofacial Surgery Planning Ingela Nyström, Johan Nysjö, and Filip Malmberg Centre for Image Analysis, Uppsala University, Sweden
[email protected]
Abstract. A central problem in cranio-maxillofacial (CMF) surgery is to restore the normal anatomy of the skeleton after defects, e.g., trauma to the face. With careful pre-operative planning, the precision and predictability of the craniofacial reconstruction can be significantly improved. In addition, morbidity can be reduced thanks to shorter operation time. An important component in surgery planning is to be able to accurately measure the extent of anatomical structures. Of particular interest are the shape and volume of the orbits (eye sockets). These properties can be measured in 3D CT images of the skull, provided that an accurate segmentation of the orbits is available. Here, we present a system for interactive segmentation of the orbit in CT images. The system utilizes 3D visualization and haptic feedback to facilitate efficient exploration and manipulation of 3D data.
1 Introduction
A central problem in cranio-maxillofacial (CMF) surgery is to restore the normal anatomy of the facial skeleton after defects, e.g., malformations, tumours, and trauma to the face. CMF surgery can be exceedingly difficult and time-consuming to perform when a fracture displaces bone segments from their proper position, when bone segments are missing, or when a bone segment is located in such a position that any attempt to restore it into its original position poses considerable risk for causing further damage to vital anatomical structures, e.g., the eye or the central nervous system. There is ample evidence that careful preoperative planning can significantly improve the precision and predictability of CMF surgery as well as reduce the post-operative morbidity. In addition, the time in the operating room can be reduced and thereby also costs. An important component in surgery planning is to be able to accurately measure the extent of certain anatomical structures. Of particular interest in CMF surgery planning is to measure the shape and volume of the orbit (eye socket), comparing an intact side with an injured side. These properties can be measured in three-dimensional (3D) computed tomography (CT) images of the skull. This, however, requires accurate segmentation of the orbits. Today, orbit segmentation is usually performed by experts in CMF surgery planning who manually trace the orbit boundaries in a large number of CT
image slices. This manual segmentation method is accurate but time-consuming, tedious, and sensitive to operator errors. Fully automatic orbit segmentation, on the other hand, is difficult to achieve, mainly because of the high shape variability of the orbit, the thin nature of the orbital walls, the lack of an exact definition of the orbital opening, and the presence of CT imaging artifacts such as noise and partial volume effects. To overcome these issues, we propose an interactive, or semi-automatic, method for orbit segmentation, where a human guides the segmentation process. When we interact with real-life 3D objects, the sense of touch (haptics) is an essential addition to our visual perception. During the last two decades, the development of haptic devices and haptic rendering algorithms has made it possible to create haptic interfaces in which the user can feel, touch, and manipulate virtual 3D objects in a natural and intuitive way. It has been demonstrated [12] that such interfaces greatly facilitate interactive segmentation of objects in medical volume images. Here, we present a system for interactive segmentation of the orbits in CT images. The system first extracts the boundaries of the orbital bone structures and then segments the orbit by fitting an interactive deformable simplex mesh to the extracted boundaries. A graphical user interface (GUI) with volume visualization tools and haptic feedback allows the user to efficiently explore the input CT image, define anatomical landmarks, and guide the deformable simplex mesh through the segmentation.
2 Segmentation Method
In this Section, the proposed method for orbit segmentation is described. The method consists of the following main steps:

1. Segment the bone structures around the orbit.
2. Calculate, based on user defined anatomical landmarks, a planar barrier that defines the extent of the orbital opening.
3. Fit a deformable surface model to the orbit, using the data obtained in the previous steps.

The steps of the segmentation method are illustrated in Figure 1. In the remainder of this Section, the above steps are described in detail.
2.1 Bone Segmentation
In this step, we extract a binary image representing the bone structures around the orbit. Since the intensity of bone is well-defined in CT images, the bone structures may be segmented using hysteresis thresholding [1] with fixed threshold values. See Figure 2. A one-voxel-thick boundary is extracted from the segmented bone region using mathematical morphology, resulting in a boundary map.
Fig. 1. Overview of the proposed method for interactive orbit segmentation. (a) The user defines the extent of the orbital opening by placing four landmarks on the orbital rim. Haptic feedback facilitates the landmark positioning. (b) After the landmarks have been positioned, the user interactively initializes a deformable simplex mesh as a coarse sphere inside the orbit. (c) The simplex mesh is deformed, using information from the underlying CT data, to fit the orbit. If necessary, the user can guide the deformation process interactively, using the haptic stylus to apply (interactive) forces on selected mesh faces. (d) Resulting orbit segmentation.
Fig. 2. Segmentation of bone structures by hysteresis thresholding. (Left) Close-up image of the orbit in an axial slice of the volume. (Right) Result of hysteresis thresholding.
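As a concrete illustration of this step, the sketch below applies hysteresis thresholding to a CT volume and extracts a one-voxel-thick boundary with scikit-image and NumPy. The threshold values and function name are illustrative placeholders, not the fixed values used by the authors.

```python
import numpy as np
from skimage.filters import apply_hysteresis_threshold
from skimage.morphology import binary_erosion

def bone_boundary_map(ct, low=200, high=400):
    # Voxels above `high` act as bone seeds; voxels above `low` are kept
    # only if connected to a seed (hysteresis thresholding [1]).
    bone = apply_hysteresis_threshold(ct, low, high)
    # One-voxel-thick boundary: bone voxels removed by a single erosion.
    boundary = bone & ~binary_erosion(bone)
    return bone, boundary
```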
2.2 Defining the Orbital Opening
Inside the skull, the orbit is surrounded by bone that will prevent the deformable model from extending too far. At the orbital opening, however, there are no bone structures preventing the deformable model from growing indefinitely. Thus, we must explicitly define the extent of the orbital opening. The user selects four landmarks on the orbital rim. These landmarks are subsequently used to define a barrier, beyond which the deformable model is not allowed to grow. The barrier is calculated in two steps:

1. An implicit planar barrier is fitted to the four landmarks through singular value decomposition (SVD). The resulting barrier has three attributes: a normal, a centroid, and a radius. The centroid is defined as the mean value of the landmark positions. The radius is defined as the maximum distance between the centroid and the landmarks.
2. The three attributes are used to generate an explicit triangle mesh for the barrier. This mesh is rasterized, and the resulting voxels are added to the boundary map.
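The first step can be computed directly from the four user-placed landmarks with NumPy; the minimal sketch below follows the description above, with the function name and array layout being our own assumptions rather than the paper's code.

```python
import numpy as np

def fit_barrier(landmarks):
    """landmarks: (4, 3) array of orbital-rim points in image coordinates."""
    centroid = landmarks.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance, i.e. the normal of the best-fit plane.
    _, _, vt = np.linalg.svd(landmarks - centroid)
    normal = vt[-1]
    radius = np.linalg.norm(landmarks - centroid, axis=1).max()
    return normal, centroid, radius
```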
2.3 Deformable Model
Deformable contour models (also known as 2D snakes) were introduced by Kass, et al. [4] and extended to deformable surface models (also known as 3D snakes) by Terzopoulos, et al. [9]. Since their introduction in the late 1980s, deformable surface models have been frequently used in medical image segmentation. A deformable surface model may be thought of as an elastic surface (defined in the image domain) that is driven by the minimization of a cost function, of which local or global minima ideally should correspond to object boundaries.
Fig. 3. (Left) A simplex mesh and its dual triangle mesh. (Right) Interactive deformable simplex mesh segmentation of the left orbit in a CT image. The simplex mesh is deformed by external forces computed from the initial CT image and by haptic interactive forces that the user applies to selected mesh faces. Internal forces keep the simplex mesh together and prevent it from bending too much, forming loops, or leaking through boundary gaps.
Deformable surface models make few assumptions about an object's shape and can therefore be used to segment objects of widely different shapes as well as objects of high shape variability. By constraining the extracted object boundaries to be smooth, deformable surface models also offer robustness to boundary gaps and noise [14]. These properties make deformable models a suitable tool for orbit segmentation. Several different geometric representations of deformable surface models have been proposed [5]. In this project, we have mainly used deformable 2-simplex meshes, a discrete surface model representation introduced by Delingette [2]. See Figure 3. For the remainder of this paper, we will refer to 2-simplex meshes as simplex meshes. Starting from a user-defined initial mesh, we wish to deform this mesh to accurately represent the shape of the orbit. More specifically, we seek a surface that satisfies the following properties:

1. The distance from each vertex of the mesh to the closest point in the boundary map should be minimized.
2. The surface should have some degree of smoothness.

To find a surface that matches these criteria, the initial deformable simplex mesh is deformed according to Newton's second law of motion (Newtonian evolution),

\mu \frac{\partial^2 p_i}{\partial t^2} = F_{\mathrm{damp}}(p_i) + F_{\mathrm{int}}(p_i) + F_{\mathrm{ext}}(p_i),   (1)
Fig. 4. External forces used to deform the simplex mesh. (a) A boundary map computed from a CT image of the orbit. The barrier in front of the orbit defines the orbital opening. (b) Distance transform of the boundary map. (c) External forces obtained by taking the (inverse) gradient of the distance transform.
where Fdamp is a damping force, Fint an internal force, and Fext an external force [14]. The damping force prevents oscillations of the model. The internal force assures some geometric continuity by penalizing non-smooth surfaces. The external force takes the underlying data into account by penalizing surfaces that are far away from the boundary map. More specifically, the external force at each point in the image is defined as the (inverse) gradient of a distance transform of the boundary map. See Figure 4. In addition, we include an interactive force which constrains the mesh by user supplied forces or forces that come from higher level image understanding processes. Here, these forces are contributed via the haptic device described in Section 3.2.
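The following sketch (ours, not the authors' code) shows how such an external force field can be derived with SciPy from a boolean boundary map, and how one explicit integration step of Eq. (1) might look; the damping factor gamma, the mass mu, and all function names are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def external_force_field(boundary_map):
    # Distance from every voxel to the nearest boundary/barrier voxel,
    # then the (inverse) gradient, so the force points towards the boundary.
    dist = ndimage.distance_transform_edt(~boundary_map)
    return -np.stack(np.gradient(dist), axis=-1)

def evolve_vertices(p_prev, p_curr, f_int, f_ext, gamma=0.65, mu=1.0):
    """One explicit step of Eq. (1) for all (N, 3) vertex positions.

    f_int, f_ext: callables returning (N, 3) internal/external force arrays;
    damping is folded into the (1 - gamma) factor on the velocity term.
    """
    force = f_int(p_curr) + f_ext(p_curr)
    return p_curr + (1.0 - gamma) * (p_curr - p_prev) + force / mu
```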
3 User Interface
3.1 Visualization
To visualize the CT volume image during the segmentation process, we use multi-planar reformatting (MPR) [10] and hardware accelerated direct volume rendering. See Figure 5. MPR is a simple yet effective volume visualization technique that uses arbitrarily oriented 2D planes to simultaneously display several cross-sections of a volume image. In our implementation, the CT image is reformatted in three planes, each of which is orthogonal to one of the image axes and can be translated along that axis by the user to explore the contents of the image. Direct volume rendering is used here to visualize the surface of the skull. The skull is rendered as a shaded iso-surface, using the GPU-accelerated technique described in [6]. Combined with haptic feedback, this form of visualization provides an efficient interface for positioning the anatomical landmarks used to define the barrier at the orbital opening.
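In essence, the MPR view amounts to extracting three orthogonal cross-sections from the volume array; a minimal sketch of our own, assuming a z-y-x ordered NumPy volume, is given below.

```python
def mpr_slices(volume, i, j, k):
    """Return the axial, coronal and sagittal cross-sections through
    voxel (k, j, i) of a z-y-x ordered volume array."""
    axial = volume[k, :, :]
    coronal = volume[:, j, :]
    sagittal = volume[:, :, i]
    return axial, coronal, sagittal
```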
Fig. 5. (Left) Multi-planar reformatting (MPR) of an 8-bit 512 × 512 × 133 CT image of the orbits. The user can explore the CT image by translating the three orthogonal planes along the image axes. (Right) Hardware-accelerated direct volume rendering of the skull in a CT image, here, as a shaded iso-surface.
3.2 Haptics
Haptic rendering is the process of computing and generating kinesthetic or tactile feedback in response to a user’s interaction with virtual 3D objects through a haptic device. Here, the PHANTOM Desktop haptic device (see Figure 6) from Sensable Technologies [7] is used. This device has 6 degrees-of-freedom (DOF) for input and 3 DOF for output, i.e., a position and an orientation for input and a force vector for output. The 6 DOF input facilitates natural and efficient interaction with 3D data — the user can grab, move, and rotate the volume image freely to inspect and manipulate it. In the proposed system, haptic feedback is used when placing the landmarks that define the orbital opening. The skull is then rendered as a solid haptic surface. Additionally, haptic feedback is used to convey the effects of interactive forces to the user during deformable model segmentation.
Fig. 6. (Left) The PHANTOM Desktop haptic device from Sensable Technologies. (Right) A user working with a haptic device.
Fig. 7. The software architecture of WISH and WISHH3D
4 Implementation Details
We have implemented the proposed system using WISH [12,6], an open-source software package for developing haptic-enabled image processing software. The diagram in Figure 7 illustrates the software architecture of WISH. The core of WISH is a stand-alone C++ class library for image analysis, volume visualization, and haptics. Through an interface called WISHH3D, WISH is integrated with H3DAPI 1.5 [8], an open source scene graph API for graphics and haptics programming. H3DAPI implements the X3D [13] scene graph standard and uses OpenGL for graphics rendering and OpenHaptics [7] for haptics rendering. There are three programming interfaces to H3DAPI: X3D, Python, and C++. WISHH3D consists of H3DAPI scene graph nodes for the methods and algorithms implemented in WISH. These nodes are written in C++ and compiled into a dynamically linked library that can be loaded into H3DAPI applications via X3D files.
Table 1. Intra-operator results of the segmentation evaluation. The coefficient of variation (CV) is given in parentheses.

User  Precision       Sensitivity     Deviation
U1    0.964 (1.8 %)   0.992 (0.3 %)   1.5 (27 %)
U2    0.972 (1.3 %)   0.993 (0.4 %)   1.4 (24 %)
U3    0.969 (2.1 %)   0.993 (0.4 %)   1.4 (23 %)
Table 2. Inter-operator precision for the proposed method, compared to manual delineation. The coefficient of variation (CV) is given in parentheses.

User pair  Manual          Interactive
U1U2       0.901 (1.6 %)   0.954 (1.8 %)
U1U3       0.907 (1.5 %)   0.950 (1.9 %)
U2U3       0.930 (0.9 %)   0.969 (2.0 %)
Total      0.913 (1.9 %)   0.957 (2.1 %)

5 Experiment and Results
To evaluate the performance of our segmentation system, we asked three users to segment both orbits in a set of seven CT volume images.1 To assess the repeatability of the proposed method, each user segmented each orbit twice. Thereby, a total of 84 orbits were segmented to validate the system. Additionally, the users performed manual segmentation of each orbit using the medical imaging software ITK-SNAP. Prior to this manual segmentation, the orbital openings were defined with planar barriers as described in Section 2.2. These barriers were shown to the users during the manual segmentation. We constructed crisp ground truth segmentations by shape-based interpolation [3] averaging of the manual segmentations. The time for completing the segmentation was recorded for both the interactive and the manual segmentation. The average time for manual segmentation of an orbit was approximately 17 minutes. With the proposed interactive method, the same task was completed on average in 3 minutes, i.e., the required user time for segmentation was reduced by a factor of 5. The precision, accuracy, and efficiency of the proposed segmentation method were evaluated using the framework by Udupa et al. [11]. We measured accuracy in terms of sensitivity, i.e., true positive volume fraction, and deviation, i.e., the distance between the boundaries of two segmentations. Tables 1 and 2 show some results of the evaluation. Figure 8 shows the segmentation results in a slice from a volume where the orbit is intact. The segmentation results shown in Figure 9 illustrate the ability of the proposed method to accurately segment fractured orbits. A 3D visualization of the segmentation results is shown in Figure 10.
1 The dimension of the CT images was 512 × 512 × Nz, where Nz was in the range 26–193 slices. The slice thickness Δz was 0.4–2.0 mm and the in-plane resolution in the range 0.31–0.43 mm.
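For reference, the two accuracy figures reported in the tables can be computed from binary masks as below. This is a sketch following our reading of Udupa et al. [11], with precision taken as the overlap between two repeated segmentations; the exact formulation used by the authors may differ, and the function names are ours.

```python
import numpy as np

def sensitivity(seg, truth):
    # True positive volume fraction with respect to the ground truth mask.
    return np.logical_and(seg, truth).sum() / truth.sum()

def precision(seg_a, seg_b):
    # Repeatability of two segmentations of the same orbit:
    # intersection over union of the two binary masks.
    inter = np.logical_and(seg_a, seg_b).sum()
    union = np.logical_or(seg_a, seg_b).sum()
    return inter / union
```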
Fig. 8. Segmentation of an intact orbit. (Left) An axial slice of a crisp ground truth segmentation constructed by shape-based interpolation of three manual segmentations. (Right) Corresponding axial slice of a rasterized simplex mesh segmentation. The left and right segmentation results are similar, except that the simplex mesh segmentation reaches deeper into boundary concavities.
Fig. 9. Segmentation of a fractured orbit. (Left) An axial slice of a crisp ground truth segmentation constructed by shape-based interpolation of three manual segmentations. (Right) Corresponding axial slice of a rasterized simplex mesh segmentation. Despite this being a difficult case to segment, the left and right segmentation results are similar with only minor leakage problems at the boundary of the simplex mesh segmentation.
Fig. 10. Surface renderings of (a) a crisp “true” orbit segmentation and (b)–(d) simplex mesh segmentations performed by user U1, U2, and U3, respectively. (b)–(d) highlight the robustness of the method by showing similar results despite being segmentations by different users.
6 Conclusions and Future Work
We have presented an interactive system2 for orbit segmentation in CT images intended for planning of cranio-maxillofacial (CMF) surgery. The segmentation is driven by a deformable model in a 3D visual and haptic environment. We report on high accuracy, very high precision, and several times improved efficiency (compared with manual segmentation) in an evaluation study where three users segmented almost 100 orbits. Next, we propose to make a model for the restoration of an injured face by mirroring an intact side to the fractured side. The resulting model should be used as a template for interactive registration of the bone fragments using haptic interaction. Our bottom-line is that haptic-enabled 3D input devices offer exciting possibilities for interactive manipulation and exploration of 3D data.
2 Source code available at http://www.cb.uu.se/research/haptics/orbitproject
Acknowledgments. We are grateful to Prof. Jan-Michaél Hirsch and Dr. Elias Messo at the Department of Surgical Sciences; Oral & Maxillofacial Surgery, Uppsala University, Sweden, for providing the CT datasets. The project was supported by a grant from NovaMedTech (http://www.novamedtech.se).
References
1. Canny, J.: A Computational Approach to Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 8(6), 679–698 (1986)
2. Delingette, H.: General Object Reconstruction Based on Simplex Meshes. International Journal of Computer Vision 32(2), 111–146 (1999)
3. Herman, G.T., Zheng, J., Bucholtz, C.A.: Shape-based Interpolation. IEEE Computer Graphics and Applications 12(3), 69–79 (1992)
4. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active Contour Models. International Journal of Computer Vision 1, 321–331 (1987)
5. Montagnat, J., Delingette, H., Ayache, N.: A Review of Deformable Surfaces: Topology, Geometry and Deformation. Image and Vision Computing 19, 1023–1040 (2001)
6. Nyström, I., Malmberg, F., Vidholm, E., Bengtsson, E.: Segmentation and Visualization of 3D Medical Images through Haptic Rendering. In: Proceedings of the 10th International Conference on Pattern Recognition and Information Processing (PRIP 2009), pp. 43–48. Publishing Center of BSU (2009)
7. SensAble Technologies: SensAble (2011), http://www.sensable.com (accessed on April 27, 2011)
8. SenseGraphics AB: H3D API (2011), http://www.h3dapi.org (accessed on April 27, 2011)
9. Terzopoulos, D., Witkin, A., Kass, M.: Constraints on Deformable Models: Recovering 3D Shape and Nonrigid Motion. Artificial Intelligence 36(1), 91–123 (1988)
10. Udupa, J.K., Herman, G.T. (eds.): 3D Imaging in Medicine, 2nd edn. CRC Press, Inc. (2000)
11. Udupa, J.K., LeBlanc, V.R., Zhuge, Y., Imielinska, C., Schmidt, H., Currie, L.M., Hirsch, B.E., Woodburn, J.: A framework for evaluating image segmentation algorithms. Computerized Medical Imaging and Graphics 30(2), 75–87 (2006)
12. Vidholm, E.: Visualization and Haptics for Interactive Medical Image Analysis. Ph.D. thesis, Uppsala University (2008)
13. Web3D Consortium: X3D (2011), http://www.web3d.org/x3d/ (accessed on April 27, 2011)
14. Xu, C., Prince, J.L.: Snakes, Shapes, and Gradient Vector Flow. IEEE Transactions on Image Processing 7(3), 359–369 (1998)
Evaluation of Unsupervised Segmentation Algorithms for Silhouette Extraction in Human Action Video Sequences Adolfo Martínez-Usó, G. Salgues, and S.A. Velastin 1 Institute of New Imaging Technologies Universitat Jaume I, 12071 Castellón, Spain
[email protected] 2 École Nationale Supérieure de Physique de Strasbourg 67412 Illkirch, France 3 Digital Imaging Research Centre (DIRC) Faculty of Computing, Information Systems and Mathematics Kingston University, Surrey, KT1 2EE, UK
[email protected]
Abstract. The main motivation of this work is to find and evaluate solutions for generating binary masks (silhouettes) of foreground targets in an automatic way. To this end, four renowned unsupervised image segmentation algorithms are applied to foreground segmentation. A comparison among these algorithms is carried out using the MuHAVi dataset of multi-camera human action video sequences. This dataset presents significant challenges in terms of harsh illumination resulting for example in high contrast and deep shadows. The segmentation results have been objectively evaluated against manually derived ground-truth silhouettes.
1 Introduction

The growing importance of imaging technology for many applications has widely been recognised in the literature, nowadays being a promising research field which needs further development before practical deployment [5]. The importance of this research has been highlighted with the rapid increase of the use of CCTV systems almost everywhere. These non-intrusive imaging devices are able to detect objects or identify people in a relatively cheap way. However, the huge amounts of data generated by CCTV cameras require sophisticated image analysis techniques to potentially recognise significant events, objects or people in a video sequence. There exist many image analysis methodologies based on video sequences. These methodologies are being developed for applications such as motion capture, surveillance, traffic monitoring, video-based biometric person authentication, etc. An essential step in video-based applications is foreground segmentation, which is the process that extracts a target of interest from a given image sequence. The quality of the final result produced by a video-based application depends mostly on the accuracy obtained when segmenting the target [2]. There have been many attempts over the years to segment the foreground, mostly based on features derived from movement or background
subtraction [13,9]. Some robust approaches to this task can be found [3]; however, they work in a restrictive context or under important limitations [10]. Therefore, this problem is far from being solved in a general sense, and novel approaches in this field are needed in order to find effective solutions, particularly in noisy or low-contrast conditions. The foreground segmentation task can be divided into two parts: foreground detection and foreground segmentation. The work presented here focuses on the latter part of this task, also known as silhouette extraction, through a comparison of competing algorithms. The experimental part of the work is based on the public MuHAVi human action recognition dataset [12]. In our approach, we consider a video sequence as a set of individual images (frames). The proposed framework starts from a coarse foreground detection using a state-of-the-art algorithm [15] that finds out which rectangular part (blob) of the image will be segmented. The segmentation algorithms used in the comparison are well known and widely used for image segmentation of static images [4,6,7,11]. All these algorithms are unsupervised segmentation methods that can be applied to any research field. The final segmentation results of each algorithm are evaluated using three well-known metrics [14,8,15]. These metrics provide an objective measure of the quality of the segmentation by comparing it to the ground-truth. Therefore, a novel combination of foreground detection and image segmentation algorithms for foreground segmentation is proposed in this work. To the best of our knowledge, this idea has not been extensively evaluated for the segmentation of moving objects. In addition, this work also contributes to the growth of interesting solutions on this topic by identifying which of these well-accepted image segmentation algorithms performs best for this task. A major part of the MuHAVi dataset remains without a ground-truth reference, as manually labelling frame by frame is a very time-consuming task, and therefore an automatic silhouette extraction is highly desirable. Thus, in addition to comparison purposes, the other main motivations of this work are:

– Using the results for significantly improving the final foreground segmentation of the MuHAVi dataset.
– Guiding the decision on which method and parameters are appropriate for this automatic silhouette extraction.
– Establishing a baseline comparison for future researchers in the field.
2 Comparison Schema The proposed approach consists of several stages for generating the final binary silhouettes. These stages can be seen schematically in Fig. 1. Square objects represent instances whereas oval objects represent processes. Likewise, shapes filled with orange colour are data provided by the MuHAVi dataset. Firstly, a standard algorithm for computing roughly where the silhouette might be is used. This returns a rough binary mask (M ) of the foreground target and its bounding box coordinates (dashed lines in the figure). The frames thus extracted from the video sequences are cropped using the bounding box coordinates provided by the foreground
Fig. 1. Flowchart schema of the experiments. Shapes filled with orange colour are data provided by the MuHAVi dataset.
detector. The cropped part of the frame is the input data for the segmentation algorithms. This cropped part also contains the binary mask M obtained by the foreground detector. Examples of the M mask can be seen in Figs. 2, 3 and 4, framed in red. The M mask is quite rudimentary and inaccurate, so a morphological closing operation is applied to obtain a refined mask M′ in which small holes have been reduced and gaps in contours have been attenuated. The segmentation process is applied to the whole cropped area, but only those regions of the segmentation that intersect with M′ by more than a fraction p are selected to be part of the final foreground mask. The parameter p is a threshold value which has been found experimentally to work well when set to 80%. The bottom row of Fig. 2 shows the segmentation result (second column) and the final intersection result as a binary mask (fourth column). This resulting mask is computed for each frame of the video sequence. An estimate of the quality of each binary mask obtained is then calculated using the ground-truth reference of the frame and an objective evaluation metric. Finally, the average of the values obtained for each frame is given as a measure of the quality of the silhouette segmentation on each video sequence.
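The region-selection step just described can be summarised in a few lines of array code. The following is a minimal NumPy sketch, not the authors' implementation: it assumes the unsupervised segmentation is available as an integer label image and the closed mask M′ as a boolean array of the same size, and the names and the threshold value are illustrative only.

import numpy as np

def select_foreground_regions(labels, mask_closed, p=0.8):
    """Build the final binary silhouette from an unsupervised segmentation.

    labels      : 2D int array, one label per segmented region (cropped blob).
    mask_closed : 2D bool array, the morphologically closed coarse mask M'.
    p           : minimum fraction of a region that must fall inside M'
                  for the region to be kept (0.8 as in the text).
    """
    silhouette = np.zeros(labels.shape, dtype=bool)
    for region_id in np.unique(labels):
        region = labels == region_id
        overlap = np.logical_and(region, mask_closed).sum() / region.sum()
        if overlap > p:
            silhouette |= region          # region becomes part of the silhouette
    return silhouette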
3 Segmentation Algorithms and Objective Evaluation Metrics Four well-known colour segmentation algorithms are used in our comparison. All of them are general-purpose unsupervised methods whose results have been quoted often as the reference to beat [1]. These algorithms are: 1. MS, segmentation method based on the mean shift algorithm proposed by Comaniciu and Meer in [4]. 2. In [7] Felzenszwalb and Huttenlocher presented an algorithm (FH) that adaptively adjusts the segmentation criterion based on the degree of variability in neighbouring regions of the image. 3. SRM algorithm was proposed by Nock and Nielsen in [11] based on the idea of using perceptual grouping and region merging for image segmentation. 4. JSEG algorithm proposed by Deng and Manjunath in [6] provides colour-texture homogeneous regions which are useful for salient region detection.
It is important to point out that the parameters of the algorithms have been set to the default values provided in the original papers; therefore, no parameters have been tuned. The evaluation of the quality of the results has been done by comparing the resulting silhouettes (R) to the ground-truth binary mask (GT). To this end, three objective metrics have been used:
– The first metric is an overlapping measure (OV) that has often been used in the literature [15]. It is quite simple but efficient. For each frame k of a sequence the following expression is computed:
OV(k) = 1 − (GT ∩ R) / (GT ∪ R)    (1)
– The objective evaluation metric MPEGqm (MPEG quality measure) was proposed in [14] and has been widely adopted by the research community due to its simplicity. In addition to the error added in each frame, MPEGqm takes into account the fluctuation of the error between the current frame and the previous one. The spatial error is defined for a frame k as:
Sqm(k) = (fn + fp) / (GT ∪ R)    (2)
where fn is the number of false negative pixels and fp is the number of false positive pixels. The temporal error is defined by:
Tqm(k) = abs(Sqm(k) − Sqm(k − 1))    (3)
Finally, the MPEGqm criterion for a frame k is:
MPEGqm(k) = (Sqm(k) + Tqm(k)) / 2    (4)
– The third metric is the PST (Perceptual Spatio-Temporal) measure. It is based on objective and perceptual errors and has been shown to outperform MPEGqm [8]. Errors produced by spatial artefacts and temporal artefacts are separated, giving different weights to each. Four spatial artefacts are analysed with respect to the ground-truth reference: added regions, added background, inside holes and border holes. At the same time, temporal artefacts are studied across consecutive frames. Examples of spatial artefacts are false pixel detections on the border of the silhouettes; these artefacts are strongly penalised because they affect the shape of the silhouettes. Examples of temporal artefacts are the fluctuation of the spatial errors from one frame to the previous one, in terms of flickering and the expectation effect produced. The total annoyance found in a video object segmentation is therefore estimated by a weighted linear combination of artefacts. The PST metric has a rather long mathematical formulation, so the reader is directed to [8] for details.
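For reference, the two simpler metrics can be computed directly from binary masks. The sketch below is an illustrative NumPy reading of Eqs. (1)–(4), not the evaluation code used by the authors; the masks are assumed to be boolean arrays of identical size, and treating the temporal term of the first frame as zero is an assumption.

import numpy as np

def ov(gt, r):
    """Overlap error, Eq. (1): 1 - |GT ∩ R| / |GT ∪ R| (0 = perfect match)."""
    inter = np.logical_and(gt, r).sum()
    union = np.logical_or(gt, r).sum()
    return 1.0 - inter / union

def sqm(gt, r):
    """Spatial error of MPEGqm, Eq. (2): (fn + fp) / |GT ∪ R|."""
    fn = np.logical_and(gt, np.logical_not(r)).sum()   # missed foreground pixels
    fp = np.logical_and(np.logical_not(gt), r).sum()   # wrongly detected pixels
    return (fn + fp) / np.logical_or(gt, r).sum()

def mpeg_qm(gt_seq, r_seq):
    """Per-frame MPEGqm, Eqs. (3)-(4), averaged over a sequence of masks."""
    scores, prev_s = [], 0.0
    for k, (gt, r) in enumerate(zip(gt_seq, r_seq)):
        s = sqm(gt, r)
        t = abs(s - prev_s) if k > 0 else 0.0          # temporal fluctuation Tqm(k)
        scores.append((s + t) / 2.0)
        prev_s = s
    return float(np.mean(scores))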
4 Comparison Results The MuHAVi multi-camera, multi-action dataset [12] consists of 17 action classes, 14 actors and 8 cameras, for a total of 952 video sequences. Each actor performs an action several times in the action zone. This is captured by 8 CCTV cameras located at the 4 sides and 4 corners of a rectangular platform. The video segments from each camera have been manually synchronised. Illumination is quite harsh, as the experimental setup uses standard night street lighting, intentionally so as to simulate realistic conditions and to test the robustness of segmentation and action recognition. Table 1 shows some of the actions of the dataset. For these video sequences, the MuHAVi dataset provides manually labelled silhouettes for an interval of frames as a ground-truth reference (column 4 in Table 1).

Table 1. Evaluation sequences. First column for the segmented actions, second for the actor/camera number, third for the sequence identifier and fourth for the frames with ground-truth reference.

Action             Act/Cam  Seq. Id  Frames
Run stop           1/3      1        980 to 1417
Run stop           1/4      2        980 to 1417
Run stop           4/3      3        293 to 617
Run stop           4/4      4        293 to 618
Kick               1/3      5        2370 to 2910
Kick               1/4      6        2370 to 2910
Kick               4/3      7        200 to 628
Kick               4/4      8        200 to 628
Punch              1/3      9        2140 to 2606
Punch              1/4      10       2140 to 2606
Punch              4/3      11       92 to 535
Punch              4/4      12       92 to 535
Shotgun Collapse   1/3      13       319 to 1208
Shotgun Collapse   4/3      14       267 to 1103
Shotgun Collapse   4/4      15       267 to 1103
Walk Turn Back     1/3      16       216 to 681
Walk Turn Back     1/4      17       216 to 681
Walk Turn Back     4/3      18       207 to 672
Walk Turn Back     4/4      19       207 to 672
A foreground detection has been performed on these video sequences using an experimental industrial tracker [15]; however, any reasonable foreground detector (e.g. as available in OpenCV) would be valid as well. The foreground detector outputs i) the bounding box coordinates of the region of the image that will be segmented and ii) a binary mask with a rough detection of the target. Figs. 2, 3 and 4 show three examples of original frames extracted from the video sequences. These figures also show the foreground detection framed in red, the ground-truth silhouette with no frame, and the resulting silhouettes extracted from these actions framed in green. As can be seen, the resulting silhouettes significantly improve the foreground detection.
Fig. 2. Sample for ”RunStop” sequence (actor 4, camera 3, frame 392). Second row, from left to right: binary mask (M ), example of segmentation result for the SRM algorithm, ground-truth reference (GT) and final segmentation as a binary mask (R).
Some common segmentation mistakes resulting from illumination changes and shadows have been successfully avoided. The silhouettes obtained with the proposed methodology are good enough to:
– give rise to a baseline silhouette segmentation of the MuHAVi database;
– be directly used by video-based applications, for instance in human action recognition.
In Table 2, the four segmentation algorithms are compared by means of the proposed evaluation metrics. The table shows the average values of the evaluation metrics obtained by each segmentation algorithm. The comparison has been carried out using the 19 video sequences that have ground truth in the MuHAVi dataset (described in Table 1). The lower the values, the better the segmentation result in terms of similarity to the ground-truth reference. The first column shows a number identifying each sequence according to Table 1. For each action and for each evaluation measure, the average values obtained by each segmentation algorithm are shown, with the best one in bold. The SRM algorithm obtains the best rates in most rows of this table, followed by the FH algorithm. Together, they account for more than 90% of the best results.
Fig. 3. Top row shows a sample for ”WalkTurnBack” sequence (actor 1, camera 3, frame 253). Second row, from left to right: binary mask (M ), ground-truth reference (GT ) and final segmentation as a binary mask (R).
4.1 Implementation and Computational Issues One of the key implementation issues in this comparison is that three of the algorithms have been implemented in C++ (FH, JSEG and MS) whereas the SRM algorithm has been run in MATLAB. Likewise, the computing cost analysis here only considers the segmentation process and not the preliminary blob detection, so as to focus on segmentation. With this in mind, an illustrative example of the computational cost of these image segmentation algorithms is presented. It has been obtained using an Intel Core i7 (4 CPUs) at 2.80 GHz. The cropped parts of the frame images of the Punch video sequence (actor = 4, camera = 4, id = 12) have been used (the part that is actually segmented), that is, 444 images of different sizes. These cropped parts of the sequence have an average height × width of 254.77 × 91.33 pixels, with a standard deviation of 9.11 × 32.80 pixels for each dimension respectively. Table 3 gives a simple quantitative analysis of the computational cost of each algorithm using the CPU time obtained for the entire sequence (ES) and the average time per frame (PF). Note that no optimisations were carried out; a direct implementation of the algorithms described was run.¹ As can be seen, the FH and MS algorithms are computationally more affordable than the SRM and JSEG algorithms.
¹ Regarding optimisations, the authors are aware that there exist projects where the MS algorithm has even been used in a real-time application (see the CASBliP project at http://casblipdif.webs.upv.es/).
Table 2. Objective evaluation of the segmentation quality. First column for the sequence identifiers and second to fourth for the evaluation metrics. Each column of the evaluation metrics is in turn divided into four columns, one for each segmentation algorithm.
                 OV                              MPEGqm                          PST
Seq. id   FH     JSEG   MS     SRM       FH     JSEG   MS     SRM       FH     JSEG   MS     SRM
1         0.294  0.260  0.280  0.250     0.158  0.154  0.174  0.149     0.686  0.694  0.805  0.681
2         0.323  0.324  0.275  0.279     0.175  0.189  0.160  0.162     0.235  0.292  0.263  0.260
3         0.333  0.301  0.331  0.280     0.180  0.179  0.206  0.171     0.798  0.855  0.933  0.792
4         0.229  0.258  0.263  0.250     0.129  0.152  0.157  0.150     0.104  0.129  0.448  0.157
5         0.202  0.204  0.242  0.175     0.110  0.117  0.165  0.100     0.595  0.582  0.657  0.580
6         0.181  0.200  0.194  0.175     0.103  0.122  0.122  0.102     0.087  0.116  0.417  0.102
7         0.197  0.193  0.216  0.176     0.110  0.114  0.137  0.104     0.665  0.616  0.772  0.595
8         0.147  0.203  0.192  0.193     0.084  0.125  0.115  0.115     0.087  0.128  0.139  0.129
9         0.321  0.251  0.263  0.208     0.167  0.148  0.170  0.131     0.542  0.532  0.685  0.520
10        0.172  0.173  0.177  0.156     0.094  0.104  0.105  0.089     0.056  0.097  0.322  0.109
11        0.196  0.191  0.193  0.171     0.128  0.110  0.112  0.099     0.704  0.681  0.690  0.698
12        0.174  0.221  0.210  0.216     0.098  0.131  0.118  0.125     0.569  0.596  0.630  0.632
13        0.266  0.233  0.335  0.209     0.144  0.143  0.209  0.124     0.304  0.329  0.873  0.298
14        0.286  0.243  0.388  0.237     0.151  0.147  0.255  0.135     0.599  0.342  0.934  0.317
15        0.300  0.328  0.375  0.311     0.159  0.194  0.214  0.174     0.575  0.683  0.851  0.658
16        0.503  0.470  0.491  0.447     0.264  0.258  0.294  0.252     0.857  0.888  0.990  0.863
17        0.282  0.274  0.240  0.242     0.156  0.163  0.147  0.142     0.142  0.169  0.501  0.162
18        0.297  0.280  0.285  0.242     0.161  0.164  0.194  0.148     0.643  0.642  0.866  0.641
19        0.258  0.285  0.264  0.273     0.143  0.168  0.153  0.158     0.174  0.212  0.199  0.207
Table 3. Processing times for each of the analysed methods using id = 12 video sequence. First column shows the image segmentation algorithms, second column shows the time for the entire sequence (ES) whereas third column shows the average time per frame (PF).

Algorithm   Time (ES)     Time (PF)
FH          7.79 s        0.0175 s
JSEG        3 m 2.98 s    0.4121 s
MS          8.23 s        0.0185 s
SRM         1 m 33.02 s   0.2095 s
Fig. 4. Top row shows a sample for a ”ShotGunCollapse” sequence (actor 1, camera 3, frame 375), the rest of the rows, from top to bottom: binary mask (M ), ground-truth (GT ) and the final segmentation (R)
5 Conclusions In this work, a comparison of four well-known image segmentation methods has been presented. The comparison has been done in a silhouette extraction framework using the MuHAVi dataset of multi-camera human action video sequences and by means of three well-known objective evaluation metrics from the literature. The segmentation algorithms take advantage of a binary mask provided by a foreground blob detector, both to segment only the part of the frame where the target is and to select the regions that form part of the foreground silhouette. A methodology and a baseline comparison for achieving an automatic silhouette extraction have been shown in this work. Likewise, experiments demonstrate that combining foreground blob detection and image segmentation algorithms is effective in obtaining an improved binary mask of the foreground target. The comparison shows that
the SRM and FH algorithms (in this order) are the most suitable ones for this task, obtaining the best performance in almost all the tested actions. An analysis of the computational cost of each algorithm has also been offered. The MS and FH algorithms have shown very similar computational performance, being more affordable than the SRM and JSEG algorithms. A further refinement of the final silhouette extraction is planned as future work, by means of more robust foreground detectors. These results will be made available as part of the MuHAVi dataset. Acknowledgements. This work was supported by the Spanish Ministry of Science and Innovation under the project Consolider Ingenio 2010 CSD2007-00018 and under the "José Castillejo" sub-programme for the mobility of Spanish university lecturers.
References 1. Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: From contours to regions: An empirical evaluation, pp. 2294–2301 (2009) 2. Chen, C., Liang, J., Zhao, H., Hu, H., Tian, J.: Frame difference energy image for gait recognition with incomplete silhouettes. Pattern Recognition Letters 30(11), 977–984 (2009) 3. Chen, D., Denman, S., Fooes, C., Sridharan, S.: Accurate silhouettes for surveillance - improved motion egmentation using graph cuts. In: Proceedings of the 2010 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2010, pp. 369– 374. IEEE Computer Society (2010) 4. Comaniciu, D., Meer, P.: Robust analysis of feature spaces: Color image segmentation. In: IEEE Conf. Computer Vision and Pattern Recognition, pp. 750–755 (1997) 5. Davies, E.R.: Image analysis in crime: progress, problems and prospects. In: IEE Seminar Digests, pp. 105–112 (2005) 6. Deng, Y., Manjunath, B.S.: Unsupervised segmentation of color-texture regions in images and video. IEEE Trans. on PAMI 23(8), 800–810 (2001) 7. Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. International Journal of Computer Vision 59(2), 167–181 (2004) 8. Gelasca, E.D., Ebrahimi, T.: On evaluating metrics for video segmentation algorithms. Invited paper of the Workshop on Video Processing and Quality Metrics, VPQM (2006) 9. Hu, W., Tan, T., Wang, L., Maybank, S.: A survey on visual surveillance of object motion and behaviours. IEEE Trans. on System, Man, and Cybernetics, Part C: Applications and Reviews 34(3), 334–352 (2004) 10. Lee, W., Woo, W., Boyer, E.: Silhouette segmentation in multiple views. IEEE Trans. on PAMI 33(7), 1429–1441 (2011) 11. Nock, R., Nielsen, F.: Statistical region merging. IEEE Trans. on PAMI 26(11), 1452–1458 (2004) 12. Singh, S., Velastin, S.A., Ragheb, H.: MuHAVi: A multicamera human action video dataset for the evaluation of action recognition methods, pp. 48–55 (2010) 13. Wang, L., Hu, W., Tan, T.: Recent developments in human motion analysis. Pattern Recognition 36(3), 585–601 (2003) 14. Wollborn, M., Mech, R.: Refined procedure for objective evaluation of video object generation algorithms. Doc. ISO/IEC JTC1/SC29/WG11 M3448 1648 (1998) 15. Yin, F., Makris, D., Velastin, S.A., Orwell, J.: Quantitative evaluation of different aspects of motion trackers under various challenges. British Machine Vision Association (5), 1–11 (2010)
Video Forgery and Motion Editing
Joseph C. Tsai¹ and Timothy K. Shih²
¹ Multimedia Information Network Lab, Tamkang University, Taiwan
[email protected]
² Multimedia Information Network Lab, National Central University, Taiwan
[email protected]
Abstract. Video Forgery is a technique for generating fake video by altering, combining, or creating new video contents. We change the behavior of actors in a video. For instance, the outcome of a 100-meter race in the Olympic Games can be falsified. We track objects and segment motions using a modified mean shift mechanism. The resulting video layers can be played at different speeds and at different reference points with respect to the original video. In order to obtain a smooth movement of target objects, a motion interpolation mechanism is proposed based on reference stick figures (i.e., a structure of the human skeleton) and a video inpainting mechanism. The video inpainting mechanism is performed in a quasi-3D space via guided 3D patch matching. Interpolated target objects and background layers are fused. It is hard to tell whether a falsified video is the original one. In addition, in this talk, we demonstrate a new technique to allow users to change the dynamic texture used in a video background for special effect production. For instance, the dynamic textures of fire, smoke, water, cloud, and others can be edited through a series of automatic algorithms. Motion estimations of global and local textures are used. Video blending techniques are used in conjunction with a color balancing technique. The editing procedure searches for suitable patches in irregularly shaped blocks to reproduce a realistic dynamic background, such as a large waterfall, fire scene, or smoky background. The technique is suitable for making science fiction movies. We demonstrate the original and the falsified videos on our website at http://www.csie.ncu.edu.tw/~tshih. Although video falsifying may create a moral problem, our intention is to create special effects for the movie industry. Keywords: Video Forgery, Object Tracking, Motion Estimation, Special Effect Production.
Improved Incremental Orthogonal Centroid Algorithm for Visualising Pipeline Sensor Datasets A. Folorunso Olufemi, Mohd Shahrizal Sunar, and Normal Mat Jusoh Department of Graphics & Multimedia, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, 81310, Skudai, Johor
[email protected],
[email protected],
[email protected]
Abstract. Each year, millions of people suffer from the after-effects of pipeline leakages, spills, and eruptions. Leakage Detection Systems (LDS) are often used to understand and analyse these phenomena but unfortunately cannot offer a complete solution to reducing the scale of the problem. One recent approach was to collect datasets from these pipeline sensors and analyse them offline; the approach yielded questionable results due to the vast nature of the datasets. These datasets, together with the necessity for powerful exploration tools, have made most pipeline operating companies “data rich but information poor”. Researchers have therefore identified the problem of dimensionality reduction for pipeline sensor datasets as a major research issue. Hence, systematic gap-filling data mining approaches are required to transform data “tombs” into “golden nuggets” of knowledge. This paper proposes an algorithm for this purpose based on the Incremental Orthogonal Centroid (IOC). Search time for specific data patterns may be enhanced using this algorithm. Keywords: Pigging, heuristics, incremental, centroid, Visual Informatics.
1 Introduction Pipelines are essential components of the energy supply chain, and the monitoring of their integrity has become a major task for pipeline management and control systems. Nowadays pipelines are being laid over very long distances in remote areas affected by landslides and harsh environmental conditions, where soil texture that changes between different weathers increases the probability of hazards, not to mention the possibility of third-party intrusion such as vandalism and deliberate attempts to divert pipeline products. It is widely accepted that leakages from pipelines have huge environmental, cost and image impacts. Conventional monitoring techniques such as LDSs can neither offer continuous pipeline monitoring over the whole pipeline distance nor provide the required sensitivity for detecting pipeline leakages or ground movement. Leakages can have various causes, including excessive deformations caused by earthquakes, landslides, corrosion, fatigue,
material flaws or even intentional or malicious damage. Pipeline sensor datasets are structurally different and fundamentally unique for several reasons. In the first place, these data are generated asynchronously; an example of the sensor data obtained from the velocity-vane anemometer is shown in Table 1. Secondly, they are filled with noise. Thirdly, they come in unrelated units and formats, making comparison very difficult: for example, the temperature is measured in degrees Celsius while the velocity is measured in m/s. Table 1. Data Attributes and Variables from the Velocity-Vane Anemometer
Pressure (N/m²): 1.002312, 1.002202, 0.903421, 1.002212, 0.960620, 1.002801, 1.002376, …
Temp. (°C): 19.302978, 19.302990, 19.302990, 19.302978, 18.999996, 18.999996, 19.302978, 18.999996, …
Vol. (m³/h) × E-03: 0.0055546, 0.0055544, 0.0055546, 0.0055544, …
Flow Velocity (m/s): 12.002302, 12.002302, 12.003421, 12.004523, 12.005620, 12.002302, 12.002302, …
External Body Force EBF (N): 0.000344, 0.002765, 0.003452, 0.003564, 0.005423, 0.005642, …
A central problem in scientific visualisation is to develop an acceptable and resource-efficient representation for such complex datasets [1, 2, 3]. The challenges of high-dimensional datasets vary significantly across many factors and fields. Some researchers, including [4, 5], view these challenges as scientifically significant opportunities for theoretical development.
2 Literature Review Historically, the Principal Components Analysis (PCA), originally credited to Pearson (1901) and whose first appearance in modern literature dates back to the work of Hotelling (1933), was a popular approach to reducing dimensionality. It has formerly been called the Karhunen–Loève procedure, eigenvector analysis and empirical orthogonal functions. PCA is a linear technique that regards a component as a linear combination of the original variables. The goal of PCA is to find a subspace whose basis vectors correspond to the directions with maximal variances. Let X be a d×p matrix obtained from sensor datasets, for example, where d represents the individual data attributes (columns) and p the observations (or variables) being measured. Let us further denote the covariance matrix C defined over X explicitly as:
C = Σ_{xi ∈ X} (xi − x̄)(xi − x̄)^T    (1.0)
where xi ∈ X, x̄ is the mean of the xi, T denotes the transpose (the positional order of xi ∈ X), and C is the covariance matrix of the sampled data. We can thus define an objective function as:
G(W) = W^T C W    (2.0)
PCA aims to maximise this objective function G(W) in a solution space defined by:
H = {W ∈ R^{d×p}, W^T W = I}    (3.0)
It has been proved that the column vectors of W are the p leading eigenvectors of the covariance matrix C defined above (see [8]). However, for very large and massive datasets like the pipeline sensor datasets, an enhancement of the PCA called the Incremental PCA (IPCA), developed by [9, 10], could be a useful approach. The IPCA is an incremental learning algorithm with many variations, which differ in the way they incrementally update the internal representation of the covariance matrix. Although both the PCA and the IPCAs are very effective for most data mining applications, they ignore the valuable class label information in the data space and are therefore inapplicable to sensor datasets. The Linear Discriminant Analysis (LDA) emerged as another approach commonly used to carry out dimensionality reduction. Its background can be traced to the PCA, and it works by discriminating samples in their different classes. Its goal is to maximise the Fisher criterion specified by the objective function:
G(W) = (W^T Sb W) / (W^T Sw W)    (4.0)
where Sb = Σ_i p_i (m_i − m)(m_i − m)^T and Sw = Σ_i p_i E[(x − m_i)(x − m_i)^T | x ∈ c_i] are called the inter-class scatter matrix and the intra-class scatter matrix respectively. E denotes the expectation and p_i is the prior probability of a variable x belonging to attribute (class) i. W can therefore be computed by solving arg max_W G(W) in the solution space H = {W ∈ R^{d×p}, W^T W = I}; in most reports, this is accomplished by solving the generalised eigenvalue decomposition problem represented by the equation:
Sb w = λ Sw w    (5.0)
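As an illustration of Eq. (5.0), the generalised eigenproblem can be solved directly with standard numerical libraries. The snippet below is a generic sketch, not code from the paper; it assumes Sb and Sw are symmetric NumPy arrays and that Sw is positive definite.

import numpy as np
from scipy.linalg import eigh

def lda_projection(Sb, Sw, p):
    """Solve the generalised eigenproblem Sb w = lambda Sw w (Eq. 5.0) and
    return the p leading eigenvectors as the columns of W."""
    # eigh handles the symmetric-definite generalised problem; Sw must be
    # positive definite (add a small ridge, e.g. Sw + 1e-6 * I, if it is not).
    eigvals, eigvecs = eigh(Sb, Sw)
    order = np.argsort(eigvals)[::-1]     # largest generalised eigenvalues first
    return eigvecs[:, order[:p]]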
When the captured data are very large, as in the case of the sensor datasets considered in this research, LDA becomes inapplicable because it is harder and computationally expensive to determine the Singular Value Decomposition (SVD) of the covariance matrix efficiently. LDA uses the attribute label information of the samples, which has been found unsuitable for numerical datasets by many researchers, including [5]. [11] developed a variant of the LDA called the Incremental LDA (ILDA) to address the inability to handle massive datasets, but its stability for this kind of application remains an issue to date. The Orthogonal Centroid (OC) algorithm by [12, 13] is another accepted algorithm, which applies an orthogonal transformation to the centroids of the covariance matrix. It has been proved to be very effective for classification problems by [14] and it is based
on vector space computation in linear algebra, using the QR matrix decomposition, where Q is an orthogonal matrix and R is an upper triangular matrix (right triangular matrix) of the covariance matrix. The Orthogonal Centroid algorithm for dimensionality reduction has been successfully applied to text data (see [12]). But the time and space costs of QR decomposition are too expensive for large-scale data such as Web documents. Further, its application to numerical data or to multivariate and multidimensional datasets of this sort remains a research challenge to date. In 2006, a highly scalable incremental algorithm based on the OC algorithm, called the Incremental OC (IOC), was proposed by [5]. Because OC largely depends on the PCA, it is not out of place to state that the IOC is also a relaxed version of the conventional PCA. IOC is a one-pass algorithm. As dimensionality increases and defies batch algorithms, IOC becomes an immediate alternative. The increase in data dimensionality can then be treated as a continuous stream of datasets, similar to those obtainable from the velocity-vane thermo-anemometer (VVTA) sensors and other data capturing devices, and we can compute the low-dimensional representation from the given samples, one at a time, with a user-defined selection criterion called the Area of Interest (AOI) applied iteratively. This reassures us that the IOC is able to handle extremely large datasets. However, because it neglects the variables with extremely low eigenvalues, it tends to be insensitive to outliers. Unfortunately, this is the case with the kind of data used in this research. There is therefore a need to improve the IOC algorithm to accommodate the irregularities and peculiarities presented by pipeline sensor datasets. The derivation of the IOC algorithm, as well as the proposed improvement, is discussed in detail in the following subsections.
3 IOC Derivation and the Proposed (HPDR) Improvement Basic Assumption 1: The IOC optimisation problem can be restated as
max_W Σ_{i=1..p} w_i^T Sb w_i    (6.0)
The aim is to optimise Equation 6.0 with W ∈ X^{d×p}, where the parameters have their usual meanings. However, this is conditional upon w_i w_i^T = 1 with i = 1, 2, 3, …, p. Now, p belongs to the infinitely defined subspace of X, but since it is not possible to select all the variables of a particular data attribute at a time, we introduce a bias called the Area of Interest (AOI) to limit each selection from the entire data space. A Lagrange function L is then introduced such that:
L(w_k, λ_k) = Σ_k w_k Sb w_k^T − λ_k (w_k w_k^T − 1)
or
L(w_k, λ_k) = Σ_k w_k Sb w_k^T − λ_k w_k w_k^T + λ_k    (7.0)
(Observe that if w_k w_k^T = 1, then Equation (7.0) is identical to (6.0).) With λ_k being the Lagrange multipliers, at the saddle point the derivative of L must equal 0. Therefore, Sb w_k^T = λ_k w_k^T necessarily. Since obviously p >>> AOI at any point in time, this means that the columns (attributes) w of W are the p leading eigenvectors of Sb. Sb(n) can therefore be computed using:
Sb(n) = Σ_{j=1..AOI} p_j(n) (m_j(n) − m(n)) (m_j(n) − m(n))^T    (8.0)
where m_j(n) is the mean of data attribute j at step n and m(n) is the mean of the variables at step n; T denotes the transpose (the order of the variable in the covariance matrix defined by the data space X). To work around this problem, the Eigenvalue Decomposition (EVD) is the approach that is commonly used, although it has been reported to have high computational complexity. The EVD is computed by the following procedure. Given any finite data samples X = {x1, x2, x3, ..., xn}, we first compute the mean of the xi using the conventional formula:
µ = (1/n) Σ_{i=1..n} xi    (9.0)
This is followed by the computation of the covariance C defined as:
C = (1/n) Σ_{i=1..n} (xi − µ)(xi − µ)^T    (10.0)
Next, we compute the eigenvalues λ and eigenvectors e of the matrix C and iteratively solve:
Ce = λe    (11.0)
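To make the batch procedure of Eqs. (9.0)–(11.0) concrete, the following is a short illustrative NumPy sketch (not the authors' code); it assumes the samples are stored row-wise and simply keeps the p leading eigenvectors after sorting the eigenvalues, as described next.

import numpy as np

def batch_pca(X, p):
    """Batch PCA as in Eqs. (9.0)-(11.0): mean, covariance, eigendecomposition.

    X : (n, d) array of n samples with d attributes.
    p : number of leading eigenvectors (principal directions) to keep.
    """
    mu = X.mean(axis=0)                              # Eq. (9.0)
    Xc = X - mu
    C = (Xc.T @ Xc) / X.shape[0]                     # Eq. (10.0)
    eigvals, eigvecs = np.linalg.eigh(C)             # solves Ce = lambda e, Eq. (11.0)
    order = np.argsort(eigvals)[::-1]                # order lambda_1 >= lambda_2 >= ...
    return eigvecs[:, order[:p]], mu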
PCA then orders the eigenvalues by their magnitudes such that λ1 > λ2 > λ3 > ... > λn, and reduces the dimensionality by keeping only the directions e whose eigenvalues are not negligible (discarding those with λ <<< T for a threshold T). In other words, PCA works by ignoring directions whose eigenvalues seem very insignificant. To apply this, or make it usable, for pipeline sensor datasets, we need a more adaptive incremental algorithm that finds the p leading eigenvectors of Sb in an iterative way. For sensor datasets, we present each sample of the selected AOI as (x(n), l_n), where x(n) is the nth training datum, l_n is its corresponding attribute label and n = 1, 2, 3, . . ., AOI. Basic Assumption 2: if lim_{n→∞} a(n) = a, then lim_{n→∞} (1/n) Σ_{i=1..n} a(i) = a. By induction, using Assumption 1, it therefore follows that lim_{n→∞} Sb(n) = Sb, which means that:
lim_{n→∞} (1/n) Σ_{i=1..n} Sb(i) = Sb    (12.0)
However, the general eigenvector form is Au = λu, where u is the eigenvector of A corresponding to the eigenvalue λ. By replacing the matrix A with Sb(n), we can obtain an approximate iterative eigenvector computation formulation with v = Au = λu, or u = v/λ:
v(n) = (1/n) Σ_{i=1..n} Sb(i) u(i)    (13.0)
Injecting Equation 8.0 into Equation 13.0 implies:
v(n) = (1/n) Σ_{i=1..n} Σ_{j=1..AOI} p_j(i) (m_j(i) − m(i)) (m_j(i) − m(i))^T u(i)
Assuming that Φ_j(i) = m_j(i) − m(i), it means
v(n) = (1/n) Σ_{i=1..n} Σ_{j=1..AOI} p_j(i) Φ_j(i) Φ_j(i)^T u(i)    (14.0)
Therefore, since u = v/λ, the eigenvector u can be computed using
u = v / ||v||    (15.0)
But the vector u(i) can be explicitly defined as
u(i) = v(i − 1) / ||v(i − 1)||, with i = 1, 2, 3, …, n.
Therefore,
v(n) = (1/n) Σ_{i=1..n} Σ_{j=1..AOI} p_j(i) Φ_j(i) Φ_j(i)^T v(i − 1) / ||v(i − 1)||    (16.0)
Hence,
v(n) = ((n − 1)/n) v(n − 1) + (1/n) Σ_{j=1..AOI} p_j(n) Φ_j(n) Φ_j(n)^T v(n − 1) / ||v(n − 1)||    (17.0)
If we substitute ξ_j(n) = Φ_j(n)^T v(n − 1) / ||v(n − 1)||, j = 1, 2, 3, …, AOI, and if we set v(0) = x(1) as a starting point, then it is convenient to write v(n) as:
v(n) = ((n − 1)/n) v(n − 1) + (1/n) Σ_{j=1..AOI} p_j(n) Φ_j(n) ξ_j(n)    (18.0)
with j=1,2,3…p
(20.0)
3.1 The IOC Algorithm and the HPDR By going through the algorithm an example could be used to illustrate how HPDR solves the leading eigenvectors of Sb incrementally and sequentially. Let us assume that input sensor datasets obtained from the two sources (manually and experimentally) are represented by {ai }, i= 1,2,3,… and {bi }, i= 1,2,3… When there is no data input, the means m(0), m1(0), m2(0), are all zero. If we let the initial eigenvector v 1 =a1 for a start, then HPDR algorithm can be used to compute the initial values or the leading samples of the datasets ai(s) and bi(s) of the entire data space X. These initial values are given as: a1, a2, and b1, b2, and they can then be computed using equation 20.0.
30
A. Folorunso Olufemi, M.S. Sunar, and N.M. Jusoh
3.2 The Expected Likelihoods (EL) Given an arbitrary unordered set of data X defined by X={x1,x2,x3,…xn }k along with a set of unordered attributes Z={X1,X2,X3,…XN}k-n such that the attitudinal vector Zψ depends on the covariance matrix or X . The rowsum (RS), columsum (CS) and Grandtotal (GT) of the covariance matrix X│Xψ are defined as: RS
∑
X
CS
∑
Z
N
(21.0)
N
(22.0)
And GT
∑
Z
N
∑
X
N
(23.0)
The computation begins with the initialisation of counters for the row, the column and the Area of Interest (AOI) selected as i , j, and N respectively using E x ,y
RS CS
(24.0)
GT
The Averaged Expected Likelihood Al for E x , y is defined further by on major axis . A E . 0 elsewhere This gives a unit dimensional matrix A representing the original data X. E
25.0
3.3 Weighted Average Expected Likelihoods (WAEL) WAEL will behave similar to the normal statistical means, if all the sensor datasets are equally weighted, then what is computed is just the arithmetic mean which is considered unsuitable for sensor datasets due to its variability. For example in Table 2, we expanded IG computation for datasets represented in Table 1. to reflect the percentage contributions of each attributes. The percentage contribution is then calculated by the formula %Contribution = (Gain/Total Gain)*100. Table 2. Percentage Contributions of Attributes
Data Attribute
Information Gain
Pressure (p) Temperature (t) Volume (v) Flow Velocity (f) External Body Force(e)
0.898 0.673 0.944 0.445 0.429
Percentage Contribution (%) 26.5 19.86 27.85 13.13 12.66
3.4 High Performance Dimensionality Reduction Algorithm (HPDR) The strength of this paper is by the introduction of a mechanism for users’ choice of Areas of Interest (AOI). This is made possible by effectively determining the IG by each of the attributes and determining the lead attribute.
Improved Incremental Orthogonal Centroid Algorithm Algorithm 1. Conventional IOC Dimensionality Reduction Algorithm
Algorithm 2. High Performance Dimensionality Reduction Algorithm ൌͳǡʹǡ͵ǡǥ ǣ ሺሻൌሺሺǦͳሻሺǦͳሻሺሻሻȀ ሺሻൌሺǦͳሻͳǢሺሻൌሺሺǦͳሻሺǦͳሻሺሻሻȀሺሻ ȰͳሺሻൌሺሻǦሺሻǡൌͳǡʹǡǥͷ ൌͳǡʹǡǥͷǢൌͳǡʹǡ͵ǡǥ ሺൌͷǡ
ͷሻ ൌሺሻൌሺሻ ௩ ೕ ሺିଵሻ
ߙ ሺ݊ሻ ൌ ߔ ሺ݊ሻ் ฮ௩ೕ
ሺିଵሻฮ ைூ
ݒ ሺ݊ െ ͳሻଶ ͳ ݒ ሺ݊ሻ ൌ ቀ ሺ݊ሻߔ ሺ݊ሻߙ ሺ݊ሻቁ ݊ ݊ ୀଵ
ݔାఈ ሺ݊ሻ ൌ ߔ ሺ݊ሻ െ
ͳͲ
ሺߔ ሺ݊ሻ் ݒ ሺ݊ሻሻ ԡݒ ሺ݊ሻԡଶ
ሺሻ
ʲ ܧ௫ ൌ
Ǣ ǦǦ ͳǡǢ
ܴܵ݅ ݅ܵܥ כ ܶܩ
ͷǡǢͳͲǢ
ሺሻǦ
ܣ ൌ ߣ ܧ כ௫ Ȁͷ
௫ୀଵ
3.6 Analysing the HPDR Algorithm This algorithm must be repeated p times (iterations) for the predetermined AOI set of variables {j}, such that α_j(n) is computed with j = 1, 2, 3, …, AOI.
When computational complexity is not a limiting factor, HPDR offers a faster approach to reducing the dimensionality of the datasets based on the predefined criteria. The strength of this algorithm lies in its interaction with the subtlety of the intrinsic data interdependencies. The HPDR also enables step-by-step, continuous processing of the data in a manner that supersedes the conventional batch processing technique.
4 Procedures Given a D = n × m data space and two disjoint datasets {X, Sk ∈ D}, and assuming that dataset X = {xi; 1 ≤ i ≤ n1, n1 ∈ N+} and dataset Sk = {sj; 1 ≤ j ≤ n2, n2 ∈ N+} ⊆ D such that X ∩ Sk = ∅, then X and Sk are independent variables (vectors) of the set D and it follows that:
Centroid c_Xi = (X̄ + S̄k) / 2, i.e. 2c_Xi = X̄ + S̄k    (27.0)
or
2c_Xi = (1/n1) Σ_{i=1..n1} xi + (1/n2) Σ_{j=1..n2} sj    (28.0)
X̄ and S̄k denote the means of X and Sk respectively, and n1 and n2 are arbitrary constants. If all missing values sj can be computed and inserted by “any means” into D such that n1 = n2 = n, it follows that:
c_Xi = (1/2n) ( Σ_{j=1..n} sj + Σ_{i=1..n} xi )    (29.0)
Therefore, with the new centres for each class or attribute, the dataset D can be regrouped more effectively.
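A hedged sketch of this regrouping step is given below; it is not the authors' procedure. It assumes the two subsets are stored as NumPy arrays with missing values already imputed, and it uses a simple nearest-centroid assignment, which is one plausible reading of "regrouped".

import numpy as np

def combined_centroid(X, Sk):
    """Centre of the union of two disjoint subsets X and Sk of the data space D,
    computed after missing records have been filled in (cf. Eqs. 27.0-29.0)."""
    return np.vstack([X, Sk]).mean(axis=0)

def regroup(D, centroids):
    """Assign every record in D to its nearest class centroid."""
    d2 = ((D[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Example with two small imputed subsets and two illustrative class centres
X  = np.array([[1.00, 19.3], [1.00, 19.0]])
Sk = np.array([[0.90, 19.3], [0.96, 19.0]])
c  = combined_centroid(X, Sk)
labels = regroup(np.vstack([X, Sk]), np.vstack([c, c + 1.0]))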
5 Results and Evaluation Generally, there is no uniform or industrial standard for testing and implementing dimensionality reduction across all applications; researchers have developed area-specific dimensionality reduction algorithms and techniques, which has made comparison extremely difficult. Examples of such area- or domain-specific applications are found in [3, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 and 26], to mention but a few. To help evaluation, some checklists have been compiled using the Cross Industry Standard Process for Data Mining (CRISP-DM). Four algorithms are compared using the datasets obtained from two sources: the VVTA and the turbulence rheometer. The results obtained are presented in Table 3.
Table 3. Summary of the Result Obtained Comparing Four Dimensionality Reduction Algorithms

                                   (AOI) - selected variables
             <14                    15-30                   31-45                   46-60
Algorithm    EPR%  CMCR   TT(s)     EPR%  CMCR   TT(s)      EPR%  CMCR   TT(s)      EPR%  CMCR   TT(s)
HPDR         10    0.20   2.0       15    0.15   2.2        15    0.22   3.3        25    0.14   3.6
PCA          5     0.452  2.26      8     0.389  3.11       10    0.315  3.15       15    0.217  3.25
LDA          7     0.429  3         7     0.43   3.01       5     0.602  3.01       5     0.602  3.01
IOC          4     0.50   1.99      4     0.50   1.99       8     0.25   2          10    0.20   2.0
The parameters used for comparison are the Error in Prediction Ratio (EPR), the Covariance Matrix Convergence Ratio (CMCR) and the averaged Time Taken (TT) for the computation.
(Bar chart: nominal values of EPR %, CMCR and TT (s) for HPDR, PCA, LDA and IOC over the AOI ranges <14, 15-30, 31-45 and 46-60; y-axis: Nominal Value.)
Fig. 1. Comparing Dimensionality Reduction Algorithms
From the graph in Figure 1, the HPDR shows a lot of promise for larger selections of AOI, although this has not been tested beyond 15 rows of selected variables at any single time due to the limitations imposed by the renderer. Acknowledgements. This work is supported by the UTMViCubeLab, FSKSM, Universiti Teknologi Malaysia. Special thanks to (MoHE), Malaysia and the Research Management Centre (RMC), UTM, through Vot. No. Q.J130000.7128.00J57, for providing financial support and the necessary atmosphere for this research.
References 1. Goodyer, C., Hodrien, J., Jason, W., Brodlie, K.: Using high resolution display for high resolution 3d cardiac data. The Powerwall, pp. 5–16. University of Leeds (2009); The Powerwall Built from standard PC components of 7computers
2. Ebert, D.S., Rohrer, R.M., Shaw, C.D., Panda, P., Kukla, J.M., Roberts, D.A.: Procedural shape generation for multi-dimensional data visualisation. Computers and Graphics 24, 375–384 (2000) 3. Masashi, S.: Dimensionality reduction of multimodal labeled data by local Fisher Discriminant analysis. Journal of Machine Learning Research 8, 1027–1061 (2007) 4. Donoho, D.L.: High-dimensional data analysis. The curses and blessings of dimensionality. In: Lecture delivered at the Mathematical Challenges of the 21st Century Conference, August 6-11. The American Math. Society, Los Angeles (2000) 5. Yan, J., Benyu, Z., Ning, L., Shuicheng, Y., Qiansheng, C., Weiguo, F., Qiang, Y., Xi, W., Zheng, C.: Effective and Efficient Dimensionality Reduction for Large-Scale and Streaming Data Preprocessing. IEEE Transactions on Knowledge And Data Engineering 18(3), 320–333 (2006) 6. da Silva-Claudionor, R., Jorge, A., Silva, C., Selma, R.A.: Reduction of the dimensionality of hyperspectral data for the classification of agricultural scenes. In: 13th Symposium on Deformation Measurements and Analysis, and 14th IAG Symposium on Geodesy for Geotechnical and Structural Engineering, LNEC Libson May, 2008LBEC, LIBSON, May 12-15, pp. 1–10 (2008) 7. Giraldo, L., Felipe, L.F., Quijano, N.: Foraging theory for dimensionality reduction of clustered data. Machine Learning 82, 71–90 (2011), doi:10.1007/s10994-009-5156-0 8. Vaccaro, R.J.: SVD and Signal Processing II: Algorithms, Analysis andApplications. Elsevier Science (1991) 9. Artae, M., Jogan, M., Leonardis, A.: Incremental PCA for OnLine Visual Learning and Recognition. In: Proceedings of the 16th International Conference on Pattern Recognition, pp. 781–784 (2002) 10. Weng, J., Zhang, Y., Hwang, W.S.: Candid Covariance Free Incremental Principal Component Analysis. IEEE Transaction on Pattern Analysis and Machine Intelligence 25, 1034–1040 (2003) 11. Hiraoka, K., Hidai, K., Hamahira, M., Mizoguchi, H., Mishima, T., Yoshizawa, S.: Successive Learning of Linear Discriminant Analysis: Sanger-Type Algorithm. In: Proceedings of the 14th International Conference on Pattern Recognition, pp. 2664–2667 (2004) 12. Jeon, M., Park, H., Rosen, J.B.: Dimension Reduction Based on Centroids and Least Squares for Efficient Processing of Text Data. Technical Report MN TR 01-010, Univ. of Minnesota, Minneapolis (February 2001) 13. Park, H., Jeon, M., Rosen, J.: Lower Dimensional Representationof Text Data Based on Centroids and Least Squares. BIT Numerical Math. 43, 427–448 (2003) 14. Howland, P., Park, H.: Generalizing Discriminant Analysis Using the Generalized Singular Value Decomposition. IEEE Trans. Pattern Analysis and Machine Intelligence 26, 995– 1006 (2004) 15. Han, J., Kamber, M.: Data Mining, Concepts and Techniques. Morgan Kaufmann (2001) 16. Mardia, K.V., Kent, J.T., Bibby, J.M.: Multivariate Analysis. Probability and Mathematical Statistics. Academic Press (1995) 17. Friedrnan, J.H., Tibshiirani, R.: Elements of Statistical Learning: Prediction. Inference and Data Mining. Springer, Heidelberg (2001) 18. Boulesteeix, A.: PLS Dimension reduction for classification with microarray data. Statistical Applications in Genetics and Molecular Biology 3(1), Article 33, 1–30 (2004) 19. Hand, D.J.: Discrimination and Classification. John Wiley, New York (1981) 20. Quinlan, J.R.: Induction of decision trees. Machine Learning 1, 81–106 (1986) 21. Quinlan, J.R.: Programs for Machine Learning. Morgan Kaufman (1993)
22. Cox, T.F., Cox, M.A.A.: Multidimensional Scaling, 2nd edn. Chapman and Hall (2001) 23. Hoppe, H.: New quadric metric for simplifiying meshes with appearance attributes. In: Proceedings IEEE Visualisation 1999. IEEE Computer Society Press (1999) 24. Hyvärinen, A.: Survey on independent component analysis. Neural Computing Surveys 2, 94–128 (1999) 25. Levoy, M.P.K., Curless, B., Rusinkiewicz, S., Koller, D., Pereira, L., Ginzton, M., Anderson, S., Davis, J., Ginsberg, J., Shade, J., Fulk, D.: The Digital Michelangelo Project. 3D scanning of large statues. In: Proceedings of ACM SIGGRAPH 2000. Computer Graphics Proceedings, Annual Conference Series, pp. 131–144. ACM (2000) 26. Lee, T.W.: Independent Component Analysis: Theory and Applications. Kluwer Academic Publishers (2001)
3D Visualization of Simple Natural Language Statement Using Semantic Description Rabiah Abdul Kadir, Abdul Rahman Mad Hashim, Rahmita Wirza, and Aida Mustapha Faculty of Computer Science and Information Technology Universiti Putra Malaysia, 43400 UPM Serdang, Selangor
[email protected],
[email protected], {rahmita,aida}@fsktm.upm.edu.my
Abstract. Visualizing a natural language description is a process of generating a 3D scene from a natural language statement. Firstly, we should consider the real-world visualization and find out which keys of visual information can be extracted from the sentences, representing the most fundamental concepts in both virtual and real environments. This paper focuses on a method for generating the 3D scene visualization using a semantic simplification description based on logical representation. Based on the semantic simplification description, a concept of parent-child object relationships is derived. This concept is used as the foundation in determining spatial relationships between objects. The aim of this study is to analyze and match the visual expression and the key information using a visual semantic simplification description presented in logical form. The experimental result shows that 60% of the phrases were able to give an appropriate depiction of the meaning of the phrase. Keywords: 3D Visualization, Natural Language Representation, Logical Approach, Semantic Technology, Visual Informatics.
1 Introduction A transformation tool from text to 3D visuals is very helpful in storyboarding for the game, animation and film-making industries. Usually humans can perceive messages more easily through a scene depiction than by analyzing whole lines of text. The aim of this study is to investigate 3D scene visualization using semantic simplification based on logical representation. This study is demonstrated through a prototype tool that we developed to illustrate a natural language text document, presented in logical form, as a 3D visual expression. From the visualization we aim to establish a correspondence between a word and 3D object models, and between a sentence or phrase and the scene it may evoke. Finally, we want to analyze and match the visual expression and the key information using the visual semantic simplification description based on logical representation. We divide this paper as follows: Section 2 is about related work, Section 3 explains the implementation of the 3D visualization, Section 4 gives the experimental result, the discussion is in Section 5, and finally the conclusion and, briefly, the future work of this study are in Section 6.
2 Related Works Natural language input has been studied in a number of very early 3D graphics systems [4][6][7] and the Put system [3], which was limited to spatial arrangements of existing objects in a pre-constructed environment. Also, input was restricted to an artificial subset of English consisting of expressions of the form Put(X, P, Y), where X and Y are objects and P is a spatial preposition. Several more recent systems target animation rather than scene construction. Work at the University of Pennsylvania’s Center of Human Modeling and Simulation [2] used language to control animated characters in a closed virtual environment. CarSim [11] is domain-specific system where short animations were created from natural language descriptions of accident reports. CONFUCIUS [14] is a multi- modal animation system that takes as input a single sentence containing an action verb. The system blends animation channels to animate virtual human characters. Another recent system, from the University of Melbourne [18], uses a machine learning-based approach to create animated storyboards on a pre-made virtual stage. In these systems the referenced objects and attributes are typically relatively small in number or targeted to specific pre-existing domains.
3 Implementation The visualization input is a natural language text that is translated into a logical representation. The translation from text to logical representation is done using the work of [1] and the process is not discussed in this paper. The visualization tool developed in this study uses the 3D modeling tool Blender 3D [20], together with its built-in Python scripting, in order to work with the Blender API. We discuss how the objects and attributes are obtained from the logical representation, how the 3D scene is built by mapping vocabularies to 3D objects, and how issues with 3D objects are handled. 3.1 Logical Representations The input is a natural language prepositional phrase containing two possible kinds of noun: either a simple noun or a noun with a modifier. Three types of logical representation are generated from the input phrases, as follows. i. Noun with modifier. The logical representation of this type is written as u(v), where u=adjective, v=noun. e.g. Natural Language: Blue box. Logical Representation: blue(box) ii. Prepositional phrase with simple noun. The logical representation of this type is written as w(x,y), where w=preposition, x=first noun, y=second noun. e.g. Natural Language: Boy on a table. Logical Representation: on(boy,table) iii. Prepositional phrase with noun and modifier. The logical representation of this type is written as x1=u(v), where u=adjective, v=noun, and
w(x1,y), where w=preposition, x1=noun with modifier, y=second noun. e.g. Natural Language: Blue box under the white table. Logical Representation: x1=blue(box) y1=white(table) under(x1,y1) With the logical representation in place, the 3D visualization needs to determine which parts of the structure of the logical representation are the adjective, preposition, object and subject of the sentence. Using Python's regular expression functions, the logical representation pattern is processed to extract these attributes and keep them in variables that are used later for mapping the vocabularies to their respective 3D objects, attributes and spatial positioning. 3.2 3D Visualization The 3D model is created based on the logical representation. We use Blender 3D to construct the 3D scene. During the implementation, each 3D object is called for scene composition. The object file contains a collection of vertices that largely describe the 3D object. We use the Blender Python API to call the object and bring it into the 3D space through the bpy.ops.wm.link_append() function. Coordinate space in Blender is an important key to positioning the 3D objects. Blender represents locations in a scene by their coordinates. The coordinates of a location consist of three numbers that define its distance and direction from a fixed origin. More precisely: the first (or x-) coordinate of the location is defined as its distance from the YZ plane (the one containing both the Y and Z axes); locations on the +X side of this plane are assigned positive x-coordinates, and those on the -X side are given negative ones. Its second (or y-) coordinate is its distance from the XZ plane, with locations on the -Y side of this plane having negative y-coordinates. Its third (or z-) coordinate is its distance from the XY plane, with locations on the -Z side of this plane having negative z-coordinates. 3.2.1 3D Object Dimension The 3D models used in this project are made with the 3D modeling tool Blender 3D. Blender 3D is 3D modelling software that can model, shade, animate, render and composite 3D objects, as most 3D software does. It is free software released under the GNU General Public License. In Blender 3D, a 3D model is saved in a file with the extension .blend. In this .blend file, the data used by Blender 3D to construct a 3D scene are kept in folders, following a common folder structure inside a Blender 3D model file. 3.2.2 Uneven Surfaces Given a phrase relating two objects, e.g. "Book on the box.", the following steps apply. 1. Converting the natural language to the logical representation generates on(book,box).
2. From the logical representation on(book,box), the identified items are on as the preposition, book as the child object and box as the parent object. 3. According to the parent-child approach, the parent object has to be put into the 3D space first, and the parent object's properties should be collected prior to visualizing the child object. The object properties needed are the position of the local coordinate and the dimensions in x, y and z. 4. The moving direction for the child object is determined by the preposition word. The moving direction results in a decrement or increment of the location point on the x, y or z axis to reach the point for the child object. 5. The distance from the local coordinate of the parent object is then calculated by subtracting or adding the dimension figure of the parent object to the local coordinate of the parent object on the x, y or z axis. Consider the phrase "Book on a chair.", which gives on(book,chair) as its logical representation. In this case the chair, as the parent object, differs from the box in the previous example in that the chair object has a complex geometry with uneven surfaces at every angle. The child object (book) is correctly placed at the relative point of the parent object (chair) in every position, but the positioning is improper for the prepositions On and Under. This shows that the dimensions of a parent object like the chair can infer a proper placement for a child object in every direction and place except for the prepositions On and Under. This is due to the uneven surface at the bottom and top parts of the parent object (chair). Using the dimension figure of the chair object makes its child object (book) be placed right under the chair's legs. This makes it look like the chair is standing on the book. It is correct in terms of the spatial relation, but it would make more sense if the book were placed right under the chair, on the same level on which the chair's legs are standing. The same goes for the preposition On: the book should be placed on the seat of the chair, not on its top rail. For the issue of uneven surfaces highlighted above, 3D object annotation is used as a countermeasure. Every 3D object used in this study is annotated with its bottom surface type. The topmost height of the object is also tagged to ensure that the placement of a child object on top of the object gives a more reasonable look. The annotation is discussed further in the next section. 3.2.3 Data Structure To visualize the natural language phrase as a 3D scene dynamically, this prototype uses a couple of sets of predefined data organized into the following:
1. 3D objects
2. Directions
3. Colors
Every 3D object is kept in a folder, and the objects are called by Blender into the 3D space using a Python script. The path to each 3D object is predefined together with some other meta-tags, as listed in Tables 3.1 to 3.3.
Table 3.1. Data structure for 3D objects

Data              Data Type  Description
ObjectName        String     The name of the object. This name is as received from the logical representation.
ObjectPath        String     Directory path to the respective .blend file.
TopMost           Double     The annotated length on the z-axis referring to the highest part of the object on which a child object should be placed.
EmptySpaceBottom  Bool       TRUE if the object has an empty space at the bottom part, something like a chair's structure; otherwise FALSE.
Table 3.2. Data structure for prepositions

Data                  Data Type   Description
PrepositionName       String      The name of the preposition, as received from the logical representation.
PrepositionAxis       Char        Axis on which the object is moved: x, y or z.
PrepositionDirection  Integer     Value is -1, 1, 0 or 2. -1 = move toward the negative side of the axis; 1 = move toward the positive side; 0 = random direction, toward either the negative or the positive side; 2 = object is not moved.
Table 3.3. Data structure for colors

Data        Data Type   Description
ColorName   String      The name of the color (adjective), as received from the logical representation.
ColorRed    Double      Red component used to form the RGB color applied to the object.
ColorGreen  Double      Green component used to form the RGB color applied to the object.
ColorBlue   Double      Blue component used to form the RGB color applied to the object.
To let Blender know on which axis, and in which (positive or negative) direction, an object should move, the moving direction for each preposition is predefined in a data table. Table 3.2 shows the data dictionary that holds this collection of predefined preposition metadata. When a phrase contains a noun with a modifier (color), the modifier name is queried in the database to get its predefined RGB equivalent. This RGB code is then passed to Blender to construct a new material with that color, which is applied to the respective 3D object. Table 3.3 shows the data structure for colors.
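As an illustration of how the predefined metadata of Tables 3.2 and 3.3 might be consulted at visualization time, the sketch below queries an SQLite database and builds a colored Blender material. The column names mirror Tables 3.2–3.3, but the database file name, table names and exact schema are assumptions for illustration, not the prototype's actual code.

```python
import sqlite3
import bpy

conn = sqlite3.connect("scene_metadata.db")   # hypothetical database file

def lookup_preposition(name):
    """Return (axis, direction) for a preposition, per Table 3.2."""
    row = conn.execute(
        "SELECT PrepositionAxis, PrepositionDirection "
        "FROM prepositions WHERE PrepositionName = ?", (name,)).fetchone()
    return row  # e.g. ('z', 1) for 'on'

def apply_color(obj, color_name):
    """Build a material from the RGB values of Table 3.3 and assign it."""
    row = conn.execute(
        "SELECT ColorRed, ColorGreen, ColorBlue "
        "FROM colors WHERE ColorName = ?", (color_name,)).fetchone()
    if row is None:
        return
    mat = bpy.data.materials.new(name=color_name)
    mat.diffuse_color = (row[0], row[1], row[2])   # RGB in the 2.5x/2.6x API
    obj.data.materials.append(mat)
```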
4 Experimental Results
Figure 3.1, Figure 3.2 and Figure 3.3. Example 3D scene depictions generated by the prototype.
From the experimental data, Figures 3.1 to 3.3 show some of the depictions obtained. From observation of these 3D scene depictions, we noticed the following.
1. All of the objects were placed according to their spatial relation except for the preposition under: the child object was displayed along the -Z axis, below the coordinate (0,0,0).
2. Using the tree object as the parent causes the child to be placed somewhat far away from the tree.
3. Placement is improper for the prepositions on and under. For the phrase Box on chair, although the box is placed on the seat of the chair, the box partially overlaps the backrest of the chair. The same goes for the phrase Box under the chair, where the legs of the chair disappear inside the box.

5 Discussion

For issue number 1 above, the child object was placed below the coordinate (0,0,0), downwards on the z-axis, for the following reasons.
1. When visualizing two objects in a parent–child relation, the parent is brought into the 3D space first.
2. The parent object's local coordinate is always placed at the centre coordinate (0,0,0), so the parent object already sits at (0,0,0).
3. Placement of the child object always refers to the position of the parent object's local coordinate.
4. The preposition under always moves the child object toward decreasing coordinates on the z-axis (downwards).
A fix for this issue is to treat the visualization of parent–child relational objects as an exception: the child object is brought into the 3D space first, instead of the parent object. In this case the child object takes the parent's role, because an object that is under another object always has to be at the coordinate (0,0,0).
Issue number 2 happens because the child uses the parent object's dimension-x or dimension-y to move away from the parent object along the x- or y-axis, according to the spatial relation. Hence, the bigger the parent object's dimension-x or dimension-y, the farther away the child object ends up. To avoid this problem, the calculation for the child object should proceed as follows (a code sketch is given after this discussion):
1. Take half of the parent object's dimension-x or dimension-y.
2. Take half of the child object's dimension-x or dimension-y.
3. Sum these two figures and use the result to move the child object along the x- or y-axis.
The reason for taking half of both objects is that the starting point from which the child object moves away is the parent object's local coordinate, and every 3D object in this prototype has its local coordinate set at its bottom centre. Hence, combining half of the dimension-x or dimension-y of each object positions them next to each other at a reasonable distance.
An issue like number 3 occurs when the child object has a larger dimension-x or dimension-y than the parent object. The countermeasures for this issue are:
1. Before visualizing the child object, compare the child object's dimension-x and dimension-y with those of the parent object. If the child object turns out to be larger than the parent object, scale the child object's dimensions down (resize the object), or
2. Add a few more annotations to the parent object. These annotations tag the dimension-x and dimension-y at a certain height of the object; these dimensions define the only space into which a child object can fit. For instance, tag the dimension-x and dimension-y at the height of the seat of the chair. Then, before visualizing the child object, compare the child object's dimensions with the tagged dimensions of the parent object; if the child's dimensions are larger, scale it down until they are smaller than the tagged dimensions of the parent object.
Countermeasure number 1 will not always work, since some parent objects have a shrinking geometrical shape at certain heights; just as with the human body, the shoulders are usually broader than the waist, and Blender returns an object's dimensions as a bounding box regardless of whether the object has an even surface or not. Countermeasure number 1 can be applied right away through a Blender Python script without involving additional data from SQLite. Countermeasure number 2, in contrast, involves predefined tagged dimensions as explained above, but it consistently gives a reasonably proper visualization of object placement.
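The following sketch illustrates the offset calculation proposed for issue 2, using Blender object dimensions; the helper names and axis encoding are illustrative assumptions rather than the prototype's actual code.

```python
def child_offset(parent, child, axis):
    """Half of each object's size along the given axis (0 = x, 1 = y).

    parent and child are Blender objects whose local origins are assumed
    to sit at each object's bottom centre.
    """
    return 0.5 * parent.dimensions[axis] + 0.5 * child.dimensions[axis]

def place_beside(parent, child, axis=0, direction=1):
    """Place the child next to the parent along x (axis=0) or y (axis=1).

    direction is +1 or -1, as given by PrepositionDirection in Table 3.2.
    """
    offset = child_offset(parent, child, axis)
    location = list(parent.location)
    location[axis] += direction * offset
    child.location = location
```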
6 Conclusion and Future Work

This study has demonstrated 3D scene visualization using a semantic simplification description based on logical representation, following the technique proposed by [1] as well as the work by [17]. From the semantic simplification description, the parent–child relationship helps satisfy the 3D visualization dependencies of two
objects and represents their spatial relation. Object constraints such as the uneven geometrical surface issue were taken into account and solved by object annotation. Annotation was also applied to the prepositions, to determine the direction in which an object moves and the affected axis. Another annotation subject was color: a natural language color vocabulary is represented by a combination of numerical values of red, green and blue. This annotation approach is adapted from the method employed by [3] and from their work on graphic visualization using visual semantic parameterization. The work by [2] utilizes XML as its formatted data structure, while in this study the concept is applied by utilizing the SQLite database engine. In this study we found that positioning a child object using on and under by manipulating the parent object's dimensions gives improper visualization when the parent object has uneven surfaces. Annotating the 3D objects, by tagging the height for child object placement and tagging the geometry of the bottom surface of the object, solves the problem. However, there is another issue to take into account concerning proper visualization: when a child object's dimensions are larger than the parent object's dimensions at a particular height, some part of the parent object may become invisible, covered by the surfaces of the child object. The best example was the discussion of Box on the chair and Box under the chair; further annotation would be needed to handle this issue, as elaborated in the discussion above. From a recent reading concerning the uneven surface constraint, [18] utilized physics collision and voxelization in work on positioning 3D objects to comply with spatial relations. Through collision over the voxelized parts, the objects are partitioned into a subset of nine spatial regions, and collision detection handling in the respective region visualizes the objects' spatial relation properly. Using this approach in conjunction with this study of text-based 3D scene visualization using logical representation is definitely one of the main future works for consideration. Other future work that could complement this study includes extending the visualization to object texture and to comparative adjectives in terms of size and color intensity. This work can also be extended to visualizing a 3D scene with animation from a verb phrase.
References 1. Rabiah, A.K., Sembok, T.M.T., Halimah, B.Z.: Generating 3D Visual Expression Using Semantic Simplification Description Based on Logical. Work, 58–65 (2009) 2. Mehdi, Q.H., Gough, N.E.: From Visual Semantic Parameterization to Graphic Visualization, pp. 1–6 (2005) 3. Clay, S.R.: Put: Language- Based Interactive Manipulation of Objects. IEEE Computer Graphics and Applications (1996) 4. Boberg, R.: Generating Line Drawings from Abstract Scene Descriptions. Master’s thesis, Dept. of Elec. Eng. MIT, Cambridge, MA (1972) 5. Simmons, R.: The clownsmicroworld. In: Proceedings of TINLAP, pp. 17–19 (1998) 6. Kahn, K.: Creation of Computer Animation from Story Descriptions. Ph.D. thesis, MIT, AI Lab, Cambridge, MA (1979)
7. Adorni, G., Di Manzo, M., Giunchiglia, F.: Natural language driven image generation. In: COLING, pp. 495–500 (1984) 8. Hanser, E., Kevitt, P.M., Lunney, T., Condell, J.: SceneMaker: Automatic Visualisation of Screenplays. Communication (2009) 9. Hanser, E., Kevitt, P.M., Lunney, T., Condell, J.: SceneMaker: Intelligent Multimodal Visualisation of Natural Language Scripts. Structure, 144–153 (2010) 10. Rowe, N.: The Visualization and Animation of Algorithmically Generated 3D Models Using Open Source Software. In: 2009 First International Conference on Advances in Multimedia, pp. 44–49 (July 2009) 11. Dupuy, S., Egges, A., Legendre, V., Nugues, P.: Generating a 3d simulation of a car accident from a written description in natural language: The carsim system. In: Proceedings of ACL Workshop on Temporal and Spatial Information Processing, pp. 1–8 (2001) 12. http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm? arnumber=5206914 13. Ye, P., Baldwin, T.: Towards Automatic Animated Storyboarding. Artificial Intelligence, 578–583 (2008) 14. Ma, M.: Automatic Conversion of Natural Language to 3D Animation. Ph.D. thesis, University of Ulster (2006) 15. Coyne, B., Sproat, R.: WordsEye: An Automatic Text-to-Scene Conversion System. Linguistic Analysis (2001) 16. Zeng, X., Mling, T.: A Review of Scene Visualization Based on Language Descriptions. In: 2009 Sixth International Conference on Computer Graphics, Imaging and Visualization, pp. 429–433 (August 2009), http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm? arnumber=5298772 17. Coyne, B., Rambow, O., Hirschberg, J., Sproat, R.: Frame Semantics in Text-to-Scene Generation. In: Setchi, R., Jordanov, I., Howlett, R.J., Jain, L.C. (eds.) KES 2010 Part IV. LNCS, vol. 6279, pp. 375–384. Springer, Heidelberg (2010) 18. Seversky, L.M.: Real-time Automatic 3D Scene Generation from Natural Language Voice and Text Descriptions. Interface, 61–64 (2006) 19. Ye, P., Baldwin, T.: Towards automatic animated storyboarding. In: Proceedings of the 23rd National Conference on Artificial Intelligence, vol. 1, pp. 578–583 (2008) 20. Blender 3D, http://www.blender3D.org
Character Recognition of License Plate Number Using Convolutional Neural Network Syafeeza Ahmad Radzi1,2 and Mohamed Khalil-Hani2 1
Faculty of Electronics & Computer Engineering Universiti Teknikal Malaysia Melaka (UTeM) 2 VLSI-eCAD Research Laboratory (VeCAD) Faculty of Electrical Engineering Universiti Teknologi Malaysia (UTM)
[email protected],
[email protected]
Abstract. This paper presents machine-printed character recognition on characters acquired from license plates using a convolutional neural network (CNN). A CNN is a special type of feed-forward multilayer perceptron trained in supervised mode using a gradient-descent backpropagation learning algorithm that enables automated feature extraction. Common methods usually apply a combination of a handcrafted feature extractor and a trainable classifier, which may result in sub-optimal results and low accuracy. CNNs have been proved to achieve state-of-the-art results in tasks such as optical character recognition, generic object recognition, real-time face detection and pose estimation, speech recognition, license plate recognition, etc. A CNN combines three architectural concepts, namely local receptive fields, shared weights and subsampling. The combination of these concepts and the optimization methods resulted in an accuracy of around 98%. In this paper, the method implemented to increase the performance of character recognition using a CNN is proposed and discussed.

Keywords: Character recognition, convolutional neural network, back propagation, license plate recognition, parallel architecture, Visual Informatics.

1 Introduction

A Convolutional Neural Network (CNN) is a special type of multilayer perceptron: a feed-forward neural network trained in supervised mode using a gradient-descent backpropagation learning algorithm that minimizes a loss function [1, 2]. It is one of the most successful machine learning architectures in computer vision and has achieved state-of-the-art results in tasks such as optical character recognition [1], generic object recognition, real-time face detection [3] and pose estimation, speech recognition, license plate recognition [4-6], etc. Its strategy is to extract simple features at higher resolution and transform them into complex features at lower resolution; lower resolution is obtained by applying subsampling to the feature maps of the previous layer. Like other artificial neural networks, the CNN can benefit in speed from parallel architectures. A parallel implementation helps to speed up CNN simulation,
allowing more complicated architectures to be used in real time. It also significantly speeds up the training process, which could take days on non-parallel architectures. This work is the initial step towards license plate recognition, in which the individual characters are manually extracted and recognized using CNNs. The outline of this paper is as follows: Section 2 discusses the theory of the CNN. The methodology of the proposed architecture is discussed in Section 3. Section 4 presents the results and discussion, and the final section gives the conclusion of the overall work.

2 Theory

The CNN consists of layers of neurons and is optimized for two-dimensional pattern recognition. A CNN has three types of layer, namely the convolutional layer, the subsampling layer and the fully connected layer. These layers are arranged in a feed-forward structure as shown in Fig. 1.
[Figure 1 depicts the network structure: Input 1@32x32 → C1: feature maps 6@28x28 (5x5 convolution) → S2: feature maps 6@14x14 (2x2 subsampling) → C3: feature maps 16@10x10 (5x5 convolution) → S4: feature maps 16@5x5 (2x2 subsampling) → F5: layer 120 → F6: layer 10 (full connection).]

Fig. 1. Architecture of CNN [2]
A convolutional layer consists of several two-dimensional planes of neurons known as feature maps. Each neuron of a feature map is connected to a neighborhood of neurons in the previous layer, the so-called receptive field. The convolution between the input feature maps and the respective kernel is computed. These convolution outputs are then summed together with a trainable bias term, which is passed to a non-linear activation function, such as the hyperbolic tangent, to obtain a new feature value [2]. Weights are shared within the convolution matrices, so that large images can be processed with a reduced set of weights. The convolutional layer acts as a feature extractor that extracts salient features of the inputs, such as corners, edges and endpoints, or non-visual features in other signals, using the concepts of local receptive fields and shared weights [1, 3]. Shared weights have several advantages: they reduce the number of free parameters to train, reduce the complexity of the machine, and reduce the gap between test error and training error. An interesting property of convolutional layers is that if the input image is shifted, the feature map output is shifted by the same amount, but it is left unchanged otherwise. This property is the basis of the robustness of CNNs to shifts and distortions of the input. All units within one feature map share identical weight vectors. A complete convolutional layer is composed of several feature maps (with different weight vectors), so that multiple features can be extracted at each location. A clear view of this statement is given in Fig. 2. The output y_n^{(l)} of a feature map n in a convolutional layer l (as described in [2, 3]) is given by
y_n^{(l)}(x, y) = \varphi^{(l)}\!\left( \sum_{m \in M_n^{(l)}} \sum_{(i,j) \in K^{(l)}} w_{mn}^{(l)}(i, j) \cdot y_m^{(l-1)}\!\left(x \cdot h^{(l)} + i,\; y \cdot v^{(l)} + j\right) + b_n^{(l)} \right)    (1)

where K^{(l)} = \{(i, j) \in \mathbb{N}^2 \mid 0 \le i < k_x^{(l)};\; 0 \le j < k_y^{(l)}\}, k_x^{(l)} and k_y^{(l)} are the width and the height of the convolution kernels w_{mn}^{(l)} of layer l, and b_n^{(l)} is the bias of feature map n in layer l. The set M_n^{(l)} contains the feature maps in the preceding layer l-1 that are connected to feature map n in layer l. The values h^{(l)} and v^{(l)} describe the horizontal and vertical step size of the convolution in layer l, while \varphi^{(l)} is the activation function of layer l.
Fig. 2. (a) Spatial convolution [1] (b) CNN convolutional layer [2]
Once a feature has been detected, its exact location becomes less important; only its approximate position relative to other features is relevant. A simple way to reduce the precision is to reduce the spatial resolution of the feature map. This can be achieved with a so-called subsampling layer, which performs local averaging. The subsampling layer reduces the resolution of the image and thus reduces the sensitivity to translation (shift and distortion), since the feature maps are sensitive to translation of the input [1]. This layer reduces the outputs of adjacent neurons of the previous layer (normally a 2x2 neighbourhood) by averaging them into a single value. Next, the value is multiplied by a trainable weight (trainable coefficient), a bias is added, and the result is passed to a non-linear activation function such as the hyperbolic tangent. The trainable coefficient and bias control the effect of the sigmoid nonlinearity: if the coefficient is small, the unit operates in a quasi-linear mode and the subsampling merely blurs the input; if the coefficient is large, subsampling units can be seen as performing a "noisy OR" or a "noisy AND" function, depending on the value of the bias. The illustration can be viewed in Fig. 3. The output y_n^{(l)} of a feature map n in a subsampling layer l (as described in [2, 3]) is given by

y_n^{(l)}(x, y) = \varphi^{(l)}\!\left( w_n^{(l)} \cdot \sum_{(i,j) \in S^{(l)}} y_n^{(l-1)}\!\left(x \cdot s_x + i,\; y \cdot s_y + j\right) + b_n^{(l)} \right)    (2)

where S^{(l)} = \{(i, j) \in \mathbb{N}^2 \mid 0 \le i < s_x^{(l)};\; 0 \le j < s_y^{(l)}\}, s_x^{(l)} and s_y^{(l)} define the width and height of the subsampling kernel of layer l, and b_n^{(l)} is the bias of feature map n in layer l.
The value w_n^{(l)} is the weight of feature map n in layer l and \varphi^{(l)} is the activation function of layer l.
Fig. 3. (a) Example of subsampling process [1] (b) CNN subsampling layer [2]
The final layer is a fully connected layer. In this layer, the neurons of the previous layer are fully connected to every neuron in the current layer. The fully connected layer acts as a normal classifier, similar to the layers in traditional Multilayer Perceptron (MLP) networks. The equation of the fully connected layer (as described in [2]) is given by

y^{(l)}(j) = \varphi^{(l)}\!\left( \sum_{i=1}^{N^{(l-1)}} y^{(l-1)}(i) \cdot w^{(l)}(i, j) + b^{(l)}(j) \right)    (3)

where N^{(l-1)} is the number of neurons in the preceding layer l-1, w^{(l)}(i, j) is the weight for the connection from neuron i in layer l-1 to neuron j in layer l, b^{(l)}(j) is the bias of neuron j in layer l, and \varphi^{(l)} represents the activation function of layer l. The number of layers depends on the application, and each neuron in a layer is an input to the following layer. The current layer only receives input from the preceding layer plus a bias that is usually 1. Each neuron applies weights to each of its inputs and sums all the weighted inputs. The total weighted value is passed to a non-linear function, such as the sigmoid function, to limit the neuron's output to a range of values. Multiple planes are used in each layer so that multiple features can be detected.
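To make Eqs. (1) and (2) concrete, the following is a minimal NumPy sketch of a single convolutional and subsampling feature map (stride-1 convolution, non-overlapping 2x2 subsampling, and the A·tanh(S·a) squashing function quoted in Section 3.4); it is an illustrative reading of the equations, not the authors' MATLAB implementation.

```python
import numpy as np

A, S = 1.7159, 2.0 / 3.0
phi = lambda a: A * np.tanh(S * a)        # squashing function f(a) = A tanh(Sa)

def conv_feature_map(inputs, kernels, bias):
    """Eq. (1): sum of valid convolutions over the connected input maps, then phi.

    inputs  : list of 2-D arrays y_m^{(l-1)} (the maps in M_n)
    kernels : list of 2-D arrays w_mn^{(l)}, one per input map
    """
    kx, ky = kernels[0].shape
    H = inputs[0].shape[0] - kx + 1
    W = inputs[0].shape[1] - ky + 1
    out = np.full((H, W), bias, dtype=float)
    for ym, w in zip(inputs, kernels):
        for x in range(H):
            for y in range(W):
                out[x, y] += np.sum(w * ym[x:x + kx, y:y + ky])
    return phi(out)

def subsample_feature_map(y_prev, weight, bias, s=2):
    """Eq. (2): sum over non-overlapping s x s blocks, scaled by one trainable weight."""
    H, W = y_prev.shape[0] // s, y_prev.shape[1] // s
    out = np.empty((H, W))
    for x in range(H):
        for y in range(W):
            out[x, y] = weight * np.sum(y_prev[x*s:(x+1)*s, y*s:(y+1)*s]) + bias
    return phi(out)

# Example: a 24x14 padded character through one 5x5 convolution and 2x2 subsampling.
img = np.random.rand(24, 14)
c1 = conv_feature_map([img], [np.random.randn(5, 5) * 0.1], bias=0.0)   # 20x10
s2 = subsample_feature_map(c1, weight=0.25, bias=0.0)                    # 10x5
print(c1.shape, s2.shape)
```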
3 Methodology

This section describes the research methodology used to develop the character recognition system.

3.1 Preparing the Training and Test Data Sets

The data available for training are divided into two different sets: a training set and a validation set. There should not be any overlap between these two datasets, in order to improve the generalization capacity of the neural network; this technique is called cross validation [3]. The true performance of a network is only revealed when the network is tested with test data, to measure how well the network performs on data that were not seen during training. The testing is designed to assess the generalization capability of
the network. Good generalization means that the network performs correctly on data that are similar to, but different from, the training data. The training and test data are limited to Malaysian license plates. The alphabetic characters involved are all the letters except 'I', 'O' and 'Z', as those three characters are not common on Malaysian license plates; the numeric characters are 0 to 9. Therefore, there are 33 characters in total. Initially, character recognition is performed to test the algorithm. For that purpose, the characters are extracted from license plates at different angles, binarized, resized and padded from 22x12 pixels to 24x14 pixels, and labeled. When padding the input, the feature units are centered on the border, and each pair of convolution and subsampling layers reduces the feature size from n to (n-4)/2. Since the initial input size is 22x12, the nearest value that generates an integer size after two layers of convolution is 24x14; this eases the training process. The total numbers of training and test images involved are 750 and 434, respectively. A CNN can actually deal with raw data, but to keep things simple it is suggested to perform simple image processing on the training and test data, as done in [1]. Fig. 4 shows a sample of the test set images.
Fig. 4. Sample of testing set images
3.2 Developing a MATLAB Model
The CNN algorithm is implemented as a MATLAB program to simulate and evaluate the performance of the feature extraction process on the character image database. Fig. 5 shows the CNN architecture for character recognition. The CNN is trained with the gradient-based backpropagation method. The purpose of this learning algorithm is to minimize the error function after each training example by adjusting the weights of the neural network; the simplest error function used is the mean square error. All training patterns, along with the expected outputs, are fed into the network. Next, the network error (the difference between actual and expected output) is backpropagated through the network and the gradient of the network error is computed with respect to the weights. This gradient is then used to update the weight values according to specific rules involving, for example, stochastic updates, momentum, an appropriate learning rate and activation function, etc. The training process continues until the network is well trained [1]. The supervised training is shown in Fig. 6. Once the network is trained, a test image is fed to the trained system to perform pattern classification, as depicted in Fig. 7.

3.3 The Proposed Architecture

The architecture shown in Fig. 5 comprises 5 layers, excluding the input, all of which contain trainable weights. The actual character size is 22x12, padded to 24x14 to extract features at the border of the character image. Layer C1 is a convolutional layer with 6 feature maps. The size of the feature maps is 20x10 pixels. The total number of neurons is 1200 (20x10x6). There are ((5x5+1)x6) = 156 trainable weights; the "+1" is for the bias.
Fig. 5. The proposed architecture
Fig. 6. Supervised training diagram
Fig. 7. Testing process diagram
Each of the 1200 neurons has 26 connections, which makes 31200 total connections from layer C1 to the prior layer. At this point, one of the benefits of a convolutional "shared weight" neural network should become clearer: because the weights are shared, even though there are 31200 connections, only 156 weights/parameters are needed to control those connections. As a consequence, only 156 weights need training. In comparison, a traditional "fully connected" neural network would have needed a unique weight for each connection, and would therefore have required training of 31200 different weights. Layer S2 is a subsampling layer with 6 feature maps of size 10x5 pixels. Each unit in each feature map is connected to a 2x2 neighbourhood in the corresponding feature map in C1. The 2x2 receptive fields are non-overlapping, therefore the feature maps in S2 have half the number of rows and columns of the feature maps in C1. There are therefore a total of 10x5x6 = 300 neurons in layer S2, 2x6 = 12 weights, and 300x(2x2+1) = 1500 connections.
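The layer sizes and parameter counts quoted above follow directly from the kernel sizes; the short computation below simply re-derives the arithmetic for C1 and S2 under the stated 24x14 input, and is not taken from the authors' code.

```python
# C1: 5x5 convolution, 6 feature maps on a 24x14 input
h, w, maps, k = 24, 14, 6, 5
c1_h, c1_w = h - k + 1, w - k + 1                 # 20 x 10
c1_neurons = c1_h * c1_w * maps                   # 1200
c1_weights = (k * k + 1) * maps                   # 156 (the +1 is the bias)
c1_connections = c1_neurons * (k * k + 1)         # 31200

# S2: 2x2 non-overlapping subsampling of C1
s2_neurons = (c1_h // 2) * (c1_w // 2) * maps     # 300
s2_weights = 2 * maps                             # 12 (one coefficient + one bias per map)
s2_connections = s2_neurons * (2 * 2 + 1)         # 1500

print(c1_neurons, c1_weights, c1_connections, s2_neurons, s2_weights, s2_connections)
```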
Layer C3 is a convolutional layer with 16 feature maps. Each unit in each feature map is connected to several 5x5 neighbourhoods at identical locations in a subset of the S2 feature maps. The reason for this is to keep the number of connections within reasonable bounds. There are therefore 6x1x16 = 96 neurons in layer C3, 1516 weights, and 9096 connections. Table 1 shows the connections between S2 and C3.

Table 1. Each column indicates which feature maps in S2 are combined by the units in a particular feature map of C3 [1]

Layer C4 is a fully connected layer with 120 units; this number was chosen because optimal capacity was reached with 120 hidden units for 33 classes. Since it is fully connected, each of the 120 neurons in the layer is connected to all 96 neurons in the previous layer. Therefore, there are a total of 120 neurons in layer C4, 120x(96+1) = 11640 weights, and 120x97 = 11640 connections. The Output layer is a fully connected layer with 33 units. Since it is fully connected, each of the 33 neurons in the layer is connected to all 120 neurons in the previous layer. There are therefore 33 neurons in the output layer, 33x(120+1) = 3993 weights, and 33x121 = 3993 connections. An output value of "+1" corresponds to the "winning" neuron, while "-1" corresponds to the other neurons. No specific rules are given for deciding the number of layers and feature maps needed to obtain an optimum architecture; as long as sufficient information can be extracted for the classification task, the architecture is considered acceptable, as judged by the misclassification rate obtained by the network. However, a minimum number of layers and feature maps is preferred to ease the computation. Each unit in each feature map of C1 is connected to a 5x5 kernel in the input. This kernel size is chosen to be centered on a unit (odd size) in order to have sufficient overlap (around 70%) so as not to lose information; a 3x3 kernel would be too small, with only one unit of overlap, while a 7x7 kernel would be too large and would add computational complexity [7].

3.4 Optimizing the CNN Algorithm

There are several actions that can be taken to improve the CNN performance. Extensions to the backpropagation algorithm can be considered to improve the convergence speed of the algorithm, avoid local minima, improve the generalization of the neural network, and finally improve the recognition rate/accuracy. A character database consisting of 33 classes representing alphanumeric characters was created. The system's sensitivity to angles and distortions was reduced by taking 10 samples for each class. Several factors that affect the system performance are discussed below. The parameter values were taken from [1]. Among these techniques, weight decay was not implemented by the authors.
Training Mode. There are two principal training modes, which determine the way the weights are updated. The first is online training (stochastic gradient): a single example is chosen randomly from the training set at each iteration t, and the error is calculated before the weights are updated accordingly. The second is offline training (batch training): the whole training set is fed into the network and the accumulated error is calculated before updating the weights. Between these two modes, online learning is much faster than batch learning and results in better generalization for large datasets [1, 3].
Learning Rate. The learning rate is used during the weight update of the architecture. This parameter is crucial in determining successful convergence and generalization of the neural network: a learning rate that is too small leads to slow convergence, while one that is too large leads to divergence. For this architecture, the values of the global learning rate η are adapted from [1]. The value was decreased using the following schedule: 0.0005 for the first two passes; 0.0002 for the next three; 0.0001 for the next three; 0.00005 for the next four; and 0.00001 thereafter.
Activation Function. The activation function is pre-conditioned for faster convergence. The squashing function used in this convolutional network is f(a) = A tanh(Sa). In [1], A is chosen as 1.7159 and S = 2/3. With this choice of parameters, the equalities f(1) = 1 and f(-1) = -1 are satisfied. Symmetric functions are believed to yield faster convergence, although the learning can become extremely slow if the weights are too small.
Second Order Backpropagation. Second order methods have the greatest impact on speeding up convergence of the neural network, in that they dramatically reduce the number of epochs needed for the weights to converge. All second order techniques aim to increase the speed with which backpropagation converges to optimal weights. However, most second order techniques are designed for offline mode, which is impractical for neural network training; training works considerably faster in online mode (stochastic gradient), where the parameters are updated after every training sample. Hence, [1] proposed a stochastic version of the Levenberg-Marquardt algorithm with a diagonal approximation of the Hessian. The diagonal Hessian (the square matrix of second-order partial derivatives of a function) in a neural network was shown to be very easy to compute with backpropagation. During the simulation process, the number of subsamples chosen for the diagonal Hessian estimation gave different results.
Size of Training Set. The size of the training set also affects the system performance in terms of accuracy [7]. The training set should be as large as possible, for example by adding new forms of distorted data; in this way the network can learn different training patterns, which increases accuracy.
Momentum. A momentum term is added to improve the convergence speed. It lets the previous weight change influence the current weight change [1, 3], preventing oscillation. The weight update formula is
\Delta w_k(n) = -\lambda \frac{\partial E_p}{\partial w_k(n)} + \alpha \, \Delta w_k(n-1)    (4)
where \alpha is the momentum rate (0 \le \alpha < 1) and \lambda is the learning rate. Table 2 reports the misclassification rate for different momentum values.
Weight Decay. This is a regularization technique in which the term \frac{\alpha}{2}\sum_x w_x^2 is added to the error function, as shown in the equation below:

E_p = \frac{1}{2}\,\lVert o_p - t_p \rVert^2 + \frac{\alpha}{2}\sum_x w_x^2    (5)

This term avoids large weights, reduces the flexibility of the neural network and helps avoid overfitting the data. Performing gradient descent on this function leads to the following update formula:

\Delta w_x = -\lambda \frac{\partial E_p}{\partial w_x} - \lambda \alpha w_x    (6)

where w_x refers to all the weights and biases and \alpha refers to a small positive constant.
Cross Validation. This technique separates the data into two disjoint parts, representing the training set and the validation set. The purpose of this technique is to improve the generalization capacity of the neural network. Theoretically, the error on the training and validation sets should decrease during the training process. However, at some point the validation error remains constant or even increases; this increase shows that the neural network may have stopped learning the patterns common to the training and validation sets, and started to learn noise contained in the training set [3].
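The following sketch shows how the learning-rate schedule quoted above and the momentum update of Eq. (4) might be combined in a stochastic training loop; the function and variable names are illustrative, and the gradient computation itself is left abstract since it depends on the network implementation.

```python
def learning_rate(epoch):
    """Schedule from the text: 0.0005 (2 passes), 0.0002 (3), 0.0001 (3),
    0.00005 (4), then 0.00001 thereafter."""
    if epoch < 2:
        return 0.0005
    if epoch < 5:
        return 0.0002
    if epoch < 8:
        return 0.0001
    if epoch < 12:
        return 0.00005
    return 0.00001

def momentum_update(weights, grads, velocity, lam, alpha=0.5):
    """Eq. (4): delta_w(n) = -lam * dE/dw + alpha * delta_w(n-1)."""
    new_velocity = [-lam * g + alpha * v for g, v in zip(grads, velocity)]
    new_weights = [w + dw for w, dw in zip(weights, new_velocity)]
    return new_weights, new_velocity
```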
4 Results and Discussions

This section discusses the results obtained from a simulation lasting about 4 hours, in which the network went through 30 iterations.

4.1 Misclassification Rate

To analyze the results obtained, the performance of the architecture is measured by the misclassification rate, which refers to the number of samples that are misrecognized. Thirty iterations through the entire training data were performed for each session. The values of the global learning rate η were decreased using the following schedule: 0.0005 for the first two passes; 0.0002 for the next three; 0.0001 for the next three; 0.00005 for the next four; and 0.00001 thereafter. A wide range of momentum values is available for a CNN; in order to select the optimum value, several experiments were conducted, with the results given in the table below. From Table 2, it can be seen that the optimum momentum is approximately 0.5. According to Fig. 8, the resulting misclassification rate is 1.21% for 434 test images and 750 training images with momentum = 0.5. This means the neural network correctly recognized 428 patterns and misrecognized 6 patterns.
Table 2. The misclassification rate according to different momentum values

Momentum   Train (percent)   Test (percent)
0.0        0.9091            2.4242
0.1        0.9091            4.8485
0.2        0.6061            3.0303
0.3        0.3030            1.8182
0.4        0.9091            3.6370
0.5        0.0000            1.2121
0.6        1.2121            3.0303
0.7        0.3030            1.8182
0.8        0.0000            3.0303
0.9        0.3030            3.0303
As shown in the diagram, the training and test errors stabilize after 20 iterations. In certain cases, the graph increases again after reaching a stable condition; it is assumed that at this point the neural network starts to overtrain and generalization decreases. This is due to the fact that the neural network has learned noise specific to that dataset. It may also be due to overfitting of the classifier, and to the training set not being large enough to improve the classification accuracy. To avoid this situation, an "early stopping" technique can be implemented once the graph starts to stabilize [3].
Fig. 8. Misclassification rate graph for momentum=0.5
5 Conclusion

Several factors that affect the system performance have been discussed. The system performance can be increased by implementing online mode, choosing an appropriate learning rate and activation function, using second order backpropagation (a stochastic version of the Levenberg-Marquardt algorithm), expanding the size of the training set with different forms of distorted images, applying momentum and weight decay, and ultimately implementing cross validation. The numbers of training and test images used
in this research are 750 and 434, respectively, with a 1.21% misclassification rate or 98.79% accuracy. The approach is similar to [4-6, 8], except that some of those works apply geometrical rules to extract the characters. Their training and test data sets contain more than 2000 samples, which is much larger than what is proposed in this work, while the accuracy is more or less the same as in this research. In conclusion, this research work is better in terms of the reduced number of samples required.
Acknowledgments. This work is supported by the Ministry of Science, Technology & Innovation of Malaysia (MOSTI) under TECHNOFUND Grant TF0106C313, UTM Vote No. 79900.
References 1. 2.
3. 4.
5.
6.
7.
8.
LeCun, Y., et al.: Gradient-Based Learning Applied to Document Recognition. In: Intelligent Signal Processing, pp. 306–351. IEEE Press (2001) Strigl, D., Kofler, K., Podlipnig, S.: Performance and Scalability of GPU-Based Convolutional Neural Networks. In: 18th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP 2010), pp. 317–324 (2010) Duffner, S.: Face Image Analysis with Convolutional Neural Networks. [Dissertation], [cited Doctoral Thesis / Dissertation; 192 ] (2007) Zhihong, Z., Shaopu, Y., Xinna, M.: Chinese License Plate Recognition Using a Convolutional Neural Network. In: Pacific-Asia Workshop on Computational Intelligence and Industrial Application, PACIIA (2008) Chen, Y.-N., et al.: The Application of a Convolution Neural Network on Face and License Plate Detection. In: 18th International Conference on Pattern Recognition, pp. 552–555 (2006) Han, C.-C., et al.: License Plate Detection and Recognition Using a Dual-Camera Module in a Large Space. In: 41st Annual IEEE International Carnahan Conference on Security Technology, pp. 307–312 (2007) Simard, P.Y., Steinkraus, D., Platt, J.C.: Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis. In: Seventh International Conference on Document Analysis and Recognition. Institute of Electrical and Electronics Engineers, Inc. (2003) Johnson, M.: A Unified Architecture for the Detection and Classification of License Plates. In: 2008 10th International Conference on Control, Automation, Robotics and Vision, Hanoi, Vietnam (2008)
Simulation Strategy of Membrane Computing to Characterize the Structure and Non-deterministic Behavior of Biological Systems: A Case Study with Ligand-Receptor Network of Protein TGF-β Muniyandi Ravie Chandren1 and Mohd. Zin Abdullah2 1
School of Computer Science School of Information Technology, Faculty of Information Science and Technology, University Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia {ravie,amz}@ftsm.ukm.my 2
Abstract. The processes in biological systems evolve in discrete and non-deterministic ways. Simulation of conventional models such as ordinary differential equations, with their continuous and deterministic evolution strategy, disregards these behaviors of biological systems. Membrane computing, which works in a non-deterministic and maximally parallel way to capture the structure and behaviors of biological systems, can be used to address the limitations of ordinary differential equations. A stochastic simulation strategy based on Gillespie's algorithm has been used to simulate the membrane computing model. This study was carried out to demonstrate the capability of the membrane computing model in characterizing the structure and behaviors of biological systems in comparison to the model of ordinary differential equations. The results demonstrate that the simulation of the membrane computing model preserves the structure and non-deterministic behaviors of biological systems that are ignored in the ordinary differential equations model.
Keywords: Membrane computing, stochastic simulation, biological systems.
1 Introduction

Biological systems are assumed to be continuous and deterministic in modeling and simulation with conventional mathematical approaches such as Ordinary Differential Equations (ODE), contrary to their discrete and non-deterministic behaviors. The concentration of chemical substances in an ODE model is measured according to the decrease and increase of the chemical substances corresponding to the kinetic constants of the processes in the system. Although ODEs allow the processes to be described in detail, a number of implicit assumptions underlying ODEs are no longer applicable at the molecular level [1]. Jong [1] explains the limitations of the continuity and determinism of ODEs at the molecular level. Firstly, the small number of molecules in biological systems, such as a few tens of molecules of a transcription factor in the cell nucleus, could
compromise the continuity assumption of ODEs. Secondly, the deterministic change assumed by using the differential operator may be questionable due to fluctuations in the timing of cellular events, such as the delay between the start and finish of transcription. This could lead to a situation where two regulatory systems having the same initial conditions ultimately settle into different states, a phenomenon strengthened by the small numbers of molecules involved [1]. Probability theory [2] has been used to understand the stochastic behavior of biological systems, and mathematical analysis based on this theory provides a complete description of the properties of simple random systems. Nevertheless, stochastic models of biological systems such as biochemical networks show the limitations of mathematical analysis, and such systems are described as analytically intractable [3]. In recent years, ways to capture the role of stochasticity in biological systems 'in silico' have been the subject of increasing interest among biologists. With the development of computer systems, the simulation of the time evolution of a biological system becomes possible. Stochastic simulation is a way to simulate the dynamics of a system by capturing the random phenomena, to understand the model and to extract many realizations from it in order to study them [3]. In recent years, a number of algorithms have been devised to deal with the stochastic character of biological systems [4]. Some of them are: stochastic reaction-diffusion simulation with MesoRD [5]; stochastic simulation of chemical reactions with spatial resolution and single molecule detail [6]; and Monte Carlo simulation methods for biological reaction-diffusion systems [7]. However, such attempts focus more on the general behaviors of biological systems without taking into account the structure of the system where the behaviors take place. Membrane computing [8] is an area of computer science that conceptualizes ideas and models of computation from the structure and behavior of the living cell. It provides a mechanism for biological systems to preserve their structural characters as well as the behaviors of the stochastic processes in the systems [9]. This is possible because membrane computing is a computing model with distributed and parallel processing, in which the processes evolve in a parallel and non-deterministic or stochastic way and all evolution rules are applied simultaneously to all the objects. The computation halts and produces output when no rule can be applied. The membranes can also be arranged in a hierarchical structure as in a cell, or in an internetworking of different membranes as in a tissue. This conception provides the opportunity for selective communication between two regions of a biological system. The structure and the stochastic behaviors of biological systems modeled with membrane computing have been verified by using a stochastic simulation strategy based on the Gillespie algorithm [10]. The Gillespie algorithm [11] provides a method for the stochastic simulation of systems of bio-chemical reactions; it assumes that at each time step the chemical system is in exactly one state, and it directly simulates the time evolution of the system. Basically, the algorithm determines the nature and occurrence time of the next reaction, given that the system is in state s at time t.
This paper analyzes the simulation of the membrane computing model of the ligand-receptor network of protein TGF-β [12] to demonstrate the capability of membrane computing to preserve the structure and behaviors of biological systems.
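As a reference point for the simulation strategy discussed in the rest of the paper, the following is a minimal sketch of one step of Gillespie's direct method for a compartmentalized multiset of objects; the data layout (a dictionary of object counts per compartment and rules given as reactant/product multisets with kinetic constants) is an illustrative choice, not the representation used by the simulator of [15].

```python
import math
import random

def propensity(rule, state):
    """Mass-action propensity: k times the product of reactant counts."""
    a = rule["k"]
    for (compartment, obj), count in rule["reactants"].items():
        a *= state[compartment].get(obj, 0) ** count
    return a

def gillespie_step(state, rules, t):
    """One iteration of the direct method: pick the waiting time and the rule."""
    props = [propensity(r, state) for r in rules]
    a0 = sum(props)
    if a0 == 0:
        return t, None                      # no rule applicable; computation halts
    tau = -math.log(random.random()) / a0   # exponentially distributed waiting time
    r = random.random() * a0                # choose rule j with probability a_j / a0
    acc, j = 0.0, 0
    for j, a in enumerate(props):
        acc += a
        if r < acc:
            break
    for (comp, obj), n in rules[j]["reactants"].items():
        state[comp][obj] = state[comp].get(obj, 0) - n
    for (comp, obj), n in rules[j]["products"].items():
        state[comp][obj] = state[comp].get(obj, 0) + n
    return t + tau, j
```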
1.1 Case Study: Ligand-Receptor Network of TGF-β

The Ligand-Receptor Network of TGF-β is a two-compartment system with five chemical substances and fourteen different reactions. The compartments are the Plasma membrane and the Endosome, as illustrated in Figure 1.
Fig. 1. Membrane Structure of Ligand-Receptor Network of TGF-β
In signal transduction, some cells secrete TGF-β and also generate receptors for TGF-β. The TGF-β signal transduction pathway plays a central role in tissue homeostasis and morphogenesis. It transduces a variety of extracellular signals into intracellular transcriptional responses that control cellular processes such as cell growth, migration, adhesion, apoptosis and differentiation. At the molecular level, a complex signal transduction machinery integrates signals from the 42 known ligands of the TGF-β superfamily of proteins. The elements of this machinery incorporate the members of the two main receptor families, called type I and type II receptors. Each ligand induces the formation of a receptor complex with type I and type II receptors, which then signals through the channels. The capacity of most ligands to bind several type I and type II receptors leads to a complex ligand-receptor interaction network. TGF-β is of particular interest in cancer research. For instance, in epithelial cells it suppresses cellular growth, and its inactivation contributes to tumourigenesis. The versatility of the pathway in eliciting different types of behaviour is perhaps best epitomized by the pervasive, rather paradoxical ability of TGF-β to change its function from suppressor to promoter of growth in epithelial cells during tumour progression. It has been suggested that TGF-β can suppress the growth of cells around the tumour, that it can shut down the immune system locally, and that it can promote angiogenesis. All these paracrine effects would help the growth of the tumour in vivo, where it has to compete with neighbouring cells. In this biological system, the ligand concentration is held constant to reflect the consistent availability of ligand. However, there is a trigger in the system at time step 2500 that changes the ligand concentration from 3x10-5 to 0.01 to reach a steady state. In this case, the ligand is considered the event object that triggers state changes in the system when a specific event or condition is invoked.
2 Membrane Computing Model of Ligand-Receptor Network of TGF-β

The structure and behaviors of the Ligand-Receptor Network of TGF-β (LRN) are extracted from Villar [12]. The gathered information is represented in the membrane computing formalism [13] as follows:
LRN = (V, \mu, \omega_P, \omega_E, R_P, R_E)

The system contains two compartments, the Plasma membrane (P) and the Endosome (E). Therefore its membrane structure can be represented as

\mu = [\,[\ ]_E\,]_P
The combination of right and left square brackets refers to a membrane. The structure refers to a compartmentalized membrane system in which membrane E is encapsulated within membrane P. The objects are receptors and a complex of ligand and receptors: l is the ligand, which is assumed to be always available in P; RI and RII are receptors; and lRIRII is the ligand-receptor complex, which is manipulated by the rules in P and E. The objects are listed as follows:

V = {l, RI, RII, lRIRII}

The combination of objects in a compartment at a time step t forms a multiset, in which an object can have multiple instances. The multisets at time step t = 0 are
\omega_P = {l, RI, RII} and \omega_E = {lRIRII}.

The evolution rule has the form R_j : u\,[v]_j \xrightarrow{k} u'\,[v']_j, where u, v, u', v' are multisets and j is the label of the compartment. k is a real number representing the kinetic constant. A rule of this form is interpreted as follows: based on the value of k, the multiset u outside membrane j and the multiset v inside membrane j react together to produce u' outside j and v' inside j. If there is more than one rule in compartment j, the rules are labeled R_j1, R_j2, ..., R_jn to denote each of the n rules. The two common evolution rules in biological systems are:
(a) Transformation rule: a rewriting rule that occurs between objects inside a compartment. For example, the rule [X,X]_j \rightarrow [X,Y]_j means that, when two objects X are available in compartment j, they can interact with each other to generate an object X and an object Y.
(b) Communication rule: the transportation of an object from one compartment to another; either an object in a compartment moves out, or an object moves in. In the communication rule Y[X]_j \rightarrow [X,Y]_j, object Y, which is outside compartment j, moves into that compartment. In the communication rule [Y,X]_j \rightarrow Y[X]_j, object Y, which is inside compartment j, moves out of the compartment.
The rules of the ligand-receptor network of TGF-β are extracted from the ODE model given by Villar [12]. The ODE model is converted into membrane computing by using rewriting rules [14]. The rules of the ligand-receptor network of TGF-β are represented in membrane computing as follows.
1) Ligand receptor complex formation: This rule is activated by k_a, in which l, RI, RII combine to evolve into lRIRII in the plasma membrane. In this rule an
event is accommodated in which the concentration of ligand (l) changes from 3×10^{-5} to 0.01 at time step 2500.
R_{P1}: (\text{if } t \ge 2500 \text{ then } l = 0.01 \text{ else } l = 3 \times 10^{-5}) \gg [\,l, RI, RII\,]_P \xrightarrow{k_a} [\,lRIRII, l\,]_P    (1)
2) Ligand receptor complex constitutive degradation: This rule is activated by k_{cd}, in which lRIRII in the plasma membrane is depleted.

R_{P2}: [\,lRIRII\,]_P \xrightarrow{k_{cd}} [\ ]_P    (2)
3) Ligand independent complex degradation: This rule is activated by k_{lid}, in which lRIRII in the plasma membrane is depleted.

R_{P3}: [\,lRIRII\,]_P \xrightarrow{k_{lid}} [\ ]_P    (3)
4) Ligand receptor complex internalization: This rule is activated by k_i, in which lRIRII in the plasma membrane is transported into the endosome.

R_{P4}: lRIRII\,[\ ]_E \xrightarrow{k_i} [\,lRIRII\,]_E    (4)
5) RI synthesis: This rule is activated by p_{RI}, in which RI is generated in the plasma membrane.

R_{P5}: [\ ]_P \xrightarrow{p_{RI}} [\,RI\,]_P    (5)
6) RI constitutive degradation: This rule is activated by k_{cd}, in which RI in the plasma membrane is depleted.

R_{P6}: [\,RI\,]_P \xrightarrow{k_{cd}} [\ ]_P    (6)
7) RI internalization: This rule is activated by k_i, in which RI in the plasma membrane is transported into the endosome.

R_{P7}: RI\,[\ ]_E \xrightarrow{k_i} [\,RI\,]_E    (7)
8) RI recycling: This rule is activated by k_r, in which RI in the endosome is transported into the plasma membrane.

R_{E1}: [\,RI\,]_E \xrightarrow{k_r} RI\,[\ ]_E    (8)
9) Ligand receptor complex recycling: The first rule is activated by k_r, in which lRIRII in the endosome is transported into the plasma membrane. The second rule is activated by the product of k_r and \alpha, in which the recycled lRIRII is transformed into RI, RII in the plasma membrane.

R_{E2}: [\,lRIRII\,]_E \xrightarrow{k_r} lRIRII\,[\ ]_E    (9)

R_{P8}: [\,lRIRII\,]_P \xrightarrow{k_r \cdot \alpha} [\,RI, RII\,]_P    (10)
10) RII synthesis: This rule is activated by p_{RII}, in which RII is generated in the plasma membrane.

R_{P9}: [\ ]_P \xrightarrow{p_{RII}} [\,RII\,]_P    (11)
11) RII constitutive degradation: This rule is activated by k_{cd}, in which RII in the plasma membrane is depleted.

R_{P10}: [\,RII\,]_P \xrightarrow{k_{cd}} [\ ]_P    (12)
12) RII internalization: This rule is activated by k_i, in which RII in the plasma membrane is transported into the endosome.

R_{P11}: RII\,[\ ]_E \xrightarrow{k_i} [\,RII\,]_E    (13)
13) RII recycling: This rule is activated by k_r, in which RII in the endosome is transported into the plasma membrane.

R_{E3}: [\,RII\,]_E \xrightarrow{k_r} RII\,[\ ]_E    (14)
3 Simulation of LRN System

The LRN system modeled in membrane computing is simulated with the Gillespie algorithm. This experiment was intended to analyze how different parameters affect the behavior of the LRN system by using the Gillespie simulator [15]. The signaling activity will peak or stop rising at certain steps according to the changes in the parameters. The parameters are extracted from the mathematical model built by Villar [12]. The initial multisets are \omega_P = {1130 RI, 1130 RII} and \omega_E = {40 lRIRII}.
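Under the same illustrative data layout as the Gillespie sketch in Section 1, the initial multisets above and a few of the LRN rules could be encoded as follows; only three of the fourteen rules are shown, the rate values repeat the typical trafficking rates quoted below, and the ligand count is a placeholder, so this is a sketch rather than the configuration of the simulator in [15].

```python
# Initial multisets: plasma membrane (P) and endosome (E)
state = {"P": {"l": 1, "RI": 1130, "RII": 1130, "lRIRII": 0},   # "l": placeholder; the ligand level is held constant (see Section 3.1)
         "E": {"RI": 0, "RII": 0, "lRIRII": 40}}

rules = [
    # Rule (1): complex formation in P, rate k_a = 0.01 (the ligand l is not consumed)
    {"k": 0.01,
     "reactants": {("P", "l"): 1, ("P", "RI"): 1, ("P", "RII"): 1},
     "products":  {("P", "l"): 1, ("P", "lRIRII"): 1}},
    # Rule (4): complex internalization P -> E, rate k_i = 1/3
    {"k": 1/3,
     "reactants": {("P", "lRIRII"): 1},
     "products":  {("E", "lRIRII"): 1}},
    # Rule (9): complex recycling E -> P, rate k_r = 1/30
    {"k": 1/30,
     "reactants": {("E", "lRIRII"): 1},
     "products":  {("P", "lRIRII"): 1}},
]
```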
Fig. 2. Behavior of LRN system for typical trafficking rates
Figure 2 shows the behavior of the model for typical trafficking rates. In this model, the internalization rate (k_i), represented by k4, k7 and k13, is 1/3; the recycling rate (k_r), represented by k8, k9, k10 and k14, is 1/30; the constitutive degradation rate (k_{cd}), represented by k2, k6 and k12, is 1/36; the ligand-induced degradation rate (k_{lid}), represented by k3, is 1/4; the complex formation rate (k_a), represented by k1, is 0.01; and the synthesis rates (p_{RI} and p_{RII}), for k5 and k11, are 8 and 4, respectively. The efficiency of recycling of active receptors, \alpha, is 1. The results show that the signalling activity peaks when the concentration of lRIRII in the Endosome is around 500.
Fig. 3. Behavior of LRN system when rate constants for internalization and recycling are decreased
Fig. 4. Behavior of LRN system when rate constants for internalization and recycling are increased
When the rate constants for internalization and recycling are decreased to 1/10 and 1/100, respectively, the peak of the signaling activity also decreases, as shown in Figure 3; the concentration of lRIRII in the Endosome is around 300. Meanwhile, when the rate constants for internalization and recycling are increased to 1 and 1/10, respectively, the peak of the signaling activity also increases, as shown in Figure 4; the concentration of lRIRII in the Endosome is around 600.
Fig. 5. Behavior of LRN system when rate constants for ligand-induced degradation and efficiency of recycling of active receptors are decreased
Fig. 6. Behavior of LRN system when recycling of active receptors is decreased
The simulations in Figure 5 and Figure 6 show the behaviour of the model when the efficiency of recycling of active receptors is decreased to 0.5. Figure 5 shows that
when the ligand-induced degradation rate is decreased to 0.01, the signaling activity peaks when the concentration of lRIRII in the Endosome is around 800. Meanwhile, with the same ligand-induced degradation rate as in Figure 2, the signaling activity peaks when the concentration of lRIRII in the Endosome is around 500.

3.1 Results and Discussions

The membrane-computing-generated Figures 2, 3, 4, 5 and 6 are compared to the ODE-generated panels (A), (B), (C), (D) and (E) in Figure 7, respectively. The simulation of the membrane computing model of the LRN system shows that the general behaviors of the system indicated in the ODE model [12] in Figure 7 are also obtainable in membrane computing, as demonstrated by the peak reached in the membrane computing model, which is very similar to that of the ODE model.
[Figure 7 panels: (A) – (E)]
Fig. 7. Simulation of ODE model of TGF-β [12]
Moreover, the membrane computing model also captures the communication of objects between compartment P and compartment E based on rules (4), (7), (8), (9), (13) and (14). Since the objects in the system are specified per compartment, and each of these objects evolves discretely, the number of each type of object in a specific compartment can be measured at each time step. At the same time, the combination of the behaviors of the processes or objects in the system can
generate emergent behavior, as shown by the peak of lRIRII in the Endosome at a certain time step. The simulation results also show that the discrete and non-deterministic character of membrane computing can sustain not only the general behavior of the system but also the specific stochastic elements at each time step, by executing the rules non-deterministically based on the specific behavior of the system. Moreover, the model checking carried out by Muniandy et al. [16] reinforced that the properties of the LRN system have been preserved in the membrane computing model. However, since objects are modeled as global entities in the ODE model, the concentration of objects in a specific compartment cannot be verified; the ODE model merely concentrates on the global behavior and disregards the structure and the specific behaviors in the system. Meanwhile, the membrane computing simulation strategy takes more time than the ODE. For instance, the ODE simulation takes around 200 time units to reach a peak [12], whereas the membrane computing simulation takes around 2500 steps to reach a similar peak with the same initial concentration of lRIRII in the Endosome. This observation demonstrates that more processing time is needed to select a reaction in the stochastic approach at each time step compared with the deterministic approach of the ODE. In the simulation of the membrane computing model, the number of simulation steps is almost the same for each of the investigations and cannot be adjusted as in the ODE model, due to a limitation of the Gillespie Simulator, which cannot accommodate an event object. An event object triggers state changes in the model when a specific event or condition is invoked. In this model, the ligand can be considered the event object, in which the event at time step 2500 should change the ligand concentration from 3x10^{-5} to 0.01 to reach a steady state. However, this element is not accommodated in the Gillespie Simulator, and due to this limitation the ligand concentration is fixed at 0.01 at all times.
4 Conclusions
The evolution processes in the LRN system with two compartments show that the membrane computing model preserves the structure and behaviors of hierarchical systems, and this has been supported by the model checking approach [16], which certified that the properties of the system are preserved in the model. Advances in memory capacity and processing speed could reduce the simulation time of membrane computing relative to the ODE. At the same time, the Gillespie Simulator should be enhanced to speed up the processing of membrane computing models and to manage specific biological behaviors, such as the event object in the LRN system. Nevertheless, the investigation reinforces that the elements and structure of a biological system can be characterized better with a membrane computing model, and interpreted more directly by biologists, than with conventional mathematical models.
Acknowledgements. This work was supported by the Exploratory Research Grant Scheme (ERGS), Ministry of Higher Education (Malaysia). Grant code: ERGS/1/2011/STG/UKM/03/7.
References
1. Jong, H.D.: Modeling and Simulation of Genetic Regulatory Systems: A Literature Review. Journal of Computational Biology 9(1), 67–103 (2002)
2. Jaynes, E.T., Bretthorst, G.L.: Probability Theory: The Logic of Science. Cambridge University Press, London (2003)
3. Wilkinson, D.J.: Stochastic Modeling for Systems Biology. CRC Press, London (2006)
4. Modchang, C., Nadkarni, S., Bartol, T.M., Triampo, W., Sejnowski, T.J., Levine, H., Rappel, W.: A Comparison of Deterministic and Stochastic Simulations of Neuronal Vesicle Release Models. Phys. Biol. 7(2), 26008 (2010)
5. Hattne, J., Fange, D., Elf, J.: Stochastic Reaction-diffusion Simulation with MesoRD. Bioinformatics 21, 2923–2924 (2005)
6. Andrews, S.S., Bray, D.: Stochastic Simulation of Chemical Reactions with Spatial Resolution and Single Molecule Detail. Phys. Biol. 1(3), 137–151 (2004)
7. Kerr, R.A., Bartol, T.M., Kaminsky, B., Dittrich, M., Chang, J.J., Baden, S.B., Sejnowski, T.J., Stiles, J.R.: Fast Monte Carlo Simulation Methods for Biological Reaction-diffusion Systems in Solution and on Surfaces. SIAM J. Sci. Comput. 30, 3126 (2008)
8. Paun, G.: Computing with Membranes. Journal of Computer and System Sciences 61(1), 108–143 (1998)
9. Muniyandi, R., Mohd. Zin, A.: Modeling a Multi Compartments Biological System with Membrane Computing. J. Comput. Sci. 6, 1148–1155 (2010)
10. Muniyandi, R., Mohd. Zin, A.: Experimenting the Simulation Strategy of Membrane Computing with Gillespie Algorithm by Using Two Biological Case Studies. J. Comput. Sci. 6, 525–535 (2010)
11. Gillespie, D.T.: Approximate Accelerated Stochastic Simulation of Chemically Reacting Systems. Journal of Chemical Physics 115(4), 1716–1733 (2001)
12. Villar, J.M., Jansen, R., Sander, C.: Signal Processing in the TGF-β Superfamily Ligand-Receptor Network. PLoS Comput. Biol. 2(1), e3 (2006)
13. Muniyandi, R., Mohd. Zin, A.: Modeling of Biological Processes by Using Membrane Computing Formalism. Am. J. Applied Sci. 6, 1961–1969 (2009)
14. Bezem, M., Klop, J.W., Vrijer, R.D.: Term Rewriting Systems. Cambridge University Press, London (2003)
15. Romero-Campero, F., Gheorghe, M., Auld, J.: Multicompartment Gillespie Simulator in C (2004), http://www.dcs.shef.ac.uk/~marian/PSimulatorWeb/PSystemMF.htm
16. Muniyandi, R., Mohd. Zin, A., Shukor, Z.: Model Checking the Biological Model of Membrane Computing with Probabilistic Symbolic Model Checker by Using Two Biological Systems. J. Comput. Sci. 6, 669–678 (2010)
Development of 3D Tawaf Simulation for Hajj Training Application Using Virtual Environment Mohd Shafry Mohd Rahim, Ahmad Zakwan Azizul Fata, Ahmad Hoirul Basori, Arief Salleh Rosman, Tamar Jaya Nizar, and Farah Wahida Mohd Yusof Faculty of Computer Science & Information Systems, Universiti Teknologi Malaysia 81310 UTM Skudai Johor, Malaysia {shafry,aswar,tamar,farahw}@utm.my, {uchiha.hoirul,zakwanfata}@gmail.com
Abstract. 3D applications have become very popular due to their capability to imitate real-world environments and their interactivity in entertaining users. This paper proposes an interactive training method for Hajj education in Malaysia by providing users with a scene of Tawaf (one of the Hajj pilgrimage rituals). Participants are provided with a flexible user control that is connected wirelessly to the system. The Tawaf simulation is created with crowd simulation technology, and in this case we use a new method to control the movements of the human characters around the Kaaba. Based on user testing feedback, users are thrilled with the system, particularly because it is user-friendly and flexible.
Keywords: 3D application, interactive training method, crowd, Visual Informatics.
1 Introduction
Virtual reality has been used as a fundamental application in many engineering fields, including medical systems (e.g., surgery simulation), government defense systems (military simulation training, flight simulation), and other fields related to real life (Greitzer et al., 2007; Michael and Chen, 2006; Geiger et al., 2008). There are various approaches to developing 3D simulations, especially those that utilize game engines and haptic devices, visualization, animation and feedback through the sense of touch. In addition, a virtual environment offers more interactivity in simulation, because it provides visual interpretation, text information, audio, and touch-based feedback. Recently, games and simulation have evolved into serious games, which comprise tasks that teach users specific knowledge (Greitzer et al., 2007). On the subject of the Hajj pilgrimage, previous researchers have built a virtual walkthrough of the Jamarat area to address the issue of the safest path during the annual pilgrimage, which involves millions of people (Widyarto and Latiff, 2007). Widyarto and Latiff (2007) focus on providing users with a tour of the Jamarat area and showing them the safe route to escape when an emergency occurs.
Millions of pilgrims from around the world gather each year to perform the Hajj. Based on a survey, in 2009 the number of pilgrims reached 2.5 million. To simulate this huge population, we have used a crowd simulation technique to imitate the real situation inside a virtual environment. Crowd simulation of the Hajj requires an efficient algorithm and suitable hardware to achieve the best performance. In this paper we focus on one of the annual Hajj pilgrimage rituals, called Tawaf. Tawaf means moving in a circular pattern with the Kaaba as the epicentre of the movement. This paper discusses our experience in developing the Tawaf application using a crowd simulation technique, with results tested with real users.
2 Previous Work
Combining virtual reality, games and simulation gives the virtual reality game a new paradigm in terms of entertainment. Simulation can stimulate the player or user through entertainment, contests, and immediate reactions in a digital environment, which can generate an immersive feeling (Greitzer et al., 2007). However, simulation still faces problems such as how to improve 3D interaction in a digital environment, including choosing, exploitation, direction-finding (navigation) and managing the system (Mocholí et al., 2006). Solving those issues by integrating human haptic emotion and artificial intelligence will bring the simulation closer to reality. The complexity of 3D simulation has been classified into several elements (Capilla and Martinez, 2004):
• Use of special hardware such as haptic devices or head-mounted display devices
• Complication of user interface and interaction
• 3D modeling
• Collision, physics and deformation
• Presence of the user in the virtual environment.
To produce good-quality 3D simulation applications we can utilize game engines as development tools. Current game engines have advanced technology that can provide and facilitate better visualization and communication between graphics hardware, sound hardware and the application (Marks et al., 2007). The latest 3D applications must be able to handle graphics and visualization computation, for instance loading highly complex primitives, terrain, trees, stones, clouds, ocean and sky, and loading thousands of textures. Other researchers have focused on the safety of the Hajj pilgrimage by analyzing the terrain of Mina (a place near Mecca, see Fig. 1) and the building structures based on GIS methods to increase human safety during the prayer sessions (Al-Kodmany, 2009; Shehata and Koshak, 2006). According to Shehata and Koshak (2006), the hydrological structures need to be investigated to see the water drainage path patterns and their impact on the area near Mina, and whether they have the potential to cause floods. To address this issue, Shehata and Koshak (2006) created a 3D GIS model to analyze the potential for natural hazards.
Fig. 1. Map of Hajj pilgrimage places
Education for the Hajj pilgrimage, especially in Malaysia, still depends on conventional teaching methods such as hajj training and the prime hajj course (see Figure 2).
Fig. 2. Hajj training and prime hajj course (picture source: http://sabangcenter.blogspot.com/2008/10/22-calon-jamaaf-haji-sabang-ikuti.html)
3 Environment Modeling
3.1 Architecture of the System
This section explains the process of designing and creating the 3D model of the Haram Mosque. The Haram Mosque model is created using 3D Max and viewed using MeshLab. The models of the Kaaba and the Haram Mosque are bundled in one file and saved as a COLLADA model, which is supported by the Horde3D game engine. The 3D Edu-Hajj system is divided into three main elements: input, main process and output. A wiimote and nunchuk are used as the input media for controlling the movement of the camera
inside the Haram Mosque environment. The main process of the system is 3D model design, loading and rendering using the Horde3D game engine. The output of the system is displayed using a head-mounted display (HMD) (see Figure 3).
Fig. 3. Architecture of virtual hajj
The input of the system is read from wiimote events, which are triggered when the user presses one of the wiimote buttons. These actions include moving right, left, forward and backward, and rotating the camera. The rendering process starts by interpreting the 3D COLLADA models and then arranging them into their proper positions. The first object rendered in this application is the Haram Mosque model, because the Haram Mosque has a huge number of polygons compared to a human character. After the mosque is rendered, the human characters are rendered. The human characters are placed in certain locations inside the mosque and their movement follows a particular pattern. The output of the Edu-Hajj system is a crowd of models that perform the Tawaf ritual with the Kaaba as the epicenter. Interactively, the Edu-Hajj user is able to walk around the Kaaba as one of the Tawaf worshippers. The output can be viewed on a monitor screen or an HMD for a more realistic experience. In our testing, users were more excited about using the HMD than a normal monitor or screen projector, as the HMD provides better focus.
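A minimal event loop corresponding to this input–process–output architecture might look like the sketch below. The function names (read_wiimote, update_crowd, render_scene) are assumptions made for illustration and do not correspond to actual Horde3D or wiimote library calls.

```python
import time

def read_wiimote():
    """Placeholder: poll the controller and return a camera command."""
    return {"move": (0.0, 0.0, 0.1), "rotate": 0.0}

def update_crowd(characters, dt):
    """Placeholder: advance every character along its circular Tawaf path."""
    for c in characters:
        c["angle"] += c["speed"] * dt  # circular motion around the Kaaba

def render_scene(characters, camera):
    """Placeholder: in the real system this is handled by the Horde3D engine."""
    pass

characters = [{"angle": 0.0, "speed": 0.2} for _ in range(80)]
camera = {"pos": [0.0, 1.7, 10.0], "yaw": 0.0}

for frame in range(3):                # a real loop would run until the user quits
    start = time.time()
    cmd = read_wiimote()              # 1. input: wiimote/nunchuk events
    update_crowd(characters, 1 / 30)  # 2. main process: crowd update
    render_scene(characters, camera)  # 3. output: monitor or HMD
    print("frame", frame, "took", time.time() - start, "s")
```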
3.2 Creating the Model of the Haram Mosque and Kaaba
The Haram Mosque around the Kaaba is divided into three main parts: the Kaaba, the mosque area and the floor. The design of the Kaaba, the Haram Mosque and the floor starts by putting a cube in the centre of the mosque. The textures of the 3D model are based on JPEG files. The design of the whole mosque in wireframe and as a full scene can be viewed in Figure 4. The model in Figure 4 needs to be exported as a COLLADA model in order to be loaded into the Horde3D game engine. The conversion into the COLLADA model uses 3D Max.
Fig. 4. Model of the Haram Mosque in full-scene rendering
4 Implementation and Testing
4.1 Experimental Setup
This study was run on a Pentium 4 PC with 2 GB of RAM and a 512 MB VGA card. The controllers used in this experiment were a mouse and a wiimote. The wiimote is preferred due to its flexibility in controlling objects through a wireless connection. The game engine is built on Horde3D, and the human model is man.geo, taken from the Chicago sample of Horde3D.
4.2 Implementation
To perform Tawaf, pilgrims have to wear ihram, which consists of two pieces of white cloth for men and any cloth that fulfills Islamic conditions for women. The ihram is not yet implemented for our characters. In our case, the implementation is tested with 80 human characters, and the destinations of each character are computed in real time. The human characters walk in a circular pattern around the Kaaba (the black cube in the centre of the Haram Mosque) starting from the Hajar Aswad (black stone) to perform Tawaf; see Figure 5 for an illustration.
Fig. 5. Tawaf prayer starting from the Hajar Aswad
The Kaaba has five important elements as follows:
1. Hajar Aswad
2. Maqam Ibrahim
3. Kaaba doors
4. Hijr Ismail
5. Yamani corner.
Based on the testing records with 80 characters, our application has shown good results in simulating Tawaf using crowd simulation. The frame rate (FPS) is relatively high and the CPU time is small, which indicates that the device is still able to handle a larger crowd with high performance. The experiment was run with 80, 300 and 500 characters to evaluate the performance of Edu-Hajj. Subsequently, the system was tested by users to see how they interact with Edu-Hajj. Growing the character population in the Edu-Hajj crowd system reduces the rendering speed, which is reflected in a lower frame rate and the larger CPU time needed to render the objects. The complete comparison can be seen in Figure 6, Table 1, and Table 2.
Fig. 6. Performance evaluation of E-Hajj on various numbers of characters
Table 1. Result of Performance Test in FPS

Period (s)   80 characters (FPS)   300 characters (FPS)   500 characters (FPS)
1            24.0462               9.6915                 4.3937
2            23.5271               9.9591                 4.4042
3            23.2154               10.0231                4.4292
4            23.4321               9.6583                 4.4269
5            27.9471               9.7491                 4.4198
6            23.0999               9.8261                 4.4528
7            23.1895               9.7902                 4.4487
8            23.2122               9.3862                 4.2859
9            24.0571               9.7311                 4.3203
10           23.2693               9.3778                 4.2863
Table 2. Result of Performance Test in CPU Time

Period (s)   80 characters CPU time (ms)   300 characters CPU time (ms)   500 characters CPU time (ms)
1            42.5855                       107.1105                       227.0720
2            43.2354                       106.1046                       225.7757
3            43.0867                       105.6572                       223.8707
4            42.5274                       108.7314                       226.7975
5            42.8334                       104.3172                       226.8674
6            41.7027                       105.5220                       224.0514
7            45.0591                       109.9290                       227.1216
8            38.3828                       109.1557                       232.6574
9            43.3686                       109.6607                       234.2742
10           43.9918                       108.1883                       233.4143
5 Results and Discussion
In this section we analyze the Edu-Hajj testing results, based on empirical testing and user testing, to evaluate the performance and usability of Edu-Hajj. The first subsection discusses the performance of Edu-Hajj when it was tested with a large number of characters.
Edu-Hajj makes the characters walk in a circle around the Kaaba by calculating four main destination points for each of the seven circuits of Tawaf. This is consistent with the rule of Tawaf, which requires walking around the Kaaba seven times for one Tawaf prayer. The four destination points are illustrated in Figure 7.
Fig. 7. Walking pattern
The following formula gives the circulation computational complexity of the crowd simulation in Edu-Hajj, where index is the number of characters:

Cindex = index × 4 × 7                                        (1)
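As a rough illustration of how the four destination points per lap can be generated for seven laps, the sketch below places them on a circle around the Kaaba; the radius and the starting corner (Hajar Aswad at angle 0) are invented for the example and are not the values used in Edu-Hajj.

```python
import math

def tawaf_waypoints(radius=20.0, laps=7, points_per_lap=4):
    """Return the ordered waypoints a character visits while circling the Kaaba.

    The Kaaba is assumed to sit at the origin and the walk starts at the
    Hajar Aswad corner (angle 0); both assumptions are only for illustration.
    """
    waypoints = []
    for _lap in range(laps):
        for k in range(points_per_lap):
            angle = 2 * math.pi * k / points_per_lap
            waypoints.append((radius * math.cos(angle), radius * math.sin(angle)))
    return waypoints

route = tawaf_waypoints()
print(len(route))        # 4 points x 7 laps = 28 destinations per character
print(80 * len(route))   # Cindex for 80 characters, matching Eq. (1)
```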
Table 1 and Table 2 show that the growth in the number of characters reduces the performance significantly and makes the CPU take longer to render the objects. With 500 characters the frame rate is only 4.3937 FPS with a CPU time of 234.2742 ms, which is very slow because the complexity Cindex becomes very large:

C500 = 500 × 4 × 7 = 14000                                    (2)

C300 = 300 × 4 × 7 = 8400                                     (3)
The Cindex for 500 characters reduces the performance to roughly half of that for 300 characters. We are still working to solve this issue, which will be addressed in our future work.
6 Future Work
In the future, we plan to enhance the sensory experience of a crowded situation by placing a haptic device on the user's body. Through the movements of the haptic device, the user can feel the simulation, with a pressure level that depends on the number of characters in the system. Moreover, future research will also focus on increasing the rendering speed using an in-situ approach. Due to current system limitations, the system only allows up to 500 models for an accurate simulation. Therefore, by improving the rendering speed, a larger number of models can theoretically be simulated. This will enhance the
simulation accuracy and also create a more realistic experience for the users, as the actual hajj pilgrimage normally draws millions of pilgrims from across the world. Future research will also distribute the work across agents running on old computers. Each computer's central processing unit (CPU) will be given part of the data in parallel, so that all the data can be processed simultaneously. This will speed up the rendering process compared to using a single computer, which is less efficient and severely slows down rendering. When the rendering process is too slow, the simulation pattern becomes less natural and lags behind the user's movements. Consequently, the level of user acceptance will be low and the result of the simulation inaccurate.
7 Conclusions
The Hajj pilgrimage is the largest mass movement of people that occurs annually, so dynamically simulating this scenario in a virtual environment is not an easy task. Each hajj ritual has its own rules and procedures; for instance, in performing Tawaf there is a specific movement pattern and a number of circuits that must be completed. Edu-Hajj is a prototype implemented according to these hajj rules. The current system is able to simulate the Tawaf method and its crowd to a certain precision. Of course, there is room for improvement, and our algorithm needs to be revised in the future. Nevertheless, the testing has shown promising results in representing the crowd during hajj. Furthermore, during the walkthrough, users seemed very excited about Edu-Hajj's special feature of freely controlling the camera perspective using the wiimote.
Acknowledgement. The authors would like to thank the Ministry of Science, Technology and Innovation Malaysia and Universiti Teknologi Malaysia (UTM) for their financial support, and also CFIRST as the main research centre for the Edu-Hajj project.
References
1. Al-Kodmany, K.: Planning for the Hajj: Political Power, Pragmatism, and Participatory GIS. Journal of Urban Technology (2009), doi:10.1080/10630730903090289
2. Capilla, R., Martinez, M.: Software Architectures for Designing Virtual Reality Applications. In: Oquendo, F., Warboys, B.C., Morrison, R. (eds.) EWSA 2004. LNCS, vol. 3047, pp. 135–147. Springer, Heidelberg (2004)
3. Geiger, C., Fritze, R., Lehmann, A., Stocklein, J.: HYUI – A Visual Framework for Prototyping Hybrid User Interfaces. In: Proceedings of the Second International Conference on Tangible and Embedded Interaction (TEI 2008). ACM, Bonn (2008)
4. Greitzer, L., Kuchar, F., Huston, K.: Cognitive Science Implications for Enhancing Training Effectiveness in a Serious Gaming Context. Journal of Educational Resources in Computing – ACM 7(3) (2007)
5. Marks, S., Windsor, J., Wunsche, B.: Evaluation of Game Engines for Simulated Surgical Training. In: GRAPHITE. ACM (2007)
6. Michael, D., Chen, S.: Serious Games: Games that Educate, Train, and Inform. Thomson Course Technology (2006)
7. Mocholí, J.A., Esteve, J.M.J., Jaen, R.A., Xech, P.L.: An Emotional Path Finding Mechanism for Augmented Reality Applications. In: Harper, R., Rauterberg, M., Combetto, M. (eds.) ICEC 2006. LNCS, vol. 4161, pp. 13–24. Springer, Heidelberg (2006)
8. Shehata, A.M., Koshak, N.A.: Using 3D GIS to Assess Environmental Hazards in Built Environments (A Case Study: Mina). The Journal of Al-Azhar University (2006); ISSN 1110-640
9. Widyarto, S., Latiff, M.S.A.: The Use of Virtual Tours for Cognitive Preparation of Visitors: A Case Study for VHE. Emerald Journal of Facilities 25, 271–285 (2007)
A Grammar-Based Process Modeling and Simulation Methodology for Supply Chain Management Mohsen Mohammadi1,*, Muriati Bt. Mukhtar1, and Hamid Reza Peikari2 1
Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, 43600, Malaysia
[email protected] 2 Graduate School of Business, Universiti Kebangsaan Malaysia, Bangi, 43600, Malaysia
Abstract. In order to respond to customer demands, supply chains must be rapidly reconfigured and evaluated. This has given rise to supply chain simulation as an important tool to aid in the evaluation of supply chain configurations. Part of the reconfiguration involves designing and redesigning business processes. Hence, business process simulation is an integral part of supply chain simulation. Supply chain simulation models are usually large-scale complex models, and it is thus common for simulation experts to become overwhelmed with the simulation process itself. This paper proposes a methodology combining two approaches (grammar-based business process modeling and simulation) to facilitate process thinking (reengineering). This methodology bridges the gap between business process modeling and simulation by providing the grammatical approach. This paper presents a novel approach to designing business processes based on grammar-based process modeling, event-driven process chains (EPC) and discrete event simulation. It illustrates that the grammar-based process modeling approach is applicable to the simulation of dynamic systems such as supply chains, as well as to representing detailed descriptions of the processes and events. More detailed and advanced analyses and discussions will be reported in future papers. Keywords: Event-driven process chain, grammar-based modeling, process simulation, supply chain, Business Process Reengineering.
1 Introduction
Uncertainties in the business environment change business models in the supply chain. Therefore, business processes need to be created or changed in the shortest possible time to respond to these uncertainties. In the design of new business processes, simulation facilitates the validation of the processes to ensure that they will work as designed. Simulation is used to evaluate supply chain performance and is an integral part of the decision-making process in supply chain management. Business process-based simulation provides a precise, visual method to analyze and compare performance before and after business process engineering [1]. For a
Corresponding author.
manufacturing supply chain, manufacturing patterns such as the manufacturing process model, which focus on process modeling, process management, process analysis, and process reengineering [2], form an important input to the supply chain simulation model. As such, in order to develop supply chain simulation models, the modeler is required to use several approaches and tools to capture the important facets of the supply chain. Tools such as Business Process Modeling (BPM), Business Process Simulation (BPS) and discrete event simulation are usually used. However, these tools have limitations and need to be complemented by other tools. For instance, Business Process Modeling (BPM) takes a static, structured approach to business process improvement. It provides a holistic view of how the business operates by documenting the business processes. However, BPM does not provide any information about the dynamics of the business or how business processes can be changed with minimum risk. To provide a dynamic view of the business and to consider the impacts of change in such dynamics without risk, the concept of Business Process Simulation (BPS) was developed by researchers and practitioners [3]. However, the diagrams in Business Process Modeling and Business Process Simulation are not sufficient to describe the details of a complex system and its processes [4]. There are numerous studies of Business Process Reengineering (BPR), each using a different methodology. For example, [5] introduced a methodology based on five business process modeling languages to integrate business processes. [6] proposed a general framework to assess a reengineering project from its strategic planning stage to its post-implementation phases. [7], on the other hand, proposed a conceptual model to demonstrate the links between organizational restructuring and behavioral changes in reengineering business processes. All of these studies have focused on business processes, whereas linking process modeling with process simulation requires descriptive details of events and processes. Therefore, a grammar-based process modeling approach is needed. Applying such an approach can improve the simulation of complex systems such as supply chains. Moreover, using a grammar-based process modeling approach helps prevent repetition of the sub-processes and events of the system. The objective of this paper is to apply a grammar-based process modeling and simulation methodology for supply chain management to illustrate more details of the system and its processes. [8] developed a simulation model by proposing an analytical framework for an integrated logistic chain; this paper applies a simulation model in Arena 10 based on the numerical example in [8]. It does not show key performance indicators (KPI) or the optimization of the model, but it shows the possibility of using the information in simulation and analyzing it to obtain results.
2 Literature Review
The grammatical approach to designing and redesigning supply chain processes has received some attention because it can balance and integrate the use of other process modeling approaches [9]. The MIT Process Handbook recommends the grammar-based approach to model organizational and system processes [10]. Using this approach, each process of a supply chain can be further described, modeled and modularized into more detailed activities and processes. The representation of supply chain processes with this approach is easy to maintain because the processes can be described in hierarchical chunks [11]. Many process
simulation tools can be integrated into the manufacturing process, and can be used to reengineer and optimize the manufacturing process model, which leads to the improvement of many performance indices. Process models can be developed and supported by different systems, such as workflow management systems, which support the decision making of managers in the business environment [2]. Business Process Simulation is an important part of the design and redesign of business processes. It provides quantitative reports about the impact of a process design on process performance, and it provides quantitative evidence for deciding on the best process design. The simulation of business processes overlaps with the simulation of other event systems [12]. Simulation has enabled companies to deal with the impacts of changes in their sites' parameters without any unwanted or uncalculated effects on their operations and outputs. Business processes and company knowledge can become much more transparent and understandable with explicit, standard documentation [13]. The EPC [14] was developed in 1992 by the team of August-Wilhelm Scheer at Saarbrücken University, in a joint research project with SAP AG, for modelling business processes with the goal of being easily understood and used by business people. The basic idea of EPC is that events trigger functions and executed functions cause events. The activities of a business process can be modeled by functions, and events are created by executing functions [15]. Three different kinds of tools are applicable for BPS: process execution, process modeling and simulation tools. Jansen-Vullers and Netjes [12] developed a framework to find the strengths and weaknesses of simulation modeling and process modeling tools. As shown in Table 1, the rating of these tools on each evaluation criterion ranges from very bad (– –) and bad (–) through neutral (+/–) to good (+) and very good (++).

Table 1. Modeling capabilities [4]

Feature                    ARIS   FLOWer   Arena
Ease of model building     +      +        +
Formal semantics/verif.    –      – –      +/–
Workflow patterns          –      +        +
Level of details           ++     ++       ++
Animation                  +      – –      ++
Scenarios                  +      – –      +
3 Model Description
As shown in Figure 1, a supply chain model was considered with the following assumptions:
• Three suppliers: Supplier 1, Supplier 2, Supplier 3
• Two semi-finished manufacturers: Semi-finished manufacturer 1, Semi-finished manufacturer 2
• One final manufacturer: Final manufacturer
• One distributor: DC
• Two warehouses: Warehouse 1, Warehouse 2
• Three retailers: Retailer 1, Retailer 2, Retailer 3
• Ultimate customers.
Fig. 1. The supply chain model
3.1 The Methodology
An efficient business process is a critical factor for any business to succeed. Business process modeling facilitates the understanding and analysis of the business process, and business process simulation is an effective technique to evaluate and diagnose business processes in an organization [16]. Due to various issues and shortcomings in time, cost, and users' skills, advanced simulation and optimization techniques have not been widely applied [16]. The methodology for business process reengineering modeling is presented in the following flowchart (Figure 2).
Fig. 2. The Methodology of Business Process Reengineering Modeling
3.2 The Workflow
The workflow of the supply chain model, illustrated in Figure 3, is considered without the retailers.
Fig. 3. Workflow diagram
3.3 Grammar-Based Modeling in Supply Chain Process Numerous manufacturing process modeling approaches can be used in process view such as workflow modeling approach and event-driven process chain. Many process simulation tools can be integrated to the manufacturing process model. This optimizes and reengineers the manufacturing process model which lead to the improvement of many performance indices [2]. The element for representing the Supply Chain Process with structural form can be defined [4]: Process: Pi=
Event: Ei= Resource: Resource(Ei)={Rj,j=1,2,...•Rj Rj X,X=document or material} Rj= Dependency: Dependency (Di) = 3.4 Modeling the Process In order to represent the workflow of the model, we used Event-Driven Process (EPC) modeling language using ARIS Express 2.3. the events, processes and junctions in this software include and, or, xor. The model has been illustrated in Figure 4. The graph in a model is more intuitive than grammatical model. The above models can represent dynamic process with their representation logic and graphical symbols.
Fig. 4. Event-driven process chain diagram
3.5 Description of the Events and the Processes
There are many constraints and relationships among the processes of a supply chain, which can be represented by the grammatical approach [9]. In a model, the grammatical representation is less intuitive than a graph, but a diagram cannot represent all information related to the processes of a supply chain [9]. Therefore, to ensure the legibility of the model, the basic information about the processes of a supply chain can be represented by a diagram as an auxiliary tool. Table 2 presents the basic information of the grammatical model and provides a detailed description of the events and processes of the model.

Table 2. Description of the events and the processes

Process  Name                             Agent                                  Input                      Output                       Pre   Post
P1       Produce material 1               Supplier 1                             Order bill of material 1   Material 1                   D1    D2
P2       Produce material 2               Supplier 2                             Order bill of material 2   Material 2                   D3    D4
P3       Produce material 3               Supplier 3                             Order bill of material 3   Material 3                   D5    D6
P4       Produce semi-finished product 1  Semi-finished product manufacturer 1   Material 1 & material 2    Product from manufacturer 1  D7    D9
P5       Produce semi-finished product 2  Semi-finished product manufacturer 2   Material 2 & material 3    Product from manufacturer 2  D8    D10
P6       Produce final product            Final product manufacturer             Materials 1, 2, 3          Product from final manufacturer  D11  D12
P7       Distribution                     Distribution center                    Final product              Prepare for warehouses       D13   D14
P8       Storing in warehouse 1           Warehouse 1                            Prepare for warehouse 1    Store in warehouse 1         D15   D17
P9       Storing in warehouse 2           Warehouse 2                            Prepare for warehouse 2    Store in warehouse 2         D16   D18

Event    Description                                                                     Resource   Pre.   Post.
E1       Receiving bill of order material 1 from semi-finished product manufacturer 1    Null       …      D1
E2, E3   Similar to E1                                                                   …          …      …
E4       …                                                                               …          …      …

D        Input    Output     Rule
D1       E1       P1         Flow
…        …        …          …
D7       E4, E5   P4         and
D14      P7       E10, E11   xor
…        …        …          …
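As a concrete, non-authoritative illustration of how the grammatical elements of Section 3.3 and Table 2 could be held in code, the following sketch uses simple data classes; the field names are an interpretation based on the columns of Table 2, not the exact tuples defined in [4].

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Resource:
    name: str
    kind: str            # "document" or "material", as in the definition of X

@dataclass
class Event:
    ident: str           # e.g. "E1"
    description: str
    resources: List[Resource] = field(default_factory=list)

@dataclass
class Process:
    ident: str           # e.g. "P1"
    name: str
    agent: str           # e.g. "Supplier 1"
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    pre: List[str] = field(default_factory=list)   # preceding dependencies
    post: List[str] = field(default_factory=list)  # following dependencies

@dataclass
class Dependency:
    ident: str           # e.g. "D1"
    inputs: List[str]    # events or processes feeding the dependency
    outputs: List[str]   # events or processes triggered by it
    rule: str            # flow, "and", "xor", ... (the EPC junction types)

# Example instances taken from the first rows of Table 2
p1 = Process("P1", "Produce material 1", "Supplier 1",
             inputs=["Order bill of material 1"], outputs=["Material 1"],
             pre=["D1"], post=["D2"])
d1 = Dependency("D1", inputs=["E1"], outputs=["P1"], rule="flow")
print(p1.agent, d1.rule)
```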
3.6 Arena Model and Simulation Results
Based on the supply chain model shown in Figure 1 and the information from Table 2, with more details, a sub-model was simulated in Arena 10 by eliminating the retailers. As shown in the proposed sub-model illustrated in Figure 5, each of the
Fig. 5. The simulated sub-model
manufacturers can procure their materials from suppliers 1, 2 and 3 and, after producing the final product, send it to the distribution center. The distribution center decides on the distribution of products to the warehouses based on a 50% two-way by-chance decision type. The simulation was designed to run 500 replications of 720 days each before producing the results. Some of the simulation results related to entities (time) and processes are illustrated in Figures 6 and 7. The figures illustrate the processes related to suppliers, manufacturers, distributors and warehouses. Our objective is not to show key performance indicators (KPI) or the optimization of the model, but to show the possibility of using the information in simulation and analyzing it to obtain results.
Fig. 6. The process result-time per entity
Fig. 7. The accumulated time result
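The 50% two-way chance decision at the distribution centre can be mimicked outside Arena with a few lines of Monte Carlo code. The sketch below is a toy stand-in for the Arena sub-model: only the replication count and horizon echo the settings quoted in the text, while the arrival rate and everything else are invented for illustration.

```python
import random

def run_replication(days=720, arrivals_per_day=1, p_warehouse1=0.5):
    """Toy stand-in for the Arena sub-model: route each finished product
    to warehouse 1 or warehouse 2 with equal probability."""
    counts = {"warehouse1": 0, "warehouse2": 0}
    for _ in range(days * arrivals_per_day):
        if random.random() < p_warehouse1:
            counts["warehouse1"] += 1
        else:
            counts["warehouse2"] += 1
    return counts

# 500 replications of a 720-day horizon, as quoted in the text
replications = [run_replication() for _ in range(500)]
avg_w1 = sum(r["warehouse1"] for r in replications) / len(replications)
print("average products routed to warehouse 1 per replication:", avg_w1)
```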
4 Conclusion
Business process modeling (BPM) provides management with a static, structured approach to business improvement by providing a means of documenting the business processes, while business process simulation (BPS) allows management to study the dynamics of the business. In the context of business process change, BPS can be helpful. Thus BPM and BPS help to facilitate process thinking. This paper provides a
preliminary analysis only, and more detailed findings will be presented in future reports. Nevertheless, despite the preliminary nature of the analysis, the paper makes several contributions. First, it illustrated that grammar-based process modeling can be used with business process simulation to improve the transparency of the system description and its process details. Moreover, since modeling and simulating the processes includes modeling the inputs, outputs, resources, workflow, and so on, using grammar-based modeling can bring more flexibility to dynamic supply chain simulation. In addition, a dynamic environment such as a supply chain has its own specific mechanisms and rules; because these rules can be represented grammatically, the approach presented in this paper can be useful for simulating a rule-based supply chain.
References
1. Cui, L., Chai, Y., Liu, Y.: Business Process Based Simulation: A Powerful Tool for Demand Analysis of Business Process Reengineering and Information System Implementation. In: Proceedings of the 2008 Winter Simulation Conference (2008)
2. Feng, Y.: Research on Manufacturing Process Modeling and Simulation Based on Resource Optimization (2010), http://www.china-papers.com/?p=127962 (last accessed April 25, 2011)
3. Barber, K.D., Dewhurts, F.W., Burns, R.L.D.H., Rogers, J.B.B.: Business-Process Modeling and Simulation for Manufacturing Management: A Practical Way Forward. Business Process Management Journal 9, 527–542 (2003)
4. Yan, J., Qin, F., Yang, J., An, L., You, D.: A Grammar-Based Modeling of Supply Chain Process, Service Operations and Logistics, and Informatics. In: SOLI 2006, pp. 364–369 (2006)
5. Muthu, S., Whitman, L., Hossein Cheraghi, S.: Business Process Reengineering: A Consolidated Methodology. In: The 4th Annual International Conference on Industrial Engineering Theory, Applications and Practice, San Antonio, Texas, USA, November 17-20 (1999)
6. Dodaro, G.L., Crowley, B.P.: Business Process Reengineering Assessment Guide, Accounting and Information Management Division (1997), http://www.gao.gov/special.pubs/bprag/bprag.pdf (August 16, 2011)
7. Gunasekaran, A., Kobu, B.: Modelling and Analysis of Business Process Reengineering. International Journal of Production Research 40(11), 2521–2546 (2002)
8. Dong, M., Frank Chen, F.: Performance Modeling and Analysis of Integrated Logistic Chains: An Analytic Framework. European Journal of Operational Research, 83–98 (2005)
9. Lee, J., Pentland, B.T.: Exploring the Process Space: A Grammatical Approach to Process Design/Redesign (2002), http://ccs.mit.edu/papers/pdfwp215.pdf (last accessed April 1, 2011)
10. Malone, T.W., Crowston, K., Herman, G.A.: Organizing Business Knowledge: The MIT Process Handbook. MIT Press, Cambridge (2003)
11. Zeng, Y., Pentland, B., Kwon, P.: A Grammar-Based Approach to Capturing and Managing Processes: An Industrial Case Study. Transactions of NAMRI/SME 31, 555–562 (2003)
12. Jansen-Vullers, M.H., Netjes, M.: Business Process Simulation – A Tool Survey, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.87.8291&rep=rep1&type=pdf (last accessed May 2, 2011)
13. Ponis, S.T., Hadzilias, E.A., Panayiotou, N.A.: Business Process Engineering with ARIS Simulation – A Case Study of a Call Centre Operation Assessment. WSEAS Transactions on Business and Economics 1(4), 338–343 (2004)
14. Scheer, A.-W.: ARIS – Business Process Modeling. Springer, Heidelberg (1999)
15. Kruczynski, K.: Business Process Modelling in the Context of SOA – An Empiric Study of the Acceptance between EPC and BPMN. In: Proceedings of the 5th International Conference on Information Technology and Applications, Cairns, Queensland, Australia, June 23-26 (2008)
16. Ren, C., Wang, W., Dong, J., Ding, H., Shao, B.: Towards a Flexible Business Process Modelling and Simulation Environment. In: Proceedings of the 2008 Winter Simulation Conference (2008)
A Parallel Coordinates Visualization for the Uncapacitated Examination Timetabling Problem J. Joshua Thomas1, Ahamad Tajudin Khader2, and Bahari Belaton2 1
School of Computer Sciences, University Sains Malaysia 2 KDU College (PG) Sdn Bhd [email protected], {tajudin,[email protected]}
Abstract. The paper describes a new approach, based on visualization, to the uncapacitated examination timetabling problem. This approach couples a nature-inspired evolutionary technique with parallel coordinates to facilitate the exploration of data. By explicitly representing the visualization parameter space, the interface provides an overview of rendering options and enables users to easily explore the different processing stages, namely pre-processing, during processing and post-processing of examination schedules. The paper discusses the performance of this method on the Carter set of benchmark problems, a data set comprising real-world timetabling problems.
Keywords: Visual Informatics, visualization, timetabling, parallel coordinates.
1 Introduction
Parallel coordinates is a well-established visualization technique first proposed by Inselberg [1, 2]. It is a scalable framework in the sense that an increase in the number of dimensions corresponds to the addition of extra axes. The visual clutter caused by excessive overlaying of polylines limits the effectiveness of parallel coordinates in visualizing a dense data set. Nevertheless, the parallel coordinates visualization method has been adopted here to visualize the timetabling input matrices and output matrices in a multivariable visualization. Visual clustering, filtering, and axis reordering are the common methods to reduce clutter in parallel coordinates. Some filtering techniques remove part of the data in a pre-processing step. Axis reordering computes the best axis order with minimal clutter, but it is very likely that even the best axis order still leads to an unsatisfactory and cluttered display. This paper introduces the ParExaViz system, a software tool developed as a research aid in the field of examination scheduling. ParExaViz contains a set of visualizations designed for analyzing the complexity of examination scheduling problems, helping to solve these problems and visualizing the heuristic strategies. Before discussing the motivations behind building ParExaViz, it is useful to look at the reasons for investigating this topic and to present the university examination scheduling problem. The examination scheduling problem is interesting as the subject of these visualizations because the basic structure of the problem can be clearly defined; however, when scaled up to real-life situations the problems become
highly complex, making valid solutions difficult to find. It is believed that the visualization presented here can be used to help reduce this complexity. The goal of the visualization software is to support, in an intuitive way, the pre-processing, during-processing and post-processing stages from raw data to the generation of a feasible examination timetable. Information visualization often deals with data from an acquisition process or a computational simulation. Visualization tools often fail to reflect this fact both in functionality and in their user interfaces, which typically focus on graphics and programming concepts rather than on what is meaningful to the end user. Although advances have been made, interfaces that support exploration and insight have substantial room for improvement, at least for assignment data in examination timetabling. We describe a prototype interface that utilizes parallel coordinates for the exploration of examination data. The outline of this paper is as follows: Section 2 summarizes examination timetabling and the data. Section 3 describes the design in more detail. The implementation stages are in Section 4, and the conclusion and future work are in Section 5.
2 Examination Timetabling
The examination scheduling problem involves assigning resources, rooms and times, to exam events whilst ensuring that the number of violations, such as clashes where a student is scheduled to two or more exams simultaneously, is minimized. Other violations considered here are: room and time overflows, where exams are scheduled in locations without the required capacity or duration; and consecutive violations, where a student has two or more exams in immediate succession. Throughout this discussion, 'cluster' refers to a set of exams that do not share any common students (no conflict) and can therefore all be scheduled in the same timeslot without causing any clash violations. There are many variants of examination timetabling problems, because each educational institution has its own rules, regulations and expectations, resulting in various constraints. There has been much research into solving the uncapacitated examination timetabling problem, and various techniques such as tabu search, simulated annealing, constraint programming, evolutionary algorithms, ant colony optimization, variations of the great deluge algorithm and the variable neighborhood search algorithm have been investigated for this purpose [4].
2.1 Model of Examination Timetabling
The model is based on constraints, as in the literature. Each of the constraints can be formulated as an integer program. The examination timetabling problem (ETP) is known to be a highly constrained and complex assignment problem. To reduce the complexity, it is normally divided into two sub-problems:
─ Exam-timeslot assignment
─ Exam-room assignment.
In exam-timeslot assignment, exams are scheduled into a fixed number of timeslots, and in exam-room assignment, exams are assigned to rooms. Hence, in the ETP an assignment is an ordered 3-tuple (a, b, c), where a is an exam, b a timeslot and c a room, with the straightforward general interpretation that exam a starts at timeslot b in room c. Here, however, we focus on exam-timeslot assignment. Matrices are required to show the interrelationships between these sets of variables. There are two types of matrices: input matrices and output matrices. The input matrices are those whose values are known beforehand (the timetabling data) and have been allocated or pre-assigned. The output matrices are the assignment matrices whose values need to be determined by solving the ETP.
2.2 Hard Constraints and Soft Constraints
Hard constraints must be satisfied in order to produce a feasible timetable. Any timetable which fails to satisfy all of these constraints is considered infeasible. Soft constraints are those which are desirable to satisfy, but which in general cannot all be satisfied, since soft constraints are numerous and vary with the needs of the individual problem.
2.2.1 Exam-Timeslot Assignment Hard Constraints
─ Exam clashing: every exam must be assigned to exactly one timeslot of the timetable.
─ Student clashing: no student should be scheduled in two different places at once.
─ Group together: certain exams with questions in common must be grouped and scheduled together in the same timeslot.
2.2.2 Exam-Timeslot Assignment Soft Constraints
─ Timeslot capacity: the total number of students over all exams in the same timeslot must be less than the total capacity for that timeslot.
─ Exam related (1): no student should have two exams in adjacent timeslots on the same day.
─ Exam related (2): no student should have two exams within s timeslots or on adjacent days.
Quality measures of an examination timetable are derived from the soft constraints, most frequently from student restrictions. If several quality measures are used, the objective function is a linear combination of these measures.
2.3 Dataset
The parallel coordinates visualization has been tested with the set of Carter benchmarks listed in Table 1 below. Note that for some of the data sets more than one version exists; thus the version is also indicated, e.g. car-s-91 I. The density of the conflict matrix is an estimate of the difficulty of the problem and is the ratio of the number of examinations involved in clashes to the total number of examinations. The hard constraint for the set of benchmarks is that there are no clashes, i.e. each student must not be scheduled to sit more than one exam in a given timeslot. Thus, the hard constraint cost for this problem is the number of clashes. A feasible timetable is one in which the hard constraint cost is zero, i.e. there are no clashes.
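The hard-constraint cost just described (the number of student clashes) can be computed directly from an exam-to-timeslot assignment and the student enrolment lists, as in the sketch below; the tiny data set is invented purely for illustration.

```python
from itertools import combinations

# Invented example data: exam -> timeslot and student -> exams taken
assignment = {"E1": 1, "E2": 1, "E3": 2}
enrolments = {"S1": ["E1", "E2"], "S2": ["E1", "E3"], "S3": ["E2", "E3"]}

def clash_cost(assignment, enrolments):
    """Count pairs of exams taken by the same student that share a timeslot."""
    clashes = 0
    for exams in enrolments.values():
        for a, b in combinations(exams, 2):
            if assignment[a] == assignment[b]:
                clashes += 1
    return clashes

print(clash_cost(assignment, enrolments))  # S1 sits E1 and E2 together -> 1 clash
```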
Table 1. Carter benchmark dataset

Data        Institution                                              Periods   No. of Exams   No. of Students
car-f-92 I  Carleton University, Ottawa                              32        543            18419
car-s-91 I  Carleton University, Ottawa                              35        682            16925
ear-f-83 I  Earl Haig Collegiate Institute, Toronto                  24        190            1125
hec-s-92 I  Ecole des Hautes Etudes Commerciales, Montreal           18        81             2823
kfu-s-93    King Fahd University of Petroleum and Minerals, Dharan   20        461            5349
lse-f-91    London School of Economics                               18        381            2726
rye-s-93    Ryerson University, Toronto                              23        486            11483
sta-f-83 I  St Andrew's Junior High School, Toronto                  13        139            611
tre-s-92    Trent University, Peterborough, Ontario                  23        261            4360
uta-s-92 I  Faculty of Arts and Sciences, University of Toronto      35        622            21266
ute-s-92    Faculty of Engineering, University of Toronto            10        184            2749
yor-f-83 I  York Mills Collegiate Institute, Toronto                 21        181            941
Table 1 lists the version-I Carter data sets. For each data set, starting from the first row (car-f-92 I, 32, 543, 18419), the table shows the institution, the number of periods, the number of exams and the total number of students enrolled. The subsequent rows give the different institutions provided as benchmark data for the examination timetabling problem.
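The conflict-matrix density mentioned above as a difficulty estimate can be derived from the student enrolment lists. The sketch below builds a conflict matrix and the density measure described in the text from an in-memory list of enrolments; the enrolments are invented and reading the actual Carter file format is omitted.

```python
from collections import defaultdict
from itertools import combinations

# Invented enrolment lists; the real data would come from a Carter student file
enrolments = [["E1", "E2"], ["E2", "E3"], ["E4"]]

def conflict_matrix(enrolments):
    """conflicts[a][b] = number of students taking both exam a and exam b."""
    conflicts = defaultdict(lambda: defaultdict(int))
    for exams in enrolments:
        for a, b in combinations(sorted(set(exams)), 2):
            conflicts[a][b] += 1
            conflicts[b][a] += 1
    return conflicts

def density(enrolments):
    """Ratio of exams involved in at least one conflict to all exams,
    following the description given in the text above."""
    exams = {e for row in enrolments for e in row}
    conflicts = conflict_matrix(enrolments)
    in_conflict = {e for e in exams if conflicts[e]}
    return len(in_conflict) / len(exams)

print(density(enrolments))  # E1, E2, E3 clash with something; E4 does not -> 0.75
```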
3 Parallel Coordinates: The Design
3.1 Analyzing the Problem
There are a few questions to consider before we start. "How hard is this problem to solve?" This may be in relation to other similar problems, or to instances of the same problem. Mathematics helps answer the first query by providing a means of classifying different problems; for example, the examination scheduling problem is known to be NP-hard. In contrast, the visualization discussed here attempts to provide a visual answer to the second question: how hard are examination scheduling problems to solve relative to each other? Further, how do different parts of the same problem, for example different departments, relate to each other in terms of scheduling difficulty? One use of this visualization in analyzing the complexity of an examination scheduling problem instance is to see which parts of the problem look like they are going to be the hardest to solve. Many heuristics involve trying to solve the hardest parts of problems first, for example by initially scheduling the exams with the highest number of clashes. Although this is relatively easy for an automated system, it is not so easy for a human scheduler, especially when the criteria for selecting exams become more complicated. Another feature of the problem, which does appear to be a good indicator of the difficulty of solving an examination scheduling problem, is the number of potential clashes (or exam intersections) that exist. This value does not depend on any algorithm, and it is used here as a measure of the problem's complexity. We designed our interface to support the visualization tasks described by [3]. Note that we apply this visualization-seeking mantra to the representation of processing stages rather than to data representation. Our design organizes all visualization processes together in one space so that users can quickly gain an overview of the possibilities. The effects of changing parameter settings can be studied by making changes with a simple mouse action and viewing consecutive results.
3.2 Relationship to Parallel Coordinates
Our interface is based on the concept of parallel coordinates. Many of the axes do not display one-dimensional variables. For example, a transfer function is a complex concept consisting of variables such as data intensities, opacities, and sometimes intensity derivatives [10]. Furthermore, axes can be merged to produce even more complex nodes. Sorting and interpolating nodes therefore requires slightly more complex techniques than ordinary parallel coordinates [5]. In addition, a major strength of parallel coordinates is their ability to reveal high-level trends in data sets (by drawing many lines). We display lines without colors and relate one variable to the others; there is no history view displayed. Our proposed interface is novel in that it is the first interface that would allow such analysis on the examination timetabling problem. Figure 1 depicts the general approach of parallel coordinates versus Cartesian coordinate graphs.
Fig. 1. Multiple points in Cartesian coordinate graph versus Parallel Coordinate graph
4 Implementation
Visualization systems combine visualization and interaction techniques, ranging from simple slider and drag-and-drop interfaces to more advanced techniques such as fish-eye views. These techniques have in common that the visualizations transform data from an internal representation into an easily recognizable visual representation. This visual representation carries information about the data pre-processing, with the problem instance provided as input matrices forming the population; a traditional genetic algorithm acts as the engine that performs the assignment (schedule), and this is visualized during the process. The visualization exposes the clashes between exams and students or student groups, between exams and timeslots, and so on. We want to use the medium of visualization in combination with interaction to capture additional facts about the data and the problem, which are hard or costly to capture and communicate otherwise. The constraints are usually not represented by the formal problem description (see the unspecified portion of the problem in Figure 2). Because of the complexity of the problem and the optimization goals, the formal specification of the problem is usually simplified and captures only the necessary requirements, not the sufficient ones. Our interface is based on the concept of parallel coordinates, yet it has substantial differences from ordinary parallel coordinates displays [9]; many of the axes do not display one-dimensional variables. The next sections briefly explain the processes involved in the visualization. Our proposed interface is novel in the sense that it is the first interface that would allow such analysis.
Fig. 2. Visual complexity of the examination timetabling problem
There are five sets of variables that should be taken into account in university examination timetabling.
─ Exam: the exam to be timetabled. The domain of this variable, E, is the set of all exams to be scheduled. Each exam ei, i = 1, …, has student enrollments, a department, a length and a type.
─ Timeslot: the time used by the exam. The domain of this variable, T, is the set of all timeslots in the timetable under preparation. Each timeslot tj, j = 1, …, has start and finish times and a type.
─ Room: the room where the exam is to be held. The domain, R, is the set of all rooms available for exams. Each room ri, i = 1, …, has a department, a size and a type.
─ Student: a student enrolled for the exams. The domain, S, is the set of all students enrolled for the exams. Each student si, i = 1, …, has a department and a list of exams.
─ Department: represents a faculty. The domain, D, is the set of all departments at an institution. Each department di has rooms, students, and a list of invigilators.
4.1 Preprocessing
Data preprocessing can significantly improve the quality of data mining results, no matter whether an algorithmic or a visual approach is considered. Four different aspects of data preprocessing are distinguished in the literature: data cleaning, data integration, data transformation and data reduction [7]. However, these techniques are not entirely suitable for the examination timetabling problem. In the pre-processing step, we first detect clusters in the given data set. Numerous clustering algorithms can be applied to
detect clusters. The clustering process provides an abstraction of data without filtering out potentially important information such as outliers. Throughout the experiments in this paper, the K-means clustering algorithm [8] is adopted.
Fig. 3. Clustering algorithms applied on the car-s-91-I dataset
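A minimal version of this pre-processing step could cluster exams by simple features, for example enrolment size and number of conflicting exams, with k-means. The feature choice and the tiny pure-Python implementation below are assumptions made for illustration, not the exact features or implementation used in ParExaViz.

```python
import random

# Illustrative exam features: (enrolment, number of conflicting exams)
features = [(120, 14), (95, 12), (10, 2), (8, 1), (300, 30), (280, 27)]

def kmeans(points, k=2, iters=20, seed=0):
    """A tiny k-means: assign points to nearest centre, then update centres."""
    random.seed(seed)
    centres = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: sum((a - b) ** 2
                      for a, b in zip(p, centres[i])))
            clusters[idx].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # recompute each centre as the mean of its cluster
                centres[i] = tuple(sum(c) / len(cl) for c in zip(*cl))
    return centres, clusters

centres, clusters = kmeans(features)
print(centres)   # two cluster centres: small exams vs large, conflict-heavy exams
```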
Although the algorithm is not guaranteed to return a global optimum, it is extremely fast and has been widely adopted in many research areas. In Figure 3, a clustering method and the brushing system are applied to the variables. Each column represents a constraint parameter of the problem. The starting column holds the car-s-91-I input matrices, and the second and subsequent columns are timeslots, exams, students and rooms. Here the focus is only on exam-timeslot assignment, and the clustered data variables are highlighted with simple brushing. We adopted the input matrices, as a pool of data, as the primary input for the next processing phase.
4.2 During the Processing
The processing stage must have an algorithmic engine to produce the schedules or assignments according to the given input matrices. For example, Figure 3 shows one of the input timetabling data sets; these files are of two types, concerning course data and student data. The course file has the parameters course and enrollment, while in the student file the parameters are the courses in ascending order. As mentioned in Section 2.2, the assignment is between exams and timeslots. This process is assisted with visual metaphors to show the clashes between exams, timeslots and students. Parallel coordinates visualization is well suited to showing clusters of clashes between exams and students or student groups, timeslots and rooms, times and exams and students, and so on, giving an overview of the pattern. Every student must take an exam. There are group exams and individual exams; the maximum number of students for a group exam is 3 per group, and the group size is predetermined by the student enrollment or by the administrator. An exam can be assigned to any of the 35 timeslots. A maximum of 10 exams per timeslot and a minimum of 5 exams per slot are considered, and each student must be enrolled in 3 subjects.
4.2.1 Data Set Structure and Constraints
Given: no. of exams = 543, no. of students = 18413. For ease of representation, we adopt a numerical data representation instead of binary in each locus of a chromosome. There are 2 types of data sets (two pools of chromosomes):
4.2.2 1st Pool of Phenotypes
The 1st pool contains chromosomes in the form of the 1st data set, which is a matrix of numbers X[i, j], where X is an (N x 3) matrix (N = total number of students available);
─ For j=1, each locus represents an exam_id
─ For j=2, each locus represents a timeslot, 1 ≤ i ≤ N, where N = total number of timeslots available
─ For j=3, each locus represents the no. of students allocated to the exam
Table 2. Input matrices of examination to timeslot and the student group
Exam    Timeslot    Students
E1      T1          2
E2      T1          3
E3      T1          1
E4      T2          3
E5      T2          3
:       :           :
E543    T32
In Table 2, the size of X[i,j] is limited to X[1,1]……X[N,3], where N = total number of exams to be scheduled:
─ X[i,1] = Tm, where 1 ≤ Tm ≤ 35
─ X[i,2] = En, where 1 ≤ En ≤ N, where N = total number of exams
─ X[i,3] = G, where 1 ≤ G ≤ 3, where G = group of students
─ For every similar value of Tm, 5 ≤ ∑ of the corresponding values of G ≤ 10
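To make these pool-1 constraints concrete, here is a small hedged sketch that checks them on an (N x 3) matrix; it follows the X[i,1]=Tm, X[i,2]=En, X[i,3]=G convention above, and the function name and the checking code are ours, not the authors'.

import numpy as np

def check_pool1(X, n_timeslots=35):
    # X columns are assumed to be [timeslot Tm, exam id En, group size G]
    ok = bool(np.all((X[:, 0] >= 1) & (X[:, 0] <= n_timeslots)))   # 1 <= Tm <= 35
    ok &= bool(np.all((X[:, 2] >= 1) & (X[:, 2] <= 3)))            # 1 <= G <= 3
    for t in np.unique(X[:, 0]):                                   # per-timeslot load
        total = X[X[:, 0] == t, 2].sum()
        ok &= bool(5 <= total <= 10)                               # 5 <= sum of G <= 10
    return ok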
4.2.3 2nd Pool of Phenotypes
The 2nd pool contains chromosomes in the form of the 2nd data set, which is a matrix of numbers Y[i,j], where Y is a (16925 x 5) matrix;
─ For j=1, each locus represents a student id
─ For j=2, each locus represents the 1st course of the student (courseid1), Cn = X[i,2], where 1 ≤ Cn ≤ N, where N = total number of courses available
─ For j=3, each locus represents the 2nd choice of the student (courseid2), Cn = X[i,2], where 1 ≤ Cn ≤ N, where N = total number of courses available
─ For j=4, each locus represents the 3rd choice of the student (courseid3), Cn = X[i,2], where 1 ≤ Cn ≤ N, where N = total number of courses available
Table 3. Input Matrices of students with courses
Student    Course 1    Course 2    Course 3    Course 4
S1         Ca          Cb          Cc          Cd
S2         Ce          Cf          Cg          Ch
:          :           :           :           :
S16925     Cj          Ck          Cj          Cm
In Table 3, the size of Y[i,j] is limited to Y[1,1]……Y[16925,5]:
─ Y[i,1] = Sc, where 1 ≤ Sc ≤ 16925
─ Y[i,2] = Cn, Cn = X[i,2], where 1 ≤ Cn ≤ N, where N = total number of courses available
─ Y[i,3] = Cn, Cn = X[i,2], where 1 ≤ Cn ≤ N, where N = total number of courses available
─ For every value of Y[i,2], Y[i,3], Y[i,4], Y[i,5], each value of Cn must not appear more than 3 times in the whole matrix (excluding values in Y[i,1]).
Figure 4 gives a simple graphical view of the parallel coordinates visualization of the examination timetabling parameters during the processing phase. The visualization handles five variables and supports both individual and group types of data values. A few clashes are identified in the timeslot and exam columns. For example, looking at the time, exam and student columns, clashes are identified: in the time column, T1 and T2 are the clashes identified against the exam column, and the exam column identifies clashes of E1, E2 and E3 with individual students or groups of students.
Fig. 4. Parallel coordinates representation of examination timetabling parameters
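As a hedged illustration of how such a view can be produced (the authors' tool is custom-built; this is not it), pandas' built-in parallel-coordinates helper can draw one polyline per assignment, with a label column used to colour clashing rows. All values below are invented.

import pandas as pd
from pandas.plotting import parallel_coordinates
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "exam":     [1, 2, 3, 4, 5],
    "timeslot": [1, 1, 1, 2, 2],
    "students": [2, 3, 1, 3, 3],
    "room":     [7, 7, 8, 2, 2],
    "clash":    ["yes", "yes", "no", "no", "no"],  # label used only for colouring
})
parallel_coordinates(df, class_column="clash", colormap="coolwarm")
plt.show()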
4.3 Post Processing
Figure 5 shows a screenshot of our system. The two squares on the left side show conflict summaries of different timetables. The upper square contains the parent population (size 16). The lower square contains the result of a crossover operation of
two timetables of the parent population. The bar on the right side contains a list of the best timetables seen so far. The large area in the middle shows the processing of the examination schedules. On the right-hand side, the column boxes show the conflicts: time versus exams and exam versus subject. The post-processing phase deals with the resultant timetable with minimum clashes. The slider symbolizes the examination threshold over the range 0 to 2184.
Fig. 5. ParExaViz: A tool to solve examination timetabling problem
5 Conclusion and Future Work
We presented a parallel coordinates user interface that facilitates data exploration for examination timetabling processes. All parameters are clearly organized as vertical lines in the visualization and remain visible, so that users can see and remember what options are available and what settings were generated based on the clashes. A history bar will be implemented as future work; in the meantime, previous and next buttons in the interface allow users to easily backtrack to previous states and quickly scroll to see which options have been previously tried. However, visual clutter in parallel coordinates significantly limits its usage for visualizing large data sets. Inspired by previous graph-based visual analytic tools, our approaches do not alter the data in the parallel coordinates; instead, they are developed on top of the parallel coordinates to visualize the core examination timetabling processes.
References
[1] Inselberg, A.: Parallel Coordinates: Visual Multidimensional Geometry and its Applications. Textbook, 554 pages. Springer, New York (2009)
[2] Inselberg, A., Dimsdale, B.: Parallel Coordinates: A Tool for Visualizing Multidimensional Geometry. In: Proc. IEEE Visualization, pp. 361–378 (1990)
[3] Shneiderman, B.: The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. In: Proc. IEEE Symp. Visual Languages, pp. 336–343 (1996)
[4] Qu, R., Burke, E.K., McCollum, B., Merlot, L.T.G., Lee, S.Y.: A Survey of Search Methodologies and Automated System Development for Examination Timetabling. Journal of Scheduling 12(1), 55–89 (2009)
[5] Ericson, D., Johansson, J., Cooper, M.: Visual data analysis using tracked statistical measures within parallel coordinate representations. In: CMV 2005: Proceedings of the Coordinated and Multiple Views in Exploratory Visualization, Washington, DC, USA, pp. 42–53. IEEE Computer Society, Los Alamitos (2005)
[6] Joshua Thomas, J., Khader, A.T., Belaton, B.: Exploration of Rough Sets Analysis in Real-World Examination Timetabling Problem Instances. In: Tan, Y., Shi, Y., Chai, Y., Wang, G. (eds.) ICSI 2011, Part II. LNCS, vol. 6729, pp. 173–182. Springer, Heidelberg (2011)
[7] Ward, M.O.: XmdvTool: integrating multiple methods for visualizing multivariate data. In: VIS 1994: Proceedings of the Conference on Visualization 1994, pp. 326–333. IEEE Computer Society Press, Los Alamitos (1994)
[8] Hartigan, J.A., Wong, M.A.: A K-means clustering algorithm. Applied Statistics 28, 100–108 (1979)
[9] Zhou, H., Yuan, X., Qu, H., Cui, W., Chen, B.: Visual clustering in parallel coordinates. Computer Graphics Forum 27 (2008)
[10] McDonnell, K., Mueller, K.: Illustrative parallel coordinates. Computer Graphics Forum (Special Issue Eurovis 2008) 27, 1027–1031 (2008)
A Modified Edge-Based Region Growing Segmentation of Geometric Objects
Nursuriati Jamil1, Hazwani Che Soh1, Tengku Mohd Tengku Sembok2, and Zainab Abu Bakar1
1 Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, 40450, Shah Alam, Selangor, Malaysia
[email protected], [email protected], [email protected]
2 Faculty of Technology and Information Sciences, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
[email protected]
Abstract. Region growing and edge detection are two popular and common techniques used for image segmentation. Region growing is preferred over edge detection methods because it is more robust against low contrast problems and effectively addresses the connectivity issues faced by edge detectors. Edgebased techniques, on the other hand, can significantly reduce useless information while preserving the important structural properties in an image. Recent studies have shown that combining region growing and edge methods for segmentation will produce much better results. This paper proposed using edge information to automatically select seed pixels and guide the process of region growing in segmenting geometric objects from an image. The geometric objects are songket motifs from songket patterns. Songket motifs are the main elements that decorate songket pattern. The beauty of songket lies in the elaborate design of the patterns and combination of motifs that are intricately woven on the cloth. After experimenting on thirty songket pattern images, the proposed method achieved promising extraction of the songket motifs. Keywords: Image segmentation, region growing, edge detection, geometric objects, songket, Visual Informatics.
1 Introduction
Region growing and edge detection are two popular and common techniques used for image segmentation. While region growing segments images by grouping similar pixels into regions based on seeds, edge techniques segregate pixels into different regions based on gradient discontinuities. Region growing works by appending neighbouring pixels of a starting seed pixel to form a region based on predefined criteria [1][2]. The region grows by appending to each seed pixel those neighbouring pixels that have similar properties to the seed point, such as a specific range of gray level values or colour. Region growing is preferred over edge detection methods because it is more robust against low contrast problems and effectively addresses the
connectivity issues faced by edge detectors [3]. Region growing, however, requires prior knowledge about the image. This is necessary to select the seed points [3] and the threshold values [4]. In recent years, numerous works on automating seed point and threshold value selection have been attempted [5][6][7][8][9][10][11] and the results are encouraging. Over- and under-segmentation are also common problems in region growing [12]. Edge detectors work by segmenting the image based on abrupt changes in gray levels. These abrupt changes or discontinuities among pixels are detected by finding the maximum and minimum in the first derivative of the image and by searching for the zero crossings in the second derivative of the image. The derivative nature of edge detectors, however, makes them extremely sensitive to image noise. Noise will adversely affect the performance of edge detectors, thus significant noise removal is required to remove the noisy edges. Edge-based techniques are also subject to global variation in an image, but do reliably identify strong boundaries [13]. Edge methods are popularly employed for image segmentation [14][15][16][17][18][19][20] because they can significantly reduce useless information while preserving the important structural properties in an image.
1.1 Integration of Region Growing and Edge Detection
Both region growing and edge-based segmentation have their own strengths and limitations [17]. Pavlidis and Liow [21] stated that combining region growing and edge methods for segmentation will produce much better results. [22] suggested that in region growing, region joining decisions may be based not only on pixel or neighborhood similarity but also on already-extracted edges and completion of these edges. Once two adjacent regions that are candidates for merging are found, the strength of the boundary should be examined first. If the boundary is of sufficient strength, they should be kept separated. If the boundary between them is weak, they may be merged. Since the 1980s, substantial research on integrating region growing and edge-detection techniques for image segmentation has been conducted. Grinaker [23] used edge detection in areas around the borders of regions found by region-based segmentation. Fua and Hanson [24] used edge-based techniques and area-based geometric constraints to pick out the best segmentation from a series of region-based segmented images. Xiaohan et al. [25] combined region growing, edge detection and a new edge preserving smoothing algorithm to avoid characteristic segmentation errors which occur when using region growing or edge detection separately. In [26], Yi-Wei and Jung-Hua proposed an edge detector criterion to determine whether the region growing process should continue or terminate when a region grew to such an extent that it touched an edge pixel. Al-Hujazi and Sood [27] performed segmentation by using residual analysis to detect edges, and then a region-growing technique was used to obtain the final segmented image. The integration of region growing and edge detection techniques has been increasingly popular in recent studies. A hybrid method using region growing, edge detection and mathematical morphology was introduced by [28] in 2002. The threshold controlling the process of region growing was automatically determined by a fuzzy technique. Experimental results indicated that the proposed method for 3D segmentation of MRI brain images provided much better results than the traditional
method using a single technique in the segmentation of a human brain MRI data set. In [13], region growing was initially used to produce an over-segmented image. The image was then modified using edge strength, edge smoothness, edge straightness and edge continuity. Reasonable segmentation results were produced by [29] using region growing and region merging of color images; a multi-threshold concept to generate local entropies was then used for reasonable edge detection. In [30], adaptive thresholding based on the gradients and variances along and inside the boundary curve was proposed to overcome the difficulty of manual threshold selection and sensitivity to noise. In clinical MRI image segmentation this method produced very satisfactory results. The region distribution and global edge information were further employed to identify regions with texture characterization to obtain segmentation results.
2 Data Collections
In this paper, edge information is used to automatically select seed pixels and guide the process of region growing. Experiments are conducted on thirty images of scanned songket patterns. An example of a songket pattern is illustrated in Figure 1. Songket is an exquisite hand-woven cloth of the Malays made by hand weaving silver, gold and silk threads on a handloom. The beauty of songket lies in the elaborate design of the patterns and the combination of motifs that are intricately woven on the cloth. The motif is the main element of songket pattern design, and preservation of these motifs in digitized form is essential for cultural heritage purposes. Thus, the main purpose of this study is to automatically segment these motifs from the songket pattern and archive them.
Fig. 1. A songket pattern
The basis of this study is inspired by Fan et al. [30]. Fan et al. employed edge detection techniques to define two kinds of seed pixels, one called edge region seeds (hot seeds) and the other homogeneous seeds (cold seeds). Region growing is then applied based on the fact that the gray levels of the hot seeds are lower than those of pixels not far away from the edge region in the hot object, and the gray levels of the cold seeds are higher than those of pixels not far away from the edge region of the cold background. The region growing process is computationally intensive and time-consuming. Thus, this study proposes employing morphological operations to
complete the segmentation process. Further discussions of each step are presented in the next section.
3 Methodology
In this section, each step of the experiments is detailed. Figure 2 shows the flowchart of the proposed methodology: pre-processing, edge detection, edge region detection (modified Sobel detection and thresholding of the edge region), seeds definition, morphological operations, and finally the segmented motifs.
Fig. 2. Flowchart of the proposed methodology
3.1 Pre-processing
In the pre-processing stage, the scanned songket pattern color image is cropped and converted to grayscale. Noise captured during the image acquisition process is then reduced by smoothing the image using a 9 x 9 weighted average filter as shown in
Figure 3. Smoothing also helps enlarge sharp edges, which will later be identified as the edge region. An example of image smoothing is shown in Figure 4.

1   2   3   4   6   4   3   2   1
2   3   4   6  10   6   4   3   2
3   4   6  10  12  10   6   4   3
4   6  10  12  14  12  10   6   4
6  10  12  14  16  14  12  10   6
4   6  10  12  14  12  10   6   4
3   4   6  10  12  10   6   4   3
2   3   4   6  10   6   4   3   2
1   2   2   4   6   4   2   2   1
Fig. 3. Weighted average filter used for noise removal
Fig. 4. (a) Grayscale image. (b) Smoothed image.
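A minimal sketch of this smoothing step, assuming SciPy is available; the kernel values are those printed in Fig. 3 (normalised by their sum), and the border mode is an assumption of ours.

import numpy as np
from scipy import ndimage

kernel = np.array([
    [1, 2, 3, 4,  6, 4, 3, 2, 1],
    [2, 3, 4, 6, 10, 6, 4, 3, 2],
    [3, 4, 6, 10, 12, 10, 6, 4, 3],
    [4, 6, 10, 12, 14, 12, 10, 6, 4],
    [6, 10, 12, 14, 16, 14, 12, 10, 6],
    [4, 6, 10, 12, 14, 12, 10, 6, 4],
    [3, 4, 6, 10, 12, 10, 6, 4, 3],
    [2, 3, 4, 6, 10, 6, 4, 3, 2],
    [1, 2, 2, 4, 6, 4, 2, 2, 1],
], dtype=float)
kernel /= kernel.sum()                      # weighted average: weights sum to 1

def smooth(gray):
    # gray: 2-D grayscale array; returns the 9 x 9 weighted-average-filtered image
    return ndimage.convolve(gray.astype(float), kernel, mode="reflect")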
3.2 Edge Detection
The Canny edge detector [32] provides the best possible compromise between noise reduction and edge localization and is observed to give good results. Its edge detection algorithm utilizes these steps:
1. Noise smoothing - First of all, the image is smoothed by Gaussian convolution using a Gaussian approximation:
G(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))   (1)
where σ is the standard deviation of the distribution.
2. Edge enhancement – The gradient magnitude and orientation are computed using finite-difference approximations for the partial derivatives. This can be done by applying a simple 2-D first derivative operator such as the Sobel vertical (Gx) and horizontal (Gy) filters to the smoothed image to highlight regions of the image with high first spatial derivatives.
Gx =  | -1   0   1 |          Gy =  |  1   2   1 |
      | -2   0   2 |                |  0   0   0 |
      | -1   0   1 |                | -1  -2  -1 |
The gradient magnitude and direction are then computed using Equations (2) and (3), respectively:
|G| = |Gx| + |Gy|   (2)
θ = arctan(Gy / Gx)   (3)
3. Edge localization - Edges give rise to ridges in the gradient magnitude image. Nonmaxima suppression is applied to the gradient magnitude to thin the wide ridges around local maxima in gradient magnitude down to edges that are only one pixel wide.
4. Edge linking - To remove noise in the nonmaxima suppressed magnitude image, two threshold values (hysteresis thresholding algorithm) are applied to it. With these threshold values, two thresholded edge images T1[i, j] and T2[i, j] are produced. The image T2 has gaps in the contours but contains fewer false edges. With the double thresholding algorithm, the edges in T2 are linked into contours. When it reaches the end of a contour, the algorithm looks in T1 at the locations of the 8-neighbours for edges that can be linked to the contour. This algorithm continues until the gap has been bridged to an edge in T2. The algorithm performs edge linking as a by-product of thresholding and resolves some of the problems with choosing a single threshold value.
The results of applying Canny edge detection on the smoothed image in Figure 4b is shown in Figure 5.
Fig. 5. (a) Smoothed image (b) Canny edge detected image
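A hedged sketch of the edge detection step using scikit-image's Canny implementation; the file name, sigma and the two hysteresis thresholds are assumptions, not values from the paper.

from skimage import io, color, feature

gray = color.rgb2gray(io.imread("songket_pattern.png"))   # hypothetical input file
# sigma controls the Gaussian smoothing of step 1; the two thresholds drive step 4
edges = feature.canny(gray, sigma=1.5, low_threshold=0.05, high_threshold=0.15)
# 'edges' is a boolean map with one-pixel-wide contours, comparable to Fig. 5(b)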
3.3 Edge Region Detection The edge region is defined as pixels surrounding the single pixel detected by Canny operator (Figure 5b). Four Sobel operators are applied to the smoothed image in
Figure 4b to identify all possible pixels that are to be included in the edge region. The four Sobel operators are shown as follows [31]:

S1 =  |  2   1   0 |     |  1   2   1 |     |  3   3   1 |
      |  1   0  -1 |  +  |  0   0   0 |  =  |  1   0  -1 |
      |  0  -1  -2 |     | -1  -2  -1 |     | -1  -3  -3 |

S2 =  | -2  -1   0 |     | -1  -2  -1 |     | -3  -3  -1 |
      | -1   0   1 |  +  |  0   0   0 |  =  | -1   0   1 |
      |  0   1   2 |     |  1   2   1 |     |  1   3   3 |

S3 =  |  0  -1  -2 |     |  1   0  -1 |     |  1  -1  -3 |
      |  1   0  -1 |  +  |  2   0  -2 |  =  |  3   0  -3 |
      |  2   1   0 |     |  1   0  -1 |     |  3   1  -1 |

S4 =  |  0   1   2 |     | -1   0   1 |     | -1   1   3 |
      | -1   0   1 |  +  | -2   0   2 |  =  | -3   0   3 |
      | -2  -1   0 |     | -1   0   1 |     | -3  -1   1 |
The general equations of edge region detection are as follows:
f(i, j) = Σ (n=1 to 4) fn(i, j)   (4)
h(i, j) = f(i, j) / 4   (5)
where fn(i,j) are the images obtained from the four Sobel operators and h(i,j) is the final edge region image. Figure 6 illustrates the final edge region image applied on the smoothed image.
Fig. 6. (a) Smoothed image. (b) Final edge region image.
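The edge region computation of equations (4)-(5) can be sketched as below; the kernels are the four combined operators shown above, and taking the absolute value of each response (so that opposite gradient directions do not cancel) is an assumption on our part.

import numpy as np
from scipy import ndimage

S = [
    np.array([[ 3,  3,  1], [ 1,  0, -1], [-1, -3, -3]], dtype=float),
    np.array([[-3, -3, -1], [-1,  0,  1], [ 1,  3,  3]], dtype=float),
    np.array([[ 1, -1, -3], [ 3,  0, -3], [ 3,  1, -1]], dtype=float),
    np.array([[-1,  1,  3], [-3,  0,  3], [-3, -1,  1]], dtype=float),
]

def edge_region(smoothed):
    # fn(i,j): response of the n-th operator; h(i,j): their average, eqs. (4)-(5)
    responses = [np.abs(ndimage.convolve(smoothed.astype(float), k)) for k in S]
    return sum(responses) / 4.0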
After the edge region image is obtained, it is converted to binary using Otsu’s global threshold method. Otsu’s method uses a criterion function as a measure of
statistical separation between classes of foreground and background intensity values. Given these two groups of pixels, the overall or total variance of the gray values in the image can be easily calculated, denoted by σT². For any given threshold t, the variance of the foreground and background pixels can be separately computed to represent the within-class variance, σW². Finally, the variation of the mean values for each class from the overall mean of all pixels defines a between-class variance, denoted by σB². An optimal threshold can then be calculated by maximizing the ratio of the between-class variance to the total variance [33]:
η(t) = σB² / σT²   (6)
The value t that gives the largest value for η is the best threshold. Figure 7 shows the result of applying Otsu's threshold algorithm on the edge region image.
Fig. 7. (a) Edge region image. (b) Thresholded image.
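A short sketch of this binarisation step with Otsu's threshold, assuming scikit-image; the function and variable names are ours.

from skimage import filters

def binarize(edge_region_img):
    # eq. (6): Otsu picks the threshold t that best separates the two classes
    t = filters.threshold_otsu(edge_region_img)
    return edge_region_img > t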
Fig. 8. (a) Binary edge region image. (b) Enhanced edge region image.
The thresholding process sometimes produces small erroneous regions that are considered noise. These small regions are eliminated by performing median filtering, as shown in Figure 8. Only then are the hot and cold seeds computed.
3.4 Seeds Definition
Seeds are defined into two types, that is, hot and cold seeds. Seed selection is based on the Canny edge image (Figure 5b) and the edge region image (Figure 8b). Hot seeds are pixels in the edge region image not far away from foreground objects (i.e. songket motifs), and cold seeds are pixels in the edge region image not far away from the background. The hotness degree of the pixels in the edge region image is then estimated using the following pseudocode:
For each pixel in the grayscale image
    If the current pixel belongs to the edge image && edge region image
        Calculate the hotness degree of the pixel.
The hotness degree of each pixel depends on a threshold value determined by the median of the pixel's 5x5 neighbours. If the pixel's gray value is greater than the threshold, the hotness degree of the pixel is increased by 1, otherwise the hotness degree is decreased by 1. After the hotness degree image is estimated, each pixel in the image is classified based on the criteria listed in Table 1.

Table 1. Seeds definition criteria
Seed category      Criteria
Hot                Hotness degree > 0 and pixel belongs to edge region image
Cold               Hotness degree < 0 and pixel belongs to edge region image
0 (not a seed)     Hotness degree = 0
After the seeds are defined, the seeds image is shown in Figure 9.
Fig. 9. Seeds image
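One possible reading of the pseudocode and of Table 1 is sketched below; the 5 x 5 median is used as the local threshold and the +1/-1 scoring is applied once per pixel. This is our interpretation, not the authors' code.

import numpy as np
from scipy import ndimage

def seed_image(gray, canny_edges, edge_region):
    # hotness degree: +1 if the pixel is brighter than the median of its 5x5
    # neighbourhood, -1 otherwise; scored only inside edge image AND edge region
    local_median = ndimage.median_filter(gray, size=5)
    degree = np.where(gray > local_median, 1, -1)
    degree = np.where(canny_edges & edge_region, degree, 0)
    # Table 1: hot seeds where degree > 0, cold seeds where degree < 0
    return degree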
3.5 Morphological Operations The seeds are then grown using morphological reconstruction by connecting the pixels in 8-directions beginning from the holes of the binary image. A hole is a set of background pixels that cannot be reached by filling in the background from the edge of the image. Enhancement of the image is further done by erosion using a square structuring element of size 5 x 5. The final segmented image is as shown in Figure 10.
Fig. 10. Segmented image
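A hedged sketch of Section 3.5, assuming SciPy and scikit-image; "filling the holes with 8-connectivity and eroding with a 5 x 5 square" is our reading of the description, not a verified reproduction of the authors' procedure.

import numpy as np
from scipy import ndimage
from skimage import morphology

def grow_and_clean(seed_mask):
    # grow the seeds by filling enclosed background holes (8-connected), then
    # erode with a 5 x 5 square structuring element to clean the result
    struct8 = np.ones((3, 3), dtype=bool)
    filled = ndimage.binary_fill_holes(seed_mask, structure=struct8)
    return morphology.binary_erosion(filled, morphology.square(5))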
4 Results and Discussions
Thirty songket patterns were tested using the proposed method and the results are summarized in Table 2. Seventy percent of the songket patterns are completely and correctly segmented into the corresponding motifs. Only a small percentage of the patterns (6 out of 30) have less than 50% correct segmentation results. Songket patterns that have a high percentage of correctly segmented motifs are mostly those with isolated, complete motifs, such as shown in Figure 11. Patterns that have partial motifs at the border, such as in Figure 12, tend to produce incorrectly segmented motifs. The morphological reconstruction seems to fill up the intricate details of the motifs at the border of the pattern image.

Table 2. Summarized results of segmentation
Songket Pattern No.                                                                     Correctly segmented motifs
11, 13, 19, 20, 29, 30, 33, 35, 45, 47, 51, 54, 55, 60, 72, 73, 76, 81, 85, 86, 89     100%
74                                                                                      90%
75, 98                                                                                  50%
31, 43, 95                                                                              33%
49, 97, 94                                                                              10%
Fig. 11. Isolated, complete motifs
Fig. 12. (a) Songket pattern. (b) Poorly segmented image
Another type of problematic patterns is those of chained motifs as in Figure 13. Hot and cold seeds are used to differentiate foreground and background pixels in the edge region. As long as the edge regions are connected, the seeds grew from one motif to another producing incorrectly segmented motif.
Fig. 13. Chained motifs
5 Conclusion
This paper reports a method of image segmentation that uses edge information as the basis of automated seed selection for region growing. There are two main contributions of this paper: (1) the selection of seed pixels is done automatically based on a Canny edge-detected image and an edge region image defined by Sobel operators; (2) region growing is done using morphological reconstruction methods, and is thus much faster and less computationally intensive. Further work needs to be done to improve the segmentation results, especially for the chained and partial motifs. Other performance measurements such as area overlap and false positive and negative rates should also be taken into consideration, as only a minimal measurement is used in this paper. As reported in the experiments, seventy percent of the songket patterns achieved completely correct segmentation of their motifs.
References 1. Hojjatoleslami, S.A., Kittler, J.: Region Growing: A New Approach. IEEE Transactions on Image Processing 7(7), 1079–1084 (1998) 2. Gonzales, R.C., Woods, R.E.: Digital Image Processing, 2nd edn. Prentice Hall (2002) 3. Park, J.G., Lee, C.: Skull Stripping Based On Region Growing For Magnetic Resonance Brain Images. NeuroImage 47, 1394–1407 (2009) 4. Adams, R., Bischof, L.: Seeded Region Growing. IEEE Transactions on Pattern Analysis and Machine Intelligence 16(6), 641–646 (1994) 5. Poonguzhali, S., Ravindran, G.: A Complete Automatic Region Growing Method for Segmentation of Masses on Ultrasound Images. In: International Conference on Biomedical and Pharmaceutical Engineering, December 11-14, pp. 88–92 (2006) 6. Ghelich Oghli, M., Fallahi, A., Pooyan, M.: Automatic Region Growing Method using GSmap and Spatial Information on Ultrasound Images. In: 18th Iranian Conference on Electrical Engineering, May 11-13, pp. 35–38 (2010) 7. Yang, J.-H., Liu, J., Zhong, J.-C.: Anisotropic Diffusion with Morphological Reconstruction and Automatic Seeded Region Growing for Color Image Segmentation. In: International Symposium on Information Science and Engineering, December 20-22, vol. 2, pp. 591–595 (2008) 8. Tatanun, C., Ritthipravat, P., Bhongmakapat, T., Tuntiyatorn, L.: Automatic Segmentation of Nasopharyngeal Carcinoma from CT Images: Region Growing Based Technique. In: International Conference on Signal Processing Systems (ICSPS), July 5-7, vol. 2, pp. V2537–V2-541 (2010) 9. Jianping, F., Yau, D.K.Y., Elmagarmid, A.K., Aref, W.G.: Automatic Image Segmentation by Integrating Color-edge Extraction and Seeded Region Growing. IEEE Transactions on Image Processing 10(10), 1454–1466 (2001) 10. Siddique, I., Bajwa, I.S., Naveed, M.S., Choudhary, M.A.: Automatic Functional Brain MR Image Segmentation using Region Growing and Seed Pixel. In: International Conference on Information & Communications Technology, December 10-12, pp. 1–2 (2006) 11. Roslan, R., Jamil, N., Mahmud, R.: Skull Stripping of MRI Brain Images using Mathematical Morphology. In: IEEE EMBS Conference on Biomedical Engineering and Sciences, November 30-December 2, pp. 26–31 (2010)
12. Sharma, N., Aggarwal, L.: Automated Medical Image Segmentation Techniques. J. Med. Phys. 35, 3–14 (2010) 13. Chowdhury, M.I., Robinson, J.A.: Improving Image Segmentation Using Edge Information. In: Canadian Conference on Electrical and Computer Engineering, vol. 1, pp. 312–316 (2000) 14. Law, M.W.K., Chung, A.C.S.: Weighted Local Variance-Based Edge Detection and Its Application to Vascular Segmentation in Magnetic Resonance Angiography. IEEE Transactions on Medical Imaging 26(9), 1224–1241 (2007) 15. Bellon, O.R.P., Direne, A.I., Silva, L.: Edge Detection to Guide Range Image Segmentation by Clustering Techniques. In: International Conference on Image Processing, vol. 2, pp. 725–729 (1999) 16. Morris, O., Lee, M., Constantinides, A.: A Unified Method for Segmentation and Edge Detection using Graph Theory. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 11, pp. 2051–2054 (April 1986) 17. Kaganami, H.G., Beiji, Z.: Region-Based Segmentation versus Edge Detection. In: Fifth International Conference Intelligent on Information Hiding and Multimedia Signal Processing, IIH-MSP 2009, September 12-14, pp. 1217–1221 (2009) 18. Bourjandi, M.: Image Segmentation Using Thresholding by Local Fuzzy Entropy-Based Competitive Fuzzy Edge Detection. In: Second International Conference on Computer and Electrical Engineering, December 28-30, vol. 2, pp. 298–301 (2009) 19. Hsiao, Y.-T., Chuang, C.-L., Jiang, J.-A., Chien, C.-C.: A Contour Based Image Segmentation Algorithm using Morphological Edge Detection. In: IEEE International Conference on Systems, Man and Cybernetics, October 10-12, vol. 3, pp. 2962–2967 (2005) 20. Yishao, L., Jiajun, B., Chun, C., Mingli, S.: A novel image segmentation algorithm based on convex polygon edge detection. In: TENCON 2004: IEEE Region 10 Conference, November 21-24, vol. B, 2, pp. 108–111 (2004) 21. Pavlidis, T., Liow, Y.-T.: Integrating Region Growing and Edge Detection. IEEE Trans. on Pattern Analysis and Machine Intelligence PAMI-12, 225–233 (1990) 22. Morse, B.S.: Bringham Young University, http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/ MORSE/region.pdf 23. Grinaker, S.: Edge Based Segmentation and Texture Separation. In: 5th Int. Conf. Pattern Recognition, Miami Beach, FL, December 1-4, pp. 776–780 (1980) 24. Fua, P., Hanson, A.J.: Using Generic Geometric Models for Intelligent Shape Extraction. In: AAAI 6th National Conference on Artificial Intelligence, pp. 706–711. MorganKaufmann, Los Altos (1987) 25. Xiaohan, Y., Yla-Jaaski, J., Huttunen, O., Vehkomaki, T., Sipila, O., Katila, T.: Image Segmentation Combining Region Growing and Edge Detection. In: 11th IAPR International Conference on Image, Speech and Signal Analysis, August 30-September 3, pp. 481–484 (1992) 26. Yu, Y.-W., Wang, J.-H.: Image Segmentation Based on Region Growing and Edge Detection. In: IEEE International Conference on Systems, Man, and Cybernetics, vol. 6, pp. 798–803 (1999) 27. Al-Hujazi, E., Sood, A.: Range Image Segmentation Combining Edge-Detection and Region-Growing Techniques with Applications to Robot Bin-Picking using Vacuum Gripper. IEEE Transactions on Systems, Man and Cybernetic 20(6), 1313–1325 (1990)
28. Xiang, Z., Dazhi, Z., Jinwen, T., Jian, L.: A Hybrid Method for 3D Segmentation of MRI Brain Images. In: 6th International Conference on Signal Processing, August 26-30, vol. 1, pp. 608–611 (2002) 29. Huang, Y.-R., Kuo, C.-M.: Image Segmentation using Edge Detection and Region Distribution. In: 2010 3rd International Congress on Image and Signal Processing (CISP), October 16-18, vol. 3, pp. 1410–1414 (2010) 30. Deng, W., Xiao, W., Deng, H., Liu, J.: MRI Brain Tumor Segmentation with Region Growing Method Based on the Gradients and Variances Along and Inside of the Boundary Curve. In: 3rd International Conference on Biomedical Engineering and Informatics (BMEI), October 16-18, vol. 1, pp. 393–396 (2010) 31. Fan, S.L.X., Man, Z., Samur, R.: Edge Based Region Growing-A New Image Segmentation Method. Journal of the ACM, 302–305 (2004) 32. Canny, J.: A Computational Approach to Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 8(6), 679–698 (1986) 33. Otsu, N.: A Threshold Selection Method from Gray-Level Histograms. IEEE Trans. Sys. Man, Cyber. 9(1), 62–66 (1979)
Comparison on Performance of Radial Basis Function Neural Network and Discriminant Function in Classification of CSEM Data Muhammad Abdulkarim, Afza Shafie, Radzuan Razali, Wan Fatimah Wan Ahmad, and Agus Arif Universiti Teknologi PETRONAS, Seri Iskandar, 31750 Tronoh, Perak, Malaysia {mmmhammmad,agusarif}@gmail.com, {afza,radzuan_razali,fatimhd}@petronas.com.my
Abstract. Classification of Controlled Source Electro-Magnetic data into dichotomous groups based on the observed resistivity contrast measures is presented. These classifications may indicate the possible presence of hydrocarbon reservoir. Performance of Radial Basis Function of Neural network and Discriminant Function models were analyzed in this study. Both model's classification accuracy, Sensitivity and Specificity are compared and reported. Gaussian basis function was used for the hidden units in the RBF neural network, while quadratic form is used for the discriminant functions. The Controlled Source Electro-Magnetic data used for this study were obtained from simulating two known categories of data with and without hydrocarbon using COMSOL Multiphysics simulation software. The preliminary result indicates that the radial basis function neural network display superior accuracy, sensitivity and specificity in classifying CSEM data when compared to discriminant functions model. Keywords: Discriminant Function Analysis, Controlled Source ElectroMagnetic, hydrocarbon reservoir, Radial Basis Function, Resistivity Contrast, Visual Informatics.
1 Introduction
Controlled source electromagnetic (CSEM) is one of the techniques used to detect the presence of a hydrocarbon layer beneath the sea bed. This method utilizes man-made electric and magnetic fields to excite the earth beneath the sea floor [1, 2, 3]. In this method, a powerful Horizontal Electric Dipole (HED) [4], towed approximately 30 ─ 40m above the seabed, is used to transmit ultra-low frequency (~ 0.1 ─ 5Hz) electromagnetic (EM) waves to detect resistivity contrasts in the subsurface. Classification accuracy of CSEM data based on the logged EM fields plays an important role for the oil and gas industry since it can provide an indication of the presence of
hydrocarbon. The objective of this study is to compare the performance of the Radial Basis Function (RBF) artificial neural network and discriminant function analysis techniques for the classification of CSEM data. In order to achieve this objective, CSEM simulations are carried out for models without and with hydrocarbon. Simulation data from both categories are extracted to analyze the classification capability of the RBF neural network and Discriminant Function Analysis (DFA) in discriminating and categorizing the data into one of two groups, labeled (Group 1 = Yes Hydrocarbon) and (Group 0 = No Hydrocarbon), based on the measured electric and corresponding magnetic field data. The paper is organized as follows: the next section briefly explains the radial basis function neural network and discriminant function analysis, followed by the research methodology and simulation set-ups in Section 3. The results and discussion are given in Section 4 and the conclusions in Section 5.
2 Radial Basis Function Network Model
The radial basis function neural network is a multilayer feed-forward neural network consisting of an input layer of source nodes and a layer of nonlinear, locally tuned hidden units which are fully interconnected to an output layer of L linear units. All hidden units simultaneously receive the n-dimensional real-valued input vector X (Figure 1). In response to an input vector, the outputs of the hidden layer are linearly combined to form the network response, which is compared with a desired response at the output layer.
Fig. 1. Schematic diagram for radial basis neural network architecture
The weights are trained in a supervised fashion using an appropriate linear method [5]. The response of RBF neural network is related to the distance between the input and the centroid associated with the basis function. The RBF hidden-unit output, Zn is obtained by closeness of the input X to an n-dimensional parameter vector Mj associated with the nth hidden unit [6, 7]. The response characteristics of the nth hidden unit (n = 1, 2, 3, …, N) is assumed as:
Zn = φ(‖X − μn‖ / σn²)   (1)
Where φ is a strictly positive radially symmetric function (kernel) with a unique maximum at its 'Centre' μn and which drops off rapidly to zero away from the centre. The parameter σn is the width of the receptive field in the input space from unit n. This implies that Zn has an appreciable value only when the distance ||X – μn|| is smaller than the width σn. Given an input vector X, the output of the RBF network is the M-dimensional activity vector Y, whose mth component (m = 1, 2, 3, …, M) is given by:
Ym(X) = Σ (n=1 to N) Wmn Zn(X)   (2)
For m = 1, mapping of eq. (1) is similar to a polynomial threshold gate. However, in the RBF network, radially symmetric kernels are used as 'hidden units'. The degree of accuracy of these networks is controlled by three parameters: the number of basis functions used, their location and their width [6, 7, 8, 9, 10]. In the present work we have assumed a Gaussian basis function for the hidden units given as Zn for n = 1, 2, 3, …, N, where
Zn = exp(−‖X − μn‖² / (2σn²))   (3)
and μn and σn are the mean and the standard deviation, respectively, of the nth unit's receptive field, and the norm is Euclidean.
2.1 RBF Model Training
A training set is a set of n labeled pairs {Xi, di} that represent associations of a given mapping or samples of a continuous multivariate function. The training method used here adaptively updates the free parameters of the RBF network while minimizing the error function E. These parameters are the receptive field centers μn of the hidden layer Gaussian units, the receptive field widths σn, and the output layer weights (Wmn). The training method considered in the study was a fully supervised gradient-descent method over E [11]. In particular, μn, σn, and Wmn are updated as follows:
Δμn = −εμ ∇μn E   (4)
Δσn = −εσ ∂E/∂σn   (5)
ΔWmn = −εw ∂E/∂Wmn   (6)
where εμ, εσ and εw are small positive constants. This training model selects the centers and dimensions of the functions and calculates the weights of the output neuron. The center, distance scale and precise shape of the radial function are parameters of the model, all fixed if it is linear. Selection of the centers can be understood as defining the optimal number of basis functions and choosing the elements of the training set used in the solution. It was done according to the method of forward selection [12]. The Artificial Neural Network (ANN) used in this work was implemented using the MATLAB 2009a neural network toolbox.
2.2 Discriminant Function Analysis (DFA)
The aim of DFA is to obtain a discriminant function that maximizes the distance between the categories, i.e. to come up with an equation that has strong discriminatory power between groups. The forms of the linear and quadratic discriminant functions of an n-dimensional feature vector are given in equations (7) and (8), respectively, by [13] as follows:
Dc(X) = 2μcᵀ σ⁻¹ X − μcᵀ σ⁻¹ μc − 2 log P(c)   (7)
Dc(X) = (X − μc)ᵀ σc⁻¹ (X − μc) + log|σc| − 2 log P(c)   (8)
where for each class c, the parameters µ c and σc denote the mean vector and the covariance matrix for X in the class c, respectively, and are estimated by their sample analogs [see equations (9) and (10)]. P(c) is the a priori probability for the class c.
μ̂c = X̄c = (1/Nc) Σ (i=1 to Nc) Xi   (9)
σ̂c = Sc = (1/Nc) Σ (i=1 to Nc) (Xi − X̄c)(Xi − X̄c)ᵀ   (10)
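Equations (8)-(10) can be turned into a small classifier sketch as follows; this assumes a two-column feature matrix (E-field and B-field) and uses NumPy's sample mean and covariance in place of equations (9) and (10). It is an illustration, not the SPSS/MATLAB procedure used by the authors.

import numpy as np

def qda_fit(X, y):
    # per-class mean vector, covariance matrix and prior P(c), eqs. (9)-(10)
    return {c: (X[y == c].mean(axis=0),
                np.cov(X[y == c], rowvar=False),
                np.mean(y == c))
            for c in np.unique(y)}

def qda_score(x, mu, cov, prior):
    # quadratic discriminant of eq. (8); smaller scores mean a better match
    d = x - mu
    return d @ np.linalg.inv(cov) @ d + np.log(np.linalg.det(cov)) - 2 * np.log(prior)

def qda_predict(x, params):
    return min(params, key=lambda c: qda_score(x, *params[c]))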
The aim of this statistical analysis DFA is to combine (weight) the variable scores in some way so that a single new composite variable, the discriminant score, is produced. In this study, predicted membership is calculated by first producing a score for D for each case using the discriminate function. Then cases with D values smaller than the cut-off value are classified as belonging to one group while those with values larger are classified into the other group. This technique for classification of CSEM data based on phase versus offset has been reported by [14]. 2.3 Terminology and Derivations from a Confusion Matrix A confusion matrix contains information about actual and predicted classifications done by a classification system [15] is given in Table 1. Note that, in this work,
classification of No Hydrocarbon (0) data as Yes Hydrocarbon (1) is considered as False Positive (FP) and classification of Yes Hydrocarbon (1) data as No Hydrocarbon (0) is considered False Negative (FN). True Positive (TP) and True Negative (TN) are the cases where the Yes Hydrocarbon (1) is classified as Yes Hydrocarbon (1) and No Hydrocarbon (0) classified as No Hydrocarbon (0) respectively. Table 1. Illustration of Confusion Matrix
                Predicted
Actual          Negative    Positive
Negative        θ1          θ2
Positive        θ3          θ4
The entries in the confusion matrix have the following meaning in the context of our study:
θ1 is the number of correct predictions that an instance is negative, θ2 is the number of incorrect predictions that an instance is positive, θ3 is the number of incorrect predictions that an instance is negative, and θ4 is the number of correct predictions that an instance is positive.
The accuracy, sensitivity, specificity and adjusted accuracy were estimated using the following relations [16]: The Accuracy (AC) is the proportion of the total number of predictions that were correct. It is determined using the equation:
AC = (θ1 + θ4) / (θ1 + θ2 + θ3 + θ4)   (13)
The True Positive rate (TP), also known as "Sensitivity" or the "recall", is the proportion of positive cases that were correctly identified, as calculated using the equation:
TP = θ4 / (θ3 + θ4)   (14)
The False Positive rate (FP) is the proportion of negative cases that were incorrectly classified as positive, as calculated using the equation:
FP = θ2 / (θ1 + θ2)   (15)
The True Negative rate (TN), also known as "Specificity", is defined as the proportion of negative cases that were classified correctly, as calculated using the equation:
TN = θ1 / (θ1 + θ2)   (16)
The False Negative rate (FN) is the proportion of positive cases that were incorrectly classified as negative, as calculated using the equation:
FN = θ3 / (θ3 + θ4)   (17)
The Negative Predictive Value (NPV) is the proportion of the predicted negative cases that were correct, as calculated using the equation:
NPV = θ1 / (θ1 + θ3)   (18)
The Precision (P), also known as "Positive Predictive Value (PPV)", is the proportion of the predicted positive cases that were correct, as calculated using the equation:
P = θ4 / (θ2 + θ4)   (19)
Finally, the Adjusted Accuracy (AA) is the average value of sensitivity and specificity, as calculated using the equation:
AA = (Sensitivity + Specificity) / 2   (20)
Accuracy is a representation of classifier performance in a global sense. Sensitivity and Specificity are the proportions of Yes Hydrocarbon (1) data classified as Yes Hydrocarbon (1) and of No Hydrocarbon (0) data classified as No Hydrocarbon (0), respectively. The adjusted accuracy is a measure that accounts for unbalanced sample data of No Hydrocarbon (0) and Yes Hydrocarbon (1) events. The adjusted accuracy combines sensitivity and specificity into a single measurable value [15].
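The rates above reduce to a few lines of code; the sketch below takes the four confusion-matrix counts directly and reproduces equations (13), (14), (16) and (20). The example numbers are the DFA counts reported in Table 4 later in the paper.

def rates(tn, fp, fn, tp):
    accuracy    = (tn + tp) / (tn + fp + fn + tp)   # eq. (13)
    sensitivity = tp / (fn + tp)                    # eq. (14), true positive rate
    specificity = tn / (tn + fp)                    # eq. (16), true negative rate
    adjusted    = (sensitivity + specificity) / 2   # eq. (20)
    return accuracy, sensitivity, specificity, adjusted

print(rates(tn=375, fp=43, fn=59, tp=735))   # approx. (0.9158, 0.9257, 0.8971, 0.9114)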
3 Methodology 3.1 The Simulation Set-Up COMSOL Multi-Physics software is used to model this simulation. The set-up is shown in Fig. 2. The Computational domain is a sphere of 10km diameter. The top region of this sphere represents air with resistivity of 1000Ω. A 500m deep ocean water domain with a conductivity of 3 Sm–1 and relative permittivity of 80 is specified above the mid-plane of the sphere. Below the mid-plane, a conductivity of 1.5 Sm–1 and relative permittivity of 30 is specified for the rock. Embedded in the rock at an average depth of 1000m, there is a block-shaped 100m deep and 7km-by-7km wide hydrocarbon reservoir. The resistivity of the hydrocarbon layer is 1000Ω and the permittivity is 4. The transmitter is modeled as a short 1250A AC line current segment of length 150m of frequency 0.125 Hz is located 35m above the mid-plane. At the external spherical boundaries, a scattering boundary condition absorbs outgoing spherical waves. The Maxwell’s electromagnetic field wave equation in vacuum in the absence of electric or magnetic sources is solved for the electric field vector E inside the computational domain.
Fig. 2. Snapshot of the CSEM Simulation set up
3.2 Data and Analysis Procedure The training data were the sets of E-field values extracted from the post processing results of the two CSEM simulations. The RBF neural network architecture considered for this application was a single hidden layer with Gaussian RBF. The basis function φ is a real function of the distance (radius) r from the origin, and the center is μ. Gaussian-type RBF was chosen here due to its similarity with the Euclidean distance and also since it gives better smoothing and interpolation properties [17]. Training of the RBF neural network involved two critical processes. First, the centers of each of the N Gaussian basis functions were fixed to represent the density function of the input space using a dynamic ‘k means clustering algorithm’. The algorithm is given by [18, 19]. This was accomplished by first initializing the set of Gaussian centers μn to random values. Then, for any arbitrary input vector X(t) in the training set, the closest Gaussian center, μn, is modified as:
μn^new = μn^old + η (X(t) − μn^old)   (21)
where η is a learning rate that decreases over time. This phase of RBF network training places the weights of the radial basis function units in only those regions of the input space where significant data are present. The parameter σn is set for each Gaussian unit to equal the average distance to the two closest neighbouring Gaussian basis units. If μ1 and μ2 represent the two closest weight centers to Gaussian unit n, the intention was to size this parameter so that there were no gaps between basis functions and only minimal overlap between adjacent basis functions were allowed. After the Gaussian basis centers were fixed, the second step of the RBF network training process was to determine the weight vector W which would best approximate the limited sample data X, thus leading to a linear optimization problem that could be solved by ordinary least squares method. Least mean square (LMS) algorithm [9] has been used to estimate these weights. This avoids the problem of gradient descent methods and local minima characteristic of back propagation algorithm [15].
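A compact sketch of this two-stage training procedure, assuming NumPy and scikit-learn; k-means stands in for the dynamic clustering of [18, 19], each width is the mean distance to the two nearest other centres, and the output weights come from ordinary least squares. Seven centres are used only because that is the number reported as best in Section 4.1; everything else is our own illustrative choice.

import numpy as np
from sklearn.cluster import KMeans

def train_rbf(X, d, n_centers=7):
    centres = KMeans(n_clusters=n_centers, n_init=10, random_state=0).fit(X).cluster_centers_
    # sigma_n: average distance to the two closest neighbouring centres
    dists = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    sigma = np.sort(dists, axis=1)[:, :2].mean(axis=1)
    # hidden activations Z per eq. (3), then least-squares output weights W
    Z = np.exp(-np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) ** 2
               / (2 * sigma[None, :] ** 2))
    W, *_ = np.linalg.lstsq(Z, d, rcond=None)
    return centres, sigma, W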
For the DFA model analysis, an equal prior was used because all of the populations sizes are equal (i.e. the extracted E-field and corresponding B-field data). Bartlett’s test was carried out to determine the variance-covariance matrices are homogenous for the two population involved, that is the E-field and the B-field data. SPSS statistical software and MATLAB Statistics toolbox were used to carry out the discriminant analysis [13].
4 Results and Discussion 4.1 RBF Neural Network A random sample of 1212 (60%) observed values was used as training, 242 (20%) for validation, 242 (20%) for testing. Training data were used to train the application; validation data were used to monitor the neural network performance during training and the test data were used to measure the performance of the trained application. The network performance plot and plotconfusion matrix in figures 3 & 4 respectively, provides an index of overall RBF model classification performance. The RBF neural network architecture used in this work had two input variables, E-field and its corresponding B-field. It has one hidden layer with five hidden nodes and one output node. The RBF performed best at seven centers and maximum number of centers tried was 11. Mean square error using the best centers was 0.115.
Fig. 3. Snapshot of Neural Network performance plot
Figure 4 shows a snapshot of the Matlab neural network confusion matrix for the training, testing, validation and the overall. The Diagonal cells in each table show the number of cases that were correctly classified, and the off-diagonal cell show the misclassified cases. The cell located at row 3, column 3 (i.e. in the bottom right) of each of the confusion matrices shows the total percent of correctly classified cases and the total percent of misclassified cases just below it within the same cell. The results for all the three data sets (training, validation and testing) show very good recognition. The overall sensitivity of the RBF neural network model was 97.49%, specificity was 98.07% and percentage correct prediction was 97.7%.
Fig. 4. Snapshot of the Neural Network Confusion Matrix
4.2 Discriminant Analysis
The mean differences between the B-field and E-field data, together with the standard deviations, are depicted in Table 2 for the DFA. The values suggest that these may be good discriminators as the separations are large.
Table 2. Group Statistics
Hydrocarbon   Field     Mean      Std. Deviation   Valid N (listwise) Unweighted   Weighted
0             Bfield    0.000     0.000            1212                            1212
0             Efield    –1.7E2    10.55            1212                            1212
1             Bfield    0.000     0.000            1212                            1212
1             Efield    –1.4E2    6.47             1212                            1212
Total         Bfield    0.000     0.000            2424                            2424.000
Total         Efield    –1.57E2   17.52            2424                            2424.000
Table 3. Test of Equality of Group Means
          Wilks' Lambda   F         df1   df2    Sig.
Bfield    0.248           605.908   1     2422   .000
Efield    0.248           606.040   1     2422   .000
Table 3 provides strong statistical evidence of significant differences between the means of the Yes Hydrocarbon (1) and No Hydrocarbon (0) groups, with the E-field having a slightly higher value of F. Since p < 0.05 for both variables and the multivariate test (Wilks' lambda) is 0.248 for both the E and B fields, we can say that the model is a good fit for the data.
Table 4. Performance of DFA Model
                     Actual Value
DFA                  Yes Hydrocarbon (1)   No Hydrocarbon (0)   Total
Prediction           TP = 735              FP = 43              778
False Negative       FN = 59               TN = 375             434
Total                794                   418                  1212
Table 4 presents the classification results of the DFA with a simple summary of the numbers classified correctly and incorrectly. This gives information about actual group membership versus predicted group membership.
Table 5. Performance of RBF Neural Network
                     Actual Value
RBF                  Yes Hydrocarbon (1)   No Hydrocarbon (0)   Total
Prediction           TP = 778              FP = 8               786
False Negative       FN = 20               TN = 406             426
Total                798                   414                  1212
Table 5 shows the classification results of the RBF with simple summary of numbers classified correctly and incorrectly. This gives information about actual group membership versus predicted group membership. Table 6. Comparison of DFA and RBF
Indices                                   DFA       RBF
Accuracy                                  91.58%    97.69%
Sensitivity                               92.57%    97.49%
Specificity                               89.71%    98.07%
False Positive Rate = 1 – Specificity     10.29%    1.93%
Positive Predictive Value (PPV)           94.47%    98.98%
Negative Predictive Value (NPV)           86.41%    95.31%
False Negative Rate                       33.24%    34.39%
Adjusted Accuracy                         91.14%    97.78%
The comparative results of the DFA and RBF models are presented in Table 6. Judging by all the indices, the results obtained indicate that the RBF network performs better than the DFA model. The sensitivity of 97.49% for the RBF, which is higher than the 92.57% of the DFA, shows the ability of the RBF neural network to correctly identify data indicating the presence of hydrocarbon (true positive rate), whereas the specificity of 98.07% for the RBF, the ability of the network to correctly identify data indicating no hydrocarbon (true negative rate), is also higher than the 89.71% for the DFA. The False Positive Rate of 10.29% for the DFA, which is higher than the 1.93% for the RBF, shows that the RBF makes fewer errors of incorrectly identifying data without hydrocarbon as data with hydrocarbon. Figure 5 below shows a MATLAB graphical group classification of the data.
Fig. 5. MATLAB Graphical Group Classification
5 Conclusion
Accurate classification of CSEM data based on the logged EM fields can play an important role for the oil and gas industry, as it can give an indication of reservoir presence during exploration. In this work, CSEM data are classified into two categories, Yes Hydrocarbon (1) and No Hydrocarbon (0), using an artificial neural network and a classical statistical approach, and the performance of the two methods is compared. It is observed that the radial basis function network has better accuracy than discriminant function analysis. The value of specificity shows that the RBF network classifies No Hydrocarbon data more accurately than the DFA. The positive predictive value suggests that the classification of CSEM data into Yes or No Hydrocarbon is more accurate in the RBF than in the DFA model. This is also evident from the higher false positive rate obtained from the DFA than from the RBF. It appears that ANN could be a valuable alternative to traditional statistical methods and may be particularly useful when the primary goal is classification. The next step in the study is to carry out this analysis with experimental data to establish the full potential of the RBF.
Acknowledgment. This research is carried out under the Fundamental and Research Grant Scheme. The authors would like to acknowledge Universiti Teknologi PETRONAS for giving the opportunity to carry out this research work.
References 1. Eidesmo, et al.: Sea Bed Logging (SBL): A new method for remote and direct identification of hydrocarbon filled layers in deep water areas (2002) 2. Constable, S., Srnka, L.J.: An introduction to marine controlled-source electromagnetic methods for hydrocarbon exploration Geophysics 72, WA3 (2007) 3. Hesthammer, J., Boulaenko, M.: The potential of CSEM technology. GeoExpro. and HGS Bulletin 4, 52–58 (2007) 4. Demuth, H., Beale, M., Hagan, M.: Neural Network ToolboxTM User’s Guide 1992-2008, vol. 3, pp. 01760–2098. The MathWorks, Inc., Apple Hill Drive (2008) 5. Raiche, A.: A pattern recognition approach to geophysical inversion usingneural networks. Geophysical Journal International 105, 629–648 (1991) 6. Poulton, M., Sternberg, K., Glass, C.: Neural network pattern recognition of subsurface EM images. Journal of Applied Geophysics 29, 21–36 (1992) 7. Roth, G., Tarantola, A.: Neural networks and inversion of seismic data. Journal of Geophysical Research 99, 6753–6768 (1994) 8. El-Qady, G., Ushijima, K.: Inversion of DC resistivity data using neural networks. Geophysical Prospectin 49, 417–430 (2001) 9. Haykin, S.: Neural Networks a Comprehensive Foundation, 2nd edn. Prentice-Hall Inc. (1999) 10. Singh, U.K., Somvanshi, V.K., Tiwari, R.K., Singh, S.B.: Inversion of DC resistivity data using neural network approach. In: Proceedings of the International Groundwater Conference, IGC 2002, Dindigul, India, pp. 57–64 (2002) 11. Rummelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error propagation. In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, pp. 318–362. MIT Press, Cambridge (1986a) 12. Rummelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by backpropagating errors. Nature 323(9), 533–535 (1986b) 13. Pardoe, I., Xiangrong, Y., Dennis, R.C.: Discriminant Analysis and Statistical Pattern Recognition: Graphical tools for quadratic discriminant analysis. John Wiley Sons, New York (2006) 14. Muhammad, A., Afza, S., Radzuan, R.: Wan Fatimah Wan Ahmad: Application of Discriminant Analysis to Phase versus Offset Data in Detection of Resistivity Contrast. Paper will be Presented at NPC Symposium (2011) 15. Kohavi, R., Provost, F.: Machine Learning, vol. 30, pp. 271–274. Kluwer Academic Publishers, Boston (1998) 16. Abboud, S., Barnea, O., Guber, A., Narkiss, N., Bruderman, I.: Maximum expiratory flowvolume curve: mathematical model and experimental results. Med. Eng. Phys. 17(5), 332– 336 (1995) 17. Hartman, E.J., Keeler, J.D., Kowalski, J.M.: Layered neural networks with Gaussian hidden units as universal approximators. Neural Comput. 2, 210–215 (1990) 18. Orr, M.J.: Regularization in the selection of radial basis function centers. Neural Comput. 7, 606–623 (1995) 19. Sing, J.K., Basu, D.K., Nasipuri, M., Kundu, M.: Center selection of RBF neural network based on modified k-means algorithm with point symmetry distance measure. Found. Comput. Decis. Sci. 29, 247–266 (2004)
Simulation for Laparoscopy Surgery with Haptic Element for Medical Students in HUKM: A Preliminary Analysis A.R. Norkhairani1, Halimah Badioze Zaman2, and Azlina Ahmad2 1
Faculty of Technology and Information Science 2 Institute of Visual Informatics, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor [email protected],{hbz,aa}@ftsm.ukm.my
Abstract. Minimally invasive surgery, or laparoscopy, has increasingly been applied, especially in pediatrics. Traditionally, medical students learn surgical skills through the mentorship model during their residency training. As technology evolves, more and more applications, simulations and systems are built as aids to help medical students acquire surgical skills. This study was conducted to investigate how haptic technology can play a part in shaping students' skills at the Department of Surgery in HUKM. This paper discusses the first phase of the research: the Preliminary Analysis. Previous research was reviewed; interviews were conducted with experts and related staff of the department; and observations were made during actual training sessions. Results of the preliminary analysis show that current training approaches have weaknesses that can be addressed by applying simulations with haptic devices. The results will be used to design a framework for Laparoscopy Simulation Surgery with Haptic Elements, or Simulasi Pembedahan Laparoskopi dengan Elemen Haptik (SPLasH). This innovation will not only enrich the field of visual informatics, but also improve medical training in the field of laparoscopy surgery. Keywords: Simulation, Laparoscopy Surgery, Visual Informatics, Haptics.
1 Introduction

In the medical field, especially in surgery, various studies have been conducted to reduce the incisions made in surgery, which led to the implementation of MAS (minimal access surgery), also known as laparoscopy. This procedure needs highly skilled surgeons to operate, and is a difficult skill for surgical trainees to master [1],[2]. Therefore, surgical trainees need to spend more time supervised by experts. However, issues such as limitations of time and social and financial constraints restrict experts from spending more time training the trainees [2]. These issues led to the use of technology as an effort to help trainees acquire the special skills required. Robotics, simulation and virtual technologies are currently available to fill this gap [3]. Besides that, the use of dummies has helped in the acquisition of such surgical
skills [3]. Some of the skills are acquired through supervision by experts together with consistent practice [1],[2]. Simulation is an approach that seems to be accepted as a tool for surgical skill training, with the integration of haptic devices that provide force and tactile feedback to increase realism [4]. This multidisciplinary research has attracted researchers in both the medical and computer science fields as part of visual informatics [5]. Thus, this preliminary analysis was conducted to investigate how simulation for laparoscopic surgery training with haptic elements, or SPLasH, can be implemented in medical student training at the Department of Surgery in HUKM (Hospital Universiti Kebangsaan Malaysia). Specifically, this paper focuses on the current issues in laparoscopy surgery, research carried out to date to overcome these issues, and the surgery training method at HUKM. The results obtained will be used as input for the second phase of the research.

1.1 Literature Review

Laparoscopy is a minimally invasive procedure that is used either to diagnose diseases or as a method of surgery [6],[7],[8]. It is normally applied to conditions that cannot be detected through diagnostic imaging, such as pelvic pain examined by gynecologists, and is also used as a routine procedure to examine abdominal organs. Laparoscopy starts with a small incision, normally under the navel, to accommodate the insertion of the laparoscope through a cannula or trocar. Other incisions are also made to allow the insertion of additional laparoscopic instruments, normally on both the left and right abdomen. Once the procedure is completed, the instruments are removed and the incisions are sutured and bandaged. These incisions take a shorter time to recover and patients do not need to stay long at the hospital; sometimes they are treated as day patients [6],[7],[8]. Currently, there are four common surgical approaches, as summarized in Table 1 [9]. Open surgery and robot-assisted laparoscopy are two approaches that share the same characteristics in terms of vision and hand freedom, namely three-dimensional vision and 6 DOF (degrees of freedom). Degrees of freedom refer to the independent directions in which the body can move in space. However, tactile sensation is absent in robot-assisted laparoscopy. While endoscopic surgery and laparoscopy share the same level of tactile sensation and hand freedom, they differ in vision, where endoscopy is monocular and laparoscopy is two-dimensional.

Table 1. Differences between current surgical techniques
Surgery Approach              Vision              Tactile Sensation   Hand Freedom
Open                          Three-dimensional   Fully present       6 DOF
Endoscopic                    Monocular           Reduced             4 DOF
Laparoscopy                   Two-dimensional     Reduced             4 DOF
Robot-assisted laparoscopy    Three-dimensional   Absent              6 DOF
The differences between open surgery and laparoscopy in terms of vision, tactile sensation and hand freedom are not too wide. Table 1 clearly shows that laparoscopy offers fewer degrees of freedom, reduced tactile sensation and more limited vision compared with open surgery. This lack of properties in
laparoscopy can be compensated for by using technology. This gap can be filled by the application of simulations that provide a higher degree of freedom and three-dimensional vision. The absence of tactile sensation can be filled by using appropriate haptic devices that are aligned to the needs of the simulations. The approach to surgery training seems controversial among surgeons [10]. Some surgeons focus more on observation in the operation theatre, while others promote the use of animals and simulation models, or a combination of tasks related to surgery. It has been identified that there are several levels of training in surgery through which the required cognitive and motor skills are acquired [1]. Traditionally, surgeons go through an apprenticeship, based on the residency training system introduced by Sir William Halsted in the late 1800s [1]. Currently, the training curriculum in medical faculties and institutions is multilevel, as depicted in Figure 1. Students start with the first level of training, which is learning through aids such as courseware, videos, dummies and simulations with special devices. After the basic knowledge is gained, they move to the next level, where observation in the operation room and fully simulated environments take place. The higher level involves consultation with experts, and lastly, they are ready to practice in hospitals. Each level increases the skills acquired by the students through a different training approach, and the higher the level, the closer the approach comes to the real world.
Fig. 1. Training curriculum of Minimally Invasive Surgery Centre of Caceres (Spain). Simulator or virtual reality system can be implemented at the lowest level of skill training [11].
Surgical training needs to prepare surgeons to master a distinct set of psychomotor and visuo-spatial skills [12]. There are several skill acquisition theories, such as Kopta's, Schmidt's Schema, Traditional Apprenticeship, Cognitive Apprenticeship and Ericsson's Rationale for acquisition of expertise, as shown in Table 2. Each theory is divided into three to six phases, and each phase focuses on one level of the whole skill acquisition process, but overall it can be concluded that skills are acquired through three main processes [1].
a) Demonstration – surgeons observe how the procedures are being carried out either through video, operation theatre or simulation
b) Coaching – surgeons then carry out the procedures with the tutelage of the mentor
c) Practice – the surgeons do the procedure on their own in order to be assessed in terms of skills acquired.

Table 2. Theories of skill acquisition and their phases

Theory                                               Phases
Kopta's                                              Cognitive; Integration; Autonomous
Schmidt's Schema                                     Initial condition; Action initiation; Sensory feedback; Outcome knowledge
Traditional Apprenticeship                           Observation; Coaching; Practice
Cognitive Apprenticeship                             Modelling; Coaching; Scaffolding; Articulation; Reflection; Exploration
Ericsson's Rationale for acquisition of expertise    Deliberate practice; Focused attention; Time of day and duration of practice
Currently, many applications have been developed to be used as training aids for minimally invasive surgery (MIS) or laparoscopy. The SINERGIA laparoscopic virtual reality simulator is an application that provides exercises for specific skill acquisition – coordination, speedy coordination, navigation, navigation and touch, accurate grasping, grasp and transfer, coordinated pulling, force sensitivity and accurate cutting [11]. The MISTELS system (McGill Inanimate System for Training and Evaluation of Laparoscopic Skills) was designed to provide a series of tasks that are performed under video guidance in a box using standard laparoscopic instruments to objectively assess basic laparoscopic skills [13]. Meanwhile, the Laparoscopic Training Simulator (LTS) 2000 is based on MISTELS and improves upon it by combining computer-based electronic scoring, interactive user feedback and digital capture of performance data; it is also portable. The tasks provided in MISTELS and LTS are summarized in Table 3. These two training aids basically have the same function, but LTS provides more feedback to the user. The tasks provided in both training aids are dedicated to training basic laparoscopy skills involving grasping (tasks 1, 6, 8 and 9), pressure (tasks 4, 5 and 10) and knotting (tasks 7, 8 and 9). Results showed that users were more satisfied with LTS.
Table 3. Tasks provided in MISTELS and LTS 2000

     Task                                        MISTELS   LTS
1    Peg transfer                                √         √
2    Ring manipulation (right)                             √
3    Ring manipulation (left)                              √
4    Cannulation (right)                                   √
5    Cannulation (left)                                    √
6    Roeder-type knot                            √         √
7    Application of knot on vertical tube        √         √
8    Extracorporeal knot on horizontal tube      √         √
9    Intracorporeal knot and tension test        √         √
10   Cutting circle                              √         √
Sansregret et al. [13] applied both simulators to a group of medical students or interns, fellows, and attending surgeons with classified experience in laparoscopic surgery (novice, intermediate, competent, expert). Results showed a progressive improvement according to academic level as well as experience, which means that simulators do help in skill acquisition. Another virtual reality simulator is the computer-based system called MIST-VR (Mentice AB, Gothenburg, Sweden), which includes interfaces with two laparoscopic instruments that are passed through a frame containing a tracking device [12]. MIST-VR provides basic skills training (core skills 1 and 2) under module 1, and module 2 for intracorporeal suture training. The benefits of using haptic devices in surgical simulation have been recognized by many parties, including researchers and companies. A few companies have put considerable effort into this particular area, for example Immersion Medical, Mentice, Surgical Science and Reachin [14],[15],[16]. This has led to the growth of research in haptics and simulations. Over the last ten years, Basdogan et al. [14] and the team at the MIT Touch Lab have worked to explore the roles of haptics in Minimally Invasive Surgery Simulation and Training (MISST), as shown in Figure 2. MISST integrates haptic devices to offer a more realistic simulator. They have identified four (4) main areas of haptic roles as follows:
(i) Haptic interfaces – the integration of commercially available haptic devices with a simulation of MIS procedures.
(ii) Haptic rendering – the development of computerized models of surgical instruments to enable collision detection, geometric models of organs to be visualized, force responses to the users, and models that are able to respond to user interaction in real time (a minimal sketch of such a force response is given after this list).
(iii) Haptic recording – the use of haptic devices to measure the properties of materials during recording.
(iv) Haptic playback – the development of techniques for user interaction based on force feedback to provide guidance to the users.
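To give a feel for what haptic rendering involves, the sketch below (not part of the original MISST work) shows the simplest force model used in many haptic simulators: a penalty-based, spring-damper response computed each servo cycle from the penetration of the instrument tip into a virtual tissue surface. The stiffness and damping values, the planar surface model and the 1 kHz loop rate are illustrative assumptions only.

```python
import numpy as np

def penalty_force(tip_pos, tip_vel, surface_point, surface_normal,
                  stiffness=500.0, damping=2.0):
    """Spring-damper (penalty) force for a tool tip penetrating a planar surface.

    Returns a zero vector when the tip is outside the surface; otherwise the
    force pushes the tip back along the surface normal, which is what the
    haptic device would render to the user's hand.
    """
    n = surface_normal / np.linalg.norm(surface_normal)
    penetration = np.dot(surface_point - tip_pos, n)   # > 0 when inside the tissue
    if penetration <= 0.0:
        return np.zeros(3)
    normal_vel = np.dot(tip_vel, n)
    return (stiffness * penetration - damping * normal_vel) * n

# One step of a (hypothetical) 1 kHz haptic servo loop
tip_position = np.array([0.0, 0.0, -0.002])   # 2 mm below the tissue plane z = 0
tip_velocity = np.array([0.0, 0.0, -0.05])    # moving downwards at 5 cm/s
plane_point  = np.array([0.0, 0.0, 0.0])
plane_normal = np.array([0.0, 0.0, 1.0])
print(penalty_force(tip_position, tip_velocity, plane_point, plane_normal))
```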
Fig. 2. The MISST – the handler is connected to Phantom I and Phantom II [12]
MISST divides laparoscopic instrumentation into two categories. One category is for instruments that are long, thin and straight, which are normally used for puncturing, and the other consists of articulated instruments, which are suitable for pulling, clamping, gripping and cutting. A few challenges were faced during the recording and playback, object deformation and rendering processes. The findings of this project state that the integration of haptics into a simulator for MIS is essential, since MIS is a procedure that involves touching, feeling and manipulating organs through instruments. Besides MISST, THUMP, which stands for Two-Handed Universal Master Project, has been developed to improve tele-robotic interfaces [17]. This is due to the loss of some classic hand-eye coordination challenges and other traditional limitations of MIS in tele-robotic systems. THUMP incorporates stereoscopic goggles with two eight-axis da Vinci™ master mechanisms provided by Intuitive Surgical® [17]. All of these systems incorporate haptic elements as tools to provide tactile and force feedback in order to give a more realistic experience to the users. Haptics, according to Fisher et al. [18], relates to what can be touched, as visual relates to what can be seen and auditory to what can be heard, while Oakley et al. [19] define haptics as something related to the sense of touch [18-20]. Haptics can be implemented in four areas: human haptics, computer haptics, multimedia haptics and machine haptics [20],[21],[22]. Research related to haptics has become multidisciplinary, involving these four main areas [21],[22], as shown in Figure 3. Each area focuses on different aspects of haptics technology, but they are interrelated. Human haptics focuses on the human sensory-motor loop and all aspects related to human perception of the sense of touch. Machine haptics focuses on mechanical devices that replace or augment human touch, and involves the design and development of such devices. The mechanical devices need to be brought to life so that they can be felt by the users, and this is the part played by computer haptics, which focuses on the development of algorithms and software to produce a sense of touch for virtual objects through haptic rendering. Multimedia haptics refers to the integration and synchronization of haptic data presentation with other media in multimedia applications that utilize gesture recognition, tactile sensing and force feedback [21],[22]. The integration of these four areas enables the visualization of information that enhances features in visual informatics.
[Figure 3 content: Human haptics – perception, cognition, neurophysiology; Machine haptics – device design, sensors, communications; Computer haptics – modelling, rendering, stability; Multimedia haptics – collaborative environments.]
Fig. 3. Multidisciplinary research in haptics [21]
From the above discussion, it can be concluded that skill acquisition in the medical field is a critical issue. Various studies have been carried out over many years on the best approach to medical training for students and surgeons, which has led to the combination of learning aids with residency systems. Various training aids have been developed and implemented in the area of medical training, specifically in laparoscopy surgery. Even though these training aids do help in improving surgical skills, there are still weaknesses, such as a limited sense of grasp and touch. This gap can be addressed by implementing haptics technology, which offers tactile sensation. Another limitation is that most of the applications developed focus on isolated tasks, and none of them provide a single, complete surgical procedure such as a hernia repair or cyst removal. This gap will be filled by SPLasH through the current study. SPLasH will be developed as a simulation that focuses on a complete single procedure, stressing the important aspects of laparoscopy surgery, with consideration given to blending skill acquisition theory into each part of the simulation.
2 Methodology

For the initial analysis, two methods were used to gather the required information. This information was used to identify the possibility of implementing haptic technology as an additional or alternative tool for medical students at HUKM to acquire the skills required for laparoscopy surgery. All the results obtained were analysed qualitatively.

2.1 Interview with Experts

To support the information gathered through the literature review, interviews were conducted. The instruments comprised open-ended questions focused according to the different levels of respondents. Table 4 shows the details of the interviews conducted.
Table 4. Interview details (respondent, number interviewed, role, instrument, focus)

Experts (1) – Role: leader in surgery. Instrument: Skedul Temu Bual Pakar (STBP). Focus: surgery approach; main elements in laparoscopy surgery; current issues in laparoscopy surgery training tools; surgery department planning.
Senior Fellow (2) – Role: assists the expert in surgery. Instrument: Skedul Temu Bual Jurubedah (STBJ). Focus: current approach in laparoscopy surgery training at HUKM.
Sister (1) – Role: coordinates nurses and equipment. Instrument: Skedul Temu Bual Ketua Jururawat (STBKJ). Focus: surgery preparation; staff involvement; instruments used.
Nurse (3) – Role: prepares the equipment for surgery. Instrument: Nota Jururawat (NJ). Focus: identifying instruments; dummies preparation.
General Staff (1) – Role: prepares the material for lab training. Instrument: Nota Latihan (NL). Focus: material preparation.
The information gathered through the interviews was complemented by the observations made in the operation theatre and the training laboratories. Cross-checking was then carried out to ensure that the data were accurate and reliable.

2.2 Observations

Observations took place after the interviews. The observations were needed to get a real picture of the laparoscopy surgery procedure. The observation made during the laparoscopy surgery training session was also needed to complement the data gathered from the previous method. Table 5 portrays the whole observation process.

Table 5. Observation details
Venue                                      Process observed                                                Instruments for data capture
Operation theatre, HUKM                    Hernia repair (left and right) procedure for 2-month-old boy   Video camera; Photographs; AIDA workstation
Training Lab, Surgery Department, HUKM     Equipment preparation                                           Photographs; Observational notes
Observations were made in both settings: the real laparoscopic surgery procedure in the actual operating room and the training environment. For the real laparoscopic surgery procedure, the observation involved a hernia repair procedure on a 2-month-old baby boy. The procedure was recorded using a Sony video camcorder and an AIDA workstation.
3 Findings

This section describes the results from both methods used, summarized in Table 6. The results were drawn from the interviews and observations based on the instruments used in both methods.

Table 6. Summary of findings

Interviews
• Experts: laparoscopy is taking place as the major approach, especially for paediatrics; laparoscopy is a method that needs only small incisions for instrument insertion; pressure and grasp are the main elements in the laparoscopy approach; current learning tools lack accurate pressure and grasp; HUKM is planning to develop a surgical training centre.
• Senior Fellow: currently, dummies are used as training aids for laparoscopy surgery; the same dummies are used for every session; animal parts are used as human organs; experts' skill is needed to guide students; only a few master's students are allowed to be in the operation theatre.
• Sister: at least 2–3 nurses assist in the pre-operation, operation and post-operation process.
• Nurse: each procedure has its own set of instruments.
• General Staff: animal parts are used to replace human organs (heart, limb, etc.).

Observation
• Operation theatre: a few master's students witness the operation; anaesthesia experts, assistant surgeons and nurses surround the bed beside the experts; no recording is made of the pre-operation, the external view of the operation or the post-operation; the operation used a set of instruments made specially for paediatrics; no trocar was used.
• Training lab: a dummy box is used as a training aid; the dummy box consists of an upper part and a lower part; the upper part is made from rubber and has many pin holes in it; the lower part is dedicated to animal organs, with no stability of the organs.
The next section analyses these results and discusses their significance for the study conducted.
4 Discussion

As a result of the interviews, it was found that no computerized tools are used as aids for laparoscopy surgery training. According to the expert, a paediatric surgeon, the haptic applications for laparoscopy surgery training available on the market today lack accuracy in pressure and grasp. This is supported by findings showing that only a few of the virtual reality simulators available today include haptic feedback as a functionality, and those that do are not acknowledged as natural enough [12]. These two main elements are very important in the laparoscopy procedure. Currently, at the Department of Surgery in HUKM, medical students use dummies as learning aids. The dummy is in the form of a box whose upper part, made from rubber, has a surface that represents the human abdomen, as shown in Figure 4, with space for putting animal parts (limb, heart, etc.) in the lower part, as shown in Figure 5. Each time a student makes a small pin hole in the abdomen, it remains on the dummy. This leads to a loss of the elasticity that should be experienced when a student pushes the instruments at the same spot, as would be the case on a real human abdomen [24]. The loss of elasticity influences the pressure that the students apply to the instruments. Findings by Ström et al. [23] indicate that haptic feedback is important at the early stage of skill acquisition in surgical simulations, while Webster et al. [24] and Cotin et al. [25] state that users must feel the contact forces in haptic surgical simulation.
Fig. 4. Upper part of the dummy
Fig. 5. Lower part of the dummy
Observations showed that during the surgical procedure, only a few master's students were allowed to witness it inside the operation room. This is due to the limited space available around the bed, where other important staff also needed to be present, such as anaesthesia experts, assistant surgeons and assistant nurses. Even though a recording was made once the instruments were inserted into the abdomen, the preceding process of determining where the instruments should be inserted was not recorded at all; that knowledge can only be obtained if the students or interns are present in the operation theatre. Figures 6 to 11 show some snapshots taken during the procedure. Figure 8 clearly shows the need for careful and accurate pressure: because there are organs under the abdomen, using more pressure than needed will harm the intestine, while insufficient pressure will result in an incomplete incision. Figure 10
Fig. 6. Incision made under navel
Fig. 7. Laparoscope inserted through the incision
Fig. 8. Cutter is pressed to create incision for grasper insertion
Fig. 9. Grasper inserted through incision made in Fig. 8
Fig. 10. Grasper holds needle and grasps muscle for suturing
Fig. 11. After the procedure, incision is almost invisible
shows that accurate pressure needs to be applied to both graspers so that appropriate grasping of the muscle and needle eases the suturing process. Figure 11 was recorded after the procedure had been completed, where the incision under
the navel and the incisions made by the instruments did not require any bandage, as they were too small. These are among the reasons why laparoscopy is much preferred as an approach in certain cases, since it is less risky than the open surgery approach. This leads to the need for surgeons to acquire laparoscopy surgery skills in order to practise this surgical approach, and hence the need for training tools that can provide accurate skills exercises. Haptics technology can fulfil these requirements. From the explanation above, it is very important for students to acquire the same skill set demonstrated by the experts under expert guidance, but issues such as time, cost and social constraints have been obstacles to making this a reality. Simulation is one way of mimicking the experts' skills. It can be concluded that surgical simulators, dummy boxes or virtual reality systems allow basic laparoscopy tasks to be exercised repeatedly in a controlled environment, free from the pressure of carrying out the procedure on real patients, although they have been criticized in terms of their realism [3]. The future plan of the Department of Surgery in HUKM to develop a surgical training centre is also a motivating factor for developing a simulation that can be used as an additional or alternative training aid.
[Figure 12 content: the preliminary analysis of SPLasH (document review, interviews, observations), together with theories and methods in simulation development, ID models and the software life cycle, feeds into the design and development of SPLasH (ID model based on Kopta's skill acquisition theory; Guided Learning, Assisted Practices and Self-Practices modules), leading to the SPLasH prototype and development model, followed by user acceptance and usability testing with medical students and lecturers using survey and observation instruments (UTFMS, UTFML, ONFS, ONFL).]
Fig. 12. Simplified Research Process of SPLasH
5 Conclusion

Surgical training has evolved and attracted the attention of researchers in recent years, from various perspectives. Dunkin et al. [16] reported that many companies are currently actively involved in implementing haptics in simulations that can be used in laparoscopy skills training, showing that research in this area is in high demand. They also highlighted issues that need to be covered in the future direction of surgical simulation, for example the need to analyse the fiscal realities of providing training to residents and the adult learning theories that need to be adapted into simulations. Besides that, the competency and proficiency provided by the simulators should be studied carefully in order to provide a surgical simulation that supports skill acquisition as well as reducing the fiscal and social issues in medical training. The next part of the research will use the information gathered in this initial study to propose a design framework for developing a simulation integrating haptic devices as a learning aid for medical students at HUKM. Figure 12 summarizes the whole research process. The simulation will use hernia repair as the procedure through which basic laparoscopy skills are gained. The integration with haptic devices will help students experience the realism of the laparoscopy surgery procedure through the ability of these devices to provide force and tactile feedback. This study will enrich the literature for future research conducted in the area of Visual Informatics, which is growing rapidly.

Acknowledgement. This research is funded by a grant under the Projek Arus Perdana (UKM-AP-ICT-16-2009) entitled 'Multi-Display Interactive Visualization Environment on Haptic Horizontal Surface' at the Institute of Visual Informatics, UKM.
References 1. Wong, J.A., Matsumoto, E.D.: Primer: cognitive motor learning for teaching surgical skill—how are surgical skills taught and assessed? Nature Clinical Practice Urology 5(1), 47–54 (2008) 2. Hamdorf, J.M., Hall, J.C.: Acquiring surgical skills. British Journal of Surgery 87(1), 28– 37 (2000) 3. Najmaldin, A.: Skills training in pediatric minimal access surgery. Journal of Pediatric Surgery 42(2), 284–289 (2007) 4. Panait, L., Akkary, E., et al.: The role of haptic feedback in laparoscopic simulation training. Journal of Surgical Research 156(2), 312–316 (2009) 5. Zaman, H.B., et al.: Visual Informatics: Bridging Research and Practice. In: Badioze Zaman, H., Robinson, P., Petrou, M., Olivier, P., Schröder, H., Shih, T.K. (eds.) IVIC 2009. LNCS, vol. 5857, pp. 868–876. Springer, Heidelberg (2009) 6. Sándor, J., Lengyel, B., et al.: Minimally invasive surgical technologies: Challenges in education and training. Asian Journal of Endoscopic Surgery 3(3), 101–108 (2010) 7. Chamberlain, R.S., Sakpal, S.V.: A comprehensive review of single-incision laparoscopic surgery (SILS) and natural orifice transluminal endoscopic surgery (NOTES) techniques for cholecystectomy. Journal of Gastrointestinal Surgery 13(9), 1733–1740 (2009) 8. Bax, N.M.A.: Ten years of maturation of endoscopic surgery in children. Is the wine good? Journal of Pediatric Surgery 39(2), 146–151 (2004)
9. Laguna, M., Wijkstra, H., et al.: Training in Laparoscopy. Laparoscopic Urologic Surgery in Malignancies, 253–269 (2005) 10. Rosen, J., Hannaford, B., et al.: Markov modeling of minimally invasive surgery based on tool/tissue interaction and force/torque signatures for evaluating surgical skills. IEEE Transactions on Biomedical Engineering 48(5), 579–591 (2001) 11. Lamata, P., Gómez, E.J., et al.: SINERGIA laparoscopic virtual reality simulator: Didactic design and technical development. Computer Methods and Programs in Biomedicine 85(3), 273–283 (2007) 12. Debes, A.J., Aggarwal, R., et al.: A tale of two trainers: virtual reality versus a video trainer for acquisition of basic laparoscopic skills. The American Journal of Surgery 199(6), 840–845 13. Sansregret, A., Fried, G.M., et al.: Choosing the right physical laparoscopic simulator? comparison of LTS2000-ISM60 with MISTELS: validation, correlation, and user satisfaction. The American Journal of Surgery 197(2), 258–265 (2009) 14. Basdogan, C., De, S., et al.: Haptics in minimally invasive surgical simulation and training. IEEE Computer Graphics and Applications 24(2), 56–64 (2004) 15. Basdogan, C., Ho, C.H., et al.: Virtual environments for medical training: Graphical and haptic simulation of laparoscopic common bile duct exploration. IEEE/ASME Transactions on Mechatronics 6(3), 269–285 (2001) 16. Dunkin, B., Adrales, G.L., et al.: Surgical simulation: a current review. Surgical Endoscopy 21(3), 357–366 (2007) 17. Niemeyer, G., Kuchenbecker, K.J., et al.: THUMP: an immersive haptic console for surgical simulation and training. Studies in Health Technology and Informatics, 272–274 (2004) 18. Fisher, B., Fels, S., et al.: Seeing, hearing, and touching: putting it all together. ACM (2004) 19. Oakley, I., McGee, M.R., et al.: Putting the feel in’look and feel. ACM (2000) 20. Eid, M., Andrews, S., et al.: HAMLAT: A HAML-based authoring tool for haptic application development. Haptics: Perception, Devices and Scenarios, 857–866 (2008) 21. Eid, M., Orozco, M., et al.: A guided tour in haptic audio visual environments and applications. International Journal of Advanced Media and Communication 1(3), 265–297 (2007) 22. El Saddik, A.: The potential of haptics technologies. IEEE Instrumentation and Measurement Magazine 10(1), 10 (2007) 23. Ström, P., Hedman, L., et al.: Early exposure to haptic feedback enhances performance in surgical simulator training: a prospective randomized crossover study in surgical residents. Surgical Endoscopy 20(9), 1383–1388 (2006) 24. Webster, R., Haluck, R., et al.: Elastically deformable 3D organs for haptic surgical simulation. Studies in Health Technology and Informatics, 570–572 (2002) 25. Cotin, S., Delingette, H., et al.: A hybrid elastic model for real-time cutting, deformations, and force feedback for surgery training and simulation. The Visual Computer 16(8), 437– 452 (2000)
Detection and Classification of Granulation Tissue in Chronic Ulcers Ahmad Fadzil M. Hani1, Leena Arshad1, Aamir Saeed Malik1, Adawiyah Jamil2, and Felix Yap Boon Bin2 1 Centre for Intelligent Signal & Imaging Research, Department of Electrical & Electronics Engineering, Universiti Teknologi PETRONAS, 31750 Tronoh, Perak, Malaysia 2 Department of Dermatology, General Hospital, Kuala Lumpur, 50586 Kuala Lumpur, Malaysia
Abstract. The ability to measure wound healing objectively is important for effective wound management. Describing wound tissues in terms of the percentage of each tissue colour is an approved clinical method of wound assessment. Wound healing is indicated by the growth of red granulation tissue, which is rich in small blood capillaries that contain haemoglobin pigment reflecting the red colour of the tissue. A novel approach based on utilizing haemoglobin pigment content in chronic ulcers as an image marker to detect the growth of granulation tissue is investigated in this study. Independent Component Analysis is employed to convert colour images of chronic ulcers into images due to haemoglobin pigment only. K-means clustering is implemented to classify and segment regions of granulation tissue from the extracted haemoglobin images. The results obtained indicate an overall accuracy of 96.88% for the algorithm when compared to manual segmentation. Keywords: Chronic Wounds, Ulcers, Granulation Tissue, Haemoglobin, Independent Component Analysis, Visual Informatics.
1 Introduction

1.1 Chronic Ulcers and Healing Assessment

In pathology, wounds that fail to follow a normal specified course of healing are categorized as chronic wounds [1]. Ulcers are one common type of chronic wound. Ulcers are generally divided into three main categories based on their etiological causes: vascular, pressure, and diabetic ulcers [2]. Ulcers are most commonly found on the lower extremity below the knee and affect around 1% of the adult population and 3.6% of people older than 65 years [3]. Chronic ulcers present a major problem in dermatology and a huge economic burden, especially in western countries. Chronic wounds affect three million to six million patients in the United States, and treating these wounds costs an estimated $5 billion to $10 billion each year [2]. The annual cost associated with the care of ulcers is approximated to be £400 million in the
United Kingdom [4]. Non-healing ulcers can cause patients to suffer severe pain and discomfort, and may subject patients to the risk of limb amputation. Figure 1 shows two examples of chronic leg ulcers.
Fig. 1. Chronic Leg Ulcers* *Images acquired at Dermatology, Hospital Kuala Lumpur
Different types of tissue exist on the ulcer surface as it progresses through the healing process: black necrosis, yellow slough, red granulation and pink epithelial tissue. At any one time, all four tissue types can be present on the ulcer surface. Recognizing and measuring the amount of each tissue is an approved method of wound assessment and of understanding wound progression. In daily clinical practice, physicians inspect the healing status of ulcers by describing the tissues inside the ulcer in terms of the percentage of each tissue colour, based on visual inspection utilizing the Black-Yellow-Red scheme or the Wound Healing Continuum [5], [6]. This method is widely used in clinical settings, but has the disadvantage of being simplistic, subjective and, more importantly, difficult to qualify and quantify. Moreover, chronic wounds heal gradually, which makes the detection of small changes by visual inspection challenging. Hence, imaging techniques utilizing colour digital images have been developed to provide precise, objective and reliable data that aid physicians in evaluating the healing status of ulcers. Some techniques developed in the area of wound tissue classification and segmentation are based on quantifying the colour content of images using conventional colour models such as RGB and HSI [7],[8]. Other methods utilize the information obtained from the RGB colour histogram to develop clustering techniques for automatic classification and segmentation of the different types of tissue within the wound site [9-12]. Most recently, as part of the ESCALE project devoted to the design of a complete 3D and colour wound assessment tool, an unsupervised wound tissue segmentation method was proposed [13], [14]. The method utilizes three selected unsupervised segmentation methods, J-SEG, Mean Shift and CSC, to segment colour wound images into different regions. It then extracts both colour and texture descriptors from the coloured images as inputs to an SVM-based classifier for automatic classification and labelling of these regions. The work developed in the field of wound assessment thus far is based on processing and analyzing colour content as the major component of digital imaging. However, the interpretation of colour content in an image is always compromised by unavoidable differences in acquisition conditions, such as the camera and flash used and the lighting in the room. Therefore, an alternative
approach based on the optical characteristics of wound components is explored in this study.

1.2 Hypothesis and Research Objective

One of the major changes during wound healing is the colour of the tissues, which results from the human visual perception of the light reflected from the skin. Human skin is a highly heterogeneous medium with a multi-layered structure. Most of the incident light penetrates into the skin and follows a complex path until it exits back out of the skin or is attenuated by skin pigments such as melanin and haemoglobin [15], [16]. Figure 2 shows the absorbance spectra of the main pigments in human skin: melanin, haemoglobin (deoxy-haemoglobin and oxy-haemoglobin) and bilirubin [16].
Fig. 2. Absorption Spectra of Main Skin Pigments, Melanin, Deoxy-haemoglobin (Hb), Oxyhaemoglobin (HbO2), and Bilirubin* * Reproduced from R.R. Anderson and J. A. Parrish (1981)
The first indication of ulcer healing is the growth of new healthy red granulation tissue on the ulcer surface. Granulation tissue appears red in colour due to the presence of small blood vessels that contain haemoglobin pigment. Studies show that haemoglobin has certain optical characteristics that can be detected from colour images and used to show its content within human skin [17-21]. Haemoglobin pigment exhibits its highest absorbance at wavelengths around 420-430 nm, with almost no absorbance (near-total reflectance) at wavelengths beyond 620 nm. The portion of the electromagnetic spectrum that is visible to the human eye extends from violet at about 380 nm to red at about 750 nm, as shown in Figure 3. Different colour components exhibit different reflectance ranges in the visible spectrum. The red colour component lies at wavelengths between 620-740 nm of the visible light. This explains why granulation tissue that contains haemoglobin pigment appears red in colour when viewed under visible light.
Fig. 3. Reflectance Spectrum of the Visible Light
Because haemoglobin pigment causes the red colour of granulation tissue, it is hypothesized that image regions due to haemoglobin pigment can be extracted from colour images of chronic ulcers. The extracted image regions due to haemoglobin pigment represent areas of haemoglobin distribution over the ulcer surface, which in turn indicate regions of granulation tissue. Hence, the objective of this research work is to study haemoglobin pigment content in chronic ulcers as an image marker to detect newly growing healthy granulation tissue.
2 Approach

Independent Component Analysis (ICA) is a technique that extracts the original signals from mixtures of many independent sources without a priori knowledge of the sources or of the mixing process [22], [23]. In digital imaging, colour is produced by the combination of three different spectral bands: Red, Green and Blue (RGB). In this work, colour RGB images of chronic ulcers are considered the observed mixtures from which the independent source corresponding to haemoglobin pigment content is extracted. The focus of this work is particularly on extracting image regions due to haemoglobin pigment, which represent detected areas of granulation tissue.

2.1 Colour Images of Chronic Ulcers

Colour images of chronic ulcers that contain a mixture of each type of tissue were acquired at Hospital Kuala Lumpur, Malaysia. This is crucial to the research work, as it ensures working on ulcer images taken under controlled acquisition conditions. Before each data acquisition session, the ulcers were examined and cleaned by the nurses at the hospital. Images of the ulcers were then taken before the new dressing was applied. A total of 110 ulcer images were acquired from 69 patients. A Digital Single Lens Reflex (DSLR) camera with a resolution of 12.3 megapixels was used to acquire the colour images of chronic ulcers. A flash light was used to provide adequate, reproducible lighting for image acquisition. To avoid specular reflections and shadows, a diffuser dome was mounted on the flash light. Furthermore, a small reference sticker of size 9x13 mm was placed next to the ulcer to provide a size reference. The images that provide the best view of the ulcer wounds under optimum lighting conditions are used in the analysis.

2.2 Detection of Granulation Tissue

Each RGB colour ulcer image comprises three grey-level images representing the spectral bands of the Red, Green and Blue channels. For each ulcer image, these bands
Detection and Classification of Granulation Tissue in Chronic Ulcers
143
are used to create row vectors of data comprising an observation dataset, which represents the mixtures upon which Independent Component Analysis (ICA) is implemented. In this work, ICA is applied using the FastICA algorithm developed by Hyvärinen and Oja [22]. FastICA is based on a fixed-point iteration that uses maximization of non-gaussianity as a measure of independence to estimate the independent components. Figure 4 is a flow chart of the algorithm applied in this work. The dataset is first centred on zero by subtracting the mean value from each spectral band. Data whitening is then applied by employing Principal Component Analysis (PCA) to transform the dataset linearly so that its components are uncorrelated and their variances equal unity. These steps are performed to simplify the ICA algorithm and reduce the number of parameters to be estimated [22]. ICA is then implemented to extract grey-level images representing areas of haemoglobin distribution. The algorithm is developed using MATLAB version 9.0 and implemented on the images in JPEG file format.
Fig. 4. Flow Chart of the Algorithm Implemented
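The pipeline in Figure 4 can be sketched in a few lines. The original work was implemented in MATLAB; the Python fragment below is only an illustrative equivalent using scikit-learn's FastICA, in which centring and whitening are handled internally by the whiten option, and the mapping of the extracted components to the haemoglobin image is assumed to be chosen by visual inspection, as in the paper.

```python
import numpy as np
from sklearn.decomposition import FastICA

def extract_components(rgb_image):
    """Treat the R, G, B bands of an ulcer image as three observed mixtures
    and unmix them into three independent source images."""
    h, w, _ = rgb_image.shape
    # Each spectral band becomes one row vector of the observation dataset
    observations = rgb_image.reshape(-1, 3).astype(np.float64)   # (pixels, 3)
    ica = FastICA(n_components=3, whiten="unit-variance", random_state=0)
    sources = ica.fit_transform(observations)                    # (pixels, 3)
    # Reshape each independent component back into a grey-level image
    return [sources[:, i].reshape(h, w) for i in range(3)]

# Usage: load an ulcer photograph and inspect the three candidate source images;
# the one showing the haemoglobin (granulation) distribution is kept.
# from imageio.v3 import imread
# components = extract_components(imread("ulcer.jpg"))
```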
2.3 Segmentation of Granulation Tissue

Segmentation is an image processing technique that is used to subdivide an image into its constituent regions [24]. Data clustering is an unsupervised learning approach in which similar data items are grouped together into clusters. K-means clustering is a hard clustering method which aims to partition N observations into K clusters in which each observation belongs to the cluster with the nearest mean [25]. The number of
clusters K is an important input parameter to the algorithm, and an inappropriate choice of K may yield poor results. Hence, it is important to run diagnostic checks to determine the number of clusters in the dataset. K-means clustering is applied in this study to classify the extracted haemoglobin images into distinct clusters based on the Euclidean distance of each pixel value to the cluster centre. The algorithm is applied iteratively with the number of clusters (K) ranging from K=2 to K=8. The parameter K is initially chosen as K=2 because the algorithm is expected to segment at least two distinct regions: the granulation tissue and the rest of the ulcer. Some extracted haemoglobin images contain three distinct intensity-value regions, and for segmenting the granulation tissue in these images K is initially chosen as K=3 to avoid misclassifying other wound regions as part of the granulation tissue. The classified image is then converted into a binary image based on the intensity values of the clustered granulation tissue. The results obtained are shown and discussed in the next section.
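As an illustration of this step (again a Python sketch rather than the paper's MATLAB implementation), the fragment below clusters the pixel intensities of an extracted haemoglobin image with k-means and then binarizes the result by keeping the cluster whose mean intensity is assumed to correspond to the granulation tissue; which cluster that is would in practice be confirmed visually.

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_granulation(haemoglobin_img, k=3):
    """Cluster a grey-level haemoglobin image into k regions and return a
    binary mask of the cluster taken to represent granulation tissue."""
    pixels = haemoglobin_img.reshape(-1, 1).astype(np.float64)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    labels = km.labels_.reshape(haemoglobin_img.shape)
    # Assumption: granulation tissue corresponds to the brightest cluster
    # in the extracted haemoglobin image.
    granulation_label = int(np.argmax(km.cluster_centers_.ravel()))
    return labels == granulation_label

# Example with synthetic data in place of a real extracted haemoglobin image
img = np.random.rand(64, 64)
mask = segment_granulation(img, k=3)
print(mask.sum(), "pixels classified as granulation tissue")
```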
3 Results and Analysis

In clinical practice, ulcers are assessed based on their appearance and tissue condition. Physicians evaluate the type and condition of the granulation tissue according to its colour, which is determined by the concentration of haemoglobin pigment. Physicians categorize three main types of granulation tissue: bright beefy red, dusky pink and pale pink. The type of granulation tissue indicates the level of ulcer healing, from newly grown bright beefy red to healed pale pink. The acquired ulcer images were assessed by the dermatologists at Hospital Kuala Lumpur to determine the type and amount of granulation tissue existing on the surface of these ulcers.

3.1 Detection of Granulation Tissue

The proposed algorithm is applied to the colour images of ulcers to extract images that represent the distribution of haemoglobin pigment. The extracted grey-level images indicate the distribution of haemoglobin pigment, which reflects areas of granulation tissue on the surface of these ulcers. Figure 5 shows the results obtained: Figure 5(a) shows an extracted image representing bright beefy red granulation, Figure 5(b) shows an extracted image representing dusky pink granulation, and Figure 5(c) shows an extracted image representing pale pink granulation. In the extracted haemoglobin images, the regions that appear with a distinctive range of intensity values represent areas of haemoglobin pigment distribution, highlighting areas of granulation tissue. This shows that the novel approach upon which the granulation tissue detection algorithm is developed successfully detects areas of granulation tissue on the ulcer surface regardless of its type.
Fig. 5. Extracted Images Representing Granulation Tissue Regions Detected: (a) bright beefy red, (b) dusky pink, (c) pale pink granulation
3.2 Segmentation of Granulation Tissue Clustering based segmentation is employed to segment regions of granulation tissue from extracted images due to haemoglobin pigment. K-means clustering is utilized to classify the extracted images due to haemoglobin pigment into clusters of different regions. Figure 6 shows the results obtained from classifying an extracted grey level haemoglobin image into clusters.
Fig. 6. Classified Image obtained from Haemoglobin Image
The classified image is then converted into a binary image based on the intensity values of the clustered granulation tissue region. Figure 7 shows the results of converting the classified image in figure 6 into a binary image with segmented regions of granulation tissue.
Fig. 7. Binary Image obtained from Classified Image
3.3 Validation of Results

Twenty-two colour images of chronic ulcers, covering different ulcer wounds with varying severity and healing status, are selected from the collected images to validate the developed algorithm. The segmentation algorithm using k-means clustering is applied iteratively on each extracted haemoglobin image with the parameter K ranging from K=2 or K=3 to K=8. The algorithm performance is validated by comparing the segmented granulation regions obtained with the developed algorithm for each number of clusters (K) against the ones obtained from manual segmentation. The manual segmentation is performed by the operator based on the dermatologist's assessment of the granulation tissue type and amount in each ulcer image. The difference between the algorithm segmentation and the manual segmentation is obtained by calculating the number of mis-segmented pixels as follows:
Error = (Σ_{i=1}^{n} x_i) / N

where n = number of wrongly segmented pixels and N = image size. The accuracy of the algorithm performance is calculated as follows:

Accuracy = |1 − Error| × 100

Table 1 below shows the highest accuracy achieved by the developed segmentation algorithm, with the corresponding number of clusters (K), compared with the results obtained from the manual segmentation for each ulcer image. The table also shows the average accuracy achieved, along with the corresponding standard deviation, for each ulcer image when the algorithm is applied iteratively over the whole range of K (from K=2 or K=3 to K=8).

Table 1. Accuracy of the Proposed Segmentation Algorithm Compared with Manual Segmentation
Ulcer      Highest Accuracy   K   Average Accuracy   Standard Deviation
Ulcer_1    93.76              4   92.18              1.67
Ulcer_2    92.99              5   91.87              1.38
Ulcer_3    95.86              6   91.43              8.22
Ulcer_4    97.50              6   95.00              3.91
Ulcer_5    95.42              4   88.94              4.18
Ulcer_6    94.20              3   91.95              2.33
Ulcer_7    98.41              4   97.31              0.96
Ulcer_8    97.33              4   95.31              3.13
Ulcer_9    98.00              5   96.94              1.63
Ulcer_10   98.05              4   96.45              1.08
Ulcer_11   95.62              5   92.58              3.16
Ulcer_12   98.01              4   96.60              1.39
Ulcer_13   97.78              5   95.01              6.21
Ulcer_14   94.05              3   91.55              4.58
Ulcer_15   97.17              3   90.01              6.52
Ulcer_16   96.08              3   94.08              1.65
Ulcer_17   98.79              8   91.27              14.57
Ulcer_18   99.42              7   95.79              5.74
Ulcer_19   97.71              5   96.59              1.12
Ulcer_20   98.84              3   96.02              2.34
Ulcer_21   96.99              4   94.58              3.83
Ulcer_22   99.40              8   97.74              2.47
Overall    96.88                  93.87              3.89
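For readers who want to reproduce the validation metric, the short sketch below (an illustrative Python fragment, not the authors' MATLAB code) computes the error and accuracy figures reported in Table 1 from a binary algorithm mask and a binary manual-segmentation mask of the same size.

```python
import numpy as np

def segmentation_accuracy(algorithm_mask, manual_mask):
    """Error = (number of wrongly segmented pixels) / (image size);
    Accuracy = |1 - Error| * 100, as defined above."""
    algorithm_mask = np.asarray(algorithm_mask, dtype=bool)
    manual_mask = np.asarray(manual_mask, dtype=bool)
    wrongly_segmented = np.sum(algorithm_mask != manual_mask)
    error = wrongly_segmented / manual_mask.size
    return abs(1.0 - error) * 100.0

# Example with two hypothetical 64x64 masks
manual = np.zeros((64, 64), dtype=bool)
manual[16:48, 16:48] = True          # manually traced granulation region
algo = manual.copy()
algo[16:20, 16:48] = False           # algorithm misses a thin strip
print(round(segmentation_accuracy(algo, manual), 2))
```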
From Table 1, it is noted that the developed algorithm performs fairly well, with an overall highest accuracy of 96.88% and an average accuracy of 93.87% with a corresponding standard deviation of 3.89. The algorithm successfully segments regions of granulation tissue with K mostly ranging between K=3 and K=4. The algorithm compares particularly well with manual segmentation for ulcers which
contain regions of granulation tissue with well-defined edges, such as ulcers 7, 12 and 20. Such ulcers are shown in Figure 8. However, some ulcer wounds are filled with exudate and infected blood, such as ulcers 17 and 22 shown in Figure 9. In these ulcers, it is difficult to classify the little granulation tissue on the ulcer edge with a small number of clusters. Hence, the best performance for these ulcers is achieved when K is selected to be a higher integer such as K=7 or K=8.
Fig. 8. Ulcers Containing Single Granulation Region with Well-Defined Boundary
Fig. 9. Ulcers Filled with Exudates and Infected Blood
Fig. 10. Ulcers Containing Granulation Tissue Mixed with Slough
On the other hand, the algorithm does not compare well with manual segmentation for ulcers which contain several scattered regions of granulation tissue, such as ulcers 1, 2, 6 and 14. This is because scattered granulation regions that are normally
mixed with slough or exudate are hard to detect and segment manually. Examples of these ulcers are shown in Figure 10. It is noted that the number of clusters (K) is an important parameter that has to be determined in order to achieve a successful segmentation of granulation tissue with high accuracy compared to the manual segmentation. The results obtained thus far are to be further analyzed to enhance the developed algorithm. The main goal is to achieve fully automated segmentation with high accuracy and reliable, reproducible results. Prior knowledge of the number and types of tissue contained within the ulcer may be required as an input to the system to determine the optimum number of clusters needed to achieve accurate results. Further review is required to relate the findings obtained so far to the current techniques available in image segmentation, in order to make the system automated and more robust.
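One standard diagnostic for choosing K automatically, which the paper identifies as future work rather than something it implements, is the silhouette score: the sketch below scans K over the same range used in the study and keeps the value with the highest mean silhouette, computed here on a subsample of pixels for speed.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def choose_k(haemoglobin_img, k_range=range(2, 9), sample=2000, seed=0):
    """Return the K in k_range with the best mean silhouette score."""
    rng = np.random.default_rng(seed)
    pixels = haemoglobin_img.reshape(-1, 1).astype(np.float64)
    idx = rng.choice(len(pixels), size=min(sample, len(pixels)), replace=False)
    subset = pixels[idx]
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(subset)
        scores[k] = silhouette_score(subset, labels)
    return max(scores, key=scores.get), scores

# Example on a synthetic stand-in for an extracted haemoglobin image
img = np.concatenate([np.full(2000, 0.2), np.full(2000, 0.8)])
img += np.random.default_rng(0).normal(0, 0.02, img.size)
best_k, all_scores = choose_k(img.reshape(40, 100))
print("best K:", best_k)
```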
4 Conclusion

A novel approach for assessing the healing status of chronic ulcers by detecting healthy granulation tissue is investigated in this study. Most of the work developed in the field of wound assessment utilizes colour in images for image analysis. However, the interpretation of colour is compromised by unavoidable differences in acquisition conditions and settings. In chronic ulcers, granulation tissue appears red in colour due to the presence of blood vessels that contain haemoglobin pigment. Hence, a new approach based on analysis of the optical characteristics of haemoglobin pigment for the detection of granulation tissue is investigated. Independent Component Analysis is employed to convert colour images of chronic ulcers into images due to haemoglobin pigment only. K-means clustering is implemented to classify and segment regions of granulation tissue from the extracted haemoglobin images. The results obtained indicate that the proposed algorithm detects and segments regions of granulation tissue successfully regardless of their type and amount. The algorithm performs with an overall accuracy of 96.88% when compared to manual segmentation. This introduces a new, objective, non-invasive technique to assess the healing status of chronic wounds in a more precise and reliable way. Acknowledgment. This is a collaborative work with the Department of Dermatology, General Hospital Kuala Lumpur, Malaysia. The authors would like to thank the hospital staff for assisting them in acquiring coloured images of chronic ulcers.
References [1] Keast, D., Orsted, H.: The Basic Principles of Wound Healing (2002) [2] Werdin, F., Tennenhaus, M., Schaller, H.-E., Rennekampff, H.-O.: Evidence-based Management Strategies for Treatment of Chronic Wounds (June 4, 2009) [3] London, N.J.M., Donnelly, R.: ABC of Arterial and Venous Disease: Ulcerated Lower Limb. BMJ 320, 1589–1591 (2000) [4] Margolis, D.J., Bilker, W., Santanna, J., Baumgarten, M.: Venous Leg Ulcer: Incidence and Prevalence in the Elderly. J. Am. Acad. Dermatol. 46(3), 381–386 (2002) [5] Goldman, R.J., Salcid, R.: More than One Way to Measure a Wound: An Overview of Tools and Techniques. Advances in Skin and Wound Care 15(5) (2002)
[6] Gray, D., White, R., Cooper, P., Kingsley, A.: The Wound Healing Continuum- An Aid To Clinical Decision Making And Clinical Audit (2004) [7] Herbin, M., Venot, A., Devaux, J.Y., Piette, C.: Colour Quantitation Through Image Processing in Dermatology. IEEE Transactions on Medical Imaging 9(3) (September 1990) [8] Herbin, M., Bon, F.X., Venot, A., Jeanlouis, F., Dubertret, M.L., Dubertret, L., Strauch, G.: Assessment of Healing Kinetics Through True Colour Image Processing. IEEE Transactions on Medical Imaging 12(1) ( March 1993) [9] Mekkes, J.R., Westerhof, W.: Image Processing in the Study of Wound Healing. Clinics in Dermatology 13(4), 401–407 (1995) [10] Berris, W., Sangwine, S.J.: A Colour Histogram Clustering Technique for Tissue Analysis of Healing Skin Wounds. In: IPA 1997, July 15-17 (1997) [11] Zheng, H., Bradley, L., Patterson, D., Galushka, M., Winder, J.: New Protocol for Leg Ulcer Tissue Classification from Colour Images. In: Proceedings of the 26th Annual International Conference of the IEEE EMBS San Francisco, CA, USA, September 1-5 (2004) [12] Galushka, M., Zheng, H., Patterson, D., Bradley, L.: Case-Based Tissue classification for Monitoring Leg Ulcer Healing. In: Proceedings of the 18th IEEE Symposium on Computer-Based Medical Systems (CBMS 2005), pp. 1063–7125 (2005) [13] Wannous, H., Treuillet, S., Lucas, Y.: Supervised Tissue Classification from Colour Images for a Complete Wound Assessment Tool. In: Proceedings of the 29th Annual International Conference of the IEEE EMBS, Lyon, France, August 23-26 (2007) [14] Wannous, H., Lucas, Y., Treuillet, S., Albouy, B.: A Complete 3D Assessment Tool for Accurate Tissue Classification and Measurement. In: 15th International Conference on Image Processing, ICIP 2008, pp. 2928–2931 (2008) [15] Donner, C., Weyrich, T., d’Eon, E., Ramamoorthi, R., Rusinkiewicz, S.: A Layered, Heterogeneous Reflectance Model for Acquiring and Rendering Human Skin (2009) [16] Anderson, R.R., Parrish, J.A.: The Optics of Human Skin. Journal of Investigative Dermatology 77, 13–19 (1981) [17] Cotton, S.D., Claridge, E.: Developing a Predictive Model of Human Skin Colouring. In: Proceedings of SPIE Medical Imaging, vol. 2708, pp. 814–825 (1996) [18] Tsumura, N., Haneishi, H., Miyake, Y.: Independent Component Analysis of Skin Colour Image. Journal of Society of America 16(9), 2169–2176 (1999) [19] Cotton, S., Claridge, E., Hall, P.: A Skin Imaging Method Based on a Colour Formation Model and its Application to the Diagnosis of Pigmented Skin Lesions. In: Proceedings of Medical Image Understanding and Analysis, pp. 49–52. BMVA, Oxford (1999) [20] Claridge, E., Cotton, S.D., Hall, P., Moncrieff, M.: From Colour to Tissue Histology: Physics Based Interpretation of Images of Pigmented Skin Lesions. In: Dohi, T., Kikinis, R. (eds.) MICCAI 2002. LNCS, vol. 2488, pp. 730–738. Springer, Heidelberg (2002) [21] Tsumura, N., Haneishi, H., Miyake, Y.: Independent Component Analysis of Skin Color Images. In: The Sixth Color Imaging Conference: Color Science, Systems, and Applications (1999) [22] Hyvärinen, A., Oja, E.: Independent Component Analysis: Algorithms and Applications. Neural Networks 13(4-5), 411–430 (2000) [23] Langlois, D., Chartier, S., Gosselin, D.: An Introduction to Independent Component Analysis: InfoMax and FastICA algorithms. Tutorials in Quantitative Methods for Psychology 6(1), 31–38 (2010) [24] Gonzalez, R., Woods, R.E.: Digital Image Processing, 2nd edn. 
Prentice Hall (2006) [25] Fung, G.: A Comprehensive Overview of Basic Clustering Algorithms (June 2001)
New Color Image Histogram-Based Detectors Taha H. Rassem and Bee Ee Khoo Universiti Sains Malaysia, School of Electronic and Electrical Engineering 14300, Seberang Perai Selatan, Nibong Tebal, Penang, Malaysia [email protected], [email protected]
Abstract. Detecting interest points in images, from which features are then extracted, is an important step in many computer vision applications. For good performance, these points have to be robust against the transformations an image may undergo, such as viewpoint change, scaling, rotation and illumination change. Many of the proposed interest point detectors measure pixel-wise differences in image intensity or image color. Lee and Chen [1] used a histogram representation instead of a pixel representation to detect interest points, based on the gradient histogram and the RGB color histogram. In this work, histogram representations of different color models, namely the Ohta, HSV, Opponent and Transformed color histograms, are implemented and used in the proposed interest point detectors. The detectors are evaluated by measuring the repeatability and matching score of the detected points in an image matching task and the classification accuracy in an image classification task. Compared with intensity pixel detectors and Lee's histogram detectors, the proposed histogram detectors perform better under some image conditions, such as illumination change and blur. Keywords: Interest point detector, histogram, image classification, image matching, Visual Informatics.
1 Introduction To compare two similar images, it can be enough to detect some edges or lines in each image and compare them; this is an easy task. Unfortunately, images are not naturally this well behaved: they may be exposed to circumstances or transformations that make the comparison task difficult. This has prompted computer vision researchers to search for strong points in the image that are invariant under the scaling, rotation, illumination change, compression and blur that may occur. The point detection step is an important step in many computer vision applications such as matching, classification, retrieval and recognition. The different types of point detectors are shown in Fig. 1.
Fig. 1. Types of point detectors. 1(a) Original image 1(b) Random point detector 1(c) Interest point detector 1(d) Dense point detector 1(e) Dense interest point detector.
A random point detector selects regions randomly, without any criterion; it is easy to implement but a poor detector, whereas an interest point detector selects regions with high information content [2]. Because the density of features often plays a more important role than their interestingness, especially in recognition and classification tasks, the dense sampling detector was suggested [3]. As shown in Fig. 1(d), it covers all locations inside the image with regular patches at different scales [4]. The dense interest point detector was suggested by Tuytelaars [2]: it samples all patches like the dense sampling detector and then selects the interest points among them, as shown in Fig. 1(e). Interest point detectors exploit the variation of an image representation such as the pixel intensity or pixel color value. Most interest point detectors, such as Harris and Hessian, measure pixel-wise differences of image intensity or color [5, 6], and can therefore be called pixel-based detectors. Harris, Hessian [5, 7-9], maximally stable extremal regions (MSER) [10], the intensity-extrema-based region detector (IBR) [11], the edge-based region detector (EBR) [12], the salient region detector, and the Laplacian of Gaussian (LoG) and difference of Gaussian (DoG) detectors are all pixel-based interest point detectors. A complete survey of local feature detectors can be found in [8, 9, 13]. Some of these detectors have been modified to increase their invariance; the multi-scale Harris and Hessian affine detectors are among the most reliable. Briefly, the Harris detector first computes the first derivatives within a moving Gaussian window, then builds the second moment matrix SC from these derivatives, and finally uses the trace and determinant of this matrix to identify the interest points [14]:

SC(\mathbf{x}, \sigma, \sigma') = G(\mathbf{x}, \sigma') * \begin{bmatrix} I_x^2(\mathbf{x}, \sigma) & I_x I_y(\mathbf{x}, \sigma) \\ I_x I_y(\mathbf{x}, \sigma) & I_y^2(\mathbf{x}, \sigma) \end{bmatrix}        (1)
where σ' is the integration scale of the Gaussian window and σ is the differentiation scale of the Gaussian kernels used to compute the local image derivatives. The Hessian detector uses second derivatives instead; it has a strong response, especially on regions and blobs [15]:

HE(\mathbf{x}, \sigma) = \begin{bmatrix} h_{xx} & h_{xy} \\ h_{xy} & h_{yy} \end{bmatrix} = \begin{bmatrix} I_{xx}(\mathbf{x}, \sigma) & I_{xy}(\mathbf{x}, \sigma) \\ I_{xy}(\mathbf{x}, \sigma) & I_{yy}(\mathbf{x}, \sigma) \end{bmatrix}        (2)
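As an illustration, the pixel-based responses of Eqs. (1)–(2) can be computed along the following lines. This is a minimal sketch using NumPy and SciPy; the scales and the cornerness constant k (the usual Harris choice) are illustrative assumptions, not the settings used in this paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(img, sigma_d=1.0, sigma_i=2.0, k=0.04):
    """Harris response from the second-moment matrix SC of Eq. (1)."""
    # First derivatives at the differentiation scale sigma_d
    Ix = gaussian_filter(img, sigma_d, order=(0, 1))
    Iy = gaussian_filter(img, sigma_d, order=(1, 0))
    # Entries of SC, smoothed by the integration-scale Gaussian G(x, sigma_i)
    Sxx = gaussian_filter(Ix * Ix, sigma_i)
    Sxy = gaussian_filter(Ix * Iy, sigma_i)
    Syy = gaussian_filter(Iy * Iy, sigma_i)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2   # large positive values indicate corners

def hessian_response(img, sigma=2.0):
    """Determinant of the Hessian matrix HE of Eq. (2); responds to blobs."""
    Ixx = gaussian_filter(img, sigma, order=(0, 2))
    Iyy = gaussian_filter(img, sigma, order=(2, 0))
    Ixy = gaussian_filter(img, sigma, order=(1, 1))
    return Ixx * Iyy - Ixy ** 2
```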
Researchers have also tried to use image representations other than intensity in the detection process. Lee and Chen [1] suggested using a histogram representation instead of the pixel intensity to build a new detector. This detector can be
called a histogram-based detector. A discrete quantity b(x, y) with L levels (bins) can be derived from either the intensity or the color of the image pixel at (x, y). At each pixel location (x_0, y_0), the value of the k-th bin of a weighted histogram h_{x_0,y_0} = \{h_{x_0,y_0}(k)\}_{k=1,\dots,L} is computed as

h_{x_0,y_0}(k) = Z^{-1} \sum_{(x,y)\in\Omega_{x_0,y_0}} w(x, y)\, \delta[b(x, y) - k]        (3)

where w(x, y) is a Gaussian weighting function and \delta(\cdot) is the indicator function. The set \Omega_{x_0,y_0} defines a neighborhood around (x_0, y_0), and Z is a normalization term ensuring that \sum_{k=1}^{L} h_{x_0,y_0}(k) = 1. To find the interest points, the Bhattacharyya coefficient is used to measure the similarity between the histogram at (x_0, y_0) and that at a shifted point (x_0 + \Delta x, y_0 + \Delta y) [16]; all calculation details can be found in [1]. Finally, the response function R of the Hessian matrix, used to decide whether a pixel is an interest point, is defined as

R = \det(M_{x_0,y_0})        (4)

where the histogram-based Hessian matrix M_{x_0,y_0} is defined (up to a constant factor; see [1] for the full derivation) by

M_{x_0,y_0} = \sum_{k=1}^{L} \frac{1}{h_{x_0,y_0}(k)}\, \nabla h_{x_0,y_0}(k)\, \nabla h_{x_0,y_0}(k)^{T}        (5)
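The weighted local histogram of Eq. (3) and the Bhattacharyya similarity used above can be sketched as follows. This is a minimal illustration: the square window, its radius and the Gaussian width are assumptions, bin indices are taken to be 0-based, and the full interest-point response of Eqs. (4)–(5) follows the derivation in [1].

```python
import numpy as np

def local_histogram(bins_img, x0, y0, L, radius=8, sigma=4.0):
    """Weighted histogram h_{x0,y0}(k) of Eq. (3), bins_img holds b(x, y) in 0..L-1."""
    h = np.zeros(L)
    for y in range(y0 - radius, y0 + radius + 1):
        for x in range(x0 - radius, x0 + radius + 1):
            if 0 <= y < bins_img.shape[0] and 0 <= x < bins_img.shape[1]:
                # Gaussian weighting w(x, y); the indicator selects bin b(x, y)
                w = np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))
                h[bins_img[y, x]] += w
    return h / h.sum()   # Z normalises the histogram so that sum_k h(k) = 1

def bhattacharyya(h1, h2):
    """Similarity between two normalised histograms."""
    return np.sum(np.sqrt(h1 * h2))
```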
Any low-level feature can be used to construct the histogram, such as the intensity gradient or the color values. Lee and Chen used the RGB color histogram to construct an RGB histogram-based interest point detector and an intensity-gradient histogram-based interest point detector. They quantized each RGB color channel into 8 bins, giving a 512-bin histogram. At each point, the discrete quantity is computed with the quantization function

b(x, y) = \lfloor R(x, y)/32 \rfloor \cdot 8^2 + \lfloor G(x, y)/32 \rfloor \cdot 8 + \lfloor B(x, y)/32 \rfloor + 1
where R(x, y), G(x, y) and B(x, y) are the RGB values of pixel (x, y). For the intensity gradient histogram, they quantized the gradient orientation into 8 bins, each covering a 45-degree angle, and the gradient magnitude into 8 bins, so a 64-bin oriented gradient histogram is used in [1]. The RGB and gradient histogram-based interest point detectors performed well under some conditions, as reported in [1]. In this paper, new color histogram representations, namely the Ohta, Transformed, Opponent and HSV color histograms, are used to build interest point detectors. The same process as Lee and Chen's histogram process is applied after converting the image into the different color models. Thanks to the good properties and advantages of some of these color models, and compared with intensity pixel detectors and Lee's histogram detectors, our color histogram detectors perform better under some image conditions such as illumination change and blur. The new detectors are evaluated by measuring their repeatability and matching scores (distinctiveness) in the image matching task and their accuracy in the image classification task. The rest of the paper is organized as follows:
the proposed color histogram-based detectors are explained in Section 2; the image matching and image classification experimental results are presented in Section 3; finally, the paper concludes with a discussion in Section 4.
2 Proposed Color Histogram Based Interest Point Detectors The new color histogram-based interest point detectors are explained in this section. The same process as Lee and Chen's histogram process is used after converting the image into the suggested color spaces: the Ohta color space [17], the HSV color space, the Opponent color space and the Transformed color space are used instead of the RGB color space. We believe each color space has special properties, and these properties may affect the detection process. In each color space, each channel is divided into 8 bins and the joint histogram is built with the quantization function

b(x, y) = \lfloor A(x, y)/32 \rfloor \cdot 8^2 + \lfloor B(x, y)/32 \rfloor \cdot 8 + \lfloor C(x, y)/32 \rfloor + 1
where A(x, y), B(x, y) and C(x, y) are the values of pixel (x, y) in the A, B and C channels of each color space. The Ohta, Opponent and Transformed color space channels are defined as shown in Table 1. Fig. 2 shows an example of Lee's histogram detectors and of our proposed histogram detectors.

Table 1. Color space conversion equations

Ohta color channels:         I1 = (R + G + B)/3,   I2 = (R − B)/2,   I3 = (2G − R − B)/4
Opponent color channels:     O1 = (R − G)/√2,   O2 = (R + G − 2B)/√6,   O3 = (R + G + B)/√3
Transformed color channels:  R' = (R − μ_R)/σ_R,   G' = (G − μ_G)/σ_G,   B' = (B − μ_B)/σ_B
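As an illustration of the conversion and quantization steps described above, the following sketch converts an RGB image to the opponent color space of Table 1 and builds the per-pixel bin index b(x, y). Rescaling the converted channels back to the 0–255 range, so that the same /32 quantization step can be reused, is an illustrative assumption rather than the exact procedure of this paper.

```python
import numpy as np

def to_opponent(rgb):
    """Opponent color channels (O1, O2, O3) of Table 1 from a float RGB image."""
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    O1 = (R - G) / np.sqrt(2.0)
    O2 = (R + G - 2.0 * B) / np.sqrt(6.0)
    O3 = (R + G + B) / np.sqrt(3.0)
    return np.stack([O1, O2, O3], axis=-1)

def quantise_to_bins(channels, n_bins=8):
    """Map each channel to 0..n_bins-1 and combine into a single bin index per pixel."""
    out = np.zeros(channels.shape[:2], dtype=np.int32)
    for c in range(channels.shape[-1]):
        ch = channels[..., c]
        # rescale to 0..255 so that integer division by 32 yields 8 bins (assumption)
        ch = (ch - ch.min()) / max(ch.max() - ch.min(), 1e-9) * 255.0
        out = out * n_bins + (ch // 32).astype(np.int32)
    return out   # values in 0 .. n_bins**3 - 1
```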
3 Experiments In the first part of the experiments, the proposed detectors are applied to an image matching task in which the images are subjected to different conditions; the aim is to measure the repeatability and matching score of the detected points. In the second part, the proposed detectors are applied to an image classification task. The Lee and Chen histogram detectors and the Harris and Hessian pixel detectors are compared with our proposed detectors in both parts.
Fig. 2. Histogram-based detectors. 2(a) Gradient histogram-based interest point. 2(b) RGB histogram-based interest point. 2(c) Opponent histogram-based interest point. 2(d) HSV histogram-based interest point. 2(e) Ohta histogram-based interest point. 2(f) Transformed color histogram-based interest point.
3.1 Experiments on Image Matching Finding the similarity between two similar scenes under different conditions, such as different illumination, scale or viewpoint, is an objective of image point detectors. All the detectors in the literature try to overcome these conditions, which can otherwise lead to mismatches between similar scenes. The proposed detectors are evaluated by measuring the repeatability and matching score of the detected points. Mikolajczyk et al. proposed a standard data set for the image matching task [14]. It has eight sets of images, and each set contains six images of the same scene and category but under different conditions. The data set covers two kinds of scenes: structured scenes that contain homogeneous regions with distinctive boundaries, such as graffiti, boat and bikes, and textured scenes that contain repeated texture of different forms, such as trees, wall and bark. For each kind of scene, three variations are included: scale change, blur change and viewpoint change. The remaining two sets contain two further variations: JPEG compression and illumination change. Fig. 3 shows some examples from the test data set. Any two regions in the left image of Fig. 3(a) and in the other images of Fig. 3(a) are defined as corresponding regions if the overlap error between them is small. Many corresponding regions mean high repeatability, and hence a more stable interest point detector. Moreover, features extracted from these corresponding regions are used to measure the correct matching score (distinctiveness) [14]. The following figures show the results of the image matching task under each image condition. The same settings as in [2] are used: the regions are normalized to 30 pixels in size and the overlap error threshold is 40%. From Fig. 4 to Fig. 11, we can conclude that the HSV, Transformed and Opponent color histogram-based interest point detectors perform better than the other histogram-based interest point detectors; moreover, they perform better than the intensity point detectors in some situations. This can be observed clearly in some textured scenes such as wall (viewpoint change) and trees (blur change), and under illumination change. The proposed detectors had acceptable performance in other situations such as the textured scene (scale change),
the structured scene (blur change) and the structured scene (scale change). In addition, they performed poorly under JPEG compression and worst in the structured scene (viewpoint change).
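The repeatability and matching scores reported in Figs. 4–11 can be computed as sketched below, assuming region correspondences have already been established with the 40% overlap-error criterion of [14]. Normalising by the smaller number of detected regions is the usual convention but is an assumption here.

```python
def repeatability(n_correspondences, n_regions_img1, n_regions_img2):
    """Percentage of regions that have a corresponding region in the other image."""
    return 100.0 * n_correspondences / min(n_regions_img1, n_regions_img2)

def matching_score(n_correct_matches, n_regions_img1, n_regions_img2):
    """Percentage of regions whose descriptors are also correctly matched."""
    return 100.0 * n_correct_matches / min(n_regions_img1, n_regions_img2)
```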
Fig. 3. Part of test data set. 3(a) Zoom + rotation (structural scene) 3(b) Viewpoint change (structural scene) 3(c) Viewpoint change (textured scene) 3(d) Blur change (structural scene).
Fig. 4. Textured scene bark images (scale change). 4(a) Repeatability 4(b) Number of corresponding regions 4(c) Number of correct matching regions 4(d) Matching score.
Fig. 5. Structured scene bikes images (Blur change). 5(a) Repeatability 5(b) Number of corresponding regions 5(c) Number of correct matching regions 5(d) Matching score.
Fig. 6. Structured scene boat images (zoom + rotation change). 6(a) Repeatability 6(b) Number of corresponding regions 6(c) Number of correct matching regions 6(d) Matching score.
Fig. 7. Structured scene Graf images (viewpoint change). 7(a) Repeatability 7(b) Number of corresponding regions 7(c) Number of correct matching regions 7(d) Matching score.
Fig. 8. Leuven images (light change). 8(a) Repeatability 8(b) Number of corresponding regions 8(c) Number of correct matching regions 8(d) Matching score.
Fig. 9. Textured scene trees images (blur change). 9(a) Repeatability 9(b) Number of corresponding regions 9(c) Number of correct matching regions 9(d) Matching score.
Fig. 10. Ubc images (JPEG compression). 10(a) Repeatability 10(b) Number of corresponding regions 10(c) Number of correct matching regions 10(d) Matching score.
Fig. 11. Textured scene wall images (Viewpoint change). 11(a) Repeatability 11(b) Number of corresponding regions 11(c) Number of correct matching regions 11(d) Matching score.
3.2 Experiments on Image Classification In this section, all the detectors mentioned in this paper are compared and evaluated in the image classification task. Data sets with different properties are used: the object recognition data sets Caltech 04, Caltech 101 and Graz02, and the scene recognition data sets Oliva and Torralba 4 man-made categories (OT4MM), Oliva and Torralba 4 natural categories (OT4N) and Oliva and Torralba 8 categories (OT8) [18]. For each data set, the interest points are detected using each interest point detector, and SIFT descriptors are then extracted at these points. A bag-of-features representation is used to turn the SIFT descriptors of an image into a visual-word histogram. Table 2 shows the number of vocabulary words used with each data set. K-means clustering is used to build the visual codebook, and a non-linear χ² kernel is used for classification.

Table 2. Data set information and number of vocabulary words

Data set      # of Vocabulary Words   # of Categories   # of Training Images   # of Testing Images
Caltech04             300                    6                  600                   300
Caltech101            600                  102                 3060                  1355
Graz02                500                    4                  600                   300
OT8                   500                    8                 1200                   800
OT4MM                 500                    4                  600                   400
OT4N                  500                    4                  600                   400
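The bag-of-features pipeline described above can be sketched as follows: k-means on training SIFT descriptors to build the visual codebook, per-image histograms of visual words, and a χ² kernel over those histograms for the classifier. The use of scikit-learn and of an SVM with a precomputed kernel is an illustrative assumption, not a statement of the exact implementation used here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

def build_codebook(train_descriptors, n_words):
    """Cluster SIFT descriptors (one per row) into n_words visual words with k-means."""
    return KMeans(n_clusters=n_words, n_init=4, random_state=0).fit(train_descriptors)

def bof_histogram(codebook, descriptors, n_words):
    """Normalised histogram of visual-word assignments for one image."""
    words = codebook.predict(descriptors)
    h = np.bincount(words, minlength=n_words).astype(float)
    return h / max(h.sum(), 1.0)

def train_chi2_svm(train_hists, train_labels):
    """SVM with a precomputed chi-square kernel over the BoF histograms."""
    K = chi2_kernel(train_hists)
    clf = SVC(kernel="precomputed").fit(K, train_labels)
    return clf

# Prediction uses the kernel between test and training histograms:
# clf.predict(chi2_kernel(test_hists, train_hists))
```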
For each data set, the experiments are repeated 20 times with randomly selected training and testing images for each detector, and the mean and standard deviation over the 20 runs are reported as the classification accuracy. Table 3 shows the image classification results for Caltech04, Caltech101, Graz02, OT8, OT4N and OT4MM. From the table, the Hessian affine detector is the best for Caltech101, Graz02 and OT4MM. This is because the images in these data sets are more structured than textured and have complex, cluttered backgrounds. The Opponent histogram detector and the HSV histogram detector are better for the remaining data sets (OT4N, Caltech04 and OT8). The classification performance using interest point detectors is lower than with a dense detector; in particular the Caltech101 accuracy is worse than in many previous works [4, 19-21]. Nowak et al. [3] showed that an important factor for achieving high accuracy in classification and recognition systems is extracting features densely: dense detectors are guaranteed to cover all objects or scenes in the images, and they also cover the low-contrast regions that interest point-based detectors miss.

Table 3. Image classification accuracy

Detector type                                         OT4MM          OT4N           OT8
Hessian affine detector                             62.17±2.892    51.35±2.371    44.82±1.437
Harris affine detector                              60.95±1.561    47.44±2.240    42.89±1.437
RGB histogram based interest point                  58.70±1.354    61.08±2.051    44.99±1.596
Gradient histogram based interest point             61.96±2.257    54.96±1.729    46.11±1.448
HSV histogram based interest point                  61.68±2.535    60.74±2.993    47.08±1.786
Transformed color histogram based interest point    57.86±1.961    59.09±2.439    44.35±1.259
Opponent histogram based interest point             61.88±2.702    62.55±2.014    47.74±1.412
Ohta histogram based interest point                 58.94±1.758    59.64±2.289    46.08±1.580

Detector type                                         Caltech 04     Caltech 101    Graz02
Hessian affine detector                             81.05±2.007    21.68±0.619    65.09±3.167
Harris affine detector                              73.80±3.039    17.52±0.935    64.09±3.105
RGB histogram based interest point                  83.55±2.562    15.85±0.909    57.91±2.671
Gradient histogram based interest point             83.20±1.939    17.21±1.005    62.11±2.901
HSV histogram based interest point                  86.23±2.274    16.29±0.782    59.76±3.287
Transformed color histogram based interest point    80.08±2.339    13.83±0.861    54.91±3.154
Opponent histogram based interest point             86.25±1.767    17.07±0.708    59.20±4.167
Ohta histogram based interest point                 82.73±2.129    16.40±0.667    59.42±3.021
4 Conclusions In this paper, we presented four new color histogram-based detectors. Instead of the pixel intensity, a histogram representation is used in the detection process to detect the interest points. Their performance was evaluated on the image matching task using the textured and structured data set scenes of [14] and on image classification using a number of object and scene data sets. In image matching, the proposed detectors achieved good repeatability and matching scores on the textured scenes, especially under illumination and blur change, and acceptable to poor repeatability on the structured scenes. In image classification, the proposed detectors achieved better classification accuracy than the other interest point detectors on the OT4N, OT8 and Caltech04 data sets and worse accuracy on the other data sets. Lastly, the image classification results with interest point detectors were worse compared
with the results using the dense detector, especially for the Caltech101 classification. This is evidence that densely extracted features are crucial in image classification systems. In future work, we plan to use the histogram representation to implement a dense histogram-based detector, to take advantage of densely extracted features, and also to implement a hybrid dense histogram-based interest point detector.
References 1. Wei-Ting, L., Hwann-Tzong, C.: Histogram-based interest point detectors. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1590–1596 (2009) 2. Tuytelaars, T.: Dense interest points. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2281–2288 (2010) 3. Nowak, E., Jurie, F., Triggs, B.: Sampling Strategies for Bag-of-Features Image Classification. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3954, pp. 490–503. Springer, Heidelberg (2006) 4. Bosch, A., Zisserman, A., Munoz, X.: Image Classification using Random Forests and Ferns. In: IEEE 11th Conference on Computer Vision, pp. 1–8 (2007) 5. Harris, C., Stephens, M.: A Combined Corner and Edge Detection. In: Proceedings of The Fourth Alvey Vision Conference, pp. 147–151 (1988) 6. Montesinos, P., Gouet, V., Deriche, R., Pelé, D.: Matching color uncalibrated images using differential invariants. Image and Vision Computing 18, 659–671 (2000) 7. Lindeberg, T.: Feature Detection with Automatic Scale Selection. Int. J. Comput. Vision 30, 79–116 (1998) 8. Mikolajczyk, K., Schmid, C.: A Performance Evaluation of Local Descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 27, 1615–1630 (2005) 9. Tuytelaars, T., Mikolajczyk, K.: Local invariant feature detectors: a survey. Found. Trends. Comput. Graph. Vis. 3, 177–280 (2008) 10. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. In: British Machine Vision Conference, pp. 384–393 (2002) 11. Tuytelaars, T., Van Gool, L., D’Haene, L., Koch, R.: Matching of affinely invariant regions for visual servoing. In: IEEE International Conference on Robotics and Automation, pp. 1601–1606 (1999) 12. Kadir, T., Zisserman, A., Brady, M.: An Affine Invariant Salient Region Detector. In: Pajdla, T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3021, pp. 228–241. Springer, Heidelberg (2004) 13. Schmid, C., Mohr, R., Bauckhage, C.: Evaluation of Interest Point Detectors. Int. J. Comput. Vision 37, 151–172 (2000) 14. Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Gool, L.V.: A Comparison of Affine Region Detectors. Int. J. Comput. Vision 65, 43–72 (2005) 15. Szeliski, R.: Computer Vision: Algorithms and Applications, 1st edn. Springer, Heidelberg (2011) 16. Comaniciu, D., Ramesh, V., Meer, P.: Real-time tracking of non-rigid objects using mean shift. In: IEEE Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 142–149 (2000) 17. Ohta, Y., Kanade, T., Sakai, T.: Color Information for Region Segmentation. Computer Graphics and Image Processing 13, 222–241 (1980)
18. Oliva, A., Torralba, A.: Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. Int. J. Comput. Vision 42, 145–175 (2001) 19. Boiman, O., Shechtman, E., Irani, M.: In defense of Nearest-Neighbor based image classification. In: IEEE Computer Vision and Pattern Recognition (CVPR), pp. 1–8 (2008) 20. van der Sande, K., Gevers, T., Snoek, C.: Evaluating Color Descriptors for Object and Scene Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 32, 1582–1596 (2010) 21. Rassem, T.H., Khoo, B.E.: Object class recognition using combination of color SIFT descriptors. In: IEEE Imaging Systems and Techniques (IST), pp. 290–295 (2011)
Digital Training Tool Framework for Jawi Character Formation Norizan Mat Diah and Nor Azan Mat Zin Department Of Information Science, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM BANGI, Selangor, Malaysia [email protected], [email protected]
Abstract. There have been a number of studies carried out on computer-based Jawi writing; however, most studies focus on academic learning based on Roman letters. Jawi is a writing script for the Malay language adopted from the Arabic alphabet. Jawi characters are considered very difficult for children to write, as they require fine motor skills and need to be formed properly. This article discusses the attributes involved in constructing a framework for digital training tools for Jawi character formation. Four attributes pertaining to the formation of Jawi characters are highlighted in this study: pre-writing activity, the Jawi character formation process, practice activities and Jawi character recognition. Previous and current studies discussed in this paper suggest that all of the highlighted dimensions are compatible with the formation of Jawi characters, even though most of the reviewed work is not about the Jawi script. Keywords: Jawi character formation, training tool.
1 Introduction The history of education in Malaysia shows that Islamic Education has been taught using books or texts written in the Jawi script, a practice that continues today. A study by Amin [1] revealed that students are weak in Jawi writing and reading skills. Consequently, students who are unable to master Jawi reading and writing will be left behind in the Islamic subjects, as Jawi is the medium of instruction for these subjects [2]. Nik Yaacob [3] emphasized that writing Jawi must be taught at an early age. A study by Ismail [4] found that children learning Jawi display a higher degree of weakness in mastering Jawi writing than Jawi reading. This finding is supported by Ahmad [5] in a survey of Jawi writing skills among school children, which showed that some children were unable to master Jawi writing even at the age of 9. Jawi is currently taught in schools; however, the methods used to teach it are not motivating [3]. The teaching methods used are the ‘chalk and board’ approach and flash cards [6]. A proper technique needs to be adopted to make learning Jawi an interesting activity. Hence, it is important to have suitable learning tools, such as digital tools equipped with an appropriate interactive environment and proper interface design, to aid children in developing an interest in learning Jawi.
Technology has become an influential factor in education. Advances in computer technology have allowed educators to develop new technology-mediated spaces for teaching and learning, giving students the opportunity to choose what they are interested in learning. The terms courseware and Computer-Aided Instruction (CAI) cover a range of computer-based packages that aim to provide an interactive environment for learning. By adding elements of interactivity to teaching, lessons become easier to understand through audiovisual aids [7]. Students prefer the interactive learning provided by digital tools to motivate and aid them in learning Jawi, rather than just the traditional flash cards. There are many computerized products for learning Jawi on the market, but not many address Jawi handwriting within an interactive learning environment. Feedback provided by the computer is one form of interactive learning environment that can be applied to Jawi education. The aim of this research is to develop an interactive tool for learning to write the Jawi alphabet, in which real data input via a touch screen device is used to formulate feedback. In this paper, we present the first part of our research: a framework for developing an interactive training tool to aid learners in forming Jawi characters using a touch screen device. The paper is divided into four sections: after this introduction, Section two describes the background of the study, Section three presents the proposed framework, and Section four concludes the paper.
2 Background of the Study Motor skills are important not only in early childhood but throughout life, as these skills, especially fine motor skills, are needed to carry out complex activities and movements that require strong and proper control. As mentioned by Lippincott [8], the activities children do now lay the foundation for their skills in writing, cutting and other abilities needed for grade school. Fine motor skills require the movement of a group of small muscles, such as those of the hand (palm). These muscles are essential for a variety of activities such as coordinating the eyes, moving the tongue and lips, and wriggling the toes. Handwriting is an activity that involves the use of the hand and is an important part of fine motor skill; as young children engage in reading and writing, their visual skills and fine motor skills increase [9]. Research on motor skill development for handwriting, particularly cursive, has shown that many difficulties may arise in the early stages of handwriting development, or even before [10]. Handwriting is a very important fine motor skill learned during the early school years, and formal handwriting instruction may begin as early as kindergarten. [11] reported that children in kindergarten spent about 45% of their activities on fine motor skills using paper and pencil. Developing writing ability is not only important for building a child's self-esteem, but is also considered an essential ingredient for success in school [12], because handwriting performance has a direct effect on academic performance [13]; children struggling with handwriting may avoid writing altogether and decide that they cannot write.
Problems like those reported in [13] also appear in learning Jawi, as reported by [14], who emphasized that learning to write Jawi must start at an early age, in the proper way and with proper techniques. The Jawi script is the basic medium for teaching and learning Islamic Education in Malaysian schools today, especially for learning the Qur'an [15]. The aim of Jawi lessons is to equip students with Jawi reading and writing skills to ensure that they understand the various fields of knowledge in Islamic Education. Jawi is a script derived from the Arabic alphabet and was adopted for writing the Malay language. The adaptation of the Arabic script with six extra characters to accommodate Malay sounds makes up the set of Jawi characters presented in Figure 1.
Fig. 1. Jawi Characters
Jawi writing, which has evolved since the arrival of Islam in the Malay Peninsula, was widely used in official communications and transactions during the era of the Malacca Sultanate [16]. Unfortunately, Jawi has been replaced by the Roman script from the British colonial era until the present. Currently many people do not know how to read or write Jawi since it is not widely used [17]. In addition, Jawi is confined to Islamic matters and is primarily used in the Islamic courts, mosques, religious offices and religious schools [18]. Students are not interested in learning Jawi due to a lack of motivation and the fact that only a few teachers are proficient in the Jawi script [18]. The young generation also has negative attitudes towards learning Jawi, perceiving it as difficult to learn and unimportant since it is not evaluated in examinations [19]. Furthermore, the current teaching and learning process for Jawi is monotonous, uninteresting and lacking in effective teaching tools [20]. The research in [18] also shows that the majority of students feel that the Jawi script is very complicated compared to Romanized characters; hence they become uninterested and quickly bored in the traditional classroom environment. One way to overcome this problem is to use technology to help children learn and master Jawi character writing, because nowadays children spend more time using computers. Computers have proven to be beneficial to education; however, a computer with an interface based on keyboard and mouse has limited practical use in schools [21], because it cannot be used when children are involved in handwriting activity. Hermann [22] suggested that handling the keyboard interferes with the writing process, and if this is the case, then the use of more natural interfaces, such as speech and handwriting recognition devices, may be desirable. The
advantage a (digital) pen has over a keyboard is that it supports the activity of spelling and is a natural interface to the computer [23]. The availability and affordability of graphics tablets, digital pens and tablet PCs today make entering a response by handwriting a workable alternative to the keyboard. Pen-based computers have several crucial advantages that can help students with electronic handwriting, including a handwriting recognition engine, handwriting-based interfaces, and the development of handwriting-based “killer applications” [24]. The facilities provided by pen-based technology for computer handwriting can help solve the problem of Jawi writing difficulty for children. There is a need to study approaches for creating software that processes Jawi characters by storing data (handwriting input via tablets or similar touch screen devices) in the form of digital images; suitable pattern recognition algorithms are then applied to formulate and generate feedback to the user. Next, we present the result of our literature analysis: a framework for developing a training tool for children to learn to form Jawi characters, using intelligent character recognition technology.
3 Framework for Jawi Writing Training Tool The framework comprises all the dimensions involved in developing a training tool that aids in teaching children to form Jawi characters on the touch screen of a computer monitor. A stylus is a pen-shaped instrument used to select menus and to write Jawi characters on devices that have a touch screen. There are four components in the framework, as shown in Figure 2, and they are discussed in detail in the following subsections:
i. Pre-writing activity
ii. Jawi character formation
iii. Practice activities
iv. Jawi character recognition
Fig. 2. Framework for Jawi writing training tool
3.1 Pre-writing Activity Young children from three to five years of age use their hands to explore and learn about the environment and themselves. Developing hand and other pre-writing skills helps young children prepare for the next step into the world of writing. Implementing pre-writing activities for pre-school children is a good approach for building essential foundational fine motor skills. According to de Diego-Cottinelli and Barros [25], children practice tracing and drawing with pencils, crayons or even their fingers to gain basic pencil-control skill in the learning phase. Pre-writing activity is part of graphomotor skills, which underpin the basic movements essential to letter writing [26]. According to Carlson and Cunningham [27], graphomotor skill is the coordination of the finger, hand and arm to make desired marks when drawing or writing. Graphomotor skill helps children achieve the basic movements that are part of letter writing [28]; it belongs to a phase of the writing process that emphasizes the basic hand movements and muscle development necessary for future penmanship. A pre-writing activity is defined as a whole set of tasks that should be accomplished to practice and assess motor skills for handwriting [26]. Before handwriting becomes an additional and complementary way of expression for children, they need to become familiar with the use of writing tools. Pen-based activities are the leading activities for pre-school children before they enter the world of writing. Pen-based skills include hand strength, directional movement patterns and effective hand position, which facilitate making lines, letters and shapes. Initially, very young children engage in picture drawing and scribbling that looks like writing. According to Clay [29], as children gain experience with printed materials and writing, this scribbling begins to take on the characteristics of print (linearity and separate units). Drawing and scribbling are regarded as the obvious strategies children use to explore writing at the beginning [30]. Levin and Bus [31] provided evidence of continuity from children’s drawing and scribbling to writing letters. The pictures and scribbles children call “words” show several interesting trends: children first scribble randomly, then gradually start to draw directional lines and produce pictures containing meaning [32]. Several studies have explored the connection between young children’s drawing and writing [33]. According to [33], children draw pictures and write to organize ideas and construct meaning from their experiences, and [31] found that children’s writing began with drawing activities. By the time children reach the age of four, adults can distinguish children’s writing from their drawings, in part because children use writing-like features, including segmentation into small units and linearity, as well as letters and letter-like shapes [34]. In a syllabic language, children’s drawing and scribbling in emergent writing are acknowledged to help them gain familiarity with the lines and shapes of letters, the appropriate size of words and the direction of sentences [35]. The use of writing tools begins at nursery school with the practice of drawing and continues at primary school mainly through copying tasks [36]. Tracing helps a young child develop a sense of order for how writing is done, build concentration and work on fine motor skills.
It has been shown that by using tracing/copying exercises students perform significantly better than students who are not allowed to participate in these exercises [37][38]. Research by [39] shows that there is a good correlation between copying, tracing, and writing skill.
Therefore, it can be concluded that four types of pre-writing activities are required by pre-school children before they learn to form Jawi letters (characters): a) scribbling, b) drawing, c) tracing and d) copying. In the formation of manuscript letters, the child needs to be able to form straight lines in horizontal, vertical and diagonal positions, and to form circles and partial circles. The directions of the strokes, top-to-bottom and left-to-right, are also factors in letter formation [40]. Normally, pre-school children should be developmentally ready to form the basic lines (vertical, horizontal, circular and oblique) that constitute manuscript letters by the time they enter school at the age of five [41]. The types of stroke applied in the drawing and tracing activities include the practice of drawing straight lines, followed by more challenging lines such as curves, zigzags and diagonals. Children scribble continuous lines and sketch soft, continuous patterns of basic lines such as letter shapes involving straight lines (vertical and horizontal), curved lines and oblique lines. The pre-writing activities proposed by [42] include dot-to-dot drawing of pictures, objects, shapes and numbers. Based on the literature reviewed, it can be concluded that all pen-based activities apply three types of strokes, a) straight line, b) oblique line and c) cursive line, to ensure that children are ready to form the Jawi characters in the next activity. 3.2 Jawi Character Formation Jawi character formation is slightly different from that of Roman letters because the Jawi alphabet involves extensive use of curves and dots. Although Jawi writing uses many curves, the process of character formation is virtually the same as for other scripts, such as Roman, Japanese, Indian and Chinese. An early study on handwriting [43] compared the usefulness of copying and tracing as methods of initial handwriting instruction and found that only practice in copying led to improved performance. [44] studied the effects on handwriting performance of copying strokes and letter forms, tracing or connecting dots placed at various points of the letter, and discriminating between correct and incorrect letter forms; the results showed that children were more engaged in connecting dots than in learning to write letters. [45] reported that copying letters improved when the models to be copied depicted motion. One way of teaching handwriting is to explain the form and the order of letter strokes in addition to copying exercises [46]. Generally, the teaching methods for handwriting include tracing, copying, exercises and drill [47]. Many commercially prepared systems of instruction introduce beginners to handwriting in two stages: tracing letters and independent production of letters [47]. Recently, [48] introduced a four-way method for teaching handwriting, in which: i. the child writes the letter after seeing the instructor write it, ii. the child writes the letter after seeing a copy of it with arrows indicating the order and direction of each stroke, iii. the child writes the letter from memory after examining a copy of it, iv. the child writes the letter from memory after examining a copy of it with arrows indicating the order and direction of each stroke.
Children must be able both to perceive the shape of the model and to evaluate the deviation between their own handwriting product and the standard. Based on the findings of the research discussed above, together with findings from [49] and [13], three important elements that constitute a handwriting exercise have been identified and will be considered in this research for Jawi character formation: a) tracing, b) copying letters, and c) producing letters under dictation. 3.3 Practice Activities The process of transferring model-dependent behaviors (as displayed in tracing and copying) to self-initiated and self-organized sequences of behaviors carried out without any outside support is a complex form of learning [29]. Strategies for learning letter formation include modeling, tracing, copying, dictating, composing, self-monitoring and peer recording [50]. Research by [51] indicated that good teaching requires the student to have systematic, additional instruction in handwriting. According to [52], children from newborn to eight years old learn best from methods that are consistent with developmentally appropriate practice. Handwriting, as a complex skill, can be improved through practice, repetition, feedback and reinforcement [53]. Three of these elements come from the teaching-learning model called the acquisitional model [54], derived from theories of learning; this model underlies instructional programs used by occupational therapists and is based on developmental and acquisitional theories. McCluskey [55] conducted a study on practice involving the repetition of different letters, using different writing implements, classroom assignments, therapist feedback and child self-evaluation. The findings suggest that programs involving writing practice, repetition and feedback may improve handwriting slightly more than when no treatment is given. Quoting an example given by [56], if a student who has consistently been unable to write any letters in a spelling dictation test begins to write the first letter sound of the dictated word, praise should be provided to that student; the teacher can then raise the criteria for reinforcement so that only correct answers are rewarded [57]. Furthermore, [58] states that practice through repetition, feedback and reinforcement is essential if the skill of letter formation is to be mastered. A computer-based tool is an effective environment in which to practice, as the user need not be supervised but can still be monitored. Thus the three elements of a) repetition, b) feedback and c) reinforcement will be used as practice activities in the Jawi character formation tool. 3.4 Jawi Character Recognition Research into handwriting recognition suggests that accuracy rates of between 87% and 93% are achievable with constrained writing [59]. [60] found that secondary school children enjoyed using speech recognition technology but gave a disappointing 80% rating for character recognition. Handwriting recognition is generally implemented in a computing environment in which a (digital) pen can be used for direct manipulation of screen objects [61].
Normally, handwriting is digitally captured at the point of creation; this is generally done with a graphics tablet or a special stylus or pen. The data input by the user is then displayed as script and stored as an ‘ink’ data type. This data type holds information about position and time, and it is this data that is subsequently ‘recognized’ by the recognition algorithms [62]. A handwriting recognition system automates the process of turning handwritten work into a computer-readable form [5]. When the handwriting is in binary form, the computer is able to recognize it, and this eases the process of giving feedback to children, especially on how to improve their handwriting [63]. There has been research on children’s handwriting, for instance on the use of optical character recognition for text entry on a PC by children, in which the authors compare different methods for children to input data to a computer. A project by [64] aimed to automatically assess exercises in which pupils have to fill in gaps using a given set of words. In research on handwriting recognition by [65], writing using tablets was found to be quite efficient, and children were able to write reasonable stories. The Jawi character formation activity and the practice activity require a character recognition algorithm, because handwriting teaching involves showing children how to reproduce a letter according to a standard form. In order to produce a character, children must be able to perceive both the shape of the character in its standard form and the deviation between their own handwritten product and the standard, and the software must be able to give feedback on the accuracy and quality of their products. Therefore, character recognition technology is an important component of the proposed framework.
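As an illustration of the kind of ‘ink’ comparison such a tool could use, the sketch below resamples a stroke captured from the touch screen as (x, y, t) samples and compares it point-wise with a stored template of the Jawi character, using the mean deviation to drive the feedback. The resampling count and the deviation threshold are illustrative assumptions, not values from the proposed framework.

```python
import math

def resample(stroke, n=32):
    """Resample a stroke [(x, y, t), ...] to n points evenly spaced along its length."""
    pts = [(x, y) for x, y, _ in stroke]
    if len(pts) < 2:
        return (pts or [(0.0, 0.0)]) * n   # degenerate stroke: nothing to interpolate
    dists = [0.0]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        dists.append(dists[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dists[-1] or 1.0
    out, j = [], 0
    for i in range(n):
        target = total * i / (n - 1)
        while j < len(dists) - 2 and dists[j + 1] < target:
            j += 1
        span = dists[j + 1] - dists[j] or 1.0
        a = (target - dists[j]) / span
        out.append(((1 - a) * pts[j][0] + a * pts[j + 1][0],
                    (1 - a) * pts[j][1] + a * pts[j + 1][1]))
    return out

def stroke_feedback(child_stroke, template_stroke, threshold=12.0):
    """Mean point-wise deviation between the child's stroke and the template."""
    a, b = resample(child_stroke), resample(template_stroke)
    dev = sum(math.hypot(p[0] - q[0], p[1] - q[1]) for p, q in zip(a, b)) / len(a)
    return ("well formed" if dev < threshold else "trace the character again"), dev
```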
4 Conclusion The framework proposed in this paper is based on an extensive review of the literature. However, most previous studies focused on the Romanized letters used in English writing, or on other characters such as Chinese and Japanese kanji. Research on writing Jawi or Arabic characters is very limited; hence a new study is needed to close this gap. Findings from the literature on other characters will help guide our study: although the way other characters are formed differs slightly, the basics of letter/character formation are similar. There are four components in the proposed framework:
i. Pre-writing activity
ii. Jawi character formation
iii. Practice activities
iv. Jawi character recognition
These components can help in developing an effective training tool to aid learners in learning to form Jawi characters using touch screen technology. The proposed framework needs to be verified through implementation in interactive learning software.
References 1. Amin, A.M.: Tulisan Jawi ke arah penggunaan yang lebih meluas dan berkesan. Journal Dewan Bahasa 33(10), 937–941 (1989) 2. Endut, M.: Pendidikan Islam KBSM: Satu Kajian Tentang Masalah Pengajaran di Sekolahsekolah di Negeri Johor. In: Tesis, M. (ed.), Universiti Kebangsaan Malaysia, Bangi (1992) 3. Nik Yaacob, N.R.: Pengusaan Jawi dan Hubungan dengan Minat dan Pencapaian Pelajar dalam Pendidikan Islam. Journal Pendidik dan Pendidikan 22, 161–172 (2007) 4. Ismail, M. R.: Tulisan Jawi dan Isu – Isunya dalam Pendidikan, Malaysia (2008), http://www.scribd.com/doc/2582591/tulisanjawi 5. Ahmad, M.A.: Penguasaan Tulisan Jawi di Kalangan Murid-murid Tahun Tiga (2011), http://ww.scribd.com/doc/18584642/kajian-jawi 6. Abdullah, N.A., Raja Ahmad Kamaruddin, R.H., Razak, Z., Mohd Yusoff, Z.: A Toolkit Design Framework for Authoring Multimedia Game-Oriented Educational Content. In: 8th IEEE International Conference on Advanced Learning Technologies, pp. 144–145. IEEE Press, New York (2008) 7. Becker, H. J., Raviz, J. L., Wong, Y.T.: Teacher and Teacher-Directed Student Use of Computer and Software Teaching, Learning, and Computing 1998National Survey Report #3. Center for Research on Information Technology and Organizations University of California, Irvine and University of Minnesota (1999) 8. Lippincott, C.: Fine motor activities for preschoolers (2004), http://make-thegradeot.com/Fine%20Motor%20Activities%20For%20Preschoolers.p df 9. Steffani, S., Selvester, P.M.: The Relationship of Drawing, Writing, Literacy, and Math in Kindergarten Children. Reading Horizons 49(2), 125–142 (2009) 10. Grant, A.: How can development of gross and fine motor skills benefit gifted children who struggle with cursive handwriting? (2004), http://www.canterbury.ac.uk/education/professionaldevelopment/projects/gifted-and-talented/docs/alexgrant%2520.doc 11. Marr, D., Cermack, S.A., Cohn, E.S., Henderson, A.: Fine motor activities in Head Start and Kindergarten classrooms. American Journal of Occupational Therapy 57, 550–557 (2003) 12. Feder, K.P., Majnemer, A.: Handwriting development, competency, and intervention. Developmental Medicine and Child Neurology 49(4), 312–317 (2007) 13. Graham, S., Harris, K.R.: Improving the writing performance of young struggling writers: Theoretical and programmatic research from the center on accelerating student learning. The Journal of Special Education 39, 19–33 (2005) 14. Ismail, M. I. :Tulisan Jawi dan Isu – isunya dalam Pendidikan, Malaysia (2001), http://www.scribd.com/doc/2582591/tulisanjawi 15. Seouk, K.K.: Perkembangan tulisan Jawi dalam masyarakat Melayu. Dewan Bahasa dan Pustaka, Kuala Lumpur (1990) 16. Kratz, I.E.U.: Jawi Spelling and Orthography: A Brief Review, Indonesia and the Malay World, vol. 30(86), pp. 21–26. Carfax Publishing (2002) 17. Diah, N.M., Ismail, M., Ahmad, S., Syed Abdullah, S.A.S.: Jawi on Mobile devices with Jawi wordsearch game application. In: IEEE International Conference Science and Social Research (CSSR), pp. 326–329. IEEE Press, New York (2010)
18. Hairul Aysa Abdul Halim Shitiq, H.A.A., Moahmud, R.: Using an Edutainment Approach of a Snake and Ladder game for teaching Jawi Script. In: IEEE International Conference on Education and Management Technology, pp. 228–232. IEEE Press, New York (2010) 19. Yazid, M.: Penguasaan Jawi di Kalangan Pelajar-Pelajar Sekolah Menengah di Negeri Kelantan: Satu Kajian Khusus di Daerah Kota Bharu. Universiti Malaya, Kuala Lumpur (1991) 20. Muda, Z.: Faktor-Faktor Psikologi Dan Permasalahan Dalam Pembelajaran Dan Pembelajaran Jawi Di Kalangan Murid Darjah Lima Di Sekolah Kebangsaan Bukit Tunggal, Kuala Terengganu, Malaysia, Kementerian Pendidikan Malaysia: Jabatan Pendidikan Islam Dan Moral (1996) 21. Iwayama, N., Akiyama, K., Tanaka, H., Tamura, H., Ishigaki, K.: Handwriting-based Learning Materials on a TabletPC: A Prototype and its Practical Studies in an Elementary School. In: Ninth International Workshop on Frontiers in Handwriting Recognition, pp. 533–538 (2004) 22. Hermann, A.: Research into Writing and Computers: Viewing the Gestalt. In: Annual Meeting of the Modem Language Association, pp. 20–30 (1987) 23. Bearne, E.: Making Progress in English. Routledge, London (1998) 24. Nakagawa, M.: Toward Ideographic Human Interfaces, Technical Report of IEICE, PRMU2001-240, vol. 101(712), pp. 41-48 (2002) 25. Diego-Cottinelli, B.D., Barros, B.: TRAZO: A Tool to Acquire Handwriting Skills Using. In: Tablet-PC Devices Proceedings of Interaction Design and Children, pp. 278–281 (2010) 26. Barros, B., Conejo, R., de Diego-Cottinelli, A., Garcia-Herreros, J.: Modeling pre-writing tasks to improve graphomotricity processes. In: European Conference on TechnologyEnhanced Learning (2008) 27. Calrson, K., Cunningham, J.L.: Effect of pencil diameter on the Grphomotor skill of preschools. Journal of Early childhood Research Quarterly 5, 279–293 (1990) 28. Cuthbert, S.C., Barras, M.: Developmental Delay Syndromes: Psychometric Testing Before and After Chiropractic Treatment of 157 Children. Journal of Manipulative and Physiological Therapeutics 32(8), 660–669 (2009) 29. Clay, M.: What did I write? Heinemann, Australia (1975) 30. Freeman, E., Sanders, T.: Kindergarten Children’s Emerging Concepts of Writing Functions in the Community. Early Childhood Research Quarterly 4, 331–338 (1989) 31. Levin, I., Bus, A.G.: How is Emergent Writing Based on Drawing? Analyses of Children’s Products and Their Sorting by Children and Mothers. Developmental Psychology 39, 891– 905 (2003) 32. Chen, S., Jing Zhou, J.: Creative writing strategies of young children: Evidence from a study of Chinese emergent writing. Thinking Skills and Creativity, 138–149 (2010) 33. Baghban, M.: Scribbles, Labels, and Stories: The Role of Drawing in the Development of Writing. Young Children 62(1), 20–26 (2007) 34. Diamond, K.E., Gerde, H.K., Powell, D.R.: Development in Early Literacy Skills During the Pre-kindergarten Year in Head Start: Relations Between Growth in Children’s Writing an Understanding of Letters. Early Childhood Research Quarterly 23, 467–478 (2008) 35. Ritchey, K.D.: The Building Blocks of Writing: Learning to Write Letters and Spell Words. Reading and Writing 21, 27–47 (2008) 36. Rémi, C., Frélicot, C., Courtellemon, P.: Automatic Analysis of the Structuring of Children Drawing and Writing. Pattern Recognition 35(5), 1059–1069 (2002)
37. McLaughlin, T.F.: An Analysis of Token Reinforcement: A Control Group Comparison with Special Education Youth Employing Measures of Clinical Significance. Child Behavior Therapy 3, 43–51 (1981) 38. Park, C., Weber, K.P., McLaughlin, T.F.: Effects of Fading, Modeling, Prompting, and Direct Instruction on Letter Legibility for Two Preschool Students with Physical Disabilities. Child & Family Behavior Therapy 29(3), 13–21 (2007) 39. Graham, S.: Handwriting and spelling instruction for students with learning disabilities: A review. Learning Disability Quarterly 22, 78–98 (1999) 40. Wright, J.P., Allen, E.G.: The Elementary School Journal 75(7), 430–435 (1975) 41. Beery, K.E.: The Beery-Buktenica developmental test of visual-motor integration. Modern Curriculum Press, New Jersey (1997) 42. Smith, J.: Activities for Fine Motor Skills Development. Teacher Created Resources (2004), http://www.teachercreated.com/products/activities-forfine-motor-skills-development-grade-prek-1-3689 43. Hertzberg, O.E.: A Comparative Study of Different Methods Used in Teaching Be-ginners to Write. Unpublished doctoral dissertation. Columbia University, New York (1926) 44. Hirsch, E., Niedermeyer, F.C.: The Ef-fects of Tracing Prompts and Discrimina-tion Training on Kindergarten Handwriting Handwriting Performance. Journal of Educational Research 67, 81–86 (1973) 45. Wright, C.D., Wright, J.P.: Handwriting: the effectiveness of copying from moving versus still models. Journal of Educational Research 74, 95–98 (1980) 46. Berninger, V.W., Graham, S., Vaughan, K.B., Abbott, R.D., Abbott, S.P., Woodruff Rogan, L.W., et al.: Treatment of handwriting problems in beginning writers: Transfer from handwriting to composition. Journal of Educational Research 89, 652–666 (1997) 47. Kirk, U.: Learning to Copy Letters: A Cognitive Rule-Governed. The Elementary School Journal 81(1), 28–33 (1980) 48. Bara, F., Gentaz, E.: Haptics in teaching Human movement science, handwriting: The role of perceptual and visuo-motor skills. Corrected Proof, pp. xxx–xxx (2011) 49. Vinter, A., Chartrel, E.: Effects of Different Types of Learning on Handwriting Movements in Young Children. Journal Learning and Instruction 20(6), 476–486 (2010) 50. Peterson, C.Q., Nelson, D.L.: Effect of an Occupational Intervention on Printing in Children with Economic Disadvantages. The American Journal of Occupational Therapy Official Publication of the American Occupational Therapy Association 57(2), 152–160 (2003) 51. Graham, S., Harris, K., Fink, B.: Extra handwriting instruction: Preventing writing difficulties right from the start. Teaching Exceptional Children 33, 88–91 (2000) 52. NAEYC position statement: Developmentally appropriate practice in early childhood programs serving children from birth through age eight. Washington DC (1996), http://www.naeyc.org/about/positions/pdf/PSDAP98.PDF 53. Mahshid Hosseini, H.: Dose Your Child have Dificulty with Printing and hand writing (2008), http://www.skilledkids.com 54. Mosey, A.C.: Occupational therapy: Configuration of a profession. Raven Press, New York (1981) 55. McCluskey, A.R.: The Log Handwriting Program Improved Children’s Writing Legibility: A Pretest–Posttest Study Nadine Mackay. Occupational Therapy 64(1), 30–36 (2010) 56. Morgret, K., Weber, N., Woo, L.: Behaviorist Approach To Spelling (2001), http://ldt.stanford.edu/~lwoo/behavior.pdf 57. Stipek, D.: Motivation to Learn: from Theory to Practice. Allyn and Bacon, Massachusetts (1993)
Digital Training Tool Framework for Jawi Character Formation
175
58. Nicol, T., Snape, L.: A Computer Based Letter Formation System for Children. In: Proceedings of Conference on Interaction Design and Children, p. 165 (2003) 59. MacKenzie, I.S., Chang, L.: A performance comparison of two handwriting recognizers. Interacting with Computers 11(3), 283–297 (1999) 60. O’Hare, E.A., McTear, M.F.: Speech Technology in the secondary school classroom, an exploratory study. Computers and Education 33, 27–45 (1999) 61. Frankish, C., Morgan, P., Hull, R.: Recognition Accuracy And Usability of Pen-Based Interfaces. In: IEE Colloquium on Interfaces - The Leading Edge (Digest No.1996/126), pp. 7/6 - -7/6 (1996) 62. Read, J.C., MacFarlane, S., Horton, M.: The Usability of Handwriting Recognition for Writing in the Primary Classroom. People and computer XVII-Design for life 4, 135–150 (2005) 63. Saad, M.N., Abd. Razak, A.H., Yasin, A., Aziz, N.S.: Redesigning the user interface of handwriting recognition system for preschool children. International Journal of Computer Science Issues 8(1) (2011); ISBN: 1694–0814 64. Allan, J., Allen, T., Sherkat, N.: Confident assessment of children’s handwritten responses. In: Proceedings of the Eighth International Workshop on Frontiers in Handwriting Recognition, p. 508 (2002) 65. Read. C : CobWeb - a Handwriting Recognition Based Writing Environment for Children, http://academic.research.microsoft.com/Publication/5258217/c obweb-a-handwritingrecognition-based-writing-environmentfor-children
Empirical Performance Evaluation of Raster to Vector Conversion with Different Scanning Resolutions Bilal Abdulrahman T. Al-Douri, Hasan S.M. Al-Khaffaf, and Abdullah Zawawi Talib School of Computer Sciences Universiti Sains Malaysia 11800 USM Penang, Malaysia [email protected], {hasanm,azht}@cs.usm.my
Abstract. Empirical performance evaluation of raster to vector conversion is a means of judging the quality of line detection algorithms. Many factors may affect line detection. This paper aims to study scanning resolution of raster images and its effects on the performance of line detection. Test images with three different scanning resolutions (200, 300, and 400 DPI) are vectorised using available raster to vector conversion software. The Vector Recovery Index scores calculated with reference to the ground truth images and the detected vectors are then obtained. These values are analysed statistically in order to study the effects of different scanning resolutions. From the results, Vextractor is found to be better (on average) compared to VPstudio and Scan2CAD. For all the three resolutions, Vextractor and VPstudio perform better than Scan2CAD. Different scanning resolutions affect the software differently. The performance of Vextractor and VPstudio increases from low resolution to moderate resolution, and then decreases with high resolution. The performance of Scan2CAD decreases with the increase in the resolutions. Keywords: Empirical Performance Evaluation, Raster to Vector Conversion, Vector Recovery Index, Statistical Analysis, Visual Informatics.
1 Introduction
Empirical performance evaluation of raster to vector conversion is an important topic in the area of graphics recognition. It is used as a means to judge the quality of line detection algorithms. There are many factors that affect the quality of line detection, and they may lead to a better or poorer line detection rate. Some of these factors have been studied, such as noise level, vectorisation software, and cleaning methods [1], while other factors, such as the scanning resolution of raster images, have yet to be studied. Chhabra and Phillips [2] used research prototypes and commercial software; the performance criterion was the EditCost Index. Liu et al. [3] evaluated the performance of several research prototypes. They used different types of noise (Gaussian noise, high frequency, hard pencil, and geometrical noise). VRI was used to measure the performance of the methods. Only solid circular arcs were included in their tests.
Liu [4] also evaluated research prototype methods, but the test was performed on real images. He used noisy images created by adding salt-and-pepper noise. Shafait et al. [5] evaluated the performance of vectorisation methods on one research prototype and three commercial software packages. They used real scanned images (with no artificial noise). The criterion used was Vectorial Scores. Al-Khaffaf et al. [6] evaluated the performance of vectorisation methods on two research prototypes and three commercial software packages. They used real images scanned from two old textbooks. No artificial noise and no degradation model were used in their tests. In order to focus on the detection ability of the software, the Dv (detection rate) score was used as the performance evaluation criterion. Two tests were performed: a test between different methods and a test between different versions of the same method. This paper aims to study the scanning resolution of raster images and its effects on the performance of line detection. The images are vectorised using available raster to vector conversion software and the Vector Recovery Index (VRI) of the detected vectors is used as the quality criterion. The VRI values are analysed statistically to study the effects of different scanning resolutions.
2 Image Data and Raster to Vector Conversion Methods
Eight images in three resolutions (200, 300, and 400 DPI) were used in this study. The images are mechanical engineering drawings taken from the Arc Segmentation Contest 2011 dataset [7]. The contest was held in conjunction with the IAPR International Workshop on Graphics Recognition 2011 (GREC'11). The focus is on drawings that contain arc segments because these drawings are mathematically more complex than straight lines; the detection of straight lines is also fairly straightforward in the process of performance evaluation and of limited research interest. Many commercial raster to vector methods are available. However, only a number of the available software packages are usable due to their limitations. Important features that should be available in the software to make it usable in our study include: detection of circles/arcs, saving circles/arcs as circular curves rather than polylines, and saving of the detected vectors in the DXF file format. So, three commercial software packages (VPstudio (v9) [8], Scan2CAD (v8) [9], and Vextractor (v3.6) [10]) were used since they are readily available and have been used by other researchers [5] [6] (except for Vextractor, which has not been reported in the literature).
3 Performance Evaluation
VRI was chosen for this study because it is well accepted by other researchers and an implemented tool is readily available. It is calculated as follows [11]:

VRI = (Dv + (1 − Fv)) / 2    (1)

where Dv is the detection rate and Fv is the false alarm rate. The comparison of each detected vector file with the ground truth file was carried out and the VRI score was output using the performance evaluation tool (ArcEval2005).
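The score itself is simple to compute once Dv and Fv are known. The following minimal Python sketch mirrors the weighted combination of Dv and (1 − Fv) in Eq. (1); the function name and the example values are illustrative only, and the actual scores in this study were produced by ArcEval2005.

# Minimal sketch of the VRI combination in Eq. (1).
# d_v: detection rate, f_v: false alarm rate, both in [0, 1].
def vector_recovery_index(d_v: float, f_v: float) -> float:
    # Equal weighting of detection quality and (1 - false alarm rate).
    return 0.5 * (d_v + (1.0 - f_v))

# Hypothetical example: 80% detection with 10% false alarms gives VRI = 0.85.
print(vector_recovery_index(0.80, 0.10))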
4 Experimental Results and Discussions
Each of the eight test images was vectorised using the three commercial software (VPstudio, Scan2CAD, and Vextractor). From each vectorised image and ground truth image pair we got one VRI score. The total number of VRI scores in this experiment is 72 values. The VRI values obtained in the evaluation using VPstudio, Scan2CAD and Vextractor software are shown as bar charts in Figs. 1, 2, and 3, respectively.
Fig. 1. Bar diagram of VRI values using VPstudio (200, 300, and 400 DPI)
Fig. 2. Bar diagram of VRI values using Scan2CAD (200, 300, and 400 DPI)
Fig. 3. Bar diagram of VRI values using Vextractor (200, 300, and 400 DPI)
In this experiment we have two independent variables (vectorisation software and scanning resolution) and one dependent variable (VRI). Since each image produces more than one score and there is more than one independent variable, repeated measures ANOVA is a suitable statistical analysis method to be used in this study. There are three requirements to run this statistical test [1,12]: (a) order effects should be avoided, (b) the data should be normally distributed, and (c) the Sphericity condition should not be violated. The first requirement is assured since the original image is always used during the addition of the treatments (i.e. levels of a factor). In order to validate the second requirement of the test, we performed a normality test by relying on Shapiro-Wilk. The equations below show the null and alternative hypotheses for Shapiro-Wilk.
H0: There is no difference between the distribution of the dataset and the normal. (2)
H1: There is a difference between the distribution of the dataset and the normal. (3)
If ρ (Sig. in Table 1) is less than or equal to .05 then the null hypothesis will be rejected. Otherwise, the null hypothesis will not be rejected. From the normality test (Table 1) it can be seen that ρ for all the cells is greater than .05. Hence, we fail to reject the null hypothesis and we will consider the data to be normally distributed. Hence, the second condition for running repeated measures ANOVA is not violated. The Mauchly's Test is needed to ensure that the Sphericity condition is not violated (satisfying the third condition of repeated measures ANOVA). Only then can we proceed with the next step of the analysis. From the Mauchly's Test (Table 2) it can be seen that the Sphericity condition is not violated (ρ > .05) for the two factors (vectorisation software and scanning resolution) and for the interaction between them. Hence, we proceed with the analysis (see next paragraphs).
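As an illustration of this testing pipeline, the sketch below shows how the Shapiro-Wilk test and a two-way repeated measures ANOVA could be run over the 72 VRI scores in Python; the file name, data-frame layout and column names are assumptions rather than the exact tool used in this study.

import pandas as pd
from scipy.stats import shapiro
from statsmodels.stats.anova import AnovaRM

# Assumed layout: one row per (image, software, resolution) with its VRI score.
scores = pd.read_csv("vri_scores.csv")  # columns: image, software, resolution, vri

# (b) Normality: Shapiro-Wilk per software/resolution cell (cf. Table 1).
for (sw, res), cell in scores.groupby(["software", "resolution"]):
    stat, p = shapiro(cell["vri"])
    print(sw, res, round(p, 3))  # p > .05 -> no evidence against normality

# Two-way repeated measures ANOVA with the image as the subject (cf. Tables 2-3).
model = AnovaRM(scores, depvar="vri", subject="image",
                within=["software", "resolution"]).fit()
print(model)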
Table 1. Tests of Normality (Shapiro-Wilk); dependent variable: Vector Recovery Index

Vectorisation Software   Resolution Factor   df   Sig.
VPstudio 9               200 DPI             8    .707
VPstudio 9               300 DPI             8    .773
VPstudio 9               400 DPI             8    .177
Scan2CAD 8               200 DPI             8    .425
Scan2CAD 8               300 DPI             8    .527
Scan2CAD 8               400 DPI             8    .426
Vextractor 3.6           200 DPI             8    .250
Vextractor 3.6           300 DPI             8    .494
Vextractor 3.6           400 DPI             8    .691
Table 2. Mauchly's Test of Sphericity

Within Subjects Effect    Mauchly's W   Approx. Chi-Square   df   Sig.
Software                  .963          .227                 2    .893
Resolution                .515          3.986                2    .136
Software * Resolution     .250          7.519                9    .602

Note: Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix.
The null and alternative hypotheses for the vectorisation software variable are shown below:

H0: MeanVPstudio = MeanScan2CAD = MeanVextractor    (4)
H1: Not all the means are equal    (5)

The vectorisation software variable is significant, F(2,14) = 8.255, ρ < .05, indicating that the means for the three vectorisation methods are not all the same. Hence we reject the null hypothesis and accept the alternative hypothesis.

Table 3. Pairwise Comparisons
(I) Software   (J) Software   Mean Difference (I-J)   Std. Error   Sig.
VPstudio       Scan2CAD        .143                   .056         .114
VPstudio       Vextractor     -.073                   .057         .727
Scan2CAD       Vextractor     -.217*                  .049         .009

Based on estimated marginal means. Adjustment for multiple comparisons: Bonferroni. *. The mean difference is significant at the .05 level.
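The pairwise comparisons in Table 3 are Bonferroni-adjusted contrasts on estimated marginal means. A rough analogue on per-image means could be sketched as follows; it reuses the assumed data layout from the previous sketch and will not reproduce the reported values exactly.

from itertools import combinations
import pandas as pd
from scipy.stats import ttest_rel

scores = pd.read_csv("vri_scores.csv")  # columns: image, software, resolution, vri

# Average over resolutions so each image contributes one value per software.
means = scores.groupby(["image", "software"])["vri"].mean().unstack()

pairs = list(combinations(means.columns, 2))
for a, b in pairs:
    t, p = ttest_rel(means[a], means[b])
    p_adj = min(1.0, p * len(pairs))  # Bonferroni adjustment
    print(f"{a} vs {b}: mean diff = {means[a].mean() - means[b].mean():.3f}, "
          f"adjusted p = {p_adj:.3f}")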
By investigating Table 3, it can be seen that the mean difference between Vextractor and Scan2CAD is significant. This result can also be seen in Fig. 4. In the figure we can see that Vextractor has better overall results compared to both Scan2CAD and VPstudio. Vextractor is also much better than Scan2CAD.
Fig. 4. Performance of the three software based on scanning resolutions of 200, 300, and 400 DPI
5 Conclusions and Future Work
The efficiency of three raster to vector software packages, namely VPstudio, Scan2CAD, and Vextractor, has been studied and analysed. From the results of the study, it can be concluded that for Vextractor and VPstudio, the moderate resolution (300 DPI) gives a better detection rate compared to the other resolutions. The results have also shown that Scan2CAD works well with the lower resolution and its performance drops with higher resolutions. Based on the study, it has also been found that Vextractor is efficient and robust with respect to different scanning resolutions. Finally, Vextractor has shown a better overall performance compared to VPstudio and Scan2CAD.
References 1. Al-Khaffaf, H.S.M., Talib, A.Z., Abdul Salam, R.: Empirical performance evaluation of raster-to-vector conversion methods: A study on multi level interactions between different factors. IEICE Trans. INF. & Syst. E94-D(6), 1278–1288 (2011) 2. Phillips, I.T., Chhabra, A.K.: A Benchmark for Raster to Vector Conversion Systems. In: Proceedings of SSPR/SPR 1998, pp. 242–251 (1998) 3. Wenyin, L., Zhai, J.: Extended Summary of the Arc Segmentation Contest. In: Blostein, D., Kwon, Y.-B. (eds.) GREC 2001. LNCS, vol. 2390, pp. 343–349. Springer, Heidelberg (2002) 4. Liu, W.: Report of the Arc Segmentation Contest. In: Lladós, J., Kwon, Y.-B. (eds.) GREC 2003. LNCS, vol. 3088, pp. 364–367. Springer, Heidelberg (2004) 5. Shafait, F., Keysers, D., Breuel, T.M.: GREC 2007 Arc Segmentation Contest: Evaluation of Four Participating Algorithms. In: Liu, W., Lladós, J., Ogier, J.-M. (eds.) GREC 2007. LNCS, vol. 5046, pp. 310–320. Springer, Heidelberg (2008) 6. Al-Khaffaf, H.S.M., Talib, A.Z., Osman, M.A., Wong, P.L.: GREC’09 Arc Segmentation Contest: Performance Evaluation on Old Documents. In: Ogier, J.-M., Liu, W., Lladós, J. (eds.) GREC 2009. LNCS, vol. 6020, pp. 251–259. Springer, Heidelberg (2010) 7. Arc Segmentation Contest (2011), http://www.cs.usm.my/ArcSeg2011/downloads.html 8. VPstudio, http://www.softelec.com 9. Scan2CAD, http://www.scan2cad.com 10. Vextractor, http://www.vextractor.com 11. Wenyin, L., Dori, D.: A protocol for performance evaluation of line detection algorithms. Machine Vision and Applications 9(5), 240–250 (1997) 12. Roberts, M.J., Russo, R.: A Student’s Guide to Analysis of Variance. Routledge, Taylor Francis Ltd., United Kingdom (1999)
Visualizing the Construction of Incremental Disorder Trie Itemset Data Structure (DOSTrieIT) for Frequent Pattern Tree (FP-Tree) Zailani Abdullah1, Tutut Herawan2, and Mustafa Mat Deris3 1
Department of Computer Science, Universiti Malaysia Terengganu Faculty of Computer System and Software Engineering, Universiti Malaysia Pahang 3 Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia [email protected], [email protected], [email protected] 2
Abstract. In data mining, visual representation can help in enhancing the ability to analyze and understand techniques, patterns and their integration. Recently, a variety of visualizers have been proposed in the marketplace and in the knowledge discovery community. However, the detailed visualization processes for constructing an incremental tree data structure from its original dataset are rarely presented. Essentially, graphic illustrations of complex processes are easier to understand than complex computer pseudocode. Therefore, this paper explains the visualization process of constructing our incremental Disorder Support Trie Itemset (DOSTrieIT) data structure from a flat-file dataset. DOSTrieIT can be used later as a compressed source of information for building the Frequent Pattern Tree (FP-Tree). To ensure understandability, an appropriate dataset and its processes are graphically presented and explained in detail. Keywords: Visual Representation, Data Structure, Data Mining, Visual Informatics.
1 Introduction
Data mining, also popularly known as Knowledge Discovery in Databases (KDD), is defined as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" [1] and "the science of extracting useful information from large datasets or databases" [2]. Often, these two terms are used interchangeably, but data mining is in fact a part and the core of the KDD process. Hence, KDD is the broader process of finding knowledge from data. It usually emphasizes the high-level application and applies particular data mining techniques. Other synonymous terms referring to data mining are data dredging, knowledge extraction and pattern discovery. Frequent pattern mining is one of the most important topics in data mining. For the past decade, it has attracted much research interest [3-6] due to its good track record in handling several domain applications. The main aim of frequent pattern mining is
to search for interesting relationships among set of items (itemset) in the database repositories. It was first introduced by Agrawal et al. [7] and still playing an important role in data mining. A set of item in pattern (or transaction) is also defined as an itemset. The itemset is said to be frequent, if it appears equal or exceed the predefined minimum supports. Nowadays, more than hundreds of research papers have been published including new or modification of the algorithms. Generally, in term of frequent pattern mining, the result can be obtained from two types of wellknown algorithms; Apriori-based [7] or Frequent Pattern Tree (FP-Tree) [3] based algorithms. FP-Tree data structure is more dominant than Apriori since it is more compact in storing the vast number of transactions. However, from the past researches [8-11], the construction of FP-Tree is inefficient if it relies on the original dataset. One of the promising areas that can be applied in frequent pattern mining is visualization. Visualization is an approach of using the recent technology by applying certain principles in interpreting the data. In term of data mining, it can be classified into three categories [12]. First, apply the visualization techniques that are independence from data mining algorithm. Second, use the visualization to represent pattern and mining result from mining process graphically. Third, integrate both mining and visualization algorithm so that the immediate steps of a data mining algorithm can be visualized. Presently, several variations of visualizations in this area have been proposed. Yang [13] suggested a system to visualize both association rules and frequent patterns. In this system, domain items are sorted according to their frequencies and are evenly distributed along each vertical axis. Munzner [14] designed Powerset Viewer to visualize frequent patterns in the context of the powerset universe. Here, all patterns are grouped together based on cardinality. Leung [15] proposed Frequent Itemset Visualizer to present the frequent patterns by a polyline. A year later, Leung [16] again proposed FpViz to visualize the frequent patterns. At this time, the pattern is presented according to an orthogonally laid out node-link diagram instead of using the polyline diagram. Undeniable, the visual data mining is playing an important role to give the new insights of information and related processes rather than the textual form. However, the details visualization processes that are directly involved in constructing the incremental tree data structure of frequent pattern mining are rarely became a focal point. Therefore, in this paper we visualize the detail processes of constructing our proposed DOSTrieIT data structure in an attempt to replace the original flat-file dataset. Indeed, DOSTrieIT is an incremental tree data structure and it can be used for constructing FP-Tree in the future. Here, we graphically depict a step by step and explain in details the processes involved during the transformation. A suitable dataset that has been widely used in other research papers in frequent pattern mining is also employed. In summary, there are two main contributions from this work. First, we propose a novel and incremental tree data structure; DOSTrieIT that can replace the dependency of using the original transactional database to construct FP-Tree. 
Second, we present the detailed processes of constructing DOSTrieIT from its database using visualization (graphical illustration) rather than the conventional approach (e.g. textual form, pseudocode, etc.).
The rest of the paper is organized as follows. Section 2 describes the preliminaries of association rules. Section 3 discusses the proposed method. This is followed by result and discussion in section 4. Finally, conclusion and future direction are reported in section 5.
2 Association Rules
Throughout this section the set I = {i1, i2, ..., iA}, for A > 0, refers to the set of literals called the set of items, and the set D = {t1, t2, ..., tU}, for U > 0, refers to the data set of transactions, where each transaction t ∈ D is a list of distinct items t = {i1, i2, ..., iM}, 1 ≤ M ≤ A, and each transaction can be identified by a distinct identifier TID. A set X ⊆ I is called an itemset. An itemset with k items is called a k-itemset. The support of an itemset X ⊆ I, denoted supp(X), is defined as the number of transactions that contain X. Let X, Y ⊆ I be itemsets. An association rule between sets X and Y is an implication of the form X ⇒ Y, where X ∩ Y = φ. The sets X and Y are called the antecedent and the consequent, respectively. The support for an association rule X ⇒ Y, denoted supp(X ⇒ Y), is defined as the number of transactions in D that contain X ∪ Y. The confidence for an association rule X ⇒ Y, denoted conf(X ⇒ Y), is defined as the ratio of the number of transactions in D containing X ∪ Y to the number of transactions in D containing X. Thus conf(X ⇒ Y) = supp(X ∪ Y) / supp(X). An itemset X is called frequent if supp(X) > β, where β is the minimum support. The set of frequent items will be denoted as Frequent Items, where Frequent Items = {X ⊂ I | supp(X) > β}.
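To make these definitions concrete, the following short sketch computes support and confidence over a small list of transactions; the data reuses the sample transactions introduced later in Example 2, and the function names are illustrative.

# Support and confidence over a list of transactions (each a set of items).
T = [{1,2,5}, {2,4}, {2,3}, {1,2,4}, {1,3}, {2,3,6}, {1,3}, {1,2,3,5}, {1,2,3}]

def supp(itemset, transactions):
    # Number of transactions that contain every item of the itemset.
    return sum(1 for t in transactions if set(itemset) <= t)

def conf(antecedent, consequent, transactions):
    # conf(X => Y) = supp(X U Y) / supp(X)
    return supp(set(antecedent) | set(consequent), transactions) / supp(antecedent, transactions)

print(supp({1, 2}, T))   # 4 transactions contain both items 1 and 2
print(conf({1}, {2}, T)) # 4/6, about 0.67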
3
Proposed Model
3.1 DOSTrieIT Rudimentary
In this section, the DOSTrieIT data structure is explained in detail. In order to easily comprehend the whole process, some required definitions together with sample transactional data of the proposed model are presented. Definition 1. Disorder Support Trie Itemset (DOSTrieIT) is defined as a complete tree data structure in canonical order of itemsets (or in any form that is free from any support order). The order of the itemsets is not based on support descending order. DOSTrieIT contains n levels of tree nodes (items) and their supports. Moreover, DOSTrieIT is constructed in an online manner and for the purpose of incremental pattern mining. Example 2. Let T = {{1,2,5}, {2,4}, {2,3}, {1,2,4}, {1,3}, {2,3,6}, {1,3}, {1,2,3,5}, {1,2,3}}. The step-by-step construction of DOSTrieIT is explained in the next section. Graphically, an item
is represented as a node and its support is appeared nearby to the respective node. A complete structure of DOSTrieIT is shown as follows in Fig. 1. Definition 3. Single Item without Extension (SIWE) is a prefix path in the tree that contains only one item or node. SIWE is constructed upon receiving a new transaction and as a mechanism for fast searching of single item support. It will be employed during tree transformation process but it will not be physically transferred into the others tree. Example 4. From Example 2, the transactions have 6 unique items and it is not sorted in any order. In Fig. 1, SIWE for DOSTrieIT i.e
SIWE = {2,1,3,4,5,6}
Fig. 1. DOSTrieIT and SIWE Path arranged in descending order
Definition 5. Disorder-Support Itemset Path (DSIP) is a selected prefix path from DOSTrieIT. The selected path must be linked directly to the root and it branches are ignored temporarily. Example 6. From Example 2, the longest prefix path that originated from the root is {1,2,3,5} . Thus, Disorder-Support Itemset Path for the prefix path, i.e.,
DSIP(1,2,3,5) = {1,2,3,5} Definition 7. Parent Item (ParentItem) is an item(s) in DSIP that has an extension or others prefix paths. In DSIP, it may or may not contain the ParentItem. During transformation process to FP-Tree, DSIP will be removed from DOSTrieIT and its ParentItem will be replicated. Example 8. Let used again the example of DSIP(1,2,3,5) . The ParentItem is item 1
and 2. This item is shared by another three prefix paths, (1,3) , (2,4) and (2,5) . While
removing DSIP(1,2,3,5) from DOSTrieIT during transformation process, its ParentItem 1 and 2 is still maintained.
Definition 9. Child Item (ChildItem) is prefix paths that rooted from the ParentItem in DSIP. The ChildItem that is also appeared as part of DSIP is called direct ChildItem. During transforming DOSTrieIT to FP-Tree, the direct ChildItem will be removed from DOSTrieIT but the others (indirect) ChildItem will be remained.
Example 10. For DSIP(1,2,3,5) , the ChildItem are 3 (1,3) , 4 (2,4) and 5 (2,5) . During transformation to FP-Tree, all these ChildItem and their ParentItem will be maintained in DOSTrieIT. Definition 11. Items and Support (ItemSupp) is a list of items and their support. The data in ItemSupp is updated instantly upon receiving a new line of transaction. The items in ItemSupp are flexible and not sorted in support descending order. An itemset X is called frequent item if supp ( X ) > β , where β is the minimum support threshold. Definition 12. Frequent Itemset (FRIT) is a set of frequent items that fulfils the minimum support threshold (β ) and sorted in support descending order. FRIT is generated by extracting the information from SIWE. Example 13. For simplicity, let fixed the minimum support threshold (β ) to 20%.
Thus, itemset in FRIT is {2,1,3,4,5} . Item 6 is excluded since its support value is less than the minimum support threshold. Definition 14. Ordered-Support Itemset Path (OSIP) is a prefix path and it itemset is sorted based on the itemset in FRIT. OSIP is produced by applying the intersection operation between FRIT and DSIP. The itemset arrangement in OSIP is following the arrangement as shown in FRIT. Example 15. Using the similar example, the intersection between FRIT(2,1,3,4,5)
and DSIP(1,2,3,5) is OSIP(2,1,3,5) . The itemset arrangement in OSIP is very important during constructing FP-Tree structure. Definition 16. Minimum Itemset Support (MISupp) is a uniform support value determined from the minimum items supports as appeared in OSIP. Once the MISupp is identified, all items in OSIP will be employed with this value. The support of replicated ParentItem (if any) of DSIP will be also deducted by MISupp. The support of indirect ChildItem will be unchanged. Example 17. Let used the example of DSIP(1,2,3,5) and OSIP(2,1,3,5) . MISupp for
OSIP(2,1,3,5) is 1. The replicated ParentItem of DSIP(1,2,3,5) is item 2 with the original support of 3. Therefore, the latest support of the replicated ParentItem 2 in DOSTrieIT is 2 (3 – MISupp). 3.2 DOSTrieIT Algorithm
This section describes step by step the construction of DOSTrieIT and its algorithm. DOSTrieIT compresses the transactional database into a tree data structure. The tree is constructed online, with only a single scan of the transactional database. The illustration of the tree construction is based on the transactional data presented in Example 2. The fundamental steps of constructing DOSTrieIT are as follows: Step 1, the algorithm starts by scanning the content of the current DOSTrieIT. If it exists, the existing data will be transferred into a SortedList class. SortedList is a collection of key/value pairs that are sorted by the keys. The data in SortedList is a list
of all prefixes paths and their support from the previous transactions. Otherwise, the structure of DOSTrieIT is assigned with null. Step 2, a transaction is a main input in the algorithm. A line of transaction which consists of items is read and transformed into String format. The items in the String format (horizontal) are then converted into vertical ArrayList. ArrayList in a single dimension array. The specification of array size is not required for this type of array and it is auto-extent by itself. Step 3, hash data structure is employed to transform the items (keys) from the ArrayList into the index of an array element. All items in a line of transaction are converted into another hash data structure. At this step, a technique based on the intersection operation between two hashes data structure is employed. This technique helps in reducing the computational times rather than typical brute-force technique. Step 4, if there is no intersection operation taken place between the line of transaction and the existing prefix paths hashes; a new prefix path linked to the root is produced. The support of all items (nodes) in the prefix path is the same and equal to MISupp. In addition, a list of items and their support is created. Once created, the list will be regularly and consistently updated. Step 5, if intersection operation is occurred between the line of transaction and the prefix paths hashes, and if the total number of items of intersection is the same, prefix path will not be produced. The support of involved items (nodes) in the prefix path is increased by summation with the current MISupp. However, if the total number of items of intersection is different, the support of similar intersected items is increased by summation with current MISupp and for dissimilar one, a new prefix path is created and its support is equal to current MISupp.
The final output from the processes is DOSTrieIT. The detailed implementation is shown in the following algorithm.

DOSTrieIT Algorithm
1:  Input: Transaction
2:  Output: DOSTrieIT
3:  if DOSTrieIT ≠ null do
4:      Read DOSTrieIT
5:      PrefixPath ∈ DOSTrieIT
6:      ItemSupp ∈ DOSTrieIT
7:  else
8:      Create DOSTrieIT ← null
9:      PrefixPath ← null
10: end if
11: Read Transaction
12: for DTpath_j ∈ DOSTrieIT(path)
13:     for DTitem_i ∈ DTpath_j(item)
14:         if Line_pattern ∈ Transaction(pattern) ≠ DTpath_j do
15:             Insert PrefixPath from root
16:             for Lineitem_a ∈ Transaction(item) ∉ DTitem_i do
17:                 PrefixPath(item_a, supp_a) ← Line(item_a, supp_a)
18:                 ItemSupp(item_a, supp_a) ← Line(item_a, supp_a)
19:             end for loop
20:         end if
21:         for Lineitem_a ∈ Transaction(item) ∈ DTitem_i do
22:             if Lineitem_a ∈ Transaction(item) = DTitem_i do
23:                 PrefixPath(supp_i) ← PrefixPath(supp_i) + Line(supp_a)
24:                 ItemSupp(supp_i) ← ItemSupp(supp_i) + Line(supp_a)
25:             end if
26:             if Lineitem_a ∈ Transaction(item) ≠ DTitem_i do
27:                 Insert PrefixPath from PrefixPath(item_{i-1})
28:                 PrefixPath(item_a, supp_a) ← Line(item_a, supp_a)
29:                 ItemSupp(item_a, supp_a) ← Line(item_a, supp_a)
30:             end if
31:         end for loop
32:     end for loop
33: end for
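As a rough executable companion to the pseudocode above, the sketch below builds a small support-annotated trie from a list of transactions. It is a simplified approximation of the DOSTrieIT procedure (the SortedList/hash intersection machinery described in Steps 1-5 is omitted), and all names are illustrative.

class Node:
    def __init__(self):
        self.children = {}   # item -> Node
        self.support = 0

def insert(root, transaction, item_supp):
    """Insert one transaction (a sequence of items) into the trie."""
    node = root
    for item in sorted(transaction):            # canonical (numerical) order
        item_supp[item] = item_supp.get(item, 0) + 1   # SIWE-style single-item counts
        node = node.children.setdefault(item, Node())
        node.support += 1                        # shared prefixes accumulate support

def build(transactions):
    root, item_supp = Node(), {}
    for t in transactions:                       # single scan, online insertion
        insert(root, t, item_supp)
    return root, item_supp

# Example 2 / Table 2 transactions:
T = [[1,2,5],[2,4],[2,3],[1,2,4],[1,3],[2,3,6],[1,3],[1,2,3,5],[1,2,3]]
root, item_supp = build(T)
print(item_supp)   # single-item supports accumulated over the nine transactions

Running the sketch on the Table 2 transactions produces a prefix tree whose node supports accumulate over shared prefixes, together with the per-item counts used for SIWE-style lookups.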
4 Result and Discussion
The discussion in this section is based on the transactions shown in Table 2. In general, the items in transactions T1 to T9 will be read, divided and converted into vertical format before being transferred to DOSTrieIT. The visualization shown in Fig. 2 to Fig. 10, together with its elaboration, will help in understanding precisely the step-by-step construction of DOSTrieIT.

Table 2. Transactions for constructing DOSTrieIT

ID   Transactions
T1   1 2 5
T2   2 4
T3   2 3
T4   1 2 4
T5   1 3
T6   2 3 6
T7   1 3
T8   1 2 3 5
T9   1 2 3
Fig. 2. DOSTrieIT Construction for T1
From Fig. 2, starting from the empty tree, the first transaction (T1: 1 2 5) is first arranged in numerical order and then inserted at the root. Three Single Items without Extension (SIWE), namely (1, 2, 5), are also directly inserted from the root. At this stage, all nodes in the tree have an equal support count, which is 1.
Fig. 3. DOSTrieIT Construction for T2
For the second transaction (T2: 2 4) as in Fig. 3, only item 2, is matched with the existing item in SIWE. Thus, the support count of item 2 in SIWE is updated into 2. A new SIWE for item 4 with the support count of 1 will be created. For the first item in transaction T2, it is not matched to any first level items in the tree (SIWE is ignored). Therefore, a new prefix path will be created which is originated from the root and the support count for each item is 1.
Fig. 4. DOSTrieIT Construction for T3
From Fig 4, only item 2 in the third transaction (T3: 2 3) matched with the existing item in SIWE. Therefore, the support count of item 2 in SIWE is now updated into 3. A new SIWE for item 3 with the support count of 1 will be created. For the first item
in transaction T3, it is matched with item 2 in the first level items in the tree (SIWE is ignored). Therefore, a new prefix path which is originated from the item 2 will be created and the support count for item 2 and item 3 will become 2 and 1, respectively.
Fig. 5. DOSTrieIT Construction for T4
For the fourth transaction (T4: 1 2 4) as in Fig. 5, item 1 and 2 are matched with the existing item in SIWE. Thus, the support count for item 1 and 2 in SIWE are updated into 2 and 4, respectively. For the first item in transaction T4, it is matched with the existing first level item. For the next item in transaction T4, it is also matched to the consequence item in the same prefix path. Therefore, the support count for item 1 and 2 in that prefix path are updated into 2 and 2, respectively. Since for the next item (4) in transaction T4 is not matched to the consequence item in the prefix path, thus a new prefix path which is originated from item 2 will be created. The support count for item 4 in this prefix path is 1.
Fig. 6. DOSTrieIT Construction for T5
From Fig. 6, as in the fifth transaction (T5: 1 3), item 1 and 3 are matched with the existing item in SIWE. Thus, the support count for item 1 and 3 in SIWE are updated into 3 and 2, respectively. For the first item in transaction T5, it is matched with the existing first level item. For the next item in transaction T5, it is unmatched to the consequence item in the same prefix path. Thus, the support count for item 1 in that prefix path is updated into 3. Since for the next item (3) in transaction T5 is not matched to the consequence item in the prefix path, thus a new prefix path which is originated from item 1 will be created. The support count for item 3 in this prefix path is 1.
Fig. 7. DOSTrieIT Construction for T6
At Fig. 7, item 2 and 3 in the sixth transaction (T6: 2 3 6), are matched with the existing item in SIWE. Thus, the support count for item 2 and 3 in SIWE are updated into 5 and 2, respectively. For the first item in transaction T6, it is matched with the existing first level item. For the next item in the transaction, it is again matched to the consequence item in the same prefix path. Therefore, the support count for item 2 and 3 in that prefix path are now updated into 3 and 2, respectively. Since for the next item (6) in transaction T6 is not matched to the consequence item in the prefix path, thus a new prefix path which is originated from item 3 will be created. The support count for item 6 in this prefix path is 1.
Fig. 8. DOSTrieIT Construction for T7
For the seventh transaction (T7: 1 3) as in Fig. 8, item 1 and 3 are matched with the existing item in SIWE. Thus, the support count for item 1 and 3 in SIWE are updated into 5 and 3, respectively. For the first item in transaction T7, it is matched with the existing first level item. For the next item in transaction T7, it is also matched to the consequence item in the same prefix path. Thus, the support count for item 1 and 3 in that prefix path is updated into 4 and 2, respectively.
Fig. 9. DOSTrieIT Construction for T8
As shown in Fig. 9, item 1, 2, 3 and 5 in the eighth transaction (T8: 1 2 3 5) are matched with the existing items in SIWE. Thus, the support counts for item 1, 2, 3 and 5 in SIWE are updated into 6, 6, 4 and 2, respectively. For the first item in transaction T8, it is matched with the existing first level item. For the next item in transaction T8, it is also matched to the consequence item in the same prefix path. Therefore, the support counts for item 1 and 2 in that prefix path are updated into 5 and 3, respectively. Since the next item (3) in transaction T8 is not matched to the consequence item in the prefix path, a new prefix path which originates from item 2 will be created. The support count for both item 3 and 5 in this prefix path is 1. For the ninth transaction (T9: 1 2 3) as presented in Fig. 10, item 1, 2 and 3 are matched with the existing items in SIWE. Thus, the support counts for item 1, 2 and 3 in SIWE are updated into 7, 7 and 5, respectively. For the first item in transaction T9, it is matched with the existing first level item. For the next item in transaction T9, it is also matched to the consequence item in the same prefix path. Moreover, the last item in the transaction is also matched to the consequence item in the same prefix path. Therefore, the support counts for item 1, 2 and 3 in that prefix path are now updated into 6, 4 and 2, respectively. Fig. 11 depicts the DOSTrieIT data structure with SIWE sorted in support descending order.
Fig. 10. DOSTrieIT Construction for T9
Fig. 11. Illustration of complete constructing DOSTrieIT based on frequent items
5 Conclusion
Visual representation in data mining plays an important role since it helps in comprehending complex techniques, patterns and their integration. For the past
decades, one of the popular research areas in data mining is frequent pattern mining. A real challenge is how to construct FP-Tree without depending on its original flatfile dataset. Even though, some of them had suggested a new tree data structure, but the details explanation is not graphically presented. Essentially, graphic presentations of the complex processes are easier to be understood as compared to the typical computer algorithm or mathematically notations. Therefore, here we propose a new incremental tree data structure to represent all itemsets called Disordered Support Trie Itemset (DOSTrieIT). To ensure the better understanding of constructing DOSTrieIT, all the detail processes and appropriate sample data are visually illustrated. In a near future, we are going to evaluate and compare the computational performance of transferring the data from DOSTrieIT against the original dataset or other tree data structures into FP-Tree. Acknowledgement. The authors would like to thanks Universiti Malaysia Pahang for supporting this work.
References 1. Frawley, W., Piatetsky-Shapiro, G., Matheus, C.: Knowledge Discovery in Databases: An Overview. AI Magazine, 213–228 (Fall 1992) 2. Hand, D., Mannila, H., Smyth, P.: Principles of Data Mining. MIT Press, Cambridge (2001) 3. Han, J., Pei, H., Yin, Y.: Mining Frequent Patterns without Candidate Generation. In: The Proceedings of the 2000 ACM SIGMOD, pp. 1–12 (2000) 4. Han, J., Pei, J.: Mining Frequent Pattern without Candidate Itemset Generation: A Frequent Pattern Tree Approach. Data Mining and Knowledge Discovery (8), 53–87 (2004) 5. Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W.: New algorithms for fast discovery of association rules. In: The Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, pp. 283–286. AAAI Press (1997) 6. Zheng, Z., Kohavi, R., Mason, L.: Real World Performance of Association Rule Algorithms. In: The Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 401–406. ACM Press (2001) 7. Agrawal, R., Imielinski, T., Swami, A.: Database Mining: A Performance Perspective. IEEE Transactions on Knowledge and Data Engineering 5(6), 914–925 (1993) 8. Cheung, W., Zaïane, O.R.: Incremental Mining of Frequent Patterns without Candidate Generation of Support Constraint. In: Proceeding the 7th International Database Engineering and Applications Symposium, IDEAS (2003) 9. Hong, T.-P., Lin, J.-W., We, Y.-L.: Incrementally Fast Updated Frequent Pattern Trees. International Journal of Expert Systems with Applications 34(4), 2424–2435 (2008) 10. Tanbeer, S.K., Ahmed, C.F., Jeong, B.S., Lee, Y.K.: Efficient Single-Pass Frequent Pattern Mining Using a Prefix-Tree. Information Science 279, 559–583 (2009) 11. Totad, S.G., Geeta, R.B., Reddy, P.P.: Batch Processing for Incremental FP-Tree Construction. International Journal of Computer Applications 5(5), 28–32 (2010) 12. Ankerst, M.: Visual Data Mining with pixel-oriented visualization techniques. In: Proceeding of ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) Workshop on Visual Data Mining San Francisco (2001)
13. Yang, X., Asur, S., Parthasarathy, S., Mehta, S.: A visual-analytic toolkit for dynamic interaction graphs. In: The Proceedings of Knowledge Discovery of Database 2008, pp. 1016–1024 (2008) 14. Munzner, T., Kong, Q., Ng, R.T., Lee, J., Klawe, J., Radulovic, D., Leung, C.K.-S.: Visual mining of power sets with large alphabets, Technical report UBC CS TR-2005-25, Department of Computer Science, The University of British Columbia, Vancouver, BC, Canada (2005) 15. Leung, C.K.-S., Irani, P.P., Carmichael, C.L.: FIsViz: a Frequent Itemset Visualizer. In: Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A. (eds.) PAKDD 2008. LNCS (LNAI), vol. 5012, pp. 644–652. Springer, Heidelberg (2008) 16. Leung, C.K.-S., Carmichael, C.L.: FpViz: A visualizer for frequent pattern mining. In: The Proceedings of VAKD 2009, pp. 30–39 (2009)
The Gradient of the Maximal Curvature Estimation for Crest Lines Extraction Pan Zheng1, Bahari Belaton1, Iman Yi Liao1, and Zainul Ahmad Rajion2 1
School of Computer Sciences, Universiti Sains Malaysia, Minden, Penang 11800 Malaysia 2 School of Dental Sciences Universiti Sains Malaysia, Kubang Kerian, Kelantan 16150 Malaysia [email protected], {bahari,iman}@cs.usm.my, [email protected]
Abstract. Crest lines are one of the many types of feature lines that illustrate the prominent characteristics of an object’s surface. In this study, we investigate one vital component to extract crest lines, the gradient of the maximal curvature. Most of geometry properties required to calculate crest lines can be obtained from the volume data during the process of the surface construction using implicit surface polygonizer. Nevertheless the gradient of the maximal curvature cannot be obtained due to the nature of the surface construction algorithm. Hence we proposed three weight function based methods in accordance with the knowledge of the surface mesh. We implemented our methods and conducted both qualitative and quantitative analysis on our methods to find the most appropriate method. We also put forward a simple filtering mechanism as a post-processing procedure to enhance the accuracy of the crest lines, which is addressed at the end of this study. Keywords: Crest lines, Feature Lines, Gradient, Maximal Curvature, Weight Function, Visual Informatics.
1
Introduction
Crest lines are one of the many types of feature lines that illustrate the prominent characteristics of an object's surface. Various definitions of crest lines can be found from different sources in different domains of research. In this study, a crest line is defined as a type of feature line that illustrates the ridges and ravines of a surface. A ridge is an elevation with a narrow, elongated crest. Geometrically it can be understood as a constricted protruding portion or a line formed by the juncture of two sloping planes. A ravine is basically a valley, that is, a hollow with predominant extent in one direction, when the surface is considered as a landscape. In a similar research context [1, 2], crest lines are defined as the locus of points on a surface whose largest curvature is locally maximal in the associated principal direction. Mathematically, a crest line is formed by connecting the extremal points on the surface.
Practical application of crest lines can be found in geo-science such as terrain synthesis [11]. In biometric field, crest lines are used for identity verification with palm skin [12], finger print recognition [13] and face detection [14] and recovery [15]. Vessel extraction [16], bone illustration [17] and cerebral atrophy analysis are some application of crest lines addressed in medical fields. Some example applications of crest line in the domain of illustrative rendering and None Photorealistic Rendering (NPR) are mentioned in [8, 9].Existing methods for crest lines extraction are typically categorised based on the type of data they used, i.e. volumetric data and mesh data (triangle patches). In the later case, a mesh representing the object is formed first, says using iso-surface technique or derived from laser scanning devices. Crest lines and other feature lines are then extracted from this mesh. In volumetric data case, crest lines are identified and extracted directly from the raw data. Some would argue that this approach provides more accurate results. Not surprising though, most of crest lines or more generally feature lines extraction methods are mesh-based. For instance, Stylianou et. al. [3] extract crest lines from a 3D triangulated mesh using a bivariate polynomial to approximate the surface locally and calculate the principal curvature on every vertex on the mesh and developed a tracing algorithm. Hildebrndt et. al. [4] perform a crest lines extraction on a smooth piecewise linear mesh by utilizing discrete differential operators. Yoshizawa et. al. [5] provide a local polynomial fitting scheme for generating crest lines on dense triangle mesh and later some enhancements are added [6]. One common characteristic of mesh-based methods is that they solely depend on the quality of the meshes, which are the smoothness, dense, hole-free and etc. In practice however, noise/hole free, well shaped and smooth meshes are challenging qualities to fulfil or even to be guaranteed, especially in domains where the complexities of the objects are apparent, like in medical and engineering fields. These methods however, produce reasonable results and they are able to achieve these results at a relatively good speed of execution. In contrast, there has been little research works in the domain of crest lines extraction from volumetric data. One notable research in this domain is the work carried by Thirion et. al. [1]. They proposed a marching line algorithm, to trace and extract crest lines directly from the raw volumetric data. The research illustrates the application of their algorithm to extract crest lines from human skulls. While the results they produced appeared to be in good quality. The formulas with high order derivatives are quite sensitive to noise or regular pattern in data, especially when the extraction procedures are conducted directly on the volumetric data. In the process of surface extraction, we use the implicit surface polygonizer [7, 19] to extract the surface. There are several advantages to use implicit polygonizer than conventional exhaustive volume traversal algorithm i.e. Marching Cubes [20]. Firstly, the complexity of implicit polygonizer is approximated to O(n2),whereas exhaustive algorithms usually have at least O(n3). Secondly, the higher order derivatives of the implicit function can be directly acquired based on the information of the volume data with finite difference methods. 
Thirdly, the size of the traversal cubes of the implicit polygonizer can be easily controlled; hence we can extract surfaces in different resolutions. On account of the nature of the implementation of implicit surface polygonizer, we are not able to procure all the geometry properties required to assemble crest lines of a
surface based on the information from the volume data itself. The key missing element is the gradient of the maximal curvature. In this research, we study this vital component of crest lines extraction and propose three weight-sum-function-based methods that exploit information from the mesh, and we experiment with these methods accordingly; an illustrative sketch of such a weight-sum estimate is given below. Both qualitative and quantitative analyses are carried out. A simple filtering mechanism to eliminate the less important crest lines is provided at the end of the paper.
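As a rough, non-authoritative illustration of the general idea (before the specific angle, distance and intensity-difference weights are defined in Section 3), the following sketch estimates a per-vertex gradient of the maximal curvature as a weighted sum over the first-ring neighbourhood; the inverse-distance weight shown is only a placeholder assumption, not one of the paper's three methods.

import numpy as np

def weighted_gradient_estimate(v0, k0, neighbours, weight=None):
    """Estimate the gradient of k_max at a vertex from its first-ring neighbourhood.

    v0: (3,) numpy array (vertex position), k0: its maximal curvature,
    neighbours: list of ((3,) numpy array, curvature) pairs.
    """
    if weight is None:
        # Placeholder weight: closer neighbours influence the estimate more.
        weight = lambda a, b: 1.0 / (np.linalg.norm(b - a) + 1e-12)

    w = np.array([weight(v0, vi) for vi, _ in neighbours])
    w /= w.sum()                                   # normalised weights
    grad = np.zeros(3)
    for wi, (vi, ki) in zip(w, neighbours):
        d = vi - v0
        grad += wi * (ki - k0) * d / (np.dot(d, d) + 1e-12)  # directional difference quotient
    return grad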
2
Theoretical Background
Crest lines of a surface are formed by connecting the zero crossings of the extremal value e over the vertices of the triangle patches of an isosurface; a minimal sketch of this zero-crossing test on a single triangle edge is given after Fig. 1. Figure 1 gives a general picture of crest lines.
Fig. 1. Crest lines and related geometry properties
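Assuming the extremal value e is already available per vertex, the following small sketch shows how a zero crossing of e could be located along one triangle edge by linear interpolation; it is an illustrative fragment, not the extraction pipeline used in this paper.

def edge_zero_crossing(p0, e0, p1, e1):
    """Return the interpolated point where e changes sign on edge (p0, p1), else None."""
    if e0 == 0.0:                      # the vertex itself lies on the crest
        return p0
    if e0 * e1 >= 0.0:                 # no sign change along this edge
        return None
    t = e0 / (e0 - e1)                 # linear interpolation parameter in [0, 1]
    return tuple(a + t * (b - a) for a, b in zip(p0, p1))

# Hypothetical edge: e flips sign, so a crest-line point lies one third along the edge.
print(edge_zero_crossing((0.0, 0.0, 0.0), -1.0, (1.0, 0.0, 0.0), 2.0))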
The extremal value can be obtained by extremality function, ·
.
(1)
) in maximal principal which is the gradient of the maximal curvature ( k direction( t ). k is a vector that can be written as ⁄∂x , ∂k ⁄∂y , ∂k ⁄∂z . It means the rate of changes of maximal ∂k principal curvature in x, y and z directions, herefrom we call it the gradient of the is shown below, maximal curvature, whereby the maximal principal direction t t
α Δ
S
√Δ f K
f ,α
√Δ f
f ,α
√Δ f
f
.
(2) (3)
where K and S are the Gaussian and mean curvatures. f is the implicit surface function. The subscript of f denotes the derivative in respective direction in x, y and z. The formulas for calculating S, K, α , α , α and also the maximal curvature (denoted as k) are presented in Thirion’s paper [18], which contain high order
The Gradient of the Maximal Curvature Estimation for Crest Lines Extraction
199
derivatives. With respect of the properties of the implicit function, the high order derivatives can be resolved with the instrument of finite difference methods. The calculation of these geometry elements and the error analysis is addressed in [21].
3
Methodology
By considering different facts and their influences in a calculation, weight function can be used to estimate derivatives in a discrete setting. After extracting the surface, the surface mesh provides us several geometry components which can be utilized to approximate the gradient of the maximal curvature. We proposed and investigated three weight function based methods for the estimation of the gradient of the maximal curvature. The components we contemplate are angles, distances and “intensity differences” [22]. The angle is defined as the corner formed by the vectors pointing from origin to the target vertex and the vector pointing from origin to the respective neighbour. The distance is defined as the interspace reach between the target vertex and the respective neighbour. The neighbourhood are simply the first ring adjacent vertices. The definitions are depicted in figure 2 below. “intensity difference” is addressed in a later context of this paper. The reason to choose these components is based on the important concept of geometry invariance. These components are geometry invariants under rigid transformation, which means that these components preserve their intrinsic geometry qualities and provide the stability we desired. The three components are mutually paired in our weight functions.
Fig. 2. First-ring neighbourhood of the target vertex, and the illustration of the angle and the distance
The general weight sum function to estimate the gradient of can be defined as
respect to x direction
200
P. Zheng et al. ∑
.
∑
|
. |
.
(4)
is the maximal curvature of the target vertex. where . is the weight function. is the maximal curvatures of the first-ring neighbourhood. is the total number of the neighbours. The first method of the weight function to approximate the gradient of the maximal curvature considers the angles and the distances. Ergo it is expounded as .
·
1
. .
. .
.
(5)
is the weight control parameter. We believe that the “distance” ( . . ), is inverse proportional to the influence of the neighbour to the target vertex, so is the “angle” ( . . ). Hence, the ctan of the “angle” is direction proportional to the influence ki to k0. So we have . .
.
.
(6)
, and are the coordinates of the target vertex and , and are the coordinates of the respective neighbour. To calculate the weight function of the distance, we can just employ the general formula for Euclidian distance calculation in 3D space. Owing to the fact that the distance is abundantly miniature, we normalized the distance described below .
∑
(7)
⁄ ⁄ . The “distance” function is The similar calculation applies to and the same. The “angle” function needs to be altered corresponding to desired directions. The second method we take the “angle” and the “intensity difference” into the consideration. The “angle” function is defined as same as it in the first method. The new component is the “intensity difference” [22].The idea is based on the fact that the gradient of the maximal curvature value at a vertex is influenced mainly by the vertices which are spatially close (first ring neighbourhood) and have similar “intensity”. Intensity difference is defined to be the projection of the normal on the surface normal . The intensity difference of difference vector two vertices can be calculate using ·
.
(8)
where the is the normal vector of target vertex and the is the normal vector of the respective neighbour. The third method is reckoned with the “distance” function and the “intensity difference” function. The above methods do not only apply to our implementation of implicit polygonizer surfaces. The methods are generalized to estimate the gradient of the maximal curvature of the surface obtained using other mesh surfaces as well, so long as the maximal curvature of each vertex of the mesh surface is known. The methods can also be potentially used for other derivative related geometry properties of a mesh surface.
The Gradient of the Maximal Curvature Estimation for Crest Lines Extraction
4
201
Experiment and Discussion
We implement all three methods on a synthesised implicit surface function with OpenGL and C. there are some crest lines extraction results with our methods shown below.
Fig. 3. Crest lines extraction on an example implicit surface
The first row is the crest lines result extracted based on the gradient of the maximal curvature estimated using the “angle” and the “distance” weight sum function. The second row shows the “angle” and “intensity difference” result. The third row is the “distance” and “intensity difference” results. All three methods are tested with three different control values which is the lambda. It is assigned to be 0.3, 0.5 and 0.7, which are shown in column-wise. As we can see from these qualitative results, the “angle” plays a vital role to ensure the accuracy of the crest lines identified in predominant surface feature; whereas in the third method, some surface features are extracted, but not desired. We also found that the control parameter in the weight function does not affect the visual results much, so we did a quantitative analysis on the errors of different method with different control parameter. We focus our analysis on the first two prime methods which provide more righteous patterns of crest lines. There are two methods and three control parameters, so totally six cases. For the comparison purpose, the surface function we experiment is a mathematical function surface; therefore the ground truth can be established with accurate calculations. We collect the ground truth
202
P. Zheng et al.
data with MATLAB and compare the actual values with our proposed methods for quantitative analysis. Figure 4 presents the comparisons.
Fig. 4. The Average MSE of the gradient of the maximal curvature against the maximal curvature (left: angle-distance method; right: angle-“intensity difference” method)
The Gradient of the Maximal Curvature Estimation for Crest Lines Extraction
203
We divide the overall values of maximal curvature into ten even intervals and calculate the average mean squared error (MSE) in each interval with the formula below.

Average MSE = (1/n) Σv MSEv .  (9)

where n is the number of vectors that fall into the interval and v is the corresponding vector. The gradients of the maximal curvature are vectors, so the mean squared error is calculated on a per-vertex basis with the equation below.

MSEv = [ (ĝx − gx)² + (ĝy − gy)² + (ĝz − gz)² ] / 3 .  (10)
where (ĝx, ĝy, ĝz) is the gradient vector estimated by our methods and (gx, gy, gz) is the ground truth. The histograms in Figure 4 show that the errors mainly manifest in the intervals into which a large number n of vertices fall, which is reasonable. They also show that the “angle” and “intensity difference” combined weight sum function gives better precision. In the qualitative results we can see that crest lines appear in flat regions as well. This is mainly due to the numerical errors accumulated in the calculation of high-order derivatives. Consequently, we need to provide a filtering mechanism to eliminate the crest lines presented in less feature-intensive regions. The filtering method is simple. In each surface triangle patch, we evaluate the value of the maximal curvature. If the maximal curvature of a vertex is close to zero, then we manually set it to zero. The zeroing threshold is defined on a case-by-case basis for different surfaces: if the maximal value of the maximal curvature is m, then we define the zeroing threshold to be m/10 or m/5, depending on the surface feature variations. Subsequently, in the process of crest line extraction, if any vertex of a triangle patch has a maximal curvature of zero, then the triangle patch will not be considered for a crest line drawing operation. Figure 5 shows the filtering result.
Fig. 5. Filtered crest lines display on the smoothed cube surface (left) and crest lines on a more complicated surface (right)
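A minimal sketch of the filtering mechanism described above is given below; the array-based data layout (a per-vertex curvature array and vertex-index triangles) is an assumption made for illustration.

```python
import numpy as np

def zero_small_curvatures(kappa_max, fraction=0.1):
    """Set near-zero maximal curvatures to exactly zero. The threshold is a
    fraction (e.g. 1/10 or 1/5) of the largest maximal curvature on the surface."""
    threshold = fraction * np.max(np.abs(kappa_max))
    filtered = kappa_max.copy()
    filtered[np.abs(filtered) < threshold] = 0.0
    return filtered

def drawable_triangles(triangles, kappa_max):
    """Keep only triangles whose three vertices all have non-zero maximal
    curvature; the remaining triangles are skipped during crest line drawing."""
    return [tri for tri in triangles if all(kappa_max[v] != 0.0 for v in tri)]
```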
5 Conclusion and Future Works
In this study we investigated a method to extract crest lines for an isosurface based on information from both the volume data and the mesh surface. We proposed weight sum function methods for estimating the gradient of the maximal curvature, compared the methods, and identified the most suitable one to apply. In the post-processing phase, we also provided a simple filtering mechanism to enhance the accuracy of the crest line extraction. The end result is convincing. In our future work, we will try to apply this method to real-world data, such as 3D medical data, to extract salient features and facilitate practical applications. Acknowledgement. This study is conducted under the auspices of Research University Grant (1001 / PKOMP / 814109) Universiti Sains Malaysia.
References 1. Thiron, J.P., Gourdon, A.: The 3D Marching Lines Algorithm and Its Application to Crest Lines Extraction. Rapport De Recherche, INRIA. No.1672 (1992) 2. Monga, O., Benayoun, S.: Using Partial Derivatives of 3D Images to Extract Typical Surface Features. Computer Vision and Image Understanding 61(2), 171–189 (1995) 3. Stylianou, G., Farin, G.: Crest Lines Extraction from 3D Triangulated Meshes. In: Hierarchical and Geometrical Methods in Scientific Visualization, pp. 269–281. Springer, Heidelberg (2003) 4. Hildebrandt, K., Polthier, K., Wardetzky, M.: Smooth Feature Lines on Surface Meshes. In: Third Eurographics Symposium on Geometry Processing (SGP 2005), pp. 85–90 (2005) 5. Yoshizawa, S., Belyaev, A., Seidel, H.-P.: Fast and Robust Detection of Crest Lines on Meshes. In: ACM Symposium on Solid and Physical Modelling (SPM 2005), pp. 227–232 (2005) 6. Yoshizawa, S., Belyaev, A., Seidel, H.-P.: Fast, Robust and Faithful Method for Detecting Crest Lines on Meshes. Computer Aided Geometric Design 25, 545–560 (2008) 7. Bloomenthal, J.: Polygonization of Implicit Surfaces. Computer Aided Geometric Design 5, 341–355 (1988) 8. Burns, M., Klawe, J., Rusinkiewicz, S., et al.: Line Drawings from Volume Data. ACM Transactions on Graphics (Proc. SIGGRAPH) 24(3), 512–518 (2005) 9. Yuan, X., Chen, B.: Procedural Image Processing for Visualization. In: Bebis, G., Boyle, R., Parvin, B., Koracin, D., Remagnino, P., Nefian, A., Meenakshisundaram, G., Pascucci, V., Zara, J., Molineros, J., Theisel, H., Malzbender, T. (eds.) ISVC 2006. LNCS, vol. 4291, pp. 50–59. Springer, Heidelberg (2006) 10. Howard Zhou, H., Sun, J., Turk, G., Rehg, G.M.: Terrain Synthesis from Digital Elevation Models. IEEE Transactions on Visualization and Computer Graphics 13(4), 834–848 (2007) 11. Li, Z.J.J., Ferrer, M.A., Travieso, C.M., Alonso, J.B.: Biometric Based on Ridges of Palm Skin Over the Head of the Second Metacarpal Bone. Electronics Letters 42(7), 391–393 (2006) 12. Zhao, Y.L., Jiang, C.F., Xu, W., et al.: New Algorithm of Automation Fingerprint Recognition. In: International Conference on Engineering and Computer Science, pp. 1–4 (2009)
13. Du, T.L.H., Duc, D.A., Vu, D.N.: Ridge and Valley based Face Detection. In: International Conference on Research, Innovation and Vision for the Future, pp. 237–243 (2006) 14. Lengagne, R., Fua, P., Monga, O.: 3D Face Modelling from Stereo and Differential Constraints. In: International Conference on Pattern Recognition, pp. 148–153 (1998) 15. Prinet, V., Monga, O., Rocchisani, J.M.: Multi-dimensional Vessels Extraction Using Crest Lines. In: IEEE Conf. Eng. in Medicine and Bio., vol. 1, pp. 393–394 (1997) 16. Elmoutaouakkil, A., Peyrin, F., Elkafi, J., Laval-Jeantet, A.-M.: Segmentation of Cancellous Bone from High-resolution Computed Tomography Images: Influence on Trabecular Bone Measurements. IEEE Trans. Med. Imaging 21(4), 354–362 (2002) 17. Subsol, G., Roberts, N., Boran, M., et al.: Automatic Analysis of Cerebral Atrophy. Magnetic Resonance Imaging 15(8), 917–927 (1997) 18. Thirion, J.P.: New Feature Points Based on Geometric Invariants for 3D Image Registration. International Journal of Computer Vision 18(2), 121–137 (1996) 19. Pan, Z., Belaton, B., Liao, I.Y.: Isosurface Extraction of Volumetric Data Using Implicit Surface Polygonization. In: Third Asia International Conference on Modelling and Simulation, pp. 555–559 (2009) 20. Lorensen, E., Cline, H.E.: Marching Cubes: A High Resolution 3D Surface Construction Algorithm. Computer Graphics 21(4), 163–169 (1987) 21. Pan, Z., Belaton, B., Liao, I.Y., Rajion, Z.A.: Finite Difference Error Analysis of Geometry Properties of Implicit Surfaces. In: IEEE Symposium on Computers and Informatics, pp. 413–418 (2011) 22. Lee, K.W., Wang, W.P.: Feature-Preserving Mesh Denoising via Bilateral Normal Filtering. In: Ninth International Conference on Computer Aided Design and Computer Graphics, pp. 275–280 (2005)
AdaBoost-Based Approach for Detecting Lithiasis and Polyps in USG Images of the Gallbladder

Marcin Ciecholewski

Institute of Computer Science, Jagiellonian University, ul. Lojasiewicza 6, 30-348 Krakow, Poland
[email protected]
Abstract. This article presents the application of the AdaBoost method to recognise gallbladder lesions such as lithiasis and polyps in USG images. The classifier handles rectangular input image areas of a specific size. If the diameter of the areas segmented is much greater than the diameter expected on the input, wavelet approximation of the input images is used. The classification results obtained by using the AdaBoost method are promising for lithiasis classification. In the best case, the algorithm achieved an accuracy of 91% for lithiasis and of 80% when classifying polyps, as well as an accuracy of 78.9% for polyps and lithiasis jointly. Keywords: Adaptive Boosting classification, Medical Image Analysis, Gallbladder lesions, Ultrasonography (USG), Visual Informatics.
1 Introduction

An estimated 10-15% of the adult population of Europe and the US have gallstones in their gallbladders [9]. Another widespread disease is the presence of polyps in the gallbladder, including cancerous ones, which are found in 5% of the global population [8]. The early detection of gallbladder diseases plays a huge role in improving the efficacy of their treatment. If gallbladder lesions are detected at an early stage of development, the progress of the disease can be stopped, making further treatment easier. Consequently, software supporting the analysis of gallbladder USG images for the early detection of lesions such as gallstones and polyps (including those of cancerous origin) would find ready use. It would be of high significance because, for the gallbladder, there are no practical, off-the-shelf solutions to help physicians in their work. The purpose of this project was to develop an application based on the adaptive boosting method (AdaBoost) [5, 7] for classifying USG gallbladder images, including those showing lesions such as lithiasis or polyps inside the gallbladder shape. Fig. 1(a) shows an image without lesions, while Fig. 1(b) shows an image with a polyp. In Fig. 1(c) lithiasis is visible. Unlike neural networks [12], the boosting method is based on the idea of combining many classifiers into a single reliable one. The method requires a set of weak classifiers, each able to give an answer which may, however, be simplified and contain some degree of error. However, the scheme for training these classifiers has
Fig. 1. Example USG images of the gallbladder. (a) An image of a gallbladder free of lesions. (b) An image showing a polyp inside the gallbladder. (c) An image with visible cholecystolithiasis (gallstone).
been developed so that the error in the final classification is small. Of the many boosting methods, the Adaptive Boosting (AdaBoost) classifier was selected, as it had already been applied to medical image analyses. For example, in publication [10] the AdaBoost classifier was used to locate blood vessel borders in intravascular ultrasound images. In publication [11], two classifiers – AdaBoost and SVM – were combined to distinguish Alzheimer disease lesions from healthy brains in cerebral MRI images. This yielded an 85% accuracy. The section below describes the designed system using the AdaBoost method to classify lesions like lithiasis and polyps. The following section shows the processing of the input USG gallbladder images. The technique of classification with the use of the AdaBoost method is presented in the third section below. The fourth section below presents the experiments conducted and selected research results. The last section provides a summary.
2 System Description

The design of a system enabling lesions of the gallbladder such as lithiasis and polyps to be detected is based on a machine learning method using the AdaBoost classifier. The general structure is presented in Algorithm 1. The method presented in Algorithm 1 comprises several steps. First, every USG image is pre-processed by segmenting the shape of the gallbladder and eliminating the area of uneven contrast background. The approximate edge of the gallbladder is found by applying one of the active contour models [1, 2]. Then, the fragment of the image located outside the gallbladder contour is eliminated from the image. In the second step, a set of rectangular areas is selected from every available image in the database of gallbladder USG images. The selected areas are partitioned into patterns designated for learning and for validation. Then, a set of features is extracted for every pattern. Features from the learning set are used to train the algorithm, which will then be able to take a binary decision about the image features supplied to it. After the learning process, the classifier is validated using features of
the images from the validation set in order to evaluate the accuracy of the classifier. In particular, a 2x2 confusion matrix Ai,j is calculated. The element ai,j of the matrix represents the number of samples belonging to class i that were assigned by the algorithm to class j (i, j ∈ {1 – lesion, 0 – non-lesion}). As finding a lesion is treated as a positive diagnosis, while the lack of lesions is treated as a negative one, the confusion matrix yields four ratios characterising the classifier's behaviour. These are as follows: the true positive ratio, specified as a1,1 / (a1,0 + a1,1); the false negative ratio, a1,0 / (a1,0 + a1,1); the true negative ratio, a0,0 / (a0,0 + a0,1); and the false positive ratio, a0,1 / (a0,0 + a0,1). The true positive ratio is also referred to as the sensitivity of the classifier, and the true negative ratio as its specificity. Finally, the overall accuracy, or the success rate of the classifier, can be quantified as (a1,1 + a0,0) / (a0,0 + a0,1 + a1,0 + a1,1).
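The sketch below shows how these ratios follow from the confusion matrix; the array layout and the example counts are illustrative assumptions, not values from the study.

```python
import numpy as np

def confusion_ratios(a):
    """a[i, j] = number of samples of class i assigned to class j,
    with class 1 = lesion (positive) and class 0 = non-lesion (negative)."""
    tp, fn = a[1, 1], a[1, 0]          # lesion samples classified correctly / wrongly
    tn, fp = a[0, 0], a[0, 1]          # non-lesion samples classified correctly / wrongly
    sensitivity = tp / (tp + fn)       # true positive ratio
    specificity = tn / (tn + fp)       # true negative ratio
    accuracy = (tp + tn) / a.sum()     # overall success rate
    return sensitivity, specificity, accuracy

# Made-up example: 50 lesion samples (41 detected) and 400 lesion-free samples (358 rejected)
A = np.array([[358, 42],
              [9, 41]], dtype=float)
print(confusion_ratios(A))
```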
3 Analysed Data

An image database obtained from the Image Diagnostics Department of the Gdansk Regional Specialist Hospital, Poland, was used in the research on USG image analysis. The database of images used in this research contained 800 images, including 600 images without lesions and 200 images containing them, specifically 90
images showing polyps and 110 images depicting gallbladder stones. USG images of the gallbladder were processed by the histogram normalisation transformation to improve their contrast, and the gallbladder shape was segmented using active contour models [1, 2]. Then the background area of uneven contrast was eliminated from the images. The entire database of 800 gallbladder images was processed in this fashion. For illustration, Figure 2(a) shows an unprocessed USG image depicting lithiasis from the database provided, while Figure 2(b) is the image after the histogram normalisation transformation, with the approximate edge marked. The contour was initiated manually inside the gallbladder shape and on the outside of the single lesion or several lesions present, as illustrated (dashed line) in Figure 2(b). Figure 2(c) is an image of a gallbladder from which the uneven background of the image was
Fig. 2. Pre–processing of the input USG image of the gallbladder. (a) Input image obtained from the USG image database. (b) An image with improved contrast after the histogram normalisation transformation and the edge marked using the active contour method. The dashed line shows manually initiated contour inside the gallbladder shape. (c) A USG gallbladder image with the background eliminated. (d) The marked rectangular image fragment with the gallstone in the centre of the region.
eliminated by setting the pixels located outside the contour to black. For images showing lesions, a rectangular bounding region containing the disease was captured. Then, the centre of the lesion was approximated, after which the rectangular region was cropped so as to contain the lesion at its centre. An example is shown in Fig. 2(d). These transformations generated a set of rectangular regions of various dimensions, every one of which contained a lesion in the centre. The above procedure made it possible to obtain samples of image regions containing lesions. To obtain samples without lesions, a region equal in size to that obtained for a lesion was delineated in images from healthy individuals. As there are more images without lesions, for every sample with a lesion, three samples of the same size free of lesions were identified, coming from different patients. The set of 200 samples with lesions, containing 90 polyps and 110 stones, and 600 samples from images without lesions was divided into two sets:
─ The training set of 300 samples, containing 200 samples without lesions and 100 samples with lesions (60 stones and 40 polyps).
─ The validation set of 500 samples, containing 400 samples without lesions and 100 samples with lesions (50 stones and 50 polyps).
The number of cases with lesions in the validation set represents 25% of all samples in that set. The classifier handles rectangular input image areas of a specific size. If the diameter of the areas segmented is much greater than the diameter expected on the input, wavelet approximation of the input images is used. There are two scenarios for selecting the level of this approximation. Fixed scaling: every sample is downscaled several times using the Daubechies-4 wavelet [3]. Then the centrally located rectangular area of the required size is segmented. The diameter of the lesion is not used. Variable scaling: every sample is downscaled using the Daubechies-4 wavelet; however, the number of subsequent approximations depends on the diameter of the lesion. The number is selected so that the lesion diameter is always smaller than the width of the classifier window, but greater than half of it. Then, the centrally located area of the rescaled sample is segmented. For images showing a potentially healthy gallbladder, the diameter of lesions is immaterial until lesions are detected. This is why a constant threshold of wavelet approximation is used in the first scenario. Under the second scenario, the approximation level is randomly selected from the range of scales found in samples containing lesions. What is more, the distribution of scales approximates the distribution of scales for samples containing lesions. In addition, the AdaBoost method is used to classify image windows of fixed width. The AdaBoost method uses additional filtering of input images to increase the accuracy.
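The following sketch illustrates the two scaling scenarios, assuming the PyWavelets library for the Daubechies-4 approximation and a 24 x 24 classifier window; the halving of the lesion diameter per level and the cropping details are simplifications made for the sketch, not the exact procedure of the system.

```python
import numpy as np
import pywt

def central_crop(img, size):
    """Cut the centrally located square window of the given size."""
    h, w = img.shape
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def fixed_scaling(sample, levels, window=24):
    """Downscale the sample a fixed number of times with the Daubechies-4
    wavelet (keeping only the approximation band) and crop the centre."""
    approx = np.asarray(sample, dtype=float)
    for _ in range(levels):
        approx, _ = pywt.dwt2(approx, 'db4')   # keep approximation coefficients
    return central_crop(approx, window)

def variable_scaling(sample, lesion_diameter, window=24):
    """Downscale until the lesion diameter (assumed to halve per level) fits
    inside the classifier window but stays above half of it, then crop."""
    approx = np.asarray(sample, dtype=float)
    diameter = float(lesion_diameter)
    while diameter >= window:
        approx, _ = pywt.dwt2(approx, 'db4')
        diameter /= 2.0
    return central_crop(approx, window)
```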
4 Boosting Method in Detecting Irregularities

The AdaBoost classifier is adaptive in the sense that every subsequent classifier created is tweaked towards the instances wrongly classified by previous classifiers. The process of training the classifier is summarised in the form of Algorithm 2. During the
training phase, every pattern vector from the training set is weighted. Initially, the weights are identical for all vectors. Then, in every iteration, a weak classifier is trained to minimise the weighted error over all samples. Every iteration causes the weights to be changed by deducting from each weight a value dependent on the error of the weak classifier on the entire training set. However, weights are reduced only for instances which the classifier correctly classified in the current iteration. The weight of a weak classifier in the whole ensemble is also related to its training error. The assignment of non-uniform, time-varying weights to vectors during the training is necessary to minimise the error rate of the final aggregated classifier. During the training, the ability of the classifier to classify items in the training set constantly increases. This is because the weak classifiers used in AdaBoost are complementary: samples of vectors incorrectly classified by some weak classifiers are classified correctly by other weak classifiers.
In particular, for every round t of all rounds T, the weak classifier ht is trained using the training set Tr with the weights Dt. The training set is made up of instances from domain X, marked with labels from set C. The training of the weak classifier is left to an unspecified algorithm WeakLearner, which should minimise the training error εt of the resulting weak classifier with respect to the weights Dt. Based on the error εt of the weak classifier ht, the parameters αt and βt are calculated. The first parameter defines the weight of the classifier ht in the final, joint classifier. The second parameter provides the multiplication constant used to reduce the weights {Dt+1(i)} of the correctly classified instances {i}. Weights of instances that have been wrongly
classified are not changed. Thus, after normalising the new weights {Dt+1(i)}, the relative weights of wrongly classified instances from the training set are increased. Consequently, in round t+1, WeakLearner focuses on those instances and thus the chance increases that classifier ht+1 will learn to classify them correctly. The final strong classifier hfin uses a weighted voting scheme on the results of the weak classifiers ht. The weights of the single classifiers are defined by the constants αt. Two special instances which are treated separately during the operation of the algorithm are distinguished. The first instance is when εt is 0. In this instance, the weights Dt+1(i) are equal to Dt and ht+1 is equal to ht; consequently, the algorithm does not execute the rest of the training process. The second instance occurs when εt ≥ 0.5. In this instance, the theoretical restrictions for ht are not met and the algorithm cannot continue with new training rounds. One of the most important issues in the use of the AdaBoost scheme is the selection of the weak classifier which segregates instances into two distinguishable classes. Following [14], a classifier that selects a single feature from the entire feature vector is used. The training of the weak classifier consists in selecting the best feature and choosing a threshold value for this feature which ensures the best segregation of instances belonging to one class from those belonging to the other. The selection contributes to minimising the weighted error on the training set. The feature set consists of features calculated as differences of the sums of pixel intensities inside two, three or four adjacent rectangles. These rectangles are of different sizes and are located differently inside the image window, but they must adjoin.
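A compact sketch of this training scheme is given below. It implements generic discrete AdaBoost with single-feature threshold stumps in the spirit of [14]; the exponential re-weighting used here is equivalent, after normalisation, to multiplying the weights of correctly classified instances by βt, but it is not a transcription of Algorithm 2.

```python
import numpy as np

def train_stump(X, y, D):
    """Weak learner: pick the single feature, threshold and polarity that
    minimise the weighted error (a decision stump on one feature)."""
    n, d = X.shape
    best = (None, None, 1, np.inf)                 # feature, threshold, polarity, error
    for f in range(d):
        for thr in np.unique(X[:, f]):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, f] - thr) >= 0, 1, -1)
                err = D[pred != y].sum()
                if err < best[3]:
                    best = (f, thr, polarity, err)
    return best

def adaboost(X, y, T=300):
    """Discrete AdaBoost: y in {-1, +1}; returns the weighted stumps."""
    n = len(y)
    D = np.full(n, 1.0 / n)                        # identical initial weights
    ensemble = []
    for _ in range(T):
        f, thr, pol, err = train_stump(X, y, D)
        if err <= 0 or err >= 0.5:                 # the two special cases: stop early
            break
        alpha = 0.5 * np.log((1 - err) / err)      # weight of this weak classifier
        pred = np.where(pol * (X[:, f] - thr) >= 0, 1, -1)
        D *= np.exp(-alpha * y * pred)             # relatively down-weight correct samples
        D /= D.sum()                               # normalise the weights
        ensemble.append((alpha, f, thr, pol))
    return ensemble

def predict(ensemble, X):
    """Weighted vote of the weak classifiers (the final strong classifier)."""
    score = sum(a * np.where(p * (X[:, f] - t) >= 0, 1, -1) for a, f, t, p in ensemble)
    return np.sign(score)
```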
5 Training Phase and Selected Experiment Results

This section contains the research results and an assessment of the AdaBoost algorithm based on the experiments completed. The tests were carried out on the data sets presented in Section 3. The algorithm used the central, rectangular part of the images, with a size of 24 x 24 pixels.

5.1 Input Image Filtering

Before the wavelet scaling is applied, the input image is filtered using various methods. The filters listed below are a group of typical, elementary filters used in image processing. The following filters have been used [6]:
No filtering – no filtering
Laplacian – Laplacian filter
Unsharp – unsharp contrast enhancement filter, i.e., negation of the Laplacian
LoG – Laplacian of Gaussian filter
Dilate – greyscale dilation using a disk structuring element
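The sketch below applies the filters listed above using SciPy; the Gaussian sigma, the disk radius and the unsharp formulation (image minus Laplacian) are illustrative assumptions rather than the parameters used in the experiments.

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Disk-shaped structuring element for the greyscale dilation."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def filter_bank(img, sigma=1.0, radius=2):
    """Return one filtered copy of the input per configuration listed above."""
    img = np.asarray(img, dtype=float)
    lap = ndimage.laplace(img)
    return {
        'No filtering': img,
        'Laplacian': lap,
        'Unsharp': img - lap,                                   # sharpening via the negated Laplacian
        'LoG': ndimage.gaussian_laplace(img, sigma=sigma),      # Laplacian of Gaussian
        'Dilate': ndimage.grey_dilation(img, footprint=disk(radius)),
    }
```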
Table 1. Results of the AdaBoost classifier in different configurations for the validation sets. For fixed wavelet scaling, the level of approximation is specified.

Filtering type | Scaling type | Sensitivity (%) | Specificity (%) | Accuracy (%)
No filtering | Fixed – 3 | 75.2 | 79 | 78.9
No filtering | Fixed – 4 | 80.4 | 73.3 | 74.7
No filtering | Variable | 71.4 | 70.8 | 72.3
Laplacian | Fixed – 3 | 49.1 | 63.7 | 62.2
Laplacian | Fixed – 4 | 42.2 | 79.1 | 75.5
Laplacian | Variable | 51.1 | 66.1 | 64.2
Unsharp | Fixed – 3 | 78.1 | 75.2 | 75.7
Unsharp | Fixed – 4 | 75.2 | 75.8 | 81
Unsharp | Variable | 67.3 | 73.2 | 72.6
LoG | Fixed – 3 | 52.2 | 71.3 | 69.5
LoG | Fixed – 4 | 54 | 79.2 | 77.3
LoG | Variable | 53.1 | 67.2 | 66.3
Dilate | Fixed – 3 | 78.2 | 76.3 | 77
Dilate | Fixed – 4 | 76 | 78.9 | 78.3
Dilate | Variable | 73.5 | 68.2 | 68.9
Test 1. Samples containing 100 lesions and 400 free of lesions

No filtering | Fixed – 3 | 82 | 89.6 | 89.1
No filtering | Fixed – 4 | 77 | 91.8 | 90.5
No filtering | Variable | 94.5 | 90.5 | 90.8
Laplacian | Fixed – 3 | 38.6 | 70 | 68.3
Laplacian | Fixed – 4 | 48.5 | 78.5 | 76.7
Laplacian | Variable | 73.5 | 64.6 | 65.2
Unsharp | Fixed – 3 | 76.5 | 86.8 | 86.5
Unsharp | Fixed – 4 | 83.5 | 91 | 90.6
Unsharp | Variable | 88.6 | 90.7 | 90.6
LoG | Fixed – 3 | 43.5 | 66.4 | 65.5
LoG | Fixed – 4 | 53.5 | 83.8 | 82.3
LoG | Variable | 81 | 63.4 | 64.5
Dilate | Fixed – 3 | 76 | 89.9 | 89.2
Dilate | Fixed – 4 | 86 | 86.7 | 86.3
Dilate | Variable | 86 | 91.3 | 91
Test 2. Samples containing 50 lithiasis and 400 free of lesions

No filtering | Fixed – 3 | 65 | 81.4 | 80
No filtering | Fixed – 4 | 75.5 | 68.4 | 69
No filtering | Variable | 62 | 58 | 59.2
Laplacian | Fixed – 3 | 58.3 | 69.2 | 68.6
Laplacian | Fixed – 4 | 76 | 66.4 | 67.5
Laplacian | Variable | 46.3 | 75.9 | 74.4
Unsharp | Fixed – 3 | 46 | 74.5 | 74
Unsharp | Fixed – 4 | 71 | 68.5 | 68.4
Unsharp | Variable | 41 | 59.3 | 58.4
LoG | Fixed – 3 | 52.1 | 71.7 | 69.8
LoG | Fixed – 4 | 54.8 | 79.4 | 77.6
LoG | Variable | 53.5 | 67.2 | 66.3
Dilate | Fixed – 3 | 78.5 | 76.8 | 77
Dilate | Fixed – 4 | 76 | 78.9 | 78.6
Dilate | Variable | 73.5 | 68.3 | 68.4
Test 3. Samples containing 50 polyps and 400 free of lesions
After the wavelet scaling is completed, the classifier undergoes the training stage. The number of rounds for training the classifier was set to 300. For each filtering type and wavelet scaling mode, a different classifier is trained. For every combination of filtering and scaling method, the total accuracy, sensitivity and specificity are determined. The results of the first test, for the validation set containing 400 samples without lesions, 50 lithiasis and 50 polyps, are presented in Table 1, Test 1. Test 1 demonstrated that the best results were achieved for fixed wavelet scaling. The highest sensitivity (81%) was obtained for the 4-level scaling. The total accuracy of the classifier is the greatest (78.9%) if the 3-level scaling is used. To find which type of lesion contributed to these results, the classifier was trained and then tested only on samples containing gallstones and samples containing no lesions (Test 2), while the third test scenario covered samples containing polyps and instances without lesions (Test 3). The results are presented in Table 1. In polyp detection, the dilation of the image resulted in some improvement of the results; in particular, it increased the sensitivity, at the cost of a slight decrease in specificity, in all three scaling configurations. To summarise the test results, it can be said that the AdaBoost classifier achieved significantly better results for lithiasis than for polyps. This can be attributed to the nature of the polyps presented in USG images, which are highly non-uniform and not as well localised as gallstones. In the best case, the algorithm achieved an accuracy of 91% for lithiasis and of 80% when classifying polyps.
6 Summary and Further Research Directions

This article presents a method of recognising lithiasis and polyps of the gallbladder in USG images, developed for a computer system supporting the early diagnostics of gallbladder lesions. USG images of the gallbladder were first processed by the histogram normalisation transformation to improve their contrast, and the gallbladder shape was segmented using active contour models. Then the background region of uneven contrast was eliminated from the images. Subsequently, the AdaBoost classifier was used to analyse a set of rectangular regions of various sizes. If the diameter of the areas segmented is much greater than the diameter expected on the input, wavelet approximation of the input images is used. Results achieved using the AdaBoost method are promising for lithiasis classification. Classifier accuracy and sensitivity reached 90% for lithiasis using variable scaling and dropped to some 77% for polyps and gallstones jointly when fixed wavelet scaling was used with no filtering, unsharp filtering or dilation. This drop in accuracy and sensitivity in the presence of both types of lesions was due to the classifier's inability to recognise polyps. In recognising only polyps, the algorithm achieved its best accuracy of 80%, but the sensitivity was only 65%
when the same fixed 3-level scaling was used. Further research will focus on the local processing of data from the image, i.e. developing specialised algorithms for analysing gallbladder USG images which will enable image features to be extracted. It is also necessary to develop and apply filters which will make it possible to highlight lesions in USG gallbladder images. Further research will also aim at developing a fast and effective algorithm for selecting suspicious-looking areas within gallbladder USG images in which possible lesions may be present. Acknowledgements. This research was financed with state budget funds for science for 2009-2012 as research project of the Ministry of Science and Higher Education: N N519406837.
References 1. Ciecholewski, M.: Gallbladder Segmentation in 2-D Ultrasound Images Using Deformable Contour Methods. In: Torra, V., Narukawa, Y., Daumas, M. (eds.) MDAI 2010. LNCS (LNAI), vol. 6408, pp. 163–174. Springer, Heidelberg (2010) 2. Ciecholewski, M.: Gallbladder Boundary Segmentation from Ultrasound Images Using Active Contour Model. In: Fyfe, C., Tino, P., Charles, D., Garcia-Osorio, C., Yin, H. (eds.) IDEAL 2010. LNCS, vol. 6283, pp. 63–69. Springer, Heidelberg (2010) 3. Daubechies, I.: Orthonormal bases of compactly supported wavelets. Comm. Pure Appl. Math. 41, 909–996 (1988) 4. Freund, Y., Schapire, R.E.: A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. In: Vitányi, P.M.B. (ed.) EuroCOLT 1995. LNCS, vol. 904, pp. 23–37. Springer, Heidelberg (1995) 5. Freund, Y., Schapire, R.: A short introduction to boosting. J. Jpn. Soc. Art. Intell. 14(5), 771–780 (1999) 6. Gonzalez, R.C., Woods, R.E.: Digital Image Processing, 3rd edn. Prentice Hall (2008) 7. Meir, R., Ratsch, G.: An Introduction to Boosting and Leveraging. In: Mendelson, S., Smola, A.J. (eds.) Advanced Lectures on Machine Learning. LNCS (LNAI), vol. 2600, pp. 118–183. Springer, Heidelberg (2003) 8. Myers, R.P., Shaffer, E.A., Beck, P.L.: Gallbladder polyps: Epidemiology, natural history and management. Can J Gastroenterol 16(3), 187–194 (2002) 9. Portincasa, P., Moschetta, A., Palasciano, G.: Cholesterol gallstone disease, Lancet, vol. 368, pp. 230–239 (2006) 10. Pujol, O., Rosales, M., Radeva, P., Fernandez–Nofrerıas, E.: Intravascular Ultrasound Images Vessel Characterization Using AdaBoost. In: Magnin, I.E., Montagnat, J., Clarysse, P., Nenonen, J., Katila, T. (eds.) FIMH 2003. LNCS, vol. 2674, pp. 242–251. Springer, Heidelberg (2003) 11. Savio, A., García-Sebastián, M., Graña, M., Villanúa, J.: Results of an Adaboost Approach on Alzheimer’s Disease Detection on MRI. In: Mira, J., Ferrández, J.M., Álvarez, J.R., de la Paz, F., Toledo, F.J. (eds.) IWINAC 2009. LNCS, vol. 5602, pp. 114–123. Springer, Heidelberg (2009) 12. Sehad, S., Desarnaud, S., Strauss, A.: Artificial neural classification of clustered microcalcifications on digitized mammograms. In: Proc. IEEE Int. Conf. Syst. Man Cybernet., pp. 4217–4222 (1997) 13. Smola, A.J., Scholkopf, B.: On a kernel-based method for pattern recognition, regression and approximation. Algorithmica 22, 211–231 (1998) 14. Viola, P., Jones, M.J.: Robust real-time face detection. Int. J. Comput. Vision 57(2), 137– 154 (2004)
Assessing Educators’ Acceptance of Virtual Reality (VR) in the Classroom Using the Unified Theory of Acceptance and Use of Technology (UTAUT)

Niwala Haswita Hussin, Jafreezal Jaafar, and Alan G. Downe

Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Perak, Malaysia
[email protected], [email protected], [email protected]
Abstract. Teaching and learning processes can be enhanced through the adoption of Virtual Reality (VR) technology in the classroom. However, educators in Malaysia have been slow to adopt VR approaches to classroom content delivery, and little is known about the specific factors that influence the intention to employ VR in educational environments. The purpose of this paper is to investigate and provide a preliminary analysis of a framework that predicts the level of technology acceptance in a post-secondary institution in Perak, Malaysia. A questionnaire based on factors identified in the Unified Theory of Acceptance and Use of Technology (UTAUT) was administered to 41 tutoring-level educators to measure the effect of Performance Expectancy (PE), Effort Expectancy (EE), and Social Influence (SI) on Behavioral Intention (BI) towards VR. Regression analysis identified that EE and SI significantly influenced BI. The need for future study of technology acceptance in education is discussed.

Keywords: Educators’ acceptance, Virtual Reality, UTAUT, Visual Informatics.
1 Introduction

Traditional classroom techniques have moved with the evolution of technology in terms of integration, adoption, and communication for teaching and learning purposes. It has been reported that the utilisation of computer-mediated communication (CMC) has improved the accessibility and quality of education [1]. Syllabi and programs of study worldwide promote learning through CMC (see, for example, [2], [3], [4], [5], [6]). Virtual Reality (VR) can be considered a form of CMC, since it involves not only the computer but also other electronic communication devices to create a virtual environment (VE). A VE is a scenario in which the user can participate in real time within a situation created by computer technology. The illusions created are very likely to be similar to the
real world. The immersive VR world can be controlled in real time without physical intervention by the users, limiting the risk to users and eliminating difficulties that occur in the real world [7]. It has been used in many areas such as pilot training that provides safe training zones even under extreme conditions, virtual oil drilling activities, medical and healthcare applications, games, and so on. Bonwell called for additional, comprehensive research before such technology can serve as a platform for directing learning in the classroom [8]. Extensive research has to be conducted on technologies based on 3D virtual worlds, in particular on VR enrichment that allows learners to engage with knowledge in ways that are impossible in a normal classroom setting ([9]; [10]; [11]; [12]). One factor that has been understudied relates to the acceptance of VR in the classroom by educators. A number of theories have been proposed to determine user acceptance of technology, for instance the Technology Acceptance Model (TAM). However, in a critical review, Legris et al. contend that a theory covering more variables is needed in order to obtain a broader explanation of technology acceptance and adoption [13]. To address this, the present authors have employed the Unified Theory of Acceptance and Use of Technology (UTAUT) model [14], which integrates eight other models: the Social Cognitive Theory (SCT), the Innovation Diffusion Theory (IDT), the Motivational Model (MM), the Model of PC Utilization (MPCU), the combined Technology Acceptance Model and Theory of Planned Behavior (TAM-TPB), the Technology Acceptance Model (TAM), the Theory of Planned Behavior (TPB), and the Theory of Reasoned Action (TRA).
2 Literature Review

Studies of the diffusion and adoption of Information and Communication Technology (ICT) can be divided into two levels: the individual level and the organizational level. This study is situated at the individual level because it focuses on educators’ acceptance of technology as individuals [15]. The factors that explain or predict technology use have become a main focus of a variety of researchers, and many models have been constructed to forecast technology acceptance. Figure 1 shows the original Unified Theory of Acceptance and Use of Technology (UTAUT) [14]. Venkatesh et al. suggested four direct determinants of user acceptance and usage behavior: Performance Expectancy (PE), Effort Expectancy (EE), Social Influence (SI), and Facilitating Conditions (FC). PE refers to the degree to which individuals believe that using a VR system will help them attain gains in work performance; EE is the degree of ease associated with the use of the VR system; SI is the degree to which individuals perceive that important others believe they should use the new VR system; and FC is the degree to which individuals believe that institutional and technical infrastructures exist to support the use of a VR system. These relations are moderated by gender, age, experience and voluntariness.
Fig. 1. Original UTAUT (Venkatesh et al., 2003)
2.2 Educators’ Acceptance of Technology

There have been many studies of educators’ acceptance of ICT in schools. Some research confirms that technology competence can improve teaching and learning processes, but in real implementations computers are not fully utilised as expected and are left as dummy devices in the classroom; when an educator does use the devices, they are not used effectively. In one study, 81.8% of teachers did not use computer technology during the teaching and learning process, a situation that may be caused by teachers’ lack of confidence and professional development [17]. Sometimes teachers do not realise the benefits of technology for the teaching and learning process. Other explanations found for this situation are that teachers do not have enough technical support and lack self-confidence [18]. Some teachers preferred to use games to assist learning rather than computer technology [19].
3 Research Design

3.1 Unified Theory of Acceptance and Use of Technology (UTAUT)

As illustrated in Figure 2, this study examined whether the determinants PE, EE, and SI have a significant effect on educators’ acceptance of VR. The moderator gender was used as a controlling variable for data analysis purposes. Respondents were not exposed to a prototype of the VR classroom at this stage. Thus, for this preliminary study, Facilitating Conditions (FC) were excluded from the research model, but they will be included in the model during the post-test session.
Fig. 2. Research Model (Performance Expectancy, Effort Expectancy and Social Influence as predictors of Behavioral Intention towards VR, with Gender as moderator)
3.2 Research Hypotheses

A preliminary survey was conducted to obtain an overview from respondents, using the UTAUT questionnaire as a guideline. Data were collected using a questionnaire consisting of demographic questions and several questions for each variable in the research model. Based on the research model, the following research hypotheses were formulated:
H1a: Performance Expectancy has a significant effect on Behavioral Intention.
H1b: Gender significantly moderates the relationship between Performance Expectancy and Behavioral Intention.
H2a: Effort Expectancy has a significant effect on Behavioral Intention.
H2b: Gender significantly moderates the relationship between Effort Expectancy and Behavioral Intention.
H3a: Social Influence has a significant effect on Behavioral Intention.
H3b: Gender significantly moderates the relationship between Social Influence and Behavioral Intention.
4 Research Methodology

4.1 Sample

Participants in this study were college educators at Politeknik Ungku Omar (PUO), Perak. As shown in Table 1, there were a total of 41 participants, 13 males and 28 females, ranging in age from 20 to 50 and above. Most of them had relatively lengthy teaching experience and were attached to 7 different academic departments. A total of 41 usable questionnaires were returned.
Table 1. Demographic Summary for Participants

Categories | Frequency | Percentage
A) Gender
Male | 13 | 32%
Female | 28 | 68%
Total sample size | 41 | 100%
B) Age
20 to 24 | 2 | 5%
25 to 29 | 9 | 22%
30 to 34 | 17 | 41%
35 to 39 | 7 | 17%
40 to 44 | 4 | 10%
50 and above | 2 | 5%
Total sample size | 41 | 100%
C) Working Experience (Teaching)
Less than 1 year | 0 | 0%
1 – 5 years | 3 | 7%
6 – 10 years | 10 | 24%
More than 10 years | 28 | 68%
Total sample size | 41 | 100%
4.2 Measures

A survey instrument was designed to quantify the five constructs in the research model. It comprised two sections: Part A contained demographic information and Part B contained 17 statements based on the UTAUT factors, adapted from the earlier instrument developed by Venkatesh et al. [14]. Please refer to Table 6 in Appendix A for the questionnaire items. Each statement was evaluated on a five-point Likert scale, with 1 = Strongly Disagree to 5 = Strongly Agree. The 17 questions covered the five constructs indicated in Figure 2.

4.3 Reliability of the Instrument

Table 2. Reliability Table
SN | Construct | Cronbach Alpha
1 | Performance Expectancy (PE) | 0.8382
2 | Effort Expectation (EE) | 0.9324
3 | Social Influence (SI) | 0.9103
4 | Behavioral Intention (BI) | 0.8784
All UTAUT constructs, namely Performance Expectancy (PE), Effort Expectancy (EE), Social Influence (SI) and Behavioural Intention (BI), had high Cronbach alpha coefficients of 0.8382, 0.9324, 0.9103 and 0.8784, respectively. Since all alpha values exceed 0.7, the data can be considered acceptable in terms of construct reliability; in other words, the results can be trusted for the next stage of data analysis [21], [22], [23].
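For reference, Cronbach's alpha can be computed directly from the item responses as in the sketch below; the example scores are synthetic and only illustrate the calculation.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x n_items) array of Likert scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                              # number of items in the construct
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Illustrative only: five respondents answering the four EE items (EE7-EE10)
ee_scores = np.array([[4, 4, 5, 4],
                      [3, 3, 3, 2],
                      [5, 5, 4, 5],
                      [2, 3, 2, 2],
                      [4, 5, 4, 4]])
print(round(cronbach_alpha(ee_scores), 4))
```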
5 Results and Discussion

A standard stepwise regression analysis was applied to the collected preliminary data to obtain all the statistical results in this study. The analysis was conducted in phases. In the first phase, the regression analysis was carried out without using a moderator. The analysis gave a low R value of 0.376 and an R2 value of 0.141, implying that the measured variables accounted for about 14% of the variation in reasons for acceptance. Only the relationship between Social Influence and Behavioural Intention was significant, with a coefficient of 0.376 at p < 0.05; the other relationships were insignificant (see Table 3). Thus, H3a was accepted while H1a and H2a were rejected. The results showed that PE was not significant in either situation, with or without the moderator; thus, PE was not a significant predictor of the behavioural intention of the educators. This is contrary to previous studies; such an outcome could be the result of respondents' inability to fully understand the usefulness of the proposed application, i.e. respondents were not aware of the benefits captured by PE. This situation may occur because the respondents could not experience or imagine the scenario of VR adoption in a classroom. This perception could change in a post-test session, in which a prototype of the VR classroom will be developed and respondents can explore it themselves; the result for PE could then be different.

Table 3. Hypothesis Testing Without Moderator

Hypothesis | Independent Variable | Dependent Variable | Standard Coefficient | t | Significance | Moderator | Comment
H1a | PE | BI | 0.153 | 1.035 | 0.307 | None | Rejected
H2a | EE | BI | 0.268 | 1.790 | 0.081 | None | Rejected
H3a | SI | BI | 0.376 | 2.532 | 0.015* | None | Accepted
* p-value < 0.05, ** p-value < 0.01

In the second phase, the regression test was conducted again, this time with gender equal to male included as a moderating variable. There was an observed increase in the R value to 0.629 and in R2 to 0.395, implying that the model accounted for about 40% of the variation in reasons for acceptance. The analysis shows that EE became a predictor of BI, with a coefficient of 0.629 at p < 0.05; the other relationships were insignificant (see Table 4). Thus H2bm was accepted while H1bm and H3bm were rejected. With the moderator gender equal to male, EE was significant in predicting the educators' behavioural intention (BI) towards the use of the VR system in this study. Thus, the proposed system is expected by the male educators to be easy to use.
It should be noted that the educators were expecting an application that was not complex, anticipating for example that functions of the system should be attainable with minimal clicks of the mouse or use of the keyboard.

Table 4. Hypothesis Testing for Male

Hypothesis | Independent Variable | Dependent Variable | Standard Coefficient | t | Significance | Moderator | Comment
H1bm | PE | BI | -0.329 | -1.180 | 0.265 | Gender (Male) | Rejected
H2bm | EE | BI | 0.629 | 2.680 | 0.021* | Gender (Male) | Accepted
H3bm | SI | BI | 0.199 | 0.810 | 0.437 | Gender (Male) | Rejected
* p-value < 0.05, ** p-value < 0.01

The regression test was conducted for the third time, with gender equal to female as the moderating variable. The stepwise method gave only a small difference in the observed result. There was a slight increase in the R value to 0.513 and in R2 to 0.263, implying that the model accounted for about 26% of the variation in reasons for acceptance. SI was still a predictor of BI, with a coefficient of 0.513 at p < 0.05; the other relationships were insignificant (see Table 5). Thus H3bf was accepted while H1bf and H2bf were rejected. SI showed a significant relation with the BI of the educators to use the VR system in both conditions, with and without the moderator. This implies that educators will be likely to use a system if people who are important to them use it or request them to use it. Therefore, a feature that allows educators to share ideas virtually with fellow teachers could encourage them to use the system. Finally, gender, the moderator in this study, significantly moderates the relationships of EE and SI with BI. However, gender failed to have any influence on the relationship between PE and BI.

Table 5. Hypothesis Testing for Female

Hypothesis | Independent Variable | Dependent Variable | Standard Coefficient | t | Significance | Moderator | Comment
H1bf | PE | BI | 0.182 | 1.078 | 0.291 | Gender (Female) | Rejected
H2bf | EE | BI | 0.055 | 0.292 | 0.773 | Gender (Female) | Rejected
H3bf | SI | BI | 0.513 | 3.047 | 0.005** | Gender (Female) | Accepted
* p-value < 0.05, ** p-value < 0.01
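The sketch below illustrates the kind of subgroup regression reported above, regressing BI on PE, EE and SI for the full sample and for each gender separately. The synthetic data, the standardisation and the plain least-squares fit are assumptions for illustration; it does not reproduce the stepwise selection or the significance tests of the statistical package used in the study.

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares on standardised variables: standardised betas and R^2."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardise predictors
    ys = (y - y.mean()) / y.std(ddof=1)                 # standardise outcome
    A = np.column_stack([np.ones(len(ys)), Xs])         # add intercept column
    beta, *_ = np.linalg.lstsq(A, ys, rcond=None)
    r2 = 1 - ((ys - A @ beta) ** 2).sum() / (ys ** 2).sum()
    return beta[1:], r2                                  # coefficients for PE, EE, SI and R^2

# Illustrative synthetic data: 41 respondents with mean construct scores PE, EE, SI and BI
rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(41, 3)).astype(float)
y = 0.2 * X[:, 1] + 0.4 * X[:, 2] + rng.normal(0, 1, 41)
male = rng.random(41) < 0.32
print('All respondents:', ols_fit(X, y))
print('Male subgroup:  ', ols_fit(X[male], y[male]))
print('Female subgroup:', ols_fit(X[~male], y[~male]))
```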
6 Implications for Future Research

This study should be extended to other moderators such as internet experience, age, voluntariness, and learning content. Considerable future research needs to be carried
out to better determine the impact of both direct and indirect factors on the acceptance of VR technology as a classroom tool by educators in both developing and developed environments. In particular, the role of social influence and the potentially important moderating role of voluntariness and learning content should be explored. Future studies could also examine specific facilitating conditions that educators consider instrumental to their acceptance of VR technology within their classrooms.
7 Conclusion

The objective of this paper was to use three key constructs from the UTAUT model, namely performance expectancy, effort expectancy, and social influence, to discover the behavioural intention of educators toward the adoption of a proposed system within the education sector. The study has successfully determined that effort expectancy and social influence predict educators' behavioural intention to use VR applications. Furthermore, the results related to the non-significant effect of the performance expectancy variable underscore the need for detailed consideration of the teaching and learning advantages associated with VR technology as part of the software development process, and for communication of these benefits to educators during system roll-out. In general, it can be concluded that the UTAUT model provides a useful conceptual framework in which to consider technology acceptance questions within the educational domain. Acknowledgements. The authors wish to thank the academic staff of Politeknik Ungku Omar, Perak, Malaysia for their help during the pre-test data collection process. This work is funded by the Ministry of Higher Education, Malaysia.
References [1] Hiltz, S.R., Turoff, M.: Video Plus Virtual classroom For Distance Education: Experience with Graduate Courses (1993) [2] Paulsen, M.F., Rekkedal, T.: The Electronic College: Selected Articles from the EKKO Project. NKI Forlaget, Bekkestua (1990) [3] Wells, R.A.: Computer-mediated communications for distance learning and training. Boise State University, Boise (1990) [4] Hsu, E., Hiltz, S.R.: Management gaming on a computer-mediated conferencing system: A case of collaborative learning through computer conferencing, vol. IV. IEEE Computer Society, Washington, DC (1991) [5] Weedman, J.: Task and non-task functions of a computer conference used in professional education: A measure of flexibility. International Journal of Man-Machine Studies, 303– 318 (1991) [6] Harasim, L., Hiltz, S.R., Teles, L., Turoff, M.: Learning Networks: A Field Guide. MIT Press, Cambridge (1993) (in press) [7] Furht, B.: Handbook of Internet and Multimedia System Application. CRC Press and IEEE Press, Boca Raton, Florida (1998) [8] Bonwell, C.C.: Active Learning: Creating Excitement in the Classroom. Eric Digest (1991) [9] Byrne, C.: Water on tap: the use of virtual reality as an educational tool (1996)
[10] Dede, C., Salzman, M., Loftin, R.: The development of a virtual world for learning Newtonian mechanics. Springer, Berlin (1996) [11] Osberg, K.M.: Constructivism in practice: the case for meaning-making in the virtual world, HITL Report R-97-47 (1997) [12] Dickey, M.D.: Three Dimensional virtual worlds and Distance Learning: Two case Studies of Active Worlds As a medium for Distance Learning, vol. 36 (2005) [13] Legris, P., Ingham, J., Collerette, P.: Why do people use Information Technology? A critical Review of the Technology Acceptance Model. Information and Management 40, 1–14 (2003) [14] Vankatesh, V., Morris, M.G., Davis, G.B., Davis, F.D.: User Acceptance of Information Technology: Toward A Unified View. MIS Quarterly 3, 425–478 (2003) [15] Dasgupta, S., Granger, M., Mcgarry, N.: User acceptance of e-collaboration technology: an extension of the technology acceptance model. Group Decision and Negotiation, 87– 100 (2002) [16] Lim, C.P., Khine, M.S.: Managing teachers’ barriers to ICT integration in Singapore schools. Journal of technology and Teacher Education 14, 97–125 (2006) [17] Bayhan, P., Olgun, P., Yelland, N.: A study of pre-service teachers’ thoughts about computer assisted instruction. Contemporary Issues in Early Childhood 3, 298–303 (2002) [18] Jones, A.: A Review of the research literature on barriers to the uptake of ICT. Becta, Conventry (2004) [19] Becker, H.: How are teachers using computers in instruction? Paper presented at, meetings of the American educational research Association (2001) [20] Ajzen, I.: The theory of planned behavior. Organizational Behavior and Human Decision Processes 50, 179–211 [21] Cronbach, L.J.: Response Sets and Test Validating: Educational And Psychological Measurement 6, 475–494 (1946) [22] Nunnally, J.C., Bernstein, I.H.: Psychometric Theory. Mc Graw Hill, New York (1994) [23] Sekaran, U.: Research Methods For Business: A Skill Building Approach, 4th edn. John Wiley & Sons, New York (2003)
Appendix A

Table 6. Variables/Constructs and Questionnaire Items

Performance Expectancy (PE)
PE1: VR is very useful in presenting new ideas in the classroom
PE2: VR is very useful in encouraging interaction and participation amongst student
PE3: VR is very useful in instilling student’s learning spirit and interest towards course content
PE4: VR is very useful in enables me to get my work done faster
PE5: VR can increase my productivity
PE6: VR can increase my chances for increase in salary

Effort Expectancy (EE)
EE7: VR application/system is easy to use
EE8: Learning how to use VR application/system is easy for me
EE9: Interaction with the VR application/system is clear and easy to understand
EE10: It easy for me to master the VR application/system

Social Influence (SI)
SI11: An individual who influenced my behavior think I should use this system
SI12: An important individual make me to think that I should use this system
SI13: The management is very helpful in helping me to adopt the VR application/system
SI14: In general, the organization support the usage of VR application/system.

Behavioral Intention (BI)
BI15: I intend to use the VR application/system within the next (12) months
BI16: I assumed that I will be using the VR application/system within the next (12) months
BI17: I planned to use the VR application/system within the next (12) months
A Fuzzy Similarity Based Image Segmentation Scheme Using Self-organizing Map with Iterative Region Merging

Wooi-Haw Tan (1), Gouenou Coatrieux (2), Basel Solaiman (2), and Rosli Besar (3)

(1) Faculty of Engineering, Multimedia University, 63100 Cyberjaya, Selangor, Malaysia
(2) LaTIM, INSERM U650, Department ITI, Institut Telecom Bretagne, Technopôle Brest-Iroise, CS 83818, 29238 Brest Cedex 3, France
(3) Faculty of Engineering and Technology, Multimedia University, 75450 Melaka, Malaysia
{twhaw,rosli}@mmu.edu.my, {gouenou.coatrieux,basel.solaiman}@telecom-bretagne.eu
Abstract. This paper presents a new region-based segmentation scheme which considers homogeneous regions as constituted of pixel blocks that are highly similar to their neighborhoods. Based on the postulate that each homogenous region can be represented by an exemplary pixel block, segmentation is done by grouping contiguous pixel blocks whose neighborhoods are highly similar to the exemplary pixel blocks. In our approach, the degree of similarity between one pixel block and its neighborhood is determined via fuzzy similarity, while the exemplary pixel blocks are automatically discovered by Kohonen selforganizing map. The discovered pixel blocks are later used to split the image into its constituent regions. To obtain a more discernible result, a two-stage iterative merging technique based on Region Adjacency Graph (RAG) is applied. The proposed scheme has been evaluated using real images with results that are comparable and in certain cases better than the morphological watershed segmentation. Keywords: Image segmentation, self-organizing map, iterative region merging, Visual Informatics.
1 Introduction

Image segmentation is a process that aims to divide an image into regions of maximum homogeneity depending on a set of predefined image properties. It is usually a crucial prerequisite in image analysis, and the success of other tasks in image understanding depends very much on the adoption of an appropriate segmentation technique [1]. According to Zhang [2], segmentation algorithms can be classified depending on whether the algorithms are edge-based or region-based, and whether a sequential or parallel processing strategy is involved. This classification has produced four groups of image segmentation algorithms, as shown in Table 1 [2].

Table 1. General taxonomy of segmentation algorithms

Classification | Edge-based (discontinuity) | Region-based (similarity)
Parallel process | G1: Edge-based parallel process | G3: Region-based parallel process
Sequential process | G2: Edge-based sequential process | G4: Region-based sequential process
In this paper, we present a variant of the region splitting and merging technique from group G4. Our technique splits an image into homogeneous regions, where each region consists of pixel blocks that are highly similar to their neighborhoods. More specifically, considering that each region can be represented by an exemplary pixel block, region delineation is performed by aggregating the pixel blocks whose neighborhoods are highly similar to the exemplary pixel block. To select the most probable exemplary pixel blocks from the image, we make use of the Kohonen self-organizing map (SOM). The Kohonen SOM is a type of artificial neural network that is trained using competitive and unsupervised learning. It converts complex, nonlinear statistical relationships between high-dimensional data into simple geometric relationships on a low-dimensional map [3]. Due to this appealing attribute, SOM has been widely adopted for image processing [4], [5], [6]. Homogeneity is an essential property that is often used as the main criterion in region-based segmentation. The criteria for homogeneity can be based on average grey level, color properties, texture properties, shape, model, etc. [7]. In our approach, homogeneity is evaluated in terms of pixel block similarity, which is a measure of spatial grey level continuity between a pixel block and its neighboring blocks [8]. Similar to other region splitting and merging techniques, our scheme initially produces an over-segmented image; region merging is then carried out in a third phase to obtain a more discernible result. In this work, we apply a two-stage iterative merging based on a region adjacency graph (RAG) [9] to produce a more discernible final segmentation.
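As an illustration of how a SOM can supply candidate exemplary pixel blocks, the sketch below trains a small one-dimensional Kohonen map on block feature vectors and returns its codebook; the block size, map size and learning schedule are assumptions made for the sketch and not the configuration used in this scheme.

```python
import numpy as np

def extract_blocks(img, block=8):
    """Cut the image into non-overlapping block x block patches (feature vectors)."""
    h, w = img.shape
    patches = [img[r:r + block, c:c + block].ravel()
               for r in range(0, h - block + 1, block)
               for c in range(0, w - block + 1, block)]
    return np.asarray(patches, dtype=float)

def train_som(data, n_units=16, iters=2000, lr0=0.5, sigma0=4.0, seed=0):
    """Train a 1-D Kohonen SOM; the final codebook vectors can serve as
    candidate exemplary pixel blocks."""
    rng = np.random.default_rng(seed)
    codebook = data[rng.choice(len(data), n_units, replace=False)].copy()
    positions = np.arange(n_units)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        winner = np.argmin(((codebook - x) ** 2).sum(axis=1))   # best matching unit
        lr = lr0 * (1 - t / iters)                               # decaying learning rate
        sigma = max(sigma0 * (1 - t / iters), 0.5)               # shrinking neighbourhood
        h = np.exp(-((positions - winner) ** 2) / (2 * sigma ** 2))
        codebook += lr * h[:, None] * (x - codebook)             # pull neighbours towards x
    return codebook
```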
2 Image Segmentation via Fuzzy Similarity Measure

This technique was first presented to find the similarity degree of a pixel block with its 8 neighboring blocks [8]. It was later employed for image segmentation with promising results [10]. Consider a block of N×N pixels, B0, and its 8 neighboring blocks as depicted in Fig. 1(b). Let Θ = {B1, B2, …, B8} be the ordered set of the 8 neighboring blocks of B0 and Ω = {C1, …, Ci, …, CM} be the universal set of pixel blocks. Given the membership function of similarity degree μ : Ω → [0, 1], the similarity strength of a pixel block Ci in the context of Θ defines a fuzzy set “A/Θ” over Ω, with the membership function μA/Θ(Ci). A similarity degree near or equal to 1 implies that the block is more similar in spatial context to its neighborhood.
Fig. 1. (a) An image comprises pixel blocks. (b) A pixel block with its eight neighbours. (c) A pixel block is divided into 4 sub-blocks. (d) A sub-block with five surrounding sub-blocks.
228
W.-H. Tan et al.
For a more precise description of its contents, we may divide a pixel block into 4 sub-blocks {SB_i, i = 1, ..., 4}, with each sub-block representing one of its four quadrants, as illustrated in Fig. 1(c). Next, a sub-block similarity is defined to indicate the amount of spatial mean grey level continuity between adjacent sub-blocks. Two sub-blocks are perceptually similar when their mean intensities, m_i and m_j, are close enough. This can be assessed by the following function:
μ_similar(SB_i, SB_j) = 1 − |m_i − m_j| / L .    (1)
where L corresponds to the maximum value of the dynamic range of the image (L = 255 for an 8-bit image). As depicted in Fig. 1(d), each sub-block is surrounded by 5 sub-blocks {SB_i(j) | j = 1, 2, …, 5} from 3 neighboring blocks. Herein, a sub-block SB_i is similar to its surrounding sub-blocks if it is similar to at least one of them. Based on this notion, the similarity strength of a pixel block B_0 with its neighborhood is maximized when all its elementary sub-blocks exhibit spatial mean grey level continuity. Thus, the similarity strength of B_0 with its neighborhood Θ can be stated as:
μ_{A/Θ}(B_0) = min_i max_j ( μ_similar(SB_i, SB_i(j)) ),  i = 1, ..., 4,  j = 1, ..., 5 .    (2)
In order to decide whether a block is spatially similar to its neighborhood Θ, the similarity measure μ_{A/Θ}(B_0) can be compared to a threshold Ψ:

S_Ψ(B_0) = { 0  if μ_{A/Θ}(B_0) < Ψ
           { 1  otherwise .    (3)
where S_Ψ(B_0) = 1 implies that the block is considered similar to its neighborhood, while S_Ψ(B_0) = 0 implies that the block is dissimilar to its neighborhood, and Ψ is a preset threshold known as the similarity threshold. Due to its inherent behavior of comparing the uniformity of a pixel block and its neighborhood, the fuzzy similarity measure can be used as a criterion for homogeneity in region-based segmentation. If a pixel block has been identified as representative of a region of the image, the regions with similar pixel blocks can be found by using Equations (2) and (3), respectively. If two or more exemplary pixel blocks have been identified, they compete with each other in finding the most similar region. In this case, it is not required to predefine the similarity threshold. Instead, each exemplary pixel block is uniquely labeled. Pixel blocks of the image are then tagged with the label of the exemplary pixel block that produces the highest similarity. Once this is done, contiguous pixel blocks with the same label form a uniform region. Thus, by identifying all exemplary pixel blocks that are representative of the image content, an image can be partitioned into its constituent regions. The aforementioned approach operates on the basis of pixel blocks. If each sub-block consists of n×n pixels, the basic segmentation unit would be a pixel block encompassing 2n×2n pixels. Hence, the segmented image would be coarse with jagged boundaries. To apply the fuzzy similarity measure at pixel level for a finer
segmentation result, the image can be over-sampled by subdividing each pixel of the original image into a block of 2n×2n sub-pixels. This resampling is equivalent to a 2n-fold magnification using interpolation [11]. It has the effect of smoothing the image and minimizing aliasing along lines and edges [12], where each interpolated pixel is influenced by the neighbouring pixels in the original image. If the original image has a size of NX×NY, the resampled image will consist of (2n·NX)×(2n·NY) sub-pixels in NX×NY blocks.
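As an illustration of Equations (1)–(3), the block-level similarity strength and threshold decision could be computed as in the minimal sketch below; the block layout, the way the surrounding sub-block means are gathered, and the threshold value Ψ = 0.9 are assumptions of this example rather than values from the paper.

import numpy as np

L = 255  # dynamic range of an 8-bit greyscale image (assumption)

def sub_block_means(block):
    # Mean grey level of the four quadrant sub-blocks of a 2n x 2n pixel block
    n = block.shape[0] // 2
    return [block[:n, :n].mean(), block[:n, n:].mean(),
            block[n:, :n].mean(), block[n:, n:].mean()]

def mu_similar(m_i, m_j):
    # Equation (1): sub-block similarity from mean intensities
    return 1.0 - abs(m_i - m_j) / L

def similarity_strength(block, neighbour_sub_means):
    # Equation (2): min over the 4 sub-blocks of the max similarity to the
    # five surrounding sub-block means (assumed to be pre-collected per sub-block)
    strengths = []
    for i, m_i in enumerate(sub_block_means(block)):
        strengths.append(max(mu_similar(m_i, m_j) for m_j in neighbour_sub_means[i]))
    return min(strengths)

def is_similar(block, neighbour_sub_means, psi=0.9):
    # Equation (3): hard decision against a preset similarity threshold psi
    return 1 if similarity_strength(block, neighbour_sub_means) >= psi else 0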
3 Segmentation Using Fuzzy Similarity with Kohonen Map
Our approach to segmentation can be divided into three main phases. The flowchart in Fig. 2 depicts the processes involved and the relationships between the phases. The proposed technique operates on the basis of pixel blocks. In order to apply it at pixel level, the input image is resampled via interpolation in the preprocessing phase. Since a larger interpolation factor tends to over-smooth the image, we have empirically chosen to resample the image by a factor of 4 using bicubic interpolation. Each pixel in the original image is resampled to a block of 4×4 sub-pixels, with 2×2 sub-pixels in each sub-block. For an input image of size NX×NY, the resampled image comprises 4NX×4NY sub-pixels in NX×NY pixel blocks.
Fig. 2. Flowchart of the processes in the proposed technique
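For the preprocessing step above, the 4x bicubic resampling could be performed with a standard library call, for example as sketched below; the use of OpenCV and the file name are assumptions of this illustration, not part of the original implementation.

import cv2

# Resample the greyscale input by a factor of 4 with bicubic interpolation,
# so every original pixel becomes a 4x4 block of sub-pixels.
image = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file name
resampled = cv2.resize(image, None, fx=4, fy=4, interpolation=cv2.INTER_CUBIC)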
In the region splitting phase, the exemplary pixel blocks are automatically identified by applying the Kohonen self-organizing map. The Kohonen map employed in the proposed approach consists of 16 nodes arranged in a 4×4 regular grid. Each node n is associated with a 4-dimensional weight vector, w_n = [w_n1, w_n2, w_n3, w_n4]^T. The training begins by initializing the weight vectors with randomly selected input vectors to facilitate convergence towards vectors typical of the input space [6]. During the training, a pixel block from the input image is randomly sampled as the input vector, x = [x_1, x_2, x_3, x_4]^T. All nodes compete with each other to match the input vector by means of their weight vectors. In our approach, the Euclidean distance is used as the measure of match between an input vector x and a weight vector w_n:
||x − w_n|| = √( Σ_{j=1}^{4} (x_j − w_nj)² ) .    (4)
The best matching node c, whose weight vector gives the smallest Euclidean distance, is determined using the following expression:
c = arg min_n ||x − w_n|| .    (5)
The best matching node is rewarded by altering its weight vector to resemble the input vector. In addition, the weight vectors of its neighbors are adjusted to become more like the input vector, which tends to smooth the weight vectors of the nodes in the neighborhood. The weight vectors of the best matching node and its neighbors at the k-th iteration are updated by the following function:
w_n(k + 1) = w_n(k) + h_cn(k) · [x(k) − w_n(k)] .    (6)
where hcn is a neighborhood kernel defined over the best matching node c. In our case, the kernel for neighborhood smoothing is written in terms of the following Gaussian function:
h_cn(k) = α(k) · exp( −||p_c − p_n||² / (2σ²(k)) ) .    (7)
where pc and pn are the location vectors of nodes c and n, σ(k) is the radius of the neighborhood and α(k) is the learning rate factor. For convergence, it is necessary to ensure that hcn → 0 when k → K, where K is the total number of iterations for the training process [13]. Thus, α(k) should start with a value close to unity and decrease monotonically thereafter. To do this, we use the function below:
α(k) = exp( −k / (K/5) ) .    (8)
If the kernel is too small, the map will not be ordered globally. To avoid this, the initial radius of the kernel is set to half of the width of the map. As training progresses, it shrinks until it encloses only the best matching node. This is similar to the process of coarse adjustment followed by fine tuning. To achieve this, the radius of the kernel can be modeled using the monotonically decreasing function below:
σ(k) = σ_0 · exp( −k / (K / ln σ_0) ) .    (9)
where σ_0 is the initial radius of the kernel. Similar to other stochastic techniques, the accuracy of the process depends on the number of steps. For good statistical accuracy, the number of steps should be at least 500 times the number of nodes [13]. This can be achieved by using all pixel blocks in the image for training [6]. For an NX×NY image, this gives a total of NX×NY steps in each iteration. After the training, the weight vectors are arranged in a spatially
ordered map that contains sets of dominant pixel blocks that are most descriptive of the input image. We found that no improvement could be achieved with a Kohonen map larger than 4×4. A larger Kohonen map increases the complexity of the segmentation result, which hampers the subsequent process of producing a concise segmentation. Since the learning rate decreases to a very small value after a large number of iterations, only minor improvement can be obtained with a higher number of iterations. Empirically, satisfactory results can be obtained in 50 iterations. The final Kohonen map after the training process is used as a dictionary of exemplary pixel blocks for image segmentation. More precisely, each weight vector of the Kohonen map corresponds to an exemplary pixel block. Region splitting is then conducted by calculating the similarity strength of each exemplary pixel block with respect to the neighborhood of every pixel block in the resampled image. Each exemplary pixel block is labeled and each pixel block is tagged with the label of the exemplary pixel block that produces the highest similarity. The pseudo-code of the region splitting algorithm is given below:

Pseudo-code 1: Region splitting
INPUT: Resampled image (IR), Kohonen Map (W)
FOR i = 1, NX×NY DO
    FOR j = 1, NZ DO            /* NZ is the number of nodes in W */
        Calculate similarity strength of wj with neighborhood of pixel block Bi in IR
                                /* wj is the weight vector of the jth node in W */
        Store similarity strength
    ENDFOR
    Pr(i) = label of weight vector with highest similarity strength
ENDFOR
OUTPUT: Initial segmented image (Pr)
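A sketch of the SOM training described by Equations (4)–(9) is given below (the splitting itself follows Pseudo-code 1 above); the 4×4 grid and 50 iterations follow the text, while the data layout, random seed and the per-iteration schedule details are assumptions of this illustration.

import numpy as np

def train_som(block_vectors, grid=4, iterations=50, seed=0):
    # block_vectors: array of shape (num_blocks, 4), one 4-D vector per pixel block
    rng = np.random.default_rng(seed)
    num_nodes = grid * grid
    # Node positions on the 2-D map; weights initialised from randomly chosen inputs
    positions = np.array([[r, c] for r in range(grid) for c in range(grid)], float)
    weights = block_vectors[rng.choice(len(block_vectors), num_nodes)].astype(float)

    K = iterations
    sigma0 = grid / 2.0                                      # initial kernel radius
    for k in range(K):
        alpha = np.exp(-k / (K / 5.0))                       # Equation (8)
        sigma = sigma0 * np.exp(-k / (K / np.log(sigma0)))   # Equation (9)
        for x in block_vectors[rng.permutation(len(block_vectors))]:
            dists = np.linalg.norm(weights - x, axis=1)      # Equation (4)
            c = np.argmin(dists)                             # Equation (5)
            d2 = np.sum((positions - positions[c]) ** 2, axis=1)
            h = alpha * np.exp(-d2 / (2.0 * sigma ** 2))     # Equation (7)
            weights += h[:, None] * (x - weights)            # Equation (6)
    return weights  # each row acts as an exemplary pixel-block vector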
The initial segmentation output is over-segmented since the Kohonen map only considers the population of the pixel blocks, regardless of their spatial distribution. To obtain a more discernible result, region merging can be applied. In our approach, region merging is performed based on the region adjacency graph (RAG) technique proposed by Haris et al. [9]. Since its debut, this technique has inspired a number of other RAG-based region merging algorithms [14], [15], [16], [17]. A RAG is a data structure used to represent relations between regions in a segmented image. Each region is represented by a node. A pair of nodes is connected by an edge if the nodes represent adjacent regions. A merging cost is assigned to each edge in the RAG, denoting the dissimilarity between two adjacent regions. In our case, we use the dissimilarity function proposed by Haris et al. [9], given by:
δ(R_i, R_j) = C_ij · [M(R_i) − M(R_j)]² .    (10)
where M(R_i) and M(R_j) are the mean intensities of regions R_i and R_j, respectively, and C_ij is the cardinality factor given by:
C_ij = ( ||R_i|| · ||R_j|| ) / ( ||R_i|| + ||R_j|| ) .    (11)
where ||R_i|| and ||R_j|| are the numbers of pixels in regions R_i and R_j. For region merging, we employ an iterative approach where the most similar region pair is merged in each iteration. Region merging starts with the construction of the RAG for the initial segmented image. If there are r regions in the initial segmentation, the pseudo-code to obtain a resulting segmentation of s regions is given below:

Pseudo-code 2: Region merging
INPUT: Region adjacency graph (RAG), initial segmented image (Pr), original image (I)
FOR i = 1 to r DO               /* r is the number of regions in Pr */
    Determine adjacent regions of region Ri from RAG
    FOR j = 1 to Ni DO          /* Ni = number of adjacent regions for region Ri */
        Calculate merging cost for region Ri and Rij
                                /* Rij is the jth neighbour of Ri */
        Add Ri, Rij and their merging cost in cost table
    ENDFOR
ENDFOR
t = 0
REPEAT
    Determine region pair with the lowest merging cost
    Merge the region pair to form new region
    Combine adjacent regions of the region pair
    Calculate merging costs of new region and its adjacent regions
    Update cost table
    Update RAG
    t = t + 1
UNTIL (r-t) == s
OUTPUT: Segmented image after merging (P'_s)
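For illustration, the first merging stage (Pseudo-code 2 with the merging cost of Equations (10)–(11)) might be sketched as follows; the dictionary-based region representation and the linear search for the cheapest pair are simplifying assumptions, not the data structures used in the paper.

def cardinality_factor(size_i, size_j):
    # Equation (11): favours merging small regions
    return (size_i * size_j) / (size_i + size_j)

def merging_cost(region_i, region_j):
    # Equation (10); each region is a dict with 'mean' intensity and 'size' (assumed layout)
    c = cardinality_factor(region_i["size"], region_j["size"])
    return c * (region_i["mean"] - region_j["mean"]) ** 2

def merge_regions(regions, adjacency, target):
    # Iteratively merge the cheapest adjacent pair until `target` regions remain.
    # regions: dict label -> {'mean', 'size'}; adjacency: dict label -> set of labels
    while len(regions) > target:
        i, j = min(((a, b) for a in adjacency for b in adjacency[a] if a < b),
                   key=lambda p: merging_cost(regions[p[0]], regions[p[1]]))
        si, sj = regions[i]["size"], regions[j]["size"]
        regions[i] = {"mean": (regions[i]["mean"] * si + regions[j]["mean"] * sj) / (si + sj),
                      "size": si + sj}
        del regions[j]
        # Update the region adjacency graph: j's neighbours become i's neighbours
        for n in adjacency.pop(j):
            if n != i:
                adjacency[n].discard(j)
                adjacency[n].add(i)
                adjacency[i].add(n)
        adjacency[i].discard(j)
    return regions, adjacency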
Using the dissimilarity function from Equation (10), the number of regions is greatly reduced and the resulting segments are more discernible. However, large spurious regions may still be observed in the image. This is attributed to the cardinality factor which favors the merging of small regions by penalizing the merging costs of large regions. In this case, a second region merging is performed in our approach to improve the results. Since the large spurious regions are similar in intensity, region average intensity can be used as a criterion for merging. However, this may lead to the merging of two legitimate regions with similar intensities but separated by a narrow ridge. To overcome this problem, we counterbalance average intensities with a second criterion based on crack edges. A crack edge is the border between a pixel and one of its four neighbors [7]. Each pixel has four crack edges and crack edges can be discerned between two regions. Illegitimate merging is avoided by using the average
strength of the crack edges along the common border between regions. The stronger the average strength, the less likely the regions are to be merged. Hence, average intensity and edge control are combined in defining a penalty function, given by:
ρ(R_i, R_j) = M(E_ij) + (1 − D_ij) · M(E_ij) + |M(R_i) − M(R_j)| .    (12)
where Eij is the crack edges along the common boundaries between region Ri and Rj, M(Eij) is the mean strength of the crack edges, and M(Ri) and M(Rj) are the mean intensities of region Ri and Rj, respectively. Dij is the crack edge to perimeter ratio given by:
D_ij = ||E_ij|| / min(l_i, l_j) .    (13)
where l_i and l_j are the perimeter lengths of regions R_i and R_j, respectively. The second region merging is then applied using the same iterative process described in Pseudo-code 2. This time, the penalty function from Equation (12) is used to calculate the merging costs. In each iteration, the region pair with the lowest penalty is merged until the desired number of regions is obtained.
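A hedged sketch of the second-stage penalty of Equations (12)–(13) follows; the way crack-edge strengths and perimeter lengths are supplied is an assumed data layout, not the paper's implementation.

def crack_edge_ratio(num_crack_edges, perimeter_i, perimeter_j):
    # Equation (13): common-border length relative to the shorter perimeter
    return num_crack_edges / min(perimeter_i, perimeter_j)

def merging_penalty(mean_i, mean_j, crack_edge_strengths, perimeter_i, perimeter_j):
    # Equation (12): combines the mean crack-edge strength, the border/perimeter
    # ratio and the difference of region mean intensities
    m_e = sum(crack_edge_strengths) / len(crack_edge_strengths)
    d_ij = crack_edge_ratio(len(crack_edge_strengths), perimeter_i, perimeter_j)
    return m_e + (1.0 - d_ij) * m_e + abs(mean_i - mean_j)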
4 Experimental Results and Discussion
Initially, the proposed approach was tested on the well-known Shepp-Logan phantom, a synthetic medical phantom consisting of 14 regions, as shown in Fig. 3. Our scheme was compared against the popular watershed segmentation, as both techniques create over-segmented results. For the watershed segmentation, the gradient image was obtained using the first-order Gaussian derivative with a scale of σ = 1. Before watershed segmentation was applied, the gradient image was thresholded to remove the lower gradient values due to noise. In the test, the threshold was specified in terms of the percentile of the cumulative frequency distribution of the gradient magnitudes in the image. The results were then post-processed using the region merging technique proposed by Haris et al. [9].
Fig. 3. Left: Shepp-Logan phantom. Right: Ground truth with 14 labelled regions
In the test, both schemes were configured to produce 14 final regions. The results of the proposed technique and watershed segmentation are shown in Fig. 4(a) and Fig. 4(b), respectively. It can be seen that all 14 regions have been properly segmented in Fig. 4(a). Conversely, Region 5 is missing and Regions 11 and 12 are merged in Fig. 4(b). Furthermore, Region 14 is over-segmented into 4 regions in Fig. 4(b). We can also observe that Region 4 in Fig. 4(b) is smaller in proportion when compared to the same region in Fig. 3. Thus, the proposed scheme has been shown to perform better than watershed segmentation with Haris's region merging technique in this experiment.
Fig. 4. Segmentation results of Shepp-Logan phantom: (a) The proposed scheme. (b) Watershed segmentation with Haris’s region merging technique.
To further compare the effectiveness of the proposed scheme against watershed segmentation with Haris's RAG-based region merging, both techniques were applied to a number of standard test images: Boat, Cameraman, Clock, Goldhill and Lena. For comparison purposes, we configured our approach and the watershed segmentation with region merging to produce the same number of final regions. The segmentation results of both approaches are shown in Fig. 5, with the values of the parameters used in each approach listed in Table 2. It can be observed that the results from both approaches are comparable. In certain cases, the proposed approach produced slightly better segmentation results. This is noticeable for the windows in the Goldhill image. The watershed segmentation produced regions with natural and smooth borders. Conversely, some edges produced by the proposed approach are jagged and over-segmented, such as those in the Clock image. These may be resolved by further reducing the number of regions, but at the cost of losing detail, which is the classical problem of balancing between over- and under-segmentation. The proposed technique has some limitations, although it generally produces satisfactory segmentation results. Firstly, it is computationally more expensive than watershed segmentation. Since all pixel blocks in the image are used during the training, the time taken to generate the Kohonen map increases for larger images. This
Fig. 5. Results of segmentation on standard test images. Row (top to bottom): Boat, Cameraman, Clock, Goldhill, and Lena. Column (left to right): Original image, segment boundaries from watershed segmentation with Haris’s RAG-based region merging, and segment boundaries from our approach.
can be improved by optimizing and exploiting parallelism in the algorithm. Secondly, the training rule employed in our Kohonen map may not be optimal. This can be improved if a suitable objective function is defined. In addition, supervised learning via learning vector quantization (LVQ) could also be used to further improve the performance [6].

Table 2. Values of the parameters used in the experiments with standard test images

Image       Size        Watershed segmentation                      Our approach
                        Gradient threshold    No. of final          No. of regions after
                        (percentile)          regions               1st merging    2nd merging
Boat        512 × 512   25th                  500                   550            500
Cameraman   256 × 256   50th                  150                   160            150
Clock       256 × 256   70th                  150                   170            150
Goldhill    512 × 512   25th                  500                   550            500
Lena        256 × 256   25th                  150                   170            150
5 Conclusion
In this paper, we have proposed a segmentation scheme based on the postulate that an image can be divided into homogeneous regions, where each region comprises pixel blocks with highly similar neighbourhoods. By using exemplary pixel blocks from the image, the main regions in the image can be segmented, where contiguous pixel blocks are aggregated if their neighbourhoods are highly similar to the same exemplary pixel block. In our approach, these exemplary pixel blocks are automatically discovered through unsupervised learning via the Kohonen self-organizing map. The exemplary pixel blocks obtained are later used in splitting the image through the fuzzy similarity measure. Even though the initial segmentation output is over-segmented, the proposed two-stage region merging technique has been shown to produce a more discernible result. The first stage of the merging process combines small spurious regions, while larger regions in the background are merged in the second stage. The proposed approach has been applied to a number of real-world test images and the results have shown that it is capable of producing quality segmentations, which are comparable to and in some cases better than the watershed-based segmentation technique.
References 1. Zhang, Y.J.: An Overview of Image and Video Segmentation in the Last 40 Years. In: Zhang, Y.J. (ed.) Advances in Image and Video Segmentation, pp. 1–15. IRM Press, Hershey PA (2006) 2. Zhang, Y.J.: Evaluation and Comparison of Different Segmentation Algorithms. Pattern Recognition Letters 18, 963–974 (1997) 3. Meyer-Bäse, A.: Neural net computing for image processing. In: Jähne, B., Haußecker, H., Geißler, P. (eds.) Handbook of Computer Vision and Applications, vol. 2, pp. 729– 751. Academic Press, San Diego (1999)
4. Ahmed, M.N., Farag, A.A.: Two-stage Neural Network for Volume Segmentation of Medical Images. Pattern Recognition Letters 18, 1143–1151 (1997) 5. Reddick, W.E., Glass, J.O., Cook, E.N., Elkin, T.D., Deaton, R.J.: Automated Segmentation and Classification of Multispectral Magnetic Resonance Images of Brain Using Artificial Neural Networks. IEEE Transactions on Medical Imaging 16, 911–918 (1997) 6. Ong, S.H., Yeo, N.C., Lee, K.H., Venkatesh, Y.V., Cao, D.M.: Segmentation of Color Images Using a Two-Stage Self-Organizing Network. Image and Vision Computing 20, 279–289 (2002) 7. Sonka, M., Hlavac, V., Boyle, R.: Image Processing, Analysis and Machine Vision, 3rd edn. Thomson, Toronto (2008) 8. Coatrieux, G., Solaiman, B.: A Pixel Block Fuzzy Similarity Measure Applied in Two Applications. In: 1st IEEE Int. Conf. on Information and Communication Technologies: from Theory to Applications, pp. 355–356 (2004) 9. Haris, K., Efstratiadis, N., Maglaveras, N., Katsaggelos, A.K.: Hybrid Image Segmentation Using Watersheds and Fast Region Merging. IEEE Transactions on Image Processing 7, 1684–1699 (1998) 10. Tan, W.H., Coatrieux, G., Solaiman, B., Besar, R.: A Region Based Segmentation Using Pixel Block Fuzzy Similarity. In: 2nd IEEE Int. Conf. on Information and Communication Technologies: from Theory to Applications, pp. 1516–1521 (2006) 11. Klette, R., Rosenfeld, A.: Digital Geometry: Geometric Methods for Digital Picture Analysis. Morgan Kaufmann, San Francisco (2004) 12. Russ, J.C.: The Image Processing Handbook, 5th edn. CRC Press, New York (2007) 13. Kohonen, T.: Self-organizing Maps. Springer, Berlin (2001) 14. Sarkar, A., Biswas, M.K., Sharma, K.M.S.: A Simple Unsupervised MRF Model Based Image Segmentation Approach. IEEE Transactions on Image Processing 9, 801–811 (2000) 15. Hernández, S.E., Barner, K.E.: Joint Region Merging Criteria for Watershed-Based Image Segmentation. In: Int. Conf. on Image Processing 2000, pp. 108–111 (2000) 16. Duarte, A., Sánchez, A., Fernández, F., Montemayor, A.S.: Improving Image Segmentation Quality Through Effective Region Merging Using a Hierarchical Social Metaheuristic. Pattern Recognition Letters 27, 1239–1251 (2006) 17. Pichel, J.C., Singh, D.E., Rivera, F.F.: Image Segmentation Based on Merging of Suboptimal Segmentations. Pattern Recognition Letters 27, 1105–1116 (2006)
Enhancing Learning Experience of Novice Surgeon with Virtual Training System for Jig and Fixture Usage Intan Syaherra Ramli1, Haslina Arshad2, Abu Bakar Sulong3, Nor Hamdan Mohd. Yahaya4, and Che Hassan Che Haron3 1
Department of Computer Sciences, Faculty of Computer and Mathematical Sciences, Universiti Teknologi Mara (UiTM), (Cawangan Negeri Sembilan), Kuala Pilah, Negeri Sembilan [email protected] 2 Department of Industrial Computing, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor 3 Department of Mechanical and Materials Engineering Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, Bangi, Selangor 4 Department of Orthopaedic &Traumatology Faculty of Medicine Universiti Kebangsaan Malaysia, Bangi, Selangor
Abstract. Complex surgical procedures and the high cost and time required to prepare for conventional surgical training are factors that have contributed to the development of virtual surgery training systems in medical studies. A new jig and fixture has been designed to help orthopedic surgeons perform knee replacement surgery. The jig and fixture is used to align the cutter blade in the right position before sawing begins. It can also reduce the overall time taken when using a computer-assisted surgery system. Thus, training is required to practise the right techniques for applying the new jig so that the time required to place the jig at the optimum position can be reduced. As an alternative to the conventional method, a virtual training system was developed to enhance the learning experience of novice surgeons. A test with 8 novice surgeons showed that the time to align the jig on the tibia decreased after implementing the virtual training system. This shows that the virtual training system managed to train the doctors in the correct technique for using the newly designed jig and fixture. Keywords: virtual training system, virtual reality, knee replacement surgery, Visual Informatics.
1 Introduction
A virtual training system in the medical field is a combination of 3D image modeling and virtual reality techniques used for the clinical training of doctors [1]. It is also an alternative to conventional training before the doctors carry out the real procedure in the operating room [2]. Furthermore, it adds a new dimension for more effective
learning at low cost and in less time [3]. Many types of virtual training systems have been developed for medical training purposes, each focusing on certain learning aspects such as anatomical studies, surgical procedures and surgical instrument usage [4], [5], [6], [2], [1], [7]. In this research, a virtual training system for the use of a newly designed jig and fixture in computer-assisted knee replacement surgery was developed. The research started when surgeons in the Orthopedic Department, Universiti Kebangsaan Malaysia Medical Centre (PPUKM), had problems aligning the existing jig during computer-assisted knee replacement surgery. The process of positioning the jig at the right angle takes a lot of time, especially when fixing and locating the jig at the correct position on the patient's knee. Surgeons have to readjust and realign the jig position several times. To overcome this problem, the jig and fixture was redesigned by a group of researchers from the Faculty of Engineering and Built Environment (FKAB, UKM). Training is required for novice surgeons to practise the use of the jig and fixture during computer-assisted knee replacement surgery. The right technique for applying the jig and fixture is important to reduce the time taken to reach the optimum jig position during surgery [8]. However, preparing for the conventional training is time-consuming and costly [6], because it requires a plastic bone, a desk, a platform and the only computer-assisted system in PPUKM, which is mainly reserved for surgical diagnosis. Besides, the conventional method only allows one doctor to experience the training at a time, and repetition is tedious [5]. Thus, a virtual training system was developed to assist computer-assisted knee surgery training, where the user can interact with a virtual environment to determine the optimum position of the jig on the tibia.
2 Knee Replacement Surgery
Knee replacement surgery is conducted to treat knee anatomy that has been damaged due to an accident, age or weight factors. In general, the articular surface is replaced by a prosthesis during the surgery [8]. The articular surface is the knee joint located at the end of the knee, while the prosthesis is designed to replace the damaged articular surface. In conventional knee replacement surgery, the position of the jig is obtained based on data from MRI (Magnetic Resonance Imaging). However, this is less accurate and carries a high risk of error [9]. Therefore, new technology has introduced computer-assisted knee surgery, which provides a more accurate knee cut [10]. There are two types of computer-assisted systems: image-free and image-based. In PPUKM, the image-free type is used. In order for the system to work, it has to be calibrated with the actual knee using devices such as a sensor and a camera. The jig and fixture is applied only during the knee-cut phase. Therefore, the virtual training system development focuses only on the jig positioning process during the knee-cut phase. In this process, the surgeon has to move the jig in order to reach the optimum position. By using the sensor, the jig movements are captured by the system as the actual plane (blue plane). On the other hand, the plan for the knee is shown as the plan resection plane (yellow plane), as in Figure 1.
Fig. 1. (a) Jig and fixture (front and side views). (b) Computer-assisted interface showing the plan resection plane (PRP) and the actual blue plane (ABP)
The optimum position of the jig is achieved when the actual blue plane is parallel to and overlaps with the planned resection plane. At this point, the resection value is within ±0.5 mm, while the posterior/anterior slope and the varus/valgus angle are each within ±0.5 degree, as shown in Figure 2.
Fig. 2. Optimum position of jig and fixture
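As a simple illustration of this optimum-position criterion, the tolerance check could be expressed as below; the parameter names are hypothetical, while the ±0.5 mm and ±0.5 degree tolerances are those quoted in the text.

def jig_at_optimum(resection_mm, posterior_anterior_slope_deg, varus_valgus_deg):
    # True when the actual blue plane is within tolerance of the plan resection
    # plane (hypothetical parameter names, tolerances from the text)
    return (abs(resection_mm) <= 0.5
            and abs(posterior_anterior_slope_deg) <= 0.5
            and abs(varus_valgus_deg) <= 0.5)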
There are several jig movements that influence the movement of the plane in the system. Knowing these movements is very useful, especially for novice surgeons, to reach the optimum jig position in less time. These steps are implemented in the virtual training system so that novice surgeons can practise them and enhance their learning experience.
3 Virtual Training System for the Newly Designed Jig and Fixture in Knee Replacement Surgery
In order to construct the 3D model, it is important to know the actual elements involved in this process: the computer-assisted system, the knee bone, the jig and fixture, and other supporting equipment such as the desk and the platform. The system development process is shown in Fig. 4.
Fig. 4. Elements involved in the jig positioning process: the actual environment is mapped to a virtual environment through 3D geometric modeling, collision detection, animation and visualization
The elements were modeled using 3D Studio Max software and a CMM machine. To permit interaction between the virtual models, the visualization and animation parts were developed using the Unity 3D software. In addition, a collision detection algorithm was applied to detect contact between the jig and the tibia, so that the jig does not penetrate the tibia while the visualization is running. The technique uses a narrow-phase approach to detect the intersection between two virtual planes.
4 Discussion and Result
The training session involved 8 surgeons from the Orthopedic Department, PPUKM. They are master's students from year 1 to year 4 who had previously only observed the knee surgery procedure and had not yet performed it. Before starting the training, they were given the manual documentation on how to use the system. The time to apply the jig on the saw bone was first recorded before they used the virtual training system, as shown in Figure 5. The time was recorded from the moment the distal base was inserted with the pin until the surgeon found the actual blue plane to be exactly parallel to and overlapping with the planned resection plane.
Fig. 5. Virtual training system
After that, they carried out the training using the virtual training system for two sessions, as shown in Figure 5. During each training session, they had to move the jig based on the specific movements (buttons) provided in the system to reach the optimum jig position, and they had to carry out both methods in each session, as shown in Figure 6. They also experienced various plane positions in order to enhance their learning skill. After completing the training sessions, the time to apply the jig in the actual situation was recorded again.
Fig. 6. Virtual training system interface showing the function buttons, the initial position of the jig and fixture, and the initial position of the ABP planes
The results show that the time taken for the surgeons to apply the jig on the saw bone was reduced after implementing the virtual training system, as shown in Figure 7.
Fig. 7. Time recorded before and after applying the virtual training system
5 Conclusion
Based on the results, the virtual training system manages to train novice surgeons and enhance their learning of how to use the newly designed jig and fixture in computer-assisted knee replacement surgery. With the correct technique, the time taken to reach the optimum jig position before sawing begins can be reduced, which is important for improving surgical performance. In addition, the novice surgeons can repeat the training to enhance their learning experience at low cost and with little preparation time.
References 1. Sourina, O., Sourin, A., Sen, H.T.: Virtual Orthopedic Surgery Training On Personal Computer. International Journal of Information Technology 6(1), 16–29 (2000) 2. Gallagher, A., et al.: Virtual Reality Simulation for the Operating Room, ProficiencyBased Training as a Paradigm Shift in Surgical Skill Training. Annals of Surgery 241(2), 364–372 (2005) 3. Russ, Z., Richard, M.S.: Medical Applications of Virtual Reality. Communications of ACM 40(9), 63–64 (1997) 4. Gibson, S., Samosky, J., Mor, A., Fyock, C., Grimson, E., Kanade, T., Kikinis, R., Lauer, H., McKenzie, N., Nakajima, S., Ohkami, T., Osborne, R., Sawada, A.: Simulating arthroscopic knee surgery using volumetric object representations, real-time volume rendering and haptic feedback. In: Troccaz, J., Mösges, R., Grimson, W.E.L. (eds.) CVRMed-MRCAS 1997, CVRMed 1997, and MRCAS 1997. LNCS, vol. 1205, pp. 367– 378. Springer, Heidelberg (1997)
5. Heng, P.-A., Cheng, C.-Y., Wong, T.-T., Xu, Y., Chui, Y.-P., Chan, K.-M., Tso, S.-K.: A Virtual-Training System for Knee Antroscopic Surgery. IEEE Transactions on Information Technology in Biomedicine 8(2), 217–227 (2004) 6. Pöβneck, A., Nowatius, E., Trantakis, C., Cakmak, H., Maass, H., Kühnapfel, U., Dietz, A., Strauβ, G.: A Virtual Training System in Endoscopic Sinus Surgery. In: International Congress Series, vol. 1281, 3(184), pp. 527–530 (2005) 7. Sabri, H., Cowan, B., Kapralos, B., Porte, M., Backstein, D., Dubrowski, A.: Serious games for knee replacement surgery procedure education and training. In: Proceedings of the World Conference on Educational Science (WCES), pp. 3483–3488 (2010) 8. Awang, R.: Designing & Prototype Production of UKM TKR Cutting Jig for Computer Assisted Surgery in Total Knee Arthroplasty. Tesis Sarjana, Fakulti Perubatan, Pusat Perubatan Universiti Kebangsaan Malaysia (2009) 9. Berend, M.E., Riter, M.A., Meding, J.B., Faris, P.M., Keating, E.M., Redelman, R., Faris, G., Davis, K.: Tibial component failure mechanisms in total knee arthroplasty. Clin. Orthop. Relat. Res. Noc. (428), 26–34 (2004) 10. Stiehl, J.: Computer-Assisted Orthopedic Surgery (CAOS): Pros and Cons. Minimally Invasive Surgery in Orthopedics, 603–612 (2010)
Improved Gait Recognition with Automatic Body Joint Identification Tze-Wei Yeoh1, Wooi-Haw Tan1, Hu Ng2, Hau-Lee Tong2, and Chee-Pun Ooi1 1
Faculty of Engineering, Multimedia University, 63100 Cyberjaya, Selangor, Malaysia 2 Faculty of Information Technology, Multimedia University, 63100 Cyberjaya, Selangor, Malaysia [email protected], {twhaw,nghu,hltong,cpooi}@mmu.edu.my
Abstract. Gait recognition is an unobtrusive biometric which allows identification of people from a distance by the manner in which they walk. In this paper, a new approach is proposed for extracting human gait features based on body joint identification from human silhouette images. In the proposed approach, the human silhouette image is first enhanced to remove artifacts before it is divided into eight segments according to a priori knowledge of human body proportion. Next, the body joints which act as the pivot points in human gait are automatically identified and the joint trajectories are computed. To assess the performance of the extracted gait features, a fuzzy k-nearest neighbor classification technique is used to identify subjects from the SOTON covariate database. The experimental results have shown that the gait features extracted using the proposed approach are effective, as the recognition rate has been improved. Keywords: Gait recognition, Human identification, Fuzzy k-nearest neighbor, Visual Informatics.
1 Introduction
Nowadays, biometrics are constantly applied for human identification and verification through the measurement of one or more intrinsic physical or behavioral traits such as fingerprint, face, spoken voice, etc. Their usage is widely accepted as they are unique and one will not lose or forget them over time. Human gait is a complex yet unique locomotive pattern which involves synchronized movements of body parts and joints and the interaction among them [1]. Gait as a biometric has attracted increasing attention as it is unobtrusive and has the ability to identify people at a distance when other biometrics are obscured. Basically, gait analysis can be divided into two distinct categories, namely the model-based method and the model-free method. The model-based method generally models the human body structure or kinetics and extracts features to match them to the modeled components. It incorporates knowledge of the human shape and the dynamics of human gait into the extraction process. For instance, Johnson et al. used activity-specific static body parameters for gait recognition without directly analyzing gait dynamics [2]. Cunado et al. used thigh joint trajectories as the gait features [3]. The advantages of this method are the ability to derive the gait signatures directly from
model parameters and its robustness to the effects of different clothing or viewpoints. However, it has a high computational cost due to the complex matching and searching process. Conversely, the model-free method differentiates the whole motion pattern of the human body through a concise representation that does not consider the underlying structure. Compared to the model-based method, this method has a lower computational requirement. For instance, BenAbdelkader et al. proposed an eigengait method using image self-similarity plots [4]. Collins et al. established a method based on template matching of body silhouettes in key frames during a human's walking cycle [5]. Philips et al. characterized the spatial-temporal distribution generated by gait motion in its continuum [6]. This paper describes a robust and reliable model-free approach to extract human gait features from successive silhouette images of walking human subjects. In the proposed approach, the human silhouette is divided into eight body segments, from which the body joints are automatically identified and the joint trajectories are computed. As the proposed approach does not attempt to detect the legs separately, there is no problem with self-occluded silhouettes. Moreover, the computation of joint trajectories is more straightforward than the Hough transform based technique presented by Ng et al. [7]. As it is less complicated, the proposed approach also executes faster than model-based methods such as the linear regression approach by Yoo et al. [8] and the temporal accumulation approach by Wagg [9].
2 Methodology
The first step in the proposed gait feature extraction technique is to enhance the image containing the human silhouette via morphological opening. Next, the height and width of the human silhouette are measured and the silhouette is divided into eight segments according to a priori knowledge of human body proportion. The lower body joints that define the pivot points in human gait are then automatically identified and the joint trajectories are computed. After that, the step-size and crotch height are measured. Finally, fuzzy k-nearest neighbor (KNN) classification is employed for gait recognition in order to evaluate the performance of the extracted features. The proposed approach is summarized in the flowchart illustrated in Fig. 1.
Fig. 1. Flowchart of the proposed system
The original human silhouette images are obtained from the SOTON covariate database, University of Southampton [10]. This database was used to evaluate the recognition rate of walking subjects with different covariate factors. In most of the human silhouette images, a shadow is found near the feet and is usually merged with the subject's body in the silhouette image, as shown in Fig. 2. This hinders gait feature extraction as it interferes with the body joint identification. The problem can be reduced by applying morphological opening with a 7×7 diamond-shaped structuring element, denoted by

A ∘ B = (A ⊖ B) ⊕ B .    (1)
where A is the image, B is the structuring element, ⊖ represents morphological erosion and ⊕ represents morphological dilation. The morphological opening first performs an erosion, followed by a dilation. Fig. 2 shows the original and enhanced images.
Fig. 2. (a) Original image. (b) Enhanced image after morphological opening
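A minimal sketch of this enhancement step is shown below; the choice of scikit-image is an assumption of this illustration, while the 7×7 diamond-shaped structuring element follows the text.

import numpy as np
from skimage.morphology import diamond, opening

# A diamond of radius 3 gives a 7x7 diamond-shaped structuring element
selem = diamond(3)

def enhance_silhouette(binary_silhouette):
    # Morphological opening (erosion followed by dilation), Equation (1),
    # removes shadow artifacts merged with the body
    return opening(binary_silhouette.astype(np.uint8), selem)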
Next, the width and height are measured from the enhanced human silhouette. These two features will be used for gait analysis at a later stage. Fig. 3(a) shows the width and height of the silhouette. The enhanced human silhouette is then divided into eight body segments based on a study of human body proportion by Dempster and
Fig. 3. (a) The width and height of a human silhouette. (b) The eight body segments.
Gaughran [11]. Fig. 3(b) shows the eight segments of the body, where a represents the head and neck, b the torso, c the right upper thigh and hip, d the right middle thigh, e the right lower thigh and foot, f the left upper thigh and hip, g the left middle thigh and h the left lower thigh and foot. To extract the body joints that define the human gait, the vertical positions of the hip, knees and ankles with respect to the body height are estimated by referring to a priori information on human body proportion. Next, a horizontal line is drawn across the body joint and its horizontal position is calculated from two edges based on the following equation:
x_c = x_1 + (x_2 − x_1) / 2 .    (2)
where x_c is the center position, and x_1 and x_2 are the horizontal positions of the first and second edges on the image profile along the horizontal line that passes through the joint, respectively. With the height of the hip determined, the next step is to find its horizontal position. When a horizontal line is drawn across the hip, there should be only two edges in a normal human silhouette, as highlighted in green in Fig. 4. The horizontal hip position can then be determined by calculating the width between both edges using the following equation:
c_pos = c_rise + (c_fall − c_rise) / 2 .    (3)
where crise and cfall are the horizontal positions of the rising edge and falling edge, respectively; and cpos is the horizontal position of the hip.
Fig. 4. Two edges on the horizontal line (dashed blue line) that passes through the hip. The corresponding image profile along the horizontal line is shown at the bottom.
By referring to a priori information of the human body proportion, the height of the knees can be found. To find the horizontal position of both knees, a horizontal line is drawn at knee height across the human silhouette. For a normal silhouette with no self-occlusion, there should be four edges on the image profile along the horizontal
line, as shown in green in Fig. 5(a). The horizontal knee positions can then be determined by finding the width between two adjacent edges on each leg using the following equations:
k_leftPos = k_leftRise + (k_leftFall − k_leftRise) / 2 .    (4)

k_rightPos = k_rightRise + (k_rightFall − k_rightRise) / 2 .    (5)
where kleftPos and krightPos are the horizontal positions of the left and right knees, respectively; kleftRise and krightRise are the horizontal positions of the rising edge on the left and right knees, respectively; and kleftFall and krightFall are the horizontal positions of the falling edge on the left and right knees, respectively. For a silhouette with self-occlusion, there will be only two edges on the same horizontal line, as highlighted in green in Fig. 5(b). The horizontal knee positions can be determined by computing the width between each edge with respect to the horizontal position of the hip, which is shown in purple in Fig. 5(b).
k_leftPos = k_rise + (c_pos − k_rise) / 2 .    (6)

k_rightPos = c_pos + (k_fall − c_pos) / 2 .    (7)
where k_leftPos and k_rightPos are the horizontal positions of the left and right knees, respectively; c_pos is the horizontal position of the hip; and k_rise and k_fall are the horizontal positions of the rising edge and falling edge, respectively. To determine the horizontal positions of the ankles, a similar technique is employed. If a horizontal line is drawn at ankle height on a normal human silhouette with no self-occlusion, there should be four edges on the image profile along the horizontal line, as highlighted in green in Fig. 5(c). The horizontal ankle positions can then be determined by using the following equations:
A_leftPos = A_leftRise + (A_leftFall − A_leftRise) / 2 .    (8)

A_rightPos = A_rightRise + (A_rightFall − A_rightRise) / 2 .    (9)
where AleftPos and ArightPos are the horizontal positions of the left and right ankles, respectively; AleftRise and ArightRise are the horizontal positions of the rising edge on the left and right ankles, respectively; and AleftFall and ArightFall are the horizontal positions of the falling edge on the left and right ankles, respectively.
For human silhouette with self-occlusion, there will be only two edges on the horizontal line, as highlighted in green in Fig. 5(d). The horizontal ankle positions can then be determined by finding the width between both edges using the following equations:
A_leftPos = A_rise + 0.25 (A_fall − A_rise) .    (10)

A_rightPos = A_rise + 0.75 (A_fall − A_rise) .    (11)
where AleftPos and ArightPos are the horizontal positions of the left and right ankles, respectively; Arise and Afall are the horizontal positions of the rising edge and falling edge on the image profile, respectively.
Fig. 5. Image profile along the dashed blue horizontal line. (a) Four edges are found across the knees in a normal silhouette. (b) Two edges are found across the knees in a self-occluded silhouette. (c) Four edges are found across the ankles in a normal silhouette. (d) Two edges are found across the ankles in a self-occluded silhouette.
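Equations (2)–(11) all reduce to locating rising and falling edges on a horizontal scanline of the binary silhouette and taking midpoints. A hedged sketch of that common step is given below; treating the first region on the scanline as the left leg and the exact edge indexing are assumptions of this illustration, not details stated in the paper.

import numpy as np

def scanline_edges(silhouette, row):
    # Column indices of rising and falling edges on one row of a 0/1 silhouette
    line = silhouette[row].astype(np.int8)
    diff = np.diff(line)
    rising = np.where(diff == 1)[0] + 1
    falling = np.where(diff == -1)[0] + 1
    return rising, falling

def knee_positions(silhouette, knee_row, hip_x):
    # Equations (4)-(5) for the four-edge case, Equations (6)-(7) when the legs
    # self-occlude and only two edges are found
    rising, falling = scanline_edges(silhouette, knee_row)
    if len(rising) >= 2 and len(falling) >= 2:      # no self-occlusion
        left = rising[0] + (falling[0] - rising[0]) / 2.0
        right = rising[1] + (falling[1] - rising[1]) / 2.0
    else:                                           # self-occluded silhouette
        left = rising[0] + (hip_x - rising[0]) / 2.0
        right = hip_x + (falling[0] - hip_x) / 2.0
    return left, right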
Fig. 6(a) shows how the joint trajectory is determined. All the joint trajectories are computed by using the following equation:
θ = tan⁻¹( (p2_x − p1_x) / (p2_y − p1_y) ) .    (12)
In this paper, five joint trajectories have been defined, as shown in Fig. 6(b). These trajectories are the hip joint trajectory (θ1), left knee joint trajectory (θ2), right knee joint trajectory (θ3), left ankle joint trajectory (θ4) and right ankle joint trajectory (θ5). The Euclidean distance between both ankles is also determined to obtain the subject's step-size in each frame of the walking sequence. Examples of a normal human silhouette and a self-occluded human silhouette are shown in Fig. 7 to illustrate the effectiveness of the proposed technique in identifying the body joints.
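Small helpers for Equation (12) and the step-size measurement might look like the following; the representation of joints as (x, y) pairs is an assumption, and atan2 is used in place of a plain arctangent to avoid division by zero.

import math

def joint_trajectory(p1, p2):
    # Equation (12): trajectory angle between two joint positions (x, y)
    return math.atan2(p2[0] - p1[0], p2[1] - p1[1])

def step_size(left_ankle, right_ankle):
    # Euclidean distance between both ankles in a frame
    return math.hypot(left_ankle[0] - right_ankle[0], left_ankle[1] - right_ankle[1])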
Fig. 6. (a) Joint trajectory computation. (b) All the extracted gait features.
Fig. 7. Results of body joint identification using the proposed approach: (a) normal human silhouette; (b) self-occluded human silhouette
3 Experiment
As sufficient data is available for training and testing, the supervised fuzzy k-nearest neighbor (KNN) algorithm is applied to evaluate the effectiveness of the gait features extracted using the proposed approach. Basically, KNN is a classifier that is used to distinguish different subjects based on the nearest training data in the feature space. In other words, subjects are classified according to the majority of their nearest neighbors. In an extension to KNN, Keller et al. [12] integrated fuzzy relations with KNN. According to Keller's concept, the unlabelled subject's membership of class i is given as:
252
T.-W. Yeoh et al.
1 ( ) u x i 2 x∈KNN || x − x ||m −1 ui ( x ) = 1 2 x∈KNN m −1 || x − x ||
⋅
(13)
where x, x_j and u_i(x) represent the unlabeled subject, the labeled subjects and x's membership of class i, respectively. This equation computes the membership value of the unlabeled subject from the membership values of the labeled subjects and the distances between the unlabeled subject and its fuzzy KNN labeled subjects. The experiment was carried out for eleven subjects walking parallel to a static camera, with thirteen different covariate factors consisting of ten different apparels and three different walking speeds. Each subject was captured wearing a variety of footwear (flip flops, bare feet, socks, boots, trainers and own shoes) and clothes (normal or with trench coat), and carrying various bags (barrel bag slung over the shoulder or carried by hand, and rucksack). They were also recorded walking at different speeds (slow, fast and normal). For each subject, there are approximately twenty sets of walking sequences, from right to left and vice versa, on a normal track. The proposed approach was capable of identifying the body joints of the human silhouette under all thirteen covariate factors. To observe its effectiveness, some examples of subjects with different apparels and the identified body joints are shown in Fig. 8.
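A compact sketch of the fuzzy KNN membership of Equation (13) above is given here; the value m = 2, the small epsilon guarding against zero distances, and the array layout are assumptions of this example.

import numpy as np

def fuzzy_membership(x, neighbours, neighbour_memberships, m=2.0):
    # Equation (13): class memberships of an unlabelled subject x.
    # neighbours: (k, d) array of the k nearest labelled feature vectors.
    # neighbour_memberships: (k, n_classes) array of their class memberships.
    dists = np.linalg.norm(neighbours - x, axis=1) + 1e-12   # avoid division by zero
    weights = 1.0 / dists ** (2.0 / (m - 1.0))
    memberships = (neighbour_memberships * weights[:, None]).sum(axis=0) / weights.sum()
    return memberships  # the predicted class is memberships.argmax()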
Fig. 8. Body joint identification on human silhouettes with different apparels: (a) rucksack; (b) handbag; (c) barrel bag; (d) trench coat; (e) boots, long blouse and barrel bag; (f) shalwar kameez and rucksack
In order to obtain optimized results, seven gait features were first adopted for the classification. First, the maximum hip trajectory (θ1max) was determined from all the hip
trajectories collected during a walking sequence. When θ1max was identified, the corresponding step-size (S), width (w), height (h), left knee trajectory (θ2), right knee trajectory (θ3) and crotch height (CH) were determined as well. In order to improve the recognition rate, eight additional features were used. These features are the averages of the local maxima detected for the width (A^w), crotch height (A^CH), hip trajectory (A^θ1), left knee trajectory (A^θ2), right knee trajectory (A^θ3), left ankle trajectory (A^θ4), right ankle trajectory (A^θ5) and step-size (A^S). As a result, fifteen features are channelled into the classification process and the Euclidean distance is defined as:
D(x_i, x_j) = sqrt( (θ1max_i − θ1max_j)² + (S_i − S_j)² + (w_i − w_j)² + (h_i − h_j)²
        + (θ2_i − θ2_j)² + (θ3_i − θ3_j)² + (CH_i − CH_j)² + (A^w_i − A^w_j)²
        + (A^CH_i − A^CH_j)² + (A^θ1_i − A^θ1_j)² + (A^θ2_i − A^θ2_j)² + (A^θ3_i − A^θ3_j)²
        + (A^θ4_i − A^θ4_j)² + (A^θ5_i − A^θ5_j)² + (A^S_i − A^S_j)² ) .    (14)
where x_i is the unlabeled subject, x_j is the labeled subject and sqrt(·) denotes the square root. Since a supervised classification algorithm is adopted, the classification process is divided into training and testing stages. In the training stage, eight walking sequences for each subject with each covariate factor were utilised. The rest of the sequences were used in the testing stage. Therefore, 1360 and 1362 walking sequences were used for training and testing, respectively.
4 Experimental Results and Discussion
Several tests were conducted using different values of k for the fuzzy KNN algorithm for gait recognition. The overall results for the 1362 walking sequences are summarised in Table 1. From the table, it can be observed that the highest recognition rate of 87.96% is obtained for k = 6 and k = 7. Unlike the results reported by Ng et al. [7], we found that there was no improvement in the recognition rate when a Gaussian filter was used to smoothen the gait features.

Table 1. Recognition rates with different k

No. of correctly recognized walking sequences    Recognition rate (%)
1179                                             86.56
1178                                             86.49
1185                                             87.00
1186                                             87.08
1198                                             87.96
1198                                             87.96
1191                                             87.44
To further evaluate the performance of the extracted gait features, the Cumulative Match Scores (CMS) introduced by Phillips et al. [13] were employed. CMS shows the probability of correct classification versus the relative rank k; it depicts the recognition rate for the correct subject within the top n matches. By referring to Fig. 9, it can be observed that most of the subjects are correctly classified at rank 2, which achieves a recognition rate of 95.23%. The recognition rate reaches 100% at rank 5, which implies that all the subjects can be correctly identified within the top five matches.
Fig. 9. Cumulative Match Scores of the top five matches for all subjects (recognition rate (%) versus rank)
For comparison with other approaches that used the same SOTON covariate database, our approach performed better than the recognition rates of 73.4% by Bouchrika et al. [14], 80.58% by Ng et al. [7] and 86% by Pratheepan et al. [15]. The recognition rates, number of subjects, number of covariate factors and number of walking sequences used in these approaches are given in Table 2.

Table 2. Comparison with other approaches on the SOTON covariate database

                           Bouchrika et al. [14]   Pratheepan et al. [15]   Ng et al. [7]   Our approach
Recognition rate (%)       73.4                    86                       80.58           87.96
No. of subjects            Ten                     Ten                      Eleven**        Eleven**
No. of covariate factors   Eleven                  Four#                    Thirteen        Thirteen
No. of walking sequences   440                     180                      2722            2722

** Subjects include one female wearing a long blouse
# Without covariate factors on shoe types and walking speed
5 Conclusion
A new approach for extracting human gait features from human silhouette images has been developed. The gait features are extracted from the human silhouette by
determining the body joints from the body segments based on a priori knowledge of human body proportion. Once the body joints have been identified, the joint trajectories can be computed using a straightforward approach. This method has been shown to be more effective, as it is capable of identifying the body joints from self-occluded human silhouettes. In addition, the improved recognition rate also shows that the proposed method is robust and can perform well under different covariate factors. In future work, the proposed approach will be applied to other gait databases and the usage of various classification algorithms will also be studied. Acknowledgement. The authors would like to thank Professor Mark Nixon, School of Electronics and Computer Science, University of Southampton, United Kingdom, for his kindness in providing the SOTON covariate database for use in this research.
References 1. BenAbdelkader, C., Culter, R., Nanda, H., Davis, L.: EigenGait: Motion-based Recognition of People Using Image Self-similarity. In: International Conference Audio and Video-Based Person Authentication, pp. 284–294 (2001) 2. Bobick, A., Johnson, A.: Gait Recognition Using Static, Activity-specific Parameters. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 423–430 (2001) 3. Cunado, D., Nixon, M., Carter, J.: Automatic Extraction and Description of Human Gait Models for Recognition Purposes. Computer and Vision Image Understanding 90, 1–41 (2003) 4. BenAbdelkader, C., Cutler, R., Davis, L.: Motion-based Recognition of People in Eigengait Space. In: Fifth IEEE International Conference, pp. 267–272 (2002) 5. Collin, R., Gross, R., Shi, J.: Silhouette-based Human Identification from Body Shape and Gait. In: Fifth IEEE International Conference, pp. 366–371 (2002) 6. Philips, P.J., Sarkar, S., Robledo, I., Grother, P., Bowyer, K.: The Gait Identification Challenge Problem: Dataset and Baseline Algorithm. In: 16th International Conference on Pattern Recognition, pp. 385–389 (2002) 7. Ng, H., Tan, W.H., Tong, H.L., Abdullah, J.: Improved Gait Classification with Different Smoothing Techniques. In: International Conference on Advanced Science, Engineering and Information Technology, pp. 242–247 (2011) 8. Yoo, J.H., Nixon, M.S., Harris, C.J.: Extracting Human Gait Signatures by Body Segment Properties. In: Fifth IEEE Southwest Symposium on Image Analysis and Interpretation, pp. 35–39 (2002) 9. Wagg, D.K., Nixon, M.S.: On Automated Model-based Extraction and Analysis of Gait. In: 6th IEEE International Conference on Automatic Face and Gesture Recognition, pp. 11–16 (2004) 10. Shutler, J.D., Grant, M.G., Nixon, M.S., Carter, J.N.: On a Large Sequence-based Human Gait Database. In: 4th International Conference on Recent Advances in Soft Computing, pp. 66–71 (2002) 11. Dempster, W.T., Gaughran, G.R.L.: Properties of Body Segments Based on Size and Weight. American Journal of Anatomy 120, 33–54 (1967) 12. Keller, J., Gray, M., Givens, J.: A Fuzzy K-Nearest Neighbor Algorithm. IEEE Transactions on Systems, Man, and Cybernetics 15, 580–585 (1985)
13. Phillips, P.J., Moon, H., Rizvi, S.A., Rauss, P.: The FERET Evaluation Methodology for Face-recognition Algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 1090–1104 (2000)
14. Bouchrika, I., Nixon, M.S.: Exploratory Factor Analysis of Gait Recognition. In: 8th IEEE International Conference on Automatic Face and Gesture Recognition, pp. 1–6 (2008)
15. Pratheepan, Y., Condell, J.V., Prasad, G.: Individual Identification Using Gait Sequences under Different Covariate Factors. In: Fritz, M., Schiele, B., Piater, J.H. (eds.) ICVS 2009. LNCS, vol. 5815, pp. 84–93. Springer, Heidelberg (2009)
A Real-Time Vision-Based Framework for Human-Robot Interaction Meng Chun Lam1, Anton Satria Prabuwono1, Haslina Arshad1, and Chee Seng Chan2 1
Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor D.E., Malaysia 2 Center of Signal and Image Processing Faculty of Computer Science and Information Technology, University of Malaya, 530603 Kuala Lumpur, Malaysia [email protected], {antonsatria,has}@ftsm.ukm.my, [email protected]
Abstract. Building human-friendly robots that are able to interact and cooperate with humans has been an active research field in recent years. A major challenge in this field is to develop robots that can interact and cooperate with humans by understanding human communication modalities. Nonetheless, the human face is a dynamic object with a high degree of variability in its appearance, which makes face detection a difficult problem. In this paper, we present a real-time vision-based framework that detects the human face and analyses the face direction within a window area in order to interact with a robot. A cascade of feature detectors trained with a boosting technique has been employed. Experimental results using servo motors connected to an SD21 and a PIC16F887A microcontroller, as well as the MIABOT Pro, have validated our approach. Our future work is to build an intelligent wheelchair whose motion can be controlled by the user's face direction.
Keywords: Vision-based framework, human-robot interaction, face detection, AdaBoost algorithm, Visual Informatics.
1 Introduction
Advances in technology and robotics are making the possibility of having domestic robots performing tasks inside one's home more concrete. To date, a wide array of assistive robots is already available for the treatment and rehabilitation of temporarily impaired people and for the long-term care of people suffering from chronic illness. A new challenge can be identified in the possibility of implementing robotic devices which go beyond the sphere of mere health assistance and help people in everyday activities at home [4]. Far from being a matter of functionality alone, the issue of acceptability depends on the psychological implications of Human-Robot Interaction (HRI). One of the most important components in developing a human-friendly robot is the visual interface, as it facilitates a natural and easy interface for human-robot interaction [7]. Facial gesture can be a natural way to control a robot. Bakx et al. [3] have analysed facial orientation during multi-party interaction with a multi-modal information
booth. They found that users were nearly always looking at the screen of the information kiosk when interacting with the system. In the near future, robots should be able to interact with humans in a more natural way, such as through vision-based interfaces that operate in real time: for instance, tracking the user's facial features and movement, or understanding the user's facial expression to know whether the user is angry, happy or sad and reacting appropriately. Nonetheless, robots are also required to understand some prominent modalities used by humans in order to facilitate natural interaction. These include speech, eye contact, body language and hand-pointing gestures. One of the main reasons is that humans are more comfortable interacting with a robot naturally than using physical devices such as a keyboard and mouse to control the robot.
Fig. 1. MIABOT Pro
Looking into this, vision-based face detection and tracking systems have been established in many applications, e.g. video conferencing, distance learning, video surveillance, biometric access control, as well as virtual environments and video games [1]. In this paper, a real-time vision-based interface is designed to detect the human face using the Viola and Jones [10] approach. That is, first of all, we employ a webcam attached to a personal computer to detect the human face in a cluttered environment. Secondly, an analysis is performed to find the human face direction within a window area. Finally, based on the window-area information, a signal is sent to the robot simulation head controlled by a servo motor, and to the miniature mobile robot (MIABOT Pro, as illustrated in Figure 1). In our experiment, the robot simulation head is designed to mimic the user's head direction, and the MIABOT Pro is designed to move according to the user's head position. The remainder of this paper is organised as follows: In Section 2 we review some related work. Section 3 describes the real-time AdaBoost algorithm for face detection. Section 4 presents experimental results for effective non-intrusive human-robot interaction. Section 5 concludes the paper.
2 Related Work Human-Robot Interaction (HRI) is a field of study dedicated to understanding, designing, and evaluating robotic systems for use by or with humans. Interaction, by
definition, requires communication between robots and humans. The communication between a human being and a robot must be “human-friendly” which may take several forms. In this paper, however, we only focus on the literature review on facial detection. Face detection is one of the visual tasks which humans can do effortlessly. However, in computer vision terms, this task is not easy. A general statement of the problem can be defined as follows: Given a still or video image, detect and localize an unknown number (if any) of faces. The solution to the problem involves segmentation, extraction, and verification of faces and possibly facial features from an uncontrolled background. As a visual frontend processor, a face detection system should also be able to achieve the task regardless of illumination, orientation, and camera distance. As the most primitive feature in computer vision applications, edge representation was applied in the earliest face detection work by Sakai et al. [9]. The work was based on analysing line drawings of the faces from photographs, aiming to locate facial features. So far, many different types of edge operators have been applied. Colour segmentation can basically be performed using appropriate skin colour thresholds where skin colour is modelled through histograms or charts or within a wide user spectrum. For instance, Oliver et al. [8] and Yang and Waibel [2] employ a Gaussian distribution to represent a skin colour cluster of thousands of skin colour samples taken from difference races. The Gaussian distribution is characterized by its mean (µ) and covariance matrix (Σ). Pixel colour from an input image can be compared with the skin colour model by computing the Mahalanobis distance. This distance measure then gives an idea of how close the pixel colour resembles the skin colour of the model. Recently, a novel Bayesian Discriminating Features (BDF) method for multiple frontal face detection was proposed [6]. The BDF method, which is trained on images from only one database, yet works on test images from diverse sources, displays robust generalization performance. The novelty of this paper comes from the integration of the discriminating feature analysis of the input image, the statistical modelling of face and non-face classes, and the Bayes classifier for multiple frontal face detection.
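To make the skin-colour comparison above concrete, the following is a minimal Python sketch (not code from the paper) of fitting a Gaussian skin-colour model and scoring a pixel with the Mahalanobis distance; the sample values, the function name and the small regularisation term are illustrative assumptions.

import numpy as np

def mahalanobis_skin_distance(pixel_rgb, mu, cov):
    """Distance of a pixel colour from a Gaussian skin-colour model (mean mu, covariance cov)."""
    diff = np.asarray(pixel_rgb, dtype=float) - mu
    # Mahalanobis distance: sqrt((x - mu)^T * Sigma^-1 * (x - mu))
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# Fit the model from a handful of hypothetical skin pixels, then test one pixel.
skin_samples = np.array([[200, 150, 130], [190, 140, 120], [210, 160, 150],
                         [180, 135, 125], [220, 170, 140]], dtype=float)
mu = skin_samples.mean(axis=0)
cov = np.cov(skin_samples, rowvar=False) + 1e-6 * np.eye(3)   # ridge keeps cov invertible
d = mahalanobis_skin_distance([205, 155, 135], mu, cov)
print("Mahalanobis distance to skin model:", d)   # a small distance suggests a skin-like colour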
3 Face Detection As stated in previous section, a wide variety of techniques have been proposed, ranging from simple edge-based algorithms to composite high-level approaches utilizing advanced pattern recognition methods. Viola and Jones [10] introduced an impressive face detection system capable of detecting faces in real-time with both high detection rate (about 90%) and very low false positive rates. The desirable properties are attributed especially to the efficiently computable features used, the AdaBoost learning algorithm and the cascade technique adopted for decision making. In this paper, we have adopted the Viola and Jones [10] approach to perform the HRI, and the proposed framework is illustrated in Figure 2.
Fig. 2. The face detection framework
3.1 AdaBoost Algorithm
The detection procedure classifies images based on the value of simple scalar features. There are many motivations for using features rather than the pixels directly. The most common reason is that features can act to encode ad hoc domain knowledge that is difficult to learn using a finite quantity of training data; another reason is that a feature-based system operates much faster than a pixel-based system. The features are similar to Haar basis functions. They operate on grey-level images, and the decision depends on the value of differences of sums computed over rectangular regions. Viola and Jones use three kinds of features, depicted in Figure 3. The value of a two-rectangle feature is the difference between the sums of the pixels within two rectangular regions. The regions have the same size and shape and are horizontally or vertically adjacent. A three-rectangle feature computes the sum within two outside rectangles subtracted from the sum in the centre rectangle. Finally, a four-rectangle feature computes the difference between diagonal pairs of rectangles. Two other types of features can easily be generated by rotating the first two types by 90°.
(a) Edge feature
(b) Line feature
(c) Centre feature Fig. 3. Haar-like features employed by Viola and Jones [10]
Rectangle features can be computed very efficiently by means of an auxiliary image, also referred to as an integral image. The integral image, I, at location (x, y) is defined as the sum of the pixels of the rectangle ranging from the top left corner at (0, 0) to the bottom right corner at (x, y):

I(x, y) = Σ_{x' ≤ x, y' ≤ y} i(x', y')    (1)
where i is the input image in question. The integral image can be computed in one pass over the original image by using the following recurrence relation:
I(x, y) = I(x, y − 1) + I(x − 1, y) + i(x, y) − I(x − 1, y − 1)    (2)
where I (−1, y) = I (x, −1) = I (−1, −1) = 0. Using the integral image any rectangular sum can be computed in four array references [10]. Since the two-rectangular features defined above involve adjacent rectangular sums they can be computed in only six references, eight in the case of three-rectangle features and nine for four-rectangle features. As we have seen the scalar feature is a transformation from the n-dimensional image space to the real line. Given that the base resolution of the face detector is 24 x 24 pixels the image space is 576-dimensional. For each such detector window there are tens of thousands of different features ranging over all scales and positions within the window. Thus, the set of rectangular features is an over-complete set for the intrinsically low-dimensional image pattern x. However, as we will see in the next section only a very small number of these features are needed to form an effective classifier.
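As an illustration of the integral image of equations (1) and (2) and of rectangle sums built from it, the sketch below (not taken from the paper) computes an integral image with cumulative sums and evaluates a two-rectangle feature; the toy image and the chosen rectangles are arbitrary examples.

import numpy as np

def integral_image(img):
    """Integral image I(x, y) = sum of img over the rectangle from (0, 0) to (x, y)."""
    return np.cumsum(np.cumsum(np.asarray(img, dtype=np.int64), axis=0), axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of the original image over rows top..bottom and columns left..right (inclusive),
    using at most four references into the integral image ii."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return int(total)

img = np.arange(16).reshape(4, 4)            # toy 4x4 grey-level image
ii = integral_image(img)
# two-rectangle (edge) feature: left half minus right half of the window
feature = rect_sum(ii, 0, 0, 3, 1) - rect_sum(ii, 0, 2, 3, 3)
print(feature)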
3.2 Learning the Classifier Function
Given a feature set and a training set of positive and negative images, different machine learning approaches could be used to learn a classification function. The system of Viola and Jones [10] makes a successful application of the discrete version of AdaBoost to learn the classification function. Specifically, AdaBoost is adapted to solve the following three fundamental problems in one boosting procedure:
First - Learning effective features from a large feature set. The conventional AdaBoost procedure can easily be interpreted as a greedy feature selection process. Consider the general problem of boosting, in which a large set of classification functions are combined using a weighted majority vote. The challenge is to associate a large weight with each good classification function and a smaller weight with poor functions. Drawing an analogy between weak classifiers and features, AdaBoost is an effective procedure for searching out a small number of good features.
Second - Constructing weak classifiers, each of which is based on one of the selected features. One practical method for completing this analogy is to restrict the weak learner to the set of classification functions each of which depends on a single feature. In support of this goal, the weak learning algorithm is designed to select the single rectangle feature which best separates the positive and negative examples. This implies choosing the feature which minimises the error εt or Zt, for the discrete or real version of AdaBoost, respectively.
Third - Boosting the weak classifiers into a stronger classifier. AdaBoost combines the selected features as a linear combination and provides a strong theoretical backbone for the bound of the training error. The adopted AdaBoost algorithm is described in Algorithm 1.
3.3 Cascade Building
In order to detect a face in an image, we need to examine all possible sub-windows and determine whether they contain a face or not. This is done for all possible positions and for different scales. In a regular image of 320 x 240 pixels there are over 500,000 sub-windows. In order to reduce the total running time of the system, we need to radically bound the average time that the system spends on each sub-window. For this purpose, Viola and Jones [10] suggested using a cascade of classifiers. The idea of the cascade is based on the observation that we need very few features to create a classifier that accepts almost 100% of the positive examples while rejecting many (20-50%) of the false examples. Linking many such classifiers one after another will create a cascade of classifiers that separates true from false examples almost perfectly. The structure of the cascade reflects the fact that within any single image an overwhelming majority of sub-windows are negative. As such, the cascade attempts to reject as many negatives as possible at the earliest stage possible. While a positive instance will trigger the evaluation of every classifier in the cascade, this is an exceedingly rare event.
Algorithm 1. The AdaBoost Algorithm – Learning Phase
GIVEN sample images (x1, y1), …, (xm, ym), where yi = −1, +1 for negative and positive samples, respectively.
INITIALISE weights D1(i) = 1/m, where Dt is a probability distribution.
ITERATE t = 1, …, T:
1. Normalise the weights: Dt(i) = Dt(i) / Σ_{i=1..m} Dt(i)
2. For each feature, k:
   - Compute a histogram, with N bins, using the values of the feature extracted from the training images.
   - Train a classifier, hk, using the histogram and
     cj = (1/2) ln [ (Σ_{i=1..m} Dt(i)[xi ∈ Bin(j) ∧ yi = +1] + ε) / (Σ_{i=1..m} Dt(i)[xi ∈ Bin(j) ∧ yi = −1] + ε) ],
     where cj = h(x) for x ∈ Bin(j) and ε is a small positive number.
   - Compute Zk = Σ_{j=1..N} Σ_{i: xi ∈ Bin(j)} D(i) exp(−yi cj)
3. Choose the classifier, ht, with the smallest Zt.
4. Update the weights: Dt+1(i) = Dt(i) exp(−yi ht(xi))
RETURN the final classifier: H(x) = sign( Σ_{t=1..T} ht(x) )
The building process of the cascade is described in Algorithm 2. Inputs to the algorithm are the desired false positive rate, f, and the detection rate, d, of the cascade stages, and the final false positive rate, ffinal, of the cascade. Each stage of the cascade is trained by AdaBoost, with the number of features used being increased until the target detection and false positive rates can be met for this level. Because AdaBoost attempts only to minimise errors and is not specifically designed to achieve high detection rates at the expense of larger false positive rates, a threshold, θ, for the strong classifier is introduced. That is, let the new strong classifier be

H(x) = sign( Σ_{t=1..T} ht(x) − θ )    (3)

Reducing the threshold yields a classifier with more false positives and a higher detection rate. The rates are determined by testing the current detector on a validation
set. Subsequent classifiers are trained using those examples which pass through all the previous stages. As a result, the second classifier faces a more difficult task than the first. The examples which make it through the first stage are "harder" than typical examples.
Algorithm 2. The Cascade Detector Algorithm – Learning Phase
INPUT: false positive rate, f, detection rate, d, and final false positive rate, ffinal
INITIALISE F0 = 1, D0 = 1
WHILE Fi > ffinal:
1. Train a classifier with AdaBoost (Algorithm 1) until freached < f and dreached > d on the validation set.
2. Fi+1 = Fi × freached
3. Di+1 = Di × dreached
4. Discard misclassified faces and generate new data from non-face images.
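The following is a minimal, hypothetical sketch of how a trained cascade is applied to a single sub-window, to illustrate the early-rejection behaviour described above; the stage classifiers, thresholds and feature names are placeholders, not values from Viola and Jones or from this paper.

def cascade_detect(subwindow, stages):
    """Evaluate one sub-window against a cascade of (strong_classifier, threshold) stages.

    Each strong_classifier maps a sub-window to a real score; the sub-window is
    rejected as soon as any stage score falls below that stage's threshold, so
    most negative sub-windows are discarded after only a few cheap stages."""
    for strong_classifier, threshold in stages:
        if strong_classifier(subwindow) < threshold:
            return False        # rejected early, almost no further cost
    return True                 # survived every stage, report a face

# Hypothetical two-stage cascade: the scores are stand-ins for boosted sums of weak classifiers.
stages = [
    (lambda w: w["mean_contrast"], 0.2),     # very cheap first stage
    (lambda w: w["eye_band_score"], 0.5),    # more selective second stage
]
print(cascade_detect({"mean_contrast": 0.4, "eye_band_score": 0.7}, stages))  # True
print(cascade_detect({"mean_contrast": 0.1, "eye_band_score": 0.9}, stages))  # False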
4 Simulation Results
In this paper, we conducted the experiments on a robot simulation head controlled by a servo motor, and on a miniature mobile robot (MIABOT Pro). Face images are taken from Caltech-101 [5]. The data set contains face images of variable quality, with different facial expressions, taken under a wide range of lighting conditions, with uniform or complex backgrounds. The pose of the heads is generally frontal with slight rotation in all directions. The data set contains 450 images. Non-face images were collected from the web. Images of diverse scenes were included. The data set contains images of animals, plants, countryside, man-made objects, etc.
4.1 Servo Motor
For the robot simulation head, a protocol must be created before the application can control the servo motor. This protocol enables the microcontroller to understand the signals that come from the computer and send commands to the SD21 to rotate the servo motor. Tables 1 and 2 show the protocol used in this experiment. Figure 4 shows the robot simulator head mimicking the user's head direction in the window area. It can be seen that the approach successfully controls the robot simulation head.

Table 1. Command for the robot head simulator and corresponding signal

Instruction                              Signal
Reset Coordinate 'X' Position            X
Reset Coordinate 'Y' Position            Y
Reset Coordinate 'X' AND 'Y' Position    B
Position 'UP'                            U
Position 'DOWN'                          D
Position 'LEFT'                          L
Position 'RIGHT'                         R
Table 2. Speed for the robot simulator head and corresponding signal

Speed     Signal
Fast      F
Normal    N
Slow      S
Fig. 4. Results of the robot simulator head: (a) head direction (up, down, left, right); (b) head direction (up-right, up-left, down-left, down-right). It can be noticed that the robot simulator head is mimicking the user's head direction.
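A minimal sketch of how the single-character signals in Tables 1 and 2 might be sent to the microcontroller from a PC is given below; the serial port name, baud rate and the exact framing of the speed and position characters are assumptions for illustration only (only the characters themselves come from the tables).

import serial   # pyserial; the port settings below are illustrative assumptions

POSITION = {"up": "U", "down": "D", "left": "L", "right": "R",
            "reset_x": "X", "reset_y": "Y", "reset_xy": "B"}   # from Table 1
SPEED = {"fast": "F", "normal": "N", "slow": "S"}              # from Table 2

def send_head_command(port, direction, speed="normal"):
    """Send a speed signal followed by a position signal; the ordering is an assumption."""
    port.write(SPEED[speed].encode("ascii"))
    port.write(POSITION[direction].encode("ascii"))

if __name__ == "__main__":
    with serial.Serial("/dev/ttyUSB0", 9600, timeout=1) as port:   # hypothetical port and baud rate
        send_head_command(port, "right", speed="slow")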
4.2 MIABOT The MIABOT Pro is a fully autonomous miniature mobile robot. The latest version features bi-directional Bluetooth communications, which provides a robust frequency hopping wireless communications protocol at 2.4GHz. MIABOT Pros are ideal robots for use as part of technology class tutorials, for research and development, and for high-speed and agility requirements such as robot competitive events. Universities already use MIABOTs worldwide for a wide variety of applications including FIRA robot soccer competitions, intelligent behaviours, robot swarming experiments and mobile robot navigation experiments.
Fig. 5. Experimental results of the MIABOT Pro (rotated right): (a) original position; (b) rotated right
The manufacturer has designed a set of Bluetooth communication protocols. Basically, all commands begin with the command-start character '[' and end with the command-end character ']'. The first character after '[' identifies the command; the bytes after that are free-format, determined by the specific command used. Extra characters after the command arguments and before ']' are ignored, and extra characters between commands (after ']' and before '[') are also ignored. In order to rotate the MIABOT Pro or move it forward/backward, the commands are shown below:
[<] : turn left
[>] : turn right
[^] : step forward
[v] : step backward
A command is sent according to the position of the human face in the window area. For example, in Figure 5 the human face is in the left side of the window, so the command [<] is sent to rotate the MIABOT Pro; Figure 6 shows the human face moving to the upper area of the window, so the command [^] is sent to make the MIABOT Pro move forward.
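Below is a small illustrative sketch (not the authors' implementation) that maps the detected face position in the window to one of the MIABOT commands listed above and sends it over a Bluetooth serial link; the window thresholds, port name and baud rate are assumptions.

import serial   # pyserial; the serial settings are illustrative assumptions

# Command characters from the manufacturer protocol quoted above; every command
# is wrapped in the start/end characters '[' and ']'.
MOVES = {"left": "<", "right": ">", "forward": "^", "backward": "v"}

def face_region_to_command(cx, cy, width, height):
    """Map the face centre (cx, cy) inside a width x height window to a MIABOT move.
    The thresholds splitting the window into left/right/upper/lower areas are
    illustrative choices, not values from the paper."""
    if cx < width * 0.35:
        return MOVES["left"]
    if cx > width * 0.65:
        return MOVES["right"]
    if cy < height * 0.35:
        return MOVES["forward"]     # face near the top of the window -> step forward
    return MOVES["backward"]

def send_move(port, move_char):
    port.write(("[" + move_char + "]").encode("ascii"))

if __name__ == "__main__":
    with serial.Serial("/dev/rfcomm0", 115200, timeout=1) as bt:   # hypothetical Bluetooth serial
        send_move(bt, face_region_to_command(cx=80, cy=120, width=320, height=240))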
Fig. 6. Experimental results of the MIABOT Pro (moving forward): (a) original position; (b) moving forward
5 Conclusion
Face detection and tracking for human-robot interaction provide a vision-based interface so that interaction between a human and a robot can be carried out in a natural way. Such an interface is able to detect the human face position to control the robot movement. The application is simple to employ and passive, thereby making it a human-friendly robot control. This work is in its preliminary stages, and there are numerous promising directions we hope to explore. For instance, the approach will be applied to build an intelligent wheelchair. However, we wonder, at this point, whether a "best" human-robot interface exists and what kind of characteristics it should have, but we believe that every situation requires specific features and different strategies of interaction.
Acknowledgment. The authors would like to thank Ministry of Higher Education Malaysia (MoHE) and Universiti Kebangsaan Malaysia (UKM) for providing facilities and financial support under Research University Grant of Arus Perdana No. UKM-AP-ICT-17-2009 and Fundamental Research Grant Scheme No. UKM-TT-03FRGS0131-2010.
References
1. Barth, A., Herpers, R.: Robust Head Detection and Tracking in Cluttered Workshop Environments using GMM. In: Kropatsch, W.G., Sablatnig, R., Hanbury, A. (eds.) DAGM 2005. LNCS, vol. 3663, pp. 442–450. Springer, Heidelberg (2005)
2. Yang, J., Waibel, A.: A real-time face tracker. In: Proceedings of WACV 1996 (1996)
3. Bakx, I., van Turnhout, K., Terken, J.: Facial orientation during multi-party interaction with information kiosks. In: Proceedings of Interact 2003 (2003)
4. Kubota, N., Shimomura, Y.: Human-friendly networked partner robots toward sophisticated services for a community. In: Proceedings of the SICE-ICASE International Joint Conference, pp. 4861–4866 (2006)
5. Li, F.-F., Fergus, R., Perona, P.: One-shot learning of object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 594–611 (2006)
6. Liu, C.: A Bayesian discriminating features method for face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 25, 725–740 (2003)
7. Matsumoto, Y., Heinzmann, J., Zelinsky, A.: The essential components of human-friendly robot systems. In: International Conference on Field and Service Robotics, pp. 43–51 (1999)
8. Oliver, N., Pentland, A., Brard, F.: Lafter: A real-time face and lips tracker with facial expression recognition. Pattern Recognition 33, 1369–1382 (2000)
9. Sakai, T., Nagao, M., Kanade, T.: Computer analysis and classification of photographs of human faces. In: Proc. First USA-JAPAN Computer Conference, pp. 55–62 (1972)
10. Viola, P., Jones, M.: Robust real-time face detection. International Journal of Computer Vision 57(2), 137–154 (2004)
Automated Hemorrhage Slices Detection for CT Brain Images Hau-Lee Tong1, Mohammad Faizal Ahmad Fauzi2, and Su-Cheng Haw1 1 Faculty of Information Technology, Multimedia University, Jalan Multimedia, 63100 Cyberjaya, Selangor, Malaysia 2 Faculty of Engineering, Multimedia University, Jalan Multimedia, 63100 Cyberjaya, Selangor, Malaysia {hltong,faizal1,schaw}@mmu.edu.my
Abstract. This paper presents an automated method to detect hemorrhage slices in Computed Tomography (CT) brain images. The proposed system can be divided into two stages, which are the preprocessing stage and the detection stage. Preprocessing basically prepares and enhances the images for the detection stage. During the detection stage, a new midline detection approach is proposed to divide the intracranial area into left and right hemispheres. Then histogram features are extracted from the left and right hemispheres for the dissimilarity comparison. All the feature components are channeled into a support vector machine (SVM) classifier to determine the existence of hemorrhage. Ten-fold cross validation was applied during the SVM classification. The experiments were performed on 450 CT images and the results were evaluated in terms of recall and precision. The recall and precision obtained from the experimental results for hemorrhage slice detection are 84.86% and 96.82%, respectively.
Keywords: Computed tomography, Hemorrhage slices detection, Visual Informatics.
1 Introduction
Computed tomography (CT) is one of the modalities that is used widely to reveal hemorrhage of the brain. Hemorrhage is one of the abnormalities that normally appear as a bright region which contrasts with other tissues in CT images. The detection of hemorrhage is significant so that immediate and proper treatment can be carried out. Treatment for hemorrhage relies on the location, cause, and extent of the hemorrhage. Surgery may be needed to alleviate swelling and prevent bleeding. The computer-aided detection of hemorrhages is to furnish additional information to the doctors to confirm their diagnosis. The main challenge in automated computer-aided hemorrhage detection is to improve the accuracy of the detection. Different methodologies have been presented by other researchers hoping
to achieve better detection accuracy. Liu et al. [1] proposed a symmetric detection approach to detect the abnormalities in their semantics-based biomedical image indexing and retrieval system. A similar midline approach for hemorrhage detection is also employed by Hara et al. [2] and Chan [3]. Another midline detection approach, presented by Chawla et al. [4], uses the slice with the steepest curve for each patient to detect the rotation angle for the correction and symmetric line. Then, the dissimilarity between the left and right hemispheres is evaluated to detect acute infarct, chronic infarct and hemorrhage. Besides the midline approach, other techniques to detect the abnormalities have also been explored. By using the extracted features from the regions, simple rule-based approaches [5], [6] have been proposed to annotate regions such as calcification, hemorrhage and stroke lesion. Liu et al. [7] proposed a support vector machine learning model for hemorrhage slice detection by using extracted features from the hemorrhage slices. In this paper, we propose a simple new methodology for the detection of hemorrhage slices. A new midline formation approach is used in detecting the hemorrhage slices. The main objective of this work is to create a retrieval system so that medical experts and students can use it to retrieve hemorrhage images automatically for further study and analysis rather than spend substantial time browsing through the whole image database.
2 Proposed Method
Overall, the proposed system consists of two essential stages, which are preprocessing and detection, as shown in Fig. 1.
Fig. 1. Flow of the proposed system
The images will undergo three preprocesses to prepare and improve the visualization of the images for the detection stage. The first level image enhancement improves the contrast of the original image. For the intracranial area extraction, regions other than the intracranial area are removed. The second level image enhancement refines the contrast of the hemorrhage regions to prepare the image for feature extraction.
After preprocessing, the images are ready for the detection process. The detection process also consists of three parts. The formation of the midline divides the intracranial area into left and right hemispheres. After that, features are extracted from the left and right hemispheres respectively. Then histograms are constructed and Euclidean distances are computed to measure the dissimilarity between the left and right hemispheres. All the extracted feature components are then normalized and used for the SVM classification to differentiate the hemorrhage slices from the normal slices.
3 Preprocessing
3.1 First Level Image Enhancement
The first level image enhancement augments the contrast of the original image. The original image lacks dynamic range, whereby only certain parts are visible, as shown in Fig. 2.
Fig. 2. Original image
Therefore, an automated visualization enhancement system is crucial to improve the visibility of the images. In order to automate the system, the original image undergoes five essential processes, as shown in Fig. 3: constructing the histogram, smoothing the curve, transforming to the absolute first difference, acquiring the upper and lower limits, and applying the linear contrast stretching.
Fig. 3. Contrast enhancement flow
Firstly, the histogram for the original image is constructed as shown in Fig. 4. The constructed histogram consists of several peaks. The leftmost and rightmost major peaks are contributed by the background and the ROIs respectively. Therefore, the focus is to stretch the contrast for the rightmost peak. Then, it proceeds with a smoothing process where a convolution operation with a vector is applied. Each element of the
vector has the value 10⁻³. The smoothing process facilitates the acquisition of appropriate upper and lower limits. The smoothened curve is then transformed into the absolute first difference (ABS) as shown in Fig. 5. Then the closest peaks on the left and right are automatically detected and determined as the lower limit, IL, and upper limit, IU, respectively.
Fig. 4. Histogram of the Fig. 2
Fig. 5. Absolute first difference
Eventually, the acquired IL and IU are employed for the linear contrast stretching as shown in equation (1):

F(i, j) = Imax · (I(i, j) − IL) / (IU − IL)    (1)
where Imax, I(i, j) and F(i, j) denote the maximum intensity of the image, the pixel value of the original image and the pixel value of the contrast-improved image, respectively. After the contrast stretching, the visually enhanced image is shown in Fig. 6.
Fig. 6. Enhanced image
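A minimal Python sketch of the linear contrast stretching of equation (1) is shown below; the clipping to the valid intensity range and the example intensity values are practical assumptions added for illustration and are not specified in the paper.

import numpy as np

def linear_contrast_stretch(img, lower, upper, i_max=4095):
    """Linear contrast stretching of equation (1):
    F(i, j) = I_max * (I(i, j) - I_L) / (I_U - I_L), clipped to [0, I_max]."""
    img = np.asarray(img, dtype=np.float64)
    stretched = i_max * (img - lower) / float(upper - lower)
    return np.clip(stretched, 0, i_max)        # clipping is a practical addition, not from the paper

# Example: stretch a toy slice so that the range [900, 1500] fills the full scale.
slice_ct = np.array([[800, 950], [1200, 1600]], dtype=np.float64)
print(linear_contrast_stretch(slice_ct, lower=900, upper=1500))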
3.2 Intracranial Extraction
In this stage, the intracranial area is acquired by filtering out the background, skull and scalp. First, the scalp is filtered out by using a thresholding technique. The resulting binary image is shown in Fig. 7. In the binary image, the skull always appears as the largest connected component compared with the background objects. As such, we only need to identify the largest connected component in order to recognize the skull. After that, the skull is removed by setting the intensity of the skull to zero. The intracranial mask is generated by filling up the hole inside the skull. Lastly, the intensity of the skull is set to zero and the obtained intracranial area is shown in Fig. 8.
Fig. 7. Resulted binary image
Fig. 8. Intracranial area
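The intracranial extraction described above can be sketched with standard morphological operations as follows; this is an illustrative approximation (using SciPy) rather than the authors' implementation, and the skull threshold is a hypothetical parameter.

import numpy as np
from scipy import ndimage

def extract_intracranial(img, skull_threshold):
    """Rough sketch of the intracranial extraction: threshold, keep the largest
    connected component as the skull, fill its interior, and zero everything else."""
    binary = img > skull_threshold                  # scalp/background suppressed by thresholding
    labels, n = ndimage.label(binary)               # connected components of the binary image
    if n == 0:
        return np.zeros_like(img)
    sizes = ndimage.sum(binary, labels, range(1, n + 1))
    skull = labels == (np.argmax(sizes) + 1)        # skull = largest connected component
    mask = ndimage.binary_fill_holes(skull)         # fill the hole inside the skull
    mask = mask & ~skull                            # drop the skull itself, keep the interior
    out = img.copy()
    out[~mask] = 0                                  # zero everything outside the intracranial area
    return out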
3.3 Second Level Image Enhancement
The aim of this stage is to refine the contrast of the hemorrhagic regions, whose intensity lies beyond the peak intensity of the first level visualization enhancement image. As such, the image enhancement is focused on the higher intensity side to improve the visualization of the hemorrhagic regions. Prior to the visualization enhancement, appropriate lower and upper limits are automatically determined by the following steps:
i) Construct the histogram for the intracranial image.
ii) Auto-determine the peak position to be the lower limit, IL. Then the upper limit is defined by equation (2):

IU = IL + Ie    (2)

where Ie is predefined at 500, found from experimental observation.
iii) Channel the obtained values of IU and IL into equation (1) for contrast stretching.
In order to reduce the "salt and pepper" noise that appears in the enhanced image, a median filter is employed. The image produced after the visualization enhancement and "salt and pepper" noise reduction is shown in Fig. 9.
Fig. 9. Hemorrhage contrast enhanced image
4 Detection Stage
4.1 Midline Formation
Midline detection is based on the contour of the intracranial area. The entire contour is divided into top and bottom sub-contours by using two different bounding boxes. Then two points of interest, for the frontal crest and the internal occipital protuberance, will be
detected from the respective sub-contours. These two points of interest will be used to construct the midline. Because different images have different contours, a two-level detection for the points of interest is proposed. For the first level point detection, a line scanning scheme is proposed to detect the two points of interest. For the second level, the Radon transform is first utilized to narrow down the searching line. One of the main applications of the Radon transform in image processing is line detection [8]. The Radon transform can be written as:
R(ρ, θ) = ∬ f(x, y) δ(ρ − x cos θ − y sin θ) dx dy    (3)
where ρ, θ and f(x, y) denote the distance from the origin to the line, the angle from the X-axis to the normal direction of the line, and the pixel intensity at coordinate (x, y). The Dirac delta function is represented by δ(•). After narrowing down the searching line, a moving average is used to locate the points of interest. Basically, the moving average computes the average intensity for all the points along the contour line. The entire process is summarized as follows:
i) Acquire the top and bottom sub-contours from the entire contour by using bounding boxes; the results are shown in Fig. 10(b) and Fig. 10(c).
ii) Perform the line scanning commencing from the endpoints of the sub-contours until it locates the local minimum for the top sub-contour and the local maximum for the bottom sub-contour. The line scanning of the local maximum, (x1, y1), for the bottom sub-contour is illustrated in Fig. 10(d). In the case where the local minimum or maximum point is not found, for example for the top sub-contour as shown in Fig. 10(b), the detection process proceeds with step (iii).
iii) Apply the Radon transform to detect the line feature by integrating the image intensity in the θ direction. From our experimental observation, θ is fixed at 170 degrees. This narrows down the searching contour line for the points of interest. The shortened line is thickened as depicted in Fig. 10(e). The searching line can be elongated by using a lower value of theta where necessary.
iv) Locate the point of interest, (x2, y2), from the shortened contour line by using the moving average, where the point possessing the highest average value of intensity is chosen as the point of interest.
v) The midline is formed by using the linear interpolation defined in equation (4), and the result is shown in Fig. 11.
y = y1 + (x − x1)(y2 − y1) / (x2 − x1)    (4)
Fig. 10. (a) Contour of intracranial area (b) Top sub-contour (c) Bottom sub-contour (d) Line scanning for local maxima detection (e) Shortened searching line (f) Located highest average value of intensity point
Fig. 11. Detected midline
4.2 Histogram Feature Extraction and Dissimilarity Measure
After midline acquisition, two histogram features are considered in order to differentiate the hemorrhage slices from non-hemorrhage slices. The detected midline from Section 4.1 is used to divide the intracranial area into left and right hemispheres. For the first feature extraction, we consider the first level contrast enhanced images. The local binary pattern (LBP) [9] texture descriptor is extracted from the left and right hemispheres respectively. In the two-level version, there are only 2^8 = 256 possible texture units. The texture unit of a center pixel for a 3x3 neighbourhood is obtained based on the following steps:
i) The eight neighbourhood pixels' values are thresholded by the value of the center pixel, and the result is shown in Fig. 12(b).
ii) Multiply the weights as in Fig. 12(c) by the thresholded values of the eight neighbourhood pixels given in Fig. 12(b). The result is shown in Fig. 12(d).
iii) Then sum up the values of the eight neighbourhood pixels to acquire the texture unit number for the center pixel, which is 76 in this case.

(a) original    (b) thresholded    (c) weights        (d) weighted
 7 9 3           0 1 0              1  8  32           0  8   0
 6 8 9           0 ? 1              2  ?  64           0  ?  64
 9 3 4           1 0 0              4 16 128           4  0   0

Fig. 12. Illustration of the acquisition of the texture unit for the 3x3 sub-matrix in (a)
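A minimal Python sketch of the 3x3 texture-unit computation of Fig. 12 is given below; the weight layout follows Fig. 12(c), and the worked example reproduces the value 76 from the text.

import numpy as np

# Weights arranged as in Fig. 12(c): powers of two assigned to the eight neighbours.
LBP_WEIGHTS = np.array([[1, 8, 32],
                        [2, 0, 64],
                        [4, 16, 128]])

def lbp_texture_unit(patch):
    """Texture unit of the centre pixel of a 3x3 patch, as in Fig. 12."""
    patch = np.asarray(patch)
    centre = patch[1, 1]
    thresholded = (patch >= centre).astype(int)    # step (i): threshold by the centre value
    thresholded[1, 1] = 0                          # the centre itself carries no weight
    return int((thresholded * LBP_WEIGHTS).sum())  # steps (ii)-(iii): weight and sum

print(lbp_texture_unit([[7, 9, 3],
                        [6, 8, 9],
                        [9, 3, 4]]))               # -> 76, matching the worked example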
After acquisition of the texture units, 256-bin LBP histograms are constructed for both the left and right hemispheres. Finally, the dissimilarity of the LBP histograms between the left and right hemispheres is measured by using the Euclidean distance. For the second feature extraction, however, we consider the hemorrhage contrast enhanced image as shown in Fig. 9. Histograms of intensities are constructed for the left and right hemispheres. From a priori knowledge, the intensity of the hemorrhage is located after the peak intensity of the first level image enhancement image; therefore, the bins of interest are the high-indexed bins. To obtain the high-indexed bins, the histograms are thresholded as shown in Fig. 13 and Fig. 14, based on the lower limit, IL, obtained in Section 3.3. Then the dissimilarity of the intensity between the left and right hemispheres is computed by using the Euclidean distance.
Fig. 13. (a) Right hemisphere for Fig. 9; (b) thresholded histogram for the right hemisphere
Fig. 14. (a) Left hemisphere for Fig. 9; (b) thresholded histogram for the left hemisphere
4.3 Support Vector Machine (SVM) Classification
For the classification, we used an SVM with a linear kernel for the model learning. SVM is one of the supervised pattern classification techniques; therefore, the data is divided into training and testing sets. The SVM technique maps a testing feature vector Y = {y1, y2, …, y7} of a region to the closest class in the set {Ci}, where i is the class number. All the obtained Euclidean distances for the LBP and intensity histograms are used for the classification at this stage. Prior to the SVM classification, all the feature components are normalized by using equation (5):
f̃c = (fc − l) / (u − l)    (5)
where fc, u and l denote the original feature value, the maximum feature value and the minimum feature value for a particular feature. This causes f̃c to be in the [0, 1] range. For SVM, a hyperplane can be defined by:
(w · xi) + b = 0    (6)
where w is the vector perpendicular to the hyperplane and b is the bias. SVM seeks the optimal hyperplane by solving the following optimization problem:
Minimise:  (1/2) ||w||² + C Σ_{i=1..N} ξi    (7)
Subject to:  yi((w · xi) + b) ≥ 1 − ξi,  ξi ≥ 0,
where ξi denotes the slack variable.
The parameter C is a penalty factor which controls the balance of classification accuracy. The SVM linear classifier classifies a given test instance x based on the decision function in equation (8):

f(x) = sign( Σ_{xi ∈ sv} αi yi (xi · x) + b )    (8)

where sv is the set of support vectors and αi is a Lagrange multiplier. The value of f(x) denotes the distance of the test instance from the separating hyperplane and the sign indicates the class label.
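As an illustration of the normalization of equation (5) followed by linear-kernel SVM classification with ten-fold cross validation, the following scikit-learn sketch can be used; the feature matrix and labels here are random placeholders, not the paper's data.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict

# Hypothetical feature matrix: one row per slice, columns such as the left/right
# LBP-histogram distance and the left/right intensity-histogram distance.
X = np.random.RandomState(0).rand(450, 2)
y = np.random.RandomState(1).randint(0, 2, 450)      # 1 = hemorrhage, 0 = normal (dummy labels)

# MinMaxScaler reproduces equation (5): (f_c - l) / (u - l) for each feature.
model = make_pipeline(MinMaxScaler(), SVC(kernel="linear", C=1.0))
pred = cross_val_predict(model, X, y, cv=10)          # ten-fold cross validation
print("accuracy:", (pred == y).mean())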
5 Experimental Results and Discussion The acquired CT brain images are in DICOM format. The images on 63 patients have been retrospectively retrieved from the CT archive of the collaborating hospital. For each patient, around seven to eight slices have been selected. This gives the total 450 CT brain images including 199 slices showing normal and 251 slices showing hemorrhage. The hemorrhage occurs in different places of the brain and can cause the
different type of head traumas. These generate the different types of hemorrhage. Our abnormal slices consist of five common types of hemorrhages which are subdural, epidural, intraventricular, intraparenchymal and subarachnoid hemorrhages as shown in Fig. 15.
Fig. 15. Examples of the five different types of hemorrhage: (a) intraparenchymal, (b) epidural, (c) subdural, (d) intraventricular, (e) subarachnoid
For the classification phase, all feature components extracted from the 450 slices were randomly shuffled. Then ten-fold cross validation was applied to partition all feature values into 10 disjoint subsets. Subsequently, ten iterations of training and validation were performed, such that a different disjoint subset is held out for validation while the remaining subsets are used for training during the SVM classification. Each disjoint subset acts as a new data set which has not been trained on before. Ten-fold cross validation is commonly used for the validation process. Finally, the results obtained from the ten folds are combined to produce a single estimate. Overall, the chosen features generate satisfactory classification results, in which the accuracy obtained for all the slices is 90%. The evaluation of the accuracies is extended to normal and hemorrhage slices by using recall and precision as shown in Table 1. For the classification, recall and precision for hemorrhage are defined as:

Recall = (true hemorrhage slices found) / (true hemorrhage slices found + true hemorrhage slices not found)
Precision = (true hemorrhage slices found) / (true hemorrhage slices found + false hemorrhage slices found)
Actually, in the medical aspect, the recall rate is more meaningful to medical doctors than precision. This is because the recall rate only takes the correctly classified hemorrhage slices into computation and never considers the misclassified hemorrhage slices.

Table 1. Recall and precision for normal and hemorrhage slices

             Normal slices   Hemorrhage slices
Recall       0.9648          0.8486
Precision    0.8348          0.9682
From Table 1, recall is higher for the normal slices as compared with the abnormal slices. This is contributed by normal brain slices having a homogeneous distribution of the gray matter and a clear texture on the brain. In this case, the intensity histogram feature and the LBP texture histogram feature describe the normal brain slices well. The abnormal slices that contain hemorrhage are supposed to appear brighter than the normal tissue of the brain. However, in some cases the intensity of the hemorrhage is very close to that of the normal brain tissue. On top of this, in rare cases a hemorrhage that occurs symmetrically on the left and right hemispheres will cause the slice to be misclassified as a normal slice. All these factors reduce the recall rate for the abnormal slices. However, the precision rate is higher for hemorrhage slices, as only seven normal slices are misclassified as abnormal slices. On the contrary, precision is lower for normal slices, as more hemorrhage slices are misclassified as normal slices, which contributes to the false positive rate.
6 Conclusion We have presented an automated detection method for hemorrhage slices on 450 CT brain images. The proposed method has achieved promising results where the accuracy achieved for 450 CT brain slices is 90%. The proposed method uniquely contributes in creation of an automated system to detect the hemorrhage slices. Some plans for future work include (1) adoption of more features to boost up the accuracy, and (2) detection of more abnormalities in the brain such as infarct, hydrocephalus and atrophy. Acknowledgments. The authors would like to thank Dr. Ezamin Abdul Rahim and Dr. Noraini Abdul Rahim from Serdang Hospital for the image acquisition, annotation and consultation. Without their kind assistance, the production of this paper will not be possible.
References
1. Liu, Y., Lazar, N.A., Rothfus, W.E., Dellaert, F., Moore, A., Schneider, J., Kanade, T.: Semantic-based Biomedical Image Indexing and Retrieval. Trends and Advances in Content-Based Image and Video Retrieval (2004)
2. Hara, T., Matoba, N., Zhou, X., Yokoi, S., Aizawa, H., Fujita, H., Sakashita, K., Matsuoka, T.: Automated Detection of Extradural and Subdural Hematoma for Contrast-enhanced CT Images in Emergency Medical Care. In: Proceedings of SPIE, pp. 1–4 (2007)
3. Chan, T.: Computer aided detection of small acute intracranial hemorrhage on computer tomography of brain. Computerized Medical Imaging and Graphics 31(4-5), 285–298 (2007)
4. Chawla, M., Sharma, S., Sivaswamy, J., Kishore, L.T.: A Method for Automatic Detection and Classification of Stroke from Brain CT Images. In: Engineering in Medicine and Biology Society (2009)
5. Milan, M., Sven, L., Damir, P.: A rule-based approach to stroke lesion analysis from CT brain images. In: 2nd International Symposium on Image and Signal Processing and Analysis, pp. 219–223 (2001)
6. Cosic, D., Sven, L.: Rule-Based Labeling of CT Head Image. In: 6th Conference on Artificial Intelligence in Medicine, pp. 453–456 (1997)
7. Liu, R., Chew, L.T., Tze, Y.L., Cheng, K.L., Boon, C.P., Lim, C.C.T., Qi, T., Tang, S., Zhang, Z.: Hemorrhage slices detection in brain CT images. In: 19th International Conference on Pattern Recognition, pp. 1–4 (2008)
8. Chen, B., Zhong, H.: Line detection in image based on edge enhancement. In: Second International Symposium on Information Science and Engineering, pp. 415–418 (2009)
9. Ojala, T., Pietikainen, M., Harwood, D.: A comparative study of texture measures with classification based on feature distributions. Pattern Recognition 29(1), 51–59 (1995)
CBIR for an Automated Solid Waste Bin Level Detection System Using GLCM Maher Arebey1,*, M.A. Hannan1, R.A. Begum2, and Hassan Basri3 1
Dept. of Electrical, Electronic & Systems Engineering 2 Institute for Environment & Development 3 Dept. of Civil and Structural Engineering Universiti Kebangsaan Malaysia [email protected]
Abstract. Nowadays, as the amount of waste increases, the need for automated bin collection and level detection becomes more crucial. This paper presents an automated bin level detection approach using gray level co-occurrence matrices (GLCM) based on content-based image retrieval (CBIR). The Bhattacharyya and Euclidean distances were used to evaluate the CBIR system. The database consists of different bin images and is divided into five classes: low, medium, full, flow and overflow. The GLCM features are extracted from both the query image and all the images in the database, and the outputs of the query and database images are compared using the Bhattacharyya and Euclidean similarity distances. The results show that the Bhattacharyya distance performs better than the Euclidean distance in retrieving the top 20 images that are close to the query image. The performance of the automated bin level detection system using GLCM and the CBIR system reached 0.716. The combination of the two techniques proved to be efficient and robust.
Keywords: Solid waste management, GLCM, CBIR, Visual Informatics.
* Corresponding author.
1 Introduction
Solid waste management (SWM) is an important environmental health service, and is an integral part of basic urban services [1]. From the earliest primitive human societies there have been attempts to safely collect and dispose of solid waste [1]. It is considered one of the basic services that are currently receiving wide attention in many developing countries [2]. A traditional solid waste management system consists of trucks, bins and a landfill. Due to the growing issue of landfill disposal, many researchers are investigating waste diversion through an integrated solid waste management system [3]. Solid waste planning, monitoring and management require comprehensive,
reliable data and information on solid waste. However, the solid waste database in Malaysia is limited, with data managed by individual local authorities or waste contractors [4]. In order to deal with this great demand for data management, advanced information technologies such as RFID, GPRS, GPS and GIS must be utilized [5]. The collection process can be improved if there is a system that can monitor the truck and bin in real time; such a system will provide real-time information on the bin level status and can allow the operator to reallocate the bins' positions depending on their situation and status, and truck routes can be optimized if sufficient information about the bins is available. Waste discarded around the bin spoils the view of big cities, and bins being overflowed all the time in some places is a serious problem that needs close monitoring. These reasons are our motivation to develop a real-time system for monitoring the bin status and detecting its level. The main problems of the existing solid waste collection and bin level detection systems are as follows [6], [7], [8], [9]:
• There is no system that can be used to estimate the bin level when different kinds of materials are thrown into the bin.
• There is a lack of a proper system for monitoring and tracking, in real time, the trucks and the trash BINs that have been collected.
• There is no system that utilizes advanced image processing techniques to estimate the amount of solid waste inside the BIN and in the surrounding area due to the scattering of waste.
However, with the existing system, it is hard to obtain all these facilities in time. Solid waste monitoring and management need accurate information to make good decisions. To provide all these facilities, an effective and robust system is needed.
2 Proposed System
The proposed system is developed using RFID, GIS and GPRS interfaced with a low cost camera to solve the existing problems and streamline solid waste monitoring and management efficiencies, as shown in Fig. 1. The main goal in improving the overall efficiency is to estimate the waste level of the BIN. An RFID tag is attached to each BIN in order to monitor and track the BIN during the collection process. A low cost camera is attached to the truck in order to capture BIN images. Once the truck enters the BIN area, the camera takes images before and after the BIN collection, to estimate the waste level in the BIN and its surrounding area. Data from the truck network are recorded and forwarded to a control server through the GPS/GPRS system. The control server monitors the information and optimizes truck routes and BIN locations according to the waste estimation. The RFID tag is mounted on a 120 L BIN in order to gather the serial number of the BIN. RFID provides resistance to environmental influences: the radio frequency wavelength is not absorbed by moisture, and the tag is more water-resistant. An RFID reader and a camera are mounted in the truck to capture the serial numbers of the truck and BIN as well as an image of the BIN, which are forwarded to the control server via the GSM/GPRS network.
Fig. 1. Architecture of the solid waste monitoring and management system (bin with RFID tag; truck with black box containing RFID reader, GPS and GSM; GPRS network; GSM receiver; client database; GIS)
When the truck comes close to the BIN, the RFID reader communicates with the RFID tag to capture the tag's ID and other information about the BIN and sends it to the control server to ensure proper collection and management of waste. When the control server receives the serial number, the system receives the first image and compares it with the reference images stored in the database using GLCM. After the collection process is done, the camera captures the second image. In this way, the actual level of the BIN and its surrounding area can be estimated with high precision. The camera, interfaced with the RFID reader and placed on top of the truck, can provide the shape and the area of the objects. Regarding the camera, a low resolution RGB camera is mounted on top of the truck in a position that can cover 3 m2 around the BIN.
3 GLCM and Haralick's Texture Features
The gray level co-occurrence matrix (GLCM) provides a second-order method for generating texture features [10]. The GLCM captures the conditional joint probabilities of all pairwise combinations of grey levels in the image given two parameters: displacement (d) and orientation (θ) [11]. The GLCM can be calculated in two ways, as a symmetric or a non-symmetric matrix. The GLCM is often defined to be symmetric: the neighbouring pixel of (i, j) at θ = 0 would also be considered as being at θ = 180, so that entries would be made at both (i, j) and (j, i). Each GLCM is dimensioned to the number of quantized grey levels (G) [12]. Various texture features can be generated by applying statistics to a GLCM. Before texture features can be calculated, the measures require that each GLCM contains not a count but a probability; the probability density function normalizes the GLCM by dividing the number of times each outcome occurs by the total number of possible outcomes. The probability measure can be defined as:

Pr(x) = {Cij | (d, θ)}    (1)

where Cij, the co-occurrence probability between grey levels i and j, is defined as:

Cij = Pij / Σ_{i,j=1..G} Pij    (2)

where Pij represents the number of occurrences of grey levels i and j within the given d and θ, and G is the quantized number of grey levels. GLCM, one of the best-known texture analysis methods, estimates image properties related to second-order statistics. Each (i, j) entry in the GLCM corresponds to the number of occurrences of the gray levels i and j at a distance d apart in the original image [10]. Fourteen statistical features can be extracted from gray level co-occurrence matrices to estimate the similarity between them. Some of these features must be selected to reduce the computational cost. Clausi stated that the most appropriate features to be represented with GLCM are Contrast, Entropy, Homogeneity, and Correlation [12].
The contrast feature is a difference moment of P and measures the amount of local variation in an image [10]:

CON = Σ_{i,j} Cij (i − j)²    (3)

The entropy feature measures the disorder of an image and achieves its largest value when all elements in the P matrix are equal [10]. When the image is not texturally uniform, many GLCM elements have very small values, which implies that the entropy is very large [13]:

ENT = − Σ_{i,j} Cij log(Cij)    (4)

The homogeneity feature compares the distribution of the values on the diagonal of the GLCM to the distribution of the values off the diagonal [10]:

HOM = Σ_{i,j} Cij / (1 + (i − j)²)    (5)

The correlation feature is a measure of gray-tone linear dependencies in the image:

COR = Σ_{i,j} (i − μi)(j − μj) Cij / (σi σj)    (6)

In this paper, the GLCM parameters used are d = 1 and G = 16, and the values of θ are set to 0°, 45°, 90° and 135°. The statistical features extracted from the gray level co-occurrence matrices are contrast, entropy, homogeneity and correlation.
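A minimal sketch of extracting the four GLCM features with the stated parameters (d = 1, G = 16, θ in {0°, 45°, 90°, 135°}) is shown below, using scikit-image; the quantisation scheme, the averaging over the four orientations and the entropy implementation are illustrative assumptions rather than details from the paper.

import numpy as np
from skimage.feature import graycomatrix, graycoprops   # older scikit-image spells these greycomatrix/greycoprops

def glcm_features(gray_img, levels=16):
    """Contrast, entropy, homogeneity and correlation from GLCMs with d = 1 and
    four orientations, averaged over the orientations (averaging is an assumption)."""
    q = (gray_img.astype(np.float64) / 256.0 * levels).astype(np.uint8)   # quantise to G = 16 levels
    glcm = graycomatrix(q, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=levels, symmetric=True, normed=True)
    contrast = graycoprops(glcm, "contrast").mean()
    homogeneity = graycoprops(glcm, "homogeneity").mean()
    correlation = graycoprops(glcm, "correlation").mean()
    p = glcm[:, :, 0, :]                                  # normalised co-occurrence probabilities
    entropy = -np.sum(p * np.log(p + 1e-12), axis=(0, 1)).mean()
    return np.array([contrast, entropy, homogeneity, correlation])

features = glcm_features(np.random.RandomState(0).randint(0, 256, (120, 160)))
print(features)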
4 Description of the Test Set
The testing database consists of 250 images collected from different bin levels. The images in the database are divided into 5 different classes, each class represented by 50 images. A low cost camera was used, as explained in the previous section, to obtain the bin images at various levels (low, medium, full, flow, overflow). The resolution of the camera is 800 × 600 pixels with 24-bit colour. The database is changed from time to time as new images are added. The images are received via the GPRS connection, as explained in the methodology, and stored
w) Fig. 2. Samples of Bin databasee at different levels (empty, low, medium, full, flow and overflow
in JPG format. Fig.2 show ws some samples of the bin images in the database tthat represent the bin in differen nt levels. CBIR started with a textu ure extraction where each image in the data base is extraccted as set of features by GLCM. Also, the query image is extracted using the same method to mage get the features. The secon nd step is a distance measurement between the query im and database images; differrent distance measurement has been applied to get accuurate result measurement. Fig.3 sh hows the flowchart of CBIR system. Query Image
Fig. 3. Architecture of the content-based image retrieval system (query and database images → feature extraction → similarity measurement → ordering of matching image IDs → visualisation of the nearest 10 images)
The GLCM is applied to all the images in the database: the co-occurrence matrix is calculated and features such as energy, entropy, contrast and inverse difference moment are computed. The similarity of the query image to the images in the database is calculated using two distance measurements, the Bhattacharyya and Euclidean distances. The purpose of using two distances is to compare the performance of the proposed method under both and choose the more efficient one. The Bhattacharyya distance has been used as a class similarity measure for classification:

D_B = (1/8) (M_1 − M_2)^T [(Σ_1 + Σ_2)/2]^{−1} (M_1 − M_2) + (1/2) ln( |(Σ_1 + Σ_2)/2| / sqrt(|Σ_1| |Σ_2|) ) ,   (7)
where M_i is the mean vector for class i and Σ_i its covariance matrix. The other distance used to calculate the similarity between the query image and the database images is the Euclidean distance, computed as follows:

E = sqrt( Σ_k (q_k − d_k)^2 ) ,                                                (8)

where q_k and d_k are the k-th feature values of the query and database image respectively.
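To make the two similarity measures concrete, the sketch below computes the Euclidean distance of Eq. (8) between two feature vectors and the Bhattacharyya distance of Eq. (7) between two classes described by a mean vector and a covariance matrix; the function names and the assumption of Gaussian class statistics are illustrative and not taken from the original implementation.

import numpy as np

def euclidean_distance(q, d):
    # Eq. (8): straight-line distance between query and database feature vectors
    q, d = np.asarray(q, float), np.asarray(d, float)
    return np.sqrt(np.sum((q - d) ** 2))

def bhattacharyya_distance(m1, cov1, m2, cov2):
    # Eq. (7): distance between two classes (mean vector M_i, covariance matrix Σ_i)
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    cov = (cov1 + cov2) / 2.0
    diff = m1 - m2
    term1 = diff @ np.linalg.inv(cov) @ diff / 8.0
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2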
5 Evaluation of Similarity Measurements The objective of this paper is to develop a CBIR system with high efficiency. The two distances were used with the five classes explained in the database section, and their results were compared to identify the better-performing one. 5.1 Test Setup The GLCM features used in the testing stage are contrast, entropy, homogeneity and correlation. The performance of the system is measured using precision and recall, calculated for all the query images. To examine the performance of the proposed system, 250 images were collected for testing and divided into 5 classes, each containing 50 images. Each class represents a different bin level: Low, Medium, Full, Flow and Overflow. 5.2 Performance Metrics The CBIR system performance is evaluated based on the standard precision and recall measures, calculated by Eqs. (9) and (10): the number of relevant images returned for the query divided by the total number of returned images measures the precision, while the number of relevant images returned for the query divided by the total number of relevant images in the database gives the recall [13], [14]. The images retrieved by the system are categorised, with respect to the query image, as relevant or irrelevant [15]; as explained above, each class contains 50 images, all of which are considered relevant images.
Precision = (number of relevant images retrieved) / (total number of retrieved images) ,              (9)

Recall = (number of relevant images retrieved) / (total number of relevant images in the database) .  (10)
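A minimal sketch of how Eqs. (9) and (10) can be evaluated for a single query is given below; representing the retrieved and relevant images as sets of image identifiers is an assumption made for the example.

def precision_recall(retrieved, relevant):
    # Eq. (9) and Eq. (10): both arguments are collections of image identifiers
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall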
5.3 Experimental Setup To evaluate the performance of the proposed system, 20 query images are selected randomly from each class. The query images were tested with the two similarity distances, Bhattacharyya and Euclidean, so that their results could be compared. The GLCM technique described in the section above is used to extract the features (contrast, entropy, homogeneity and correlation) for both the database and query images. The experiment was conducted with the Bhattacharyya and Euclidean distances on the same query images to measure the accuracy of each; apart from the distance, the same procedure was used in both cases. As explained earlier, the query images were chosen randomly to form the query dataset. Every time a query image is passed to the system, the top 20 images are retrieved for it; these consist of relevant and irrelevant images. The precision and recall rates are measured based on the number of relevant images and are used to compare the performance of the two distances.

Table 1. Average precision and average recall of query images of Bhattacharyya and Euclidean distances with low, medium, full, flow and overflow classes
Class               Precision (Bhattacharyya)   Recall (Bhattacharyya)   Precision (Euclidean)   Recall (Euclidean)
Low                 0.80                        0.32                     0.65                    0.26
Medium              0.87                        0.35                     0.63                    0.25
Full                0.43                        0.17                     0.35                    0.14
Flow                0.76                        0.30                     0.62                    0.28
Overflow            0.72                        0.29                     0.63                    0.25
Whole performance   0.716                       0.286                    0.58                    0.24
As can be seen from Table 1, the Bhattacharyya distance improved the performance of the system; it outperforms the Euclidean distance in terms of the number of relevant images retrieved. Fig. 4 shows the performance comparison between the two distances using average precision: the performance of all classes is improved with the Bhattacharyya distance. However, the accuracy of the full class is still low, which is probably due to the limited variety of images in that class. The overall system performance with the Bhattacharyya distance reached 0.716, which is considered good for a CBIR system.
Fig. 4. Comparison of the average precision of Bhattacharyya and Euclidean distances with low, medium, full, flow and overflow classes
Fig. 5 shows an example of the top 20 images retrieved for the flow class using the Bhattacharyya distance. Analysis of the results shows that the Bhattacharyya distance outperformed the Euclidean distance, and that the overall performance can be improved by adding more images from the different classes to the database. The images in Fig. 5 are arranged from the smallest distance to the query image.
Fig. 5. Top 20 retrieved images in response to the query image
6 Conclusions In conclusion, an automated bin level detection system based on content-based image retrieval using a GLCM texture extractor is proposed. We have developed the system and evaluated it on different bin levels using CBIR. The analysis of the system shows promising results for all the classes. Further improvements and other techniques should be investigated to increase the efficiency of the whole system; nevertheless, the combination of GLCM and CBIR already detects the bin level efficiently. Acknowledgement. The authors acknowledge the financial support of this work under grant UKM-PTS-0017-2009.
References 1. Ahmed, S.A., Ali, M.: Partnerships for solid waste management in developing countries: Linking theories to realities. Habitat International 28, 467–479 (2004) 2. Seik, F.: Recycling of domestic wastes: Early experiences in Singapore. Habitat International 21(3), 277–289 (1997) 3. Johansson, O.M.: The effect of dynamic scheduling and routing in a solid waste management system. Waste Management 26, 875–885 (2006) 4. Maher, A., Hannan, M.A., Hassan, B., Begum, R.A., Huda, A.: Solid Waste Monitoring System Integration based on RFID, GPS and Camera. In: The World Congress on Engineering, WCE 2010, London, U.K, June 30-July 2, vol. I, pp. 2078–0966 (2010) 5. Ministry of Housing and Local Government, Malaysia: Kuala Lumpur: Technical Section of the Local Government Division Annual Report, Section 4 – Local Government (1999) 6. Chandravathani, S.: Waste Reduction: No Longer An Option But A Necessity, Bernama (2006), http://www.bernama.com/bernama/v3/news_lite.php?id=179384 7. Ping, L.I., Yang, S.H.: Integrating GIS with the GeoEnviron database system as a robust tool for integrated solid waste management in Malaysia (2006), http://www. gisdevelopment.net/application/urban/products/ma08.html 8. Thomas, V.M.: Product self- management: evolution in recycling and reuse. Environmental Science and Technology 37, 5297–5302 (2003) 9. Maher, A., Hannan, M.A., Hassan, B., Begum, R.A., Huda, A.: Integrated technologies for solid waste bin monitoring system. Environ. Monit. Assess. 177, 399–408 (2011) 10. Haralick, R.M., Shanmugam, K., Dinstein, I.: Textural Features for Image Classification. IEEE Trans. on Systems, Man, and Cybernetics 3(6), 61–621 (1973) 11. Gonzalez, R.C., Woods, R.E.: Digital Image Processing. Addison–Wesley Publishing, USA (1993) 12. Clausi, D.A., Jernigan, M.E.: A Fast Method to Determine Co-Occurrence Texture Features. IEEE Transactions on Geoscience and Remote Sensing 36(1), 298–300 (1998) 13. Haverkamp, D., Soh, L.K., Tsatsoulis, C.: A comprehensive, automated approach to determining sea ice thickness from SAR data. IEEE Trans. Geosci. Remote Sensing 33, 46–57 (1995) 14. Wong, Y.M., Hoi, S.C.H., Lyu, M.R.: An empirical study on large-scale content based image retrieval. In: Proc. IEEE Int. Conference on Multimedia & Expo., ICME 2007 (2007) 15. Muller, H., Michoux, N., Bandon, D., Geissbuhler, A.: A review of content-based image retrieval systems in medical applications–clinical benefits and future directions. International Journal of Medical Informatics 73(1), 1–23 (2004) 16. Salton, G.: The state of retrieval system evaluation. Information Processing and Management 28(4), 441–450 (1992)
Image Enhancement of Underwater Habitat Using Color Correction Based on Histogram Norsila bt Shamsuddin1,2, Wan Fatimah bt Wan Ahmad1, Baharum b Baharudin1, Mohd Kushairi b Mohd Rajuddin2, and Farahwahida bt Mohd2 1
Universiti Teknologi PETRONAS, Tronoh, Perak 2 Universiti Selangor, Selangor [email protected]
Abstract. The visual appearance of underwater images sometimes does not represent the real-world objects. Light absorption and diffusion degrade the clarity of the images. In order to improve the visualization of underwater images, an approach using color correction based on the histogram is used. This approach is used because it improves contrast by redistributing the intensity distribution and computing a more uniform histogram. The histogram in Photoshop is a graph that shows the tonal balance of a digital image and the placement of the pixels in an image. A few examples are shown in this paper. Keywords: Underwater images, visualization, histogram equalization, intensity distributions, pixels, Visual Informatics.
1 Introduction Underwater imaging is a challenging area of photography. The photographer requires very specialized equipment and techniques to take photos deep down in the ocean. Nowadays underwater activities are important to scientists, who are eager to discover, recognize and study these activities through underwater images. Existing research shows that underwater images bring new challenges and pose significant problems due to light absorption and the diffusion effects of light: the images tend to become bluish as the photographer goes deeper into the ocean. Most of the previous research in image processing concentrates on ordinary types of images, for example personal photos as well as scenery and buildings. Underwater image processing is quite a challenge because of the need to enhance the images so that they represent the real underwater world. Marine biologists study how ocean currents, tides and many other oceanic factors influence ocean life forms, including their growth, distribution and well-being. This is possible thanks to the Global Positioning System (GPS) and newer underwater visual devices. For the past four decades, the Harbor Branch Oceanographic Institute at Florida Atlantic University has been conducting research in the area of Ocean Engineering & Technology, studying imaging systems to characterize the undersea environment and advance communications. Among the tools, technologies and techniques they use are laser line scan undersea imaging, with applied research in imaging system performance, prediction model validation, and development of novel
hardware for undersea imaging, and fluorescence imaging for the assessment of benthic community structure and coral health. It is true that using the special equipment and photographic techniques mentioned above will help the diver-photographer obtain better images, but it requires a lot of investment. Color loss and poor visibility can occur when a photographer with limited financial resources takes photos using a normal digital camera. Therefore, it is important to have an alternative way to visualize the underwater habitat for those who are unable to spend on special equipment for taking underwater images.
2 Underwater Imaging Problem Statement In the study of underwater images, we found several problems that occurred during the photographic session in July 2010 at Tioman Island. The images were blurred due to light absorption and the inherent structure of the sea. With respect to light reflection, Church et al. [1] explain that the reflection of light varies and depends on the structure of the sea. The primary barrier in underwater imaging is the extreme loss of color and contrast at any significant depth: the longer the sunlight's wavelength, the faster the color fades. Water can bend the light either to create crinkle patterns or to diffuse it. Anthoni [2] observes that the reflected light is either polarized horizontally or enters the water vertically; when light enters the water vertically it makes the objects appear less shiny and makes the images bluish.
Fig. 1. Color Changes with Depth/Distance in Clear Water [2]
Figure 1 shows how water particles interact with light by absorbing certain wavelengths. The red and orange colors disappear first, followed by yellow, green and purple. The last color to disappear is blue, because blue has the shortest wavelength. The diagram shows that the loss of the red color is already obvious at 50 cm, and about 90% of the color tends to disappear at a depth of 5 metres. Since the loss of color varies critically with distance, it is necessary to make corrections by applying color correction filters.
Fig. 2. The Light Absorption by Wavelength [2]
Figure 2 shows the light extinction figures for 1 m of water. Clear oceanic water causes a 40% loss in red light per meter, so the remaining red light at 2 meters is 36%. Clear oceanic water has its least loss in the blue color, which means that distant objects look blue. As a rule of thumb, one f-stop (50%) of the reds is lost for every meter of light path [2]. The average loss of red in coastal water is about 70%, so over 2 meters the remaining red light is about 9%. The dip of this curve rests in the green wavelengths, which makes distant objects look green. The amount of light that enters the water decreases as the photographer goes deeper into the ocean, and the water molecules also absorb a certain amount of light. Because of this, underwater images get darker and darker as the depth increases. Not only is the amount of light reduced, but the colors also drop off one by one depending on their wavelengths. The red color disappears at a depth of about 3 meters, followed by orange, yellow and green. The blue color travels the furthest in water due to its short wavelength, which is why underwater images are dominated by blue. Furthermore, the blue cast contributes to blurred images with low brightness and low contrast. Therefore, it is important to study techniques for enhancing underwater images in order to obtain a better visualization of the underwater world. In other words, this study is important because it can show non-divers the underwater scene with its true colors and structure.
3 Related Works Some of the related literature concerning underwater image processing is as follows. Gasparini and Schettini [3] developed a tunable cast remover for digital photographs based on a modified version of the white balance algorithm. This approach first detects the presence of a cast using a detector and then removes the cast. The approach has been applied to a set of images downloaded from personal web pages.
Garcia et al. [4] proposed a vision-based system using a motion detection algorithm. This approach is used to automatically maintain the position of the vehicle when the reference of the corresponding image is lost; in this way, it addresses the issue of image orientation caused by vehicle movement. The approach is twofold: first, it is applied to the images to select a set of candidate matches for a particular interest point; second, it uses a texture characterization of the points to select the best correspondence. The contrast operator performs a grey-scale differentiation in the region. Similarly, Greig et al. [5] used techniques such as contrast stretching and Markov random fields. They applied a bimodal histogram model to the images in order to enhance the underwater image: first they applied contrast stretching techniques, and then they divided the image into two parts, object and background, and applied a Markov random field segmentation method. Schechner and Karpel [7] used a physics-based model. They developed a scene recovery algorithm in order to clear underwater images/scenes through a polarizing filter. This approach addresses the issue of backscatter rather than blur and focuses mainly on the recovery of the object. They applied this approach to analyze and remove the physical effects of visibility degradation, which can be associated with the partial polarization of light.
4 The Approach for the Underwater Image Enhancement It has been highlighted that researchers in marine research in general, and computer science in particular, face problems regarding the quality of underwater images. Given the theoretical and technological importance of marine research, the problem of image enhancement is gaining increasing importance. One of the most significant issues is how to improve the quality of underwater images in order to streamline image processing analysis. The problems related to underwater images come from the light absorption and diffusion effects of the marine environment. In order to overcome this problem, researchers are using state-of-the-art technology such as autonomous underwater vehicles [4], sensors and optical cameras [6], and visually guided swimming robots [8]. However, the technology has not yet reached an appropriate level of success. Due to time and cost, this research concentrates only on images taken with a normal, low-resolution camera. This camera is chosen to prove that the color of such images can also be corrected using a simple image enhancement methodology, and the study aims to identify how much color can be corrected in images taken with a normal camera. In order to address the issues discussed above, the proposed approach experiments with color correction based on histogram equalization. There are three stages in histogram equalization. First, computing the histogram: parse the input image and count each distinct pixel value (for example, for 8-bit pixels the maximum pixel value is 255 and the array size is 256). Second, computing the normalized cumulative sum of the histogram: store the running sum of the histogram values and normalize by multiplying each element by (maximum pixel value / number of pixels). Third, transforming the input image into the output image by using the normalized array as a look-up table that maps each input pixel value to its new value.
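A minimal NumPy sketch of the three stages just described is given below, assuming a single 8-bit greyscale channel; for colour images the same mapping would typically be applied per channel or to a brightness channel, and the function name is illustrative rather than taken from the study.

import numpy as np

def equalise_histogram(image):
    # Stage 1: histogram of the 8-bit input image (256 bins)
    hist = np.bincount(image.ravel(), minlength=256)
    # Stage 2: normalised cumulative sum, scaled by (max pixel value / number of pixels)
    cdf = hist.cumsum()
    lut = np.floor(cdf * 255.0 / image.size).astype(np.uint8)
    # Stage 3: map every input pixel through the look-up table
    return lut[image]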
Fig. 3. Image Adjustment Level Windows
Figure 3 shows the tool used to level the histogram. We need to set the initial darkest value and the brightest value. At this stage we double-click the left-hand eyedropper to sample the image and set the black point, and double-click the right-hand eyedropper to sample the image and set the white point. The eyedropper in the middle samples the image to set the grey point.
Fig. 4. Setting the Initial Darkest Value
Fig. 5. Black Point Slider
In Figure 4, we set the B (Brightness) value to 5% to set the black point at 95% darker. In Figure 5, we need to move the left slider to set the black point. This will assign the level for pure black.
Fig. 6. Setting the Initial Brightest Value
Fig. 7. White Point Slider
294
N. bt Shamsuddin et al.
In Figure 6, we set the B (Brightness) value to 95% to set the white point at 5% lighter. In Figure 7, we need to move the right slider to set the white point. This will assign the level for pure white. Finally move the center slider to reach a visual balance. This sets the mid-tone or center point for the image. This one is a bit more flexible and where to set it depends on the individual image.
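The Levels adjustment described above can be approximated outside Photoshop by a linear remapping of each channel. The following sketch is an illustration only: the 5% and 95% break points are taken from the settings in Figures 4 and 6, while the function name and per-channel application are assumptions for the example.

import numpy as np

def apply_levels(channel, black=0.05 * 255, white=0.95 * 255):
    # Linearly remap the range [black, white] to [0, 255]; values outside are clipped
    x = channel.astype(np.float64)
    out = (x - black) / (white - black) * 255.0
    return np.clip(out, 0, 255).astype(np.uint8)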
5 Analysis and Result A few photos were taken as samples using a normal camera with average resolution. The following examples show the effects of color correction using the histogram:
Fig. 8. Coral Reefs before Enhancement
Fig. 9. Original Image’s Histogram
Figure 8 clearly shows that the image is dominated by the blue color. It was taken at a depth of 3 meters, at 1100 hours in daylight. The histogram in Figure 9 shows this image as a bar graph made up of many very narrow vertical bars (lines, really) blended together horizontally. There are 256 bars, numbered from zero on the left to 255 on the right. The bars on the left represent the darkest pixels and the bars on the right represent the lightest pixels. Each bar is a level of luminosity (basically, brightness), representing the 256 levels of grayscale that the human eye is generally credited with being able to recognize. The taller a bar, the more pixels there are in the image at that level. In the example above there are some peaks and valleys, and that is to be expected.
Fig. 10. Coral Reefs after Enhancement
Fig. 11. Balance in Histogram
Figure 10 is the image after it has undergone the color correction based on the histogram. Figure 11 shows that there is a fairly decent balance of levels in the histogram: there is activity all across the range, showing that the image was fairly well exposed. Although the histogram looks a little heavy in the dark range, there is little activity in the white area at the right-hand side of the histogram.

Table 1. Before and After Color Correction Data
                     Figure 9   Figure 11   % changes
Mean                 127.88     123.09      3.75
Standard Deviation   73.23      73.30       0.1
Median               111        109         1.8
Pixel                124848     124848      nil
If we compare the standard deviations of Figures 9 and 11 as given in Table 1, there is a slight increase (0.1%). This shows that there is a difference in terms of color between Figures 8 and 10.
Fig. 12. Original Corals and Fish Images
Fig. 13. Unbalanced Histogram
Another example, in Figure 12, is the original corals and fish image taken in daylight at a depth of 4 meters. In Figure 13, the histogram is devoid of data in the highlight region of the graph. This shows that the image is lacking data in the light range and is quite dark overall; it has been underexposed.
Fig. 14. After Color Correction
Fig. 15. Corrected Histogram
Figure 14 is the improved image after the color correction has been applied. In Figure 15, we can see that the histogram has been leveled. Although this histogram is still weighted towards the left, the correction has spread out the peak seen in Figure 13.

Table 2. Before and After Color Correction Data
                     Figure 13   Figure 15   % changes
Mean                 88.70       103.50      16.70
Standard Deviation   55.78       61.34       10
Median               99          99          nil
Pixel                124848      124848      nil
Table 2 shows that we have achieved approximately a 10% increase in color change from Figure 13 to Figure 15.
Fig. 16. Original Diver’s Image
Fig. 17. Unbalanced Histogram
Figure 16 is another example of an image with unbalanced color. This image was taken during daylight at a depth of 4.5 meters. Again, in Figure 17, a few peaks and valleys appear. This is because of the same problem that occurred when the photo was taken: lack of light exposure.
Fig. 18. Corrected Diver’s Image
Fig. 19. Corrected Histogram
The image in Figure 18 clearly shows the actual colors compared with the one in Figure 16. The histogram in Figure 19 has been corrected, and the peaks and valleys of Figure 17 are gone. This illustrates that the histogram has been leveled and the color corrected.

Table 3. Before and After Color Correction Data
                     Figure 17   Figure 19   % changes
Mean                 94.72       72.63       23.30
Standard Deviation   46.80       53.23       13.70
Median               95          65          31.60
Pixel                124848      124848      nil
The standard deviation in Table 3 also shows an increase of 13.7%, indicating a change of color between Figures 17 and 19. Although these changes are small, we manage to improve the color of the underwater images using ordinary software (Photoshop CS3). The following images are further samples of color correction:
6 Conclusion and Future Direction In this paper, the approach used to enhance underwater images is color correction based on histogram equalization. We demonstrate the usefulness of the approach by transforming the input image into the output image using the normalized array as a look-up table that maps each input pixel value to a new value. The advantage of this approach is that it helps to equalize the color contrast in the images and also addresses the problem of lighting. By applying the proposed approach, we have produced promising results, and the quality of the images is illustrated statistically through the histograms. For future work, another tool will be used to enhance the underwater images, namely a statistical approach based on Markov random fields, in which the image is divided into two parts, object and background, and a Markov random field segmentation method is applied. The purpose of applying another method is to compare which tool performs better.
References 1. Church, S.C., White, E.M., Partridge, U.C.: Ultraviolet Dermal Reflection and Mate Choice in the Guppy, pp. 693–700. Elsevier Science Ltd (2003) 2. Floor, A.J.: Water and Light in Underwater Photography (2005), http://www.seafriends.org.nz/phgraph/water.htm
3. Gasparini, F., Schettini, R.: Colour Correction for Digital Photographs. In: 12th International Conference on Image Analysis and Processing (ICIAP 2003). IEEE Press (2003) 4. Garcia, R., Nicosevici, T., Cufí, X.: On The Way to Solve Lighting Problems in Underwater Imaging. In: Proceedings of the IEEE OCEANS Conference (OCEANS), pp. 1018–1024 (2002) 5. Greig, A.R., Fairweather, A.J.R., Hodgetts, M.A.: Robust scene interpretation of underwater image sequences. In: 6th International Conference on Image Processing and its Applications, pp. 660–664 (1997); ISBN: 0 85296 692 X 6. Malkasse, J.P., Andreas, A.B., March, G.M.: A Pre-Processing Framework for Automatic Underwater Images Denoising. In: European Conference on Propagation and Systems, pp. 15–18 (2005) 7. Schechner, Y., Karpel, N.: Clear Underwater Vision. In: Proceedings of the IEEE CVPR, vol. 1, pp. 536–543 (2004) 8. Ripsman, A., et al.: A Visually Guided Swimming Robot. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, Session TAI-13 (2005) 9. Schettini, R., Corchs, S.: Underwater Image Processing: State of the Art of Restoration and Image Enhancement Methods. EURASIP Journal on Advances in Signal Processing (2010) 10. Padmavathi, G., Subashini, P., Kumar, M.M., Thakur, S.K.: Comparison of Filters used for Underwater Image Pre-Processing. IJCSNS 10(1) (2010)
A New Application for Real-Time Shadow and Sun's Position in Virtual Environment Hoshang Kolivand and Mohd Shahrizal Bin Sunar ViCubelab, Department of Computer Graphics and Multimedia, Faculty Computer Science and Information Systems, UTM, Universiti Teknologi Malaysia, 81310 Skudai Johor, Malaysia [email protected], [email protected]
Abstract. This paper introduces a new real-time application for outdoor computer graphics programming that controls the amount of shadow according to the sun's position. The position of the sun plays an important role in outdoor games, and calculating the sun's position, and consequently the position and length of shadows, requires a lot of attention and precision. Julian dating is used to calculate the sun's position in the virtual dome. In addition to computer graphics, building design is another field to which this paper contributes. To create shadows, projection shadows are proposed: by calculating the sun's position for a specific date, time and location on the earth, the shadow is generated. The length and angle of the shadow are two parameters of interest to building designers, and both are calculated in the proposed real-time application. It can therefore be used by teachers to teach the parts of physics concerning the earth's orbit and the effect of the sun on shadows, and it can be used in building design and in commercial games in virtual reality systems. Keywords: real-time shadow, sun's position, sun light, projection shadow.
1 Introduction There are several different shadow techniques, such as drawing a dark shape similar to the occluder on a plane. Although it is not precise, this is frequently used, especially in old computer games and in some advertisement animations. Another simple method to create real-time shadows is the projection shadow algorithm [1], which is still widely used in game programming. In this method, a shadow can be created only on a horizontal or vertical plane at a time; creating shadows on two adjacent horizontal and vertical planes needs more calculation [2]. To cast shadows on arbitrary objects, the stencil buffer is appropriate. In computer games, shadows reveal the real distances between objects in the virtual environment and give gamers a stronger sense of immersion. A computer game without shadows is not very attractive, even for casual users playing games or watching cartoons. In 1999, Preetham et al. proposed an analytic model for rendering the sky, and the images they generated are attractive [3]. They present an inexpensive analytic sky
model derived from Perez et al. (the Perez model) that approximates full-spectrum daylight for various atmospheric conditions [4]. In 2008, M. Sunar worked on sky color under the effect of the sun's position [5], using Julian dating and the Perez model to create the sky color. In 2010, Sami M. Halawani et al. published a paper entitled "Interaction between sunlight and sky color with effect of sun's position" [6]; they used Julian dating to control the position of the sun. In 2008, Ibrahim Reda et al. introduced precise formulas for the Julian day and used them for solar radiation [7].
2 Method There are three main kinds of technique for generating real-time shadows; real-time techniques are interactive but difficult to implement. Projection shadows, shadow volumes and shadow mapping are the best-known techniques for creating real-time shadows. 2.1 Hard Shadow The projection shadow technique is one of the simplest methods of generating a real-time shadow. Its most prominent problem is that it needs a large amount of calculation to cast shadows on arbitrary objects. The main idea of the method is to draw a projection of each of the occluder's pixels on the shadow receiver along the ray that starts from the light source and ends at the plane [2], where L is the light source, P is a pixel of the occluder, S is the projection of P, and N is the normal vector of the ground. By using the projection matrix for each pixel of the occluder, the projection shadow appears on the plane.
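A common way to express this projection is the standard planar projection shadow matrix found in the graphics literature. The NumPy sketch below is an illustration of that construction under the stated plane and light conventions; it is not necessarily the exact matrix used by the authors, and the function names are assumptions for the example.

import numpy as np

def shadow_matrix(plane, light):
    # plane = (a, b, c, d) with a*x + b*y + c*z + d = 0 (the ground plane, normal N)
    # light = (lx, ly, lz, 1) for a point light L; use w = 0 for a directional sun
    plane = np.asarray(plane, dtype=float)
    light = np.asarray(light, dtype=float)
    dot = plane @ light
    return dot * np.eye(4) - np.outer(light, plane)

def project_point(p, M):
    # Shadow point S: intersection of the ray from the light through occluder pixel P
    s = M @ np.append(np.asarray(p, dtype=float), 1.0)
    return s[:3] / s[3]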
2.2 Sky Modeling 2.2.1 Determining Longitude and Latitude Latitude is the angular distance north or south of the equator. Longitude is the angular distance east or west of the Earth's prime meridian, ranging up to 180° east and west, and each 15° of longitude corresponds to one hour of time. For example, travelling westwards at 15° per hour, a traveller would arrive with no change in local solar time. The Earth travels around the Sun in its orbit once a year.
2.2.2 Dome Modeling The dome is a hemisphere with the viewpoint located inside it. To create a hemisphere mathematically, the standard parametrisation is

x = r sin θ cos φ,   y = r sin θ sin φ,   z = r cos θ,                        (1)

where θ is the zenith angle, φ is the azimuth, and 0 ≤ θ ≤ π/2, 0 ≤ φ ≤ 2π.
Fig. 1. The zenithal and azimuthal angles on the hemisphere
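A sketch of how Eq. (1) can be sampled to generate the dome vertices is given below; the grid resolution and the function name are assumptions made for the example rather than values from the application.

import numpy as np

def dome_vertices(radius=1.0, n_theta=16, n_phi=32):
    # Sample Eq. (1) on a regular zenith/azimuth grid to build the sky-dome mesh
    theta = np.linspace(0.0, np.pi / 2, n_theta)              # zenith: 0 at the top
    phi = np.linspace(0.0, 2 * np.pi, n_phi, endpoint=False)  # azimuth around the dome
    t, p = np.meshgrid(theta, phi, indexing="ij")
    x = radius * np.sin(t) * np.cos(p)
    y = radius * np.sin(t) * np.sin(p)
    z = radius * np.cos(t)
    return np.stack([x, y, z], axis=-1)                       # (n_theta, n_phi, 3) vertices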
Before creating shadows in the virtual environment, the sun's position must be determined; this is described in the next section. 2.3 Sun's Position The principles of calculating the sun's position have been known for a long time, and some exact data are needed. The ancient Egyptians were able to calculate the sun's position so precisely that, by digging a shaft inside one of the pyramids, the sun could shine on the grave of their king just once a year, on the king's birthday [8]. 2.3.1 Why the Sun Angle Changes During the Year The Earth's north-south axis is not exactly perpendicular to its orbital plane; it deviates by about 23.5°, and this tilt is maintained as the Earth travels around the Sun. When the Earth is located on one side of the Sun, the southern hemisphere,
due to this slight tilt (23.5°), receives more direct radiation from the sun. About six months later, when the earth has moved to the other side of the sun, this radiation falls more directly on the northern hemisphere. Longitude and latitude are the two most important pieces of information needed to calculate the sun's position; the other information required is Greenwich Mean Time (GMT). To place the sun in the created dome, the zenith and azimuth angles are sufficient. 2.3.2 Calculation of Sun's Position To calculate the position of the sun, the zenith and azimuth are enough; to obtain them, the location (longitude and latitude), date and time are needed [9]. The zenith is the angle that indicates how high the sun has risen, while the azimuth is the angle through which the sun has turned around the earth. In 1983, Iqbal [10] proposed a formula to calculate the sun's position and in 1999 Preetham et al. [4] improved it; it is a common formula for calculating the position of the sun in physics. The solar time is

t = t_s + 0.170 sin(4π(J − 80)/373) − 0.129 sin(2π(J − 8)/355) + 12(SM − L)/π ,    (2)

where t is the solar time, t_s the standard time, J the Julian date, SM the standard meridian and L the longitude. The solar declination δ is calculated with the following formula:

δ = 0.4093 sin(2π(J − 81)/368) .                                                   (3)

The time is expressed in decimal hours and the angles in radians. Finally, the zenith and azimuth can be calculated as follows:

θ_s = π/2 − arcsin( sin l sin δ − cos l cos δ cos(πt/12) ) ,                       (4)

φ_s = arctan( −cos δ sin(πt/12) / (cos l sin δ − sin l cos δ cos(πt/12)) ) ,       (5)

where θ_s is the solar zenith, φ_s the solar azimuth and l the latitude. With the zenith (θ_s) and azimuth (φ_s) calculated, the sun's position is known.
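The following Python sketch evaluates Eqs. (2)-(5) directly. It assumes that the latitude, longitude and standard meridian are already expressed in radians and the time in decimal hours, and the function name is illustrative rather than taken from the application described here.

import math

def sun_position(standard_time, julian_day, latitude, longitude, std_meridian):
    # Eq. (2): solar time in decimal hours
    t = (standard_time
         + 0.170 * math.sin(4 * math.pi * (julian_day - 80) / 373)
         - 0.129 * math.sin(2 * math.pi * (julian_day - 8) / 355)
         + 12 * (std_meridian - longitude) / math.pi)
    # Eq. (3): solar declination (radians)
    delta = 0.4093 * math.sin(2 * math.pi * (julian_day - 81) / 368)
    l = latitude
    # Eq. (4): solar zenith angle
    zenith = (math.pi / 2
              - math.asin(math.sin(l) * math.sin(delta)
                          - math.cos(l) * math.cos(delta) * math.cos(math.pi * t / 12)))
    # Eq. (5): solar azimuth angle
    azimuth = math.atan2(-math.cos(delta) * math.sin(math.pi * t / 12),
                         math.cos(l) * math.sin(delta)
                         - math.sin(l) * math.cos(delta) * math.cos(math.pi * t / 12))
    return zenith, azimuth

The resulting zenith and azimuth can then be substituted into Eq. (1) to place the sun on the virtual dome.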
2.4 Effect of Sun's Position on Shadows As the earth moves in its orbit, the apparent position of the sun as seen from the earth changes. A shadow's length depends on the position of the sun relative to the viewing situation. When a part of the earth is tilted away from the sun, the sun sits lower in the sky and shadows are long; on the other hand, when a part of the earth is tilted towards the sun, the sun is higher in the sky and shadows are short. The longest shadows in a day appear at sunrise and sunset, and the shortest shadows appear at noon. The tilt of the earth's axis is the reason why each part of the earth sees more or less sun in a day, and the different seasons are a result of the different lengths of day and night [11].
3 Result and Discussion The sun's position and the length of shadows in real-time computer games can make a game as realistic as possible, and keeping the position and length of shadows correct in a virtual environment requires a substantial amount of precision. Solar energy is free and a blessing of God for us; making optimal use of this blessing needs careful planning. In building design and architecture, for example, it is valuable to recognise which orientation is best for a building at a specific location: in cold places, a building should be oriented so that shadows lie behind it, whereas in warm places the building should be oriented so that shadows lie in front of it. Figures 2, 3 and 4 show the direction of sunshine and shadow at 7:40, 10:59 and 16:30 respectively, at UTM (latitude 1.28, longitude 103.45) on 22 April 2011.
Fig. 2. Result of application at 7:40 in UTM (April 22, 2011)
Fig. 3. Result of application at 10:59 in UTM (April 22, 2011)
Fig. 4. Result of application at 16:30 in UTM (April 22, 2011)
4 Future Work To cast shadows on arbitrary objects, the stencil buffer is appropriate. Shadow volumes using the stencil buffer, combined with sky color, can make the scene as realistic as possible.
The interaction between the sun's position, sunlight, shadows and sky color in the outdoor virtual environment could then be better appreciated.
5 Conclusion Since this research targets shadows and the sun's position for game engines and virtual environments, a real-time shadow technique is appropriate. Accordingly, a hard shadow technique, projection shadows, was rendered together with the sun's position to obtain the precise effect of the sun's position on outdoor effects such as shadows. The methodology used in this research combines the projection method with the sun's position and sky modelling. The first objective of this research is achieved by the use of projection shadows to create hard shadows and by recognising the position of the sun for each viewpoint at a specific location, date and time of day. The proposed application can be used for outdoor games without any uncertainty about the position of the sun and the location of the shadow for each object. Another contribution of this application is to help building designers find the best orientation for a building in order to save solar energy. The proposed method can also be used in high schools to display the sun's position and to describe how shadows change.
References 1. Crow, F.: Shadow Algorithms For Computer Graphics. Computer Graphics 11(2), 242– 247 (1977) 2. Yang, X., Jin, Y., Yin, Y., Wu, Z.: Gpu Based Real Time Shadow Research, 372–377 (2006) 10.1109/Cgiv..48 3. Perez, R., Seals, R., Michalsky, J.: All-Weather Model For Sky Luminance Distribution Preliminary Configuration And Validation. Solar Energy 50(3), 2 (1993) 4. Preetham, A.J., Shirley, P., Smith, B.: A Practical Analytic Model For Daylight. In: Computer Graphics (Siggraph 1999 Proceedings), vol. 91, p. 100 (1999) 5. Sunar, M.S., Zamri, M.N.: Advances In Computer Graphics And Virtual Environment, vol. 1. Utm Press (2007) ISBN 978-983-52-0636-8 6. Halawani, S.M., Sunar, M.S.: Interaction Between Sunlight And The Sky Color With 3d Objects In The Outdoor Virtual Environment. In: 2010 Fourth International Conference On Mathematic/Analytical Modeling And Computer Simulation (2010) 7. Reda, I., Andreas, A.: Solar Position Algorithm For Solar Radiation Applications. National Renewable Energy Laboratory (2008) 8. Nawar, S., Morcos, A.B., Mikhail, J.S.: Photoelectric Study Of The Sky Brightness Along Sun’s Meridian During The March 29, 2006 Solar Eclipse. New Astronomy 12, 562–568 (2007) 9. Chang, T.P.: The Sun’s Apparent Position And The Optimal Tilt Angle Of A Solar Collector In The Northern Hemisphere. Solar Energy 83, 1274–1284 (2009) 10. Iqbal, M.: An Introduction To Solar Radiation, p. 390. Academic Press (1983) 11. Chang, T.P.: The Sun’s Apparent Position And The Optimal Tilt Angle of A Solar Collector In The Northern Hemisphere. Solar Energy 83, 1274–1284 (2009)
A New and Improved Size Detection Algorithm for Acetabular Implant in Total Hip Replacement Preoperative Planning Azrulhizam Shapi‘i1, Riza Sulaiman2, Mohammad Khatim Hasan1, Anton Satria Prabuwono1, Abdul Yazid Mohd Kassim2, and Hamzaini Abdul Hamid2 1
Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Institute of Visual Informatics 43600 Bangi, Malaysia {azrul,riza,khatim,antonsatria}@ftsm.ukm.my 2 Department of Orthopaedic & Traumatology, Medical Center of Universiti Kebangsaan Malaysia, Bandar Tun Razak, 56000 Cheras, Malaysia [email protected]
Abstract. Preoperative templating in Total Hip Replacement (THR) is a method to estimate the optimal size and the best position of the implant. Today, observational (manual) size recognition techniques are still used to find a suitable implant for the patient. Therefore, a digital and automated technique should be developed so that the implant size recognition process can be effectively implemented. For this purpose, we have introduced the new technique for acetabular implant size recognition in THR preoperative planning based on the diameter of acetabulum size. This technique enables the surgeon to recognise a digital acetabular implant size automatically based on the two points selected on the patient’s acetabulum. Based on the testing result, the average time taken for the acetabular implant templating process in preoperative planning using a new automated technique was much less than using the digital line distance method. Keywords: Acetabular, implant, total hip, detection, size, preoperative, Visual Informatics.
1 Introduction The field of orthopaedics has become more important as the number of patients suffering from osteoporosis increases every year [1]. A conventional detection method has been used to search for a suitable implant for the patient: digital implant templates designed by the researcher are mapped onto the patient's digital X-ray and measured using the line distance method. This step is repeated until the
appropriate implant for the patient is found. However, based on testing results, this procedure was found to require a long time and to be less efficient [2]. Thus, the line distance procedure should be replaced by a new technique. The new and improved technique helps the surgeon identify the appropriate implant size for the patient automatically and more quickly. Several studies have shown that a digital technique can improve the placement of a total hip implant [4-7]. An important aspect of any software or method is presenting a good user interface. If the user interface is poor, the software runs the risk that it will not be used or, worse, that it will be used improperly with potentially harmful results. This is especially true for preoperative planning software, for which the results of the plan can directly impact the outcome of a surgical procedure [8].
2 Objectives The main objective of this research is to produce a new technique which can automatically recognize the size of a patient's acetabular implant in total hip replacement surgery. In particular, in order to achieve this goal, the main objectives identified for the research are:
i. Studying and designing techniques for the detection of hip replacement acetabular implant size.
ii. Implementing the techniques developed.
3 Research Background Developing an acetabular implant size recognition technique involves a number of things that need to be studied carefully so that the technique can recognize the size of the acetabular implant accurately and effectively. In this section, the conventional size detection technique and other related matters are discussed. 3.1 Digital X-ray Images X-ray is a type of radiation used in medical imaging for diagnosing diseases like cancer and fractures [9]. The radiologist takes X-ray images by placing an X-ray source opposite the part that needs to be imaged; the image can then be produced on film or stored digitally. The difference between a digital X-ray and a general X-ray is that the output in the first case is in digital form while the general X-ray is on film: a digital X-ray may be edited and stored in a computer database, while a general X-ray only provides a negative film output as a reference. An example of a digital X-ray can be seen in Figure 1. The X-ray image is stored in JPEG format using the MedWeb software.
Fig. 1. Digital X-ray
3.2 Total Hip Replacement (THR) Every joint disease, whether inflammatory or traumatic, will cause erosion of cartilage and joint damage if the disease process is allowed to continue. When this happens, a patient may complain of pain, swelling, deformity and instability of the joints involved. There are several methods of treatment for damaged joints. The first step is conservative treatment; if this fails, the next method of treatment is surgery. The most effective surgical method is total hip replacement (THR) surgery, in which an artificial joint replaces the damaged original. THR is a surgical procedure in which the hip joint is replaced by an implant; the surgery can be performed as a total replacement or a hemi (half) replacement [10], and total replacement consists of replacing both the femoral head and the acetabulum. Before the surgery is performed, conventional orthopaedic implant size identification is done by an orthopaedic specialist manually matching the artificial implant template with the patient's X-ray. This is the conventional method of determining the patient's implant size, but the manual or observational procedure requires a long time to recognize the size of the patient's implant because the matching is repeated several times.
310
A. Shapi’i et al.
Fig. 2. The acetabulum
3.2.2 Acetabular Implant One step in total hip replacement surgery is the insertion of the acetabular implant. The acetabular implant is the component which is placed into the acetabulum (hip socket). Bone and cartilage are removed from the acetabulum, and the acetabular implant is attached using cement or friction. Some acetabular cups are one piece, others are modular. One piece shells are either polyethylene or metal, they have their articular surface machined on the inside surface of the cup and do not rely on a locking mechanism to hold a liner inplace [10]. Figure 3 shows the acetabular implant used in THR.
Fig. 3. Acetabular implant
3.2.3 Digital Acetabular Templating Using Line Distance Method The process of designing and producing new techniques of implant detection is taking into consideration the method of improvement which will be implemented on the latest techniques. Detection techniques that will be generated are based on a computerized techniques in which the implant template will be matched to the X-ray images. This technique has the ability to detect the size of the acetabular implant
A New and Improved Size Detection Algorithm for Acetabular Implant
311
automatically. By using the concept of line distance, acetabular implants will be selected based on the patient’s acetabulum diameter size. Figure 4 shows the algorithm for the conventional acetabular implant size recognition technique [2]. ࣭ࣶࣜࣜࣜࣜࣜࣜࣜࣜ࣪࣬ࣜऀमझळࣜझࣜनथपडࣜफपࣜझटडरझञऱनऱऩࣜठथझऩडरडम࣪ࣜࣜ ࣮ࣶࣜࣜࣜࣜࣜࣜࣜࣜ࣪࣬ࣜࣿझनटऱनझरडࣜरतडࣜठथयरझपटडࣜफढࣜरतडࣜनथपड࣪ࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜ ࣯ࣶࣜࣜࣜࣜࣜࣜࣜࣜ࣪࣬ࣜएथशडࣜफढࣜरतडࣜथऩबनझपरࣜथयࣜटतफयडपࣜञझयडठࣜफपࣶࣜ ࣯࣭ࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜ࣪ࣜअढࣜरतडࣜठथयरझपटडࣜफढࣲ࣯ࣜࣸऩऩࣜझपठࣺࣜࣴ࣬ऩऩࣨࣜपफࣜ ࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜथऩबनझपरࣜࣜळथननࣜञडࣜठथयबनझवडठࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜ ࣯࣮ࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜ࣪ࣜअढࣜरतडࣜठथयरझपटडࣜतझयࣜझࣜठडटथऩझनࣜलझनऱडࣨࣜरझधडࣜरतडࣜ ࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜबमडलथफऱयࣜलझनऱड࣪ࣜࣜ ࣯࣯ࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜ࣪ࣜअढࣜरतडࣜठथयरझपटडࣜतझयࣜझपࣜफठठࣜलझनऱडࣨࣜरझधडࣜरतडࣜࣜࣜ ࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜबमडलथफऱयࣜडलडपࣜलझनऱडࣜञडढफमडࣜरतडࣜफठठࣜलझनऱड࣪ࣜࣜ ࣰࣶࣜࣜࣜࣜࣜࣜࣜࣜ࣪࣬ࣜऀथयबनझवࣜझटडरझञऱनझमࣜथऩबनझपरࣜ
Fig. 4. Acetabular implant size detection algorithm
In the algorithm shown in Figure 4, the user must draw a line on the acetabulum diameter. Line distance will be calculated, and the size of the implant will be determined by the acetabulum diameter size. For example, in Figure 5, the distance of the line drawn on the acetabulum diameter is 64.15 mm, therefore the value will be taken as 64 mm.
Fig. 5. Line drawn on acetabulum
4 Materials and Method The main activity to implement is to analyze the existing techniques for the implant size recognition process. The technique analysis aims to identify similar techniques or that have been implemented or are being implemented by other researchers around the
312
A. Shapi’i et al.
world. The process of producing a new technique for the implant size recognition is to take into account the improved methodology to be adopted on the latest techniques. The new method that has the potential to be developed is the automated recognition of the digital acetabular implant size based on two points. This means that the surgeons do not have to use an automated line distance method to select the appropriate implant size. The second objective of the research is to implement the new techniques produced by developing a system that consists of modules of the new technique. This aims to ensure that new techniques will be developed to be implemented effectively. In addition, the system will be developed to include other modules such as the transformation of the image, measuring the size and angle, and so forth. An appropriate programming language is to be used for this purpose. After a new technique has been implemented, the next phase is to do testing and comparison. The aim is to ensure that this new technique can effectively recognize the implant size, and to assist physicians in carrying out the pre-operative process before the surgery in order for it to be more efficient and organized. 4.1 Digital Implant Digital implants produced using Autocad 2008 and Photoshop software can be seen in Figure 6.
Fig. 6. Digital acetabular implant
4.2 New Acetabular Size DetectionTechnique Using Point to Point Method A new detection technique that will be generated are based on a technique proposed by [2] in which the implant template will be matched to the X-ray images by using line distance method. This new technique has the ability to detect the size of the acetabular implant automatically based on the distance between two points. Figure 7 shows the algorithm for the conventional acetabular implant size recognition technique [2]. In the algorithm shown in Figure 7, the user must choose the first and the second point on the acetabulum diameter. The distance between the two points will be calculated, and the size of the implant will be determined by the acetabulum diameter size. Similar to the line distance method, sizing method involves three conditions, namely, if the size of the implant is smaller than 36 mm or larger than 80 mm , then the implant will not be displayed. In addition, if the distance has a decimal value, the previous value will be taken. For example, in Figure 8, the distance between two points choose (point A and B) on the acetabulum diameter is 54.28 mm, therefore the value will be taken as 54 mm.
A New and Improved Size Detection Algorithm for Acetabular Implant
313
࣭ࣶࣜࣜࣜࣜ࣪࣬ࣜࣿतफफयडࣜरतड࣭ࣜयरࣜबफथपरࣜफपࣜरतडࣜबझरथडपरࣜझटडरझञऱनऱऩࣜ ࣮ࣶࣜࣜࣜࣜࣜ࣪࣬ࣜࣿतफफयडࣜरतड࣮ࣜपठࣜबफथपरࣜफपࣜरतडࣜबझरथडपरࣜझटडरझञऱनऱऩࣜࣜࣜ ࣯ࣶࣜࣜࣜࣜࣜ࣪࣬ࣜࣿझनटऱनझरडࣜरतडࣜठथयरझपटडࣜञडरळडडपࣜरतडࣜरळफࣜबफथपरयࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜ ࣰࣶࣜࣜࣜࣜࣜ࣪࣬ࣜएथशडࣜफढࣜरतडࣜथऩबनझपरࣜथयࣜटतफयडपࣜञझयडठࣜफपࣶࣜ ࣰ࣭ࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜ࣪ࣜअढࣜरतडࣜठथयरझपटडࣜफढࣲ࣯ࣜࣸऩऩࣜझपठࣺࣜࣴ࣬ऩऩࣨࣜपफࣜअऩबनझपरࣜࣜळथननࣜञडࣜठथयबनझवडठࣜࣜ ࣰ࣮ࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜ࣪ࣜअढࣜरतडࣜठथयरझपटडࣜतझयࣜझࣜठडटथऩझनࣜलझनऱडࣨࣜरझधडࣜरतडࣜबमडलथफऱयࣜलझनऱड࣪ࣜࣜ ࣰ࣯ࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜ࣪ࣜअढࣜरतडࣜठथयरझपटडࣜतझयࣜझपࣜफठठࣜलझनऱडࣨࣜरझधडࣜरतडࣜबमडलथफऱयࣜडलडपࣜࣜ ࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜࣜलझनऱडࣜञडढफमडࣜरतडࣜफठठࣜलझनऱड࣪ࣜࣜ ࣱࣶࣜࣜࣜࣜࣜ࣪࣬ࣜऀथयबनझवࣜझटडरझञऱनझमࣜथऩबनझपरࣜ
Fig. 7. New implant size detection for acetabular component
Point A
Point B
Fig. 8. Distance between point A and B
5 Results and Discussions Based on the algorithm in Figure 7, the first thing the user needs to do is select two points on the patient’s acetabulum diameter. Figure 9 shows how to select points. The distance between two points will be calculated to determine the size of the acetabular implant. In Figure 9, the distance drawn between the two points on the acetabulum diameter was 54.28 mm, so the value to be taken is 54 mm. An acetabular implant with 54 mm size will be displayed automatically on the X-ray image (see Figure 10 and Figure 11). The size of the acetabular supplied by the PPUKM is an even number, so if there is an odd number, the even value before the odd value will be taken as the size of the acetabular implant. For example, if the distance drawn was 59.25 mm, so the value to be taken is 58 mm.
314
A. Shapi’i et al.
Select Point 1
Select Point 2
Distance between point 1 and 2
Fig. 9. Points on the acetabulum diameter
Fig. 10. Output of algorithm
In Figure 11, it can be seen that the position of the acetabular implant is still not in the best position. The implant should be translated and rotated to get the optimum position. Figure 12 shows the implant which is translated and rotated using the geometric transformation algorithm [12], while Figure 13 shows the information of implant and patient. To test the utility of the acetabular implant size recognition new technique, an experiment was conducted with assistance from a surgeon for the implant templating using digital line distance method [2]. The templating results by using the line distance method were compared to the results produced by our new automated technique. The testing recorded the acetabular implant size to be used and the time
A New and Improved Size Detection Algorithm for Acetabular Implant
315
Fig. 11. Acetabular implant default position
Fig. 12. Implant that has been translated and rotated
taken by both methods. Five randomly selected X-rays of unidentified patients were used for templating for both techniques. The difference between the two sizes were calculated and shown in table 1. It is evident that the new technique yields very close results to those obtained by the line distance method in all studies (100%). The difference, if any, is also within the error of clinically acceptable range (± 2 mm determined by the surgeon) obtained by the line distance templating method. In addition, the study also demonstrated that
316
A. Shapi’i et al.
the average time taken for acetabular implant templating in total hip replacement preoperative planning using a new technique (19 second) was much less than using the line distance method (23 second).
Fig. 13. Acetabular and patient’s information Table 1. Line Distance Method vs Point to Point Method
Patient 1 2 3 4 5 Average Position Time
Acetabular Size (Line Distance)
Acetabular Size (Point to Point)
46 56 50 48 58
46 56 50 48 56
23s
19s
Size Difference 0 0 0 0 ±2
5 Conclusion Preoperative templating has been useful to determine the optimum implant size in total hip replacement (THR). With classical tracing paper now obsolete, we have developed a new technique to undertake the templating procedure with a digital acetabular implant and an X-ray. The digital implant provides several advantages for THR surgery. Compared to the observational method in which the surgeon uses a template manually and places it on the patient’s X-ray, the use of the digital method
A New and Improved Size Detection Algorithm for Acetabular Implant
317
not only saves time, but also can reduce the error due to consistency difference when making adjustments to a patient’s implant size [13-15]. This new technique has improved the size detection method for acetabular implant in THR. The technique allows users to choose the acetabular implant automatically on computer prior to surgery, based on the two points selected along the diameter of the patient’s acetabulum size. The new proposed acetabular implant detection technique also provides user-friendly and accurate computer programming surgical planning. In addition, the average time taken for the acetabular implant templating process in preoperative planning using a new automated technique was much less than using the line distance method. Acknowledgments. This research project was conducted in collaboration with Dr Abdul Yazid Mohd Kassim from the Department of Orthopaedic and Traumatology and Dr Hamzaini Abd Hamid from the Department of Radiology, Medical Centre of University Kebangsaan Malaysia. This research was also funded by the University Grants UKM-OUP-ICT-35-179/2010 and UKM-GUP-TMK-07-01-035.
References 1. Klein, M.: Using Data in Making Orthopedic Imaging Diagnoses. Journal of Advances in Experimental Medicine and Biology 44, 104–111 (2005) 2. Azrulhizam, S., Riza, S., Mohammad, K.H., Abdul, Y.M.K.: An Automated Size Recognition Technique for Acetabular Implant in Total Hip Replacement. International Journal of Computer Science & Information Technology 3(2), 236–249 (2011) 3. Murzic, W.J., Glozman, Z., Lowe, P.: The Accuracy of Digital (filmless) Templating in Total Hip Replacement. In: 72nd Annual Meeting of the American Academic of Orthopaedic Surgeons (2005) 4. Yusoff, S.F.: Knee Joint Replacement Automation Templates. M.sc Thesis, Universiti Kebangsaan Malaysia, Bangi, Malaysia (2009) 5. Van der Linden, H.M.J., Wolterbeek, R., Nelissen, R.G.G.H.: Computer Assisted Orthopedic Surgery; Its Influence on Prosthesis Size in Total Knee Replacement. Journal of The Knee 15, 81–285 (2008) 6. Todsaporn, F., Amnach, K., Mitsuhashi, W.: Computer-Aided Pre-Operative planning for Total Hip Replacement by using 2D X-ray images. In: Proceedings of SICE Annual Conference (2008) 7. Michalikova, M., Bednarcikova, L., Petrik, M., Rasi, R., Zivcak, J.: The Digital PreOperative Planning of Total Hip Replacement. In: Proceedings of 8th IEEE International Symposium on Applied Machine Intelligence and Informatics (2010) 8. Blackwell, M., Simon, D., Rao, S., DiGioia, A.: Design and Evaluation of 3-D Preoperative Planning Software: Application to Acetabular Implant Alignment, pp. 1–11 (1996) 9. Pospula, W.: Total Hip Replacement: Past, present and Future. Kuwait Medical Journal, 171–178 (2004) 10. Wikipedia Hip Replacement Website, http://en.wikipedia.org/wiki/Hip_replacement 11. GreatMedCorp Website, http://www.greatmedcorp.com/procedures/hip-replacement.html
12. Azrulhizam, S., Riza, S., Mohammad, K.H., Abdul, Y.M.K.: Geometric Transformation Technique for Total Hip Implant in Digital Medical Images. Universal Journal of Computer Science and Engineering Technology 1(2), 79–83 (2010)
13. Kosashvili, Y., Shasha, N., Olschewski, E., Safir, O., White, L., Gross, A., Backstein, D.: Digital versus conventional templating techniques in preoperative planning for total hip arthroplasty. Can Journal Surg. 52(1), 6–11 (2009)
14. Verdonschot, N., Horn, J.R., Oijen, P., Diercks, R.L.: Digital Versus Analogue Preoperative Planning of Total Hip Arthroplasties. The Journal of Arthroplasty, 866–870 (2007)
15. Valle, A.G., Comba, F., Taveras, N., Salvati, E.A.: The Utility and Precision of Analogue and Digital Preoperative Planning for Total Hip Arthroplasty. International Orthopaedics (SICOT) 32, 289–294 (2007)
16. ImageJ Website, http://rsbweb.nih.gov/ij/
Visual Application in Multi-touch Tabletop for Mathematics Learning: A Preliminary Study

Khoo Shiang Tyng 1, Halimah Badioze Zaman 2, and Azlina Ahmad 3
1,2,3 Faculty of Information Science and Technology
2,3 Institute of Visual Informatics
Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor
{haze_khoo,hbzukm}@yahoo.com, [email protected]
Abstract. Multi-touch technology on a tabletop appears to be a promising tool for collaborative learning, with or without the presence of a teacher. It can overcome some of the problems of using a desktop computer in education, since a desktop allows only a single user at a time. This paper presents a preliminary study conducted for a research project developing a visual application on a multi-touch tabletop for preschool Mathematics, called MEL-Vis. The study was conducted on eight preschool teachers and nine primary school teachers. The results of the preliminary study showed that the topics of addition and subtraction operations were the major problems in the learning and teaching of Mathematics at the preschool and elementary level (Primary One). Students were found to have difficulty in understanding the concepts of addition and subtraction operations. Keywords: Tabletop Technology, Multi-touch, collaborative learning, horizontal display, Visual Informatics.
1 Introduction
Learning through visual illustrations and some form of interactivity has now become a popular way of learning and acquiring relevant information [1]. For years, multi-touch technology in the form of tabletops has been widely used in a number of fields [2],[3],[4],[5]. Its applications range from gaming, entertainment and interactivity to communication and, increasingly, education [2]. Tabletop displays have provided better modes of learning in a number of educational institutions, and one promising use is to introduce them into preschools for collaborative mathematical learning [1]. This provides students the opportunity to learn in small groups in a simple, interactive and innovative way. With ease of access and availability, multi-touch displays can increase competence levels in preschool education [6]. They create a more interactive environment, as more than one user is allowed to participate compared to normal laptops and desktops [6]. Touch-sensing technology allows users to interact with the system through multi-finger gestures, and students are able to see what they are
doing on the same device that they are operating on [6]. Figure 1 shows a student using a multi-touch tabletop to solve Mathematics problems.

Fig. 1. Student solving mathematical problems on a multi-touch system
2 Literature Review
Multi-touch tabletop displays have been used to help individuals understand things visually and interactively. People remember what they have seen and interacted with rather than what they have listened to or read. The use of tabletop technology has spanned different fields. For instance, interactive tables were used to train medical students in carrying out diagnosis through visual representations of their patients [7]. In the field of social studies, multi-touch tabletop technology has been used to help adolescents with Asperger's Syndrome gain effective team-working skills using a four-player cooperative computer game [8]. A few educational studies [5],[9],[10],[11] show that multi-touch technology is able to help children with communication disabilities (autism) in learning. They are able to use the learning material in an interactive tabletop environment with some difficulty [12]. Collaborative learning focuses on three tasks: looking for and obtaining information, compiling gathered information in an ordered, presentable manner, and imparting the information to others [13]. According to [14], collaborative learning occurs in a group of four to five students. A collaborative classroom can enhance the learning performance of both groups and individuals by encouraging the growth of new ideas and focusing on real-time problem solving. Through collaborative learning, students are more motivated and curious, show greater concern for others, and benefit in terms of psychological health. Collaborative learning encourages individuals to put forward ideas and address them within a group, thus increasing individual understanding and creating a strong
foundation for team-work and social interaction [8, 15]. Individuals teach one another and gather sufficient knowledge to help them answer certain scenarios that have to be solved [13]. Multi-touch displays have shown great potential as a tool for collaborative learning [16]. When operating multi-touch tabletops, students usually stand in a circle around them and interact via finger gestures; in other words, they can manipulate objects within the display [17]. A study completed by [3] shows that learning results obtained by presenting educational material on a tabletop display are better than those obtained with the use of traditional paper handouts. Research by [11] found that learning based on the use of touch screens had a positive impact on the learning of children who found mouse control very difficult or impossible due to poor hand-eye coordination, and that it was an excellent introduction for those children who had little or no previous experience of using a computer.
3 Research Motivation
Studies show that the use of computers may attract preschoolers and primary school students to learning mathematics [2]. However, a computer is not an ideal tool if collaborative learning is not well planned. The screen of a computer often limits the number of students who can take part, due to poor visibility. Typically, students in collaborative learning work in a group of 4 to 5. The keyboard, which allows only one user at a time, also hinders collaborative learning. The use of multi-touch technology can overcome these problems, since a multi-touch tabletop has a bigger and clearer screen than a normal computer screen. It also allows 4 to 5 preschoolers to participate and manipulate the screen comfortably. In addition, multi-touch technology, which allows direct body contact, may create a more relaxed learning atmosphere than using a keyboard or mouse. This technology encourages students to share knowledge through collaborative learning by dragging and placing images, objects and text. Teachers will also find it easier to track the learning progress of groups as well as individuals through the tabletop display. Preschoolers can be classified as visual learners; hence they remember more easily if information is presented in the form of 2D and 3D visual graphics rather than audio and text. The combination of graphics, audio, animation and video in a digital environment is necessary to create a playful learning environment. In preschools, this technology could provide some form of learning based on games and entertainment. Preschoolers can solve various mathematical operations by playing and learning in groups. Given the potential applications of multi-touch technology in collaborative learning stated above, the purpose of this research is to design and develop a visual application on a multi-touch tabletop display for preschool Mathematics (MEL-Vis). In order to get a better understanding of the difficulties faced by preschoolers in learning Mathematics, a preliminary study was conducted on eight preschool teachers and nine primary school teachers. The following sections describe the objectives, procedures and findings of the preliminary study.
4 Objectives of Preliminary Study
The design and development of MEL-Vis involves three main phases, namely preliminary analysis, design and development, and evaluation. This paper highlights the preliminary study conducted, which is the first phase of the development of MEL-Vis. The objectives of this study are as follows:
i. To find out the types of teaching aids currently used by Mathematics teachers and the problems faced by them.
ii. To identify topics in the preschool Mathematics syllabus which preschoolers find difficult to understand.
iii. To find out which topic(s) in the Mathematics syllabus are considered difficult by preschoolers.
5 Procedure
An interview was conducted to gather the appropriate data in order to achieve the main objectives of the preliminary study. Seventeen Mathematics teachers (eight preschool teachers and nine primary school teachers), with between 2 and 20 years of experience in teaching Mathematics, were involved in the study. Teachers from both the preschool and primary levels were chosen in order to investigate whether problems faced by preschoolers still exist among Primary One students. Two instruments were used in the study, namely the Teacher's Interview Schedule (TIS) and a questionnaire on difficult topics for preschoolers (QDT). The procedure of this study comprises three steps:
i. Interviewing teachers based on the TIS instrument.
ii. Teachers answering the questionnaire based on the QDT instrument.
iii. Analysing all data collected from the interviews and questionnaire.
6 Results of Study
This section presents the results of the preliminary study based on the interviews and questionnaire. Table 1 shows the findings of the interviews and questionnaire: 87.5% of the teachers felt that students have trouble understanding the concept of addition and subtraction operations. The results also show that there were constraints in carrying out collaborative learning using computers. This is due to the vertical position of the computer monitor and the keyboard attached to each computer, which is difficult to move around among students. Therefore, collaborative learning interaction between students is limited in this situation. Another study was also conducted to identify the problems of teaching and learning Mathematics for students in Primary One. The purpose of this study was to determine whether the problems faced at the preschool level still exist at the Primary One level. Nine Mathematics teachers were involved in this study.
Visual Application in Multi-touch Tabletop for Mathematics Learning
323
Table 1. Findings of the interview and questionnaire among preschool teachers
Problems faced (percentage of teachers; frequency):
▪ recognize numbers: 37.5% (3)
▪ differentiate numbers: 12.5% (1)
▪ do not understand the concept of addition and subtraction operations, which is abstract: 87.5% (7)
▪ can do counting but have less understanding of the number value: 25.0% (2)
▪ constraints in the use of computers, keyboards and teaching aids (courseware) in carrying out collaborative learning: 87.5% (7)
The results of the interview and the questionnaire can be seen in Table 2. The computational problems in addition and subtraction operations still exist among the students in Primary One. The study also found that there were constraints in the use of the computer and keyboard, as well as teaching aids (courseware), in carrying out collaborative learning.

Table 2. Findings of the interview and questionnaire among Primary One teachers

Problems faced (percentage of teachers; frequency):
▪ recognize numbers: 44.4% (4)
▪ differentiate numbers: 44.4% (4)
▪ computational problems (addition and subtraction operations): 88.9% (8)
▪ do not understand the concept of addition and subtraction operations, which is abstract: 88.9% (8)
▪ can do counting but have less understanding of the number value: 44.4% (4)
▪ constraints in the use of computers, keyboards and teaching aids (courseware) in carrying out collaborative learning: 88.9% (8)
The results of the study show that 88.9% of the teachers agreed that computational problems with addition and subtraction operations exist among students in Primary One. The study also found that there were constraints in the use of computers for collaborative learning in Mathematics.
In summary, the findings of this preliminary study are as follows:
i. The topics of addition and subtraction operations are the major problems in the learning and teaching of Mathematics at the preschool and elementary level (Primary One).
ii. Students at the preschool and Primary One levels have difficulty in understanding the concept of addition and subtraction operations.
iii. There are constraints in the use of desktop computers for collaborative learning.
7 Conceptual Framework of MEL-Vis
From the preliminary study, a conceptual framework was developed, as shown in Figure 2. The conceptual framework of MEL-Vis is divided into two parts:
i. Prototype Development of MEL-Vis
ii. Usability Evaluation of MEL-Vis
Fig. 2. Conceptual Framework of MEL-Vis
The cognitive theory of multimedia learning was adopted in the design and modelling of the MEL-Vis prototype to ensure that the process of learning occurs among preschoolers. There are three important steps in this cognitive process [18]. In the first step, learners pay attention to the relevant information in the presented material. Then, they create a cognitive structure by mentally organizing the selected material into coherent verbal and pictorial representations. The final step is to integrate the incoming verbal and pictorial representations with each other and with existing knowledge. Meaningful knowledge and a better understanding of addition and subtraction operations occur when preschoolers appropriately engage in all these processes.
8 Modules of MEL-Vis
The main purpose of developing MEL-Vis is to help preschoolers understand how to solve Mathematics problems through the concept of ‘learning through play’ in
a collaborative way. The target users of MEL-Vis are preschool children aged 3 to 6 years old. Figure 3 shows the MEL-Vis visual application modules.

Reading Module: The reading module introduces shapes and colours through a short story. Learning shapes and colours allows children to group or classify items. Children develop their ability to make logical connections, which is important for learning mathematics and language. Figure 4 shows an example of the reading module.

Number Recognition Module: The Number Recognition Module introduces the numbers 1 to 50. In this module, users recognize a number by counting the objects on the screen. The balloons on the screen explode and the number appears once the user finishes counting. Figure 5 shows the number recognition module.

Addition and Subtraction Operation Module: This module introduces addition and subtraction operations. Users learn these operations by watching a multimedia video presentation; a short animated movie on counting is shown in this module.

Exercise Module: There are three types of exercises: Drag and Drop, Multiple Choice, and True or False. Users identify their own photo to log into their exercise profile instead of using a username and password, because the target users of MEL-Vis are preschoolers aged three to six years old, and logging in by typing a username and password is not suitable for children at this age level. All exercises completed by a user are recorded in his or her profile. Teachers are able to track the date, total score, average score and exercises that have been carried out by each student.

Activity Module: The activity module comprises a collaborative learning game, shown in Figure 6. This module allows users to work in pairs or in groups of three to solve Mathematics problems through the game. Users need to find the numbers according to the given question and solve the mathematical problem together. Figures 4-6 show the modules of MEL-Vis on the multi-touch tabletop.
Fig. 3. Modules of MEL-Vis
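To make the profile-tracking behaviour of the Exercise Module concrete, the following sketch shows one way such records could be kept. It is illustrative only: the class and field names (StudentProfile, ExerciseRecord, and so on) are assumptions made for this sketch and do not come from the MEL-Vis implementation, which is not described at code level in the paper.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class ExerciseRecord:
    # One completed exercise: its type, the score obtained, and when it was done.
    exercise_type: str   # e.g. "Drag and Drop", "Multiple Choice", "True or False"
    score: float
    completed_on: date

@dataclass
class StudentProfile:
    # A profile is selected by the child's photo rather than a typed login (assumed fields).
    name: str
    photo_file: str
    records: List[ExerciseRecord] = field(default_factory=list)

    def add_record(self, record: ExerciseRecord) -> None:
        self.records.append(record)

    def total_score(self) -> float:
        return sum(r.score for r in self.records)

    def average_score(self) -> float:
        return self.total_score() / len(self.records) if self.records else 0.0

# Example: a teacher reviewing a child's progress (hypothetical data).
profile = StudentProfile(name="Aina", photo_file="aina.png")
profile.add_record(ExerciseRecord("Drag and Drop", 80.0, date(2011, 5, 2)))
profile.add_record(ExerciseRecord("Multiple Choice", 60.0, date(2011, 5, 9)))
print(profile.total_score(), profile.average_score())
```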
Fig. 4. Reading Module of MEL-Vis
Fig. 5. Number Recognition Module of MEL-Vis
Fig. 6. MEL-Vis Collaborative Learning Game
9 Conclusion
Multi-touch technology has the potential to stimulate children's various senses and assist the development of cognitive and psychomotor skills. The prototype development of a visual application for preschool Mathematics based on a multi-touch tabletop (MEL-Vis) is expected to encourage preschoolers' interest in learning Mathematics on the topic of addition and subtraction operations. Preschool teachers can utilize this tool to facilitate teaching and collaborative learning activities for preschoolers through an interactive and attractive environment. The study will conduct experiments to evaluate whether 'learning through play' with a new technology and medium such as MEL-Vis provides preschoolers with a better understanding of solving mathematical problems. The results of the research are also important for policy makers, to ensure that Malaysian preschoolers do not lag behind in the adoption of new technology in education. Early exposure is important, as preschoolers are the future leaders of the country.
References
1. Shafie, A., Barnachea Janier, J., Wan Ahmad, W.: Visual Learning in Application of Integration. In: Badioze Zaman, H. (ed.) Visual Informatics: Bridging Research and Practice, pp. 832–843. Springer, Heidelberg (2009)
2. Hourcade, J.P., Hansen, T.E.: Multitouch Displays to Support Preschool Children's Learning in Mathematics, Reading, Writing, Social Skills and the Arts. In: International Conference on Interaction Design and Children, Como, Italy, June 3-5 (2009)
3. Piper, A.M., Hollan, J.D.: Tabletop displays for small group study: affordances of paper and digital materials. In: Proceedings of the 27th International Conference on Human Factors in Computing Systems, pp. 1227–1236. ACM, Boston (2009)
4. Piper, A.M., Hollan, J.D.: Supporting medical conversations between deaf and hearing individuals with tabletop displays. In: Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work, pp. 147–156. ACM, San Diego (2008)
5. Khandelwal, M., Mazalek, A.: Teaching table: a tangible mentor for pre-k math education. In: Proceedings of the 1st International Conference on Tangible and Embedded Interaction, pp. 191–194 (2007)
6. Tse, E., Greenberg, S., Shen, C., Forlines, C.: Multimodal multiplayer tabletop gaming. Comput. Entertain. 5, 12 (2007)
7. Kaschny, M., Buron, S., Zadow, U., Sostmann, K.: Medical education on an interactive surface. In: ACM International Conference on Interactive Tabletops and Surfaces, pp. 267–268. ACM, Germany (2010)
8. Piper, A.M., O'Brien, E., Morris, M.R., Winograd, T.: SIDES: a cooperative tabletop computer game for social skills development. In: Proceedings of the 2006 20th Anniversary Conference on Computer Supported Cooperative Work, pp. 1–10. ACM, Banff (2006)
9. Sluis, R., Weevers, I., Schijndel, C.v., Kolos-Mazuryk, L., Fitrianie, S., Martens, J.: Read-It: five-to-seven-year-old children learn to read in a tabletop environment (2004)
10. Rick, J., Rogers, Y.: DigiQuilt to DigiTile: Adapting Educational Technology to a Multitouch Table. In: IEEE International Workshop on Horizontal Interactive Human Computer System, TABLETOP (2008)
11. Ager, R., Kendall, M.: Getting it right from the start: a case study of the development of a Foundation Stage Learning and ICT Strategy in Northamptonshire, UK. In: Proceedings of the International Federation for Information Processing Working Group 3.5 Open Conference on Young Children and Learning Technologies, vol. 34, pp. 3–11. Australian Computer Society, Inc., Sydney (2003)
12. Gal, E., Goren-Bar, D., Gazit, E., Bauminger, N., Cappelletti, A., Pianesi, F., Stock, O., Zancanaro, M., Weiss, P.: Enhancing Social Communication Through Story-Telling Among High-Functioning Children with Autism. In: Maybury, M., Stock, O., Wahlster, W. (eds.) Intelligent Technologies for Interactive Entertainment, pp. 320–323. Springer, Heidelberg (2005)
13. Schneider, J., Derboven, J., Luyten, K., Vleugels, C., Bannier, S., De Roeck, D., Verstraete, M.: Multi-user multi-touch setups for collaborative learning in an educational setting. In: Luo, Y. (ed.) CDVE 2010. LNCS, vol. 6240, pp. 181–188. Springer, Heidelberg (2010)
14. Othman, Y.: Mengajar Membaca Teori & Aplikasi. PTS Professional (2007)
15. Cheng, I., Michel, D., Argyros, A., Basu, A.: A HIMI model for collaborative multi-touch multimedia education. In: Proceedings of the 2009 Workshop on Ambient Media Computing, pp. 3–12. ACM, Beijing (2009)
16. Morris, M.R., Lombardo, J., Wigdor, D.: WeSearch: supporting collaborative search and sensemaking on a tabletop display. In: Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, pp. 401–410. ACM, Savannah (2010)
17. Martin, E., Haya, P.A., Roldán, D., García-Herranz, M.: Generating adaptive collaborative learning activities for multitouch tabletops
18. Mayer, R.: The Cambridge Handbook of Multimedia Learning. Cambridge University Press (2005)
Analysing Tabletop Based Computer Supported Collaborative Learning Data through Visualization

Ammar Al-Qaraghuli 1,2, Halimah Badioze Zaman 1, Patrick Olivier 2, Ahmed Kharrufa 2, and Azlina Ahmad 1
1 Institute of Visual Informatics, National University of Malaysia, Malaysia
2 School of Computing Science, Newcastle University, UK
[email protected], [email protected], [email protected], {hbz,aa}@ftsm.ukm.my
Abstract. The development of digital tabletops that support user identification has opened the door to investigating users' achievement in groupware-based, face-to-face Computer Supported Collaborative Learning (CSCL) settings. When students collaborate around the tabletop, teachers need to be aware of the steps undertaken by the students in order to reflect on their achievements. We focused on exploring patterns of interaction between students and the objects of an educational application called 'Digital Mysteries', used as a basis for distinguishing the work of higher achieving groups from lower achieving ones. Teachers' analysis of trial videos informed the analysis of the data, which aimed at automatically generating a visualization-ready, log-based dataset that reflects students' achievements. Our approach to user assessment, based on interaction patterns in a co-located CSCL setting, can be generalised to cover other collaborative application domains, and such patterns can also inform new designs of collaborative applications. Keywords: Visual Informatics, Visualization, Computer Supported Collaborative Learning (CSCL), Tabletops, Log analysis.
1 Introduction

The rapid expansion of research on tabletop computing and the development of related technologies have opened up many emerging, unexplored research areas that need to be investigated. The accurate user identification capability supported by some digital tabletops (such as the Promethean ActivBoard used in this research and MERL's Diamond Touch) has made it possible to log users' interaction with the artefact, opening the door to analysis of groupware in different application fields. Co-located computer-supported collaborative learning (CSCL) is the field chosen for this research. The Digital Mysteries application [1] was employed on digital tabletops used for educational purposes, and was chosen as the base application for our work. The video recordings and interaction logs of the application captured during the trial sessions paved the way for research on the automatic assessment of face-to-face group work between students using the digital tabletop. Collaborative problem solving in an educational setting is referred to as collaborative learning. This method of learning usually increases students' involvement in learning
through discussions, explanations, disagreements, and mutual regulation activities which encourage knowledge elicitation and internalisation mechanisms [2]. Employing digital tabletops in collaborative learning offers many benefits over traditional learning; among these are breaking down problem-solving processes into more controllable subparts (modular problem solving), real-time reflection on students' learning, automatic assessment of achievement, and the capability of logging the solution process (which is the most important in our research, since it is based on the interaction logs of a problem-solving process) [3]. Visualization was then applied to help comprehend the analysis of the important patterns of interaction in the data logs. The vast amount of data at hand was not easy to appreciate without an easy-to-perceive visual representation. Most of the research pertaining to collaborative learning activity analysis focused on conventional interaction settings such as non-computer-supported classroom settings or distant collaboration rather than the co-located computer-supported type [4, 5]. Since the way users interact affects collaboration, there was a need to analyse the interaction data of collaboration between students around the new digital tabletop technology. Stahl proposed using interaction analysis to study group perspectives and collaborative knowledge building [6]. Two research questions are addressed in this paper: (1) Is there a distinguishable set of patterns of interaction, extracted from the log data of the co-located CSCL application, which can be used to measure the achievement of students participating in solving an open-ended problem? (2) Is the technique of cross-checking the results of quantitative analysis of interaction log data with those of qualitative analysis of video-recorded trials suitable for identifying significant patterns of interaction that are useful for the evaluation of students' achievement? Our solution involved preparing a multidimensional dataset that is suitable for visualisation, by identifying patterns of behavioural and cognitive aspects extracted from the interaction data logs, based on teachers' analysis of students' collaboration in solving problems in the 'Digital Mysteries', which integrated pedagogical and Human-Computer Interaction (HCI) theories.
2 Related Work

Previous research around data analysis for visualization has covered different fields and settings, such as data mining and process mining of interaction data logs, and work on evaluation based on certain aspects of co-located CSCL such as territoriality and orientation [7]; [8]; [9]. The VisTACO application was built by Tang et al. as an interactive visualization tool for tabletop collaboration. The researchers explored territoriality and spatial orientation, focusing on individual activities, and suggested the necessity of a more sophisticated set of visualizations. The work aimed at helping observers of the collaboration to further understand participant interactions [7]. Martinez et al. [8] used data mining techniques to model collaborative activities in a co-located multi-display groupware setting where three users collaborated at the same table using their laptops, classifying collaboration into three categories: collaborative, somewhat collaborative and non-collaborative. Recorded mouse clicks and verbal
communication logs were matched by the researchers. They started with data exploration, and their statistics helped them sense variance in the amount of collaboration among the different groups. This was confirmed by video recording analysis of the sessions. The data were then mined, starting with the annotation of each 30 seconds of video as one block and assigning a collaboration level to this block. Participants were judged as 'collaborative' if they were all collaborating at some time within a block; 'non-collaborative' if they were not involved with the solution or only one student was participating in the solution process; and 'somewhat collaborative' when one or two participants were off track, or when the group was working on track but not communicating with each other. Rick et al. analysed the OurSpace application, developed for multi-touch tabletops to enable children to arrange the desks in their classroom and allocate surrounding seats. Their findings included contrasting single-touch versus multi-touch interfaces as well as an analysis of territoriality and its effects on collaboration [9]. Martinez et al. used data mining techniques to exploit log traces of the 'Digital Mysteries' application, which is also used as the base for this research. The researchers used two methods for analysing frequent sequential patterns in order to shed light on strategies followed by the groups of learners in the collaborative learning activity. Their methodology was proposed as a starting point for future research on pattern identification based on educational tabletop data [10]. Christian et al. [11] suggested a framework to evaluate collaboration tools used by distributed teams. They focused on evaluating tool usefulness for a number of collaboration tools, and found that some aspects of communication were lost in the transition from a co-located collaboration setting to a distributed team environment. They then tried to compensate for this communication loss by adopting collaboration tools that employ their evaluation framework, which took into consideration constructs such as awareness, context persistence, calendar assist, coordination and visualization. Finally, while many researchers have covered the field of collaboration analysis, literature pertaining to collaboration tools or collaboration techniques used in specific domains was found to be unavailable, particularly on co-located CSCL data analysis through visualization.
3 The Underlying Application: Digital Mysteries

Based on the paper-based Mysteries tool, described as "an enjoyable assessment tool which provides evidence of pupil's cognitive processes through the observation and analysis of pupil's manipulation of data slips to solve the mystery" [5], the 'Digital Mysteries' application was developed to be used as a case study to support the use of tabletops in education. The application was developed by Kharrufa et al. for a top-projected, multi-pen, horizontal Promethean ActivBoard prototype [1]. The design of the 'Digital Mysteries' used a number of virtual tools (such as grouping, post-it note and sticky tape tools) to satisfy the design goal of externalisation, aimed at supporting higher-level thinking skills, feedback, reflection, and metacognition.
Fig. 1. Students solving a digital mystery around the tabletop
Each slip in a mystery contains data that forms a clue towards solving the mystery. Leat designed the mysteries to contain narrative thread slips that relate directly to the mystery, and added abstract slips that are more difficult to reason about and hence can be used to recognise higher achieving students, as well as 'red herrings' that boost the task's openness and ambiguity [5].
4 Digital Mysteries Trials

Trials on Digital Mysteries were reported in [3]. Six groups of high school students were selected by the teacher according to their achievement levels. Each group consisted of three students. Twelve trials were conducted in the school among 11-14 year old students. They solved four mysteries of different difficulty levels in their respective groups. Video recordings and data logs of these trials formed the main inputs of the research. Participants were chosen carefully to exemplify diversity in their thinking capabilities; hence, the trials provided richer data logs. Three higher achieving students were in one group, whilst another group consisted of lower achieving students. The four remaining groups had combinations of students with different achievement levels. Each of the six groups had to solve the "Windy Creek" mystery. In addition, the higher and lower achieving groups had to solve three other mysteries with increasing levels of difficulty. Students were briefed about the technology and the application before they began to solve their first mystery. Two log files were created for each 'Digital Mysteries' session. The first is a text log that holds the time, interacting user, type of activity and object name after each interaction. This log is compact, yet it does not provide all the necessary data for our analysis. The second log file is a binary file that stores the whole status of the application after each interaction; this binary file was created originally for the purpose of replaying the mystery solution during the reflection stage. It proved to be
useful for data analysis since each entry of this log holds full information about all the objects and user pointers at the moment it was stored. Hence, consecutive entries can be compared to figure out the type of action and the object involved, based on the last interaction as well as the interacting user ID.
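As an illustration of how such a text log can be exploited, the sketch below parses entries of the assumed form (time, user, activity type, object name) and tallies interactions per user and object. The field order and tab delimiter are assumptions made for this example; the actual file layout used by the Digital Mysteries software is not specified here.

```python
import csv
from collections import Counter

def load_events(path):
    """Read a text log assumed to hold one interaction per row:
    timestamp (s), user id, activity type, object name."""
    events = []
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            t, user, activity, obj = row[:4]
            events.append({"t": float(t), "user": user,
                           "activity": activity, "object": obj})
    return events

def interaction_counts(events):
    """Count interactions per (user, object) pair, e.g. to compare
    how often each student touched each slip."""
    return Counter((e["user"], e["object"]) for e in events)

# Hypothetical usage:
# events = load_events("group1_trial1.log")
# for (user, obj), n in interaction_counts(events).most_common(10):
#     print(user, obj, n)
```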
5 Logs Data Analysis

Analysis of the log data is essential for the visualization of students' learning. The aim of this step is to explore the available data and video logs of the 'Digital Mysteries' trials, in an attempt to come up with a set of interaction patterns that are suitable for reflecting on students' collaboration in the educational context. These patterns should be automatically discoverable and will be visualised in order to assist teachers in the assessment of their students' collaborative learning.
Fig. 2. Higher achieving students’ interactions with narrative threads
To inform the video analysis and the consequent data analysis, we started with a preliminary data analysis step. The available text log data were graphically represented in a few different ways. Figure 2 shows higher achieving students' interactions with the narrative thread slips of a mystery. The same representation for the lower achieving group of students is shown in Figure 3. The types of interactions are plotted against the time of each interaction; stage borders are marked by vertical lines annotated with the time spent during each stage. Each of the three students in the group was colour-coded to make it possible to distinguish the achievement of each student separately. This was made possible by the identity-enabled capability of the ActivBoard and the application design. These representations highlighted the need to analyse the videos and match some apparent patterns with moments of the recorded real-life sessions, to test the validity of these patterns. A focus on the slip movement interactions conducted by the lower achievers could be clearly seen when their interactions were compared with the higher
achievers, alongside a shorter proportion of time spent in the reading stage compared to the latter. This led to difficulty in the grouping and sequencing stages, reflected in the long duration of time spent at those stages.
Fig. 3. Lower achieving students’ interactions with narrative threads
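A representation in the spirit of Figures 2 and 3 can be generated directly from such parsed events. The sketch below plots activity type against interaction time, colour-codes up to three students, and draws vertical lines at given stage borders. It assumes the hypothetical event records from the previous sketch and uses matplotlib purely for illustration; it is not the visualisation code used in the study.

```python
import matplotlib.pyplot as plt

def plot_timeline(events, stage_boundaries, ax=None):
    """Plot interaction type vs. time, one colour per student,
    with vertical dashed lines marking the stage borders (times in seconds)."""
    ax = ax or plt.gca()
    users = sorted({e["user"] for e in events})
    colours = {u: c for u, c in zip(users, ["tab:red", "tab:green", "tab:blue"])}
    activities = sorted({e["activity"] for e in events})
    y_of = {a: i for i, a in enumerate(activities)}
    for e in events:
        ax.scatter(e["t"], y_of[e["activity"]], color=colours[e["user"]], s=12)
    for t in stage_boundaries:          # e.g. end of reading, end of grouping
        ax.axvline(t, color="grey", linestyle="--")
    ax.set_yticks(range(len(activities)))
    ax.set_yticklabels(activities)
    ax.set_xlabel("time (s)")
    return ax

# Hypothetical usage:
# plot_timeline(events, stage_boundaries=[600, 1800]); plt.show()
```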
An important factor in the design of the original mysteries as an educational tool was the inclusion of different types of data slips: whereas the narrative threads related directly and clearly to the mystery, and the red herring slips were irrelevant to the narrative, the abstract slips were chosen to distinguish the higher achieving students through their challenging content, which only high achievers could comprehend and manipulate appropriately. Users' interactions with each object were analysed based on the type of interaction. The types of interaction activities and the related objects logged in the textual log files are listed in Table 1.

Table 1. Types of activities and the affected objects in the 'Digital Mysteries'
Type of activity       Affected object types
Enlarge                Slip, Note, Sticker, Virtual Keyboard
Shrink                 Slip, Note, Sticker, Virtual Keyboard
Regularize size        Slip, Note, Sticker
Move                   Slip, Note, Sticker, Virtual Keyboard
Rotate                 Slip, Note, Sticker, Group, Relation, Virtual Keyboard
Create                 Note, Sticker, Group, Relation
Add to group           Slip, Note, Sticker, Relation
Remove from group      Slip, Note, Sticker, Relation
Update                 Relation, Note
Delete                 Group, Relation, Note, Sticker
Finding a considerable level of contrast between the higher achievers and the lower achievers, based on the number of interactions with the different slips as well as on the average interactions with all slips, provided a good reason to proceed further in our quantitative analysis. The higher achievers were found to follow some behavioural patterns that distinguished them from the lower achievers. Examples of such patterns are as follows:
• A lower number of moves per slip by the higher achievers; the lower achievers ranked highest in the number of slip movements, at three times more than the higher achievers.
• The lower achievers made many more interactions with certain slips. Some of these slips were the red herrings that the higher achievers recognised as such; others were abstract slips that were not easy for the lower achievers to comprehend and manipulate.
• Relating slips to other slips, forming relations using stickers, was more frequent in the lower achieving groups than in the higher achieving ones. This indicates that the higher achievers did not find much difficulty in relating concepts with supporting evidence, in contrast to the lower achievers, as reflected in the logs.
Since the mystery solving activity was divided into three stages (reading, grouping & sequencing, and webbing), with different types of cognitive activities targeted at each stage, we decided to investigate per-stage interaction with the different types of displayed objects. Students were required to read all the slips at the reading stage, since the slips were shrunk (thumbnail size) in the initial application setting. Students needed to enlarge every slip at least once before the application progressed to the next stage. Different groups of students followed different methods to complete this stage, as discussed in the video analysis section. This provided us with another good indicator of achievement. The higher achievers followed a well-organized group reading method: reading the slips one at a time by enlarging them, shrinking back each slip, then dragging it to one side of the table. In contrast, the lower achievers worked individually. Some slips were read three times, once by each student, whereas some other slips were read by one student only, as observed during the video analysis, leaving gaps in acquiring the information or knowledge required for solving the mystery. Figure 4 illustrates the time taken by each group to complete the three stages of the 'Digital Mysteries'. G1T1 means the first trial of group 1. Groups 1 and 3 are the higher achieving groups, whilst groups 2 and 4 are the lower achieving groups (refer to the video analysis section). The time spent by the higher achievers at the reading stage was significantly longer than that spent by the lower achievers, except during trial four (G1T4 and G2T4), where the difference was minor. The shorter reading time taken by the lower achieving groups resulted in their slower progression in the next stages, due to weakness in covering the background information needed to solve the mystery. Exceptions appeared in the first and fourth trials, leading to an investigation that connected the lower achieving groups' rush to complete the mystery with their shallow answers in these trials.
The answers were based on a couple of narrative thread slips, whereas the higher achieving groups provided answers with detailed explanations that denote a deep understanding of the abstract slips in the same trials.
Fig. 4. Ratio of time spent by students in completing each stage (time spent per stage, Stages 1-3, during each group trial)
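The per-stage durations summarised in Figure 4 can be obtained from the same logs by taking, for each stage, the span between its first and last interaction. The helper below assumes each event record carries a 'stage' field (1, 2 or 3); this field is an assumption for the sketch, since in practice the stage would have to be inferred from the application's stage-transition entries.

```python
def time_per_stage(events):
    """Given events annotated with a 'stage' field (1, 2 or 3),
    return the elapsed time (seconds) spent in each stage."""
    first, last = {}, {}
    for e in events:
        s = e["stage"]
        first[s] = min(first.get(s, e["t"]), e["t"])
        last[s] = max(last.get(s, e["t"]), e["t"])
    return {s: last[s] - first[s] for s in first}

# Collecting these per-group dictionaries reproduces a chart like Figure 4:
# durations = {"G1T1": time_per_stage(g1t1_events), "G2T1": time_per_stage(g2t1_events)}
```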
The second stage of the 'Digital Mysteries' design emphasises building groups of slips using the grouping tool (a container for slips), provides a virtual notes tool, and offers sticky tapes (stickers) to relate slips to each other. This helps in externalising the internal representation state of students, and hence their analysis and classification skills, according to the Distributed Cognition (DC) theory [12]. Having that as an important factor in the educational application design, there is a need to find patterns of behaviour that report on the usefulness of this stage to the learning of students' groups. The 'Digital Mysteries' application was designed to force students to group all the slips into at least four groups; otherwise the application will not proceed to the third stage when students select the "next stage" option, providing feedback that asks the students to redo the grouping to obtain at least four groups. Exploring the number of groups created by students during a solution did not show any interesting results, nor did the sequence of creation/deletion of groups. Group names and content did show an interesting finding: some students' groups had created their groups of slips according to the possibilities of solution offered in the mystery question, such as "Leave" and "Stay" in the Windy Creek mystery, which asks "Should Annie leave Windy Creek or should she stay? And why?", while in some other cases the groups were created based on the types of information contained in the slips, such as "education", "hobbies", and "weather" in the same Windy Creek mystery. Again, this was not consistent with the level of achievement of students. Since adding each slip into a group is compulsory to proceed to the next stage, this was also not considered as a factor in the assessment of students' work.
6 Video Analysis

As detailed above, the lower achieving groups took significantly less time to pass the reading stage compared to the higher achieving groups. The opposite was noted for the sequencing stage, as the lower achievers were in a rush to complete the early stages of the mystery. Thus, it was reasonable to observe their struggle at the last stage, where they spent more time sequencing and webbing the slips in a cause-and-effect manner, owing to weakness in the prerequisite information. However, the lower achievers invested slightly less time than the higher achievers in the first and fourth trials; as noted earlier, this was connected to their relatively weak answers when compared to the answers given by the higher achieving students. Based on Dillenbourg's definition of collaboration (refer to the introduction) [2], qualitative analysis was suggested to evaluate the collaboration of the students' groups based on the video recordings of the 'Digital Mysteries' trials. The analysis was conducted using both participant-based and group-based approaches. Three school teachers were selected to participate in the video analysis based on their teaching experience and involvement in training teachers. Three days of group meetings were followed by a week of individual work to get the teachers' views on the students' group work in solving mysteries. A previously prepared instrument (a video observational analysis checklist) was enhanced after getting the teachers' feedback and was distributed to the teachers who participated in the study. Five of the recorded videos were analysed on a group basis; the videos selected included the higher achieving and lower achieving students' groups, as well as three other groups solving the same mystery. The remaining videos were individually analysed. Overall, the video analysis findings showed that there was clear collaboration amongst the participants in some trials, whilst in others the video analysis showed that one participant could have affected the whole session negatively, for reasons that are personality-bound (such as being a leader or a free rider) or for other reasons such as being upset due to external factors irrelevant to the study. The video analysis was matched against the preliminary data analysis conducted earlier, because some possible patterns needed to be confirmed during the video analysis, such as investigating what students were doing during a non-interaction period observed in the data logs of some trials. Our video analysis aimed at finding collaboration indicators besides analysing students' behaviour during the trial sessions. We focused on indicators that can be automatically measured (statistics based on logs) rather than manually calculated (reading aloud or verbal discussions). The indicators are as follows (a sketch of how the first of them might be detected automatically is given after this list):
• Object passing: Passing objects to another student is clear evidence of collaboration. This can be identified by looking for an object movement by one student followed directly by an interaction with the same object by another. This kind of collaboration was frequently seen at the reading stage of the solution videos. Transferring objects amongst participants is a core mechanical action that identifies collaboration in shared workspaces [13].
• Brainstorming moments: Some moments of thinking together were identified when there was a pause in interaction after some period of activity. These were linked
to verbal discussions that were seen clearly in the trials' videos. Collaborative learners offer their prior knowledge and contributions to the group and take with them what they have learned from the group interaction, through engaging in activities like brainstorming, sharing information, reacting to each other's utterances, discussing, negotiating decisions and reaching common conclusions [6].
• Focus on an enlarged object: Having an enlarged slip in the middle of the table with no important interaction for a couple of seconds indicates a collaborative situation that was repeatedly observed during the video analysis (waiting for a discussion to finish or until everybody finishes reading the displayed slip).
• Territoriality: Slips moving among territories are another sign of collaboration. The previous interactions of each user denote her/his territory, rather than the part of the tabletop opposite to her/him, since territories in tabletops are not necessarily mutually exclusive, and participants in tabletop collaboration tend to divide the workspace into personal, group and storage territories [14]. Moving objects across territories characterises an invitation to collaborate.
• Externalisation by adding notes: Making notes is clear evidence of externalisation. Since writing a note takes a while compared with other types of interaction, and the application design freezes interaction during note writing to encourage externalisation [1], students usually offer their opinions about the subject of the note; hence, they are externalising their thinking strategy. Higher achieving students employed the note tool effectively in most of their sessions, while lower achievers ignored the note-making privilege. Thus, adding notes can constitute a part of the overall assessment factors.
• Participation in final layout: Having consecutive slips in the final layout added by different students was observed as a practice of the higher achieving students and hence can be considered as evidence of collaboration. The final outcome of a mystery solution, represented in both the final layout and the text of the mystery's answer, is a good indicator of the group strategy [3]. However, automatic calculation of such an indicator without human intervention is not easily achieved. Therefore, the final outcome of the session will be displayed directly as a part of the visualisation.
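As an example of how the first indicator above might be computed automatically, the sketch below scans interactions in time order and flags a handover whenever an object last touched by one student is next touched by a different student within a short window. The field names and the 5-second window are illustrative assumptions, not values taken from the study.

```python
def object_passing_events(events, window=5.0):
    """Flag potential object handovers: an interaction with an object
    by one user followed, within `window` seconds, by an interaction
    with the same object by a different user."""
    last_touch = {}   # object name -> (time, user) of its latest interaction
    handovers = []
    for e in sorted(events, key=lambda e: e["t"]):
        prev = last_touch.get(e["object"])
        if prev is not None:
            prev_t, prev_user = prev
            if prev_user != e["user"] and e["t"] - prev_t <= window:
                handovers.append((e["t"], e["object"], prev_user, e["user"]))
        last_touch[e["object"]] = (e["t"], e["user"])
    return handovers
```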
7 Discussion and Future Work

Our research presented a technique for analysing tabletop-based CSCL interaction data through visualisation-friendly patterns of behaviour that contribute to the evaluation of students' collaboration. Video recordings of trial sessions of the application were analysed by teachers to inform the process of finding distinct patterns of interaction that are associated with the students' levels of achievement. Our findings can be generalized and confirmed through future research based on similar application areas. The findings of this research were based on twelve application trials done by six groups of students. Two higher achieving and two lower achieving groups were clearly distinguished during the analysis, and the findings were based mainly on contrasting the patterns of interaction of the ten application trials done by these groups. It is recommended that testing of future versions of the application
be supported by a higher number of trial sessions, to pave the way for more research pertaining to related areas as well as to confirm previous findings. We suggest that designers of co-located groupware applications consider including measurable actions and easily identifiable patterns in their designs, such as including interface elements with different importance levels to distinguish users focusing on the core of the problem from those manipulating less important items. Further investigation into the effect of different cultural issues is also recommended as future work, to understand the effect of cultural and social factors on collaboration [13], comparing students from different stages of learning, students from different regions such as Europe and Asia, or students from different local cultures such as those from the cities and those from rural or remote areas. This research will continue by visualising the identified patterns of interaction, and by having teachers examine the visualisations and evaluate students based on them. If this evaluation closely matches the previous video-based evaluation, then the visualisations are considered valid.

Acknowledgement. The authors would like to acknowledge and thank Universiti Kebangsaan Malaysia (UKM) and the Ministry of Higher Education (MOHE) for awarding the grant (Ref No: UKM-AP-IPT-16-2009) to conduct this research.
References
1. Kharrufa, A., Leat, D., Olivier, P.: Digital Mysteries: Designing for Learning at the Tabletop. In: Proceedings of the ACM International Conference on Interactive Tabletops and Surfaces 2010 (2010)
2. Dillenbourg, P.: What do you mean by collaborative learning? In: Collaborative-learning: Cognitive and Computational Approaches, pp. 1–19. Elsevier, Oxford (1999)
3. Kharrufa, A.: Digital Tabletops and Collaborative Learning. Newcastle University, UK (2010)
4. Perera, D., Kay, J., Koprinska, I., Yacef, K., Zaïane, O.R.: Clustering and sequential pattern mining of online collaborative learning data. IEEE Transactions on Knowledge and Data Engineering, 759–772 (2009)
5. Leat, D., Nichols, A.: Brains on the Table: diagnostic and formative assessment through observation. In: Assessment in Education: Principles, Policy & Practice, pp. 103–121. Routledge (2000)
6. Stahl, G.: Group cognition: Computer support for building collaborative knowledge. MIT Press, Cambridge (2006)
7. Tang, A., Pahud, M., Carpendale, S., Buxton, B.: VisTACO: Visualizing Tabletop Collaboration. In: ACM International Conference on Interactive Tabletops and Surfaces (ITS 2010) (2010)
8. Martinez, R., Wallace, J., Kay, J., Yacef, K.: Modelling and identifying collaborative situations in a collocated multi-display groupware setting. In: Proceedings of the International Conference on Artificial Intelligence in Education (2011)
9. Rick, J., Harris, A., Marshall, P., Fleck, R., Yuill, N., Rogers, Y.: Children designing together on a multi-touch tabletop: an analysis of spatial orientation and user interactions. In: Proceedings of the 8th International Conference on Interaction Design and Children, IDC 2009 (2009)
10. Martinez, R., Kay, J., Yacef, K., Al-Qaraghuli, A., Kharrufa, A.: Analysing frequent sequential patterns of collaborative learning activity around an interactive tabletop. In: Fourth International Conference on Educational Data Mining, Eindhoven (2011)
11. Christian, D., Rotenstreich, S.: An Evaluation Framework For Distributed Collaboration Tools. In: Seventh International Conference on Information Technology: New Generations (ITNG 2010), pp. 512–517 (2010)
12. Hollan, J.D., Hutchins, E., Kirsh, D.: Distributed cognition: toward a new foundation for human-computer interaction research. ACM Transactions on Computer-Human Interaction (TOCHI) 7, 174–196 (2000)
13. Pinelle, D., Gutwin, C., Greenberg, S.: Task analysis for groupware usability evaluation: Modeling shared-workspace tasks with the mechanics of collaboration. ACM Transactions on Computer-Human Interaction (TOCHI) 10, 281–311 (2003)
14. Scott, S.D., Sheelagh, M., Carpendale, T., Inkpen, K.M.: Territoriality in collaborative tabletop workspaces, pp. 294–303. ACM (2004)
High Order Polynomial Surface Fitting for Measuring Roughness of Psoriasis Lesion

Ahmad Fadzil M Hani 1, Esa Prakasa 1, Hurriyatul Fitriyah 1, Hermawan Nugroho 1, Azura Mohd Affandi 2, and Suraiya Hani Hussein 3
1 Centre for Intelligent Signal and Imaging Research, Department of Electrical and Electronic Engineering, Universiti Teknologi PETRONAS, Tronoh, Malaysia
2 Department of Dermatology, Hospital Kuala Lumpur, Malaysia
3 KPJ Damansara Specialist Hospital, Kuala Lumpur, Malaysia
[email protected]
Abstract. Scaliness of psoriasis lesions is one of the parameters to be determined during Psoriasis Area and Severity Index (PASI) scoring. Dermatologists typically use their visual and tactile senses to assess PASI scaliness. However, it is known that the scores are subjective, resulting in inter- and intra-rater variability. In this paper, an objective 3D imaging method is proposed to assess the PASI scaliness parameter of psoriasis lesions. As scales on the lesion invariably cause roughness, a surface-roughness measurement method is proposed for 3D curved surfaces. The method applies polynomial surface fitting to the lesion surface to extract the estimated waviness from the actual lesion surface. Surface roughness is measured from the vertical deviations of the lesion surface from the estimated waviness surface. The surface roughness algorithm has been validated against 328 lesion models of known roughness on a medical mannequin. The proposed algorithm is found to have an error of 0.0013 ± 0.0022 mm, giving an accuracy of 89.30%. The algorithm is invariant to rotation of the measured surface: the accuracy for rotated lesion models is found to be greater than 95%. System repeatability has been evaluated on successive measurements of 456 psoriasis lesions. The system repeatability can be accepted since 95.27% of the measurement differences are less than two standard deviations of the measurement difference. Keywords: skin surface roughness, polynomial surface fitting, rotation invariance, Visual Informatics.
1 Introduction

Psoriasis is a common skin disease that is incurable but treatable. In psoriasis, the immune system sends wrong signals that accelerate the growth cycle of skin cells, shortening it from 28 days (for normal skin) to only 4 days [1]. Psoriasis causes patients distress and itching, and the problem can persist for long durations [2]. Psoriasis appears as a reddish lesion on the skin surface. At higher severities, the lesion becomes thicker with coarse white scales. Examples of plaque psoriasis lesions on skin surfaces are shown in Figure 1.
Fig. 1. Plaque psoriasis lesions on skin surfaces
Neimann et al. have reported the prevalence of psoriasis from several epidemiological studies; the prevalence around the world is estimated at 0.6–4.8% [3]. Based on clinical records and field surveys from 1998 to 2009, Chandran et al. found that psoriasis prevalence across the world ranges from 0.05% to 6.50% [4]. The Dermatological Society of Malaysia reports that the psoriasis prevalence in Malaysia is 3% [5]. The Dermatology Department of Hospital Kuala Lumpur registered psoriasis in 5.2% of its 75,883 patients in 2005–2010 [6]. Periodic medical treatment for psoriatic patients is crucial, given the high prevalence of psoriasis and the absence of a total cure. The Psoriasis Area and Severity Index (PASI) scoring method has been recognised as the gold standard for severity assessment and for determining treatment efficacy [7]. The four parameters required to determine the PASI score are lesion area, erythema, scaliness and thickness. In PASI scaliness assessment, the dermatologist selects representative lesions at four body regions (head, trunk, upper limbs and lower limbs) and then uses visual and tactile senses to determine the PASI parameter scores. Scaliness is scored as an integer ranging from 0 to 4. However, PASI assessment is not conducted in daily practice because it is laborious and time consuming. It is also known that the dermatologist's score can be subjective, resulting in inter- and intra-rater variability [7].

Surface roughness measurement has been widely applied in various engineering applications. In ISO 14460-1, a surface is defined as a feature set that physically separates the object from the surrounding medium. The surface profile is classified into three components: roughness, waviness and form [8]. These components are differentiated based on the wavelength of the surface profile [8-10]. Roughness has finer ridges than waviness, which in turn is smoother than roughness; form is a curved surface with a very long-range deviation [10]. Statistical parameters such as average roughness (Sa), root mean square roughness (Sq), maximum profile peak height (Sp), minimum profile valley depth (Sv) and maximum height of the profile (St) can be used to characterise surface roughness [9]. Spatial distances in the vertical or horizontal direction of the roughness profile are extracted as surface data to obtain the roughness parameters [11]. The waviness and/or form profile needs to be separated out by subtraction to obtain the roughness profile. Several profile separation methods exist, such as the average algorithm [12], surface fitting [13], the Fourier transform [12] and wavelets [14].
Non-invasive methods have been applied for in vivo skin measurement. Such methods include the stripe projection method [15], the tribo-acoustical system [16] and laser profilometry [17-18]. The stripe projection method allows fast scanning, but it is not reliable for measuring deeply pitted skin surfaces. In the tribo-acoustical system, roughness information is obtained by applying acoustic pressure to the skin surface. Laser profilometry is able to measure a wide area but requires a long scanning time. In this paper, a digital imaging system is proposed to objectively determine the surface roughness of psoriasis lesions. Surface roughness of psoriasis lesions on the human body is a feature that is measurable using 3D imaging approaches and can be used to differentiate PASI scaliness scores. PASI scaliness severity is assessed by the dermatologist based on the visual appearance of the lesion; in the proposed system, the surface roughness feature is used instead to represent scaliness severity. It is observed that as a psoriasis lesion becomes more severe, the lesion surface becomes rougher due to the appearance of coarser scales. The average roughness (Sa) is selected to represent surface roughness. The advantages of using Sa are that it is computed from the actual vertical deviations, it attenuates impulse noise, and no data points are discarded in the surface roughness calculation.
2 Surface Roughness Algorithm

The 3D surface roughness is determined by averaging the vertical deviations of the lesion surface. Because lesions appear on a 3D curved surface (the human body), the vertical deviations due to the lesion are determined by subtracting an estimated waviness surface from the lesion surface, as shown in the cross-sectional view in Figure 2.
Fig. 2. Cross-sectional view of a skin lesion on normal skin surface
A 3D optical scanner based on structured light projection, PRIMOS (Phase-shift Rapid In vivo Measurement Of Skin) portable, is used to acquire lesion surface images. The scanner is suitable for 3D in vivo measurements of skin surface structure [19]. Using PRIMOS, a high-resolution 3D surface (spatial resolution: 0.0062 mm; depth resolution: 0.004 mm) can be acquired in a high-speed scan (<63 ms). Higher-order polynomial surface fitting is applied to the rough lesion surface to extract an estimated 3D waviness surface. By subtracting the estimated waviness surface from the rough lesion surface, the vertical deviations of the lesion surface can be determined.
The vertical deviations of the lesion surface are known as the deviation surface. Second- and third-order polynomials are applied in this work; these orders are suitable for small surface areas, where the vertical undulation of the waviness surface is smaller [13]. The general forms of the second- and third-order polynomials can be written as equations (1) and (2) respectively [20]:

$$z_2(x, y) = y^2 \sum_{i=1}^{3} a_i x^{3-i} + y \sum_{i=1}^{3} a_{i+3} x^{3-i} + \sum_{i=1}^{3} a_{i+6} x^{3-i} \quad (1)$$

$$z_3(x, y) = y^3 \sum_{i=1}^{4} a_i x^{4-i} + y^2 \sum_{i=1}^{4} a_{i+4} x^{4-i} + y \sum_{i=1}^{4} a_{i+8} x^{4-i} + \sum_{i=1}^{4} a_{i+12} x^{4-i} \quad (2)$$
The coefficient of determination (R²) is used to measure the goodness of the polynomial fit [21]. A good fit is obtained if R² lies in the interval [0.9, 1.0], and the polynomial order giving the highest R² is selected. R² is expressed as

$$R^2 = 1 - \frac{\sum_{i=1}^{M} \sum_{j=1}^{N} \left( z(x_i, y_j) - z_k(x_i, y_j) \right)^2}{\sum_{i=1}^{M} \sum_{j=1}^{N} \left( z(x_i, y_j) - \bar{z} \right)^2} \quad (3)$$
where z(xi, yj) is the elevation of the lesion surface, z̄ is the average elevation, and zk(xi, yj) is the fitted value at (xi, yj) using the k-th order polynomial. Surface roughness is determined using the average roughness (Sa) equation [11], in which surface roughness is calculated by averaging the absolute vertical deviations of all data points. The average roughness Sa is defined as

$$S_a = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left| z(x_i, y_j) - z_k(x_i, y_j) \right| = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left| e(x_i, y_j) \right| \quad (4)$$
The proposed surface roughness algorithm for 3D curved surfaces is as follows:

1. Input a 3D lesion surface matrix, Z0(x, y), with size M×N (see Figure 3-a).
2. Divide the lesion surface into 2×2 sub-divided surfaces, giving four sub-divided surfaces.
3. For each sub-divided surface, Z(x, y), determine the polynomial coefficients of the selected order from the coordinates x, y and Z(x, y) through matrix inversion:

$$VA = Z, \qquad A = V^{-1} Z \quad (5)$$

Matrix V contains the elements of the polynomial equation, the polynomial coefficients are stored in matrix A, and matrix Z represents the sub-divided lesion surface. As an example, the matrix equation of the second-order polynomial surface fitting can be expressed as

$$V = \begin{bmatrix} x_1^2 y_1^2 & x_1 y_1^2 & y_1^2 & x_1^2 y_1 & x_1 y_1 & y_1 & x_1^2 & x_1 & 1 \\ x_2^2 y_2^2 & x_2 y_2^2 & y_2^2 & x_2^2 y_2 & x_2 y_2 & y_2 & x_2^2 & x_2 & 1 \\ x_3^2 y_3^2 & x_3 y_3^2 & y_3^2 & x_3^2 y_3 & x_3 y_3 & y_3 & x_3^2 & x_3 & 1 \\ \vdots & & & & & & & & \vdots \\ x_M^2 y_N^2 & x_M y_N^2 & y_N^2 & x_M^2 y_N & x_M y_N & y_N & x_M^2 & x_M & 1 \end{bmatrix}, \quad Z = \begin{bmatrix} z(x_1, y_1) \\ z(x_2, y_2) \\ z(x_3, y_3) \\ \vdots \\ z(x_M, y_N) \end{bmatrix}, \quad A = \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_9 \end{bmatrix} \quad (6)$$

4. Estimate the waviness matrix, Zk(x, y) (Figure 3-b), by applying the calculated polynomial equation with coefficient matrix A at the coordinates (x, y).
5. Calculate R² to evaluate the fitting result; the estimated waviness is accepted if R² lies within [0.9, 1.0].
6. Determine the deviation surface, E(x, y) (Figure 3-c), by subtracting the estimated waviness (Zk) from the lesion surface (Z): E(x, y) = |Z(x, y) − Zk(x, y)|.
7. Average E(x, y) to determine Sa of the lesion surface matrix Z(x, y).
8. Repeat steps 3 to 7 for the remaining sub-divided surfaces until the accepted Sa values of all sub-divided surfaces have been determined.
9. Average the Sa values of the sub-divided surfaces to obtain the final Sa of the measured lesion surface, Z0(x, y).
Fig. 3. Subtraction between lesion surface (a) and estimated waviness (b) is performed to obtain deviation surface (c). Average roughness equation is applied to deviation surface to determine surface roughness.
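To make steps 3–7 concrete, the following minimal sketch implements them for a single sub-divided surface with the second-order basis of equation (6). It is written in Python with NumPy purely for illustration and is not the authors' implementation; the function names, the default grid spacing and the use of a least-squares solver in place of an explicit matrix inverse are our own assumptions.

```python
import numpy as np

def second_order_basis(x, y):
    # Columns of matrix V for the second-order polynomial of equation (6).
    return np.column_stack([
        x**2 * y**2, x * y**2, y**2,
        x**2 * y,    x * y,    y,
        x**2,        x,        np.ones_like(x)])

def fit_sub_surface(X, Y, Z):
    # Fit one sub-divided surface; return (Sa, R2) as in equations (3) and (4).
    x, y, z = X.ravel(), Y.ravel(), Z.ravel()
    V = second_order_basis(x, y)
    A, *_ = np.linalg.lstsq(V, z, rcond=None)   # solve V A = Z in the least-squares sense
    z_fit = V @ A                                # estimated waviness Zk(x, y)
    r2 = 1.0 - np.sum((z - z_fit) ** 2) / np.sum((z - z.mean()) ** 2)
    sa = np.mean(np.abs(z - z_fit))              # average roughness Sa, equation (4)
    return sa, r2

def lesion_sa(Z0, spacing=0.0062):
    # Average Sa over the 2x2 sub-divided surfaces of a lesion height map Z0 (values in mm).
    M, N = Z0.shape
    X, Y = np.meshgrid(np.arange(N) * spacing, np.arange(M) * spacing)
    accepted = []
    for i in (0, 1):
        for j in (0, 1):
            r = slice(i * M // 2, (i + 1) * M // 2)
            c = slice(j * N // 2, (j + 1) * N // 2)
            sa, r2 = fit_sub_surface(X[r, c], Y[r, c], Z0[r, c])
            if 0.9 <= r2 <= 1.0:                 # accept the fit only if R2 lies in [0.9, 1.0]
                accepted.append(sa)
    return np.mean(accepted) if accepted else float("nan")
```

A third-order basis can be substituted in the same way, keeping whichever order gives the higher R², as described above.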
3 Validation of Surface Roughness Algorithm

Validation of the surface roughness algorithm is performed to determine the accuracy and precision of roughness measurement on a 3D curved skin surface. Algorithm validation on a surface with varying roughness has been reported in [22]. Here, 328 lesion models distributed over four regions of a medical mannequin are used in the validation: head (6), upper limbs (54), trunk (151) and lower limbs (117). To evaluate the accuracy and precision of the surface roughness measurement, the average Sa of 33 lesion models on a flat surface is used as a reference (SaRef). The SaRef value is found to be 0.0122 ± 0.0008 mm (x̄Ref ± σRef). The Sa of the lesion models on the mannequin surface (SaCurve) is found to be 0.0135 ± 0.0021 mm (x̄Curve ± σCurve). For the error analysis of the algorithm, the following equations are used:

$$\text{Error} = \bar{x}_{Curve} - \bar{x}_{Ref} \quad (7)$$

$$\text{Precision} = \sigma_{Total} = \sqrt{\sigma_{Ref}^2 + \sigma_{Curve}^2} \quad (8)$$

$$\text{Accuracy} = \left( 1 - \frac{\left| \bar{x}_{Curve} - \bar{x}_{Ref} \right|}{\bar{x}_{Ref}} \right) \times 100\% \quad (9)$$
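As a quick numerical check of equations (7)–(9), the reported means and standard deviations reproduce the stated error, precision and accuracy (an illustrative Python snippet only; the variable names are ours):

```python
import math

x_ref, s_ref = 0.0122, 0.0008        # flat-surface reference SaRef (mm)
x_curve, s_curve = 0.0135, 0.0021    # lesion models on the curved mannequin surface (mm)

error = x_curve - x_ref                              # eq. (7): 0.0013 mm
precision = math.sqrt(s_ref**2 + s_curve**2)         # eq. (8): ~0.0022 mm
accuracy = (1 - abs(x_curve - x_ref) / x_ref) * 100  # eq. (9): ~89.3 %
print(error, precision, accuracy)
```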
The algorithm has an error of 0.0013 ± 0.0022 mm, and the accuracy is not less than 89.30%. The double precision of the algorithm (2σTotal = 0.0044 mm) approximates the depth resolution of the PRIMOS camera (0.0040 mm), which shows that the vertical deviation determination maintains the precision of the PRIMOS camera. Rotation invariance of the algorithm with the second- and third-order polynomials is tested on rotated surfaces of a lesion model. A further surgical-tape lesion model on a flat surface, larger than 40 mm × 30 mm, is used as the object. The rotation angle is varied from 0° to 330° in increments of 30°, so that 12 rotation angles are applied to evaluate rotation invariance. Figure 4 shows the rotated surfaces of the lesion model.
Fig. 4. Rotated surfaces of lesion models by applying 12 different rotation angles
Five measurements are conducted for each rotation angle. The Sa of the lesion model over all rotation angles is found to be 0.0128 ± 0.0003 mm (x̄ ± σ). Figure 5 shows that the Sa values of the rotated lesions are within acceptable accuracy (>95%). With this result, it can be said that the algorithm is invariant to rotation of the measured surface. This is achieved because the matrix inversion used in determining the polynomial coefficients does not depend on the element sequence of matrices V and Z (see Equation 6), and surface rotation changes only the element sequence of these matrices.
Fig. 5. Sa of lesion models with variation on rotation angles. The values of Sa are within acceptable accuracy (>95%).
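The element-sequence argument above can be checked directly: re-ordering the rows of V and Z together leaves the least-squares coefficients, and hence Sa, unchanged. The sketch below demonstrates this permutation invariance on synthetic data (Python/NumPy, illustrative only; it does not reproduce the physical rotation experiment itself).

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.uniform(0, 5, 500), rng.uniform(0, 5, 500)
z = 0.02 * x**2 - 0.03 * x * y + 0.5 * y + rng.normal(0, 0.01, 500)  # synthetic rough patch

V = np.column_stack([x**2 * y**2, x * y**2, y**2, x**2 * y, x * y, y,
                     x**2, x, np.ones_like(x)])
A, *_ = np.linalg.lstsq(V, z, rcond=None)
sa = np.mean(np.abs(z - V @ A))

perm = rng.permutation(len(z))                 # change the element sequence of V and Z
A_p, *_ = np.linalg.lstsq(V[perm], z[perm], rcond=None)
sa_p = np.mean(np.abs(z[perm] - V[perm] @ A_p))

print(np.allclose(A, A_p), np.isclose(sa, sa_p))   # True True
```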
4 Algorithm Implementation on Psoriasis Lesions

The surface roughness algorithm is applied to a sample of normal skin and to two samples of scored psoriasis lesions. The lesions have been scored by a dermatologist based on the scaliness parameter. The outputs of the surface roughness algorithm are shown in Figures 6, 7 and 8.
Fig. 6. Normal skin surface (5.5040 × 5.6960 mm), Sa = 0.016 mm, fitted at R² = 0.99
Fig. 7. Lesion surface score 1 (9.5360 × 7.6800 mm), Sa = 0.030 mm, fitted at R² = 0.98
Fig. 8. Lesion surface score 4 (8.5760 × 7.2320 mm), Sa = 0.055 mm, fitted at R² = 0.99
The figures above show that a rougher surface can be distinguished by its higher Sa. Clustering methods can be applied to determine the boundaries for classifying lesion surfaces based on surface roughness. To test the rotation invariance of the surface roughness algorithm on actual lesions, each lesion surface is measured in two successive scans, and the surface roughness of the lesion images is then determined by the same user in separate calculations. The number of samples is 465 lesions (930 images). The lesion surfaces in the first and second scans are slightly different; as an example (see Figure 9), the lesion surface in the second scan is tilted in the clockwise direction compared with the first scan. Applying the surface roughness algorithm to the lesions in Figure 9 gives measured Sa values for the first and second scans of 0.0371 mm (R² = 0.9804) and 0.0511 mm (R² = 0.9587), respectively. To verify that the algorithm is invariant to rotation of the measured surface, the repeatability of the successive scans is evaluated. Repeatability is defined as the closeness between independent results obtained with the same method on an identical object, by the same operator, under the same conditions, and within a short time interval [23]. The absolute differences between the two measurements are used to determine the system repeatability. Repeatability can be accepted if 95% of the absolute measurement differences are less than two standard deviations of the measurement difference (2σDiff). This rule has been presented by Bland and Altman
in [24]. From the data of the successive measurements, the standard deviation of the measurement differences (σDiff) is found to be 0.011 mm. The system repeatability of the surface roughness algorithm can therefore be accepted, since 95.27% of the measurement differences (443 of 465 lesions) are within the acceptable repeatability limit (2σDiff = 0.021 mm).
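The repeatability rule can be expressed as a short check (a sketch only; the function name and data layout are assumptions, not the authors' code): given the Sa values from the first and second scans of each lesion, the test passes when at least 95% of the absolute differences fall within 2σDiff.

```python
import numpy as np

def repeatability_ok(sa_scan1, sa_scan2, required_fraction=0.95):
    # Bland-Altman style check: fraction of |differences| within two standard deviations.
    diff = np.asarray(sa_scan1, float) - np.asarray(sa_scan2, float)
    limit = 2 * diff.std(ddof=1)                  # 2 * sigma_Diff
    fraction_within = np.mean(np.abs(diff) <= limit)
    return fraction_within >= required_fraction, fraction_within, limit
```

For the 465 lesions reported above, this corresponds to a fraction of 0.9527 against a limit of 0.021 mm.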
Fig. 9. Lesion surfaces obtained from two successive scans: lesion (a) is from the first scan and lesion (b) from the following scan.
The measurement differences can be due to manual segmentation of the lesion area as seen in an example in Figure 10.
Fig. 10. The 2D image and height map of lesion surfaces from the first (a) and the second (b) measurements
5 Summary

Surface roughness of a psoriasis lesion can be used as a scaliness feature, since the appearance of scales is related to scaliness severity. The developed surface roughness algorithm for 3D curved surfaces has been validated with an error of 0.0013 ± 0.0022 mm, giving an accuracy of 89.30%. The algorithm is invariant to rotation of the measured surface; the accuracy for rotated lesion models is found to be greater than 95%, which is high because the surface texture of the lesion model is relatively regular. The surface roughness algorithm has also been tested on actual psoriasis lesions. System repeatability has been evaluated on successive measurements of 465 psoriasis lesions and can be accepted, since 95.27% of the measurement differences are less than 2σDiff (0.021 mm).
Acknowledgement. This research is a collaboration between UTP and the Dermatology Department, Hospital Kuala Lumpur (NMRR-09-1098-4863). This work is funded by the Ministry of Science, Technology and Innovation, Malaysia, under TechnoFund Grant TF 0308C041.
References 1. Lippincott, W., Wilkins: Pathophysiology: a 2-in-1 reference for nurses. Lippincott Williams & Wilkins, Philadelphia (2005) 2. MedicineNet, Inc., http://www.psoriasis.org/netcommunity/learn_statistics 3. Neimann, A.L., Porter, S.B., Gelfand, J.M.: The epidemiology of psoriasis. Expert Review of Dermatology 1, 63 (2006) 4. Chandran, V., Raychaudhuri, S.P.: Geoepidemiology and environmental factors of psoriasis and psoriatic arthritis. Journal of Autoimmunity 34, J314 (2010) 5. Persatuan Dermatologi Malaysia (2011), http://www.dermatology.org.my/overview_psoriasis.html 6. Registered patient in Hospital Kuala Lumpur. Information System of Hospital Kuala Lumpur, Malaysia, Kuala Lumpur (June 15, 2010) 7. Feldman, S.R., Krueger, G.G.: Psoriasis assessment tools in clinical trials. Annals of the Rheumatic Diseases 64, 65 (2005) 8. Blunt, L., Jiang, X.: Advanced techniques for assessment surface topography: development of a basis for 3D surface texture standards “surfstand”. Kogan Page Science, London (2003) 9. Muralikrishnan, B., Raja, J.: Computational surface and roundness metrology. Springer, London (2009) 10. Whitehouse, D.J.: Handbook of surface metrology. Institute of Physics Pub., Bristol (1994) 11. Barbato, G., Carneiro, K., Garnaes, J., Gori, G., Hughes, G., Jensen, C.P., Jørgensen, J.F., Jusko, O., Livi, S., McQuoid, H., Nielsen, L., Picotto, G.B., Wilkening, G.: Scanning Tunnelling Microscopy Methods for Roughness and Micro Hardness Measurements (1994) 12. Hoffman, R.M., Krotkov, E.P.: Terrain Roughness Measurement from Elevation Maps. In: Mobile Robots IV, pp. 104–114. SPIE (1989) 13. Dong, W.P., Mainsah, E., Stout, K.J.: Reference planes for the assessment of surface roughness in three dimensions. International Journal of Machine Tools and Manufacture 35, 263 (1995) 14. Jiang, X., Scott, P., Whitehouse, D.: Wavelets and their applications for surface metrology. CIRP Annals - Manufacturing Technology 57, 555–558 (2008) 15. Frankowski, G., Chen, M.: Optical 3D in vivo Measurement of Human Skin Surfaces with PRIMOS. In: Proc. of Fringe (2001) 16. Zahouani, H., Vargiolu, R., Boyer, G., Pailler-Mattei, C., Laquièze, L., Mavon, A.: Friction noise of human skin in vivo. Wear 267, 1274 (2009) 17. Proving for Efficacy: Laser Profilometry (2011), http://www.dermatest.de 18. Ma’or, Z., Yehuda, S., Voss, W.: Skin smoothing effects of Dead Sea minerals: comparative profilometric evaluation of skin surface. International Journal of Cosmetic Science 19, 105–110 (1997) 19. GFMesstechnik: User Manual: PRIMOS optical 3D skin measuring device. GFMesstechnik GmbH (2008)
20. Yoon, J.S., Ryu, C., Lee, J.H.: Developable polynomial surface approximation to smooth surfaces for fabrication parameters of a large curved shell plate by Differential Evolution. Computer-Aided Design 40, 905 (2008) 21. Wikipedia (2011), http://en.wikipedia.org/wiki/Coefficient_of_determination 22. Fadzil Ahmad, M.H., Prakasa, E., Fitriyah, H., Nugroho, H., Affandi, A.M., Hussein, S.H.: Validation on 3D surface roughness algorithm for measuring roughness of psoriasis lesion. In: Proceedings of World Academy of Science, Engineering and Technology, p. 116. WASET (2010) 23. Royal Society of Chemistry, Cambridge, UK (2006), http://old.iupac.org/goldbook/R05293.pdf 24. Bland, J.M., Altman, D.G.: Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1, 307–310 (1986)
Modelling of Reflectance Spectra of Skin Phototypes III

M.H. Ahmad Fadzil¹, Hermawan Nugroho¹, Romuald Jolivot², Franck Marzani², Norashikin Shamsuddin³, and Roshidah Baba⁴

¹ Centre for Intelligent Signal and Imaging Research, Universiti Teknologi PETRONAS, 31750 Tronoh, Malaysia
[email protected], [email protected]
² Laboratoire Le2i, UMR CNRS 5158, UFR Sc. & Tech., Université de Bourgogne, BP 474870, 21078 Dijon, France
³ Department of Medicine, Faculty of Medicine and Health Sciences, Universiti Putra Malaysia, 43400 Serdang, Selangor Darul Ehsan, Malaysia
⁴ Dermatology Department, Hospital Kuala Lumpur, 50586 Kuala Lumpur, Malaysia
Abstract. In dermatology, the study of human skin colour is related to skin phototype (SPT), for which the Fitzpatrick scale is the most widely used classification. Assessment of the skin's response to UV plays an important role in dermatology for various reasons. This is, however, not easy to perform, for two reasons: first, different skin areas may have different skin tones, resulting in different reflectance spectra; second, different modalities may produce different reflectance spectra. We hypothesize that, for a particular person, the underlying pattern of the reflectance spectra must be similar regardless of the modality used and the skin area from which the spectra are obtained. An observational clinical study involving 21 participants with SPT III was performed to study the relationship between the reflectance spectra of facultative and constitutive skin colour obtained using two different instruments, namely a spectrophotometer and a multispectral camera. The reflectance spectra are then modelled by different linear regressions over different wavelength intervals (piecewise linear regression). Results show that the correlation between the modelled reflectance and the reflectance obtained from different skin samples using different instruments is very high (R-squared > 0.965). It can therefore be inferred that the reflectance model based on piecewise linear regression is suitable for modelling SPT III.

Keywords: skin photo type (SPT), Fitzpatrick's scale, SPT III, piecewise linear regression, Visual Informatics.
1 Introduction

In dermatology, the study of human skin colour is related to skin phototype (SPT). Fitzpatrick devised a subjective format for assessing SPT in humans using a Roman numeral scale of six integers from I to VI [1]. The classification is based on the response of different types of skin to UV light [1, 2]. Table 1 shows the Fitzpatrick scale, which is currently the reference standard for assessing an individual's susceptibility to skin cancer and sunscreen efficacy [3, 4].
Table 1. Fitzpatrick's scale

SPT   Unexposed Skin Colour   Sun Response History
I     White                   Always burns, never tans
II    White                   Always burns, tans minimally
III   White                   Burns minimally, tans gradually and uniformly
IV    Light brown             Burns minimally, always tans well
V     Brown                   Rarely burns, tans darkly
VI    Dark brown              Never burns, tans darkly
As the assessment of skin response to UV is important in dermatology, a model of the pattern of skin responses among normal subjects would provide baseline data for photodermatology. This is, however, not easy to achieve, for two reasons. First, a person's skin colour can be categorised into facultative skin colour (the skin colour in body areas that are usually exposed to the sun) and constitutive skin colour (the skin colour in body areas that are usually not exposed to the sun). Generally, the facultative skin colour is darker than the constitutive skin colour, and this difference can affect the prediction of skin phototype. According to Choe [5], there is a correlation between facultative skin colour and skin phototype, but it is less significant than that of constitutive skin colour. Second, different modalities may produce different reflectance spectra, as shown by Galvan [6]. We hypothesize that even though the reflectance spectra taken from different skin areas using different modalities may differ, their underlying pattern must be similar. A correct model should be able to replicate this pattern and correlate well with reflectance spectra obtained from different skin areas (facultative and constitutive skin) using different modalities. Therefore, the objective of the study is to investigate and model the characteristics of reflectance spectra of SPT III skin found in Malaysia such that the model correlates well with reflectance spectra obtained from different skin areas (facultative and constitutive skin colour) using two different modalities (spectrophotometer and multispectral camera).
2 Research Methodology

2.1 Instruments

Two instruments were used in the study. The first is a multispectral camera called Asclepios, an image acquisition system developed by researchers of the Le2i Laboratory (Burgundy University, France) [7]. The system can image a skin area of 32 × 38 mm in less than 2 seconds with a resolution of 33 pixels per mm (Fig. 1a), and outputs a multispectral image comprising 10 mono-band images. Asclepios enhances the spectral resolution by reconstructing a hyperspectral cube of skin reflectance by means of an artificial neural network-based algorithm [8]. The second instrument is a Konica Minolta CM 2600C spectrophotometer (Fig. 1b). The spectrophotometer employs the 45/0 method (45° ring-shaped illumination, vertical viewing) for its illumination/observation system.
Fig. 1. Instruments used in the study (a) Asclepios (b) CM 2600C
This particular setting enables the spectrophotometer to optimally capture the skin's diffuse reflectance spectrum. To standardise the measurements, the reflectance spectrum is analysed over wavelengths from 430 nm to 740 nm.

2.2 Observational Study

An observational study was conducted from November 2010 to January 2011 at Universiti Teknologi PETRONAS, Malaysia. Undergraduate and postgraduate students of the university were recruited as participants. As exclusion criteria, participants with jaundice, chronic medical disorders such as chronic renal failure or liver failure, or any dermatological diseases that could interfere with interpretation of the data were excluded. The study was conducted as follows: first, the participants were given an informed consent form; afterwards, the reflectance spectra of facultative and constitutive skin colour were captured using the Asclepios multispectral camera and the spectrophotometer. Facultative skin reflectance was sampled from the participant's forehead and constitutive skin reflectance from the lower right back.

2.3 Data Analysis

In order to relate the participants' skin reflectance data to their skin phototype, the collected data were statistically analysed. Using the spectrophotometer, two skin reflectance spectra (one from facultative and one from constitutive skin colour) were acquired for each participant. Using Asclepios, two hyperspectral cubes of skin reflectance data (one facultative and one constitutive) were obtained instead. A hyperspectral cube is a series of reflectance images, each of which represents the spatial reflectance at one waveband. To obtain the skin reflectance spectra, the mean of each image in the hyperspectral cube was computed. Figure 2 depicts a hyperspectral cube and its corresponding reflectance spectrum.
Fig. 2. (a) Hyperspectral cube (b) Reflectance spectra computed from 2(a)
The facultative and constitutive skin reflectance data of each participant were then grouped by instrument. Correlation between the reflectance spectra captured from constitutive and facultative skin colour using the spectrophotometer and the multispectral camera was computed to analyse the linear dependence between these data. The mean reflectance of all participants at each wavelength was computed and plotted, and to investigate the characteristics of SPT III, the standard deviation (SD) of the reflectance data at each wavelength was computed as well; in the graphs, the standard deviations are illustrated as horizontal error bars while the means are illustrated as data points. Piecewise linear regression was then used to model SPT III.
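A minimal sketch of these analysis steps (Python/NumPy; the cube layout and names are assumptions rather than the authors' code): each band of the hyperspectral cube is collapsed to its spatial mean to give a reflectance spectrum, and the facultative and constitutive spectra are then correlated.

```python
import numpy as np

def cube_to_spectrum(cube):
    # Mean reflectance per waveband from a hyperspectral cube shaped (bands, rows, cols).
    return cube.reshape(cube.shape[0], -1).mean(axis=1)

def pearson_r(spectrum_a, spectrum_b):
    # Pearson correlation between two spectra sampled at the same wavelengths.
    return np.corrcoef(np.asarray(spectrum_a, float), np.asarray(spectrum_b, float))[0, 1]

# Example (hypothetical arrays): reconstructed Asclepios cubes for one participant.
# r = pearson_r(cube_to_spectrum(facultative_cube), cube_to_spectrum(constitutive_cube))
```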
3 Results and Analysis

Personal data of the participants were collected from the patient information form. The spectral reflectance data and multispectral images of 21 participants (10 Malays, 8 Chinese and 3 of other ethnicities (Cambodian and Arab)) were obtained in the study. The Pearson correlations between the reflectance spectra of constitutive and facultative skin colour were computed; Table 2 shows the results.

Table 2. Pearson Correlation of Constitutive and Facultative Reflectance Spectrum

                          Asclepios    CM 2600C
Pearson Correlation       0.998        0.997
Significance (2-tailed)   1.7×10⁻²²    1.9×10⁻²⁵

Table 2 shows a high correlation (>0.997) between the constitutive and facultative reflectance spectra in SPT III. The Pearson correlation measures the strength of the linear dependence between two variables; it can be inferred that the facultative skin reflectance has a significant and strong correlation with the constitutive skin reflectance. To investigate the characteristics of SPT III, analyses of the reflectance spectra were then carried out, and to understand the variability at each wavelength, the standard deviation (SD) of the reflectance data from all participants at that wavelength was measured as well. In the graphs, the standard deviations are illustrated as horizontal error bars while the means are illustrated as data points. Figure 3 shows the reflectance mean ± SD of SPT III.
Fig. 3. Reflectance Mean ± SD of SPT III: (a) constitutive data, CM 2600C; (b) facultative data, CM 2600C; (c) constitutive data, Asclepios; (d) facultative data, Asclepios
As the reflectance is not linear, multiple linear regression [9] may not give the best analysis. It is therefore hypothesized that the reflectance can be modelled by different linear regressions over different wavelength intervals (piecewise linear regression [10, 11]). A knot is a value at which the slope (gradient) of the piecewise linear regression model changes. Figure 4 depicts the gradient at each wavelength.

Fig. 4. Gradient Analysis of SPT III (gradient versus wavelength for the Asclepios and CM 2600C constitutive and facultative data, with the estimated knots indicated)
Figure 4 shows that there are three knots in SPT III, located between 500–510 nm, 570–580 nm and 630–640 nm. To make the regression applicable for clinical use, the piecewise linear regression is performed on the reflectance of facultative skin colour obtained from the spectrophotometer. Figure 5 shows the piecewise linear regression analysis.

Fig. 5. Piecewise Linear Regression of SPT III (trendline compared with the CM 2600C and Asclepios constitutive and facultative data)
The piecewise linear regression of reflectance, R(λ), for SPT III can be formulated as

$$R(\lambda) = \begin{cases} 0.138\lambda - 46.223, & 440 \le \lambda \le 502 \\ 0.03\lambda + 8.031, & 503 \le \lambda \le 570 \\ 0.327\lambda - 161.3, & 571 \le \lambda \le 627 \\ 0.065\lambda - 2.863, & 627 \le \lambda \le 740 \end{cases} \quad (1)$$
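Equation (1) can be evaluated directly; the sketch below (Python/NumPy; the names are ours) returns the modelled reflectance for wavelengths between 440 nm and 740 nm using the segment boundaries stated above.

```python
import numpy as np

SEGMENTS = [  # (lambda_min, lambda_max, slope, intercept) from equation (1)
    (440, 502, 0.138, -46.223),
    (503, 570, 0.030,   8.031),
    (571, 627, 0.327, -161.300),
    (627, 740, 0.065,  -2.863)]

def reflectance_spt3(wavelengths_nm):
    # Piecewise linear reflectance model R(lambda) for SPT III.
    lam = np.asarray(wavelengths_nm, dtype=float)
    r = np.full_like(lam, np.nan)
    for lo, hi, slope, intercept in SEGMENTS:
        mask = (lam >= lo) & (lam <= hi)
        r[mask] = slope * lam[mask] + intercept
    return r

# Example: reflectance_spt3([450, 550, 600, 700])
```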
Table 3 shows the R-squared between the estimated reflectance and the actual reflectance obtained from the observational study.

Table 3. R-squared of the Proposed Regression compared with SPT III

No   Type of Data                           R-squared
1    CM 2600C, Constitutive Skin Colour     0.994
2    CM 2600C, Facultative Skin Colour      0.997
3    Asclepios, Constitutive Skin Colour    0.985
4    Asclepios, Facultative Skin Colour     0.962
Table 3 shows a very high correlation (R-squared > 0.965) between the reflectance obtained from the piecewise linear regression model and the actual reflectance captured from different skin samples with different instruments. As mentioned in the introduction, a correct model should be able to replicate the reflectance pattern and correlate well with reflectance spectra obtained from different skin areas (facultative and constitutive skin) using different modalities. Hence, it can be inferred that our piecewise linear regression model is the correct model for SPT III. This also indicates that our hypothesis is correct: the underlying pattern of the reflectance spectra is similar regardless of the modality used and the skin area from which it is obtained.
4 Conclusion

Assessment of SPT is important in dermatology, and analysis of skin responses among normal subjects can provide a baseline for any study related to SPT. This is, however, not easy to perform, for two reasons: first, different skin areas may have different skin tones, resulting in different reflectance spectra; second, different modalities may produce different reflectance spectra. We hypothesize that even though the reflectance spectra taken from different skin areas using different modalities may differ, their underlying pattern must be similar. A correct model should be able to replicate this pattern and correlate well with reflectance spectra obtained from different skin areas (facultative and constitutive skin) using different modalities. Therefore, the objective of the study was to investigate and model the characteristics of reflectance spectra of SPT III skin found in Malaysia such that the model correlates well with reflectance spectra obtained from different skin areas (facultative and constitutive skin colour) using two different modalities (spectrophotometer and multispectral camera). An observational clinical study involving 21 participants with SPT III was performed. Correlation analysis shows a high correlation (Pearson correlation > 0.997, p < 0.01) between the constitutive and facultative reflectance spectra in SPT III, which indicates that SPTs can be discerned regardless of the skin area from which the spectra are obtained. To make the regression applicable for clinical use, the piecewise linear regression was performed on the reflectance of facultative skin colour obtained from the spectrophotometer. Table 3 shows that the correlation between the estimated reflectance and the reflectance obtained from different skin samples and different modalities is very high (R-squared > 0.965). It can be inferred that the proposed piecewise linear regression model is the correct model for SPT III, as it correlates well with reflectance spectra obtained from different skin areas (facultative and constitutive skin) using different modalities (multispectral camera and spectrophotometer). We can also conclude that our hypothesis, that the underlying pattern of the reflectance spectra is similar regardless of the modality used and the skin area from which it is obtained, is correct.
References 1. Fitzpatrick, T.B.: The validity and practicality of photo-reactive skin types I through VI. Arch. Dermatol. 124, 869–871 (1988) 2. Pershing, L.K., Tirumala, V.P., Nelson, J.L., Corlett, J.L., Lin, A.G., Meyer, L.J., Leachman, S.A.: Reflectance spectrophotometer: the dermatologists’ sphygmomanometer for skin phototyping? J. Invest. Dermatol. 128(7), 1633–1640 (2008) 3. Kawada, A.: Risk and preventive factors for skin phototype. J. Dermatol. Sci. 23(suppl. 1), S27–S29 (2000) 4. Autier, P., Boniol, M., Severi, G., Dore, J.F.: Quantity of sunscreen used by European students. Br. J. Dermatol. 144, 288–291 (2001) 5. Choe, Y.B., Jang, S.J., Jo, S.J., Ahn, K.J., Youn, J.I.: The difference between the constitutive and facultative skin color does not reflect skin phototype in Asian skin. Skin Research and Technology 12(1), 68–72 (2006) 6. Galvan, I., Sanz, J.J.: Measuring plumage colour using different spectrophotometric techniques: a word of caution. Ornis Fennica 87, 67–76 (2010) 7. Jolivot, R., Vabres, P., Marzani, F.: An open system to generate hyperspectral cubes for skin optical reflectance analysis. Skin Research and Technology 16(4), 452–516 (2009) 8. Jolivot, R., Vabres, P., Marzani, F.: Reconstruction of hyperspectral cutaneous data from an artificial neural network-based multispectral imaging system. Computerized Medical Imaging and Graphics 35, 85–88 (2011) 9. Shimada, M., Yamada, T., Itoh, M., Yataga, T.: Melanin and blood concentration in human skin model studied by multiple regression analysis: assessment by Monte Carlo simulation. Physics in Medicine and Biology 46, 2397–2406 (2001) 10. Vieth, E.: Fitting piecewise linear regression functions to biological responses. Journal of Applied Physiology 67(1), 390–396 (1989) 11. Malasha, G.F., El-Khaiary, M.: Piecewise linear regression: A statistical method for the analysis of experimental adsorption data by the intraparticle-diffusion models. Chemical Engineering Journal 163(3), 256–263 (2010)
Virtual Method to Compare Treatment Options to Assist Maxillofacial Surgery Planning and Decision Making Process for Implant and Screw Placement

Yuwaraj Kumar Balakrishnan¹, Alwin Kumar Rathinam¹, Tan Su Tung¹, Vicknes Waran¹,², and Zainal Ariff Abdul Rahman¹,³

¹ Virtual Reality Centre, University of Malaya
² Department of Surgery, University Malaya Medical Centre
³ Faculty of Dentistry, University of Malaya
[email protected]
Abstract. The paper explores a biomodelling method comprising several visual and virtual reality techniques for assessing different surgical procedures on the computer. A computed tomography (CT) medical scan of a patient was obtained and converted into a three-dimensional virtual biomodel. This mandible biomodel was then subjected to graphical post-processing to split it into two portions, mimicking actual surgery to correct jaw deformity. In such cases, screws and/or implants are placed in various positions to link the two parts of the mandible. In this evaluation, all possible configurations of a 5-hole L-plate implant (Synthes) were computed to evaluate the various surgical approaches. The nine configurations were ranked by treatment stability, based on the amount of movement between the two bone parts under a virtual force of 500 N simulating a jaw bite. Although the best treatment method was highlighted for this particular case, other factors such as bone thickness need to be incorporated in order to further substantiate this option. This system could be used in future by maxillofacial surgeons as a pre-surgical planning tool to decide which of the various available configurations would be suitable for a given patient based on the patient's medical scan data. Such results would also pave the way for evidence-based medicine practice (EBP), where the decision for a particular treatment approach would be guided and strengthened by data from virtual simulation studies of the surgical outcome.

Keywords: Biomodel, Maxillofacial, Evidence Based Medicine, Pre-surgical planning, Visual Informatics.
1 Introduction

The mandible, commonly known as the jaw, is the lower portion of the human skull. The mandible is subject to various traumas and hereditary defects which require surgical treatment, and tumour growth in this region may also require surgical removal of a portion of the mandible. The high variability of the structure and
shape of the mandible amongst individuals requires surgeons to exercise different approaches to treatment. Computational methods have been employed in various ways for preoperative surgical planning [1][2]; some involve the use of 3D virtual models [3] and physical 3D models [4], and the use of navigational surgery to guide surgeons has also been extensively explored [5]. Methods used in physics and engineering to assess the flow of water or the stresses and forces on a building have also been attempted in cranio-maxillofacial studies [6][7][9]. However, such studies mostly still require the surgeon to decide on the method of treatment based on his or her own reasoning. Evidence-based medicine is a growing principle whereby medical treatment must be based on grounded evidence supporting the particular choice of treatment compared with other approaches [8]; it is an unbiased way of handling medical treatment. In the field of maxillofacial surgery, there exists a wide range of different surgical methods, and a surgeon will regularly exercise his or her own judgement based on personal prior experience in handling a similar treatment on another patient. As such, the surgeon is not able to provide numerical figures to assess and support the choice of treatment over other approaches [8]. This stresses the need for a computational virtual method pipeline to assess the surgical outcome.
2 Methodology

2.1 Three Dimensional Representation

A computed tomography (CT) scan of a patient, covering the mandible, was obtained using a 128-slice dual-source Siemens CT imaging system in our university's medical centre. The data consist of 100 two-dimensional slices in DICOM format; the window level and width were set to WL = 300 and WW = 1500 respectively to highlight the bone region. The data were then converted to a three-dimensional (3D) representation using the marching cubes and Delaunay algorithms. Using an OpenGL 3D modelling programme developed in-house, we then segmented out the mandible and split it along the midline into two portions, as we intend to utilise the left portion.
Fig. 1. Sample CT scan of a patient’s skull
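The conversion described above was performed with the authors' in-house OpenGL programme. Purely to illustrate the same pipeline stage, the sketch below uses the pydicom and scikit-image libraries (our choice, not the authors'): it stacks the DICOM slices into a volume in Hounsfield units and extracts a bone surface with marching cubes at an assumed threshold.

```python
import glob
import numpy as np
import pydicom
from skimage import measure

def ct_to_mesh(dicom_dir, threshold_hu=300):
    # Stack CT slices and extract a bone surface mesh with marching cubes.
    slices = [pydicom.dcmread(f) for f in glob.glob(f"{dicom_dir}/*.dcm")]
    slices.sort(key=lambda s: float(s.ImagePositionPatient[2]))   # order along the scan axis
    volume = np.stack([s.pixel_array * float(s.RescaleSlope) + float(s.RescaleIntercept)
                       for s in slices])                          # Hounsfield units
    verts, faces, normals, values = measure.marching_cubes(volume, level=threshold_hu)
    return verts, faces
```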
Fig. 2. Three dimensional (3D) models for the placement of mandible and skull
2.2 Sagittal Cut

The mandible was then split sagittally, in the area behind the last molar, into two portions as per the actual surgery. For this process, a multi-portioned rectangular plane was drawn along the split line to mimic the actual cut, and the plane and the mandible object were then subjected to boolean operations in order to split the mandible into two portions. Thereafter, the frontal portion of the mandible object was translated 7 mm in the Y direction, also to mimic the actual distraction as conducted by the maxillofacial surgeon.

2.3 Implant and Screw Placement
The L-plate implant was used in this study as a case example to assess the viability of the proposed method. The L-plate was modelled on measurements of a left-configuration, 5-hole L-plate with 2 mm hole size sourced from Synthes GmbH (Ref. No. 446.500). The plate has a thickness of 1.5 mm and a length of 26 mm, and the short L portion extrudes from the main branch for a length of 1.8 mm. The actual plate has a bevel in the screw holes to hold the screw head, but we modelled it without the bevel in order to conserve computing power. The screws were modelled as long cylinders without a screw head since, in theory, the screw heads do not have contact surfaces with the plate and the mandible. Two different screw sets were used, mono-cortical and bi-cortical, with lengths of 6 mm and 12 mm respectively; mono-cortical screws were used in the upper posterior mandibular portion and bi-cortical screws in the anterior mandible portion. The screws and the plates were then positioned in the various configurations listed in Fig. 5. This set of objects was then packaged as a single group and exported to the Virtual Reality Modeling Language (VRML) format.
Fig. 3. Position of implant and screws on mandible
2.4 Stress Analysis

The VRML data file was then imported into the stress and material analysis software (SolidWorks) as a watertight solid part file. The finite element analysis (FEA) was carried out by first defining the material, Young's modulus and tensile strength for each component separately, i.e. the mandible, the titanium implant and the titanium screws. Next, fixation operators were applied to the rear end of the posterior portion of the mandible; this holds down the mandible in a manner similar to the real jaw, which is held tightly to the skull by muscles. A downward force was then applied at 4 points depicting the distribution of the bite force on the teeth of a human mandible. A hard bite of 500 N was used (as proposed in [1][2]) and distributed as follows: 1st point 10% = 50 N, 2nd point 20% = 100 N, 3rd point 50% = 250 N, and 4th point 20% = 100 N; under this load a displacement may occur. For each suggested plate-and-screw configuration, the simulation is run and the average displacement is taken over a selected area between the two mandible joints. The graphical processing pipeline is summarised in the following chart (Fig. 4).
Fig. 4. Graphical processing pipeline of the project
Fig. 5. List of configurations of the 5 holes L-plate

Table 1. Results of the stress analysis

Configuration   Total Screws   Displacement
1               4              0.042706
2               3              0.042686
3               3              0.042719
4               3              0.075955
5               3              0.049047
6               2              0.062462
7               2              0.076217
8               2              0.048996
9               2              0.049012
Table 2. Average Displacement of the stress analysis

Total Screws   Average Displacement
4              0.042706
3              0.052601
2              0.059171

3 Discussion
The results outline the anterior mandible's movement under an applied downward force of 500 N. The displacements show the 1st and 2nd configurations to be the most stable, the reasons being the number of screws, the distance between the screws and the positioning of the screws in a straight line. By comparison, the 7th configuration is the least stable, as the screws are placed in pos A and pos F only, making it weak against the applied forces. Similar results are seen in the 4th configuration, where there is only one screw in pos A, where the force is intense, and two screws in pos E and F, making this configuration almost equally weak. An average displacement is taken for each number of screws: the least displacement is seen when four screws are applied and the most when two screws are applied. Although it may be argued that ranking configurations by the number of screws can be derived using self-reasoning, as is commonly done [8], with such a numerical assessment the chosen configuration can be strongly supported by EBP. The overall results also show that the 7th configuration had the largest movement and hence the least stability; this information may be of use for surgeons in identifying which configurations they should avoid in the planned surgery.
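The grouping behind Table 2 can be reproduced from Table 1 in a few lines (an illustrative Python sketch, not part of the original workflow):

```python
from collections import defaultdict

# (total screws, displacement) for configurations 1-9, taken from Table 1
results = [(4, 0.042706), (3, 0.042686), (3, 0.042719), (3, 0.075955), (3, 0.049047),
           (2, 0.062462), (2, 0.076217), (2, 0.048996), (2, 0.049012)]

groups = defaultdict(list)
for screws, displacement in results:
    groups[screws].append(displacement)

for screws in sorted(groups, reverse=True):
    print(screws, sum(groups[screws]) / len(groups[screws]))
# 4 -> 0.042706, 3 -> ~0.052601, 2 -> ~0.059171 (the averages listed in Table 2)
```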
3.1 Future Work
Additional work is required to further evaluate the computationally predicted best procedure against actual surgical practice; this evaluation must be highly quantitative in order to be compared effectively with the computation. Effort is currently under way to automate the above process so that each combination can be analysed in a more automated manner. This is necessary because the possible number of permutations for each type of plate increases with the number of variations; for example, with a 7-hole plate where 6 holes are used in the analysis, the number of possible configurations is 49.
A rough estimate of the number of implants available for use by maxillofacial surgeons is around 20; hence there could be on the order of 1000 (20 plates × 50 configurations) combinations for a single patient. Without an automated system, the proposed approach would be slow to be adopted by the surgical community for daily use.
References 1. Koch, R.M., Gross, M.H., Carls, F.R., von Büren, D.F., Fankhauser, G., Parish, Y.I.H.: Simulating Facial Surgery Using Finite Element Models. In: SIGGRAPH 1996 Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, Computers & Graphics, vol. 30, pp. 421–428. ACM, NY (1996) 2. Tan, S.T., Rathinam, A.K., Kumar, Y., Rahman, Z.A.A.: Additional Cues Derived from Three Dimensional Image Processing to Aid Customised Reconstruction for Medical Applications. In: Badioze Zaman, H., Robinson, P., Petrou, M., Olivier, P., Schröder, H., Shih, T.K. (eds.) IVIC 2009. LNCS, vol. 5857, pp. 148–155. Springer, Heidelberg (2009) 3. Hassfeld, S., Mühling, J.: Computer assisted oral and maxillofacial surgery – a review and an assessment of technology. Int. J. Oral. Maxillofac. Surg. 30(1), 2–13 (2001) 4. D’Urso, P.S., Atkinson, R.L., Lanigan, M.W., Earwaker, W.J., Bruce, I.J., Holmes, A., Barker, T.M., Effeney, D.J., Thompson, R.G.: Stereolithographic (SL) biomodelling in craniofacial surgery. Br. J. Plast. Surg. 51(7), 522–530 (1998) 5. Casap, N., Tarazi, E., Wexler, A., Sonnenfeld, U., Lustmann, J.: Intraoperative computerized navigation for flapless implant surgery and immediate loading in the edentulous mandible. Int. J. Oral. Maxillofac. Implants 20(1), 92–98 (2005) 6. Cox, T., Kohn, M.W., Impelluso, T.: Computerized analysis of resorbable polymer plates and screws for the rigid fixation of mandibular angle fractures. J. Oral. Maxillofac. Surg. 61(4), 481–487 (2003); ; discussion 487–488 7. Vollmer, D., Meyer, U., Joos, U., Vègh, A., Piffko, J.: Experimental and finite element study of a human mandible. J. Craniomaxillofac. Surg. 28(2), 91–96 (2000) 8. Timmermans, S., Mauck, A.: The promises and pitfalls of evidence-based medicine. Health Aff (Millwood) 24(1), 18–28 (2005) 9. Erkmen, E., Simşek, B., Yücel, E., Kurt, A.: Comparison of different fixation methods following sagittal split ramus osteotomies using three-dimensional finite elements analysis. Part 1: advancement surgery-posterior loading. Int. J. Oral. Maxillofac. Surg. 34(5), 551– 558 (2005) 10. Chuong, C.J., Borotikar, B., Schwartz-Dabney, C., Sinn, D.P.: Mechanical characteristics of the mandible after bilateral sagittal split ramus osteotomy: comparing 2 different fixation techniques. J. Oral. Maxillofac. Surg. 63(1), 68–76 (2005) 11. Bettega, G., Dessenne, V., Raphael, B., Cinquin, P.: Computer-assisted mandibular condyle positioning in orthognathic surgery. J. Oral. Maxillofac. Surg. 54, 553–558 (1996) 12. Colchester, A.C., Zhao, J., Holton-Tainter, K.S., Henri, C.J., Maitland, N., Roberts, P.T., Harris, C.G., Evans, R.J.: Development and preliminary evaluation of VISLAN, a surgical planning and guidance system using intra-operative video imaging. Med. Image Anal. 1(1), 73–90 (1996) 13. Hassfeld, S., Zöller, J., Albert, F.K., Wirtz, C.R., Knauth, M., Mühling, J.: Preoperative planning and intraoperative navigation in skull base surgery. J. Craniomaxillofac. Surg. 26(4), 220–225 (1998) 14. Casap, N., Wexler, A., Persky, N., Schneider, A., Lustmann, J.: Navigation surgery for dental implants: assessment of accuracy of the image guided implantology system. J. Oral. Maxillofac. Surg. 62(9 suppl. 2), 116–119 (2004)
Author Index
Abas, Hafiza II-157 Abdollah, Norfarhana II-104 Abdul Ghani, Ahmad Tarmizi II-217 Abdulkarim, Muhammad I-113 Abdullah, Mohd. Zin I-56 Abdullah, Salwani II-217 Abdullah, Siti Norulhuda Sheikh II-382 Abdullah, Zailani I-183 Abdul Razak, Fariza Hanis II-64 Abu Bakar, Azuraliza II-217 Abu Bakar, Zainab I-99 Affandi, Azura Mohd I-341 Ahmad, Azlina I-125, I-319, I-329, II-168, II-323, II-360, II-371, II-394 Ahmad, Ibrahim II-33 Ahmad, Rohiza II-23 Ahmad Fadzil, M.H. I-352 Ahmad Fauzi, Mohammad Faizal I-268 Al-Douri, Bilal Abdulrahman T. I-176 Al-Khaffaf, Hasan S.M. I-176 Al-Qaraghuli, Ammar I-329 Alsagoff, Syed Nasir II-419 Ang, Mei Choo II-1, II-323, II-394 Arebey, Maher I-280 Arif, Agus I-113 Arshad, Haslina I-238, I-257, II-394 Arshad, Leena I-139 Ayob, Masri II-217 Azizah, J. II-323, II-394 b Baharudin, Baharum I-289 b Mohd Rajuddin, Mohd Kushairi I-289 Baba, Roshidah I-352 Badioze Zaman, Halimah I-125, I-319, I-329, II-157, II-168, II-180, II-231, II-242, II-295, II-323, II-360, II-371, II-394, II-408 Balakrishnan, Yuwaraj Kumar I-361 Baskaran, R. II-50 Basori, Ahmad Hoirul I-67 Basri, Hassan I-280 Begum, R.A. I-280 Belaton, Bahari I-87, I-196, II-348 Besar, Rosli I-226
Bin, Felix Yap Boon I-139 Bin Sunar, Mohd Shahrizal I-300 Bin Tomi, Azfar II-305 bt Mohd, Farahwahida I-289 bt Shamsuddin, Norsila I-289 bt Wan Ahmad, Wan Fatimah I-289 Chan, Chee Seng I-257 Chandren, Muniyandi Ravie I-56 Che Haron, Che Hassan I-238 Chen, Y.Y. II-316 Chong, Huai Yong II-1 Choo, Wou Onn II-123 Chung, Sheng Hung II-114 Ciecholewski, Marcin I-206 Coatrieux, Gouenou I-226 Crestani, Fabio II-284 Deris, Mustafa Mat I-183 Diah, Norizan Mat I-164 Downe, Alan G. I-216 Fabil, Norasikin II-340 Fata, Ahmad Zakwan Azizul Fitriyah, Hurriyatul I-341 Folorunso Olufemi, A. I-24
I-67
Hadi, A.A.R. II-39 Hamdan, Abdul Razak II-217 Hamid, Hamzaini Abdul I-307 Hani, Ahmad Fadzil M. I-139, I-341 Hannan, M.A. I-280 Hasan, Mohammad Khatim I-307 Hashim, Abdul Rahman Mad I-36 Hashim, Ahmad Sobri II-23 Haw, Su-Cheng I-268 Herawan, Tutut I-183 Hoh, Ming Chee II-123 Hussein, Suraiya Hani I-341 Hussin, Niwala Haswita I-216 Ibrahim, Nazrita II-273 Ibrahim, Nurul Huda II-39 Ibrahim, Roslina II-135
Jaafar, Azizah II-33, II-64, II-135, II-147 Jaafar, J. II-93 Jaafar, Jafreezal I-216 Jamil, Adawiyah I-139 Jamil, Nursuriati I-99 Jegatha Deborah, L. II-50 Jolivot, Romuald I-352 Jumail II-205 Jusoh, Normal Mat I-24 Kadir, Rabiah Abdul I-36 Kamaludin, Khairuddin II-85 Kannan, A. II-50 Kassim, Abdul Yazid Mohd I-307 Khader, Ahamad Tajudin I-87 Khalil, Khalili II-135 Khalil-Hani, Mohamed I-45 Kharrufa, Ahmed I-329 Khoo, Bee Ee I-151 Khor, Ean Teng II-114 Kolivand, Hoshang I-300
Mohd Yatim, Noor Faezah II-85, II-273 Muhamad, Murniza II-371 Mukhtar, Muriati Bt. I-77 Mustapha, Aida I-36 Nazlena, M.A. II-323, II-394 Nazri, Mohd Zakree Ahmad II-217 Ng, Hu I-245 Ng, Kok Weng II-1 Nizar, Tamar Jaya I-67 Noah, Shahrul Azman II-340, II-382 Nordin, Md. Jan II-85 Norkhairani, A.R. I-125 Norsiah, A.H. II-323 Nugroho, Hermawan I-341, I-352 Nysj¨ o, Johan I-1 Nystr¨ om, Ingela I-1 Olivier, Patrick I-329 Ooi, Chee-Pun I-245 Othman, Zalinda II-217 Othman, Zulaiha Ali II-217
Lam, Meng Chun I-257 Lee, Hyowon II-74, II-284 Lesk, Valerie II-13 Liao, Iman Yi I-196 Lob Yussof, Rahmah II-180
Patah Akhir, Emelia Akashah II-104 Peikari, Hamid Reza I-77 Periasamy, Elango II-242 Prabuwono, Anton Satria I-257, I-307 Prakasa, Esa I-341
Malik, Aamir Saeed I-139 Malmberg, Filip I-1 Mart´ınez-Us´ o, Adolfo I-13 Marzani, Franck I-352 Mat Amin, Maizan II-168 Mat Nayan, Norshita II-295 Mat Zain, Nurul Hidayah II-64 Mat Zin, Nor Azan I-164, II-253 McKay, Alison II-1 Mohamad Ali, Nazlena II-74, II-261, II-273, II-284 Mohamed Omar, Hasiah II-147 Mohamed Shaari, Nazrul Azha II-231 Mohammadi, Mohsen I-77 Mohd, Masnizah II-284 Mohd Khalid, Yanti Idaya Aspura II-382 Mohd Noah, Shahrul Azman II-284 Mohd Rahim, Mohd Shafry I-67 Mohd Rias, Riaza II-408
Rabiah, A.K. II-193 Radzi, Syafeeza Ahmad I-45 Rahim, Normala II-253 Rahman, Zainal Ariff Abdul I-361 Rajion, Zainul Ahmad I-196 Rambli, Dayang Rohaya Awang II-205, II-305 Ramli, Intan Syaherra I-238 Rassem, Taha H. I-151 Rathinam, Alwin Kumar I-361 Razali, Radzuan I-113 Ridzuan, H. II-394 Riza, S. II-323, II-394 Roohullah II-93 Rosman, Arief Salleh I-67 Saforrudin, Norabeerah II-360 Salgues, G. I-13 Salim, Juhana II-340 Sarim, Hafiz Mohd II-217
Shafie, Afza I-113 Shamsuddin, Norashikin I-352 Shapi’i, Azrulhizam I-307 Shih, Timothy K. I-23 Shukur, Zarina II-340 Siew, Pei Hwa II-123 Siti Iradah, I. II-193 Smeaton, Alan F. II-74, II-284 Soh, Hazwani Che I-99 Solaiman, Basel I-226 Sulaiman, Riza I-307 Sulaiman, Suziah II-205 Sulong, Abu Bakar I-238 Sunar, Mohd Shahrizal I-24 Talib, Abdullah Zawawi I-176, II-348 Tan, Wooi-Haw I-226, I-245 Tengku Sembok, Tengku Mohd I-99, II-295 Tengku Wook, Tengku Siti Meriam II-253 Thomas, J. Joshua I-87 Tong, Hau-Lee I-245, I-268 Tsai, Joseph C. I-23
Ugail, Hassan II-13
Velastin, S.A. I-13
Wan Ahmad, Wan Fatimah I-113, II-23, II-104, II-316 Wan Daud, Wan Mohd Fazdli II-39 Wan Shamsuddin, Syadiah Nor II-13 Wan Zainon, Wan Mohd Nazmee II-348 Waran, Vicknes I-361 Wirza, Rahmita I-36 Yahaya, Nor Hamdan Mohd. I-238 Yeoh, Tze-Wei I-245 Yusof, Farah Wahida Mohd I-67 Yusoff, Rasimah Che Mohd II-135 Zahidah Abdullah, Siti II-261 Zainudin, Suhaila II-217 Zheng, Pan I-196 Zulkarnain, Nur Zareen II-316 Zulkipli, Mohd Firdaus II-64