Communications in Computer and Information Science 262
Tai-hoon Kim Hojjat Adeli William I. Grosky Niki Pissinou Timothy K. Shih Edward J. Rothwell Byeong-Ho Kang Seung-Jung Shin (Eds.)
Multimedia, Computer Graphics and Broadcasting International Conference, MulGraB 2011 Held as Part of the Future Generation Information Technology Conference, FGIT 2011 in Conjunction with GDC 2011 Jeju Island, Korea, December 8-10, 2011 Proceedings, Part I
Volume Editors Tai-hoon Kim Hannam University, Daejeon, Korea E-mail:
[email protected] Hojjat Adeli The Ohio State University, Columbus, OH, USA E-mail:
[email protected] William I. Grosky University of Michigan, Dearborn, MI, USA E-mail:
[email protected] Niki Pissinou Florida International University, Miami, FL, USA E-mail:
[email protected] Timothy K. Shih Tamkang University, Taipei, Taiwan, R.O.C. E-mail:
[email protected] Edward J. Rothwell Michigan State University, East Lansing, MI, USA E-mail:
[email protected] Byeong-Ho Kang University of Tasmania, Hobart, TAS, Australia E-mail:
[email protected] Seung-Jung Shin Hansei University, Gyeonggi-do, Korea E-mail:
[email protected]
ISSN 1865-0929 e-ISSN 1865-0937 e-ISBN 978-3-642-27204-2 ISBN 978-3-642-27203-5 DOI 10.1007/978-3-642-27204-2 Springer Heidelberg Dordrecht London New York Library of Congress Control Number: Applied for CR Subject Classification (1998): C.2, H.4, I.2, H.3, D.2, H.5 © Springer-Verlag Berlin Heidelberg 2011 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Foreword
Multimedia, computer graphics and broadcasting are areas that attract many professionals from academia and industry for research and development. The goal of the MulGraB conference is to bring together researchers from academia and industry as well as practitioners to share ideas, problems and solutions relating to the multifaceted aspects of multimedia, computer graphics and broadcasting. We would like to express our gratitude to all of the authors of submitted papers and to all attendees for their contributions and participation. We acknowledge the great effort of all the Chairs and the members of Advisory Boards and Program Committees of the above-listed event. Special thanks go to SERSC (Science and Engineering Research Support Society) for supporting this conference. We are grateful in particular to the speakers who kindly accepted our invitation and, in this way, helped to meet the objectives of the conference. December 2011
Chairs of MulGraB 2011
Preface
We would like to welcome you to the proceedings of the 2011 International Conference on Multimedia, Computer Graphics and Broadcasting (MulGraB 2011) — the partnering event of the Third International Mega-Conference on Future-Generation Information Technology (FGIT 2011) held during December 8–10, 2011, at Jeju Grand Hotel, Jeju Island, Korea. MulGraB 2011 focused on various aspects of advances in multimedia, computer graphics and broadcasting. It provided a chance for academic and industry professionals to discuss recent progress in the related areas. We expect that the conference and its publications will be a trigger for further related research and technology improvements in this important subject. We would like to acknowledge the great effort of the MulGraB 2011 Chairs, Committees, International Advisory Board, Special Session Organizers, as well as all the organizations and individuals who supported the idea of publishing this volume of proceedings, including the SERSC and Springer. We are grateful to the following keynote, plenary and tutorial speakers who kindly accepted our invitation: Hsiao-Hwa Chen (National Cheng Kung University, Taiwan), Hamid R. Arabnia (University of Georgia, USA), Sabah Mohammed (Lakehead University, Canada), Ruay-Shiung Chang (National Dong Hwa University, Taiwan), Lei Li (Hosei University, Japan), Tadashi Dohi (Hiroshima University, Japan), Carlos Ramos (Polytechnic of Porto, Portugal), Marcin Szczuka (The University of Warsaw, Poland), Gerald Schaefer (Loughborough University, UK), Jinan Fiaidhi (Lakehead University, Canada), Peter L. Stanchev (Kettering University, USA), Shusaku Tsumoto (Shimane University, Japan), and Jemal H. Abawajy (Deakin University, Australia). We would like to express our gratitude to all of the authors and reviewers of submitted papers and to all attendees, for their contributions and participation, and for believing in the need to continue this undertaking in the future. December 2011
Tai-hoon Kim Hojjat Adeli William I. Grosky Niki Pissinou Timothy K. Shih Ed. Rothwell Byeongho Kang Seung-Jung Shin
Organization
Honorary Chair Jeong-Jin Kang
Dong Seoul University, Korea
General Co-chairs William I. Grosky Niki Pissinou Timothy K. Shih Ed Rothwell
University of Michigan-Dearborn, USA Florida International University, USA National Taipei University of Education, Taiwan Michigan State University, USA
Program Co-chairs Tai-hoon Kim Byeongho Kang Seung-Jung Shin
GVSA and University of Tasmania, Australia University of Tasmania, Australia Hansei University, Korea
Workshop Chair Byungjoo Park
Hannam University, Korea
Publication Chair Yongho Choi
Jungwon University, Korea
International Advisory Board Aboul Ella Hassanien Andrea Omicini Bozena Kostek Cao Jiannong Cas Apanowicz Ching-Hsien Hsu Claudia Linnhoff-Popien Daqing Zhang Diane J. Cook Frode Eika Sandnes
Cairo University, Egypt DEIS, Università di Bologna, Italy Gdansk University of Technology, Poland Hong Kong Polytechnic University, Hong Kong Ministry of Education, Canada Chung Hua University, Taiwan Ludwig-Maximilians-Universität München, Germany Institute for Infocomm Research (I2R), Singapore University of Texas at Arlington, USA Oslo University College, Norway
Guoyin Wang Hamid R. Arabnia Han-Chieh Chao Ing-Ray Chen
CQUPT, Chongqing, China The University of Georgia, USA National Ilan University, Taiwan Virginia Polytechnic Institute and State University, USA Seoul National University of Science and Technology, Korea Hong Kong Polytechnic University, Hong Kong University of Canterbury, New Zealand PJIIT, Warsaw, Poland The Hong Kong University of Science and Technology, Hong Kong Pennsylvania State University, USA Michigan State University, USA University of Miami, USA The University of Melbourne, Australia Hongik University, Korea University of Texas at Arlington, USA Acadia University, Canada Indian Statistical Institute, India Vienna University of Technology, Austria La Trobe University, Australia University of the Aegean, Greece University of Alabama, USA Eulji University, Korea University of North Carolina, USA Cairo University, Egypt
Jae-Sang Cha Jian-Nong Cao Krzysztof Pawlikowski Krzysztof Marasek Lionel Ni Mahmut Kandemir Matt Mutka Mei-Ling Shyu Rajkumar Buyya Robert Young Chul Kim Sajal K. Das Sajid Hussain Sankar K. Pal Schahram Dustdar Seng W. Loke Stefanos Gritzalis Yang Xiao Yong-Gyu Jung Zbigniew W. Ras Aboul Ella Hassanien
Program Committee Abdelwahab Hamou-Lhadj Ahmet Koltuksuz Alexander Loui Alexei Sourin Alicja Wieczorkowska Andrew Kusiak Andrzej Dzielinski Anthony Lewis Brooks Atsuko Miyaji Biplab K. Sarker Ch. Z. Patrikakis Chantana Chantrapornchai Chao-Tung Yang
Chengcui Zhang Chi Sung Laih Ching-Hsien Hsu Christine F. Maloigne Dae-Hyun Ryu Daniel Thalmann Dieter Gollmann Dimitris Iakovidis Doo-Hyun Kim Do-Hyeun Kim Eung-Nam Ko Fabrice Mériaudeau Fangguo Zhang Francesco Masulli Federica Landolfi
Gérard Medioni Hae-Duck Joshua Jeong Hai Jin Hiroaki Kikuchi Hironori Washizaki Hongji Yang Hoon Jin Hyun-Sung Kim Hyun-Tae Kim Jacques Blanc-Talon Jalal Al-Muhtadi Jang Sik Park Javier Garcia-Villalba Jean-Luc Dugelay Jemal H. Abawajy
Ji-Hoon Yang Jin Kwak Jiyoung Lim Jocelyn Chanussot Jong-Wook Jang Joonsang Baek Junzhong Gu Karl Leung Kee-Hong Um Kenneth Lam Khaled El-Maleh Khalil Drira Ki-Young Lee Kouichi Sakurai Kyung-Soo Jang Larbi Esmahi Lejla Batina Lukas Ruf MalRey Lee Marco Roccetti Mark Manulis Maytham Safar Mei-Ling Shyu Min Hong Miroslaw Swiercz Mohan S Kankanhalli
Mototaka Suzuki Myung-Jae Lim Nadia Magnenat-Thalmann Neungsoo Park Nicoletta Sala Nikitas Assimakopoulos Nikos Komodakis Olga Sourina Pablo de Heras Ciechomski Pao-Ann Hsiung Paolo D’Arco Paolo Remagnino Rainer Malaka Raphael C.-W. Phan Robert G. Reynolds Robert G. Rittenhouse Rodrigo Mello Roman Neruda Rui Zhang Ryszard Tadeusiewicz Sagarmay Deb Salah Bourennane Seenith Siva Serap Atay
Special Session Organizers YangSun Lee Kwan-Hee Yoo Nakhoon Baek
Seung-Hyun Seo Shin Jin Kang Shingo Ichii Shu-Ching Chen Sidhi Kulkarni Stefan Katzenbeisser Stuart J. Barnes Sun-Jeong Kim Swapna Gokhale Swee-Huay Heng Taenam Cho Tony Shan Umberto Villano Wasfi G. Al-Khatib Yao-Chung Chang Yi Mu Yong-Ho Seo Yong-Kap Kim Yong-Soon Im Yoo-Sik Hong Young-Dae Lee Young-Hwa An Yo-Sung Ho Young Ik Eom You-Jin Song
Table of Contents – Part I
Resource Management for Scalable Video Using Adaptive Bargaining Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yonghun Lee, Jae-Yoon Jung, and Doug Young Suh Improved Resizing MPEG-2 Video Transcoding Method . . . . . . . . . . . . . . Sung Pil Ryu, Nae Joung Kwak, Dong Jin Kwon, and Jae-Hyeong Ahn
1
10
Distributed Formation Control for Communication Relay with Positionless Flying Agents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kiwon Yeom
18
A Content-Based Caching Algorithm for Streaming Media Cache Servers in CDN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Inwhee Joe, Ju Hoon Yi, and Kyu-Seek Sohn
28
Implementation of Bilinear Pairings over Elliptic Curves with Embedding Degree 24 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . In Tae Kim, Chanil Park, Seong Oun Hwang, and Cheol-Min Park
37
Improvement of Mobile U-health Services System . . . . . . . . . . . . . . . . . . . . Byung-Won Min
44
Design and Implementation of an Objective-C Compiler for the Virtual Machine on Smart Phone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . YunSik Son and YangSun Lee
52
The Semantic Analysis Using Tree Transformation on the Objective-C Compiler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . YunSik Son and YangSun Lee
60
A Platform Mapping Engine for the WIPI-to-Windows Mobile Contents Converter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . YangSun Lee and YunSik Son
69
A Trading System for Bidding Multimedia Contents on Mobile Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Young-Ho Park
79
Design of a Context-Aware Mobile System Using Sensors . . . . . . . . . . . . . Yoon Bin Choi and Young-Ho Park
89
Finding Harmonious Combinations in a Color System Using Relational Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Young-Ho Park Image-Based Modeling for Virtual Museum . . . . . . . . . . . . . . . . . . . . . . . . . Jin-Mo Kim, Do-Kyung Shin, and Eun-Young Ahn
97 108
Automatic Tiled Roof Generator for Oriental Architectural CAD Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hyun-Min Lee, Dong-Yuel Choi, Jin-Mo Kim, and Eun-Young Ahn
120
Understanding and Implementation of the Digital Design Modules for HANOK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dong-Yuel Choi, Eun-Young Ahn, and Jae-Won Kim
127
A Gestural Modification System for Emotional Expression by Personality Traits of Virtual Characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Changsook Lee and Kyungeun Cho
135
An Automatic Behavior Toolkit for a Virtual Character . . . . . . . . . . . . . . . Yunsick Sung and Kyungeun Cho Development of Real-Time Markerless Augmented Reality System Using Multi-thread Design Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiang Dan, Kyhyun Um, and Kyungeun Cho An Acceleration Method for Generating a Line Disparity Map Based on OpenCL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chan Park, Ji-Seong Jeong, Ki-Chul Kwon, Nam Kim, Mihye Kim, Nakhoon Baek, and Kwan-Hee Yoo
146
155
165
Hand Gesture User Interface for Transforming Objects in 3D Virtual Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ji-Seong Jeong, Chan Park, and Kwan-Hee Yoo
172
Marker Classification Method for Hierarchical Object Navigation in Mobile Augmented Reality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gyeong-Mi Park, PhyuPhyu Han, and Youngbong Kim
179
Physically Balancing Multi-articulated Objects . . . . . . . . . . . . . . . . . . . . . . Nakhoon Baek and Kwan-Hee Yoo
185
High Speed Vector Graphics Rendering on OpenCL Hardware . . . . . . . . . Jiyoung Yoon, Hwanyong Lee, Baekyu Park, and Nakhoon Baek
191
Research on Implementation of Graphics Standards Using Other Graphics API’s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Inkyun Lee, Hwanyong Lee, and Nakhoon Baek
197
A Dynamics Model for Virtual Stone Skipping with Wii Remote . . . . . . . Namkyung Lee and Nakhoon Baek How to Use Mobile Technology to Provide Distance Learning in an Efficient Way Using Advanced Multimedia Tools in Developing Countries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sagarmay Deb Design and Implementation of Mobile Leadership with Interactive Multimedia Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Suyoto, Tri Prasetyaningrum, and Ryan Mario Gregorius New Development of M-Psychology for Junior High School with Interactive Multimedia Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Suyoto, Thomas Suselo, Yudi Dwiandiyanta, and Tri Prasetyaningrum Adaptive Bandwidth Assignment Scheme for Sustaining Downlink of Ka-Band SATCOM Systems under Rain Fading . . . . . . . . . . . . . . . . . . . . . Yangmoon Yoon, Donghun Oh, Inho Jeon, You-Ze Cho, and Youngok Kim
203
210
217
227
237
Digital Modeling and Control of Multiple Time-Delayed Systems via SVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jong-Jin Park, Gyoo-Seok Choi, and Leang-San Shieh
243
Control System Design Using Improved Newton-Raphson Method and Optimal Linear Model of Nonlinear Equations . . . . . . . . . . . . . . . . . . . . . . . Jong-Jin Park, Gyoo-Seok Choi, and In-Kyu Park
253
Cost-Effective Multicast Routings in Wireless Mesh Networks . . . . . . . . . Younho Jung, Su-il Choi, Intae Hwang, Taejin Jung, Bae Ho Lee, Kyungran Kang, and Jaehyung Park
262
Facial Animation and Analysis Using 2D+3D Facial Motion Tracking . . . Chan-Su Lee, SeungYong Chun, and Sang-Heon Lee
272
A Method to Improve Reliability of Spectrum Sensing over Rayleigh Fading Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Truc Thanh Tran and Hyung Yun Kong
280
Development of Multi-functional Laser Pointer Mouse through Image Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jin Shin, Sungmin Kim, and Sooyeong Yi
290
The Effect of Biased Sampling in Radial Basis Function Networks for Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hyontai Sug
299
Location Acquisition Method Based on RFID in Indoor Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kyoung Soo Bok, Yong Hun Park, Jun Il Pee, and Jae Soo Yoo The Efficiency of Feature Feedback Using R-LDA with Application to Portable E-Nose System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lang Bach Truong, Sang-Il Choi, Yoonseok Yang, Young-Dae Lee, and Gu-Min Jeong Interactive Virtual Aquarium with a Smart Device as a Remote User Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yong-Ho Seo and Jin Choi
307
316
324
Intelligent Control Algorithm for Smart Grid Systems . . . . . . . . . . . . . . . . Tahidul Islam and Insoo Koo
332
Analysis on Interference Impact of LTE on DTV . . . . . . . . . . . . . . . . . . . . . Inkyoung Cho, Ilkyoo Lee, and Younok Park
344
An Ontology Structure for Semantic Sensing Information Representation in Healthcare Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rajani Reddy Gorrepati and Do-Hyeun Kim
351
A New Type of Remote Power Monitoring System Based on a Wireless Sensor Network Used in an Anti-islanding Method Applied to a Smart-Grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kyung-Jung Lee, Kee-Min Kim, ChanWoo Moon, Hyun-Sik Ahn, and Gu-Min Jeong
358
ICI Suppression in the SC-FDMA Communication System with Phase Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Heung-Gyoon Ryu
368
Content Authentication Scheme for Modifiable Multimedia Streams . . . . Hankyu Joo
377
Intelligent Music Player Based on Human Motion Recognition . . . . . . . . . Wenkai Xu, Soo-Yol Ok, and Eung-Joo Lee
387
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
397
Table of Contents – Part II
Logical User Interface Modeling for Multimedia Embedded Systems . . . . Saehwa Kim Efficient Doppler Spread Compensation with Frequency Domain Equalizer and Turbo Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Haeseong Jeong and Heung-Gyoon Ryu Machine Learning-Based Soccer Video Summarization System . . . . . . . . . Hossam M. Zawbaa, Nashwa El-Bendary, Aboul Ella Hassanien, and Tai-hoon Kim
1
9 19
A Focus on Comparative Analysis: Key Findings of MAC Protocols for Underwater Acoustic Communication According to Network Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jin-Young Lee, Nam-Yeol Yun, Sardorbek Muminov, Seung-Joo Lee, and Soo-Hyun Park
29
Interference Impact of Mobile WiMAX BS on LTE in TV White Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yanming Cheng, Inkyoung Cho, and Ilkyoo Lee
38
Generating Optimal Fuzzy If-Then Rules Using the Partition of Fuzzy Input Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . In-Kyu Park, Gyoo-Seok Choi, and Jong-Jin Park
45
A Design of Embedded Integration Prototyping System Based on AR . . . Sin Kwan Kang, Jung Eun Kim, Hyun Lee, Dong Ha Lee, and Jeong Bae Lee
54
Optimization Conditions of OCSVM for Erroneous GPS Data Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Woojoong Kim and Ha Yoon Song
62
An Enhanced Dynamic Signature Verification System for the Latest Smart-Phones . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jin-whan Kim
71
Illumination Invariant Motion Estimation and Segmentation . . . . . . . . . . . Yeonho Kim and Sooyeong Yi Daily Life Mobility of a Student: From Position Data to Human Mobility Model through Expectation Maximization Clustering . . . . . . . . . Hyunuk Kim and Ha Yoon Song
78
88
A Fast Summarization Method for Smartphone Photos Using Human-Perception Based Color Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kwanghwi Kim, Sung-Hwan Kim, and Hwan-Gue Cho Context-Driven Mobile Social Network Discovery System . . . . . . . . . . . . . Jiamei Tang and Sangwook Kim An Energy Efficient Filtering Approach to In-Network Join Processing in Sensor Network Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kyung-Chang Kim and Byung-Jung Oh A Genetic Programming Approach to Data Clustering . . . . . . . . . . . . . . . . Chang Wook Ahn, Sanghoun Oh, and Moonyoung Oh
98 106
116 123
Design and Implementation of a Hand-Writing Message System for Android Smart Phone Using Digital Pen . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jong-Yun Yeo, Yong Dae Lee, Sang-Hoon Ji, and Gu-Min Jeong
133
Robust Blind Watermarking Scheme for Digital Images Based on Discrete Fractional Random Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Youngseok Lee and Jongweon Kim
139
Performance Evaluation of DAB, DAB+ and T-DMB Audio: Field Trial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Myung-Sun Baek, Yonghoon Lee, Sora Park, Geon Kim, Bo-mi Lim, Yun-Jeong Song, and Yong-Tae Lee A Case Study on Korean Wave: Focused on K-POP Concert by Korean Idol Group in Paris, June 2011 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hyunhee Cha and Seongmook Kim Design and Implementation of Emergency Situation System through Multi Bio-signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ki-Young Lee, Min-Ki Lee, Kyu-Ho Kim, Myung-jae Lim, Jeong-Seok Kang, Hee-Woong Jeong, and Young-Sik Na Intelligent Music Recommendation System Based on Cloud Computing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ki-Young Lee, Tae-Min Kwun, Myung-Jae Lim, Kyu-Ho Kim, Jeong-Lae Kim, and Il-Hee Seo Handling Frequent Updates of Moving Objects Using the Dynamic Non-uniform Grid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ki-Young Lee, Jeong-Jin Kang, Joung-Joon Kim, Chae-Gyun Lim, Myung-Jae Lim, Kyu-Ho Kim, and Jeong-Lae Kim The Guaranteed QoS for Time-Sensitive Traffic in High-Bandwidth EPON . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jeong-hyun Cho and Yong-suk Chang
146
153
163
169
175
181
Robust Vehicle Tracking Multi-feature Particle Filter . . . . . . . . . . . . . . . . . M. Eren Yildirim, Jongkwan Song, Jangsik Park, Byung Woo Yoon, and Yunsik Yu Computationally Efficient Vehicle Tracking for Detecting Accidents in Tunnels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gyuyeong Kim, Hyuntae Kim, Jangsik Park, Jaeho Kim, and Yunsik Yu Development of an Android Application for Sobriety Test Using Bluetooth Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jangju Kim, Daehyun Ryu, Jangsik Park, Hyuntae Kim, and Yunsik Yu Performance of Collaborative Cyclostationary Spectrum Sensing for Cognitive Radio System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yoon Hyun Kim, In Hwan Park, Seung Jong Kim, Jeong Jin Kang, and Jin Young Kim Novel Spectrum Sensing for Cognitive Radio Based Femto Networks . . . . Kyung Sun Lee, Yoon Hyun Kim, and Jin Young Kim
191
197
203
210
220
Efficient Transmission Scheme Using Transceiver Characteristics for Visible Light Communication Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . In Hwan Park, Yoon Hyun Kim, and Jin Young Kim
225
Modification of Feed Forward Process and Activation Function in Back-Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gwang-Jun Kim, Dae-Hyon Kim, and Yong-Kab Kim
234
Influential Parameters for Dynamic Analysis of a Hydraulic Control Valve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kyong Uk Yang, Jung Gyu Hur, Gwang-Jun Kim, Dae Hyon Kim, and Yong-Kab Kim Fixed-Width Modified Booth Multiplier Design Based on Error Bound Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kyung-Ju Cho, Jin-Gyun Chung, Hwan-Yong Kim, Gwang-Jun Kim, Dae-Ik Kim, and Yong-Kab Kim A Performance Enhancement for Ubiquitous Indoor Networking Using VLC-LED Driving Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Geun-Bin Hong, Tae-Su Jang, Kwan-Woong Kim, and Yong-Kab Kim Improved Password Mutual Authentication Scheme for Remote Login Network Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Younghwa An
241
248
257
263
Context-Awareness Smart Safety Monitoring System Using Sensor Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Joon-Mo Yang, Jun-Yong Park, So-Young Im, Jung-Hwan Park, and Ryum-Duck Oh Spectro-temporal Analysis of High-Speed Pulsed-Signals Based on On-Wafer Optical Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dong-Joon Lee, Jae-Yong Kwon, Tae-Weon Kang, and Joo-Gwang Lee e-Test System Based Speech Recognition for Blind Users . . . . . . . . . . . . . . Myung-Jae Lim, Eun-Young Jung, and Ki-Young Lee Improving the Wi-Fi Channel Scanning Using a Decentralized IEEE 802.21 Information Service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fabio Buiati, Luis Javier García Villalba, Delfín Rupérez Cañas, and Tai-hoon Kim Grid of Learning Resources in E-learning Communities . . . . . . . . . . . . . . . Julio César Rodríguez Ribón, Luis Javier García Villalba, Tomás Pedro de Miguel Moro, and Tai-hoon Kim A Comparison Study between AntOR-Disjoint Node Routing and AntOR-Disjoint Link Routing for Mobile Ad Hoc Networks . . . . . . . . . . . Delfín Rupérez Cañas, Ana Lucila Sandoval Orozco, Luis Javier García Villalba, and Tai-hoon Kim Comparing AntOR-Disjoint Node Routing Protocol with Its Parallel Extension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Delfín Rupérez Cañas, Ana Lucila Sandoval Orozco, Luis Javier García Villalba, and Tai-hoon Kim Location Acquisition Method Based on RFID in Indoor Environments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kyoung Soo Bok, Yong Hun Park, Jun Il Pee, and Jae Soo Yoo A Study on Compatibility between ISM Equipment and GPS System . . . Yong-Sup Shim and Il-Kyoo Lee A Context Aware Data-Centric Storage Scheme in Wireless Sensor Networks . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hyunju Kim, Junho Park, Dongook Seong, and Jaesoo Yoo A Continuous Query Processing Method in Broadcast Environments . . . . Yonghun Park, Kyoungsoo Bok, and Jaesoo Yoo An Adaptive Genetic Simulated Annealing Algorithm for QoS Multicast Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bo Peng and Lei Li
270
278
284
290
295
300
305
310 319
326 331
338
A Quantified Audio Watermarking Algorithm Based on DWT-DCT . . . . De Li, Yingying Ji, and JongWeon Kim
339
Features Detection on Industrial 3D CT Data . . . . . . . . . . . . . . . . . . . . . . . Thi-Chau Ma, Chang-soo Park, Kittichai Suthunyatanakit, Min-jae Oh, Tae-wan Kim, Myung-joo Kang, and The-Duy Bui
345
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
355
Resource Management for Scalable Video Using Adaptive Bargaining Solution Yonghun Lee, Jae-Yoon Jung, and Doug Young Suh* Kyung Hee University, 1 Seocheon-dong, Giheung-gu, Yongin-si Gyeonggi-do 446-701, Republic of Korea
[email protected],
[email protected],
[email protected]
Abstract. This paper proposes a method of providing scalable video service to multiple users by managing resources with the Adaptive Bargaining Solution. The Adaptive Bargaining Solution, a mixture of two bargaining solutions, the Nash Bargaining Solution (NBS) and the Kalai-Smorodinsky Bargaining Solution (KSBS), allocates resources so as to guarantee system efficiency and fairness, respectively. This paper shows how to exploit the merits of both solutions by applying different bargaining solutions according to the time-varying total resources and the different rate-quality performance of SVC content. Not only the minimum quality but also the system efficiency can be guaranteed at an adequate level. In addition, we propose an adaptive bargaining power determination method which resolves the unfairness caused by variation of the available resources and by differences in rate-quality performance between scalable video contents. Keywords: bargaining solution, scalable video coding, NBS, KSBS.
1
Introduction
Currently, as mobile multimedia services become popular, resource management techniques become more important in order to support quality of service (QoS) in time-varying, bandwidth-constrained wireless network environments. Compared to previous static reservation-based methods and the equal rate allocation scenario (ERAS), recent resource management methods have become context-aware and dynamic, so that they guarantee quality fairness when the QoS requirements of all users cannot be satisfied because of deficiency of resources. Park et al. [3] introduced the application of two bargaining solutions, the Nash Bargaining Solution (NBS) and the Kalai-Smorodinsky Bargaining Solution (KSBS), to allocating resources among multiple users. These solutions, popular in economic studies, have been used for resource allocation according to the video qualities experienced by users, which are represented by the Peak Signal-to-Noise Ratio (PSNR) of decoded video. Based on KSBS for video service, Guan et al. [4] proposed a novel co-opetition paradigm in which a target quality is set as the boundary between competition and cooperation among users. However, they applied non-standardized video coding technology.

* Corresponding author.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 1–9, 2011. © Springer-Verlag Berlin Heidelberg 2011
This paper modifies the methods in [3] and [4] for the standard scalable video coding (SVC) of the JVT (Joint Video Team) [5], and proposes the Adaptive Bargaining Solution (ABS), in which NBS and KSBS are combined appropriately. ABS divides the total available resource into two sections, and a bargaining solution is applied to each section. The first part of the resource is used for proportional fairness among the users, and the other part for the efficiency enhancement of the total system. This section is followed by Section 2, which describes SVC and the two bargaining solutions, NBS and KSBS. Section 3 proposes ABS and explains how to divide the total available resource to guarantee both fairness among users and efficiency of the system. We also introduce adaptive bargaining power determination to overcome differences in rate-quality performance among users. Simulation scenarios and quantitative results are provided in Section 4, and Section 5 concludes this paper.
2
Scalable Video Coding and Bargaining Game
2.1
Scalable Video Coding
SVC, standardized in the JVT, includes three kinds of scalability: spatial, temporal, and quality. These three scalability methods are denoted by a 3-tuple (D, T, Q) in [5][6]. Fig. 1 shows the inter-layer prediction structure in a hybrid scalability mode that combines the three scalability methods.
Fig. 1. Layer structure of scalable video coding (2 spatial, 5 temporal, 2 quality)
In SVC, inter-layer prediction is used to remove the redundancy in the layered structure (the inter-layer redundancy). That is, an upper layer is encoded or decoded by referring to its lower layers: when a layer l is decoded, its lower layers must be decoded first. Therefore, the quality and the bitrate corresponding to each layer are represented as follows:
Resource Management for Scalable Video Using Adaptive Bargaining Solution
Q_l = \begin{cases} Q_0, & l = 0 \\ Q_l - Q_{ref}, & 1 \le l < L \end{cases}, \qquad r_l = \begin{cases} r_0, & l = 0 \\ r_l - r_{ref}, & 1 \le l < L \end{cases}   (1)

where Q_{ref} and r_{ref} denote, respectively, the quality and bitrate of the layers to which the l-layer refers in encoding or decoding. For example, in the SVC prediction structure shown in Fig. 1, if l is (1,3,0), the reference layers are (1,2,0) and (0,3,0). Each layer's priority can therefore be calculated as

\rho_0 = 1, \qquad \rho_l = \frac{q_l}{\sum_{j=1}^{L-1} q_j}, \quad 1 \le l < L   (2)

where q_l is the number of layers (or frames) in a group of pictures (GOP) that reference the l-layer; if a referenced layer is lost, the layers depending on it cannot be decoded. When an available data rate R is given, the optimal layer that offers the best service quality is expressed in (3):

L^* = \arg\max_{l \in L} \sum_{j=1}^{l} Q_j, \quad \text{subject to } \sum_{j=1}^{l} r_j \le R   (3)
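As a hedged illustration (not from the paper): the rule in (3) amounts to accumulating layers in decoding order until the rate budget is exhausted, since every layer requires its lower layers. The layer table below is hypothetical:

```python
def select_optimal_layer(layers, rate_budget):
    """Greedy form of Eq. (3): accumulate layers in decoding order until
    the available rate is exhausted; return the index of the highest
    decodable layer and the accumulated quality."""
    total_q, total_r, best = 0.0, 0.0, None
    for idx, (q_inc, r_inc) in enumerate(layers):
        if total_r + r_inc > rate_budget:
            break               # lower layers are prerequisites, so stop here
        total_q += q_inc
        total_r += r_inc
        best = idx
    return best, total_q

# hypothetical (incremental quality [dB], incremental rate [kbps]) per layer
layers = [(20.0, 35), (4.0, 60), (3.0, 120), (2.5, 240)]
print(select_optimal_layer(layers, 250))   # -> (2, 27.0)
```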
The optimal quality, bitrate, and priority achievable with the R given in (3) can be calculated as U(R) = X = \sum_{l=0}^{L^*} Q_l, R' = \sum_{l=0}^{L^*} r_l, and \rho' = \sum_{l=0}^{L^*} \rho_l, respectively.

2.2
Bargaining Solutions
The bargaining game addressed in [1] and [2] refers to utilitarian bargaining situations in which two or more players competing over limited resources must reach an agreement on how to distribute them. The resources concerned are first organized into the set of all feasible utility allocations available if the players cooperate (called the feasible utility set). For the agreement that best serves each player's interests with regard to the feasible utility set, Pareto optimality should be satisfied: a "Pareto optimal" bargaining solution is one in which no player can increase its payoff without decreasing the payoff of at least one other player. The bargaining set is the set of all possible Pareto optimal solutions and is represented as follows:
B = \left\{ X \mid X = (X_1, \ldots, X_N), \ \sum_{i=1}^{N} R_i = R_{MAX}, \ \forall R_i \ge r_{0,i} \right\}   (4)

where R_{MAX} denotes the total resources to be allocated among the N players.

2.2.1 Nash Bargaining Solution (NBS)

In the bargaining set B defined in (4), the NBS yields a unique bargaining solution satisfying the following axioms.
Axiom 1. Pareto optimality
Axiom 2. Independence of linear transformations
Axiom 3. Independence of irrelevant alternatives
Axiom 4. Symmetry

The first axiom indicates that an NBS is selected from the bargaining set, and the other axioms (Axioms 2, 3, and 4) characterize fairness in the NBS. Details of each axiom are described in [1]. An NBS is the bargaining solution that maximizes the product of the players' utilities (the Nash product) over the bargaining set. In a bargaining situation with N players, an NBS, denoted by X^*, is defined as follows:

X^* = \arg\max_{X \in B} \prod_{i=1}^{N} \left( X_i - X_{0,i} \right)^{\alpha_i}   (5)

where X_i is the utility function for player i, X_{0,i} is the utility obtained at the disagreement point, and \alpha_i is player i's bargaining power. The bargaining powers of all players sum to 1. Supposing that the utility X in (5) is PSNR, an NBS can be interpreted as a weighted sum with each player's bargaining power \alpha_i as the weight, as shown in (6):

X^* = \arg\max_{X \in B} \sum_{i=1}^{N} \alpha_i \left( X_i - X_{0,i} \right)   (6)
If every player has the same bargaining power (i.e., \alpha_1 = \ldots = \alpha_N), the NBS gives priority to the player with the best rate-quality performance when allocating the resource. Hence, an NBS X^* computed from (6) maximizes the system utility. However, this resource allocation scheme does not guarantee fairness when R_{MAX} is scarce and/or the gap between the players in rate-quality performance is large.

2.2.2 Kalai-Smorodinsky Bargaining Solution (KSBS)

The axioms that characterize the KSBS are the same as those that characterize the NBS, except that Independence of Irrelevant Alternatives (Axiom 3) is replaced by the axiom of Individual Monotonicity. Individual monotonicity indicates that if the utility set favors a certain player, the KSBS yields a point on the Pareto optimal bargaining set such that the ratio of maximal gains for the favored player is maintained. A KSBS is the unique bargaining solution satisfying the equation below [2]:

\frac{X_{MAX,1} - X_1}{\alpha_1 (X_{MAX,1} - X_{0,1})} = \cdots = \frac{X_{MAX,N} - X_N}{\alpha_N (X_{MAX,N} - X_{0,N})}   (7)

where X_{MAX,i} is the maximal utility for player i within the range of the given resources R_{MAX}, and X_{0,i} is the disagreement point of player i. As shown in (7), the KSBS allocates the resources in such a way that every participating player incurs the same (weighted) quality penalty, i.e., the same decrease in video quality relative to its maximum achievable quality. Unlike the NBS, the KSBS can guarantee fairness among players when R_{MAX} is scarce.
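As a hedged numerical illustration (not part of the paper), the sketch below finds NBS and KSBS operating points by grid search for two users with hypothetical concave rate-quality curves and equal bargaining powers; all functions and numbers are assumptions:

```python
import math

RMAX = 1000.0                       # total rate budget (kbps), hypothetical
X0 = (0.0, 0.0)                     # disagreement utilities

def u1(r): return 10.0 * math.log10(1.0 + r)         # strong rate-quality curve
def u2(r): return 10.0 * math.log10(1.0 + 0.3 * r)   # weak rate-quality curve

def grid():
    r = 0.0
    while r <= RMAX:
        yield r
        r += 1.0

def nbs():
    """Maximize the Nash product of Eq. (5) with equal bargaining powers."""
    r1 = max(grid(), key=lambda r: (u1(r) - X0[0]) * (u2(RMAX - r) - X0[1]))
    return r1, RMAX - r1

def ksbs():
    """Solve Eq. (7): equalize each user's normalized utility loss."""
    xmax = (u1(RMAX), u2(RMAX))     # utility if a user received everything
    def gap(r):
        loss1 = (xmax[0] - u1(r)) / (xmax[0] - X0[0])
        loss2 = (xmax[1] - u2(RMAX - r)) / (xmax[1] - X0[1])
        return abs(loss1 - loss2)
    r1 = min(grid(), key=gap)
    return r1, RMAX - r1

print("NBS rate split :", nbs())    # maximizes the utility product
print("KSBS rate split:", ksbs())   # equalizes relative quality penalties
```

The grid search is only for transparency; with concave utilities either point could equally be found by a one-dimensional root finder.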
The disadvantage of the KSBS is that even if RMAX becomes abundant, system efficiency is decreased due to differences associated with the rate-quality performance (i.e., the users with high rate-quality performance and those with low rate-quality performance coexist).
3
Adaptive Bargaining Solution
As described earlier, the NBS and the KSBS yield bargaining solutions that differ in terms of system efficiency and fairness. If R_{MAX} and the users' rate-quality performance are constant, applying a single bargaining solution scheme (either the NBS or the KSBS) can provide consistent resource management performance. In actual environments, however, R_{MAX} and the users' rate-quality performance vary over time, so constantly applying one bargaining solution scheme may decrease the overall system efficiency or cause unfairness among users. This paper proposes a novel resource management scheme called the Adaptive Bargaining Solution (ABS), which sets and periodically updates a resource threshold, denoted R_{th}, according to the users' rate-quality performance. If R_{MAX} falls below the established threshold, the KSBS is applied to find a bargaining solution, which offers fairness to the users. If R_{MAX} exceeds the threshold, resources amounting to the threshold are first allocated to each user, and the NBS scheme is then applied to allocate the remaining R_{MAX} - R_{th} in order to increase system efficiency. In addition, we propose an adaptive bargaining power determination method that resolves the unfairness caused by variation of the available resources and by differences in rate-quality performance between users.

3.1
Adaptive Bargaining Solution
Based on the resource threshold R_{th}, the ABS divides the given resources R_{MAX} into two sections, denoted R_{KSBS} and R_{NBS}. The resource threshold with regard to each user (\Gamma_i) is computed using the rate-quality performance of the SVC layers, as shown below:

R_{th} = \sum_{i=1}^{N} \Gamma_i, \quad \Gamma_i = \sum_{l=0}^{L_{th,i}} r_{l,i}, \quad L_{th,i} = \arg\max_{l \in L} \left( \Delta Q_{l,i} - \Delta Q_{l+1,i} \right), \quad \Delta Q_{l,i} = \frac{X_{l+1,i} - X_{l,i}}{r_{l+1,i} - r_{l,i}}   (8)

L_{th,i} is the inflection point of the rate-quality curve, i.e., the layer at which the change of efficiency between layers is largest as the rate increases. If R_{MAX} \le R_{th}, the point of fairness can be negotiated using (7). When R_{MAX} > R_{th}, an amount \Gamma_i of resources is guaranteed to each user by setting the disagreement point of every user to \Gamma_i. The bargaining set over the remaining R_{NBS} = R_{MAX} - R_{th} is then defined to find the bargaining solution that satisfies (6). The process of the ABS is described in Algorithm 1.

3.2
Adaptive Bargaining Power Determination
In Equations (6) and (7), the bargaining power \alpha_i is used as a weight factor for each user. If every user has the same bargaining power and the condition
R_{MAX} < \sum_{i=1}^{N} U_i^{-1}\left( X_{MAX,i} \right)

is true, the bargaining solutions computed in (6) and (7) suffer from degraded resource management performance caused by differences in the SVC contents' rate-quality performance. For example, suppose that two videos encoded with the same coding parameter settings but having different amounts of motion (e.g., slow motion and fast motion: foreman and mobile) are delivered to end users. In this case, the resources are allocated such that the users receiving slow-motion sequences, which have higher rate-quality performance, always obtain better video quality than the users receiving fast-motion sequences. Hence, the bargaining powers should be adapted to the SVC rate-quality performance in order to provide utility-aware fairness:

\alpha_i = \frac{\lambda_i^{-1}}{\sum_{i=1}^{N} \lambda_i^{-1}}, \quad \lambda_i = \frac{X_{m,i} - X_{n,i}}{r_{m,i} - r_{n,i}}   (9)
where \lambda_i is the difference of rate-quality performance between the n-layer and the m-layer. Thus, (9) assigns higher bargaining powers to users whose quality improves less as the rate increases. The following pseudocode describes the resource allocation process of the proposed ABS.

Algorithm 1. Resource allocation of the proposed ABS

    Define R_th according to (8)
    R_KSBS = min{ R_th, R_MAX }
    Find the bargaining set B_KSBS according to (4)
    Find the KSBS X*_KSBS with (7)
    Find R*_KSBS,i = U_i^{-1}(X*_KSBS,i), for all i
    IF R_KSBS = R_MAX:
        Allocate R*_i = R*_KSBS,i, for all i
    ELSE:
        r_0,i = Γ_i, for all i
        R_NBS = R_MAX - R_th
        Find the bargaining set B_NBS according to (4)
        Find the NBS X*_NBS with (6)
        Find R*_NBS,i = U_i^{-1}(X*_NBS,i), for all i
        Allocate R*_i = R*_NBS,i, for all i
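A runnable sketch of Algorithm 1 under strong simplifying assumptions: the layer tables are hypothetical, the KSBS step is approximated by a Γ-proportional split, and the NBS step by an equal split of the surplus. These stand in for the full bargaining computations of Section 2.2:

```python
# Hypothetical per-user SVC layer tables: (cumulative quality [dB], cumulative rate [kbps])
LAYERS = {
    "user1": [(20, 35), (26, 95), (30, 215), (32, 455)],
    "user2": [(18, 73), (23, 180), (26, 420), (28, 900)],
}

def gamma(layers):
    """Per-user threshold Γ_i from Eq. (8): cumulative rate up to the layer
    where the slope ΔQ_l of the rate-quality curve drops the most."""
    dq = [(layers[l + 1][0] - layers[l][0]) / (layers[l + 1][1] - layers[l][1])
          for l in range(len(layers) - 1)]
    l_th = max(range(len(dq) - 1), key=lambda l: dq[l] - dq[l + 1])
    return layers[l_th + 1][1]

def abs_allocate(r_max):
    """Adaptive Bargaining Solution: fairness-oriented split below R_th,
    guaranteed Γ_i plus efficiency-oriented split of the surplus above it."""
    gammas = {u: gamma(ls) for u, ls in LAYERS.items()}
    r_th = sum(gammas.values())
    if r_max <= r_th:
        # scarce resources: KSBS-like proportional fairness w.r.t. Γ_i
        return {u: r_max * g / r_th for u, g in gammas.items()}
    # abundant resources: guarantee Γ_i, then an NBS-like split of the surplus
    surplus = (r_max - r_th) / len(gammas)
    return {u: g + surplus for u, g in gammas.items()}

print(abs_allocate(200))    # below the (hypothetical) R_th = 275 kbps
print(abs_allocate(1500))   # above R_th
```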
4
Simulation Parameters and Results
To evaluate the effectiveness of the proposed scheme, computer simulations were performed. We assume a multi-user (3 users) environment with different SVC contents, encoded by the SVC reference software JSVM ver. 9.13.1. The SVC contents used in the simulation are encoded with 2 spatial (QCIF and CIF
resolution), 5 temporal (1.875, 3.75, 7.5, 15, and 30 Hz), and 2 quality scalability levels. The applied coding parameters are identical. Table 1 shows the range of SVC content rate and quality used in the simulation, together with the quality, rate, and priority significance of the threshold layer L_th decided by (8).

Table 1. The range of SVC content rate and quality

Content | Min. Q [dB] | Max. Q [dB] | Min. R [Kbps] | Max. R [Kbps] | Q_th [dB] | Γ_th [Kbps] | ρ_th
Foreman | 19.63 | 35.21 | 35 | 1046 | 26.8 (46.1%) | 149 (11.3%) | 39.5%
Soccer | 15.54 | 33.17 | 36 | 1508 | 23.8 (42.8%) | 192 (10.6%) | 39.5%
Mobile | 15.10 | 32.45 | 73 | 2132 | 20.2 (20.7%) | 135 (3.01%) | 29.6%
[Figure: average PSNR [dB] (left axis) and fairness index (right axis) versus total rate [Kbps] for the NBS, KSBS, and ABS]

Fig. 2. Average PSNR and fairness index achieved by the NBS, KSBS, and ABS with the same bargaining powers (user1 = 1/3, user2 = 1/3, user3 = 1/3)
Jain's fairness index, introduced in [7], is used to determine whether users are receiving fair shares of the system resources, and system efficiency is evaluated by the average PSNR.
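Jain's index has the closed form (Σx_i)² / (N·Σx_i²); a small sketch (the allocation values are hypothetical):

```python
def jain_index(xs):
    """Jain's fairness index: (sum x)^2 / (N * sum x^2).
    Equals 1.0 for a perfectly equal allocation, 1/N in the worst case."""
    n = len(xs)
    return sum(xs) ** 2 / (n * sum(x * x for x in xs))

print(jain_index([25.0, 25.0, 25.0]))            # -> 1.0
print(round(jain_index([30.0, 25.0, 18.0]), 3))  # -> 0.961
```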
8
Y. Lee, J.-Y. Jung, and D.Y. Suh
As illustrated in Fig. 2, the NBS shows the highest average PSNR, which represents system efficiency, while the KSBS shows the highest fairness. In the R_MAX < R_th range, the proposed ABS (which uses the KSBS there) improves fairness by an average of 15% over the NBS at the cost of a 1.5 dB decline in average PSNR. In the R_MAX > R_th range, the NBS shows a 2.42 dB higher average PSNR and 12% lower fairness compared to the KSBS.
[Figure: average PSNR [dB] (left axis) and fairness index (right axis) versus total rate [Kbps] for the NBS, KSBS, and ABS]

Fig. 3. Average PSNR and fairness index achieved by the NBS, KSBS, and ABS with adaptive bargaining powers
Fig. 3 shows the results when bargaining powers determined from the total rate change and the SVC rate-quality change according to (9) are applied. Compared with applying identical bargaining powers in the R_MAX > R_th range, fairness increases by 4% and the average PSNR drops by 0.16 dB. Combining the ABS with adaptive bargaining powers, 93% of the fairness of the maximum-proportional-fairness solution (KSBS) is guaranteed in the region R_0 < R_MAX < R_th, where the sum of the significance and rate-quality values is highest (r_th ≈ 10%, Q_th ≈ 40%, ρ_th ≈ 39%). For R_th < R_MAX, 98% of the average PSNR of the maximum-efficiency solution (NBS) is maintained while fairness is improved by 4%.
5
Conclusion
This paper proposed the Adaptive Bargaining Solution (ABS), which applies two bargaining solutions (the NBS and the KSBS) for appropriate purposes, adaptively to the amount
of available network resource. If the available resource is insufficient, the ABS applies the KSBS, which pursues proportional fairness for every user. If the available resource is abundant, it first allocates the resource threshold to every user and then applies the NBS, which pursues maximization of the total system utility. This approach enables fair and efficient use of time-varying channel resources. The unfairness due to image characteristics is also reduced by determining the bargaining powers from the inter-layer rate-quality performance and the rate change.

Acknowledgement. This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2011-(C1090-1111-0001)).
References

1. Nash, J.F.: The bargaining problem. Econometrica 18, 155–162 (1950)
2. Kalai, E., Smorodinsky, M.: Other solutions to Nash's bargaining problem. Econometrica 43, 513–518 (1975)
3. Park, H., van der Schaar, M.: Bargaining Strategies for Networked Multimedia Resource Management. IEEE Transactions on Signal Processing 55, 3496–3511 (2007)
4. Guan, Z., Yuan, D., Zhang, H.: Novel Coopetition Paradigm Based on Bargaining Theory for Collaborative Multimedia Resource Management. In: Proceedings of PIMRC, pp. 1–5 (September 2008)
5. Advanced Video Coding for Generic Audiovisual Services, ITU-T Rec. H.264 and ISO/IEC 14496-10 (MPEG-4 AVC), ITU-T and ISO/IEC JTC 1, Version 8 (including SVC extension), consented in July 2007
6. Xiao, S., Wang, H., Kuo, C.-C.J.: Priority Ordering and Packetization for Scalable Video Multicast with Network Coding. In: Ip, H.H.-S., Au, O.C., Leung, H., Sun, M.-T., Ma, W.-Y., Hu, S.-M. (eds.) PCM 2007. LNCS, vol. 4810, pp. 520–529. Springer, Heidelberg (2007)
7. Jain, R., Durresi, A., Babic, G.: Throughput fairness index: an explanation. ATM Forum Document Number: ATM Forum/990045 (February 1999)
Improved Resizing MPEG-2 Video Transcoding Method

Sung Pil Ryu1, Nae Joung Kwak1, Dong Jin Kwon2, and Jae-Hyeong Ahn1

1 52 Naesudong-ro, Heungdeok-gu, Cheongju, Chungbuk 361-763, Korea
2 #49-3 Myeonmok-dong, Seoildaehak-gil-22, Jungnang-gu, Seoul 131-702, Korea
[email protected],
[email protected],
[email protected],
[email protected] 2
Abstract. This paper proposes a transcoding technique to reduce the resolution of an MPEG-2 video stream for small wireless communication terminals. The proposed method first extracts the motion vectors, the macroblock mode information, and the DCT coefficients from the original stream, determines the mode of each macroblock by analyzing this information, and then re-calculates the new macroblock information, processing it either in the DCT domain or in the spatial domain. The proposed method reduces the computational complexity, making it apt for real-time processing, and reduces image degradation. Keywords: Transcoding, MPEG-2, Motion Vector, Estimation, Resolution Reduction.
1
Introduction
With the advances in wired/wireless communication and the miniaturization of mobile terminals, services that were previously available only on terminals in a wired communication environment are now provided on small mobile wireless terminals. Video services for small wireless terminals are provided in two ways: the scalable encoding method, which aims to overcome the fact that transmission methods, storage media, terminal types, and performance all differ; and the method of placing an appropriate transcoder between the encoder and the decoder to provide services adaptively according to the network type, the decoding terminal, and its requirements. The scalable encoding method separates the video stream into several layers according to their importance, and the decoder selectively receives layer-encoded data according to the network type, the bandwidth, and its own ability. However, this method has a shortcoming: it cannot serve video streams that were not scalably encoded. To overcome this shortcoming, the transcoding method has been researched and used. Video transcoding is a technique that enables encoding of video content in different sizes, different bit rates, and different standards for various terminal devices, re-encoding the video after decoding in order to convert the features of the encoded video. Such transcoding is divided into homogeneous and heterogeneous
Improved Resizing MPEG-2 Video Transcoding Method
11
transcoding. Heterogeneous transcoding performs conversion between different video coding standards [3]. Homogeneous transcoding converts bit-streams within the same standard and includes frame-rate adjustment [5], adaptive quantization [4], and resolution conversion [6]. The spatial resolution conversion method adapts the encoded data to fit the screen sizes of various terminals. In this process, a new motion vector is obtained for each inter macroblock of the size-converted video. In the straightforward approach, the motion vector is obtained by fully decoding the encoded video stream, resizing it in the spatial domain, and then re-estimating the motion. This provides high-quality images but is inefficient, as it requires many calculations and long processing times. More efficient methods [7][8] do not decode the encoded video stream but reuse the original video's motion vectors in the DCT domain. These methods degrade the image quality but reduce the computational complexity. Therefore, hybrid methods have been studied that reduce image degradation while reusing the motion vectors of the original images in the DCT domain.
2
Hybrid Resizing Transcoding
Resizing transcoding, which converts the resolution, is the process of producing a new video stream of the desired size from an already encoded video stream. There are three kinds of resizing methods: the cascade transcoding method, the DCT domain resizing (DDR) method [1], and the hybrid resizing transcoding method. The cascade transcoding method completely decodes the encoded video stream, executes the resizing in the spatial domain, and then re-encodes. The DDR method, as shown in Fig. 1, re-encodes the pre-encoded video stream using a conversion matrix in the DCT domain, using the stream's data without completely decoding it. The hybrid resizing transcoding method efficiently combines the cascade transcoding method and the DDR method.
Fig. 1. Transcoder in DCT Domain
One representative way of adjusting the size of a video stream is reducing it by 1/2, which combines four macroblocks into one. The method therefore has to determine the mode of the macroblocks to be combined and re-estimate the motion vector according to that mode. Consequently, the macroblock mode and the motion vector re-estimation method have a remarkable
impact on the quality of the half-resolution images transcoded by referencing the four macroblocks of the original-resolution images. A typical motion vector re-estimation method is the weighted average method, which uses the spatial activity of the macroblocks of the input video as weights on the motion vectors and averages them to determine the motion vector of the half-sized video. This method produces better image quality than the simple average or median methods [2], but it has the weakness of requiring additional operations on the AC coefficient values when calculating the weights.
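The weighted average can be sketched as follows, treating the activity values as hypothetical AC-energy measures; this is an illustration of the idea, not the exact weighting of [2]:

```python
def weighted_average_mv(mvs, activities):
    """Combine the motion vectors of 4 co-located macroblocks into one
    half-resolution vector: weight by spatial activity, then halve the
    magnitude to match the reduced picture size."""
    total = sum(activities)
    wx = sum(a * mv[0] for mv, a in zip(mvs, activities)) / total
    wy = sum(a * mv[1] for mv, a in zip(mvs, activities)) / total
    return (wx / 2.0, wy / 2.0)

# hypothetical vectors (x, y) and AC-energy activities of the 4 blocks
mvs = [(4, 2), (6, 2), (4, 0), (2, 0)]
act = [10.0, 30.0, 10.0, 10.0]
print(weighted_average_mv(mvs, act))
```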
3
Improved Motion Vector Re-estimation Method
This paper proposes a transcoding method that selects the optimal mode for the macroblocks of the video stream and re-estimates the motion vector so as to minimize its error when the size of the input video stream is reduced by 1/2. The proposed method, as shown in Fig. 2, first extracts the motion vectors, the macroblock mode information, and the DCT coefficients from the input video stream. It then determines the macroblock mode by analyzing this information and re-calculates the new macroblock information.
Fig. 2. Proposed resizing transcoder
Table 1 shows the conditions used to determine the encoding mode and the encoding modes chosen for them. The proposed method determines the new encoding mode of the resized video stream (right column of Table 1) by considering such encoding information of the four macroblocks of the input image (left column of Table 1) as their encoding modes and the magnitudes and directions of their motion vectors.
Table 1. The proposed encoding mode
Four macroblocks of the input image | Encoding mode
All INTRA | DDR (DCT Domain Resizing)
All zero motion vectors | DDR
All motion vectors in the same direction | DDR
Two or more macroblocks INTRA | Modify INTRA + DDR
All SKIP mode | SKIP
Other cases | Mode conversion, Modify ME
If the four adjacent macroblocks of the input video stream are all in the intra mode, if their motion vectors are all zero, or if the directions of their motion vectors are the same, the macroblock mode is the same as that of the input image and the encoding mode is set to DDR. If two or more of the four macroblocks are in the intra mode, the macroblock after size-adjustment transcoding tends to be converted into the intra mode; so the macroblocks not in the intra mode are converted into the intra mode, and the encoding mode is set to DDR without an ME (Motion Estimation) process. When macroblocks are converted into the intra mode, IDCT and DCT operations are necessary; however, many macroblocks are converted into the intra mode and the ME process can be omitted, so the amount of calculation is reduced. If only one of the four macroblocks is in the intra mode, the three inter macroblocks are similar to the macroblock after the size-adjustment transcoding; so the mode of the intra macroblock is changed in consideration of the adjacent values, and it is encoded as an inter macroblock. After the encoding mode is determined, if the mean or median method of motion vector re-estimation were applied equally to macroblocks with different modes (intra, inter, and mixed), the error of the motion vector of the size-reduced macroblock would increase and deteriorate the image quality. Therefore, to reduce image deterioration, this paper proposes an efficient re-estimation method that uses the information of each macroblock. The Motion Vector Re-Estimation (MVRE) proposed in this paper is chosen from among the average value, the median value, and the MME (Modified Motion Estimate), based on two thresholds T_1 and T_2 determined by statistical methods:
MVRE = \begin{cases} \text{Average}, & \text{if } VoM < T_1 \ (\text{LOW}) \\ \text{Median}, & \text{if } T_1 \le VoM \le T_2 \ (\text{MEDIUM}) \\ \text{MME}, & \text{if } VoM > T_2 \ (\text{HIGH}) \end{cases}   (1)

Here T_1 < T_2 holds. VoM (Variation of Motion) is an estimate of the amount of motion and is computed by the following equation:
VoM_i = \sum_{k=1}^{4} (mv_i)_k   (2)

where i is the direction (0°, 90°, 180°, 270°, 315°), k is the index of a macroblock of the original image, and mv is a motion vector of the original image. If the VoM is small (less than T_1 in (1)), the motion directions and sizes are stable, so the MVRE of a new macroblock is set to the average of the motion vectors of the four macroblocks of the original image. In the MEDIUM region (VoM between T_1 and T_2), there are small differences among the four motion vectors, so the MVRE is set to the median of the macroblocks' motion vector values. In the HIGH region (VoM larger than T_2), the motions of the four macroblocks are all large or partially large; in this case, if the MVRE were set to the average or median value, large motions of some blocks could be lost. Therefore, the proposed method reduces the error by calculating the MVRE with a new Modified Motion Estimation (MME) process. The MME re-searches the motion vector within a ±2 search range rather than performing a full search; this sufficiently reduces the complexity of the re-search while guaranteeing more than 94% of the image quality, so this range was selected for obtaining the motion vector. T_1 and T_2, the two thresholds of the MVRE in (1), were determined statistically by transcoding multiple CIF-resolution videos to QCIF size and considering the distribution of the motion vectors. First, a half-resized video is made by the cascade transcoding method.
Three further half-resized videos are then made by applying each rule of the proposed MVRE (average, median, and MME) to the original image: the motion vector of every macroblock of the first is set to the average of the adjacent motion vectors of the corresponding macroblocks, that of the second to their median value, and that of the third to the MME value. The motion vectors of the half-resized image produced by the cascade transcoding method are compared with those of the three candidate images, and the threshold values are determined from the statistics of the comparison accuracy of the three methods, with the cascade result as the reference. To increase the generality of the experiment, more than 20 CIF-size video sequences were used. As a result, the two threshold values were determined as T_1 = 5 and T_2 = 17.
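A hedged sketch of the selection rule (1) (not the paper's implementation): VoM is approximated here by the summed magnitudes of the four vectors rather than the per-direction sums of (2), and the ±2 MME re-search, which needs pixel data, is stubbed out:

```python
T1, T2 = 5, 17  # thresholds determined statistically in the paper

def mvre(mvs, mme_search=None):
    """Choose the re-estimated half-resolution motion vector per Eq. (1),
    based on the variation of motion (VoM) of the four original vectors."""
    vom = sum(abs(x) + abs(y) for x, y in mvs)   # simplified VoM measure
    xs = sorted(mv[0] for mv in mvs)
    ys = sorted(mv[1] for mv in mvs)
    if vom < T1:        # LOW: stable motion -> average
        return (sum(xs) / 4.0, sum(ys) / 4.0)
    if vom <= T2:       # MEDIUM: small spread -> per-component median
        return ((xs[1] + xs[2]) / 2.0, (ys[1] + ys[2]) / 2.0)
    # HIGH: large or partial motion -> MME re-search (a ±2 window around a
    # candidate in a real implementation); stubbed with a caller-provided hook
    return mme_search(mvs) if mme_search else (xs[-1], ys[-1])

print(mvre([(0, 0), (1, 0), (0, 1), (1, 1)]))   # LOW -> (0.5, 0.5)
print(mvre([(2, 1), (3, 2), (2, 2), (3, 1)]))   # MEDIUM -> (2.5, 1.5)
```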
4
Experiment Results and Analysis
The experiment was performed using the Visual Studio 2008 compiler on a computer with 4 GB of RAM and an Intel P9500 CPU. To analyze the results of the proposed method
and the existing method, we obtain the motion vectors of the half-sized image using the cascade transcoding method and compare their similarity with the motion vectors produced by each method (the existing method and the proposed method). We use 100 frames each from the foreman, coastguard, flower, hall monitor, mobile, and news sequences at CIF size (N=15, M=1). Fig. 3 shows, for the flower sequence, the accuracy of the motion vectors of the existing method [2] and of the proposed method relative to the motion vectors obtained by the cascade transcoding method. The accuracy with respect to the motion vectors of the cascade transcoder was 88.05% for the existing method and 92.5% for the proposed method.
Fig. 3. Comparison of the accuracy of the existing and proposed methods, based on the motion vectors of the cascade transcoding method

Table 2. Comparison of PSNR of the video streams
Streams | Cascade transcoding [dB] | Existing method [dB] | Proposed method [dB]
foreman | 29.93 | 29.54 | 29.74
coastguard | 26.59 | 26.08 | 26.31
flower | 21.97 | 18.31 | 18.64
hall monitor | 29.19 | 28.76 | 28.98
mobile | 22.81 | 20.82 | 21.10
news | 33.25 | 32.97 | 33.21
We also compare the PSNR of each method. Table 2 gives the average PSNR obtained by applying each method to the test images. The proposed method is closer to the PSNR of the cascade transcoding method, by 0.20–0.33 dB.
(a) Existing method
(b) Proposed method
Fig. 4. 58th frame of the foreman stream
(a) Existing method
(b) Proposed method
Fig. 5. 33rd frame of the flower stream
Figure 4 shows the 58th frame of the foreman sequence; the partially enlarged picture shows that the lip boundary is rendered more finely. Figure 5 shows the 33rd frame of the flower sequence; the boundary of the windmill's blades is represented better. These results indicate that the proposed method is more effective than the existing method in re-estimating motion vectors for motion areas.
5
Conclusion
This paper proposed a transcoding method for resizing MPEG-2 video for small mobile terminals. The proposed method sets the macroblock modes of the resized stream according to the macroblocks of the encoded MPEG stream and decides whether encoding proceeds in the DCT domain or the spatial domain depending on the decided modes. It also analyzes the features of the motion vectors of the encoded video and re-estimates each motion vector using the average value, the median value, or the MME method, selecting the appropriate one according to those features. Comparison with the existing method showed that the accuracy of the motion vectors was enhanced by 4.47%; thus, it was verified that improving the re-estimation process enhances both the image quality and the execution speed. Furthermore, in PSNR there was little difference from the cascade transcoder, and a gain of 0.27 dB was shown over the existing method.
References

1. Dugad, R., Ahuja, N.: A Fast Scheme for Image Size Change in the Compressed Domain. IEEE Trans. CSVT 11(4), 461–474 (2001)
2. Shen, B., Sethi, I.K., Vasudev, B.: Adaptive motion-vector resampling for compressed video down scaling. IEEE Trans. Circuits Syst. Video Technol. 9(6), 926–936 (1999)
3. Chae, B.J., Oh, S.J., Chung, K.: An MPEG-2 to MPEG-4 Video Transcoder. In: Proc. ITC-CSCC 2003, Kang-Won Do, Korea, vol. 2, pp. 914–916 (2003)
4. Werner, O.: Requantization for Transcoding of MPEG-2 Intraframes. IEEE Trans. on Image Processing 8(2) (1999)
5. Shanableh, T., Ghanbari, M.: Heterogeneous Video Transcoding to Lower Spatio-Temporal Resolutions and Different Encoding Formats. IEEE Trans. Multimedia 2(2), 101–110 (2000)
6. Merhav, N., Bhaskaran, V.: Fast Algorithms for DCT-Domain Image Down-Sampling and for Inverse Motion Compensation. IEEE Trans. CSVT 7(3), 468–476 (1997)
7. Yim, C., Isnardi, M.A.: An Efficient Method for DCT-Domain Image Resizing with Mixed Field/Frame-Mode Macroblocks. IEEE Trans. CSVT 9(5), 696–700 (1999)
8. Chang, S.-F., Messerschmitt, D.G.: Manipulation and compositing of MC-DCT compressed video. IEEE J. Select. Areas Commun. 13(1) (1995)
Distributed Formation Control for Communication Relay with Positionless Flying Agents Kiwon Yeom Human Systems Integration Division, NASA Ames Research Center San Jose State University Research Foundation Moffett Field, CA 94035, USA
[email protected]
Abstract. Distributed formation of a swarm with no coordinated agreement or positioning information is an interesting research area. We apply this principle to the development of flying-agent-based ad-hoc wireless communication networks for finding ground users in disaster areas. We describe a decentralized control algorithm for coordinating a swarm of identical flying agents to spatially self-organize into arbitrary shapes using local communication while maintaining a certain level of density. The proposed approach generates a shared coordinate system among flying agents, which continuously perform local trilateration, and achieves pre-defined shape formation by letting agents scatter within the defined 2D shape using virtual pheromones while maintaining their communication pathways.

Keywords: distributed formation, swarm, flying agent, self-organization.
1 Introduction

Nature has shown that complex collective behaviors can arise from very simple interactions among large numbers of relatively unintelligent agents [1]. For example, schools of fish swim in unison and can execute large-scale collective maneuvers to avoid a predator. Termite colonies build very large and complex nests. Ants collectively search a very large area and are capable of returning food to the nest [2]. In these examples, there is no central leader with all the information making decisions for each individual. This non-supervised behavior is a central aspect of distributed systems.

In this paper, we focus on a control algorithm in which flying agents self-organize and self-sustain arbitrary 2D formations. Keeping a specific formation of flying agents is important for many real-world tasks, especially when individual agents have limited abilities or the task requires global action (see Fig. 1). For example, flying agents may aggregate for a coordinated search for survivors in a disaster area. Imagine a large group of small unmanned autonomous aerial vehicles that can fly with the agility of a flock of starlings in a city square, or a donut-shaped swarm avoiding many obstacles [3]. We propose a decentralized formation algorithm that can not only accomplish arbitrary shapes by self-organization but also produces global shapes that are highly robust to the loss of agents. In addition, it can

T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 18–27, 2011. © Springer-Verlag Berlin Heidelberg 2011
Fig. 1. Artistic view of the use of a swarm of UAVs for establishing communication networks between users located on the ground
compensate for practical hardware limitations such as sensor and movement error. We assume that our flying agents are equipped with imperfect proprioceptive sensors and a short-range wireless communication device with which agents can exchange information only with nearby neighbors.

Briefly, our algorithm works as follows. First, flying agents initially wander with no information about their own coordinates or their environment; however, they have programmed internal knowledge of the desired shape to be formed. Next, a small seed group of agents is initially located in the shape. As non-seed agents move, they continually perform local trilaterations to establish a common coordinate system among flying agents, and they maintain this learned coordinate system through continuous local communication. At the same time, agents maintain a certain density level among themselves using pheromones and flocking-rule-based distance measurements [4]. This enables flying agents to disperse within the specified shape and fill it efficiently.

This approach makes several salient contributions. It only requires agents to have local communication ability, an approximate measure of relative distance, an approximate measure of relative motion, and a magnetic compass. Technically, our system can distribute agents throughout a specified shape, not merely place them at particular positions on the two-dimensional plane, so agents can easily aggregate into arbitrary pre-defined shapes. Because the proposed algorithm proceeds by synchronizing every agent's coordinate system, it enables agents to form many arbitrarily connected formations while maintaining a certain density, regardless of map size, the number of agents, and obstacles. We show through simulation experiments not only that flying agents can easily aggregate into arbitrary user-defined shapes but also that the formed shape is robust to varying numbers of agents.
2 Related Work

Several projects aim at getting UAVs to fly in formation, usually under remote but high-level control [5]. This type of project therefore differs from the biologically inspired flexibility and responsiveness of flocking pursued within a swarm [6], although many of the required technologies are similar. The MinuteMan project at UCLA builds
a reconfigurable architecture for highly mobile multi-agent systems [7]. The intention is that computationally capable autonomous vehicles will be able to share information across a wireless fault-tolerant network. A study of formation flying was undertaken at MIT within the autonomous blimps project [8]. The University of the West of England developed the flying flock project, which differs slightly from the previous work [9] in that it is conceived with a minimalist approach.

Currently, UAVs are designed to achieve tasks such as the surveillance of an area of interest or searching for targets and subsequently destroying, tracking or sensing them [10]. Other possible applications include environmental monitoring, more specifically toxic plume characterization or forest fire detection, and the deployment of mobile communication networks. Several map-based UAV applications are proposed in [11] and [12]. In map-based applications, UAVs know their absolute position, which can be shared locally or globally within the swarm. Each agent then decides where to navigate based on its interpretation of the map. UAVs can deposit and sense virtual pheromones, location information visited by robots over time, or areas of interest in the environment.

Obtaining and maintaining relative or global position information is challenging for UAVs or mobile robot systems. A possible approach is to adopt a global positioning system (GPS). However, GPS is not always reliable and rarely works in cluttered areas. Alternatively, wireless technologies can be used to estimate the range or angle between agents of the swarm. In this case, beacon agents can serve as reference positions for the other moving agents.
Off-the-shelf sensors such as cameras, laser range finders, radars, ultrasound and infrared sensors are capable of providing relative positioning, but this equipment is typically expensive and heavy, and hence incompatible with the scalable nature of swarms composed of large numbers of simple and inexpensive aerial robots. Our system attempts to achieve arbitrary connected formations using a decentralized local coordinate system of agents together with relative-distance and density models.
3 Flying Agent Model

We assume a simple aerial robot that moves and turns in continuous space, motivated by the capabilities of real autonomous UAVs. Each robot has simple equipment such as distance and obstacle-detection sensors, a magnetic compass, and wireless communication (see Table 1). We assume that agents move in 2D continuous space, that all flying agents execute the same program, and that agents can interact only with other nearby agents through distance measurement and message exchange. Each agent has a magnetic compass for directional

Table 1. Flying agent model

Distance sensor: provides an estimated distance to each neighbor
Detection sensor: detects obstacles in direct proximity to the robot
Magnetic compass: provides directional orientation
Wireless comm.: allows agents to communicate with each other
Locomotion: moves agents in the world, but with error
Internal shape map: specified by the user as a target shape
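The agent model of Table 1 can be sketched as a small data structure. The field names, the uniform noise model, and the numeric defaults below are our illustrative assumptions, not values from the paper (the 100 m range echoes the 802.11b assumption of Sect. 3):

```python
import math
import random
from dataclasses import dataclass

@dataclass
class FlyingAgent:
    """Minimal sketch of the agent model in Table 1 (names are ours)."""
    x: float
    y: float
    heading: float = 0.0          # orientation from the magnetic compass, radians
    sensor_noise: float = 0.05    # relative error of the distance sensor (assumed)
    comm_range: float = 100.0     # 802.11b-like communication range, metres

    def sense_distance(self, other: "FlyingAgent") -> float:
        """Noisy estimate of the distance to a neighbor; NaN if out of range."""
        d = math.hypot(other.x - self.x, other.y - self.y)
        if d > self.comm_range:
            return float("nan")
        return d * (1.0 + random.uniform(-self.sensor_noise, self.sensor_noise))
```

The noisy `sense_distance` is what feeds the trilateration and density estimates described in Sect. 4.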
Fig. 2. (a) Agent model inspired by the capabilities of real UAVs. (b) An example of UAV hardware
recognition, but both distance measurement and movement are subject to error. To simplify the handling of agent trajectories, we assume that the simulation world is finite and that agents that wander off one side reappear on the other side.

The agents' communication architecture is based on a simple IR ring architecture, because we assume that agents can interact only with nearby neighbors. The robots have omnidirectional transmission and directional reception: when a robot receives a transmission, it knows roughly which direction the transmission came from (see Fig. 2(a)). An example of such communication hardware is described in [13] (see Fig. 2(b)).

The agent's dynamic model is implemented as a first-order flight model for a simple, low-cost airframe. We assume that our UAVs can fly at a speed of approximately 10 m/s and are able to hover or make sharp turns, as exemplified in Figure 2(b). The minimum turn radius of the UAVs is assumed to be as small as possible with respect to the communication range.

A realistic communication model is essential for credibility because of the real-life challenges posed by highly dynamic systems, signal propagation uncertainties, and network topologies prone to packet collisions. While most current robots have simplified communication models, we assume that our UAVs use wireless communication based on the IEEE 802.11b specification, allowing a communication range of around 100 m. This medium was chosen because, in most potential scenarios, ground users can use the wireless communication devices embedded in laptops, smart phones, PDAs, etc.
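As a toy illustration of the finite wrap-around world and the 10 m/s first-order motion model, one update step might look like this (the 150x150 world size follows the experiments of Sect. 5; the time step is arbitrary):

```python
import math

WORLD = 150.0   # finite square world; agents re-enter on the opposite side
SPEED = 10.0    # cruise speed of the UAV model, m/s

def move(x, y, heading, dt):
    """One first-order motion step with toroidal wrap-around."""
    x = (x + SPEED * dt * math.cos(heading)) % WORLD
    y = (y + SPEED * dt * math.sin(heading)) % WORLD
    return x, y
```

For example, an agent at x = 149 flying east for one second reappears near x = 9 on the opposite side.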
4 Self-organizing Formation Algorithm

Each flying agent has a shared map of the shape to be constructed, which is overlaid on the agent's learned coordinate system. Initially, flying agents are randomly scattered into the simulation world without any information about the environment. Then agents begin to execute their programmed process to decide their position, using only data from their proximity sensors (i.e., distance and density) and their wireless communication link with nearby neighbors. Agents are simulated in an asynchronous and autonomous manner, with finite time required for the calculation of both position and movement. In our model, agents have a simple process cycle, as shown in Fig. 3(a). The second sense step is necessary because agents must compare the data before and after movement to separate distance and orientation from positioning error. In a
Fig. 3. (a) Agent’s process cycle. (b) Agent’s trilateration.
more realistic scenario, agents would move varying distances over time because of both distance-measurement and movement errors. The positioning process relies largely on the ability of agents to estimate the magnitude of their motions.

In this model, agents have three computational states: lost, out of shape, and in shape. Initially, agents are in the lost state because there is no given coordinate system. An agent in the lost state will randomly wander through the world until it senses three nearby neighbors that have a coordinate system. When this condition is satisfied, the lost agent tries to trilaterate its position by comparing the neighbors' distances. In the next subsection we describe this trilateration process in more detail.

4.1 Local Trilateration

Trilateration is the process of determining absolute or relative locations by measuring distances, using the geometry of circles or triangles. In two-dimensional geometry, when a point is known to lie on two curves, such as the boundaries of two circles, the circle centers and the two radii provide sufficient information to narrow the possible locations down to two [15]. Trilateration allows an agent to find its perceived position (x_p, y_p) in the connected coordinate system (see Fig. 3(b)); it is also used subsequently to adjust its position. In this work, the trilateration process occurs only if there are at least three neighbors that are not in the lost state. An agent uses its distance sensor to estimate its distance to each neighbor agent and requests their learned coordinates by wireless communication.

Let the positions of the three fixed anchors be defined by the vectors x_1, x_2, x_3 ∈ R². Further, let x_p ∈ R² be the position vector to be determined. Consider three circles, centered at each anchor, with radii d_i equal to the distances between x_p and each anchor x_i.
These geometric constraints can be expressed by the following system of equations:

||x_p − x_1||^2 = d_1^2   (1)
||x_p − x_2||^2 = d_2^2   (2)
||x_p − x_3||^2 = d_3^2   (3)
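With noise-free distances, the system (1)–(3) can be solved directly: subtracting (1) from (2) and from (3) cancels the quadratic terms in x_p and leaves a 2x2 linear system. The function below is our sketch of this standard manipulation, not code from the paper:

```python
def trilaterate(anchors, dists):
    """Solve Eqs. (1)-(3) for x_p = (x, y). Subtracting pairs of the three
    circle equations yields a linear system, solvable by Cramer's rule."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = dists
    # (1)-(2): 2(x2-x1)x + 2(y2-y1)y = d1^2 - d2^2 + x2^2 - x1^2 + y2^2 - y1^2
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    # (1)-(3): analogous equation with the third anchor
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21   # non-zero iff the anchors are not collinear
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a21 * b1) / det)
```

This closed form assumes exact ranges; the noisy case is handled by the best-fit formulation that follows.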
Generally, the best fit for x_p can be regarded as the point that minimizes the difference between the estimated distances ζ_i and the distances calculated from x_p = (x_p, y_p) to the coordinates reported by the neighbors. That is,

argmin_{(x_p, y_p)} Σ_i | ||x_i − x_p|| − ζ_i |.   (4)
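A local minimizer of (4) can be found with a few gradient steps. The sketch below uses squared range residuals as a smooth stand-in for the absolute values in (4) so the gradient is defined everywhere; the step size and iteration count are illustrative assumptions:

```python
import math

def refine_position(p, anchors, dists, alpha=0.05, steps=500):
    """Gradient-descent refinement of a position estimate, in the spirit of
    the update rule w := w - alpha * grad Q(w). Minimizes sum_i (r_i - d_i)^2,
    where r_i is the current distance to anchor i and d_i the measured range."""
    x, y = p
    for _ in range(steps):
        gx = gy = 0.0
        for (ax, ay), d in zip(anchors, dists):
            r = math.hypot(x - ax, y - ay)
            if r == 0.0:
                continue  # gradient undefined exactly at an anchor
            # d/dx of (r - d)^2 is 2*(r - d)*(x - ax)/r, likewise for y
            gx += 2.0 * (r - d) * (x - ax) / r
            gy += 2.0 * (r - d) * (y - ay) / r
        x, y = x - alpha * gx, y - alpha * gy
    return x, y
```

Only a local minimum is sought, matching the paper's preference for cheap computation over a global search.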
This shows that the problem is related to the sum-minimization problems that arise in least-squares and maximum-likelihood estimation. However, in this paper we do not seek an optimal or global solution but only a local minimum, because a global search requires substantial computational resources and is not suitable for a small, inexpensive device. For simplicity, formula (4) can be rewritten in the form of a sum:

Q(w) = Σ_{i=1}^{n} Q_i(w)   (5)
where the parameter w is to be estimated and where typically each summand function Q_i(·) is associated with the i-th observation in the data set. We perform Eq. (6) to minimize the above function:

w := w − α∇Q(w) = w − α Σ_{i=1}^{n} ∇Q_i(w)   (6)
where α is a step size.

4.2 Flocking Movement Control

As described in the previous section, an agent has three states: lost, out of shape, and in shape. Agents adopt different movement patterns according to their states. An agent in the lost state is assumed to be located outside the shape or to be in the initial simulation state. Agents outside the shape wander randomly to find their way into the shape. Once inside the shape, agents are considered part of the swarm that comprises it. Having acquired a coincident coordinate system inside the shape, they avoid any steps that would place them outside of it.

Agents then attempt to fill the formation shape. In this work, we achieve this control by modeling virtual pheromones in a closed container. Agents react to the different densities of neighbors around them, moving away from areas of high density towards those of low density [5]. Over time, they settle into an equilibrium state of constant density throughout the shape. This mechanism is inspired by Reynolds' flocking model and the pheromones of ants [14]. It is a very reasonable consideration when deploying real aerial vehicles, which have physical hardware limitations such as a short-range wireless link. If new agents are flooded into the swarm world, the density level quickly increases and the agents adjust their positions to maintain the density until they reach equilibrium again. Neighboring agents inside the shape that are closer than the Repel distance (see Fig. 4(a)) repulse each other, leading to a uniform average density of agents throughout the shape.
Fig. 4. (a) Pheromone robot’s influence ranges. (b) Examples of formations
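The density-regulating repulsion of Sect. 4.2 can be sketched as a simple position update; the Repel radius and gain below are illustrative values, not the paper's:

```python
import math

REPEL = 8.0  # repulsion radius, cf. Fig. 4(a); the value here is assumed

def repulsion_step(agent, neighbors, gain=0.5):
    """Move away from neighbors closer than REPEL, so that the swarm relaxes
    toward uniform density inside the shape (sketch of Sect. 4.2)."""
    ax, ay = agent
    dx = dy = 0.0
    for nx, ny in neighbors:
        d = math.hypot(ax - nx, ay - ny)
        if 0.0 < d < REPEL:
            push = (REPEL - d) / d   # push grows as the neighbor gets closer
            dx += (ax - nx) * push
            dy += (ay - ny) * push
    return ax + gain * dx, ay + gain * dy
```

Agents outside each other's Repel range do not interact, so the update is purely local, consistent with the short-range communication assumption.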
5 Experimental Results

The scenario consists of a swarm of UAVs forming shapes while maintaining a wireless connection and avoiding obstacles. This is based on a real-world situation: when an earthquake occurs and many buildings are destroyed, it is very difficult to reach certain positions, and there may be secondary dangers such as further building collapse. Avoiding obstacles is therefore a very important issue for gathering information in a disaster area. We show that our algorithm can form arbitrary shapes without any human intervention or frequent modification of agents, and that it restores the formation after agent death or damage, because each agent forms (from any starting configuration) and holds a swarm in a class of shapes.

Fig. 4(b) shows several formation examples made by flying agents, and also shows that the same shape can be formed with the different density levels that agents can accommodate. In this experiment, we set the initial density level of agents to 16 neighbors in the target shapes. As shown in Fig. 4(b), at any density our virtual pheromone model causes flying agents to disperse evenly throughout user-specified shapes.

Flying agents run the distributed algorithm to assume a circle shape (see Fig. 5(a)). Several seed agents (bright colored) serve as the circle center. At each step, all the other agents sense their positions and move along the direction of the circle shape. Eventually, agents outside the intended circle radius collapse toward it.

We consider a formation similar to a ring network architecture. In particular, we imagine a difficult terrain with large obstacles, in which agents must establish an emergency communication network between multiple survivors located on the ground and a rescue team (see Fig. 1). In this case, UAVs can fly over a difficult area, such as flooded or collapsed terrain or building debris, and could replace damaged, nonexistent or congested networks.
Our endeavor is motivated by this scenario. As shown in Fig. 5(a), our algorithm is well adapted to making ring architectures.

We consider a more complicated situation with many obstructions, in which several sets of agents are destroyed during connection. The separate swarm groups should connect to each other to share information about the task area. The gray circle in Fig. 5(c) shows the disconnection from the inner circle. Agents try to connect to the outside via a network bridge, which we assume they can find. While moving, several groups die, and the remaining agents must connect to each other to avoid the
Fig. 5. (a) Different stages of the circle formation. (b) Ring formation. (c) Connection to outside swarm group. (d) Self-repairing the shape. (e) Percentage of agents in the shape with different measurement of distance. (f) Average coordinate variance under movement and sensing errors.
debris area. As shown in Fig. 5(c), the two agent groups connect well in spite of some damage. It is worth noting that we do not apply self-repair in this case.

Whatever the shape being formed, it is of fundamental importance to preserve and maintain it. Here we describe experiments aimed at testing the ability to recover from shape deformation caused by damage such as the regional death of agents. We show that the connected coordinate system can be re-stabilized and that the agents can successfully adapt to death without any explicit detection or monitoring of failures. It is a challenge for a misinformed group of agents to stabilize into the overall shape in relation to the whole aggregate. We first allowed agents to stabilize into the aggregate shape. Then we selected a large region of agents and uniformly displaced their coordinate systems. On the one hand, agents are able to estimate their local density, and thus they can sense a sudden drop in their neighborhood, revealing a change. On the other hand, all the agents close to the space previously occupied by the destroyed agents now have the possibility to move. Fig. 5(d) shows experiments on the ring
architecture. Some agents in the lower right corner are destroyed and removed from the system. The displaced agents start to move to the corresponding region on the grid. As agents interact with their neighbors from the original grid, they correct the error in the shape, and the collapsed shape reverts to the original shape.

In our experiments, the average time required to complete a stabilized shape formation is about 300 time steps, depending on the number of agents and the agent density. Fig. 5(e) shows the percentage of agents in the given shape in a 150x150 world. Most shapes are roughly formed within 100 time steps and converge after 300 time steps. In addition, the rate of shape formation increases as the number of agents increases beyond 150. We also observe that coordinate systems propagate very quickly through the agents when the agent density is high, so the time to stabilization is reduced.

Finally, we tested how the agents are affected by hardware limitations. As seen in Fig. 5(f), the angular accuracy of a sensor affects the performance of the agents. However, we did not evaluate the agents' movement or sensing errors separately, because those are closely related to forming a consensus coordinate system among agents.
6 Conclusion and Future Work

This paper provides insight into the design of unmanned flying-agent-based swarms capable of self-organizing using only local communication with inexpensive hardware. The formation and maintenance of a swarm of UAVs for the creation of wireless communication networks in disaster areas is demonstrated in a 2D simulation with realistic scenarios. We show that agents can self-organize into arbitrary user-specified shapes and maintain the formed architecture well through continuous trilateration based on a consensus coordinate system and a virtual pheromone-based density model. When a set of agents is dead, destroyed, or displaced, the swarm can also self-repair the resulting construction. Future work can focus on mitigating the effect of wind; the control of agent orientation can also be investigated; and scalability is another useful direction.

Acknowledgments. This work has been partially supported by NASA Ames Research Center, Moffett Field, CA, USA. I would like to acknowledge and extend my heartfelt gratitude to my advisors, Dr. Stephen R. Ellis and Dr. Bernard D. Adelstein, for their encouragement, guidance and support in the Advanced Control and Display Group of the Human Systems Integration Division. I also offer my regards to Prof. Kevin P. Jordan of the San Jose State University Research Foundation, who supported me in every respect.
References

1. Camazine, S.: Self-Organization in Biological Systems. Princeton Univ. Press (2003)
2. Sharpe, T., Webb, B.: Simulated and situated models of chemical trail following in ants. In: From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior, pp. 195–204
3. De Nardi, R., Holland, O.: UltraSwarm: A Further Step Towards a Flock of Miniature Helicopters. In: Şahin, E., Spears, W.M., Winfield, A.F.T. (eds.) SAB 2006 Ws 2007. LNCS, vol. 4433, pp. 116–128. Springer, Heidelberg (2007)
4. Elston, J., Frew, E.: Hierarchical distributed control for search and tracking by heterogeneous aerial robot networks. In: IEEE International Conference on Robotics and Automation, ICRA 2008, pp. 170–175. IEEE (2008)
5. Payton, D., Daily, M., Estowski, R., Howard, M., Lee, C.: Pheromone robotics. Autonomous Robots 11(3), 319–324 (2001)
6. Flint, M., Polycarpou, M., Fernandez-Gaucherand, E.: Cooperative control for multiple autonomous UAVs searching for targets. In: Proceedings of the 41st IEEE Conference on Decision and Control, pp. 2823–2828. IEEE (2002)
7. Yoxall, P.: Minuteman Project, gone in a minute or here to stay? The origin, history and future of citizen activism on the United States–Mexico border. The U. Miami Inter-Am. L. Rev. 37, 517 (2005)
8. van de Burgt, R., Corporaal, H.: Blimp positioning in a wireless sensor network (2008)
9. University of the West of England: The Flying Flock (2002), http://www.ias.uwe.ac.uk/projects.htm
10. Campo, A., Dorigo, M.: Efficient Multi-Foraging in Swarm Robotics. In: Almeida e Costa, F., Rocha, L.M., Costa, E., Harvey, I., Coutinho, A. (eds.) ECAL 2007. LNCS (LNAI), vol. 4648, pp. 696–705. Springer, Heidelberg (2007)
11. Kadrovach, B., Lamont, G.: Design and analysis of swarm-based sensor systems. In: Proceedings of the 44th IEEE 2001 Midwest Symposium on Circuits and Systems, MWSCAS 2001, vol. 1, pp. 487–490. IEEE (2001)
12. Kovacina, M., Palmer, D., Yang, G., Vaidyanathan, R.: Multi-agent control algorithms for chemical cloud detection and mapping using unmanned air vehicles. In: IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 3, pp. 2782–2788. IEEE (2002)
13. Panait, L., Luke, S.: A pheromone-based utility model for collaborative foraging. In: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, vol. 1, pp. 36–43. IEEE Computer Society (2004)
14. Van Dyke Parunak, H., Brueckner, S.A., Sauter, J.: Digital Pheromones for Coordination of Unmanned Vehicles. In: Weyns, D., Van Dyke Parunak, H., Michel, F. (eds.) E4MAS 2004. LNCS (LNAI), vol. 3374, pp. 246–263. Springer, Heidelberg (2005)
15. Patwari, N., Ash, J., Kyperountas, S., Hero III, A., Moses, R., Correal, N.: Locating the nodes: cooperative localization in wireless sensor networks. IEEE Signal Processing Magazine 22(4), 54–69 (2005)
A Content-Based Caching Algorithm for Streaming Media Cache Servers in CDN

Inwhee Joe(1), Ju Hoon Yi(1), and Kyu-Seek Sohn(2)
(1) Division of Computer Science and Engineering, Hanyang University
(2) Department of Information and Communication Engineering, Hanyang Cyber University
Seoul, 133-791 South Korea
[email protected]
Abstract. High-quality streaming is becoming more popular as it attracts more attention from Internet users. This has become possible thanks to sufficient high-speed network infrastructure and Content Delivery Network (CDN) services. So far, CDNs have supported streaming services by using streaming media cache servers built on the web cache server as their platform. Static caching is the most popular approach for web cache servers dealing with static content. However, the media objects of a streaming cache differ from those of a web cache in size and effective duration: a streaming service requires a large storage space and generates more traffic. In particular, the traffic of a streaming service varies more severely, owing to the faster and more frequent interaction between the client and the server. For these streaming services, CDNs use dynamic caching, which saves cache space and reduces the response time to user demands; however, dynamic caching imposes a heavy CPU burden. In this paper, we propose a new caching algorithm based on dynamic caching for streaming media cache servers. The proposed algorithm takes a content-based approach, caching an individual content item on a server according to the popularity of the content; that is, only highly popular contents are distributed over multiple cache servers. The experimental results show that the proposed algorithm performs better than conventional approaches in terms of the cache hit rate and the amount of traffic across the network.

Keywords: CDN, Streaming, Content, Caching, Popularity.
1 Introduction

Recently, streaming services such as VoD (Video on Demand), Pay TV, AoD (Audio on Demand), P2PTV (Peer-to-Peer TV), and E-book services have formed one of the most important service areas of the Internet. High-quality streaming is becoming more popular as it attracts more attention from Internet users. This has become possible thanks to sufficient high-speed network infrastructure and the Content Delivery Network (CDN) services that enable high-quality streaming. A CDN is a transmission service that delivers contents through the nearest server on the network; it is mainly used for web content and streaming delivery.

T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 28–36, 2011. © Springer-Verlag Berlin Heidelberg 2011
So far, CDNs have supported streaming services by using streaming media cache servers built on the web cache server as their platform. The media objects of a streaming cache differ from those of a web cache in size and effective duration. A streaming service requires a large storage space and generates more traffic. In particular, the traffic of a streaming service varies more severely because of the faster and more frequent interaction between the client and the server of the service [1,2].

Caching can be classified into static caching and dynamic caching. Static caching caches and replicates static content that changes infrequently, such as web pages, text documents, images or audio/video files. Dynamic caching deals with dynamic contents, such as live or on-demand streaming media [3], and updates the cached contents more frequently than static caching. Static caching periodically predicts the traffic pattern with off-line tools such as simulation or traffic monitoring and changes the configuration of the caching mechanism according to the prediction. On the contrary, dynamic caching reacts to the user's demand pattern, caching only the portion of the content that is required by the user. Dynamic caching is more efficient than static caching but needs more CPU power [4].

In this paper, we propose a new caching algorithm that distributes individual contents to cache servers according to their attributes. The proposed algorithm is based on the dynamic caching mechanism but is more efficient than conventional dynamic caching approaches in terms of the hit rate and the amount of traffic across the network.
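The content-based placement idea (highly popular content replicated widely, less popular content pinned to one server) could be sketched as follows. The popularity threshold and the modulo placement rule are our illustrative assumptions, not the paper's exact policy:

```python
def place_content(content_id: int, popularity: float, servers, threshold=0.8):
    """Content-based placement sketch: contents of high popularity are
    replicated over all cache servers, while less popular contents are
    assigned to a single 'home' server by a simple modulo rule."""
    if popularity >= threshold:
        return list(servers)                      # hot: distribute everywhere
    return [servers[content_id % len(servers)]]   # cold: one home server
```

A request for cold content then always hits the same server, keeping the aggregate cache footprint small, while hot content can be served from any nearby server.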
2 Technology Review

Initially, most CDN services were web caching and file transfer services. Currently, the streaming service has become an important portion of CDN services. The caching service using the media streaming cache server is already in common use, but it continues to be studied because of the attributes of streaming and the increasing requirements of high-definition video content. In this section, we review the streaming methods and the caching mechanisms used in CDNs.

2.1 CDN
A CDN is a network composed of nodes containing copies of content, placed at various points in the network so as to maximize the accessibility of the content to users through the network. The core technologies of a CDN are guaranteeing QoS through fast data transmission and load balancing among servers. To support these technologies, DNS servers and caching platforms are used as the underlying infrastructure. The caching platform consists of cache servers installed at local nodes and the origin servers, as shown in Figure 1. The local node belongs to the CDN operator, while the origin server is actually the web server of the content provider.
Once content is cached, the cache server can transfer the content immediately to the user upon request, thereby reducing the traffic from the origin server. The cache server also acts as a point of control and security. Today's caches frequently include support for content filtering, anti-virus, access control, authentication, content cryptography and bandwidth management. Anti-virus, content filtering, authentication and cryptography give users an extra level of security across the network [3].
Fig. 1. Structure of CDN
2.2
Streaming Media Cache Proxy Server
Streaming caching achieves efficient media data transfer by handling the caching and transport of the streaming data simultaneously through the cache proxy server placed near users as shown in Figure 2 [2].
Fig. 2. Structure of the streaming cache proxy server
Streaming cache proxy server can deal with not only VoD but also live streaming by splitting. It can multicast the streaming which is received from the origin server to multiple users. The streaming cache proxy server placed near user or in the user's local network can enhance QoS of the streaming by eliminating the transmission
A Content-Based Caching Algorithm for Streaming Media Cache Servers in CDN
delay across the WAN. The streaming cache proxy server can also reduce the traffic demanded from the origin server across the network. In current CDNs, the content provider's web caching infrastructure, such as the web server, is frequently used to serve streaming because HTTP is used more pervasively than RTP, RTSP, and RTCP. 2.3
Load Balancing of CDN
In a CDN, the DNS server locates the node closest to the user requesting the content and performs load balancing in the server farm that includes the located node. The locating and load-balancing procedure is depicted in Figure 3 and described as follows [5,6]:
1. The user requests content. The user's request is broadcast by the local DNS to the outer DNS servers.
2. Through a cooperative address resolution procedure among the DNS servers of the DNS hierarchy (root DNS server, top-level DNS server, higher-level DNS server, etc.), including the local DNS, a set of close nodes is selected (the candidate set).
3. The local DNS server determines the closest cache server from the candidate set by using an appropriate scheduling algorithm (e.g., the round-robin scheduling algorithm).
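As a concrete illustration of step 3, the scheduling step can be as simple as a rotating index over the candidate set. The following minimal sketch is our own illustration (class and variable names are assumptions, not from the paper):

```python
from itertools import count

class RoundRobinScheduler:
    """Selects cache servers from a candidate set in rotation (step 3 above)."""

    def __init__(self, candidates):
        self.candidates = list(candidates)  # IP addresses of close cache servers
        self._next = count()

    def select(self):
        # Each call returns the next server in cyclic order.
        return self.candidates[next(self._next) % len(self.candidates)]

servers = RoundRobinScheduler(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
picks = [servers.select() for _ in range(4)]
# The fourth request wraps around to the first server.
```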
Fig. 3. Locating the closest streaming media server in the DNS hierarchy
In the above procedure, the most typical scheduling algorithm for load balancing is round-robin (RR) scheduling. More sophisticated scheduling algorithms have been studied and implemented; they select the most lightly loaded server among the candidates by considering the locations of the candidate server nodes (e.g., the IDC, Internet Data Center), the disk I/O and CPU load of each server, or the offered load in the local network attached to the candidate servers.
3
Content-Based Caching Algorithm
The streaming media caching service is based on the cache platform controlled by DNS. DNS locates the node closest to the user and is in charge of load balancing among servers. In this section, we propose a caching algorithm to enhance the performance of the load balancing orchestrated by DNS. The load-balancing method based on the scheduling algorithm described in the previous section can balance load well but may waste the storage space of the cache servers, because it does not consider the attributes of the individual content. If content can be cached to an appropriate cache server according to its popularity, considerable storage space can be saved. The proposed content-based caching algorithm is precisely such a scheduling algorithm: it caches an individual content item to a server according to the popularity of the content to be cached. That is, content of high popularity is processed by the conventional load-balancing scheduling algorithm, while content of low popularity is dynamically assigned to a dedicated cache server. The proposed algorithm thus prevents low-popularity content from being distributed over multiple cache servers and confines it to a dedicated cache server. By doing this, the proposed caching algorithm saves the storage space used for caching in the network, enhances the overall hit rate of the cache servers, and reduces the traffic demanded from the origin content server. The content-based caching algorithm measures the popularity of the content and determines how the content should be cached. The popularity of a content item is defined as the number of requests made for it in a given unit of time. The popularity measurement mechanism is implemented by counting the number of requests for a particular content item within the time frame assigned to it. Figure 4 represents a series of content items and the time frames assigned to them.
Fig. 4. Time frame for contents
If the popularity of a particular content item is greater than a given threshold, the item is classified as popular content. Popular content is distributed to cache servers by the load-balancing algorithm. After determining the popularity of the content, the DNS server records the location of the content, its popularity, and the addresses of the corresponding cache servers in its database. For example, each record of the database consists of the URI (Uniform Resource Identifier) of the content as the key, the IP address of the content's cache server, and a TTL (Time To Live) value. The DNS server removes a record whose TTL has expired and recalculates the popularity of that content when a new request for it arrives from a user.
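The DNS-side database record described above can be sketched as a small keyed entry with a TTL. The field names and the clock interface below are our own assumptions for illustration, not the paper's implementation:

```python
import time

class ContentRecord:
    """One DNS database entry: URI -> cache server address, with a TTL."""

    def __init__(self, uri, server_ip, ttl, now=time.time):
        self.uri = uri
        self.server_ip = server_ip
        self.expires_at = now() + ttl
        self._now = now

    def expired(self):
        # An expired record is removed by the DNS server, which then
        # recalculates the content's popularity on the next request.
        return self._now() >= self.expires_at

# A record with a 60 s TTL, driven by a fake clock for illustration.
clock = [1000.0]
rec = ContentRecord("rtsp://origin/movie.mp4", "10.0.0.2", ttl=60,
                    now=lambda: clock[0])
fresh = rec.expired()   # just created, not yet expired
clock[0] += 61          # advance the fake clock past the TTL
stale = rec.expired()   # the DNS server would now drop this record
```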
The detailed description of the flow chart of Figure 5 is as follows:
1. On receiving the content request from the user, the DNS server finds the IDC or the group of servers near the user.
2. Referring to the cached information, the DNS server finds out whether a streaming server dedicated to the requested content exists.
3. If there is no dedicated server, the DNS server selects a server with the scheduling algorithm, marks it as the dedicated server for the requested content, and stores the information about the dedicated server in the database.
4. If there is a dedicated server, the DNS server checks whether or not its TTL has expired.
5. If the TTL has expired or there is no dedicated server, the DNS server measures the popularity of the requested content. If the content is popular, the DNS server performs load balancing with the scheduling algorithm; if it is unpopular, the DNS server selects the dedicated server for the requested content. The DNS server then resets the timer of the content.
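The five steps above can be sketched as a single dispatch function. The threshold, data structures, and helper names are our own illustration of the idea, not the paper's implementation (TTL bookkeeping is reduced to a flag for brevity):

```python
class ContentBasedDispatcher:
    """Sketch of the Figure 5 flow: popular content is load-balanced,
    unpopular content is pinned to one dedicated cache server."""

    def __init__(self, servers, threshold):
        self.servers = servers       # cache servers in the IDC near the user
        self.threshold = threshold   # requests per time frame => "popular"
        self.dedicated = {}          # uri -> dedicated server (step 3)
        self.requests = {}           # uri -> request count in current frame
        self.rr = 0

    def _round_robin(self):
        server = self.servers[self.rr % len(self.servers)]
        self.rr += 1
        return server

    def resolve(self, uri, ttl_expired=False):
        # Step 5 (partial): count the request to measure popularity.
        self.requests[uri] = self.requests.get(uri, 0) + 1
        popular = self.requests[uri] > self.threshold

        if popular:
            # Popular content: conventional load-balancing scheduling.
            return self._round_robin()
        if uri not in self.dedicated or ttl_expired:
            # Steps 3/5: assign (or re-assign) a dedicated server.
            self.dedicated[uri] = self._round_robin()
        return self.dedicated[uri]

d = ContentBasedDispatcher(["s1", "s2", "s3"], threshold=3)
# An unpopular title keeps hitting its single dedicated server...
cold = {d.resolve("low-demand.mp4") for _ in range(3)}
# ...while a popular title is spread across the whole farm.
hot = {d.resolve("hit-show.mp4") for _ in range(9)}
```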
Fig. 5. Flow of the content-based caching algorithm
The popularity of content is effective for the duration of the TTL. If a cache server breaks down, the recorded information of the corresponding content is invalidated, and the scheduling algorithm or the server-dedication algorithm is performed again by the DNS server. A typical message sequence chart of the proposed algorithm is depicted in Figure 6.
Fig. 6. Message sequence diagram of the proposed algorithm
4
Performance Evaluation
4.1
Testbed Setup
We have analyzed the performance of the proposed caching algorithm using a test bed constructed as shown in Figure 7. The test bed consists of four streaming media cache servers, one DNS server, one origin server, and a load generator. Each is based on a personal computer, and they are connected to each other through the Internet. Each cache server has its own IP address, but all the cache servers form one domain. The DNS server uses the round-robin scheduling algorithm. The load generator is event-driven simulation software running on a personal computer; it plays the role of the client. The load generator issues a resolving request for a particular content item to the DNS server according to the test scenario, and the DNS server then selects one of the cache servers and responds to the load generator with the IP address of the selected streaming cache server. The load generator logs all events occurring between itself and the DNS server and cache servers.
Fig. 7. Testbed for performance evaluation
4.2
Experimental Results
We experimented with two cases and compared their results. One case was the simulation with the proposed content-based caching algorithm; the other was that
with the pure round-robin algorithm conventionally used for web services. We installed ten thousand content titles on the origin server. The cache size of each streaming media cache server was set to 500 GB. The load generator issued requests for content at a rate randomly determined between 5 and 10. Each experiment ran for one hour. Figure 8 compares the hit rates of the two caching algorithms; the darker blue line corresponds to the caching algorithm using the pure round-robin scheduling mechanism. The curves in Figure 8 show that the hit rate of the proposed caching scheme is 10% to 15% higher than that of the conventional caching scheme.
Fig. 8. Hit rates of the two caching schemes
Figure 9 compares the traffic flowing out of the origin server in the two experiments. The amount of traffic with the proposed caching scheme is 10% to 15% smaller than with the conventional caching scheme. The traffic reduction rate is the same as the hit-rate improvement. We therefore conclude that enhancing the hit rate of the cache servers reduces the amount of traffic in the network.
Fig. 9. Traffic amount from the origin streaming media server
5
Conclusions
In this paper, we have proposed a content-based caching algorithm for streaming media cache servers in order to improve the effectiveness of the streaming media caching mechanism. The experimental performance evaluation shows that the proposed algorithm outperforms the conventional round-robin caching scheme in terms of cache hit rate and traffic burden. If the proposed caching scheme is deployed in a CDN, considerable benefits are foreseeable. First, the availability of the total storage in the local cache servers increases, and more streaming media can be cached into the cache servers; this reduces the cost of the cache storage. Second, by increasing the number of cached streaming media items, a higher hit rate and a reduced traffic cost can be expected. Also, the load on the origin server can be reduced. Acknowledgements. This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2011-0004974) and the KEIT R&D Support Program of the MKE.
References
1. Chen, S., Shen, B., Wee, S., Zhang, X.: Designs of high quality streaming proxy systems. In: Proceedings of IEEE INFOCOM, Hong Kong (2004)
2. Liu, J.: Streaming Media Caching. School of Computing Science, Simon Fraser University, British Columbia
3. Bartolini, N., Casalicchio, E., Tucci, S.: A Walk Through Content Delivery Networks. In: Calzarossa, M.C., Gelenbe, E. (eds.) MASCOTS 2003. LNCS, vol. 2965, pp. 1–25. Springer, Heidelberg (2004)
4. Kumar, C., Norris, J.B.: A new approach for a proxy-level web caching mechanism. Decision Support Systems 46 (December 2008)
5. Su, A., Choffnes, D.R., Kuzmanovic, A., Bustamante, F.E.: Drafting Behind Akamai: Inferring Network Conditions Based on CDN Redirection. IEEE/ACM Transactions on Networking 17(6) (December 2009)
6. Akamai, http://www.akamai.com
Implementation of Bilinear Pairings over Elliptic Curves with Embedding Degree 24
In Tae Kim1, Chanil Park2, Seong Oun Hwang1, and Cheol-Min Park3
1 Hongik University, Korea
2 Agency for Defense Development, Korea
3 National Institute for Mathematical Science, Korea
Abstract. Most implementations of pairing-based cryptography use pairing-friendly curves with an embedding degree k ≤ 12; these have security levels of up to 128 bits. In this paper, we consider a family of pairing-friendly curves with embedding degree k = 24, which have an enhanced security level of 192 bits. We also describe an efficient implementation of the Tate and Ate pairings using field arithmetic in F_{q^24}; this includes a careful selection of the parameters with small Hamming weight and a novel approach to final exponentiation, which reduces the number of computations required.
Keywords: pairing-friendly curve, Tate pairing, Ate pairing.
1
Introduction
Pairing can be defined as a computable bilinear map between an elliptic curve group E(F_q) and a multiplicative group of an extension field F_{q^k}, where k is called the embedding degree of the elliptic curve. A pairing operation is considered secure if the discrete logarithm problem in the groups involved is computationally infeasible. In fact, the security of a pairing operation depends on the selected elliptic curve E(F_q) and finite field F_{q^k}. Therefore, over the last few decades, many papers have been published on the construction of pairing-friendly curves [5,8,9,10]. Pairing-friendly curves are parameterized by an embedding degree k and a prime number q. For optimal security, the parameters k and q should be selected such that the discrete logarithm problem is difficult to solve even when using the best known algorithm [10]. Table 1 shows the relationship between the security level and the embedding degree.

Table 1. Key size security in bits

Security level (bits)   Group size   Extension field size   Embedding degree
80                      160          960 - 1280             6 - 8
128                     256          3000 - 5000            12 - 20
192                     384          8000 - 10000           20 - 26
256                     512          12000 - 18000          28 - 36

T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 37–43, 2011.
© Springer-Verlag Berlin Heidelberg 2011
38
I.T. Kim et al.
Many researchers have examined the construction of elliptic curves with a recommended embedding degree. Menezes et al. [11] showed that a supersingular elliptic curve must have an embedding degree k ≤ 6. Miyaji et al. [12] described the complete characteristics of ordinary elliptic curves of prime order with embedding degree k = 3, 4, or 6. Barreto et al. [8] also provided a method for the construction of curves of prime order with k = 12. The security level is an extremely important aspect of real systems. The National Institute of Standards and Technology recommends the use of different algorithms to raise the security level [14]. The use of either a 192- or 256-bit key is recommended for top security agencies or military environments, where security levels stronger than those in commercial environments are required. Thus, in this paper, we focus on the implementation of pairing-friendly curves with embedding degree k = 24, which have a 192-bit security level. At the time of writing this paper, the implementation of these types of curves had never been studied in detail. The paper is organized as follows: In Section 2, we provide a brief background on pairing. The main contributions of this paper are presented in Sections 3 and 4, where we describe a pairing-friendly elliptic curve, the Tate pairing, and the Ate pairing. We describe our computational experiments in Section 5 and conclude this paper in Section 6.
2
Bilinear Pairings
Let G1 and G2 be additive groups and G3 a multiplicative group, and let e : G1 × G2 → G3 be a bilinear pairing. Let F_q be a finite field with characteristic q and E(F_q) an elliptic curve defined over F_q. Let n be the order of E(F_q), r a large prime number dividing n, and k the smallest positive integer such that r | q^k − 1. The integer k is the embedding degree of E with respect to r. We know that the r-th roots of unity are contained in F_{q^k}. Let [a]P denote the multiplication of a point P ∈ E by a scalar a, and let ∞ denote the point at infinity. The Miller function [2] f_{r,P}(·) is a rational function on E with r zeroes at P, one pole at [r]P, and r − 1 poles at ∞:

div(f_{r,P}) = r(P) − ([r]P) − (r − 1)(∞)

The Tate pairing [6] is a well-defined, non-degenerate bilinear pairing in which G1 = E[r], G2 = E(F_{q^k}) / rE(F_{q^k}), and G3 = F*_{q^k} / (F*_{q^k})^r. Let P ∈ E[r] and Q ∈ E(F_{q^k}) / rE(F_{q^k}). Then, the Tate pairing of P and Q is computed as follows:

e(P, Q) = f_{r,P}(Q)^{(q^k − 1)/r}

The Ate pairing [1] is a well-defined, non-degenerate bilinear pairing with G1 = E[r] ∩ Ker(π_q − [1]), G2 = E[r] ∩ Ker(π_q − [q]), and G3 = F*_{q^k} / (F*_{q^k})^r, where π_q is the Frobenius endomorphism.
Let P ∈ E[r] ∩ Ker(π_q − [1]) and Q ∈ E[r] ∩ Ker(π_q − [q]), and let t be the trace of the Frobenius endomorphism of the curve. Then the Ate pairing of P and Q is computed as follows:

e(Q, P) = f_{t−1,Q}(P)^{(q^k − 1)/r}
3
Pairing-Friendly Elliptic Curve with Embedding Degree k=24
We implemented a method to generate pairing-friendly elliptic curves over a prime field with embedding degree k = 24. Freeman et al. [3] described a general method to generate ordinary curves using the Cocks-Pinch method [15]. The Cocks-Pinch method has the advantage that it can produce curves with prime-order subgroups of nearly arbitrary sizes.

Theorem 1. [3] Fix a positive integer k and a positive square-free integer D. Execute the following steps:
(1) Find an irreducible polynomial r(x) with a positive leading coefficient such that K = Q[x]/(r(x)) is a number field containing √−D and the cyclotomic field Q(ζ_k).
(2) Choose a primitive k-th root of unity ζ_k ∈ K.
(3) Let t(x) ∈ Q[x] be a polynomial mapping to ζ_k + 1 in K.
(4) Let y(x) ∈ Q[x] be a polynomial mapping to (ζ_k − 1)/√−D in K.
(5) Let q(x) ∈ Q[x] be given by (t(x)^2 + D·y(x)^2)/4.
If q(x) represents primes and y(x_0) ∈ Z for some x_0 ∈ Z, then the triple (t(x), r(x), q(x)) parameterizes a complete family of elliptic curves with embedding degree k and discriminant D.

In this paper, we follow the Cocks-Pinch method and the method proposed by Freeman et al. [3] to generate a family of elliptic curves with embedding degree k = 24. Reference [3] classified families in all cases where k is not divisible by 18. The equation of the curve is E: y^2 = x^3 + b, with b ≠ 0. The trace of the curve, the prime number r by which the order of the curve is divisible, and the characteristic of F_q are parameterized as follows:

t(x) = x + 1
r(x) = x^8 − x^4 + 1
q(x) = (1/3)(x^10 − 2x^9 + x^8 − x^6 + 2x^5 − x^4 + x^2 + x + 1)

We can then calculate the ρ value as follows:

ρ = deg q(x) / deg r(x) = 10/8 = 1.25
Example 1. Using the proposed pairing-friendly curves, we present an example of an elliptic curve with embedding degree k = 24. Let x = -562956395872256. Then t = x + 1 is 50 bits, r is 393 bits, q is 489 bits, and the Hamming weight of x is 3. The desired curve has the form y^2 = x^3 + 10 with
t = -562956395872255,
r = 10087837107787606988261545979138308561962211369268966565058861035506973091017784521329917001106353323303332550614712321,
and
q = 1065678788090687414781234809487409796360705132976322386052516761893902274102196368903858053299667708716351625059695223776148550871771850476603746987.
4
Computation of Bilinear Pairings over Elliptic Curve
4.1
Tower Extension of Finite Field Fq24
The elements of the field are represented by a polynomial of degree k − 1, i.e., F_{q^k} = F_q[x]/(f(x)), where f(x) is an irreducible polynomial of degree k. In this paper, we construct the extension field F_{q^24} as a tower of finite extensions: quadratic on top of a cubic on top of a quadratic, i.e., 1-2-4-12-24. The irreducible polynomials for the tower of extensions are detailed in Table 2.

Table 2. Tower of extension fields

Extension   Construction                  Representation
F_{q^2}     F_q[u]/(u^2 + 1)              a = a0 + a1*u
F_{q^4}     F_{q^2}[v]/(v^2 - (1 + u))    a = a0 + a1*v
F_{q^12}    F_{q^4}[w]/(w^3 - v)          a = a0 + a1*w + a2*w^2
F_{q^24}    F_{q^12}[z]/(z^2 - w)         a = a0 + a1*z
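The bottom level of the tower, F_{q^2} = F_q[u]/(u^2 + 1), can be sketched in a few lines. This toy code (small prime, our own helper names, not the paper's implementation) also exhibits the fact that the Frobenius on F_{q^2} is conjugation, a^p = a0 − a1·u, as used in the Frobenius table of Section 4.4:

```python
P = 19  # toy prime with P % 4 == 3 (and P % 12 == 7), so u^2 + 1 is irreducible

def mul2(a, b, p=P):
    """(a0 + a1*u)(b0 + b1*u) with u^2 = -1."""
    a0, a1 = a
    b0, b1 = b
    return ((a0 * b0 - a1 * b1) % p, (a0 * b1 + a1 * b0) % p)

def pow2(a, e, p=P):
    """Square-and-multiply exponentiation in F_{p^2}."""
    result, base = (1, 0), a
    while e:
        if e & 1:
            result = mul2(result, base, p)
        base = mul2(base, base, p)
        e >>= 1
    return result

a = (7, 5)                    # the element 7 + 5u
frob = pow2(a, P)             # the Frobenius map a -> a^p
conj = (a[0], (-a[1]) % P)    # conjugation a0 - a1*u
# frob == conj, since u^p = -u when p = 3 mod 4
```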
4.2
Sextic Twist and Miller’s Algorithm
We describe the Tate and Ate pairing operations in this section. The pairing operations take points P = (x_P, y_P) ∈ E(F_q) and Q = (x_Q, y_Q) ∈ E(F_{q^24}). For optimization, we can compress points in E(F_{q^24}) to points on a sextic twist E′(F_{q^4}). Let i ∈ F_{q^4} be such that x^6 − i is irreducible over F_{q^4}. Then the elliptic curve E admits a sextic twisted curve E′: y^2 = x^3 + b/i with #E′(F_{q^4}) = q^4 + 1 − (3f + T)/2, where T = t^4 − 4qt^2 + 2q^2 and f^2 = (4q^4 − T^2)/3 [4]. Let θ ∈ F_{q^24} be a root of x^6 − i. Then the injective homomorphism
E′ → E : (x′, y′) ↦ (θ^2 x′, θ^3 y′) maps the points on the sextic twisted curve to the original curve. The Tate and Ate pairings can be computed using Miller’s algorithm as in [5]. When we compute the line function of the Ate pairing, we can use the sextic twist formula as in [6]: for A = (x_A, y_A) = (x_A′ θ^2, y_A′ θ^3) and B = (x_B, y_B) = (x_B′ θ^2, y_B′ θ^3) ∈ E(F_{q^24}), let l_{A,B} be the line passing through A and B. Then we have

l_{A,B}(P) = (−y_P) + (x_P λ_{A′,B′}) θ + (y_A′ − x_A′ λ_{A′,B′}) θ^3, where λ_{A′,B′} = (y_B′ − y_A′)/(x_B′ − x_A′).

4.3
Final Exponentiation
Both the Tate and Ate pairing algorithms compute a final exponentiation to the power (q^24 − 1)/r after running Miller’s algorithm. To speed up our implementation, this exponent is factored into three parts: (q^12 − 1), (q^12 + 1)/Φ_24(q), and Φ_24(q)/r, where Φ_24(q) is the 24-th cyclotomic polynomial [7]. Here, Φ_24(q)/r is called the hard exponentiation. It can easily be shown by computation that Φ_24(q) = q^8 − q^4 + 1 and r(x) = x^8 − x^4 + 1. The three exponents are then explicitly expressed as (q^12 − 1), (q^4 + 1), and 1 + (q^3 + x q^2 + x^2 q + x^3)(q^4 + x^4 − 1)(x − 1)^2 / 3. The exponentiation for the first two parts is easy to compute because of the Frobenius.

Algorithm 1. Hard exponentiation
Input: f, x, q
Output: f^(1 + (q^3 + x q^2 + x^2 q + x^3)(q^4 + x^4 − 1)(x − 1)^2 / 3)
1. Compute f^q, f^(q^2), and f^(q^3) using the Frobenius
2. Compute f′ ← (f^(q^3)) · ((f^(q^2)) · ((f^q) · (f)^x)^x)^x
3. Compute (f′)^(q^4) using the Frobenius
4. f″ ← (f′)^(q^4) · (f′)^(x^4) · (f′)^(−1)
5. f ← f · (f″)^((x − 1)^2 / 3)
However, the exponentiation of the third part is difficult to compute. Therefore, instead of using the expensive multi-exponentiation method, we exploit the polynomial description of q and r to obtain Algorithm 1, which produces an equivalent result with fewer exponentiations. Our experiments show that this method is roughly twice as fast as multi-exponentiation.
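The addition chain of Algorithm 1 can be sanity-checked with toy integers in a multiplicative group Z_p^*, treating q and x as plain exponents. The following verification sketch is ours (toy prime and values, not the real curve parameters):

```python
# Toy check that Algorithm 1 computes f^(1 + e1*e2*e3) with
# e1 = q^3 + x*q^2 + x^2*q + x^3, e2 = q^4 + x^4 - 1, e3 = (x-1)^2 / 3.
p = 10007              # toy prime; Z_p^* stands in for the pairing group
f, q, x = 1234, 7, 4   # x % 3 == 1, so (x-1)^2 is divisible by 3

# Step 1: "Frobenius" powers (plain exponentiations in the toy group).
fq, fq2, fq3 = pow(f, q, p), pow(f, q**2, p), pow(f, q**3, p)

# Step 2: f' = f^(q^3 + x*q^2 + x^2*q + x^3) via a Horner-style chain.
f1 = fq3 * pow(fq2 * pow(fq * pow(f, x, p) % p, x, p) % p, x, p) % p

# Steps 3-4: f'' = f'^(q^4 + x^4 - 1).
f2 = pow(f1, q**4, p) * pow(f1, x**4, p) * pow(f1, -1, p) % p

# Step 5: result = f * f''^((x-1)^2 / 3).
result = f * pow(f2, (x - 1)**2 // 3, p) % p

e1 = q**3 + x*q**2 + x**2*q + x**3
e2 = q**4 + x**4 - 1
e3 = (x - 1)**2 // 3
# result agrees with the direct computation pow(f, 1 + e1*e2*e3, p)
```

The chain in step 2 is just Horner's rule applied to the exponent polynomial q^3 + x q^2 + x^2 q + x^3, which is why it needs only three multiplications by x in the exponent.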
4.4
Frobenius Constant
For particular primes p such that p ≡ 3 (mod 4), p ≡ 1 (mod 6), and p ≡ 7 (mod 12), we can speed up the abovementioned final exponentiation by converting exponentiations to multiplications as follows. If we let

E = 1 + u, F1 = E^((p−1)/2), F2 = E^((p−1)/6), F3 = E^((p−7)/12),

then we have

z^p = z^(p−7) · z^6 · z = (z^12)^((p−7)/12) · v · z = E^((p−7)/12) · v · z = F3 · v · z,
w^p = (w^6)^((p−1)/6) · w = F2 · w,
v^p = F1 · v,
u^p = −u.

Therefore we obtain the following Table 3.

Table 3. Tower of extension fields and their Frobenius constants

Extension   Representation            Frobenius
F_{q^2}     a = a0 + a1*u             a^p = a0 - a1*u
F_{q^4}     a = a0 + a1*v             a^p = a0^p + a1^p * F1 * v
F_{q^12}    a = a0 + a1*w + a2*w^2    a^p = a0^p + a1^p * F2 * w + a2^p * F2^2 * w^2
F_{q^24}    a = a0 + a1*z             a^p = a0^p + a1^p * F3 * v * z

5
Computation Experiment
The performances of the Tate and Ate pairings were measured on a Windows 7 system with a 2.91 GHz AMD Athlon II processor. The MIRACL v5.4.2 library (http://www.shamus.ie) was used in our test; this library supports multiprecision arithmetic and a number of powerful optional optimizations. Internally, prime field elements are kept in Montgomery representation [13], which allows fast reduction without divisions. The measured times for the Tate and Ate pairings are listed in Table 4. The Ate pairing over the proposed curve takes approximately 0.320 seconds, which is quite efficient for present-day use.

Table 4. Timings in seconds on a 2.91 GHz AMD Athlon II

                       Tate pairing   Ate pairing
Miller loop            0.740          0.073
Final exponentiation   0.254          0.247
Total                  0.994          0.320
6
Conclusion
In this paper, we described our implementation of the Tate and Ate pairings over the proposed elliptic curves with embedding degree k = 24. We also showed the time required to compute the pairings using the MIRACL library. The current pairing times may not be practical for lightweight devices such as sensor nodes or mobile devices. Therefore, in the near future we plan to optimize the pairing operations, particularly the final exponentiation, for such devices. Acknowledgments. This work was supported by the Agency for Defense Development under contract UD090059ED.
References
1. Hess, F., Smart, N.P., Vercauteren, F.: The Eta Pairing Revisited. IEEE Transactions on Information Theory 52(10), 4595–4602 (2006)
2. Miller, V.S.: The Weil pairing, and its efficient calculation. Journal of Cryptology 17(4), 235–261 (2004)
3. Freeman, D., Scott, M., Teske, E.: A Taxonomy of Pairing-Friendly Elliptic Curves. Journal of Cryptology 23, 224–280 (2010)
4. Scott, M.: A note on twists for pairing friendly curves (2005), ftp://ftp.coomputing.dcu.ie/pub/resources/crypto/twists.pdf
5. Barreto, P.S.L.M., Kim, H.Y., Lynn, B., Scott, M.: Efficient Algorithms for Pairing-Based Cryptosystems. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, pp. 354–369. Springer, Heidelberg (2002)
6. Devegili, A.J., Scott, M., Dahab, R.: Implementing Cryptographic Pairings over Barreto-Naehrig Curves. In: Takagi, T., Okamoto, T., Okamoto, E., Okamoto, T. (eds.) Pairing 2007. LNCS, vol. 4575, pp. 197–207. Springer, Heidelberg (2007)
7. Granger, R., Page, D., Smart, N.P.: High Security Pairing-Based Cryptography Revisited. In: Hess, F., Pauli, S., Pohst, M. (eds.) ANTS 2006. LNCS, vol. 4076, pp. 480–494. Springer, Heidelberg (2006)
8. Barreto, P.S.L.M., Naehrig, M.: Pairing-Friendly Elliptic Curves of Prime Order. In: Preneel, B., Tavares, S. (eds.) SAC 2005. LNCS, vol. 3897, pp. 319–331. Springer, Heidelberg (2006)
9. Brezing, F., Weng, A.: Elliptic curves suitable for pairing based cryptography. Designs, Codes and Cryptography 37(1), 133–141 (2005)
10. Freeman, D.: Constructing Pairing-Friendly Elliptic Curves with Embedding Degree 10. In: Hess, F., Pauli, S., Pohst, M. (eds.) ANTS 2006. LNCS, vol. 4076, pp. 452–465. Springer, Heidelberg (2006)
11. Menezes, A., Okamoto, T., Vanstone, S.: Reducing elliptic curve logarithms to logarithms in a finite field. IEEE Transactions on Information Theory 39, 1639–1646 (1993)
12. Miyaji, A., Nakabayashi, M., Takano, S.: New explicit conditions of elliptic curve traces for FR-reduction. IEICE Transactions on Fundamentals E84-A(5), 1234–1243 (2001)
13. Montgomery, P.L.: Modular multiplication without trial division. Mathematics of Computation 44(170), 519–521 (1985)
14. CNSS Policy, no. 15, fact sheet no. 1, National Policy on the Use of the Advanced Encryption Standard (AES) to Protect National Security Systems and National Security Information. NIST (2003)
15. Cocks, C., Pinch, R.G.E.: Identity-based cryptosystems based on the Weil pairing (2001) (unpublished manuscript)
Improvement of Mobile U-health Services System Byung-Won Min Department of Information Communication Engineering Mokwon University, Doan-dong 800, Seo-gu, Daejon, 302-729, Korea
[email protected]
Abstract. This paper presents a novel method to design and implement a mobile u-health system by defining the essential elements of mobile healthcare services. U-health services have the following characteristics. First, u-health services are process oriented; that is, a complete u-health service process is constructed by connecting and integrating small service units. Second, many u-health services are variations of a common, sharable u-health service scenario. Third, many units of a u-health service are reusable by other u-health services. Fourth, the services are evolving; in other words, they can be improved as more data are accumulated and better unit services become available. Fifth, bio-sensors for u-health services are limited in size and precision compared with high-cost off-line biomedical sensors. Last, less precise bio-signals are obtained more frequently from a large number of users in u-health services. In addition, the designed scheme offers a realized mobile u-health system intended as an advanced development tool for application and service developers.
Keywords: U-health, mobile service platform, usability.
1
Introduction
Recently, social concern about the u-health industry has grown rapidly because our society is aging and the demand for health and welfare will continuously increase. In terms of the government budget, the share for health and welfare will gradually increase [1]. Therefore, various healthcare services will become common in modern society, and models and systems for these services can be among the hottest research topics in the near future. For example, LG and Samsung have announced healthcare products that check blood sugar or body fat using a cell phone connected to biomedical sensors [2]. However, we cannot yet say that the era of u-healthcare has begun in earnest, although there have been a few simple instances of u-care as mentioned above. These services are limited to simple off-line care through a terminal with corresponding programs and connected sensors. We expect the u-healthcare scheme to be a much more complex and useful system than a simple terminal. We can store, manage, and analyze the physical data from various sensors using a mobile terminal, and we can ultimately provide remote clinical care through these on-line services and mobile handsets [3]. On the other hand, we can offer healthcare services such as blood sugar, body fat, heart rate, stress, and fatigue management using a mobile terminal, and we can obtain various bio data such as ECG, pulse rate, blood sugar level, and body fat ratio through existing sensors [4].

T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 44–51, 2011.
© Springer-Verlag Berlin Heidelberg 2011
In this paper, we propose a new approach to the design of a mobile u-healthcare system by defining an essential service group of mobile healthcare services. In addition, we choose common service elements for the proposed u-healthcare system and design the service platform. In particular, we focus on an automatic urine-sensing u-care system to demonstrate the effectiveness of the service platform. The rest of this paper is organized as follows: Section 2 briefly describes u-health services from a general point of view and defines the service elements for our new approach in this study. Section 3 presents the structure of the platform using common service applications. Section 4 proposes a mobile u-healthcare scheme based on the automatic urine-sensing u-care system and explains its usability. Finally, Section 5 gives our concluding remarks and future studies.
2
Mobile U-health Service
2.1
Definition
Although mobile u-health service can be defined differently according to one's point of view, we define it as a real-time service obtained from a mobile terminal while the user is moving. In other words, we can acquire, store, manage, and analyze mobile bio data so that the corresponding user can take proper follow-up action at the proper time or, as an advanced service, have the disease treated directly. In addition, a mobile u-health system is defined as an integrated scheme, including bio-sensors, terminals, and the related software and hardware, needed to provide the mobile u-health service defined above.
2.2
Elements of Mobile U-health Service
A mobile u-health system generally consists of the following core elements and their corresponding technologies, although there are other kinds of systems in our initial u-health world [5]:
• Framework for collection of bio data
• Framework for storage and management of bio data
• Framework for analysis of bio data
• Framework for mobile u-health service
Based on these cores, the structure of our mobile u-health service is presented in Fig. 1. The system periodically acquires the user's bio data and transfers them to the server using the collection framework. The sensing output from an independent sensor installed in the terminal is transferred to the server through the terminal gateway. Although direct transfer from the sensor to the server is possible, it is impractical because of the sensor's cost and capability limits. The collected bio data are stored and managed effectively by the storage and management framework. In this scheme, we use temporal data management techniques for this framework because the bio data are generated periodically and continuously.
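A minimal sketch of the temporal aspect of the storage and management framework: each bio-data sample keeps its acquisition time so that later analysis can query by time range. The schema and names below are illustrative assumptions, not the paper's design:

```python
from bisect import bisect_left, bisect_right

class TemporalBioStore:
    """Stores periodically collected bio data per user, ordered by timestamp."""

    def __init__(self):
        self._samples = {}  # user_id -> list of (timestamp, kind, value)

    def append(self, user_id, timestamp, kind, value):
        # Sensors report periodically, so appends arrive in time order.
        self._samples.setdefault(user_id, []).append((timestamp, kind, value))

    def query(self, user_id, t_start, t_end):
        # Return all samples whose timestamp lies in [t_start, t_end].
        rows = self._samples.get(user_id, [])
        times = [t for t, _, _ in rows]
        return rows[bisect_left(times, t_start):bisect_right(times, t_end)]

store = TemporalBioStore()
for t in range(0, 50, 10):                  # one pulse sample every 10 s
    store.append("user-1", t, "pulse", 60 + t)
recent = store.query("user-1", 20, 40)      # samples taken in [20 s, 40 s]
```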
46
B.-W. Min
Fig. 1. Structure of Mobile U-health Service

The analysis framework decides whether there are abnormal symptoms in the user's body by applying prepared analysis methods to new bio data obtained from the frameworks described above. To analyze bio data, we apply data mining technology to detect possible abnormalities or health indices, using pattern matching, expert system concepts, and decision-support methods. Finally, the framework for mobile u-health service is a kind of middleware supporting the integrated service, including data collection, storage and management, and analysis. The service elements described above and their corresponding technologies are therefore integrated into a hub, which we name the mobile u-health service framework. In addition, the framework offers an environment for developing services and allows services to operate on it. Fig. 1 shows a typical model of a mobile u-health service consisting of the core element frameworks mentioned above. Real-time bio data obtained from a user's mobile devices are stored in a database and analyzed. Most of the time, analysis of bio data is carried out with the help of an expert system, which classifies bio data into two groups, normal and patient, so that decisions can be made more accurately. Moreover, the expert system continuously improves by studying new data as its learning data.
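The two-group idea can be illustrated with a minimal nearest-centroid sketch: training bio samples are split into a normal group and a patient group, a new sample is labeled by the nearer group centroid, and confirmed samples are folded back in as learning data. All names here are hypothetical; the paper does not specify the actual classification algorithm of its expert system.

```python
def centroid(samples):
    """Component-wise mean of equal-length feature vectors."""
    n = len(samples)
    return [sum(v[i] for v in samples) / n for i in range(len(samples[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class TwoGroupClassifier:
    """Labels a bio sample 'normal' or 'patient' by the nearer centroid."""

    def __init__(self, normal, patient):
        self.groups = {"normal": (centroid(normal), len(normal)),
                       "patient": (centroid(patient), len(patient))}

    def classify(self, sample):
        return min(self.groups, key=lambda g: sq_dist(sample, self.groups[g][0]))

    def learn(self, sample, label):
        # The expert system "advances by studying new data": fold a
        # confirmed sample back into its group's running mean.
        c, n = self.groups[label]
        self.groups[label] = ([(ci * n + si) / (n + 1)
                               for ci, si in zip(c, sample)], n + 1)
```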
3
Mobile U-health Service Platform
In this section, we present the architecture of the mobile u-health service platform with its core elements and their roles. We explain the capabilities of the platform in connection with applications, the expert system, and their related databases. Fig. 2 shows the structure of the mobile u-health service platform from this point of view [6]. The platform receives bio data as messages from various terminals and hands them over to the database management module for processing. In other words, the mobile message processing module connects the moving client and the server. The bio data transferred through the mobile message processing module to the framework are stored and managed by a large-scale temporal database management system, in which the bio data may be separated according to user, service, and sometimes type of treatment.
Improvement of Mobile U-health Services System
47
Fig. 2. Mobile U-health Service Platform

The stored data are used to derive necessary health indices by applying data mining or pattern matching, and then offer direct or feedback information to the expert system. Close relations between the expert system (with its data mining or pattern matching module) and the temporal database management module are necessary because the structure of the database varies with the kind of application service. On the other hand, each u-health application service has to be defined in a process format in order to be developed on the mobile platform. As shown in Fig. 2, all mobile u-health services can be regarded as processes for obtaining, storing, and analyzing data and reporting the result. Mobile u-health application services represented by such processes are operated and controlled by the process management system, which also supports operational services and the monitoring steps needed for control. The user management module supports personalized service control to manage all personal information. This module can be used in connection with the user management scheme installed in the process management system.
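The idea of a service defined as a process – obtain, store, analyze, inform – run and monitored by a process manager can be sketched as follows. This is a toy illustration; the step names and the threshold are invented, not taken from the paper.

```python
class Process:
    """A u-health application service expressed as an ordered list of
    steps, run by a minimal process manager with a monitoring log."""

    def __init__(self, name, steps):
        self.name, self.steps, self.log = name, steps, []

    def run(self, data):
        for step in self.steps:
            data = step(data)
            self.log.append(step.__name__)   # monitoring hook
        return data

# Hypothetical steps for an "obtain -> store -> analyze -> inform" service.
def obtain(raw):
    return {"bio": raw}

def store(msg):
    msg["stored"] = True          # stand-in for a database write
    return msg

def analyze(msg):
    msg["abnormal"] = msg["bio"] > 100   # invented threshold
    return msg

def inform(msg):
    return "alert" if msg["abnormal"] else "ok"

service = Process("heart-rate-check", [obtain, store, analyze, inform])
```

A real process management system would also support pausing, retrying, and per-user configuration of such processes; this sketch only shows the sequencing and monitoring idea.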
4
Design of the Mobile U-health Service System
In this section, we design a u-health system which automatically senses the feces and urine of patients and informs their guardians through mobile text messages, based on the concept of the mobile u-health service platform presented in the last section. We also show that the system can realize functions such as patient monitoring, reporting abnormal phenomena, communication between bio terminal and server, and receiving and analyzing messages, thereby achieving the original purpose of a u-care system [7][8]. Fig. 3 shows the structure of our u-care scheme based on the mobile u-health service platform. The system consists of four kinds of frameworks, for bio data collection, storage and management, analysis, and mobile service, as described in Section 2.2.
[Fig. 3 depicts a process flow from Bio Signal & Physical Symptom through Data Acquisition, Data Storing, Data Analysis, Decision & Delivery, Notification, Final Confirmation, and User Feedback, with steps for validation check, user registration check, and bio data typing, supported by Business Process Management, a source database of customer, bio-signal, symptom, and environmental information, a Questionnaire Composer, a Diagnosis & Weight Assignment database, health program analysis, and an Ontology Manager.]
Fig. 3. Mobile U-care System Structure

The framework for storage and management of bio data is the central element that stores and manages the data collected by the sensors and the data collection framework, as shown in Fig. 4. It processes u-health data, user-related data, and service specifications. In addition, throughout this framework we analyze disease-related issues, symptoms, and their relationships using a semantic representation model based on the u-health data ontology. We can thus offer a user-friendly environment for the development of various u-health services and contents that meets modern user requirements.
Fig. 4. Bio Data Storing and Management Framework

Professional service developers can store their own services for various u-health applications using the ontology editor shown in Fig. 4. In addition, application developers can find the most suitable service for their own u-health application by using the service broker of the ontology manager prepared in this scheme. Through these processes, we can offer personalized u-health services and a reliable development environment for further or new applications of the evolving scheme. The framework for analysis of bio data is shown in Fig. 5. We can detect feces or urine events for a patient by analyzing newly created data with the previously prepared method. As shown in Fig. 5, service developers and application developers can perform their development jobs over the link provided by the service broker of the scheme. After service developers generate service units and load them, application developers may construct the corresponding service processes using service units recommended through the service broker. We can realize a large process from stored process elements through the process template, while any sub-process can be realized directly by bringing service units from the service pool.
Fig. 5. Bio Data Analysis Framework

The mobile u-health service framework shown in Fig. 6 is a kind of middleware supporting data acquisition, storage and management, and analysis as an integrated service. This framework offers any kind of application service loaded on our u-health platform not only to terminals but also as a web service. Although there are some overlaps, we can assign elements such as the client device tier, business logic tier, and data management tier to this framework, as shown in Fig. 6.
Fig. 6. Mobile U-health Service Framework
5
Implementation of the Mobile U-health Service System
To evaluate the mobile u-health system configuration, this paper demonstrates the feasibility of a concrete, real service by implementing a program in a test-bed environment. The disease prediction probability is calculated using the DCAP matrix together with user feedback mechanisms; as the available personal data accumulate, the service evolves and the reliability of diagnosis improves. As shown in Fig. 7, the system first checks the authenticity of the data sent from the terminal and stores them in the database. The ontology manager analyzes each bio/symptom datum to identify possible causes and compensates the corresponding weights. In the next phase, the adjusted weights are used as input data to the DCAP matrix, which, operating together with its learning set, calculates the probability for each disease range. Periodic updates of the learning set allow more accurate measurements over time.
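The excerpt does not define the DCAP matrix in detail, but the computation described – adjusted symptom weights fed into a disease matrix to yield per-disease probabilities – can be sketched generically. Everything below is an illustrative assumption, not the paper's actual formula.

```python
def disease_probabilities(weights, matrix, diseases):
    """Score each disease as the dot product of the adjusted symptom
    weights with that disease's row of a symptom-association matrix,
    then normalize the scores so they sum to 1 (a generic sketch of
    the weighted-matrix idea; not the actual DCAP definition)."""
    scores = [sum(w * a for w, a in zip(weights, row)) for row in matrix]
    total = sum(scores)
    return {d: s / total for d, s in zip(diseases, scores)}
```

Updating the weights from user feedback and refreshing the matrix from the learning set would then change these probabilities over time, which is the evolutionary behavior the paper describes.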
Fig. 7. Implementation of the Mobile U-health Service System
6
Conclusions

• We presented an evolvable mobile u-health service platform that pursues the six design goals of a u-health service platform.
• Flexibility, accessibility, evolvability, reusability, adaptability, and interoperability are the six design goals set for the mobile u-health service platform.
• BPMS, feedback-based disease group identification (the PCADP matrix), and ontology are the three key features, or technologies, of our platform.
• The three key features proved effective in meeting the six design goals.
• We confirmed the benefits of using the u-health service platform by developing a stress management service on it.
• Because the system evolves, the u-health service platform improves as more u-health services are developed and run on it.
References
[1] Kwon, J.-D.: Customer Characteristics Analysis of the Curing Expert System for the Dementia or Other Handicapped. AlphaInternet Co. Ltd. (2001)
[2] Han, D.-S., Ko, I.-Y., Park, S.-J.: A Study on Development of Mobile U-Health Service System. Final Report, ICU, Korea (2006)
[3] Han, D.-S., Ko, I.-Y., Park, S.-J.: Evolving Mobile U-Health Service Platform. Proceedings of Information Security Society 17(1), 11–21 (2007)
[4] Konstantas, D., Bults, R., Van Halteren, A., Wac, K., Jones, V., Widya, I., Herzog, R., Streimelweger, B.: Mobile Health Care: Towards a Commercialization of Research Results. In: Proceedings of the 1st European Conference on eHealth (ECEH 2006), Fribourg, Switzerland, pp. 12–13 (October 2006)
[5] Pappas, M., Coscia, C., Dodero, E., Gianuzzi, G., Earney, V.: A Mobile E-Health System Based on Workflow Automation Tools. In: Proceedings of the 15th IEEE Symposium on Computer-Based Medical Systems, pp. 271–276 (June 2002)
[6] Min, B.-W., Oh, Y.-S., Han, D.-S., Ku, J.-Y.: A Design of Mobile U-Health Service Platform. In: Proceedings of Fall 2009 Integrated Conference, vol. 7(1), pp. 797–801. Korea Contents Association (2009)
[7] Lee, H.-S., Bak, J.-H., Sim, B.-K., Lee, H.-O., Han, S.-W., Min, B.-W., Lee, H.-T.: Web-based Patient Monitoring System Using Wireless Diaper Wetness Sensor. In: Proceedings of ICCC 2008, vol. 6(2), pp. 652–660. Korea Contents Association (2008)
[8] Min, B.-W., Lee, H.-T., Oh, Y.-S.: USN Based Intelligent Urine Sensing U-Care System. In: Proceedings of Spring 2008 Integrated Conference, vol. 5(2), pp. 598–601. Korea Contents Association (2008)
[9] Min, B.-W., Oh, Y.-S.: Design of U-Healthcare Product Using Wetness Sensor. In: Proceedings of Spring 2007 Integrated Conference, vol. 3(2), pp. 144–147. Korea Contents Association (2007)
[10] Park, H.-G., Kim, H.-J., Lee, S.-J.: A Transmission Management System of Signal from Living Bodies Using ZigBee. In: Proceedings of 2008 Conference, vol. 32(1), pp. 526–528. Korea Computer Society (2005)
Design and Implementation of an Objective-C Compiler for the Virtual Machine on Smart Phone*

YunSik Son1 and YangSun Lee2,**

1
Dept. of Computer Engineering, Dongguk University 26 3-Ga Phil-Dong, Jung-Gu, Seoul 100-715, Korea
[email protected] 2 Dept. of Computer Engineering, Seokyeong University 16-1 Jungneung-Dong, Sungbuk-Ku, Seoul 136-704, Korea
[email protected]

Abstract. For each smart phone content platform, a unique development environment exists, and thus suitable development methods and languages must be used for each platform. A problem with this situation is that creating contents for a number of platforms increases expenses. The SVM (Smart Virtual Machine) is a virtual machine solution being developed to overcome this problem by using SIL (Smart Intermediate Language) as an intermediate language. SIL is capable of accommodating ISO/IEC C++, Java, Objective-C, and other object-oriented programming languages. In this paper, an Objective-C compiler for the virtual machine is designed and implemented which creates stack-based virtual machine codes, rather than object codes, so that contents previously developed for other platforms can be reused.

Keywords: Smart Intermediate Language, Smart Virtual Machine, Objective-C Compiler, Compiler Construction.
1
Introduction
Contents development environments for existing smart phones require object codes to be generated for a particular target machine or platform, and the development language used differs for each platform. Therefore, even if the same contents are to be used, they must be re-created for each target machine, and a compiler for that specific machine is needed, making the contents development process very inefficient. The SVM (Smart Virtual Machine) is a virtual machine solution that aims to resolve such problems; it executes programs after they are translated into SIL (Smart Intermediate Language), designed by our research team. In this study, a compiler that translates programs written in the Objective-C language for execution on the SVM is designed and implemented. In order to effectively implement *
This research was supported by Basic Science Research Program through the National Research Foundation of Korea(NRF) funded by the Ministry of Education, Science and Technology(No.20110006884). ** Corresponding author. T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 52–59, 2011. © Springer-Verlag Berlin Heidelberg 2011
the compiler, the program created in the Objective-C language is logically divided into a declaration part and a statement part and analyzed accordingly. This study introduces the Objective-C to SIL compiler in the following order. First, in Chapter 2, the SVM platform and SIL, its intermediate language, are introduced. Following this, the overall composition of the compiler is introduced and the individual modules are explained in Chapter 3. In Chapter 4, the implementation of the Objective-C to SIL compiler is described, and program sources provided by the iOS SDK are used for experiments. Finally, in Chapter 5, the results of the study and future research directions are provided.
2
Related Studies
2.1
SVM(Smart Virtual Machine)
The SVM is a platform loaded on smart phones. It is a stack-based virtual machine solution which can independently download and run application programs. The SVM consists of three main parts: compiler, assembler, and virtual machine. It is designed in a hierarchical structure to minimize the burden of the re-targeting process. Fig. 1 shows the composition of the SVM system.
Fig. 1. SVM System Configuration
The SVM is designed to accommodate procedural languages, object-oriented languages, etc., through the input of SIL (Smart Intermediate Language) as its intermediate language. It has the advantage of accommodating C/C++ and Java, the languages most widely used by developers. SIL is the result of the compilation/translation process, and it is changed into the executable format SEF (SIL Executable Format) by an assembler. The SVM then runs the program after receiving the SEF.

2.2
SIL(Smart Intermediate Language)
SIL, the virtual machine code for the SVM, is designed as a standardized virtual machine code model for common smart phones and embedded systems [1]. SIL is a stack-based instruction set that is independent of language, hardware, and platform. In order to accommodate a variety of programming languages, SIL is defined based on an analysis of existing virtual machine codes such as bytecode [2], .NET IL [3], etc. In addition, it has operation codes to accommodate object-oriented languages and procedural languages. SIL is composed of meta codes (describing class declarations and specific operations) and operation codes (corresponding to actual commands). SIL's operation codes are classified into seven categories, as can be seen in Fig. 2, and each category has its own detailed subcategories.
Fig. 2. Category for SIL Operation Code
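To make the stack-based execution model concrete, here is a toy interpreter for a few SIL-style instructions. The mnemonics are modeled loosely on the listings shown later in Section 4; this is not the real SIL instruction set, operand encoding, or semantics.

```python
def run_sil(program, env):
    """Execute a list of (opcode, operands...) tuples on an operand
    stack, reading and writing named locals in env (a toy model of a
    stack-based VM; real SIL addresses locals by level/offset)."""
    stack = []
    for op, *args in program:
        if op == "ldc.i":          # push an integer constant
            stack.append(args[0])
        elif op == "lod.i":        # push a local variable's value
            stack.append(env[args[0]])
        elif op == "str.i":        # pop into a local variable
            env[args[0]] = stack.pop()
        elif op == "add.i":        # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "ne.i":         # push 1 if operands differ, else 0
            b, a = stack.pop(), stack.pop()
            stack.append(int(a != b))
        else:
            raise ValueError("unknown opcode: " + op)
    return stack

# val = a + 10, compiled to stack code:
prog = [("lod.i", "a"), ("ldc.i", 10), ("add.i",), ("str.i", "val")]
env = {"a": 5}
run_sil(prog, env)
```

The same post-order discipline (push operands, then apply the operator) is visible in the generated SIL listings of Section 4.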
3
Objective-C to SIL Compiler
The Objective-C language can be logically divided into two parts: the declaration part and the statement part [4]. The declaration part defines the data structures of the program, while the statement part describes the algorithm used in the problem-solving process. In this study, the Objective-C to SIL compiler was designed based on these characteristics, and as can be seen in Fig. 3 it has four parts and nine detailed modules.
Fig. 3. Objective-C to SIL Compiler Model
The Objective-C to SIL compiler embodies the characteristics of the Objective-C language and was therefore designed with four parts: syntax analysis, symbol information collection, semantic analysis, and code generation. The details of each part are as follows.
Design and Implementation of an Objective-C Compiler for the Virtual Machine
3.1
Syntax Analysis Part
The syntax analysis part carries out syntax analysis of the given input program (*.m) and converts it into an AST (Abstract Syntax Tree) with equivalent semantics. There are three main steps in the syntax analysis part: lexical analysis, syntax analysis, and error recovery [5,6,7]. Details of each step are as follows. Lexical analysis is the process of disassembling the given input into tokens; the Objective-C to SIL compiler implemented in this study can recognize a total of 115 types of tokens. Syntax analysis is the process of analyzing a program's syntax. First, the core syntax of Objective-C is expressed as a grammar that can be recognized by an LALR(1) parser. This grammar is fed into a PGS (Parser Generating System) to create a parsing table. Using the parsing table, the parser operates with four routines – shift, reduce, accept, and error – and syntax-directed translation is applied during parsing to create an AST. Error recovery is the process of handling errors that occur during syntax analysis [8,9,10]. Errors are handled through three methods: panic mode, insertion handling, and deletion handling.

3.2
Symbol Information Collection Part
The symbol information collection part consists of symbol information collection routines and a symbol table. First, the symbol information collection routines traverse the AST and save the information obtained into the symbol table. The routines handle interfaces, protocols, class members, ordinary declarations, and others, reflecting the characteristics of the Objective-C language. Next, the symbol table is used to manage the symbols (names) and the information on the symbols within a program. To reflect the characteristics of the Objective-C language, it is composed of three logical table groups – Window, Storage, and User-defined type – and these groups are further segmented into seven detailed tables: Symbol, Concrete, Abstract, Type, Aggregate, Member, and Link.

3.3
Semantic Analysis Part
The semantic analysis part is composed of the declarations semantic analysis module and the statements semantic analysis module. The declarations semantic analysis module checks the symbol information collected at the AST level to detect cases which are grammatically correct but semantically incorrect. Semantic analysis of the declarations part distinguishes two kinds of results: semantic errors and semantic warnings. The statements semantic analysis module uses the AST and the symbol table to carry out semantic analysis of statements and creates a semantic tree as a result [11]. It is made up of two parts: the semantic analysis module, which visits the AST to check whether each operation is semantically correct, and the tree conversion module, which converts the tree into a form that makes code generation easy.
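A minimal sketch of the statements semantic analysis idea – visiting AST nodes, checking types, and producing annotated semantic-tree nodes – might look like this. The node shapes and symbol table layout are invented for illustration; the actual compiler's data structures are far richer.

```python
# AST nodes as tuples: ("num", 3), ("var", "x"), ("add", left, right).
# analyze() returns the same node annotated with its type as the last
# element, i.e. a tiny "semantic tree" node.
def analyze(node, symtab):
    kind = node[0]
    if kind == "num":
        return ("num", node[1], "int")
    if kind == "var":
        if node[1] not in symtab:
            raise NameError("undeclared variable: " + node[1])
        return ("var", node[1], symtab[node[1]])
    if kind == "add":
        left = analyze(node[1], symtab)
        right = analyze(node[2], symtab)
        if left[-1] != right[-1]:
            raise TypeError("operand types differ")
        return ("add", left, right, left[-1])
    raise ValueError("unknown node kind: " + kind)
```

The annotated tree carries everything the code generator needs, which is exactly the role the semantic tree plays in this compiler.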
3.4
Code Generation Part
The code generation part receives the semantic tree as input after all analysis is complete and generates SIL code which is semantically equivalent to the input program (*.m). The code generator visits each node of the semantic tree and converts it into SIL code; it consists of two main parts, the declarations code generation module and the statements code generation module. In the declarations module, each declaration's structure and symbol table entry are analyzed and SIL code for that declaration is generated. In the statements module, codes are generated for all operators and operands within the statements.
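A post-order code generator for expressions can be sketched as follows. The emitted mnemonics imitate the SIL listings shown in Section 4, but the tree shapes and operand conventions here are simplified assumptions, not the compiler's actual interfaces.

```python
def gen(node, code):
    """Walk an expression tree in post-order, appending SIL-style stack
    instructions to code (sketch; operands use a fixed lexical level 1
    and byte offsets, loosely imitating the listings in Section 4)."""
    kind = node[0]
    if kind == "num":                       # ("num", value)
        code.append("ldc.i %d" % node[1])
    elif kind == "var":                     # ("var", offset)
        code.append("lod.i 1 %d" % node[1])
    elif kind == "add":                     # ("add", left, right)
        gen(node[1], code)
        gen(node[2], code)
        code.append("add.i")
    elif kind == "assign":                  # ("assign", offset, expr)
        gen(node[2], code)
        code.append("str.i 1 %d" % node[1])
    return code

# val = a + 10, with 'a' at offset 0 and 'val' at offset 4:
sil = gen(("assign", 4, ("add", ("var", 0), ("num", 10))), [])
```

Because operands are emitted before their operator, the resulting code runs directly on a stack machine with no register allocation, which is the main simplification a stack-based target buys.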
4
Implementation and Experiments
To implement the Objective-C to SIL compiler, the language's grammar was first chosen, and from it an LALR(1) parsing table was created. The grammar used was based on Objective-C 2.0; information on the grammar, parsing table, and trees can be seen in Table 1.

Table 1. Objective-C Grammar, Parsing Table, Tree Information

  Name                  Count    Name                    Count
  Grammar Rules           356    Parsing Table Kernels     574
  Terminal Symbols        115    AST Nodes                 169
  Nonterminal Symbols     149    Semantic Tree Nodes       248
Next, we show the process of converting the source program (written in the Objective-C language) into the target code, SIL, using the implemented Objective-C to SIL compiler. Table 2 presents an example program illustrating the characteristics of Objective-C declarations and syntax.

Table 2. Example Program (VolumeTest.m)

…
@interface Volume : NSObject {
    int val;
    int min, max, step;
}
- (id)initWithMin:(int)a max:(int)b step:(int)s;
- (int)value;
- (id)up;
- (id)down;
@end

@implementation Volume
- (id)initWithMin:(int)a max:(int)b step:(int)s {
    self = [super init];
    if (self != nil) {
        val = min = a;
        max = b;
        step = s;
    }
    return self;
}
…
@end
Table 3 shows the AST and semantic tree structures generated from the input program. The syntax is expressed using the AST nodes defined earlier, and the semantic information and code-generation information added to the semantic tree can be seen as well.

Table 3. AST and Semantic Tree for an Example Program Segment
[tree diagrams not reproduced here]

Table 4 shows a part of the SIL code that has been generated using the semantic tree.

Table 4. Generated SIL Code for Example Program

%%HeaderSectionStart
…
%%HeaderSectionEnd
%%CodeSectionStart
%FunctionStart
.func_name &Volume::initWithMin$6
.func_type 2
.param_count 3
.opcode_start
proc 16 1 1
str.p 1 0
str.i 1 12
str.i 1 8
str.i 1 4
lod.p 1 0
ldc.p 0
add.p
ldp
lod.p 1 0
call &NSObject::init$5
sti.t
lod.p 1 0
ldc.p 0
add.p
ldi.p
ldc.i 0
ne.i
fjp ##0
lod.p 1 0
ldc.p 8
add.p
lod.i 1 4
sti.t
lod.p 1 0
ldc.p 4
add.p
lod.p 1 0
ldc.p 8
add.p
sti.t
lod.p 1 0
ldc.p 12
add.p
lod.i 1 8
sti.t
lod.p 1 0
ldc.p 16
add.p
lod.i 1 12
sti.t
%Label ##0
lod.p 1 0
ldc.p 0
add.p
retv.p
ret
.opcode_end
%FunctionEnd
…
%%CodeSectionEnd
%%DataSectionStart
…
%%DataSectionEnd
5
Conclusions and Further Research
A virtual machine is a technique that allows the same application program to be used even if the processor or operating system changes. It is a core technique for the recently booming smart phones, needed as an independent download solution. In this study, an Objective-C to SIL compiler was designed and implemented to run programs originally created for another platform on the SVM. The Objective-C language was logically divided into two parts, the declarations part and the statements part, and from these the compiler was built as four module groups that generate SIL code for the platform-independent SVM. As a result, programs developed as iOS contents could be run on the SVM using the compiler developed in this study, so the expense of producing such contents can be minimized. In the future, research on an Android Java to SIL compiler is needed so that Android contents can also run on the SVM. Further research on optimizers and assemblers for SIL programs is also needed so that generated SIL codes can run effectively on the SVM.
References
1. Yun, S.L., Nam, D.G., Oh, S.M., Kim, J.S.: Virtual Machine Code for Embedded Systems. In: International Conference on CIMCA, pp. 206–214 (2004)
2. Meyer, J., Downing, T.: Java Virtual Machine. O'Reilly (1997)
3. Lidin, S.: Inside Microsoft .NET IL Assembler. Microsoft Press (2002)
4. The Objective-C Programming Language, Apple, http://developer.apple.com/library/ios/#documentation/Cocoa/Conceptual/ObjectiveC/Introduction/introObjectiveC.html
5. Aho, A.V., Lam, M.S., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, & Tools. Addison-Wesley (2007)
6. Grune, D., Bal, H.E., Jacobs, C.J.H., Langendoen, K.G.: Modern Compiler Design. John Wiley & Sons (2000)
7. Oh, S.M.: Introduction to Compilers, 3rd edn. Jungik Publishing, Seoul (2006)
8. Cerecke, C.: Repairing Syntax Errors in LR-Based Parsers. In: Proceedings of the 25th Australasian Conference on Computer Science, vol. 4, pp. 17–22 (2002)
9. Oh, S.M., Kim, J.S.: Extension of SG Compiler. Project Report, Research Center for Information Communication, Dongguk University (2001)
10. Kim, I.S., Choe, K.M.: Error Repair with Validation in LR-Based Parsing. ACM Transactions on Programming Languages and Systems 23(4), 451–471 (2001)
11. Son, Y.S.: 2-Level Code Generation Using Semantic Tree. Master Thesis, Dongguk University (2006)
12. Aho, A.V., Johnson, S.C.: LR Parsing. ACM Computing Surveys 6(2), 99–124 (1974)
13. Barth, J.M.: A Practical Interprocedural Data Flow Analysis Algorithm. Communications of the ACM 21(9), 724–736 (1978)
14. Gough, J.: Compiling for the .NET Common Language Runtime (CLR). Prentice-Hall (2002)
15. Graham, S.L., Haley, C.B., Joy, W.N.: Practical LR Error Recovery. In: Proceedings of the SIGPLAN Symposium on Compiler Construction, SIGPLAN Notices, vol. 13(8), pp. 168–175 (1979)
16. Kim, Y.G., Kwon, H.J., Lee, Y.S.: Design and Implementation of a Decompiler for Verification and Analysis of Intermediate Code in ANSI C Compiler. Journal of Korea Multimedia Society 10(3), 411–419 (2007)
17. Knuth, D.E.: The Genesis of Attribute Grammars. In: ACM Proceedings of the International Conference on Attribute Grammars and Their Applications, pp. 1–12 (1990)
18. Lee, G.O.: Prediction of Reduction Goals: Deterministic Approach. Journal of Korea Institute of Information Scientists and Engineers 30(5-6), 461–465 (2003)
19. Lee, Y.S., Oh, S.M., Kim, Y.G., Kwon, H.J., Son, Y.S., Park, S.H.: Development of ANSI C Compiler for Embedded Systems. Industry-Academia Cooperation Foundation of Seokyeong University (2004)
20. Lee, Y.S., Oh, S.M., Bae, S.M., Son, M.S., Son, Y.S., Shin, Y.H.: Development of C++ Compiler for Embedded Systems. Industry-Academia Cooperation Foundation of Seokyeong University (2006)
The Semantic Analysis Using Tree Transformation on the Objective-C Compiler*

YunSik Son1 and YangSun Lee2,**

1
Dept. of Computer Engineering, Dongguk University 26 3-Ga Phil-Dong, Jung-Gu, Seoul 100-715, Korea
[email protected] 2 Dept. of Computer Engineering, Seokyeong University 16-1 Jungneung-Dong, Sungbuk-Ku, Seoul 136-704, Korea
[email protected]
Abstract. Semantic analysis is a process which checks the validity of the meanings created by combining a program's constituents, and it has become an indispensable component of compiler construction. It is usually performed with the attribute grammar method or the manual method; however, these methods have limitations in terms of efficiency or automation. In this study, to make up for these drawbacks, a semantic tree which includes the analyzed information is defined, and a technique is proposed to convert the abstract syntax tree used in most compilers – the result of syntax analysis – into this semantic tree. The semantic tree transformation technique processes semantic analysis at the level of individual semantic nodes, so the semantic analysis is carried out consistently and efficiently. In addition, the tree transformation makes changing the data structures, and automating the analysis, very simple.

Keywords: Semantic Tree, Tree Transformation, Semantic Analysis, Compiler Construction, Objective-C Compiler, Abstract Syntax Tree.
1
Introduction
Semantic analysis refers to a process that checks the validity of the meaning of a program and belongs to the front end of a compiler. The semantic analysis process also reports an error message to the programmer when the syntax of a program is correct but its semantics are not, so that correct programming can take place. Furthermore, it collects data for code generation, increasing the efficiency of code creation and helping correct code to be produced. Semantic analysis is generally performed using the attribute grammar method or the manual method. The attribute grammar method can analyze meanings
This research was supported by Basic Science Research Program through the National Research Foundation of Korea(NRF) funded by the Ministry of Education, Science and Technology(No.20110006884). ** Corresponding author. T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 60–68, 2011. © Springer-Verlag Berlin Heidelberg 2011
consistently; however, the downsides are that it is somewhat less efficient, and if the grammar changes, the analysis routines tied to that grammar must change at the same time. On the other hand, it is difficult to analyze with consistency using the manual method [1]. This study defines the semantic tree, a data structure for semantic analysis, and proposes a semantic analysis technique that transforms the intermediate language from an abstract syntax tree into a semantic tree. The semantic tree, the result of the semantic analysis process, reflects both syntax information and semantic information at the same time, so that it serves as a data structure from which semantic analysis and code generation can be carried out efficiently. The tree transformation method converts the AST (Abstract Syntax Tree) into a semantic tree so that the semantic analysis relevant to each AST node is carried out with consistency. The proposed technique applies a transformation method to each node after syntax analysis has produced the AST; it is more efficient than the attribute grammar method and analyzes with equal consistency.
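The per-node transformation idea can be sketched with a small rule table: each AST node kind has one transformation rule that rewrites it into a semantic node carrying its semantic information, so the analysis stays consistent node by node and new node kinds only require new rules. The node shapes below are hypothetical, not the paper's actual structures.

```python
# Registry mapping each AST node kind to its transformation rule.
RULES = {}

def rule(kind):
    """Decorator that registers a per-node transformation rule."""
    def register(fn):
        RULES[kind] = fn
        return fn
    return register

def transform(node, env):
    """Dispatch an AST node (a tuple) to its rule, producing a
    semantic-tree node (a dict with semantic information attached)."""
    return RULES[node[0]](node, env)

@rule("lit")
def _lit(node, env):
    return {"kind": "lit", "value": node[1], "type": "int"}

@rule("name")
def _name(node, env):
    return {"kind": "name", "id": node[1], "type": env[node[1]]}

@rule("mul")
def _mul(node, env):
    left = transform(node[1], env)
    right = transform(node[2], env)
    assert left["type"] == right["type"], "operand type mismatch"
    return {"kind": "mul", "left": left, "right": right, "type": left["type"]}
```

Because every rule has the same shape (AST node in, checked semantic node out), the transformation can be driven mechanically over the whole tree, which is the consistency and automation benefit the paper claims over hand-written analysis.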
2
Related Studies
2.1
Intermediate Language
The intermediate language is a concept that developed as compilers were studied. It plays the role of connecting all the modules that constitute a compiler. An intermediate language is designed according to the characteristics of a compiler, and various forms exist, such as Polish notation, 3-address code, tree-structured code, abstract machine code, etc. Intermediate languages, along with portable compilers, are essential for increasing portability. They facilitate more efficient translation by bridging the semantic gap between high-level languages and low-level codes and by expressing programs simply. Most recent compilers use an AST as the intermediate language. ASTs have the form of trees and express a program's meaning efficiently. In particular, ASTs can be created simply using the syntax-directed method during syntax analysis, and they express programs' syntactic structures concisely by eliminating unnecessary information [2,3].

2.2
Semantic Analysis
Semantic analysis is the process of validating the meaning of a program's syntactic structures during compilation. It carries out type checking, data flow analysis, and control flow analysis, and analyzes the characteristics specific to each programming language. In general there are two ways to implement the semantic analysis component of a compiler. One is the attribute grammar method, in which the attributes of the programming language are described and evaluated. The other is the manual method, in which the meanings are interpreted and computed by hand-written code. If the semantic analyzer is built from an attribute grammar, attributes can be collected and analyzed consistently according to the production rules; however, separate attribute evaluators are needed to handle the symbols, which makes the process more complex. In addition, changing the semantic rules to follow grammatical changes is expensive, and complicated programming language structures are difficult to analyze. The manual method handles the harder semantic analyses that cannot be done with ordinary techniques such as attribute grammars, interpreting and analyzing the characteristics of symbols and the flow of data. It builds an interpreter over an intermediate language, typically a tree. This method is efficient for an individual analysis, but it is inefficient when several analyses must be performed at once, and a separate analysis model is needed whenever an additional semantic analysis is required.
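The manual method described above can be illustrated with a toy hand-written checker: symbols are entered into a table by ordinary code and each use is validated directly, instead of being derived from attribute-grammar equations. All names here are hypothetical, and the check is reduced to exact type identity (a real checker would also consult conversion rules).

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// A hand-written ("manual method") semantic check over a symbol table.
class SymbolTable {
    std::map<std::string, std::string> types_;  // name -> type name
public:
    void declare(const std::string& name, const std::string& type) {
        if (!types_.emplace(name, type).second)
            throw std::runtime_error("redeclaration: " + name);
    }
    // Type check for "lhs = rhs": both symbols must be declared, and their
    // types must match exactly in this toy version.
    bool checkAssign(const std::string& lhs, const std::string& rhs) const {
        auto l = types_.find(lhs), r = types_.find(rhs);
        if (l == types_.end() || r == types_.end())
            throw std::runtime_error("undeclared symbol");
        return l->second == r->second;
    }
};
```

The point of the sketch is the contrast: every rule lives in imperative code, which is flexible for hard analyses but must be rewritten by hand whenever the language changes.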
3  Tree Transformation
3.1  Semantic Tree
The semantic tree is a binary-tree data structure that contains the semantic information obtained from the AST (Abstract Syntax Tree). It is defined on top of the AST, which expresses a program's syntactic structure efficiently, and includes all the characteristics of an AST. The semantic tree's position within the compilation process is shown in Fig. 1 [4].
Fig. 1. Semantic Analyzer in Compiler Front-End
A semantic tree expresses a program's semantic information structurally; its basic unit is the semantic node. Semantic nodes carry the semantic information of each symbol, and the structural expression of a program results from combining semantic nodes with an AST. Semantic nodes are classified by their role in the semantic analysis process, into classes such as reference, conversion, and operator nodes. Besides collecting attribute information, semantic analysis must also collect the additional information needed for code generation, and this information is held in the individual semantic nodes. Both kinds of information are obtained through the tree transformation method when an AST is used for semantic analysis. For a semantic tree to express correct semantic and syntactic information, its structure and semantic nodes must be unique for a given program. This is a prerequisite for doing semantic analysis with the tree transformation method: if a program's semantic tree structure and semantic nodes are unique, the tree transformation method can be applied consistently.

3.2  Tree Transformation Method
The tree transformation method converts the AST into a semantic tree in order to carry out semantic analysis, and the analysis results are applied to the semantic tree. Fig. 2 shows the overall tree transformation model.
Fig. 2. Tree Transformation Model for Compiler Construction
The AST expresses a program's syntactic information and refers to a symbol table that holds all the information on symbols. This information is defined on the AST's individual nodes, which are converted into a semantic tree by the transformation method during the semantic analysis process. The tree transformation method used for semantic analysis mainly consists of a symbol attribute calculation method, a type conversion method, a node conversion method, and a flow control method used to analyze a program's control flow. First, the symbol attribute calculation method uses the symbol table and the AST. The symbol table stores all the information on symbols that the semantic nodes use when describing or analyzing symbol attributes, and each semantic node stores the computed attributes of its symbol. Next, the type conversion method uses its transparency property to convert an AST node type into a semantic node type [5]. Type conversion for arithmetic operations applies synthesized attributes: the child nodes' type attributes are propagated upward according to conversion rules defined beforehand. In general, the node conversion method follows these rules as well. Finally, to analyze a program's control flow, the branch points of the program are traced and the flow is described on the tree. The AST nodes related to the branch points of the programming language are selected, the basic blocks are identified, and the nodes at each point are connected to express the control flow.
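The flow-control step — choosing branch points, partitioning statements into basic blocks, and connecting the blocks — can be sketched over a toy statement list in which labels end with `:` and branches are written `goto L`. This is an illustrative reconstruction, not the paper's actual algorithm.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct BasicBlock {
    std::vector<std::string> stmts;
    std::vector<int> succ;              // indices of successor blocks
};

static bool isLabel(const std::string& s)  { return !s.empty() && s.back() == ':'; }
static bool isBranch(const std::string& s) { return s.rfind("goto ", 0) == 0; }

// Split the statement list into basic blocks (a new block starts at a label
// and after a branch), then connect each block to its successors: the branch
// target, or otherwise the textually next block.
std::vector<BasicBlock> buildBlocks(const std::vector<std::string>& code) {
    std::vector<BasicBlock> blocks;
    std::map<std::string, int> labelOf;             // label name -> block index
    blocks.push_back({});
    for (const auto& s : code) {
        if (isLabel(s) && !blocks.back().stmts.empty())
            blocks.push_back({});
        if (isLabel(s))
            labelOf[s.substr(0, s.size() - 1)] = (int)blocks.size() - 1;
        blocks.back().stmts.push_back(s);
        if (isBranch(s))
            blocks.push_back({});
    }
    if (blocks.back().stmts.empty()) blocks.pop_back();
    for (int i = 0; i < (int)blocks.size(); ++i) {
        const std::string& last = blocks[i].stmts.back();
        if (isBranch(last))
            blocks[i].succ.push_back(labelOf[last.substr(5)]);
        else if (i + 1 < (int)blocks.size())
            blocks[i].succ.push_back(i + 1);
    }
    return blocks;
}
```

For `{"a = 1", "L:", "a = a + 1", "goto L"}` this yields two blocks, the second looping back to itself — exactly the kind of connected structure the flow control method expresses on the tree.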
4  Implementation
In this chapter, the proposed semantic analysis technique using tree transformation is applied to an actual compiler and tested. The compiler used for testing is an Objective-C compiler for smartphones of multiple vendors, under development to target virtual machines. The grammar and syntax analysis information for this compiler is shown in Table 1.

Table 1. Objective-C Grammar Information

Symbol Information: Terminal Count: 115 / Nonterminal Count: 149 / Tree Node Count: 169 / Synonym Count: 0
Rule Information: Number of Rules: 356 / Average Rule Length: 2
The grammar is defined by 356 production rules, with 115 terminal symbols and 149 nonterminal symbols. Finally, 169 nodes are defined as the nodes needed to compose an AST. To carry out the semantic analysis process using the proposed tree transformation method, the semantic nodes must be defined. A total of 248 semantic nodes were defined, based on the AST nodes used for syntax analysis. They can be classified into the three categories below.

1) Arithmetic operation nodes with types added
2) Type conversion nodes
3) Reference analysis nodes
Table 2 lists the semantic nodes for each arithmetic operation, based on the AST defined above. Arithmetic operation nodes with types added are based on the AST's arithmetic operation nodes and are expanded into semantic nodes according to the Objective-C types they hold. Unnecessary semantic node types were removed or substituted according to the semantics of each operation. The semantic nodes for type conversion can be seen in Table 3. To determine the type conversion nodes, a graph like Fig. 3 is first drawn up from the Objective-C types and their conversion characteristics; the type conversion nodes are then determined from this graph. The basic Objective-C type conversion nodes were defined as an N:N mapping in which every type is convertible to every other. They are used in the type conversion graph and in the tree transformation method during semantic analysis.
Table 2. Semantic Node for Operations

AST Node → Semantic Node
ADD / SUB → ADD / SUB(I, U, L, P, F, D)
MUL / DIV → MUL / DIV(I, U, L, F, D)
MOD → MOD(I, U, L)
NEG → NEG(I, L, F, D)
EQ / NE / GE / GT / LE / LT → EQ / NE / GE / GT / LE / LT(I, U, L, F, D)
LOGICAL_AND / LOGICAL_OR / LOGICAL_NOT / BITWISE_AND / BITWISE_OR / BITWISE_XOR / LEFT_SHIFT / COMP → AND / OR / NOT / BAND / BOR / XOR / SHL / BCOM(I, L)
RIGHT_SHIFT → [U]SHR(C, S, I, L)
Fig. 3. Type Conversion Graph for Objective-C

Table 3. Semantic Node for Type Conversion

Convert to   Semantic Node
char         CV(S, I, U, L, F, D)_C
short        CV(C, I, U, L, F, D)_S
int          CV(C, S, U, L, F, D)_I
unsigned     CV(C, S, I, L, F, D)_U
long         CV(C, S, I, U, F, D)_L
float        CV(C, S, I, U, L, D)_F
double       CV(C, S, I, U, L, F)_D
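The `CV(...)_X` naming scheme of Table 3 amounts to a simple mapping from a (source type, target type) pair to a conversion-node name. The following is a sketch, assuming the one-letter codes C, S, I, U, L, F, D stand for char, short, int, unsigned, long, float, double:

```cpp
#include <cassert>
#include <string>

// One-letter type codes as used in Table 3: C=char, S=short, I=int,
// U=unsigned, L=long, F=float, D=double.
std::string convNode(char from, char to) {
    if (from == to) return "";             // no conversion node needed
    return std::string("CV") + from + "_" + to;
}
```

For instance, converting an int operand to long yields the node name `CVI_L`, the conversion node that appears in the semantic tree of Table 7.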
The reference analysis nodes for dereferencing are as follows. They decide the l-value or r-value of a variable, and five nodes are responsible for special references, as shown in Table 4.

Table 4. Semantic Node for Reference Analysis

Semantic Tree Node: ADDR, VALUE, REFERENCE, OBJECT, THIS_OBJECT
The results below show the semantic analysis of a program using the tree transformation method, so that the semantic tree and its types can be examined. A typical program written in Objective-C was selected, shown in Table 5, and part of the testing process was extracted.
Table 5. Example Program

…
@implementation Volume
- (id)initWithMin:(long)a max:(int)b step:(int)s {
    self = [super init];
    if (self != nil) {
        val = min = a; max = b; step = s;
    }
    return self;
}
…

int main(void) {
    …
    id v, w;
    v = [[Volume alloc] initWithMin:0 max:10 step:2];
    w = [[Volume alloc] initWithMin:0 max:9 step:3];
    [v up];
    …
}
Table 6 shows part of the AST produced by syntax analysis of the program in Table 5. The Objective-C message-passing process is expressed in a structural manner.

Table 6. Result of Syntax Analysis (AST)

// v = [[Volume alloc] initWithMin:0 max:10 step:2];
Nonterminal: ASSIGN_OP
  Terminal( Type:id / Value:v )
  Nonterminal: MESSAGE_EXP
    Nonterminal: RECEIVER_PART
      Nonterminal: MESSAGE_EXP
        Nonterminal: RECEIVER_PART
          Terminal( Type:className / Value:Volume )
        Nonterminal: SELECTOR_PART
          Terminal( Type:id / Value:alloc )
    Nonterminal: SELECTOR_PART
      Nonterminal: KEYWORD_ARG_LIST
        Nonterminal: KEYWORD_ARG
          Terminal( Type:id / Value:initWithMin )
          Terminal( Type:int / Value:0 )
        Nonterminal: KEYWORD_ARG
          Terminal( Type:id / Value:max )
          Terminal( Type:int / Value:10 )
        Nonterminal: KEYWORD_ARG
          Terminal( Type:id / Value:step )
          Terminal( Type:int / Value:2 )
Next, the AST above was given as input and the tree transformation method was applied to each related AST node; the semantic tree created as a result is shown in Table 7. As a result of semantic analysis, symbol attributes were added to each node, and for pointer types the actual referenced type information was computed. It can be seen that where an argument's type differed from the parameter's, a type conversion node was added. Furthermore, the address information added for each parameter is used when generating object code.
Table 7. Result of Semantic Analysis (Semantic Tree)

// v = [[Volume alloc] initWithMin:0 max:10 step:2];
Nonterminal: ASSIGN_OP / opType:6 / targetType:58
  Terminal( Type:id / Value:v / opType:6 / targetType:58 / qualifier:0 / (b:1, o:16, w:4) / Tag:1 / Dim:0 )
  Nonterminal: MESSAGE_EXP / opType:6
    Nonterminal: RECEIVER_PART / opType:6
      Nonterminal: MESSAGE_EXP / opType:6
        Nonterminal: RECEIVER_PART / opType:6
          Terminal( Type:className / Value:Volume / opType:6 / targetType:67 )
        Nonterminal: SELECTOR_PART
          Terminal( Type:id / Value:alloc / opType:67 )
    Nonterminal: SELECTOR_PART
      Nonterminal: KEYWORD_ARG_LIST
        Nonterminal: KEYWORD_ARG / opType:4
          Terminal( Type:id / Value:initWithMin / opType:4 )
          Nonterminal: CVI_L / opType:4
            Terminal( Type:int / Value:0 / opType:3 )
        Nonterminal: KEYWORD_ARG / opType:3
          Terminal( Type:id / Value:max / opType:3 )
          Terminal( Type:int / Value:10 / opType:3 )
        Nonterminal: KEYWORD_ARG / opType:3
          Terminal( Type:id / Value:step / opType:3 )
          Terminal( Type:int / Value:2 / opType:3 )
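The CVI_L node that appears above the int literal 0 in Table 7 (the initWithMin: parameter is declared long) illustrates the general rule: when an argument's type differs from the declared parameter type, the transformation wraps the argument in the matching conversion node. A sketch of that rule follows, with a deliberately simplified `Node` type and the one-letter type codes of Table 3:

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Node {
    std::string kind;            // e.g. "CONST_INT", "CVI_L"
    std::vector<Node> kids;
};

// If an argument's type differs from the declared parameter type,
// wrap the argument in the corresponding CV conversion node.
Node coerceArg(const Node& arg, char argTy, char paramTy) {
    if (argTy == paramTy) return arg;    // types match: no node inserted
    Node cv;
    cv.kind = std::string("CV") + argTy + "_" + paramTy;
    cv.kids.push_back(arg);
    return cv;
}
```

Applied to the int constant 0 against a long parameter, the sketch produces a `CVI_L` node over the constant, mirroring the semantic tree above.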
5  Conclusions and Further Research
A semantic tree is a data structure defined for semantic analysis. It preserves the abstract syntactic structure of the AST, can express semantic information and target-machine-dependent information, makes semantic analysis efficient, and serves as a relatively convenient intermediate language for code generation. In this study, the semantic tree and the tree transformation method were used to design a semantic analysis technique, and the method was applied to and tested on the Objective-C programming language. In the experiments, the individual nodes reflected the program's semantic attribute values and were expanded into semantic nodes; during this process, type checking and the semantic analysis process were completed. The proposed technique will require further research on data structures to increase its efficiency while preserving the information being delivered. Further research is also needed on AST nodes, semantic nodes, and automation tools for semantic analyzers based on the tree transformation method's mappings.
References 1. Grune, D., Bal, H.E., Jacobs, C.J.H., Langendoen, K.G.: Modern Compiler Design. John Wiley & Sons (2000)
2. Aho, A.V., Lam, M.S., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, & Tools. Addison-Wesley (2007)
3. Oh, S.M.: Introduction to Compilers, 3rd edn. Jungik Publishing, Seoul (2006)
4. Brosgol, B.M.: TCOLAda and the Middle End of the PQCC Ada Compiler. In: Proceedings of the ACM-SIGPLAN Symp. on the Ada Programming Language, pp. 101–112 (1980)
5. Mitchell, J.C.: Coercion and Type Inference. In: 11th ACM Symp. on Principles of Programming Languages, pp. 175–185 (1984)
6. The Objective-C Programming Language, Apple, http://developer.apple.com/library/ios/#documentation/Cocoa/Conceptual/ObjectiveC/Introduction/introObjectiveC.html
7. Aho, A.V., Johnson, S.C.: LR Parsing. ACM Computing Surveys 6(2), 99–124 (1974)
8. Barth, J.M.: A practical interprocedural data flow analysis algorithm. Communications of the ACM 21(9), 724–736 (1978)
9. Kernighan, B.W., Ritchie, D.M.: The C Programming Language, 2nd edn. Prentice Hall (1988)
10. Knuth, D.E.: The Genesis of Attribute Grammars. In: ACM Proceedings of the International Conference on Attribute Grammars and Their Applications, pp. 1–12 (1990)
11. Knuth, D.E.: Semantics of context-free languages. Mathematical Systems Theory 2(2), 127–145 (1968)
12. Koskimies, K.: A specification language for one-pass semantic analysis. In: Proceedings of the 1984 SIGPLAN Symposium on Compiler Construction, pp. 179–189 (1984)
13. Lee, Y.-S., Kim, Y., Kwon, H.: Design and Implementation of the Decompiler for Virtual Machine Code of the C++ Compiler in the Ubiquitous Game Platform. In: Szczuka, M.S., Howard, D., Ślȩzak, D., Kim, H.-K., Kim, T.-H., Ko, I.-S., Lee, G., Sloot, P.M.A. (eds.) ICHIT 2006. LNCS (LNAI), vol. 4413, pp. 511–521. Springer, Heidelberg (2007)
14. Lee, Y.S., Oh, S.M., Bae, S.M., Son, M.S., Son, Y.S., Shin, Y.H.: Development of C++ Compiler for Embedded Systems. Industry-Academia Cooperation Foundation of Seokyeong University (2006)
15. Muchnick, S.S.: Advanced Compiler Design and Implementation. Morgan Kaufmann (1997)
16. Oh, S.M., Kim, J.S.: Extension of SG Compiler. Project Report, Research Center for Information Communication, Dongguk University (2001)
17. Paakki, J.: Attribute Grammar Paradigms – A High-Level Methodology in Language Implementation. ACM Computing Surveys 27(2), 196–255 (1995)
18. Sherman, M.S., Borkan, M.S.: A flexible semantic analyzer for Ada. In: ACM SIGPLAN Notices, Proceedings of the ACM-SIGPLAN Symposium on the Ada Programming Language, vol. 15(2), pp. 62–71 (1980)
19. Son, Y.S.: 2-Level Code Generation using Semantic Tree. Master Thesis, Dongguk University (2006)
20. Son, Y.S., Oh, S.M.: Construction of Enhanced Parser for Mobile Contents. In: MITA 2008, pp. 41–44 (2008)
21. Kim, Y.G., Kwon, H.J., Lee, Y.S.: Design and Implementation of a Decompiler for Verification and Analysis of Intermediate Code in ANSI C Compiler. Journal of Korea Multimedia Society 10(3), 411–419 (2007)
A Platform Mapping Engine for the WIPI-to-Windows Mobile Contents Converter∗

YangSun Lee1,** and YunSik Son2

1 Dept. of Computer Engineering, Seokyeong University, 16-1 Jungneung-Dong, Sungbuk-Ku, Seoul 136-704, Korea
[email protected]
2 Dept. of Computer Engineering, Dongguk University, 26 3-Ga Phil-Dong, Jung-Gu, Seoul 100-715, Korea
[email protected]
Abstract. Mobile communication companies in Korea each choose different mobile platforms, so developers have to create contents for each platform according to its characteristics, or undergo a converting process, in order to deliver game contents to consumers. In this paper, in order to resolve this problem, game contents for the existing mobile platform WIPI (Wireless Internet Platform for Interoperability) are analyzed, and a platform mapping engine is implemented to convert those game contents for use on a smart platform, Windows Mobile. A mobile contents converter system enables contents to be transferred to smart platforms within a short time, so that the time and money it takes to launch services for different mobile communication companies can be reduced.
1  Introduction
Due to the use of different mobile platforms by each of the mobile communication companies in Korea, mobile contents developers must repeat the development process to create versions of a game matched to the characteristics of the different smart phone platforms if they wish to service it. This has led to the need to convert contents that have already been developed for use on smart phone platforms. However, large amounts of time and cost go into analyzing one mobile game's sources and resources and then converting (porting and retargeting) it; the time and money that could be used to create new contents are spent servicing an existing product on different platforms [1-7]. To solve this problem, a platform mapping engine is implemented in this paper, so that game contents on WIPI (Wireless Internet Platform for Interoperability) – the feature phone platform – can be converted to Windows Mobile – the smart phone ∗
This research was supported by Basic Science Research Program through the National Research Foundation of Korea(NRF) funded by the Ministry of Education, Science and Technology(No.20100023644).
** Corresponding author.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 69–78, 2011. © Springer-Verlag Berlin Heidelberg 2011
platform. The platform mapping engine is a system that provides API functions which recreate the previous platform's execution environment using the target platform's wrapper functions. For this, the API functions, system variables, event environment, and so on are provided in the same forms, so that the converted source code can be easily understood and modified. In addition, the homogeneity of the execution environment increases reliability and stability during execution [16-20]. This contents converter system allows mobile game contents to be transferred to different platforms within a short period of time, so that the human resources, time, and expenses used to service the contents for different mobile communication companies can be saved.
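The wrapper-function idea — keeping WIPI's API signatures while delegating to the target platform — can be sketched with one of the kernel functions listed later in Table 2, MC_knlPrintk. The body shown here delegates to C stdio purely for illustration and simplifies the WIPI signature; the real engine delegates to Windows Mobile APIs.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

// Last formatted line, kept so the wrapper's effect can be observed.
static char lastLine[256];

// Wrapper keeping WIPI's API name and call form: translated source code
// keeps calling MC_knlPrintk unchanged, while the body delegates to the
// host platform's output facilities (here: vsnprintf + puts).
extern "C" int MC_knlPrintk(const char* fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(lastLine, sizeof lastLine, fmt, ap);
    va_end(ap);
    std::puts(lastLine);
    return n;
}
```

Because the name and call form are preserved, translated sources that call `MC_knlPrintk("val=%d", 7)` compile and run against the wrapper without modification.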
2  Related Studies
2.1  WIPI
WIPI (Wireless Internet Platform for Interoperability) is a standard legislated by the KWISF (Korea Wireless Internet Standardization Forum) and adopted by the KTTA (Korea Telecommunications Technology Association) as an application program execution environment for mobile communication platforms [7-8]. Because each mobile communication company used a different platform, contents developers bore the burden of repeated development, users' choices were restricted, and handset manufacturers were burdened in developing new phones. Thus a need for standardization arose, and as a result the Korean standard for wireless internet platforms was set. Figure 1 depicts the structure of the WIPI platform.
Fig. 1. System Configuration of the WIPI Platform
WIPI supports the C and Java languages for developing application programs. In the case of Java, bytecodes are compiled with an AOTC (Ahead-Of-Time Compiler) and then executed natively on each handset. The WIPI standard can be largely divided into the HAL (Handset
Adaptation Layer) and the basic API (Application Programming Interface). The HAL is a standardized hardware abstraction layer introduced to increase portability; through it, handsets carry out the abstraction process. Since it is hardware-independent, it can be executed without depending on the native system. Using only the standardized HAL and API, a WIPI runtime engine can be implemented, and a basic API – for both the C and Java languages – can be built on top of it. The basic API provides compatibility across the standardized platform; it is composed of C APIs and Java APIs to accelerate the creation of diverse application programs by developers.

2.2  Windows Mobile
Windows Mobile (later rebranded Windows Phone) is a mobile operating system developed by Microsoft. It is an embedded operating system based on Windows CE and is used in PDAs and smart phones, devices previously known as Pocket PCs. Windows Mobile 6 is a platform for mobile devices built on Windows CE 5.0; it supports hardware such as smart phones and PDAs (Personal Digital Assistants). Figure 2 shows the structure of a Windows Mobile system.
Fig. 2. System Configuration of the Windows Mobile Platform
Windows Mobile 6.5 is the result of applying the Windows desktop line to a Windows Mobile device. In this version, a considerable number of UIs were changed for use with a touch screen, the classic Pocket PC version supported previously and the seldom-used resolution versions were removed, and a reinforced, simpler Internet Explorer Mobile 6 (compared with Windows Mobile 6.1.4) is built in. Windows Mobile is based on Windows Embedded CE 5.2 and supports the .NET Compact Framework. The Windows Mobile platform offers higher security and diverse APIs such as Bluetooth and POOM (Pocket Outlook Object Model). It also includes a wide range of programming models, such as native code (CPP), managed code (C#), mobile web development, multithreading, and other device support. The development environment is similar to that of desktop Windows, allowing development time and cost to be reduced [9].

2.3  Existing Mobile Contents Converters
Until now, despite the invigoration of the mobile market, there has been little research on mobile contents converters, which has left few examples to refer to. Furthermore, converters for existing contents generally only support conversion between similar programming language environments or do not support automatic conversion at all; in reality, programmers have to carry out the converting process by hand. There has been a study on a mobile contents converter using XML that attempted to convert Java contents [1-4]. In that approach, the API functions used in the source codes to be converted are imitated and redefined as wrapper functions, so the source codes need not be converted while the same functions are used. There were studies on the mutual conversion of BREW C and WIPI C [10] and on converting GVM C into BREW C [11]; however, they were flawed in that the source codes were not converted automatically, and users had to intervene and convert them manually. On the other hand, studies on automatic conversion of mobile contents using a compiler writing system [15,16] have been attempted. Studies have suggested methods to increase the reusability of contents and enhance productivity by converting mobile C contents of the GVM platform into WIPI C, Java, or MIDP Java [14,15]. Other studies are under way to convert existing mobile contents for the rising smart phone operating systems such as Android and iOS [16-20]. Aside from these, the few other studies on mobile contents conversion mostly support conversion only under identical programming language environments, and thus their drawback is that they only support one-to-one conversion between mobile platforms.
3  A Platform Mapping Engine for the WIPI-to-Windows Mobile Contents Converter
3.1  Composition of the WIPI-to-Windows Mobile Contents Converter
A mobile contents converter system [16-20] allows contents from one platform to be automatically converted for use on another platform, by aligning the characteristics of the contents to those of the target platform. It converts contents within a short period of time, helping to reduce the time and expense required to provide the same contents for different mobile communication companies and platforms. The WIPI-to-Windows Mobile contents converter consists of a contents analyzer, a source translator, a resource converter, and a platform mapping engine; Figure 3 is a diagram of the converter. The contents analyzer analyzes the WIPI C contents given as input in source form and separates source codes from resource data; it also extracts the resource managing files that are used only in WIPI. The source translator translates WIPI source codes into source codes that carry out the same actions on Windows Mobile. The resource converter converts the image, sound, and other resource data used in WIPI into resource data formats usable on Windows Mobile; resource managing files used in WIPI are converted into ones usable on Windows Mobile. The platform mapping engine builds WIPI's execution environment on the Windows Mobile platform and provides the reserved words related to WIPI's API so that graphics and event environments can be used in the same way.
Fig. 3. WIPI-to-Windows Mobile Contents Converter System
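The four-component pipeline of Fig. 3 can be sketched as stages over a contents bundle. Everything here is hypothetical scaffolding: the analyzer stage separates sources from resources by file extension, and the translator and resource-converter stages are reduced to placeholders.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Contents {
    std::vector<std::string> sources;    // WIPI C source files
    std::vector<std::string> resources;  // images, sounds, ...
};

// Stage 1 (contents analyzer): split a raw file list into source codes
// and resource data -- here simply by the ".c" extension.
Contents analyze(const std::vector<std::string>& files) {
    Contents c;
    for (const auto& f : files) {
        if (f.size() >= 2 && f.compare(f.size() - 2, 2, ".c") == 0)
            c.sources.push_back(f);
        else
            c.resources.push_back(f);
    }
    return c;
}

// Stage 2 (source translator, placeholder): WIPI C -> Windows Mobile CPP,
// reduced here to renaming x.c to x.cpp.
std::string translate(const std::string& src) { return src + "pp"; }

// Stage 3 (resource converter, placeholder): format conversion elided.
std::string convertRes(const std::string& r) { return r; }
```

The real converter's stages are of course far richer (compiler-based translation, format conversion, mapping-engine linkage); the sketch only shows how the components compose.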
3.2  Platform Mapping Engine
The platform mapping engine converts the APIs for display, graphics, sound output, system variables, and event handlers used in WIPI contents' source codes into forms usable on Windows Mobile, the target platform. In this way, WIPI contents can be used on the Windows Mobile platform as application programs. For this, an execution environment identical to WIPI's is built, and on top of it wrapper functions execute WIPI's APIs, system variables, and so on in the same form, implementing WIPI's APIs with Windows Mobile's APIs. By doing so, the translated Windows Mobile source codes need no additional adjustment before execution, and understanding and modifying the source code is simplified because the APIs keep the same form as in WIPI. Figure 4 is a diagram of the platform mapping engine.

(1) Project file generation
The project files that make up a Windows Mobile application are managed by a Microsoft Visual Studio solution, and the actual sources are managed by a VC++ project file. For the sources to run, certain components must be included in the project: Windows Mobile's basic headers, the RegisterClass setup, the window procedure registration, and the WINAPI WinMain function. Through the platform mapping engine, headers for Windows Mobile and wrapper APIs for the WIPI C sources are added to the basic headers. The RegisterClass registration is needed to differentiate different contents within one Windows Mobile phone. The window procedure is responsible for handling actions such as drawing and events for the contents, and the WinMain function starts the contents, carrying out all of the actions above in order.
Fig. 4. System Configuration of a Platform Mapping Engine
(2) Event environments
In WIPI C, an event handler, handleCletEvent, is registered and used. Each event is defined as a certain type, and when an event occurs the handler is called automatically; two parameter variables provide additional information about the event to the handler. The platform mapping engine converts events that occur on Windows Mobile into the WIPI C event form; the events are then delivered to the event handlers defined in the translated source codes so that they can be handled. The handlers are set to be called when events occur for the WIPICAPP or WIPICKNL sources, in which WIPI C's APIs are defined. In this paper, only the timer event and the key input event among WIPI C's events were handled and implemented.

(3) Graphics environments
WIPI basically provides graphics through frame buffers: a main LCD frame buffer and an auxiliary LCD frame buffer are provided, and virtual LCD frame buffers are used internally to speed up output and ensure smooth rendering. When the graphics library functions of WIPI's API draw image data or text, they draw not into the actual LCD frame buffer but into a virtual LCD frame buffer; in that state, the output remains in the internal virtual buffer and does not appear on the actual LCD. To update the actual LCD, the library function MC_grpFlushLcd must be used.
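The virtual-LCD scheme just described — drawing lands in an off-screen buffer and becomes visible only when MC_grpFlushLcd copies it to the real frame buffer — can be sketched with toy byte buffers. Sizes and function names here are illustrative, not WIPI's actual definitions.

```cpp
#include <array>
#include <cassert>

// Toy 8x8, one-byte-per-pixel buffers standing in for the virtual and
// actual LCD frame buffers.
constexpr int W = 8, H = 8;
std::array<unsigned char, W * H> virtualLcd{};  // all drawing lands here
std::array<unsigned char, W * H> actualLcd{};   // what the screen shows

// WIPI-style drawing writes only to the virtual buffer...
void putPixel(int x, int y, unsigned char c) { virtualLcd[y * W + x] = c; }

// ...and nothing becomes visible until the flush copies it to the real
// LCD buffer, mirroring MC_grpFlushLcd's role.
void flushLcd() { actualLcd = virtualLcd; }
```

Until `flushLcd()` is called, the actual buffer is unchanged, which is exactly the behavior the engine must reproduce on Windows Mobile.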
Windows Mobile’s graphic functions have been made so that actual LCD buffers are generated, so when they are used, the outputs appear on the screen right away. For platform mapping engines support graphic output functions in the same way as WIPIs do, they use Windows Mobile’s API to generate a virtual LCD buffer. Also graphic functions identical to WIPI C’s graphic library functions have been designed to create the virtual LCD buffer’s images, figures and texts. The function MC_grpFlushLcd function has been designed to use virtual LCD buffer’s information so that deliver it to actual LCD buffers and consequently generate LCDs in the same way as WIPI C. (4) Supporting extended data type, system variable type and library functions Platform mapping engines redefine WIPI extended data type before use. Because extended data types are needed even in the WIPI API provided by a platform mapping engine, they are not defined in the class source translation process but in the platform mapping engine. The platform mapping engine names them identically to WIPI C system’s variable types. Table 1 is a list of the variable types supported in the WIPI system.
Table 1. WIPI's Variable Types Supported

Name        Type
M_Boolean   boolean type
M_Uint32    unsigned int 32-bit type
M_Uint16    unsigned int 16-bit type
M_Uint8     unsigned int 8-bit type
M_Int32     int 32-bit type
M_Int16     int 16-bit type
M_Int8      int 8-bit type
M_Char      char type
M_Byte      byte type
M_Int64     int 64-bit type
M_Uint64    unsigned int 64-bit type
ulong64     unsigned long 64-bit type
Long64      long 64-bit type
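The extended types of Table 1 can be pinned to the listed widths with `<cstdint>`. The underlying C++ types chosen here are an assumption inferred from the table's widths; the engine's actual headers define them per target compiler.

```cpp
#include <cassert>
#include <cstdint>

// Fixed-width redefinitions of WIPI's extended types (names from Table 1;
// the exact underlying types are an assumption based on the listed widths).
typedef bool      M_Boolean;
typedef uint32_t  M_Uint32;
typedef uint16_t  M_Uint16;
typedef uint8_t   M_Uint8;
typedef int32_t   M_Int32;
typedef int16_t   M_Int16;
typedef int8_t    M_Int8;
typedef char      M_Char;
typedef uint8_t   M_Byte;
typedef int64_t   M_Int64;
typedef uint64_t  M_Uint64;
typedef uint64_t  ulong64;
typedef int64_t   Long64;
```

Defining these once in the mapping engine lets the translated sources use WIPI's type names unchanged, exactly as the text describes.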
Windows Mobile’s APIs were used in the implementation process so that they would carry out the same actions as the WIPI library functions. The WIPI library function is defined within the WIPICHEADER file, and since each header is inherited through translation of a translator, the WIPI library function within the source codes translated into the CPP language can be used in the same form as the original functions, thus requiring no additional conversion for use in the target platform. Table 2 is a list of the WIPI library functions supported.
Table 2. WIPI's Library Functions Supported

Class            Windows Mobile API
Kernel(8)        MC_knlPrintk, MC_knlGetResourceID, MC_knlCalloc, MC_knlGetResource, MC_knlDefTimer, MC_knlSetTimer, MC_knlUnsetTimer, MC_knlSprintk, MC_knlCurrentTime
Graphic(19)      MC_grpGetPixelFromRGB, MC_grpSetContext, MC_grpFillRect, MC_grpGetScreenFrameBuffer, MC_grpInitContext, MC_grpFlushLcd, MC_grpRepaint, MC_grpDestroyImage, MC_grpDrawImage, MC_grpCreateImage, MC_grpDrawRect, MC_grpPutPixel, MC_grpCreateOffScreenFrameBuffer, MC_grpCopyFrameBuffer, MC_grpDrawImageRegion
Media(5)         MC_mdaClipCreate, MC_mdaClipPutData, MC_mdaPlay, MC_mdaStop, MC_mdaSetVolume
Mathematics(9)   MC_mathAbs, MC_mathRand, MC_mathSin100, MC_mathCos100, MC_mathTan100, MC_mathArcSin100, MC_mathArcCos100, MC_mathArcTan100, MC_mathSrand

4  Experiment Results and Analysis
Using the platform mapping engine proposed in this paper, a WIPI-to-Windows Mobile contents converter was designed. With this converter, feature phone WIPI contents were converted into smartphone Windows Mobile contents and the results were compared. The emulators used to run the contents on each platform were the SKT WIPI Emulator and the Windows Mobile 6.1 Emulator. As the screens in Figure 5 show, WIPI contents converted by the WIPI-to-Windows Mobile contents converter run on Windows Mobile just as they do on WIPI.
Fig. 5. Comparison of a Content Execution Result
A Platform Mapping Engine for the WIPI-to-Windows Mobile Contents Converter
5 Conclusion
The mobile contents converter developed in this paper using a platform mapping engine is one way to solve the problem of converting mobile contents. By adding an automatic source code translator to this converter, a fully automatic mobile contents converter can be built. Automatic source code translators are systems that use compiler technology to translate source code between different platforms, taking system software and contents source code as input and producing translated output. With such a converter, contents conversion can be carried out automatically within a short period of time. This will shorten the time required to convert feature phone WIPI contents into smartphone Windows Mobile contents, while reducing expenses and enhancing productivity. To further improve converter performance, future work must study ways to increase execution speed and must include experiments in real environments using actual devices. This would make optimized graphic output, source code translation, and API provision possible for the specific platform and device used. The study will also be extended to contents converters for rapidly growing smartphone platforms such as Android, iOS (iPhone), Windows Phone 7, and bada by supplementing the converters' systems and functions.
A Trading System for Bidding Multimedia Contents on Mobile Devices∗

Young-Ho Park

Dept. of Multimedia Science, Sookmyung Women's University, 2-Ga Chung-Pa-Ro, Yong-San-Gu, Seoul, 140-742, Korea
[email protected]
Abstract. Recently, interest in digital contents and UCC (User Created Content) has been growing fast across heterogeneous Internet environments. However, this interest has brought many side effects. The representative problems are pervasive illegal copying and distribution of personal, valuable digital contents to unauthorized anonymous users. These weaken the motivation to create good contents by interfering with the growth of the information technology industry and the content providers' creative will. To resolve these problems, in this paper we propose a novel auction system and bidding process for multimedia contents. We call the system "MobileAuction". The system regards digital contents as physical materials and applies the concept of used goods to digital contents. In particular, the auction system is based on a mobile environment. We present a new model of the auction process for digital contents by analyzing the major algorithms among the main auction processes. Finally, although performance is not the focus of the presented system, the performance evaluation shows that the main auction process algorithms have logarithmic time complexity for insertions and searches. Therefore, the performance of the system is not significantly influenced by the amount of contents, even as the volume of contents in the system increases. The main contribution of this paper is a novel multimedia content auction system.

Keywords: Mobile Auction, Bidding System, Multimedia Contents.
1 Introduction
Today, thanks to the development of the Internet infrastructure, many tasks can be carried out while on the move. In this environment, users can easily create their own multimedia contents. There are many kinds of digital creations, called UCC (User Created Content). The pervasive UCC formats are classified as music, user-created videos, reports, photos, animations, advertisements, heterogeneous multimedia information, and so on. However, many multimedia digital contents are not regarded as ∗
This research was supported by Basic Science Research Program through the National Research Foundation of Korea(NRF) funded by the Ministry of Education, Science and Technology(No.20110002707).
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 79–88, 2011. © Springer-Verlag Berlin Heidelberg 2011
Y.-H. Park
commercial materials, based on the traditional way of thinking. Furthermore, there have been many illegal copies and unauthorized distributions of multimedia contents. Although multimedia contents are valuable, they can be devalued in this way. These problems reduce the content provider's creative motivation and power. To resolve them, this paper proposes a new trading process for multimedia contents. The proposed process is similar to the auction mechanism, the best-known commercial method for trading used physical goods when the price of the goods is not stable or firmly determined. The auction mechanism can induce an appropriate price for digital goods through a competitive bidding process. We adopt the auction process for trading multimedia contents. As an example, for any music content, a user can immediately and easily resell it as used content, restricted by a maximum play count. There are no restrictions such as logistics, changes of mind about the purchase, or value degradation of the goods. The auction for multimedia contents has various merits. First, even though the multimedia contents are used, they are always as good as new. Second, trading on MobileAuction is fast, since the trading process is completed immediately in the mobile environment. Third, the system increases content providers' creative power, since a content producer makes money by creating UCC. Last, a multimedia content is not material, so there is no logistical loss when digital goods are delivered to a buyer. Recently, there has been significant research on DRM (Digital Rights Management) [1] and PKI (Public Key Infrastructure) [2] to protect the ownership of multimedia contents. The proposed auction system uses DRM and PKI for the rights of multimedia contents. Therefore, the system can resolve disputes over digital rights [3][4].
In this paper, we propose MobileAuction, a brand-new auction system for mobile multimedia digital contents and a new business model. This paper makes the following novel contributions based on the model and the system:
• MobileAuction is a new mobile system that lets multimedia contents be traded on mobile devices.
• The paper proposes a new auction system and the processing steps for dealing with multimedia contents. This is a novel trading method for digital contents as a new business model.
This paper is organized as follows. Chapter 2 introduces related work and compares MobileAuction with other research and related systems. Chapter 3 presents the architecture of the proposed auction system. Chapter 4 describes the implementation on a mobile device. Finally, Chapter 5 concludes the paper.
2 Related Work
In this section, we introduce some related research and several commercial mobile auction systems.
A Trading System for Bidding Multimedia Contents on Mobile Devices
2.1 Related Researches
As research on auction systems, [5] proves the advantages of a P2P (Peer-to-Peer) auction system. The paper shows that the P2P method performs better than the centralized one: the price of an article at auction is determined within constant time, which is more efficient when there are many auction participants. However, [5] is an auction system only for material goods with physical logistics. MobileAuction adopts the P2P auction process and deals with multimedia digital contents. [6] proposes an auction closing time scheduling algorithm. Over thirty percent of the bids in an auction arrive in the last five percent of the auction's lifetime. This creates a surge in the load seen by auction sites as an auction's closing time approaches, and a site's performance degrades if it cannot cope with these load spikes. The paper proposes a new auction processing algorithm and a new bid processing algorithm. MobileAuction differs from that work, since our mobile multimedia auction system focuses not on improving auction performance but on proposing the architecture of a multimedia trading system.

2.2 Commercial Systems
In this section, we describe the differences between MobileAuction and commercial auction systems. The first category is commercial auction systems that deal with material goods on mobile devices; there are two examples. One is the representative system in the Republic of Korea, MpleOnline [7]. This system was originally developed for PCs (Personal Computers) and was recently redeveloped for the mobile environment. However, it differs from MobileAuction since its auction targets are restricted to material goods requiring time-consuming physical logistics. The other is "Opera Mini-ebay", an auction system on mobile devices developed by "ebay" [8] and "OperaSoftware" [9]. "Opera Mini" provides its customized web browser with "ebay", so users can access "ebay" on a mobile device anywhere, anytime. However, this auction system also deals only with material goods with physical logistics. The second category is web sites for trading multimedia contents. Representative examples, also in the Republic of Korea, are Joongangphoto [10] and Yonhapphoto [11]. These sites provide photo contents to other news sites or private persons. However, they trade only picture contents, as follows: the owner of a photo content determines its price and waits for it to sell. This trading method has the defect that photo prices cannot be stable. We resolve this problem by using auction processes. We consider MobileAuction the best approach for establishing a proper, reasonable price for a multimedia content whose price is vague.
3 The Mobile Multimedia Auction System
In this section, we describe the auction system architecture for multimedia contents and the bidding process. Section 3.1 presents the internal architecture of MobileAuction. Section 3.2 shows the three kinds of auction processes: a resale process, a bidding process, and a buy-it-now process.

3.1 System Architecture
In this section, we first describe the architecture and the trading method of the multimedia mobile auction system. The system includes a Mobile Device, the DCBSS (Digital Content Backup Storage Server), the Content Service System, the Multimedia Content Auction Server, the Multimedia Content Management System, and so on. Figure 1 shows the architecture of the MobileAuction system, a new multimedia mobile auction system with a brand-new business model.
Fig. 1. The Architecture of MobileAuction System
The Mobile Device defined in this paper includes all kinds of mobile devices that can handle digital multimedia contents, as mentioned in the Introduction section. As the communication method between the server and the client, P2P [12][13] is used so that roles can change: any mobile device in the system can be a provider or a receiver of multimedia contents. The Digital Content Backup Storage Server is called DCBSS. It keeps the detailed trading information, which includes the multimedia content itself, information on the content owner, the price of the multimedia content, and DRM or PKI information. DCBSS plays two roles in MobileAuction. First, the large volume of multimedia contents is stored in DCBSS instead of on mobile devices, which have relatively small memory space. Clients do not have to store large volumes of multimedia contents but can receive streaming services from DCBSS.
Second, if the user cannot download to his mobile device due to memory restrictions, he can later download the multimedia contents from DCBSS, since it keeps the history and the rights resulting from his purchase. DCBSS thus preserves the multimedia contents despite the various restrictions of mobile devices. The Content Service System manages information on multimedia contents, customers, and payments. It includes the Multimedia Content Database Server, the Customer Database Server, and the Payment Database Server. The Multimedia Content Management System is organized as a DRM (Digital Rights Management) [14] Server, a KMID (Korea Music Identification) Server, a Watermarking Server, a FingerPrint [15] Server, and a PKI (Public Key Infrastructure) [16][17] Server. It protects the owner's digital rights over a multimedia content. DRM is a system that manages ownership of multimedia contents. KMID is a standard code for Korean music files; it is an identifier granted to every music file in Korea, and the concept can be extended to each nation. Photo and video files are authorized using Watermarking or FingerPrint methods. PKI is a representative method for delivering digital contents on mobile devices, and much research has addressed PKI [16][17][18]. The Multimedia Content Auction Server manages bidding processes by using information from the Multimedia Content Management System and the Content Service System. The payment process and bidding process for a multimedia content are managed on the Multimedia Content Auction Server. Figure 2 shows the internal architecture between Mobile Devices and DCBSS in more detail. First, we describe the internal software architecture of mobile devices and DCBSS. The Mobile Device is organized as four agents: the Digital Rights Management Agent, the Usage Count Management Agent, the Content Agent, and the Communication Agent.
These agents interact with DCBSS and other mobile devices. The DRM (Digital Rights Management) Agent, UC (Usage Count) Agent, and Content Agent use the DR (Digital Right), UC (Usage Count), and Content information in the Content Pool located in DCBSS. Agents in the mobile device check the DR, UC, and Content before the contents are delivered. The DRM Agent interacts with the DR Manager in DCBSS and manages the digital rights of a multimedia content. The Usage Count Agent counts the usage of multimedia contents. The Content Agent manages the quality of a multimedia content and protects the contents from damage such as bit errors. The Communication Agent delivers the trading information between DCBSS and the Mobile Device through the mobile IP network. DCBSS includes the DR (Digital Right) Manager, the UC (Usage Count) Manager, the Content Manager, and the Content Pool. These managers in DCBSS complete the trading with the help of the agents in the Mobile Devices. If the content has no digital right, the DR Manager adds one to the digital content before delivery. The UC Manager checks the remaining usage count; if it is non-negative, the content is delivered to the client. Before delivering the multimedia content, the Content Manager merges the digital right and usage count into an extended multimedia content, which is then transferred to the Secure Agent and the Transfer Agent.
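The DCBSS-side checks just described can be condensed into a small sketch. This is our reading of the text, not the authors' code; the dictionary fields and the "seller" placeholder are hypothetical.

```python
# Illustrative sketch of the DCBSS delivery flow: the DR Manager adds a
# digital right if missing, the UC Manager checks the remaining usage count,
# and the Content Manager merges right and count into an "extended" content
# before handing it to the Secure and Transfer Agents.

def deliver(content, remaining_usage_count, digital_right=None):
    # DR Manager: add a digital right if the content has none.
    if digital_right is None:
        digital_right = {"owner": "seller", "scheme": "DRM"}  # hypothetical right
    # UC Manager: refuse delivery when the usage count is exhausted.
    if remaining_usage_count < 0:
        return None
    # Content Manager: merge right and count into an extended content.
    extended = {
        "content": content,
        "digital_right": digital_right,
        "usage_count": remaining_usage_count,
    }
    return extended  # would be passed to the Secure and Transfer Agents

granted = deliver("music.mp3", remaining_usage_count=3)
denied = deliver("music.mp3", remaining_usage_count=-1)
```

Bundling the right and count with the content itself mirrors Figure 3, where the DR, UC, and Content information travel together during delivery.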
Fig. 2. The Internal Architecture of Mobile Device and DCBSS
Next, Figure 3 describes the communication between DCBSS and the Mobile Devices. The multimedia content is transferred to the Secure Agent and the Transfer Agent, which transfer multimedia contents together with the DR (Digital Right), UC (Usage Count), and Content information in DCBSS. When a multimedia content is delivered, the DR, UC, and the content itself are transferred together.
Fig. 3. Communication between the DCBSS and the Mobile Devices
The Secure Agent in the middle of Figure 3 maintains the security of a multimedia content. For example, the multimedia content is encrypted with PKI (Public Key Infrastructure); it can be used only after the consumer decrypts it, at which point the content is activated. The Transfer Agent transfers multimedia contents between Mobile Devices and DCBSS through the mobile IP network. Mobile IP is a communication method developed by the IETF (Internet Engineering Task Force) [19] to support user movement in mobile environments [20].
3.2 Mobile Multimedia Auction Process
There are three auction processes in MobileAuction: a resale process, a buy-it-now process, and a bidding process. Figure 4 shows the three auction processes of MobileAuction.
Fig. 4. The Auction Process for Multimedia Contents
If a user does not want to use a multimedia content anymore, he can resell it on MobileAuction. The digital content is traded after the DRM process by the Multimedia Content Management System. The mobile payment is processed by the Payment Database Server in the Content Service System. The payment is processed on a mobile device, and the digital right is given to the multimedia content by the DRM server [21]. The usage count is checked by the UC Agent, and if the mobile payment [22][23] completes successfully, the transaction is done. The resale process for a multimedia content is as follows. If a user no longer wants a multimedia content, he can register it with the Multimedia Content Auction Server and sell it as used content. The multimedia content is processed by the Multimedia Content Management System, which applies DRM and PKI. Then, the remaining usage count is checked by the UC Agent. When the seller receives a purchasing request, the payment and commission process starts; it is handled by the Payment Database Server in the Content Service System. Finally, the ownership of the multimedia content is transferred to the new consumer. The buy-it-now process for a multimedia content is as follows. It is the same as the resale process from the start of the transaction to the DRM process. After the DRM process, the UC Agent checks the proposed usage count. When a new purchasing request is generated, the payment and commission process for buy-it-now starts. Lastly, the ownership of the multimedia content is transferred to the new consumer. The bidding process for a digital content is as follows. It is the same as the buy-it-now process from the start of the transaction to the check of the proposed usage count. In this process, the multimedia content is purchased through competitive bidding. If a new bid on a multimedia content succeeds, the payment is processed by the Content Service System and the Digital Content Management System. The bidding process is completed after the digital right is transferred to the bidder.
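The decision logic of the buy-it-now and bidding paths described above can be sketched as follows. The function names and the simple highest-bid-wins rule are our assumptions; the paper specifies the processing steps, not their code.

```python
# Illustrative sketch of the two purchasing paths, reduced to decision logic.
# DRM/PKI processing and usage-count checks are assumed to have already passed.

def buy_it_now(price, payment_ok):
    """Buy-it-now: after the DRM and usage-count checks, pay and transfer ownership."""
    if not payment_ok:
        return {"sold": False}
    return {"sold": True, "price": price, "ownership": "new consumer"}

def run_bidding(asking_price, bids):
    """Bidding: the content goes to the highest bid at or above the asking price."""
    winning = max(bids) if bids else None
    if winning is None or winning < asking_price:
        return {"sold": False}
    # Payment, commission, and transfer of the digital right follow.
    return {"sold": True, "price": winning, "ownership": "highest bidder"}

result = run_bidding(asking_price=100, bids=[90, 120, 110])
```

The competitive rule in `run_bidding` is what lets the auction discover a price for contents whose value is otherwise vague, the motivation given in Section 2.2.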
4 Implementation of Mobile Systems
In this section we show the real processing steps of MobileAuction, describing the main bidding processes of the system: the registering process, the bidding process, and the buy-it-now process for multimedia contents. Figure 5 shows the registering process for a multimedia content, and Figure 6 shows the bidding process. MobileAuction was implemented for Korean users; the same processes can be extended to other languages. Part (a) of Figure 5 shows the first screen of MobileAuction. The menu is composed of "Contents UP and DOWN", "Purchasing Photo", "Purchasing Video", "The way of using MobileAuction", and "My page". "Contents UP and DOWN" registers photo or video contents with the auction server.
(a) Main Menu (b) Start of Registering Process (c) DRM Process (d) End of Registering Process
Fig. 5. The Registering Process for a Multimedia Content
"Purchasing Photo" and "Purchasing Video" are for buying photo or video contents. "The way of using MobileAuction" describes how to use MobileAuction. "My Page" shows the member's personal information. Part (b) of Figure 5 shows the registering process for a photo content, part (c) shows the DRM process for the multimedia content, and part (d) shows the end of the registering process.
Part (a) of Figure 6 shows the starting screen for bidding on a multimedia content. When a user chooses a multimedia content, he enters the price he wants to bid, as in part (b) of Figure 6. If the price is acceptable, the bid completes successfully, as in part (c). The user then purchases the content, as in part (d).
(a) Start of Bidding for a Multimedia Content (b) Insertion of the Bidding Price (c) Success of Bidding Process (d) End of Bidding for a Multimedia Content
Fig. 6. The Bidding Process for a Multimedia Content
5 Conclusion
We have proposed a novel auction system for multimedia contents called MobileAuction. To trade multimedia contents at a reasonable price, we have presented a new auction process composed of a bidding process, a buy-it-now process, and a content registering process. The auction system is based on a mobile environment. We then presented implementation results for the system. With traditional UCC (User Created Content), users only share multimedia contents, and there is no compensation for content providers. This decreases the creation of valuable multimedia contents because the content provider's effort is not rewarded. The proposed auction system, in contrast, encourages content providers to create more valuable multimedia contents. MobileAuction can be a new business model based on immaterial multimedia contents. The importance of MobileAuction is that this new auction system can become the representative model for trading multimedia contents.
References
1. Kim, G., Shin, D., Shin, D.: An efficient methodology for multimedia digital rights management on mobile handset. IEEE Trans. on Consumer Electronics 50(4) (November 2004)
2. Cheung, T.-W., Chanson, S.T.: Design and Implementation of a PKI-based End-to-End Secure Infrastructure for Mobile E-Commerce. In: Proc. the IFIP TC6/WG6.1 21st Int'l Conf. on Formal Techniques for Networked and Distributed Systems, vol. 197, pp. 421–442 (2001)
3. Mark, B.: Internet Digital Rights Management Taxonomy. In: Proc. the IETF-51 (August 6, 2001)
4. Paul John, D., Butler, W.: Digital Rights Management Operating System. United States Patent 6,330,670 (December 11, 2001)
5. Ogston, E., Vassiliadis, S.: A peer-to-peer agent auction. In: Proc. the First Int'l Joint Conference on Autonomous Agents and Multiagent Systems, Part I, Italy, pp. 151–159 (July 2002)
6. Menascé, D.A., Akula, V.: Improving the Performance of Online Auction Sites through Closing Time Rescheduling. In: Proc. the First International Conference on the Quantitative Evaluation of Systems, pp. 186–194 (2004)
7. "Mple", http://www.mple.com
8. "ebay", http://www.ebay.com
9. "OperaSoftware", http://www.opera.com
10. "JoongAngilbo PHOTO ARCHIVE", http://photo.joins.com
11. "Yonhap Contents", http://sales.yonhapnews.co.kr
12. Hara, T., Madria, S.K.: Consistency Management among Replicas in Peer-to-Peer Mobile Ad Hoc Networks. In: Proc. the 24th IEEE Symposium on Reliable Distributed Systems (SRDS 2005), pp. 3–12 (2005)
13. Sumino, H., Ishikawa, N., Kato, T.: Design and implementation of P2P protocol for mobile phones. In: Proc. the Fourth Annual IEEE Int'l Conf. on Pervasive Computing and Communications Workshops (PERCOMW 2006), pp. 363–398. NTT DoCoMo Inc. (2006)
14. Abie, H., Spilling, P., Foyn, B.: A distributed digital rights management model for secure information-distribution systems. Int'l Journal of Information Security 3, 113–128 (2004)
15. Hartung, F., Ramme, F.: Digital Rights Management and Watermarking of Multimedia Content for M-Commerce Applications. IEEE Communications Magazine, 78–84 (November 2000)
16. Hadjichristofi, G.C., Adams, W.J., Davis IV, N.J.: A Framework for Key Management in Mobile Ad Hoc Networks. In: Proc. the Int'l Conf. on Information Technology: Coding and Computing (ITCC 2005), vol. 2, pp. 568–573 (April 2005)
17.
Wu, B., Wu, J., Fernandez, E.B., Magliveras, S.: Secure and Efficient Key Management in Mobile Ad Hoc Networks. In: Proc. the 19th IEEE Int'l Parallel and Distributed Processing Symposium (IPDPS 2005) Workshop, vol. 17 (2005)
18. Dankers, J., Garefalakis, T., Schaffelhofer, R., Wright, T.: Public key infrastructure in mobile systems. IEEE Electronics and Communication Engineering Journal 14(5), 180–190 (2002)
19. IETF, http://www.ietf.org
20. Hur, K., Roh, J.-S., Eom, D.-S., Tchah, K.-H.: The TCP Analysis of Packet Buffering in Mobile IP Network. Korea Association for Telecommunication Politics 28(5B) (2003)
Design of a Context-Aware Mobile System Using Sensors∗

Yoon Bin Choi¹ and Young-Ho Park²,**

¹ Dept. of Computer Engineering, MyongJi University, San 38-2 Namdong, Cheoin-gu, Yongin, Gyeonggido, 449-728, Korea
[email protected]
² Dept. of Multimedia Science, Sookmyung Women's University, 2-Ga Chung-Pa-Ro, Yong-San-Gu, Seoul, 140-742, Korea
[email protected]
Abstract. Recently, many smartphone applications have appeared that make life more convenient, such as navigation, recorders, and web browsers. However, we must launch these applications ourselves when we want to use them, and if a smartphone holds many applications, searching for one by screen touch is quite painful. In this paper, we present ASinM (Aware-Sensors in Mobiles), which becomes aware of our situation and phone usage patterns using sensors (e.g., GPS, accelerometer, compass, audio, and light) and then launches an appropriate application on the smartphone, saving us the trouble of searching for a particular application. First, ASinM uses the accelerometer to recognize the user's steps. It then judges whether the user is walking or running and launches a step counter application or another application chosen by the user. ASinM also uses GPS to obtain the user's speed and launches, for example, a navigation application when the speed is higher than human running speed. Second, ASinM recognizes the user's phone usage patterns and determines situations such as absence. The user can assign a particular application to each situation. We evaluate ASinM through real-world experiments using a prototype implementation on an Android-based smartphone (e.g., the Galaxy S) and show that it launches applications properly in several typical situations.

Keywords: ASinM, Context-aware, Smartphone, Android.
1 Introduction
Today, as smartphones have become a part of life, a variety of mobile applications have emerged that make our lives convenient. For instance, GPS-based applications (e.g., navigation) allow us to find the best path to a destination while driving and provide useful information (e.g., speed, traffic, and hazards). There are also many useful applications that give us a great opportunity to get the right information or data in various ∗
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (No. 20110002707).
** Corresponding author.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 89–96, 2011. © Springer-Verlag Berlin Heidelberg 2011
90
Y.B. Choi and Y.-H. Park
situations. Concretely, a 'Bus Information' application provides useful data such as bus schedules, route information, or current bus locations when you are at a bus stop, and a 'Product Search' application can look up the price of goods by scanning a barcode. However, to get these benefits we must launch each particular application ourselves. At the same time, because the smartphone is with us all the time, it can keep track of the user's usage, such as phone calls, messages, and applications; although much of this usage is repetitive, the user has to repeat it manually. How convenient it would be if the smartphone launched a particular application automatically when it detected and recognized our situation. We introduce ASinM, Aware-Sensors in Mobiles, a service application for smartphones. It detects and recognizes situations such as walking, moving by car, waiting at a bus stop, or absence, and then launches the recommended application automatically; for example, it launches a navigation application when it recognizes that you are driving. It is also possible to assign a particular application to a particular situation: you can assign a step-counter application or a path-tracking application to the walking situation. In other words, ASinM only determines whether you are walking, and the user decides which application launches in that case. To recognize the user's situation, ASinM uses sensors (e.g., GPS, accelerometer, compass) and phone-usage data. From GPS we collect the user's speed and location. Speed data is used to recognize whether the user is driving: we judge the user to be driving when the speed exceeds the fastest human running speed. Location data is used to determine whether the user's location is a significant place.
In this case, we can use public information such as the locations of bus stops, subway stations, parks, or specific buildings (e.g., bookstores, markets, department stores, and schools) to identify significant places from coordinate data. The accelerometer gives us acceleration data in three directions. ASinM uses this data to recognize the user's steps and then determines whether the user is walking or running based on the interval between steps; we can also recognize special movements of the smartphone, such as shaking, turning upside down, or moving in a circle. Lastly, ASinM keeps track of the user's phone usage to analyze the user's situation. For example, after more than three missed calls are detected, ASinM determines that the user is absent; when the next call arrives, it automatically launches the assigned application, such as an auto-answer application. In brief, ASinM determines the user's situation from sensors and usage patterns and then launches the application that the user assigned to that situation. The user therefore does not have to find and touch an icon to launch an application. We now define the problem addressed in this paper. These days, the smartphone is gradually taking over the role of personal companion from the feature phone. A feature phone has a limited application environment, because all of its applications belong to the vendor; a smartphone, in contrast, offers a huge opportunity to obtain applications through an app store or app market, and anyone can download many applications whenever they want. These applications are not only for fun; they also enrich human life. But it is very difficult to use them effectively or consistently, because a smartphone holds so many applications that finding and launching the one you want becomes annoying. In short, the problem is that we are not diligent enough to keep track of the smartphone all the time in order to use it in a variety of situations.
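The speed-based judgment described above can be sketched as follows. The threshold values and names are our own illustrative assumptions, not the paper's implementation; the paper only fixes the idea that speeds above the fastest human running speed indicate a vehicle.

```python
# Hypothetical sketch of ASinM's speed-based situation check.
# Thresholds are illustrative: the paper uses the fastest human running
# speed as the boundary between running and riding in a vehicle.

WALK_LIMIT_KMH = 5.0   # above this: faster than a normal walking pace
RUN_LIMIT_KMH = 20.0   # above this: faster than human running -> vehicle

def classify_speed(speed_kmh):
    """Map a GPS speed sample (km/h) to a coarse movement situation."""
    if speed_kmh > RUN_LIMIT_KMH:
        return "vehicle"
    if speed_kmh > WALK_LIMIT_KMH:
        return "running"
    if speed_kmh > 0:
        return "walking"
    return "stationary"
```

A classifier like this would be polled with fresh GPS samples inside the service's checking loop.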
Design of a Context-Aware Mobile System Using Sensors
91
To solve this problem, we use the sensors of a smartphone, which do not exist on a feature phone. Sensors give us a great deal of data with which to figure out our situation, and we can also obtain the user's phone-usage data from the smartphone; it is possible to analyze the user's usage patterns from this accumulated data. We call this work "context awareness using a smartphone." The major function of the ASinM service is to launch a specific application automatically in a specific situation. To realize this function successfully, we set three goals, as follows. The first goal is collecting real-time data from motion sensors. To be aware of the user's context, we must collect the user's motion data. The smartphone has many sensors, including sensors for the user's movement. First, the accelerometer is the best motion sensor, capturing the user's acceleration in three directions, so we can obtain any movement information of the smartphone from it. Second, GPS is the most common sensor for obtaining the user's real-time location, and GPS data can easily be translated into speed data. The ASinM service collects data from both sensors on the Android system [1]. The second goal is recognizing significant situations through data analysis. The core technology of ASinM is situation recognition. The first step of recognition is to compare predefined data with motion-sensor data; concretely, we define a boundary speed between human running and moving by vehicle and compare it with GPS speed data to determine whether the user is in a vehicle. The second step is to analyze raw data with a specific algorithm to transform it into meaningful data; concretely, we analyze the acceleration data in three directions (x, y, and z) to detect a step, and obtain the pace by counting steps over a period of time. Eventually, ASinM recognizes significant situations from both raw sensor data and transformed data.
The third goal is launching an application automatically when a significant situation is detected. The ultimate goal of ASinM is to launch a proper application for a significant situation. Initially, ASinM provides several situations with recommended applications, but the user can change the application assigned to each situation. To do so, ASinM must know the list of applications on the smartphone and must be able to launch any application through the smartphone's operating system. In this paper, we focus on Android-based smartphones, which are used worldwide.
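The step-detection idea in the second goal, turning three-axis acceleration into steps, can be sketched as a simple threshold crossing on the acceleration magnitude. The threshold value and function below are our own illustrative assumptions, not the paper's algorithm.

```python
import math

# Illustrative threshold: a step's impact briefly pushes the magnitude
# above resting gravity (~9.8 m/s^2).
STEP_THRESHOLD = 11.5

def count_steps(samples):
    """Count upward threshold crossings of the acceleration magnitude.

    samples: iterable of (x, y, z) accelerometer readings in m/s^2.
    A step is counted each time the magnitude rises above the threshold
    after having been below it.
    """
    steps = 0
    above = False
    for x, y, z in samples:
        mag = math.sqrt(x * x + y * y + z * z)
        if mag > STEP_THRESHOLD and not above:
            steps += 1
            above = True
        elif mag <= STEP_THRESHOLD:
            above = False
    return steps
```

The interval between counted steps would then distinguish walking from running, as the text describes.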
2
Related Work
Since smartphones were introduced to the public, many organizations have researched services that maximize the user's convenience and minimize the user's effort by exploiting usage patterns and context awareness. We need to investigate other organizations' research in order to absorb its advantages and exclude its disadvantages, because ASinM is a framework that provides convenience to the user and recognizes the user's situation (e.g., activity), life pattern, and smartphone usage pattern. In this section we categorize the technologies on which ASinM builds and present five related topics, as follows. The first is the Context-Aware Browser, positional-information-based context-aware web searching. The Context-Aware Browser (CAB) [2], developed by the SMDC Lab of Udine, shows contents to the user. CAB infers the user's situation from sensor data that
is measured by sensors installed in the mobile device and searches web contents according to that situation. After receiving the search results, CAB refines them in additional refinement steps and shows them to the user. However, CAB requires a search engine that supports it, and it has the disadvantage that the data it receives must be formatted as XML from the web, which restricts the scope of usable web information to a particular format. The second is SmartActions, unsupervised learning for recognizing the user's life pattern. SmartActions [3], developed by Nokia, generates shortcuts on the display automatically. Shortcut generation considers the user's current positional and time information; the system not only generates shortcuts but also shows user-level actions such as "Call to Harry" or "SMS to Mary." However, SmartActions shows only abstract contents through learning, which limits it to regular life patterns. For this reason, this paper analyzes the user's moving path in real time to recognize the user's situation and then infer the right service (e.g., launching an app). The third is a step-counter service that detects human steps with an accelerometer. The Step Counter Service [4], developed by IAIS in Germany, detects and recognizes the user's activity. By detecting running and walking, it can launch applications that need a motion event, or provide step-count information. The Step Counter Service is not merely an application but also middleware or a framework, which makes it similar to ASinM. The fourth is a moving-route recommendation system that analyzes the user's path and recommends routes using GPS. The Moving Route Recommendation System (MRRS) [5], developed by Dankook University, uses the GPS installed in a PDA or smartphone and uses positional information to analyze the user's moving path and to recommend an optimal route. In its architecture, the analysis and recommendation algorithms reside on the server; the client just sends positional information to the server and receives the optimal route through Google Maps.
The final topic is rule-based context modeling for context-aware services. Rule-based Context Modeling for Context-aware Services [6], developed by Soongsil University, is a framework that takes context data from the user and builds abstract contexts by rule. Once an abstract context has been defined, the framework activates certain services (e.g., launching an app) whenever it recognizes that context. This is similar to ASinM, which also defines rules for recognizing the user's situation. In our prototype, however, the services triggered by a context are registered by the user, which differs from that framework. After testing the prototype, we will refine ASinM's inference engine.
3
A Context-Aware Life System using Sensors
In this chapter, we introduce the architecture of the presented system, its main services, sensor detection, application launcher, and their implementation. 3.1
ASinM System Architecture
Basically, ASinM is a standalone application; it needs only a sensor-accessible driver and a protocol for launching applications. However, ASinM should be able to upload its settings to a server, because its settings are complicated. This paper focuses on recognizing the user's situation, so the server side will not be
discussed further. The entire architecture of ASinM is shown in Figure 1. There is also a 'Public Information Center' server, which ASinM uses to obtain public information such as bus stop and train station locations.
Fig. 1. ASinM System Architecture
On first use, right after installation, ASinM asks the user several questions about the user's state, for example, "Do you have a car?" and "What is your job?" The answers are used to determine the user's situation. When the user moves faster than the fastest human running speed, ASinM recognizes the situation as riding in a vehicle; then, if the user answered "I don't have a car," ASinM recognizes that the user is using public transport such as a taxi, bus, or train. After the initial setup, ASinM starts detecting the user's motion with sensors and recording the user's device usage at the same time. ASinM has four modules: Main Service, App Launcher, Sensor Detector, and Pattern Recognizer. The Main Service handles the other three modules, and each of them is connected to one or two appropriate system framework modules on the device. Figure 2 shows the ASinM service architecture.
Fig. 2. ASinM Service Architecture
3.2
Main Services
The Main Service is a background process on the smartphone. When ASinM starts, the Main Service creates the three sub-modules already mentioned and runs a multi-threaded routine that processes them in parallel. The Main Service also exposes callback methods that its sub-modules can call; it continuously saves the data delivered through these callbacks and regularly checks the user's current situation. 3.3
Detecting Sensors
The Sensor Detector handles two sensors: the accelerometer and GPS. It uses the framework's sensor interface; the Android framework provides the SensorManager class for accessing sensors. The Sensor Detector detects steps from the accelerometer and checks the moving speed and current location from GPS. The details are given in the implementation part. As shown in Figure 3, the Pattern Recognizer cyclically records device usage to a database; this part is called the Usage Recorder. The second part of the Pattern Recognizer is the Data Analyzer, which converts usage data from the database into a pattern and then matches it against defined patterns to recognize a specific situation.
Fig. 3. Pattern Recognizer with Databases
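As a minimal illustration of the Data Analyzer's pattern matching, the absence rule mentioned earlier (three or more missed calls) could be checked over the recorded usage log like this. The event names and function are hypothetical, not part of the paper's implementation.

```python
def is_absent(usage_log, threshold=3):
    """Infer the 'absence' situation from recent device usage.

    usage_log: list of event strings in chronological order
    (e.g. "missed_call", "unlock", "app_launch").
    The user is considered absent after `threshold` consecutive
    missed calls with no intervening interaction.
    """
    consecutive = 0
    for event in usage_log:
        if event == "missed_call":
            consecutive += 1
        else:
            consecutive = 0  # any interaction resets the counter
    return consecutive >= threshold
```

In the real system the Usage Recorder would append such events to the database, and the Data Analyzer would run checks like this periodically.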
3.4
Application Launcher
To launch an application, we must know the application's id and the protocol for launching an application on the smartphone's operating system. In the Android system, the application id is the application's package name, and Android provides the ActivityManager for handling applications. The App Launcher also offers a preference interface that lets the user choose an application for each situation. Therefore, when the Main Service requests that an application be started with a situation parameter, the App Launcher launches the application appointed in the preferences. 3.5
Implementation of ASinM
In this section, we describe the prototype on an Android-based smartphone. The ASinM prototype has four states while running on the operating system; Figure 4 shows the flow chart for ASinM.
Fig. 4. ASinM Flow chart
The Start state is the beginning of ASinM: the Main Service creates its sub-modules and starts the inner threads. The Initialize state begins right after the Start state; it initializes local values and registers a Notification with the Android OS so that the user knows ASinM started correctly. After the Initialize state, ASinM enters an infinite routine that checks for situation detection; we call this state Checking. During Checking, situation data and pattern data are updated by each sub-module, and if a specific situation is detected, an application is launched via the App Launcher. Checking repeats until the user stops ASinM, at which point the state changes to Finish; in this state, ASinM releases all resources and clears the notification.
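The four-state life cycle of Figure 4 can be sketched as a small transition table. The event names below are our own assumptions for illustration; the paper only names the states.

```python
# Minimal sketch of the Start -> Initialize -> Checking -> Finish life
# cycle, assuming simple named events for each transition.
TRANSITIONS = {
    ("start", "created"): "initialize",
    ("initialize", "ready"): "checking",
    ("checking", "tick"): "checking",  # the infinite detection loop
    ("checking", "stop"): "finish",
}

def next_state(state, event):
    """Advance the ASinM life cycle; unknown events keep the state."""
    return TRANSITIONS.get((state, event), state)
```

The self-loop on ("checking", "tick") models the repeating detection routine described above.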
4
Evaluation Method of ASinM
The main evaluation of ASinM is the performance of the Sensor Detector and the Pattern Recognizer. We count the number of successful situation recognitions for each situation. Three different devices (Samsung Galaxy S2, HTC Desire HD, LG Optimus Z) are used in this experiment. After the experiment, we fill out Table 1 below.

Table 1. Table Formation of Recording Results

Situation   App   Total   Success   Fail   Margin
Walk
Run
Vehicle
Absence
For the details of the experiment, the experimenters use a smartphone with ASinM for 24 hours. During the experiment, they act out each defined situation and check whether the registered application launches. For the boundary between the walking and running situations, the initial boundary speed is 5 km/h, the normal human walking speed; the GPS sensor detects the user's moving speed from the distance moved, and more than 20 km/h is the initial threshold for moving by vehicle. Finally, the absence situation is detected after three missed calls: when the fourth call comes in, the auto-answer application should launch.
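Filling out Table 1 from raw trial outcomes could be automated with a small helper. The data layout is an assumption for illustration; the paper does not prescribe one.

```python
def summarize(results):
    """Aggregate per-situation trial outcomes into Table 1 rows.

    results: dict mapping situation name -> list of booleans
    (did the registered application launch?).
    Returns rows of (situation, total, success, fail, success_rate).
    """
    rows = []
    for situation, trials in results.items():
        total = len(trials)
        success = sum(trials)
        rows.append((situation, total, success, total - success,
                     success / total if total else 0.0))
    return rows
```

Each row corresponds to one situation row of Table 1, with the success rate standing in for the Margin column.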
5
Conclusion and Future Work
We presented the ASinM (Aware-Sensors in Mobiles) system. ASinM recognizes the user's situation and phone-usage patterns through sensors and then launches a proper application on the smartphone, sparing the user the effort of searching for a particular application. To this end, we use the accelerometer to recognize the user's steps and judge whether the user is walking or running in order to launch a step-counter application or another application chosen by the user. ASinM also uses GPS to obtain the user's speed and launches, for example, a navigation application when the speed exceeds human running speed. ASinM further recognizes the user's phone-usage patterns and determines situations such as absence, and the user can assign a particular application to each situation. In Chapter 4, we presented a method for evaluating ASinM through real-world experiments using a prototype implementation on an Android-based smartphone (e.g., Galaxy S); in future work, we will show that ASinM can launch applications properly in several typical situations.
References
1. Android Developer's Guide, http://developer.android.com (retrieved February 2010)
2. Coppola, P., Mea, V.D., Gaspero, L.D., Menegon, D., Mischis, D., Mizzaro, S., Scagnetto, I., Vassena, L.: The Context-Aware Browser. IEEE Intelligent Systems 25(1), 38–47 (2010)
3. Vetek, A., Flanagan, J.A., Colley, A., Keränen, T.: SmartActions: Context-Aware Mobile Phone Shortcuts. In: Proc. of the 12th IFIP TC 13 International Conference on Human-Computer Interaction: Part I, August 24–28 (2009)
4. Mladenov, M., Mock, M.: A Step Counter Service for Java-enabled Devices Using a Built-in Accelerometer, http://portal.acm.org/ft_gateway.cfm?id=1554235&type=pdf&CFID=26242044&CFTOKEN=87571334
5. Kim, S.-Y., Park, B., Jung, J.-J.: User Route Analysis Using GPS on a Mobile Device and a Moving Route Recommendation System, http://www.dbpia.co.kr/view/ar_view.asp?arid=1603554
6. Choi, J.-H., Kim, J.-M., Seo, E., Park, Y.-T.: Rule-based Context Modeling for Context-aware Services in Smart Phone Environments, http://www.dbpia.co.kr/view/ar_view.asp?arid=1162151
Finding Harmonious Combinations in a Color System Using Relational Algebra∗ Young-Ho Park Dept. of Multimedia Science, Sookmyung Women's University 2-Ga Chung-Pa-Ro, Yong-San-Gu, Seoul, 140-742, Korea
[email protected]
Abstract. Recently, interest in color harmony has been increasing, and this paper focuses on harmony between colors. Selecting colors by mere feeling, however, can bring about results that do not fit the overall concept and can vary with social environment, conscience, gender, etc. On the other hand, learning color harmony theory requires a lot of time, since users have to know the theory and train in color combination. To solve this problem, the paper presents a method for finding harmonious combinations, based on color harmony theories, using relational algebra. The method proposes color harmony rules based on color harmony theory and formalizes them. Through this, users can produce the same results in color selection as professionals, without expert knowledge of color. Keywords: Color, Color Harmony, Color Harmony Theory, Relational Algebra.
1
Introduction
Recently, with the dawn of the emotional age and economic development, interest in color harmony has been increasing. More people have started to consider the value of design in daily-life products ranging from fashion, interiors, and electronic products to media and so on [1]. For a design to be complete, it should harmonize design elements such as illustration, typography, color, and photographic images; among these, the role of color is substantial. People use assorted graphic software to create various design results, including poster designs, package designs, interior designs, fashion designs, and even photo designs [2]. Even though most graphic software provides color palettes to make color selection easier, it is still difficult for users to select harmonious colors, because color combination methods are not clearly shown [3]. Color harmony theory is guidance for color mixing and color combination, set down by color scholars, which considers the three attributes of color: hue, brightness, and ∗
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (No. 20110002707).
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 97–107, 2011. © Springer-Verlag Berlin Heidelberg 2011
98
Y.-H. Park
saturation. Selecting colors by mere feeling, however, can bring about results that do not fit the overall concept and can vary with social environment, conscience, gender, etc. On the other hand, learning color harmony theory requires a lot of time, since users have to understand the theory and train in color combination. To solve this problem, previous related research proposed various systems for effective color recommendation. These systems, however, provide a huge pool of colors and do not consider users' basic understanding and usage of color. They also easily bore users, in that they give identical color combinations for the same input colors, which limits the range of selection. In this paper, we propose a color-combination recommendation system to aid users' color selection. For this, we organize rules based on color harmony theory. The paper's contributions are as follows: it organizes color harmony rules based on color harmony theory and formalizes them, so that users can produce the same results in color selection as professionals, without expert knowledge of color. The remainder of the paper is organized as follows. Chapter 2 reviews existing related work. Chapter 3 presents the method for finding harmonious combinations, based on color harmony theories, using relational algebra. Chapter 4 summarizes and concludes the paper.
2
Related Work
This section introduces existing color recommendation systems and their characteristics. There has been a lot of work on color recommendation systems based on color harmony theories. Existing related research has approached the problem either as color-selection recommendation systems based on color harmony rules [1],[4],[5], or as color recommendation systems based on emotion and preference [2],[3],[6],[7]. In [5], the authors propose a color design support system that considers color harmony and automatically recommends color schemes, i.e., sets of colors. The schemes harmonize with the first color the user enters and correspond to an emotion-related image keyword supplied by the user, such as casual, pretty, or modern. The system judges the degree of color harmony with fuzzy rules defined on the basis of the NCD color image scale and Matsuda's color coordination. First, the user enters a favorite color with an image keyword, and the system builds color schemes to combine with the input color. These schemes are evaluated by the system, and the user obtains various harmonious schemes according to the given image keyword. However, the huge number of result colors makes it confusing to select a final color, and when the input color does not correspond with the image keyword, the result colors are not suitable.
Finding Harmonious Combinations in a Color System Using Relational Algebra
99
In [6], the system recommends colors matching the user's preference and skin color. The colors used in this system are based on the P.C.C.S color system, which classifies colors by tone. To apply the user's preference, the system uses image words such as Masculine, Feminine, Peaceful, Vivacious, Old-Young, Classical, Futuristic, Gimcrack, and Noble. The image words are extracted from a questionnaire designed to pick words that suitably describe the image of the color product. The system then converts the image words entered by the user into one of eight color tones (Pale, Light, Bright, Vivid, Deep, Dark, Dark Greyish, and Greyish) based on fuzzy set theory, and recommends the highest-priority colors based on the Moon and Spencer aesthetic measure theory [7], a standard for measuring whether colors are harmonious or disharmonious. The system recommends one or two colors from the user's skin color. However, when it recommends two colors, they are in the same tone; colors in the same tone are harmonious, but if the user wants two colors in different tones, the recommended colors are limited.
3
Detecting Color Harmony as Relational Algebra
This chapter shows the method for finding harmonious combinations, based on color harmony theories, using relational algebra. Section 3.1 introduces the existing color harmony rules presented by Ostwald; we use the NCS (Natural Color System) as a base system, since it can easily be converted to the computer domain. Section 3.2 presents the formal method, giving six relational-algebra expressions, one for each color harmony. 3.1
Basics of Color Harmony Rules in NCS
This research builds a system based on the NCS (Natural Color System) in order to systematize the color harmony rules on top of Ostwald's color harmony theory. The NCS colors were created from the concepts of Ostwald's colors, which enable variety in the field of design, as the arrangement of colors is clear and easy to process and understand. The NCS colors that developed from this theory have universal natural colors as their base colors and thus carry the advantage of presenting the exterior color of an object from the viewpoint of the human eye. They are also easy to systematize, as each color is quantified in percentages. Thus, this research systematizes the color harmony rules, based on Ostwald's color harmony theory, in terms of the NCS colors. Figure 1 shows the NCS system. Figure 1.1 shows the color circle of the NCS system. The color circle is based on the four most fundamental colors that humans are able to differentiate: yellow (Y), red (R), blue (B), and green (G). The circle is divided into ten steps between Y and R, R and B, B and G, and G and Y, respectively, so the total number of colors on the color circle is forty. Each color in Figure 1.1 represents a hue; for example, Y20R has eighty percent yellow and twenty percent red. Figure 1.2 shows the equal color triangle of the NCS system; each hue in Figure 1.1 has an equal color triangle as in Figure 1.2.
Fig. 1.1 The color circle
Fig. 1.2 The equal color triangles
Fig. 1. Basic Concepts of the NCS(Natural Color System)
The equal color triangle is the set of colors consisting of variations of chromaticness and blackness for each hue in Figure 1.1. W and S on the central axis in Figure 1.2 represent white and black, respectively: W is the brightest white and S is the darkest black, since blackness increases from W to S, and the central axis is the grayscale. C in Figure 1.2 represents the unmixed color, which includes no blackness or whiteness at all and is the corresponding color on the color circle. The colors located in the direction from W to C have the same blackness and are the equal-blackness colors. The colors located in the direction from W to S have the same chromaticness and are the equal-chromaticness colors. The colors located in the direction from S to C have the same whiteness and are the equal-whiteness colors. Here, the whiteness is determined by the following Eq. (1): whiteness = 100 − (chromaticness + blackness)
Eq. (1)
We first convert NCS hue values to hue numbers, 1 to 40, in order to apply the six color harmony rules. The six color harmony rules are the complementary, similarity, different, identity, polychromatic, and achromatic color harmonies, and each is an expression for finding colors harmonious with the input color; we present each rule in detail in its own section. To convert NCS hue values to hue numbers, we create Algorithm 1. We first create substrings C1, C2, and strength from the NCS color expression (Line 1). For example, if the input is '2010-Y30R', then C1='Y', C2='R', and strength='30'. Since the second part of the NCS color expression, Y30R, represents the NCS hue, we convert it to the hue number. We then select the first digit of the substring strength (Line 2), convert it to the hue number according to C1, and return the hue number.
Algorithm 1. Converting an NCS hue to the hue number
Input: NCS color expression
Output: converted hue number
1: create substrings C1, C2, and strength from the NCS color expression;
2: strength = to_integer(strength.substr(0, 1));
3: if C1 = 'Y' then
4:   hue_number = 1 + strength;
5: if C1 = 'R' then
6:   hue_number = 11 + strength;
7: if C1 = 'B' then
8:   hue_number = 21 + strength;
9: if C1 = 'G' then
10:  hue_number = 31 + strength;
11: return hue_number;

Note that the base numbers follow the circle order Y, R, B, G of Figure 1.1, so B starts at 21 and G at 31; this way, Y30R (hue number 4) and its complement B30G (hue number 24) lie exactly half the circle apart.

In Table 1, we summarize the variables used throughout the paper. We denote the query, i.e., the color first selected by the user, as q; its hue number as Xq, its chromaticness as Yq, and its blackness as Zq. Similarly, we denote the result colors' hue number as X, chromaticness as Y, and blackness as Z. Let the constant t be the total number of hue numbers and the constant h the opposite value, half of t. Let k, kα, and kβ be the constants used to calculate the harmonious region between the input color and the result colors. The minimum hue number Min and the maximum hue number Max fix the left and right sides of a region on the color circle from any location.

Table 1. The variables used in the paper

Notation     Description
q            the query (i.e., the input color)
Xq           the hue number of the input query
Yq           the chromaticness of the input query
Zq           the blackness of the input query
X            the hue number of the result color set
Y            the chromaticness of the result color set
Z            the blackness of the result color set
t            the total number of hue numbers, 40
h            the opposite value, t/2
k, kα, kβ    constants for the harmonious region
Min          the minimum hue number
Max          the maximum hue number
Before explaining the color harmony rules in detail, we formally define several expressions. When we find the colors harmonious with the input color, we consider the hue number, chromaticness, and blackness. Since NCS represents a color by hue, chromaticness, and blackness, we generate three sets, one for each element.
We generate the first set X by selecting the colors whose hue number equals the input hue number, the second set Y by selecting the colors whose chromaticness equals the input chromaticness, and the third set Z by selecting the colors whose blackness equals the input blackness. We then take the intersection of the three sets to find the result colors. The following definitions formally define the expressions for the hue number, chromaticness, blackness, and whiteness used to find colors harmonious with the input color.

Definition 1: Selecting the same hue numbers. Given a query q, the hue number X, and the input color's hue number Xq, the expression finding the set X for the hue number of q is defined as follows: {X | X = Xq} □

Definition 2: Selecting the same chromaticness. Given a query q, the chromaticness Y, and the input color's chromaticness Yq, the expression finding the set Y for the chromaticness of q is defined as follows: {Y | Y = Yq} □

Definition 3: Selecting the same blackness. Given a query q, the blackness Z, and the input color's blackness Zq, the expression finding the set Z for the blackness of q is defined as follows: {Z | Z = Zq} □

Definition 4: Selecting the same whiteness. Given a query q, the chromaticness Y, the blackness Z, the input color's chromaticness Yq, and the input color's blackness Zq, the expression finding the set Y+Z for the whiteness of q is defined as follows: {Y, Z | Y + Z = Yq + Zq} □

3.2
Six Relational Algebras According to Each Color Harmony
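Before walking through the six harmony rules, the four selection definitions above can be sketched in code. This is an illustrative sketch only: the Color record, the function names, and the palette are our own assumptions, not part of the paper.

```python
from typing import NamedTuple

class Color(NamedTuple):      # hypothetical NCS color record
    hue: int                  # hue number X (0..39 on a 40-step circle)
    chroma: int               # chromaticness Y
    black: int                # blackness Z

def same_hue(colors, q):      # Definition 1: {X | X = Xq}
    return {c for c in colors if c.hue == q.hue}

def same_chroma(colors, q):   # Definition 2: {Y | Y = Yq}
    return {c for c in colors if c.chroma == q.chroma}

def same_black(colors, q):    # Definition 3: {Z | Z = Zq}
    return {c for c in colors if c.black == q.black}

def same_white(colors, q):    # Definition 4: {Y, Z | Y + Z = Yq + Zq}
    return {c for c in colors if c.chroma + c.black == q.chroma + q.black}
```

Intersecting these sets (Python's `&` on sets) then yields the result colors, mirroring the ∩ operations used in the harmony rules.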
3.2.1 Complementary Color Harmony
The complementary color harmony consists of two colors on opposite sides of the color circle. For example, in Figure 1, Y30R and B30G are complementary to each other. The complementary color harmony has two cases: complementary colors on the color circle and complementary colors on the two equal color triangles. In the first case, the colors are opposite to each other on the color circle and have the same chromaticness and blackness. In the second case, the colors are opposite to each other on the color circle and are located at the same distance from the central axis on the two equal color triangles. Figure 2 shows the equal color triangles of two complementary colors. The left triangle is the equal color triangle of G, green, and the right one is the equal color triangle of R, red. G and R are complementary colors since G and R in
Finding Harmonious Combinations in a Color System Using Relational Algebra
Figure 1 are located on opposite sides. In Figure 2, the numbers on each square represent the blackness and chromaticness of that color. For example, 4020 represents a color whose blackness is 40 and chromaticness is 20.
Fig. 2. The equal color triangles of the two complementary colors
Eqs. (1.1), (1.2), and (1.3) below show the expressions to select the complementary colors for the input color.
{X | X = (Xq + h) mod t} ∩ {Y | Y = Yq} ∩ {Z | Z = Zq}   Eq. (1.1)
         (A)                    (B)            (C)

{X | X = (Xq + h) mod t} ∩ {Y | Y = Yq} ∩ {Z | Z = Yq + Zq}   Eq. (1.2)

{X | X = (Xq + h) mod t} ∩ {Y | Y = Yq} ∩ {Z | Z = Yq − Zq}   Eq. (1.3)
The proposed color harmony rules consist of three parts: set X for the hue number, set Y for the chromaticness, and set Z for the blackness. As shown in Eq. (1.1), we label the part for set X as (A), set Y as (B), and set Z as (C). With h the opposite value and t the total number of hue numbers, by Definition 1, (A) of Eq. (1.1) finds a set X for the hue numbers that differ from Xq by h (1-a). Since complementary colors are opposite to each other on the color circle, we add h to Xq. Then, we take the result modulo t, since Xq + h could be larger than t.
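The wrap-around in (A) can be sketched as follows (the function name is ours; t = 40 and h = t/2 = 20 as in the notation table):

```python
def complementary_hue(x_q, t=40):
    """(A) of Eq. (1.1): add the opposite value h = t/2 to the input hue
    number and wrap with modulo t when the sum passes t."""
    h = t // 2
    return (x_q + h) % t

# e.g. hue 30 maps to (30 + 20) mod 40 = 10 on the 40-step circle
```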
Eq. (1.1) is defined by (1-a) and Definitions 2 and 3, since Eq. (1.1) finds the colors that are opposite to each other on the color circle and have the same chromaticness and blackness. By Definition 3, (C) of Eq. (1.2) finds a set Z for the blackness that equals the sum of Yq and Zq (3-a). The color is located at the same distance as the input color from the central axis. The colors with the same sum of chromaticness and blackness lie on the equal whiteness line by Eq. (1). Thus, the sum of the chromaticness and blackness becomes the blackness of the opposite equal triangle. Eq. (1.2) is defined by (1-a), (3-a), and Definition 2. Contrary to (3-a), (C) of Eq. (1.3) finds a set Z for the blackness that equals the result of subtracting Zq from Yq (3-b). Eq. (1.3) is defined by (1-a), (3-b), and Definition 2.

3.2.2 Similarity Color Harmony
The similarity color harmony consists of colors that are adjacent on the color circle. A mixture of similar colors gives a natural harmony. Eqs. (2.1), (2.2), and (2.3) show the expressions to select the similar colors for the input color.
{X | Xq − k ≤ X ≤ Xq + k} ∩ {Y | Y = Yq} ∩ {Z | Z = Zq}   (Min ≤ Xq ≤ Max)   Eq. (2.1)

{X | 0 ≤ X ≤ Xq + k or (Xq − k) + t ≤ X ≤ t} ∩ {Y | Y = Yq} ∩ {Z | Z = Zq}   (Xq < Min)   Eq. (2.2)

{X | Xq − k ≤ X ≤ t or 0 ≤ X ≤ (Xq + k) − t} ∩ {Y | Y = Yq} ∩ {Z | Z = Zq}   (Xq > Max)   Eq. (2.3)
where k = 7, Min = 8, and Max = 33. With k the variable that determines the harmonious region in which similar colors are harmonious, Min the minimum value of Xq, and Max the maximum value of Xq to which k applies, by Definition 1, (A) of Eq. (2.1) finds a set X for the hue numbers whose difference from Xq is less than |k| (1-b). Eq. (2.1) selects different hue numbers but the same chromaticness and blackness, and thus is defined by (1-b) and Definitions 2 and 3. (A) of Eq. (2.2) can be explained in the same way as (1-b); however, since Xq is less than Min, we add t to Xq − k (1-c). When we find the hue numbers whose difference from Xq is zero to k counterclockwise, Xq − k becomes a negative integer, so we add t. Eq. (2.2) is defined by (1-c) and Definitions 2 and 3. Contrary to (1-c), since Xq is more than Max, we subtract t from Xq + k in (A) of Eq. (2.3) (1-d). When we find the hue numbers
whose difference from Xq is zero to k clockwise, Xq + k falls outside the range of hue numbers, so we subtract t. Eq. (2.3) is defined by (1-d) and Definitions 2 and 3.

3.2.3 Different Color Harmony
The different color harmony produces strong visual contrast among the colors. Eqs. (3.1), (3.2), (3.3), (3.4), and (3.5) show the expressions to select the different colors for the input color.
{X | Xq + kα ≤ X ≤ Xq + kβ or Xq − kβ ≤ X ≤ Xq − kα} ∩ {Y | Y = Yq} ∩ {Z | Z = Zq}   (Min ≤ Xq ≤ Max)   Eq. (3.1)

{X | Xq + kα ≤ X ≤ Xq + kβ or (Xq − kβ) + t ≤ X ≤ (Xq − kα) + t} ∩ {Y | Y = Yq} ∩ {Z | Z = Zq}   (Xq < kα)   Eq. (3.2)

{X | Xq − kβ ≤ X ≤ Xq − kα or (Xq + kα) − t ≤ X ≤ (Xq + kβ) − t} ∩ {Y | Y = Yq} ∩ {Z | Z = Zq}   (Xq > Max + kα)   Eq. (3.3)

{X | Xq + kα ≤ X ≤ Xq + kβ or 0 ≤ X ≤ Xq − kα or (Xq − kβ) + t ≤ X ≤ t} ∩ {Y | Y = Yq} ∩ {Z | Z = Zq}   (kα < Xq < Min)   Eq. (3.4)

{X | Xq − kβ ≤ X ≤ Xq − kα or Xq + kα ≤ X ≤ t or 0 ≤ X ≤ (Xq + kβ) − t} ∩ {Y | Y = Yq} ∩ {Z | Z = Zq}   (Max < Xq ≤ Max + kα)   Eq. (3.5)
where kα = 10, kβ = 13, Min = 14, and Max = 27. With kα and kβ the variables that determine the harmonious region, Min the minimum value of Xq, and Max the maximum value of Xq to which kα and kβ apply, by Definition 1, (A) of Eq. (3.1) finds a set X for the hue numbers whose difference from Xq is more than kα and less than kβ (1-e). Eq. (3.1) selects different hue numbers but the same chromaticness and blackness, and thus is defined by (1-e) and Definitions 2 and 3. (A) of Eq. (3.2) can be
explained in the same way as (1-e); however, since Xq is less than kα, we add t to Xq − kα and Xq − kβ (1-f). When we find the hue numbers whose difference from Xq is kα to kβ counterclockwise, Xq − kα and Xq − kβ become negative integers, so we add t. Eq. (3.2) is defined by (1-f) and Definitions 2 and 3. Contrary to (1-f), since Xq is more than Max + kα, we subtract t from Xq + kα and Xq + kβ in (A) of Eq. (3.3) (1-g). When we find the hue numbers whose difference from Xq is kα to kβ clockwise, Xq + kα and Xq + kβ fall outside the range of hue numbers, so we subtract t. Eq. (3.3) is defined by (1-g) and Definitions 2 and 3. (A) of Eq. (3.4) can be explained in the same way as (1-e); however, since Xq is more than kα and less than Min, we find the hue numbers between (Xq − kβ) + t and t and between zero and Xq − kα (1-h). Since some of the hue numbers whose difference from Xq is kα to kβ counterclockwise become negative integers, we consider both the hue numbers that become negative and those that remain positive. Eq. (3.4) is defined by (1-h) and Definitions 2 and 3. Contrary to (1-h), since Xq is more than Max and less than Max + kα, we find the hue numbers between zero and (Xq + kβ) − t and between Xq + kα and t in (A) of Eq. (3.5) (1-i). Since some of the hue numbers whose difference from Xq is kα to kβ clockwise fall outside the range of hue numbers, we consider both the hue numbers outside and inside the range. Eq. (3.5) is defined by (1-i) and Definitions 2 and 3.

3.2.4 Identity Color Harmony
The identity color harmony consists of colors that have the same blackness, chromaticness, or whiteness in the color triangle. Eqs. (4.1), (4.2), and (4.3) show the expressions to select the identity colors for the input color.
{X | X = Xq} ∩ {Z | Z = Zq}   Eq. (4.1)

{X | X = Xq} ∩ {Y | Y = Yq}   Eq. (4.2)

{X | X = Xq} ∩ {Y, Z | Y + Z = Yq + Zq}   Eq. (4.3)
Eq. (4.1) selects the colors that have the same blackness on the equal blackness line. Since the colors on the equal blackness line have the same hue number and blackness, Eq. (4.1) is defined by Definitions 1 and 3. Eq. (4.2) selects the colors that have the same chromaticness on the equal chromaticness line. Since the colors on the equal chromaticness line have the same hue number and chromaticness, Eq. (4.2) is defined by Definitions 1 and 2. Eq. (4.3) selects the colors that have the same whiteness on the equal whiteness line. Since the colors with the same sum of chromaticness and blackness are exactly the colors with the same whiteness, Eq. (4.3) is defined by Definitions 1 and 4.

3.2.5 Polychromatic Harmony
The polychromatic harmony consists of colors located at equal chromaticness and equal blackness, and it includes the identity color harmony mentioned above. Eqs. (4.1), (4.2), (4.3), and (5) show the expressions to select the polychromatic colors for the input color.
{Y | Y = Yq} ∩ {Z | Z = Zq}   Eq. (5)
Eq. (5) is defined by Definitions 2 and 3, since polychromatic colors are harmonious when they have the same chromaticness and blackness.

3.2.6 Achromatic Harmony
The achromatic harmony consists of colors on the grayscale. Eq. (6) shows the expression to select the achromatic colors for the input color.

{X | X = 0}   Eq. (6)

(A) of Eq. (6) selects the colors on the grayscale by Definition 1 (1-j). Since the colors on the grayscale have no hue, we find the colors whose hue number is zero. Eq. (6) is defined by (1-j).
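To summarize Section 3.2, each rule reduces to an intersection of the element-wise selections. The following compact sketch uses our own helper names (the Color record is an assumption); note that measuring hue distance circularly collapses the three range cases of Eqs. (2.1)-(2.3) into one test:

```python
from typing import NamedTuple

class Color(NamedTuple):  # hypothetical NCS record: hue X, chromaticness Y, blackness Z
    hue: int
    chroma: int
    black: int

T = 40            # total number of hue numbers
H = T // 2        # opposite value h = t/2

def hue_dist(a, b, t=T):
    """Circular distance between two hue numbers on the t-step circle."""
    d = abs(a - b) % t
    return min(d, t - d)

def complementary(colors, q):
    # Eq. (1.1): opposite hue, same chromaticness and blackness
    target = (q.hue + H) % T
    return {c for c in colors
            if c.hue == target and c.chroma == q.chroma and c.black == q.black}

def similar(colors, q, k=7):
    # Eqs. (2.1)-(2.3): hue within k steps (circular), same Y and Z
    return {c for c in colors
            if 0 < hue_dist(c.hue, q.hue) <= k
            and c.chroma == q.chroma and c.black == q.black}

def achromatic(colors):
    # Eq. (6): grayscale colors have hue number zero
    return {c for c in colors if c.hue == 0}
```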
4
Conclusions
This paper focused on harmony between colors. It presented a method for finding harmonious color combinations based on color harmony theories using relational algebra. The method identifies a color harmony rule from color harmony theory and formalizes the rule. Through this, users can obtain the same results in color selection as professionals, without expert knowledge of color. To this end, we formalized the color harmonies existing in natural color systems using relational algebra.
Image-Based Modeling for Virtual Museum
Jin-Mo Kim1, Do-Kyung Shin2, and Eun-Young Ahn3,*
1 Dept. of Multimedia, Dongguk University, Seoul, South Korea
2 Dept. of Computer Engineering, Hanyang University, Ansan, South Korea
3 Dept. of Communication Information & Computer Engineering, Hanbat National University, Daejeon, South Korea
[email protected], [email protected], [email protected]
Abstract. This method focuses on building a vivid virtual museum in a reasonable time for a number of complicated artifacts. Another purpose is minimizing undesirable distortions in the modeling process and, finally, gaining realistic visual effects. In this paper, we present a new method for constructing 3D VR contents using the Smart Billboard, which selects a proper mapping image among images captured by rotating the camera position at regular intervals. Moreover, we describe a simplified calculation method for selecting the image adequate for the viewer. The proposed method is applicable in industry, such as e-commerce, e-learning web sites, and simulations, saving effort and resources in making 3D VR contents. It is validated with a practical embodiment of a virtual museum in which the exhibitions are represented by Smart Billboards that automatically calculate and select a proper image according to view changes in the cyber space.

Keywords: Texture mapping, Billboard, Mixed reality, Image based modeling.
1
Introduction
Virtual reality techniques are widely used for experiential education that would require high risk and high cost in a real situation, such as military training, medical skill practice, and ship and airplane control. In addition, many contents on web sites adopt VR techniques to create a common space between suppliers and users. To make a favorable, user-friendly application, VR contents should be immersive so that they give realistic feelings through adequate interaction with the user. For this reason, many kinds of devices, such as haptic devices and smell sensors, have recently come into use. Visual information is surely the most important sensory channel among them for vivid VR contents. So, we have to strive to offer sufficient visual information with real-time 3D rendering and natural interaction without special devices [1]. However, 3D modeling of a realistic scene requires much effort and cost. The more complex the objects we want to describe, the larger the data and the more effort the embodiment requires, because a large data size lowers efficiency for data transfer and rendering. Especially on the web, the data size should be minimized so that
* Corresponding author.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 108–119, 2011. © Springer-Verlag Berlin Heidelberg 2011
users can access contents even under low bandwidth [2,3]. To overcome these restrictions, many methods have been suggested; the 'Billboard' is one choice. The Billboard is a simple trick but a respectable method for greatly reducing the size of 3D data without much injury to the visual quality of the 3D model. In this paper, we suggest a method to produce 3D contents using Billboards that not only decreases data size but also minimizes the cost of 3D modeling. We propose an improved Billboard, called the Smart Billboard, that reduces the heavy task of describing 3D objects in detail. To show the feasibility of the method, we built educational contents for a historical museum that contains a number of relics and remains. First, we look into the characteristics and advantages of the Billboard in Chapter 2 and introduce more precisely the concept of the Smart Billboard, used in constructing a historical museum, in Chapter 3. The following chapter shows experimental results, and finally we discuss the conclusions of the suggested method.
2
Related Work
A Billboard is a rectangular plane whose normal always points toward the viewer [4]. So, the viewer sees only its front side wherever the viewer moves. From this property we can get a plausible effect by merely mapping an image onto the Billboard: users feel as if they see a 3D object. This technique is frequently used in games instead of 3D modeling of complex objects for real-time rendering, even when there are many objects in a scene. When a Billboard is used for 3D modeling, the number of vertices for representing a 3D object is reduced, because the Billboard has only the four points of the rectangle. For implementing a Billboard, the main issue lies in keeping the rectangle perpendicular to the view direction. The viewing transformation is executed through the model-view matrix, which is part of the geometric pipeline in graphics hardware. In other words, the model-view matrix contains the information for the view coordinates, and from the matrix we can obtain a Billboard perpendicular to the view vector. The Billboard is useful for representing incidental objects such as buildings and non-player characters in games. However, it always shows only one viewpoint image, which is not enough to express a 3D object we want to examine precisely. For example, a cyber museum should offer a way to manipulate virtual relics in many ways for accurate examination. This paper focuses on this shortcoming of the Billboard technique and proposes a method to cut down the effort of 3D modeling while making real-time rendering possible [5-8]. We call it the 'Smart Billboard'; it has a selective mapping mechanism that composes the scene according to the viewer's movement in the cyber space.
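As a sketch of this idea, the camera's right and up axes can be read from the upper 3x3 of the model-view matrix and used to span a quad that always faces the viewer. The row convention below is an assumption (OpenGL-style; other APIs may transpose), and the function is our own illustration, not the paper's implementation:

```python
def billboard_corners(model_view, center, width, height):
    """Sketch: build a camera-facing quad. Assumes the rows of the upper
    3x3 of the model-view matrix hold the camera's right, up, and forward
    axes (OpenGL-style convention; adjust for your API)."""
    right = model_view[0][:3]
    up = model_view[1][:3]
    hw, hh = width / 2.0, height / 2.0

    def corner(s_r, s_u):
        # center + s_r * right + s_u * up, component-wise
        return tuple(center[i] + s_r * right[i] + s_u * up[i] for i in range(3))

    # counterclockwise: bottom-left, bottom-right, top-right, top-left
    return [corner(-hw, -hh), corner(hw, -hh), corner(hw, hh), corner(-hw, hh)]
```

Only these four vertices are needed per object, which is the source of the vertex savings described above.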
3
Building a Cyber Museum with Smart Billboard
Fig. 1 shows the conceptual diagram of a cyber museum in which we use the Smart Billboard for artifacts requiring high quality. The museum is composed of three main components: historical information, 3D artifacts, and additional contents for comprehensive explanation. The background and environment of each era aid understanding of the exhibition, increase the feeling of reality, and
help visitors' comprehension. Artifacts are exhibited in the era environment. Moreover, many kinds of contents are linked to the remains for additional information and explanation. Users can navigate the historical cyber space at their convenience, observe remains, and get useful information about them. The whole system helps spectators look around and obtain comprehensive information about the relics and the ages [9,10].
Fig. 1. Structure of cyber museum
3.1
3D era Environments of Cyber Museum
For an immersive touring system, 3D construction of the environment is essential, so the cyber museum is represented by 3D modeling. Fig. 2 illustrates the cyber museum for the prehistoric age, decorated with 3D objects of that era such as a dolmen and a dugout hut. To describe the environment and background of the era, we create terrain and architecture with a 3D modeling tool, and the space is decorated with related artifacts of the period according to historical research. In the cyber museum, visitors can wander through the spaces. When they find interesting remains, they may want to investigate them and interact with the touring system to get historical information about them. Therefore, the artifacts in the space should be described in 3D. 3D representation of an object needs tremendous effort in producing a virtual reality application. Complex objects require huge numbers of vertices, which results in large data sizes and prevents smooth web service in low-bandwidth network environments. Moreover, one should be careful not to distort images during the mapping process.
Image-Based Modeling for Virtual Museum
111
Fig. 2. Typical scene of periodic environment
3.2
Representation of Remains with Smart Billboard
Fig. 3 depicts the 3D models of an earthen vessel and stoneware of the prehistoric era. Artifacts are modeled based on historical information and images of the original shape, and texture mapping onto the 3D surface improves realism. Before texture mapping, we use a plug-in program to modify the image of the relic according to its shape, minimizing distortion of the mapping image. In producing 3D objects, we must take account of some problems. The first is the cost of describing relics: the more detail we try to describe, the larger the data produced. Relics like those in Fig. 3-(a) are aside from the question because they are so simple. However, Fig. 3-(b) requires a lot of vertices and takes much time for modeling. Considering further applications of the contents, such as mobile service and on-line education, the large data size could become a big problem. The second problem is realistic expression of the relic: try as we may, there are inevitable differences between the artificial description and the original. So we need a new, efficient method for 3D description. To solve the problems mentioned above, we suggest an image-based approach to 3D modeling. Image-based 3D representation has the following strengths:
- It reduces the effort for 3D modeling.
- It eliminates the loss of quality due to image distortion during texture mapping.
- It preserves a consistent data size for a 3D edifice even when an elaborate description is needed. Consequently, we can alleviate the overload for web services.
3.2.1 Capturing Mapping Images
To allow looking at the remains from multiple viewpoints, images from different view angles should be prepared. Twelve images from different longitudinal angles (30-degree separations) at zero latitude form a row of the image matrix in Fig. 4; images at latitude angles in 30-degree increments are displayed in a column of Fig. 4. The steps for building the image matrix are as follows:
(a) Earthen vessel and stoneware of prehistoric ages
(b) Artifact in BC 18-660
Fig. 3. Example of relics
Fig. 4. Matrix of images at different view angles
Fig. 5. Mapping images capture
1. Put an object on the turntable.
2. Rotate the turntable and capture images at uniform separations, i.e., every 30 longitudinal degrees.
3. Move the camera in the vertical direction by 30˚ and go to step 2 (capturing images).
4. Repeat the above steps until the latitude angle has changed by up to 180 degrees, as shown in Fig. 5.

3.2.2 Selective Mapping Mechanism
The Smart Billboard is conceptually an improved Billboard technique with a mechanism for proper mapping-image selection. Fig. 6 shows an overview of the selection mechanism. When a visitor moves to another place in the cyber museum (Camera Movement), the mapping information for the visitor (Position of Camera) is changed according to the calculated viewing vector. Using Equation (1), we get V1, where Vp and Vc are the vectors from the origin to the object and to the camera, respectively; the viewing direction is obtained by subtracting Vc from Vp, as illustrated in Fig. 7-(a). The vector V1 is the viewing direction from the camera to the Billboard. To simplify further calculation, we use the unit vector n of the viewing vector V1. The Billboard can be regarded as a projection plane, and we obtain a 3D effect merely by mapping the object image that ought to be projected on that plane when the camera sees the object through it. The mapping image is selected by the angle at which the visitor looks at the object from his position. In detail, the azimuth and elevation of the viewing vector in Fig. 7-(b) signify the viewing angle at which the visitor watches an object. According to these angles it is possible to choose an adequate projection image from the image set.
V1 = Vp − Vc   (1)

Fig. 6. Conceptual diagram for selective mapping mechanism
(a) View vector
(b) view angle
Fig. 7. Billboard orthogonal to the camera view and its normal vector
It calculates the azimuth (θ) and elevation angles in polar coordinates to decide the column and row indices of the image matrix, respectively. Because the viewing vector is normalized (n), the azimuth and elevation are acquired by simple equations. The angle θ is the view angle with respect to the x-axis and is calculated by the dot product of n with the x-axis (see Equation (2)). Similarly, the complementary angle of the elevation, φ, is acquired by Equation (3), and a row index of the image matrix can be selected from these angles. Fig. 8 illustrates the need to adjust the azimuthal angle depending on the rotational direction. If the camera is located on the front side (FS), the column index ranges from C0 to C5. Otherwise, the camera is on the back side (BS), its column index ranges from C6 to C11, and we negate the x-value of n for later computational handiness, because it is convenient to treat the BS case in the same manner as the FS case when finding the exact column index. Once the row and column indices are decided, one mapping image is chosen from the image matrix according to the indices, as shown in Fig. 9.
cos θ = n ⋅ (1, 0, 0)   (2)

cos φ = n ⋅ (0, 1, 0)   (3)
Fig. 8. Image indexing according to camera angle position
Fig. 9. Detailed process for deciding column and row index
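The selection mechanism can be sketched end to end as follows. This is an illustration under stated assumptions: the exact index layout of the image matrix and the use of the z-component's sign as the front/back test are our choices, not details given in the paper.

```python
import math

def select_image_index(camera_pos, object_pos, step_deg=30):
    """Sketch of Smart Billboard image selection. Returns (row, col) into
    the 30-degree image matrix; front side maps to columns C0..C5 and the
    back side mirrors into C6..C11 (layout assumed)."""
    # Eq. (1): viewing vector V1 = Vp - Vc, then normalize to n
    v1 = [p - c for p, c in zip(object_pos, camera_pos)]
    norm = math.sqrt(sum(x * x for x in v1))
    n = [x / norm for x in v1]
    # Eq. (2): azimuth from the x-axis; Eq. (3): complement of elevation
    theta = math.degrees(math.acos(max(-1.0, min(1.0, n[0]))))
    phi = math.degrees(math.acos(max(-1.0, min(1.0, n[1]))))
    # quantize to the 30-degree grid (small epsilon guards float dust)
    col = min(int((theta + 1e-9) // step_deg), 5)
    if n[2] < 0:                 # back side: mirror into C6..C11 (assumed test)
        col = 11 - col
    row = min(int((phi + 1e-9) // step_deg), 5)
    return row, col
```

For example, a camera on the negative x-axis looking at the object yields azimuth 0 (column C0) and a mid-range row.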
4
Experimental Results and Analysis
To demonstrate the feasibility of the proposed method, we implemented the Smart Billboard in a cyber museum on a PC with an Intel Core 2 Duo T7250 2.0 GHz and an ATI Mobility Radeon HD 2400 video card, using the 3D authoring tool Virtools. Fig. 10 illustrates exhibitions in the cyber museum represented with the Smart Billboard. In this figure the alpha channel of the mapping image is not used, and the environmental background is eliminated, in order to focus on the effect of the Smart Billboard. Moving the viewpoint changes the normal vectors of the billboards on which the mapping images are imprinted. Fig. 10 also illustrates changes of mapping images as the camera moves vertically. The Smart Billboard reacts adequately under perspective projection, and the resultant images are natural; i.e., the size of the object depends on the distance between the spectator and the plane, as shown in Fig. 10-(a). As the camera moves vertically, the latitude angle of the frontward Billboard is larger than that of the backward one; therefore, the opening of the gourd bottle close to the camera can be seen more clearly, as shown in Fig. 10-(b). Fig. 11 shows the cyber museum where artifacts such as a stone monument and a decanter are represented by Smart Billboards. These are the results of arbitrary navigation in the cyber space, i.e., changing the distance, the view angle, and so on, and they show that the Smart Billboard reacts adequately in the 3D space. To evaluate rendering performance, we compare a relatively simple 3D earthenware bowl composed of 2,400 polygons with the same object represented with the Smart Billboard. For each representation method, we check the rendering time while adding identical objects one by one.
(a)Scene for perspective view
(b) scene for orbit camera operation
Fig. 10. Results for the Smart Billboard: as the viewpoint changes, the Smart Billboard selects an adequate image according to the view angle
Fig. 11. Cyber museum with Smart Billboard (closed view)
Table 1 shows that the FPS (frames per second) slows down from roughly 250,000 polygons in a scene in the case of 3D modeling, whereas the Smart Billboard preserves a consistent FPS. Fig. 12 shows the results for a more complicated object, a 3D stone monument composed of 35,136 polygons. It confirms that the FPS plummets when more than eight monuments appear in a scene. Fig. 13 shows a gilt-bronze incense burner from the Baekje period, national treasure no. 287. Since the relic is so delicate and complicated that it is hard to describe perfectly in 3D, we scanned it with a 3D scanner for 3D modeling. Although we tried to simplify the vertices, it still has a huge number of them, which causes trouble in real-time rendering. For this reason, we replace the 3D description with image-based rendering, the Smart Billboard, using multi-viewpoint images rendered from the 3D modeling data; Fig. 13 shows the results. Further evidence for the advantages of the proposed method is as follows. To confirm the visual quality of the proposed method, we mixed 3D representations with the proposed method in the cyber museum and conducted a survey of sixth-grade primary school students. We studied 29 students (15 female and 14 male) in a blind test, withholding which objects were 3D, and let the users navigate the cyber museum freely for a while. The visitors did not notice the Billboards and took them for 3D objects during their entire navigation. In addition, after revealing which objects were image-based, we measured the level of user satisfaction with a 5-grade evaluation. Most users rated the objects represented with the Smart Billboard as better than the 3D objects. The results show that the proposed method scores well in almost every category, especially interactive convenience and visual quality, as depicted in Fig. 14.
Table 1. Performance evaluation (FPS) for an earthenware

# of objects         1    4   16   64  256   512  1024  2048  4096
3D representation   60   60   60   60   60  30.7  17.5   9.5     4
Smart Billboard     60   60   60   60   60    60    60  59.8  59.6
Fig. 12. FPS plummets when more than eight monuments appear in a scene. This phenomenon appears sooner if the object is more delicate.
(a) Scanned 3D polygons
(b) Still cut images for describing the relic with Smart Billboard
(c) Baekje gilt-bronze incense burner with Smart Billboard in cyber space

Fig. 13. An example of image-based rendering (Smart Billboard) for an extremely complicated relic: (a) scanned huge 3D polygon model of the gilt-bronze incense burner, (b) still images for the Smart Billboard rendered at different view angles from the 3D modeling data, (c) the image-based incense burner represented in the cyber museum at an arbitrary viewpoint
Fig. 14. Comparison of the user satisfaction for the virtual museum
5
Conclusions
This investigation focused on reducing the effort of 3D modeling of objects in virtual reality applications. We propose an effective technique that uses real images to convey a 3D impression instead of 3D modeling of complex-shaped objects. For this purpose, we suggest the Smart Billboard (SB), a realism-enhanced Billboard. The major difference between the SB and the previous Billboard is the mechanism for mapping an adequate image onto the board. The proposed method makes a vivid scene according to the viewer's position and direction by replacing the rendering of 3D objects with the mapping of a properly projected image when the viewer looks at those objects. To construct the image matrix of multi-viewpoint images, we use Object-VR equipment that controls the camera movement and captures images from different view angles. For describing very complicated relics, we use still cuts rendered at intended view angles from scanned 3D models. We also describe how to decide the proper mapping image among the different viewpoint images: from the viewpoint, the SB calculates the viewing vector, figures out the relative view angles in polar coordinates, and selects an image from these two angle values. To validate the appropriateness and usefulness of the presented technique, we embodied a cyber museum and confirmed that the artifacts represented with the SB work properly in 3D virtual space, and we checked considerations such as rendering time, user friendliness, and performance. The results show that the presented methodology is effectively adaptable to many applications asking not only for real-time rendering but also for high-quality display. In particular, this method is good for implementing 3D cyber spaces, such as virtual museums, that strongly require correct visual description of remains and real-time user interaction.
References
1. Burdea, G.C., Coiffet, P.: Virtual Reality Technology, pp. 57–102. Wiley-Interscience (2003)
2. Ahn, E.Y., Kim, J.W.: Personalized Contents Service with User-Context. In: Proceedings of Korea Contents Society Conference, pp. 614–621 (May 2008)
Image-Based Modeling for Virtual Museum
119
3. Fox, G.C.: Portals and Frameworks for Web Based Education and Computational Science (2000), http://www.new-npac.org/users/fox/documents/pajavaapril00/ 4. Shum, H.-Y., Chan, S.-C., Kang, S.B.: Image-Based Rendering, pp. 31–34. Springer, Heidelberg (2007) 5. McMillan, L., Bishop, G.: Plenoptic Modeling: An Image-Based Rendering System. In: Proc. of ACM SIGGRAPH, pp. 39–46 (1995) 6. Tecchia, F., Loscos, C., Chrysanthou, Y.: Image-Based Crowd Rendering. IEEE Computer Graphics and Applications 22(2), 36–43 (2002) 7. Tecchia, F., et al.: Real-Time Rendering of Densely Populated Urban Environments. In: Proc. of Eurographics Workshop on Rendering Techniques, vol. 2, pp. 83–88 (2000) 8. Papagiannakis, G., L'Hoste, G., Foni, A., Magnenat-Thalmann, N.: Real-Time Photo Realistic Simulation of Complex Heritage Edifices. In: Virtual Systems and Multimedia, pp. 218–227 (2001) 9. Ahn, E.Y., Ryu, I.Y., Kim, J.W.: The Efficient Integration of Information for User Preferred Contents Service in Virtual Reality. In: Proceedings of Korea Multimedia Society 2008, pp. 735–740 (May 2008) 10. Magnenat-Thalmann, N., Foni, A.E., et al.: Real Time Animation and Illumination in Ancient Roman Sites. Int'l Journal of Virtual Reality 6(1), 11–24 (2007)
Automatic Tiled Roof Generator for Oriental Architectural CAD Design* Hyun-Min Lee1, Dong-Yuel Choi1, Jin-Mo Kim 2, and Eun-Young Ahn1,** 1
Dept. of Information Communication & Computer Engineering, Hanbat National University, Daejeon, South Korea 2 Dept. of Multimedia, Dongguk University, Seoul, South Korea {Ct2009,aey}@hanbat.ac.kr,
[email protected],
[email protected]
Abstract. In the digital design of oriental wooden architecture, drawing roof surfaces is a very difficult and elaborate job, because the roof is a curved surface generally made from hundreds of roof tiles. This also causes a modification problem: whenever the designer wants to redraw the roof surface, every related roof tile must be adjusted in detail. To overcome this issue and design roofs efficiently, we suggest an automatic roof-generating method applicable in a 3D CAD program. The automatic roof-generating system can control the shape of the roof according to the user's intent. The curved roof surface is roughly determined by the geometrical characteristics of the house's main frame, but details such as roof curvature and LOD (level of detail) can be controlled by the user. The proposed roof system is based on BIM, so it can evaluate and report exact component quantities, which is helpful in the construction process. Keywords: Roof surface, Oriental architecture, CAD design, Automation.
1   Introduction
The roof surface of a Korean traditional building is a 3D curved surface formed by a raised curve and an inward waist curve. The raised curve is a smooth lifting curve running from the center of the roof surface to its corner in the front view. The inward waist curve is a curved path extending from center to corner in the floor plan. The lovely curves of the eaves are not only an aesthetic consideration; basically, they are generated by reconciling the slopes of a concave roof at its four corners [1]. The Korean traditional roof has three leading types: gambrel roof, hipped roof, and hipped-and-gable roof, as shown in Fig. 1. Looking at the development of roof structures, the gambrel roof is the simplest and appeared earlier than any other roof type. *
This research is supported by Ministry of Culture, Sports and Tourism(MCST) and Korea Creative Content Agency(KOCCA) in the Culture Technology(CT) Research & Development Program 2011 and Basic Science Research Program through the National Research Foundation of Korea(NRF) funded by the Ministry of Education, Science and Technology(2010-0021154). ** Corresponding author. T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 120–126, 2011. © Springer-Verlag Berlin Heidelberg 2011
The hipped roof followed; it is good for handling the eaves and gable and is usually used in larger structures. These two types developed into the hipped-and-gable roof, which combines the features of both [2]. The roof side has a plane on which rain falls, as shown in Fig. 2. It consists of sloping planes that are flat, concave, or convex; most traditional wooden structures have concave roofs. Timber framing is the wooden frame or skeleton used to build a house, and its major materials are columns, crossbeams, and purlins. Giving these materials a style is the timber framing method. The framing method refers to the side's cross-sectional structure, regardless of the number of spaces in the front. Houses are divided into 3-purlin, 5-purlin, 7-purlin, and 9-purlin types, determined by how many lines of purlins appear in the longitudinal section. The 5-purlin house is the most widely used among Korean traditional houses, which hints that the most common roof style is the concave roof. In oriental traditional architectural design, the roof has characteristics distinct from other parts such as the stereobate, shaft, and ornamental parts. Although there are several suggestions for 3D design of traditional wooden buildings [3-5], few 3D roof design tools have been developed so far, because a roof is made of hundreds of elementary components, called 'Giwa' in Korean, whose combination appears as a gently curved surface. For this reason, architectural designers spend much time and effort drawing the roof by connecting roof tiles to form an aesthetic and elegant surface manually [6,7]. Once designed, modifying the roof surface is even more difficult. This research focuses on this problem, and we propose an automatic roof-generating method that is immediately applicable to a commercial 3D CAD program.
The proposed method is implemented on a BIM (Building Information Modeling) tool, which has become a topic of conversation in architecture. Since the auto-generated roof can offer information for component quantity evaluation and error checking, it is useful throughout the architectural design process.
(a) gambrel roof
(b) hipped roof
(c) hipped-and-gable roof
Fig. 1. Roof types for Korean traditional wooden architecture
(a) flat roof
(b) concave roof
(c) convex roof
Fig. 2. Roof shape
2   Automatic Roof Generating System

2.1   System Concepts
The roof curve of Korean traditional architecture is a three-dimensional curve composed of the raised rafter curve seen in elevation and the inner waist curve seen in plan. Fig. 3 depicts the concept diagram of the suggested method. For applicability and scalability, we develop it as a plug-in program that can be installed in a CAD program. In this system, many complicated components can be generated from a component library in which each component is prefabricated as a template. However, the template-based approach is inadequate for roof design: each roof tile is very simple, but a great number of tiles must be arranged one by one to form a roof surface with smooth curvature matching the lower frame structure. Roof design is therefore a very difficult and time-consuming job, which is why we need a roof generator that designs the roof automatically. Because the roof is drawn only after the lower structural parts, from the stereobate to the bracket set (Gongpo), are completed, the system can calculate the position and geometrical properties of the roof from the lower structure. The roof generation module then generates the roof surface from roof tile components automatically, and the drawing result is delivered to the CAD program.
Fig. 3. Concept of Roof Generating System
2.2   Analysis and Representation of Roof Shape
Fig. 4 depicts the most common roof shape in Korean traditional buildings. The curves on the roof in Fig. 4 describe the roof shape. C1 is the main stream of the roof, formed by connecting a number of basic objects lengthways. C2 and C3 are formed by the eaves board. The eaves are the lower edges of a roof, usually projecting beyond the columns or walls of the building; they are formed with rafters that project out of the columns. The end of the eaves that projects far out of
the columns is called the stick reach. Unlike western architecture, which rarely has eaves, Korean traditional architecture does: the eaves protect walls and windows from rain and wind, and they provide added convenience and coolness during the summer. The user can control the curvature and shape of each curve by setting some parameters. The roof is completed by extending the main stream (C1) along C2 and L1 (or L2). In detail, the extension proceeds between two end points: one point, P0, on the C2 curve and another point, P2, on L1 (or L2); these are regarded as the end points of the main stream (C1). The curves are represented as Bezier curves. L1 and L2 stand for the ridge of the roof and the descent ridge, respectively.
Fig. 4. The curves defining roof surface in Korean common Architecture
Though the length and curvature of the curves change along the curve section, the variation is smooth and gentle. To satisfy this characteristic, the mid control point (P1) is always kept in the same position relative to the two end points (P0 and P2). More specifically, regardless of where the end points are, the mid-point (P1) is calculated and located using Equation (1) so as to keep a similar triangle:

P1.y = P0.y + (P2.y − P0.y) t1,   for 0 ≤ t1 ≤ 1
P1.z = P0.z + (P2.z − P0.z) t2,   for 0 ≤ t2 ≤ 1        (1)

where P0 ∈ {P | C2(u)} and P2 ∈ L1 or L2.

2.3   Generation of Roof
After deciding the position and curvature of the roof surface, it is possible to generate roof tiles and place them in rows. The roof tile, called Giwa, is generated repeatedly along the curved surface. Like other components, the Giwa is described with the script language supported by the CAD program for do-it-yourself component description. Since the Giwa has a very simple shape, we use a simple surface model generated by extruding a half circle in one direction. In the process of roof design a huge number of vertices are generated, which reduces rendering performance. To solve this problem, the roof-generating system offers LOD (Level of Detail) control that makes it possible to adjust the number of vertices of each component as necessary.
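As a sketch of how Equation (1) and the tile placement could be combined (the paper's implementation is written in ArchiCAD GDL; the Python below, including the assumption that P1's x coordinate is simply the midpoint of the end points, is illustrative only):

```python
# Illustrative sketch: place roof-tile (Giwa) positions along the
# main-stream curve C1, a quadratic Bezier whose mid control point P1
# follows Equation (1) of the text.
def mid_control_point(p0, p2, t1, t2):
    """P1 keeps a similar triangle relative to the end points P0 and P2."""
    assert 0.0 <= t1 <= 1.0 and 0.0 <= t2 <= 1.0
    x = (p0[0] + p2[0]) / 2.0          # assumption: x is not constrained by Eq. (1)
    y = p0[1] + (p2[1] - p0[1]) * t1   # P1.y per Equation (1)
    z = p0[2] + (p2[2] - p0[2]) * t2   # P1.z per Equation (1)
    return (x, y, z)

def tile_positions(p0, p2, t1, t2, n_tiles):
    """Sample the quadratic Bezier at n_tiles points; lowering n_tiles
    acts like a crude LOD control on the generated roof stream."""
    p1 = mid_control_point(p0, p2, t1, t2)
    pts = []
    for i in range(n_tiles):
        u = i / (n_tiles - 1)
        pts.append(tuple(
            (1 - u) ** 2 * p0[k] + 2 * (1 - u) * u * p1[k] + u ** 2 * p2[k]
            for k in range(3)))
    return pts
```

In this sketch, extending the stream along C2 and L1 would amount to re-evaluating the curve for successive (P0, P2) pairs sampled from those curves.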
3   Results and Discussion
The automatic roof-generating system is developed as an add-on program that executes on a commercial CAD system, namely ArchiCAD 12, and the script language GDL (Geometry Description Language) is used for the description of components [8]. The tests were run on a PC with a Xeon X5550 2.66 GHz processor and an NVIDIA Quadro FX4800 video card. The roof surface is generated by extending the main concave roof-tile stream along the line L1, as shown in Fig. 3, and convex roof tiles are then built on the edge of the concave line. Fig. 5 shows that the curvature of the roof can be controlled by the t1 and t2 parameters of Equation (1). Fig. 6 demonstrates the execution of the roof generator. From the viewpoint of construction engineering, Fig. 6-(b) is unrealistic, so the system detects this error and replaces the control value with the nearest reasonable value within the valid range.
(a) Curvature for roof main stream C1
(b) Extension of the C1 along with the C2 and L1
Fig. 5. Pavement of concave roof tile stream
(a) Roof shape control (t1 = t2 = 0.3)
(b) Roof shape control (t1 = t2 = 1)
Fig. 6. User Interface for setting up the curvature of main stream
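The validity check described for Fig. 6-(b), replacing an out-of-range control value with the nearest reasonable one, can be sketched as follows (the bounds come from Equation (1)'s 0 ≤ t ≤ 1; treating them as the "reasonable" range is an assumption):

```python
def clamp_control(t, lo=0.0, hi=1.0):
    """Replace an out-of-range curvature control value with the
    nearest reasonable value inside the valid range [lo, hi]."""
    return max(lo, min(hi, t))
```

For example, a user-entered value of 1.4 would be silently corrected to 1.0 before the roof stream is regenerated.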
Fig. 7 shows that the proposed method can generate two types of roof shape, namely the hipped-and-gable roof (Fig. 7-(a)) and the gambrel roof (Fig. 7-(b)). We implemented the method on the commercial architectural CAD program ArchiCAD.
(a) Hipped-and-Gable Roof
(b) Gambrel Roof
Fig. 7. Two types of roof generated by the proposed method
Fig. 8 depicts a design result produced using a CAD program into which our roof generator is plugged. Fig. 8-(a) and (b) show the result of the roof generator and its 3D rendering in a perspective view, respectively.
(a) Execution of the roof generator
(b) 3D in perspective viewpoint
Fig. 8. Result for Roof Generator in Process of CAD design
4   Conclusions
Recently, CAD design has become common in oriental wooden architecture as well as in modern building. There are many differences between modern and oriental wooden architecture with respect to design. Oriental wooden architectural design is relatively much more difficult because the designer should consider the combining rules between connecting components. Tiled roof design in particular is hard and time-consuming work, because the designer must arrange roof tiles one by one to form a roof surface; this leads to serious trouble such as rendering delay and lack of freedom in modification. In many cases a component library, in which frequently used components are stored, is used for design convenience, but a tiled roof is not well suited to a component library. This paper suggests an efficient method to generate the roof surface automatically from the information of the lower constructions. Moreover, the user can control and adjust the roof surface in detail through prompt input changes. The proposed method is implemented as an add-on program that can be installed in a commercial CAD program. The roof generator is developed on a BIM basis, so it supports
additional functions. For example, the system can figure out the quantities of components and check the errors that occur in the process of design.
References
1. Kim, W.-J.: Korean's Architecture Terms with Pictures. Bareum (2000)
2. Chang, G.-I.: Korea's Wooden Architecture. Boseong-gak, pp. 247–324 (2003)
3. Chiou, S.-C., Krishnamurti, R.: The Grammatical Basis of Chinese Traditional Architecture. Languages of Design, 5–31 (1995)
4. Li, A.I.-K., Tsou, J.-Y.: The Rule-based Nature of Wood Frame Construction of the Yingzao Fashi and the Role of Virtual Modeling in Understanding It. In: Computing in Architectural Research, Proc. of the International Conference on Chinese Architectural History, Hong Kong, pp. 25–40 (1995)
5. Choi, J.-W., Hwang, J.-E.: KotaView: Simulating Traditional Korean Architecture Interactively and Intelligently on the Web. Automation in Construction 14, 1–14 (2005)
6. Yang, J.-Y.: A Study on the Framed Structure of the Gambrel Roof in Korean Traditional Architecture. Journal of Architectural Institute of Korea 25(2), 155–167 (2009)
7. Kim, J.-H., Joo, N.-C.: A Study on the Relationship between Roof Shape and Floor Plan in Korean Traditional Architecture. Journal of Architectural Institute of Korea 5(2), 45–57 (1989)
8. Dobelis, M.: GDL - New Era in CAD. In: 6th International Conference on Engineering Graphics BALTGRAF-6, pp. 198–203 (2002)
Understanding and Implementation of the Digital Design Modules for HANOK* Dong-Yuel Choi1, Eun-Young Ahn1,**, and Jae-Won Kim2 1
Dept. of Communication Information & Computer Engineering, Hanbat National University, Daejeon, South Korea 2 Dept. of Mechanical Engineering, Sunmoon University, Asan, South Korea
[email protected],
[email protected],
[email protected]
Abstract. This paper focuses on an easy and efficient design method for drawing Korean-style houses. The goal is achieved through template-based elementary components for architectural design. In Korean-style construction, a building is formed by stacking and jointing wooden components according to their binding rules. There are many joint rules between the components, which complicates the digital design of a Korean-style building. This paper proposes a method to draw an oriental wooden house easily using prefabricated, template-based component representations on a BIM (Building Information Modeling) tool. With the proposed method, the components can be transformed and reused, blending them with modern architectural components to create creative and practical living spaces. Moreover, because the proposed method is implemented on a BIM tool, it also serves as an error detector and information provider for the user during and after the design process. Keywords: joint rule, Hanok, architectural design tool, template.
1   Introduction
Hanok is the traditional Korean house, adapted to the geographic environment and lifestyle of Korea, a country with four distinct seasons. Hanok therefore has a unique system for withstanding the severe heat of summer and the cold of winter: a special heating system retains warmth against the northwesterly wind in winter, while in summer the Dae-Chung floor and the Bunhap window system are used to open up the space and cool down the summer heat. Bunhap in particular is a unique window system found nowhere else in the world. Beside this, another strong point *
This research is supported by Ministry of Culture, Sports and Tourism(MCST) and Korea Creative Content Agency(KOCCA) in the Culture Technology(CT) Research & Development Program 2011 and Basic Science Research Program through the National Research Foundation of Korea(NRF) funded by the Ministry of Education, Science and Technology(2010-0021154). ** Corresponding author. T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 127–134, 2011. © Springer-Verlag Berlin Heidelberg 2011
worth our attention is that the Korean-style building is a green house: it is made of eco-friendly materials such as wood, stone, and earth. Many people are interested in these advantages of Hanok and try to bring these strengths into their modern life, which seems to be a growing trend these days. There are examples of renovating traditional buildings while leaving the original style and grace unspoiled. These new attempts and innovations have opened up new possibilities for Hanok as a modern space. As interest in and experimentation with Hanok increase, the demand for digital design of Hanok is growing. Systematic approaches to structure representation and comprehension of wooden houses have been researched in recent years [1-5], but some troubles remain to be solved in the digital design of Hanok. Architectural CAD does not sufficiently support the construction of Korean wooden structures, so designing Korean-style architecture is very difficult. The simple components in a CAD system are sufficient when the designer wants to draw a western house, but in the case of an oriental wooden house there are many problems, mainly caused by the joint rules for the connected components in Hanok. Because there are many cases of the stacking principle, it is very difficult to understand and draw a building exactly. In this paper, we propose an easy way to design traditional architecture from prefabricated parametric components that can be transformed and modified easily. This approach is useful for the digital design of Hanok, since the method is implemented in a BIM (Building Information Modeling) tool. BIM embodies actual configurations and information in a 3-dimensional drawing system; it automatically stores all the information of the building in a database and uses it to offer various contents when required [6]. It can thus play the role of an error detector and an information provider in the process of design and construction.
2   Building Style of Hanok

2.1   Structure Prototypes
For comprehension, the construction frame and the component names of a Korean building are presented in Fig. 1. In wood structuring there are three main elements, namely the horizontal, the vertical, and the diagonal elements. The stacking principle is normally seen in structures where the horizontal elements are stacked on top of the vertical elements. The inserting principle is normally seen in joints where two elements are penetrated or inserted into each other. The space-measuring concept for the roof structure is named Ryang, which refers to the diagonal bridging between the purlins [7]. The 3-Ryang house is constructed by stacking the main purlin on the column, crossing the cross beam, then setting up the board post on the center of the cross beam and stacking the highest purlin, as shown in Fig. 2-(a). In private houses the 7-Ryang house is rarely used, while the 5-Ryang house appears commonly (Fig. 2-(b)) [8]. The 5-Ryang house is a type that adds a middle purlin between the main purlin and the highest purlin, as shown in Fig. 2-(c).
2.2   Binding Order and Joint Rules
Most CAD programs are good for designing western houses. But in the case of designing an oriental wooden house, there are some problems. These problems are
Fig. 1. Components of oriental wooden architecture
(a) 3-Ryang Type
(b) 5-Ryang Type
(c) 1-high column 5-Ryang Type
Fig. 2. Space measuring in Hanok
(a) Non bracket-set house
(b) Simple bracket-set house
Fig. 3. Shape variation in head of column
mainly caused by the joint rules for the connecting components in Hanok. Fig. 3 shows two examples of joint rules. Since there are many joint rules in Korean traditional buildings, designers must design the components they want in detail according to the coupling scheme. Binding order is another cause of problems. Fig. 4 explains the correspondence between the binding order and the variation of component shape. Generally, the beam and cross-beam are inserted into the main post, and the head shape of a post varies with the combination of these components. The complicated components and their relationships are impeding the growth of digital design for Hanok. It is also inefficient to offer all components individually in a CAD system, because such a large number of components confuses users trying to select the proper component they need. For this reason, we suggest a template-based description of the components, which is helpful for intuitive and flexible design.
3   Hanok Module and BIM Modeling

3.1   System Overview
We present an object-oriented representation for the flexibility, convenience, and reusability of the components, so all components are described in parametric form. Fig. 4 shows the system overview of the proposed module, implemented and plugged into a commercial architectural design system. When the designer requests a component for a traditional house in the BIM tool, a dialogue pops up for setting the parameters. The user decides the component's attributes through the dialogue; otherwise, default values are assigned to the member variables. The parametric template for the component in the library is then activated, instantiated, and drawn in the CAD system.
Fig. 4. System Overview
3.2   Parametric Component Descriptions
As mentioned before, traditional wooden architecture has unique coupling rules between related components. As a result, there may be many kinds of shape even among components that are functionally the same. A template-based description of the components is a good solution for handling them: among components that resemble each other in usage and exterior appearance, a representative prototype is defined as a template with parameters for their attributes. For easy drawing, it is important to decide what is treated as a parameter and how many parameters are strictly necessary. Fig. 5 shows the components' parameters and their relationships. For example, Post-ⓐ, Post-ⓑ, Post-ⓒ, and Post-ⓓ are the parameters deciding the shape of the head of the column, and they take the same values as Bo-ⓐ, Dori-ⓐ, Boaji-ⓐ, and JY-ⓐ, respectively.
Fig. 5. Relationship between the CAD system and the proposed component library
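A minimal sketch of the parametric-template idea and the linked parameters of Fig. 5 (the paper's components are written in ArchiCAD GDL; the Python class below and its parameter names "a"–"d", standing in for ⓐ–ⓓ, are illustrative assumptions):

```python
class ComponentTemplate:
    """A parametric template: a prototype plus named parameters.
    Actual components are instantiated from it with user input
    from the dialogue, or with default values."""
    def __init__(self, name, defaults):
        self.name = name
        self.defaults = dict(defaults)

    def instantiate(self, **overrides):
        params = dict(self.defaults)
        params.update(overrides)  # dialogue input, else defaults
        return {"type": self.name, **params}

def bind_head_shape(post, bo, dori, boaji, jy):
    """Coupling rule from Fig. 5: the column-head parameters
    Post-(a)..(d) take the same values as Bo-(a), Dori-(a),
    Boaji-(a), and JY-(a), respectively."""
    post["a"], post["b"], post["c"], post["d"] = bo["a"], dori["a"], boaji["a"], jy["a"]
    return post
```

Enforcing the coupling rule at instantiation time is what lets the same Post template yield a different head shape for every combination of connected components.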
4   Implementation and Results
To generate and handle the proposed components, we developed a traditional-architecture module. The module is plugged into the CAD system and makes it easy to use traditional components there. We implemented the proposed method on a commercial CAD system, ArchiCAD version 14, with GDL (Geometric Description Language), the script language of the system [9,10]. Fig. 6 shows the user interface for setting the parameters of a component.
Fig. 6. User interface for drawing a traditional architectural component
The components for Korean-style buildings are classified and defined as templates with member variables, and actual components are generated from these templates. With respect to efficiency, template-based component description is very important, because many derivatives of the same component are possible according to the binding pattern. Moreover, the attributes of the components are used by the BIM system to offer useful information to the user. Fig. 7 depicts the diverse instances that can be obtained from one component template by changing the attributes or type information of the component. Fig. 8 shows a floor plan drawn with the proposed components and library, together with its interior and exterior 3D views.
Template for Soro: Negal Soro, Segal Soro, Yangal Soro, Neyupgal Soro, Seyupgal Soro, Yangyupgal Soro
Fig. 7. Prototype of Soro and its variants
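The prototype-to-variant relation of Fig. 7 can be sketched as follows (the variant names come from the figure; the attribute encoding and the default dimensions are illustrative assumptions):

```python
# Illustrative sketch: deriving the Soro variants of Fig. 7 from one
# prototype template by changing only the type attribute.
SORO_VARIANTS = ["Negal", "Segal", "Yangal", "Neyupgal", "Seyupgal", "Yangyupgal"]

def make_soro(variant, width=150, height=90):
    """Instantiate one Soro variant; all variants share the prototype's
    geometry parameters, and only the type information differs."""
    if variant not in SORO_VARIANTS:
        raise ValueError("unknown Soro variant: " + variant)
    return {"type": "Soro", "variant": variant, "width": width, "height": height}

# One prototype yields the six instances shown in Fig. 7.
instances = [make_soro(v) for v in SORO_VARIANTS]
```

Keeping a single prototype with a type attribute, instead of six separate library entries, is what keeps the component library small and the user's selection simple.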
Fig. 8. Interior and exterior 3D views and its structural information
5   Conclusions and Future Works
We investigated the problems in designing oriental wooden houses. Because the components of traditional architecture are not manufactured products, almost every component is produced one by one according to an expert's individual experience. Moreover, in the process of designing traditional architecture, CAD does not support the construction of Korean wooden structures, so the digital design of Korean-style architecture is very troublesome and time-consuming work. That is why an intelligent and easy-to-handle design tool is especially needed for traditional architecture. We proposed an easy way to design traditional architecture from templates that can generate variants of a prototype under consideration of the coupling order between components. Since the proposed method is implemented on a BIM system, it provides intelligent functions such as checking combining errors and reporting the component list for the designed architecture. In Korean traditional buildings, the most impressive feature may be the graceful roof surface. From the designer's standpoint, making this surface is very hard work, sometimes taking one or two days, because the surface is made by arranging a number of components such as Sunjayeon, Choonyeo, and so on. In future work, we will try to automate the design of Hanok including the roof frame and surface, and we will investigate rule-based approaches and guidance to assure error-free design.
References
1. Kim, I.: Managing Design Data in an Integrated CAAD Environment: A Product Model Approach. Automation in Construction 7(1)
2. Chiou, S.-C., Krishnamurti, R.: The Grammatical Basis of Chinese Traditional Architecture. Languages of Design, 5–31 (1995)
3. Li, A.I.-K., Tsou, J.-Y.: The Rule-based Nature of Wood Frame Construction of the Yingzao Fashi and the Role of Virtual Modeling in Understanding It. In: Computing in Architectural Research, Proc. of the International Conference on Chinese Architectural History, Hong Kong, pp. 25–40 (1995)
4. Kim, M.: A Program Development of 3D Documentation System for the Korean Traditional Wooden Architecture. In: CAADRIA 2000, pp. 469–477 (2000)
5. Choi, J.-W., Hwang, J.-E.: KotaView: Simulating Traditional Korean Architecture Interactively and Intelligently on the Web. Automation in Construction 14, 1–14 (2005)
6. Dobelis, M.: GDL - New Era in CAD. In: 6th International Conference on Engineering Graphics BALTGRAF-6, pp. 198–203 (2002)
7. Park, S.H., Lee, H.M., Ahn, E.Y.: Implementation of the Traditional Bracket-set Design Modules for BIM Tools and Understanding of the Sung-Rye-Moon Roof Structure. In: MITA 2011 (2011)
8. Chang, G.I.: Wooden Structure. Bosung-gak (1987)
9. Graphisoft: ArchiCAD 12 GDL Reference Manual, vol. 4, pp. 31–114 (2009)
10. Nicholson-Cole, D.: The GDL Cookbook 3. Marmalade (2003)
A Gestural Modification System for Emotional Expression by Personality Traits of Virtual Characters* Changsook Lee1 and Kyungeun Cho2, 1
**
Dept. of Computer Engineering, Graduate Schools, Dongguk University 2 Dept. of Multimedia Engineering, Dongguk University 26, Pil-dong 3-ga, Jung-gu Seoul 100-715, Korea
[email protected]
Abstract. In the expression of human emotions, the level of expression differs even for the same emotion. In this paper, virtual characters that express the same emotion differently according to personality type are examined. For this, the personality traits that influence the creation of human emotions were classified and applied. To verify the applied method, each personality type was implanted into a virtual character, and the same emotions were then expressed differently according to a personality test. The results confirm that the same emotions were expressed with different gestures depending on the character's personality. Keywords: Artificial Emotion, Emotion Adjustment, Emotion Expression, Body Movement, Emotional Virtual Character.
1   Introduction
Emotions arise through interaction among various variables, such as a specific context, object, place, recollection of the past, and relations with others. In addition, different emotions are expressed even in the same place, at the same time, and in the same context, because emotions are expressed differently depending on one's personality. Unlike in the past, game characters can now express a variety of emotions; however, they are still limited to expressing emotions without differentiating them by personality type. This kind of uniform emotional expression may make users feel bored and find the characters unrealistic over time. It is therefore necessary to express diverse emotions by implanting personality types into the characters. In order to express the same emotion differently depending on a character's personality type, the personality traits that influence emotions are defined in this paper, and a way to express the emotions in gestures is investigated. The rest of the paper is divided into the following sections. In Section 2, the classifications of emotions and personality usually used in creating virtual characters are examined, along with studies on the expression of emotions in gestures *
This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2009-0077594). ** Corresponding author. T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 135–145, 2011. © Springer-Verlag Berlin Heidelberg 2011
are described, together with how personality has been used and how those studies differ from this one. In Section 3, the personality traits that influence the rise, fall, disappearance, and continuance of emotions are defined, and the manner in which they are actually used in controlling emotions is described. In Section 4, the feasibility of the proposed method is evaluated by increasing or decreasing emotions based on actual personality test scores; in addition, how the same emotion can be expressed differently through gestures is examined using virtual characters. In Section 5, conclusions and future plans are presented based on the results.
2   Related Works
In this section, the personality classification and emotion classification theories usually used in creating virtual characters are described. In addition, conventional studies on the creation of emotions and on the expression of emotions using gestures are reviewed. Lastly, the differences between this study and the conventional studies are noted.
2.1   Character Classification and Emotion Classification Theories Usually Used in Creating Virtual Characters
In virtual reality and digital content applications, a personality theory that defines human personality as a group of several traits is usually used. In general, five factors are used [1],[2]. This theory, known as the "Big Five," classifies human personality along five dimensions, each with six sub-traits. Because an individual personality can be defined using the five dimensions, it is easy to create diverse personalities with relatively few parameters [2]. The emotion classifications usually used in studies on the creation of virtual characters include the OCC model and Ekman's basic emotions. The OCC model is a hierarchical model that determines how an emotional response fits a certain context after evaluating an external event based on three variables: event, agent, and object [3]. Because its 28 different emotions are hierarchically grouped by these variables, each emotion is easy to represent digitally. However, since the model presumes that an agent cannot store emotions, a created emotion can be used only once. Therefore, if this model is used, it is impossible to create compound emotional dynamics, such as the transition into a new emotion. Ekman's basic emotions comprise six human emotions: surprise, fear, disgust, anger, sadness, and happiness [4]. They apply to humans regardless of cultural and racial differences and, because they are easily recognizable, are very convenient for expressing clear emotions [4].

2.2   Expression of Virtual Characters' Emotions Using Gestures
In studies on the expression of virtual characters' emotions, it is essential to express emotions through diverse facial expressions or gestures. In the following paragraphs,
A Gestural Modification System for Emotional Expression by Personality Traits
major studies closely related to this one are described and their differences noted. One of the most famous studies on expressing emotions through gestures is J. Cassell's BEAT project [5]. In that study, emotions are expressed multimodally by mixing gestures, facial expressions, and TTS engine-based voices. Characters are controlled by combining a list of gestures described by an animator; this list is matched to appropriate gestures after being linked with dialogue parameters that take text input. In A. Egges' study, a P&E (Personality & Emotion) simulation artificially creates emotions, which are then expressed through gestures and facial expressions [6]. That work focused on simulating personality and emotion by personality type, and attempted to increase or decrease positive and negative emotions using the Five Factors. W. Su's Affective Story Character is a study on short dialogues performed by virtual characters using gestures [7]; it focused on characters acting through body language by mixing 100 different gestures. A study by K. Amaya examined how motion scale can vary with the type and level of emotion even for the same motion [8]. There, changes in motion scale were revealed through simple actions, such as kicking a ball or drinking water: for the same motion, the scale was relatively low when the testee was feeling sad, whereas exaggerated gestures were observed when the testee was angry. These studies resemble the present one in expressing emotions through gestures. In particular, the results of K. Amaya's study support the theoretical ground of this paper, namely that the scale of gestures can differ with emotion level.
Egges' study is similar in that it creates emotions using personality traits for the rise and fall of emotions. However, Egges used dimension scores, and it is not always feasible to use all Five Factor dimension scores for the rise and fall of emotions. In the Five Factors, each dimension has six sub-traits, each with a basic interpretation that can partially differ depending on whether the score earned is high or low. Among them, the traits that actually influence emotions are very limited. Because the score of each dimension is estimated by summing the scores of its sub-traits, values that have no influence on emotions are also included, and the reliability of the resulting emotion values is poor. In the studies by J. Cassell and W. Su, emotions and emotion levels are expressed by mixing pre-described animations; unless appropriate animations are described in advance, it is impossible to express emotions, and because each motion combines several emotions and emotion levels, many animation files are required. This paper proposes a method that expresses emotions by reflecting the personality of characters while generating multiple motions without requiring many animation files. In addition, to obtain reliable emotion values, a method to increase and decrease emotions by extracting only the personality traits that influence emotions is proposed.
3   Emotion Control by Personality Traits
To express different emotions depending on personality type, two operations are performed: extracting the personality traits that influence emotion levels from among the various personality-related traits, and defining which emotion classification is used. Accordingly, this chapter defines the emotion classification, proposes a method to extract personality traits, and presents an emotion control method using the extracted traits.

3.1   Emotions That Can Be Expressed Using Gestures
This study expresses the emotions of virtual characters through gestures. To make the characters' emotions easy to read, a clear emotion classification is needed; this paper adopts Ekman's basic emotions, which are commonly perceived regardless of cultural and racial differences. Compared with facial expressions, the expression of emotion through gestures is limited: if a gesture is rendered incorrectly, it is hard to predict what emotion the character is trying to convey. In this sense, Ekman's basic emotions are a very clear and easily recognizable classification. In this paper, each emotion in this classification is expressed by the gestures of virtual characters after being increased or decreased depending on the personality traits.

3.2   Classification of Personality Traits That Have an Influence on Emotions
To increase or decrease the emotions above depending on personality traits, it is necessary to determine which personality traits influence each emotion. For this, a personality classification that can define human personality through diverse traits is essential. In this study, the personality traits that influence each emotion were extracted using NEO-PI, a personality scale with which one's personality can be analyzed through the scores of the Five Factor sub-traits. The dimensions and sub-traits of NEO-PI are shown in Table 1 below:

Table 1. Dimensions and sub-traits that belong to NEO-PI

Dimension          Sub-traits
Neuroticism        Anxiety, Anger Hostility, Depression, Self-consciousness, Impulsiveness, Vulnerability
Extroversion       Warmth, Gregariousness, Assertiveness, Activity, Excitement-seeking, Positive Emotion
Openness           Fantasy, Aesthetic, Feeling, Actions, Ideas, Value
Agreeableness      Trust, Straightforwardness, Altruism, Compliance, Modesty, Tender-mindedness
Conscientiousness  Competence, Order, Dutifulness, Achievement striving, Self-discipline, Deliberation
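The dimensions and sub-traits above can be held in a plain mapping; the following minimal Python sketch uses the names from Table 1, but the data structure itself is an implementation assumption, not part of NEO-PI:

```python
# NEO-PI dimensions mapped to their six sub-traits (names from Table 1).
NEO_PI = {
    "Neuroticism": ["Anxiety", "Anger Hostility", "Depression",
                    "Self-consciousness", "Impulsiveness", "Vulnerability"],
    "Extroversion": ["Warmth", "Gregariousness", "Assertiveness",
                     "Activity", "Excitement-seeking", "Positive Emotion"],
    "Openness": ["Fantasy", "Aesthetic", "Feeling", "Actions", "Ideas", "Value"],
    "Agreeableness": ["Trust", "Straightforwardness", "Altruism",
                      "Compliance", "Modesty", "Tender-mindedness"],
    "Conscientiousness": ["Competence", "Order", "Dutifulness",
                          "Achievement striving", "Self-discipline", "Deliberation"],
}

def dimension_of(sub_trait: str) -> str:
    """Return the dimension a given sub-trait belongs to."""
    for dim, traits in NEO_PI.items():
        if sub_trait in traits:
            return dim
    raise KeyError(sub_trait)
```

A mapping like this makes the later trait-extraction step a simple lookup, e.g. `dimension_of("Compliance")` yields `"Agreeableness"`.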
In fact, extroversion and neuroticism are major dimensions of the personality trait models established by the psychologists who support trait theory, the psychological theory holding that human personality is composed of several traits. Neuroticism evaluates overall emotion control, along with pathology such as emotional expression, neurosis, and stress; therefore, most of its sub-traits influence the rise and fall of negative emotional expression. Extroversion is a dimension associated with the expression of positive emotions. Some psychologists interpret extroversion and introversion separately [9]; in NEO-PI, individuals are classified as extroverted or introverted depending on the numerical level. To score a NEO-PI personality test, the sum of the sub-trait scores is used as the dimension score. However, among the 30 sub-traits across the five dimensions, some have nothing to do with emotions, so simply using the five dimension scores to increase or decrease emotions would reduce reliability. Hence, the following criteria were created in this study to extract the sub-traits and dimensions related to emotions: (a) the trait has a direct influence on the rise and fall of emotion; (b) the trait is related to current emotion expression; (c) the trait is related to the continuance and suppression of emotions. To extract the personality traits that meet these criteria, the interpretation guide of the NEO-PI (Korean edition) was consulted. After a preliminary classification based on whether words or content associated with emotions appear in the interpretation of each trait, the traits satisfying the criteria were extracted. The personality traits extracted under each criterion are shown in Table 2; they comprise six sub-traits and one dimension.
Table 2. Personality traits extracted in accordance with each criterion

Criterion                                                     Extracted traits
(a) Direct influence on a particular emotion                  Positive Emotion, Anger Hostility, Depression, Anxiety, Self-consciousness
(b) Related to the decision of emotion expression             Feeling, Extroversion
(c) Related to the continuance and suppression of emotions    Compliance
First, the personality traits that influence particular emotions, and their interpretations, are shown in Table 3. All of the listed traits are sub-traits of neuroticism, except for the Positive Emotion trait under extroversion. Depending on the level of each trait, a decision is made on whether the corresponding emotion arises; therefore, the scores of these traits are used to increase or decrease each emotion.
Table 3. Personality traits that have an influence on particular emotions and the interpretation of each trait

Extracted trait      Interpretation
Positive Emotion     Oriented toward emotional experience (e.g., joy, happiness, love, excitement)
Anger Hostility      Experience of anger, depression, and hostility
Depression           Experience of melancholy, sadness, guilty conscience, despair, and loneliness
Anxiety              Experience of tension, anxiety, and fear
Self-consciousness   Experience of shame, embarrassment, sense of inferiority, and shyness
Second, the traits related to the expression of current emotions are listed in Table 4 below. At first glance, the Feeling trait may not appear related to the decision to express emotions; however, it is interpreted as governing emotional expression depending on its value. In this paper, therefore, the Feeling trait is used as a determinant of the rise and fall of emotions, and the extroversion score is used to determine the weighted values added to or deducted from the emotion values.

Table 4. Traits related to the decision of current emotion expressions and individual analysis

Trait          Interpretation
Feeling        Acceptance of inner emotions
Extroversion   Factors related to interrelations and activeness
Third, the traits related to the continuance and suppression of emotions are shown in Table 5. In this study, these traits are used to continue or suppress the anger and disgust emotions, whose severity differs depending on the counterpart's willingness to accept them.

Table 5. Traits related to the continuance and suppression of emotions and individual interpretation

Trait        Interpretation
Compliance   Determination of the level of accepting counterparts in human relations
The traits that have an influence on each emotion are summarized in Table 6. As described above, in compliance, feeling, and extroversion, the analysis by the NEO-PI test can differ depending on the ups and downs of the values. Therefore, it is necessary to define the criteria of scores, which can divide the values. In the NEO-PI test, each subcharacteristic consists of 8 questions (5-point scale). In addition, the dimension scores are calculated by summing the scores of six sub-traits. In this paper, 50% of the maximum scores that could be recorded in each characteristic and dimension are defined as the reference point. The reference point can be modified by a user.
A Gestural Modification System for Emotional Expression by Personality Traits
141
Table 6. Connection between the extracted personality traits and emotions

Emotion         Personality traits
Happiness       Positive Emotion / Feeling / Extroversion
Anger           Anger Hostility / Compliance / Feeling / Extroversion
Sadness         Depression / Feeling / Extroversion
Fear/Surprise   Anxiety / Feeling / Extroversion
Disgust         Self-consciousness / Compliance / Feeling / Extroversion
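The emotion-to-trait connections of Table 6 amount to a lookup table; a small sketch (the dictionary representation is an assumption made for illustration):

```python
# Table 6 as a lookup: each emotion and the personality traits that modulate it.
# Compliance appears only for anger and disgust, where it also governs duration.
EMOTION_TRAITS = {
    "happiness":     ["Positive Emotion", "Feeling", "Extroversion"],
    "anger":         ["Anger Hostility", "Compliance", "Feeling", "Extroversion"],
    "sadness":       ["Depression", "Feeling", "Extroversion"],
    "fear/surprise": ["Anxiety", "Feeling", "Extroversion"],
    "disgust":       ["Self-consciousness", "Compliance", "Feeling", "Extroversion"],
}

def direct_trait(emotion: str) -> str:
    """The criterion-(a) trait with a direct influence on the emotion."""
    return EMOTION_TRAITS[emotion][0]
```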
In the following section, a method to increase or decrease emotions for personality-dependent emotional expression, based on the results of Table 6, is described in detail.

3.3   Emotion Estimation for the Expression of Emotions by Personality
Theories of emotion creation have been defined differently by different psychologists. In general, artificial emotions are created by perceiving the external environment, an approach known as "cognitive theory." An equation commonly used to estimate emotions is as follows:

Et+1 = Et + Et−1                                  (1)
Here, Et, the current emotion level, refers to the level of emotion entered from outside: positive emotions take positive values and negative emotions take negative values. Et−1 refers to the emotions that were already created, and Et+1 is the emotion to be expressed next. In other words, if another emotion is entered from the outside, it may be amplified or transferred into a different emotion depending on the type of emotion. However, this equation has no factor that can influence the rise and fall of emotions, such as personality or environment; hence, characters with different personalities would still express the same emotion values. In this paper, to prevent this, the personality traits are applied to the emotions based on the results of Table 3, as shown in Equation 2:

mEt+1 = (Et + Et−1 + α ± γ) × β                   (2)
mEt+1 is the value obtained by adjusting Et+1 in consideration of personality traits. Here, α is the personality trait that satisfies criterion (a), with a direct influence on each emotion, while β is determined by the "Feeling" trait and decides whether the entered emotion is expressed: for those with low Feeling scores, the emotion is suppressed by multiplying by 0, whereas for those with high Feeling scores, the estimated emotion is expressed by multiplying by 1. γ is an "Extroversion" value that increases or decreases the estimated emotion: for extroverted persons with high extroversion scores, γ is added to the estimated emotion, whereas for introverted persons, the estimated emotion is decreased by γ.
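The roles of α, β, and γ can be sketched as a single function. Note that the exact placement of the operators in Equation 2 is reconstructed from the prose, so the combination below — α added directly, γ signed by extroversion, and β applied as a 0/1 gate over the whole sum — is an assumption:

```python
def modulated_emotion(e_t: float, e_prev: float,
                      alpha: float, feeling_high: bool,
                      gamma: float, extroverted: bool) -> float:
    """Sketch of Equation 2: mE_{t+1} = (E_t + E_{t-1} + alpha +/- gamma) * beta.

    alpha: criterion-(a) trait value with a direct influence on the emotion.
    beta:  0/1 gate derived from the Feeling score (low Feeling suppresses).
    gamma: extroversion weight, added for extroverts, subtracted for introverts.
    """
    beta = 1.0 if feeling_high else 0.0
    signed_gamma = gamma if extroverted else -gamma
    return (e_t + e_prev + alpha + signed_gamma) * beta
```

For example, `modulated_emotion(3, 1, 2, True, 1, True)` yields 7.0, while the same call with `feeling_high=False` yields 0.0, reproducing the suppression behavior described above.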
The estimated emotions are expressed in animation for a certain period of time; if no emotion is left to be expressed, the emotion state returns to 0. Equation 3 defines the duration time of the animation:

Dt = 1 − δ                                        (3)
In this paper, the base duration time (Dt) of an emotion was set to 1. As shown in Table 6, however, for anger and disgust the "Compliance" trait, which is involved in the continuance and suppression of emotions, is additionally applied, unlike for the other emotions. Because those with high Compliance scores tend to suppress their aggressive propensity, the duration of anger and disgust must be shortened. Therefore, the duration is reduced by applying δ, derived from the compliance ratio, to the defined duration time as shown in Equation 3.
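The duration rule can be sketched as follows; how δ is derived from the compliance ratio is not specified, so equating the two here is an assumption:

```python
def emotion_duration(base: float = 1.0, emotion: str = "happiness",
                     compliance_ratio: float = 0.0) -> float:
    """Sketch of Equation 3: D_t = 1 - delta.

    Compliance shortens only anger and disgust (Table 6); other emotions
    keep the base duration. Assumption: delta equals the compliance ratio.
    """
    if emotion in ("anger", "disgust"):
        return base - compliance_ratio
    return base
```

For instance, a character with a compliance ratio of 0.25 holds anger for 0.75 of the base duration but expresses sadness for the full duration.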
4   Test and Results
Two tests were performed to verify the method proposed in Section 3. First, a personality test was administered, and the emotions were increased or decreased using the scores earned from it to verify the equations above. Second, a comparative test was performed after applying the fluctuating emotion values to actual characters through the gesture deformation system. The emotions used for the tests were based on the classification defined in Section 3.1.

4.1   Test Environment and Testing Method
For the first test, two persons with different personality types were selected, and their personality data were obtained through a NEO-PI test. From the obtained data, only the personality traits related to the emotions defined in Section 3.2 were extracted and used. Table 7 shows the scores of the personality traits of each testee.

Table 7. Scores of personality traits of testees

Personality trait    Testee 1   Testee 2
Positive Emotion        28         19
Anger Hostility         28         20
Depression              25         25
Anxiety                 25         26
Self-consciousness      21         29
Feeling                 30         19
Extroversion           164        144
Compliance              26         27
The scores in Table 7 are the raw scores obtained through the NEO-PI personality test. Testee 1 is a relatively extroverted person with a high extroversion score, while Testee 2 is somewhat introverted. These data were applied to the equations of Section 3.3 to examine feasibility.
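As an illustration of how raw Table 7 scores might feed the Equation 2 terms: the paper does not specify how raw scores map onto the magnitudes of α and γ, so the normalization below (scores divided by their maxima, thresholds at 50%) is entirely an assumption, shown only for Testee 1's "joy" emotion:

```python
# Illustrative only: plugging raw Table 7 scores into the Equation-2 terms.
# The scaling (scores normalized by their maxima) is an assumption, not the
# paper's method.
TRAIT_MAX, DIM_MAX = 40.0, 240.0   # 8 questions x 5 points; 6 sub-traits summed

def joy_value(e_t, e_prev, positive_emotion, feeling, extroversion):
    alpha = positive_emotion / TRAIT_MAX                  # direct "joy" trait
    beta = 1.0 if feeling >= TRAIT_MAX / 2 else 0.0       # Feeling gate at 50%
    gamma = extroversion / DIM_MAX
    sign = 1.0 if extroversion >= DIM_MAX / 2 else -1.0   # extrovert vs introvert
    return (e_t + e_prev + alpha + sign * gamma) * beta

# Testee 1 (Table 7): Positive Emotion 28, Feeling 30, Extroversion 164
testee1 = joy_value(3.0, 0.0, positive_emotion=28, feeling=30, extroversion=164)
```

This yields a value above the raw input of 3.0, consistent with the increase reported for Testee 1 in Section 4.2; the exact numbers in Fig. 2 depend on the paper's actual scaling, which this sketch does not claim to reproduce.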
After applying the results of the first test to actual characters, a second test compared how gestures change by testee under the same emotion. This test required an environment in which motions of different scales can be performed by personality type. Therefore, weighted values were added to each bone of the virtual characters by personality type, and an Emotional Animation Tool (EATool) was built to express diverse emotions through different gestures. The EATool loads a character that can perform the animation and implants personality-dependent weighted values into each bone: after bones are selected, weighted values are chosen and saved. When emotion values are then entered, the scale of motion varies with the saved weighted values. Fig. 1 shows the EATool in operation.
Fig. 1. Execution of EATool
The EATool is divided into two parts: a display of the character, as shown in Fig. 1 above, and controls for adjusting the weighted values that calibrate a specific part of the character. The character output is divided into the original animation and a generated animation to which the emotion values are applied.

4.2   Test Results
As mentioned above, two tests were performed. The first verified the equations of Section 3.3 using the data from Table 7. In the second, the emotion values obtained from the first test were applied directly to characters and compared. Fig. 2 below plots the emotion values before and after applying the equations:
Fig. 2. Comparison of the “Joy” Emotion by Testee
In Fig. 2, the raw input values for the "joy" emotion are compared with the values estimated for each testee. The experiment ran over 10 events, with different input values entered for each. Compared with the raw input, the emotion values obtained for each testee through the equations of Section 3.3 were increased. In addition, the values obtained for the same input differed slightly depending on the testee's personality: the extroverted Testee 1 obtained higher emotion values than the introverted Testee 2. A part of the obtained data was extracted and applied through the EATool. Fig. 3 compares the original animation with the animations driven by each testee's emotion values.
Fig. 3. Comparison to original animation with the “joy” emotion by testees
In Fig. 3, the original animation is on the left, Testee 1 in the middle, and Testee 2 on the right. The characters jump high while extending their arms; to compare the same scene, the same frames were captured. The findings show that the extroverted Testee 1 had greater arm and leg
motions than Testee 2. Thus, even though the same emotion values were entered, different values were obtained depending on a testee's personality traits, and when the obtained values were applied to actual characters, the result felt different from the original animation. In other words, different gestures can be generated from a single animation file by applying different personality-based weighted values, without creating separate animation files for each personality type.
5   Conclusion and Future Directions
Depending on their personalities, people can feel differently even in the same context. In this paper, a method to express emotions through gestures according to personality type has been proposed. When personality test scores were applied to the proposed equations, the emotion values differed by personality even when the same input values were entered. In addition, by applying personality-based weighted values to each bone using the EATool and entering the emotion values, the resulting gestures for emotional expression were compared by testee; the scale of the gestures differed by personality even for the same emotion. In the current EATool, however, the bones to which weighted values are applied must be selected manually. In future studies, the weighted values should be applied automatically, depending on the personality scores.
References

1. Lee, H.: Emotional Psychology. Bobmunsa (2002)
2. Park, A.: Understanding of Personality Developmental Psychology. Kyoyoockbook (2006)
3. Ruebenstrunk, G.: Emotional Computers (1998)
4. Ekman, P.: Emotions Revealed. Owl Books (2006)
5. Cassell, J.: BEAT: The Behavior Expression Animation Toolkit. In: Proc. ACM SIGGRAPH 2001, pp. 477–486 (2001)
6. Egges, A., Kshirsagar, S., Magnenat-Thalmann, N.: Generic personality and emotion simulation for conversational agents. Computer Animation and Virtual Worlds 15(1), 1–13 (2004)
7. Su, W., Pham, B., Wardhani, A.: Personality and Emotion-Based High-Level Control of Affective Story Characters. IEEE Transactions on Visualization and Computer Graphics, 281–293 (2007)
8. Amaya, K.: Emotion from Motion. In: Proceedings of the Conference on Graphics Interface, pp. 222–229 (1996)
9. Larsen, R.J., Ketelaar, T.: Personality and susceptibility to positive and negative emotional states. Journal of Personality and Social Psychology 61, 132–140 (1991)
An Automatic Behavior Toolkit for a Virtual Character*

Yunsick Sung1 and Kyungeun Cho2,**

1 Dept. of Game Engineering, Graduate School, Dongguk University, 26, Pil-dong 3-ga, Jung-gu, Seoul 100-715, Korea
2 Dept. of Multimedia Engineering, Dongguk University, 26, Pil-dong 3-ga, Jung-gu, Seoul 100-715, Korea
[email protected]
Abstract. Approaches that apply programming by demonstration (PbD) to automatically generate the behaviors of virtual characters have been actively studied. One such approach delivers the knowledge of a predecessor directly to the virtual character: the character learns the behaviors to be executed by observing the predecessor's behaviors, all consecutive actions are derived from the collected actions, and the behaviors to be executed are selected from the derived behaviors using the Maximin Selection algorithm. However, these approaches collect a large amount of data in real time; as the amount of data grows, its analysis becomes difficult. This paper proposes a toolkit that employs PbD to automatically generate the behaviors of virtual characters from those of a predecessor, together with an approach to manage and analyze the collected data. An experiment verified that the proposed toolkit could generate a script of virtual-character behaviors for driving in a car simulation. Keywords: Behavior Toolkit, Programming by Demonstration, Virtual Character, Agent Framework.
1   Introduction
Diverse approaches have been developed to automatically generate the behaviors of autonomous virtual characters. One such approach applies the technique of programming by demonstration (PbD) [1]: it collects the movements of a virtual character controlled by a human being and automatically generates behaviors from the collected data. For example, one study examines whether a virtual character can learn a series of consecutive actions from collected data [2]; another investigates a framework for generating behaviors from collected data [3]. However, these approaches face the following issues. First, the collected data are large because they are gathered from a human being in real time, which makes them difficult to manage. Second, it is difficult to *
This research was supported by HUNIC (Hub University for Industrial Collaboration) at Dongguk University. This paper summarizes the results of the "Development of a Supervised Learning Framework for Eldercare Robot Contents" project. ** Corresponding author. T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 146–154, 2011. © Springer-Verlag Berlin Heidelberg 2011
analyze the collected data: there are limits on the quantity of data that can be inspected by hand, so an intuitive analysis approach that includes graphs is required. The final problem is that the generated behaviors are platform dependent; that is, the behaviors that virtual characters are expected to execute are generated in a manner tightly coupled to the implementation platform, so if the platform changes, the approach must be implemented again. To solve these problems, an approach should systematically manage and analyze the collected data, and the generated behaviors should not depend on any particular platform. This paper proposes a toolkit that automatically generates the behaviors of virtual characters. The toolkit provides the user interface required for generating behaviors and saves the results as scripts, so it can be applied to various platforms. To evaluate the proposed approach, it was successfully used to generate scripts by learning the behaviors required for driving, based on human demonstration, in a car simulation. The remainder of this paper comprises the following sections. Section 2 introduces previous research based on PbD. Section 3 presents the toolkit for generating behaviors. Section 4 describes the implementation of the proposed toolkit and the generated behaviors. Section 5 summarizes the proposed approach.
2   Related Work
PbD was first proposed as an approach for developing user interfaces [1] but has since been applied in various fields. This section introduces previous research based on PbD: toolkits that capture user-executed behaviors, and approaches that generate the behaviors of a virtual agent or robot from a human demonstration.

2.1   Toolkits Using Programming by Demonstration
A toolkit is generally used to easily and quickly generate the data required to run an application, and toolkits typically offer a convenient interface. PbD is applied to a toolkit so that the data generated by a human being's direct performance of some activity can be utilized. For example, a CAPpella, a context-aware prototyping environment intended for end users, enables an end user operating a product in a real-life scenario to teach the smart environment to identify his or her behaviors by leveraging PbD [4]. With this tool, a developer does not directly define the rules required to identify the behaviors, which increases the rate of behavior identification. Another toolkit learns the application logic to be run in response to a variety of sensor inputs [5]: it generates the required logic via demonstration, then automatically runs the application logic suitable to the sensor inputs on the basis of the generated rules.

2.2   Behavior Generation Methods Using Programming by Demonstration
PbD is applied when automatically generating behaviors so that the movements executed by a human being can be captured and reproduced by a robot or a virtual
character. One previous study employing PbD focused on collecting the actions of a virtual character controlled by a human being, calculating the probability of each action in each condition, and making the virtual character execute consecutive actions according to the calculated probabilities [2]: the character executes actions as a human being does by learning the actions the human executes most often in a particular condition. In other studies, consecutive behaviors were split and learned based on variance and then defined as tasks [6], under the assumption that the behavior changes to a different task when the variance is significant. Another study selected the consecutive actions to be used via the Maximin Selection algorithm, after deriving all possible consecutive actions from the entire set of collected actions [3]. This paper proposes a toolkit that automatically generates the behaviors of a virtual character: it uses PbD to collect the movements executed directly by a human being and then generates the behaviors.
3   Framework of Behavior Generation Toolkit
To enable a virtual character to execute behaviors autonomously, its behaviors must be defined in advance. This section presents a toolkit to automatically generate the behaviors of a virtual character.

3.1   Overview
The proposed approach employs a client-server model to enable multiple users to collect data simultaneously. Data collected by each client are saved in a database through the server. To separate the roles of collecting and analyzing data, the client handles collection while an analyzer provides a user interface for data analysis; for simultaneous analysis by multiple users, the analyzer also accesses the server and queries the database. Thus, the proposed agent framework comprises a client, a server, a database, and an analyzer, as shown in Fig. 1. The behaviors of a virtual character are generated as follows. First, a human being directly controls a virtual character using an input device, and the data generated by this control are saved in the database through the client and the server. After the data collection required to generate the behaviors is complete, the server generates the behaviors using the data saved in the database. The generated behaviors are saved as a script through the server and the client, and the script is then used in a virtual environment.

3.2   Data Structure for Behavior Generation
The following data types are processed when automatically generating behaviors: (1) Movements: The movements of a virtual character controlled by a human being are saved and used to generate actions and behaviors. (2) Actions: Actions are defined by combining the movements that are collected simultaneously.
An Automatic Behavior Toolkit for a Virtual Character
Fig. 1. Relation between an Agent Framework and Virtual Environment
(3) Behaviors: Behaviors are defined using the actions. The behaviors generated through the server are saved in the database and transferred to the client when generating a script. (4) Metadata: While metadata are not essential for generating behaviors, they include the data required to analyze the collected movements. For example, when a movement is collected, the recorded metadata include the coordinates of a virtual character and the time of occurrence of the movements. 3.3
Functions of Message Router and Behavior Generator
The server comprises a message router and a behavior generator. The message router transfers data among the client, analyzer, and database. When data collection is complete, the server generates the behaviors using the behavior generator, as described below. First, an action is defined by its concurrent movements and their duration, as shown in Equation (1), where an is the nth action, mn1 is the first movement comprising the nth action, and dn is the duration of the nth action.

an = (mn1 · mn2 · …, dn)    (1)
Second, as shown in Equation (2), a set of consecutive actions taken from the full sequence of z collected actions defines the jth candidate behavior, cj. In this equation, u and v are the starting and ending positions, respectively, of the actions considered as the candidate behavior.

cj = au · au+1 · … · av, 1 ≤ u ≤ v ≤ z    (2)
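Equations (1) and (2) can be sketched in Python. This is an illustrative sketch only; the helper names `make_action` and `candidate_behaviors` are hypothetical, since the paper does not specify the toolkit's internal data structures.

```python
def make_action(movements, duration):
    """Eq. (1): an action a_n is the tuple of concurrent movements
    m_n1, m_n2, ... together with its duration d_n."""
    return (tuple(movements), duration)

def candidate_behaviors(actions):
    """Eq. (2): every consecutive subsequence a_u ... a_v (1 <= u <= v <= z)
    of the collected action sequence is a candidate behavior c_j."""
    z = len(actions)
    return [tuple(actions[u:v + 1]) for u in range(z) for v in range(u, z)]
```

For z actions this yields z(z+1)/2 candidates, from which the Maximin Selection algorithm [3] later picks the behaviors to keep.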
Y. Sung and K. Cho
Finally, the behaviors to be executed are selected from among the multiple candidate behaviors using the Maximin Selection algorithm [3]; bk denotes the kth selected behavior. After the behaviors are generated, they are stored in the database. 3.4
Functions of Data Collector and Script Generator
The client comprises a data collector and a script generator. The movements of a virtual character controlled by a human being, together with the metadata, are transferred to the data collector. The data collector gathers the data required to generate the behaviors and, when behaviors are to be generated, saves the data in a batch in the database through the server. After the behaviors are generated through the server, the results are transferred to the script generator and a script is generated. So that the generated behaviors can be used on various platforms, the results should not depend on a specific platform; a script can be executed on many kinds of platforms. The script comprises the data defining the behaviors and the functions querying the behaviors, as shown in Fig. 2.

// Behavior Definition
SET a1 (m11 · m12 · …, d1)
SET a2 (m21 · m22 · …, d2)
…
SET az (mz1 · mz2 · …, dz)

SET b1 = a1
SET b2 = a1 · a2
…
SET by = a1 · a2 · … · az

// Function Definition
FUNCTION GetBehavior with index RETURN b_index

Fig. 2. Behavior Data and Behavior Inquiry Function Prototype
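A script generator producing the Fig. 2 layout can be sketched as follows. The `generate_script` helper is hypothetical; the toolkit's actual generator emits Lua and its implementation is not shown in the paper.

```python
def generate_script(behaviors):
    """Emit a platform-neutral behavior script in the Fig. 2 layout.
    `behaviors` is a list of behaviors; each behavior is a list of actions,
    and each action is (movements_tuple, duration)."""
    action_ids = {}
    lines = ["// Behavior Definition"]
    for actions in behaviors:               # define each distinct action once
        for act in actions:
            if act not in action_ids:
                action_ids[act] = len(action_ids) + 1
                moves, dur = act
                lines.append(f"SET a{action_ids[act]} ({' · '.join(moves)}, {dur})")
    for bi, actions in enumerate(behaviors, 1):
        refs = " · ".join(f"a{action_ids[a]}" for a in actions)
        lines.append(f"SET b{bi} = {refs}")
    lines.append("// Function Definition")
    lines.append("FUNCTION GetBehavior with index RETURN b_index")
    return "\n".join(lines)
```

Keeping the emitted text free of platform-specific calls is what lets the same script run in different virtual environments.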
The actions are defined by multiple movements, and a behavior is defined by consecutive actions. The GetBehavior function returns the behavior corresponding to the given index among the multiple behaviors. 3.5
Viewer Functions
The quantity of data generated by human control is large; therefore, it is difficult to monitor and analyze such data in real time. Thus, a
functionality that facilitates the examination of the contents saved in the database after data collection must be provided. The analyzer should provide the following functions. First, it should allow the consecutively collected movements and actions to be examined on a time basis. Second, the multiple movements making up the generated behaviors should be examinable over time. Third, the position of the virtual character that generated the data should be traceable using the metadata. Thus, the proposed analyzer comprises a movement viewer, action viewer, behavior viewer, and metadata viewer.
4
Implementation and Experiments
This section presents an evaluation of the proposed approach. The behaviors required to drive a car were generated using The Open Racing Car Simulator (TORCS). The steering wheel, accelerator, and brake pedal were used to generate the behaviors in the experiment. Each input value was converted to a value from 1 to 100 and transferred to the client. The closer the steering wheel value is to 1, the further the wheel is turned to the left; the closer it is to 100, the further it is turned to the right. A pedal has a value of 50 when it is not in use and a value of 100 when it is pressed fully. The client and the server were implemented as independent applications, as shown in Fig. 3. TORCS, the client, and the server are linked to one another using the TCP/IP protocol.
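The exact mapping from raw device values onto the 1–100 scale is not given in the paper; a plausible encoding consistent with the ranges described above (the function names are illustrative) is:

```python
def encode_wheel(angle, max_angle=1.0):
    """Map a steering angle in [-max_angle, +max_angle] onto the 1..100 scale:
    1 = fully left, 100 = fully right, centre near 50."""
    t = (angle + max_angle) / (2.0 * max_angle)   # normalize to 0..1
    return round(1 + t * 99)

def encode_pedal(pressure):
    """Map pedal pressure in [0, 1] onto 50..100: 50 = released, 100 = floored."""
    return round(50 + pressure * 50)
```

Quantizing all devices onto one integer scale keeps the movement records device-independent before they are sent to the client.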
(a) Client
(b) Server
Fig. 3. Client and Server Interface
Three viewers were used to examine the data, as illustrated in Fig. 4. First, the analyzer generates a graph showing the change in collected actions and movements on a time axis. Second, the analyzer presents a 3D graph showing the movements from generated behaviors in accordance with the time and movement type. Finally, the analyzer shows the movement track of a virtual character, defined by metadata, using 3D coordinates. The behaviors of a virtual character controlling a car were generated as follows. The database, server, client, and TORCS were activated. The server was linked to the database, and the client was linked to the server. TORCS was connected to the client
(a) Action and Movement Viewer
(b) Behavior Viewer
(c) Metadata Viewer Fig. 4. Analyzer Interface
after being activated. The driving movements of a human being were saved in the database through the client and the server. The behaviors generated through the server were saved as a Lua script [7], as shown in Fig. 5. Lua is a scripting language that can be easily embedded in a variety of languages, including C and C++. The generated script contains three parts. First, constants for the input devices are defined: W, B, and A represent the steering wheel, brake pedal, and accelerator, respectively. These constants are used to express the actions and the behaviors in the script. Second, the movements comprising the behaviors are defined in a multidimensional array. Fig. 5 shows two examples of the generated behaviors: the first gradually speeds the car up while turning the steering wheel to the right, and the second gradually steps on the brake pedal. Because these movements are defined using the constants, the definitions of the behaviors are clear. The GetBehavior function receives an index and returns the list of actions comprising the corresponding behavior. The process of generating behaviors with the proposed toolkit was thus demonstrated experimentally, verifying that the script could be generated by demonstration.
W = 0 --Wheel
B = 1 --Brake Pedal
A = 2 --Accelerator

behaviors = {
  --# 1st behavior
  { {action = {{W,50},{A,50},{B,50},}, d = 21},
    {action = {{W,54},{A,62},{B,50},}, d = 131},
    {action = {{W,59},{A,70},{B,50},}, d = 87},
    {action = {{W,65},{A,95},{B,50},}, d = 160},
  },
  --# 2nd behavior
  { {action = {{W,50},{A,50},{B,60},}, d = 30},
    {action = {{W,50},{A,50},{B,69},}, d = 45},
    {action = {{W,50},{A,50},{B,77},}, d = 33},
    {action = {{W,50},{A,50},{B,85},}, d = 38},
    {action = {{W,50},{A,50},{B,90},}, d = 40},
  },
  …
}

function GetBehavior (index)
  return unpack(behaviors[index])
end

Fig. 5. Generated Lua Script
5
Conclusion
This paper proposed a toolkit to automatically generate the behaviors of a virtual character. The toolkit structure and the functions required for each application comprising the toolkit were described. The client collects data, transfers them to the server, and saves the generated behaviors as scripts. The server links the client, analyzer, and database and generates the behaviors. The analyzer provides the interface for querying and analyzing the collected data. Finally, the structure of the data saved in the database was described. Furthermore, the toolkit was implemented according to the proposed structure, and the process of generating behaviors with it was described as follows. First, the process of collecting and analyzing human movements from the driving simulation was explained. Next, the process of saving the generated behaviors as a Lua script was described. Finally, the structure of the generated Lua script was introduced.
References
1. Cypher, A. (ed.): Watch What I Do: Programming by Demonstration. MIT Press (1993)
2. Thurau, C., Paczian, T., Bauckhage, C.: Is Bayesian Imitation Learning the Route to Believable Gamebots? In: Proceedings of GAME-ON North America, pp. 3–9 (2005)
3. Sung, Y., Cho, K., Um, K.: An Action Generation Method of Agent. Journal of Game Society 11(2), 141–149 (2011)
4. Dey, A.K., Hamid, R., Beckmann, C., Li, I., Hsu, D.: a CAPpella: Programming by Demonstration of Context-Aware Applications. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI 2004, pp. 33–40 (2004)
5. Hartmann, B., Abdulla, L., Mittal, M., Klemmer, S.R.: Authoring Sensor-based Interactions by Demonstration with Direct Manipulation and Pattern Recognition. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 145–154 (2007)
6. Koenig, N., Matarić, M.J.: Behavior-based Segmentation of Demonstrated Tasks. In: Proceedings of the International Conference on Development and Learning, pp. 228–234 (2006)
7. Ierusalimschy, R.: Programming in Lua. Lua.org (2003)
Development of Real-Time Markerless Augmented Reality System Using Multi-thread Design Patterns* Daxing Jin1, Kyhyun Um2, and Kyungeun Cho2,** 1
Dept. of Multimedia, Graduate School of Digital Image & Contents, Dongguk University 26, Pil-dong 3-ga, Jung-gu Seoul 100-715, Korea 2 Dept. of Multimedia Engineering, Dongguk University 26, Pil-dong 3-ga, Jung-gu Seoul 100-715, Korea
[email protected] Abstract. In the field of augmented reality (AR) technology, several studies have recently been conducted on the real-time operation of markerless AR systems. However, such systems have higher computational complexity than marker-based systems. This study proposes a method to implement a real-time markerless AR system using the speeded-up robust features (SURF) extraction algorithm and a tracking algorithm based on multi-thread design patterns. Further, a method to quickly identify reference objects even when multiple reference objects are registered is proposed. Single-thread and multi-thread systems are compared, and the performance of the proposed implementation methodology is verified through an analysis of performance with and without the finder thread that searches for reference objects. Keywords: Augmented Reality (AR), Markerless AR, SURF, Tracking, Multi-thread, Real-Time, Homography.
1
Introduction
In the early stages of augmented reality (AR) research, a marker (a thick, black, square-style border) was used as the reference object for estimating an object's position within the area covered by computer vision. Such a marker has the advantage of fast and easy recognition through image processing. From 2000 to 2004, many researchers used a typical marker-based AR tool called ARToolKit [1]; in fact, it is still widely used, and the AR created by this tool can be executed in real time. However, because only a thick black marker is used as the reference object, the marker is not aesthetically appealing and is not appropriate for commercial use. To overcome these drawbacks, several studies have been conducted on markerless AR, which uses general images as markers. For this, a technology to extract the images designated as markers is required. Thus far, studies on feature extraction have introduced several algorithms such as SIFT [2], SURF [3], and FAST-SURF [4]. Further, the use of *
This research was supported by the Collaborative R&D Program through the Small & Medium Business Administration, funded by the Ministry of Knowledge Economy (2010). ** Corresponding author. T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 155–164, 2011. © Springer-Verlag Berlin Heidelberg 2011
FAST-SURF to implement markerless AR in real time has been attempted, because FAST-SURF-based AR can be executed in real time on a computer. However, this attempt was not successful in the mobile services domain, where the performance of the algorithm is considerably poorer owing to hardware limitations. Consequently, it is necessary to execute the tracking algorithm in a parallel architecture in order to realize real-time AR in a mobile environment. In actual applications such as AR books [5],[6], the recognition of a single reference object is not sufficient; AR must be realized for multiple reference objects. However, the computation becomes considerably more complicated as the number of reference objects increases, so further study is needed to solve this problem. In this paper, we propose a real-time AR system development plan for a mobile environment. Specifically, we propose a multi-thread design pattern that executes a conventional feature extraction algorithm and a tracking algorithm in a parallel architecture and makes this pattern available in real time in a mobile environment. Then, we introduce a homography algorithm to correct the time errors that occur upon the parallel execution of the feature extraction and tracking algorithms. Finally, we propose a method to solve the search-time problem by adding a finder thread that decreases the search time for multiple reference objects; this decrease is necessary to apply the proposed system to AR book contents. The rest of this paper is organized as follows. In Section 2, AR-related studies are described. In Section 3, the markerless AR system design patterns are presented. In Section 4, the tests of the proposed methods and their performance evaluations are discussed, and in Section 5, the conclusion is presented.
2
Related Works
Of all the feature extraction algorithms used in markerless AR, SIFT [2] is the most widely used. SIFT quickly extracts features from images with performance robust to various adverse effects such as transformation, noise, and changes in lightness. In 2006, Bay [3] proposed the SURF algorithm, which improved on the speed of SIFT and made it possible to carry out feature extraction in almost real time. Rosten and Drummond [7],[8] proposed a fast corner detection algorithm to quickly extract corner points from images. Since then, the FAST-SURF algorithm has been proposed, substituting the fast corner detector for the interest-point detection step of SURF. This has, in turn, made it possible to realize real-time AR on a computer. However, it is as yet impossible to realize real-time AR with the FAST-SURF algorithm alone in an environment with poor hardware performance, such as a mobile environment; further study is required to realize real-time AR there as well. To extract features and then track them in real time, the Lucas-Kanade (LK) optical flow algorithm has been proposed [9]. Optical flow makes it possible to track changes in the features of video images. The LK method quickly computes the optical flow using a first-order Taylor expansion under the assumption that pixel brightness remains constant.
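The core of one LK step can be sketched as follows. This is a toy single-level version under the brightness-constancy assumption; the practical method adds image pyramids and iterative refinement.

```python
import numpy as np

def lk_step(I1, I2, x, y, win=7):
    """One Lucas-Kanade step: estimate the displacement (dx, dy) of the
    patch around (x, y) between frames I1 and I2, from the constraint
    Ix*dx + Iy*dy + It = 0 solved in least squares over the window."""
    Iy, Ix = np.gradient(I1)                   # spatial gradients (rows=y, cols=x)
    It = I2 - I1                               # temporal gradient
    h = win // 2
    sl = (slice(y - h, y + h + 1), slice(x - h, x + h + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    d, *_ = np.linalg.lstsq(A, b, rcond=None)  # solve A d = b in least squares
    return d                                   # (dx, dy)
```

On a synthetic smooth image shifted by a sub-pixel amount, this recovers the shift to within the linearization error.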
Further, it is necessary to convert plane objects into 3D during the tracking process. In this study, this problem is solved using homography [10]. Homography is an algorithm for mapping one plane to another and estimating the corresponding transformation matrix in 2D space. To realize AR, it is necessary to detect the object transformation in 2D images and estimate the camera transformation matrix in 3D space. In a study released by Kato in 1999 [1], a 3D geometric method was proposed to solve this problem; ARToolKit uses this method to estimate the camera transformation matrix, and it exhibits excellent performance in terms of speed. In this paper, we propose a real-time markerless AR system design method that uses a multi-thread design pattern based on the SURF and LK optical flow algorithms. To correct the position errors that occur because of the difference in operating speed between the SURF and LK optical flow tracking algorithms, a calibration method using a homography algorithm is proposed. Furthermore, we propose a method to realize real-time AR in the case of multiple reference objects by adding a finder thread to the main program.
3
Multi-Thread-Design-Based Feature Extraction and Tracking
In an AR system, it is important to locate objects in 3D space in real time from the 2D images obtained by a camera. This is not easy, because a considerable amount of computation is required to identify and locate these objects during image processing. Moreover, the number of reference images to be compared grows with the number of reference objects, slowing the process further. Hence, we propose the following method. To improve speed, a real-time AR system design that runs the feature extraction and tracking algorithms on separate threads is proposed. Because of the slow computation of the SURF algorithm, however, the SURF thread is not synchronized with the real-time main thread, and a time error is observed. To solve this problem, a homography-based method is proposed. Finally, a way to quickly access multiple reference objects is also proposed. 3.1
Speed Improvement Method Using Multi-thread Design
In this study, the SURF algorithm is used to extract features. Because a real-time process is not possible with this algorithm alone, we propose a method to improve speed by using a multi-thread design pattern. While the SURF algorithm runs repeatedly on its own thread, a module on the main thread tracks the features that SURF produces. In this paper, all images used as markers are defined as “reference objects.” Once the features of a reference object are extracted, the position transformation of the reference object can be tracked by the main thread on the basis of the positions of the extracted features. Because it is then unnecessary to execute the SURF algorithm for every frame, the SURF and tracking algorithms together can run in real time. As shown in Figure 1, the SURF algorithm that extracts the features of reference objects from the images is separated from the main thread. Thus, the main thread is
158
D. Jin, K. Um, and K. Cho*
freed from the burden of a significant amount of computation and can be executed in real time. Figure 1 shows the execution procedure of the SURF and main threads. The details are as follows:

1) Initialization: The feature extraction tables for all reference objects are established. After loading all reference objects, we extract their features using the SURF algorithm and save the extracted features in the feature extraction table.

2) SURF thread: During execution, the images captured by the camera are saved in a shared buffer. The SURF thread obtains an image from the shared buffer, executes the SURF algorithm, and extracts the image features. Then, the group of features extracted from the scene is compared for similarity with each group of reference-object features saved in the feature extraction table. If the similarity is greater than the threshold value, the group of extracted features is saved in the shared buffer to make it available in the tracking stage.

3) Main thread: After capturing images, the main thread obtains the feature group from the shared buffer and executes the tracking algorithm on every image. In this process, points lost during tracking or unclear after tracking are removed. If the feature group has fewer than four points, feature extraction is performed again in the SURF thread, because homography cannot be estimated in the next stage with fewer than four points. If the feature group has four or more points, homography can be estimated using the remaining features and the features matched in the images. Using the homography, we estimate the four vertexes of the object and calculate the 3D spatial coordinates of the camera. Using the camera transformation matrix, we render 3D objects and display them on the 2D scene images using a 3D graphics engine.
Fig. 1. SURF and Main Thread Flowcharts
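The thread split of Figure 1 can be sketched schematically in Python. The `detect` and `track` callables are stand-ins for the SURF detector and LK tracker, and the class and function names are illustrative, not the paper's actual implementation: the slow detector runs on its own thread against the newest frame, while the main loop keeps tracking with whatever features are currently available.

```python
import threading
import time

class SharedBuffer:
    """Exchange point between the threads: newest frame in, newest features out."""
    def __init__(self):
        self.lock = threading.Lock()
        self.frame = None       # most recent camera image
        self.features = None    # most recent (possibly stale) SURF result

def surf_worker(buf, detect, stop):
    """SURF thread: repeatedly run the slow detector on the newest frame."""
    while not stop.is_set():
        with buf.lock:
            frame = buf.frame
        if frame is not None:
            feats = detect(frame)          # expensive feature extraction
            with buf.lock:
                buf.features = feats
        time.sleep(0.01)                   # stand-in for detector latency

def main_loop(buf, frames, track):
    """Main thread: publish each frame, track using the latest features."""
    tracked = []
    for frame in frames:
        with buf.lock:
            buf.frame = frame
            feats = buf.features
        if feats is not None:
            tracked.append(track(feats, frame))   # cheap per-frame tracking
        time.sleep(0.005)                  # simulated camera frame interval
    return tracked
```

The main loop never blocks on the detector, which is exactly why its frame rate becomes independent of the number of reference objects.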
3.2
Correction of Position Errors Using Homography
When the SURF thread (feature extraction) and the main thread (feature-based tracking) are executed under a multi-thread design pattern, images are not matched because of the speed difference between the two threads. While the main thread can execute in real time, the SURF thread is slow because of its significant computation; by the time the SURF thread returns its result to the main thread, several frames have passed. Hence, the feature points extracted by the SURF thread do not match the current image, and the SURF results must be corrected before use. This problem can be solved with homography: if the translation transformation of objects between the start and end of the SURF pass is known, the SURF results can be relocated to the object position in the current image using the homography of that transformation. In this paper, the features used for this correction are called “sub features.” The sub features are defined by the coordinates of corner/end points or points of considerable contrast. In this process, the sub features are quickly extracted from the current image and saved in the shared buffer before the SURF thread extracts its features. While the SURF algorithm executes and its results are estimated, the main thread runs the tracking algorithm on the sub feature points. However, since the sub features are extracted from the whole image, they include points that are not features of the object; the points belonging to the object domain must therefore be filtered. The homography estimated from the filtered points becomes the accurate transformation homography of the object.
For this, we need to estimate the object domain in advance, which is done as follows. Because the reference objects first saved in the system are square images, we set a square domain. We find the matching pairs after comparing the features extracted by the SURF algorithm with the features of the reference objects, and homography is calculated on the basis of the matching pairs. The estimated homography is a 3 × 3 transformation matrix of the object plane. Lastly, the positions of four new vertexes are obtained by applying the estimated homography matrix to the four vertexes; these four new vertexes form the object domain in the image. Once the object domain is found, the following process is carried out. The features extracted by SURF come from the scene image at the start of the SURF pass, so the positions of the sub features must be saved separately when the sub features are extracted. After the SURF thread's results are obtained, each sub feature is checked for inclusion in the object domain, and the points outside the domain are removed. Using the remaining sub features, we estimate the homography between the sub feature positions at the start of the SURF pass and their current positions. This homography now carries the value needed to correct the SURF results: the new points obtained by applying it to the SURF thread's results are aligned with the object positions in the current image. Figure 2 shows the homography-based position error correction stage.
Fig. 2. Homography-Based Position Error Correction Flowchart
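The correction step can be illustrated numerically. This is a minimal sketch; the paper does not specify the estimator, so a standard direct linear transform (DLT) is assumed. The homography is estimated from the tracked sub-feature pairs, then the stale SURF feature positions are pushed through it.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct linear transform: estimate the 3x3 homography H with dst ~ H·src.
    src, dst are (N, 2) arrays of matched points, N >= 4."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = Vt[-1].reshape(3, 3)     # null-space vector of the stacked constraints
    return H / H[2, 2]           # fix the arbitrary scale

def apply_homography(H, pts):
    """Map (N, 2) points through H: homogeneous multiply, then dehomogenize."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]
```

Here `src` would be the sub-feature positions saved at the start of the SURF pass, `dst` their tracked positions in the current frame, and `apply_homography` relocates the SURF result onto the current image.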
As shown in Figure 3, the group of SURF features in the shared buffer is updated on the basis of the corrected results. In the main thread, the 3D camera transformation matrix can be estimated by tracking the group of features in the shared buffer and using the camera calibration. Based on the estimated camera transformation matrix, a 3D model is rendered and composited onto the display using a 3D graphics engine, completing the AR system. 3.3
Finder Thread Design for Handling Multiple Reference Objects
For an actual AR book, AR should be realized across several pages, not just one. However, if there are many pages and reference objects, a considerable amount of computation is required to search the reference objects, and the computation speed is correspondingly poor. Further, there are two cases in which the SURF thread fails to find reference objects while executing the actual AR. First, the user changes pages. Second, even though the page has not changed, the current page cannot be found because of a change in the external environment. It is inefficient to handle these two cases with the same search. In the first case, all reference objects need to be searched. In the second case, they do not: if the information on the current page is managed, only the reference object related to the current page needs to be searched.
Fig. 3. Error Correction Stage
Fig. 4. Finder-Thread-Added System Flowchart
In this study, the process of finding pages is separated from the SURF thread into a new thread, called the “finder thread,” whose role is to find the current page. If the SURF thread fails to find the current page, the finder thread is executed. In the SURF thread, the SURF algorithm is executed for a single page, so its computation speed remains constant. Figure 4 presents a system flowchart including the finder thread. In the main thread, one variable that saves the index information of the current page is added to the shared buffer. On each loop iteration, the SURF thread extracts features from the scene image and matches them with the features of the reference image indicated by the page index variable. If the features do not match, the key points and descriptors of the extracted scene features are saved in shared memory, the finder thread is signaled, and the SURF thread moves to the next loop iteration. If the features match, the finder thread is stopped, the feature group is updated as usual, and the index of the current page is sent to the main thread. The finder thread initially obtains the feature data of all reference images extracted by the SURF thread and waits for an event, which is raised when the SURF thread fails. Once the event is activated, the finder thread matches the scene features in the shared buffer against the features of the reference images and extracts the index of the page currently photographed by the camera. If the SURF thread matches again while the finder thread is still searching for the current page, the finder thread is stopped: such an event means the page could not be found because of external factors, so it is unnecessary to keep the finder thread running. However, if the SURF thread still fails to find the current page by the time the finder thread obtains its result, the user has probably changed the page.
Therefore, the result is reported to the SURF thread. With this method, finding the reference object takes some time when pages are changed; however, when the object is merely lost because of external factors without a page change, no time is wasted searching all the reference objects.
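The finder-thread protocol can be sketched with two events. This is schematic Python; names such as `PageFinder` and `match_page` are illustrative, not from the paper. One event wakes the finder when the SURF thread misses, and the other cancels it when the SURF thread matches again.

```python
import threading

class PageFinder:
    """Finder thread: sleeps until the SURF thread reports a miss, then scans
    all reference pages; stops early if the SURF thread matches again."""
    def __init__(self, match_page):
        self.match_page = match_page    # match_page(scene, page_idx) -> bool
        self.wake = threading.Event()   # set by the SURF thread on a miss
        self.cancel = threading.Event() # set by the SURF thread on a new match
        self.result = None              # index of the page found, if any

    def run(self, scene, page_indices):
        self.wake.wait()                # block until a miss is reported
        for idx in page_indices:
            if self.cancel.is_set():    # SURF recovered: stop searching
                return
            if self.match_page(scene, idx):
                self.result = idx       # report the new current page
                return
```

The per-page cancellation check is what lets the SURF thread's recovery terminate the full scan immediately.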
4
Experiment and Performance Evaluation
To test the proposed method, a simple AR system was developed, with execution speed as the performance evaluation standard. The test used images with a resolution of 640 × 480 and a laptop computer (2.80 GHz CPU, 4 GB RAM, NVIDIA GeForce GTX 285, and Logitech V-UBG35 and V20 cameras). Figure 5 shows the execution of the markerless AR system. The system runs accurately and stably in real time with almost no delay in the position measurement. Further, because the features are followed by the tracking algorithm, already-extracted features can still be tracked and located precisely even in frames where they cannot be recognized by the SURF algorithm. The test consists of two comparisons: a comparison of the single-thread execution method with the method that separates SURF onto its own thread, and, for an AR book, a comparison of the finder-thread-added method with the conventional method. The results of the comparison between the single-thread and multi-thread methods are as follows. Figure 6 shows the measured frames per second (fps). In the single-thread method, the system cannot move to the next frame until the SURF results are obtained; therefore, this method cannot be executed in real time.
Fig. 5. Image of Test Results
Fig. 6. FPS of Single-Thread and Multi-Thread Methods
Fig. 7. Comparison of Time Taken to Search for Objects Again before and after Addition of Finder Thread
Further, the computation speed of the single-thread method decreases as the number of reference objects increases. In contrast, in the multi-thread method using the additional tracking algorithm, the main thread is not influenced by the number of reference objects, and good results (average: 23 fps) are observed. The results of the test performed with the finder thread are explained next. Figure 7 shows the time taken to find a reference object again when it was lost for external reasons, such as the camera view being blocked, while the AR contents were being executed. According to these results, even when the number of pages searched is high, no significant change is found in the time required to find the object again with the finder thread, denoted by the red line in the figure. In contrast, the blue line, which denotes the case without the finder thread, increases linearly.
5
Conclusion
In this study, unlike most studies that aim to speed up the SURF algorithm itself, we investigated a system design for executing an AR system in real time. We introduced a tracking algorithm and a multi-thread design pattern using only the conventional SURF algorithm, without FAST-SURF. Further, we proposed an AR system that can be executed in real time by developing a time-difference calibration method to compensate for the slow processing of SURF. The system would likely run even faster if the common FAST-SURF were combined with the system design method proposed in this study.
References
1. Kato, H., Billinghurst, M.: Marker Tracking and HMD Calibration for a Video-Based Augmented Reality Conferencing System. In: Proceedings of the 2nd International Workshop on Augmented Reality, San Francisco, USA (1999)
2. Lowe, D.: Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision 60(2), 91–110 (2004)
3. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: Speeded Up Robust Features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 404–417. Springer, Heidelberg (2006)
4. Wagner, D., Reitmayr, G., Mulloni, A., Drummond, T., Schmalstieg, D.: Real-time detection and tracking for augmented reality on mobile phones. IEEE Transactions on Visualization and Computer Graphics 16(3), 355–368 (2010)
5. Kim, K., Lepetit, V., Woo, W.: Scalable real-time planar targets tracking for digilog books. The Visual Computer, 1145–1154 (2010)
6. Grasset, R., Dünser, A., Billinghurst, M.: Edutainment with a mixed reality book: a visually augmented illustrative children's book (2008)
7. Rosten, E., Drummond, T.: Fusing points and lines for high performance tracking. In: IEEE International Conference on Computer Vision, pp. 1508–1511 (2005)
8. Rosten, E., Drummond, T.: Machine Learning for High-Speed Corner Detection. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 430–443. Springer, Heidelberg (2006)
9. Lucas, B.D., Kanade, T.: An iterative image registration technique with an application to stereo vision. In: Proceedings of Imaging Understanding Workshop, pp. 121–130 (1981)
10. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision, pp. 32–33. Cambridge University Press (2003)
An Acceleration Method for Generating a Line Disparity Map Based on OpenCL Chan Park1, Ji-Seong Jeong1, Ki-Chul Kwon1, Nam Kim1, Mihye Kim2, Nakhoon Baek3, and Kwan-Hee Yoo1,∗ 1
Chungbuk National University, 410 Seongbongro Heungdukgu Cheongju Chungbuk, South Korea {farland83,szell,kwon,namkim,khyoo}@chungbuk.ac.kr 2 Catholic University of Daegu, 330 Hayangeup Gyeonsansi Gyeongbuk, South Korea
[email protected] 3 Kyungpook National University, Daegu Gyeongbuk, South Korea
[email protected]
Abstract. Stereo matching methods are typically divided into two types: area-based and feature-based methods. Area-based methods apply stereo matching to the entire image and are more widely used. However, since area-based methods calculate the matching points in block units over the whole image, real-time stereo matching with area-based methods requires a significant amount of computing time. This paper proposes a line disparity map creation algorithm that can perform real-time stereo matching through GPGPU parallel processing based on OpenCL by improving the performance of the matching process. Keywords: Line disparity map, Real-time disparity map, Stereo images.
1
Introduction
Stereo matching algorithms have to produce accurate disparity maps to extract three-dimensional (3D) information from stereoscopic images. A disparity map encodes depth as the displacement between corresponding points in the left and right images. Stereo matching algorithms are usually classified into two categories: feature-based and area-based methods [1,2]. Feature-based methods identify the corresponding points between two images using curves and boundary edges that represent the features in stereo images. They can achieve higher disparity accuracy, but they provide matching information only at the distinguishing points. Area-based methods calculate the corresponding points by measuring and comparing the correlation of image areas within a specific window. They process the entire image to determine the corresponding points and can provide more detailed 3D information. The literature contains a number of area-based matching algorithms, including SAD (Sum of Absolute Difference), SSD (Sum of Squared Difference), and NC (Normalized Correlation); these algorithms compare the correlations between corresponding pixels of the two images [3]. However, these algorithms are time-consuming and often miss features because of shadows, occlusion, or differences in lighting. Area-based matching algorithms also have limitations in examining the correlations of the entire image. In this paper, we propose a line disparity map algorithm that can generate disparity maps in real time by improving the performance of the matching process of current area-based matching algorithms. The proposed algorithm is based on the algorithm introduced in [4], but it performs real-time stereo matching through GPGPU (General-Purpose computing on Graphics Processing Units) parallel processing based on OpenCL (Open Computing Language) [5].
∗ Corresponding author.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 165–171, 2011. © Springer-Verlag Berlin Heidelberg 2011
166
C. Park et al.
2
Proposed Line Disparity Map Algorithm
To convert stereoscopic images into 3D information, the proposed algorithm determines the corresponding points across the stereoscopic images in real time and maps the points into a 3D space by calculating the epipolar geometry; its main advantage lies in extracting the matching points in real time. Existing methods require a significant amount of computing time to obtain disparity maps because they calculate all points within a specific window over certain areas and then detect the most similar points among them, as shown in Fig. 1(a). Furthermore, current methods typically take 1080i Full HD images as input data, which entails an even longer computation time. The line disparity map algorithm proposed in this paper compares only the matching points along the lines of the corresponding window, rather than all points, as shown in Fig. 1(b), resulting in improved matching performance.
(a) Area-based matching process
(b) Line-based matching process
Fig. 1. Area-based (a) and line-based (b) matching process
The proposed algorithm compares the matching points of stereoscopic images line by line. Therefore, the line slope at each pixel of a stereo image must be calculated first. Equation (1) calculates the horizontal line slope H(x,y) for the (x,y)-th pixel of an image.
(1)
An Acceleration Method for Generating a Line Disparity Map Based on OpenCL
167
In Equation (1), f(x,y) represents the sum of two tangent values with respect to the x-coordinate and y-coordinate, and D represents the window size of the line used to calculate the slope; it is typically set to 3, 5, or 7. The larger the line window, the higher the accuracy of the disparities, but also the larger the amount of calculation (i.e., computation time); hence, 5 is usually the most appropriate value. If the same value is obtained after comparing the lines, this value is considered a possible corresponding point. When several candidate points are obtained, the matching point is determined by comparing the y-axis lines of the candidate points. The vertical line slope V(x,y) for the (x,y)-th pixel of an image can be calculated similarly to the horizontal line slope, as in Equation (2).
(2)
The insides of the boxes in the left and right images of Fig. 2 show the visualized areas of the changes in the slopes obtained with Equations (1) and (2). Xs and Xe in the graphs of Fig. 2 denote the start and end x-coordinates of the box, respectively, and Yp denotes a specific y-coordinate of the box. The line slopes shown in Fig. 2 provide meaningful information for detecting the disparity map of two images: by observing the changes of the line slopes, it is easy to find the portions of specific areas that correspond between the left and right images. When several possible corresponding points are obtained because of only slight changes in the slopes, the most appropriate corresponding point is determined by evaluating the y-coordinate line slope of each candidate point using Equation (2). With the horizontal and vertical line slopes of both images, we can now obtain a disparity map for the left and right images. To do so, we define the measurement M(x,y) for taking the disparity of a specific (x,y)-th pixel in the left image with respect to the right image, as the following Equation (3).
(3)
In Equation (3), H and V represent the horizontal and vertical line slopes of the left image, respectively, and the corresponding terms for the right image represent its horizontal and vertical line slopes. Based on this measurement, the disparity d(x,y) for the (x,y)-th pixel can be obtained by solving the optimization problem of Equation (4), since rectification has already been performed on the stereoscopic images.
(4)
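Because Equations (1)–(4) are reproduced as images in the original, the sketch below is only a plausible reading of them: the "line slope" is approximated by a windowed horizontal intensity gradient, the measurement M(x,y) by an absolute slope difference, and d(x,y) by the arg-min over candidate disparities along a rectified scanline. The function names and the exact cost form are assumptions, not the authors' formulas.

```python
import numpy as np

def line_slope(img, D=5):
    """Windowed horizontal slope per pixel: the intensity gradient along
    the scanline, averaged over a D-pixel line window (our reading of Eq. (1))."""
    grad = np.gradient(img.astype(float), axis=1)
    kernel = np.ones(D) / D
    return np.array([np.convolve(row, kernel, mode="same") for row in grad])

def disparity_map(left, right, max_d=8, D=5):
    """d(x,y) = argmin_d |H_L(x,y) - H_R(x-d,y)|: a simplified stand-in for
    the paper's measurement M(x,y) and optimization (Eqs. (3)-(4))."""
    HL, HR = line_slope(left, D), line_slope(right, D)
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            # compare only along the rectified scanline, as the paper does
            costs = [abs(HL[y, x] - HR[y, x - d]) if x - d >= 0 else np.inf
                     for d in range(max_d)]
            disp[y, x] = int(np.argmin(costs))
    return disp

# Synthetic rectified pair: the right image is the left shifted by 3 pixels,
# so the recovered disparity should be 3 away from the image borders.
left = np.tile(np.sin(np.linspace(0, 6.28, 64)), (16, 1)) * 100
right = np.roll(left, -3, axis=1)
disp = disparity_map(left, right)
```

Comparing one slope value per candidate disparity, instead of a full 2D block, is what removes the window-area factor from the per-pixel cost.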
168
Fig. 2. Comparison of line slopes
3
Parallel Processing of the Line Disparity Map Algorithm
3.1
OpenCL Parallel Processing
Fig. 3 shows the architecture of OpenCL parallel processing. The host program invokes and launches the kernel for parallel processing, which means the kernel is executed by grids of parallel threads. A grid is constructed from one or more blocks, and all blocks are composed of the same number of threads, with a maximum of 512 threads per block [5]. To implement the proposed line disparity map using OpenCL, we apply the OpenCL parallel processing procedure to the following parts: the computation of the horizontal and vertical line slopes for the left and right images, and the computation of the disparity map of the stereoscopic images. For line disparity map parallel processing, the total number of threads created is proportional to the image size, width * height. Now, consider the procedure for computing the line slopes of the left image using OpenCL. Fig. 4 shows the position of each thread in the kernel [6]. The two line slopes of a specific (i,j)-th pixel are computed by a specific thread, designated by i = get_global_id(0) and j = get_global_id(1). Similarly, a disparity d(i,j) can be computed by the thread assigned by i = get_global_id(0) and j = get_global_id(1). As shown in Fig. 4, the result computed by the thread is stored in P[i+width*j]. When the result of the disparity calculation is denoted as DisparityMap(P), then DisparityMap(P) = P[i+width*j].
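The index mapping just described — each work-item identified by get_global_id(0) and get_global_id(1), writing its result to the flat buffer at P[i+width*j] — can be emulated on the CPU to verify the row-major layout. The per-pixel computation below is a dummy placeholder, not the actual slope or disparity kernel:

```python
width, height = 8, 4
P = [0] * (width * height)   # flat output buffer, as in the paper

def kernel(i, j):
    """Placeholder for the work done by one OpenCL work-item, where
    i = get_global_id(0) and j = get_global_id(1)."""
    return i + 10 * j        # dummy value; would be a slope or disparity

# The global work size is width x height; OpenCL dispatches these
# work-items in parallel, while here we simply iterate over them.
for j in range(height):
    for i in range(width):
        P[i + width * j] = kernel(i, j)
```

The key point of the layout is that consecutive `i` values (the fastest-varying global id) land in consecutive buffer slots, which on the GPU keeps neighboring work-items writing to contiguous memory.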
169
Fig. 3. Architecture of OpenCL parallel processing
Therefore, in one thread, the final line map is created by determining the matching point of the right image from the left image and calculating the distance from point 'P' to the corresponding point (i.e., the distance in pixels). Here, the proposed line disparity map algorithm is used to identify the corresponding points.
Fig. 4. Thread position in the kernel
3.2
Experiments
The experimental system was developed using Microsoft Visual Studio 2008, SQL, MFC, OpenCV, and OpenCL on Windows 7 (32-bit) with a Tesla C1060 graphics card and 4 GB of memory. The experiments were conducted to test both the creation time and the accuracy of disparity maps. The test images provided by the Middlebury University for stereo matching were used to compare the accuracy of the disparity maps created per second [6]. Fig. 5 shows the disparity map test images.
170
Fig. 5. Test images of the Middlebury University for disparity maps
Fig. 6. Experimental results of line disparity map by line size
4
Conclusion
Our experimental results showed that the frame speed could vary in three aspects: (1) image size, (2) size of the line being compared, and (3) size of the disparity step being explored. Even though the size of the disparity step was fixed at 41, which is the maximum disparity of the recorded images, the results varied depending on the recorded images.
171
Fig. 6 shows the experimental results of the CPU-based line disparity map with the test images provided by the Middlebury University. The processing speed for each line size in the line disparity map is presented in frames per second (FPS) in these results. The experimental results demonstrate that disparity maps can be generated in real time, although the accuracy of the obtained disparity maps is lower than that of the test images. This study is still ongoing, and further development of the proposed algorithm is expected to improve the accuracy of the disparities. Acknowledgments. This research was financially supported by the Ministry of Education, Science and Technology (MEST) and the National Research Foundation of Korea (NRF) through the Human Resource Training Project for Regional Innovation, and by a grant of the Korean Ministry of Education, Science and Technology (The Regional Core Research Program/Chungbuk BIT Research-Oriented University Consortium).
References
1. Koo, H.-S., Jeong, C.-S.: An Area-Based Stereo Matching Using Adaptive Search Range and Window Size. In: Alexandrov, V.N., Dongarra, J., Juliano, B.A., Renner, R.S., Tan, C.J.K. (eds.) ICCS-ComputSci 2001. LNCS, vol. 2074, pp. 44–53. Springer, Heidelberg (2001)
2. Wang, J., Miyazaki, T., Koizumi, J., Iwata, M., Chong, J., Yagyu, H., Shimazu, H., Ikenage, T.: Rectangle Region Based Stereo Matching for Building Reconstruction. Journal of Ubiquitous Convergence Technology 1(1), 1–9 (2008)
3. Bae, K., Kwon, S., Lee, Y., Lee, J., Moon, B.: A hardware architecture based on the NCC algorithm for fast disparity estimation in 3D shape measurement systems. J. of the Korean Sensors Society 19(2), 99–111 (2010)
4. Park, C., Jeong, J., Kwon, K., Kim, N., Han, J., Im, M., Jang, R., Yoo, K.: Line Disparity Map for Real-Time Stereo Matching Algorithm. In: 2011 Spring KoCon Conference, pp. 57–58 (2011)
5. OpenCL: http://www.khronos.org/opencl/
6. Middlebury: http://vision.middlebury.edu/stereo/
Hand Gesture User Interface for Transforming Objects in 3D Virtual Space Ji-Seong Jeong, Chan Park, and Kwan-Hee Yoo∗ Department of Computer Education and Department of Information Industrial Engineering, Chungbuk National University, 410 Seongbongro Heungdukgu Cheongju Chungbuk, South Korea {farland83,szell,khyoo}@chungbuk.ac.kr
Abstract. Users have generally controlled objects in a 3D virtual space using a mouse and a keyboard. However, it is not easy to carry out actions in a 3D virtual space through these devices, since they may already be occupied with other tasks such as communication. Therefore, in this paper, we propose a system in which users' hand gestures are used to control objects. In the proposed system, an object can be picked up through a specific hand gesture, and it can then be translated, rotated, and scaled in the x, y, and z directions according to the recognized hand gestures. Keywords: 3D virtual space, hand gesture recognition, object transformation.
1
Introduction
With the advancement of graphics and computer vision techniques, diverse contents integrating them have been produced, leading to the creation of various applications. One of them is a 3D virtual experimental education system in which users can share various experiences with others through direct participation in a 3D virtual space [1]. Unfortunately, however, users of most systems cannot interact naturally with objects in the 3D virtual space, and there are constraints even when they can. This paper proposes a gesture user interface through which users can interact more naturally with objects in a 3D virtual space without extra devices. Compared to traditional user interfaces based on devices such as mice and keyboards, a gesture-based user interface is known to provide better immersion by allowing users to control objects in the 3D virtual space more naturally and intuitively [2,3,4]. It thus becomes possible for users to participate more actively in the 3D virtual space, maximizing the effectiveness of such participation. For these reasons, gesture user interfaces are considered a future interface mechanism [5]. Even though various gesture-related actions, such as movements of fingers, arms, heads, faces, and hands, can be defined as user interfaces, hand gestures tend to be used most widely, and this paper focuses on them. First, hand gestures have to be recognized accurately if they are to serve as a user interface. For recognizing hand gestures, model-based approaches [7] and appearance-based approaches [8,9] have been developed. Appearance-based approaches use image features to recognize the visual appearance of hands to achieve
∗ Corresponding author.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 172–178, 2011. © Springer-Verlag Berlin Heidelberg 2011
Hand Gesture User Interface for Transforming Objects in 3D Virtual Space
173
higher performance than model-based approaches that use predefined models. Generally, two types of gestures can be recognized using these approaches. The first type, called a static gesture, is defined at a specific instant in time; the second type, called a dynamic gesture, appears over a short interval and is specified by a sequence of static gestures. Garg et al. [5] emphasized that a grammar, defined by a sequence of static gestures, is necessary to recognize dynamic gestures. However, such a grammar depends on the gesture recognition system being developed. In this paper, with the purpose of transforming an object in 3D virtual space, we define static gestures and propose a mechanism for recognizing dynamic gestures according to a grammar defined over the predefined static gestures. The rest of this paper is organized as follows. Section 2 illustrates the proposed system for transforming an object based on hand gestures, and Section 3 describes its implementation results and points out further research directions.
2
The Proposed System for Transforming Objects Based on Hand Gestures
In this section, we present a system in which users can transform objects in a 3D virtual space using hand gestures. As shown in Figure 1, the proposed system consists of six parts: part A captures and preprocesses an image or a video from a webcam; part B removes the background of the captured image; part C extracts the region containing a hand gesture from the captured image; part D registers static hand gestures in the gesture DB (database); part E recognizes hand gestures from the captured image by comparing them with the hand gestures registered in the DB and then transforms an object according to the recognized gestures; and part F renders the captured image and the transformed object integrated in the 3D virtual space. Parts A, B, and C can be performed by applying the image processing techniques illustrated in each part of Figure 1 and are therefore not described further in this paper.
Fig. 1. The proposed system configuration for transforming an object in a 3D virtual space based on hand gestures
174
J.-S. Jeong, C. Park, and K.-H. Yoo
Next, we describe the methods for processing parts D, E, and F more specifically. To recognize a gesture more effectively in an image or a sequence of images captured via a webcam, we construct a DB that stores gesture-related information in a hierarchical structure with three levels. The information at the lowest level of the DB represents a static gesture at a specific time. Dynamic gestures, defined as sequences of static gestures, are located at the second level of the proposed DB. A 3D virtual space is composed of fixed objects and movable objects; when a gesture user interface is used to control objects in the virtual space, it is applied only to movable objects. In the proposed system, each movable object can have several dynamic gestures, stored at the top level of the DB. The resulting global hierarchical DB structure is shown in Figure 2. An object can be transformed by several dynamic gestures, and each dynamic gesture can in turn be defined as a sequence of several static gestures.
Fig. 2. A hierarchical structure of the proposed gesture DB
In order to transform a movable object via a hand gesture interface, this paper defines seven static gestures, as shown in Table 1, and stores them at the lowest level of the gesture DB. Each record of the DB represents a static gesture and contains attributes such as an ID, the filename of an image containing the gesture, the model-based or appearance-based features appearing in the gesture, its meaning, and its outputs. As mentioned earlier, both model-based and appearance-based approaches exist for recognizing a gesture from an input image; appearance-based approaches are widely used because of their performance advantages [5]. Since Hu moment invariants [10] are one of the appearance measurements of an image, we use them as appearance features in this paper. They consist of six descriptors encoding a shape with invariance to translation, scale, and rotation, and one descriptor ensuring skew invariance, which enables us to distinguish between mirrored images. After calculating the Hu moments for the images containing the predefined static gestures, they are stored in the fields of the corresponding records at the lowest level of the DB. A static gesture appearing in an input image is recognized by comparing its Hu moments with the Hu moments stored in the DB, as illustrated later in the recognition step for dynamic gestures. The outputs of the static gestures represent the response values produced upon their recognition. For example, when the captured image is recognized as the first static gesture, SG1, its result start_gesture = true is set.
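To illustrate this matching step, the snippet below computes the first three Hu moment invariants from raw image moments (a from-scratch sketch; the paper's implementation would typically use a library such as OpenCV) and matches a query image against stored DB entries by nearest Euclidean distance. The gesture IDs and the tiny synthetic "gesture" blob are made up for the example:

```python
import numpy as np

def hu_moments(img):
    """First three Hu moment invariants, computed from central moments;
    invariant to translation and scale by construction."""
    img = img.astype(float)
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    m00 = img.sum()
    xbar, ybar = (xs * img).sum() / m00, (ys * img).sum() / m00
    def mu(p, q):                    # central moments
        return ((xs - xbar) ** p * (ys - ybar) ** q * img).sum()
    def eta(p, q):                   # normalized central moments
        return mu(p, q) / m00 ** (1 + (p + q) / 2)
    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    h1 = n20 + n02
    h2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    h3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    return np.array([h1, h2, h3])    # first three of the seven invariants

def match_gesture(query, db):
    """Nearest static gesture in the DB by distance on Hu moments."""
    return min(db, key=lambda gid: np.linalg.norm(db[gid] - query))

# A binary "gesture" blob and a translated copy should match the same entry.
blob = np.zeros((32, 32)); blob[8:20, 10:15] = 1; blob[10:14, 15:22] = 1
shifted = np.roll(np.roll(blob, 5, axis=0), 3, axis=1)
db = {"SG1": hu_moments(blob), "SG2": hu_moments(np.ones((32, 32)))}
```

Because central moments are taken about the centroid and normalized by m00, the translated copy yields the same invariants and retrieves the same DB record, which is exactly the property the DB lookup relies on.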
175
Table 1. Seven static gestures stored at the lowest level of DB

Type of Gesture | Meaning                       | Variables                   | Output
SG1             | Start gesture recognition     | start_gesture               | True
SG2             | End gesture recognition       | end_gesture                 | True
SG3             | Select and release an object  | Pick_object, Pick_direction | True/False, Vector
SG4             | Start translation             | Translation_gesture         | True
SG5             | Start rotation                | Rotation_gesture            | True
SG6             | Start scaling                 | Scaling_gesture             | True
SG7             | Transform an object           | Transform_value             | Scalar

(The original table also contains a static image illustrating each gesture.)
Now, consider the second level of the gesture DB. The static gestures are used to define primitive operations for picking, translating, rotating, and scaling an object in the 3D virtual space. The information at this level is represented as a state transition diagram over the predefined static gestures, which becomes the grammar for recognizing dynamic gestures. The state transition diagram for transforming an object in a 3D virtual space is shown in Figure 3. The initial state in the diagram has start_gesture = false, meaning that no gesture is recognized. If gesture SG1 is recognized from an input image in the initial state, start_gesture changes from false to true. While start_gesture = true, any input image can be recognized as one of the predefined static gestures, and the five gestures from SG3 to SG7 operate meaningfully until end_gesture changes from false to true,
176
Fig. 3. The state transition diagram representing dynamic gestures
that is, until the end gesture SG2 is recognized from an input image. An object in the 3D virtual space can be selected, with direction information taken from the hand gesture, when an input image is recognized as gesture SG3 in the DB. The selected object can then be transformed according to the gestures that follow. One of the three gestures SG4, SG5, and SG6 decides the transformation type to be applied to the selected object. When the gesture is recognized as SG4, the selected object is translated in the direction measured from the previously captured images. The rotation and scaling of the selected object, corresponding to SG5 and SG6 respectively, are implemented similarly to the translation operation. The transformation direction and the amount along the (x,y,z)-coordinates for a selected object are obtained through image processing techniques. Operations for these hand gestures finish upon receiving the ending gesture SG2.
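The behaviour described above can be prototyped as a small state machine. The sketch below is one reading of the diagram in Figure 3; the pick/release toggle, the flag names, and the "one transformation step per SG7" granularity are assumptions for illustration:

```python
class GestureFSM:
    """Interprets the SG1-SG7 static gestures as a state machine:
    SG1 arms recognition, SG3 picks/releases an object, SG4/SG5/SG6
    choose a transformation mode, SG7 applies it, SG2 ends the session."""
    def __init__(self):
        self.start_gesture = False
        self.picked = False
        self.mode = None            # "translate" | "rotate" | "scale"
        self.log = []               # transformations actually applied

    def feed(self, gesture):
        if gesture == "SG1":
            self.start_gesture = True
        elif not self.start_gesture:
            return                  # ignore everything before SG1
        elif gesture == "SG2":      # end: reset to the initial state
            self.start_gesture, self.picked, self.mode = False, False, None
        elif gesture == "SG3":
            self.picked = not self.picked    # select / release toggle
        elif gesture in ("SG4", "SG5", "SG6") and self.picked:
            self.mode = {"SG4": "translate", "SG5": "rotate",
                         "SG6": "scale"}[gesture]
        elif gesture == "SG7" and self.picked and self.mode:
            self.log.append(self.mode)       # apply one transformation step

fsm = GestureFSM()
for g in ["SG7", "SG1", "SG3", "SG4", "SG7", "SG5", "SG7", "SG2", "SG7"]:
    fsm.feed(g)
```

In the sequence above, the first and last SG7 are ignored (recognition not armed), so only one translation and one rotation step are applied; this mirrors how the grammar prevents stray gestures from transforming objects.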
3
Experimental Results and Conclusion
The proposed system has been implemented on a Windows PC using MS C# for the general-purpose program, OpenCV for image processing, and OpenGL for rendering the 3D virtual space and images. Figure 4(a) shows the preprocessing results for the upper-left image captured via a webcam. The upper-right image shows the binarized skin-color region extracted from the HSV and YCbCr images converted from the captured RGB image. The lower-left image shows the contour and Hu moments detected from the binarized skin image, and the lower-right image shows the gesture stored in the DB that is retrieved by comparison with the Hu moments of the captured image. Figure 4(b) shows the chroma keying [11] result compositing the captured image with the 3D virtual space. The picking of an object by recognizing a picking gesture after a starting gesture is shown in Figure 4(c), and a result obtained by applying translation and moving gestures to the picked object in the 3D virtual space is shown in Figure 4(d).
177
Fig. 4. The experimental results of the proposed gesture user interface system: (a) preprocessing results; (b) chroma keying result; (c) a picking gesture after a starting gesture; (d) translation and moving gestures
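The skin-binarization step of the preprocessing stage can be sketched as a plain RGB→YCbCr conversion followed by a chroma threshold. The threshold range used below (Cb 77–127, Cr 133–173) is a commonly cited skin range, not the one from the paper:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range RGB -> YCbCr conversion (ITU-R BT.601 coefficients)."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def skin_mask(rgb):
    """Binarize skin-colored pixels by thresholding the chroma channels;
    the Cb/Cr bounds are assumed, not taken from the paper."""
    ycbcr = rgb_to_ycbcr(rgb)
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)

img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (200, 140, 110)    # a skin-like tone
img[1, 1] = (0, 120, 255)      # a blue background pixel
mask = skin_mask(img)
```

Thresholding on chroma only, rather than on RGB directly, is what makes the binarization reasonably robust to brightness changes; the contour and Hu moment computation then operate on this mask.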
When Hu moments are used to recognize the gestures in the proposed system, the recognition ratio for static images is relatively high, but the recognition ratio for video is relatively low; various methods for enhancing the ratio in video still have to be developed. Moreover, the more accurately the direction and amount of transformation of an object are obtained from the captured images, the more accurately the object can be transformed; efforts to obtain them are required to enhance the proposed hand gesture interface. Acknowledgement. This research was financially supported by the Ministry of Education, Science and Technology (MEST) and the National Research Foundation of Korea (NRF) through the Human Resource Training Project for Regional Innovation, and by the ICT standardization program of MKE (The Ministry of Knowledge Economy).
References
1. Jeong, J.-S., et al.: Development of a 3D Virtual Studio System for Experiential Learning. Proceedings of ASN 2011, 78–87 (2011)
2. Cadoz, C., et al.: Gesture – music. In: Trends in Gestural Control of Music, pp. 71–94. Ircam-Centre Pompidou, Paris (2000)
3. Turk, M.: Gesture Recognition, Ch. 10, http://ilab.cs.ucsb.edu/projects/turk/TurkVEChapter.pdf
178
4. Nespoulous, J.-L., Lecours, A.R.: Gesture: nature and function. In: The Biological Foundations of Gestures: Motor and Semiotic Aspects, pp. 49–62. Lawrence Erlbaum Assoc. (1986)
5. Garg, P., Aggarwal, N., Sofat, S.: Vision Based Hand Gesture Recognition. World Academy of Science, Engineering and Technology 49 (2009)
6. Berry, G.: Small-wall, A Multimodal Human Computer Intelligent Interaction Test Bed with Applications. MS thesis, Dept. of ECE, University of Illinois at Urbana-Champaign (1998)
7. Stenger, B., Mendonça, P.R.S., Cipolla, R.: Model-Based 3D Tracking of an Articulated Hand. In: Proceedings of British Machine Vision Conference, Manchester, UK, vol. I, pp. 63–72 (September 2001)
8. Wang, C.C., Wang, K.C.: Hand Posture recognition using Adaboost with SIFT for human robot interaction, vol. 370. Springer, Berlin (2008) ISSN 0170-8643
9. Barczak, A.L.C., Dadgostar, F.: Real-time hand tracking using a set of co-operative classifiers based on Haar-like features. Res. Lett. Inf. Math. Sci. 7, 29–42 (2005)
10. Bourennane, C.S., Martin, L.: Comparison of Fourier descriptors and Hu moments for hand posture recognition. In: Proceedings of European Signal Processing Conference, EUSIPCO (2007)
11. http://www.mediacollege.com/glossary/c/chroma-key.html
Marker Classification Method for Hierarchical Object Navigation in Mobile Augmented Reality Gyeong-Mi Park, PhyuPhyu Han, and Youngbong Kim Department of IT Convergence and Application Engineering Pukyong National University 599-1 Daeyeon-Dong, Nam-Gu, Busan, 608-737, Korea
[email protected]
Abstract. Augmented reality has attracted much attention with the advent of the smartphone. In this paper, we propose an object navigation system based on marker-based hierarchical mobile augmented reality. To provide the location of a destination, we place a marker at the location of each object and group the spatial locations into several zones to configure a hierarchy of markers; each group contains a similar number of markers. A search using the stratified markers can be performed hierarchically over both wide and narrow areas, and the navigator lets users conveniently identify markers on the small display of a mobile device. Keywords: Mobile Augmented Reality, Augmented Reality, Marker.
1
Introduction
Augmented reality is a technique for displaying virtual objects created by a computer as if they existed in the real world [1],[2]. Although it began as a branch of virtual reality, technologies are now emerging that go beyond game applications and can provide various kinds of additional information in a more realistic and practical way. In particular, the mobility and wide availability of mobile devices enable augmented reality applications in mobile environments [3],[4],[5]. Any implementation of augmented reality must be given a position indicator in some form. A position indicator is a tool for matching the real world and the virtual world: after the indicator matches the corresponding coordinates of the real and virtual worlds and the virtual object is superimposed on the image of the real world, the object appears as if it actually existed in reality. The position indicator is one of the important components of augmented reality applications, and its perception method changes depending on the style and placement of the indicator. Position indicators can be divided into two types: active sensors and passive sensors. Active sensors, such as RF chips and infrared indicators, actively announce their location. Passive sensors are recognized through markers, specific patterns, or the shapes of things in the images of a camera or other visual equipment. An active position indicator gives good performance, but its drawback in actual application is a narrow scope, because it requires high costs and multiple pieces of equipment. A passive position indicator, by contrast, is low-cost and can be installed easily on
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 179–184, 2011. © Springer-Verlag Berlin Heidelberg 2011
G.-M. Park, P.P. Han, and Y. Kim
the real environment, and mobile equipment already includes tools, such as a camera, for capturing various kinds of visual input. Passive position indicators are classified into marker-based [6] and vision-based [7],[8] methods. The vision-based approach has the advantage of using properties of the real environment itself as position indicators, but it requires considerable time to analyze the images. The marker-based method has the advantage that markers are easy to detect, which has led to many successful AR applications; its drawbacks, however, are a narrow application scope and an unrealistic visual appearance. In particular, in a mobile AR environment where a large number of markers are densely spread over a wide area, or crowded into a particular area, it is difficult to visually identify all the markers.
Therefore, in this paper we propose a hierarchical marker structure to navigate markers spread over a wide area effectively. A search with layered markers can gradually narrow its search area from a wide region down to a small one. In addition, the proposed method augments, over the real images, only the markers that correspond to the user's current location and the magnification level of the camera image.
The next chapter presents the method that hierarchically classifies the markers, the search process that reaches the destination using the hierarchical marker structure, and an augmentation method that adapts the visual representation to the user's current location. Finally, we analyze the experimental results and conclude.
2
Hierarchical Marker Classification for Retrieval Position
To build a hierarchical marker classification, we first have to examine the types of markers and the structural features the markers should have. These considerations give the criteria used to classify the markers. The criteria are then used both for the marker-tier search toward a destination and for generating the visual image corresponding to the user's current location. Fig. 1 shows an overview of the proposed campus building guide system, which presents the location and information of each campus building.
Fig. 1. Destination Object Search System
Marker Classification Method for Hierarchical Object Navigation
2.1
Hierarchical Classification of Markers
Markers are used as indicators of objects such as buildings or stores in a mobile augmented reality application. The markers are attached to the real environment and taken into the system through images captured by the mobile device's camera. The markers should therefore be displayed in a form that preserves the natural experience of the AR application, while still being accurate enough to distinguish and recognize, and recognizable in real time. The recognition of the markers depends greatly on their designed form.
To build a hierarchical structure of markers, we classify the markers into several zones using the positions of the building Object-Markers. All markers belonging to a particular zone become lower-layer markers of the key marker representing that zone. When a zone marker is selected, our system visualizes the remaining markers belonging to that zone.
The hierarchical marker configuration starts from the rectangular area spanned by the minimum and maximum x and y coordinates of the Object-Markers, which indicate the building locations. The number of zones is determined from the number of Object-Markers, within the tolerance for displaying markers on a mobile screen; in this paper, we set the number of markers per section to 10. Once the number of sections is determined, the area could be divided into zones of equal width. However, because buildings are concentrated in particular areas, the markers are not uniformly distributed; equal-width zones would hold very uneven numbers of Object-Markers and cause markers to overlap heavily. Thus, we divide the area unequally so that each zone contains a similar number of Object-Markers. In the following expressions (1) and (2), i_x and j_y become the baselines that divide the area into zones.
To determine the baselines, we first sort the x and y coordinates of the Object-Markers and then select the baselines so that the markers are equally distributed along the x and y axes:

P_i^x = OM^x(i_x),  P_j^y = OM^y(j_y)    (1)

i_x = i · N / (ZC/2),  j_y = j · N / (ZC/2)    (2)

where N is the total number of Object-Markers.
Fig. 2. Marker grouping: equal-area grouping (left), proposed grouping (right)
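The equal-count grouping of Fig. 2 (right) can be sketched as follows. This is an illustrative helper only; the function names and the per-axis split are our own simplification, not the authors' implementation:

```python
def zone_baselines(coords, zones_per_axis):
    """Pick dividing baselines so each zone holds a similar number of markers.

    coords: marker coordinates along one axis (x or y).
    zones_per_axis: number of zones along this axis (ZC/2 in the paper).
    Returns the coordinate values used as zone boundaries.
    """
    s = sorted(coords)
    n = len(s)
    # One baseline between each pair of consecutive equal-count groups.
    return [s[i * n // zones_per_axis] for i in range(1, zones_per_axis)]

def assign_zone(value, baselines):
    """Zone index of a coordinate: number of baselines it lies at or beyond."""
    return sum(value >= b for b in baselines)
```

With 37 markers and two zones per axis, the single baseline lands near the median coordinate, splitting the markers into groups of 18 and 19.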
In expression (1), P_i^x and P_j^y are the zone baselines determined by the i-th and j-th Object-Markers, and OM^x(i_x) and OM^y(j_y) are the x and y coordinates of the Object-Markers at the sorted indices i_x and j_y obtained from expression (2). The indices run over the separating positions i = 1, 2, ..., ZC/2 − 1 and j = 1, 2, ..., ZC/2 − 1, where ZC is the total number of zones.
Figure 2 shows an example with 37 Object-Markers and 4 zones. Figure 2(a) shows the case with constant zone width, and (b) the case with variable width, as proposed in this paper. The markers are distributed more evenly in Figure 2(b) than in 2(a).
After the area is divided, we define the representative marker of each zone. The Zone-Marker is selected as the Object-Marker nearest to the positional center of its zone. To select it, we sort the x and y coordinates of the Object-Markers in sequential order and then apply expressions (3) and (4):

ZoneMarker = ZOM(m_x, m_y)    (3)

m_x = I_zx / 2,  m_y = I_zy / 2    (4)

In expression (4), I_zx and I_zy are the largest index values attached to the x and y coordinates within the zone, so m_x and m_y select the marker at the middle position of the Object-Markers' x and y values. After the Zone-Marker of each section has been selected, the remaining Object-Markers are configured as the hierarchical markers collected in the lower layer of that Zone-Marker. In addition, the selected Zone-Marker can be used as a location when the area is repartitioned.
2.2
Destination Search and Path Navigation Using the Hierarchical Marker
To search for a destination object, we place the hierarchical markers on Google Maps. A search using the hierarchical markers starts with detection of the Zone-Markers, which cover the entire search area. For a selected Zone-Marker, the search system then examines all Object-Markers in the detected zone for the destination marker. This process is repeated until the destination marker is found. Once the destination to visit is found, its Object-Marker is fixed as the target and a suitably short path to the destination is computed. The mobile GPS is employed to identify the user's current location; using this location, the system displays, over the live camera image, the retrieved Object-Markers of the surrounding buildings within a certain distance of the user. Furthermore, the path navigation shows the destination position on Google Maps together with real-time video and the distance between the user's current location and the destination object.
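The two-level search described above can be sketched as a small loop. Names here are hypothetical; zone markers simply map to the object markers they contain:

```python
def find_destination(zone_markers, objects_by_zone, is_destination):
    """Two-level hierarchical search: scan zone markers first, then only the
    object markers inside each zone, until the destination is found."""
    for zone in zone_markers:
        for obj in objects_by_zone[zone]:
            if is_destination(obj):
                return zone, obj
    return None  # destination not present in any zone
```

The point of the hierarchy is that only one zone's object markers need to be drawn on screen at a time, keeping the display legible.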
3
Experiments and Results
To test the proposed system, we defined a total of 37 Object-Markers representing building objects and 4 Zone-Markers representing area zones, as shown in Table 1. The object markers were generated randomly.
Table 1. Markers

Type of Marker    Count    Information
Root-Marker       1        Location(x, y)
Zone-Marker       4        Location(x, y)
Object-Marker     37       Location(x, y)
Fig. 3 presents the process of searching for the destination object. The first row of Fig. 3 shows the initial search scene, and the second row shows part of the destination search on the map. The left column shows the 4 Zone-Markers over the whole search area, and the right column shows the target object among all Object-Markers in a specific zone (zone 2) after the Zone-Marker search on the left map. The third-row images show the augmented reality view that combines the environmental image around the user's location with the information of each building: the left image presents the search result over Zone-Markers, and the right image the search result over Object-Markers in the specified zone.
Fig. 3. Destination Object Classified Search
Once the destination is identified, the path to the target is generated. The third row of Fig. 3 shows augmented results generated along the path of the user's movement.
The proposed system groups the randomly generated markers and then creates a hierarchical structure of markers. The tracking of the destination object proceeds over a gradually narrowing area. Our system thus realizes an augmented reality application that can display the markers in distinguishable form even on the small screen of a mobile device.
4
Conclusions and Remarks
In this paper, we have proposed a campus building guide system employing a hierarchical structure of markers. The system divides the area into several zones so that the object markers are equally distributed; this zoning scheme becomes the basis of the hierarchical marker structure. The hierarchical search first finds the zone marker that represents a specific zone, and the selected zone is then investigated in detail to find the object. Using this method, we built an augmented reality application that can identify all object markers on the small display of a mobile device. As future work, we will design a method to search for non-marker objects using image-based data, toward a more realistic implementation of augmented reality. Acknowledgement. This work was supported by the Pukyong National University Research Fund in 2010 (PK-2010-00120002301014700).
References
1. Azuma, R.T.: A survey of augmented reality. Presence: Teleoperators and Virtual Environments 6, 355–385 (1997)
2. Azuma, R., Baillot, Y., et al.: Recent advances in augmented reality. IEEE Computer Graphics and Applications 21(6), 34–47 (2001)
3. Zhou, F., Duh, H.B.-L., Billinghurst, M.: Trends in Augmented Reality Tracking, Interaction and Display: A Review of Ten Years of ISMAR. In: IEEE International Symposium on Mixed and Augmented Reality, pp. 193–202 (2008)
4. Papagiannakis, G., Singh, G., Magnenat-Thalmann, N.: A survey of mobile and wireless technologies for augmented reality systems. Computer Animation and Virtual Worlds 19(1), 3–22 (2008)
5. Carmigniani, J., Furht, B., Anisetti, M.: Augmented reality technologies, systems and applications. Multimedia Tools and Applications 51(1), 341–377 (2010)
6. Kato, H., Billinghurst, M.: Marker Tracking and HMD Calibration for a Video-based Augmented Reality Conferencing System. In: IWAR 1999, pp. 85–95 (1999)
7. Teichrieb, V., et al.: Survey of Online Monocular Markerless Augmented Reality. In: IEEE International Symposium on Mixed and Augmented Reality, pp. 193–202 (2008)
8. Schmid, C., Mohr, R.: Local grayvalue invariants for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 530–534 (1997)
Physically Balancing Multi-articulated Objects

Nakhoon Baek1 and Kwan-Hee Yoo2

1 Kyungpook National University, Daegu 702-701, Republic of Korea
2 Chungbuk National University, Cheongju Chungbuk 361-763, Republic of Korea
[email protected]
Abstract. In many fields of computer science and other engineering areas, we often need to balance multi-articulated structures. In this paper, we formalize this balancing problem from a physical and theoretical point of view. After describing all the solution steps in detail, we present a set of algorithms to automatically balance multi-articulated objects with tree topologies. Given the geometric configurations and the masses at the leaf nodes of the target objects, our algorithms achieve a balanced state by adjusting the mass of each node. To keep the mass changes from the initial configuration minimal, we minimize the norm of the difference between the initial masses and the final balanced masses, using three different metrics: the l1, l2 and l∞ norms. These norms show slightly different behaviors in the minimization process, and users can select one of them according to their preferences and application purposes. We show all the details of the algorithms, their time complexity analyses, and experimental results. Keywords: balancing, tree-topology, minimization.
1
Introduction
In various fields including human-computer interaction, computer animation, and mechanical engineering, we frequently use multi-articulated objects, whose components are linked to each other. Figure 1 shows virtual mobiles as examples of such objects. Physically based techniques are applied to generate their realistic motions, and in this case we need a set of physical parameters for each component, including mass, center of mass, moment of inertia, etc. There have been a few methods [3,6,7,8,9,10] to automatically calculate these physical parameters from the given configurations, and GPU-based implementations are also available [4]. Physically based techniques usually require the object to be initially in its balanced state; in fact, most real-world multi-articulated objects are balanced. In our previous work [5], we designed a virtual mobile for our physically based mobile simulation system by configuring the shape of each component and assigning the mass and other physical properties of each
Corresponding author.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 185–190, 2011. c Springer-Verlag Berlin Heidelberg 2011
Fig. 1. Virtual mobiles in their balanced states: (a) Steel Fish, (b) Southern Cross
component. However, it is difficult for the virtual mobiles to maintain their balanced states, since the manually selected masses of the components usually produce rotational moments due to gravity. The fundamental problem is that our multi-articulated object, a mobile, does not satisfy the balanced-mass conditions at the links. We could not find any previous results on systematically balancing multi-articulated objects, and thus used an iterative method to find the initial balanced states. To the best of our knowledge, there are still no research results on physically balancing multi-articulated objects.
In this paper, we focus on a systematic way of physically balancing multi-articulated objects, and present efficient and robust algorithms for finding the initial masses of multi-articulated objects with binary tree topology. Since our algorithms focus on balancing tree-topology objects, other application areas may include general-purpose load-balancing problems and network-topology optimization problems [1]. Although there have been some tree-balancing methods [2,11,1], they usually concentrated on accelerating the insertion, deletion, and/or lookup of a node, and they usually achieved this goal by modifying the tree topology. Thus, previous tree-balancing methods are hard to apply to our physical balancing problem. In contrast, we change only the leaf node masses to obtain a balanced tree.
In Section 2, we describe the given problem more theoretically as the weighted-leaf binary tree balancing problem, and present three kinds of minimization methods. In the next section, we show the details of our balancing algorithms and their time complexity analyses. Experimental results on practical virtual objects follow in Section 4. Finally, we present our conclusions and future work.
2
Problem Definition
In this paper, we will present a systematic way of balancing a multi-articulated object with binary tree topology, to finally let the object be in its balanced
state. As a starting point, we define a weighted-leaf binary tree, the theoretical model for our balancing problem.
A weighted-leaf binary tree is a binary tree in which each leaf node L_i has a corresponding positive mass m_i and each internal node has zero mass. Since it is a binary tree, an internal node I_j has a left sub-tree T_j^left and a right sub-tree T_j^right. The total masses of T_j^left and T_j^right can be expressed as

M_j^left = Σ_{i ∈ T_j^left} m_i  and  M_j^right = Σ_{i ∈ T_j^right} m_i,

respectively. Additionally, I_j has left and right weighting factors, e_j^left and e_j^right.
We can physically interpret the weighted-leaf binary tree as a set of levers and masses: each internal node I_j corresponds to a lever with arm lengths e_j^left and e_j^right in each direction, while each leaf node L_i corresponds to a mass m_i. The physical laws show that the levers are in a balanced state if and only if e_j^left · M_j^left = e_j^right · M_j^right for each internal node I_j.
However, it is hard to achieve the balanced state with arbitrary values of m_i's. In this paper, we present a systematic way of calculating balanced masses m̄_i's from the given values of m_i's. Figure 2 shows a conceptual diagram of our idea. In this way, we can achieve a balanced weighted-leaf binary tree without changing the tree topology.
The weighted-leaf binary tree balancing problem aims to find the balanced mass m̄_i for each leaf node, with which e_j^left · M̄_j^left = e_j^right · M̄_j^right for each internal node, where M̄_j^left and M̄_j^right are the total masses of the left and right sub-trees, respectively. Since we have (n − 1) internal nodes for n leaf nodes, we have (n − 1) equations for n unknowns, and thus need an additional constraint to solve this problem. In typical applications, the initial masses m_i's are given, and we usually want to change the masses minimally to preserve the original configuration. Hence, we adopt the constraint of minimizing the difference between the initial masses m_i's and the balanced masses m̄_i's.
To minimize the mass differences, we can use three different metrics: the l1, l2 and l∞ norms of the mass differences. These norms behave slightly differently in the minimization process, as shown in the following. Given the values of m_i's, the l1-norm of the mass differences can be expressed as

||m̄_i − m_i||_1 = Σ_{i=1}^{n} |m̄_i − m_i|.

Thus, minimizing this l1-norm means minimizing the sum of all mass differences.
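The per-node balance condition can be checked with a small recursive sketch. The node encoding is our own assumption for illustration: a leaf is its mass, an internal node a tuple (e_left, left_subtree, e_right, right_subtree):

```python
def total_mass(node):
    """Sum of leaf masses in a sub-tree; leaves are plain numbers."""
    if isinstance(node, (int, float)):
        return node
    _, left, _, right = node
    return total_mass(left) + total_mass(right)

def is_balanced(node, eps=1e-9):
    """True iff e_left * M_left == e_right * M_right at every internal node."""
    if isinstance(node, (int, float)):
        return True
    e_l, left, e_r, right = node
    if abs(e_l * total_mass(left) - e_r * total_mass(right)) > eps:
        return False
    return is_balanced(left) and is_balanced(right)
```

For example, a lever with arms 1 and 2 is balanced when the short arm carries twice the mass of the long arm.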
3
Balancing Algorithms
For a weighted-leaf binary tree with n leaf nodes, we have (n − 1) equations, one for each internal node. These equations are linear combinations of the n unknowns. We will first express the (n − 1) unknowns m̄_2, m̄_3, ..., m̄_n in terms
Fig. 2. Balancing a binary tree through changing the leaf node masses: (a) initial configuration, (b) balanced configuration
of m̄_1, to finally represent the differences between the m_i's and m̄_i's in terms of the single variable m̄_1.
Suppose that a weighted-leaf binary tree is in its balanced state with leaf node masses m̄_i's. At a deepest internal node I_k, the left and right sub-trees are leaf nodes, so M̄_k^left and M̄_k^right are equal to the specific node masses m̄_p and m̄_q, respectively. Since the tree is balanced, it is derived that

e_k^left · M̄_k^left = e_k^right · M̄_k^right,  or equivalently,  e_k^left · m̄_p = e_k^right · m̄_q.

Assuming that p is the smaller index, m̄_q can be calculated as (e_k^left / e_k^right) m̄_p. The total mass of the sub-tree rooted at I_k can then be expressed as m̄_p + m̄_q = (1 + e_k^left / e_k^right) m̄_p, with respect to the smaller-index mass m̄_p. Using this approach in a bottom-up manner, we can build up the total mass of every sub-tree in terms of the smallest-index mass in that sub-tree. At the root node, the total mass of the whole tree is expressed in terms of m̄_1.
Now, we can propagate m̄_1 from the root node down to the leaf nodes. When the total mass of an internal node I_k is expressed in terms of m̄_1, it implies that

M̄_k^left + M̄_k^right = (1 + e_k^left / e_k^right) · M̄_k^left = (e_k^right / e_k^left + 1) · M̄_k^right,

and both M̄_k^left and M̄_k^right can therefore be expressed in terms of m̄_1. Applying this propagation in a top-down manner, we can finally express all the leaf node masses in terms of m̄_1 as m̄_i = c_i m̄_1, 1 ≤ i ≤ n. Since we have assumed positive masses, c_i is a positive scaling factor for m̄_i; the value of c_1 is trivially 1. Since the whole process only traverses the tree twice, it is easy to see that the total time complexity of these variable substitutions is O(n).
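This variable substitution can be sketched as a single recursion rather than two explicit passes; the result is the same set of scaling factors c_i. The node encoding is again our own assumption (leaves as labels, internal nodes as tuples (e_left, left_subtree, e_right, right_subtree)):

```python
def scaling_factors(node):
    """Return (leaves, coeffs): the leaf labels in left-to-right order, and
    coefficients c_i such that the sub-tree is balanced when leaf i carries
    mass c_i * m_first, where m_first is the sub-tree's leftmost leaf mass."""
    if not isinstance(node, tuple):          # a leaf: coefficient 1 w.r.t. itself
        return [node], [1.0]
    e_l, left, e_r, right = node
    lv_l, c_l = scaling_factors(left)
    lv_r, c_r = scaling_factors(right)
    # Balance e_l * M_left = e_r * M_right fixes the right sub-tree's scale.
    scale = (e_l * sum(c_l)) / (e_r * sum(c_r))
    return lv_l + lv_r, c_l + [scale * c for c in c_r]
```

For a single lever with arms e_left = 1 and e_right = 2, the right leaf must carry half the left leaf's mass, i.e. c = [1.0, 0.5].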
Using the variable substitution presented in the previous subsection, the l1-norm of the mass differences can be calculated as

||m̄_i − m_i||_1 = ||c_i m̄_1 − m_i||_1 = Σ_{i=1}^{n} |c_i m̄_1 − m_i|,

where the m_i's are the initially given masses of the leaf nodes. Hence, the l1-norm minimization becomes finding the minimum of the sum of the folded (piecewise-linear) functions |c_i m̄_1 − m_i|.

input: initial masses m_i's and geometric configurations.
output: balanced masses m̄_i's.
  apply variable substitution to get c_i = m̄_i / m̄_1, 1 ≤ i ≤ n.
  let t_i be the candidate m̄_1 values: t_i = m_i / c_i.
  {sorting in O(n log n) time}
  sort the t_i to get the sorted candidates s_i's.
  {get the s_i with the minimum value in O(n) time}
  calculate min = Σ_{i=1}^{n} |c_i m̄_1 − m_i| at s_1.
  for i = 2 to n do
    calculate val = Σ_{i=1}^{n} |c_i m̄_1 − m_i| at s_i.
    if val < min then
      update min = val.
    end if
  end for
  {calculate the m̄_i's in O(n) time}
  let m̄_1 be the s_i value corresponding to the min value.
  for i = 2 to n do
    m̄_i = c_i m̄_1.
  end for

Fig. 3. The l1-norm minimization algorithm
In this way, the line equation for each interval can be updated with only constant-time operations. By evaluating the line equation at the end points of each interval, we obtain the m̄_1 value giving the minimum l1-norm. The overall processing is summarized in Figure 3.
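A direct Python rendering of the Fig. 3 procedure, re-evaluating the sum at each candidate for clarity rather than using the constant-time incremental update (so this sketch is O(n²), not the paper's O(n log n)):

```python
def balance_l1(c, m):
    """Pick m1 among the candidates t_i = m_i / c_i minimizing
    sum_i |c_i * m1 - m_i|, then return the balanced masses c_i * m1."""
    candidates = sorted(mi / ci for ci, mi in zip(c, m))

    def l1(m1):
        return sum(abs(ci * m1 - mi) for ci, mi in zip(c, m))

    m1 = min(candidates, key=l1)   # the minimum lies at a candidate point
    return [ci * m1 for ci in c]
```

The minimum of a sum of folded lines is always attained at one of the fold points t_i, which is why scanning the candidates suffices.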
4
Experimental Results
We applied our multi-articulated object balancing method to the virtual mobile system [5]. Thanks to the automatic tree-balancing feature, we can avoid the iterative adjustment of the component masses. Examples of the balanced mobiles are shown in Figure 1; as expected, the mobiles are naturally in their balanced states. We used the l1-norm minimization for these examples, and due to its optimized behavior, the execution times were less than 1 msec.
5
Conclusion
In this paper, we formalized the weighted-leaf tree balancing problem, which is directly applicable to the balancing of multi-articulated objects. We showed that the weighted-leaf binary tree balancing problem can be transformed into a minimization problem in a single variable, and presented the solution for the l1-norm minimization. Acknowledgements. This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (Grant 2011-0014886).
References 1. Bayer, R.: Symmetric binary B-trees: Data structure and maintenance algorithms. Acta Inf. 1, 290–306 (1972) 2. Bayer, R., McCreight, E.: Organization and maintenance of large ordered indexes, pp. 245–262. Springer-Verlag New York, Inc. (2002) 3. Gonzalez-Ochoa, C., McCammon, S., Peters, J.: Computing moments of objects enclosed by piecewise polynomial surfaces. ACM Trans. Graph. 17(3), 143–157 (1998) 4. Kim, J., Kim, S., Ko, H., Terzopoulos, D.: Fast GPU computation of the mass properties of a general shape and its application to buoyancy simulation. Vis. Comput. 22(9), 856–864 (2006) 5. Lee, D., et al.: Reproducing works of calder. J. of Visualization and Computer Animation 12(2), 81–91 (2001) 6. Lee, Y.T., Requicha, A.: Algorithms for computing the volume and other integral properties of solids. I. known methods and open issues. Commun. ACM 25(9), 635–641 (1982) 7. Lee, Y.T., Requicha, A.: Algorithms for computing the volume and other integral properties of solids. II. A family of algorithms based on representation conversion and cellular approximation. Commun. ACM 25(9), 642–650 (1982) 8. Lien, S.L., Kajiya, J.T.: A symbolic method for calculating the integral properties of arbitrary nonconvex polyhedra. IEEE Computer Graphics and Applications 4, 35–41 (1984) 9. Mirtich, B.: Fast and accurate computation of polyhedral mass properties. J. Graph. Tools 1(2), 31–50 (1996) 10. Narayanaswami, C., Franklin, W.: Determination of mass properties of polygonal CSG objects in parallel. In: SMA 1991: Proc. the First ACM Symp. on Solid Modeling Foundations and CAD/CAM Applications, pp. 279–288 (1991) 11. Sleator, D., Tarjan, R.: A data structure for dynamic trees. J. Comput. Syst. Sci. 26(3), 362–391 (1983)
High Speed Vector Graphics Rendering on OpenCL Hardware

Jiyoung Yoon1, Hwanyong Lee1, Baekyu Park1, and Nakhoon Baek2

1 573-13 Bokhyeon, 6F ITCC, Bukgu, Daegu, Korea
{jyyoon,hylee,bkpark}@hu1.com
2 Mobile Graphics Lab., School of EECS, Kyungpook National University, Daegu, Korea
[email protected]
Abstract. Most computer graphics applications target output for the human eye, which requires a 30–200 Hz refresh rate and 1K–4K resolution. However, industrial applications such as printing circuit boards require rendering at much higher resolution and performance, with high calculation precision. In such cases, rendering through a general graphics API often does not fit the requirements. In this paper, we present a case study of high-precision, high-speed, and robust rendering of printed circuit boards for high-speed laser equipment, using parallel programming with the OpenCL1 API. Keywords: Vector Graphics, OpenVG, OpenCL, High speed rendering.
1
Introduction
Most computer graphics applications target output for the human eye, which requires a 30–200 Hz refresh rate and 1K–4K resolution. However, industrial applications, such as printing a circuit board with a high-speed laser device, require very high resolution: for example, a 1-micrometer resolution over one square meter corresponds to 1,000,000 x 1,000,000 pixels. If a pre-rendered image could be used for printing, we could accept a long rendering time and then reuse the result. However, a modern printing device can adjust itself to the status of the surface of the printing material, so we must slightly transform and re-render the circuit artwork for every print. We can use a graphics API such as OpenVG or OpenGL for rendering, but the following technical issues arise.
Output path - In general, graphics API rendering is tuned for screen output: when the rendered result is displayed, performance is at its best, but when the rendered result must be read back as data and transferred, performance can be unexpectedly poor.

1 OpenCL is a registered trademark of Apple Inc. OpenGL is a registered trademark of SGI. OpenVG is a trademark of the Khronos Group. NVIDIA is a trademark of NVIDIA Inc.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 191–196, 2011. © Springer-Verlag Berlin Heidelberg 2011
Output format - In general, graphics API rendering produces pixels in RGB or RGBA form. However, some industrial applications require different pixel formats, for example black-and-white 1-bit-depth pixels for each render layer; most general graphics hardware does not support a 1-bit-depth pixel format.
Requirement for just-in-time rendering - We should render, partially or fully, within the requested time to stay synchronized with the printing device. If rendering misses its deadline and the device cannot re-synchronize, printing material is wasted, which may be very costly.
Rendering precision - Most graphics APIs target output for the human eye, so minor rendering faults that cannot be recognized by eye are simply ignored by the graphics system. In an industrial application, however, we must not ignore any faulty rendering, since it can cause serious problems. Even a serious fault, such as dropping a whole frame, cannot be recognized by eye at very high rendering speeds. OpenGL hardware is set to ignore such faults by default; we can turn on monitoring of OpenGL errors, but it costs performance.
Unnecessary processing - In industrial applications, for example circuit rendering, intensive verification is performed before printing. The input therefore has no geometric complexity: all input polygons are simple (sometimes convex), there are no edge intersections, no coincident edges, and each edge belongs to exactly one polygon. General graphics hardware, on the contrary, has features for handling such geometric complexity; if we could turn these features off, we could gain rendering performance.
Huge input data - In circuit rendering, we must process data with a huge number of edges, and in this case the GPU sometimes generates unexpected rendering results.
Huge output data - Because of the limited internal memory of the GPU and the main memory, we frequently cannot store the full rendering result in memory. We should therefore render in a pipelined scheme, synchronized with the printing device.
To solve the above technical issues, we need a more precise and higher-performance rendering method.
2
Rendering Using OpenCL
2.1
Requirements
We assume a printing device that requires high-speed, high-precision rendering and fulfills the following requirements. The printing device has a number of printing heads, each moving in one direction over a 1024-pixel width. The rendering result should be in a 1-bit-depth pixel format packed into 32-bit words, and the horizontal and vertical resolutions are multiples of 32. Rendering should be performed in units of tracks, and its performance should be faster than the printing speed or the data transfer speed.
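Packing a 1-bit-per-pixel scanline into 32-bit words, as the requirement above asks, can be sketched as follows. The bit order within a word (LSB-first here) is our own assumption; the real device's word layout is not specified in the text:

```python
def pack_row(pixels):
    """Pack a row of 0/1 pixels into 32-bit words, LSB-first within each word.
    The row length must be a multiple of 32, per the stated requirement."""
    assert len(pixels) % 32 == 0
    words = []
    for w in range(0, len(pixels), 32):
        word = 0
        for bit, p in enumerate(pixels[w:w + 32]):
            word |= (p & 1) << bit
        words.append(word)
    return words
```

A row starting with a single on-pixel followed by 31 off-pixels packs to the word 1; a run of 32 on-pixels packs to 0xFFFFFFFF.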
2.2
Vector Rendering – Polygon Scanline Conversion
The usual method starts with the edges of the projected polygons inserted into buckets, one per scanline; the rasterizer maintains an active edge table (AET). Entries maintain sort links, X coordinates, gradients, and references to the polygons they bound. To rasterize the next scanline, the edges that are no longer relevant are removed; new edges from the current scanline's Y-bucket are added, inserted sorted by X coordinate. The X and other parameter information of the active edge table entries are incremented, and the entries are kept in an X-sorted list, effecting a change when two edges cross. After updating the edges, the active edge table is traversed in X order to emit only the visible spans, maintaining a Z-sorted active span table and inserting and deleting surfaces when edges are crossed [1]. In summary, scanline by scanline, we add or remove edges, sort them if necessary, and then generate the output scanline.
2.3
CPU vs GPU Rendering
We can implement vector rendering using the CPU or the GPU: some objects can be rendered with a graphics API such as OpenVG or OpenGL, and others with software rasterization algorithms on a general-purpose processor. Hardware-accelerated rendering through a graphics API is, of course, faster than software rendering, but we cannot control the calculation precision, performance, or parallelism. OpenCL is a Khronos-defined API for parallel computation that runs on both general CPUs and GPUs; if we implement the code with OpenCL once, we can run it on either.
2.4
Parallelism
There are various ways to parallelize. We can divide the job by "rendering of a track" or "rendering of a scanline", or parallelize the rendering of one scanline internally. Rendering whole tracks in parallel requires huge redundancy of the input data, so we ran out of storage. When parallelizing within one scanline, the biggest potential gain is in sorting; in circuit board rendering, however, edge-order changes are rare, so in most cases we do not have to re-sort, and there was little performance gain from parallel computing.
We therefore applied the scanline algorithm to the rendering of each line: we build the AEL for the full rendering scene through pre-processing, store each AEL in main memory and GPU memory, and let each parallel processor use its AEL, so that the full scene can be rendered at once through parallel processing. Before the parallel processing, we performed the following pre-processing steps for performance:
- Mark each edge with the track it belongs to; this requires as many bits per edge as there are tracks.
- Pre-sort the edges in the Y direction and then by X.
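Under these assumptions (pre-sorted edges, no crossings, so no per-line re-sorting), every scanline can be rendered independently from its pre-built edge list, which is what makes the one-pass parallel rendering possible. A small Python sketch of that structure (our illustration; the paper's actual implementation is OpenCL):

```python
def build_aels(edges, height):
    """Pre-processing: for every scanline, the X crossings of all edges,
    pre-sorted so the per-line workers never need to re-sort."""
    aels = [[] for _ in range(height)]
    for y0, y1, x, dxdy in edges:            # edge spans scanlines [y0, y1)
        for y in range(y0, y1):
            aels[y].append(x + (y - y0) * dxdy)
    for ael in aels:
        ael.sort()
    return aels

def render_line(ael, width):
    """Pure function of one scanline's AEL: safe to run on any processor."""
    line = [0] * width
    for i in range(0, len(ael) - 1, 2):      # even-odd fill between crossings
        for x in range(int(ael[i]), int(ael[i + 1])):
            line[x] = 1
    return line

edges = [(0, 2, 1.0, 0.0), (0, 2, 3.0, 0.0)]     # a 2-line-tall rectangle
aels = build_aels(edges, 2)
# The map below could be a multiprocessing pool or one OpenCL work-item per line.
image = [render_line(ael, 5) for ael in aels]
print(image)   # [[0, 1, 1, 0, 0], [0, 1, 1, 0, 0]]
```

Because `render_line` reads only its own AEL and writes only its own line, the per-line map has no shared state, matching the per-processor AEL layout described above.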
194
J. Yoon et al.
3
Experiment and Result
3.1
Experiment
We experimented with rendering performance on a very high-resolution sample. The rendering sample is binary-colored, and its resolution is 425,984 × 524,288 pixels. To stay within video memory, we divided it into 32 sample tracks; the resolution of each sample track is 425,984 × 16,384 pixels, so each sample track requires 872 MByte of memory space. Figure 1 shows the sample test image (white means the pixel is turned on).
Fig. 1. Sample Rendering Result
We applied the scanline algorithm to the sample tracks and modified it for simpler processing. The modified algorithm assumes the following conditions:
- Edges do not intersect.
- There are no closely located edges or vertices.
- All objects are rendered with the even-odd fill rule.
- All input vertices lie inside the viewport both before and after transformation.
We rendered the sample on the CPU, on the CPU using emulated OpenCL, and on the GPU using OpenCL. The system used for our experiments is as follows:
- OS: Windows XP 32-bit
- CPU: i7 870 (2.93 GHz) / GPU: nVidia GTX 580 / Memory: 4 GByte
- OpenCL version: 1.0
High Speed Vector Graphics Rendering on OpenCL Hardware
195
Fig. 2. Flow of modified raster algorithm
3.2
Result
We ran the sequential processing on the CPU and the parallel processing on the CPU and GPU using OpenCL. Table 1 shows the performance test results.
Table 1. Performance test result
(unit: ms)
Track #    Sequential Processing   Parallel Processing     Parallel Processing
           CPU                     CPU using OpenCL        GPU using OpenCL
           Rendering Time          Rendering Time          Rendering Time
1                215.205127              233.385641               13.516466
2                 71.466162              195.201286              100.167311
3                327.051840              221.773636              101.613081
4                 38.556650              198.564132               14.778528
5               2809.046385              934.064232               87.303553
…                       …                       …                       …
25                61.525509              193.753294               12.929199
26             12406.216416             3096.634826               95.350013
27                86.140283              207.996466              106.330943
28               381.545285              253.989626              109.370979
29                71.279381              195.280915              107.844331
30                72.362514              211.144243              106.421716
31               186.608018              212.017837              108.308972
32                 0.109855              184.366843               12.970253
Average         3516.837097              992.281998               79.420253
As Table 1 shows, we obtained the best performance on the GPU using OpenCL: parallel processing on the GPU using OpenCL is on average about 44 times faster than sequential processing on the CPU. Table 2 is the performance comparison table; the baseline of the comparison is the sequential rendering time on the CPU.
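The comparison factors follow directly from the average rendering times of Table 1 (speed-up = sequential CPU time divided by parallel time):

```python
# Average rendering times (ms) from the last row of Table 1.
cpu_seq = 3516.837097   # sequential processing, CPU
cpu_ocl = 992.281998    # parallel processing, CPU using OpenCL
gpu_ocl = 79.420253     # parallel processing, GPU using OpenCL

print(cpu_seq / cpu_ocl)   # ≈ 3.544191  (Table 2, CPU using OpenCL)
print(cpu_seq / gpu_ocl)   # ≈ 44.281364 (Table 2, GPU using OpenCL)
```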
Table 2. Performance comparison table
(unit: times)
Track #    Parallel Processing CPU using OpenCL   Parallel Processing GPU using OpenCL
           Rendering Time                         Rendering Time
Average    3.544191                               44.281364
4
Conclusion
We achieved sufficient performance for our target equipment and requirements with parallel processing on the GPU using OpenCL. In our experiments, parallel processing on the GPU using OpenCL was on average about 44 times faster than sequential processing on the CPU; however, the gain varies with the size of the input polygons, and in very low-density areas CPU rendering can even outperform the GPU. We implemented a vector graphics rasterizer using OpenCL, and we believe it has the following benefits compared to using a general graphics API:
- It makes it possible to calculate with the required precision.
- We can examine and monitor each step of the calculation; it is not a black box like OpenGL.
- We can implement whatever parallelism we want and keep the pipeline synchronized with the printing device.
- The same OpenCL implementation runs on both GPU and CPU, so we can choose between them and cross-verify the calculation results.
As future work, we plan to implement OpenVG on top of OpenCL, so that more general applications can benefit from very high-precision calculation and high performance. Acknowledgement. This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the “Strengthening the competitive power of new mobile convergence industries” support program (Grant Number 10037950).
References
1. Wikipedia contributors: Scanline rendering. Wikipedia, The Free Encyclopedia (September 21, 2011); Web (September 22, 2011)
2. OpenCL 1.0 API, Khronos Group, http://www.khronos.org/opencl/
3. Rice, D., Simpson, R.J.: OpenVG Specification, version 1.1. KHRONOS Group (2008)
4. Kim, D., Cha, K., Chae, S.: A high performance OpenVG accelerator with dual-scanline filling rendering. IEEE Trans. Consumer Electronics, 1–2 (2009)
5. Lee, H., Baek, N.: AlexVG: An OpenVG implementation with SVG-Tiny support. Computer Standards & Interfaces 31(4), 661–668 (2009)
Research on Implementation of Graphics Standards Using Other Graphics API's
Inkyun Lee1,*, Hwanyong Lee1, and Nakhoon Baek2
1 HUONE Inc., 573-13 Bokhyeon, 6F ITCC, Bukgu Daegu Korea, {iklee,hylee}@hu1.com
2 Kyungpook National University, 80 Daehakro Bukgu Daegu Korea, [email protected]
Abstract. A number of formats and APIs (Application Program Interfaces) are used as standards in the computer graphics area. Sometimes a format or API is implemented using other formats or APIs, for hardware acceleration and to save implementation time and cost. In this research, we list the major computer graphics APIs and discuss the current status, technical issues, advantages and disadvantages of each case of implementation on another API, as well as the dependencies between graphics standards and middleware. Keywords: Computer Graphics API, Standard Implementation.
1
Introduction
The computer graphics industry is a rapidly changing area, and the related standards are updated just as rapidly. Making a new graphics chip is therefore a high-risk business. The current trend toward rich user experiences demands very rapid feedback to user interaction and vivid visual effects, so hardware-accelerated graphics is essential in most personal devices such as cell phones and media players as well as desktop PCs. If a new API can be implemented on an existing graphics chip, without designing and fabricating a new chip, it is highly cost effective. In this paper, we look into the dependencies among various kinds of computer graphics middleware, formats and technologies. We also present successful cases of implementing one API on top of another and propose effective ways of implementation using other APIs. Our research mainly targets the standards of the KHRONOS Group, which defines major media APIs such as OpenGL, OpenVG and OpenMAX, and the standards of the W3C and the JCP. We explain how to implement KHRONOS APIs on top of other KHRONOS APIs, present how 3D-graphics-related technologies are connected to 3D APIs and how 2D vector graphics middleware is connected to graphics APIs, and discuss the advantages and disadvantages of each case of implementation. *
Khronos Group, OpenVG, OpenSL, OpenWF, OpenKODE, OpenMAX and their logos are trademarks of the KHRONOS Group Inc. OpenGL and its logo are trademarks of SGI; OpenCL and the OpenCL logo are trademarks of Apple Inc.; Collada and its logo are trademarks of Sony Computer Inc.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 197–202, 2011. © Springer-Verlag Berlin Heidelberg 2011
198
I. Lee, H. Lee, and N. Baek
2
Implementation Using other API
2.1
Overview of KHRONOS Graphic Standard APIs
A number of formats and APIs (Application Program Interfaces) are used as standards in the computer graphics area. In the PC environment, OpenGL is the premier environment for developing portable, interactive 2D and 3D graphics applications. Since its introduction in 1992, OpenGL has become the industry's most widely used and supported 2D and 3D graphics application programming interface (API), bringing thousands of applications to a wide variety of computer platforms [12]. Recently, OpenGL version 4.2 was released. In the field of mobile graphics APIs, various API standards are defined by the KHRONOS Group, covering not only graphics but also multimedia streaming, high-level sound, 3D content assets, 3D graphics in the Web environment, 2D vector graphics, and so on. Figure 1 illustrates the standards stack of the KHRONOS Group.
Fig. 1. KHRONOS Visual Computing Ecosystem [14]
OpenGL® ES is the standard for embedded accelerated 3D graphics. It is a royalty-free, cross-platform API for full-function 2D and 3D graphics on embedded systems - including consoles, phones, appliances and vehicles. It consists of well-defined subsets of desktop OpenGL, creating a flexible and powerful low-level interface between software and graphics acceleration. OpenGL ES includes profiles for floating-point and fixed-point systems and the EGL specification for portably binding to native windowing systems. OpenGL ES 1.X is for fixed-function hardware and offers acceleration, image quality and performance. OpenGL ES 2.X enables fully programmable 3D graphics. OpenGL SC is tuned for the safety-critical market [13]. OpenVG™ is the standard for vector graphics acceleration. It is a royalty-free, cross-platform API that provides a low-level hardware acceleration interface for vector graphics libraries such as Flash and SVG. OpenVG is targeted primarily at
Research on Implementation of Graphics Standards Using other Graphics API’s
199
handheld devices that require portable acceleration of high-quality vector graphics for compelling user interfaces and text on small-screen devices - while enabling hardware acceleration to provide fluidly interactive performance at very low power levels [13]. The open standard OpenGL® SC Safety Critical Profile is defined to meet the unique needs of the safety-critical market for avionics, industrial, military, medical and automotive applications, including DO-178B certification. It simplifies safety-critical certification, guarantees repeatability, allows compliance with real-time requirements, and facilitates porting of legacy safety-critical applications [13]. 2.2
KHRONOS API Implementation on other API
Basically, all KHRONOS APIs are designed to run on their own silicon chips. However, because of technical relatedness and market needs, some APIs have been implemented on top of other APIs (see Figure 2).
Fig. 2. KHRONOS APIs implementable using other APIs
OpenGL SC implementation – this is an explicit case. Though the OpenGL SC specification was released in 2007, there is no OpenGL SC chip on the market, because the OpenGL SC market is very specific and small. OpenGL SC graphics boards are therefore implemented on OpenGL or OpenGL ES hardware [4]. Since the OpenGL SC specification is generally a smaller set of the OpenGL specification, it is easy to implement; however, OpenGL SC includes paletted textures, which recent GPUs do not support. Implementing paletted textures on a recent GPU requires the shading language, which raises serious concerns about the reliability of the implementation. OpenGL ES on OpenGL – the OpenGL ES standard is for embedded systems and is based on OpenGL, so it is naturally easy to implement on OpenGL. Common implementations of this kind are included in SDKs (software development kits) for OpenGL ES. In the case of implementing OpenGL ES 1.1 on 2.0, the OpenGL ES 1.1 standard has a fixed-functionality pipeline, whereas the 2.0 version features flexible fragment and vertex shader units [3]. Fixed functionality means that the OpenGL ES API provides a fixed set of predefined rendering algorithms that a programmer can use [14]. A special case is the implementation of OpenGL ES 2.0 on DirectX 9 – the ANGLE project [15]. ANGLE is very useful for implementing WebGL, because WebGL is based on OpenGL ES, not OpenGL [3][5].
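As an aside on the paletted-texture issue: besides the shading-language approach, one common workaround is to expand the indexed texture through its palette into a plain RGBA texture on the CPU before upload. A minimal sketch (our own illustration, not code from any cited implementation):

```python
def expand_palette(indices, palette):
    """Expand a paletted texture (one palette index per texel) into a flat
    RGBA8 byte array that any modern GPU can consume directly."""
    out = bytearray()
    for i in indices:
        out += bytes(palette[i])      # palette entry: (r, g, b, a)
    return bytes(out)

palette = [(0, 0, 0, 255), (255, 255, 255, 255)]   # black, white
texels = [0, 1, 1, 0]                               # a 2x2 checker pattern
rgba = expand_palette(texels, palette)
print(len(rgba))    # 16 bytes: 4 texels x 4 channels
```

The trade-off is memory (RGBA8 uses four times the space of an 8-bit indexed texture) in exchange for avoiding the shader-based lookup whose reliability the text questions.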
OpenVG on OpenGL ES/OpenGL – OpenVG is 2D graphics, and OpenGL also has 2D features, so many people assume the implementation is easy; on the contrary, OpenVG requires very high-quality rendering and has complex features. There are several implementations of OpenVG on OpenGL ES [8]. Such an implementation requires a tessellation process, which demands heavy computation. When a complex shape must be re-tessellated for every frame of an animation, rendering performance is poor, and it may even be slower than software rendering. OpenWF on OpenGL ES/OpenVG – OpenWF is a window composition standard and is well suited to hardware acceleration. It could certainly be accelerated on OpenGL ES or OpenVG hardware, but currently no implementation is known. 2.3
OpenGL ES Related Technologies
OpenGL ES and OpenGL play a key role in implementing high-level graphics services (see Fig. 3). Some high-level graphics standards were designed to operate on OpenGL ES.
Fig. 3. OpenGL ES Related Graphics Technologies
M3G, or the Mobile 3D Graphics API for Java (JSR 184), is a high-level API. Its rendering model is the same as in OpenGL ES. It is feasible for a single mobile device to support both APIs, and the same basic graphics engine, especially if hardware accelerated, should be able to support both; in fact, the recommended way to implement M3G is on top of OpenGL ES [11]. WebGL is a cross-platform, royalty-free web standard for a low-level 3D graphics API based on OpenGL ES 2.0, exposed through the HTML5 Canvas element as Document Object Model interfaces [13]. Scene-graph and content-asset standards such as OSG (Open Scene Graph), Collada and X3D can also be rendered on OpenGL ES [1].
2.4
2D Vector Graphics Related Technologies
OpenVG and OpenGL ES both play a role in rendering and accelerating 2D vector graphics technology; either can serve as the implementing low-level API. Another important standard in 2D vector graphics is SVG, defined by the W3C as the standard web vector graphics format. SVG has three profiles: SVG Full, SVG Basic and SVG Tiny. SVG Tiny in particular is widely used in wireless internet services and multimedia standards (see Fig. 4).
Fig. 4. Hardware Acceleration of Major 2D Vector Graphics Technology
Recently, the W3C has been drafting the new HTML standard, HTML5, which includes the Canvas 2D element. Canvas 2D can be implemented easily with OpenVG. WebKit, the current open-source web browser project, has an interface layer to OpenVG and OpenGL ES; however, it is only an interface, and OpenGL ES and OpenVG are not actually used to draw geometric objects. An interesting case is Adobe Flash Player, which can be accelerated on both OpenVG and OpenGL ES 2.0, although the target markets differ: the Flash Lite player, targeted at low-tier cell phones, is accelerated on OpenVG hardware, while the Flash 10.1 player, targeted at smartphones and tablets, is accelerated on OpenGL ES 2.0. Google Skia can also be accelerated on OpenGL ES 2.0.
3
Discussion
Implementing an API on top of another API that is already implemented is a very cost-effective approach. We can obtain much higher performance than with a software implementation, avoid the investment risk of new development, and distribute a solution to the market in a timely manner. By sending CPU rendering jobs to an existing hardware accelerator, we can reduce the CPU load, which in turn reduces power consumption and heat radiation; this is very desirable in a mobile environment. Providing a vivid user experience has recently become an important issue in the market, so hardware acceleration is a major issue for applications.
Acknowledgement. This research was supported by The Ministry of Knowledge Economy, Korea, under the “Strengthening the competitive power of new mobile convergence industries” support program (Grant Number 10037950).
References
1. Nadalutti, D., Chittaro, L., Buttussi, F.: Rendering of X3D content on mobile devices with OpenGL ES. In: Proc. Web3D 2006, 11th International Conference on 3D Web Technology. ACM, New York (2006) ISBN 1-59593-336-0
2. Robart, M.: OpenVG paint subsystem over OpenGL ES shaders. In: Digest of Technical Papers, International Conference on Consumer Electronics (ICCE 2009), Las Vegas, NV, pp. 1–2 (2009) ISBN 978-1-4244-4701-5
3. Hill, S., Robart, M., Tanguy, E.: Implementing OpenGL ES 1.1 over OpenGL ES 2.0. In: Digest of Technical Papers, International Conference on Consumer Electronics (ICCE 2008), Las Vegas, NV, January 9-13, pp. 1–2 (2008) ISBN 978-1-4244-1458-1
4. Baek, N., Lee, H.: OpenGL SC emulation based on Windows PCs. In: IEEE ICME 2011, Barcelona, Spain (2011)
5. Lee, H., Baek, N.: Implementing OpenGL ES on OpenGL. In: IEEE 13th International Symposium on Consumer Electronics (ISCE 2009), Kyoto, May 25-28 (2009) ISBN 978-1-4244-2975-2
6. Hall, C.: OpenGL ES Safety Critical. In: ACM SIGGRAPH 2006 Courses. ACM, New York (2006) ISBN 1-59593-364-6
7. Lee, H., Baek, N., Lee, I., Yoon, J., Pothier, O.: Accelerating OpenVG and SVG Tiny with multimedia hardware. In: IEEE International Conference on Consumer Electronics (ICCE 2011), Las Vegas, NV, January 9-12, pp. 917–918 (2011) ISSN 2158-3994
8. Oh, A., Sung, H., Lee, H., Kim, K., Baek, N.: Implementation of OpenVG 1.0 using OpenGL ES. In: Proc. MobileHCI 2007, 9th International Conference on Human Computer Interaction with Mobile Devices and Services. ACM, New York (2007) ISBN 978-1-59593-862-6
9. Cole, P.: OpenGL ES SC - open standard embedded graphics API for safety critical applications. In: The 24th Digital Avionics Systems Conference (DASC 2005), October 30 - November 3, vol. 2, p. 8 (2005) ISBN 0-7803-9307-4
10. Baek, N., Lee, H.: Implementing OpenGL SC over OpenGL 1.1+. In: 2011 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, January 9-12, pp. 271–272 (2011) ISSN 2158-3994
11. Pulli, K.: New APIs for mobile graphics. In: SPIE Multimedia on Mobile Devices II, San Jose, CA, USA, January 15-19 (2006)
12. OpenGL Overview, http://www.opengl.org/about/overview/
13. Khronos Group, http://www.khronos.org/
14. Vuorinen, J.: Cost-Efficient Development with Various OpenGL ES APIs. Master's Thesis, Helsinki University of Technology, Department of Computer Science and Engineering, Espoo (August 30, 2009)
15. ANGLE project, http://code.google.com/p/angleproject/
A Dynamics Model for Virtual Stone Skipping with Wii Remote
Namkyung Lee and Nakhoon Baek
School of Computer Sci. and Eng., Kyungpook Nat'l Univ., Republic of Korea
[email protected]
Abstract. Stone skipping is an example of rigid-fluid coupling, which typically needs heavy computation. In this paper, we present a real-time method for visually plausible stone-skipping simulation. Based on Newtonian physics, an intuitive dynamics model is presented to simulate the linear and rotational motions of the stone. The real-world stone is replaced by the Wii Remote connected to the host PC. Our implementation shows a real-time simulation of the bouncing process of the virtual stone. Keywords: stone skipping, virtual simulation, Wii controller.
1
Introduction
Stone skipping (also known as ducks and drakes) is a traditional pastime that exhibits typical rigid-fluid coupling phenomena. The results vary with the initial conditions, such as the angle of attack and the velocity. Although it is possible to physically simulate these phenomena, a remarkable amount of computation is needed to model the interactions between the fluid and the rigid body. In this paper, we present a real-time virtual stone-skipping simulation system, as shown in Figure 1. Our system contains a computation-efficient dynamics model that shows visually plausible stone-skipping motions in real time. The water waves generated by the stone on the water surface are processed by a water-surface model modified from [10]. For a better user experience, the initial physical quantities are naturally set with the Wii Remote, the three-dimensional input device for the Nintendo game console Wii. This wireless controller has acceleration, location and infrared sensors and a Bluetooth communication module, so it can be directly connected to the host PC. We simulate virtual stone skipping with this interactive device.
Corresponding author.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 203–209, 2011. c Springer-Verlag Berlin Heidelberg 2011
204
2
N. Lee and N. Baek
Background
Bocquet's article [4] may be the first literature on stone skipping. He analyzed stone skipping in terms of the drag forces between the water surface and the stone, and physically derived the angle of attack suitable for a greater number of stone bounces. Later, Rosellini et al. presented a more physical analysis of stone skipping [14]. Nagahiro et al. used ordinary differential equations (ODE) and the smoothed particle hydrodynamics (SPH) method to calculate the best velocity and angle of attack [12]. Both require a considerable amount of computation. In the field of computer graphics, typical physically-based modeling research focuses on the realistic representation of the water surface hit by a rigid body. Since Stam's fluid simulation method [16], many research results have become available [2,6,17,3]. However, most of them require much computation time and are still unsuitable for real-time applications such as computer games and virtual reality programs. Do et al. [7] focused on the virtual simulation of stone skipping and calculated the vertical and horizontal drag forces for the simulation. However, they ignored the spinning of the stone and its related effects, and failed to show realistic simulations, at least in some cases. For user-experience support, we used the Wii Remote as the input device. As an inexpensive three-dimensional input device, the Wii Remote is now used in various applications, including interactive whiteboards [9], motion recognition [15] and virtual orchestras [5]. Our implementation is another good example of direct interaction with a virtual physical world.
3
Our Method
3.1
A Real-Time Dynamics Model for Stone Skipping
A well-balanced, practical dynamics model is needed to achieve a realistic simulation of stone skipping with relatively little computation. A stone thrown onto the water surface is affected by four kinds of forces: the throwing force, gravity, air resistance and the impulsive force from the water. In a typical physically-based simulation system [1], the throwing force and gravity can be handled naturally as forces on rigid bodies. The bouncing of a stone is actually generated by the reaction force from the water surface. Bocquet modeled the flying stone simply as a flat disk and presented an analytic model for the collision of this disk with the surface of still water [4]. Later, Do et al. [7] refined the virtual stone with a triangular mesh. In our simulation, we adopted a hybrid approach, using the triangular mesh model for
A Dynamics Model for Virtual Stone Skipping
205
the more precise reactions to the water surface and the simplified disk model for the air-resistance calculation, respectively. At the moment of collision, the linear and angular velocities of the stone and the area of contact affect its bouncing. The linear velocity and the contacting area are important factors in calculating the lift force; when the lift force is greater than gravity, the stone moves up off the water surface. The spinning of the stone stabilizes it and also produces the curved trajectories of stone skipping. Thus, all these terms should be integrated into the dynamics model. Figure 3 shows the collision between the water surface and a triangle Ti of the virtual stone mesh. The drag force Fi is derived from the contacting area Si; thus, only the triangles under the water surface generate the drag force, the major source of the bounce. The drag force can be decomposed into the lift force and the resistance, as shown in Figure 3. Since the stone spins, the linear velocity at the triangle Ti can be calculated as:
vi = vlinear + ri × ω,   (1)
where vlinear is the linear velocity of the stone, ri is the average rotational radius of the triangle, and ω is the rotational velocity of the stone. Based on Newton's drag-force equation, the drag force Fi is calculated as [8,11]:
Fi = −ρwater (vi · ni)² Si ni,   (2)
where ρwater, ni and Si are the material constant for water, the normal vector, and the contacting area of the triangle Ti, respectively. With this drag force, the torque τi on the triangle Ti can be expressed as the cross product:
τi = Fi × ri.   (3)
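Equations (1)-(3) amount to a few lines of vector arithmetic per triangle. The following Python sketch is our own illustration (the sample values and the water constant are ours, chosen only to show the signs working out):

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangle_drag(v_linear, omega, r_i, n_i, S_i, rho_water):
    """Per-triangle drag force (Eq. 2) and torque (Eq. 3) of the spinning stone."""
    v_i = tuple(v + c for v, c in zip(v_linear, cross(r_i, omega)))   # Eq. (1)
    k = -rho_water * dot(v_i, n_i) ** 2 * S_i                         # Eq. (2)
    F_i = tuple(k * n for n in n_i)
    tau_i = cross(F_i, r_i)                                           # Eq. (3)
    return F_i, tau_i

# A bottom-face triangle: its normal points down into the water, the stone
# moves down, so Eq. (2) yields an upward (positive z) drag force.
F, tau = triangle_drag(v_linear=(5.0, 0.0, -1.0), omega=(0.0, 0.0, 10.0),
                       r_i=(0.05, 0.0, 0.0), n_i=(0.0, 0.0, -1.0),
                       S_i=0.001, rho_water=1000.0)
print(F)    # z-component is positive: the water pushes the stone up
```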
We also calculate the drag and lift forces with respect to the air. To minimize the computational burden, we approximated the stone as a thin circular disk, as Bocquet did [4]. The Newtonian equations shown in Equations (2) and (3) are also used for this calculation, with the air factors rather than the water-surface ones. By approximating the stone as a disk, we obtain the more simplified equations:
Fair = −ρair (vdisk · ndisk)² Sdisk ndisk,   (4)
and
τair = Fair × rdisk,   (5)
for the whole stone. Here, vdisk, ndisk, Sdisk and rdisk are the linear velocity, normal vector, total surface area and average radius of the simplified disk, respectively. Gravity and the air-related forces are applied to the stone consistently over the whole simulation process. The throwing force should be applied only at the
very first throwing time. Impact forces given by Equation (2) are applied to the stone at the moment of the stone-water collision. 3.2
User Interface
The Wii Remote, also known as the Wiimote, is the three-dimensional input device for Nintendo's game console Wii [13]. A Wii Remote has acceleration and motion sensors for three directions and infrared sensors. Our system interprets the sensor values of the Wii Remote as the physical quantities of the virtual stone during the stone-throwing motion. The location and acceleration values of the Wii Remote are used as those of the virtual stone. In the first stage, the user makes the throwing motion with the action button pressed. At the moment of the throw, the action button is released, and the location, orientation, and linear and angular acceleration terms are sent to the simulation system as the initial physical quantities. By tracing these physical quantities of the virtual stone, we calculate its velocity and angle of attack with respect to the water surface and generate the bouncing motion. A rubber strap attaches the Wii Remote to the user's wrist to prevent real collisions. This user interface, based on actually throwing the Wii Remote, achieves a more natural user experience.
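The angle of attack mentioned above can be recovered from the traced velocity with basic trigonometry. A small sketch under our own axis convention (z up, negative z downward; not the authors' code):

```python
import math

def attack_angle_deg(velocity):
    """Angle between the velocity vector and the horizontal water plane.
    Negative z is downward, so a descending stone gets a positive angle."""
    vx, vy, vz = velocity
    horizontal = math.hypot(vx, vy)   # speed projected onto the water plane
    return math.degrees(math.atan2(-vz, horizontal))

print(attack_angle_deg((3.0, 0.0, -3.0)))   # descending at about 45 degrees
```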
4
Example System
Our virtual stone-skipping system is implemented on an Intel Core2 6300 1.86 GHz PC with a GeForce 9600 graphics card and DirectX 9.0c libraries. For user interaction, a Nintendo Wii Remote wireless controller (ADXL330) is used. The controller uses the Bluetooth protocol and reports its data to the host PC at an update rate of up to 100 Hz. To handle communication with the Wii Remote, we use the WiiYourself! library [18]. A virtual stone represented by a mesh of 44 triangles was used for the experiment, with a height field over a 500 × 500 rectangular grid. Our system achieves more than 90 frames per second, including all simulation and real-time rendering with light sources and textures. Figures 4 and 2 show simulation results from our system. Figure 4 shows a sequence of snapshots of a typical stone-skipping simulation. Figure 2 demonstrates the effect of the spinning of the stone. Without any rotation, the stone bounces straight along its moving direction, as shown in Figure 2(b) and as mentioned in the previous work [7]. With the same linear velocity, our system integrates the rotational motion of the stone and generates the naturally curved trajectory of stone skipping, as shown in Figure 2(a). As Figure 2 shows, the spinning stone makes many more bounces even with the same linear velocity.
Fig. 3. Collision between the water surface and a triangle
Fig. 1. Our virtual stone skipping system
Fig. 2. The curved trajectory of a spinning stone: (a) with spinning (from our system); (b) without spinning (from the previous work [7])
Fig. 4. A sequence of stone-skipping simulation
5
Discussion
In this paper, we presented a real-time virtual experience system for stone skipping. To reproduce stone skipping, we derived a restrained dynamics model for the flying stone and a wave-propagation model for the water surface. Based on these specialized physically-based modeling techniques, we accomplished visually plausible interactive simulation at more than 90 frames per second. For a better user experience, we built a fully perceptible interface with the Wii Remote; by extracting all the required physical quantities from the user's motion, we accomplished a more immersive experience. Our relatively inexpensive implementation of a perceptible simulation system is expected to be usable in other application areas. We are currently working on a better water-surface model and user experience to achieve a more realistic system. Acknowledgements. This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (Grant 2011-0014886).
References
1. Baraff, D.: Physically based modeling: Rigid body simulation. SIGGRAPH Course Notes (2001)
2. Batty, C., Bertails, F., Bridson, R.: A fast variational framework for accurate solid-fluid coupling. In: SIGGRAPH 2007, p. 100. ACM (2007)
3. Becker, M., Tessendorf, H., Teschner, M.: Direct forcing for Lagrangian rigid-fluid coupling. IEEE Trans. on Visualization and Computer Graphics 15(3), 493–503 (2009)
4. Bocquet, L.: The physics of stone skipping. American J. of Physics 71(2), 150–155 (2003)
5. Bruegge, B., et al.: Pinocchio: conducting a virtual symphony orchestra. In: ACE 2007: Proc. of the Int'l Conf. on Advances in Computer Entertainment Technology, pp. 294–295 (2007)
6. Carlson, M., Mucha, P.J., Turk, G.: Rigid fluid: animating the interplay between rigid bodies and fluid. In: SIGGRAPH 2004, pp. 377–384. ACM (2004)
7. Do, J., Lee, N., Ryu, K.W.: Realtime simulation of stone skipping. Int'l J. of Computers 4(1), 251–254 (2007)
8. Halliday, D., Resnick, R.: Fundamentals of Physics. John Wiley & Sons (2005)
9. Lee, J.C.: Hacking the Nintendo Wii Remote. IEEE Pervasive Computing 7(3), 39–45 (2008)
10. Lengyel, E.: Mathematics for 3D Game Programming and Computer Graphics, 2nd edn. Charles River Media, Inc. (2003)
11. Long, L.N., Weiss, H.: The velocity dependence of aerodynamic drag: A primer for mathematicians. The American Math. Monthly 106(2), 127–135 (1999)
12. Nagahiro, S., Hayakawa, Y.: Theoretical and numerical approach to "magic angle" of stone skipping. Phys. Rev. Lett. 94(17), 174501 (2005)
13. Nintendo Wii (2010), http://www.nintendo.com/wii
14. Rosellini, L., et al.: Skipping stones. J. of Fluid Mechanics 543, 137–146 (2005)
15. Schlömer, T., Poppinga, B., Henze, N., Boll, S.: Gesture recognition with a Wii controller. In: TEI 2008: Proc. of the 2nd Int'l Conf. on Tangible and Embedded Interaction, pp. 11–14 (2008)
16. Stam, J.: Stable fluids. In: SIGGRAPH 1999, pp. 121–128 (1999)
17. Takahashi, T., Ueki, H., Kunimatsu, A., Fujii, H.: The simulation of fluid-rigid body interaction. In: SIGGRAPH 2002, p. 266 (2002)
18. WiiYourself! Project (2010), http://wiiyourself.gl.tter.org/
How to Use Mobile Technology to Provide Distance Learning in an Efficient Way Using Advanced Multimedia Tools in Developing Countries
Sagarmay Deb
Central Queensland University, 400 Kent Street, Sydney 2000, NSW, Australia
[email protected]
Abstract. Although developments in multimedia technology and internet networks have contributed to immense improvements in the standard of learning, including distance learning, in the developed world, the developing world is still not in a position to take advantage of these improvements because of the limited spread of these technologies, lack of proper management, and infrastructure problems. Unless we solve these problems and enable people in developing countries to use these technologies for distance learning, the vast majority of the world's population will lag behind. In this paper we explore how to use mobile technology, together with advanced multimedia tools, to provide distance learning efficiently. We recommend mobile and multimedia technology as the means to reach this vast population of developing countries and impart quality learning effectively. Keywords: Distance learning, mobile technology, multimedia technology, developing countries.
1 Introduction
The concept of distance learning has been prevalent in developing countries for the last few decades, and it is very much in vogue in developed countries [1], [4]. In developing countries, as elsewhere, it started with correspondence courses: printed learning materials were despatched to students at regular intervals, and students were expected to read the materials and answer questions. The basic philosophy was that teachers would be physically away from the students and have to conduct the teaching process from a distance [2]. With the development of the computer industry and internet networks over the last three decades, things have changed, and global communication has reached an unprecedented height [1]. These developments have opened up immense scope for imparting learning in a much more efficient and interactive way. Multimedia technology and internet networks have changed the whole philosophy of learning and distance learning, providing close interaction between teachers and learners and learning materials of a far higher standard than print alone could offer. This has gone so far as to create virtual classrooms whose teachers and students are scattered all over the world. T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 210–216, 2011. © Springer-Verlag Berlin Heidelberg 2011
Although some of these facilities are expensive, the developed world is still in a position to take advantage of them to deliver much better distance learning to students residing in developed countries. For developing countries the story is different, as computerization and network connections are still very limited compared to the developed world. In this paper we focus on defining the problems of using these technologies for much-improved and more extensive distance learning, and we suggest how the vast majority of people in developing countries could be reached with the improved quality of distance learning that multimedia and internet networks provide. Section 1 gives an introduction to the area. Section 2 presents the advances developing countries are making in the use of mobile technologies. Section 3 discusses the use of mobile technology with advanced multimedia tools for distance learning in developing countries. We put our concluding remarks in Section 4.
2 Analysis of Work Done
The open universities which started functioning in the late sixties and early seventies of the last century reached off-campus students by delivering instruction through radio, television, recorded audio tapes, and correspondence tutoring. Several universities, particularly in developing countries, still use educational radio as the main instructional delivery tool [1]. With the extended application of information technology (IT), the conventional education system has crossed physical boundaries to reach the un-reached through a virtual education system. In the distance mode of education, students get the opportunity to learn through self-learning methods with the use of technology-mediated techniques. Efforts are being made to promote distance education in the remotest regions of developing countries through institutional collaborations and adaptive use of collaborative learning systems [2]. Initially, computers with multimedia facilities can be delivered to regional resource centers, and media rooms can be established in those centers for use as multimedia labs. Running those labs would require two or three IT personnel in each centre. To ascertain the necessity, importance, effectiveness, demand, and efficiency of the scheme, an initial questionnaire can be developed, and periodic surveys distributed among the learners would reflect the effectiveness of the project and allow fine-tuning. After the installation and operation of a few pilot tests in specific regions, the whole country can be brought under a common network through these regional centers [2]. In developed economies, newer versions of technology are often used to upgrade older versions, but in developing economies, where older versions of technology are often still prevalent (if they exist at all), the opportunities for leapfrogging over successive generations of technology to the most recent version are that much greater [3]. In the conventional view (i.e.
as seen by technology developers and donors), developing countries passively adopt technology as standard products which have been developed in industrialized countries and which can be usefully employed immediately. However, successful use of IT requires much more than mere installation and
application of systematized knowledge. It also requires the application of implied knowledge regarding the organization and management of the technology and its application to the contextual environment in which it is to be used. This implied IT knowledge often represents experience with the deployment of previous technology accumulated over time; such experience contributes to the shaping of new technology [3]. In addition to purely technological issues, the development of appropriate human-resource skills is required, i.e. extensive training of the people who are going to use (and train others how to use) the resources. Training is seen as particularly important because this is not a technology for just a few people to benefit from, but for many. As Pekka Tarjanne, Secretary General of the ITU, made clear at Africa Telecom '98, "communication is a basic human right" (original emphasis). Nelson Mandela, at Telecom 95 in Geneva, urged regional co-operation in Africa, emphasizing the importance of a massive investment in education and skills transfer, thereby ensuring that developing countries also have the opportunity to participate in the information revolution and the "global communications marketplace" [3]. Canada's International Development Research Centre (IDRC) runs a number of developing-country projects that involve technology leapfrogging. The Pan Asian Network (PAN) was set up to fund ICT infrastructure and research projects in developing countries across Asia. Individuals, development institutions, and other organizations should all be able to use the infrastructure to share information [3]. PAN works with Bangladesh's world-famous grassroots Grameen Bank. One service here is a "telecottage", where network services can be obtained. The technology and the material will be tailored to meet the needs of Grameen's typically poorly educated clients. One of PAN's objectives is gender equity.
Women, who constitute some 95% of Grameen's borrowers, will be prominent among PAN users in Bangladesh [3]. PAN is also responsible for linking Laos to the Internet. The Science, Technology and Environment Organization (STENO) of the Lao Government invited some Laotian IT professionals living and working overseas to return home and share their experiences with their colleagues in the country. STENO collaborated with PAN in designing an 18-month project to build the necessary infrastructure for a dial-up e-mail service. Among the pioneer users were "researchers working on agriculture and aquaculture projects; journalists managing national news agencies and newspapers; lawyers consulting on international legal issues; travel agents planning business trips; computer resellers tracking down suppliers and obtaining pricing information; and about 20 others in both the public and private sectors" [5].
3 How to Use Mobile Technology with Advanced Multimedia Tools
In Section 2, we presented various efforts to make distance learning effective in developing countries. Presenting course materials through multimedia in remote locations is feasible; in villages, existing school buildings could host such presentations. Of course, the learning materials must be self-explanatory and not boring. Using multimedia facilities such as videos, audio, graphics, and engaging textual descriptions, it is possible to reach remote locations of the world where
computer technology has not yet reached. As the areas not covered by computer and internet technology are still profoundly vast, this approach seems very constructive and should be pursued. Wherever possible, distance learning through multimedia should be delivered over the internet, as the internet and networks are the vehicles of multimedia. But since broadband connectivity is still very limited in vast areas of Asia, Africa, and Latin America, it would still take a long time to reach the major part of the population of these regions with multimedia and the web. Mobile technology offers a very hopeful way to reach the vast population of the developing countries, as it does not require fixed broadband connections. We have to develop distance learning using multimedia through mobile technology. This seems the most viable way to reach the billions living in the rural areas of developing countries; hence considerable research effort must be dedicated to this line. Instructions could be sent by email to the mobiles of distance learners. Relevant website addresses could also be transmitted to their email, and they could then visit those distance-learning sites through the internet on their mobiles. In his book, Mayer (2001) declares that while learning from text-only books results in the poorest retention and transfer performance, learning from books that include both text and illustrations, and from computer-based environments that include on-screen text, illustrations, animations, and narration, results in better performance [10]. Similar to e-learning, mobile technologies can also be interfaced with many other media such as audio, video, the Internet, and so forth. Mobile learning is more interactive and involves more contact, communication, and collaboration with people [14]. The increasing and ubiquitous use of mobile phones provides a viable avenue for initiating contact and implementing interventions proactively.
For instance, the Short Message Service (SMS) is a highly cost-effective and very reliable method of communication. It is less expensive to send an SMS than to mail a reminder through regular post, or even to follow up via a telephone call. Further, no costly machines are required (which is clearly the case in terms of owning a personal computer). Besides SMS, distance learners can use mobile phones or MP3 players to listen to their course lectures, and for storage and data transfer. New technologies, especially mobile technologies, are now challenging the traditional concept of distance education [12]. Today the ever more rapid development of ICT contributes to the increasing capabilities of mobile devices (cell phones, smartphones, PDAs, laptops) and wireless communications, which are the main components of mobile learning. On the other hand, the implementation of mobile learning requires a corresponding system for the management of this type of education [13]. The use of mobile technologies can help today's educators to embrace a truly learner-centred approach to learning. In various parts of the world, mobile learning developments are taking place at three levels: the use of mobile devices in educational administration; the development of a series of 5-6 screen mobile learning academic supports for students; and the development of a number of mobile learning course modules [11]. Research into the current state of play in Europe indicates: 1. There is a wide range of roles for mobile technologies supporting the learner, ranging from relatively simple use of SMS texting to the more advanced
use of smartphones for content delivery, project work, searching for information, and assessment. Some proponents of mobile learning believe that it will only 'come of age' when whole courses can be studied, assessed, and learners accredited through mobile devices. 2. Although books are now being downloaded onto mobile devices, the authors believe that to support the learning process a great deal of thought has to be given to the structure of the learning and assessment material. However, it is true that for some, mainly at higher-education level, mobile phones offer the opportunity to access institutional learning management systems. This provides greater flexibility to the learner without any new pedagogical input. 3. Costs are coming down rapidly; first-generation simple mobile phones will no longer be available on the market from 2010, and all mobile phone users in Europe will be using 3G or 4G phones within the next two years. A welcome associated step is a move towards some form of standardization by the mobile phone companies, as exemplified by the shift to common charging devices over the next two years. 4. The value placed on possession of a mobile phone, especially by young people, is surprising, and the data on ownership suggests that this will very shortly be a ubiquitous tool for all, and that it will be well cared for: there is evidence that ownership of devices brings responsible use and care. 5. Large-scale educational usage in schools currently depends on government investment, but in higher and further education it is safe to assume that all learners will have their own devices. Institutions will need to advise potential students on the range of devices most suitable for the curriculum, as they currently do with regard to computers. The convergence between small laptops and handheld devices will continue until they are regarded as different varieties of the same species of technology. 6.
There is great potential for educational providers to work with large phone companies, both to reduce costs and to co-develop appropriate software [6]. Bangladesh Open University (BOU) is the only national institution in Bangladesh providing distance education in the country. It has an extensive network throughout the country to provide readily accessible contact points for its learners. Fifteen years after its inception, however, BOU has lagged behind in using technologies. In view of its limitation to conventional teaching methods, a project was undertaken to test the effectiveness and viability of an interactive television (TV) and mobile Short Message Service (SMS) classroom, and to explore the use of available and appropriate technologies to provide ICT-enabled distance tuition. In this project, mobile SMS, together with a perceived live telecast, was used to create an ideal classroom situation for distance learning through the Question Based Participation (QBP) technique. Existing videos of BOU TV programs were made interactive using these technologies and this technique. The existing BOU TV program and the interactive version of the same were shown to the same BOU learners to evaluate its effectiveness. The study found that this interactive virtual classroom performs significantly better in teaching than the non-interactive BOU video programs used at present [7]. Another paper presents and discusses the basic philosophies of distance teaching and learning at NKI (Norwegian Knowledge Institute) Distance Education and their consequences for the development of a learning environment supporting mobile distance learners.
For NKI it has been a major challenge to design solutions for users of mobile technology who wish to study while on the move. When students are mobile and wishing to study, the equipment and technologies they use are in addition to the equipment used at home or at work. The solutions must be designed to allow both users and non-users of mobile technology to participate in the same course. This means that NKI has looked for solutions that are optimal for distributing content and communication in courses, independently of whether the students and tutors use mobile technology or a standard PC and internet connection for teaching or learning. The learning environment must efficiently cater for both situations and both types of students. The solutions were developed for PDAs. During the period of development and research, the technologies have developed rapidly: mobile phones are incorporating PDA functionality and vice versa. In principle, the aim of development is to design solutions that can be used on any kind of mobile device. The paper builds on experiences from four European Union (EU) supported projects on mobile learning: From e-learning to m-learning (2000-2003), Mobile learning – the next generation of learning (2003-2005), Incorporating mobile learning into mainstream education (2005-2007), and the ongoing project The role of mobile learning in European education (2006-2008). Most NKI courses are not designed to function as online interactive e-learning programs, although some parts of the courses may involve such interaction with multimedia materials, tests, and assignments. The courses normally involve intensive study, mainly of text-based materials, solving problems, writing essays, submitting assignments, and communicating with fellow students by e-mail or in web-based conferences. This means that most of the time the students will be offline when studying.
From experience we also know that the students often download content for reading offline, and often also print out content for reading on paper. All aspects and functions of mobile learning in the NKI large-scale distance learning system are clearly an additional service to the students [8]. Mobile Assisted Language Learning (MALL) describes an approach to language learning that is assisted or enhanced through the use of a handheld mobile device. MALL is a subset of both Mobile Learning (m-learning) and Computer Assisted Language Learning (CALL). MALL has evolved to support students' language learning with the increased use of mobile technologies such as mobile phones (cellphones), MP3 and MP4 players, PDAs, and devices such as the iPhone or iPad. With MALL, students are able to access language learning materials and to communicate with their teachers and peers anytime, anywhere [9].
4 Conclusion
In this paper we studied the problems of imparting distance learning through multimedia in developing countries. We suggested mobile technology as a viable and affordable medium through which distance learning could be imparted efficiently to billions of people, and we presented some examples of achievements in this field, where telephony, photography, audio, video, the internet, eBooks, animations, and so on can be used on mobiles to deliver effective distance education in developing countries. More research needs to be carried out to tap the vast opportunity of reaching billions in developing countries through mobile technology and of gearing up multimedia technology so that it can easily be transported to those locations.
References
1. Passerini, K., Granger, M.J.: A Developmental Model for Distance Learning Using the Internet. Computers & Education 34(1) (2000)
2. Rahman, H.: Interactive Multimedia Technologies for Distance Education in Developing Countries (2000), http://encyclopedia.jrank.org/articles/pages/6637/Interactive-Multimedia-Technologies-for-Distance-Educationin-Developing-Countries.html
3. Davison, R., Vogel, D., Harris, R., Jones, N.: Technology Leapfrogging in Developing Countries – An Inevitable Luxury? Journal of Information Systems in Developing Countries (2000)
4. Ruth, S., Giri, J.: The Distance Learning Playing Field: Do We Need Different Hash Marks? (2001), http://technologysource.org/article/distance_learning_playing_field/
5. Nhoybouakong, S., Ng, M.L.H., Lafond, R.: (1999), http://www.panasia.org.sg/hnews/la/la01i001.htm
6. Using Mobile Technology for Learner Support in Open Schooling, http://www.col.org/sitecollectiondocuments/
7. Alam, M.S., Islam, Y.M.: Virtual Interactive Classroom (VIC) Using Mobile Technology at the Bangladesh Open University (BOU), http://wikieducator.org/images/4/45/PID_563.pdf
8. Dye, A., Rekkedal, T.: Enhancing the Flexibility of Distance Education Through Mobile Learning. In: The European Consortium for the Learning Organisation, ECLO 15th International Conference, Budapest, May 15-16 (2008)
9. Mobile Assisted Language Learning, http://en.wikipedia.org/wiki/Mobile_Assisted
10. Mayer, R.E.: Multimedia Learning. Cambridge University Press, Cambridge (2001)
11. Implications of Mobile Learning in Distance Education for Operational Activities, http://wikieducator.org/images/c/c6/PID_624.pdf
12. Yousuf, M.: Effectiveness of Mobile Learning in Distance Education. Turkish Online Journal of Distance Education (TOJDE) 8(4), Article 9 (2007), ISSN 1302-6488
13. Georgieva, E.: A Comparison Analysis of Mobile Learning Systems. In: Int'l Conf. on Computer Systems and Technologies, CompSysTech (2006), http://ecet.ecs.ru.acad.bg/cst06/Docs/cp/sIV/IV.17.pdf (retrieved March 31, 2008)
14. Vavoula, G.N.: D4.4: A Study of Mobile Learning Practices. MOBIlearn Project Deliverable (2005), http://www.mobilearn.org/download/results/public_deliverables/MOBIlearn_D4.4_Final.pdf
Design and Implementation of Mobile Leadership with Interactive Multimedia Approach
Suyoto1, Tri Prasetyaningrum2, and Ryan Mario Gregorius3
1,3 Department of Informatics Engineering, University of Atma Jaya Yogyakarta, Indonesia
[email protected], [email protected]
2 State Junior High School 18 Purworejo, Central Java, Indonesia
[email protected]
Abstract. In this paper, we propose the design and implementation of an application for mobile leadership with an interactive multimedia approach, called "m-Leadership". The application supports the indirect services of Guidance and Counseling and runs on mobile devices, i.e., mobile phones. Our development approach combines interactive multimedia with educational psychology. We also take care of the interface, interactivity, ease of use, and stand-alone aspects of this multimedia-based mobile phone application. Four multimedia components are used in the application: text, pictures/graphics, audio/voice, and video animation. The application is implemented using J2ME and the J2ME Wireless Toolkit, and has been tested as a whole on 30 junior high school students. Based on the test, the respondents rated it: 46% excellent, 24% good, 29% adequate, and 1% poor. Keywords: Leadership, multimedia, educational psychology.
1 Introduction
Leadership is a process of influence, and that process must first begin within ourselves: while we are not yet able to lead ourselves, we cannot hope to lead others [1]. Leadership of oneself should be developed from the teenage years, so that teenagers can build strong self-control [2, 3]. Teenagers are those aged 13-18 years [4, 5], while junior high school (JHS) age is 12-15 years; this means that teenagers begin this stage of education in junior high school. A teenager is in transition from childhood to adulthood [6]. During this transition, teenagers often fall into juvenile delinquency such as free sex, brawls between students, and drug abuse, which is due to several factors, one of them being weak self-control [7]. Before JHS students start to develop leadership within themselves, they first need to know the leadership that already exists within them. One facility for JHS students to know their own leadership is a leadership test developed from the Task Inventory Development (TID) for junior high students, which was developed by the Indonesia
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 217–226, 2011. © Springer-Verlag Berlin Heidelberg 2011
University of Education team. The TID for JHS students is an instrument used to understand the level of individual development of JHS students. It takes the form of a questionnaire consisting of a collection of statements from which the JHS students must choose [8]. With the TID, the level of a JHS student's progress can be measured, so that the problems hindering the student's development can be identified and the student can get help in completing developmental tasks. The TID for junior high school students measures the level of development of 10 developmental aspects; six of these are closely related to the categories used to group the questions on the leadership test for JHS students [9]. A mobile phone application that addresses leadership for JHS students has not yet been developed; therefore it is necessary to develop one. The mobile application will have several features: a leadership test, a leadership game, and short stories about leadership. The leadership test in the mobile application will have question categories, forms, counts, functions, and assessments taken and adapted from the TID. The content of the application features will be stored in a database, and the mobile application must use some type of connection to access that content through an application server. Several types of connection could be used by the application: an HTTP connection, Bluetooth, or SMS and MMS. However, the application will use an HTTP connection; on the phones where the application runs, the HTTP connection will use the GPRS feature.
2 Literature Review
There are two prior studies relevant to the other connection types: one on remote presentation control using Bluetooth technology and J2ME [10], and one on an ID-card extension service based on an SMS/MMS gateway [11]. The leadership mobile application will use the HTTP connection because of the weaknesses the other connection types show when accessing feature content in the database. The weakness of the Bluetooth connection is indicated by the research on remote presentation control using Bluetooth technology and J2ME [10]. That research produced two pieces of software: a J2ME-based mobile application and a J2SE-based desktop application [10]. The mobile application functions as a client that sends commands to perform certain functions in the Microsoft PowerPoint application, whereas the desktop application functions as a server that receives the commands sent by the client via a Bluetooth connection and then performs the corresponding PowerPoint functions. Krisnanto (2008) conducted research resulting in an ID-card extension service application based on an SMS/MMS gateway, which is used to handle the ID card renewal
process. The renewal is done by sending an SMS according to a procedure; the SMS is checked against the database through the SMS gateway, and if it matches, a picture of the applicant is sent by MMS. An SMS sent by a person seeking renewal, or a notification SMS from the gateway, may pass through more than one SMS Center, each with a different protocol; the SMS gateway therefore serves to connect SMS Centers that use different protocols [11]. The photograph of the applicant in Krisnanto's (2008) application is thus sent using the MMS service, a mobile service for the delivery of multimedia-based messages. However, for mobile applications developed with J2ME through the MIDP profile, the size of a file sent or received via MMS is restricted to 30 KB [12]. This poorly supports the test, game, and short-story features of the leadership application, because the content of each feature in the database is likely to exceed 30 KB, so more than one MMS transmission would be needed before the mobile application could run each of its features. The HTTP protocol is the basis of the HTTP connection that the leadership application will use [13]. The HTTP protocol provides convenience and speed for the distribution of multimedia-based information and supports connections to a server in large numbers [14]. The weaknesses of the other connection types described above, in terms of accessing the application's feature content in the database, can thus be addressed with an HTTP connection. The system architecture proposed by the authors is shown in Fig. 1.
Fig. 1. System architecture of “m-Leadership”
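As a rough sketch of this client–server flow (our own illustration, not the authors' code): the endpoint name content.php and its query parameters are hypothetical, and the plain-Java HttpURLConnection stands in for the javax.microedition.io.HttpConnection API that a real MIDP client would use over GPRS.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class MLeadershipClientSketch {

    // Build the GET URL for fetching one feature's content ("test", "game",
    // or "story") for one user. The endpoint and parameter names are
    // hypothetical stand-ins for the paper's PHP server application.
    public static String buildContentUrl(String base, String feature, String user) {
        return base + "/content.php?feature=" + feature + "&user=" + user;
    }

    // Fetch the response body from the server (network call; a MIDP phone
    // would use HttpConnection from the Generic Connection Framework instead).
    public static String fetchContent(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }
}
```

Because the content travels over plain HTTP GET, the same PHP endpoint can serve the test, game, and short-story features without any per-feature size limit, unlike the 30 KB MMS restriction discussed above.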
3 Overview of Mobile Leadership System
The software to be developed is called the Mobile Leadership Application ("m-Leadership"). It will provide leadership tests, games, and short stories to junior high school children; the content will be stored in a database.
The software will use an HTTP connection to access its feature content through a web-based server application built with PHP technology. On a server computer, a database server will manage the database storing the test, game, and short-story content, and a web server will host the PHP application that the software accesses to retrieve that content. The leadership test provided by "m-Leadership" will present different test questions each time the software is accessed from a mobile phone. The question categories, question forms, the number and function of the questions, and the question assessments used in the software's leadership test are taken and adapted from the TID (Task Inventory Development) for junior high school learners. The questions in the "m-Leadership" leadership test are divided into five categories: personal, social, learning, career, and moral. This division is obtained by adapting six developmental aspects of the TID: self-acceptance and development, maturity of relationships with peers, intellectual maturity, insight and career preparation, the foundation of ethical behavior, and emotional maturity. Each question in the leadership test is a set of four statements from which the student chooses. The test contains 50 questions in total: 40 questions on which the leadership categories are assessed, and 10 questions that check the consistency of the answers to the 40 questions. The value given to each chosen statement ranges from 1 to 4.
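The scoring scheme just described (40 scored questions over five categories, each chosen statement worth 1-4 points, plus 10 consistency questions) can be sketched as follows. This is our own illustration: the even 8-per-category split and all class and method names are assumptions, since the paper only fixes the totals and the value range.

```java
// Sketch of TID-style scoring for the leadership test: 40 scored questions,
// assumed 8 per category (personal, social, learning, career, moral), each
// answer worth 1-4 points; the 10 consistency questions are not scored here.
public class TidScoringSketch {
    static final int CATEGORIES = 5;
    static final int PER_CATEGORY = 8;  // assumption: 40 scored questions / 5
    static final int MIN_VALUE = 1;
    static final int MAX_VALUE = 4;

    // Sum the answer values for one category (0..4), assuming answers[]
    // holds the 40 scored answers grouped by category.
    public static int categoryScore(int[] answers, int category) {
        int sum = 0;
        for (int i = 0; i < PER_CATEGORY; i++) {
            int v = answers[category * PER_CATEGORY + i];
            if (v < MIN_VALUE || v > MAX_VALUE) {
                throw new IllegalArgumentException("answer out of range: " + v);
            }
            sum += v;
        }
        return sum;
    }

    // Highest possible score in a single category (8 answers x 4 points).
    public static int maxCategoryScore() {
        return PER_CATEGORY * MAX_VALUE;
    }
}
```

Under these assumptions each category score falls between 8 and 32, which gives the application a simple per-aspect measure of a student's development level.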
The leadership game is a guess-the-leader game. The leadership short stories briefly illustrate the eight characters that a good leader must possess according to the Basic Leadership Training book by F. Rudy Dwiwibawa and Theo Riyanto (2008): a sense of responsibility, concern with task completion, vigor, willpower, risk taking, confidence, originality, and the capacity to influence.
4
Design of Mobile Leadership System
4.1
Use Case Diagram
Fig. 2 shows the use case diagram of “m-Leadership”, while Fig. 3 shows the system architecture design.
Design and Implementation of Mobile Leadership
Fig. 2. “M-Leadership” Use Case Diagram (use cases: login, Leadership Test, New Leadership Test, Leadership Game, Short Story of Leadership)
a. Use Case Specification: Login. This use case lets the actor (a JHS student) supply a user identifier to the system; through this identifier the system presents its features (the leadership test, game, and short stories) to the actor.
b. Use Case Specification: Leadership Test. This use case lets the actor (a JHS student) take the leadership test on the system.
c. Use Case Specification: New Leadership Test. This use case lets the actor (a JHS student) start a fresh leadership test: the old user identifier and the old test questions still stored in the database are deleted over an HTTP connection by the PHP-based server application, and the actor then receives a new leadership test under a new user identifier.
d. Use Case Specification: Short Story of Leadership. This use case lets the actor (a JHS student) read the short story about leadership.
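A minimal in-memory sketch of the New Leadership Test flow described above, deleting the old identifier together with its stored questions and registering a new one, might look as follows. The real application does this against the MySQL database through the PHP server; the class and method names here are hypothetical.

```java
// In-memory stand-in for the server-side store of unfinished tests,
// keyed by user identifier. The real "m-Leadership" system keeps this
// state in MySQL behind a PHP application.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TestStore {
    private final Map<String, List<String>> questionsByUser = new HashMap<>();

    // Issue a test (a list of question texts) to a user identifier.
    void issueTest(String userId, List<String> questions) {
        questionsByUser.put(userId, questions);
    }

    // New Leadership Test: discard the old identifier and its questions,
    // then issue a fresh test under the new identifier.
    void startNewTest(String oldUserId, String newUserId, List<String> questions) {
        questionsByUser.remove(oldUserId);
        questionsByUser.put(newUserId, questions);
    }

    boolean hasUnfinishedTest(String userId) {
        return questionsByUser.containsKey(userId);
    }
}
```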
4.2
Architectural Design of Applications
The “m-Leadership” application is software developed to introduce leadership to junior-high-school users. To do so, it provides three features: a leadership test, a game, and short stories. The leadership test takes and adapts its question categories, form, number, function, and scoring from the TID (Task Inventory Development) for JHS students.
Fig. 3. Architectural Design of Applications “m-Leadership” (modules: Login, Leadership Test, New Leadership Test, Short Story of Leadership, Leadership Game)
The leadership short stories in the software illustrate the eight characters of a good leader according to the Basic Leadership Training book by F. Rudy Dwiwibawa and Theo Riyanto (2008). The short story tells how a group of junior high school children apply these eight characters in their daily lives. The leadership game in the software is a Guess the Name of the Leader game: the user has to guess which leader's face is shown in the displayed image, and on a successful guess receives a description of that leader's leadership profile. In this way, users can learn about their own leadership through the available test, while through the game and short stories they can emulate the eight characters of a good leader and the leadership profiles of the leaders whose faces are displayed.

The software is developed in the J2ME programming language. The content of its features is stored in a database on the server computer, managed with the MySQL database application. The software accesses this content through an HTTP connection to a PHP-based server application, which reads the content from the database on the server computer and returns it to the software.

Some of the forms in “m-Leadership” are: (a) User Identification Form; (b) Lead Main Menu Form; (c) Lead Test Description Form; (d) Lead Test Menu Form; (e) User Confirmation Form; (f) Lead Test Form; (g) Result Test Form; (h) Guess Game Name Form; (i) Short Story Form. The User Identification Form (Fig. 4.a) is the first form displayed when “m-Leadership” is executed and contains a single text field.
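The HTTP access to the PHP server could, for example, address the content service with URLs of the following shape. The host name, script path, and parameter names are hypothetical, since the paper does not specify the actual endpoint; on the phone, the J2ME client would then open such a URL with `HttpConnection` from `javax.microedition.io`.

```java
// Builds a GET URL for fetching a feature's content from the PHP-based
// server, e.g. http://<server>/mleadership/content.php?feature=test&user=...
// All endpoint details are illustrative assumptions.
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ContentRequest {
    static String buildUrl(String server, String feature, String userId) {
        String f = URLEncoder.encode(feature, StandardCharsets.UTF_8);
        String u = URLEncoder.encode(userId, StandardCharsets.UTF_8);
        return "http://" + server + "/mleadership/content.php?feature=" + f + "&user=" + u;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("example.org", "test", "student 01"));
    }
}
```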
Users of “m-Leadership” must enter a user identifier in this text field in order to access the software's features (the leadership test, game, and short stories). The Lead Main Menu (Fig. 4.b) lists the features of the software: the test, the game, and the short stories. The Lead Test Description Form (Fig. 4.c) shows usage instructions for the leadership test in “m-Leadership”; before taking the test, the user should read and understand these instructions.
The Lead Test Menu (Fig. 4.d) lists the leadership tests the user of “m-Leadership” can select. Two tests are provided on this menu: a new test and an old test. The new test presents a fresh set of leadership questions under the user identifier entered by the user. The old-test option first deletes a user identifier that was previously used to access the leadership test but never completed it, together with the test questions issued to that identifier. The User Confirmation Form (Fig. 4.e) is displayed when the user chooses the old test on the Lead Test Menu. In the text field provided on this form, the user must enter the identifier that was previously used for an unfinished test; that identifier and the test questions issued to it are then removed from the database. The Lead Test Form (Fig. 4.f) is the form through which the user accesses the 50 leadership test questions. Each test question consists of four statements, each displayed as the string property of an element of a ChoiceGroup. The old-test option on the Lead Test Menu presents the test on the New Lead Test Form, which has the same appearance as the Lead Test Form. The Result Test Form (Fig. 4.g) is displayed after question 50 of the test is completed. The top of this form shows the user's value for one category of the test, with a description of that value shown beneath it.
The social, learning, career, and moral categories have the same appearance as the personal category shown on this form. The Guess Game Name Form (Fig. 4.h) is the form for playing the guess-the-leader game. It provides a text field and displays the facial image of the leader whose name must be guessed. The user must enter the correct name in the text field to obtain a description of that leader's leadership profile; if the entered name is correct, the software displays the profile description. The Short Story Form (Fig. 4.i) is used to access the leadership short story, entitled The Challenging Task. Through this form the user receives the whole story, consisting of 12 images with accompanying descriptions that illustrate the eight characters of a good leader; the user simply presses the Continue Reading button to see the pictures and read the story.
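The guess-the-leader check behaves like the following sketch; the leader names and profile texts used here are placeholders, not the application's actual content.

```java
// Returns the leadership profile only when the typed name matches the
// leader shown in the image (ignoring case and surrounding spaces).
import java.util.Map;

class GuessGame {
    private final Map<String, String> profiles;

    GuessGame(Map<String, String> profiles) {
        this.profiles = profiles;
    }

    // null means a wrong guess: no profile is revealed.
    String guess(String shownLeader, String typedName) {
        if (shownLeader.equalsIgnoreCase(typedName.trim()))
            return profiles.get(shownLeader);
        return null;
    }
}
```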
Fig. 4. Design of (a) User Identification Form; (b) Lead Main Menu Form; (c) Lead Test Description Form
Fig. 4. Design of (d) Lead Test Menu Form; (e) User Confirmation Form; (f) Lead Test Form; (g) Result Test Form; (h) Guess Game Name Form; (i) Short Story Form
5
Result and Discussion of Mobile Leadership System
There are four multimedia components used in this application: text, pictures/graphics, audio/voice, and video animation. This application is implemented using
J2ME and the J2ME Wireless Toolkit. Testing of the “m-Leadership” application is divided into two parts: testing the application's functionality and testing the product in general with users of the software. The functional testing of “m-Leadership” is presented in full in the appendix section Planning, Description, and Results of Software Testing. The features of the application, consisting of the leadership test, the leadership game, and the leadership short stories, were all implemented successfully, which means that the “m-Leadership” application has been successfully developed. User testing was done by asking some JHS students to access the “m-Leadership” application and give their opinions of it by filling in distributed questionnaires. The questions covered ease of use, beauty of the display, attractiveness of the features, understanding of the content, and the application's benefit for self-knowledge. The application was tested with 30 junior high school students, each of whom gave a rating of excellent, good, adequate, not good, or very bad in the questionnaire after accessing the application. Averaged over all aspects of the “m-Leadership” application, 46% of the ratings were excellent, 24% good, 29% adequate, and 1% not good/poor (Fig. 5).
Fig. 5. User rating for “m-Leadership” applications
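The percentage distribution reported above can be reproduced from raw questionnaire counts with a simple tally; the counts used below are hypothetical values consistent with the reported split, not the study's raw data.

```java
// Converts per-rating respondent counts into whole-percent shares of the
// total, i.e. the kind of distribution shown in Fig. 5.
class SurveyTally {
    static int[] percentages(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        int[] pct = new int[counts.length];
        for (int i = 0; i < counts.length; i++)
            pct[i] = Math.round(100f * counts[i] / total);
        return pct;
    }

    public static void main(String[] args) {
        // Hypothetical counts: excellent, good, adequate, poor.
        int[] pct = percentages(new int[]{46, 24, 29, 1});
        System.out.println(java.util.Arrays.toString(pct));  // [46, 24, 29, 1]
    }
}
```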
6
Conclusions
The “m-Leadership” application is used for the indirect service of Guidance and Counselling and runs on mobile devices, i.e., mobile phones. Four multimedia components are used in the application: text, pictures/graphics, audio/voice, and video animation. The application is implemented using J2ME and the J2ME Wireless Toolkit. It has been tested as a whole with 31 junior high school students; averaged over all aspects, 46% of them rated it excellent, 24% good, 29% adequate, and 1%
poor. Future research will develop other mobile multimedia content, e.g., an election for a junior high class president or even a student council president election at a junior high school.

Acknowledgements. The authors wish to thank the DP2M Directorate General of Higher Education Indonesia for approving the Competence Research Grants (for the budget years 2010 and 2011) titled Design and Implementation of Mobile Phone Content Service Application with Interactive Multimedia Approach.
References
1. Rudy Dwiwibawa, F., Riyanto, T.: So Ready Leader? Kanisius, Yogyakarta, Indonesia (2008) (in Bahasa)
2. Giuliani, R.: Leadership. Miramax (2007)
3. Maharani, A.: Relationship Between Emotional Intelligence in Adolescents With Self Adjustment. Final Project, Psychology Department, Faculty of Psychology, Sanata Dharma University, Yogyakarta, Indonesia (2005)
4. Spire Research and Consulting: Marketing to Indonesian Youths Today (January 2007)
5. Hurlock, E.B.: Adolescent Development, 4th edn. McGraw-Hill Kogakusha, Ltd., Tokyo (1973)
6. Bhaumik, S., et al.: Transition for Teenagers With Intellectual Disability: Carers' Perspectives. Journal of Policy and Practice in Intellectual Disabilities 8(1), 53–61 (2011)
7. Worth, N.: Social Geography and Young People. University of Leeds (2011)
8. UPI Team: Task Inventory Development. Indonesia University of Education, Bandung (2008) (in Bahasa)
9. Kartadinata: Inventory Development Tasks for Junior High School. Indonesia University of Education, Bandung (2003) (in Bahasa)
10. Deny: Presentation Remote Application Development Using Bluetooth and J2ME Technology. Final Project, Informatics Department, University of Atma Jaya Yogyakarta (2007) (in Bahasa)
11. Krisnanto: Development Extension Service ID Card-Based SMS/MMS Gateway. Final Project, Informatics Department, Atma Jaya Yogyakarta University (2008) (in Bahasa)
12. Blewitt, A.: Survive the Test of Time: Developing J2ME on Nokia Phones. Nokia Developer (2003)
13. MobiWeb team: SMS HTTP API Manual, Version 7.3. MobiWeb Ltd. (2011)
14. Wiland, L., Banerjee, S.: Introduction to Mobile Phone Programming in Java ME. UW-Madison (2008)
New Development of M-Psychology for Junior High School with Interactive Multimedia Approach

Suyoto 1, Thomas Suselo 2, Yudi Dwiandiyanta 3, and Tri Prasetyaningrum 4

1,2,3 Department of Informatics Engineering, University of Atma Jaya Yogyakarta, Indonesia
{Suyoto,suselo,yudi-dwi}@staff.uajy.ac.id
4 State Junior High School 18 Purworejo, Central Java, Indonesia
[email protected]
Abstract. This paper presents a new development of m-Psychology for junior high school with a multimedia approach. m-Psychology consists of three specific applications used as an indirect Counseling Guidance service running on mobile phones: m-KE (Emotional Quotient mobile), m-KK (Success Intelligent mobile), and m-LDP (Leadership mobile). The method used in the applications combines interactive media with educational psychology. The development of the software also covers aspects such as the interface, interactivity, a user-friendly mode, and stand-alone mobile-phone software based on a multimedia approach. Four multimedia components are used in the application: text, graphs or pictures, audio, and animation. m-Psychology runs on the Symbian operating system. Based on the tests, users gave ratings of 43% excellent, 40% good, 16% adequate, and 1% poor.

Keywords: Educational psychology, content service, cellular phone, multimedia learning.
1
Introduction
In 2002, the study started a series of online psychological software, such as a potential academic test (e-TPA), an electronic color-blindness test (e-kidsCV), an emotional intelligence test (e-KE), a success intelligence test (e-KK), and a leadership test (e-LDP) [1], in accordance with the research roadmap described by the fishbone diagram (Fig. 1). Starting in 2006, the authors continued the development of this psychological software series on cellular phones, beginning with the Color-Blind Test System, or m-ColorBlindTest [2] [3] [4], running specifically on mobile phones. The study focuses on cellular phones because in 2006 the number of cellular phone users in Indonesia reached 68 million, a number clearly predicted to keep growing through 2010. Olli-Pekka Kallasvuo predicts that the number of global cellular phone users will reach around 4 billion in 2010 [5]. Moreover, according to a report by
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 227–236, 2011. © Springer-Verlag Berlin Heidelberg 2011
Suyoto et al.
detikinet, by the end of 2008 the number of cellular phone users was more than 50 percent of the world population, or more than 3.3 billion users [6]. What is more, ring back tone (private tone) services are attracting ever more interest from cellular phone users in Indonesia, as are wallpapers, games, and message services. According to one of the biggest providers in Indonesia, Telkomsel, at the Telkomsel Content Provider Gathering 2007 event, the content services provided by Telkomsel contributed roughly 4% of the company's total earnings. Today, Telkomsel and its 108 content provider (CP) partners offer 3000 kinds of content services. Clearly, with the number of CPs in Indonesia still only around 200-500, there remains a great opportunity for the CP business [7].
Fig. 1. Fishbone diagram of the research roadmap: from the electronic tests (e-TPA, e-kidsCV, e-KE, e-KK, e-LDP) through the M-ColourBlind Test to the mobile applications (m-KE, m-KK, m-LDP) and content-service-based multimedia, alongside related projects (analysis and synthesis of NTT and Toraja motive material, building a framework for the UMKM DIY mapping directory)
As part of this great opportunity in content service innovation, and out of attention to the education environment in Indonesia, the authors consistently develop content services using advanced information technology tools; the development of these content services is a way to dedicate the authors' knowledge to society as a whole, and the content is delivered through a multimedia learning approach. Content services for cellular phone software can be built in various ways and with various techniques, for example using the Java programming language with J2ME [8] [9] or applying popular tools such as Macromedia Flash Professional 8 and Adobe FlashLite 2.1 [10] [11] [12]. Specifically, this paper describes a content service consisting of several applications running on cellular phones based on the Symbian operating system: Emotional Quotient mobile, or mobile Kecerdasan Emosional (m-KE); Success Intelligent mobile, or mobile Kecerdasan Kesuksesan (m-KK); and Leadership mobile (m-LDP).
2
Research Methodology
This research is designed to run for two years and involves six graduate students. Its results comprise (a) an instructional design (ID) model as the basis for developing multimedia software supporting the Counseling Guidance service for junior high school (JHS) students; (b) a methodology for developing multimedia software to both teach and serve the Counseling Guidance subject; (c) a prototype multimedia series for learning and serving Counseling Guidance; and (d) textbook material. The research is divided into two main parts: (i) developing multimedia software to teach and deliver Counseling Guidance indirectly to JHS students, and (ii) studying the effectiveness of that software in the indirect learning and serving process of the Counseling Guidance subject for the target group. The work begins with a deep analysis of the current condition of indirect Counseling Guidance learning and service in the JHS environment (including its curriculum) and of all related theories, especially relevant educational psychology theory. The result of this study then determines the necessary instructional design model, which is developed into a multimedia series for learning and serving Counseling Guidance indirectly, paying close attention to related aspects such as the interface, text fonts, text arrangement, appropriate graphics and animations, interactivity, navigation approach, and reliability. The resulting multimedia software prototype is then tested, implemented, and evaluated until it reaches its final form and function; the testing, implementation, and evaluation process can be repeated as often as necessary. The research map, with its main theme and supporting themes, is presented in Fig. 2.
3
Application Design
This third part of the paper discusses the application design. The discussion covers the analysis, product perspective, product function, user characteristics, description of m-Psychology (m-KE, m-KK, and m-LDP), special needs, functional needs, and the architecture and interface design of the three interactive multimedia applications.
3.1
Design and Analysis
Every person is created with a different emotional makeup and character. Character plays a vital role here, as it is a special feature of a human being influenced by many factors such as family, education, and heredity.
The research map comprises a main theme — a content-service building framework using an interactive multimedia approach — and seven sub-themes:
Sub-theme 1: Development of mobile Emotional Quotient (m-KE) software with an interactive multimedia approach.
Sub-theme 2: Development of mobile Success Intelligent (m-KK) software with an interactive multimedia approach.
Sub-theme 3: Development of mobile Leadership (m-LDP) software with an interactive multimedia approach.
Sub-theme 4: Implementation of the mobile Emotional Quotient (m-KE) software with an interactive multimedia approach for the Counselling Guidance service aimed at JHS students.
Sub-theme 5: Implementation of the mobile Success Intelligent (m-KK) software with an interactive multimedia approach for the Counselling Guidance service aimed at JHS students.
Sub-theme 6: Implementation of the mobile Leadership (m-LDP) software with an interactive multimedia approach for the Counselling Guidance service aimed at JHS students.
Sub-theme 7: Implementation of the mobile Psychology software with an interactive multimedia approach for the indirect Counselling Guidance service aimed at junior high school students.
Fig. 2. The research map
Basically, there are at least four basic human character types: sanguine, phlegmatic, choleric, and melancholic [13]. These four character types account for many problems caused by misunderstanding or misperception among people; such problems, arising from different perceptions, derive from a lack of mutual understanding [14] [15]. The intelligence a person possesses is often interpreted as a potential talent for achieving success in life, and parents tend to judge it early: they are proud if their children score highly on an IQ test and assume on that basis that the children are smart and clever. In fact, this perception may not be true, since the key factors of success derive from many other internal and external factors. For example, a child's success in the learning process at school involves both: internal factors such as intelligence quotient, character, learning motivation, emotional stability, and learning strategy; and external factors such as family conflict, the influence of schoolmates, classroom management, the teacher's teaching technique, and the learning conditions and facilities at home [16]. It is therefore clear that a person's success cannot be measured only by the degree of intelligence he or she possesses. The leadership test provided by the developed application serves many kinds of leadership exercises, which vary from one cellular phone to another. The variation of the exercises, including their number, function, and evaluation, is adjusted not only to the Inventory of Development Tasks (Inventori Tugas Pengembangan, ITP) but also to other related success factors [17].
A. Product Perspective
A.1.
User Interface. The user interacts with the developed software through a graphical user interface (GUI): the desktop monitor when using an emulator, or the screen of the cellular phone when the software runs on a phone.
A.2. Hardware Interface. To run the developed software, the user needs the following hardware:
1. PC with a processor of at least 1.8 GHz
2. Operating system: Windows XP or Windows Vista
3. Memory (RAM): minimum 512 MB, 1 GB preferred
4. VGA card
5. Mouse and keyboard
6. Monitor
7. Sound card and speakers
8. Cellular phone supporting Adobe FlashLite 2.0
A.3. Software Interface. The developed software relies on the following software:
1. Windows XP Professional Edition SP2 (Microsoft Corporation), used as the computer operating system.
2. Adobe Flash CS3 (Adobe), used as the authoring tool for the m-KE and m-KK software scripts.
3. Adobe FlashLite 2.0 (Adobe), used as the supporting runtime on the cellular phone.
4. Adobe Photoshop CS4 (Adobe), used to create the graphical design and layout.
5. J2ME and the J2ME Wireless Toolkit (Sun Microsystems), used as the development tools for m-LDP.
A.4. Memory Limitation. The developed application requires a minimum of 512 MB of RAM on a personal computer (1 GB recommended) and a minimum of 1 MB of RAM on a cellular phone.
B. User Characteristics. The m-Psychology software (m-KE, m-KK, and m-LDP) is aimed at every junior high school (JHS) student, especially those who are curious to know their emotional type and character and to use supporting content on a cellular phone. Users can also find out how far they can reach success and can measure their leadership character.
C. Assumptions and Dependencies. The assumptions used in developing the m-Psychology software (m-KE, m-KK, and m-LDP) are as follows:
a. The software required to operate m-KE and m-KK is FlashLite Player 2.0, and Java for m-LDP.
b. Cellular phone: Symbian operating system S60 3rd edition with at least 1 MB of RAM, supporting FlashLite 2.0.
c. CPU: 32-bit data bus, 1.8 GHz, minimum 512 MB RAM, 1 GB recommended.
4
Result and Discussion
The m-Psychology application (m-KE, m-KK, and m-LDP) is basically an interactive media application built for cellular phones and aimed at junior high school students. A special requirement applies to the m-LDP application, which is developed with Java tools in a client-server approach. So far, the m-KE and m-KK applications have been successfully built and developed with the Adobe Flash CS4 Professional tool; however, a Symbian operating system supporting FlashLite 2.0 or 3.0 technologies is needed for users whose cellular phones are based on the third edition of the Nokia Series. Fig. 3 shows the m-KE application, Fig. 4 the m-KK application, and Fig. 5 the m-LDP application.
Fig. 3. M-KE application feature
User testing was done by asking some JHS students to access the “m-Psychology” application and give their opinions of it by filling in distributed questionnaires. The questions covered ease of use, beauty of the display, attractiveness of the features, understanding of the content, and the application's benefit for self-knowledge. The application was tested with 31 JHS students, each of whom gave a rating of excellent, good, adequate, not good, or very bad in the questionnaire after accessing the application. Averaged over all aspects of the “m-Psychology” application, 43% of the ratings were excellent, 40% good, 16% adequate, and 1% not good/poor (Fig. 6).
Fig. 4. M-KK application feature
Fig. 5. M-LDP application feature
Fig. 6. User rating for “m-Psychology” applications
5
Conclusions
This paper has presented the development of m-Psychology, which consists of three main applications: Emotional Quotient mobile, or m-KE (mobile Kecerdasan Emosional); Success Intelligent mobile, or m-KK (mobile Kecerdasan Kesuksesan); and Leadership mobile (m-LDP). The method used in the applications combines an interactive multimedia approach with an educational psychology approach. Four main multimedia components are used in the applications: text, graphs or pictures, audio, and animation. The application has been tested as a whole with 31 junior high school students; averaged over all aspects, 43% of them rated it excellent, 40% good, 16% adequate, and 1% poor.

Acknowledgements. The authors wish to thank the DP2M Directorate General of Higher Education Indonesia for approving the Competence Research Grants (for the budget years 2010 and 2011) titled Design and Implementation of Mobile Phone Content Service Application with Interactive Multimedia Approach.
References
1. Suyoto: Design and Implementation of e-KidsCV with Interactive Multimedia Approach. Research Report, Atma Jaya Yogyakarta University (2003)
2. Suyoto: The Development of m-ColorBlindTest with Interactive Multimedia Approach. Research Report, Atma Jaya Yogyakarta University (2006)
3. Suyoto: Color Blind Test Development Through Mobile Phone with Interactive Multimedia Approach. In: Proceedings of the National Conference on Information Systems 2008, January 14-15, Sanata Dharma University, Yogyakarta (2008) ISBN: 978-979-1153-28-7
4. Suyoto: Design and Implementation of e-KidsCV with Interactive Multimedia Approach. Research Report, Atma Jaya Yogyakarta University (2003)
5. Kusumaputra, R.A.: Internet, a New Round of Growth Key to the Mobile Phone Industry. Kompas, Jakarta (November 29, 2006)
6. Suryadhi, A.: Late 2008 Mobile Phone Users Exceed Half the World Population. detikinet (February 8, 2008)
7. Pulsa reporter: Business CP Tantalize. Pulsa, 125 edn., February 14-27, Jakarta (2008)
8. Suyoto: Fractal with J2ME on Mobile Phone. Journal of Industrial Technology X(2) (2006) (in Bahasa)
9. Suyoto: Computer Graphics Using J2ME. Journal of Information Technology 2(2) (2005)
10. Prasetyaningrum, T.: M-NingBK: Mobile Applications of Career Guidance for Students of Junior High School. In: Proceedings of Semnasif – Informatics National Seminar 2008, Universitas Pembangunan Nasional Veteran, Yogyakarta (May 24, 2008) ISSN: 1979-2328
11. Prasetyaningrum, T.: Design of Mobile Counseling with Interactive Multimedia Approach. In: Proceedings of the National Seminar on Technology IV, Application of Technology to Enhance Sustainable Welfare Society, Book 10, Faculty of Science and Technology, University of Technology Yogyakarta (April 5, 2008) ISBN: 978-979-1334-20-4
12. Suyoto: Development of Mobile Dictionary of Three Languages with Interactive Multimedia Approach. Journal of Informatics 2(1) (2006)
13. Awangga, N., Suryaputra: EQ Plus Test. Pararaton Publishing, Yogyakarta (2008)
14. Cooper, R.K., Ayman, S.: Executive EQ: Emotional Intelligence in Leadership and Organizations. PT. Gramedia Pustaka Utama, Jakarta (2002) ISBN: 979-605-895-2
15. Patton, P.: EQ (Emotional Quotient) – Basic. Mitra Media Publisher (2000) ISBN: 979-95663-11-5
16. Safaria, T.: Successful Intelligence. Arti Bumi Intaran, Yogyakarta (2008)
17. Sunaryo, K., et al.: Instructions for Use of the Special Program for Development Task Analysis. Report, University of Education Indonesia, Bandung (2003) (in Bahasa)
Adaptive Bandwidth Assignment Scheme for Sustaining Downlink of Ka-Band SATCOM Systems under Rain Fading

Yangmoon Yoon1, Donghun Oh2, Inho Jeon2, You-Ze Cho3, and Youngok Kim2,*

1 DTV Transition Department, Korea Radio Promotion Agency, Seoul, 138-803, Korea
2 Department of Electronic Engineering, Kwangwoon University, Seoul, 139-701, Korea
3 School of Electronics Engineering, Kyungpook National University, Daegu, 702-701, Korea
[email protected]
Abstract. Although Ka-band SATCOM systems provide broad bandwidth, the received signal on the ground is significantly attenuated by rain fading, and the downlink can even be disconnected. In this paper, an adaptive bandwidth assignment scheme is proposed for sustaining the downlink of Ka-band SATCOM systems under rain fading. The proposed scheme operates in accordance with the rain attenuation: since the maximum transmit power from the satellite is fixed, the assigned bandwidth is adaptively determined so that the transmit power is concentrated on a limited bandwidth. Simulation results demonstrate that the proposed scheme overcomes rain fading and effectively sustains the downlink of Ka-band SATCOM systems.

Keywords: Ka-band, Satellite Communications, rain attenuation, adaptive resource assignment, link sustentation.
1 Introduction
Since the first Korean satellite, KITSAT, was launched in 1992, many satellite transponders have been launched to provide national infrastructure as well as various services in Korea, such as satellite communications (SATCOM), TV content distribution, global positioning service (GPS) and, recently, satellite Internet service [1]. Lately, the Ka-band has been considered a promising radio resource for SATCOM systems because the conventional radio resources, such as the C, X and Ku bands, are nearly exhausted. KOREASAT3 was the first Korean satellite to carry a Ka-band transponder; KOREASAT5 followed it, and the Communication,
Corresponding author.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 237–242, 2011. c Springer-Verlag Berlin Heidelberg 2011
Ocean and Meteorological Satellite (COMS) was launched in June 2010. In other countries as well, many Ka-band SATCOM systems have been launched, or are expected to be launched, for broadband SATCOM, multichannel HDTV, and so on [1], [2], [3]. Ka-band SATCOM systems not only provide broad bandwidth but also keep emergency communications available under natural disasters, which can make terrestrial communications fail. However, they have a weakness: the received signal on the ground is significantly attenuated, and the downlink can even be disconnected, by rain fading. In this paper, an adaptive bandwidth assignment (ABA) scheme is proposed for sustaining the downlink of Ka-band SATCOM systems under rain fading. In accordance with the rain attenuation, the proposed scheme assigns bandwidth adaptively to concentrate the transmit power on a limited bandwidth, since the maximum transmit power from the satellite is fixed. Simulation results demonstrate that the proposed scheme overcomes rain fading and effectively sustains the downlink of Ka-band SATCOM systems. The rest of this paper is organized as follows. In Section 2, the system description is provided. In Section 3, the ABA scheme is proposed and its performance is discussed with computer simulation results. Lastly, conclusions are given in Section 4.
2 System Description

2.1 Rain Attenuation Model
The International Telecommunication Union Radiocommunication Sector (ITU-R) provides a rain attenuation model, which is based on both spherical and non-spherical raindrop shapes and on the Laws-Parsons (L-P) drop-size distribution model [4]. Assuming non-spherical raindrops and vertical polarization, the coefficients for the ITU-R model are given in Table 1, and the specific attenuation for the ITU-R model with the L-P distribution model is shown in Fig. 1. In this paper, we adopt the ITU-R model and the coefficients for 19.45 GHz, which is within the Ka-band. Fig. 2 shows the rain attenuation for various surface rainfall intensities with the ITU-R model and the parameters in Table 2. The surface rainfall intensity is modeled as a random variable uniformly distributed on [0, 100] mm/hr to generate rain attenuation channels for the computer simulations.

Table 1. Coefficients for the ITU-R model with non-spherical raindrops and vertical polarization

Frequency (GHz)    aH         aV          bH       bV
12.25              0.00065    0.000591    1.121    1.075
19.45              0.07       0.0644      1.105    1.072
40.00              0.35       0.31        0.939    0.929
Fig. 1. Specific attenuation for the ITU-R model at 12.25, 19.45 and 40.00 GHz (specific attenuation in dB/km versus rainfall rate in mm/hr)
Fig. 2. Rain attenuation in terms of surface rainfall intensity (rain attenuation in dB versus surface rainfall intensity in mm/hr)
For the simulations, 100 sample rain attenuation channels are generated, each with its own surface rainfall intensity.
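The ITU-R model above is a power law: the specific attenuation is γ = a·R^b dB/km, with the coefficients a and b taken from Table 1. The channel generation described here can be sketched as follows; the 19.45 GHz vertical-polarization coefficients are used, and the effective path length of 8 km is an illustrative assumption, not a value from the paper.

```python
import random

# ITU-R power-law specific attenuation: gamma = a * R**b  [dB/km]
# Coefficients for 19.45 GHz, vertical polarization (Table 1).
A_V, B_V = 0.0644, 1.072

def specific_attenuation(rain_rate_mm_hr, a=A_V, b=B_V):
    """Specific attenuation in dB/km for a given rainfall rate in mm/hr."""
    return a * rain_rate_mm_hr ** b

def rain_attenuation(rain_rate_mm_hr, effective_path_km=8.0):
    """Total rain attenuation in dB over an assumed effective slant path.
    The 8 km effective path length is a hypothetical value for illustration."""
    return specific_attenuation(rain_rate_mm_hr) * effective_path_km

# 100 sample channels: rainfall intensity uniform on [0, 100] mm/hr.
random.seed(1)
channels = [rain_attenuation(random.uniform(0.0, 100.0)) for _ in range(100)]
```

In a full simulation, each sampled attenuation would be applied to the downlink budget of one channel realization.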
Table 2. Parameters for Seoul, Korea

Latitude (φ)           37°
Elevation angle (θ)    45°
Frequency              19.45 GHz
hS                     0 km

2.2 Single Carrier FDM Symbol
When the downlink is disconnected by severe rain fading, the link can be sustained by adaptively determining the assigned bandwidth so that the transmit power is concentrated on a limited bandwidth. Although an OFDM system can adaptively determine the bandwidth, OFDM is rarely applied in SATCOM due to its high peak-to-average power ratio. Thus, a single carrier frequency division multiplexing (SC-FDM) system is considered. The transmitter of an SC-FDM system converts a binary input signal into a sequence of symbols modulated onto M subbands. The first step in modulation is to perform an M-point (M < N) discrete Fourier transform (DFT) to produce a frequency-domain representation of the input symbols. It then maps each of the M DFT outputs to one of the N orthogonal subbands that can be transmitted [5].
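The DFT-spread transmitter just described can be sketched as follows; localized mapping onto the first M of the N subcarriers is assumed here for illustration.

```python
import numpy as np

def scfdm_modulate(symbols, n_fft):
    """SC-FDM (DFT-spread) transmitter: M-point DFT, map the M outputs
    onto M of the N orthogonal subcarriers, then N-point IDFT.
    Localized mapping onto the first M subcarriers is assumed."""
    m = len(symbols)                      # M < N
    freq = np.fft.fft(symbols)            # M-point spreading DFT
    mapped = np.zeros(n_fft, dtype=complex)
    mapped[:m] = freq                     # localized subcarrier mapping
    return np.fft.ifft(mapped)            # N-point IDFT -> time samples

def scfdm_demodulate(samples, m):
    """Inverse of the mapping above (ideal, noiseless channel)."""
    freq = np.fft.fft(samples)            # N-point DFT
    return np.fft.ifft(freq[:m])          # undo the M-point spreading DFT

# BPSK round trip with M = 8 subbands out of N = 64 subcarriers.
bits = np.array([1, -1, 1, 1, -1, -1, 1, -1], dtype=complex)
tx = scfdm_modulate(bits, n_fft=64)
rx = scfdm_demodulate(tx, m=8)
```

On an ideal channel the demodulator recovers the input symbols exactly, since the N-point FFT undoes the IFFT and the M-point IDFT undoes the spreading DFT.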
3 Adaptive Bandwidth Assignment Scheme
In this section, an adaptive bandwidth assignment scheme is proposed for sustaining the downlink of Ka-band SATCOM systems under rain fading. When the rain attenuation is less than 10 dB, it can be effectively compensated by adaptively assigning the bandwidth: as mentioned previously, since the maximum transmit power from the satellite is fixed, the total transmit power can be concentrated on a limited bandwidth to compensate for the rain attenuation. Fig. 3 shows the BER performance of the system with the proposed adaptive bandwidth assignment algorithm, which is given in Table 3, over the 100 sample rain attenuation channels. Note that the algorithm is an example for a maximum of 10 dB attenuation, chosen with practically meaningful bandwidths in mind. The simulation parameters are summarized in Table 4. As shown in the figure, the BER performance of the system with the proposed algorithm is effectively enhanced, especially in the low Eb/N0 region. However, only a limited enhancement is observed in the high Eb/N0 region. If the algorithm is extended to overcome more rain attenuation, the performance can be further enhanced, but the effective bandwidth must also be considered carefully so that it is not narrowed excessively, because a narrower bandwidth is more susceptible to deep fading.
Table 3. Bandwidth Decision Criterion

Rain Attenuation A (dB)    Bandwidth (MHz)    No. of Subbands (M)
0 ≤ A ≤ 3                  200                64
3 < A ≤ 6                  100                32
6 < A ≤ 9                  50                 16
9 < A ≤ 10                 25                 8
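The bandwidth decision criterion of Table 3 amounts to a threshold lookup: each halving of the assigned bandwidth concentrates the fixed transmit power into half the band, buying roughly 3 dB of link margin. The intermediate attenuation thresholds used below (3 dB steps up to the 10 dB design limit) are an assumption consistent with that reasoning, not guaranteed to match the paper exactly.

```python
# Bandwidth decision criterion in the spirit of Table 3.
# Each row: (attenuation upper bound in dB, bandwidth in MHz, subbands M).
# The 3 dB steps are an assumed reconstruction of the thresholds.
DECISION_TABLE = [
    (3.0, 200, 64),
    (6.0, 100, 32),
    (9.0, 50, 16),
    (10.0, 25, 8),
]

def assign_bandwidth(rain_attenuation_db):
    """Return (bandwidth_MHz, n_subbands) for a measured rain attenuation."""
    for upper_db, bw_mhz, m in DECISION_TABLE:
        if rain_attenuation_db <= upper_db:
            return bw_mhz, m
    # Beyond the 10 dB design range, keep the narrowest bandwidth.
    return DECISION_TABLE[-1][1], DECISION_TABLE[-1][2]
```

The receiver would feed back the measured attenuation, and the satellite would switch M (and hence the occupied bandwidth) accordingly.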
Fig. 3. BER performance of the system with the proposed adaptive BW assignment algorithm over the 100 sample rain attenuation channels (BER versus Eb/N0; curves for BPSK with BW = 200 MHz at rain attenuation 0, -10, -20 and -30 dB, and for the proposed adaptive BW scheme)

Table 4. System Parameters

Parameter                                  Value
Modulation                                 BPSK
DFT Size (M)                               64, 32, 16, 8
FFT Size (N)                               64
Nominal Bandwidth (BW)                     200 MHz
Sampling Frequency (Fs)                    200 MHz
Sampling Interval (1/Fs)                   5 ns
Subcarrier Spacing (Δf = Fs/N)             3.125 MHz
Basic OFDM Symbol Duration (Tb = 1/Δf)     0.32 μs
Guard Interval                             0.08 μs
4 Conclusion
In this paper, an adaptive bandwidth assignment scheme was proposed for sustaining the downlink of Ka-band SATCOM systems under rain fading. The proposed scheme operates in accordance with the rain attenuation. Simulation results demonstrated that the proposed scheme overcomes rain fading and effectively sustains the downlink of Ka-band SATCOM systems.

Acknowledgments. The present research was conducted under a research grant from KORPA. This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (No. 2011-0004197).
References
1. Lee, H.J., Kim, J.M., Lee, B.S., Lee, H., Ryoo, J.S.: Recent Korean R&D in Satellite Communications. IEICE Transactions on Communications E92-B(11), 3300-3308 (2009)
2. Umehira, M., Kobayashi, K., Yasui, Y., Tanaka, M., Suzuki, R., Shinonaga, H., Kawai, N.: Recent Japanese R&D in Satellite Communications. IEICE Transactions on Communications E92-B(11), 3290-3299 (2009)
3. Pelton, J.N.: The start of commercial satellite communications. IEEE Communications Magazine 48(3), 24-31 (2010)
4. Recommendation ITU-R P.618-5: Propagation data and prediction methods required for the design of earth-space telecommunications systems (1997)
5. Myung, H.G., Lim, J., Goodman, D.J.: Single carrier FDMA for uplink wireless transmission. IEEE Vehicular Technology Magazine 1(3), 30-38 (2006)
Digital Modeling and Control of Multiple Time-Delayed Systems via SVD

Jong-Jin Park1, Gyoo-Seok Choi1,*, and Leang-San Shieh2

1 Dept. of Internet and Computer Science, Chungwoon University, San 29, Namjang-ri, Hongseong, Chungnam, 350-701, South Korea
{jjpark,lionel}@chungwoon.ac.kr
2 Department of Electrical and Computer Engineering, University of Houston, Houston, TX 77204-4005, USA
[email protected]
Abstract. Controller-to-actuator and sensor-to-controller delays deteriorate control performance and can destabilize the overall system. In this paper, a new approximated discretization method and digital design for control systems with multiple delays are proposed. Based on a procedure for the generation of impulse response data, the multiple fractional/integer time-delayed continuous-time system is transformed into a discrete-time model with multiple integer time delays. To implement the digital modeling, the singular value decomposition (SVD) of a Hankel matrix, together with an energy loss level, is employed to obtain an extended discrete-time state-space model. Then, the extended discrete-time state-space model of the control system is reformulated as an integer time-delayed discrete-time system by computing its observable canonical form. The proposed method can closely approximate the step response of the original continuous time-delayed control system by choosing various energy loss levels. An illustrative example is simulated to demonstrate the effectiveness of the developed method.

Keywords: Multiple time-delayed systems, Hankel matrix, Singular value decomposition, Balanced model reduction.
1 Introduction
Time delay is one of the key factors influencing the overall system stability and performance. In particular, as the different effects of actuator, sensor and controller exist in control systems, delays are often formulated as state time delays, input time delays as well as output time delays in a continuous-time or discrete-time framework [1], [2-4]. To digitally simulate and design a continuous-time delayed control system, it is often required to obtain an equivalent discrete-time model. The digital modeling of continuous-time systems with input delays can be found in a standard textbook [5]. For improving the performance of a continuous-time system with multiple time delays, several advanced control theories and practical design techniques have been *
Corresponding author.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 243–252, 2011. © Springer-Verlag Berlin Heidelberg 2011
proposed [6-8]. Recently, a discretization method via the Chebyshev quadrature formula, together with a linear interpolation method, was developed in [11] to construct an equivalent discrete-time model from a continuous-time multiple time-delayed system. Despite the significant progress that has been made on continuous/discrete-time systems with multiple time delays, the digital modeling of a continuous-time system with multiple fractional/integer time delays in state, input and output is still far from fully explored [4]. In this paper, we propose a new approximated discretization method and digital optimal controller design for a control system with multiple time delays not only in the states but also in the inputs and outputs. In addition, the delay times can be fractional or integer multiples of the sampling period, while the established discrete-time control system model has only integer delays. Our digital modeling method is based on the balanced realization technique for system model reduction. The proposed design methodology proceeds as follows. First, from the unit-step response data of a sampled-data multiple time-delayed control system, the estimated impulse response sequences and the Hankel matrices H_0 and H_1 are generated. Next, using a singular value decomposition (SVD) and a predetermined energy loss level, a balanced representation of an extended discrete-time model is obtained, which contains the more significant degrees of controllability and observability of the original system. The singular values eliminated according to the predetermined energy loss level reflect states of reduced importance to the original system from the energy point of view. Then, the extended discrete-time state-space model is reformulated as an approximated integer time-delayed discrete-time system by computing its observable canonical form.
As a result, the parameters of an optimal observer-based controller and its associated feedback design can be determined using this approximated integer-delayed discrete-time system model.
2 Digital Modeling of Multiple Time-Delayed Systems

2.1 Problem Formulation and Model Transformation
Consider a controllable, observable and stable continuous-time multiple-input, multiple-output (MIMO) system with multiple state, input and output time delays, described by [3]

ẋ(t) = Σ_{i=0}^r A_i x(t - h_i) + Σ_{i=0}^q B_i u(t - τ_i),
y(t) = Σ_{i=0}^s C_i x(t - η_i),   (1)

where x(t) ∈ R^n is the state, with x(t) = φ(t) for -max(h_i, τ_i, η_i) ≤ t ≤ 0, u(t) ∈ R^m is the control input and y(t) ∈ R^p is the output of the system. Here n is the order of the system, m is the number of inputs and p is the number of outputs of system (1). Also, h_i ≥ 0, i = 0, 1, ..., r, are the state delays, τ_i ≥ 0, i = 0, 1, ..., q, are the input delays and η_i ≥ 0, i = 0, 1, ..., s, are the output delays. These delays can be fractional or integer multiples of the sampling time T. The function φ(t) is a continuous vector-valued initial condition, and the system matrices (A_i, B_i, C_i) are sets of real matrices of appropriate dimensions.
Most control systems are formulated in a continuous-time framework, for which many analysis tools and control methodologies are well established [5]. With the rapid advances in digital technology and computers, digital control provides various advantages over its analog counterpart: better reliability, lower cost, smaller size, more flexibility and better performance. The resulting digitally controlled continuous-time system becomes a sampled-data system [5]. For the hybrid control of a sampled-data system, the objective is to design a digital controller for the continuous time-delayed system described in (1). To this end, it is required to obtain an equivalent discrete-time model for the MIMO continuous-time delay system in (1). This paper proposes a discretization method via SVD and energy analysis of the system, along with a balanced minimal realization approach [12, 13]. For this approach it is important to find the Hankel matrices H_0 and H_1 of the system response. For computation of the Hankel matrices, we need the impulse response data of system (1); we obtain the impulse response data indirectly from unit-step response data. The obtained impulse response data Y_i, i = 1, 2, 3, ..., can be utilized to construct the Hankel matrices H_0 and H_1 as follows:
H_0 = [ Y_1     Y_2     ...  Y_l
        Y_2     Y_3     ...  Y_{l+1}
        ...     ...     ...  ...
        Y_r     ...     ...  Y_{r+l-1} ],

H_1 = [ Y_2     Y_3     ...  Y_{l+1}
        Y_3     Y_4     ...  Y_{l+2}
        ...     ...     ...  ...
        Y_{r+1} ...     ...  Y_{r+l} ],   (2)

where r and l are sufficiently large integers. The Hankel matrix H_0 can be decomposed using the SVD as shown below [12, 13]:

H_0 = R Σ S^T.   (3)
The matrices R and S in (3) are orthogonal, whereas Σ is a rectangular matrix defined as

Σ = [ Σ_n  0
      0    0 ].

Here Σ_n = diag(σ_1, σ_2, ..., σ_n) is a diagonal matrix consisting of monotonically decreasing singular values,

σ_1 ≥ σ_2 ≥ ... ≥ σ_n > 0.
The size of σ_i is a relative measure of the contribution that the corresponding state makes to the input-output behavior of the system. It is well known that the Hankel matrix H_0 is the product of the observability and controllability matrices of the system of interest; hence, the Hankel singular values σ_i indicate the intensity of the controllability and observability of the system. A relatively small value of σ_i means that the corresponding subspace is both weakly controllable and weakly observable, and hence it can be discarded. An effective balanced model reduction method based on the SVD can be found in [12, 13]; it discards the least controllable and observable states, corresponding to small input-output Grammians, and retains the internally balanced subsystem. Based on these singular values σ_i, we define the energy loss level as
ε = [(Σ_{i=1}^n σ_i - Σ_{i=1}^{n_r} σ_i) / Σ_{i=1}^n σ_i] × 100 %.   (4)
Hence, the dimension n_r of the reduced-order model can be determined using (4) by choosing ε to be small, e.g., ε < 1%. After finalizing the order, we find R_1 and S_1 by truncating the matrices R and S obtained in expression (3); that is, R_1 and S_1 consist of the first n_r columns of R and S, respectively. Subsequently, we construct a discrete-time model (G, H, C) with order n_r. Finally, the discrete-time model is shown in (5) as follows:
x̄((k+1)T) = G x̄(kT) + β H u(kT),
y(kT) = C x̄(kT),   (5)

where, following the balanced minimal realization [12, 13],

G = Σ_{n_r}^{-1/2} R_1^T H_1 S_1 Σ_{n_r}^{-1/2},   H = Σ_{n_r}^{1/2} S_1^T E_m,   C = E_p^T R_1 Σ_{n_r}^{1/2},

with Σ_{n_r} = diag(σ_1, ..., σ_{n_r}), E_m = [I_m 0 ... 0]^T and E_p = [I_p 0 ... 0]^T.
β is a modification factor which can adjust the steady-state value of the system (5) to match the original continuous-time system (1).

2.2 Discrete-Time Multiple Integer Time-Delayed Model of the System
To design a digital controller for the continuous-time system with multiple delays in states, inputs and outputs, it is necessary to construct an equivalent discrete-time multiple integer time-delayed model [5]. Discretization techniques for multiple input-delay systems are well developed; however, discretization schemes for non-integer state-delay systems are not yet fully developed [4, 11]. Here, we propose a discretization technique based on the balanced minimal realization approach, using the step response data of the given system. From these step response data, we compute the impulse response data Y_i. Using these data, we can calculate the Hankel matrices H_0 and H_1. Hence, by following the procedure discussed in Section 2.1, we can obtain an equivalent extended discrete-time model represented by (5). Subsequently,
we convert this digital model (5) into a discrete-time multiple integer time-delayed model as follows. We perform a linear transformation x(kT) = T_o x̄(kT), where T_o is chosen as the observable canonical form transformation matrix. The transformed system, in observer-type form, becomes [12]

x((k+1)T) = G_o x(kT) + H_o u(kT),
y(kT) = C_o x(kT),   (6)

where G_o = T_o G T_o^{-1}, H_o = T_o H and C_o = C T_o^{-1}. Writing (6) out component-wise gives a set of scalar delay difference equations, denoted (7). After solving the set of equations in (7) for y(kT), we obtain a Discrete-time Delay Difference Equation (DDDE) with integer delays, as described by the expression shown below:
Where
, ,
,
,
, (8)
,
.
The system represented by (8) is an equivalent discrete-time multiple integer time-delayed model, of appropriate order, for the given continuous-time fractional/integer time-delay system.
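The modeling procedure of Section 2 — impulse response from step-response differences, the Hankel matrices H_0 and H_1, SVD truncation at an energy loss level, and a balanced realization — can be sketched as follows. This is an illustrative single-input, single-output implementation in the spirit of the balanced minimal realization of [12, 13], using the standard Hankel-SVD realization formulas; it is not the paper's own code.

```python
import numpy as np

def impulse_from_step(step_resp):
    # Impulse response data Y_k as first differences of the unit-step response.
    return np.diff(step_resp, prepend=0.0)

def hankel_pair(y, r, l):
    # H0 and H1 built from impulse response samples y[1], y[2], ...
    H0 = np.array([[y[i + j + 1] for j in range(l)] for i in range(r)])
    H1 = np.array([[y[i + j + 2] for j in range(l)] for i in range(r)])
    return H0, H1

def balanced_model(y, r, l, loss_pct=1.0):
    """Reduced-order (G, H, C) from impulse response data via Hankel SVD."""
    H0, H1 = hankel_pair(y, r, l)
    R, s, ST = np.linalg.svd(H0)
    total = s.sum()
    # Smallest order n_r whose discarded singular values lose < loss_pct %.
    nr = next(k for k in range(1, len(s) + 1)
              if (total - s[:k].sum()) / total * 100.0 < loss_pct)
    R1, S1, s1 = R[:, :nr], ST[:nr, :].T, s[:nr]
    sq, isq = np.diag(np.sqrt(s1)), np.diag(1.0 / np.sqrt(s1))
    G = isq @ R1.T @ H1 @ S1 @ isq          # state matrix
    H = (sq @ S1.T)[:, :1]                  # input matrix (single input)
    C = (R1 @ sq)[:1, :]                    # output matrix (single output)
    return G, H, C
```

For a hypothetical first-order test system with Markov parameters Y_k = 0.8^(k-1), the sketch recovers a first-order model with G ≈ 0.8 and first Markov parameter CH ≈ 1.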
3 Optimal Digital Controller Design for Multiple Time-Delayed Systems
To optimally reduce the effect of state disturbances and respond quickly to load changes, we design an optimal digital controller using the discrete-time system described in equation (5). Based on the values of G, H and C in (5), the optimal controller is computed as

u(kT) = -K x̄(kT) + E r(kT),   (9)

where the feedback gain K and the feedforward gain E are computed as

K = (R + H^T P H)^{-1} H^T P G,   (10)

E = [C (I - G_c)^{-1} H]^{-1},   (11)

where G_c = G - H K is the closed-loop system matrix and P = P^T > 0 is the solution of the following Riccati equation:

P = Q + G^T P G - G^T P H (R + H^T P H)^{-1} H^T P G.   (12)
Weighting matrices Q = Q^T ≥ 0 and R = R^T > 0 in (12) are selected according to the given system in order to obtain an appropriate closed-loop response. The controller is designed in such a way that the output of the closed-loop system tracks the applied reference input. The controller presented in (9) is for the extended discrete-time system (5); it remains to find the optimal digital controller for the discrete-time system with multiple integer time delays described by (8). Applying the same linear transformation described in Section 2.2, and after some simple algebraic operations, we obtain the final expression for the controller of the integer time-delayed system:
,
,
(13)
.
Thus, the designed discrete-time system with multiple integer delays in (8) with optimal digital controller is shown in (14). , , .
(14)
To apply the digital controller in (13) to the multiple time-delayed system in (1), we can use the DDDE in (8) as an open-loop observer for implementation of the controller in (13).
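The gain computation in (10)-(12) is the standard discrete-time LQR recursion. A minimal sketch that solves the Riccati equation (12) by fixed-point (value) iteration follows; the second-order system matrices at the bottom are hypothetical, not those of the paper.

```python
import numpy as np

def dare_iterate(G, H, Q, R, iters=500):
    """Solve P = Q + G'PG - G'PH (R + H'PH)^-1 H'PG by fixed-point iteration."""
    P = Q.copy()
    for _ in range(iters):
        GtPH = G.T @ P @ H
        P = Q + G.T @ P @ G - GtPH @ np.linalg.solve(R + H.T @ P @ H, GtPH.T)
    return P

def lqr_gain(G, H, Q, R):
    P = dare_iterate(G, H, Q, R)
    K = np.linalg.solve(R + H.T @ P @ H, H.T @ P @ G)   # eq. (10)
    return K, P

# Hypothetical second-order discrete system (double integrator, T = 0.1 s).
G = np.array([[1.0, 0.1], [0.0, 1.0]])
H = np.array([[0.005], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])
K, P = lqr_gain(G, H, Q, R)
Gc = G - H @ K          # closed-loop system matrix of eq. (11)
```

The resulting closed-loop matrix G_c has all eigenvalues inside the unit circle, which is the stabilizing solution the iteration converges to for a stabilizable, detectable pair.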
4 Simulation and Results
In this section, a digital model obtained by the proposed SVD approach is compared with the digital model designed using the bilinear transformation method [9]. In this example, we discuss the accuracy of the proposed method (the SVD approach) relative to the previous method of digital modeling (the bilinear transformation). For comparison, we consider the following continuous-time system with a state delay:

ẋ(t) = A_0 x(t) + A_1 x(t - h) + B u(t),
y(t) = C x(t),   (15)

where A_0 = [0 1; -1 -2], A_1 = [0 0; -0.2 -0.1], B = [0; 1], C = [1 0; 0 1], the delay time h = 0.2, and the sampling period is T = 0.15 s, with n = 2.
Following the computation steps, let ε < 1%; the singular value matrix is then obtained as

Σ = diag(0.7884, 0.3177, 0.0012, 0.0009),

so the appropriate order for the digital model is n_r = 2, and the extended discrete-time model (5) is obtained with numerically computed matrices G, H and C.
The observable canonical form transformation matrix T_o is picked so that the output vector y(kT) in (8) is equal to the state vector x_01(kT) in (8). The matrices of the integer time-delayed discrete-time model (8) are then obtained numerically from the extended model.
0.9901 0.1293 , 0.1231 0.7346 0.0002 0.0001 , 0.0013 0.0006
0
0.0015 0.0008 , 0.0185 0.0093 0.0102 . 0.1291
, (16)
Fig. 1. Comparison of step responses

Table 1. Error Performance Index

                    Output 1    Output 2
Proposed method     0.00119     0.0002
Bilinear method     0.00959     0.00291
To compare the accuracy of the proposed method, we compare the step responses of the two systems represented by (8) and (16). In Fig. 1, the responses of the continuous-time system represented by (15), the digital model using the proposed method, described by (8), and the digital model determined by the bilinear transformation, represented by (16), are compared. To compare the closeness of the outputs of the original continuous-time multiple time-delayed system y(t) in (15), the proposed discrete-time model y(kT) in (8), and the existing discrete-time model y_b(kT) in (16), we compute the error performance indexes

J1_i = (1/L) Σ_{k=1}^L [y_i(kT) - y_i(t_k)]^2,   J2_i = (1/L) Σ_{k=1}^L [y_{b,i}(kT) - y_i(t_k)]^2,

where y_i(kT) is the ith output of system (8), y_{b,i}(kT) is the ith output of system (16) and y_i(t_k) is the ith output of system (15), for i = 1, 2 and k = 1, 2, ..., L with L = 500. The comparison results are listed in Table 1.
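Treating each index as the mean squared deviation of a model's sampled outputs from the sampled continuous-time reference (the exact normalization used in the paper is an assumption here), the computation is straightforward:

```python
import numpy as np

def error_performance_index(y_model, y_reference):
    """Per-output mean squared error between a discrete model's output
    samples and the sampled continuous-time reference response.
    Both arrays are shaped (L, n_outputs)."""
    y_model = np.asarray(y_model, dtype=float)
    y_reference = np.asarray(y_reference, dtype=float)
    return np.mean((y_model - y_reference) ** 2, axis=0)
```

With L = 500 samples of each response, `error_performance_index` would be called once for the SVD model (8) and once for the bilinear model (16) against the same sampled reference (15).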
From the above values of the error performance index for both methods, it is clear that the proposed method gives closer results compared to the digital model obtained by the standard bilinear transformation.
5 Conclusions
In this paper, a new approximated state-space discretization scheme for a multivariable continuous-time system with multiple state, input and output delays has been presented. In addition, an open-loop observer-based optimal digital controller design method for multiple integer time-delayed systems has also been proposed. As a result, the infinite-dimensional continuous-time control system can be converted into a finite-dimensional sampled-data system, and a direct digital design of the sampled-data closed-loop system can be adopted. The proposed digital modeling method has several advantages compared with existing methods: (1) The developed technique is simpler and more accurate than existing methods for converting a multiple fractional/integer time-delayed system into an integer time-delayed discrete-time system; furthermore, if necessary, a delay-free discrete-time model can also be constructed. (2) The transformed integer time-delayed discrete-time model has the same dimension as the original continuous-time system model. (3) The obtained balanced reduced-order model retains the significant contributions of the states to the input-output behavior of the original system. The proposed digital modeling method also allows the development of an inexpensive, reliable and high-performance digital control law for effective hybrid control of a multiple time-delayed continuous-time DPG control system. Simulation results have demonstrated the effectiveness of the proposed method.
References
1. Hauser, C.H., Bakken, D.E., Bose, A.: A failure to communicate: next generation communication requirements, technologies, and architecture for electric power grid. IEEE Power Energy Mag. 3, 47-55 (2005)
2. Yue, D., Hang, Q.L.: Delayed feedback control of uncertain systems with time-varying input delay. Automatica 41, 233-240 (2005)
3. Leyva-Ramos, J., Pearson, A.E.: Output feedback stabilizing controller for time-delay systems. Automatica 36, 613-617 (2000)
4. Richard, J.P.: Time-delay systems: an overview of some recent advances and open problems. Automatica 39, 1667-1694 (2003)
5. Astrom, K.J., Wittenmark, B.: Computer Controlled Systems. Prentice-Hall, Englewood Cliffs (1997)
6. Yao, W., Jian, L., Wu, Q.H.: Delay-dependent stability analysis of the power system with a wide-area damping controller embedded. IEEE Trans. Power Syst. 26, 233-240 (2011)
7. Madsen, J.M., Shieh, L.S., Guo, S.M.: State-space digital PID controller design for multivariable analog systems with multiple time delays. Asian J. Contr. 81, 161-173 (2006)
8. Guo, S.M., Wang, W., Shieh, L.S.: Discretisation of two degree-of-freedom controller and system with state, input and output delays. IEE Proc. Control Theory Appl. 147, 87-96 (2000)
9. Shieh, L.S., Wang, W.M., Tsai, J.S.H.: Digital redesign of H∞ controller via bilinear approximation method for state-delayed systems. Int. J. Contr. 70, 665-683 (1998)
10. Wang, W.M., Guo, S.M., Shieh, L.S.: Discretization of cascaded continuous-time controllers for state and input delayed systems. Int. J. Syst. Sci. 31, 287-296 (2000)
11. Chang, Y.P., Shieh, L.S., Liu, C.R., Cofie, P.: Digital modeling and PID controller design for MIMO analog systems with multiple delays in states, inputs and outputs. Circuits Syst. Signal Process. 28, 111-145 (2009)
12. Moore, B.C.: Principal component analysis in linear systems: controllability, observability and model reduction. IEEE Trans. Automat. Contr. 26, 17-32 (1981)
13. Juang, J.N.: Applied System Identification. Prentice Hall, Englewood Cliffs (1994)
14. Shieh, L.S., Tsay, Y.T.: Transformations of a class of multivariable control systems to block companion forms. IEEE Trans. Automat. Contr. 27, 199-202 (1982)
Control System Design Using Improved Newton-Raphson Method and Optimal Linear Model of Nonlinear Equations

Jong-Jin Park1, Gyoo-Seok Choi1,*, and In-Kyu Park2

1 Dept. of Internet and Computer Science, Chungwoon University, San 29, Namjang-ri, Hongseong, Chungnam, 350-701, South Korea
{jjpark,lionel}@chungwoon.ac.kr
2 Dept. of Computer Science, Joongbu University, 101 Daehak-Ro, Chubu-Myeon, Kumsan-Gun, Chungnam, 312-702, South Korea
[email protected]
Abstract. Model reference techniques are successfully used in many control system designs, particularly in the field of model reference adaptive control systems. In this paper, for a linear single-input, single-output time-invariant system, a method is presented for modeling the transfer function of a reference model from basic performance specifications. A set of nonlinear equations is constructed from the definitions of the performance specifications and the unknown coefficients of a transfer function. An improved Newton-Raphson method and an optimal linear model are applied to solve the nonlinear equations. The proposed method constructs approximate representations of the desired transfer functions to ensure more rapid convergence of the numerical method. First, the improved Newton-Raphson method is developed. Second, an optimal linearization technique is explained, to obtain an optimal linear model of the nonlinear equations. Finally, a simulation with four specifications is carried out to obtain a second-order transfer function model and demonstrate our method.

Keywords: improved Newton method, optimal linear model, control system design, transfer function.
1 Introduction
In control system design, particularly in the field of model reference techniques such as model reference adaptive control systems, it is very important to obtain a reference model that has the same order as the system to be controlled. The industrial specifications used to obtain a reference model for control system design are steady-state characteristics, such as steady-state errors, velocity errors and acceleration errors, and transient response characteristics, such as damping ratio, resonant value and overshoot. Bandwidth, settling time and natural angular frequency are used to express the speed of the transient response. To design controllers and filters, and to predict time
Corresponding author.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 253–261, 2011. © Springer-Verlag Berlin Heidelberg 2011
response characteristics, field technicians are interested in the dominant poles. Several methods have been studied to obtain approximate transfer functions of a reference model from these industrial specifications [1, 2]. These methods construct nonlinear equations from the industrial specifications for the desired reference model and solve them using the Newton-Raphson method. The Newton-Raphson method becomes more sensitive to initial values and takes longer to find a solution when the equations to be solved are of high order. In this paper, a numerical analysis method is presented to obtain, more accurately and more quickly, a transfer function of a reference model that satisfies given industrial specifications. First, we propose an improved Newton-Raphson method to find solutions of the nonlinear equations constructed from certain specifications. Second, a method to obtain an optimal linear model of the nonlinear equations for the improved Newton-Raphson method is described. Third, we perform a simulation using the proposed method to find a second-order transfer function model with four specifications, and we demonstrate the merit of the method through the results.
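The traditional Newton-Raphson iteration that the following sections build on can be sketched as follows, using a forward-difference Jacobian; the two-equation test system at the bottom is hypothetical, not one of the paper's specification equations.

```python
import numpy as np

def numerical_jacobian(F, x, eps=1e-7):
    """Forward-difference Jacobian, J_ij = dF_i/dx_j."""
    fx = F(x)
    J = np.zeros((len(fx), len(x)))
    for j in range(len(x)):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (F(x + step) - fx) / eps
    return J

def newton_raphson(F, x0, tol=1e-10, max_iter=100):
    """Solve F(x) = 0 by x_{k+1} = x_k - J(x_k)^{-1} F(x_k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = F(x)
        if np.linalg.norm(fx) < tol:    # stopping condition on the residual
            break
        x = x - np.linalg.solve(numerical_jacobian(F, x), fx)
    return x

# Hypothetical system: x^2 + y^2 = 4, x - y = 0  ->  x = y = sqrt(2).
F = lambda v: np.array([v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]])
root = newton_raphson(F, [1.0, 0.5])
```

The sensitivity to initial values mentioned above shows up directly here: a poor `x0` can send the iteration toward a different root or a singular Jacobian, which is what the improved method of Section 2.2 addresses.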
2
Improved Newton-Raphson Method
2.1
Newton-Raphson Method
Newton-Raphson method is a traditional numerical analysis method by which we can find solutions of multi-dimensional nonlinear equations. A set of nonlinear equations is constructed from the definitions of the performance specifications, the dominant frequency-response data and the unknown coefficients of a transfer function [2, 5]. Let the nonlinear equations be

f_i(x_1, x_2, ..., x_n) = 0,  i = 1, 2, ..., n.  (1)

Then the traditional Newton-Raphson formula is given as follows:

X^{k+1} = X^k − J^{-1}(X^k) F(X^k),  k = 0, 1, ...,  (2)

where F(X) = [f_1(X), ..., f_n(X)]^T and J(X) = ∂F/∂X is the Jacobian matrix. We can find solutions which satisfy equation (2) using one of the two conditions in (3):

|x_i^{k+1} − x_i^k| < ε  or  |f_i(X^k)| < ε,  i = 1, ..., n.  (3)
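The iteration in (2) together with the stopping test (3) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the two-equation system below is hypothetical, and numpy is assumed to be available.

```python
import numpy as np

def newton_raphson(F, J, x0, eps=1e-10, max_iter=50):
    """Traditional Newton-Raphson: X_{k+1} = X_k - J(X_k)^{-1} F(X_k), Eq. (2),
    stopped when |f_i(X_k)| < eps for all i (one of the conditions in (3))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = F(x)
        if np.max(np.abs(fx)) < eps:
            break
        # Solve J(x) dx = F(x) rather than forming the inverse explicitly.
        x = x - np.linalg.solve(J(x), fx)
    return x

# Hypothetical example: f1 = x^2 + y^2 - 4 = 0, f2 = x*y - 1 = 0
F = lambda v: np.array([v[0]**2 + v[1]**2 - 4.0, v[0]*v[1] - 1.0])
J = lambda v: np.array([[2*v[0], 2*v[1]], [v[1], v[0]]])
sol = newton_raphson(F, J, [2.0, 0.5])
```

As the paper notes, convergence of this iteration depends strongly on the initial guess `x0` when the equations are of high order, which motivates the improved method below.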
2.2
Improved Newton-Raphson Method
To construct the improved Newton-Raphson formula, consider the following approximation of F(X) at an arbitrary operating point X_0, given in (4):
Control System Design Using Improved Newton-Raphson Method
F(X) ≈ F(X_0) + J(X_0)(X − X_0).  (4)

(4) can be extended to (5) using the Taylor series of the nonlinear equations:

F(X) = F(X_0) + J(X_0)(X − X_0) + (1/2!)(X − X_0)^T H(X_0)(X − X_0) + ···,  (5)

where H(X_0) collects the second-order derivative terms. To obtain an optimal linear model, set (5) as (6):

F(X) ≈ [J(X_0) + ΔJ(X_0)](X − X_0) + F(X_0),  (6)

where J(X) is the Jacobian matrix and ΔJ(X) is a correction matrix of J(X), chosen so that the resulting linear model minimizes the approximation error near the operating point; when X → X_0, ΔJ(X_0) → 0, and we obtain an optimal linear model at X_0. To obtain the solution of F(X) = 0, if ΔJ is ignored, setting (6) to zero gives (7), which is the same formula as (2):

X^{k+1} = X^k − J^{-1}(X^k) F(X^k).  (7)

If ΔJ is not ignored, we obtain the improved Newton-Raphson formula as follows:

X^{k+1} = X^k − [J(X^k) + ΔJ(X^k)]^{-1} F(X^k),  k = 0, 1, 2, ...  (8)

If ΔJ(X^k) = 0, then (8) becomes the same formula as the traditional Newton-Raphson method.

3
Optimal Linearization
To obtain the solution via the improved Newton-Raphson formula (8), a linear model of the nonlinear equations is needed. In many cases linearization of nonlinear equations through the Jacobian matrix is used. It is easy to use and practical for analyzing
local dynamical behaviors. From this point of view an optimal local linear model is very important for finding the solution of nonlinear equations. Consider a nonlinear model as follows:

ẋ = f(x) + G(x)u,  (9)

where f : R^n → R^n is nonlinear, x ∈ R^n is the state vector and u ∈ R^m is the control input. At a certain operating point, the optimal local linear model of the nonlinear model is

ẋ = Ax + Bu,  (10)

where A and B are constant matrices of appropriate order. Taylor expansion is usually used for this purpose. But a Taylor expansion which ignores some terms becomes an affine model rather than a linear model. Even if the operating point is an equilibrium of the system, linearization by Taylor series generally cannot construct a local linear model in terms of x and u unless the operating point (x_0, u_0) is the equilibrium. When x_0 ∈ R^n and u_0 ∈ R^m, in the case of (11) the resultant linear model is (12):

f(x_0) + G(x_0)u_0 = 0,  (11)

ẋ ≈ f(x_0) + (∂f/∂x)|_{x_0} (x − x_0) + G(x_0)u.  (12)

(12) can be described as follows:

ẋ ≈ Ax + Bu + c.  (13)

(13) is evidently an affine model rather than a linear model because of the constant term c. In order to solve the problem, suppose the operating state x_0 is not an equilibrium of (9). The goal is to obtain a local linear model in both x and u which can approximate the dynamic operation of (9) near the operating state x_0. This involves finding constant matrices A and B which satisfy the following equations near x_0. For arbitrary u the following equations are established:

Ax + Bu ≈ f(x) + G(x)u,  (14)

Ax_0 + Bu = f(x_0) + G(x_0)u.  (15)

The control input u can be designed arbitrarily, so we have

B = G(x_0).  (16)

Therefore (14) and (15) become

Ax ≈ f(x),  (17)

Ax_0 = f(x_0).  (18)

Assume a_i^T is the i-th row of matrix A. Thus we have

a_i^T x ≈ f_i(x),  i = 1, 2, ..., n,  (19)

a_i^T x_0 = f_i(x_0),  i = 1, 2, ..., n,  (20)

where f_i : R^n → R is the i-th element of f. When we expand f_i in (19) about x_0 and ignore terms of second order and higher, we have

f_i(x) ≈ f_i(x_0) + ∇f_i^T(x_0)(x − x_0),  (21)

where ∇f_i : R^n → R^n is the gradient column vector of f_i calculated at x_0. Using (21) we can rewrite (19) as follows:

a_i^T x ≈ f_i(x_0) + ∇f_i^T(x_0)(x − x_0),  (22)

where x is arbitrary but needs to be near x_0 for a better approximation. Consider the constrained minimization problem (23) in order that the constant vector a_i satisfies (22) almost exactly together with the constraint (20):

min E(a_i) = (1/2) ‖∇f_i(x_0) − a_i‖_2²,  (23)
constraint: a_i^T x_0 = f_i(x_0).

This constitutes a convex optimization problem with constraints. Thus the necessary condition for minimization of E is also sufficient. That is,

∇_{a_i} [E(a_i) + λ(a_i^T x_0 − f_i(x_0))] = 0,  (24)

a_i^T x_0 = f_i(x_0),  (25)

where λ is the Lagrange multiplier and ∇_{a_i} indicates the gradient with respect to a_i. From (24) we have

a_i − ∇f_i(x_0) + λ x_0 = 0.  (26)
In the case x_0 ≠ 0, from (26) we have

λ = (∇f_i^T(x_0) x_0 − f_i(x_0)) / (x_0^T x_0).  (27)

By inserting the obtained λ into (26), we have

a_i = ∇f_i(x_0) + ((f_i(x_0) − ∇f_i^T(x_0) x_0) / (x_0^T x_0)) x_0,  where x_0 ≠ 0.  (28)

In the case x_0 = 0, (26) becomes

a_i = ∇f_i(0).  (29)

This is a special case of (28).
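Eq. (28) can be checked numerically: the row a_i it produces stays as close as possible to the gradient while satisfying the constraint (20) exactly. A minimal sketch follows; the function f_i used here is a hypothetical example, not one from the paper, and numpy is assumed.

```python
import numpy as np

def optimal_row(f_i, grad_f_i, x0):
    """Row a_i of the optimal local linear model, Eq. (28):
    a_i = grad_f_i(x0) + ((f_i(x0) - grad_f_i(x0)^T x0) / (x0^T x0)) x0, x0 != 0."""
    g = grad_f_i(x0)
    return g + ((f_i(x0) - g @ x0) / (x0 @ x0)) * x0

# Hypothetical example: f_i(x) = x1^2 + x1*x2, linearized at x0 = (1, 2)
f = lambda x: x[0]**2 + x[0]*x[1]
grad = lambda x: np.array([2*x[0] + x[1], x[0]])
x0 = np.array([1.0, 2.0])
a = optimal_row(f, grad, x0)   # satisfies a @ x0 == f(x0), the constraint (20)
```

The correction term vanishes when the Taylor linearization is already exact at x_0, consistent with ΔJ → 0 in (6).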
4
Simulation and Results
4.1
Second-Order Model
To verify the efficiency of the proposed algorithm, consider a second-order system as an example. Suppose that the following specifications are given:

(1) Type 1 system
(2) K_v = velocity error constant = 20
(3) ζ = damping ratio = 0.7
(4) ω_B = bandwidth = 5

We can construct a second-order transfer function which satisfies the above specifications.

(1) The system type is 1. Thus the closed-loop transfer function is T(s) = G(s)/(1 + G(s)), where G(s) is the open-loop transfer function. If we take the unknown coefficients as

T(s) = (x_3 s + x_2) / (s² + x_1 s + x_2),

then G(s) = T(s)/(1 − T(s)) = (x_3 s + x_2) / (s(s + x_1 − x_3)), which has one pole at the origin as required for a type 1 system.

(2) The definition of K_v is known as

K_v = lim_{s→0} s·G(s) = x_2/(x_1 − x_3) = 20,

so that

x_1 − x_3 − 0.05 x_2 = 0.  (30)

(3) The denominator of the transfer function can be rewritten as

s² + 2ζω_n s + ω_n²,  with 2ζω_n = x_1 and ω_n = √x_2.  (31)

From (31) we have 2ζ√x_2 = x_1; then, with 4ζ² = 1.96,

x_1² − 1.96 x_2 = 0.  (32)

(4) From the definition of bandwidth, we have

|T(jω_B)| = |G(jω_B)| / |1 + G(jω_B)| = 0.707 = 1/√2.  (33)

Approximating this condition by the open-loop gain crossover |G(jω_B)| = 1 at ω_B = 5 gives the nonlinear equations to be solved as follows:

f_1(x_1, x_2, x_3) = x_1 − x_3 − 0.05 x_2 = 0,
f_2(x_1, x_2, x_3) = x_1² − 1.96 x_2 = 0,
f_3(x_1, x_2, x_3) = x_2² + 50 x_1 x_3 − 25 x_1² − 625 = 0.

Therefore the nonlinear equation vector which is to be solved is

F(X) = [x_1 − x_3 − 0.05 x_2,  x_1² − 1.96 x_2,  x_2² + 50 x_1 x_3 − 25 x_1² − 625]^T.  (34)

4.2
Optimal Linear Model
We can formulate the optimal linear model of the nonlinear equation vector F(X) using (6) and (28) as follows:

J(X) = [ 1              −0.05    −1
         2x_1           −1.96     0
         50x_3 − 50x_1   2x_2     50x_1 ],  (35)

ΔJ(X) = [ Δ_11  Δ_12  Δ_13
          Δ_21  Δ_22  Δ_23
          Δ_31  Δ_32  Δ_33 ],  (36)

where, following (28), the i-th row of ΔJ(X) is ((f_i(X) − ∇f_i^T(X) X) / ‖X‖²) X^T, that is,

Δ_1j = 0,
Δ_2j = (−x_1² / ‖X‖²) x_j,
Δ_3j = ((25x_1² − 50x_1x_3 − x_2² − 625) / ‖X‖²) x_j,  j = 1, 2, 3,

with ‖X‖² = x_1² + x_2² + x_3².

4.3
Simulation Results
We performed a simulation to obtain the transfer function of the reference model using the optimal linear model represented by (35) and (36). The initial guess in (8) at k = 0 is X^0 = [1 1 1]^T. We use the following condition of convergence for the solution:

tr(F(X^k)) · F(X^k) < ε,  (37)

where tr(·) denotes the transpose and ε is a small prescribed tolerance. Table 1 shows the simulation results by the traditional Newton-Raphson method and the improved Newton-Raphson method.

Table 1. Results of simulation (E_f = tr(F(X^k)) · F(X^k))

k     Newton-Raphson    Improved Newton-Raphson
(1)   476.1             4349.3
(2)   228               988.8
(3)   59.8              157.5
(4)   14.6              9.1
(5)   3.7               0.038
(6)   0.9               0.009
(7)   0.24              -
(8)   0.06              -

The final solution by the proposed method satisfying (37) is

X = [4.9885  12.6967  4.3537]^T.

Therefore the second-order reference model for the given specifications is

T(s) = (4.3537 s + 12.6967) / (s² + 4.9885 s + 12.6967).  (38)
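As a quick sanity check, the reported solution X = (4.9885, 12.6967, 4.3537) can be substituted back into the assumed type-1 second-order structure T(s) = (x_3 s + x_2)/(s² + x_1 s + x_2); the velocity error constant and damping ratio then come out at their specified values. A minimal sketch (the structure is an assumption inferred from the specifications, not stated code from the paper):

```python
from math import sqrt

# Reported solution X = (x1, x2, x3) for the assumed structure
# T(s) = (x3*s + x2)/(s^2 + x1*s + x2), i.e. G(s) = (x3*s + x2)/(s*(s + x1 - x3)).
x1, x2, x3 = 4.9885, 12.6967, 4.3537

Kv = x2 / (x1 - x3)          # velocity error constant: lim_{s->0} s*G(s)
zeta = x1 / (2 * sqrt(x2))   # damping ratio from s^2 + 2*zeta*wn*s + wn^2
```

Both values match the specified K_v = 20 and ζ = 0.7 to within the rounding of the printed solution.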
5
Conclusions
In this paper we proposed a method for modelling the transfer function of a reference model from specific industrial specifications. The proposed method extends the traditional Newton-Raphson method with an optimal linear model of the nonlinear equations constructed from the specifications. We simulated a second-order system to demonstrate the proposed method. When the specifications of the design goals of a control system are assigned, the proposed method gives the standard transfer function. Simulation results show that the improved Newton-Raphson method with the optimal linear model is more accurate and finds the solution faster than the traditional Newton-Raphson method.
References

1. Huang, C.J., Shieh, L.S.: Modelling large dynamical systems with industrial specifications. Int. J. Systems Sci. 7(3), 241–256 (1976)
2. Shieh, L.S., Datta-Barua, M., Yates, R.E.: A method for modelling transfer functions using dominant frequency-response data and its applications. Int. J. Systems Sci. 10(10), 1097–1114 (1979)
3. Dabney, J.B., Harman, T.L.: Mastering SIMULINK 2. Prentice-Hall, Englewood Cliffs (1998)
4. Teixeira, M.C.M., Zak, S.H.: Stabilizing controller design for uncertain nonlinear systems using fuzzy models. IEEE Trans. on Fuzzy Syst. 7, 133–142 (1999)
5. Rao, S.S.: Optimization: Theory and Applications, pp. 292–300. John Wiley & Sons, New York (1984)
Cost-Effective Multicast Routings in Wireless Mesh Networks* Younho Jung1, Su-il Choi1, Intae Hwang1, Taejin Jung1, Bae Ho Lee1, Kyungran Kang2, and Jaehyung Park1,** 1
School of Electronics and Computer Engineering, Chonnam National University, Gwangju, 500-757 Korea {sichoi,hit,tjjung,bhlee,hyeoung}@chonnam.ac.kr 2 School of Information and Computer Engineering, Ajou University, Suwon, 443-749 Korea
[email protected]
Abstract. In order to reflect multicast routing characteristics in wireless mesh networks, a multicast routing metric is required that quantifies the multicast tree cost under wireless environments. We design a new multicast routing metric called the multicast-tree transmission ratio, which quantifies the multicast tree cost by considering the link quality of wireless multicast channels as well as the wireless multicast advantage. The multicast-tree transmission ratio is the product of the multicast transmission ratios of all nodes in the constructed multicast tree. This paper proposes a wireless multicast routing which constructs the multicast tree by maximizing the multicast-tree transmission ratio in wireless mesh networks, and extends the multicast routing to mesh networks with multiple gateways. The proposed wireless multicast routings show a higher delivery ratio and a lower average delay than the multicast routing minimizing the number of forwarding nodes in its multicast tree. In comparison with other multicast routings, simulation results show that the proposed multicast heuristics maximizing the multicast-tree transmission ratio construct a cost-effective multicast tree in terms of delivery ratio, average delay, and required network resources. Keywords: Routing Metric, Multicast Routing, Wireless Mesh Network, Multicast Transmission Ratio, Wireless Multicast Advantage.
1
Introduction
Wireless mesh networks are an emerging technology for providing adaptive and flexible wireless Internet connectivity to mobile users [1 - 3]. For this reason, many network researchers and commercial developers are taking an intensive interest in mesh networks with multiple gateways for future Internet infrastructure. With the rapid development of communication technologies, multicast communication applications
This work is supported by Korea Research Foundation Grant. (KRF-2009-013-D00077). Corresponding author.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 262–271, 2011. © Springer-Verlag Berlin Heidelberg 2011
are becoming more widely used in applications such as video on demand, IP TV, video conferencing, peer-to-peer communications, etc. [1, 4]. Although multicast communication in wired infrastructures and mobile ad hoc networks has been studied intensively, multicast routings must be adapted to cope with the constraints inherent to wireless mesh networks [5, 6]. Several multicast routings have previously been proposed for constructing the multicast tree between communication members [7 - 10]. The multicast metrics proposed in some multicast routings [7, 8] are based on the wireless path quality from a source to each member given by unicast routing metrics. In other multicast routings [9, 10], the multicast metrics are designed according to the transmission count of multicast traffic on wireless multicast channels in the multicast tree, considering only the wireless multicast advantage. However, these routings have the limitation that all multicast links are assumed to be of equal quality. Thus a multicast routing metric is required that quantifies the multicast tree cost while considering the wireless multicast advantage in wireless mesh networks. In order to quantify a multicast tree cost in mesh networks, we design a multicast routing metric called the multicast-tree transmission ratio, considering the link quality of wireless multicast channels as well as the wireless multicast advantage. The designed multicast-tree transmission ratio is the product of the multicast transmission ratios of all nodes in the constructed multicast tree. We then propose a wireless multicast routing constructing a multicast tree by maximizing the proposed multicast-tree transmission ratio in wireless mesh networks, and extend the multicast routing to wireless mesh networks with multiple gateways.
The proposed multicast routings maximizing the multicast-tree transmission ratio show a higher delivery ratio and a lower average delay than the multicast routing minimizing the number of forwarding nodes in its multicast tree. The proposed multicast routings also require fewer network resources than the multicast routing that constructs the tree by adding paths having the maximum product of transmission ratios from a source to all members. Simulation results show that the proposed multicast routings maximizing the multicast-tree transmission ratio construct a cost-effective multicast tree in comparison with other multicast routings. The rest of the paper is organized as follows. Section 2 designs the multicast-tree transmission ratio as a multicast routing metric considering the link quality of wireless multicast channels and the wireless multicast advantage. Section 3 proposes a wireless multicast routing heuristic maximizing the multicast-tree transmission ratio and extends the multicast routing to wireless mesh networks with multiple gateways. Section 4 evaluates our wireless multicast routing by simulation and Section 5 concludes this paper.
2
Multicast Routing Metric
In this section, we define the multicast-node transmission ratio as a link quality metric of wireless multicast channels leveraging wireless multicast advantage and design the multicast-tree transmission ratio as a multicast routing metric qualifying a routing tree.
2.1
Multicast-Node Transmission Ratio
In wireless mesh networks, wireless nodes handle data packets at the link layer differently for unicast and multicast routing. Nodes performing multicast routing use link-layer broadcasts to leverage the wireless multicast advantage. The wireless multicast advantage enables a node to transfer multicast data to all nodes in its propagation area with a single transmission, enhancing the efficiency of data transfer under wireless environments. Thus, a multicast routing metric distinct from the unicast metric is required in wireless mesh networks.

Definition 1. The transmission ratio, r_{i,j}, of a wireless multicast link is the ratio at which node j successfully receives a multicast packet directly from node i.

The transmission ratio r_{i,j} of the wireless link from node i to node j via one hop represents the link quality metric of wireless multicast channels. We consider the wireless links between wireless nodes as well as the wired links between gateways. Differently from wireless nodes, two gateways can transfer multicast packets through wired links that are more reliable than wireless ones. Therefore, the transmission ratio r_{i,j} of a wired route is assumed to be 1.0, since wired networks provide highly reliable data transfer between two gateways.

Definition 2. The multicast transmission ratio, m_i, of node i's wireless link in a multicast tree is defined as follows:

m_i = min_{∀k} r_{i,k},  (1)
where node k is a child of node i in the multicast tree. The multicast transmission ratio m_i of node i is thus the minimum transmission ratio from node i to all of its child nodes via the multicast links in the multicast tree. This is because node i in the multicast tree should transfer multicast packets until the neighbor node with the minimum transmission ratio can successfully receive them, even though other nodes may already have done so. Hence, the multicast transmission ratio accounts for the wireless multicast advantage. 2.2
Multicast-Tree Transmission Ratio
We design the multicast-tree transmission ratio as a tree cost metric, built from Eq. (1) in Definition 2, for qualifying the multicast tree constructed by multicast routing.

Definition 3. The multicast-tree transmission ratio, R_T, of a multicast tree is defined as follows:

R_T = Π_{∀i} m_i,  (2)

where node i ranges over the multicast tree excepting leaf nodes.
The multicast-tree transmission ratio R_T of the multicast tree is the product of the multicast transmission ratios m_i over all non-leaf nodes i. The leaf nodes have no multicast transmission ratio, because they need not transfer data to any node in the multicast tree. Hence, the multicast-tree transmission ratio reflects the link quality of the wireless multicast channels in the multicast tree.
Fig. 1. An example of multicast tree in wireless mesh networks
Fig. 1 shows an example of the multicast tree depicted with a solid line, where the multicast source is node S, the gateway is node GW, and the multicast members are nodes R1, R2 and R3. This multicast tree has a multicast-tree transmission ratio of 0.4032, where mS = 0.8 = min{0.8, 1.0}, mA = 0.7 = min{0.7, 0.9}, mB = 0.8, and mGW = 0.9.
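Definitions 2 and 3 can be reproduced directly from the edge ratios of Fig. 1. A minimal sketch (the link dictionary below encodes the solid-line tree as described in the text):

```python
from math import prod

# Fig. 1 solid-line tree: parent -> {child: transmission ratio r_{i,child}}
tree = {
    "S":  {"A": 0.8, "GW": 1.0},
    "A":  {"R1": 0.7, "B": 0.9},
    "B":  {"R2": 0.8},
    "GW": {"R3": 0.9},
}

def multicast_node_ratio(children):
    """Eq. (1): m_i = min over all children k of r_{i,k}."""
    return min(children.values())

def multicast_tree_ratio(tree):
    """Eq. (2): R_T = product of m_i over all non-leaf nodes."""
    return prod(multicast_node_ratio(c) for c in tree.values())

# m_S = 0.8, m_A = 0.7, m_B = 0.8, m_GW = 0.9  ->  R_T = 0.4032
```

Leaf nodes (R1, R2, R3) carry no entry in the dictionary, mirroring the definition's exclusion of leaves from the product.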
3
Proposed Multicast Routings
In this section, we propose a wireless multicast routing which constructs the multicast tree by maximizing the multicast-tree transmission ratio, and extend the multicast routing to wireless mesh networks with multiple gateways. 3.1
Multicast Routing Heuristic
Algorithm 1 describes a multicast routing heuristic constructing a multicast tree T = (VT, ET) by maximizing the proposed multicast-tree transmission ratio. The wireless mesh network is denoted by a directed graph G = (V, E), where D is the set of multicast members and s is the source. In the proposed heuristic, multicast members are added to the multicast tree one at a time. Initially, all distinct candidate multicast trees are constructed by adding a distinct path to the already generated tree T. Then, the multicast-tree transmission ratios are computed (see Definition 3) for all distinct candidate trees, respectively. The path that maximizes the multicast-tree transmission ratio RT is chosen. Nodes and links along the chosen path are added to the multicast tree T.
Algorithm 1. Proposed Multicast Routing Heuristic for Maximizing the Multicast-Tree Transmission Ratio

Given G = (V, E), D = {d1, ..., dn}, s, T = (VT, ET)
for i = 1 to n do
    while there exists a distinct path to T do
        Construct a temporary tree Ttemp by adding the path
        Compute the multicast-tree transmission ratio RTtemp for the temporary tree Ttemp
    end while
    Find the path p with maximum RTtemp
    VT ← VT ∪ {k | k ∈ p}
    ET ← ET ∪ {(i, j) | (i, j) ∈ p}
end for

In Fig. 2, the multicast tree depicted by the solid line is constructed using the proposed multicast routing heuristic, where the multicast source is node S and the members are nodes R1 and R2. The multicast-tree transmission ratio of this tree is 0.56, where mS = 1.0, mA = 0.7 = min{0.7, 0.9}, and mC = 0.8.
Fig. 2. An example of proposed multicast routing heuristic, where node R3 wants to join the multicast tree
Thereafter, if node R3 wants to join the multicast communication, three different multicast trees are candidates. The first tree adds the blue dotted path (S→B→R3); this has a multicast-tree transmission ratio of 0.448. The second tree adds the red dotted path (C→R3); this has a multicast-tree transmission ratio of 0.49. The third tree adds the green dotted path (R2→R3); this has a multicast-tree transmission ratio of 0.448. Therefore, the proposed multicast routing heuristic constructs the multicast tree by adding the red dotted path, since this path maximizes the multicast-tree transmission ratio. However, the path maximizing the product of transmission ratios from the multicast source S to node R3 is the blue dotted path, since the product of transmission ratios of the blue dotted path is 0.64, that of the path along S, A, C, and R3 is 0.63, and that of the path along S, A, C, R2, and R3 is 0.576. Therefore, the proposed multicast routing heuristic constructs a multicast tree in a different manner than a multicast routing that adds paths having the maximum product of transmission ratios from the source to all members.
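The selection step of the heuristic can be sketched as follows, with the edge ratios reconstructed from the worked numbers in the text. Per Definition 3 the sketch picks the same C→R3 path chosen in the text, whose resulting tree ratio of 0.49 is the maximum among the candidates.

```python
from math import prod

# Fig. 2 edge transmission ratios (reconstructed from the worked example)
links = {("S","A"): 1.0, ("S","B"): 0.8, ("A","R1"): 0.7, ("A","C"): 0.9,
         ("C","R2"): 0.8, ("C","R3"): 0.7, ("B","R3"): 0.8, ("R2","R3"): 0.8}

def tree_ratio(edges):
    """R_T per Definition 3: product over non-leaf nodes of their minimum child ratio."""
    children = {}
    for (i, j) in edges:
        children.setdefault(i, []).append(links[(i, j)])
    return prod(min(rs) for rs in children.values())

base = [("S","A"), ("A","R1"), ("A","C"), ("C","R2")]      # current tree, R_T = 0.56
candidates = {                                             # distinct paths joining R3
    "S->B->R3": base + [("S","B"), ("B","R3")],
    "C->R3":    base + [("C","R3")],
    "R2->R3":   base + [("R2","R3")],
}
best = max(candidates, key=lambda p: tree_ratio(candidates[p]))   # "C->R3"
```

Note that under a strict reading of Definition 2, adding S→B also lowers m_S, so this sketch scores the S→B→R3 candidate at 0.3584 rather than 0.448; the selected path is the same either way.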
3.2
Extended Multicast Routing
Algorithm 2 describes the extended multicast routing heuristic in wireless mesh networks with multiple gateways. Such a network is denoted by a directed graph G = (V, E), where s is the source, D is a set of multicast members, and Gw is a set of gateways. Initially, a multicast tree T has one node, s, and no link.

Algorithm 2. Extended Multicast Routing in Wireless Mesh Networks with Multiple Gateways

Given G = (V, E), s, D = {d1, ..., dn}, Gw = {g1, ..., gm}
Initially, T = ({s}, {})
while D ≠ {} do
    for i such that di ∈ D do
        for k such that there is a distinct tree WTik do
            /* A tree WTik is generated as adding the member di into T */
            Compute RWTik
        end for
        Find the maximum RWTimax among all RWTik
        for k such that there is a distinct tree GTik do
            /* A tree GTik is generated as adding any gateway gl in Gw into T */
            Compute RGTik
        end for
        Find the maximum RGTimax among all RGTik
        for k such that there is a distinct tree NGik do
            /* A tree NGik is generated including any gateway gl in Gw and the member di */
            Compute RNGik
        end for
        Find the maximum RNGimax among all RNGik
    end for
    Find di with the maximum RWTimax
    Find dj with the maximum RGTjmax · RNGjmax
    if RWTimax ≤ RGTjmax · RNGjmax then
        /* the corresponding trees are WTimax, GTjmax, and NGjmax */
        T ← GTjmax ∪ NGjmax
        D ← D − {dj}
    else
        T ← WTimax
        D ← D − {di}
    end if
end while

At first, all possible trees WTik consisting of only wireless links are generated as if the member di were added into T. The multicast-tree transmission ratio RWTik of the corresponding trees WTik is calculated as defined in Definition 3. Then, the maximum RWTimax and its tree WTimax are stored. Next, all possible trees GTik consisting of only
wireless links are generated as if any gateway gl in the set Gw were added into T. The multicast-tree transmission ratio RGTik of the corresponding trees GTik is calculated, and the maximum RGTimax and its tree GTimax are stored. After that, all possible trees NGik are generated including any gateway gl in Gw and the member di. Similarly, the multicast-tree transmission ratio RNGik of the corresponding trees NGik is calculated, and the maximum RNGimax and its tree NGimax are stored. Then, we consider the generated trees. There are two kinds of trees newly generated by adding di into T. One is the tree which consists of only wireless links without passing any gateway, like WTik. The other is the tree which is constructed by passing gateways through a wired route, like the combination of GTjk and NGjk. Therefore, the multicast-tree transmission ratio RWTimax of the tree without passing any gateway is compared with the product RGTjmax · RNGjmax of the composite tree passing gateways. At last, the tree whose multicast-tree transmission ratio is the greatest is chosen.
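The final comparison in Algorithm 2 reduces to one product and one test. A minimal sketch, where the function and argument names are illustrative rather than taken from the paper:

```python
def choose_tree(rwt_max, rgt_max, rng_max):
    """Algorithm 2's final step: compare the best wireless-only tree ratio with the
    composite gateway tree, whose ratio is the product of its two sub-tree ratios
    (the wired gateway-to-gateway hop contributes ratio 1.0)."""
    composite = rgt_max * rng_max
    if rwt_max <= composite:
        return ("gateway", composite)   # T <- GTmax U NGmax
    return ("wireless", rwt_max)        # T <- WTmax
```

For example, with RWTmax = 0.5, RGTmax = 0.9 and RNGmax = 0.6 the composite tree (ratio 0.54) is chosen; raising RWTmax to 0.6 flips the choice to the wireless-only tree.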
4
Performance Evaluations
To evaluate our wireless multicast routing, our distributed multicast protocol that constructs a multicast tree by maximizing the multicast-tree transmission ratio is implemented in NS2 [11]. In our simulation scenario, the simulation parameters for multi-gateway mesh networks are shown in Table 1.

Table 1. Simulation parameters

MAC Type                 IEEE 802.11
Interface Queue Type     Drop Tail / Priority Queue
Interface Queue Length   50
Antenna Type             Omni directional
Propagation Type         Two Ray Ground
Topology Instance        Flat grid 1500m x 900m
Transmission Range       200m
Traffic                  CBR 0.5, 0.05, 0.005
In the simulations, 50 fixed nodes are located in an area of dimension 1,500m x 900m. One multicast source generates traffic at a CBR (Constant Bit Rate) with a packet size of 256 bytes. The packets are generated every 0.5, 0.05 and 0.005 seconds. From the total nodes, 20 are randomly selected and send a join request packet to the multicast group every second. In the simulations for multiple gateways, 5 nodes are randomly selected to act as gateways. In order to compare with the proposed multicast routing heuristic, which constructs a multicast tree by maximizing the proposed multicast-tree transmission ratio, two other multicast routings are considered. One multicast routing [10] constructs the multicast tree by minimizing the transmission count of multicast traffic, where all
links are assumed to be of equal quality. The other constructs the tree by adding the paths having the maximum product of transmission ratios from a source to all members, as addressed in Section 3.1. 4.1
Delivery Ratio
The delivery ratio is the ratio of the number of data packets successfully received to the number of data packets sent from a source.
Fig. 3. Delivery ratio with respect to the number of multicast members, where (a) shows the delivery ratio in wireless mesh networks without any gateway and (b) in wireless mesh networks with gateways
Fig. 3(a) shows the delivery ratio for a varying number of multicast members in wireless mesh networks without any gateway. The delivery ratio of our multicast routing heuristic is much higher than that of the multicast routing minimizing the forwarding nodes in [10], and is lower than that of the multicast routing adding paths having the maximum product of transmission ratios from the source to all members. In the case of wireless mesh networks with gateways, as shown in Fig. 3(b), the delivery ratio of our extended multicast routing is close to that of the multicast routing adding paths having the maximum product of transmission ratios. The more members the multicast group consists of, the higher the delivery ratio the multicast routing shows. This is because the multicast traffic goes through many reliable gateways. 4.2
Average Delay
In Fig. 4(a) and (b), the y-axis represents the simulation time units elapsed while multicast traffic is transmitted in the multicast tree from the source to the members. As shown in Fig. 4(a), the average delay of the proposed multicast routing heuristic maximizing the proposed multicast-tree transmission ratio is much less than that of the multicast routing minimizing the forwarding nodes. Also, the average delay of our proposed multicast routing heuristic is very close to that of the routing adding paths having the maximum product of transmission ratios. Fig. 4(b) shows that, beyond a certain number of members, the more members the multicast group contains, the lower the delay the multicast routings show. This is because the multicast traffic goes through many reliable gateways, similarly to the results in Fig. 3(b).
Fig. 4. Average delay with respect to the number of multicast members, where (a) shows the average delay in wireless mesh networks without any gateway and (b) in wireless mesh networks with gateways
4.3
Cost of the Multicast Tree
The cost of the multicast tree indicates how many nodes are involved in forwarding the multicast packet from the source to all members.
Fig. 5. Number of nodes in the multicast tree with respect to the number of multicast members, where (a) shows the tree size in wireless mesh networks without any gateway and (b) in wireless mesh networks with gateways
In Fig. 5(a), the proposed routing heuristic maximizing the multicast tree transmission ratio constructs a multicast tree having more forwarding nodes than the multicast routing minimizing the forwarding nodes. However, the number of nodes required by our routing heuristic is much less than that required for the routing adding paths with the maximum product of transmission ratios from the source to all members. As shown in Fig. 5(b), the number of nodes in the multicast tree under wireless mesh networks with multiple gateways shows a similar result to those without any gateways.
5
Conclusions
In this paper, a multicast metric is designed for quantifying a multicast tree cost in wireless mesh networks. The multicast-tree transmission ratio considers the link quality of the wireless multicast channels as well as the wireless multicast advantage. We propose a wireless multicast routing which constructs a multicast tree by maximizing the multicast-tree transmission ratio, and extend the multicast routing to wireless mesh networks with multiple gateways. The proposed wireless multicast routings maximizing the multicast-tree transmission ratio show a higher delivery ratio and a lower average delay in comparison with the multicast routing minimizing the transmission count. In comparison with other multicast routings, simulation results show that the multicast routings maximizing the multicast-tree transmission ratio construct a cost-effective multicast tree in terms of delivery ratio, average delay, and required network resources.
References 1. Akyildiz, I.F., Wang, X., Wang, W.: Wireless Mesh Networks: a Survey. Computer Networks and ISDN Systems 47, 445–487 (2005) 2. Sichitiu, M.L.: Wireless Mesh Networks: Opportunities and Challenges. In: Proc. of the Wireless World Congress (2005) 3. Camp, J., Knightly, E.: The IEEE 802.11s Extended Service Set Mesh Networking Standard. IEEE Communications Magazine 46, 120–126 (2008) 4. Zhang, Y., Luo, J., Hu, H.: Wireless Mesh Networking: Architecture, Protocol, and Standards. Auerbach Publications (2007) 5. Baumann, R., Heimlicher, S., Lenders, V., May, M.: Routing Packets in Wireless Mesh Networks. In: Proc. of IEEE Conference on Wireless and Mobile Computing, Networking and Communications, WiMob (2007) 6. Waharte, S., Boutaba, R., Iraqi, Y., Ishibashi, B.: Routing Protocols in Wireless Mesh Networks: Challenges and Design Considerations. Multimedia Tools and Applications 29, 285–303 (2006) 7. Roy, S., Koutsonikolas, D., Das, S., Hu, Y.C.: High-Throughput Multicast Routing Metrics in Wireless Mesh Networks. In: Proc. of the 26th IEEE Int’l Conference on Distributed Computing Systems, p. 48 (2006) 8. Zhao, X., Chou, C.T., Guo, J., Jha, S.: A Scheme for Probabilistically Reliable Multicast Routing in Wireless Mesh Networks. In: Proc. of IEEE Conference on Local Computer Networks, p. 92 (2007) 9. Ruiz, P.M., Gomez-Skarmeta, A.F.: Approximating Optimal Multicast Trees in Wireless Multihop Networks. In: Proc. of IEEE Symposium on Computers and Communications, pp. 686–691 (2005) 10. Nguyen, U.T.: On Multicast Routing in Wireless Mesh Networks. Computer Communications 31, 1385–1399 (2008) 11. The Network Simulator – NS2
Facial Animation and Analysis Using 2D+3D Facial Motion Tracking Chan-Su Lee1, SeungYong Chun1, and Sang-Heon Lee2 1
Yeungnam University 214-1 Dae-dong, Gyeong-san si, Gyeongsangbook-do, 712-749, Rep. of Korea 2 Daegu Gyeongbuk Institute of Science & Technology
[email protected] 50-1, Sang-ri, Hyeongpung-myeon, Dalseong-gue,Daegu, 711-873, Rep. of Korea
Abstract. This paper presents a facial animation system using real-time tracking of 3D facial motions from a depth camera. We first apply 2D facial motion tracking based on extended Active Shape Models (ASMs) to the 2D texture image corresponding to the captured 3D depth information. Based on the feature points estimated by the extended 2D facial motion tracking, 3D facial motions are estimated. From the estimated 3D facial motion, we extract MPEG-4 facial animation parameters (FAPs), which provides a more accurate estimation of FAPs invariant to view variations. Facial animation can then be achieved by any facial animation tool supporting FAP-based animation. Keywords: Facial animation, 3D facial motion tracking, depth camera, active shape models, MPEG-4, facial animation parameters.
1
Introduction
Recently there has been increased interest in facial expression recognition and tracking for affective computing [8], intelligent human-computer interaction, human-robot interaction, and facial animation [11]. In particular, 3D facial expression recognition [7], tracking and animation are key issues in the computer vision and computer graphics communities. Advances in 3D capturing systems and demand for 3D content such as 3D films and 3D displays promote 3D human motion tracking and modeling. Facial motion tracking systems can be divided into model-based tracking systems and appearance-based tracking systems. Typical appearance-based facial motion tracking systems are based on optical flow and its variations. Model-based facial motion tracking systems can be divided into 2D model-based and 3D model-based ones. A typical 2D shape model-based approach is Active
Yeungnam University, Dept. of Electronics Engineering, LED-IT Fusion Technology Research Center.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 272–279, 2011. c Springer-Verlag Berlin Heidelberg 2011
Shape Models (ASMs) [1] and its variations. 3D shape models can be found in 3D generic-model-based facial motion tracking systems. Here, we present a new method for facial motion tracking that combines 2D facial motion tracking with the corresponding 3D facial motion from a depth camera, for a fast and accurate facial motion tracking system.

Facial animation methods can be divided into blend shape interpolation, parameter-based animation such as the facial action coding system (FACS) and MPEG-4 facial animation, deformation-based approaches, physics-based muscle modeling, 3D face modeling, performance-driven animation, and so on [2]. Here, we present an approach based on MPEG-4 facial animation parameters (FAPs), which were originally introduced for practical facial animation in communication. By applying this approach, we achieve a standardized parameterization of facial motion that is applicable to any facial animation software supporting FAPs. Although there are many research works on accurate estimation of facial expressions and FACS action units from video [9], few works have focused on the estimation of FAPs from video, apart from a few based on active shape models and active appearance models [3]. Commercial 3D depth cameras like the Kinect1 are available, and 3D depth information can be extracted in real time. However, little research has been performed on extracting facial animation parameters from a depth camera. This paper presents accurate real-time tracking of facial expressions from image and depth information by combining 2D and 3D facial motion tracking with extended ASMs. FAPs are extracted from 3D facial feature point tracking using the extended ASMs, and are used to animate facial expressions with commercial tools that support FAP-based facial animation. In addition, we can recognize facial expressions based on FAPs, similar to FACS-based approaches.
Using our extended ASMs, we can speed up the fitting of facial shape models based on 2D active shape models and estimate facial movements more accurately, invariant to view variations, using 3D depth information. In addition, the FAPs used for facial animation are good features for facial expression recognition. Experimental results show that FAP parameters perform better than direct distance measurements in facial expression recognition.
2 3D Facial Motion Tracking Using Extended Active Shape Models
ASMs are a well-known method for facial motion tracking from 2D images. In this section, we briefly review the original ASMs presented by Cootes et al. [1]. Then, we present an extension of the ASMs using clustering of facial motions in different views. Based on the extended active shape models, we achieve 3D facial motion tracking using a 3D depth camera with corresponding 2D images.

1
http://www.xbox.com/
2.1 Active Shape Models
Active shape models search for shape instances that exhibit shape variability under shape-class constraints. The model represents a shape by landmark points x = (x_0, y_0, x_1, y_1, .., x_k, y_k, .., x_{N-1}, y_{N-1}), where N is the number of points used to represent the shape model. After aligning the collected landmark shapes, the statistics of shape variability are captured by estimating a linear subspace P using principal component analysis (PCA). The linear subspace allows synthesis of a new shape as

x = x̄ + P b,   (1)

where x̄ = (1/M) Σ_{i=1}^{M} x_i is the mean of the aligned shapes, P is the linear projection matrix, and b is the control parameter vector of the trained shape model. Shape constraints can be enforced by bounding each control parameter within a fixed ratio of its eigenvalue,

−3 √λ_k ≤ b_k ≤ 3 √λ_k.   (2)

For efficient search of landmark point movements, edge intensity profiles are measured along the normals of the model boundaries. Landmark point movement can be calculated from the trained profiles and the measured ones. Iteratively, the global transformation and local deformations are estimated within the trained shape subspace.
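Equations (1)-(2) amount to a small PCA model over flattened landmark vectors. The sketch below is an illustrative reconstruction (not the authors' implementation), using toy data and illustrative function names:

```python
import numpy as np

def train_shape_model(shapes):
    """Learn the mean shape and PCA subspace from aligned landmark shapes.

    shapes: (M, 2N) array, each row a flattened landmark vector.
    Returns (x_bar, P, eigenvalues), the quantities of Eq. (1).
    """
    x_bar = shapes.mean(axis=0)
    centered = shapes - x_bar
    cov = centered.T @ centered / len(shapes)
    eigvals, eigvecs = np.linalg.eigh(cov)     # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # largest modes first
    return x_bar, eigvecs[:, order], eigvals[order]

def synthesize(x_bar, P, eigvals, b, n_modes=2):
    """Eq. (1): x = x_bar + P b, with b clipped to +/-3*sqrt(lambda_k), Eq. (2)."""
    b = np.clip(b, -3.0 * np.sqrt(eigvals[:n_modes]),
                    3.0 * np.sqrt(eigvals[:n_modes]))
    return x_bar + P[:, :n_modes] @ b

# toy data: 4 aligned "shapes" of 3 landmarks each (x0, y0, .., x2, y2)
shapes = np.array([[0, 0.0, 1, 0.0, 2, 0.0],
                   [0, 0.0, 1, 0.1, 2, 0.0],
                   [0, 0.1, 1, 0.0, 2, 0.0],
                   [0, 0.0, 1, -0.1, 2, 0.1]])
x_bar, P, lam = train_shape_model(shapes)
new_shape = synthesize(x_bar, P, lam, b=np.array([0.5, -0.2]))
print(new_shape.shape)  # (6,)
```

In an ASM fitting loop, the landmark positions suggested by the profile search would be projected into this subspace and clipped the same way before the next iteration.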
2.2 Extending ASMs Using Clustering of Facial Motions
We extend the ASMs to large head motions using clustering of facial shapes according to head pose. Facial shapes are collected not only for the neutral head pose but also for up, down, left, and right head poses. From the collected data, we apply a clustering algorithm to segment subgroups of facial shapes. In each subgroup, we learn a subspace for the ASM model. Total facial tracking is then achieved by transitions between these subgroups. Figure 1 shows examples of tracking results.
2.3 3D ASMs Using Depth Camera
Depth information is very useful for facial expression recognition [7,10] and facial animation parameter estimation. Simultaneous capture of 3D information with the corresponding 2D texture is possible using a commercial depth camera. By applying 2D tracking on the 2D texture images, we can estimate not only the 2D motion of the shape feature points but also the 3D motion, from the corresponding 3D depth information. We used a Microsoft Kinect camera to acquire the 3D depth information as well as the 2D texture images.
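With a registered depth map, the 2D-to-3D step reduces to a per-landmark depth lookup plus pinhole back-projection. A minimal sketch, with illustrative camera intrinsics (not values from the paper):

```python
import numpy as np

def landmarks_2d_to_3d(landmarks_px, depth_map, fx, fy, cx, cy):
    """Back-project 2D tracked landmarks to 3D camera coordinates using a
    registered depth map (pinhole model; intrinsics fx, fy, cx, cy are
    assumed known from camera calibration)."""
    pts3d = []
    for (u, v) in landmarks_px:
        z = depth_map[int(v), int(u)]   # depth in metres at the pixel
        x = (u - cx) * z / fx           # pinhole back-projection
        y = (v - cy) * z / fy
        pts3d.append((x, y, z))
    return np.array(pts3d)

# toy example: a flat depth plane 1 m in front of the camera
depth = np.ones((480, 640))
pts = landmarks_2d_to_3d([(320, 240), (330, 250)], depth,
                         fx=525.0, fy=525.0, cx=320.0, cy=240.0)
print(pts[0])  # landmark at the principal point, 1 m away -> (0.0, 0.0, 1.0)
```

Real depth maps have holes and noise at depth discontinuities, so a practical version would validate or median-filter the sampled depth before back-projecting.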
Fig. 1. Large head motion tracking using extended ASMs

Fig. 2. Extended ASMs tracking with depth camera: (a) depth camera 2D tracking; (b) depth image visualization
3 Facial Animation Parameter Estimation from 2D and 3D Facial Motion Tracking
The MPEG-4 facial animation parameters (FAPs) are a standardized facial control parameterization. Many automatic estimation methods for facial motion parameters have been presented based on 3D facial motion estimation [5]. Usually, 3D facial models are used for the estimation of 3D motion from 2D video sequences. For the specification of facial motion independent of individual variations in facial geometry, MPEG-4 specifies a face model in its neutral state, on which facial animation parameter units (FAPUs) are defined, as shown in Figure 3(a) [4]. MPEG-4 specifies 84 feature points (FPs) for evaluating facial animation parameters. FAPs specify 68 facial motion parameters relative to a neutral face; some feature points are described by up/down motion, others by right/left motion, and others by forward/backward motion. Figure 3(b) shows the FPs on a neutral face. Each motion is described in predefined parameter units. From our extended 2D ASMs and the corresponding 3D depth information, we can extract
Fig. 3. Facial animation parameter units and feature points [4]: (a) FAP units; (b) FAP feature points
Fig. 4. Facial animation using commercial facial animation tools from FAPs generated from 2D+3D Tracking
proper FAPs for facial animation. From the extracted FAPs, we can generate facial animation using commercial software supporting FAPs. We used Visage Technologies AB's Visage Interactive™. Figure 4 shows an example of facial animation using FAPs generated from 2D and 3D ASM tracking.
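An FAP value is essentially a feature-point displacement from the neutral face expressed in FAPU-normalized units. A hedged sketch of that normalization (the point coordinates and FAPU value below are made up for illustration):

```python
def fap_value(current, neutral, fapu, axis=1):
    """Express a feature-point displacement from the neutral face in FAPU
    units, as MPEG-4 FAPs do. `fapu` is the relevant facial animation
    parameter unit (e.g. a fraction of the eye or mouth separation measured
    on the neutral face); axis selects the motion direction (0: x, 1: y)."""
    return (current[axis] - neutral[axis]) / fapu

# hypothetical example: a lip-corner point moved 6 px upward, FAPU = 10 px
neutral_pt = (100.0, 200.0)
tracked_pt = (100.0, 194.0)
print(fap_value(tracked_pt, neutral_pt, fapu=10.0))  # -0.6
```

Normalizing by a FAPU measured on each subject's neutral face is what makes the parameters comparable across faces with different geometry.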
4 Experimental Results
In the experiments, we tested facial expression recognition performance with and without FAPs and alignment. All experiments were performed under the Windows 7 operating system on an Intel i7 processor.
4.1 Performance Comparison of 2D and 3D FAPs in Facial Expression Recognition
We can extract 3D feature point locations from the extended 2D ASM tracking, since the depth camera gives a direct correspondence between the 3D data and its 2D texture. Initially we have the tracking result on the 2D texture, and from this corresponding result we can estimate the 3D tracking. In each case, we convert the tracked feature points into FAPs. We then tested the performance of facial expression recognition in 2D and 3D. Several 3D facial motion databases are available, such as [6,12]. Using a public database, we tested facial expression recognition with and without vertical alignment, and with direct distance representation versus FAP representation of facial motions. For this experiment, we used the Bosphorus 3D Face Database [6], which provides 2D landmark points, 3D landmark points, 2D images, and 3D depth information for 104 subjects. Some subjects provide all six basic facial expressions ('anger', 'disgust', 'fear', 'happiness', 'sadness', 'surprise'), while others provide only a subset of them. We selected 41 subjects who have all six basic facial expressions and a neutral face with the same number of landmark points. To measure movement from the neutral face, we aligned each face image to a neutral face in two steps. First, we performed horizontal alignment of the face: as each eye center can be computed from the landmark points, we compute the rotation angle that makes the line connecting the two eye locations parallel to the horizontal axis. After that, we compute the center of the two eyes, which is used to align the global translation of each subject. We tested performance with a leave-one-subject-out protocol: we selected one subject for testing, removed that subject's data from the database, and evaluated expressions against the remaining 40 subjects, estimating distances between the given test landmark features and all other landmark points.
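The two-step alignment described above (rotate so the eye line is horizontal, then center on the eye midpoint) can be sketched as follows; this is an illustrative reconstruction, not the authors' code:

```python
import numpy as np

def align_to_eyes(points, left_eye, right_eye):
    """Two-step alignment applied before measuring facial motion:
    (1) rotate so the eye line becomes horizontal,
    (2) translate so the midpoint between the eyes is the origin."""
    dx, dy = np.subtract(right_eye, left_eye)
    angle = -np.arctan2(dy, dx)                 # undo the eye-line tilt
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    center = (np.asarray(left_eye) + np.asarray(right_eye)) / 2.0
    return (np.asarray(points, float) - center) @ R.T

# toy face tilted by 45 degrees: the eyes end up on the horizontal axis
pts = align_to_eyes([(0, 0), (2, 2)], left_eye=(0, 0), right_eye=(2, 2))
print(np.round(pts, 6))
```

After this normalization, displacements of corresponding landmarks can be compared across subjects without the head tilt or global position contaminating the measurement.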
To see the effect of the different representations of facial motion, we classified facial expressions by the nearest sample facial expression class. Without alignment, we achieved 46% recognition performance, as Table 1 shows. The overall performance is relatively low; however, it could be improved with better classifiers. With vertical and center alignment, the facial expression recognition performance improved to 54%; details can be found in Table 2. Finally, using FAPs with the same landmark points and the same alignment, we achieved 57% overall recognition performance, which is better than the purely alignment-based results. Table 3 shows the details based on FAPs.
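The leave-one-subject-out, nearest-sample classification protocol can be sketched as below (toy feature vectors, not the Bosphorus data):

```python
import numpy as np

def loso_nearest_neighbour(features, labels, subjects):
    """Leave-one-subject-out nearest-neighbour expression classification:
    each sample is matched only against samples from *other* subjects and
    takes the label of the closest one (Euclidean distance)."""
    features = np.asarray(features, float)
    labels = np.asarray(labels)
    subjects = np.asarray(subjects)
    correct = 0
    for i in range(len(features)):
        mask = subjects != subjects[i]        # exclude the test subject
        d = np.linalg.norm(features[mask] - features[i], axis=1)
        pred = labels[mask][np.argmin(d)]
        correct += int(pred == labels[i])
    return correct / len(features)

# toy data: two subjects, two expressions with similar feature vectors
feats = [[0, 0], [10, 10], [0.5, 0.2], [9.8, 10.1]]
labels = ['neutral', 'happy', 'neutral', 'happy']
subjects = [1, 1, 2, 2]
print(loso_nearest_neighbour(feats, labels, subjects))  # 1.0
```

The same loop works whether `features` holds aligned landmark distances or FAP vectors, which is exactly the comparison made in Tables 1-3.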
Table 1. Confusion matrix: direct pairwise distance based facial expression recognition (without alignment)

            Anger  Disgust  Fear  Happiness  Sadness  Surprise
Anger         18      8      2        1        10        2
Disgust        9     19      1        7         4        1
Fear           2      5      8        0         4       22
Happiness      2      3      1       34         1        0
Sadness       11      3      7        0        16        4
Surprise       0      0     18        2         2       19
Table 2. Confusion matrix: direct pairwise distance based facial expression recognition (with alignment)

            Anger  Disgust  Fear  Happiness  Sadness  Surprise
Anger         26      5      2        0         8        0
Disgust        7     11      2        9        11        1
Fear           2      2     13        0         4       20
Happiness      0      3      0       38         0        0
Sadness        7      3      5        0        22        4
Surprise       1      1     13        0         2       24
Table 3. Confusion matrix: FAP parameter-based facial expression recognition

            Anger  Disgust  Fear  Happiness  Sadness  Surprise
Anger         15     12      3        0        11        0
Disgust        9     19      2        3         7        1
Fear           2      0     16        0         6       17
Happiness      0      4      0       37         0        0
Sadness        6      3      4        0        26        2
Surprise       1      1     11        0         2       26

4.2 Tracking Accuracy of 2D and 3D ASMs
We compared the tracking accuracy of the 2D ASMs and the 3D ASMs. As we did not have ground truth 3D position values for the test data, we simply compared the accuracy of the 2D alignment against 2D ground truth, where the ground truth values were manually marked. Using the combination of 2D and 3D ASMs, we achieved more accurate tracking of facial motions.
5 Conclusions and Future Work
In this paper, we presented 3D facial motion tracking based on extended 2D ASMs. Using this 2D+3D model, we can extract FAPs and apply them for facial animation using commercial software. In addition, our experiments show that FAPs can be used as a facial motion parameterization for facial expression recognition. We are currently collecting our own database for more accurate estimation of facial motion based on 2D+3D tracking and for further investigation of the benefits of 3D facial motion tracking.

Acknowledgements. This work was supported by the DGIST R&D Program of the Ministry of Education, Science and Technology of Korea (11-IT-03).
References

1. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active shape models: their training and application. Computer Vision and Image Understanding 61(1), 38–59 (1995)
2. Deng, Z., Neumann, U. (eds.): Data-Driven 3D Facial Animation. Springer, London (2008)
3. Ofli, F., Erzin, E., Yemez, Y., Tekalp, A.M.: Estimation and analysis of facial animation parameter patterns, vol. IV, pp. 293–296 (2007)
4. Pandzic, I.S., Forchheimer, R. (eds.): MPEG-4 Facial Animation: The Standard, Implementation and Applications. John Wiley & Sons (2002)
5. Sarris, N., Grammalidis, N., Strintzis, M.G.: FAP extraction using three-dimensional motion estimation. IEEE Transactions on Circuits and Systems for Video Technology 12(10) (2002)
6. Savran, A., Alyüz, N., Dibeklioğlu, H., Çeliktutan, O., Gökberk, B., Sankur, B., Akarun, L.: Bosphorus Database for 3D Face Analysis. In: Schouten, B., Juul, N.C., Drygajlo, A., Tistarelli, M. (eds.) BIOID 2008. LNCS, vol. 5372, pp. 47–56. Springer, Heidelberg (2008), http://dx.doi.org/10.1007/978-3-540-89991-4_6
7. Savran, A., Sankur, B., Bilge, T.: Comparative evaluation of 3D vs. 2D modality for automatic detection of facial action units. Pattern Recognition 45, 767–782 (2012)
8. Tao, J., Tan, T.: Affective Computing: A Review. In: Tao, J., Tan, T., Picard, R.W. (eds.) ACII 2005. LNCS, vol. 3784, pp. 981–995. Springer, Heidelberg (2005)
9. Tong, Y., Liao, W., Ji, Q.: Facial action unit recognition by exploiting their dynamic and semantic relationships. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)
10. Tsalakanidou, F., Malassiotis, S.: Real-time 2D+3D facial action and expression recognition. Pattern Recognition 43(5), 1763–1775 (2010)
11. Vlasic, D., Brand, M., Pfister, H., Popović, J.: Face transfer with multilinear models. ACM Trans. Graph. 24, 426–433 (2005), http://doi.acm.org/10.1145/1073204.1073209
12. Yin, L., Wei, X., Sun, Y., Wang, J., Rosato, M.J.: A 3D facial expression database for facial behavior research. In: Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR 2006), pp. 211–216. IEEE Computer Society, Washington, DC (2006), http://dx.doi.org/10.1109/FGR.2006.6
A Method to Improve Reliability of Spectrum Sensing over Rayleigh Fading Channel Truc Thanh Tran* and Hyung Yun Kong Wireless Communication Lab University of Ulsan, S.Korea {trantruc,hkong}@mail.ulsan.ac.kr
Abstract. This paper evaluates the performance of two methods of spectrum sensing: a linear combining method and a selection method based on the maximum SNR of the sensing channel. We then propose a rule for global detection for the purpose of combating the hidden terminal problem in spectrum sensing. Our analysis considers the situation in which the sensing channels experience non-identically distributed, independent (n.i.d.) Rayleigh fading. The average global detection probabilities of these methods are derived and compared. In the scope of this paper, the reporting channels are assumed to be AWGN channels with invariant and identical gain during the system's operation. Keywords: maximum ratio combining, cognitive radio, Rayleigh fading, relay selection, cooperative spectrum sensing.
1 Introduction
Given the scarcity of free frequency bands, new wireless communication technology has to be significantly more spectrum-efficient than existing licensed systems. Recently, Cognitive Radio has emerged as a key technology able to satisfy strict requirements on spectrum efficiency [1]. Such a system operates as a secondary user system to enable coexistence with licensed, or primary, users. Thus, a sensing technique is required to allow a secondary user (SU) to be aware of the primary user (PU) and avoid harmful interference. Due to multipath fading, vicinity, and shadowing, the detection operation is not always reliable [2]. Cooperative sensing models have therefore been proposed as a method to reduce errors in making the final decision [3]. Detection is normally implemented in two successive stages: sensing and reporting. In the sensing stage, every SU carries out spectrum sensing individually. Then, in the reporting stage, the local sensing observations are reported to a common receiver (or fusion center), which globally decides whether the PU is present or absent. Cooperative spectrum sensing can be based on several techniques, such as summation of all local observations [4], cluster division [5], and relay selection [6]. In [5], the reporting channel experiences Rayleigh fading, and the most favorable user, with the largest reporting channel gain, is selected to collect the results from the other users. The

* Corresponding author.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 280–289, 2011. © Springer-Verlag Berlin Heidelberg 2011
selected user has two functions: collecting the local observation signals, and then making a binary decision to forward to the common receiver. [6] investigated relay selection in an environment with i.i.d. Rayleigh fading. In [4], an optimization problem for high-SNR fusion is solved to optimize detection performance. Relying on the models introduced in [4] and [6], this paper considers two cooperative sensing techniques, linear combining of local soft decisions and secondary user selection based on the maximum sensing channel SNR, in an environment of non-identically distributed, independent (n.i.d.) Rayleigh fading. Hence, the probability density functions (pdfs) of the linear combining signal and of the best-SU selection must take the fading into account. Furthermore, we propose a method able to prevent incorrect decisions due to the hidden terminal problem. The remainder of the paper consists of: Section 2, the system model; Section 3, the performance analysis; numerical results in Section 4; and finally the conclusion.
2 System Model
The system was described in [4] (we name it Method I in this paper), in which a linear combining technique is applied (Fig. 1) under the condition of high fusion SNR [4]. In the scope of this paper, the sensing channel is a slow Rayleigh fading channel with n.i.d. distribution. From [4], the PU signal s(k) is complex PSK modulated, independent and identically distributed (i.i.d.) with zero mean and variance σ_s²; h_si is the sensing channel gain, which is assumed to be constant during each cooperative spectrum sensing period and experiences slow flat Rayleigh fading; the noise in the sensing channel is i.i.d. and follows a circular, symmetric, complex Gaussian random variable with zero mean and variance σ_ns². The local sensing SNR is γ_si = σ_s² |h_si|² / σ_ns².

Assuming that the number of samples K taken for local energy detection is large enough (K ≫ 1), the ratio of the reporting channel AWGN noise variance σ_nr² to the square of the sensing noise variance σ_ns⁴ can be neglected in comparison with the reporting channel gain g_i, and the SNR of the received PU signal at each SU is much smaller than unity (γ_si ≪ 1) [4]. In the scope of this paper, the channel gain g_i is invariant and identical for all SU reporting channels (g_i = g, ∀i). Because the paper considers slow flat fading, the sensing channel gains do not change during each period of energy detection. Thus, the combining signal and its probability of detection can be obtained from [4], as shown in (1) and (2):

z_I = Σ_{i=1}^{N} ω_i y_i ~ { N( Σ_{i=1}^{N} ω_i g σ_ns², (g²σ_ns⁴/K) Σ_{i=1}^{N} ω_i² + σ_nr² ) : H_0
                            { N( Σ_{i=1}^{N} ω_i g (γ_si + 1) σ_ns², (g²σ_ns⁴/K) Σ_{i=1}^{N} (2γ_si + 1) ω_i² + σ_nr² ) : H_1   (1)
Because we assume a high fusion SNR [4], hence:

P_d^I = Q( Q⁻¹(α) − (Σ_{i=1}^{N} ω_i g γ_si) / √( g² Σ_{i=1}^{N} ω_i² / K ) ) = Q( Q⁻¹(α) − (Σ_{i=1}^{N} ω_i γ_si) / √( Σ_{i=1}^{N} ω_i² / K ) )   (2)

with detection threshold

λ_I = Σ_{i=1}^{N} ω_i g σ_ns² + Q⁻¹(α) √( (g²σ_ns⁴/K) Σ_{i=1}^{N} (2γ_si + 1) ω_i² + σ_nr² )   (3)
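Equation (2) is straightforward to evaluate numerically. The sketch below uses only the Python standard library; the SNR values and K are illustrative, loosely based on the paper's SNR1 set:

```python
import math
from statistics import NormalDist

_nd = NormalDist()
Q = lambda x: 1.0 - _nd.cdf(x)           # Gaussian tail function
Qinv = lambda p: _nd.inv_cdf(1.0 - p)    # its inverse

def pd_linear(alpha, weights, gammas, K):
    """Eq. (2): detection probability of the linear soft-combining fusion
    (Method I) under the high fusion-SNR approximation."""
    num = sum(w * g for w, g in zip(weights, gammas))
    den = math.sqrt(sum(w * w for w in weights) / K)
    return Q(Qinv(alpha) - num / den)

# example: 4 SUs, equal weights 1/2 (so ||w|| = 1), K = 1000 samples
gammas = [10 ** (-s / 10) for s in (10, 13, 14, 16)]   # dB -> linear scale
pd = pd_linear(alpha=0.1, weights=[0.5] * 4, gammas=gammas, K=1000)
print(round(pd, 3))
```

With all sensing SNRs set to zero the argument of Q collapses to Q⁻¹(α), so the detection probability degenerates to α, which is a quick sanity check on the formula.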
Fig. 1. System model: the PU is sensed by SU1, .., SUN over n.i.d. Rayleigh fading sensing channels; the SUs report to the fusion center over AWGN reporting channels
where K denotes the number of samples taken by the local energy detector, ω = [ω_1, ω_2, .., ω_N]^T is the linear weight vector with ‖ω‖ = 1 and ω_i ≥ 0, g = g_1 = g_2 = .. = g_N is the reporting channel gain, γ = [γ_s1, γ_s2, .., γ_sN]^T is the vector of sensing channel SNRs, y = [y_1, y_2, .., y_N]^T is the vector of reporting signals received at the fusion center (described in detail in [4]), α is the probability of false alarm, and the one-dimensional Q-function is Q(x) = (1/√(2π)) ∫_x^{+∞} exp(−t²/2) dt. For tractability, we denote the statistics of each received signal y_i at the fusion center as μ_{y_i,0} = g σ_ns² and σ²_{y_i,0} = g²σ_ns⁴/K + σ_nr² under H_0, and μ_{y_i,1} = g (γ_si + 1) σ_ns² and σ²_{y_i,1} = (1 + 2γ_si) g²σ_ns⁴/K + σ_nr² under H_1.
Unlike Method I, Method II employs best-SU selection based on the maximum sensing channel SNR. Among users with different Rayleigh fading parameters, the fusion center chooses the SU with the best sensing channel SNR to dispatch its soft decision, rather than combining observations from all SUs. Hence, the probability of detection in this case is the special case of (1) with N = 1 and ω_i = 1. The selected SU, denoted î, is

î = arg max_{i=1,..,N} γ_si   (4)

and the statistics of the received signal in this case are

z_II = y_î ~ { N( g σ_ns², g²σ_ns⁴/K + σ_nr² ) : H_0
             { N( g (γ_sî + 1) σ_ns², (2γ_sî + 1) g²σ_ns⁴/K + σ_nr² ) : H_1   (5)

The probability of detection for Method II (maximum-SNR SU selection) is a special case of (2); hence we achieve

P_d^II = Q( Q⁻¹(α) − γ_sî √K )   (6)

with detection threshold

λ_II = g σ_ns² + Q⁻¹(α) √( (2γ_sî + 1) g²σ_ns⁴/K + σ_nr² )   (7)
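Equation (6) can be evaluated the same way; the snippet below is an illustrative sketch with made-up instantaneous SNRs:

```python
import math
from statistics import NormalDist

_nd = NormalDist()

def pd_selection(alpha, gamma_best, K):
    """Eq. (6): detection probability of Method II, which forwards only the
    soft decision of the SU with the largest sensing-channel SNR."""
    return 1.0 - _nd.cdf(_nd.inv_cdf(1.0 - alpha) - gamma_best * math.sqrt(K))

gammas = [0.10, 0.05, 0.04, 0.025]       # instantaneous sensing SNRs (linear)
p_sel = pd_selection(alpha=0.1, gamma_best=max(gammas), K=1000)
print(round(p_sel, 3))
```

Because only the best SNR enters (6), P_d^II is monotone in γ_sî, while Method I trades the single best channel against the diversity of the whole sum in (2).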
2.1 Proposed Detection Rule
In this section, we propose a decision rule, illustrated in Fig. 2. Define the signal

w = z_I − z_II = (ω_î − 1) y_î + Σ_{i=1, i≠î}^{N} ω_i y_i

Its distribution follows as

w ~ { N( [ (ω_î − 1) + Σ_{i=1, i≠î}^{N} ω_i ] g σ_ns², (g²σ_ns⁴/K)(1 + Σ_{i=1}^{N} ω_i²) + 2σ_nr² ) : H_0
    { N( [ (ω_î − 1)(γ_sî + 1) + Σ_{i=1, i≠î}^{N} ω_i (γ_si + 1) ] g σ_ns², (g²σ_ns⁴/K)( Σ_{i=1}^{N} (2γ_si + 1) ω_i² + (2γ_sî + 1) ) + 2σ_nr² ) : H_1   (8)

Fig. 2. The proposed decision rule
The regions of the detection rule are described in Table 1. The signal w measures the distance between the combining signal z_I and the selection signal z_II. Whenever z_I is below its threshold λ_I, the fusion center checks the difference between the two kinds of received signals. If the difference is smaller than a given ε_P, a decision of '0', representing the absence of the PU, is made. In a case affected by the hidden terminal phenomenon, the PU is still active while the final decision is '0'; the fusion center has then made a miss detection. Otherwise, if no PU is actually active (H_0), this decision correctly declares the channel available for the SUs to use.

Table 1. Proposed rule

Region \ Hypothesis        H1                  H0
w ≤ ε_P, z_I ≥ λ_I         Detection           False Alarm
w > ε_P, z_I ≥ λ_I         Detection           False Alarm
w > ε_P, z_I < λ_I         Abandoned region    Abandoned region
w ≤ ε_P, z_I < λ_I         Miss Detection      PU Absence
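One reading of the rule in Table 1 can be sketched as a small decision function; the use of |z_I − z_II| for the ε_P test and the explicit "abandoned" outcome are interpretive assumptions, not the authors' code:

```python
def fusion_decision(z_i, z_ii, lam_i, eps_p):
    """Sketch of the proposed fusion rule: decide '1' when the combined
    statistic exceeds its threshold; otherwise decide '0' only when the
    combining and selection statistics agree to within eps_p, and abandon
    the decision (e.g. trigger re-sensing) when they disagree, since the
    disagreement hints at a hidden-terminal situation."""
    if z_i >= lam_i:
        return 1                      # PU declared present
    if abs(z_i - z_ii) <= eps_p:
        return 0                      # PU declared absent
    return None                       # abandoned region

print(fusion_decision(1.2, 1.1, lam_i=1.0, eps_p=0.3))  # 1
print(fusion_decision(0.8, 0.9, lam_i=1.0, eps_p=0.3))  # 0
print(fusion_decision(0.4, 0.9, lam_i=1.0, eps_p=0.3))  # None
```

Because the '1' decision depends only on z_I ≥ λ_I, the detection probability of this rule matches Method I, which is the property exploited in the analysis below.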
We obtain the joint pdf of the pair of random variables (z_I, w) as

(z_I, w) ~ { N( μ_{z_I,0}, μ_{w,0}; σ_{z_I,0}, σ_{w,0}; r_0 ) : H_0
           { N( μ_{z_I,1}, μ_{w,1}; σ_{z_I,1}, σ_{w,1}; r_1 ) : H_1   (9)

Denote ω̃ = [ω_1, ω_2, .., (ω_î − 1), .., ω_N]^T. This allows us to write z_I = ω^T y and w = ω̃^T y. The correlation of the two variables (z_I, w) can then be written as r = ω^T R_y ω̃ − E{z_I} E{w}, where R_y = E{y y^T} = R_{y,0} under H_0 and R_y = E{y y^T} = R_{y,1} under H_1. The correlation matrix of y is R_{y,0} = g²σ_ns⁴ [1]_{N×N} under H_0, and under H_1 it is given as

R_{y,1} = [ μ_{y_1,1}²             μ_{y_1,1} μ_{y_2,1}    ..   μ_{y_1,1} μ_{y_N,1}
            μ_{y_2,1} μ_{y_1,1}    μ_{y_2,1}²             ..   μ_{y_2,1} μ_{y_N,1}
            ...                    ...                    ...  ...
            μ_{y_N,1} μ_{y_1,1}    μ_{y_N,1} μ_{y_2,1}    ..   μ_{y_N,1}² ]_{N×N}

so that

r = { ω^T R_y ω̃ − g²σ_ns⁴ ( Σ_{i=1}^{N} ω_i ) ( (ω_î − 1) + Σ_{i=1, i≠î}^{N} ω_i ) = 0 : H_0
    { ω^T R_y ω̃ − g²σ_ns⁴ ( Σ_{i=1}^{N} ω_i (γ_si + 1) ) ( (ω_î − 1)(γ_sî + 1) + Σ_{i=1, i≠î}^{N} ω_i (γ_si + 1) ) = 0 : H_1   (10)

Thus (z_I, w) are independent. We denote β
as the correct detection probability of the PU absence state; it is calculated as

β = P_w(w ≤ ε_P) P_{z_I}(z_I < λ_I)  ⟹  P_w(w ≤ ε_P) = β / (1 − α)   (11)

Therefore, we can find the unique pair of values β_1 < β_2 for which β_1 + β_2 = β/(1 − α); we can write the expression for ε_P in a similar way to (3). Finding β_1, β_2 is outside the scope of this paper. The value of ε_P can be identified from ε_P = μ_{w,0} + Q⁻¹(β_2) √(σ²_{w,0}) and −ε_P = μ_{w,0} + Q⁻¹(β_1) √(σ²_{w,0}) under H_0. We define P_{m,w} = P_w(w ≤ ε_P). Following the same calculation used to derive (2), described concretely in [4], we can induce this probability as

P_{m,w} = Q( [Q⁻¹(β_1) √(σ²_{w,0}) + (μ_{w,0} − μ_{w,1})] / √(σ²_{w,1}) ) − Q( [Q⁻¹(β_2) √(σ²_{w,0}) + (μ_{w,0} − μ_{w,1})] / √(σ²_{w,1}) )
       ≈ Q( Q⁻¹(β_1) − (Σ_{i=1}^{N} ω_i γ_si − γ_sî) / √( (Σ_{i=1}^{N} ω_i² + 1)/K ) ) − Q( Q⁻¹(β_2) − (Σ_{i=1}^{N} ω_i γ_si − γ_sî) / √( (Σ_{i=1}^{N} ω_i² + 1)/K ) )   (12)

This allows us to obtain the miss detection probability as

H_1:  P_{z_I,w}( w ≤ ε_P, z_I < λ_I; μ_{z_I,1}, μ_{w,1}; σ_{z_I,1}, σ_{w,1}; r_1 ) = P_w(w ≤ ε_P) P_{z_I}(z_I < λ_I)   (13)

2.2 Average Probability of Detection

For calculating the average probability of detection from (2), we need to derive the pdf of the factor γ_{z_I} = Σ_{i=1}^{N} ω_i γ_si. Some results for specific values of N are given below for the condition of i.i.d. distribution of γ:
f_{γ_{z_I}}(γ) = [ exp(−γ/(ω_1 γ̄)) − exp(−γ/(ω_2 γ̄)) ] / ( (ω_1 − ω_2) γ̄ )   when N = 2

f_{γ_{z_I}}(γ) = ω_1 exp(−γ/(ω_1 γ̄)) / ( (ω_1 − ω_2)(ω_1 − ω_3) γ̄ ) + ω_2 exp(−γ/(ω_2 γ̄)) / ( (ω_2 − ω_1)(ω_2 − ω_3) γ̄ ) + ω_3 exp(−γ/(ω_3 γ̄)) / ( (ω_3 − ω_1)(ω_3 − ω_2) γ̄ )   when N = 3   (14)

For the special case ω_1 = .. = ω_N = ω for all i = 1, .., N and N ≥ 2:

f_{γ_{z_I}}(γ) = γ^{N−1} exp(−γ/(ω γ̄)) / ( (N − 1)! (ω γ̄)^N )   (15)

In Method II, the pdf of the selected SNR γ_sî defined in (4) can be derived as shown in (16):

f_{γ_sî}(γ_sî) = ∂/∂γ_sî ∏_{i=1}^{N} F_{γ_si}(γ_sî) = Σ_{i=1}^{N} f_{γ_si}(γ_sî) ∏_{j=1, j≠i}^{N} F_{γ_sj}(γ_sî)   (16)

where f_{γ_si}(γ_si) = (1/γ̄_si) exp(−γ_si/γ̄_si) is the pdf of the sensing channel SNR at the i-th SU and F_{γ_sj} is the corresponding CDF.
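The max-SNR pdf (16) follows from order statistics; for exponentially distributed (Rayleigh-faded) SNRs the corresponding CDF has a closed form that is easy to check by simulation. A sketch with illustrative n.i.d. mean SNRs:

```python
import math
import random

random.seed(0)

def cdf_max_exponential(x, means):
    """CDF of the largest of independent exponential sensing SNRs,
    F(x) = prod_i (1 - exp(-x / mean_i)); its derivative is Eq. (16)."""
    p = 1.0
    for m in means:
        p *= 1.0 - math.exp(-x / m)
    return p

means = [0.10, 0.05, 0.04]                       # illustrative average SNRs
samples = [max(random.expovariate(1.0 / m) for m in means)
           for _ in range(200_000)]
x = 0.08
empirical = sum(s <= x for s in samples) / len(samples)
print(round(empirical, 3), round(cdf_max_exponential(x, means), 3))
```

The empirical and analytic values agree to within Monte Carlo noise, which validates using (16) inside the averaging integrals of the next section.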
3 Analytical Results

In Method I, the average probability of detection of the combining technique is evaluated as

P̄_d^I = ∫_0^{+∞} Q( Q⁻¹(α) − γ_{z_I} / √( (Σ_{i=1}^{N} ω_i²)/K ) ) f_{γ_{z_I}}(γ_{z_I}) dγ_{z_I}   (17)

In Method II, the performance is evaluated by P̄_d^II = ∫_0^{+∞} Q( Q⁻¹(α) − γ_sî √K ) f_{γ_sî}(γ_sî) dγ_sî.
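The fading average (17) can also be approximated by Monte Carlo sampling instead of integrating the closed-form pdfs; the sketch below assumes the equal-weight choice ω_i = 1/√N (so that ‖ω‖ = 1) and uses illustrative parameters:

```python
import math
import random
from statistics import NormalDist

random.seed(1)
_nd = NormalDist()

def avg_pd_combining(alpha, mean_snr, N, K, trials=40_000):
    """Monte Carlo estimate of Eq. (17): Eq. (2) averaged over Rayleigh
    fading, i.e. over i.i.d. exponential sensing SNRs, under the
    equal-weight assumption w_i = 1/sqrt(N)."""
    w = 1.0 / math.sqrt(N)
    qinv_a = _nd.inv_cdf(1.0 - alpha)
    total = 0.0
    for _ in range(trials):
        gz = w * sum(random.expovariate(1.0 / mean_snr) for _ in range(N))
        total += 1.0 - _nd.cdf(qinv_a - gz * math.sqrt(K))  # Eq. (2), ||w|| = 1
    return total / trials

# averaging over fading: more cooperating SUs raise the mean detection rate
print(round(avg_pd_combining(alpha=0.1, mean_snr=0.063, N=4, K=500), 3))
```

This numeric route is convenient for the n.i.d. case too, where the closed-form pdf (14) requires all weights to be distinct.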
From Table 1, the detection probability of the proposed scheme does not depend on ε_P. Therefore, the proposed method has the same detection probability as the described linear combining method (Method I). However, its miss detection probability differs from Method I and is described as follows.
P_m^{III} = ∫_0^{+∞} ∫_0^{+∞} (1 − P_d^I) P_{m,w} f_{γ_{z_I}}(γ_{z_I}) f_{γ_sî}(γ_sî) dγ_{z_I} dγ_sî   (18)
The analytical results demonstrate the performance of the three methods for the n.i.d. environment (Table 2), the i.i.d. fading environment with different numbers of users (Table 3), and the i.i.d. environment with the same number of users (N = 8) but different SNRs ([-12, -14, -15] dB).

Table 2. Analysis parameters for N = 4 users, n.i.d. fading environment

Number of SUs:   N = 4
Average SNR:     SNR1 = [-10, -13, -14, -16] dB;  SNR2 = [-10.5, -11, -10, -12] dB
w_i:             1/2

Table 3. Analysis parameters for an average SNR of -10 dB, i.i.d. fading environment

Number of SUs (N):   [4, 8, 12]
Average SNR:         -10 dB
w_i:                 1/√N
Fig. 3a shows the overall detection performance when the number of SUs is kept constant at 4 under non-identically distributed, independent fading. From this plot we can see that with SNR set 2 the selection method (Method II) is better than the other two methods (linear combining and the proposed method), while it is worse than them when evaluated with SNR set 1. We also note that the detection probability of the proposed scheme is the same as that of the combining method. Fig. 3b shows the performance when the fusion center misses PU detection: there is little difference between the proposed scheme and the others in terms of preventing miss detection under the n.i.d. fading environment with SNR set 1. This probably depends on the SNR set chosen for the survey. Fig. 4 and Fig. 5 investigate the i.i.d. cases. With the same number of users (N = 8) and various average SNR values of Rayleigh fading, Fig. 4a shows that the detection performance of Method I and the proposed method is better than that of the selection method; in addition, when the sensing channel SNRs are better, the detection performance is enhanced. Meanwhile, Fig. 4b shows a reduction in the miss detection probability when using the proposed method compared with using the combining method only. Fig. 5 surveys the case where the number of SUs changes under the same average SNR: as the number of SUs increases, the miss detection probability decreases, and the proposed scheme eliminates miss detection better than the linear combining scheme.
Fig. 3. (a) Average probability of detection vs. false alarm probability for N = 4, n.i.d. fading with the two SNR sets described in Table 2. (b) Miss detection vs. false alarm probability.

Fig. 4. (a) Probability of detection vs. false alarm probability for N = 8, i.i.d. Rayleigh fading with SNR = [-12, -14, -15] dB. (b) Miss detection vs. false alarm probability.

Fig. 5. Probability of miss detection vs. false alarm probability as the number of users changes, i.i.d. Rayleigh fading with SNR = -12 dB, β = 0.8.
4 Conclusion

The paper has considered a cooperative scheme in which linear combining and best-SU selection are used for global detection under an n.i.d. fading environment. The paper then proposed a detection rule to eliminate miss detection. The proposed scheme can reduce miss detection while keeping the same reliability in determining the PU's absence state (within a given β) under i.i.d. Rayleigh fading. For detection, the linear combining method and the proposed scheme showed better performance than the selection method when operating in the i.i.d. environment. Under n.i.d. Rayleigh fading, depending on the SNR variances of fading within the group of SUs, the linear combining technique and the proposed scheme can be better or worse than the selection method (Method II). In addition, under n.i.d. Rayleigh fading, the pdf of the combined SNR random variable has also been derived to calculate the average probability of detection. The performance improves when the sensing channel SNRs become better or when the number of SUs joining the sensing operation increases.

Acknowledgments. This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (No. 2010-0004865).
References
1. Ghasemi, A., Sousa, E.S.: Spectrum sensing in cognitive radio networks: requirements, challenges and design trade-offs. IEEE Communications Magazine 46, 32–39 (2008)
2. Mitola, J.: An Integrated Agent Architecture for Software Defined Radio. KTH (2000)
3. Ekram, H., Vijay, B.: Cognitive Wireless Communication Networks. Springer Science+Business Media (2007)
4. Matsui, M., Shiba, H., Akabane, K., Uehara, K.: A Novel Cooperative Sensing Technique for Cognitive Radio. In: IEEE 18th International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC 2007, pp. 1–5 (2007)
5. Xiao, Z., Jin-long, W., Qi-hui, W.: A Sensor Set Selecting Algorithm Based on Confidence Detection in Cognitive Radio. In: International Conference on Signal Acquisition and Processing, ICSAP 2010, pp. 40–44 (2010)
6. Digham, F.F., Alouini, M.S., Simon, M.K.: On the Energy Detection of Unknown Signals Over Fading Channels. IEEE Transactions on Communications 55, 21–24 (2007)
Development of Multi-functional Laser Pointer Mouse through Image Processing

Jin Shin¹, Sungmin Kim², and Sooyeong Yi¹,*

¹ Seoul National University of Science and Technology, Seoul, Korea
[email protected], [email protected]
² University of California, San Diego, USA
[email protected]
Abstract. A beam projector is popularly used for presentations nowadays, and a laser pointer is used with it to point out a local area of the projected image. A simple wireless presenter offers only limited functions of a computer mouse (pointing device), such as "go to next slide" or "back to previous slide" in a specific application, i.e., MS-PowerPoint, over a wireless channel; thus it is inconvenient to carry out other controls, e.g., execution or termination of an application and maximization or minimization of a window, during the presentation. The main objective of this paper is to implement a wireless multi-functional laser pointer mouse that has the same functions as a computer mouse. In order to obtain the position of the laser spot in the projector display, image processing to detect the laser spot in the camera image is required. In addition, transformation of the spot position into computer display coordinates is needed to execute computer controls on the computer display. Keywords: vision sensor, presentation, laser pointer, beam projector, wireless presenter.
1 Introduction
For delivering visual information to an audience, a beam projector is generally used in a presentation. In order to point out some area in the projected image on a screen, a computer mouse or a laser pointer is employed. As a simple pointing device, the laser pointer illuminates a red or green laser spot that is distinguishable from the displayed image on the projector screen. Besides the pointing function, the computer mouse provides computer control functions, i.e., execution or termination of an application and maximization or minimization of a window, by clicking a button. A commercially available, so-called wireless presenter has the simple pointing function of a laser pointer and a few computer control functions of a computer mouse. The wireless presenter transmits button signals to a main computer through a wireless channel. However, the present wireless presenter has limited function for computer
Corresponding author.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 290–298, 2011. © Springer-Verlag Berlin Heidelberg 2011
control in a specific application, i.e., "go to next/previous slide" in MS-PowerPoint. Thus, with the wireless presenter, it is impossible to carry out most of the computer control functions that can be done by the computer mouse. In order to solve this problem, this paper aims to develop a multi-functional laser pointer mouse by combining the laser pointer and the computer mouse. To implement the computer control functions, it is necessary to detect the position of a pointer on the computer display. In the case of the computer mouse, it is possible to detect the pointer position by mouse wheel rotation or optical reflection on a surface. Similarly, to carry out the computer control functions by the laser pointer on a beam projector screen, it is necessary to (1) detect the laser spot on the screen by camera image processing, (2) transform the position of the laser spot in the camera image coordinates into the computer display coordinates, and (3) generate and transmit the same button signals as the computer mouse through a wireless channel to a main computer. This paper is organized as follows: implementation of the proposed multi-functional laser pointer mouse is addressed in Sec. 2, Sec. 3 describes some experimental results, and concluding remarks are presented in Sec. 4.
2 Multi-functional Laser Pointer Mouse

2.1 Structure of Laser Pointer Mouse
The structure of the laser pointer mouse proposed in this paper is shown in Fig. 1. For a presentation, in general, a main computer, a beam projector, and a screen are needed. In addition, a camera is connected to the computer and located so that an image of the full projected screen can be acquired. The acquired image is sent to the computer, and the laser pointer spot is detected in the image. Since most of the computer control functions can be carried out by clicking buttons, the laser pointer mouse should have buttons and a wireless module to transmit the button signals to the main computer.
Fig. 1. System configuration of multi-functional laser pointer mouse
2.2 Extraction of Computer Display Region from Camera Image
The acquired camera image may contain the background screen and other surroundings as well as the projected computer display region, as shown in Fig. 2. It is necessary to extract the computer display region from the camera image in order to detect the laser spot and transform its location into the computer display coordinates.
Fig. 2. Computer display region in camera image
By detecting the four corners of the computer display region, it is possible to extract just that region from the whole camera image; note that the projected computer display region in the camera image is generally brighter than the background region. Thus, as shown in Fig. 3, the four corners of the region can easily be detected from the total sums of pixel intensities in the 5 × 5 sub-areas A, B, C, and D around a point (x, y) in the image as follows [1]:

If Total(D) > Total(A), Total(B), Total(C), then (x, y) is the left-top corner.
If Total(C) > Total(A), Total(B), Total(D), then (x, y) is the right-top corner.
If Total(B) > Total(A), Total(C), Total(D), then (x, y) is the left-bottom corner.
If Total(A) > Total(B), Total(C), Total(D), then (x, y) is the right-bottom corner.    (1)

where Total(·) denotes the total sum of pixel intensities in a sub-area. Fig. 3 illustrates the corner detection algorithm, and Fig. 4 shows the detected four corners of the computer display region in the camera image. In Fig. 4, the coordinate values (x_i, y_i), i = 1, …, 4 of the corners are represented in the camera image coordinates.
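As an illustration, rule (1) can be sketched in code. The assignment of A, B, C, and D to the top-left, top-right, bottom-left, and bottom-right 5 × 5 neighborhoods of (x, y) is our assumption, inferred from the rule itself (the paper defines the sub-areas only through Fig. 3); this is a minimal NumPy sketch, not the authors' implementation:

```python
import numpy as np

def classify_corner(img, x, y):
    """Apply rule (1): compare the total intensities of the four 5x5
    sub-areas around (x, y). A/B/C/D are assumed to be the top-left,
    top-right, bottom-left, and bottom-right neighborhoods."""
    A = img[y-5:y, x-5:x].sum()   # top-left sub-area
    B = img[y-5:y, x:x+5].sum()   # top-right sub-area
    C = img[y:y+5, x-5:x].sum()   # bottom-left sub-area
    D = img[y:y+5, x:x+5].sum()   # bottom-right sub-area
    if D > max(A, B, C):
        return "left-top"         # bright region lies to the bottom-right
    if C > max(A, B, D):
        return "right-top"
    if B > max(A, C, D):
        return "left-bottom"
    if A > max(B, C, D):
        return "right-bottom"
    return None                   # not a corner of the bright region

# Synthetic 100x100 image: dark background with a bright
# "projected display" region, for illustration only.
img = np.zeros((100, 100))
img[20:80, 30:90] = 200.0
print(classify_corner(img, 30, 20))   # → left-top
```

In practice one would evaluate the rule only at candidate points (e.g., along strong intensity edges) rather than at every pixel.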
Fig. 3. Corner detection for computer display region
Fig. 4. Computer display region extracted by corner detection
2.3 Detection of Laser Spot
In order to implement the computer control functions of the laser pointer mouse, the laser spot should first be detected in the camera image. Since a distinguishable laser light source is adopted in a presentation, it is easy to detect the laser spot location in the camera image by searching for the pixel with maximum intensity, as described in Fig. 5 [2]. If the contrast level of the camera image is set too high, it might be impossible to detect the laser spot pixel due to intensity saturation. As preprocessing for the camera image, an automatic algorithm for adjusting the contrast level is developed in this paper.
Fig. 5. Laser spot detection by searching a maximum intensity pixel
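The maximum-intensity search of Sec. 2.3 amounts to a single argmax over the image. The following is an illustrative NumPy sketch (not the authors' code), assuming the contrast level has already been adjusted so the spot is not lost in a saturated region:

```python
import numpy as np

def detect_laser_spot(gray):
    """Locate the laser spot as the brightest pixel in a grayscale
    image, as in Fig. 5; returns (x, y) in image coordinates."""
    flat = np.argmax(gray)                     # flat index of the maximum
    y, x = np.unravel_index(flat, gray.shape)  # back to row/column
    return int(x), int(y)

# Simulated 640x480 camera frame with one bright spot.
gray = np.zeros((480, 640))
gray[240, 320] = 255.0
print(detect_laser_spot(gray))   # → (320, 240)
```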
2.4 Transformation of Laser Spot Location into Computer Display Coordinates
Once the laser spot has been detected, its position in the camera image coordinates should be transformed into the computer display coordinates, so as to synchronize the mouse cursor to the laser spot and make the computer controls accessible to the laser pointer mouse. It should be noted that the extracted computer display region in the camera image may be tilted and deformed according to the position and orientation of the camera setup. With the a priori known original dimension of the computer display, e.g., 1024 × 768, the coordinate transformation is possible from the relationship between the original dimension and the extracted dimension of the computer display region in Fig. 4. The well-known warping transformation defines the relationship between the original coordinate value P′ of the computer display and the detected coordinate value P of the corresponding region in the camera image as follows [3, 5]:
P′ = M ⋅ P    (1)
Once the transformation matrix M is determined, it can be used to transform the coordinate value of the laser spot detected in the camera image into the original computer display coordinates. The computer display and the camera image are both 2-dimensional, so the transformation between the two can be written as follows:
\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}
=
\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\qquad (2)
Here [x′ y′ 1]^T and [x y 1]^T represent the homogeneous coordinates of the computer display and the camera image, respectively. Expanding (2) with respect to x′ and y′ gives the following set of equations:
x′ = ax + by + c − gxx′ − hx′y
y′ = dx + ey + f − gxy′ − hyy′    (3)
From the known resolution of the computer display, the coordinate value of each of the four corners of the computer display is given, corresponding to the four detected corners in the camera image shown in Fig. 4. For example, if the resolution of the computer display is 1024 × 768, then (x₃′, y₃′) in the computer display coordinates corresponding to the right-top corner (x₃, y₃) in Fig. 4 is (1024, 768). Rewriting (3) with respect to the unknown variables a–h using these four coordinate-transformation pairs gives the following:
\begin{bmatrix} x_1' \\ y_1' \\ x_2' \\ y_2' \\ x_3' \\ y_3' \\ x_4' \\ y_4' \end{bmatrix}
=
\begin{bmatrix}
x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1 x_1' & -y_1 x_1' \\
0 & 0 & 0 & x_1 & y_1 & 1 & -x_1 y_1' & -y_1 y_1' \\
x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2 x_2' & -y_2 x_2' \\
0 & 0 & 0 & x_2 & y_2 & 1 & -x_2 y_2' & -y_2 y_2' \\
x_3 & y_3 & 1 & 0 & 0 & 0 & -x_3 x_3' & -y_3 x_3' \\
0 & 0 & 0 & x_3 & y_3 & 1 & -x_3 y_3' & -y_3 y_3' \\
x_4 & y_4 & 1 & 0 & 0 & 0 & -x_4 x_4' & -y_4 x_4' \\
0 & 0 & 0 & x_4 & y_4 & 1 & -x_4 y_4' & -y_4 y_4'
\end{bmatrix}
\begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix}
\qquad (4)
Fig. 6. Transformed image into the computer display coordinates
From this matrix equation, the unknown variables a–h can be solved, and thus the transformation matrix in (2) can be determined. Fig. 6 shows the image of the computer display region obtained by applying (2) to Fig. 4. Since it is the image that has been transformed to the computer display resolution, by applying the same transformation to the laser spot position detected in the camera image, it is possible to
get the corresponding coordinate value in the original computer display coordinate system. Therefore, using this coordinate value, it is possible to synchronize the cursor of the computer mouse with the laser spot, and the computer controls available on the computer display can be executed.
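As a concrete sketch of Eqs. (2)–(4), the eight unknowns a–h can be obtained from the four corner correspondences with any linear solver and then applied to a detected spot. The corner values below are hypothetical (a camera viewing the display head-on), and this is an illustrative sketch, not the authors' implementation:

```python
import numpy as np

def solve_warp(cam_pts, disp_pts):
    """Build and solve the 8x8 system of Eq. (4) from four corner
    correspondences (camera image -> computer display)."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(cam_pts, disp_pts):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    return np.linalg.solve(np.array(A, float), np.array(b, float))

def warp_point(params, x, y):
    """Apply the transformation of Eqs. (2)-(3) to one point."""
    a, b_, c, d, e, f, g, h = params
    w = g * x + h * y + 1.0   # homogeneous scale factor
    return float((a * x + b_ * y + c) / w), float((d * x + e * y + f) / w)

# Hypothetical correspondences: a 640x480 camera image of a 1024x768
# display viewed head-on (no tilt), for illustration only.
cam = [(0, 0), (640, 0), (0, 480), (640, 480)]
disp = [(0, 0), (1024, 0), (0, 768), (1024, 768)]
p = solve_warp(cam, disp)
print(warp_point(p, 320, 240))   # the image center maps to the display center
```

With a tilted camera the same solve recovers the perspective terms g and h, so the mapping remains valid for oblique setups.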
3 Results of Experiments

3.1 Presentation Environment
Fig. 7 shows the presentation environment and the camera used in this experiment. The camera has a resolution of 640 × 480 and an acquisition rate of 26 image frames per second, and sends the obtained image to the main computer through USB. In this experiment a separate USB camera was used; however, the camera built into a laptop can be used as well. A general beam projector and a screen for a presentation were employed in this experiment.
Fig. 7. Experimental environment
3.2 Mouse Buttons and Wireless Module
The multi-functional laser pointer mouse in this paper needs a set of buttons to execute computer controls and a wireless module to transfer the button signals. The buttons correspond to the left and right buttons of the computer mouse. A wireless module to receive the signals from the laser pointer mouse must, of course, be equipped on the main computer. In this paper, a commercially available wireless presenter is adopted for the laser spot generation and the wireless module [4]. As described before, the commercial wireless presenter has only a simple page-flipping function with two buttons in a specific application, i.e., MS-PowerPoint. Simply by installing the image processing program developed in this paper, it is possible to take advantage of the multi-functional laser pointer mouse with the commercial wireless presenter. Fig. 8 shows the button signal interface between the commercial wireless presenter and the main computer.
Fig. 8. Signal interface for commercial wireless presenter
3.3 Result of Experiment
Fig. 9 shows the experimental results. In Fig. 9 (a) it can be seen that the mouse cursor and the laser spot coincide with each other. This demonstrates that the whole chain of extracting the computer display region from the camera image, detecting the laser spot in the camera image, transforming the laser spot position into the computer display coordinate system, and shifting the computer mouse cursor to the laser spot location is working well. Once the mouse cursor tracks the laser spot movement, the computer controls or icons on which the cursor is located can be executed by the laser pointer mouse. Fig. 9 (b) shows the controls being executed by the laser pointer mouse after transforming the computer display region in the camera image.
(a) Coincidence of mouse cursor and laser spot
(b) Execution of computer controls
Fig. 9. Experimental results
4 Conclusion
In this paper, a multi-functional laser pointer mouse is developed that combines the laser pointer, to point out a specific area on a beam projector screen, and the computer
mouse, to execute many controls on the computer display. The image processing to extract the computer display projected on the screen, detect the laser spot in the camera image, and transform the spot location into the computer display coordinates makes the computer mouse cursor coincide with the laser spot. As a consequence, it is possible to execute the computer controls by the laser pointer mouse. For the wireless module to transmit button signals, a simple commercial wireless presenter was adopted in this paper. Thus, by simply installing the image processing program developed in this paper on a computer, the commercial wireless presenter can be used as a computer mouse as well as a laser pointer in a presentation. The experimental results show that it is difficult to synchronize the mouse cursor with a fast-moving laser spot due to the relatively slow (~30 frames per second) image acquisition rate of the camera. However, since a user normally stabilizes and pauses the laser spot near the desired computer control to execute it, there should not be any problem in accessing the controls with the laser pointer mouse.
References
1. Gonzalez, R., Woods, R.: Digital Image Processing, 2nd edn., pp. 589–591. Prentice-Hall (2002)
2. Sin, S.: Detecting Method of max and min value for edge detection. Pat. No-1020050114463, Korea (2005)
3. Fu, K., Gonzales, R., Lee, C.: Robotics: Control, Sensing, Vision, Intelligence. McGraw-Hill (1987)
4. I-pointer, http://www.golgolu.com
5. Bradski, G., Kaehler, A.: Learning OpenCV. O'Reilly (2008)
The Effect of Biased Sampling in Radial Basis Function Networks for Data Mining

Hyontai Sug

Division of Computer and Information Engineering, Dongseo University, Busan, 617-716, Korea
[email protected]
Abstract. Radial basis function (RBF) networks are known to have very good performance in the data mining task of classification, and the k-means clustering algorithm is often used to determine the centers and radii of the radial basis functions of the networks. Among many parameters, the performance of generated RBF networks depends strongly on the given training data sets, so we want to find better classification models from the given data set. We used biased samples as well as conventional samples to find better classification models of RBF networks. Experiments with real world data sets showed successful results: biased samples could find better knowledge models for some classes, and conventional samples could find better knowledge models for some other classes, so that we can take advantage of both results. Keywords: radial basis function network, classification, biased sampling.
1 Introduction
For the classification task of data mining or machine learning, the problem of insufficient data hinders the task considerably. The target data sets for data mining usually come from real world databases, and because real world databases are originally not made for data mining, the data sets may not contain enough data for accurate classification. Moreover, to make matters worse, real world data sets may contain errors, and some data may be missing [1]. Artificial neural networks and decision trees are representative approaches for the task of data mining. Decision tree algorithms have an innate property that makes it easy to cope with large-sized data sets, because they fragment a data set into many subsets. The algorithms split the data set based on how likely the subsets are to become purer with respect to a class, and each object comes to belong to a specific terminal node. Tests are done in the branches of the decision tree on the feature values of data objects. But this good property of decision trees for large-sized data sets can be harmful in data mining tasks, because we often do not have complete data sets, even when the data sets are large, and the splitting causes a fragmentation problem. Moreover, because we usually do not have a complete data set for training, some heuristic-based pruning
strategies are applied to the tree to avoid the overfitting problem. But because the pruning is based on the tree generation procedure, we cannot avoid the data fragmentation problem. Artificial neural networks can avoid the data fragmentation problem, since all objects are supplied to all nodes in the networks for training. For data mining tasks, two major neural networks, multilayer perceptrons and radial basis function (RBF) networks, are mostly used because of their good performance in many applications [2, 3, 4]. We are especially interested in RBF networks, because these neural networks have been applied successfully to classification tasks of data mining [5]. RBF networks are artificial neural networks of radial basis functions. There are several radial basis functions [6]. Among them, Gaussian functions are used often, since a lot of data are normally distributed. A Gaussian function has two parameters, center and radius. In order to find the values of these two parameters, some clustering algorithm should be used. Among many clustering algorithms, the k-means clustering algorithm is a representative choice, because it is well known and works well for most data sets [7]. K-means clustering needs the number of clusters to be entered by the user, and deciding the number of clusters for k-means clustering is arbitrary in nature. So, we may resort to repeated trials to find the best number of clusters. Moreover, among many parameters, the performance of generated RBF networks depends strongly on the training data sets, and we want to find better classification models from the given data set. In order to find better RBF networks, we may use biased samples as well as conventional samples. In Section 2, we provide the work related to our research, and in Section 3 we present our method. Experiments were run to see the effect of the method in Section 4. Finally, Section 5 presents some conclusions.
2 Related Work
Artificial neural networks have some good properties, such as robustness to errors in data and satisfactory performance even for incomplete data. Incomplete data are data that do not have complete information for classification. For the classification tasks of data mining, feed-forward neural networks can be used, and RBF networks are one of the most popular feed-forward networks [8]. Even though RBF networks have three layers, the input layer, hidden layer, and output layer, they differ from a multilayer perceptron, because in RBF networks the hidden units are usually constructed based on clustering algorithms. A good point of RBF networks is their good prediction accuracy with small-sized data sets. Because decision trees have understandable structures and have been successful, Kubat [9] tried to utilize the information in the terminal nodes of C4.5 [10], a representative decision tree algorithm, to initialize RBF networks. The terminal nodes were used as center points of clustering for the RBF networks. He showed that the RBF networks based on the terminal nodes have better accuracy than the decision trees of C4.5 on some data sets. But Kubat did not consider the possibility of different numbers of clusters. In [11] the performance of four different neural networks, a backpropagation network, an RBF network, fuzzy-ARTUP-Net, and LVQ, is compared with
binary and n-ary decision trees for industrial radiographic testing data, and better performance of the four neural networks was shown. So we can see that RBF networks have better performance than decision trees. Because some induction method is used to train data mining models like neural networks, the behavior of trained data mining models is also dependent on the training data set. So, we can infer that the trained knowledge model will be dependent on the sample size as well as the composition of data in the samples. Fukunaga and Hayes [12] discussed the effect of sample size on parameter estimates in a family of functions for classifiers. In [13] the authors showed that class imbalance in training data affects neural network development, especially in the medical domain. SMOTE [14] used synthetic data for the effect of over-sampling the minority class and showed improved performance with decision trees.
3 The Method
Many data sets for data mining have an unbalanced distribution with respect to classes, and this fact can easily be checked by sorting them with respect to class. On the other hand, if we build a classification model, we can easily check which classes are more inaccurate. The method first builds an RBF network with some arbitrary number of clusters for k-means clustering. Then we inspect the number of misclassified objects for each class, and we choose the classes that are desirable for over-sampling. Because the accuracy of RBF networks can differ for each number of clusters, we increase the number of clusters from small to large. But increasing the number of clusters one by one and generating the corresponding RBF networks may take a lot of computing time without much improvement in accuracy, so we increment the number by some multiple of the initial number of clusters. If the accuracy values of the RBF networks do not increase within given criteria, or converge, the search stops. The following is a brief description of the procedure of the method.

procedure (Output):
Begin /* X, K, C, D: parameters */
1. Generate an RBF network with an arbitrary number of clusters K;
2. Inspect the accuracy of each class to determine over-sampling for some classes;
/* do for over-sampled data */
3. For each sample data set do
4.   Do sampling of X% more for the classes;
5.   Find_the_best_RBFN;
6. End for;
/* do for original sample data */
7. For each sample data set do
8.   Find_the_best_RBFN;
9. End for;
End.

Subprocedure Find_the_best_RBFN:
11. Initialize the number of clusters of the RBFN as C, where C is the number of classes;
12. Generate an RBFN; /* initial_accuracy = the accuracy of the network */
13. yet_best_accuracy := initial_accuracy;
/* check increasingly */
14. Repeat
14.1   Generate an RBFN after increasing the number of clusters by D;
14.2   If the accuracy of the RBFN > yet_best_accuracy
         Then yet_best_accuracy := the accuracy of the RBFN;
       End if;
15. Until the accuracy of the RBFN converges;
16. best_accuracy := yet_best_accuracy;
End Sub.

In the above procedure there are four parameters, X, K, C, and D. X is the additional percentage of objects to sample; K is the arbitrarily given number of clusters; C is the number of classes; D is the increment in the number of clusters of the RBF network. In the experiments below, X is set to 20%, K is set to four or sixteen, and D is set depending on how many classes there are. One may give a smaller value of D for a more thorough search. Increasing the number of clusters is stopped when the accuracies of the generated RBF networks are no longer improved.
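The cluster-count search in Find_the_best_RBFN can be sketched generically. Here evaluate(k) is a placeholder for training an RBF network with k clusters and returning its accuracy, and the patience-based stopping rule is our stand-in for the paper's loosely specified convergence test:

```python
def find_best_rbfn(evaluate, num_classes, increment, patience=3):
    """Start with as many clusters as classes, grow the cluster count
    by `increment`, and stop after `patience` consecutive steps
    without improvement (a simple convergence stand-in)."""
    k = num_classes
    best_k, best_acc = k, evaluate(k)
    stale = 0
    while stale < patience:
        k += increment
        acc = evaluate(k)
        if acc > best_acc:
            best_k, best_acc, stale = k, acc, 0
        else:
            stale += 1
    return best_k, best_acc

# Toy accuracy curve peaking at k = 12, for illustration only.
curve = lambda k: 0.9 - 0.001 * (k - 12) ** 2
print(find_best_rbfn(curve, num_classes=2, increment=2))   # → (12, 0.9)
```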
4 Experimentation
Experiments were run using data sets from the UCI machine learning repository [15] called 'adult' [16] and 'statlog (Landsat satellite)' [17] to see the effect of the method. The number of instances in the adult data set is 48,842, and the number of instances in the statlog data set is 6,435. These data sets were selected because they are relatively large; the adult data set may represent a business domain and the statlog data set a scientific domain. The total number of attributes is 14 and 36, and there are two classes and six classes for the adult and statlog data sets, respectively. There are six continuous attributes in the adult data set, and all attributes are continuous in the statlog data set. We used an RBF network with k-means clustering [18] trained for various numbers of clusters. Because most applications of RBF networks use relatively small-sized data sets, we sampled relatively small sizes for the experiment to simulate this situation. For the adult data set a sample size of 1,600 is used, and for the statlog data set a sample size of 800 is used. All the remaining data are used for testing. For each sample size, seven random sample data sets were drawn.
As a first step, we made RBF networks from samples of size 1,600 and 800 for the adult and statlog data sets, respectively. Tables 1 and 2 show the error rates of each data set for each class when we generate the RBF networks.

Table 1. Error rates for each class of adult data set of sample size 1600

Class    Error rate
>50K     29.4%
≤50K     13.2%
Table 2. Error rates for each class of statlog data set of sample size 800

Class    Error rate
1        10.7%
2        2.0%
3        2.8%
4        66.0%
5        24.5%
6        17.7%
So, 20% more objects were sampled from the object pool of class '>50K' for the adult data set, and 10% more objects were sampled for each of classes 4 and 5 for the statlog data set. Tables 3 through 6 show the average error rates of the best RBF networks found by the algorithm for each sample size over the seven samples for the data sets. Table 3 shows the average error rate of the RBF networks with minority over-sampling for the adult data set.

Table 3. Average error rate for each class of adult data set of samples of size 1920 with minority over-sampling

Class    Error rate
>50K     33.5%
≤50K     11.5%
Table 4 shows the average error rate of the RBF networks with conventional sampling for the adult data set.

Table 4. Average error rate for each class of adult data set of samples of size 1600 with conventional sampling

Class    Error rate
>50K     27.5%
≤50K     14.0%
If we compare Table 3 and Table 4, we can notice that minor-class over-sampling yields better accuracy for the major class.
Tables 5 and 6 show the results of the experiment for the statlog data set. Table 5 shows the result with minor-class over-sampling; note that classes 4 and 5 were chosen as minorities.

Table 5. Average error rate for each class of statlog data set of samples with over-sampling of 2 minor classes (4, 5)

Class    Error rate
1        3.5%
2        4.9%
3        10.0%
4        48.8%
5        26.7%
6        11.1%
Table 6, generated for comparison, shows the average error rate of the RBF networks with conventional sampling for the statlog data set.

Table 6. Average error rate for each class of statlog data set samples of size 800 with conventional sampling

Class    Error rate
1        4.4%
2        4.3%
3        12.1%
4        36.9%
5        19.5%
6        16.1%
If we compare Table 5 and Table 6 carefully, we can notice a result similar to the one for the adult data set: the minority over-sampling method produces better results for most of the major classes. All in all, we may use both the RBF network from conventional sampling and the one from over-sampling. For example, if an unseen case is classified as a minor class by the RBF network from the over-sampled data, we classify it again with the RBF network from the original data and accept that result, because the RBF network from the original data has better error rates for the minor classes.
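The fallback rule described above, which uses both networks, can be written as a few lines of control flow; the lambda "models" below are toy stand-ins for the two trained RBF networks, for illustration only:

```python
def combined_predict(over_model, base_model, x, minor_classes):
    """Trust the network trained on the over-sampled data unless it
    predicts a minor class; in that case fall back to the network
    trained on the original sample, which had the better minor-class
    error rates."""
    label = over_model(x)
    return base_model(x) if label in minor_classes else label

# Toy stand-ins for the two trained RBF networks.
over = lambda x: 4 if x < 0 else 1   # network from over-sampled data
base = lambda x: 5 if x < 0 else 2   # network from the original sample
print(combined_predict(over, base, -1.0, minor_classes={4, 5}))   # → 5
print(combined_predict(over, base, 1.0, minor_classes={4, 5}))    # → 1
```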
5 Conclusions
Because data mining tasks usually deal with data containing errors and missing values, effective tools to mine such data are needed, and artificial neural networks can be good tools: since all objects are supplied to all nodes in the networks for training, they are relatively more robust than other data mining methods to errors and incompleteness in data.
RBF networks make approximations based on training data, and Gaussian functions are mostly used as the radial basis function. In order to train RBF networks, we may use unsupervised learning algorithms like k-means clustering. Since RBF networks have different performance depending on the number of clusters and the available training data sets, we want to find better RBF networks under the constraint of the available data sets. Most target data sets for data mining have a skewed distribution of class values, so if there is some change in the distribution, the resulting RBF networks may have different performance. We propose a method to find better RBF networks in those contexts. We first generate an RBF network with an arbitrary number of clusters from conventional sampling to determine whether there is a relatively higher number of errors for particular classes; we then sample more for those classes, and RBF networks are generated with various numbers of clusters for the biased sample to find a better one for the given sample data set. Experiments with two real world data sets from business and scientific domains lead to the conclusion that we can find better RBF networks effectively.
H. Sug
Location Acquisition Method Based on RFID in Indoor Environments

Kyoung Soo Bok, Yong Hun Park, Jun Il Pee, and Jae Soo Yoo*

Department of Information and Communication Engineering, Chungbuk National University, Cheongju, Chungbuk, Korea
{Ksbok,yhpark1119,yjs}@chungbuk.ac.kr,
[email protected]
Abstract. In this paper, we propose a new location acquisition method that reduces the computation cost of location acquisition while keeping the accuracy of the locations. The proposed method performs event filtering to select the necessary reference tags and then computes the accurate locations of objects. When the locations of objects change, they are updated. To show the superiority of our proposed method, we compare it with LANDMARC, the most popular localization method. The results show that the proposed system reduces the computation cost of location estimation by a factor of about 500 compared with LANDMARC. Keywords: Location based service, RFID, Indoor, Location Acquisition.
1
Introduction
Advances in sensor and communication technology have enabled location based services (LBS), which provide information related to certain locations or to the locations of certain objects, and interest in them has increased rapidly [1, 2]. One of the best known location-aware technologies is the Global Positioning System (GPS). However, GPS has an inherent problem in accurately determining the location of objects inside buildings [1, 3]. In ubiquitous environments, LBS are important for indoor as well as outdoor services. Indoor location based services require object locations accurate to within a few meters; however, indoor LBS cannot be provided on top of GPS because of the inaccuracy of the locations it reports indoors [4, 5]. Radio frequency identification (RFID) is an electronic identification technology for real-time tracking and monitoring and is one of the core technologies for ubiquitous services. RFID streams are generated quickly and automatically and form very large volumes of data. Since most of the RFID stream sensed by a reader is useless to the application, semantic event processing is required to detect the data that are meaningful and interesting to applications [7]. Recently, RFID-based location systems for indoor environments have been studied extensively. Generally, RFID-based location systems use
Corresponding author.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 307–315, 2011. © Springer-Verlag Berlin Heidelberg 2011
the RSSI to measure the signal strength from each tag to the readers. These systems are classified into two approaches. In the first approach, RFID tags are attached at fixed positions and RFID readers are attached to the moving objects [9, 10]. The problem with this approach is that the system is too expensive to construct, since RFID readers are costly and a reader must be attached to each object. [10] proposed an indoor location estimation system based on UHF RFID: each tag has a unique ID number and is attached to the ceiling, an RFID reader is attached to the person, and the location of the person is calculated from the coordinates of the detected tags. In the second approach, RFID readers are attached at fixed positions and RFID tags are attached to the moving objects [4, 8]. This approach is relatively inexpensive to construct. To increase the accuracy of the detected object locations, as many RFID readers as possible are required; since readers are expensive, methods that increase the accuracy while reducing the number of readers have been researched. LANDMARC is a prototype indoor location system based on active RFID [4]. LANDMARC uses the concept of reference tags to improve the overall accuracy of locating objects; suppose we have n readers along with m tags as reference tags. However, LANDMARC does not work well in a closed area with severe radio signal multi-path effects. Moreover, since the accuracy of localization relies on the placement of reference tags, more reference tags are needed to improve location accuracy [3, 8]. Many studies have tried to overcome these weaknesses of LANDMARC [3, 6, 8, 11]. VIRE used the concept of virtual reference tags to obtain more accurate positions of tracked objects without additional tags and readers [11].
VIRE employs virtual reference tags to provide denser reference coverage in the sensing area instead of deploying many real reference RFID tags. To alleviate the effects of uncharacteristic signal behavior, a proximity map is maintained by each reader; to estimate the possible position of an object, VIRE eliminates unlikely positions based on the information from different maps with certain design parameters. [8] used a sub-region selection mechanism to reduce redundant calculation and proposed a nonlinear interpolation algorithm to calculate the RSSI values of virtual reference tags. To improve the accuracy of indoor localization in real environments, [6] used curve fitting to model the relationship between RSSI and the distance from the RF reader to a tag. To calculate a moving object's position, [6] first obtains the k nearest reference tags and the moving object tag's position by the LANDMARC algorithm; it then puts the k reference tags and the moving object's tag, with the computed position, into a set and repeatedly calibrates the target coordinate using the error corrections obtained from the members of this set. The calibration continues until the position of the moving object's tag converges to a stable value. In this paper, we propose a new location acquisition method using RFID tags that reduces the cost of computing locations while guaranteeing their accuracy. The method classifies RFID tags into object tags and reference tags. The reference tags and readers are placed at fixed locations to recognize the object tags attached to the moving objects, and the reference tags are used as assistants to correct the locations of the object tags. Each reader records the information of both the reference tags and the object tags periodically. To save the cost of computing the
locations of the objects, we adopt a filtering phase that discards unnecessary reference tag information collected by unrelated readers. The rest of this paper is organized as follows. Section 2 introduces our proposed method, which detects the locations of the RFID tags and updates the locations efficiently. Section 3 shows the superiority of our proposed method through a performance evaluation. Finally, Section 4 presents the conclusions and future work.
2
The Proposed Method
2.1
The System Architecture
We propose a new indoor location acquisition method using active RFID tags that improves the computation cost and the location accuracy in an indoor environment where RFID readers and reference tags are placed at fixed locations and only the tagged moving objects move. RFID tags are classified into reference tags and object tags. The reference tags serve as reference points placed at fixed locations, as in LANDMARC, to reduce the number of RFID readers and to improve the location accuracy. The object tags are RFID tags attached to the moving objects and move about indoors. To improve the computation cost and the location accuracy, we use event filtering to rapidly determine the neighbor reference tags required to acquire the location of a moving object, and we adopt a location update policy to minimize the management cost of updates. Figure 1 shows the proposed system architecture. To rapidly acquire the locations of objects and enhance the location accuracy, our system consists of an event filtering module and a location tracking module: event filtering selects the reference tags necessary to compute the accurate location of an object tag, and location tracking calculates and updates the locations of object tags. In the event filtering module, data classification divides the RFID stream transmitted from the middleware into object tags and reference tags and stores the stream in an index structure for each kind of tag. The reference tags assist in deciding the locations of object tags through comparison of the signal strengths of the object tags and reference tags. Data filtering prunes the reference tags that are unnecessary for calculating the locations of object tags: not all reference tags are helpful for deciding the location of an object tag; only a few neighboring reference tags are used.
Data filtering thus reduces the computation cost of calculating the locations of the objects using the reference tags. In the location tracking module, location generation calculates the real positions of objects based on the filtered RFID stream produced by data filtering. Location update decides, according to the update policies, whether the location of an object tag should be updated; based on this decision, the locations of objects are updated and the service is notified of the new locations. This reduces the communication cost between the location management system and the application.
Fig. 1. The proposed system architecture
2.2
Event Filtering
To acquire the locations of moving objects, we manage an object tag table and a reference tag table that hold the RFID tag information sensed by the RFID readers. The object tag table stores the tag information of moving objects, and the reference tag table stores the information of the reference tags used as in LANDMARC. The moving objects monitored by applications must be registered in order to provide location based services. Registering a moving object means storing the physical identifier of the RFID tag attached to the moving object in the object tag table. After registration, a logical identifier is assigned to the tag and stored in the moving object table. The tag information table is used to map the physical identifier to the logical identifier. The tag information table stores
tuples <PID, LID, Info>, where PID is a physical identifier, i.e., the EPC code of the moving object, LID is a logical identifier, and Info is the current location of the moving object, which is initially null. Info consists of <ti, (xi, yi), (vxi, vyi)>, where ti is the time, (xi, yi) is the position of the object tag, and (vxi, vyi) is its velocity vector. Info stores the location information of the object tag after the location generation module has run. The reference tag table is similar to the object tag table except for Info: in the reference tag table, Info only stores the position at which the reference tag is deployed. To calculate the locations of moving objects, data classification first divides the RFID stream received from the middleware into object tags and reference tags. We use two index structures: the OR (Object tag-Reader) index and the RR (Reference tag-Reader) index. These index structures indicate the occurrences of the object tags and the reference tags sensed by the readers. Figure 2 shows the two index structures, which are grid-based structures representing the relation between tags and readers, where OTi is an object tag, Ri is a reader, and RTi is a reference tag. Figure 2(a) is the OR index, which represents the occurrences of the object tags sensed by the readers; Figure 2(b) is the RR index, which represents the occurrences of the reference tags sensed by the readers. Initially, every cell in the two grid index structures is set to '0'. When a reader senses multiple tags, it transmits RFID streams to the middleware, and we classify the stream into object tags and reference tags. If the physical identifier of a tag exists in the object tag table, we set the cell representing the reader and the object to '1' in the OR index structure; if the physical identifier of a tag exists in the reference tag table, we set the cell representing the reader and the reference tag to '1' in the RR index structure.
(a) OR index structure
(b) RR index structure
Fig. 2. A grid-based index structure
To perform data classification on the RFID stream sensed by the readers, the tag information table and the grid-based index structures are used. Figure 3 illustrates the procedure of data classification. The RFID stream transmitted from the RFID middleware is defined as a tuple <EPC, RID, TS, SS>, where EPC is the unique identifier of the tag defined by the electronic product code standard, RID is the identifier of the RFID reader, TS is a timestamp representing the time when the tag was sensed by the RFID reader, and SS is the signal strength of the tag. When an RFID stream arrives through the middleware at t1, the sensed stream is classified using the tag information table, and the cell representing the sensed tag and the sensing reader is set to '1' in the corresponding grid-based index structure. For example, <epc1, r1, t1, 3> is an object tag because the PID epc1 exists in the object tag table; therefore, the cell <1, 1> representing the sensed tag epc1 and the sensing reader r1 is set to '1' in the OR index. The rest of the received RFID stream is processed in the same way.
Fig. 3. An example for data classification process
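As a concrete illustration, the classification step above can be sketched as follows. The tuple fields and table roles follow the text, but the concrete tag identifiers, reader names, and grid layout are made-up examples, not the authors' data.

```python
# Sketch of the data classification step: RFID stream tuples <EPC, RID, TS, SS>
# are routed into the OR (object tag-reader) and RR (reference tag-reader)
# bit grids. All identifiers below are hypothetical.

object_tag_table = {"epc1": 1, "epc2": 2}             # PID -> row index (object tags)
reference_tag_table = {"rt1": 1, "rt2": 2, "rt3": 3}  # PID -> row index (reference tags)
readers = {"r1": 1, "r2": 2}                          # RID -> column index

# cells indexed as grid[(tag_row, reader_col)]; all cells start at 0
or_index = {(t, r): 0 for t in object_tag_table.values() for r in readers.values()}
rr_index = {(t, r): 0 for t in reference_tag_table.values() for r in readers.values()}

def classify(stream):
    """Set the matching cell to 1 in the OR or RR index for each stream tuple."""
    for epc, rid, ts, ss in stream:
        col = readers[rid]
        if epc in object_tag_table:
            or_index[(object_tag_table[epc], col)] = 1
        elif epc in reference_tag_table:
            rr_index[(reference_tag_table[epc], col)] = 1

# the example tuple from the text plus two reference tag readings
classify([("epc1", "r1", "t1", 3), ("rt1", "r1", "t1", 5), ("rt2", "r2", "t1", 4)])
# e.g. or_index[(1, 1)] == 1 and rr_index[(1, 1)] == 1 afterwards
```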
We determine the set of reference tags and readers required to calculate the locations of object tags through the data filtering module, which uses the OR index and the RR index. First, we build the set of readers sensing each object tag from the OR grid structure. Once the set of readers is built, we look up, in the RR index structure,
the bit patterns that represent the reference tags recognized by those readers. We then find the set of reference tags recognized by the readers in common through an 'AND' operation between the bit patterns of the readers. Figure 4 presents the selection process of candidate reference tags from the OR and RR index structures for an object OT1. In the OR index structure, the set of readers sensing the object OT1 at time t1 is {R1, R2}; these are the physically adjacent readers at time t1. To find the reference tags simultaneously sensed by the readers sensing the object, we examine the bit patterns of readers R1 and R2, which are '10011' and '11001'. As shown in Figure 4, we obtain the set of adjacent reference tags commonly sensed by these readers through an 'AND' operation between their bit patterns. The reference tags found in this way in the RR index structure are the only candidates used for calculating the location of the object.
Fig. 4. Selection process of adjacent candidate objects
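The filtering step reduces to a bitwise AND over the readers' bit patterns; a minimal sketch, using the patterns '10011' and '11001' from the example above:

```python
# Sketch of the data filtering step: AND the RR-index bit patterns of the
# readers that sense an object to obtain the candidate reference tags.

def candidate_reference_tags(bit_patterns):
    """bit_patterns: equal-length '0'/'1' strings, one per reader.
    Returns the 1-based indices of reference tags sensed by all readers."""
    common = int(bit_patterns[0], 2)
    for pattern in bit_patterns[1:]:
        common &= int(pattern, 2)           # keep only commonly sensed tags
    width = len(bit_patterns[0])
    bits = format(common, "0{}b".format(width))
    return [i + 1 for i, b in enumerate(bits) if b == "1"]

# Readers R1 and R2 from the example: '10011' AND '11001' = '10001',
# so reference tags RT1 and RT5 are the candidates.
print(candidate_reference_tags(["10011", "11001"]))  # [1, 5]
```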
2.3
Location Tracking
To generate the location of an object, we use the adjacent readers and reference tags selected in the previous steps. Suppose that n and m denote the number of selected adjacent readers and adjacent reference tags, respectively. The Euclidean distance in signal strengths between the object and the j-th reference tag is given by equation (1), where $S_i$ and $\theta_{i,j}$ denote the signal strengths that the i-th reader receives from the object and from the j-th reference tag, respectively:

$$E_j = \sqrt{\sum_{i=1}^{n} (S_i - \theta_{i,j})^2}, \quad j = 1, \dots, m \qquad (1)$$
We select the k reference tags that have the minimum values among E = (E1, E2, ..., Em); only k reference tags are used in order to increase the accuracy of the estimated location by relying on the reference tags with the highest reliability. The E values of the k selected reference tags are used to correct the location of the object, with weights determined by the similarity of the
signal strengths between the object and the reference tags. The weight $w_j$ is calculated by equation (2). Using the location information of the k selected reference tags and their weights, we compute the location of the object through equation (3):

$$w_j = \frac{1/E_j^2}{\sum_{i=1}^{k} 1/E_i^2} \qquad (2)$$

$$(x, y) = \sum_{j=1}^{k} w_j \,(x_j, y_j) \qquad (3)$$

where $(x_j, y_j)$ is the position of the j-th selected reference tag.
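Equations (1)-(3) can be sketched in Python as follows. This is a minimal illustration of the LANDMARC-style weighting described above, with made-up signal-strength values, not the authors' implementation.

```python
import math

def estimate_location(S, theta, positions, k):
    """Sketch of eqs. (1)-(3).
    S: signal strengths of the object tag at each of the n readers.
    theta[j]: signal strengths of reference tag j at the same n readers.
    positions[j]: known (x, y) position of reference tag j.
    Assumes E_j > 0 for the k selected tags."""
    # Eq. (1): Euclidean distance in signal-strength space
    E = [math.sqrt(sum((s - t) ** 2 for s, t in zip(S, th))) for th in theta]
    # keep the k reference tags with the smallest E (highest reliability)
    nearest = sorted(range(len(E)), key=lambda j: E[j])[:k]
    # Eq. (2): weights inversely proportional to E_j^2
    denom = sum(1.0 / E[j] ** 2 for j in nearest)
    w = {j: (1.0 / E[j] ** 2) / denom for j in nearest}
    # Eq. (3): weighted average of the reference tag positions
    x = sum(w[j] * positions[j][0] for j in nearest)
    y = sum(w[j] * positions[j][1] for j in nearest)
    return (x, y)

# object readings resemble the reference tag at (0, 0) far more than (10, 10)
print(estimate_location([-52, -61], [[-50, -60], [-70, -80]], [(0, 0), (10, 10)], k=2))
```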
To update the location information of an object, application services can register and manage update policies. First, we find the latest location information in the object tag table and then compare the latest location with the newly computed location of the object. The moving object table maintains the latest location information of the objects for the application services. If the distance between the current location and the new location exceeds a threshold, the new location information is transmitted to the application service and Info is updated in the moving object table.
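The threshold-based update policy can be sketched as follows; the threshold value and the table layout are illustrative assumptions, not the paper's specification.

```python
import math

def maybe_update(moving_object_table, lid, new_location, threshold):
    """Update Info for object `lid` only when it moved more than `threshold`.
    Returns True when the new location should be pushed to the application."""
    old = moving_object_table.get(lid)
    if old is not None:
        dist = math.hypot(new_location[0] - old[0], new_location[1] - old[1])
        if dist <= threshold:
            return False  # small movement: suppress the update
    moving_object_table[lid] = new_location
    return True

table = {}
assert maybe_update(table, 1, (0.0, 0.0), threshold=2.0) is True   # first fix
assert maybe_update(table, 1, (1.0, 1.0), threshold=2.0) is False  # moved ~1.41m
assert maybe_update(table, 1, (5.0, 5.0), threshold=2.0) is True   # moved ~7.07m
```

This keeps the communication between the location management system and the application proportional to significant movements rather than to the raw stream rate.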
3
Experimental Evaluation
To show the superiority of our proposed method, we compare our localization method with LANDMARC. The system setting is shown in Table 1. For the evaluation environment, we place the RFID tags and readers so that the minimum number of adjacent readers communicating with an RFID tag is 3. We set the number of objects to be monitored to 20% of all objects.

Table 1. Experimental parameters

Parameter                                    Value
Simulation area (SA)                         50m x 50m ~ 200m x 200m
Transmission range of a reader (TA)          20m
Total number of moving objects (TMO)         100 ~ 400
Number of monitored objects (NMO)            20%
k                                            5

Table 2. The computation cost according to the number of moving objects

TMO        LANDMARC        The proposed method
100         3062500               6072
200         6125000              12222
400        12250000              24891
Computing the locations of objects efficiently is one of the most important factors in providing real-time location based services in RFID systems. Table 2 shows the cost of computing the locations of objects as the number of objects increases from 100 to 400.
The proposed method is about 500 times faster than LANDMARC, because the filtering step reduces the number of records used to compute the locations. Table 3 presents the comparison of computation costs of our proposed method in various environments. As the size of the simulation environment increases, the computation cost increases, because the number of readers and reference tags participating in the computation of the vector E increases. However, the amount of computation in the proposed method remains small, because we use only the smaller set of readers and reference tags affecting each object rather than all of them, as LANDMARC does. Therefore, the proposed method can detect and compute the locations of objects in large scale environments in real time.

Table 3. The computation cost according to simulation area

SA              LANDMARC        The proposed method
50m x 50m          30625              195.58
100m x 100m       422500              163.58
150m x 150m      2030625              199.88
200m x 200m      6250000              183.18
To measure the accuracy, we compare the computed locations with the real locations of the objects. Figure 5 presents the error distances of the proposed method and LANDMARC over three time units. The error distance of the proposed method is similar to that of LANDMARC, which means that the locations computed by the proposed method are as accurate as those of LANDMARC even though fewer readers and reference tags participate in the computation. Therefore, the proposed method reduces the cost of computing the locations of objects while keeping the accuracy of the locations.
Fig. 5. The accuracy of the computed location
4
Conclusion
In this paper, we proposed a new location acquisition method that reduces the computation cost while keeping the accuracy of the locations. We use only a small number
of readers and reference tags for computing the locations of objects through event filtering. Through the performance evaluation, we showed that the computation cost is cut by about 50%~70% and that the proposed system improves the computation time by a factor of about 500. In future work, we will propose a method that detects the movement of objects before computing their locations, to further reduce the cost of computing the locations. Acknowledgments. This work was supported by the Ministry of Education, Science and Technology Grant funded by the Korea Government (The Regional Research Universities Program/Chungbuk BIT Research-Oriented University Consortium) and Basic Science Research Program through the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2009-0089128).
References 1. Hightower, J., Borriello, G.: Location systems for ubiquitous computing. IEEE Computer 34(8), 57–66 (2001) 2. Gressmann, B., Klimek, H., Turau, V.: Towards Ubiquitous Indoor Location Based Services and Indoor Navigation. In: Proc. Workshop on Positioning Navigation and Communication, pp. 107–112 (2010) 3. Jin, H.Y., Lu, X.Y., Park, M.S.: An Indoor Localization Mechanism Using Active RFID Tag. In: Proc. the IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing, p. 4 (2006) 4. Lionel, M.N., Yunhao, L.: LANDMARC: Indoor Location Sensing Using Active RFID. Wireless Networks 10(6), 701–710 (2004) 5. Heo, J., Pyeon, M.-W., Kim, J.W., Sohn, H.-G.: Towards the Optimal Design of an RFIDBased Positioning System for the Ubiquitous Computing Environment. In: Yao, J., Lingras, P., Wu, W.-Z., Szczuka, M.S., Cercone, N.J., Ślȩzak, D. (eds.) RSKT 2007. LNCS (LNAI), vol. 4481, pp. 331–338. Springer, Heidelberg (2007) 6. Jiang, X., Liu, Y., Wang, X.: An Enhanced Approach of Indoor Location Sensing Using Active RFID. In: Proc. International Conference on Information Engineering, pp. 169–172 (2009) 7. Liu, Y., Wang, D.: Complex Event Processing Engine for Large Volume of RFID Data. In: Proc. Second International Workshop on Education Technology and Computer Science, pp. 429–432 (2010) 8. Shi, W., Liu, K., Ju, Y., Yan, G.: An Efficient Indoor Location Algorithm Based on RFID Technology. In: 6th International Conference on Wireless Communications Networking and Mobile Computing (WiCOM), pp. 1–5 (2010) 9. Kim, S., Ko, D., An, S.: Geographical location based RFID tracking system. In: 2008 International Symposium on a World of Wireless, Mobile and Multimedia Networks, pp. 1–3 (2008) 10. Shiraishi, T., Komuro, N., Ueda, H., Kasai, H., Tsuboi, T.: Indoor Location Estimation Technique using UHF band RFID. In: Proc. International Conference on Information Networking, pp. 1–5 (2008) 11. 
Zhao, Y., Liu, Y., Ni, L.M.: VIRE: Active RFID-based Localization Using Virtual Reference Elimination. In: Proc. International Conference on Parallel Processing, p. 57 (2007)
The Efficiency of Feature Feedback Using R-LDA with Application to Portable E-Nose System

Lang Bach Truong1, Sang-Il Choi2, Yoonseok Yang3, Young-Dae Lee4, and Gu-Min Jeong1,*

1 School of Electrical Engineering, Kookmin University, Seoul, Korea
2 Dept. of Computer Science, University of Southern California, USA
3 Biomedical Engineering, Chonbuk National University, Jeonju, Korea
4 Semyung University, Korea
[email protected]
Abstract. In this paper, we improve the performance of Feature Feedback and present its application for vapor classification in a portable E-Nose system. Feature Feedback is a preprocessing method which detects and removes unimportant information from input data so that classification performance is improved. In our original Feature Feedback algorithm, PCA is used before LDA in order to avoid the small sample size (SSS) problem but it is said that this may cause loss of significant discriminant information for classification. To overcome this, in the proposed method, we improve Feature Feedback using regularized Fisher’s separability criterion to extract the features and apply it to E-Nose system. The experimental result shows that the proposed method works well. Keywords: e-nose system; vapor classification; feature feedback; discriminant feature.
1
Introduction
Sensors are used to measure certain physical or chemical phenomena, and nowadays various sensor systems are incorporated into digital embedded systems. A portable e-nose system is composed of a sensor array that contains several channels and a classifier. Using the information acquired from the sensor array, the classifier distinguishes different vapors by a classification rule. For an e-nose system to perform reliably in various environments, improvements are needed not only in the sensor hardware but also in the data mining methods that process and classify the data measured by the sensors. Pattern recognition is one of the most important parts of designing sensor applications. Feature extraction methods such as PCA+LDA [1], [2], FDA [3] and SVM [4] can be effectively utilized to classify data acquired from an e-nose sensor. The feature feedback based pattern recognition method [5], [6] has been proposed to analyze the relative importance of each part of the data for classification and to identify
Corresponding author.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 316–323, 2011. © Springer-Verlag Berlin Heidelberg 2011
The Efficiency of Feature Feedback Using R-LDA
317
the important regions of the whole data. To do this, we first extract projection vectors that map the input space into the feature space using a feature extraction method, and then feed the extracted features back to the input space. The efficiency of feature feedback based pattern recognition has been shown in face recognition [5] as well as in vapor classification [6]. However, statistical learning methods, including LDA-based feature feedback, often suffer from the so-called "Small Sample Size" (SSS) problem, encountered in high-dimensional pattern recognition tasks where the number of training samples available for each subject is smaller than the dimensionality of the sample space. To overcome this problem, our original feature feedback method first used PCA as a pre-processing step to remove the null space of Sw, after which LDA is performed in the lower-dimensional PCA subspace. However, it has been shown that the discarded null space may contain significant discriminatory information. Recently, to prevent this, solutions without a separate PCA step, called direct-LDA methods, have been proposed [7-10]. Motivated by the success of the R-LDA method [10], a variant of feature feedback is developed here. In this paper, the feature mask obtained using R-LDA is used to remove noise from the input data so as to improve classification performance and efficiency. For the classification step, R-LDA is also used to extract the features from the refined input data. The rest of this paper is organized as follows. Section 2 briefly overviews related works. The experimental method and its application to face recognition are explained in Section 3. The experimental results are described in Section 4, followed by the conclusion in Section 5.
2
Related Works
2.1
Regularized LDA (R-LDA) [10]
Given a training set $Z = \{Z_i\}_{i=1}^{C}$ containing $C$ classes, with each class $Z_i = \{z_{ij}\}_{j=1}^{C_i}$ consisting of a number of localized face images $z_{ij}$, a total of $N = \sum_{i=1}^{C} C_i$ face images are available in the set. For computational convenience, each image is represented as a column vector of length $J (= I_w \times I_h)$ by lexicographic ordering of the pixel elements, i.e. $z_{ij} \in R^J$, where $(I_w \times I_h)$ is the image size and $R^J$ denotes the $J$-dimensional real space. LDA searches for a set of $M (\ll J)$ feature basis vectors, denoted as $\{\psi_m\}_{m=1}^{M}$, in the underlying space that best discriminate among the classes. This is achieved by maximizing the determinant of the between-class scatter matrix ($S_b$) and minimizing the determinant of the within-class scatter matrix ($S_w$) simultaneously. The objective function of LDA can be written as follows:
$$\psi = \arg\max_{\psi} \frac{|\psi^T S_b \psi|}{|\psi^T S_w \psi|}, \quad \psi = [\psi_1, \psi_2, \dots, \psi_M], \ \psi_m \in R^J \qquad (1)$$

where $S_b$ and $S_w$ are the between-class and within-class scatter matrices, defined as follows:
$$S_b = \frac{1}{N}\sum_{i=1}^{C} C_i (\bar{z}_i - \bar{z})(\bar{z}_i - \bar{z})^T = \sum_{i=1}^{C} \phi_{b,i}\,\phi_{b,i}^T = \phi_b \phi_b^T \qquad (2)$$

$$S_w = \frac{1}{N}\sum_{i=1}^{C}\sum_{j=1}^{C_i} (z_{ij} - \bar{z}_i)(z_{ij} - \bar{z}_i)^T \qquad (3)$$
where $\phi_{b,i} = (C_i/N)^{1/2}(\bar{z}_i - \bar{z})$, $\phi_b = [\phi_{b,1}, \dots, \phi_{b,C}]$, and $\bar{z}_i = \frac{1}{C_i}\sum_{j=1}^{C_i} z_{ij}$ is the mean of the class $Z_i$. The optimization problem of Eq. (1) is equivalent to the following generalized eigenvalue problem:
$$S_b \psi_m = \lambda_m S_w \psi_m, \quad m = 1, \dots, M \qquad (4)$$
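When $S_w$ is non-singular, this generalized eigenvalue problem can be solved numerically. The following numpy sketch, on random illustrative scatter matrices rather than the paper's data, solves eq. (4) by whitening with $S_w^{-1/2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
J = 5
# two symmetric positive (semi)definite "scatter" matrices for illustration
A = rng.standard_normal((J, J)); Sb = A @ A.T
B = rng.standard_normal((J, J)); Sw = B @ B.T + np.eye(J)  # keep Sw non-singular

# solve Sb psi = lambda Sw psi by whitening: eigendecompose Sw^{-1/2} Sb Sw^{-1/2}
w, V = np.linalg.eigh(Sw)
Sw_inv_half = V @ np.diag(w ** -0.5) @ V.T
evals, evecs = np.linalg.eigh(Sw_inv_half @ Sb @ Sw_inv_half)

# keep the M = 3 most significant directions (largest eigenvalues)
Psi = Sw_inv_half @ evecs[:, ::-1][:, :3]
lam = evals[::-1][:3]
assert np.allclose(Sb @ Psi, Sw @ Psi * lam)  # each column satisfies eq. (4)
```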
Thus, when $S_w$ is non-singular, the basis vectors $\psi$ sought in Eq. (1) correspond to the first $M$ most significant eigenvectors of $S_w^{-1} S_b$, where "significant" means that the eigenvalues corresponding to these eigenvectors are the $M$ largest ones. Due to the SSS problem, traditional methods, for example in [3], attempt to solve the problem by utilizing an intermediate PCA step to reduce the input dimensionality, so that $S_w$ is no longer degenerate and LDA can proceed without trouble. Nevertheless, a potential problem is that the PCA criterion may not be compatible with the LDA criterion, and thus the PCA step may discard dimensions that contain important discriminative information. Regularized LDA (R-LDA) was developed in [10] to overcome this problem. In this method, the regularized Fisher's criterion is expressed as follows:
$$\psi = \arg\max_{\psi} \frac{|\psi^T S_b \psi|}{\eta\,|\psi^T S_b \psi| + |\psi^T S_w \psi|} \qquad (5)$$
where $0 \le \eta \le 1$ is a regularization parameter. The proof of the equivalence between (1) and (5) has been shown in detail in ref. [10], which also derived the following algorithm to optimize the modified Fisher's criterion in equation (5):

Input: A training set $Z = \{Z_i\}_{i=1}^{C}$ with $C$ classes, each class containing $Z_i = \{z_{ij}\}_{j=1}^{C_i}$ face images, where $z_{ij} \in R^J$, and the regularization parameter $\eta$.
Output: An $M$-dimensional LDA subspace spanned by $\Psi$, a $[J \times M]$ matrix with $M \ll J$.

Algorithm:
Step 1: Express $S_b = \phi_b \phi_b^T$, with $\phi_b = [\phi_{b,1}, \dots, \phi_{b,C}]$, $\phi_{b,i} = (C_i/N)^{1/2}(\bar{z}_i - \bar{z})$, $\bar{z}_i = \frac{1}{C_i}\sum_{j=1}^{C_i} z_{ij}$, and $\bar{z} = \frac{1}{N}\sum_{i=1}^{C}\sum_{j=1}^{C_i} z_{ij}$.
Step 2: Find the $m$ eigenvectors of $\phi_b^T \phi_b$ with non-zero eigenvalues, and denote them as $E_m = [e_1, \dots, e_m]$.
Step 3: Calculate the first $m$ most significant eigenvectors ($U_m$) of $S_b$ and their corresponding eigenvalues $\Lambda_b$ by $U_m = \phi_b E_m$ and $\Lambda_b = U_m^T S_b U_m$.
Step 4: Let $H = U_m \Lambda_b^{-1/2}$. Find the eigenvectors of $H^T S_w H$, $P = [p_1, \dots, p_m]$, sorted in increasing eigenvalue order.
Step 5: Choose the first $M (\le m)$ eigenvectors in $P$. Let $P_M$ and $\Lambda_w$ be the chosen eigenvectors and their corresponding eigenvalues, respectively.
Step 6: Return $\Psi = H P_M (\eta I + \Lambda_w)^{-1/2}$.

2.2
Feature Feedback [5]
We first extract projection vectors that map the input space into the feature space using a feature extraction method, and then feed the extracted features back to the input space. Based on the feedback information, each data sample is divided into two parts: an important part and an unimportant part. For a data sample $x_k$ that contains $n$ input variables $\{x_{ki} \mid i = 1, 2, \dots, n\}$, let $a_i \in R^n$ be the $i$-th unit coordinate vector of the input space and let $\psi_l \in R^n$ be the projection vector corresponding to the $l$-th largest eigenvalue obtained by a feature extraction method. Then, $\psi_l$ can be expressed as a linear combination of the unit direction vectors $a_i$ as follows:
T
ψ l = [ψ l1 ,ψ l 2 , ...,ψ ln ]
= ψ l1a1 + ψ l 2 a 2 + ... + ψ ln a n
Here, the magnitude of ψ_li indicates how much the i-th coordinate vector a_i contributes to the projection vector ψ_l. Therefore, if ψ_li is larger in magnitude than ψ_lj for a projection vector ψ_l, the coordinate vector a_i (i.e., the i-th input variable) can be regarded as more important than a_j (i.e., the j-th input variable). Among the n variables in a data sample, we select the t (< n) variables corresponding to the
320
L.B. Truong et al.
largest values of |ψ_li|, taken in order from greatest to least. The input variables corresponding to the selected t variables are then used for classification.
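As a concrete illustration, the variable-selection step described above can be sketched in Python as follows. This is our sketch rather than the authors' code: aggregating |ψ_li| over several projection vectors is an assumption (the text leaves open how multiple projection vectors are combined at this stage), and all names are illustrative.

```python
import numpy as np

def select_important_variables(projection_vectors, t):
    """Rank the n input variables by the magnitude of their coefficients
    psi_li in the projection vectors, and keep the t most important ones.

    projection_vectors: (L, n) array whose rows are the vectors psi_l.
    Returns the indices of the selected t input variables."""
    importance = np.abs(projection_vectors).sum(axis=0)  # aggregate |psi_li| over l (assumption)
    return np.argsort(importance)[::-1][:t]

# Illustrative use: variable 1 dominates both projection vectors.
psi = np.array([[0.1, 0.9, 0.05],
                [0.2, 0.8, 0.00]])
selected = select_important_variables(psi, t=2)
```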
3   Feature Feedback with R-LDA
In this section, we apply feature feedback with R-LDA to face recognition. We first construct the feature mask from the training set using feature feedback. Then, the features for face recognition are extracted from the masked faces. All necessary steps of the experiment are shown in Fig. 1, and the detailed procedure is as follows:
Fig. 1. Procedures of experiment
Step 1: R-LDA is used to extract the discriminant features for the feature feedback stage. Among the C−1 fisherfaces produced by R-LDA, we select the fisherfaces to be used in the feature feedback based on the distribution of eigenvalues. In Fig. 2a, we use the 3 fisherfaces corresponding to the 3 largest eigenvalues.
Step 2: We divide each fisherface into two parts, FI_l and FU_l, which are regarded as the important and unimportant parts of the l-th fisherface, respectively. Let us define a_l as the average value of ||ψ_li|| and T as a threshold value. Then, we can segment the l-th fisherface as follows:
ψ_li ∈ FI_l   if ||ψ_li|| ≥ a_l + T,
ψ_li ∈ FU_l   otherwise.   (6)
By using (6), we obtain the segmented fisherfaces from which the feature mask is constructed.
Step 3: We construct the final feature mask from the segmented fisherfaces. Using the OR operation, we obtain the feature mask, as shown in Fig. 2a:

FI = FI_1 ⊕ FI_2 ⊕ ... ⊕ FI_n   (7)
We then refine the input data and perform the classification. As shown in Fig. 2b, the selected pixels obtained by using the feature mask are used as the input to the classification.
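Steps 2 and 3 above can be sketched as follows. This is a minimal illustration under our own naming, assuming the fisherfaces have been reshaped onto the image grid:

```python
import numpy as np

def build_feature_mask(fisherfaces, a, T):
    """Segment each fisherface by (6) and OR the important parts FI_l
    together into the final feature mask (7).

    fisherfaces: (L, H, W) array of fisherfaces psi_l on the image grid.
    a: per-fisherface average magnitudes a_l.  T: global threshold."""
    mask = np.zeros(fisherfaces.shape[1:], dtype=bool)
    for face, a_l in zip(fisherfaces, a):
        mask |= np.abs(face) >= a_l + T     # FI_l: pixels with |psi_li| >= a_l + T
    return mask

def refine(images, mask):
    """Keep only the pixels selected by the mask as classifier input."""
    return images[:, mask]
```

The per-fisherface average a_l can be computed as `np.abs(face).mean()`, matching the definition of a_l above.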
Fig. 2. The procedure of data refinement for vapor classification (a) Feature feedback to obtain the final feature mask (b) Vapor classification based on the data refinement
4   Experimental Results
We applied the proposed method to the E-Nose system of [11] to evaluate its performance. The VOC measurement data consist of 8 classes: acetone, benzene, cyclohexane, ethanol, heptane, methanol, propanol, and toluene. In order to evaluate the classification rates, we performed 5-fold cross-validation [7] five times and computed the average value. In other words, there were 128 data samples in the training set and 32 data samples in the testing set. For the classification step, the R-LDA method was used to extract the features, and the one-nearest-neighbor rule with the L2 distance metric was used as the classifier. In this experiment, we set the threshold value for the feature mask to T = 0.1e-5 and the regularization parameter to η = 1, and 3 fisherfaces were used to construct the feature mask. Fig. 3 shows the recognition rate comparison between feature feedback using PCA+LDA and feature feedback using R-LDA for different numbers of features. As can be seen in Fig. 3, the performance of feature feedback using R-LDA is better than that of feature feedback using PCA+LDA for almost all numbers of features.
Fig. 3. Recognition rates for various numbers of features
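The evaluation protocol described above (5-fold cross-validation repeated five times, 1-NN with the L2 distance) can be sketched as below. This is our illustration of the protocol, not the authors' code; in the actual experiment the R-LDA/feature-feedback features would be extracted inside each fold before classification.

```python
import numpy as np

def one_nn_accuracy(train_X, train_y, test_X, test_y):
    """1-nearest-neighbor classification with the L2 distance."""
    correct = 0
    for x, y in zip(test_X, test_y):
        dists = np.linalg.norm(train_X - x, axis=1)   # L2 distance to every training sample
        correct += int(train_y[np.argmin(dists)] == y)
    return correct / len(test_y)

def repeated_kfold_accuracy(X, y, k=5, repeats=5, seed=0):
    """Average 1-NN accuracy over `repeats` runs of k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(repeats):
        idx = rng.permutation(len(X))
        for fold in np.array_split(idx, k):
            train = np.setdiff1d(idx, fold)
            accs.append(one_nn_accuracy(X[train], y[train], X[fold], y[fold]))
    return float(np.mean(accs))
```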
5   Conclusion
In this paper, we proposed an improvement of the feature feedback method using R-LDA and presented its application to vapor classification. The new method not only overcomes the SSS problem but also keeps the null space of S_w, which is very important
for classification. The effectiveness of the proposed method has been demonstrated through experiments using an E-Nose database. The proposed method still requires a more systematic algorithm for determining the threshold value T, the number of features, and the regularization parameter. In addition, it is also necessary to apply the method to other databases. These objectives remain as future work.

Acknowledgements. This work was supported in part by the research program of Kookmin University in Korea, and in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2011-0006958).
References
1. Yang, Y.-S., Choi, S.-I., Jeong, G.-M.: LDA-based vapor recognition using image-formed array sensor response for portable electronic nose. In: Medical Physics and Biomedical Engineering World Congress, pp. 1765–1759 (2009)
2. Belhumeur, P.N., Hespanha, J.P., Kriegman, D.J.: Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 19(7), 711–720 (1997)
3. Zhang, S., Xie, C., Fan, C., Zhang, Q., Zhan, Q.: An alternate method of hierarchical classification for E-Nose: Combined Fisher discriminant analysis and modified Sammon mapping. Sens. Actuators B 127, 399–405 (2007)
4. Pardo, M., Sberveglieri, G.: Classification of electronic nose data with support vector machines. Sens. Actuators B, 730–737 (2005)
5. Jeong, G.-M., Ahn, H.-S., Choi, S.-I., Kwak, N., Moon, C.: Pattern recognition using feature feedback: Application to face recognition. Int. J. Control Autom. Syst., 1–8 (2010)
6. Choi, S.-I., Kim, S.-H., Yang, Y., Jeong, G.-M.: Data refinement and channel selection for a portable system by the use of feature feedback. Sensors, 10387–10400 (2010)
7. Yu, H., Yang, J.: A direct LDA algorithm for high-dimensional data – with application to face recognition. Pattern Recognition 34, 2067–2070 (2001)
8. Chen, L.F., Liao, H.Y.M., Ko, M.T., Lin, J.C., Yu, G.J.: A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognition 33, 1713–1726 (2000)
9. Lu, J., Plataniotis, K.N., Venetsanopoulos, A.N.: Face recognition using LDA-based algorithms. IEEE Transactions on Neural Networks 14(1), 195–200 (2003)
10. Lu, J., Plataniotis, K.N., Venetsanopoulos, A.N.: Regularization studies of linear discriminant analysis in small sample size scenarios with application to face recognition. Pattern Recognition Letters 26, 181–191 (2005)
11. Yang, Y.-S., Ha, S.-C., Kim, Y.-S.: A matched-profile method for simple and robust vapor recognition in electronic nose (E-Nose) system. Sens. Actuators B, 263–270 (2005)
Interactive Virtual Aquarium with a Smart Device as a Remote User Interface

Yong-Ho Seo¹ and Jin Choi²

¹ Department of Intelligent Robot Engineering, Mokwon University, Mokwon Gil 21, Seo-gu, Daejon, Republic of Korea
[email protected]
² Mobile Comm., Samsung Electronics Co., Ltd., Yeongtong-gu, Suwon, Republic of Korea
[email protected]
Abstract. New applications in which smart devices interact with other computing devices have recently provided interesting and feasible solutions in ubiquitous computing environments. In this study, we propose an interactive virtual aquarium system that interacts with a smart device as a user interface. We developed a virtual aquarium graphic system and a remote interaction application for a smart device to build the interactive virtual aquarium system. We performed an experiment that demonstrates the feasibility and the effectiveness of the proposed system as an example of a new type of interactive application of a smart display, in which a smart device serves as a remote user interface.

Keywords: Interactive Virtual System, Smart Device, Remote Sensory System.
In ubiquitous environments with seamless wireless connections, smart devices are becoming major user interfaces and hub devices in the upcoming personal cloud computing environment [1]. We also anticipate that smart devices will play an important role in ubiquitous computing environments as a kind of wearable computer. Specifically, this medium, which interacts with public media devices as a proxy for the user, will act as a service agent that reduces the burden of frequent interactions with each device, even as such media devices become increasingly intelligent [2]. In the personal cloud computing environment, mixing ubiquitous computing and wearable computing will be necessary to overcome the disadvantages of each, as summarized in Table 1. In consideration of this recent trend in computing technologies, together with the rapid deployment of smart devices, we expect that the interconnection of smart devices and interactive media devices such as kiosks or smart walls, which we can refer to as smart displays, will provide good synergy in the application domain and have a major influence on the market.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 324–331, 2011. © Springer-Verlag Berlin Heidelberg 2011
Against this background, we developed an interactive virtual aquarium system as a prototype of a smart display, along with application software that runs on the smart device to allow interaction with this virtual aquarium, serving as the remote interface of a smart display. Finally, we conducted an experiment to demonstrate interactions between the virtual aquarium system and a smart device. Table 1. Features provided by ubiquitous computing versus wearable computing
Feature                         Ubiquitous comp.   Wearable comp.
Privacy and Personalization     X                  O
Localized Control/Resource      X                  O

1   Interactive Virtual Aquarium System
The proposed system has an embedded computer that uses a TV display, as an example of a smart TV. The graphical expression and animation of the virtual aquarium are based on an evolutional model that uses a genetic algorithm, and we use a fluid dynamics solver to simulate the natural movement of the artificial life [3, 4]. At the beginning of the simulation, we set up the environment parameters of the virtual aquarium. Each artificial creature has its own genotypes, created randomly at the initial stage. These genotypes are represented as bit strings. The shape, color, and sound of an artificial creature are decided by its genotypes. The system tracks the speed of the user's hand and uses it as an external force in the virtual aquarium. This force changes the flow of the fluid in the virtual aquarium according to the fluid dynamics solver, and this flow in turn changes the movement of the artificial creatures. When one artificial creature meets another, an artificial creature with new genotypes is created via the genetic algorithm. Through these processes, we can generate various artificial creatures. Fig. 1 shows the initial state of the virtual aquarium system. In this figure, some artificial creatures are scattered randomly, with various shapes and colors. At the center of the scene, there is a metaphor that moves according to the user's hand movement, using the gravity sensor of a smart device.

1.1   Artificial Creature Generation based on an Evolutional Model
This system automatically generates artificial creatures based on an evolutional model that uses genetic algorithms [5]. In the evolutional model, the creatures are generated according to the flow chart presented in Fig. 2.
Fig. 1. Interactive virtual aquarium system
Fig. 2. Flow chart of the evolutional model
A creature is represented by a set of genotypes encoded as bit strings. More specifically, the shape and the color of the creature are decided by its genotypes, as shown in Fig. 3. The shape of a creature originates from one mesh; Fig. 4 shows the original mesh used in this prototype. To make creatures with various shapes, we mark some points of the original mesh as feature points. A creature derives its shape through movement of the feature points of the original mesh, depending on values defined in the genotypes. The color of the creature is given by the parameters r, g, and b, where r is red, g is green, and b is blue. In addition, the creature's age is stored, allowing simulation of its life cycle.
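As an illustration of the genotype representation just described, and of the uniform crossover and mutation applied when creatures meet (described in the next subsection), the following sketch uses an assumed 24-bit genotype holding only the r, g, b color; the actual prototype also encodes shape offsets and age, so the layout here is purely illustrative.

```python
import random

GENOTYPE_BITS = 24  # assumed layout: 8 bits each for r, g, b

def random_genotype(n_bits=GENOTYPE_BITS):
    return [random.randint(0, 1) for _ in range(n_bits)]

def decode_color(genotype):
    """Decode the bit string into an (r, g, b) color, 8 bits per channel."""
    def byte(bits):
        return sum(bit << (7 - i) for i, bit in enumerate(bits))
    return byte(genotype[0:8]), byte(genotype[8:16]), byte(genotype[16:24])

def uniform_crossover(parent_a, parent_b):
    """A random mask picks the parent per bit: 1 -> copy from A, 0 -> from B."""
    mask = [random.randint(0, 1) for _ in parent_a]
    return [a if m else b for a, b, m in zip(parent_a, parent_b, mask)]

def mutate(genotype, rate=0.01):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if random.random() < rate else bit for bit in genotype]

child = mutate(uniform_crossover(random_genotype(), random_genotype()))
```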
Fig. 3. Parameters of genotypes
Fig. 4. Original mesh of the artificial creature
Once a creature is generated, it moves along the flow of the fluid in the virtual aquarium. When a creature meets another creature, crossover and mutation occur. We use uniform crossover, one of several crossover operations, to build the new genotypes for children. In the uniform crossover operation, mask data are needed to match the new genotypes, and the mask data are generated randomly for diversity. For example, if the value of a mask bit is 1, the corresponding bit from creature A is copied to the child; if the value of the mask bit is 0, the bit from creature B is copied to the child. After the crossover operation, we apply a mutation operation to the newly generated child creature. The mutation operation is simulated by flipping the values of bits from 0 to 1 or from 1 to 0, depending on the mutation rate. Through mutation, creatures obtain new, unexpected features, but they may lose their characteristics if the mutation rate is too high. We also take the age of the creature into account. With the elapse of time, the creature grows, can participate in a crossover operation, and eventually dies within a definite period of time.

1.2   Generation of Movement of Artificial Creature
We use real-time fluid dynamics to simulate the natural movement of the artificial creatures. The real-time fluid dynamics is based on the physical equation of fluid flow, the Navier-Stokes equation. The state of a fluid at a given instant of time
is modeled as a velocity vector field, a function that assigns a velocity vector to every point in space. The Navier-Stokes equation for the velocity, equation (1), is applied. Given the current state of the velocity and a current set of forces, the equation describes how the velocity changes over an infinitesimal time step. In particular, the velocity over a time step changes due to three factors, designated by the three terms on the right-hand side of the equation. The first term says that the velocity should advect; the second that the velocity may diffuse at a certain rate; and the third that the velocity increases due to the addition of forces. For example, if a person stirs the water inside the aquarium with a stick, the water flow will change. Supposing that an object is located in the aquarium and the local velocity vector is applied to it as a force, it is possible to compute the next location of the object.
∂u/∂t = −(u · ∇)u + ν∇²u + f   (1)
Although the Navier-Stokes equation mathematically models fluid flows, the computational burden of obtaining a solution is very high. Some studies have thus been carried out to promote the practical use of the equation. The present work applies Jos Stam's solution [6]. Stam's solution is easy to implement and can be solved in real time. We extend his two-dimensional solution into a three-dimensional one. In practice, it is not possible to evolve every point in an infinite space; we therefore divide the space into identical cells and sample the fluid at each cell's center. Fig. 5 shows the evolution steps of the velocity vector field when a force occurs. Here, a white box denotes the finite space of the virtual aquarium, and a white arrow represents the velocity vector of each cell. Initially, the magnitudes of all velocity vectors are zero, as shown in Fig. 5(a), and there is no fluid flow. When a force occurs within the space, the velocity vector field changes and begins to swirl, as shown in Fig. 5(b). Subsequently, after some evolution steps, the velocity vector field returns to a stable state corresponding to the initial state. In summary, our real-time fluid dynamics solver naturally simulates real fluid flows.
Fig. 5. Evolution of the velocity vector field when a force occurs
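Two of the three terms of (1) can be sketched in Stam's style as follows. This is a simplified 2D sketch under our own naming; the paper's solver is three-dimensional and also includes the advection term.

```python
import numpy as np

def add_force(vel, force, dt):
    """Third term of (1): the velocity increases by the applied force, u += dt * f."""
    return vel + dt * force

def diffuse(vel, visc, dt, iters=20):
    """Second term of (1): implicit diffusion, solved by iterative relaxation
    as in Stam's stable-fluids solver. Boundary cells are left fixed."""
    a = dt * visc * vel.shape[0] * vel.shape[1]
    out = vel.copy()
    for _ in range(iters):
        out[1:-1, 1:-1] = (vel[1:-1, 1:-1]
                           + a * (out[:-2, 1:-1] + out[2:, 1:-1]
                                  + out[1:-1, :-2] + out[1:-1, 2:])) / (1 + 4 * a)
    return out
```

The implicit formulation is what keeps the step stable for arbitrary time steps, which is the key property of Stam's method for interactive use.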
2   Smart Device as a Remote User Interface
We used a Windows Mobile-based Samsung smart phone, the T-Omnia2, as the smart device to communicate with the interactive aquarium system. Many smart phones support integrated features such as Bluetooth, GPS, and Wi-Fi. To develop the smart device user interface, enable networking, and process the gravity sensor data, we used the Windows Mobile SDK and the Samsung SDK, which make these smart device features available to mobile applications; we referred to the documentation of the corresponding smart phone's SDK [7]. The gravity sensor in the smart device is composed of several accelerometers that measure acceleration in terms of Earth's gravitational force, g (9.8 m/s²) [8]. The Samsung T-Omnia2 can measure acceleration in a range of ±2g, with a resolution of 0.004g. The axis directions are shown in Fig. 6: the X axis is along the width of the phone, the Y axis is along the length of the phone, and the Z axis is along the depth of the phone. We can easily obtain the moving direction over the XY plane and the velocity of the metaphor from the acceleration values Acc_x and Acc_y of the accelerometers by applying equations (2) and (3). For the Z axis, we simply apply the Acc_z value with a scaling factor.
MovingDirection_XYplane = arctan(Acc_y / Acc_x) + offsetAngle   (2)

Velocity = √(Acc_x² + Acc_y² + Acc_z²)   (3)
Fig. 6. Smart device, Samsung T-Omnia2, and its accelerometer axis
3   Experimental Results
To implement the proposed virtual aquarium, we use OpenGL, a graphics library, for rendering, and real-time fluid dynamics for interaction with the virtual creatures of the aquarium. We also developed application software for the smart device to allow interaction with the virtual aquarium based on the gravity sensor. We conducted an experiment involving interactions between the virtual aquarium system and a smart device. In the experiment, the proposed virtual aquarium system produces natural interactions according to the user's movement of the metaphor by tilting the handheld smart device.
The demonstration shows the interaction of the interactive virtual aquarium with a smart device, as an example of smart device and smart display interaction. When the user runs the application software for remote interaction with the aquarium on a smart device and approaches the interactive virtual aquarium system, the application notifies the user that he or she can use the aquarium system once a wireless connection has been established between the smart device and the virtual aquarium. The user can then move the metaphor of the virtual aquarium via hand movement, and the flow of water in the aquarium changes according to the direction of the user's hand movement. As the metaphor moves around in three-dimensional space, the accelerometer data from the smart device are applied to the input interface. The smart device sends the sensed data to the system through a Wi-Fi network. Subsequently, the force is applied to evolve the velocity vector field and to move the metaphor. Fig. 7 shows this interaction process sequentially.
Fig. 7. Interaction process
Finally, we synthesize all the processes described above. Fig. 8 shows a series of screen shots of the interactive virtual aquarium system when a person interacts with the virtual aquarium. When the user tilts the smart device, the metaphor moves to the left and the water flows of the virtual aquarium are changed.
Fig. 8. A series of screen shots of the virtual aquarium
Through the experiment, we verified the feasibility and the effectiveness of the proposed interactive aquarium system as an example of a new type of interactive application of a smart display, in which a smart device serves as a remote user interface.
4   Conclusion
In this paper, we described an interactive virtual aquarium system and its interaction with a smart device as a control interface. We developed a virtual aquarium graphic system and application software for a smart device to allow its use with the interactive virtual aquarium system. We performed an experiment that demonstrates the feasibility and the effectiveness of an interactive application of a smart display in which a smart device serves as the remote user interface. A smart device can allow a user to choose a variety of options or menus autonomously when the user approaches an interactive smart display, and it can serve as a suitable remote user interface for a smart display. It can also provide a useful medium between the user and public information devices, as the security of the interaction can be ensured by keeping private information in the user's smart device. However, it is difficult for the user of a smart device to scan all nearby devices and select the desired device to interact with. Therefore, in future work, we will attempt to develop a mechanism that suggests choices to a user based on frequently used selections. In addition, we are planning to identify proper techniques for understanding a user's intended interaction among the electronic devices near the smart device.

Acknowledgments. This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (MEST) (2011-0013776).
References
1. Yoon, J.H., et al.: Smart Device, Merging into '3S' Market. Timely Report, ROA Consulting (2010)
2. Rhodes, B.J., Minar, N., Weaver, J.: Wearable Computing Meets Ubiquitous Computing: Reaping the Best of Both Worlds. In: Proceedings of the International Symposium on Wearable Computers (ISWC 1999) (October 1999)
3. Lee, P., Lee, C., Sasada, S., Takahashi, H.: An Interactive Action of Automatic Artwork by Using an Evolutional Model. In: Proc. of the International Conference on Artificial Reality and Telexistence (ICAT 2001), Tokyo, Japan, pp. 217–220 (2001)
4. Stam, J.: Real-Time Fluid Dynamics for Games. In: Proc. of the Game Developer Conference, San Jose, California, USA (2003)
5. Lee, P., Lee, C., Sasada, S., Takahashi, H.: An Interactive Action of Automatic Artwork by Using an Evolutional Model. In: International Conference on Artificial Reality and Telexistence (2001)
6. Stam, J.: Real-Time Fluid Dynamics for Games. In: Proceedings of the Game Developer Conference (March 2003)
7. Samsung Mobile Innovator, http://innovator.samsungmobile.com
8. de Souza, M., Carvalho, D.D.B., Barth, P., Ramos, J.V., Comunello, E., von Wangenheim, A.: Using Acceleration Data from Smartphones to Interact with 3D Medical Data. In: Proc. of the SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI 2010), Gramado, pp. 339–345 (2010)
Intelligent Control Algorithm for Smart Grid Systems

Tahidul Islam and Insoo Koo*

School of Electrical Engineering, University of Ulsan, S. Korea
Abstract. A secure, well-organized, and cost-effective communication system is essential for future Smart Grid (SG) systems, which consist of multi-tier network standards, making synchronization in power-management communication challenging. In this paper, we present a communication system model for SG systems by specifying a control algorithm for multilayer devices in the home area network (HAN), one of the most essential subsystems in the SG. Our focus is to demonstrate a reliable HAN based on Zigbee nodes that saves device power as well as bandwidth when transferring electricity consumption information. In addition, a dynamic-programming-based control algorithm is devised to reliably control the total electricity consumption of an entire SG system in the absence of sufficient electricity supply. Experimental results exhibit the efficiency of the proposed system.

Keywords: Home area network, Multilayer device control, Smart device, Smart grid, Zigbee.
1   Introduction
Smart Grid (SG) refers to an improved electricity supply chain that runs from a major power plant to inside the home. It exploits two-way communication technology, smart metering, updated control theory, dynamic optimization theory, and machine-to-machine (M2M) communication in order to ensure superior networking capability, efficient and secure distribution of energy, flexibility, and cyber safety. In SG communications, a number of devices are exploited for the supervision and feedback information of the grid, which requires a significant amount of device power. In this respect, intelligent and low-cost monitoring and control systems enabled by online sensing technologies are essential to maintain the safety, reliability, efficiency, and uptime of the SG [1-3]. A home area network (HAN) is one of the most essential subsystems in a SG for managing the on-demand power requirements of end consumers. There is an urgent need for cost-effective wireless monitoring and diagnostic systems for the HAN that improve system reliability and efficiency by optimizing the management of electric power systems. In this regard, Zigbee plays an important role as a new wireless standard, which targets low-power, low-data-rate, and short-range wireless data transfer [4-5]. A number of approaches have been reported in previous works to set up the structure of a HAN by utilizing Zigbee [6-9].
* Corresponding author.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 332–343, 2011. © Springer-Verlag Berlin Heidelberg 2011
With the advances in technology, a number of electricity-based home appliances, such as fridges, washing machines, ovens, dishwashers, radios, TVs, computers, and so on, are being used in our daily life to attain an automated and pleasant life. These appliances make our lives easier (fridges, washing machines), save our time (microwave ovens, dishwashers), or simply give us pleasure (stereos, radios, TVs). They require a lot of electricity at the consumer end, on average a quarter of all the energy consumed in a house [10]. Peak demand on the energy supply can be handled better by giving incentives to consumers to turn off high-power appliances, such as air conditioners, electric water heaters, pool pumps, clothes dryers, etc. Several techniques have been discussed in the literature to monitor and control home appliances smartly [11-13]. Cho et al. proposed a smart home computing platform that integrated a smart remote monitoring and control system with an automatic meter-reading approach to set up the SG in the home [11]. A Demand Response Smart Controller (DRSC) was proposed in [12] to control home appliances when the electricity unit price increases. A communication protocol based on in-home appliances connected over a home area network is proposed in [13], where two types of appliances are considered: real-time and schedulable. In addition, the architecture, standards, optimization, and framework of SG communication are elucidated in [14-16]. A multilevel framework for a trust model to be used throughout the electrical grid is proposed in [14]. Bouhafs et al. utilized data aggregation functions to reduce the communication traffic generated by sensors and thus save transmission bandwidth [15]. A smart home computing platform was described in [16]. Unfortunately, such approaches still cannot provide an overall solution for efficiently controlling the home appliances centrally in a large SG system.
In this paper, we consider a Zigbee-based HAN for the SG system to transfer the electricity consumption information, and propose a multilayer device control approach in order to reduce device power and bandwidth consumption. In addition, we provide a control algorithm whereby the control centre (CC) switches off the least necessary home appliances (with priorities defined by the consumers) of the entire SG system when sufficient electricity supply is not available. The control algorithm provides an efficient solution for controlling the home appliances centrally. The rest of the paper is organized as follows. Section 2 introduces the overall system model. Section 3 presents the proposed multilayer device control system in the HAN. Section 4 describes the proposed control algorithm for the electricity consumption of the entire SG system in the absence of sufficient electricity supply. Section 5 contains the experimental results, and finally Section 6 concludes the paper.
2   System Model
The overall structure of the proposed SG model is depicted in Fig. 1, and the constituents of the system are described in the following subsections.

2.1   Home Appliances
A home appliance is a power-consuming device in the SG, connected to a smart meter. The priority of a home appliance can be changed according to the consumer's necessity.
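To make the role of these priorities concrete: during a shortage, the smart meter switches appliances off in order of least necessity (see the smart meter functions below). A hypothetical sketch, with illustrative appliance names and the convention that a lower priority number marks a less necessary appliance:

```python
def shed_load(appliances, available_power):
    """Switch off the least necessary appliances until demand fits the supply.

    appliances: list of (name, demand_kw, priority) tuples, where a lower
    priority value marks a less necessary appliance.
    Returns the set of appliance names left switched on."""
    total = sum(demand for _, demand, _ in appliances)
    on = {name for name, _, _ in appliances}
    for name, demand, _ in sorted(appliances, key=lambda a: a[2]):
        if total <= available_power:
            break
        on.discard(name)       # shed this appliance
        total -= demand
    return on

appliances = [("fridge", 0.2, 10), ("heater", 2.0, 2), ("tv", 0.1, 5)]
```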
2.2   Smart Meter
A smart meter can be defined in different ways. In the proposed SG, the following functions of the smart meter are included besides its general functions:
1. Providing the electricity demand of the home appliances, where the statistics of the electricity demand of each individual home appliance are known to the meter, together with the priorities of the home appliances.
2. In the case of an electricity shortage, controlling (switching off/on) the home appliances according to their user-defined priority settings.

2.3   HAN
A HAN is utilized to collect the electricity consumption demand of the home appliances and supply the information to the upper layer. In the HAN, the home appliances are connected to smart meters. The smart meters are connected to smart devices through Zigbee nodes, and the smart devices communicate with the WAN by exploiting a Base Station (BS).

2.4   Smart Device
Although the definition of a smart device can vary, in our SG the term is used to imply a multifunctional device that acts as:
1. A cognitive device that manages bandwidth from the TV band for its data transmission and reception, associated with a Base Station (BS).
2. A home gateway that continuously supplies the electricity demand to the upper layer (BS), and sends electricity shortage information and control commands to the lower layer, i.e., the HAN.

2.5   WAN
In the proposed system model, the total WAN is divided into several clusters, where each cluster has a radius of 25 km from a BS and is based on IEEE 802.22. The IEEE 802.22 standard defines the physical (PHY) layer for a Wireless Regional Area Network (WRAN) that uses white spaces within the television bands [17].
3   The Proposed Multilayer Device Control Algorithm for Home Area Network (HAN)
In this section, we propose a multilayer device control algorithm for the HAN to save device power and bandwidth while transferring the electricity consumption information to the Control Centre (CC). The HAN is built with Zigbee devices owing to their advantages over other devices. The multilayer device control mechanism at every layer of the HAN is proposed to increase the efficiency of the system.
3.1   Zigbee Devices
A Zigbee system consists of a few components, the most basic being a device. A Zigbee device can be a full-function device (FFD) or a reduced-function device (RFD). The Zigbee network layer allows for star, peer-to-peer, and cluster-tree topologies. Zigbee devices may take only milliseconds to exit their sleep states, compared to Bluetooth or Wi-Fi devices. Shuan et al. observed that the Bluetooth and Zigbee protocols consume less power (for both transmission and reception) than the Wi-Fi and UWB technologies, and that Zigbee is superior to Bluetooth in this respect [3]. In addition, Zigbee provides a fair communication range of 10-100 meters while maintaining significantly low power (1-100 milliwatts) and thereby lower cost. Examples of this scenario in realistic cases can easily be found in the context of Zigbee networks [18, 19].
Fig. 1. Overall structure of the proposed SG system
In the proposed model, the HAN consists of a Smart Meter, Zigbee End Devices (ZED), Zigbee Routers (ZR) and a Zigbee Coordinator (ZC). The ZED is considered a layer-1 device, the ZR layer 2, the ZC layer 3, and the Smart Device (SD) layer 4. The ZED is physically inserted inside the Smart Meter. Each ZED is connected to a ZR, each ZR to the ZC, and the ZC to the SD. The Smart Device (SD) is used as a gateway to the WAN through the IEEE 802.22 WRAN. Figure 2 illustrates the cluster-tree structure of a HAN where one ZC (FFD) is attached to four ZRs (FFD). Ten ZEDs (RFD) are connected to each ZR; hence, a total of forty ZEDs are connected to each ZC through the ZRs.
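The cluster-tree arrangement described above (one ZC, four ZRs, ten ZEDs per ZR) can be sketched as a simple data structure; the class and function names below are illustrative assumptions, not from the paper:

```python
# Minimal sketch of the proposed HAN cluster tree: one Zigbee Coordinator (ZC)
# with four Zigbee Routers (ZR), each serving ten Zigbee End Devices (ZED).

class Node:
    def __init__(self, kind, ident):
        self.kind = kind          # "ZED", "ZR", or "ZC"
        self.ident = ident
        self.children = []

def build_han(num_zr=4, zed_per_zr=10):
    zc = Node("ZC", 0)
    for r in range(num_zr):
        zr = Node("ZR", r)
        zr.children = [Node("ZED", (r, d)) for d in range(zed_per_zr)]
        zc.children.append(zr)
    return zc

han = build_han()
total_zeds = sum(len(zr.children) for zr in han.children)
print(total_zeds)  # 40 ZEDs in total, matching the topology in Fig. 2
```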
336
T. Islam and I. Koo
Fig. 2. The proposed structure of a HAN based on Zigbee node
3.2
The Proposed Control Algorithm for Home Area Network
An intelligent power management approach for the HAN is proposed in [6]. We extend the intelligent system for multilayer device control by focusing on three methods:
1. Silent mode of an individual ZED
2. Silent mode of upper-layer devices in the HAN when the obtained power requirement is under a threshold value
3. Sending a beacon message to confirm devices' activeness during silent mode
Fig. 3. The flow chart of multilayer device control
3.2.1 Silent Mode of Individual Zigbee Devices
The flow chart of the control algorithm is shown in Fig. 3. In the system there are m ZEDs denoted by D_m (m = 1, 2, ..., 10) and n ZRs denoted by ZR_n (n = 1, 2, ..., 4). Let us assume that D_m is required to transmit the power requirement every T time, obtained from the power-requirement message of each smart meter. It is also assumed that its power requirement remains the same at times T_t and T_{t+1}. In this case, the ZED does not need to send its power-requirement message to the ZR and enters silent mode. Hence, the ZR assumes the power requirement is unchanged as long as D_m remains silent. In addition, since there is no way to distinguish whether the power requirement of a ZED is unchanged or the device is faulty, the ZED sends a 1-bit beacon message every T_{t+p} time.
3.2.2 Silent Mode of Upper-Layer Devices in HAN When the Power Requirement Is Under a Threshold Value
As the ZR receives a periodic signal every T_t time from the D_m devices, it analyzes the sum of the power requirements of all D_m devices. If it finds the change below a threshold value (defined by δ) upon termination of T_t time, it does not send any message to the ZC and becomes silent, sending a beacon message to the ZC every T_{t+p} time. The ZC follows the same procedure for the aggregate power-requirement messages received from all ZR_n and sends a beacon message to the SD every T_{t+p} time. The threshold value represents a very small change in electricity consumption between times T_t and T_{t+1}, which does not affect the total consumption. In this model, it is assumed that T_{t+p} > T_t. From these three procedures, the following benefits are obtained: 1. The control algorithm extends the lifetime of the devices while providing efficient services. 2. It also avoids sending unchanged data through the smart device, saving a portion of the bandwidth and reducing traffic.
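The silent-mode rules above can be summarized in a short sketch; the threshold δ, the beacon period, and all function names are illustrative assumptions, not the authors' implementation:

```python
# Illustrative sketch of the multilayer silent-mode logic (Sections 3.2.1-3.2.2).
# DELTA (the threshold) and BEACON_PERIOD are assumed parameters.

DELTA = 0.5          # threshold on the change in aggregate power requirement
BEACON_PERIOD = 5    # a 1-bit beacon is sent every T_{t+p} intervals while silent

def zed_report(prev_req, curr_req, silent_ticks):
    """A ZED transmits only when its power requirement changes; otherwise it
    stays silent and emits a 1-bit beacon every BEACON_PERIOD intervals."""
    if curr_req != prev_req:
        return ("report", curr_req)
    if silent_ticks > 0 and silent_ticks % BEACON_PERIOD == 0:
        return ("beacon", None)
    return ("silent", None)

def zr_forward(prev_total, reqs):
    """A ZR forwards the aggregate only when the change exceeds DELTA;
    otherwise the upper layer assumes the requirement is unchanged."""
    total = sum(reqs)
    if abs(total - prev_total) < DELTA:
        return ("silent", prev_total)
    return ("report", total)

print(zed_report(3.0, 3.0, 5))         # ('beacon', None)
print(zr_forward(100.0, [10.0] * 10))  # ('silent', 100.0)
```

The ZC would apply the same `zr_forward` rule to the aggregates received from its ZRs.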
4
Control Algorithm for Home Appliances by the Control Centre in the Whole SG
In this section, we propose an algorithm for the control centre (CC) to control the home appliances in an SG. When the electricity supply is insufficient, the CC controls the home appliances according to priorities set by the consumers. All the home appliances are connected to an SM along with their user-specified priority values. Thus, when the electricity supply is not sufficient, the least necessary devices are switched off.
4.1
Power Distribution and Allocation
Electricity is transferred from the source and distributed to consumers through transmission lines. The control centre is informed about electricity consumption from the lower layer (HAN) through different stages (HAN to SD, SD to BS, and BS to CC).
4.2
The Proposed Control Algorithm of Home Appliances by Control Centre
The device connections and the dependency model of the proposed SG system are shown in Fig. 4, which is a tree of order 6. From Fig. 4, we observe that a WAN with a CC consists of a number of clusters with a BS in each cluster. In this work, we consider M clusters in a WAN; thus the number of BSs is M. Under a BS there are N SDs, and each SD has Q ZRs through a ZC. In order to devise a control algorithm for the proposed SG system, we adopt the concept of dynamic programming. Dynamic programming is a mathematical optimization method that simplifies a large problem by breaking it down into simpler subproblems over a sequence of decision steps.
Fig. 4. The dependency model of the proposed SG in tree-structure
The steps of the proposed control algorithm are illustrated in Fig. 5. Suppose the total electricity demand is α and the amount of electricity supplied by the CC is β. The total electricity shortage is ζ, where ζ = α - β. If the CC is asked to supply α amount of electricity and it supports the full supply, then β = α, i.e., ζ = α - α = 0, which
indicates no shortage, and a solution is obtained without switching off any devices, as depicted in Fig. 5. If the CC supports only a partial amount of electricity β, then the electricity shortage is ζ > 0. Therefore, the duty of the control algorithm is to switch off the minimum number of home appliances by considering the priorities of the devices under each ZR, represented by the value l, where l = 1 indicates the least-priority devices (first-tier devices), l = 2 the second-least-priority devices (second-tier devices), and so on. The electricity consumption reduction by the l-th priority devices under the k-th ZR of the j-th SD of the i-th BS is denoted by P(Z_ijk)_l, where 1 ≤ k ≤ Q, 1 ≤ j ≤ N, and 1 ≤ i ≤ M. The following recurrence calculates the total reduction of electricity consumption after each iteration:

PR = PR + P(Z_ijk)_l    (1)
where PR indicates the electricity consumption reduction, initially set to PR = 0. Note that instead of assigning a fixed number to every BS, the control centre assigns each BS a random number from 1 to M. The BSs are then chosen starting from 1 to control the home appliances. In order to meet the condition ζ ≤ 0, the algorithm initially sets i=1, j=1,
Fig. 5. The flow chart of the proposed multi-tier control algorithm to control the home appliances in the SG by control centre
k=1, i.e., it checks and controls the first-tier home appliances starting from the first ZR, as illustrated in Fig. 5. Therefore, for i=1 and j=1, it checks every value of k (i.e., the devices connected to each ZR), incrementing by 1 and calculating the total power reduction for every increment. If PR ≥ ζ, then a solution is found. If the home appliances of all ZRs of the first SD yield no solution, the algorithm repeatedly adds to the previous reduction the reductions of the home appliances of the next SD for all ZRs (k = 1, 2, ..., Q). If all the home appliances of all SDs under one BS give no solution, all BSs are checked repeatedly to obtain the solution. If no solution is found by switching off the first-tier devices of all BSs (i.e., all home appliances having least priority), the process is repeated from the starting point for the second tier (i.e., all home appliances having second-least priority in all BSs). If the second tier yields no solution, the process moves to the third tier, and so on. Thus, the proposed control algorithm generates the solution by repeatedly controlling the devices in a multi-tier approach.
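The multi-tier search described above amounts to nested loops over tiers l, BSs i, SDs j, and ZRs k, accumulating PR via Eq. (1) until PR ≥ ζ. A minimal sketch, in which the four-dimensional data layout `P[l][i][j][k]` is an illustrative assumption:

```python
# Sketch of the proposed multi-tier control algorithm (Fig. 5).
# P[l][i][j][k] gives the consumption reduction from switching off the
# tier-l appliances under ZR k of SD j of BS i -- an assumed data layout.

def multi_tier_control(P, shortage):
    """Switch off appliances tier by tier until the accumulated reduction
    PR covers the shortage zeta; returns the switched-off units and PR."""
    if shortage <= 0:
        return [], 0.0            # full supply: no device is switched off
    pr = 0.0
    switched_off = []
    for l, tier in enumerate(P):              # priority tiers (least first)
        for i, bs in enumerate(tier):         # base stations
            for j, sd in enumerate(bs):       # smart devices
                for k, reduction in enumerate(sd):   # ZRs
                    pr += reduction           # Eq. (1): PR = PR + P(Z_ijk)_l
                    switched_off.append((l, i, j, k))
                    if pr >= shortage:
                        return switched_off, pr
    return switched_off, pr       # shortage not fully covered

# Toy example: 1 tier, 1 BS, 2 SDs, 2 ZRs each, 3 units of reduction per ZR
P = [[[[3.0, 3.0], [3.0, 3.0]]]]
off, pr = multi_tier_control(P, shortage=7.0)
print(len(off), pr)  # 3 9.0
```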
5
Simulation Results
5.1
Device Power Saving
As stated earlier, Zigbee provides a fair communication range of 10-100 meters while maintaining significantly low power (1-100 milliwatts). In this model, it is assumed that the power consumption of a ZED, ZR and ZC is 5 milliwatts, 35 milliwatts and 60 milliwatts, respectively.

Table 1. Number of power-consuming devices and power consumption assumption

Device type | Number of devices | Power consumption of each device | Considered power-consuming devices (each of 5 mW) | Power consumption (mW)
ZED         | 40                | 5 mW                              | 40                                                | 200
ZR          | 4                 | 35 mW                             | 28                                                | 140
ZC          | 1                 | 60 mW                             | 12                                                | 60
Total       |                   |                                   | 80                                                | 400
The total number of power-consuming devices is thus considered to be 80, each consuming 5 milliwatts. Since each ZR consumes 35 milliwatts, one ZR is counted as seven power-consuming devices of 5 milliwatts each. As every ZC consumes 60 milliwatts, one ZC is counted as twelve power-consuming devices of 5 milliwatts each. The total number of power-consuming devices under a smart device of the proposed system is given in Table 1. The power consumption is shown in Fig. 6. As evident from this graph, the conventional power consumption system consumes 100% power.
Under the assumption of ten ZEDs in silent mode, the power consumption is reduced to 87.5%. In addition, 78.75% power consumption is obtained for the silent approach of ten ZEDs and one ZR. Furthermore, 70% and 62.5% power consumption are observed for ten ZEDs and two ZRs in silent status, and for ten ZEDs, one ZR and one ZC in silent mode, respectively. To keep the simulation result conservative, 62.5% is assumed as the minimum case, even though lower percentages may occur. From all the reduced-power-consumption analyses, a power-consumption reduction of at least one fourth (on average) is expected for the proposed method.
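The device-equivalence figures in Table 1 and the first three reduced-consumption percentages follow directly from the stated per-device consumptions (5/35/60 mW for ZED/ZR/ZC); a minimal arithmetic check, assuming simple linear summation of device power:

```python
# Reproducing the power accounting of Table 1 and Section 5.1 from the
# stated per-device consumptions (assumption: device powers add linearly).

ZED_MW, ZR_MW, ZC_MW = 5, 35, 60
N_ZED, N_ZR, N_ZC = 40, 4, 1

total_mw = N_ZED * ZED_MW + N_ZR * ZR_MW + N_ZC * ZC_MW
equivalent_devices = N_ZED + N_ZR * (ZR_MW // ZED_MW) + N_ZC * (ZC_MW // ZED_MW)
print(total_mw, equivalent_devices)  # 400 80

def remaining_pct(saved_mw):
    """Percentage of the conventional consumption that remains after saving."""
    return 100 * (total_mw - saved_mw) / total_mw

print(remaining_pct(10 * ZED_MW))              # 87.5  (ten ZEDs silent)
print(remaining_pct(10 * ZED_MW + ZR_MW))      # 78.75 (ten ZEDs + one ZR)
print(remaining_pct(10 * ZED_MW + 2 * ZR_MW))  # 70.0  (ten ZEDs + two ZRs)
```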
Fig. 6. Number of power consuming devices and their power consumption saving in the proposed system model
5.2
Bandwidth Saving
Note that, unlike the power consumption of the devices, the bandwidth consumption is related only to the data of the ZEDs, since a ZED sends the electricity consumption of the home appliances collected from the SM. Therefore, in the case of bandwidth consumption, the relevant devices are the ZEDs only. In the proposed system model, the total electricity consumption of the forty ZEDs is transferred to other nodes in the WRAN through the SD. While sending the electricity consumption information, the SD does not transmit all the information collected from the ZEDs; rather, it forwards only the changed amount of electricity consumption collected from its lower layer (ZC) in the whole network. Consequently, a portion of bandwidth saving is expected while transmitting the information through the SD.
6
Conclusion
We have demonstrated an intelligent multilayer HAN control system for power control. Moreover, a control algorithm is proposed for controlling the entire SG system when the system suffers from an insufficient electricity supply. We have evaluated our simulation results for device power saving while transferring the electricity consumption
information of home appliances under different configurations, which demonstrates the efficiency of the system. By proper utilization of our proposed intelligent system and algorithm, it is possible to manage the whole SG system in a more efficient manner. We expect that our work contributes towards the development of SG systems and opens prospects for future research.
References [1] Gungor, V.C., Hancke, G.P.: Industrial wireless sensor networks: Challenges, design principles and technical approaches. IEEE Trans. Ind. Electron. 56, 4258–4265 (2009) [2] Yang, Y., Lambert, F., Divan, D.: A survey on technologies for implementing sensor networks for power delivery systems. In: IEEE Power Eng. Soc. Gen. Meeting, pp. 1–8 (June 2007) [3] Niyato, D., Xiao, L., Wang, P.: Machine to machine communication for Home energy management system in Smart Grid. IEEE Communications Magazine 49, 53–59 (2011) [4] Hwang, Z., Choi, B., Kang, S.: Enhanced Self-Configuration Scheme for a Robust Zigbee based Home Automation. IEEE Transactions on Consumer Electronics 56(2), 583–590 (2010) [5] Jianpo, L., Xuning, Z., Ning, T., Jisheng, S.: Study on ZigBee Network Architecture and Routing Algorithm. In: 2nd International Conference on Signal Processing Systems (ICSPS), vol. 2, pp. V2-389–V2-393. IEEE (August 2010) [6] Fadlullah, Z.M., Fouda, M.M., Kato, N., Takeuchi, A., Iwasaki, N., Nozaki, Y.: Toward intelligent m2m communications in smart grid. IEEE Communications Magazine 49(4), 60–65 (2011) [7] Zhang, Y., Rong, Y., Shengli, X., Wenqing, Y., Yang, X., Guizani, M.: Home M2M Networks: Architectures, Standards, and QoS Improvement. IEEE Communications Magazine 49(4), 44–52 (2011) [8] Starsinic, M.: System Architecture Challenges in the Home M2M Network. In: Applications and Technology Conference (LISAT), pp. 1–7. IEEE (May 2010) [9] Parikh, P.P., Kanabar, M.G., Sidhu, T.S.: Opportunities and challenges of wireless communication technologies for smart grid applications. In: Power and Energy Society General Meeting, pp. 1–7. IEEE (September 2010) [10] Alahmad, M., Wheeler, P., Schwer, A., Eiden, J., Brumbaugh, A.: A Comparative Study of Three Feedback Devices for Residential Real-Time Energy Monitoring. IEEE Transactions on Electronics (99) (August 2011) [11] Choi, I.H., Lee, J.H.: Development of smart controller with demand response for AMI connection. 
In: International Conference on Control Automation and Systems. IEEE (December 2010) [12] Cho, H.S., Yamazaki, T., Minsoo, H.: Determining Location of Appliances from Multihop Tree Structures of Power Strip Type Smart Meters. IEEE Transactions on Consumer Electronics 55(4) (November 2009) [13] Xiong, G., Chen, C., Kishore, S., Yener, A.: Smart (in-home) power scheduling for demand response on the smart grid. IEEE PES Innovative Smart Grid Technologies, 1–7 (April 2011) [14] Overman, T.M., Sackman, R.W.: High Assurance Smart Grid: Smart Grid Control Systems Communications Architecture. In: First IEEE International Conference on Smart Grid Communications, pp. 19–24 (November 2010)
[15] Bouhafs, F., Merabti, M.: Managing communications complexity in the smart grid using data aggregation. In: 7th International Conference on Wireless Communications and Mobile Computing (IWCMC), pp. 1315–1320. IEEE (September 2011) [16] Al-Ali, A.R., El-Hag, A.H., Dhaouadi, R., Zainaldain, A.: Smart home gateway for smart grid. In: International Conference on Innovations in Information Technology (June 2011) [17] Ghassemi, A., Bavarian, S., Lampe, L.: Cognitive Radio for Smart Grid Communication. In: First IEEE International Conference on Smart Grid Communications, pp. 297–302 (November 2010) [18] Buratti, C.: Performance Analysis of IEEE 802.15.4 Beacon-Enabled Mode. IEEE Transactions on Vehicular Technology 59(4), 2031–2045 (2010) [19] Chalhoub, G., Misson, M.: Cluster-tree based energy efficient protocol for wireless sensor networks. In: International Conference on Networking, Sensing and Control. IEEE (May 2010)
Analysis on Interference Impact of LTE on DTV Inkyoung Cho1,3, Ilkyoo Lee2, and Younok Park3 1
Department of Information & Communication, College of Engineering, Kongju National University, Budae-dong, Cheonan, Chungnam, 330-717, Korea 2 Department of Electrical, Electronic & Control, College of Engineering, Kongju National University, Budae-dong, Cheonan, Chungnam, 330-717, Korea 3 Mobile Packet Transmission Research Team, Electronics and Telecommunications Research Institute, 138 Gajeongno, Yuseong-gu, Daejeon, 305-350, Korea [email protected]
Abstract. TV White Spaces (TVWS) are freed up after the transition from analog television to Digital Television (DTV). Some wireless communications are allowed to operate in TVWSs. Because TVWSs are located in the VHF and UHF bands, they can provide significantly better coverage and wall penetration inside buildings. Therefore, this paper assumes that Long Term Evolution (LTE) will be deployed in TVWSs. However, the interference impact of LTE on DTV has to be taken into account. The Spectrum Engineering Advanced Monte-Carlo Analysis Tool (SEAMCAT) is used to obtain the guard band and protection distance for a 5% interference probability. As a result, with a 4 MHz guard band, when the assumed emission mask of the LTE BS is used, the protection distance is reduced to 2 km. If the assumed emission mask of the LTE MS is used, the protection distance between the reference LTE MS and the DTV receiver is 0 when the guard band is 8 MHz. The analysis results may offer a reference and be helpful for considering interference between DTV and other communication systems. Keywords: Long Term Evolution, DTV, Guard Band, Protection Distance, TV White Spaces.
1
Introduction
TV White Spaces (TVWS) are unused TV broadcast channels which can be made available to wireless communication systems. In particular, more TVWSs are freed up after the transition from analog to digital TV. Because TVWSs are located in the VHF and UHF bands, they have several important properties that make them highly desirable for wireless communications, as follows [1]: excellent propagation, the ability to penetrate buildings and foliage, non-line-of-sight connectivity, and broadband payload capacity. Therefore, TVWS channels can be used in certain locations by certain devices, such as Long Term Evolution (LTE), Worldwide Interoperability for Microwave Access (WiMAX), wireless microphones, etc. This paper assumes that LTE is operating on adjacent channels in TVWSs. Also, the specified spectrum emission mask and the assumed spectrum emission mask of the LTE BS and MS are T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 344–350, 2011. © Springer-Verlag Berlin Heidelberg 2011
taken into consideration. The impact of LTE potentially interfering with DTV is analyzed by using the Spectrum Engineering Advanced Monte Carlo Analysis Tool (SEAMCAT), based on the Monte-Carlo simulation method, which was developed within the framework of the European Conference of Postal and Telecommunications Administrations (CEPT). The protection distance and the guard band are determined through this analysis.
2
System Description
2.1
Interference Link
The 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE) is the latest standard in the mobile network technology tree that produced the GSM/EDGE and UMTS/HSPA network technologies [2][3][4]. It is a project of 3GPP, operating under a name trademarked by one of the associations within the partnership, the European Telecommunications Standards Institute (ETSI). The main advantages of LTE are high throughput, low latency, plug and play, FDD and TDD on the same platform, an improved end-user experience, and a simple architecture resulting in low operating costs. LTE will also support seamless handover to cell towers with older network technologies such as GSM, UMTS, and CDMA2000.

Table 1. Characteristics of LTE

Characteristic | Value
Duplex | FDD
Carrier Frequency (DL) | 595 MHz (Channels 35, 36)
Carrier Frequency (UL) | 579 MHz (Channels 32, 33)
Bandwidth | 10 MHz
Thermal Noise | -174 dBm/Hz
I/N | -10 dB
LTE Link Coverage requirement | Log-normal shadowing = 10 dB [6]
Building Penetration Loss | 8 dB [7]
Propagation Model | Macro cell propagation model, Urban [8]
Coverage Radius | 2.8668 km
Inter-Site Distance | 4.9654 km
Sectorization | Tri-sector antennas
Minimum Coupling Loss | 70 dB
Number of Available Resource Blocks (M) | 24
Number of Resource Blocks per UE (N) | 1
Number of Active UEs per Cell (K) | 24 (K=M/N), fully loaded system assumed
Minimum subcarrier usage per Base Station | 100%
Bandwidth of Resource Block | 375
Hand Over (HO) Margin | 3 dB
346
I. Cho, I. Lee, and Y. Park
The next step in LTE evolution is LTE-Advanced, currently being standardized in 3GPP Release 10 [5]. LTE has introduced a number of new technologies compared to previous cellular systems. They enable LTE to operate more efficiently with respect to the use of spectrum and to provide the much higher data rates that are required. The main parameters of LTE are summarized in Table 1. The specified spectrum emission mask and the assumed spectrum emission mask of the LTE BS and MS are illustrated in Figure 1 (relative power in dBc versus frequency offset in MHz).

Fig. 1. Emission mask of LTE BS and MS
2.2
Victim Link
Digital Television (DTV) is an advanced broadcasting technology that transmits audio and video by digital signals.

Table 2. Characteristics of DTV

Characteristic | Value
Transmit power ERP | 4 kW (66 dBm)
Frequency band | 587 MHz (Channel 34)
Bandwidth | 6 MHz
Tx antenna height | 100 m
Tx antenna gain | 0 dBi
Rx antenna height | 10 m
Rx antenna gain | 10 dBi
Noise Figure | 10 dB
Sensitivity | -83 dBm
C/I | 23 dB
Transmit standard | 8-VSB
Modulation | FM or QPSK

In contrast to the analog signals used by analog TV, DTV
has several advantages over analog TV, such as requiring less bandwidth, providing high-definition television service, and providing multimedia or interactivity [9]. Therefore, many countries are replacing over-the-air analog television broadcasts with digital television to allow other uses of the radio spectrum formerly used for analog TV broadcasts. In this paper, DTV adopts the US DTV standard (ATSC); the main relevant characteristics of DTV are summarized in Table 2 [10].
3
Methodology and Scenario
3.1
Methodology
A new statistical simulation model based on the Monte-Carlo method has been developed by the European Conference of Postal and Telecommunications Administrations (CEPT), named the Spectrum Engineering Advanced Monte Carlo Analysis Tool (SEAMCAT). Figure 2 illustrates the principle of calculating the interference probability at the victim receiver in SEAMCAT. When interference is introduced, it adds to the noise floor. The difference between the desired received signal strength (dRSS) and the interfering received signal strength (iRSS), measured in dB, is defined as the signal-to-interference ratio (C/I_trial). This ratio must exceed the required C/I threshold (C/I_target) if interference is to be avoided. The Monte Carlo simulation methodology checks for this condition and records whether or not interference is occurring.
Fig. 2. Illustrative summary of the interference criteria computation
SEAMCAT calculates the probability of interference (PI) of the victim receiver as follows:
P_I = 1 - P_NI    (1)

where P_I is the probability of interference at the victim receiver and P_NI is the probability of non-interference (NI) at the victim receiver. When a C/I criterion is considered, P_NI is defined as:

P_NI = P(dRSS / iRSS ≥ C/I | dRSS > sensitivity)    (2)

By the definition P(A|B) = P(A∩B)/P(B), P_NI becomes:

P_NI = P(dRSS / iRSS ≥ C/I, dRSS > sensitivity) / P(dRSS > sensitivity)    (3)

with the composite iRSS = Σ_{j=1}^{P} iRSS_j, where P is the number of interferers (i.e., active interfering transmitters).
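The criterion in Eqs. (1)-(3) amounts to a Monte Carlo estimate: draw dRSS and iRSS, condition on dRSS exceeding the receiver sensitivity, and count the trials whose C/I falls below the target. The sketch below assumes log-normal (Gaussian-in-dB) signal distributions and the parameter values purely for illustration; they are not the SEAMCAT setup used in the paper:

```python
# Illustrative Monte Carlo estimate of the interference probability P_I,
# following Eqs. (1)-(3): a trial interferes when C/I_trial < C/I_target,
# given that the desired signal exceeds the receiver sensitivity.
# The dRSS/iRSS distributions and means below are assumptions.
import random

def estimate_pi(trials=100_000, ci_target_db=23.0, sensitivity_dbm=-83.0):
    random.seed(1)                         # fixed seed for reproducibility
    interfered = usable = 0
    for _ in range(trials):
        drss = random.gauss(-70.0, 8.0)    # desired signal (dBm), assumed
        irss = random.gauss(-100.0, 10.0)  # interfering signal (dBm), assumed
        if drss <= sensitivity_dbm:
            continue                       # condition on dRSS > sensitivity
        usable += 1
        if drss - irss < ci_target_db:     # C/I_trial below C/I_target
            interfered += 1
    return interfered / usable             # P_I = 1 - P_NI, per Eq. (1)

print(round(estimate_pi(), 3))
```

In the paper's study, the guard band and the BS-to-receiver distance are varied until this probability drops to the 5% required by the DTV receiver.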
3.2
Interference Scenario
Channel 34 in the DTV bands is assumed to be allocated to DTV. It is assumed that LTE uses Frequency Division Duplexing (FDD); the upper 4 MHz of channel 32 and channel 33 are assumed to be allocated to the LTE uplink (UL), and channel 35 and the lower 4 MHz of channel 36 to the downlink (DL). Figure 3 illustrates the scenario of the interference impact of LTE on the DTV receiver.
Fig. 3. Scenario of LTE interfering with DTV
4
Simulation Results and Analysis
4.1
The Case of LTE BS Interfering with DTV
In the case of the LTE BS interfering with the DTV receiver, the main setups for simulation in SEAMCAT are summarized. The evaluation of the relationship between the guard band and the protection distance is conducted in SEAMCAT. For meeting the interference probability of 5% required by the DTV receiver at the maximum allowable transmit power of the LTE BS, the relationship between the guard band and the
protection distance is illustrated in Figure 4, when the specified spectrum emission mask and the assumed spectrum emission mask of the LTE BS are used, respectively.

Fig. 4. The relationship between the guard band and the protection distance in the case of the LTE BS

Figure 4 shows that when the interference probability of 5% required by the DTV receiver and the maximum allowable transmit power of the LTE BS are required, and a 4 MHz guard band is used, the protection distance must be more than 14 km under the specified emission mask of the LTE BS. When the assumed emission mask of the LTE BS is used, the protection distance is reduced to 2 km.
4.2
The Case of LTE MS Interfering with DTV
In the case of the LTE MS interfering with the DTV receiver, the main setups for simulation in SEAMCAT are summarized. For meeting the interference probability of 5% required by the DTV receiver, the relationship between the guard band and the protection distance in the case of the LTE MS interfering with DTV is illustrated in Figure 5, when the specified spectrum emission mask and the assumed spectrum emission mask of the LTE MS are used, respectively.

Fig. 5. The relationship between the guard band and the protection distance in the case of the LTE MS

Figure 5 shows that with a 4 MHz guard band, when the specified emission mask of the LTE MS is used, the protection distance should be at least 13 km. But if the
assumed emission mask of LTE MS is used, the protection distance between the reference LTE MS and DTV receiver is 0 when the guard band is 8 MHz.
5
Conclusions
The scenario of LTE in TVWSs potentially interfering with DTV is assumed. The protection distance and the guard band for protecting DTV from LTE interference are analyzed using SEAMCAT. In this study, the worst case is taken into account. If the interference probability of 5% required by the DTV receiver and the maximum allowable transmit power of the LTE BS are required, then with a 4 MHz guard band the protection distance must be more than 14 km when the specified emission mask of the LTE BS is used. But when the assumed emission mask of the LTE BS is used, the protection distance is reduced to 2 km. If the specified emission mask of the LTE MS is used, the protection distance should be at least 13 km. But if the assumed emission mask of the LTE MS is used, the protection distance between the reference LTE MS and the DTV receiver is 0 with a guard band of 8 MHz. The results can serve as a guideline and reference in planning for the coexistence of LTE in TVWSs and DTV.
References 1. Ofcom, Digital Dividend Review: 550-630MHz and 790-854MHz, Consultation on detailed award design (2008) 2. WGSE - SEAMCAT Technical Group, “OFDMA algorithm description” (2010) 3. 3GPP LTE Encyclopedia,“An Introduction to LTE” (2010) 4. Motorola, "Long Term Evolution (LTE): A Technical Overview" (2010) 5. 3GPP LTE Encyclopedia, “LTE – An End-to-End Description of Network Architecture and Elements” (2009) 6. Digital Video Broadcasting (DVB), “DVB-H Implementation Guidelines” (2009) 7. MWG/AWG, “A comparative Analysis of Spectrum Alternatives for WiMAXTM Networks Based on the U.S.700MHz Band”, WiMAX Forum, p. 19 (June 2008) 8. 3rd Generation Partnership Project, Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access (E-UTRA); Radio Frequency (RF) system scenarios(Release 10), 3GPP TR 36.942 V10.1.0, pp. 14–15, 39, 76–77 (2010) 9. White Spaces Report 2Q, “United States TV White Spaces: Usage & Availability Analysis” (2010) 10. Kim, S.-K.: Interference Analysis based on the Monte-Carlo Method, p. 61 (2008)
An Ontology Structure for Semantic Sensing Information Representation in Healthcare Services Rajani Reddy Gorrepati and Do-Hyeun Kim Dept. of Computer Engineering, Jeju National University, Jeju, Republic of Korea [email protected], [email protected]
Abstract. Sensing data should be converted into semantic and context information to support customized healthcare services. Therefore, ontology and rules are used for the semantic representation of sensing data. In this paper, we propose an ontology structure for semantic sensor information and context-aware reasoning, which is executed using EEG (Electroencephalogram), MRI (Magnetic Resonance Imaging), pill-camera and temperature sensors for customized healthcare services. This paper presents rules and the semantic sensing information representation for customized healthcare services. An ontology structure for semantic sensor information was also developed in order to recognize a situation and provide a customized healthcare service for patients. Keywords: Context-aware, Ontology structure, Healthcare, Semantic sensor information.
1
Introduction
Recently, there is an increasing need for information exchange among healthcare information systems. However, the existence of disparate application platforms and standards has significantly complicated the process required to do so. Information technology, combined with recent advances in networking, mobile communications, and wireless medical sensor technologies, offers great potential to support healthcare professionals and to deliver remote healthcare services. In recent years, there have been substantial increases in the volume and complexity of data and knowledge available to the medical research community. To enable the use of this knowledge in healthcare services, users generally require an integrated view of medical data across a number of data sources. Consequently, they require a simplified mechanism of the healthcare service used for health management. The healthcare industry has often been fraught with multiple disparate software packages, each specialized to a certain sub-domain within the industry. This has created a less-than-ideal situation, with many organizations possessing multiple instances of the same data spread throughout their enterprises in such a way that integration of those resources is non-trivial. The development of ontologies is seen as central in all of these efforts. An ontology has metadata providing a controlled vocabulary of terms. Through common domain theories, T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 351–357, 2011. © Springer-Verlag Berlin Heidelberg 2011
352
R.R. Gorrepati and D.-H. Kim
ontology helps both people and machines to communicate more effectively. Ontologies will therefore have a crucial role in enabling content-based access, interoperability and communication across the Web. The standardization method for data modeling consists of, firstly, proposing a sensor-data-based context ontology standard using Protégé and OWL (Web Ontology Language) [1]. As Protégé-OWL includes a SWRL (Semantic Web Rule Language) editor, it permits editing both SWRL rules and the OWL ontology [2]. In some applications, rules are devoted to a specific task which can be achieved independently of the ontology. In such cases, it is possible to use two distinct languages with specific inference engines: one for the structural part and another for the rule component. Secondly, a knowledge interface technology for healthcare services is proposed. Additionally, through the rules for healthcare services, we propose a context-aware system in order to provide customized healthcare services to users [3]. In this paper, we present an ontology structure for semantic sensor information, developed in order to recognize a situation and provide a customized healthcare service for patients. The proposed structure aims at a semantic representation of brain activities using EEG (Electroencephalogram), MRI (Magnetic Resonance Imaging), CT scan, fMRI and a pill camera. The structure being developed relies on two components: an ontology for the structural knowledge, that is, the brain entities and properties, and a rule base representing the interdependencies between the properties. Section 2 introduces related work. Section 3 proposes an ontology structure for semantic sensor information for healthcare. Finally, we conclude with some possible perspectives for fulfilling the needs of the application.
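The two-component design (a structural ontology plus a rule base) can be illustrated with a toy fact base and one SWRL-style rule. The property names, the fever rule, and the threshold below are invented for illustration and are not from the paper's ontology:

```python
# Toy illustration of the ontology + rule split: structural facts (subject,
# predicate, object triples from sensor observations) plus a rule that
# derives a context. The "fever" rule and 38.0 threshold are assumptions.

facts = [
    ("patient1", "hasBodyTemperature", 38.6),  # from a temperature sensor
    ("patient1", "hasSensor", "EEG"),
]

def fever_rule(facts):
    """SWRL-style rule: Patient(?p) ^ hasBodyTemperature(?p, ?t) ^
    swrlb:greaterThan(?t, 38.0) -> hasContext(?p, Fever)."""
    derived = []
    for subj, pred, obj in facts:
        if pred == "hasBodyTemperature" and obj > 38.0:
            derived.append((subj, "hasContext", "Fever"))
    return derived

print(fever_rule(facts))  # [('patient1', 'hasContext', 'Fever')]
```

In a real Protégé-OWL workflow the structural part would live in the OWL ontology and the rule in the SWRL editor; this sketch only mimics the separation.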
2
Related Works
In medical informatics research, the major challenges of the semantic web are the provision of controlled medical services within clinical information systems and semantic web interoperability. Several authors have proposed context-aware modeling using ontologies based on the semantic web [4-6]. Electrical activity in the human brain was first reported in 1929 by Hans Berger, who recorded the electrical potential variations from the scalp and introduced the term EEG to describe the graphical time-domain signal resulting from these changes in electrical potential [7]. The endocrine system controls most of the body's housekeeping needs, such as temperature regulation, appetite for food, aggression, and pleasure [8]. MRI studies have confirmed that the brain region which controls finger movement in the hand increases in size in string players engaged in specific 'fingering' exercises [9]. It leads to degenerative diseases such as atherosclerosis, hypertension, diabetes, and cancer, and to disorders such as post-traumatic stress syndrome and depression [10].
An Ontology Structure for Semantic Sensing Information Representation
3
A Proposed Ontology Structure for Semantic Sensing Information Representation in Healthcare Services
Context sensors collect information about brain disorders for healthcare. With context analysis, we can implement context awareness of brain disorders. We present the layers of the context data modeling ontology, which is the basis for representing and analyzing brain disorders and diseases. The context data are categorized into five layers: the physical sensor layer, the event layer, the semantic layer, the awareness layer, and the service layer. These layers also include the database and the inference rules. The physical sensor layer is the source of context data; this layer serves as an abstraction between the physical world and the semantic data, and the awareness layer provides the description of complex facts through the fusion of context services. Each sensor corresponds to a type of context layer. Figure 1 shows the hierarchical ontology structure for semantic sensing information representation in healthcare services.

[Figure 1 depicts the five layers: a physical layer with sensors (thermometer, bracelet sensor); an event layer; a semantic layer with the database of patient, doctor, and nurse profiles and the health report; an awareness layer; and a service layer with identification, healthcare, hospital care, surgical, emergency, and drug treatment services. Rules 1-3 connect the layers; the rules are restated in Table 1.]

Fig. 1. Hierarchical ontology structure for semantic sensing information representation in healthcare services
The physical layer includes the description of the patient information, the causes of the brain disorders and diseases, the family history of the patient, normal or abnormal images of the inside of the brain, etc. The semantic layer can be used to characterize the situation of an entity. The awareness layer uses context to provide relevant information and services to the user, where relevancy depends on the user's particular task [11, 12].
An event layer should automatically recognize situations based on various sensors. Semantic sensing information representation is very important in context-aware systems because it provides semantic information for intelligent services [13]; it is therefore a key feature of the context-aware system. A semantic sensing information representation should provide rules, databases, and context-aware interfaces, and should describe the relationship between the domain vocabulary and the concepts of the domain knowledge. Several representation techniques exist, such as logic-based modeling, graphical modeling, and ontology-based modeling. The ontology-based approach is very powerful and well suited to this environment. An ontology includes the definitions of the basic concepts in the domain and the relationships among taxonomies. It provides a shared understanding of the structure of descriptive information and enables reuse of domain knowledge. We present some preliminary results on the brain ontology system, which is concerned with the collection, presentation, and use of knowledge in the form of an ontology. It is related to brain functions, brain diseases, their genetic basis, and the relationships among all of them.

Table 1. Rules for triggering semantic information in healthcare services

Application              Rule for semantic information
Hospital care services   Attached to(Patient ^ Physician ^ Nurse, Bracelet Sensor) → Identifies(Patient Profile ^ Physician Profile ^ Nurse Profile ^ Health report)
Healthcare services      Measured(Body temperature, Thermometer) ^ condition(Body temperature, low ∨ High) → takes(patient, Medicine)
Identification services  Attached to(Patient, EEG sensors) ^ measures(EEG sensors, Brain electrical activity) → Identifies(Neurological disorders)
It is thus possible to provide personal healthcare services to users. The semantic database manages the context store and context queries, because the system environment is resource-restricted. The most common symptoms of brain tumors are: changes in speech, vision, or hearing; problems balancing or walking; changes in mood, personality, or ability to concentrate; problems with memory; muscle jerking or twitching; and numbness or tingling in the arms or legs. For a CT scan, an x-ray machine linked to a computer takes a series of detailed pictures of the head; the patient may receive an injection of a special dye so the brain shows up clearly, and the pictures can show tumors in the brain. For MRI, a powerful magnet linked to a computer makes detailed pictures of areas inside the body; these pictures are viewed on a monitor and can also be printed, and sometimes a special dye is injected to help show differences in the tissues of the brain. The pictures can show a tumor or other problem in the brain. The rules for triggering semantic information are based on the functions of the context-aware framework [14]. Examples of the rules are shown in Table 1.
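The triggering rules of Table 1 can be read as condition-action pairs over context facts. As an illustration only (the predicate names and the "normal" temperature band below are our assumptions, not values given in the paper), the healthcare-service rule might be sketched as:

```python
# Illustrative sketch of rule triggering over context facts.
# Predicate names and the 35.5-37.5 C "normal" band are assumptions
# made for this example, not values from the paper.

def measured(context, quantity, sensor):
    """True if `sensor` has reported a value for `quantity`."""
    return (quantity, sensor) in context["measurements"]

def condition_abnormal(context, quantity, low=35.5, high=37.5):
    """True if the measured value is below `low` or above `high`."""
    value = context["values"].get(quantity)
    return value is not None and (value < low or value > high)

def healthcare_rule(context):
    """Measured(Body temperature, Thermometer) ^ condition(Body
    temperature, low v High) -> takes(patient, Medicine)."""
    if (measured(context, "body_temperature", "thermometer")
            and condition_abnormal(context, "body_temperature")):
        context["actions"].append(("takes", "patient", "medicine"))

ctx = {
    "measurements": {("body_temperature", "thermometer")},
    "values": {"body_temperature": 38.2},
    "actions": [],
}
healthcare_rule(ctx)
print(ctx["actions"])  # [('takes', 'patient', 'medicine')]
```

In a deployed system these conditions would be expressed as SWRL rules over the OWL ontology rather than Python functions; the sketch only shows the control flow.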
Ontologies are used in order to make assumptions explicit and to separate domain knowledge from operational knowledge. An ontology has the advantages of knowledge sharing, logical inference, and knowledge reuse. If a system uses an ontology, it can provide generally expressive concepts, offer syntactic and semantic interoperability, and, by mapping concepts across different ontologies, share structured information. An ontology is a good candidate for expressing context and domain knowledge [15]. Many ontology languages exist, including Resource Description Framework Schema (RDFS), DAML+OIL, and OWL. OWL is a key to the semantic web and was proposed by the Web Ontology Working Group of the W3C [16]. OWL is a language for defining web ontologies and is more expressive than other ontology languages such as RDFS. OWL is based on RDF [17]. RDF embodies the idea of identifying objects using web identifiers and describing resources in terms of simple properties and property values, formed as triples. This approach makes it possible to produce a new metadata-generating procedure using semantic relationships.
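RDF's subject-property-value statements are easy to prototype. The following minimal sketch stores triples and answers simple pattern queries; it is plain Python, not a real RDF library, and the `ex:` identifiers are invented for illustration (a real system would use full IRIs and an RDF/OWL toolkit):

```python
# Minimal in-memory triple store illustrating RDF-style
# subject-predicate-object statements. Identifiers are invented
# for this example.

triples = {
    ("ex:BrainDisorder", "rdfs:subClassOf",  "ex:HealthData"),
    ("ex:EEG",           "rdfs:subClassOf",  "ex:HealthData"),
    ("ex:BrainDisorder", "owl:sameAs",       "ex:EEG"),
    ("ex:patient42",     "ex:hasHealthData", "ex:BrainDisorder"),
}

def query(s=None, p=None, o=None):
    """Return the triples matching the pattern; None is a wildcard."""
    return {t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)}

# All asserted subclasses of ex:HealthData:
subs = {s for s, _, _ in query(p="rdfs:subClassOf", o="ex:HealthData")}
print(sorted(subs))  # ['ex:BrainDisorder', 'ex:EEG']
```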
Fig. 2. An ontology structure of semantic sensor information and reasoning for healthcare services
Figure 2 describes the hierarchy of the brain context's health data and its sameAs relations. We designed the person's health data to cover brain disorders, brain tumors, neurological disorders, emergency services, etc. As shown in Figure 2, the brain disorder context and the EEG context are subclasses of the health data context. Moreover, the brain disorder context and the EEG
context have a sameAs relation. Because we use a sameAs relation, we can interpret any context provider that can observe either the brain disorder or the EEG context. In our context model, the patient context is a subclass of the person context, and the patient context has a health data context. The obtained semantic data and the corresponding rules are used to develop the ontology environment for brain activity using Protégé. Classification is used to infer specialization relationships between classes from their formal definitions. Basically, a classifier takes a class hierarchy including logical expressions and returns a new class hierarchy that is logically equivalent to the input hierarchy.
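A full OWL classifier also handles the logical expressions, but the transitive part of the inference, which computes all implied superclass links from the asserted ones, can be sketched in a few lines (the class names are illustrative, loosely following Figure 2):

```python
# Sketch of the transitive part of subclass classification: given
# asserted subclass links, compute all implied ancestor links.
# Class names are illustrative only.

asserted = {
    "PatientContext":       {"PersonContext"},
    "BrainDisorderContext": {"HealthDataContext"},
    "EEGContext":           {"HealthDataContext"},
    "HealthDataContext":    {"ContextEntity"},
}

def ancestors(cls, hierarchy):
    """All superclasses reachable from `cls` (transitive closure)."""
    seen, stack = set(), list(hierarchy.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(hierarchy.get(parent, ()))
    return seen

print(sorted(ancestors("BrainDisorderContext", asserted)))
# ['ContextEntity', 'HealthDataContext']
```

A sameAs relation between two classes would additionally make their ancestor sets interchangeable; a reasoner such as the one built into Protégé performs this merging automatically.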
4
Conclusions
This paper presents an ontology structure for semantic sensor information and reasoning, executed using EEG, pill camera, and temperature sensors in a computing environment for brain healthcare. It is related to brain functions, brain diseases, their genetic basis, and the relationships among all of them. We represent the semantic sensing information using the proposed rules for customized healthcare services. The proposed ontology structure for semantic sensor information supports recognizing a situation and providing a customized healthcare service for patients. Acknowledgments. This work was supported by the Industrial Strategic Technology Development Program funded by the Ministry of Knowledge Economy (MKE, Korea) [10038653, Development of Semantic based Open USN Service Platform]. Corresponding author: DoHyeun Kim (e-mail: [email protected]).
References
1. W3C Recommendation: OWL Web Ontology Language Overview (2004), http://www.w3.org/TR/owl-features/
2. Horrocks, I., Patel-Schneider, P.F., Boley, H.: SWRL: A Semantic Web Rule Language Combining OWL and RuleML (2004), http://www.w3.org/Submission/SWRL/
3. Masuoka, R., Labrou, Y., Parsia, B., Sirin, E.: Ontology-Enabled Pervasive Computing Applications. IEEE Intelligent Systems 18(5) (2003)
4. Chen, H., Finin, T.: An Ontology for a Context Aware Pervasive Computing Environment. In: IJCAI Workshop on Ontology and Distributed Systems, Acapulco, MX (2003)
5. Ranganathan, A., Campbell, R.H.: A Middleware for Context-Aware Agents in Ubiquitous Computing Environments. In: Proceedings of the ACM/IFIP/USENIX International Middleware Conference, Rio de Janeiro, Brazil (June 2003)
6. Zhang, D., Yu, Z., Chin, C.Y.: Context-Aware Infrastructure for Customized HealthCare. In: International Workshop on Customized Health. ISO Press (2004)
7. Berger, H.: Arch. F. Psychiat. 87: Hans Berger on the Electroencephalogram of Man. Elsevier, Amsterdam (1969)
8. Pollard, I.: From happiness to depression. Today's Life Sciences 15, 22–26 (2003)
9. Elbert, T., Pantev, C., Wienbruch, C., Rockstroh, B., Taub, E.: Increased cortical representation of the fingers of the left hand in string players. Science 270, 305–307 (1995)
10. Pollard, I.: The state of wellbeing: on health and ill-health. In: Life, Love & Children: A Practical Introduction to Bioscience Ethics and Bioethics, pp. 87–95. Kluwer Academic, Norwell (2002)
11. Beigl, M., Zimmer, T., Decker, C.: A Location Model for Communicating and Processing of Context. Journal of Personal and Ubiquitous Computing 6, 341–357 (2002)
12. Dey, A., Abowd, G.: Towards a Better Understanding of Context and Context-Awareness. In: Workshop on the What, Who, Where, When and How of Context-Awareness at CHI (2000)
13. Henricksen, K., Indulska, J., Rakotonirainy, A.: Modeling Context Information in Pervasive Computing Systems. In: Mattern, F., Naghshineh, M. (eds.) PERVASIVE 2002. LNCS, vol. 2414, pp. 167–180. Springer, Heidelberg (2002)
14. Horrocks, I., Patel-Schneider, P.F., Bechhofer, S., Tsarkov, D.: OWL rules: A proposal and prototype implementation. Journal of Web Semantics (2005)
15. Staab, S., Studer, R.: Handbook on Ontologies. Springer, Heidelberg (2004)
16. Smith, M., Welty, C., McGuinness, D.: OWL Web Ontology Language Guide (2004), http://www.w3.org/TR/owl-guide/
17. Brickley, D., Guha, R.V.: RDF Vocabulary Description Language 1.0: RDF Schema. World Wide Web Consortium (2003)
A New Type of Remote Power Monitoring System Based on a Wireless Sensor Network Used in an Anti-islanding Method Applied to a Smart-Grid Kyung-Jung Lee, Kee-Min Kim, ChanWoo Moon*, Hyun-Sik Ahn, and Gu-Min Jeong Department of Electronics Engineering, Kookmin University, Seoul, Korea [email protected]
Abstract. Renewable energy and the smart grid have become a major focus in industry. The smart grid is an intelligent system which maximizes energy efficiency; it needs to continuously monitor power production and consumption. A Remote Monitoring System (RMS) is a very useful tool for this purpose; however, most RMSs are implemented using hard-wired communications. In this paper, we propose an RMS based on a wireless sensor network. In addition, a new anti-islanding method, implemented with the proposed wireless RMS, is presented. An experimental micro grid system has been implemented in order to verify the proposed RMS and the anti-islanding method. Keywords: Renewable energy, Smart grid, Remote Monitoring System, Wireless sensor network, Anti-islanding.
1
Introduction
Recently, renewable energy and the smart grid have become a major focus in the electrical generation industry. The smart grid is an intelligent power generation and distribution system; it can increase energy efficiency and make it possible to incorporate small scale renewable energy sources, such as solar power, wind power, and fuel cells, into the backbone of a power distribution utility. Starting with the earliest smart metering, from transmission and distribution automation to an overall intelligent process, the smart power grid concept has been substantially enriched over time [1]. In order to maximize energy efficiency, a smart grid needs to continuously monitor the amount of power generation and consumption. These tasks demand the collection and analysis of a large amount of real-time data, in order to optimize the energy production and consumption. A Remote Monitoring System (RMS) is very useful in monitoring power generation and consumption, and so has become an essential part of a smart grid. Many RMS types have been developed; most of them have been implemented using a hardwired industrial communication network and have performed their task with a great deal of success. Recently, however, widely distributed small scale power sources have *
Corresponding author.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 358–367, 2011. © Springer-Verlag Berlin Heidelberg 2011
increased in number; it is not efficient to build a dedicated cable network for every one of them. Therefore, a wireless RMS communication method is needed. Independent power sources raise some unique problems. One of them is power source islanding. The islanding phenomenon found in power generators connected to a grid refers to their continuing to power the utility system even though the generator has been disconnected from the remainder of the grid. Recently, serious concerns about this islanding effect have been raised by the spread of distributed power generation resources, because islanding can cause safety problems for utility service personnel or related equipment. Consequently, utility companies require anti-islanding methodologies [2]-[3]. The main strategy used for islanding detection is to monitor the distributed generation output parameters of the system and, based on the measurements, decide whether an islanding situation has occurred. Islanding detection techniques can be divided into three basic methodologies [4]:
1) The Passive Method [5]-[7]
The passive method measures system parameters, such as variations in the voltage, frequency, harmonic distortion, etc., against thresholds set for these parameters. If the parameters exceed these thresholds, islanding is detected. This method detects islanding very quickly; however, it has a large non-detection zone, and special care is needed in setting the parameter thresholds. The passive method can utilize:
• The output power rate of change
• The frequency rate of change
• The frequency over power rate of change
• A change in the impedance
• A voltage imbalance
• The amount of harmonic distortion
2) The Active Method [8]-[9]
The active method tries to overcome the limitations of the passive method by introducing perturbations into the inverter output. The active method is able to detect islanding even when the generation and the load are perfectly matched, which is not possible with the passive detection schemes; however, it can degrade the power quality. The active method can utilize:
• Reactive power export error detection
• Impedance measurements
• Phase (or frequency) shifts
• The Active Frequency Drift method
• The Active Frequency Drift with Positive Feedback method
• The Adaptive Logic Phase Shift method
• Current injection with positive feedback
3) The Hybrid Method [10]-[11] The hybrid method combines aspects of both the active and passive methods. The active technique is commenced only when islanding is suspected through the passive technique. It uses:
• Techniques based on voltage and reactive power shifts
• Techniques based on positive feedback and voltage imbalances
Islanding in a distributed generation system can have a deadly impact on power system personnel and facilities. In response to this situation, both IEEE Std. 929-2000 and UL1741 address this problem. In this paper, we propose a remote power monitoring system (RMS) based on a wireless sensor network, together with a new anti-islanding method to be used with the proposed RMS. The RMS communicates with each node using ZigBee technology, because ZigBee has low power requirements, is simple, and can cover a large number of devices per network [12]. The proposed anti-islanding method does not affect the power quality, because it uses external sensor data and does not modulate the output signal; in addition, it reduces the computational burden on the power generator. The validity of the RMS was tested on an experimental system that simulates a micro grid. Another advantage is that the ZigBee system can be replaced with a smarter wireless network developed in the future without changing the proposed structure.
2
The Structure of the Proposed Remote Monitoring System
Figure 1 shows the block diagram of the proposed RMS applied to a micro-grid. It consists of several components: an End Monitor Module (EM), a Local Area Monitor Module (LAM), a Wide Area Monitor Module (WAM), and client devices. Hierarchically, these modules use different communication technologies in order to ensure the reliability of the system. The EM is a kind of wireless sensor node; an EM can be classified into one of three types:
- The End Power Monitor Module (EPM): An EPM is located at each small scale renewable power generator, and monitors the generated power, the current, the voltage, the history, and the instantaneous status of the generator. Customers can check the EPM remotely on a PC.
- The End Load Monitor Module (ELM): The ELM is placed at a customer location, such as a home, an office, or a factory; it monitors the power consumption.
- The End Grid Monitor Module (EGM): The EGM is located at the power distribution network. It is equipped with current and voltage sensors, and monitors the total power supply and consumption of the utility, power failures, and the power quality. It calculates the harmonics of the current and voltage using an FFT and detects power failures when abnormal harmonics are found. The EPM and ELM can shut down the connected device on a command from the LAM.
Fig. 1. The block diagram of the RMS
The experimental EM module used in this study consisted of a main microcontroller, sensors and a communication module. Because the EM must control each end device and measure a great deal of information at the same time, the EM module was designed to take advantage of a high-performance micro-processor. A ZigBee module is used to communicate between the EM and the LAM. The EM sends the measured information to the LAM. All of the data from each EM is gathered by the LAM; the LAM then returns the appropriate commands. If the power production of a distributed power source is excessive, the LAM commands the EPM to reduce the power generation. In the contrary case, the LAM commands the ELM to reduce the power consumption. The information monitored by each EM module is collected by the LAM in each local area, which guarantees communication reliability and better security. The LAM module has a gateway which converts the ZigBee protocol to Ethernet. If any problem occurs in the generator or load, the EM module will send an emergency signal to its LAM. Once the LAM module detects the emergency signal, it shuts down the involved node. The LAM module then reports this to the WAM module via Ethernet. The WAM monitors its entire network in real time. Customers can check the state of each node using the Internet. Customers can also shut down and turn on their own nodes remotely.
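The EM-to-LAM control flow described above (collect readings, detect emergencies, shut down the offending node, report upward, balance generation against consumption) can be sketched as simple event handling. The message layout, node naming, and imbalance rule below are our assumptions for illustration; the paper does not specify a wire format:

```python
# Sketch of a LAM's handling of End Monitor (EM) messages.
# Message layout, node IDs, and the imbalance rule are assumed for
# illustration only.

def handle_message(msg, state, commands):
    """Process one EM report; append any resulting commands."""
    node = msg["node"]
    if msg.get("emergency"):
        commands.append(("shutdown", node))    # isolate the faulty node
        commands.append(("report_wam", node))  # escalate to the WAM
        return
    state[node] = msg["power_w"]
    # If total generation exceeds total load, curb a generator (EPM);
    # otherwise ask a load (ELM) to reduce consumption.
    generation = sum(p for n, p in state.items() if n.startswith("EPM"))
    load = sum(p for n, p in state.items() if n.startswith("ELM"))
    if generation > load:
        commands.append(("reduce_generation", "EPM"))
    elif load > generation:
        commands.append(("reduce_consumption", "ELM"))

state, commands = {}, []
handle_message({"node": "EPM-1", "power_w": 900}, state, commands)
handle_message({"node": "ELM-1", "power_w": 500}, state, commands)
handle_message({"node": "EGM-1", "emergency": True}, state, commands)
print(commands[-2:])  # [('shutdown', 'EGM-1'), ('report_wam', 'EGM-1')]
```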
3
The Anti-islanding Application for the Wireless Remote Monitoring System
A key duty of this wireless RMS is to prevent the islanding of the distributed small scale power generators. The occurrence of islanding may complicate the orderly reconnection of the electric utility network, pose a hazard to utility personnel [13], and impose a burden on small scale power generators. If a generator inverter has over voltage protection (OVP), under voltage protection (UVP), over frequency protection (OFP), and under frequency protection (UFP), it is said to have a basic islanding detection capability [14]. Using this capability, the passive method detects islanding by observing the variations in voltage, frequency, and phase when a power failure has occurred in the utility network. The active method uses Frequency Bias, Sandia Frequency Shift, Frequency Jump, Harmonic Amplitude Jump, Power Line Carrier Communications, and other methods. These methods detect islanding by converting the output signal of the generator to an arbitrary signal and observing the variations in the load voltage and frequency.

Table 1. International standards IEEE 929-2000 and UL1741

State  Voltage after ac dump      Frequency after ac dump  Allowed maximum detection time
1      V < 50%Vnom                f_nom                    6 cycles
2      50%Vnom < V < 88%Vnom      f_nom                    2 seconds
3      88%Vnom < V < 110%Vnom     f_nom                    Normal operation
4      110%Vnom < V < 137%Vnom    f_nom                    2 seconds
5      137%Vnom < V               f_nom                    6 cycles
6      Vnom                       f < f_nom - 0.7 Hz       6 cycles
7      Vnom                       f > f_nom + 0.5 Hz       6 cycles
Existing islanding detection methods have some problems. First, because the generator output signal is converted to an arbitrary signal, the power quality is reduced. Second, existing anti-islanding methods have an NDZ (Non-Detection Zone) problem. Therefore, in this paper, we propose a new RMS-based anti-islanding method. Each EGM module has a current and voltage sensor and monitors the power distribution network. The EGM node can thereby detect a fault within the islanding detection time required by the international standards IEEE 929-2000 and UL1741 [15]. The main parameters of the standards are listed in Table 1.
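The voltage/frequency limits of Table 1 map directly onto a lookup of the allowed detection time. The sketch below illustrates that lookup; the function name is ours, "6 cycles" is converted to seconds assuming a 60 Hz nominal frequency, and the handling of boundary values is a simplification:

```python
# Sketch: classify a (voltage, frequency) measurement against the
# IEEE 929-2000 / UL1741 limits of Table 1 and return the allowed
# maximum islanding detection time. "6 cycles" assumes f_nom = 60 Hz.

F_NOM = 60.0
SIX_CYCLES = 6 / F_NOM  # 0.1 s

def allowed_detection_time(v_pct, f_hz):
    """v_pct: voltage as a percentage of Vnom; f_hz: frequency in Hz.
    Returns the allowed detection time in seconds, or None for
    normal operation (88%..110% Vnom, frequency within limits)."""
    if f_hz < F_NOM - 0.7 or f_hz > F_NOM + 0.5:
        return SIX_CYCLES                 # states 6 and 7
    if v_pct < 50:
        return SIX_CYCLES                 # state 1
    if v_pct < 88 or 110 < v_pct <= 137:
        return 2.0                        # states 2 and 4
    if v_pct > 137:
        return SIX_CYCLES                 # state 5
    return None                           # state 3: normal operation

print(allowed_detection_time(100, 60.0))  # None (normal operation)
print(allowed_detection_time(40, 60.0))   # 0.1
print(allowed_detection_time(100, 61.0))  # 0.1
```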
Fig. 2. The RMS based anti-islanding method
If the source and the load are balanced, the allowable detection time can be as long as 2 seconds, because this situation does less damage to the equipment and the load. The power failure information from one EGM is shared with all of the nodes belonging to the micro-grid via the wireless network. The LAM locates the source of the power failure and requests that the generators most likely affected by the power failure stop their generation. Each such generator is thereby isolated from the micro-grid, which prevents islanding. The concept of the proposed RMS-based anti-islanding method is shown in Figure 2. This method has no effect on the power quality because it does not modulate the output signal; for the same reason, the RMS-based anti-islanding method does not have an NDZ problem.
4
Implementation
The ZigBee communication protocol between the coordinator and the sensor nodes is stringent. The sensor nodes periodically collect their data, package them, and then send them to the coordinator. After this is accomplished, they begin to receive responses. If the transmission of the data is successful, the sensor nodes go back to their free state. Otherwise, the nodes rerun the process until the transmission is successful. The coordinator displays the received data on a PC. Because different types of sensors exist in the network, the coordinator determines the data type according to the network address. The received data are displayed on the PC through serial port software programmed using JAVA; it can also be shown as a waveform in the coordinator. We can thereby analyze the status of the distributed generator systems based on these waveforms and data. Various experimental miniature distributed generator and load systems are shown in Figure 3.
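The send-until-acknowledged loop described above can be sketched independently of the ZigBee specifics. In the sketch, `transmit` is a stand-in for the real radio send (an assumption, not an actual ZigBee API), injected so the retry logic can be shown without hardware:

```python
# Sketch of the sensor node's send-and-retry loop. `transmit` is a
# placeholder for the real ZigBee send; it is injected here so the
# retry logic can be demonstrated without radio hardware.

def send_until_acked(packet, transmit, max_tries=5):
    """Transmit `packet` until `transmit` reports success (returns
    True) or `max_tries` is exhausted. Returns the number of attempts
    used, or raises RuntimeError on persistent failure."""
    for attempt in range(1, max_tries + 1):
        if transmit(packet):
            return attempt  # node returns to its free state
    raise RuntimeError("transmission failed after %d tries" % max_tries)

# Simulated flaky link: fails twice, then succeeds.
outcomes = iter([False, False, True])
attempts = send_until_acked({"node": "A", "v": 110},
                            lambda p: next(outcomes))
print(attempts)  # 3
```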
[(a) Miniature End Monitor Modules; (b) The LAM monitor program]

Fig. 3. The experimental End Monitor Modules and the LAM monitor program
5
The Experiment Results
An experiment was conducted in order to verify the proposed system. The experiment setup consists of 4 sensor nodes and the LAM, as shown in Figure 4. Node A is an EGM and is connected to a 1 kVA/110 V power source. Node D is an EPM; Nodes B and C are inner nodes. If an emergency situation occurs in the EGM, the LAM needs to disconnect the EPM from the grid in order to prevent islanding, according to IEEE Std. 929-2000 and UL1741. In this experiment, a message reporting an accident at Node A is sent to the LAM via Node B; the LAM then sends a cutoff message to Node D via Node C. The cutoff time is measured for various node distances. Figure 5 shows the operation when the EPM received the cutoff message from the LAM. The Y-axis voltage data is the AC RMS value (reference: 110 V). As shown in this figure, the power generation of the EPM is cut off simultaneously with the message. Figure 6 shows the islanding detection time. The sensor nodes are located at 4 different distances: 80 m, 60 m, 40 m, and 20 m. The timing comprises three steps: 1) detect the islanding, 2) send the cut-off command to the EPM, and 3) the EGM receives the acknowledge character from the EPM. The LAM guarantees a short cutoff time, preventing islanding in less than 1 s. As shown in Figure 6, in all four cases, the cut-off times are about 412 ms regardless of the distance. This result confirms the reliability of the RMS.
Fig. 4. The experiment setup diagram

[Figure 5 plots current (mA) and voltage (V) against time (s).]

Fig. 5. The anti-islanding results (node distance 80m)
Fig. 6. The cut-off time according to distance
In this experiment, the strength of the received signal is more important than the geometric distance. The RSSI (Received Signal Strength Indicator), defined as the ratio of the received power to a reference power, is one indicator of the communication quality. The RSSI is given by:

RSSI = 10 · log10(P_RX / P_REF), [RSSI] = dBm   (1)
Figure 7 shows the RSSI values for the four cases; they are in the -10 dBm to -60 dBm range. From this, we determined that the experimental network environment is good [16]. We also concluded that if the RSSI value between each pair of nodes is in the -10 dBm to -60 dBm range, the proposed anti-islanding method works properly.
Fig. 7. The RSSI strength according to the distance
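Equation (1) and the -10 dBm to -60 dBm acceptance band can be checked with a few lines. The function names are ours, and the 1 mW reference corresponds to the dBm convention:

```python
import math

# Sketch of the RSSI computation of Eq. (1) and the link-quality band
# used in the experiment. Function names are ours; P_REF = 1 mW gives
# the dBm convention.

def rssi_dbm(p_rx_mw, p_ref_mw=1.0):
    """RSSI = 10 * log10(P_RX / P_REF), in dBm for a 1 mW reference."""
    return 10.0 * math.log10(p_rx_mw / p_ref_mw)

def link_ok(rssi):
    """True if RSSI lies in the -10 dBm .. -60 dBm band found
    adequate for the proposed anti-islanding method."""
    return -60.0 <= rssi <= -10.0

r = rssi_dbm(1e-4)   # 0.1 microwatt received against a 1 mW reference
print(round(r, 1))   # -40.0
print(link_ok(r))    # True
```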
6
Conclusions
This paper presented a wireless sensor network based remote power monitoring system together with a new islanding detection method. The RMS consists of a WAM, LAMs, and EMs. The EM modules monitor the power generation and consumption, and also detect islanding. The LAM modules collect the information from all of the EM modules attached to the micro-grid and send commands to the EMs, so that the EMs can shut down their own node or adjust their power generation and consumption. Our new anti-islanding method is based on RMS communications. When an EGM detects a power failure, it sends an alarm signal to the LAM; the LAM then shuts down the appropriate EPM(s) and, if necessary, the ELM as well. An experimental RMS was implemented using miniature renewable power generators and loads. The information from each EM was collected by its LAM, and the PC connected to the LAMs displayed the data.
References
1. Tai, H., Hogain, E.O.: IEEE Trans. Power and Energy Magazine, vol. 7, pp. 96–92 (2009)
2. IEEE 1547 Standard for Interconnecting Distributed Resources with Electric Power Systems (2003)
3. Ropp, M.: Design issues for grid-connected photovoltaic systems. Ph.D. dissertation, Georgia Institute of Technology (1998)
4. Cobreces, S., Bueno, E.J., Pizarro, D., Rodriguez, F.J., Huerta, F.: Grid Impedance Monitoring System for Distributed Power Generation Electronic Interfaces. Alcala de Henares 58, 3112–3121 (2009)
5. González, H.-G., Iravani, R.: Current injection for active islanding detection of electronically-interfaced distributed resources. IEEE Transactions on Power Delivery 21(3), 1698–1705 (2006)
6. Ye, Z.: Evaluation of Anti-Islanding Schemes based on Non detection Zone Concept. IEEE Transactions on Power Electronics 19(5), 1171–1176 (2004)
7. Jang, S.-I., Kim, K.-H.: Development of a Logical Rule-Based Islanding Detection Method for Distributed Resources. IEEE Power Engineering Society Winter Meeting 2, 800–806 (January 2002)
8. Jang, S.-I., Kim, K.-H.: An Islanding Detection Method for Distributed Generations using Voltage Unbalance and Total Harmonic Distortion of Current. IEEE Transactions on Power Delivery 19(2), 745–752 (2004)
9. Ye, Z., Li, L., Garces, L., Wang, C., Zhang, R., Dame, M., Walling, R., Miller, N.: A New Family of Active Anti-Islanding Schemes Based on DQ Implementation for Grid-Connected Inverters. In: 35th Annual IEEE Power Electronics Specialists Conference, vol. 1, pp. 235–241 (2004)
10. Chiang, W.-J., Jou, H.-L., Wu, J.-C., Feng, Y.-T.: Novel Active Islanding Detection Method for Distributed Power Generation System. In: International Conference on Power System Technology, pp. 22–26 (2006)
11. Luitel, B., Venayagamoorthy, G.K., Johnson, C.E.: Enhanced Wide Area Monitoring System. In: IEEE PES Conference on Innovative Smart Grid Technologies, pp. 1–7 (2010)
12. Mulyadi, I.H., Supriyanto, E., Safri, N.M., Satria, M.H.: Wireless Medical Interface Using ZigBee and Bluetooth Technology. In: Third Asia International Conference on Modelling & Simulation, pp. 276–281 (2009)
13. Mekhilef, S., Rabim, N.A.: Implementation of Grid-Connected Photovoltaic System with Power Factor Control and Islanding Detection. In: 35th Annual IEEE Power Electronics Specialists Conference, vol. 2, pp. 1409–1412 (2004)
14. Zhang, C., Liu, W., San, G., Wu, W.: A Novel Active Islanding Detection Method of Grid-connected Photovoltaic Inverters Based on Current-Disturbing. In: Power Electronics and Motion Control Conference, vol. 3, pp. 1–4 (2006)
15. Hudson, R.M., Thorne, T., Mekanik, F., Behnke, M.R., Gonzalez, S., Ginn, J.: Implementation and testing of anti-islanding algorithms for IEEE 929-2000 compliance of single phase photovoltaic inverters. In: Conference Record of the Twenty-Ninth IEEE Photovoltaic Specialists Conference, pp. 1414–1419 (2004)
16. Wu, R.-H., Lee, Y.-H., Tseng, H.-W., Jan, Y.-G., Chuang, M.-H.: Study of characteristics of RSSI signal. In: IEEE International Conference on Industrial Technology, pp. 1–3 (2008)
ICI Suppression in the SC-FDMA Communication System with Phase Noise Heung-Gyoon Ryu Department of Electronic Engineering, Chungbuk National University, Cheongju, Korea [email protected]
Abstract. SC-FDMA (single carrier-frequency division multiple access) is the uplink standard of the 3GPP LTE (3rd generation partnership project long term evolution) mobile system. SC-FDMA has a very low PAPR (peak-to-average power ratio) but is sensitive to ICI (inter-carrier interference) caused by phase noise. In this paper, we analyze the effect of phase noise considering the back-off amount of the HPA (high power amplifier), and we propose an equalizer based on an advanced PNS (phase noise suppressing) algorithm to remove the ICI component effectively. The adaptive equalizer has a form similar to SC-FDE (single carrier-frequency domain equalization) and an operation process based on the PNS algorithm that removes the ICI component even in an IQ imbalance environment, as shown in the simulation results. Keywords: ICI, OFDM, phase noise, PAPR, equalizer.
1   Introduction
SC-FDMA, a single-carrier based radio access scheme, has the advantage of a low PAPR (peak-to-average power ratio), so it can support wide-area coverage in cellular systems. SC-FDMA transmission is divided into two parts: one is the time domain processing called IFDMA (interleaved frequency division multiple access); the other is the frequency domain processing called DFT-SOFDM (DFT-spread orthogonal frequency division multiplexing) [1]. It has received attention as a solution to reduce the PAPR on the uplink. SC-FDMA offers a low PAPR, good spectral efficiency, commonality in design, and coexistence with the OFDM technique [2]. However, SC-FDMA produces more interference components, ICI and SCI (self channel interference), than ordinary OFDM because of the DFT (Discrete Fourier Transform) spreading effect and the phase offset mismatch caused by random phase noise. In OFDM systems, reducing the ICI caused by phase noise is important and has been studied extensively [3] [4]. Most previous works removed only the CPE component, and the ICI component was compensated under the assumption that the CPE had already been removed. Another approach, which uses pilot symbols to estimate the phase noise and remove the CPE and ICI simultaneously, was proposed in [4] and [5]. The ICI is increased by the signal distorted by the HPA together with phase noise. We can improve the performance by the equalizer. Among previous SC-FDE equalizers, the MMSE (minimum mean square error) method outperforms the ZF (zero forcing) method [7], and the PNS algorithm using the MMSE method is excellent in the presence of phase noise. Basically, the previous method is robust to phase noise but has difficulty compensating the resulting performance degradation.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 368–376, 2011. © Springer-Verlag Berlin Heidelberg 2011
2   Phase Noise in SC-FDMA
In SC-FDMA, the data symbols used to construct one SC-FDMA symbol first pass through an $\tilde{S}$-point DFT; then, after sub-carrier mapping (mapping into $N$ points by an appropriate sub-carrier allocation method), they pass through $N$-point IFFT processing. The spreading codes distribute the signal energy of the superimposed code symbols uniformly over all sub-carriers.

$C = [d_0, d_1, \ldots, d_{\tilde{S}-1}]$   (1)

When the DFT spreading block is defined as

$Q = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & W & \cdots & W^{\tilde{S}-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & W^{\tilde{S}-1} & \cdots & W^{(\tilde{S}-1)(\tilde{S}-1)} \end{bmatrix}_{\tilde{S}\times\tilde{S}}$, where $W = e^{-j2\pi/\tilde{S}}$,   (2)

the SC-FDMA signal is given by

$X'_{\tilde{s}} = C \cdot Q = \sum_{\tilde{l}=0}^{\tilde{S}-1} d_{\tilde{l}} \, e^{-j2\pi \tilde{s}\tilde{l}/\tilde{S}}.$   (3)

After sub-carrier mapping, the transmission signal vector passes through $N$-point IFFT processing, so the SC-FDMA signal is

$x(n) = \sum_{k=0}^{N-1} X_k \, e^{j2\pi kn/N} = \sum_{k=0}^{N-1} X'_k \, e^{j2\pi kn/N}$, where $X_k = X'_{\tilde{k}} = \sum_{\tilde{l}=0}^{\tilde{S}-1} d_{\tilde{l}} \, e^{-j2\pi \tilde{s}\tilde{l}/\tilde{S}}.$   (4)

The received signal is

$r(n) = x(n) \otimes h(n) + v(n).$   (5)

The recovered output for the $k$-th sub-carrier is as follows:

$Y_k = \frac{1}{N} \sum_{n \in S} r[n] \, e^{-j\frac{2\pi}{N}nk} = X_k + N_k = X'_k + N_k = \sum_{l=0}^{S-1} d_l \cdot p_{k,l} + N_k$   (6)

where $X_i$ is the frequency domain expression of $x(n)$. Finally, after sub-carrier de-mapping, SC-FDMA demodulation for the transmitted symbol $d_{\tilde{k}}$ is as follows:

$\hat{d}_k = \sum_{s=0}^{S-1} Y'_s \, e^{j2\pi ks/S} = d_k + N_k, \quad k = 0, \ldots, S-1.$   (7)

When phase noise is inserted by the frequency synthesizer of the transceiver, the received signal becomes

$r(n) = [x(n) \otimes h(n) + v(n)] \cdot e^{j\Phi(n)}.$   (8)

Then, after cyclic prefix removal and the FFT, the recovered output for the $k$-th sub-carrier is given by

$Y_k = \frac{1}{N} \sum_{n \in S} r[n] \, e^{-j\frac{2\pi}{N}nk} = \sum_{i \in S} X_i \cdot H_i \cdot Q_{i-k} + N_k.$   (9)

Finally, SC-FDMA demodulation for the transmitted symbol $d_{\tilde{k}}$ is as follows:

$\hat{d}_k = \sum_{s=0}^{S-1} Y'_s \, e^{j2\pi ks/S} = \sum_{s=0}^{S-1} \sum_{i \in S} \sum_{v=0}^{S-1} d_v \cdot p_{s,v} \cdot Q_{i-s} \cdot e^{j2\pi ks/S} + N_k = \frac{1}{N \cdot S} \sum_{n \in S} \sum_{i \in S} \sum_{s=0}^{S-1} \sum_{v=0}^{S-1} d_v \, e^{j\left(2\pi\left(\frac{ks - iv}{S} + \frac{(i-s)n}{N}\right) + \Phi[n]\right)} + N_k.$   (10)
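As a concrete illustration, the DFT-spread transmit chain of (1)-(4) and the demodulation of (6)-(7) can be sketched numerically. This is a minimal sketch with assumed conveniences not stated above: localized sub-carrier mapping, an ideal channel, no noise or phase noise, and small illustrative sizes rather than LTE parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
S_tilde, N = 8, 32            # DFT-spreading size and IFFT size (S_tilde < N)

# Data symbols d_0..d_{S-1} (QPSK), eq. (1).
d = (rng.choice([-1, 1], S_tilde) + 1j * rng.choice([-1, 1], S_tilde)) / np.sqrt(2)

# S-point DFT spreading, eqs. (2)-(3).
X = np.fft.fft(d)

# Localized sub-carrier mapping into N points, then N-point IFFT, eq. (4).
X_mapped = np.zeros(N, dtype=complex)
X_mapped[:S_tilde] = X
x = np.fft.ifft(X_mapped) * N          # unnormalized IDFT as in eq. (4)

# Receiver: N-point FFT, de-mapping, then S-point IDFT, eqs. (6)-(7).
Y = np.fft.fft(x) / N
d_hat = np.fft.ifft(Y[:S_tilde])

assert np.allclose(d_hat, d)
```

With an ideal channel the round trip is exact, which makes the sketch a useful sanity check before adding the phase-noise term of (8).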
3   SC-FDMA System Using FDE
In this paper, to focus on removing the ICI, we do not discuss the effect of the HPA after the IFFT at the receiver side, because the FDE process is considered.
Fig. 1. SC-FDMA system using FDE
If the back-off is not sufficient in uplink data transmission, we suppose that the transmit signal is clipped. The clipped signal is expressed as

$s_{clip}[n] = g(s[n]) \equiv \begin{cases} s[n], & |s[n]| \le A_{max} \\ A_{max} \, e^{j\Phi(s[n])}, & |s[n]| > A_{max} \end{cases}$   (11)

where $g(\cdot)$ is the clipping function and $A_{max}$ is the clipping level. The clip ratio used to analyze the effect of clipping is defined by

$\gamma \equiv \frac{A_{max}}{\sqrt{P_{av}}}$   (12)

where $P_{av}$ is the average power of the OFDM signal before it is clipped. The clipped transmit signal is as follows [3]:

$s_{clip}[n] = \alpha \cdot s[n].$   (13)

Here, we consider that the clipped signal is randomly generated by non-linear distortion according to the back-off of the HPA in the transmitter, and the coefficient of the clipped signal is $\alpha$.

$r(n) = \{\alpha \cdot x(n) \otimes h(n) + \upsilon(n)\} \cdot e^{j\theta(n)}$   (14)

The recovered signal at the $k$-th sub-carrier is given by

$Y_k = \frac{1}{N} \sum_{n=0}^{N-1} r[n] \, e^{-j\frac{2\pi}{N}kn} = \frac{1}{N} \sum_{n=0}^{N-1} [\alpha \cdot x(n) \otimes h(n) + \upsilon(n)] \, e^{j\theta(n)} \, e^{-j\frac{2\pi}{N}kn} = \frac{1}{N} \sum_{n=0}^{N-1} \sum_{i=0}^{N-1} \alpha \cdot X_i \cdot H_i \cdot e^{j\left(\frac{2\pi}{N}(i-k)n + \theta(n)\right)} + N_k.$   (15)

To analyze the effect of phase noise, the channel response is set to 1 (i.e., $H = 1$) and the signal with phase noise is approximated as

$Y_k = \sum_{i=0}^{N-1} \alpha \cdot X_i \cdot H_i \cdot Q_{i-k} + N_k = \alpha \cdot X_k \cdot Q_0 + \sum_{i=0, i \ne k}^{N-1} \alpha \cdot X_i \cdot Q_{i-k} + N_k = Y_{k1} + Y_{k2} + N_k$   (16)

where $X_k$ and $H_i$ are the frequency domain expressions of $x(n)$ and $h(n)$ corresponding to the $k$-th sub-carrier, and $Q$ is the expression of the phase noise in the frequency domain. The first contribution $Y_{k1}$ ($i = k$) of (16) is the desired signal together with the CPE component of the phase noise, and is expressed by

$Y_{k1} = \alpha \cdot X_k \cdot Q_0 = \alpha \cdot X_k \cdot \frac{1}{N} \sum_{n=0}^{N-1} e^{j\theta(n)} = \alpha \cdot X_k + CPE$   (17)

where the CPE added to each sub-carrier is proportional to the signal value multiplied by $j\theta$, which causes a rotation of the constellation. The second contribution $Y_{k2}$ ($i \ne k$) is the ICI component:

$Y_{k2} = \sum_{i=0, i \ne k}^{N-1} \alpha \cdot X_i \cdot Q_{i-k} = \alpha \sum_{i=0, i \ne k}^{N-1} X_i \cdot \frac{1}{N} \sum_{n=0}^{N-1} e^{j\theta(n)} \, e^{j\frac{2\pi}{N}(i-k)n}.$   (18)

This term corresponds to the summation of the information of the other $N-1$ sub-carriers, each multiplied by some complex number, and it affects the useful signal of the $k$-th sub-carrier.
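The decomposition of (16)-(18) can be checked numerically. The sketch below assumes $\alpha = 1$, an ideal channel, no clipping, and an illustrative small-variance phase-noise realization; it verifies that each received sub-carrier equals the CPE term plus the ICI sum.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
n = np.arange(N)

# Frequency-domain data on N sub-carriers (QPSK, for illustration).
X = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)

# Time-domain signal, eq. (4): x(n) = sum_k X_k e^{j 2 pi k n / N}.
x = np.array([np.sum(X * np.exp(1j * 2 * np.pi * np.arange(N) * t / N)) for t in n])

# Small random phase noise theta(n); received signal per eq. (14) with H = 1.
theta = 0.01 * rng.standard_normal(N)
r = x * np.exp(1j * theta)

def Q(m):
    # Frequency-domain phase-noise coefficient used in eqs. (16)-(18):
    # Q_m = (1/N) sum_n e^{j theta(n)} e^{j 2 pi m n / N}
    return np.mean(np.exp(1j * theta) * np.exp(1j * 2 * np.pi * m * n / N))

# Recovered k-th sub-carrier, eq. (15): Y_k = (1/N) sum_n r(n) e^{-j 2 pi k n / N}.
Y = np.array([np.mean(r * np.exp(-1j * 2 * np.pi * k * n / N)) for k in range(N)])

# Decomposition of eq. (16): Y_k = X_k Q_0 (desired + CPE) + ICI sum.
for k in range(N):
    cpe_term = X[k] * Q(0)
    ici_term = sum(X[i] * Q(i - k) for i in range(N) if i != k)
    assert abs(Y[k] - (cpe_term + ici_term)) < 1e-9
```

The check holds exactly (up to floating-point error) because (16) is an identity once the channel and clipping are idealized away.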
4   Compensation of ICI
Fig. 2 shows the block diagram used to compensate the phase noise and the clipping error with an MMSE equalizer after the FFT. After the FFT, the compensation processing is as follows:

$Y_k = \alpha \cdot X_k \cdot Q_0 + \sum_{i=0, i \ne k}^{N-1} \alpha \cdot X_i \cdot Q_{i-k} + N_k$   (19)

where the first term is the CPE component and the second term is the ICI component.

Fig. 2. Compensation block diagram
To estimate the CPE component, we use the pilot symbols; the estimated signal is expressed by

$CPE_k = \frac{Y_k}{X_k} = \alpha \, Q_0 + \frac{ICI + N_k}{X_k} = \alpha \, Q_0 + W_k, \qquad r_{cpe} = \frac{1}{N_p} \sum_{k \in S_p} CPE_k = \alpha \, Q_0 + \frac{1}{4} \sum_{k \in S_p} W_k.$   (20)

Here, $N_p$ is the number of pilot symbols (here $N_p = 4$), $S_p$ is the pilot symbol set, and $W_k$ is the total interference component due to ICI and AWGN.

$\frac{Y_k}{r_{cpe}} = \frac{\alpha \cdot X_k \cdot Q_0}{r_{cpe}} + \sum_{i=0, i \ne k}^{N-1} \frac{\alpha \cdot X_i \cdot Q_{i-k}}{r_{cpe}} + \frac{N_k}{r_{cpe}} = X_k + \sum_{i=0, i \ne k}^{N-1} \alpha \cdot X_i \, \bar{Q}_{i-k} + W_{ICI+AWGN}.$   (21)

From (21) we can see that the CPE component is removed, while the ICI caused by the non-linear distortion error, the phase noise, and the AWGN remain. In previous work, to remove the phase noise, the received signal of (21) is estimated by MMSE equalization, and the final recovered data sample (i.e., the transmitted data sample) is

$\hat{X}_k = Y_k \otimes C_k$   (22)

$C_k = \frac{\tilde{Q}^*_{i-k} \cdot H^*_k}{\left|\tilde{Q}_{i-k} \cdot H_k\right|^2 + \frac{\tilde{\sigma}^2_x}{E_x}}$   (23)

where
(⋅) ∗ means conjugate process, σ~x2 is variance of WICI + AWGN and E x is
useful signal power. Because the ICI caused by phase noise and the HPA is larger than the noise power during the equalization processing after the FFT, to optimize the tap weighting factor we use the error power obtained from the pilot symbols and do not separate the ICI effects of the phase noise and the HPA. Supposing that the channel impulse response is much smaller than the ICI effect (i.e., $H_k \ll \tilde{Q}_{i-k}$), $\tilde{Q}_{i-k} \cdot H_k \cong \hat{Q}^*_{i-k}$ and the coefficient of the equalizer is as follows:

$C_k = \frac{\hat{Q}^*_{i-k}}{\left|\hat{Q}_{i-k}\right|^2 + \frac{\tilde{\sigma}^2_x}{E_x}}.$

Using the previous method, we exploit the pilot symbols to obtain samples of $\tilde{Q}_{i-k}$, which represents the interference between sub-carriers, and minimize the cost function. For convenience, we replace $\alpha \cdot X_i$ by $X_h$:

$\min_{Q_0, k \in S_p} \left| Y_k - X_h \bar{Q}_{i-k} \right|^2, \qquad \hat{Q}_{i-k} = \gamma \bar{Q}_{i-k} + (1-\gamma)\bar{Q}'_{i-k}, \qquad \sigma^2_x = \frac{1}{N_p} \sum_{k \in S_p} |Y_k|^2.$   (24)

In the previous method, a null symbol was used to estimate $\tilde{\sigma}^2_x$, and the performance improvement was demonstrated by simulation. After removing the ICI caused by phase noise, the remaining components are the ICI due to the non-linear distortion error and the noise; we use the decision-directed compensation method to extract these components and use 4 pilot symbols for the estimation. We estimate the error affecting the sub-carriers using the power difference between the known pilot symbols and the pilots of $Y_k$ whose CPE has already been cancelled. The pilot symbols of $\tilde{Y}_k$, whose CPE component has not been removed, are not used because their phase distortion or rotation is large.

$\tilde{Q}_h = \frac{1}{N_P} \sum_{k \in S_p} \left( Y_p - \tilde{Y}_k \right)^2$   (25)

$\min_{Q_0, k \in S_p} \left| \tilde{Y}_k - \tilde{X}_k \tilde{Q}_h \tilde{Q}_{i-k} \right|^2$   (26)

$\tilde{Q}_h \tilde{Q}_{i-k} = \sum_{k \in S_p} \frac{Y_k X^*_k}{|X_k|^2}$   (27)
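The one-tap MMSE weighting of (23) can be sketched as follows. This is a toy illustration with scalar samples, $|H_k| \approx 1$, and a hypothetical distortion value, not the full equalizer:

```python
import numpy as np

def mmse_tap(q_hat, sigma2_x, e_x):
    """One-tap MMSE weight in the spirit of eq. (23) with H_k ~ 1:
    C = Q* / (|Q|^2 + sigma_x^2 / E_x)."""
    return np.conj(q_hat) / (np.abs(q_hat) ** 2 + sigma2_x / e_x)

# Toy check: with zero interference variance the tap inverts the distortion
# (ZF behavior); a larger variance shrinks the tap toward zero (regularization).
q = 0.9 * np.exp(1j * 0.2)           # hypothetical estimated distortion sample
y = q * (1 + 1j)                     # received sample = distortion * data
x_zf = mmse_tap(q, 0.0, 1.0) * y
assert abs(x_zf - (1 + 1j)) < 1e-12
assert abs(mmse_tap(q, 1.0, 1.0)) < abs(mmse_tap(q, 0.1, 1.0))
```

The regularizing term $\tilde{\sigma}^2_x/E_x$ is what distinguishes the MMSE weight from plain zero forcing, which is why it behaves better when the ICI power is significant.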
5   Simulation Results and Discussion
In this section, we show the effect of the phase noise and the back-off of the HPA on the equalizer in SC-FDMA. First, we confirm the improvement of the BER performance when controlling the back-off of the HPA without phase noise, and then investigate the effect of the phase noise according to the back-off.
Fig. 3. BER comparison versus SNR [dB] (phase error = 10 degrees, gain error = 0.1, phase noise = −20 dBc at 10 kHz, G = 2); curves: theory, without compensation, ICI cancellation only, IQ compensation only, and the proposed IQ compensation with ICI cancellation
Fig. 4 compares the BER with compensation as the phase noise changes. Here, we consider a back-off of 5.5 dB and set the phase noise to 0.005, 0.01, and 0.06 rad². From a phase noise of 0.005 rad², the system exhibits an error floor at 10⁻⁴ without compensation. When the phase noise is removed, the system performance is satisfactory below an SNR of 14 dB; hence, compensation of the phase noise is needed to meet the SNR requirement of 14 dB achieved without phase noise.

Fig. 4. BER comparison in case of compensation of IQ imbalance and phase noise, versus SNR [dB] (multipath channel, CFO = 0.01, gain error = 0.05, phase noise = −20 dBc at 10 kHz, back-off 9 dB, 16QAM); curves: theory, and phase errors of 4, 8, and 12 degrees with IQ compensation only, IQ and ICI compensation with adaptive channel estimation, and IQ and ICI compensation with LS channel estimation
6   Conclusion
The SC-FDMA with the distributed allocation method has a nearly similar PAPR compared with the sub-band allocation method, and the PAPR can be further reduced by adding spectrum shaping filtering with an appropriate roll-off factor. We analyzed the effects of the phase noise at the oscillator and the back-off at the HPA, which can be a problem for uplink data transmission, and we proposed an equalizer based on an advanced PNS algorithm to remove the ICI component effectively. When ICI components caused by phase noise and the HPA exist in the system simultaneously, the proposed method updates the reference value and the tap weighting coefficient of the conventional PNS algorithm, extracts the error caused by the phase noise and the HPA from the pilot symbols at once, and uses this component as a forgetting factor. The adaptive equalizer has a form similar to SC-FDE (single carrier-frequency domain equalization) and an operation process based on the PNS algorithm that removes the ICI component even in an IQ imbalance environment, as shown in the simulation results.
References
1. Parsaee, G., Yarali, A.: OFDMA for the 4th generation cellular networks. In: Canadian Conference on Electrical and Computer Engineering, May 2-5, vol. 4, pp. 2325–2330 (2004)
2. You, Y.-H., Jeon, W.-G., Wee, J.-W., Kim, S.-T., Hwang, I., Song, H.-K.: OFDMA Uplink Performance for Interactive Wireless Broadcasting. IEEE Transactions on Broadcasting 51(3), 383–388 (2005)
3. Ryu, H.G., Lee, H.S.: Analysis and minimization of phase noise of the digital hybrid PLL frequency synthesizer. IEEE Transactions on Consumer Electronics 48(2) (May 2002)
4. Wu, B., Cheng, S., Wang, H.: Clipping effects on channel estimation and signal detection in OFDM. In: 14th IEEE Proceedings on Personal, Indoor and Mobile Radio Communications, September 7-10, vol. 1, pp. 531–534 (2003)
5. Wu, S., Bar-Ness, Y.: A phase noise suppression algorithm for OFDM-based WLANs. IEEE Communications Letters 6(12), 535–537 (2002)
6. Wu, S., Bar-Ness, Y.: OFDM systems in the presence of phase noise: consequences and solutions. IEEE Transactions on Communications 52(11), 1988–1996 (2004)
7. Witschnig, H., Reich, H., Stallinger, K., Weigel, H., Springer, R.: Performance versus effort for decision feedback equalization - an analysis based on SC/FDE adapted to IEEE 802.11a. In: IEEE International Conference on Communications, vol. 6, pp. 3455–3459 (June 2004)
8. Wang, H., Chen, B.: Asymptotic distributions and peak power analysis for uplink OFDMA signals. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, May 17-21, vol. 4, pp. 1085–1088 (2004)
9. Sorger, U., De Broeck, I., Schnell, M.: Interleaved FDMA - a new spread-spectrum multiple-access scheme. In: IEEE International Conference on Communications, ICC 1998, June 7-11, vol. 2, pp. 1013–1017 (1998)
Content Authentication Scheme for Modifiable Multimedia Streams Hankyu Joo Dept. of Computer Engineering, Hallym Univ., Chuncheon, Korea [email protected]
Abstract. Multimedia streaming service has been growing rapidly, and content authentication for multimedia streams is an important issue. Several content authentication schemes for multimedia streams have been developed; however, these schemes do not allow modification of the content after the authentication material is generated. In many cases, the multimedia content producer and the streaming service provider are different: the content is produced by the content producer, the streaming service is performed by the streaming service provider, and the authentication material of the content is generated by the content producer. The content producer may allow the streaming service provider to modify certain parts of the content after the authentication material is generated. In this paper, we propose a content authentication scheme for multimedia streams that allows modification of specific parts of the content by the streaming service provider after the authentication material is generated. Keywords: Multimedia streaming, Authentication, Hash, Chameleon Hash.
1   Introduction
Multimedia streaming service has been growing rapidly with the development of high speed networks, personal computers, and networked TVs. Multimedia streaming can be divided into three categories: progressive download, live streaming, and on-demand streaming. In progressive download, the client begins playback of the multimedia file as it is delivered, and the file is ultimately stored on the client computer. In live streaming and on-demand streaming, the file is not stored on the client computer. Live streaming is used to deliver a live event while it is occurring; on-demand streaming is used to deliver media streams such as audio and video clips [1]. With the growth of multimedia streaming services, security of the streamed multimedia content has become an important issue, and content authentication for multimedia streams is an important part of multimedia streaming security. Content authentication for multimedia streams ensures the authenticity of the streamed content: that is, the streamed content was produced by the claimed content producer and has not been altered by anyone since the content was produced.
T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 377–386, 2011. © Springer-Verlag Berlin Heidelberg 2011
Content authentication schemes for multimedia streams have been developed [2]-[5]. The developed content authentication schemes work for progressive download and on-demand streaming. According to the developed schemes, the multimedia content may not be modified after the authentication material is generated. In many multimedia services, the multimedia content producer and the streaming service provider are different. In general, a content producer generates multimedia content, and the streaming service provider takes permission to provide the streaming service of the content under a contract. The content producer may allow the streaming service provider to modify certain parts of the content. For example, the streaming service provider may add an advertisement in the middle of the content under the producer's permission, and that advertisement may later be substituted by another advertisement without the knowledge of the producer. The producer of the content should not have to re-authenticate the content whenever such a substitution occurs; instead, the content producer needs to give the streaming service provider permission to modify certain parts of the multimedia content after the authentication material is generated. In this paper, a content authentication scheme for modifiable multimedia streams is proposed. The proposed scheme has the capability to authenticate the content of multimedia streams, while certain parts of the streamed content may be modified by the streaming service provider after the authentication material is generated. The content producer decides both the parts that may be modified later and the streaming service provider who may modify the content. The proposed scheme works for progressive download and on-demand streaming. The rest of the paper is organized as follows. The service model and notations used in the paper are given in Section 2. Related works are described in Section 3. The proposed content authentication scheme for modifiable multimedia streams is described in Section 4.
The proposed scheme is analyzed in Section 5. The conclusion is given in Section 6.
2   Service Model and Notations

2.1   Service Model
The content authentication scheme for multimedia streams consists of three entities: the content producer, the streaming service provider, and the client. The content producer produces multimedia content, which may contain some parts that may be modified by the streaming service provider. The content producer generates the authentication material for the content and delivers the content and the authentication material to the streaming service provider. The streaming service provider modifies parts of the multimedia content produced by the content producer, and provides the streaming service to the client upon the client's request. When the streaming service is requested by the client, the streaming service provider also delivers the authentication material to the client. The client plays (watches or listens to) the multimedia content and, while playing it, verifies that the content was produced by the claimed
content producer and has not been modified illegally by others. The modifiable parts are specified by the content producer and modified by the streaming service provider. Because of the characteristics of streaming, the client may not be able to verify the authenticity of the whole content before playing it, but verifies it while playing the content. Digital signatures are used in this scheme: we use the Digital Signature Algorithm (DSA) [6] for the digital signature and the Secure Hash Algorithm (SHA-1) [7] for the hash algorithm. We assume that the content producer and the streaming service provider have their own key pairs (public key, private key) for digital signatures.

2.2   Notations
The following notations are used in the paper.
P: content producer
S: streaming service provider
C: client
A → B: m : transfer of message m from A to B
a || b : concatenation of messages a and b
pubA: public key of A
prvA: private key of A
H(m): hash function of message m
Signprv(m): digital signature of message m using key prv
id: identifier of the content
modList: list of modifiable block numbers

3   Related Works
Gennaro and Rohatgi proposed a hash chaining scheme to authenticate multimedia streams [2]. A multimedia stream is divided into n blocks, and the hash value of each block is attached to the previous block. That is,

h_n = H(block_n) and h_i = H(block_i || h_{i+1}), where i = n−1, …, 1.   (1)

The first block of the stream is signed by the streaming service provider:

signature = Sign_prv(block_1 || h_2).   (2)

The signature is delivered to the client, and the blocks are streamed to the client. When a block (block_i) is delivered, the hash value of the successive block (h_{i+1}) is delivered to the client. The client verifies the first block by verifying the signature; the other blocks are verified by hash verification, that is, block_i is verified by checking h_i = H(block_i || h_{i+1}). The hash chaining scheme proposed in [2] cannot tolerate packet loss. Golle and Modadugu [3] proposed a scheme to authenticate multimedia streams that tolerates packet loss: additional hash links are added to the basic hash chaining scheme, and additional packets and links are inserted to create augmented chains, which give tolerance to packet loss. Zhang et al. [4] proposed another scheme to tolerate packet loss, which uses butterfly-graph based stream authentication. The performance of these multimedia stream authentication schemes was analyzed in [5]. The schemes proposed in [2, 3, 4] are for the authentication of multimedia streams and do not differentiate the content producer from the streaming service provider; they do not allow modification of the content after generation of the authentication material.
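The hash chain of (1) and the per-block client check can be sketched as follows. This is a minimal illustration using SHA-1 as in the schemes discussed; the block contents and count are illustrative:

```python
import hashlib

def build_chain(blocks):
    """Hash chain of eq. (1): h_n = H(block_n), h_i = H(block_i || h_{i+1})."""
    h = hashlib.sha1(blocks[-1]).digest()          # h_n
    hashes = [h]
    for block in reversed(blocks[:-1]):            # i = n-1, ..., 1
        h = hashlib.sha1(block + h).digest()
        hashes.append(h)
    hashes.reverse()                               # hashes[i-1] == h_i
    return hashes

def verify_block(block, h_i, h_next):
    """Client-side check: h_i == H(block_i || h_{i+1})."""
    return hashlib.sha1(block + h_next).digest() == h_i

blocks = [b"block-%d" % i for i in range(1, 6)]
hashes = build_chain(blocks)

# Stream verification: each block is checked against the next block's hash.
assert all(verify_block(blocks[i], hashes[i], hashes[i + 1])
           for i in range(len(blocks) - 1))
# A tampered block fails the check.
assert not verify_block(b"tampered", hashes[0], hashes[1])
```

Because each hash commits to all subsequent blocks, signing only the head of the chain authenticates the whole stream; this is also why a single lost packet breaks verification of everything before it.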
4   Proposed Authentication Scheme
We propose a scheme that allows a streaming service provider to modify specific parts of the streamed content; the producer of the content specifies the parts that may be modified. The proposed scheme combines a hash chain and a chameleon hash [8] to provide content authentication for modifiable multimedia streams, and it also uses a digital signature. The hash function used in this scheme is SHA-1 [7], and DSA [6] is used as the digital signature algorithm. The streaming service provider generates a DSA key pair (pubS, prvS), keeps prvS secret, and publishes pubS; the content producer and the clients may acquire pubS. The content producer likewise generates a DSA key pair (pubP, prvP), keeps prvP secret, and publishes pubP; the streaming service provider and the clients may acquire pubP. A DSA public key is composed of four integers p, q, g, y, and the private key is an integer x. DSA keys have the following properties:
p is a 1024-bit prime number; q is a 160-bit prime divisor of (p − 1); the multiplicative order of g modulo p is q; and y = g^x mod p.   (3)

A content producer produces a multimedia content. The content has its own identifier, id. The content producer divides the content into n blocks (block_1, block_2, …, block_n) with the same playing time; for example, each block may be played for 1 second. The content producer decides the modifiability of each block: a block is modifiable if the producer wants to allow the streaming service provider to modify the block later. The unmodifiable blocks are numbered from 1 to u (unmod_1, unmod_2, …, unmod_u), where u is the number of unmodifiable blocks. The modifiable blocks are numbered from 1
to v (mod_1, mod_2, …, mod_v), where v is the number of modifiable blocks; therefore n = u + v. The content producer selects two 160-bit integers, h_{u+1} and mh_{v+1}. The content producer generates u chained hash values, h_1, h_2, …, h_u, for the unmodifiable blocks. If a block is not modifiable, the content producer generates the chained hash values as (4):

h_i = H(id || i || h_{i+1} || unmod_i), where i = u, …, 1.   (4)

The content producer also generates v chained hash values, mh_1, mh_2, …, mh_v, for the modifiable blocks. If a block is modifiable, the content producer generates the chained hash values, mh_i, as (5):

mh_i = H(id || i || mh_{i+1} || mod_i), where i = v, …, 1.   (5)
The content producer generates a chameleon hash value, ch, from mh_1 using the public key of the streaming service provider, pubS. Assume that pubS consists of the four integers p, q, g, y. The content producer selects two random integers r and s in (1..q−1). Then the content producer calculates the chameleon hash value, ch, as (6) and (7):

e = H(id || mh_1 || r)   (6)

ch = (r − (y^e · g^s mod p)) mod q   (7)
||
||
) .
(8)
The content producer transfers identifier of the content (id), list of modifiable block numbers (modList), the contents blocks (block1..blockn), hash value for the first block ( ), chameleon hash value ( ), and the signature (signature) to the streaming service provider. P → S: id, modList, (block1..blockn), h1, ch, signature.
382
H. Joo
The streaming service provider may modify the modifiable blocks whenever necessary. After modifying the modifiable blocks, the streaming service provider regenerates authentication material for the modified blocks. The chameleon hash value of the modified block is not modified although the blocks are modified. Assume that modi is a modifiable block and mhi is the corresponding chained hash value for the block. The streaming service provider modifies the modifiable block (modi) to . Then the streaming service provider regenerates the hash values as (9). ( where
,
) || 2, … , 1 and
|| || 1,
.
(9)
The streaming service provider selects a random integer k in (1..q-1) and calculates r , , and using his private key, prvS as (10), (11), and (12). Assume that prvS is x. (
(
.
)
||
|| ) .
.
(10)
(11)
(12)
When the client requests a streaming service for the content, the streaming service provider transfers identifier of the content, list of modifiable block numbers, hash values for first unmodifiable block, hash values for first modifiable block, chameleon hash value, , and , and signature of the content producer. S → C: id, modList, h1,
, ch,
,
, signature.
Then the streaming service provider starts streaming service by delivering the content blocks. S → C: (block1..blockn). The client verifies the signature, h1, and ch. Then the client performs verification of the chameleon hash value, ch, by performing (11) and by checking (13).
mod q .
(13)
Content Authentication Scheme for Modifiable Multimedia Streams
383
As the streaming going on, when the client receives a block, if the block is modifiable (modi), then the client verifies the block by checking (14). (
|| ||
||
) .
(14)
If the verification of is successful, the modi is modified only by the streaming service provider and the modification of the block is permitted by the content producer. If the block is not modifiable (unmodi), then the client verifies the block by checking (15) ( If the verification of
5
|| ||
|| un
) .
(15)
is successful, the unmodi is not modified after produced.
Analysis
The proposed authentication scheme is analyzed based on correctness, unforgeability, and timeliness. The correctness is the property that the client always accepts the authentic content. The unforgeability is the property that the forging of the content by any attacker is always detected. The timeliness is the property that the authentication can be achieved in timely manner for the streaming so that delay and jitter are minimized. The correctness is satisfied by the proposed scheme. If the content is not changed (or the modifiable block is changed by the streaming service provider), the client may always accept the content. Because of the property of the digital signature, if the content producer generates the hash value (h1), chameleon hash value (ch) and signature on (id, h1, ch), the signature can always be accepted. Whenever the client receives a block, the client checks the authenticity of the block. If the block is not modifiable, the client accepts the block, unmodi, if equation (15) is true. is Since hi is calculated as equation (4), if the block is not modified after calculated, the equation (15) is always true and the block is accepted. as For modifiable blocks, since the streaming service provider generates equation (9), the fequation (16) is true. (
|| ||
||
) .
(16)
And for modifiable blocks, the client performs equation (11) and checks the validity of the chameleon hash by equation (13).
384
Since
H. Joo
is calculated as equation (11), equation (17) is true. (
||
||
) .
(17)
And because of the following properties (3), and equation (10), (11), and (12), the equation (18) holds.
mod q (
) (
(18) )
.
Therefore, a modifiable block that is modified by the permitted streaming service provider with the private key x is always accepted. Unforgeability is satisfied by the proposed scheme: when an unmodifiable block is modified, the client can detect it, and when a modifiable block is modified by anyone other than the streaming service provider, the client can also detect it. When the client requests the streaming service, the streaming service provider sends the signature of the content producer. When the client verifies the signature, the client knows that the signature, the hash value (h_1), and the chameleon hash value (ch) were generated by the content producer; no other party can generate h_1, ch, and the signature. If an unmodifiable block of the content is modified, the client can detect it. If the block (unmod_i) is modified to unmod'_i by anyone other than the content producer, the hash value h_i should be recalculated as (19):

h'_i = H(id || i || h_{i+1} || unmod'_i).   (19)
If the hash value is recalculated, the hash values for the preceding unmodifiable blocks (h_1, h_2, …, h_{i−1}) must also change; and if h_1 changes, the signature cannot be verified and the content is not accepted. If h_i is not recalculated, the client can detect the modification because equation (20) holds:

h_i ≠ H(id || i || h_{i+1} || unmod'_i).   (20)
The modifiable block of the content may be modified only by the service provider who has the private key. If the modifiable block is changed by an attacker, the client may always detect it. If modi is a modifiable block, the hash value mhi should be recalculated as (9) by the attacker. If the hash value is not recalculated, the client may detect the illegal modification because the equation (21) holds.
Content Authentication Scheme for Modifiable Multimedia Streams
mhi ≠ H( · || · || · || · )    (21)
If the hash value is recalculated, the hash values for the preceding modifiable blocks (mh1, mh2, …, mhi-1) must also change, and the chameleon hash value (ch) must change as well. Since the chameleon hash value, ch, is calculated as in equations (6) and (7), only the streaming service provider who has the private key x can generate values satisfying equation (22) [8]–[10].

ch = ( · || · || · ) mod q    (22)
A chameleon hash value generated by an attacker who does not have the private key x is detected by the client, because equation (23) holds.

ch ≠ ( · || · || · ) mod q    (23)
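The collision property used above can be illustrated with a toy chameleon hash. This is a generic Schnorr-style construction with tiny illustrative parameters, not the paper's scheme or its parameters: the holder of the trapdoor x can find a second opening r2 that keeps the hash value unchanged, which is what lets the service provider replace modifiable blocks without invalidating the signature.

```python
# Toy chameleon hash: ch = g^m * y^r mod p, with y = g^x.
# p, q, g, x are tiny illustrative values; real parameters are large primes.
p, q, g = 467, 233, 4    # g = 4 has prime order q modulo p
x = 57                   # trapdoor (the provider's private key)
y = pow(g, x, p)

def cham_hash(m, r):
    # chameleon hash of message m with randomness r
    return (pow(g, m, p) * pow(y, r, p)) % p

def collide(m1, r1, m2):
    # with the trapdoor x, solve m1 + x*r1 = m2 + x*r2 (mod q)
    # so that cham_hash(m2, r2) == cham_hash(m1, r1)
    return ((m1 - m2) * pow(x, -1, q) + r1) % q
```

Without x, finding such an r2 is as hard as computing discrete logarithms in the group, which is why an outsider's modification is detected.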
The proposed scheme minimizes jitter and thus preserves timeliness. Computing a hash function requires only a small amount of time, while digital signature verification and chameleon hash verification require more. Our experiments show that one hash computation takes about 0.004 ms, whereas DSA signature verification and chameleon hash verification each take about 10 ms. A client performs only a hash computation whenever it receives a block; it performs digital signature verification and chameleon hash verification only once, at the beginning of playback. Therefore the proposed scheme minimizes jitter.
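The per-block cost argument can be sketched in code. This is a minimal illustration under assumptions, not the paper's exact construction: it assumes a backward hash chain h_i = H(block_i || h_{i+1}) over the unmodifiable blocks, uses SHA-1 only as a stand-in hash, and replaces DSA verification of h1 with a direct comparison.

```python
import hashlib

def block_hashes(blocks):
    # backward hash chain (assumption): h_n = H(block_n), h_i = H(block_i || h_{i+1})
    h = b""
    hashes = [None] * len(blocks)
    for i in range(len(blocks) - 1, -1, -1):
        h = hashlib.sha1(blocks[i] + h).digest()
        hashes[i] = h
    return hashes

def verify_stream(blocks, attached_next_hashes, signed_h1):
    # forward verification: the (signed) h1 authenticates block 1 together with
    # the attached h2, which in turn authenticates block 2, and so on; each
    # received block costs exactly one hash computation
    expected = signed_h1              # stands in for DSA verification of h1
    for block, h_next in zip(blocks, attached_next_hashes):
        if hashlib.sha1(block + h_next).digest() != expected:
            return False
        expected = h_next
    return True
```

The signature check happens once, before the loop; every later block needs only the cheap hash, which is why playback jitter stays small.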
6  Conclusion
Content authentication is an important issue in multimedia streaming, and a content authentication scheme for multimedia streaming is proposed in this paper. The proposed scheme allows modification of the content by the streaming service provider after the authentication material has been generated. The modifiable parts are determined by the content producer, and the streaming service provider may modify them whenever necessary; for example, the service provider may substitute an advertisement with a new one without obtaining new authentication material from the content producer. The packet loss problem is not considered in this paper. To tolerate packet loss, the proposed scheme may be combined with augmented hash chaining [3] or the butterfly-graph based chaining scheme [4].
H. Joo
The proposed scheme authenticates content for progressive download and on-demand streaming. The scheme is not applicable to live streaming; more research is necessary for the authentication of live streams.

Acknowledgments. This research is supported by Hallym University Research Fund, 2011 (HRF-2011-031).
References

1. Silberschatz, A., Galvin, P.B., Gagne, G.: Operating System Concepts, 8th edn. Wiley (2008)
2. Gennaro, R., Rohatgi, P.: How to Sign Digital Streams. In: Kaliski Jr., B.S. (ed.) CRYPTO 1997. LNCS, vol. 1294, pp. 180–197. Springer, Heidelberg (1997)
3. Golle, P., Modadugu, N.: Authenticating streamed data in the presence of random packet loss. In: NDSS 2001, pp. 13–22 (2001)
4. Zhang, Z., Sun, Q., Wong, W.: A proposal of butterfly-graph based stream authentication over lossy networks. In: ICME 2005, pp. 784–787 (2005)
5. Hefeeda, M., Mokhtarian, K.: Authentication Schemes for Multimedia Streams: Quantitative Analysis and Comparison. ACM Trans. Multimedia Computing, Communications and Applications 6(1), Article 6 (2010)
6. Digital Signature Standard, FIPS 186, NIST (1994)
7. Secure Hash Standard, FIPS 180-1, NIST (1995)
8. Ateniese, G., de Medeiros, B.: On the Key Exposure Problem in Chameleon Hashes. In: Blundo, C., Cimato, S. (eds.) SCN 2004. LNCS, vol. 3352, pp. 165–179. Springer, Heidelberg (2005)
9. Ateniese, G., Chou, D.H., de Medeiros, B., Tsudik, G.: Sanitizable Signatures. In: de Capitani di Vimercati, S., Syverson, P.F., Gollmann, D. (eds.) ESORICS 2005. LNCS, vol. 3679, pp. 159–177. Springer, Heidelberg (2005)
10. Brzuska, C., Fischlin, M., Freudenreich, T., Lehmann, A., Page, M., Schelbert, J., Schröder, D., Volk, F.: Security of Sanitizable Signatures Revisited. In: Jarecki, S., Tsudik, G. (eds.) PKC 2009. LNCS, vol. 5443, pp. 317–336. Springer, Heidelberg (2009)
Intelligent Music Player Based on Human Motion Recognition

Wenkai Xu, Soo-Yol Ok, and Eung-Joo Lee

Department of Information and Communications Engineering, Tongmyong University, Busan, Korea
[email protected], {SooYol,ejlee}@tu.ac.kr
Abstract. In this paper, an intelligent music player system based on computer vision is proposed. We present an improved method for skin color area detection and segmentation based on a YCbCr and HSI mixed skin color space. We then use the Adaboost learning algorithm to obtain a face detector based on the Haar wavelet transform within the skin color region, and obtain the locations of the eyes and mouth. A frame subtraction algorithm is used to obtain the position of hand motion, and the CAMSHIFT algorithm is used to track the motion trajectory and estimate the center of the hand, which is used for controlling the music player. Afterwards, the system recognizes the hand motion by calculating the relationship between the mouth location and the position of the hand area in real time. Experimental results illustrate that this intelligent music control system provides satisfactory interactive input functions in real time and can be applied well under different illumination conditions and complex backgrounds.

Keywords: Face detection, hand motion detection, hand tracking, hand motion recognition.
1  Introduction
A tremendous technology shift has played a dominant role in all disciplines of science and technology. Virtual reality technologies, which can give humans the sensation of being involved in a computer world, have been a popular research field for many years. The use of hand gestures is an active area of research in the vision community, mainly for the purposes of sign language recognition and Human-Computer Interaction (HCI). Face and gesture recognition are application areas of HCI for communicating with computers. Mosimann [1] used the Haar wavelet transform to extract face features from local Haar features. A gesture is a spatio-temporal pattern which may be static, dynamic, or both. Static morphs of the hands are called postures, and hand movements are called gestures. In the last decade, several methods with potential applications in advanced gesture interfaces for HCI have been suggested, but these differ from one another in their models. Some of these models are Neural Networks [2], Hidden Markov Models (HMM) [3], and Fuzzy Systems [4]. The HMM is one of the most successful and widely used tools for modeling signals with spatio-temporal variability. Human motion detection is an important direction in the fields of computer vision and pattern recognition, with an important role in graphic image processing, intelligent control, video coding, and other fields.

T.-h. Kim et al. (Eds.): MulGraB 2011, Part I, CCIS 262, pp. 387–396, 2011. © Springer-Verlag Berlin Heidelberg 2011

Using computer vision systems to perform motion recognition is a complex and challenging task. In this paper, we present an improved method for skin color area detection and segmentation based on a YCbCr and HSI mixed skin color space. We then use the Adaboost learning algorithm to obtain a face detector based on the Haar wavelet transform within the skin color region, and obtain the locations of the eyes and mouth. A novel frame subtraction algorithm is used to obtain the position of hand motion, and the CAMSHIFT algorithm is used to track the gesture trajectory and estimate the center of the hand, which is used for controlling the music player. Afterwards, the system recognizes the hand motion by calculating the relationship between the face centroid location and the position of the hand area in real time. Using this algorithm we can exclude a large amount of interference and computation; complex data training and modeling are omitted.
2  Face Detection and Feature Extraction

2.1  Skin Color Region Segmentation
In color images, skin color information is a very important characteristic of the human face. Research shows that even across different races, ages, and genders, the difference in color chrominance is far smaller than the difference in brightness [5]. Skin color shows a clustered distribution in a skin-color space from which the luminance influence has been removed. Normally, to reduce the impact of brightness, a nonlinear YCbCr elliptic-cluster skin-color segmentation model is used, in which the illumination is concentrated in a single component (Y) while the color is contained in the blue (Cb) and red (Cr) chrominance components. But because of illumination and complex backgrounds with colors similar to skin, this method may still classify skin-color regions as non-skin color [5]. The other commonly used method for obtaining the skin area is based on the HSI color space, which contains hue (H), saturation (S), and luminance (I). The HSI color space is an important and attractive color model for image processing applications because it represents colors similarly to how the human eye senses them; thus skin color can be analyzed in the hue and saturation space so as to reduce the impact of luminance. This color space detects skin color well, so it is used in much skin-color detection research. Nevertheless, under the influence of the environment, this method may still classify non-skin color as skin color. Based on the skin color segmentation results in the YCbCr and HSI color spaces, we analyze the advantages and shortcomings of each to devise a better method. Through many experiments we find that by fusing the results received from the two methods, that is, by performing a pixel-wise "OR" operation on the two binary images obtained from the two color spaces, we obtain a better segmentation result than processing the image in the YCbCr or HSI color space alone, as shown below.
Fig. 1. The hand detection procedure: (a) Original image, (b) HSI color space, (c) YCbCr color space, (d) Fusion image
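The OR-fusion described above can be sketched as follows. The conversion formulas are the standard BT.601 RGB-to-YCbCr and RGB-to-hue/saturation transforms; the rectangular threshold ranges are illustrative stand-ins for the paper's elliptic cluster model and tuned bounds.

```python
import numpy as np

def rgb_to_ycbcr(img):
    # ITU-R BT.601 RGB -> YCbCr conversion (full-range approximation)
    r, g, b = (img[..., i].astype(float) for i in range(3))
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def ycbcr_skin_mask(img):
    # rectangular chrominance bounds; the paper's elliptic cluster model
    # would replace these illustrative thresholds
    _, cb, cr = rgb_to_ycbcr(img)
    return (cb > 77) & (cb < 127) & (cr > 133) & (cr < 173)

def hsi_skin_mask(img):
    # hue/saturation thresholds; the bounds are illustrative
    r, g, b = (img[..., i].astype(float) / 255.0 for i in range(3))
    mx = np.maximum(np.maximum(r, g), b)
    mn = np.minimum(np.minimum(r, g), b)
    delta = np.where(mx - mn == 0, 1e-12, mx - mn)
    hue = np.where(mx == r, (60.0 * (g - b) / delta) % 360.0,
          np.where(mx == g, 60.0 * (b - r) / delta + 120.0,
                            60.0 * (r - g) / delta + 240.0))
    sat = np.where(mx == 0, 0.0, (mx - mn) / np.where(mx == 0, 1.0, mx))
    return (hue < 50.0) & (sat > 0.1) & (sat < 0.9)

def skin_mask(img):
    # pixel-wise OR fusion of the two binary masks, as in the paper
    return ycbcr_skin_mask(img) | hsi_skin_mask(img)
```

A pixel rejected by one model but accepted by the other survives the fusion, which is what recovers skin regions lost to either model's failure cases.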
2.2  Face Detection and Location Based on Haar Feature
This article uses a YCbCr and HSI mixed skin color space model to perform image segmentation and obtain the skin color regions. It uses the Adaboost learning algorithm to obtain a face detector based on the Haar wavelet transform, and performs detection within the skin color region. The feature value of each Haar-like feature is the sum of the pixel gray values within the white rectangle minus the sum of the pixel gray values within the black rectangle (Figure 2).
Fig. 2. (a) Boundary, (b) Leptonema, (c) Diagonal
Viola [6] [7] proposed a fast algorithm in which an integral image allows the local Haar features to be extracted easily. With image coordinates defined from the top-left point, the integral image at a point is the sum of all pixel gray values above and to the left of that point. Using this algorithm, a single traversal of the fusion image computes the integral image at all points. With the integral image, the sum of the pixel gray values within any rectangle of the original image can be computed in constant time, so each feature value can be calculated quickly. Our training sample set contains 500 face images and 1500 non-face images; each sample is 24 × 24 pixels. For each feature, we calculate the corresponding feature value over the samples and take as the threshold the value with the minimum classification error rate, obtaining a weak classifier:
h_j(x) = 1 if p_j f_j(x) < p_j θ_j, and 0 otherwise    (1)
Because the eye region has a strong chrominance component, we can use the equation below to strengthen the eye area:

Eye_Image = (Cb² + (255 − Cr)² + (Cb − Cr)²) / 3,  Cb, Cr ∈ [0, 255]    (2)
Compared with other facial features, the color of the mouth has a higher-intensity Cr component and a lower-intensity Cb component. Thus, we can use the non-linear transform in formula (3) to enlarge the intensity difference between the Cr and Cb components and highlight the mouth region:

Mouth_Image = Cr² · (Cr² − η · Cr/Cb)²    (3)

η = 0.95 · Σ_{(x,y)∈f} Cr(x,y)² / Σ_{(x,y)∈f} ( Cr(x,y) / Cb(x,y) )    (4)
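The two chrominance maps translate directly into a small numpy sketch of equations (2)-(4) as reconstructed above; the normalization and binarization that would follow in a full pipeline are omitted here.

```python
import numpy as np

def eye_map(cb, cr):
    # chrominance eye map per eq. (2); cb, cr are float arrays in [0, 255]
    return (cb ** 2 + (255.0 - cr) ** 2 + (cb - cr) ** 2) / 3.0

def mouth_map(cb, cr):
    # mouth map per eqs. (3)-(4); eta balances Cr^2 against Cr/Cb over
    # the face region f (here, the whole input array)
    cb = np.maximum(cb, 1e-6)          # avoid division by zero
    ratio = cr / cb
    eta = 0.95 * (cr ** 2).sum() / ratio.sum()
    return cr ** 2 * (cr ** 2 - eta * ratio) ** 2
```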
We conduct an operation with Eye_Image and Mouth_Image to obtain the feature image, and determine the center of each connected region. Triangulating a sample structure from these centers, which form the base geometry of the eyes and mouth, we generate a triangular template (Figure 3) [11]. We calculate the final similarity score of the triangle against the template using formulas (5) to (7), and ultimately judge based on this similarity.

Fig. 3. Face triangle model

f1(θ1) = 1 − (θ1 − 0.8237)² / 0.6785,  0 ≤ θ1 < π/2    (5)

f2(θ2) = exp(−θ2² / 0.1921) if 0 ≤ θ2 < π/2, and 0 otherwise    (6)

finalScore = f1(θ1) · f2(θ2)    (7)
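Equations (5)-(7) translate directly into code; the constants and angle ranges are taken from the formulas above, and the out-of-range handling for f1 mirrors the explicit "0 else" branch of f2.

```python
import math

def f1(theta1):
    # similarity term for the first triangle angle, eq. (5)
    if not (0 <= theta1 < math.pi / 2):
        return 0.0
    return 1.0 - (theta1 - 0.8237) ** 2 / 0.6785

def f2(theta2):
    # similarity term for the second triangle angle, eq. (6)
    if not (0 <= theta2 < math.pi / 2):
        return 0.0
    return math.exp(-theta2 ** 2 / 0.1921)

def final_score(theta1, theta2):
    # eq. (7): product of the two angle similarities
    return f1(theta1) * f2(theta2)
```

The score peaks at 1 when θ1 = 0.8237 rad and θ2 = 0, i.e. when the candidate triangle matches the template geometry.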
We tested images of people of different races. The face detector, trained with the Adaboost learning algorithm and combined with the color information, accurately detects the face region. For each rectangular region, this method effectively excludes most non-face regions. We find candidate eye and mouth positions in the face region, and finally determine the position of the face accurately through the similarity calculation against the template.
3  Hand Motion Detection and Tracking

3.1  Motion Detection Based on an Improved Background Image Difference Algorithm
The image difference algorithm is usually used for images with a simple, uniform, static background. The method is easy to implement and runs quickly, but it is sensitive to changes in a dynamic scene, for instance the lighting. It detects a moving object by subtracting the gray-level values of the current image from those of the reference background image in a pixel-by-pixel manner. Where the environment changes little and the gray-level difference of corresponding pixels is small, the scene can be regarded as stationary; where the change in gray-level value is large, the scene can be regarded as a moving object. By marking these regions, the object's position in the image can be calculated. The segmentation approach for the differential image differs as the scale of the object and the lighting of the background differ. Its advantages are precise localization and fast operation, and it can segment the moving object, but the algorithm is sensitive to changes in the background image, which therefore needs to be updated. The background image difference algorithm is given by the following formula:
Z_ij(I) = X_ij[I] − Y_ij[I],  (i, j) ∈ Ω    (8)

where X_ij is the gray-level value of pixel (i, j) in the current frame, Y_ij is the gray-level value of pixel (i, j) in the background frame, and Z_ij is the difference of the gray-level values at pixel (i, j) between the two frames. According to previous research [8], it is better to apply the 3σ principle of the normal distribution: the threshold for the gray-value difference between two adjacent frames is chosen as three times the standard deviation of the difference distribution. It was found that, in the difference image, the gray-value difference of the human-movement region is directly related to the average gray value of the current detection image: when the average gray value of the current test image is large, the mean gray-value difference of the human-movement region is large, and when it is small, the mean difference is small. If the threshold T_k = 3σ_k is selected for image segmentation when the lighting is dim, the mean gray value of the current test image is small, so the human-movement region appears with large areas of rupture and leakage, and many foreground pixels are misidentified as background pixels. Therefore, this paper proposes a threshold based on the average gray value of the image: the threshold for differential image segmentation is the product of the average gray value L of the current detection image and a scale coefficient:

T = µ × L    (9)

where T is the threshold for difference image segmentation, µ is the scale factor, and L is the average gray value of the current test image. The experimental study found that when µ is in the range (0.15, 0.3), under conditions of both low and strong lighting, the resulting image segmentation is very satisfactory (Figure 4).
Fig. 4. Motion detection results: (a) Original image, (b) Difference image, (c) Binary image, (d) Face detection and hand motion detection result
3.2  Hand Motion Tracking
If the system does not detect hand motion, it detects only the face region; once hand motion is detected through the algorithm above, hand tracking begins. The mean shift algorithm is a simple iterative procedure that climbs the gradient of a probability distribution to find the nearest dominant mode; it operates on probability distributions [9] [10]. To track a hand in video frame sequences, the image data has to be represented as a probability distribution. Distributions derived from video image sequences change over time, so the mean shift algorithm has to be modified to adapt dynamically to the probability distribution it is tracking. The new algorithm that meets these requirements is called CAMSHIFT, a non-parametric method based on gradient estimation of a dynamically changing density distribution. The algorithm proceeds as follows:

1. Choose an initial search window W1;
2. Run the MEANSHIFT algorithm;
3. Resize the search window according to the result of step 2, obtaining a new window W2;
4. Use W2 as the initial search window for the next video frame and repeat the algorithm.
When the CAMSHIFT algorithm tracks an object of a specific color, it does not have to compute the color probability distribution over all pixels of every frame; it only computes the distribution in an area slightly larger than the current search window, which saves a great deal of computation.
Fig. 5. Result images of hand tracking with the CAMSHIFT algorithm
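The mean shift step at the core of CAMSHIFT (step 2 above) can be sketched on a 2-D probability map; the window resizing of step 3, which turns mean shift into CAMSHIFT, is omitted here for brevity.

```python
import numpy as np

def mean_shift(prob, window, max_iter=10):
    # prob: 2-D probability map; window: (row, col, height, width).
    # Each iteration moves the window so its center sits on the centroid
    # of the probability mass currently inside it.
    r, c, h, w = window
    for _ in range(max_iter):
        patch = prob[r:r + h, c:c + w]
        total = patch.sum()
        if total == 0:
            break
        rows, cols = np.indices(patch.shape)
        dr = int(round(float((rows * patch).sum() / total))) - h // 2
        dc = int(round(float((cols * patch).sum() / total))) - w // 2
        if dr == 0 and dc == 0:
            break                      # converged on the local mode
        r = int(np.clip(r + dr, 0, prob.shape[0] - h))
        c = int(np.clip(c + dc, 0, prob.shape[1] - w))
    return r, c, h, w
```

Starting the next frame from the converged window is what limits the computation to a neighborhood of the current search window.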
4  Hand Motion Recognition
Human movement is the combination of a series of continuous motions, with wide application in intelligent video surveillance, virtual reality, user interfaces, and motion recognition. From the methods above, we obtain the eye locations, the mouth location, and the hand centroid, and we detect and track the hand motion; these results supply very meaningful information for hand motion recognition. First, we calculate the location of the face and the centroid of the hand movement; if no hand movement is detected, only the face area is detected. We choose the mouth location O(0, 0) as the origin of the coordinate system, and calculate the hand centroid Hc(x1, y1) in the n-th frame and Hc1(x2, y2) in the (n + m)-th frame, where m is the time interval. The coordinate system for motion analysis is thus created as in Figure 6. To control the music player accurately and effectively using the extracted information, we define five types of operations corresponding to five kinds of hand motion. In the coordinate system we created, the different dynamic gestures can be expressed as:

1. x1 < 0 < x2 and y1 · y2 ≥ 0: motion from the left side to the right side;
2. x1 > 0 > x2 and y1 · y2 ≥ 0: motion from the right side to the left side;
3. y1 < 0 < y2 and x1 · x2 ≥ 0: motion from the down side to the up side;
4. y1 > 0 > y2 and x1 · x2 ≥ 0: motion from the up side to the down side.
Fig. 6. Establishing the coordinate system
Besides these, there is another motion type, in the Z-axis direction: depth information. We obtain the depth information by calculating the change of the palm area, to implement a "click" function. When the palm area difference between the n-th frame and the (n + m)-th frame is greater than a threshold ΔS, we regard it as a click motion, expressed as:

S_palm^(n+m) − S_palm^n ≥ ΔS    (10)
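The four planar rules plus the click rule of equation (10) combine into one classifier. The default ΔS value and the control-signal comments are illustrative; the paper's mapping of gestures to player commands is given in the next section.

```python
def classify_gesture(x1, y1, x2, y2, area1=None, area2=None, delta_s=1500):
    # hand centroid (x1, y1) at frame n and (x2, y2) at frame n+m, in a
    # coordinate system with the mouth at the origin; delta_s is an
    # illustrative palm-area threshold for the "click" rule of eq. (10)
    if area1 is not None and area2 is not None and area2 - area1 >= delta_s:
        return "click"            # palm grew: hand moved toward the camera
    if x1 < 0 < x2 and y1 * y2 >= 0:
        return "left-to-right"    # rightwards gesture
    if x1 > 0 > x2 and y1 * y2 >= 0:
        return "right-to-left"    # leftwards gesture
    if y1 < 0 < y2 and x1 * x2 >= 0:
        return "down-to-up"       # upwards gesture
    if y1 > 0 > y2 and x1 * x2 >= 0:
        return "up-to-down"       # downwards gesture
    return "none"
```

Checking the click rule first keeps a forward hand push from being misread as a planar swipe when the centroid also drifts across an axis.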
5  Experimental Results
Based on the proposed method, we implemented an intelligent music player control system based on computer vision. Of the five defined gesture types, four are planar: upwards, downwards, leftwards, and rightwards. The fifth is movement in the Z-axis direction: moving towards or away from the camera. In the control system we present, these five gesture types carry different control meanings: the upwards gesture is "Turn up the volume", the downwards gesture is "Turn down the volume", the leftwards gesture is "Previous song", the rightwards gesture is "Next song", and movement in the Z-axis direction is the "Play or Pause" signal. The system GUI and experimental results are shown in Figure 7.
Fig. 7. The intelligent music player GUI and function display: (a) starting the music player at the 6th-second frame, (b) moving the hand to play the previous song at the 10th-second frame
We selected ten persons for testing under different backgrounds. Under a simple background, the average accuracy rate of human motion recognition exceeds 94%. Under a complex background, face recognition exceeds 94% and motion detection exceeds 92%. Overall, both face recognition and motion recognition achieve high accuracy; in particular, the average accuracy rate of human motion recognition exceeds 93.5%. The results are shown in Table 1.

Table 1. Human body posture recognition results

                    Under simple background    Under complex background
                    Num       Percent          Num       Percent
Leftwards COR       191       95.5%            186       93%
Rightwards COR      187       93.5%            184       92%
Upwards COR         193       96.5%            184       92%
Downwards COR       188       94%              183       91.5%
Total COR           759       94.88%           737       92.13%
Total Test          800       100%             800       100%
6  Conclusion
In this paper, an intelligent music player system based on human motion recognition is proposed. Motion recognition is conducted on top of motion detection and motion tracking, so the results of detection and tracking directly influence the effectiveness of human motion recognition. First, we present an improved method for skin color area detection and segmentation based on a YCbCr and HSI mixed skin color space. Then we use the Adaboost learning algorithm to obtain a face detector based on the Haar wavelet transform within the skin color region, and obtain the locations of the eyes and mouth. A novel frame subtraction algorithm is used to obtain the position of hand motion, and the CAMSHIFT algorithm is used to track the gesture trajectory and estimate the center of the hand, which is used for controlling the music player. Afterwards, the system recognizes the hand motion by calculating the relationship between the face centroid location and the position of the hand area in real time. According to a large number of experiments, the intelligent music player system we present provides robust performance on dynamic video sequences, and the actual results are satisfactory.
Acknowledgements. This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the IT/SW NHN Program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2011-(C1820-1102-0010)).
References

1. Mosimann, U.P., Muri, R.M., Burn, D.J., Felblinger, J., O'Brien, J.T., McKeith, I.G.: Saccadic eye movement changes in Parkinson's disease dementia and dementia with Lewy bodies 128(6), 1267–1276 (2005)
2. Deyou, X.: A Network Approach for Hand Gesture Recognition in Virtual Reality Driving Training System of SPG. In: ICPR 2006, pp. 519–522 (2006)
3. Elmezain, M., Al-Hamadi, A., Michaelis, B.: Real-Time Capable System for Hand Gesture Recognition Using Hidden Markov Models in Stereo Color Image Sequences. WSCG Journal 16(1), 65–72 (2008)
4. Holden, E., Owens, R., Roy, G.: Hand Movement Classification Using Adaptive Fuzzy Expert System. Expert Systems Journal 9(4), 465–480 (1996)
5. Xu, W., Lee, E.-J.: Improved Hand Detection and Gesture Recognition Algorithm. In: Korea Multimedia Society Conference 2011, p. 99 (2011)
6. Papageorgiou, C.P., Oren, M., Poggio, T.: A General Framework for Object Detection. In: Sixth International Conference on Computer Vision, pp. 555–562 (1998)
7. Viola, P., Jones, M.J.: Robust Real-time Object Detection. Cambridge Research Laboratory, Technical Report Series (2001)
8. Jing, G., Rajan, D., Siong, C.E.: Motion Detection with Adaptive Background and Dynamic Thresholds. In: Conference on Information, Communications and Signal Processing, pp. 41–45 (2005)
9. Liu, X., Chu, H., Li, P.: Research of the Improved Camshift Tracking Algorithm. In: IEEE Conference on ICMA 2007, pp. 968–972 (2007)
10. Yu, B., Lee, E.-J.: The hand mouse: Hand detection and hand tracking. In: International Conference on MITA 2009, pp. 244–245 (2009)
11. Shang, Y., Lee, E.-J.: Face and Hand Activity Detection Based on Haar Wavelet and Background Updating Algorithm. Journal of Korea Multimedia Society 14(8), 992–999 (2011)