COMPUTER SCIENCE, TECHNOLOGY AND APPLICATIONS
COMPUTER ANIMATION
COMPUTER SCIENCE, TECHNOLOGY AND APPLICATIONS Additional books in this series can be found on Nova’s website at:
https://www.novapublishers.com/catalog/index.php?cPath=23_29&seriesp= Computer%20Science%2C%20Technology%20and%20Applications&sort=2a&page=1
Additional e-books in this series can be found on Nova’s website at:
https://www.novapublishers.com/catalog/index.php?cPath=23_29&seriespe= Computer+Science%2C+Technology+and+Applications
COMPUTER SCIENCE, TECHNOLOGY AND APPLICATIONS
COMPUTER ANIMATION
JARON S. WRIGHT AND
LLOYD M. HUGHES EDITORS
Nova Science Publishers, Inc. New York
Copyright © 2010 by Nova Science Publishers, Inc.
All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or otherwise without the written permission of the Publisher.

For permission to use material from this book please contact us: Telephone 631-231-7269; Fax 631-231-8175; Web Site: http://www.novapublishers.com

NOTICE TO THE READER

The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained in this book. The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or reliance upon, this material.

Any parts of this book based on government reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of such works.

Independent verification should be sought for any data, advice or recommendations contained in this book. In addition, no responsibility is assumed by the publisher for any injury and/or damage to persons or property arising from any methods, products, instructions, ideas or otherwise contained in this publication.

This publication is designed to provide accurate and authoritative information with regard to the subject matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering legal or any other professional services. If legal or any other expert assistance is required, the services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS.

LIBRARY OF CONGRESS CATALOGING-IN-PUBLICATION DATA

Computer animation / editors, Jaron S. Wright and Lloyd M. Hughes.
p. cm.
ISBN 978-1-61209-078-8 (eBook)
Published by Nova Science Publishers, Inc.
New York
CONTENTS

Preface  vii

Chapter 1  Computer Animation Applied to the Recovery of Preindustrial Heritage: A New Approach
  José Ignacio Rojas-Sola and Francisco Javier Contreras-Anguita  1

Chapter 2  Virtual Engineering in Augmented Reality
  Pier Paolo Valentini, Eugenio Pezzuti and Davide Gattamelata  57

Chapter 3  A Survey of Popular 3D Soft-Body Animation Compression Approaches
  S. Ramanathan and A.A. Kassim  85

Chapter 4  Virtual Emotion to Expression: A Comprehensive Dynamic Emotion Model to Facial Expression Generation Using the MPEG-4 Standard
  Paula Rodrigues, Asla Sá and Luiz Velho  113

Chapter 5  Example-Based Performance-Driven Animation of an Anatomical Face Model
  Yu Zhang  129

Chapter 6  Dynamics for Managing Occlusion of Buildings in Panoramic Maps
  Neeharika Adabala  145

Chapter 7  Constraint-Based and Feature-Based CAD Systems and Applications
  Ioannis Fudos and Vasiliki Stamati  157

Chapter 8  Computer Aided Geometric Design with Powell-Sabin Splines
  Hendrik Speleers, Paul Dierckx and Stefan Vandewalle  177

Chapter 9  An Ontology of Computer-Aided Design
  Udo Kannengiesser and John S. Gero  209

Index  235
PREFACE

Over the last few decades, computer-aided engineering (CAE) methodologies have deeply changed the way products, systems and services are designed and developed. Thanks also to significant hardware and software improvements, CAE techniques are widely used by designers from the early conceptual phases up to the final stages of engineering processes. At the industry level, these methodologies have become a fundamental tool for remaining competitive and ensuring high quality standards. In industrial engineering, computer-aided methodologies are typically instrumental for design teams in shape modeling, behavioral simulation, digital mock-ups and realistic animation. They are able to follow the development of a product from conception to production, also managing its life-cycle.

Character animation is one of the key research areas in computer graphics and multimedia, with applications in many fields ranging from entertainment and games to virtual presence. This important new book gathers the latest research from around the globe in this dynamic field.

The heritage of the preindustrial period is today coming under examination more often, as engineering must accept the study of its own evolution as a discipline, from a technical as well as a historical perspective. Engineering therefore provides industrial archaeology and the history of technology with an important element needed to complete the study of industrial heritage. These studies are generally approached from the perspectives of history, ethnography, philology and architecture, but do not usually include an engineering perspective. Chapter 1 provides a detailed examination of the infographic work carried out on a Manchegan windmill (La Mancha – Quixote), as an example of preindustrial heritage, in order to obtain a computer animation, so that the procedure followed can be extrapolated to other examples of preindustrial heritage. One of the reasons for choosing the windmill is that flour mills represented an important nucleus of the economy and of the industrial and social development of society; for this reason their study is important, especially for industrial history. The study and analysis of these windmills is especially important owing to their general state of abandonment and deterioration, and includes analysis of the techniques used in their construction and in the working of the windmill. Computer animation is a key element in the recovery of this interesting preindustrial heritage. In addition, the chapter discusses the advantages of this technique compared with others such as virtual reality, and why the majority of museum interpretation centres already possess these tools.
CAD-CAE (Computer-Aided Design/Computer-Aided Engineering) techniques, through computer animation, provide a fundamental tool for presenting an integral study, from the perspective of engineering, of any example of preindustrial heritage. The importance of this chapter lies in the fact that it presents, in an innovative and structured way, the procedure for generating a computer animation of preindustrial heritage.

In Chapter 2 the authors discuss several approaches to integrating computer-aided engineering instruments into an Augmented Reality environment. Engineers and designers often develop their creative ideas in front of a computer monitor using a mouse and keyboard. Although the integration between numerical computation and graphics leads to the generation of very realistic digital mock-ups, these are still far from the real context and the user has limited interaction with them. The purpose is to illustrate how recent developments in computer graphics and image processing can improve the realism of, and interactivity with, digital mock-ups. Starting from the interactive modeling of 3D shapes, the chapter presents some examples of the integration of real-time mechanism motion simulation and the post-processing of structural and fluid dynamics analyses.

In Chapter 3, the authors review 3D dynamic mesh compression algorithms and investigate how vertex clustering, which chiefly contributes to animation coding complexity, affects compression performance. The authors conclude the chapter with observations that need to be effectively addressed by future 3D animation coding algorithms.

In Chapter 4 the authors present a framework for generating dynamic facial expressions synchronized with speech, rendered using a three-dimensional realistic face. Dynamic facial expressions are temporal facial expressions semantically related to emotions, speech and affective inputs that can modify the behavior of a facial animation. The framework is composed of an emotion model for speech virtual actors, named VeeM (Virtual emotion-to-expression Model), which is based on a revision of Plutchik's emotional wheel model. VeeM introduces the concept of an emotional hypercube in the canonical space R4 to combine pure emotions and create new derived emotions. The VeeM model is implemented using the MPEG-4 face standard through an innovative tool named DynaFeX (Dynamic Facial eXpression). DynaFeX is an authoring and playback facial animation tool, in which speech processing is performed to allow phoneme and viseme synchronization. The tool allows both the definition and refinement of emotions for each frame, or group of frames, as well as the editing of the facial animation using a high-level approach based on animation scripts. The tool's player controls the presentation of the animation, synchronizing the speech and emotional features with the virtual character's performance. Finally, DynaFeX is built over a three-dimensional polygonal mesh compliant with the MPEG-4 facial animation standard, which favors interoperability with other facial animation systems.

Recent developments in physics-based face modeling that emulates the anatomical structure, including skin, muscles and skull, allow the creation of detailed, realistic animations. However, the synthesis of facial expressions on such complex models often involves significant manual work, owing to the difficulty of determining appropriate values of the muscle actuation parameters.
Chapter 5 presents an example-based, performance-driven method to automatically estimate facial muscle actuation parameters from markerless video footage. The authors' method is based on an efficient face tracker which uses a facial deformation subspace model. During the training phase of the tracker, a set of templates associated with the subspace basis is computed to reduce the online computation. At runtime, the tracking algorithm establishes temporal correspondence of the face region in the video sequence by
simultaneously determining both motion and appearance parameters. Using a set of example pairs consisting of the appearance and animation parameters corresponding to key expressions, the authors learn the relationship between facial appearance and animation parameters. This enables the animation parameters to be computed in real time from the appearance parameters obtained by the tracker, allowing the anatomical model to be animated at interactive rates.

Panoramic maps depict urban areas in oblique view. This form of cartography was prevalent from the late sixteenth century to the early nineteenth century, when there were not many skyscrapers in urban areas. But oblique-view maps of current urban scenarios suffer from loss of detail due to occlusion among closely located multistory buildings. In Chapter 6 the authors leverage the time dimension to overcome clutter in the space dimension by introducing functional dynamics. The authors define a parameter called the occlusion index for an urban scene at a given viewpoint. Solving the problem of occlusion involves devising methods for visualizing the urban scene that reduce or minimize the occlusion index. They explore occlusion reduction techniques that involve selecting optimal viewpoints, displacing buildings, making buildings transparent and changing building heights. The authors demonstrate these approaches by presenting screenshots of the solution applied to a prototype city block, and discuss the advantages and disadvantages of these solutions. This work is pioneering in its approach of applying animation to cartography, which has previously used animation only to depict time-dependent phenomena or fly-throughs.

A new generation of Computer Aided Design systems has become available in which geometric constraints can be defined to determine properties of large designs. The new design concept, often called constraint-based design or design by features, offers users the capability of easily defining and modifying a design, but introduces the problem of solving complicated, not always well-defined, constraint problems. Traditional parametric models can also be enhanced to partially support declarative constraint-based descriptions. In Chapter 7 the authors provide an overview of representation schemes for CAD applications. They then present a survey of methods for geometric constraint solving appropriate for Computer Aided Design. The authors demonstrate how these representations and constraint solving methods can be combined or adapted to support a broad range of CAD applications by presenting two cases in which a feature-based, constraint-based representation scheme was successfully used to support two different CAD applications.

Powell-Sabin splines are bivariate C1-continuous quadratic splines defined on an arbitrary triangulation. Their construction is based on a particular split of each triangle in the triangulation into six smaller triangles. In Chapter 8 the authors give an overview of the properties of Powell-Sabin splines in the context of computer aided geometric design. These splines can be represented in a compact normalized B-spline basis with an intuitive geometric interpretation involving control triangles. Using these triangles one can interactively change the shape of the splines in a predictable way. The authors describe the simple subdivision rules for Powell-Sabin splines, and discuss some applications. The authors also consider a new, efficient spline visualization technique based on subdivision.
The authors also look at two useful generalizations of Powell-Sabin splines, namely QHPS splines and NURPS surfaces. QHPS splines are a hierarchical variant of Powell-Sabin splines. They have properties very similar to those of Powell-Sabin splines, and their hierarchical nature allows local refinement of the spline in a very straightforward way. The NURPS surface is the rational extension of
the Powell-Sabin spline. By means of weights, they give the designer extra degrees of freedom for the modelling of surfaces.

Chapter 9 develops an ontology of computer-aided design based on the function-behaviour-structure (FBS) ontology. It proposes two complementary views of the process of design. The object-centred view applies the FBS ontology to the artefact being designed. Integrating an ontology of three “design worlds”, this view establishes a framework of designing as a set of transformations between the function, behaviour and structure of the design object, driven by interactions between the three design worlds. Building on this framework, the process-centred view applies the FBS ontology to the activities defined by the object-centred view. This increases the level of detail and provides a more well-defined set of representations of these activities. The authors' ontological framework can be used to provide a better understanding of the functionalities required of existing and future computer-aided design support.
In: Computer Animation Editors: J.S. Wright and L.M. Hughes, pp. 1-56
ISBN: 978-1-60741-559-6 © 2010 Nova Science Publishers, Inc.
Chapter 1
COMPUTER ANIMATION APPLIED TO THE RECOVERY OF PREINDUSTRIAL HERITAGE: A NEW APPROACH

José Ignacio Rojas-Sola* and Francisco Javier Contreras-Anguita

University of Jaén, Department of Engineering Graphics, Design and Projects, Campus de las Lagunillas, s/n, Jaén 23071, Spain
Abstract

The heritage of the preindustrial period is today coming under examination more often, as engineering must accept the study of its own evolution as a discipline, from a technical as well as a historical perspective. Engineering therefore provides industrial archaeology and the history of technology with an important element needed to complete the study of industrial heritage. These studies are generally approached from the perspectives of history, ethnography, philology and architecture, but do not usually include an engineering perspective. This chapter provides a detailed examination of the infographic work carried out on a Manchegan windmill (La Mancha – Quixote), as an example of preindustrial heritage, in order to obtain a computer animation, so that the procedure followed can be extrapolated to other examples of preindustrial heritage. One of the reasons for choosing the windmill is that flour mills represented an important nucleus of the economy and of the industrial and social development of society; for this reason their study is important, especially for industrial history. The study and analysis of these windmills is especially important owing to their general state of abandonment and deterioration, and includes analysis of the techniques used in their construction and in the working of the windmill. Computer animation is a key element in the recovery of this interesting preindustrial heritage. In addition, the chapter discusses the advantages of this technique compared with others such as virtual reality, and why the majority of museum interpretation centres already possess these tools.
* Corresponding author: Professor Dr. José Ignacio Rojas-Sola, University of Jaén, Department of Engineering Graphics, Design and Projects, Campus de las Lagunillas, s/n, Jaén 23071, Spain. E-mail address: [email protected]; Tel: +34-953-212452; Fax: +34-953-212334.
CAD-CAE (Computer-Aided Design/Computer-Aided Engineering) techniques, through computer animation, provide a fundamental tool for presenting an integral study, from the perspective of engineering, of any example of preindustrial heritage. The importance of this chapter lies in the fact that it presents, in an innovative and structured way, the procedure for generating a computer animation of preindustrial heritage.
Introduction

A Google search for the term “computer animation” returns 2,650,000 results, while a search for the term “heritage” gives 117,000,000 results; a search for both terms together gives 91,400 results (search carried out on 22 December 2008). This shows the increasing importance of heritage in all of its facets. This importance can also be seen in the numerous prestigious international congresses on the subject, such as “World Heritage in the Digital Age” (organized by UNESCO's World Heritage Centre) or the VAST conferences (International Symposium on Virtual Reality, Archaeology and Cultural Heritage). We must also consider the existence of high-impact journals, such as the Journal of Cultural Heritage (JCR), among others, the large number of websites dedicated to the issue [1], [2], and the European Union's 7th Framework Programme [3].

The UNESCO World Heritage Centre [4] defines heritage as “our legacy from the past, what we live with today, and what we pass on to future generations”. In terms of the virtual heritage which concerns us here, researchers believe that it can serve to encourage people to visit the actual site, and can provide a complement to such a visit [5]; visitors can benefit from the changes and opportunities it offers [6]. Current trends in work on virtual heritage point to three different steps: complete 3D documentation, 3D representation (from historical reconstruction to visualization) and 3D publication (from immersive reality to augmented reality) [7]. Many applications have been developed which deal with historical sites or buildings, and in 2000 it was already forecast that in the following decade work would be centered on virtual industrial heritage [8].

Industrial heritage has a close relationship with industrial archaeology. Much has been written on this subject, defining it variously as the discovery, analysis, recording and preservation of past industrial remains [9]; the discovery, cataloguing and study of physical remnants of the industrial past, in order to learn about significant aspects of the world of work and of technical and production processes [10]; or the study of material culture and aspects linked to production, distribution and consumption, in the future and in connection with the past [11]. Today there are many examples of industrial heritage that are about to disappear, many of them already in ruins. Many organizations are working to study and analyze these cases, some linked to industrial archaeology, such as TICCIH (The International Committee for the Conservation of Industrial Heritage) [12] and the AIA (Association for Industrial Archaeology, UK) [13], others linked to the history of technology, such as SHOT (Society for the History of Technology, USA) [14], as well as branches of UNESCO which study the many aspects of heritage: architectural, industrial, cultural and ethnographic, to name but a few.

The recovery of heritage is in many cases linked to the history of technology, as it is a fundamental element in the study of the technological evolution of any invention. Engineering graphics, and more specifically infographic techniques, play an essential role in the study of the history of technology, given the universal character of graphic language, as is
shown by the large number of articles in print which deal with graphic reconstructions of various inventions and devices [15-19].

However, in many cases the efforts of conservationists, archaeologists and restorers are not enough. In particular, the heritage represented by ruined buildings and constructions, whether architectural or industrial, is often lost owing to the interests of urban development or the lack of a renovation project which could give life to the area and bring opportunities for work. This loss is even clearer in the case of preindustrial heritage1, which has been part of production processes, not only because of the wideness of its scope, but also because older machines suffer greater deterioration when they are no longer used. On many occasions initiatives are put in place to conserve examples of industrial heritage, for example Museums of Science and Technology, which are becoming increasingly common, as they are a way of safeguarding a form of culture linked to the socio-economic development of a given area [20]. This is all the more evident in the case of elements related to proto-industrialization (windmills, watermills, fulling mills and oil presses, among others), as they date from the preindustrial period and their age makes them more susceptible to deterioration and disappearance.

The role of synthesis images in the conservation of industrial heritage has grown exponentially in recent years. They allow an area, building or object to be preserved and interpreted in ways otherwise impossible to imagine using photographic techniques. The most important factor, however, is that when using virtual models it is not necessary to disturb or modify the original item. There are also other advantages, stemming from the computer animation itself. Firstly, there is a socio-cultural objective in the conservation of the ‘collective historical memory’ of an area where a given type of heritage was prevalent, providing information on the evolution of that society. Secondly, there is a clear educational objective in showing details of an abandoned culture [21]. Thirdly, there is also technological interest, as the use of computer animation techniques and processes provides valued know-how. A computer-generated image should be as faithful as a figure in a journal, although this is rarely possible [22]. In sum, whenever an element of a society's heritage is lost, it also becomes impossible to study, analyze and value its impact on that society.

This chapter presents a new approach to the use of computer animation techniques applied to an element of preindustrial heritage, the Manchegan windmills (La Mancha, Spain), which were built in the 16th century and some of which remain in near-perfect condition today. The specific windmill under study is the ‘Sardinero’, one of the 10 which still stand in the area of Campo de Criptana (Ciudad Real, Spain); it has also been declared of special cultural interest by the Spanish Government. These famous windmills appear in the masterpiece of Spanish literature, Don Quixote, by Miguel de Cervantes.
The Windmill

The windmill is one of the devices that has been most widely used over the centuries to obtain flour, an essential part of the human diet. A detailed study of their working mechanisms [23-25] and
1 The following is applicable to both preindustrial and industrial heritage.
their design codes allows us to develop a computer model and completely conserve this example of heritage, leaving a legacy which can be studied by future generations.
Architecture

Although there are various types of windmill, both in Spain and in other countries, the original architecture of a Manchegan windmill has three floors. The ground floor (cuadra) is where the cereal was received and the canvas sails were stored. The first floor (camareta) is where the flour was packed into sacks, and the second floor (moledero) housed all the machinery necessary for milling the cereal [26]. The windmill had a cylindrical masonry tower about 8 m in height, capped with a conical cover (windmill cap) made of zinc, about 3.5 m in height. This rested on a ring on the top of the tower, which allowed this part of the windmill to turn to face the prevailing wind.
Working

The way a Manchegan windmill worked can be explained using the following photographs, which were taken by the author. Figure 1 shows an exterior view of the ‘Sardinero’ windmill in Campo de Criptana. The photograph shows different functional elements of the windmill, such as the sails (which would be covered with canvas to increase their surface area), the windshaft and the windmill cap.
Figure 1. View of the Windmill (labeled: windmill cap, wind shaft, sails, upper windows).
Figure 2. Close-up view of the sails.
Figure 3. Close-up view of the join between the sails and the windshaft.
Each windmill has two rotation systems: a horizontal system, formed by the windshaft, the sails and other elements which will be described below; and a vertical system formed by the windmill cap and the tailpole. This vertical rotation system allowed the windmill cap to turn so that the sails faced the prevailing wind. This was done by the miller, who would use
the 12 small upper windows around the windmill to determine which way the wind was blowing. Figure 2 shows the geometry of the sails, which have become deformed over time. They measure some 16 m from tip to tip and are formed by two central stocks; each sail had its central stock, four lengthways struts and 19 crossways struts, giving rigidity to the sail. Figure 3 shows the detail of the join between the sails and the windshaft, and the three struts which provide rigidity to the sails.
Figure 4. Front View of the Entrance (labeled: tailpole, tripod, first floor window, entrance, stone markers).
Figure 5. Space for storing sail canvasses.
Figure 4 is a view of the front of the windmill, showing the only entrance to the ground floor (cuadra), the window of the first floor (camareta) which provided the only source of natural light, the tailpole, which allowed the windmill cap to turn into the wind, the tripod or support for the tailpole, and the 12 stone markers which marked the 12 possible positions of the tailpole.
Figure 6. Entrance to the ground floor, with the counterweight which was used to separate the milling stones.
Figure 7. Spiral staircase leading to the first floor.
In the entrance to the ground floor there was an area where the sail canvasses were stored (Figure 5), and where the counterweight hung (Figure 6). This counterweight could be pulled down manually in order to separate the two milling stones.
Figure 8. Beams (marranos) supporting the milling floor.
Figure 9. View of the flour channel where the flour was put into sacks.
The spiral staircase (Figure 7) which leads up to the first floor runs alongside the two huge beams (marranos), which supported the second floor (milling floor) (Figure 8). On this floor there was a flour channel (Figure 9) through which the milled flour passed directly to be put into sacks. The mechanism which separated the milling stones (relief mechanism) (Figure 10) was located on the milling floor, and was activated by the counterweight shown in Figure 6. Figure 11 shows the runner and the bedstone, between which the cereal was milled. These stones were normally grooved to aid the milling process. The photograph also shows the outlet for flour which led to the flour channel shown in Figure 9.
Figure 10. Stairway to the milling floor (labeled: mechanism for separating the milling stones).
Figure 11. View of the two milling stones and the outlet for flour (labeled: runner, bedstone, flour channel).
Figure 12 shows the gearing between the wallower and the brake wheel (fixed to the windshaft), which were the gear wheels that transmitted movement to the milling mechanism. The brake wheel had 40 cogs and the wallower had 8 segments into which the cogs fitted, giving a gear ratio of 1:5 (8/40). Therefore, when the sails turned (normally at around 9 rpm), the brake wheel transmitted movement to the wallower, which in turn drove the rotation of the runner stone via the iron wallower axle. A further important element in the working mechanism was the brake rim, which was activated using a set of struts and a rope. Figure 13 shows the joint between the tailpole and the windmill cap, which were joined by a wooden block called the fraile. This linked the tailpole to the roof structure of the windmill, which was strengthened by wooden ribs.
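As a quick check of the figures quoted above, the following minimal Python sketch (an illustration added here, not part of the original study) works out the runner-stone speed implied by the 8:40 gearing and a sail speed of about 9 rpm.

    # Illustrative check of the gear train described above. Assumption: the brake
    # wheel (40 cogs), fixed to the windshaft, drives the wallower (8 segments),
    # which turns the runner stone directly.
    BRAKE_WHEEL_COGS = 40
    WALLOWER_SEGMENTS = 8
    SAIL_SPEED_RPM = 9.0  # typical sail speed quoted in the text

    # Each turn of the brake wheel advances the wallower by 40/8 = 5 turns.
    gear_ratio = BRAKE_WHEEL_COGS / WALLOWER_SEGMENTS   # 5.0
    runner_stone_rpm = SAIL_SPEED_RPM * gear_ratio      # 45 rpm

    print(f"Wallower turns per sail turn: {gear_ratio:.1f}")
    print(f"Runner stone speed: {runner_stone_rpm:.0f} rpm")

At roughly 9 rpm at the sails, the runner stone would therefore have turned at about 45 rpm.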
Figure 12. Detail of the gearing between the wallower and the brake wheel (labeled: brake wheel, wallower, cog, brake rim, iron wallower axle).

Figure 13. Detail of the joint between the tailpole and the windmill cap at the fraile (labeled: windmill cap structure, tailpole, fraile).
Figure 14 shows the hopper where the cereal was housed and the channel which fed the cereal into the central hole in the runner stone. Lastly, Figure 15 shows the roof structure (formed by perpendicular beams called madres and manzanos), which was the wooden structure on which the ribs of the roof section of the windmill rested, as well as the tailpole, which allowed the windmill cap to turn. It turned on a ring (rueda terrera), which was greased in order to avoid excessive friction. The photograph also shows the windshaft and one of the two stones on which it rested, the forestone (fuélliga), and the tailstone (rabote).
Figure 14. Hopper and channel feeding the hole in the runner stone (labeled: hopper, channel, central hole in the runner stone).
Figure 15. View of the roof structure, windshaft and tailstone (labeled: roof structure, windshaft, tailstone).
Methodology

Computer animation forms part of an innovative methodology for the conservation, diffusion and updating of industrial heritage [27]. In order to systematize the process of acquisition, classification, treatment and distribution of the available sources of information dealing with the object of study (plans, photographs, documents, slides, analogue recordings, hard-copy texts), and thereby both improve the conservation of this material and help to generate formats with higher added value, a working methodology was developed, as shown in Figure 16.
Figure 16. Diagram of the methodology proposed for the recovery and updating of industrial heritage (stages shown: updating, execution, verification, target material, corrective actions).
The Updating stage is divided into three sections:

• Location, classification, nomenclature and storage.
• Digitalization of the original material.
• Classification in digital repositories.

The Execution stage has four steps:

• Identification of the technical requirements.
• Definition of functional features.
• Workflows between applications.
• Analysis of the creation and publication processes.

The Verification stage has two parts:

• Development of tests.
• Analysis and verification.
This methodology provides an organized sequence of procedures, structured in three stages. A computer animation forms part of the first stage of the procedure, the digitalization of the original material, as it consists of creating a sequence of frames which, when played back at an adequate speed, forms a video animation. There are many programs for modeling, synthesizing images and computer animation, and a comparative study of them [28] shows the best-known characteristics of each. Two of the most outstanding are Autodesk 3ds Max™ and Autodesk Maya™. Although either of the two could have been chosen for this study, Autodesk 3ds Max was chosen owing to the need to create particles (grains of wheat and flour). One of the critical phases of this process was the generation of digital models of the object, in order to create realistic images from the original sources. The applications and processes used in this task are described in the following sections.
The work process followed these steps:

1. General outline of the virtual recreation of the ‘Sardinero’ windmill
2. Creation of CAD model with AutoCAD and import to Autodesk 3ds Max
   2.1. Fieldwork
   2.2. Modeling
      2.2.1. From AutoCAD, by exporting .3ds files
      2.2.2. From Autodesk 3ds Max, by importing .dwg files
3. Cameras and illumination
   3.1. Camera movement. Creation of path
   3.2. Illumination
4. Animation of working parts
   4.1. Runner stone raising mechanism
   4.2. Brake rim mechanism
5. Materials and maps. Mapping coordinates
6. Creation of textures
7. Rendering and video creation
8. Postproduction
Development

1. General Outline of the Virtual Recreation of the ‘Sardinero’ Windmill

When working with industrial heritage it is advisable to produce two sequences or videos: a ‘static’ virtual view, which shows the object and its surroundings, and a second, ‘dynamic’ view, showing the working of the object, following the logical order of the productive process. This is how the work has been carried out in the case of the ‘Sardinero’ windmill studied here, establishing a PAL playback speed of 25 frames per second. Given the nature of this work, we decided to create a single file in Autodesk 3ds Max which included both sequences (static and dynamic), so as not to have to adjust texture and illumination in several files. Lastly, we chose .avi as the file format, introduced by Microsoft as part of its Video for Windows technology, as it is compatible with most video players. The file was created from the frames rendered individually in .png format.
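The frame budget implied by this choice is easy to work out. The short Python sketch below is illustrative only; the frame-name pattern is a hypothetical convention, not the one used in the project. It computes how many .png frames must be rendered for a clip of a given duration at 25 frames per second.

    # Illustrative sizing of a PAL sequence. The frame-name pattern
    # "frame_0001.png" is a hypothetical convention.
    PAL_FPS = 25  # playback speed chosen for the windmill videos

    def frame_names(duration_s, pattern="frame_{:04d}.png"):
        """Per-frame PNG file names for a clip of the given length."""
        n_frames = round(duration_s * PAL_FPS)
        return [pattern.format(i) for i in range(1, n_frames + 1)]

    # Example: a 90-second virtual tour requires 2250 individually rendered frames.
    frames = frame_names(90)
    print(len(frames), frames[0], frames[-1])  # 2250 frame_0001.png frame_2250.png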
2. Creation of CAD Model with AutoCAD and Import to Autodesk 3ds Max

Although there are many existing procedures for digitalizing industrial heritage objects in 3D [29], such as empirical techniques, topographic techniques, laser scanning techniques or photogrammetry, we have used empirical techniques, owing to their ease of use and transferability, and to the fact that precision measurement was not a determining factor. In addition, the geometry is relatively simple, with a cylindrical tower which could easily be modeled using CAD techniques, and from the perspective of engineering graphics this technique allows us to obtain all types of views, perspectives and sections of the windmill. This in turn allows us to make comparisons with other forms.
Two examples of the plans obtained are shown in Figures 17 and 18.
Figure 17. Section Perspective of the windmill modeled in 3D.
Figure 18. Exploded view of the horizontal rotation system of the windmill.
The development of the empirical approach applied to the ‘Sardinero’ windmill is based on two fundamental preliminary tasks: fieldwork and graphic reconstruction.
2.1. Fieldwork

The fieldwork necessary for the project includes both taking photographs and drawing sketches of the building and its mechanisms. The quality of the computer animation depends on that of the photographs, as the texture captured from them is applied to the model in order to provide a high degree of realism in the final video.
Figure 19. Transition from sketch to 3D CAD model.
We used a Nikon D-200 digital camera to take the photographs, with an ISO setting of 800. This allowed us to obtain clear images, and we took around 500 photographs of the
exterior and of the three floors of the windmill. The windmill was measured using a 10 m tape measure in order to draw the sketches. The inside areas and mechanisms of the windmill were measured using an engineer’s scale. As the windmill is not a precision-built construction, various references were taken and measurements had to be adjusted. The precision of the final model depends on the accuracy of the sketches and measurements taken. It is also necessary to bear in mind that the geometrical data obtained also allow us to infer certain technological considerations, and to make possible comparisons with other types of windmills.
2.2. Modeling

After sketching, the next step is modeling, which is necessary in the graphic reconstruction to plan the virtual tour of the windmill. From the perspective of engineering, modeling is a powerful tool which allows for an accurate study of each part of the machinery, as well as giving an overall idea of how the different parts of the mechanism worked together. Some of the measurements taken in situ were not completely accurate, and so further measurements had to be taken in order to shed light on certain assembly details which were not totally clear. The program used for modeling was AutoCAD. Obtaining a model which is as faithful as possible to the original is a complex task, as there are often limiting factors, such as elements which cannot be measured by hand, for which other techniques have to be used. In this way the CAD model is obtained from the hand-drawn sketches (Figure 19). Given that AutoCAD and Autodesk 3ds Max were developed by the same company, it is easy to exchange information between them; for example, a point with coordinates x, y, z in AutoCAD corresponds exactly to another with coordinates u, v, w in Autodesk 3ds Max. This is an added advantage, because although there are neutral exchange formats such as IGES, STEP or VDA-FS, these files sometimes produce a loss of information. This exchange of information can be made in two ways:
2.2.1. From AutoCAD, by Exporting .3ds Files

Using this method it is possible to export only the parts which are necessary, avoiding the need to clean out the unnecessary elements later, but they must be solids, surfaces, lines, 3D polylines or 3D faces, among others. However, there are some disadvantages: sometimes, when working with complex geometry, AutoCAD cannot generate the .3ds file, and in order to avoid curved surfaces taking on a faceted, multi-sided appearance in Autodesk 3ds Max, it is necessary to increase the AutoCAD system variable FACETRES from its default value of 0.5 to a value of 10, which causes a notable slowing of the program. For these reasons, we chose the second option:
2.2.2. From Autodesk 3ds Max, by Importing .dwg Files

The model created in AutoCAD can be imported directly with the .dwg extension, although it is necessary beforehand to configure the receiving .max file in Autodesk 3ds Max with a series of options, such as the measurement units, the treatment of AutoCAD primitives, geometry and layering, and the rendering options for splines.
Once the model has been imported, the screen is divided into four windows, called graphic windows (Figure 20), in order to create the sequences from different angles and perspectives. The active window is marked with a thick grey line, and all options can be accessed from a contextual menu using the right mouse button.
Figure 20. Working Screen in Autodesk 3ds Max.
3. Cameras and Illumination

Cameras can be used to obtain personalized views of a scene in much the same way as with real cameras. Here, they need lens adjustments which are measured in millimeters. Autodesk 3ds Max has two types of cameras: Target and Free. The first is centered on the given object and the area around it, giving an animation that is independent of the object, while the free type simply records a scene in the direction in which it is pointing, without being linked to a specific object. In the case of the ‘Sardinero’ windmill, we used target cameras to obtain general views of the exterior and of the first and second floors. Free cameras were used to focus on specific elements or movements, for example the counterweight relief mechanism used to lift the runner stone. Although Autodesk 3ds Max provides a wide variety of stock lenses, from 35 mm to 200 mm, cameras with a focal length of 24.29 mm (the default setting) were used for the windmill, in order to obtain wide-angle views of the scene. As well as the lens, it is necessary to adjust the field of view (FOV), which is measured in degrees (Figure 21). This
is linked directly to the focal length and measures the visible part of the scene. In the case of the default focal length, the program sets the value directly to 45º.
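The relation between focal length and FOV mentioned here follows the standard pinhole-camera formula, illustrated by the Python sketch below. The aperture width is an assumed value, back-solved so that a 24.29 mm lens corresponds to a 45° FOV; the exact figure depends on the camera aperture setting used by the renderer.

    import math

    # Generic pinhole-camera relation between focal length and horizontal FOV:
    #   fov = 2 * atan(aperture_width / (2 * focal_length))
    # The aperture width is an assumed value, back-solved so that a 24.29 mm lens
    # gives a 45 degree FOV; the real figure depends on the camera settings used.

    def fov_degrees(focal_length_mm, aperture_mm):
        return math.degrees(2 * math.atan(aperture_mm / (2 * focal_length_mm)))

    def focal_length_mm(fov_deg, aperture_mm):
        return aperture_mm / (2 * math.tan(math.radians(fov_deg) / 2))

    aperture = 2 * 24.29 * math.tan(math.radians(45) / 2)     # about 20.12 mm
    print(f"Implied aperture width: {aperture:.2f} mm")
    print(f"FOV of a 24.29 mm lens: {fov_degrees(24.29, aperture):.1f} degrees")
    print(f"Lens for a 35 degree FOV: {focal_length_mm(35, aperture):.2f} mm")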
Figure 21. Camera with adjusted FOV.
3.1. Camera Movement. Creation of Path

Although a free camera is usually the better option when the camera is moving, and a target camera is more useful when it is not, we chose to use a target camera to produce the static video of the windmill (that is, a virtual visit in which the windmill is not working), animating both the ‘body’ of the camera and its target. It is very important to maintain a constant and appropriate speed along the path of the camera, and therefore movement constraints were used, which link objects to other objects or to the path of the camera. Autodesk 3ds Max offers different constraints, such as:

• Attachment constraint
• Surface constraint
• Path constraint
• Position constraint
• Link constraint
• LookAt constraint
• Orientation constraint
Figure 22. Path of the camera following a spline curve.
Figure 23. Dummy linked to the camera lens and moved through the scene.
In this case, the path constraint has been used, so that the camera follows a spline curve previously created in Autodesk 3ds Max (Figure 22). Using the appropriate commands, the path of the camera outside and inside the ‘Sardinero’ windmill was created. Once the path has been generated, it has to be assigned to the ‘body’ of the camera. Although this can be done directly, in this case it was done indirectly, by assigning the path to a false object (Dummy) and then linking the camera to this object. This allows us to create camera travelling while the camera uniformly follows its path, which is very useful for positioning objects and measuring dimensions. A simple cube was used as the Dummy, with a pivot point in the centre; it was not rendered and had no parameters. The link was then made between objects lower and higher in the kinematic chain. Finally, the camera lens was animated independently of the ‘body’ of the camera using key frames. A helper was also linked and moved through the scene by movement transformations, which do not imply any change in the geometry of the object, but rather a modification of its initial state. Figure 23 shows the camera as it travels along the path through the windmill.
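The requirement of a constant camera speed along the path can be pictured with the following simplified Python sketch. It is a stand-alone illustration of the idea behind the path constraint, not the 3ds Max implementation: the path is treated as a polyline and re-sampled by arc length, so that the camera advances the same distance in every frame. The waypoints are hypothetical.

    import math

    # Simplified stand-in for a path constraint: resample a polyline path by arc
    # length so the camera covers the same distance between consecutive frames.
    # The waypoints are hypothetical; in the project the path was a spline drawn
    # through and around the windmill.

    def resample_constant_speed(points, n_frames):
        dists = [0.0]                                 # cumulative arc length
        for p, q in zip(points, points[1:]):
            dists.append(dists[-1] + math.dist(p, q))
        total, seg, samples = dists[-1], 0, []
        for f in range(n_frames):
            target = total * f / (n_frames - 1)
            while seg < len(points) - 2 and dists[seg + 1] < target:
                seg += 1
            t = (target - dists[seg]) / ((dists[seg + 1] - dists[seg]) or 1.0)
            p, q = points[seg], points[seg + 1]
            samples.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
        return samples

    path = [(0, -20, 2), (5, -10, 2), (3, 0, 1.5), (0, 5, 1.5)]      # hypothetical waypoints (m)
    camera_positions = resample_constant_speed(path, n_frames=250)   # 10 s at 25 fps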
3.2. Illumination

Illumination is the most intricate and complex part of the creation of any scene, as it forms the basis of the work carried out with textures and materials, and also determines the rendering to a large extent. In complex examples of industrial heritage such as a windmill, simplicity should be a key factor, in order to find an optimum balance between rendering time and the quality of the result. The configuration of the materials in the scene will also be a conditioning factor. In this example, we have used different types of lights from Autodesk 3ds Max, including the Daylight system to simulate natural sunlight (Figure 24) with its various options, in which the software simulates the position of the sun at a specific time and date, and from a specific direction. It is also necessary to activate shadows by selecting the ray-traced type (Figure 25); these are very accurate, as Autodesk 3ds Max calculates the shadows according to each ray of light which enters the scene. In addition, we activated the option which determines the transition between bright areas and areas without illumination, and the exponential attenuation of light with distance. The other values are default values, and the color and intensity of the light are set according to the geographical location selected earlier. It is also necessary to include fill lights, which do not generate shadows, in areas where the principal light does not provide illumination. A fill light projects light from a defined area rather than from a single point, and with a lower intensity than that of the principal light. In the windmill these lights have been placed in each of the small upper windows (Figure 26).
Figure 24. Simulation of sunlight.
Figure 25. Selection of Ray Traced Shadows.
Figure 26. Fill light situated in the upper windows.
4. Animation of Working Parts

The following working subgroups were studied and animated:

1. Runner stone raising mechanism. Inverse kinematics was used, which allows the movements of objects higher in the kinematics chain to be determined through the movement of the objects lower in the kinematics chain.
2. Brake rim mechanism. Here, inverse kinematics was also used, as well as Free Form Deformation (FFD), which allows elastic deformations of objects.
3. Creation and animation of ropes. Here the Reactor module was used, so that once the approximate forms have been created, gravity is applied, giving a realistic curve.
4. Brake wheel–wallower. In this case we used forward kinematics, that is, determining the movements of the objects lower in the kinematics chain by acting on the objects higher in the kinematics chain.
5. Obtaining flour from grains of wheat. This operation was carried out in two phases: in the first, the Reactor module was used to achieve the effect of the grains of wheat, stored in the hopper, falling through an opening into the channel and from there onto the milling stones. In the second phase, particle systems were introduced; these are elements which generate groups of objects called particles, which behave as a single unit and allow the creation of real-time simulations of natural phenomena such as rain, dust and snow, among others.
6. Movement of the sail canvases. Here an independent simulation system called Cloth is used, which allows the creation and animation of deformable material. Autodesk 3ds Max also has a specific modifier called Garment Maker, which transforms geometric primitives into material patterns.

As it would take up too much space to give a detailed explanation of each of these subgroups, we have chosen two as examples: the runner stone raising mechanism and the brake rim mechanism. The sketch below illustrates, in a deliberately simplified two-link setting, the idea behind inverse kinematics, on which both rely.
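The following Python sketch shows inverse kinematics in the simplest possible setting: a planar chain of two links solved with the law of cosines. It is only a conceptual illustration; the actual animation relied on the IK solvers built into Autodesk 3ds Max, and the link lengths and target point below are hypothetical.

    import math

    # Generic two-link planar inverse kinematics (law of cosines): the pose of the
    # joints higher in the chain is derived from the desired position of the
    # element at the end of the chain. Illustration only; not the 3ds Max solver.

    def two_link_ik(x, y, l1, l2):
        """Return (shoulder, elbow) angles in radians placing the chain tip at (x, y)."""
        d2 = x * x + y * y
        if math.sqrt(d2) > l1 + l2 or math.sqrt(d2) < abs(l1 - l2):
            raise ValueError("target out of reach")
        cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
        elbow = math.acos(max(-1.0, min(1.0, cos_elbow)))
        shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow),
                                                 l1 + l2 * math.cos(elbow))
        return shoulder, elbow

    # Hypothetical dimensions: two linked beams of 0.8 m and 0.5 m reaching a point.
    shoulder, elbow = two_link_ik(0.9, 0.3, l1=0.8, l2=0.5)
    print(math.degrees(shoulder), math.degrees(elbow))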
4.1. Runner Stone Raising Mechanism

In order to animate this mechanism, which separates the milling stones, we used inverse kinematics, which allows us to determine the movements of objects higher in the kinematics chain by controlling the objects lower in the kinematics chain. This is more effective than forward kinematics. Autodesk 3ds Max includes various methods for animating using inverse kinematics, such as IK Solvers and the traditional methods Interactive IK and Applied IK. IK solvers are helpers which apply inverse kinematics to systems of linked objects. For example, there is a History-Dependent solver which is recommended for mechanical systems with sliding joints in inverse kinematics, as it has controls for damping, priority and spring back.
Figure 27. Elements in the scene.
Figure 28. Control Element and actions on the other elements.
Interactive IK allows a hierarchy of linked objects to be positioned in different frames, and Autodesk 3ds Max interpolates all the key frames. This is not an accurate method, although it uses a minimum number of keys. Finally, Applied IK is a method which applies a solution over a range of frames, calculating the keys in each frame; it is more accurate than Interactive IK, although it creates a large number of key frames. In this case we used inverse kinematics applying both methods: once the necessary links between the objects had been established using Interactive IK, their behavior was observed, and the animation was carried out using Applied IK. To animate the mechanism which separates the milling stones, that is, the raising mechanism of the runner stone, the elements first have to be renamed, because when the AutoCAD model is imported into Autodesk 3ds Max a predetermined name is given to every element in the scene, and these names are not clear when there are many elements. Therefore, it is necessary to re-designate all the elements (Figure 27). Then, the control element for inverse kinematics is established. This is the element which will be animated manually, to be used as the basis for the animation (Figure 28). The next step is to determine the links between the control element and the other elements. This is the most difficult and time-consuming step, as it is necessary to use helpers and to relocate the pivot points of some objects. We added six helpers, to allow for the interconnection between all the elements and the combination of movements of some of them, for example the runner stone, which must turn and rise at the same time.
Figure 29. Positions of helpers.
Figure 29 shows the functions of the helpers in the mechanism:

Helper 01: Linked to the runner stone (including rings), the wallower (including rings and cogs) and the iron wallower axle, which is the object higher in the kinematics chain. It allows the transmission of circular movement to these elements.

Helper 02: Linked in the same way as helper 01; it transmits the raising and lowering movement to this set of elements.

Helper 03: Linked to the lever-beam joint, acting as a link between this and the other elements.

Helper 04: Linked to the exterior raising beam, allowing for two pivot points on this element.

Helper 05: Linked to the exterior raising beam at the point where it joins the interior raising beam. Its function is to connect these two beams.

Helper 06: This helper is lower in the hierarchy than the interior raising beam; its function is to create two pivot points and also to connect the interior and exterior raising beams.

In addition, it is necessary to link two elements which are already joined, helper 04 and helper 03. In the same way, helper 06 is linked to helper 05, and helper 02 to helper 05. It is then necessary to define the constraints of the joints of each element, as each has six degrees of freedom: rotation about and movement along the X, Y and Z axes. The Rotational Joints and Sliding Joints options were used (Figure 30).
Each joint has three sections referring to each of the three axes, and if the Active option is deselected, that axis is constrained; that is, if the active option of the x axis of the interior raising beam is deselected in the rotational joints window, the element cannot turn on this axis. In the same way, if the same option is deselected in the sliding joints window, the joint cannot slide along this axis. This is how the joints are defined for the rest of the elements of the mechanism.
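The per-axis activation described above can be pictured with a small data structure, sketched below in Python. This is an illustrative model of the idea, not the 3ds Max implementation: each joint stores, for every axis, whether rotation and sliding are active, and disallowed components of a requested motion are simply discarded. The example constraint values are hypothetical.

    from dataclasses import dataclass, field

    # Illustrative model of per-axis joint constraints, mirroring the Rotational
    # Joints and Sliding Joints panels described above. Not the 3ds Max data model.

    def all_axes(value=True):
        return {"x": value, "y": value, "z": value}

    @dataclass
    class JointConstraints:
        rotate_active: dict = field(default_factory=all_axes)
        slide_active: dict = field(default_factory=all_axes)

        def filter_rotation(self, rotation):
            """Zero out rotation components about deactivated axes."""
            return {a: (v if self.rotate_active[a] else 0.0) for a, v in rotation.items()}

        def filter_slide(self, translation):
            """Zero out sliding components along deactivated axes."""
            return {a: (v if self.slide_active[a] else 0.0) for a, v in translation.items()}

    # Hypothetical example: a beam allowed to rotate only about its y axis.
    beam_joint = JointConstraints(rotate_active={"x": False, "y": True, "z": False},
                                  slide_active={"x": False, "y": False, "z": False})
    print(beam_joint.filter_rotation({"x": 5.0, "y": 12.0, "z": 1.0}))  # only y survives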
Figure 30. Rotational & Sliding Joints.
The rings of the runner stone, the wallower, and the screw and bolts which form part of the mechanism do not have defined joints, as they are linked rigidly to elements which already have these joints defined. Once the links and joints (rotational and sliding) have been established, interactive IK is used to check that the elements of the mechanism move correctly. The Select and Rotate button applies another of the three transformations available in Autodesk 3ds Max, which does not imply any change in the geometry of the object, but rather a modification of its initial state (Figure 31). It can be seen that beyond a certain angle of the control element the joints do not function as in real life; specifically, beyond 30º the elements begin to intersect with one another. To solve this problem, other animation tools can be used, such as the Reactor plug-in, which allows the creation of key frames when objects interact according to the laws of physics.
Figure 31. Inverse Kinematics and select and rotate buttons.
However, it would be time-consuming to configure the scene using the Reactor module, and given that the runner stone moves approximately 1 cm, for which the control element turns through no more than 6º, this simulation is unnecessary. It is therefore only necessary to apply the inverse kinematic solution using Applied IK, which can be applied to any range of frames. The required animation is thereby obtained, as the program calculates the key frames for the other elements according to the control element and the links established.
4.2. Brake Rim Mechanism

Inverse kinematics has also been used to simulate the mechanism which brakes the brake wheel. In addition, a Free Form Deformation (FFD) modifier has been used to achieve the elastic deformation of the brake rim (Figure 32). In this case the elements present in the animation are the linking beam, the counterweight beam, the hook joint with the windmill cap, the counterweight-linking beam joint, the linking beam-counterweight joint, the pin flange, the pin and the bolt (Figure 33). The ring and the rim itself are animated once the movement of the other parts has been determined. The control element is the linking beam, which is also the real-life control element (Figure 34). The links are then made between the control element and the other elements. A helper has been added, not to link objects, but to make possible the presence of two pivot points on the counterweight beam. This beam is then established as higher in the kinematics chain than the pin, pin flange, bolt, hook joint with the windmill cap and counterweight-linking beam joint, as well as the helper; lastly, the control element is designated as higher in the kinematics chain than the linking beam-counterweight joint. The counterweight-linking beam joint is linked to the linking beam-counterweight joint; specifically, the counterweight-linking beam joint is made to follow the linking beam-counterweight joint. Finally, the constraints of the joints of these elements are defined in the same way as before, using the Rotational Joints and Sliding Joints options.
Figure 32. Brake rim mechanism.
Figure 33. Elements of the brake rim mechanism.
As before, correct movement is checked using interactive inverse kinematics, turning the control element with respect to its y axis, and observing the behavior of the other elements. Applied IK is then used as it can be applied to a given range of frames. Once the animation of the beams has been completed, the brake rim and its metal ring are animated. For this, the FFD modifier is used, as it can model rounded deformations without arrises, adjusting the control points of a lattice.
Figure 34. Control element and actions on the other elements.
Figure 35. FFD modifier button.
Figure 36. Cylindrical Geometry of the FFD modifier, and surface adjustment button.
Figure 37. Modifier adjusted to fit the geometry of the rim.
The type which is closest to the geometry of this example is the cylindrical one, FFD (cyl). The geometry of the modifier is then placed in the desired location (Figure 36).
Figure 38. Control points selected two by two.
Figure 39. Effect of the modifier on the geometry after moving the control points.
The lattice is then selected and its resolution and size are configured so that it coincides as closely as possible with the brake rim and the ring. The higher the resolution, the better the results in the elastic deformation of the object, but the more time is required. In this case, the resolution of the lattice was set at 42 control points, giving a perfect fit with the geometry of the rim while still allowing it to be animated quickly. Figure 37 shows the geometry of the FFD (cyl) modifier after making these adjustments. The brake rim and its ring are then linked to the animation, in the same way as objects are linked using the Select and Link command. Finally, the lattice control points are animated using key frames, defining an initial and a final state using a move transformation. The control points should be selected two by two (Figure 38) in order to achieve a good result and to ensure that the elastic deformation of the rim coincides with the movement of the beams. Figure 39 shows the effect of the FFD modifier on the geometry of the brake rim after moving the control points vertically upwards.
5. Materials and Maps. Mapping Coordinates
Autodesk 3ds Max uses materials to cover objects in order to imitate the effect of light on them. Maps or textures are elements applied to materials to achieve a realistic appearance, using mapping coordinates, which define how the maps are aligned on the objects by means of the three-dimensional coordinates u, v and w.
Figure 40. Material selection window in Autodesk 3ds Max.
Autodesk 3ds Max offers various types of materials, and their choice depends to a great extent on the rendering engine used and the type of illumination, among other factors. In the case of our windmill we have used standard materials, found in the Material Editor (Figure 40), to give realism to the animation. Below is a description of the process used for the ‘dustcover’, a piece of wood which covered the milling stones. In many cases it is necessary to create a new material because a suitable one does not exist in the library included with the software. In our case, as the element to be textured is formed by a series of wooden staves, a material with a similar texture was created for each stave, so that the final appearance of the surface of the element does not show repeated patterns. It is also necessary to define the shader to be used, as this is the algorithm which calculates the appearance of the material according to the specified parameters. In this case the Blinn shader (the default setting) has been used, as it renders simple circular highlights and softens adjacent surfaces. This shader has color panels to configure the Ambient, Diffuse and Specular colors, which determine the final color of an object. However, in the texturing process we used maps and textures taken from images of the original model, applied through the Diffuse component of the shader. It is also possible to obtain better results by configuring some aspects of the indirect illumination in the rendering engine, although this takes more time; this decision ultimately rests with the designer. It is then necessary to define the mapping coordinates of the element to be textured. We have used the UVW Map modifier, with a projection gizmo which defines how the map will be projected onto the surface and how the material will be applied (Figure 41).
Figure 41. Projection Gizmo and material assignment.
Figure 42. Adjustment of dimensions of gizmo to those of the image.
Figure 43. Final Result of Mapping.
However, the visualization of the texture on the surface of the element is not correct, as the adaptation of the gizmo to its geometry implies uneven steps, and therefore when the
texture is applied it appears stretched or compressed. Final adjustments have to be made to ensure that the dimensions of the gizmo are proportional to those of the image (Figure 42). Figure 43 shows the final result, which is very similar to real life.
6. Creation of Textures
Maps or textures are applied to the materials to obtain a realistic effect. In our case the textures (Figure 44) are obtained from digital photographs of the real object, edited with Adobe Photoshop™.
Figure 44. Texture taken from digital photograph.
Figure 45. Exterior ground of the windmill processed by Adobe Photoshop.
Figure 46. Wooden door from which repetition texture is extracted.
Each digital photograph is made up of pixels with color and intensity values, giving images which Adobe Photoshop can work with. After editing, these textures can be repeated indefinitely without giving the sensation of a repeated pattern. This is shown in Figure 45, applied to the ground outside the windmill. The upper part of the image has not been processed with
Adobe Photoshop, and the edges of the tiled images do not match; however, the lower part of the image shows how this problem is solved.
Figure 47. Image with corrected perspective and area to be cut selected.
Figure 48. Rendered image with texture applied.
The first step when using Adobe Photoshop is to select the RGB (Red-Green-Blue) color mode and the 8 bits/channel option, which is the standard mode used by televisions and color monitors. Files should be saved in .psd format. The next step is to extract the area of the photograph (texture) which is to be applied, using the lens correction and crop tools. The example shows the process used for the wooden door of a food store (Figure 46). The lens correction tool is used to correct the perspective, eliminating the divergence of parallel lines, and the crop tool is then used to extract the desired area of the photograph (Figure 47). As digital photographs are already illuminated, it is necessary to adjust the brightness to ensure that the extracted textures are not excessively bright. The brightness has therefore been reduced by between 10% and 20% in photographs taken inside the windmill, and by between 25% and 30% in photographs taken outside. Once this has been done, the image is cropped and adjusted, obtaining a texture which can be repeated on the model without visible seams. This is undoubtedly one of the most complex stages of the work, and it is what gives the texture a high degree of realism (Figure 48).
7. Rendering and Video Creation
Rendering is the process which calculates the properties of the objects before they are shown on screen; that is, it generates a synthesized image of the scene created. Autodesk 3ds Max includes a rendering engine called mental ray and an additional plug-in called V-Ray, which give excellent results thanks to the representation of light through rays. Rendering is done in the active window, which is marked by a thick border (Figure 49).
Figure 49. Active window where the scene is rendered.
Figure 50. Render process window.
Figure 51. Video Post process window.
The Rendering drop-down menu includes the Render command, where the configuration is done: the rendering engine, the range of frames to render, and the format and size of the images, among other settings. We chose the .png (Portable Network Graphics) format, with a 24-bit RGB color configuration (16.7 million colors), with alpha channel and interlacing activated, which is one of the best image formats for computer animation. The resolution of the images is 768x576 pixels, with a width-to-height ratio of 4:3, although the final video format is a DVD. Once the configuration has been done, the rendering process itself starts, and a dialogue box shows the adjustments made and the progress of the process (Figure 50). Before rendering, the final image is configured by defining the color range and the output levels. This stage is very important, as it controls the clarity of the colors of the scene (not the illumination), the intensity of the tones and the intensity of the standard lights, and adjusts the colors so that they correspond to an exterior scene. Once the rendering process is complete, the frames are linked together using the Video Post command, also found in the Rendering menu, which allows the inclusion of many effects. The first operation to obtain the video is to add the rendered images to the list of images. Then the output format is set to AVI (Audio Video Interleave), as it is a simple and standard digital video format, and a compression codec is chosen to reduce the size of the final video so that it is more manageable. The resolution of the final video file is also set, in this case PAL 768x576 pixels. We obtained a video of 720 MB for the static sequence (a virtual tour with the windmill not working) and a video of 8.41 GB for the dynamic sequence (a virtual tour of the working windmill).
The sequence is then executed and the window shown in Figure 51 appears. The following 25 rendered images show the degree of realism achieved in the computer animation process.
8. Postproduction
The final stage in any project for the graphic conservation of industrial heritage is the postproduction, or editing, of the video, to create an audiovisual document which effectively combines audio, video and text. The objective is to give a clear idea of the heritage and its place in the production process. This stage has been carried out with Adobe Premiere, a very versatile and intuitive program with a wide range of video, audio and transition effects. The final video uses the DV-PAL configuration with standard 48 kHz audio, allowing it to be shown anywhere using any equipment, with options which allow the user to personalize the video. First, the sound and video files are loaded, and the title and subtitles are created using a text editor. These are then added to the timeline to set the order and timing of each element, and the transitions between videos are established. Lastly, the video is exported with the necessary settings.
Conclusion
This project shows that creating a realistic computer animation requires a great deal of time and effort. This process is usually carried out by a team of designers equipped with powerful computers with several high-speed microprocessors, sufficient RAM and graphics cards with large amounts of memory. In our case, two people and three computers took almost one year to complete the project. Another important conclusion is the importance of the technical training of the person who carries out the virtual recreation of the apparatus and devices which make up the element of preindustrial heritage: in order to create a true-to-life animation it is necessary to know how these devices worked and how they were originally designed. Without such knowledge it would be impossible, for example, to reproduce the real working speeds of the machinery in the animation. At the same time it is extremely useful to be familiar with forward and inverse kinematics, as this makes working with Autodesk 3ds Max much easier. Generating a high-quality computer animation requires great effort, not only in learning the main software packages used, but also in learning to use other graphic programs, such as video editing and photographic software, which offer many possibilities. There has been a great deal of progress in the field of virtual reality, for example augmented reality, which is especially useful in the design of virtual scenes where real-life images are mixed with virtual images. However, computer animation using specific software still gives very high quality results, which makes it very useful when the objective is to show in detail how old machinery worked and what its environment was like. Computer animation using specific software provides a better solution when dealing with complex machinery than a virtual reality in which the user interacts with the system, as the user would need to know in detail how the machinery worked; for example, in the case of a windmill, how to use the regulation elements such as the counterweight which operates the raising mechanism of the runner stone, or the mechanism which controls the brake rim. This is specialist knowledge which a normal user is unlikely to possess.
We have therefore produced an audiovisual animation with as many camera angles as necessary, which shows quickly and intuitively all the details of the working of the machinery and its environment, without requiring the user to have expert knowledge. This shows how non-immersive virtual reality based on computer animation can provide many advantages.
Funding This research was funded by the Spanish National R&D Plan (HUM2006-00377), “Estudio histórico-tecnológico y representación gráfica de la evolución en el diseño de los molinos de viento en la mancha, en la España de los siglos XVI y XVII, mediante técnicas de Dibujo Asistido”, of the Research Projects Subdepartment of the Universities Department of the Ministry of Science and Innovation.
In: Computer Animation Editors: J.S. Wright and L.M. Hughes, pp. 57-83
ISBN: 978-1-60741-559-6 © 2010 Nova Science Publishers, Inc.
Chapter 2
VIRTUAL ENGINEERING IN AUGMENTED REALITY
Pier Paolo Valentini*, Eugenio Pezzuti and Davide Gattamelata
University of Rome “Tor Vergata”, Department of Mechanical Engineering, Via del Politecnico, 1 – 00133 Rome, Italy
Abstract
In this chapter the authors discuss several approaches for integrating computer-aided engineering instruments into an Augmented Reality environment. Engineers and designers often develop their creative ideas in front of a computer monitor using mouse and keyboard. Although the integration between numerical computation and graphics leads to the generation of very realistic digital mock-ups, these are still far from the real context and the user has limited interaction with them. The purpose is to illustrate how recent developments in computer graphics and image processing can improve the realism of, and interactivity with, digital mock-ups. Starting from the interactive modeling of 3D shapes, the chapter presents some examples of the integration of real-time mechanism motion simulation and of structural and fluid dynamics post-processing.
Keywords: Virtual Engineering; Augmented Reality; Simulation; Computer-Aided Design
1. Introduction
1.1. The Role of Virtual Engineering
During the last few decades, computer-aided engineering (CAE) methodologies have deeply changed the way products, systems and services are designed and developed [1]. Thanks also to significant hardware and software improvements, CAE techniques are widely used by designers from the early conceptual phases up to the final stages of engineering processes. At the industry level, these methodologies have become a fundamental tool for remaining competitive and ensuring high quality standards. In industrial engineering, computer-aided methodologies typically support design teams in shape modeling, behavioral simulations, digital*
* E-mail address: [email protected] (address all correspondence to this author).
mock-ups and realistic animations. They make it possible to follow the development of a product from conception to production, also managing its life cycle. For this reason, the science that supports the use of these tools is called Virtual Engineering. It means that all the tasks which are typical of an engineer or a designer can be carried out in a virtual way, using informatics and computers. The benefits of this approach are many. First of all there is a saving of time: many solutions can be tested, compared and optimized without building physical prototypes. As a consequence there is a saving of money, because a digital mock-up is much less expensive than a physical one. On the other hand, a meaningful and reliable virtual simulation needs an accurate study of the involved phenomena, the definition of parameters, and simplifications. The available computing capabilities allow the production of very realistic models and animations that mimic the real behavior of a system. Examples of these models are reported in Figure 1.
Figure 1. Examples of digital mock-up built and simulated using virtual engineering techniques.
1.2. Augmented Reality
Augmented reality (AR) is an emerging field of visual communication and information technologies [2-4]. It deals with the combination of real-world images and computer-generated data. Although the idea of virtual reality dates back to the last century and the development of portable displays dates from 1966, the phrase “augmented reality” was coined by Prof. Tom Caudell only in 1990, with reference to an application developed at Boeing to help workers assemble aircraft components. At present, most AR research is concerned with the use of live video imagery which is digitally processed and "augmented" by the addition of computer-generated graphics. The idea behind augmented reality is simple, but its development was slowed down by the inadequacy of hardware resources to support heavy real-time computation. Only during the last decades, thanks to increasing hardware performance, has research in the field of augmented reality accelerated [5]. With an AR system, the user can extend the visual perception of the world, being supported by additional information and virtual objects. The level of detail of the augmented scene has to be very realistic in order to give the user the illusion of a single real world. There are different types of AR applications depending on the level of detail, graphics effects and interactivity (Figure 2). A generic AR implementation is depicted in Figure 3. The user interacts with the application, which has to manage the interfaces, ensure correct tracking of the user and of the objects in the real scene, and compute an adequate and realistic collimation between the real world and the augmented contents. At the first level there are the tracking and registration technologies; the first aims to calculate the user’s point of view with respect to the scene, while the registration deals with the collimation of virtual-world objects with the real environment. The human-scene interfaces can be mechanical, electromagnetic or optical devices. The second basic element of an AR system is the capability of real-time rendering. The detail of the rendered scene depends on hardware capabilities. The display technology ensures the immersion of the user in the scene. The ways to capture images from the real world, process them and project them back to the user can differ [6]. Three different technologies are currently implemented:
• the video-based system;
• the optical see-through system;
• the video see-through system.
In the first one, as shown in the left-hand picture of Figure 2, the images coming from the real world are augmented with simple graphics data and streamed on a monitor. This approach is often used in sports to superimpose data on a common television broadcast. It is quite simple, but the user is not immersed in the scene at all. In optical see-through systems, the information is not directly captured from the real world; the augmented contents are added by projecting them onto a semi-transparent visor which naturally mixes the real perception with the augmented one. The user feels immersed in the scene, but there are some limitations on the quality of the images and some difficulties in collimating real and virtual objects.
Figure 2. Examples of augmented reality applications. On the left: a simple video augmented application in which a virtual line is added to the video stream in a swimming competition. On the right: a complex augmented scene for training in medicine (courtesy of Institute for Computer Graphics and Vision, Austria).
Figure 3. Layout of a generic high-end Augmented Reality System.
The video see-through system (see Figure 2, on the right, and the scheme depicted in Figure 4) is based on the use of one or two cameras which acquire an image stream from the real world. The stream is processed by a computer which adds virtual contents, producing an augmented image stream which is projected back to the user by means of a blind visor. This increases the level of immersion in the scene, and the quality of the images depends on the resolution of the visor. As witnessed by several investigations, video see-through systems have proved to be more suitable for a range of applications, especially in the engineering field.
Figure 4. AR Video see-through system.
Scientific literature reports an increasing interest in the development of augmented reality applications in many different fields [7-9]. AR has been used in medicine and surgery [10] to improve the reliability of complex clinical treatments and to assist operations (image-guided surgery). Moreover, for military purposes, the army is already using AR displays in cockpits, where screened information is shown to the pilot on the windshield of the cockpit or on the visor of the flight helmet. Other fields of interest for AR technologies are robotics [11] and telerobotics, in which an augmented display can assist the user with a visual image of the remote workspace to guide the robot movements. AR is also useful in maintenance and assembly activities [12-14], where technicians can approach a new or unfamiliar piece of equipment simply by putting on an AR display, instead of opening several repair manuals, and visualizing information directly on the desired object. AR applications have been developed in architecture [15] for perceiving structural modifications inside and outside a house, by superimposing walls and furniture on the current solution and perceiving the results in a realistic way. There are also applications in e-learning [15-18], manufacturing [19-20], services and logistics [21-22], arts [23], navigation [24], etc. Most of these applications deal with merging into the real world objects, scenes and animations which have been modeled and simulated outside the system. It means that the user perceives a real scene augmented with pre-computed objects, and his interaction with them is often limited to exploration. In order to extend the application of augmented reality in the engineering field we have to take some steps forward. A designer or an engineer cannot be limited to the exploration of the scene: he has to interact with objects to modify, create, animate and materialize his creativity [25-29]. For this purpose the augmented reality has to be enriched with tools (haptic devices) which allow interaction with the objects in the scene. The most important interaction concerns the tracking of the user’s position in the augmented scene. Five types of tracking devices are implemented, depending on the methodology used for measuring the
position in space: mechanical, electromagnetic, optical, acoustic and inertial. The first type uses a multi-degree-of-freedom linkage to compute the position of a pointer in space. These devices have the advantage of being very precise and rather cheap, but they have a small workspace and limit the user’s movement. The electromagnetic devices comprise an emitter and a receiver. The emitter generates a magnetic field which is captured by the receiver; the changes in the acquired signal are converted into information about the position and attitude of the receiver. Due to its small size, the receiver can easily be worn by the user or attached to a tracking stick. On the other hand, electromagnetic devices are sensitive to interference with the magnetic field caused by electronic equipment or bulky ferromagnetic objects. Optical tracking systems are more complex. They require the use of several markers and two or more high-speed cameras. Their precision depends on the resolution of the cameras and the size of the markers. They allow a wide working area, but require a specific setup to ensure that the markers are always visible to the cameras. Acoustic devices comprise an emitter and several receivers. The emitter generates an acoustic signal whose time of flight is acquired by each receiver and converted into spatial position information. They are quite cheap devices but cannot ensure great precision, and they are sensitive to temperature and humidity variations and to the presence of echoes. Inertial devices use accelerometers and gyroscopes to measure the position and the attitude. They require frequent calibration but offer good precision. In a general application, the interaction with the scene has to fulfill the following requirements:
• the devices have to be easy to use;
• the devices have to be precise and allow acquisition at a high frequency;
• their application has to be intuitive and natural, without limiting the user’s movement;
• they have to support the ability and the intent of the designer, being an ally and not an obstacle.
Unfortunately, many of these devices or systems are very expensive, and their cost limits their usage to large research facilities or large industries. The AR system has some similarities with the Virtual Reality (VR) one. The main difference is that in VR the perceived world is fully virtual (generated by one or more rendering pipelines), while in AR the virtual world merges with the real one. This mixing involves not only the generation of rendered graphics, but also complex procedures for registering the real and the virtual video streams. Moreover, in order to give the user the most natural perception of the augmented world, all these activities need to be processed in real time. Such demanding requirements have slowed down the advancement of and research on AR systems with respect to VR ones.
1.3. Motivation and Objectives
The motivations of this chapter come from all the considerations discussed in the previous section and are fuelled by the idea that the future of virtual engineering will be based on virtual and augmented platforms, in order to increase the user’s interaction and the level of realism and perception.
According to many researchers, the AR system can be the future of Computer-Aided Design (CAD) technologies and Virtual Engineering. Presently, CAE applications support the designer through numerical computation and computer graphics. Often, engineers and designers develop their creative ideas in front of a computer monitor using mouse and keyboard. Although the integration between numerical computation and graphics leads to the generation of realistic digital mock-ups, these are still far from the real context and the user has limited interaction with them. This limitation can generate problems (non-conformities, unexpected behaviour and appearance, for instance) when the designed products have to be integrated in the real world. To overcome this disadvantage, new instruments based on AR systems can be set up. The chapter discusses the possible implementation of virtual engineering tools in an augmented reality environment. Three aspects will be addressed: the modeling of three-dimensional shapes, the simulation of the physical motion of mechanisms, and the visualization of structural and fluid-dynamics investigations. Both hardware and software details will be discussed, proposing the implementation of a low-cost system.
2. 3D Modelling in Augmented Reality
The first step toward the building of a virtual prototype is the modeling of shapes. Standard applications implemented in Computer-Aided Design systems make use of keyboard, mouse and, in some cases, spatial pointers. The results of the modeling actions can be viewed by the designer on the PC monitor. They have a good level of realism but they are confined to the monitor. The designer, with his personal ability, has to extrapolate these results and imagine how they will fit in the real world. In order to overcome this limitation, virtual shapes can be merged into the real world using augmented reality techniques. This solution still limits the creativity of the designer, because he has to model using a monitor interface and then project the shapes into the real world. A more interactive solution involves the possibility of creating virtual shapes directly in the real environment, bypassing the use of external interfaces. In order to arrange a system that allows the user real-time modeling and visualization, appropriate hardware has to be set up and software has to be implemented. In the following section a low-cost solution, which has been implemented by the authors, is described.
2.1. Hardware Setup
The implemented system is made of input, processing and output devices. Input devices have to acquire a real-world video stream and user actions. Output devices have to project an augmented perception of the real world enriched with virtual objects. Processing units have to manage inputs coming from different devices, store and arrange data flows, and render the augmented video stream. The input video device of the implemented system is a Microsoft LifeCam VX6000 USB 2.0 camera, able to capture frames at up to 30 Hz with a resolution of 1024x768 pixels. This camera has been rigidly mounted on the Head Mounted Display (Figure 5). The position tracking system has been developed starting from an electromagnetic 6-degrees-of-freedom
sensor named Flock of Birds by Ascension (http://www.ascension-tech.com) in combination with a wooden pen (named “I-Pen”). Flock of Birds is an electromagnetic tracker similar to those described in the introduction. Its receiver (a small cube) is rigidly constrained to the I-Pen and the emitter is placed inside the design space. Another input device is a standard PC keyboard for user key commands. The processing unit is a personal computer with a Pentium IV quad-core processor, 3 GB of RAM and an NVidia Quadro FX3700 graphics card; the operating system is Windows XP Professional and the development suite for programming is Microsoft Visual Studio 2005. The output display is a Head Mounted Display equipped with OLED displays (Z800 3D visor by Emagin - http://www.3dvisor.com/). It is able to support stereovision up to a resolution of 800x600 per eye. The emitter of the electromagnetic tracker has to be rigidly fixed inside the working area. Since it has a transmission range of less than 1 m, it is useful to place it in the middle of the area. The presence of metallic objects strongly affects the measurement, so it is important to ensure correct shielding or an appropriate distance of ferromagnetic parts from the working zone. In the working area, in a place that can always be seen by the camera, a patterned marker has to be located. Its presence allows the computation of the relative position between the camera and the real world, as explained in the following section. The user has to wear the head mounted display and the camera and has to grab the I-Pen in a normal way (Figure 5).
2.2. Software Setup
Since the system is assembled from scratch, reliable software has to be implemented in order to manage the various data flows, to control the devices and to generate an augmented real-time video stream. Since the drivers of the electromagnetic tracker were available in C++, all the procedures have been implemented in the same programming language.
Figure 5. Input and output devices of the implemented system.
In the system there are three main data flows. The first is the video stream coming from the camera. This video is processed in order to find out the relative position between the camera and the real world. Since the camera is fixed on the head mounted display, it is also
the relative position between the user and the real world. The computation is possible thanks to the ARToolKit libraries. Their source code (freely available together with documentation at http://sourceforge.net/projects/artoolkit) is widely used to develop augmented reality applications. The ARToolKit routines can analyze the scene, recognize a patterned square marker (like the one in Figure 4), and find out the spatial transformation between the camera and the real world. At the same time, another data flow comes from the tracking system. It is thus possible to locate the tip P of the I-Pen in the world reference frame of the emitter (O-XYZ) as (see Figure 6):
{P}_{O-XYZ} = {O_p}_{O-XYZ} + [T]_{I-Pen} {P}_{O_p - x_p y_p z_p}     (1)

where:
{*}_A are the coordinates of a generic vector with respect to the A system of coordinates;
[T]_{I-Pen} is the spatial transformation between the reference frame of the I-Pen sensor and that of the emitter.

Both {O_p}_{O-XYZ} and [T]_{I-Pen} can be computed with the embedded driver of the tracker.
Figure 6. World, emitter and I-Pen reference frames.
In order to collimate the three reference frames of the camera, the world and the I-Pen, another transformation has to be computed. It relates the emitter to the world patterned marker. Since these two reference frames are fixed in space, this transformation ( {O_W}_{O-XYZ} and [T]_{marker} ) can be computed off-line when setting up the working environment. By doing this it is possible to locate the I-Pen tip P in the reference frame of the marker as:

{r'}_{O-XYZ} = {P}_{O-XYZ} − {O_W}_{O-XYZ}     (2)

{P}_{O_W - x_W y_W z_W} = [T]_{marker}^{-1} {r'}_{O-XYZ} = [T]_{marker}^{T} {r'}_{O-XYZ}     (3)
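A minimal C++ sketch of this pipeline is given below. It is illustrative only: the type and function names are assumptions, the tracker readings are represented by the two Frame arguments (not the actual Flock of Birds driver API), and the [T] matrices are treated as pure 3x3 rotations, consistent with equations (1)-(3).

// Minimal sketch (illustrative names). Implements equations (1)-(3): the I-Pen tip,
// known in the sensor frame, is expressed first in the emitter frame and then in the
// world (marker) frame using the off-line calibrated marker transformation.
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Rigid transformation: rotation R and origin t of a frame expressed in its parent frame.
struct Frame { Mat3 R; Vec3 t; };

static Vec3 apply(const Frame& F, const Vec3& p)          // F.R * p + F.t
{
    Vec3 q{};
    for (int i = 0; i < 3; ++i)
        q[i] = F.R[i][0]*p[0] + F.R[i][1]*p[1] + F.R[i][2]*p[2] + F.t[i];
    return q;
}

static Vec3 applyInverse(const Frame& F, const Vec3& p)    // F.R^T * (p - F.t)
{
    Vec3 d{ p[0]-F.t[0], p[1]-F.t[1], p[2]-F.t[2] }, q{};
    for (int i = 0; i < 3; ++i)
        q[i] = F.R[0][i]*d[0] + F.R[1][i]*d[1] + F.R[2][i]*d[2];
    return q;
}

// tipInSensor : tip P expressed in the I-Pen sensor frame (constant, measured once)
// sensorPose  : {O_p, [T]_I-Pen} returned by the tracker driver (emitter frame)
// markerPose  : {O_W, [T]_marker} of the patterned marker, calibrated off-line
Vec3 tipInMarkerFrame(const Vec3& tipInSensor,
                      const Frame& sensorPose,
                      const Frame& markerPose)
{
    Vec3 P_emitter = apply(sensorPose, tipInSensor);       // equation (1)
    return applyInverse(markerPose, P_emitter);            // equations (2)-(3)
}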
2.3. Examples
In this section some examples of modeling are discussed. With reference to Figure 7, one can see how to sketch planar geometry such as lines and polygons. The user can draw with the I-Pen in the same way as with a pencil on a sheet of paper. The graphic modeler, based on OpenGL, renders the scene augmented with the sketched objects. The augmented scene also includes virtual tripods in order to show where the world coordinate system is located (on the patterned marker) and where the I-Pen is pointing. Figure 8 shows examples of three-dimensional geometry. Starting from sketches, the user can extrude 1D and 2D entities, building surfaces and solids. This can be done simply by dragging the I-Pen along the extrusion path in a natural way. The results can be visualized as a preview during modeling, so the user can adjust and correct his operations in real time. Starting from the picking of points in space, the user can also define free-form surfaces (Figure 9). These entities are very suitable for aesthetic design purposes, reverse engineering and advanced modeling. The user can define and modify the control points of a surface, simply by moving the I-Pen in the working area and selecting locations.
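As an illustration of the rendering side of these examples, the short OpenGL sketch below (assumed data layout and function names, not the authors' code) shows how picked I-Pen positions, already expressed in the marker frame, could be drawn as a polyline together with a small tripod at the current tip position.

// Illustrative OpenGL sketch of drawing a sketched stroke and a tripod at the I-Pen tip.
#include <GL/gl.h>
#include <vector>

struct Point3 { double x, y, z; };

void drawSketch(const std::vector<Point3>& stroke, const Point3& tip)
{
    // Polyline through the picked points.
    glColor3d(1.0, 1.0, 0.0);
    glBegin(GL_LINE_STRIP);
    for (const Point3& p : stroke) glVertex3d(p.x, p.y, p.z);
    glEnd();

    // Tripod (local x, y, z axes) at the current I-Pen tip.
    const double s = 20.0;                       // axis length in scene units (assumed)
    glBegin(GL_LINES);
    glColor3d(1, 0, 0); glVertex3d(tip.x, tip.y, tip.z); glVertex3d(tip.x + s, tip.y, tip.z);
    glColor3d(0, 1, 0); glVertex3d(tip.x, tip.y, tip.z); glVertex3d(tip.x, tip.y + s, tip.z);
    glColor3d(0, 0, 1); glVertex3d(tip.x, tip.y, tip.z); glVertex3d(tip.x, tip.y, tip.z + s);
    glEnd();
}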
Figure 7. Modelling of objects in plane (lines, on the left and a triangle, on the right).
3. Simulating and Animating in AR
The capabilities of augmented reality can also be used to support engineering simulation. In particular, this section focuses on the three types of simulation most often performed by engineers: the movement of a linkage (kinematics and dynamics), the deformation of an object subjected to loads, and the dynamics of fluids inside or outside objects. When a designer prepares a virtual model for one of these simulations, he has to define geometries, constraints and boundary conditions. All these tasks can be supported by the implemented AR system, which can serve as pre- and post-processor. Moreover, the great advantage that augmented reality can give to simulation is in the visualization of results, i.e. in post-processing. In this way the user can integrate the visual results of several numerical analyses into the real world in a more realistic way.
Figure 8. Modelling of 3D extruded objects (a surface based on a spline, on the left and two cylinders, on the right).
Figure 9. Modelling of 3D complex objects (a free form surface based on control points, on the left and a patched surface to be integrated in an existing models of a car, on the right).
3.1. Multibody Animation
With multibody simulations, engineers analyze the kinematics and dynamics of moving parts and linkages. Starting from the knowledge of the mass and inertial properties of each component, the topological constraints (joints and fixtures) and the external actions (forces, torques, motors), it is possible to simulate the movement of a system in a reliable way. Readers interested in techniques for deducing the governing equations and solution strategies can refer to the referenced books [30-31]. AR can support multibody simulations in two different ways. The first is the possibility of projecting onto the real world the results coming from a precomputed simulation. It concerns the rendering on the scene of all the objects involved in the simulation, whose positions are updated according to the results of the computation. This implementation is similar to that of common post-processing software for visualizing graphics results. The only difference lies in the merging of the simulated system into the real
world. The advantage is to perceive the interaction with the real world and to check working spaces, possible interferences, etc. Although useful, this approach does not use all the potential of AR. Let us consider a practical application in order to illustrate this first possible integration between motion simulation and augmented reality. The purpose is to simulate a robot that has to be mounted on a flange by means of a revolute joint. The flange is present in the real world as a physical component (Figure 10). The robot and its movement have to be added as virtual shapes. The first step is to build the virtual parts. This task can be done using the modeling techniques illustrated in the previous section or using a computer-aided design program. In this case it is useful to export the geometries as .vrml (Virtual Reality Mark-up Language) files. This format is quite common and can be exported by almost all solid modelers. Dealing with .vrml is useful for rendering geometries using the OpenGL libraries. The second step is to perform the numerical computation using a specific solver. In order to prepare the input file for the simulation, it is useful to consider how to relate the real world to the virtual world. The main issue is to collimate the two main reference frames. This operation can be made using communication reference frames, in the same way as when sub-structuring large assemblies. In other words, we can consider the virtual world (and the objects in it) as a subsystem of the real world. The relation between the main system and a subsystem is controlled by several constraint equations acting on the communication reference frames. For the specific example of the robot, it is convenient to locate the real-world communication frame on the flange where the revolute joint has to be implemented. Similarly, we can define the virtual communication frame as coincident with the inertial reference frame of the virtual system. When building the equations for the simulation, we have to create a fictitious revolute joint between the virtual inertial reference frame and a reference frame on the first link of the robot. This approach can be summarized in the following five steps:
1. Before the simulation starts, the geometries and topological properties (joints and connections) have to be defined, built and stored in files;
2. The real scene has to contain information for collimating the real world with the virtual objects (communication frames);
3. The equations of motion of the investigated mechanisms have to be built and solved externally by means of a specific solver;
4. After the numerical solution, the simulation results have to be accessible to the AR executable;
5. At each frame acquisition, the virtual objects have to be rendered on the scene in the correct position and attitude according to the simulation results, considering the position of the communication frames and using OpenGL ModelView transformations.
The definition of the real-world reference frame can be done in two different ways. In the first, the patterned marker can be placed on the flange where the communication frame is desired (Figure 10). The second way is the picking of points defining the communication frame by using the I-Pen and recording the coordinates of the origin and main axes. As a result, the augmented scene includes the real world with the virtual robot. Since the movement of the manipulator has been simulated by solving the equations of motion, the augmented scene will show an animation of the robot.
In this way the user can visualize, from different points of view, the motion of the manipulator directly in the real world, looking at its
performance, verifying the working space, and checking possible interference with real objects and the aesthetic impact on the real environment. The augmented video can be enriched with other visual information on kinematic and dynamic parameters such as velocity, acceleration, force, torque, joint reactions, etc. This can be performed by rendering on the scene both vectors (for the direction) and numerical values (for the amplitude) as static overlays. A smarter way to enhance the multibody simulation is to introduce interactivity. It means that the user does not only watch the augmented scene, but interacts with it. Let us imagine the kinematic simulation of a robot arm whose end-effector can be grabbed by the user and moved. The purpose of the simulation is to compute the attitude and position of all the links in order to obtain the required position and orientation of the end-effector. This simulation can be supported by augmented reality by introducing into the real scene the virtual robot, which can be interactively manipulated by the user using virtual sensors. Of course, the simulation of the mechanism has to be computed in real time in order to update the scene promptly. This idea can be implemented following five steps (Figure 11; a skeleton of the resulting per-frame loop is sketched below):
1. Before the simulation starts, the geometries and topological properties (joints and connections) have to be defined, built and stored in files;
2. The real scene has to contain information for collimating the real world with the virtual objects, and the virtual sensor(s) for the interactive action of the user;
3. At each frame, the position and attitude of all the markers in the scene have to be acquired, and the mathematical transformations between the camera and the markers have to be computed;
4. At each frame, starting from the previous recognition and calculation, the dynamic or kinematic equations have to be solved in order to compute the correct position and attitude of all the virtual bodies in the scene;
5. At each frame acquisition, the virtual objects have to be rendered on the scene in the correct position and attitude.
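A possible skeleton of this per-frame loop (steps 3-5) is sketched below in C++. It uses ARToolKit routines also cited later in the chapter (arDetectMarker, arGetTransMat); the pattern ids, marker width, binarization threshold and the two helper functions are assumptions.

// Illustrative per-frame loop skeleton (not the authors' code).
#include <AR/ar.h>

extern int    gWorldPattId, gSensorPattId;   // ids of marker 0 (world) and marker 1 (sensor), assumed
extern double gPattWidth;                    // marker side length, in millimetres (assumed)

// Step 4: solve the kinematic closure equations (see the example in this section).
void solveClosureEquations(double Tc0[3][4], double Tc1[3][4], double jointAngles[3]);
// Step 5: render the virtual links collimated with marker 0.
void drawVirtualObjects(double Tc0[3][4], const double jointAngles[3]);

void processFrame(ARUint8* image)
{
    ARMarkerInfo* markers = 0;
    int           nMarkers = 0;
    const int     threshold = 100;                                  // assumed value

    // Step 3: detect all markers and compute the camera-marker transformations.
    if (arDetectMarker(image, threshold, &markers, &nMarkers) < 0) return;

    double center[2] = { 0.0, 0.0 };
    double Tc0[3][4], Tc1[3][4];
    int found = 0;
    for (int i = 0; i < nMarkers; ++i) {
        if (markers[i].id == gWorldPattId)  { arGetTransMat(&markers[i], center, gPattWidth, Tc0); found |= 1; }
        if (markers[i].id == gSensorPattId) { arGetTransMat(&markers[i], center, gPattWidth, Tc1); found |= 2; }
    }
    if (found != 3) return;            // both markers must be visible in this frame

    // Step 4: real-time kinematic solution driven by the sensor marker.
    double jointAngles[3];
    solveClosureEquations(Tc0, Tc1, jointAngles);

    // Step 5: render the augmented scene for this frame.
    drawVirtualObjects(Tc0, jointAngles);
}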
Figure 10. Simulation of the movement of a manipulator in augmented reality. Starting from a real environment and adding collimated virtual objects (on the top), an augmented animation can be built (at the bottom).
Figure 11. Activities for implementing interactive simulation in augmented reality.
The input and output procedures (data acquisition, marker recognition, rendering of the virtual objects) can be implemented using the ARToolKit open-source library. Between the input procedures (data acquisition and marker recognition) and the output procedures (rendering of the virtual objects on the input scene), a specific kinematic or dynamic solution has to be implemented. This portion of the algorithm depends on the specific mechanism to be simulated. Since it has to be performed between the acquisition of two consecutive frames, all the equations need a real-time solution [32-34]. For this purpose it is useful to optimize the solution strategy, and kinematic simulations are more suitable
because they involve the solution of a system of non-linear equations instead of a system of differential-algebraic equations. In order to explain in detail all the steps needed to implement a multibody simulation in an Augmented Reality environment, we develop an example based on a 3D manipulator.
Figure 12. The 3D manipulator of the example, and its closure loop.
With reference to Figure 12, let us consider a robot with 4 bodies connected with 3 revolute joints and 1 spherical joint. According to Grubler’s count the mechanism has 6 degrees of freedom:
dof = 6 · n_link − Σ_{i=1}^{joints} (6 − f_i) = 6
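Substituting the values for this manipulator (four moving links, three revolute pairs with f_i = 1 and one spherical pair with f_i = 3) confirms the count:

dof = 6 · 4 − [ 3 · (6 − 1) + (6 − 3) ] = 24 − 18 = 6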
where n_link is the number of moving parts, joints is the number of kinematic pairs and f_i are the degrees of freedom of the i-th pair. It means that, in order to define in a unique way the position in space of the manipulator, we have to prescribe 6 independent parameters (i.e. the position and attitude of the end-effector). For an interactive simulation this means that the user can freely choose the position and attitude of the end-effector, and the Augmented Reality scene has to be able to include such a 6-d.o.f. sensor. This sensor can be represented by a patterned marker. The first step in building the model is the construction of the geometries of each link, which can be stored in .vrml (Virtual Reality Mark-up Language) files. This format is quite common and can be exported by almost all solid modelers. Dealing with .vrml is useful for rendering geometries using OpenGL. The second step is the preparation of the scene. We need a marker (marker 0) to define the position and orientation of the manipulator world coordinate system (i.e. its position inside the scene) and another marker (marker 1) to define the position and the orientation of the end-effector, which will work as a sensor (Figure 12, on the right). The third step is the implementation of the system of constraint equations, which can be built considering the closed loop of vectors (Figure 12, on the right). Several approaches can be used for building the system of equations. It is useful, for the subsequent graphical operations, to use the 4x4 homogeneous transformation matrix [T]_{l1-l2} to express
the relative position and attitude between two generic links 2 and 1. The first 3x3 portion of this matrix is used to define the relative orientation between the two reference frames attached to the two links. The last column is used to describe the relative position between the origins of the coordinate frames. The last row of the matrix is [0 0 0 1]:
[T]_{l1-l2} = \begin{bmatrix} [Orientation]_{3\times3} & [Position]_{3\times1} \\ 0\ 0\ 0 & 1 \end{bmatrix}
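A minimal C++ sketch of such a homogeneous transformation (illustrative names; not the authors' code), with composition and application to a point in homogeneous coordinates, could look as follows.

// Illustrative 4x4 homogeneous transformation of the form shown above.
#include <array>

struct Homogeneous
{
    // row-major 4x4 matrix; for a rigid transformation the last row is [0 0 0 1]
    std::array<std::array<double, 4>, 4> m;

    static Homogeneous identity()
    {
        Homogeneous T{};
        for (int i = 0; i < 4; ++i) T.m[i][i] = 1.0;
        return T;
    }

    // composition: (*this) followed by o applied to a point expressed in o's frame
    Homogeneous operator*(const Homogeneous& o) const
    {
        Homogeneous r{};
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                for (int k = 0; k < 4; ++k)
                    r.m[i][j] += m[i][k] * o.m[k][j];
        return r;
    }

    // apply to a point given in homogeneous coordinates (x, y, z, 1)
    std::array<double, 4> operator*(const std::array<double, 4>& p) const
    {
        std::array<double, 4> q{};
        for (int i = 0; i < 4; ++i)
            for (int k = 0; k < 4; ++k)
                q[i] += m[i][k] * p[k];
        return q;
    }
};

// Example use, mirroring equation (1) below:
//   P_marker0 = T_0_l0 * T_l0_l1 * T_l1_l2 * P_link2;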
Looking at the tip point P on the link 2 (center of the spherical joint) we can deduce its position with respect to the marker 0 as:
{P}_{marker0} = [T]_{0-l0} · [T]_{l0-l1} · [T]_{l1-l2} · {P}_{link2}     (1)

where:
{P}_{marker0} is the position vector of point P in the marker 0 (world) coordinate system;
{P}_{link2} is the position vector of point P in the link 2 (local) coordinate system;
[T]_{0-l0} is the homogeneous transformation matrix between link 0 and marker 0. It is a function of the parameter α which describes the relative rotation at their relative revolute joint;
[T]_{l0-l1} is the homogeneous transformation matrix between link 1 and link 0. It is a function of the parameter β which describes the relative rotation at their relative revolute joint;
[T]_{l1-l2} is the homogeneous transformation matrix between link 2 and link 1. It is a function of the parameter γ which describes the relative rotation at their relative revolute joint.
Looking at the same point P, but on the slider and considering that the slider is attached to the marker 1, we can deduce its position with respect to the marker 0 as:
{P}_{marker0} = [T]_{0-1} · {P}_{slider}     (2)
where:
{P}_{slider} is the position vector of point P in the slider (marker 1) local coordinate system;
[T]_{0-1} is the homogeneous transformation matrix between marker 1 and marker 0. It is a function of the 6 independent parameters which describe the relative position and the relative rotation between the two markers. These parameters can be considered as the input of the kinematic analysis, because they can be freely chosen by the user to move the manipulator in space.
Since at point P link 2 is connected to the slider by means of a spherical joint, we can obtain the closure loop equation of the mechanism as:
[T]_{0-l0} · [T]_{l0-l1} · [T]_{l1-l2} · {P}_{link2} − [T]_{0-1} · {P}_{slider} = {0, 0, 0, 1}^T     (3)
The system of equations (3) can be solved for the unknown kinematic parameters starting from the knowledge of the position and attitude of marker 1 (and of the end-effector slider) with respect to marker 0. In order to compute this information we have to know the relative transformation between marker 1 and marker 0. ARToolKit deals with matrices which are similar to the homogeneous ones: they are 3x4 transformation matrices, containing the same information about relative position and attitude as the homogeneous ones but without the last dummy row. Quaternions and position vectors can be extracted from these matrices by using the arUtilMat2QuatPos ARToolKit procedure. The relative position of the camera with respect to marker 0 ( [T]_{c-0} ) and marker 1 ( [T]_{c-1} ) can be computed using the ARToolKit procedure arGetTransMat, which computes the camera position and attitude as a function of the detected markers. Their inverse matrices ( [T]_{0-c} and [T]_{1-c} ) represent the relative transformations between the markers and the camera. The relative transformation between marker 1 and marker 0 can then be computed as:

[T]_{0-1} = [T]_{c-0} · [T]_{1-c}     (4)
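The following short C++ sketch mirrors equation (4) and the frame naming used above, assembling the marker-to-marker transformation from the two arGetTransMat results with ARToolKit utility routines; variable names are illustrative.

// Illustrative assembly of the relative marker transformation, following equation (4).
#include <AR/ar.h>

// Tc0, Tc1 : 3x4 transformations returned by arGetTransMat() for marker 0 and marker 1.
// T01      : relative transformation between marker 1 and marker 0, equation (4).
void markerToMarkerTransform(double Tc0[3][4], double Tc1[3][4], double T01[3][4])
{
    double T1c[3][4];
    arUtilMatInv(Tc1, T1c);           // [T]_{1-c} : inverse of the marker-1 matrix
    arUtilMatMul(Tc0, T1c, T01);      // [T]_{0-1} = [T]_{c-0} . [T]_{1-c}
}

// The six input parameters of the kinematic analysis (position and attitude of the
// end-effector) can then be extracted, e.g. as a quaternion and a position vector:
//   double q[4], p[3];
//   arUtilMat2QuatPos(T01, q, p);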
This relation is used to compute the coordinates of point P on the slider (fixed to marker 1) with respect to marker 0. By doing this, equations (3) can be solved for the unknown angles. At this point the virtual position and attitude of each link of the mechanism are known. The next step is to compute the projection of the geometries in the augmented scene. Since the renderer is the OpenGL engine [35], we have to deal with the ModelView projection matrix, which maps 3D point coordinates into 2D (screen) coordinates. The first step concerns the loading of the projection matrix computed from the perspective of marker 0. This task can be performed by using arglCameraViewRH, whose output is a vector of 16 elements containing the transformation and scale information needed to project an object from the 3D coordinate system of marker 0 into the camera view plane. This transformation can be used to relate the position and attitude of each object in the virtual world coordinate system to marker 0. The next steps concern the computation of the transformations needed to draw the manipulator links in the scene, using the OpenGL operators glRotated and glTranslated. The first performs a rotation of an angle about a specified axis, the second performs a translation of a specified amplitude along a specified direction. For the base it is sufficient to perform a rotation of an α angle about the z axis of marker 0:

glRotated(alpha,0.0,0.0,1.0);
For the first link, three transformations are needed: a translation a1 along the z axis of marker 0, a rotation about the same z axis by the angle alpha, and a rotation by the angle beta about the axis of the first revolute joint:
glTranslated(0.0,0.0,a1);
glRotated(alpha,0.0,0.0,1.0);
glRotated(beta,0.0,-1.0,0.0);
For the second link, three transformations are needed: a rotation by the angle alpha about the z axis of marker 0, a translation to the center of the revolute joint between the 1st and the 2nd links, and a rotation by the angle gamma about the axis of the revolute joint between the 1st and the 2nd links:
glRotated(alpha,0.0,0.0,1.0);
glTranslated(l1*cos(beta),0.0,l1*sin(beta)+a1);
glRotated(gamma,0.0,-1.0,0.0);
The end-effector, since it is fixed to the marker 1, can be projected using marker 1 projection matrix without applying any further transformation.
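Putting the above together, a minimal, hedged sketch of the drawing pass might look as follows. The OpenGL and ARToolKit (gsub_lite) calls are the ones quoted in the text, while the draw_* routines, the argument layout and the degree/radian handling are illustrative assumptions rather than the authors' implementation.

#include <AR/gsub_lite.h>
#include <GL/gl.h>
#include <math.h>

extern void draw_base(void);     /* user-supplied geometry routines (assumption) */
extern void draw_link1(void);
extern void draw_link2(void);

void draw_manipulator(double Tc0[3][4],                     /* from arGetTransMat */
                      double alpha, double beta, double gamma,
                      double a1, double l1)
{
    double modelview[16];
    double b = beta * M_PI / 180.0;   /* glRotated takes degrees, cos/sin radians */

    arglCameraViewRH(Tc0, modelview, 1.0);   /* project from the marker-0 frame */
    glMatrixMode(GL_MODELVIEW);
    glLoadMatrixd(modelview);

    glPushMatrix();                          /* base: rotation about marker-0 z axis */
    glRotated(alpha, 0.0, 0.0, 1.0);
    draw_base();
    glPopMatrix();

    glPushMatrix();                          /* first link */
    glTranslated(0.0, 0.0, a1);
    glRotated(alpha, 0.0, 0.0, 1.0);
    glRotated(beta, 0.0, -1.0, 0.0);
    draw_link1();
    glPopMatrix();

    glPushMatrix();                          /* second link */
    glRotated(alpha, 0.0, 0.0, 1.0);
    glTranslated(l1 * cos(b), 0.0, l1 * sin(b) + a1);
    glRotated(gamma, 0.0, -1.0, 0.0);
    draw_link2();
    glPopMatrix();
}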
Screenshots of the animation are reported in Figure 13.
Figure 13. Snapshots of the interactive animation.
3.2. FEM Pre- and Post-Processing

By means of finite element analyses, engineers analyze the mechanical stresses inside a structure and its deformation under applied loads. Starting from the knowledge of the elastic properties of each part, the topological constraints between parts and the presence of external actions (forces, torques, enforced displacements), it is possible to simulate the deformation of a system and
assess the level of mechanical stresses inside it. Readers interested in techniques for deducing the governing equations and solution strategies can refer to the referenced book [36]. Let us consider a practical application in order to illustrate how augmented reality can support this kind of engineering simulation. The purpose is to simulate a steel bar that is pinned on two mountings and loaded with a force. The two mountings and the bar are physically present in the real world (Figure 14). The force, the resultant deformation and the stress field have to be added as virtual contents. The first step is to build the virtual parts. This task can be done using the modeling techniques illustrated in the previous section, using the I-Pen to acquire the geometry of the bar. The second step is to define the location and the amplitude of the force. This can also be done with the I-Pen, by pointing at the force location in the real world and defining the load amplitude using the keyboard. Then, we have to perform the numerical computation using an external FEM solver. The last step is the projection of the numerical results onto the real scene. Generally, the main results of such simulations concern a deformed shape of the structure (which reproduces the deformation in an amplified way) coloured with a palette indicating the stress level. This information has to be added to the real world. Again, we have to collimate the two main reference frames of the real world and the virtual one.
Figure 14. Simulation of the deformation and stresses of a pinned bar in augmented reality. Starting from a real environment and adding collimated virtual objects (on the top), an augmented animation can be built (at the bottom).
This operation can be performed using communication reference frames in the same way as in the previous discussion. For the specific example of the bar, it is convenient to place a visible patterned marker near the physical bar (Figure 14), defining the real-world communication frame. Similarly, we can define the virtual communication frame at the corresponding location in the virtual world. In this way it is possible to locate the virtual deformed and colored bar in the real scene with a simple coordinate transformation (translation and rotation) using the OpenGL operators glRotated and glTranslated (a minimal sketch of this rendering step is given at the end of this subsection). As a result, the augmented scene includes the real world with the virtual bar (Figure 14, at the bottom). The scene can also be enriched by including an animation of the deformed shape instead of the static geometry. This is especially useful for vibration analyses, where the visualization of modal shapes is very important. In this way, the user can visualize, from different points of view, the deformation and the stress level of the bar directly in the real world, checking its performance and verifying excessive stresses or deformations. The implementation can be summarized in the following six steps:

1. The real scene has to contain information for collimating the real world to the virtual objects. A patterned marker or a coordinate system defined by picking points with the I-Pen can be used;
2. The geometry of the component(s) under investigation has to be acquired;
3. Boundary conditions (constraints and loads) have to be defined too;
4. The computation of deformed shapes and stress levels can be performed using an external FEM solver;
5. The results coming from step 4 have to be converted into coloured .wrml entities (i.e. deformed shapes with stress contour plots);
6. The .wrml entities have to be collimated with the real world and rendered on the augmented scene.

The augmented video can be enriched with other visual information on the exact numerical values of stresses (i.e. adding a virtual legend), constraints and restraint forces, etc. This can be performed by rendering in the scene both vectors (for the direction) and numerical values (for the amplitude).
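A minimal, hedged sketch of the rendering part of step 6 is given below: it draws a stress-coloured, deformed result mesh in the communication frame defined by the marker. The FemMesh layout, the offset/rotation parameters and the assumption that the solver output has already been converted into flat per-triangle vertex and colour arrays are illustrative, not taken from the chapter.

#include <GL/gl.h>

typedef struct {
    int      n_tri;          /* number of triangles                        */
    double (*xyz)[3];        /* deformed vertex positions, 3 per triangle  */
    float  (*rgb)[3];        /* stress palette colours, 3 per triangle     */
} FemMesh;

void draw_fem_result(const double marker_modelview[16],
                     const double offset[3], double rot_z_deg,
                     const FemMesh *m)
{
    glMatrixMode(GL_MODELVIEW);
    glLoadMatrixd(marker_modelview);          /* marker = communication frame */

    /* simple collimation: translation + rotation between virtual and real frames */
    glTranslated(offset[0], offset[1], offset[2]);
    glRotated(rot_z_deg, 0.0, 0.0, 1.0);

    glBegin(GL_TRIANGLES);
    for (int i = 0; i < 3 * m->n_tri; i++) {
        glColor3fv(m->rgb[i]);                /* stress level colour */
        glVertex3dv(m->xyz[i]);
    }
    glEnd();
}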
3.3. CFD Post-Processing

With computational fluid dynamics, engineers analyze liquid and gas flows inside or outside parts. Starting from the knowledge of the fluid properties, the boundary conditions (flow rate, pressure, temperature, etc.) and the wall properties, it is possible to simulate the kinematic and thermodynamic behavior of fluids and their interaction with solids. Readers interested in techniques for deducing the governing equations and solution strategies can refer to the referenced book [37]. Let us consider a practical application in order to illustrate how augmented reality can support this kind of engineering simulation. The purpose is to simulate an external air flow around a cylinder on a table. Both the table and the cylinder are physically present in the real world (Figure 15). The stream lines and the pressure field on the surfaces have to be added as virtual contents. The first step is to build the virtual parts. This task can be done using the modeling
techniques illustrated in the previous section, using the I-Pen to acquire the geometry of the cylinder and of the table. The second step is to define all the boundary conditions for the simulation. Because there are many values to be defined, it is convenient to complete the model outside the augmented reality environment. Then we have to perform the numerical computation using an external CFD solver. The last step is the projection of the numerical results onto the real scene. Generally, the main results of such simulations concern several stream lines describing the fluid trajectories, which are coloured with a palette indicating the velocity, pressure or temperature value. This information has to be added to the real world. Once again, we have to collimate the two main reference frames of the real world and the virtual one. This operation can be performed using communication reference frames in the same way as in the previous discussions. For the specific example of the cylinder, it is convenient to place a visible patterned marker near the physical cylinder (Figure 15), defining the real-world communication frame. Similarly, we can define the virtual communication frame at the corresponding location in the virtual world. In this way it is possible to locate the virtual stream lines and surface plots in the real scene with a simple coordinate transformation (translation and rotation) using the OpenGL operators glRotated and glTranslated.
Figure 15. Simulation of the air flow around a cylinder on a table in augmented reality. Starting from a real environment and adding collimated virtual objects (on the top), an augmented animation can be built (at the bottom).
As a result, the augmented scene includes the real world with the virtual surfaces touched by the fluid (cylinder and table) (Figure 15, at the bottom). The scene can also be enriched by including an animation of the stream lines. In this way the user can visualize, from different points of view, the fluid stream around the cylinder and the pressure acting on the boundary surfaces directly in the real world. The implementation can be summarized in the following six steps:

1. The real scene has to contain information for collimating the real world to the virtual objects. A patterned marker or a coordinate system defined by picking points with the I-Pen can be used;
2. The geometry of the component(s) under investigation has to be acquired;
3. Boundary conditions (fluid flows, pressure openings, wall properties, etc.) have to be defined too;
4. The computation of the fluid field and of the pressure on the surfaces can be performed using an external CFD solver;
5. The results coming from step 4 have to be converted into coloured .wrml entities (i.e. stream lines, surfaces coloured according to pressure values);
6. The .wrml entities have to be collimated with the real world and rendered on the augmented scene.

The augmented video can also be enriched with other visual information on the exact numerical values of fluid or surface parameters by adding a virtual legend (e.g. for describing pressure, velocities, temperature, etc.).
4. Conclusion

The integration between computer-aided engineering tools and augmented reality has proved to be a valid instrument for supporting designers and users in modeling, testing and reviewing their products. With the integration of specific hardware devices such as trackers and sensors, the user can interact with the scene in an immersive way. For the modeling of shapes, a magnetic sensor can be useful to acquire the precise position in space of a virtual pen in order to allow the user to sketch virtual objects directly in the real world. Moreover, the comprehension of the results of engineering simulations such as motion analyses, structural investigations and fluid-dynamics computations can be improved by enhanced visualization and interaction with real and virtual objects. The discussed instruments and methodologies can also be useful for collaborative design. Very often, teams of scientists or engineers work on the same project at different locations. In this case, all the designers can wear an AR sub-system and all these sub-systems can communicate among themselves. Imagine that a group of designers is working on the model of a complex device for their customers. The designers and customers want to perform a joint design review even though they are physically separated. If each of them is equipped with an augmented reality display, this can be accomplished. The physical prototype that the designers have mocked up is imaged and displayed in the client's AR system in 3D. The client may look at different aspects of it, testing engineering performances and checking its integration with the real world.
The future of this combination of AR and Virtual Engineering is very promising. The discussed examples are only a small part of the capabilities of such an integration. The main future challenges concern the manipulation of objects, the capabilities for virtual assembly and the full integration of the numerical methodologies without requiring external solvers.
References

[1] Bernard A. (2005). Virtual engineering: methods and tools. Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture, Vol. 219 (5), pp. 413-422.
[2] Azuma R.T. (1997). A survey of augmented reality. Teleoperators and Virtual Environments, 6(4), pp. 355-385.
[3] Bimber O., Raskar R. (2005). Spatial Augmented Reality: Merging Real and Virtual Worlds, A K Peters, Ltd.
[4] Vallino J. (1998). Interactive augmented reality. PhD thesis, Department of Computer Science, University of Rochester, USA.
[5] Azuma R., Baillot Y. et al. (2001). Recent advances in augmented reality. IEEE Computer Graphics 21(6), pp. 34-47.
[6] Haniff D., Baber C., Edmondson W. (2000). Categorizing augmented reality systems. Journal of Three Dimensional Images 14(4), pp. 105-109.
[7] Klinker G., Ahlers K.H. et al. (1997). Confluence of computer vision and interactive graphics for augmented reality, PRESENCE: teleoperations and virtual environments. Special issue on Augmented Reality, 6(4), pp. 433-451.
[8] Liarokapis F. (2007). An augmented reality interface for visualizing and interacting with virtual content, Virtual Reality 11, pp. 23-43.
[9] Muller A., Conrad S., Kruijff E. (2003). Multifaceted interaction with a virtual engineering environment using a scenegraph-oriented approach. Proceedings of the 11th Int. Conf. in Central Europe on Computer Graphics, Visualization and Computer Vision, Czech Republic.
[10] Samset E., Talsma A., Elle O., Aurdal L., Hirschberg H., Fosse E. (2002). A virtual environment for surgical image guidance in intraoperative MRI, Computer Aided Surgery 7(4), pp. 187-196.
[11] Stilman M., Michel P., Chestnutt J., Nishiwaki K., Kagami S., Kuffner J.J. (2005). Augmented Reality for Robot Development and Experimentation. Tech. Report CMU-RI-TR-05-55, Robotics Institute, Carnegie Mellon University.
[12] Ong S.K., Pang Y., Nee A.Y.C. (2007). Augmented Reality Aided Assembly Design and Planning. Annals of the CIRP Vol. 56/1, pp. 49-52.
[13] Pang Y., Nee A.Y.C., Ong S.K., Yuan M.L., Youcef-Toumi K. (2006). Assembly Feature Design in an Augmented Reality Environment, Assembly Automation, 26/1, pp. 34-43.
[14] Sharma R. and Molineros J. (1995). Computer vision-based augmented reality for guiding manual assembly. PRESENCE: Teleoperators and Virtual Environments, n. 3, pp. 292-317.
[15] Webster A., Feiner S., MacIntyre B., Massie W., Krueger T. (1996). Augmented reality in architectural construction, inspection, and renovation. Int. Proc. of Third Congress on Computing in Civil Engineering ASCE 3, Anaheim, CA, pp. 913-919.
[16] Liarokapis F., Petridis P., Lister P.F., White M. (2002). Multimedia augmented reality interface for E-learning (MARIE). World Trans. Eng. Technol. Educ. 1(2), pp. 173-176.
[17] Pan Z., Cheok A.D., Yang H., Zhu J., Shi J. (2006). Virtual reality and mixed reality for virtual learning environments. Computers & Graphics 30, pp. 20-28.
[18] Kaufmann H., Schmalstieg D., Wagner M. (2000). Construct3D: A Virtual Reality Application for Mathematics and Geometry Education. Education and Information Technologies 5(4), pp. 263-276.
[19] Dangelmaier W., Fischer M., Gausemeier J., Grafe M., Matysczok C., Mueck B. (2005). Virtual and augmented reality support for discrete manufacturing system simulation. Computers in Industry 56, pp. 371-383.
[20] Ong S.K., Nee A.Y.C. (2004). Virtual and Augmented Reality Applications in Manufacturing. Springer, London, UK.
[21] Reif R., Walch D. (2008). Augmented & Virtual Reality applications in the field of logistics. Visual Computing 24, pp. 987-994.
[22] Friedrich W. (2004). ARVIKA - Augmented Reality for Development, Production and Service. Publicis Corporate Publishing, Erlangen.
[23] Liarokapis F., Sylaiou S., et al. (2004). An interactive visualization interface for virtual museum. Proceedings of the 5th International Symposium on Virtual Reality, Archaeology and Cultural Heritage, Brussels and Oudenaarde, pp. 47-56.
[24] Narzt W., Pomberger G., Ferscha A., Kolb D., Muller R., Wieghardt J., Hortner H., Lindinger C. (2006). Augmented reality navigation systems. Univ Access Inf Soc 4, pp. 177-187.
[25] Klinker G., Stricker D., Reiners D. (1999). Optically based direct manipulation for augmented reality, Computers & Graphics 23, pp. 827-830.
[26] Bowman D.A. (1996). Conceptual Design Space: Beyond Walk-through to Immersive Design. In Bertol D., Designing Digital Space, John Wiley & Sons, New York.
[27] Bowman D.A. (1999). Interaction Techniques for Common Tasks in Immersive Virtual Environments: Design, Evaluation, and Application. Ph.D. thesis, Virginia Polytechnic Institute & State University.
[28] Liang J. and Green M. (1994). JDCAD: a highly interactive 3D modeling system. Computers & Graphics 18(4), pp. 499-506.
[29] Raskar R., Welch G., Cutts M., Lake A., Stesin L., Fuchs H. (1998). The office of the future: a unified approach to image-based modeling and spatially immersive displays. Proceedings of SIGGRAPH 98, Orlando, FL, July 19-24.
[30] García de Jalón J., Bayo E. (2004). Kinematic and Dynamic Simulation of Multibody Systems - the Real-Time Challenge. Springer-Verlag, New York.
[31] Haug E.J. (1989). Computer-Aided Kinematics and Dynamics of Mechanical Systems. Allyn and Bacon, Boston, MA.
[32] Korkealaakso P.M., Rouvinen A.J., Moisio S.M., Peusaari J.K. (2007). Development of a real-time simulation environment. Multibody System Dynamics, 17, pp. 177-194.
[33] Naya M.A., Dopico D., Perez J.A., Cuadrado J. (2007). Real-time multi-body formulation for virtual-reality-based design and evaluation of automobile controllers. Proceedings of the Institution of Mechanical Engineers, Part K: Journal of Multi-body Dynamics, Vol. 221 (2), pp. 261-276.
[34] Isnard F., Dodds G., Vallée C., Fortuné D. (2000). Real-time dynamics simulation of a closed-chain robot within a virtual reality environment. Proceedings of the Institution of Mechanical Engineers, Part K: Journal of Multi-body Dynamics, Vol. 214(4), pp. 219-232.
[35] Wright R.S., Lipchak B. (2004). OpenGL SuperBible, third edition, Sams Publishing, USA.
[36] Zienkiewicz O.C. (1977). The Finite Element Method in Engineering Science. McGraw-Hill Publ., London.
[37] Anderson J.D. (1995). Computational Fluid Dynamics, McGraw-Hill Inc., USA.
In: Computer Animation, ISBN 978-1-60741-559-6, © 2010 Nova Science Publishers, Inc. Editors: J.S. Wright and L.M. Hughes, pp. 85-111
Chapter 3
A SURVEY OF POPULAR 3D SOFT-BODY ANIMATION COMPRESSION APPROACHES
S. Ramanathan and A.A. Kassim
National University of Singapore, Dept. of Electrical and Computer Engineering, Singapore
1. Introduction
The world of Computer Graphics has seen rapid developments over the past few decades. Since the appearance of 3D computer animation in the science fiction movie Futureworld (1976), animations have been successfully employed in Toy Story (1995), Shrek (2001) and Happy Feet (2006). Contemporary animations are synthesized by deforming 3D objects, typically modeled using triangular meshes. A triangular mesh is defined by its geometry (location of vertices), connectivity (how the triangles are connected), surface color, normal and texture properties. An isolated triangle requires more than 100 bytes for its description, since the geometry, normal and texture components are represented using floating point numbers. For accurate representation of real world objects, 3D mesh models typically require many thousands of vertices and triangles. Therefore, 3D meshes demand extensive storage memory and transmission bandwidth, which makes 3D animation compression¹ a very relevant problem that has generated much interest in recent years. Besides entertainment and video games, 3D meshes and animations are used in biometrics, computer-aided design (CAD) and medicine. 3D soft-body animations are synthesized by applying non-linear deformations on a static mesh to generate realistic, life-like motion. Complex, realistic and human-like motion of these characters is achieved using physically-based [41, 9] or synthetic [30] animation techniques, by moving the 3D mesh vertices along separate trajectories, as compared to rigid-body transformations where the whole mesh deforms homogeneously. As shown in Fig. 1, mesh deformations can be achieved by either
¹ In the 1980s, the need for efficient multimedia storage and transmission led to the development of the popular JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group) techniques.
i. changing the mesh geometry while keeping the connectivity constant, where the mesh is deformed by changing the positions of the mesh vertices. These animations are termed dynamic geometry sequences.
ii. altering the mesh geometry as well as the connectivity, where deformations are accompanied by movement of the mesh vertices as well as addition/deletion of new/existing vertices, which modifies the existing connectivity.
While it is intuitive to treat coding of 3D animations (essentially 3D video) similar to 2D video compression [24], the two problems are inherently different. The key to efficient 3D animation coding is the compact representation of the inter-mesh motion using a few parameters analogous to motion vectors in video compression. A number of animation coding algorithms [25, 1, 46] have attempted to address the compact 3D motion representation problem using the video coding approach involving the following steps (i) Segmentation - a necessary pre-processing step to efficiently represent the motion between consecutive video frames. The current video instance (frame) is segmented into smaller blocks (of size 8 × 8 or 16 × 16 pixels) during this process. (ii) Motion prediction - which allows for compact inter-frame motion representation in terms of motion parameters. The motion parameters are computed by finding the best-match block in the temporal reference for each block in the current frame. (iii) Parameter encoding - where the amount of information required to represent the motion parameters is further reduced using data compression techniques such as Huffman/Arithmetic coding [45]. However, the arbitrary topology (i.e., arrangement of vertices in space) of 3D meshes makes it difficult to efficiently segment 3D meshes and thereby translate well known video compression techniques to 3D. The difficulty involved in segmenting non-planar 3D meshes as compared to trivial planar image segmentation is illustrated in Fig. 2. Therefore, a number of new methods have been proposed for efficient 3D dynamic mesh coding. MPEG4 Part 25 [19] and www.3dcompression.com are resources dedicated to research on 3D animation compression. In this chapter, we review 3D dynamic mesh compression algorithms and investigate how vertex clustering, which chiefly contributes to animation coding complexity, affects compression performance. We finally conclude this chapter with observations that need to be effectively addressed by future 3D animation coding algorithms.
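As a hedged illustration of step (ii) of the video-coding pipeline above (and not part of any of the 3D coders surveyed in this chapter), a brute-force block-matching search in 2D can be sketched as follows; frame layout and parameter names are assumptions.

/* Full-search block matching over a +/-R pixel window using the sum of
 * absolute differences (SAD). Frames are 8-bit luminance images of width W
 * and height H; the block size B is typically 8 or 16. */
#include <stdlib.h>
#include <limits.h>

void block_match(const unsigned char *cur, const unsigned char *ref,
                 int W, int H, int B, int R,
                 int bx, int by,          /* top-left corner of current block */
                 int *mvx, int *mvy)      /* best motion vector (output)      */
{
    long best = LONG_MAX;
    *mvx = *mvy = 0;
    for (int dy = -R; dy <= R; dy++) {
        for (int dx = -R; dx <= R; dx++) {
            int rx = bx + dx, ry = by + dy;
            if (rx < 0 || ry < 0 || rx + B > W || ry + B > H) continue;
            long sad = 0;
            for (int y = 0; y < B; y++)
                for (int x = 0; x < B; x++)
                    sad += abs((int)cur[(by + y) * W + (bx + x)] -
                               (int)ref[(ry + y) * W + (rx + x)]);
            if (sad < best) { best = sad; *mvx = dx; *mvy = dy; }
        }
    }
}

Such a per-block search is trivial on a regular pixel grid; the point made above is precisely that no comparably simple, regular segmentation exists for arbitrary 3D meshes.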
2. An Overview of Mesh Coding Algorithms

2.1. Static Mesh Compression
Over the past decade, much research has been focused on the compression of 3D static meshes and a majority of these works [7, 8, 10, 11, 26, 34, 40, 42] address efficient encoding of mesh connectivity. Using spiralling tree-based encoding schemes, popular algorithms like Cut-border [11] and Edgebreaker [34] achieve lossless connectivity compression with only up
Figure 1. Examples of animations synthesized by (a) changing mesh geometry as seen from frames 95 and 105 of the Chicken animation or (b) changing both mesh geometry and connectivity (around the mouth) for realistic facial expression generation.
Figure 2. (a) Image segmentation into equal-sized blocks is a simple pre-processing step to motion prediction in video coding. A segmented frame from the Foreman sequence. (b) Segmentation of the Dino 3D mesh (2039 vertices, 3999 triangles) into 29 pieces using spectral mesh decomposition [28]. Since 3D meshes are non-planar, segmenting 3D meshes into coherent pieces is a non-trivial and compute-intensive task.
to 1.5-2 bits per mesh triangle. A few multiresolution geometry-cum-connectivity representation techniques [17, 32] have been proposed to facilitate progressive data transmission and achieve a compression efficiency of around 4-10 bits per vertex (bpv). Compression of mesh geometry is only supplementary to the connectivity coding scheme in these algorithms. Geometry compression, which involves coding of the floating point (x, y, z) vertex coordinates, is inherently lossy and has been attempted using predictive coding as well as signal processing-based techniques. Predictive coding [42] exploits correlation in the mesh data by predicting a vertex position from the positions of its neighboring vertices; prediction errors are quantized and entropy coded for compact representation. A typical compression efficiency of 7-12 bpv is obtained using this scheme. Spectral compression [48] is a popular signal processing-based mesh compression method, where the mesh geometry is projected onto an orthonormal basis and reconstructed using a small number of components in the basis. This method achieves a compression efficiency of around 14 bpv for perceptually lossless encoding. A wavelet-based geometry compression technique [2] achieves a compression efficiency of 8 bpv. Since the number of components used for geometry reconstruction can be adaptively varied, [48] and [2] can also be used for multiresolution mesh representation. Other techniques that tackle the problem of mesh representation with various levels of detail are [32, 8].
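To make the predictive-coding idea concrete, a minimal, hedged sketch is given below. The parallelogram rule is one widely used instance of neighbour-based prediction (not necessarily the exact predictor of [42]); the quantization step q and helper names are illustrative assumptions, and the entropy coder is omitted.

#include <math.h>

typedef struct { double x, y, z; } Vec3;

static Vec3 parallelogram_predict(Vec3 a, Vec3 b, Vec3 c)
{
    /* v_pred = b + c - a completes the parallelogram a-b-v-c */
    Vec3 p = { b.x + c.x - a.x, b.y + c.y - a.y, b.z + c.z - a.z };
    return p;
}

static void encode_vertex(Vec3 v, Vec3 a, Vec3 b, Vec3 c, double q, long res[3])
{
    Vec3 p = parallelogram_predict(a, b, c);
    res[0] = lround((v.x - p.x) / q);   /* quantized residuals, to be        */
    res[1] = lround((v.y - p.y) / q);   /* entropy coded (e.g. Huffman or    */
    res[2] = lround((v.z - p.z) / q);   /* arithmetic coding)                */
}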
2.2. Dynamic Mesh Coding Algorithms
As noted above, there are two types of 3D animation sequences: (i) dynamic geometry sequences, where mesh motion is achieved by moving the mesh vertices with time, and (ii) dynamic geometry-cum-connectivity sequences, where mesh motion is accompanied by changes in mesh geometry as well as connectivity. Dynamic geometry compression algorithms can be grouped into three major classes based on their implementation: registration-based, prediction-based and PCA-based multiresolution representation. Examples of these algorithms are discussed below.

2.2.1. Registration-Based Compression
In Lengyel's pioneering work on registration-based dynamic geometry compression [25], he proposes the segmentation of the mesh into smaller sub-meshes and represents the motion of each of these sub-meshes using rigid-body affine transforms. His compression mechanism yields an efficiency of 3.45 bpvf (bpv per frame) for the Chicken animation with 16 and 4 bits used for affine and vertex quantization respectively. Ibarria et al. report a compression efficiency ranging from 1.37 to 2.91 bpvf for their Dynapack algorithm [18] when the quantization ranges from 7 to 13 bits for test animations. Their algorithm exploits space-time coherence in dynamic geometry by predicting the position of each vertex v in frame f from three of its neighbors in f and the positions of v and its neighbors in the previous frame. A video coding-like method which segments the 3D mesh into blocks and computes motion vectors and error residuals for each mesh block is proposed by Ahn et al. [1]. A compression efficiency of 9.6 bpvf is obtained using the encoding scheme that consists of I
(Intra), P (Predicted) and B (Bi-directionally predicted) meshes for the Chicken animation. In Gupta et al.'s dynamic geometry compression scheme [13], the mesh is partitioned into segments, and the displacement of vertices in each segment is computed using Iterative Closest Point (ICP)-based registration. The encoding scheme describes mesh motion using a few affine parameters and residual errors to achieve a compression efficiency of 2.5 bpvf for the Chicken animation.

2.2.2. Prediction-Based Compression
Another interesting work on dynamic geometry compression is that of Yang et al. [46], based on vertex-wise motion vector (MV) prediction. Each vertex is given a motion vector obtained from the neighborhood of the vertex, defined as the set of all vertices within a threshold distance around the vertex. Thus, their coding procedure requires a third of the bit-rate compared to [25] for the same quality of animation reconstruction measured in terms of Signal-to-Noise ratio (SNR). Stefanoski et al. propose a connectivity-based prediction technique in [38], where prediction is performed in a frame-to-frame fashion using the previous frame and the partly decoded current frame. Mesh connectivity is used to determine the order of vertex compression and the spatial-cum-temporal dependency between vertex locations is exploited using a non-linear spatio-temporal predictor with angle-preserving properties. They report a 25% improvement in compression performance over competing prediction schemes like [18], especially for high quality animation reconstruction. Muller et al. [31] propose another prediction-based compression algorithm using Differential Pulse Code Modulation (DPCM), where errors in prediction from the previously decoded mesh are clustered in an octree. Only a representative from each cluster is used for further processing, which results in a significant reduction in bit-rate.

2.2.3. Multiresolution Representation
Recently, multi-resolution mesh representation for bandwidth-limited streaming applications has generated much interest. A notable work on multiresolution representation of dynamic geometry is that of Alexa et al. [3], who propose a Principal Component Analysis (PCA)-based compact animation representation scheme where each mesh in the animation sequence is projected on a basis of n PCA eigenvectors. The animation may be reconstructed using k eigenvectors, where k << n: the higher the k, the greater the level of detail. Another example of multiresolution encoding is the wavelet-based scheme of Guskov et al. [14], which exploits parametric coherence in mesh sequences using an anisotropic wavelet transform and progressively encodes wavelet details. Payan et al. [33] propose another wavelet-based multiresolution representation scheme based on a temporal lifting scheme that exploits the temporal redundancy in dynamic geometry. In [21], Karni et al. propose a compression scheme that employs a combination of Principal Component Analysis (PCA) and Linear Predictive Coding (LPC). Recently, localized PCA-based dynamic geometry coding techniques have yielded good compression performance. Sattler et al. [35] propose animation compression using Clustered PCA (CPCA), where the mesh is first segmented into meaningful components based on vertex motion analysis and PCA is applied on each of these components. This compression scheme outperforms both pure PCA-based and PCA+LPC approaches while achieving better animation reconstruction. Another Localized PCA Analysis (LPCA)-based compression scheme is proposed by Amjoun et al. [4]. Upon clustering the mesh using local similarity properties, a local coordinate system is defined for each cluster, with respect to which the cluster motion is encoded using PCA. LPCA coding achieves better compression performance compared to CPCA-based compression for a similar quality of reconstructed animation.
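As a hedged illustration of the pure PCA approach (the notation below is not taken verbatim from [3] or [21], and reuses the 3V x F animation matrix B that appears later in Eq. (13)):

\hat{B} = \bar{b}\,\mathbf{1}^{T} + E_k C_k, \qquad C_k = E_k^{T}\,(B - \bar{b}\,\mathbf{1}^{T})

where \bar{b} is the mean column of B and E_k is the 3V x k matrix of leading eigenvectors of the covariance of the mean-centred columns. Storing E_k and the k x F coefficient matrix C_k instead of B provides the compression, and increasing k trades bit-rate for reconstruction quality.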
2.2.4. Other Coding Algorithms
Varakliotis et al. propose animation encoding with RTP packetization in [43], and recommend the insertion of I frames in the encoded/transmitted mesh sequence to maintain animation smoothness. A Differential Pulse Code Modulation (DPCM)-based encoder is used to compress the animation, whose compression efficiency is low. The main contributions of this work include (i) an analysis of the trade-off between compression performance and reconstructed animation quality and (ii) the introduction of the Peak Mean Square Error (PMSE)-based distortion metric to tackle the degradation of animation smoothness under noise. MPEG-4 Part 25 [19] presents generic tools for dynamic 3D mesh compression using Bone-Based Animation (BBA), which involves decomposition of the geometric motions in the animation into elementary transformations, and Frame-based Animation Mesh Compression (FAMC), where the animation is divided into segments that can be decoded independently. A spatially and temporally scalable compression scheme for 3D animations using FAMC, where the original animation is reconstructed at multiple layers corresponding to different spatial resolutions, is proposed in [39]. Boulfani et al. [5] propose a 3D dynamic mesh compression scheme where geometry compensation is performed upon clustering the mesh using motion characteristics, followed by application of the scan-based wavelet transform.

2.2.5. Encoding 3D Dynamic Meshes with Changing Connectivity
Very few works deal with compression of animations with changing connectivity. Shamir et al. [36] suggest a multi-resolution representation scheme for animations with changing connectivity using the T-DAG data structure, which can be incrementally constructed even as the input mesh is processed. Gupta et al. [12] propose an Iterative Closest Point (ICP) registration-based geometry-cum-connectivity coding scheme for dynamic 3D MMs. As in [13], the current and previous frames are partitioned to generate sub-meshes and inter-mesh correspondences are computed using ICP to identify the added/deleted vertices over time. Subsequently, the errors in geometry and connectivity prediction are encoded and transmitted.
3. Vertex Clustering for Dynamic Geometry Coding
In this section, we investigate how vertex clustering, which involves grouping of mesh vertices for motion prediction, affects dynamic geometry compression. We first discuss the impact of vertex clustering on registration-based dynamic mesh coding, with ICP-based compression [13] as an example, and see how PCA coding approaches developed along similar lines [35, 4] have yielded superior compression performance for dynamic geometry animations in general.
Vertex clustering techniques group mesh vertices into a number of sets, where the grouping may be done on the basis of (a) mesh topology, (b) mesh geometry or (c) semantic mesh segmentation.
3.1. Overview of Vertex Clustering Techniques

3.1.1. Topology-Based Clustering
Topology-based clustering techniques partition the mesh based on vertex adjacency. They are more popularly known as "graph partitioning techniques" and many algorithms have been developed to solve the graph partitioning problem [16, 23]. The graph partitioning problem was originally formulated to distribute a compute-intensive problem over a number of parallel processors, where the objective was to share the workload equally among the processors while keeping the number of inter-processor communications small. In the case of a 3D triangular mesh, the aim is to divide it into equal-sized pieces with a minimum number of edges between the pieces. To this end, the multilevel k-way graph partitioning algorithm divides the mesh into k roughly equal partitions such that the number of cut edges connecting vertices in different partitions is minimized. In MPEG video compression [24], the image is divided into smaller, equal-sized blocks for efficient motion prediction. Likewise, decomposition of the mesh into equal-sized segments can be achieved using the topology-based multilevel k-way graph partitioning algorithm [16]. Given the mesh geometry V, the function of the graph partitioning algorithm is to divide the mesh, consisting of n vertices, into k subsets V_1, V_2, \ldots, V_k such that

V_i \cap V_j = \emptyset \quad \text{for } i \neq j    (1)

|V_i| = n/k    (2)

\bigcup_{i=1..k} V_i = V    (3)

where |V_i| denotes the cardinality of the i-th cluster and \bigcup V_i denotes the union of the k clusters. In this mode of clustering, vertices are clustered with their connected neighbors as given by the mesh connectivity, and no knowledge of mesh geometry is required. For a graph G(V, E) containing vertex set V and edge set E, the graph partitioning algorithm first coarsens the original graph G_0 = G(V_0, E_0) into a series of coarser graphs G_i = G(V_i, E_i), such that the number of vertices at G_i is approximately half the number of vertices at G_{i-1}, i.e., |V_i| \approx \frac{1}{2}|V_{i-1}|. The graph is coarsened by performing a series of edge contractions. A maximal set of edges, no two of which are incident on the same vertex, is first determined and these edges are contracted. This coarsening procedure maps each vertex in the fine graph G_{i-1} to a unique vertex in the coarse graph G_i and therefore graph topology is preserved. The coarsening terminates when the original graph has been coarsened to G_m = G(V_m, E_m), where |V_m| is typically a small number. The coarsest graph G_m is then segmented using a spectral partitioner [15] which uses eigenvectors of the graph Laplacian for partitioning. The partitions obtained for the coarsest graph are propagated back to the finer graphs by projecting the k partitions onto G_{m-1}, G_{m-2}, \ldots, G_0. The partitioning projected onto G_{i-1} is occasionally refined using local refinement heuristics based on the Kernighan-Lin (KL) algorithm [23]. Vertices are
incrementally swapped among the partitions to reduce the number of cut edges connecting vertices in different partitions. The mesh is finally divided, through recursive bisection, into k sets each containing about |V_0|/k vertices. The clusters generated by the k-way graph partitioning algorithm for various meshes are shown in Fig 3. Overall, multilevel k-way partitioning performs better than competing inertial or spectral bisection approaches [37] in terms of execution time and partition quality (based on the number of cut edges). However, the vertex clusters obtained by minimizing the number of cut edges are ineffective for determining the mesh motion. This is evident from Fig 3, where vertices belonging to distinct mesh regions are clustered together (parts of the nose, forehead and cheeks of the face; pelvis and thigh regions for the blade) while vertices corresponding to the same region fall in different clusters (the mouth region of the face and the claws of the chicken). Clearly, the clustered vertices do not undergo homogeneous motion. Also, when the same clusters are used for the entire sequence, detecting the coherent motion regions becomes difficult and, consequently, the compression performance is affected, as we will see in later sections where we analyze related experimental results.
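For illustration only, a hedged sketch of invoking such a multilevel k-way partitioner on the mesh vertex graph is given below. METIS is a widely available implementation of this family of algorithms; the METIS 5 C API used here is an assumption on our part, not something the chapter prescribes. The adjacency arrays xadj/adjncy are derived from the mesh connectivity in CSR form.

#include <metis.h>

int partition_mesh_vertices(idx_t nvtxs, idx_t *xadj, idx_t *adjncy,
                            idx_t nparts, idx_t *part /* out: cluster per vertex */)
{
    idx_t ncon = 1;     /* one balance constraint: roughly equal-sized clusters */
    idx_t objval;       /* number of cut edges of the final partitioning        */
    int status = METIS_PartGraphKway(&nvtxs, &ncon, xadj, adjncy,
                                     NULL, NULL, NULL,        /* no weights     */
                                     &nparts, NULL, NULL, NULL,
                                     &objval, part);
    return (status == METIS_OK) ? (int)objval : -1;
}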
Figure 3. Clusters generated by topology-based k-way partitioning algorithm for (a) Chicken - 32 partitions (b) Face - 8 partitions (c) Dinopet - 32 partitions and (d) Blade - 8 partitions.
3.1.2. Geometry-Based Clustering
Geometry-based clustering involves grouping of vertices based on their positional closeness and is independent of the mesh connectivity. The Lloyd’s k-means algorithm [29] is a popularly used geometry-based clustering technique. Given a set of n data points {xi } in d-dimensional space and the required number of clusters k, the problem is to determine a set of k centers {cj } such that the mean squared distance of each point to its nearest center, termed the average distortion D, is minimum. The algorithm works as follows. The initial k cluster centers are chosen at random and the data points {xi } are partitioned into k clusters by assigning each point to the cluster containing the closest ci . The set of data points to which ci is the nearest center is known
as the neighborhood of c_i and is denoted by V(c_i). Once the initial centers and their neighborhoods have been determined, the algorithm proceeds by moving the c_i's to the centroids of their clusters and recomputing V(c_i) for each of the c_i's. This process iterates until convergence is achieved or the mean distortion D reaches a local minimum. A summary of the Lloyd's algorithm is presented below.

• Step 1: Initialize {c_j} by selecting the c_j's at random.

• Step 2: Determine the neighborhood V(c_j) for each of the c_j's by assigning the x_i's to their closest center:

V(c_j) = \{ x_i : d(x_i, c_j) \le d(x_i, c_k) \ \text{for all } k \neq j \}    (4)
• Step 3: Move each of the c_j's to the centroid of V(c_j):

c_j = \frac{1}{|V(c_j)|} \sum_{x_i \in V(c_j)} x_i    (5)
• Step 4: Repeat Step 2 and Step 3 until the mean distortion D is minimum, i.e.,

D = \frac{1}{k} \sum_{j=1}^{k} \frac{1}{|V(c_j)|} \sum_{x_i \in V(c_j)} (x_i - c_j)^2 = D_{min}    (6)
Figure 4. Illustration of Lloyd's clustering for k=3. (a) The initial cluster centers in red, blue and green and their computed neighborhoods. (b) Centers are moved to the centroid of the cluster and the data points are re-assigned to the nearest centers. (c) Final clusters.

Fig 4 illustrates the working of the Lloyd's algorithm. In the context of mesh partitioning, the clusters themselves are more important than the cluster centers. For k-means clustering, it can be proved that the local minimum distortion corresponds to a "centroidal Voronoi" configuration [20], where each data point is closer to its own cluster center than to any other cluster center. The partitions move closer to this configuration at every step until convergence, and the final clusters correspond to a local energy minimum, even when the initial centers are badly chosen. However, slightly different initial
partitionings do not produce the same set of clusters. Also, while the final partitioning is definitely better than the initial partitioning, it need not correspond to the global minimum. Nevertheless, this is not a significant problem since data repartitioning may be performed later, as explained in the next section. Since vertex clustering is independent of the mesh connectivity, vertex neighbors have to be computed explicitly. For 3D meshes, computing nearest neighbors is not a trivial problem. The Lloyd's implementation in [20] computes nearest neighbors using a kd-tree (k-dimensional tree) data structure. Also, since the data points do not change throughout the cluster computation process, the kd-tree needs to be computed only once. The clusters at every step are determined by computing the nearest center for each of the tree nodes. Since clusters are determined based on the vertex positions, the cluster configurations will vary for different meshes in the animation sequence (Fig 5). Clustering based on vertex proximity produces better quality partitions whose vertices are more likely to undergo homogeneous motion. The cluster sizes are variable and the mesh can be segmented into an arbitrary number of clusters. A noticeable improvement in compression performance is observed when geometry-based clustering is employed, especially for high-motion sequences, as noted in later sections. However, since the vertex clusters do not correspond to the distinctive mesh components, the general performance of geometry-based clustering is inferior to that of semantic mesh decomposition for dynamic mesh coding.
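A minimal, hedged sketch of the Lloyd iteration summarized above, applied directly to mesh vertex positions, is shown below. A brute-force nearest-centre search replaces the kd-tree of [20] to keep the example short, and the initial centres (Step 1) are assumed to be chosen by the caller.

#include <stdlib.h>
#include <float.h>

typedef struct { double x, y, z; } Pt;

static double sqdist(Pt a, Pt b)
{
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

void lloyd_cluster(const Pt *v, int n, Pt *c, int k, int *label, int max_iter)
{
    for (int it = 0; it < max_iter; it++) {
        /* Step 2: assign every vertex to its nearest centre */
        for (int i = 0; i < n; i++) {
            double best = DBL_MAX;
            for (int j = 0; j < k; j++) {
                double d = sqdist(v[i], c[j]);
                if (d < best) { best = d; label[i] = j; }
            }
        }
        /* Step 3: move each centre to the centroid of its cluster */
        Pt  *sum = calloc((size_t)k, sizeof *sum);
        int *cnt = calloc((size_t)k, sizeof *cnt);
        for (int i = 0; i < n; i++) {
            sum[label[i]].x += v[i].x;
            sum[label[i]].y += v[i].y;
            sum[label[i]].z += v[i].z;
            cnt[label[i]]++;
        }
        for (int j = 0; j < k; j++)
            if (cnt[j] > 0) {
                c[j].x = sum[j].x / cnt[j];
                c[j].y = sum[j].y / cnt[j];
                c[j].z = sum[j].z / cnt[j];
            }
        free(sum);
        free(cnt);
        /* Step 4, simplified: stop after a fixed number of iterations; a full
         * implementation would stop when the distortion D of Eq. (6) settles. */
    }
}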
Figure 5. Segmentation using Lloyd’s clustering for frames (a) 70 and (b) 120 of the Chicken animation (maximum cluster size = 100); frames (c) 0 and (d) 505 of the Face animation (maximum cluster size = 75).
3.1.3. Spectral-Based 3D Mesh Segmentation
A third set of techniques perform Semantic Mesh Decomposition to segment the mesh into meaningful components. These techniques exploit both topology and geometry features to generate components that represent distinctive features of the 3D polygonal mesh. Unlike in geometry or topology-based clustering, where the number of clusters is user specified, most of the semantic mesh decomposition algorithms [28, 22, 27] automatically determine the number of vertex clusters based on homogeneity of the mesh regions. The mesh components represent the distinctive regions of the object consistent with human perception, which defines boundaries along concavities of the surface. They can be used to establish
shape correspondence, and in most cases, also correspond to regions capable of undergoing independent motion (Fig 6). Liu and Zhang proposed a spectral clustering approach to mesh decomposition in [28], where, in order to segment a 3D mesh with n faces along the edges, the n×n affinity matrix W is initially constructed for the dual of the mesh graph to group faces closer to each other. Each vertex in the dual graph corresponds to a mesh face and two vertices are connected if and only if the corresponding mesh faces are adjacent to each other. For grouping of faces, the pairwise face distance measure used in [22] is used to define the affinity matrix. The distance measure between mesh faces fi and fj is defined as the shortest path between their dual vertices given by
Dist(i,j) = weight(dual(f_i), dual(f_j)) = \delta\,\frac{Geod(f_i, f_j)}{avg(Geod)} + (1-\delta)\,\frac{Ang\_Dist(\alpha_{ij})}{avg(Ang\_Dist)}    (7)

Here, Geod(f_i, f_j) is the geodesic distance between f_i and f_j, while the angular distance is defined as

Ang\_Dist(\alpha_{ij}) = \eta\,(1 - \cos\alpha_{ij})    (8)

where \alpha_{ij} is the angle between the normals of adjacent faces f_i and f_j. Since the angular distance plays a more important role for visually meaningful segmentation, \delta is set to a value close to zero. Also, a smaller value of \eta favors concavities and therefore it is set in the range 0.1 \le \eta \le 0.2. On obtaining the pairwise face distances, the affinity matrix is defined by the Gaussian kernel

W(i,j) = e^{-Dist(i,j)/(2\sigma^2)}    (9)

It can be easily seen that 0 < W(i,j) < 1 and that W takes larger values for faces closer to each other. A suitable value for the width of the Gaussian, \sigma, is empirically set to \frac{1}{n^2}\sum_{1 \le i,j \le n} Dist(i,j). W(i,j) encodes the likelihood of faces i and j belonging to the same patch. The normalization of the affinity matrix is performed as N = D^{-1/2} W D^{-1/2}, where D is the diagonal matrix whose i-th diagonal element is the sum of the i-th row of W (the vertex degree at node i). N possesses desirable properties in the context of spectral clustering [44] and N_{ij} = W_{ij}/\sqrt{D_{ii} D_{jj}}. Let V be the n x k matrix formed using the k leading eigenvectors of N. Then the n x n matrix Q = V V^T represents the most energy-preserving projection of N to rank k. Normalizing the rows of V to unit length gives \hat{V}, whose rows \hat{v}_1 \ldots \hat{v}_n (of dimension k) represent the embedding of the W_{ij}'s onto the k-dimensional unit sphere centered at the origin. \hat{Q} = \hat{V}\hat{V}^T is known as the association matrix, whose elements \hat{Q}_{ij} = \hat{v}_i \hat{v}_j^T = \cos\theta_{ij} are the cosines of the angles between the unit vectors \hat{v}_i and \hat{v}_j. As N is projected to successively lower rank k, the sum of squared angle cosines \sum_{i,j}(\cos\theta_{ij})^2 is strictly increasing [6]. Point pairs likely to be clustered together will move towards each other as k decreases, while other pairs will move further apart. Therefore, clustering points in the k-dimensional space is easier than clustering the original data and is accomplished by performing k-means clustering on the rows of \hat{V}.
The spectral clustering algorithm performs a semantic decomposition of the mesh, with the generated components corresponding to the salient object features capable of undergoing independent motion, as shown in Fig 6. The algorithm tends to segment the mesh in a hierarchical fashion on varying the number of eigenvectors chosen for V. The computations of the shortest-path face distances and of the affinity matrix W are of complexity O(n^2 log n) and O(n^2) respectively, but this computation time is greatly reduced in the implementation described in [47]. Also, a recursive 2-way spectral cut procedure used in [47] overcomes the problem of choosing the optimal k for clustering and produces better quality partitions. Since the components generated upon mesh decomposition correspond to the salient features of the object, the same set of vertex clusters can be used for performing motion estimation for the entire animation sequence. As the components can describe the piecewise affine mesh motion effectively, it is possible to encode the animation more efficiently. Experimental results confirm the superior compression performance obtained through semantic mesh decomposition compared to k-way partitioning or Lloyd's clustering for dynamic geometry compression.
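For illustration, a minimal, hedged sketch of assembling the affinity matrix of Eqs. (7)-(9) from precomputed pairwise geodesic and angular face distances is given below. How those distances are obtained (e.g. shortest paths on the dual graph) and the subsequent eigen-decomposition are left out; the all-pairs averaging used for avg(Geod), avg(Ang_Dist) and sigma is an assumption.

#include <math.h>
#include <stdlib.h>

void build_affinity(const double *geod, const double *angd, int n,
                    double delta, double *W /* n*n output */)
{
    double avg_g = 0.0, avg_a = 0.0;
    for (int i = 0; i < n * n; i++) { avg_g += geod[i]; avg_a += angd[i]; }
    avg_g /= (double)(n * n);
    avg_a /= (double)(n * n);

    /* combined distance, Eq. (7); angd already holds eta*(1 - cos(alpha)) */
    double *dist  = malloc((size_t)(n * n) * sizeof *dist);
    double  sigma = 0.0;
    for (int i = 0; i < n * n; i++) {
        dist[i] = delta * geod[i] / avg_g + (1.0 - delta) * angd[i] / avg_a;
        sigma  += dist[i];
    }
    sigma /= (double)(n * n);          /* width of the Gaussian, as in the text */

    /* Gaussian kernel, Eq. (9) */
    for (int i = 0; i < n * n; i++)
        W[i] = exp(-dist[i] / (2.0 * sigma * sigma));

    free(dist);
}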
Figure 6. Segmentation of (a) Chicken (59 components), (b) Face (6 components), (c) Dolphin (7 components) and (d) Dinopet (29 components) meshes through spectral clustering. A number of mesh segments (e.g., the fins of the dolphin, the limbs of the dinosaur) can undergo independent motion.
3.1.4. Analysis of Registration-Based Coding Algorithms
A majority of the registration-based dynamic geometry compression algorithms use topology-based clustering for segmenting meshes. Lengyel's [25] algorithm uses a greedy vertex clustering approach based on the triangulation of the original mesh. Prediction-based geometry compression algorithms [46, 38] define vertex neighborhoods for prediction based on mesh connectivity. Ahn et al. [1] segment the mesh by converting the triangular mesh structure into a linear triangle strip form. The triangle strip is divided into blocks such that each block has the same number of vertices. Gupta et al. [13] use the multilevel k-way graph partitioning technique [16] that generates clusters of approximately equal sizes. Since the
mesh connectivity remains constant for dynamic geometry, topology-based clustering needs to be performed only for the first mesh in the sequence (I mesh) and the clusters remain fixed thereafter for the entire sequence. We find that mesh segmentation based on the fixed mesh topology is unsuitable for compressing mesh sequences with changing mesh geometry. Efficient detection of mesh pieces that have moved over time is possible only when the components generated upon clustering roughly represent these pieces. When the mesh undergoes arbitrary deformation, the vertex clusters undergoing coherent motion will be different at different times. Clearly, it is impossible for a given set of clusters generated using topology-based partitioning to represent the coherent motion regions at all times. Clustering the mesh on the basis of positional proximity instead of graph adjacency is more suited for encoding dynamic geometry sequences. Alternatively, a fixed set of clusters will effectively describe the piecewise affine mesh motion in animations only if they correspond to the distinctive mesh components that can undergo independent motion. Our experimental results (Section 3.7) confirm that geometry-based clustering and semantic mesh decomposition techniques produce better compression performance than topology-based clustering. The next section briefly discusses ICP-based dynamic mesh coding [13] and related performance metrics, while the following sections demonstrate how effective vertex clustering improves the compression performance of registration-based and PCA-based dynamic mesh coding algorithms. Finally, we conclude with observations that will require critical consideration during the development of an efficient 3D animation coding standard.
3.2. ICP-based 3D Dynamic Geometry Compression
The vertex clustering algorithms described in the previous section simplify the problem of determining the regions of coherent motion by segmenting the mesh into pieces that can possibly undergo affine motion. In order to determine the actual regions that have moved, motion prediction needs to be performed. For 3D dynamic geometry, the inter-mesh motion is typically small. The mesh motion can be completely described using a few affine transformations and residual errors, and this compact representation leads to compression. An efficient dynamic geometry compression algorithm that performs systematic motion estimation is described in [13]. The algorithm uses the multilevel k-way graph partitioning algorithm to initially segment the mesh into pieces, and for each piece in the current mesh, the corresponding piece in the temporal reference is detected using the Iterative Closest Point (ICP) algorithm. Using the results of ICP-based registration, the motion segmentation module segments the mesh vertices into three distinct sets based on their motion characteristics.

• First set (Type 1) - consisting of clusters of vertices such that the motion of each cluster may be described accurately using the associated affine transform A_i. The reconstruction error for the vertices falling in this set is less than a threshold τ.
• Second set (Type 2) - consisting of clusters of vertices such that each cluster has an affine transformation matrix A_i associated with it. In addition, residual errors also need to be encoded for accurately representing the vertex positions. The reconstruction error for the vertices falling in this set using the affine transform alone is less than 20τ.
• Third set (Type 3) - consisting of vertices whose motion cannot be described effectively using affine transforms. These are encoded using DPCM-based techniques.
The ability of ICP-based motion segmentation to divide the vertices in the mesh into distinct sets helps achieve efficient compression. This is because the motion of Type 1 and Type 2 vertices, which constitute over 70% of the total, can be described using a few affine transformations and residual errors. Also, the residual errors can be adaptively coded using a variable number of bits for different groups of vertices in order to maintain the animation smoothness. ICP-based compression produces two types of meshes: I and P. I (Intra) meshes are encoded using static mesh compression techniques and complete information can be obtained by decoding the I mesh without any reference. P (Predicted) meshes contain only the differences from the temporally previous I or P mesh and chiefly contribute to compression. The difference needs to be added to the reference mesh data in order to obtain the complete P mesh information. Although dynamic geometry compression is inherently lossy, it is still acceptable if the reconstructed animation is 'perceptually lossless', as there is a trade-off between compression and quality. For P meshes, the following information needs to be encoded:

• Affine matrices and the vertices associated with each affine transform matrix.
• Error values associated with the vertex positions for Type 2 and Type 3 vertices.

To compactly encode the above information, the following procedure is used:

• Vertices whose motion can be described using affine transforms are given the symbol P. Every P vertex is associated with a patch index to denote the associated affine transform. When the patch index of the vertex hasn't changed from the reference, a symbol N is used to signify no change and the previous patch index is used.
• Type 2 vertices are represented using the symbol E to denote error information.
• Type 3 vertices encoded using DPCM techniques are assigned the symbol D.
• As the affine transforms associated with the vertex clusters exhibit a high spatio-temporal correlation, the differences between the affine matrices are quantized and encoded.
• The number of bits used to encode error data for Type 2 and Type 3 vertices is determined by the PSNR. More bits are added to encode the error for vertices whose PSNR is below a certain threshold.
• Vertex symbols, affine matrices and residual errors are encoded using Arithmetic coding.

The process of reconstructing mesh geometry from P meshes is outlined in Fig 7. When there is large inter-mesh motion, the same vertex may be associated with a number of affine transforms and many vertices may require error information to be encoded, which results
in considerably lower compression ratios. Therefore, such meshes are encoded as I meshes for which only the spatial coherence in mesh geometry is exploited. The Edgebreaker algorithm [34] is used for encoding I meshes. Insertion of I meshes helps maintain animation smoothness with a marginal reduction in compression. Also, periodic transmission of I meshes is necessary while transmitting data over noisy channels and for enabling random access to the animation sequence.
Figure 7. Reconstruction of mesh geometry from P meshes.
3.2.1. Performance Metrics

The Signal to Noise Ratio (SNR) is extensively used for evaluating the performance of dynamic geometry coding algorithms. The SNR and PSNR (Peak Signal to Noise Ratio) are considered objective measures for comparing the quality of the compressed data against the original data. The following definition of the Peak Signal to Noise Ratio (PSNR) [43] is used for evaluating the reconstruction quality of the encoding scheme:

PSNR = -10 \log_{10} PMSE    (10)
where PMSE is the Peak Mean Square Error per vertex, given by

PMSE = \frac{1}{N_n} \sum_{j=1}^{n} \frac{1}{3} \sum_{i=x,y,z} \frac{(v_{ji}(t) - \bar{v}_{ji}(t))^2}{R^2}    (11)
where R is the maximum inter-mesh displacement for the entire animation sequence, Nn and n denote the number of vertices that have moved between two consecutive meshes and the total number of mesh vertices respectively. The PSNR provides a quantitative measure of the animation smoothness. Another performance metric used for comparing various compression algorithms is the compression ratio which is defined as the size of the original data to the encoded data. The
100
S. Ramanathan and A.A. Kassim
per-frame compression ratio (CR) is calculated as follows: CR =
(Bits for raw data) (Encoded vertex data bits + Affine transform bits + Error bits)
(12)
A number of encoding schemes also use Distortion Factor da (also termed KGerror ), to evaluate reconstruction quality and is defined as da = 100
ˆ kB − Bk kB − C(B)k
(13)
where B is a 3V × F matrix representing the geometry of the V vertices in the F frames ˆ represents the reconstructed animation geometry and C(B) of the original animation, B contains the average vertex positions for the animation. Likewise, the compression performance is alternatively measured using encoded Bits per Vertex per Frame, (bpvf ), which is related to the compression ratio as given by the following equation.
bpvf =
3.3.
Bits for encoding each vertex Raw data bits (total) = = Compressed bits (total) F ( ) FV
PF
96
i=1
CRi
(14)
F
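The metrics above translate directly into a few lines of code. The sketch below computes PMSE/PSNR following Eqs. (10)–(11) and the bits-per-vertex-per-frame figure of Eq. (14); the array shapes and argument names are assumptions made for illustration.

```python
import numpy as np

def pmse(orig, recon, n_moved, R):
    """Peak Mean Square Error per vertex, Eq. (11).
    orig, recon : (n, 3) original and reconstructed vertex positions of one mesh,
    n_moved     : N_n, number of vertices that moved between consecutive meshes,
    R           : maximum inter-mesh displacement over the whole sequence."""
    per_vertex = ((orig - recon) ** 2).mean(axis=1)          # (1/3) * sum over x, y, z
    return per_vertex.sum() / (max(n_moved, 1) * R ** 2)

def psnr(orig, recon, n_moved, R):
    return -10.0 * np.log10(pmse(orig, recon, n_moved, R))   # Eq. (10)

def bpvf(total_compressed_bits, num_frames, num_vertices):
    return total_compressed_bits / (num_frames * num_vertices)  # Eq. (14)
```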
3.3. Impact of Vertex Clustering on Compression Performance
In this section, we study the impact of vertex clustering on dynamic geometry compression by comparing the aforementioned performance metrics for the clustering schemes described previously.

3.3.1. Test Animations
Four animation sequences, namely Chicken², Face³, Cow and Dance, were used for comparing the clustering schemes. The Chicken animation contains 400 frames, with each mesh in the sequence consisting of 3029 vertices and 5664 triangles. The animation is highly non-linear, with the motion becoming extremely rapid after frame 260. The Face sequence contains a realistic animation of a talking human face in various poses and exhibiting various facial expressions as well. There are 952 frames in the sequence, with 757 vertices and 1468 triangles per mesh. The Cow (2904 vertices, 5804 triangles, 204 frames) animation is also a high-motion sequence, while the Dance sequence (7061 vertices, 14118 triangles, 201 frames) depicts a person performing various dance movements. Due to the similar nature of the motion throughout the Dance animation, only the first 100 meshes were used in the experiments.

3.3.2. Experimental Results
ICP-based dynamic geometry compression [13] using k-way partitioning, with about 100 vertices per cluster, achieves an average compression of 45 for the Chicken animation. It can
² The Chicken, created by Andrew Glassner, Tom McClure, Scott Benza, and Mark Van Langeveld for Microsoft Corporation (1996).
³ The Face sequence is the property of Visage Technologies.
be observed that the Lloyd’s clustering (Fig 4) and spectral decomposition (Fig 6) schemes are able to segment the chicken’s neck from its torso more effectively compared to k-way topology partitioning (Fig 3). The motion in frames 40-60 and 200-230 is mainly localized around the neck of the chicken as seen in Fig 8.
Figure 8. Frames 48, 56, 203 and 221 of the Chicken animation.

For P frames, the per-frame compression is directly proportional to the number of Type 1 and Type 2 vertices and inversely proportional to the number of Type 3 vertices. The number of Type 1 vertices, in turn, depends on how well the affine transforms computed using ICP-based registration can represent the piecewise motion of the mesh. There is a direct relationship between the number of Type 1 vertices registered using ICP and the accuracy with which the initial clusters input to ICP can represent the independent mesh regions. To illustrate this point, frame sequences 40-60 and 200-230 were encoded using Lloyd's clustering (100 vertices per cluster) and spectral mesh decomposition (59 mesh components). Table 1 shows that the number of Type 1 vertices registered using ICP is much higher when Lloyd's and spectral clustering are used instead of k-way graph partitioning for both frame sequences. Better clustering produces more Type 1 vertices and, hence, better compression performance. For frames 40-60, all mesh vertices are encoded as either Type 1 or Type 2. However, for frames 200-230, affine transforms can effectively describe the motion of only a few mesh vertices, and therefore there are many Type 3 vertices. For this frame sequence, while the numbers of Type 1 vertices registered using Lloyd's and spectral clustering are about the same, the larger number of Type 2 vertices generated using spectral clustering produces higher compression. The latter part of the Chicken animation (frames 261-399) is characterized by extensive motion. In these frames, reconstruction errors are associated with a large number of vertices and the reconstructed animation is noisy. Additional bits need to be encoded to improve animation quality at the expense of compression performance for these frames, as seen from Fig 9. PSNR calculations are used to measure and improve the animation smoothness for this set of frames. For each frame in the animation, we measure the PSNR for Type 1, Type 2 and Type 3 vertices and for the entire reconstructed frame for analysis. For ensuring smooth animation reconstruction, the overall minimum PSNR varies for different sequences and depends on the nature of the mesh motion. A minimum PSNR of 35 dB is required for the Chicken sequence (R = 0.7662), while a PSNR threshold of 20 dB is sufficient for the low
motion Face sequence (R = 0.0673). The frame PSNR can be improved by (i) allocating extra bits for encoding Type 2 and Type 3 vertices, (ii) using a smaller error threshold τ to register Type 1 vertices, and (iii) transmission of I meshes. Examples of (i) and (ii) are shown in Fig 9. For frames 261-399, Lloyd's clustering performs better than spectral clustering and provides the best compression performance (Table 1). This is because the motion in these frames is concentrated around the chicken's wings, which are not well segmented by spectral clustering. The rapidness of the motion also necessitates that a number of meshes in the animation be coded as I meshes, which correspond to the minima in the compression curve. The performance of various dynamic geometry coding schemes for the Chicken animation is presented in Table 2. Clearly, the use of Lloyd's clustering over k-way partitioning improves the performance of ICP-based compression by 3.8% under similar distortion (da).

Table 1. Impact of vertex clustering on compression performance for frame sequences (a) 40-60, (b) 200-230 and (c) 261-399 of the Chicken animation. The table contains the mean values of Type 1 count, Type 2 count, CR and PSNR for the frame sequence under consideration. The number of Type 1 vertices and compression ratios increase with improvement in quality of input clusters.

Frame nos.   Clust. Algo.   Type 1 count   Type 2 count   CR     PSNR
40-60        k-way          733            2296           52     66.1
40-60        Lloyd's        746            2283           52.2   66.5
40-60        Spectral       839.6          2189.4         52.8   67
200-230      k-way          470            1787           45.6   63.7
200-230      Lloyd's        604            1632           47.2   63.9
200-230      Spectral       603            1925           48.3   64.6
261-399      k-way          264            1406           38.3   51.1
261-399      Lloyd's        357            1299           39.9   50.8
261-399      Spectral       296            1407           39.1   51.1
For the Face sequence, the inter-frame motion is very low and, for many frames, very few vertices are registered with error threshold τ = ν/4, where ν is the average inter-frame motion. The compression results for the different clustering schemes for τ = ν/4, ν/3 and ν/2 are shown in Table 3. Clearly, spectral clustering produces higher compression performance than Lloyd's or k-way partitioning. The ability of the spectral clustering algorithm to accurately segment the various face regions (Fig 6) enables the ICP module to register a maximum number of Type 1 vertices. This leads to a major improvement in the compression performance even when the encoded mesh is small in size. The compression obtained using spectral clustering is 9.7%, 15% and 7.8% higher than that of k-way partitioning for τ equal to ν/4, ν/3 and ν/2 respectively. However, the compression obtained using Lloyd's and k-way partitioning is very similar even though Lloyd's clustering produces better quality
Figure 9. (a) Reconstructed Chicken frame 286 with (i) PSNR = 25.7 dB (CR = 46.6) and (ii) PSNR = 43.2 dB using τ = ν/11 and 6 bits for error encoding (CR = 46.3). (b) Reconstructed frame 320 with (i) PSNR = 31.8 dB (CR = 49.5) and (ii) PSNR = 39.7 dB using τ = ν/7 and 5 bits for error encoding (CR = 49.3).
Figure 10. Frames of the Cow and Dance animations partitioned using (a) k-way (b) Lloyd’s and (c) Spectral clustering.
Table 2. Performance of various dynamic geometry compression algorithms for the Chicken animation (uncompressed file size = 13.9 MB). The table contains mean values of CR and da (wherever available).

Compression algorithm                                   CR     da
Motion compensated compression [1]                      10     -
Motion vector prediction-based compression [46]         18.3   -
Time-dependent geometry compression [25]                27     -
PCA Representation [3]                                  39.8   -
Connectivity-guided predictive compression [38]         33     0.13
Clustered PCA Analysis [35]                             34.3   0.076
Partitioning-based compression [13]                     45.3   0.11
[13] with spectral                                      46.6   0.12
[13] with Lloyd's                                       46.7   0.12
Local PCA Analysis [4]                                  64     0.057
clusters. This is possibly because a marginal improvement in cluster quality does not significantly improve the number of Type 1 vertices for the low-motion, small-sized Face mesh sequence. A PSNR threshold of 20 dB is sufficient to smoothly reconstruct the animation.

Table 3. Compression performance of the different clustering schemes at various values of τ for the Face animation. The table contains mean values of CR and PSNR.

τ      Clust. algo.   Type 1 count   CR     PSNR
ν/4    k-way          182            41.9   44.3
ν/4    Lloyd's        165            41.2   44.5
ν/4    Spectral       268            45.9   43.8
ν/3    k-way          372            53.3   39.9
ν/3    Lloyd's        367            52.9   40
ν/3    Spectral       526.8          61.3   39.2
ν/2    k-way          670            69.5   34.4
ν/2    Lloyd's        675            69.2   33.9
ν/2    Spectral       710            74.9   34.9
Some partitioned frames of the Cow and the Dance animations are shown in Fig 10.
A minimum PSNR of 35 dB is required to smoothly reconstruct the animation for both sequences. It is evident from the figure that k-way and Lloyd's clustering produce more mesh clusters compared to spectral clustering. As vertex clustering is performed solely based on proximity for k-way and Lloyd's clustering, the cluster sizes affect the compression and SNR performance, as observed in [13]. While ICP works well on small-sized clusters, a large number of mesh clusters is associated with increased processing time and reduced compression performance (as more affines need to be encoded). Also, large-sized clusters produce registration errors and, consequently, a degradation in compression and SNR performance. We observe that the best compression and SNR performance is achieved for cluster sizes of 100 and 125 for k-way and Lloyd's clustering respectively.

Table 4. Compression performance of the different clustering schemes for the Cow and Dance animations. Mean values of CR and PSNR are listed in the table.

Anim.   Clust. algo.   Type 1 count   Type 2 count   CR     (% incr.)   PSNR
Cow     k-way          351            1386           40.3   -           43
Cow     Spectral       332.3          1597           41.7   3.5         44.7
Cow     Lloyd's        368            1581           42     4.2         42.5
Dance   k-way          312            4038           41.4   -           42.2
Dance   Lloyd's        414            3993           41.9   1.2         42.9
Dance   Spectral       740            4765           45.8   10.6        41.5
The Cow animation sequence is characterized by high motion, and I meshes need to be encoded frequently after frame 100. Lloyd's clustering enables the most efficient encoding of the mesh motion, as shown in Table 4. The number of Type 1 vertices is lowest for spectral clustering, which nevertheless outperforms k-way partitioning as there are more registered Type 2 vertices. The poor performance of spectral compression for the Cow and the latter part of the Chicken animations underlines the limitations of semantic mesh decomposition, which is based purely on the intrinsic geometric structure of the mesh. While it is difficult to achieve accurate segmentation of the mesh into distinctive components, efficient segmentation of coherent motion regions can only be performed by exploiting the motion cues available from the animation. Overall, about a 4% improvement in compression performance is obtained when Lloyd's clustering is used instead of graph partitioning. On the other hand, spectral clustering performs exceedingly well for the Dance animation. The increased number of ICP-registered Type 1 vertices using spectral clustering improves compression performance by over 10% for the Dance animation, as shown in Table 4.
3.4. Comparison with PCA-Based Algorithms
As seen from the experimental results, the clustering scheme employed to segment the mesh for motion prediction greatly affects compression performance.
Figure 11. (a) Clusters generated for the Chicken and Cow in [35]. (b) Vertex clusters for the Chicken, Cow and Dance meshes in [4]. Figures adapted from [35, 4].

The k-way partitioning, Lloyd's clustering and spectral mesh decomposition are static mesh segmentation techniques that segment the mesh to be encoded into smaller pieces without any temporal considerations. For encoding motion in dynamic mesh sequences, temporal cues can also be used to group vertices likely to undergo similar motion. Motion-based segmentation can facilitate identification of those "pieces" that cannot be easily detected using static mesh decomposition. For example, for the human figure in the Dance sequence, segmentation of the arms and limbs is achieved by spectral clustering (Fig 10). Motion cues can be used to achieve further segmentation around articulated joints like the elbow and knee, and clearly the segmented parts will then correspond better to the coherent motion regions. Next, we look at two recent algorithms that segment the mesh based on motion characteristics, to reinforce the idea that meaningful mesh segmentation greatly impacts the performance of 3D dynamic mesh coding algorithms. Two approaches that perform motion-based clustering to efficiently represent motion have been found to achieve high compression performance. Sattler et al. [35] propose the clustered PCA (CPCA) approach to dynamic geometry compression, which can identify the mesh parts undergoing coherent motion over time. The vertex trajectories are clustered using Lloyd's clustering [29] in combination with PCA to segment the coherent mesh parts. Each mesh part is then compressed using PCA over the complete animation, as performed in [3]. This method results in higher compression than standard PCA and PCA+LPC approaches while producing less distortion. Also, Amjoun et al. [4] propose local PCA-based compression, where the mesh is segmented into clusters based on local motion characteristics and a local coordinate system is defined for each cluster, with respect to which the cluster motion is encoded. Table 5 compares the performance of ICP-based compression using spectral clustering with CPCA and LPCA-based compression for similar distortion. From the table, it is evident that the bpvf values for ICP-based compression using spectral decomposition are much lower than those for clustered PCA for comparable values of distortion. While LPCA-based compression performs better than CPCA or ICP-based coding for the Chicken sequence, ICP coding with spectral clustering achieves maximum compression for the Cow animation. This could be attributed to the inadequate segmentation achieved by the pure motion-based clustering schemes in [35, 4], as shown in Fig. 11. While motion-based clustering can produce meaningful segmentation of the mesh into components, e.g., wings and legs of the Chicken, more coherent mesh segments are obtained for
Table 5. Comparison of CPCA and LPCA-based compression with ICP-based compression using spectral clustering for the Chicken and Cow animations.

                 CPCA            LPCA            ICP
Animation    bpvf    da      bpvf    da      bpvf    da
Chicken      4.7     0.076   3.5     0.008   2.16    0.12
Chicken      2.8     0.139   1.5     0.057   1.72    0.26
Cow          7.4     0.16    6.8     0.128   2.9     0.33
Cow          3.8     0.50    4.1     0.47    2.3     0.47
Cow          2.0     7.4     2.2     1.22    1.9     0.9
the Cow using spectral decomposition (Fig 10) compared to pure motion-based clustering. Therefore, an ideal vertex clustering scheme for 3D dynamic mesh coding needs to exploit both structural and motion-based cues for maximal compression performance.
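To make the notion of motion-based vertex clustering concrete, the sketch below groups vertices by the similarity of their trajectories using a basic Lloyd's/k-means iteration, in the spirit of the trajectory clustering used in [35]. The feature construction, initialization and cluster count are illustrative assumptions rather than the exact procedures of [35] or [4].

```python
import numpy as np

def cluster_vertex_trajectories(positions, num_clusters, num_iters=50, seed=0):
    """positions: (F, V, 3) vertex positions over F frames.
    Returns one cluster label per vertex based on trajectory similarity."""
    F, V, _ = positions.shape
    # Each vertex is described by its full trajectory, centred to remove
    # the dependence on absolute position (an illustrative choice).
    traj = positions.transpose(1, 0, 2).reshape(V, F * 3)
    traj = traj - traj.mean(axis=1, keepdims=True)
    rng = np.random.default_rng(seed)
    centres = traj[rng.choice(V, num_clusters, replace=False)].copy()
    for _ in range(num_iters):
        # Squared distance from every trajectory to every cluster centre.
        d = np.stack([((traj - c) ** 2).sum(axis=1) for c in centres], axis=1)
        labels = d.argmin(axis=1)
        for k in range(num_clusters):
            members = traj[labels == k]
            if len(members):
                centres[k] = members.mean(axis=0)
    return labels
```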
4. Conclusion

3D soft-body animation compression is non-trivial, as non-planar 3D mesh deformations cannot be described using well-known video motion prediction techniques. A number of techniques have been proposed for efficiently coding 3D animations and, in particular, 3D dynamic geometry. These 3D dynamic mesh coding algorithms can be divided into three major classes: registration-based, prediction-based and PCA-based multiresolution representation. Efficient vertex clustering for motion prediction enhances the compression performance achieved using registration- and PCA-based coding. Our experimental results clearly demonstrate that meaningful mesh segmentation enhances the compression performance of dynamic mesh coding algorithms. Using spectral mesh decomposition, which segments the mesh based on structural cues, the performance of registration-based dynamic geometry coding improves by as much as 10% (for the Dance animation), while CPCA and LPCA-based compression algorithms, which use motion cues for vertex clustering, also perform significantly better compared to other PCA-based coding schemes. Nevertheless, vertex clustering exclusively using structural or motion cues does not produce the best clusters for motion prediction: Lloyd's clustering outperforms spectral decomposition for the high-motion Chicken animation, while clusters generated using motion cues are inadequate for the Cow sequence. A hierarchical mesh segmentation scheme that initially segments the mesh based on structural cues, followed by generation of finer vertex clusters through motion analysis, appears to be best suited for dynamic geometry coding. Detecting motion-coherent vertex clusters could be the key to solving the animation compression problem. Recently proposed bone-based animation [19], which involves detection of the elementary transformations that constitute mesh motion, offers an exciting prospect in this regard.
References

[1] J. H. Ahn, C. S. Kim, C. C. Kuo, and Y. S. Ho. Motion compensated compression of 3d animation models. Electronics Letters, 37(24):1445–1446, 2001.
[2] A. Khodakovsky, P. Schroder, and W. Sweldens. Progressive geometry compression. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques, pages 271–278, 2000.
[3] M. Alexa and W. Muller. Representing animations by principal components. In EUROGRAPHICS, volume 19(3), pages 411–418, 2000.
[4] Rachida Amjoun and Wolfgang Straßer. Efficient compression of 3d dynamic mesh sequences. Journal of the WSCG, 2007.
[5] Y. Boulfani, F. Payan, and M. Antonini. Temporal wavelet-based compression of 3d animated meshes using motion-based clustering. In Proceedings of the Workshop TAIMA'07. Tunisia, May 2007.
[6] M. Brand and K. Huang. A unifying theorem for spectral embedding and clustering. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003.
[7] M. Chow. Geometry compression for real-time graphics. In Proceedings of Visualization'97, 1997.
[8] D. Cohen-Or, O. Remez, and D. Levin. Progressive compression of arbitrary triangular meshes. In Proceedings of Visualization '99, pages 67–72, 1999.
[9] G. Debunne, M. Desbrun, M.P. Cani, and A. H. Barr. Dynamic real-time deformations using space and time adaptive sampling. In SIGGRAPH 2001, Computer Graphics Proceedings, pages 31–36, 2001.
[10] M. Deering. Geometry compression. In Proceedings of SIGGRAPH '95, pages 13–20, 1995.
[11] S. Gumhold and W. Strasser. Real time compression of triangle mesh connectivity. In Proceedings of SIGGRAPH '98, pages 133–140, 1998.
[12] S. Gupta, K. Sengupta, and A.A. Kassim. Registration, partitioning based compression of 3d dynamic data. IEEE Transactions on Circuits and Systems for Video Technology, 13(11):1144–1155, 2003.
[13] Sumit Gupta, Kuntal Sengupta, and A.A. Kassim. Compression of 3d dynamic geometry data using iterative closest point algorithm. Computer Vision and Image Understanding, 87:116–130, 2002.
[14] Igor Guskov and Andrei Khodakovsky. Wavelet compression of parametrically coherent mesh sequences. In SCA '04: Proceedings of the 2004 ACM SIGGRAPH/Eurographics symposium on Computer animation, pages 183–192, 2004.
[15] Bruce Hendrickson and Robert W. Leland. An improved spectral graph partitioning algorithm for mapping parallel computations. SIAM Journal on Scientific Computing, 16(2):452–469, 1995.
[16] Bruce Hendrickson and Robert W. Leland. A multi-level algorithm for partitioning graphs. In Supercomputing, 1995.
[17] Hugues Hoppe. Progressive meshes. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, pages 99–108, 1996.
[18] L. Ibarria and J. Rossignac. Dynapack: Space-time compression of the 3d animation of triangle meshes with fixed connectivity. In Proceedings of the ACM SIGGRAPH Symposium on Computer Animation, 1999.
[19] B. Jovanova, M. Preda, and F. Preteux. Mpeg-4 part 25: A generic model for 3d graphics compression. In 3DTV08, pages 101–104, 2008.
[20] Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. The analysis of a simple k-means clustering algorithm. In Proc of the 16th Annual Symposium on Computational Geometry, pages 100–109, 1991.
[21] Z. Karni and C. Gotsman. Compression of soft body animation sequences. Computers and Graphics, 28:25–34, 2004.
[22] Sagi Katz and Ayellet Tal. Hierarchical mesh decomposition using fuzzy clustering and cuts. ACM Transactions on Graphics, 22(3):954–961, 2003.
[23] B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. The Bell System Technical Journal, 49(2):291–307, 1970.
[24] Rob Koenen. Overview of the MPEG-4 standard. Moving Picture Experts Group, 2000.
[25] J. Lengyel. Compression of time dependent geometry. In Symposium on Interactive 3D Graphics, pages 89–95, 1999.
[26] J. Li and C.C. Kuo. A dual graph approach to 3d triangular mesh compression. In Proceedings of the IEEE International Conference on Image Processing, pages 891–894, 1998.
[27] X. Li, T. Toon, T. Tan, and Z. Huang. Decomposing polygon meshes for interactive applications. In Proceedings of the Symposium on Interactive 3D Graphics, pages 35–42, 2001.
[28] R. Liu and H. Zhang. Segmentation of 3d meshes through spectral clustering. In Proceedings of Pacific Graphics, pages 298–305, 2004.
[29] Stuart P. Lloyd. Least squares quantization in pcm. IEEE Transactions on Information Theory, 18(2):129–137, 1982.
[30] S. Moradoff and D. Lischinski. Synthesis of textural motion with hard constraints. In Proceedings of the 4th Israel-Korea bi-national conference on geometric modelling and computer graphics, pages 123–128, 2003.
[31] K. Muller, A. Smolic, M. Kautzner, P. Eisert, and T. Wiegand. Predictive compression of dynamic 3d meshes. In ICIP05, pages 589–592, 2005.
[32] R. Pajarola and J. Rossignac. Compressed progressive meshes. IEEE Transactions on Visualization and Computer Graphics, 6(1):79–93, 2000.
[33] F. Payan and M. Antonini. Wavelet-based compression of 3d mesh sequences. In Proceedings of IEEE ACIDCA-ICMI'2005, 2005.
[34] J. Rossignac. Edgebreaker: Connectivity compression for triangle meshes. IEEE Transactions on Visualization and Computer Graphics, 5(1):47–61, 1999.
[35] Mirko Sattler, Ralf Sarlette, and Reinhard Klein. Simple and efficient compression of animation sequences. In SCA '05: Proceedings of the 2005 ACM SIGGRAPH/Eurographics symposium on Computer animation, pages 209–217, 2005.
[36] Ariel Shamir, Chandrajit Bajaj, and Valerio Pascucci. Multi-resolution dynamic meshes with arbitrary deformations. In Proceedings of the conference on Visualization '00, pages 423–430, 2000.
[37] H.D. Simon. Partitioning of unstructured problems for parallel processing. In Proc. Conference on Parallel Methods on Large Scale Structural Analysis and Physics Applications, pages 135–148. Pergamon Press, 1991.
[38] N. Stefanoski and J. Ostermann. Connectivity-guided predictive compression of dynamic 3d meshes. In ICIP06, pages 2973–2976, 2006.
[39] N. Stefanoski and J. Ostermann. Spatially and temporally scalable compression of animated 3d meshes with mpeg-4/famc. In ICIP08, pages 2696–2699, 2008.
[40] G. Taubin and J. Rossignac. Geometric compression through topological surgery. ACM Transactions on Graphics, 17(2):84–115, 1998.
[41] D. Terzopoulos, J. Platt, A. Barr, and K. Fleischer. Elastically deformable models. In SIGGRAPH '87: Proceedings of the 14th annual conference on Computer graphics and interactive techniques, pages 205–214, 1987.
[42] C. Touma and C. Gotsman. Triangle mesh compression. In Proceedings of Graphics Interface, pages 26–34, 1998.
[43] S. Varakliotis, J. Ostermann, and V. Hardman. Coding of animated 3d wireframe models for internet streaming applications. In Proc. International Conference on Multimedia and Expo., pages 353–356, 2001.
[44] Yair Weiss. Segmentation using eigenvectors: A unifying view. In International Conference on Computer Vision, pages 975–982, 1999.
[45] Ian H. Witten, A. Moffat, and Timothy C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann Publishers, San Francisco, CA, 1999.
[46] J.H. Yang, C.S. Kim, and S.U. Lee. Compression of 3-d triangle mesh sequences based on vertex-wise motion vector prediction. IEEE Transactions on Circuits and Systems for Video Technology, (12):1178–1184, December 2002.
[47] H. Zhang and R. Liu. Mesh segmentation via recursive and visually salient spectral cuts. In Proceedings of Vision, Modeling and Visualization, pages 429–436, 2005.
[48] Z. Karni and C. Gotsman. Spectral compression of mesh geometry. In Proceedings of the 27th annual conference on Computer graphics and interactive techniques, pages 279–286, 2000.
In: Computer Animation ISBN 978-1-60741-559-6 c 2010 Nova Science Publishers, Inc. Editors: J.S. Wright and L.M. Hughes, pp. 113-127
Chapter 4
Virtual Emotion to Expression: A Comprehensive Dynamic Emotion Model to Facial Expression Generation Using the MPEG-4 Standard

Paula Rodrigues¹, Asla Sá² and Luiz Velho³
¹ Informatics Department, PUC-Rio, Brazil
² TecGraf, PUC-Rio, Brazil
³ IMPA - Instituto de Matemática Pura e Aplicada, Brazil
Abstract

In this paper we present a framework for generating dynamic facial expressions synchronized with speech, rendered using a tridimensional realistic face. Dynamic facial expressions are those temporal-based facial expressions semantically related to emotions, speech and affective inputs that can modify a facial animation behavior. The framework is composed of an emotion model for speech virtual actors, named VeeM (Virtual emotion-to-expression Model), which is based on a revision of the emotional wheel model of Plutchik. VeeM introduces the emotional hypercube concept in the R⁴ canonical space to combine pure emotions and create new derived emotions. The VeeM model implementation uses the MPEG-4 face standard through an innovative tool named DynaFeX (Dynamic Facial eXpression). DynaFeX is an authoring and player facial animation tool, where speech processing is performed to allow phoneme and viseme synchronization. The tool allows both the definition and refinement of emotions for each frame, or group of frames, as well as facial animation editing using a high-level approach based on animation scripts. The tool player controls the animation presentation, synchronizing the speech and emotional features with the virtual character performance. Finally, DynaFeX is built over a tridimensional polygonal mesh, compliant with the MPEG-4 facial animation standard, which favors tool interoperability with other facial animation systems.
Keywords: Facial Animation, Talking Heads, Expressive Virtual Characters.
1. Introduction
Character Animation is one of the key research areas in Computer Graphics and Multimedia. It has applications in many fields, ranging from Entertainment and Games to Virtual Presence and others. Within the general area of character animation, the modeling and animation of faces is perhaps the single most important and challenging topic. This is because the expressiveness and personality of a character is communicated by facial expressions. Research in face modeling and animation dates back to the seminal work of Frederic Parke in the early 1970's [9]. Since that time, the area has experienced very intense development. Practically all problems related to generating the shape and motion of faces have been deeply studied. This body of research includes a plethora of techniques for capturing the geometry and appearance of human faces, learning facial expressions, modeling muscles and the dynamics of deformations, together with realistic rendering methods for skin and hair. Despite the amazing progress in the area of facial animation, there is one problem which is still open and poses a great challenge to researchers: how to incorporate emotion into animated characters. This is the crucial step toward believable virtual characters. While a talented artist, with the help of powerful modeling and animation tools, can manually create a very expressive character, the same is not true for an automatic or even semi-automatic animation system. It is our intent in this paper to address the challenge of generating believable virtual characters automatically by incorporating a computational emotion model. We propose a comprehensive emotion model for facial animation that considers the various aspects of an expressive character. The model is implemented using a system based on the guidelines of the MPEG-4 standard for faces. The rest of the paper is structured as follows: in the next section the emotion models and related work are discussed. In Section 3. we propose an emotion space named the emotion Hypercube, which enables pure emotion combinations in order to generate derived emotions in a natural way. We then describe a derived emotion classification scheme based on the proposed space. In Section 4. some affective phenomena, like mood and personality, are incorporated into the model. In Section 5. the VeeM (Virtual emotion-to-expression Model) is formalized; it consists of a representation of a facial expression from an emotion description with dynamic features. Section 6. presents an overview of the MPEG-4 standard for facial animation. In Section 7. we explain how the proposed model is implemented. Finally, conclusions and future work are discussed in Section 8.
2. Emotion Models and Related Work
Several models have been proposed to explain what an emotion is and how it is represented [13]. Here we summarize the main approaches related to this topic. Basic Emotion is probably the most well-known emotion approach. The reason for this is its association with universally recognized emotions [5]. Nevertheless, there is still no consensus on which emotions are the basic ones.
Figure 1. Six universal basic emotions defined by Ekman (surrounded by a dashed line) and additional Plutchik basic emotions (accepted and aware).

As discussed in [7], the basic emotion approach aims to build a psychologically irreducible emotion set, which means that these emotions cannot be derived from any other emotion and new emotions are derived from them. Note that these considerations match the mathematical definition of a basis. As mentioned above, the best-known method used to study basic emotions is by observing facial expressions. Through this, Ekman [5] defined six universal emotions: anger, fear, disgust, surprise, joy and sadness, illustrated in Figure 1. An extension of Ekman's model of basic emotion representation is the approach proposed by Plutchik [11], where two additional basic emotions are defined (emphasized in Figure 1): anticipation (also referred to as aware, curiosity or interest) and acceptance (also referred to as trust). Plutchik describes his basic emotions as pairs of opposite emotions, disposed in a wheel of opposed pairs, as illustrated in Figure 2. Derived emotions are defined as the combination of two neighboring basic emotions or as an intensity variation of a basic emotion. In the emotion literature, the Plutchik wheel is considered sufficient to span most human emotional states. Examples of computational systems that use the basic emotion approach to generate their facial expressions are: SMILE [6], eFASE [3], EE-FAS [15], Cloning Expression [12], the MPEG-4 Standard [8] and the CSLU Toolkit [2]. In addition, regardless of the emotion model, emotion perception is unique to each person due to factors such as mood and personality. The approach proposed in [10] is to model mood along a single dimension: good mood and bad mood. A more complete approach, proposed by Thayer in [16], uses emotion spaces to represent mood in two dimensions (calm/tense and energy/tired), resulting in four mood emotional states: Energetic-calm, Energetic-tense, Tired-calm and Tired-tense. An example of a computational system that incorporates mood into the character, using Thayer's model to generate dynamic facial expressions, is the DER [14] [15]. Personality is another important aspect that makes the action and reaction of each person unique, even when submitted to the same situation as another person. Until now, there is no formal consensus on how to define the personality traits of a person; however, the Big
Figure 2. Plutchik wheel.

Five (or Big OCEAN) model is well known. In this model, each letter of the word OCEAN defines a dimension of the personality trait: Openness to experience, Conscientiousness, Extraversion, Agreeableness, Neuroticism¹. Emotions are not static. They are experienced by each individual differently because of characteristics such as personality and mood, referred to here as affective phenomena. Additionally, affective phenomena also interfere in the reaction of each person when receiving a stimulus, defining the emotion sustaining time. Our aim is to propose a computational system which incorporates and implements a robust model based on basic emotions. To this end, the Plutchik model [11] is revisited and generalized to allow the description of new emotions from the eight basic emotions, as well as the incorporation of emotion dynamics in a comprehensive manner, in order to allow the automatic generation of believable virtual characters.
3. The Emotion Hypercube
The emotion description space proposed in this paper is a reinterpretation of Plutchik's emotional wheel. We consider a family of emotions as a set of emotions composed of different intensity levels of a given basic emotion Ei. Plutchik considered a discrete set of three levels of intensity of a given basic emotion, namely an attenuation of the pure basic emotion, the pure basic emotion itself and an extrapolation of the pure basic emotion, as described in Table 1.

¹ More information about the Big OCEAN model can be found at http://www.answers.com/topic/big-five-personality-traits (accessed 22-Jan-2008).

Table 1. Basic emotions

family (axis)   attenuation (|αi| < 1)   basic emotion (|αi| = 1)   extrapolation (|αi| > 1)
1 (x+)          serenity                 joy                        ecstasy
2 (y+)          annoyance                anger                      rage
3 (z+)          acceptance               trust                      admiration
4 (w+)          distraction              surprise                   amazement
5 (x-)          pensiveness              sadness                    grief
6 (y-)          apprehension             fear                       terror
7 (z-)          boredom                  disgust                    loathing
8 (w-)          interest                 anticipation               vigilance

Assuming that Plutchik's set of basic emotions is psychologically irreducible, our
goal is to define a basis that represents the space of derived emotions. In order to do so, we define an emotion axis, denoted by e, as composed of a pair of opposed families, as stated in Plutchik's emotional wheel. Thus, the eight basic emotions are arranged on 4 emotion axes, denoted by x, y, z and w. The level of intensity of an emotion axis is modeled as a continuous parameter represented by the real value αi, where αi ∈ [−γ, +γ], with |γ| ≥ 1. A basic emotion is mapped to the intensity level 1 and its opposed basic emotion is mapped to -1. The neutral emotion is mapped to level 0. While the intensity level |αi| = 1 corresponds to a basic emotion, |αi| < 1 is the emotion attenuation and |αi| > 1 is the emotion extrapolation (see Table 1). The most obvious space to adopt to represent the space of emotions is Rⁿ, where n is the number of opposed pairs of emotions to be considered in the model. Since we adopt 4 emotion axes, limited to the interval [−γ, +γ], we obtain an emotion hypercube

H = [−γ, +γ] × [−γ, +γ] × [−γ, +γ] × [−γ, +γ]

A given emotional stimulus can then be completely defined by a vector of intensity levels ~u = (αx, αy, αz, αw) that represents the character's state of emotion. The emotion hypercube H is a comprehensive emotion space useful for facial expression generation. Observe that the proposed space can be easily extended to more than four axes if a new pair of opposed basic emotions is incorporated, observing that the new axis should be independent of the previously defined ones in order to preserve the property that the set of emotion axes is a basis of H.
3.1. Derived Emotions
The combination of basic emotions is an adequate approach to use with virtual talking heads. Plutchik [11] states that two basic emotions can be combined if they are not opposed to each other. The emotion hypercube H leads to a simple way to derive combined emotions. For instance, binary combinations can be defined by setting two intensity levels to zero. Thus, given two non-opposite basic emotions Ei and Ej, their combination is defined by their non-zero intensity values αi and αj. Although the original Plutchik model restricts combinations to adjacent basic emotions, we do not adopt this restriction in the proposed model.
118
Paula Rodrigues, Asla S´a and Luiz Velho
We emphasize that by using the hypercube model, n-ary combinations, where n is greater than two, are easily stated, although in the literature combinations of more than two emotions are not considered, probably due to the combinatorial growth of the number of derived emotions to be considered and semantically interpreted. Observe also that the restriction to not combine opposite emotions is intrinsic to the disposition of opposite emotions in the same axis.
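As a concrete illustration of the hypercube representation, the sketch below stores a character's state of emotion as a 4-vector of axis intensities and shows how binary (or, in principle, n-ary) derived emotions arise simply from setting several axes to non-zero values. The axis-to-emotion assignment follows Table 1; the value of γ and the helper names are illustrative assumptions.

```python
import numpy as np

GAMMA = 1.5  # assumed bound gamma >= 1 on the intensity of each axis

# Axis order and signs follow Table 1:
# x = joy(+)/sadness(-), y = anger(+)/fear(-),
# z = trust(+)/disgust(-), w = surprise(+)/anticipation(-).
AXES = ('x', 'y', 'z', 'w')

def emotion_vector(**intensities):
    """Build a point u = (ax, ay, az, aw) of the emotion hypercube H."""
    u = np.zeros(4)
    for name, value in intensities.items():
        u[AXES.index(name)] = np.clip(value, -GAMMA, GAMMA)
    return u

# A binary derived emotion: two non-opposite basic emotions, i.e. non-zero
# intensities on two *different* axes, the remaining axes set to zero.
love_like_state = emotion_vector(x=1.0, z=1.0)   # joy + trust (cf. Table 2)
attenuated_fear = emotion_vector(y=-0.5)         # attenuation of pure fear
```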
3.2. Binary Derived Emotions Taxonomy
In this section we propose a natural and complete taxonomy of the binary derived emotions that extends Plutchik's work [11] by defining additional semantic interpretations. Two emotion axes ei and ej define an emotion plane of derived emotions Πij. We will refer to each quadrant of a plane as a sector of derived emotion. The combination of the 4 axes, taken 2 by 2, results in 6 derived planes. In Table 2 each sector of each plane is named with a semantic interpretation of the derived emotion. We emphasize that the semantic interpretation of each sector corresponds to the combination of two basic emotions with level of intensity |αi| = 1. If the basic emotion intensity is attenuated or extrapolated, the semantic interpretation of the derived emotion may change.
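If one wanted to use the taxonomy programmatically, the sector names of Table 2 could be stored in a small lookup keyed by the (unordered) pair of contributing basic emotions. The dictionary below is a hypothetical helper built from Table 2, with only the first two planes written out.

```python
# Hypothetical lookup of the binary derived-emotion names of Table 2,
# keyed by the unordered pair of contributing basic emotions.
DERIVED = {
    frozenset({'joy', 'fear'}): 'thrill',
    frozenset({'joy', 'anger'}): 'negative pride',
    frozenset({'sadness', 'fear'}): 'despair',
    frozenset({'sadness', 'anger'}): 'envy',
    frozenset({'joy', 'trust'}): 'love',
    frozenset({'joy', 'disgust'}): 'morbidness',
    frozenset({'sadness', 'trust'}): 'sentimentalism',
    frozenset({'sadness', 'disgust'}): 'remorse',
    # ... the remaining planes of Table 2 are omitted here for brevity
}

def derived_emotion(e_i, e_j):
    """Return the sector name for two non-opposite basic emotions."""
    return DERIVED.get(frozenset({e_i.lower(), e_j.lower()}), 'unnamed sector')
```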
4. Modeling Affective Phenomena in H
Personality, mood, the physical environment and other factors interfere with the expression of emotion. We refer to such factors collectively as affective phenomena and, when applied to an individual character, as its affective pattern. Because of affective phenomena, people under the same emotion stimulus react to and feel it in different ways and intensities, according to their affective pattern. Affective phenomena are more enduring emotions and usually occur as an emotional background of much lower intensity than emotional episodes. In order to model the influence of an affective phenomenon on an emotional episode, we assume that the basic emotion parameters are defined considering a neutral character, while an affective pattern is modeled as a distortion (warping) of the original emotion space.
4.1. Affective Pattern Description
Suppose that at a given instant a happiness stimulus is augmented; then it is reasonable to think that the level of intensity related to joy also increases, even if the affective pattern of the character is biased towards sadness. Thus it is reasonable to state that an affective pattern can be defined by a set of monotonic functions fi : [−γ, γ] → [−γ, γ]. The new intensity level on each emotion axis is α̃i = fi(αi), where αi corresponds to the original stimulus and i = x, y, z, w is one of the emotion axes of H. Then, the vector ũ = (α̃x, α̃y, α̃z, α̃w) is an instance of the affective pattern in H. This approach to modeling affective patterns is general and capable of describing the behavior of non-neutral characters. The set of functions χ = {fx, fy, fz, fw} characterizes an affective pattern. The difficulty resides in determining such functions in a meaningful manner.
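The warping view of an affective pattern can be made concrete with a small sketch: one monotonic function per axis, applied component-wise to the stimulus vector. The specific parametric form (an affine map with clipping) and the example biases are assumptions chosen only for illustration.

```python
import numpy as np

GAMMA = 1.5  # same assumed bound as before

def make_axis_warp(bias=0.0, gain=1.0):
    """Return a monotonic warping f_i: [-gamma, gamma] -> [-gamma, gamma].
    gain > 1 amplifies reactions on that axis, bias shifts the resting point;
    both are illustrative parameters, not part of the original model."""
    def f(alpha):
        return float(np.clip(gain * alpha + bias, -GAMMA, GAMMA))
    return f

# An affective pattern chi = {f_x, f_y, f_z, f_w}: e.g. a character biased
# towards sadness (negative x bias) that over-reacts on the anger/fear axis.
chi = {
    'x': make_axis_warp(bias=-0.2),
    'y': make_axis_warp(gain=1.4),
    'z': make_axis_warp(),
    'w': make_axis_warp(),
}

def apply_affective_pattern(u, chi, axes=('x', 'y', 'z', 'w')):
    """Map a stimulus u in H to the warped vector u_tilde."""
    return np.array([chi[a](alpha) for a, alpha in zip(axes, u)])
```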
Table 2. Taxonomy

Derived Plane   basic emotion i   basic emotion j   derived emotion
Πxy             Joy               Fear              thrill
Πxy             Joy               Anger             negative pride
Πxy             Sadness           Fear              despair
Πxy             Sadness           Anger             envy
Πxz             Joy               Trust             love
Πxz             Joy               Disgust           morbidness
Πxz             Sadness           Trust             sentimentalism
Πxz             Sadness           Disgust           remorse
Πxw             Joy               Anticipation      optimism
Πxw             Joy               Surprise          absent
Πxw             Sadness           Anticipation      pessimism
Πxw             Sadness           Surprise          disappointment
Πyz             Fear              Trust             submission
Πyz             Fear              Disgust           distress
Πyz             Anger             Trust             dominance
Πyz             Anger             Disgust           contempt
Πyw             Fear              Anticipation      anxiety
Πyw             Fear              Surprise          awe
Πyw             Anger             Anticipation      aggression
Πyw             Anger             Surprise          outrage
Πzw             Trust             Anticipation      positive pride
Πzw             Trust             Surprise          curiosity
Πzw             Disgust           Anticipation      cynicism
Πzw             Disgust           Surprise          defeat
Note that the physical environment can also be described by a set of such functions. For example, if a character is in a formal environment, the emotion expression tends to be attenuated, that is, α̃i < αi; thus the set χ should be a set of emotion attenuation functions.
4.2. The Dynamics of an Affective Pattern
It is important to notice that affective patterns are not static. For instance, personality traits evolve over a lifetime, while mood can change during the day or as a reaction to an emotional episode. The emotion dynamics can be modeled as a function of emotion and time, f : (H, t) → H. Semantic aspects also need to be taken into account in the context of emotion dynamics. For example, a character cannot mix opposite emotion families and cannot swap them
Figure 3. Emotion vector generation.

frequently, otherwise a conflict or emotional instability is established. The combination of basic emotions from the farthest families generates destructive emotions, because they are close to a conflict state.
5. VeeM: Virtual Emotion to Expression Model
Figure 4. VeeM architecture.

The H space is used to define a given character's state of emotion at an instant of time. In order to simulate a believable character animation, the affective patterns and their dynamics, as well as dynamic characteristics such as head movements and eye blinks, have to be combined together. The dynamics of the facial expression of a given emotion is treated by VeeM. In Figure 4 a schematic view of the proposed model is illustrated.
There are other equally important dynamic characteristics to be considered when aiming to generate a believable character, namely the speech dynamics, the eye movement dynamics and the head movement dynamics, referred to in the literature as non-verbal movements. The difference between affective patterns and non-verbal movements is their domain of action: instead of affecting an emotional state, the non-verbal movements act directly on the facial expression domain F. Thus, they can be modeled as a function of emotion and time, g : (H, t) → F. Speech interferes with mouth movements, and a lot of work has been done in order to define the movements that characterize it. The subject is complex and depends on the spoken language and the character's culture, but the consensus is to define visemes to represent the phonemes and combine them to produce speech visualization. A viseme is a visual representation of a phoneme that describes the facial movements that occur alongside the voicing of phonemes. Head and eye movements can be treated as random functions (random noise), considering the fact that it is uncommon to keep them fixed. Another complementary approach is to model them as directed reactions that simulate the attentional focus. In both cases the functions that model these behaviors should interfere with the head and eye positions as well as with the motion velocity. Speech and non-verbal characteristics such as eye and head movements should be combined with the emotion description in order to produce the resultant facial expression [13]. VeeM also incorporates the dynamics of reaction to an emotion episode (directly related to a stimulus or fact which elicits an emotion). The emotion expression reaction depends on the affective pattern. The reaction curve rj(t), similarly to the description of emotional stimuli given by Picard [10], has 3 stages: onset (usually very fast), sustain and decay, as illustrated in Figure 5.
Figure 5. Emotion reaction curve. Emotion transition is incorporated as a blending between two subsequent emotions.
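The three-stage reaction described above can be prototyped as a simple envelope function. The piecewise-linear form and the stage durations below are illustrative choices, not values prescribed by the model.

```python
def reaction_curve(t, onset=0.15, sustain=1.0, decay=1.5):
    """A possible reaction envelope r_j(t) in [0, 1] with the three stages
    described above: a fast onset, a sustain plateau and a slower decay.
    Durations (in seconds) are assumptions for illustration."""
    if t < 0.0:
        return 0.0
    if t < onset:                          # fast rise
        return t / onset
    if t < onset + sustain:                # sustain
        return 1.0
    if t < onset + sustain + decay:        # slow decay
        return 1.0 - (t - onset - sustain) / decay
    return 0.0

# The intensity of an emotion axis at time t would then be alpha_i * reaction_curve(t).
```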
6. The MPEG-4 Standard
Now that the domain of emotion description H has been defined, we turn to the description of the range space, that is, the space where the emotions are to be visualized:
the facial expression space F. In general, different works on facial animation generation develop their own facial model without worrying about compatibility. Often the only common approach among these works is the emotion model, which usually consists of the six Ekman basic emotions. Aiming to define a standard, MPEG-4 [4] [8] agreed on a set of control points to define a facial model, proposing a facial polygonal mesh that can be considered universal. It is important to mention that the MPEG-4 facial animation standard is the first effort in this direction. MPEG-4 specifies a face model in its neutral state, a number of feature points (FPs) on this neutral face as reference points, and a set of facial animation parameters (FAPs), each corresponding to a particular facial action that deforms the face model starting from the neutral state. In this work we identify the facial expression space F with the space of FAPs. A neutral face in the MPEG-4 standard must satisfy the following properties:
• Gaze is in the direction of the z-axis;
• All face muscles are relaxed;
• Eyelids are tangent to the iris;
• The pupil is one third the diameter of the iris;
• Lips are in contact; the line of the lips is horizontal and at the same height at the lip corners;
• The mouth is closed and the upper teeth touch the lower ones; and
• The tongue is flat and horizontal, with the tip of the tongue touching the boundary between upper and lower teeth.
In order to define FAPs for arbitrary face models, MPEG-4 defines facial animation units (FAPUs) that serve to scale FAPs for any face model. FAPUs are defined as fractions of distances between key facial features. These features, such as eye separation, are defined on a face model that is in the neutral state. From the FAPU definition, MPEG-4 specifies 84 FPs on the neutral face. The main purpose of these FPs is to provide spatial reference for defining FAPs. FPs are arranged in groups such as cheeks, eyes and mouth. The location of these FPs has to be known for any MPEG-4 compliant face model. The FAPs are based on the study of minimal perceptible actions and are closely related to muscle actions [8]. The 68 parameters are categorized into 10 groups related to parts of the face (Table 3). FAPs represent a complete set of basic facial actions including head motion, eye and mouth control. FAP group 1 contains two high-level parameters: visemes and expressions. The MPEG-4 standard defines 14 visemes to represent English phonemes [13] [8]. The expression parameter defines the six basic facial expressions. Facial expressions are animated by a value defining the excitation of the expression. A benefit of using FAP group 1 is that each facial model preserves its personality, in the sense that a specific face model preserves its particular version of a facial expression.
Table 3. FAPs groups

Group                                               Number of FAPs
1. visemes and expressions                          2
2. jaw, chin, inner lowerlip, cornerlips, midlip    16
3. eyeballs, pupils, eyelids                        12
4. eyebrow                                          8
5. cheeks                                           4
6. tongue                                           5
7. head rotation                                    3
8. outer-lip positions                              10
9. nose                                             4
10. ears                                            4
FAP groups 2 to 10 are considered low-level parameters. They specify precisely how much an FP of a face has to be moved for a given amplitude [8]. An MPEG-4 facial expression is then obtained by moving the Feature Points (FPs) associated to the FAPs. Each basic emotion has a set of FAPs defined to produce its corresponding facial expression. We call the facial expression related to a specific emotion a signal, and we denote the set of FAP values that define the emotion j as ~vj. The facial animation sequence is obtained by specifying FAP values at each time instant, ~vjt, according to an input timeline. We adopt the MPEG-4 Standard as our space of facial expressions.
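For concreteness, a frame of the animation can be held as a 68-element FAP value vector plus a participation mask, which is how the blending described in the next section operates on the data. The container below is an illustrative way of representing the vectors ~vj^t, not a structure mandated by the MPEG-4 standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List

NUM_FAPS = 68  # FAPs defined by the MPEG-4 facial animation standard

@dataclass
class FapFrame:
    """One animation frame: a value and a participation flag per FAP."""
    values: List[float] = field(default_factory=lambda: [0.0] * NUM_FAPS)
    mask:   List[int]   = field(default_factory=lambda: [0] * NUM_FAPS)

def expression_keyframe(fap_values: Dict[int, float]) -> FapFrame:
    """Build a frame from a sparse {fap_index: value} description of a signal."""
    frame = FapFrame()
    for idx, val in fap_values.items():
        frame.values[idx - 1] = val   # FAP indices are 1-based in the standard
        frame.mask[idx - 1] = 1
    return frame
```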
7. VeeM applied on an MPEG-4 Face Model
As already mentioned, VeeM makes it possible to generate different facial emotions taking into account affective patterns and the emotion dynamics. The challenge then becomes emotion visualization using an MPEG-4 face model, that is, setting FAP parameter values from the defined emotion. In this work we adopt the open source Xface [1] face model. The model had to be extended, since the original implementation defines only the six Ekman basic emotions, as specified in the MPEG-4 standard. Figure 6 shows the eight basic emotions of VeeM specified on the Xface MPEG-4 polygonal mesh. Speech is a key element to generate a dynamic and natural facial animation combined with the character's emotional state. The implemented system uses an audio file with the character's speech as input. This file goes through a speech recognition stage, generating as output the speech phonemes [13]. The produced phonemes are mapped onto the 14 MPEG-4 visemes, each one mapped onto a set of 26 FAPs that define the mouth region. The FAP values generated at each stage of the facial animation specification (verbal expressions, non-verbal expressions and emotions) need to be blended to define the final value
Figure 6. VeeM basic emotions viewed in an MPEG-4 facial animation.

that a FAP takes in each animation frame. For each animation frame, this blending is done in two stages (Figure 7):
• Facial expression for the emotion (pure or derived); and
• Facial expression for the resulting emotion and viseme blending.
Figure 7. FAP blending to generate the final facial expression for each animation frame.

The first stage can receive as input a single vector of FAPs, two vectors of FAPs, or nothing. Whatever the input, the results are always two vectors of dimension 68: a vector containing the mask of FAPs (signaling whether a FAP participates or not in the current frame) and another with the values for the participating FAPs.
If a vector is not provided as input, the result of the first stage is the FAP vector for the neutral emotion and a mask containing the value 0 for the 26 FAPs of the mouth region and for FAP 1 (viseme), and the value 1 for the other FAPs. The purpose of this configuration is that the face remains in the neutral state, with its expression influenced only by the viseme FAPs. In the case where a single FAP vector is provided, the first stage output is the FAP vector itself and a mask containing a value of 1 for all FAPs, with the exception of FAP 1 (viseme). When the first stage receives two vectors of FAPs, a blending is necessary to define the derived emotion. This blending is obtained by calculating the mean between the values of each FAP, except for FAPs 1 and 2, which are the high-level FAPs. FAP 1 is set to 0, since it represents the viseme. FAP 2, which describes the emotion, is also assigned the value 0, because this FAP has no influence on the animation, since only low-level FAPs are considered by the synchronization module. If two FAP vectors are provided as input, the output mask passes through the same process as in the case of only one input vector.

The second blending stage generates the resulting FAP vector using as input the first stage output, the viseme FAP vector and a viseme contribution parameter indicating a blending factor, denoted by β, where 0 ≤ β ≤ 1. Before applying the viseme-emotion blending rule, the second stage creates a mask for the viseme FAP vector. If there is a viseme FAP vector, the generated mask vector has value 1 for the 26 FAPs related to the mouth region and value 0 for the other FAPs. If there is no viseme FAP vector specified as input to the second stage, the mask is created with all elements of the vector having value 0.

The blending rule is simple. Denoting by FAP_emo and MASK_emo, respectively, the resulting emotion FAP vector and mask vector; by FAP_vis and MASK_vis, respectively, the viseme FAP vector and mask, both created in the second stage; and by β the viseme contribution factor, it is possible to apply, for each index i of the resulting FAP vector FAP_res, the following algorithmic logic (a code sketch is given at the end of this section):
• if (MASK_vis_i = 0) ⇒ FAP_res_i = MASK_emo_i · FAP_emo_i
• if ((MASK_vis_i ≠ 0) ∧ (MASK_emo_i = 0)) ⇒ FAP_res_i = MASK_vis_i · FAP_vis_i
• if ((MASK_vis_i ≠ 0) ∧ (MASK_emo_i ≠ 0)) ⇒ FAP_res_i = (1 − β) · MASK_emo_i · FAP_emo_i + β · MASK_vis_i · FAP_vis_i
With this rule, it can be noted that the value β = 1 corresponds to a viseme-and-emotion blending strategy in which visemes overlap emotions, ignoring the emotion influence in the generation of the facial expression in the mouth region. The β value can be set based on a qualitative analysis of the visual results obtained. Good results have been achieved with values in the range 0.6 ≤ β ≤ 0.7.

Once the FAP values for each animation frame are generated, the next step is the synchronization between the audio speech file and each frame. At the beginning of the animation presentation, a thread for audio presentation is started in parallel with the thread used to control the face. The thread responsible for the face control synchronizes the FAPs with the
audio by checking the machine clock and using the frame frequency defined in the animation. At each iteration, the elapsed time from the start of the animation is calculated and the corresponding frame is loaded. If the frame is different from the previous one, it is placed in the FAP structure received as a parameter. This method maps the FAPs onto the mesh, and the new expression is rendered.
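The three-case blending rule stated above translates almost line-for-line into code. The sketch below assumes the FAP values and masks are plain 68-element lists; it mirrors the rule as written, but it is not the authors' implementation.

```python
def blend_faps(fap_emo, mask_emo, fap_vis, mask_vis, beta):
    """Second-stage viseme/emotion blending for one frame (0 <= beta <= 1)."""
    assert 0.0 <= beta <= 1.0
    fap_res = []
    for i in range(len(fap_emo)):
        if mask_vis[i] == 0:
            # Viseme does not touch this FAP: keep the (masked) emotion value.
            fap_res.append(mask_emo[i] * fap_emo[i])
        elif mask_emo[i] == 0:
            # Only the viseme contributes to this FAP.
            fap_res.append(mask_vis[i] * fap_vis[i])
        else:
            # Both contribute: weighted blend controlled by beta.
            fap_res.append((1.0 - beta) * mask_emo[i] * fap_emo[i]
                           + beta * mask_vis[i] * fap_vis[i])
    return fap_res

# Example: beta = 0.65 lies in the 0.6-0.7 range reported to give good results.
```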
8. Conclusion

This paper introduced a new emotion model for the generation of facial expressions in virtual characters. The proposed Virtual emotion-to-expression Model (VeeM) is based on a generalization of Plutchik's emotional wheel [11] to the emotional Hypercube H ⊂ R⁴. This mathematical formulation allows the combination of the 8 pure emotions in a general and coherent way. Furthermore, it sets the ground for a comprehensive framework which integrates emotions and affective phenomena through time-varying functions f : (H, t) → H, as well as the non-verbal movements g : (H, t) → F, from the configuration space H to the space of facial expressions F. The dynamics of expressions is modeled by considering the temporal properties of the functions ft in the space H and gt in the space F. Facial expressions are defined in our framework using the MPEG-4 standard. Consequently, another relevant contribution of this paper is a computational methodology that incorporates VeeM expressions into the representation of a face under the MPEG-4 guidelines. Our system generates animations of believable virtual characters from emotional and verbal elements. This is done by mapping emotion and speech to FAPs at each frame of the animation with lip-sync and expression dynamics. An important aspect that we plan to address in the future is the development of tests to validate the combination between visemes and facial expressions in VeeM. Another avenue for future work is the investigation of derived emotions resulting from the combination of more than two axes of the emotional Hypercube. Further research on the VeeM framework includes: a detailed analysis of warpings of the emotional hypercube to model affective phenomena; an in-depth study of how to effectively model the dynamics of expressions through time-dependent properties of functions in the emotional space; and a validation of the best strategies to perform transitions between emotional states. Finally, our end goal is to fully exploit the potential of VeeM in actual applications.
References
[1] Balci, K.: Xface: MPEG-4 based open source toolkit for 3D Facial Animation. Proceedings of the Working Conference on Advanced Visual Interfaces, 399–402 (2004).
[2] Cole, R.: Tools for research and education in speech science. Proceedings of the International Conference of Phonetic Sciences (1999).
[3] Deng, Z., Neumann, U.: eFASE: Expressive Facial Animation Synthesis and Editing with Phoneme-Isomap Control. Proceedings of ACM SIGGRAPH/Eurographics Symposium on Computer Animation (2006).
[4] Ebrahimi, T., Pereira, F.: The MPEG-4 Book. Prentice Hall PTR, 1st edition (2002).
[5] Ekman, P.: Universal and cultural differences in facial expressions of emotion. Nebraska Symposium on Motivation. Ed. Lincoln, 207–283 (1971).
[6] Kalra, P. et al.: SMILE: A Multilayered Facial Animation System. Proceedings of IFIP WG 5 (10), 189–198 (1991).
[7] Ortony, A., Turner, T.J.: What's basic about basic emotions? American Psychological Association, Inc. 97 (3), 315–321 (1990).
[8] Pandzic, I.S., Forchheimer, R.: MPEG-4 Facial Animation: The Standard, Implementation and Applications. John Wiley and Sons, Ltd (2002).
[9] Parke, F.: A parametric model for human faces. PhD Thesis, University of Utah (1974).
[10] Picard, R.W.: Affective computing. Cambridge, Mass.: M.I.T. Press (1997).
[11] Plutchik, R.: A general psychoevolutionary theory of emotion. Emotion: Theory, Research, and Experience. Theories of Emotion, Vol 1, 3–33 (1980).
[12] Pyun, H. et al.: An Example-Based Approach for Facial Expression Cloning. Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 167–176 (2003).
[13] Rodrigues, P.S.L.: A System for Generating Dynamic Facial Expressions in 3D Facial Animation with Speech Processing. PhD Thesis, PUC-Rio, Brazil (2007).
[14] Tanguy, E.A.R.: Emotions: the Art of Communication Applied to Virtual Actors. PhD Thesis, Department of Computer Science, University of Bath, CSBU-2006-06 (ISSN 1740-9497) (2006).
[15] Tanguy, E., Willis, P., Bryson, J.: A Dynamic Emotion Representation Model Within a Facial Animation System. Technical Report, CSBU-2005-14 (2005).
[16] Thayer, R.E.: The origin of everyday moods. Oxford University Press (1996).
In: Computer Animation ISBN 978-1-60741-559-6 © 2010 Nova Science Publishers, Inc. Editors: J.S. Wright and L.M. Hughes, pp. 129-144
Chapter 5
Example-Based Performance-Driven Animation of an Anatomical Face Model
Yu Zhang
Institute of High Performance Computing, Singapore 117528
Abstract
Recent development of physics-based face modeling that emulates the anatomical structure including skin, muscles, and skull allows us to create detailed, realistic animations. However, synthesis of facial expressions on such complex models often involves significant manual work due to the difficulty in determining appropriate values of the muscle actuation parameters. This paper presents an example-based performance-driven method to automatically estimate facial muscle actuation parameters from markerless video footage. Our method is based on an efficient face tracker which uses a facial deformation subspace model. During the training phase of the tracker a set of templates associated with the subspace basis is computed to alleviate the online computation. At runtime, the tracking algorithm establishes temporal correspondence of the face region in the video sequence by simultaneously determining both motion and appearance parameters. Using a set of example pairs that consist of the appearance and animation parameters corresponding to the key expressions, we learn the relationship between facial appearances and animation parameters. This enables the animation parameters to be computed in real-time from the appearance parameters obtained by the tracker, allowing animation of the anatomical model at interactive rates.
1. Introduction
Realistic modeling and animation of human faces has been one of the most interesting problems in computer graphics. In particular, synthesizing facial expressions of virtual characters has experienced increased attention for its important applications in entertainment (e.g., movies and computer games), advanced man-machine interfaces, electronic commerce, tele-presence and shared virtual world systems, and facial expression recognition. However, the task of modeling the expressive human face by computer remains a
major challenge. First, facial movement is a product of the underlying skeletal and muscular forms, as well as the mechanical properties of the skin and subcutaneous layers. This is very complex, because there are numerous specific muscles and important interactions between muscles and bone structure. Second, the human visual system is very sensitive to the nuances of facial expressions, and the slightest deviation from real facial appearance or movement can be immediately detected as wrong by the most casual viewer. Advances in facial animation systems show the potential of physics-based approaches, where an anatomically accurate model of facial musculature, passive tissues and underlying skeletal structure is simulated [15, 16, 18, 25, 30, 33]. These techniques can be used to create detailed, realistic animations. However, as the model becomes more complex, animating detailed models of this sort becomes more difficult, requiring complex coordinated stimulation of the underlying musculature. Although the Facial Action Coding System (FACS) [10], which codes facial movements in small units, provides guidance to activate individual muscles for synthesizing specific expressions, extensive manual intervention is still required due to the difficulty in determining appropriate muscle parameter values. Once a model exists, it is often desirable to automatically determine muscle contractions from real facial motion data. One solution to this problem comes from the performance-driven animation approach, in which video footage recording the performance of a human actor is used to control the animation of a synthetic model. The face can be tracked throughout the video by recovering the position and expression at each frame. This information can then be used to estimate animation parameter values. While various techniques have been used for performance-driven animation, most existing ones use colored markers painted on the actor's face to aid the face tracker. Once the position of the markers has been determined, the position of the face and the facial features can be derived easily. However, the use of markers on the face is intrusive and limits the type of video that can be processed. In this paper, we present an example-based performance-driven animation method that automatically determines facial muscle activations to track markerless video footage of a person's face. Our method consists of several stages: facial deformation subspace construction, facial motion tracking, and expression retargeting. Fig. 1 shows a block diagram of the system architecture. In facial deformation subspace construction, we build a low-dimensional linear subspace that models image variation due to non-rigid facial deformations. The subspace is trained offline by processing a video sequence of the person with different expressions. In the face tracking procedure, the deformation subspace model is incorporated into an efficient tracking algorithm which establishes temporal correspondence of the face region in the video sequence by simultaneously determining both motion and appearance parameters, with no more computation than would otherwise be required. During the training phase of the tracking algorithm a set of templates associated with the subspace basis is computed to alleviate the online computation. In expression retargeting, a set of example face images that show key expressions is selected.
Their appearance parameters in the facial deformation subspace, together with the corresponding animation parameters of an anatomical 3D face model, are used to learn the relationship between the animation parameters and the appearance parameters. At runtime, this allows us to efficiently estimate the animation parameters from the appearance parameters provided by the tracker.
Figure 1. Overview of the example-based performance-driven expression synthesis system.
The paper is organized as follows. Section 2 reviews the previous and related work. Section 3 presents our anatomical face model and the creation of key example expressions. Construction of the deformation subspace is explained in Section 4. Section 5 details the facial motion tracking algorithm. Section 6 describes efficient estimation of animation parameters in the expression retargeting process. Experimental results are shown in Section 7. Section 8 presents conclusions and proposes avenues for future work.
2. Previous and Related Work
Realistic facial animation remains a fundamental challenge in computer graphics. Since the pioneering work of Parke [24], a large body of literature on modeling and animating faces has been published in the last four decades. A good overview can be found in the textbook by Parke and Waters [23] and in the survey by Noh and Neumann [21]. In the context of this paper, we focus on publications that address performance-driven facial animation and muscle-based face modeling. Remarkably, one of the oldest publications in this context is the one that uses three-dimensional sparse motion capture marker data to control facial movement of computer-generated models [32]. The system synthesizes expressions by changing texture coordinates calculated from the positions of the markers on the performer's face. Eisert and Girod [9] model a face with a B-spline surface, and analyze facial expressions in terms of feature point positions to estimate the facial animation parameters of the MPEG-4 standard. Guenter et al. [13] capture both 3D geometry and shading information of a human face, and reproduce photorealistic expressions. In all of these methods, the locations of the markers are used to drive the 3D model. Since the markers usually are quite sparse compared to the dense surface mesh of the model, an interpolation function is typically used to deform the mesh so that vertices in between the markers are displaced properly. Another category of performance-driven animation is to synthesize expressions by blending pre-modeled key expressions. The animation is achieved by computing a set of
blending weights that minimize the Euclidean distance between the corresponding markers on the actor's face and the 3D model. Pighin et al. [27] reconstruct the geometry and texture of an individual face from several face images taken from different viewing angles. They also model basic expressions and generate novel expressions by blending them. Later, they propose a method to find the blending weights by minimizing an error function over the set of pre-modeled expressions and face positions spanned by the model [26]. Kouadio et al. [17] animate a synthetic character by extracting the interpolation weights from the feature points traced by an optical capture system. Choe and Ko [5] develop an artist-in-the-loop method for analyzing captured expressions. The expressions are synthesized by a linear combination of the elements in a muscle actuation basis which consists of face shapes resulting from the contraction of each single facial muscle. Typically, the basis elements need to be resculpted a number of times to obtain satisfactory results. In the approach proposed by Chuang and Bregler [6], the 2D key expressions are automatically found from the tracking data, and the corresponding 3D key shapes of a face model are created manually. Facial animation is produced by applying the blending weights recovered from facial feature decomposition to the 3D key shapes. However, a complete bank of 3D key shapes must be built for any new subject, which is a tedious task. Some approaches involve mapping the motion of a facial expression from the source model to the target model directly [20]. Since the target model may have a different shape, the source motion vectors need to be transformed to follow the curvature of the new face shape. In order to generate delicate skin deformations, dense mesh motion is required as input, but this may not be available from some motion capture systems. Moreover, dense correspondences between the source and target models should be established for motion retargeting, which is difficult if the source and target shapes are very different.
Given marker data, the authors use a steepest descent iterative solver to calculate the lip model parameters or the facial muscle activations that best track the motion data. More recently, Sifakis et al. [29] employ an optimization framework to determine muscle activations that track a sparse set of surface landmarks, and use it for speech animation [28]. However, the computational complexity of the nonlinear optimization process makes this method unsuitable to
retarget facial expressions in real-time.
3. Creating Key Expressions on an Anatomy-Based Face Model
We have developed an anatomy-based face model for physically-based facial animation [33]. The model encapsulates three structural layers: skin, muscles, and skull (see Fig. 2). The skin surface is represented as a triangular mesh, consisting of 4,517 triangles. The edges and vertices of the skin mesh are converted to nonlinear springs and point masses to simulate dynamic deformation of the soft tissue. A layer of 23 muscles is attached to the skull and inserted into the skin to control facial movement. Our muscle models simulate the distribution of muscle force exerted on the facial skin. When muscles contract, the surrounding skin tissue is dynamically deformed under a field of muscle forces. An animation of facial expressions is carried out by a deformation of the skin mesh resulting from the combined contractions of a set of muscles based on the FACS [10]. The skull is also represented as a triangular mesh. During the runtime of an animation, articulation of the jaw causes motion of the mouth, and the shape of the skull constrains skin deformation, preventing skull penetration. The reader is referred to [33] for a detailed description of this model.
Figure 2. The anatomy-based face model. Left: face geometry. Right: multi-layer anatomical structure of the skin, muscles and skull.
The anatomy-based face model is animated to generate a set of key expressions, which is done once offline. Ekman [10] illustrated that facial expressions result from the actuation of a single facial muscle or of multiple facial muscles. Using FACS, expressions are coded as combinations of the Action Units and levels of muscle activation, which serves as an excellent guide to the key expression modeling task. With a Windows GUI, different expressions can be readily synthesized on the face model using the muscle and jaw parameters. For each muscle, the degree of contraction is controlled by the muscle contraction rate parameter, which is defined between 0 (passive) and 1 (maximally active). The motion of the jaw is realized by a 3D coordinate transformation which is controlled by six parameters: three rotation angles and three translation parameters. Rigid movements of the two eyes are controlled by four transformation parameters. The total of 33 facial animation parameters is grouped into a vector p = [p_1, p_2, ..., p_m]^T, where m = 33. Following the categorization of emotions in psychological studies [10], we create a set of 54 key expressions which are believed to correspond to situations eliciting different kinds of emotion. Fig. 3 shows some
examples. Each key expression is rendered into an image, and its corresponding animation parameter vector, pi , is recorded.
Figure 3. Examples of key expressions generated on the anatomy-based face model.
4. Face Deformation Subspace Model
The tracking model is trained from an annotated training set which contains a number of images from the training video sequence. In acquisition of the training data, the subjects were asked to make all kinds of expressions including the key expressions. To obtain the training set, a handful of M frames from the video sequence that exhibit pronounced differences are manually labeled with 65 feature points which are located on the eyebrows, eyes, nose, mouth, and outline of the face. Feature points are distributed evenly along each contour (see an example in Fig. 4).
Figure 4. A training face image (a) and the feature points (b).
Assume that the feature point set of the training frames is given as {P_i}_{i=1,...,M}, where P_i = ((x_1^i, y_1^i), ..., (x_K^i, y_K^i)) ∈ R^{2K} is a sequence of K (K = 65) points in the image plane. Let P̄ be the mean shape of the feature point set. P̄ is calculated after the P_i are aligned to remove the affine motion of the face. Each training image is then warped correspondingly from its original feature point set P_i to the mean shape P̄ by using a thin plate spline approach [3]. After this normalization procedure, we define a target region R ∈ R^N, which is the patch of N image pixels enclosed by the bounding box of P̄. Let the set {R_i}_{i=1,...,M} be a set of M target regions in the warped training images. For each image in the set {R_i} we construct a 1D vector by scanning it in the standard lexicographic order. We assume that the number of training images, M, is less than the number of pixels, N. The average over the M formed vectors is given by Φ_0 = (1/M) Σ_{i=1}^{M} R_i. Each formed vector differs from the average by the vector dR_i = R_i − Φ_0. We arrange the deviation vectors into a matrix D = [dR_1, dR_2, ..., dR_M]. Principal component analysis (PCA) of the matrix D yields a set of M principal orthogonal modes of variation in {R_i}, Φ_j, and their corresponding eigenvalues λ_j. The Φ_j are sorted in decreasing order of their eigenvalues. The PCA model is obtained as:

R = Φ_0 + Σ_{j=1}^{M} α_j Φ_j = Φα    (1)

where Φ = [Φ_0, Φ_1, ..., Φ_M] is the matrix consisting of the average and the M principal modes of variation in the training set, and α = [1, α_1, ..., α_M]^T is the vector of appearance parameters. The projection from R to α is

α = Φ^T R    (2)

By truncating the expansion of Eq. (1) at j = k we introduce an error whose magnitude decreases as k is increased. We choose k such that

Σ_{j=1}^{k} λ_j ≥ τ Σ_{j=1}^{M} λ_j    (3)
where τ defines the proportion of the total variation exhibited in the training set (98% in our case). By this, a k(<< N ) dimensional deformation subspace is defined by the k basis vectors, and each training image, Ri , is represented as a point in the subspace in Rk . The linear subspace basis, Φ, models non-rigid deformation of the face in generating expressions.
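As a concrete illustration of Eqs. (1)-(3), the subspace construction can be sketched as follows. This is a minimal version under our own assumptions, not the authors' code: it assumes the warped target regions are already available as columns of a matrix, and it uses the Eigen library purely for convenience.

```cpp
#include <Eigen/Dense>

// Sketch: build the deformation subspace basis Phi from M warped target regions,
// each stored as a column of R (N pixels x M images). Returns [Phi_0 | Phi_1..Phi_k],
// keeping enough modes to explain a fraction tau of the total variation (Eq. 3).
Eigen::MatrixXd buildDeformationSubspace(const Eigen::MatrixXd& R, double tau = 0.98)
{
    const int M = static_cast<int>(R.cols());
    Eigen::VectorXd phi0 = R.rowwise().mean();               // average region Phi_0
    Eigen::MatrixXd D = R.colwise() - phi0;                  // deviation vectors dR_i

    // PCA via eigen-decomposition of the M x M Gram matrix (N >> M).
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(D.transpose() * D);
    Eigen::VectorXd lambda = es.eigenvalues().reverse();     // descending eigenvalues
    Eigen::MatrixXd V = es.eigenvectors().rowwise().reverse();

    // Choose k so that the retained eigenvalues cover tau of the total variation.
    double total = lambda.sum(), partial = 0.0;
    int k = 0;
    while (k < M && partial < tau * total) partial += lambda(k++);

    Eigen::MatrixXd Phi(R.rows(), k + 1);
    Phi.col(0) = phi0;
    for (int j = 0; j < k; ++j)
        Phi.col(j + 1) = (D * V.col(j)).normalized();         // principal mode Phi_j
    return Phi;
}
```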
5. Tracking the Face
Let I(x, t) be the pixel value at the location x = (x, y)^T in the image acquired at time t. Over time, the relative motion between the subject and the camera causes the image of the face to shift. We use a warping function f(x, β) to model the rigid motion of the face, where β = [β_1, β_2, ..., β_l]^T is the motion parameter vector, with f(x, 0) = x. f is assumed to be differentiable in both x and β. Tracking a face amounts to recovering the motion parameter vector for each image in the tracking sequence. We assume that the only changes in images of the face are completely described by f, i.e., there are no changes in the illumination of the face. Our tracking model is represented by the image constancy assumption

I(f(x, β_t), t) = [Φα_t](x),  ∀x ∈ R    (4)
where I(f (x, β t ), t) is the image acquired at time t rectified with motion model f (x, β t ) and motion parameters β t . By [Φα](x) we denote the value of Φα for the pixel with position x in the image. Intuitively, Eq. (4) states that the rigidly rectified image I(f (x, β t ), t) can be expressed as a linear combination of the subspace basis vectors Φ.
Tracking a face consists of estimating for each frame in the sequence the values of the motion parameter vector β and appearance parameter vector α that minimize the following least squares objective function

O(α, β) = Σ_{x∈R} (I(f(x, β_t), t) − [Φα_t](x))^2 = ||I(β_t, t) − Φα_t||^2    (5)

where I(β_t, t) is the image of the target region, under the change of coordinates with parameters β, in vector form in an N-dimensional space:

I(β_t, t) = [I(f(x_1, β_t), t), I(f(x_2, β_t), t), ..., I(f(x_N, β_t), t)]^T    (6)
Minimizing Eq. (5) can be a difficult task as it defines a nonconvex cost function. In the absence of a good starting point, some costly global optimization approaches are required to solve this problem. In our case, by taking advantage of the continuity of face motion in the tracking sequence, we recast the tracking problem as one of determining a vector of offsets, ∆β, such that β_{t+∆t} = β_t + ∆β for a frame acquired at t + ∆t. Incorporating this modification into Eq. (5) and using a first order Taylor series expansion, we reduce the problem to a linearized version

O(α, ∆β) ≈ ||I(β_t, t + ∆t) + J∆β − Φα_{t+∆t}||^2    (7)

where J ∈ R^{N×l} is the Jacobian matrix of I with respect to the components of β, i.e., J = ∂I(β, t)/∂β evaluated at β_t. Such a linearization enables us to apply continuous optimization procedures to the tracking problem. The optimization scheme we use first assumes α constant and uses the most recent appearance parameter estimate α_t to rectify the target region. Then the solution for ∆β can be obtained by solving the set of equations ∇O = 0:

∆β = −(J^T J)^{−1} J^T [I(β_t, t + ∆t) − Φα_t]    (8)
We define the error vector as the difference between the rectified image and the linear combination of the subspace basis vectors:

e(t + ∆t) = I(β_t, t + ∆t) − Φα_t    (9)

Thus, the solution of Eq. (5) at time t + ∆t given a solution at time t is

β_{t+∆t} = β_t − (J^T J)^{−1} J^T e(t + ∆t)    (10)

From Eq. (10), we see that the obstacle to efficiently tracking the face region through the image sequence is the computational cost of estimating J for each frame, which involves the calculation of the image gradient vector. However, it is possible to reduce this computation by factoring J. Each element of J can be written as

s_ij = I_{β_j}(f(x_i, β), t) = ∇_f I(f(x_i, β), t)^T f_{β_j}(x_i, β)    (11)
By differentiating both sides of Eq. (4) with respect to x, we obtain

∇_f I(f(x, β), t)^T f_x(x, β) = ∇_x Φα    (12)

From Eqs. (11) and (12), we get

J(α, β) = [ ∇_x Φ(x_1) α f_x(x_1, β)^{−1} f_β(x_1, β)
            ∇_x Φ(x_2) α f_x(x_2, β)^{−1} f_β(x_2, β)
            ...
            ∇_x Φ(x_N) α f_x(x_N, β)^{−1} f_β(x_N, β) ]    (13)
Therefore, J can be expressed in terms of the gradient of the subspace basis vectors, ∇_x Φ, which are constant, and the appearance and motion parameters, (α, β), which are time-varying. If we choose a motion model f such that

α f_x(x, β)^{−1} f_β(x, β) = Λ(x) Γ(α, β)    (14)

where Λ and Γ are matrices depending only on the image coordinates and on the parameters (α, β), respectively, then J can be factored into

J(α, β) = [ ∇_x Φ(x_1) Λ(x_1)
            ∇_x Φ(x_2) Λ(x_2)
            ...
            ∇_x Φ(x_N) Λ(x_N) ] Γ(α, β) = J_0 Γ(α, β)    (15)
where ∇_x Φ(x_i) is the Jacobian of Φ with respect to the image coordinates, J_0 is a constant matrix, and Γ is a time-varying matrix. The columns of J_0 can be regarded as a set of fixed template images and can be pre-computed offline. By exploiting this factorization, from Eq. (8) an efficient solution for ∆β can be obtained:

∆β = −(Γ^T Ω Γ)^{−1} Γ^T J_0^T e    (16)

where Ω = J_0^T J_0. J_0 can be precomputed and stored, and only e and Γ need to be evaluated online at (α_t, β_t). Our optimization scheme then assumes β constant and computes the minimum of O(α, ∆β) with respect to α to obtain the solution for α_{t+∆t}:

α_{t+∆t} = Φ^T [I(β_t, t + ∆t) + J∆β]    (17)
The term J∆β represents the pixel value variation in I due to a motion of magnitude ∆β. Intuitively, Eq. (17) states that the appearance parameters are computed by projecting into the subspace Φ the rectified image corrected to take into account the incremental motion ∆β. In our experiments, we use a projective motion model, f(x, β) = Hx_h, where H is a 3 × 3 projective transformation matrix containing l = 8 motion parameters, and the homogeneous image coordinates x_h are related to x by x_h = (r, s, λ)^T → x = (r/λ, s/λ)^T = (x, y)^T, λ ≠ 0. The above optimization is performed using the Gauss-Newton approach. With the obtained α_{t+∆t}, we iteratively estimate ∆β using Eq. (16); convergence is normally reached within two to three iterations.
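A compact sketch of one runtime tracking update based on Eqs. (9), (16) and (17) is given below. It is an illustration under stated assumptions rather than the authors' implementation: Eigen is our choice of library, and rectify and gamma are placeholder callbacks standing in for the image rectification and the evaluation of the time-varying factor Γ.

```cpp
#include <Eigen/Dense>
#include <functional>

// Sketch of one runtime tracking update (Eqs. 16-17). rectify(beta) is assumed to
// return the target region of the new frame warped with motion parameters beta,
// and gamma(alpha, beta) to evaluate the time-varying factor Gamma. J0 and
// Omega = J0^T J0 are precomputed offline; Phi is the subspace basis.
void trackingUpdate(const Eigen::MatrixXd& Phi,
                    const Eigen::MatrixXd& J0,
                    const Eigen::MatrixXd& Omega,
                    const std::function<Eigen::VectorXd(const Eigen::VectorXd&)>& rectify,
                    const std::function<Eigen::MatrixXd(const Eigen::VectorXd&,
                                                        const Eigen::VectorXd&)>& gamma,
                    Eigen::VectorXd& alpha,   // appearance parameters, updated in place
                    Eigen::VectorXd& beta,    // motion parameters, updated in place
                    int maxIters = 3)
{
    for (int it = 0; it < maxIters; ++it) {
        Eigen::VectorXd I = rectify(beta);                    // I(beta_t, t + dt)
        Eigen::VectorXd e = I - Phi * alpha;                  // error vector, Eq. (9)
        Eigen::MatrixXd G = gamma(alpha, beta);               // Gamma(alpha_t, beta_t)
        // Efficient increment, Eq. (16): dBeta = -(G^T Omega G)^{-1} G^T J0^T e
        Eigen::VectorXd dBeta = (G.transpose() * Omega * G).ldlt()
                                    .solve(G.transpose() * J0.transpose() * e);
        dBeta = -dBeta;
        beta += dBeta;
        // Appearance update, Eq. (17): project the motion-corrected image onto Phi.
        alpha = Phi.transpose() * (I + J0 * G * dBeta);
    }
}
```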
6. Facial Expression Retargeting
The face tracker described in Section 5 provides the vector of subspace coefficients, α_t, which encapsulates the deformation appearance of the face at time t. α_t is used to estimate the animation parameters of the anatomy-based face model. We select a set of example face images {R_j}_{j=1,...,n} (n = 54) corresponding to the key facial expressions from the normalized training data set {R_i}_{i=1,...,M}, and project each example image into the constructed deformation subspace to obtain its appearance parameters:

α_j = Φ^T R_j    (18)

Let A ∈ R^{(k+1)×n} and P ∈ R^{m×n} be the matrices obtained by storing column-wise the computed appearance parameters of the example images and the pre-stored facial animation parameters corresponding to the key facial expressions, respectively. We construct a matrix G ∈ R^{(k+m+1)×n} by stacking A on top of WP:

G = [ A ; WP ] = [ α_1 ⋯ α_n ; W(p_1 ⋯ p_n) ]    (19)

where W is a diagonal matrix of weights for the facial animation parameters to compensate for the difference in scale between the animation parameters and the appearance parameters. We set W = diag(w), where w^2 is the ratio of the total variation of the appearance parameters to the total variation of the animation parameters. Applying PCA on G, we obtain a matrix Ψ ∈ R^{(k+m+1)×(q+1)} which consists of the mean vector of the examples and the q (q < n) eigenvectors corresponding to the q largest eigenvalues of the covariance matrix GG^T:

Ψ = [ Ψ_α ; Ψ_p ]    (20)

Each example pair (α_j, p_j) can be approximated as:

[ α_j ; w p_j ] = Ψ γ_j    (21)

where γ_j = [1, γ_{j,1}, ..., γ_{j,q}]^T is the vector of coefficients. By this, a concatenated parameter vector, which originally lies in R^{(k+m+1)}, can be represented as a point in the low-dimensional parameter subspace R^{(q+1)}. Ψ parameterizes the example parameter vectors and represents the relationship between the appearance parameters in A and the animation parameters in P. For each frame in the tracking sequence, once the vector of appearance parameters is obtained by the tracker, it can be represented by a linear combination of the parameter subspace basis vectors: α_t = Ψ_α γ_t, where γ_t is the unknown. γ_t is solved by

γ_t = pinv(Ψ_α) α_t    (22)

where pinv(·) is the matrix pseudo-inverse operator computed using the SVD. The vector of facial animation parameters corresponding to the tracked face at time t is then computed as:

p_t = (1/w) Ψ_p γ_t = (1/w) Ψ_p pinv(Ψ_α) α_t = C α_t    (23)

where the constant matrix C ∈ R^{m×(k+1)} is precomputed offline. Table 1 shows the implementation steps of our efficient performance-driven facial animation algorithm.
Table 1. The steps of our facial animation algorithm.

Offline:
1. Compute the gradient of the subspace basis ∇_x Φ(x) and the matrix Λ(x).
2. Compute and store J_0 and Ω.
3. Compute the parameter subspace basis Ψ.
4. Compute and store C.

Runtime:
1. Reconstruct the image vector Φα_t.
2. Use the motion parameters β_t to compute I(β_t, t + ∆t).
3. Compute e(t + ∆t) according to Eq. (9).
4. Compute Γ(α_t, β_t).
5. Compute ∆β according to Eq. (16).
6. Compute β_{t+∆t} = β_t + ∆β.
7. Compute α_{t+∆t} according to Eq. (17).
8. Go to step 5 and recompute ∆β using α_{t+∆t} until convergence.
9. Compute the facial animation parameters p_{t+∆t} according to Eq. (23).
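The offline steps 3 and 4 of Table 1, i.e. computing the parameter subspace Ψ and the retargeting matrix C of Eqs. (19)-(23), can be sketched as follows. This is a hedged illustration rather than the authors' code: Eigen, the function name buildRetargetingMatrix and the simple mean-centered PCA treatment are our assumptions.

```cpp
#include <Eigen/Dense>

// Sketch of the retargeting precomputation (Eqs. 19-23): given the example
// appearance parameters A ((k+1) x n) and animation parameters P (m x n),
// build the constant matrix C so that p_t = C * alpha_t at runtime.
Eigen::MatrixXd buildRetargetingMatrix(const Eigen::MatrixXd& A,
                                       const Eigen::MatrixXd& P,
                                       double w, int q)
{
    const int k1 = static_cast<int>(A.rows());    // k + 1
    Eigen::MatrixXd G(A.rows() + P.rows(), A.cols());
    G << A, w * P;                                // Eq. (19), W = diag(w)

    // PCA of the concatenated example vectors: mean plus q leading eigenvectors.
    Eigen::VectorXd mean = G.rowwise().mean();
    Eigen::MatrixXd D = G.colwise() - mean;
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(D * D.transpose());
    Eigen::MatrixXd Psi(G.rows(), q + 1);
    Psi.col(0) = mean;
    for (int j = 0; j < q; ++j)                   // eigenvalues ascend, take the last q
        Psi.col(j + 1) = es.eigenvectors().col(G.rows() - 1 - j);

    Eigen::MatrixXd PsiAlpha = Psi.topRows(k1);   // appearance block Psi_alpha, Eq. (20)
    Eigen::MatrixXd PsiP = Psi.bottomRows(P.rows());

    // C = (1/w) * Psi_p * pinv(Psi_alpha), Eq. (23); pseudo-inverse via SVD.
    Eigen::JacobiSVD<Eigen::MatrixXd> svd(PsiAlpha,
                                          Eigen::ComputeThinU | Eigen::ComputeThinV);
    Eigen::VectorXd s = svd.singularValues();
    Eigen::VectorXd sInv = s;
    for (int i = 0; i < s.size(); ++i)
        sInv(i) = (s(i) > 1e-10) ? 1.0 / s(i) : 0.0;
    Eigen::MatrixXd pinvPsiAlpha =
        svd.matrixV() * sInv.asDiagonal() * svd.matrixU().transpose();
    return (1.0 / w) * PsiP * pinvPsiAlpha;
}
```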
7. Results
Our facial animation system is programmed with C++/OpenGL and runs on a 3.2GHz PC with 1GB memory. In order to evaluate the accuracy of estimation of facial animation parameters, we use a synthetic sequence as the input. The anatomy-based face model is animated to generate two synthetic image sequences: a training sequence (944 frames) and a tracking sequence (1865 frames). The facial expressions in the training sequence include the key expressions and are different from those in the tracking sequence. We select 134 normalized images from the training sequence to train the tracker. The target face region contains N = 147 × 220 pixels. These selected images allow us to form the deformation subspace, Φ, for tracking. The set of 54 normalized images of the key expressions and the pre-stored animation parameter vectors is used to compute the parameter subspace, Ψ, for facial expression retargeting.
Figure 5. Some tracked images from the synthetic sequence.
Figure 6. Errors of estimated animation parameters.
Figure 7. Example frames from two live tracking sequences and expressions synthesized on the 3D face model.
Fig. 5 shows some tracking frames from the synthetic sequence. We assess the reanimation by measuring the maximum, mean, and root mean square (RMS) errors from the
Table 2. Parameter values of two experiments and performance of our system. Notation: tracking time (Tt ), retargeting time (Tr ) and expression simulation time (Ts ).
estimated animation parameters to the ground truth. Fig. 6 plots the normalized errors of all parameters (the ratio between measured errors and the maximum parameter range). The result shows that the overall parameter estimation is accurate. Some relatively large errors (e.g., the maximum errors of the parameters 3-8, 18 and 19 which control the peripheral muscles) occur when the model in the tracking sequence has a large out-of-plane rotation (> 20 degrees). We also test our system using the live video sequences of different subjects. Fig. 7 shows some tracking frames and animation snapshots. Note that we do not estimate 3D head motion, only the in-plane 2D rotation and translation extracted during the tracking stage are used to produce the global head motion. The first tracking video (male) consists of 5,100 frames, acquired by a Sony VL500 firewire camera at 30 fps. The second video (female) consists of 5,400 frames. Also, long sequences are used for training the tracker and the expression retargeting module. In particular, a set of images corresponding to the key expressions are manually selected from the training sequence. Together with the stored animation parameter vectors, the appearance parameters of these images are used to form the example parameter pairs for building the parameter subspace. Table 2 shows the parameter values used in two experiments and the average time of each process for resynthesizing facial animation. With the proposed algorithm we can achieve standard video rate performance (30Hz) for face tracking and estimation of animation parameters. The bulk of computation is consumed by the physically-based expression simulation [33]. Nevertheless, we can still achieve an average frame rate of about 15 fps animation speed on the current experimental platform.
8. Conclusion
We have presented an efficient example-based method to synthesize facial expressions on an anatomical model by retargeting captured performance. Our method automatically determines facial muscle activations from markerless video footage. In offline processing, we build a linear subspace model to compactly describe appearance variation due to facial deformations. Based upon this model, an efficient face tracker is used to track the target face region by simultaneously solving for both the motion and appearance parameters. Efficiency is gained by precomputing a set of motion templates resulting from factorization of the image Jacobian used in minimizing the tracking error function. Using a set of example pairs that consist of the appearance and animation parameters corresponding to the
key expressions, we learn an expression retargeting matrix. Given the appearance parameters provided by the tracker, facial animation parameters are estimated in real-time, and the anatomical model is animated at an interactive rate. One improvement of the existing system consists in splitting the face into the upper region (including eyes and forehead) and the lower region (including nose, mouth, and chin). This allows a more compact and accurate model of the regions of interest. It enables us to use a subspace of much lower dimension (i.e., number of eigenvectors) to model the appearance of a target region with lower dimensionality (i.e., length of an eigenvector), which would speed up the tracking process. Moreover, we would like to automatically select key expressions of the actor from the training sequence. One possibility is to build a personalized model by conforming the anatomical 3D face model to the face shape in video footage, and generate key expressions on this model. The corresponding video image can then be found by minimizing an objective function that measures the similarity between the expression in the video and 2D projection of the synthesized expression.
References
[1] S. Basu, N. Oliver, and A. Pentland. "3D modeling of human lip motion." Proc. ICCV'98, pp. 337-343, 1998.
[2] M. J. Black and A. D. Jepson. "Eigentracking: Robust matching and tracking of articulated objects using a view-based representation." International Journal of Computer Vision, 26(1): 63-84, 1998.
[3] F. L. Bookstein. "Principal warps: Thin plate splines and the decomposition of deformations." IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(6): 567-585, 1989.
[4] B. Choe, H. Lee, and H.-S. Ko. "Performance-driven muscle-based facial animation." Journal of Visualization and Computer Animation, 12: 67-79, 2001.
[5] B. Choe and H. S. Ko. "Analysis and synthesis of facial expressions with hand-generated muscle actuation basis." Proc. Computer Animation'01, pp. 12-19, 2001.
[6] E. Chuang and C. Bregler. "Performance driven facial animation using blendshape interpolation." Stanford University Computer Science Technical Report, CSTR-2002-02, April 2002.
[7] T. F. Cootes, G. J. Edwards, and C. J. Taylor. "Active appearance models." Proc. ECCV'98, vol. 2, pp. 484-498, 1998.
[8] D. DeCarlo and D. Metaxas. "Deformable model-based shape and motion analysis from images using motion residual error." Proc. ICCV'98, pp. 113-119, 1998.
[9] P. Eisert and B. Girod. "Analyzing facial expression for virtual conferencing." IEEE Computer Graphics and Applications, 18(5): 70-78, 1998.
[10] P. Ekman and W. V. Friesen. Facial Action Coding System. Consulting Psychologists Press Inc., Palo Alto, California 94306, 1978.
[11] R. Enciso, J. Li, D. Fidaleo, T.-Y. Kim, J.-Y. Noh, and U. Neumann. "Synthesis of 3D faces." International Workshop on Digital and Computational Video, December 1999.
[12] I. Essa and A. Pentland. "Coding, analysis, interpretation, and recognition of facial expressions." IEEE Tran. Pattern Analysis and Machine Intelligence, 19(7): 757-763, July 1997.
[13] B. Guenter, C. Grimm, D. Wood, H. Malvar, and F. Pighin. "Making faces." Proc. SIGGRAPH'98, pp. 55-66, July 1998.
[14] G. Hager and P. Belhumeur. "Efficient region tracking with parametric models of geometry and illuminations." IEEE Tran. Pattern Analysis and Machine Intelligence, 20(10): 1025-1039, 1998.
[15] K. Kähler, J. Haber, H. Yamauchi, and H.-P. Seidel. "Head shop: Generating animated head models with anatomical structure." Proc. ACM SIGGRAPH Symp. on Comput. Anim., pp. 55-64, 2002.
[16] R. Koch, M. Gross, and A. Bosshard. "Emotion editing using finite elements." Proc. EUROGRAPHICS'98, pp. 295-302, 1998.
[17] C. Kouadio, P. Poulin, and P. Lachapelle. "Real-time facial animation based upon a bank of 3D facial expression." Proc. Computer Animation'98, pp. 128-136, 1998.
[18] Y. Lee, D. Terzopoulos, and K. Waters. "Realistic modeling for facial animation." Proc. SIGGRAPH'95, pp. 55-62, August 1995.
[19] S. Morishima, T. Ishikawa, and D. Terzopoulos. "Facial muscle parameter decision from 2D frontal image." Proc. Int. Conf. on Pattern Recognition, vol. 1, pp. 160-162, 1998.
[20] J. Y. Noh and U. Neumann. "Expression cloning." Proc. SIGGRAPH'01, pp. 277-288, August 2001.
[21] J. Y. Noh and U. Neumann. A survey of facial modeling and animation techniques. USC Technical Report 99-705, USC, Los Angeles, CA, 1999.
[22] J. Ohya, Y. Kitamura, H. Takemura, H. Ishi, F. Kishino, and N. Terashima. "Virtual space teleconferencing: Real-time reproduction of 3D human images." Journal of Visual Communications and Image Representation, 6(1): 1-25, 1995.
[23] F. I. Parke and K. Waters. Computer Facial Animation. AK Peters, Wellesley, MA, 1996.
[24] F. I. Parke. Computer generated animation of faces. Master's thesis, University of Utah, Salt Lake City, 1972.
[25] S. Platt and N. Badler. "Animating facial expressions." Proc. SIGGRAPH'81, pp. 245-252, 1981.
[26] F. Pighin, R. Szeliski, and D. H. Salesin. "Resynthesizing facial animation through 3D model-based tracking." Proc. ICCV'99, pp. 143-150, 1999.
[27] F. Pighin, J. Hecker, D. Lischinski, R. Szeliski, and D. H. Salesin. "Synthesizing realistic facial expressions from photographs." Proc. SIGGRAPH'98, pp. 75-84, July 1998.
[28] E. Sifakis, A. Selle, A. Robinson-Mosher, and R. Fedkiw. "Simulating speech with a physics-based facial muscle model." Proc. ACM SIGGRAPH/Eurographics Symp. on Comput. Anim.'06, pp. 261-270, 2006.
[29] E. Sifakis, I. Neverov, and R. Fedkiw. "Automatic determination of facial muscle activations from sparse motion capture marker data." Proc. SIGGRAPH'05, pp. 417-425, 2005.
[30] D. Terzopoulos and K. Waters. "Physically-based facial modeling, analysis and animation." Journal of Visualization and Computer Animation, vol. 1, pp. 73-80, 1990.
[31] D. Terzopoulos and K. Waters. "Analysis and synthesis of facial image sequences using physical and anatomical models." IEEE Tran. Pattern Analysis and Machine Intelligence, 15(6): 569-579, June 1993.
[32] L. Williams. "Performance-driven facial animation." Proc. SIGGRAPH'90, pp. 235-242, August 1990.
[33] Y. Zhang, E. C. Prakash, and E. Sung. "Efficient modeling of an anatomy-based face and fast 3D facial expression synthesis." Computer Graphics Forum, 22(2): 159-169, June 2003.
In: Computer Animation ISBN 978-1-60741-559-6 © 2010 Nova Science Publishers, Inc. Editors: J.S. Wright and L.M. Hughes, pp. 145-156
Chapter 6
Dynamics for Managing Occlusion of Buildings in Panoramic Maps
Neeharika Adabala∗
Microsoft Research India, “Scientia”, 196/36, 2nd Main, Sadashivnagar, Bangalore 560080
Abstract
Panoramic maps depict urban areas in oblique view. This form of cartography was prevalent from the late sixteenth century to the early nineteenth century, when there were not many skyscrapers in urban areas. But oblique view maps in current urban scenarios suffer from loss of detail due to occlusion among closely located multistory buildings. In this work we leverage the time dimension to overcome the clutter in the space dimension by introducing functional dynamics. We define a parameter called occlusion index for an urban scene at a given viewpoint. Solving the problem of occlusion involves devising methods for visualizing the urban scene that reduce/minimize the occlusion index. We explore occlusion reduction techniques that involve selecting optimal viewpoints, displacing buildings, making buildings transparent and changing building heights. We demonstrate these approaches by presenting screen shots of the solution applied to a prototype city block, and discuss the advantages and disadvantages of these solutions. This work is pioneering in its approach to applying animation in cartography, which has previously used animations only to depict time-dependent phenomena or fly-throughs.
1. Introduction
Panoramic maps were a vibrant form of representing urban locales from the late sixteenth century to the early nineteenth century [8]. They depicted cities in oblique view and included trees, people, horse carts on roads, etc., composed together aesthetically. The beauty of panoramic maps makes them popular wall hangings to this day. These maps in oblique view are also appealing because they represent landmarks in three dimensions, making them easier to recognize for map users while finding their way in a city.
∗ E-mail address: [email protected]
The making of panoramic maps disappeared over time, as it involved extensive manual effort by skilled artists. Also, current-day urban areas contain several buildings that occlude each other, making oblique view maps unattractive and lacking in information. Recent advances in computer graphics and computer vision have enabled creation of large-scale urban models [15, 13, 14] that can be used to generate panoramic maps. These techniques significantly reduce the manual effort of panoramic map artists. However, the problem of extensive building occlusion severely limits the usefulness of the maps. Therefore there is a need to explore techniques to overcome this limitation and increase the visibility of buildings. A simple way to address the problem of occlusion is to modify the viewpoint of the map. This approach is applicable for either static maps printed on paper or dynamic maps displayed by online mapping services. However, it is not always possible to find an acceptable solution, as one or more buildings may be occluded in every possible viewpoint. Therefore a solution cannot be guaranteed by this approach. When it is not possible to find an ideal viewpoint, it is clear that there is no possible solution in the three dimensions of space. We overcome this problem by resorting to the time dimension: we introduce the concept of functional dynamics into the rendering to solve the occlusion problem. The solutions can take various forms; the specific techniques that we describe in this work include displacing buildings to improve visibility, viewpoint-dependent building transparency and altering building heights. These solutions are especially relevant in the context of the recent trend towards providing online mapping services on digital displays. These online maps no longer have to be static. The modifications based on the proposed approaches are applied to entities (buildings) present in the map, resulting in a time-varying appearance of the map. The above time variation in maps during user interaction is different from the typical dynamics that are often incorporated in contemporary maps to depict time-dependent phenomena (for example, the evolution of a storm, or the expansion of an empire with time). We note the distinction of the dynamics introduced by us to solve the problem of occlusion by emphasizing that this dynamic is unrelated to the physical concept of time elapsed. Therefore the animation is said to be functional and has the sole function of reducing occlusion between buildings. This differentiation is important when developing GI systems that need to visualize large volumes of information by employing both functional dynamics and animations of time-dependent phenomena. Functional dynamics impacts the user interaction with the geographic information system; an effective implementation of functional dynamics results in conveying more information to the user with less interaction. The rest of the paper is organized as follows. The following section gives an overview of related work. In section 3 we define a parameter called occlusion index, which characterizes the extent to which the buildings are occluded for a given viewpoint. The visibility of the buildings in a map can be optimized by selecting a configuration that minimizes the occlusion index. Sections 4 to 7 describe techniques to manipulate the oblique view maps to minimize the value of the occlusion index when rendering an urban region model.
We present example results on a hypothetical synthesized city block to illustrate the working of the proposed technique in section 8 and give conclusions in section 9.
2. Related Work
Several approaches to large-scale urban modeling have been developed and are described in [10]. Other interesting results on non-photorealistic rendering of city models inspired by panoramic maps are described by Buchholz et al. [2], and techniques for rendering them stylistically are presented in the work of Adabala [1] and Döllner and Walther [4]. A technique for procedurally creating city models is presented in [16]. The rendering of the models [16] generated in bird's-eye view results in impressive panoramic views of urban areas; however, the resulting images cannot be used as maps, as several of the buildings occlude each other. The problem of being able to use the rendered results as maps has not been addressed. Landmarks are best represented by three-dimensional models in oblique view; they have a key role in the user-friendliness of a map and are often seen in tourist maps. Therefore techniques to make oblique view maps from models of cities are attractive. There has been a significant amount of work done in the area of determining optimal viewpoints for rendering of graphics models. These include the work of Pere-Pau Vázquez et al. [17], who develop a method of determining the optimal viewpoint for viewing a geometric model by defining viewpoint entropy. The entropy attempts to capture the degree of visibility of features of the geometric model; higher entropy of a viewpoint indicates greater visible information. A large body of work exists in the computer vision literature on finding optimal viewpoints at which to place cameras and lights such that the resulting image is most suitable for applying computer vision algorithms. Capture of edges and features of objects is crucial for identifying objects. Therefore these algorithms have mainly focused on single objects rather than scenes where elements occlude each other [19]. All research in the context of aspect graphs in computer vision is relevant to this work; however, the concept of an aspect graph is very general and has to be adapted significantly to be applied in a particular context. Another group of pertinent algorithms in computer vision is developed in the context of robot path planning; indoor environments form the main subject of these studies [12]. Rendering algorithms like radiosity and ray-tracing often solve the visibility problem in the context of whether light from one surface reaches another [7]. Similarly, shadow-casting algorithms also solve a visibility problem. The work by Wonka and Schmalstieg [18] solves a problem of visibility in the context of walk-throughs in urban environments. A survey of walk-through related visibility results can be found in [3]. A summary of several studies on visibility can be found in the excellent multidisciplinary survey of visibility studies by Durand [5, 6]. Another interesting group of problems that require viewpoint analysis are those that can be mapped onto the art gallery problem. Several analytical solutions have been proposed for the problem that are based on polygon triangulation [11]. The solutions limit themselves to two dimensions by considering the art gallery as a polygon. Interesting solutions have been developed for this problem; however, they cannot be easily applied in the context of our problem. All these visibility-related solutions treat the graphics models as a collection of triangles and define optimization techniques applicable to the triangles. The number of triangles can be large, making the optimization algorithms slow.
Also, the triangles cannot be associated with the semantics of the model they represent. In this work we introduce a simple concept
of occlusion index that is specific to urban models. The occlusion index provides a tool that enables us to retain the semantic significance of each building as an entity while optimizing visibility. The work on automatically generating tourist maps [9] explores the problem of reducing occlusion in tourist maps, and addresses it by adopting the traditional technique used by artists, which involves expanding the roads. This technique considers static maps and emphasizes the visibility of roads. The idea of using functional dynamics to overcome occlusion and improve visualization has not been explored prior to this work.
3. The Occlusion Index
In our approach buildings within the region of interest are represented by their bounding boxes for the purpose of visibility computations. This is quite accurate for rectangular building blocks, and is acceptable for more irregularly shaped buildings as well. A general rule of thumb is that such a bounding box is suitable for all buildings that can be modeled completely by extruding a footprint to its final height with no modifications in cross section. This approximation can lead to faulty results when the building tapers towards the top. In this case, if the full height of the building is considered in the bounding box, then the solutions can be erroneous for the visibility of the tapering portion: a viewpoint in which only a small portion of the tapered tip of the building is visible may be selected as an optimal viewpoint. We overcome this problem by limiting the height of the bounding box to the non-tapered height of the building when it is being evaluated for visibility. We use the bounding box for the full height of the building in computations where it occludes other buildings, so that occlusion by the tapering part is not ignored. Using a variable bounding box, illustrated in figure 1, is a significant approximation to visibility computations on actual geometry, but we err on the side of being more conservative in our estimates. This leads to reasonable solutions while keeping the computations simple. Once the bounding boxes that represent the buildings have been identified, we determine the extent of occlusion of buildings for a given viewpoint using these bounding boxes. For this purpose we define a parameter called occlusion index that is an aggregate of the fraction of buildings occluded for a given viewpoint of the urban region. The building occlusion bo_i of a building i for a viewpoint is defined by

bo_i = Σ_{j≠i} BlkFrac_ij

where BlkFrac_ij is the fraction of the visible area of building i that is blocked by building j. We apply the model's transformation matrix and the projection matrix to compute the screen coordinates of each of the models, and the value of BlkFrac_ij is computed in screen coordinate space. In the case of buildings with tapering roofs or buildings that require the use of the variable bounding box approach, we use the smaller height value shown in figure 1 while computing the value of BlkFrac_ij of the building i, and the larger height value when it occurs as the building j occluding the building i behind it. Thus we are able to separate the height property of a building during visibility computation and occlusion computation.
Figure 1. Bounding boxes of buildings when a building cannot be thought of as an extrusion of its footprint.
When we want to find a solution for a subset of the total number of buildings in the scene, we limit the computation of bo_i to the buildings of interest. However, the computation of the values BlkFrac_ij is performed over all buildings j in the scene. The occlusion index for a given viewpoint and position of buildings is given by

O = Σ_i bo_i

where the sum is over all buildings i in the scene for which bo_i is computed. The following four sections describe ways in which this occlusion index can be exploited to reduce occlusion among buildings in panoramic maps.
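To make the definition concrete, the following sketch computes bo_i and O under simplifying assumptions that are ours, not the paper's: each projected bounding box is approximated by an axis-aligned screen rectangle, and buildings are assumed to be supplied in front-to-back order for the current viewpoint.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch of the occlusion-index computation. Each building's projected bounding
// box is approximated by an axis-aligned screen rectangle, and the buildings are
// assumed to be given in front-to-back order for the current viewpoint.
struct ScreenRect { double xmin, ymin, xmax, ymax; };

double overlapArea(const ScreenRect& a, const ScreenRect& b) {
    double w = std::max(0.0, std::min(a.xmax, b.xmax) - std::max(a.xmin, b.xmin));
    double h = std::max(0.0, std::min(a.ymax, b.ymax) - std::max(a.ymin, b.ymin));
    return w * h;
}

double area(const ScreenRect& r) { return (r.xmax - r.xmin) * (r.ymax - r.ymin); }

// bo_i = sum over j != i of BlkFrac_ij; O = sum of bo_i over the buildings of interest.
double occlusionIndex(const std::vector<ScreenRect>& buildings,
                      const std::vector<bool>& ofInterest)
{
    double O = 0.0;
    for (std::size_t i = 0; i < buildings.size(); ++i) {
        if (!ofInterest[i]) continue;
        double boi = 0.0;
        for (std::size_t j = 0; j < buildings.size(); ++j) {
            if (j == i) continue;
            // Fraction of building i's area blocked by building j; with the assumed
            // front-to-back ordering, only buildings with a smaller index can block i.
            if (j < i)
                boi += overlapArea(buildings[i], buildings[j]) / area(buildings[i]);
        }
        O += boi;
    }
    return O;
}
```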
4. Modification of Viewpoint
An obvious way to handle occlusion is to change the viewpoint [6]; to do this, the occlusion index is computed repeatedly for rotating viewpoints. The viewing direction that has the minimum value of the occlusion index is selected as the optimal viewpoint. The algorithm for determining the optimal viewpoint is outlined below:
1. Select the region or buildings for which the optimal viewpoint is to be determined. The region can be selected by truncating an urban scene to an extended bounded region around the currently visible portion of the environment on the screen. If buildings are to be selected, this can be done by querying a database that contains additional information about the functionality of the buildings.
2. Select the angle of elevation and zoom level at which to display the panoramic map of the region.
3. Determine bounding boxes for the buildings.
4. Rotate 360 degrees about the center of the selected region, and evaluate the occlusion index for each viewpoint:
(a) For each building i in the scene compute the building occlusion term bo_i by considering all the buildings that can block it for the given viewpoint.
(b) Sum up the values of bo_i to find the occlusion index for the scene.
5. Fix the viewpoint to the direction at which the occlusion index assumes its minimum value.
When no efficient data structure is used for storing the building information, this is an O(n²) algorithm when there are n buildings in the scene and we are optimizing for all of them. If m buildings are selected as buildings of interest, then the algorithm complexity is O(mn). This approach may not always result in a solution in which at least some part of every building is seen, especially in overcrowded downtown areas. Also, modifying the viewpoint is not applicable in all cases; often the cartographer or user may decide which direction of the urban area (north, south, east, or west) should be displayed at the top of the map. Therefore there is a need to explore techniques to optimize the visibility of buildings after the orientation of the map has been selected. The following three sections describe techniques that use functional animation to optimize the visibility of buildings.
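A minimal sketch of the viewpoint search is given below. It reuses ScreenRect and occlusionIndex from the earlier sketch, and projectBuildings is a placeholder we assume is provided by the rendering pipeline; neither the step size nor the function names come from the original system.

```cpp
#include <limits>
#include <vector>

// Sketch of the optimal-viewpoint search: rotate about the centre of the selected
// region and keep the heading with the smallest occlusion index. projectBuildings()
// is an assumed helper that projects all bounding boxes to screen rectangles for a
// given heading; ScreenRect and occlusionIndex are from the previous sketch.
std::vector<ScreenRect> projectBuildings(double headingDeg);   // assumed available

double findOptimalHeading(const std::vector<bool>& ofInterest,
                          double stepDeg = 1.0)
{
    double bestHeading = 0.0;
    double bestO = std::numeric_limits<double>::max();
    for (double heading = 0.0; heading < 360.0; heading += stepDeg) {
        std::vector<ScreenRect> rects = projectBuildings(heading);
        double O = occlusionIndex(rects, ofInterest);
        if (O < bestO) { bestO = O; bestHeading = heading; }
    }
    return bestHeading;
}
```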
5. Displacing Buildings
In this technique for handling occlusion we shift buildings about their original positions by small amounts when they occlude each other. This movement is dependent on the viewpoint, and it can make completely occluded buildings become partially visible in the map. Constraints are applied to the extent to which a building can be moved from its original position in order to maintain the usefulness of the map: a building cannot change its relative location with respect to roads or with respect to other buildings. The occlusion index is minimized to obtain the most suitable building locations for a given viewpoint. This approach cannot guarantee a solution in all cases, especially in overcrowded downtown areas where the buildings have only a limited region in which they can be displaced.
6. Making Buildings Transparent
An alternative dynamic approach is to make the buildings that are blocking other buildings transparent. This is again a viewpoint-dependent approach. When a building becomes transparent, its contribution to the occlusion index O is considered to be zero. This approach guarantees that all the buildings are visible from any given viewpoint. However, it can happen that for certain viewpoints several buildings must become transparent to enable the visibility of all the buildings. Alternative approaches that assign a depth-dependent translucency to the buildings can be adopted in this case. Here the contribution of a building to the occlusion index is weighted by the translucency associated with the buildings. Therefore the occlusion index is given by O = Σ_i α_i bo_i, where α_i is the transparency associated with building i, and the values of bo_i are computed with the equation bo_i = Σ_{j≠i} α_j BlkFrac_ij, where α_j is the transparency associated with building j.
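Continuing the earlier sketch, the translucency-weighted occlusion index can be written as follows. The interpretation of α as an opacity-like weight (0 for a fully transparent building) follows the formula above, and the same screen-rectangle and front-to-back ordering assumptions apply; none of the names are from the original system.

```cpp
#include <cstddef>
#include <vector>

// Sketch of the translucency-weighted occlusion index: alpha[i] in [0,1] weights
// building i (0 = fully transparent, so it contributes nothing; 1 = fully opaque).
// ScreenRect, overlapArea and area are from the earlier sketch.
double weightedOcclusionIndex(const std::vector<ScreenRect>& buildings,
                              const std::vector<double>& alpha,
                              const std::vector<bool>& ofInterest)
{
    double O = 0.0;
    for (std::size_t i = 0; i < buildings.size(); ++i) {
        if (!ofInterest[i]) continue;
        double boi = 0.0;
        for (std::size_t j = 0; j < i; ++j)   // buildings assumed in front of i
            boi += alpha[j] * overlapArea(buildings[i], buildings[j]) / area(buildings[i]);
        O += alpha[i] * boi;                  // weight building i's term by alpha_i
    }
    return O;
}
```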
Figure 2. Optimal viewpoint for a simple urban scene.
7. Altering Heights of Buildings

An alternative approach is to scale down the height of obstructing buildings. This approach is adopted in cases where the accuracy of the skyline and scale is not important, and in stylized maps where the information and aesthetics are more emphasized. This kind of distortion is often seen in tourist maps, where some of the monuments of a city are drawn to the scale of their fame rather than their physical size. Different ways of changing the heights of buildings can be explored. For example, one can exchange the heights of two buildings when a taller building obstructs a shorter one; a small sketch of this exchange is given below. This way of changing heights prevents an overly uniform appearance of buildings, and it also prevents excessive changes of building heights as the viewpoint changes. Alternatively, one can assign heights in ascending order to the buildings for a given viewpoint, but this approach can result in a regular, monotonous representation of the urban area. In the approaches described here, where we shorten the buildings in the foreground relative to the ones at the back, the occlusion index is reduced because the values of BlkFrac_ij decrease when shorter buildings are in front. Note that the approaches of manipulating properties of the buildings present in a map are not completely novel. In fact, these approaches have been used extensively in the static visualizations of cities found in the Nuremberg Chronicle: buildings were moved and heights changed in the town plans printed in this fifteenth-century incunabulum.
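The exchange-of-heights variant can be sketched as follows, again reusing the Building and BlkFrac() declarations assumed earlier. The obstruction threshold is an invented parameter; when building j in front both exceeds it and is taller than the building i it obstructs, the two heights are swapped.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Exchange heights between an obstructing (taller) building in front and the
// (shorter) building it hides, as in the stylized-map approach above.
void ExchangeObstructingHeights(std::vector<Building>& b, double viewAngle,
                                double blockThreshold = 0.5) {
    for (std::size_t i = 0; i < b.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j) {
            if (i == j) continue;
            if (BlkFrac(b, i, j, viewAngle) > blockThreshold &&
                b[j].height > b[i].height)
                std::swap(b[j].height, b[i].height);
        }
}
```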
8. Results and Discussion
To demonstrate our results, the model of an urban area was synthesized such that there are several multistory buildings in adjacent blocks. In real-world scenarios a subset of buildings from the map may be displayed in a separate window, similar to the panoramic map viewing tool provided at [8], and our techniques can work in real time in this window. We have implemented our algorithm in C++ and OpenGL on a Windows system. The results are illustrated in figures 2 to 6. Figure 2 shows the synthesized simple urban area at an optimal viewpoint. In the case of the results shown in figure 3, the model was created by considering an aerial image of a region in Las Vegas, shown on the left. We marked out the footprints of some
complex-shaped buildings from this image and created a model by extruding these footprints to random heights. The variation of the occlusion index for this model is given in figure 4. Notice that the occlusion index peaks at two points; these are the two viewpoints at which the buildings line up one behind the other in two rows, shown at the bottom right of figure 3. There are two points at which the occlusion index reaches a minimal value, and one such viewpoint is shown at the top right of figure 3.

Figure 3. Optimal viewpoint for an urban scene with irregularly shaped building footprints. (Left) Aerial image of the urban area - a region in Las Vegas. (Right, above) Optimal viewpoint for the urban area model created from the aerial image. (Right, below) The two viewpoints at which the occlusion index reaches peak values.

A limitation of this approach is that it cannot guarantee that all buildings are visible at optimal viewpoints. At some elevations of viewing the urban region, some of the buildings are completely occluded even at optimal viewpoints. The other three approaches make use of functional dynamics to enhance the information visible to the user. Frames captured during interaction with the synthesized urban block are presented in figures 5 to 8. Observe the occlusion of the pink building in each case. Figure 5 shows the completely occluded building. In figure 6 the green building in front is displaced suitably, and the pink building becomes partially visible. In figure 7 the buildings obstructing a large portion of the buildings behind them become transparent. A threshold value on the extent of obstruction is used to decide when a building should be made transparent; this value is provided as a user control. A low threshold value makes buildings become transparent even when they occlude only a small fraction of the building behind them. In the implementation shown in figure 7 we made the building faces fully transparent and retained only the edges, to indicate that a building is currently transparent from this viewpoint and to improve visibility. Figure 8 shows the results when the buildings change height. Such a modification of the ground truth can be disorienting to a user. However, this can be mitigated if the buildings are labeled and the user is aware that the map is stylized and not to scale with respect to the building heights.
Figure 4. Graph of the variation of the occlusion index with change in viewpoint for the scene of figure 3. Notice the two peaks, which occur when the viewpoint coincides with the two directions in which the buildings are apparently lined up one behind the other, as shown in figure 3.

Our initial experimentation with users showed that they find the approach interesting; however, more detailed user studies are required before this approach sees wider use. These approaches enable viewing more information in a single frame, therefore a better picture of the relative locations of buildings in the urban region is obtained with less interaction. Without the support of functional dynamics, a more thorough manipulation of the region would have been required.
9. Conclusions
In this work we addressed the issue of information loss in oblique-view maps due to occlusion among closely located multistory buildings.
Figure 5. Handling occlusion of buildings in panoramic maps; observe the pink building in all cases. The pink building is completely occluded for this viewpoint.
Figure 6. Moving position of buildings slightly.
Figure 7. Making buildings transparent.
Figure 8. Changing height of buildings - exchange of height between obstructing building and building at the back.
Maps are abstract representations of spatial information; we therefore manipulated the visualization scheme to maximize the information presented to the user. We defined a parameter called the occlusion index in the context of urban areas with multistory buildings, and determined optimal viewpoints for the region of interest. This solution did not always guarantee the visibility of all buildings in the urban region of interest. We therefore proposed alternative solutions to visualize overcrowded three-dimensional spaces by taking recourse to the fourth, 'time' dimension; we introduced the concept of functional dynamics. We applied this approach and suggested techniques that displace buildings slightly from their original positions, make buildings transparent, and change the heights of buildings to improve the visibility of information in a region. It should be noted that while our work demonstrated the results individually for each of the approaches, they can be combined to give more effective solutions. That is, buildings can be displaced, made transparent, and scaled simultaneously to create optimal visualizations of urban regions. The concept of introducing dynamics to improve visualization is relatively new to cartography, and it is important to conduct user studies to learn which approaches are more attractive and effective for users. The initial reaction of users to the proposed techniques has been encouraging.
Acknowledgment

The author would like to acknowledge the discussions with Dr. Kentaro Toyama that led to the development of this work.
References

[1] N. Adabala. A technique for building representation in oblique-view maps of modern urban areas. To appear in The Cartographic Journal, 2009.
[2] H. Buchholz, J. Döllner, M. Nienhaus, and F. Kirsch. Real-time non-photorealistic rendering of 3D city models. In Proceedings of the 1st International Workshop on Next Generation 3D City Models, 2005.
[3] D. Cohen-Or, Y. Chrysanthou, C. Silva, and F. Durand. A survey of visibility for walkthrough applications. IEEE Transactions on Visualization and Computer Graphics, 9(3):412–431, July–Sept. 2003.
[4] J. Döllner and M. Walther. Real-time expressive rendering of city models. In Seventh International Conference on Information Visualization, Proceedings IEEE 2003 Information Visualization, pages 245–250, 2003.
[5] F. Durand. A multidisciplinary survey of visibility. In ACM SIGGRAPH course notes: Visibility, Problems, Techniques, and Applications, July 2000.
[6] F. Durand. 3D Visibility: Analytical Study and Applications. PhD thesis, Université Joseph Fourier, Grenoble I, July 1999. http://www-imagis.imag.fr.
[7] F. Durand, G. Drettakis, and C. Puech. Fast and accurate hierarchical radiosity using global visibility. ACM Transactions on Graphics, 18(2):128–170, 1999.
[8] Geography and Maps Division. The Library of Congress: Panoramic maps collection. http://memory.loc.gov/ammem/pmhtml/panhome.html.
[9] F. Grabler, M. Agrawala, R. W. Sumner, and M. Pauly. Automatic generation of tourist maps. ACM Trans. Graph., 27(3):1–11, 2008.
[10] J. Hu, S. You, and U. Neumann. Approaches to large-scale urban modeling. IEEE Comput. Graph. Appl., 23(6):62–69, 2003.
[11] D. T. Lee and A. K. Lin. Computational complexity of art gallery problems. IEEE Trans. Inf. Theor., 32(2):276–282, 1986.
[12] J. Lengyel, M. Reichert, B. R. Donald, and D. P. Greenberg. Real-time robot motion planning using rasterizing computer graphics hardware. In SIGGRAPH '90: Proceedings of the 17th Annual Conference on Computer Graphics and Interactive Techniques, pages 327–335, New York, NY, USA, 1990. ACM Press.
[13] P. Müller, P. Wonka, S. Haegler, A. Ulmer, and L. V. Gool. Procedural modeling of buildings. In SIGGRAPH '06: ACM SIGGRAPH 2006 Papers, pages 614–623, New York, NY, USA, 2006. ACM.
[14] P. Müller, G. Zeng, P. Wonka, and L. V. Gool. Image-based procedural modeling of facades. ACM Trans. Graph., 26(3):85, 2007.
[15] Y. I. H. Parish and P. Müller. Procedural modeling of cities. In SIGGRAPH '01: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pages 301–308, New York, NY, USA, 2001. ACM.
[16] Y. I. H. Parish and P. Müller. Procedural modeling of cities. In SIGGRAPH '01: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, pages 301–308, New York, NY, USA, 2001. ACM Press.
[17] P.-P. Vázquez, M. Feixas, M. Sbert, and W. Heidrich. Viewpoint selection using viewpoint entropy. In VMV '01: Proceedings of the Vision Modeling and Visualization Conference 2001, pages 273–280, 2001.
[18] P. Wonka and D. Schmalstieg. Occluder shadows for fast walkthroughs of urban environments. In P. Brunet and R. Scopigno, editors, Computer Graphics Forum (Eurographics '99), volume 18(3), pages 51–60. The Eurographics Association and Blackwell Publishers, 1999.
[19] S. Yi, M. Haralick, and L. G. Shapiro. Automatic sensor and light source positioning for machine vision. In Proceedings of the 10th International Conference on Pattern Recognition, pages 55–59, Piscataway, NJ, USA, 1990. IEEE Press.
In: Computer Animation Editors: J.S. Wright and L.M. Hughes, pp. 157-175
ISBN: 978-1-60741-559-6 © 2010 Nova Science Publishers, Inc.
Chapter 7
CONSTRAINT-BASED AND FEATURE-BASED CAD SYSTEMS AND APPLICATIONS

Ioannis Fudos and Vasiliki Stamati
Department of Computer Science, University of Ioannina, Greece
Abstract

A new generation of Computer Aided Design systems has become available in which geometric constraints can be defined to determine properties of large designs. The new design concept, often called constraint-based design or design by features, offers users the capability of easily defining and modifying a design, but introduces the problem of solving complicated, not always well defined, constraint problems. Traditional parametric models can also be enhanced to partially support declarative constraint-based descriptions. We provide an overview of representation schemes for CAD applications. Then we present a survey of methods for geometric constraint solving appropriate for Computer Aided Design. We demonstrate how these representations and constraint solving methods can be combined or adapted to support a broad range of CAD applications by presenting two example cases in which a feature-based, constraint-based representation scheme was successfully used to support two different CAD applications.
1. Introduction

Computer Aided Design has been the motivation for major breakthroughs in various fields of computer science such as computer graphics, visualization, and computer architecture. On the other hand, advancements in graph theory, geometric constraint solving, algorithms and data structures have enabled the use of computers in various fields such as manufacturing, VLSI design, reverse engineering, and restoration of artifacts. This new setting has established Computer Aided Design as a major framework for designing and editing machine parts, jewelry, archaeological findings, buildings, electronics and computers. We present a
framework for representing and editing CAD models for various applications based on two main concepts:

- Local characteristics, which are commonly called features. These characteristics may determine structural properties such as connectivity and hierarchy of objects; geometric properties such as dimensions, distances, angles, smoothness, inclusion, and other topological relations; and finally functionality properties that are usually application dependent and determine the behavior of the system when performing its required tasks.
- Local or global constraints imposed on the model to enforce complex geometric structures and advanced functionality. Such constraints may be part of a feature or span a number of different features. Constraints are clearly declarative, meaning that there is no suggested enforcement procedure; the system determines, based on the application and the user interaction, how to enforce the system of geometric constraints.

The rest of this chapter is structured as follows. Section 2 presents a comparative survey of representation models for CAD applications; the hybrid feature-based and constraint-based scheme is argued to be the most appropriate for a diverse collection of CAD applications. Section 3 presents methods to tackle the bottleneck problem of geometric constraint solving; traditional methods are described and modern approaches that are not domain sensitive are presented. Section 4 presents two example cases of applying the framework to two different CAD applications. Section 5 offers conclusions.
2. Computer Aided Design Representation Schemes

There is a variety of geometric representations that can be used at different levels of CAD applications. The suitable representation scheme for each application depends on the scope of the application and its peculiarities. Some modeling types are simple and aim at providing only an external representation of the object, whereas others aim at encapsulating and providing additional knowledge and data, such as design intent, functionality, and editability. In the following we study common modeling schemes used in CAD applications. An object can be represented in the simple form of raw data, such as a point cloud corresponding to points on the surface of the object. A widespread scheme in solid modeling is the Boundary Representation (B-rep) model, where the facets and edges that describe the boundary of a solid are modeled using a connectivity graph and a collection of surface and edge patches. On the other hand, Constructive Solid Geometry (CSG) and volume models handle objects as 3D solids. There are also higher-level representation schemes that capture not only the shape of the object but also provide information pertaining to design intent and functionality, which can be used later on for re-parameterization and modification. We briefly describe each scheme and evaluate its suitability for various CAD applications.
2.1. Raw Data

The most basic and simple way to represent a 3D object is as raw data. By raw data we mean an unstructured collection of geometric primitives such as a point cloud or a range image. Such data are usually produced directly from a 3D object scanning or 3D reconstruction setting. The density of the data sets produced by these methods depends on the sampling rate
used to acquire information from the object's surface. Also, very often the point clouds obtained contain noisy data due to physical characteristics of the object or limitations and regulations of the acquisition method used. However, processing methods have been suggested that overcome this problem. The characteristic of this representation model is that it describes the object as discrete data, i.e. points, without providing any information about the connectivity, the topological relations among geometric primitives or the design intent. This type of representation is mainly used in point-based modeling, e.g. [1], [2], and in reverse engineering applications [3].
2.2. Boundary Representation (Brep)

A more common representation model in CAD applications is the boundary representation model (Brep), which describes the edges and facets of the boundary of the object. This type of model consists of a collection of surface patches. Surfaces can capture objects of complex and freeform design. Thanks to advancements in computer graphics hardware we are able to handle efficiently the CPU-intensive processing required by Brep. These factors have resulted in the increased usage of this representation in a wide spectrum of applications. A Brep model is often realized as a mesh of triangular or quadrilateral (and in general polygonal) planar or higher-degree surface facets. Planar polygonal meshes (called polyhedral representations) are mostly suited for rendering and virtual reality and not for CAD applications, since they do not provide sufficient detail. Often, other representation schemes are converted to polygonal representations for the purpose of rendering. Polyhedral representations such as triangulations are also used in reverse engineering applications, usually as intermediate representations during the reengineering process. A drawback of representing a 3D object with a polygonal mesh is that it cannot capture design semantics, such as design intent, inter-part relations and overall behavior. Also, model editing is only feasible in a local corrective sense. Smooth object surfaces cannot efficiently and accurately be represented by a polygonal mesh, even when a large number of polygons are used, since the polyhedral representation by definition cannot accommodate G1 continuity. For example, to render areas of high curvature quite accurately we need to increase the number of polygons and decrease significantly the facet size. Overall, the polyhedral representation is not suitable for describing objects with specific design characteristics and functionality, such as mechanical and industrial parts. It is also not appropriate for describing complex and detailed objects, since the large number of polygons needed to sufficiently approximate the initial object makes the method unaffordable both time-wise and space-wise. Applications such as aesthetic and industrial engineering, reverse engineering and jewellery design commonly use non-planar surfaces to capture the boundaries of complex objects [4]. A Brep model may be constructed using NURBS (Non-Uniform Rational B-Splines) or other parametric surface patches. This type of representation is useful in applications where free-form surfaces are part of the repertoire of primitive geometric entities. Brep can capture almost any type of object, such as mechanical parts and objects of aesthetic design. Surfaces can be described using appropriate parametric representations. Brep models make editing of local features feasible by interactively placing control points,
therefore modifying the shape or curvature of the object’s feature. However, Brep models on their own do not capture higher design characteristics of the object such as functionality and part relationships. The information provided through this type of model is limited and does not provide tools for modifying parts of the model that affect the whole design. Therefore, Brep models are used in combination with other techniques (e.g. features, constraints) to obtain higher-level descriptions that correspond to more flexible and useful models that are suitable for CAD applications. For instance, in [5], the authors present a beautification process based on constraints which is performed on B-rep models constructed from reverse engineering range data. B-rep models acquired by re-engineering can present various inaccuracies and errors, therefore the authors suggest the beautification of the models by describing topological regularities in terms of geometric constraints.
2.3. Volume Modeling

While surface raw data and Brep modeling schemes provide data concerning the boundary of a model, constructive solid geometry (CSG) and volumetric models represent the objects as a volume. This type of representation can be used for objects that B-rep cannot sufficiently describe. For example, a Brep model cannot represent unambiguously a sphere containing a hollow, whereas a volume model can easily capture such solids. Constructive solid geometry (CSG) models are created by performing Boolean operations on solid primitives, e.g. spheres, cones, cylinders and cubes. We perceive that CSG models represent objects that can be created from solid primitives. Unless we use a very large number of primitives we cannot use CSG to model higher-degree free-form objects. In general, the CSG representation scheme is well suited for mechanical part design and for all applications where the design history can be expressed as a tree of Boolean operations on geometric primitives. Editing and local shape modification are performed by intervening in the appropriate operation (internal tree node). Converting CSG models to renderable ones is extremely difficult, and therefore CSG is commonly used in conjunction with Brep. In this case a Brep model is always maintained and every modification is transformed to an incremental Brep editing operation. Constraints may also be used in conjunction with CSG for performing multiple internal node modifications in a single step. Volume pixels (voxels) are used in a volumetric approach to 3D object representation. A voxel is a geometric primitive and represents the smallest discrete volume used in this representation scheme. Voxel-based representations are commonly used for visualizing unstructured 3D volume data, such as data from scientific computing, medical imaging, etc. Although used in early CAD/CAM settings, volumetric representations have proven to be very inefficient for computer aided editing, rendering and manufacturing. This representation scheme may be used as redundant auxiliary information in CAD applications [6] such as solid modeling, reverse engineering and feature-based and constraint-based modeling, for the purposes of physical modeling and simulation.
2.4. Higher-Level Representations in CAD

A current promising trend in computer-aided design is to use higher-level structures for model representation. These structures are based on one of the former representation types in combination with additional structural, topological or other information. A feature-based representation scheme describes the object as a combination of features, which are surfaces or solid parts with specific characteristics. A constraint-based representation scheme uses geometric constraints enforced on the model and its features to obtain a more accurate representation that captures designer requirements. The skeleton of a model can also be considered a higher-level CAD representation that can be used for specific operations such as feature detection and extraction. More specifically, the feature-based model is a representation scheme that is growing more and more popular. The model is described by defining collections of feature elements and relationships among them. The features are collections of points, surfaces or other features. For example, a commonly used feature type is a cross-section of a solid. Constraints are applied to the features to create more accurate and robust models, but also to enforce global criteria such as tolerance and beautification. This type of model representation was initially established for manufacturing mechanical parts, where a library of features is created and then relationships among feature elements are enforced. The feature-based scheme is well suited to industrial design in general since it provides for advanced editability. This is due to the knowledge encapsulated by the model concerning tolerances, constraints, relationships and connectivity. For this reason, feature-based methods are often characterized as knowledge-based. Their main objective is to exploit any knowledge and information pertaining to design intent, functionality and the construction process. Besides, this representation scheme supports collaborative CAD, reverse engineering and VLSI applications. This type of model also provides the user-designer with the capability of editing, redesigning and reconstructing the original design, depending on her preferences and needs, by tailoring the model features [7]. A powerful higher-level structure for representing objects is the constraint-based scheme, which is often used in combination with features [8]. This representation scheme is particularly preferred in CAD applications where the objects being modeled, modified and manufactured are of geometric or freeform design and must conform to constraints determined locally on specific components or globally on the whole model [9]. Constraints defined on a model or its individual components can refer to almost any characteristic, i.e. geometric attributes such as size and shape, topological characteristics such as placement and connectivity, functionality and behavior. Constraint-based models are widely used in architecture, mechanical engineering, electronic design, and aesthetic and industrial design, for design, modeling or re-engineering. The types of constraints defined depend on the nature of the CAD application. For example, in VLSI CAD a geometric constraint scheme may be used in conjunction with feature-based or other graph-based connectivity modeling. Constraints are imposed on each design feature used in the VLSI circuit, referring to the feature's intra-connectivity and its local characteristics (i.e. area, size, geometry).
Constraints may also be imposed to express inter-feature connectivity requirements. Finally, constraints are also enforced globally on the circuit, and are targeted at optimizing the overall placement and routing of the features on the chip.
An object can also be represented by its skeleton. By skeleton we mean the closure of all points that have more than one closest point on the shape boundary (for example the medial axis transform). This representation provides the topology and shapes that exist in the object and also reflects the symmetries of an object. Depending on the type of application the skeleton is used for, it may be a 2D or 3D representation. For instance, in 3D the medial axis transform produces a medial surface. The exact computation of the 3D skeleton is a computationally intensive problem that returns a skeleton as complex as the object itself. Therefore we usually seek an approximation. A skeleton representation scheme is used in various CAD applications for object recognition and retrieval [10], animation [11] and other solid modeling operations ([12], [13]). It is widely used in feature-based modeling, where it can be employed to describe the shape of features, in feature detection and extraction applications and in shape deformation; for instance refer to [14] and [15].
3. Geometric Constraint Solving

Higher-level representations are powerful, accurate and user friendly. However, they have a major bottleneck; all complicated functionality has been shifted to the solution of a large nonlinear system of geometric constraints with multiple valid solutions. In this section we present an overview of approaches to geometric constraint solving. We outline the most representative methods and evaluate their behavior in terms of the major concerns faced in CAD/CAM systems: solution selection, interactive speed, editability, handling of over- and underconstrained configurations, and scope [16].
3.1. Numerical Constraint Solvers

In numerical constraint solvers, the constraints are translated into a system of algebraic equations and are solved using iterative methods. To handle the exponential number of solutions and the large number of parameters, iterative methods require sharp initial guesses. Also, most iterative methods have difficulties handling overconstrained or underconstrained instances. The advantage of these methods is that they have the potential to solve large nonlinear systems that may not be solvable using any of the other methods. All existing solvers more or less switch to iterative methods when the given configuration is not solvable by the native method. This fact emphasizes the need for further research in the area of numerical constraint solving. Sketchpad [17] was the first system to use the method of relaxation as an alternative to propagation. Relaxation is a slow but quite general method. The Newton-Raphson method has been used in various systems [18] [19], and it proved to be faster than relaxation, but it has the problem that it may not converge or it may converge to an unwanted solution after chaotic behavior. For that reason, Juno [18] uses as its initial state the sketch interactively drafted by the user. However, Newton-Raphson is so sensitive to the initial guess [20] that the drafted sketch must almost satisfy all constraints prior to constraint solving. A sophisticated use of the Newton-Raphson method was developed in [21], where an improved way of finding the inverse Jacobian matrix is presented. Furthermore, the idea of dividing the matrix of constraints into submatrices, presented in the same work, has the potential of providing the
user with useful information regarding the constraint structure of the sketch. Though this information is usually quantitative and nonspecific, it may help the user in basic modifications. To check whether a constraint problem is well-constrained, Chyz [22] proposes a preprocessing phase in which the graph of constraints is analyzed to check whether a necessary condition is satisfied. The method is, however, quite expensive in time and it cannot detect all the cases of singularity. An alternative to Newton-Raphson for geometric constraint solving is homotopy or continuation [23], which is argued in [24] to be more satisfactory in typical situations where Newton-Raphson fails. Homotopy is global and exhaustive, and thus slow when compared to the local and fast Newton's method [25]; however, it may be more appropriate for CAD/CAM systems when constructive methods fail, since it may return all solutions if designed carefully.
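To make the discussion concrete, the following toy example applies Newton-Raphson to a pair of distance constraints that place a point P at distance r1 from A and r2 from B (the classic ruler-and-compass intersection). The residuals, the hand-coded 2x2 Jacobian and the initial guess are illustrative only; real solvers work on much larger sparse systems, and, as noted above, which of the two intersection points the iteration converges to depends entirely on that initial guess.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double ax = 0, ay = 0, r1 = 5;   // point A and its distance constraint
    const double bx = 6, by = 0, r2 = 5;   // point B and its distance constraint
    double x = 1, y = 1;                   // initial guess (the drafted sketch position)

    for (int it = 0; it < 20; ++it) {
        // Residuals of the two constraint equations f1 = f2 = 0.
        double f1 = (x - ax) * (x - ax) + (y - ay) * (y - ay) - r1 * r1;
        double f2 = (x - bx) * (x - bx) + (y - by) * (y - by) - r2 * r2;
        // 2x2 Jacobian of (f1, f2) with respect to (x, y).
        double j11 = 2 * (x - ax), j12 = 2 * (y - ay);
        double j21 = 2 * (x - bx), j22 = 2 * (y - by);
        double det = j11 * j22 - j12 * j21;
        if (std::fabs(det) < 1e-12) break;           // nearly singular: give up
        // Solve J * (dx, dy) = (-f1, -f2) by Cramer's rule and update.
        double dx = (-f1 * j22 + f2 * j12) / det;
        double dy = (-j11 * f2 + j21 * f1) / det;
        x += dx;
        y += dy;
        if (std::fabs(dx) + std::fabs(dy) < 1e-10) break;   // converged
    }
    std::printf("P = (%g, %g)\n", x, y);   // converges to (3, 4) from this guess
    return 0;
}
```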
3.2. Constructive Constraint Solvers

This class of constraint solvers is based on the fact that most configurations in an engineering drawing are solvable by ruler, compass and protractor, or using other less classical repertoires of construction steps. In these methods the constraints are satisfied in a constructive fashion, which makes the constraint solving process natural for the user and suitable for interactive debugging. There are two main approaches in this direction.
Rule-Constructive Solvers

Rule-constructive solvers use rewrite rules for the discovery and execution of the construction steps. In this approach, complex constraints can be easily handled, and extensions to the scope of the method are straightforward to incorporate [26]. Although it is a good approach for prototyping and experimentation, the extensive computations involved in the exhaustive searching and matching make it inappropriate for real-world applications. A method that guarantees termination, ruler-and-compass completeness and uniqueness using the Knuth-Bendix critical pair algorithm is presented in ([27], [28]). This method can be proved to confirm theorems that are provable under a given system of axioms [29]. A system based on this method was implemented in Prolog. Aldefeld in [30] uses a forward chaining inference mechanism, where the notion of direction of lines is imposed by introducing additional rules, thus restricting the solution space. A similar method is presented in [31], where the handling of overconstrained and underconstrained problems is given special consideration. Sunde in [32] uses a rule-constructive method but adopts different rules for representing directed and nondirected distances, giving flexibility in dealing with the solution selection problem. In [33], the problem of nonunique solutions is handled by imposing a topological order on three geometric objects. An elaborate description of a complete set of rules for 2D geometric constraint solving can be found in [34]. In their work, the scope of the particular set of rules is characterized. [35] presents an extension of the set of rules of [34], and provides a correctness proof based on the techniques of [36].
Graph-Constructive Solvers

The graph-constructive approach has two phases. During the first phase the graph of constraints is analyzed and a sequence of construction steps is derived. During the second phase these construction steps are followed to place the geometric elements. These approaches are fast and more methodical. In addition, conclusions characterizing the scope of the method can be easily derived. A major drawback is that as the repertoire of constraints increases, the graph-analysis algorithm needs to be modified. Fitzgerald [37] follows the method of dimensioned trees introduced by Requicha [38]. This method allows only horizontal and vertical distances and is useful for simple engineering drawings. Todd in [39] first generalized the dimension trees of Requicha. Owen in [40] presents an extension of this principle that includes circularly dimensioned sketches. DCM [41] is a system that uses some extension of Owen's method. [42] presents an elaborate graph-constructive method, with fast analysis and construction algorithms, and extensions for handling classes of nonsolvable, underconstrained and consistently overconstrained configurations.
3.3. Propagation Methods

Propagation methods follow the approach met in traditional constraint solving systems. In this approach, the constraints are first translated into a system of equations involving variables and constants. The equations are then represented by an undirected graph which has as nodes the equations, the variables and the constants, and whose edges represent whether a variable or a constant appears in an equation. Subsequently, we try to direct the graph so as to satisfy all the equations starting from the constants; a simplified sketch of this directing step is given below. To accomplish this, various propagation techniques have been used, but none of them guarantees to derive a solution while at the same time having a reasonable worst-case running time. For a review of these methods see [28]. In a sense, the constructive constraint solvers can be thought of as a subcase of the propagation method (fixed geometric elements for constants and variable geometric elements for variables). However, constructive constraint solvers utilize domain-specific information to derive more powerful and efficient algorithms.
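The sketch below shows one simplified form of the directing step: an equation is directed towards a variable as soon as all of its other variables are known, starting from the constants. The data layout and the toy constraint set (two points placed by horizontal and vertical distance constraints) are invented for illustration; this single-unknown rule is exactly the situation in which propagation succeeds, and when every remaining equation has two or more unknowns the pass stalls, which is why propagation cannot guarantee a solution in general.

```cpp
#include <cstddef>
#include <cstdio>
#include <set>
#include <string>
#include <vector>

struct Equation {
    std::string name;
    std::vector<std::string> vars;   // variables appearing in the equation
};

int main() {
    // Toy constraint set: p is placed by two dimensions from a fixed origin,
    // and q is placed by horizontal/vertical distances from p.
    std::vector<Equation> eqs = {
        {"p.x - origin.x = d1", {"p.x"}},
        {"p.y - origin.y = d2", {"p.y"}},
        {"q.x - p.x = d3", {"q.x", "p.x"}},
        {"q.y - p.y = d4", {"q.y", "p.y"}},
    };
    std::set<std::string> known;                 // constants are known from the start
    std::vector<bool> used(eqs.size(), false);

    bool progress = true;
    while (progress) {
        progress = false;
        for (std::size_t e = 0; e < eqs.size(); ++e) {
            if (used[e]) continue;
            std::vector<std::string> unknown;
            for (const auto& v : eqs[e].vars)
                if (!known.count(v)) unknown.push_back(v);
            if (unknown.size() == 1) {           // direct the equation to its single unknown
                std::printf("%s determines %s\n", eqs[e].name.c_str(), unknown[0].c_str());
                known.insert(unknown[0]);
                used[e] = true;
                progress = true;
            }
        }
    }
    // Any variables still unknown at this point would have to be solved for
    // simultaneously, as constructive or numerical solvers do.
    return 0;
}
```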
3.4. Symbolic Constraint Solvers

In symbolic solvers, the constraints are transformed to a system of algebraic equations which is solved using methods from algebraic manipulation, such as Gröbner basis calculation [43] or Wu's method [44]. Although these methods are interesting from a theoretical viewpoint, their practical significance is limited, since their time and space complexity is typically exponential or even hyperexponential.
3.5. Hierarchical and Hybrid Approaches

A major result in the analysis of constraint graphs [45], in which an efficient method for detecting dense constraint subgraphs is described, has enabled the solution of large systems of
geometric constraints in 2, 3 or more dimensions. By using this result we can build efficient algorithms for solving arbitrary systems of geometric constraints. We first find a set of minimal disjoint dense constraint subgraphs. Each subgraph is then reduced to a supernode of high dimension and the method is applied recursively to the resulting graph. In this way we build a hierarchy of constraint graphs that is treated bottom-up or top-down depending on the application. Inter-feature 3D constraints result in systems of 3D constraints. Such systems are very hard to solve with graph-constructive methods, since there is not even a necessary and sufficient condition for well-constrainedness in 3D. By using the decomposition suggested by this approach we may break down the large geometric constraint system into a multitude of small systems with few variables. Such systems are usually easy to solve using global optimization with topological constraints to narrow down the root selection process. In this direction, [46] has developed a novel method for placement and routing in VLSI by constructing a circuit hierarchy through the detection of dense connectivity graphs and then employing global optimization algorithms for each sub-problem.
4. CAD Applications from a Feature-Based/Constraint-Based Point of View

Since the models constructed by CAD applications are most often meant for manufacturing or production in general, it is necessary that they are robust and accurate. Also, most applications require that the model can be modified and re-engineered. The use of constraints and features in such CAD applications is essential and is the only representation scheme that sufficiently supports these requirements. In this section we will present two example approaches that adopt this modeling scheme.
4.1. Parametric Feature-Based Design in Manufacturing Systems

An example of a CAD application where the feature- and constraint-based representation model is most appropriate is parametric feature-based manufacturing [47]. Parametric modeling was commonly used for the construction of complex models on which parameters were used to provide for subsequent customization. The parameters defined during the design and modeling process are relative to the individual geometric characteristics of the model or to the model as a whole. For example, the parameters can control characteristics such as length, height, width and hole radius. On the other hand, feature-based modeling is a representation scheme based on the combination of individual feature components. In this context a feature is a unit that can be defined as a connected set of geometric elements (i.e. a subpart) associated with attributes that describe its shape and behavior, such as geometry, topology, functionality and connectivity with other features. In traditional approaches each feature is linked to a set of local parameters that control its attribute values. Here, the feature-based model is complemented through the use of local and global constraints. The constraints are applied locally, in reference to the parameter values or the geometric characteristics of the primitives of the features, to impose design or user-defined specifics such as hole size and pocket depth, and globally, in reference to the connectivity and the inter-feature relations of the model.
Much work has been performed on the definition of features in relation to various CAD applications. Features are often perceived as 3D solid components that can be classified into feature libraries depending on their shape or geometry. This point of view is described for instance in [48], where the authors present a library of features fit for manufacturing applications, and in [49], where design features for machining are examined. In [50] features are defined as pierced voxels that are used to create traditional pierced jewellery. However, features can also be defined from surfaces, which is especially common in freeform design applications. For instance, [51] examines freeform surface features, whereas [52] presents a taxonomy of freeform features. Other work uses the notion of feature points and feature lines ([53], [54]) for applications usually related to data segmentation for reverse engineering, or shape deformation and manipulation. Since the definition of a feature is not strict, Hoffmann and Joan-Arinyo in [55] suggest the use of user-defined features in feature based modeling. Parametric and feature based modeling is an essential component of current CAD design systems. In traditional CAD systems, CSG and Brep models are created by adding and subtracting parts in the model and by applying transformations and various design operations. Design intent was not a concern in these systems and therefore precise editing that involved structural and arbitrary topological modifications of parts of the model was almost impossible without rebuilding the model from scratch. Editing a part of the model is feasible if the design steps are undone until the model returns to the previous state, when the part was created. This of course is possible if the design history of the model is recorded and it is obvious that even though editing theoretically concerns a part of the model, ultimately the whole design process is affected. Feature based CAD systems overcome this limitation by capturing design intent. Since the models are constructed using parameters and features, local editing is possible without necessarily affecting the whole model. Changes are propagated through the model based on the parameters and constraints defined in the system and based on the attributes and connectivity of the features. Feature-based constraint-based modeling systems provide libraries of feature components to be used in the design process and some support userdefined features. Applications such as custom design are feasible since components of models can be combined or re-designed to satisfy user defined preferences or requirements. Many commercial CAD modeling systems support parametric and/or feature-based modeling. Systems such as PRO/Engineer [56], AUTOCAD [57], IRONCAD [58], CATIA [59], Solidworks [60], SolidEdge [61] and Alibre design [62], which have been developed mainly for mechanical engineering, manufacturing and industrial design applications, have been integrated with parametric and/or feature based modeling capabilities. Architectural Desktop and AUTOCAD are CAD systems used in architectural applications that support parametric modeling. 3D Studio Max [63] and Maya [64] are parametric feature-based modeling systems used for artwork and animation. There are also systems that have been developed for specific CAD applications, such as jewelry, clothing and textile design, marine applications and furniture. The above modeling systems are very efficient for manufacturing and production applications. 
However, more freeform applications, such as aesthetic and custom design, are still challenging even with these systems. An interesting case is jewellery design. A large number of CAD systems for jewellery design are parametric and feature-based. They provide graphical interfaces with excellent rendering capabilities. The majority of these systems provide built-in libraries of settings and cut gems and stones, and advanced feature-based design tools. Some systems provide advanced functionality that supports the use of builders
for recording design steps and for defining parameter values for parts to be used in the process. Also, the majority of these systems have the capability of exporting models to rapid prototyping machines. However, in most CAD systems for jewellery, designing is performed manually using various tools, and usually the design steps cannot be programmed to be executed automatically and accurately. This means that each different piece of jewellery has to be created essentially from the beginning by hand, making custom design applications difficult and time-consuming. Also, these systems require that the user has design skills or knowledge of using CAD systems. In the following we present an interesting example of a jewellery application that is difficult to carry out with existing CAD systems: the construction of traditional pierced jewellery.
Figure 1. Using a chisel to create carvings around a hole.
Figure 2. A structural element (feature).
In [50] ByzantineCAD, a feature-based CAD system suitable for the design of pierced Byzantine jewellery, is presented. The system is automated and parametric, meaning that the user-designer sets some parameter values and ByzantineCAD creates the jewellery model that corresponds to the specified values. This provides the designer with the ability to rapidly create custom-designed jewellery based on the preferences of the customers, such as including their initials on a ring. ByzantineCAD introduces a feature-based and voxel-based approach to designing jewellery, through the definition of elementary structural elements with
specific attributes and properties that are used as building blocks to construct complex pierced designs. More specifically, pierced Byzantine jewellery consists of gold jewels with pierced designs that were made along the coastlines of the eastern Mediterranean Sea during the period from the 3rd to the 7th century A.D. Their originality is due to the particular processing technique used for their creation, resulting in a special aesthetic effect. Pierced jewellery was created from thin sheets of gold. The designs were engraved on these sheets of gold with a thin, sharp tool. After the outlining of the designs, holes following their shape were created, and these were decorated with triangular carvings using an iron chisel.
Figure 3. Pierced voxel elements such as the one in (a) are used as features to create complex solid plaques representing designs, e.g. letters or words, that are sized and modified appropriately to construct custom-designed jewellery (e.g. a ring).
In ByzantineCAD a feature library of carved, pierced voxel elements is defined in accordance with the craftsmanship used in traditional Byzantine jewellery. The design of pierced jewellery is made up of cylindrical holes that have carvings around them. Each hole with the corresponding carvings around it is considered, for the purposes of reconstruction, as a structural element (feature). Each feature is a solid made of a rectangular parallelepiped with a cylindrical hole and the corresponding carvings around the hole (figures 1 and 2). According to the aesthetic rules that characterize traditional pierced jewellery, all structural elements have the same size but differ in the position of the hole and the carvings around it. The hole can be located either in the center of the parallelepiped or in the center of any of the four quarters.
Note that, in terms of computer aided design and manufacturing, the cylindrical hole can be positioned anywhere in the rectangular parallelepiped; the above restriction follows from careful interpretation of the traditional artistic patterns used. Attributes of these feature elements are characteristics such as the number of carvings around the cylindrical hole, the position of the hole in the parallelepiped, the directions of the carvings and more. A large number of different structural elements can be created by a hole and various carvings and, since not all of these feasible feature elements are valid for use in creating pierced designs, restrictions concerning the carving directions are defined based on aesthetic and artistic rules. These feature elements are combined like 3D building blocks to create complex carved plaques representing pierced designs (Figure 3). The structural elements are placed side by side, either on top, bottom, right or left of each other, and unioned into a new object. The rules determining how the different features can be combined are defined by the designs to be recreated. The construction of these plaques is constrained by the parameter values defined by the user-designer in reference to characteristics of the plaque such as length and width. The plaques are then used to create jewellery such as rings and necklace pendants. By parameterizing the process of creating pierced jewellery, it is very easy to modify characteristics of the jewellery such as the size and the designs represented.
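The data model behind such a system can be pictured roughly as follows. This is purely an illustrative sketch and not ByzantineCAD's actual implementation: the type names, the enumeration of hole positions and the BuildPlaque() routine are all invented for the example, and a real system would also encode the aesthetic validity rules for carving directions and the union of the solids.

```cpp
#include <cstddef>
#include <vector>

// Where the cylindrical hole sits inside the rectangular parallelepiped:
// its center or the center of one of the four quarters.
enum class HolePosition { Center, TopLeft, TopRight, BottomLeft, BottomRight };

// Elementary structural element (feature): a carved, pierced voxel.
struct PiercedVoxel {
    HolePosition hole;                 // position of the cylindrical hole
    int carvings;                      // number of triangular carvings around it
    std::vector<double> carvingDirs;   // carving directions, subject to the
                                       // aesthetic/artistic validity rules
};

// A complex pierced design: features placed side by side and unioned.
struct Plaque {
    int rows, cols;                    // derived from user parameters (length, width)
    std::vector<PiercedVoxel> cells;   // one feature per grid cell
};

// Assumed assembly routine: fill a rows x cols plaque with copies of a base
// feature. A real builder would pick per-cell features from the library so
// that the plaque reproduces the target design (e.g. a letter) at the
// requested size, ready to be wrapped into a ring or pendant.
Plaque BuildPlaque(int rows, int cols, const PiercedVoxel& base) {
    Plaque p;
    p.rows = rows;
    p.cols = cols;
    p.cells.assign(static_cast<std::size_t>(rows) * cols, base);
    return p;
}
```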
4.2. Feature-Based Modeling for Reverse Engineering

Reverse engineering aims to analyze a real object and to determine its characteristics and mechanisms, with the further aim of reconstructing and remanufacturing it. The data concerning the physical object can be obtained by various methods. A common method is to use a 3D laser scanner to obtain a point cloud corresponding to points on the surface of the scanned object. Given this, a more specific definition of reverse engineering would be the process of obtaining a geometric CAD model from measurements acquired by scanning an existing physical model [65]. Reverse engineering is vital for various industries because the computer models acquired help improve the quality and efficiency of designs and also speed up the manufacturing and analysis process. In mechanical part engineering and manufacturing, reverse engineering aims to replicate existing parts for which no CAD models exist. Also, it is possible to manufacture objects for which the original CAD model no longer corresponds to the physical part that was manufactured, due to subsequent undocumented modifications made after the initial design stage. Reverse engineering is applied in industrial design, such as automobile exterior part design. Stylists and artists very often create physical models of their concepts by using clay, plaster or wood. These real-scale models are then used to create CAD models for manufacturing the objects on an industrial scale. The CAD models also provide the artists and stylists with the ability to re-evaluate their designs, especially when they can easily re-design or modify them as needed. Reverse engineering encourages conceptual design because the designer creates an initial prototype, scans it and manipulates it as desired. Re-engineering objects of freeform design is relatively more difficult and complex than reconstructing mechanical parts. Mechanical parts usually have specific geometric characteristics, such as symmetries and swept profiles, that are fairly easy to detect and parameterize. However, in the case of freeform objects, there are often features that are difficult to extract. In the case of restricted mechanical parts, feature libraries can be defined
and used for detecting features in the point cloud, whereas for freeform objects this is not feasible. Since reverse engineering applications aim at reproduction and manufacturing, the re-engineering process must create highly accurate and robust CAD models. A characteristic of the objects re-engineered is that they are usually parts of larger objects and therefore have to fit and connect exactly with other parts, like pieces in a puzzle. For this reason, the models created through the reverse engineering process must be well defined and constrained [66]. Also, in some cases the model created is to be edited, e.g. in custom design, and therefore a simple B-rep model is not appropriate. Given this, it is natural that feature-based and constraint-based models are used more and more in re-engineering applications. A feature-based reverse engineering method was also used in [67] for reverse engineering a mannequin for garment design. The basic concept in this method is to create a generic mannequin model of a human torso, which is appropriately aligned with the 3D point cloud of the desired human torso model; the generic model is then "fitted" to the point cloud by matching up characteristic points of the models, e.g. peaks. This method creates parameterized models by exploiting the features of the object and by using them to constrain the fitting process. It is an automated approach to reverse engineering human torsos that creates parameterized models with good accuracy. Constraint definition and application has been used in building reconstruction and in reverse engineering objects of aesthetic design. Specifically, [68] examines how a priori knowledge can be used to derive constraints to create more accurate models in architectural applications. Relevant work is also found in [69]. In [70] the authors suggest a constraint-based approach in reverse engineering for model beautification.
Figure 4. A point cloud of a screwdriver [74] (left) and its concavity intensity map (right).
A paradigm of feature-based/constraint-based re-engineering is the REFAB project ([71], [48]), which uses a feature-based and constraint-based method to reverse engineer mechanical parts. REFAB is a human-interactive system in which the 3D point cloud is presented to the user; the user selects from a list a feature that exists in the cloud and specifies with the mouse the approximate location of the feature in the point cloud, and the system then fits the specified feature to the actual point cloud data iteratively, using a least-squares method. The authors place emphasis on the fitting of pockets, where the user draws a profile of the pocket on the point cloud, the system fits the profile to the data, and the profile is then extruded to create the pocket. This feature-fitting process is made more
accurate by using constraints that are detected by the system, verified by the user and then exploited to achieve a better fitting of the features according to the data. The system supports constraints [72] such as parallelism, concentricity, perpendicularity and symmetry. The constraints defined and used in REFAB seek to reduce the degrees of freedom associated with the object as much as possible, so as to achieve high-precision models in less time. An interesting feature-based approach to re-engineering objects of freeform design is presented in [73]. Re-engineering objects of freeform design is essential for supporting custom design in a CAD model reconstruction system. It provides user-designers with the capability to modify re-engineered CAD models according to their preferences and to incorporate them in novel designs. For instance, in the case of jewellery re-engineering, the user-designer might like to be able to modify the dimensions of a ring to produce one of larger size, or be able to choose certain parts of the object in order to use them to create other pieces of jewellery. To this end, one needs to exploit the features of the original model and the relationships and constraints that hold among them. In [73] a generic and global feature detection approach to reverse engineering point clouds of objects of freeform design is presented. A method is presented for detecting and segmenting a point cloud into individual subsets that correspond to features. This is achieved by using a point characteristic defined as "concavity intensity" to decompose the point cloud into subsets (components) that correspond to features of the physical object. The concavity intensity of a point corresponds to the smallest distance from the point to its convex hull that does not pass through the point cloud. This characteristic basically detects concave features in the object being re-engineered. A concavity intensity map is shown in figure 4. In the concavity intensity map the values are rendered using greyscale, where black corresponds to points belonging to the convex hull, whereas white corresponds to points that are farthest away from the convex hull. We can observe that edges (rapid variations in concavity intensity), saddle points and extrema can be used to partition the point cloud into components that can later be refined as the features of the object.
Figure 5. Feature extraction is performed by region growing.
The point concavity intensity values calculated are used to segment the point cloud into feature components. A feature component is bounded by areas where abrupt changes in the direction of the normal and/or rapid concavity intensity variations are observed. After calculating the concavity intensities of all the points that form the point cloud, a region
growing method is applied to divide the point cloud into its components (Figure 5). The region growing method is based on two criteria: (i) the normal vectors of neighbouring points belonging to the same region should form an angle smaller than a threshold t, and (ii) the approximate gradients of the concavity intensity function in the directions x, y and z for neighbouring points of the same region should maintain the same sign, meaning that no zero crossings are observed between them.
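To make the two criteria concrete, the following is a minimal sketch of such a region-growing pass, not the implementation used in [73]. It assumes the point cloud has already been preprocessed into a neighbour graph (`neighbors`), unit normals (`normals`) and per-point approximate concavity-intensity gradients (`conc_grad`); these names and the threshold value are illustrative assumptions.

```python
import numpy as np
from collections import deque

def grow_regions(neighbors, normals, conc_grad, angle_thresh_deg=15.0):
    """Partition point indices into regions using the two criteria above:
    (i) neighbouring normals deviate by less than a threshold angle, and
    (ii) the sign of the approximate concavity-intensity gradient (per axis)
    does not flip between neighbouring points of the same region."""
    n = len(normals)
    cos_t = np.cos(np.radians(angle_thresh_deg))
    labels = np.full(n, -1, dtype=int)
    region = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = region
        queue = deque([seed])
        while queue:
            p = queue.popleft()
            for q in neighbors[p]:
                if labels[q] != -1:
                    continue
                same_normal = np.dot(normals[p], normals[q]) >= cos_t
                same_sign = np.all(np.sign(conc_grad[p]) == np.sign(conc_grad[q]))
                if same_normal and same_sign:
                    labels[q] = region
                    queue.append(q)
        region += 1
    return labels
```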
5. Conclusion
We presented a survey of representation schemes for CAD applications. We then introduced a framework for representing and performing complex design operations in CAD systems. This framework has evolved from the feature-based design paradigm, augmented with arbitrary intra-feature and inter-feature constraints and a hierarchical constraint analysis approach for placing geometric elements. To demonstrate the use of this framework we presented two example applications that we have developed using this new design concept.
References
[1] Cripps R.J., Algorithms to support point-based cadcam, International Journal of Machine Tools and Manufacture 2003, 43(4), 425-432. [2] Kobbelt L. and Botsch M., A survey of point-based techniques in computer graphics, Computers and Graphics 2004, 28(6), 801-814. [3] Hoppe H., Derose T., Duchamp T., McDonald J. and Stuetzle W., Surface reconstruction from unorganized points, Computer Graphics 1992, 26(2), 71-78. [4] Benko P., Martin R.R. and Varady T., Algorithms for reverse engineering boundary representation models, Computer-Aided Design 2001, 33(11), 839-851. [5] Langbein F.C., Marshall A.D. and Martin R.R., Choosing consistent constraints for beautification of reverse engineered geometric models, Computer-Aided Design 2004, 36(3), 261-278. [6] Jense G.J., Voxel-based methods for CAD, Computer Aided Design 1989, 21(10), 528-533. [7] Hoffmann C.M. and Joan-Arinyo R., Erep: An editable high-level representation for geometric design, Geometric Modeling for Product Realization, P. Wilson, M. Wozny, M. Pratt, eds., North Holland, 1993, 129-164. [8] Benko P., Kos G., Varady T., Andor L. and Martin R.R., Constrained fitting in reverse engineering, Computer-Aided Design 2002, 19(3), 173-205. [9] Anderl R. and Mendgen R., Modelling with constraints: theoretical foundation and application, Artificial Intelligence in Computer-Aided Design 1996, 28(3), 155-168. [10] Cornea N.D., Demirci M.F., Silver D., Shokoufandeh A., Dickinson S.J. and Kantor P.B., 3D Object Retrieval using Many-to-many Matching of Curve Skeletons, IEEE International Conference on Shape Modeling and Applications (SMI) 2005, Boston USA, 368-373. [11] Bloomenthal J., Medial-based Vertex Deformation, Proceedings of the 2002 ACM SIGGRAPH/Eurographics Symposium on Computer Animation 2002, San Antonio, TX, USA. [12] Sheehy D., Armstrong C. and Robinson D., Shape description by medial axis construction, IEEE Transactions on Visualization and Computer Graphics 1996, 2(1), 62-72. [13] Storti D.W., Turkiyyah G.M., Ganter M.A., Lim C.T. and Stal D.M., Skeleton-based modeling operations on solids, Solid Modeling '97, 1997, Atlanta GA USA. [14] Lien J., Keyser J. and Amato N.M., Simultaneous Shape Decomposition and Skeletonization, Proceedings of the ACM Symposium on Solid and Physical Modeling 2006, Cardiff, Wales, United Kingdom. [15] Yoshizawa S., Belyaev A.G. and Seidel H.-P., Free-form skeleton-driven mesh deformation, Proceedings of the eighth ACM Symposium on Solid Modeling and Applications 2003, Seattle, Washington U.S.A. [16] Fudos I., Constraint Solving for Computer Aided Design, Dept of Computer Sciences, Purdue University, 1995, PhD Thesis. [17] Sutherland I., Sketchpad, a man-machine graphical communication system, Proceedings of the Spring Joint Computer Conference, 1963. [18] Nelson, G., Juno, a constraint-based graphics system, SIGGRAPH 1985, San Francisco USA. [19] Serrano D. and Gossard D., Combining mathematical models and geometric models in CAE systems, Proc. ASME Computers in Eng. Conf 1986, Chicago USA. [20] Beaty P.L., Fitzhorn P.A. and Herron G.J., Extensions in variational geometry that generate and modify object edges composed of rational Bezier curves, Computer-Aided Design 1994, 26(2), 98-108. [21] Light R. and Gossard D., Modification of geometric models through variational geometry, Computer Aided Design 1982, 14(4), 209-214. [22] Chyz W., Constraint management for CSG, MIT, 1985, Master's Thesis. [23] Allgower E.L. and Georg K., Continuation and path following, Acta Numerica 1993, 1-64. [24] Lamure H. and Michelucci D., Solving geometric constraints by homotopy, Proc. Third Symposium on Solid Modeling and Applications 1995, Salt Lake City, USA. [25] Morgan A., Solving polynomial systems using continuation for engineering and scientific problems, 1987, Prentice Hall Inc. [26] Bruderlin B. and Roller D., Geometric constraint solving and applications, 1998, Springer Verlag. [27] Bruderlin B., Constructing three-dimensional geometric objects defined by constraints, In Workshop on Interactive 3D Graphics 1986. [28] Sohrt W., Interaction with constraints in three-dimensional modeling, Dept of Computer Science, The University of Utah, 1991, Master's Thesis. [29] Bruderlin B., Using geometric rewrite rules for solving geometric problems symbolically, Theoretical Computer Science 1993, 116, 291-303. [30] Aldefeld, B., Variation of geometries based on a geometric-reasoning method, Computer-Aided Design 1988, 20(3), 117-126.
[31] Suzuki H., Ando H. and Kimura F., Variation of geometries based on a geometric-reasoning method, Computers & Graphics 1990, 14(2), 211-224. [32] Sunde G., Specification of shape by dimensions and other geometric constraints, in Geometric modeling for CAD applications 1988, M. J. Wozny, H. W. McLaughlin, and J. L. Encarnacao, eds., North Holland, IFIP, 199-213. [33] Yamaguchi Y. and Kimura F., A constraint modeling system for variational geometry, in Geometric Modeling for Product Engineering, M. J. Wozny, J.U. Turner and K. Preiss, eds., 1990, Elsevier Science Publishers B.V. (North Holland), 221-233. [34] Verroust A., Schonek F. and Roller D., Rule-oriented method for parameterized computer-aided design, Computer Aided Design 1992, 24(10), 531-540. [35] Joan-Arinyo R. and Soto A., A rule-constructive geometric constraint solver, Technical Report LSI-95-25-R, 1995, Universitat Politecnica de Catalunya. [36] Fudos I. and Hoffmann C.M., Correctness proof of a geometric constraint solver, International Journal of Computational Geometry & Applications, 1996, 405-420. [37] Fitzgerald W.J., Using axial dimensions to determine the proportions of line drawings in computer graphics, Computer Aided Design 1981, 13(6), 377-382. [38] Requicha, A., Dimensioning and tolerancing, Technical report PADL TM-19, Production Automation Project 1977, University of Rochester. [39] Todd P., A k-tree generalization that characterizes consistency of dimensioned engineering drawings, SIAM J DISC MATH 1989, 2(2), 255-261. [40] Owen J.C., Algebraic solution for geometry from dimensional constraints, ACM Symp. Found. of Solid Modeling 1991, Austin, TX. [41] D-Cubed, Ltd., 68 Castle Street, Cambridge, CB3 OAJ, England. The Dimensional Constraint Manager, June 1994, Version 2.7. [42] Fudos I. and Hoffmann C.M., A graph-constructive method to solving systems of geometric constraints, ACM Transactions on Graphics 1997, 16(2), 179-216. [43] Buchberger B., Gröbner Bases: An algorithmic method in polynomial ideal theory, in Multidimensional Systems Theory, N.K. Bose, Editor, 1985, D. Reidel Publishing Company, 184-232. [44] Wu W.T., Basic principles of mechanical theorem proving in elementary geometries, Journal of Automated Reasoning 1986, 2, 221-252. [45] Hoffmann C.M., Lomonosov A., and Sitharam M., Finding solvable subsets of constraint graphs, L. G. Smolka, editor, 1997, Springer Verlag, 463-477. [46] Fudos I. and Markouzis D., A hierarchical feature-based approach to computer aided placement and routing for VLSI, Technical Report TR-2007-09, 2007, Computer Science Department, University of Ioannina. [47] Shah J.J. and Mantyla M., Parametric and Feature-based CAD/CAM, 1995, John Wiley & Sons Inc. [48] Thompson W.B., Owen J.C., James de St Germain H., Stark S.R. and Henderson T.C., Feature-based reverse engineering of mechanical parts, IEEE Transactions on Robotics and Automation 1999, 12(1), 57-66. [49] Lee J.Y. and Kim K., A feature-based approach to extracting machining features, Computer-Aided Design 1998, 30(13), 1019-1035. [50] Stamati V. and Fudos I., A parametric feature-based CAD system for reproducing traditional pierced jewellery, Computer Aided Design 2005, 37(4), 431-449.
[51] Nyirenda P. J., Mulbagal M. and Bronsvoort W. F., Definition of freeform surface feature classes, Computer Aided Design & Applications 2006, 3(5), 665-674. [52] Fontana M., Giannini F. and Meirana M., A free form feature taxonomy, In Proceedings of EUROGRAPHICS '99, 1999. [53] Dobson G.T., Waggenspack Jr W.N. and Lamousin H.J., Feature based models for anatomical data fitting, Computer Aided Design 1995, 27(2), 139-146. [54] Gumhold S., Wang X. and MacLeod R., Feature extraction from point clouds, Proceedings of the 10th International Meshing Roundtable, 2001. [55] Hoffmann C.M. and Joan-Arinyo R., On user-defined features, Computer Aided Design 1998, 30(5), 321-332. [56] PTC, PRO/ENGINEER, http://www.ptc.com. [57] Autodesk, AUTOCAD, http://www.autodesk.com/autocad. [58] IRONCAD, IRONCAD, http://www.ironcad.com. [59] Dassault Systems, CATIA, http://www.3ds.com/products-solutions/plm-solutions/catia/ overview. [60] Dassault Systems, Solidworks, http://www.3ds.com/products-solutions/ solidworks. [61] Siemens - UGS PLM Software, Solid Edge, http://www.solidedge.com. [62] Alibre Inc., Alibre Design, http://www.alibre.com. [63] Autodesk, 3D Studio Max, http://www.autodesk.com/3dsmax. [64] Autodesk, Maya, http://www.autodesk.com/maya. [65] Varady T., Martin R.R. and Cox J., Reverse engineering of geometric models - an introduction, Computer Aided Design 1997, 29(4), 253-330. [66] Werghi N., Fisher R., Robertson C. and Ashbrook A., Object reconstruction by incorporating geometric constraints in reverse engineering, Computer Aided Design 1999, 31(6), 363-399. [67] Au C.K. and Yuen M.M.F., Feature-based reverse engineering of mannequin for garment design, Computer-Aided Design 1999, 31(12), 751-759. [68] Fisher R.B., Applying knowledge to reverse engineering problems, Computer-Aided Design 2004, 36, 501-510. [69] Cantzler, H., Improving architectural 3D reconstruction by constrained modeling, Institute of Perception, Action and Behaviour, School of Informatics, University of Edinburgh, 2003, PhD Thesis. [70] Gao C.H., Langbein F.C., Marshall A.D. and Martin R.R., Local topological beautification of reverse engineered models, Computer-Aided Design 2004, 36(13), 1337-1355. [71] Thompson W.B., James de St Germain H., Henderson T.C. and Owen J.C., Constructing high-precision geometric models from sensed position data, Proceedings of the 1996 ARPA Image Understanding Workshop, 1996. [72] James de St. Germain H., Stark S.R., Thompson W.B. and Henderson T.C., Constraint optimization and feature-based model construction for reverse engineering, ARPA Image Understanding Workshop, 1997. [73] Stamati V. and Fudos I., A feature based approach to re-engineering objects of freeform design by exploiting point cloud morphology. Proceedings of the 2007 ACM symposium on Solid and physical modeling 2007, Beijing China. [74] Cyberware, Cyberware Rapid 3D Scanners - Desktop 3D Scanner Samples, 1999, http://www.cyberware.com/products/ scanners/desktopSamples.html
In: Computer Animation ISBN 978-1-60741-559-6 © 2010 Nova Science Publishers, Inc. Editors: J.S. Wright and L.M. Hughes, pp. 177-208
Chapter 8
Computer Aided Geometric Design with Powell-Sabin Splines
Hendrik Speleers 1,2, Paul Dierckx 1 and Stefan Vandewalle 1
1 Katholieke Universiteit Leuven, Department of Computer Science, Belgium
2 Research Assistant of the Fund for Scientific Research Flanders, Belgium
Abstract
Powell-Sabin splines are bivariate $C^1$-continuous quadratic splines defined on an arbitrary triangulation. Their construction is based on a particular split of each triangle in the triangulation into six smaller triangles. In this article we give an overview of the properties of Powell-Sabin splines in the context of computer aided geometric design. These splines can be represented in a compact normalized B-spline basis with an intuitive geometric interpretation involving control triangles. Using these triangles one can interactively change the shape of the splines in a predictable way. We describe the simple subdivision rules for Powell-Sabin splines, and discuss some applications. We consider a new efficient spline visualization technique based on subdivision. We also look at two useful generalizations of the Powell-Sabin splines, namely QHPS splines and NURPS surfaces. The QHPS splines are a hierarchical variant of Powell-Sabin splines. They have very similar properties to the Powell-Sabin splines, and their hierarchical nature allows a local refinement of the spline in a very straightforward way. The NURPS surface is the rational extension of the Powell-Sabin spline. By means of weights, NURPS surfaces give the designer extra degrees of freedom for the modelling of surfaces.
1. Introduction
The ability to represent complex surfaces on the computer is important in a broad range of applications [10]. Tensor product B-splines are today’s most commonly used surface splines in computer aided geometric design (CAGD) packages. They are very attractive because of their compact representation, the ease of implementation and their efficiency. The B-spline control net of such a surface enables local adaptations in a geometrically intuitive way. A definite drawback, however, is that they are restricted to regular meshes
on rectangular domains. Therefore, they are not well suited to represent strongly irregular objects defined on arbitrary domains. Instead of using the tensor product representation on rectangular domains, one can also describe piecewise bivariate polynomial surfaces in terms of barycentric coordinates with respect to a triangular domain. These triangular patches in Bernstein form are called Bézier triangles [9]. In spite of the flexibility of a triangular mesh, the representation of complex smooth surfaces can require a large number of Bézier triangles. Imposing smoothness conditions between these patches results in a large number of non-trivial continuity relations between the coefficients. A number of authors studied the construction of smooth bivariate spline functions on arbitrary triangulations. A major difficulty is to determine the dimension of such spline spaces. In general, it is not possible to express the dimension in terms of the number of vertices and triangles in the triangulation. There are some results for particular choices of polynomial degree and smoothness [1, 3, 13], and for particular constrained triangulations [11]. Yet, in general and especially for low degree polynomials the problem remains open. One can overcome this problem by using so-called macro-elements, where each triangle in the triangulation is split in a particular way. Well-known in the finite element literature is the $C^1$-continuous cubic Clough-Tocher spline space [5]. For $C^1$-continuous quadratic splines, Powell and Sabin [21] constructed an element by splitting all triangles into six subtriangles. Powell-Sabin splines can be compactly represented in a normalized basis. Dierckx [6] discovered that the basis splines have an intuitive geometric interpretation by means of control triangles. Windmolders [34] was the first to investigate the use of Powell-Sabin splines in CAGD applications. Many new interesting properties and techniques have been developed since, e.g., a triadic subdivision scheme. We will overview them in the context of computer aided design and modelling.
Section 2. is devoted to the Powell-Sabin spline space. It recalls some basic concepts of Powell-Sabin splines and the construction of a suitable normalized basis. In section 3. we describe a subdivision scheme for the splines, and we discuss some applications. The next sections cover two generalizations of Powell-Sabin splines. Section 4. discusses a hierarchical variant of the splines, called QHPS splines. They retain similar properties to the Powell-Sabin splines, but they are defined on a hierarchical triangulation. These splines are very useful in an adaptive local refinement strategy. In section 5. we consider NURPS surfaces, a rational extension of Powell-Sabin splines. They give the designer more degrees of freedom by means of extra weights.
2. Powell-Sabin Splines
In this section we detail the theory of Powell-Sabin splines. We first recall some general concepts of polynomials on triangles in their Bernstein-Bézier representation. Powell-Sabin splines are defined on arbitrary triangulations refined with a particular split. We discuss a normalized basis for this spline space, and we give an overview of their properties.
2.1. Polynomials on Triangles
Barycentric coordinates provide an elegant tool for defining points inside a triangle. Let $T(V_1, V_2, V_3)$ be a non-degenerate triangle. An arbitrary point P in the plane of the triangle can be uniquely expressed in terms of the barycentric coordinates $\tau = (\tau_1, \tau_2, \tau_3)$ with respect to T, such that
$$P = \sum_{i=1}^{3} \tau_i V_i, \quad \text{and} \quad \tau_1 + \tau_2 + \tau_3 = 1. \tag{2.1}$$
If the point P lies inside the triangle T, then its barycentric coordinates are all positive. Consider two points in the plane of the triangle, i.e., $P_1$ and $P_2$. The barycentric direction $\delta = (\delta_1, \delta_2, \delta_3)$ of the vector $P_2 - P_1$ with respect to T is defined as the difference of the barycentric coordinates of both points. If the Euclidean distance $\|P_2 - P_1\| = 1$, then $\delta$ is called a unit barycentric direction.
Let $\Pi_d$ denote the linear space of bivariate polynomials of total degree less than or equal to d. Any polynomial $p_d \in \Pi_d$ on triangle T has a unique Bernstein-Bézier representation [9],
$$p_d(\tau) = \sum_{i+j+k=d} b_{ijk}\, B^d_{ijk}(\tau), \tag{2.2}$$
with
$$B^d_{ijk}(\tau) = \frac{d!}{i!\,j!\,k!}\, \tau_1^i \tau_2^j \tau_3^k \tag{2.3}$$
the Bernstein polynomials on the domain triangle T. The coefficients $b_{ijk}$ are called Bézier ordinates, and the Bézier domain points $\xi_{ijk}$ are defined as the points with barycentric coordinates $\left(\frac{i}{d}, \frac{j}{d}, \frac{k}{d}\right)$. By associating each Bézier ordinate $b_{ijk}$ with the Bézier domain point $\xi_{ijk}$, we can display the Bernstein-Bézier representation schematically as in Figure 1(a) for the case d = 2. The piecewise linear interpolant of the Bézier control points, defined as $(\xi_{ijk}, b_{ijk})$, is called the Bézier control net. This control net is tangent to the polynomial surface $z = p_d(\tau)$ at the three vertices of the triangle, and it mimics the shape of the Bernstein-Bézier surface. Figure 1(b) shows such a surface together with its control points and control net.
Polynomials in their Bernstein-Bézier representation (2.2) can be evaluated using the recursive de Casteljau algorithm [4], i.e.,
$$p_d(\tau) = b^d_{0,0,0}(\tau), \tag{2.4a}$$
where
$$b^0_{i,j,k}(\tau) = b_{ijk}, \quad i+j+k = d, \tag{2.4b}$$
$$b^r_{i,j,k}(\tau) = \tau_1\, b^{r-1}_{i+1,j,k}(\tau) + \tau_2\, b^{r-1}_{i,j+1,k}(\tau) + \tau_3\, b^{r-1}_{i,j,k+1}(\tau), \quad i+j+k = d-r, \; r = 1, \ldots, d. \tag{2.4c}$$
This algorithm is numerically stable and has many interesting properties [9]. Besides evaluation, the algorithm can also be used for computing derivatives and for obtaining continuity conditions on neighbouring triangular patches.
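As an illustration of recursion (2.4), here is a small, self-contained sketch of de Casteljau evaluation on a Bernstein-Bézier triangle. The function name and the sample ordinates in the usage example are ours, chosen only for illustration.

```python
def de_casteljau_triangle(bezier_ordinates, tau, d=2):
    """Evaluate a degree-d Bernstein-Bezier polynomial on a triangle at the
    barycentric point tau = (t1, t2, t3), following recursion (2.4).
    bezier_ordinates maps index triples (i, j, k) with i+j+k = d to b_ijk."""
    t1, t2, t3 = tau
    b = dict(bezier_ordinates)            # level r = 0
    for r in range(1, d + 1):
        new_b = {}
        for i in range(d - r + 1):
            for j in range(d - r + 1 - i):
                k = d - r - i - j
                new_b[(i, j, k)] = (t1 * b[(i + 1, j, k)]
                                    + t2 * b[(i, j + 1, k)]
                                    + t3 * b[(i, j, k + 1)])
        b = new_b
    return b[(0, 0, 0)]

# Usage with arbitrary (illustrative) quadratic ordinates, evaluated at the
# centre of the domain triangle.
quad = {(2, 0, 0): 1.0, (0, 2, 0): 0.5, (0, 0, 2): 0.0,
        (1, 1, 0): 0.8, (1, 0, 1): 0.2, (0, 1, 1): 0.4}
centre_value = de_casteljau_triangle(quad, (1/3, 1/3, 1/3))
```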
Figure 1. (a) Schematic representation of a quadratic bivariate polynomial by means of its Bézier ordinates $b_{ijk}$. (b) A quadratic Bernstein-Bézier polynomial with its control points and control net.
Representing complex surfaces requires the use of a large number of Bézier triangles. Preserving a certain degree of continuity between all patches then results in a large set of non-trivial relations between their Bézier ordinates. Therefore, one has looked for piecewise polynomials with inherent global continuity conditions.
2.2. The Powell-Sabin Spline Space
Consider a simply connected subset $\Omega \subset \mathbb{R}^2$ with polygonal boundary $\partial\Omega$. Assume a conforming triangulation ∆ of Ω is given, consisting of t triangles $T_j$, with j = 1, ..., t, and having n vertices $V_k$, with k = 1, ..., n. A triangulation is conforming if no triangle contains a vertex different from its own three vertices. The Powell-Sabin (PS) refinement ∆* of ∆ partitions each triangle $T_j$ into six smaller triangles in the following way:
1. Choose an interior point $Z_j$ in each triangle $T_j$, so that if two triangles $T_i$ and $T_j$ have a common edge, then the line joining $Z_i$ and $Z_j$ intersects the common edge at a point $R_{ij}$ between its vertices.
2. Join each point $Z_j$ to the vertices of $T_j$.
3. For each edge of the triangle $T_j$
(a) which belongs to the boundary $\partial\Omega$: join $Z_j$ to an arbitrary point on that edge;
(b) which is common to a triangle $T_i$: join $Z_j$ to $R_{ij}$.
Figure 2(a) displays a triangulation with 8 triangles, and a corresponding PS refinement containing 48 triangles. The space of piecewise quadratic polynomials on ∆* with global $C^1$-continuity is called the Powell-Sabin spline space:
$$S^1_2(\Delta^*) := \left\{ s \in C^1(\Omega) : s|_{T_j^*} \in \Pi_2, \; T_j^* \in \Delta^* \right\}. \tag{2.5}$$
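The construction above leaves the choice of the interior points $Z_j$ open. The sketch below shows one common (assumed) choice, the incenter, together with the computation of the split points $R_{ij}$ as the intersection of the segment $Z_iZ_j$ with the common edge; the helper names are ours, not the chapter's.

```python
import numpy as np

def incenter(A, B, C):
    """Incenter of triangle ABC: a commonly used choice for the interior
    point Z_j, since the segment joining the incenters of two neighbouring
    triangles crosses their common edge between its endpoints."""
    a = np.linalg.norm(B - C)   # side length opposite A
    b = np.linalg.norm(C - A)   # side length opposite B
    c = np.linalg.norm(A - B)   # side length opposite C
    return (a * A + b * B + c * C) / (a + b + c)

def edge_split_point(Zi, Zj, Va, Vb):
    """Intersection R_ij of the segment Z_i Z_j with the common edge V_a V_b,
    obtained by solving Z_i + s (Z_j - Z_i) = V_a + t (V_b - V_a)."""
    M = np.column_stack((Zj - Zi, Va - Vb))
    s, t = np.linalg.solve(M, Va - Zi)
    return Va + t * (Vb - Va)
```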
Figure 2. (a) A PS refinement ∆∗ (in dashed lines) of a given triangulation ∆ (in solid lines). (b) The PS points (bullets) and a set of suitable PS triangles (shaded).
Each of the 6t triangles resulting from the PS refinement is the domain triangle of a quadratic Bernstein-Bézier polynomial. Powell and Sabin [21] proved that the following interpolation problem
$$s(V_k) = f_k, \quad \frac{\partial s}{\partial x}(V_k) = f_{x,k}, \quad \frac{\partial s}{\partial y}(V_k) = f_{y,k}, \quad k = 1, \ldots, n, \tag{2.6}$$
has a unique solution $s(x,y) \in S^1_2(\Delta^*)$ for any given set of n $(f_k, f_{x,k}, f_{y,k})$-triplets. Hence, the dimension of the Powell-Sabin spline space $S^1_2(\Delta^*)$ equals 3n.
2.3. A B-spline Representation
Dierckx [6] presented a geometric method to construct a normalized basis for the spline space $S^1_2(\Delta^*)$. Every Powell-Sabin spline can then be represented as
$$s(x,y) = \sum_{i=1}^{n}\sum_{j=1}^{3} c_{i,j}\, B_i^j(x,y). \tag{2.7}$$
To obtain the basis functions $B_i^j(x,y)$, we associate with each vertex $V_i$ three linearly independent triplets $(\alpha_{i,j}, \beta_{i,j}, \gamma_{i,j})$, j = 1, 2, 3. These triplets are determined as follows:
1. For each vertex $V_i \in \Delta$ with Cartesian coordinates $(x_i, y_i)$, find the corresponding PS points. These points are the immediately surrounding Bézier domain points of $V_i$ in the PS refinement ∆*. The vertex $V_i$ itself is also a PS point. In Figure 2(b) the PS points are indicated as bullets.
2. For each vertex $V_i$, find a triangle $t_i(Q_{i,1}, Q_{i,2}, Q_{i,3})$ that contains all the PS points of $V_i$. These triangles $t_i$, i = 1, ..., n, are called PS triangles, and we denote their vertices as $Q_{i,j} = (X_{i,j}, Y_{i,j})$. Figure 2(b) shows a possible set of PS triangles.
3. The three linearly independent triplets $(\alpha_{i,j}, \beta_{i,j}, \gamma_{i,j})$, j = 1, 2, 3, are derived from the PS triangle $t_i$ of vertex $V_i$ as follows:
• $\alpha_i = (\alpha_{i,1}, \alpha_{i,2}, \alpha_{i,3})$ are the barycentric coordinates of $V_i$ with respect to $t_i$,
• $\beta_i = (\beta_{i,1}, \beta_{i,2}, \beta_{i,3})$ and $\gamma_i = (\gamma_{i,1}, \gamma_{i,2}, \gamma_{i,3})$ are the unit barycentric directions with respect to $t_i$, in the x- and y-direction respectively.
Practically, they can be computed as
$$\alpha_i = \left( \frac{1}{E}\begin{vmatrix} x_i & y_i & 1 \\ X_{i,2} & Y_{i,2} & 1 \\ X_{i,3} & Y_{i,3} & 1 \end{vmatrix}, \; \frac{1}{E}\begin{vmatrix} X_{i,1} & Y_{i,1} & 1 \\ x_i & y_i & 1 \\ X_{i,3} & Y_{i,3} & 1 \end{vmatrix}, \; \frac{1}{E}\begin{vmatrix} X_{i,1} & Y_{i,1} & 1 \\ X_{i,2} & Y_{i,2} & 1 \\ x_i & y_i & 1 \end{vmatrix} \right),$$
$$\beta_i = \left( \frac{Y_{i,2}-Y_{i,3}}{E}, \frac{Y_{i,3}-Y_{i,1}}{E}, \frac{Y_{i,1}-Y_{i,2}}{E} \right), \quad \gamma_i = \left( \frac{X_{i,3}-X_{i,2}}{E}, \frac{X_{i,1}-X_{i,3}}{E}, \frac{X_{i,2}-X_{i,1}}{E} \right),$$
with
$$E = \begin{vmatrix} X_{i,1} & Y_{i,1} & 1 \\ X_{i,2} & Y_{i,2} & 1 \\ X_{i,3} & Y_{i,3} & 1 \end{vmatrix}.$$
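A direct transcription of these determinant formulas as a numpy-based sketch (the function name is ours): given the coordinates of a vertex and the corners of its PS triangle, it returns the triplets $\alpha_i$, $\beta_i$ and $\gamma_i$.

```python
import numpy as np

def ps_triplets(V, Q):
    """Barycentric coordinates alpha of vertex V = (x_i, y_i) and the unit
    barycentric directions beta, gamma (x- and y-direction) with respect to
    the PS triangle with corners Q = [Q1, Q2, Q3]."""
    (X1, Y1), (X2, Y2), (X3, Y3) = Q
    x, y = V
    E = np.linalg.det(np.array([[X1, Y1, 1.0], [X2, Y2, 1.0], [X3, Y3, 1.0]]))
    alpha = np.array([
        np.linalg.det(np.array([[x, y, 1.0], [X2, Y2, 1.0], [X3, Y3, 1.0]])),
        np.linalg.det(np.array([[X1, Y1, 1.0], [x, y, 1.0], [X3, Y3, 1.0]])),
        np.linalg.det(np.array([[X1, Y1, 1.0], [X2, Y2, 1.0], [x, y, 1.0]])),
    ]) / E
    beta = np.array([Y2 - Y3, Y3 - Y1, Y1 - Y2]) / E
    gamma = np.array([X3 - X2, X1 - X3, X2 - X1]) / E
    return alpha, beta, gamma
```

Note that, as expected, the components of alpha sum to one, while those of beta and gamma (being directions) sum to zero.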
The Powell-Sabin B-spline $B_i^j(x,y)$ is defined as the unique solution of the interpolation problem (2.6) with all $(f_k, f_{x,k}, f_{y,k}) = (0,0,0)$ except for k = i, where $(f_i, f_{x,i}, f_{y,i}) = (\alpha_{i,j}, \beta_{i,j}, \gamma_{i,j}) \neq (0,0,0)$. Figure 3 shows an example of three linearly independent Powell-Sabin B-splines corresponding to the same vertex.
Choice of PS triangles. The set of PS triangles is not uniquely defined for a given PS refinement. One possibility for their construction is to calculate triangles of minimal area, the so-called optimal PS triangles [6]. Computationally, this leads to a quadratic programming problem. An alternative (and easier to implement) solution is given in [31], where the sides of the PS triangle are found by connecting two neighbouring PS points. From a practical point of view, other choices may be more appropriate. A particular choice of the PS triangles can, e.g., simplify the treatment of boundary conditions [28]. In such a case it is better to construct PS triangles at the boundary vertices with one side tangential and another normal to the boundary curve. For quasi-interpolation [20] the corners of each PS triangle are preferably chosen on edges of the triangulation.
2.4. Properties of the B-spline Basis
The Powell-Sabin B-splines have some nice properties, which are very useful in CAGD and approximation applications. In this section we review some of them. Later on, in section 3., we will discuss in more detail subdivision rules for Powell-Sabin splines in the representation (2.7).
Figure 3. (a) A given triangulation with PS refinement. (b)-(d) The three Powell-Sabin B-splines Bij (x, y) corresponding to the central vertex Vi and its PS triangle. The contour lines of the basis functions are depicted.
Local support. It is easy to see that each Powell-Sabin B-spline $B_i^j(x,y)$ has a local support, because the basis function is zero outside the molecule $M_i$ of vertex $V_i$. The molecule (also called 1-ring) is defined as the union of all triangles in the triangulation that contain $V_i$.
Convex partition of unity. Dierckx showed in [6] that the proposed basis forms a convex partition of unity on the domain Ω, i.e.,
$$B_i^j(x,y) \ge 0, \quad \text{and} \quad \sum_{i=1}^{n}\sum_{j=1}^{3} B_i^j(x,y) = 1, \quad \text{for all } (x,y) \in \Omega. \tag{2.8}$$
The fact that each PS triangle $t_i$ contains all PS points of vertex $V_i$ guarantees the positivity of the basis functions.
Stability. Maes et al. [19] proved that the basis is $L_\infty$-stable. For the max-norms
$$\|c\|_\infty = \max_{i,j} |c_{i,j}|, \quad \text{and} \quad \|s(x,y)\|_\infty = \max_{\Omega} |s(x,y)|,$$
they showed that for all choices of the coefficient vector c, one has that
$$K_1 \|c\|_\infty \le \|s(x,y)\|_\infty \le K_2 \|c\|_\infty, \tag{2.9}$$
where $K_2 = 1$, and $K_1$ depends only on the smallest angle $\theta_\Delta$ in the triangulation ∆ and on the size of the PS triangles. Moreover, the smaller the PS triangles, the better (the larger) the stability constant. In [25] an adapted version of the proof is given, resulting in a sharper stability bound. The ratio $K_2/K_1$ yields an upper bound for the condition number of the basis. It reflects the influence of a change in the coefficients on the magnitude of the corresponding spline with respect to the $L_\infty$-norm. In [26, 32] the stability of the Powell-Sabin spline basis is proven in the more general $L_p$-norm with $1 \le p \le \infty$. A stable local basis provides spline approximations of smooth functions with an optimal order [16].
PS control triangles. Referring to the representation (2.7) for Powell-Sabin splines, we define control points as
$$c_{i,j} = (Q_{i,j}, c_{i,j}). \tag{2.10}$$
These points lead to PS control triangles $T_i(c_{i,1}, c_{i,2}, c_{i,3})$, which are tangent to the spline surface z = s(x, y) at the vertices $V_i$. The projections of the control triangles $T_i$ in the (x, y)-plane are simply the PS triangles $t_i$. Using these control triangles a designer can interactively change the shape of a given Powell-Sabin spline locally in a predictable way. From property (2.8) it follows that the graph of the spline (2.7) lies inside the convex hull of its control points $c_{i,j}$. Figure 4 shows a Powell-Sabin spline surface together with the corresponding control triangles. The spline is taken from [7] and represents a smooth approximation of the function $\left(\exp((x-0.52)^2 + (y-0.48)^2) - 0.95\right)^{-1}$ on the domain [−1, 1] × [−1, 1].
Figure 4. A Powell-Sabin spline and its PS control triangles.
Figure 5. (a) PS refinement of a triangle $T(V_i, V_j, V_k)$, together with the PS triangle $t_i(Q_{i,1}, Q_{i,2}, Q_{i,3})$ associated with vertex $V_i$. (b) Schematic representation of the Bézier ordinates of a Powell-Sabin spline.
Other bases for the Powell-Sabin spline space can be found in the literature [2, 17]. Their construction is based on so-called minimal determining sets. They are stable, but they do not form a convex partition of unity and they have no geometric interpretation via control triangles.
2.5. A Bernstein-Bézier Representation
For further manipulation (e.g., evaluation and differentiation) of a Powell-Sabin spline in the form (2.7), we can write the spline in a Bernstein-Bézier representation. We consider a single domain triangle $T(V_i, V_j, V_k) \in \Delta$ with its PS refinement in ∆*. The other triangles in the triangulation can be treated in the same way. We assume that the points indicated in Figure 5(a) have the following barycentric coordinates:
$$V_i(1,0,0), \quad V_j(0,1,0), \quad V_k(0,0,1), \quad Z_{ijk}(z_i, z_j, z_k),$$
$$R_{ij}(\lambda_{ij}, \lambda_{ji}, 0), \quad R_{jk}(0, \lambda_{jk}, \lambda_{kj}), \quad R_{ki}(\lambda_{ik}, 0, \lambda_{ki}).$$
On each of the six triangles in ∆* the Powell-Sabin spline is a quadratic polynomial that can be represented in its Bernstein-Bézier formulation, i.e., with d = 2 in equations (2.2) and (2.3). The values of the corresponding Bézier ordinates are derived in [6]. The outcome is schematically represented in Figure 5(b), with
$$s_i = \alpha_{i,1}\, c_{i,1} + \alpha_{i,2}\, c_{i,2} + \alpha_{i,3}\, c_{i,3}, \tag{2.11a}$$
$$u_i = L_{i,1}\, c_{i,1} + L_{i,2}\, c_{i,2} + L_{i,3}\, c_{i,3}, \tag{2.11b}$$
$$v_i = L'_{i,1}\, c_{i,1} + L'_{i,2}\, c_{i,2} + L'_{i,3}\, c_{i,3}, \tag{2.11c}$$
$$w_i = \tilde{L}_{i,1}\, c_{i,1} + \tilde{L}_{i,2}\, c_{i,2} + \tilde{L}_{i,3}\, c_{i,3}. \tag{2.11d}$$
The values of $(\alpha_{i,1}, \alpha_{i,2}, \alpha_{i,3})$, $(L_{i,1}, L_{i,2}, L_{i,3})$, $(L'_{i,1}, L'_{i,2}, L'_{i,3})$ and $(\tilde{L}_{i,1}, \tilde{L}_{i,2}, \tilde{L}_{i,3})$ are found as the barycentric coordinates of the PS points $V_i$, $S_i$, $S'_i$ and $\tilde{S}_i$, respectively, with respect to the PS triangle $t_i(Q_{i,1}, Q_{i,2}, Q_{i,3})$. These points are depicted in Figure 5(a). Analogously, we can compute the values of $(s_j, u_j, v_j, w_j)$ and $(s_k, u_k, v_k, w_k)$. The other Bézier ordinates are derived from the inherent continuity conditions of the Powell-Sabin spline, e.g.,
$$r_k = \lambda_{ij}\, u_i + \lambda_{ji}\, v_j, \tag{2.11e}$$
$$\theta_k = \lambda_{ij}\, w_i + \lambda_{ji}\, w_j, \tag{2.11f}$$
$$\omega = z_i\, w_i + z_j\, w_j + z_k\, w_k. \tag{2.11g}$$
In this Bernstein-Bézier representation the Powell-Sabin splines can be easily manipulated using the de Casteljau algorithm (2.4).
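The conversions (2.11a)-(2.11d) are plain dot products between the B-spline coefficients and barycentric coordinates of PS points. A minimal sketch of this step follows; the helper names are ours, and the PS points $S_i$, $S'_i$, $\tilde{S}_i$ are assumed to be already known from the PS refinement.

```python
import numpy as np

def barycentric(P, Q):
    """Barycentric coordinates of a 2D point P with respect to the triangle
    with corners Q = [Q1, Q2, Q3] (solving the standard 3x3 linear system)."""
    A = np.vstack((np.asarray(Q, dtype=float).T, np.ones(3)))
    return np.linalg.solve(A, np.array([P[0], P[1], 1.0]))

def vertex_ordinates(ci, Qi, Vi, Si, Si_prime, Si_tilde):
    """Bezier ordinates s_i, u_i, v_i, w_i of equations (2.11a)-(2.11d):
    each is the inner product of the coefficients c_{i,1..3} with the
    barycentric coordinates of the corresponding PS point w.r.t. t_i."""
    ci = np.asarray(ci, dtype=float)
    points = [Vi, Si, Si_prime, Si_tilde]
    return [float(np.dot(barycentric(P, Qi), ci)) for P in points]
```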
2.6. Parametric Powell-Sabin Surfaces
A parametric Powell-Sabin surface is defined as
$$\begin{cases} x = \sum_{i=1}^{n}\sum_{j=1}^{3} c^x_{i,j}\, B_i^j(u,v), \\ y = \sum_{i=1}^{n}\sum_{j=1}^{3} c^y_{i,j}\, B_i^j(u,v), \\ z = \sum_{i=1}^{n}\sum_{j=1}^{3} c^z_{i,j}\, B_i^j(u,v), \end{cases} \quad (u,v) \in \Omega, \tag{2.12}$$
or, compactly,
$$\mathbf{s}(u,v) = \sum_{i=1}^{n}\sum_{j=1}^{3} \mathbf{c}_{i,j}\, B_i^j(u,v), \tag{2.13}$$
where the $\mathbf{c}_{i,j} = (c^x_{i,j}, c^y_{i,j}, c^z_{i,j})$ are again called control points. Referring to (2.7), the graph of a Powell-Sabin spline is a particular case of the parametric Powell-Sabin surface, notably with x = u and y = v. A parametric surface s(u, v) lies within the convex hull of its control points. We can associate a control triangle $T_i(\mathbf{c}_{i,1}, \mathbf{c}_{i,2}, \mathbf{c}_{i,3})$ with each vertex $V_i$ in the parameter domain. This triangle is tangent to the surface at $\mathbf{s}(V_i)$. Note that in the parametric setting the choice of the control points $\mathbf{c}_{i,j}$ is completely free, whereas for Powell-Sabin splines only the z-component of the control points can be chosen. Figure 6 depicts a torus modelled using a parametric Powell-Sabin surface with 16 control triangles. Parametric Powell-Sabin surfaces can be easily manipulated by applying the algorithms for Powell-Sabin splines separately on the three components in (2.12).
Figure 6. A torus modelled by a parametric Powell-Sabin surface. The control triangles and the triangular mesh lines are shown.
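Since the three components are treated identically, evaluating (2.13) at one parameter point reduces to a single inner product per coordinate. A minimal sketch, assuming the basis values $B_i^j(u,v)$ have already been computed elsewhere (the array shapes and function name are our own conventions):

```python
import numpy as np

def evaluate_parametric_surface(control_points, basis_values):
    """Evaluate (2.13) at one parameter point: control_points has shape
    (n, 3, 3) holding the c_{i,j} = (c^x, c^y, c^z), and basis_values has
    shape (n, 3) holding B_i^j(u, v). The x, y and z components are handled
    identically, as the text notes."""
    C = np.asarray(control_points, dtype=float).reshape(-1, 3)   # (3n, 3)
    B = np.asarray(basis_values, dtype=float).reshape(-1)        # (3n,)
    return B @ C                                                 # (x, y, z)
```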
3. Spline Subdivision
A natural question that comes up in many applications is how to represent a spline function on a refinement $\Delta_1$ of the given triangulation $\Delta_0$. A procedure to do that is called a subdivision scheme. Windmolders and Dierckx [35] considered the subdivision problem for uniform Powell-Sabin splines with a dyadic scheme. Recently, this was used to construct a tangent subdivision scheme for parameter-free surfaces [30]. However, the dyadic refinement scheme is generally not applicable to non-uniform Powell-Sabin splines. Vanraes et al. [33] presented a global triadic subdivision scheme for general Powell-Sabin splines, which was extended to a local adaptive scheme in [23]. First, we explain the refinement strategy for a given triangulation, and then we determine the corresponding PS control triangles of the new spline representation such that the original spline surface is preserved.
3.1. Refinement Rules of the Triangulation
General subdivision for Powell-Sabin splines is based on the so-called √3 refinement scheme [14, 15]. Such a refinement proceeds as follows:
1. Split every triangle into three subtriangles by inserting a new vertex $V_{ijk}$ inside the old triangle $T(V_i, V_j, V_k)$, and connect it to the surrounding old vertices. For example, in the Powell-Sabin case the new vertex will be located at the interior point $Z_{ijk}$.
2. Flip each edge adjacent to two refined triangles of the original triangulation in order to rebalance the new triangulation. These edges now connect two new vertices instead of two original vertices.
These two steps are illustrated in Figure 7. From the construction it follows that the refined triangulation $\Delta^{\sqrt{3}}$ does not preserve the original edges except at the boundary. However, if the new vertex $V_{ijk}$ is chosen as the interior point $Z_{ijk}$ of the original PS refinement, the new edges of $\Delta^{\sqrt{3}}$ still coincide with the edges of that PS refinement. When we apply the √3 refinement scheme twice, we obtain a triadic split, as shown in Figure 8. Every original edge is trisected and each original triangle is split into nine subtriangles.
In order to make subdivision possible, the interior points of the PS refinement $\Delta^{\sqrt{3},*}$ of the new triangulation $\Delta^{\sqrt{3}}$ must be chosen such that $\Delta^{\sqrt{3},*}$ contains the edges of the original PS refinement ∆*. As can be seen in Figure 7(c), the new triangles in $\Delta^{\sqrt{3}}$ are bisected by an edge of ∆*. If their interior points are chosen on such an edge, we obtain a valid PS refinement $\Delta^{\sqrt{3},*}$. Figure 9 depicts a √3 refined triangle where the corresponding PS refinement is indicated with dashed lines.
Figure 7. Principle of √3 refinement. (a) PS refinement of two neighbouring triangles. (b) Place a new vertex at the position of the interior points and connect with the triangle corners. (c) Flip the edge adjacent to the two refined triangles. The dashed lines are part of the PS refinement.
Figure 8. Applying the √3 refinement scheme twice results in a triadic refinement.
3.2. The Construction of Refined Control Triangles
In this section we explain how to derive the B-spline coefficients of the new Powell-Sabin spline on the (locally) refined triangulation, when the B-spline coefficients were given on the original triangulation. For the vertices $V_i$ in the original triangulation one can reuse the old PS triangles defined by their corner points $Q_{i,m}$, with m = 1, 2, 3. This is valid because any new PS point in the refined triangulation lies closer to the considered original vertex. However, it is better to determine a smaller PS triangle for improving the stability of the new Powell-Sabin spline. We will rescale the original PS triangle with an appropriate scalar $\omega_i$. The value of $\omega_i$ can be found by comparing the positions of the old and new PS points (see [33]). The corners of the new PS triangle $t_i^{\sqrt{3}}$ (see Figure 9) are then given by
$$Q_{i,m}^{\sqrt{3}} = \omega_i V_i + (1 - \omega_i)\, Q_{i,m}, \quad m = 1, 2, 3,$$
and the corresponding coefficients $c_{i,m}^{\sqrt{3}}$ are calculated via the old coefficients $c_{i,m}$ as
$$c_{i,1}^{\sqrt{3}} = (\omega_i \alpha_{i,1} + 1 - \omega_i)\, c_{i,1} + \omega_i \alpha_{i,2}\, c_{i,2} + \omega_i \alpha_{i,3}\, c_{i,3}, \tag{3.14a}$$
$$c_{i,2}^{\sqrt{3}} = \omega_i \alpha_{i,1}\, c_{i,1} + (\omega_i \alpha_{i,2} + 1 - \omega_i)\, c_{i,2} + \omega_i \alpha_{i,3}\, c_{i,3}, \tag{3.14b}$$
$$c_{i,3}^{\sqrt{3}} = \omega_i \alpha_{i,1}\, c_{i,1} + \omega_i \alpha_{i,2}\, c_{i,2} + (\omega_i \alpha_{i,3} + 1 - \omega_i)\, c_{i,3}. \tag{3.14c}$$
The PS triangles $t_{ijk}^{\sqrt{3}}$ associated with the new vertices $V_{ijk}$ in the √3 refined triangulation are defined by
$$Q_{ijk,1}^{\sqrt{3}} = (V_{ijk} + V_i)/2, \quad Q_{ijk,2}^{\sqrt{3}} = (V_{ijk} + V_j)/2, \quad \text{and} \quad Q_{ijk,3}^{\sqrt{3}} = (V_{ijk} + V_k)/2,$$
as shown in Figure 9. The corresponding coefficients are computed as
$$c_{ijk,1}^{\sqrt{3}} = \tilde{L}_{i,1}\, c_{i,1} + \tilde{L}_{i,2}\, c_{i,2} + \tilde{L}_{i,3}\, c_{i,3}, \tag{3.15a}$$
$$c_{ijk,2}^{\sqrt{3}} = \tilde{L}_{j,1}\, c_{j,1} + \tilde{L}_{j,2}\, c_{j,2} + \tilde{L}_{j,3}\, c_{j,3}, \tag{3.15b}$$
$$c_{ijk,3}^{\sqrt{3}} = \tilde{L}_{k,1}\, c_{k,1} + \tilde{L}_{k,2}\, c_{k,2} + \tilde{L}_{k,3}\, c_{k,3}. \tag{3.15c}$$
Remark that $Q_{ijk,1}^{\sqrt{3}}$ simply corresponds to the point $\tilde{S}_i$ in Figure 5(a), and equation (3.15a) refers to equation (2.11d). Likewise, the triplets $(\tilde{L}_{j,1}, \tilde{L}_{j,2}, \tilde{L}_{j,3})$ and $(\tilde{L}_{k,1}, \tilde{L}_{k,2}, \tilde{L}_{k,3})$ can be computed as the barycentric coordinates of $Q_{ijk,2}^{\sqrt{3}}$ and $Q_{ijk,3}^{\sqrt{3}}$, respectively, with respect to the PS triangles of the vertices $V_j$ and $V_k$. The formulas (3.14)-(3.15) use only convex combinations of the old coefficients. As a consequence, this subdivision scheme is numerically stable.
Figure 9. The PS triangles $t_i^{\sqrt{3}}(Q_{i,1}^{\sqrt{3}}, Q_{i,2}^{\sqrt{3}}, Q_{i,3}^{\sqrt{3}})$ and $t_{ijk}^{\sqrt{3}}(Q_{ijk,1}^{\sqrt{3}}, Q_{ijk,2}^{\sqrt{3}}, Q_{ijk,3}^{\sqrt{3}})$ associated with the vertices $V_i$ and $V_{ijk}$ in a √3 refined triangulation. The PS refinement is indicated with dashed lines.
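The vertex update (3.14) can be written as a small mixing matrix applied to the three old coefficients. The sketch below (names ours) assumes $0 \le \omega_i \le 1$ and that $V_i$ lies inside its PS triangle, so all matrix entries are nonnegative.

```python
import numpy as np

def refine_vertex_coefficients(c_old, alpha, omega):
    """Coefficients c_{i,m}^(sqrt3) of equations (3.14a)-(3.14c): rescaling
    the old PS triangle by omega_i mixes the old coefficients c_{i,1..3}
    through convex combinations only."""
    c_old = np.asarray(c_old, dtype=float)    # (c_{i,1}, c_{i,2}, c_{i,3})
    alpha = np.asarray(alpha, dtype=float)    # barycentric coords of V_i
    omega = float(omega)
    # Row m of the mixing matrix: omega * alpha + (1 - omega) * e_m.
    M = omega * np.tile(alpha, (3, 1)) + (1.0 - omega) * np.eye(3)
    return M @ c_old

# Each row of M sums to one (alpha sums to one), which is exactly the
# "convex combinations only" property stated above.
```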
Figure 10. (a) A Powell-Sabin spline with 7 control triangles. (b) The equivalent triadically subdivided spline.
Applying those rules twice, one obtains the control triangles of the new vertices in a triadically refined triangulation. Figure 10 illustrates the triadic subdivision scheme for a given Powell-Sabin spline.
3.3. Applications
In a broad range of application domains spline subdivision is of interest. In this section we discuss some applications of the PS subdivision scheme.
Visualization. A common application of spline subdivision is visualization. After a few subdivision steps, the PS control triangles mimic the shape of the Powell-Sabin spline quite well. These control triangles can be used to construct a wireframe that approximates the spline surface. The question is how to connect the PS control triangles efficiently into a single wireframe. We consider a few approaches. The wireframe can be constructed by connecting the spline interpolation points $s(V_i)$, in the same way as the vertices $V_i$ are connected in the domain triangulation. These points are the tangent points of the PS control triangles to the spline surface. This approach was suggested in [35]. Figure 11(a) shows such triangular meshes for the splines in Figure 10. Another strategy is to use the Bézier control net, see Figure 1(b), of the Powell-Sabin spline in its Bernstein-Bézier representation [6]. This control net has the advantage that it forms a convex hull for the spline surface. It also converges more rapidly to the surface than the previous wireframe. However, many triangles are needed in this representation, as can be seen in Figure 11(b). A fair compromise is the wireframe in Figure 11(c), constructed by connecting projections of some PS points into the PS control triangles in a particular way, as described below. We first build the mesh in the parameter domain, and then we project the points in this mesh into the corresponding PS control triangles. There are three types of patches in such a mesh, as illustrated in Figure 12. The first type is obtained by constructing for each vertex $V_i$ the smallest envelope polygon that contains all PS points associated with $V_i$. Note
Figure 11. Different wireframes for visualizing the Powell-Sabin splines in Figure 10. The wireframe can be obtained (a) by connecting the spline interpolation points $s(V_i)$, (b) by using the Bézier control net, (c) by connecting the projections of certain PS points into the corresponding PS control triangles, as explained in section 3.3.
that the corners of such a polygon are particular PS points. Then, these corner points are connected into triangular and quadrilateral patches in the following way. For each triangle in the domain triangulation ∆ we construct a triangular patch by connecting the three PS points in the interior of the considered triangle. We connect the adjacent PS points along each edge in ∆ to form a quadrilateral patch. The wireframe is then defined by the projections of the corners of these patches into the corresponding PS control triangles. Note that these projections are just particular Bézier ordinates in the Bernstein-Bézier representation of the Powell-Sabin spline. For instance, the Bézier ordinate $w_i$ is the projection of the PS point $\tilde{S}_i$ into PS control triangle $T_i$, as shown in Figure 5.
Figure 12. (a) A triangulation with PS refinement and PS points. (b) The mesh used for constructing a wireframe.
As can be seen in the Figures 11(b)-(c), the edges of this wireframe coincide with particular edges in the Bézier control net. The number of used patches is about 1/8 of the number needed for the Bézier control net. Let n, t and e be the number of vertices, triangles and edges, respectively, in the triangulation ∆. Then, the Bézier control net needs 24t patches. The wireframe of Figure 11(c) only needs n + t + e patches, which amounts to about 3t using Euler's formulas (for a triangulated polygonal domain, e ≈ 3t/2 and n ≈ t/2, so n + t + e ≈ 3t). Because of the reduced number of patches, this wireframe is more visually pleasant than the Bézier control net. Note that at most four patches come together at each vertex in this wireframe. It could be disadvantageous that the patches in this wireframe are constructed of polygons with a different number of edges. Nevertheless, many graphical libraries, such as OpenGL, can efficiently handle such a set of polygons.
Modelling. The basis functions have a smaller support after subdivision. This gives the designer more local control for manipulating the spline surface. By the local nature of the √3 subdivision scheme, complex surfaces can be efficiently represented with a reasonable amount of memory [23].
Approximation. Subdivision is useful in data fitting and finite element applications. The original spline space is a subspace of the space obtained after subdivision. Hence, we are guaranteed of a better spline approximation in the refined space when a least squares data fitting strategy is applied [7], or a Ritz-Galerkin finite element method is used for the numerical solution of partial differential equations [24]. When we use an iterative method to solve the linear systems that arise in these methods, the subdivided version of the optimal spline approximation in the coarser spline space provides a good initial guess for the optimal solution in the finer space. Geometric multigrid methods can be used to solve partial differential equations in a very efficient way. By means of a hierarchy of meshes, one can accelerate the convergence of a basic iterative method. Using the Powell-Sabin subdivision scheme, a nested sequence of triangulations can be easily created with natural intergrid transfer operators [26].
Multiresolution. Multiresolution techniques work with different levels of resolution. Wavelets are functions that split data into different frequency components. Each component is studied with a resolution matched to its scale. A common tool to develop wavelets is the lifting scheme, where subdivision can be used as the prediction step. Such wavelets, called Powell-Sabin spline wavelets, have been developed in [32, 38]. They are particularly suitable for image/surface compression [18].
4. QHPS Splines
When an increased resolution is only required in a small part of the surface, the use of global subdivision may lead to excessive computational and storage costs. In such a case, a local (adaptive) subdivision scheme is recommended. Although the √3 refinement scheme described in section 3.1. can be applied locally, a naive use may introduce poorly shaped triangles at the boundary of the locally refined region. This problem can be dealt with by using a refinement propagation strategy [23]. When a triangle fails to satisfy a certain quality requirement, extra neighbouring triangles are refined. This results in an expansion of the refined region. By the edge flipping step in the √3 refinement method, a narrow triangle (with small angles) will be replaced by two better shaped triangles. At the boundary of the domain, artificial vertices outside the domain can be inserted into the triangulation. The proposed strategy is driven by one parameter that manages the trade-off between the mesh quality and the refinement localization.
Such a specialized shape-improvement strategy could be avoided if non-conforming triangulations were allowed. In this section we review the idea of [25], where Powell-Sabin splines are adapted towards certain non-conforming triangulations. In particular, we consider hierarchical triangulations which are obtained by partitioning an initial conforming triangulation with a triadic split. A hierarchical triangulation gives rise to a set of nested spline spaces. A hierarchical basis for such a space is constructed in a way that the basis functions of the coarser spaces are retained in the basis of the finer space. In a so-called quasi-hierarchical basis some of the coarse-level basis functions are replaced by finer-level basis functions [12]. This section discusses QHPS splines, which are a hierarchical variant of Powell-Sabin splines in a quasi-hierarchical basis representation [25].
4.1. The Hierarchical Powell-Sabin Spline Space
Consider a simply connected subset $\Omega \subset \mathbb{R}^2$ with polygonal boundary, and assume a conforming triangulation $\Delta^0$ of Ω is given. We construct a hierarchical triangulation $\Delta_H$ on Ω by successively partitioning subsets of triangles with a triadic split, starting from the initial triangulation $\Delta^0$. An example of such a triangulation is drawn in Figure 13(a) with solid lines. Here, $\Delta^0$ is the triangulation of Figure 2(a). The hierarchical triangulation has a total of n vertices. Of these vertices, $n_{nc}$ are non-conforming (or hanging) vertices. They are located on interiors of triangle edges. The remaining ones, i.e., $n_c = n - n_{nc}$, are called conforming vertices. In Figure 13(a), $\Delta^0$ consists of 8 triangles and 8 (conforming) vertices. The hierarchical triangulation in the figure consists of 16 triangles and 15 vertices ($n_c = 9$, $n_{nc} = 6$).
Figure 13. (a) A HPS refinement ∆∗H (in dashed lines) of a given hierarchical triangulation ∆H (in solid lines). (b) The QHPS points (bullets) and a set of suitable QHPS triangles (shaded).
To each hierarchical triangulation $\Delta_H$ we can associate a hierarchical mesh structure. It is the set of triangles $T^k$ that are generated during the construction of $\Delta_H$. The superscript k of a triangle $T^k$ refers to the refinement level of that triangle, i.e., the minimal number of triadic refinement steps needed to construct the triangle. The triangles in the mesh structure that are part of $\Delta_H$ are called leaf triangles. We will denote by $\Delta^l_H$ the subset of the mesh structure containing all triangles whose level is l or lower, and let $\Delta^l_H$ be its corresponding hierarchical triangulation. Note that these mesh structures are nested:
$$\Delta^0 \subset \Delta^1_H \subset \Delta^2_H \subset \ldots \subset \Delta_H. \tag{4.16}$$
We will use Powell-Sabin triadic splits, as in Figure 8(c), in the construction of a hierarchical triangulation $\Delta_H$. The PS refinements needed in the splitting process generate a particular refinement of $\Delta_H$ which partitions each triangle in $\Delta_H$ into six subtriangles. This refinement is called the hierarchical Powell-Sabin (HPS) refinement $\Delta^*_H$ of $\Delta_H$. Analogous to (4.16), the HPS refinement yields a nested structure of sets of triangles:
$$\Delta^{0,*} \subset \Delta^{1,*}_H \subset \Delta^{2,*}_H \subset \ldots \subset \Delta^*_H. \tag{4.17}$$
In Figure 13(a) such a HPS refinement is drawn in dashed lines. The space of piecewise quadratic polynomials on $\Delta^*_H$ with global $C^1$-continuity is called the hierarchical Powell-Sabin spline space:
$$S^1_{2,H}(\Delta^*_H) = \left\{ s_H \in C^1(\Omega) : s_H|_{T_j^*} \in \Pi_2, \; T_j^* \in \Delta^*_H \right\}. \tag{4.18}$$
For this hierarchical spline space we considered an interpolation problem similar to (2.6) for Powell-Sabin splines [25]. Given a triplet $(f_k, f_{x,k}, f_{y,k})$ at each conforming vertex $V_k$ in the hierarchical triangulation $\Delta_H$, the interpolation problem
$$s_H(V_k) = f_k, \quad \frac{\partial s_H}{\partial x}(V_k) = f_{x,k}, \quad \frac{\partial s_H}{\partial y}(V_k) = f_{y,k}, \quad k = 1, \ldots, n_c, \tag{4.19}$$
has a unique solution $s_H(x,y) \in S^1_{2,H}(\Delta^*_H)$. It follows that the dimension of the hierarchical Powell-Sabin spline space is equal to $3n_c$.
4.2. A Quasi-hierarchical Powell-Sabin Spline Basis
The construction of a normalized basis for the spline space $S^1_{2,H}(\Delta^*_H)$ is very similar to the geometric approach for the Powell-Sabin basis, described in section 2.3. A hierarchical Powell-Sabin spline in its quasi-hierarchical representation is called a QHPS spline, and is denoted as
$$s_{QH}(x,y) = \sum_{i=1}^{n_c}\sum_{j=1}^{3} c_{i,j}\, B^j_{i,QH}(x,y). \tag{4.20}$$
We associate with each conforming vertex $V_i$ in the hierarchical triangulation three linearly independent triplets $(\alpha_{i,j}, \beta_{i,j}, \gamma_{i,j})$, j = 1, 2, 3. The QHPS B-spline $B^j_{i,QH}(x,y)$ can then be determined as the solution of the interpolation problem (4.19) with all $(f_k, f_{x,k}, f_{y,k}) = (0,0,0)$ except for k = i, where $(f_i, f_{x,i}, f_{y,i}) = (\alpha_{i,j}, \beta_{i,j}, \gamma_{i,j}) \neq (0,0,0)$. The triplets $(\alpha_{i,j}, \beta_{i,j}, \gamma_{i,j})$ are computed as follows:
1. For each conforming vertex $V_i$ in the hierarchical triangulation $\Delta_H$, identify the corresponding QHPS points. Let k be the smallest level of all triangles in $\Delta_H$ that contain vertex $V_i$. Denote $\Delta^k_H$ as the triangulation, consisting of triangles of at most level k, that appears during the construction of $\Delta_H$. The QHPS points of $V_i$ are defined as the midpoints of all edges in the HPS refinement $\Delta^{k,*}_H$ ending in $V_i$. The vertex $V_i$ is also a QHPS point. In Figure 13(b) the QHPS points are indicated as bullets.
2. For each conforming vertex $V_i$, construct a triangle $t_i(Q_{i,1}, Q_{i,2}, Q_{i,3})$ containing all the QHPS points of $V_i$. The triangles $t_i$, $i = 1, \ldots, n_c$, are called QHPS triangles. Figure 13(b) shows a possible set of QHPS triangles.
3. The three linearly independent triplets $(\alpha_{i,j}, \beta_{i,j}, \gamma_{i,j})$, j = 1, 2, 3, are derived from the QHPS triangle $t_i$ of a vertex $V_i$ as follows:
• $\alpha_i = (\alpha_{i,1}, \alpha_{i,2}, \alpha_{i,3})$ are the barycentric coordinates of $V_i$ with respect to $t_i$,
• $\beta_i = (\beta_{i,1}, \beta_{i,2}, \beta_{i,3})$ and $\gamma_i = (\gamma_{i,1}, \gamma_{i,2}, \gamma_{i,3})$ are the coordinates of the unit barycentric directions, in x- and y-direction respectively, with respect to $t_i$.
Figure 14 illustrates how a QHPS B-spline associated with the central vertex in a given triangulation changes when some of the triangles are triadically refined. The same QHPS triangle is used in the three cases. Note that if the hierarchical triangulation is obtained by global triadic splits, i.e., the final triangulation is conforming, then the QHPS B-splines are just the classical Powell-Sabin B-splines.
4.3. Properties of the QHPS B-spline Basis
The quasi-hierarchical B-splines have attractive properties similar to those of the classical Powell-Sabin B-splines. Unless stated otherwise, the proofs of the properties can be found in [25].
Figure 14. (a) Several triadically refined hierarchical triangulations. The HPS refinement is indicated with dashed lines. (b) Contour plots of a QHPS basis function associated with the central vertex for the corresponding meshes in (a). The first B-spline is the same as in Figure 3(b).
Local support. Each QHPS B-spline $B^j_{i,QH}(x,y)$ is zero outside the molecule $M_i^k$ of the corresponding vertex $V_i$ in $\Delta^k_H$, with k the smallest level of any triangle in $\Delta_H$ containing $V_i$.
Convex partition of unity. The QHPS basis splines are positive, and they sum up to one.
Stability. The quasi-hierarchical basis is strongly L∞ -stable, i.e., the stability constants K1 and K2 in inequality (2.9) are both independent of the number of levels in the hierarchical triangulation. In [27] we investigated the Lp -stability of the basis. It turns out that the QHPS basis is, in general, weakly Lp -stable, i.e., the constants K1 −1 and K2 have at most a polynomial growth in the number of levels. However, this result can be improved for a broad class of hierarchical triangulations ∆H . Suppose there exists an upper bound on the difference between the level numbers of any two triangles in ∆H that lie inside the support of the same QHPS B-spline. If this bound is independent of the number of refinement levels in ∆H , then the QHPS basis is strongly Lp -stable. More details can be found in [27]. Note that the classical Powell-Sabin B-splines are always strongly Lp -stable on a globally refined hierarchical triangulation.
QHPS control triangles. We can define control points as $c_{i,j} = (Q_{i,j}, c_{i,j})$ and control triangles as $T_i(c_{i,1}, c_{i,2}, c_{i,3})$. These triangles are tangent to the spline surface z = s(x, y) at the vertices $V_i$, and the graph of the QHPS spline lies inside the convex hull of these control points.
Subdivision. The non-conformity of the hierarchical triangulation allows a local triadic refinement in a natural way. In addition, a QHPS spline can be locally subdivided on a given set of triangles, using the same triadic PS subdivision rules as explained in section 3. The locality of the subdivision scheme ensures that a QHPS spline surface can be adaptively manipulated with a reasonable increase of the dimension of the space. In Figure 15(a) we subdivided the torus, shown in Figure 6, locally on two triangles, and we adapted the QHPS control triangles of the newly introduced vertices in Figure 15(b). Note that the support of each QHPS B-spline associated with one of these new vertices stays within its original triangle. This is a very interesting property for surface editing. The designer selects a triangle where the surface must be locally subdivided. Then, he/she can freely morph the surface while only the part within the selected triangle will be affected. The new QHPS surface in Figure 15(b) only consists of 18 control triangles, whereas a globally subdivided Powell-Sabin spline would need 144 control triangles to represent the same surface.
4.4. A Practical Implementation
In this section we show how the efficient algorithms for classical Powell-Sabin splines can be used for working with quasi-hierarchical Powell-Sabin splines. A QHPS spline can be represented on each leaf triangle by a particular Powell-Sabin spline. To construct this Powell-Sabin spline, we have to determine the PS control triangles corresponding to the three corner vertices of the considered triangle. The PS control triangles associated with conforming vertices can be taken identical to the QHPS control triangles of the QHPS spline. We can use subdivision to compute the control triangles for the non-conforming vertices. The correct PS control triangles can be recursively obtained while moving through the hierarchical mesh structure. When the leaf triangle is reached, the corresponding Powell-Sabin spline on the considered triangle will then be fully defined. In this way, an algorithm for QHPS splines can be straightforwardly reduced to the equivalent algorithm for Powell-Sabin splines. We now present a generic QHPS algorithm [25]. It only needs an equivalent algorithm for classical Powell-Sabin splines.
Generic QHPS algorithm. Let $\Delta_H$ be a hierarchical mesh structure, and ∆ be a conforming triangulation. The generic algorithm qhps_algorithm($\Delta_H$) for QHPS splines is based on the equivalent algorithm ps_algorithm(∆) for classical Powell-Sabin splines. We suppose that each triangle in $\Delta_H$ and ∆ has access to the control triangles associated with its three corner vertices. For conforming vertices, their control triangles are already known in advance, i.e., the QHPS control triangles. For non-conforming vertices, their control triangles are computed during the algorithm.

function qhps_algorithm(hierarchical structure ∆_H)
    for all triangles T in the initial triangulation ∆_0 of ∆_H:
        qhps_local_algorithm(T, ∆_H)
    endfor
end

function qhps_local_algorithm(triangle T, hierarchical structure ∆_H)
    if T is not a leaf triangle:
        1. Let l be the level of T in the hierarchical structure.
        2. for all non-conforming vertices V_i of level (l+1) situated at the interior of an edge of T:
               Calculate the control triangle T_i by triadic PS subdivision, using the control triangles of two corner vertices of T.
           endfor
        3. for all 9 subtriangles T_i of level (l+1) in T:
               qhps_local_algorithm(T_i, ∆_H)
           endfor
    else
        ps_algorithm(T)
    endif
end
The algorithm requires that all triangles in the hierarchical mesh structure ∆H are stored in memory. It is recommended to manage these triangles in the following way. Use an array that contains the triangles in the initial triangulation ∆0 . A triangle that is refined keeps references to each of its nine subtriangles. That ensures that all triangles in ∆H are reachable. In order to navigate easily through the mesh structure it is also advisable that the triangles keep the references to their (at most three) neighbouring triangles. Using a different Powell-Sabin spline on each leaf triangle in the hierarchical triangulation is usually not more time-consuming than considering a single Powell-Sabin spline on a larger set of triangles. Indeed, most of the algorithms for Powell-Sabin splines run over all triangles in the triangulation separately. The QHPS splines can be evaluated and manipulated in a stable way. Only convex combinations are needed to convert the QHPS spline on each triangle to a Powell-Sabin spline, to represent these Powell-Sabin splines with Bernstein-B´ezier polynomials, and to evaluate these polynomials via the de Casteljau algorithm.
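As an illustration of this storage scheme and of the generic traversal above, the following Python sketch uses one object per triangle with references to its nine subtriangles and its (at most three) neighbouring triangles; the class and function names and the ps_algorithm callback are ours and only indicate one possible implementation.

class Triangle:
    def __init__(self, vertices, level=0):
        self.vertices = vertices          # three corner vertices
        self.level = level                # refinement level in the hierarchy
        self.children = []                # nine subtriangles after a triadic split
        self.neighbours = []              # at most three neighbouring triangles
        self.control_triangles = {}       # control triangle per corner vertex

    def is_leaf(self):
        return not self.children

def qhps_algorithm(initial_triangulation, ps_algorithm):
    # initial_triangulation: the triangles of the coarsest level (the array for ∆0)
    for T in initial_triangulation:
        qhps_local_algorithm(T, ps_algorithm)

def qhps_local_algorithm(T, ps_algorithm):
    if T.is_leaf():
        # On a leaf triangle the Powell-Sabin spline is fully defined once the
        # control triangles of its three corner vertices are known.
        ps_algorithm(T)
    else:
        # Control triangles of the non-conforming vertices of level T.level + 1
        # follow from triadic PS subdivision of the control triangles of two
        # corner vertices of T (rules of section 3); placeholder only.
        compute_nonconforming_control_triangles(T)
        for child in T.children:
            qhps_local_algorithm(child, ps_algorithm)

def compute_nonconforming_control_triangles(T):
    pass  # the triadic PS subdivision rules would go here

Because a refined triangle keeps references to its subtriangles, every triangle of ∆H stays reachable from the initial triangulation, as required above.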
5.
NURPS Surfaces
Rational surfaces, such as rational Bézier and NURBS surfaces, are commonly used tools in commercial computer aided design and computer graphics packages. Rational surface representations give a designer extra degrees of freedom compared to their non-rational counterparts through the weights that are associated with the control points. These rational surfaces are able to exactly represent patches on quadric surfaces, e.g., patches on the cone and the sphere. In this section we discuss the rational extension of Powell-Sabin splines, called NURPS surfaces. For more details we refer to the papers [29, 36, 37].

Figure 15. (a) Local QHPS subdivision applied on two triangles of the surface in Figure 6. The control triangles and the triangular mesh lines are shown. (b) Effect of moving the two new control triangles of the QHPS surface.
5.1.
Rational Powell-Sabin Surfaces
Consider a conforming triangulation ∆ on a given domain Ω ⊂ R2 , with PS refinement ∆∗ . A non-uniform rational Powell-Sabin (NURPS) surface s(u, v) is defined as
s(u, v) = \sum_{i=1}^{n} \sum_{j=1}^{3} c_{i,j} \, \phi_i^j(u, v),   (u, v) \in \Omega,   (5.21)
with c_{i,j} = (c^x_{i,j}, c^y_{i,j}, c^z_{i,j}) the NURPS control points, and the blending functions

\phi_i^j(u, v) = \frac{w_{i,j} B_i^j(u, v)}{\sum_{i=1}^{n} \sum_{j=1}^{3} w_{i,j} B_i^j(u, v)}.   (5.22)
Here, B_i^j(u, v) are the normalized Powell-Sabin B-splines on ∆*, and w_{i,j} are positive weights. When all weights are equal, (5.21) reduces to (2.13). The blending functions \phi_i^j(u, v) in (5.22) have local support and form a convex partition of unity. A NURPS surface in representation (5.21) can be seen as the 3D projection onto Euclidean space of a 4D Powell-Sabin spline in homogeneous space, i.e.,
s(u, v) = \sum_{i=1}^{n} \sum_{j=1}^{3} c^h_{i,j} B_i^j(u, v),   (5.23)
where the 4D homogeneous control points are given by

c^h_{i,j} = (c^{hx}_{i,j}, c^{hy}_{i,j}, c^{hz}_{i,j}, c^{hw}_{i,j}) = (w_{i,j} c^x_{i,j}, w_{i,j} c^y_{i,j}, w_{i,j} c^z_{i,j}, w_{i,j}).   (5.24)
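The following sketch illustrates how a surface point could be evaluated directly from (5.21)–(5.24); the function name and the layout of the input arrays are ours, and the Powell-Sabin B-spline values B_i^j(u, v) are assumed to be computed elsewhere.

def nurps_point(B, w, c):
    # B[i][j] = B_i^j(u, v), w[i][j] = weight, c[i][j] = (x, y, z) control point.
    num = [0.0, 0.0, 0.0]   # homogeneous x, y, z coordinates
    den = 0.0               # homogeneous w coordinate
    for i in range(len(c)):
        for j in range(3):
            wb = w[i][j] * B[i][j]
            for k in range(3):
                num[k] += wb * c[i][j][k]
            den += wb
    # den > 0: the weights are positive and the B-splines sum to one.
    return tuple(x / den for x in num)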
The next sections are devoted to some particular features of the NURPS surfaces. First, we discuss the use of the control points and the weights in shape modelling with NURPS surfaces. Then, we describe a subdivision scheme. We end with the NURPS representation of some quadric surfaces.
5.2.
Modelling with NURPS Surfaces
A designer has two types of freedom in the construction of a NURPS surface: the coefficients ci,j and the weights wi,j . They can be determined in a geometrically intuitive way by means of control triangles and weight points.
Control triangles. Similar to the Powell-Sabin splines, the influence of the coefficients in (5.21) on the NURPS surface can be intuitively interpreted via control triangles [36]. With each vertex Vi in the triangulation ∆, one can define the control triangle Ti (ci,1 , ci,2 , ci,3 ). This triangle is tangent to the NURPS surface at the point
s(V_i) = \hat{\alpha}_{i,1} c_{i,1} + \hat{\alpha}_{i,2} c_{i,2} + \hat{\alpha}_{i,3} c_{i,3},   (5.25)

where

\hat{\alpha}_{i,j} = \frac{w_{i,j} \alpha_{i,j}}{\sum_{k=1}^{3} w_{i,k} \alpha_{i,k}}.   (5.26)
Here, (αi,1 , αi,2 , αi,3 ) are the barycentric coordinates of vertex Vi with respect to the PS triangle ti .
Weight points. In [29] it is described how one can use so-called weight points as a design tool for handling the weights. Such a point is characterized by its position p_i and by a scaling factor K_i. The position is chosen as the tangent point of the control triangle T_i to the NURPS surface, i.e., p_i = s(V_i). The three weights w_{i,j}, with j = 1, 2, 3, are then uniquely defined by means of the barycentric coordinates (\hat{\alpha}_{i,1}, \hat{\alpha}_{i,2}, \hat{\alpha}_{i,3}) of p_i with respect to T_i, up to a positive factor K_i, via the formula

w_{i,j} = K_i \frac{\hat{\alpha}_{i,j}}{\alpha_{i,j}}.   (5.27)

A designer can freely move the weight point within the control triangle T_i. Since its position is the tangent point of T_i to the NURPS surface, the effect of the movement will be intuitive to the designer. This is illustrated in the first two pictures of Figure 16. One can use the scaling factor K_i to determine the relative importance of the considered three weights with respect to the other weights. Indeed, K_i can be interpreted as a weighted mean of these weights, i.e.,

K_i = \alpha_{i,1} w_{i,1} + \alpha_{i,2} w_{i,2} + \alpha_{i,3} w_{i,3}.   (5.28)

The larger the value of K_i, the more the NURPS surface will be attracted to the control triangle. A cusp can be simulated by reducing the scaling factor. The effect of changing the scaling factor is illustrated in the bottom two pictures of Figure 16. Similar effects can be obtained by rescaling (enlarging/reducing) the control triangles [37]. However, changing the factor K_i has the advantage that the designer can continue to work with the same control triangles.

Let r_{i,j} be the intersection point of the line p_i − c_{i,j} and the edge of control triangle T_i opposite to c_{i,j}, then

r_{i,j} = \frac{\hat{\alpha}_{i,j'} c_{i,j'} + \hat{\alpha}_{i,j''} c_{i,j''}}{\hat{\alpha}_{i,j'} + \hat{\alpha}_{i,j''}},   (5.29)

with j' = 1 + (j mod 3) and j'' = 1 + (j' mod 3). Using these points, the ratio of two weights can be geometrically interpreted as the ratio of two lengths, i.e.,

\frac{w_{i,j''}}{w_{i,j'}} = \frac{\alpha_{i,j'}}{\alpha_{i,j''}} \, \frac{\| r_{i,j} - c_{i,j'} \|}{\| r_{i,j} - c_{i,j''} \|},   (5.30)

and, using formula (5.27), as a ratio of two triangular areas, i.e.,

\frac{w_{i,j''}}{w_{i,j'}} = \frac{\alpha_{i,j'}}{\alpha_{i,j''}} \, \frac{A(p_i, c_{i,j}, c_{i,j'})}{A(p_i, c_{i,j''}, c_{i,j})}.   (5.31)
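A small sketch of how formulas (5.27) and (5.28) might be applied in a modelling tool; the function names are ours. Since the coordinates \hat{\alpha}_{i,j} sum to one, substituting (5.27) into (5.28) returns the chosen K_i, so the two routines are mutually consistent.

def weights_from_weight_point(alpha, alpha_hat, K):
    # Formula (5.27): alpha are the barycentric coordinates of V_i with respect
    # to the PS triangle t_i, alpha_hat those of the weight point p_i with
    # respect to the control triangle T_i, K the scaling factor K_i.
    return [K * alpha_hat[j] / alpha[j] for j in range(3)]

def scaling_factor(alpha, w):
    # Formula (5.28): K_i as a weighted mean of the three weights.
    return sum(alpha[j] * w[j] for j in range(3))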
5.3.
NURPS Subdivision
Figure 16. (a) Original NURPS surface. (b) The effect of moving the weight point p_i inside the control triangle. (c)-(d) Enlarging and reducing the scaling factor K_i. The considered weight point and control points are indicated with bullets.

We now adapt the subdivision scheme for Powell-Sabin splines towards NURPS surfaces. The construction of the refined triangulation ∆^{\sqrt{3}} and the choice of the PS triangles remain identical as described in section 3. We can compute the control points of the subdivided NURPS surface via its homogeneous representation (5.23). For instance, the homogeneous control points c^{h,\sqrt{3}}_{ijk,m}, with m = 1, 2, 3, corresponding to the PS triangle t^{\sqrt{3}}_{ijk}, are then calculated as

c^{h,\sqrt{3}}_{ijk,1} = \tilde{L}_{i,1} c^h_{i,1} + \tilde{L}_{i,2} c^h_{i,2} + \tilde{L}_{i,3} c^h_{i,3},   (5.32a)
c^{h,\sqrt{3}}_{ijk,2} = \tilde{L}_{j,1} c^h_{j,1} + \tilde{L}_{j,2} c^h_{j,2} + \tilde{L}_{j,3} c^h_{j,3},   (5.32b)
c^{h,\sqrt{3}}_{ijk,3} = \tilde{L}_{k,1} c^h_{k,1} + \tilde{L}_{k,2} c^h_{k,2} + \tilde{L}_{k,3} c^h_{k,3}.   (5.32c)
The convex combinations are identical to the ones in formulas (3.15). Projecting the new control points (5.32) back to Euclidean space yields

c^{\sqrt{3}}_{ijk,m} = \left( \frac{c^{hx,\sqrt{3}}_{ijk,m}}{c^{hw,\sqrt{3}}_{ijk,m}}, \frac{c^{hy,\sqrt{3}}_{ijk,m}}{c^{hw,\sqrt{3}}_{ijk,m}}, \frac{c^{hz,\sqrt{3}}_{ijk,m}}{c^{hw,\sqrt{3}}_{ijk,m}} \right).   (5.33)

Figure 17. The domain triangle of a NURPS patch, together with its PS refinement and PS triangles.

It is well known that working in the homogeneous space can lead to numerical instabilities. When the weights vary greatly in magnitude, the coordinates c^{hr,\sqrt{3}}_{ijk,m}, with r = x, y, z, can become extremely large. Then, the calculations do not operate in the convex hull of the rational control points anymore, and numerical stability is endangered. Inspired by the idea behind the rational variant of the de Casteljau algorithm from Farin [8], we can improve the numerical stability by rearranging the calculations in order to avoid working in the homogeneous space. For instance, c^{\sqrt{3}}_{ijk,1} and its weight w^{\sqrt{3}}_{ijk,1} are computed in a stable way as follows:

w^{\sqrt{3}}_{ijk,1} = c^{hw,\sqrt{3}}_{ijk,1} = \tilde{L}_{i,1} w_{i,1} + \tilde{L}_{i,2} w_{i,2} + \tilde{L}_{i,3} w_{i,3},   (5.34a)
\hat{L}_{i,m} = \frac{w_{i,m} \tilde{L}_{i,m}}{w^{\sqrt{3}}_{ijk,1}},   m = 1, 2, 3,   (5.34b)
c^{\sqrt{3}}_{ijk,1} = \hat{L}_{i,1} c_{i,1} + \hat{L}_{i,2} c_{i,2} + \hat{L}_{i,3} c_{i,3}.   (5.34c)
Analogously, one can calculate the other control points in a numerically stable way.
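As a sketch of the rearranged computation (5.34a)–(5.34c), the following routine returns one subdivided control point together with its weight without ever forming homogeneous coordinates; the names are ours, and the coefficients \tilde{L}_{i,m} of (3.15) are assumed to be given.

def subdivide_stable(L_tilde, w, c):
    # (5.34a): the new weight is a convex combination of the old weights.
    w_new = sum(L_tilde[m] * w[m] for m in range(3))
    # (5.34b): rescaled coefficients; they are non-negative and sum to one.
    L_hat = [w[m] * L_tilde[m] / w_new for m in range(3)]
    # (5.34c): the new control point is computed directly in Euclidean space.
    c_new = tuple(sum(L_hat[m] * c[m][k] for m in range(3)) for k in range(3))
    return c_new, w_new

Only convex combinations of the original control points and weights are used, so the computation stays inside their convex hull.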
5.4.
Quadrics as NURPS Surfaces
In [37] closed formulas are derived for the control points of NURPS patches on a cylinder, a cone and a sphere. In this section we give another representation of such patches, resulting in a simpler choice of the control points. Further on, each triangular NURPS patch is defined on the equilateral domain triangle T(V1, V2, V3) depicted in Figure 17, with the given PS refinement and PS triangles, defined by the points

Q_{1,1} = V_1,   Q_{1,2} = (V_1 + V_2)/2,   Q_{1,3} = (V_1 + V_3)/2,
Q_{2,1} = V_2,   Q_{2,2} = (V_2 + V_3)/2,   Q_{2,3} = (V_2 + V_1)/2,
Q_{3,1} = V_3,   Q_{3,2} = (V_3 + V_1)/2,   Q_{3,3} = (V_3 + V_2)/2.
The derivation of the corresponding control points is analogous to the one described in [37].

Cylinder. The cylinder

x^2 + y^2 = r^2,   0 ≤ z ≤ h,   (5.35)

can be split into eight isometric triangular segments. The control points and weights of the NURPS representation of such a patch are given in the following table.
i   c_{i,1}      w_{i,1}   c_{i,2}        w_{i,2}   c_{i,3}        w_{i,3}
1   (r, 0, 0)    1         (r, r, 0)      √2/2      (r, 0, h/2)    1
2   (0, r, 0)    1         (r, r, h/2)    √2/2      (r, r, 0)      √2/2
3   (r, 0, h)    1         (r, 0, h/2)    1         (r, r, h/2)    √2/2
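The table above can be transcribed directly into code; the following sketch (function name ours) returns the control points and weights of one cylinder patch for a given radius r and height h.

from math import sqrt

def cylinder_patch(r, h):
    s = sqrt(2.0) / 2.0
    # One row per vertex i: [(c_{i,1}, w_{i,1}), (c_{i,2}, w_{i,2}), (c_{i,3}, w_{i,3})]
    return [
        [((r, 0, 0), 1.0), ((r, r, 0), s),       ((r, 0, h / 2), 1.0)],
        [((0, r, 0), 1.0), ((r, r, h / 2), s),   ((r, r, 0), s)],
        [((r, 0, h), 1.0), ((r, 0, h / 2), 1.0), ((r, r, h / 2), s)],
    ]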
The entire cylinder, constructed with eight NURPS patches, is shown in Figure 18(a).

Cone. The cone

x^2 + y^2 = \left( \frac{h - z}{h} \, r \right)^2,   0 ≤ z ≤ h,   (5.36)

can be split into four isometric patches. The NURPS representation of such a patch is given in the following table.

i   c_{i,1}      w_{i,1}   c_{i,2}      w_{i,2}   c_{i,3}      w_{i,3}
1   (r, 0, 0)    1         (r, r, 0)    √2/2      (0, 0, h)    1
2   (0, r, 0)    1         (0, 0, h)    1         (r, r, 0)    √2/2
3   (0, 0, h)    1         (0, 0, h)    1         (0, 0, h)    1
The complete cone is depicted in Figure 18(b).

Sphere. Using only NURPS patches, it is not possible to describe the entire sphere

x^2 + y^2 + z^2 = 1.   (5.37)

Nevertheless, with 2n NURPS patches we can represent the sphere up to some small gaps. The maximal height of these gaps is equal to

h_g = \frac{2 \sin^2\theta}{\cos^2\theta + 1},   (5.38)

where θ = π/n. Small gaps can be achieved by a small angle θ, but then a large number of patches has to be used. The control points of such a patch are shown in the table below.
i   c_{i,1}               w_{i,1}   c_{i,2}                 w_{i,2}   c_{i,3}                 w_{i,3}
1   (cos θ, −sin θ, 0)    1         (1/cos θ, 0, tan²θ)     cos²θ     (cos θ, −sin θ, 1)      √2/2
2   (cos θ, sin θ, 0)     1         (cos θ, sin θ, 1)       √2/2      (1/cos θ, 0, tan²θ)     cos²θ
3   (0, 0, 1)             1         (cos θ, −sin θ, 1)      √2/2      (cos θ, sin θ, 1)       √2/2
An extension to a sphere with radius r is straightforward, i.e., by multiplying the coordinates of each control point by r. An incomplete NURPS representation of the sphere with twelve patches is given in Figure 18(c). The hole filling strategy in [34] can be used to close the gaps in the sphere approximately.
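A small sketch of formula (5.38); for the twelve-patch sphere of Figure 18(c) (n = 6) it gives a maximal gap height of about 0.29 on the unit sphere. The function name is ours.

from math import pi, sin, cos

def gap_height(n):
    # Formula (5.38) with theta = pi / n.
    theta = pi / n
    return 2 * sin(theta) ** 2 / (cos(theta) ** 2 + 1)

# gap_height(6) ~= 0.286 for the twelve-patch unit sphere of Figure 18(c)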
Figure 18. A NURPS representation of (a) a cylinder, (b) a cone, and (c) a sphere. The patches are defined on the domain triangle in Figure 17. Neighbouring patches are shaded in different colours.
6.
Conclusion
Powell-Sabin splines are C^1-continuous piecewise quadratic polynomials defined on an arbitrary conforming triangulation. These splines can be compactly represented in a stable normalized basis. The basis functions can be chosen in a flexible way by means of PS triangles. This leads to a natural definition of PS control triangles, which allow an interactive change of the shape of Powell-Sabin splines in a predictable way.

Powell-Sabin splines can be refined using the √3 subdivision scheme. The control points of the subdivided spline can be easily calculated in a stable way. Applying the scheme twice results in a triadic refinement. Subdivision has many applications. The subdivision scheme can be applied for an efficient visualization of the Powell-Sabin splines. The increased resolution can also be used for obtaining a more detailed approximation or for a local manipulation of the spline shape.

QHPS splines are a hierarchical variant of Powell-Sabin splines in a quasi-hierarchical basis representation. They are defined on a hierarchical triangulation. Such a mesh is obtained, starting from an initial conforming triangulation, by successively partitioning a subset of triangles with a triadic split. The QHPS basis retains all advantages of the Powell-Sabin B-splines. In addition, a local refinement of the QHPS spline can be performed in a very natural way.

The rational extension of a Powell-Sabin spline surface is called a NURPS surface. In the rational representation a weight is associated with each control point. These weights can be used as extra degrees of freedom in the modelling of shapes by means of weight points. The position of the tangent points within the control triangles leads to an intuitive and graphical interpretation of the weights. NURPS surfaces are able to exactly represent patches of quadric surfaces, such as the cylinder, the cone and the sphere. Note that the quasi-hierarchical setting is also applicable to NURPS surfaces.

A generalization of Powell-Sabin splines to higher degrees and higher dimensions is not a trivial task. Some higher degree spline extensions on the Powell-Sabin split are considered in [2, 17]. Unfortunately, the proposed B-splines no longer have an immediate geometric interpretation comparable to that of PS triangles, and they are more difficult to implement. A generalization of the Powell-Sabin split to more dimensions is discussed in [22, 39]. Each simplex in a tessellation has to be split into many smaller simplices, which must satisfy a particular set of geometric constraints in order to achieve C^1-continuity. At the moment it is not clear whether these constraints can always be satisfied for an arbitrary tessellation. Nevertheless, we can conclude that Powell-Sabin splines exhibit many nice properties that make them suitable for many CAGD applications.
References

[1] P. Alfeld and L.L. Schumaker. The dimension of bivariate spline spaces of smoothness r for degree d ≥ 4r + 1. Constr. Approx., 3:189–197, 1987.

[2] P. Alfeld and L.L. Schumaker. Smooth macro-elements based on Powell-Sabin triangle splits. Adv. Comp. Math., 16:29–46, 2002.

[3] L.J. Billera. Homology of smooth splines: generic triangulations and a conjecture of Strang. Trans. Amer. Math. Soc., 310:325–340, 1988.

[4] W. Boehm and A. Müller. On de Casteljau’s algorithm. Comput. Aided Geom. Design, 16:587–605, 1999.

[5] R.W. Clough and J.L. Tocher. Finite element stiffness matrices for analysis of plates in bending. In Proc. 1st Conference on Matrix Methods in Structural Mechanics, pages 515–545, Wright Patterson Air Force Base, Ohio, 1965.

[6] P. Dierckx. On calculating normalized Powell-Sabin B-splines. Comput. Aided Geom. Design, 15(1):61–78, 1997.

[7] P. Dierckx, S. Van Leemput, and T. Vermeire. Algorithms for surface fitting using Powell-Sabin splines. IMA J. Numer. Anal., 12:271–299, 1992.

[8] G. Farin. Algorithms for rational Bézier curves. Comput. Aided Design, 15:73–77, 1983.

[9] G. Farin. Triangular Bernstein-Bézier patches. Comput. Aided Geom. Design, 3:83–127, 1986.

[10] G. Farin. Curves and Surfaces for CAGD: A Practical Guide. Morgan Kaufmann Publishers, San Francisco, fifth edition, 2002.

[11] G. Farin. Dimensions of spline spaces over unconstricted triangulations. J. Comput. Appl. Math., 192:320–327, 2006.

[12] E. Grinspun, P. Krysl, and P. Schröder. CHARMS: a simple framework for adaptive simulation. ACM Trans. Graphics, 21(3):281–290, 2002.
[13] D. Hong. Spaces of bivariate spline functions over triangulation. Approx. Theory Appl., 7:56–75, 1991.

[14] L. Kobbelt. √3-Subdivision. In Computer Graphics Proceedings, Annual Conference Series, pages 103–112. ACM SIGGRAPH, 2000.

[15] U. Labsik and G. Greiner. Interpolatory √3-subdivision. In Proc. 21st European Conference on Computer Graphics, volume 19 of Computer Graphics Forum, pages 131–138, Cambridge, 2000.

[16] M.J. Lai and L.L. Schumaker. On the approximation power of bivariate splines. Adv. Comp. Math., 9:251–279, 1998.

[17] M.J. Lai and L.L. Schumaker. Macro-elements and stable local bases for spaces of splines on Powell-Sabin triangulations. Math. Comp., 72:335–354, 2003.

[18] J. Maes and A. Bultheel. Stable multiresolution analysis on triangles for surface compression. Electr. Trans. Numer. Anal., 25:224–258, 2006.

[19] J. Maes, E. Vanraes, P. Dierckx, and A. Bultheel. On the stability of normalized Powell-Sabin B-splines. J. Comput. Appl. Math., 170(1):181–196, 2004.

[20] C. Manni and P. Sablonnière. Quadratic spline quasi-interpolants on Powell-Sabin partitions. Adv. Comput. Math., 26:283–304, 2007.

[21] M.J.D. Powell and M.A. Sabin. Piecewise quadratic approximations on triangles. ACM Trans. Math. Softw., 3:316–325, 1977.

[22] T. Sorokina and A.J. Worsey. A multivariate Powell-Sabin interpolant. Adv. Comput. Math., in press, 2007.

[23] H. Speleers, P. Dierckx, and S. Vandewalle. Local subdivision of Powell-Sabin splines. Comput. Aided Geom. Design, 23(5):446–462, 2006.

[24] H. Speleers, P. Dierckx, and S. Vandewalle. Numerical solution of partial differential equations with Powell-Sabin splines. J. Comput. Appl. Math., 189(1-2):643–659, 2006.

[25] H. Speleers, P. Dierckx, and S. Vandewalle. Quasi-hierarchical Powell-Sabin B-splines. Technical Report 472, Dept. Computer Science, K.U. Leuven, 2006.

[26] H. Speleers, P. Dierckx, and S. Vandewalle. Multigrid methods with Powell-Sabin splines. Technical Report 488, Dept. Computer Science, K.U. Leuven, 2007.

[27] H. Speleers, P. Dierckx, and S. Vandewalle. On the Lp-stability of quasi-hierarchical Powell-Sabin splines. Technical Report 492, Dept. Computer Science, K.U. Leuven, 2007.

[28] H. Speleers, P. Dierckx, and S. Vandewalle. Powell-Sabin splines with boundary conditions for polygonal and non-polygonal domains. J. Comput. Appl. Math., in press, 2007.
[29] H. Speleers, P. Dierckx, and S. Vandewalle. Weight control for modelling with NURPS surfaces. Comput. Aided Geom. Design, 24(3):179–186, 2007.

[30] E. Vanraes and A. Bultheel. A tangent subdivision scheme. ACM Trans. Graphics, 25:340–355, 2006.

[31] E. Vanraes, P. Dierckx, and A. Bultheel. On the choice of the PS-triangles. Technical Report 353, Dept. Computer Science, K.U. Leuven, 2003.

[32] E. Vanraes, J. Maes, and A. Bultheel. Powell-Sabin spline wavelets. Int. J. Wav. Multires. Inf. Proc., 2(1):23–42, 2004.

[33] E. Vanraes, J. Windmolders, A. Bultheel, and P. Dierckx. Automatic construction of control triangles for subdivided Powell-Sabin splines. Comput. Aided Geom. Design, 21(7):671–682, 2004.

[34] J. Windmolders. Powell-Sabin splines for computer aided geometric design. PhD thesis, Dept. Computer Science, K.U. Leuven, 2003.

[35] J. Windmolders and P. Dierckx. Subdivision of uniform Powell-Sabin splines. Comput. Aided Geom. Design, 16:301–315, 1999.

[36] J. Windmolders and P. Dierckx. From PS-splines to NURPS. In A. Cohen, C. Rabut, and L.L. Schumaker, editors, Proc. of Curve and Surface Fitting, Saint-Malo 1999, pages 45–54. Vanderbilt University Press, 2000.

[37] J. Windmolders and P. Dierckx. NURPS for special effects and quadrics. In T. Lyche and L.L. Schumaker, editors, Proc. of Mathematical Methods for Curves and Surfaces, Oslo 2000, pages 527–534. Vanderbilt University Press, 2001.

[38] J. Windmolders, E. Vanraes, P. Dierckx, and A. Bultheel. Uniform Powell-Sabin spline wavelets. J. Comput. Appl. Math., 154(1):125–142, 2003.

[39] A.J. Worsey and B. Piper. A trivariate Powell-Sabin interpolant. Comput. Aided Geom. Design, 5:177–186, 1988.
In: Computer Animation Editors: J.S. Wright and L.M. Hughes, pp. 209-234
ISBN: 978-1-60741-559-6 © 2010 Nova Science Publishers, Inc.
Chapter 9
AN ONTOLOGY OF COMPUTER-AIDED DESIGN Udo Kannengiesser NICTA, Australia
John S. Gero Krasnow Institute for Advanced Study and Volgenau School of Information Technology and Engineering, George Mason University, USA, and University of Technology, Sydney, Australia
Abstract

This chapter develops an ontology of computer-aided design, based on the function-behaviour-structure (FBS) ontology. It proposes two complementary views of the process of design. The object-centred view applies the FBS ontology to the artefact being designed. Integrating an ontology of three “design worlds”, this view establishes a framework of designing as a set of transformations between the function, behaviour and structure of the design object, driven by interactions between the three design worlds. Building on this framework, the process-centred view applies the FBS ontology to the activities defined by the object-centred view. This increases the level of detail and provides a better-defined set of representations of these activities. Our ontological framework can be used to provide a better understanding of the functionalities required of existing and future computer-aided design support.
1. Introduction The notion of computer-aided design can be understood as an umbrella term for approaches to using computational tools to support human design activities. Its principal innovations to date include tools for computer-aided drafting (CAD), engineering (CAE) and manufacturing (CAM), which have been recognised as a significant technological achievement of the past century (Weisberg 2000). These tools are now indispensable for practitioners in many design domains.
Some research has focused on expanding computer support to activities carried out in the early, conceptual stages of design. However, its impact on design practices and tool development in industry has generally been rather low. We believe that one of the reasons is that many of these approaches are based on an insufficient understanding of design. This concerns activities that are carried out by human designers, which include producing and reinterpreting drawings or sketches, and reflecting on current and previous design tasks. It has been shown that these activities are important drivers of designing (Schön and Wiggins 1992; Suwa et al. 1999; Suwa and Tversky 2002). Most traditional models of design are inadequate because they do not explicitly account for these findings.

Progress on a more comprehensive understanding of designing has been made only recently. Our situated function-behaviour-structure (FBS) framework (Gero and Kannengiesser 2004) represents designing as a situated act that is driven by the interactions between the designer and their environment. It uses a perspective that is oriented to the object being designed, so that designing can be shown as a set of transformations between the function, behaviour and structure of the artefact. The situated FBS framework has shown its potential to enhance human understanding of designing. This chapter develops extensions to this framework that provide a more detailed ontological basis on which computer-aided design support can be built.

Section 2 presents the situated FBS framework and shows how it derives from an object-centred view of designing driven by the interactions between three “design worlds”. Section 3, adopting a process-centred view of designing, applies the FBS ontology to the design activities defined in Section 2. This adds a significant amount of detail and rigour to the representation of each activity. Section 4 uses this view to derive a framework that specifies the functions required of computational tools to support designing. Section 5 concludes the chapter.
2. An Object-Centred Ontology of Design

2.1. An Ontology of Design Objects

Most design models and design ontologies focus on the artefact or object of design. The FBS ontology distinguishes between three aspects of a design object (Gero 1990; Gero and Kannengiesser 2004): function (F), behaviour (B) and structure (S).
2.1.1. Object Function

Function (F) of an object is defined as its teleology (“what the object is for”). For example, some of the functions of a window include “to provide view”, “to provide daylight” and “to provide rain protection”. Function represents the usefulness of the object for another system.
2.1.2. Object Behaviour

Behaviour (B) of an object is defined as the attributes that can be derived from its structure (“what the object does”). Using the window example, behaviours include “thermal conduction”, “light transmission” and “direct solar gain”. Behaviour provides operational, measurable performance criteria for comparing different objects.
2.1.3. Object Structure

Structure (S) of an object is defined as its components and their relationships (“what the object consists of”). The structure of physical objects includes their form (i.e., geometry and topology) and material. More generally, form can be viewed as a description of an object’s macro-structure, and material can be viewed as a shorthand description of the micro-structure. In the window example, macro-structure (form) includes “glazing length” and “glazing height”, and micro-structure (material) includes “type of glass”.
2.1.4. Relationships between Object Function, Behaviour and Structure

Humans construct relationships between function, behaviour and structure through experience and through the development of causal models based on interactions with the object. Specifically, function is ascribed to behaviour by establishing a teleological connection between the human’s goals and observable or measurable effects of the object. There is no direct relationship between function and structure. Behaviour is causally related to structure, i.e. it can be derived from structure using physical laws or heuristics. This may require knowledge about external effects (exogenous variables) and their interaction with the artefact’s structure. In the window example, deriving the behaviour “light transmission” requires considering external light sources.
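As a minimal illustration of the three aspects, the window example could be recorded as follows; the dictionary layout and the numerical values are ours and purely hypothetical.

window = {
    "function":  ["to provide view", "to provide daylight", "to provide rain protection"],
    "behaviour": ["thermal conduction", "light transmission", "direct solar gain"],
    "structure": {
        "form":     {"glazing length": 1.2, "glazing height": 0.9},   # macro-structure; values hypothetical
        "material": {"type of glass": "double glazing"},              # micro-structure; value hypothetical
    },
}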
2.2. An Ontology of Design Worlds

An aspect that has been ignored in most models of design relates to the interactions of the designer and their environment. Designers perform actions in order to change their environment. By observing and interpreting the results of their actions, they then decide on new actions to be executed on the environment. The designers’ concepts may change according to what they are “seeing”, which itself is a function of what they have done. One may speak of an “interaction of making and seeing” (Schön and Wiggins 1992). This interaction between the designer and the environment strongly determines the course of designing. This idea is called situatedness, whose foundational concepts go back to the work of Dewey (1896) and Bartlett (1932). Gero and Kannengiesser (2004) have modelled situatedness by specifying three interacting worlds: the external world, interpreted world and expected world, Figure 1(a).
2.2.1. The External World

The external world is the world that is composed of representations outside the designer or design agent. The notion of “external” is meant in a conceptual sense rather than a physical one. It denotes an environment that contains design artefacts made available for interpretation.
Figure 1. Situatedness as the interaction of three worlds: (a) general model, (b) specialised model for design representations
2.2.2. The Interpreted World

The interpreted world is the world that is built up inside the design agent in terms of sensory experiences, percepts and concepts. It is the internal representation of that part of the external world that the design agent interacts with. The interpreted world provides an environment for analytic activities and discovery during designing.
2.2.3. The Expected World

The expected world is the world imagined actions of the design agent will produce. It is the environment in which the effects of actions are predicted according to current goals and interpretations of the state of the world.
2.2.4. Relationships between the Three Worlds

These three worlds are related through three classes of interaction. Interpretation transforms variables that are sensed in the external world into sensory experiences, percepts and concepts that compose the interpreted world. Focussing takes some aspects of the interpreted world and uses them as goals for the expected world. Action is an effect which brings about a change in the external world according to the goals in the expected world.
2.2.5. A More Detailed Framework of Design Interactions

Figure 1(b) presents a specialised view of the ontology of design worlds, with the design agent (described by the interpreted and expected world) located within the external world, and with general classes of design representations placed into this nested model. The set of expected design representations (Xei) corresponds to the notion of a design state space, i.e. the state space of all possible designs that satisfy the set of requirements. This state space can be modified during the process of designing by transferring new interpreted design representations (Xi) into the expected world and/or transferring some of the expected design representations (Xei) out of the expected world. This leads to changes in external design representations (Xe), which may then be used as a basis for re-interpretation changing the interpreted world. Novel interpreted design representations (Xi) may also be the result of memory (here called constructive memory), which can be viewed as a process of interaction among design representations within the interpreted world rather than across the interpreted and the external world.

Both interpretation and constructive memory are viewed as “push-pull” processes, i.e. the results of these processes are driven both by the original experience (“push”) and by some of the agent’s current interpretations and expectations (“pull”) (Gero and Fujii 2000). This notion captures two ideas. First, interpretation and constructive memory have a subjective nature, using first-person knowledge grounded in the designer’s interactions with their environment (Bickhard and Campbell 1996; Clancey 1997; Ziemke 1999; Smith and Gero 2005). This is in contrast to static approaches that attempt to encode all relevant design knowledge prior to its use. Anecdotal evidence in support of first-person knowledge is provided by the common observation that different designers perceive the same set of requirements differently (and thus produce different designs). And the same designer is likely to produce different designs at later times for the same requirements. This is a result of the designer acquiring new knowledge while interacting with their environment between the two times.

Second, the interplay between “push” and “pull” has the potential to produce emergent effects, leading to novel and often surprising interpretations of the same internal or external representation. This idea extends the notion of biases that simply reproduce the agent’s current expectations. Examples have been provided from experimental studies of designers interacting with their sketches of the design object. Schön and Wiggins (1992) found that designers use their sketches not only as an external memory, but also as a means to reinterpret what they have drawn, thus leading the design in a surprising, new direction. Suwa et al. (1999) noted, in studying designers, a correlation of unexpected discoveries in sketches with the invention of new issues or requirements during the design process. They concluded that “sketches serve as a physical setting in which design thoughts are constructed on the fly in a situated way”. Guindon’s (1990) protocol analyses of software engineers, designing control software for a lift, revealed that designing is characterised by frequent discoveries of new requirements interleaved with the development of new partial design solutions.
As Guindon puts it, “designers try to make the most effective use of newly inferred requirements, or the sudden discovery of partial solutions, and modify their goals and plans accordingly”.
2.3. The Situated Function-Behaviour-Structure Framework

Gero and Kannengiesser (2004) have combined the ontology of design artefacts (Section 2.1) with the ontology of design worlds (Section 2.2), by specialising the model of situatedness shown in Figure 1(b). In particular, the variable X, which stands for design representations in general, is replaced with the more specific representations F, B and S. This provides the basis of the situated FBS framework, Figure 2 (Gero and Kannengiesser 2004).

In addition to using external, interpreted and expected F, B and S, this framework uses explicit representations of external requirements given to the designer by another agent (usually the customer). Specifically, there may be external requirements on function (FRe), external requirements on behaviour (BRe), and external requirements on structure (SRe). The situated FBS framework also introduces the process of comparison between interpreted behaviour (Bi) and expected behaviour (Bei), and a number of processes that transform interpreted structure (Si) into interpreted behaviour (Bi), interpreted behaviour (Bi) into interpreted function (Fi), expected function (Fei) into expected behaviour (Bei), and expected behaviour (Bei) into expected structure (Sei). Figure 2 uses the numerals 1 to 20 to label the resultant set of processes; however, it should be noted that they do not represent any order of execution. The 20 processes can be mapped onto eight fundamental design steps (Gero 1990; Gero and Kannengiesser 2004).

1. Formulation: consists of processes 1 – 10. It includes interpretation of external requirements, given to the designer by a customer, as function, behaviour and structure, via processes 1, 2 and 3. Requirements are also constructed as implicit requirements generated from within the designer, using constructive memory (processes 4, 5 and 6). Focussing transfers a subset of the (explicitly and implicitly) required function, behaviour and structure into the expected world (processes 7, 8 and 9). In summary, processes 1 – 9 represent activities that populate the interpreted and expected worlds with design concepts, providing the basis for subsequent transformations of these concepts. Process 10 transforms expected function into additional expected behaviour. The set of expected function, behaviour and structure, resulting from the formulation step, represents the design state space. It includes all the variables and their ranges of values that are relevant for the design task.

2. Synthesis: consists of process 11 to generate an instance of structure that is expected to meet the required behaviour, and the externalisation of that structure via process 12. This design step can be viewed as part of a search process through the (previously formulated) state space of all possible instances of structure.

3. Analysis: consists of interpretation of externalised structure (process 13) and the derivation of behaviour from that structure (process 14).

4. Evaluation: consists of a comparison of expected behaviour and behaviour derived through analysis (process 15).

5. Documentation: produces an external representation of the final design solution for purposes of communicating that solution in terms of structure (process 12), and, optionally, behaviour (process 17) and function (process 18).

6. Reformulation type 1: consists of focussing on different structures than previously expected (process 9). Precursors of this process are the interpretation of external structure (process 13), constructive memory of structure (process 6) or the interpretation of new requirements on structure (process 3).

7. Reformulation type 2: consists of focussing on different behaviours than previously expected (process 8). Precursors of this process are the derivation of behaviour from structure (process 14), the interpretation of external behaviour (process 19), constructive memory of behaviour (process 5) or the interpretation of new requirements on behaviour (process 2).

8. Reformulation type 3: consists of focussing on different functions than previously expected (process 7). Precursors of this process are the ascription of function to behaviour (process 16), the interpretation of external function (process 20), constructive memory of function (process 4) or the interpretation of new requirements on function (process 1).

The numbering of the eight design steps, similar to the 20 processes, does not prescribe any order of execution. While it may be expected for some routine design tasks to follow a sequential execution of only the first five steps, it has been found that all three types of reformulation frequently occur throughout the process of designing (McNeill et al. 1998).
Figure 2. The situated FBS framework.
The situated FBS framework represents designing independently of the domain of the design and the specific methods used, and of the subject carrying out the process of designing. What we have referred to as the “design agent” in the definition of the three design worlds can be embodied by a human designer (or team of human designers), a computational tool, or a combination of both.
3. A Process-Centred Ontology of Design

The object-centred ontology of design presented in Section 2 has been helpful for establishing a basic understanding of design. Its emphasis on artefacts provides an intuitive, tangible perspective, representing the process of designing as a gradual evolution of the design object across three levels. The three-world model of design interactions, in which this representation is embedded, is sufficiently rich to account for the phenomena of situatedness. However, the object-centred ontology lacks sufficient detail and rigour to be useful for comparing or developing different methods and computer support for designers. The key ideas and semantics conveyed by the situated FBS framework are only informally expressed using textual, natural-language descriptions such as in Sections 2.2 and 2.3. The graphical model in Figure 2 does not fully capture these semantics. The mapping of the 20 processes onto Gero’s (1990) eight fundamental design steps has added some more meaning by locating these processes within typical phases of a design project. However, this mapping does not completely capture all the semantics and is too informal to be used as an ontological framework for computer-aided design. What is needed is an ontology that is process-centred, treating design processes as first-class entities rather than derivatives of object-centred constructs. This Section will present such an ontology, extending our recent work on an FBS ontology of processes (Gero and Kannengiesser 2007).
3.1. An Ontology of Processes

Processes are usually understood as entities that are less tangible than (physical) objects. Nonetheless, they can be represented using the same set of ontological constructs as used for describing objects: function, behaviour and structure. To clearly distinguish between the notations of the process-centred and the object-centred FBS ontology, we will use the indices “p” for “process” and “o” for “object”.
3.1.1. Process Function

Function (Fp) of a process is ontologically no different to object function, as it is based on the observer’s goals rather than on embodiment as an object or as a process. Instances of process functions are largely domain-dependent. However, most processes that we design and execute through actions have the general function of replacing an existing state of the world with a desired one.
3.1.2. Process Behaviour

Behaviour (Bp) of a process relates to attributes that allow comparison on a performance level as a basis for process evaluation. Typical process behaviours are speed, cost, amount of space required and accuracy. These behaviours can be specialised and/or quantified for instances of processes in particular domains.
3.1.3. Process Structure

Through an analogy with the structure of physical objects, we can distinguish between a macro- and a micro-structure (Sp) of processes.
Figure 3. The macro-structure of a process (i = input; t = transformation; o = output).
The macro-structure of a process includes three components and two relationships, Figure 3. The components are

• an input (i),
• a transformation (t) and
• an output (o).

The relationships connect

• the input and the transformation (i – t) and
• the transformation and the output (t – o).
Input (i) and output (o) represent properties of entities being transformed in terms of their variables and/or their values. For example, the process of transportation changes the values for the location of a (physical) object (e.g. the values of its x-, y- and z-coordinates). The process of electricity generation takes mechanical motion as input and produces electrical energy as output. A common way to describe the transformation (t) of a process is in terms of a plan, a set of rules or other procedural descriptions. A typical example is a software procedure that is expressed in source code or as an activity diagram in the Unified Modeling Language (UML). Such descriptions are often used to specify sub-components of the transformation. The relationships between the three components of a process are usually uni-directional from the input to the transformation and from the transformation to the output. For iterative processes the t – o relationship is bi-directional to represent the feedback loop between the output and the transformation. The micro-structure or “material” of a process differs from the macro-structure because its components and relationships cannot be distinguished (or are not relevant) at the same level of abstraction. For example, it is not common to specify the (business process)
transformation “pay the supplier” in terms of more fine-grained activities (sub-components) such as “log in to online banking system”, “fill out funds transfer form” and “click the submit button”. This set of activities is best viewed as a micro-structure specified only through a shorthand qualifier such as “using internet banking”. Micro-structure can also be associated with the input and output components of a process. For example, a set of measuring data that is the input of a statistical analysis process may be “materialised” through either digital or paper-based media. While micro-structure is clearly needed to carry out (“materialise”) a process, the components and relationships of that micro-structure are not explicitly represented. This fits with one of Merriam-Webster’s definitions of material as “the formless substratum of all things which exists only potentially and upon which form acts to produce realities”. The “formless substratum” of a transformation may reference not only processes but also objects. It then denotes the entity or agent executing the transformation. In the “pay the supplier” example, it is possible to specify “finance officer” or “purchasing department” as a general descriptor for the executing agent. Including such references to agents (as “actors” or “roles”) has become well-established in process modelling (Curtis et al. 1992). Since micro-structure does not specify components and relationships, it can be embodied by either (micro-) objects or (micro-) processes. In some instances, the micro-structure of objects can refer to (micro-) processes rather than (micro-) objects. For example, the chemical bonds (macro-relationships) between the atoms (macro-components) of a molecule are realised by physical processes, according to the laws of quantum electrodynamics. A view of the world as being based on processes rather than objects has generally been suggested in process philosophy (Rescher 2006).
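A minimal sketch of a process description with this macro-structure and a shorthand micro-structure qualifier; the class layout and the example values are ours and only illustrative.

from dataclasses import dataclass

@dataclass
class Process:
    input: str
    transformation: str
    output: str
    micro_structure: str = ""     # shorthand qualifier, e.g. executing agent or medium
    iterative: bool = False       # True when the t - o relationship is bi-directional

pay_supplier = Process(
    input="approved invoice",
    transformation="pay the supplier",
    output="settled invoice",
    micro_structure="purchasing department, using internet banking",
)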
3.1.4. Relationships between Process Function, Behaviour and Structure

Relationships among Fp, Bp and Sp are constructed according to the same principles as described for Fo, Bo and So (see Section 2.1.4). Function is ascribed to behaviour based on associations of process performance with human goals. Behaviour can be derived from structure either directly or indirectly based on external effects. An example of directly derived behaviour is the speed of a process, as this depends exclusively on the macro-structure (“what kind of transformation is used on what input to produce what output?”) and the micro-structure (“how and by whom is the transformation carried out using what input/output media?”). An example of indirectly derived behaviour is accuracy, which needs an external benchmark against which the output of the process is compared.
3.2. An Ontology of Design Processes

The FBS ontology of processes can be used to re-represent the object-centred description of the 20 design processes (presented in Section 2.3) as a process-centred one. Most parts of the object-centred model depicted in Figure 2 directly map onto the input and output components of process macro-structure (Sp). For example, Sp of process 14 in Figure 2 includes (interpreted) object structure (So) as input (i) and (interpreted) object behaviour (Bo) as output (o).[1] No specific information is given about transformation components, as this is available only at an instance level. Most of the semantics of the situated FBS framework can be captured by process function (Fp). Table 1 gives an overview of the structure and functions of each of the 20 design processes.

Table 1. Function (Fp) and macro-structure (Sp) of the 20 design processes
ID   Process class          (macro-) Sp              Fp
1    Interpretation         FRe → Fi                 1. transfer design concepts as intended; 2. re-interpret design concepts
2    Interpretation         BRe → Bi                 1. transfer design concepts as intended; 2. re-interpret design concepts
3    Interpretation         SRe → Si                 1. transfer design concepts as intended; 2. re-interpret design concepts
4    Constructive memory    Fi → Fi                  1. retrieve design concepts as stored; 2. re-construct design concepts
5    Constructive memory    Bi → Bi                  1. retrieve design concepts as stored; 2. re-construct design concepts
6    Constructive memory    Si → Si                  1. retrieve design concepts as stored; 2. re-construct design concepts
7    Focussing              Fi → Fei                 construct function state space
8    Focussing              Bi → Bei                 construct behaviour state space
9    Focussing              Si → Sei                 construct structure state space
10   Transformation         Fei → Bei                construct behaviour state space
11   Transformation         Bei → Sei                generate values for design structure
12   Action                 Sei → Se                 1. communicate the design to others; 2. initiate reflective conversation
13   Interpretation         Se → Si                  1. transfer design concepts as intended; 2. re-interpret design concepts
14   Transformation         Si → Bi                  1. analyse for performance expectations; 2. generate new design issues
15   Comparison             {Bei, Bi} → decision     evaluate the design
16   Transformation         Bi → Fi                  generate new design issues
17   Action                 Bei → Be                 1. communicate the design to others; 2. initiate reflective conversation
18   Action                 Fei → Fe                 1. communicate the design to others; 2. initiate reflective conversation
19   Interpretation         Be → Bi                  1. transfer design concepts as intended; 2. re-interpret design concepts
20   Interpretation         Fe → Fi                  1. transfer design concepts as intended; 2. re-interpret design concepts
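Read as data, Table 1 could be encoded directly, for instance as follows; the encoding and the names are ours, not part of the framework.

TRANSFER    = ("transfer design concepts as intended", "re-interpret design concepts")
RETRIEVE    = ("retrieve design concepts as stored", "re-construct design concepts")
COMMUNICATE = ("communicate the design to others", "initiate reflective conversation")

DESIGN_PROCESSES = {
    1:  ("Interpretation",      "FRe -> Fi",  TRANSFER),
    2:  ("Interpretation",      "BRe -> Bi",  TRANSFER),
    3:  ("Interpretation",      "SRe -> Si",  TRANSFER),
    4:  ("Constructive memory", "Fi -> Fi",   RETRIEVE),
    5:  ("Constructive memory", "Bi -> Bi",   RETRIEVE),
    6:  ("Constructive memory", "Si -> Si",   RETRIEVE),
    7:  ("Focussing",           "Fi -> Fei",  ("construct function state space",)),
    8:  ("Focussing",           "Bi -> Bei",  ("construct behaviour state space",)),
    9:  ("Focussing",           "Si -> Sei",  ("construct structure state space",)),
    10: ("Transformation",      "Fei -> Bei", ("construct behaviour state space",)),
    11: ("Transformation",      "Bei -> Sei", ("generate values for design structure",)),
    12: ("Action",              "Sei -> Se",  COMMUNICATE),
    13: ("Interpretation",      "Se -> Si",   TRANSFER),
    14: ("Transformation",      "Si -> Bi",   ("analyse for performance expectations", "generate new design issues")),
    15: ("Comparison",          "{Bei, Bi} -> decision", ("evaluate the design",)),
    16: ("Transformation",      "Bi -> Fi",   ("generate new design issues",)),
    17: ("Action",              "Bei -> Be",  COMMUNICATE),
    18: ("Action",              "Fei -> Fe",  COMMUNICATE),
    19: ("Interpretation",      "Be -> Bi",   TRANSFER),
    20: ("Interpretation",      "Fe -> Fi",   TRANSFER),
}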
Interpretation processes (1, 2, 3, 13, 19 and 20) can have two different functions. One function is to transfer existing design concepts from one agent to another or the same agent without a change of the initial meaning of these concepts. This involves bringing external representations into a form that allows processing of these representations by the individual design agent. The other function of interpretation is to re-interpret design concepts based on existing ones. This generates design concepts and issues that are novel with respect to the ones initially intended. Constructive memory processes (4, 5 and 6) have a similar set of functions. One function is to retrieve design concepts from some storage space in the same way as they were experienced at the time of storage. While this may include some computation or transformation, such as refinement or decomposition of design concepts, the results of this 1
process will all have a pre-defined relationship with the initial concepts. The other function of the constructive memory processes is to re-construct and thereby modify existing design concepts, which corresponds to the notion of reflection (Schön 1983).

Focussing processes (7, 8 and 9) have the function to construct the design state space. This includes the construction of the initial design state space (maps onto the formulation step) and subsequent modifications of that space (maps onto the reformulation steps).

Action processes (12, 17 and 18) can have two different functions. One function is to communicate aspects of the design to other stakeholders (agents). Here, the notion of communication is used in its traditional sense of sharing information, based on unambiguous transfer of design concepts. The other function is to initiate reflective conversation, either with other stakeholders (agents) or the initiator of the action process itself. In other words, external representations are produced to be re-interpreted in new ways.

Processes 10, 11, 14 and 16 may be called “FBSo transformations” based on their role as transformers between Fo, Bo and So. Process 10 has the function to construct the behaviour state space, and process 11 has the function to generate values within the (previously constructed) structure state space. Process 14 has two functions. One function is to analyse the design with respect to current performance expectations. The other function is to generate new design concepts that can be included as new issues in the current design task. This is also the function of process 16.

The comparison process (15) has the function to evaluate the design, based on decision making informed by comparison of expected and interpreted design performance.

It can be seen that some of the functions (Fp) – loosely speaking – relate to non-situated and others to situated aspects of designing. Non-situated aspects are captured by those functions that do not address the potential for change during designing. These are the functions that involve “transfer” (in interpretation processes), “retrieval” (in constructive memory processes) and “communication” (in action processes). Situated aspects of designing describing the potential for change are captured by functions that involve “re-interpretation” (in interpretation processes), “re-construction” (in constructive memory processes) and “reflective conversation” (in action processes).

Table 1 does not include the behaviours (Bp) of the 20 processes. This is because, at the current level of abstraction, they are no different from the general process behaviours described in Section 3.1.2. This is a consequence of our ontology being independent of specific methods or design domains. No detailed information about structure (Sp) and exogenous effects is available to be able to specialise or quantify general process behaviours (Bp) such as speed, accuracy and cost. An example of such detailed information would be process structures (Sp) that contain iterations (e.g., when using genetic algorithms (GAs) in design synthesis). In this case, the behaviour (Bp) “rate of convergence”, a specialisation of the behaviour (Bp) “speed”, could be derived. However, as our aim here is to provide a general rather than an instance-specific ontology, different classes of design processes are distinguished only at the level of function (Fp) and structure (Sp).

[1] Indices for “interpreted” have been omitted here to improve notational clarity.
4. An Ontological Framework for Computer-Aided Design Support

The ontological view presented in Sections 2 and 3 has provided a detailed description of 20 distinct processes in designing. This is useful for enhancing our understanding of designing as a human activity. However, the ultimate aim of most research in design is to enhance the performance of this activity, both in terms of higher effectiveness and efficiency. The key to improving performance or behaviour (Bp) of designing is in the structure (Sp) it is derived from. This requires more detailed representations of structure (Sp) than presented in Table 1, and mainly concerns micro-structure. Exploring the micro-level of process structure is a general research theme that has been recognised in a number of other disciplines (Osterweil 2005).

Research in the micro-structure (Sp) of designing can be characterised loosely as either method- or tool-oriented. Method-oriented approaches focus on process-centred representations of micro-structure. These representations can be viewed as composing a new macro-structure to be “materialised” by humans or tools. Tool-oriented design research focuses on object-centred representations of micro-structure in terms of new design tools. Computer-aided design research and development is clearly located in this field. Both method- and tool-oriented research streams are complementary, as each of them often uses results from the other.

Computer-aided design tools can themselves be regarded as design objects. Applying the FBS ontology to these tools provides a schema for the characteristics that the tools must exhibit to be useful in the process of designing. We will use the index “t” for “tool” to distinguish the FBS view of tools from the FBS view of design objects and design processes. The most essential characteristics of a tool relate to function (Ft) as they orient the specification of a tool’s behaviour (Bt) and structure (St) towards the required goals and context of use. Many of the functions of computer-aided design tools do not differ from any other software product. They include such general characteristics as usability, reliability, maintainability and others (ISO 2001). However, there are a number of functions that are specific to computer-aided design tools. These functions relate to the tools’ role as the “material” of design processes, and can generally be described as “to support design processes of class X”. For example, a general function (Ft) of a commercial CAD tool is “to support design processes of class X = documentation” (one of the fundamental design steps presented in Section 2.3). These functions can be further specialised using particular combinations of the FBSp properties of the 20 design processes presented in Table 1. An example of a more specific function (Ft) of a CAD tool is “to support the process structure (Sp) Sei → Se in a way to achieve the process function (Fp) of communicating the design to others”.

The set of functions (Ft) derivable in this way can serve as high-level requirements for the development of new design tools. This approach makes research and development in computer-aided design look like a design process, generating computational models and architectures as the structure (St) of tools exhibiting certain behaviours (Bt) to achieve the required functions (Ft). The remainder of this Section will cast existing work on computer-aided design systems in this ontology, classifying that work based on tool functions (Ft) derived from combinations of Fp and Sp shown in Table 1. This aims to provide an overview of the current range of both commercial software and academic proof-of-concept demonstrators. For this purpose, detailed descriptions of their behaviour (Bt) and structure (St) are not required.
Readers may consult our references to the literature for more specific information.
Figure 4. Action in the situated FBS framework.
4.1. Computer-aided Design Support for Action

The notion of a “tool” has traditionally been viewed as a mechanism for humans to perform actions. Computer-aided design tools can serve two possible functions (Ft) in their support of action (see Table 1):

• to support communicating the design
• to support initiating reflective conversation
Figure 4 highlights processes 12, 17 and 18 in the situated FBS framework to represent actions related to these two functions.
4.1.1. Support for Communicating the Design

• Sei → Se (process 12): The ability to generate representations of external object structure (So) is provided by commercial CAD systems. These tools produce 2-D or 3-D models and offer functionalities such as scaling, rotating and rendering to communicate different aspects of the object. The models generated by CAD systems are primarily used for data exchange with other designers, manufacturers or other stakeholders, or for providing input to tools that perform analyses of the designed objects. Communication across different tools has been recognised as an area of growing concern, as the tools generally use different languages (data formats) for representing object structure. A number of approaches address this problem by defining standardised product models, the best known of which are STEP and IFCs (Eastman 1999). Many CAD tools now have translators (called pre-processors) that map object structure onto a neutral format based on these standards (a sketch of such a mapping follows this list). Some of our previous work was concerned with developing an agent-based approach to communicating product data in situations where no standard formats are available (Kannengiesser and Gero 2006; Kannengiesser and Gero 2007).
• Bei → Be (process 17): Virtual reality (VR) systems are increasingly used to generate 3-D objects in a place-like context that usually includes avatars representing potential users or stakeholders of the design. These tools support modelling not only the structure (So) but also the behaviour (Bo) of the designed object, based on simulated interactions with avatars or other objects. Digital mock-ups (DMUs) are based on a similar concept, and are commonly used for the simulation of assembly operations or kinematics. Other tools that focus mainly on the communication of object behaviour (Bo) are those specialised in performing particular engineering analyses. Typical examples here include the representation of stresses and temperatures.
• Fei → Fe (process 18): There are currently no commercial tools specialised in generating formal representations of object function (Fo). This is mainly due to the lack of a commonly agreed representation language. In most cases, function is described informally using natural language expressions, usually based on verb-noun pairs (Jacobsen et al. 1991) that are also used in this chapter. These descriptions can be produced by general-purpose word processors and by annotation mechanisms provided by CAD systems. Future tool support may result from recent work on more formal representations of function (Chandrasekaran and Josephson 2000; Stone and Wood 2000; Szykman et al. 2001; Deng 2002).
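As an illustration of the pre-processing step described above, the following minimal sketch maps a hypothetical native structure record onto a neutral, tool-independent form. The field names are our assumptions for illustration only; they do not reproduce the actual STEP or IFC schemas.

```python
# Minimal sketch of a "pre-processor" that maps a tool's native object
# structure (So) onto a neutral exchange format. The native and neutral
# field names here are hypothetical; real STEP/IFC mappings are far richer.

def to_neutral_format(native_part: dict) -> dict:
    """Translate a native CAD record into a neutral, tool-independent form."""
    return {
        "entity": "PRODUCT",
        "name": native_part["part_name"],
        "geometry": {
            "type": native_part["shape"],            # e.g. "block", "cylinder"
            "dimensions_mm": native_part["dims"],    # e.g. (40.0, 20.0, 10.0)
        },
        "material": native_part.get("material", "UNSPECIFIED"),
    }

# Example: a native record as one (hypothetical) CAD system might store it.
native = {"part_name": "bracket-01", "shape": "block",
          "dims": (40.0, 20.0, 10.0), "material": "steel"}
print(to_neutral_format(native))
```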
4.1.2. Support for Initiating Reflective Conversation

• Sei → Se (process 12): There are no commercial design tools that explicitly aim at supporting reflective conversation. However, there are some method-oriented approaches that may inform the development of such tools. For example, Jun and Gero (1997) have demonstrated how shapes can emerge by representing the same geometrical structure in different ways. Current CAD systems do not have this ability, as their representations are fixed through the way they store a design’s geometry in their database. An approach by Reymen et al. (2006) uses checklists and forms to stimulate designers to create textual descriptions of designs from multiple perspectives, at regular intervals during the process of design.
• Bei → Be (process 17): Reflective conversation at the behaviour level is not well understood. However, the models of generating multiple representations described for the structure level can be applied when behaviour is represented using shapes. The notion of space, for example, can be viewed as a behaviour (derived from a walls-and-floor structure) that can be described geometrically.
• Fei → Fe (process 18): Apart from cases in which functions represent references to shapes, reflective conversation at the function level is also not well understood. Tool support for generating multiple, textual representations of function may be developed based on research in natural language semantics. For example, de Vries et al. (2005) explore the use of the WordNet lexicon (Miller 1995) to generate a graph of synonyms and other semantic relations from a given set of words (a sketch of this idea follows this list).
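To illustrate the word-graph idea, the following sketch builds a small synonym graph for the words of a verb-noun function description. Using NLTK’s WordNet interface is our assumption for illustration; de Vries et al. (2005) describe their own word-graph system.

```python
# Sketch of building a small synonym graph for function words, in the spirit
# of the word-graph idea described above. Using NLTK's WordNet interface is
# an assumption for illustration; it is not the system of de Vries et al.
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)  # fetch the WordNet data on first use

def synonym_graph(words):
    """Return {word: set of synonym lemmas} built from WordNet synsets."""
    graph = {}
    for word in words:
        related = set()
        for synset in wn.synsets(word):
            related.update(lemma.replace("_", " ") for lemma in synset.lemma_names())
        related.discard(word)
        graph[word] = related
    return graph

# Verb-noun pair describing an object function, e.g. "support load":
print(synonym_graph(["support", "load"]))
```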
4.2. Computer-aided Design Support for FBSo Transformations and Evaluation

A number of research efforts have concentrated on tool support for performing those transformations and evaluations that have been viewed as fundamental in most traditional models of designing (e.g., Asimov (1962)). These include the transformations between the function, behaviour and structure of the design object, and evaluation based on comparing expected with “actual” behaviour. Figure 5 highlights processes 10, 11, 14, 15 and 16 to represent these activities.
Figure 5. FBSo transformations and evaluation in the situated FBS framework.
• Si → Bi (process 14): There is a wide range of commercial tools that support the derivation of object behaviour from design structure. These are commonly referred to as analysis tools or simulation tools. Most of them are based on the physical laws and principles established in the engineering sciences. Examples of design analyses for which there is automated support include finite element analysis, thermal analysis, energy analysis and kinematic analysis. Some tools, such as design optimisation tools and parametric CAD systems, provide automated support for the Si → Bi transformation as part of a collection of transformations that also includes evaluation (process 15) and the generation of object structure (process 11). These tools are presented in more detail under the bullet points for processes 11 and 15 (below). The function of generating new design issues (see Table 1) is addressed by some CAD systems performing runtime analyses of the design, such as Design for X (DFX) analyses. Gero and Kazakov (1998) have developed a computational model of behaviour analogy in which new behaviour variables are introduced into the target design based on structure similarity with the source design.
• Bei → Sei (process 11): Parametric CAD systems have been shown to significantly facilitate the creation of solid models (Shah and Mäntylä 1995), and many CAD vendors now offer parametric modelling features. These systems can be viewed as automating the process of computing an object structure once a set of parameters has been formulated for both structure and behaviour. Parametric CAD systems also allow for automated maintenance of parametric constraints (Sacks et al. 2004). This requires additional automation for analysing and evaluating the design for constraint violations, which can be mapped onto the transformation process Si → Bi (process 14) and the evaluation process {Bei, Bi} → decision (process 15). Design optimisation tools provide similar integrated functionalities supporting the same set of processes. They provide an extensive range of mechanisms to evolve object structure, including various deterministic and stochastic search methods (Papalambros and Wilde 2000). A sketch of how these processes combine into a simple analyse-evaluate-regenerate loop follows this list.
• {Bei, Bi} → decision (process 15): Automated support for this process is provided in a number of computer-aided design systems, as indicated above. Optimisation tools, in particular, incorporate sophisticated strategies for controlling the execution of alternative search paths, based on the performance of the current design candidate. Research on agent-based design systems addresses evaluation using conflict resolution mechanisms, which have been applied to instances of multi-objective design optimisation (Grecu and Brown 1996; Campbell et al. 1999).
• Fei → Bei (process 10): Few systems have been developed that support the generation of object behaviours based on object function (Maiden and Sutcliffe 1992; Bhatta et al. 1994; Umeda et al. 1996). This is mainly due to the lack of a formal language to represent function.
• Bi → Fi (process 16): There has been no work to date on tool support for this process.
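The following toy sketch shows how processes 14, 15 and 11 can be chained into a simple analyse-evaluate-regenerate loop of the kind that parametric CAD and optimisation tools automate. The single-variable “design” and its behaviour model are illustrative assumptions, not any particular tool’s algorithm.

```python
# Toy analyse-evaluate-regenerate loop combining processes 14, 15 and 11.
# The "design" here is a single structure parameter and the behaviour model
# is a deliberately simple stand-in; real tools use FEA, solvers, etc.

def derive_behaviour(thickness_mm: float) -> float:
    """Process 14 (Si -> Bi): derive a behaviour, e.g. deflection, from structure."""
    return 120.0 / thickness_mm**3          # illustrative stand-in, not a real model

def evaluate(expected: float, actual: float, tol: float = 0.05) -> bool:
    """Process 15 ({Bei, Bi} -> decision): compare expected with actual behaviour."""
    return actual <= expected * (1.0 + tol)

def regenerate(thickness_mm: float, step: float = 0.5) -> float:
    """Process 11 (Bei -> Sei): propose a new structure when evaluation fails."""
    return thickness_mm + step

thickness, expected_deflection = 2.0, 1.0
while not evaluate(expected_deflection, derive_behaviour(thickness)):
    thickness = regenerate(thickness)
print(f"accepted structure: thickness = {thickness:.1f} mm")
```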
Figure 6. Focussing in the situated FBS framework.
4.3. Computer-aided Design Support for Focussing

There has been some work on tools to support focussing, the processes involved in the formulation of a design state space. These tools are mainly based on decision-making mechanisms that use various kinds of information. Figure 6 highlights processes 7, 8 and 9 to represent focussing.
• Si → Sei (process 9): A number of computational approaches to focussing on object structure have been developed in the area of design optimisation. Some of this work uses information extracted from the current design. For example, Parmee’s (1996) cluster-oriented genetic algorithms (COGAs) identify high-performance regions within the current structure state space. These features are then used for focussing on different structure variables and constraints, to concentrate the search for an optimum design on particular areas within the original structure state space. Other work uses information learnt from previous design tasks. A tool developed by Schwabacher et al. (1998) extracts characteristics of previous optimisation results and uses them to formulate new optimisation problems. These characteristics include information such as optimal structure, mappings between structure and behaviour, infeasible behaviour and active constraints. This information is used to improve the problem formulation by reducing the structure state space (a sketch of this idea follows this list).
• Bi → Bei (process 8): Some work has been done on focussing at the level of object behaviour, again mostly in the context of optimisation. Mackenzie and Gero (1987) have induced rules to detect certain features of Pareto optimal sets relating to curvature, sensitivity and other information. The rules use this information to reformulate the problem by carrying out focussing in a way that reduces the behaviour state space. Jozwiak’s (1987) approach uses learning to acquire knowledge of inactive constraints, which is then used to predict whether or not the constraints of the current optimisation task may be neglected.
• Fi → Fei (process 7): There has been no work to date on tool support for this process.
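The following sketch illustrates structure-level focussing in the spirit of the work cited above: the structure state space is reduced by dropping variables tied only to constraints that were inactive in previous optimisation runs. The constraint records, variable names and threshold are illustrative assumptions, not the mechanisms of the cited tools.

```python
# Sketch of structure-level focussing (process 9): narrow the state space by
# keeping only variables that mattered in previous runs. The variable names,
# records and threshold below are illustrative assumptions, not a real tool.

previous_runs = [
    {"active_constraints": {"max_stress", "max_deflection"}},
    {"active_constraints": {"max_stress"}},
    {"active_constraints": {"max_stress", "max_deflection"}},
]

# Which structure variables each constraint actually depends on (assumed).
constraint_vars = {
    "max_stress": {"web_thickness", "flange_width"},
    "max_deflection": {"span", "web_thickness"},
    "clearance": {"hole_offset"},
}

def focussed_variables(runs, mapping, min_frequency=0.5):
    """Keep variables tied to constraints active in >= min_frequency of runs."""
    counts = {}
    for run in runs:
        for c in run["active_constraints"]:
            counts[c] = counts.get(c, 0) + 1
    keep = set()
    for constraint, n in counts.items():
        if n / len(runs) >= min_frequency:
            keep |= mapping[constraint]
    return keep

full_space = {"span", "web_thickness", "flange_width", "hole_offset"}
reduced = focussed_variables(previous_runs, constraint_vars)
print("reduced state space:", reduced)
print("dropped variables:", full_space - reduced)
```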
Figure 7. Interpretation in the situated FBS framework.
4.4. Computer-aided Design Support for Interpretation

Tools usually have some form of interface to receive and utilise input provided externally, either by humans or by other tools. In computer-aided design, there are two possible functions (Ft) for interpretation by tools:

• to support transferring design concepts as intended
• to support re-interpreting design concepts
Figure 7 highlights processes 1, 2, 3, 13, 19 and 20 to represent interpretation.
4.4.1. Support for Transfer of Design Concepts as Intended

• SRe → Si (process 3) and Se → Si (process 13): There has been considerable research in the computational interpretation of external object structure. The standardisation approaches to product modelling, mentioned in Section 4.1.1, provide the basis for the development of import mechanisms (called post-processors) that translate the standard models into the tool’s native format. Post-processors for STEP and IFC models are available in a number of commercial CAD/CAE/CAM systems. Another area of research is concerned with the interpretation of human sketches and freehand drawings by tools that convert them into more exact graphical models or perform early design analyses (Taggart 1975; Gross 1996; Leclercq 2001).
• BRe → Bi (process 2) and Be → Bi (process 19): Most design tools dealing with object behaviour directly derive that behaviour from structure (process 14) rather than interpreting it externally (e.g., from other tools). As a result, not much work exists on tool support for the interpretation of object behaviour. However, recent approaches to interoperability aiming to standardise the representation of function and behaviour besides structure (Szykman et al. 2001) may lead to the development of tool translators that automate this process.
• FRe → Fi (process 1) and Fe → Fi (process 20): Most tool support for the interpretation of object function is based on mechanisms of word recognition, given that many representations of function are described using natural language annotations. While there is a large number of general-purpose tools that provide interfaces for textual input (such as word processors or electronic whiteboards), only a few of them (e.g., the word generation system developed by de Vries et al. (2005)) offer word-processing features beyond mere editing. Future work on the interpretation of function can be expected to be driven by advances in both representing and reasoning about function, particularly in the area of design interoperability (Szykman et al. 2001).
4.4.2. Support for Re-Interpretation of Design Concepts

• SRe → Si (process 3) and Se → Si (process 13): Most research in re-interpretation has been done at the level of object structure. A system presented by Saund and Moran (1994) supports the creation of multiple interpretations of line drawings, by first decomposing and then reassembling elements of freehand drawings. The emerging shapes are then presented to the user for selection. A design agent capable of re-interpretation has been developed by Smith and Gero (2001) on the basis of Gero and Fujii’s (2000) “push-pull” model of situated cognition. This system has been able to learn new shapes over sequences of action and (re-)interpretation that are themselves the result of the agent’s modified experience.
• BRe → Bi (process 2) and Be → Bi (process 19): There has been no work to date on tool support for this process.
• FRe → Fi (process 1) and Fe → Fi (process 20): There has been no work to date on tool support for this process.
4.5. Computer-aided Design Support for Constructive Memory

Most work on computer-aided design tools includes support for memory in some way. There are two possible functions (Ft) related to this notion:

• to support retrieval of design concepts as stored
• to support re-construction of design concepts
Figure 8 highlights processes 4, 5 and 6 to represent constructive memory.
Figure 8. Constructive memory in the situated FBS framework.
4.5.1. Support for Retrieval of Design Concepts as Stored

• Si → Si (process 6): Research in using memory of object structure includes work on feature-based modelling. A number of CAD systems provide design databases, repositories or libraries to store design features, such as pockets, holes and slots (a sketch of such a feature library follows this list). Their reuse can lead to significant gains in productivity in designing. Techniques of feature extraction from geometrical CAD models can be viewed as another example of retrieving design concepts, although they require some additional computation. Here, features are implicitly stored in the pre-defined mappings underpinning common extraction techniques such as graph matching, syntactic pattern recognition and shape grammars (Shah 1991).
• Bi → Bi (process 5): Some recent work on design repositories has concentrated on including properties related to the behaviour (Bo) and function (Fo) of the design object (Szykman et al. 2001; Mocko et al. 2004). In addition, approaches to capturing and reusing design rationale have focused on representations of previous object behaviour that remain accessible for guiding the generation of new object structure (Chandrasekaran et al. 1993).
• Fi → Fi (process 4): Simple retrieval of object function is best exemplified by work on storing and reusing function (Fo) hierarchies in design repositories (Szykman et al. 2001) or case bases (Navinchandra et al. 1991). Approaches to retrieving implicitly stored functions include the work, mentioned earlier, on inferring word relations based on WordNet (de Vries et al. 2005). Other work focuses on the construction of subfunctions using decomposition knowledge encoded in grammars (Sridharan and Campbell 2005).
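The following minimal sketch illustrates retrieval “as stored” from a feature library of the kind described for process 6. The feature names, parameters and machining attributes are illustrative assumptions, not the schema of any particular CAD system.

```python
# Minimal sketch of a feature library with simple "as stored" retrieval
# (process 6). Feature names and parameters are illustrative assumptions.

feature_library = {
    "pocket": {"parameters": ["length", "width", "depth"], "machining": "end mill"},
    "hole":   {"parameters": ["diameter", "depth"],        "machining": "drill"},
    "slot":   {"parameters": ["length", "width", "depth"], "machining": "slot mill"},
}

def retrieve_feature(name: str, **values) -> dict:
    """Return a stored feature template instantiated with the given values."""
    template = feature_library[name]
    missing = [p for p in template["parameters"] if p not in values]
    if missing:
        raise ValueError(f"missing parameters for '{name}': {missing}")
    return {"feature": name, "machining": template["machining"], "values": values}

# Reusing a stored feature exactly as defined in the library:
print(retrieve_feature("hole", diameter=6.0, depth=12.0))
```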
4.5.2. Support for Re-Construction of Design Concepts

• Si → Si (process 6), Bi → Bi (process 5) and Fi → Fi (process 4): The idea of generating design concepts by situated re-construction rather than static retrieval from previous experience is quite new in design research. As a result, very little work has been done towards the development of computational models and tools that support this process. However, a number of research demonstrators have shown both the feasibility and the potential benefits of future constructive memory tools. Examples include neural network implementations used for the design of mechanical assemblies (Liew and Gero 2004), design optimisation (Peng and Gero 2006) and the exchange of product data between design tools (Kannengiesser and Gero 2007). The majority of this work provides support for the re-construction of design concepts at all three levels, comprising function (Fo), behaviour (Bo) and structure (So). A sketch contrasting re-construction with static retrieval follows this list.
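The following sketch contrasts static retrieval with a crude form of situated re-construction, in which a concept is blended from past experiences weighted by their similarity to the current situation. The data, similarity measure and weighting scheme are illustrative assumptions and do not model any of the cited systems.

```python
# Sketch contrasting static retrieval with constructive re-construction of a
# design concept. The experiences, similarity measure and weighting scheme
# are illustrative assumptions, not a model of any cited system.

experiences = [  # past (situation, recommended wall thickness) pairs
    {"situation": {"load": 0.2, "span": 0.3}, "thickness": 3.0},
    {"situation": {"load": 0.8, "span": 0.7}, "thickness": 8.0},
    {"situation": {"load": 0.5, "span": 0.5}, "thickness": 5.0},
]

def retrieve_static() -> float:
    """Static retrieval: return the most recent stored concept unchanged."""
    return experiences[-1]["thickness"]

def similarity(a: dict, b: dict) -> float:
    """Crude similarity: 1 minus mean absolute difference of shared variables."""
    diffs = [abs(a[k] - b[k]) for k in a]
    return max(0.0, 1.0 - sum(diffs) / len(diffs))

def reconstruct(current_situation: dict) -> float:
    """Constructive memory: blend past concepts, weighted by situation similarity."""
    weights = [similarity(current_situation, e["situation"]) for e in experiences]
    total = sum(weights)
    return sum(w * e["thickness"] for w, e in zip(weights, experiences)) / total

now = {"load": 0.7, "span": 0.6}
print("retrieved as stored:", retrieve_static())
print("re-constructed for current situation:", round(reconstruct(now), 2))
```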
5. Conclusion

Designing comprises a rich set of activities that is only beginning to be completely understood. Capturing these activities and defining them in a detailed framework is necessary to advance our understanding of design. The ontological framework presented in this chapter is a contribution to this aim. It extends our previous, object-centred work on representing the process of designing by adding a process-centred view. This view is based on the direct application of the FBS ontology to design activities, treating them as first-class entities with their own function, behaviour and structure, and no longer as mere derivatives of object-centred constructs. This provides a more structured description at a higher level of detail, which has the potential to make our framework of situated designing more amenable to other researchers.

We have shown that the process-centred ontology of designing also allows a set of requirements for tool support to be specified. This is based on the connection we established between the FBS view of design processes and the FBS view of design tools. Specifically, tools are viewed as artefacts whose functions (Ft) are specialised to supporting particular aspects of design processes, which themselves consist of combinations of function (Fp), behaviour (Bp) and structure (Sp). We have demonstrated how some of the outcomes of existing computer-aided design research and development can be mapped onto the 20 classes of design processes represented in this way. One result of our mappings is that a lack of tool support can be identified for a number of design activities. At the level of granularity presented in this chapter, this concerns activities of re-interpretation and re-construction of design concepts, and reasoning and focussing on object function (Fo).

Our ontology allows the research field of computer-aided design to be understood as the “materials science” of designing, concerned with creating and analysing tools to form appropriate “materials” of design processes at different levels of granularity. This is possible because the FBS ontology represents all design objects, tools and processes uniformly. Future research may use this ontology to create more fine-grained specifications of design tools. For example, different classes of feature extraction processes can be defined based on different classes of inputs (e.g., cubic, cylindrical or free-form shapes) and on different classes of transformations (e.g., graph matching, shape grammars, neural networks), and consequently different functions (Ft) of feature extraction tools can be derived. Information about specific process behaviour (Bp) and process function (Fp), on an instance level, can be added to derive more refined tool functions (Ft). Researchers and developers in computer-aided design can then identify specific gaps in the functions (Ft) of existing tools and generate the behaviour (Bt) and ultimately the structure (St) of new tools to close these gaps.
Acknowledgments

NICTA is funded by the Australian Government's Department of Communications, Information Technology and the Arts, and the Australian Research Council through Backing Australia's Ability and the ICT Research Centre of Excellence program.
References

Asimov, M: 1962, Introduction to Design, Prentice-Hall, Englewood Cliffs.
Bartlett, FC: 1932 reprinted in 1977, Remembering: A Study in Experimental and Social Psychology, Cambridge University Press, Cambridge.
Bhatta, S, Goel, A and Prabhakar, S: 1994, Innovation in analogical design: A model-based approach, in JS Gero and F Sudweeks (eds) Artificial Intelligence in Design ’94, Kluwer, Dordrecht, pp. 57-74.
Bickhard, MH and Campbell, RL: 1996, Topologies of learning, New Ideas in Psychology 14(2): 111-156.
Campbell, MI, Cagan, J and Kotovsky, K: 1999, A-Design: An agent-based approach to conceptual design in a dynamic environment, Research in Engineering Design 11(3): 172-192.
Chandrasekaran, B, Goel, AK and Iwasaki, Y: 1993, Functional representation as design rationale, IEEE Computer 26(1): 48-56.
Chandrasekaran, B and Josephson, JR: 2000, Function in device representation, Engineering with Computers 16(3-4): 162-177.
Clancey, WJ: 1997, Situated Cognition: On Human Knowledge and Computer Representations, Cambridge University Press, Cambridge.
Curtis, B, Kellner, MI and Over, J: 1992, Process modeling, Communications of the ACM 35(9): 75-90.
Deng, YM: 2002, Function and behavior representation in conceptual mechanical design, Artificial Intelligence for Engineering Design, Analysis and Manufacturing 16(5): 343-362.
Dewey, J: 1896 reprinted in 1981, The reflex arc concept in psychology, Psychological Review 3: 357-370.
Eastman, CM: 1999, Building Product Models: Computer Environments Supporting Design and Construction, CRC Press, Boca Raton.
Gero, JS: 1990, Design prototypes: A knowledge representation schema for design, AI Magazine 11(4): 26-36.
Gero, JS and Fujii, H: 2000, A computational framework for concept formation for a situated design agent, Knowledge-Based Systems 13(6): 361-368.
Gero, JS and Kannengiesser, U: 2004, The situated function-behaviour-structure framework, Design Studies 25(4): 373-391.
Gero, JS and Kannengiesser, U: 2007, A function-behavior-structure ontology of processes, Artificial Intelligence for Engineering Design, Analysis and Manufacturing 21(4), in press.
Gero, JS and Kazakov, V: 1998, Using analogy to extend the behaviour state space in creative design, in JS Gero and ML Maher (eds) Computational Models of Creative Design IV, Key Centre of Design Computing and Cognition, University of Sydney, Australia, pp. 113-143.
Grecu, DL and Brown, DC: 1996, Learning by single function agents during spring design, in JS Gero and F Sudweeks (eds) Artificial Intelligence in Design ’96, Kluwer, Dordrecht, pp. 409-428.
Gross, MD: 1996, The electronic cocktail napkin – a computational environment for working with design diagrams, Design Studies 17(1): 53-69.
Guindon, R: 1990, Designing the design process: Exploiting opportunistic thoughts, Human-Computer Interaction 5: 305-344.
ISO: 2001, Software Engineering – Product Quality – Part 1: Quality Model, ISO/IEC 9126-1, International Organization for Standardization, Geneva, www.iso.ch.
Jacobsen, K, Sigurjonsson, J and Jacobsen, O: 1991, Formalized specification of functional requirements, Design Studies 12(4): 221-224.
Jozwiak, SF: 1987, Improving structural optimization programs using artificial intelligence concepts, Engineering Optimization 12: 155-162.
Jun, HJ and Gero, JS: 1997, Representation, re-representation and emergence in collaborative computer-aided design, in ML Maher, JS Gero and F Sudweeks (eds) Preprints Formal Aspects of Collaborative Computer-Aided Design, Key Centre of Design Computing and Cognition, University of Sydney, Australia, pp. 303-320.
Kannengiesser, U and Gero, JS: 2006, Towards mass customized interoperability, Computer-Aided Design 38(8): 920-936.
Kannengiesser, U and Gero, JS: 2007, Agent-based interoperability without product model standards, Computer-Aided Civil and Infrastructure Engineering 22(2): 80-97.
Leclercq, P: 2001, Programming and assisted sketching, in B de Vries, JP van Leeuwen and HH Achten (eds) CAAD Futures 2001, Kluwer Academic Publishers, Dordrecht, pp. 15-32.
Liew, P and Gero, JS: 2004, Constructive memory for situated design agents, Artificial Intelligence for Engineering Design, Analysis and Manufacturing 18(2): 163-198.
Mackenzie, CA and Gero, JS: 1987, Learning design rules from decisions and performances, Artificial Intelligence in Engineering 2(1): 2-10.
Maiden, NA and Sutcliffe, AG: 1992, Exploiting reusable specifications through analogy, Communications of the ACM 35(4): 55-63.
McNeill, T, Gero, JS and Warren, J: 1998, Understanding conceptual electronic design using protocol analysis, Research in Engineering Design 10(3): 129-140.
Miller, GA: 1995, WordNet: A lexical database for English, Communications of the ACM 38(11): 39-41.
Mocko, G, Malak, R, Paredis, C and Peak, R: 2004, A knowledge repository for behavioral models in engineering design, Computers and Information Science in Engineering Conference ’04, Salt Lake City, UT.
Navinchandra, D, Sycara, KP and Narasimhan, S: 1991, Behavioral synthesis in CADET, a case-based design tool, IEEE Conference on Artificial Intelligence Applications, Miami Beach, FL, pp. 217-221.
Osterweil, LJ: 2005, Unifying microprocess and macroprocess research, in M Li, B Boehm and LJ Osterweil (eds) Unifying the Software Process Spectrum, Springer-Verlag, Berlin, pp. 68-74.
Papalambros, P and Wilde, DJ: 2000, Principles of Optimal Design: Modeling and Computation, Cambridge University Press, Cambridge.
Parmee, IC: 1996, Towards an optimal engineering design process using appropriate adaptive search strategies, Journal of Engineering Design 7(4): 341-362.
Peng, W and Gero, JS: 2006, Concept formation in a design optimization tool, in J van Leeuwen and H Timmermans (eds) Innovations in Design Decision Support Systems in Architecture and Urban Planning, Springer-Verlag, Berlin, pp. 293-308.
Rescher, N: 2006, Process Philosophical Deliberations, Ontos-Verlag, Frankfurt.
Reymen, IMMJ, Hammer, DK, Kroes, PA, van Aken, JE, Dorst, CH, Bax, MFT and Basten, T: 2006, A domain-independent descriptive design model and its application to structured reflection on design processes, Research in Engineering Design 16(4): 147-173.
Sacks, R, Eastman, CM and Lee, G: 2004, Parametric 3D modeling in building construction with examples from precast concrete, Automation in Construction 13(3): 291-312.
Saund, E and Moran, TP: 1994, A perceptually-supported sketch editor, ACM Symposium on User Interface Software and Technology, ACM Press, New York.
Schön, DA: 1983, The Reflective Practitioner: How Professionals Think in Action, Harper Collins, New York.
Schön, DA and Wiggins, G: 1992, Kinds of seeing and their functions in designing, Design Studies 13(2): 135-156.
Schwabacher, M, Ellman, T and Hirsh, H: 1998, Learning to set up numerical optimizations of engineering designs, Artificial Intelligence for Engineering Design, Analysis and Manufacturing 12(2): 173-192.
Shah, JJ: 1991, Assessment of features technology, Computer-Aided Design 23(5): 331-343.
Shah, JJ and Mäntylä, M: 1995, Parametric and Feature-Based CAD/CAM: Concepts, Techniques, and Applications, John Wiley & Sons, New York.
Smith, GJ and Gero, JS: 2001, Interaction and experience: Situated agents and sketching, in JS Gero and FMT Brazier (eds) Agents in Design 2002, Key Centre of Design Computing and Cognition, University of Sydney, pp. 115-132.
Smith, GJ and Gero, JS: 2005, What does an artificial design agent mean by being ‘situated’?, Design Studies 26(5): 535-561.
Sridharan, P and Campbell, MI: 2005, A study on the grammatical construction of function structures, Artificial Intelligence for Engineering Design, Analysis and Manufacturing 19(3): 139-160.
Stone, RB and Wood, KL: 2000, Development of a functional basis for design, Journal of Mechanical Design 122(4): 359-370.
Suwa, M, Gero, JS and Purcell, T: 1999, Unexpected discoveries and s-inventions of design requirements: A key to creative designs, in JS Gero and ML Maher (eds) Computational Models of Creative Design IV, Key Centre of Design Computing and Cognition, University of Sydney, Sydney, Australia, pp. 297-320.
Suwa, M and Tversky, B: 2002, External representations contribute to the dynamic construction of ideas, in M Hegarty, B Meyer and NH Narayanan (eds) Diagrams 2002, Springer-Verlag, Berlin, pp. 341-343.
Szykman, S, Fenves, SJ, Keirouz, W and Shooter, SB: 2001, A foundation for interoperability in next-generation product development systems, Computer-Aided Design 33(7): 545-559.
Taggart, J: 1975, Sketching: An informal dialogue between designer and computer, in N Negroponte (ed.) Reflections on Computer Aids to Design and Architecture, Petrocelli Charter, New York, pp. 147-162.
Umeda, Y, Ishii, M, Yoshioka, M, Shimomura, Y and Tomiyama, T: 1996, Supporting conceptual design based on the function-behavior-state modeler, Artificial Intelligence for Engineering Design, Analysis and Manufacturing 10(4): 275-288.
de Vries, B, Jessurun, J, Segers, N and Achten, H: 2005, Word graphs in architectural design, Artificial Intelligence for Engineering Design, Analysis and Manufacturing 19(4): 277-288.
Weisberg, DE: 2000, The electronic push, Mechanical Engineering 122(4): 52-59.
Ziemke, T: 1999, Rethinking grounding, in A Riegler, M Peschl and A von Stein (eds) Understanding Representation in the Cognitive Sciences: Does Representation Need Reality?, Plenum Press, New York, pp. 177-190.