An Abstract Argumentation Framework for Supporting Agreements
Definition 5 (Conflict-free). A set of arguments ARG ⊆ A is conflict-free_ag1 for an agent ag1 in the society St if ∄a1, a2 ∈ ARG such that (attacks(a1, a2) ∨ attacks(a2, a1)) ∨ ((val(ag1, a1) <^St_ag1 val(ag2, a2) ∉ Valpref_ag1) ∧ (Role(ag1) <^St_Pow Role(ag2) ∨ Role(ag1) <^St_Auth Role(ag2) ∉ Dependency_St)). That is, there is no pair of arguments that attack each other or, otherwise, there is a value preference relation and a dependency relation that invalidates the attack.

Definition 6 (Acceptability). An argument a1 ∈ A is acceptable_ag in a society St with respect to a set of arguments ARG ⊆ A iff ∀a2 ∈ A, defeats_ag(a2, a1) → ∃a3 ∈ ARG such that defeats_ag(a3, a2). That is, if the argument is defeated_ag by another argument of A, some argument of the subset ARG defeats_ag this other argument.

Definition 7 (Admissibility). A conflict-free set of arguments ARG ⊆ A is admissible for an agent ag iff every a ∈ ARG is acceptable_ag with respect to ARG.

Definition 8 (Preferred Extension). A set of arguments ARG ⊆ A is a preferred-extension_ag for an agent ag if it is a maximal (with respect to set inclusion) admissible_ag subset of A.

Then, for any AAFAS = <Ag, Rl, D, G, N, A, R, V, Role, Dependency_St, Group, Values, val, Valpref_agi> there is a corresponding AFAS = <A, R, St>, where R = defeats_agi. Thus, each attack relation of AFAS has a corresponding agent-specific defeats_agi relation in AAFAS. These properties are illustrated in the example of the next section.
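To make the defeat-based notions concrete, the following sketch (our own illustration in Python, with hypothetical argument names; not part of the formalism above) checks conflict-freeness, acceptability and admissibility over an explicit defeats relation:

```python
from itertools import combinations

def conflict_free(args, defeats):
    # Definition 5 (simplified): no pair of arguments in the set
    # defeats one another.
    return not any((a, b) in defeats or (b, a) in defeats
                   for a, b in combinations(sorted(args), 2))

def acceptable(a, args, defeats):
    # Definition 6: every defeater of `a` is itself defeated
    # by some argument of `args`.
    return all(any((c, b) in defeats for c in args)
               for (b, target) in defeats if target == a)

def admissible(args, defeats):
    # Definition 7: conflict-free and every member is acceptable.
    return conflict_free(args, defeats) and \
           all(acceptable(a, args, defeats) for a in args)

defeats = {("b", "a"), ("c", "b")}      # c defeats b, b defeats a
print(admissible({"a", "c"}, defeats))  # → True ("c" defends "a")
print(admissible({"a"}, defeats))       # → False (nobody defends "a")
```

A maximal admissible set under this relation would then be a preferred extension in the sense of Definition 8.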
4
Application of the Framework to the Management of Water-Right Transfer Agreements
To exemplify our framework, let us propose a scenario of an open MAS that represents a water market [13], where agents are users of a river basin; they belong to a society St and they can enter or leave the system to buy and sell water-rights. A water-right is a contract with the basin administrator that specifies the volume that can be spent, the water price, the district where the water is settled, etc. Here, suppose that two agents that play the role of farmers, F1 and F2, in the river basin RB (group) are arguing to decide over a water-right transfer agreement, and a basin administrator BA must control the process and make a final decision. The basin has a set of norms N_RB and establishes dependency relations of charity (Ch) between two farmers and of power (Pow) between a basin administrator and a farmer. In addition, farmers prefer to reach an agreement before taking legal action, to avoid the intervention of a jury (J). Also, F1 prefers economy over solidarity (SO <^St_F1 J <^St_F1 EC), F2 prefers solidarity over economy (EC <^St_F2 J <^St_F2 SO) and, by default, BA has the value preference order of the basin, which is (EC <^St_BA SO <^St_BA J).
S. Heras, V. Botti, and V. Julián
In this scenario, F1 puts forward the argument "I should be the beneficiary of the transfer because my land is adjacent to the owner's land". Here, we suppose that the closer the lands, the cheaper the transfers between them, and hence this argument could promote economy. However, F2 replies with the argument "I should be the beneficiary of the transfer because there is a drought and my land is almost dry". In this argument, we assume that crops are lost in dry lands and that helping people to avoid losing crops promotes solidarity. In addition, the BA knows that the jury will interfere if the agreement violates the value preferences of the river basin. Then, the agents can also put forward the following arguments: "F2 should allow me (F1) to be the beneficiary of the water-right transfer to avoid the intervention of a jury (J)", "F1 should allow me (F2) to be the beneficiary of the water-right transfer to avoid the intervention of a jury (J)" and "F1 should allow F2 to be the beneficiary of the water-right transfer to avoid the intervention of a jury (J)". In view of this context, BA could generate an AFAS = <A, R, St> as an extension of abstract argumentation frameworks AF = <A, R>.
Thus, we have the following arguments in A = {A1, A2, A3, A4, A5, A6} (which are all possible solutions for the water-right transfer agreement process): A1 (posed by F1): F1 should be the beneficiary of the water transfer (F1w) to promote economy (EC); A2 (posed by F2): F1 should not be the beneficiary of the water transfer (F1nw) to promote solidarity (SO); A3 (posed by F2): F2 should be the beneficiary of the water transfer (F2w) to promote solidarity (SO); A4 (posed by F1): F2 should not be the beneficiary of the water transfer (F2nw) to promote economy (EC); A5 (posed by F1): F2 should allow F1 to be the beneficiary of the water transfer (F1w&F2nw) to avoid the intervention of a jury (J); and A6 (posed by F2 and BA): F1 should allow F2 to be the beneficiary of the water transfer (F1nw&F2w) to avoid the intervention of a jury (J).

The BA cannot decide the water transfer in favour of both water users, so attacks(A1, A3) and vice versa, and we assume that it must take a decision favouring at least one party, so attacks(A2, A4) and vice versa. In addition, attacks(A5, A2), attacks(A5, A3) and attacks(A5, A6), and all these arguments attack A5; likewise, attacks(A6, A1), attacks(A6, A4) and attacks(A6, A5), and all these arguments attack A6. Then, R = {attacks(A1, A3), attacks(A3, A1), attacks(A2, A4), attacks(A4, A2), attacks(A5, A2), attacks(A5, A3), attacks(A5, A6), attacks(A2, A5), attacks(A3, A5), attacks(A6, A5), attacks(A6, A1), attacks(A6, A4), attacks(A1, A6), attacks(A4, A6)} and St = <Ag, Rl, D, G, N, V, Role, Dependency_St, Group, Values, Valpref_agi>, where Ag = {F1, F2, BA}; Rl = {Farmer, BasinAdministrator}; D = {Power, Charity}; G = {RB}; N = N_RB; V = {EC, SO, J}; Role(F1) = Role(F2) = Farmer; Role(BA) = BasinAdministrator; Farmer <^St_Pow BasinAdministrator; Farmer <^St_Ch Farmer; Group(F1) = Group(F2) = Group(BA) = RB; Values(F1) = Values(F2) = Values(BA) = {EC, SO, J}; Valpref(F1) = {SO <^St_F1 J <^St_F1 EC}; Valpref(F2) = {EC <^St_F2 J <^St_F2 SO}; Valpref(BA) = {EC <^St_BA SO <^St_BA J}. Therefore, taking into account that F1 and F2 have a charity dependency relation between them, the AFAS for this example is shown in Figure 1a.
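As a quick sanity check (our own illustration, not part of the paper), the relation R can be encoded as a set of ordered pairs and tested for the mutual attacks described above:

```python
# Arguments are numbered 1..6; a pair (i, j) means attacks(Ai, Aj).
R = {(1, 3), (3, 1), (2, 4), (4, 2),
     (5, 2), (2, 5), (5, 3), (3, 5), (5, 6), (6, 5),
     (6, 1), (1, 6), (6, 4), (4, 6)}

# Every attack in this example has its symmetric counterpart:
# each pair of incompatible outcomes attacks the other.
assert all((j, i) in R for (i, j) in R)
print(len(R))  # → 14
```

The 14 attacks thus correspond to 7 mutual conflicts between incompatible outcomes.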
[Figure 1 graphs: nodes A1 (F1w, EC), A2 (F1nw, SO), A3 (F2w, SO), A4 (F2nw, EC), A5 (F1w&F2nw, J), A6 (F1nw&F2w, J)]
Fig. 1. a) Example AFAS; b) Example AFAS_F2
Fig. 2. a) Example AFAS_F1; b) AFAS_F1 modified
Now, let us consider what happens with specific agents by creating their AAFAS. For instance, recalling that F1 prefers economy to the other values and gives solidarity the least value (SO <^St_F1 J <^St_F1 EC), we have that AAFAS_F1 is the following: AAFAS_F1 = <Ag, Rl, D, G, N, A, R, V, Role, Dependency_St, Group, Values, val, Valpref_F1>. Then, eliminating the unsuccessful attacks (due to the value preferences of F1), we have the equivalent AFAS_F1 for AAFAS_F1 as AFAS_F1 = <A, {attacks(A1, A3), attacks(A2, A4), attacks(A5, A2), attacks(A5, A3), attacks(A6, A5), attacks(A6, A1), attacks(A6, A4)}, St>, which is shown in the graph of Figure 2a. This graph has the preferred extension PE_F1 = {A6}, meaning that F2 should be the beneficiary of the water-right transfer to promote solidarity and avoid the intervention of a jury. This demonstrates how the power dependency relation of the BA prevails over the farmers and their arguments. Otherwise, if we change the environment and set a charity dependency relation of basin administrators over farmers (Farmer <^St_Ch BasinAdministrator), the preferences of F1 would prevail and the graph would be the one of Figure 2b. In this case, the preferred extension would be PE_F1-modified = {A1, A4, A5}, which would defend F1 as the beneficiary of the transfer agreement. In its turn, F2 gives the highest value to solidarity, but prefers to avoid a jury over economy (EC <^St_F2 J <^St_F2 SO). Therefore, its associated AAFAS_F2 would be the following: AAFAS_F2 = <Ag, Rl, D, G, N, A, R, V, Role, Dependency_St, Group, Values, val, Valpref_F2>. Then, eliminating the unsuccessful attacks we have the equivalent AFAS_F2 for AAFAS_F2 as AFAS_F2 = <A, {attacks(A3, A1), attacks(A2, A4), attacks(A2, A5), attacks(A3, A5), attacks(A6, A5), attacks(A6, A1), attacks(A6, A4)}, St>, which is shown in the graph of
Figure 1b. This graph has the preferred extension PE_F2 = {A2, A3, A6}, which means that F2 defends its position as beneficiary of the water transfer.
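The preferred extension reported for AFAS_F2 can be recomputed with a naive Dung-style enumeration. The following sketch (our own illustration, with hypothetical function names; feasible only for tiny frameworks) recovers PE_F2 from the attack relation given above:

```python
from itertools import combinations

def preferred_extensions(args, attacks):
    # Naive enumeration of all admissible subsets.
    def conflict_free(S):
        return not any((a, b) in attacks for a in S for b in S)
    def acceptable(a, S):
        return all(any((c, b) in attacks for c in S)
                   for (b, t) in attacks if t == a)
    def admissible(S):
        return conflict_free(S) and all(acceptable(a, S) for a in S)
    adm = [set(c) for r in range(len(args) + 1)
           for c in combinations(sorted(args), r) if admissible(set(c))]
    # Preferred extensions are the maximal admissible sets (w.r.t. inclusion).
    return [S for S in adm if not any(S < T for T in adm)]

# Attack relation of AFAS_F2 after eliminating unsuccessful attacks.
R_F2 = {(3, 1), (2, 4), (2, 5), (3, 5), (6, 5), (6, 1), (6, 4)}
print(preferred_extensions({1, 2, 3, 4, 5, 6}, R_F2))  # → [{2, 3, 6}]
```

The result matches PE_F2 = {A2, A3, A6}: A2, A3 and A6 are unattacked in this graph and jointly defeat A1, A4 and A5.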
5
Conclusion
In this paper we have presented an abstract argumentation framework to help reach agreements in agent societies. After defining our concept of agent society, we have provided the formal definition of our argumentation framework. This is an extension of Dung's framework [11] that includes agents' values, value preference orders and dependency relations. The framework has been illustrated in a real scenario of a water-rights transfer market.

Acknowledgements. This work is supported by the Spanish government grants CONSOLIDER INGENIO 2010 CSD2007-00022, TIN2008-04446 and TIN2009-13839-C03-01, and by the GVA project PROMETEO 2008/051.
References
1. Rahwan, I.: Argumentation in multi-agent systems. Autonomous Agents and Multiagent Systems, Guest Editorial 11(2), 115–125 (2006)
2. Bench-Capon, T., Dunne, P.: Argumentation in artificial intelligence. Artificial Intelligence 171(10-15), 619–938 (2007)
3. Ferber, J., Gutknecht, O., Michel, F.: From agents to organizations: an organizational view of multi-agent systems. In: Giorgini, P., Müller, J.P., Odell, J.J. (eds.) AOSE 2003. LNCS, vol. 2935, pp. 214–230. Springer, Heidelberg (2004)
4. Oliva, E., McBurney, P., Omicini, A.: Co-argumentation artifact for agent societies. In: Rahwan, I., Moraitis, P. (eds.) Argumentation in Multi-Agent Systems. LNCS (LNAI), vol. 5384. Springer, Heidelberg (2009)
5. Bench-Capon, T., Atkinson, K.: Abstract argumentation and values. In: Argumentation in Artificial Intelligence, pp. 45–64 (2009)
6. Dignum, V.: A model for organizational interaction: based on agents, founded in logic. PhD thesis (2003)
7. Artikis, A., Sergot, M., Pitt, J.: Specifying norm-governed computational societies. ACM Transactions on Computational Logic 10(1) (2009)
8. Criado, N., Argente, E., Botti, V.: A normative model for open agent organizations. In: International Conference on Artificial Intelligence, ICAI 2009 (2009)
9. Perelman, C., Olbrechts-Tyteca, L.: The New Rhetoric: A Treatise on Argumentation (1969)
10. Searle, J.R.: Rationality in Action (2001)
11. Dung, P.M.: On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming, and n-person games. Artificial Intelligence 77, 321–357 (1995)
12. Baroni, P., Giacomin, M.: Semantics of abstract argument systems. In: Argumentation in Artificial Intelligence, pp. 25–44. Springer, Heidelberg (2009)
13. Botti, V., Garrido, A., Giret, A., Noriega, P.: Managing water demand as a regulated open MAS. In: Workshop on Coordination, Organization, Institutions and Norms in agent systems in on-line communities, COIN 2009, vol. 494, pp. 1–10 (2009)
Reaching a Common Agreement Discourse Universe on Multi-Agent Planning
Alejandro Torreño, Eva Onaindia, and Oscar Sapena
Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Valencia, Spain
{atorreno,onaindia,osapena}@dsic.upv.es
Abstract. Multi-Agent Planning (MAP) is the problem of having a group of agents working together to solve a problem that requires a collective effort. When coordination in a MAP system is done through negotiation, agents must share a common ontology. In this paper we propose a mechanism to reach a shared ontology through the definition of a common information model.
1
Introduction
E.S. Corchado Rodríguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 185–192, 2010. © Springer-Verlag Berlin Heidelberg 2010

The term Multi-Agent Planning (MAP) refers to any kind of planning in multi-agent environments. MAP is concerned with planning by multiple agents, i.e. distributed planning, or planning for multiple agents, i.e. planning for multi-agent execution. In general, MAP accounts for the problem of planning in domains where several agents plan and act together. In our MAP approach, we have multiple independent agents jointly devising a global plan in an environment of which they have different views. To do so, agents are capable of sharing some of their planning information and negotiating to build plans. Hence, in this context, it is assumed that agents share a common agreement discourse universe [1], i.e. agents have to know a common set of concepts in order to negotiate. Some approaches to MAP, such as Cooperative Distributed Planning (CDP) [5], put the emphasis on extending planning to a distributed environment with fully cooperative agents. Others consider self-interested agents, defining MAP as the problem of finding a plan for each agent that achieves its private goals, such that these plans together are coordinated and the global goals are also met [3]. Under this perspective, the emphasis is on how to manage the interdependencies between the agents' plans and how to solve the coordination problem [2,4]. In our MAP approach, agents are heterogeneous, i.e. they have different visions of the environment, they can have private goals and they manage different ontologies. Coordination among agents is achieved through a negotiation process, for which agents must share an agreement discourse universe. In this paper, we define a set of mechanisms to build a common information model on top of each agent's local model. This common model provides agents with a shared ontology based on PDDL (the Planning Domain Definition
A. Torreño, E. Onaindia, and O. Sapena
Language [7]), adapting the representation of the information contained in the agent's local model in such a way that coherence between both models is ensured. The paper is organized as follows: the next section presents the MAP model; Section 3 presents the notion of heterogeneous planning agents and the issues they raise; Section 4 defines the mechanisms to design the common information model; next, we show an application example; and the last section concludes.
2
Multi-Agent Planning Model
The informal definition of a MAP problem can be stated as follows: given a description of the initial state, a set of global goals, a set of (at least) two agents, and, for each agent, a set of its capabilities and (possibly) its private goals, find a single competent plan that achieves the global and private goals. This definition contains two key aspects: a collaborative task, by which all the agents should cooperate to attain the global goals, and an individual task that guides agents' proposals towards the resolution of their own private goals. Therefore, a MAP problem can be seen as a Cooperative Distributed Planning (CDP) task while, in addition, agents have their own planning task to solve. First, we will focus on the CDP problem, and then on the individual problem of each agent.

Definition 1. A CDP task is a tuple T = <AG, Θ, P, I, G, F>, where AG = {1 . . . n} is a finite, non-empty set of planning agents, Θ is the set of actions that describe the state changes in the domain, P is a finite set of propositional state variables, I ⊆ P is the initial state, G ⊆ P are the problem goals and F is a utility function to select a plan when several choices are available.

When solving a CDP task, the agents in AG aim at finding an executable plan (if such a plan exists) which, applied to the initial state, leads to a state in which all problem goals hold. However, each agent has a different planning task to solve, since agents may have private goals and the knowledge of the problem's overall state is distributed, each individual having only a partial view. Consequently, agents will have different initial states and most likely different capabilities, so they will actually be solving different planning problems. On the other hand, when having common goals, agents must have some limited knowledge of the models of the rest of the individuals, since this information may be potentially significant for coordination.
In our framework, the planning model of an agent encodes partial information on the capabilities of each other agent in order to promote a coherent coordination towards a joint plan. A CDP task can thus be interpreted as solving as many planning tasks as agents in AG. A different planning task Ti = <Θi, Ii, Gi, Fi> is associated to each agent i ∈ AG, such that solving T implies solving Ti for every i ∈ AG:
– Θi ⊆ Θ represents the set of actions in the model of agent i. This set includes some limited knowledge of the abilities of the other agents. Formally, we define Θi = Γi ∪ Δi, where Γi denotes the actions executable by agent i, and Δi denotes the set of actions executable by any other agent j, j ≠ i.
– Ii ⊆ I denotes the local knowledge of agent i, its partial perspective of the environment. Formally, we can define I = ∪_{i∈AG} Ii.
– Gi = G ∪ PGi, where G are the goals of the CDP task, and PGi ⊆ P is the set of agent i's private goals. PGi is an optional parameter.
– Fi is the utility function of the agent, which can be different from the global one, F. F will be used by all the participants to discuss the CDP solution in the same terms.
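The task decomposition above can be sketched as a small data model (our own illustration with hypothetical Python names, not the authors' implementation):

```python
from dataclasses import dataclass

@dataclass
class AgentTask:
    # Ti = <Θi, Ii, Gi, Fi> for agent i (utility Fi omitted for brevity).
    own_actions: frozenset      # Γi: actions agent i can execute itself
    known_actions: frozenset    # Δi: limited knowledge of other agents' actions
    init: frozenset             # Ii ⊆ I: the agent's partial view of the state
    private_goals: frozenset = frozenset()  # PGi (optional)

    @property
    def actions(self):          # Θi = Γi ∪ Δi
        return self.own_actions | self.known_actions

@dataclass
class CDPTask:
    # T = <AG, Θ, P, I, G, F>, with I recovered as the union of local views.
    agents: dict                # agent id -> AgentTask
    global_goals: frozenset     # G

    @property
    def init(self):             # I = ∪_{i ∈ AG} Ii
        return frozenset().union(*(t.init for t in self.agents.values()))

    def goals(self, i):         # Gi = G ∪ PGi
        return self.global_goals | self.agents[i].private_goals
```

For example, two agents with overlapping partial views would contribute the union of their local states as the global initial state I.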
3
Planning Agents Information Model
In our MAP approach, planning agents have an information model which defines their view of the environment. As stated in the MAP definition, this model includes a description of the initial state, the set of actions that can be performed by the agent, the private and global goals, the agent's utility function, and some information on the environment and on the abilities of the rest of the agents. The participants must use a common ontology in order to jointly build and negotiate over problem solutions. However, our MAP framework allows the presence of heterogeneous agents, which do not share a common ontology. This fact implies that the participants do not share an agreement discourse universe. As our MAP approach achieves coordination among agents through negotiation, the absence of a common discourse universe prevents the agents from negotiating and from reaching an agreement on the design of the joint plan. Consequently, a mechanism to establish a common ontology is required in order to tackle this issue. In our MAP framework, each agent initially receives a PDDL-based information model. More precisely, these models are based on PDDL2.1 [7], one of the most popular extensions of PDDL. The PDDL-based ontologies of heterogeneous agents may present differences in the following aspects:
– Actions. Agents can have some shared abilities (actions), but it is not assured that they share the exact same representation of these abilities. To integrate knowledge about actions executable by others into an agent's model, it is necessary to have homogeneous representations of these abilities.
– Objects. Similarly to the actions, heterogeneous agents can have different internal representations of the objects. Again, a common representation is required for the agents to successfully build and discuss common plans.
Our approach to solving this issue is based on the definition of a new information model, which uses an ontology shared by all the individuals.
This common information model acts as an upper layer that overlays the original one, allowing the agents to communicate appropriately. Hence, planning agents will manage both a common and a local information model. Both models have to be coherent with each other, so that the information in the local layer can be migrated to the common one. Coherence between models is a key aspect of our design, since it is necessary to establish a mapping between them. The next section presents the mechanisms that have been designed to define the common information model.
4
Defining a Common Information Model
To design a common ontology shared by all the planning agents, we have defined two different mechanisms. On the one hand, we have defined a set of design techniques that allow the generation of a common PDDL-based planning model, which contains homogeneous descriptions of the objects and the abilities of the agents, and is coherent with the agents’ underlying local models. On the other hand, we have extended the PDDL2.1 language with a set of constructs that allow the agents to translate information between both planning layers. The following sections detail the design techniques and the language extensions. 4.1
Modeling Techniques
Since some agents can have more detailed descriptions of the objects and the operators than others, the design techniques aim to homogenize these descriptions by reducing their level of detail. The purpose of the design techniques is to create an information model that includes the simplest description of the objects and operators among the local models, thus ensuring coherence between them. Hence, the common model will contain, in general, simpler representations of the objects and operators than the local ones. According to their effect, it is possible to classify the design techniques into two groups:
– Generalization techniques. Objects and operators included in the common model may be modeled as groupings of objects and operators in the agents' local models. An object in the common model can be translated into a grouping of local objects, or multiple groupings of different local objects. It can also be a direct mapping of a local object. Similarly, an operator can be translated as a set of local actions or directly mapped to a local action.
– Detail reduction techniques. Certain objects and operators may only be included in some of the local models. This information is considered not to be relevant for the common model, since it does not have a correspondence with all of the local models. Instead of applying a generalization technique, this information is directly discarded from the common model.
Objects generalization. At the common layer, objects must have a correspondence with the local ones. An object in the common layer can be seen as the composition of one or more instances of one or more local objects. Hence, it is possible to distinguish three different ways to group objects:
– Direct mapping. The simplest possible grouping consists in using an object at the common level as it is in a local model.
– Simple grouping. This grouping technique involves a common-model object being the composition of several instances of a local object.
– Multiple grouping. Unlike simple grouping, this technique allows the grouping of several instances of different local objects into a common-model object.
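These three grouping techniques can be sketched as a simple table from common-model objects to their local constituents (all names here are hypothetical illustrations, not taken from the paper):

```python
# Each common-model object maps to (local object, instance count) pairs.
object_groupings = {
    "obj_a":   [("local_a", 1)],                   # direct mapping
    "group_b": [("local_b", 3)],                   # simple grouping: 3 instances of one object
    "group_c": [("local_b", 2), ("local_c", 1)],   # multiple grouping: mixed local objects
}

def local_constituents(common_obj):
    """Expand a common-model object into its local instances."""
    return [name for name, n in object_groupings[common_obj]
            for _ in range(n)]

print(local_constituents("group_c"))  # → ['local_b', 'local_b', 'local_c']
```

A common-model object is only admissible if such an expansion exists for every agent's local model, which is exactly the coherence condition stated below.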
It is not possible to define any object in the common model unless it has the proper correspondence in all of the local models, thus guaranteeing that the objects of the two information layers of each agent are coherent.
Operators generalization. Operators in the common model also have to be coherent with their local counterparts. The design of these operators is a process similar to the definition of Hierarchical Task Networks (HTN) [6]. It is possible, then, to consider the local operators to be primitive actions, while the common-model operators can be translated into networks of primitive actions, or directly mapped to a single one. Hence, we can distinguish two ways of defining these operators:
– Direct mapping. A local operator may be included in the common model as it is, replacing only its local objects with common-model ones.
– Hierarchical network. This technique, based on HTN, maps common-model operators to networks of local ones, relating them in this way to the local model, since they are compositions of primitive (local) operators.
Therefore, common-model operators must necessarily be equivalent to single local operators or sequences of them, in order to preserve the coherence between both information levels.
Detail reduction. Besides the generalization techniques, it is possible to apply a detail reduction by discarding from the common model elements that are irrelevant from a common perspective. We consider an object or an operator to be irrelevant to the common model if it is not reflected in the local models of all the individuals. In this situation, it is not necessary to create entities in the common model to be mapped to these objects or operators, so they are directly discarded. This way, coherence between the common information model and the participants' local models is preserved.
4.2
Planning Language Extensions
Once we have defined a common information model, it is necessary to provide the agents with a mechanism to translate information between this layer and the underlying local model. To do so, we have included an extension in our planning language that establishes the relationship between both models and how to translate the information from one layer to the other. More precisely, this information is specified in the mapping section, which is included in the local model. The mapping construct models the correspondence between the two information models, defined through predicates. Within the section defined by a mapping construct, it is possible to include several implies sentences that associate common and local predicates. A mapping construct uses the following BNF (Backus-Naur Form) syntax:
<mapping-def> ::= (:mapping <implies-def>+)
<implies-def> ::= (implies <com-pred>+ <loc-pred>+)
<com-pred>    ::= <pred-def>
<loc-pred>    ::= <pred-def>
As the syntax description states, it is possible to introduce multiple implies sentences within a mapping section. Both com-pred and loc-pred are defined as a pred-def, which can be a single predicate or a conjunction or a disjunction of them. These predicates may be totally or partially instantiated. Each implies sentence can be seen as a double implication of the form (com-pred ⇔ loc-pred). The predicates in these sentences act as patterns; if a common-model literal matches the pattern defined by com-pred, then it is possible to infer the right-hand side of the sentence, i.e. the loc-pred. Since the implication is bidirectional, the left-hand side can be inferred by having a local predicate that fits the pattern established by loc-pred. This makes the exchange of information between the two layers possible in both directions.
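For a single-predicate implies sentence, this bidirectional pattern reading can be sketched as follows (our own illustration with hypothetical predicate names; the real pred-def also admits conjunctions and disjunctions, which this sketch omits):

```python
def match(pattern, literal, binding=None):
    # Unify a predicate pattern (variables start with '?') with a ground literal.
    binding = dict(binding or {})
    if len(pattern) != len(literal) or pattern[0] != literal[0]:
        return None
    for p, l in zip(pattern[1:], literal[1:]):
        if p.startswith("?"):
            if binding.setdefault(p, l) != l:
                return None          # same variable bound to two values
        elif p != l:
            return None              # constant mismatch
    return binding

def translate(literal, rules, direction="common_to_local"):
    # Each rule is a (com-pred, loc-pred) pair read as a double implication.
    out = []
    for com, loc in rules:
        src, dst = (com, loc) if direction == "common_to_local" else (loc, com)
        b = match(src, literal)
        if b is not None:
            out.append(tuple(b.get(t, t) for t in dst))
    return out

rules = [(("at", "?d", "?a"), ("located", "?d", "?a"))]  # hypothetical rule
print(translate(("at", "d1", "aSpain"), rules))
# → [('located', 'd1', 'aSpain')]
print(translate(("located", "d1", "aSpain"), rules, "local_to_common"))
# → [('at', 'd1', 'aSpain')]
```

The same matching step works in either direction, which is the double implication (com-pred ⇔ loc-pred) described above.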
5
Application Example
In order to illustrate the specified mechanisms, this section presents a simple example. The example models a transport domain, in which agents act as transport agencies that use their trucks to deliver packages to a certain city. Agents work in a particular geographic area, so they will have to interact with other agents in order to deliver packages outside their area. The example includes two heterogeneous agents, Ag1 and Ag2, each having its own representation of the objects and actions. Ag1's model contains the objects Truck, Package, City and Agent, which represent the objects of the environment accurately. Ag2, however, manages a less detailed representation of the objects. Its model contains the objects Package, Agent, Delivery and Area. Area refers to the geographical areas in which the cities are placed, and Delivery is a representation of a set of trucks, each of them carrying one or more packages, i.e. each Delivery is a multiple grouping that includes n trucks and n packages. Regarding the operators, Ag1 has the actions Load, to load a package into a truck; Unload, to unload a package from a truck; and Drive, to drive a truck from one city to another. Ag2's model also includes three actions: Collect, to start the transportation of a delivery; Deliver, to hand over a delivery; and Move, to move a delivery from one geographical area to another.
From this initial situation, the modeling techniques have been applied to build a common planning model. The resulting model shares the same objects with Ag2, since its model offers the simplest representation of the objects in the environment. The following list shows the techniques applied to Ag1's model, excluding the direct mappings:
• City → Area - Simple grouping: Each Area object constitutes a simple grouping that encloses n different City objects.
• Delivery - Multiple grouping: As previously defined, a Delivery object is a multiple grouping that includes n trucks and n packages.
• Truck - Detail reduction: Since the Delivery object includes the notion of truck, and the Truck object is not shared by both agents, it is not necessary to include it in the common model.
The common-model operators also coincide with Ag2's. The following list summarizes the techniques applied to Ag1's model:
• Load → Collect - Direct mapping: Collect is a direct mapping of Ag1's Load action. The only differences between them lie in their parameters.
• Unload → Deliver - Direct mapping: Deliver is a direct mapping of Ag1's Unload action. Variations stem only from the parameters of each operator.
• Drive → Move - Hierarchical network: A Move action can imply driving a truck to a city, unloading it, and loading the cargo into another truck. It is possible to repeat that pattern n times until the destination of the action is reached. Hence, the action can be seen as a hierarchical network.
To illustrate the use of the :mapping section, let us focus on Ag1's model, since Ag2's :mapping section is more straightforward, given that all the objects in the common model have been directly mapped from its local model. Ag1 works in the area aSpain, which includes the City objects Madrid, Barcelona and Valencia.
Let us also consider the common-model predicates (at ?d - Delivery ?a - Area), which states that a delivery ?d is placed at an area ?a, and (in ?p - Package ?d - Delivery), which indicates that a package ?p is included in a delivery ?d. Ag1's local versions of these predicates state that a package ?p is in a truck ?t and that a truck ?t is placed at a city ?c. The mapping section of Ag1's model is defined as follows:

(:mapping
  (implies (and (at ?d - delivery aSpain)
                (in ?p - package ?d))
           (and (in ?p ?t - truck)
                (or (at ?t Madrid) (at ?t Barcelona) (at ?t Valencia)))))

The section states that having a package in a delivery placed at the geographical area aSpain implies that the package is in a truck located at Madrid, Barcelona or Valencia. This way, common-model information about deliveries and areas can be translated in terms of Ag1's local objects, like cities and trucks.
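Under the :mapping above, the double implication can be sketched in Python (the helper names are our own; only the city names and predicate shapes come from the example):

```python
SPANISH_CITIES = ("Madrid", "Barcelona", "Valencia")  # cities grouped into aSpain

def common_to_local(delivery, package, truck):
    # (at ?d aSpain) & (in ?p ?d)  =>  (in ?p ?t) & (or (at ?t <city>) ...)
    return [("in", package, truck),
            ("or",) + tuple(("at", truck, c) for c in SPANISH_CITIES)]

def local_to_common(package, truck, city, delivery):
    # The implication is bidirectional: a truck in any of the grouped
    # cities places its delivery at the area aSpain.
    if city in SPANISH_CITIES:
        return [("at", delivery, "aSpain"), ("in", package, delivery)]
    return []

print(local_to_common("p1", "t1", "Valencia", "d1"))
# → [('at', 'd1', 'aSpain'), ('in', 'p1', 'd1')]
```

A truck located outside the grouped cities produces no common-model literals, which mirrors the pattern-matching behaviour of the implies sentence.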
6
Conclusions
This paper presents a mechanism to achieve a common agreement discourse universe among heterogeneous agents that do not share a common ontology. The problem of finding a common discourse universe constitutes a major issue in the context of a MAP system in which agents coordinate through negotiation. To tackle this issue, a PDDL-based information model, built coherently on top of the individual models of the agents, is defined. This new model introduces a novel approach for handling information in MAP: each agent now handles a two-layered information model that allows it to negotiate with the rest of the participants while maintaining its original model. The model is built through a set of design techniques, the aim of which is to create a model that includes the simplest description of the objects and operators among the local models, thus ensuring coherence between them. Hence, the common model will contain the simplest representations of the objects and the operators among the local models. Coherence is a key aspect of the defined method. The common model's objects and operators must have a correspondence with the ones included in the local models. Although the modeling techniques assure this condition by themselves, some extensions have been introduced into our planning language to establish a mapping between both models, which allows the agents to translate information between both layers. The exposed method allows the agents to use a common ontology while preserving their original models. The main drawback of the method lies in the fact that the modeling of the common model is currently performed by hand, requiring the intervention of an expert. Hence, future work will focus on fully or partially automating the process.

Acknowledgments. This work has been supported by the Spanish MICINN under projects TIN2008-06701-C03-03 and Consolider Ingenio 2010 CSD2007-00022, and the Valencian Prometeo Project 2008/051.
Integrating Information Extraction Agents into a Tourism Recommender System
Sergio Esparcia, Víctor Sánchez-Anguix, Estefanía Argente, Ana García-Fornes, and Vicente Julián
Grupo de Tecnología Informática - Inteligencia Artificial
Departamento de Sistemas Informáticos y Computación
Universidad Politécnica de Valencia
Camino de Vera, s/n, 46022 - Valencia, Spain
{sesparcia,sanguix,eargente,agarcia,vinglada}@dsic.upv.es
Abstract. Recommender systems face some problems. On the one hand, information needs to be kept up to date, which can be a costly task if it is not performed automatically. On the other hand, it may be interesting to include third-party services in the recommendation, since they improve its quality. In this paper, we present an add-on for the Social-Net Tourism Recommender System that uses information extraction and natural language processing techniques to automatically extract and classify information from the Web. Its goal is to keep the system updated and to obtain information about third-party services that are not offered by service providers inside the system. Keywords: recommender systems, information agents.
1
Introduction
Over the last few years, the Web has become the greatest source of available information. A goal for researchers is to find optimal ways of recovering specific information from the Web. One of the most important information filtering techniques is recommender systems, whose goal is to present information items that could be interesting to the user. Recommender systems attempt to reduce information overload by selecting subsets of items based on user preferences. More specifically, their aim is to offer new services that are adapted to the personal preferences of their users. One of the industries where recommender systems have been applied with success is the tourism industry [1,2]. Most of these systems present tour plans according to the personal preferences of the tourists. There are two main approaches to recommender systems: content-based and collaborative filtering. On the one hand, content-based algorithms use item content, such as the name and other features. TripleHop Technologies' TripMatcher [1], DIETORECS [3,4], and Vacation Coach's Me-Print are three examples of e-commerce implementations that use content-based algorithms. In this kind of system, recommendations depend on the item description. On the other hand,
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 193–200, 2010. © Springer-Verlag Berlin Heidelberg 2010
194
S. Esparcia et al.
collaborative recommender systems use algorithms that give recommendations based on user behavior (other users' experiences and opinions). Some examples can be found in [2,5,6]. Collaborative filtering is currently the most widely used technique in recommender systems. One example of a collaborative recommender system is the Social-Net Tourism Recommender System (STRS) [7], a tourism application that helps tourists make their visits to a city more profitable and adapted to their preferences. It uses mobile devices, such as a phone or a PDA, allowing tourists to make reservations in restaurants or cinemas, and providing a tour plan for a day. STRS is based on multi-agent system technology and employs social networks as a mechanism to give recommendations.
However, some problems arise in STRS and most tourism recommender systems: keeping business information up to date is a costly task, and they do not offer information about third-party services that could enhance recommendations. For instance, a tourism system where no user offers information about films could lose the opportunity to satisfy users that are highly interested in cinema. Even though third parties are not part of the system, it should be noted that the final goal of the system is to satisfy tourists and, consequently, increase the benefits of those parties that are part of the system. Both types of information may be available on the Web. An automated update mechanism is convenient in order to cope with the dynamic content that can be found on the Web.
In this work, an add-on for the STRS that retrieves information from third parties and tourist service providers is presented. It is based on information agent technologies and voting processes that allow information to be accurately extracted and classified. The remainder of this paper is organized as follows. Section 2 provides an overview of the STRS architecture.
Section 3 describes the add-on: the information extraction and information classification agents, and the extended architecture. Section 4 presents the experiments used to test the system, showing the classification accuracy of the system and an experiment in which a simple update rule for the voting power is used. Finally, Section 5 presents some conclusions.
2
The Social-Net Tourism Recommender System Architecture
The Social-Net Tourism Recommender System (STRS) [7] is a tourism application that offers different services to tourists. Its goal is to improve their stay in a city, spending their time in the most efficient way, by means of the generation of tour plans. Tourists can find two kinds of information, according to their interest: places of personal interest (restaurants, cinemas, museums, theaters...) and places of general interest (monuments, churches, beaches, parks...). Users can set their preferences using a mobile phone or a PDA. Tourists can make a reservation in a restaurant, buy tickets for a film or a concert, and so forth.
Integrating IE Agents into a Tourism Recommender System
195
STRS integrates multi-agent technology and a recommender system based on social network analysis. It uses social networks to model communities of users, trying to identify the relations among them and to identify similar users that help to recommend items. STRS is formed by two subsystems that cooperate to provide comprehensive and accurate tourism recommendations: the Multi-Agent Tourism System (MATS) and the Multi-Agent Social Recommender System (MASR):
– MASR is formed by four types of agents: (i) user agent: an interface between the user and the MASR; (ii) data agent: responsible for managing a database with users' data; (iii) recommender agent: receives all recommendation and user registration queries; and (iv) social-net agent: adds a node to the social network when a user joins the platform and determines the new user's similarity with regard to the other profiles.
– MATS is also composed of four agents: (i) broker agent: in charge of establishing communication between the user and sight agents; (ii) sight agent: manages all the information regarding the characteristics and activities of a specific place of interest in the city; (iii) user agent: allows tourists to use the different services by means of a GUI on their mobile devices; and (iv) plan agent: establishes and manages the whole planning process offered by the system, taking preferences and searches into account.
As stated above, one of the main problems of this proposal is to keep information up to date and to integrate information that could be interesting for tourists but is located outside our system. Therefore, the next section presents an add-on for the STRS that is capable of keeping the information updated and of introducing new information that is not supplied by any of the system's service providers.
3
The Information Agents Add-on in the STRS
The proposed add-on is based on information agents that use natural language processing techniques to retrieve information from the Web. More specifically, there are two different types of main agents that collaborate to accomplish this task: information extraction agents (IE agents) and information classification agents (IC agents). The first extract the information from the Web sources, whereas the second classify the extracted information according to its service category (e.g., concerts, cinema, theater plays, etc.). Additionally, there are two types of auxiliary agents that help IC agents: contact agents and a trusted mediator. In the following subsections, we describe both types of main agents and their integration with the STRS. 3.1
Information Extraction Agents
As stated before, IE agents extract information from the required websites. According to classical information agent classifications, they can be
categorized as wrapper agents [8,9]. They look for specific text patterns that point to information relevant to the STRS, such as service name, address, price, time, duration, etc. It must be noted that one IE agent is required per website that needs to be analyzed. The general steps that our IE agents carry out to obtain the required information are: (i) send an HTTP request to the target website and wait for its response; (ii) analyze the HTML code of the response; (iii) look for specific patterns in the HTML code that point to the desired information; (iv) extract and process the information; (v) send the service description to the IC agents. 3.2
Information Classification Agents
Since the service information extracted from the Web may not be explicitly categorized, it is necessary to provide an additional mechanism that performs this categorization task. Each IC agent is specialized in scoring one specific event category (e.g., one agent specializes in concerts, another agent specializes in theater plays, etc.). This classification is carried out by means of matching rules based on Natural Language Processing (NLP) knowledge that are applied to the received service descriptions. We propose two different types of rules for IC agents:
1. Term Strength rules: Term Strength (TS) [10,11] is a measure of how relevant a word/lemma is with respect to a specific category. The TS of words with respect to a specific category is precalculated using a corpus. We propose the following mechanism for TS rules: TS rules look for lemmas whose TS value has been precalculated during the training phase. If a match is found, the matched TS rule r_j produces a score vote SC_TS equal to the precalculated TS of the word.
2. Hyperonym rules: These rules are based on hyperonym trees found in WordNet [12]. Hyperonym trees represent the semantic relation between a specific word (root) and more general related concepts (intermediate nodes). Each branch represents a different sense, ordered by frequency (the first branch represents the most common sense). We propose a rule based on the matching of specific patterns (usually provided by an expert) in hyperonym trees. If the pattern contained in rule r_j is found, then the rule produces a score vote SC_H that is equal to:

SC_H(w_i) = \frac{|S(w_i)| - (i - 1)}{\sum_{k=1}^{|S(w_i)|} k} \qquad (1)

where w_i is the word/lemma analyzed, |S(w_i)| is the number of senses of w_i, and i is the index of the sense where the pattern was found. This way, less common senses score lower than the most frequent senses. The final score vote associated with a word w_i is equal to the score of the matching rule, if any, that produced the maximum score vote for w_i. Therefore, the final
score vote for a service description is equal to the sum of the final scores produced by the words that are part of the description. Each IC agent has its own set of rules that is specialized in scoring a specific service category. Ideally, an agent should produce high scores for descriptions that belong to its expertise category and low scores for other categories. IC agents form a mediated agent organization, which has two advantages: IE agents only need to know the contact agents of the organization, and mediators can govern the classification process. The contact agents offer the classification service, which is invoked by IE agents. The service call requires a service description to be provided by the IE agent. This service description is broadcast by the contact agents to all of the IC agents and a trusted mediator. The trusted mediator starts a voting process in which every IC agent must emit a vote that reflects whether it believes that the service description can be categorized into its expertise area. After all of the IC agents have voted, the trusted mediator assigns to the service description a service category equal to the expertise area of the agent whose vote scored highest. This expertise area is sent back by the contact agent to the service invoker. The mediator can regulate the voting process by adjusting the voting power (vp_{a_i}) of each IC agent according to past experiences. This voting process can be formalized as follows:

Category(W) = Expertise\big(\arg\max_{a_i \in ICS} vp_{a_i} \cdot SC_{a_i}(W)\big) \qquad (2)
where ICS is the set of IC agents and vp_{a_i} is the voting power that the mediator grants to agent a_i. 3.3
Integrating the Add-on in the STRS
The integration of the add-on in the STRS, and the way it works, can be summarized as follows:
1. An IE agent extracts a service description from the Web.
2. This agent requests the classification service from the IC organization. The argument of the call is the service description previously extracted.
3. The contact agent of the organization receives the service call and broadcasts it to all of the IC agents and the trusted mediator.
4. The trusted mediator starts the voting process and waits for the votes. IC agents send their votes and their expertise categories to the trusted mediator.
5. The mediator decides which category to assign to the service description, based on the highest score and each agent's voting power.
6. The classification result is sent back to the invoking IE agent.
7. This IE agent sends a message with the extracted information and its associated category to the corresponding sight agent in the STRS. Sight agents manage all the information regarding the characteristics and activities of a specific place of interest in the city.
The complete architecture of the Social-Net Tourism Recommender System and the designed add-on can be found in Fig. 1. It shows the whole process of information extraction and information classification, and its integration in the STRS system.
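The mediated vote of Eq. (2) and steps 2–6 above can be illustrated with a minimal sketch. The agent names and score functions below are invented for illustration; the real IC agents use the TS and hyperonym rule sets of Section 3.2:

```python
# Hypothetical sketch of the mediated classification (Eq. 2): each IC agent
# scores the description for its own expertise category, the mediator weights
# each score by the agent's voting power, and the winning agent's expertise
# area becomes the assigned category.
def classify(description, ic_agents):
    """ic_agents: list of (expertise, voting_power, score_fn) triples."""
    votes = [(vp * score(description), expertise)
             for expertise, vp, score in ic_agents]
    return max(votes)[1]  # Expertise(argmax vp_ai * SC_ai(W))

# Toy score functions standing in for the TS/hyperonym rule sets.
ic_agents = [
    ("concert",    1.0, lambda w: 2.0 if "band" in w else 0.1),
    ("theater",    1.0, lambda w: 2.0 if "play" in w else 0.1),
    ("exhibition", 1.0, lambda w: 2.0 if "gallery" in w else 0.1),
]
# Steps 1-2: an IE agent extracted this description and requests classification.
print(classify("rock band live at the riverside stage", ic_agents))
```

In the real system the scores come from the rule sets and the voting powers are adjusted by the mediator over time, but the decision rule is the same weighted argmax.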
Fig. 1. The architecture of the STRS system and its add-on
4
Experiments
The first experiment consisted in testing the classification accuracy of the IC agents. Three service categories were employed: concerts, exhibitions, and theater plays, with one IC agent per category. Three different rule sets were built using the information of a balanced corpus of 600 service descriptions (70% training, 30% test) and information provided by experts. The voting power of each agent was fixed to vp_{a_i} = 1 and remained static during the whole process. The proposed method was compared with a baseline that only used the knowledge provided by Term Strength. The results can be found in Fig. 2.b. It can be observed that the proposed method achieves a lower classification error than the Term Strength method. This improvement is obtained thanks to the combined use of Term Strength rules (statistical knowledge) and hyperonym rules (expert knowledge).
The second experiment aimed to show how the trusted mediator can govern the voting process carried out to classify event descriptions. The three agents used in the first experiment were also used here (music agent, theater agent, exhibition agent). Additionally, three malicious (badly designed) agents representing the music, theater, and exhibition categories were also used. These agents generate high scores with high probability regardless of the true category of the service description, so they usually introduce error into the classification service. The mediator updated the voting power of each agent every 10 service calls. It applied a decay on the voting power vp_{a_i} based on the behavior of the agent over the past 10 service calls. The decay formula can be formalized as follows:

vp_{a_i}^{t+1} = vp_{a_i}^{t} - \frac{FP_{a_i}}{|N_{other}|} + \frac{TP_{a_i}}{|N|} \qquad (3)
Integrating IE Agents into a Tourism Recommender System
199
Fig. 2. a) Evolution of the agents' voting power for the second experiment (voting power vs. number of service calls, for MusicAgent, TheaterAgent, ExhibitionAgent and the three badly designed agents). b) Results for the first experiment (classification error): Proposed method — 11.79% training, 11.11% test; Term Strength — 17.65% training, 16.67% test.

where vp_{a_i}^{t+1} is the new voting power, vp_{a_i}^{t} is the voting power of agent a_i at the last check, FP_{a_i} is the number of times the system decision was given by agent a_i and the correct service category was not the one a_i represents, TP_{a_i} is the number of times the system decision was given by agent a_i and the correct service category was the one a_i represents, |N| is the total number of service calls (10 in this case), and |N_{other}| is the total number of service calls whose associated service category is not the one agent a_i represents.
The experiment was run for 100 random service calls and its results can be observed in Fig. 2.a, which shows the evolution of the agents' voting power. As the number of service calls increases, the voting power of the malicious agents is nullified, whereas the voting power of the other agents remains almost intact. Consequently, the mediator was capable of regulating the voting process in order to provide a better classification service.
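The mediator's decay update can be sketched as follows. The printed formula is garbled in this copy, so the sign convention here is an assumption chosen to match the described behavior (false positives decay an agent's power, true positives restore it), and the clamp at zero is likewise an assumption suggested by the "nullified" voting powers in Fig. 2.a:

```python
# Sketch of the voting-power decay update (Eq. 3). Sign convention and the
# clamp at 0 are assumptions inferred from the experiment's description, not
# a verified transcription of the published formula.
def update_voting_power(vp, tp, fp, n_total, n_other):
    """vp: current voting power; tp/fp: true/false positive counts in the
    window; n_total = |N| service calls in the window; n_other = |N_other|
    calls whose true category is another agent's expertise."""
    return max(0.0, vp - fp / n_other + tp / n_total)

# A consistently wrong (malicious) agent loses its power over a few windows:
vp = 1.0
for _ in range(5):
    vp = update_voting_power(vp, tp=0, fp=3, n_total=10, n_other=8)
print(round(vp, 3))
```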
5
Conclusions
In this work, an add-on for the Social-Net Tourism Recommender System (STRS) has been presented. The recommender system is based on multi-agent technology and uses social networks to make its recommendations. The add-on makes it possible to keep and retrieve information about third-party services and about the providers of the system. Information agents that employ natural language processing techniques are used to extract and classify the Web information that the add-on requires for the STRS. Additionally, mediated voting processes are employed to classify the extracted information into different service categories. Experiments have been carried out that show the classification accuracy of the extracted information and how badly designed agents can be neutralized by means of mediated voting processes.
Acknowledgments. This work is supported by the TIN2009-13839-C03-01, TIN2008-04446 and PROMETEO/2008/051 projects of the Spanish government, CONSOLIDER-INGENIO 2010 under grant CSD2007-00022, and FPU grant AP2008-00600 awarded to V. Sánchez-Anguix.
References
1. Ricci, F., Werthner, H.: Case-based querying for travel planning recommendation. Information Technology and Tourism 4(3-4), 215–226 (2002)
2. Loh, S., Lorenzi, F., Saldana, R., Litchnow, D.: A tourism recommender system based on collaboration and text analysis. Information Technology and Tourism 6, 157–165
3. Fesenmaier, D., Ricci, F., Schaumlechner, E., Wober, K., Zanella, C.: Dietorecs: Travel advisory for multiple decision styles. In: Proc. of ENTER 2003, pp. 232–242 (2003)
4. Herlocker, J., Konstan, J.A.: Content-independent task-focused recommendation. IEEE Internet Comput. 5, 40–47 (2001)
5. Rudstrom, A., Fagerberg, P.: Socially enhanced travel booking: a case study. Information Technology and Tourism 6(3)
6. Sebastia, L., Garcia, I., Onaindia, E., Guzman, C.: E-tourism: A tourist recommendation and planning adaptation. Int. J. Artif. Intell. Tools 18(5), 717–738 (2009)
7. Lopez, J.S., Bustos, F.A., Julian, V., Rebollo, M.: Developing a multiagent recommender system: A case study in tourism industry. International Transactions on Systems Science and Applications 4, 206–212 (2008)
8. Flesca, S., Manco, G., Masciari, E., Rende, E., Tagarelli, A.: Web wrapper induction: a brief survey. AI Commun. 17(2), 57–61 (2004)
9. Kushmerick, N., Thomas, B.: Adaptive information extraction: Core technologies for information agents. In: Klusch, M., Bergamaschi, S., Edwards, P., Petta, P. (eds.) Intelligent Information Agents. LNCS (LNAI), vol. 2586, pp. 79–103. Springer, Heidelberg (2003)
10. Wilbur, W.J., Sirotkin, K.: The automatic identification of stop words. J. Inf. Sci. 18(1), 45–55 (1992)
11. Yang, Y.: Noise reduction in a statistical approach to text categorization. In: Proc. of SIGIR 1995, pp. 256–263 (1995)
12. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11) (1995)
Adaptive Hybrid Immune Detector Maturation Algorithm
Jungan Chen, Wenxin Chen, and Feng Liang
Electronic Information Department, Zhejiang Wanli University
No. 8 South Qian Hu Road, Ningbo, Zhejiang, 315100, China
[email protected], [email protected], [email protected]
Abstract. In this work, a novel Adaptive Hybrid Immune Detector Maturation Algorithm is proposed for anomaly detection. T-detector Maturation Algorithm and Dynamic Negative Selection Algorithm are combined with a new state transformation model. Experiment results show that the proposed algorithm solves the population-adapt problem and can generate detectors with higher affinity. Keywords: artificial immune system, anomaly detection, adapt problem.
1 Introduction
Nowadays, Artificial Immune Systems (AIS) are used to construct algorithms based on negative selection, immune network models, or clonal selection [1][2][3]. They are applied in many areas, such as anomaly detection, classification, learning, and control [4][5][6]. The Negative Selection Algorithm (NSA) was first proposed to generate detectors for anomaly detection [1]. In NSA, the match rule is one of the most important components; it is used to decide whether two strings match. Many match rules have been proposed [7][8][9]. But no matter what kind of match rule is used, the match threshold (r) is constant and cannot adapt to changes in the self data; this is called the self-adapt problem. Inspired by the T-cell maturation process, a match range model was proposed to solve the self-adapt problem, and the T-detector Maturation Algorithm (TMA) was put forward [10][11][12]. Besides the self-adapt problem, NSA cannot generate dynamic detectors that vary with the nonselves; this is called the nonself-adapt problem. Inspired by the affinity maturation process, the Dynamic Negative Selection Algorithm Based on Affinity Maturation (DNSA-AM) was proposed to solve the nonself-adapt problem [13]. As NSA is used to delete detectors that detect any self, DNSA-AM still has the self-adapt problem. The Dynamic Negative Selection Algorithm Based on the Match Range Model (DNSA-MRM) was proposed to solve this problem [14]. But DNSA-MRM must set the size of the detector population (PSize) as a parameter, which cannot adapt to changes in the nonselves; this is called the population-adapt problem. In this work, a novel algorithm called the Adaptive Hybrid Immune Detector Maturation Algorithm (AHIDMA) is proposed to solve the population-adapt problem.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 201–208, 2010. © Springer-Verlag Berlin Heidelberg 2010
202
J. Chen, W. Chen, and F. Liang
As reference [15] mentioned, 'Generalized lymphocytes will not be fast enough for detecting specific pathogens... The immune system incorporates mechanisms that enable lymphocytes to learn the structures of specific foreign proteins; essentially, the immune system evolves and reproduces lymphocytes that have high affinities for specific pathogens.' The mechanism mentioned is affinity maturation, which enables the immune system to detect antigens faster than generalized lymphocytes can. AHIDMA combines TMA with affinity maturation (DNSA). The detectors generated by TMA are like the generalized lymphocytes, and the detectors generated by affinity maturation are the specialized lymphocytes, so there is a balance between generality and specialty. Furthermore, a new state transformation model of antigens and detectors is proposed to control the population size. Based on this state transformation model, the population-adapt problem is easily solved. In a word, all three adapt problems are solved by the proposed algorithm.
2 Algorithm
2.1 The State Transformation Model
In AIS, the normal set is defined as the selves and the anomaly set as the nonselves. An antigen is a suspect network activity, and a detector is used to detect anomalies (nonselves). U = {0,1}^n, where n is the length of the binary string that is the gene expression of an antigen or detector. selves ∪ nonselves = U and selves ∩ nonselves = ∅. In a match rule, the affinity is the distance between an antigen and a detector. Given two binary strings AgBin = g_1 g_2 … g_n and AbBin = b_1 b_2 … b_n, the Hamming distance between AgBin and AbBin is:
Fig. 1. Antigen and detector's transformation model. Antigen states: new → suspect → self/nonself; detector states: new → highest → maturation → die. Antigen transitions: (1) the antigen cannot be detected by existing maturation detectors; (2) the antigen has not been detected over a specified number of generations; (3,4) the antigen is detected by a new or maturation detector. Detector transitions: (1) the detector has a higher affinity with an antigen; (2) the detector detected a nonself antigen; (3,4) the detector has low affinity with all antigens and cannot detect any antigen.
Adaptive Hybrid Immune Detector Maturation Algorithm
203
d(AgBin, AbBin) = \sum_{i=1}^{n} g_i \oplus b_i \qquad (1)
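Eq. (1) can be transcribed directly; this sketch assumes antigens and detectors are encoded as equal-length Python strings of '0'/'1' characters:

```python
# Hamming distance of Eq. (1): count the positions where the two equal-length
# bit strings differ (g_i XOR b_i summed over i).
def hamming(ag_bin, ab_bin):
    return sum(g != b for g, b in zip(ag_bin, ab_bin))

print(hamming("10110", "11100"))
```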
In this model, the set of antigens is defined as AGS and the set of detectors as DCTS. Each antigen Ag ∈ AGS and each detector dct ∈ DCTS is represented by its binary string (AgBin and AbBin, respectively) together with its state and the bookkeeping fields used below (undetectedCount for antigens; selfmin and selfmax for detectors).
2.2 Implementation of the Model
In one detector, selfmax and selfmin are calculated by setMatchRange(dct, selves), with k ∈ [1, |selves|] and self_k ∈ selves. In equation 2, [selfmin, selfmax] is defined as the self area; everything else is the nonself area. Suppose there is a binary string x ∈ U and a detector dct ∈ DCTS. When d(x, dct.AbBin) ∉ [dct.selfmin, dct.selfmax], x is detected as an anomaly. This is called the Range Match Rule (RMR) [11].

setMatchRange: \quad selfmin = \min\{d(self_k, dct.AbBin)\}, \quad selfmax = \max\{d(self_k, dct.AbBin)\} \qquad (2)
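Eq. (2) and the Range Match Rule can be sketched together; representing selves and detectors as plain bit strings is an assumption for illustration:

```python
# setMatchRange (Eq. 2): the detector's self area is the interval spanned by
# its Hamming distances to all self strings. The Range Match Rule then flags
# x as anomalous when d(x, dct) falls outside that interval.
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def set_match_range(ab_bin, selves):
    dists = [hamming(s, ab_bin) for s in selves]
    return min(dists), max(dists)  # (selfmin, selfmax)

def rmr_detects(x, ab_bin, selfmin, selfmax):
    return not (selfmin <= hamming(x, ab_bin) <= selfmax)

selves = ["0000", "0001", "0011"]
lo, hi = set_match_range("0000", selves)    # distances 0, 1, 2 -> (0, 2)
print(rmr_detects("1111", "0000", lo, hi))  # distance 4 lies outside [0, 2]
```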
M = |AGS|, \quad N = |DCTS|, \quad i \in [1, M], \quad j \in [1, N] \qquad (3)
In equation 3, M and N are the sizes of the antigen and detector sets. The value i is the index of an antigen in AGS and j is the index of a detector in DCTS. The Hamming distance between antigen i and detector j is calculated as follows:
Ag_i.d_j = dct_j.d_i = d_{ij} = d(Ag_i.AgBin, dct_j.AbBin) \qquad (4)
harmmax is the maximum distance of one antigen to all detectors, or of one detector to all antigens. The harmmax of Ag and dct is calculated by equations 5 and 6. In equation 5, the value x is the index of the detector that has the highest distance to antigen i.
Ag_i.harmmax = d_{i,x} = \max(d_{i*}), \quad d_{i*} = \{d_{i1}, d_{i2}, \ldots, d_{iN}\} \qquad (5)

dct_j.harmmax = \max(d_{*j}), \quad d_{*j} = \{d_{1j}, d_{2j}, \ldots, d_{Mj}\} \qquad (6)

harmBitNum_{ij} = \max(d_{ij} - dct_j.selfmax,\; dct_j.selfmin - d_{ij}) \qquad (7)

harmBitNum_{i*} = \{harmBitNum_{i1}, harmBitNum_{i2}, \ldots, harmBitNum_{iN}\} \qquad (8)

Ag_i.harmbitnummax = harmBitNum_{i,y} = \max(harmBitNum_{i*}) \qquad (9)
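Equations (7)–(9) can be sketched as follows; field names are flattened into plain arguments, which is an illustration choice rather than the paper's data layout:

```python
# harmBitNum (Eq. 7): how far a distance d_ij falls outside detector j's self
# area. A positive value means the antigen lies in the detector's detect
# range; larger values mean a larger margin.
def harm_bit_num(d_ij, selfmin, selfmax):
    return max(d_ij - selfmax, selfmin - d_ij)

def harm_bit_num_max(distances, ranges):
    """Eqs. (8)-(9): distances d_i1..d_iN for one antigen, one (selfmin,
    selfmax) pair per detector. Returns (y, harmbitnummax) where y is the
    index of the detector with the highest harmBitNum."""
    scores = [harm_bit_num(d, lo, hi) for d, (lo, hi) in zip(distances, ranges)]
    y = max(range(len(scores)), key=scores.__getitem__)
    return y, scores[y]

print(harm_bit_num_max([3, 6, 1], [(0, 2), (1, 4), (0, 5)]))
```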
In this work, the parameter harmBitNum is proposed in equation 7. If harmBitNum is at least 1, the distance d is bigger than selfmax or smaller than selfmin, i.e., d ∉ [dct.selfmin, dct.selfmax], and x is detected as an anomaly according to the Range Match Rule. The bigger harmBitNum is, the bigger the detect range of dct. harmbitnummax is calculated by equations 7~9; the value y is the index of the detector that has the highest harmBitNum with antigen i.

\text{if } dct_x.d_i \in [dct_x.selfmin, dct_x.selfmax]: \quad Ag_i.state = \text{'suspect'},\; Ag_i.undetectedCount = Ag_i.undetectedCount + 1,\; dct_x.state = \text{'highest'} \qquad (10)

\text{if } Ag_i.undetectedCount > maxundetectedCount: \quad Ag_i.state = \text{'self'}
According to equation 5, detector x has the highest distance to antigen i. In equation 10, if antigen i cannot be detected by detector x, antigen i is taken as a suspect antigen, which means that antigen i must be detected again over many generations until it is taken as a self antigen or a nonself antigen. If antigen i has not been detected after maxundetectedCount generations, it is changed to a 'self' antigen. Furthermore, if detector x cannot detect antigen i, detector x is changed to a 'highest' detector.
\text{if } dct_y.d_i \notin [dct_y.selfmin, dct_y.selfmax]: \quad Ag_i.state = \text{'nonself'},\; dct_y.state = \text{'maturation'} \qquad (11)
According to equation 9, detector y has the highest harmbitnum with antigen i. In equation 11, if detector y detects antigen i, antigen i is changed to a nonself antigen and detector y is changed to a maturated detector. Because a detector with a bigger harmBitNum has a larger detect range, this mechanism is used to ensure the balance of generality and specialty.
2.3 The Detection Process
The proposed algorithm combines TMA with affinity maturation, so it has two detect processes. Some variables are defined in equations 12~15.
Adaptive Hybrid Immune Detector Maturation Algorithm
205
\forall Ag_{new} \in AGS_{new} \subseteq AGS, \quad Ag_{new}.state = \text{'new'} \qquad (12)

\forall Ag_{suspect} \in AGS_{suspect} \subseteq AGS, \quad Ag_{suspect}.state = \text{'suspect'} \qquad (13)

\forall dct_{highest} \in DCTS_{highest} \subseteq DCTS, \quad dct_{highest}.state = \text{'highest'} \qquad (14)

\forall dct_{maturation} \in DCTS_{maturation} \subseteq DCTS, \quad dct_{maturation}.state = \text{'maturation'} \qquad (15)
1. AGS_new is first detected by DCTS_maturation; this is called TMADetect. After this process, each antigen in AGS_new becomes either a suspect or a nonself antigen. If an antigen cannot be detected by the detectors, highest detectors will be generated.
2. AGS_suspect is detected by DCTS_highest; this is called AMDetect. In this process, new detectors are generated from the highest detectors through affinity maturation. New detectors are randomly generated, their number being |AGS_suspect|. Each detector in DCTS_highest reproduces child detectors: the higher the affinity of a detector, the higher the number of clones generated [3]. The total number of new detectors generated is the value of harmmax. There is no crossover operator, only a mutation operator, according to the hypermutation principle: the higher the affinity, the smaller the mutation rate [3]. The mutation rate is (length of dct.AbBin − dct.harmmax)/2.
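The clone-and-mutate step of AMDetect can be sketched as follows. Interpreting harmmax as the clone count and "(length − harmmax)/2" as a number of bits to flip are assumptions, since the paper does not state the units:

```python
import random

# Sketch of affinity-maturation cloning: clone counts grow with affinity
# (harmmax) while the mutation strength shrinks with it, per the
# hypermutation principle. Both interpretations are assumptions.
def mutate(ab_bin, n_flips, rng):
    bits = list(ab_bin)
    for i in rng.sample(range(len(bits)), n_flips):
        bits[i] = "1" if bits[i] == "0" else "0"
    return "".join(bits)

def clone_and_mutate(ab_bin, harmmax, rng):
    n_clones = harmmax                       # higher affinity -> more clones
    n_flips = (len(ab_bin) - harmmax) // 2   # higher affinity -> fewer flips
    return [mutate(ab_bin, n_flips, rng) for _ in range(n_clones)]

rng = random.Random(0)
clones = clone_and_mutate("10110100", harmmax=6, rng=rng)
print(len(clones), all(len(c) == 8 for c in clones))
```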
3 Experiments
The objective of the experiments is to: (1) verify the detection results and whether the size of the detector population is adaptive; (2) investigate the effect of affinity maturation and of self-antigen detection. Experiments are carried out using the famous benchmark Fisher's Iris data and the Wisconsin Diagnostic Breast Cancer (WDBC) data [5], listed in Table 1. A minimal entropy discretization algorithm is used to discretize these data sets [16]. The algorithm is run ten times, with max generation maxg = 1000. To verify the adaptive character, nonself data are changed every 50 generations, with maxundetectedCount = maxg. The Iris data set has 4 attributes and 150 examples in three classes, 'Setosa', 'Versicolour', and 'Virginica', each with 50 examples. One of the three types of iris is considered normal data; the other two are considered anomalies and are injected into the algorithm in turn and repeatedly. The WDBC data set has 30 attributes and 569 examples in two classes: 'benign' (357 examples) and 'malignant' (212 examples). 'Malignant' is defined as the self set and 'benign' as the nonself set; 30 nonself data are injected every 50 generations, repeatedly. To investigate the effect of self-antigen detection, the Iris data are used with 'Versicolour' as normal data and the others as anomalies: 50 nonself data are injected every 50 generations and 5 self data are injected in the first generation, with maxundetectedCount = 100. The detection results are shown in Table 1: no nonself antigen goes undetected, and the 5 self antigens are detected successfully.
J. Chen, W. Chen, and F. Liang

Table 1. Data set used in experiment and results
| Data set | Self data  | Size of selves | Self antigens | Size of nonself antigens | maxundetectedCount | Size of valid detectors | Antigens not detected | Self antigens detected |
|----------|------------|----------------|---------------|--------------------------|--------------------|-------------------------|-----------------------|------------------------|
| Iris     | Setosa     | 50             | 0             | 100                      | 1000               | 6.9                     | 0                     | 0                      |
| Iris     | Virginica  | 50             | 0             | 100                      | 1000               | 18.3                    | 0                     | 0                      |
| Iris     | Versicolor | 50             | 0             | 100                      | 1000               | 11.2                    | 0                     | 0                      |
| WDBC     | Malignant  | 212            | 0             | 357                      | 1000               | 52.4                    | 0                     | 0                      |
| Iris     | Versicolor | 50             | 5             | 100                      | 100                | 11.5                    | 0                     | 5                      |
3.1 Population-Adapt

In Fig. 1, the first subfigure shows the number of antigens changing every 50 generations. No matter how the antigens change, few antigens remain undetected (second subfigure), thanks to the adaptive population of AHIDMA shown in the fourth subfigure.
Fig. 1. Results of the algorithm using WDBC data
3.2 Self-Antigen Detection and the Effect of Affinity Maturation

In this experiment, 50 nonself data are injected every 50 generations and 5 self data are injected in the first generation. In Fig. 2, the first subfigure presents the variation curve of the number of undetected antigens. Stimulated by the undetected antigens, daughter detectors with higher 'harmmax' are generated, and the value of 'harmmax' in the third subfigure fluctuates because of affinity maturation, as does the number of detectors in the fourth subfigure. In the second subfigure, the five self antigens are detected after the 100th generation because the parameter maxundetectCount is set to 100.
Fig. 2. Results of self-antigen detection
Furthermore, the number of detectors produced by affinity maturation or TMA is in direct proportion to the number of antigens. The first subfigure shows that there are more antigens between the 0th and 100th generations than in other generations, so there are more detectors between the 50th and 100th generations than in other generations in the fourth subfigure. After the 100th generation, TMADetect can detect all the antigens and AMDetect is not activated, so detectors in the 'highest' state die and the number of detectors decreases.
4 Conclusion

In this work, a new state transformation model is proposed. Based on it, a new algorithm called the Adaptive Hybrid Immune Detector Maturation Algorithm (AHIDMA) is proposed. It combines TMA with AM and solves the adaptation problem, that is, the self-adaptation, nonself-adaptation and population-adaptation problems. In the algorithm, a random detector generation mechanism implements the NSA/TMA part, and affinity maturation implements the DNSA/AM part. The TMADetect process detects antigens seen before, and the AMDetect process detects antigens not seen before. Affinity maturation is the key component for producing new detectors.

Acknowledgments. This work is supported by Ningbo Nature Science Foundation 200701A6301043 and the Scientific Research Fund of Zhejiang Provincial Education Department 20070731. We also thank the providers of the KDD Cup 1999 data set [http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html].
References

1. Forrest, S., Perelson, A.S., Allen, L., Cherukuri, R.: Self-nonself Discrimination in a Computer. In: Proceedings of the 1994 IEEE Symposium on Research in Security and Privacy (1994)
2. de Castro, L.N., Von Zuben, F.J.: aiNet: An Artificial Immune Network for Data Analysis. In: Abbass, H.A., Sarker, R.A., Newton, C.S. (eds.) Data Mining: A Heuristic Approach, ch. XII, pp. 231–259. Idea Group Publishing, USA (2001)
3. de Castro, L.N., Von Zuben, F.J.: Learning and Optimization Using the Clonal Selection Principle. IEEE Transactions on Evolutionary Computation, Special Issue on Artificial Immune Systems (2002)
4. Chen, T.-C., Chen, C.-Y.: Satellite-Derived Land-Cover Classification Using Immune Based Mining Approach. In: 5th WSEAS International Conference on Applied Computer Science (2006)
5. Kim, J.W.: Integrating Artificial Immune Algorithms for Intrusion Detection. PhD Thesis, Department of Computer Science, University College London (2002)
6. Huang, T.L., Lee, K.T., Chang, C.H., Hwang, T.Y.: Two-Level Sliding Mode Controller Using Artificial Immune Algorithm. WSEAS Transactions on Power Systems (2006)
7. Hofmeyr, S.A.: An Immunological Model of Distributed Detection and its Application to Computer Security. PhD Dissertation, University of New Mexico (1999)
8. Gonzalez, F.: A Study of Artificial Immune Systems Applied to Anomaly Detection. PhD Dissertation, The University of Memphis (May 2003)
9. Ji, Z., Dasgupta, D.: Revisiting Negative Selection Algorithms. Evolutionary Computation (2007)
10. Yang, D., Chen, J.: The T-Detectors Maturation Algorithm Based on Genetic Algorithm. LNCS. Springer, Heidelberg (2004)
11. Yang, D., Chen, J.: The T-detectors Maturation Algorithm Based on Match Range Model. In: Proceedings of the 2005 ACM Symposium on Applied Computing (2005)
12. Chen, J.: T-detectors Maturation Algorithm with Min-Match Range Model. In: The 3rd IEEE International Conference on Intelligent Systems (2006)
13. Wenjian, L.: Research on Artificial Immune Model and Algorithms Applied to Intrusion Detection. PhD Dissertation, University of Science and Technology of China (2003)
14. Chen, J.: Dynamic Negative Selection Algorithm Based on Match Range Model. LNCS. Springer, Heidelberg (2005)
15. http://www.cs.unm.edu/~immsec/html-imm/affmat.html
16. Dougherty, J., Kohavi, R., Sahami, M.: Supervised and Unsupervised Discretization of Continuous Features, http://robotics.stanford.edu/~ronnyk/disc.ps
Interactive Visualization Applets for Modular Exponentiation Using Addition Chains Hatem M. Bahig and Yasser Kotb Computer Science Division, Department of Mathematics, Faculty of Science, Ain Shams University, Cairo, Egypt {hmbahig,kotb}@asunet.shams.edu.eg
Abstract. Online visualization systems have come to be heavily used in education, particularly for online learning. Most e-learning systems, including interactive learning systems, are designed to make it easier to understand the key ideas of particular problems or, more generally, overall course materials. This paper presents a novel interactive visualization system for one of the most important operations in public-key cryptosystems: modular exponentiation using addition chains. An addition chain for a natural number e is a sequence 1 = a0 < a1 < . . . < ar = e of numbers such that for each 0 < i ≤ r, ai = aj + ak for some 0 ≤ k ≤ j < i. Finding an addition chain with minimal length is an NP-hard problem. The proposed system visualizes how to generate addition chains with minimal length using the depth-first branch and bound technique, and how to compute modular exponentiation using addition chains.

Keywords: addition chain, branch and bound algorithm, public-key cryptosystem, visualization.
1 Introduction
The modular exponentiation (computing me mod n for given m, e, and n) is one of the most important operations in many public-key cryptosystems. For example, in RSA [17], the encryption of a message m is me mod n, where n and e are the public key. In general, modular exponentiation is computed using a chain of modular multiplications. There are two strategies to improve the throughput of implementations of these cryptosystems. The first is optimizing the multiplication [11]. The second is reducing the number of required modular multiplications. Computing the exponentiation me with a minimal number of multiplications, given that the only operation allowed is multiplying two already-computed powers, corresponds to finding a sequence of increasing natural numbers approaching the exponent e such that the sequence starts with 1 (representing the element m), ends with e (representing me), and every other element in the sequence is the sum of two preceding elements (not necessarily distinct). Such a sequence is called an addition chain. Finding minimal length addition chains is an NP-hard problem [10]. Therefore, there is a need to understand this problem to improve the performance of

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 209–216, 2010. © Springer-Verlag Berlin Heidelberg 2010
those cryptosystems. There are two directions for finding addition chains. The first is to find short (not necessarily minimal length) addition chains; the other is to find minimal length addition chains. Artificial Intelligence (AI) can play a role in both directions. In the first, many AI techniques have been introduced to find addition chains of short length; for example, genetic algorithms [4,14], ant colony algorithms [15], swarm algorithms [13] and the artificial immune system paradigm [5] have been applied to generate short addition chains. In the second, search tree techniques in AI for exact solutions have been applied to generate minimal length addition chains; see [2,7] for examples. A depth-first branch and bound algorithm is the best known technique for generating minimal length addition chains [1,19], so we concentrate on it. E-learning can be defined as technology-based learning in which learning material is delivered electronically to remote learners via a computer network. E-learning (or Internet-based learning) can be seen as a professional level of education, but with the advantages of lower time and cost. Other advantages of e-learning include a larger learning population, mitigation of the shortage of qualified training staff, lower campus maintenance costs, up-to-date information, and accessibility. In a typical e-learning environment the lecturers, students and information are in different geographical locations and are connected via the Internet. Traditional web-based courses are usually static hypertext pages without student adaptability. However, since the late nineties, several research teams have been implementing different kinds of adaptive and intelligent systems for Web-based education [3]. There are few software tools to help students understand cryptographic protocols and operations (for example [6,8,9]), and there is no educational system for the modular exponentiation operation.
This motivates us to visualize modular exponentiation using addition chains. The paper is organized as follows. Section 2 gives an overview of addition chains. In Section 3, we present the use of the depth-first branch and bound algorithm to generate addition chains with minimal length. In Section 4, we present a visualization of addition chains. In Section 5, we give a brief discussion with classroom methodological studies. Finally, Section 6 presents the conclusion and future work.
2 Overview of Addition Chains
In this section we mention some basic definitions, notations, and facts about addition chains. An addition chain [12] of length r for a natural number e is a strictly increasing sequence of (r + 1) natural numbers 1 = a0, a1, . . . , ar = e such that for each i ≥ 1, ai = aj + ak for some k ≤ j < i. The integer r is called the length of the addition chain for e. The minimal length of an addition chain for e is denoted by ℓ(e). For example, the sequences 1, 2, 4, 5, 10, 15 and 1, 2, 4, 8, 12, 14, 15 are two addition chains for 15, with lengths 5 and 6 respectively. The computation of m15 using the first chain is m, m2, m4, m5, m10, m15, while using the second chain it is m, m2, m4, m8, m12, m14, m15.
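The evaluation of m^e mod n along a given addition chain can be reproduced programmatically. The following sketch is ours, not the paper's: each power m^ai is obtained by multiplying two already-computed powers m^aj and m^ak with ai = aj + ak.

```python
def mod_exp_with_chain(m, chain, n):
    """Compute m**e mod n, where chain is an addition chain ending in e.
    powers maps each chain element a_i to m**a_i mod n."""
    powers = {1: m % n}
    for i, a in enumerate(chain[1:], start=1):
        # find j, k with a = chain[j] + k and both powers already known
        for j in range(i - 1, -1, -1):
            k = a - chain[j]
            if k in powers:
                powers[a] = (powers[chain[j]] * powers[k]) % n
                break
    return powers[chain[-1]]

# m**15 mod 13 via the length-5 chain 1, 2, 4, 5, 10, 15 (five multiplications)
assert mod_exp_with_chain(2, [1, 2, 4, 5, 10, 15], 13) == pow(2, 15, 13)
```

Both chains from the text give the same result; the shorter chain simply uses one fewer modular multiplication.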
Let λ(e) = ⌊log2 e⌋ and let ν(e) be the number of 1's in the binary representation of e. The ith step ai = aj + ak (0 ≤ k ≤ j < i) is called star if j = i − 1, small if λ(ai) = λ(ai−1), and big if λ(ai) = λ(ai−1) + 1. The length of an addition chain can be expressed as r = λ(e) + S(a0, a1, . . . , ar = e), where S(a0, a1, . . . , ai) denotes the number of small steps in the chain up to ai. The lower bound of ℓ(e) is

ℓ(e) ≥ log2 e + log2 ν(e) − 2.13.    (1)
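These quantities are straightforward to compute; the sketch below (ours) takes the ceiling of the bound in Eq. (1) to obtain the integer starting depth, since ℓ(e) is an integer.

```python
import math

def lam(e):
    """λ(e) = floor(log2 e), i.e. one less than the bit length of e."""
    return e.bit_length() - 1

def nu(e):
    """ν(e) = number of 1s in the binary representation of e."""
    return bin(e).count("1")

def lower_bound(e):
    """Integer lower bound on ℓ(e): ceil(log2 e + log2 ν(e) - 2.13)."""
    return math.ceil(math.log2(e) + math.log2(nu(e)) - 2.13)

# e = 15: λ(15) = 3, ν(15) = 4, lower_bound(15) = 4 (ℓ(15) is in fact 5)
```

For e = 15 the bound gives 4, one below the true minimal length 5, which is why the search algorithm in Section 3 must be prepared to deepen its search.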
A set of (lb + 1) natural numbers {bi}, i = 0, . . . , lb, is called a bounding sequence of length lb for e if bi ≤ ai for each addition chain a0, a1, . . . , alb = e of length lb for e. The purpose of bounding sequences is to cut off branches in the search tree that cannot lead to a shortest chain. Thurber [19] proposed three bounding sequences:

bi = ⌈e/2^(lb−i)⌉, i = 0, . . . , lb.    (2)

bi = ⌈e/(3 · 2^(lb−i−2))⌉ for 0 ≤ i ≤ lb − t − 2; bi = ⌈e/2^(lb−i)⌉ for lb − t − 1 ≤ i ≤ lb.    (3)

bi = ⌈e/(2^t · (2^(lb−t−(i+1)) + 1))⌉ for 0 ≤ i ≤ lb − t − 2; bi = ⌈e/2^(lb−i)⌉ for lb − t − 1 ≤ i ≤ lb.    (4)

A bounding sequence is called vertical (VBS) if it is used with the condition ai < bi, and slant (SBS) if it is used with the condition ai+1 + ai < bi+2. Computing the VBS and SBS for a given e is as follows [19]:

– if ν(e) = 1 and e = (2^j + 1)k, j > 0, then use Eq. (4) as VBS and Eq. (3) as SBS.
– if ν(e) = 1 and e = 5k, then use Eq. (3) as both VBS and SBS.
– otherwise, use Eq. (2) as both VBS and SBS.
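A small sketch of Eq. (2) follows (ours; the ceiling reading of the divisions is an assumption):

```python
import math

def bounding_sequence_eq2(e, lb):
    """Eq. (2), read with ceilings: b_i = ceil(e / 2**(lb - i))."""
    return [math.ceil(e / 2 ** (lb - i)) for i in range(lb + 1)]

# For e = 15 and lb = 5 this gives [1, 1, 2, 4, 8, 15]; the length-5
# chain 1, 2, 4, 5, 10, 15 satisfies a_i >= b_i at every position.
bs = bounding_sequence_eq2(15, 5)
assert all(b <= a for b, a in zip(bs, [1, 2, 4, 5, 10, 15]))
```

Any chain element falling below the corresponding b_i can be discarded immediately, which is exactly how the vertical condition prunes the search tree.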
3 Generating Minimal Length Addition Chains
Since finding the minimal length addition chains is an NP-hard problem, different artificial intelligence strategies for exact solutions have been proposed to generate addition chains with minimal length [2,7]. For example, the A∗ algorithm [2,16] expands the node in the search tree whose evaluation function (f(ai) + h(ai), where f(ai) is the lower bound of ai and h(ai) is the length of the shortest path from the goal node e to the current node ai) gives the lowest value among all nodes not yet expanded. The algorithm terminates when the goal node is found and all unexpanded nodes in the search tree have an evaluation higher than or equal to that of the goal node. However, this algorithm requires a lot of memory; in particular, the size of the search tree grows very fast with large e. A search algorithm is efficient when the search tree can be pruned rigorously. Therefore, we use a depth-first search algorithm with the branch-and-bound technique. The next algorithm [1,19] traverses the search tree using a cut and branch technique. The depth of the tree starts with the lower bound of addition chains
Eq. (1). At each step in the search tree, the possible children ai+1 of ai, their types (star or nonstar) and their levels i + 1 are pushed on the stack. The set of possible children ai+1 of ai is {ai + ak ≤ e; k ≤ i} ∪ {ai < aj + ak ≤ e; j, k ≤ i − 1}. The children of ai constitute a stack segment. To cut some elements, and thus some branches, in the search tree that cannot lead to a minimal length chain, we use the VBS and SBS. These have two advantages: speeding up the generation of minimal length chains and decreasing the maximum length of the stack.

Algorithm. Generating minimal length addition chains.
Input: e > 2.
Output: minimal length addition chains for e.
Begin
  lb ← ⌈log2 e + log2 ν(e) − 2.13⌉
  a0 ← 1; a1 ← 2
  loop
    Determine vertical and slant bounding sequences VBS and SBS for e
    i ← 1
    loop find-chain
      if (i < lb) then
        Determine whether to retain ai
        if ai is retained then
          Push on the stack the possibilities for ai+1
          i ← i + 1
          Let ai be the element on the top of the stack
          if ai = e then
            Chain is found; then take the next element off of the stack
            that is not in the stack segment of ai
          end if
        else
          Take the next element off of the stack
          Let ai be the element on the top of the stack
        end if
      else
        Take the next element off of the stack that is not in the stack
        segment of ai
        Let ai be the element on the top of the stack
      end if
    end loop find-chain
    if no chains found then
      lb ← lb + 1
    else
      exit
    end if
  end loop
End.
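The search loop above can be condensed into a recursive sketch. This is our own simplification: it performs the same iterative deepening from the lower bound of Eq. (1), but prunes only with a simple doubling cut rather than the explicit stack and the VBS/SBS bounding sequences of the full algorithm.

```python
import math

def min_addition_chains(e):
    """Return all minimal-length addition chains for e > 2 via
    iterative-deepening depth-first search."""
    lb = max(1, math.ceil(math.log2(e) + math.log2(bin(e).count("1")) - 2.13))
    while True:
        found = []

        def dfs(chain):
            a = chain[-1]
            if a == e:
                found.append(list(chain))
                return
            steps_left = lb - (len(chain) - 1)
            if a << steps_left < e:   # even doubling every step cannot reach e
                return
            # children: sums of two chain elements exceeding the last one
            children = {x + y for i, x in enumerate(chain)
                        for y in chain[i:] if a < x + y <= e}
            for c in sorted(children, reverse=True):
                chain.append(c)
                dfs(chain)
                chain.pop()

        dfs([1])
        if found:
            return found
        lb += 1   # no chain of length lb exists; deepen and retry

# min_addition_chains(15) yields the length-5 chains, e.g. [1, 2, 4, 5, 10, 15]
```

Because the depth limit starts at a valid lower bound and is increased only when a full search fails, the first depth at which any chain is found is exactly ℓ(e), so every returned chain is minimal.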
4 Visualizing Addition Chains
This section presents our interactive visualization system for modular exponentiation using addition chains. The system contains two applets. The first and the main one is to visualize generation of addition chains with minimal length. The second one is to visualize how to compute modular exponentiation given an addition chain. Fig. 1 shows the main applet. It contains the following.
1. The Input Text Field: accepts the number e for which all possible addition chains will be generated. After the user inputs the number e and clicks the "Enter" button, the Input Text Field is disabled.

2. Trace Mode Option: there are two modes in our web application, "Step-by-step" and "Step-over". The Step-by-step mode enables the user to generate the addition chains in a step-by-step manner, interacting with the system at each generation step, while the Step-over mode shows the generation process without interaction. The user can switch between both modes while the system is running by checking/unchecking the associated check box.
Fig. 1. Visualization Addition Chain System Snapshot
3. Pause/Continue Button: allows the user to pause or continue the generation process. The button title toggles automatically.

4. Stack Panel: shows the current state of the stack. Each cell in the stack holds (1) the element ai, (2) its index i, which indicates the level of the search tree, and (3) its type: star '*' or nonstar '!*'. The next children of each element in the search tree are pushed on the stack.

5. Path Panel: shows the current path of the search tree, i.e., the current addition chain.

6. Step Type Panel: shows the type of each element (star or nonstar) in the current addition chain (Path Panel). The Path and Step Type Panels run concurrently.

7. Next Child Button: generates the possible children of the last element (called the current element/node) in the current addition chain. The current element is displayed in the button title.

8. Children Panel: displays all possible children of the current element and their step types (star or nonstar). If there is no child, the message "No Child" appears.

9. Push/Pop Buttons: the Push button inserts the last generated children into the stack. In the case of "No Child", the system skips the Push step and goes directly to the Pop step. The Pop button pops the top of the stack. Neither button needs to be clicked if the user chooses the "Step-over" mode.

10. Addition Chain Tree Panel: shows the addition chain tree, generated simultaneously with the previous steps. The tree root starts with the value "1" and a child of "1" is "2". Starting from "2", all possible children are generated. By clicking on each child in the tree, the unexpanded children expand.

11. Lower Bound Information: presents lb, the lower bound of ℓ(e), and the number of small steps S(a0, a1, . . . , ar).

12. Vertical Bounding Sequence: presents the vertical bounding sequence VBS for the number in the Input Text Field.
13. Slant Bounding Sequence: shows the corresponding slant bounding sequence SBS for the number in the Input Text Field.

14. List of Addition Chains Panel: displays all addition chains for the number in the Input Text Field. This list is generated one by one as the system works.

15. Reset Button: cleans all data in the applet.
5 Classroom Methodological Studies
There are five main types of empirical methodological studies [18]: controlled experiments, observational studies, questionnaires and surveys, ethnographic field techniques, and usability studies. In the current work, we concentrate on the effectiveness of visualizations, i.e., how well one performs with the visualization algorithm compared to others who do not use the visualization. In fact, we see the
domain of visualization studies as extending beyond effectiveness. A number of classroom studies have been conducted in order to evaluate and analyze the usage of the system, using both qualitative and quantitative methods. Even though classroom studies are more difficult to control, they can give more externally valid results. Thus, they can be used to describe the practices that take place in classrooms and to analyze how tool usage affects them. As qualitative work, studies were conducted in graduate and postgraduate cryptography courses at our department, and data were collected by observations, video recording, and interviews, as well as by questionnaires. The results were presented as categories of visualization utilizations, problems encountered when using the visualizations, types of visualizations used, etc. The content of these categories has guided the further development of the system by giving a better idea of the kinds of situations in which the system can be used and how it is expected to work. The quantitative studies indicated that the system helped especially the mediocre students to learn addition chain principles and to write a program for generating minimal length addition chains.
6 Conclusions and Future Work
We have presented a highly interactive visualization system for finding minimal length addition chains using the depth-first branch-and-bound search technique. The system provides (1) controls for stepping through the process, (2) full insight into what is happening in the generation process, and (3) animation. In future work we aim to carry out short-term and longitudinal methodological studies as well as mixed-methods studies. We will also include a comparison between different methods for generating a minimal length addition chain, and of the performance of each bounding sequence.
Acknowledgements We are grateful to M. Fathy for valuable comments.
References

1. Bahig, H.: Improved generation of minimal addition chains. Computing 78, 161–172 (2006)
2. Bleichenbacher, D.: Efficiency and Security of Cryptosystems Based on Number Theory, ch. 4. Doctoral Thesis, Swiss Federal Institute of Technology Zurich, Zurich (1996)
3. Brusilovsky, P.: Adaptive and intelligent technologies for web-based education. Künstliche Intelligenz, Special Issue on Intelligent Tutoring Systems and Teleteaching 4 (1999)
4. Cruz-Cortés, N., Rodríguez-Henríquez, F., Juárez-Morales, R., Coello Coello, C.: Finding Optimal Addition Chains Using a Genetic Algorithm Approach. In: Hao, Y., Liu, J., Wang, Y.-P., Cheung, Y.-m., Yin, H., Jiao, L., Ma, J., Jiao, Y.-C. (eds.) CIS 2005. LNCS (LNAI), vol. 3801, pp. 208–215. Springer, Heidelberg (2005)
5. Cruz-Cortés, N., Rodríguez-Henríquez, F., Juárez-Morales, R., Coello Coello, C.: An Artificial Immune System Heuristic for Generating Short Addition Chains. IEEE Trans. Evolutionary Computation 12(1), 1–24 (2008)
6. Cattaneo, G., De Santis, A., Ferraro Petrillo, U.: Visualization of cryptographic protocols with GRACE. Journal of Visual Languages and Computing 19, 258–290 (2008)
7. Chin, Y., Tsai, Y.: Algorithms for finding the shortest addition chain. In: Proceedings National Computer Symposium, Kaoshiung, Taiwan, pp. 1398–1414 (1985)
8. Cryptography demos, http://nsfsecurity.pr.erau.edu/crypto/index.html
9. Cryptool, http://www.cryptool.org/
10. Downey, P., Leong, B., Sethi, R.: Computing sequences with addition chains. SIAM J. Computing 10(3), 638–646 (1981)
11. Gordon, D.M.: A survey of fast exponentiation methods. J. Algorithms 27, 129–146 (1998)
12. Knuth, D.E.: The Art of Computer Programming, vol. 2: Seminumerical Algorithms, 3rd edn., pp. 461–485. Addison-Wesley, Reading (1997)
13. Alejandro, L., Cruz-Cortés, N., Moreno-Armendáriz, M., Orantes-Jiménez, S.: Finding Minimal Addition Chains with a Particle Swarm Optimization Algorithm. LNCS, vol. 5845, pp. 680–691. Springer, Heidelberg (2009)
14. Nedjah, N., Mourelle, L.: Minimal Addition-Subtraction Chains Using Genetic Algorithms. In: Yakhno, T. (ed.) ADVIS 2002. LNCS, vol. 2457, pp. 303–313. Springer, Heidelberg (2002)
15. Nedjah, N., Mourelle, L.: Towards minimal addition chains using ant colony optimization. Journal of Mathematical Modelling and Algorithms 5(4), 525–543 (2003)
16. Nilsson, N.J.: Principles of Artificial Intelligence, 2nd edn. Springer, Heidelberg (1982)
17. Rivest, R., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM 21(2), 120–126 (1978)
18. Stasko, J., Hundhausen, C.: Algorithm Visualization. In: Fincher, S., Petre, M. (eds.) Computer Science Education Research, pp. 199–228 (2005)
19. Thurber, E.: Efficient generation of minimal length addition chains. SIAM J. Computing 28, 1247–1263 (1999)
Multimedia Elements in a Hybrid Multi-Agent System for the Analysis of Web Usability

E. Mosqueira-Rey, B. Baldonedo del Río, D. Alonso-Ríos, E. Rodríguez-Poch, and D. Prado-Gesto

University of A Coruña, Laboratorio de I+D en Inteligencia Artificial, A Coruña, Spain http://www.dc.fi.udc.es/lidia/
Abstract. The widespread use of the World Wide Web by persons from different age groups, diverse cultures, and with different computer skills has put more emphasis on the process of ensuring that web sites and applications are usable. At the same time, plain text web pages have been replaced by ones in which multimedia information (images, video and audio) is an important part. In this context we have developed a multi-agent system for analysing web usability that performs both static and dynamic analysis of web sites. The static analysis can examine certain multimedia issues related to usability (mostly associated with images), and in this paper we also comment on the issues related to video and audio technologies.
1 Introduction
At present, web sites and web applications are used on a daily basis by persons from different age groups, from diverse cultures, and with different computer skills. In these web sites, multimedia elements are becoming very popular. We can see a transition from old-style web pages (based on HTML code and images) to pages with audio and video clips inserted and with capabilities that resemble those of desktop applications, using Rich Internet Application (RIA) technologies. At the same time, technological progress has brought about a significant change of mentality in users, compelling us to become more demanding and to get what we want with the least effort. As a consequence, we are quick to become impatient with user-unfriendly applications. In this paper we present a multi-agent system based on evolutionary learning for the usability analysis of web sites. Our first approach is to analyse the static HTML code and to simulate the navigation performed by standard web users who encounter typical usability problems. In the second part of the paper we analyse how multimedia elements affect the usability of web pages and describe our plans to inspect usability issues in those elements.
This research has been partly funded by the Xunta of Galicia Regional Government of Spain under Project 08SIN010CT.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 217–224, 2010. c Springer-Verlag Berlin Heidelberg 2010
2 A Multi-Agent System for the Usability Analysis of Web Sites
In this section we briefly describe the multi-agent system developed to analyse the usability of web sites [1]. It uses two different approaches. On the one hand, we have a static strategy in which the content of a web page (basically HTML code) is analysed in order to find well-known issues that can lead to usability problems. On the other hand, we follow a dynamic strategy that consists of a simulation of the browsing process. This simulation is based on modelling users who try to reach one particular URL in the web site guided by a set of goals (e.g., desired information or intended actions) that are represented by key phrases. In this case we are testing the clarity of the structure of the web site and how easily users find the information that is relevant to them. The system is structured in two different types of agents, as shown in Figure 1: the HTML Analyser Agent and User Agents. Each user agent has as its goal to arrive at a destination URL from an initial URL, and possesses a set of rules of potential use in achieving this goal. When a user agent arrives at a new page, it requests information from the HTML Analyser Agent on the available links (link text, text surrounding the link, target URL). The user agent then checks this information against its rules. The HTML Analyser Agent stores the information on the links for future requests. User agents use a reinforcement learning mechanism that rewards or penalises agent actions. For example, a positive reinforcement would reward being able to use the initial key phrases, and a negative reinforcement would penalise being obliged to return to the previous page. In order to speed up the learning process, user agents are considered individuals of an evolutionary process and the typical operators of crossover and mutation are applied. The objective is to ensure that the best rules are passed on to the next generation and become widespread among the different agents.
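The paper does not specify the exact update rule used by the reinforcement mechanism; a minimal additive reward/penalty scheme over rule weights (purely illustrative, with hypothetical rule names) might look like:

```python
def reinforce(rule_weights, rule, reward):
    """Additively reward (positive) or penalise (negative) a navigation
    rule. A purely illustrative sketch; the actual update used by the
    user agents is not specified at this level of detail."""
    rule_weights[rule] = rule_weights.get(rule, 0.0) + reward
    return rule_weights

weights = {}
reinforce(weights, "follow-link-matching-key-phrase", +1.0)  # key phrase used
reinforce(weights, "follow-link-matching-key-phrase", -0.5)  # forced to go back
```

Rules whose accumulated weight stays high would then be favoured during crossover, so that successful navigation strategies spread through the agent population.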
Fig. 1. Structure of the multi-agent system
2.1 HTML Analyser Agent
The HTML Analyser Agent [2] examines the HTML code of a single page, a group of pages, or an entire web site, with the aim of detecting usability problems. Even though HTML code is relatively transparent and consists of a reduced set of elements, it can be complex to infer usability issues from it, and all kinds of ambiguities may appear. The HTML Analyser Agent thus uses heuristics to decide whether a usability problem actually exists, and reports every problem found, indicating its estimated criticality. As examples of the usability aspects analysed we can cite:

– Web page usability issues, such as the estimated page size, programming problems (browser-specific tags, deprecated tags, the lack of text encoding, etc.), flexibility problems (fixed font sizes, fixed-width elements), problematic elements in certain contexts (cookies, CSS formatting, Javascript code, animations, etc.) and the presence of search engines.

– Image-specific usability issues: described in more detail in Section 3.

– Form-specific usability issues: number of elements, the existence of elements that are subject to some type of validation, accessibility features such as easily clickable controls, etc.

– Table-specific usability issues: size, whether tables are used for presenting data or for establishing the page layout, the existence of recommended elements such as headers, captions, and summaries.

– Link-specific usability issues: non-standard representations, broken links, badly constructed links, inappropriate link texts, the use of anchors, the existence of links to non-HTML files, etc.

The last issue, that is, the existence of links to non-HTML files, is important in the context of this paper, since these files are often used to store multimedia information (e.g. video or audio files).
While using non-HTML files is not necessarily bad, they can have a negative impact on usability in many ways: lack of consistency, switching between different ways of navigating, proprietary file formats, additional software that needs to be installed on the client side, etc. These issues are commented on in more detail in the following sections of this paper.

2.2 User Agents
User agents model the dynamic browsing process of human users. Each agent has as its goal to arrive at a destination URL from another one, and possesses a set of key phrases of potential use in achieving this goal. The motivation for this word-based approach is that the Web is primarily a linguistic medium that involves browsing through text pages and examining text content. Our multi-agent system contains a population of non-identical user agents that model different types of human users and obtain different results. User agents are implemented using an evolutionary agent architecture composed of rule-based reactive agents.
The motivation behind the fallible and non-deterministic behaviour of the agents is to determine whether the text content and the links of the web site help users in performing tasks and finding the information they seek. That is, if an agent fails, it is probably because of usability problems in the web site. In a product with a good level of usability, the computerised task implementation will be as close as possible to the mental model of the user. In our case, that means that the structure of the web site, the text content, and the link labels should be as intuitive as possible.

2.3 Multimedia Elements in Web Usability Analysis
The initial reason for including multimedia elements in web sites was to make them more attractive, to enhance the user experience, and to convey information more efficiently. Ideally, this would make the Web more usable. However, these multimedia elements can easily hamper usability if they are not used judiciously. Proprietary formats or external plugins, for instance, can cause compatibility problems that may even prevent the user from navigating the web site. The continuing availability of new multimedia formats and functionalities means that these problems have become increasingly complex and more and more common. Our HTML Agent helps to discover these usability problems by (1) detecting the use of multimedia elements in a web site, (2) automatically analysing their basic characteristics, and (3) generating reports on the issues found. A detailed explanation of the inherent usability problems of each multimedia type appears below: Section 3 describes the problems with images and Section 4 examines the issues with video and audio files. At the relevant points, we include descriptions of how the HTML Analyser Agent addresses those issues.
3 Images
Images are probably the oldest type of multimedia file used on web pages. Their basic usability problems are fairly well known and are related to their dimensions, formatting problems, accessibility issues, and the size of the file. In order to detect these problems, our HTML Analyser Agent automatically examines the characteristics of the image file and inspects the HTML code of the page that contains it. More specifically, the following issues are analysed and reported on:

– Image dimensions: A basic usability heuristic is that images should not be so big that they do not fit on an average screen. Regardless of the actual dimensions of the image, HTML also offers attributes to explicitly specify the height and the width in which it will be displayed. Moreover, these attributes let the browser know how much space must be reserved in order to display the image before it is actually loaded. Failing to indicate this can cause formatting problems, because the elements of the page can be continuously rearranged as the images are retrieved and displayed.
Multimedia Elements in a Hybrid Multi-Agent System
– Accessibility features: HTML also offers attributes for declaring a long description of the image and an alternative text for users who are not able to see images properly [3]. Omitting these attributes is a typical accessibility failure. Our Agent also warns the user if the alternative text is too long.
– File size: This is an inherent property of the file that has an impact on download time. It should be kept in mind that not all users have broadband access, and, even in broadband situations, response times can be slow [4].
– The use of image maps: These are pictures in which different areas contain links to different URLs. They should be used only when necessary.

Figure 2 shows an example of a report generated by the HTML Analyser Agent on the usability problems of a particular image. As can be seen, the HTML code for the image lacks both the "alternative text" and "long description" attributes, and neither the height nor the width is expressly declared.
Fig. 2. Usability issues found for a specific image
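The image checks behind reports like the one in Fig. 2 could be sketched roughly as follows; this is a hypothetical simplification, and the 100-character alt-text threshold is our own assumption, not the agent's actual limit:

```python
from html.parser import HTMLParser

class ImageChecker(HTMLParser):
    """Illustrative sketch of per-image attribute checks (not the real agent)."""

    MAX_ALT_LEN = 100  # assumed threshold for "alternative text too long"

    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        if "alt" not in attrs:
            self.issues.append("missing alternative text (alt)")
        elif len(attrs["alt"] or "") > self.MAX_ALT_LEN:
            self.issues.append("alternative text too long")
        if "longdesc" not in attrs:
            self.issues.append("missing long description (longdesc)")
        if "width" not in attrs or "height" not in attrs:
            self.issues.append("width/height not expressly declared")
```

Run against the image of Fig. 2, which declares neither `alt`, `longdesc`, nor explicit dimensions, this sketch would report the same three categories of problems.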
4 Video and Audio
When a multimedia file is placed on a web page, the HTML Analyser Agent can extract useful data from its MIME type, including the kind of software needed to handle the file (commercial software, shareware, or freeware) and whether or not its format is proprietary. However, some other significant information, such as the video format (e.g., H.264, Theora, VP6 or VP8), cannot be obtained from MIME types, since they mostly refer to container formats: for instance, an AVI file may contain audio/visual data compressed with several algorithms, including Motion JPEG, RealVideo, or MPEG-4 Video.
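The kind of information that can (and cannot) be read off a MIME type might be modelled like this; the lookup table below is an illustrative assumption, not the agent's actual data:

```python
# Hypothetical lookup table: what a MIME type alone can tell us.
# Entries and labels are illustrative assumptions.
MIME_INFO = {
    "video/x-msvideo": {"container": "AVI", "proprietary": True},
    "video/quicktime": {"container": "QuickTime", "proprietary": True},
    "video/ogg": {"container": "Ogg", "proprietary": False},
    "application/x-shockwave-flash": {"container": "Flash", "proprietary": True},
}

def describe_media(mime_type):
    """Return container-level information recoverable from a MIME type.

    Note: the codec inside the container (H.264, Theora, VP8, ...)
    is NOT recoverable from the MIME type, as discussed above, so it
    is deliberately absent from the result.
    """
    info = MIME_INFO.get(mime_type)
    if info is None:
        return {"container": "unknown", "proprietary": None}
    return info
```

The deliberate absence of a codec field in the result mirrors the limitation discussed above: `video/x-msvideo` reveals an AVI container, but says nothing about the compression algorithm inside it.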
One important aspect that directly affects web usability is how multimedia content is displayed, that is, how usable web video players are. In the next two subsections we introduce the two competing platforms used to embed video on web pages: Flash and HTML 5.

4.1 Flash
Since Adobe Flash Player version 4 appeared in 1999, Flash Video has quickly established itself as the de facto format of choice for embedded video on the web, and Flash has become one of the most popular platforms for developing Rich Internet Applications. Flash animations were easy to create and allowed web developers to expand their design possibilities. Furthermore, after installing a specific plugin, any content shown in Flash was viewable on most web browsers and operating systems. This resulted in Adobe Flash Player being part of 99% of Internet-enabled desktops in 2009. However, this situation might be changing now that HTML 5 has been released and popular new products, such as the Apple iPhone and iPad, do not support Flash. Despite its popularity, the Flash platform is known to lead to web designs with many usability issues, as shown in [5]. Flash development tools allow or, in many cases, force users not to follow web-design standards, and the resulting Flash objects usually increase the size of web pages and therefore their load time. Moreover, in spite of being indexable by Google, Flash-based web pages are still hard to explore for many other search engines, since their indexing mechanisms are too specific. Finally, the plugin needed to visualise Flash content does not always seem to work well on some operating systems, such as Mac OS X or Linux.

4.2 HTML 5
On March 4, 2010, the Web Hypertext Application Technology Working Group (WHATWG) published the latest working draft for the fifth revision of HTML. As far as our context is concerned, its main characteristic is the inclusion of tags to embed multimedia content, namely the video and audio elements for video and audio data, respectively. Every browser that supports HTML 5 will provide a media player that displays the content specified inside the aforementioned tags. This player comes with a default configuration that satisfies established usability standards. In this manner, web developers do not need to implement and configure an external player, which might be neither usable nor accessible enough. This makes the design of web sites with multimedia content more user-friendly. As HTML is a markup language with elements and attributes, the media elements in web pages become easily parseable, and hence indexable by a search engine. This feature gives us more information, which can be used by our HTML Analyser Agent to provide advice about the use of multimedia content.
The main drawback is an issue external to the implementation of the new version of HTML: it arises from the problem of choosing a standard format for encoding the videos included in web sites. At first, the WHATWG suggested the following encoding formats: Theora video and Vorbis audio encapsulated in Ogg containers. However, the decision to require this specific format led to opposition from Google, Apple Inc. and Nokia, which cited uncertainty about potential patents and a lack of hardware support, and opted instead for other codecs. Finally, the matter was settled in a way that suits everyone: media types encoded with any codec are allowed. This means that video publishers must encode all their media files in every format supported by each browser, and hence write more markup. Until this "format war" comes to an end, the WHATWG recommends a workaround based on the use of source tags as shown below.
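The source-tag workaround takes roughly the following form; this is a representative sketch, and the file names and MIME types are illustrative:

```html
<!-- Offer the same video in several encodings; the browser plays
     the first source whose type it can decode. -->
<video controls width="640" height="360">
  <source src="clip.mp4" type="video/mp4">  <!-- e.g. H.264 -->
  <source src="clip.ogv" type="video/ogg">  <!-- e.g. Theora/Vorbis -->
  <p>Fallback text for browsers without HTML 5 video support.</p>
</video>
```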
5 Discussion
As we can see, a key point in the prevalence of one of these platforms among the Internet community is video format standardisation, whose future appears to be uncertain. On the one hand, significant improvements in Theora, such as hardware support, could lead it to win the "format war" and become the standard video codec. On the other hand, the lack of standardisation around HTML 5 may reinforce the dominance of Flash Video, since its plugin is widespread and its features are constantly being improved. A new alternative has arisen recently: Google purchased On2 Technologies in February 2010 and, as a result, now owns all the patents behind VP8, a new high-performance video codec that could be included in the HTML 5 specification if released under a royalty-free licence. We also have to take into account the usability issues regarding web accessibility for people with disabilities. With HTML 5, the introduction of the multimedia tags should boost the use of accessibility tools in web sites (such as screen readers, subtitles, and audio transcription), generating a growing interest in the web community in including specific tags to define accessibility content. On the other hand, Flash elements can be adequate for users with disabilities if designed with care [6] and, ideally, with alternative access to the information in the form of a standard HTML version of the content [7]. Regardless of which technology will be the most widespread in the coming years, the features that HTML 5 offers are much more useful for improving our HTML
Analyser Agent. The ease of parsing markup languages makes it possible to quickly obtain information on the attributes of the multimedia elements, for instance the display size, the video/audio format, the codec used, the MIME type of the file, etc. With such enhancements we will be able to increase, both in number and in quality, the usability and accessibility reports that web developers could use to make more user-friendly interfaces. One interesting feature that may be included in our system is interest analysis of on-line multimedia content. Aspects like the short duration of video clips, the high dynamism of scenes, and the lack of distracting elements in the main frame are considered to improve the web-user experience [8]. To make a simple analysis of the file header and obtain interesting data (such as audio/video bitrate, media duration, file size, video dimensions, audio sample rate, etc.), some tools (e.g., ffmpeg) could be used to cover the above-mentioned aspects. However, the amount and type of information stored in a file are specified by the container format and, at this point, a multimedia content description standard (MPEG-21, MPEG-7, etc.) could be useful. There are some other characteristics that can be automatically analysed by applying Multimedia Information Retrieval (MIR) techniques, which allow us to extract relevant data from video and audio. Speech recognition, video OCR and image similarity matching are examples of MIR algorithms that can be useful for the aforementioned analysis. More about MIR can be found in [9].
References

1. Mosqueira-Rey, E., Alonso-Ríos, D., Vázquez-García, A., Baldonedo del Río, B., Moret-Bonillo, V.: A Multi-Agent System Based on Evolutionary Learning for the Usability Analysis of Websites. In: Nguyen, N.T., Jain, L.C. (eds.) Intelligent Agents in the Evolution of Web and Applications. SCI, vol. 167, pp. 11–34. Springer, Heidelberg (2009)
2. Alonso-Ríos, D., Luis-Vázquez, I., Mosqueira-Rey, E., Moret-Bonillo, V.: An HTML analyzer for the study of web usability. In: IEEE Int. Conf. on Systems, Man, and Cybernetics, San Antonio, Texas, USA, pp. 1261–1266 (2009)
3. Nielsen, J.: Designing Web Usability. New Riders, Berkeley (2000)
4. Nielsen, J., Loranger, H.: Prioritizing Web Usability. New Riders, Berkeley (2006)
5. Nielsen, J.: Flash: 99% Bad. Jakob Nielsen's Alertbox (October 2000), http://www.useit.com/alertbox/20001029.html (retrieved 2010-02-25)
6. Nielsen, J.: Making Flash Usable for Users With Disabilities. Jakob Nielsen's Alertbox (October 2002), http://www.useit.com/alertbox/20021014.html (retrieved 2010-02-25)
7. Hudson, R.: Flash and accessibility. Web Usability - Accessibility and Usability Services (November 2003), http://www.usability.com.au/resources/flash.cfm (retrieved 2010-02-28)
8. Nielsen, J.: Talking-Head Video Is Boring Online. Jakob Nielsen's Alertbox (December 2005), http://www.useit.com/alertbox/video.html (retrieved 2010-02-25)
9. Lew, M., Sebe, N., Djeraba, C., Jain, R.: Content-Based Multimedia Information Retrieval: State of the Art and Challenges. ACM Trans. on Multimedia Computing, Communications, and Applications 2, 1–19 (2006)
An Approach for an AVC to SVC Transcoder with Temporal Scalability

Rosario Garrido-Cantos, José Luis Martínez, Pedro Cuenca, and Antonio Garrido

Albacete Research Institute of Informatics, Universidad de Castilla-La Mancha, 02071 Albacete, Spain
{charo,joseluismm,pcuenca,antonio}@dsi.uclm.es
Abstract. The scalable extension (SVC) of H.264/AVC uses a notion of layers within the encoded bitstream to provide temporal, spatial and quality scalability, separately or combined. This scalability allows adaptation to scenarios with different devices and heterogeneous networks. The SVC design requires scalability to be provided at the encoder side by exploiting inter-layer dependencies during encoding. This implies that existing H.264/AVC content cannot benefit from the scalability tools in SVC, due to the lack of intrinsic scalability in the bitstream at encoding time. Since a lot of technical and financial effort is currently being spent on the migration from MPEG-2 equipment to H.264/AVC, it is unlikely that a new migration to SVC will occur in the short term. Because broadcasters and content distributors want to have scalable bitstreams at their disposal, efficient techniques for migrating single-layer content to a scalable format are desirable. In this paper, an approach for temporal scalability transcoding from H.264/AVC to SVC is discussed. This approach is applied to the upper layers of SVC, where coding complexity is higher, and it is capable of reducing this coding complexity by around 55.75% while maintaining the coding efficiency.

Keywords: Scalable Video Coding (SVC), H.264/AVC, Transcoding, Temporal Scalability.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 225–232, 2010. © Springer-Verlag Berlin Heidelberg 2010

1 Introduction

Recent advances in video coding technology and standardization, along with the rapid development and improvement of mobile and handheld devices, network infrastructures, storage capacity, and computing power, are enabling an increasing number of video applications. Application areas today range from multimedia messaging, video telephony, and video conferencing over mobile TV, wireless and wired Internet video streaming, and standard and high-definition TV broadcasting, to DVD, Blu-ray Disc, and HD DVD optical storage media. For these applications, adaptable video transmission and storage systems must be employed. Video coding standards such as H.264/AVC [1] provide mechanisms for coding video that optimize compression efficiency and satisfy the needs of these emerging multimedia applications. These multimedia applications are typically characterized by a wide range of connection qualities and receiving devices with different decoding capacities (display,
processing power, memory capabilities, etc.). Moreover, the networks used to deliver video content are heterogeneous. Therefore, the compressed video bit stream has to be adapted to the network connections and to the different characteristics of devices, in order to ensure a continuous, high-quality image. Video adaptation has become an essential technology for providing multimedia content to user devices in an appropriate way. Scalable Video Coding is a highly attractive solution to the problems posed by the characteristics of modern video transmission systems. The main idea of scalable video coding is to encode the video as one base layer and a few enhancement layers, so that lower bit rates, spatial resolutions and temporal resolutions can be obtained by simply truncating certain layers from the original bitstream, adapting it to the communication channel bandwidth and user device capabilities. Recently, joint efforts of MPEG and VCEG have led to the standardization of a new state-of-the-art scalable video codec. This scalable extension of H.264/AVC [1], denoted SVC (Scalable Video Coding), makes it possible to encode scalable video bitstreams containing several quality, spatial, and temporal layers. By parsing and extraction, lower layers can easily be obtained, hence providing different types of scalability in a flexible manner. SVC supports temporal, spatial and quality (SNR) scalability, which allows adaptation to application requirements such as the display and processing capabilities of target devices and varying transmission conditions. Temporal scalability in SVC is provided by using hierarchical prediction structures. Spatial scalability is achieved by encoding each spatial resolution into one layer. Additionally, inter-layer prediction mechanisms are applied to remove redundancy between layers. Quality-SNR scalability is intended to give different levels of detail and fidelity relative to the original video.
It can be seen as a case of spatial scalability where the base and enhancement layers have identical picture sizes but different qualities. It is up to the encoding process to decide which additional details are added to which parts of the video images. Different kinds of quality levels are distinguished: Coarse-Grain Scalability (CGS) and Medium-Grain Scalability (MGS). A third one, Fine-Grain Scalability (FGS), was removed from the SVC amendment finalized in July 2007. Despite these scalability tools, most of the video content today is still created in a single-layer format (H.264/AVC video streams). The lack of scalable streams results in the necessity of developing alternative techniques to enable video adaptation. In this paper, video transcoding [2] is proposed for enabling efficient adaptation of H.264/AVC to H.264/SVC video streams. Its efficiency is obtained by reusing as much information as possible from the original bitstream, such as mode decisions and motion information. The ultimate goal is to perform the required adaptation process faster than the straightforward concatenation of a decoder and an encoder. In particular, this paper describes an approach for transcoding from a single-layer H.264/AVC bitstream without temporal scalability (typical IBBP GOP pattern) to an SVC bitstream with temporal scalability based on hierarchical prediction structures (with B-pictures). The remainder of this paper is organized as follows. In Sect. 2, the state of the art in H.264/AVC to SVC transcoding is discussed. Sect. 3 describes the temporal scalability technique in SVC. In Sect. 4, our approach is presented and, finally, in Sect. 5 conclusions are drawn.
2 Related Work

Since it is beneficial for broadcasters and content distributors to have scalable bitstreams at their disposal, efficient techniques for migrating H.264/AVC content to the H.264/SVC format are desirable. Due to its computational efficiency, transcoding can be used to introduce scalability in compressed, single-layer bitstreams. In this way, re-encoding can be avoided when migrating legacy content to a scalable format. A number of techniques have been proposed in the past for introducing scalability in compressed bitstreams. The majority of the proposals are related to quality-SNR scalability, although there are a few related to the other two types of scalability (spatial and temporal). Regarding quality-SNR scalability, in [3] a technique was studied for transcoding from hierarchically encoded H.264/AVC to FGS streams. Although it was the first work on this type of transcoding, it is not of great relevance, since this technique for providing quality-SNR scalability was removed from subsequent versions of the standard due to its high computational complexity. In [4], different architectures for transcoding from a single-layer H.264/AVC bitstream to SNR-scalable SVC streams with CGS layers were proposed, depending on the macroblock type. Moreover, the normative bitstream rewriting process implemented in the SVC standard to convert SVC to H.264/AVC bitstreams is used to reduce the computational complexity of the proposed architectures. For spatial scalability, a proposal was presented in [5]. The authors presented an algorithm for converting a single-layer H.264/AVC bitstream to a multi-layer, spatially scalable SVC video bitstream containing layers of video with different spatial resolutions. Using a full-decode full-encode algorithm as a starting point, some modifications are made to reuse the information available after decoding an H.264/AVC bitstream in the motion estimation and refinement processes of the encoder.
The scalability is achieved by an information downscaling algorithm which uses the top enhancement layer (this layer has the same resolution as the original video output) to produce the different spatial layers of the output SVC bitstream. In the temporal scalability framework, the most relevant work is [6]. A transcoding method from an H.264/AVC P-picture-based bitstream to an SVC bitstream with temporal scalability was presented. In this approach, the H.264/AVC bitstream is transcoded to two layers of P-pictures (one with reference pictures and another with non-reference ones). Then, this bitstream is transformed into an SVC bitstream by syntax adaptation.
3 Temporal Scalability in H.264/SVC

A bit stream provides temporal scalability when the set of corresponding access units can be partitioned into a temporal base layer and one or more temporal enhancement layers with the following property. Let the temporal layers be identified by a temporal layer identifier T, which starts from 0 for the base layer and is increased by 1 from one temporal layer to the next. Then, for each natural number k, the bit stream that is obtained by removing all access units of all temporal layers with a temporal layer identifier T greater than k forms another valid bit stream for the given decoder.
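The extraction property above translates directly into code; this is a sketch in which the access-unit representation (a list of (temporal_id, payload) pairs) is our own illustrative assumption:

```python
def extract_temporal_substream(access_units, k):
    """Keep only access units whose temporal layer identifier T is <= k.

    access_units: ordered list of (temporal_id, payload) pairs.
    By the property stated above, the result is itself a valid
    bit stream for the given decoder.
    """
    return [(t, au) for t, au in access_units if t <= k]
```

For example, dropping everything above T = 1 from a four-layer stream keeps the key pictures (T = 0) and the first enhancement layer, halving the frame rate at each further step down.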
For hybrid video codecs, temporal scalability can generally be enabled by restricting motion-compensated prediction to reference pictures with a temporal layer identifier that is less than or equal to the temporal layer identifier of the picture to be predicted. In H.264/AVC, and by extension in SVC, any picture can be marked as a reference picture and used for motion-compensated prediction of following pictures. This feature allows the coding of picture sequences with arbitrary temporal dependencies. Hence, for supporting temporal scalability with a reasonable number of temporal layers, no changes to the design of H.264/AVC were required. The only related change in SVC refers to the signalling of temporal layers. To achieve temporal scalability, SVC links its reference and predicted frames using hierarchical prediction structures [7], which define the temporal layering of the final structure. With hierarchical prediction structures, key pictures (typically I or P frames) are coded at regular intervals using only previous key pictures as references. The pictures between two key pictures are hierarchically predicted and, together with the succeeding key picture, are known as a Group of Pictures (GOP). The sequence of key pictures represents the lowest temporal layer (the temporal base layer), which can be augmented with the non-key pictures, which are divided into enhancement layers.
Fig. 1. Hierarchical B prediction structure with four temporal layers (TL)
There are different structures for enabling temporal scalability, but the typical GOP structure is based on hierarchical B pictures, which is also the one used in the JSVM reference encoder software [8]. The number of temporal layers is thus equal to 1 + log2(GOP size). One of these structures, with a dyadic structure, a GOP of 8 (I7BP pattern) and therefore four temporal layers, is illustrated in Fig. 1.
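For a dyadic hierarchical-B GOP, the layer count formula and the temporal layer of each picture follow from its offset within the GOP; the sketch below is consistent with the structure of Fig. 1, though the function names and the offset convention are our own:

```python
import math

def num_temporal_layers(gop_size):
    """1 + log2(GOP size), as stated above (gop_size a power of two)."""
    return 1 + int(math.log2(gop_size))

def temporal_layer(offset, gop_size):
    """Temporal layer of the picture at position offset (1..gop_size)
    inside a dyadic hierarchical-B GOP; offset == gop_size is the
    succeeding key picture (layer 0)."""
    max_tl = int(math.log2(gop_size))
    trailing_zeros = (offset & -offset).bit_length() - 1
    return max_tl - trailing_zeros
```

For a GOP of 8 this yields four layers, with offsets 1..8 mapped to layers 3, 2, 3, 1, 3, 2, 3, 0, matching the hierarchical-B pattern of Fig. 1.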
4 Proposed H.264/AVC to SVC Video Transcoder

The most time-consuming tasks carried out in the H.264/AVC and SVC encoders are Motion Estimation (ME) and the procedure known as Macroblock (MB) coded mode
decision. Both techniques perform the inter prediction, and they are the most suitable modules to be accelerated. The idea behind the proposed transcoder consists of reusing most of the operations that can be gathered from the H.264/AVC decoding algorithm (as part of the transcoder) to accelerate the SVC encoding algorithm (also included in the transcoder). In this framework, on the one hand, the proposed transcoder tackles the ME reduction by reusing the Motion Vectors (MVs) used in H.264/AVC in order to define smaller search areas in SVC (this approach is depicted in Section 4.1). On the other hand, the MB partitions previously developed by H.264/AVC can be used as candidate MB partitions for SVC. Moreover, the residual information (the residual frame) can also be used to refine these preliminary MB partitions. Along these lines, Machine Learning (ML) can be applied to convert these observations into rules that can be implemented in the proposed transcoder in place of the more complex original procedure. This technique of applying ML has previously been used in an MPEG-2 to H.264/AVC video transcoder, showing that ML is an appropriate solution in the framework of transcoding [9]. Experimental results show that the proposed approach reduces the MB mode selection complexity by as much as 95% while maintaining the coding efficiency.

4.1 Dynamic Motion Window for Motion Estimation

The idea of ME consists of eliminating temporal redundancy by determining the movement of the scene. Because SVC is an extension of H.264/AVC, the ME carried out in SVC will be highly correlated with the ME previously performed in H.264/AVC. Therefore, it seems obvious that performing the complete ME again is a waste of time. Although it might seem that a simpler approach would be to reuse the incoming MVs from H.264/AVC directly in SVC, the fact is that the H.264/AVC MVs are correlated with those generated in SVC, but are not the same.
This is because the AVC pattern follows, in general, the IBBP structure, whereas SVC uses hierarchical B pictures (see Figure 1). This mismatch between GOP sizes and formats leads to different MVs in the two motion estimations. Therefore, this paper first proposes a Dynamic Motion Window (DMW) technique that uses the incoming MVs from H.264/AVC to determine a small area in which to find the real MVs calculated in SVC (depicted in Figure 2). In the proposed transcoder based on DMW, the motion vector search range for every SVC MB is adaptively determined: it is recalculated for every macroblock (or sub-macroblock partition) that can occur in the MB coded mode decision, and reduced depending on the length and the orientation of the incoming H.264/AVC MVs. It is used to determine a dynamic search range area around the orientation of the H.264/AVC MVs. The DMW approach is depicted in Figure 2. The new search range is determined by the area enclosed by the circumference, centred at the (0,0) point for each mode or sub-mode, with the length of the incoming vector as the radius of the circumference (see Figure 2). This technique of applying a Dynamic Motion Window for Motion Estimation has been used to reduce the motion estimation complexity in the upper temporal layers by as much as 55.75% while maintaining the coding efficiency.
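The DMW restriction amounts to a simple inclusion test on the candidate search points; the following is an illustrative sketch of the geometry described above, not the JSVM implementation:

```python
def dmw_candidates(mv, d_max):
    """Search points allowed by the Dynamic Motion Window.

    mv: incoming H.264/AVC motion vector (mv_x, mv_y).
    d_max: maximum search range of the exhaustive search.
    A candidate (x, y) is kept iff x^2 + y^2 <= mv_x^2 + mv_y^2,
    i.e. it lies inside the circle of radius |mv| centred at (0, 0).
    """
    mv_x, mv_y = mv
    r2 = mv_x ** 2 + mv_y ** 2
    return [(x, y)
            for x in range(-d_max, d_max + 1)
            for y in range(-d_max, d_max + 1)
            if x * x + y * y <= r2]
```

A short incoming MV thus prunes the exhaustive (2·d_max + 1)² grid drastically; in the extreme case of a zero vector, only the (0, 0) candidate survives.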
[Figure 2 contrasts the two search strategies for a block of size M × N. The exhaustive search, with maximum search range d_max, covers a search area of (M + 2d_max) × (N + 2d_max), i.e. (2d_max + 1)² search points. The DMW search instead limits the search range by the circumference centred at (0,0) whose radius r is given by the incoming H.264/AVC motion vector (MVx, MVy): by the Pythagorean theorem, r² = MVx² + MVy², so the search points are the inner points of the circumference, i.e. the candidate coordinates (x, y) satisfying x² + y² ≤ MVx² + MVy².]
Fig. 2. Proposed dynamic motion estimation
4.2 Implementation Results

In this section, results from the implementation of the proposal described in Sect. 4.1 are shown. Test sequences with varying characteristics were used, namely Foreman, Bus, Football, Mobile, Soccer and Hall, in CIF and QCIF resolutions.

Table 1. Encoding time for each temporal layer (TL) with different GOP sizes using CIF
Encoding time (%) of every temporal layer – CIF (30 Hz)

           GOP = 8                      GOP = 16
Sequence   TL0   TL1   TL2   TL3       TL0   TL1   TL2   TL3   TL4
Foreman    4.72  13.59 27.29 54.40     1.52  6.63  13.11 26.34 52.40
Bus        4.73  13.67 27.29 54.31     1.75  6.47  13.29 26.22 52.27
Football   4.70  13.62 27.41 54.27     1.55  6.58  13.13 26.34 52.40
Mobile     4.72  13.60 27.23 54.45     1.49  6.61  13.09 26.32 52.50
Soccer     4.71  13.59 27.25 54.45     1.54  6.55  13.11 26.35 52.45
Hall       4.68  13.57 27.26 54.49     1.57  6.56  13.15 26.39 52.33
Average    4.71  13.61 27.28 54.40     1.57  6.57  13.15 26.33 52.38
These sequences were encoded using the H.264/AVC Joint Model reference software, version 16.2 [10], with an IBBPBBP pattern and a fixed QP chosen as a trade-off between quality and bitrate. Then, for the reference results, the encoded bitstreams are decoded and re-encoded using the JSVM software, version 9.19.3 [8], with hierarchical GOP structures and different values of QP (28, 32, 36, 40). For the results of the proposal, the bitstreams encoded in H.264/AVC are transcoded using the technique described in Section 4.1.
Table 2. Encoding time for each temporal layer (TL) with different GOP sizes using QCIF
Encoding time (%) of every temporal layer – QCIF (15 Hz)

           GOP = 4                GOP = 8
Sequence   TL0   TL1   TL2       TL0   TL1   TL2   TL3
Foreman    11.98 29.45 58.57     4.71  13.73 27.32 54.24
Bus        11.71 29.26 59.03     4.84  13.58 27.06 54.52
Football   11.72 29.40 58.87     4.67  13.73 27.26 54.34
Mobile     11.94 29.36 58.70     4.72  13.65 27.37 54.26
Soccer     11.74 29.42 58.84     4.99  13.44 27.13 54.44
Hall       11.69 29.51 58.80     5.08  13.52 27.10 54.30
Average    11.80 29.40 58.80     4.84  13.61 27.21 54.34
The technique explained previously is applied to the upper temporal layers, because the encoder spends more time encoding the higher enhancement layers (around 75% in the last two), as shown in Table 1 and Table 2.

Table 3. RD performance of the approach using CIF (30 Hz)

RD performance of the AVC/SVC transcoder, GOP = 16 - CIF (30 Hz)

Sequence   ∆PSNR (dB)   ∆Bitrate (%)   ∆Time (%)
Foreman    -0.1027      3.04           -50.34
Bus        -0.1774      11.37          -45.56
Football   -0.0997      1.63           -25.21
Mobile     -0.1653      5.96           -84.78
Soccer     -0.2774      8.78           -39.36
Hall       -0.0337      2.95           -86.49
Average    -0.1427      5.62           -55.29

Table 4. RD performance of the approach using QCIF (15 Hz)

RD performance of the AVC/SVC transcoder, GOP = 8 - QCIF (15 Hz)

Sequence   ∆PSNR (dB)   ∆Bitrate (%)   ∆Time (%)
Foreman    -0.1109      4.44           -51.95
Bus        -0.1845      12.51          -46.21
Football   -0.1095      1.82           -26.12
Mobile     -0.1727      6.45           -85.31
Soccer     -0.2873      9.26           -40.46
Hall       -0.0877      3.12           -87.30
Average    -0.1588      6.27           -56.23
Tables 3 and 4 show ∆PSNR, ∆Bitrate and ∆Time for the sequences under study, averaged over the QPs, when our approach is applied, compared to the more complex reference transcoder. The values obtained with the proposed transcoder are very close to the results obtained when applying the reference transcoder: the average PSNR loss with respect to the reference is 0.15 dB, with an average bitrate increase of around 6%, while achieving around a 55.75% reduction in computational complexity.
R. Garrido-Cantos et al.
5 Conclusions

This work presents an approach for H.264/AVC to SVC transcoding with temporal scalability. Starting from the higher layers and reusing information available after decoding the H.264/AVC bitstream, motion estimation can be accelerated with a dynamic motion window. Experimental results show that this approach is capable of reducing the coding complexity by around 55.75% while maintaining the coding efficiency.

Acknowledgments. This work was supported by the Spanish MEC and MICINN, as well as European Commission FEDER funds, under Grants CSD2006-00046, TIN2009-14475-C04 and TIN2009-05737-E, and it was also partly supported by JCCM funds under grants PEII09-0037-2328 and PII2I09-0045-9916.
A GPU-Based DVC to H.264/AVC Transcoder Alberto Corrales-García1, Rafael Rodríguez-Sánchez1, José Luis Martínez1, Gerardo Fernández-Escribano1, José M. Claver2, and José Luis Sánchez1 1
Instituto de Investigación en Informática de Albacete (I3A) Universidad de Castilla-La Mancha 02071 Albacete, Spain {albertocorrales,rrsanchez,joseluismm,gerardo, jsanchez}@dsi.uclm.es 2 Departamento de Informática Universidad de Valencia 46100 Burjassot, Valencia, Spain [email protected]
Abstract. Mobile to mobile video conferencing is one of the services that the newest mobile network operators can offer to users. With the emergence of the distributed video coding paradigm, which moves the majority of the complexity from the encoder to the decoder, this service can be provided by introducing a transcoder. This device converts from the distributed video coding paradigm to a traditional video codec such as H.264/AVC, which has simpler decoders and more complex encoders, allowing the user devices to execute only the low-complexity algorithms. In order to deal with this highly complex video transcoding, this paper introduces a graphics processing unit based transcoder located at the base station. The use of graphics accelerators in this framework has not been proposed before in the literature and offers a new field to explore, with promising results. The proposed transcoder offers a time reduction of the whole process of over 79% with negligible rate distortion penalty.

Keywords: Distributed Video Coding, H.264/AVC, Graphic Processing Units, Transcoding, Heterogeneous Computing.
1 Introduction

Multimedia communications between mobile devices are becoming an important area of interest in telecommunications because of the advances in mobile networks (such as 4G). Nowadays, one of the most requested mobile services is video conferencing, where the transmitter and receiver devices may not have the necessary computing power or resources, or may have complexity constraints that prevent them from performing complex video algorithms (both coding and decoding). On the one hand, traditional video codecs such as H.264 Advanced Video Coding (AVC) [1] typically have highly complex encoders and less complex decoders. On the other hand, Distributed Video Coding (DVC) [2] has received great interest from the multimedia research community because it offers low-complexity encoders and more complex decoders. In other words, the DVC framework offers a reversal of the asymmetry in terms of complexity compared to traditional codecs such as H.264/AVC. This mobile to mobile scenario is depicted in Figure 1.

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 233–240, 2010. © Springer-Verlag Berlin Heidelberg 2010
Fig. 1. Video communication system using a DVC to H.264/AVC video transcoder
In order to achieve this low-complexity communication between both paradigms, a DVC to H.264/AVC video transcoder that converts the bitstream needs to be included in the network. Basically, both sending and receiving devices shift their complexity to the base station, resulting in less complex user devices. In return, the transcoder has to handle two complex processes: DVC decoding and H.264/AVC encoding. It is worth mentioning that this transcoder device does not have any computational restriction and is designed to be a high-performance processing unit with many resources. Recently, accelerator and multi-core processor devices such as Graphics Processing Units (GPUs), Cell Broadband Engines (Cell BEs), and Field-Programmable Gate Arrays (FPGAs) have come into use in high-performance computing. These small devices consist of tens or hundreds of homogeneous processing cores designed and organized with the goal of achieving higher performance. These new hardware opportunities therefore open a new door in the field of multimedia processing and computing; in particular, in the framework of DVC to H.264/AVC transcoders, which join two of the most time-consuming processes (the H.264/AVC encoding and DVC decoding algorithms). At this point, this paper proposes a DVC to H.264/AVC GPU-based video transcoder in which the H.264/AVC encoding algorithm (the second half of the proposed transcoder) is accelerated by means of parallel processing. The Motion Vectors (MVs) generated in the DVC side information process (the DVC motion estimation) are reused as MV predictors in the H.264/AVC encoding stage, and then the H.264/AVC motion estimation is executed in parallel on a GPU. In other words, the center point of the H.264/AVC search area is adjusted based on the incoming DVC MVs.
The proposed transcoder is a straightforward step because parallel processing has not been used before in the literature in the framework of DVC-based transcoders; all the previous DVC-based transcoders (to H.263 [3] and H.264/AVC [4]) are based on sequential execution. This paper is organized as follows: Section 2 presents the basics of DVC, H.264/AVC and GPUs. Section 3 presents the proposed GPU-based video transcoder, which is evaluated in Section 4. Finally, the conclusions are presented in Section 5.
2 Technical Background

2.1 Distributed Video Coding

DVC provides a new video coding paradigm in which the architecture is characterized by encoders that are less complex than decoders. On the encoder side, frames are labeled as Key Frames (K) and Wyner-Ziv Frames (WZ). K frames are encoded as Intra frames are in traditional codecs. However, WZ frames only store a few parity bits, and temporal correlation is not exploited. For this reason, the encoding procedure is much faster than in traditional encoders. On the decoder side, the DVC decoder receives the K frames first. From each pair of adjacent K frames, an estimation of the intermediate WZ frame, called Side Information (SI), is generated. Figure 2 shows the first step of the SI generation for a MacroBlock (MB) using two frames at positions k and k+n in the sequence; the SI then represents an approximation of the frame at position k+n/2. Every MB in the K frame k+n is matched with an MB in the K frame k. This matching is done by checking all the positions within the defined search area and choosing the one with the lowest residual. The displacement is quantified by a MV, and half of this MV represents the displacement of the interpolated MB. Afterwards, a channel decoding algorithm refines this SI by using the parity information, as specified in [2][5].
Fig. 2. First step of SI generation process
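For illustration, this MB matching step can be sketched in Python (a simplified sketch assuming an exhaustive SAD search over integer positions and hypothetical NumPy frame arrays; the actual DVC codec implementation differs in its details):

```python
import numpy as np

def si_motion_vector(frame_k, frame_kn, mb_pos, mb_size=16, search=8):
    """Match the MB of K frame k+n at mb_pos against K frame k (full SAD search).

    Returns the best MV and its half, i.e. the displacement used for the
    interpolated MB of the SI frame at position k+n/2.
    """
    y, x = mb_pos
    mb = frame_kn[y:y + mb_size, x:x + mb_size].astype(np.int64)
    best_mv, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            # Skip candidates falling outside the reference frame.
            if yy < 0 or xx < 0 or yy + mb_size > frame_k.shape[0] or xx + mb_size > frame_k.shape[1]:
                continue
            cand = frame_k[yy:yy + mb_size, xx:xx + mb_size].astype(np.int64)
            sad = int(np.abs(mb - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    half_mv = (best_mv[0] // 2, best_mv[1] // 2)
    return best_mv, half_mv
```

A block that moved by (dy, dx) between the two K frames is thus interpolated with half that displacement in the SI frame.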
2.2 Overview of H.264/AVC

H.264/AVC [1] is the most recent predictive video compression standard and outperforms previous video codecs. The H.264/AVC standard builds on those previous coding standards to achieve a compression gain of about 50%, largely at the cost of an increase in the computational complexity of the encoder. These compression gains are mainly due to variable and smaller block-size motion compensation, improved entropy coding, multiple reference frames, a smaller block transform and a deblocking filter, among others. The inter prediction in H.264/AVC supports motion compensation block sizes ranging from 16x16 to 4x4, with many options available between them. The Motion Estimation (ME) process is then carried out for each partition and sub-partition; this is known as the tree structured motion compensation algorithm. The ME process is therefore carried out many times per MB and, for this reason, consumes most of the time of the encoding algorithm. Moreover, the MVs of neighboring partitions are often highly correlated, and so each motion vector is
predicted from vectors of nearby, previously coded partitions. The predicted MV is computed as the component-wise median of the candidate MVs, taken from the left MB, the above MB, and the above-right MB relative to the current MB.

2.3 Graphics Processing Units

In the past few years, new heterogeneous architectures have come into use in high-performance computing [6]. An example of such an architecture is the GPU. GPUs are small accelerator devices with hundreds of cores organized in several Single Instruction Multiple Data (SIMD) blocks, designed with the goal of achieving high performance in graphics applications. GPUs are characterized by a high level of parallelism and are usually used as a coprocessor to assist the Central Processing Unit (CPU) in processing massive amounts of data. It should be taken into account that current GPUs can offer 10x higher main memory bandwidth and use data parallelism to achieve up to 10x more floating-point throughput than CPUs. Although GPUs can be used for general-purpose computing, they come primarily from multimedia and gaming applications. To facilitate the programming of these devices, GPU manufacturers provide diverse tools, function libraries, languages and extensions for the most commonly used high-level programming languages. For example, NVIDIA provides a powerful GPU architecture called Compute Unified Device Architecture (CUDA) [7]. This architecture allows a great number of threads to run the same code (kernel) simultaneously, taking advantage of the high computation capacity and main memory bandwidth.
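A minimal sketch of the MV predictor described in Section 2.2 (component-wise median of the left, above and above-right MVs; the partition-dependent special cases of the standard are omitted):

```python
def predicted_mv(mv_left, mv_above, mv_above_right):
    """Component-wise median of the three neighbouring MVs."""
    def median3(a, b, c):
        # The median of three values is their sum minus the min and the max.
        return a + b + c - min(a, b, c) - max(a, b, c)
    return (median3(mv_left[0], mv_above[0], mv_above_right[0]),
            median3(mv_left[1], mv_above[1], mv_above_right[1]))
```

It is exactly this neighbour dependency that complicates a parallel ME, as discussed later in Section 3.2.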
3 Proposed DVC to H.264/AVC GPU-Based Video Transcoder

This work proposes a DVC to H.264/AVC transcoding architecture from each DVC GOP to an H.264/AVC I11P GOP (baseline profile). This procedure is done efficiently by reusing the MVs calculated during the DVC decoding phase in order to determine the MV predictors for the H.264/AVC ME task. In consequence, the time spent in the overall transcoding process is largely reduced.

3.1 Allocation of Motion Vectors

In the DVC decoding stage, MVs are calculated during the SI generation process, as explained in Section 2.1. These MVs offer an approximation of the displacement between frames. For different DVC GOP lengths the decoding process changes, but MVs are selected in a similar way. For example, Figure 3 shows the mapping of MVs from a DVC GOP of length 4 to an H.264/AVC I11P GOP. In step 1, DVC decodes frame WZ2 using the K0 and K4 frames as references. As a result, the MVs V0-4 are available, but these MVs are not considered because references at a large distance do not provide good accuracy. In the second step, frame WZ1 is decoded using frames WZ0 and WZ'2 as references, and likewise frame WZ3 is decoded using frames WZ2 and WZ4 as references. The MVs generated in the second step (V0-2 and V2-4) provide better accuracy, so they will be used by H.264/AVC as predictors. However, as they are calculated for a distance of 2 and P frames have references
Fig. 3. Allocation of MVs from DVC GOP 4 to H.264/AVC I11P GOP
with distance 1, they are split into two halves. Notice that every DVC GOP has the last DVC decoding step in common, and the MVs used as predictors for H.264/AVC are selected in this step. Consequently, the MV extraction process is generic for every DVC GOP length.

3.2 H.264/AVC GPU-Based Execution

The improved H.264/AVC encoding algorithm, as part of the whole transcoding process, is presented in the following lines. The idea behind this approach is motivated by the fact that the ME is carried out many times in the H.264/AVC encoding algorithm. As explained in Section 2.2, the H.264/AVC encoding algorithm supports many MB partitions and calls the ME algorithm for each of them. As the number of partitions increases, the time consumption also increases. The GPU philosophy therefore fits well in this framework, because GPUs are SIMD computing devices; in fact, the ME algorithm is carried out over multiple data, namely the MB positions to be checked. Therefore, the H.264/AVC inter prediction (which includes the ME process) is executed on the GPU by using CUDA. For this purpose, the proposed algorithm is divided into three steps; all of them are executed sequentially, but each one is internally exploited in a highly parallel way on the GPU. The goal of the first kernel is to compute the Sum of Absolute Differences (SAD) between the current MB (split into sixteen 4x4 partitions) and all MB positions in the reference frame inside the search range. The second kernel then uses the previous 4x4 block SAD calculations to obtain the SAD costs for the different sub-partitions. Finally, the last kernel reduces the SAD costs to one SAD cost for each of the 41 MB partitions of each MB. More detail about the algorithm can be found in [8]. In a nutshell, the main challenge of this approach is to efficiently support the tree structured motion compensation algorithm of the H.264/AVC encoding algorithm.
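The MV allocation of Section 3.1 (splitting a distance-2 DVC MV into two distance-1 halves for consecutive P frames) can be sketched as follows; assigning the odd remainder to the second half is our assumption, not a detail stated in the paper:

```python
def split_mv(mv):
    """Split a distance-2 MV (dy, dx) into two distance-1 halves."""
    dy, dx = mv
    first = (dy // 2, dx // 2)               # floor division, as in integer MV halving
    second = (dy - first[0], dx - first[1])  # remainder goes to the second half
    return first, second
```

The two halves sum back to the original MV, so the overall displacement across the two P frames is preserved.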
In order to achieve that, the H.264/AVC encoding algorithm uses the SAD calculation, performed in parallel on the GPU, to determine the best MB partition. The main problem of performing the ME in parallel is that the MV predictors of neighboring MBs are not accessible for the current MB. As it has been
explained in Section 2.2, the search area for each MB is determined based on the MVs of its neighbors, but this information is not accessible because those MVs are being calculated at the same time as the current MB. In the present approach, the MVs generated in the DVC decoding algorithm, calculated as Section 3.1 explains, are used to determine the predicted search area. Figure 4 depicts this approach.
Fig. 4. Predicted search area based on the incoming DVC motion vectors
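The adjustment of the search-area centre based on the incoming DVC MV can be sketched as below; the clamping of the window to the frame borders is our assumption, not a detail given in the paper:

```python
def search_center(mb_pos, dvc_mv, frame_h, frame_w, mb_size=16, search=16):
    """Centre of the H.264/AVC search area, displaced by the DVC predictor MV."""
    cy = mb_pos[0] + dvc_mv[0]
    cx = mb_pos[1] + dvc_mv[1]
    # Clamp so that the whole (2*search + mb_size) window stays inside the frame.
    cy = max(search, min(cy, frame_h - mb_size - search))
    cx = max(search, min(cx, frame_w - mb_size - search))
    return cy, cx
```

Every thread can compute its own centre independently, which is what makes the ME amenable to the SIMD execution model.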
4 Performance Evaluation

In order to evaluate the performance of the proposed transcoder, four QCIF sequences were encoded at 30 fps by a DVC codec based on VISNET-II [9]. The QCIF format was selected because it is the most suitable format for mobile-to-mobile video communications, due to the reduced size of mobile displays and the low network bandwidth requirements. In the DVC encoding stage, 300 frames were encoded for each sequence using a QP matrix fixed to 7 [5] and GOP lengths of 2, 4 and 8. During the DVC decoding stage, MVs are passed to the H.264/AVC encoder as predictors. In the second stage of the transcoder, the H.264/AVC encoder converts each DVC GOP into an H.264/AVC I11P GOP using QPs = 28, 32, 36 and 40, as specified in Bjøntegaard and Sullivan's common test conditions [10]. In the simulations, the H.264/AVC JM reference software, version 15.1 [11], was used. The baseline profile with the default configuration was applied. In addition, RD-Optimization was turned off to make encoding suitable for real-time, low-complexity mobile devices. To evaluate the performance, the percentage of ME Time Reduction (%TR) reports the average time reduction of the ME reported by H.264/AVC over the four QP points under study. Table 1 shows the RD results for the proposed transcoder. As can be observed, the transcoder complexity is greatly reduced, reaching a TR of about 79% on average without significant RD penalties. This small RD drop is a consequence of the parallel execution, which cannot use the sequential standard process to calculate the predictors and instead uses an approximation provided by the DVC MVs. Moreover, similar results are observed for different GOPs due to the generic MV extraction procedure employed by the proposal. The last column of Table 1 shows the frames per second (fps) rate achieved for the whole encoding process, which reaches real-time encoding.
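The %TR metric used here is simply the relative time saving averaged over the four QP points; a minimal sketch:

```python
def time_reduction(t_reference, t_proposed):
    """Percentage time reduction of the proposed ME w.r.t. the reference."""
    return 100.0 * (t_reference - t_proposed) / t_reference

def mean_tr(reference_times, proposed_times):
    """%TR averaged over the QP points under study (here QP = 28, 32, 36, 40)."""
    trs = [time_reduction(r, p) for r, p in zip(reference_times, proposed_times)]
    return sum(trs) / len(trs)
```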
The present approach is a straightforward step in the framework of GPU-based transcoders and, thus, although the RD results are similar to those presented in [8] (without using
MVs), this approach provides a more accurate displacement of the search area. This could be extended in future work by using the incoming MVs to reduce the search area, reaching a better time reduction without a large increase in the RD penalty. In addition, Figure 5 displays the RD results from a graphical point of view. As shown, all QP points are very close and the proposal presents similar behavior for the different GOPs.

Table 1. Performance of the proposed transcoder for 30 fps QCIF sequences

RD performance of the WZ/H.264/AVC video transcoder – 30 fps
Sequence      GOP   ΔPSNR (dB)   ΔBitrate (%)   TR (%)   fps
Foreman        2    -0.191        4.60          80.53    27.12
               4    -0.217        4.80          80.54    26.95
               8    -0.161        4.12          80.88    26.83
Hall           2    -0.055        1.21          72.97    28.96
               4    -0.042        0.92          73.09    28.99
               8    -0.036        0.82          73.18    29.05
Coastguard     2    -0.118        2.95          82.47    27.11
               4    -0.102        2.33          81.90    27.07
               8    -0.105        2.37          81.71    27.13
Soccer         2    -0.216        5.11          81.31    26.57
               4    -0.213        5.26          81.97    26.84
               8    -0.201        5.08          82.11    26.65
Mean                -0.138        3.297         79.39    27.44
Fig. 5. PSNR/bitrate results for sequences with GOP = 2. Reference symbols: ●Foreman ♦Hall ▲CoastGuard ■ Soccer.
5 Conclusions

This paper presents a GPU-based video transcoder to efficiently support mobile to mobile communications. The incoming DVC MVs are used as candidates to define the predicted search area, and then the ME algorithm is executed in parallel on the
GPU. The presented transcoder shows that parallel computing in general, and GPUs in particular, are another efficient way to accelerate video coding algorithms. The improved transcoder depicted in this paper achieves a time reduction of 79% on average with negligible rate distortion penalty. Ongoing work aims to improve the DVC decoding part of the transcoder by also using parallel processing.

Acknowledgments. This work was supported by the Spanish MEC and MICINN, as well as European Commission FEDER funds, under Grants CSD2006-00046, TIN2009-14475-C04 and TIN2009-05737-E. It was also partly supported by The Council of Science and Technology of Castilla-La Mancha under Grants PEII09-0037-2328, PII2I09-0045-9916 and PCC08-0078-9856. The work presented was developed by using the VISNET2-WZ-IST software developed in the framework of the VISNET II project.
References

1. ITU-T and ISO/IEC JTC 1: Advanced Video Coding for Generic Audiovisual Services. ITU-T Rec. H.264/AVC and ISO/IEC 14496-10 Version 8 (2007)
2. Girod, B., Aaron, A., Rane, S., Monedero, D.R.: Distributed Video Coding. In: Proc. of IEEE Special Issue on Advances in Video Coding and Delivery, vol. 93(1), pp. 1–12 (2005)
3. Peixoto, E., Queiroz, R.L., Mukherjee, D.: A Wyner-Ziv Video Transcoder. IEEE Trans. Circuits and Systems for Video Technology (to appear, 2010)
4. Martínez, J.L., Kalva, H., Fernández-Escribano, G., Fernando, W.A.C., Cuenca, P.: Wyner-Ziv to H.264 video transcoder. In: 16th IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, pp. 2941–2944 (2009)
5. Ascenso, J., Brites, C., Pereira, F.: Improving frame interpolation with spatial motion smoothing for pixel domain distributed video coding. In: 5th EURASIP Conference on Speech and Image Processing, Multimedia Communications and Services, Smolenice, Slovak Republic (2005)
6. Feng, W.-c., Manocha, D.: High-performance computing using accelerators. Parallel Computing 33(10-11), 645–647 (2007)
7. NVIDIA: NVIDIA CUDA Compute Unified Device Architecture – Programming Guide, Version 2.2 (February 2009)
8. Rodriguez, R., Martínez, J.L., Fernández-Escribano, G., Claver, J.M., Sánchez, J.L.: Accelerating H.264 Inter Prediction in a GPU by using CUDA. In: Proceedings of IEEE International Conference on Consumer Electronics, Las Vegas, NV, USA (2010)
9. VISNET II project, http://www.visnet-noe.org/ (last visited March 2010)
10. Sullivan, G., Bjøntegaard, G.: Recommended Simulation Common Conditions for H.26L Coding Efficiency Experiments on Low-Resolution Progressive-Scan Source Material. ITU-T VCEG, Doc. VCEG-N81 (2001)
11. Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG: Reference Software to Committee Draft, JVT-F100 JM15.1 (2009)
Hybrid Color Space Transformation to Visualize Color Constancy Ramón Moreno, José Manuel López-Guede, and Alicia d’Anjou Computational Intelligence Group Universidad del País Vasco, UPV/EHU http://www.ehu.es/cwintco
Abstract. Color constancy and chromatic edge detection are fundamental problems in artificial vision. In this paper we present a way to provide a visualization of color constancy that works well even in dark scenes, where both humans and computer vision algorithms have difficulties due to noise. The method is a hybrid and nonlinear transform of the RGB image based on assigning the chromatic angle as the luminosity value in the HSV space. This chromatic angle is defined on the basis of the dichromatic reflection model, and thus has a physical model supporting it.

Keywords: Color Constancy, Chromatic Edge, Color Segmentation, Illumination Transform.
1 Introduction
Color constancy (CC) is a fundamental problem in artificial vision [4,10,15]; it has been the subject of neuropsychological research [1] and can be very influential in color clustering processes [2,11,7,3]. It is the ability of the human observer to identify the same surface color in spite of changes of environmental light, shadows and diverse degrees of noise. A related problem is that of Chromatic Edge (CE) detection, meaning the ability to detect the location of surface and scene color transitions corresponding to object boundaries. In the artificial vision framework, works ensuring CC or trying to perform CE detection must assume some color space; often they must estimate the illumination source chromaticity [6,14] and proceed by separating the diffuse and specular image components [9,12,16]. Usually, CC is associated with the diffuse component of the image. Measurements on human subjects lead to the conclusion that retinal processing is not enough to extract chromatic features and chromaticity-based structural image information. Some works demonstrate that CC analysis is done in the visual cortex, in areas V4 and V4A [1]. Assuming the analogy with the human
This work has been supported by the Ministerio de Ciencia e Innovación of Spain, TIN2009-05736-E/TIN.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 241–247, 2010. © Springer-Verlag Berlin Heidelberg 2010
vision biology, artificial vision systems need non-trivial processing to ensure CC results when processing real images. Dark scenes are critical for CC because dark image regions are usually very noisy; that is, the signal-to-noise ratio is very low due to the low magnitude of the visual signal. In these regions, the ubiquitous thermodynamic noise has an amplified effect that distorts region and edge detection under CC conditions. Our approach obtains remarkably good results in these critical regions. In this paper we present a hybrid and nonlinear transformation of the RGB image based on assigning the chromatic angle of the pixel (computed in the RGB space) as the luminosity value in the HSV space. The image is preprocessed to remove the specular component [13]. The chromatic angle is defined on the basis of the Dichromatic Reflection Model (DRM), and thus has a physical interpretation supporting it. In the HSV color space the intensity is represented by the V value, and changing it does not change the pixel's chromatic information. Thus, to visualize CC we assign a constant intensity to the pixels having common chromatic features, by assigning the chromatic angle as the V value in the HSV space. The paper has the following structure: Section 2 is a brief overview of the dichromatic reflection model (DRM). Section 3 presents our approach. Section 4 shows and explains the experimental results. Section 5 gives the conclusions and directions for further work.
2 Dichromatic Reflection Model (DRM) in the RGB Space
The Dichromatic Reflection Model (DRM) was introduced by Shafer [8]. It explains the perceived color intensity I ∈ R³ of each pixel in the image as the addition of two components: a diffuse component D ∈ R³ and a specular component S ∈ R³. The diffuse component refers to the chromatic properties of the observed surface, while the specular component refers to the illumination color. Surface reflections are pixels with a high specular component. The mathematical expression of the model, when there is only one surface color in the scene, is as follows:

    I(x) = md(x) D + ms(x) S,    (1)

where md and ms are weighting values for the diffuse and specular components, taking values in [0, 1]. In Figure 1, the striped region represents a convex region of the plane Πdc in RGB that contains all the possible colors expressed by the DRM equation (1). For a scene with several surface colors, the DRM equation must assume that the diffuse component may vary spatially, while the specular component is constant across the image domain:

    I(x) = md(x) D(x) + ms(x) S.
Fig. 1. Typical distribution of pixels in the RGB space according to the Dichromatic Reflection Model
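As a quick numerical illustration of the single-colour DRM of Eq. (1), pixels synthesized from one diffuse colour D and one illuminant S (both hypothetical values below) all lie in the plane spanned by D and S:

```python
import numpy as np

D = np.array([0.8, 0.2, 0.1])  # hypothetical diffuse (surface) colour
S = np.array([1.0, 1.0, 1.0])  # hypothetical specular (illuminant) colour

rng = np.random.default_rng(1)
md = rng.random(50)            # diffuse weights md(x) in [0, 1]
ms = 0.3 * rng.random(50)      # specular weights ms(x)

# I(x) = md(x) D + ms(x) S for 50 pixels, one per row.
pixels = np.outer(md, D) + np.outer(ms, S)

# All pixels lie in span{D, S}: the 50x3 pixel matrix has rank 2.
rank = np.linalg.matrix_rank(pixels)
```

This is exactly the planar distribution sketched in Figure 1.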
That the specular component is space invariant in both cases means that the illumination is constant over the whole scene. Finally, assuming several illumination colors, we have the most general DRM:

    I(x) = md(x) D(x) + ms(x) S(x),

where both the surface and illumination chromaticities are spatially variant. In the HSV color space, chromaticity is identified with the pair (H, S), and the V variable represents the luminosity or light intensity. Plotting in the RGB space a collection of color points that have constant (H, S) components and variable intensity, we have observed that chromaticity in the RGB space is geometrically characterized by a straight line crossing the RGB space's origin, determined by the φ and θ angles of the polar coordinates of the points on this chromaticity line. The plot of the pixels in a chromatically uniform image region appears as a straight line in the RGB space; we denote this diffuse line Ld. If the image has surface reflection bright spots, the plot of the pixels in these highly specular regions appears as another line Ls intersecting Ld. For diffuse pixels (those with a small specular weight ms(x)), the zenith φ and azimuthal θ angles are almost constant, while they change for specular pixels and change dramatically among diffuse pixels belonging to different color regions. Therefore, the angle between the vectors representing two neighboring pixels I(xp) and I(xq), denoted ∠(Ip, Iq), reflects the chromatic variation between them. For two pixels in the same chromatic region this angle must be ∠(Ip, Iq) = 0, because they are collinear in RGB space. The angle between Ip and Iq is calculated with the equation:

    ∠(Ip, Iq) = arccos( I(xp)ᵀ I(xq) / (‖I(xp)‖ ‖I(xq)‖) ).    (2)
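Eq. (2) translates directly into code; a minimal sketch (the clipping guards against rounding taking the cosine slightly outside [-1, 1]):

```python
import numpy as np

def chromatic_angle(ip, iq):
    """Angle between two RGB pixel vectors; 0 for pixels on the same chromatic line."""
    ip, iq = np.asarray(ip, dtype=float), np.asarray(iq, dtype=float)
    cosang = ip @ iq / (np.linalg.norm(ip) * np.linalg.norm(iq))
    return float(np.arccos(np.clip(cosang, -1.0, 1.0)))
```

Collinear pixels (same chromaticity, different intensity) give an angle of 0, while pixels from different color regions give a clearly larger angle.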
3 An Approach for Regular Region Intensity
The basic idea of our approach is to assign a constant luminosity to the pixels inside a homogeneous chromatic region. To do that, we must combine manipulations over
the two color space representations of the pixels, HSV and RGB. The process is highly nonlinear and is composed of the following steps:

1. Isolate the diffuse component by removing the specular component (ms = 0): we are interested only in the diffuse component because it represents the true surface color. We use the method presented in [12] to perform the diffuse and specular component separation.
2. Transform the diffuse RGB image into the HSV color space.
3. Compute for each pixel in the image the chromatic angle, i.e. the angle between the chromaticity line of the pixel and the gray diagonal line of the RGB space, which goes from the black origin to the pure white corner.
4. Assign the normalized chromatic angle as the new luminosity value in the HSV pixel representation.

In a homogeneous chromatic region, all pixels fall on the same diffuse line Ld: (r, g, b) = O + sσ, ∀s ∈ R⁺, where O = [0, 0, 0] and σ = [σr, σg, σb] is the region chromaticity. The chromatic reference is the pure white line Lpw, defined as Lpw: (r, g, b) = O + su, ∀s ∈ R⁺, where O = [0, 0, 0] and u = [1, 1, 1]. Therefore, if all pixels in a region belong to the same chromatic line, the angle between each pixel and the line Lpw must be the same, and the result of this angular measurement is constant for the whole region. Our strategy is to normalize this measure over its domain of definition (the RGB cube) and assume it as the constant luminosity value V. This method is expressed by the equation:

    V_new(x) = ∠(I(x), u) / arccos(ϑ),    (3)
where the denominator arccos(ϑ) is the normalization constant corresponding to the maximum angle between the extreme chromatic lines of the RGB space (the red, green and blue axes) and the pure white line. Algorithm 1 shows a Matlab/Scilab implementation of the method, where ϑ takes the value 1/√3 and arccos(ϑ) = 0.9553166.

Algorithm 1. Regular Region Intensity

function IR = SF3(I)
  Idiff = imDiffuse(I);                   // extract the diffuse component
  new_intensity = angle(Idiff, [1 1 1]);  // matrix of normalized chromatic angles
  Ihsv = rgb2hsv(Idiff);
  Ihsv(:,:,3) = new_intensity;            // assign the normalized angles as image intensity
  IR = hsv2rgb(Ihsv);
endfunction
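The per-pixel core of the method (steps 3 and 4, i.e. Eq. (3)) can be sketched with NumPy as below; this sketch assumes a float RGB image with no pure-black pixels and omits the diffuse/specular separation and the HSV round trip of Algorithm 1:

```python
import numpy as np

# Angle between an RGB axis and the pure-white line u = (1, 1, 1): arccos(1/sqrt(3)).
MAX_ANGLE = np.arccos(1.0 / np.sqrt(3.0))  # ~0.9553166

def new_intensity(rgb_image):
    """Normalized chromatic angle of each pixel w.r.t. the pure-white line (Eq. 3)."""
    img = np.asarray(rgb_image, dtype=float)
    dots = img.sum(axis=-1) / np.sqrt(3.0)  # I(x) . u / ||u||
    norms = np.linalg.norm(img, axis=-1)    # ||I(x)||
    cosang = np.clip(dots / norms, -1.0, 1.0)
    return np.arccos(cosang) / MAX_ANGLE    # normalized to [0, 1]
```

Achromatic (grey) pixels map to 0 and fully saturated primaries map to 1, so every pixel in a homogeneous chromatic region receives the same V value.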
4 Experimental Results
We present the results of three computational experiments: the first one uses a synthetic image and the remaining two use natural images. Figure 2 displays the first experimental results. Figure 2a is the original image. Figure 2b
Hybrid Color Space Transformation to Visualize Color Constancy
Fig. 2. Synthetic image results: (a) original image, (b) diffuse component of the image, (c) our method on image (a), (d) our method on image (b)
is the diffuse image obtained by applying the method in [13]. Figure 2c is the result of applying our proposed method to image 2a. Figure 2d displays the result of applying our method to image 2b. It can be appreciated that our method is able to identify the main chromatic regions even without component separation (figure 2c), with some artifacts due to the bright reflections. After removing these reflections, the method achieves a very clean identification of the chromatic regions. For the next experiments we use natural images that have previously been used by other researchers. Figures 3 and 4 show the experimental results. In both
Fig. 3. Natural image results: (a) original image, (b) diffuse component of the image, (c) our method on image (a), (d) our method on image (b)
R. Moreno, J.M. López-Guede, and A. d’Anjou
Fig. 4. Natural images: (a) original image, (b) diffuse component of the image, (c) our method on image (a), (d) our method on image (b)
cases subfigure (a) contains the original image, subfigure (b) shows the diffuse image, subfigure (c) displays the result of applying our proposed method to the original image (a), and subfigure (d) shows the result of applying our method to the diffuse image (b). In both experiments we can see a similar effect of applying specular correction. The images in (c), obtained without component separation, show better chromatic preservation, although with some degradation in the regions corresponding to the specular highlights. The images obtained after diffuse component identification [13] are less sensitive to specular effects; however, they show some chromatic region over-segmentation. It is important to note that no clustering process has been performed to obtain these images.
5
Conclusions and Further Works
In this work we present a color transformation that enables good visualization of color constancy in the image, changing only the image luminosity and preserving its chromaticity. The result is a new image with strong contrast between chromatically homogeneous regions, and good visualization of these regions as uniform regions in the image. The method performs very well in dark regions, which are critical for most CC methods and for image segmentation based on color clustering. The method could be the basis for such a process, applying the clustering to the chromaticity angle. We have found that specular correction of the image improves the results on highly specular regions of the image; however, our approach also performs well
on images that have not been preprocessed. Future work will address color edge detection and color image segmentation based on this approach. Hierarchical approaches may be useful [5].
References
1. Barbur, J.L., Spang, K.: Colour constancy and conscious perception of changes of illuminant. Neuropsychologia 46, 853–863 (2008); PMID: 18206187
2. Cheng, H.D., Jiang, X.H., Sun, Y., Wang, J.: Color image segmentation: advances and prospects. Pattern Recognition 34(12), 2259–2281 (2001)
3. Garcia-Sebastian, M., Gonzalez, A.I., Grana, M.: An adaptive field rule for non-parametric MRI intensity inhomogeneity estimation algorithm. Neurocomputing 72(16-18), 3556–3569 (2009); Financial Engineering; Computational and Ambient Intelligence (IWANN 2007)
4. Gijsenij, A., Gevers, T., van de Weijer, J.: Generalized gamut mapping using image derivative structures for color constancy. International Journal of Computer Vision 86(2), 127–139 (2010)
5. Graña, M., Torrealdea, F.J.: Hierarchically structured systems. European Journal of Operational Research 25, 20–26 (1986)
6. Choi, Y.-J., Yoon, K.-J., Kweon, I.S.: Illuminant chromaticity estimation using dichromatic slope and dichromatic line space. In: Korea-Japan Joint Workshop on Frontiers of Computer Vision, FCV, pp. 219–224 (2005)
7. Lezoray, O., Charrier, C.: Color image segmentation using morphological clustering and fusion with automatic scale selection. Pattern Recognition Letters 30(4), 397–406 (2009)
8. Shafer, S.A.: Using color to separate reflection components. Color Research and Applications 10, 43–51 (1984)
9. Shen, H.-L., Zhang, H.-G., Shao, S.-J., Xin, J.H.: Chromaticity-based separation of reflection components in a single image. Pattern Recognition 41, 2461–2469 (2008)
10. Skaff, S., Arbel, T., Clark, J.J.: A sequential bayesian approach to color constancy using non-uniform filters. Computer Vision and Image Understanding 113(9), 993–1004 (2009)
11. Tan, R.T., Nishino, K., Ikeuchi, K.: Color constancy through inverse-intensity chromaticity space. J. Opt. Soc. Am. A Opt. Image Sci. Vis. 21(3), 321–334 (2004)
12. Tan, R.T., Nishino, K., Ikeuchi, K.: Separating reflection components based on chromaticity and noise analysis. IEEE Trans. Pattern Anal. Mach. Intell. 26(10), 1373–1379 (2004)
13. Tan, R.T., Ikeuchi, K.: Reflection components decomposition of textured surfaces using linear basis functions. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, June 20-25, vol. 1, pp. 125–131 (2005)
14. Tan, R.T., Nishino, K., Ikeuchi, K.: Illumination chromaticity estimation using inverse-intensity chromaticity space. In: Proceedings of 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 18-20, vol. 1, pp. I-673–I-680 (2003)
15. Yoon, K.-J., Choi, Y.J., Kweon, I.-S.: Dichromatic-based color constancy using dichromatic slope and dichromatic line space. In: IEEE International Conference on Image Processing, ICIP 2005, September 11-14, vol. 3, pp. III–960–3 (2005)
16. Yoon, K.-J., Choi, Y., Kweon, I.S.: Fast separation of reflection components using a specularity-invariant image representation. In: IEEE International Conference on Image Processing, October 8-11, pp. 973–976 (2006)
A Novel Hybrid Approach to Improve Performance of Frequency Division Duplex Systems with Linear Precoding

Paula M. Castro, José A. García-Naya, Daniel Iglesia, and Adriana Dapena
Department of Electronics and Systems, University of A Coruña, Spain
{pcastro,jagarcia,dani,adriana}@udc.es
Abstract. Linear precoding is an attractive technique to combat interference in multiple-input multiple-output systems because it reduces costs and power consumption in the receiver equipment. Most of the frequency division duplex systems with linear precoding acquire the channel state information at the receiver by using supervised algorithms. Such algorithms make use of pilot symbols periodically sent by the transmitter. In a later step, the channel state information is sent to the transmitter side through a limited feedback channel. In order to reduce the overhead inherent to the periodical transmission of training data, we propose to acquire the channel state information by combining supervised and unsupervised algorithms, leading to a hybrid and more efficient approach. Simulation results show that the performance achieved with the proposed scheme is clearly better than that with standard algorithms. Keywords: Linear Precoding, MIMO Systems.
1
Introduction
The increasing demand for multimedia content has led to the continuous development of new techniques that try to improve the throughput of digital communication systems. For instance, current transmission standards for Multiple-Input Multiple-Output (MIMO) systems include so-called precoders in order to guarantee that the link throughput is maximized [1, 2]. Precoding algorithms for MIMO are classified into linear and nonlinear types. In the sequel, we consider Linear Precoding (LP) approaches because they achieve reasonable throughput with a complexity lower than that required by nonlinear precoding approaches. In order to implement precoding schemes, the base station must know the Channel State Information (CSI). However, in most Frequency Division Duplex (FDD) systems the transmitter (TX) cannot obtain the CSI from the received signals (even under the assumption of perfect calibration) because the channels are not reciprocal. The CSI is thus estimated at the receiver (RX) side and transmitted back through a limited feedback channel.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 248–255, 2010. Springer-Verlag Berlin Heidelberg 2010
Usually,
A Novel Hybrid Channel Estimation Approach with Linear Precoding
Fig. 1. MIMO System with linear transmit filter (linear precoding)
current standards perform the channel estimation by using supervised algorithms that make use of pilot symbols. Such pilot symbols do not convey information, and therefore both the system throughput and the spectral efficiency are penalized. In this paper, we propose to combine two important paradigms of Neural Networks: supervised and unsupervised learning. The kind of learning to be used is decided by a simple criterion that determines the time instant at which the channel has suffered a significant variation. At that moment, a supervised algorithm is employed to re-estimate the channel making use of pilot symbols. The rest of the time, when the channel variation is not significant enough, the unsupervised algorithm known as Infomax [3] is utilized.
2
System Model
We consider a MIMO system with Nt transmit antennas and Nr receive antennas. The precoder generates the transmit signal x from all data symbols u = [u1, . . . , uNr] corresponding to the different receive antennas 1, . . . , Nr. We denote the equivalent low-pass channel impulse response between the j-th transmit antenna and the i-th receive antenna as hi,j(τ, t). For flat fading channels, the channel matrix H(t) is given by

H(t) = [ h_{1,1}(t)   · · ·  h_{1,Nt}(t)
            ...        ...      ...
         h_{Nr,1}(t)  · · ·  h_{Nr,Nt}(t) ],
and the received signal is

y_j(t) = Σ_{i=1}^{Nt} h_{j,i}(t) x_i(t) + η_j(t),   i.e.,   y(t) = H(t)x(t) + η(t),    (1)
where ηj(t) is the additive noise, x(t) = [x1(t), . . . , xNt(t)]^T ∈ C^{Nt}, y(t) = [y1(t), . . . , yNr(t)]^T ∈ C^{Nr}, and η(t) = [η1(t), . . . , ηNr(t)]^T ∈ C^{Nr}. In general, if f[n] = f(nTs + Δ) denotes the samples of f(t) taken every Ts seconds, with Δ being the sampling delay and Ts the symbol time, then sampling y(t) every Ts seconds yields the discrete-time signal y[n] = y(nTs + Δ) given by

y[n] = H[q]x[n] + η[n],    (2)
where n = 0, 1, 2, . . . corresponds to samples spaced Ts seconds, and q denotes the time slot. The channel remains stationary during a block of NB symbols.
P.M. Castro et al.
Note that this discrete-time model is equivalent to the continuous-time model in (1) only if Inter-Symbol Interference (ISI) is avoided (i.e., if the Nyquist criterion is satisfied). In that case, we are able to reconstruct the original continuous-time signal from the samples. This channel model is known as a time-varying flat block-fading channel and will be assumed in the sequel. For brevity, we omit the slot index q from now on. At the TX side, a way to carry out the pre-equalization (or precoding) step consists in including a transmit filter matrix F ∈ C^{Nt×Nr} and an RX filter matrix G = gI ∈ C^{Nr×Nr}, leading to Nr scalar data streams. Figure 1 shows the resulting communications system, in which the data symbols u[n] are passed through the transmit filter F to form the transmit signal x[n] = F u[n] ∈ C^{Nt}. Note that the constraint on the transmit energy must be fulfilled. Therefore, the received signal is given by y[n] = HF u[n] + η[n] ∈ C^{Nr}, where H ∈ C^{Nr×Nt} and η[n] ∈ C^{Nr} is the Additive White Gaussian Noise (AWGN). After multiplying by the receive gain g, we get the estimated symbols

û[n] = gHF u[n] + gη[n] ∈ C^{Nr}.    (3)
Clearly, the restriction that all the receivers apply the same scalar weight g is not necessary for decentralized receivers; replacing G by a diagonal matrix suffices (e.g., [4]). However, usually no closed form can be obtained for the precoder if G is diagonal. Fortunately, F can be found in closed form for G = gI. Thus, we use G = gI in the following. Although Wiener filtering for precoding has only been considered by a few authors [5] in comparison with other criteria for precoding, it is a very powerful transmit optimization that minimizes the Mean Square Error (MSE) under a transmit energy constraint [6, 7, 8, 2], i.e.,

{F_WF, g_WF} = argmin_{F,g} E[ ‖u[n] − û[n]‖²₂ ],   s.t.: tr(F C_u F^H) ≤ E_tx,    (4)
where C_u = E[u[n]u^H[n]]. It has been demonstrated in [5] that (4) leads to a unique solution if we restrict g to be a positive real. Then, the solution for the Wiener filter is given by

F_WF = g_WF^{−1} (H^H H + ξI)^{−1} H^H,   g_WF = √( tr((H^H H + ξI)^{−2} H^H C_u H) / E_tx ),   ξ = tr(C_η) / E_tx.    (5)
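As a quick numerical sanity check of (5) (a sketch assuming numpy; the antenna counts, noise covariance, and energy budget are made-up values), the transmit Wiener filter satisfies the energy constraint of (4) with equality:

```python
import numpy as np

rng = np.random.default_rng(0)
Nr, Nt = 2, 4                                # hypothetical antenna counts
H = (rng.standard_normal((Nr, Nt))
     + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
Cu = np.eye(Nr)                              # unit-power data symbols
C_eta = 0.1 * np.eye(Nr)                     # assumed noise covariance
Etx = 1.0                                    # transmit energy budget

xi = np.trace(C_eta).real / Etx              # regularization term of (5)
Minv = np.linalg.inv(H.conj().T @ H + xi * np.eye(Nt))
g_wf = np.sqrt(np.trace(Minv @ Minv @ H.conj().T @ Cu @ H).real / Etx)
F_wf = (1.0 / g_wf) * Minv @ H.conj().T

# The transmit energy constraint of (4) is met with equality:
print(np.trace(F_wf @ Cu @ F_wf.conj().T).real)   # 1.0 up to rounding
```

The equality follows from the cyclic property of the trace: tr(F C_u F^H) = g_WF^{−2} tr((H^H H + ξI)^{−2} H^H C_u H) = E_tx.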
3
Adaptive Algorithms
The model explained in Section 2 states that the observations are linear and instantaneous mixtures of the transmitted signals x[n] of (2). For the case of the
linear precoder described in the previous section, this equation can be rewritten as

y[n] = HF u[n] + η[n].    (6)

This means that the observations y[n] are instantaneous mixtures of the data symbols u[n], where the mixing matrix is given by HF. In the sequel, we will denote this mixing matrix as A, so the observations y[n] can be obtained as

y[n] = Ad[n] + η[n].    (7)
Depending on our target, A may represent the channel matrix (see (2)) or the whole coding-channel matrix HF (see (6)). In the first case, d[n] represents the coded signal x[n] = F u[n] and, in the second case, the user data signal u[n]. We assume that the mixing matrix is unknown but full rank. Without any loss of generality, we can suppose that the source data have a normalized power equal to one, since possible differences in power may be included in the mixing matrix A. In order to recover the source data, we use a linear system whose output is a combination of the observations, expressed as

z[n] = W^H[n] y[n],  W ∈ C^{Nr×Nr}.    (8)
By combining (7) and (8), the output z[n] can be rewritten as a linear combination of the desired signal,

z[n] = Γ[n] d[n],    (9)

where Γ[n] = W^H[n] A represents the overall mixing/separating system. Sources are optimally recovered when the matrix W[n] is selected such that each output extracts a different single source. This occurs when the matrix Γ[n] has the form

Γ[n] = D[n] P[n],    (10)
where D[n] is a diagonal invertible matrix and P[n] is a permutation matrix. In this paper, we consider two types of Neural Network paradigms: supervised and unsupervised approaches.

Supervised Approach. A way to estimate the channel matrix H consists in minimizing the Mean Square Error (MSE) between the outputs W^H[n]y[n] and the coded signals x[n]. In particular, by considering only one sample, we obtain the Least Mean Squares (LMS) algorithm,

W[n + 1] = W[n] − μ y[n] (W^H[n] y[n] − d[n])^H.

This algorithm is also called the delta rule of Widrow-Hoff [9] in the context of Artificial Neural Networks. It is easy to prove that the stationary points of this rule are

W[n] = C_y^{−1} C_yd,    (11)
where C_y = E[y[n] y^H[n]] is the autocorrelation of the observations and C_yd = E[y[n] d^H[n]] is the cross-correlation between the observations and the desired signals. In practice, the desired signal is considered known only during a finite number of instants (pilot symbols), and the expectations are estimated by averaging samples.

Unsupervised Approach. The inclusion of pilot symbols reduces the system throughput (or equivalently, the spectral efficiency of the system) and wastes transmission energy, because pilot sequences do not convey user data. This limitation can be avoided by using Blind Source Separation (BSS) algorithms, which simultaneously estimate the mixing matrix A and the realizations of the source vector u[n] from the corresponding realizations of the observations y[n]. One of the best known BSS algorithms was proposed by Bell and Sejnowski in [3]. Given an activation function h(·), the idea is to obtain the weight coefficients of a Neural Network, W[n], that maximize the mutual information between the outputs h(z[n]) = h(W^H[n] y[n]) and its inputs y[n], which is given by

J_MI(W[n]) = ln(det(W^H[n])) + Σ_{i=1}^{NB} E[ln(h′_i(z_i[n]))],    (12)
where h_i is the i-th element of the vector h(z[n]) and ′ denotes the first derivative. The resulting algorithm, named Infomax, has the following form:
W[n + 1] = W[n] + μ W[n] W^H[n] ( y[n] g^H(z[n]) − W^{−H}[n] )
         = W[n] + μ W[n] ( z[n] g^H(z[n]) − I ).    (13)

The expression in (13) admits an interesting interpretation when the non-linear function g(z) = z*(1 − |z|²) is utilized. In this case, Castedo and Macchi [12] have shown that the Bell and Sejnowski rule is equivalent to the Constant Modulus Algorithm (CMA) proposed by Godard in [13].
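The two update rules can be sketched as one-step functions (assuming numpy; the mixing matrix, step size, and test values are made up). As a check, the noiseless solution W = A^{−H} is a fixed point of the supervised rule, and a scalar Infomax step can be verified by hand:

```python
import numpy as np

def lms_update(W, y, d, mu):
    """One step of the supervised delta rule (Widrow-Hoff):
    W <- W - mu * y * (W^H y - d)^H."""
    e = W.conj().T @ y - d
    return W - mu * np.outer(y, e.conj())

def infomax_update(W, y, mu):
    """One step of the Infomax rule (13) with g(z) = z* (1 - |z|^2)."""
    z = W.conj().T @ y
    g = np.conj(z) * (1.0 - np.abs(z) ** 2)
    return W + mu * W @ (np.outer(z, g.conj()) - np.eye(len(z)))

# Supervised check: at the noiseless solution W = A^{-H}, the error is zero,
# so W is a fixed point of the LMS rule.
A = np.array([[1.0, 0.5], [0.2j, 1.0]])   # made-up 2x2 mixing matrix
W = np.linalg.inv(A.conj().T)             # W = A^{-H}
d = np.array([1.0 + 0j, -1.0 + 0j])
y = A @ d
assert np.allclose(lms_update(W, y, d, mu=0.1), W)

# Unsupervised check, by hand on a scalar channel: z = 0.5, g(z) = 0.375,
# so W <- W + 0.1 * (0.5 * 0.375 - 1) * W = 0.91875.
Wn = infomax_update(np.array([[1.0 + 0j]]), np.array([0.5 + 0j]), mu=0.1)
assert abs(Wn[0, 0] - 0.91875) < 1e-12
```

In operation, the supervised rule runs only over pilot symbols, while the unsupervised rule adapts on payload data.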
4
Hybrid Approach
One of the advantages of adaptive unsupervised algorithms is their ability to track slow variations of the channel. In contrast, supervised solutions provide a fast channel estimation for slow or fast variations, at the cost of using pilot symbols. In this section, we combine these two paradigms in order to obtain a performance similar to that offered by supervised approaches, but using a lower number of pilot symbols. We denote by Wu[n] and Ws[n] the matrices for the unsupervised and the supervised modules, respectively. We start with an initial estimation of the channel matrix obtained using the Widrow-Hoff solution (11). This estimation is used at the TX in order to obtain the optimum coding matrix F, and at the RX with the goal of initializing the unsupervised algorithm to Wu[n] = (F H)^{−H}.
While the channel does not suffer a significant variation, the matrix Wu[n] is adapted (unsupervised mode) and the data symbols u[n] are recovered using z[n] = Wu^H[n] y[n]. However, when a significant variation is detected, the RX sends an alarm to the TX through the feedback channel. Next, a pilot sequence is transmitted. Then, at the RX, a supervised algorithm estimates the channel from the pilot symbols (channel estimation update). In particular, we make use of the Widrow-Hoff solution (11), considering that u[n] are the coded signals at the output of the linear precoder. This solution provides the channel matrix estimation, which is sent to the TX in order to adapt the coding matrix. The RX also computes the coding matrix F and the reference matrix ĤF, and initializes the unsupervised algorithm as Wu[n] = (ĤF)^{−H}.

The question now is how to determine when the channel has suffered a significant change. An interesting consequence of using a linear precoder is that the permutation indeterminacy (see (10)) associated with unsupervised algorithms is avoided thanks to the initialization Wu[n] = (F H)^{−H}. This means that the sources are recovered in the same order as they were transmitted. (10) implies that the optimum separation matrix produces a diagonal matrix Γ[n], and therefore the mismatch of Γ[n] with respect to a diagonal matrix allows us to measure the variations of the channel. Although the channel matrix is unknown, we can use the estimation ĤF computed by the supervised approach as a reference. This means that in each iteration we can compute Γ[n] = Wu^H[n] ĤF. Consequently, the difference with respect to a diagonal matrix can be obtained using the following error criterion:

Error(n) = Σ_{i=1}^{Nt} Σ_{j=1, j≠i}^{Nt} ( |γ_ij[n]|² + |γ_ji[n]|² ) / |γ_ii[n]|²,    (14)
where γ_ii[n] denotes the i-th element of the diagonal of Γ[n]. The decision rule consists in comparing this error with some threshold t, i.e.,

Error(n) > t → use the supervised approach,
Error(n) ≤ t → use the unsupervised approach.    (15)
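The detection criterion (14)-(15) can be sketched as follows (assuming numpy; the Γ matrices are made up, and t = 0.7 is the threshold used in the experiments of Section 5):

```python
import numpy as np

def channel_change_error(Gamma):
    """Mismatch of Gamma[n] w.r.t. a diagonal matrix, eq. (14)."""
    n = Gamma.shape[0]
    err = 0.0
    for i in range(n):
        for j in range(n):
            if j != i:
                err += (abs(Gamma[i, j]) ** 2
                        + abs(Gamma[j, i]) ** 2) / abs(Gamma[i, i]) ** 2
    return err

t = 0.7                                       # threshold of the decision rule (15)
Gamma_ok = np.eye(2)                          # separation still valid
Gamma_bad = np.array([[1.0, 0.8],
                      [0.9, 1.0]])            # large off-diagonal leakage
assert channel_change_error(Gamma_ok) <= t    # keep the unsupervised mode
assert channel_change_error(Gamma_bad) > t    # request pilots (supervised update)
```

The error is zero for a perfectly diagonal Γ[n] and grows with the energy leaking into the off-diagonal entries.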
5
Experimental Results
We evaluate the performance of the proposed combined scheme by simulations. We transmit 8 000 pixels of the image cameraman (in TIF format with 256 gray levels) using QPSK over a 4×4 MIMO system. The channel matrix is updated every 2 000 symbols using the following model: H = (1 − α)H + αH_new, where H_new is a 4×4 matrix randomly generated according to a Gaussian distribution. The SNR has been fixed to 20 dB. We compare the performance of three different schemes (see Figure 2): the Widrow-Hoff solution (11) with 200 pilot symbols transmitted every 2 000 symbols (supervised approach); the Infomax algorithm (13) with the non-linear
Fig. 2. Performance results (see Section 5): BER (top) and number of channel estimation updates (bottom) versus the channel updating parameter α, for a channel that remains constant during a random number of symbols between 2000 and 3000 (left) and during a fixed number of symbols, 2000 (right); curves are shown for the supervised, unsupervised, and hybrid approaches
function g(z) = z*(1 − |z|²) and μ = 0.001 (unsupervised approach); and the hybrid approach with a threshold t = 0.7. The left-hand side of Figure 2 shows the results when the channel remains constant during a random number of symbols between 2 000 and 3 000. The right-hand side of Figure 2 plots the results when the channel remains constant during NB = 2 000 symbols. The top of Figure 2 shows the Bit Error Ratio (BER) obtained for all approaches. The bottom shows the number of times the mixing matrix has been estimated, and updated, using the supervised approach. Comparing the curves in Figure 2, we observe that the BER offered by the hybrid system is invariant to the number of symbols during which the channel remains constant.
6
Conclusion
In order to reduce the overhead due to the transmission of pilot symbols we have proposed to combine supervised and unsupervised algorithms. The algorithm selection was done by using a simple decision rule to determine a significant variation in the channel. This information was sent to the TX using a limited
feedback channel. The experimental results showed that the hybrid approach is an attractive solution because it provides an adequate BER with a reduced number of pilot symbols.
Acknowledgment
This work has been supported by Xunta de Galicia, Ministerio de Ciencia e Innovación of Spain, and FEDER funds under the grants 09TIC008105PR, TEC2007-68020-C04-01, CSD2008-00010, and TIN2009-05736-E/TIN.
References
[1] Fischer, R.F.H.: Precoding and Signal Shaping for Digital Transmission. John Wiley & Sons, Chichester (2002)
[2] Joham, M.: Optimization of Linear and Nonlinear Transmit Signal Processing. PhD dissertation, Munich University of Technology (2004)
[3] Bell, A., Sejnowski, T.: An Information-Maximization Approach to Blind Separation and Blind Deconvolution. Neural Computation 7(6), 1129–1159 (1995)
[4] Hunger, R., Joham, M., Utschick, W.: Extension of Linear and Nonlinear Transmit Filters for Decentralized Receivers. In: European Wireless 2005, vol. 1, pp. 40–46 (2005)
[5] Joham, M., Kusume, M., Gzara, M.H., Utschick, W., Nossek, J.A.: Transmit Wiener Filter for the Downlink of TDD DS-CDMA Systems. In: Proc. ISSSTA, vol. 1, pp. 9–13 (2002)
[6] Choi, R.L., Murch, R.D.: New Transmit Schemes and Simplified Receiver for MIMO Wireless Communication Systems. IEEE Transactions on Wireless Communications 2(6), 1217–1230 (2003)
[7] Karimi, H.R., Sandell, M., Salz, J.: Comparison between Transmitter and Receiver Array Processing to Achieve Interference Nulling and Diversity. In: Proc. PIMRC 1999, vol. 3, pp. 997–1001 (1999)
[8] Nossek, J.A., Joham, M., Utschick, W.: Transmit Processing in MIMO Wireless Systems. In: Proc. 6th IEEE Circuits and Systems Symposium on Emerging Technologies: Frontiers of Mobile and Wireless Communication, Shanghai, pp. 1–18 (2004)
[9] Haykin, S.: Neural Networks. A Comprehensive Foundation. Macmillan College Publishing Company, New York (1994)
[10] Amari, S.-I.: Gradient Learning in Structured Parameter Spaces: Adaptive Blind Separation of Signal Sources. In: Proc. WCNN 1996, pp. 951–956 (1996)
[11] Mejuto, C., Castedo, L.: A Neural Network Approach to Blind Source Separation. In: Proc. Neural Networks for Signal Processing VII, pp. 486–595 (1997)
[12] Castedo, L., Macchi, O.: Maximizing the Information Transfer for Adaptive Unsupervised Source Separation. In: Proc. SPAWC 1997, pp. 65–68 (1997)
[13] Godard, D.N.: Self-Recovering Equalization and Carrier Tracking in Two-Dimensional Data Communication Systems. IEEE Transactions on Communications (1980)
Low Bit-Rate Video Coding with 3D Lower Trees (3D-LTW)

Otoniel López, Miguel Martínez-Rach, Pablo Piñol, Manuel P. Malumbres, and José Oliver
Miguel Hernández University, Avda. Universidad s/n, 03202 Elche, Spain
Universidad Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain
{otoniel,mmrach,pablop,mels}@umh.es, [email protected]
Abstract. The 3D-DWT is a mathematical tool of increasing importance in applications that require efficient processing of volumetric information. However, the huge memory requirement of the algorithms that compute it is one of the main drawbacks in practical implementations. In this paper, we introduce a fast frame-based 3D-DWT video encoder with low memory usage, based on lower trees. In this scheme, there is no need to divide the input video sequence into groups of pictures (GOPs), and it can be applied in a continuous manner, so that no boundary effects between GOPs appear. Keywords: 3D-DWT, wavelet-based video coding.
1
Introduction
In recent years, the three-dimensional wavelet transform (3D-DWT) has focused the attention of the research community, most of all in areas such as video watermarking and 3D coding (e.g., compression of volumetric data [1] or multispectral images [2], 3D model coding [3], and especially, video coding). In video compression, some early proposals were based on merely applying the wavelet transform along the time axis after computing the 2D-DWT for each frame [4]. Then, an adapted version of an image encoder can be used, taking into account the new dimension. For instance, the two-dimensional (2D) embedded zero-tree (IEZW) method has been extended to 3D IEZW for video coding by Chen and Pearlman [5], and showed promise of an effective and computationally simple video coding system without motion compensation, obtaining excellent numerical and visual results. A 3D zero-tree coding through modified EZW has also been used with good results in compression of volumetric images [6]. In [4], instead of the typical quad-trees of image coding, a tree with eight descendants per coefficient is used to extend the SPIHT image encoder to 3D video coding.
Thanks to the Spanish Ministry of Education and Science for funding under grant DPI2007-66796-C03-03.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 256–263, 2010. c Springer-Verlag Berlin Heidelberg 2010
Low Bit-Rate Video Coding with 3D Lower Trees (3D-LTW)
257
A more efficient strategy for video coding with time filtering is Motion Compensated Temporal Filtering (MCTF) [7]. In MCTF, in order to compensate object (or pixel) misalignment between frames, and hence avoid the significant amount of energy that appears in high-frequency subbands, a motion compensation algorithm is introduced to align all the objects (or pixels) in the frames before they are temporally filtered. In all these applications, the first problem that arises is the extremely high memory consumption of the 3D wavelet transform if the regular algorithm is used, since a group of frames must be kept in memory before applying temporal filtering; and in the case of video coding, we know that the greater the temporal decorrelation, the greater the number of frames needed in memory. So, the GOP size should be small in order to prevent high memory usage. This leads to another problem, since dividing the video sequence into small GOPs hinders the temporal decorrelation and may produce visual artifacts at the GOP boundaries. Even though several proposals have been made to avoid the aforementioned problems, most of them are not general (valid for any wavelet transform) and/or complete (the wavelet coefficients are not the same as those from the usual dyadic wavelet transform). In addition, software implementation is not always easy. In this paper, we propose a video encoder based on a frame-by-frame 3D-DWT scheme which does not require GOP division, significantly reduces the memory usage, and performs the 3D-DWT much faster than traditional algorithms.
2
3D-DWT with Low Memory Usage
In this section we propose an extension of the classical line-based approach [8], which computes the 2D-DWT with reduced memory consumption, to the three-dimensional wavelet transform. In the new approach, frames are continuously input, with no need to divide the video sequence into GOPs. Moreover, the algorithm yields slices of wavelet subbands (which we call subband frames) as soon as it has enough frames to compute them. This approach works as follows. The algorithm starts requesting LLL frames from the last level (GetLLLframe(nlevel) in Fig. 1). As seen in Fig. 2, the nlevel buffer must be filled with subband frames from level nlevel−1 before it can generate frames. In order to get them, this function recursively calls itself until level 0 is reached. At this point, it no longer needs to call itself, since it can return a frame from the video sequence, which can be directly read from the input/output system. The first time the recursive function is called at each level, its buffer (buffer_level) is empty. Then, its upper half (from N to 2N) is recursively filled with frames from the previous level. Recall that once a frame is received, it must be transformed using a 2D-DWT before being stored. Once the upper half is full, the lower half is filled by using symmetric extension. On the other hand, if the buffer is not empty, it simply has to be updated. In order to update it, it is shifted one position, so that the frame contained in the first position is discarded and a new frame can be introduced in the last position (2N) by means of a recursive call. This operation is repeated twice.
258
O. L´ opez et al.
However, if there are no more frames in the previous level, this recursive call will return End Of Frame (EOF). That points out that we are about to finish the computation at this level, but we still need to continue filling the buffer. We fill it by using symmetric extension again. Once the buffer is filled or updated, both high-pass and low-pass filter banks are applied to the frames in the buffer. As a result of the convolution, we get a frame of every wavelet subband at this level (HHLlevel , HLHlevel , HHHlevel , HLLlevel , LHLlevel , LLHlevel and LHHlevel ), and an LLL frame. The highfrequency coefficients are compressed and this function returns the LLL frame (see Fig. 2). For more details about frame-by-frame 3D-DWT, and a formal description of the algorithm, the reader is referred to [9]. function LowMemUsage3D FWT(nlevel) set F ramesReadlevel = 0 ∀level ∈ nlevel set bufferlevel = empty ∀level ∈ nlevel repeat LLL = GetLLLframe(nlevel) if (LLL != EOF) ProcessLowFreqSubFrame(LLL) until LLL = EOF end of fuction
Fig. 1. Performing the 3D-FWT by calling the GetLLLFrame recursive function
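The recursion of Figs. 1 and 2 can be mirrored by a much-simplified Python sketch (an illustration only: scalar "frames" and a 2-tap Haar temporal filter stand in for real frames, the 2D-DWT, and symmetric extension), showing how each level pulls low-pass frames from the level below while keeping only a small per-level buffer:

```python
def haar_lowpass(stream):
    """Consume 'frames' from `stream` two at a time (a 2-frame buffer);
    yield temporal low-pass frames. The high-pass frame would be
    compressed at this point in a real encoder."""
    while True:
        a = next(stream, None)
        b = next(stream, None)
        if a is None or b is None:
            return                            # EOF (no symmetric extension here)
        low, high = (a + b) / 2, (a - b) / 2  # Haar analysis along time
        yield low

def lll_frames(frames, nlevel):
    """Chain one streaming stage per decomposition level, mimicking the
    recursive GetLLLFrame calls of Fig. 2."""
    stream = iter(frames)
    for _ in range(nlevel):
        stream = haar_lowpass(stream)
    return list(stream)

print(lll_frames([1, 3, 5, 7], 1))  # [2.0, 6.0]
print(lll_frames([1, 3, 5, 7], 2))  # [4.0]
```

Because each stage only buffers two inputs at a time, the memory usage grows with the number of levels rather than with the sequence length, which is the key property of the frame-by-frame scheme.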
3
3D LTW
This section introduces the extension of the LTW still image encoder [10] to 3D video coding. Our main concern is to keep the same simplicity of the 2D LTW, while still giving high performance and low memory requirements. However, some changes must be made to the LTW algorithm so that it can be incorporated into this efficient wavelet transform. The main changes are:

– Global knowledge of the video frame is no longer available, and therefore an estimation of the highest coefficient that may appear has to be made, depending mainly on the type of wavelet normalization and the pixel resolution of the source video (in bpp). Finally, to ensure the correctness of the encoder, an escape code is used for values outside the predicted range.
– Since coefficients from different subband levels are interleaved (due to the computation order of the proposed wavelet transform), instead of a single bitstream we generate a different bitstream for every subband level. These bitstreams can be held in memory or saved in secondary storage, and are employed to form the final ordered bitstream.
– Now, the root of a tree has eight descendants, instead of the four descendants of the 2D-LTW.

Fig. 3 shows our overall system. The 3D-DWT module releases subband frames at different decomposition levels. At each level the subband frames are stored
Low Bit-Rate Video Coding with 3D Lower Trees (3D-LTW)
259
function GetLLLFrame(level)
  1) First base case: no more frames to read at this level
  if FramesRead_level = MaxFrames_level
    return EOF
  2) Second base case: the current level belongs to the space domain and not to the wavelet domain
  else if level = 0
    return InputFrame()
  else
    3) Recursive case
    3.1) Recursively fill or update the buffer for this level
    if buffer_level is empty
      for i = N .. 2N
        buffer_level(i) = 2DFWT(GetLLLframe(level - 1))
      FullSymmetricExtension(buffer_level)
    else
      repeat twice
        Shift(buffer_level)
        frame = GetLLLframe(level - 1)
        if frame = EOF
          buffer_level(2N) = SymmetricExt(buffer_level)
        else
          buffer_level(2N) = 2DFWT(frame)
    3.2) Calculate the WT for the time direction from the frames in the buffer, then process the resulting high frequency subband frames
    {LLL, LLH, LHL, LHH} = Z-axis_FWT_LowPass(buffer_level)
    {HLL, HLH, HHL, HHH} = Z-axis_FWT_HighPass(buffer_level)
    ProcessSubFrames({LLH, LHL, LHH, HLL, HLH, HHL, HHH})
    set FramesRead_level = FramesRead_level + 1
    return LLL
end of function
Fig. 2. GetLLLFrame Recursive function
in a dedicated encoder buffer. There are two subband frames for each subband type. When this buffer is full, the 3D-DWT encoder processes all subbands and maintains the significance map for building the trees. An important difference between this version and the LTW presented previously is that the new adapted encoder must process coefficients in only one pass, and therefore symbols must be computed and output at once. However, in this case, it is not an important drawback because the order of the wavelet coefficients is later arranged for the decoder with an independent bitstream per decomposition level. The 3D-LTW algorithm is formally described in Fig. 4. Let us see it in some detail. The encoder has to determine whether each 2x2 block of coefficients of both subband frames stored in the encoding buffer is part of a lower-tree. If the eight coefficients in these blocks are lower than the quantization threshold 2^rplanes, and their descendant offspring are also insignificant, they are part of a lower-tree and do not need to be encoded. In order to know if their offspring are significant, we need to hold a binary significance map for every encoder buffer (S^L in the figure), because the encoder buffer is overwritten by the wavelet transform once it is encoded, and hence the significance of their ascendant coefficients is not automatically held. Obviously, this significance map was not needed in the original LTW because the whole image was available to the encoder. The size of each significance map is one eighth of the size of the encoder buffer it represents, since the significance is held jointly for both 2x2 blocks. The significance of
O. López et al.
N level buffers LLLnlevel Bits
Buffer size (Width/2nlevel-1)x(Height/2nlevel-1)
FraameͲbased 3DDWTT
HLL2
HLH2
LHL2
LHH2
HHL2
HHH2 Buffers Length
LLH2
Buffer size (Width/4)x(Height/4) HLL1 buffer
HLH1 buffer
LHL1 buffer
LHH1 buffer
HHL1 buffer
HHH1 buffer
LLH1 buffer
.. .
S2 Significance map
2nd level bitstream
S1 Significance map
B ff Buffers Length
Buffer size (Width/2)x(Height/2) Video Frames (level=0) (INPUT)
TreeͲbased Subband Encoder
.. .
1st level bitstream
Final Bitstream (OUTPUT)
Fig. 3. Overview of the proposed tree-based encoder with efficient use of memory
both 2x2 blocks can be held with a single bit. Therefore, the memory required for these significance maps is almost negligible compared with the rest of the buffers. As in the original LTW encoder, when there is a significant coefficient in either 2x2 block or in its descendant coefficients, we need to encode each coefficient separately. Recall that in this case, if a coefficient and all its descendants are insignificant, we use the LOWER symbol to encode the entire tree, but if it is insignificant and the significance map of its eight direct descendant coefficients shows that it has a significant descendant, the coefficient is encoded as ISOLATED_LOWER. Finally, when a coefficient is significant, it is encoded with a numeric symbol along with its significant bits and sign. At the last level (N), the tree cannot be propagated upward, and for this reason, we always encode all the coefficients at this level. Moreover, we can keep the compressed bitstream in memory, which allows us to invert the order of the bitstream for the inverse procedure.
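The significance test described above can be sketched as follows. This is a simplified illustration, not the authors' code: the helper name and the toy layout of two buffered subband frames are assumptions, and tree propagation across levels is omitted:

```python
import numpy as np

def block_significance(sub_a, sub_b, rplanes):
    """One significance bit per block pair: a 2x2 block taken from each of
    the two buffered subband frames (eight coefficients in total) is
    insignificant when every |coefficient| is below the quantization
    threshold 2**rplanes; such blocks can be absorbed into a lower-tree."""
    threshold = 2 ** rplanes
    h, w = sub_a.shape
    sig_map = np.zeros((h // 2, w // 2), dtype=bool)
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            block = np.concatenate([sub_a[y:y + 2, x:x + 2].ravel(),
                                    sub_b[y:y + 2, x:x + 2].ravel()])
            sig_map[y // 2, x // 2] = np.any(np.abs(block) >= threshold)
    return sig_map

a = np.array([[1, 1, 9, 1],
              [1, 1, 1, 1]])
b = np.zeros_like(a)
print(block_significance(a, b, rplanes=3))  # [[False  True]]
```

With rplanes = 3 the threshold is 8, so only the block containing the coefficient 9 is marked significant; one boolean per 2x2x2 block is why the map costs an eighth of the buffer it summarizes.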
4 Results
In this section we analyze the behavior of the proposed encoder (3D-LTW). We compare the 3D-LTW encoder with the fast M-LTW Intra video encoder [11], 3D-SPIHT [12] and H.264 (JM16.1 version), in terms of R/D performance, coding and decoding delay, and memory requirements. All the evaluated encoders have been tested on an Intel Pentium M Dual Core 3.0 GHz with 1 GByte of RAM. In the frame-by-frame 3D wavelet transform, each buffer must be able to keep 2N + 1 (the filter length) low frequency frames at every level, and each buffer at a level i needs a quarter of the coefficients compared with the previous level (i − 1). Therefore, for a frame size of (w × h) and an nlevel time decomposition, the number of coefficients required by this algorithm is:
function SubbandCode(level, Buffer, S^(level-1), S^level)
  scan Buffer in 2x2 blocks (B_x,y) in horizontal raster order
  for each block B_x,y = {c_2x,2y, c_2x+1,2y, c_2x,2y+1, c_2x+1,2y+1}
    if level != N and c_i,j < 2^rplanes and S^(level-1)_i,j is insignificant, for all c_i,j in B_x,y
      set S^level_x,y = insignificant
    else
      set S^level_x,y = significant
      for each c_i,j in B_x,y
        if c_i,j < 2^rplanes
          if S^(level-1)_i,j is insignificant
            arithmetic_output LOWER
          else
            arithmetic_output ISOLATED_LOWER
        else
          nbits_i,j = log2(|c_i,j|)
          if S^(level-1)_i,j is insignificant
            arithmetic_output nbits^LOWER_i,j
          else
            arithmetic_output nbits_i,j
          output bit_(nbits_i,j - 1)(|c_i,j|) .. bit_(rplane+1)(|c_i,j|)
          output sign(c_i,j)
        endif
      endif
end of function

Note: bit_n(C) is a function that returns the nth bit of C
Fig. 4. Lower tree wavelet coding with reduced memory usage

  Σ_{n=0}^{∞} (2N + 1) × (w × h) / 4^n = (2N + 1) × (w × h) × 4/3    (1)
which is asymptotically (as nlevel approaches infinity) independent of the number of frames to be encoded, and less than the regular case, which needs (w × h × G) coefficients, G being the number of frames in a GOP. In Table 1, the memory requirements of the encoders under test are shown. Obviously, the M-LTW encoder only uses the memory needed to store one frame. The 3D-LTW encoder (using the Daubechies 9/7F filter for both spatial and temporal filtering) uses up to 3.4 times less memory than 3D-SPIHT for CIF sequence size and up to 9 times less memory than H.264 for QCIF sequence size. Regarding R/D, in Fig. 5 we can see the R/D behavior of all evaluated encoders. As shown, H.264 is the one that obtains the best results, mainly due to the motion estimation/motion compensation (ME/MC) stage included in this encoder, contrary to 3D-SPIHT and 3D-LTW, which do not include any ME/MC stage. It is interesting to see the improvement of 3D-SPIHT and 3D-LTW when

Table 1. Memory requirements for evaluated encoders (KB) (results obtained with the Windows XP task manager, peak memory usage index)

Codec/Format   H.264   3D-SPIHT   3D-LTW   M-LTW
QCIF           35824   10152      4008     1104
CIF            86272   34504      10644    1540
Fig. 5. PSNR (dB) for all evaluated encoders for (a) Container sequence in QCIF format and (b) Foreman sequence in CIF format
Fig. 6. Execution time comparison of the encoding process
compared to an INTRA video encoder. As mentioned, no ME stage is included in 3D-SPIHT and 3D-LTW, so this improvement is accomplished by exploiting only the temporal redundancy among video frames. The R/D behavior of 3D-SPIHT and 3D-LTW is similar for sequences with moderate-to-high motion activity, with 3D-LTW slightly better than 3D-SPIHT (up to 0.5 dB), but for sequences with low movement, 3D-SPIHT outperforms 3D-LTW, mainly due to the further dyadic decompositions applied to the temporal high frequencies. Regarding coding delay, in Fig. 6 we can see that the 3D-LTW encoder is the fastest one, being up to 10 times faster than 3D-SPIHT for QCIF size sequences, 3.5 times faster than the M-LTW INTRA video encoder and up to 3800 times faster than H.264. The decoding process is also faster in 3D-LTW than in the other encoders.
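As a side check, the buffer occupation given by Eq. (1) can be evaluated numerically. This sketch uses hypothetical names, and N = 4 is an illustrative choice of half filter length; it shows how the total approaches the (2N + 1) × (w × h) × 4/3 bound as the number of temporal levels grows, and stays well below the w × h × G cost of a GOP-based transform:

```python
def buffer_coefficients(N, w, h, nlevel):
    """Sum of buffer sizes for Eq. (1): (2N+1) low-pass frames per level,
    each level a quarter the size of the previous one (geometric series)."""
    return sum((2 * N + 1) * (w * h) / 4 ** n for n in range(nlevel))

w, h, N = 352, 288, 4                      # CIF frame, N = 4 assumed
bound = (2 * N + 1) * w * h * 4 / 3        # the nlevel -> infinity limit
print(buffer_coefficients(N, w, h, nlevel=6) < bound)             # True
print(round(buffer_coefficients(N, w, h, nlevel=20) / bound, 6))  # 1.0
print(buffer_coefficients(N, w, h, nlevel=6) < w * h * 64)        # True: below a 64-frame GOP
```

The series converges quickly: six levels already reach the asymptote to within a fraction of a percent, which is why memory usage is essentially independent of the sequence length.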
5 Conclusions
In this paper a fast 3D-DWT encoder with very low memory requirements has been presented. The new encoder reduces the memory requirements compared with 3D-SPIHT (3.5 times less memory) and H.264 (up to 10 times less memory). The new 3D-DWT encoder is very fast (up to 10 times faster than 3D-SPIHT) and it has better R/D behavior than the INTRA video coder M-LTW (up to 11 dB). In order to improve the coding efficiency, an ME/MC stage could be added. In
this manner, the objects/pixels of the input video sequence would be aligned, and so fewer high frequencies would appear in the higher frequency subbands, improving the compression performance.

Acknowledgments. Thanks to the Spanish Ministry of Science and Innovation for funding under grant TIN2009-05737-E.
References
1. Schelkens, P., Munteanu, A., Barbariend, J., Galca, M., Giro-Nieto, X., Cornelis, J.: Wavelet coding of volumetric medical datasets. IEEE Transactions on Medical Imaging 22(3), 441–458 (2003)
2. Dragotti, P., Poggi, G.: Compression of multispectral images by three-dimensional SPIHT algorithm. IEEE Transactions on Geoscience and Remote Sensing 38(1), 416–428 (2000)
3. Aviles, M., Moran, F., Garcia, N.: Progressive lower trees of wavelet coefficients: Efficient spatial and SNR scalable coding of 3D models. In: Ho, Y.-S., Kim, H.-J. (eds.) PCM 2005. LNCS, vol. 3767, pp. 61–72. Springer, Heidelberg (2005)
4. Kim, B., Xiong, Z., Pearlman, W.: Low bit-rate scalable video coding with 3D set partitioning in hierarchical trees (3D SPIHT). IEEE Transactions on Circuits and Systems for Video Technology 10, 1374–1387 (2000)
5. Chen, Y., Pearlman, W.A.: Three-dimensional subband coding of video using the zero-tree method. In: Visual Communications and Image Processing, Proc. SPIE, March 1996, vol. 2727, pp. 1302–1309 (1996)
6. Luo, J., Wang, X., Chen, C., Parker, K.: Volumetric medical image compression with three-dimensional wavelet transform and octave zerotree coding. In: Visual Communications and Image Processing, Proc. SPIE, March 1996, vol. 2727, pp. 579–590 (1996)
7. Secker, A., Taubman, D.: Motion-compensated highly scalable video compression using an adaptive 3D wavelet transform based on lifting. In: IEEE International Conference on Image Processing, October 2001, pp. 1029–1032 (2001)
8. Chrysafis, C., Ortega, A.: Line-based, reduced memory, wavelet image compression. IEEE Transactions on Image Processing 9(3), 378–389 (2000)
9. Oliver, J., Lopez, O., Martinez-Rach, M., Malumbres, M.: A general frame-by-frame wavelet transform algorithm for a three-dimensional analysis with reduced memory usage. In: IEEE International Conference on Image Processing, October 2007, pp. 469–472 (2007)
10. Oliver, J., Malumbres, M.P.: Low-complexity multiresolution image compression using wavelet lower trees. IEEE Transactions on Circuits and Systems for Video Technology 16(11), 1437–1444 (2006)
11. Lopez, O., Martinez-Rach, M., Piñol, P., Malumbres, M., Oliver, J.: M-LTW: A fast and efficient intra video codec. Signal Processing: Image Communication 23, 637–648 (2008)
12. Kim, B.J., Xiong, Z., Pearlman, W.: Very low bit-rate embedded video coding with 3D set partitioning in hierarchical trees (3D SPIHT) (1997)
Color Video Segmentation by Dissimilarity Based on Edges

Lucía Ramos¹, Jorge Novo¹, José Rouco¹, Antonio Mosquera², and Manuel G. Penedo¹

¹ VARPA Group, Department of Computer Science, University of A Coruña, Spain
[email protected], {jnovo,jrouco,mgpenedo}@udc.es
² Artificial Vision Group, Department of Electronics and Computer Science, University of Santiago de Compostela, Spain
[email protected]
Abstract. In this work new approaches are proposed for the extension to color space of different shot change detection methods. These techniques are those that use edge-based dissimilarity, in particular in the space and frequency domains. They were previously defined to deal with grayscale videos, so the methods are redesigned to provide the best possible results on color videos. Moreover, some improvements are introduced to obtain better results, such as the use of an adaptive threshold instead of the fixed one previously used. Experiments are presented to show the better behaviour of the newly developed approaches. Keywords: Video segmentation, detection of scene changes, cuts, fades, dissolves.
1 Introduction and Previous Work
Multimedia information has been growing in many areas of application over the years. Thus, there is an increased demand for new technologies and tools for the organization, indexing, search and retrieval of data to satisfy user needs. Temporal video segmentation is the preliminary step to obtain the visual and semantic information that describes scenes for proper indexing and searching. The problem of shot change detection in video sequences has been widely studied in the multimedia analysis literature. All the existing techniques are based on the fact that frames within the same scene preserve a certain degree of similarity, whereas frames around the limits of a scene show an important change in visual content. Thus, shot changes are detected when the distance between successive frames is higher than a given threshold. There are methods that use the values of the image for the dissimilarity between frames. Some of them use local characteristics of the image, as in the approach proposed by Nagasaka and Tanaka [1], which consists of comparing the dissimilarity between consecutive frames by computing the difference of intensities. Others compare two images according to global features, such as the image histogram dissimilarity proposed in the work of Kasturi et al. [2]. E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 264–271, 2010. © Springer-Verlag Berlin Heidelberg 2010
Previous methods are based on trivial features of images. There are other techniques, focused on edges, which use more complex characteristics to obtain similarity measures between frames. Zabih et al. [3] proposed obtaining the similarity between frames by analyzing the edges of consecutive images. There are other approaches in this category, such as the work of Ardebilian et al. [4], which is based on comparing characteristic points, or the approach proposed by Porter et al. [5], who use the correlation between frames. This work focuses on the dissimilarity measure based on edges, considering one method in the space domain and another in the frequency domain. For each of them, besides the original version using grayscale frames, an extension to color is performed. The color versions overcome some limitations of the original versions, such as the problems caused by changes in illumination. The selected methods employ a fixed threshold for detection. However, video data are highly dependent on content, so it is very difficult to establish a universal threshold for all sequences. In this paper, an adaptive threshold according to the sequence information is proposed. This paper is organized as follows. Section 2 details the steps for shot change detection and the methods selected. Section 3 explains the contributions added to the original versions. Section 4 shows the experiments done and the results obtained. Finally, section 5 expounds the conclusions reached.
2 Temporal Video Segmentation Methods
The purpose of these methods is to detect the scene changes that a video sequence presents, in order to divide it into a set of manageable and meaningful segments. The transitions between two scenes can be sudden or gradual. In the first case, there is a total change of visual content from one frame to the next, so the detection is simple. In the second, the changes are gradual, occurring slowly over successive frames, so the detection is more complex. The following types are identified:
Cut: Strong change between two consecutive frames where the last frame of a scene is directly followed by the first frame of the next one.
Dissolve: Gradual transition from one scene to another in which both frames are superimposed, so the last frame of the previous scene fades out as the first of the new scene fades in.
Fade: Special case of dissolve where a monochrome frame replaces the last frame of the previous scene or the first frame of the next one.
The first step in shot change detection is to obtain the successive frames composing the video sequence, on which the operations necessary to analyze the features of interest are performed. To determine whether changes are happening between consecutive frames, it is necessary to establish a distance metric called "dissimilarity". This metric allows us to decide whether the variation in visual content is enough to consider the existence of a scene change between these frames. The methods studied here define the dissimilarity based on edges. After
obtaining the distances between consecutive frames, these measures are compared with a threshold to determine the existence of scene changes.

2.1 Dissimilarity on Space Domain
This approach, proposed by Zabih et al. [3], is based on the idea that when a scene change happens, new edges appear far from the positions of the previous edges and old edges disappear far from the emerging ones. As shown in the diagram of Figure 1, this method takes as input two binary images E and E′ obtained by applying the Canny edge detector [6] to two consecutive frames. Then, the images Ē and Ē′ are created, where each edge pixel from E and E′ is dilated by a radius r. This dilation tolerates small displacements between consecutive frames without interfering with the detection. An entering pixel is defined as an edge pixel of E′ that appears far from the edge pixels of E. In the same way, an exiting pixel is defined as an edge pixel of E that disappears far from the edge pixels of E′. Thus, shot changes can be detected from the number of entering and exiting pixels.
Fig. 1. Steps in the calculation of the dissimilarity on space domain
Considering only the pixels belonging to edges of these images, ρ_in and ρ_out are defined by:

  ρ_in  = 1 − ( Σ_{x,y} Ē[x,y] E′[x,y] ) / ( Σ_{x,y} E′[x,y] )
  ρ_out = 1 − ( Σ_{x,y} E[x,y] Ē′[x,y] ) / ( Σ_{x,y} E[x,y] )
as the fraction of entering and exiting pixels, respectively. Shot transitions can be detected by looking for peaks in ρ, the maximum of these two values.
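A minimal sketch of these fractions in pure NumPy may clarify the mechanics. The helper names are illustrative, and the dilation is implemented with padded shifts rather than the full Canny pipeline:

```python
import numpy as np

def dilate(edges, r=1):
    """Binary dilation by a (2r+1)x(2r+1) square, done with padded shifts
    so that only NumPy is needed."""
    padded = np.pad(edges, r)
    out = np.zeros_like(edges)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + edges.shape[0], dx:dx + edges.shape[1]]
    return out

def edge_dissimilarity(E, E2, r=1):
    """rho_in / rho_out as above: the fraction of edge pixels of E' (E)
    that fall outside the dilated edges of E (E')."""
    Ed, E2d = dilate(E, r), dilate(E2, r)
    rho_in = 1.0 - (Ed & E2).sum() / max(E2.sum(), 1)
    rho_out = 1.0 - (E & E2d).sum() / max(E.sum(), 1)
    return max(rho_in, rho_out)

E = np.zeros((8, 8), dtype=bool); E[2, 2:6] = True    # an edge segment
E2 = np.zeros((8, 8), dtype=bool); E2[3, 2:6] = True  # shifted by one pixel
print(edge_dissimilarity(E, E2))   # 0.0: within the dilation radius
E3 = np.zeros((8, 8), dtype=bool); E3[6, 2:6] = True  # far away
print(edge_dissimilarity(E, E3))   # 1.0: every pixel entered/exited
```

The one-pixel shift scores 0 because the dilation absorbs small camera motion, while the distant edge scores 1: exactly the behaviour that makes ρ peak at shot boundaries.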
2.2 Dissimilarity on Frequency Domain
Correlation is a measure of correspondence between two images. However, computing it in the spatial domain is too expensive. For that reason, this method computes the correlation in the frequency domain. In the method proposed by Porter et al. [5], the first step is to calculate the Fourier transform of two frames to get their representation in the frequency domain. Then, a high-pass filter is applied to each image to accentuate the
contributions from higher spatial frequencies, because the edges and other strong changes in an image are related to the high frequency components of its Fourier transform. After applying the high-pass filter, the normalized correlation is calculated by the following equation:

  ρ(ξ) = FT⁻¹{ FT x₁(ω) · FT x₂*(ω) } / √( ∫ |FT x₁(ω)|² dω · ∫ |FT x₂(ω)|² dω )    (1)

where ξ and ω are the spatial and spatial-frequency coordinate vectors, respectively, FT x_i(ω) denotes the Fourier transform of a frame x_i(ξ), * is the complex conjugate and FT⁻¹ denotes the inverse Fourier transform. The dissimilarity measure for two consecutive frames is obtained as 1 − d, where d represents the maximum value among the correlation coefficients, that is, the best match between the two images. Figure 2 shows an example of the dissimilarity measures obtained between consecutive frames of a video sequence, where the peaks represent scene changes.
Fig. 2. Dissimilarity measure between consecutive frames
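The normalized correlation of Eq. (1) can be sketched with NumPy's FFT as follows. The high-pass filtering step is omitted, the energy integrals are evaluated via Parseval's theorem, and all names are illustrative:

```python
import numpy as np

def fft_dissimilarity(x1, x2):
    """1 - max normalized correlation: the correlation surface over all
    circular shifts is obtained with one inverse FFT, and normalized by
    the energies of the two frames."""
    F1, F2 = np.fft.fft2(x1), np.fft.fft2(x2)
    corr = np.fft.ifft2(F1 * np.conj(F2)).real
    # Parseval: sum|F|^2 = size * sum|x|^2, so this is sqrt(E1 * E2)
    norm = np.sqrt((np.abs(F1) ** 2).sum() * (np.abs(F2) ** 2).sum()) / x1.size
    return 1.0 - corr.max() / norm

rng = np.random.default_rng(0)
frame = rng.standard_normal((32, 32))
print(fft_dissimilarity(frame, frame) < 1e-9)                         # True
print(fft_dissimilarity(frame, rng.standard_normal((32, 32))) > 0.5)  # True
```

Identical frames give a dissimilarity of essentially zero (the peak sits at zero shift), while unrelated frames stay close to one, which is the behaviour that produces the peaks of Figure 2.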
3 Improved Extension to Color-Space
The original versions of these methods calculate the dissimilarity in terms of intensity variations obtained from grayscale images. However, these methods have many problems working with scenes that are too dark or too bright. For that reason, a color extension is developed for each method that identifies variations resulting from differences in color, not only in intensity.

3.1 Color Dissimilarity on Space Domain
The color version of this method consists of obtaining the color edges in the space domain. As a first approximation, the use of the RGB model was considered. The main idea is to get the edges separately for each channel and then obtain a global estimation from all of them. The problem is that this estimation is not equivalent to getting the edges directly in the combined space of the three components. As a solution, the ratios of Gevers [7] are used to capture all possible
268
L. Ramos et al.
color differences. This work adapts the original version of the method to detect edges according to color differences by applying the thresholding stage of the Canny edge detector to these color ratios. This allows the calculation of perceptual color differences independently of the luminosity, providing a solution to the problems with lighting changes found in the grayscale version. These color ratios are defined by the relation of the RGB components between two neighboring pixels, as in the following equation:

  m(C₁^k₁, C₁^k₂, C₂^k₁, C₂^k₂) = (C₁^k₁ · C₂^k₂) / (C₁^k₂ · C₂^k₁)    (2)
where C₁, C₂ ∈ {R, G, B}, and k₁ and k₂ are the image coordinates of two neighboring pixels. These ratios can be considered as the correspondence between these pixels in the image domain. To adapt this method to color ratios, the finite differences between pixels along a particular direction, calculated between the red-green, red-blue and green-blue channel pairs, are used to obtain the gradient of the Canny operator. Thus, the edge detector is more robust against changes in shading and lighting between consecutive frames of the same scene.
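A small sketch of how such ratios can act as an illumination-invariant gradient. The helper is hypothetical (not the paper's implementation), and the log is taken so that a ratio m and its inverse 1/m score equally:

```python
import numpy as np

def color_ratio_gradient(img):
    """For horizontally neighbouring pixels k1, k2 the ratio of Eq. (2),
    m = (C1(k1)*C2(k2)) / (C1(k2)*C2(k1)), is computed for the channel
    pairs R-G, R-B and G-B; a ratio far from 1 signals a chromatic edge,
    regardless of any illumination factor shared by all channels."""
    eps = 1e-6                      # avoid division by zero on black pixels
    c = img.astype(float) + eps
    grads = []
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        m = (c[:, :-1, i] * c[:, 1:, j]) / (c[:, 1:, i] * c[:, :-1, j])
        grads.append(np.abs(np.log(m)))
    return np.maximum.reduce(grads)

flat = np.full((2, 3, 3), 100.0)
# pure lighting change: every column scaled by a scalar factor
shaded = flat * np.array([1.0, 0.5, 0.25])[None, :, None]
print(color_ratio_gradient(shaded).max() < 1e-9)   # True: shading cancels out
```

Because a shared illumination factor multiplies every channel of a pixel, it cancels in the ratio, so only genuine chromatic edges survive: the property that fixes the dark/bright-scene failures of the grayscale version.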
3.2 Color Dissimilarity on Frequency Domain
Just as in the previous method, the idea would be to obtain the correlation in the frequency domain for each channel and then merge them all together. However, this is not equivalent to measuring the correlation directly in the combined space of the three dimensions. For that reason, once again, the direct use of the RGB model was discarded for this approximation. As the authors proposed, it is better to obtain the correlation in a frequency space. The Fourier transform works with complex numbers, so the adaptation of this method is proposed by representing the hue-saturation space, a chromatic subspace of HSV, as a complex number defined by the following expression:

  b(x, y) = S(x, y) · e^{iH(x,y)}    (3)
where the saturation is interpreted as the magnitude and the hue as the phase of the complex number. The interpretation of hue and saturation as polar coordinates allows the direct use of this color space in the Fourier transform, making the adaptation of this method easier. After that, the remaining steps of the method are common to the grayscale version.
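This packing can be sketched in a few lines (hypothetical helper; H is assumed to be expressed in radians):

```python
import numpy as np

def hs_complex(hsv):
    """Eq. (3): pack hue (radians) and saturation into one complex plane,
    b = S * exp(i*H), so the chromatic content can be fed directly to the
    Fourier-domain correlation; the value channel V is ignored."""
    H, S = hsv[..., 0], hsv[..., 1]
    return S * np.exp(1j * H)

hsv = np.zeros((1, 2, 3))
hsv[0, 0] = [0.0, 1.0, 0.5]        # fully saturated, hue 0
hsv[0, 1] = [np.pi, 0.5, 0.9]      # opposite hue, half saturation
b = hs_complex(hsv)
print(b[0, 0])                     # (1+0j)
print(np.allclose(b[0, 1], -0.5))  # True
```

Opposite hues land on opposite sides of the complex plane and desaturated (near-gray) pixels shrink toward the origin, so illumination changes, which mostly affect V, barely move the representation.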
3.3 Improved Adaptive Threshold
Shot changes are detected when the distance between successive frames is higher than a given threshold. It is very difficult to establish a single threshold valid for all sequences because video data are quite variable. As Figure 3(a) shows, setting the threshold too high means that not all shot changes are detected, while with low values for the threshold, frames not corresponding to scene changes are
included, increasing the rate of false positives. To obtain a better behaviour, it would be more suitable to use a threshold calculated from the video characteristics, considering the variability between frames of different scenes. The selected methods use a fixed threshold, which implies some limitations. In this work, the use of an adaptive threshold is proposed, considering that shot changes correspond to outliers of the distribution of similarity measures. This is performed following the equation proposed by Kobla et al. [8]:

  T_l = μ + α·σ    (4)

where α is a constant, and μ and σ are the mean and standard deviation of the interframe differences. These values are computed dynamically over the successive frames since the last scene change detected, so the detection is based on the variability of the scene being processed at each time. This contribution manages to increase the rate of true positives without increasing false detections, as can be seen in Figure 3(b).
Fig. 3. Application examples of fixed and adaptive threshold on a video sequence. (a) Fixed threshold (b) Adaptive threshold.
4 Results
In this section the results obtained by applying the methods to a video dataset are presented. This dataset must be heterogeneous in terms of types of transitions and must contain some common cases of segmentation error, such as fast-moving sequences, significant lighting changes or gradual changes of different lengths. The selected video dataset was taken from TRECVID [9], which provides videos designed specifically to study this kind of method. The dataset contains a total of 1338 scene changes, of which 1078 are cuts, 40 are fades, 211 are dissolves and 9 are other types. The evaluation of the methodologies is based on quality criteria, taking the number of scene changes detected, undetected and falsely detected. The metrics used are recall, which measures the ability to detect all the scene changes, and precision, which measures the ability to detect only scene changes. The methods present several parameters that need to be tuned. For that reason, a tuning process was performed on a subset of the database
Table 1. Evaluation of the results obtained with the methods
Method       Version       Threshold  Recall (Cuts)  Recall (Dissolves)  Recall (Fades)  Recall (Total)  Precision
Edges        Grayscale     Fixed      52.75%         14.73%              71.50%          51.49%          41.91%
Edges        Grayscale     Adaptive   58.21%         28.04%              77.50%          59.01%          52.11%
Edges        Color ratios  Fixed      71.07%         21.64%              93.54%          66.99%          57.29%
Edges        Color ratios  Adaptive   78.43%         41.18%              100.00%         70.38%          68.50%
Correlation  Grayscale     Fixed      74.22%         15.33%              86.91%          63.24%          65.75%
Correlation  Grayscale     Adaptive   78.90%         30.02%              50.00%          64.17%          66.10%
Correlation  HSV           Fixed      82.80%         42.81%              94.29%          76.22%          78.70%
Correlation  HSV           Adaptive   92.41%         50.00%              66.98%          82.60%          81.90%
Fig. 4. ROC curve comparison between basic methods and improved ones. (a) Space domain (b) Frequency domain.
to obtain the parameter values that gave the best possible results for a compromise between recall and precision. With this tuned parameter set, the experiments were extended to the entire database to test the behaviour of the methodologies. Table 1 shows the results obtained for each method in its grayscale and color versions, using fixed and adaptive thresholds. The grayscale versions presented some problems in scenes that are too dark or with excessive changes in illumination. Our extensions using color ratios and the HSV space, which are insensitive to changes in lighting, solved this problem and improved performance, especially in the case of fades, as seen in the results. Moreover, the use of an adaptive threshold according to the video data, thanks to a dynamic reconfiguration considering the variability between consecutive frames, improves the results by increasing the rate of true positives as well as reducing false detections. The worse results obtained with the frequency domain method with adaptive threshold in the case of fades could be due to low-energy edges, which cause the dissimilarity to evolve slowly. That situation implies that the threshold cannot adapt in time, so some scene changes are lost. It could be solved by applying a thresholding after the high-pass filter to eliminate low-energy edges.
This would be equivalent to the hysteresis process of the edge detector in the space domain method. The graphs of Figure 4 show the ROC curves for the most basic versions of each of the methods, i.e., grayscale with fixed threshold, together with the improved color versions with adaptive thresholding. As can be clearly seen, the best results of recall versus precision were obtained with the improved versions.
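For reference, the recall and precision figures of Table 1 follow the usual definitions, which can be sketched as below. The helper is hypothetical, and the use of a frame tolerance for matching detections to ground truth is an assumption:

```python
def recall_precision(detected, ground_truth, tolerance=0):
    """Recall: fraction of true scene changes found.  Precision: fraction
    of detections that are true changes.  A detection matches a ground
    truth change if it falls within `tolerance` frames of it."""
    matched = {g for g in ground_truth
               if any(abs(d - g) <= tolerance for d in detected)}
    true_pos = sum(1 for d in detected
                   if any(abs(d - g) <= tolerance for g in ground_truth))
    recall = len(matched) / len(ground_truth) if ground_truth else 1.0
    precision = true_pos / len(detected) if detected else 1.0
    return recall, precision

gt = [120, 450, 800, 1000]           # ground-truth scene change frames
det = [120, 451, 640]                # two hits, one false positive, two misses
print(recall_precision(det, gt, tolerance=1))  # (0.5, 0.6666666666666666)
```

These two quantities trade off against each other as the threshold moves, which is exactly what the ROC curves of Figure 4 plot.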
5 Conclusions
In this paper two representative methods of edge-based detection were studied, considering one technique in the space domain and another in the frequency domain. For each of them, an extension to color space was made and a variation of the thresholding stage was added. Previous studies working with grayscale frames only considered variations in intensity for the dissimilarity between frames, which entails some limitations in certain cases related to luminosity. In this work, a color extension was developed for these methods, considering the color representation best adapted to the characteristics of each one. Moreover, an adaptive threshold was proposed which, unlike the fixed threshold suggested in the literature, is calculated dynamically depending on the video data. All these approaches have been tested on a specifically selected video database. In the experimentation stage it was found that these color extensions solved some problems found in the original versions and obtained better results. Furthermore, the adaptive threshold improves the detection results in most cases, increasing the success ratio and reducing false detections.
References
1. Nagasaka, A., Tanaka, Y.: Automatic video indexing and full-video search for object appearances. In: Visual Database Systems, IFIP Working Conference, October 1991, pp. 113–127 (1991)
2. Kasturi, R., Strayer, S.H., Gargi, U.: An evaluation of color histogram based methods in video indexing. In: International Workshop on Image Databases and Multimedia Search, Amsterdam, The Netherlands, August 1996, vol. 9 (1996)
3. Zabih, R., Miller, J., Mai, K.: A feature-based algorithm for detecting and classifying production effects. Multimedia Systems 7(2), 119–128 (1999)
4. Ardebilian, M., Tu, X., Chen, L.: Robust 3d clue-based video segmentation for video indexing. J. of Visual Communication and Image Representation 11(1), 58–79 (2000)
5. Porter, S.V., Mirmehdi, M., Thomas, B.T.: Detection and classification of shot transitions (2000)
6. Canny, J.: A computational approach to edge detection. IEEE Trans. on Pattern Analysis and Machine Intelligence 8(6), 679–698 (1986)
7. Gevers, T., Smeulders, A.: Color-based object recognition. Pattern Recognition 32, 453–464 (1999)
8. Kobla, V., DeMenthon, D., Doermann, D.: Special effect edit detection using VideoTrails: a comparison with existing techniques. In: SPIE Conference on Storage and Retrieval for Image and Video Databases VII (1999)
9. TREC video retrieval evaluation, http://www-nlpir.nist.gov/projects/trecvid
Label Dependent Evolutionary Feature Weighting for Remote Sensing Data

Daniel Mateos-García, Jorge García-Gutiérrez, and José C. Riquelme-Santos

Department of Computer Science, Avda. Reina Mercedes S/N, 41012 Seville (Spain) {mateosg,jgarcia,riquelme}@lsi.us.es http://www.lsi.us.es
Abstract. Nearest neighbour (NN) is a very common classifier used to develop important remote sensing products like land use and land cover (LULC) maps. Evolutionary computation has often been used to obtain feature weightings in order to improve the results of the NN. In this paper, a new algorithm based on evolutionary computation, called Label Dependent Feature Weighting (LDFW), is proposed. The LDFW method transforms the feature space by assigning different weights to every feature depending on each class. This multilevel feature weighting algorithm is tested on remote sensing data from a fusion of sensors (LIDAR and orthophotography). The results show an improvement over the NN and resemble the results obtained with a neural network, which is the best classifier for the study area. Keywords: remote sensing, feature weighting, evolutionary computation, label dependence.
1 Introduction
Remote sensing is a very important discipline for many tasks like resource management, environmental monitoring, disaster response, etc. Machine learning techniques have long been used to improve remote sensing performance and applicability. In addition, the use of active sensors like LIDAR (Light Detection and Ranging) has recently spread to improve the classical remote sensing products [1], which were mainly based on images. This fact involves an increase in data complexity and makes machine learning even more important in order to extract meaningful information from remote sensing data. Remote sensing knowledge can be gathered in several products, among which land use and land cover (LULC) maps are one of the most important. This product is based on a classification of the terrain by means of its own morphologic or functional characteristics, and it is a main tool to develop policies to manage the natural environment. An automatic pixel classification, which is generally supervised, is usually the first step to extract LULC maps from remote sensing data. Several techniques from machine learning have been used to develop LULC maps with satisfactory results, e.g., k-NN [2], Naive Bayes [3], SVM [4], etc. E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 272–279, 2010. © Springer-Verlag Berlin Heidelberg 2010
Although the validity of machine learning has been widely proved in the remote sensing context, more research is needed in order to fulfill the standard requirements of many remote sensing products, especially for LULC map development [5]. In this way, some researchers [6] have started to exploit optimization techniques (genetic algorithms) in their approaches, showing that a weighted execution improves the results. In addition, machine learning often applies evolutionary computation to search for optimal weightings on both structural and functional aspects in order to improve the predictive models. From the standpoint of unsupervised learning, some works focus on the determination of weights for clustering algorithms. Generally, the considered model is the k-means algorithm with traditional evolutionary techniques [7], with differences in the fitness function, which can be distance-based or even based on information given by a combination of different algorithms. Additionally, there are three main areas of weighting application in supervised machine learning: support vector machine optimization, artificial neural networks (training and topology) and feature weighting. Thus, SVM kernel [8] or artificial neural network [9] parameters can be optimized by means of genetic algorithms or genetic programming with good results. In this context, evolutionary algorithms are usually employed to find a set of weights for the feature space, allowing greater accuracy in the classification process [10]. A common individual encoding is a set of real values that represent the weights of each feature. The fitness is defined by the classification process itself. Therefore, the search process can be viewed as a global task in which the optimal weights are considered with respect to the features, regardless of the label to which each instance belongs.
In this work, a novel proposal for applying evolutionary algorithms to search for optimal weights for each feature depending on the label is shown. Existing methods in the literature usually work in a global way, i.e., the same weight is applied to a feature regardless of the instance's class. In contrast, this work shows that the importance of each feature can depend on the class to predict. Thus, for LULC map development, the features provided by orthophotos may have more leverage to distinguish vegetation textures, while the features provided by LIDAR can better discriminate structures like buildings and roads, since they include height measures. To the best of our knowledge, this multiple weighting level has not been exploited enough, and it can improve the results when a classical classifier is applied to remote sensing data. In this way, a new evolutionary method based on distances and a double weighting level is described with three main objectives:
– Improve the general quality of a well-known machine learning technique like the k-NN classifier when it is applied to remote sensing data.
– Obtain new information about which features are more important to classify each class by means of the study of the resulting weights per label.
– Provide a new tool to develop high-accuracy LULC maps from a fusion of sensors (LIDAR and imagery).
The rest of the paper is organized as follows. Section 2 describes the general process to select the feature weighting, highlighting the most interesting features of the applied evolutionary algorithm. The results achieved are shown in Section 3. Finally, Section 4 shows a summary of the conclusions and the future lines of work.
2 Method

2.1 Data Description
The data for this study belong to a geographical area in the north of Galicia (Spain) and were obtained from the fusion of LIDAR and orthophotography information. LIDAR is an active sensor technology that measures properties of light (usually laser) to register distant targets. After a LIDAR flight, a point cloud database is available in which, for every point, it is possible to find: spatial position (i.e., x, y and z coordinates), intensity of return, number of the return in a sequence (if a pulse caused multiple impacts), etc. These features and the RGB values in an orthophoto are used in this work to obtain the statistics on which the instances for the model are based. A Digital Elevation Model (DEM) is needed to correct the height of objects. In this case, a DEM was extracted from the LIDAR data to make the correction. The orthophoto is used to extract features from the visible spectrum band. It was taken from the same area with similar weather conditions at the time of the LIDAR flight acquisition. From the original data set, 500 instances were classified manually to build the training set. Every instance of the training set has a total of 61 basic statistics (average, variance, minimum, maximum, standard deviation, etc.) from five different bands of the LIDAR and the image data: height, intensity, red band, green band and blue band. There are 5 different classes, one for each land type: road, farming land, middle vegetation, high vegetation and buildings.

2.2 Preprocess
Before the generation of the model, a preprocessing step has to be carried out. Three different filters are executed. First, every missing attribute value is replaced with the corresponding average value. Then, the data are standardized. Finally, a Correlation Feature Selection (CFS) method is applied in order to reduce the search space. With the 18 selected features, the next phase is the execution of the evolutionary algorithm, which is characterized in the next subsections.

2.3 Initial Population
The goal of the proposed evolutionary algorithm is to find an optimal set of weights in order to apply a linear transformation to the feature space depending on each label and to improve the overall classification process. Thus, after the
evolutionary execution (see Fig. 1), a weight is obtained for each label and feature, which is used to complete the classification process in two steps: 1. The weights are applied to the training instances according to their labels. 2. Given a test instance, the label of the transformed nearest neighbour is chosen to classify the test instance. To implement this idea, the population representation is as follows: an individual is a matrix which represents the weights per label for every feature. Hence, there is a row for each label, with as many columns as features. In the initial population, every value of each weight matrix is randomly chosen.

2.4 Fitness Function
As previously said, the training data consist of a matrix P with n rows (each one representing a pixel) and f columns (one per feature). A class label is assigned to each point of P by means of the label function. For simplicity, we assume that the label is an integer between 1 and b. Thus, a point pi is a row of P and a vector of R^f such that label(pi) = l ∈ {1..b}. A transformation is given by a matrix of weights W = (wij), with b rows (number of different labels) and f columns (number of features). Thus, pi is transformed through W into p′i, so that each feature is "weighted" with a value depending on the class to which the point belongs, as follows:

    ∀j = 1..f :   p′ij = w_label(pi),j · pij        (1)
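The transformation of Eq. 1 and the resulting nearest-neighbour decision can be sketched as follows. This is a simplified reading (illustrative names, not the authors' code): each training point is scaled by the weight row of its own label, and a test point is tried under every candidate label's weight row, keeping the label whose transformed nearest neighbour is closest.

```python
import numpy as np

def transform(P, labels, W):
    # Eq. 1: scale each feature of p_i by the weight row of its own label
    return W[labels] * P

def classify(x, P, labels, W):
    # Try every candidate label l: weight x with row l of W and measure
    # the distance to the nearest transformed training point; keep the
    # label that yields the shortest such distance.
    P_t = transform(P, labels, W)
    best_label, best_dist = 0, np.inf
    for l in range(W.shape[0]):
        d = np.linalg.norm(P_t - W[l] * x, axis=1).min()
        if d < best_dist:
            best_label, best_dist = l, d
    return best_label
```

With a global weight vector (identical rows of W) this collapses to an ordinary weighted 1-NN; the per-label rows are what make the weighting label dependent.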
As seen in Fig. 1, the training set P is divided into n bags (line 3), so that the weights of the individual being evaluated are applied to n − 1 bags (line 5), and the remaining one is used as the initial test (lines 6 et seq.). The transformation of each label is applied to each pixel of the test bag (lines 6–8) and then the nearest pixel from P′ is calculated (line 9). Once the point has been tested, it becomes part of P′, reinforcing the training (line 10). The label that makes the process return the shortest distance is chosen (line 12). If this label does not match the test point's label, the fitness is increased (lines 13–14). Therefore, to calculate the fitness function, the input parameters are a matrix P, the label function and a matrix W. The output is a measure of the classification error rate, which is the objective function to be minimized.

2.5 Crossover and Mutation
The crossover operation for two individuals is applied to every corresponding row (the i-th row of an individual is crossed with the i-th row of the other), since they share the same label. The roulette-wheel method is selected to obtain the individuals to cross. Besides, two techniques have been selected for the generation of the new individuals: the uniform crossover and the BLX-α crossover [11]. The uniform crossover consists of picking each gene from one of the two parents at random. The BLX-α crossover is described as follows: if g1 and g2 are
W is the b × f matrix of weights (wij), with one row per label and one column per feature.

 1: fitness = 0
 2: for i = 1 to m do
 3:   divide P into n bags: B1, ..., Bn
 4:   for all bags Bk do
 5:     according to Equation 1, apply the W transformation to every point of the remaining n − 1 bags, obtaining the set of points P′
 6:     for all points pi in Bk do
 7:       for all labels l ∈ {1..b} do
 8:         construct the transformed point p_i^l so that p_ij^l = w_lj · p_ij
 9:         calculate d_l = minimum distance from p_i^l to the points of P′
10:         apply the W transformation to pi according to its label, and add it to P′
11:       end for
12:       calculate the minimum of the distances d_l; let h ∈ {1..b} be the label of the point of P′ that yields that minimum
13:       if the label of pi ≠ h then
14:         fitness = fitness + 1
15:       end if
16:     end for
17:   end for
18: end for

Fig. 1. Fitness function
the i-th genes from each parent, the new gene is a real number randomly selected in the interval [Gmin − I·α, Gmax + I·α], where α is a positive real number, Gmax = max(g1, g2), Gmin = min(g1, g2) and I = Gmax − Gmin. The mutation operator has been defined to increase or decrease the value of a weight with a probability p. The increase or decrease is a random value Δ = r/10^z, where r ∈ R, 0 ≤ r ≤ 1, and z ∈ Z, 0 ≤ z ≤ n.
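The two crossover operators and the mutation can be sketched as follows (helper names are illustrative; α, p and z_max are the caller's choices):

```python
import random

def blx_alpha(g1, g2, alpha=0.5):
    # BLX-alpha: child gene drawn uniformly from the parents' interval
    # extended by alpha times its width I on both sides.
    g_min, g_max = min(g1, g2), max(g1, g2)
    i = g_max - g_min
    return random.uniform(g_min - alpha * i, g_max + alpha * i)

def uniform_crossover(row1, row2):
    # Uniform crossover: each gene is picked from one parent at random.
    return [a if random.random() < 0.5 else b for a, b in zip(row1, row2)]

def mutate_weight(w, z_max, p):
    # With probability p, add or subtract delta = r / 10**z,
    # with r uniform in [0, 1] and z a random integer in {0, ..., z_max}.
    if random.random() < p:
        delta = random.random() / 10 ** random.randint(0, z_max)
        w += random.choice((-1.0, 1.0)) * delta
    return w
```

Because crossover is applied row by row, both operators above act on genes of the same label, matching the row-wise crossover described in the text.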
3 Results
To assess the quality of our approach, a comparison among several classifiers is carried out. The classifiers Naive Bayes, Support Vector Machines (obtained
Table 1. Averaged error rate for each studied algorithm

    Algorithm                 Error rate
    Naive Bayes               0.15
    SMO                       0.14
    Nearest Neighbour         0.13
    Neural Network            0.10
    Nearest Neighbour LDFW    0.10
by Sequential Minimal Optimization), Artificial Neural Networks (Multilayer Perceptron) and Nearest Neighbour (NN), with and without LDFW, are chosen to compare their performance. Every model is built using the WEKA software [12]. For the experiments, the LDFW evolutionary algorithm is set up with the following parameters: a population of 20 individuals, 100 generations, 10% elitism and a 20% mutation probability. To establish a fair comparison among the performances of the different algorithms, a stratified n-fold cross-validation method is used. Concretely, three 10-fold cross-validations with different random seeds are executed and the results for each fold are registered. Tab. 1 shows the overall error rate for each algorithm. In order to evaluate the statistical significance of the measured differences in algorithm ranks, we use a method for comparing classifiers across multiple data sets. In this case, there is only one data set, since remote sensing data are costly to obtain. Thus, the set of measures consists of the partial results of the previous 10-fold cross-validations (30 measures for each classifier), and the Friedman test is selected to analyze them. The Friedman test is a nonparametric statistical test which evaluates the differences among more than two related sample means. The null hypothesis is that every classifier performs the same, regardless of the differences among the registered results. The statistic used is:

    χ²_F = (12n / (k(k+1))) · ( Σ_j r̄_j² − k(k+1)²/4 )        (2)
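The statistic of Eq. 2 can be computed directly from the per-fold error rates; a minimal sketch (illustrative, not the authors' code; ties between ranks are ignored for simplicity, so a full statistical package remains preferable in practice):

```python
import numpy as np

def friedman_statistic(errors):
    # errors: (n folds) x (k classifiers) matrix of error rates.
    n, k = errors.shape
    # Rank the classifiers on each fold (1 = lowest error); ties are
    # ignored here (packages use average ranks for tied values).
    ranks = np.argsort(np.argsort(errors, axis=1), axis=1) + 1.0
    mean_ranks = ranks.mean(axis=0)
    # Eq. 2: chi^2_F = 12n/(k(k+1)) * (sum_j r_j^2 - k(k+1)^2/4)
    return 12.0 * n / (k * (k + 1)) * (np.sum(mean_ranks ** 2) - k * (k + 1) ** 2 / 4.0)
```

A large value of the statistic leads to rejecting the null hypothesis that all classifiers perform the same.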
The Friedman test checks whether the average ranks are significantly different from the mean rank r̄ = 2.5 expected under the null hypothesis. Using a statistical package (MATLAB), the p-value for the Friedman test resulted in 7.0351E−7, so the null hypothesis is rejected and the measured average ranks are significantly different (at α = 0.05). With this in mind, the results show that the performance of the Nearest Neighbour with LDFW is very similar to that of the neural network, which is the best classifier for the study area. However, as will be seen later, the
Table 2. Most important features according to their weights for the study zone

    Class               Features
    Road                MINSNDVI   PEC      IMAX       HCV
    Farming Land        IMEAN      IGMEAN   HMAX       HCV
    Middle Vegetation   HSTD       IGKURT   MINSNDVI   IGVAR
    High Vegetation     IMAX       IRVAR    IGVAR      IGMEAN
    Buildings           IGKURT     PCT32    EMP        MINSNDVI

H*: height statistic; I*: intensity statistic; IG*: green band intensity statistic; IR*: red band intensity statistic; *SNDVI: Simulated Normalized Difference Vegetation Index statistic; PEC: penetration coefficient; PCT32: percentage of third or later returns over second returns.
LDFW technique provides descriptive information about the most important features per class. Neural networks also supply an approximation of feature importance, but in a much less explicit manner. The rest of the classifiers show a lower accuracy. If the LDFW-NN is compared with the classical NN, it results in a 3% improvement. In Tab. 2, the importance of the feature weighting according to the label can be seen. After the application of the LDFW, every class has its own set of features that best determines its label. This information provides a very important feature selection tool and allows us to establish a more accurate class separation; e.g., vegetation classes are principally determined by the orthophoto, especially by the features that correspond to the green band (IG features), whilst roads are better characterized by LIDAR features.
4 Conclusions
In this paper, a new algorithm based on evolutionary computation, called Label Dependent Feature Weighting (LDFW), was proposed. The LDFW method transforms the feature space by assigning different weights to every feature depending on each class. This multilevel feature weighting algorithm was tested on remote sensing data from a fusion of sensors (LIDAR and orthophotography) in order to improve a NN, which is a widely used classifier in the context of LULC map development. The results showed a 3% improvement over the NN and resemble the results obtained with a neural network, which was the best classifier for the study area. Additionally, the LDFW was able to provide qualitative and quantitative information about the importance of each feature in order to distinguish among the different classes. In future work, the use of other measures like entropy in lieu of distance will be a very interesting way to improve the results and should be taken into account. In addition, different transformation functions on the attributes, which, at the
moment, are limited to linear kernels, should be explored. Finally, the definition of this algorithm as an independent preprocessing method is a primary objective, so that more complex classifiers like ensembles could be tested.
References
1. Erdody, T., Moskal, L.: Fusion of LIDAR and imagery for estimating forest canopy fuels. Remote Sensing of Environment (to appear, 2010)
2. Atkinson, P.M.: Spatially weighted supervised classification for remote sensing. International Journal of Applied Earth Observation and Geoinformation 5(4), 277–291 (2004)
3. Bork, E.W., Su, J.G.: Integrating lidar data and multispectral imagery for enhanced classification of rangeland vegetation: A meta analysis. Remote Sensing of Environment 111(1), 11–24 (2007)
4. Mazzoni, D., Garay, M.J., Davies, R., Nelson, D.: An operational MISR pixel classifier using support vector machines. Remote Sensing of Environment 107(1-2), 149–158 (2007)
5. Shao, G., Wu, J.: On the accuracy of landscape pattern analysis using remote sensing data. Landscape Ecology (23), 505–511 (2008)
6. Tomppo, E.O., Gagliano, C., Natale, F.D., Katila, M., McRoberts, R.E.: Predicting categorical forest variables using an improved k-nearest neighbour estimator and Landsat imagery. Remote Sensing of Environment (113), 500–517 (2009)
7. Krishna, K., Narasimha Murty, M.: Genetic K-means algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 29(3), 433–439 (2002)
8. Howley, T., Madden, M.G.: The genetic kernel support vector machine: Description and evaluation. Artificial Intelligence Review 24, 379–395 (2005)
9. Hervás-Martínez, C., Martínez-Estudillo, F., Carbonero-Ruz, M.: Multilogistic regression by means of evolutionary product-unit neural networks. Neural Networks 21(7), 951–961 (2008)
10. Komosinski, M., Krawiec, K.: Evolutionary weighting of image features for diagnosing of CNS tumors. Artificial Intelligence in Medicine 19(1), 25–38 (2000)
11. Eshelman, L.J., Schaffer, J.D.: Real-coded genetic algorithms and interval-schemata. In: Whitley, D.L. (ed.) Foundation of Genetic Algorithms 2, pp. 187–202. Morgan Kaufmann, San Mateo (1993)
12. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: An update. SIGKDD Explorations 11(1) (2009)
Evolutionary q-Gaussian Radial Basis Functions for Binary-Classification

F. Fernández-Navarro1, C. Hervás-Martínez1, P.A. Gutiérrez1, M. Cruz-Ramírez1, and M. Carbonero-Ruz2

1 Department of Computer Science and Numerical Analysis, University of Cordoba, Rabanales Campus, Albert Einstein building, 3rd floor, 14071, Córdoba, Spain
2 Department of Management and Quantitative Methods, ETEA, Escritor Castilla Aguayo 4, 14005, Cordoba, Spain
Abstract. This paper proposes a Radial Basis Function Neural Network (RBFNN) which reproduces different Radial Basis Functions (RBFs) by means of a real parameter q, named q-Gaussian RBFNN. The architecture, weights and node topology are learnt through a Hybrid Algorithm (HA) with the iRprop+ algorithm as the local improvement procedure. In order to test its overall performance, an experimental study with eleven datasets taken from the UCI repository is presented. The RBFNN with the q-Gaussian is compared to RBFNNs with Gaussian, Cauchy and Inverse Multiquadratic RBFs.
1 Introduction
Different types of neural networks are being used for classification purposes [1], including, among others: Multilayer Perceptron Neural Networks (MLPNNs), where the transfer functions are Sigmoidal Unit Basis Functions; Radial Basis Function Neural Networks (RBFNNs) with kernel functions, where the transfer functions are usually Gaussian [2]; Product Unit Neural Networks (PUNNs) [3] with multiplicative units; or Neural Networks where the hidden layer is composed of a mixture of basis functions [4]. We focus on RBFNNs, which have been successfully employed in different pattern recognition problems in recent years [5]. Several common types of functions are used as transfer functions, for example, the standard Gaussian (SRBF), φ(z) = e^(−z); the Multiquadratic (MRBF), φ(z) = (1 + z)^(1/2); the Inverse Multiquadratic (IMRBF), φ(z) = (1 + z)^(−1/2); and the Cauchy (CRBF), φ(z) = (1 + z)^(−1). In the output layer, the activations of the hidden units are combined in order to produce a classification of the input pattern. In this study, we investigate the q-Gaussian RBFNN, which can reproduce different RBFs by changing a real parameter q. A Hybrid Algorithm (HA) is employed to select the parameters of the Radial Basis Functions (RBFs): the number of hidden nodes and the centers, widths and value of the parameter q of each q-Gaussian RBFNN of the population. This paper is organized as follows: a brief analysis of some works related with the proposed models is given in Section 2; Section 3 describes the base classifier E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 280–287, 2010. © Springer-Verlag Berlin Heidelberg 2010
applied to binary-classification problems; Section 4 presents a methodology to optimize the RBF parameters based on Hybrid Algorithms; Section 5 explains the experiments carried out; and finally, Section 6 summarizes the conclusions of our work.
2 Related Works
A RBFNN is a three-layer feed-forward Neural Network. Let the numbers of nodes of the input layer, the hidden layer and the output layer be p, m and 1, respectively. For any sample x = [x1, x2, ..., xp], the output of the RBFNN is f(x). The model of a RBFNN can be described with the following equation:

    f(x) = β0 + Σ_{i=1}^{m} βi · φi(di(x))        (1)

where φi(di(x)) is a non-linear mapping from the input layer to the hidden layer, β = (β1, β2, ..., βm) is the vector of connection weights between the hidden layer and the output layer, and β0 is the bias. The function di(x) is defined as:

    di(x) = ‖x − ci‖² / θi²        (2)

where θi is the scalar parameter that defines the width of the i-th radial unit, ‖·‖ represents the Euclidean norm and ci = [ci1, ci2, ..., cip] is the center of the i-th RBF. The standard RBF (SRBF) is the Gaussian function, given by:

    φi(di(x)) = e^(−di(x))        (3)

The radial basis function φi(di(x)) can take different forms, including the Cauchy RBF (CRBF), defined by:

    φi(di(x)) = 1 / (1 + di(x))        (4)

and the Inverse Multiquadratic RBF (IMRBF), given by:

    φi(di(x)) = 1 / (1 + di(x))^(1/2)        (5)

Fig. 1 illustrates the influence of the choice of the RBF on the hidden unit activation. One can observe that the Gaussian function presents a higher activation close to the radial unit center than the other two RBFs. In this paper, we propose the use of the q-Gaussian function as RBF. The q-Gaussian can be defined as:

    φi(di(x)) = (1 − (1 − q)·di(x))^(1/(1−q))   if (1 − (1 − q)·di(x)) ≥ 0
    φi(di(x)) = 0                               otherwise        (6)
[Figure 1 omitted: two panels plotting radial unit activation against distance. Panel (a) compares the SRBF, CRBF and IMRBF; panel (b) shows the q-Gaussian for q ∈ {0.25, 0.75, 1.00, 1.50, 2.00, 3.00, 4.00}.]

Fig. 1. Radial unit activation in one-dimensional space with c = 0 and θ = 1 for different RBFs: (a) SRBF, CRBF and IMRBF and (b) q-Gaussian with different values of q
The q-Gaussian can reproduce different RBFs for different values of the real parameter q. As an example, when the q parameter is equal to 2, the q-Gaussian is the CRBF; for q = 3, the activation of a radial unit with an IMRBF for 2di(x) turns out to be equal to the activation of a radial unit with a q-Gaussian RBF for di(x); and finally, when the value of q converges to 1, the q-Gaussian converges to the Gaussian function (SRBF). Fig. 1b presents the radial unit activation of the q-Gaussian RBF for different values of q. As can be seen in Fig. 1b, a small change in the value of q represents a smooth modification of the shape of the RBF.
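These limiting cases of Eq. 6 can be checked numerically with a small sketch (the function name is illustrative):

```python
import numpy as np

def q_gaussian(d, q):
    # Eq. 6: (1 - (1 - q) d)^(1/(1 - q)) where the base is non-negative,
    # 0 otherwise; the q -> 1 limit is exp(-d), the standard Gaussian.
    if abs(q - 1.0) < 1e-12:
        return np.exp(-d)
    base = 1.0 - (1.0 - q) * d
    # np.abs keeps the dead branch of np.where numerically well defined
    return np.where(base >= 0.0, np.abs(base) ** (1.0 / (1.0 - q)), 0.0)
```

At q = 2 the expression collapses to (1 + d)^(−1), the CRBF, and for q < 1 the activation has compact support (it is exactly zero once the base becomes negative).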
3 q-Gaussian RBF for Classification
To apply evolutionary neural network techniques, we consider RBFNNs with softmax outputs and the standard structure: an input layer with a node for every input variable; a hidden layer with several RBFs; and an output layer with one node. There are no connections between the nodes of a layer, and none between the input and output layers either. The activation function of the i-th node in the hidden layer (φi(di(x))) is given by Eq. 6 and the activation function of the output node (f(x)) is defined in Eq. 1. The transfer function of all output nodes is the identity function. In this work, the outputs of the neurons are interpreted from the point of view of probability through the use of the softmax activation function:

    g(x) = exp f(x) / (1 + exp f(x))        (7)
where g(x) is the probability that a pattern x belongs to class 1. The probability that a pattern x belongs to class 2 is 1 − g(x). The error surface associated with the model is very convoluted. Thus, the parameters of the RBFNNs are estimated by means of a HA (detailed in Section
 1: Hybrid Algorithm:
 2: Generate a random population of size N
 3: repeat
 4:   Calculate the fitness of every individual in the population
 5:   Rank the individuals with respect to their fitness
 6:   The best individual is copied into the new population
 7:   The best 10% of population individuals are replicated and they substitute the worst 10% of individuals
 8:   Apply parametric mutation to the best pm% of individuals
 9:   Apply structural mutation to the remaining (100 − pm)% of individuals
10: until the stopping criterion is fulfilled
11: Apply iRprop+ to the best solution obtained by the EA in the last generation.

Fig. 2. Hybrid Algorithm (HA) framework
4). The HA was developed to optimize the error function given by the negative log-likelihood for N observations, which is defined for a classifier g as:

    l(g) = (1/N) Σ_{n=1}^{N} [ −yn · f(xn) + log(1 + exp f(xn)) ]        (8)
where yn is the class label of pattern xn.
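Equations 7 and 8 can be sketched together; note that the negative log-likelihood below is simply the binary cross-entropy of the logistic output written in terms of f(x), under the assumption yn ∈ {0, 1} (function names are illustrative):

```python
import numpy as np

def g(f_x):
    # Eq. 7: logistic (two-class softmax) output, P(class 1 | x)
    return np.exp(f_x) / (1.0 + np.exp(f_x))

def neg_log_likelihood(f_vals, y):
    # Eq. 8: l(g) = (1/N) sum_n [ -y_n f(x_n) + log(1 + exp f(x_n)) ]
    f_vals, y = np.asarray(f_vals, float), np.asarray(y, float)
    return float(np.mean(-y * f_vals + np.log1p(np.exp(f_vals))))
```

For a single pattern with y = 1, the summand equals −log g(x), which confirms that Eq. 8 is term-by-term the negative log of the probability assigned by Eq. 7.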
4 Hybrid Algorithm
The basic framework of the HA is the following: the search begins with an initial population of RBFNNs and, in each iteration, the population is updated using a population-update algorithm which evolves both its structure and weights. The population is subject to operations of replication and mutation. Figure 2 describes the procedure to select the parameters of the radial units. The main characteristics of the algorithm are the following: 1. Representation of the Individuals. The algorithm evolves architectures and connection weights simultaneously, each individual being a fully specified RBFNN. The neural networks are represented using an object-oriented approach and the algorithm deals directly with the RBFNN phenotype. 2. Error and Fitness Functions. We consider l(g) (see Eq. 8) as the error function of an individual g of the population. The fitness measure needed for evaluating the individuals is a strictly decreasing transformation of the error function l(g), given by A(g) = 1/(1 + l(g)), where 0 < A(g) ≤ 1. 3. Initialization of the Population. The initial population is generated trying to obtain RBFNNs with the maximum possible fitness. First, 5,000 random RBFNNs are generated. The centers of the radial units are first defined by the k-means algorithm for different values of k, where k ∈ [Mmin, Mmax], with Mmin and Mmax being the minimum and maximum numbers of hidden nodes allowed for any RBFNN model in the HA. The widths of the RBFNNs
are initialized to the geometric mean of the distances to the two nearest neighbours, and the q parameter to values near 1, since when q → 1 the q-Gaussian reduces to the standard Gaussian RBF. A random value in the [−I, I] interval is assigned to the weights between the hidden layer and the output layer. The obtained individuals are evaluated using the fitness function, and the initial population is finally obtained by selecting the best 500 RBFNNs. 4. Structural Mutation. Structural mutation implies a modification in the structure of the RBFNNs and allows the exploration of different regions of the search space, helping to keep the diversity of the population. There are four different structural mutations: hidden node addition, hidden node deletion, connection addition and connection deletion. These four mutations are applied sequentially to each network, each one with a specific probability. If the structural mutator adds a new node to the RBFNN, the q parameter is assigned a γ value, where γ ∈ [0.75, 1.25], since when q → 1 the q-Gaussian reduces to the SRBF. 5. Parametric Mutation. Different weight mutations are applied: – Centre, Radii and q Mutation. These parameters are modified in the following way: • Centre creep. The value of each centre is modified by adding Gaussian noise, cji(t + 1) = cji(t) + ξ(t), where ξ(t) ∈ N(cji, ri) and N(cji, ri) represents a one-dimensional normally distributed random variable with mean cji and variance the radius of the RBF hidden node. • Radius creep. The value of each radius is modified by adding another Gaussian noise, ri(t + 1) = ri(t) + ξ(t), where ξ(t) ∈ N(ri, d) and N(ri, d) represents a one-dimensional normally distributed random variable with mean ri and variance the width of the range of each dimension (d). • Mutation of the q parameter.
The q parameter is updated by adding an ε value, where ε ∈ [−0.25, 0.25], since the q-Gaussian RBFNN is very sensitive to variations in q (as can be seen in Fig. 1b). – Output-to-Hidden Node Connection Mutations [3]. These connections are modified by adding another Gaussian noise, w(t + 1) = w(t) + ξ(t), where ξ(t) ∈ N(0, T(g)) and N(0, T(g)) represents a one-dimensional normally distributed random variable with mean 0 and variance the network temperature (T(g) = 1 − A(g)). 6. iRprop+ Local Optimizer. The local optimization algorithm used in our paper is the iRprop+ [6] optimization method. The iRprop+ is believed to be a fast and robust learning algorithm. This algorithm applies a backtracking strategy (i.e., it decides whether or not to take a step back along a weight direction by means of a heuristic). In the proposed methodology, we run the EA and then apply the local optimization algorithm to the best solution obtained by the EA in the last generation.
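The width initialization described in step 3 above (geometric mean of the distances to the two nearest centres) can be sketched as follows; the helper name is illustrative:

```python
import numpy as np

def initial_width(centre, centres):
    # Geometric mean of the distances from `centre` to its two nearest
    # other centres, used here to seed an RBF width.
    dists = np.sort(np.linalg.norm(np.asarray(centres) - np.asarray(centre), axis=1))
    two_nearest = dists[dists > 0][:2]  # drop the zero self-distance
    return float(np.sqrt(two_nearest[0] * two_nearest[1]))
```

This gives each radial unit a scale comparable to the local spacing of the k-means centres, so no unit starts out either covering the whole input space or collapsed onto a single point.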
Evolutionary q-Gaussian Radial Basis Functions for Binary-Classification
Table 1. Characteristics of the eleven datasets used for the experiments: number of instances (Size), number of Real (R), Binary (B) and Nominal (N) input variables, total number of inputs (#In.), number of classes (#Out.), per-class distribution of the instances (Distribution), minimum and maximum number of hidden nodes used for each dataset ([Mmin, Mmax]) and the number of generations (#Gen.)

Dataset     Size   R   B   N  #In. #Out. Distribution    [Mmin, Mmax]  #Gen.
Labor         57   8   3   5   29    2   (30, 27)        [2, 5]          20
Promoters    106   -   -  57  114    2   (53, 53)        [2, 5]         100
Hepatitis    155   6  13   -   19    2   (32, 123)       [2, 5]          20
Sonar        208  60   -   -   60    2   (98, 110)       [2, 5]          40
Heart        270  13   -   -   13    2   (150, 120)      [2, 5]         100
BreastC      286   4   3   2   15    2   (201, 85)       [2, 5]          40
Heart-C      302   6   3   4   26    2   (164, 138)      [2, 5]         100
Liver        345   6   -   -    6    2   (145, 200)      [2, 5]          40
Vote         435   -  16   -   16    2   (267, 168)      [2, 5]          20
Card         690   6   4   5   51    2   (307, 308)      [4, 7]          40
German      1000   6   3  11   61    2   (700, 300)      [1, 3]         200

All nominal variables are transformed to binary variables. BreastC: Breast-Cancer; Heart-C: Heart-disease (Cleveland).
5 Experiments

5.1 Experimental Design
The proposed methodologies are applied to eleven datasets taken from the UCI repository [7], to test their overall performance when compared to other radial basis functions (SRBF, CRBF and the Inverse Multiquadratic RBF (IMRBF)). The selected datasets present different numbers of instances and features (see Table 1). The experimental design was conducted using a 10-fold cross validation, with 10 repetitions per fold. The performance of each method has been evaluated using the correct classification rate in the generalization set (CG). All the parameters used in the evolutionary algorithm, except the maximum and minimum number of RBFs in the hidden layer and the number of generations, have the same values in all problems analyzed below (see Table 1). We have done a simple linear rescaling of the input variables into the interval [−2, 2], X_i* being the transformed variables. The connections between the hidden and output layer are initialized in the [−5, 5] interval (i.e. [−I, I] = [−5, 5]). The size of the population is N = 500. For the structural mutation, the number of nodes that can be added or removed is within the [1, 2] interval, and the number of connections to add or delete in the hidden and the output layer during structural mutations is within the [1, 7] interval.
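The linear rescaling of the inputs into [−2, 2] mentioned above can be sketched as follows (function and variable names are ours, not from the paper):

```python
import numpy as np

def rescale(X, lo=-2.0, hi=2.0):
    """Linearly rescale each input variable (column of X) into [lo, hi]."""
    xmin = X.min(axis=0)
    xmax = X.max(axis=0)
    return lo + (hi - lo) * (X - xmin) / (xmax - xmin)

X = np.array([[0.0, 10.0],
              [5.0, 20.0],
              [10.0, 30.0]])
Xs = rescale(X)
# each column now spans exactly [-2, 2]
```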
F. Fernández-Navarro et al.
Table 2. Comparison of the proposed basis function to other basis functions: Mean and Standard Deviation (SD) of the accuracy results (CG(%)) from 100 executions, mean accuracy (CG(%)), mean ranking (R), p-Value and α for the Hommel post-hoc non-parametric tests in CG with α = 0.1 (q-Gaussian is the control method)

Dataset     SRBF           CRBF           IMRBF          q-Gaussian
Labor       91.33±12.09    95.00±11.24    91.66±8.78     93.33±11.65
Promoters   75.54±13.56    80.18±6.66     81.09±8.69     84.00±6.15
Hepatitis   86.33±8.09     83.16±7.15     85.12±7.52     85.30±7.54
Sonar       78.38±9.03     74.09±10.20    76.02±11.16    76.04±13.56
Heart       81.85±8.97     83.70±8.76     84.81±8.45     84.07±7.20
BreastC     72.04±6.39     71.35±8.00     73.10±6.39     73.06±6.77
Heart-C     85.44±3.83     85.45±5.59     85.77±3.05     85.79±5.20
Liver       68.41±5.15     65.23±8.23     65.52±6.31     71.30±6.50
Vote        96.32±3.97     95.39±3.59     94.94±2.36     96.08±3.45
Card        86.08±3.14     86.52±3.55     85.94±3.80     87.87±0.37
German      74.80±3.82     74.90±3.17     74.40±2.50     75.25±2.98
CG(%)       81.50          81.36          81.67          82.91
R           2.72           2.99           2.72           1.54
p-Value     0.03           0.00           0.03           -
αHommel     0.10           0.03           0.05           -

The best result is in bold face and the second best result in italics.

5.2 Comparison to Other Radial Basis Functions
In Table 2, the mean and the standard deviation of the correct classification rate in the generalization set (CG) are shown for each dataset over a total of 100 executions. From a purely descriptive point of view, the q-Gaussian model obtained the best results for five datasets, the SRBF achieved the best performance for three datasets, and the CRBF and IMRBF methods yielded the best performance for one and two datasets, respectively. To determine the statistical significance of the rank differences observed for each method in the different datasets, we have carried out a non-parametric Friedman test [8] with the ranking of CG of the best models as the test variable (since a previous evaluation of the CG values results in rejecting the normality and the equality-of-variances hypotheses). The test shows that the effect of the method used for classification is statistically significant at a significance level of 10%, as the confidence interval is C0 = (0, F0.10 = 2.89) and the statistic obtained for CG is F* = 8.34 ∉ C0. Consequently, we reject the null hypothesis stating that all algorithms perform equally in mean ranking. Based on this rejection, the Hommel post-hoc test is used to compare all classifiers to each other. The Hommel test was applied with the best performing model (q-Gaussian) as the control method. The results of the Hommel tests for α = 0.10 can be seen in Table 2, with the corresponding p and αHommel values.
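The Friedman ranking computation can be reproduced from the mean CG values of Table 2 (a sketch; the values are transcribed from the table as we read it, and the statistic computed is the Friedman chi-square):

```python
import numpy as np

# Mean C_G (%) per dataset (rows) for SRBF, CRBF, IMRBF, q-Gaussian (Table 2)
scores = np.array([
    [91.33, 95.00, 91.66, 93.33],   # Labor
    [75.54, 80.18, 81.09, 84.00],   # Promoters
    [86.33, 83.16, 85.12, 85.30],   # Hepatitis
    [78.38, 74.09, 76.02, 76.04],   # Sonar
    [81.85, 83.70, 84.81, 84.07],   # Heart
    [72.04, 71.35, 73.10, 73.06],   # BreastC
    [85.44, 85.45, 85.77, 85.79],   # Heart-C
    [68.41, 65.23, 65.52, 71.30],   # Liver
    [96.32, 95.39, 94.94, 96.08],   # Vote
    [86.08, 86.52, 85.94, 87.87],   # Card
    [74.80, 74.90, 74.40, 75.25],   # German
])
N, k = scores.shape
# rank methods within each dataset, rank 1 = highest accuracy (no ties here)
ranks = k - scores.argsort(axis=1).argsort(axis=1)
mean_ranks = ranks.mean(axis=0)   # q-Gaussian comes out near the 1.54 in Table 2
chi2 = 12 * N / (k * (k + 1)) * ((mean_ranks - (k + 1) / 2) ** 2).sum()
# chi2 comes out near 8.35, consistent with the 8.34 quoted in the text
```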
From the results of these tests, it can be concluded that the q-Gaussian model obtains a significantly better mean ranking of CG when compared to the remaining RBFs, which justifies the proposal.
6 Conclusions
In this paper, we have proposed a new approach to determine optimized parameters for the q-Gaussian RBF applied to binary classification problems. These models have been designed with a hybrid algorithm (HA) constructed specifically to take into account the characteristics of this kernel model. The evaluation of the model and the algorithm on the eleven datasets considered showed that the q-Gaussian RBF obtained a higher accuracy than the remaining RBFs. Finally, some suggestions for future research are the following: to study other radial basis functions and to adapt the algorithm to deal with multi-class problems.
Acknowledgement. This work has been partially subsidized by the TIN 2008-06681-C06-03 project of the Spanish Inter-Ministerial Commission of Science and Technology (MICYT), FEDER funds and the P08-TIC-3745 project of the "Junta de Andalucía" (Spain). The research of Francisco Fernández-Navarro has been funded by the "Junta de Andalucía" Predoctoral Program, grant reference P08-TIC-3745.
References
1. Lippmann, R.: Pattern classification using neural networks. IEEE Communications Magazine 27, 47–64 (1989)
2. Bishop, C.M.: Neural networks for pattern recognition. Oxford University Press, Oxford (1996)
3. Martínez-Estudillo, F.J., Hervás-Martínez, C., Gutiérrez, P.A., Martínez-Estudillo, A.C.: Evolutionary product-unit neural networks classifiers. Neurocomputing 72(12), 548–561 (2008)
4. Gutiérrez, P.A., Hervás-Martínez, C., Carbonero, M., Fernández, J.C.: Combined Projection and Kernel Basis Functions for Classification in Evolutionary Neural Networks. Neurocomputing 72(13-15), 2731–2742 (2009)
5. Freeman, J.A.S., Saad, D.: Learning and generalization in radial basis function networks. Neural Computation 7(5), 1000–1020 (1995)
6. Igel, C., Hüsken, M.: Empirical evaluation of the improved Rprop learning algorithms. Neurocomputing 50(6), 105–123 (2003)
7. Asuncion, A., Newman, D.: UCI machine learning repository (2007)
8. Friedman, M.: A comparison of alternative tests of significance for the problem of m rankings. Annals of Mathematical Statistics 11(1), 86–92 (1940)
Evolutionary Learning Using a Sensitivity-Accuracy Approach for Classification

Javier Sánchez-Monedero 1, C. Hervás-Martínez 1, F.J. Martínez-Estudillo 2, Mariano Carbonero Ruz 2, M.C. Ramírez Moreno 2, and M. Cruz-Ramírez 1

1 Department of Computer Science and Numerical Analysis, University of Córdoba, Spain
2 Department of Management and Quantitative Methods, ETEA, Spain
Abstract. Accuracy alone is insufficient to evaluate the performance of a classifier especially when the number of classes increases. This paper proposes an approach to deal with multi-class problems based on Accuracy (C) and Sensitivity (S). We use the differential evolution algorithm and the ELM-algorithm (Extreme Learning Machine) to obtain multi-classifiers with a high classification rate level in the global dataset with an acceptable level of accuracy for each class. This methodology is applied to solve four benchmark classification problems and obtains promising results.
1 Introduction
To evaluate a classifier, the machine learning community has traditionally used the correct classification rate or accuracy to measure its default performance. In the same way, accuracy has been frequently used as the fitness function in evolutionary algorithms when solving classification problems. However, the pitfalls of using accuracy have been pointed out by several authors [1]. Actually, it is enough to simply realize that accuracy cannot capture all the different behavioural aspects found in two different classifiers. Assuming that all misclassifications are equally costly and that there is no penalty for a correct classification, we start from the premise that a good classifier should combine a high classification rate level in the testing set with an acceptable level for each class. Concretely, we consider traditionally used accuracy (C) and the minimum of the sensitivities of all classes (S), that is, the lowest percentage of examples correctly predicted as belonging to each class with respect to the total number of examples in the corresponding class. Recently, in [2], Huang et al. proposed an original algorithm called extreme learning machine (ELM) which randomly chooses hidden nodes and analytically determines (by using Moore-Penrose generalized inverse) the output weights of the network. The algorithm tends to provide good testing performance at
Corresponding author at: E-mail address: [email protected] Phone: +34-957218349. This work has been partially subsidized by TIN 2008-06681-C06-03 (MICYT), FEDER funds and the P08-TIC-3745 project of the “Junta de Andaluc´ıa” (Spain).
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 288–295, 2010. c Springer-Verlag Berlin Heidelberg 2010
Evolutionary Learning Using Sensitivity-Accuracy Approach
an extremely fast learning speed. However, ELM may need a higher number of hidden nodes due to the random determination of the input weights and hidden biases. In [3], a hybrid algorithm called Evolutionary ELM (E-ELM) was proposed, which uses the differential evolution algorithm [4]. The experimental results obtained show that this approach reduces the number of hidden nodes and obtains more compact networks. In this paper, the simultaneous optimization of accuracy and sensitivity is carried out by means of the E-ELM algorithm. The key point of the algorithm is the fitness function considered, a convex linear combination of accuracy and sensitivity, which tries to achieve a good balance between the classification rate level in the global dataset and an acceptable level for each class. The base classifier considered is the standard multilayer perceptron (MLP) neural network. The paper is structured as follows. First, we present our approach based on the sensitivity versus accuracy pair (S, C). The third section contains the evolutionary approach. Finally, the paper concludes with an analysis of the results obtained in four benchmark classification problems.
2 Accuracy and Sensitivity
We consider a classification problem with Q classes and N training or testing patterns, with g as a classifier obtaining a Q × Q contingency or confusion matrix M(g) = (n_{ij}), with \sum_{i,j=1}^{Q} n_{ij} = N, where n_{ij} represents the number of times the patterns are predicted by classifier g to be in class j when they really belong to class i. Let us denote the number of patterns associated with class i by f_i = \sum_{j=1}^{Q} n_{ij}, i = 1, ..., Q. We start by defining two scalar measures that take the elements of the confusion matrix into consideration from different points of view. Let S_i = n_{ii}/f_i be the proportion of patterns correctly predicted to be in class i with respect to the total number of patterns in class i (sensitivity for class i). Therefore, the sensitivity for class i estimates the probability of correctly predicting a class i example. From the above quantities we define the sensitivity S of the classifier as the minimum value of the sensitivities for each class, S = min{S_i; i = 1, ..., Q}. We define the Correct Classification Rate or Accuracy, C = (1/N) \sum_{j=1}^{Q} n_{jj}, which is the rate of all the correct predictions. Specifically, we consider the two-dimensional measure (S, C) associated with classifier g. The measure tries to evaluate two features of a classifier: global performance and the performance in each class. We represent S on the horizontal axis and C on the vertical axis. One point in (S, C) space dominates another if it is above and to the right, i.e. it has more accuracy and greater sensitivity. It is straightforward to prove the following relationship between C and S (see [5]). Let us consider a Q-class classification problem, and let C and S be respectively the accuracy and sensitivity associated with a classifier g; then S ≤ C ≤ 1 − (1 − S)p*, where p* = f_Q/N is the minimum of the estimated prior probabilities.
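The (S, C) pair and the bound above can be sketched in a few lines (illustrative code with a made-up confusion matrix, not from the paper):

```python
import numpy as np

def accuracy_and_min_sensitivity(conf):
    """Compute (S, C) from a Q x Q confusion matrix.

    conf[i, j] = number of class-i patterns predicted as class j.
    """
    conf = np.asarray(conf, dtype=float)
    N = conf.sum()
    C = np.trace(conf) / N                        # accuracy
    per_class = np.diag(conf) / conf.sum(axis=1)  # sensitivity of each class
    S = per_class.min()                           # minimum sensitivity
    return S, C

# Example: 3-class confusion matrix
M = np.array([[48, 2, 0],
              [5, 40, 5],
              [0, 4, 16]])
S, C = accuracy_and_min_sensitivity(M)
p_star = M.sum(axis=1).min() / M.sum()   # minimum estimated prior probability
assert S <= C <= 1 - (1 - S) * p_star    # the bound quoted in the text
```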
J. Sánchez-Monedero et al.
Therefore, each classifier will be represented as a point outside the shaded region in Fig. 2 (Fig. 2 is built from experimental data, see Section 4.2). Several points in (S, C) space are important to note. The lower left point (0, 0) represents the worst classifier and the optimum classifier is located at the (1, 1) point. Furthermore, the points on the vertical axis correspond to classifiers that are not able to predict any point in a concrete class correctly. Note that it is possible to find among them classifiers with a high level of C, particularly in problems with small p∗ [6]. Our objective is to build an evolutionary algorithm that tries to move the classifier population towards the optimum classifier located in the (1, 1) point in the (S, C) space. We think an evolutionary algorithm could be an adequate scheme allowing us to improve the quality of the classifiers, measured in terms of C and S, directing the solutions towards the (1, 1) point.
3 The Proposed Method

3.1 Differential Evolution and Extreme Learning Machine
Let us consider the training set given by N samples D = {(x_j, y_j) : x_j ∈ R^K, y_j ∈ R^Q, j = 1, 2, ..., N}, where x_j is a K × 1 input vector and y_j is a Q × 1 target vector. Let us consider the MLP with M nodes in the hidden layer given by f = (f_1, f_2, ..., f_Q):

f_l(x, θ_l) = β_0^l + \sum_{j=1}^{M} β_j^l σ_j(x, w_j),  l = 1, 2, ..., Q

where θ = (θ_1, ..., θ_Q)^T is the transpose matrix containing all the neural net weights, θ_l = (β_0^l, β_1^l, ..., β_M^l, w_1, ..., w_M) is the vector of weights of the l-th output node, w_j = (w_{1j}, ..., w_{Kj}) is the vector of weights of the connections between the input layer and the j-th hidden node, Q is the number of classes in the problem, M is the number of sigmoidal units in the hidden layer, x is the input pattern and σ_j(x, w_j) the sigmoidal function. Suppose we are training an MLP with M nodes in the hidden layer to learn the N samples of set D. The linear system f(x_j) = y_j, j = 1, 2, ..., N, can be written in a more compact format as Hβ = Y, where H is the hidden layer output matrix of the network. The ELM algorithm randomly selects the hidden-node weights w_j = (w_{1j}, ..., w_{Kj}) and biases, and analytically determines the output weights β_0^l, β_1^l, ..., β_M^l by finding the least-square solution to the given linear system. The minimum norm least-square (LS) solution to the linear system is β̂ = H†Y, where H† is the Moore-Penrose (MP) generalized inverse of matrix H. The minimum norm LS solution is unique and has the smallest norm among all the LS solutions. The Evolutionary Extreme Learning Machine (E-ELM) [3] improves the original ELM by using a Differential Evolution (DE) algorithm. Differential Evolution was proposed by Storn and Price [4] and is known as one of the most efficient evolutionary algorithms.
Require: P (Training Patterns), T (Training Tags)
1: Create a random initial population θ = [w1, ..., wk, b1, ..., bk] of size N
2: for each individual do
3:   β̂ = ELM_output(w, P, T) {Calculate output weights}
4:   φλ = Fitness(w, β̂) {Evaluate individual}
5: end for
6: Select best individual of initial population
7: while stop condition is not met do
8:   Mutate random individuals and apply crossover
9:   for each individual in the new population do
10:    β̂ = ELM_output(w, P, T) {Calculate output weights}
11:    φλ = Fitness(w, β̂) {Evaluate model}
12:    Select new individuals for replacing individuals in old population
13:  end for
14:  Select the best model in the generation
15: end while
16: function β̂ = ELM_output(w, P, T)
      Calculate the hidden layer output matrix H
      Calculate the output weights β̂ = H†Y
17: function φλ = Fitness(w, β̂, λ, P, T)
      Build training confusion matrix M
      Calculate C and S from M
      Get classifier fitness with (1)

Fig. 1. E-ELM-CS algorithm pseudocode
3.2 The E-ELM-CS Algorithm
As mentioned in Section 2, our approach tries to build classifiers with C and S simultaneously optimized. These objectives are not always cooperative, a fact that would justify the use of a multi-objective approach for the evolutionary algorithm [6]. To maximize the objectives C and S we use instead a linear combination of them. This option is a good method when there are two objectives and when the first Pareto front has a very small number of models, in some cases only one (see the results of the MPANN methodology on the Balance and Newthyroid datasets in Table 2). In addition, its computational cost is noticeably lower. A weighted linear combination proves to be very efficient in practice for certain types of problems, for example in combinatorial multi-objective optimization; applications of this technique include schedule evaluation in a resource scheduler or the design of multiplierless IIR filters. We consider the fitness function defined by φλ = (1 − λ)C + λS, where λ is a user parameter in [0, 1]. This function evaluates the performance of a classifier as a weighted combination of Accuracy and Sensitivity. Our proposed method is implemented using the Evolutionary ELM (E-ELM) [3]. E-ELM for classification problems only considers the misclassification rate of the classifier. We have extended E-ELM to consider both C and S (E-ELM-CS, Evolutionary ELM considering C and S). Since E-ELM considers
Table 1. Datasets used for the experiments

Dataset      Size  #Input  #Classes  Distribution     p*
BreastC       286    15       2      (201, 85)       0.2957
BreastCW      699     9       2      (458, 241)      0.3428
Balance       625     4       3      (288, 49, 288)  0.0641
Newthyroid    215     5       3      (150, 35, 30)   0.1296
an error measure as the fitness, which should be minimized, we reformulate our fitness function as:

φλ = 1 − ((1 − λ)C + λS)    (1)

The E-ELM-CS algorithm pseudocode is shown in Fig. 1. Mutation, crossover and selection operations work as described in [3].
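Reading equation (1) as the error form φλ = 1 − ((1 − λ)C + λS), to be minimized (our reading of the garbled printed equation), a fitness sketch is:

```python
import numpy as np

def phi(conf, lam):
    """Fitness of Eq. (1) as we read it: 1 - ((1 - lam) * C + lam * S),
    lower is better; conf is the training confusion matrix."""
    conf = np.asarray(conf, dtype=float)
    C = np.trace(conf) / conf.sum()              # accuracy
    S = (np.diag(conf) / conf.sum(axis=1)).min() # minimum sensitivity
    return 1.0 - ((1.0 - lam) * C + lam * S)

M = np.array([[90, 10],
              [20, 80]])
# lam = 0 scores pure accuracy error; lam = 1 scores worst-class error
print(phi(M, 0.0))  # ≈ 1 - C = 0.15
print(phi(M, 1.0))  # ≈ 1 - S = 0.20
```

Intermediate λ values trade the global error against the worst-class error, which is the balance the text aims for.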
4 Experiments
We consider four datasets with different features taken from the UCI repository (see Table 1). The experimental design was conducted using a stratified holdout procedure with 30 runs, where approximately 75% of the patterns were randomly selected for the training set and the remaining 25% for the test set.

4.1 Comparison Procedure
E-ELM-CS is compared to two popular classification algorithms using ANNs:
1. MPANN (Memetic Pareto Artificial Neural Networks) [7]. MPANN is a Multi-objective Evolutionary Algorithm based on Differential Evolution [8] with two objectives: one is to minimize the mean squared error (MSE) and the other is to minimize ANN complexity (the number of hidden units). We have implemented a Java version using the pseudocode shown in [7] and the framework for evolutionary computation JCLEC (1). The methodology is named MPANN-MSE when the chosen extreme of the Pareto front provided by the algorithm has the better MSE, and MPANN-HN when the chosen extreme has the better complexity value.
2. TRAINDIFFEVOL (Differential evolution training algorithm for Neural Networks) [8]. TRAINDIFFEVOL is an algorithm to train feed-forward multilayer perceptron neural networks based on Differential Evolution [4]. This algorithm uses the MSE and the mean squared weights and biases for training the networks. To obtain the sensitivity for each class, a modification of the source code provided by the author (2) has been implemented.
(1) http://jclec.sourceforge.net/
(2) http://www.it.lut.fi/project/nngenetic/
Table 2. Statistical results for E-ELM-CS, E-ELM, TRAINDIFFEVOL, MPANN-MSE and MPANN-HN

Dataset     Algorithm          C(%) Mean±SD   S(%) Mean±SD
BreastC     E-ELM-CS(λ=0.4)    68.97±3.19     33.97±6.82
            E-ELM              68.36±1.98     23.33±6.42
            TDIF               68.92±2.89     26.35±11.71
            MPANN-MSE          66.53±3.07     28.73±14.23
            MPANN-HN           66.53±3.07     28.41±14.34
            Means ranking of C: μELMCS ≥ μTDIF ≥ μELM > μMPANHN ≥ μMPAN
            Means ranking of S: μELMCS ≥ μMPANHN ≥ μMPAN ≥ μTDIF > μELM; μELMCS > μMPANHN, (◦) (T-test)
BreastCW    E-ELM-CS(λ=0.4)    96.32±0.86     93.87±2.28
            E-ELM              95.68±1.19     92.61±3.21
            TDIF               93.98±1.75     86.22±4.69
            MPANN-MSE          96.04±1.08     92.75±3.40
            MPANN-HN           96.27±1.00     93.30±3.36
            Means ranking of C: μELMCS ≥ μMPANHN ≥ μMPAN ≥ μELM > μTDIF
            Means ranking of S: μELMCS ≥ μMPANHN ≥ μMPAN ≥ μELM > μTDIF
Balance     E-ELM-CS(λ=0.7)    91.48±1.50     86.74±10.01
            E-ELM              90.56±1.38     14.00±17.73
            TDIF               87.12±2.56      2.00±6.10
            MPANN-MSE          92.94±1.81     60.00±14.14
            MPANN-HN           92.94±1.81     60.00±14.14
            Means ranking of C: μMPANHN ≥ μELMCS, (*) (M-W test)
            Means ranking of S: μELMCS > μMPANHN, (*) (M-W test)
Newthyroid  E-ELM-CS(λ=0.9)    96.23±2.31     80.85±11.88
            E-ELM              94.26±2.35     75.77±10.16
            TDIF               91.11±4.77     59.47±22.74
            MPANN-MSE          94.87±3.82     72.11±22.29
            MPANN-HN           94.87±3.82     72.11±22.29
            Means ranking of C: μELMCS ≥ μMPANHN, (M-W test)
            Means ranking of S: μELMCS ≥ μMPANHN ≥ μMPAN ≥ μELM > μTDIF; μELMCS > μMPANHN, (◦) (T-test)

(*) (◦) The average difference is significant with p-values = 0.05 or 0.10, respectively.
From a statistical point of view, these comparisons are possible because we use the same partitions of the datasets; otherwise, it would be difficult to justify the fairness of the comparison procedure. Regarding the settings of each algorithm compared to E-ELM-CS, we have used the parameter values advised by the authors in their respective studies.

4.2 Experimental Results
In Table 2 we present the mean and the standard deviation (SD) of C and S over 30 runs, obtained for the best models in each run over the generalization set of each dataset. In E-ELM-CS, λ is a user parameter; it has been obtained for each dataset as the best result of a preliminary experimental design with λ ∈ {0.0, 0.1, ..., 1.0}. If we analyze the results for C in the generalization set, we can observe that the E-ELM-CS methodology obtains results that are, in mean, better than or similar to those of the second best methodology (MPANN-HN in three datasets, TRAINDIFFEVOL in two datasets or E-ELM in one dataset). On the other hand, the results in mean of S show that the E-ELM-CS methodology obtains a performance that is better than that of the second best methodology (MPANN-HN in four datasets and TRAINDIFFEVOL or E-ELM in one dataset respectively). In order to determine the best methodology for training MLP neural networks (in the sense of its influence on C and S in the test dataset), an ANalysis Of the VAriance of one factor (ANOVA I) statistical method or the non-parametric
E-ELM TRAINDIFFEVOL
MPAN-MSE MPAN-HN Best classifier
1 (1-p*=0.94)
C (Accuracy)
0.8
0.6
0.4
0.2
0 Worst 0 classifier
0.2
0.4
0.6
0.8
1
S (Sensitivity)
Fig. 2. Comparison of E-ELM-CS, E-ELM, TDIF, MPAN-MSE and MPAN-HN methods for Balance database
Kruskal-Wallis (KW) tests were chosen, depending on the satisfaction of the normality hypothesis of the C and S values. The factor "methodology" analyzes the effect on C (or S) of each methodology applied, with levels i = 1...5: E-ELM-CS (ELMCS), E-ELM (ELM), TRAINDIFFEVOL (TDIF), MPANN-MSE (MPAN) and MPANN-HN (MPANHN). The results of the ANOVA or KW analysis for C and S show that, for the four datasets, the effect of the methodologies is statistically significant at a level of 5%. Because there is a significant difference in mean for C and S using Snedecor's F or the KW test, for the former, under the normality hypothesis, a post hoc multiple comparison test of the mean C and S obtained with the different levels of the factor is performed. We perform a Tukey test [20] under normality, and a pair-wise T-test or a pair-wise Mann-Whitney test in the other cases. Table 2 shows the results obtained: columns 5 and 6 present the post hoc Tukey test and the T-test or Mann-Whitney (M-W) tests. The mean difference is significant with p-values = 0.05 (*) or 0.10 (◦). Here μA ≥ μB means that methodology A yields better results than methodology B, but the difference is not significant, while μA > μB means that methodology A yields better results than methodology B with significant differences; the binary relation ≥ is not transitive. Observe that there is a relationship between the imbalance degree of the dataset and the results obtained by the E-ELM-CS algorithm. It is worthwhile to point out that, for imbalanced datasets, E-ELM-CS gets the best performance results and the highest differences in S when comparing the algorithms (see the Balance and Newthyroid results in Table 2). On the other hand, even for two-class problems we can observe the same behaviour (compare the S results of the BreastCW and BreastC datasets). Finally, our approach improves Sensitivity levels with respect to the original Evolutionary Extreme Learning Machine (E-ELM), while maintaining Accuracy at the same level.
Fig. 2 depicts the sensitivity-accuracy results of the four methodologies for the Balance dataset in the (S, C) space. A visual inspection of the figure allows us to easily observe the difference in the performance of E-ELM-CS with respect to E-ELM, TDIF and MPANN.
5 Conclusions
This work proposes a new approach to deal with multi-class classification problems. Assuming that a good classifier should combine a high classification rate level in the global dataset with an acceptable level for each class, we consider traditionally used Accuracy C and the minimum of the sensitivities of all classes, S. The differential evolution algorithm and the fast ELM algorithm are used to optimize both measures in a multi-objective optimization approach, by using a fitness function built as a convex linear combination of S and C. The procedure obtains multi-classifiers with a high classification rate level in the global dataset with a good level of accuracy for each class. In our opinion, the (S, C) approach reveals an interesting point of view for dealing with multi-class classification problems since it improves sensitivity levels with respect to the Evolutionary Extreme Learning Machine, while maintaining accuracy at similar levels.
References
1. Provost, F., Fawcett, T.: Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions. In: Proc. of the 3rd International Conference on Knowledge Discovery and Data Mining, pp. 43–48 (1997)
2. Huang, G.B., Chen, L., Siew, C.K.: Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Transactions on Neural Networks 17(4), 879–892 (2006)
3. Zhu, Q.Y., Qin, A., Suganthan, P., Huang, G.B.: Evolutionary extreme learning machine. Pattern Recognition 38(10), 1759–1763 (2005)
4. Storn, R., Price, K.: Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. of Global Opt. 11(4), 341–359 (1997)
5. Martínez-Estudillo, F., Gutiérrez, P., Hervás-Martínez, C., Fernández, J.: Evolutionary learning by a sensitivity-accuracy approach for multi-class problems. In: Proceedings of the 2008 IEEE Congress on Evolutionary Computation (CEC 2008), Hong Kong, China (2008)
6. Fernández, J., Martínez-Estudillo, F., Hervás, C., Gutiérrez, P.: Sensitivity versus accuracy in multi-class problems using memetic pareto evolutionary neural networks. IEEE Transactions on Neural Networks (accepted, 2010)
7. Abbass, H.A.: Speeding up backpropagation using multiobjective evolutionary algorithms. Neural Computation 15(11), 2705–2726 (2003)
8. Ilonen, J., Kamarainen, J.K., Lampinen, J.: Differential evolution training algorithm for feed-forward neural networks. Neural Process. Lett. 17(1), 93–105 (2003)
An Hybrid System for Continuous Learning Aldo Franco Dragoni, Germano Vallesi, Paola Baldassarri, and Mauro Mazzieri Department of Ingegneria Informatica, Gestionale e dell’Automazione (DIIGA), Università Politecnica delle Marche, Via Brecce Bianche, 60131 Ancona, Italy {a.f.dragoni,g.vallesi,p.baldassarri,m.mazzieri}@univpm.it
Abstract. We propose a Multiple Neural Networks system for dynamic environments, where one or more neural nets could no longer be able to properly operate, due to partial changes in some of the characteristics of the individuals. We assume that each expert network has a reliability factor that can be dynamically re-evaluated on the ground of the global recognition operated by the overall group. Since the net’s degree of reliability is defined as the probability that the net is giving the desired output, in case of conflicts between the outputs of the various nets the re-evaluation of their degrees of reliability can be simply performed on the basis of the Bayes Rule. The new vector of reliability will be used for making the final choice, by applying two algorithms, the Inclusion based and the Weighted one over all the maximally consistent subsets of the global outcome. Keywords: Multiple Neural Networks, Hybrid System, Bayesian Conditioning.
1 Introduction

Several research works indicate that some complex recognition problems cannot be effectively solved by a single neural network, but rather by "Multiple Neural Networks" systems [1]. The idea is to decompose a large problem into a number of subproblems and then to combine the subsolutions into the global one. Normally modules are domain specific and have specialized computational architectures to recognize certain subsets of the overall task [2]. Each module is typically independent and does not influence or become influenced by the others. Being simpler architectures, modules can respond to a given input faster [2]. The combination of expert neural networks can be competitive, cooperative or totally decoupled, but it is particularly critical when there are incompatibilities between them. In this case it is necessary to use mechanisms to deal with contradictions. In this work we apply a multiple neural network system to the problem of face recognition and we propose a model for detecting and solving the contradictions that may arise in the global outcome. Each neural network is trained to recognize a significant region of the face and is assigned an arbitrary a-priori degree of reliability (which may depend on the region of the face that must be recognized). This reliability factor can be dynamically re-evaluated on the basis of the Bayesian Rule after contradictions arise. The conflicts depend on the fact that there may be no global agreement about the recognized subject, perhaps because s/he changed some features of her/his face. The new vector of reliability obtained through the Bayes Rule will be

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 296–303, 2010. © Springer-Verlag Berlin Heidelberg 2010
An Hybrid System for Continuous Learning
used for making the final choice, by applying the “Inclusion based” algorithm [3] or another “Weighted” algorithm over all the maximally consistent subsets of the global output. Networks that do not agree with this choice are required to retrain themselves automatically on the basis of the recognized subject. In this way, the system should be able to follow the changes of the faces of the subjects, while continuing to recognize them even after many years thanks to this continuous process of self training.
2 Theoretical Background

In this section we introduce some theoretical background from the field of Belief Revision. Belief Revision occurs when a new piece of information, inconsistent with the present belief set, is added in order to produce a new consistent belief system [4].
[Figure: a Knowledge Base holding α (from source V) and α→¬β (from source T) receives β from source U; the Nogoods and Goods are extracted, and Bayesian Conditioning turns the a-priori reliabilities of U, V and T (0.9, 0.8 and 0.7) into a-posteriori ones, which feed the selection algorithms.]

Fig. 1. A "Belief Revision" mechanism
In Figure 1 we see a Knowledge Base (KB) with two pieces of information: α, which comes from source V, and the rule "if α, then not β", which comes from source T. Unfortunately, another piece of information, β, arrives from source U, causing a conflict in the KB. To solve it we find all the "maximally consistent subsets", called Goods, inside the inconsistent KB, and we choose one of them as the most believable one. In our case there are three Goods: {α, β}; {β, α→¬β}; {α, α→¬β}. Each source of information is associated with an a-priori "degree of reliability", which is intended as the a-priori probability that the source provides correct information. In case of conflicts the "degree of reliability" of the involved sources should decrease after "Bayesian Conditioning", which is obtained as follows. Let S = {s1, ..., sn} be the set of the sources; each source si is associated with an a-priori reliability R(si). Let φ be an element of 2^S. If the sources are independent, the probability that only the sources belonging to the subset φ ⊆ S are reliable is:
R(φ) = Π_{si∈φ} R(si) · Π_{si∉φ} (1 − R(si)) .   (1)

This combined reliability can be calculated for any φ, providing that:

Σ_{φ∈2^S} R(φ) = 1 .   (2)
A.F. Dragoni et al.
Of course, if the sources belonging to a certain φ give incompatible information, then R(φ) must be zero. Nogoods are defined to be the "minimally inconsistent subsets", so that Goods and Nogoods are dual notions, and finding all the Goods also means finding all the Nogoods. Now, what we have to do is:
• summing up into R_Contradictory the a-priori reliability of all the Nogoods and their supersets
• setting to zero the reliabilities of all the Nogoods and their supersets
• dividing the reliability of all the other sets of sources by 1 − R_Contradictory.
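As a sketch, the three steps above can be run directly on the Fig. 1 example; the a-priori reliabilities (0.9, 0.8, 0.7) and the single Nogood {U, V, T} come from the figure, and the independence assumption behind Eq. (1) is taken for granted:

```python
from itertools import combinations

# A-priori reliabilities of the three sources of Fig. 1 (assumed independent)
R = {"U": 0.9, "V": 0.8, "T": 0.7}
# U: β, V: α and T: α→¬β are jointly inconsistent: the only Nogood is {U,V,T}
nogoods = [{"U", "V", "T"}]

def subset_prob(phi):
    """Probability that exactly the sources in phi are reliable (Eq. 1)."""
    p = 1.0
    for s in R:
        p *= R[s] if s in phi else 1.0 - R[s]
    return p

powerset = [set(c) for k in range(len(R) + 1) for c in combinations(R, k)]
contradictory = lambda phi: any(ng <= phi for ng in nogoods)

# Step 1: collect the mass of the Nogoods and their supersets; steps 2-3:
# zero them out and renormalise so that Eq. (2) still holds.
r_contr = sum(subset_prob(phi) for phi in powerset if contradictory(phi))
NR = {s: sum(subset_prob(phi) for phi in powerset
             if s in phi and not contradictory(phi)) / (1.0 - r_contr)
      for s in R}

print({s: round(v, 7) for s, v in NR.items()})
# NR(U) ≈ 0.7983871, NR(V) ≈ 0.5967742, NR(T) ≈ 0.3951613
```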
The last step assures that the constraint (2) is still satisfied; this is known as "Bayesian Conditioning". The revised reliability NR(si) of a source si is defined to be the sum of the reliabilities of the elements of 2^S that contain si. If a source has been involved in some contradictions, then NR(si) ≤ R(si), otherwise NR(si) = R(si). For instance, the application of this Bayesian conditioning to the case of Figure 1 produces the following new reliabilities: NR(U) = 0.7983869, NR(V) = 0.5967742 and, finally, NR(T) = 0.3951612.

2.1 Selection Algorithms
These new "degrees of reliability" will be used for choosing the most credible Good as the one suggested by "the most reliable sources". There are three algorithms to perform this task:

1) Inclusion based (IB). This algorithm works as follows:
a) select all the Goods which contain information provided by the most reliable source;
b) if the selection returns just one Good, STOP: that is the most credible Good;
c) else, if there is more than one Good, pop the most reliable source from the list and go to step (a);
d) if there are no more Goods in the selection, the ones that were selected at the previous iteration are returned as the most credible ones, with the same degree of credibility.

2) Inclusion based weighted (IBW) is a variation of Inclusion based: each Good is associated with a weight derived from the sum of the Euclidean distances between the neurons of the networks (i.e., the inverse of the credibility of the recognition operated by each net). If IB selects more than one Good, then IBW selects as winner the Good with the lowest weight.

3) Weighted algorithm (WA) combines the a-posteriori reliability of each network with the order of the answers provided. Each answer has a weight 1/n, where n ∈ [1, N] represents its position among the N responses. Every Good is given a weight obtained by joining together the reliability of each network that supports it with the weight of the answer given by the network itself:

W_{Good_j} = Σ_{i=1}^{M_j} (1/n_i) · Rel_i .   (3)

where:
W_{Good_j}: weight of Good_j
Rel_i: reliability of network i (i ∈ Good_j)
n_i: position, in the list of answers provided by network i, of the answer supported by Good_j
M_j: number of networks that compose Good_j

If there is more than one Good with the same reliability, then the winner is the Good with the highest weight.
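A minimal sketch of the IB loop, applied to the Goods of the Fig. 1 example with the a-posteriori reliabilities computed in Section 2 (each Good is written as the set of sources supporting it):

```python
# A-posteriori reliabilities of the Fig. 1 sources, and the three Goods
# {α,β}, {β,α→¬β}, {α,α→¬β} expressed by their supporting sources
reliability = {"U": 0.7983871, "V": 0.5967742, "T": 0.3951613}
goods = [{"U", "V"}, {"U", "T"}, {"V", "T"}]

def inclusion_based(goods, reliability):
    """Steps (a)-(d) of the Inclusion based algorithm."""
    candidates = list(goods)
    for source in sorted(reliability, key=reliability.get, reverse=True):
        narrowed = [g for g in candidates if source in g]
        if not narrowed:          # step (d): keep the previous selection
            return candidates
        if len(narrowed) == 1:    # step (b): unique most credible Good
            return narrowed
        candidates = narrowed     # step (c): pop the source and iterate
    return candidates

print(inclusion_based(goods, reliability))
# the winner is {U, V}, i.e. the Good {α, β}
```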
3 Face Recognition System: an Example

Many methods of face recognition have been proposed during the past 30 years [5, 6]. These methods are broadly classified into Holistic methods, Local methods and Hybrid methods [7]. In the Holistic methods each face image is represented as a single high-dimensional vector by concatenating the grey values of all the pixels in the face; Local methods use local facial features; and, finally, Hybrid methods use both local and holistic features to recognize a face. We focus our attention on the Local methods, which provide the flexibility to recognize a face based on its parts. Each face is represented by a set of four rectangular masks covering its main facial features, i.e., eyes, nose, mouth and hair [8]. We apply the Belief Revision method to the problem of recognizing faces by means of a "Multiple Neural Networks" system. There are four neural nets, each specialized to perform a specific task: eyes recognition (E), nose recognition (N), mouth recognition (M) and, finally, hair recognition (H). Their outputs are the recognized subjects, and conflicts are simple disagreements regarding the recognized subject. The group should be able to recognize the face even if partial changes have occurred. Neural networks are able to upgrade themselves in the presence of changes in the input pattern. As an example, let's suppose that during the testing phase the system has to recognize the faces of four persons: Andrea (A), Franco (F), Lucia (L) and Paolo (P). According to the value of the weights of each trained network, each net will provide as output a list of names of subjects, ordered from the most probable to the least probable one. For the purposes of this example, we take into account only the first two outputs (i.e., we limit the uncertainty to the two most probable names).
Let's suppose that, after the testing phase, the outputs of the networks are as follows: E gives as output "A or F", N gives "A or P", M gives "L or P" and, finally, H gives "L or A". So the 4 networks do not globally agree, since the intersection of the four outputs is empty. The problem is to establish the most credible individual corresponding to this contradictory global output. To solve this problem we adopt the Belief Revision method. First of all we need to give an a-priori degree of reliability to each network. Then we have to find the Goods and Nogoods. In our example the Goods (the largest subsets of {E,N,M,H} which agree on the choice of at least one subject) are: {E,N,H}, corresponding to Andrea; {N,M}, corresponding to Paolo; and, finally, {M,H}, corresponding to Lucia. Besides, we identify two Nogoods (the smallest subsets of {E,N,M,H} which have no subject in common): {N,M,H} and {E,M}. Now we have to choose the most credible Good, i.e., the one "provided by the most reliable
networks". However, the reliabilities of the networks have changed, due to the fact that they fell into conflict. Starting from an undifferentiated a-priori reliability factor of 0.9, and applying the method described in the previous section, we get the following new vector of reliability: NR(E)=0.7684, NR(N)=0.8375, NR(M)=0.1459 and NR(H)=0.8375. The networks N and H have the (same) highest reliability, and by applying the "Inclusion based" algorithm it turns out that the most credible Good is {E,N,H}, which corresponds to Andrea. So Andrea is the result of the collective image processing.
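The numbers of this example can be reproduced with the same conditioning machinery (a sketch; the max-min rule used below to pick the winner is a shortcut standing in for the full IB loop, which gives the same result here):

```python
from itertools import combinations

# Four networks with undifferentiated a-priori reliability 0.9, and the
# Nogoods derived from their outputs (E: A/F, N: A/P, M: L/P, H: L/A)
R = dict.fromkeys("ENMH", 0.9)
nogoods = [{"N", "M", "H"}, {"E", "M"}]
goods = {"Andrea": {"E", "N", "H"}, "Paolo": {"N", "M"}, "Lucia": {"M", "H"}}

def prob(phi):
    """Probability that exactly the networks in phi are reliable."""
    p = 1.0
    for s in R:
        p *= R[s] if s in phi else 1.0 - R[s]
    return p

subsets = [set(c) for k in range(len(R) + 1) for c in combinations(R, k)]
bad = lambda phi: any(ng <= phi for ng in nogoods)
norm = 1.0 - sum(prob(phi) for phi in subsets if bad(phi))

# A-posteriori reliability of each network after Bayesian conditioning
NR = {s: sum(prob(phi) for phi in subsets if s in phi and not bad(phi)) / norm
      for s in R}

# {E,N,H} is the only Good containing both top-ranked networks N and H
winner = max(goods, key=lambda name: min(NR[s] for s in goods[name]))
print({s: round(v, 4) for s, v in NR.items()}, "->", winner)
# NR values agree with the text's 0.7684, 0.8375, 0.1459, 0.8375 up to
# rounding, and the winner is Andrea
```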
Fig. 2. Schematic representation of the Face Recognition System (FRS)
Figure 2 shows a schematic representation of this Face Recognition System (FRS), which is able to recognize the most probable individual even in the presence of serious conflicts among the outputs of the various nets.
4 A Never-Ending Learning Phase

Back to the example of Section 3, let's suppose that the network M is not able to recognize Andrea from his mouth. There can be two reasons for the fault of M: either the task of recognizing any mouth is objectively harder, or Andrea could have recently changed the shape of his mouth (perhaps because of the growth of a goatee or moustache). The second case is interesting because it shows how our FRS could be useful for coping with dynamic changes in the features of the subjects. In such a dynamic environment, where the input pattern partially changes, some neural networks could no longer be able to recognize it. However, if the changes are minimal, we expect that most of the networks will still correctly recognize the face. So, we force each faulting network to re-train itself on the basis of the recognition made by the overall group. On the basis of the a-posteriori reliability and of the Goods, our idea is to automatically re-train the networks that did not agree with the others. The networks that do not support the most credible Good are forced to re-train themselves in order to "correctly" (according to the opinion of the group) recognize the changed face. Each iteration of the cycle applies Bayesian conditioning to the a-priori "degrees of reliability", producing an a-posteriori vector of reliability. To take into account the history of the responses that came from each network, we maintain an "average vector of reliability" over those produced at each recognition, always starting from the a-priori degrees of reliability. This average vector is given as input to the two algorithms, IBW and WA, instead of the a-posteriori vector of reliability produced in the current recognition.
In other words, the difference with respect to the BR mechanism described in section 2 is that we do not give an a-posteriori vector of reliability to the two algorithms (IBW and WA), but the average vector of reliability calculated since the FRS started to work with that set of subjects to recognize.
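Such a running average can be kept incrementally; the class name and the example vectors below are illustrative, not from the paper:

```python
# Hypothetical running average of the a-posteriori reliability vectors
class ReliabilityHistory:
    def __init__(self, names):
        self.names = names
        self.count = 0
        self.avg = {n: 0.0 for n in names}

    def update(self, posterior):
        """Incorporate the a-posteriori vector of the current recognition."""
        self.count += 1
        for n in self.names:
            # incremental mean: avg += (x - avg) / count
            self.avg[n] += (posterior[n] - self.avg[n]) / self.count
        return self.avg

h = ReliabilityHistory(["E", "N", "M", "H"])
h.update({"E": 0.7684, "N": 0.8375, "M": 0.1459, "H": 0.8375})
avg = h.update({"E": 0.9, "N": 0.9, "M": 0.9, "H": 0.9})
print(avg["M"])  # mean of 0.1459 and 0.9
```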
An Hybrid System for Continuous Learning
301
With this feedback, our FRS performs a continuous learning phase adapting itself to partial continuous changes of the individuals in the population to be recognized.
Fig. 3. Schematic representation of the re-learning of the system when the input is partially changed
Figure 3 shows the behaviour of the system when the testing image partially changes. Now the subject has a moustache and goatee, so the OM network (specialized to recognize the mouth) is no longer able to correctly indicate the tested subject. Since all the others still recognize Andrea, OM will be retrained with the mouth of Andrea as its new input pattern.
5 Partial Experimental Results

This section shows only partial results: those obtained without the feedback discussed in the previous section. We compare two groups of neural networks: the first consisting of four networks and the second of five (the additional network is obtained by separating the eyes into two separate networks). All the networks are Learning Vector Quantization networks, LVQ 2.1 [9], a variation of Kohonen's LVQ [10], each one specialized to respond to an individual template of the face. The learning rate used is:

α(t) = η e^(−βt) .   (4)

where α(t) decreases monotonically with the number of iterations t (η = 0.25 and β = 0.001, values obtained after a series of tests to optimize the networks). The training set is composed of 20 subjects (taken from the FERET database [11]); for each one, 4 pictures were taken, for a total of 80. Networks were trained, during the learning phase, with three different numbers of epochs: 3000, 4000 and 5000. To find Goods and Nogoods from the network responses we use two methods:
1) Static method: the cardinality of the response provided by each net is fixed a priori. We choose values from 1 to 5, 1 meaning the most probable individual and 5 the five most probable subjects.
2) Dynamic method: the cardinality of the response provided by each net changes dynamically according to the minimum number of "desired" Goods to be searched among. In other words, we set the number of desired Goods and reduce the cardinality of the response (from 5 down to 1) until we eventually reach that number (of course, if all the nets agree on their first name there will be only one Good).
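The decay of Eq. (4) with the authors' values can be checked in a couple of lines:

```python
import math

eta, beta = 0.25, 0.001  # values found by the authors' tuning tests

def alpha(t):
    """Learning rate of Eq. (4): decreases monotonically with iteration t."""
    return eta * math.exp(-beta * t)

print(alpha(0), alpha(3000), alpha(5000))
# starts at 0.25 and decays exponentially over the training epochs
```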
In the next step we applied Bayesian conditioning [12] to the Nogoods obtained with the two previous techniques, obtaining an a-posteriori vector of reliability. These new "degrees of reliability" are used for choosing the most credible Good (i.e., the name of the subject). We use two algorithms to perform this task: 1) Inclusion based weighted (IBW); 2) Weighted algorithm (WA).
Fig. 4. Rate of correct recognition obtained by two algorithms: IBW and WA. Calculated both with 4 and 5 neural networks at different epochs of training (3000, 4000 and 5000)
To test our work, we took 488 different images of the 20 subjects. Figure 4 reports the rate of correct recognition for this test. It shows that WA is better than IBW and that the best solution for WA is achieved with five neural networks and 5000 epochs in both methods (Static and Dynamic). Figure 5 shows the average values of correct recognition of WA with 5000 epochs obtained by the two methods. These results show that the combination of the Dynamic method with WA and five neural networks gives the best solution, reaching a 78.20% correct recognition rate.
Fig. 5. Average rate of correct recognition with four and five neural networks for Weighted Algorithm and 5000 epochs in both cases Static and Dynamic
6 Conclusion and Future Work

Our hybrid method integrates multiple neural networks with a symbolic approach to Belief Revision to deal with pattern recognition problems that:
1) require the cooperation of multiple neural networks specialized on different topics;
2) involve individuals that dynamically change some of their features, so that some nets occasionally fail.

We tested this hybrid method with a face recognition problem, training each net on a specific region of the face: eyes, nose, mouth, and hair. Every output unit is associated with one of the persons to be recognized. Each net gives the same number of outputs. We consider a constrained environment in which the image of the face is always frontal, with lighting conditions, scaling and rotation of the face being the same. We arranged the test so that the changes of the faces are partial; for example, the mouth and hair do not change simultaneously. The system assigns a reliability factor to each neural network, which is recalculated on the basis of the conflicts that occur among them. The new "degrees of reliability" are obtained through Bayesian Conditioning and can be used to select the most likely subject. The networks that do not agree with the choice made by the overall group are forced to re-train themselves on the basis of the global output. So, the overall system is engaged in a never-ending loop of testing and re-training that makes it able to cope with dynamic partial changes in the features of the subjects.
References

1. Shields, M.W., Casey, M.C.: A theoretical framework for multiple neural network systems. Neurocomputing 71, 1462–1476 (2008)
2. Li, Y., Zhang, D.: Modular neural networks and their applications in biometrics. Trends in Neural Computation 35, 337–365 (2007)
3. Benferhat, S., Cayrol, C., Dubois, D., Lang, J., Prade, H.: Inconsistency management and prioritized syntax-based entailment. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI 1993), pp. 640–645 (1993)
4. Gärdenfors, P.: Belief Revision. Cambridge Tracts in Theoretical Computer Science, vol. 29 (December 2003)
5. Tolba, A.S., El-Baz, A.H., El-Harby, A.A.: Face recognition: a literature review. International Journal of Signal Processing 2, 88–103 (2006)
6. Zhao, W., Chellappa, R., Phillips, P.J., Rosenfeld, A.: Face recognition: a literature survey. ACM Computing Surveys 35(4), 399–458 (2003)
7. Tan, X., Chen, S., Zhou, Z.H., Zhang, F.: Face recognition from a single image per person: a survey. Pattern Recognition 39, 1725–1745 (2006)
8. Brunelli, R., Poggio, T.: Face recognition: features versus templates. IEEE Transactions on Pattern Analysis and Machine Intelligence 15(10), 1042–1052 (1993)
9. Kohonen, T., Hynninen, J., Kangas, J., Laaksonen, J., Torkkola, K.: LVQ_PAK: The Learning Vector Quantization Program Package (1995)
10. Kohonen, T.: Learning vector quantization. In: Self-Organizing Maps, 3rd edn. Springer Series in Information Sciences. Springer, Heidelberg (1995)
11. Phillips, P.J., Wechsler, H., Huang, J., Rauss, P.: The FERET Database and Evaluation Procedure for Face-Recognition Algorithms. Image and Vision Computing 16(5), 295–306 (1998)
12. Dragoni, A.F.: Belief revision: from theory to practice. The Knowledge Engineering Review, 147–179 (2001)
Support Vector Regression Algorithms in the Forecasting of Daily Maximums of Tropospheric Ozone Concentration in Madrid

E.G. Ortiz-García, S. Salcedo-Sanz, A.M. Pérez-Bellido, J. Gascón-Moreno, and A. Portilla-Figueras
Universidad de Alcalá, Madrid, Spain
Abstract. In this paper we present the application of a support vector regression algorithm to a real problem of maximum daily tropospheric ozone forecast. The support vector regression approach proposed is hybridized with a heuristic for the optimal selection of hyper-parameters. The prediction of maximum daily ozone is carried out in all the stations of the air quality monitoring network of Madrid. In the paper we analyze how the ozone prediction depends on meteorological variables such as solar radiation and temperature, and we also perform a comparison against the results obtained using a multi-layer perceptron neural network in the same prediction problem.
1 Introduction
Ozone (O3) is one of the most relevant air pollutants in urban areas of all medium and large cities of the world [1,2]. It is well known that ozone is a secondary pollutant, since it is not directly emitted into the air. On the contrary, tropospheric ozone is produced when the primary pollutants, mainly nitrogen oxides (NOx) and Volatile Organic Compounds (VOC), interact under the action of sunlight. In addition, O3 is recognized as one of the key pollutants degrading the air quality in urban areas, and it is responsible for increases in mortality rates during episodes of high concentration, mainly in summer. The study of O3 concentrations, and especially of O3 maxima, is therefore of major interest. Several works on O3 modeling and forecasting can be found in the literature [3,4]; many of them tackle the problem of modeling or forecasting the complete concentration of O3 in a column, or the distribution of the pollutant in a study area. There are also specific works on ground O3 forecasting from air quality stations in different cities of the world [5,6]. Recently the computation paradigm of Support Vector Machines (SVMs) has gained importance in forecasting problems related to the environment [7]. Specifically, Support Vector Regression algorithms (SVMrs) – SVMs specifically developed for regression problems – are appealing algorithms for a large variety of regression problems [8], since they take into account not only the approximation error to the data, but also the generalization of the model, i.e., its capability to maintain a good performance when new data are evaluated

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 304–311, 2010. © Springer-Verlag Berlin Heidelberg 2010
by it. Several previous works have applied the SVM or SVMr methodology to O3 forecasting or related problems. In [7] the SVM approach is applied to the forecasting of O3 at ground level in Hong Kong. The authors propose an interesting modification of the standard SVM for classification problems in order to be able to tackle regression problems with it. In [6] an online forecasting system for pollutants based on SVMs is presented. The experimental test of the approach is also carried out in Hong Kong and surrounding areas. In [9] the prediction of the retention time of VOC at ground level is carried out with a SVM. The performance of the SVM algorithm is compared with that of a heuristic algorithm for the same purpose. In [10] the performance of a SVM algorithm is tested in the forecasting of different atmospheric pollutants, including O3, and in [11] the SVM is mixed with wavelets for improving the performance of the SVM approach in a problem of meteorological pollutants forecasting. In this paper we present the application of a SVMr algorithm to the forecasting of O3 daily maximums from data of the Madrid air quality network. We use a SVMr algorithm which incorporates a mechanism based on bounds to better estimate the corresponding hyper-parameters of the SVMr machine [12]. We study the possibility of using meteorological variables to improve the forecasting of the O3 concentrations and include a comparison with the results obtained by a multi-layer perceptron. The structure of the rest of the paper is the following: the next section presents the description of the ε-SVMr approach and the criterion to choose the SVMr hyper-parameters. Next we describe the Madrid air quality network. Section 4 presents the experimental part of the paper, where we provide the main results obtained with the SVMr. Section 5 closes the paper giving some final conclusions.
2 SVMr Formulation and Parameters Search Space Reductions
Although there are several versions of SVMr, in this case we use the classic model presented in [13], i.e., the ε-SVMr. This method for regression consists of, given a set of training vectors S = {(x_i, y_i), i = 1, ..., l}, training a model of the form y(x) = f(x) + b = w^T φ(x) + b. Basically, the training of this model is carried out by solving the following optimization problem, which comes from the dual formulation of a quadratic problem (see [13] for details):

max ( − (1/2) Σ_{i,j=1}^{l} (α_i − α_i*)(α_j − α_j*) K(x_i, x_j) − ε Σ_{i=1}^{l} (α_i + α_i*) + Σ_{i=1}^{l} y_i (α_i − α_i*) )   (1)

subject to

Σ_{i=1}^{l} (α_i − α_i*) = 0   (2)
α_i, α_i* ∈ [0, C]   (3)
The final form of the regression function f(x) depends on the variables α_i, α_i*, as follows:

f(x) = Σ_{i=1}^{l} (α_i − α_i*) k(x_i, x)   (4)
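As an illustration of the model form, Eq. (4) can be evaluated directly once the dual variables are known; in the sketch below the coefficients (α_i − α_i*) are random stand-ins satisfying the equality constraint (2), not the output of an actual QP solver:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical training inputs and dual coefficients (alpha_i - alpha_i*)
X = rng.normal(size=(10, 3))
C = 1.0
coef = rng.uniform(-0.5, 0.5, size=10)
coef -= coef.mean()        # equality constraint (2): the coefficients sum to 0
b = 0.5                    # bias term of the model y(x) = f(x) + b

def k(xi, x, gamma=0.5):
    """Gaussian kernel; gamma controls the width of the kernel function."""
    return np.exp(-gamma * np.sum((xi - x) ** 2))

def predict(x):
    """Regression function of Eq. (4), plus the bias b."""
    return sum(coef[i] * k(X[i], x) for i in range(len(X))) + b

print(predict(np.zeros(3)))
```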
In this way it is possible to obtain a SVMr model by means of the training of a quadratic problem for given hyper-parameters C and ε, and for the kernel parameter γ, which controls the width of the Gaussian function that we select as kernel function. To obtain the optimal hyper-parameters we use the grid search algorithm with search space reduction described in [12], because it allows us to obtain very good results with a small training time. In that paper a novel methodology which obtains a good balance between training time and accuracy is introduced. This methodology is based on a classical grid search, which divides the search space into a uniform distribution of points over the whole space. Then, it evaluates the validation accuracy of the model trained by using the hyper-parameters at each point, and finally the model with the smallest validation error is chosen as the final one. The most important characteristic of this novel algorithm is the addition of hyper-parameter search space reductions. These reductions enclose the search space in a smaller subspace where the grid search is carried out. The search space reductions proposed in [12] are described by the following equations:

C ≤ (y_i^max − b − ε) / (1 − (1/(l−1)) Σ_{j=1, j≠i}^{l} K(x_j, x_i))   (5)

γ ≤ − log_e(0.001) / ((1/l) Σ_{i=1}^{l} min_{j, j≠i} d(x_j, x_i))^2   (6)

ε < σ_y   (7)
Equation (5) describes the relationship between the regularization hyper-parameter C and the rest of the hyper-parameters. The relationship of parameter C with parameter γ is especially important because it generates the most important reduction in the search space. The other bounds (Equations (6) and (7)) are related to the characteristic of minimum influence between support vectors and to the close relationship between the hyper-parameter ε and the variance of the noise in the data. After applying these reductions, a grid search algorithm is used in the experimental part of the paper to find the hyper-parameters of the SVMr in the O3 prediction problem.
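The bounds (5)-(7) can be evaluated on a toy dataset before running the grid search; the data below, the stand-in bias b and the chosen ε are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data standing in for the daily ozone maxima
X = rng.normal(size=(40, 4))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)
l = len(y)

# Eq. (7): epsilon must stay below the standard deviation of the targets
eps_bound = y.std()
eps = 0.1 * eps_bound          # a chosen epsilon respecting the bound

# Eq. (6): gamma bound from the mean minimum distance between samples
d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
np.fill_diagonal(d, np.inf)
mean_min_dist = d.min(axis=1).mean()
gamma_bound = -np.log(0.001) / mean_min_dist ** 2

# Eq. (5): bound on C, evaluated at the sample with the largest target,
# with the mean target as a rough stand-in for the bias b
i = int(np.argmax(y))
b = y.mean()
k_sum = sum(np.exp(-gamma_bound * ((X[j] - X[i]) ** 2).sum())
            for j in range(l) if j != i)
C_bound = (y[i] - b - eps) / (1 - k_sum / (l - 1))

print(f"eps < {eps_bound:.3f}, gamma <= {gamma_bound:.3f}, C <= {C_bound:.3f}")
```

The grid search is then confined to the box delimited by these three bounds, which is what shrinks the search space in [12].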
3 The Air Pollution Monitoring Network of Madrid
The air pollution monitoring network of Madrid is the largest in Spain, and one of the largest in Europe. It is currently formed by 27 measuring (fixed) stations spread out in the city (see Fig. 1). At the beginning the network was formed by 16
Fig. 1. Location of the measuring stations of the air quality monitoring network of Madrid
measuring stations, connected by the telephone network to a data control center depending on the department of air quality of Madrid City Council. In 1989 the network was completely renewed, and new "intelligent" stations were acquired. At this point the systematic measurement of NOx and O3 started. The monitoring network in its current form was finished in 2001, when the last 2 stations were added to the network and several other stations were moved from their original locations for technical reasons. The network can be considered a routine network [14], because it has been designed to develop long-term studies of contaminant concentrations. In fact, the available database is formed by hourly measures of several contaminants for the years 2002 to 2007 in all measuring stations. On the other hand, the network provides meteorological variables in this period of time by means of 6 additional meteorological stations, which are able to measure solar radiation, temperature and other variables.
4 Experiments and Results

4.1 General Description of the Experiments
To carry out a daily prediction of maximum concentration, it is necessary to apply a maximum function to the hourly measures provided by the network, obtaining 365 ozone concentrations per year. To obtain multiple training and test sets for the experiments, and to be able to compare the accuracy of the models by using statistical tests, we divide each year into five subsets. Each experiment is formed by two consecutive subsets, using the first one as training set and the second one as test set. In this way, we keep the temporal relationship between data, except in the case of the last subset, which is tested against the first one. With all these subsets we obtain a total of 30 experiments, which allows us to perform statistical tests to compare them. These statistical tests consist of a t-test with
α = 0.05 significance level, after a Kolmogorov-Smirnov normality test (positive in all of the experiments carried out). On the other hand, to reduce the training time of the experiments we use only 5 out of the 27 available measuring stations. We choose the measuring stations which present the highest maximums of ozone concentration over the six studied years. This is because we expect this characteristic to remain in future years, making those stations more important to forecast than others.

4.2 Analysis of Dependency with Solar Radiation and Temperature
Now we compare the accuracy of the models trained by using four previous ozone measures (from the four previous days) and some meteorological variables, in our case solar radiation and temperature. Note that alternative analyses not included in this paper have shown that other meteorological variables, such as wind direction, wind speed, etc., are not statistically related to daily maximum ozone prediction. The mean and standard deviation of the accuracy for the 30 experiments are shown in Table 1, and the statistical tests carried out are displayed in Table 2. These results show that solar radiation improves the results statistically in 2 out of the 5 stations (stations 10 and 24), and yields a good number of winner experiments
Table 1. Mean and standard deviation of the SVMr accuracy for the 30 experiments, considering ozone measures from four previous days without meteorological variables, and also including solar radiation, temperature or both
Station | None (Mean, Std) | Solar radiation (Mean, Std) | Temperature (Mean, Std) | Both (Mean, Std)
5       | 17.56  4.80      | 17.53  4.59                 | 17.82  4.19             | 17.68  4.22
9       | 15.69  4.06      | 15.61  4.18                 | 15.78  4.17             | 15.83  4.13
10      | 17.38  4.91      | 17.13  4.83                 | 17.39  4.50             | 17.13  4.49
14      | 16.84  3.72      | 16.53  3.11                 | 17.01  4.11             | 16.87  3.88
24      | 17.29  4.01      | 17.00  3.79                 | 17.23  3.66             | 17.04  3.74
Table 2. Statistical tests for the 30 experiments considering ozone measures from four previous days. We compare the case of not including meteorological variables with the case of including solar radiation, temperature or both. W-L-T stands for Win-Lost-Tie results in the comparison.

Station | Solar radiation (P-value, W-L-T) | Temperature (P-value, W-L-T) | Both (P-value, W-L-T)
5       | 0.80*  15-15-0                   | 0.26*  15-15-0               | 0.69*  19-11-0
9       | 0.65*  16-14-0                   | 0.62*  15-15-0               | 0.53*  16-14-0
10      | 0.04*  21-9-0                    | 0.97*  17-13-0               | 0.09*  19-11-0
14      | 0.22*  17-13-0                   | 0.56*  15-15-0               | 0.92*  17-13-0
24      | 0.02*  18-12-0                   | 0.71*  14-16-0               | 0.06*  18-12-0
* t-test α = 0.05.
in station 14. However, by using temperature or both of them, it is only possible to obtain results similar to the standard case. Therefore, following these results, the optimal selection of features in this case includes solar radiation, apart from the ozone samples of the four previous days. A graphic example of the prediction obtained with the model trained using these features, together with the real ozone concentration values, is shown in Fig. 2. In this figure we show the prediction for the six years studied, 2002 to 2007. Note that the values correspond to the prediction of each input vector in the test sets used in the statistical tests. It is possible to see how the trends of the prediction and of the real data are very similar.

[Six panels, one per year from 2002 to 2007; each plots O3(max) over the year, in the range 0–150.]
Fig. 2. Comparison of the forecast and measured maximum daily ozone concentrations in different years (Station 9)
4.3 Comparison between SVMr Model and MLP
Finally, we compare the performance of the presented SVMr model with a neural network based on a multi-layer perceptron (MLP). To train the MLP we use a variable number of neurons in the hidden layer, from 6 to 20; it is trained using a Levenberg-Marquardt algorithm, and the training is repeated 20 times. In addition, a hold-out validation process is used in order to control the generalization of the model and to choose the best MLP among all repetitions. The mean and standard deviation of the RMSE (root mean square error) of the MLP model and of the best SVMr found in the previous subsection, together with the statistical tests, are shown in Table 3. These results clearly show that the SVMr model obtains better performance than the MLP, with better mean accuracy and statistical differences in all the evaluated stations.
E.G. Ortiz-García et al.
Table 3. Mean and standard deviation of the accuracy for the 30 experiments considering ozone measures from four previous days and solar radiation, using a multi-layer perceptron or the SVMr model

          MLP            SVMr           SVMr vs MLP
Station   Mean   Std     Mean   Std     t-test   W-L-T
5         34.60  14.75   17.53  4.59    0.00*    29-1-0
9         32.90  16.12   15.61  4.18    0.00*    29-1-0
10        34.99  15.97   17.13  4.83    0.00*    28-2-0
14        31.58  14.13   16.53  3.11    0.00*    28-2-0
24        33.28  15.26   17.00  3.79    0.00*    29-1-0
* t-test α = 0.05.

5 Conclusions
In this paper we have presented a complete study of the application of support vector regression (SVMr) algorithms to daily maximum ozone prediction in the Madrid urban area. Comparison with the results of a multi-layer perceptron has shown the good performance of the approach. The results obtained are promising and show that this methodology can be useful for this type of problem. The application of SVMr to other ozone prediction problems with different time horizons, such as hourly or long-term prediction, is an interesting line of future research.
Acknowledgement. This work is partially supported by Comunidad de Madrid and Universidad de Alcalá through Project CCG08-UAH/AMB-3993. E.G. Ortiz-García is supported by the FPU Predoctoral Program (Spanish Ministry of Innovation and Science), grant reference AP2008-00248. Á.M. Pérez-Bellido is supported by an FPI fellowship from Junta de Comunidades de Castilla-La Mancha.
Neuronal Implementation of Predictive Controllers

José Manuel López-Guede*, Ekaitz Zulueta, and Borja Fernández-Gauna

Computational Intelligence Group, UPV/EHU
{jm.lopez,ekaitz.zulueta,manuel.grana,alicia.danjou}@ehu.es
www.ehu.es/ccwintco
Abstract. In spite of the multiple advantages that Model Predictive Control offers (for example, it can control systems that classical control schemes cannot), it has a main drawback: it is computationally expensive in its working phase. In this paper we address the problem of implementing predictive controllers so that their operations are performed efficiently, using a neuronal implementation. We show how we have trained these neural networks, and how we exploit their generalization property and their robustness in the presence of control and measurement disturbances.
1 Introduction

In this paper we present a neuronal implementation of Model Predictive Controllers that overcomes the drawbacks of the classical analytical implementation of these controllers. Model Predictive Control is an interesting advanced control scheme because classic and well-known control schemes such as PID controllers may fail to control some systems; in this situation, we can use advanced control systems that try to emulate the human brain, such as Predictive Control. This kind of control works using a world model, calculating predictions about the response the system will show under given stimuli, and obtaining the best way of controlling the system given the desired behavior from the current moment until a certain instant later. Predictive controller tuning is a process carried out with analytical and manual methods. This tuning process is expensive in computational terms, but it is done only once, and in this paper we do not deal with that problem. However, in spite of the great advantage of predictive control, namely that it can control systems that classic control cannot, it has a great drawback: it is very computationally expensive while it is working. In Section 2 we review the cause of this problem. A way of avoiding this drawback is to model the predictive controller using neural networks, because once these devices are trained they perform the calculations at great speed and with very small computational requirements. In this paper we propose a learning model to be used with Time Delayed Neural Networks, so that once the neural network is trained, the neuronal predictive controller is ready and responds properly, showing its generalization capabilities in environments that it has not seen in the training phase,
* This work was supported in part by the Spanish Ministerio de Educación y Ciencia under grant DPI2006-15346-C03-03.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 312–319, 2010. © Springer-Verlag Berlin Heidelberg 2010
and it shows its robustness when there are control and measurement disturbances. On the other hand, it is a powerful implementation because it is not as computationally expensive as classical analytical predictive control. In the literature there are works comparing PID and MPC controllers [1], and there are interesting approaches to the prediction capacity of neuronal models in predictive control [2-4]. The stability of these neural networks is an important issue [5], so we also analyze their robustness in this paper. Section 2 gives background information about Predictive Control and a technique called Dynamic Matrix Control, discussing its advantages and drawbacks. Section 3 describes a kind of neural networks called Time Delayed Neural Networks and their structural parameters. Section 4 shows how a neuronal implementation can be found, and finally, conclusions are given in Section 5.
2 Model Predictive Control

This section gives a brief introduction to a general technique called Model Predictive Control and to a concrete technique called Dynamic Matrix Control. It is necessary to understand the advantages of this kind of control, which make it very useful in some circumstances, as well as its drawbacks, in order to understand how a neural network based implementation can eliminate these drawbacks.

2.1 Model Predictive Control (MPC) and Dynamic Matrix Control (DMC)

Model Predictive Control (MPC) is an advanced control technique used to deal with systems that are not controllable using classic control schemes such as PID. This kind of controller works like the human brain in the sense that, instead of using the past error between the output of the system and the desired value, it controls the system by predicting the value of the output in the short term, so that the system output is as close as possible to its desired value at those moments. Predictive Control is not a single concrete technique; it is a set of techniques that share several common characteristics: there is a world model used to predict the system output from the current moment up to p samples ahead, an objective function that must be minimized, and a control law that minimizes the objective function. Predictive controllers follow these steps:

• Each sampling time, through the system model, the controller calculates the system output from now until p sampling times ahead (the prediction horizon), which depends on the future control signals that the controller will generate.
• A set of m control signals to be used along m sampling times (the control horizon) is calculated by optimizing the objective function.
• In each sampling time only the first of the set of m control signals is applied, and in the next sampling time the whole process is repeated again.
The technique called Dynamic Matrix Control (DMC) is a concrete MPC algorithm that uses:

• As subsystem model, the step response of the subsystem;
• As objective function, a measure of the difference between the reference signal and the subsystem output;
• As control law, the one shown in equation (1), where G is a matrix that contains the system dynamics, λ is a parameter related to the following capacity of the subsystem, w is the reference signal and f is the free response of the subsystem.

Δu = (G^t G + λI)^(−1) G^t (w − f) .    (1)
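As an illustration of (1): for a single control move (m = 1), G reduces to a column vector g of step-response samples and the matrix inversion collapses to a scalar division. This is our simplified sketch under that assumption, not the general DMC algorithm, and the function name is ours.

```python
def dmc_control_increment(g, w, f, lam):
    """Equation (1) with a single control move (m = 1): since G is a column
    vector g, (G^t G + lam*I)^(-1) G^t (w - f) = g.(w - f) / (g.g + lam)."""
    num = sum(gi * (wi - fi) for gi, wi, fi in zip(g, w, f))
    den = sum(gi * gi for gi in g) + lam
    return num / den

# Step-response samples of y(k+1) = 0.5*y(k) + u(k) over a p = 5 horizon.
g = [1.0, 1.5, 1.75, 1.875, 1.9375]
du = dmc_control_increment(g, w=[1.0] * 5, f=[0.0] * 5, lam=1.0)
```

With λ > 0 the denominator is always positive, which is what makes the inversion in (1) well defined even for a poorly excited step response.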
To learn more about Predictive Control in general, and about Dynamic Matrix Control in particular, see [6-9].

2.2 Model Predictive Control Advantages

From a theoretical point of view, model predictive based controllers have some advantages:
- It is an open methodology, with the possibility of new algorithms.
- They can include constraints on manipulated variables as well as on controlled variables. This is important to save energy and to keep the working point as near as possible to the optimum.
- They can deal with multivariable systems in a simpler way than other algorithms.
From a practical point of view, model predictive based controllers have the advantage that they can deal with systems that show stability problems with classical control schemes. To show this property, suppose that the model of a system is described by the following discrete transfer function:

H(z) = 1 / (z − 0.5) .    (2)
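Writing (2) as a difference equation, y(k+1) = 0.5·y(k) + u(k), a direct simulation (our own quick check, not part of the original paper) confirms that the pole at 0.5 keeps the open-loop step response bounded:

```python
def simulate(pole, u, y0=0.0):
    """Simulate y(k+1) = pole*y(k) + u(k) for the input sequence u."""
    y, out = y0, []
    for uk in u:
        y = pole * y + uk
        out.append(y)
    return out

step = simulate(0.5, [1.0] * 50)
# The step response converges to 1 / (1 - 0.5) = 2: the plant itself is stable.
```

The instability the authors refer to arises only when this stable plant is put in closed loop with a badly tuned controller, not from the plant itself.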
Although it is a stable system, because its pole is inside the unit circle, its response is unstable if we try to control it using a PID controller tuned through classic and well-known methods such as Ziegler-Nichols. However, using a properly tuned DMC predictive controller, for example with the parameter values p = 5, m = 3 and λ = 1, a correct control is obtained. To obtain this control it has been mandatory to tune the DMC controller. This phase is very expensive in computational terms, but it is carried out only once. To know more about this tuning phase, see [10].

2.3 Model Predictive Control Drawbacks
The main drawback of predictive controllers is not that the tuning phase is computationally expensive, because it is carried out only once. The main drawback is that the computational requirements of the controller are high in its working phase. Each sample time the controller must calculate the control law of equation (1), which involves several matrix operations: several multiplications, an addition and a subtraction. Performing these operations we obtain a set of m control signals, but only the first of them is used in this sample
time; the rest are ignored. The algorithm works in this way, but it is computationally inefficient.
3 Neural Implementation

Following the discussion of the computational inefficiency of analytical predictive control in the previous section, it would be convenient to have a mechanism that could implement such a controller with less computational power. An alternative is to use neural networks, and more precisely Time Delayed Neural Networks, because, like other neural networks, they are very fast, computationally inexpensive, and able to generalize their responses. We do not consider a hierarchical structure [21]. This section gives a brief introduction to a kind of neural networks called Time Delayed Neural Networks, which we have used to model the previous model predictive controller in order to eliminate the drawbacks described above.

3.1 Time Delayed Neural Networks
Time Delayed Neural Networks (TDNN) are a kind of multi-layer perceptron neural network. Their special feature is that they are dynamic neural networks: delayed versions of the input signals are fed to the input layer. Because of this, the outputs depend not only on the current values of the signals but also on their past values. This kind of neural network can be trained using the Backpropagation algorithm or the Generalized Delta Rule; in the experiments shown in this paper the Levenberg-Marquardt method has been used. To learn more about neural networks in general, see [11-13]. To learn more about Time Delayed Neural Networks, see [14-16].

3.2 Structural Parameters
As we are concerned about the computational cost of our implementation of the predictive controller, we limit the number of hidden layers to one, so we work with a time delayed neural network having the simplest structure. Once this constraint is established, the main parameters that configure the structure of the TDNN are the number of neurons in the hidden layer and the size of the time delay line, in other words, the number of delayed versions of the input signals fed to the input layer. We try to keep these parameters as small as possible to minimize the computational cost of the resulting implementation. The last main parameter to establish is the kind of function executed in each neuron, taking into account that the linear function is the least expensive from the computational point of view. Fig. 1 shows the structure of the Time Delayed Neural Network that we have used, in which we have fitted the size d of the time delay line and the size h of the hidden layer.
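The time delay line itself is just a buffer that turns a scalar signal into the vector of its d most recent past values. This sketch builds only the TDNN's input vectors, not the network; the class name is our own.

```python
from collections import deque

class TappedDelayLine:
    """Holds the current sample plus the last d delayed samples of a signal."""
    def __init__(self, d, initial=0.0):
        self.buf = deque([initial] * (d + 1), maxlen=d + 1)

    def push(self, x):
        """Insert the newest sample; the oldest one falls off the end."""
        self.buf.appendleft(x)
        return list(self.buf)          # [x(k), x(k-1), ..., x(k-d)]

tdl = TappedDelayLine(d=2)
tdl.push(1.0)                          # → [1.0, 0.0, 0.0]
vec = tdl.push(2.0)                    # → [2.0, 1.0, 0.0]
```

The returned vector is what would be presented to the input layer at each sampling time, which is why the computational cost grows with d.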
Fig. 1. Time Delayed Neural Network structure with 3 layers: an input layer with 3 inputs fed through a time delay line of size d, a hidden layer with h neurons, and an output layer with 1 output
4 Results

In this section we deal with the concrete problem of obtaining a neuronal predictive controller that can control the system described by the discrete transfer function of equation (2) using Time Delayed Neural Networks. We have carried out training experiments with multiple structures, varying the two main structural parameters, the number of hidden layer neurons h and the number of delays of the time delay line d, keeping in mind that the linear function is computationally efficient. We have used the Levenberg-Marquardt method to train each structure, and the training model consists of a target vector P = [w(k), y(k), Δu(k − 1)]′ and an output Δu(k), so as to obtain the same control as equation (1). As shown in Fig. 2, the control is correct with known references, and in Fig. 3 we can see that the neuronal controller is correct even with noisy references that had not been used in the training phase, thanks to the generalization property of neural networks. Having verified the generalization property, we now explore a further characteristic of this neural predictive controller: its robustness. To this end we have designed experiments where we apply perturbations to the control signal Δu(k) generated by the neural predictive controller and, on the other hand, the same perturbations to the measured output of the system y(k).
The perturbation applied is white noise with zero mean and variance σ² = 10⁻³. Fig. 4 shows the control executed by the neuronal controller when the reference signal is the same as that used in Fig. 3. As we can see, the performance is very close to that of the controller shown in Fig. 3. To learn more about identification and control of dynamical systems, see [17-18], and about neural identification applied to predictive control, see [19-20].
Fig. 2. Control of a system with a Time Delayed Neural Network with a time delay line of d = 7 delays in the input and h = 5 neurons in the hidden layer. The reference to follow is a signal that was used in the neural network's training phase.
Fig. 3. Control of a system with a Time Delayed Neural Network with a time delay line of d = 7 delays in the input and h = 5 neurons in the hidden layer. The reference to follow is a signal that was not used in the neural network's training phase.
Fig. 4. Control executed by the neural predictive controller with control and measurement disturbances
5 Conclusions

This paper started by showing that there are systems that cannot be controlled using classical control schemes such as PID, whereas advanced control schemes such as Predictive Control can control them, and we have shown how a system of this kind is controlled. Predictive Control has some advantages and a very clear drawback: it is computationally expensive in its tuning and working phases. To overcome this drawback in the control phase, the authors propose a neural network based implementation. We show how a concrete predictive controller can be learned with a concrete learning structure, and that its performance is very good even with unknown references and with control and measurement disturbances. It would be interesting to apply these controllers to multicomponent systems [22] and linked systems [23].
References

1. Voicu, M., Lazăr, C., Schönberger, F., Păstrăvanu, O., Ifrim, S.: Predictive Control vs. PID Control of Thermal Treatment Processes. In: Control Engineering Solution: A Practical Approach, pp. 163–174
2. McKinstry, J.L., Edelman, G.M., Krichmar, J.L.: A cerebellar model for predictive motor control tested in a brain-based device. Proceedings of the National Academy of Sciences of the United States of America 103(9), 3387–3392 (2006)
3. Aleksic, M., Luebke, T., Heckenkamp, J., Gawenda, M., et al.: Implementation of an Artificial Neural Network to Predict Shunt Necessity in Carotid Surgery. Annals of Vascular Surgery 22(5), 635–642 (2008)
4. Kang, H.: A neural network based identification-control paradigm via adaptive prediction. In: Proceedings of the 30th IEEE Conference on Decision and Control, vol. 3, pp. 2939–2941 (1991)
5. Wilson, W.H.: Stability of Learning in Classes of Recurrent and Feedforward Networks. In: Proceedings of the Sixth Australian Conference on Neural Networks (ACNN 1995), pp. 142–145 (1995)
6. Camacho, E.F., Bordons, C.: Model Predictive Control. Springer, London (2004)
7. Camacho, E.F., Bordons, C.: Model Predictive Control in the Process Industry. Springer, London (1995)
8. Maciejowski, J.M.: Predictive Control with Constraints. Prentice Hall, London (2002)
9. Sunan, H., Kok, T., Tong, L.: Applied Predictive Control. Springer, London (2002)
10. López-Guede, J.M., Zulueta, E., Graña, M., Oterino, F.: Ajuste Manual de Controladores Predictivos. In: III Jornadas de Inteligencia Computacional, pp. 472–475. Universidad del País Vasco (2009), http://www.ehu.es/ccwintco/index.php/Libros_y_monograf%C3%ADas_editadas
11. Braspenning, P.J., Thuijsman, F., Weijters, A.J.M.M.: Artificial Neural Networks. Springer, Berlin (1995)
12. Chester, M.: Neural Networks. Prentice Hall, New Jersey (1993)
13. Widrow, B., Lehr, M.A.: 30 Years of Adaptive Neural Networks: Perceptron, Madaline, and Backpropagation. Proceedings of the IEEE 78(9), 1415–1441 (1990)
14. Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., Lang, K.: Phoneme Recognition Using Time Delay Neural Networks. IEEE Transactions on Acoustics, Speech and Signal Processing 37, 328–339 (1989)
15. Wang, Y., Kim, S.-P., Principe, J.C.: Comparison of TDNN training algorithms in brain machine interfaces. In: Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2005), vol. 4, pp. 2459–2462 (2005)
16. Taskaya-Temizel, T., Casey, M.C.: Configuration of Neural Networks for the Analysis of Seasonal Time Series. In: Singh, S., Singh, M., Apte, C., Perner, P. (eds.) ICAPR 2005. LNCS, vol. 3686, pp. 297–304. Springer, Heidelberg (2005)
17. Narendra, K.S., Parthasarathy, K.: Identification and Control of Dynamical Systems Using Neural Networks. IEEE Trans. Neural Networks 1(1), 491–513 (1990)
18. Norgaard, M., Ravn, O., Poulsen, N.K., Hansen, L.K.: Neural Networks for Modelling and Control of Dynamic Systems. Springer, London (2003)
19. Arahal, M.R., Berenguel, M., Camacho, E.F.: Neural identification applied to predictive control of a solar plant. Control Engineering Practice 6, 333–344 (1998)
20. Huang, J.Q., Lewis, F.L., Liu, K.: A Neural Net Predictive Control for Telerobots with Time Delay. Journal of Intelligent and Robotic Systems 29, 1–25 (2000)
21. Graña, M., Torrealdea, F.J.: Hierarchically structured systems. European Journal of Operational Research 25, 20–26 (1986)
22. Duro, R.J., Graña, M., de Lope, J.: On the potential contributions of hybrid intelligent approaches to multicomponent robotic system development. Information Sciences (in press, 2010)
23. Echegoyen, Z., Villaverde, I., Moreno, R., Graña, M., d'Anjou, A.: Linked multi-component mobile robots: modeling, simulation and control. Robotics and Autonomous Systems (submitted, 2010)
α-Satisfiability and α-Lock Resolution for a Lattice-Valued Logic LP(X)

Xingxing He1, Yang Xu1, Yingfang Li2, Jun Liu3, Luis Martinez4, and Da Ruan5

1 Intelligent Control Development Center, Southwest Jiaotong University, Chengdu 610031, Sichuan, PR China
2 Department of Mathematics, Southwest Jiaotong University, Chengdu 610031, Sichuan, PR China
3 School of Computing and Mathematics, University of Ulster, Northern Ireland, UK
4 Department of Computing, University of Jaén, E-23071 Jaén, Spain
5 Belgian Nuclear Research Centre (SCK•CEN), Mol, and Ghent University, Belgium
[email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Abstract. This paper focuses on some automated reasoning issues for a kind of lattice-valued logic LP(X) based on lattice-valued algebra. Firstly, some strategies extended from classical logic to LP(X) are investigated in order to verify the α-satisfiability of formulae in LP(X), with the main focus on the role played by constant formulae in LP(X) in simplifying the verification procedure at the semantic level. Then, an α-lock resolution method in LP(X) is proposed and the weak completeness of this method is proved. The work will provide support for more efficient resolution based automated reasoning in LP(X).

Keywords: lattice-valued logic; α-resolution principle; α-satisfiability; α-lock resolution method.
1 Introduction
Automated theorem proving [1,2,3,4] is an important research topic in the realm of artificial intelligence. Theorem proving is a procedure that can be used to check whether a given logical formula F (the "goal") is a logical consequence of a set of formulae N (the "theory"); the equivalent treatment is to validate the unsatisfiability of the set N ∪ {¬F}. The resolution principle in classical logic, proposed by Robinson [1], is a single rule of inference for testing the unsatisfiability of a logical formula; it proceeds by constructing refutation proofs, i.e., proofs by contradiction. A resolution algorithm in classical logic has three obvious features: (1) the formulae are basically sets of clauses, each of which is a disjunction of literals. The forms of the literals are simple because they usually contain neither constants nor implication connectives; (2) a resolution algorithm is constructed to prove the unsatisfiability of a logical formula, i.e., only one level of truth-value status is considered (false, denoted as O), called O-resolution; and (3) to judge if two
literals form an O-resolvent can be simplified into judging whether the two literals are a complementary pair. Furthermore, in order to avoid the production of excessive redundant clauses during the resolution process, some restrictions should be imposed on resolution, while preserving the completeness of the resolution principle as far as possible. In this sense, semantic resolution, lock resolution and linear resolution based on classical logic have been proposed [2,3,4]. Incomparability is a kind of uncertainty often associated with human intelligent activities, not only in the processed object itself, but also in the course of the object being dealt with. It is a kind of overall uncertainty of objects, caused by the complexity of the objects themselves, by the many factors involved, and by the inconsistency among those factors, and it occurs inevitably in the process of dealing with complex objects. In order to deal with uncertain information, especially incomparability, in intelligent computation from the logical point of view, lattice implication algebra [5,6] (LIA), lattice-valued logic [6] based on LIA, and approximate reasoning [6] (i.e., uncertainty reasoning and automated reasoning) in lattice-valued logic based on LIA were proposed and studied. On the automated reasoning side, the α-resolution principle was deeply investigated in [7,8,9,10]. Lattice-valued propositional logic LP(X) [6] based on LIA extends classical logic in many ways, such as the valuation field, the implication connective and the language. The valuation field extends the set {0, 1} to a lattice L, and the implication connective is not Kleene's but a more general operator in lattice-valued logic. Regarding the language, the symbols in LP(X) include a set of constants, unlike in classical logic. These extensions improve the capability of expression and transmission.
Moreover, a logical formula in LP(X) can be expressed at different truth-value levels, so it can describe and process information more naturally from the logical point of view. However, logical formulae that include constants make the system more complex. For example, the propositional variables in classical logic can take values in {0, 1} freely, but the valuations of logical formulae in LP(X) may not range over the whole valuation field L. Therefore, when we discuss α-resolution and some of its properties in LP(X), many conclusions from classical logic may not hold. In this paper, after a brief overview of lattice-valued logic based on LIA in Section 2, logical formulae that include constants are discussed in Section 3: logical formulae that are comparable with the resolution level α can be deleted or left out of consideration before α-resolution, which simplifies the structure of the generalized literals and makes it easier to extend the resolution methods of classical logic to LP(X). Furthermore, some rules of classical logic are extended to LP(X) to verify the α-satisfiability of formulae in LP(X) in Section 4. Finally, in order to improve the efficiency of α-resolution in lattice-valued logic, an α-lock resolution method based on LP(X) is proposed and its weak completeness is proved in Section 5. Section 6 concludes the paper.
2 Preliminaries
Among the extensive research results on LIA and the corresponding lattice-valued propositional logic LP(X) [6] and lattice-valued first-order logic LF(X) [6], we only outline the elementary concepts closely relevant to this work, for the convenience of readers. For further details about the background and properties of LIA and LP(X), we refer to the related references, e.g., [6,7,8,9].

Definition 1. [6] Let (L, ∨, ∧, 0, 1) be a bounded lattice with an order-reversing involution ′, with 1 and 0 the greatest and the smallest element of L, respectively, and let →: L × L → L be a mapping. (L, ∨, ∧, ′, →, 0, 1) is called a lattice implication algebra if the following conditions hold for any x, y, z ∈ L:

(I1) x → (y → z) = y → (x → z) (exchange property),
(I2) x → x = 1 (identity),
(I3) x → y = y′ → x′ (contraposition),
(I4) x → y = y → x = 1 implies x = y,
(I5) (x → y) → y = (y → x) → x,
(I6) (x ∨ y) → z = (x → z) ∧ (y → z),
(I7) (x ∧ y) → z = (x → z) ∨ (y → z).
Definition 2. [6] (Łukasiewicz implication algebra on finite chains) Consider the set Ln = {ai | i = 1, 2, . . . , n}. For any 1 ≤ j, k ≤ n, define

aj ∨ ak = amax{j,k} ,   aj ∧ ak = amin{j,k} ,
(aj)′ = an−j+1 ,   aj → ak = amin{n−j+k,n} ;

then (Ln, ∨, ∧, ′, →, 0, 1) is a lattice implication algebra. In the following text, we always assume that (L, ∨, ∧, ′, →, 0, 1) is an LIA, in short L.

Definition 3. [6] Let X be a set of propositional variables, and let T = L ∪ {′, →} be a type with ar(′) = 1, ar(→) = 2 and ar(a) = 0 for every a ∈ L. The propositional algebra of the lattice-valued propositional calculus on the set X of propositional variables is the free T algebra on X, denoted by LP(X).

Proposition 1. [6] LP(X) is the minimal set Y which satisfies:
(1) X ∪ L ⊆ Y.
(2) If p, q ∈ Y, then p′, p → q ∈ Y.
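Definition 2 is concrete enough to execute: encoding a_i simply as the integer i, the chain operations become index arithmetic, and axioms such as (I1)-(I3) can be checked exhaustively. The encoding and function names below are our own illustrative sketch, not the authors' notation.

```python
from itertools import product

def make_Ln(n):
    """Lukasiewicz implication algebra on the chain a_1 < ... < a_n
    (Definition 2), with a_i encoded as the integer i."""
    join = max                              # a_j v a_k = a_max{j,k}
    meet = min                              # a_j ^ a_k = a_min{j,k}
    neg = lambda j: n - j + 1               # (a_j)' = a_{n-j+1}
    imp = lambda j, k: min(n - j + k, n)    # a_j -> a_k = a_min{n-j+k, n}
    return join, meet, neg, imp

n = 5
join, meet, neg, imp = make_Ln(n)
for x, y, z in product(range(1, n + 1), repeat=3):
    assert imp(x, imp(y, z)) == imp(y, imp(x, z))     # (I1) exchange
    assert imp(x, y) == imp(neg(y), neg(x))           # (I3) contraposition
assert all(imp(x, x) == n for x in range(1, n + 1))   # (I2) identity: top is a_n
```

Exhaustive checks like these only work because Ln is finite, which is exactly what makes the finite-chain case amenable to mechanized reasoning.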
Note that L and LP(X) are algebras of the same type T, where T = L ∪ {′, →}.

Definition 4. [6] A valuation of LP(X) is a propositional algebra homomorphism γ : LP(X) → L. If γ is a valuation of LP(X), we have γ(α) = α for every α ∈ L.

Remark 1. Specially, when L = Ln, LP(X) is denoted by LnP(X).

Definition 5. [6] Let p ∈ LP(X), α ∈ L. If there exists a valuation γ of LP(X) such that γ(p) ≥ α, p is satisfiable by the truth-value level α, in short α-satisfiable; if γ(p) ≥ α for every valuation γ of LP(X), p is valid by the truth-value level α, in short α-valid. If α = I, then p is simply valid.

Definition 6. [6] Let p ∈ LP(X), α ∈ L. If γ(p) ≤ α for every valuation γ of LP(X), p is always false by the truth-value level α, in short α-false. If α = O, then p is invalid.

In the following, for convenience, F ∈ LP(X) stands for: F is a logical formula in the lattice-valued propositional system LP(X) based on an LIA.

Definition 7. [6] A lattice-valued propositional logical formula F is called an extremely simple form, in short ESF, if a lattice-valued propositional logical formula F* obtained by deleting any constant, literal or implication term appearing in F is not equivalent to F. Here, the definition of a literal is the same as that in classical logic.

Definition 8. [6] A lattice-valued propositional logical formula F is called an indecomposable extremely simple form, in short IESF, if (1) F is an ESF containing the connectives → and ′ at most; (2) for any G in the formula set, if G = F in LP(X), then G is an ESF containing the connectives → and ′ at most.

Definition 9. [6] An IESF F is called a k-IESF if there exist exactly k implication connectives occurring in F.

Definition 10. [6] All the constants, literals and IESFs are called generalized literals.

Definition 11. [6] A lattice-valued propositional logical formula G is called a generalized clause (phrase) if G is a formula of the form G = g1 ∨ ... ∨ gi ∨ ... ∨ gn (G = g1 ∧ ... ∧ gi ∧ ... ∧ gn), where the gi (i = 1, ..., n) are generalized literals. A conjunction (disjunction) of finitely many generalized clauses (phrases) is called a generalized conjunctive (disjunctive) normal form.
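Definitions 5 and 6 admit a direct brute-force reading when the truth-value field is finite. The sketch below is an illustration we add (not the paper's procedure): it takes L_n to be the n-element Łukasiewicz chain, a standard instance of a lattice implication algebra, encodes formulas as nested tuples, and decides α-satisfiability by enumerating every valuation; all names are our own.

```python
from itertools import product
from fractions import Fraction

def chain(n):
    """The n-element Lukasiewicz chain L_n = {0, 1/(n-1), ..., 1}."""
    return [Fraction(i, n - 1) for i in range(n)]

def evaluate(f, v):
    """Evaluate formula f under valuation v (dict: variable -> truth value).
    Formulas: a variable name, ('not', f), ('and', f, g), ('or', f, g),
    or ('imp', f, g) with Lukasiewicz implication min(1, 1 - x + y)."""
    if isinstance(f, str):
        return v[f]
    if f[0] == 'not':
        return 1 - evaluate(f[1], v)
    x, y = evaluate(f[1], v), evaluate(f[2], v)
    return {'and': min, 'or': max,
            'imp': lambda a, b: min(Fraction(1), 1 - a + b)}[f[0]](x, y)

def alpha_satisfiable(f, variables, n, alpha):
    """Definition 5: is there a valuation gamma with gamma(f) >= alpha?"""
    return any(evaluate(f, dict(zip(variables, vals))) >= alpha
               for vals in product(chain(n), repeat=len(variables)))
```

For instance, in L_3 the formula p ∨ p′ is 1/2-satisfiable (indeed 1/2-valid), while p ∧ p′ never reaches 3/4; this is why resolution levels above ∨_{a∈L_n}(a ∧ a′) behave classically.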
X. He et al.
Definition 12. [6] (α-Resolution) Let α ∈ L, and let G1 and G2 be two generalized clauses of the forms

G1 = g1 ∨ ... ∨ gi ∨ ... ∨ gm,
G2 = h1 ∨ ... ∨ hj ∨ ... ∨ hn.

If gi ∧ hj ≤ α, then

G = g1 ∨ ... ∨ gi−1 ∨ gi+1 ∨ ... ∨ gm ∨ h1 ∨ ... ∨ hj−1 ∨ hj+1 ∨ ... ∨ hn

is called an α-resolvent of G1 and G2, denoted by G = Rα(G1, G2), and gi and hj form an α-resolution pair, denoted by (gi, hj) − α. Generation of an α-resolvent from two clauses, called α-resolution, is the sole rule of inference of the α-resolution principle.

Definition 13. Let S = G1 ∧ ... ∧ Gi ∧ ... ∧ Gn be a generalized conjunctive normal form in LP(X), where the Gi (i = 1, 2, ..., n) are generalized clauses. S is called a generalized clause set if the logical symbol "∧" in S is rewritten as the symbol "," so that S has the form S = {G1, G2, ..., Gn}.

Definition 14. A generalized clause G in LP(X) is called a unit generalized clause if G consists of only one generalized literal g; this generalized literal g is called a unit generalized literal.

Definition 15. Let S be a generalized clause set in LP(X). A generalized literal g is called a pure generalized literal if g′ does not occur in S.
3
α-Satisfiability of Formulae in LnP (X)
In this section, we discuss the generalized literals which include constants in LnP(X) and extend some classical satisfiability verification rules to LnP(X).

Theorem 1. (α-Valid Rule) Let S = {G1, G2, ..., Gn} be a generalized clause set in LnP(X), α ∈ Ln. If there exists a generalized clause Gi with Gi > α (i ∈ {1, 2, ..., n}), then S − Gi ≤ α if and only if S ≤ α.

Proof. Let S = {Gi} ∪ (S − Gi). For any valuation γ, γ(S) ≤ γ(S − Gi); hence S − Gi ≤ α implies S ≤ α. Conversely, if there exists a valuation γ0 such that γ0(S − Gi) > α, then, since Gi > α, γ0(S) = γ0(Gi) ∧ γ0(S − Gi) > α, a contradiction to S ≤ α.

Similar to Theorem 1, we can establish the following corollary.

Corollary 1. Let S = {G1, G2, ..., Gn} be a generalized clause set in LnP(X), gj one of the generalized literals of Gi, α ∈ Ln. Then the following conclusions hold: (1) If gj > α, then S − Gi ≤ α if and only if S ≤ α.
(2) If gj ≤ α, then there are two cases: (i) if gj is a unit generalized literal, then S ≤ α; (ii) if gj is not a unit generalized literal, then S − {gj} ≤ α if and only if S ≤ α.

Proposition 2. Let g be an IESF in LnP(X), Ln = {ai | i = 1, 2, ..., n}. The following conclusions hold: (1) Let a be a constant in LnP(X). If g includes a, the truth value of g lies in one of the following five cases: g ∈ [a, an), g ∈ (a1, a], g ∈ [a′, an), g ∈ (a1, a′] or g ∈ Ln. (2) Let a, b be constants in LnP(X). If g includes k constants (k ∈ Z⁺, k ≥ 2), the truth value of g lies in one of the following four cases: g ∈ [a, b], g ∈ (a, b], g ∈ [a, an) or g ∈ Ln.

Remark 2. From the discussion above, the logical formulae which are comparable with the resolution level α need not be considered (i.e. they are deleted or determined), so two types of generalized literals remain in LnP(X) from the semantic viewpoint: (1) propositional variables, which can take values in the whole valuation field Ln; (2) IESFs which include no constant and are incomparable with the resolution level α, which can take values greater than α or less than α. Therefore, after the pretreatment, the valuations of the remaining generalized literals in LnP(X) can be greater than α or less than α.

Theorem 2. (Unit Generalized Literal Rule) Let S = {G1, G2, ..., Gn} be a generalized clause set in LnP(X), α ∈ Ln and ∨_{a∈Ln}(a ∧ a′) ≤ α < I. If there exists a unit generalized literal g in S, then delete all generalized clauses which include g, obtaining a generalized clause set S1. The following conclusions hold: (1) If S1 = ∅, then S is α-satisfiable. (2) If S1 ≠ ∅, then delete the generalized literal g′ in S1, obtaining a generalized clause set S2; S ≤ α if and only if S2 ≤ α.

Corollary 2. Let S = {G1, G2, ..., Gn} be a generalized clause set in LnP(X), α ∈ Ln.
If S ≤ α, then the following conclusions hold: (1) delete the generalized literal g and the generalized clauses which include g′, obtaining a generalized clause set S1; then S1 ≤ α; (2) delete the generalized literal g′ and the generalized clauses which include g, obtaining a generalized clause set S2; then S2 ≤ α.

Theorem 3. (Pure Generalized Literal Rule) Let S be a generalized clause set in LnP(X), α ∈ Ln. If there exists a pure generalized literal g in S, then delete all generalized clauses which include g, obtaining a generalized clause set S1. The following conclusions hold: (1) If S1 = ∅, then S is α-satisfiable. (2) If S1 ≠ ∅, then S ≤ α if and only if S1 ≤ α.
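In classical terms the two deletion rules look as follows. The sketch below is a simplification we supply for illustration: generalized literals are treated as opaque strings whose complement is written with a leading ~, and the α side conditions of Theorems 2 and 3 are left out.

```python
def compl(g):
    """Complement of a generalized literal, written with a leading '~'."""
    return g[1:] if g.startswith('~') else '~' + g

def unit_rule(S, g):
    """Unit generalized literal rule (Theorem 2, classical-style sketch):
    given a unit clause {g} in S, delete every clause containing g, then
    delete the complement of g from the remaining clauses."""
    S1 = [C for C in S if g not in C]
    return [C - {compl(g)} for C in S1]

def pure_rule(S, g):
    """Pure generalized literal rule (Theorem 3): if the complement of g
    never occurs in S, the clauses containing g can be deleted."""
    assert all(compl(g) not in C for C in S), "g is not pure in S"
    return [C for C in S if g not in C]
```

If either rule returns the empty set, S is α-satisfiable by conclusion (1) of the corresponding theorem; otherwise the question whether S ≤ α is passed on to the reduced clause set.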
Theorem 4. (Splitting Rule) Let S be a generalized clause set in LnP(X), α ∈ Ln. Suppose S can be written in the form (A1 ∨ g) ∧ ... ∧ (Am ∨ g) ∧ (B1 ∨ g′) ∧ ... ∧ (Bn ∨ g′) ∧ R, where the Ai, Bi and R include neither g nor g′. Let S1 = A1 ∧ ... ∧ Am ∧ R and S2 = B1 ∧ ... ∧ Bn ∧ R; then S ≤ α if and only if S1 ≤ α and S2 ≤ α.
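The splitting rule has a direct set-level reading. The sketch below is our own illustrative encoding (clauses as frozensets of literal strings, with g′ written as ~g); it partitions a clause set S into the two smaller sets S1 and S2 of Theorem 4.

```python
def split(S, g):
    """Splitting rule sketch (Theorem 4): S <= alpha iff S1 <= alpha and
    S2 <= alpha, where S1/S2 drop g and its complement respectively."""
    gc = g[1:] if g.startswith('~') else '~' + g
    S1, S2, R = [], [], []
    for C in S:
        if g in C:
            S1.append(C - {g})      # clauses  A_i or g   ->  A_i
        elif gc in C:
            S2.append(C - {gc})     # clauses  B_i or g'  ->  B_i
        else:
            R.append(C)             # clauses R appear in both halves
    return S1 + R, S2 + R
```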
4
α-Lock Resolution Method in LnP (X)
In this section, we give the definitions of lock generalized clause, α-lock resolution and α-lock deduction, and discuss the soundness and weak completeness of the α-lock resolution method in LnP(X).

Definition 16. Let G be a generalized clause in LnP(X). If each occurrence of a generalized literal in G is assigned a positive integer at its lower left corner (identical generalized literals may be labeled with different positive integers), this specific generalized clause G is called a lock generalized clause, and the positive integer attached to a generalized literal is called its lock index.

Definition 17. Let G be a lock generalized clause in LnP(X). Suppose that G contains generalized literals which have the same name with different indices; then delete the generalized literals with the larger indices. This process is called amalgamation.

Definition 18. Let G1 and G2 be two lock generalized clauses in LnP(X), α ∈ Ln. G = RαL(G1, G2) is called an α-lock resolvent of G1 and G2 if it satisfies the following conditions: (1) G is the α-resolvent of G1 and G2; (2) the resolved generalized literals have the minimal indices in G1 and G2 respectively.

Definition 19. Let S be a finite generalized clause set in LnP(X), and let all generalized literals in S be assigned lock indices. An α-resolution deduction from S is called an α-lock deduction if each α-resolution in the deduction process is an α-lock resolution. An α-lock deduction from S to the α-empty clause is called an α-lock proof of S.

Theorem 5. (Soundness Theorem) Let S be a finite generalized clause set in LnP(X), with all generalized literals in S assigned lock indices. Let {D1, D2, ..., Dm} be an α-lock resolution deduction from S to a generalized clause Dm. If Dm ≤ α, then S ≤ α.

Theorem 6. (Weak Completeness Theorem) Let S be a finite generalized clause set in LnP(X), with all generalized literals in S assigned lock indices. Let α ∈ Ln and ∨_{a∈Ln}(a ∧ a′) ≤ α < I.
If S ≤ α, then there exists an α-lock deduction from S to the α-empty clause.
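The lock discipline of Definitions 16–18 can be illustrated on a classical-style simplification in which an α-resolution pair is just a complementary pair (g, ~g); the representation and function names are our own. Each literal carries its lock index, resolution is attempted only on the minimal-index literal of each clause, and repeated literal names are amalgamated by keeping the smallest index.

```python
def lock_resolve(C1, C2):
    """Sketch of alpha-lock resolution: clauses are lists of
    (lock_index, literal); the alpha-resolution pair is approximated here
    by a complementary literal pair (g, ~g). Resolution is only allowed on
    the minimal-lock-index literal of each clause; returns the resolvent
    (with amalgamation applied) or None if the rule does not fire."""
    def compl(g):
        return g[1:] if g.startswith('~') else '~' + g
    g1 = min(C1)  # (index, literal) with the smallest lock index
    g2 = min(C2)
    if compl(g1[1]) != g2[1]:
        return None  # minimal-index literals are not a resolution pair
    rest = [p for p in C1 if p != g1] + [p for p in C2 if p != g2]
    # amalgamation: keep only the smallest index per literal name
    best = {}
    for idx, lit in rest:
        if lit not in best or idx < best[lit]:
            best[lit] = idx
    return sorted((idx, lit) for lit, idx in best.items())
```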
5
Conclusion
In this paper, the generalized literals in LP(X) which include constants are discussed in detail; it is shown that the generalized literals which are comparable with the resolution level α can be deleted or disregarded in the resolution-based automated reasoning process. Finally, an α-lock resolution method based on LnP(X) is proposed and the weak completeness of this method is proved. Further research will concentrate on devising an algorithm to improve the efficiency of the α-lock resolution method in LnP(X), and on extending the α-lock resolution method to more general logic systems such as LP(X) and LF(X). The proposed work aims at handling imprecise and incomparable information, especially when the truth-value field is a lattice-ordered linguistic-valued structure, i.e., truth values are linguistic terms (e.g., possibly true, very true, more or less true) instead of numerical values in [0, 1]. Hence it can lead to applications in qualitative reasoning and qualitative linguistic-valued decision making. The proposed automated reasoning algorithm lays a foundation for linguistic-valued symbolic reasoning, logic programming, and decision making. Acknowledgments. This work is partially supported by the National Natural Science Foundation of China (Grant No. 60875034), the Specialized Research Foundation for the Doctoral Program of Higher Education of China (Grant No. 20060613007), and the research projects TIN-2009-08286 and P08-TIC-3548.
References

1. Robinson, J.A.: A machine-oriented logic based on the resolution principle. J. ACM 12, 23–41 (1965)
2. Liu, X.H.: Resolution-Based Automated Reasoning. Academic Press, Beijing (1994) (in Chinese)
3. Wos, L.: Automated Reasoning: 33 Basic Research Problems. Prentice Hall, New Jersey (1988)
4. Wang, G.J., Zhou, H.J.: Introduction to Mathematical Logic and Resolution Principle, 2nd edn. Science Press, Beijing (2006)
5. Xu, Y.: Lattice implication algebra. J. Southwest Jiaotong University 1, 20–27 (1993)
6. Xu, Y., Ruan, D., Qin, K.Y., Liu, J.: Lattice-Valued Logic: An Alternative Approach to Treat Fuzziness and Incomparability. Springer, Berlin (2003)
7. Xu, Y., Ruan, D., Kerre, E.E., Liu, J.: α-resolution principle based on lattice-valued propositional logic LP(X). Information Sciences 130, 195–223 (2000)
8. Xu, Y., Ruan, D., Kerre, E.E., Liu, J.: α-resolution principle based on first-order lattice-valued logic LF(X). Information Sciences 132, 221–239 (2001)
9. Ma, J., Li, W.J., Ruan, D., Xu, Y.: Filter-based resolution principle for lattice-valued propositional logic LP(X). Information Sciences 177, 1046–1062 (2007)
10. Liu, J., Ruan, D., Xu, Y., Song, Z.M.: A resolution-like strategy based on a lattice-valued logic. IEEE Transactions on Fuzzy Systems 11(4), 560–567 (2003)
On Compactness and Consistency in Finite Lattice-Valued Propositional Logic

Xiaodong Pan1, Yang Xu1, Luis Martínez2, Da Ruan3, and Jun Liu4

1 Intelligent Control Development Center, Southwest Jiaotong University, Chengdu 610031, Sichuan, PR China
2 Department of Computing, University of Jaén, E-23071 Jaén, Spain
3 Belgian Nuclear Research Centre (SCK•CEN), Mol, and Ghent University, Belgium
4 School of Computing and Mathematics, University of Ulster, Northern Ireland, UK
Abstract. In this paper, we investigate the semantical theory of finite lattice-valued propositional logic based on finite lattice implication algebras. Based on fuzzy set theory on a set of formulas, some propositions analogous to those in classical logic are proved, and, using the semantical consequence operation, consistency and compactness are investigated. Keywords: Lattice-valued logic; Consequence operation; Compactness; Fuzzy theory; Consistency.
1 Introduction

In recent years, the theory of fuzzy sets has been applied widely to the study of fuzzy logic (e.g. see [8, 10, 15, 16, 20]), using fuzzy sets of formulas in both semantical and syntactic inference. In [14], Pavelka incorporated internal truth values in the language, defined a semantical consequence operation as an extension of the classical case and proved many important results about its axiomatizability. According to the discussion presented in [16], Pavelka's fuzzy logic is a fuzzy logic with evaluated syntax, in which each formula is assigned a value in the syntax; as a consequence, the concept of proof in the classical setting becomes an evaluated proof in his fuzzy logic, i.e., proving a formula to be true to some degree. This is a generalization of many-valued logic in that in the former we infer new facts along with their truth values, whereas in many-valued logic one infers only those facts that are absolutely true (have truth value 1). In [17, 18], Novák extended Pavelka's approach to Łukasiewicz first-order logic. In [22], Xu extended Pavelka's approach to a relatively general lattice, and some important conclusions about uncertain reasoning and automated reasoning have been obtained (see e.g. [23-25]). In [11, 12], Ma et al. presented a filter-based resolution principle for this kind of logic and also considered its application to machine intelligence. In addition, it is well known that whether a theory is consistent or inconsistent is one of the crucial questions in fuzzy logic; in [21, 26-28], Zhou et al. investigated the consistent

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 328–334, 2010. © Springer-Verlag Berlin Heidelberg 2010
degree of theories by means of deduction theorems, standard completeness theorems and satisfiability degrees of formulas in several typical fuzzy logic systems. Closure operators, introduced by Tarski in 1930, and their use in classical logic are well known [19]; in Pavelka's fuzzy logic theory the closure operators play a significant role: the semantic and syntactic closure operators have been defined as mappings from fuzzy sets of formulas to fuzzy sets of formulas, which extends the concept of closure operator in Tarski's sense in a natural way (see [14]). Concerning closure operators, one of the first works was done by Michálek in [13] in the framework of fuzzy topological spaces. From the point of view of fuzzy set theory, Bělohlávek investigated closure operators and related structures in [1, 2, 3]. During the last decades the closure operators have also been studied in the context of fuzzy logic, taking the chain L = [0, 1] as a special case [4-7, 9]. To make fuzzy logic work better for approximate reasoning, in this position paper, based on fuzzy theory on a set of formulas, we investigate the consistency and compactness of the semantical consequence operation in finite lattice-valued propositional logic.
2 Preliminaries

In this section, for the purpose of reference, we introduce some basic definitions and results about closure operations, lattice implication algebras and lattice-valued logic, and the notation conventions we shall use throughout this paper.

Definition 1. [22] Let (L, ∨, ∧, O, I) be a bounded lattice with an order-reversing involution ′, with I and O the greatest and smallest elements of L respectively, and let →: L × L → L be a mapping. (L, ∨, ∧, ′, →, O, I) is called a lattice implication algebra if the following conditions hold for any x, y, z ∈ L:

(I1) x → (y → z) = y → (x → z),
(I2) x → x = I,
(I3) x → y = y′ → x′,
(I4) x → y = y → x = I implies x = y,
(I5) (x → y) → y = (y → x) → x,
(L1) (x ∨ y) → z = (x → z) ∧ (y → z),
(L2) (x ∧ y) → z = (x → z) ∨ (y → z).

Remark 1. In a lattice implication algebra (L, ∨, ∧, ′, →, O, I), by defining a binary operation ⊗ as follows: for any x, y ∈ L, x ⊗ y = (x → y′)′, we can show that (L, ∨, ∧, ⊗, →, O, I) is a residuated lattice, but the converse is not always true. For example, let ([0, 1], min, max, →G, 0, 1) be a Gödel structure, where x →G y is 1 for x ≤ y and y elsewhere. It is easy to prove that this is a residuated lattice, but since (0.8 →G 0.7) →G 0.7 = 0.7 →G 0.7 = 1 ≠ 0.8 = 1 →G 0.8 = (0.7 →G 0.8) →G 0.8, axiom (I5) fails, so it is not a lattice implication algebra. (For more details, please refer to [22].)

In what follows, unless otherwise stated, L always represents any given finite lattice implication algebra. The set of all natural numbers will be denoted by N, and N \ {0} by N⁺. Let M be a nonempty set; L^M denotes the set of all L-fuzzy sets on M. If the set supp(A) = {x | A(x) ≠ O} is finite, then A is called a finite fuzzy set.
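Both halves of Remark 1 can be checked mechanically. The snippet below is an illustration we add (not part of the paper): it verifies by brute force that the Łukasiewicz chain L_n satisfies (I1)–(I5), (L1) and (L2), and that the Gödel implication violates (I5), so the Gödel residuated lattice is not a lattice implication algebra.

```python
from fractions import Fraction
from itertools import product

def lukasiewicz_chain_is_LIA(n):
    """Brute-force check of (I1)-(I5), (L1), (L2) on the chain L_n with
    Lukasiewicz implication x -> y = min(1, 1 - x + y) and negation 1 - x."""
    L = [Fraction(i, n - 1) for i in range(n)]
    imp = lambda x, y: min(Fraction(1), 1 - x + y)
    neg = lambda x: 1 - x                      # order-reversing involution
    for x, y, z in product(L, repeat=3):
        assert imp(x, imp(y, z)) == imp(y, imp(x, z))          # (I1)
        assert imp(x, x) == 1                                   # (I2)
        assert imp(x, y) == imp(neg(y), neg(x))                 # (I3)
        if imp(x, y) == 1 and imp(y, x) == 1:                   # (I4)
            assert x == y
        assert imp(imp(x, y), y) == imp(imp(y, x), x)           # (I5)
        assert imp(max(x, y), z) == min(imp(x, z), imp(y, z))   # (L1)
        assert imp(min(x, y), z) == max(imp(x, z), imp(y, z))   # (L2)
    return True

def godel_breaks_I5():
    """Goedel implication (x -> y = 1 if x <= y else y) fails (I5)."""
    g = lambda x, y: Fraction(1) if x <= y else y
    lhs = g(g(Fraction(4, 5), Fraction(7, 10)), Fraction(7, 10))  # = 1
    rhs = g(g(Fraction(7, 10), Fraction(4, 5)), Fraction(4, 5))   # = 4/5
    return lhs != rhs
```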
By LP we denote the lattice-valued propositional logic based on a finite lattice implication algebra L. In LP, the formula set F is a (¬, &, →)-type free algebra generated by the set S ∪ L̄, where S is the set of propositional variables, L̄ = {ā | a ∈ L}, and each ā ∈ L̄ is a nullary operation.

Definition 2. A mapping v : F → L is called a valuation if v(¬A) = v(A)′, v(A&B) = v(A) ⊗ v(B) = (v(A) → v(B)′)′, v(A → B) = v(A) → v(B), and v(ā) = a for any a ∈ L. The set T of all valuations is called the semantics of LP.

Definition 3. [14] The mapping C_T : L^F → L^F, C_T(X) = ∧{v ∈ T | v ≥ X}, is called the L-semantic consequence operation on F.

Remark 2. Let X ∈ L^F; X is called a fuzzy theory on F. For any A ∈ F, X(A) ∈ L denotes the initial truth value of the formula A, which is given in advance, so X can also be viewed as information qua premiss.
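For a finite L and finitely many propositional variables, C_T(X) can be computed by brute force. The sketch below is our own encoding (restricted to the connectives ¬ and → over a Łukasiewicz chain): it enumerates all valuations v ≥ X and takes the infimum, following Definition 3, and returns I when no model exists, matching the convention of Definition 6 below.

```python
from fractions import Fraction
from itertools import product

IMP = lambda x, y: min(Fraction(1), 1 - x + y)  # Lukasiewicz implication

def ev(f, v):
    """Evaluate formula f: a variable name, ('not', f), or ('imp', f, g)."""
    if isinstance(f, str):
        return v[f]
    if f[0] == 'not':
        return 1 - ev(f[1], v)
    return IMP(ev(f[1], v), ev(f[2], v))

def consequence(X, query, variables, n):
    """Pavelka-style C_T(X)(query) = inf{v(query) : v >= X}, enumerating
    all valuations of `variables` into the chain L_n. The fuzzy theory X
    is a dict {formula: lower truth bound} (implicitly O elsewhere)."""
    L = [Fraction(i, n - 1) for i in range(n)]
    models = []
    for vals in product(L, repeat=len(variables)):
        v = dict(zip(variables, vals))
        if all(ev(f, v) >= a for f, a in X.items()):
            models.append(ev(query, v))
    return min(models) if models else Fraction(1)  # no model: assign I
```

For example, from X = {p : 1/2, p → q : 1} over L_3 one computes C_T(X)(q) = 1/2: every model forces v(q) ≥ v(p) ≥ 1/2, and the bound is attained.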
3 Semantical Consequence Operation and Consistency of Fuzzy Theory

In this section, we generalize the classical set of formulae to an L-set in the level of D, discuss the properties of the L-semantic consequence operation, and study the problem of consistency. In what follows, we always assume that Γ is a subset of F and D is a subset of L satisfying (i) I ∈ D, O ∉ D; (ii) for any x, y ∈ L such that x ≤ y, x ∈ D implies y ∈ D.

Definition 4. Let Γ ⊆ F. Define D_Γ = {X ∈ L^F | for any A ∈ F: if A ∈ Γ and A ∉ L̄, then X(A) ∈ D; if A ∈ Γ and A = ā ∈ L̄, then X(A) = a; otherwise X(A) = O}.

Remark 3. In the definition above, Γ is a finite subset of F as usual. Let X ∈ D_Γ; for every A ∈ F, by X(A) we mean a truth value for A in X in the level of D, given in advance. In fact, this can be viewed as a generalization of a set of premise formulas in classical semantic deduction.

Definition 5. Let Γ ⊆ F, v ∈ T. v is called a model of Γ in the level of D, or v satisfies Γ in the level of D, if v(A) ∈ D for every A ∈ Γ with A ∉ L̄. Γ is called satisfiable in the level of D if there exists a valuation v which satisfies Γ in the level of D.

By Definitions 4 and 5, the following proposition is obvious.

Proposition 1. Γ is satisfiable in the level of D if and only if there exist X ∈ D_Γ and v ∈ T such that v ≥ X.

In Definition 3, Pavelka extended the semantical consequence operation of classical logic to an L-consequence operation. In what follows, we discuss the properties of the semantical consequence operation C_T.

Definition 6. Let X ∈ L^F. If {v ∈ T | v ≥ X} = ∅, then we assign I_F to C_T(X), i.e. C_T(X) = I_F, where I_F is the greatest element of L^F, the constant map equal to I on the whole of F. In this case, X is said to be inconsistent with regard to T; otherwise, X is said to be consistent. If every X ∈ D_Γ is consistent, then Γ is said to be consistent with regard to T in the level of D.
By Definitions 5 and 6, the following proposition is obvious.

Proposition 2. Γ is satisfiable in the level of D if and only if there exists X ∈ D_Γ such that X is consistent with regard to T.

Proposition 3. For any X ∈ L^F, if there exist A ∈ F with A ∉ L̄ such that

C_T(X)(A) ∧ C_T(X)(¬A) > ∨_{α∈L}(α ∧ α′),

then X is inconsistent with regard to T.

Proof. Assume that X is consistent; then there exists v₀ ∈ T such that v₀ ≥ X, and it follows that v₀(A) ∧ v₀(¬A) ≥ C_T(X)(A) ∧ C_T(X)(¬A) for any A ∈ F. Since there exist A ∈ F with A ∉ L̄ such that C_T(X)(A) ∧ C_T(X)(¬A) > ∨_{α∈L}(α ∧ α′), we get v₀(A) ∧ v₀(¬A) > ∨_{α∈L}(α ∧ α′). But ∨_{α∈L}(α ∧ α′) ≥ v₀(A) ∧ (v₀(A))′ = v₀(A) ∧ v₀(¬A), which is impossible. Hence X is inconsistent.

By Definition 6 and Proposition 3, it is easy to prove the following conclusion.

Corollary 1. For any X ∈ L^F, X is inconsistent with regard to T if and only if there exists A ∈ F such that C_T(X)(A) ∧ C_T(X)(¬A) = I.

In classical logic, by M ⊨ A we mean that v(A) = 1 for any v ∈ T satisfying v(M) ⊆ {1}. That is to say, A ∈ ∩{D_v | M ⊆ D_v, v ∈ T}, where D_v = {p ∈ F | v(p) = 1}. In the following, by means of the semantical consequence operation, we extend the above concept to lattice-valued logic and obtain the following conclusion:

Theorem 1. Let A ∈ F. If C_T(X)(A) ∈ D for any X ∈ D_Γ, then v(A) ∈ D for any v ∈ T satisfying v(Γ) ⊆ D.
4 The Compactness of Semantical Consequence Operation

It is well known that compactness is an important property of classical logic, which establishes a link between infinity and finiteness. In this section, we discuss the compactness of the semantical consequence operation C_T.

Proposition 4. Let B_{C_T} = {Y ∈ L^F | C_T(Z) ≤ Y for any Z ≤ Y}; then for any N ⊆ B_{C_T}, ∧N ∈ B_{C_T}; that is to say, B_{C_T} is a closure system.

Proof. Assume that N ⊆ B_{C_T} and let B = ∧N. If N = ∅, then B = ∧N = I_F, thus B = I_F ∈ B_{C_T}; otherwise, for any Y₁ ∈ L^F, if Y₁ ≤ B, then Y₁ ≤ Y for any Y ∈ N, and so C_T(Y₁) ≤ Y for any Y ∈ N. Hence C_T(Y₁) ≤ ∧N = B. Summing up, B ∈ B_{C_T}, so B_{C_T} is a closure system, ending the proof.
Corollary 2. B_{C_T} = {Y ∈ L^F | C_T(Y) = Y}; that is to say, B_{C_T} consists of all fixed points of C_T.

Corollary 3. For any Y ∈ L^F, C_T(Y) = ∧{Z | Y ≤ Z, Z ∈ B_{C_T}}.

Definition 7. The mapping C_T is said to be compact if

C_T(Y) = ∨{C_T(X) | X ∈ L^F, X ≤ Y and X is a finite fuzzy set}

for any Y ∈ L^F. C_T is said to have the property of preserving directed joins if C_T(∨_{i∈I} Y_i) = ∨_{i∈I} C_T(Y_i) for any directed family U = {Y_i | i ∈ I} of subsets of L^F, where U is said to be a directed family if for any Y_i, Y_j ∈ U there exists Y_k ∈ U such that Y_i ≤ Y_k and Y_j ≤ Y_k.

Theorem 2. C_T is compact if and only if it has the property of preserving directed joins.
Proof. (Necessity) Assume that C_T is compact and U = {Y_i | i ∈ I} is a directed family of subsets of L^F. Let Y₀ = ∨_{i∈I} Y_i. On the one hand, since C_T is a closure operation, ∨_{i∈I} C_T(Y_i) ≤ C_T(Y₀). On the other hand, C_T is compact, so it follows that

C_T(Y₀) = ∨{C_T(Z) | Z ∈ L^F, Z ≤ Y₀ and Z is a finite fuzzy set},

and then for any A ∈ F,

C_T(Y₀)(A) = ∨{C_T(Z)(A) | Z ∈ L^F, Z ≤ Y₀ and Z is a finite fuzzy set}.

As C_T(Y₀)(A), C_T(Z)(A) ∈ L and L is a finite lattice implication algebra, there exist Z₁, Z₂, ..., Z_n ≤ Y₀ (n ∈ N⁺, 1 ≤ n ≤ |L|) such that

C_T(Y₀)(A) = ∨_{i=1}^{n} C_T(Z_i)(A).

For any Z_j ≤ Y₀ = ∨_{i∈I} Y_i, since Z_j is a finite fuzzy set, there exist Y_{j1}, Y_{j2}, ..., Y_{jk} (j1, ..., jk ∈ I) such that Z_j ≤ Y_{j1} ∨ Y_{j2} ∨ ... ∨ Y_{jk}; since U is directed, it follows that there exists Y_{j0} ∈ U such that Y_{ji} ≤ Y_{j0} (i = 1, ..., k), thus Z_j ≤ Y_{j0}. Similarly, we can prove that there exists Y* ∈ U such that Y_{j0} ≤ Y* (j = 1, ..., n), hence Z_j ≤ Y* (j = 1, ..., n). Therefore,

C_T(Y₀)(A) = C_T(Z₁)(A) ∨ C_T(Z₂)(A) ∨ ... ∨ C_T(Z_n)(A) ≤ C_T(Y*)(A) ≤ ∨_{i∈I} C_T(Y_i)(A).

It shows that C_T(Y₀)(A) ≤ ∨_{i∈I} C_T(Y_i)(A). Summing up, C_T(Y₀) = ∨_{i∈I} C_T(Y_i); that is to say, C_T has the property of preserving directed joins.

(Sufficiency) Assume that C_T has the property of preserving directed joins. Due to the fact that any Y ∈ L^F is the union of its finite subsets, and the family of all finite subsets of a given Y is obviously directed, C_T is compact. This completes the proof.
5 Conclusion

The semantical consequence operation and the consistency and compactness of a lattice-valued propositional logic LP(X) are investigated in this paper, which enhances the theoretical foundation of this logic system and provides theoretical support for approximate reasoning to handle fuzziness and incomparability. Acknowledgments. The work is partially supported by the Natural Science Foundation of China (Grant no. 60875034) and the research projects TIN-2009-08286 and P08-TIC-3548.
References

1. Bělohlávek, R.: Fuzzy closure operators. Journal of Mathematical Analysis and Applications 262, 473–489 (2001)
2. Bělohlávek, R.: Fuzzy closure operators II: induced relations, representation, and examples. Soft Computing 7, 53–64 (2002)
3. Bělohlávek, R.: Fuzzy Relational Systems: Foundations and Principles. Kluwer, New York (2002)
4. Biacino, L., Gerla, G.: An extension principle for closure operators. Journal of Mathematical Analysis and Applications 198, 1–24 (1996)
5. Biacino, L., Gerla, G.: Closure operators for fuzzy subsets. In: Proc. First European Congress on Fuzzy and Intelligent Technologies, Aachen (1993)
6. Castro, J.L., Trillas, E.: Tarski's fuzzy consequences. In: Proc. Internat. Fuzzy Eng. Symp. 1991, vol. 1, pp. 70–81 (1991)
7. Castro, J.L., Trillas, E., Cubillo, S.: On consequence in approximate reasoning. J. Appl. Non-Classical Logics 4(1), 91–103 (1994)
8. Cintula, P.: From fuzzy logic to fuzzy mathematics. Ph.D. Thesis, Czech Technical University, Prague (2005)
9. Gerla, G.: Comparing fuzzy and crisp deduction systems. Fuzzy Sets and Systems 67, 317–328 (1994)
10. Hájek, P.: Metamathematics of Fuzzy Logic. Kluwer Academic Publishers, Dordrecht (1998)
11. Ma, J., Li, W., Ruan, D., Xu, Y.: Filter-based resolution principle for lattice-valued propositional logic LP(X). Information Sciences 177, 1046–1062 (2007)
12. Ma, J., Chen, S., Xu, Y.: Fuzzy logic from the viewpoint of machine intelligence. Fuzzy Sets and Systems 157, 628–634 (2006)
13. Michálek, J.: Fuzzy topologies. Kybernetika 11(5), 345–354 (1975)
14. Pavelka, J.: On fuzzy logic I: Many-valued rules of inference; II: Enriched residuated lattices and semantics of propositional calculi; III: Semantical completeness of some many-valued propositional calculi. Zeitschr. f. Math. Logik und Grundlagen d. Math. 25, 45–52 (1979)
15. Novák, V., Perfilieva, I., Močkoř, J.: Mathematical Principles of Fuzzy Logic. Kluwer, Boston (1999)
16. Novák, V.: Which logic is the real fuzzy logic? Fuzzy Sets and Systems 157, 635–641 (2006)
17. Novák, V.: On the syntactico-semantical completeness of first-order fuzzy logic. Part I: syntax and semantics. Kybernetika 26, 47–66 (1990)
18. Novák, V.: On the syntactico-semantical completeness of first-order fuzzy logic. Part II: main results. Kybernetika 26, 134–154 (1990)
19. Tarski, A.: Logic, Semantics and Metamathematics. Clarendon Press, Oxford (1956)
20. Turunen, E.: Mathematics behind Fuzzy Logic. Advances in Soft Computing. Physica-Verlag, Heidelberg (1999)
21. Wang, G.J., Zhang, W.X.: Consistency degrees of finite theories in Łukasiewicz propositional fuzzy logic. Fuzzy Sets and Systems 149, 275–284 (2005)
22. Xu, Y., Ruan, D., Qin, K.Y., Liu, J.: Lattice-Valued Logic: An Alternative Approach to Treat Fuzziness and Incomparability. Springer, Heidelberg (2003)
23. Xu, Y., Ruan, D., Kerre, E.E., Liu, J.: α-resolution principle based on lattice-valued propositional logic LP(X). Information Sciences 130, 1–29 (2000)
24. Xu, Y., Ruan, D., Kerre, E.E., Liu, J.: α-resolution principle based on first-order lattice-valued logic LF(X). Information Sciences 132, 221–239 (2001)
25. Xu, Y., Liu, J., Ruan, D., Lee, T.T.: On the consistency of rule bases based on lattice-valued first-order logic LF(X). Internat. J. Intelligent Systems 21, 399–424 (2006)
26. Zhou, X.N., Wang, G.J.: Consistency degrees of theories in some systems of propositional fuzzy logic. Fuzzy Sets and Systems 152, 321–331 (2005)
27. Zhou, H.J., Wang, G.J.: Generalized consistency degrees of theories w.r.t. formulas in several standard complete logic systems. Fuzzy Sets and Systems 157, 2058–2073 (2006)
28. Zhou, H.J., Wang, G.J.: Characterizations of maximal consistent theories in the formal deductive system L (NM-logic) and Cantor space. Fuzzy Sets and Systems 158, 2591–2604 (2007)
Lattice Independent Component Analysis for Mobile Robot Localization Ivan Villaverde, Borja Fernandez-Gauna, and Ekaitz Zulueta Computational Intelligence Group Dept. CCIA, UPV/EHU, Apdo. 649, 20080 San Sebastian, Spain www.ehu.es/ccwintco
Abstract. This paper introduces an approach to appearance based mobile robot localization using Lattice Independent Component Analysis (LICA). The Endmember Induction Heuristic Algorithm (EIHA) is used to select a set of Strong Lattice Independent (SLI) vectors, which can be assumed to be Affine Independent, and therefore candidates to be the endmembers of the data. Selected endmembers are used to compute the linear unmixing of the robot's acquired images. The resulting mixing coefficients are used as feature vectors for view recognition through classification. We show on a sample path experiment that our approach can recognize the location of the robot, and we compare the results with Independent Component Analysis (ICA).
1
Introduction
Navigation is the ability of an agent to move around its environment with a specific purpose. A necessary ability for navigation is self-localization: the capacity of the robot to ascertain, more or less accurately, "where it is" from the information provided by its sensors. This knowledge makes possible other navigation-related tasks like path planning. But this ability also requires a previously known model of the environment, a map, which can be built off-line, in a previous training step, or on-line, as the robot explores new space. Topological maps are one of the most common types of such maps. These kinds of maps do not store any metric relationship between environment elements, but mere neighborhood relationships between reference locations inside the environment. One popular approach to topological maps is the one based on appearance-based models [8]. Appearance-based models rely on view matching: the maps are formed by collections of images taken at different spots of the environment, usually along with some information on their relative position. Those images are stored in the nodes of a graph in which the links between nodes indicate an appearance or spatial neighborhood, building in this way a topological map. Localization then amounts to finding the stored image that most closely resembles the view at the robot's current location. This matching is based on global image features like color histograms [17], edge density [15] or PCA [10] or ICA [12] descriptors, instead of local features tracked from image to image. E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 335–342, 2010. © Springer-Verlag Berlin Heidelberg 2010
336
I. Villaverde, B. Fernandez-Gauna, and E. Zulueta
In this paper we propose the application to appearance-based localization of the approach called Lattice Independent Component Analysis (LICA), introduced in [4]. This approach consists of two steps: first, it selects Strong Lattice Independent (SLI) vectors from the input dataset using a heuristic algorithm, the Endmember Induction Heuristic Algorithm (EIHA) [5]; second, because of the conjectured equivalence between SLI and Affine Independence, it performs the linear unmixing of the input dataset based on these endmembers, obtaining the feature vectors of each input datum. Therefore, the approach is a mixture of linear and nonlinear methods. The original work using this approach was devoted to unsupervised hyperspectral image segmentation, hence the use of the name endmember for the selected vectors. We maintain the basic assumption that the data is generated as a convex combination of a set of endmembers which are the vertices of a convex polytope covering some region of the input data. This assumption is similar to the linear mixture assumed by the Independent Component Analysis (ICA) [6] approach; however, we do not impose any probabilistic assumption on the data. If we try to establish correspondences to ICA, the endmembers correspond to the unknown sources and the mixing matrix is the one given by the abundance coefficients computed by least squares estimation. The EIHA was first proposed in [5]. In this algorithm, our approach to endmember selection from the data is based on the conjectured equivalence between Strong Lattice Independence and Affine Independence [14]. SLI needs two conditions: Lattice Independence and max/min dominance. Lattice Independence is detected based on results on fixed points for Lattice Autoassociative Memories (LAM) [14,13,16], and max/min dominance is tested using algorithms inspired by the ones described in [18].
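The LAM machinery mentioned above can be sketched in a few lines. The code below is an illustration under our own conventions, not the authors' implementation: it builds the min (erosive) lattice autoassociative memory W_XX of Ritter et al. and checks the classical perfect-recall property — every stored pattern is a fixed point under the max-plus product — which is what makes LAM fixed-point results usable for detecting lattice independence.

```python
import numpy as np

def lam_min_memory(X):
    """Min (erosive) lattice autoassociative memory for the pattern matrix X
    (each column is one pattern): W[i, j] = min over patterns of (x_i - x_j)."""
    n, k = X.shape
    W = np.full((n, n), np.inf)
    for col in range(k):
        x = X[:, col]
        W = np.minimum(W, x[:, None] - x[None, :])
    return W

def maxplus_recall(W, x):
    """Max-plus product: (W [+] x)_i = max_j (W[i, j] + x[j])."""
    return (W + x[None, :]).max(axis=1)

# Stored patterns are fixed points of the min memory under max-plus recall.
X = np.array([[0.0, 2.0],
              [1.0, 5.0],
              [4.0, 3.0]])          # two 3-dimensional patterns (columns)
W = lam_min_memory(X)
for col in range(X.shape[1]):
    assert np.allclose(maxplus_recall(W, X[:, col]), X[:, col])
```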
The LICA approach falls in the field of Lattice Computing algorithms, which have been introduced in [3] as the class of algorithms that either apply lattice operators inf and sup or use lattice theory to produce generalizations or fusions of previous approaches. In [3] an extensive and updated list of references that can be labeled Lattice Computing can be found. The outline of the paper is as follows: Section 2 gives a brief recall of ICA. Section 3 introduces the linear mixing model. Section 4 presents results of the proposed approach on a sample path. Finally, section 5 provides some conclusions.
2
Independent Component Analysis
The Independent Component Analysis (ICA) [6] assumes that the data is a linear combination of non-Gaussian, mutually independent latent variables with an unknown mixing matrix. The ICA reveals the hidden independent sources and the mixing matrix. That is, given a set of observations represented by a d-dimensional vector x, ICA assumes a generative model

x = As,   (1)
Lattice Independent Component Analysis for Mobile Robot Localization
337
where s is the M-dimensional vector of independent sources and A is the d × M unknown basis matrix. The ICA searches for the linear transformation W of the data such that the projected variables

Wx = s   (2)

are as independent as possible. It has been shown that the model is completely identifiable if the sources are statistically independent and at least M − 1 of them are non-Gaussian. If the sources are Gaussian, the ICA transformation can only be estimated up to an orthogonal transformation. Estimation of the mixing and unmixing matrices can be done by maximizing diverse objective functions, among them the non-Gaussianity of the sources and the likelihood of the sample. We have used the implementations of Mean Field ICA [7] and of Molgedey and Schuster ICA based on dynamic decorrelation [11], which are available at [1].
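The generative model of Eqs. (1)–(2) can be illustrated with a small numpy sketch. This is a toy demonstration only: with the mixing matrix A known and square, the ideal unmixing matrix is simply W = A⁻¹, whereas a real ICA algorithm must estimate W from the observations X alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two non-Gaussian (uniform) independent sources, 500 samples each.
S = rng.uniform(-1.0, 1.0, size=(2, 500))

# "Unknown" square mixing matrix A, Eq. (1): x = A s.
A = np.array([[1.0, 0.5],
              [0.3, 2.0]])
X = A @ S

# With A known and square the ideal unmixing matrix is W = A^{-1},
# so that W x = s, Eq. (2).
W = np.linalg.inv(A)
S_rec = W @ X
```

With W = A⁻¹ the sources are recovered exactly; an estimated W would recover them only up to permutation and scaling.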
3
Linear Mixing Model and the Lattice Independent Component Analysis
The linear mixing model (LMM) [9] assumes that the data follows a linear model, which can be expressed as:

x = Σ_{i=1}^{M} a_i s_i + w = Sa + w,   (3)

where x is the d-dimensional pattern vector (the images acquired by the robot's camera in our case), S is the d × M matrix whose columns are the d-dimensional vertices of the convex region covering the data, the so-called endmembers s_i, i = 1, ..., M, a is the M-dimensional abundance vector, and w is the d-dimensional additive observation noise vector. The LMM is applied when some item is assumed to be the combination of several pure items, called endmembers. In [9] the items are light spectra in the context of hyperspectral image processing; here the items are the singular images which could be used as landmarks. Abundance coefficients correspond to the fraction of the contribution of each endmember to the observed item. From this physical interpretation it follows that the linear mixing model is subject to two constraints on the abundance coefficients. First, to be physically meaningful, all abundance coefficients must be non-negative, a_i ≥ 0, i = 1, ..., M, and, second, they must be fully additive, Σ_{i=1}^{M} a_i = 1. As a side effect, there is a saturation condition a_i ≤ 1, i = 1, ..., M. From a geometrical point of view, these restrictions mean that we expect the endmembers in S to be affinely independent and the convex region defined by them to cover all the data points. The model in Eq. 3 is shared by other linear analysis approaches, such as the ICA [6], which do not view S as a set of endmembers but as regressors or independent sources. The mixing inversion process (often called unmixing) consists in the estimation of the abundance coefficients, given the endmembers S and the observation
338
I. Villaverde, B. Fernandez-Gauna, and E. Zulueta
data x. The simplest approach is the unconstrained least squared error (ULSE) estimation given by:

a = (S^T S)^{-1} S^T x.   (4)

The coefficients that result from Equation (4) do not necessarily fulfill the non-negativity and full additivity conditions. From the physical interpretation point of view, the non-negativity restriction is the more fundamental one. The heuristic algorithm EIHA described in [5] always produces convex regions that lie inside the data cloud, so that enforcing the non-negativity and full additivity restrictions would be impossible for some data points, and enforcing them for others may introduce undesired distortions of their abundance values. Moreover, our attempts to use other unmixing techniques with our data have resulted in prohibitive computational times. Since this is a critical issue in mobile robotics, we systematically use the unconstrained estimation of Equation (4) to compute the abundance coefficients, as a compromise solution. We call Lattice Independent Component Analysis (LICA) the approach grounded in the results and algorithms described in [5,4]. LICA consists of two steps:

1. Induce from the given data a set of Strongly Lattice Independent vectors. In this paper we apply the Endmember Induction Heuristic Algorithm (EIHA) [5]. These vectors are taken as a set of affinely independent vectors. The advantages of this approach are (1) that we do not impose statistical assumptions, (2) that the algorithm is one-pass and very fast because it only uses comparisons and additions, (3) that it is unsupervised and incremental, and (4) that it naturally detects the number of endmembers.
2. Apply the unconstrained least squares estimation to obtain the mixing matrix. The localization results are based on the classification of the images using the coefficients of this matrix.
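The ULSE estimate of Eq. (4) can be sketched in a few lines of numpy. The function name and the toy endmember matrix below are ours, not part of the paper; in a noiseless mixture the true abundances are recovered exactly, while for real data the estimate need not be non-negative or fully additive, as noted above.

```python
import numpy as np

def unmix_ulse(S, x):
    """Unconstrained least squares abundance estimate, Eq. (4):
    a = (S^T S)^{-1} S^T x."""
    return np.linalg.solve(S.T @ S, S.T @ x)

# Toy data: d = 4 dimensional patterns, M = 2 endmembers (columns of S).
S = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5],
              [0.2, 0.9]])
a_true = np.array([0.3, 0.7])
x = S @ a_true          # noiseless linear mixture, Eq. (3) with w = 0

a_est = unmix_ulse(S, x)
```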
Therefore, the approach is a combination of linear and lattice computing: a linear component analysis whose components have been discovered by non-linear, lattice-theory-based algorithms. Our reasoning for applying LICA to vision-based mobile robot localization is as follows: when M ≪ d, the computation of the convex coordinates can be interpreted as a dimension reduction, or feature extraction, process, as used in the experiment described in Section 4. In contrast to the approach followed in [12], in which images were divided into windows and the ICA descriptor of each one was stored, we obtain a feature vector for the full image.
4
Experimental Validation
The tested approach can be summarized as follows: we try to perform the visual recognition of designated landmark positions by a classifier built in a supervised manner. We test two ways to compute the convex coordinates used as feature vectors for the images: the LICA and ICA approaches. The proposal is based on the features extracted from the images with a two-step process.
Table 1. Classification results using LICA (α = 7) and 3-nn

#Endmembers  Pass 1  Pass 2  Pass 3  Pass 4  Pass 5  Average
19           0.82    0.70    0.65    0.71    0.75    0.72
20           0.78    0.75    0.65    0.71    0.71    0.72
16           0.78    0.67    0.62    0.68    0.73    0.70
20           0.80    0.75    0.65    0.72    0.71    0.72
18           0.80    0.74    0.66    0.73    0.74    0.74
Average      0.80    0.72    0.65    0.71    0.73    0.72
Table 2. Classification results using LICA (α = 8) and 3-nn

#Endmembers  Pass 1  Pass 2  Pass 3  Pass 4  Pass 5  Average
8            0.63    0.59    0.56    0.63    0.60    0.60
9            0.59    0.54    0.46    0.53    0.61    0.55
8            0.67    0.61    0.54    0.60    0.57    0.60
10           0.65    0.55    0.48    0.60    0.57    0.57
8            0.54    0.54    0.43    0.50    0.41    0.48
Average      0.62    0.57    0.49    0.57    0.55    0.56
The first step consists in the induction of the endmembers from the data sample formed by the set of images captured by a robot along its travelled path. Those induced endmembers will be, in the second step, the basis for a linear unmixing of the image data. This linear unmixing will give as a result a vector of convex coordinates which will be used as the feature vector of the images. We extract the endmembers with the Endmember Induction Heuristic Algorithm (EIHA) [5] and with two variations of the ICA approach.

4.1 Map Building and Localization
This approach requires the full image data set that "describes" the path which is going to be mapped; it has to be recorded in a training step and processed afterwards. It is an off-line mapping algorithm. This image data set is composed of a sequence of optical images taken at regular intervals all along the path followed by the robot. From this training data set several positions are selected to act as landmarks. The selection of those positions can follow any arbitrary pattern. The optical views from these positions are transformed into the convex coordinates computed using the endmembers extracted from the whole image data set of the path by the EIHA, or the sources detected by the ICA. We assume that the selected positions divide the path into segments or regions. These path regions correspond to spatial regions where the reference landmark views are expected to be smoothly recognized (smooth recognition means that small displacements do not catastrophically modify the recognition), these regions being ideally adjacent and dense. Map building includes
Table 3. Classification results using Mean Field ICA and 3-nn

#Indep. Comp.  Pass 1  Pass 2  Pass 3  Pass 4  Pass 5  Average
5              0.32    0.31    0.31    0.30    0.24    0.30
10             0.27    0.30    0.26    0.23    0.24    0.26
15             0.36    0.33    0.32    0.34    0.32    0.33
20             0.27    0.26    0.21    0.25    0.21    0.24
25             0.69    0.62    0.54    0.65    0.53    0.61
Average        0.38    0.36    0.33    0.35    0.31    0.35
Table 4. Classification results using Molgedey and Schuster ICA and 3-nn

#Indep. Comp.  Pass 1  Pass 2  Pass 3  Pass 4  Pass 5  Average
5              0.48    0.49    0.50    0.42    0.39    0.45
10             0.70    0.57    0.54    0.55    0.58    0.59
15             0.76    0.61    0.57    0.64    0.62    0.64
20             0.81    0.69    0.62    0.74    0.69    0.71
25             0.82    0.69    0.62    0.73    0.67    0.71
Average        0.71    0.61    0.57    0.62    0.59    0.62
the construction of the feature vector classifier using these images as the training set. Robot self-localization is thus performed by classifying newly acquired images into one of the previously defined regions, using the stored images as representatives of the regions. For an input image the process is as follows: first, we perform the linear unmixing of the image with the basis of endmembers computed from the training path images; the convex coordinates are the feature vector. Second, we classify the image feature vector with the classifier trained on the training path. For the validation, we count a success if the actual robot position falls in the region defined by the landmark classifier. This mapping approach produces a topological appearance-based map, since no metric information is stored in the map, the robot only uses relative, non-precise positioning, and the localization is based on image matching.

4.2 Experimental Results
The experiments were performed over several pre-recorded image datasets; in this paper we show the results obtained over one of those datasets as a sample test. Each dataset was recorded by manually driving a Pioneer robot six times along a predefined path along the corridors of our building, acquiring images at intervals of 5-6 cm, along with the related odometry measurements. The paths try to simulate possible paths that a robot would travel in a hypothetical navigation task in that building. As the robots were guided manually, each one of the travels follows a slightly different path.
For each recorded path, the first trip was used to train the system parameters, and the five remaining trips were used as test sequences. The task to perform is to recognize a hand-selected set of spatial landmark positions given their respective views of the world as taken by the robot's camera. The landmark positions were selected on the floor plane, choosing places of practical relevance, such as doors to other laboratories. Classes of images are identified for each of the selected landmark positions, assigning the images in the sequences to the closest landmark map position with similar orientation, according to the corresponding robot odometry readings. This image labelling is the ground truth for the ensuing processes. The task therefore becomes the classification of the newly acquired images into one of the map classes. The classification was done using a 3-NN classifier. Tables 1 and 2 show the classification results obtained over the sample path using the proposed LICA approach. The EIHA noise tolerance parameter α has been tuned to α = 7 and α = 8, respectively, to obtain a range of desired numbers of endmembers. Since the EIHA has a random start, the results of 5 runs of the algorithm are shown, with different numbers of endmembers induced. Tables 3 and 4 show the classification results obtained using the abundance coefficients computed from the independent components extracted with two ICA algorithms (Mean Field ICA and Molgedey and Schuster ICA). The tables show the results obtained with several numbers of independent components. It can be appreciated that the LICA approach clearly outperforms the Mean Field ICA in all cases, with similar or even greater dimensionality reduction, while performing slightly better than the Molgedey and Schuster ICA in some cases, with overall similar performance.
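The 3-NN classification of convex-coordinate feature vectors described above can be sketched as follows. This is a minimal numpy implementation with hypothetical toy features; the actual experiments classify 4096-dimensional image-derived feature vectors.

```python
import numpy as np

def knn3_predict(train_X, train_y, query):
    """Classify a feature vector by majority vote among its 3 nearest
    training vectors under the Euclidean distance (3-NN rule)."""
    d = np.linalg.norm(train_X - query, axis=1)
    nearest = train_y[np.argsort(d)[:3]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

# Toy convex-coordinate features for two hypothetical landmark classes.
train_X = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
                    [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]])
train_y = np.array([0, 0, 0, 1, 1, 1])

pred = knn3_predict(train_X, train_y, np.array([0.85, 0.15]))
```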
5
Summary and Conclusions
We have proposed and applied Lattice Independent Component Analysis (LICA) to appearance-based mobile robot localization. LICA is based on the application of the Lattice Computing based EIHA algorithm for the selection of the endmembers, and on the linear unmixing of the data based on these endmembers. We have discussed the similarities of our approach to the application of ICA to the same problem. In our approach the salient views acquired along the path correspond to the endmembers detected by the EIHA algorithm, and the spatial mixing coefficients correspond to the convex coordinates obtained by unmixing the recorded images on the basis of the found endmembers. The LICA approach then uses this set of vectors to compute the abundance coefficients that characterize the data relative to the endmembers. Over these coefficients we perform the robot localization, consisting in the classification of the views into the map classes. The results in Section 4 show that the convex coordinates of the data points based on the endmembers induced by the EIHA algorithm can be used as features for pattern classification. The results show that this approach improves on the Mean Field ICA approach, while on average it performs similarly to the Molgedey and Schuster ICA for the different numbers of sources tried, improving on it in some cases. Hierarchical issues [2] will be considered in future work.
References
1. ICA:DTU Toolbox, http://isp.imm.dtu.dk/toolbox/ica/index.html
2. Graña, M., Torrealdea, F.: Hierarchically structured systems. European Journal of Operational Research 25, 20–26 (1986)
3. Graña, M.: A brief review of Lattice Computing. In: IEEE International Conference on Fuzzy Systems, FUZZ-IEEE 2008 (IEEE World Congress on Computational Intelligence), June 2008, pp. 1777–1781 (2008)
4. Graña, M., Savio, A.M., García-Sebastián, M., Fernandez, E.: A Lattice Computing approach for on-line fMRI analysis. Image and Vision Computing (in press, corrected proof, 2009)
5. Graña, M., Villaverde, I., Maldonado, J.O., Hernandez, C.: Two Lattice Computing approaches for the unsupervised segmentation of hyperspectral images. Neurocomputing 72(10-12), 2111–2120 (2009)
6. Hyvärinen, A., Karhunen, J., Oja, E.: Independent Component Analysis. John Wiley and Sons, Chichester (2001)
7. Højen-Sørensen, P., Winther, O., Hansen, L.K.: Mean-field approaches to independent component analysis. Neural Computation 14(4), 889–918 (2002)
8. Jones, S., Andresen, C., Crowley, J.: Appearance based process for visual navigation. In: Proceedings of the 1997 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 1997, September 1997, vol. 2, pp. 551–557 (1997)
9. Keshava, N., Mustard, J.: Spectral unmixing. IEEE Signal Processing Magazine 19(1), 44–57 (2002)
10. Kröse, B., Vlassis, N., Bunschoten, R.: Omnidirectional vision for appearance-based robot localization. In: Hager, G.D., Christensen, H.I., Bunke, H., Klein, R. (eds.) Dagstuhl Seminar 2000. LNCS, vol. 2238, pp. 39–50. Springer, Heidelberg (2002)
11. Molgedey, L., Schuster, H.G.: Separation of a mixture of independent signals using time delayed correlations. Physical Review Letters 72, 3634–3637 (1994)
12. Munguia, R., Grau, A., Sanfeliu, A.: Matching images features in a wide base line with ICA descriptors. In: 18th International Conference on Pattern Recognition, ICPR 2006, vol. 2, pp. 159–162 (2006)
13. Ritter, G.X., Gader, P.: Fixed points of Lattice Transforms and Lattice Associative Memories. In: Advances in Imaging and Electron Physics, vol. 144, pp. 165–242. Elsevier, Amsterdam (2006)
14. Ritter, G.X., Urcid, G., Schmalz, M.: Autonomous single-pass endmember approximation using Lattice Auto-Associative Memories. Neurocomputing 72(10-12), 2101–2110 (2009)
15. Sim, R., Dudek, G.: Learning generative models of scene features. In: Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001, vol. 1, pp. 406–412 (2001)
16. Sussner, P., Valle, M.: Gray-scale Morphological Associative Memories. IEEE Transactions on Neural Networks 17(3), 559–570 (2006)
17. Ulrich, I., Nourbakhsh, I.: Appearance-based place recognition for topological localization. In: Proceedings of IEEE International Conference on Robotics and Automation, ICRA 2000, vol. 2, pp. 1023–1029 (2000)
18. Urcid, G., Valdiviezo, J.C.: Generation of lattice independent vector sets for pattern recognition applications. In: Ritter, G.X., Schmalz, M.S., Barrera, J., Astola, J.T. (eds.) Proc. of SPIE 2007, Mathematics of Data/Image Pattern Recognition, Compression, Coding and Encryption with Applications X, vol. 6700, pp. 67000C:1–12. SPIE, San Jose (2007)
An Introduction to the Kosko Subsethood FAM

Peter Sussner and Estevão Esmi

Department of Applied Mathematics, University of Campinas, Campinas, State of São Paulo, Brazil
Abstract. Inspired by the fact that in (fuzzy) mathematical morphology a (fuzzy) erosion is defined in terms of a (fuzzy) inclusion measure, we introduce a nondistributive fuzzy morphological associative memory model on the basis of the Kosko subsethood measure. Moreover, we compare the error correction capabilities of the new model and of other fuzzy and gray-scale associative memories in terms of some experimental results concerning gray-scale image reconstruction. Keywords: Fuzzy associative memory, mathematical morphology, fuzzy erosion, Kosko subsethood measure, gray-scale image reconstruction.
We have recently proposed a very general class of fuzzy associative memories (FAMs) called fuzzy morphological associative memories (FMAMs) [24] that includes many well-known FAM models. FMAMs grew out of a gray-scale associative memory model called morphological associative memory (MAM) [14,21]. In this context, the term "morphological" refers to the fact that the nodes of (fuzzy) morphological associative memories execute elementary operations of mathematical morphology (MM) as defined in a complete lattice setting, such as the extended reals or integers in the case of MAMs and the unit interval [0, 1] in the case of FMAMs [24]. Unlike Kosko's original FAM model, the KS-FAM introduced in this paper does not comply with this definition of FMAM. Instead, we found inspiration in the roots of MM, which lie in the processing and analysis of images using "structuring elements" [12]. For example, in fuzzy mathematical morphology, a fuzzy erosion of an image by a structuring element (SE) is given by the degree of inclusion of the translated SE at every pixel [13,20]. The label "morphological" may be attached to the KS-FAM because each hidden node of this two-layer model performs a type of fuzzy erosion, with Kosko's subsethood [10] playing the role of the inclusion measure. Just like other FAM models and the original MAM, the KS-FAM can be used to store and retrieve gray-scale patterns. Therefore, this paper includes experiments concerning the reconstruction of gray-scale images from corrupted image cues, comparing the KS-FAM with other gray-scale AMs such as the Hamming AM, the MAM WXX, the MAM WXX + ν, Kosko's FAM, the kernel associative memory (KAM), and the optimal linear associative memory (OLAM).
This work was supported by FAPESP under grant no. 2006/05868-5 and by CNPq under grant no. 306040/2006-9.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 343–350, 2010. © Springer-Verlag Berlin Heidelberg 2010
344
P. Sussner and E. Esmi
1 Some Mathematical Background

The KS-FAM model introduced in this paper incorporates concepts of mathematical morphology in a different way than the original MAM and FMAM models. The following background information on MM and MAMs is indispensable in order to grant some insight into these issues.

1.1 Some Basic Notions of Mathematical Morphology

Complete lattices are generally accepted as the appropriate mathematical framework of MM [8,16,17]. Let us review a few basic concepts of lattice theory and MM on complete lattices. A partially ordered set L is called a complete lattice if and only if every non-empty subset of L has an infimum and a supremum in L [1]. Examples of complete lattices are given by R±∞ = R ∪ {+∞, −∞} and R^n_±∞ = (R±∞)^n. For any Y ⊆ L, we denote the infimum of Y by ∧Y and the supremum of Y by ∨Y. An (algebraic) erosion ε is defined as a mapping from a complete lattice L to a complete lattice M that commutes with the infimum operator. Formally, we have

ε(∧Y) = ∧_{y∈Y} ε(y);   (1)
Similarly, an (algebraic) dilation δ is defined as a mapping from a complete lattice L to a complete lattice M that commutes with the supremum operator. Instead of providing more details about MM on complete lattices, let us recall the origins of MM as a set-theoretical approach to binary image processing [8,12]. Later, MM was extended to gray-scale image processing. Fuzzy mathematical morphology represents one of the approaches towards gray-scale MM [6,13,20]. Let F(X) = [0, 1]^X denote the class of fuzzy sets in X. The fuzzy erosion of an image a ∈ F(X) by a structuring element (SE) s ∈ F(X) arises via the following definition:

E_F(a, s)(x) = Inc_F(s_x, a),   (2)

where Inc_F is a fuzzy inclusion measure [13,20] that fuzzifies the crisp inclusion measure Inc : P(X) × P(X) → {0, 1} (Inc(A, S) = 1 ⇔ A ⊆ S) and s_x is the translation of the fuzzy SE s by x given by s_x(y) = s(y − x) for all y ∈ X. Thus the fuzzy erosion E_F(a, s) at point x is given by a degree of inclusion of the translated SE s_x in the fuzzy image a.

1.2 Basic Concepts of Morphological Associative Memories

Morphological associative memories (MAMs) belong to the class of morphological neural networks (MNNs) [15,24]. Despite the name "morphological neural network", the first MNN models were based on minimax algebra, a lattice algebra that originated from problems in machine scheduling and operations research [3,4]. J.L. Davidson exposed the close relationship between MM and minimax algebra by embedding classical binary and gray-scale MM into minimax algebra [5].
Let us consider the following special cases of matrix products that are defined in minimax algebra [3,4]. For A ∈ R^{m×p}_±∞ and B ∈ R^{p×n}_±∞, the matrix C = A ∨ B, also called the max product of A and B, and the matrix D = A ∧ B, also called the min product of A and B, are defined by

c_ij = ∨_{k=1}^{p} (a_ik + b_kj),   d_ij = ∧_{k=1}^{p} (a_ik + b_kj).   (3)
Let A ∈ R^{m×n}. If ε_A and δ_A are such that ε_A(x) = A ∧ x and δ_A(x) = A ∨ x for all x ∈ R^n_±∞, then ε_A represents an (algebraic) erosion and δ_A represents an (algebraic) dilation from the complete lattice R^n_±∞ into the complete lattice R^m_±∞. These operations are employed in the recall phase of MAMs. For simplicity, we restrict ourselves to patterns with entries in R ⊆ R±∞. Suppose that we want to record k vector pairs (x^1, y^1), ..., (x^k, y^k) using a morphological associative memory [14,21]. Let X denote the matrix in R^{n×k} whose column vectors are the vectors x^ξ ∈ R^n and let Y denote the matrix in R^{m×k} whose column vectors are the vectors y^ξ ∈ R^m, where ξ = 1, ..., k. There are two possible recording schemes for MAMs, resulting in weight matrices M_XY and W_XY. The first recording scheme consists in constructing an m × n matrix M_XY as follows:

M_XY = Y ∨ (−X^t).   (4)

The second, dual scheme consists in constructing an m × n matrix W_XY of the form W_XY = Y ∧ (−X^t). The recall phases of M_XY and W_XY are respectively given in terms of the erosion ε_{M_XY} and the dilation δ_{W_XY}. If Y = X then we obtain the auto-associative morphological memories (AMMs) M_XX and W_XX. Consider M_XX in the binary case where X ∈ {0, 1}^{n×k}. In addition, assume that M_XX ∈ {0, 1}^{n×n}. We previously observed that each entry of the min product M_XX ∧ x can be computed by evaluating the crisp inclusion of a certain SE s ∈ {0, 1}^n in x. Specifically, let m_i ∈ {0, 1}^{1×n} be the i-th row of M_XX and let m^t_i = (m_i)^t. If m̄^t_i = 1 − m^t_i denotes the complement of m^t_i, then we have [22]:

m_i ∧ x = { 1 if m̄^t_i ≤ x, 0 otherwise },   ∀ i = 1, ..., n.   (5)

Hence, m_i ∧ x computes the crisp inclusion of the SE m̄^t_i in x.
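A minimal numpy sketch of the max and min products of Eq. (3) and of the recording scheme of Eq. (4) follows (the function names are ours). It also illustrates the perfect-recall property of the auto-associative memory M_XX for uncorrupted stored patterns [14,21].

```python
import numpy as np

def max_product(A, B):
    """c_ij = max_k (a_ik + b_kj), the max product of Eq. (3)."""
    return (A[:, :, None] + B[None, :, :]).max(axis=1)

def min_product(A, B):
    """d_ij = min_k (a_ik + b_kj), the min product of Eq. (3)."""
    return (A[:, :, None] + B[None, :, :]).min(axis=1)

# Record k = 3 random patterns auto-associatively, Eq. (4) with Y = X.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(5, 3))
M_XX = max_product(X, -X.T)

# Erosion-based recall M_XX min-product x retrieves every stored
# pattern exactly when the input is uncorrupted.
recalled = min_product(M_XX, X)
```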
2 Introduction to the Kosko Subsethood Fuzzy Associative Memory

The AMMs M_XX and W_XX have several desirable properties, such as unlimited absolute storage capacity and one-step convergence when used as a dynamic model with feedback [14,21]. On the downside, these AMMs exhibit a limited error correction capability and many spurious memories. In an attempt to improve the noise tolerance of the binary M_XX model, we fuzzified Equation 5 [22]. The following observation proved to be useful for this purpose: If
f : [0, 1]^n → {0, 1}^n is the hard-limiting function defined below and if S : [0, 1]^n × [0, 1]^n → [0, 1] is Kosko's subsethood measure, then the following equations hold for all i = 1, ..., n:

(M_XX ∧ x)_i = f(S(m̄^t_i, x)), where f(x) = { 0 if x < 1, 1 else },   (6)

and S(x, y) = (Σ_{i=1}^n x_i ∧ y_i) / (Σ_{i=1}^n x_i) for x ≠ 0 ∈ [0, 1]^n.   (7)

By leaving away the hard-limiter f in Equation 6, we obtain the fuzzy min product of M_XX and x, denoted by M_XX ∧̃ x (more generally, evaluating Kosko's subsethood of the complement of the i-th row of A ∈ [0, 1]^{n×k} in the j-th column of B ∈ [0, 1]^{k×m} yields c_ij, where C = A ∧̃ B). In the terminology of MM, the i-th entry of M_XX ∧̃ x corresponds to the degree to which the SE m̄^t_i is a subset of the fuzzy image x.

Example 1. The following example illustrates the action of the fuzzy min product between the matrix M_XX and binary input vectors x and y:

X = [1 0; 1 0; 1 1; 0 1],  M_XX = [0 0 0 1; 0 0 0 1; 1 1 0 1; 1 1 0 0],  x = (0, 1, 1, 1)^t,  y = (1, 0, 1, 0)^t,   (8)

M_XX ∧̃ x = (2/3, 2/3, 1, 1)^t,  M_XX ∧̃ y = (2/3, 2/3, 1, 1/2)^t.   (9)

The following scheme yields a successful approach towards a binary AMM [22]:

input x → M_XX ∧̃ x → Defuzzification → output y   (10)
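Kosko's subsethood measure and the fuzzy min product can be sketched in numpy and checked against the values given in Example 1 (the function names are ours):

```python
import numpy as np

def kosko_S(a, b):
    """Kosko's subsethood S(a, b), Eq. (7): degree to which a is a
    subset of b, for a != 0."""
    return np.minimum(a, b).sum() / a.sum()

def fuzzy_min_product(M, x):
    """i-th entry = S(complement of the i-th row of M, x), i.e. the
    fuzzy min product obtained by dropping the hard-limiter in Eq. (6)."""
    return np.array([kosko_S(1.0 - row, x) for row in M])

# Data of Example 1.
M_XX = np.array([[0, 0, 0, 1],
                 [0, 0, 0, 1],
                 [1, 1, 0, 1],
                 [1, 1, 0, 0]], dtype=float)
x = np.array([0.0, 1.0, 1.0, 1.0])
y = np.array([1.0, 0.0, 1.0, 0.0])

out_x = fuzzy_min_product(M_XX, x)   # Example 1: (2/3, 2/3, 1, 1)
out_y = fuzzy_min_product(M_XX, y)   # Example 1: (2/3, 2/3, 1, 1/2)
```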
Another approach, consisting of a two-layer binary MAM, was previously suggested to reduce the number of spurious memories in the auto- and hetero-associative cases [23]. The Kosko subsethood FAM that we introduce in this paper combines the advantages of both approaches and provides a generalization to the gray-scale case. Specifically, let {(x^1, y^1), ..., (x^k, y^k)} be the set of (fuzzy) fundamental memories, i.e. the associations of fuzzy patterns to be stored. Let X = [x^1, ..., x^k] ∈ [0, 1]^{n×k} and Y = [y^1, ..., y^k] ∈ [0, 1]^{m×k} denote the matrices whose columns are respectively the input vectors and the output vectors. The KS-FAM model requires the choice of a matrix of auxiliary patterns Z = [z^1, ..., z^k] ∈ {0, 1}^{p×k} that satisfies the following equations:

∨_{ξ=1}^{k} z^ξ = 1,  z^ξ ≰ z^γ,  and  z^ξ ∧ z^γ = 0,  ∀ γ ≠ ξ.   (11)

For all practical purposes, we may select Z to be the k × k identity matrix. Given an input x ∈ [0, 1]^n, the KS-FAM model produces an output y according to the following equations:

w = h(M_XZ ∧̃ x),  y = W_ZY ∨ w,   (12)

where h(z) = (h(z_1), ..., h(z_p))^t is given by

h(z_i) = { 1 if z_i ≥ ∨_{j=1}^{p} z_j, 0 else },  for all i = 1, ..., p.   (13)
The KS-FAM represents a two-layer neural network. After computing the fuzzy min product M_XZ ∧̃ x, the defuzzification operator h is applied, which results in a competition among the hidden neurons. The activation of the hidden nodes exhibiting the highest values leads to the activation of the corresponding patterns y^ξ. Note that the i-th hidden node calculates h(S(m̄^t_i, x)), where m̄^t_i is given by the complemented transpose of the i-th row of M_XZ. Thus, the aggregation function of the i-th hidden node evaluates a type of fuzzy inclusion of the input pattern x in m̄^t_i. This operation can be viewed as an erosion by a structuring element in the wide sense of MM, but not as an algebraic erosion, since S(·, x) does not generally commute with the infimum operator [20]. The output layer computes a dilation (in both the lattice-algebraic and the broad sense) given by the operator δ_{W_ZY}.
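The two-layer recall of Eqs. (12)–(13) can be sketched as follows, assuming Z is the k × k identity matrix. Under that assumption (a derivation of ours, not spelled out in the paper), the i-th row of M_XZ = Z ∨ (−X^t) equals 1 − (x^i)^t, so the hidden layer simply scores Kosko's subsethood of each stored input pattern in the cue, and W_ZY = Y ∧ (−Z^t) reduces to Y − 1; the function names are ours.

```python
import numpy as np

def kosko_S(a, b):
    """Kosko's subsethood degree of a in b, Eq. (7)."""
    return np.minimum(a, b).sum() / a.sum()

def ks_fam_recall(X, Y, x):
    """One KS-FAM recall step, Eqs. (12)-(13), for Z = k x k identity."""
    k = X.shape[1]
    # Hidden layer: fuzzy min product reduces to subsethood scores S(x^i, x).
    scores = np.array([kosko_S(X[:, i], x) for i in range(k)])
    # Competition h, Eq. (13): winners are the maximal scores.
    w = (scores >= scores.max()).astype(float)
    # Output dilation y = W_ZY max-product w, with W_ZY = Y - 1 for Z = I.
    return ((Y - 1.0) + w).max(axis=1)

# Two stored associations (columns of X and Y), chosen so that neither
# input pattern is a fuzzy subset of the other.
X = np.array([[0.9, 0.1],
              [0.1, 0.8],
              [0.2, 0.3]])
Y = np.array([[1.0, 0.0],
              [0.0, 1.0]])

y_out = ks_fam_recall(X, Y, X[:, 0])   # retrieves the associated Y[:, 0]
```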
3 Experimental Results

In this section, we compare the KS-FAM with other fuzzy and gray-scale associative memories in some simulations using gray-scale (fuzzy) images (recall that a gray-scale image can be identified with a fuzzy set). Specifically, we recorded the normalized root mean square error (NRMSE) after presentation of an imperfect image cue to the KS-FAM, the Hamming net [7,11], the MAM W_XX [14], the MAM W_XX + ν [21], Kosko's max-min FAM [10], the KAM [25], and the OLAM [9]. Figure 1 displays images of size 64 × 64 with 256 gray levels representing downsized versions of images that are contained in the database of the Computer Vision Group, University of Granada, Spain [2]. By applying the row-scan method to each of the four images, we generated fuzzy vectors x^ξ ∈ [0, 1]^{4096} of length 4096 for ξ = 1, ..., 4, which were used in the experiments. Applying the KS-FAM to each one of these vectors resulted in perfect recall.
Fig. 1. Original images used in the experiments
3.1 Variations in Brightness and Orientation

In this experiment we modified the brightness and the orientation of the original images. Specifically, we subtracted a positive constant (0.35 in the fuzzy case and 89 in the gray-scale case) from the tree image and added the same constant to the Lena image. The resulting pixel values were thresholded at the lower and upper boundaries of the fuzzy domain [0, 1] or gray-scale domain [0, 255]. In addition, we rotated the church image
348
P. Sussner and E. Esmi
by 10 degrees to the left and the cameraman image by 10 degrees to the right. Figure 2 depicts the results of this experiment for the brightened version of the Lena image. The KS-FAM succeeded in perfectly retrieving all four original images. The Hamming net was unable to deal with the variations in lighting. Apart from the KS-FAM, the best overall performance was achieved by the KAM model. Table 1 summarizes the results of this experiment in terms of the NRMSE.
Fig. 2. From left to right and from top to bottom, the first image shows a brightened version of the Lena image. The remaining images correspond to the outputs of the KS-FAM, the Hamming net, the MAM W_XX, the MAM W_XX + ν, Kosko's FAM, the KAM, and the OLAM.

Table 1. NRMSEs produced by AM models in applications to patterns exhibiting variations in brightness and orientation

            KS-FAM  Hamming net  W_XX    W_XX + ν  Kosko's FAM  KAM     OLAM
Tree        0       0.6347       0.4771  0.6032    0.4302       0.1945  0.4986
Lena        0       0.8414       0.7354  0.4615    0.8937       0.1499  0.6810
Church      0       0            1.6015  0.6168    1.1586       0.0566  0.2892
Cameraman   0       0            0.9509  0.4765    0.7300       0.0784  0.1937
3.2 Noisy Patterns

Finally, we corrupted the original images by introducing Gaussian noise of zero mean and variance 0.03. Figure 3 visualizes the outputs produced by the aforementioned associative memories in applications to the corrupted church image that can be found in the top left-hand corner. Table 2 shows the NRMSEs in 100 experiments for each pattern x^ξ, ξ = 1, ..., 4. Both the KS-FAM and the Hamming net achieved perfect recall of the four original images. The KAM and the OLAM also exhibited a very satisfactory tolerance with respect to this type of noise.

Table 2. NRMSEs produced by AM models in applications to noisy input patterns

                             KS-FAM  Hamming net  W_XX    W_XX + ν  Kosko's FAM  KAM     OLAM
Gaussian noise (σ² = 0.03)   0       0            0.9005  0.2770    0.8185       0.0137  0.0365
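The experimental protocol above (corrupt a stored fuzzy image with Gaussian noise, then score the recall with the NRMSE) can be sketched as follows. Note that the paper does not spell out its normalization convention for the NRMSE, so dividing by the norm of the original pattern below is an assumption of ours.

```python
import numpy as np

def nrmse(original, recalled):
    """Normalized root mean square error between a stored pattern and a
    recalled one; normalization by the norm of the original is one
    common convention, assumed here."""
    return np.linalg.norm(recalled - original) / np.linalg.norm(original)

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=4096)   # a fuzzy image vector, as in the experiments

# Corrupt with zero-mean Gaussian noise of variance 0.03, clipped to [0, 1].
noisy = np.clip(x + rng.normal(0.0, np.sqrt(0.03), size=x.shape), 0.0, 1.0)

perfect = nrmse(x, x)     # 0 corresponds to perfect recall
degraded = nrmse(x, noisy)
```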
An Introduction to the Kosko Subsethood FAM
349
Fig. 3. From left to right and from top to bottom, the first image shows a corrupted version of the church image containing Gaussian noise of zero mean and variance 0.03. The remaining images correspond to the outputs of the KS-FAM, the Hamming net, the MAM W_XX, the MAM W_XX + ν, Kosko's FAM, the KAM, and the OLAM.
4 Concluding Remarks

This paper presents the Kosko subsethood FAM on the basis of ideas from mathematical morphology. Preliminary experiments in gray-scale image reconstruction indicate its potential utility for applications in pattern recognition. In fact, the KS-FAM was the only model among the distributive and non-distributive associative memories we tested that achieved perfect recall in all our experiments. However, we would like to caution that the KS-FAM model tolerates neither arbitrary variations in brightness and orientation nor excessive amounts of noise.
An Increasing Hybrid Morphological-Linear Perceptron with Evolutionary Learning and Phase Correction for Financial Time Series Forecasting

Ricardo de A. Araújo¹ and Peter Sussner²

¹ Information Technology Department, [gm]2 Intelligent Systems, Brazil
² Department of Applied Mathematics, University of Campinas, Brazil
[email protected], [email protected]
Abstract. In this paper we present a model suited to the financial time series forecasting problem, called the increasing hybrid morphological-linear perceptron (IHMP). An evolutionary training algorithm based on a modified genetic algorithm (MGA) is presented to design the IHMP (learning process). The learning process includes an automatic phase correction step that is geared at eliminating the time phase distortions that typically occur in financial time series forecasting. Furthermore, we compare the proposed IHMP with other neural and statistical models using two complex nonlinear problems of financial forecasting.

Keywords: Lattice Theory, Minimax Algebra, Morphological Neural Networks, Genetic Algorithms, Financial Time Series Forecasting.
1 Introduction

In the last few years, morphological neural networks (MNNs) have been proposed for a wide range of applications [1, 2, 3, 4, 5]. MNNs are based on the framework of mathematical morphology (MM), whose algebraic foundations can be found in lattice theory [6, 7, 8]. Originally, MM was developed for the processing and analysis of images using structuring elements (SEs) [9, 10]. In contrast to traditional artificial neural network (ANN) models, the aggregation functions of MNNs perform operations of MM instead of conventional linear operations [11].

Morphological neural networks have only very recently found applications in the domain of financial time series forecasting [12, 13, 14], whereas conventional ANN models have been successfully used for nonlinear modeling of time series for at least two decades [15, 16, 17]. ANNs typically require setting a series of system parameters, some of which are not always easy to determine. In the particular case of time series forecasting, another crucial element that needs to be determined beforehand is the set of relevant time lags to represent the series [13, 14].

Given the fact that financial time series exhibit a strong linear component as well as a weaker nonlinear component, this paper proposes a hybrid model, called the increasing hybrid morphological-linear perceptron (IHMP), consisting of a convex combination of a nonlinear increasing morphological perceptron [18] (because experimental results presented in [13, 14] indicate that financial forecasting models can be assumed to be increasing) and a linear perceptron [19]. IHMP learning employs a modified genetic algorithm (MGA) [20]. Moreover, the learning process includes an automatic phase correction step that is geared at eliminating the time phase distortions that typically

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 351–358, 2010.
© Springer-Verlag Berlin Heidelberg 2010
occur in financial forecasting (“random walk dilemma”) [16, 13, 14]. Furthermore, two complex nonlinear problems of financial prediction are used to compare the proposed IHMP with other prediction models found in the literature. The paper concludes with a discussion of the IHMP model and its performance in prediction problems.
2 The Random Walk Dilemma

A naive prediction strategy is to use the last observation of a time series as the prediction of its future value (x̂_{t+1} = x_t). This kind of model is known as the random walk (RW) model [16] and is determined by the following equation:

x_t = x_{t−1} + r_t,   (1)

where x_t is the current observation, x_{t−1} is the observation immediately before x_t, and r_t is a noise term with a Gaussian distribution of zero mean and standard deviation σ (r_t ∼ N(0, σ)). This behavior is common in finance and economics and is called the random walk dilemma or random walk hypothesis [16]. Assuming that an accurate prediction model is used to build an estimate of x_t, denoted by x̂_t, the expected value E[·] of the difference between x̂_t and x_t must tend to zero:

E[x̂_t − x_t] → 0.   (2)

If the time series generating phenomenon is supposed to have a strong random walk linear component and a very weak nonlinear component (denoted by g(t)), so that x_t = x_{t−1} + g(t) + r_t, and assuming that E[r_t] = 0 and E[r_t r_k] = 0 (∀ k ≠ t), the expected value of x_t will be

E[x_t] → E[x_{t−1}] + E[g(t)] + E[r_t].   (3)

If E[g(t)] → 0, then E[x_{t−1}] + E[g(t)] + E[r_t] → E[x_{t−1}] and E[x_t] → E[x_{t−1}]. Under these conditions, escaping the random walk dilemma is a hard task [16].
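A minimal simulation illustrates the dilemma: for a pure random walk, the naive last-value forecast leaves a residual that is exactly the unpredictable noise term, so no model can beat it on average. The values of σ and the series length are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, T = 0.1, 10_000

# Random walk of Equation 1: x_t = x_{t-1} + r_t with r_t ~ N(0, sigma).
r = rng.normal(0.0, sigma, size=T)
x = np.cumsum(r)

# Naive RW forecast: the last observation predicts the next one.
errors = x[1:] - x[:-1]      # equals r[1:], the pure noise term

mean_err = errors.mean()     # tends to 0, cf. Equation 2
var_err = errors.var()       # tends to sigma**2
```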
3 Background Information on Morphological Neural Networks

In a general, yet rigorous way, a morphological neural network (MNN) is defined as the type of artificial neural network that performs an elementary operation of mathematical morphology (MM) between complete lattices at every node, possibly followed by an activation function [4, 5]. Recall that complete lattices provide an appropriate algebraic framework for MM [6, 7, 8]. This insight was acquired at later stages of the development of MM. In this paper, we adhere to the rigorous, lattice-algebraic definition of an MNN.

A partially ordered set L is called a lattice if and only if every finite, non-empty subset of L has an infimum and a supremum in L [21]. For any X ⊆ L, we denote the infimum of X by the symbol ⋀X, and we write ⋀_{j∈J} x_j instead of ⋀X if X = {x_j : j ∈ J} for an index set J. We use similar notations to denote the supremum of X. Let L and M be lattices. A mapping Ψ : L → M is called increasing if and only if the following statement is true for all x, y ∈ L:

x ≤ y ⇒ Ψ(x) ≤ Ψ(y).   (4)
If L is a lattice, a partial order on L^n can be defined by setting

(x_1, ..., x_n) ≤ (y_1, ..., y_n) ⇔ x_i ≤ y_i, i = 1, ..., n.   (5)

The resulting partially ordered set L^n is also a lattice and is called the product lattice. A lattice L is complete if every non-empty (finite or infinite) subset has an infimum and a supremum in L [21]. If L is a complete lattice then the product lattice L^n is also complete. Complete lattices are widely accepted as the appropriate theoretical framework for mathematical morphology [6, 7, 8]. A central issue in this setting is the decomposition of mappings between complete lattices in terms of elementary operations. Let ε (an algebraic erosion) and δ (an algebraic dilation) be operators from a complete lattice L to a complete lattice M. Banon and Barrera have provided several theorems on the constructive decomposition of mappings between complete lattices in terms of elementary operations of MM [18]. In particular, Banon and Barrera's constructive decomposition of increasing mappings leads to the following theorem:

Theorem 1. An increasing mapping Ψ : L → M between complete lattices L and M can be represented either as a supremum of erosions or as an infimum of dilations. Formally, there exist erosions ε^i and dilations δ^j for some index sets I and J such that

Ψ = ⋁_{i∈I} ε^i = ⋀_{j∈J} δ^j.   (6)
Banon and Barrera's decomposition theorems have (implicitly) served as the basis for the learning algorithms of several MNN models. In these models, the elementary morphological operators occurring in the decomposition are assumed to adopt a special form which requires an additional algebraic structure besides the complete lattice structure [4, 5]. In this paper, we focus on the complete lattice R_{±∞} since financial time series prediction problems can be modeled in terms of functions R^n_{±∞} → R_{±∞} (where n is the number of antecedents or time lags). Given a matrix A ∈ R^{m×p}_{±∞} and a matrix B ∈ R^{p×n}_{±∞}, the matrix C = A ∨ B, called the max-product of A and B, and the matrix D = A ∧ B, called the min-product of A and B, are defined by the following equations:

c_{ij} = ⋁_{k=1}^{p} (a_{ik} + b_{kj}),   d_{ij} = ⋀_{k=1}^{p} (a_{ik} + b_{kj}).   (7)

Consider the following operators ε_A, δ_A : R^n_{±∞} → R^m_{±∞} for A ∈ R^{n×m}:

ε_A(x) = A^T ∧ x,   (8)
δ_A(x) = A^T ∨ x,   (9)

where ·^T denotes transposition. The operators ε_A and δ_A represent respectively an (algebraic) erosion and an (algebraic) dilation from the complete lattice R^n_{±∞} to the complete lattice R^m_{±∞} [5]. In an upcoming paper, we will prove that every erosion ε : R^n_{±∞} → R^m_{±∞} is of the form ε_A and every dilation δ : R^n_{±∞} → R^m_{±∞} is of the form δ_A. This statement together with Equation 6 suggests that an increasing function Ψ : R^n → R can be approximated in terms of vectors v^i, w^j ∈ R^n and some finite index sets Ī and J̄ as follows:

Ψ ≈ ⋁_{i∈Ī} ε_{v^i}   or   Ψ ≈ ⋀_{j∈J̄} δ_{w^j}.   (10)
The hypothesis of Equation 10 provides the basis for our estimation of financial time series by means of morphological perceptrons. A further discussion is beyond the scope of the paper.
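The max- and min-products of Equation 7 and the operators of Equations 8 and 9 can be sketched in NumPy as follows; the matrices and vectors at the end are arbitrary illustration values, and the comparison they enable is the increasingness property that Theorem 1 builds on.

```python
import numpy as np

def max_product(A, B):
    # Max-product of Equation 7: c_ij = max_k (a_ik + b_kj).
    return np.max(A[:, :, None] + B[None, :, :], axis=1)

def min_product(A, B):
    # Min-product of Equation 7: d_ij = min_k (a_ik + b_kj).
    return np.min(A[:, :, None] + B[None, :, :], axis=1)

def erosion(A, x):
    # Erosion of Equation 8: eps_A(x) = A^T (min-product) x.
    return min_product(A.T, x.reshape(-1, 1)).ravel()

def dilation(A, x):
    # Dilation of Equation 9: delta_A(x) = A^T (max-product) x.
    return max_product(A.T, x.reshape(-1, 1)).ravel()

A = np.array([[0.0, 1.0], [2.0, -1.0], [0.5, 0.0]])  # A in R^{3x2}: n = 3, m = 2
x = np.array([1.0, 0.0, 2.0])
y = np.array([2.0, 1.0, 3.0])                        # x <= y componentwise

# Erosions and dilations are increasing: x <= y implies eps_A(x) <= eps_A(y).
ex, ey = erosion(A, x), erosion(A, y)
dx, dy = dilation(A, x), dilation(A, y)
```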
4 The Proposed Increasing Hybrid Morphological-Linear Perceptron

We conducted a number of experiments that led us to believe that the financial time series considered in this paper are given by increasing functions Ψ : R^n → R, where n represents the number of antecedents or time lags. The proposed increasing hybrid morphological-linear perceptron (IHMP) has a morphological module as well as a linear module whose outputs are linearly combined to yield the final output. We differentiate between the erosion-based IHMP (E-IHMP) and the dilation-based IHMP (D-IHMP). Specifically, the E-IHMP model is given by the following equations:

y = λα + (1 − λ)β, λ ∈ [0, 1],   (11)

where

β = x · b^T = x_1 b_1 + x_2 b_2 + ... + x_n b_n   (12)

and

α = ⋁_{i=1}^{k} v_i, for v = (v_1, v_2, ..., v_k) and v_i = ε_{a^i}(x) = ⋀_{j=1}^{n} (a_{ij} + x_j).   (13)

Here, n denotes the dimensionality of the input signal x and k denotes the number of operations employed in the morphological module. The i-th erosion is given by ε_{a^i}, where a^i = (a_{i1}, a_{i2}, ..., a_{in})^T ∈ R^n can be viewed as the structuring element corresponding to the erosion ε_{a^i}. The vector b comprises the coefficients of the linear component of the model. The only difference between the D-IHMP and the E-IHMP is that in the D-IHMP the following equation replaces Equation 13:

α = ⋀_{i=1}^{k} v_i, for v = (v_1, v_2, ..., v_k) and v_i = δ_{a^i}(x) = ⋁_{j=1}^{n} (a_{ij} + x_j).   (14)
Note that, for both the E-IHMP and the D-IHMP, a convex combination of α, the output of the morphological module, and β, the output of the linear module, yields the final output.

4.1 The Proposed Training Algorithm

Note that the E-IHMP and D-IHMP models require the setting of the parameters λ, b, and a^i for i = 1, ..., k. If a denotes the concatenation of the vectors a^i, i.e., a^T = ((a^1)^T, (a^2)^T, ..., (a^k)^T), then the weight vector w of either model is given by

w^T = (λ, b^T, a^T).   (15)
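The forward passes of Equations 11–14 can be sketched as follows; the input, the mixing coefficient, and the weights below are arbitrary illustration values, not trained parameters.

```python
import numpy as np

def e_ihmp_output(x, lam, b, A):
    # E-IHMP forward pass (Equations 11-13).
    # x: input in R^n; lam: mixing coefficient in [0, 1];
    # b: linear coefficients in R^n; A: k x n matrix whose i-th row is the
    # structuring element a^i of the i-th erosion.
    beta = x @ b                             # linear module, Eq. 12
    v = np.min(A + x, axis=1)                # v_i = min_j (a_ij + x_j), Eq. 13
    alpha = np.max(v)                        # supremum of erosions
    return lam * alpha + (1.0 - lam) * beta  # convex combination, Eq. 11

def d_ihmp_output(x, lam, b, A):
    # D-IHMP forward pass: Equation 14 replaces Equation 13.
    beta = x @ b
    v = np.max(A + x, axis=1)                # v_i = max_j (a_ij + x_j), Eq. 14
    alpha = np.min(v)                        # infimum of dilations
    return lam * alpha + (1.0 - lam) * beta

x = np.array([0.2, 0.5, 0.1])                # n = 3 time lags
b = np.array([0.3, 0.4, 0.3])
A = np.array([[0.1, -0.2, 0.0],
              [0.0, 0.1, -0.1]])             # k = 2 structuring elements
y = e_ihmp_output(x, 0.5, b, A)
```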
During the evolutionary training process, the weights of the IHMP are adjusted according to an error criterion until convergence or until the end of the evolutionary algorithm generations. Each individual of the population represents all IHMP weights (w^T). Let us define the following fitness function f(w) in terms of the weights:

f(w) = 1 / (1 + Σ_{m=1}^{M} e²(m)).   (16)

Here, M is the number of training data and e(m) is the instantaneous error given by

e(m) = d(m) − y(m),   (17)

where d(m) and y(m) are respectively the desired output signal and the actual output for the m-th training pattern.

The modified genetic algorithm (MGA) used to train the IHMP is based on the work of Leung et al. [20]. The MGA procedure consists of the selection of a parent pair of chromosomes followed by crossover and mutation operators (generating the offspring chromosomes, i.e., the new population) until the termination condition is reached. Then the best individual in the population is selected as the solution to the problem. In our simulations, the population comprises ten individuals.

The crossover operator is used for exchanging information between two parents (vectors p_1 and p_2) obtained in the selection process by a roulette wheel approach [20]. The recombination process to generate the offspring (vectors C_1, C_2, C_3, and C_4) is done by four crossover operators, which are defined by the following equations [20]:

C_1 = (p_1 + p_2)/2,   (18)
C_2 = w(p_1 ∨ p_2) + (1 − w)p_max,   (19)
C_3 = w(p_1 ∧ p_2) + (1 − w)p_min,   (20)
C_4 = (w(p_1 + p_2) + (1 − w)(p_max + p_min))/2.   (21)

The symbol w ∈ [0, 1] (in this paper, we used 0.9) denotes the crossover weight (the closer w is to 1, the greater is the direct contribution from the parents). The symbols p_1 ∨ p_2 and p_1 ∧ p_2 denote the vectors whose elements are respectively the element-wise maximum and minimum of p_1 and p_2. The terms p_max and p_min denote the vectors with the maximum and minimum possible gene values, respectively. After the offspring generation by the crossover operators, the offspring exhibiting the greatest fitness value is chosen as the result of the crossover process. The resulting vector is denoted by C_best, and it replaces the individual of the population with the smallest fitness value.

After conclusion of the crossover process, three new mutated offspring MC^1, MC^2, and MC^3 are generated from C_best as follows [20]:

MC^j = C_best + Γ^j ΔM^j, j = 1, 2, 3.   (22)

Here, the vectors ΔM^j satisfy the inequalities p_min ≤ C_best + ΔM^j ≤ p_max for j = 1, 2, 3. The vectors Γ^j have entries in {0, 1} and satisfy the following additional
conditions: The vector Γ^1 has only one randomly chosen non-zero entry, Γ^2 represents a random binary vector, and Γ^3 is the constant vector 1 (consisting only of ones). The mutated offspring are incorporated into the population according to the following scheme. We generate a random element r of the unit interval [0, 1] and compare it with 0.1. If r < 0.1, then the mutated offspring exhibiting the largest fitness replaces the individual of the current population that has the smallest fitness value. Otherwise, we perform the following steps for j = 1, 2, 3: if the fitness value of MC^j exceeds that of the least fit individual (the one that yields the smallest fitness value) of the current population, then we substitute the latter with MC^j.

Finally, in order to automatically adjust time phase distortions, we included a phase-fix procedure in the training algorithm. The phase-fix procedure has two steps. In the first step, an application of the IHMP to an input pattern x = (x_1, ..., x_n)^T produces the output y_1. Then the value y_1 is attached to the shortened vector (x_1, ..., x_{n−1})^T ∈ R^{n−1}, yielding the pattern (y_1, x_1, ..., x_{n−1})^T. This modified pattern is now fed to the same IHMP, which generates the phase-corrected prediction y_2.

Three stopping criteria are used in the proposed evolutionary training algorithm: i) the maximum generation number, gen = 10000; ii) the decrease in the training error, measured by the training progress (Pt) [22] of the fitness function: Pt ≤ 10^{−6}; and iii) an increase of the validation error or generalization loss (Gl) [22] of the fitness function beyond 5%. The entries of the weight vectors a and b of each individual of the population are randomly initialized within the range [−1, 1]. The initial mixture coefficient λ of each individual of the population is randomly chosen in the interval [0, 1].
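The four crossover operators of Equations 18–21 can be sketched as follows; the gene range and vector length are illustration values, and the fitness-based selection of C_best is omitted.

```python
import numpy as np

def crossover(p1, p2, p_min, p_max, w=0.9):
    # The four crossover offspring of Equations 18-21 (w is the crossover weight).
    c1 = (p1 + p2) / 2.0
    c2 = w * np.maximum(p1, p2) + (1.0 - w) * p_max
    c3 = w * np.minimum(p1, p2) + (1.0 - w) * p_min
    c4 = (w * (p1 + p2) + (1.0 - w) * (p_max + p_min)) / 2.0
    return c1, c2, c3, c4

rng = np.random.default_rng(1)
p_min, p_max = np.full(4, -1.0), np.full(4, 1.0)   # permitted gene range
p1 = rng.uniform(-1.0, 1.0, size=4)
p2 = rng.uniform(-1.0, 1.0, size=4)
offspring = crossover(p1, p2, p_min, p_max)
# All four offspring stay inside the permitted range [p_min, p_max].
```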
The choice of k (the number of erosion or dilation operations used in the morphological module) varies for the prediction problems that we considered in this paper. It is important to mention that, due to the genetic operators, the resulting chromosome gene values may exceed their valid boundary values. Whenever this happens, the corresponding genes are truncated to remain within the permitted interval.
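The two-step phase-fix procedure described above can be sketched as follows. The averaging `model` below is only a hypothetical stand-in for a trained IHMP, used to keep the example self-contained.

```python
import numpy as np

def phase_fix_predict(model, x):
    # Step 1: the model's prediction y1 on the input pattern.
    y1 = model(x)
    # Step 2: prepend y1 to the shortened input (y1, x_1, ..., x_{n-1})
    # and feed it back to the same model for the phase-corrected prediction.
    shifted = np.concatenate(([y1], x[:-1]))
    return model(shifted)

model = lambda x: float(np.mean(x))   # hypothetical stand-in predictor
x = np.array([1.0, 2.0, 3.0, 4.0])
y2 = phase_fix_predict(model, x)
```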
5 Simulations and Experimental Results

A set of two real-world financial time series (the Dow Jones Industrial Average (DJIA) index and the Standard & Poor 500 (S&P500) index) was used as a test bed for the evaluation of the proposed model. All time series were normalized to lie within the range [0, 1] and divided into three sets according to Prechelt [22]. In order to establish a performance study, previously published results obtained with the ARIMA [23], modular morphological neural network (MMNN) [12], multi-layer perceptron (MLP) [19], and morphological-rank-linear (MRL) perceptron [13] models on the same time series and under the same conditions are employed for comparative studies. To ensure a fair comparison, we applied the phase-fix procedure to all of these models [13, 14]. As a global indicator of the prediction performances, we employed the following evaluation function (EF), which combines five well-known performance measures defined in [13, 14]:

EF = POCID / (1 + MSE + MAPE + THEIL + ARV).   (23)
For the DJIA index series prediction, we utilized the same time lags presented in [12] to create the input vectors (in this case, lags 2, 3, 4, 5, 6, 7, 8, 9, 10, and 11) and the same number of operations in the morphological module as in [12] (in this case, 8). Table 1 shows the results for all the performance measures.
Table 1. Results of the test set for the DJIA index series

Metrics  ARIMA      MMNN       MLP        MRL        D-IHMP     E-IHMP
MSE      5.8033e-4  8.3236e-4  8.3000e-2  8.2148e-4  1.6044e-4  1.7619e-4
MAPE     8.3200e-2  9.6700e-2  9.3788e-2  9.6578e-2  5.7717e-2  6.0262e-2
THEIL    1.2649     0.9945     0.9885     0.9916     0.4965     0.5094
ARV      3.9200e-2  3.4423e-2  3.4204e-2  3.3981e-2  6.5683e-3  7.2129e-3
POCID    46.10      50.85      46.59      46.82      100.00     100.00
EF       19.3058    23.9130    21.1822    22.0539    64.0637    63.4095
For the S&P500 index series prediction, we used the same time lags as in [12] to create the input vectors (in this case, lags 2, 3, 4, 5, and 6) and the same number of operations in the morphological module as in [12] (in this case, 10). Table 2 shows the results for all the performance measures.

Table 2. Results of the test set for the S&P500 index series

Metrics  ARIMA      MMNN       MLP        MRL        D-IHMP     E-IHMP
MSE      2.1447e-5  9.7451e-5  9.6000e-3  1.0982e-4  3.8909e-5  2.9857e-5
MAPE     1.2400e-2  9.2000e-2  1.0103e-2  1.0214e-2  7.2277e-3  6.2731e-3
THEIL    1.4090     0.9498     0.9179     1.0397     0.6184     0.5388
ARV      0.1374     7.4749e-3  7.2875e-3  8.4926e-2  2.9930e-3  2.2967e-3
POCID    47.22      81.31      50.98      52.18      100.00     100.00
EF       18.4538    39.6756    26.2123    24.4409    61.4002    64.6245
6 Conclusion

This paper introduces the increasing hybrid morphological-linear perceptron (IHMP) for financial time series prediction. The IHMP training algorithm makes use of a modified genetic algorithm (MGA) to determine the IHMP parameters. We also added an automatic phase correction step that is geared at eliminating the time phase distortions in financial time series. The performance of the proposed IHMP in comparison to a number of competitive neural and statistical models was assessed in terms of five well-known performance measures in two experiments using real-world financial time series: DJIA and S&P500. In addition, an evaluation function that combines the five aforementioned performance measures served as a global indicator of the quality of prediction achieved by a given model. The experimental results demonstrated a consistently better performance of the proposed IHMP model in comparison to the other models found in the literature.

With the inclusion of the phase correction step, the IHMP was able to escape the so-called random walk dilemma [16] in our simulations. In other words, the IHMP model succeeded in automatically correcting the time phase distortions that typically occur in financial forecasting. Despite the incorporation of the same phase correction procedure, the other models tested in this paper were unable to cope as well with these time phase distortions.

Finally, we would like to clarify that the excellent performance of the IHMP does not depend on the use of a genetic algorithm-based training method. Instead, the main advantage of the IHMP in comparison to other models is its capability of modeling the combination of the linear and nonlinear components that determine financial time series in terms of a combination of a linear module and a morphological or lattice-based module. The main purpose of the phase correction procedure is to adjust the nonlinear component which enters the final prediction.
References

1. Pessoa, L.F.C., Maragos, P.: Neural networks with hybrid morphological rank linear nodes: a unifying framework with applications to handwritten character recognition. Pattern Recognition 33, 945–960 (2000)
2. Gader, P.D., Khabou, M.A., Koldobsky, A.: Morphological regularization neural networks. Pattern Recognition, Special Issue on Mathematical Morphology and Its Applications 33(6), 935–945 (2000)
3. Khabou, M.A., Gader, P.D., Keller, J.M.: LADAR target detection using morphological shared-weight neural networks. Machine Vision and Applications 11(6), 300–305 (2000)
4. Sussner, P., Esmi, E.L.: Introduction to morphological perceptrons with competitive learning. In: Proceedings of the International Joint Conference on Neural Networks, Atlanta, GA, pp. 3024–3031 (2009)
5. Sussner, P., Esmi, E.L.: Morphological perceptrons with competitive learning: Lattice-theoretical framework and constructive learning algorithm. Information Sciences (2009) (accepted for publication)
6. Serra, J.: Image Analysis and Mathematical Morphology, Theoretical Advances, vol. 2. Academic Press, New York (1988)
7. Ronse, C.: Why mathematical morphology needs complete lattices. Signal Processing 21(2), 129–154 (1990)
8. Heijmans, H.J.A.M.: Morphological Image Operators. Academic Press, New York (1994)
9. Matheron, G.: Random Sets and Integral Geometry. Wiley, New York (1975)
10. Serra, J.: Image Analysis and Mathematical Morphology. Academic Press, London (1982)
11. Sussner, P., Esmi, E.L.: Constructive morphological neural networks: some theoretical aspects and experimental results in classification. In: Kacprzyk, J. (ed.) Constructive Neural Networks. Studies in Computational Intelligence. Springer, Heidelberg (2009)
12. de A. Araújo, R., Madeiro, F., de Sousa, R.P., Pessoa, L.F.C., Ferreira, T.A.E.: An evolutionary morphological approach for financial time series forecasting. In: Proceedings of the IEEE Congress on Evolutionary Computation, pp. 2467–2474 (2006)
13. de A. Araújo, R., Ferreira, T.A.E.: An intelligent hybrid morphological-rank-linear method for financial time series prediction. Neurocomputing 72(10-12), 2507–2524 (2009)
14. de A. Araújo, R., Ferreira, T.A.E.: A morphological-rank-linear evolutionary method for stock market prediction. Information Sciences (in press, 2010)
15. Zhang, G., Patuwo, B.E., Hu, M.Y.: Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting 14, 35–62 (1998)
16. Sitte, R., Sitte, J.: Neural networks approach to the random walk dilemma of financial time series. Applied Intelligence 16(3), 163–171 (2002)
17. Zhang, G.P., Kline, D.M.: Quarterly time-series forecasting with neural networks. IEEE Transactions on Neural Networks 18(6), 1800–1814 (2007)
18. Banon, G.J.F., Barrera, J.: Decomposition of mappings between complete lattices by mathematical morphology, part 1: general lattices. Signal Processing 30(3), 299–327 (1993)
19. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall, New Jersey (1998)
20. Leung, F.H.F., Lam, H.K., Ling, S.H., Tam, P.K.S.: Tuning of the structure and parameters of the neural network using an improved genetic algorithm. IEEE Transactions on Neural Networks 14(1), 79–88 (2003)
21. Birkhoff, G.: Lattice Theory, 3rd edn. American Mathematical Society, Providence (1993)
22. Prechelt, L.: Proben1: A set of neural network benchmark problems and benchmarking rules. Technical Report 21/94 (1994)
23. Box, G.E.P., Jenkins, G.M., Reinsel, G.C.: Time Series Analysis: Forecasting and Control, 3rd edn. Prentice Hall, New Jersey (1994)
Lattice Associative Memories for Segmenting Color Images in Different Color Spaces

Gonzalo Urcid¹*, Juan Carlos Valdiviezo-N¹, and Gerhard X. Ritter²

¹ Optics Department, INAOE, Tonantzintla, Pue. 72000, Mexico
{gurcid,jcvaldiviezo}@inaoep.mx
² CISE Department, University of Florida, Gainesville, FL 32611–6120, USA
[email protected]
Abstract. This paper describes a technique for segmenting color images in different color spaces based on lattice auto-associative memories. Basically, the min- or max-auto-associative memories can be used to determine tetrahedra enclosing different subsets of image pixels. The column vectors of either memory, additively scaled, correspond to the most saturated color pixels that are the vertices of a specified tetrahedron, and any other color pixel can be considered a linear mixture of these points. The non-negative least squares method is used to linearly unmix color pixels and provides the fundamental step in the unsupervised segmentation of a given input color image. We give illustrative examples to demonstrate the effectiveness of our method as well as the color separation results in four different color spaces.
1 Introduction
Color image segmentation has been approached from several perspectives that currently are categorized as pixel, area, edge, and physics based segmentation [1]. For example, pixel based segmentation includes histogram techniques and cluster analysis in color spaces. Optimal thresholding [2] and the use of a perceptually uniform color space [3] are examples of histogram based techniques. Area based segmentation contemplates region growing as well as split-and-merge techniques whereas edge based segmentation embodies local methods and extensions of the morphological watershed transformation [4] such as the flat zone approach [5]. A seminal work employing Markov random fields for splitting and merging color regions was proposed in [6]. Other recent developments contemplate the fusion of various segmentation techniques such as the application of morphological closing and adaptive dilation to color histogram thresholding [7], the use of the watershed algorithm for color clustering with Markovian labeling [8], or fuzzy principal component analysis coupled with clustering based on recursive one-dimensional histogram analysis [9]. For a recent systematic exposition of color image segmentation methods the interested reader may see [10].
* Corresponding author. Fax: +52 (222) 247-2940; Tel: +52 (222) 266-3100 Ext. 8205. G. Urcid and J.C. Valdiviezo-N. are grateful to SNI-CONACYT for partial financial support through grant # 22036 and doctoral scholarship # 175027.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 359–366, 2010. c Springer-Verlag Berlin Heidelberg 2010
In this paper, we describe a lattice algebra based technique for image segmentation and apply it to RGB (Red-Green-Blue) color images transformed to other representative systems such as, HSI (Hue-Saturation-Intensity), I1 I2 I3 (principal components approximation), and L*a*b*(Luminance - redness/greenness - yellowness/blueness) color spaces. The proposed method relies on the min WXX and max MXX lattice auto-associative memories, where X is the set formed by all different colors or 3-dimensional pixel vectors contained in the input image. The scaled column vectors of either memory together with the minimum or maximum vector bounds of X may form the vertices of tetrahedra enclosing subsets of X, and correspond to the most saturated color pixels in the image. Image partition into regions of similar colors is realized by linearly unmixing pixels belonging to tetrahedra determined by WXX and MXX , and then by scaling pixel color fractions obtained with the non-negative least squares numerical method. Thus our approach to color image segmentation can be classified as a pixel based unsupervised clustering technique. Section 2 presents background material on image segmentation and a brief overview of minimax algebra and lattice associative memories; Section 3 describes the segmentation technique based on the scaled column vectors of WXX and MXX including the linear mixing model used to determine the color fractions composing any pixel vector in the input image. In Section 4, we give the segmentation results for images represented in the color spaces previously mentioned and, finally, Section 5 gives the conclusions concerning this research.
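The linear unmixing step can be sketched as follows. The vertex-color matrix S below is a hypothetical stand-in for the additively scaled columns of W_XX or M_XX, and the projected-gradient loop is only a stand-in for the non-negative least squares method named in the text.

```python
import numpy as np

def nnls_pg(S, y, iters=5000, lr=0.01):
    # Projected-gradient stand-in for non-negative least squares:
    # minimize ||S f - y||^2 subject to f >= 0.
    f = np.zeros(S.shape[1])
    for _ in range(iters):
        grad = S.T @ (S @ f - y)
        f = np.maximum(f - lr * grad, 0.0)
    return f

# Hypothetical tetrahedron vertex colors as columns (RGB values in [0, 1]).
S = np.array([[1.0, 0.0, 0.0, 0.2],
              [0.0, 1.0, 0.0, 0.2],
              [0.0, 0.0, 1.0, 0.2]])
pixel = np.array([0.5, 0.3, 0.2])

fractions = nnls_pg(S, pixel)
fractions = fractions / fractions.sum()   # scale the color fractions, as in the text
```

Each pixel's scaled fraction vector then indicates the region (vertex color) to which it predominantly belongs.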
2 Background
Intuitively, to segment an image is to divide it into a finite set of disjoint regions whose pixels share well-defined attributes. Perceptually, the segmentation process must convey the necessary information to visually recognize or identify the prominent features contained in the image, such as color hue, brightness, or texture. Let X be a finite set with k elements and p a logical predicate about a quantifiable attribute. A segmentation of X is a family {Ri} of subsets of X, each with ki elements for i = 1, ..., q, that satisfies the following properties: 1) Ri ∩ Rj = ∅ for i ≠ j (pairwise disjoint subsets), 2) for any i, Ri is a connected subset, 3) R1 ∪ · · · ∪ Rq = X and k1 + · · · + kq = k (whole set covering), 4) ∀i, p(Ri) = true (elements in a single subset share the same attribute), and 5) for i ≠ j, p(Ri ∪ Rj) = false (elements in a pairwise union of subsets do not share the same attribute). It should be clear that color image segmentation needs additional computational effort, due to its vectorial nature, when compared to the scalar nature of grayscale image segmentation.

The maximum and minimum of two numbers, usually denoted as functions max(x, y) and min(x, y), will be written with the "join" and "meet" binary operators employed in lattice theory, x ∨ y = max(x, y) and x ∧ y = min(x, y). Lattice matrix operations are defined componentwise, e.g., the maximum of two matrices X, Y of the same size m × n is computed as (X ∨ Y)_ij = x_ij ∨ y_ij for i = 1, ..., m and j = 1, ..., n. Inequalities between matrices are also verified
Lattice Associative Memories for Segmenting Color Images
elementwise, for example, X ≤ Y if and only if x_ij ≤ y_ij. Also, the conjugate matrix X* is defined as −X^t, where X^t denotes usual matrix transposition. The max-of-sums X ∨ Y and the min-of-sums X ∧ Y of appropriately sized matrices are defined, for i = 1, ..., m and j = 1, ..., n, respectively, as (X ∨ Y)_ij = ⋁_{k=1}^{p} (x_ik + y_kj) and (X ∧ Y)_ij = ⋀_{k=1}^{p} (x_ik + y_kj). For p = 1 these lattice matrix operations reduce to the outer sum of two vectors x = (x_1, ..., x_n)^t ∈ IR^n and y = (y_1, ..., y_m)^t ∈ IR^m, which is the m × n matrix (y × x^t)_ij = y_i + x_j.

Let (x^1, y^1), ..., (x^k, y^k) be k vector pairs with x^ξ ∈ IR^n and y^ξ ∈ IR^m for each ξ, with corresponding associated matrices (X, Y), where X = (x^1, ..., x^k) and Y = (y^1, ..., y^k). Then X is of dimension n × k with i, jth entry x_i^j, and Y is of dimension m × k with i, jth entry y_i^j. To store the k vector pairs (x^1, y^1), ..., (x^k, y^k) in an m × n lattice associative memory (LAM), vector encoding uses the outer sums y^ξ × (−x^ξ)^t for all ξ [11]. The network weights, w_ij of the min-memory WXY and m_ij of the max-memory MXY, for i = 1, ..., m and j = 1, ..., n, are given by

    w_ij = ⋀_{ξ=1}^{k} (y_i^ξ − x_j^ξ) ;  m_ij = ⋁_{ξ=1}^{k} (y_i^ξ − x_j^ξ).    (1)
We speak of a lattice hetero-associative memory (LHAM) if X ≠ Y and of a lattice auto-associative memory (LAAM) if X = Y. In this paper we will use LAAMs only, i.e., WXX and MXX of size n × n; in particular, the main diagonals of both matrices, i.e., the entries w_ii and m_ii, consist entirely of zeros.
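As an illustration of eq. (1) in the auto-associative case, the following minimal NumPy sketch (toy data; the function name is ours) computes WXX and MXX and checks the zero-diagonal property just mentioned:

```python
import numpy as np

def lattice_memories(X):
    """Min-memory W_XX and max-memory M_XX of eq. (1) with y = x;
    X is n x k, one stored vector per column."""
    D = X[:, None, :] - X[None, :, :]   # D[i, j, xi] = x_i^xi - x_j^xi
    return D.min(axis=2), D.max(axis=2)

# Three stored 3-dimensional vectors (invented values) as columns of X.
X = np.array([[0., 4., 2.],
              [1., 3., 5.],
              [2., 0., 6.]])
W, M = lattice_memories(X)
# The main diagonals of W_XX and M_XX consist entirely of zeros,
# and W <= M holds componentwise.
print(np.diag(W), np.diag(M))
```

The broadcasted difference array realizes all outer sums x^ξ × (−x^ξ)^t at once; the min/max reductions over ξ give the two memories.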
3 Segmenting Images with the WM Method
From a given color image A of size p × q pixels, the set X contains all different colors, or 3-dimensional vectors, present in A. If |X| = k is the number of elements in X, then k ≤ pq = |A|, where pq is the maximum number of possible color pixels available in A. Using (1) with y_i^ξ = x_i^ξ for all i ∈ {1, ..., n}, the memory matrices min-WXX and max-MXX are computed and written, respectively, as W = (w_1, w_2, w_3) and M = (m_1, m_2, m_3) to make explicit their column vectors. By construction, vector entries of W or M may not necessarily belong to the numerical range of a given color space; for example, W usually has negative entries. The next transformation puts these vectors in the appropriate range. The minimum and maximum vector bounds of X = (x^1, ..., x^k) are given by v = ⋀_{ξ=1}^{k} x^ξ and u = ⋁_{ξ=1}^{k} x^ξ, respectively. Let W = (w_1, ..., w_n) and M = (m_1, ..., m_n) be the min- and max-memory matrices; then additive scaling results in two scaled matrices, denoted W̄ and M̄, whose column vectors are defined by

    w̄_i = w_i + u_i = w_i + ⋁_{ξ=1}^{k} x_i^ξ ;  m̄_i = m_i + v_i = m_i + ⋀_{ξ=1}^{k} x_i^ξ.    (2)
Notice that w̄_ii = u_i and m̄_ii = v_i, hence diag(W̄) = u and diag(M̄) = v. Each set of scaled vectors, {w̄_1, w̄_2, w̄_3} or {m̄_1, m̄_2, m̄_3}, makes it possible to
determine several tetrahedra enclosing specific subsets of X. Recall that X is said to be a convex set if the straight line joining any two points in X lies completely within X; also, an n-dimensional simplex is the minimal convex set whose n + 1 vertices are vectors in IR^n. Since the color solid is a subspace of IR^3, a 3-dimensional simplex corresponds to a tetrahedron. Hence, considering pixel vectors in a color image enclosed by some tetrahedron whose base face is determined by its most saturated colors, an estimation of the fractions in which those colors appear at any other color pixel can be made. A model commonly used for the analysis of spectral mixtures in hyperspectral images, known as the linear mixing (LM) model, can be used to unmix noiseless color images by representing each pixel vector x as a linear combination of the most saturated colors. Thus,

    x = Sψ = ψ_1 s_1 + ψ_2 s_2 + ψ_3 s_3,    (3)

where x is a 3 × 1 pixel vector, S = (s_1, s_2, s_3) is a square matrix of size 3 × 3 whose columns are the most saturated colors, and ψ is the 3 × 1 vector of "color fractions" present in x. The components of ψ must satisfy the relations ψ_1, ψ_2, ψ_3 ≥ 0 (non-negativity) and ψ_1 + ψ_2 + ψ_3 = 1 (full additivity). Solving (3) to find the vector ψ, given that S = W̄ or S = M̄, for every x ∈ X is the process known as constrained linear unmixing. For this task, we apply the non-negative least squares (NNLS) numerical method, which relaxes the full additivity condition. Once (3) is solved for every color pixel x, all ψ vector values are reassembled into grayscale fraction images for s_1, s_2, s_3, and a thresholding procedure can be applied to get a coarser color segmentation depicting the corresponding image partition. Additional material and discussion on the WM method has been presented earlier [12,13].

Figure 1 shows, in the top left, the "peppers" RGB color image of size 128 × 128 pixels, its HSI transformation displayed as a false RGB color image, and the extreme color pixels determined from W̄ (upper row) and M̄ (lower row) in the HSI color space. Here, X = {x^1, ..., x^13844} (from a total of 16,384 pixel vectors). A 3D scatter plot of X is depicted to the left of Fig. 2. The computed scaled memory matrices and vector bounds are given by

    W̄ = ( 255  100   36 )        ( 255 )        (   0   67  140 )        ( 0 )
         ( 188  255   16 ) ,  u = ( 255 ) ;  M̄ = ( 155    0  152 ) ,  v = ( 0 ) .
         ( 115  103  255 )        ( 255 )        ( 219  239    0 )        ( 0 )

Figure 2 illustrates four tetrahedra enclosing different subsets of X, namely W̄ ∪ {v} and W̄ ∪ {u} shown in the middle, or M̄ ∪ {v} and M̄ ∪ {u} displayed to the right. Equation (3), implemented with the NNLS method, is applied to find a fraction solution vector ψ for each one of the 16,384 color pixels, first taking S = W̄ and then S = M̄. The 2nd and 3rd rows in Fig.
1, display the fraction maps obtained from the HSI saturated colors displayed in the top right, whose associated column vectors correspond, respectively, to W̄ and M̄. Each segmented image s_j is linearly scaled from the subinterval [0, μ] to the dynamic range [0, 255], where μ = ⋁_{ξ=1}^{k} ψ_j^ξ and k = 16,384. The fraction threshold φ and the grayscale threshold τ are related by the expression φ = μτ/256.
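As a toy illustration of the scaling step (2) and the NNLS unmixing of (3), the following NumPy/SciPy sketch (invented pixel values; variable names ours) uses `scipy.optimize.nnls`, which enforces non-negativity while relaxing full additivity, exactly as described above:

```python
import numpy as np
from scipy.optimize import nnls

# X: 3 x k matrix whose columns are the k distinct color pixels (toy values).
X = np.array([[200.,  40.,  10.,  90.],
              [ 30., 180.,  20.,  80.],
              [ 10.,  20., 210.,  70.]])

# Min- and max-memories of eq. (1) in the auto-associative case.
D = X[:, None, :] - X[None, :, :]
W, M = D.min(axis=2), D.max(axis=2)

# Vector bounds of X and the additive scaling of eq. (2).
u, v = X.max(axis=1), X.min(axis=1)
Wbar = W + u[None, :]          # column i of W shifted by u_i
Mbar = M + v[None, :]          # column i of M shifted by v_i
assert np.allclose(np.diag(Wbar), u) and np.allclose(np.diag(Mbar), v)

# Semi-constrained linear unmixing of eq. (3): x ~ S @ psi with psi >= 0.
S = Wbar
fractions = np.array([nnls(S, x)[0] for x in X.T])
print(fractions.round(3))      # one non-negative fraction vector per pixel
```

In the paper's setting the columns of X would be all 13,844 distinct colors of the "peppers" image, and each row of `fractions` would be reassembled into a grayscale fraction map before thresholding.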
Fig. 1. 1st row: RGB color image, transformed HSI color image, saturated colors obtained from W̄ (upper line) and M̄ (lower line); 2nd and 3rd rows: grayscale segmented images derived from w̄_j and m̄_j, respectively, for j = 1, 2, 3, showing "red/green" pepper regions and bright reflected light regions. Brighter gray tones correspond to higher fractions of saturated colors.
Fig. 2. Left: 3D scatter plot of X showing all different colors present in the HSI representation of the "peppers" RGB color image; right: tetrahedra determined from W̄ = {w̄_1, w̄_2, w̄_3} and M̄ = {m̄_1, m̄_2, m̄_3} enclosing four different subsets of X
4 Segmentation Results in Other Color Spaces
To test the performance of the WM method in different color spaces, besides the standard non-normalized correlated RGB space, we selected as representative alternatives Ohta's I1I2I3 linearly decorrelated RGB color space [1,6], the HSI non-linear and non-uniform color space [10,14], and the perceptually uniform color space L*a*b* [3,10]. The "peppers" RGB color image and its transformations to the I1I2I3, HSI, and L*a*b* color spaces, rendered as false color RGB images, are displayed in the first four columns of row one of Fig. 3. In the 2nd row, below each color image, composed thresholded fraction maps selected from W̄ ∪ M̄ depict the best
segmentation obtained in the corresponding color space; e.g., the vectors and fraction thresholds used in the RGB color space were w̄_1 (0.454), w̄_2 (0.363), and m̄_1 (1.561); similarly, for the I1I2I3 color space, m̄_3 (0.389), w̄_3 (0.384), and w̄_1 (0.347) were chosen. The 3rd row displays Sobel gradient edge images corresponding to the segmentations produced by the WM method in the RGB and I1I2I3 color spaces, a clustering method based on Mahalanobis distance, and a hybrid technique employing histograms and morphological watersheds. The 5th column of Fig. 3 shows, from top to bottom, the NTSC grayscale version of the original color image, a 16-level quantization produced by an optimized octree nearest color algorithm, and its corresponding Sobel edge image used as reference for quantitative comparisons (see Table 1).

Table 1. Segmentation performance for the "peppers" color image

    Segmentation Method               Corr. Coef.      SNR
    WM in RGB                            0.707       14.179
    WM in I1I2I3                         0.717       14.931
    WM in HSI                            0.708       14.124
    WM in L*a*b*                         0.675       14.006
    Mahalanobis distance clustering      0.632       12.917
    Histograms + Morph. Watersheds       0.594        9.814

Similarly, Figure 4 displays the segmentation results of additional color images. In each row, the source color image in RGB format is shown at the left, and to the right follows the segmentation obtained in the RGB, I1I2I3, HSI, and L*a*b* color spaces, shown as quantized grayscale images. For example, the corresponding "bear" grayscale image in the I1I2I3 color space (2nd row, 3rd column) was
Fig. 3. Top row: color image in RGB, I1 I2 I3 , HSI, and L*a*b* color spaces; 2nd row: segmented images of “red/green” peppers and bright portions of reflected light; 3rd row: Sobel edge images of different segmentation methods
Fig. 4. 1st column: sample RGB color images; 2nd to 5th columns: compound segmented images obtained with the W M method, respectively, in the RGB, I1 I2 I3 , HSI, and L*a*b* color spaces, main regions of interest are quantized
generated by composing the fraction maps obtained from w̄_2 and m̄_2 after thresholding, respectively, at φ = 0.387 and φ = 0.326. Based on the examples given here, the best segmentation results produced by applying the WM method and the semi-constrained LM model occur in the I1I2I3 space (cf. 2nd column in Fig. 3 and 3rd column in Fig. 4).
5 Conclusions
This work describes a segmentation method for color images in different color spaces, based on the lattice auto-associative memories W and M, whose scaled column vectors define the most saturated pixels. These extreme points are suitable for performing semi-constrained linear unmixing to determine the color fractions of any other pixel. Granular segmented images of all saturated pixels are produced by scaling the fraction data computed with the NNLS method, and coarse segmented images can be obtained by thresholding the corresponding color fraction maps. Examples are given to illustrate visually the results of segmentation, and a preliminary comparison was made against two other segmentation techniques. We remark that the LAAM-based approach can be classified as an unsupervised pixel clustering technique. Future work contemplates additional quantitative evaluation of the proposed method.
References

1. Cheng, H.D., Jain, X.H., Sun, Y., Wang, J.: Color Image Segmentation: Advances and Prospects. Pattern Rec. 34(12), 2259–2281 (2001)
2. Celenk, M., de Haag, M.U.: Optimal Thresholding for Color Images. In: Proc. SPIE, Nonlinear Image Processing IX, San Jose, CA, vol. 3304, pp. 250–259 (1998)
3. Shafarenko, L., Petrou, M., Kittler, J.: Histogram-based Segmentation in a Perceptually Uniform Color Space. IEEE Trans. on Image Processing 7(9), 1354–1358 (1998)
4. Meyer, F.: Color Image Segmentation. In: Proc. IEEE 4th Inter. Conf. on Image Processing and its Applications, pp. 303–306 (1992)
5. Crespo, J., Schafer, R.W.: The Flat Zone Approach and Color Images. In: Serra, J., Soille, P. (eds.) Mathematical Morphology and Its Applications to Image Processing, pp. 85–92. Kluwer Academic, Dordrecht (1994)
6. Liu, J., Yang, Y.-H.: Multiresolution Color Image Segmentation. IEEE Trans. on Pattern Anal. and Mach. Int. 16(7), 689–700 (1994)
7. Park, S.H., Yun, I.D., Lee, S.U.: Color Image Segmentation based on 3-D Clustering: Morphological Approach. Pattern Rec. 31(8), 1061–1076 (1998)
8. Géraud, T., Strub, P.-Y., Darbon, J.: Color Image Segmentation Based on Automatic Morphological Clustering. In: Proc. IEEE Inter. Conf. on Image Processing, Thessaloniki, Greece, vol. 3, pp. 70–73 (2001)
9. Essaqote, H., Zahid, N., Haddaoui, I., Ettouhami, A.: Color Image Segmentation Based on New Clustering Algorithm and Fuzzy Eigenspace. Research Journal of Applied Sciences 2(8), 853–858 (2007)
10. Koschan, A., Abidi, M.: Digital Color Image Processing, pp. 149–174. John Wiley & Sons, Hoboken (2008)
11. Ritter, G.X., Gader, P.: Fixed Points of Lattice Transforms and Lattice Associative Memories. In: Hawkes, P. (ed.) Advances in Imaging and Electron Physics, vol. 144, pp. 165–242. Elsevier, San Diego (2006)
12. Ritter, G.X., Urcid, G., Schmalz, M.S.: Autonomous Single-Pass Endmember Approximation using Lattice Auto-Associative Memories. Neurocomputing 72(10-12), 2101–2110 (2009)
13. Urcid, G., Valdiviezo-N., J.C.: Color Image Segmentation Based on Lattice Auto-Associative Memories. In: Proc. 13th IASTED Inter. Conf. on Artificial Intelligence and Soft Computing, pp. 166–173 (2009)
14. Zhang, C., Wang, P.: A New Method for Color Image Segmentation Based on Intensity and Hue Clustering. In: Proc. 15th IEEE Inter. Conf. on Pattern Recognition, vol. 3, pp. 613–616 (2000)
Lattice Neural Networks with Spike Trains

Gerhard X. Ritter¹ and Gonzalo Urcid²

¹ CISE Department, University of Florida, Gainesville, FL 32611-6120, USA
[email protected]
² Optics Department, INAOE, Tonantzintla, Pue. 72000, Mexico
Fax: +52 (222) 247-2940; Tel.: +52 (222) 266-3100 Ext. 8205
[email protected]
Abstract. Lattice based neural networks have proven their capability of resolving difficult non-linear problems and have been successfully employed in real-world applications. In this paper we introduce a novel lattice neural net that generalizes previous dendritic models. The new model employs the biological notions of dendritic spines and spike trains. We show by example that it can accomplish tasks that previous lattice neural networks were incapable of achieving.
1 Introduction
Despite major advances in artificial intelligence, humans and other primates easily outperform the best machine vision systems with respect to most measures. For this reason, emulating object and pattern recognition processes in the cortex remains a fascinating and challenging area of research. Early attempts at constructing artificial neural networks (ANNs) were only partially successful and over time diverged into mathematical systems, such as radial basis function neural nets and support vector machines, that have little in common with biological neural nets. One reason for this divergence is that advances in the neurobiology and biophysics of neural information transfer have either not been taken into serious consideration or have been too difficult to implement in practical ANNs. Recent advances in neurobiology have brought to the foreground the importance of dendritic trees, axonal arborization, and spike trains. Less than a decade ago, we started to incorporate some of these concepts into ANNs. One of our early attempts concerned the incorporation of dendritic structures and axonal trees into ANNs [1]. Various researchers consider these structures to be the primary basic computational units of the neuron, capable of realizing logical operations. Neurons with dendrites can function as many, almost independent, functional subunits, with each being able to implement a rich repertoire of such logic operations as Xor, And, Not, and Or [2,3,4,5]. These logic operations, the speed of computation, as well as work by Poggio and colleagues on the Max operator [6,7], are some of the reasons that we used lattice algebra as the main tool for mathematical modeling.
Corresponding author. G. Urcid is grateful to CONACYT for partial financial support, grant # 22036.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 367–374, 2010. c Springer-Verlag Berlin Heidelberg 2010
In this paper we extend our earlier model based on dendritic computing to include the concepts of spike trains and spines. In Section 2 we provide a short background of the biological processes necessary for understanding the proposed mathematical model. In Section 3 we define the dendritic model and discuss the rationale and need for generalizing it. Section 4 introduces the dendritic model with spike trains and provides an example for a better understanding of the new model. We conclude this research work with a few pertinent observations.
2 Elements of Neurobiology
To understand the computational neural network models discussed in the next section, some basic knowledge of neurobiology is necessary. We assume that the reader has some basic knowledge of the morphology of a neuron and its processes, namely the axon and its arborization, dendrites, and synapses. A neuron sending an electric impulse to another neuron is called the presynaptic neuron and the neuron receiving the impulse is the postsynaptic neuron. The impulse travels along the presynaptic neuron's axon and its branches, which terminate on the dendrites of postsynaptic neurons. The sites where the axonal fibers terminate are called the synaptic sites or synapses. These are the sites where information from the presynaptic neuron is transferred to the postsynaptic neurons. Dendrites are usually (but not always) studded with large numbers of tiny branches called spines. Dendritic spines are major postsynaptic targets of presynaptic input. The number of synapses on a single neuron ranges from 500 to 200,000, and the number of synapses in the human brain has been estimated to be between 60 and 240 trillion (240 × 10^12), residing on 10 to 20 billion neurons. These numbers provide a feel for the scope and immensity of the computational and information processing power of the human brain. After receiving impulses from presynaptic neurons, the dendrites generate smaller impulses to the postsynaptic neural cell body. The impulses received from the dendrites will change the electric potential of the postsynaptic neuron, possibly turning it into a presynaptic neuron for another set of postsynaptic neurons. The impulses or action potentials traveling along the axon of the presynaptic neuron are also known as spikes. As a spike travels along the axon of the presynaptic neuron it will be automatically duplicated at each branch of the axonal tree. Thus, each spike reaches all its targeted synaptic sites on the postsynaptic neuron.
A spike train is the time-series of spikes recorded from an individual neuron of the brain within some time interval Δt. Since spikes are measured in milliseconds, the number of spikes in a one second interval can be fairly large, with some spikes bunched closely together, so-called spike bursts, followed by gaps. This is also known as high frequency firing rate fluctuation. A graphical interpretation of a spike train is shown in Fig. 1. The spikes in Fig. 1 are displayed as vertical segments of the same height. The reason for this is that the actual spikes generated by a neuron have basically all the same height and shape. There are no 1/4, 1/2, or 3/4 spikes.
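The spike-train notions above can be made concrete with a small NumPy sketch (the spike times are invented): a train is stored as a list of event times within Δt = 1 s, and counting the spikes that fall in each equal subinterval Δt_h, as used later in Section 4, is a simple histogram:

```python
import numpy as np

# A one-second spike train from one neuron: spike times in seconds,
# with an early burst, two isolated spikes, and gaps in between.
spikes = np.array([0.013, 0.015, 0.018, 0.240, 0.245, 0.610, 0.930])

# Subdivide dt = 1 s into m equal subintervals dt_h and count the
# spikes in each; bursts show up as large counts, gaps as zeros.
m = 10
counts, edges = np.histogram(spikes, bins=m, range=(0.0, 1.0))
print(counts)
```

These per-subinterval counts are exactly the kind of quantity the model of Section 4 denotes by s_i(Δt_h).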
Fig. 1. A one second spike train. The vertical line segments are just symbolic markers of action potentials.
Fig. 2. When the totality of the EPSPs and IPSPs exceeds the neuron’s firing threshold, the neuron fires and sends a spike along its axon. Here t is in milliseconds and V in millivolts.
When a spike reaches a synaptic site, it produces a postsynaptic potential (PSP). If this potential results in an increase of the membrane potential of the postsynaptic neuron, then it is called an excitatory PSP (EPSP), and if it leads to hyperpolarization of the membrane potential, then it is called an inhibitory PSP (IPSP). An IPSP moves the postsynaptic cell's potential away from its firing threshold. These two reactions of the postsynaptic neuron to spikes are illustrated in Fig. 2. Recent research supports the idea that it is not the firing rate, which corresponds to the number of spikes in a spike train, that is the key to the coding and decoding of signals, but rather the position of spikes, gaps, and spike bursts within the time interval Δt before a targeted postsynaptic neuron fires. Moreover, it is the totality of all spike trains generated during Δt by all presynaptic neurons with terminal axonal fibers on a given postsynaptic neuron that is key to understanding the language by which neurons communicate. Figure 3 illustrates this rhythm for three presynaptic neurons with synapses on a given postsynaptic neuron.
3 Lattice Neural Networks
ANNs whose major computational components are derived from lattice theory are collectively known as lattice neural networks (LNNs). In the past ten years, these networks have become an extremely active area of research. A model of
Fig. 3. Spike trains of three presynaptic neurons N1, N2, and N3 for the time interval Δt = Δt1 + Δt2 + Δt3
LNNs with dendritic structures was first described in [1]. In this model, as well as in later refinements and modifications, a postsynaptic neuron with dendritic structure receives input from n presynaptic input neurons, N1, ..., Nn, whose axons have multiple terminal fibers with knobs on synaptic sites on dendritic branches of their target postsynaptic neurons. The input neurons carry the information of a pattern vector x ∈ IR^n by assigning the pattern feature x_i to N_i. The computation at the kth dendritic branch is given by

    τ_k(x) = p_k ⋀_{i∈I(k)} ⋀_{ℓ∈L(i)} (−1)^{1−ℓ} (x_i + w_{ik}^ℓ),    (1)

where I(k) ⊆ {1, ..., n} corresponds to the set of all input neurons with terminal fibers that synapse on the kth dendritic branch of the neuron, L(i) ⊆ {0, 1} corresponds to the set of terminal fibers of N_i that synapse on the kth dendrite of the neuron, and p_k ∈ {−1, 1} denotes the EPSP (p_k = 1) or IPSP (p_k = −1) response of the kth dendrite's membrane that will affect the total membrane potential of the neuron. The superscript ℓ on the additive synaptic weight w_{ik}^ℓ can only be '0' or '1' since L(i) ⊆ {0, 1}. Thus, if ℓ = 0, then (−1)^{1−ℓ} = −1 provides an additional inhibitory effect, while ℓ = 1 provides for an excitatory effect since (−1)^0 = 1. However, it also means that in this model at most two synapses are allowed on a given dendritic branch for a presynaptic neuron. The kth dendrite response τ_k(x) is passed to the cell body, and the state of the postsynaptic cell is a function of the input received from all its dendritic branches. Then, the overall neural response is given by

    τ(x) = p ⋀_{k=1}^{K} τ_k(x),    (2)
where K denotes the total number of dendritic branches of the neuron and p = ±1 denotes the response of the cell body to the received dendritic input. Here again, p = 1 means that the input is accepted, while p = −1 means that the cell rejects the received input. The appeal of this model is that no multiplication comes into play and the max and min operators provide another aid for extremely fast convergence of training algorithms. At first glance, (2) seems to be based
Fig. 4. A shape that cannot be exactly modeled by a finite number of rectangles with sides that are orthogonal to the x1 and x2 axes
on only minimums. However, due to the relations −(x ∧ y) = −x ∨ −y and −x ∧ −y = −(x ∨ y), and the use of p = ±1, p_k = ±1, and (−1)^{1−ℓ}, the maximum function is automatically built into the operations expressed by (1) and (2). Just about all of the techniques for training single layer and multilayer LNNs based on this model focus on the training patterns being enclosed by a series of hyperboxes that are orthogonal to the axes of the data space. These techniques have many desirable properties, including fast convergence, clear geometric interpretation, and 100% accurate classification of the training data. Boundaries of these hyperboxes are established in the dendrites, and this information flows into the neural body, which recognizes the full geometric configuration established by the boundary pieces. A result is that a single layer LNN with only one output neuron can approximate any compact geometric shape in n-dimensional Euclidean space to any degree ε > 0 of accuracy. This includes connected as well as disconnected configurations. As most algorithms are derived from slight modifications of the algorithm given in [1], we will simply refer to them collectively as Algorithm A. A major problem of the various Algorithm A approaches is that many shapes cannot be exactly modeled by hyperboxes, but can only be approximated. Consider the triangle in Fig. 4. The region is described by only three lines, but there is no finite number of rectangles, with sides orthogonal to the x1 and x2 axes, whose unions and/or intersections form the triangle. The only way to solve this problem exactly with the use of rectangles whose sides are orthogonal to one of the x1 or x2 axes is through the use of a postsynaptic neuron with an infinite number of synapses. In order to get around this problem of dealing only with hyperboxes whose faces are orthogonal to the standard basis axes, we constructed orthonormal basis LNNs (OB-LNNs) [8].
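To make the single-neuron dendritic computation of eqs. (1) and (2) concrete, here is a minimal Python sketch (the function names and the nested-dict weight encoding are ours); with one branch per input it encodes the hyperbox [0, 2] × [0, 1], and τ(x) ≥ 0 exactly for points inside the box:

```python
def dendrite(x, weights, p_k=1):
    """tau_k(x) of eq. (1): weights maps input index i to the dict
    {ell: w_ik^ell} of its synapses on this branch, ell in {0, 1}."""
    terms = [(-1) ** (1 - ell) * (x[i] + w)
             for i, syn in weights.items() for ell, w in syn.items()]
    return p_k * min(terms)

def neuron(x, branches, p=1):
    """tau(x) of eq. (2): min over all dendritic branches."""
    return p * min(dendrite(x, wk) for wk in branches)

# ell = 1 gives the lower-bound term x_i + w >= 0,
# ell = 0 gives the upper-bound term -(x_i + w) >= 0.
branches = [{0: {1: 0.0, 0: -2.0}},    # 0 <= x_0 <= 2
            {1: {1: 0.0, 0: -1.0}}]    # 0 <= x_1 <= 1
print(neuron([1.0, 0.5], branches) >= 0)   # inside the box  -> True
print(neuron([3.0, 0.5], branches) >= 0)   # outside the box -> False
```

Only additions, negations, and minimums appear, which is the point of the lattice formulation.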
In this scheme, rotation matrices are used in order to find the best fit hyperbox enclosing the data or training set of interest. The lattice computation of the kth orthonormal basis dendrite can be expressed, based on (1), as follows:

    τ_k(x) = p_k ⋀_{i∈I(k)} ⋀_{ℓ∈L(i)} (−1)^{1−ℓ} [(R_k x)_i + w_{ik}^ℓ],    (3)
Fig. 5. Two rectangular boxes containing the triangle from Fig. 4. The box with the dotted boundaries is orthogonal with respect to its dotted basis, which is obtained from a 45° rotation of the standard basis. The triangle is the intersection of two rectangular boxes.
where R_k is a square matrix whose columns are unit vectors forming an orthonormal basis. Each dendrite now works with its own orthonormal basis defined by the matrix R_k. Figure 5 provides a simple visual example. Observe that each rectangular box is orthogonal with respect to its orthonormal basis and is the smallest box containing the (not rotated) data set, which in this case is the triangle, in that basis. The training algorithm for an OB-LNN, here referred to as Algorithm B and given in [8], proved superior to Algorithm A on three data sets, namely the dual spiral separation problem, the Iris data set, and the separation of an ellipse from its complement. However, it also has its own problems, in that no single box can classify the triangle in Fig. 4. Nevertheless, eliminating some rotated boxes in its complement allows one to carve out the shape of the triangle. Additionally, having a rotation matrix as part of a dendritic operation is somewhat difficult to explain from a biological standpoint. A slightly different approach becomes apparent if one looks at each step of Algorithms A and B. In the first step, each finds the smallest box containing the data set of interest, Algorithm A in the standard basis and Algorithm B in another basis obtained via a rotation. It is important to note that both obtain an optimal box with respect to its basis, and each box contains the test data. Hence their intersection contains the test data and provides a better solution for the first step. In a similar way, the next step, namely the elimination of points belonging to another class that may be in the configuration obtained from the intersection, can be handled by each algorithm separately and their results again combined. Thus, for instance, the triangle problem is solved in one step by taking the intersection of the first step of Algorithms A and B, as illustrated in Fig. 5.
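A sketch of the orthonormal-basis dendrite of eq. (3), assuming the same weight encoding as for eq. (1) (names and toy box ours): the dendrite tests the rotated coordinates R_k x against box bounds, so the box it recognizes is tilted in the original coordinates.

```python
import numpy as np

def ob_dendrite(x, R_k, weights, p_k=1):
    """tau_k(x) of eq. (3): eq. (1) applied to the rotated input R_k @ x."""
    y = R_k @ x
    terms = [(-1) ** (1 - ell) * (y[i] + w)
             for i, syn in weights.items() for ell, w in syn.items()]
    return p_k * min(terms)

# 45-degree rotation: the dendrite's box is orthogonal in the rotated basis.
c = np.cos(np.pi / 4)
s = np.sin(np.pi / 4)
R = np.array([[c, s], [-s, c]])

# Box -1 <= (R x)_i <= 1 for i = 0, 1, i.e. a square tilted by 45 degrees.
weights = {0: {1: 1.0, 0: -1.0}, 1: {1: 1.0, 0: -1.0}}
print(ob_dendrite(np.array([0.0, 0.0]), R, weights) >= 0)   # center: inside
print(ob_dendrite(np.array([1.5, 0.0]), R, weights) >= 0)   # corner cut off
```

Intersecting such a rotated box with a standard-basis box, as in the combined first step described above, is just one more ∧ at the cell body.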
Since the surfaces thus obtained are piecewise linear, an LNN can be quickly constructed with dendritic structure and synapses growing or being removed according to the results of each combined step. As it turns out, the notion of spike trains is an ideal way to obtain such networks.
4 Single Neuron with Spike Trains
For networks involving dendritic branches, spines, spike trains, synapses, and time delays, we will use the following assumptions. The synaptic weights for two terminal fibers whose terminal knobs impinge on each other or share the same synaptic site are the same. When we refer to spines, we assume that they contain one or more synapses and that the information transfer that occurs on the spine within a small time interval (say 1 ms to 5 ms) gets summed before flowing down the dendritic branch toward the soma of the neuron. We assume that the time interval Δt has been subdivided into m smaller time intervals Δt_h of equal length. Additionally, we assume that the postsynaptic neuron has K distinct dendritic branches d_1, ..., d_K, and each branch d_k has r_k spines, where the rth spine on d_k is denoted by σ(r, k). Finally, the input x ∈ IR^n resides in the input neurons N_i with x_i ∈ N_i. The lattice algebraic formulation for dendritic computing with spike trains is expressed as

    τ_k(x, Δt) = p_k ⋀_{h=1}^{m} ⋀_{r=1}^{r_k} ⋀_{i∈I(k,r)} (−1)^{1−ℓ(r,i)} s_i(Δt_h)(x_i + w_{ik}^r),    (4)
where m denotes the number of subintervals Δt_h, I(k, r) is the set of all integers i for which the presynaptic neuron N_i has a synapse on σ(r, k), w_{ik}^r is the additive weight associated with this synapse, and s_i(Δt_h) denotes the number of spikes generated by N_i during the time Δt_h. The PSP factor p_k = ±1 is determined during training, and so is ℓ(r, i) ∈ {0, 1}, which depends on both r and i. The postsynaptic neuron collects the information generated by its dendrites over the time interval Δt and computes the output

    τ(x, Δt) = p ⋀_{k=1}^{K} τ_k(x, Δt),    (5)
where p = ±1 is determined during training. Training is accomplished by applying Algorithms A and B as outlined in the last paragraph of the preceding section. First trials on the Iris data set showed that when using 60% of the data for training, an error rate of 5.3% resulted when testing the full data set. This compares with error rates of 6.41%, 10%, and 12.21% for Algorithm B, Algorithm A, and a multilayer perceptron, respectively, when using the same training data. The triangle problem provides a simple toy example. Since the algorithm stops after step 1, Algorithms A and B each produce an LNN having two input neurons and one output neuron with two dendrites. These are combined into one LNN with two dendrites. However, the number of axonal fibers changes and so do some synaptic weights. Since the new configuration has only three sides, only three boundaries need to be encoded, thus reducing the number of synapses. The step number of the algorithm provides the delay time interval, which in this case is Δt1 = Δt, and during this time only one spike from each N_i is needed, as each variable x_i is used only once in the step 1 computation. In many other cases more than one spike is needed (a spike burst) within a small interval. With this in mind,
Fig. 6. A single LNN that solves the triangle problem. We assume that the two terminal fibers synapsing on d2 have synapses on the same spine. [The diagram shows input neurons N1, N2 feeding dendrites d1, d2 of the output neuron M, with synaptic weights 0, −2 on d2 and 0 on d1, and output y.]
and the fact that r_1 = 2, dendrite d_1 computes τ_1(x, Δt_1) = (x_1 − 0) ∧ [−(x_1 − 2)], while d_2 computes τ_2(x, Δt_1) = (x_1 − 0) + [−(x_2 + 0)] = x_1 − x_2. Hence τ(x, Δt) ≥ 0 if and only if x_1 ≥ 0, x_1 ≤ 2, and x_1 ≥ x_2. That is, τ(x, Δt) ≥ 0 if and only if x is in the triangle. This means that the LNN depicted in Fig. 6 can recognize the triangle exactly.
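The dendritic computation for the triangle problem can be sketched in a few lines; this is a hedged illustration of Eqs. (4)–(5) for the single time step Δt_1 with one spike per input, read off the weights described above, not the authors' implementation.

```python
# Sketch of the triangle-problem LNN: tau >= 0 exactly when (x1, x2)
# lies in the region 0 <= x1 <= 2 and x2 <= x1 described in the text.

def tau1(x1, x2):
    # Dendrite d1: excitatory/inhibitory synapse pair on x1 (min of both).
    return min(x1 - 0, -(x1 - 2))

def tau2(x1, x2):
    # Dendrite d2: both terminal fibers share one spine, so their
    # postsynaptic potentials are summed: (x1 - 0) + (-(x2 + 0)).
    return (x1 - 0) + (-(x2 + 0))

def tau(x1, x2):
    # Soma output: p = +1 and the minimum over dendrites, as in Eq. (5).
    return min(tau1(x1, x2), tau2(x1, x2))

def in_triangle(x1, x2):
    return tau(x1, x2) >= 0

print(in_triangle(1.0, 0.5))   # inside: True
print(in_triangle(3.0, 0.5))   # x1 > 2: False
print(in_triangle(1.0, 1.5))   # x2 > x1: False
```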
5 Conclusions
We have developed a novel LNN that incorporates the notions of dendrites, dendritic spines, and spike trains. This brings ANNs a little closer to the biological model. Initial testing shows superiority over preceding LNN models as well as over perceptrons. Further testing and comparisons remain for future work.
Detecting Features from Confusion Matrices Using Generalized Formal Concept Analysis

Carmen Peláez-Moreno and Francisco J. Valverde-Albacete

Dpto. de Teoría de la Señal y de las Comunicaciones, Universidad Carlos III de Madrid, Avda. de la Universidad, 30, Leganés 28911, Spain
{carmen,fva}@tsc.uc3m.es
Abstract. We claim that the confusion matrices of multiclass problems can be analyzed by means of a generalization of Formal Concept Analysis to obtain symbolic information about the feature sets of the underlying classification task. We prove our claims by analyzing the confusion matrices of human speech perception experiments and comparing our results to those elicited by experts.
1 Motivation
For n, p ∈ N, let G = {g_i}_{i=1}^n be a set of input labels or stimuli and M = {m_j}_{j=1}^p a set of output labels or responses for a multiclass classifier task embodied in a human or artificial agent. Consider the joint event "presenting a stimulus g_i to a classifier and obtaining response m_j," (G = g_i, M = m_j). A contingency table or confusion matrix (CM) for the classifier, C ∈ N^{n×p}, is a record of the decisions of N repetitions of such an experiment¹. Confusion matrices are rich summaries of how the classifier performed on a test set. This is usually transformed into an aggregate figure of merit, like accuracy, or a visual depiction, like a multi-class ROC, thereby losing information about the particular errors the classifier may commit. We contend that some information about the underlying task can be obtained from the numerical data in the confusion matrix via a special type of biclustering scheme, a concept lattice, from Formal Concept Analysis (FCA) [1]. Furthermore, concept lattices allow us both to observe the global behavior of classifiers and to analyze their confusions in detail. FCA, unfortunately, cannot deal in an automatic way with non-binary incidences, but generalizations of it that cater for the notion of degree of incidence have been developed [2,3,4,5,6]. In this paper, we use K-Formal Concept Analysis (kFCA) [5,7], which enables the analysis of practical real-valued CM by embedding them into an idempotent
* This work has been supported by Spanish Government (Comisión Interministerial de Ciencia y Tecnología) projects TEC2008-02473/TEC and TEC2008-06382/TEC.
¹ We consider here the general case where the labels used in the training speech samples differ from those considered by the recognizer.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 375–382, 2010. © Springer-Verlag Berlin Heidelberg 2010
semifield K (actually a bounded lattice-ordered group [8]), to try to prove that a concept lattice can elicit a symbolic description of the features being used in the classification process and of how they are misused by the classifier.
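The starting point of the analysis, recording N repetitions of the joint event (G = g_i, M = m_j) into a count matrix C ∈ N^{n×p}, can be sketched as follows; the label sets and the trial list here are illustrative, not data from the paper.

```python
# Minimal sketch: accumulating repetitions of the joint event
# (G = g_i, M = m_j) into a count confusion matrix C.
stimuli   = ["/p/", "/t/", "/k/"]     # G: input labels (illustrative)
responses = ["/p/", "/t/", "/k/"]     # M: output labels (illustrative)

# Each trial is a (stimulus, response) pair observed in the experiment.
trials = [("/p/", "/p/"), ("/p/", "/t/"), ("/t/", "/t/"), ("/k/", "/k/")]

C = [[0] * len(responses) for _ in stimuli]
for g, m in trials:
    C[stimuli.index(g)][responses.index(m)] += 1

print(C)  # [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
```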
2 Generalized Formal Concept Analysis of Confusion Matrices
From count matrix to ϕ-confusion lattice. To illustrate K-Formal Concept Analysis of confusion matrices, consider that of Fig. 1(a). The first design choice is to find an adequate domain to express the strength of confusions. From a count matrix N_GM we may obtain an estimate of the mutual information distribution for the events, Ĉ_GM, like that of Fig. 1(b). A proper choice for the semiring in K-Formal Concept Analysis is R̄_max,+ (read "completed max-plus"). This is the completed set of reals with the "max" operation used as addition and ordinary addition as multiplication.
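The completed max-plus structure just described can be sketched directly; this is a hedged illustration of the semifield operations, with a small matrix fragment whose values are taken from Fig. 1(b), not the paper's kFCA implementation.

```python
# Sketch of the completed max-plus semifield R_max,+ used by kFCA:
# "addition" is max (neutral element -inf) and "multiplication" is
# ordinary + (neutral element 0).
import math

NEG_INF = -math.inf      # the zero of the semifield
UNIT = 0.0               # the one of the semifield

def mp_add(a, b):        # semifield addition
    return max(a, b)

def mp_mul(a, b):        # semifield multiplication
    return a + b

def mp_matvec(C, v):
    # (max, +) matrix-vector product, the basic way a real-valued
    # context C acts on a multi-valued set of responses or stimuli.
    return [max(mp_mul(C[i][j], v[j]) for j in range(len(v)))
            for i in range(len(C))]

C = [[2.851, NEG_INF], [0.761, 3.401]]   # a 2x2 corner of Fig. 1(b)
print(mp_matvec(C, [0.0, 0.0]))          # [2.851, 3.401]
```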
(a) N_GM, the count confusion matrix (stimuli in rows, responses in columns):

          p    m    t    f   th    k    s
    p   150    0   38    7   13   88    0
    m     0  201    0    0    0    0    0
    t    30    0  193    1    0   28    0
    f     4    1    3  199   46    5    4
    th   11    0    6   85  114    4   10
    k    86    0   45    4    1  138    0
    s     0    0    2    5   38    1  170

(b) Ĉ_GM, its mutual information distribution:

           p       m       t       f      th       k       s
    p    2.851    −∞     0.824  −1.717  −0.305   2.155    −∞
    m     −∞     4.202    −∞      −∞      −∞      −∞      −∞
    t    0.761    −∞     3.401  −4.292    −∞     0.735    −∞
    f   −2.213  −3.793  −2.674   3.277   1.683  −1.817  −1.626
    th  −0.567    −∞    −1.487   2.236   3.179  −1.953  −0.117
    k    2.149    −∞     1.169  −2.424  −3.904   2.905    −∞
    s     −∞      −∞    −3.047  −1.826   1.619  −3.928   3.995

(c) the structural (binary) incidence (G, M, I⁺_{C,ϕ}):

         p  m  t  f  th k  s
    p    ×     ×        ×
    m       ×
    t    ×     ×        ×
    f             ×  ×
    th            ×  ×
    k    ×     ×        ×
    s                ×     ×

(d) the structural lattice B(G, M, I⁺_{C,ϕ}), a Hasse diagram not reproducible in text.

Fig. 1. Example analysis using kFCA: (a) count confusion matrix, obtained from the Miller and Nicely experiments [9] for SNR = 0 dB; only the phonemes G = M = {/m/, /p/, /t/, /k/, /f/, /s/, /th/} have been retained as both stimuli (rows) and responses (columns); (b) its mutual information distribution; (c) structural matrix and (d) structural lattice for ϕ = 0.056585
Detecting Features from Confusion Matrices Using Generalized FCA
377
For n, p ∈ N, given two sets of stimuli G = {g_i}_{i=1}^n and responses M = {m_j}_{j=1}^p, and an R̄_max,+-valued matrix C ∈ R̄_max,+^{n×p}, the triple (G, M, C)_{R̄max,+} is called an R̄_max,+-valued formal context, where C(i, j) = λ reads as "stimulus g_i is confused with response m_j to degree λ" and, dually, "response m_j is evoked by stimulus g_i to degree λ". We may associate multi-valued sets of stimuli A and responses B by means of a pair of functions (·)⁺_{C,ϕ} : R̄_max,+^n → R̄_max,+^p and ⁺_{C,ϕ}(·) : R̄_max,+^p → R̄_max,+^n forming a Galois connection [1,7] as follows: define ϕ-concepts as pairs (A, B)_ϕ such that (A)⁺_{C,ϕ} = B ⇐⇒ A = ⁺_{C,ϕ}(B). The Basic Theorem of K-Formal Concept Analysis asserts that the set of formal ϕ-concepts is a complete lattice B_ϕ(G, M, C)_{R̄max,+} (see [5,7] for details). The parameter ϕ ∈ R is called the threshold of existence; it describes the minimum degree of confusion required for concepts to be considered members of B_ϕ(G, M, C)_{R̄max,+}.

Structural Confusion Lattices. The ϕ-concept lattice B_ϕ(G, M, C)_{R̄max,+} has a huge number of concepts (infinite, in the typical case) and is hard to visualize. Therefore, for each choice of ϕ deemed interesting, we introduce its structural (confusion) lattice B(G, M, I⁺_{C,ϕ}), the (standard) confusion lattice of the binary incidence I⁺_{C,ϕ}, depicting only those concepts above the fixed threshold of existence ϕ. The following lattice exploration algorithm must be carried out once for each choice of ϕ²:
1. Work out the concepts γ(g_i)⁺_{C,ϕ} and μ(m_j)⁺_{C,ϕ} associated to singleton stimuli and responses, respectively.
2. Build a binary incidence I⁺_{C,ϕ} associated to those concepts by adequately comparing them, creating the binary context (G, M, I⁺_{C,ϕ}) with the binary incidence g_i I⁺_{C,ϕ} m_j ⇐⇒ γ(g_i)⁺_{C,ϕ} ≤ μ(m_j)⁺_{C,ϕ}.
3. Use a standard tool for Formal Concept Analysis, ConExp [11], to build and visualize the structural concept lattice at ϕ, B(G, M, I⁺_{C,ϕ}).
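Once the binary incidence of step 2 is available, step 3 reduces to standard FCA derivation operators. A minimal stand-in sketch (the paper uses ConExp for this), with the incidence transcribed from Fig. 1(c):

```python
# Standard FCA derivation operators on the binary context of Fig. 1(c),
# mapping each stimulus to the responses it is confused with.
I = {
    "p": {"p", "t", "k"}, "m": {"m"}, "t": {"p", "t", "k"},
    "f": {"f", "th"}, "th": {"f", "th"}, "k": {"p", "t", "k"},
    "s": {"s", "th"},
}
G = set(I)                       # stimuli
M = set().union(*I.values())     # responses

def up(A):
    # A': responses shared by all stimuli in A.
    return set.intersection(*(I[g] for g in A)) if A else set(M)

def down(B):
    # B': stimuli confused with every response in B.
    return {g for g in G if B <= I[g]}

# The formal concepts c1 and c2 discussed in the text below:
assert up({"s"}) == {"s", "th"} and down({"s", "th"}) == {"s"}
assert down({"th"}) == {"s", "f", "th"} and up({"s", "f", "th"}) == {"th"}
```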
Structural confusion lattice interpretation. For a boolean confusion matrix I, such as that of Fig. 1(c), the triple (G, M, I) is called a formal context and is assumed to encode all information pertaining to the phenomenon being analyzed. Pairs of a particular set of stimuli that are all confused with a particular set of responses, and vice versa, are called formal concepts. For instance, c1 = ({/s/}, {/s/, /th/}) is one such pair for the context above, and c2 = ({/s/, /f/, /th/}, {/th/}) another. The set of stimuli in a concept is called the extent and the set of responses the intent of the concept: {/s/} and {/s/, /th/} are the extent and intent, respectively, of c1, meaning that stimulus /s/ is confused with responses /s/ and /th/. To distinguish between stimuli and responses, boldface characters will be used for the former throughout the text. Concepts are partially ordered by inclusion of extents or, equivalently, reverse inclusion of intents: if (A1, B1) ≤ (A2, B2) ⇔ A1 ⊆ A2 ⇔ B1 ⊇ B2, we say that
² An on-line demonstration of this can be accessed in [10].
the first concept is more specific (less general) than the second. For instance, c1 is more specific than c2. The Basic Theorem of Formal Concept Analysis asserts that the set of formal concepts of a formal context, as related by this order relation, is a complete lattice called the concept lattice B(G, M, I). In the Hasse diagram of a confusion lattice, stimulus labels appear in white boxes just below the corresponding concept and response labels usually appear in gray boxes just above. To diminish visual clutter, instead of completely labeling each node with all labels of either sort, we put the label of each response only at the highest (most abstract) concept in which it appears, and the label of each stimulus only at the lowest (most specific) concept in which it appears. This is the reduced labeling shown in Fig. 1(d). In this labeling scheme, concepts capture the confusions of more phones than those that actually appear attached to the concept. To recover the confusion extent, the set of stimuli being confused at a particular concept, we take the union of all stimulus labels found from the node downwards in the lattice. Similarly, to build the confusion intent, we take the union of all response labels found from the node upwards in the lattice. In the example, if we go from c1 downwards in the lattice collecting stimulus labels (below the nodes) we obtain its extent {/s/}, and if we go upwards we find the labels in its intent, /s/ (above c1 itself) and /th/ (above c2). There are two types of complementary, domain-specific information that can be gleaned from a lattice: specific concept information and overall lattice information. As to the first, the most interesting concepts are the join-irreducible concepts (bottom half filled in black in Fig. 1(d)) and the meet-irreducible concepts (top half filled in gray; blue online).
Call the rest of the concepts in the example lattice c_ptk = ({/p/, /t/, /k/}, {/p/, /t/, /k/}), c_m = ({/m/}, {/m/}), and c_fth = ({/f/, /th/}, {/f/, /th/}). The set of join-irreducibles is J = {c_ptk, c_fth, c1, c_m}, and the set of meet-irreducibles is M = {c_ptk, c_fth, c1, c2, c_m}. In confusion lattices, the join-irreducibles, always annotated with a stimulus label, are the concepts to peruse in order to know what responses each individual stimulus invokes. Likewise, the meet-irreducibles, annotated with response labels, show what set of stimuli evokes a particular response. Regarding overall information about the matrix, consider the three separate sublattices of Fig. 1(d): the first comprising the top, the bottom and c_ptk, to the left; the second, the top, the bottom and c_m, to the right; and the third, the top, the bottom, c1, c2 and c_fth, at the center. Concepts in different sublattices are incomparable, except for the top and bottom. We say that such sublattices are adjoined factor sublattices of the confusion lattice. Notice that stimuli and responses that lie in different adjoined factor sublattices are never confused, hence the presence of adjoined sublattices in the confusion lattice is essentially the lattice-theoretic manifestation of as many different virtual channels in the classifier system. By this we mean that the classifier succeeds in conveying definite information from input to output without error. In the example, the channels for {/m/}, {/f/, /s/, /th/} and {/p/, /t/, /k/} seem evident.
3 The Elicitation of Symbolic Knowledge from Phonetic Confusion Matrices
Confusion matrices have been a key tool for the analysis of human speech perception since the Miller & Nicely experiments [9]. After a thorough analysis, their major conclusion was that phone recognition is grounded in hierarchic categorical discrimination, that is, English consonant sounds form groups identified in terms of hierarchical clusters of articulatory features. They introduced the notion of virtual articulatory communication channels, according to such clusterings, and posited that the channels were characterized by five distinctive acoustic-articulatory features, namely voicing, nasality, affrication, duration and place. In the following we will try to reproduce these results using K-Formal Concept Analysis of perceptual confusion matrices. To assess the complexity of the structural confusion lattices, we have worked out the concept counts at different thresholds ϕ. The concept count represents the number of nodes present in the corresponding structural lattice and provides a rough measure of the complexity of the resulting representation. Small values of the threshold ϕ bring into the picture non-systematic, difficult-to-explain confusions, evident in the analysis of structural lattices with a high number of nodes. On the contrary, if a larger value of ϕ is chosen, the number of concepts is reduced, offering a much simpler structural lattice showing the most prominent confusions. The different plots of Fig. 2 represent the evolution of the number of concepts for several Signal-to-Noise Ratios (SNR) and the full 200–6500 Hz band of the Miller and Nicely experiments. We can clearly notice how the maximum number of concepts attained by each plot is inversely related to the SNR of the emitted syllables. Therefore, the confusion lattice analysis captures the complexity of the CM that corresponds to each SNR: as the speech signal quality gets better, the errors become more systematic or structured and therefore the number of concepts decreases.
The evolution of the number of concepts suggests a method for describing the information in structural lattices:
1. Begin by observing the most salient properties of the system, that is, those lattices obtained with higher values of the threshold ϕ.
2. Subsequently, try to bring more detail into the picture by sweeping from higher to lower values of ϕ (from right to left in Fig. 2).
We thus obtain a sequence of structural lattices starting from the least complex (with the smallest number of concepts) and gradually increasing in complexity as new concepts appear. Figure 3 is a typical structural lattice for the Miller & Nicely experiments at 0 dB and a particular ϕ, where six adjoined factor sublattices can be observed. To the left, the voiced phonemes, with /m/ and /n/, the nasals, represented in two further separate sublattices. To the right, three sublattices representing unvoiced phonemes: the (oral) stops /p/-/t/-/k/, the fricative /sh/ and the rest of the fricatives.
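The ϕ-sweep of steps 1–2 can be sketched by brute force on small contexts. Note that thresholding the real-valued matrix into a binary context, as below, is a simplification of the full kFCA construction, used here only to illustrate how the concept count varies with ϕ; the example matrix is illustrative.

```python
# Hedged sketch of the phi-sweep: for each threshold, form a binary
# context (here simply C[i][j] >= phi, a simplification of the full
# kFCA construction) and count its formal concepts by brute force.
from itertools import combinations

def concept_count(C, phi):
    n, p = len(C), len(C[0])
    I = [[C[i][j] >= phi for j in range(p)] for i in range(n)]

    def up(A):
        # Attributes shared by all objects in A (A' operator).
        return frozenset(j for j in range(p) if all(I[i][j] for i in A))

    # Every intent is up(A) for some object subset A, so enumerating all
    # subsets (feasible for small n) yields the number of concepts.
    intents = {up(A) for r in range(n + 1)
                     for A in combinations(range(n), r)}
    return len(intents)

C = [[0.9, 0.2], [0.1, 0.8]]      # illustrative 2x2 real-valued matrix
for phi in (0.0, 0.5, 1.0):
    print(phi, concept_count(C, phi))
```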
Fig. 2. (Color online) Number of concepts vs. ϕ for HSR confusion matrices (data from [9]). The maximum number of concepts attained by each plot is inversely related to the SNR of the emitted syllables.
Fig. 3. Phonetic confusion lattice at ϕ = 0.11716 and SNR = 0dB (data from [9])
Hence our hypothesis is that adjoined factor sublattices in a structural confusion lattice reflect virtual feature transmission channels. Since this has to be contrasted with the Miller and Nicely findings, a direct method to elicit what phonetic knowledge the sublattices reflect would be to show the stimuli and responses in each lattice. This would afterwards demand the concourse of a phonetic expert to elicit the features. However, a clustering of phonemes in terms of their voicing, manner and place of articulation can also be cast into a Formal Concept Analysis concept lattice, as shown at the top of Fig. 4(a), with two phonemes for each feature that
Fig. 4. Phonemes vs. articulatory features concept lattices: (a) canonical clustering with unvoiced sound concepts on the left and voiced ones on the right; (b) clustering elicited from the confusion lattice of Fig. 3; (c) id. including the place feature
correspond to unvoiced (on the left) and voiced sounds (on the right). We may use this knowledge to label the structural lattices automatically, by selecting the feature label adequate to each phonemic concept extent. The lattices at the bottom of Fig. 4 demonstrate which part of the clustering can actually be elicited from the confusion matrix of Fig. 3. Voicing and manner of articulation (stop, nasal, fricative) can be obtained almost without error, as shown in Fig. 4(b), although clear mismatches between the canonical and the empirically induced representations can be observed: /b/ and /g/ are perceived as fricatives, /z/ as a stop. But place of articulation is hopeless, as Fig. 4(c) shows. In fact, labiodental and velar cannot be defined at all. This agrees fully with the Miller & Nicely conclusions, except for the result on place of articulation, which has often been disputed.
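The automatic labelling step just described can be sketched as a table lookup: a concept extent receives a feature label when it is contained in that feature's phoneme class. The small feature table below is standard articulatory phonetics for the consonants retained in Fig. 1, assumed here for illustration, not data from the paper.

```python
# Hedged sketch of automatic feature labelling of concept extents.
# The feature table is an assumption (standard articulatory phonetics
# for the retained consonants), not the paper's canonical clustering.
features = {
    "unvoiced":  {"p", "t", "k", "f", "th", "s"},
    "voiced":    {"m"},            # of the phonemes retained in Fig. 1
    "stop":      {"p", "t", "k"},
    "nasal":     {"m"},
    "fricative": {"f", "th", "s"},
}

def label(extent):
    # A concept extent gets every feature whose class contains it.
    ext = set(extent)
    return sorted(f for f, cls in features.items() if ext <= cls)

print(label({"p", "t", "k"}))   # ['stop', 'unvoiced']
print(label({"m"}))             # ['nasal', 'voiced']
print(label({"f", "th"}))       # ['fricative', 'unvoiced']
```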
4 Conclusions
We have provided evidence that R̄_max,+-Formal Concept Analysis of confusion data for a multiple-classification task can identify features present in the classification act. Since our generalization considers non-binary matrices in the analysis, it is ideally suited to the analysis of count confusion matrices.
After a preprocessing stage, which amounts to considering the confusion matrix as a joint distribution of input stimuli and output responses, we are able to pinpoint adjoined sublattices in the concept lattice, which we take as evidence that some definite feature is being transmitted. For assessment purposes, we also elicited these features using conventional articulatory-acoustic knowledge. Our results agree with expert-drawn conclusions in all but the most contested ones, which we take to reflect the robustness of the elicitation process.
References
1. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer, Heidelberg (1999)
2. Burusco, A., Fuentes-González, R.: The study of the L-fuzzy Concept Lattice. Mathware and Soft Computing 1(3), 209–218 (1994)
3. Bělohlávek, R.: Lattice generated by binary fuzzy relations. Tatra Mt. Mathematical Publications 16, 11–19 (1999)
4. Krajci, S.: A generalized concept lattice. Logic Journal of the IGPL 13, 543 (2005)
5. Valverde-Albacete, F.J., Peláez-Moreno, C.: Towards a generalisation of Formal Concept Analysis for data mining purposes. In: Missaoui, R., Schmidt, J. (eds.) Formal Concept Analysis. LNCS (LNAI), vol. 3874, pp. 161–176. Springer, Heidelberg (2006)
6. Medina, J., Ojeda-Aciego, M., Ruiz-Calviño, J.: Formal concept analysis via multi-adjoint concept lattices. Fuzzy Sets and Systems 160, 130–144 (2009)
7. Valverde-Albacete, F.J., Peláez-Moreno, C.: Galois connections between semimodules and applications in data mining. In: Kuznetsov, S.O., Schmidt, S. (eds.) ICFCA 2007. LNCS (LNAI), vol. 4390, pp. 181–196. Springer, Heidelberg (2007)
8. Cuninghame-Green, R.: Minimax Algebra. Lecture Notes in Economics and Mathematical Systems, vol. 166. Springer, Heidelberg (1979)
9. Miller, G.A., Nicely, P.E.: An analysis of perceptual confusions among some English consonants. Journal of the Acoustical Society of America 27, 338–352 (1955)
10. Esteban-Alonso, V., Valverde-Albacete, F.J., Peláez-Moreno, C.: Generalised Formal Concept Analysis demo (2008) (date last viewed 28/02/2010)
11. Yevtushenko, S.A.: System of data analysis "Concept Explorer". In: Proceedings of the 7th National Conference on Artificial Intelligence KII 2000, pp. 127–134 (2000) (in Russian), http://sourceforge.net/projects/conexp
Reconciling Knowledge in Social Tagging Web Services

Gonzalo A. Aranda-Corral¹ and Joaquín Borrego-Díaz²

¹ Universidad de Huelva, Department of Information Technology, Crta. Palos de La Frontera s/n, 21819 Palos de La Frontera
² Universidad de Sevilla, Department of Computer Science and Artificial Intelligence, Avda. Reina Mercedes s/n, 41012 Sevilla, Spain
[email protected], [email protected]
Abstract. Sometimes we want to search for new information about a topic but cannot find relevant results using our own knowledge (for example, our personal bookmarks). A potential solution is to use knowledge from other users to find what we are searching for. This solution implies that we can achieve some agreement on the implicit semantics used by the other users; we call this Reconciliation of Knowledge. The aim of this paper is to present an agent-based method which lets us reconcile two different knowledge bases (associated with tagging systems) into a common language, obtaining a new one that allows the reconciliation of (part of) this knowledge. The agents use Formal Concept Analysis concepts and tools, and the method has been implemented on the JADE multiagent platform.
1 Introduction
The amazing growth of the Web 2.0 provides powerful technologies for sharing information among users (members of social networks), for example, the social indexing of the digital objects of the Web. Collaborative tagging represents a very useful process for users who aim to add metadata to documents, objects, resources, urls, etc. Among other applications, tagging enables users to achieve a personal knowledge organization according to their own interests. Additionally, Web 2.0 systems can extract (by means of Collective Intelligence methods) some global organization of the information (from a user's personal point of view). In this way, collaborative tagging offers a pragmatic alternative to semantic web ontologies. However, the gap between the personal organization of information and the global one (as well as between those of different users) makes the use of automated methods to reconcile them difficult. These different ways of organizing information are combined in the tagging tools that tag-based platforms facilitate. This situation leads to a crowd of tagging systems. Moreover, inside a platform and due to the preferences of the users, different tagging
Partially supported by the TIN2009-09492 project of the Spanish Ministry of Science and Innovation, co-financed with FEDER funds.
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 383–390, 2010. © Springer-Verlag Berlin Heidelberg 2010
behaviours exist that actually obstruct the automated interoperability among tag sets, despite the fact that the systems offer solutions to aid the understanding of the folksonomy that the users collectively build (tag clouds, tools based on related-tag ideas, collective intelligence methods, data mining, etc.). Although tagging shows potential benefits, the personal organization of information leads to implicit logical conditions that often differ from the global ones. Tagging provides a sort of weak organisation of the information, very useful, but mediated by the user's behaviour. Therefore, it is also possible that the tags a user associates with an object do not agree with the tags of other users. Formal Concept Analysis (FCA) is a mathematical tool that, applied to tagging systems, makes explicit the set of concepts that the user manages in tagging, as well as the structure of the relationships between them [7]. The concept lattice (a mathematical structure extracted by FCA methods) represents an intermediate structure between tagging (non-hierarchical and inclusive) and classical taxonomies (hierarchical and exclusive). Thus, FCA is useful for bridging the semantic gap, providing a solid mathematical theory for tagging [7].

1.1 Motivation
Since users' taggings reflect their own sets of concepts about the documents, two of the main tools of FCA, namely the concept lattice and the Stem basis, show distinct results for different users (semantic heterogeneity). From the point of view of navigation by means of tags, this semantic heterogeneity makes the activity unreliable. Thus, to ensure an efficient use of another user's tags, some reasoning on tags must be performed in order to achieve some consensus (also represented by FCA tools) that allows navigation between the different concept structures. In this scenario, it could be very important to delegate these tasks to intelligent agents (see Fig. 1). Our aim is to show how the authors have solved this problem. The solution presented in this paper was designed in the framework of the Mobile Web 2.0 project (Mowento), although it is valid for any tagging system (in fact, in this paper the method is applied to a well-known social bookmarking platform, Delicious¹). The aim of Mowento is that anyone can publish content (on the WWW), both videos and photos, from anywhere at anytime, without needing a next-generation mobile device [2]. Mowento allows users to annotate basic semantic information. This annotation is in principle very limited due to the usability of non-advanced mobile devices, which do not allow the use of complex applications for tagging. We address the challenge of creating a simple and effective labeling method for the content, which should allow it to be properly labeled with a few clicks. The method consists of a series of hierarchically arranged menus whose construction algorithm is based on Formal Concept Analysis [1]. From the point of view of the Mowento server, the information is received and automatically entered into a database, pending processing. From there, the multiagent system (programmed on JADE²) takes control of the process and performs its
¹ http://delicious.com
² http://jade.tilab.com/
Fig. 1. Knowledge conciliation in social bookmarking represented by concept lattices
tasks. In this context, several tagging problems have been solved by means of agents. Among them, the agent-based reconciliation method is the main aim of this paper. The solution presented here is also applicable to platforms with a tag-based organization of information, such as Delicious, which will be used as an example. Mowento is in an experimental phase, and the user-generated content allocated in the project does not yet provide representative examples, while the authors' personal bookmarks in Delicious represent a good sample for showing results.
2 Tagging and Heterogeneity
In the case of bookmarking systems such as Delicious, different features and users' behaviours pose a problem similar to the one faced in the Mowento project: how to organise folksonomies by means of formal ontologies (or better, ontologies on users' tags). Although tagging is useful to navigate among pages on the WWW, it cannot be considered a robust knowledge organization method. Some methods exist to integrate this kind of knowledge organization into the Semantic Web realm [10]. These methods can be classified according to the semantics associated with tag sets (or folksonomies). For example, there are methods based on an ontological definition of tags, which use ad hoc ontologies in order to formally describe the properties of tags (see [8]). Other methods are based on transforming folksonomies into ontologies (see, e.g., [12]), including ontologies designed for dealing with folksonomies [6] or more concrete proposals, as in [9].

2.1 Heterogeneity
As is argued in [5], tagging is fundamentally about sensemaking, a process in which information is categorized, labeled and, critically, through which meaning emerges [13]. Even in a personal tagging structure, the boundaries of concepts and categories are vague, so some items are doubtfully labeled. Lastly, users tag for their own benefit, but their tags nevertheless constitute a useful public good [5].
There exist several limitations to collaborative tagging in sites such as Delicious. The first one is that a tag can be used to refer to different concepts; that is, there is a context-dependent feature of the tag associated with the user. This dependence limits both the effectiveness and the adequacy of collaborative tagging. We call this limitation "Context Dependent Knowledge Heterogeneity" (CDKH). A second one is the Classical Ambiguity (CA) of terms, inherited from natural language and/or the consideration of different "basic levels" among users [11][5]. CA would not be critical when users work with urls (the content of a url in fact induces a disambiguation of terms because of its specific topic). In this case, the contextualization of tags in a graph structure (by means of clustering analysis) distinguishes the different terms associated with the same tag [3]. However, CDKH is associated with concept structures that users do not represent in the system, but that FCA can extract. It is also possible that CDKH is associated with the potential future use of the tagging (it can be used for classifying documents, for facilitating navigation among visited urls, for collecting specific and temporal urls, etc.). Thus, navigation among the concept structures of different users has to face CDKH. In the case of platforms such as Mowento, CDKH is a less important problem than in collaborative tagging systems such as Delicious. This is due both to the specific scope of the activities (reporting testimonials about events) and to the common language represented by the tags offered by Mowento's mobile tagging widget. In Mowento, CDKH can occur only in specific concepts of the personal concept lattice. Thus, reconciliation is easier than in collaborative tagging systems. In sites such as Delicious, however, CDKH represents the main problem, because tags perform several different functions as bookmarks (see [5]).
3 Agent-Based Reconciliation Knowledge Algorithm
To implement the algorithm, a solution based on a multiagent system has been chosen, which makes extending and distributing our algorithm a small effort. The multiagent system has to satisfy some requirements, such as being FIPA compliant, in order to facilitate communication and integration with other multiagent systems. We also thought that it should be, as far as possible, open source. JADE was selected since it is composed of a set of tools for developing agents and an execution platform where the agents can live. Another major point in this decision was that it is developed in Java, a multiplatform language. The reconciliation algorithm consists of the following sequence of steps (see Fig. 2):
1. Agent creation step: It starts by creating two JADE agents, passing the agent names and Delicious data as parameters. They know of each other's existence within the platform, so it is not necessary to search (at the Service Directory level, managed by the Directory Facilitator agent) for another agent that offers the reconciliation service. White Pages registration is transparent to developers because it is already implemented in the JADE toolkit.
2. Building formal contexts and Stem basis: In this step, the agents work in parallel, with no interaction, by loading and setting up their own knowledge base (KB). They work with the formal context which is built from the
Reconciling Knowledge in Social Tagging Web Services
Fig. 2. Reconciliation algorithm
Delicious downloaded information, where the objects are the URLs and the attributes are the associated tags. From these data the context is built, and its concepts and Stem Basis (SB) are extracted. To obtain these elements we integrated the Concept Explorer tool, ConExp (http://sourceforge.net/projects/conexp/), which provides all the FCA algorithms that we need. It is developed in Java, which allows a fast deployment of FCA algorithms. ConExp comes as a ".jar" file to be included in the application classpath, and from there we can instantiate the objects needed for the computations.

3. Initializing agent dialogue step: Once an agent is initialized, it has to execute a double task related to communication. On one side, the agent sends its own language (attribute set) to the other agent. On the other side, the agent prepares itself to receive the same kind of message from the other agent. For agent communication, we try to match the intention to the FIPA performatives and their meaning, so that each message is associated with the best-fitting performative according to its content. Specifically, the sending of one agent's language to another is done through the INFORM performative.

4. Restrictions of own formal contexts: After this brief communication, the agents reduce their languages (attribute sets) to the common language, restricting their formal contexts to it. This restriction also leaves many objects outside the restricted context, because those not labelled with any common tag are discarded; they contribute nothing to the common knowledge base. With the restricted contexts, the agents compute the new concept lattices, as well as their concepts and the Stem basis.

5. Synthesizing the production system from the Stem Basis: From the Stem basis calculated in the previous step, the agents keep the rules that have a support greater than zero. In this paper we call this set of implications the Stem Kernel Basis (SKB).
Based on the SKB, a production system is synthesized,
G.A. Aranda-Corral and J. Borrego-Díaz
which will later be used to suggest to the other agent changes to objects so that they can be accepted into the common ground. This production system (used for the new tag suggestions) has been implemented from scratch, because the inference engine requirements were modest and not worth the effort of integrating another engine, such as Jess (http://www.jessrules.com/) or Drools (http://www.jboss.org/drools/).

6. Knowledge negotiation between agents: This step required an implementation phase that is clearly multiagent in character, in which deep agent communication/negotiation takes place. Although a turn-based (alternating shifts) communication could have been implemented, a more asynchronous one, which better respects the multiagent philosophy, is preferred. The reason is that the usual scenario consists of agents with KBs of different sizes, so the communication needs of each agent will differ.

– The negotiation begins with the creation, by each agent, of a new context where the common knowledge will be stored and which will hold the results of the reconciliation. Then each agent performs a massive sending of all its objects (associated tags included) to the other agent and waits for the objects and responses from it. All of these messages are sent with the PROPOSE performative.

– When an agent receives an object from the other one, it checks whether the object satisfies all the implications of the agent's SKB; if so, it includes it in the common context and also sends an acceptance message to the other agent (ACCEPT PROPOSAL performative) so that it too can include it in its common context.

– If the object does not satisfy the SKB, the agent feeds it into the production system created from the SKB, and checks whether any of the suggested attributes can be added to the object so that the SKB accepts it. Such a repaired object is then sent back to the other agent as a "new object", restarting the negotiation about this object.
If no suggestions are returned by the production system, a rejection message (REJECT PROPOSAL performative) is sent to the other agent so that it removes the object as well.

– Once the whole process of message exchange and negotiation has finished, the agents hold a common context, from which new concepts and suggestions can be extracted via the Stem basis. These represent a shared conceptualization.

3.1 Example
As explained above, Delicious has been chosen as the test environment to illustrate the method. For reasons of paper length, it is not possible to show the full trace of the method. For the experiment, the authors' accounts in Delicious were selected (http://delicious.com/garanda and http://delicious.com/jborrego), which share common interests. This lets us find a significant common language, and
User       jborrego  garanda
Language        381      137
Bookmarks       358      536
User         jborrego  garanda
Common Tags        19
Bookmarks         131      114
Implications       11       11
Fig. 3. User data statistics before (left) and after (right) reducing to the common language

User A
(tutorial) (robotics) --> (ai)
(twitter) --> (social) (web2.0) (socialnetworking)
(facebook) --> (haskell) (tutorial)

User B
(twitter) (blog) --> (social) (web2.0)
(tutorial) (twitter) --> (web2.0)
(facebook) --> (twitter)

Fig. 4. Rules before conciliating

(tutorial) (robotics) --> (programming) (ai) (haskell) (blog)
(tutorial) (programming) (haskell) (blog) --> (ai)
(twitter) --> (social) (web2.0) (blog) (socialnetworking)
(facebook) --> (social) (twitter) (tutorial) (haskell) (web2.0) (blog) (socialnetworking)

Fig. 5. Some rules produced by the reconciliation process
              Language  Bookmarks  Implications
Conciliation        19        245            21

Fig. 6. Size of the conciliated knowledge
the conciliated knowledge could be more interesting. Fig. 3 (left) shows the size of the data in the users' accounts, in attributes (language) and objects (bookmarks). Following the multiagent protocol, the agents establish the common language and reduce their contexts, keeping the common attributes and removing the objects with no tags in the common language. The results are in fig. 3, right (step 4). Fig. 4 depicts part of the rule sets of both agents, and fig. 5 shows some of the rules after reconciliation. Finally, we obtain a common context with a small number of objects and a greatly reduced number of implications with support greater than zero (last step); fig. 6 presents some information on this context.
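The core operations of steps 4 and 6 can be sketched in a few lines of Python; the function names, toy context, and toy implications below are our own illustration, not the authors' implementation:

```python
def restrict_context(context, common_tags):
    """Step 4: keep only the common attributes and drop any object
    that is left with no tags in the common language."""
    out = {}
    for url, tags in context.items():
        kept = tags & common_tags
        if kept:
            out[url] = kept
    return out

def satisfies(tags, skb):
    """Step 6 acceptance test: a tag set respects an implication
    A --> B whenever A inside tags forces B inside tags."""
    return all(concl <= tags for prem, concl in skb if prem <= tags)

def suggest(tags, skb):
    """Forward-chain the SKB to suggest the extra tags that would make
    an object acceptable: the closure of tags under the implications."""
    closed, changed = set(tags), True
    while changed:
        changed = False
        for prem, concl in skb:
            if prem <= closed and not concl <= closed:
                closed |= concl
                changed = True
    return closed

# toy data (ours): two implications and one initially rejected object
skb = [({"facebook"}, {"twitter"}), ({"twitter"}, {"web2.0"})]
assert not satisfies({"facebook"}, skb)
assert satisfies(suggest({"facebook"}, skb), skb)
assert restrict_context({"u1": {"ai", "robotics"}, "u2": {"cooking"}},
                        {"ai"}) == {"u1": {"ai"}}
```

The closure computed by `suggest` plays the role of the repaired "new object" that re-enters the negotiation.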
4 Conclusions and Future Work
In this paper, a method to reconcile the knowledge bases associated with tagging systems has been presented. The method is based on FCA and designed on top of a multiagent system, where agents collaborate in order to establish common knowledge, represented both as a new tagging and as a concept lattice. Since it is based on dialogues, it is
interesting to compare them with standard protocols, such as contract-net or similar, and in the near future to adopt one of them. The knowledge reconciliation method can be applied to any tagging-based system. Experiments on Delicious show that after a small number of taggings of the same item, a nascent consensus seems to form, and this consensus is not affected by the addition of new tags [5]. This stabilisation implies, for the conciliation method presented here, that the intents of objects tend to be similar among users. Future work will focus on extending the algorithm to find consensus ontologies (with a crowd of users) and, if possible, in a semi-automatic way.
References

1. Alonso-Jiménez, J.A., Aranda-Corral, G.A., Borrego-Díaz, J., Fernández-Lebrón, M.M., Hidalgo-Doblado, M.J.: Extending Attribute Exploration by Means of Boolean Derivatives. In: Proc. 6th Int. Conf. on Concept Lattices and Their Applications. CEUR Workshop Proc., vol. 433 (2008)
2. Aranda-Corral, G.A., Borrego-Díaz, J., Gómez-Marín, F.: Toward Semantic Mobile Web 2.0 through Multiagent Systems. In: Håkansson, A., Nguyen, N.T., Hartung, R.L., Howlett, R.J., Jain, L.C. (eds.) KES-AMSTA 2009. LNCS, vol. 5559, pp. 400–409. Springer, Heidelberg (2009)
3. Yeung, C.M.A., Gibbins, N., Shadbolt, N.: Contextualising Tags in Collaborative Tagging Systems. In: Proceedings of the 20th ACM Conference on Hypertext and Hypermedia (2009)
4. Ganter, B., Wille, R.: Formal Concept Analysis - Mathematical Foundations. Springer, Heidelberg (1999)
5. Golder, S., Huberman, B.A.: The structure of collaborative tagging systems. Journal of Information Science 32(2), 198–208 (2006)
6. Gruber, T.: Ontology of Folksonomy: A Mash-up of Apples and Oranges. Int'l. Journal on Semantic Web & Information Systems 3(2) (2007)
7. Jäschke, R., Hotho, A., Schmitz, C., Ganter, B., Stumme, G.: Discovering shared conceptualizations in folksonomies. Journal of Web Semantics 6(1), 38–53 (2008)
8. Kim, H.-L., Scerri, S., Breslin, J., Decker, S., Kim, H.-G.: The state of the art in tag ontologies: A semantic model for tagging and folksonomies. In: International Conference on Dublin Core and Metadata Applications, Berlin, Germany (2008)
9. Knerr, T.: Tagging ontology - towards a common ontology for folksonomies (2006), http://tagont.googlecode.com/files/TagOntPaper.pdf (June 14, 2008)
10. Smith, G.: Tagging: People-Powered Metadata for the Social Web, 1st edn. New Riders Publishing, Indianapolis (2007)
11. Tanaka, J.W., Taylor, M.: Object categories and expertise: Is the basic level in the eye of the beholder? Cognitive Psychology 23(3), 457–482 (1991)
12. Van Damme, C., Hepp, M., Siorpaes, K.: FolksOntology: An Integrated Approach for Turning Folksonomies into Ontologies. In: ESWC 2007 Workshop Bridging the Gap between Semantic Web and Web 2.0, May 2007, pp. 57–70 (2007)
13. Weick, K.E., Sutcliffe, K.M., Obstfeld, D.: Organizing and the Process of Sensemaking. Organization Science 16(4), 409–421 (2005)
2-D Shape Representation and Recognition by Lattice Computing Techniques

V.G. Kaburlasos, A. Amanatiadis, and S.E. Papadakis

Technological Educational Institution of Kavala, Department of Industrial Informatics, 65404 Kavala, Greece
{vgkabs,aamanat,spap}@teikav.edu.gr
Abstract. We consider binary images such that an image includes a single 2-D shape, from which we extract three populations of three different (shape) descriptors, respectively. Each population is represented by an Intervals' Number, or IN for short, in the mathematical lattice (F, ⪯) of INs. In conclusion, a 2-D shape is represented in the Cartesian product lattice (F³, ⪯). We present a 2-D shape classification scheme based on fuzzy lattice reasoning (FLR). Preliminary experimental results have been encouraging. We discuss the potential of Lattice Computing (LC) techniques in image representation and recognition applications.

Keywords: 2-D shape classification, Fuzzy lattice reasoning (FLR), Inclusion measure, Intervals' number (IN), Lattice computing.
1 Introduction
In a recent work [1], we evaluated three different 2-D shape descriptors, namely Fourier descriptors (FD), angular radial transform (ART) descriptors, and image moments (IM) descriptors, for 2-D shape retrieval, as follows. A 2-D shape was represented by an Nd-dimensional vector per shape descriptor d ∈ {FD, ART, IM}. Then, 2-D shape retrieval was pursued in the Euclidean space R^Nd by k-Nearest Neighbor (kNN), for k = 1. In conclusion, the aforementioned shape descriptors were evaluated comparatively. Building on [1], this work deals with a different problem, namely 2-D shape recognition/classification, using novel techniques as described next. A population of shape descriptors, instead of an Nd-dimensional vector, is represented here by an Intervals' Number (IN) in the complete lattice (F, ⪯) of INs [4,6,7]. In conclusion, a 2-D shape is represented here by three INs, respectively, for the three shape descriptors d ∈ {FD, ART, IM}. Finally, we apply a Fuzzy Lattice Reasoning (FLR) scheme for classification in the lattice (F³, ⪯). Preliminary experimental results have been encouraging. The potential of Lattice Computing (LC) in image representation/recognition is also discussed. We remark that the term Lattice Computing (LC) was originally coined by Manuel Graña [2,3] to denote a Computational Intelligence branch, which develops algorithms in an algebra (R, ∨, ∧, +), where R is the set of real numbers.

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 391–398, 2010. © Springer-Verlag Berlin Heidelberg 2010
Later work proposed a wider definition as follows: LC is an evolving collection of tools and methodologies that can process disparate types of data including logic values, numbers, sets, symbols, and graphs based on mathematical lattice theory with emphasis on clustering, classification, regression, pattern analysis, and knowledge representation applications [5]. The work here is organized as follows. Section 2 presents the mathematical background. Section 3 presents a FLR scheme for classification. Section 4 formulates the 2-D shape recognition problem. Section 5 describes preliminary experimental results. Section 6 concludes by summarizing the contribution.
2 Mathematical Background
For basic definitions regarding lattice theory the interested reader may refer elsewhere [4,5]. This section summarizes useful mathematical tools. Note that "curly" symbols such as ⪯, ⋎, ⋏, etc. are employed between general lattice elements, whereas "straight" symbols such as ≤, ∨, ∧, etc. are employed between real numbers. Consider the following definition.

Definition 1. Let (L, ⪯) be a complete lattice with least and greatest elements O and I, respectively. An inclusion measure in (L, ⪯) is a function σ : L × L → [0, 1], which satisfies the following conditions

C0. σ(x, O) = 0, ∀x ≠ O.
C1. σ(x, x) = 1, ∀x ∈ L.
C2. x ⋏ y ≺ x ⇒ σ(x, y) < 1.
C3. u ⪯ w ⇒ σ(x, u) ≤ σ(x, w).
We remark that σ(x, y) can be interpreted as the fuzzy degree to which x is less than y; therefore the notation σ(x ⪯ y) may be used instead of σ(x, y). Two different inclusion measure functions are presented next, based on a positive valuation¹ function.

Theorem 1. If the function v : L → R is a positive valuation in a complete lattice (L, ⪯) then both the sigma-meet function σ⋏(x, y) = v(x ⋏ y)/v(x) and the sigma-join function σ⋎(x, y) = v(y)/v(x ⋎ y) are inclusion measures.
Since condition C0 in Definition 1 requires σ(x, O) = 0, ∀x ≠ O, our interest here is in positive valuation functions v : L → R≥0 such that v(O) = 0, as explained by Kaburlasos and Papadakis [6] in Theorem A.10. A four-level hierarchy of complete lattices is presented progressively, next.

2.1 Hierarchy Level-0: The Lattice (R, ≤) of Real Numbers
The set R of real numbers ordered by the conventional order (≤) relation is a complete, totally-ordered lattice (R, ≤) with least and greatest elements denoted, respectively, by O = −∞ and I = +∞.

¹ A positive valuation in a lattice (L, ⪯) is a real function v : L → R that satisfies both v(x) + v(y) = v(x ⋏ y) + v(x ⋎ y) and x ≺ y ⇒ v(x) < v(y).
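Since (R, ≤) is a chain, ⋏ and ⋎ reduce to min and max, so the valuation identity of the footnote holds for any real function, and positivity reduces to strict monotonicity. A quick numerical check of this observation (the function name and grid are our own):

```python
def valuation_identity_holds(v, xs):
    """On the chain (R, <=), meet and join are min and max, so
    v(x) + v(y) == v(x meet y) + v(x join y) holds for any v;
    positivity additionally needs v to be strictly increasing."""
    identity = all(v(x) + v(y) == v(min(x, y)) + v(max(x, y))
                   for x in xs for y in xs)
    increasing = all(v(a) < v(b) for a, b in zip(xs, xs[1:]))
    return identity and increasing

grid = [x / 10 for x in range(-30, 31)]
print(valuation_identity_holds(lambda t: t ** 3, grid))   # True
```

A strictly decreasing function such as t ↦ −t still satisfies the identity but fails positivity, which is exactly why condition x ≺ y ⇒ v(x) < v(y) is required separately.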
2.2 Hierarchy Level-1: The Lattice (Δ, ⪯) of Generalized Intervals
A generalized interval is defined next.

Definition 2. A generalized interval is an element of the lattice (R, ≤∂) × (R, ≤).

We remark that ≤∂ in Definition 2 denotes the dual (i.e. converse) of the order relation ≤ in the lattice (R, ≤), i.e. ≤∂ ≡ ≥. The complete product lattice (R, ≤∂) × (R, ≤) ≡ (R × R, ≥ × ≤) will be denoted, simply, by (Δ, ⪯). A generalized interval will be denoted by [x, y], where x, y ∈ R. The meet (⋏) and join (⋎) in the lattice (Δ, ⪯) are given, respectively, by [a, b] ⋏ [c, d] = [a ∨ c, b ∧ d] and [a, b] ⋎ [c, d] = [a ∧ c, b ∨ d]. The set of positive (negative) generalized intervals [a, b], characterized by a ≤ b (a > b), is denoted by Δ+ (Δ−). It turns out that (Δ+, ⪯) is a poset, namely the poset of positive generalized intervals. Furthermore, the poset (Δ+, ⪯) is isomorphic² to the poset (τ(R), ⊆) of conventional intervals (sets) in R, i.e. (τ(R), ⊆) ≅ (Δ+, ⪯). We augmented the poset (τ(R), ⊆) by a least (empty) interval, denoted by O = [+∞, −∞]. Hence, the complete lattice (τO(R) = τ(R) ∪ {O}, ⊆) ≅ (Δ+ ∪ {O}, ⪯) emerged. A strictly decreasing bijective, i.e. one-to-one, function θ : R → R implies the isomorphism (R, ≤) ≅ (R, ≥). Furthermore, a strictly increasing function v : R → R is a positive valuation in the lattice (R, ≤). It follows that the function vΔ : Δ → R given by vΔ([a, b]) = v(θ(a)) + v(b) is a positive valuation in the lattice (Δ, ⪯). In general, parametric functions θ(.) and v(.) may introduce tunable nonlinearities. Two different inclusion measures, namely sigma-meet and sigma-join, have been proposed in the lattice (τO(R), ⊆) as follows:

1) σ⋏([a, b] ⪯ [c, d]) = (v(θ(a ∨ c)) + v(b ∧ d)) / (v(θ(a)) + v(b)), if a ∨ c ≤ b ∧ d; otherwise, σ⋏([a, b] ⪯ [c, d]) = 0, and
2) σ⋎([a, b] ⪯ [c, d]) = (v(θ(c)) + v(d)) / (v(θ(a ∧ c)) + v(b ∨ d)).

2.3 Hierarchy Level-2: The Lattice (F, ⪯) of Intervals' Numbers
A generalized interval number is defined in the first place, next.

Definition 3. A generalized interval number (GIN) is a function G : (0, 1] → Δ.

Let G denote the set of GINs. It follows that (G, ⪯) is a complete lattice, as the Cartesian product of complete lattices (Δ, ⪯). Our interest here focuses on the sublattice³ of Intervals' Numbers defined next.

Definition 4. An Intervals' Number, or IN for short, is a GIN F such that both F(h) ∈ (Δ+ ∪ {O}) and h1 ≤ h2 ⇒ F(h1) ⪰ F(h2).

² A map ψ : (P, ⪯) → (Q, ⪯) is called an (order) isomorphism iff both "x ⪯ y ⇔ ψ(x) ⪯ ψ(y)" and "ψ is onto Q". Two posets (P, ⪯) and (Q, ⪯) are called isomorphic, symbolically (P, ⪯) ≅ (Q, ⪯), iff there is an isomorphism between them.
³ A sublattice of a lattice (L, ⪯) is another lattice (S, ⪯) such that S ⊆ L.
Let F denote the set of INs. It follows that (F, ⪯) is a complete lattice with least element O = O(h) = [+∞, −∞] and greatest element I = I(h) = [−∞, +∞], h ∈ (0, 1]. An IN will be denoted by a capital letter in italics, e.g. F ∈ F. Given the two inclusion measures σ⋏(., .) and σ⋎(., .) in (Δ, ⪯), the following two inclusion measures emerge, respectively, in (F, ⪯):

1) σ⋏(F1 ⪯ F2) = ∫₀¹ σ⋏(F1(h) ⪯ F2(h)) dh.
2) σ⋎(F1 ⪯ F2) = ∫₀¹ σ⋎(F1(h) ⪯ F2(h)) dh.
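As a hedged sketch of how these integrals can be computed (our own representation, using the linear valuation v(x) = x with θ(x) = −x, so that vΔ([a, b]) = b − a), an IN can be stored as its intervals at L uniformly spaced h-levels and the integral approximated by an average over those levels; the size of Definition 5, further below, is discretized the same way:

```python
def sigma_meet_interval(ab, cd):
    # sigma-meet on intervals (Section 2.2) with v(x) = x, theta(x) = -x,
    # so v_delta([a, b]) = b - a; an empty meet gives 0
    (a, b), (c, d) = ab, cd
    lo, hi = max(a, c), min(b, d)
    return (hi - lo) / (b - a) if lo <= hi else 0.0

def sigma_meet_IN(F, G):
    # discretized version of the integral over h in (0, 1]
    return sum(sigma_meet_interval(f, g) for f, g in zip(F, G)) / len(F)

def size_IN(F):
    # Definition 5 discretized: mean of v(b_h) - v(a_h) over the levels
    return sum(b - a for a, b in F) / len(F)

# intervals shrink as h grows, per Definition 4
F1 = [(0.0, 4.0), (0.5, 3.5), (1.0, 3.0), (1.5, 2.5)]
F2 = [(0.0, 4.0)] * 4
print(sigma_meet_IN(F1, F2), size_IN(F1))   # 1.0 2.5
```

Since every level of F1 is contained in the corresponding level of F2, the discretized σ⋏(F1 ⪯ F2) is exactly 1, while σ⋏(F2 ⪯ F1) falls below 1.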
2.4 Hierarchy Level-3: The Cartesian Product Lattice (F^N, ⪯)
The Cartesian product lattice (F^N, ⪯) is the "fourth level" in our proposed hierarchy of complete lattices. An element of the complete lattice (F^N, ⪯) will be denoted by a capital letter in bold, e.g. F = (F1, ..., FN) ∈ F^N.

2.5 Additional Definitions
The size of an IN is defined as follows.

Definition 5. The size of an IN F = F(h) = [a_h, b_h], h ∈ (0, 1], with respect to a positive valuation function v : R → R, is defined as the nonnegative function S : F → R≥0 given by S(F) = ∫₀¹ [v(b_h) − v(a_h)] dh.
We remark that the size of an interval-IN δ = [A, B] equals S(δ) = S(B) − S(A).
3 A Fuzzy Lattice Reasoning (FLR) Classifier
Algorithm 1 (BIINtrn) induces L interval-INs from a set {F1, ..., Fntrn} of (labelled) INs for training, whereas Algorithm 2 (BIINtst) assigns classes to a set {E1, ..., Entst} of INs for testing. We remark that, on the one hand, algorithm BIINtrn is an agglomerative clustering scheme, which proceeds by conditionally merging "nearby" INs until a user-defined maximum threshold size Sθ = 0.5 is met. On the other hand, algorithm BIINtst assigns an IN Ei to the class of the interval-IN in which Ei is included most, in the sense of an inclusion measure function (σ). Note that the employment of an inclusion measure function is called Fuzzy Lattice Reasoning (FLR).
4 The Problem and Its Mathematical Formulation
We considered the MPEG-7 benchmark data set of binary images including 2-D shapes [1]. In particular, we used the 1,400-image data set divided into 70 classes with 20 images per class. Sample images are shown in Fig. 1. In a data preprocessing step, from each image, we extracted Fourier descriptors (FD), angular radial transform (ART) descriptors, and image moments
Algorithm 1. BIINtrn: Batch Interval-IN algorithm for training
1: Let {F1, ..., Fntrn} be a set of labelled INs for training. Furthermore, let c(Fℓ) denote the class label of IN Fℓ, where ℓ ∈ {1, ..., ntrn}.
2: Consider the set {δ1, ..., δntrn} of labelled (trivial) interval-INs δℓ = [Fℓ, Fℓ]; moreover, let c(δℓ) denote the class label of interval-IN δℓ, where ℓ ∈ {1, ..., ntrn}.
3: L ← ntrn.
4: Consider a user-defined size threshold Sθ = 0.5.
5: Let (I, J) := arg min{Size(δi ⋎ δj)}, i, j ∈ {1, ..., L}: I ≠ J and c(δI) = c(δJ).
6: while Size(δI ⋎ δJ) < Sθ do {learn by merging interval-INs of the same class}
7:   Replace both δI and δJ by δI ⋎ δJ.
8:   L ← L − 1.
9:   Let (I, J) := arg min{Size(δi ⋎ δj)}, i, j ∈ {1, ..., L}: I ≠ J and c(δI) = c(δJ).
10: end while
Algorithm 2. BIINtst: Batch Interval-IN algorithm for testing
1: Consider a set {δ1, ..., δL} of labelled interval-INs, where both δℓ ∈ F³ × F³ and c(δℓ) denotes the class label of interval-IN δℓ, ℓ ∈ {1, ..., L}.
2: for i = 1 to ntst do {for each testing datum Ei ∈ F³ do}
3:   J := arg max{σ([Ei, Ei] ⪯ δℓ)}, ℓ ∈ {1, ..., L}.
4:   Assign IN Ei to class c(δJ).
5: end for
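To make the two algorithms concrete, the following minimal sketch simplifies from (F³ × F³) to plain real intervals with the linear valuation v(x) = x, so that Size([a, b]) = b − a and σ⋎(x ⪯ w) = Size(w)/Size(x ⋎ w); all names, data, and the simplification itself are ours, not the authors' implementation:

```python
def join(p, q):                 # lattice join of two real intervals
    return (min(p[0], q[0]), max(p[1], q[1]))

def size(p):                    # Size with the linear valuation v(x) = x
    return p[1] - p[0]

def train(points, labels, s_theta):
    """BIINtrn, simplified: agglomeratively merge same-class trivial
    intervals while the smallest same-class join stays below s_theta."""
    boxes = [((x, x), c) for x, c in zip(points, labels)]
    while True:
        best = None
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if boxes[i][1] != boxes[j][1]:
                    continue            # only same-class merges
                s = size(join(boxes[i][0], boxes[j][0]))
                if best is None or s < best[0]:
                    best = (s, i, j)
        if best is None or best[0] >= s_theta:
            return boxes
        _, i, j = best
        merged = (join(boxes[i][0], boxes[j][0]), boxes[i][1])
        boxes = [b for k, b in enumerate(boxes) if k not in (i, j)] + [merged]

def classify(x, boxes):
    """BIINtst: assign x to the class of the box including it most,
    in the sigma-join sense size(w) / size([x,x] join w)."""
    def sigma(box):
        j = size(join((x, x), box))
        return 1.0 if j == 0.0 else size(box) / j
    return max(boxes, key=lambda bc: sigma(bc[0]))[1]

boxes = train([0.0, 0.1, 0.2, 1.0, 1.1], ["a", "a", "a", "b", "b"], 0.5)
print(classify(0.15, boxes), classify(1.05, boxes))
```

With the toy data above, training yields one box per class, and test points inside a box reach the maximal inclusion value 1.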
Fig. 1. Samples of 2-D shapes regarding shape classes (a) "chicken", and (b) "bird"
(IM) descriptors [1]. A 2-D shape was represented by Nd descriptors, where d ∈ {FD, ART, IM}; in particular, N_FD = 32, N_ART = 112, and N_IM = 6. A population of Nd descriptors was represented by one IN induced from the aforementioned population by the algorithm CALCIN [4]. For example, Fig. 2 and Fig. 3 show INs induced from FD and ART descriptors, respectively, for two different classes, namely "chicken" and "bird". Fig. 4 shows interval-INs computed by the lattice-meet and lattice-join operations of the INs they contain.
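The details of CALCIN are in [4]; as a hedged sketch only, one quantile-based construction (our own simplification, not the actual algorithm) induces nested intervals from a sample population, so that the result satisfies Definition 4:

```python
def induce_in(samples, levels=4):
    """Induce an IN from a population of descriptor values: at level h
    the interval spans the central (1 - h) mass of the sorted samples.
    This is a quantile construction standing in for CALCIN [4]."""
    xs = sorted(samples)
    n = len(xs)
    intervals = []
    for k in range(levels):
        h = (k + 1) / levels
        lo = int((h / 2) * (n - 1))          # lower quantile index
        hi = int((1 - h / 2) * (n - 1))      # upper quantile index
        intervals.append((xs[lo], xs[hi]))
    return intervals

pop = [0.1, 0.2, 0.2, 0.3, 0.4, 0.4, 0.5, 0.9]
F = induce_in(pop)
# intervals shrink as h grows, as Definition 4 requires
assert all(a1 <= a2 and b2 <= b1
           for (a1, b1), (a2, b2) in zip(F, F[1:]))
```

The top level collapses toward the sample median, while the bottom level spans nearly the whole population, mirroring the membership-function shape of the INs in Fig. 2 and Fig. 3.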
Fig. 2. Each IN, above, was induced from one population of Fourier Descriptors (FD) regarding shape classes (a) "chicken", and (b) "bird" (horizontal axis: FD magnitude)
Fig. 3. Each IN, above, was induced from one population of Angular Radial Transform (ART) descriptors regarding shape classes (a) "chicken", and (b) "bird" (horizontal axis: ART)
5 Preliminary Computational Experiments
We have developed reliable software, which implements our algorithms for both training and testing. In our computational experiments we employed sigmoid positive valuation functions v(x; λ, μ0) = 1/(1 + e^(−λ(x−μ0))) with tunable parameters λ and μ0; furthermore, we used the function θ(x) = −x.
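Putting this together with the interval inclusion measures of Section 2.2, a small sketch (our own naming, not the authors' software) of σ⋎ under the sigmoid valuation and θ(x) = −x:

```python
import math

def v(x, lam=1.0, mu0=0.0):
    # sigmoid positive valuation from this section; strictly increasing
    return 1.0 / (1.0 + math.exp(-lam * (x - mu0)))

def theta(x):
    # strictly decreasing bijection used throughout
    return -x

def v_delta(a, b):
    # valuation of a generalized interval [a, b]: v(theta(a)) + v(b)
    return v(theta(a)) + v(b)

def sigma_join(ab, cd):
    # sigma-join of Section 2.2: v_delta([c, d]) / v_delta([a,b] join [c,d])
    (a, b), (c, d) = ab, cd
    return v_delta(c, d) / v_delta(min(a, c), max(b, d))

x, w = (0.2, 0.4), (0.0, 1.0)
print(sigma_join(x, w))   # x inside w, so the join is w itself and sigma = 1
```

The parameters λ and μ0 shift and sharpen the sigmoid, which is what makes the valuation, and hence the inclusion measure, tunable during training.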
Fig. 4. Two interval-INs, drawn in thick lines, computed from ART-descriptor-induced INs regarding, respectively, shape classes (a) “chicken”, and (b) “bird”. Note that an interval-IN (drawn in thick lines) envelops a cluster of INs drawn in thin lines.
Preliminary experimental results have been encouraging. For instance, in certain experiments, we have recorded recognition rates well over 90%. Comparative experimental work is currently under way.
6 Conclusion
This work has presented preliminary experimental evidence that distributions of measurements, represented by INs, can be used for effective 2-D shape representation towards recognition. Further research on IN-based LC techniques in image representation and recognition applications is a topic for future work.

Acknowledgement. This work has been supported, in part, by an Archimedes-III project contract.
References

1. Amanatiadis, A., Kaburlasos, V.G., Gasteratos, A., Papadakis, S.E.: A comparative study of invariant descriptors for shape retrieval. In: Proc. 2009 IEEE Intl. Conf. on Imaging Systems & Techniques (IST 2009), pp. 391–394 (2009)
2. Graña, M.: State of the art in lattice computing for artificial intelligence applications. In: Nadarajan, R., Anitha, R., Porkodi, C. (eds.) Mathematical and Computational Models, pp. 233–242 (2007)
3. Graña, M.: Lattice computing: lattice-theory-based computational intelligence. In: Matsuhisa, T., Koibuchi, H. (eds.) Proc. Kosen Workshop on Mathematics, Technology, and Education (MTE), pp. 19–27 (2008)
4. Kaburlasos, V.G.: Towards a Unified Modeling and Knowledge-Representation Based on Lattice Theory. SCI, vol. 27. Springer, Heidelberg (2006)
5. Kaburlasos, V.G.: Granular fuzzy inference system (FIS) design by lattice computing. In: Corchado Rodriguez, E.S., et al. (eds.) HAIS 2010, Part II. LNCS (LNAI), vol. 6077, pp. 410–417. Springer, Heidelberg (2010)
6. Kaburlasos, V.G., Papadakis, S.E.: A granular extension of the fuzzy-ARTMAP (FAM) neural classifier based on fuzzy lattice reasoning (FLR). Neurocomputing 72(10-12), 2067–2078 (2009)
7. Papadakis, S.E., Kaburlasos, V.G.: Induction of classification rules from histograms. In: Proc. 8th Intl. Conf. on Natural Computing, Joint Conf. on Information Sciences (JCIS 2007), pp. 1646–1652 (2007)
Order Metrics for Semantic Knowledge Systems

Cliff Joslyn¹ and Emilie Hogan²

¹ National Security Directorate, Pacific Northwest National Laboratory, Seattle, Washington, 98109, USA
[email protected]
² Mathematics Department, Rutgers University
Abstract. Knowledge systems technologies, as derived from AI methods and used in the modern Semantic Web movement, are dominated by graphical knowledge structures such as ontologies and semantic graph databases. A critical but typically overlooked aspect of all of these structures is their admission to analyses in terms of formal hierarchical relations. The partial order representations of whatever hierarchy is present within a knowledge structure afford opportunities to exploit these hierarchical constraints to facilitate a variety of tasks, including ontology analysis and alignment, visual layout, and anomaly detection. We introduce the basic concepts of order metrics and address the impact of a hierarchical (order-theoretical) analysis on knowledge systems tasks.
1 Introduction
Knowledge systems technologies are dominated by graphical structures. Semantic graph databases [15] take the form of labeled directed graphs implemented in RDF [9]. Their OWL [10] ontological typing systems are also labeled directed graphs, frequently dominated by directed acyclic graph (DAG) and other hierarchical structures. Fig. 1 shows a toy example, where the ontology of classes on the left forms the typing system for the semantic graph of node and link instances on the right. But where semantic taxonomies such as the Gene Ontology [2] include hierarchical class structures, other portions can be non-hierarchical. And more general knowledge structures like semantic graphs are not explicitly or necessarily hierarchical, but may contain large hierarchical components. In practice, ontologies are dominated by their "hierarchical cores", specifically their class hierarchies connected by is-a subsumptive and has-part compositional links. And many of the most common links in RDF graphs are transitive, including causes, implies, and precedes. The partial order representation of whatever hierarchy is present within a knowledge structure affords opportunities to exploit these hierarchical constraints for a variety of tasks, including:

Clustering and Classification: Characterizing a portion of a hierarchy (e.g. groups of ontology nodes) to identify common characteristics [12,18].

Alignment: Casting ontology matching [7] as mappings between hierarchical structures [11,14].

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 399–409, 2010. © Springer-Verlag Berlin Heidelberg 2010
Fig. 1. Toy model of a semantic graph database. (Left) Ontological typing system as a labeled, directed graph of classes (sample instances shown below dashed links). (Right) Conforming instance sub-graph.
Visualization: Exploiting, among other things, the level structure of hierarchies to achieve a satisfactory layout [13].

In general, such a hierarchical analysis, when available, promises complexity reduction, improved user interaction with the knowledge base, and improved layout and visual analytics.
2 DAGs and Partial Orders
Hierarchies are represented as partially ordered sets (posets), which are reflexive, anti-symmetric, and transitive binary relations P = ⟨P, ≤⟩ on an underlying finite set of nodes P [6]. While we typically think of hierarchies as tree structures, more general kinds of hierarchies have "multiple inheritance", where nodes can have more than one parent. These include lattice structures, where pairs of nodes have unique least common subsumers (and unique greatest lower bounds as well); partial orders, where pairs of nodes can have an indefinite number of least common subsumers and greatest lower bounds; and finally general DAGs, which can also include "transitive links" that form shortcuts across paths. Consider the simple DAG at the top of Fig. 2. The two transitive links 1 → H, 1 → E shortcut the two paths 1 → K → H and 1 → C → I → E respectively. Given a DAG D, the DAG P(D) produced by including all possible transitive links consistent with its paths is its transitive closure, and determines an ordered set P(D) = ⟨P, ≤⟩ where a ≤ b for a, b ∈ P iff there is a directed path from a to b in D. The graph V(D) produced from a DAG D by removing all its transitive links (its transitive reduction [1]) determines a cover relation or Hasse diagram. Thus each cover relation V determines a unique poset P(V), and vice versa a poset P determines a unique cover V(P); each DAG D determines a unique poset P(D) and cover V(D); and each unique poset-cover pair determines a class of DAGs equivalent up to transitive links.
Fig. 2. (Top) A DAG D. (Left) Transitive reduction V(D). (Right) Transitive closure P(D).
For a DAG D we can measure its degree of transitivity as

TR(D) := |D \ V(D)| / |P(D) \ V(D)|,   (1)
where \ is set subtraction, we interpret each structure as the binary relation on P² of its incidence matrix, and | · | is cardinality, so that | · | is the number of links in its argument, seen as a graph. TR(D) measures the number |D \ V(D)| of transitive links in D relative to the total possible number |P(D) \ V(D)| in its transitive closure P(D). In Fig. 2 we have TR(D) = 2/11, indicating a relatively low degree of transitivity. In knowledge systems such as ontologies, our interpretation of the presence or absence of transitive links in DAGs is significant. If the link-type in question is anti-transitive, so that transitive links are disallowed, then clearly the presence of transitive links is in error. If, on the other hand, the link-type in question is atransitive, so that transitive links are allowed but not required, then TR(D) measures their extent. But finally, if, as is the case with our subsumption and composition types, the link type represents a fully transitive property, then the presence of transitive links is irrelevant or erroneous. Effectively, such link types live in the transitively equivalent class of DAGs, that is, in the partial order P(D), and TR(D) can be used as an aid to the user or engineer to identify issues with the underlying ontology.
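Eq. (1) can be computed mechanically from an edge set. The sketch below (our own naming; the toy DAG is an assumption for illustration, not necessarily the exact DAG of Fig. 2) derives P(D), V(D), and TR(D):

```python
def closure(edges):
    """Transitive closure P(D): (a, c) whenever a path joins a to c."""
    reach = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(reach):
            for b2, c in list(reach):
                if b == b2 and (a, c) not in reach:
                    reach.add((a, c))
                    changed = True
    return reach

def reduction(edges):
    """Transitive reduction V(D): drop edges implied by a longer path."""
    reach = closure(edges)
    nodes = {x for e in edges for x in e}
    return {(a, b) for a, b in edges
            if not any((a, m) in reach and (m, b) in reach for m in nodes)}

def TR(edges):
    P, V = closure(edges), reduction(edges)
    return len(set(edges) - V) / len(P - V)   # Eq. (1)

# a small DAG with two transitive links, 1 -> H and 1 -> E
D = {("1", "C"), ("C", "I"), ("I", "E"), ("1", "K"), ("K", "H"),
     ("1", "E"), ("1", "H")}
print(TR(D))   # 0.5
```

For this toy DAG the closure has 9 links and the reduction 5, so both transitive links present out of the 4 possible give TR(D) = 2/4 = 0.5.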
3 Measures on Hierarchical Graphs
Given a hierarchical DAG structure represented by its transitive closure poset P, tools are available to measure this hierarchical structure. Here we discuss
C. Joslyn and E. Hogan
interval-valued rank, measuring the vertical level of nodes, and order metrics, measuring the distances between nodes. See [13] for more details. Consider the hierarchy shown in Fig. 3. We are concerned with the proper representation of the vertical level of each node, as represented by its positioning in a layout. We note that all children of the root have the same "distance" from the root, but if these are also leaves then they should be positioned further down. In other words, we need to exploit the vertical distance from both the top and a global bottom, including a virtual node 0 ∈ P inserted below all the leaves.
Fig. 3. A DAG displayed as a hierarchy
For a, b ∈ P, let h∗(a, b) be the length of the maximum path from a to b. Then the distance of a node a ∈ P from the root 1 ∈ P is the top rank r^t(a) := h∗(a, 1). Dually we define the bottom rank r^b(a) := h∗(0, 1) − h∗(0, a), where h∗(0, 1) is the overall height of the structure. Then the interval rank R(a) := [r^t(a), r^b(a)] becomes available as an interval-valued measure of the vertical levels over which a can range, while the rank width W(a) := r^b(a) − r^t(a) is a measure of that range [13]. We can exploit this vertical rank for hierarchical layout and visualization, as shown for our example in Fig. 4. Each node which sits on a complete chain (a path from 1 down to 0) of maximal size is placed horizontally at the center of the page. Nodes are laid out horizontally according to the size of their largest maximal chains. The result is to place maximal complete chains along a central axis, and short complete chains towards the outer edges. Nodes are placed vertically at the midpoint of their interval rank, but are free to move between top rank r^t(a) and bottom rank r^b(a). The result is that while nodes on maximal complete chains (all those intersecting the chain 0 → D → E → I → X → 1 in the example) sit at a single level, some (for example K) do not. While Fig. 4 shows a 2D layout, we have also deployed this concept in a 3D layout [13].
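The rank computations above can be sketched as follows (Python; the small cover relation is an illustrative assumption, not the hierarchy of Fig. 3). Node E is both a child of the root and a leaf, so its interval rank is genuinely wide:

```python
from functools import lru_cache

# Illustrative cover relation: root '1' on top, virtual bottom '0' below the leaves.
children = {'1': ['A', 'B', 'E'], 'A': ['C'], 'B': ['C', 'D'],
            'C': ['0'], 'D': ['0'], 'E': ['0'], '0': []}
parents = {n: [p for p in children if n in children[p]] for n in children}

@lru_cache(maxsize=None)
def down(a):
    """h*(0, a): length of the maximum path from a down to the virtual bottom."""
    return 0 if a == '0' else 1 + max(down(b) for b in children[a])

@lru_cache(maxsize=None)
def top_rank(a):
    """r^t(a) = h*(a, 1): length of the maximum path from the root down to a."""
    return 0 if a == '1' else 1 + max(top_rank(p) for p in parents[a])

H = down('1')                     # overall height h*(0, 1)

def interval_rank(a):
    """R(a) = [r^t(a), r^b(a)], with r^b(a) = h*(0, 1) - h*(0, a)."""
    return (top_rank(a), H - down(a))

# E may sit anywhere between levels 1 and 2; C is pinned to level 2.
print(interval_rank('E'), interval_rank('C'))  # → (1, 2) (2, 2)
```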
4 Order Metrics
Given the need to perform operations like clustering or alignment on ontologies represented as ordered sets P = (P, ≤), it is essential to have a general sense of
Fig. 4. Chain layout of the cyclic decomposition of the network in Fig. 3
distance d(a, b) between two nodes a, b ∈ P. The knowledge systems literature has focused on semantic similarities to perform a similar function, which are available when P is equipped with a probability distribution, derived, for example, from the frequency with which terms appear in documents (for the Wordnet [8] thesaurus), or genes are annotated to GO nodes. So assume a poset (P, ≤) with a base probability distribution p : P → [0, 1], Σ_{a∈P} p(a) = 1, and a "cumulative" function β(a) := Σ_{b≤a} p(b). We then generalize the join (least upper bound) and meet (greatest lower bound) operations in lattices as follows. Let ↑a := {b : b ≥ a} and ↓a := {b : b ≤ a} be the up-set (filter) and down-set (ideal), respectively, of a node a ∈ P. Then for two nodes a, b ∈ P, let a∇b := ↑a ∩ ↑b and aΔb := ↓a ∩ ↓b be the set of nodes above or below, respectively, both of them. Then the generalized join a ∨ b is the set of minimal (lowest) nodes of a∇b, and the generalized meet a ∧ b is the set of maximal (highest) nodes of aΔb. When P is a lattice, then |a ∨ b| = |a ∧ b| = 1, recovering traditional join and meet. Common choices for the semantic similarity S(a, b) between two nodes include the measures of Resnik, Lin, and Jiang and Conrath [5]:

S(a, b) = max_{c∈a∨b} [−log₂(β(c))]   (2)

S(a, b) = 2 max_{c∈a∨b} [log₂(β(c))] / (log₂(β(a)) + log₂(β(b)))   (3)

S(a, b) = 2 max_{c∈a∨b} [log₂(β(c))] − log₂(β(a)) − log₂(β(b))   (4)
respectively. But most of these are not metrics (not satisfying the triangle inequality), and all of them lack a general mathematical grounding and require a probabilistic weighting. We use ordered set metrics [16,17], which are preferable to semantic similarities because, while they can use a quantitative weighting such as β, they do not require one; and because they always yield a metric. They are based on valuation functions v : P → ℝ⁺ which are, first, either isotone (a ≤ b → v(a) ≤ v(b)) or antitone (a ≤ b → v(a) ≥ v(b)); and then semimodular, in that

v(a) + v(b) ◇ v∇(a, b) + vΔ(a, b),   (5)

where ◇ ∈ {≤, ≥, =}, yielding super-modular, sub-modular, and modular valuations respectively; and

v∇(a, b) := min_{c∈a∇b} v(c),  vΔ(a, b) := max_{c∈aΔb} v(c).   (6)
Whether a valuation v is antitone or isotone, and then sub- or super-modular, determines which of four distance functions is generated; e.g. the antitone, super-modular case yields d(a, b) = v(a) + v(b) − 2v∇(a, b). When P is a lattice, then this simplifies to d(a, b) = v(a) + v(b) − 2v(a ∨ b). See [17] for full details and proofs. Typical valuations v include the cardinality of up-sets and down-sets, v(a) = |↑a|, v(a) = |↓a|, and the cumulative probabilities used in semantic similarities, v(a) = β(a). In this way, poset metrics generalize semantic similarities and provide a strong basis for various analytical tasks.
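The generalized join and the similarities (2)-(4) are easy to compute on a small example. In the sketch below, the five-node poset and its uniform base probability are illustrative assumptions, not drawn from the text; nodes c and d have two least common subsumers, a and b:

```python
import math

# Toy poset: r on top; a, b below r; c, d each below both a and b.
# The ≤ relation is listed reflexively and transitively; p is uniform.
P = {'r', 'a', 'b', 'c', 'd'}
le = {(x, x) for x in P} | {('a','r'), ('b','r'), ('c','r'), ('d','r'),
                            ('c','a'), ('c','b'), ('d','a'), ('d','b')}
p = {x: 1.0 / len(P) for x in P}

up   = lambda a: {b for b in P if (a, b) in le}      # ↑a (filter)
down = lambda a: {b for b in P if (b, a) in le}      # ↓a (ideal)
beta = lambda a: sum(p[b] for b in down(a))          # cumulative β(a)

def gen_join(a, b):
    """a ∨ b: minimal (lowest) nodes of the common up-set a∇b = ↑a ∩ ↑b."""
    common = up(a) & up(b)
    return {c for c in common
            if not any(x != c and (x, c) in le for x in common)}

def resnik(a, b):                                    # eq. (2)
    return max(-math.log2(beta(c)) for c in gen_join(a, b))

def lin(a, b):                                       # eq. (3)
    num = 2 * max(math.log2(beta(c)) for c in gen_join(a, b))
    return num / (math.log2(beta(a)) + math.log2(beta(b)))

def jiang_conrath(a, b):                             # eq. (4)
    return (2 * max(math.log2(beta(c)) for c in gen_join(a, b))
            - math.log2(beta(a)) - math.log2(beta(b)))

print(sorted(gen_join('c', 'd')))   # → ['a', 'b']: two least common subsumers
```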
5 Order Metrics in Ontology Alignment
A good example of the utility of this order-theoretical technology in knowledge systems tasks is ontology alignment [7,11]. An ontology alignment is a mapping f : P → P′ taking anchors a ∈ P in one semantic hierarchy P = (P, ≤) into anchors a′ ∈ P′ in another P′ = (P′, ≤′). In seeking a measure of the structural properties of the mapping f, our primary criterion is that f should not distort the metric relations of concepts, taking nodes that are close together and making them farther apart, or vice versa. It should be noted that a "smooth" mapping f is neither necessary nor sufficient for a good alignment: on the one hand, a good structural mapping may be available between structures from different domains; and on the other, differences in semantic intent between the two structures may be irreconcilable. Nonetheless, other things being equal, it is preferable to have a smoother mapping than not.
So, for two ontology nodes a, b ∈ P, consider the lower cardinality distance dl(a, b) := |↓a| + |↓b| − 2 max_{c∈a∧b} |↓c|. We can measure the change in distance between a, b ∈ P induced by f as the distance discrepancy

δ(a, b) := |d̄l(a, b) − d̄l(f(a), f(b))|,   (7)

where d̄l(a, b) := dl(a, b)/diam_d(P) ∈ [0, 1] is the normalized lower distance between a and b in P given the diameter diam_d(P) := max_{a,b∈P} d(a, b). We can measure the entire amount of distance discrepancy at a node a ∈ P compared to all the other anchors b ∈ P by summing

δf(a) := Σ_{b∈P} δ(a, b) = Σ_{b∈P} |d̄l(a, b) − d̄l(f(a), f(b))|,   (8)

yielding the discrepancy δ(f) := Σ_{a∈P} δf(a) of the alignment. Consider the example in Fig. 5, with the partial alignment function f as shown, mapping only certain nodes {B, E, G} from P to P′. Then we have, e.g., the lower normalized distance between nodes E and G as d̄l(E, G) = 1/3; the distance discrepancy between the two nodes E, G in virtue of f as δ(E, G) = |1/3 − 3/5| = .267; the entire distance discrepancy at the node E as δf(E) = 2/5; and finally the distance discrepancy for the entire alignment as δ(f) = .47.
Fig. 5. An example alignment
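The discrepancy computation (7)-(8) can be sketched as follows. The two toy ordered sets (a diamond and a chain), the anchor map f, and all node names below are illustrative assumptions, not the structures of Fig. 5:

```python
# Two toy ordered sets: a diamond P and a chain Pp ("P prime");
# f maps the anchors {a, b} of P onto {x, y} of Pp.
P   = {'0', 'a', 'b', '1'}
leP = {('0','0'), ('a','a'), ('b','b'), ('1','1'),
       ('0','a'), ('0','b'), ('0','1'), ('a','1'), ('b','1')}
Pp   = {'O', 'x', 'y', 'I'}
lePp = {('O','O'), ('x','x'), ('y','y'), ('I','I'),
        ('O','x'), ('O','y'), ('O','I'), ('x','y'), ('x','I'), ('y','I')}
f = {'a': 'x', 'b': 'y'}

def norm_lower_dist(P, le):
    """Return the normalized lower cardinality distance d̄_l on (P, le)."""
    down = lambda a: {b for b in P if (b, a) in le}
    def dl(a, b):
        m = max(len(down(c)) for c in down(a) & down(b))   # max |↓c|, c ∈ a∧b
        return len(down(a)) + len(down(b)) - 2 * m
    diam = max(dl(a, b) for a in P for b in P)             # diam_d(P)
    return lambda a, b: dl(a, b) / diam

d1, d2 = norm_lower_dist(P, leP), norm_lower_dist(Pp, lePp)
anchors = list(f)
delta   = lambda a, b: abs(d1(a, b) - d2(f[a], f[b]))      # eq. (7)
delta_f = lambda a: sum(delta(a, b) for b in anchors)      # eq. (8)
total   = sum(delta_f(a) for a in anchors)                 # δ(f)
print(total)   # ≈ 0.667: the diamond's incomparable pair becomes a chain pair
```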
6 Order Metrics for Ontology Clustering
Consider the following question. Assume a (portion of a) taxonomy is represented as a finite, non-empty poset P = (P, ≤), and we are then given a collection of nodes Q ⊆ P. How "big an area" does Q "implicate" or "delineate" or "occupy" in the hierarchy? We are pursuing this question now in the context of determining the quality of ontology query returns: the "tighter" a set of ontology nodes returned from an ontology query, the stronger the quality of that set. Two nodes a, b ∈ P are comparable, a ∼ b, if a ≤ b or a ≥ b. If a ≤ b in P, then define the order interval [a, b] := {c ∈ P : a ≤ c ≤ b}. Note
that [a, b] = ↑a ∩ ↓b. Now consider two typical order metrics, the upper and lower cardinality metrics:

du(a, b) := |↑a| + |↑b| − 2|↑a ∩ ↑b|,  dl(a, b) := |↓a| + |↓b| − 2|↓a ∩ ↓b|,   (9)
for a, b ∈ P. From the triangle inequality of d, we know that ∀a, b, c ∈ P, d(a, b) ≤ d(a, c) + d(c, b). So following [3,4], for two points a, b ∈ P and metric d, we can define the segment [[a, b]]_d as the set of all nodes which are "between" them in the metric sense:

[[a, b]]_d := {c ∈ P : d(a, b) = d(a, c) + d(c, b)}.   (10)
We know that ∀a, b ∈ P, [[a, b]] ≠ ∅, since a, b ∈ [[a, b]]; and when nodes are comparable, segments collapse to order intervals: a ∼ b → [[a, b]] = [a, b]. Consider the three-cube in Fig. 6, with du shown in Table 1: we have du(B, G) = 4, and [[B, G]] = {a ∈ P : du(a, B) + du(a, G) = 4} = {A, B, C, D, G}.
Fig. 6. The Boolean 3-cube

Table 1. Upper distance matrix du in the 3-cube

du(a,b)  A  B  C  D  E  F  G  H
A        0  1  1  1  3  3  3  7
B        1  0  2  2  2  2  4  6
C        1  2  0  2  2  4  2  6
D        1  2  2  0  4  2  2  6
E        3  2  2  4  0  4  4  4
F        3  2  4  2  4  0  4  4
G        3  4  2  2  4  4  0  4
H        7  6  6  6  4  4  4  0
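The entries of Table 1 and the segment [[B, G]] can be reproduced directly. The sketch below assumes one labeling of the 3-cube consistent with the table (A the top, H the bottom, B/C/D the coatoms, E/F/G the atoms), since the figure itself is not reproduced here:

```python
# One labeling of the Boolean 3-cube consistent with Table 1 (an assumption):
# A on top; B, C, D coatoms; E, F, G atoms; H at the bottom.
covers = {'A': ['B', 'C', 'D'], 'B': ['E', 'F'], 'C': ['E', 'G'],
          'D': ['F', 'G'], 'E': ['H'], 'F': ['H'], 'G': ['H'], 'H': []}
P = set(covers)

def up(a):
    """↑a: reflexive-transitive ancestors of a in the cover relation."""
    seen, stack = {a}, [a]
    while stack:
        x = stack.pop()
        for parent, kids in covers.items():
            if x in kids and parent not in seen:
                seen.add(parent); stack.append(parent)
    return seen

def du(a, b):                 # upper cardinality metric, eq. (9)
    return len(up(a)) + len(up(b)) - 2 * len(up(a) & up(b))

def segment(a, b):            # [[a, b]]_du, eq. (10)
    return {c for c in P if du(a, b) == du(a, c) + du(c, b)}

print(du('B', 'G'), sorted(segment('B', 'G')))  # → 4 ['A', 'B', 'C', 'D', 'G']
```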
Convexity is the idea that any nodes between other nodes in a collection Q ⊆ P are also in that collection, so that a subset of nodes Q ⊆ P is convex if ∀a, b ∈ Q, [[a, b]] ⊆ Q. We can then define C(Q), the convex hull of Q, by the following iterative algorithm using the function K(Q) := ∪_{a,b∈Q} [[a, b]]_d.
Q̂ := Q
while Q̂ is not convex:
    Q̂ := K(Q̂)
return Q̂

The convex hull C(Q) is clearly convex, and includes the original set: Q ⊆ C(Q). Consider again a poset P = (P, ≤) with metric d, and a subset of nodes Q ⊆ P. Then we can define the exterior points as those outside the convex hull, E(Q) := P \ C(Q), and the interior points as those inside the convex hull but not in the original collection, I(Q) := C(Q) \ Q. For a subset of nodes Q ⊆ P we have its size

S(Q) := |Q|,  S̄(Q) := S(Q)/S(P)   (11)

in both un-normalized and normalized forms, and similarly the dispersion

D(Q) := Σ_{a,b∈C(Q)} d(a, b),  D̄(Q) := D(Q)/D(P).   (12)
Continuing our example from Fig. 6, still using du, consider the set Q = {B, E, G}. Then we have

C(Q) = [[B, E]] ∪ [[B, G]] ∪ [[E, G]] = {B, E} ∪ {A, B, C, D, G} ∪ {E, C, G} = {A, B, C, D, E, G}.   (13)

This is shown in Fig. 7. We have exterior points E(Q) = {F, H} and interior points I(Q) = {A, C, D}. We also have D−(Q) = 10, D(Q) = 35, and D(P) = D−(P) = 91. So, note that while the normalized dispersion is D̄(Q) = 35/91 = 0.385, the relative size is S̄(Q) = 3/8 = 0.375 ≤ 0.385 = D̄(Q).
Fig. 7. The 3-cube identifying C({B, E, G})
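The hull iteration and the size/dispersion numbers above can be checked on the same cube (again assuming the Table 1 labeling with A on top and H at the bottom):

```python
from itertools import combinations

# Boolean 3-cube, labeled consistently with Table 1 (an assumption).
covers = {'A': ['B', 'C', 'D'], 'B': ['E', 'F'], 'C': ['E', 'G'],
          'D': ['F', 'G'], 'E': ['H'], 'F': ['H'], 'G': ['H'], 'H': []}
P = set(covers)

def up(a):
    seen, stack = {a}, [a]
    while stack:
        x = stack.pop()
        for parent, kids in covers.items():
            if x in kids and parent not in seen:
                seen.add(parent); stack.append(parent)
    return seen

du  = lambda a, b: len(up(a)) + len(up(b)) - 2 * len(up(a) & up(b))
seg = lambda a, b: {c for c in P if du(a, b) == du(a, c) + du(c, b)}

def hull(Q):
    """Convex hull C(Q) via the iteration Q̂ := K(Q̂) until a fixpoint."""
    Qh = set(Q)
    while True:
        K = set().union(*(seg(a, b) for a in Qh for b in Qh))
        if K == Qh:               # Q̂ is convex exactly when K(Q̂) = Q̂
            return Qh
        Qh = K

# Pairwise distance sum over a node set: D(S). Applied to the hull C(Q) it is
# the dispersion D(Q) of eq. (12); applied to Q itself it is D−(Q).
D = lambda S: sum(du(a, b) for a, b in combinations(sorted(S), 2))

Q = {'B', 'E', 'G'}
C = hull(Q)
exterior, interior = P - C, C - Q
print(sorted(C), sorted(exterior), sorted(interior), D(Q), D(C), D(P))
# → ['A','B','C','D','E','G'] ['F','H'] ['A','C','D'] 10 35 91
```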
Now consider Q′ = {E, G} ⊆ Q; then we have:

C(Q′) = [[E, G]] = {E, C, G},  E(Q′) = {A, B, D, F, H},   (14)
I(Q′) = {C},  V(Q′) = C(Q′) = {E, C, G}.   (15)

Note that this last result V(Q′) = C(Q′) holds whenever |Q′| = 2. Finally we have

S(Q′) = 2,  S̄(Q′) = 0.25,   (16)
D(Q′) = D−(Q′) = 3,  D̄(Q′) = D̄−(Q′) = 3/91 = 0.033.   (17)
It is valuable to compare the above approach to a typical approach used in semantic analysis, which is to work not within the poset P as a directed graph, but rather within the undirected, symmetrically-closed version of P, wherein link directions, and thus hierarchical structure, are not recognized. Let G(P) := (P, R), where R ⊆ P² and ∀a, b ∈ P, (a, b) ∈ R ↔ a ∼ b. The metric dp(a, b) is then the minimum path length between a and b in G(P). In our original example with Q = {B, E, G}, we have E ∼ B, so that [[B, E]]_du = [[B, E]]_dp = {B, E}. But [[E, G]]_dp = {E, C, G, H}, and [[B, G]]_dp = P, because B and G are inverses. Thus the convex hull is C_dp(Q) = P, and Q can be said to be of maximal size. This is clearly inadequate.
References

1. Aho, A.V., Garey, M.R., Ullman, J.D.: The Transitive Reduction of a Directed Graph. SIAM Journal of Computing 1(2), 131–137 (1972)
2. Ashburner, M., Ball, C.A., Blake, J.A., et al.: Gene Ontology: Tool for the Unification of Biology. Nature Genetics 25(1), 25–29 (2000)
3. Bandelt, H.J.: Centroids and Medians of Finite Metric Spaces. J. Graph Theory 16(4), 305–317 (1992)
4. Bandelt, H.J., Chepoi, V.: Metric Graph Theory and Geometry: A Survey. In: Surveys on Discrete and Computational Geometry: Twenty Years Later, vol. 453, pp. 49–86. American Math. Soc., Providence (2008)
5. Budanitsky, A., Hirst, G.: Evaluating WordNet-based Measures of Lexical Semantic Relatedness. Computational Linguistics 32(1), 13–47 (2006)
6. Davey, B.A., Priestley, H.A.: Introduction to Lattices and Order, 2nd edn. Cambridge UP, Cambridge (1990)
7. Euzenat, J., Shvaiko, P.: Ontology Matching. Springer, Heidelberg (2007)
8. Fellbaum, C. (ed.): Wordnet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
9. http://www.w3.org/RDF
10. http://www.w3.org/TR/owl-features
11. Joslyn, C., Baddeley, B., Blake, J., Bult, C., et al.: Automated Annotation-Based Bio-Ontology Alignment with Structural Validation. In: Smith, B. (ed.) Proc. Int. Conf. on Biomedical Ontology (ICBO 2009), pp. 75–78 (2009), doi:10.1038/npre.2009.3518.1
12. Joslyn, C., Mniszewski, S., Fulmer, A., Heaton, G.: The Gene Ontology Categorizer. Bioinformatics 20(s1), 169–177 (2004)
13. Joslyn, C., Mniszewski, S.M., Smith, S.A., Weber, P.M.: SpindleViz: A Three Dimensional, Order Theoretical Visualization Environment for the Gene Ontology. In: Joint BioLINK and 9th Bio-Ontologies Meeting, JBB 2006 (2006), http://www.bio-ontologies.org.uk/2006/download/Joslyn2EtAlSpindleviz.pdf
14. Joslyn, C., Paulson, P., White, A.: Measuring the Structural Preservation of Semantic Hierarchy Alignments. In: Proc. 4th Int. Wshop. on Ontology Matching (OM 2009), CEUR, vol. 551 (2009), http://ceur-ws.org/Vol-551/om2009_Tpaper6.pdf
15. McBride, B.: Jena: A Semantic Web Toolkit. IEEE Internet Computing 6(6), 55–59 (2002)
16. Monjardet, B.: Metrics on Partially Ordered Sets - A Survey. Discrete Mathematics 35, 173–184 (1981)
17. Orum, C., Joslyn, C.A.: Valuations and Metrics on Partially Ordered Sets (2009) (submitted), http://arxiv.org/abs/0903.2679v1
18. Verspoor, K.M., Cohn, J.D., Mniszewski, S.M., Joslyn, C.A.: A Categorization Approach to Automated Ontological Function Annotation. Protein Science 15, 1544–1549 (2006)
Granular Fuzzy Inference System (FIS) Design by Lattice Computing

Vassilis G. Kaburlasos

Technological Educational Institution of Kavala, Department of Industrial Informatics, 65404 Kavala, Greece
[email protected]
Abstract. Information granules are partially/lattice-ordered. Therefore, lattice computing (LC) is proposed for dealing with them. The granules here are Intervals’ Numbers (INs), which can represent real numbers, intervals, fuzzy numbers, probability distributions, and logic values. Based on two novel theoretical propositions introduced here, it is demonstrated how LC may enhance popular fuzzy inference system (FIS) design by the rigorous fusion of granular input data, the sensible employment of sparse rules, and the introduction of tunable nonlinearities. Keywords: Fuzzy inference system (FIS), Granular data, Inclusion measure, Intervals’ number (IN), Lattice computing.
1 Introduction
An information granule [11] can be thought of as a (local) cluster. It turns out that clusters are partially ordered (for a formal definition of partial order see below). Under certain conditions, a partially ordered set is a lattice. Hence, mathematical lattice theory emerges naturally in granular computing. The term Lattice Computing (LC) was coined by Graña [3,4] to denote a Computational Intelligence branch which develops algorithms in an algebra (R, ∨, ∧, +), where R is the set of real numbers. Later work [5,9] proposed the following, wider definition: Lattice computing (LC) is an evolving collection of tools and methodologies that can process disparate types of data including logic values, numbers, sets, symbols, and graphs based on mathematical lattice theory. Note that the former LC definition was motivated mainly by mathematical morphology for image processing [12], whereas the latter LC definition has a wider motivation including, in addition, formal concept analysis [1], general clustering/classification/regression techniques [7], logic and reasoning [15], etc. A popular family of algorithms is Fuzzy Inference Systems (FISs) [6], whose inputs typically consist of vectors in the Euclidean space R^N. Recent work described an approach to FIS design based on mathematical morphology [13]. This work proposes a rigorous extension of conventional FIS techniques towards computing with (information) granules, namely Intervals' Numbers (INs).

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 410–417, 2010.
© Springer-Verlag Berlin Heidelberg 2010
More specifically, this work builds on an established mathematical result, namely "the resolution identity theorem", which specifies that a fuzzy set can (equivalently) be represented either by its membership function or by its α-cuts. In conclusion, based on two novel mathematical propositions, an inclusion measure function emerges here as an instrument towards substantial FIS improvements, including the rigorous fusion of granular input data, the sensible employment of sparse rules, and the introduction of tunable nonlinearities. The work here is organized as follows. Section 2 presents the mathematical background. Section 3 introduces novel mathematical tools. Section 4 demonstrates granular FIS. Section 5 concludes by summarizing the contribution.
2 Mathematical Preliminaries
This section summarizes a hierarchy of lattices [7,10] using an improved mathematical notation introduced recently [8,10].

2.1 The Complete Lattice (Δ, ⪯) of Generalized Intervals
There is no unanimous opinion on whether the lattice (R, ≤) is complete or not [7,10]. Here, we assume that lattice (R, ≤) is complete with least and greatest elements O = −∞ and I = +∞, respectively. We define a generalized interval next.

Definition 1. A generalized interval is an element of the lattice (R, ≤∂) × (R, ≤).

We remark that ≤∂ in Definition 1 denotes the dual (i.e. converse) of the order relation ≤, i.e. ≤∂ ≡ ≥. The product lattice (R, ≤∂) × (R, ≤) ≡ (R × R, ≥ × ≤) will be denoted, simply, by (Δ, ⪯). Note that curly symbols ⪯, ⋎, ⋏ are used for general lattice elements, whereas straight symbols ≤, ∨, ∧ are used for real numbers. A generalized interval will be denoted by [x, y], where x, y ∈ R. The meet (⋏) and join (⋎) in the lattice (Δ, ⪯) are given, respectively, by [a, b] ⋏ [c, d] = [a∨c, b∧d] and [a, b] ⋎ [c, d] = [a∧c, b∨d]. The set of positive (negative) generalized intervals [a, b], characterized by a ≤ b (a > b), is denoted by Δ+ (Δ−). It turns out that (Δ+, ⪯) is a poset, namely the poset of positive generalized intervals. Poset (Δ+, ⪯) is isomorphic¹ to the poset (τ(R), ⪯) of intervals (sets) in R, i.e. (τ(R), ⪯) ≅ (Δ+, ⪯). We augment poset (τ(R), ⪯) by a least (empty) interval, denoted by O = [+∞, −∞]. Hence, the complete lattice (τO(R) = τ(R) ∪ {O}, ⪯) ≅ (Δ+ ∪ {O}, ⪯) emerges. A strictly decreasing bijective, i.e. one-to-one, function θ : R → R implies an isomorphism (R, ≤) ≅ (R, ≥). Furthermore, a strictly increasing function v : R → R is a positive valuation² in the lattice (R, ≤). It follows that the function vΔ : Δ → R given by vΔ([a, b]) = v(θ(a)) + v(b) is a positive valuation in the lattice (Δ, ⪯). Parametric functions θ(.) and v(.) may introduce tunable nonlinearities.
¹ A map ψ : (P, ⪯) → (Q, ⪯) is called an (order) isomorphism iff both "x ⪯ y ⇔ ψ(x) ⪯ ψ(y)" and "ψ is onto Q". Two posets (P, ⪯) and (Q, ⪯) are called isomorphic, symbolically (P, ⪯) ≅ (Q, ⪯), iff there is an isomorphism between them.
² A positive valuation in a general lattice (L, ⪯) is a real function v : L → R that satisfies both v(x) + v(y) = v(x ⋎ y) + v(x ⋏ y) and x ≺ y ⇒ v(x) < v(y) [2].
2.2 The Complete Lattice (F, ⪯) of Intervals' Numbers (INs)
Based on generalized intervals, this subsection presents intervals' numbers (INs). A more general number type is defined in the first place, next.

Definition 2. A generalized interval number (GIN) is a function G : (0, 1] → Δ.

Let G denote the set of GINs. There follows the complete lattice (G, ⪯), as the Cartesian product of complete lattices (Δ, ⪯). Our interest here focuses on the sublattice³ of intervals' numbers, defined next.

Definition 3. An Intervals' Number, or IN for short, is a GIN F such that both F(h) ∈ (Δ+ ∪ {O}) and h1 ≤ h2 ⇒ F(h1) ⪰ F(h2).

Let F denote the set of INs. It follows that (F, ⪯) is a complete lattice with least element O = O(h) = [+∞, −∞] and greatest element I = I(h) = [−∞, +∞], ∀h ∈ (0, 1]. Conventionally, an IN will be denoted by a capital letter in italics, e.g. F ∈ F. Moreover, an N-tuple of INs will be denoted by a capital letter in bold, e.g. F = (F1, ..., FN) ∈ F^N. Lattice (F^N, ⪯) is the fourth level in a hierarchy of complete lattices whose first, second and third levels include the lattices (R, ≤), (Δ, ⪯) and (F, ⪯), respectively. An IN is a mathematical object which admits different interpretations, as follows. First, based on the "resolution identity theorem", an IN F(h), h ∈ (0, 1] may be interpreted as a fuzzy number, where F(h) is the corresponding α-cut for α = h. Hence, an IN F : (0, 1] → τO(R) may, equivalently, be represented by an upper-semicontinuous membership function mF : R → (0, 1]; that is the membership-function representation of an IN. Moreover, an IN F(h), h ∈ (0, 1] is represented by a set of intervals; that is the interval representation of an IN. There follows the equivalence mF1(x) ≤ mF2(x) ⇔ F1(h) ⪯ F2(h), where x ∈ R, h ∈ (0, 1]. Second, an IN F(h), h ∈ (0, 1] may also be interpreted as a probability distribution such that interval F(h) includes 100(1 − h)% of the distribution, whereas the remaining 100h% is split evenly both below and above interval F(h).
3 Novel Mathematical Tools
Consider the following definition [7,8,10].

Definition 4. Let (L, ⪯) be a complete lattice with least and greatest elements O and I, respectively. An inclusion measure in (L, ⪯) is a function σ : L × L → [0, 1] which satisfies the following conditions:

C0. σ(x, O) = 0, ∀x ≠ O.
C1. σ(x, x) = 1, ∀x ∈ L.
C2. x ⋏ y ≺ x ⇒ σ(x, y) < 1.
C3. u ⪯ w ⇒ σ(x, u) ≤ σ(x, w).

³ A sublattice of a lattice (L, ⪯) is another lattice (S, ⪯) such that S ⊆ L.
We remark that σ(x, y) can be interpreted as the fuzzy degree to which x is less than y; therefore the notation σ(x ⪯ y) may be used instead of σ(x, y). Two inclusion measures, namely sigma-meet (σ⋏) and sigma-join (σ⋎), respectively, have been proposed [7,8] in the complete lattice (τO(R), ⪯) of intervals, as follows.

1) σ⋏([a, b] ⪯ [c, d]) = (v(θ(a∨c)) + v(b∧d)) / (v(θ(a)) + v(b)), if a∨c ≤ b∧d; otherwise, σ⋏([a, b] ⪯ [c, d]) = 0, and
2) σ⋎([a, b] ⪯ [c, d]) = (v(θ(c)) + v(d)) / (v(θ(a∧c)) + v(b∨d)),

where the function v : R → R is strictly increasing, whereas the function θ : R → R is strictly decreasing. In conclusion, as detailed in [7], the following two inclusion measures emerge, respectively, in the complete lattice (F, ⪯) of INs.

1) σ⋏(F1 ⪯ F2) = ∫₀¹ σ⋏(F1(h) ⪯ F2(h)) dh.
2) σ⋎(F1 ⪯ F2) = ∫₀¹ σ⋎(F1(h) ⪯ F2(h)) dh.
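The per-level interval measures are easily sketched in code. The snippet below assumes the default choices v(x) = x and θ(x) = −x (used later in the paper), for which v(θ(a)) + v(b) is just the interval length b − a; the handling of trivial intervals [u, u] (a 0/0 ratio taken as containment) is an assumption consistent with Proposition 1 below:

```python
# Interval inclusion measures with the defaults v(x) = x, θ(x) = -x.
v = lambda x: float(x)
theta = lambda x: -x

def sigma_meet(ab, cd):
    (a, b), (c, d) = ab, cd
    if b == a:                               # trivial [u,u]: 0/0, taken as
        return 1.0 if c <= a <= d else 0.0   # containment (cf. Proposition 1)
    lo, hi = max(a, c), min(b, d)            # [a,b] ⋏ [c,d] = [a∨c, b∧d]
    if lo > hi:
        return 0.0
    return (v(theta(lo)) + v(hi)) / (v(theta(a)) + v(b))

def sigma_join(ab, cd):
    (a, b), (c, d) = ab, cd
    return (v(theta(c)) + v(d)) / (v(theta(min(a, c))) + v(max(b, d)))

print(sigma_meet((1, 2), (0, 3)))   # [1,2] inside [0,3] → 1.0
print(sigma_meet((0, 3), (1, 2)))   # → 0.3333...
print(sigma_join((0, 3), (1, 2)))   # → 0.3333...
```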
The following proposition can be interpreted with reference to Fig. 1.

Proposition 1. Consider a continuous dual isomorphic function θ : R → R and a continuous positive valuation function v : R → R. Let U0(h) = [u0, u0], h ∈ (0, 1] be a trivial IN and let W(h), h ∈ (0, 1] be an IN with upper-semicontinuous membership function mW : R → R. Then σ⋏(U0 ⪯ W) = mW(u0).
Fig. 1. The sigma-meet σ⋏(U0 ⪯ W) degree of inclusion of the trivial IN U0 = [u0, u0], h ∈ (0, 1] in the IN W = W(h) = [ah, bh], h ∈ (0, 1] equals mW(u0), where mW : R → R is the membership function of the IN W
We remark that Proposition 1 couples an IN's two different representations, namely the interval representation and the membership-function representation. Note that the principal advantage of the former (interval) representation is that it enables useful algebraic operations, whereas the principal advantage of the latter (membership function) representation is that it enables convenient fuzzy logic interpretations. The practical significance of Proposition 1, as well as of the following proposition, is demonstrated below.
Proposition 2. Consider complete lattices (Li, ⪯), i ∈ {1, ..., N}, each equipped with an inclusion measure function σi : Li × Li → [0, 1]. Consider N-tuples x = (x1, ..., xN) and y = (y1, ..., yN) such that x, y ∈ L = L1 × ... × LN. Furthermore, consider the conventional lattice ordering x ⪯ y ⇔ xi ⪯ yi, ∀i ∈ {1, ..., N}. Then both functions

1) σ∧ : L × L → [0, 1] given by σ∧(x ⪯ y) = min_i {σi(xi ⪯ yi)}, and
2) σΠ : L × L → [0, 1] given by σΠ(x ⪯ y) = Π_i σi(xi ⪯ yi), i ∈ {1, ..., N},

are inclusion measures in (L, ⪯).
4 Computational Experiments

A FIS includes K rules (implications) Rk, k = 1, ..., K, of the following form:

Rule Rk: IF (variable V1 is Fk,1) .and. ... .and. (variable VN is Fk,N) THEN ck

This work is not concerned with the consequents ck, k = 1, ..., K of the rules. Instead, the interest here focuses exclusively on rule antecedents. Furthermore, unless otherwise stated, this work employs the functions v(x) = x and θ(x) = −x. Fig. 2 displays the antecedent of a FIS rule R with only two INs W1 and W2 having parabolic membership functions mW1(x) = −x² + 6x − 8 and mW2(x) = −0.25x² + 3.5x − 11.25, respectively. Let an input [u1,0, u2,0] = [3.5, 5.5] be presented, as shown in Fig. 3(a). Using conventional FIS techniques, the activation mR(u1,0, u2,0) of rule R is a function of both mW1(u1,0) = 0.75 and mW2(u2,0) = 0.4375. For instance, it may be either mR(u1,0, u2,0) = min{mW1(u1,0), mW2(u2,0)} or mR(u1,0, u2,0) = mW1(u1,0) mW2(u2,0). Identical results are obtained by the inclusion measure σ⋏(·, ·), as explained next. Let the trivial INs U1,0 = U1,0(h) = [u1,0, u1,0] = [3.5, 3.5], h ∈ (0, 1] and U2,0 = U2,0(h) = [u2,0, u2,0] = [5.5, 5.5], h ∈ (0, 1] represent the real numbers u1,0 = 3.5 and u2,0 = 5.5, respectively. Then, based on Proposition 1, it follows that both σ⋏(U1,0 ⪯ W1) = mW1(u1,0) = 0.75 and σ⋏(U2,0 ⪯ W2) = mW2(u2,0) = 0.4375. Finally, based on Proposition 2, the degree of inclusion of U0 = [U1,0, U2,0] in W = [W1, W2] may be either σ∧(U0 ⪯ W) = min{σ⋏(U1,0 ⪯ W1), σ⋏(U2,0 ⪯ W2)} = min{mW1(u1,0), mW2(u2,0)} or σΠ(U0 ⪯ W) = σ⋏(U1,0 ⪯ W1) σ⋏(U2,0 ⪯ W2) = mW1(u1,0) mW2(u2,0).
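The quoted activation values follow directly from evaluating the two parabolas and fusing per Proposition 2:

```python
# Rule activation for the trivial input (3.5, 5.5): by Proposition 1 the
# per-antecedent degrees are just membership values; fuse them per Proposition 2.
mW1 = lambda x: -x**2 + 6*x - 8
mW2 = lambda x: -0.25*x**2 + 3.5*x - 11.25

s1, s2 = mW1(3.5), mW2(5.5)
print(s1, s2)          # → 0.75 0.4375
print(min(s1, s2))     # σ∧ fusion → 0.4375
print(s1 * s2)         # σΠ fusion → 0.328125
```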
Fig. 2. A FIS rule R antecedent: "variable V1 is W1" and "variable V2 is W2". The membership functions of INs W1 and W2 are parabolas mW1(x1) and mW2(x2) with maxima at x1 = 3 and x2 = 7, respectively.
A first substantial advantage of an inclusion measure is its capacity to accommodate "in principle" granular input INs for representing uncertainty/vagueness in practice [14]. For instance, consider the granular input INs U1 and U2 shown in Fig. 3(b), each with an isosceles (triangular) membership function of width 2 ∗ 0.2 = 0.4 centered at x1 = 3.5 and x2 = 5.5, respectively. Inclusion measure σ⋏(·, ·) computes the activation of rule R in Fig. 3(b) as follows. On the one hand, it is

σ⋏(U1 ⪯ W1) = ∫₀^0.6825 1 dh + ∫_0.6825^0.7902 (−0.2h − 0.3 + √(1−h)) / (−0.4h + 0.4) dh + ∫_0.7902^1 0 dh ≈ 0.7456.

On the other hand, it is

σ⋏(U2 ⪯ W2) = ∫₀^0.3331 1 dh + ∫_0.3331^0.5088 (2√(1−h) − 0.2h − 1.3) / (−0.4h + 0.4) dh + ∫_0.5088^1 0 dh ≈ 0.4321.
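These two activation values can be cross-checked by numerical integration over h (a sketch, not the author's code): the triangular input has cut U(h) = [center − 0.2(1−h), center + 0.2(1−h)], each parabolic antecedent has cut W(h) = [peak − s√(1−h), peak + s√(1−h)], and with v(x) = x, θ(x) = −x the per-level sigma-meet reduces to overlap length over input length.

```python
# Numerical check of the granular activations via the interval representations.
def act(center, half, wpeak, wscale, n=50000):
    total = 0.0
    for i in range(n):
        h = (i + 0.5) / n                                    # midpoint rule
        a, b = center - half*(1 - h), center + half*(1 - h)  # triangular U(h)
        c, d = wpeak - wscale*(1 - h)**0.5, wpeak + wscale*(1 - h)**0.5  # W(h)
        overlap = max(0.0, min(b, d) - max(a, c))
        total += overlap / (b - a)       # per-level sigma-meet, v(x)=x, θ(x)=-x
    return total / n

s1 = act(3.5, 0.2, 3, 1)   # σ⋏(U1 ⪯ W1), should land near the quoted 0.7456
s2 = act(5.5, 0.2, 7, 2)   # σ⋏(U2 ⪯ W2), should land near the quoted 0.4321
print(s1, s2)
```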
Fig. 3. Consider the antecedent of rule R in Fig. 2. (a) Rule R is activated by the trivial IN U0 = [U1,0, U2,0]. (b) Rule R is activated by the IN U = [U1, U2], where both INs U1 and U2 have an isosceles (triangular) membership function of width 2 ∗ 0.2 = 0.4.
A second substantial advantage, of σ⋎(·, ·) in particular, is its capacity to deal with nonoverlapping INs towards sensibly employing a sparse rule base. For instance, on the one hand, Fig. 4(a) shows a trivial IN input U0 = [U0, U0], where U0 = U0(h) = [4.5, 4.5], h ∈ (0, 1], presented to rule R. It follows that

σ⋎(U0 ⪯ W1) = ∫₀¹ 2√(1−h) / (1.5 + √(1−h)) dh ≈ 0.5974,

moreover

σ⋎(U0 ⪯ W2) = ∫₀¹ 4√(1−h) / (2.5 + 2√(1−h)) dh ≈ 0.6737.

On the other hand, Fig. 4(b) shows a nontrivial IN input U = [U, U] presented to rule R, where the IN U has an isosceles (triangular) membership function of width 2 ∗ 0.2 = 0.4 centered at 4.5. It follows that

σ⋎(U ⪯ W1) = ∫₀¹ 2√(1−h) / (1.7 − 0.2h + √(1−h)) dh ≈ 0.5693,

and

σ⋎(U ⪯ W2) = ∫₀¹ 4√(1−h) / (2.7 − 0.2h + 2√(1−h)) dh ≈ 0.6555.
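The four sigma-join values can likewise be checked numerically: per level h, assuming v(x) = x and θ(x) = −x, σ⋎ reduces to length(W(h)) over length(U(h) ⋎ W(h)) (a sketch under the same interval cuts as in the text; half = 0 gives the trivial input).

```python
# Numerical check of the sparse-rule sigma-join activations.
def act_join(center, half, wpeak, wscale, n=50000):
    total = 0.0
    for i in range(n):
        h = (i + 0.5) / n
        a, b = center - half*(1 - h), center + half*(1 - h)  # input U(h)
        c, d = wpeak - wscale*(1 - h)**0.5, wpeak + wscale*(1 - h)**0.5
        total += (d - c) / (max(b, d) - min(a, c))  # len(W(h)) / len(join)
    return total / n

j1 = act_join(4.5, 0.0, 3, 1)   # trivial U0 vs W1, quoted ≈ 0.5974
j2 = act_join(4.5, 0.0, 7, 2)   # trivial U0 vs W2, quoted ≈ 0.6737
j3 = act_join(4.5, 0.2, 3, 1)   # triangular U vs W1, quoted ≈ 0.5693
j4 = act_join(4.5, 0.2, 7, 2)   # triangular U vs W2, quoted ≈ 0.6555
print(j1, j2, j3, j4)
```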
Fig. 4. Consider the antecedent of rule R in Fig. 2. (a) A trivial IN input U0 = [U0, U0] is presented. (b) A granular IN input U = [U, U] is presented. Only the inclusion measure σ⋎(·, ·) can activate "in principle" rule R.
Finally, a third substantial advantage of an inclusion measure is its capacity to employ alternative positive valuation functions, whereas, in stark contrast, the majority of FISs in the literature (implicitly) employ solely the positive valuation v(x) = x. In the following we demonstrate the effects of the (parametric) sigmoid positive valuation function vs(x; λ, μ0) = 1/(1 + e^(−λ(x−μ0))), x ∈ R, where λ ∈ R>0, μ0 ∈ R. Consider the INs W1 and W2 of Fig. 2, the trivial IN U0 of Fig. 4(a), and the triangular IN U of Fig. 4(b). Then, for the sigmoid function vs(x; 1, 4.5) shown in Fig. 5, it was computed that σ⋎(U0 ⪯ W1) ≈ 0.6114 and σ⋎(U0 ⪯ W2) ≈ 0.9999; furthermore, σ⋎(U ⪯ W1) ≈ 0.5803 and σ⋎(U ⪯ W2) ≈ 1. Hence, a positive valuation can be used as an instrument for tunable decision-making.
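The sigmoid-valuation figures can be reproduced by the same numerical scheme, swapping the valuation into the σ⋎ ratio (assuming θ(x) = −x is kept as the dual isomorphism, as stated earlier in the text):

```python
import math

# Sigma-join with the sigmoid positive valuation v_s(x; λ=1, μ0=4.5).
vs = lambda x: 1.0 / (1.0 + math.exp(-(x - 4.5)))

def sigma_join_sigmoid(center, half, wpeak, wscale, n=50000):
    total = 0.0
    for i in range(n):
        h = (i + 0.5) / n
        a, b = center - half*(1 - h), center + half*(1 - h)
        c, d = wpeak - wscale*(1 - h)**0.5, wpeak + wscale*(1 - h)**0.5
        # (v(θ(c)) + v(d)) / (v(θ(a∧c)) + v(b∨d)), with θ(x) = -x
        total += (vs(-c) + vs(d)) / (vs(-min(a, c)) + vs(max(b, d)))
    return total / n

g1 = sigma_join_sigmoid(4.5, 0.0, 3, 1)   # σ⋎(U0 ⪯ W1), quoted ≈ 0.6114
g2 = sigma_join_sigmoid(4.5, 0.0, 7, 2)   # σ⋎(U0 ⪯ W2), quoted ≈ 0.9999
g3 = sigma_join_sigmoid(4.5, 0.2, 3, 1)   # σ⋎(U ⪯ W1),  quoted ≈ 0.5803
g4 = sigma_join_sigmoid(4.5, 0.2, 7, 2)   # σ⋎(U ⪯ W2),  quoted ≈ 1
print(g1, g2, g3, g4)
```

Note how the sigmoid, centered at μ0 = 4.5, nearly saturates the W2 scores: the valuation compresses distances far from μ0, which is exactly the "tunable decision-making" effect described above.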
Fig. 5. The INs W1 and W2 of Fig. 2 are displayed, as well as both the trivial IN U0 and the triangular IN U of Fig. 4. Inclusion measures σ⋎(·, ·) were computed using the displayed sigmoid positive valuation vs(x; λ, μ0) = 1/(1 + e^(−λ(x−μ0))) with λ = 1, μ0 = 4.5.
5 Discussion and Conclusion
This work introduced two major theoretical results, presented as Proposition 1 and Proposition 2, relating, on the one hand, inclusion-measure-based algebraic operations in the lattice (F, ⪯) and, on the other hand, membership-function-based fuzzy logic operations. In conclusion, significant improvements were demonstrated in FIS design, including the rigorous fusion of granular input data, the sensible employment of sparse rules, and the introduction of tunable nonlinearities.

Acknowledgement. This work has been supported, in part, by a project Archimedes-III contract.
References

1. Belohlavek, R.: Fuzzy Relational Systems: Foundations & Principles. Springer, Heidelberg (2002)
2. Birkhoff, G.: Lattice Theory. AMS, Colloquium Publications 25 (1967)
3. Graña, M.: State of the art in lattice computing for artificial intelligence applications. In: Nadarajan, R., Anitha, R., Porkodi, C. (eds.) Mathematical and Computational Models, pp. 233–242 (2007)
4. Graña, M.: Lattice computing: lattice-theory-based computational intelligence. In: Matsuhisa, T., Koibuchi, H. (eds.) Proc. Kosen Workshop on Mathematics, Technology, and Education (MTE), pp. 19–27 (2008)
5. Graña, M., Villaverde, I., Maldonado, J.O., Hernandez, C.: Two lattice computing approaches for the unsupervised segmentation of hyperspectral images. Neurocomputing 72(10-12), 2111–2120 (2009)
6. Guillaume, S.: Designing fuzzy inference systems from data: an interpretability-oriented review. IEEE Trans. Fuzzy Systems 9(3), 426–443 (2001)
7. Kaburlasos, V.G.: Towards a Unified Modeling and Knowledge-Representation Based on Lattice Theory. SCI, vol. 27. Springer, Heidelberg (2006)
8. Kaburlasos, V.G., Hatzimichailidis, A.G.: Improved fuzzy inference system (FIS) design based on fuzzy lattice reasoning (FLR) (submitted)
9. Kaburlasos, V.G., Papadakis, S.E.: Piecewise-linear approximation of nonlinear models based on interval numbers (INs). In: Kaburlasos, V.G., Priss, U., Graña, M. (eds.) Proc. Lattice-Based Modeling (LBM 2008) Workshop, pp. 13–22 (2008)
10. Papadakis, S.E., Kaburlasos, V.G.: Piecewise-linear approximation of nonlinear models based on probabilistically/possibilistically interpreted intervals' numbers (INs). Information Sciences (to be published)
11. Pedrycz, W., Skowron, A., Kreinovich, V. (eds.): Handbook of Granular Computing. John Wiley & Sons, Chichester (2008)
12. Ritter, G.X., Wilson, J.N.: Handbook of Computer Vision Algorithms in Image Algebra, 2nd edn. CRC Press, Boca Raton (2000)
13. Sussner, P., Valle, M.E.: Morphological and certain fuzzy morphological associative memories for classification and prediction. In: Kaburlasos, V.G., Ritter, G.X. (eds.) Computational Intelligence Based on Lattice Theory. SCI, vol. 67, pp. 149–171. Springer, Heidelberg (2007)
14. Wang, P.P.: Mathematics of Uncertainty - Guest Editorial. Information Sciences 177(23), 5141–5142 (2007)
15. Xu, Y., Ruan, D., Qin, K., Liu, J.: Lattice-Valued Logic. Studies in Fuzziness and Soft Computing, vol. 132. Springer, Heidelberg (2003)
Median Hetero-Associative Memories Applied to the Categorization of True-Color Patterns Roberto A. Vázquez1 and Humberto Sossa2 1
Escuela de Ingeniería – Universidad La Salle Benjamín Franklin 47 Col. Condesa CP 06140 México, D.F. 2 Centro de Investigación en Computación – IPN Av. Juan de Dios Batiz, esquina con Miguel de Othon de Mendizábal Ciudad de México, 07738, México [email protected], [email protected]
Abstract. Median associative memories (MED-AMs) are a special type of associative memory based on the median operator. This type of associative model has been applied to the restoration of gray scale images and provides better performance than other models, such as morphological associative memories, when the patterns are altered with mixed noise. Despite their power, MED-AMs have not been applied to problems involving true-color patterns. In this paper we describe how a median hetero-associative memory (MED-HAM) could be applied to problems that involve true-color patterns. A complete study of the behavior of this associative model in the restoration of true-color images is performed using a benchmark of 14400 images altered by different types of noise. Furthermore, we describe how this model can be applied to an image categorization problem.
1 Introduction

The concept of associative memory (AM) emerges from psychological theories of human and animal learning. These memories store information by learning correlations among different stimuli. When a stimulus is presented as a memory cue, the other is retrieved as a consequence; this means that the two stimuli have become associated with each other in the memory. An AM can be seen as a particular type of neural network designed to recall output patterns in terms of input patterns that can appear altered by some kind of noise. Several AMs have been proposed in the last 50 years (refer for example to [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11] and [12]). Some of these AMs have several constraints that limit their applicability to complex problems. Most of these constraints are related to storage capacity, the type of patterns handled (only binary or bipolar), see for example [4], and robustness to noise (additive, subtractive, mixed, Gaussian noise, deformations, etc.), see for example [8] and [12]. In 1998, Ritter et al. [8] proposed the concept of morphological associative memories (MAMs), which exhibit optimal absolute storage capacity and one-step convergence. Basically, the authors substituted the outer product by max and min E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 418–428, 2010. © Springer-Verlag Berlin Heidelberg 2010
operations. This type of associative model has been applied to different pattern recognition problems, including face localization and reconstruction of gray scale [9] and true-color images [17]. However, the morphological associative model alone is incapable of dealing with patterns distorted with additive and subtractive noise at the same time (mixed noise). A solution to this problem was proposed in [6]. There are other approaches based on fuzzy theory and lattice theory, see for example [7], [13], [14] and [16]. Kosko's model [7] describes an associative memory in terms of a nonlinear matrix-vector product called max-min composition, where the synaptic weight matrix is given by fuzzy Hebbian learning; however, it exhibits a low storage capacity (one rule per FAM matrix). Later, in 1996, Chung and Lee [13] presented a generalization of this model and demonstrated that a perfect recall of multiple rules per FAM matrix is possible if the input fuzzy sets are normal and max-t composition orthogonal. Recently, Sussner and Valle [14] generalized the implicative learning rules to include any max-t composition based on a continuous t-norm. On the other hand, the associative memory based on a dendritic single layer morphological perceptron is robust under different types of noise [16]. Despite the robustness of these models under noisy patterns, they do not present one-step convergence as morphological associative memories do. Another interesting one-step approach was introduced by Sossa et al. [12]. In this model, the authors substituted the max-min operator by the med operator. By using this new operator, the median associative model (MED-AM) was capable of dealing with patterns which include additive and subtractive noise at the same time. Despite the power of recent models, they have not been applied to problems that involve true-color patterns, nor has a deep study of this associative model on true-color image patterns been performed.
In this paper we describe how a MED-AM could be applied to problems that involve true-color patterns. Furthermore, a complete study of the behavior of this associative model in the reconstruction of true-color images is performed using a benchmark of 14400 images altered by different types of noise. In addition, we describe how this model could be applied to an image categorization problem.
2 Basics on Median Associative Memories

An associative memory is a device designed to recall patterns. These patterns might appear altered by noise. An associative memory M can be viewed as an input-output system as follows: x → M → y, with x and y, respectively, the input and output pattern vectors. Each input vector forms an association with a corresponding output vector. The associative memory M is represented by a matrix whose ij-th component is m_ij. M is generated from a finite a priori set of known associations, known as the fundamental set of associations, or simply the fundamental set (FS). If ξ is an index, the fundamental set is represented as: {(x^ξ, y^ξ) | ξ = 1, 2, …, p}, with p the cardinality of the set. The patterns that form the fundamental set are called fundamental patterns. If it holds that x^ξ = y^ξ ∀ξ ∈ {1, 2, …, p}, then M is auto-associative,
otherwise it is hetero-associative. A distorted version of a pattern x to be recovered will be denoted as x̃. If, when feeding a distorted version of x^w with w ∈ {1, 2, …, p} to an associative memory M, the output corresponds exactly to the associated pattern y^w, we say that recalling is perfect. Let P = [p_ij]_{m×r} and Q = [q_ij]_{r×n} be two matrices.
Definition 1. The following two matrix operations are defined to recall integer-valued patterns:

1. Operation ◊_Α: P_{m×r} ◊_Α Q_{r×n} = [f_ij^Α]_{m×n}, where f_ij^Α = ⊗_{k=1}^{r} Α(p_ik, q_kj).
2. Operation ◊_Β: P_{m×r} ◊_Β Q_{r×n} = [f_ij^Β]_{m×n}, where f_ij^Β = ⊗_{k=1}^{r} Β(p_ik, q_kj).

According to the operators ⊗, Α and Β, different results can be obtained. If we want, for example, to compensate for additive or subtractive noise, operator ⊗ should be replaced by the median operator (med) because it provides excellent results in the presence of mixed noise. It can be easily shown that if x ∈ Z^n and y ∈ Z^m, then y ◊_Α x^t is a matrix of dimension m × n.
Relevant simplifications are obtained when operations ◊_Α and ◊_Β are applied between vectors:

1. If x ∈ Z^n and y ∈ Z^m, then y ◊_Α x^t is a matrix of dimensions m×n, and it also holds that:

   y ◊_Α x^t = [ Α(y_1, x_1)  Α(y_1, x_2)  …  Α(y_1, x_n)
                 Α(y_2, x_1)  Α(y_2, x_2)  …  Α(y_2, x_n)
                 ⋮            ⋮            ⋱  ⋮
                 Α(y_m, x_1)  Α(y_m, x_2)  …  Α(y_m, x_n) ]_{m×n}    (1)

2. If x ∈ Z^n and P is a matrix of dimensions m×n, operation P_{m×n} ◊_Β x gives as a result one vector of dimension m, with i-th component given as:

   (P_{m×n} ◊_Β x)_i = med_{j=1}^{n} Β(p_ij, x_j)    (2)

In particular, if x ∈ Z^n and M is a matrix of dimensions m × n, then operation M_{m×n} ◊_Β x outputs an m-dimensional column vector, with i-th component given as:

   (M_{m×n} ◊_Β x)_i = med_{j=1}^{n} Β(m_ij, x_j)    (3)
Operators Α and Β are defined as follows:

   Α(x, y) = x − y    (4)
   Β(x, y) = x + y    (5)
2.1 Memory Construction

Two steps are required to build the MED-AM:

Step 1: For each ξ = 1, 2, …, p, from each couple (x^ξ, y^ξ) build the matrix [y^ξ ◊_Α (x^ξ)^t]_{m×n} as in equation 1.

Step 2: Apply the median operator to the matrices obtained in Step 1 to get matrix M as follows:

   M = med_{ξ=1}^{p} [y^ξ ◊_Α (x^ξ)^t]    (6)
2.2 Pattern Recall

A pattern x̃^w (altered version of a pattern x^w with w ∈ {1, 2, …, p}) is presented to the HAM memory M and the following operation is done: M ◊_Β x̃, using equation 3.
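The construction (eq. 6) and recall (eq. 3) steps, with Α(x, y) = x − y and Β(x, y) = x + y, can be sketched in NumPy as follows. This is a minimal illustrative sketch: the function names and the single-association example are our own choices, not from the original paper.

```python
import numpy as np

def train_med_am(X, Y):
    """Build the median memory M (eq. 6): M_ij = med over xi of (y_i - x_j)."""
    # One outer-difference matrix y^xi (diamond_A) (x^xi)^t per association
    # (eq. 1 with A(x, y) = x - y), then the component-wise median (eq. 6).
    outer_diffs = np.array([np.subtract.outer(y, x) for x, y in zip(X, Y)])
    return np.median(outer_diffs, axis=0)

def recall_med_am(M, x):
    """Recall (eq. 3): (M diamond_B x)_i = med over j of (m_ij + x_j)."""
    return np.median(M + x[np.newaxis, :], axis=1)

# With a single stored association, recall is perfect and robust to mixed
# noise as long as fewer than half of the input components are corrupted.
x = np.array([1., 2., 3., 4., 5., 6., 7.])
y = np.array([10., 20., 30.])
M = train_med_am([x], [y])
x_noisy = x.copy()
x_noisy[0] += 50   # additive noise on one component
x_noisy[1] -= 30   # subtractive noise on another
print(recall_med_am(M, x_noisy))   # -> [10. 20. 30.]
```

The median over j absorbs the per-component corruptions, which is exactly why the med operator compensates for mixed noise where max/min operators do not.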
The complete set of theorems which guarantee perfect recall, and their corresponding proofs, are presented in [12]. However, in practice most fundamental sets of patterns do not satisfy the restricted conditions imposed by the authors. For that reason the authors propose the following procedure to perfectly recall a general FS.

TRAINING PHASE:
Step 1. Transform the FS into an auxiliary fundamental set (FS') satisfying Theorem 1:
1) Make d = const.
2) Make (x̄^1, ȳ^1) = (x^1, y^1).
3) For the remaining couples do:
   For ξ = 2 to p
     For i = 1 to n { x̄_i^ξ = x̄_i^{ξ−1} + d; x̂_i^ξ = x̄_i^ξ − x_i^ξ; ȳ_i^ξ = ȳ_i^{ξ−1} + d; ŷ_i^ξ = ȳ_i^ξ − y_i^ξ }

Step 2. Build matrix M in terms of set FS': apply to FS' steps 1 and 2 of the training procedure described at the beginning of Section 2.1.

Remark 1. After this transformation, patterns from the auxiliary fundamental set are equidistant among themselves at a distance of d. This value also determines the noise supported by the model. Originally the authors decided to use the difference between the first components; however, in [10] the authors proposed another technique to compute d.
RECALLING PHASE:
Recalling of a pattern y^ξ from an altered version x̃^ξ of its key:
1) Transform x̃^ξ to x̄^ξ by applying the following transformation: x̄^ξ = x̃^ξ + x̂^ξ.
2) Apply equation 3 to x̄^ξ to get ȳ^ξ, and
3) Anti-transform ȳ^ξ as y^ξ = ȳ^ξ − ŷ^ξ to get y^ξ.
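The whole transform-train-recall pipeline for a general FS can be sketched as below. The sign convention for the offsets (x̂^ξ = x̄^ξ − x^ξ, ŷ^ξ = ȳ^ξ − y^ξ) is our reconstruction of the garbled original, chosen so that the recall formulas stay consistent; variable names and the choice of d are illustrative.

```python
import numpy as np

def train_general_fs(X, Y, d):
    """Transform the FS into an equidistant auxiliary FS' and train on it."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    p = len(X)
    Xb, Yb = X.copy(), Y.copy()          # auxiliary patterns x-bar, y-bar
    for xi in range(1, p):
        Xb[xi] = Xb[xi - 1] + d          # equidistant shifted copies
        Yb[xi] = Yb[xi - 1] + d
    Xhat, Yhat = Xb - X, Yb - Y          # per-pattern offsets x-hat, y-hat
    # Median memory built from FS' (Section 2.1, eq. 6)
    diffs = np.array([np.subtract.outer(yb, xb) for xb, yb in zip(Xb, Yb)])
    M = np.median(diffs, axis=0)
    return M, Xhat, Yhat

def recall_general_fs(M, Xhat, Yhat, x_noisy, xi):
    """Recall y^xi from a (possibly noisy) version of its key x^xi."""
    xb = x_noisy + Xhat[xi]                          # 1) transform
    yb = np.median(M + xb[np.newaxis, :], axis=1)    # 2) eq. 3
    return yb - Yhat[xi]                             # 3) anti-transform

X = [[3., 1., 4.], [1., 5., 9.]]
Y = [[2., 7.], [8., 2.]]
M, Xhat, Yhat = train_general_fs(X, Y, d=10.0)
print(recall_general_fs(M, Xhat, Yhat, np.array([1., 5., 9.]), 1))  # -> [8. 2.]
```

Because the auxiliary patterns are shifted copies of the first couple, every per-association outer-difference matrix is identical, which is what makes perfect recall of an arbitrary FS possible.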
It is important to mention that this MED-AM is robust only to mixed noise.
3 Behavior of the MED-AM

In this section a behavioral study of the MED-HAM using true-color noisy patterns is presented. The benchmark used in this set of experiments is composed of 14400 color images of 63 × 43 pixels and 24 bits in bmp format [15]. This benchmark is composed of 40 classes of flowers and animals. For each class, there are 90 images altered with additive noise (0% to 90% of the pixels), 90 images altered with subtractive noise (0% to 90% of the pixels), 90 images altered with mixed noise (0% to 90% of the pixels) and 90 images altered with Gaussian noise (0% to 90% of the pixels). In addition, one image of each class was altered by removing some parts of the image. Some images which compose this benchmark are shown in Fig. 1.
Fig. 1. Some images from the benchmark used to train and test the MED-AM
In order to generate the images altered with additive noise, we follow the next procedure: 1) use a uniform distribution to select at random k percent of the RGB pixels which compose the image; 2) set each component of these RGB pixels to the maximum grey level value (L − 1), in this case 255, to produce a white color. As with the previous procedure, in order to generate the images altered with subtractive noise, we use a uniform distribution to select at random k percent of the RGB pixels which compose the image, and then each component of these RGB pixels is set to the minimum grey level value, in this case 0, to produce a black color. For the case of mixed noise, we combine the two previous procedures as follows: randomly select k percent of the RGB pixels which compose the image, then generate a random number between 0 and 1; if the number is greater than 0.5, alter the RGB pixels with additive noise, otherwise alter them with subtractive noise. In order to generate the images altered with Gaussian noise, we use a uniform distribution to select k percent of the RGB pixels which compose the image, then set each component of these RGB pixels to a random number sampled from a normal distribution with μ = (1/L) Σ_{i=0}^{L−1} i and σ = (1/L) Σ_{i=0}^{L−1} (i − μ)².
Although it seems that there is not much difference between gray level images and true-color images, MED-AMs are not designed to cope with multivariable patterns (three channels per pixel). Instead of training one memory per color channel and then deciding how to combine the information recalled by each memory to restore the true-color image, we propose to transform these three channels into one channel. Before the MED-HAM was trained, each image was transformed into an image pattern. To build an image pattern from a bmp file, the image was read left-to-right and top-to-bottom. Each RGB pixel (hexadecimal value) was transformed into a decimal value, and this information was stored into an array. For example, suppose that the value of an RGB pixel is "0x3E53A1"; its corresponding decimal value is "4084641". Note that if we transformed the RGB channels into one channel by computing the average of the three channels (in other words, transforming the true-color image into a gray level image), we would not be able to recover the information of the RGB channels from the average channel. Once the images were transformed, the MED-HAM was trained using a set of associations composed of the 40 image patterns which are not altered with any type of noise. Each image pattern is composed of 2709 pixels, which implies that this MED-HAM has 2709 × 2709 synaptic weights. First of all, we verified whether the MED-HAM was able to recall the complete set of associations. Then we verified the behavior of the MED-HAM using noisy versions of the images used to train it. After that, we performed a study concerning how the number of associations influences the behavior of the MED-HAM. In order to measure the accuracy of the MED-HAM we counted the percentage of pixels correctly recalled. Once the associative memory was trained, we proceeded to evaluate its behavior.
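The hexadecimal-to-decimal packing of a pixel can be sketched as below. The channel order (R in the high byte) is our assumption, since the paper only gives the combined value; the example pixel 0x3E53A1 is the one from the text.

```python
def pack_rgb(r, g, b):
    # Pack three 8-bit channels into a single 24-bit integer,
    # e.g. the pixel 0x3E53A1 becomes 4084641, as in the paper.
    return (r << 16) | (g << 8) | b

def unpack_rgb(v):
    # Recover the three channels; a grey-level average would lose them.
    return (v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF

print(pack_rgb(0x3E, 0x53, 0xA1))  # -> 4084641
```

Unlike channel averaging, this packing is invertible, which is exactly the property the authors need to restore the true-color image from the recalled pattern.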
It is important to remark that, even when storing true-color patterns, the MED-HAM was capable of recalling the complete set of associations.
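The accuracy measure used in these experiments (percentage of pixels correctly recalled) can be sketched as follows; the function name is our own.

```python
import numpy as np

def pixel_recall_accuracy(recalled, original):
    """Percentage of pixels whose packed value was recalled exactly."""
    recalled, original = np.asarray(recalled), np.asarray(original)
    return 100.0 * np.mean(recalled == original)

print(pixel_recall_accuracy([1, 2, 3, 4], [1, 2, 0, 4]))  # -> 75.0
```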
424
R.A. Vázquez and H. Sossa
In summary, we can conclude that the accuracy of the MED-HAM is not susceptible to the number of associations stored. The associative model presents the same behavior even if the number of associations is increased or decreased. The general behavior of the MED-HAM is shown in Fig. 2, where we can clearly observe the robustness of this memory with patterns altered with mixed noise. Although we already knew that the MED-HAM is robust to mixed noise, nobody had reported results using color images. These results are acceptable and support the applicability of this MED-HAM to restore true-color images from noisy versions altered with mixed noise. On the other hand, we expected the worst results for the cases of additive, subtractive and Gaussian noise; however, the accuracy obtained with images altered with additive and subtractive noise was highly acceptable. In this experiment we could also observe that the MED-HAM was more robust to Gaussian noise than to additive and subtractive noise. Although the authors in [12] said that the MED-HAM is only robust to mixed noise, we experimentally showed that this model is also robust to true-color patterns altered with additive, subtractive, and Gaussian noise.
Fig. 2. General behavior of the MED-HAM tested with different types of noise
Contrary to the behavior of morphological associative memories (best accuracy for the auto-associative version, low accuracy for the hetero-associative version) [17], the same accuracy with the same experiments was observed in the MED-AAM and MED-HAM versions. No comparison against other models was performed because
authors do not report results related to the relationship between storage capacity and amount of noise with more than 5 associations and with more than 40% of noise added to the image.

3.1 A Real Application: Image Categorization Using MED-HAMs

Image categorization is not a trivial problem when pictures are taken from real life situations. This implies that categorization must be invariant to several image transformations such as translations, rotations, scale changes, illumination changes, orientation changes, noise, and so on [18]. In this section we describe how images can be categorized using the MED-HAM already described and the methodology proposed in [18]. Suppose that we feed a MED-HAM with a picture and expect it to respond with something indicating the content of the picture. For example, if the picture contains a lion, we would expect the MED-HAM to respond with the word "lion". A first step to solve this problem was reported in [18]; now we will use that approach to show the applicability of this median associative model, giving a solution to this image categorization problem when the concerned images are distorted only by additive noise. Following the procedure described in [18], we first selected a set of images, in this case the benchmark used in the previous experiment. Then, we associated these images with describing words. The images and the describing words are our fundamental set of associations, where x^k is the k-th image and y^k the k-th describing word. With this set of associations we proceed to train the MED-HAM.
Fig. 3. Fundamental set of associations composed of 40 associations used to train the MED-HAM applied to an image categorization problem (describing words include Lion, Leopard, Peacock, Tiger, Turtle, Zebra, Wild dog, Domestic dog, Rhinoceros, and Flamingo)
Before the MED-HAM was trained, each image was transformed into an image pattern. The elements y_r^k, r = 1, …, R of vector y^k correspond to the ASCII codes of the letters of each describing word, where R is the number of letters of a given word. In Fig. 3, we show the information used to train the associative memory. By using this set of associations we expect that, when feeding the MED-HAM with the image which contains an agapanthus, we will recall the word "agapanthus", even if the image is altered with additive noise. Once the associative model was trained, we proceeded to test the accuracy of the proposal, altering the images with additive noise.
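Encoding a describing word as an ASCII output pattern can be sketched as follows. The fixed length and zero padding are our assumptions, since the paper does not state how words of different lengths are aligned.

```python
def word_to_pattern(word, length):
    """ASCII codes of the letters, zero-padded to a fixed length."""
    codes = [ord(c) for c in word]
    return codes + [0] * (length - len(codes))

def pattern_to_word(pattern):
    """Inverse mapping: drop the padding and decode the ASCII codes."""
    return ''.join(chr(v) for v in pattern if v != 0)

print(word_to_pattern('lion', 6))                    # -> [108, 105, 111, 110, 0, 0]
print(pattern_to_word([108, 105, 111, 110, 0, 0]))   # -> lion
```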
426
R.A. Vázquez and H. Sossa
On average, the accuracy of the proposal in this image categorization task was 88.27%. As can be appreciated from Fig. 4, not all of the 14400 images were correctly categorized; in other words, not all the words associated with the images were correctly recalled. From this figure we can also observe that if the quantity of noise added to the image is less than 70%, all the noisy images are correctly categorized or classified. If the quantity of noise surpasses this threshold, the accuracy starts to decrease.
Fig. 4. Accuracy of the MED-HAM when it is applied to an image categorization problem
On the other hand, when images miss some data (see Fig. 1), the complete set of associations was perfectly recalled, i.e. all images were correctly categorized or classified.
4 Conclusions

In this paper, a complete behavioral study of the median hetero-associative memory in the restoration of true-color images was performed using a benchmark of 14400 images altered by different types of noise. Furthermore, we described how this associative model could be applied to an image categorization or classification problem, using the same benchmark. Because this associative model had previously been applied only to gray level patterns, this paper is useful to really understand the power and limitations of this model. Through several experiments, we found some interesting properties of this associative model. MED-HAMs present robust recall even if patterns are altered by some kind of noise. Furthermore, MED-HAMs are not sensitive to the amount of noise; however, after a certain amount of additive and subtractive noise, the accuracy of the model tends to decrease; in this case this threshold is reached at 73% of noise. We also observed that the model is not only robust to mixed noise but to additive, subtractive and Gaussian noise too. Regarding the storage capacity, we found that the accuracy of the model is not sensitive to the number of stored associations. In general we can say that when the number of associations is increased, the accuracy of the memory remains almost stable. On average, MED-HAMs correctly recall 79.9% of the pixels when patterns are altered by additive noise. A correct recall of 79.4% of the pixels is obtained when
patterns are altered by subtractive noise; for the case of mixed and Gaussian noise, a correct recall of 100% of the pixels is obtained. These results are highly acceptable compared against the results provided by the morphological associative model [17], which were on average 77.4% using the same benchmark. Concerning the image categorization problem, the accuracy of the proposal was on average 88.27%. We observed that if the quantity of noise added to the image to be classified is less than 70%, all the noisy images are correctly categorized or classified. On the other hand, if the quantity of noise surpasses this threshold, the accuracy starts to decrease. Other possible applications of MED-HAMs (not developed here due to space limitations) are the following: recall of the word that best describes a given image corrupted by mixed noise, finding the index class of an object given an image of it, retrieving an associated image from a corrupted version of its associated image, and so on. Acknowledgements. The authors thank the SIP-IPN for support under grant 20091421. H. Sossa thanks CINVESTAV-GDL for the support to do a sabbatical stay from December 1, 2009 to May 31, 2010. The authors also thank the European Union, the European Commission and CONACYT for the economical support. This paper has been prepared with economical support of the European Commission under grant FONCICYT 93829. The content of this paper is an exclusive responsibility of the CIC-IPN and it cannot be considered that it reflects the position of the European Union. We also thank the reviewers for their comments for the improvement of this paper.
References [1] Steinbuch, K.: Die Lernmatrix. Kybernetik 1, 26–45 (1961) [2] Anderson, J.A.: A simple neural network generating an interactive memory. Math. Biosci. 14, 197–220 (1972) [3] Kohonen, T.: Correlation matrix memories. IEEE Trans. on Comp. 21, 353–359 (1972) [4] Hopfield, J.J.: Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. 79, 2554–2558 (1982) [5] Sussner, P.: Generalizing operations of binary auto-associative morphological memories using fuzzy set theory. J. Math. Imaging Vis. 19, 81–93 (2003) [6] Ritter, G.X., et al.: Reconstruction of patterns from noisy inputs using morphological associative memories. J. Math. Imaging Vis. 19, 95–111 (2003) [7] Kosko, B.: Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence. Prentice- Hall, Englewood Cliffs (1992) [8] Ritter, G.X., Sussner, P., Diaz de Leon, J.L.: Morphological associative memories. IEEE Trans. Neural Networks 9, 281–293 (1998) [9] Sussner, P., Valle, M.: Gray-Scale Morphological Associative Memories. IEEE Trans. on Neural Netw. 17, 559–570 (2006) [10] Vazquez, R.A., Sossa, H.: A new associative memory with dynamical synapses. Neural Processing Letters 28(3), 189–207 (2008) [11] Vazquez, R.A., Sossa, H.: A Bidirectional Heteroassociative Memory for True Color Patterns. Neural Processing Letters 28(3), 131–153 (2008)
[12] Sossa, H., Barron, R., Vazquez, R.A.: New associative memories to recall real-valued patterns. In: Sanfeliu, A., Martínez Trinidad, J.F., Carrasco Ochoa, J.A. (eds.) CIARP 2004. LNCS, vol. 3287, pp. 195–202. Springer, Heidelberg (2004) [13] Chung, F.-L., Lee, T.: On fuzzy associative memory with multiple-rule storage capacity. IEEE Trans. Fuzzy Syst. 4(4), 375–384 (1996) [14] Sussner, P., Valle, M.E.: Implicative fuzzy associative memory. IEEE Trans. on Fuzzy Systems 14(6), 793–807 (2006) [15] Sossa, H., Vazquez, R.A.: Flower and Animals Database, http://roberto.a.vazquez.googlepages.com [16] Ritter, G.X., Urcid, G.: Learning in Lattice Neural Networks that Employ Dendritic Computing. In: Kaburlasos, V.G., Ritter, G.X. (eds.) Computational Intelligence based on Lattice Theory, vol. 67, pp. 25–44 (2007) [17] Vazquez, R.A., Sossa, H.: Behavior of morphological associative memories with true-color image patterns. Neurocomputing 73, 225–244 (2009) [18] Vazquez, R.A., Sossa, H.: Associative memories applied to image categorization. In: Martínez-Trinidad, J.F., Carrasco Ochoa, J.A., Kittler, J. (eds.) CIARP 2006. LNCS, vol. 4225, pp. 549–558. Springer, Heidelberg (2006)
A Comparison of VBM Results by SPM, ICA and LICA Darya Chyzyk, Maite Termenon, and Alexandre Savio Computational Intelligence Group Dept. CCIA, UPV/EHU, Apdo. 649, 20080 San Sebastian, Spain www.ehu.es/ccwintco
Abstract. Lattice Independent Component Analysis (LICA) approach consists of a detection of independent vectors in the morphological or lattice theoretic sense that are the basis for a linear decomposition of the data. We apply it in this paper to a Voxel Based Morphometry (VBM) study on Alzheimer’s disease (AD) patients extracted from a well known public database. The approach is compared to SPM and Independent Component Analysis results.
1 Introduction
Morphometry analysis has become a common tool for computational brain anatomy studies. It allows a comprehensive measurement of structural differences within a group or across groups, not just in specific structures, but throughout the entire brain. Voxel-based Morphometry (VBM) is a computational approach to neuroanatomy that measures differences in local concentrations of brain tissue through a voxel-wise comparison of multiple brain images [3]. For instance, VBM has been applied to study volumetric atrophy of the grey matter (GM) in areas of neocortex of AD patients vs. control subjects [4,17,6]. The procedure involves the spatial normalization of subject images into a standard space, segmentation of tissue classes using a priori probability maps, smoothing to correct noise and small variations, and voxel-wise statistical tests. Statistical analysis is based on the General Linear Model (GLM) to describe the data in terms of experimental and confounding effects, and residual variability. Classical statistical inference is used to test hypotheses that are expressed in terms of GLM estimated regression parameters. This computation is specified as a contrast that produces a scalar estimate which the Statistical Parametric Map (SPM) thresholds according to the Random Field theory to obtain clusters of significant voxels. SPM has been also widely applied to fMRI voxel activation analysis. Alternative works on fMRI analysis are based on the Independent Component Analysis (ICA) [18] assuming that the time series observations are linear mixtures of independent sources which can not be observed. This leads us to consider here ICA and other approaches for VBM on transversal data. ICA assumes that the source signals are non-Gaussian and that the linear mixing process is unknown. The approaches to solve the ICA problem obtain both the independent sources E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 429–435, 2010. 
© Springer-Verlag Berlin Heidelberg 2010
and the linear unmixing matrix. These approaches are unsupervised because no a priori information about the sources or the mixing process is included, hence the alternative name of Blind Deconvolution. Sources in VBM correspond to the pattern of intensities of a voxel across the population of subjects. We have used the FastICA algorithm implementation available at [2]. We have also used the implementations of Maximum Likelihood ICA [14] (which is equivalent to Infomax ICA), Mean Field ICA [13], and Molgedey and Schouster ICA based on dynamic decorrelation [15], which are available at [1]. We have proposed [11,9] a Lattice Computing [8] approach that we call Lattice Independent Component Analysis (LICA) that consists of two steps. First, it selects Strong Lattice Independent (SLI) vectors from the input dataset using an incremental algorithm, the Incremental Endmember Induction Algorithm (IEIA) [10]. Second, because of the conjectured equivalence between SLI and Affine Independence [12], it performs the linear unmixing of the input dataset based on these endmembers. Therefore, the approach is a mixture of linear and nonlinear methods. We assume that the data are generated as a convex combination of a set of endmembers which are the vertices of a convex polytope covering some region of the input data. This assumption is similar to the linear mixture assumed by the ICA approach; however, we do not impose any probabilistic assumption on the data. The endmembers discovered by the IEIA are equivalent to the GLM design matrix columns, and the unmixing process is identical to the conventional least squares estimator, so LICA is a kind of unsupervised GLM whose regressor functions are mined from the input dataset. If we try to establish correspondences with ICA, the endmembers correspond to the unknown sources and the mixing matrix is the one given by the abundance coefficients computed by least squares estimation.
The outline of the paper is as follows: Section 2 overviews the LICA. Section 3 presents results of the proposed approach on a VBM case study on an Alzheimer’s Disease population with paired controls. Section 4 provides some conclusions.
2 The Lattice Independent Component Analysis
The linear mixing model can be expressed as follows: x = Σ_{i=1}^{M} a_i e_i + w = Ea + w, where x is the d-dimensional pattern vector corresponding to the fMRI voxel time series vector, E is a d×M matrix whose columns are the d-dimensional vectors; when these vectors are the vertices of a convex region covering the data they are called endmembers* e_i, i = 1, …, M; a is the M-dimensional vector of linear mixing coefficients, which correspond to fractional abundances in the convex case, and w is the d-dimensional additive observation noise vector. (*The original works were devoted to unsupervised hyperspectral image segmentation, hence the use of the name endmember for the selected vectors.) The linear mixing model is subjected to two constraints on the abundance coefficients when the data points fall into a simplex whose vertices are the endmembers: all abundance coefficients must be non-negative, a_i ≥ 0, i = 1, …, M, and normalized to unity summation, Σ_{i=1}^{M} a_i = 1. Under this circumstance, we expect that the vectors in E are affinely independent and that the convex region defined by them includes all the data points. Once the endmembers have been determined, the unmixing process is the computation of the matrix inversion that gives the coordinates of the point relative to the convex region vertices. The simplest approach is the unconstrained least squared error (LSE) estimation given by: a = (E^T E)^{-1} E^T x. Even when the vectors in E are affinely independent, the coefficients that result from this estimation do not necessarily fulfill the non-negativity and unity normalization. Ensuring both conditions is a complex problem. We call Lattice Independent Component Analysis (LICA) the following approach:

1. Induce from the given data a set of Strongly Lattice Independent vectors. In this paper we apply the Incremental Endmember Induction Algorithm (IEIA) [10,9]. These vectors are taken as a set of affine independent vectors. The advantages of this approach are (1) that we are not imposing statistical assumptions, (2) that the algorithm is one-pass and very fast because it only uses comparisons and addition, (3) that it is unsupervised and incremental, and (4) that it naturally detects the number of endmembers.
2. Apply the unconstrained least squares estimation to obtain the mixing matrix. The detection results are based on the analysis of the coefficients of this matrix.

Therefore, the approach is a combination of linear and lattice computing: a linear component analysis where the components have been discovered by non-linear, lattice theory based, algorithms.
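The unconstrained LSE unmixing step, a = (EᵀE)⁻¹Eᵀx, can be sketched as below. The endmember matrix here is a toy example, not the output of the IEIA.

```python
import numpy as np

def unmix(E, x):
    """Unconstrained least-squares abundances: a = (E^T E)^{-1} E^T x."""
    # Solve the normal equations instead of forming the explicit inverse.
    return np.linalg.solve(E.T @ E, E.T @ x)

# Toy endmembers (columns of E) and a point inside their convex hull.
E = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
a_true = np.array([0.3, 0.7])
x = E @ a_true
a_est = unmix(E, x)   # recovers a_true up to floating point
```

Note that, as the text states, nothing in this estimator enforces non-negativity or the unit-sum constraint; for noisy data the estimated abundances may leave the simplex.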
3 A VBM Case Study
3.1 Experimental Data
Ninety-eight right-handed women (aged 65-96 yr) were selected from the Open Access Series of Imaging Studies (OASIS) database (http://www.oasis-brains.org) [16]. The OASIS data set is a cross-sectional collection of 416 subjects covering the adult life span (aged 18 to 96), including individuals with early-stage Alzheimer's Disease. We ruled out a set of 200 subjects whose demographic, clinical or derived anatomic volume information was incomplete. For the present study there are 49 subjects who have been diagnosed with very mild to mild AD and 49 nondemented. A summary of subject demographics and dementia status is shown in Table 1. Multiple (three or four) high-resolution structural T1-weighted magnetization-prepared rapid gradient echo (MP-RAGE) images were acquired [5] on a 1.5-T Vision scanner (Siemens, Erlangen, Germany) in a single imaging session. Image parameters: TR = 9.7 msec, TE = 4.0 msec, flip angle = 10, TI = 20 msec, TD = 200 msec, 128 sagittal 1.25 mm slices without gaps and a pixel resolution of 256×256 (1×1 mm).
D. Chyzyk, M. Termenon, and A. Savio
Table 1. Summary of subject demographics and dementia status. Education codes correspond to the following levels of education: 1: less than high school grad., 2: high school grad., 3: some college, 4: college grad., 5: beyond college. Categories of socioeconomic status range from 1 (highest) to 5 (lowest). MMSE score ranges from 0 (worst) to 30 (best).
                      Very mild to mild AD   Normal
No. of subjects       49                     49
Age                   78.08 (66-96)          77.77 (65-94)
Education             2.63 (1-5)             2.87 (1-5)
Socioeconomic status  2.94 (1-5)             2.88 (1-5)
CDR (0.5 / 1 / 2)     31 / 17 / 1            0
MMSE                  24 (15-30)             28.96 (26-30)

3.2 Algorithms Applied
We have applied both the SPM and FSL approaches to this data. Figure 1 shows the activation results from an FSL study on this data. We used the preprocessed volumes as inputs for the ICA and LICA algorithms. Detection of significant voxels in the ICA and LICA approaches is performed by setting the threshold on the mixing/abundance coefficients to the 95% percentile of the empirical distribution (histogram) of these coefficients. We present in Figure 2 the activation results corresponding to the third endmember detected by the LICA algorithm, for comparison with the FSL results. A great agreement can be appreciated. Because both ICA and LICA are unsupervised, in the sense that the pattern searched for is not prescribed, they suffer from the identifiability problem: we do not know beforehand which of the discovered sources/endmembers corresponds to the sought significant pattern. In contrast, the SPM and FSL approaches are supervised in the sense that we provide the a priori identification of controls and patients, searching for voxels that correlate well with this indicative variable. In order to provide a quantitative assessment of the agreement between the discoveries of ICA and LICA and the statistical significances computed by SPM and FSL, we computed the correlations between the abundance/mixture matrices of the two approaches. Table 2 shows the correlation between the mixing coefficients of the corresponding ICA ML algorithm sources (the one with the best results) and the abundance coefficients of the LICA endmembers, both before (left) and after (right) the application of the 95% percentile threshold to determine the significant voxels. We decide that the best relation is between the third LICA endmember and the second ICA source, because their correlation does not drop after thresholding, contrary to LICA #4 with ICA #1, whose correlation drops dramatically after thresholding for significance detection.
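The detection rule described above (thresholding the coefficients at the 95% percentile of their empirical distribution) can be sketched as follows; the function name and the synthetic coefficients are assumptions made for the example:

```python
import numpy as np

def significant_voxels(coeffs, percentile=95.0):
    """Mark as significant the voxels whose mixing/abundance coefficient
    exceeds the given percentile of the empirical distribution."""
    threshold = np.percentile(coeffs, percentile)
    return coeffs > threshold

rng = np.random.default_rng(0)
coeffs = rng.normal(size=10000)   # stand-in for one endmember's abundances
mask = significant_voxels(coeffs)
# By construction, roughly 5% of the voxels survive the threshold.
```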
To give some measure of the meaningfulness of the unsupervised approaches, we must find out if they are able to uncover something that has a good agreement with the findings of either SPM or FSL approaches. Therefore we compute the correlation between the mixing/abundance coefficients of ICA/LICA and the
Fig. 1. FSL significant voxel detection
Fig. 2. LICA activation results for the endmember #3
Table 2. Correlation among ICA and LICA mixing coefficients, before (left) and after (right) thresholding for activation detection

Before thresholding            ICA ML
LICA      #1       #2       #3       #4
#1        0.05     0.24     0.44    -0.01
#2        0.19     0.12    -0.28    -0.60
#3        0.38     0.67     0.30     0.24
#4        0.69     0.04     0.26    -0.18

After thresholding             ICA ML
LICA      #1       #2       #3       #4
#1        0.003    0.09     0.34     0.03
#2        0.15     0.05    -0.02    -0.02
#3        0.01     0.66     0.007    0.08
#4        0.26    -0.01     0.13    -0.00
Table 3. Agreement between SPM, FSL, ICA and LICA

              #1       #2       #3       #4
ICA vs SPM   -0.11     0.32    -0.02     0.02
LICA vs SPM  -0.03    -0.03     0.23    -0.06
ICA vs FSL    0.08     0.56     0.03     0.07
LICA vs FSL   0.07     0.02     0.58     0.20
statistics computed by SPM and FSL. Table 3 shows these correlations. Here the agreement between the third endmember of LICA and the second source of ICA ML receives further support, because both are the ones that show maximal agreement with SPM and FSL, and in both ICA and LICA the agreement with FSL is greater than with the SPM results.
4 Summary and Conclusions
We have proposed and applied Lattice Independent Component Analysis (LICA) to model-free (unsupervised) VBM analysis. LICA is based on the application of a Lattice Computing algorithm, the IEIA, for the selection of the endmembers, and on the linear unmixing of the data based on these endmembers. We compare our results with those obtained by the conventional SPM and FSL algorithms, as well as the unsupervised ICA approach. We find a strong agreement between the LICA results and those of ICA, and we can identify endmembers and sources that correspond closely to the significant detections of SPM and FSL, providing a validation of the approach. The problem with VBM and similar morphometric approaches is that we need to be able to give some interpretation to the findings of the ICA and LICA algorithms; that is, besides the obvious identification of voxels that correlate well with the indicative variable, the problem is to find additional regularities and give them some sense. Some kind of hierarchical analysis [7] could be advantageous in future work.
References
1. http://isp.imm.dtu.dk/toolbox/ica/index.html
2. http://www.cis.hut.fi/projects/ica/fastica/
3. Ashburner, J., Friston, K.J.: Voxel-based morphometry: The methods. Neuroimage 11(6), 805–821 (2000)
4. Busatto, G.F., Garrido, G.E.J., Almeida, O.P., Castro, C.C., Camargo, C.H.P., Cid, C.G., Buchpiguel, C.A., Furuie, S., Bottino, C.M.: A voxel-based morphometry study of temporal lobe gray matter reductions in Alzheimer's disease. Neurobiology of Aging 24(2), 221–231 (2003)
5. Fotenos, A.F., Snyder, A.Z., Girton, L.E., Morris, J.C., Buckner, R.L.: Normative estimates of cross-sectional and longitudinal brain volume decline in aging and AD. Neurology 64(6), 1032–1039 (2005)
6. Frisoni, G.B., Testa, C., Zorzan, A., Sabattoli, F., Beltramello, A., Soininen, H., Laakso, M.P.: Detection of grey matter loss in mild Alzheimer's disease with voxel based morphometry. Journal of Neurology, Neurosurgery & Psychiatry 73(6), 657–664 (2002)
7. Graña, M., Torrealdea, F.J.: Hierarchically structured systems. European Journal of Operational Research 25, 20–26 (1986)
8. Graña, M.: A brief review of lattice computing. In: Proc. WCCI, pp. 1777–1781 (2008)
9. Graña, M., Chyzyk, D., García-Sebastián, M., Hernández, C.: Lattice independent component analysis for fMRI. Information Sciences (in press, 2010)
10. Graña, M., Villaverde, I., Maldonado, J.O., Hernandez, C.: Two lattice computing approaches for the unsupervised segmentation of hyperspectral images. Neurocomputing 72(10-12), 2111–2120 (2009)
11. Graña, M., Savio, A.M., Garcia-Sebastian, M., Fernandez, E.: A lattice computing approach for on-line fMRI analysis. Image and Vision Computing (in press, 2009)
12. Schmalz, M.S., Ritter, G.X., Urcid, G.: Autonomous single-pass endmember approximation using lattice auto-associative memories. Neurocomputing 72(10-12), 2101–2110 (2009)
13. Højen-Sørensen, P., Winther, O., Hansen, L.K.: Mean field approaches to independent component analysis. Neural Computation 14, 889–918 (2002)
14. Kolenda, T., Hansen, L.K., Larsen, J.: Blind detection of independent dynamic components. In: Proc. IEEE ICASSP 2001, vol. 5, pp. 3197–3200 (2001)
15. Schuster, H., Molgedey, L.: Separation of independent signals using time-delayed correlations. Physical Review Letters 72(23), 3634–3637 (1994)
16. Marcus, D.S., Wang, T.H., Parker, J., Csernansky, J.G., Morris, J.C., Buckner, R.L.: Open access series of imaging studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. Journal of Cognitive Neuroscience 19(9), 1498–1507 (2007)
17. Scahill, R.I., Schott, J.M., Stevens, J.M., Rossor, M.N., Fox, N.C.: Mapping the evolution of regional atrophy in Alzheimer's disease: Unbiased analysis of fluid-registered serial MRI. Proceedings of the National Academy of Sciences 99(7), 4703 (2002)
18. Calhoun, T.V.D., Adali, T.: Unmixing fMRI with independent component analysis. IEEE Engineering in Medicine and Biology Magazine 25(2), 79–90 (2006)
Fusion of Single View Soft k-NN Classifiers for Multicamera Human Action Recognition Rodrigo Cilla, Miguel A. Patricio, Antonio Berlanga, and Jose M. Molina Computer Science Department, Universidad Carlos III de Madrid Avda. de la Universidad Carlos III, 22. 28270 Colmenarejo, Madrid. Spain {rcilla,mpatrici}@inf.uc3m.es, {aberlan,molina}@ia.uc3m.es
Abstract. This paper presents two different classifier fusion algorithms applied in the domain of Human Action Recognition from video. A set of cameras observes a person performing an action from a predefined set. For each camera view a 2D descriptor is computed, and a posterior on the performed activity is obtained using a soft classifier. These posteriors are combined using voting and a Bayesian network to obtain a single belief measure to use for the final decision on the performed action. Experiments are conducted with different low level frame descriptors on the IXMAS dataset, achieving results comparable to state-of-the-art 3D proposals while performing only 2D processing.
1 Introduction
Human Action Recognition (HAR) from video is one of the most active research areas in computer vision. Several surveys of the work in the area have been published in recent years [1]. Applications of HAR systems range from video surveillance [2] and Ambient Assisted Living [3] to automatic annotation of video contents [4]. The recognition of human actions from video may be considered a pattern recognition problem [5]. First, a low level descriptor is computed to capture the variance in the input frames. Popular choices at this level are motion templates [6], optical flow descriptors [7], spatio-temporal interest points [4], trajectories [8] or a combination of them [9,2]. This computed descriptor is fed into a classifier to obtain the action category it belongs to. Common choices include Mixtures of Gaussians [8], Support Vector Machines [10], database searches [7,9,2] or Hierarchical Bayesian Models [11]. A particular feature of the recognition of human actions is that actions do not happen in isolation; they happen in a temporal sequence. The most popular technique to model the temporal sequence statistics has been Hidden Markov Models [12]. Other proposed techniques have been Context Free Grammars [13] and Conditional Random Fields [14]. In this work we assume that actions happen in isolation, focusing on the descriptor classification level. Most of the existing approaches to HAR have considered a single video sensor to perceive the environment where the actions take place. A single sensor may not be enough to accurately perceive the actions, due to the presence of occlusions. E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 436–443, 2010. © Springer-Verlag Berlin Heidelberg 2010
These occlusions may be caused by the relative position of the human body and the camera (self-occlusions) or by the presence of walls and furniture in the environment. To deal with these problems, HAR systems may be improved using a Visual Sensor Network (VSN) [15] with overlapped cameras. In this paper we study how to obtain a single classification of the action perceived by all the cameras from the outputs of a set of single camera soft classifiers. Single camera soft classifiers provide a posterior for the performed activity based on the frame descriptor previously computed. We try two different approaches to solve the problem: the first is based on a weighted voting scheme; the second is based on using a Bayesian network to model the error produced by each of the single view classifiers. Our approach avoids computing the 3D visual hull, an expensive and centralized task used by state-of-the-art methods for multiple view human action recognition [16,17], using only 2D pattern recognition techniques. The paper is organized as follows: in Section 2 the problem to solve is formally defined; in Section 3, the classifier fusion algorithms to be tested are presented; in Section 4, we present the single view soft classifier we use to test the classifier fusion algorithms; in Section 5, the results of applying the proposed algorithms to classify the IXMAS dataset are shown; finally, in Section 6, the conclusions of this work are presented.
2 Problem Statement
Let f_t = (f_t^1, ..., f_t^C) be a set of action descriptors computed by a set of C cameras at an arbitrary instant t. The posterior probability p(y_n | f_t^c) of action y_n, y_n ∈ Y = {y_1, ..., y_N}, is obtained by applying a soft classifier to the descriptor f_t^c. Let B = {p(y_n | f_t^c)} ∀n, c be the set of all the posterior probabilities obtained after applying the soft classifier to each one of the views. The problem we want to solve is how to combine the posteriors in B into a single posterior for all the cameras, p(y_n | f_t^1, ..., f_t^C), y_n ∈ Y, in order to decide which activity y_n is being performed.
3 Fusion of Soft Classifiers
Two different algorithms are tested for this task: the first, a voting scheme; the second, a Bayesian network modeling the errors of the local classifications.
3.1 Voting
The first algorithm we test for the fusion of the single view soft classifications is defined as the sum of the posterior probabilities:

p(a_i | f_t^1, ..., f_t^C) ∝ Σ_{c=1}^{C} p(a_i | f_t^c)    (1)
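A minimal sketch of this voting rule, assuming the per-camera posteriors are stacked in a C×N array (the function and variable names are illustrative):

```python
import numpy as np

def fuse_by_voting(posteriors):
    """Combine per-camera posteriors by summing them (equation 1).

    posteriors : (C, N) array, row c holding p(a_i | f_t^c) for camera c.
    Returns the normalized fused posterior over the N actions.
    """
    fused = posteriors.sum(axis=0)
    return fused / fused.sum()

# Three cameras, three candidate actions.
B = np.array([[0.6, 0.3, 0.1],
              [0.5, 0.4, 0.1],
              [0.2, 0.7, 0.1]])
p = fuse_by_voting(B)
decision = int(np.argmax(p))   # index of the selected action
```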
3.2 Bayesian Network
The second algorithm we test for the fusion of the single view soft classifications is based on the Bayesian network shown in Figure 1. The network is composed of observation nodes f_t^c, representing the observation at instant t in camera c, a node α_t representing the activity at time t, and a set of latent nodes v_t^c that model the single view classifications. Given a set of frame descriptors f_t = (f_t^1, ..., f_t^C), a set of latent variables v_t = (v_t^1, ..., v_t^C), and the activity label α_t, their joint probability is factorized as:

p(α_t, v_t, f_t) = p(α_t | v_t) Π_{c=1}^{C} p(v_t^c) p(f_t^c | v_t^c)    (2)
The probability of α_t is defined as a product of independent factors, assuming independence between the hidden variables v_t^c:

p(α_t | v_t) = Π_{c=1}^{C} p(α_t | v_t^c)    (3)
With this assumption we choose not to model correlations between local classification errors. In this way, when adding a new camera to the system only two conditional probability distributions need to be estimated, instead of the exponential number needed if the assumption were not made. Thus, equation 2 can be rewritten as:

p(α_t, v_t, f_t) = Π_{c=1}^{C} p(α_t | v_t^c) p(v_t^c) p(f_t^c | v_t^c)    (4)
The posterior probability of an activity label α_t and a set of hidden variables v_t is proportional to the joint probability:

p(α_t, v_t | f_t) ∝ p(α_t, v_t, f_t)    (5)
Given a set of frame descriptors f_t, the posterior probability of the activity label α_t is obtained by marginalizing equation 5 over the set of latent variables v_t:

p(α_t = a_i | f_t) ∝ Π_{c=1}^{C} Σ_{j=1}^{N} p(α_t = a_i | v_t^c = a_j) p(v_t^c = a_j) p(f_t^c | v_t^c = a_j)    (6)
p(f_t^c | v_t^c = a_j) may be computed in terms of p(v_t^c = a_j | f_t^c) using Bayes' theorem:

p(f_t^c | v_t^c = a_j) = p(v_t^c = a_j | f_t^c) p(f_t^c) / p(v_t^c = a_j) ∝ p(v_t^c = a_j | f_t^c) / p(v_t^c = a_j)    (7)

The term p(f_t^c) vanishes assuming that f_t^c ∼ Uniform. The final expression for the posterior is obtained by introducing the RHS of equation 7 into equation 6:
Fig. 1. Plate model of the Bayesian Network used to combine the outputs from the classifiers at each camera
p(α_t = a_i | f_t) ∝ Π_{c=1}^{C} Σ_{j=1}^{N} p(α_t = a_i | v_t^c = a_j) p(v_t^c = a_j | f_t^c)    (8)
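Equation 8 can be sketched as follows, assuming the conditional tables p(α_t = a_i | v_t^c = a_j) have already been estimated; the array layout and names are illustrative assumptions, not taken from the authors' implementation:

```python
import numpy as np

def fuse_bayes_net(posteriors, cpt):
    """Fusion with the Bayesian network of equation 8.

    posteriors : (C, N) per-camera soft classifications p(v_t^c = a_j | f_t^c).
    cpt        : (C, N, N) tables, cpt[c, i, j] = p(alpha_t = a_i | v_t^c = a_j).
    Returns the normalized posterior p(alpha_t = a_i | f_t) over the N actions.
    """
    C, N = posteriors.shape
    result = np.ones(N)
    for c in range(C):
        # Sum over the latent single-view label j for camera c, then
        # multiply the per-camera contributions together.
        result *= cpt[c] @ posteriors[c]
    return result / result.sum()

# Toy setup: 2 cameras, 2 actions, slightly noisy single-view classifiers.
cpt = np.tile(np.array([[0.9, 0.2],
                        [0.1, 0.8]]), (2, 1, 1))
posteriors = np.array([[0.7, 0.3],
                       [0.6, 0.4]])
p = fuse_bayes_net(posteriors, cpt)
```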
Network parameters are estimated using labeled training samples. p(v_t^c | f_t^c) is known, being provided by the single view soft classifiers, so only p(α_t | v_t^c) needs to be estimated. Let O^c = {o_1^c, ..., o_K^c} be the set of K training frame descriptors computed at camera c, with their corresponding activity labels Y^c = {y_1^c, ..., y_K^c}, y_k^c ∈ A. Model parameters are estimated as:

p(α_t = a_i | v_t^c = a_j) = [Σ_{k=1}^{K} γ_k p(v_t^c = a_j | o_k^c)] / [Σ_{l=1}^{N} Σ_{k=1}^{K} γ_k p(v_t^c = a_l | o_k^c)]    (9)

where γ_k = 1 if y_k = a_j and γ_k = 0 otherwise.
4 Soft Classifier
The classifier we use to obtain the probability of each single frame being an instance of each action category is based on a k-Nearest Neighbor (kNN) setting. Let D = {x_i, y_i}, 1 ≤ i ≤ M, be a set of M training samples, where y_i ∈ {y_1, ..., y_N} is the label corresponding to the instance x_i. The posterior probability p(y | x^j) of a new sample x^j is decided by sampling from its neighborhood, transforming the distances to the k nearest neighbors into likelihood values:

p(y = y_n | x^j) ∝ Σ_{k=1}^{K} γ_k (ρ_j − ||x^j − x_k||)    (10)

where ρ_j = Σ_{k=1}^{K} ||x^j − x_k||, i.e., the sum of the distances to the k nearest neighbors of x^j; γ_k = 1 if y_n = y_k and γ_k = 0 otherwise. The main advantage of this classifier is
that it captures the local structure of the data, being able to model multimodal distributions. Training is also very fast because it only requires storing the samples in the database.
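A minimal sketch of this soft kNN posterior (equation 10), with illustrative names and a toy training set; note that the uniform fallback for a zero score total is an assumption added for robustness, not part of the paper:

```python
import numpy as np

def soft_knn_posterior(X, y, x_new, k=3, n_classes=None):
    """Soft k-NN posterior of equation 10: the likelihood of class y_n is the
    sum of (rho_j - distance) over the k nearest neighbors labeled y_n,
    where rho_j is the sum of the k nearest distances."""
    if n_classes is None:
        n_classes = int(y.max()) + 1
    d = np.linalg.norm(X - x_new, axis=1)   # distances to all training samples
    nn = np.argsort(d)[:k]                  # indices of the k nearest neighbors
    rho = d[nn].sum()
    scores = np.zeros(n_classes)
    for idx in nn:
        scores[y[idx]] += rho - d[idx]      # closer neighbors contribute more
    total = scores.sum()
    return scores / total if total > 0 else np.full(n_classes, 1.0 / n_classes)

# Toy example: two well-separated classes in 2D.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
y = np.array([0, 0, 1, 1])
post = soft_knn_posterior(X, y, np.array([0.05, 0.0]), k=3)
```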
5 Experiments
5.1 Experimental Setup
Experiments are conducted on a state-of-the-art testbed for human action recognition: the Inria IXMAS dataset (http://charibdis.inrialpes.fr/). The dataset includes samples of eleven action categories performed 3 times each by 12 different actors (36 clips), recorded from 5 different camera views. The actions are: check watch, cross arms, scratch head, sit down, get up, turn around, walk, wave, punch, kick and pick up. Two different frame descriptors are used to model these actions and test our algorithms. The first one is the popular Motion History Image (MHI) [6]. This descriptor is based on a temporal accumulation of the human body shape. The computed descriptors are resized to a box of 35x20 pixels, obtaining a feature vector of length l_MHI = 700. The second is the one proposed by Tran et al. [9], including both shape and optical flow information. The extracted descriptor can be obtained from their web page (http://vision.cs.uiuc.edu/projects/activity/), its length being l_Tran = 286. The evaluation protocol to test the classification and fusion algorithms is Leave-One-Clip-Out Cross Validation: the algorithms are trained with all the action clips except one, which is used for testing. The procedure is repeated until all the clips have been used for testing. The kNN classification algorithms are tested using neighborhood values of k = 3 and k = 5. As the length of the descriptors is too large for practical usage, the well known Principal Component Analysis is applied to obtain reduced descriptors ranging from l = 10 to l = 45 with a stepsize of 5.
5.2 Results and Discussion
Single camera classification. Figure 2 shows the result of classifying each of the camera views with the soft kNN classifier. It is clear that the Tran descriptor predicts the activity performed in a single frame better than the MHI. This behavior was expected to some extent, because Tran's descriptor includes shape and local motion information, while the MHI only includes shape. The classifiers with k = 5 always work better than those with k = 3. Single camera classification results also show that while the accuracy obtained from cameras 1-4 is similar, camera 5 accuracy drops by about 10%. Camera 5 provides a top view of the action, preventing the descriptors from accurately capturing the dynamics of the performed action.
Fig. 2. Classification results at each camera ((a)-(e): Cameras 1-5) before and after the fusion. The first number stands for the number of nearest neighbors used. The suffix stands for the fusion algorithm used: V for voting and C for the Bayesian network.
Fusion results. The plots in Figure 2 also show the results obtained after applying the fusion algorithms to the single camera soft classifications. The weighted voting proposed in Section 3.1 and the Bayesian network proposed in Section 3.2 give similar results, voting being slightly better. The fusion algorithms improve the classification based on MHI descriptors more. This is
Table 1. Comparison of the accuracy of our method to others

Method                  Accuracy  Type
Tran et al. [9]         81        2D
Srivastava et al. [18]  81.4      2D Multicamera
Our                     92.01     Multicamera
Weinland et al. [16]    93.33     3D
Peng et al. [17]        94.59     3D
probably because, as the initial result was worse than when using Tran descriptors, it is easier to improve the results using fusion. Comparison to other proposals. Finally, in Table 1, we compare the results obtained by our method to those obtained by other state-of-the-art approaches. Our method performs better than other 2D multicamera approaches, obtaining results comparable to proposals based on computing the 3D visual hull. Results in the table are for sequence classification. To obtain them, each frame in a sequence votes with its posterior distribution to obtain the majority classification.
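The sequence-level decision described above (each frame voting with its posterior distribution) can be sketched as follows; the function name is an illustrative assumption:

```python
import numpy as np

def classify_sequence(frame_posteriors):
    """Sequence-level decision: each frame votes with its posterior
    distribution, and the action accumulating the most mass wins."""
    votes = np.asarray(frame_posteriors).sum(axis=0)
    return int(np.argmax(votes))

# Three frames of one sequence, three candidate actions.
frames = [[0.5, 0.4, 0.1],
          [0.3, 0.6, 0.1],
          [0.2, 0.7, 0.1]]
label = classify_sequence(frames)
```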
6 Conclusions
In this paper we have shown how the accuracy of the human action classification task can be improved by combining the results of single view classifiers. We want to remark that our method avoids visual hull computation, making it very easy to implement in a distributed environment. Another advantage of the proposed method is that it can integrate other sensors without much effort, because the fusion level is independent of the type of sensor used. If a posterior for the activity can be obtained from the hypothetical sensor, it can be used in our system. Future work will explore how to model the correlations between the soft classifications from each camera. We suspect that the independence assumption made between sensor values is too strong, and that fusion results may be greatly improved by introducing dependencies between sensors in our fusion model. Acknowledgment. This work was supported in part by Projects CICYT TIN2008-06742-C02-02/TSI, CICYT TEC2008-06732-C02-02/TEC, CAM CONTEXTS (S2009/TIC-1485) and DPS2008-07029-C02-02.
References
1. Lavee, G., Rivlin, E., Rudzsky, M.: Understanding Video Events: A Survey of Methods for Automatic Interpretation of Semantic Occurrences in Video. IEEE Transactions on Systems, Man and Cybernetics - Part C: Applications and Reviews 39(5), 489–504 (2009)
2. Robertson, N., Reid, I.: A general method for human activity recognition in video. Computer Vision and Image Understanding 104(2-3), 232–248 (2006)
3. Cilla, R., Patricio, M., Berlanga, A., Molina, J.: Non-supervised discovering of user activities in visual sensor networks for ambient intelligence applications. In: 2nd International Symposium on Applied Sciences in Biomedical and Communication Technologies, ISABEL 2009, November 2009, pp. 1–6 (2009)
4. Marszalek, M., Laptev, I., Schmid, C.: Actions in context. In: Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Miami, USA (2009)
5. Bishop, C., et al.: Pattern recognition and machine learning. Springer, New York (2006)
6. Bobick, A., Davis, J.: The recognition of human movement using temporal templates. IEEE Transactions on Pattern Analysis and Machine Intelligence 23(3), 257–267 (2001)
7. Efros, A., Berg, A., Mori, G., Malik, J.: Recognizing action at a distance. In: IEEE International Conference on Computer Vision, vol. 2, pp. 726–733 (2003)
8. Ribeiro, P., Santos-Victor, J.: Human activity recognition from video: modeling, feature selection and classification architecture. In: International Workshop on Human Activity Recognition and Modeling, HAREM (2005)
9. Tran, D., Sorokin, A., Forsyth, D.: Human activity recognition with metric learning. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part I. LNCS, vol. 5302, pp. 548–561. Springer, Heidelberg (2008)
10. Cao, D., Masoud, O., Boley, D., Papanikolopoulos, N.: Human motion recognition using support vector machines. Computer Vision and Image Understanding 113(10), 1064–1075 (2009)
11. Niebles, J., Wang, H., Fei-Fei, L.: Unsupervised learning of human action categories using spatial-temporal words. International Journal of Computer Vision 79(3), 299–318 (2008)
12. Oliver, N., Rosario, B., Pentland, A.: A Bayesian computer vision system for modeling human interactions. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8), 831–843 (2000)
13. Guerra-Filho, G., Aloimonos, Y.: A Language for Human Action. Computer 40(5), 42–51 (2007)
14. Quattoni, A., Wang, S., Morency, L., Collins, M., Darrell, T.: Hidden-state conditional random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence (2007)
15. Cucchiara, R., Prati, A., Vezzani, R.: Making the home safer and more secure through visual surveillance. In: Symposium on Automatic detection of abnormal human behaviour using video processing of Measuring Behaviour, Wageningen, The Netherlands (2005)
16. Weinland, D., Ronfard, R., Boyer, E.: Free viewpoint action recognition using motion history volumes. Computer Vision and Image Understanding 104(2-3), 249–257 (2006)
17. Peng, B., Qian, G., Rajko, S.: View-Invariant Full-Body Gesture Recognition via Multilinear Analysis of Voxel Data. In: Third ACM/IEEE Conference on Distributed Smart Cameras (September 2009)
18. Srivastava, C., Iwaki, H., Park, J., Kak, A.C.: Distributed and Lightweight Multi-Camera Human Activity Classification. In: Third ACM/IEEE Conference on Distributed Smart Cameras (September 2009)
Self-adaptive Coordination for Organizations of Agents in Information Fusion Environments Sara Rodríguez, Belén Pérez-Lancho, Javier Bajo, Carolina Zato, and Juan M. Corchado University of Salamanca, Plaza de la Merced s/n, 37008 Salamanca, Spain {srg,lancho,jbajope,carol_zato,corchado}@usal.es
Abstract. Each organization of agents needs to be supported by a coordinated effort that explicitly determines how the agents should be organized and how they should carry out the actions and tasks assigned to them. The interactions of a multi-agent system cannot be described only in terms of the agents and their communication skills; it is also necessary to use the concepts of organizational engineering. This research presents a new global coordination model for an agent organization. The innovation of the model consists of its dynamic and adaptive planning capability to distribute tasks among the agent members of the organization as effectively as possible. Keywords: Multi-agent systems; virtual organizations; dynamic architectures; adaptive environments.
1 Introduction
Open MAS should allow the participation of heterogeneous agents with different architectures and even different languages [17][5]. The development of open MAS is still a recent field of the multi-agent system paradigm, and its development will allow the agent technology to be applied in new and more complex application domains. However, this makes it impossible to trust agent behavior unless certain controls based on norms or social rules are imposed. To this end, developers have focused on the organizational aspects of agent societies, using the concepts of organization, norms, roles, etc. to guide the development process of the system. Virtual organizations [9] are a means of understanding system models from a sociological perspective. From a business perspective, a virtual organization model is based on the principles of cooperation among businesses within a shared network, and exploits the distinguishing elements that provide the flexibility and quick response capability that form the strategy aimed at customer satisfaction. Even so, within the development of organizations, both at the business and agent level, we find a set of requirements [15] that call for the use of new social models in which the use of open and adaptive systems is possible [17]. Given the advantages provided by the unique characteristics found in the development of MAS from an organizational perspective, and the absence of an adaptive E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 444–451, 2010. © Springer-Verlag Berlin Heidelberg 2010
planning process for any social model, this study proposes a model that can coordinate a dynamic and adaptive planning system in an agent organization. The development of the model enables the use of information to improve the allocation of tasks. The proposed notions will be validated via the development of an experimental system consisting of a small-scale fusion and planning model located at each of the participating agents. The article is structured as follows: Section 2 describes the state of the art of agent organizations and their adaptation. Section 3 presents the proposed planning model. Finally, Section 4 demonstrates how the model can be used in a case study in an information fusion environment and presents some conclusions and experimental results.
2 Organizational Approaches
There are several different organizational approaches [7][17]. However, while these studies provide mechanisms for creating coordination among participants, there is much less work focused on adapting, at execution time, organizational structures or norms defined at design time. For example, [12] proposes a model for controlling adaptation by creating new norms. [10] proposes a distributed model for reorganizing the architecture. [1] requires agents to follow a protocol to adapt the norms. Each of these studies focuses on adapting the structure and/or norms that coordinate the participants. Another possibility is the development of a MAS that focuses on the concept of organization/institution. An electronic institution [8] should be considered a social middleware between the external participating agents and the selected communication layer, responsible for accepting or rejecting the agent actions. The primary difference with the other proposals is that the adaptation is carried out by the institution instead of by the agents. Lastly, there are approaches that focus on social group mechanisms based on the social information gathered during the interactions [16]. None of these approaches is capable of coordinating the tasks of the member agents of the organization to solve a common problem, nor do they consider that task planning should adapt to changes in the environment. The architecture selected for this study is OVAMAH [3][11], which focuses on defining the structure and norms. OVAMAH (Adaptive Virtual Organizations: Mechanisms, Architectures and Tools) is the evolution of the THOMAS architecture (MeTHods, techniques and tools for Open Multi-Agent Systems) [6][11]. The following section presents the planning model integrated into OVAMAH, whose goal is to carry out an adaptive planning process within an agent organization. The architecture is essentially formed by a set of services that are modularly structured.
OVAMAH uses the FIPA architecture, expanding its capabilities with respect to the design of the organization, while also expanding its service capacity. OVAMAH has a module whose sole objective is to manage the organizations that have been introduced into the architecture, and it incorporates a new definition of the FIPA Directory Facilitator that is capable of handling services in a much more elaborate way, following service-oriented architecture directives. From a global perspective, the OVAMAH architecture offers total integration, enabling agents to transparently offer and request services from other agents
446
S. Rodríguez et al.
or entities, at the same time allowing external entities to interact with agents in the architecture by using the services provided.
3 Description of the Model

This research proposes a planning model that provides a self-adaptation capability within an agent society. We use a cooperative MAS in which each agent is capable of establishing plans dynamically in order to reach its objectives. The global mechanism considers the global objective of the society, as well as its norms and roles. The result is a planning model that can, within an architecture geared towards the development of agent organizations (OVAMAH [3][11]), take into account the changes that are produced within an environment during the execution of a plan. The planning process defines the actions that the society of agents will have to execute and should therefore also take into account the particular circumstances of each of its members. To achieve this, a CBP-BDI (Case-Based Planning) agent is used, applying the planning model presented in this section, which is particularly suited to organizations. A CBP-BDI agent is a specialization of a CBR-BDI agent [6]. A CBP-BDI agent calculates the plan or intention that is easiest to replan: the Most RePlannable Intention (MRPI). This is the plan that can most easily be replaced by another plan if it is interrupted (for example, if a user changes preferences while the plan is being executed). A plan p within an organization is defined as p = <E, O, O', R, R'>, where: E is the environment, which represents the type of problem that the organization solves, and is characterized by a set of states E = {e0, e*} for each agent, where e0 represents the initial state of the agent when the plan begins and e* is the state or set of states that the agent tries to achieve; O represents the set of objectives of the individual agent and O' is the set of objectives reached once the plan has been executed; R is the set of resources available to the given agent and R' is the set of resources that the agent has used during the execution of the plan.
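The plan tuple and the MRPI criterion can be sketched in code. This is a minimal illustration, not OVAMAH's actual API: the names `Plan` and `most_replannable` and the neighbour-counting heuristic are assumptions introduced here. Replannability is approximated by counting how many alternative plans connect the same initial state e0 to the same goal e*, following the intuition that the MRPI is the plan with the most plans surrounding it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    """A plan p = <E, O, O', R, R'> (illustrative encoding)."""
    e0: str                 # initial state of the agent when the plan begins
    e_star: str             # state the agent tries to achieve
    objectives: frozenset   # O: objectives of the individual agent
    reached: frozenset      # O': objectives reached after execution
    resources: frozenset    # R: resources available to the agent
    used: frozenset         # R': resources consumed during execution

def most_replannable(plans):
    """Pick the Most RePlannable Intention: the plan with the largest
    number of alternative plans linking the same e0 to the same e*."""
    def neighbours(p):
        return sum(1 for q in plans
                   if q is not p and q.e0 == p.e0 and q.e_star == p.e_star)
    return max(plans, key=neighbours)
```

A plan surrounded by many alternatives is cheap to abandon mid-execution, which is exactly why the CBP-BDI agent prefers it.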
Fig. 1. Planning Model
Given the initial state of the organization, the term global planning is used to describe the search for a solution that can reach the final state, all the while complying with a series of requirements for the organization. The problem can be represented in
Self-adaptive Coordination for Organizations of Agents
447
a planning space that is delimited by the restrictions imposed by the requirements. Given a common objective, the available resources and the tasks to perform, the aim is to find a global plan that allows the organization to reach the optimal solution. To this end, the planning agent should bear in mind the optimal plans p*(t) obtained for each individual agent. It is not necessary for all of the agents within the organization to know how to meet the objectives, but they should know how to perform some of the tasks that contribute towards reaching those objectives for the organization. Upon initiating the process, certain agents will be retrieved from the case memory to perform at least one of the problem tasks. For each task that is not completed by any of the retrieved agents, at least one new agent will be incorporated: the agent with the greatest probability of successfully completing the given task. The idea is to count on the necessary agents so that no task is left unassigned. Let us assume that the common objective of the m agents has n states or tasks, with m, n ∈ ℕ. Each agent has its own characteristics with regard to which tasks it can perform, which resources to use, and the amount of time available to perform the tasks; in other words, each agent has its own profile. Given a state j, each agent i, with i ∈ {1, …, m}, can be defined by a tuple z_ij, where each coordinate of the tuple refers to a characteristic that defines it. The following binary variables are defined:

    a_ij = 1 if agent i is assigned to task j, and 0 otherwise.
For each task-assignment problem, an objective function is defined whose goal is to minimize or maximize the cost incurred by the m agents in achieving the common objective: for example, to minimize the cost of using one of the agents to reach an objective, or to maximize an efficiency function, as needed in each case. An efficiency function is introduced in order to assign tasks to the agents; its aim is to visit the greatest number of points at the lowest possible cost. The cost c_{t_ij r_ij} is a function that depends on the time t_ij that agent i spends working on task j, on the resources r_ij used, and on the type of agent assigned to each task. The efficiency function is defined as:

    Efficiency = (number of points visited) / ∑_{i=1}^{m} ∑_{j=1}^{n} c_{t_ij r_ij} · a_ij

and we want to maximize it:

    max (number of points visited) / ∑_{i=1}^{m} ∑_{j=1}^{n} c_{t_ij r_ij} · a_ij

where t_ij is the time it takes agent i to perform task j, and

    t_ij = max_k { t_ijk }

where t_ijk indicates the time it takes agent i to perform task j for tourist type k. Taking the maximum over k (the type of tourist), we can ensure that the guide has time to perform the necessary task regardless of the type of tourist. These times are initially estimated. Let us now define the restrictions of the problem.

1. We want each state to be completed by exactly one agent, which in mathematical terms can be stated, for each state k, as:

    ∑_{i=1}^{m} a_ik = 1,  ∀k ∈ {1, …, n}
2. We want each state to be completed within a specified period of time. Let us assume that state k should be completed within time t_k. The restriction is:

    ∑_{i=1}^{m} t_ik · a_ik ≤ t_k,  ∀k ∈ {1, …, n}

3. Each state k needs a set of resources to be executed, and there is no reason for all of the agents to have these resources. Given state k, we need the resources {r_k^x}, x ∈ {1, …, w}, where w is the maximum number of resources required by any state k = 1, …, n. The variables r_k^x are defined in binary form:

    r_k^x = 1 if state k needs resource x, and 0 otherwise.

The agent that performs state k must at the very least have at its disposal the resources needed to perform it. Therefore, given state k and each resource of the set {r_k^x}, x ∈ {1, …, w}, we can define the restriction:

    ∑_{i=1}^{m} r_i^x · a_ik ≥ r_k^x,  ∀k ∈ {1, …, n}, ∀x ∈ {1, …, w}

The variables {r_i^x}, x ∈ {1, …, w}, ∀i ∈ {1, …, m}, are binary:

    r_i^x = 1 if agent i has resource x, and 0 otherwise.
4. Each agent i has a minimum and a maximum working time, depending on the type of agent. These times are represented as t_i^{Turn on} and t_i^{Turn off} respectively:

    t_i^{Turn on} ≤ ∑_{j=1}^{n} t_ij · a_ij ≤ t_i^{Turn off},  ∀i ∈ {1, …, m}

For the majority of agents, as we will see in the case study, the maximum number of working hours is equal to a regular 8-hour work day.

5. Every time we assign tasks to an agent, we want it to perform a minimum number of tasks, which varies according to the type of agent:

    ∑_{j=1}^{n} a_ij ≥ NumberTask_i,  ∀i ∈ {1, …, m}

If the resulting non-linear programming problem were infeasible, we would add agents to make it feasible; the agent added would be the one with the highest a priori probability of performing the necessary tasks. If a norm (restriction) changes, the tasks must be assigned again. This allows us to obtain a plan for the tasks that need to be performed by the agent organization. In other words, we can obtain a global plan composed of all the tasks and of the agents in the organization that will carry them out. Every agent in the organization recognizes the tasks that it needs to perform. These agents, which are CBP-BDI agents, integrate the four phases of a CBR system (retrieve, reuse, revise and retain).
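As an illustration, the assignment problem above can be prototyped by exhaustive search over the binary variables a_ij. The sketch below encodes only constraints 1, 2 and 5 for brevity; the function name and data layout are assumptions introduced here, and a real system would use an integer-programming solver rather than enumeration.

```python
from itertools import product

def assign(m, n, t, t_max, min_tasks, cost):
    """Search binary assignments a_ij satisfying:
    (1) each task j is done by exactly one agent,
    (2) each task j finishes within its deadline t_max[j],
    (5) each agent i performs at least min_tasks[i] tasks,
    and return the feasible assignment with the lowest total cost."""
    best, best_cost = None, float("inf")
    # enumerating one responsible agent per task enforces constraint 1
    for owners in product(range(m), repeat=n):
        if any(t[i][j] > t_max[j] for j, i in enumerate(owners)):
            continue  # constraint 2 violated
        load = [sum(1 for i in owners if i == ag) for ag in range(m)]
        if any(load[ag] < min_tasks[ag] for ag in range(m)):
            continue  # constraint 5 violated
        c = sum(cost[i][j] for j, i in enumerate(owners))
        if c < best_cost:
            best, best_cost = owners, c
    return best, best_cost
```

The enumeration grows as m^n, so this only illustrates the feasibility check the text describes; if no assignment survives the filters, `best` stays `None`, which corresponds to the infeasible case where a new agent must be added.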
4 Experimental System and Results

This section presents a case study that tests the defined model. The case study delineates the scope and potential of virtual organizations in the design and development
of information fusion processors for deployment in multi-agent environments. An organization is implemented using the model proposed in Section 3 and is represented in a virtual world [14] containing a set of cultural heritage sites. The simulation within the virtual world represents a tourist environment in which there are guides and tourists, and in which the tour guide's tasks will be performed in adherence to a defined set of norms. The roles identified within the case study are: Tourist, Monument, Guide, Visitor, Coordinator, Notification and Manager.

The agents that take on the role of Guide are those that will carry out dynamic planning according to the tasks they need to carry out for each group of tourists. The generated plans should ensure that all of the visitors assigned to a tour guide are able to follow their tourist route. They will be personalized according to the Guide's profile and work habits, and should take into account the restrictions directly related to each agent on an individual basis, as well as the restrictions of the organization itself. These restrictions are imposed according to the norms for the society of agents: (i) the work schedule for a Guide agent (8 hours); (ii) the maximum number of Tourist agents assigned to a guide; (iii) visiting days and hours for certain monuments; (iv) the maximum number of Guide agents that can participate on a route; (v) the minimum number of points to visit on a route.

Once the Coordinator has identified all of the agents in the organization that are needed to carry out the plan, it assigns each task to the agent responsible for completing it. At that moment each Guide agent becomes aware of its tasks and designs an individual plan. Each Guide agent is a type of CBP-BDI agent capable of providing efficient plans at execution time. The following paragraph provides a detailed example. Let
E_g = {e_0^g, …, e_h^g} be the tasks carried out by a group of tourists and visitors g, in order of priority. We have the problem E = ∪_g E_g = {e_0, …, e_n}, where E represents the complete set of tasks that must be completed (for this reason they are not superscripted). Let us assume there are 10 guides. Randomly selecting a Guide i ∈ {1, …, 10} (specifically, i = 3), the task assignment according to its profile is: (1) Agent Task: Visit the cathedral with tourist group 2 ≡ e_1^2; t_31 = 30 min. (2) Agent Task: Take tourist group 2 to the aqueduct ≡ e_2^2; t_32 = 15 min. (3) Agent Task: Take tourist group 2 to the hermitage ≡ e_3^2; t_33 = 10 min. (4) Agent Task: Visit the hermitage ≡ e_4^2; t_34 = 10 min. (5) Agent Task: Take tourist group 2 to the Roman city ≡ e_5^2; t_35 = 20 min. (6) Agent Task: Visit the Roman city ≡ e_6^2; t_36 = 30 min. (7) Agent Task: Take tourist group 2 to the ravine ≡ e_7^2; t_37 = 50 min. (8) Agent Task: Hike along the ravine with group 2 ≡ e_8^2; t_38 = 20 min. (9) Agent Task: Return to the cathedral with group 2 ≡ e_9^2; t_39 = 10 min.

Calculating the assigned tasks ensures both that the total amount of time assigned to a Guide does not exceed 8 hours, and that any other restrictions corresponding to the norms of the organization are also respected. Each task has a set of objectives that must be met so that the global plan can be successfully completed. To perform each task, the Guide agent should have the necessary resources available. For example, the task "Buy tickets for museum 1" corresponds to the objective "Visit museum 1" ≡ O_0, and "breakfast, lunch, tea and dinner" correspond to the objectives ≡ O_{2,4,6,7} (task 2 indicates breakfast, task 4 indicates lunch, task 6 indicates tea, and task 7 indicates dinner). A similar coding is used for resources. As shown in Fig. 2a, the value 1 indicates that the resource is needed or the objective is to be met, while zero denotes the contrary. Fig. 2a shows the representation of a space ℝ³ for tasks according to the following three coordinates: time, number of objectives achieved, and number of resources used (coordinates taken from similar retrieved cases). Specifically, Fig. 2a shows a hyperplane of restrictions and the plan followed for a case retrieved from the beliefs base, considered to be similar to the new case. There are 120 possible routes, not all of which are viable because of the previously mentioned restrictions. In a simulated scenario where the Coordinator assigned this group of tourists to a Guide, the planning process used by the Guide for the tasks it needed to perform is the same as that shown in Fig. 2a.
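The time budget in the worked example can be checked directly; the task durations below are the nine values assigned to Guide i = 3 above.

```python
# durations in minutes of tasks (1)-(9) assigned to Guide i = 3
task_minutes = [30, 15, 10, 10, 20, 30, 50, 20, 10]

total = sum(task_minutes)
workday = 8 * 60  # norm (i): an 8-hour work schedule per Guide agent

assert total <= workday
print(f"assigned: {total} min of {workday} min available")
# prints: assigned: 195 min of 480 min available
```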
Fig. 2. Representation of a space ℝ³ for tasks (a) and replanned tasks (b). Number of agents working simultaneously (c).
Figure 2 illustrates the plan as it was carried out. To understand the graphical representation, let us focus on the initial task e1 and the final task e9. In between these two tasks, the Guide agent could carry out other tasks that would involve the same or different tourists and visitors. The idea presented in the planning model is to select the optimal plan, the one with the most plans surrounding it, as the solution. The following studies were carried out: given the same tourist attractions to be visited on the same day, and the same number of tourists per group, one group used the planner and the other did not. The results for different days, in terms of the number of Guide agents used, can be observed in Fig. 2c. The color blue represents the average number of guides needed each day using the planner, and red the number without using it. The proposed model helps the organization utilize fewer guides, thus minimizing its costs. In conclusion, we can affirm that we have achieved our stated objectives: (i) develop agent societies; (ii) simulate the behavior of an organization in a specific case involving the coordination and adaptation of its agents; and (iii) validate the proposed planning model through a simulation of the organization in a case study. As previously mentioned, it is increasingly common to model a MAS not only from the perspective of the agent and its communication capabilities, but by including organizational engineering as well. Acknowledgments. This work has been supported by the MICINN TIN 2009-13839-C03-03 project.
References

[1] Argente Villaplana, E.: GORMAS: Guías para el desarrollo de sistemas multi-agente abiertos basados en organizaciones. Ph.D. thesis, Universidad Politécnica de Valencia (2008)
[2] Artikis, A., Kaponis, D., Pitt, J.: Dynamic Specifications of Norm-Governed Systems. In: Multi-Agent Systems: Semantics and Dynamics of Organisational Models. IGI Global (2009)
[3] Carrascosa, C., Giret, A., Julian, V., Rebollo, M., Argente, E., Botti, V.: Service Oriented MAS: An Open Architecture (Short Paper). In: Decker, S., Sierra, C. (eds.) Proc. of 8th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2009), Budapest, Hungary, May 10-15, pp. 1291–1292 (2009)
[4] Castanedo, F., Patricio, M.A., García, J.M., Molina, J.: Data Fusion to Improve Trajectory Tracking in a Cooperative Surveillance Multi-Agent Architecture. Information Fusion 11, 243–255 (2010), special issue on Agent-based Information Fusion
[5] Corchado, E., Pellicer, M.A., Borrajo, M.L.: A MLHL Based Method to an Agent-Based Architecture. International Journal of Computer Mathematics 86(10-11), 1760–1768 (2008)
[6] Corchado, J.M., Glez-Bedia, M., de Paz, Y., Bajo, J., de Paz, J.F.: Concept, Formulation and Mechanism for Agent Replanification: MRP Architecture. In: Computational Intelligence. Blackwell Publishers, Malden (2008)
[7] Dignum, V.: A Model for Organizational Interaction: Based on Agents, Founded in Logic. Ph.D. thesis (2004)
[8] Esteva, M.: Electronic Institutions: From Specification to Development. Ph.D. thesis, Technical University of Catalonia (2003)
[9] Ferber, J., Gutknecht, O., Michel, F.: From Agents to Organizations: An Organizational View of Multi-Agent Systems. In: Giorgini, P., Müller, J.P., Odell, J.J. (eds.) AOSE 2003. LNCS, vol. 2935, pp. 214–230. Springer, Heidelberg (2004)
[10] Gasser, L., Ishida, T.: A Dynamic Organizational Architecture for Adaptive Problem Solving. In: Proc. of AAAI 1991, pp. 185–190 (1991)
[11] Giret, A., Julian, V., Rebollo, M., Argente, E., Carrascosa, C., Botti, V.: An Open Architecture for Service-Oriented Virtual Organizations. In: Seventh International Workshop on Programming Multi-Agent Systems, PROMAS 2009, pp. 23–33 (2009)
[12] Hubner, J.F., Sichman, J.S., Boissier, O.: Using the Moise+ for a Cooperative Framework of MAS Reorganisation. In: Bazzan, A.L.C., Labidi, S. (eds.) SBIA 2004. LNCS (LNAI), vol. 3171, pp. 506–515. Springer, Heidelberg (2004)
[13] Huhns, M., Stephens, L.: Multiagent Systems and Societies of Agents. In: Weiss, G. (ed.) Multi-agent Systems: A Modern Approach to Distributed Artificial Intelligence. MIT Press, Cambridge (1999)
[14] http://repast.sourceforge.net (2009)
[15] Rodríguez, S., Pérez-Lancho, B., De Paz, J.F., Bajo, J., Corchado, J.M.: OVAMAH: Multiagent-based Adaptive Virtual Organizations. In: 12th International Conference on Information Fusion, Seattle, Washington, USA (July 2009)
[16] Villatoro, D., Sabater-Mir, J.: Categorizing Social Norms in a Simulated Resource Gathering Society. In: Hübner, J.F., Matson, E., Boissier, O., Dignum, V. (eds.) COIN@AAMAS 2008. LNCS, vol. 5428, pp. 235–249. Springer, Heidelberg (2009)
[17] Zambonelli, F., Jennings, N.R., Wooldridge, M.: Developing Multiagent Systems: The Gaia Methodology. ACM Transactions on Software Engineering and Methodology 12, 317–370 (2003)
Sensor Management: A New Paradigm for Automatic Video Surveillance Lauro Snidaro, Ingrid Visentini, and Gian Luca Foresti Department of Mathematics and Computer Science University of Udine Udine, Italy [email protected]
Abstract. In this paper we discuss the new paradigm of Sensor Management that could be taken into consideration for the design of next-generation surveillance systems. The paradigm is meant to optimize the sensing capabilities of a system by taking into account the state of the environment being observed, along with contextual information that can drive the choice of sensing modalities and platforms. We thus provide a brief account of how the Sensor Management concepts developed within the data fusion community could be applied in the design of the next generation of surveillance systems. Keywords: Data fusion; Video Surveillance; Sensor Management.
1 Introduction
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 452–459, 2010. © Springer-Verlag Berlin Heidelberg 2010

Video surveillance systems have been based on multiple sensors since their first generation (CCTV systems) [1]. Video streams from analog cameras were multiplexed on video terminals in control rooms to help human operators monitor entire buildings or wide open areas. The latest generation makes use of digital equipment to capture and transmit images that can be viewed virtually anywhere over the Internet. Initially, multi-sensor systems were employed to extend surveillance coverage over wide areas. Recent advances in sensor and communication technology, in addition to lower costs, allow multiple sensors to be used for monitoring the same area [2,3]. This has opened new possibilities in the field of surveillance, as multiple and possibly heterogeneous sensors observing the same scene provide redundant, and possibly improved, data that can be exploited to improve detection accuracy and robustness, enlarge monitoring coverage, and reduce uncertainty [2]. While the advantages of using multiple sources of information are well known to the data fusion community [4], the full potential of multi-sensor surveillance is yet to be discovered. In particular, the enrichment of available sensor assets has made it possible to take advantage of data fusion techniques for solving specific tasks such as target localization and tracking [5], or person identification [6]. This can be formalized as the application of JDL Level 1 and 2 fusion techniques [4] to surveillance, strictly following a processing stream that exploits multi-sensor
data to achieve better system perception performance and, in the end, improved situational awareness. A brief exemplification of the techniques that can be employed at Levels 1 and 2 will be presented in Section 3. While many technical problems remain to be solved for integrating heterogeneous suites of sensors for wide-area surveillance, a principled top-down approach is probably still unexplored. Given the acknowledged complexity of the architectures that can be developed nowadays, full exploitation of this potential is probably beyond the possibilities of a human operator. Think, for example, of all the possible combinations of configurations made available by modern sensors: Pan-Tilt-Zoom (PTZ) cameras can be controlled to cover different areas, day/night sensors offer different sensing modalities, radars can operate at different frequencies, etc. The larger the system, the more likely it will be called upon to address many different surveillance needs. A top-down approach is needed in order to develop surveillance systems that can automatically manage large arrays of sensors so as to enforce surveillance directives provided by the operator, which in turn translate the security policies of the owning organization. Therefore, a new paradigm is needed to guide the design of architectures and algorithms for the next generation of surveillance systems, systems able to organize themselves to collect the data relevant to the objectives specified by the operator. This new paradigm would probably need to take inspiration from the principles behind the Sensor Management policies foreseen by JDL Level 4 [7]. JDL Level 4 is also called the Process Refinement step, as it implies adaptive data acquisition and processing to support mission objectives. Conceptually, this refinement step should be able to manage the system in its entirety: from controlling hardware resources (e.g. sensors, processors, storage, etc.)
to adjusting the processing flow in order to optimize the behaviour of the system to best achieve mission goals. It is therefore apparent that the Process Refinement step encompasses a broad spectrum of techniques and algorithms that operate at very different logical levels. In this regard, a fully implemented Process Refinement would provide the system with a form of awareness of its own capabilities and of how they relate and interact with the observed environment.
2 The JDL Fusion Process Model
Several fusion process models have been developed over the years. The first and best known originates from the US Joint Directors of Laboratories (JDL) in 1985, under the guidance of the Department of Defense (DoD). The JDL model [8] comprises five levels of data processing and a database, all interconnected by a bus. The five levels are not meant to be processed in a strict order and can also be executed concurrently. Steinberg and Bowman proposed revisions and expansions of the JDL model that broaden the functional model, relate the taxonomy to fields beyond the original military focus, and integrate a data fusion tree architecture model for system description, design, and development [9]. This updated model, sketched in Figure 1, is composed of the following levels:
Fig. 1. The JDL data fusion process model
– Level 0 - Sub-Object Data Assessment: estimation and prediction of signal/object observable states on the basis of pixel/signal-level data association and characterization;
– Level 1 - Object Assessment: estimation and prediction of entity states on the basis of observation-to-track association, continuous state estimation (e.g. kinematics) and discrete state estimation (e.g. target type and ID);
– Level 2 - Situation Assessment: estimation and prediction of relations among entities, to include force structure and cross-force relations, communications and perceptual influences, physical context, etc.;
– Level 3 - Impact Assessment: estimation and prediction of effects on situations of planned or estimated/predicted actions by the participants, including interactions between the action plans of multiple players (e.g. assessing susceptibilities and vulnerabilities to estimated/predicted threat actions given one's own planned actions);
– Level 4 - Process Refinement: adaptive data acquisition and processing to support mission objectives.

The model is deliberately very abstract, which sometimes makes it difficult to properly interpret its parts and to appropriately apply it to specific problems. However, as already mentioned, it was originally conceived more as a basis for common understanding and discussion between scientists than as a real guide for developers in identifying the methods that should be used [8]. A recent paper by Llinas et al. [7] suggests revisions and extensions of the model in order to cope with the issues and functions of today's applications.
In particular, further extensions of the JDL model are proposed, with an emphasis on four areas: (1) remarks on issues related to quality control, reliability, and consistency in data fusion (DF) processing; (2) assertions about the need for co-processing of abductive/inductive and deductive inferencing processes; (3) remarks about the need for and exploitation of an ontologically-based approach to DF process design; and (4) discussion of the role for Distributed Data Fusion (DDF).
3 Data Fusion and Surveillance
While in the past ambient security systems focused on the extensive use of arrays of single-type sensors [10,11,12,13], modern surveillance systems aim to combine information coming from different types of sources. Multi-modal systems [6,14], increasingly used in biometrics, and multi-sensor multi-cue approaches [15,16] fuse heterogeneous data in order to provide a more robust response and enhance situational awareness.
Fig. 2. Example of contextualization of the JDL scheme of Figure 1 to a surveillance scenario
The JDL model presented in Section 2 can be contextualized and fitted to a surveillance setting. In particular, we can imagine a typical surveillance scenario where multiple cameras monitor a wide area. A concrete example of how the levels of the JDL scheme can be reinterpreted is shown in Figure 2. In the proposed example, the levels correspond to specific video-surveillance tasks or patterns as follows:

– Level 0 - Sub-Object Data Assessment: the raw data streams coming from the cameras can be individually pre-processed. For example, they can be filtered to reduce noise, processed to increase contrast, or scaled down to reduce the processing time of subsequent elaborations.
– Level 1 - Object Assessment: multiple objects in the scene (typically pedestrians, vehicles, etc.) can be detected, tracked, classified and recognized.
The objects are the entities of the process, but no relationships are involved yet at this point. Additional data, such as the map or the sensitive areas, are a priori contextual information.
– Level 2 - Situation Assessment: spatial or temporal relationships between entities are drawn here: a target moving, for instance, from a sensitive Zone1 to another zone Zone2 can constitute an event. Simple atomic events are built from brief stand-alone actions, while more complex events are obtained by joining several simple events. Possible alarms to the operator are raised at this point.
– Level 3 - Impact Assessment: the prediction of an event is an example of what, in practice, may happen at this step. The estimation of the trajectory of a potential target, or the prediction of the behaviour of an entity, can be the topic of this level. For instance, knowing that an object crossed Zone1 heading to Zone2, we can presume it will also cross Zone3 according to its current trajectory.
– Level 4 - Process Refinement: after the prediction given by Level 3, several countermeasures can be taken in this phase, regarding all the previous levels. For instance, the sensors can be relocated to better monitor Zone3, new thresholds can be imposed in Level 0 procedures, or different algorithms can be employed in Level 1.
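The Zone1/Zone2/Zone3 reading of Levels 2 and 3 above can be sketched as follows; the function names and the zone-adjacency table are illustrative assumptions, not part of the JDL model.

```python
def atomic_event(track, src, dst):
    """Level 2 (Situation Assessment): a target moving from zone `src`
    to zone `dst` constitutes a simple atomic event."""
    return (src in track and dst in track
            and track.index(src) < track.index(dst))

def predict_next(track, transitions):
    """Level 3 (Impact Assessment): naive prediction of the next zone
    from the current trajectory, given known zone adjacencies."""
    return transitions.get(track[-1])

track = ["Zone1", "Zone2"]                       # trajectory observed at Level 1
adjacent = {"Zone1": "Zone2", "Zone2": "Zone3"}  # assumed zone layout

assert atomic_event(track, "Zone1", "Zone2")     # the Zone1 -> Zone2 event fired
assert predict_next(track, adjacent) == "Zone3"  # Level 3 presumes Zone3 is next
```

Level 4 would then act on the prediction, for example by retasking a sensor to cover Zone3.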
4 Sensor Management
The Process Refinement part dedicated to sensors and data sources is often called Sensor Management, and it can be defined as "a process that seeks to manage, or coordinate, the use of a set of sensors in a dynamic, uncertain environment, to improve the performance of the system" [17]. In other words, a Sensor Management process should be able, given the current state of affairs of the observed environment, to translate mission plans or human directives into sensing actions directed at acquiring needed additional or missing information, in order to improve situational awareness and fulfill the objectives. Sensor Management has been reasonably well studied in the Data Fusion community, but the focus of adaptive sensor control has largely been on improved estimation [18]. In the USA, a large DARPA program called Dynamic Tactical Targeting addressed Sensor Management at a high level, but again largely with a single objective in mind [19]. A five-layered procedure has been proposed in [17] and is reproduced in Figure 3. The chart schematizes a general sensor management process that can be used to guide the design of a real sensor management module. In the following, the different levels are contextualized for the case of a surveillance system.

4.1 Mission Planning
Fig. 3. Five-layered sensor managing process [17]

This level takes as input the current situation and the requests from the human operator, and performs a first breakdown of the objectives by trying to match them with the available services and functionalities of the system. In a surveillance system, the requests from the operator can be events of interest to be detected (e.g. a vehicle being stationary outside a parking slot) and alarm conditions (e.g. a person trespassing into a forbidden area). Each of the events should be given a priority by the operator. The Mission Planning module is in charge of selecting the functions to be used in order to detect the required events (e.g. target tracking, classification, plate reading, face recognition, trajectory analysis, etc.). This module should in fact work much like a compiler: starting from a description of the events of interest expressed in a high-level language, it parses the description and determines the relevant services to be employed. The module will also identify the areas to be monitored, the targets to look for, the frequency of measurements and the accuracy level.

4.2 Resource Deployment
This level identifies the sensors to be used among the available ones. If mobile and/or active sensors are available, their repositioning may be needed [20]. In particular, this level takes into consideration aspects such as coverage and sensing modality. For example, depending on the time of day at which a certain event is to be detected, one sensor may be preferred to another.

4.3 Resource Planning
This level is in charge of tasking the individual sensors (e.g. movement planning for active sensors [21,20]) and coordinating them (e.g. sensor hand-overs) in order to carry out a certain task (e.g. tracking). The level also deals with sensor selection techniques that can choose, for every instant and every target, the optimal subset of sensors for tracking or classifying it. Several approaches to sensor selection have been proposed in the literature, for example information-gain based [22] and detection-quality based [5].

4.4 Sensor Scheduling
Depending on the planning and requests coming from the Resource Planning level, this level is in charge of determining a detailed schedule of commands for each sensor. This is particularly appropriate for active (e.g. PTZ cameras), mobile (e.g. robots) and multi-mode (day/night cameras, multi-frequency radar) sensors. The problem of sensor scheduling has been addressed in [23], and a recent contribution on the scheduling of visual sensors can be found in [24].
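The scheduling idea can be illustrated with a toy greedy scheduler. This is a hypothetical sketch in Python, not the algorithms of [23] or [24]: pending sensing tasks carry operator-assigned priorities, and commands for a single PTZ camera are emitted highest-priority first until the scheduling window is full.

```python
import heapq

def schedule(tasks, horizon):
    """Greedy priority schedule for one sensor.

    tasks: list of (priority, duration, name); higher priority goes first.
    horizon: total time available in the scheduling window.
    Returns an ordered list of (start_time, name) commands.
    """
    # Max-heap on priority (negated, since heapq is a min-heap).
    heap = [(-p, d, n) for p, d, n in tasks]
    heapq.heapify(heap)
    t, plan = 0, []
    while heap and t < horizon:
        _, dur, name = heapq.heappop(heap)
        if t + dur <= horizon:       # skip tasks that no longer fit
            plan.append((t, name))
            t += dur
    return plan

# Hypothetical task names and durations for a single PTZ camera.
plan = schedule([(1, 5, "sweep area A"),
                 (3, 2, "zoom on target 7"),
                 (2, 4, "read plate")], horizon=10)
```

Here the low-priority sweep is dropped because it does not fit in the remaining window; a real scheduler would also handle preemption and re-insertion, as in the enhanced dynamic preemptive algorithm of [23].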
L. Snidaro, I. Visentini, and G.L. Foresti

4.5 Sensor Control
This is the lowest level and possibly also the simplest. Its purpose is to optimize sensor parameters given the current commands imposed by Levels 1 and 2. For video sensors this may involve regulating iris and focus to optimize image quality. Although this is performed automatically by the sensor hardware in most cases, it can be beneficial to manage sensor parameters directly, according to some figure of merit that depends on the content of the image. For example, contrast and focus may be adjusted specifically for a given target. An early treatment of the subject may be found in [25], while a recent survey may be found in [26].
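As a toy illustration of content-driven parameter control, one can pick the focus setting that maximizes a figure of merit computed from the image. This is a Python sketch under invented assumptions: the merit function below is a made-up stand-in, not a real image-sharpness measure.

```python
def contrast(focus, best=0.6):
    # Hypothetical figure of merit: peaks when focus equals `best`.
    # A real system would compute e.g. image contrast for the target region.
    return 1.0 / (1.0 + (focus - best) ** 2)

def autofocus(settings, merit):
    # Exhaustively pick the focus setting that maximizes the figure of merit.
    return max(settings, key=merit)

settings = [i / 10 for i in range(11)]   # candidate focus positions 0.0..1.0
best = autofocus(settings, contrast)
```

In practice the search would be incremental (hill climbing over the motor range) rather than exhaustive, but the principle of optimizing a content-dependent merit is the same.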
5 Conclusions
We discussed how the combination of heterogeneous data can lead to better situation awareness in a surveillance scenario. We also presented a possible new paradigm for sensor management that could be taken into account in the design of next-generation surveillance systems. In particular, sensor management should provide a principled way of exploiting sensory information in light of the contextual information available to the system. We believe that this new paradigm will drive the development of the next generation of surveillance systems. In this way, the new systems will include a form of awareness of their own capabilities and of how these relate and interact with the observed environment. This will in turn provide better system perception performance and, in the end, improved situational awareness.
References

1. Regazzoni, C.S., Visvanathan, R., Foresti, G.L.: Scanning the issue / technology - Special Issue on Video Communications, Processing and Understanding for Third Generation Surveillance Systems. Proceedings of the IEEE 89(10), 1355–1367 (2001)
2. Snidaro, L., Niu, R., Foresti, G., Varshney, P.: Quality-Based Fusion of Multiple Video Sensors for Video Surveillance. IEEE Trans. Syst. Man, Cybern. B 37(4), 1044–1051 (2007)
3. Aghajan, H., Cavallaro, A. (eds.): Multi-Camera Networks. Elsevier, Amsterdam (2009)
4. Liggins, M.E., Hall, D.L., Llinas, J.: Multisensor Data Fusion: Theory and Practice. The Electrical Engineering & Applied Signal Processing Series, vol. 2. CRC Press, Boca Raton (2008)
5. Snidaro, L., Visentini, I., Foresti, G.: Quality based multi-sensor fusion for object detection in video-surveillance. In: Intelligent Video Surveillance: Systems and Technology, pp. 363–388. CRC Press, Boca Raton (2009)
6. Ross, A., Jain, A.: Multimodal biometrics: An overview. In: Proc. XII European Signal Processing Conf., pp. 1221–1224 (2004)
7. Llinas, J., Bowman, C.L., Rogova, G.L., Steinberg, A.N., Waltz, E.L., White, F.E.: Revisiting the JDL data fusion model II. In: Svensson, P., Schubert, J. (eds.) Proceedings of the Seventh International Conference on Information Fusion, Stockholm, Sweden, International Society of Information Fusion, June 2004, vol. II, pp. 1218–1230 (2004)
8. Hall, D.L., Llinas, J.: An introduction to multisensor data fusion. Proceedings of the IEEE 85(1), 6–23 (1997)
9. Steinberg, A.N., Bowman, C.: Revisions to the JDL data fusion process model. In: Proceedings of the 1999 National Symposium on Sensor Data Fusion (May 1999)
10. Javed, O., Rasheed, Z., Shafique, K., Shah, M.: Tracking across multiple cameras with disjoint views. In: Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV), vol. 2, pp. 952–957 (2003)
11. Valin, J.M., Michaud, F., Rouat, J.: Robust localization and tracking of simultaneous moving sound sources using beamforming and particle filtering. Robot. Auton. Syst. 55(3), 216–228 (2007)
12. Kang, J., Cohen, I., Medioni, G.: Multi-views tracking within and across uncalibrated camera streams. In: IWVS 2003: First ACM SIGMM International Workshop on Video Surveillance, pp. 21–33. ACM, New York (2003)
13. Monekosso, D., Remagnino, P.: Monitoring behavior with an array of sensors. Computational Intelligence 23(4), 420–438 (2007)
14. Jain, A., Hong, L., Kulkarni, Y.: A multimodal biometric system using fingerprints, face and speech. In: 2nd International Conference on Audio- and Video-based Biometric Person Authentication, pp. 182–187 (1999)
15. Liu, H., Yu, Z., Zha, H., Zou, Y., Zhang, L.: Robust human tracking based on multi-cue integration and mean-shift. Pattern Recognition Letters 30(9), 827–837 (2009)
16. Gavrila, D.M., Munder, S.: Multi-cue pedestrian detection and tracking from a moving vehicle. Int. J. Comput. Vision 73(1), 41–59 (2007)
17. Xiong, N., Svensson, P.: Multi-sensor management for information fusion: issues and approaches. Information Fusion 3(2), 163–186 (2002)
18. Hero, A.O., Castan, D.A., Cochran, D., Kastella, K.: Foundations and Applications of Sensor Management. Springer, Heidelberg (2008)
19. Hanselman, P.B., Lawrence, C., Fortunato, E., Tenney, R.R., Blasch, E.P.: Dynamic tactical targeting. In: Suresh, R. (ed.) Battlespace Digitization and Network-Centric Systems IV, July 2004. Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, vol. 5441, pp. 36–47 (2004)
20. Mittal, A., Davis, L.: A general method for sensor planning in multi-sensor systems: Extension to random occlusion. International Journal of Computer Vision 76(1), 31–52 (2008)
21. Denzler, J., Brown, C.: Information theoretic sensor data selection for active object recognition and state estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(2), 145–157 (2002)
22. Kreucher, C., Kastella, K., Hero III, A.: Sensor management using an active sensing approach. Signal Processing 85(3), 607–624 (2005)
23. McIntyre, G., Hintz, K.: Sensor measurement scheduling: an enhanced dynamic, preemptive algorithm. Optical Engineering 37, 517 (1998)
24. Qureshi, F., Terzopoulos, D.: Surveillance camera scheduling: A virtual vision approach. Multimedia Systems 12(3), 269–283 (2006)
25. Tarabanis, K., Allen, P., Tsai, R.: A survey of sensor planning in computer vision. IEEE Transactions on Robotics and Automation 11(1), 86–104 (1995)
26. Abidi, B.R., Aragam, N.R., Yao, Y., Abidi, M.A.: Survey and analysis of multimodal sensor planning and integration for wide area surveillance. ACM Computing Surveys 41(1), 1–36 (2008)
A Simulation Framework for UAV Sensor Fusion

Enrique Martí, Jesús García, and Jose Manuel Molina

Group of Applied Artificial Intelligence, Universidad Carlos III de Madrid, Av. de la Universidad Carlos III, 22, 28270 Colmenarejo, Madrid (Spain)
[email protected], [email protected], [email protected]
Abstract. The design of complex fusion systems requires experimental analysis, following the classical structure of experiment design, data acquisition, experiment execution and analysis of the obtained results. We present here a framework with simulation capabilities for sensor fusion in aerial vehicles. Thanks to its abstraction level, it only requires a few high-level properties to define a whole experiment. Its modular design offers flexibility and makes it easy to extend its functionality. Finally, it includes a set of tools for fast development and more accurate analysis of the experimental results.

Keywords: sensor fusion, simulation framework, unmanned air vehicle.
1 Introduction

The research of fusion solutions to real-world complex problems is a time-costly process, plagued by ancillary tasks that demand great effort. The market offers powerful tools for accelerating some of the more generic parts, such as data analysis or visualization. Nonetheless, as we focus on a narrower and more specialized field, it is common to find that one has to perform the expensive task of implementing one's own tools. Having an effective piece of software really makes a difference. Apart from the time saving, a good toolbox for data representation and visualization can mean detecting an otherwise ignored problem, or knowing how to improve the analyzed algorithms. This paper presents a generic framework for experimentation on unmanned air vehicle (UAV) sensor fusion. Bearing in mind the way in which such a tool is used, the whole system has been implemented in MATLAB™ to make it flexible and easily modifiable, as well as to speed up data visualization [1]. For illustrative purposes, this document shows its application to the multisensor navigation subsystem of a vehicle performing maneuvers related to air traffic management (ATM) [2]. Nonetheless, it can manage any other type of flight trajectory and even other kinds of vehicles (such as maritime or terrestrial ones). The structure of the simulator will be reviewed, with special attention to the design and functioning details of each module. We will also present our simulation and experimentation methodology, illustrating the process with some of the figures generated for results analysis and validation.

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 460–467, 2010. © Springer-Verlag Berlin Heidelberg 2010
2 Simulator Architecture
The simulator is composed of three modules for data generation (see Fig. 1), plus an additional module for fusion algorithms and another one for performance evaluation. Data generation begins by creating the specification of a vehicle trajectory. The resulting data is then fed to the aerodynamic model, generating the flight simulation. That process results in a set of values related to the dynamics of the UAV (such as position, attitude or accelerations) that can be easily used to synthesize realistic sensor measurements. The fusion module takes the outputs of the selected sensors and processes them sequentially using the desired technique from the available library of implemented algorithms: (Extended) Kalman Filters, Particle Filters, etc.
Fig. 1. Framework schematic view. Specification for a desired trajectory (1), sensor measurement models (2), fusion process (3), performance evaluation (4).
2.1 Trajectory Generation

This module is composed of several scripts and functions, focused on generating the input for the aerodynamic simulation from a high-level specification of the trajectory. The framework is provided with a set of functions for simple maneuvers, such as straight flight at constant speed or with longitudinal accelerations/decelerations, or turns around a single axis of the vehicle's local coordinate system. These basic pieces can then be combined and concatenated to generate more complex trajectories. In the case of ATM trajectories, we have created scripts for typical scenarios such as the
racetrack, performed during the waiting time before the landing of an aircraft in order to fit the scheduled time. Racetracks have the shape of a hippodrome: a rectangle with two semicircles attached to its shorter sides (Fig. 2).
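The composition of basic maneuvers can be sketched as follows. This is illustrative Python rather than the framework's MATLAB code, and the force values and durations are placeholders, not aerodynamically meaningful numbers.

```python
def straight(duration, dt):
    """Straight segment at constant speed: zero net force at every step."""
    return [(0.0, 0.0, 0.0)] * round(duration / dt)

def turn(duration, dt, lateral_force):
    """Turn segment approximated by a constant lateral force."""
    return [(0.0, lateral_force, 0.0)] * round(duration / dt)

def concat(*segments):
    """Concatenate basic maneuvers into a complex trajectory specification."""
    out = []
    for seg in segments:
        out.extend(seg)
    return out

# A toy racetrack: two straights joined by two half-turns.
dt = 0.1
traj = concat(straight(10, dt), turn(5, dt, 2.0),
              straight(10, dt), turn(5, dt, 2.0))
```

The output of each basic function is a per-step sequence of body-frame forces; concatenation simply chains them in time, mirroring how the framework builds complex trajectories from simple pieces.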
Fig. 2. 3D view of simulated racetrack+landing trajectory
The output of the system consists of six arrays of instant–value pairs. The first three arrays contain the forces in the body-fixed frame of reference of the vehicle, which determine the translation. The remaining three are the moments in the same frame, which determine the vehicle rotations.

2.2 Aerodynamic Simulation

The following step is the simulation of the vehicle dynamics. The selected model in our case has been, for the sake of simplicity, a rigid body with six degrees of freedom (6DoF from now on). Our reference implementation uses the MATLAB™ Aerosim Aeronautical Simulation Block Set [3][4], which provides a complete set of tools for rapid development of detailed 6DoF nonlinear generic aerial vehicle models, as well as a graphical view to check the behavior of the system under test. This generic motion model can be substituted by a more detailed scheme. The only requirement for the replacement is to generate all the real data needed to synthesize the measurements in the next phase. In our example we store the position, speed, attitude (in quaternion and Euler angles), accelerations and angular rates of the body. This information is enough for simulating all the common sensors. As shown in Fig. 3, the 6DoF dynamic model integrates both the ideal segments composing the simulated trajectory and the simulated noisy sensor data (in its lower part). This is especially useful for online simulations, but is not recommended for experiments, because the result will be sensitive to the particular generation of random noise injected in the data. The "real" data of the flight is stored in separate files for later use. This is useful for creating persistent datasets and for saving computation time, because the simulation of a flight is usually a costly process involving complex numerical operations. Even in the case of a standard 6DoF system, some differential equations must be solved at each time step, and the typical time step resolution is on the order of a few milliseconds.
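To make the per-step cost concrete, here is a minimal explicit-Euler step for the translational part of a rigid body only. This is an illustrative Python sketch, not the Aerosim model; the mass, forces and step count are made-up numbers.

```python
def euler_step(state, force, mass, dt):
    """One explicit-Euler step of the translational dynamics.

    state: (position, velocity) as 3-vectors (tuples), body forces in newtons.
    """
    pos, vel = state
    acc = tuple(f / mass for f in force)                        # a = F / m
    new_vel = tuple(v + a * dt for v, a in zip(vel, acc))       # v += a*dt
    new_pos = tuple(p + v * dt for p, v in zip(pos, vel))       # p += v*dt
    return (new_pos, new_vel)

# 1 s of level flight at 50 m/s, integrated at a 1 ms time step.
state = ((0.0, 0.0, 100.0), (50.0, 0.0, 0.0))
for _ in range(1000):
    state = euler_step(state, force=(0.0, 0.0, 0.0), mass=750.0, dt=0.001)
```

Even this stripped-down step runs a thousand times per simulated second; a full 6DoF model with rotational dynamics, quaternion attitude and aerodynamic coefficients is correspondingly more expensive, which is why the flight data is computed once and stored.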
Fig. 3. Simulation of ideal trajectory and IMU measures with the MATLAB Aerospace blockset
2.3 Generating Realistic Sensor Data

It is commonly known that sensors do not provide perfect information, because they suffer from different effects such as inappropriate calibration, time drifts or interferences from external entities. Instead, their measures of real magnitudes are usually altered with added random noise and systematic effects such as biases. Sensor measures can be generated from "ideal" data quite straightforwardly, because all the aforementioned effects can be subsequently incorporated through simple operations, for instance addition or matrix multiplication. We can find an example in Fig. 4, where the ideal flight altitude (the ascending line of crosses in the bottom part) is taken as the starting point for simulating the measure of a barometric altimeter. The process consists of three consecutive steps:

• Simulate noise, by perturbing each value with a random sample drawn from a Gaussian distribution. The result is shown as hollow circles, also in the bottom part.
• Add a constant value to mimic the altimeter bias. The bias is an effect mainly caused by differences between the sea-level atmospheric pressure assumed by the device and its real value. The hollow triangles in the upper part are the result.
• The last effect to be simulated is the quantization step. Altimeters based on a barometer do not provide continuous measures: output values are quantized (here, the step is 50 meters). The final measures are the solid squares.

Thus, the reason for separating flight simulation from sensor measurement synthesis is quite simple: it simplifies the production of several generations of measurements for the same trajectory, and swapping among different sensor models.
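The three steps above can be sketched in a few lines. This is an illustrative Python version of the barometric-altimeter model; the noise sigma, bias and seed are invented values, while the 50 m quantization step matches the example in the text.

```python
import random

def altimeter(ideal_alt, sigma=5.0, bias=30.0, step=50.0, rng=random):
    """Synthesize a barometric-altimeter reading from the ideal altitude."""
    noisy = ideal_alt + rng.gauss(0.0, sigma)   # step 1: additive Gaussian noise
    biased = noisy + bias                        # step 2: constant bias
    return step * round(biased / step)           # step 3: quantization to `step` m

rng = random.Random(42)                          # stored seed for reproducibility
measures = [altimeter(h, rng=rng) for h in (100.0, 120.0, 140.0)]
```

Every reading is, by construction, a multiple of the quantization step, and re-running with the same stored seed reproduces the exact same measures.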
Fig. 4. Example of measurement generation using the model of a barometric altimeter. The starting point is the original ideal flight altitude.
The implemented framework applies the noise model of each sensor to every available sample of unaltered data, disregarding its temporal resolution. The produced values are individually marked with a timestamp, resulting in a very dense (and unrealistic) set of measures. This allows later selection and tuning of the sensor update ratio, and the emulation of more advanced effects such as measurement loss. Each sensor model is implemented as a separate function. It receives the ideal flight data and the parameters of the measurement model, and returns the simulated values.
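The dense-sampling idea can be sketched like this. It is illustrative Python; the 1 kHz dense rate, the 1 Hz GPS-like period and the loss probability parameter are invented for the example.

```python
import random

def subsample(samples, period, loss_prob=0.0, rng=random):
    """Pick measures from a dense (timestamp, value) list at a given update
    period, optionally dropping some to emulate measurement loss."""
    out, next_t = [], samples[0][0]
    for t, v in samples:
        if t >= next_t:
            next_t = t + period
            if rng.random() >= loss_prob:    # survives the loss model
                out.append((t, v))
    return out

dense = [(i * 0.001, i) for i in range(5000)]   # 5 s of 1 kHz dense samples
gps = subsample(dense, period=1.0)              # 1 Hz GPS-like stream
```

Because the dense set carries a timestamp on every value, the same data can be re-subsampled at any rate, which is exactly why the framework keeps it despite being unrealistic on its own.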
3 Sensor Fusion

After generating all the necessary data, experiments can be performed. The first step consists of defining the set of available sensors and their features (including noise model and update ratio), the fusion architecture and the concrete algorithms to be used. Once the architecture is defined [5], the great majority of the experiments can be run using a fixed scheme. All our experiments have been implemented over a few script templates. The next step is to configure the fusion algorithms. One of the available tracking techniques has to be selected and configured to make use of the selected inputs. The result at each time step is registered together with its timestamp. The integration of GPS with inertial sensors attracts a considerable amount of research attention [6]. Plenty of classical and advanced techniques to increase robustness have been applied, such as unscented Kalman filters, particle filters or soft computing paradigms [7][8][9]. Many algorithms are available for direct integration in MATLAB. At this moment, our framework includes the Kalman Filter, Extended Kalman Filter, Unscented Kalman Filter and Particle Filter [10][11]. As the references show, we have adapted existing libraries to work with our framework, so that including them in the code takes just a few lines.
Once the whole trajectory has been filtered, we obtain a different interpretation of the flight trajectory. It can be directly compared with both the real data and the sensor measurements, because the three sets provide values for the same sequence of time instants. Back to our illustrative example, we have performed several experiments of interest, as shown in the next subsections. A centralized processing architecture is the selected option for all the single-vehicle problems, given the coupling requirements of estimating sensor corrections together with trajectory parameters: a single algorithm of loosely coupled type will track the whole UAV state using the information from all the available sensors. Typically, it is interesting to generate the measures in the same script where the fusion is performed, because it also allows experimenting with sensor features, such as the noise models to be used. One of the problems of dealing with noisy data is that a certain scenario can be particularly favorable (or unfavorable) for the applied algorithm, leading to non-representative results. To avoid this, the whole trajectory is not filtered just once per experiment. We follow a Monte Carlo approach instead. This means that for the same trajectory, several sets of measures will be generated. The noise used in the synthesis of each set of measures is, by definition, random and different on each generation, so each final set will be unique. The random number generator seed can be stored to assure experiment reproducibility. With the simulated data sets, the Monte Carlo methodology allows different experiments to be run in order to perform rigorous statistical analysis on the output (root mean squared errors, t-tests for performance comparison, integrity analysis, etc.).
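The seeded Monte Carlo loop can be sketched as follows. This is illustrative Python: the filter is replaced by a trivial placeholder, and the base seed, noise sigma and run count are made-up parameters.

```python
import math
import random

def run_monte_carlo(true_traj, n_runs, base_seed=1234, sigma=5.0):
    """One noise realization per run, each seeded for reproducibility.

    Returns the per-run RMSE list and the stored seeds."""
    rmses, seeds = [], []
    for k in range(n_runs):
        seed = base_seed + k
        rng = random.Random(seed)
        # A real experiment would synthesize measures and run the chosen
        # filter here; as a placeholder, the "estimate" is truth plus noise.
        est = [x + rng.gauss(0.0, sigma) for x in true_traj]
        rmse = math.sqrt(sum((e - x) ** 2
                             for e, x in zip(est, true_traj)) / len(true_traj))
        rmses.append(rmse)
        seeds.append(seed)
    return rmses, seeds

rmses, seeds = run_monte_carlo([float(i) for i in range(100)], n_runs=20)
```

Because the seed of every run is stored, any single realization (for instance an outlier run found during t-test analysis) can be regenerated exactly for closer inspection.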
4 Results Analysis and Validation

Evaluating the performance of a solution can be a complex task. In order to facilitate it, our framework provides tools for supervising the process while it is executed, and for analyzing the results once the trajectory has been filtered. As an example of a tool of the first category, Fig. 5 shows a 3D plot obtained for a Particle Filter (PF) during the tracking of a trajectory using a GPS, an accelerometer and a gyroscope as sensors. Each particle (the faded cloud on the right of the figure) is drawn as an arrow, with color intensity and size directly proportional to the weight of the particle. Note how only a few particles at the bottom are considered important after the last GPS measure is received (just over the X axis, near the -3590 meters mark), while the vast majority of the population is represented in a very light pale tone. The estimation of the filter is the wide arrow with triangles at its extremes. Intermediate figures of this kind have multiple uses, such as visualizing each step in high detail to diagnose the causes of a previously detected problem. For instance, we can supervise whether the resampling stage is introducing enough variability in the population of particles. The overall performance of a certain solution, however, can only be evaluated after the experiment is finished. MATLAB™ makes the calculation of different statistics and the plotting of the desired variables very easy. The real challenge is to select the appropriate quality indicators. The next figures are some examples obtained using our framework.
Fig. 5. Auxiliary plot to help visualize the current system state during the fusion process
Fig. 6. Comparison between the position estimation accuracy of a filtering algorithm and the raw GPS error. Gyroscope output gives context about turns and straight segments.

Fig. 7. Gyroscope bias estimation for an EKF-based solution of GPS+IMU fusion

Fig. 8. Unstable gyroscope bias estimation for an EKF-based solution of GPS+IMU fusion
5 Conclusions

A framework for experimenting on sensor fusion has been presented in this document. Apart from detailing its structure, we have shown how it can be used for creating a
whole experiment, starting with the generation of a simulated flight trajectory and ending with graphical descriptions of the results. Using this software, we have reduced the implementation time of an already designed experiment to just a few minutes. Evaluating how a change in a variable affects the result is trivial, as is changing the value of any set of configuration parameters. If required, the functionality can be extended by adding new algorithms, measurement models or trajectory definitions.
Acknowledgements

This work was supported in part by Projects ATLANTIDA, CICYT TIN2008-06742-C02-02/TSI, CICYT TEC2008-06732-C02-02/TEC, SINPROB, CAM CONTEXTS S2009/TIC-1485 and DPS2008-07029-C02-02.
References

[1] Gade, K.: NAVLAB, a Generic Simulation and Post-processing Tool for Navigation. European Journal of Navigation 2(4), 51–59 (2004)
[2] Rodriguez, A.L., et al.: Real time sensor acquisition platform for experimental UAV research. In: IEEE/AIAA 28th DASC 2009, pp. 5.C.5-1–5.C.5-10 (October 2009)
[3] Aerospace Toolbox - MATLAB. The MathWorks, http://www.mathworks.com/products/aerotb/ (Cited: 03 15, 2010)
[4] Kurnaz, S., Cetin, O., Kaynak, O.: Fuzzy Logic Based Approach to Design of Flight Control and Navigation Tasks for Autonomous Unmanned Aerial Vehicles. Journal of Intelligent and Robotic Systems 54(1-3), 229–244 (2009)
[5] García, J., et al.: Data fusion architectures for autonomous vehicles using heterogeneous sensors. In: 1st ESA NAVITEC, Noordwijk, Holland (December 2006)
[6] Wagner, J.F., Wieneke, T.: Integrating satellite and inertial navigation - conventional and new fusion approaches. Control Engineering Practice 11(5), 543–550 (2003)
[7] van der Merwe, R., Wan, E., Julier, S.: Sigma Point Kalman Filters for Nonlinear Estimation and Sensor Fusion: Applications to Integrated Navigation. In: AIAA Guidance, Navigation and Control Conference, Providence, USA (August 2004)
[8] Crassidis, J.: Sigma-Point Kalman Filtering for Integrated GPS and Inertial Navigation. IEEE Trans. on AES 42(2) (April 2006)
[9] Chiang, K.W., Huang, Y.W.: An intelligent navigator for seamless INS/GPS integrated land vehicle navigation applications. Applied Soft Computing 8(1), 722–733 (2008)
[10] Hartikainen, J., Sarkka, S.: Optimal filtering with Kalman filters and smoothers - a Manual for Matlab toolbox EKF/UKF (2007), http://www.lce.hut.fi/research/mm/ekfukf/
[11] Chen, L., et al.: PFLib - An Object Oriented MATLAB Toolbox for Particle Filtering. Department of Statistics, Colorado State University (2007), http://www.stat.colostate.edu/~chihoon/paper-6567-25-revised.pdf (Cited: 03 14, 2010)
An Embeddable Fusion Framework to Manage Context Information in Mobile Devices

Ana M. Bernardos, Eva Madrazo, and José R. Casar

Telecommunications Engineering School, Technical University of Madrid, Av. Complutense 30, 28040, Madrid, Spain
{abernardos,eva.madrazo,jramon}@grpss.ssr.upm.es
Abstract. Conveniently fused and combined with data from external sources, information from sensors embedded in a mobile device may offer a dynamic view of the user's situation, sufficient to build adaptive context-aware services. In order to shorten the development cycle of these applications, an embeddable framework to acquire, fuse and reason on context information is hereby described. 'CASanDRA Mobile' is designed to work autonomously in resource-constrained devices, offering application developers transparent management of context information. Based on a service-oriented architecture implemented in mobile OSGi, it offers a scalable infrastructure of bundles which decouples context acquisition and automated context inference from application development. 'CASanDRA Mobile' aims at providing the user with full control over his private context data, by using privacy policies suitable for handling P2P context sharing. To exemplify how to use the framework features, the design procedure for a context-aware wellness application is described.

Keywords: Context-aware system; data fusion; mobile reasoning; activity recognition; mobile framework.
1 Introduction

Since 1994, when Schilit and Theimer coined the term 'context-awareness' [1], a good number of architectures to handle context information have been described in the literature [2]. Most of these proposals, which aim at decoupling the process of data acquisition from application development, base their performance on the existence of a centralized infrastructure-based module capable of fusing data to extract context information. Mobile devices are increasingly equipped with better sensing, processing and storage capabilities, so it is possible to design a light infrastructure-independent middleware to infer context information. Light context-aware systems enabling distributed peer-to-peer context sharing provide an important feature for implementing the 'Internet of Things' concept, by which different types of objects and devices are able to opportunistically communicate among themselves. It is important to note that this autonomous approach to mobile computing is not opposed to the 'cloud computing' trend, but complements it (e.g. by exploiting short-distance data connections and the device's off-line autonomous capabilities).

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 468–477, 2010. © Springer-Verlag Berlin Heidelberg 2010
In the following we describe our embeddable framework to provide Context Acquisition Services anD Reasoning Algorithms, from now on called 'CASanDRA Mobile'. This middleware is the result of a learning process that started with the design and development of our infrastructure-based system for context-awareness, CASanDRA [3], which implements the fusion architecture described in [4]. Of course, the light version of the system presents different challenges, but it is capable of offering a significant collection of the features already considered in CASanDRA, while including domain-specific ones. CASanDRA Mobile bundles a set of standard services, conceptually layered (from data acquisition to high-level fusion), but functionally sharing the same communication and inference resources. The developer can use context information at different levels of abstraction, as the middleware provides APIs to access all the available data. The middleware, developed using an implementation of a Service Oriented Architecture [5], is prepared to handle P2P context sharing and, in the future, it will provide Quality of Context management in order to auto-regulate its performance. Section II reviews previous approaches to mobile computing middleware. Section III lists CASanDRA Mobile's standard functionalities and justifies our choice of a SOA-based implementation. Section IV describes the middleware itself and Section V explains a practical implementation of a context-aware application using CASanDRA Mobile's activity inference capabilities. Finally, Section VI considers open issues which need to be addressed in further developments.
2 A Review of Light Architectures for Context Management

Numerous context-aware frameworks have been described in the literature during the last decade; some of them are autonomous embeddable systems, designed to be installed in mobile resource-constrained devices. What follows is a short chronological review of some of these proposals; their features, architectural approaches and learned lessons have inspired the current design of CASanDRA Mobile. Released in 2003 (when developments such as LIME, XMIDDLE or Mobiware had already addressed general aspects of middleware development for mobile computing), MobiPADS [6] appears as one of the first designs considering context-awareness in mobile middleware. The platform enables active service deployment and dynamic service composition depending on the context, in order to optimize, in terms of resources, the overall operation of mobile applications. MobiPADS is composed of two parts: the system components, providing essential management services for deployment and configuration, and the service space, where a series of mobilets (MobiPADS' services) can be chained to provide the applications with aggregated functionalities. Mobilets access the components through the mobilet API, which also has interfaces to allow communication and configuration of the system's components. MobiPADS software claims to be reflective. Reflection [7] as a design paradigm is also considered to build CARISMA [8]. This middleware offers customized services to applications by using policies, which depend on the context configuration. The middleware provides applications with an API (meta-interface) to inspect and alter the middleware behavior, as encoded in application profiles. CORTEX [9] focuses on the configuration of a distributed network for context management. The system is composed of sentient objects, independent software components capable of acquiring context data and performing inference.
Information sharing among neighbors and dynamic resource discovery are two of its important features. CASanDRA Mobile also bases its architecture on the composition of dynamic software
A.M. Bernardos, E. Madrazo, and J.R. Casar
structures, adopting the 'reflection' paradigm: the middleware is able to manage and change its internal composition to provide the applications on top of it with the information they need, while minimizing its memory footprint and resource consumption.

The ReMMoC proposal [10] addresses the platform heterogeneity problem by proposing a web services-based reflective middleware that allows mobile clients to be developed independently of both discovery and interaction mechanisms. In the same direction, the Obje infrastructure [11] uses abstract models common to every object/device in the network, which expose information about how a device can connect to another, provide metadata about itself, be controlled or provide references to other devices. CASanDRA Mobile extends these concepts to its components, which can be automatically discovered in the system. The middleware can then dynamically create and manage the life cycle of customized aggregation services, ready to perform unplanned tasks for context-aware applications.

The ContextPhone [12] is a prototyping platform running on mobile phones using Symbian OS. It consists of four interconnected modules: Sensors, Communications, Customizable applications (which can seamlessly augment or replace built-in applications) and System Services. The architecture mirrors the widget approach of the well-known Context Toolkit, using a publish-subscribe model for its components. It is possible to add new data types for context extension at compile time.

Citron [13] presents an alternative for internal data sharing. The framework, conceived to be fully operative in the mobile device, uses a blackboard approach to handle information tuples gathered from 'workers' (components which handle access to sensors). CASanDRA Mobile opts for the publish-subscribe model, implemented by using the standard features of a SOA platform.
MADAM [14] employs a component framework to design applications that can be adapted by reconfiguration; an application is assembled from a recursive structure of component frameworks. The middleware is based on extended goal policies expressed as utility functions, leaving the system to reason on the actions required to implement those policies. MADAM needs a description of the application structure, the application's variability and distribution aspects, the properties of each variant and the utility functions for comparing variants.

MARKS [15] addresses context management in ad-hoc networks. MARKS' authors claim to incorporate some unexplored attributes – such as 'knowledge usability', 'resource discovery' and 'self-healing' – into a pervasive middleware for mobile devices, in order to optimize the use of physical resources while also ensuring security and privacy. MARKS is composed of core components and services: the former include the object request broker, resource discovery, trust management and universal service access unit.

CASanDRA Mobile brings concepts such as 'reflection', 'resource discovery', 'service chain', 'privacy policies' and 'event-based architectures' together. Additionally, it aims at managing Quality of Context by using bottom-up probabilistic methods. As far as we know, it is the first attempt to benefit from a Service Oriented Architecture [5] based on mobile OSGi (mOSGi), which is highly modular and dynamically configurable at run time. mOSGi enables the design of a middleware capable of handling asynchronous (event-based) context communications and supporting off-line operation.
An Embeddable Fusion Framework to Manage Context Information
3 CASanDRA Mobile: Functionalities and Approach

CASanDRA Mobile aims at offering a set of standard off-the-shelf features for the development of context-aware applications, in order to accelerate the application design and development life cycle. The framework, designed to be easily and modularly scalable over time, is component-based and infrastructure-independent. It is ready to maximize its functionality even when no data connection with external infrastructures is available. For this reason, the middleware needs to offer an off-line functional alternative to services which may need infrastructure support (e.g. indoor positioning systems). CASanDRA Mobile can be used by both native and in-the-cloud applications; it is the developer who determines which kind of context information the application needs to handle.

The middleware includes P2P capabilities, mainly intended to enable context-information sharing among different devices equipped with CASanDRA Mobile. Context data will be labeled according to privacy policies controlled by the user; as a consequence, the P2P sharing functionality will handle and expose context data taking these privacy restrictions into account.

The framework will have internal procedures for handling the quality of the context it manages, in order to optimize the acquisition and processing procedures while considering the computational and resource costs. The applications will be aware of the quality of the information they receive, being able to adapt their behavior depending on accuracy, up-to-dateness and other QoC features. This implies that uncertainty needs to be controlled throughout the context composition life cycle in a coordinated and reliable way.

The framework will deliver access-to-sensor and fusion features, but also reasoning tools for the automatic processing of complex context information. This means that an inference engine will be available inside the framework.
Initially, this inference engine will take the form of a rule-based engine, but the final objective for CASanDRA Mobile is to expand its capabilities and manage more elaborate data models (e.g. light ontologies). CASanDRA Mobile aims at adapting its components' behavior at run time, starting, stopping and hibernating components when needed. Real-time scalability (dynamic discovery of context sources) and easy maintenance are also key features. This means that, for example, hot installation and remote update of components (done without restarting the framework) will be possible. Finally, components are to be loosely coupled, allowing modular and independent software development.

Considering these requirements, we have opted to build the architecture of CASanDRA Mobile on a SOA implementation (mOSGi). Service Oriented Architectures handle 'services' as software units and use them to implement the key concepts of 'visibility', 'interaction' and 'effect'. As defined in the standard [5], a 'service' must be able to perform work for another party, specify the work it offers, and offer to perform that work. For this reason, services need interfaces through which they can be externally invoked, and they must publish their functionality so that applications can use it. In SOA architectures, modularization improves the reusability of software components and makes parallel development and testing
simpler (each service may be independently developed and mocked up when not available). With respect to its core functionalities, mobile OSGi enables automatic service registration and component management, and allows hot deployment of new services, requiring no stop-start procedure when the service offering is updated.
4 Description of CASanDRA Mobile Middleware

4.1 Introduction to CASanDRA Mobile's Design

The architecture of CASanDRA Mobile is composed of three building blocks: the Acquisition Layer, the Context Inference Layer and the Core System. The Acquisition Layer decouples the access to embedded and external sensors from upper processing levels by using software 'Sensors', which deal with low-level hardware information retrieval. The Context Inference Layer gathers a number of 'Enablers', modules that process data coming from 'Sensors', fuse them, and infer complex context parameters. Finally, the Core System provides several features to integrate these components in the middleware, such as discovery and registry management of new elements and some common utility libraries. Both 'Sensors' and 'Enablers' publish their output data in the middleware through an event manager. The main difference between these two types of components is that the former act only as data providers and are not ready to consume data from other components. Applications run on top of the CASanDRA Mobile middleware, consuming context information provided by Enablers and Sensors and using its standard features.

Regarding information retrieval, CASanDRA Core offers a set of APIs to handle 'subscription-based' and 'on-demand' information queries. In the first case, a component (or an application) can subscribe to a set of context parameters: this method makes it possible to receive periodic updates for the selected context data via asynchronous communications (events). Consumer elements are able to configure context patterns in order to combine and filter notification events.

CASanDRA Mobile middleware is implemented in Java; it runs on a mobile OSGi platform based on J9 inside a Windows Mobile (WM) device. OSGi is a Dynamic Module System for Java, handling modules called 'bundles': cohesive, self-contained units which explicitly define their dependencies on other modules/services and their external service API.
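As a rough illustration of the two retrieval styles described above, the sketch below models context measures as aggregated name-value tuples and a manager that serves both subscription-based (event) queries and on-demand queries. All names (`Measure`, `ContextManager`, `subscribe`, `publish`, `on_demand`) are hypothetical illustrations, not CASanDRA's actual Java API.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Measure:
    """A context measure: a named set of name-value pairs plus a timestamp."""
    name: str
    values: dict
    timestamp: float = field(default_factory=time.time)

class ContextManager:
    def __init__(self):
        self._store = {}        # latest measure per name (serves on-demand queries)
        self._subscribers = {}  # measure name -> callbacks (serves async events)

    def subscribe(self, name, callback):
        self._subscribers.setdefault(name, []).append(callback)

    def publish(self, measure):
        # store for on-demand access, then notify subscribers asynchronously
        self._store[measure.name] = measure
        for cb in self._subscribers.get(measure.name, []):
            cb(measure)

    def on_demand(self, name):
        return self._store.get(name)

# The three-axis acceleration values compose one 'measure':
cm = ContextManager()
received = []
cm.subscribe("acceleration", received.append)
cm.publish(Measure("acceleration", {"x": 0.1, "y": 9.8, "z": 0.2}))
print(cm.on_demand("acceleration").values["y"])  # 9.8
```

Aggregating values into measure objects, as in the text, keeps storage in an object database straightforward: each `Measure` maps naturally to one stored object.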
OSGi improves encapsulation and reusability, simplifies the implementation of a modular system, and provides a very useful set of standardized optional services, including the logging service and the eventAdmin service to manage events. Additionally, several implementations of mOSGi frameworks are available for mobile operating systems such as Symbian or Android, so the CASanDRA Mobile concept can be adapted to other types of devices beyond WM.

4.2 CASanDRA Mobile's Components

Fig. 1 shows the general architecture of CASanDRA Mobile and its APIs.
[Figure: the Core System (Context Manager, Subscription Manager, Component Manager, Communications Manager with BT, COM and HTTP interfaces, Privacy Manager, Registry, BBDD/History, Logging and Discovery modules), the Acquisition Layer (Sensors such as Accel. and GPS), and the Context Inference Layer (Context Enablers, Inference Engine, P2P Context Sharing and Performance Manager), connected through APIs and sensor/context events.]

Fig. 1. Software components
CASanDRA Core System. CASanDRA Mobile's main modules are encapsulated together in the Core System bundle, which controls the components' integration and life cycle. In brief, the Core System is composed of:

1. The Context Manager, the main module in the Core bundle, stores and manages the publication of context parameters. It controls when a subscription is created or removed and accordingly asks the Component Manager to start or stop the needed components. Components are started and stopped in a lazy manner; that is, the middleware only starts a component when necessary and stops it when no other component in the middleware needs it. The objective is to adapt the structure of the framework to the consumer application's needs, keeping the component deployment as simple as possible in order to improve the middleware's performance. This module provides one API for access to stored measures/context parameters and another to request on-demand measures. Context information is stored in name-value tuples aggregated to compose measure objects (for instance, the three-axis acceleration values compose an acceleration 'measure'). This facilitates the management and storage of measures in an object-oriented database (db4o).
2. The Component Manager manages the components' life cycle according to the needs of active applications. It is able to start, stop or configure components under the supervision of the Context Manager.
3. The Subscription Manager allows components and applications to subscribe to a measure/context parameter. It is aware of every active subscription (holding information about the subscribed component/application) and provides an API to retrieve subscription data.
4. The Registry gathers all the available context measure names, together with the component that publishes each measure. The Registry API allows components and applications to ask for the available measures/context parameters at any moment.
5. The Discovery module listens for new component registration queries and adds these components to the Registry. Components must use some special parameters
when registering, so that the Discovery module can effectively make them available in the middleware.
6. The Communications Manager centralizes the access to the communication interfaces available in the mobile device: every component needing a Bluetooth connection, COM ports or HTTP connections will use this library. It also includes additional features; for instance, the Bluetooth Manager performs the periodic search for new nearby devices and provides an API used by the P2P Context Sharing enabler for this purpose.
7. The Logging module includes some basic logging facilities.
8. The Privacy Manager controls the access to the user's private data. It stores credentials and privacy profiles, and manages application authorizations. This is especially useful for sharing context parameters with other devices via P2P context sharing.

CASanDRA Core is packaged in a single OSGi bundle. This core API is sufficient to develop sensor components, enablers and applications, and to make them work together.

CASanDRA Mobile Sensors and Enablers. 'Sensors' and 'Enablers' are CASanDRA's components. A component is a bundle that implements the ComponentInterface interface and offers it as a service by registering itself in the middleware. When a component is started, it subscribes to all the components providing the data needed to compute its output; the middleware automatically starts every component that generates those data. This chain reaction ends with the startup of the sensors that directly acquire raw data from the hardware. CASanDRA Mobile aims at providing a complete set of 'sensors': software pieces that access, in a customizable way, the sensing resources embedded in (or attached to) the mobile device, but that also manage connections to retrieve data from external sensors connected through Bluetooth or ZigBee interfaces.
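The lazy start-up 'chain reaction' can be sketched as a recursive dependency start: launching an enabler first launches the components that provide its inputs, bottoming out at the hardware sensors. The component names follow the scenario of Sect. 5, but the manager API is hypothetical.

```python
class Component:
    """A middleware component and the names of the components it consumes."""
    def __init__(self, name, needs=()):
        self.name, self.needs, self.running = name, tuple(needs), False

class ComponentManager:
    def __init__(self, components):
        self._by_name = {c.name: c for c in components}
        self.start_order = []  # records the order in which components start

    def start(self, name):
        comp = self._by_name[name]
        if comp.running:        # lazy: never start a component twice
            return
        for dep in comp.needs:  # providers start first (the chain reaction)
            self.start(dep)
        comp.running = True
        self.start_order.append(name)

mgr = ComponentManager([
    Component("accelerometerSensor"),
    Component("activityEnabler", needs=["accelerometerSensor"]),
])
mgr.start("activityEnabler")
print(mgr.start_order)  # ['accelerometerSensor', 'activityEnabler']
```

A symmetric reference-counted `stop` would implement the text's rule of stopping a component once no consumer needs it.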
For example, CASanDRA Mobile offers sensing modules to retrieve a) acceleration data from an embedded inertial system, b) the received signal strength indicator of a WiFi connection (in order to enable localization algorithms) or c) biometric data from an external BT oxymeter. 'Enablers' perform data fusion at different levels, from signal to situation and impact fusion. The middleware currently includes a P2P Context Sharing Enabler, which listens to other devices executing the CASanDRA middleware in order to share context parameters with them (P2P context sharing makes it possible to enhance the context image locally treated in each device), and a group of general Context Enablers, which includes e.g. a Location Broker – which fuses position information coming from different Location Enablers in order to provide seamless position estimation – or an Activity Enabler – which works on acceleration data to infer activity estimates. Two additional enablers offer horizontal functionalities. On the one hand, an Inference Engine based on the rule engine 3aplm offers an API to configure rules to be executed when needed, providing external components with reasoning capabilities. Applications and internal components may subscribe to receive events from the Inference Engine. On the other hand, the Performance Manager watches variables such as the available memory or the number of working threads, and generates context parameters describing the system state.
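As a toy illustration of the Location Broker's role, the sketch below fuses estimates from several location enablers by preferring the one with the smallest reported uncertainty; this selection rule is our own assumption, since the paper does not specify the fusion method.

```python
def fuse_location(estimates):
    """estimates: list of (source_name, (x, y) or None, accuracy_in_meters).

    Returns the estimate with the smallest uncertainty, or None if no
    enabler currently provides a position (e.g. GPS indoors).
    """
    available = [e for e in estimates if e[1] is not None]
    if not available:
        return None
    return min(available, key=lambda e: e[2])

outdoors = [("location.gps.internal", (40.44, -3.72), 5.0),
            ("location.wifi", (40.45, -3.71), 12.0)]
indoors = [("location.gps.internal", None, float("inf")),
           ("location.wifi", (40.45, -3.71), 12.0)]
print(fuse_location(outdoors)[0])  # location.gps.internal
print(fuse_location(indoors)[0])   # location.wifi
```

The broker thus provides one seamless `location` stream while individual enablers come and go with coverage.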
Every sensor or enabler registers, when initialized, the measure or context parameter it provides under a 'name'. These names are encouraged to follow a coherent taxonomy to improve the use of patterns in subscriptions. For instance, all the components publishing a location context parameter should name it 'location.X', so that any component can subscribe to 'location.*' and receive every location context parameter available (e.g. location.gps.internal, location.wifi, location.bluetooth).
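Resolving such wildcard subscriptions against the Registry amounts to glob-style matching over measure names; a minimal sketch (the registry contents and function name are illustrative):

```python
from fnmatch import fnmatch

# Measure names published in the Registry, following the taxonomy above
registry = ["location.gps.internal", "location.wifi",
            "location.bluetooth", "acceleration.internal"]

def match_subscription(pattern, names):
    """Return every registered measure name that matches the pattern."""
    return [n for n in names if fnmatch(n, pattern)]

print(match_subscription("location.*", registry))
# ['location.gps.internal', 'location.wifi', 'location.bluetooth']
```

A subscriber to `location.*` then automatically receives events from any location enabler added later, as long as the naming taxonomy is respected.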
5 Deployment of an Application on Top of CASanDRA Mobile

5.1 Service Scenario: A Wellness Application to Control Sedentary Behavior

The application developed on top of CASanDRA Mobile is a native context-aware wellness application which aims at persuading the user to increase his physical activity in order to avoid or minimize sedentary behavior. Each hour, the application evaluates the user's activity level and makes a positive or negative verdict. This balance is visually communicated to the user: the scenery serving as wallpaper for the application turns into a dark landscape each day at midnight, and progressively evolves into a greener scene if the user's activity level is adequate. Additionally, the application triggers context-aware notifications to help the user increase his activity when low levels are detected.

The application needs to be aware of the user's daily activity, assumed to be a combination of 'atomic activity estimates' (at rest, walking, running), covered walking distance and time. The user wears no external device other than his personal mobile phone, ideally located in a chest pocket of the user's shirt. This simplified scenario makes it possible to estimate the user's movement through the accelerometer in the mobile device. Activity information is combined with GPS data when outdoors and with WiFi-based location estimates when indoors.

5.2 Development Methodology and Application's Design

CASanDRA Mobile can be improved at the same time that applications are built, if some general guidelines are taken into account when designing and developing the application's building blocks. An important recommendation is to develop every enabler/sensor/application in a separate bundle: this allows specialization, division of labor, component reusability and parallel design and testing life cycles.
Therefore, when building an application, the developer has to define the bundles – sensor, enabler and application bundles – needed for a full-featured application, and match his needs with the services available in CASanDRA Mobile. For the proposed scenario, three sensors are needed: 'accelerometerSensor', 'wifirssSensor' and 'GPSSensor'. Then, it is necessary to process the sensor data in order to infer the context feature 'activity'. The 'activityEnabler' will be able to discriminate among a set of simple activities such as 'at rest', 'running' or 'walking' by computing the variance of the acceleration signal in each axis (x, y, z). The 'activityEnabler' will be in charge of publishing the activity estimation so that it can be used by consumer components. The 'wifirssSensor' retrieves signal-strength data from the WiFi network for the 'wifiLocationEnabler' to process. The 'locationBrokerEnabler' gathers both GPS data and WiFi estimates to provide seamless location information. The application will be able to use the Inference Engine's API to configure the checking of specific behavioral patterns in order to ease the activity evaluation. For example, the application may be interested in getting a notification from the Inference
Engine if no activity and no position change are detected for two hours on a working day. The logic for dealing with this situation remains on the application side, which may decide to send an email to the user with some health advice in order to a) check that the user is carrying his mobile phone with him and b) foster the user's physical activity. In the future, the application may employ learning methods to cope with uncertainty and minimize notifications.
Fig. 2. Bundle deployment in CASanDRA Mobile
The application will be finally deployed in a separate bundle. That makes eight independent bundles (Fig. 2) that can be developed in parallel, using mock components to simulate the others. Final integration is expected to be seamless if every single bundle is adequately tested.
6 Conclusions and Further Work

The development of light fusion strategies for context management remains a challenge: from stable, efficient and accurate context feature extraction to complex reasoning, uncertainty management and context sharing, there is still a way to go before such systems run on resource-constrained mobile nodes. In addition to the specific issues related to context management, common problems of mobile application development (such as portability, modularity or scalability) remain without a universal solution. CASanDRA Mobile is an attempt to address both aspects, delivering a full-featured but light fusion framework for context management in mobile devices. The platform is still in its infancy, but its first version demonstrates the feasibility and convenience of building the framework on the service oriented architecture implemented through mOSGi. We expect to deliver results on performance tests with an application that is an intensive consumer of context data. The framework, which aims at being transparent to the application developer, works on a general model of resources and context elements, defining interfaces which enable middleware services to use new resources as they appear in the environment.
Our current lines of work focus on improving: 1) a light strategy for 'quality of context' control throughout the fusion process; 2) a fusion module to manage position estimation in a seamless manner; 3) a stable activity inference system which uses Bayesian logic; 4) a model for sharing context among different devices with the objective of improving context estimation and 5) a reasoning service including ontology processing. Of course, these are still a small part of the fusion problems to solve in order to have a fully operational framework.

Acknowledgments. This work has been supported by the Government of Madrid under grant S-0505/TIC-0255 and by the Spanish Ministry of Science and Innovation under grant TIN2008-06742-C02-01.
References

1. Schilit, B.N., Theimer, M.M.: Disseminating active map information to mobile hosts. IEEE Network, 22–32 (September/October 1994)
2. Hong, J.-Y., Suh, E.-H., Kim, S.-J.: Context-aware systems: A literature review and classification. Expert Systems with Applications 36, 8509–8522 (2009)
3. Bernardos, A.M., Tarrío, P., Casar, J.R.: CASanDRA: A framework to provide Context Acquisition Services And Reasoning Algorithms for Ambient Intelligence Applications. In: Proc. Int. C. on Parallel and Distributed Computing, Apps. and Tech., Hiroshima (2009)
4. Bernardos, A.M., Tarrío, P., Casar, J.R.: A data fusion framework for context-aware mobile services. In: Proc. of the IEEE International Conf. in Multisensor Fusion and Integration for Intelligent Systems, Seoul, pp. 606–613 (2008)
5. OASIS Standard, Reference Model for Service Oriented Architecture 1.0 (2006)
6. Chan, A.T.S., Chuang, S.: MobiPADS: A Reflective Middleware for Context-Aware Mobile Computing. IEEE Transactions on Software Engineering 29(12) (2003)
7. Sobel, J.M., Friedman, D.P.: An introduction to reflection-oriented programming. In: Proceedings of Reflection 1996, San Francisco (1996)
8. Capra, L., Emmerich, W., Mascolo, C.: CARISMA: Context-Aware Reflective mIddleware System for Mobile Applications. IEEE T. on SW Engin. 29(10), 929–945 (2003)
9. Sørensen, C., Wu, M., Sivaharan, T., Blair, G.S., Okanda, P., Friday, A., Duran-Limón, H.: A Context-Aware Middleware for Applications in Mobile Ad Hoc Environments. In: Proc. 2nd W. on Middleware for Pervasive and Ad hoc Computing, pp. 107–110. ACM, NY (2004)
10. Grace, P., Blair, G.S., Samuel, S.: A Reflective Framework for Discovery and Interaction in Heterogeneous Mobile Environments. ACM SIGMOBILE Mobile Computing and Communications Review 9(1), 2–14 (2005)
11. Edwards, W.K., Newman, M.W., Sedivy, J.Z., Smith, T.F.: Bringing Network Effects to Pervasive Spaces. IEEE Pervasive Computing 4(3), 15–17 (2005)
12. Raento, M., Oulasvirta, A., Petit, R., Toivonen, H.: ContextPhone: A Prototyping Platform for Context-Aware Mobile Applications. IEEE Pervasive Computing, 51–59 (April–June 2005)
13. Yamabe, T., Takagi, A., Nakajima, T.: Citron: A Context Information Acquisition Framework for Personal Devices. In: Proc. 11th Int. Conf. on Embedded and Real-Time Computing Systems and Apps., pp. 489–495. IEEE Computer Society, Los Alamitos (2005)
14. Alia, M., Horn, G., Eliassen, F., Khan, M.U., Fricke, R., Reichle, R.: A Component-Based Planning Framework for Adaptive Systems. In: Meersman, R., Tari, Z. (eds.) OTM 2006. LNCS, vol. 4276, pp. 1686–1704. Springer, Heidelberg (2006)
15. Sharmin, M., Ahmed, S., Ahamed, S.I.: MARKS for Mobile Devices of Pervasive Computing Environments. In: Proc. 3rd Int. Conf. on Information Tech., pp. 306–313 (2006)
Embodied Moving-Target Seeking with Prediction and Planning

Noelia Oses¹, Matej Hoffmann², and Randal A. Koene¹

¹ Fundación FATRONIK-Tecnalia, Mikeletegi pasealekua 7, 20009 Donostia-San Sebastián, Spain
{noses,rkoene}@fatronik.com
² Artificial Intelligence Laboratory, Department of Informatics, University of Zurich, Andreasstrasse 15, 8050 Zurich, Switzerland
[email protected]
Abstract. We present a bio-inspired control method for moving-target seeking with a mobile robot, which resembles a predator-prey scenario. The motor repertoire of a simulated Khepera robot was restricted to a discrete number of ‘gaits’. After an exploration phase, the robot automatically synthesizes a model of its motor repertoire, acquiring a forward model. Two additional components were introduced for the task of catching a prey robot. First, an inverse model to the forward model, which is used to determine the action (gait) needed to reach a desired location. Second, while hunting the prey, a model of the prey’s behavior is learned online by the hunter robot. All the models are learned ab initio, without assumptions, work in egocentric coordinates, and are probabilistic in nature. Our architecture can be applied to robots with any physical constraints (or embodiment), such as legged robots. Keywords: bio-inspired control; forward model; inverse model; prediction; planning; egocentric coordinates.
1 Introduction
This paper deals with the problem of moving-target seeking by a mobile robot, a predator-prey scenario. This problem has long been solved in nature, hence we use bio-inspired control methods to approach it. In order to approximate real-world conditions we use a Khepera robot model with specific physical constraints. We define a set of 10 gaits, each gait being a pair of velocities for the left and right motors. This restricted repertoire of gaits helps us to approximate the context of animal behavior (our final goal is to address more complex platforms such as legged robots). We implement a forward model, which enables the robot to learn to predict how a set of motor commands from its repertoire will influence its state in the environment [1,2]. The robot needs to learn its own dynamics model for navigation, in accordance with its limited set of gaits. We achieve this through autonomous exploration inspired by the motor-babbling observed in infants [3].

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 478–485, 2010. © Springer-Verlag Berlin Heidelberg 2010
The inverse model of the forward model is used to determine the gait needed to reach a specific location in one time-step, such as the expected relative location of the prey. If a single time-step does not suffice, a sequence of gaits is planned. The number of possible combinations of gaits increases exponentially with the length of the sequence, so efficient heuristics are needed. Finally, the hunter learns a model of the prey's behavior online and without any prior knowledge or assumptions. This prey model is used to predict future prey locations. All models operate in egocentric (robot-centered) coordinates, make no assumptions about the action space, and incorporate uncertainty.

The combination of dynamics and uncertainty (in the robot's forward model and in the prediction of prey behavior) provides a useful approximation of real-world conditions. In these conditions extensive planning is infeasible, because algorithms need to operate in real time and a deep plan would need to be updated too rapidly to be useful (the frame problem [4]). Instead, we begin with a bottom-up approach: find a solution that is as reactive as possible; then add the look-ahead prediction and planning required to catch the prey. Planning is therefore added only to the extent that the combination outperforms a simple reactive architecture.
2 Learning a Forward Model in an Egocentric Coordinate System
We use a relative reference system in polar coordinates centered at the hunter robot's center of mass (Fig. 1a). The angle is measured clockwise from the robot's posteroanterior vector (PA), i.e. the hunter's heading is zero degrees. Location and heading constitute a robot's pose. For the forward model, the hunter's reference system at time t is used to express the next pose, one time-step later. The heading of the hunter one time-step after time t is given as the angle that the hunter's PA vector then subtends, measured clockwise, with respect to the hunter's PA vector at time t (Fig. 1a).
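Under these conventions (clockwise-positive angles measured from the PA vector), the transformation from two global poses to the egocentric (distance, angle, heading) tuple could be sketched as below. The choice of a global frame with heading measured clockwise from the +y axis is our own assumption for the illustration.

```python
import math

def egocentric(pose_t, pose_next):
    """Each pose is (x, y, heading); heading in radians, clockwise-positive,
    measured from the global +y axis (an illustrative convention)."""
    x0, y0, h0 = pose_t
    x1, y1, h1 = pose_next
    dx, dy = x1 - x0, y1 - y0
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dx, dy)            # clockwise from the +y axis
    angle = (bearing - h0) % (2 * math.pi)  # clockwise from the PA vector at t
    heading = (h1 - h0) % (2 * math.pi)     # change of heading over the step
    return distance, angle, heading

# Moving 1 m straight ahead while keeping the same heading:
d, a, h = egocentric((0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(round(d, 3), round(a, 3), round(h, 3))  # 1.0 0.0 0.0
```

The same transformation, applied to consecutive prey poses, yields the egocentric transitions used for the prey model in Sect. 3.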
Fig. 1. (a) Egocentric coordinates. (b) Bayesian network for the forward model. (c) Plot showing the outcome of applying different gaits.
The robot needs to learn to predict the outcomes of its actions. The forward model enables this. We define the application of a specific gait for a specific amount of time as an action. The consequence of such an action is a new pose of the robot. We implement the forward model as a Bayesian network (BN), as in Demiris and Dearden [5], because BNs provide a powerful probabilistic framework in which to express the causal nature of a robot's control system. A motor command (Gait) and the observations Distance, Angle, and Heading are each represented as random variables in the BN (Fig. 1b). We use a naïve Bayes classifier, which is often quite effective even when the attribute values are not conditionally independent [6,7]. The BN parameters (the conditional probability distributions) are learned offline from data obtained during motor-babbling (randomly applied gaits, see Fig. 1c). The data is complete, the structure of the network is known and the prior probability distribution over the gaits is uniform (gaits were applied randomly with equal probability). Maximum a posteriori (MAP) learning therefore reduces to maximum-likelihood parameter learning.
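The counting view of this maximum-likelihood learning can be sketched as follows: with complete data, each conditional distribution P(Variable | Gait) is just a relative-frequency table, and the naïve-Bayes assumption factorizes the joint outcome probability. The discretization of outcomes into bins and the class name are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter, defaultdict

class ForwardModel:
    """Naive-Bayes forward model: one frequency table per outcome variable."""
    def __init__(self):
        self.dist = defaultdict(Counter)   # P(Distance | Gait) counts
        self.ang = defaultdict(Counter)    # P(Angle | Gait) counts
        self.head = defaultdict(Counter)   # P(Heading | Gait) counts
        self.n = Counter()                 # observations per gait

    def observe(self, gait, d, a, h):
        """Record one discretized motor-babbling outcome (d, a, h bins)."""
        self.dist[gait][d] += 1
        self.ang[gait][a] += 1
        self.head[gait][h] += 1
        self.n[gait] += 1

    def p(self, gait, d, a, h):
        """P(d, a, h | gait) under the conditional-independence assumption."""
        n = self.n[gait]
        if not n:
            return 0.0
        return (self.dist[gait][d] / n) * (self.ang[gait][a] / n) * (self.head[gait][h] / n)

fm = ForwardModel()
for _ in range(8):
    fm.observe("gait3", 2, 0, 0)   # gait3 usually advances 2 bins straight ahead
fm.observe("gait3", 1, 0, 0)
fm.observe("gait3", 2, 1, 1)
print(round(fm.p("gait3", 2, 0, 0), 3))  # 0.729
```

With 9 of 10 observations in each favored bin, the factored probability is 0.9³, showing how the naïve factorization trades exactness for very cheap estimation on a mobile device.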
3 Inverse Model and Prey Model
The inverse of the forward model describes which gait to take in order to reach a desired location (distance, angle) in one time-step. We can obtain this inverse model, P(Gait|Distance, Angle), through inference from P(Distance|Gait) and P(Angle|Gait), which are provided by the BN of the forward model. We approximate (distance, angle) tuples with the nearest polar coordinates encountered during learning.

The hunter learns a probabilistic transition model of the prey's movement online, independently of the models for its own movement. The hunter observes how the prey moves: the new prey pose as a function of the prey pose one time-step earlier (Fig. 2a). Currently, the hunter robot gets the GPS data corresponding to the location of the prey at each time-step; at time t+Δt (Δt being the time-step) it transforms these data into the prey's egocentric coordinates with respect to the prey's reference system at time t, and incorporates the egocentric coordinates into the prey model. This prey model is used to predict the prey's future positions. This approach resembles Thrun et al. [8], except that we make no a priori assumptions about the way in which the prey moves or about its possible actions (unlike [9]). We record the frequency of each observed pose transition in terms of distance, angle and heading. Fig. 2b shows an illustrative plot of a prey transition model.
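A minimal sketch of this inference: with a uniform prior over gaits, Bayes' rule makes the posterior proportional to P(Distance|Gait)·P(Angle|Gait). The probability tables below are made-up illustrative numbers, not learned values.

```python
def inverse_model(p_dist, p_ang, distance, angle):
    """p_dist, p_ang: {gait: {bin: probability}} tables from the forward model.
    Returns the MAP gait and the full posterior P(Gait | distance, angle)."""
    scores = {g: p_dist[g].get(distance, 0.0) * p_ang[g].get(angle, 0.0)
              for g in p_dist}
    total = sum(scores.values())
    posterior = {g: s / total for g, s in scores.items()} if total else {}
    return max(posterior, key=posterior.get), posterior

# Illustrative two-gait forward model, discretized into bins:
p_dist = {"turn_left": {0: 0.7, 1: 0.3}, "forward": {0: 0.1, 1: 0.9}}
p_ang = {"turn_left": {0: 0.2, 5: 0.8}, "forward": {0: 0.9, 5: 0.1}}
gait, post = inverse_model(p_dist, p_ang, 1, 0)
print(gait)  # forward
```

Here the target (distance bin 1, angle bin 0, i.e. roughly one unit straight ahead) gives 'forward' a score of 0.9·0.9 against 0.3·0.2 for 'turn_left', so the hunter selects 'forward'.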
4 Models and Experiments
We develop a reactive model, a prey prediction model, and a planning model, and we assess the performance of each with the same experiment, conducted both in a walled-in environment and in an open environment. The experiment has seven initial states. These consist of the prey being located at five bodies' distance from
Embodied Moving-Target Seeking
Fig. 2. (a) Illustration of one prey transition. (b) Example transition model of the prey.
Fig. 3. (a) Experiment set-ups. (b) Results of the reactive model.
the hunter and with the prey at angles θ = 0, 1, 2, 3, 4, 5 and 6 radians, with identical headings for hunter and prey (Fig. 3a). Cyberbotics' Webots™ [10] Braitenberg controller runs the prey, so that it moves straight ahead until it senses an obstacle and turns. The hunter performs no obstacle avoidance. The simulated time elapsed until the hunter catches the prey is recorded. Simulations end when the prey is caught or after one simulated minute. An experiment consists of 100 simulations for each initial state.
4.1 Reactive Model
The hunter applies a gait determined by the inverse model in accordance with the current prey position. The resulting reactive behavior enables the hunter to catch the prey only in very specific circumstances. In general, the hunter appears to follow the prey around (Fig. 3b). Out of 700 runs, only 102 were successful (a 14.57% success rate). When the prey started off at θ = 1 radians the hunter
N. Oses, M. Hoffmann, and R.A. Koene
was always successful. The hunter also caught the prey on 2 occasions when the prey started off at θ = 2 radians.
4.2 Prey Prediction Model
The hunter learns the prey model online and uses it to predict the prey's future position (Fig. 4a). At each time-step, the prey's predicted position is used as the target position for the inverse model, which determines the hunter's gait. The prey's position can be predicted ahead for a number of time-steps (T), and the optimal number depends on the distance between hunter and prey. We set T to the nearest integer to half of that distance.
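Setting T to the nearest integer to half the hunter-prey distance and rolling the learned transition model forward can be sketched as follows. Following the most frequent transition at each step is a simplification of my own (a point estimate rather than the full distribution), and the grid-world poses are toy assumptions:

```python
from collections import Counter

def predict_prey(transitions, pose, hunter_dist):
    """Roll the learned prey transition model forward T steps, where
    T is the nearest integer to half the hunter-prey distance.
    `transitions` maps a pose to a Counter of observed successor poses;
    here we follow the most frequent transition at each step."""
    T = round(hunter_dist / 2)
    for _ in range(T):
        successors = transitions.get(pose)
        if not successors:       # pose never observed: assume the prey stays put
            break
        pose = successors.most_common(1)[0][0]
    return pose

# toy model: the prey was always observed marching right one cell per time-step
trans = {(x, 0): Counter({(x + 1, 0): 5}) for x in range(10)}
```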
Fig. 4. (a) Prey prediction. (b) Results of the prey prediction model in the closed environment. (c) Results of the prey prediction model in the open environment.
Figure 4b shows the results in a walled-in environment. The prey was caught in 655 of 700 runs (a 92.8% success rate), with an average catch time (including misses) of 15.287 seconds. Figure 4c shows the results in an open environment. The prey was caught in 473 of 700 runs (a 67.6% success rate). The lower success rate was influenced by the prey controller, as the prey can continue to run straight when nothing forces it to turn.
4.3 Planning Model
For a better success rate in the open environment, the hunter needs to plan more than one gait ahead, composing gaits to catch the prey. We now predict the prey's position at successive time-steps and select the minimum number of steps at which a composition of gaits will minimize the distance between the hunter and the prey. Heuristic Solution with Best-First Search: The theoretical solution would involve calculating the probability distribution for the distance between the hunter and the prey. In doing so we would encounter the "curse of dimensionality" due to the exponential increase in the size of the state space with each level of the search tree (Fig. 5). We can avoid this by using sampling to predict hunter
Fig. 5. Search tree for planning a sequence of gaits (shown only for 3 gaits for the sake of clarity)
[Fig. 6 panel: predicted and actual hunter and prey trajectories, plotted as X (m) vs. Y (m).]
Fig. 6. (a) Example of a planning iteration. (b) Results of the planning model in an open environment.
position. We calculate samples for each time-step and each different sequence of gaits. Each node in the tree has associated information: T (the number of time-steps, or depth, that the node plans), Gait[t] (the sequence of gaits applied at time-steps t < T), the cost in terms of the number of gaits used in planning, the cost in terms of the number of gait transitions (transitions will be important in the legged-robot scenario), Hunter[t] (the predicted hunter coordinates at time-steps t < T, relative to the hunter pose during planning), and Value (the final predicted distance between hunter and prey). Choosing a sequence of gaits is a combinatorial optimization problem. A breadth-first search of the tree was too slow, so we proceeded to use a best-first search. The best-first search algorithm explores a graph by expanding the most promising node. In our case, the most promising node is the one that most reduces the distance to the predicted prey position. The search tree, with g^T nodes for g gaits, needs to be pruned further, for example by eliminating combinations with more than one gait transition. With the planning model (Fig. 6b), 591 of 700 runs were successful (an 84.4% success rate).
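The best-first expansion order can be sketched with a priority queue. This is a loose illustration under stated assumptions: a deterministic `apply_gait` stands in for the sampled forward model, and the gait-transition pruning described above is omitted:

```python
import heapq

def plan_gaits(apply_gait, hunter, prey_at, gaits, max_depth=3):
    """Best-first search over gait sequences.  `apply_gait(pose, gait)` is a
    hypothetical deterministic stand-in for the sampled forward model and
    `prey_at(t)` returns the predicted prey position t time-steps ahead."""
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    # each frontier entry: (predicted distance to prey, gait sequence, pose)
    frontier = [(dist(hunter, prey_at(0)), (), hunter)]
    best_seq, best_val = (), frontier[0][0]
    while frontier:
        val, seq, pose = heapq.heappop(frontier)   # most promising node first
        if val < best_val:
            best_seq, best_val = seq, val
        if len(seq) >= max_depth:
            continue
        for g in gaits:
            nxt = apply_gait(pose, g)
            t = len(seq) + 1
            heapq.heappush(frontier, (dist(nxt, prey_at(t)), seq + (g,), nxt))
    return best_seq

# toy setting: two translation "gaits" on a grid, stationary prey at (2, 1)
moves = {"E": (1, 0), "N": (0, 1)}
step = lambda pose, g: (pose[0] + moves[g][0], pose[1] + moves[g][1])
plan = plan_gaits(step, (0, 0), lambda t: (2, 1), ["E", "N"])
```

In practice the frontier is cut off by the pruning rules rather than exhausted, which is what keeps the g^T blow-up in check.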
5 Discussion and Conclusions
We have presented a bio-inspired control architecture that allows a mobile robot to: (1) learn a model of its own action repertoire (a forward model); (2) learn a model of an object (prey) it is seeking; (3) combine the forward model and the prey model to seek the prey. Braitenberg [11] and Brooks [12] showed that robots that rely on embodiment and purely reactive behaviors, and that exploit interaction with the environment, could address real-world dynamic problems that representations in classical A.I. could not adequately deal with. Such robots exhibit sophisticated behaviors and properties such as the adaptivity, robustness, versatility and agility found in biological organisms, yet without emphasizing cognitive capabilities such as planning, abstract reasoning or language. Following this inspiration, we took a bottom-up approach by developing a reactive model first and only adding cognitive capabilities as and when necessary. Our architecture has the following properties: (1) An egocentric coordinate system is used; (2) The model can deal with an arbitrary action repertoire of the hunter and the prey. There are no assumptions on the behavior of the hunter or prey; (3) The action space is discrete; (4) The models are learned ab initio. The hunter's forward model is learned as a result of a motor-babbling phase. The prey's model is learned online and incrementally updated; (5) Our model accounts for and plans with uncertainty. We see two possible uses for our architecture. First, it can be applied as a whole to moving-target seeking, by an autonomous vehicle for instance. Alternatively, individual components can be utilized on their own: the forward model implementation would allow an arbitrary robot to learn its motor repertoire and plan with it, and the prey model can be applied to any target object, such as in a person-following scenario [9]. Second, our scenario could serve to model biology.
By adding details about particular behaviors we may test hypotheses for the way in which animals achieve similar behaviors, for example the prey-catching behavior of the spider Portia [18] or hunting in vertebrates. At the same time, our scenario is a case for a minimalistic model of cognition which is firmly grounded in body dynamics [13,14,15,16,17]. Planned future work includes extending our model to a legged platform which uses real gaits, adding real sensing of the prey (through a camera on the hunter, for instance), and studying various cost functions for the trajectory planning of the hunter. These can include energy consumption, or computational complexity/reaction time. Acknowledgments. The authors would like to thank the AI Lab of the University of Zurich for welcoming the first author to the lab through an internship. We would also like to thank Juan Pablo Carbajal for many interesting and informative discussions about the project. This work was undertaken in the context of the "From locomotion to cognition" project funded by the Swiss National Science Foundation, Grant Nr. 200020-122279/1.
References
1. Webb, B.: Neural mechanisms for prediction: do insects have forward models? Trends in Neurosciences 27, 278–282 (2004)
2. Wolpert, D.M., Miall, R.C., Kawato, M.: Internal models in the cerebellum. Trends in Cognitive Sciences 2, 338–347 (1998)
3. Meltzoff, A.N., Moore, M.K.: Explaining facial imitation: a theoretical model. Early Development and Parenting 6(2), 157, 1–14 (1997)
4. Pfeifer, R., Scheier, C.: Understanding Intelligence. The MIT Press, Cambridge (2001)
5. Demiris, Y., Dearden, A.: From motor babbling to hierarchical learning by imitation: a robot developmental pathway. In: Proceedings of the Fifth International Workshop on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems. Lund University Cognitive Studies, vol. 123, pp. 31–37 (2005)
6. Mitchell, T.M.: Machine Learning. McGraw-Hill, New York (1997)
7. Russell, S., Norvig, P.: Artificial Intelligence: A Modern Approach. Prentice Hall, Englewood Cliffs (1995)
8. Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics. The MIT Press, Cambridge (2005)
9. Vazquez, A.: Incremental learning for motion prediction of pedestrians and vehicles. PhD thesis, Institut National Polytechnique de Grenoble (2007)
10. Michel, O.: Webots: Professional Mobile Robot Simulation. International Journal of Advanced Robotic Systems 1(1), 39–42 (2004)
11. Braitenberg, V.: Vehicles: Experiments in Synthetic Psychology. The MIT Press, Cambridge (1986)
12. Brooks, R.A.: Intelligence Without Representation. Artificial Intelligence Journal 47, 139–159 (1991)
13. Pezzulo, G.: Anticipation and Future-Oriented Capabilities in Natural and Artificial Cognition. In: Lungarella, M., Iida, F., Bongard, J.C., Pfeifer, R. (eds.) 50 Years of Artificial Intelligence. LNCS (LNAI), vol. 4850, pp. 258–271. Springer, Heidelberg (2007)
14. Duro, R.J., Graña, M., de Lope, J.: On the potential contributions of hybrid intelligent approaches to multicomponent robotic system development. Information Sciences (in press, 2010)
15. Clark, A., Grush, R.: Towards a Cognitive Robotics. Adaptive Behavior 7(1), 5–16 (1999)
16. Grush, R.: The emulation theory of representation: motor control, imagery, and perception. Behavioral and Brain Sciences 27, 377–442 (2004)
17. Schomaker, L.: Anticipation in cybernetic systems: A case against mindless anti-representationalism. In: IEEE International Conference on Systems, Man and Cybernetics, The Hague, Netherlands (2004)
18. Tarsitano, M.: Route selection by a jumping spider (Portia labiata) during the locomotory phase of a detour. Animal Behaviour 72, 1437–1442 (2006)
Using Self-Organizing Maps for Intelligent Camera-Based User Interfaces

Zorana Banković, Elena Romero, Javier Blesa, José M. Moya, David Fraga, Juan Carlos Vallejo, Álvaro Araujo, Pedro Malagón, Juan-Mariano de Goyeneche, Daniel Villanueva, and Octavio Nieto-Taladriz

Dep. Ingeniería Electrónica, Universidad Politécnica de Madrid, Av. Complutense 30, 28040 Madrid, Spain
{zorana,elena,jblesa,josem,dfraga,jcvallejo,araujo,malagon,goyeneche,danielvg,nieto}@die.upm.es
Abstract. The area of Human-Machine Interface is growing fast due to its high importance in all technological systems. The basic idea behind designing Human-Machine interfaces is to enrich the communication with the technology in a natural and easy way. Gesture interfaces are a good example of transparent interfaces. Such interfaces must perform the action the user wants, so proper gesture recognition is of the highest importance. However, most of the systems based on gesture recognition use complex methods requiring high-resource devices. In this work we propose to model gestures by capturing their temporal properties, significantly reducing the storage requirements, and using self-organizing maps for their classification. The main advantage of the approach is its simplicity, which enables implementation using devices with limited resources, and therefore low cost. First testing results demonstrate its high potential. Keywords. Gesture recognition, intelligent environments, self-organizing maps.
1 Introduction
Human-machine interaction has been the subject of intense research over the past few decades. Human-computer interaction must be designed as naturally and as easily as possible, without resulting in the perception of an intrusive technology. User interaction should not require the user to adapt to special conventions or rules; it is the environment that should adapt to the natural way of user interaction. In the recent past, new natural and flexible interfaces, embedded in the objects people use on an everyday basis, have been developed. These new interfaces have been designed and adapted for the end users' needs. One of the most natural and comfortable ways to interact with a system is by hand gestures. Most of the systems based on gesture recognition use complex methods or algorithms, which require high-resource devices to be performed efficiently. For this
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 486–492, 2010. © Springer-Verlag Berlin Heidelberg 2010
reason, there are many interesting works on gesture recognition using general-purpose computers. However, we have to take into account that the embedded systems connected to the cameras usually have very limited resources. We need to do as much processing as possible in the camera processor to try to reduce the spectrum occupation, but we must be careful, since this increases the power consumption of the sensor. In this way, the main advantage of our proposal is its simplicity and its low resource consumption. This permits us to run our algorithm on a device with limited resources, and therefore low cost, as embedded systems usually are. In this article we present a low-cost gesture interface that can control different systems in an environment with simple and fast processing in the embedded systems, minimizing the need for communications. First we propose to model gestures by capturing their temporal properties, and after that we deploy a self-organizing map (SOM) algorithm for clustering the gestures. The paper is organized as follows. In Section 2 we present the previous work on the subject. Section 3 details the characterization of gestures. In Section 4 we give further details of the SOM implementation. Finally, results are presented in Section 5 and conclusions are drawn in Section 6.
2 Previous Work on Self-Organizing Maps for Gesture Recognition
There are a number of papers that deploy the SOM algorithm for gesture recognition [1, 2, 3, 4]. The common aspect of all of them is that they have two stages and that the SOM is either hierarchical or deployed together with another technique that captures the temporal properties of the gestures. Furthermore, all of them deploy a rather standard characterization that contains information such as the trajectory of the hand, the resultant direction of the movement, and the velocity of the movement. Extraction of these features introduces additional computational overhead, and combined with their complexity, i.e., the need to train two learning algorithms, the total computational overhead can be too high for implementation in devices with limited resources. On the other hand, our characterization implicitly contains the above information and its calculation is straightforward. Since it also captures the temporal properties of gestures, we do not need two learning algorithms. This makes our approach less complicated and more appropriate for implementation in devices with limited resources.
3 Gesture Characterization
Each gesture is captured as a set of frames of variable size, as presented in Fig. 1 for the gesture up-down.
Fig. 1. Gesture captured in 12 consecutive frames
We propose to divide each frame into n x n smaller parts and assign to each part a value that corresponds to its luminosity. After that, we characterize the temporal evolution of each part in the following way. If, for example, the values are 0 0 20 40 50 60 70 50 10, we characterize the sequence with time windows of a certain size, and the value of each feature is its frequency in the captured gesture. For a time window of size 3, it would be the following:

0 0 20      0.16
0 20 40     0.16
20 40 50    0.16
40 50 60    0.16
50 60 70    0.16
70 50 10    0.16
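The window-frequency characterization can be sketched as follows. Note that a plain sliding window over the nine-value example yields seven windows (each with frequency 1/7), while the paper's table lists six, so the exact windowing convention is an assumption here:

```python
from collections import Counter

def characterize(series, w=3):
    """Sliding-window characterization of one sub-part's luminosity series:
    each length-w window of successive values is a feature whose value is
    the window's relative frequency in the captured gesture."""
    windows = [tuple(series[i:i + w]) for i in range(len(series) - w + 1)]
    return {win: c / len(windows) for win, c in Counter(windows).items()}

feats = characterize([0, 0, 20, 40, 50, 60, 70, 50, 10])
```

The resulting feature dict is sparse: only windows that actually occur are stored, which is what drives the storage reduction reported later.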
Having in mind that the number of features extracted in this way does not have to be fixed, in order to find the difference, i.e., the distance, between two such characterizations, we deploy the distance function proposed in [5], which calculates the distance between sequences.
Further, the distance between two captured gestures is simply the sum of the absolute distances of the corresponding sub-parts. In this way, the gesture characterization captures the temporal evolution of the gesture. The following step is clustering using the self-organizing map (SOM) algorithm.
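The two-level distance above can be sketched directly. The per-part L1 distance over the union of observed windows is a simple stand-in of my own for the sequence-similarity measure of [5], not that measure itself:

```python
def part_distance(f, g):
    """L1 distance between the window-frequency features of two
    corresponding sub-parts, taken over the union of observed windows."""
    return sum(abs(f.get(k, 0.0) - g.get(k, 0.0)) for k in set(f) | set(g))

def gesture_distance(a, b):
    """Distance between two gestures: the sum of the absolute distances of
    the corresponding sub-parts (each gesture is a list of feature dicts,
    one per n x n frame sub-part)."""
    return sum(part_distance(f, g) for f, g in zip(a, b))

# two single-part gestures sharing one window and differing in another
a = [{(0, 0, 1): 0.5, (1, 1, 1): 0.5}]
b = [{(0, 0, 1): 0.5, (2, 2, 2): 0.5}]
```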
4 Self-Organizing Maps Algorithm
Self-organizing maps (SOM), also known as Kohonen networks, are an unsupervised type of neural network [6]. As in neural networks, the basic idea of SOM has its origins in certain brain operations, specifically in the projection of multidimensional inputs onto one-dimensional or two-dimensional neuronal structures in the cortex. For example, perception of color depends on three different light receptors (red, green and blue), plus the eyes capture information about the position, size or texture of objects. It has been demonstrated that this multidimensional signal is processed by the planar cortex structure. Further, the areas of the brain responsible for different signals from the body preserve topology, e.g., the area responsible for the signals that come from the arms is close to the area responsible for the signals that come from the hands. These are precisely the basic ideas of SOM, which consist in the following: 1. Multidimensional data and their dependencies are presented and captured in a lower-dimension SOM network (usually 2D). 2. The proximity of the nodes in the lattice reflects the similarity of the data mapped to the nodes. For these reasons, SOMs have been widely deployed for clustering and for good visualization of clustering problems. If we project the resulting clusters to RGB space, we can visualize the similarities of the adjacent clusters. They have been successfully deployed in different fields such as image processing [7], robotics [8] (for both visual and motor functions), function approximation in mathematics [9], network security [10], detection of outliers in data [11], etc.
4.1 Implementation Details
The designed SOM algorithm mainly follows the steps of the standard SOM algorithm [12]. The only specific part is the update of the node. Namely, if the node does not contain a feature from a certain input, we add it to the node with the value 0.
However, having in mind that in this way nodes may end up having many features, which would introduce significant computational overhead, we discard all the features whose value is at least 100 times smaller than the maximal feature value of the node, as this does not significantly affect the final result. After the training is finished, in the current implementation we label each node with the label of the gesture, from the set of labeled gestures, that is closest to the node according to the distance function explained above. The process is depicted in Figure 2.
Fig. 2. The process of gesture recognition
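The node update described in Section 4.1 can be sketched as follows. The learning rate and the exact discarding rule (features roughly 100 times smaller than the node's maximum) are our reading of the text, not the paper's verbatim parameters:

```python
def update_node(node, x, lr, threshold_ratio=0.01):
    """One SOM node update toward an input gesture x (both are sparse
    feature dicts).  Features absent from the node are first added with
    value 0, as in the text; after the update, features far smaller than
    the node's maximum are discarded to bound the node size."""
    for k in x:                      # grow the node with unseen features
        node.setdefault(k, 0.0)
    for k in node:                   # standard SOM move toward the input
        node[k] += lr * (x.get(k, 0.0) - node[k])
    mx = max(node.values(), default=0.0)
    return {k: v for k, v in node.items() if v >= mx * threshold_ratio}

new = update_node({("a",): 1.0}, {("b",): 1.0}, lr=0.5)
```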
4.2 Advantages of the Approach
The main advantage of our proposal is its simplicity. Our characterization of gestures captures the temporal evolution of the gesture, and we distinguish gestures simply by clustering them. This is another advantage, as in essence we do not have to label all the gestures (only those used for cluster labeling). Furthermore, the characterization significantly reduces the memory needed to store a gesture. For example, a captured gesture that occupies 507 kB is reduced to 625 B (for a 5x5 division of the frame), around 1000 times less. This permits us to perform our algorithm on a device with limited resources, as embedded systems deployed in ambient intelligence usually are.
5 Empirical Evaluation
5.1 Training and Testing Dataset
In order to test the proposed algorithm, we have captured five types of gesture: left-right, right-left, up-down, down-up, and a fifth type of random gestures labeled as unknown. Twelve persons made the gestures, and in total 760 gestures were captured. In order to illustrate the memory reduction that our approach provides, we give the amounts of occupied memory in both cases. The captured gestures occupy 1.08 GB of storage space, while after the division of each frame into 5x5 blocks they occupy 3.11 MB, taking around 350 times less storage space.
5.2 Results and Discussion
We have tested our algorithm on different training and testing scenarios (by taking different portions of the data explained above). We have performed testing with both 3x3 and 5x5 frame partitions. Furthermore, we have experimented with different sizes of the time window mentioned in Sec. 3. The main conclusion is that, in general, smaller window sizes (3, for example) exhibit better performance than larger ones. In general, we have obtained very high detection rates for the gestures up-down (94%) and down-up (up to 100%), as well as the unknown gestures (up to 100%). However,
Table 1. Maximal detection rates for each gesture

Gesture      Detection Rate (%)
Unknown      88
Down-up      100
Up-down      92
Left-right   13
Right-left   13
Overall      80
the detection of the gestures left-right and right-left did not give satisfactory results, as these were mostly confused with each other (and sometimes with unknown gestures). In the future we plan to work further on this issue. These results are summarized in Table 1.
6 Conclusions
In this work we have presented a low-cost algorithm for gesture classification. We have proposed a characterization of gestures that captures their temporal properties. We have further clustered the gestures using the SOM algorithm, achieving detection rates of up to 100% for certain gestures and an overall detection rate of at most 80%. In the future we plan to add one more stage of SOM clustering in order to detect the users. Acknowledgments. This work was funded by the Spanish Ministry of Industry, Tourism and Trade, under Research Grant TSI-020301-2009-18 (eCID), the Spanish Ministry of Science and Innovation, under Research Grant TEC2009-14595-C02-01, and the CENIT Project Segur@.
References
1. Ishikawa, M., Sasaki, N.: Gesture Recognition based on SOM using Multiple Sensors. In: 9th International Conference on Neural Information Processing, pp. 1300–1304. IEEE Xplore (2002)
2. Shimada, A., Taniguchi, R.: Gesture Recognition Using Sparse Code of Hierarchical SOM. In: 19th International Conference on Pattern Recognition, pp. 1–4. IEEE Xplore (2008)
3. Caridakis, G., Karpouzis, K., Drosopoulos, A.I., Kollias, S.D.: SOMM: Self organizing Markov map for gesture recognition. Pattern Recognition Letters 31(1), 52–59 (2010)
4. Caridakis, G., Karpouzis, K., Pateritsas, C., Drosopoulos, A.I., Stafylopatis, A., Kollias, S.D.: Hand trajectory based gesture recognition using self-organizing feature maps and Markov models. In: ICME 2008, pp. 1105–1108 (2008)
5. Rieck, K., Laskov, P.: Linear Time Computation of Similarity for Sequential Data. Journal of Machine Learning Research 9, 23–48 (2008)
6. Rojas, R.: Neural Networks. Springer, Berlin (1996)
7. Littmann, E., Drees, A., Ritter, H.: Neural Recognition of Human Pointing Gestures in Real Images. In: Neural Processing Letters, pp. 61–71. Kluwer Academic Publishers, Dordrecht (1996)
8. Vleugels, J.M., Kok, J.N., Overmars, M.H.: A self-organizing neural network for robot motion planning. In: Gielen, S., Kappen, B. (eds.) ICANN 1993 Art. Neural Networks Conf. Proc., pp. 281–284. Springer, Heidelberg (1993)
9. Aupetit, M., Couturier, P., Massote, P.: Function Approximation with Continuous Self-Organizing Maps Using Neighboring Influence Interpolation. In: Proc. of Neural Computation (NC 2000), Berlin, Germany (May 2000)
10. Lane Thames, J., Abler, R., Saad, A.: Hybrid intelligent systems for network security. In: Proceedings of the 44th Annual ACM Southeast Regional Conference, pp. 286–289 (2006)
11. Muñoz, A., Muruzábal, J.: Self-Organizing Maps for Outlier Detection. Neurocomputing 18(1-3), 33–60 (1998)
12. SOM Algorithm, http://www.ai-junkie.com/ann/som/som2.html
A SVM and k-NN Restricted Stacking to Improve Land Use and Land Cover Classification

Jorge Garcia-Gutierrez, Daniel Mateos-Garcia, and Jose C. Riquelme-Santos

Department of Computer Science, E.T.S.I.I. - University of Seville, Spain
{jgarcia,mateos,riquelme}@lsi.us.es
Abstract. Land use and land cover (LULC) maps are remote sensing products that are used to classify areas into different landscapes. The newest techniques have been applied to improve the final LULC classification, and most of them are based on SVM classifiers. In this paper, a new method based on a multiple-classifier ensemble to improve LULC map accuracy is shown. The method builds a statistical raster from LIDAR and image fusion data following a pixel-oriented strategy. Then, the pixels from a training area are used to build an SVM and k-NN restricted stacking, taking into account the special characteristics of spatial data. A comparison between an SVM alone and the restricted stacking is carried out. The results of the tests show that our approach improves the results in the context of real data from a riparian area of Huelva (Spain).
1 Introduction
Remote sensing has become a very important tool to carry out many different tasks for the natural environment. In this way, remote sensing has been successfully applied to important activities like flood control, forest inventories, or invasive-species control in protected or especially interesting areas. Although remote sensing usually works with images exclusively, data fusion has been of high interest since the appearance of new active sensors (i.e., sensors whose data is produced as a response to a stimulus other than solar light). They complement images and overcome some of their limitations, e.g., the problems associated with shadows. These limitations make sensor fusion a particularly interesting technique to improve the results of classical remote sensing approaches. One of the most active research lines has been based on LIDAR (LIght Detection And Ranging) technology. This technology is able to register object heights, and it is especially recommended for complex landscapes like riparian zones. Thus, Verrelst et al. [1] use LIDAR to study vegetal species communities, and Antonorakis et al. [2] develop a new methodology to identify different types of commercial wood in riparian zones using only LIDAR. An automatic pixel classification, which is generally supervised, is usually the first step to extract knowledge from remote sensing data. Several techniques from machine learning have been used with satisfactory results, though support vector machines (SVM) are the predominant technique to obtain the best results
E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 493–500, 2010. © Springer-Verlag Berlin Heidelberg 2010
in most cases [3]. Despite the SVM's high accuracy, improvement is needed to reach the standards for products like land use and land cover (LULC) maps [4]. LULC maps are remote sensing products that are used to classify areas into different landscapes subject to their own characteristics or functionality. The newest techniques have been applied to improve the final classification to develop LULC maps. Fauvel et al. [5] apply an SVM to classify the pixels depending on morphologic and hyperspectral data. In Mitrakis et al. [6], a neural network with weights determined by a genetic algorithm obtains the final classification using fusion operators and fuzzy logic. It is important to underline that ensembles are one of the most powerful tools in machine learning, and so they are in remote sensing, where they have also been applied profusely. A very clear example can be seen in [7], where a stacking of several SVMs and a random forest is used to carry out the pixel classification. This work explores the application of ensembles to remote sensing, taking advantage of contextual information [8] from multi-source (LIDAR and aerial images) data. Thus, a novel supervised method called R-STACK (based on a stacking of an SVM and multiple NN classifiers) is shown with two purposes:
– Show an easy way to improve the quality of models when intelligent techniques are applied to LIDAR and imagery fusion data.
– Improve the general accuracy of an automatically generated LULC map.
The rest of the paper is organized as follows. Section 2 provides a description of the data used in this work. Section 3 describes the methodology used, highlighting the feature set and the model extraction process. The results achieved are shown in Section 4 and, finally, Section 5 is devoted to summarizing the conclusions and discussing future lines of work.
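The restricted stacking itself is detailed in the methodology section; as a loose illustration of the general idea of letting the labels of spatially neighboring pixels refine a base SVM decision, here is a hedged sketch. The confidence threshold, the majority-vote rule, and the function name are illustrative assumptions, not the paper's R-STACK rule:

```python
from collections import Counter

def restricted_stack(svm_label, svm_conf, neighbor_labels, conf_threshold=0.7):
    """Keep the SVM label for a pixel when the SVM is confident; otherwise
    let a k-NN-style majority vote over the labels of spatially neighboring
    pixels override it."""
    if svm_conf >= conf_threshold:
        return svm_label
    return Counter(neighbor_labels + [svm_label]).most_common(1)[0][0]
```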
2 Data Description
A LIDAR system is an optical sensor technology that measures properties of scattered light (usually laser) to find the range and/or other information about a distant target. The whole process starts with the emission of polarized light, typically in the ultraviolet, visible, or near-infrared. Then, LIDAR catches the signal reflected from the topographic surface and measures the time taken by each return to establish the distance between the emitter and the object that produced the return. This process is helped by a global positioning system (GPS), giving rise to a point cloud database in which, for every point, it is possible to find: the spatial position (i.e., x, y and z coordinates), the intensity of the return, the number of the return in a sequence (if a pulse caused multiple impacts), etc. These features and the RGB values of an orthophoto are used in this work to obtain the statistical measures on which the method is based; they will be explained in Section 3. The LIDAR data was taken in coastal areas of the province of Huelva. The pulses were geo-referenced and correctly validated by the distributor of the data, yielding 1,384,875 records for an area of 1.5 km2. The reported precision indicates a maximum error of 0.5 m in the x-y positions, and of 0.15 m in the
z position. Along with the LIDAR flight, aerial photographs of the area were taken with a resolution of 0.5 m2. The study area is situated in the south of Spain, at the mouth of the Tinto and Odiel rivers. This area is near the city of Huelva and presents a mix of urban and natural areas. The natural areas can be classified into five subclasses: watered zones, marshland, and vegetation (low, middle and high). The high vegetation is formed by scarce trees of the genus Eucalyptus in the area. The middle vegetation is formed by different types of Mediterranean bushes that principally surround roads and urban areas. Pastures are classified as low vegetation and include bare-earth areas. In addition, the urban areas are also classified into five subclasses: roads and railways, buildings, coal deposits, dumps and mixed areas.
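The per-point attributes listed above (spatial position, intensity, return number) fit in a simple record type; a minimal sketch, with field names that are illustrative rather than the distributor's actual schema:

```python
from dataclasses import dataclass

@dataclass
class LidarReturn:
    """One geo-referenced LIDAR return with the per-point attributes
    listed in the text (field names are illustrative)."""
    x: float            # easting (m)
    y: float            # northing (m)
    z: float            # elevation (m)
    intensity: float    # intensity of the return
    return_number: int  # position of the return in the pulse's sequence

p = LidarReturn(x=150.0, y=75.5, z=12.3, intensity=0.42, return_number=1)
```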
3 Method
Our LULC development method (see Fig. 1) follows a pixel-oriented strategy, which requires us to create a matrix or raster where each element is a pixel. Each pixel represents an area as a function of the resolution. The resolution value must be provided by the user as a method parameter to determine the area within each pixel; it depends on the LIDAR point density and the orthophoto resolution. In our case, the selected resolution is set at 3 m2. A coarser resolution could harm the classification of the smallest classes (roads), and a finer resolution is not possible due to the LIDAR point density (0.5 points/m2). Apart from the resolution, it is necessary to supply a digital elevation model (DEM) to extract the actual heights of the LIDAR returns. In our case, this process is carried out by an adaptive morphological filter [9]. In addition, expert knowledge was applied to manually classify about 2% of the total data (7,399 instances). The expert knowledge leaned on the photographs taken in the same flight as the LIDAR data, and previous information from the Regional Ministry of Andalusia (LULC map from 2003) was collected by an operator to build the training set. The next step (step 2 in Fig. 1) is to calculate a set of variables from the image RGB values, the LIDAR intensity, and the heights and distribution of the LIDAR returns for each pixel (a total of 500,000 pixels). In this manner, sixty-one different measures were calculated for every pixel. Most of the variables used have been extracted from the literature [10][2]. A summary of these features can be seen in Table 1. Especially interesting is the case of the normalized difference vegetation index (NDVI). The classical NDVI is generated from the near-infrared band (NIR) and the red band (R), as can be seen in Equation 1. In our case, it cannot be calculated since the NIR band is not available in LIDAR or orthophotography.
Thus, the new attribute SNDVI has been used to simulate the NDVI, using the intensity (I) from LIDAR as the near-infrared value (Equation 2), which approximates the real NIR value.

NDVI = (NIR - R) / (NIR + R)    (1)

SNDVI = (I - R) / (I + R)    (2)
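Equations 1 and 2 translate directly into vectorized code (a sketch with hypothetical per-pixel values; only Eq. 2 is computable here, since no NIR band is available in the data):

```python
import numpy as np

def sndvi(intensity, red):
    # Eq. 2: the LIDAR return intensity I stands in for the missing
    # near-infrared band of the classical NDVI (Eq. 1).
    return (intensity - red) / (intensity + red)

# Hypothetical pixels: vegetation reflects strongly in the NIR-like
# intensity channel, so its index is positive; the second pixel is
# closer to bare soil and comes out negative.
red = np.array([30.0, 120.0])
intensity = np.array([200.0, 90.0])
print(sndvi(intensity, red))
```

In the full pipeline this would be evaluated per pixel of the raster, alongside the other statistics of Table 1.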
J. Garcia-Gutierrez, D. Mateos-Garcia, and J.C. Riquelme-Santos
A new method called R-STACK, based on a modified stacking of two well-known classifiers (SVM and k-NN), has been developed for the model generation. The Weka [11] implementation of SVM and an ad-hoc k-NN implementation were used for each classifier, respectively. Moreover, the general stacking scheme has been modified to adapt it to geographic data. In this way, the first level (steps 5 and 6) consists of an SVM which takes every feature from the pixels in the training area to build an initial model that classifies every pixel of the study zone. Up to that point, the result is a classical SVM application on images.
Input
  l: LIDAR data
  o: Orthophotography data
Output
  m: LULC map
Begin
  1. Develop a matrix raster in which every cell involves a physical position
  2. Add the corresponding statistics from l and o to each pixel in raster
  3. Select a training set from raster, called train
  4. Label each pixel in train using expert knowledge
  5. Build a SVM model, svm, from train
  6. Use svm to classify every pixel in raster
  7. For each pixel p in raster
     7.1. Collect the neighbourhood of p in a set s
     7.2. Build a k-NN model, knn, from s
     7.3. Use knn to classify p
  8. Return a map m with every pixel spatial position and its label
End
Fig. 1. The LULC classification method based on the R-STACK algorithm (steps 6 to 8)
The novelty of the R-STACK method lies in the second level (step 7); particularly, in the application of several classifiers (k-NN) and the way they are trained. Thus, a k-NN is built for each pixel taking its neighbours in the raster as the training set, which involves a strong relation (physical dependence) between the training pixels and the current pixel to classify. For the study area, we work with k = 3 and 8-adjacency; that is, each 3-NN is developed with just the 8 instances surrounding the pixel. For this reason, the process remains tractable in terms of efficiency and complexity. In the end, the k-NN classifies the current pixel again, using the model built from its neighbours. In this way, possible inconsistencies and undesired effects can be removed. It is important to point out that it is necessary to make a copy of the raster before this last process: while the classes in the original raster are modified, every k-NN has to be built taking the neighbours from the raster copy in order to avoid collateral
Table 1. Sixty-one candidate variables. Variables with (*) are calculated for each band of a pixel: Height (H), Intensity (I), Red (R), Green (G) and Blue (B).

SNDVIMIN: SNDVI minimum
SNDVIMAX: SNDVI maximum
SNDVISTD: SNDVI standard deviation
SNDVIAVG: SNDVI average
MIN(*): Minimum
MAX(*): Maximum
STD(*): Standard deviation
AVG(*): Average
VAR(*): Variance
SKEW(*): Skewness
KURT(*): Kurtosis
RANGE(*): Range
NOTFIRST: Second or later return
EMP: Empty neighbours
ICV: Intensity coefficient of variation
HCV: Height coefficient of variation
SLP: Slope
CRR: Canopy relief ratio
PEC: Penetration coefficient
TOTALR: Total of returns
PCTN1: Unique return percentage
PCTN2: Double return percentage
PCTN3: Three or more returns percentage
PCTR1: First return percentage
PCTR2: Second return percentage
PCTR3: Third or later return percentage
PCTR31: PCTR3 over PCTR1
PCTR21: PCTR2 over PCTR1
PCTR32: PCTR3 over PCTR2
effects. Otherwise, the new classification sequence would affect the result of the remaining pixels.
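Step 7 of the algorithm can be sketched as follows (an illustrative implementation, not the authors' code; the feature layout and function name are ours). Note how neighbour labels are always read from the frozen copy of the raster, exactly to avoid the collateral effects discussed above:

```python
import numpy as np
from collections import Counter

def rstack_refine(features, labels, k=3):
    # Second level of R-STACK (step 7): for each pixel, build a k-NN from
    # its 8-adjacent neighbours and re-classify the pixel. Neighbour labels
    # are read from a frozen copy of the raster, so pixels that were already
    # re-classified cannot influence the remaining ones.
    rows, cols, _ = features.shape
    frozen = labels.copy()          # the raster copy made before this phase
    refined = labels.copy()
    for r in range(rows):
        for c in range(cols):
            neigh = [(r + dr, c + dc)
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0)
                     and 0 <= r + dr < rows and 0 <= c + dc < cols]
            # Distance in feature space to each neighbour, paired with the
            # neighbour's (frozen) first-level label.
            ranked = sorted(
                (float(np.linalg.norm(features[r, c] - features[nr, nc])),
                 int(frozen[nr, nc]))
                for nr, nc in neigh)
            votes = Counter(lbl for _, lbl in ranked[:k])
            refined[r, c] = votes.most_common(1)[0][0]
    return refined
```

With k = 3 and 8-adjacency as in the paper, each per-pixel model is trained on at most 8 instances, which keeps the cost of this second level low.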
4 Results
Two kinds of testing have been carried out to compare the efficiency of our approach against a classical SVM. The first test is based on statistical techniques. Since remote sensing data is expensive to generate, the comparison has to rest on artificial data splits. In our case, 100 splits were created from the original data so that each split contains about 740 instances. Then, a 10-fold cross-validation process is run for every split and the results are registered for the subsequent comparison process. We have used the procedure suggested in several works [12] for robustly comparing classifiers across multiple datasets in order to evaluate the statistical significance of the measured differences in algorithm ranks. The chosen procedure involves the use of a statistical test to compare classifiers with each other. Our objective was to compare a classical SVM with our approach in terms of accuracy; thus, the Wilcoxon procedure was selected as the appropriate test. A fair comparison of the algorithms is obtained by average ranks and, in this case, after the previous 100 10-fold cross-validation runs, our approach ranks first. With the measured average ranks, the Wilcoxon test checks whether they are significantly different from the mean rank r = 1.5 expected under the null hypothesis. Using a statistical package (MATLAB), the p-value for the Wilcoxon test turned out to be less than 5.72e-06, so the null
Table 2. A summary of the hold-out test for the classical SVM approach: confusion matrix over the ten classes (water, marshland, roads or railways, low, middle and high vegetation, buildings, coal deposits, dumps and mixed areas), with per-class TP rate, FP rate, precision and KIA (overall accuracy reported in the text: 84.6%).
Table 3. A summary of the hold-out test for the SVM + k-NN restricted stacking: confusion matrix over the same ten classes, with per-class TP rate, FP rate, precision and KIA (overall accuracy reported in the text: 87.6%).
hypothesis is rejected. Having found that the measured average ranks are significantly different (at α = 0.05), our rank-based analysis reveals that the accuracy of the classical SVM is significantly worse than that of our approach for this kind of data. The second type of testing is a hold-out process with previously classified data, which is the common form of testing in remote sensing. The test data set (600 instances) was selected from the original data set because it is especially difficult to classify, and it is not part of the training set. Table 2 and Table 3 show the hold-out results for the classic SVM and for our approach, respectively. The overall improvement is about 3%, which is a very important advance.
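The rank-based comparison above can be sketched as follows. The accuracies here are simulated, NOT the paper's measurements (we only borrow the reported 84.6% / 87.6% as rough means); in practice one would then feed the paired scores to a Wilcoxon signed-rank test such as `scipy.stats.wilcoxon`:

```python
import numpy as np

# Simulated paired accuracies over 100 data splits (illustrative only).
rng = np.random.default_rng(0)
svm_acc = rng.normal(0.846, 0.010, size=100)
rstack_acc = svm_acc + rng.normal(0.030, 0.005, size=100)

# Rank the two classifiers on each split (rank 1 = more accurate).
# Under the null hypothesis both would average the mean rank r = 1.5.
rstack_rank = np.where(rstack_acc > svm_acc, 1, 2)
svm_rank = 3 - rstack_rank
print(rstack_rank.mean(), svm_rank.mean())
```

An average rank well below 1.5 for one method, as the paper measures for R-STACK, is what the Wilcoxon test then checks for significance.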
5 Conclusions
In this paper, a new method based on an ensemble of multiple classifiers was used to improve LULC map accuracy. The method builds a statistical raster from LIDAR and image fusion data following a pixel-oriented strategy. Then, the pixels from a training area are used to train an SVM and k-NN restricted stacking (called R-STACK) that takes into account the special characteristics of spatial data. A comparison between an SVM and the R-STACK method was carried out. The results in a riparian area of Huelva (Spain) showed a global accuracy of 84.6% for the classical SVM and 87.6% for the new approach, which represents a significant advance. Even though the results are satisfactory, there are still several problems to fix. Some of them are related to shadows in the images and their weight in the final classification, which has to be taken into account. Hence, a control of weights for each feature has to be implemented in order to avoid their misclassification effects; genetic algorithms could be a very suitable tool to solve this problem. In addition, dependence on the training set can be a more important problem: sometimes the training set can be incomplete, or not enough to describe the real space. These problems are harder to fix. Despite the fact that a semi-supervised approach seems more suitable to sort out this kind of problem, very few semi-supervised proposals can yet be found, and more research is needed in order to develop them with the required accuracy. Finally, some problems are inherent to pixel-oriented approaches, such as the detection of partial artificial structures. In the future, it would be very interesting to apply a prior phase in which, at a low additional computational cost, an object-oriented segmentation and classification could be carried out to extract the structures that are most difficult to classify, using visual recognition techniques from the computer vision world.

Acknowledgments.
We would like to thank the Regional Ministry of Andalusia for all the support received in the development of this work and, especially, to thank Irene Carpintero, Juan José Vales and Daniel Laguna for their much appreciated comments. We would also like to thank Francisco Martínez-Álvarez and Luis Gonçalves-Seco for all the time they invested that allowed this work to be completed.
References

1. Verrelst, J., Geerling, G., Sykora, K., Clevers, J.: Mapping of aggregated floodplain plant communities using image fusion of CASI and LIDAR data. International Journal of Applied Earth Observation and Geoinformation 11, 83-94 (2009)
2. Antonarakis, A., Richards, K., Brasington, J.: Object-based land cover classification using airborne LIDAR. Remote Sensing of Environment 112, 2988-2998 (2008)
3. Dalponte, M., Bruzzone, L., Vescovo, L., Damiano, G.: The role of spectral resolution and classifier complexity in the analysis of hyperspectral images of forest areas. Remote Sensing of Environment 113, 2345-2355 (2009)
4. Shao, G., Wu, J.: On the accuracy of landscape pattern analysis using remote sensing data. Landscape Ecology 23, 505-511 (2008)
5. Fauvel, M., Benediktsson, J., Chanussot, J., Sveinsson, J.: Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles. IEEE Transactions on Geoscience and Remote Sensing 46(11), 3804-3814 (2008)
6. Mitrakis, N., Topaloglou, C., Alexandridis, T., Theocharis, J., Zalidis, G.: Decision fusion of GA self-organizing neuro-fuzzy multilayered classifiers for land cover classification using textural and spectral features. IEEE Transactions on Geoscience and Remote Sensing 46(7), 2137-2152 (2008)
7. Waske, B., van der Linden, S.: Classifying multilevel imagery from SAR and optical sensors by decision fusion. IEEE Transactions on Geoscience and Remote Sensing 46(5), 1457-1466 (2008)
8. Cortijo, F.J., Blanca, N.P.D.L.: Improving classical contextual classifications. International Journal of Remote Sensing 19(8) (1998)
9. Goncalves-Seco, L., Miranda, D., Crecente, R., Farto, J.: Digital terrain model generation using airborne LIDAR in a forested area of Galicia. In: Proceedings of the 7th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Spain, pp. 169-180 (2006)
10. Hudak, A.T., Crookston, N.L., Evans, J.S., Halls, D.E., Falkowski, M.J.: Nearest neighbor imputation of species-level, plot-scale forest structure attributes from LIDAR data. Remote Sensing of Environment 112, 2232-2245 (2008)
11. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: An update. SIGKDD Explorations 11(1) (2009)
12. Garcia, S., Herrera, F.: An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. Journal of Machine Learning Research 9, 2677-2694 (2008)
A Bio-inspired Fusion Method for Data Visualization

Bruno Baruque (1) and Emilio Corchado (2)

(1) Department of Civil Engineering, University of Burgos, C/ Francisco de Vitoria s/n, 09006 Burgos, Spain
(2) Departamento de Informática y Automática, University of Salamanca, Plaza de la Merced s/n, 37008 Salamanca, Spain
[email protected], [email protected]
Abstract. This research presents a novel bio-inspired fusion algorithm based on the application of a topology preserving map called Visualization Induced SOM (ViSOM) under the umbrella of an ensemble summarization algorithm, the Weighted Voting Superposition (WeVoS). The presented model aims to obtain more accurate and robust maps, also increasing the model's stability by means of an ensemble training schema and a posterior fusion algorithm, both of which are very suitable for visualization and also classification purposes. This model may be applied alone or within the frame of hybrid intelligent systems, for instance in the recovery phase of a case-based reasoning system. For the sake of completeness, a comparison of its performance with other topology preserving maps and previous fusion algorithms on several public data sets obtained from the UCI repository is also included.
1 Introduction
One of the main problems for data analysis nowadays is not the difficulty in obtaining data, but the extraction of useful information from the huge amount of data that almost every business management, industrial or scientific process generates. Also, the organization and classification of already existing data for posterior use is a primary concern when talking about knowledge management and applications. Among the variety of tools at our disposal for these kinds of tasks, some of the most useful are Artificial Neural Networks (ANNs) [1], and those making use of unsupervised learning in particular, as no prior knowledge about the data set is needed for their training. Among these models, the topology preserving map family has proven very useful in tasks such as visual data inspection [2], data clustering and organization -due to their pattern matching capabilities- [3] or image processing [4], among others. It is a well-known phenomenon that, due to the usual use of randomness in the ANN training process, training two networks with the same parameters can lead to somewhat different results, it being a difficult task to identify one as better than the other. Ensemble [5] and fusion theory aim to obtain more accurate and robust models, also increasing their stability.

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 501-509, 2010. © Springer-Verlag Berlin Heidelberg 2010

This research is
based on the application of ensembles to a topology preserving map called Visualization Induced Self-Organizing Map (ViSOM) [6]. This type of algorithm can be used for classification and visualization purposes, and is therefore very suitable for performing these kinds of tasks within the frame of hybrid intelligent systems, for instance in the recovery phase of a case-based reasoning (CBR) system [7]. For the sake of completeness, a comparison of the performance with other topology preserving maps and previously presented fusion algorithms on several public data sets is also included.
2 Topology Preserving Mapping
The topology preserving maps comprise a family of techniques conceived as a visualization tool to enable the representation of high-dimensional data sets on 2-dimensional maps, thereby facilitating data interpretation tasks for human experts. The best known technique among them is the Self-Organizing Map (SOM) algorithm [8]. It is based on a type of unsupervised learning called competitive learning; an adaptive process in which the neurons in a neural network gradually become sensitive to different input categories, sets of samples in a specific domain of the input space [9]. One interesting extension of this algorithm is the Visualization Induced SOM (ViSOM) [6], proposed to directly preserve the local distance information on the map, along with the topology. The ViSOM constrains the lateral contraction forces between neurons and hence regularizes the inter-neuron distances, so that distances between neurons on the map are in proportion to those in the input space [10]. The difference between the SOM and the ViSOM hence lies in the update of the weights of the neighbours of the winner neuron, as can be seen from Eq. 1 and Eq. 2.

Update of neighbourhood neurons in the SOM:

w_k(t+1) = w_k(t) + \alpha(t)\,\eta(v,k,t)\,(x(t) - w_k(t))    (1)

where x denotes the network input; w_k the characteristics vector of each neuron; \alpha is the learning rate of the algorithm; and \eta(v,k,t) is the neighbourhood function, in which v represents the position of the winning neuron or Best Matching Unit (BMU) in the lattice, and k the positions of the neurons in its neighbourhood.

Update of neighbourhood neurons in the ViSOM, on the other hand, uses the following expression:

w_k(t+1) = w_k(t) + \alpha(t)\,\eta(v,k,t)\,\left[ (x(t) - w_v(t)) + \frac{d_{vk} - \Delta_{vk}\lambda}{\Delta_{vk}\lambda}\,(w_v(t) - w_k(t)) \right]    (2)

where d_{vk} and \Delta_{vk} are the distances between neurons v and k in the data space and on the unit grid (map), respectively, and \lambda is a positive pre-specified resolution parameter representing the desired inter-neuron distance -of two neighbouring nodes- reflected in the input space. The most common neighbourhood function used in this kind of model is the Gaussian function or, in particular cases, the difference of Gaussians.
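The two update rules differ only in the regularising term of Eq. 2. A direct transcription (a sketch; variable names are ours, and d_vk, delta_vk denote the data-space and grid distances between winner v and neighbour k):

```python
import numpy as np

def som_update(w_k, x, alpha, eta):
    # Eq. 1: move the neighbour w_k towards the input x.
    return w_k + alpha * eta * (x - w_k)

def visom_update(w_k, w_v, x, alpha, eta, d_vk, delta_vk, lam):
    # Eq. 2: the extra term regularises the inter-neuron distance d_vk
    # (data space) towards delta_vk * lambda (grid distance times the
    # resolution parameter).
    beta = (d_vk - delta_vk * lam) / (delta_vk * lam)
    return w_k + alpha * eta * ((x - w_v) + beta * (w_v - w_k))
```

When d_vk already equals delta_vk * lambda, the regularising term vanishes and the neighbour simply follows the winner, which is how the ViSOM keeps map distances proportional to input-space distances.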
3 Previous Work on Topology Preserving Algorithms Fusion

3.1 The Ensemble Meta-algorithm
In the field of AI, ensemble learning [5] is the process by which multiple models, such as classifiers or experts, are strategically generated and combined to solve a particular computational intelligence problem. In the case of this study, one of the simplest ensemble-based algorithms is used: the Bagging algorithm [11]. By randomly selecting a sub-set of the original data set several times, this algorithm obtains several "replicated" data sets in which various entries from the original data set will appear -once or many times- and other entries will not appear at all. Training the same algorithm over each of the sub-sets results in a set of slightly different automated learning models, which are expected to overcome the problems that could arise with a single one. Closely related to ensemble learning, classifier fusion techniques have been the subject of study by many researchers [12,13]. The aim of these kinds of techniques is to obtain a single classifier able to improve on the performance of a single one, by training an array of several simpler but similar classifiers and finally "summarizing" them into a final one. A particular way of performing this fusion is at the model level. The advantage of this, apart from the improvement in classification performance and stability, is that a single model is obtained, which is easier to deal with. In the case of the present study the objective is the calculation of a single map since, when a visual inspection of a data set is required, simplicity is an essential characteristic.

3.2 Map Fusion by Euclidean Distance
The Map Fusion by Euclidean Distance algorithm [14] first searches for the neurons that are closest in the input space (selecting only one neuron in each network of the ensemble) and then fuses them to obtain the final neuron in the fused map. This process is repeated until all the neurons have been fused. The main characteristic of this approach is that a pair-wise match of the neurons of each network will always take place. When fusing two neurons, the neighbouring neurons are not taken into account. Fusing two neurons results in a neuron associated with a slightly different characteristics vector. In visual terms, this is the same as "shifting" the position of a neuron in a map. If this is done without taking account of the neighbouring neurons, two neurons considered neighbours will not necessarily be the two closest neurons of the network in the final fused map. The complete algorithm implementing this fusion method is detailed in the original publication [14].
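For two maps, the pairing step can be sketched as a greedy nearest-neighbour match (an illustration only; the complete algorithm, including the handling of more than two maps, is in [14]):

```python
import numpy as np

def fuse_by_distance(map_a, map_b):
    # Sketch of Fusion by Euclidean Distance: greedily pair each neuron
    # of map_a with its closest still-unmatched neuron in map_b and
    # average the two weight vectors. Note that neighbourhood relations
    # on the grid are ignored, which is the weakness discussed above.
    used = set()
    fused = []
    for wa in map_a:
        dists = [(float(np.linalg.norm(wa - wb)), j)
                 for j, wb in enumerate(map_b) if j not in used]
        _, j = min(dists)
        used.add(j)
        fused.append((wa + map_b[j]) / 2)
    return np.array(fused)
```

Because each neuron is shifted independently, two grid neighbours in the input maps need not end up close to each other in the fused map.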
4 A Novel Fusion Algorithm: Weighted Voting Superposition
The idea behind the novel fusion method presented in this study, Weighted Voting Superposition (WeVoS) [15], is to obtain the best position for a neuron, but also for its neighbours, unlike the previously explained method. As a consequence, the final map keeps one of the most important features of this type of algorithm: its topological ordering. In this study, WeVoS is applied for the first time to the ViSOM, using well-known data sets to perform a thorough study and comparison of its capabilities. The first step in this meta-algorithm is to calculate the "quality" -or rather, error measure [16,17]- of each of the neurons composing each map, in order to base the fusion of neurons on some kind of informed decision. The final map is also obtained on a neuron-by-neuron basis. First, the neurons of the final map are initialized by calculating the centroids of the neurons in the same position of the map grid in each of the trained maps. Then, a recalculation of the final position of each neuron uses the information associated with the neurons in that same position in each map of the ensemble. For each neuron, a sort of voting process is performed, as in Eq. 3:

V_{p,m} = \frac{b_{p,m}}{\sum_{i=1}^{M} b_{p,i}} \cdot \frac{q_{p,m}}{\sum_{i=1}^{M} q_{p,i}}    (3)

where V_{p,m} is the weight of the vote for the neuron included in map m of the ensemble, in position p; M is the total number of maps in the ensemble; b_{p,m} is the binary vector used for marking the data set entries recognized by the neuron in position p of map m; and q_{p,m} is the value of the desired quality measure for the neuron in position p of map m. The weights of the neurons are fed into the final network as the data inputs are during the training phase of a SOM, considering the "homologous" neuron in the final map as the BMU. The weights of the final neuron will be updated towards the weights of the composing neuron. The strength of the update performed for each "homologous" neuron in the composing maps depends on the quality measure calculated for each neuron: the higher the quality (or the lower the error) of the neuron of the composing map, the more strongly the neuron of the fused map is updated towards the weights of that neuron. The number of data inputs recognized by each neuron is also taken into account in this quantification of the "best suitability" of one neuron or another for the same position in the final map. So, in comparison with the previously presented method -Fusion by Euclidean Distance-, when updating the characteristics of a single neuron this approach takes into account not only the characteristics of that neuron, but also the topographic ordering of its neighbours. It is expected that this new approach will obtain maps that are more faithful to the inner structure of the data set from a visualization point of view.
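A compact sketch of Eq. 3 and the subsequent update follows (our own array layout and names; for simplicity we assume higher q means better quality, whereas the paper's measures are errors, where the relation is inverted):

```python
import numpy as np

def wevos_votes(b, q):
    # Eq. 3: voting weight V[p, m] for the neuron at position p in map m.
    # b[p, m] counts data entries recognised by that neuron, q[p, m] is
    # its quality measure (assumed here: higher = better).
    return (b / b.sum(axis=1, keepdims=True)) * (q / q.sum(axis=1, keepdims=True))

def wevos_fuse(weights, b, q, alpha=0.5):
    # weights: array of shape (P, M, D) holding the D-dimensional weight
    # vectors of the M ensemble maps at each of the P grid positions.
    # Initialise each fused neuron as the centroid of its homologues, then
    # update it towards each homologue in proportion to that neuron's vote,
    # treating the fused neuron as the BMU of a SOM-style update.
    V = wevos_votes(b, q)
    fused = weights.mean(axis=1)
    for m in range(weights.shape[1]):
        fused += alpha * V[:, m:m + 1] * (weights[:, m] - fused)
    return fused
```

The net effect is that, at every grid position, the fused neuron is pulled most strongly towards the homologue with the best quality and the largest share of recognised inputs.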
5 Experiments and Results
Several experiments have been performed to check the suitability of using the previously described fusion techniques within the frame of the mentioned topology preserving models. The data sets selected are Iris and Echocardiogram, both obtained from the UCI repository [18]. Many different measures can be found in the literature to assess the quality of a topology preserving map [17]. Since they deal with such usually subjective aspects as visual inspection, no single one of them is able to capture all aspects of the performance of this kind of algorithm. Instead, many researchers use different measures, considering them complementary. In the case of this work, two different quality measures have been used: Distortion [19] and "Goodness of Map" [20]. Both of them are error measures, so lower values are considered good results. For all the tests involving the fusion of networks, the procedure is the same. A simple n-fold cross-validation is used in order to employ all the data available for training and testing the model, and to have several executions over which to calculate an average of its performance. In each step of the cross-validation, first an ensemble of networks is obtained -by using the bagging algorithm-; then both fusion algorithms are computed; finally the quality measures are calculated employing the test fold, both for the single models and for the fusions obtained from the ensemble. In Fig. 1, four of the previously discussed maps are represented in the input space of the data set -its first 2 principal components- so that the way each of them adapts to the data set can be clearly compared. It can be seen how the WeVoS models (Fig. 1c and Fig. 1d) slightly modify the position of the grid of their corresponding single model (Fig. 1a and Fig. 1b) to better cover the input data space. The remaining model -Fusion by Euclidean Distance- was not included due to space constraints.
The second set of experiments performed (Fig. 2 and Fig. 3) consisted of progressively reducing the size of the data set used, in order to observe how this reduction, which progressively introduces instability into the data set, affects the performance of the models. Experimentally, five maps were selected as the most suitable number of components for the ensemble in this experiment, as it is important to have a certain variability in the ensemble to obtain significant results. From these analytical results it can clearly be inferred that Weighted Voting Superposition obtains consistently better results for the Distortion measure (Fig. 2a and Fig. 3a), which accounts for the topographic preservation of the models. As a consequence of that clear advantage, the Goodness of Map (Fig. 2b and Fig. 3b), which accounts for both the quantization and the topology error, is also worse for the Fusion by Euclidean Distance than for WeVoS in almost all tests. Another general observation that can be extracted from the results shown is that, as the Iris data set (Fig. 2) is quite simple and well defined, the effect of reducing the number of samples in the test does not significantly affect the results of the maps, with the exception of the Fusion by Euclidean Distance.
Fig. 1. Four of the models discussed -two single models and two ensemble fusion models- embedded in a 2D representation of the Iris data set. Both the data set and the grids are projected over the first two Principal Components of the data set. Panels: (a) SOM, (b) ViSOM, (c) WeVoS-SOM, (d) WeVoS-ViSOM.
Fig. 2. Results comparing single algorithms and the two ensemble fusion algorithms. Experiments performed varying the number of samples used from the Iris data set: (a) Distortion, (b) Goodness of Map.
Fig. 3. Results comparing single algorithms and the two ensemble fusion algorithms. Experiments performed varying the number of samples used from the Echocardiogram data set: (a) Distortion, (b) Goodness of Map.
This is due to the way the positions of the final units of this fusion are calculated which, as explained before, does not take their neighbourhood into account. On the other hand, decreasing the number of samples of the Echocardiogram data set (Fig. 3) clearly increases the instability of the results, although no clear tendency appears. Studying the analytical results in more detail, it can be concluded that when dealing with a rather simple data set such as Iris (150 entries, 4 dimensions), the use of the ensemble algorithm does not necessarily lead to better results. In Fig. 2a, it can be seen that Distortion is very similar for the single SOM and ViSOM models and slightly lower for both the WeVoS-SOM and the WeVoS-ViSOM. Although by a very small margin, the ViSOM proves to be a little better than the SOM, and the WeVoS-ViSOM a bit better than the WeVoS-SOM. On the contrary, the case is inverted for the Goodness of Map measure (Fig. 2b). In this case, although again with little difference, the single models obtain lower error than their WeVoS counterparts. These results mean that, although WeVoS helps improve the topographic preservation of the models, it can degrade their vector quantization performance. As can be seen from Fig. 3, when the data set is more complicated to interpret, such as the Echocardiogram data set (104 entries, 9 dimensions), the clear improvement of the results makes the extra effort of training an ensemble of maps worthwhile. The Distortion measure (Fig. 3a) is clearly improved by the fusion algorithms -both Fusion by Euclidean Distance and WeVoS-SOM-, with the WeVoS-SOM being the model obtaining the lowest error of the three. In this case, the ViSOM obtains a clearly lower error than the SOM. The Fusion by Distance of the ViSOM is not able to outperform the single model, but the WeVoS-ViSOM obtains generally better results than the single ViSOM. For the Goodness of Map (Fig. 3b), in this case the Fusion by Distance obtains far worse results than the rest. The single SOM and ViSOM obtain mixed results, although they are very close; especially when the size of the data set decreases to 63 entries, it could be concluded that in this case the SOM performs a bit better. In the case of the
WeVoS models, this situation is inverted, as the WeVoS-ViSOM obtains lower error than the WeVoS-SOM for a data set size of 63 entries or fewer.
6 Conclusions and Future Work
This work has presented a model for the fusion of topology preserving map algorithms. Its aim is to obtain a more truthful representation of the data set by enhancing one of its main features: topology preservation. Here it has been tested with the ViSOM model for the first time. The present work includes a comparison of the WeVoS-ViSOM model with previously devised fusion methods and its application to other topology preserving models such as the SOM.

Results seem to point to the fact that the use of the WeVoS algorithm improves the topology preservation of the final maps, as the decrease in the Distortion error shows. Due to the use of a subset of the whole training set, it is able to concentrate on interesting features found in each map that might not have been clearly registered when training a single map on the whole data set. By doing so, the final map will most likely perform worse on the quantization error, as it is not so focused on each of the samples of the data set. Therefore, this technique is more useful for obtaining a visual representation of the data set structure than as a vector quantization algorithm. It can also be concluded from the results that, as expected, the method is mainly useful when analyzing a complex data set with a high number of dimensions or a low number of entries. In these cases, the extra complexity of performing several training runs of the algorithm is compensated by a clear improvement of the results.

Future work includes a wider comparison of the WeVoS with other topology preserving maps and with more complex ensemble training algorithms, such as boosting, in order to clearly confirm the strengths and weaknesses of the algorithm. It also includes the use of the WeVoS in a real-life application that benefits from its particular characteristics, such as its inclusion in hybrid intelligent systems based, for instance, on the use of CBR or multi-agent system methodologies.

Acknowledgments.
This research has been partially supported through projects CIT-020000-2008-2 and CIT-020000-2009-12 of the Spanish Ministry of Education and Innovation and project BU006A08 of the Junta of Castilla and León (JCyL). The authors would also like to thank Grupo Antolín Ingeniería, S.A., a manufacturer of components for vehicle interiors, for its collaboration in the framework of the MAGNO 2008 – 1028 – CENIT project funded by the Spanish Ministry of Science and Innovation.
A Bio-inspired Fusion Method for Data Visualization
CBRid4SQL: A CBR Intrusion Detector for SQL Injection Attacks

Cristian Pinzón 1,2, Álvaro Herrero 3, Juan F. De Paz 1, Emilio Corchado 1, and Javier Bajo 1

1 Departamento de Informática y Automática, Universidad de Salamanca, Plaza de la Merced s/n, 37008, Salamanca, Spain {cristian_ivanp,fcofds,escorchado,jbajope}@usal.es
2 Universidad Tecnológica de Panamá, A.P: 0819-07289, Panamá, Rep. de Panamá
3 Department of Civil Engineering, University of Burgos, C/ Francisco de Vitoria s/n, 09006 Burgos, Spain [email protected]
Abstract. One of the most serious security threats to recently deployed databases has been the SQL Injection attack. This paper presents an agent specialised in the detection of SQL injection attacks. The agent incorporates a Case-Based Reasoning engine equipped with learning and adaptation capabilities for the classification of malicious code. The agent also incorporates advanced algorithms in the stages of its reasoning cycle. The reuse phase uses an innovative classification model based on a mixture of a neural network and a Support Vector Machine in order to classify the received SQL queries in the most reliable way. Finally, a neural visualisation technique is incorporated, which notably eases the revision stage carried out by human experts in the case of suspicious queries. The Classifier Agent was tested in a real-traffic case study and its experimental results, which validate the performance of the proposed approach, are presented here.

Keywords: SQL Injection, Intrusion Detection, CBR, SVM, Neural Networks.
1 Introduction

Over recent years, one of the most serious security threats around databases has been the SQL Injection attack [1]. In spite of being a well-known type of attack, SQL injection remains at the top of published threat lists. The solutions proposed so far [2], [3], [4], [5], [6], [7], [8] seem insufficient to prevent and block this type of attack, because they lack the learning and adaptation capabilities needed to deal with attacks and their possible future variations. In addition, the vast majority of solutions are based on centralized mechanisms with little capacity to work in distributed and dynamic environments. This study presents the intelligent agent CBRid4SQL (a CBR Intrusion Detector), capable of detecting attacks based on SQL code injection. CBRid4SQL is an agent specially designed following the strategy of an Intrusion Detection System (IDS) and is defined as a Hybrid Artificial Intelligence System (HAIS).

E.S. Corchado Rodriguez et al. (Eds.): HAIS 2010, Part II, LNAI 6077, pp. 510–519, 2010. © Springer-Verlag Berlin Heidelberg 2010

This agent is
the principal component of a distributed hierarchical multi-agent system aimed at detecting attacks in dynamic and distributed environments. The CBRid4SQL agent is a CBR agent [9], characterized by the integration of a CBR (Case-Based Reasoning) mechanism. This mechanism provides the agent with a greater level of adaptation and learning capability, since CBR systems make use of past experiences to solve new problems [9]. This is very effective for blocking SQL injection attacks, as the mechanism uses a strategy based on anomaly detection [10]. In addition to the CBR engine incorporated in the CBRid4SQL agent's internal structure, a mixture of an Artificial Neural Network (ANN) and a Support Vector Machine (SVM) is used as the classification mechanism. Through this mixture, it is possible to exploit the advantages of both strategies in order to classify SQL queries more reliably. Finally, to assist the expert in making decisions about queries classified as suspicious, a visualization mechanism is proposed which combines clustering techniques and neural models for dimensionality reduction based on unsupervised learning.

The rest of the paper is structured as follows: Section 2 presents the problem that has prompted most of this research work. Section 3 explains the internal structure of the CBRid4SQL agent used as a classifier agent. Finally, the conclusions and experimental results of this work are presented in Section 4.
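The CBR reasoning cycle that this mechanism relies on can be sketched as a minimal skeleton. All names and stage bodies below are illustrative placeholders, not the agent's actual implementation:

```python
# Minimal CBR loop skeleton (illustrative only).
class CBRAgent:
    def __init__(self):
        self.case_memory = []          # list of (problem, solution, outcome)

    def retrieve(self, problem, similarity, k=5):
        # Rank past cases by a similarity function over problem descriptions.
        return sorted(self.case_memory,
                      key=lambda case: similarity(problem, case[0]),
                      reverse=True)[:k]

    def reuse(self, problem, similar_cases):
        # Adapt the best-matching solution to the new problem (here: copy it).
        return similar_cases[0][1] if similar_cases else None

    def revise(self, problem, solution):
        # Confirm or correct the proposed solution (e.g. expert review).
        return solution

    def retain(self, problem, solution, outcome):
        # Store the solved case for future reasoning.
        self.case_memory.append((problem, solution, outcome))
```

In CBRid4SQL these four stages are specialised as described in Sections 3.1.1 to 3.1.4, with the reuse stage delegating to trained ANN and SVM models rather than to a simple copy of a past solution.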
2 SQL Injection Attacks

An SQL injection attack takes place when a hacker changes the semantic or syntactic logic of an SQL text string by inserting SQL keywords or special symbols within the original SQL command, which is executed at the database layer of an application [1]. Different attack techniques exist, including SQL tautologies, logic errors / illegal queries, union queries and piggy-backed queries. Other more advanced techniques use injection based on inference and alternative encodings [1]. The cause of SQL injection attacks is relatively simple: inadequate input validation at the user interface. As a result of such an attack, a hacker can carry out unauthorized data handling, retrieve confidential information and, in the worst case, take over control of the application server [1].

Different strategies have been presented as solutions to the problem of SQL injection attacks [1], with special attention given to strategies based on IDSs [2], [3], [4], [5], [6], [7], [8]. One approach based on anomaly detection was proposed in [2], applying a clustering strategy to group similar queries and isolate queries considered malicious. Its main disadvantage is a high computational overhead, which affects real-time detection. Kemalis and Tzouramanis propose SQL-IDS (SQL Injection Detection System) [3], which uses security specifications to capture the syntactic structure of the SQL queries generated by the applications. The main limitation of this approach is the computational cost of comparing each new query against the predefined structure at runtime. In [4], two types of SQL injection attacks are addressed: tautology attacks and those based on the UNION operator. Through the syntactic analysis of SQL query strings, the data of the HTTP requests are extracted to later be used in the training phase and
to determine the threshold to use in the evaluation phase. Bertino, Kamra and Early [5] propose an anomaly detection mechanism applying data mining techniques. The main problem of this approach is finding an adequate threshold that maintains a low rate of both false positives and false negatives. Another anomaly-based approach is proposed by Robertson, Vigna, Kruegel and Kemmerer [6]. It uses generalisation techniques to convert suspicious requests into anomaly signatures. These signatures are later used to group malicious requests which present similar characteristics. Another technique used is characterization: deducing the type of attack associated with the malicious request. A low computational overhead is generated; however, the approach is susceptible to generating false positives. García, Monroy and Quintana [7] propose the detection of attacks targeted at web applications, using the ID3 algorithm to detect and filter malicious SQL strings. This approach presents a significant percentage of incorrect classifications. Valeur, Mutz and Vigna [8] propose the use of anomaly detection through the generation of a series of models starting from a set of collected queries. At execution time, they monitor the applications in order to identify requests which are not associated with the aforementioned models.
3 An Agent for Detecting SQL Injection Attacks

Agents are characterized by their autonomy, which gives them the ability to work independently and in real-time environments [11]. The CBRid4SQL agent presented in this study interacts with other agents within the architecture. These agents carry out tasks related to capturing messages, syntactic analysis, administration and user interaction. As opposed to those agents, the CBRid4SQL agent performs the classification of SQL queries, which we subsequently define in greater detail.

CBR is a paradigm based on the idea that similar problems have similar solutions. Thus, a new problem is resolved by consulting the case memory to find a similar case which was resolved in the past. When working with this type of system, the key concept is that of "case". A case is defined as a previous experience and is composed of three elements: a description of the problem that depicts the initial problem; a solution that describes the sequence of actions performed in order to solve the problem; and the final state, which describes the state achieved once the solution is applied.

As previously mentioned, the CBRid4SQL agent is a specialization of a CBR agent; it is the key component of a multi-agent architecture and is geared towards classifying SQL queries for the detection of SQL injection attacks. Below, the new classification mechanism incorporated in the internal structure of the CBRid4SQL agent is explained in detail.

3.1 CBRid4SQL Agent

In this section the CBRid4SQL agent is presented, with special attention paid to its internal structure and its mechanism for classifying SQL attacks. This mechanism combines the advantages of CBR systems, such as learning and adaptation, with the predictive capabilities of a combination of ANNs and SVMs. The use of
this combination of techniques is based on the possibility of using two classifiers together to detect suspicious queries in the most reliable way possible.

In terms of CBR, a case is composed of the following elements of an SQL query: (a) the Problem Description, which describes the initial information available for generating a plan; it consists of the case identification, the user session and the SQL query elements. (b) The Solution, which describes the action carried out to solve the problem description, in this case the prediction models. (c) The Final State, which describes the state achieved after the solution has been applied. The fields defining a case are as follows: IdCase, Session, User, IP_Address, Query_SQL, Affected_table, Affected_field, Command_type, Word_GroupBy, Word_Having, Word_OrderBy, Numer_And, Numer_Or, Number_literals, Number_LOL, Length_SQL_String, Start_Time_Execution, End_Time_Execution, and Query_Category. Additionally, the information related to the prediction models used is stored as well.
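The case structure just listed can be captured, for illustration, as a simple container; the field names come from the paper, while the types are assumptions:

```python
from dataclasses import dataclass

# Illustrative container for a case; types are assumed, not specified in the paper.
@dataclass
class SQLQueryCase:
    IdCase: int
    Session: str
    User: str
    IP_Address: str
    Query_SQL: str
    Affected_table: str
    Affected_field: str
    Command_type: str
    Word_GroupBy: int
    Word_Having: int
    Word_OrderBy: int
    Numer_And: int
    Numer_Or: int
    Number_literals: int
    Number_LOL: int
    Length_SQL_String: int
    Start_Time_Execution: float
    End_Time_Execution: float
    Query_Category: str
```

Note that the numeric fields from Word_GroupBy through Length_SQL_String are the ones later fed to the classifiers, while Query_Category drives case retrieval.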
Fig. 1. CBR cycle and classification mechanism of the CBRid4SQL agent
Figure 1 shows the different stages applied in the reasoning cycle. In summary, in the retrieval stage, queries are selected by type together with the classification models stored in memory. In the reuse phase, as seen in Figure 1, a Multilayer Perceptron (MLP) and an SVM are applied simultaneously to carry out the prediction for the new query. Subsequently, an inspection is performed, which can be done automatically or by a human expert. If the query turns out to be suspicious, further inspection is carried out manually by a human expert. At this stage the most similar cases are selected by means of a Growing Cell Structures (GCS) network [12] and visualized by a dimensionality reduction technique which employs the
neural model called Cooperative Maximum Likelihood Hebbian Learning (CMLHL). As a result, the human expert can graphically see the relationship between the suspicious query and the recovered queries. During learning, the memory information regarding the cases and models is updated. Below, the different stages of the CBR reasoning cycle associated with the system are described in more detail.

3.1.1 Retrieve
The retrieval phase is broken down into two steps: case retrieval and model retrieval. Case retrieval is performed using the Query_Category attribute, which retrieves queries from the case memory (Cr) that were used for a similar query, in accordance with the attributes of the new case cn. Subsequently, the multilayer perceptron model mlpr and the SVM model svmr associated with the recovered cases are retrieved. Recovering these models from memory improves the system's performance, since the time needed to create the models is considerably reduced, mainly in the case of ANN training.

3.1.2 Reuse
The reuse phase starts from the information of the retrieved cases Cr and the recovered models mlpr and svmr. The combination of both techniques is fundamental to reducing the rate of false negatives. The inputs of the MLP are: Query_SQL, Affected_table, Affected_field, Command_type, Word_GroupBy, Word_Having, Word_OrderBy, Numer_And, Numer_Or, Number_literals, Number_LOL, and Length_SQL_String. The number of neurons in the hidden layer is 2n+1, where n is the number of neurons in the input layer. Finally, the output layer has one neuron. The activation function selected for the different layers is the sigmoid. Taking into account the activation function $f_j$, the output values are given by the following expression:

$$y_j^p = f_j\left(\sum_{i=1}^{N} w_{ji}(t)\, x_i^p(t) + \theta_j\right) \qquad (1)$$

The outputs correspond to $x^r$. As the hidden layer of the neural network contains sigmoidal neurons with values in [0, 1], the input variables are rescaled so that their range falls within [0.2, 0.8]. This transformation is necessary because the network does not deal well with values outside this range. The output values are similarly limited to the range [0.2, 0.8], with the value 0.2 corresponding to a non-attack and the value 0.8 corresponding to an attack. The network training is carried out through the error Backpropagation algorithm [13].

At the same time as the estimation by the neural network, an estimation is also carried out by the SVM, a supervised learning technique applied to the classification and regression of elements. The algorithm represents an extension of nonlinear models [14]. SVM also allows the separation of element classes which are not linearly separable. For this, the space of initial coordinates is mapped into a space of high dimensionality through the use of functions. Since the dimensionality of the new space can be very high, it is not feasible to compute directly the hyperplanes that produce linear separability. For this, a series of non-linear functions called kernels is used.
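The input rescaling to [0.2, 0.8] and the sigmoid forward pass of Eq. (1) can be sketched as follows; the weights are assumed to be already trained, and the caller is responsible for giving the hidden layer 2n+1 units as stated above:

```python
import numpy as np

def rescale(x, lo, hi, a=0.2, b=0.8):
    """Linearly map values from [lo, hi] into [a, b], as described above."""
    return a + (b - a) * (x - lo) / (hi - lo)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    """Eq. (1) applied layer by layer: one sigmoid hidden layer
    (shape (2n+1, n)) and a single sigmoid output neuron."""
    h = sigmoid(W1 @ x + b1)           # hidden activations
    return sigmoid(W2 @ h + b2)        # output in (0, 1)
```

An output near 0.2 then reads as a non-attack and an output near 0.8 as an attack, matching the target encoding described in the text.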
Let us consider a set of patterns $T = \{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$, where $x_i$ is a vector of dimension $n$. The idea is to map the elements $x_i$ into a space of high dimensionality through the application of a function, in such a way that the set of original patterns is converted into the set $\Phi(T) = \{(\Phi(x_1), y_1), (\Phi(x_2), y_2), \dots, (\Phi(x_m), y_m)\}$, which, depending on the selected function $\Phi(x)$, could be linearly separable. To carry out the classification, the sign of the following expression is studied [15]:

$$\mathrm{class}(x_k) = \mathrm{sign}\left(\sum_{i=1}^{m} \lambda_i\, y_i\, \Phi(x_i) \cdot \Phi(x_k) + b\right) \qquad (2)$$
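A minimal sketch of the decision rule of Eq. (2), using a polynomial kernel to evaluate $\Phi(x_i)\cdot\Phi(x_k)$ implicitly; the kernel degree and coefficient, as well as the trained multipliers $\lambda_i$ and bias $b$, are assumptions for illustration:

```python
import numpy as np

def poly_kernel(x, z, degree=2, c=1.0):
    # Polynomial kernel K(x, z) = (x.z + c)^degree; degree and c are assumed values.
    return (np.dot(x, z) + c) ** degree

def svm_class(xk, support_x, support_y, lambdas, b, kernel=poly_kernel):
    """Eq. (2): sign of the kernelized weighted sum over support vectors."""
    s = sum(lam * y * kernel(xi, xk)
            for lam, y, xi in zip(lambdas, support_y, support_x))
    return np.sign(s + b)
```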
The kernel function selected for this problem was polynomial. The values used for the estimation are the decision values, which are related to the distance from the points to the hyperplane. Once the output values of the ANN and the SVM are obtained, the mixture is computed as a weighted average as a function of the error rate of each technique. Before computing the average, the values are normalized to the interval [0, 1], since the SVM provides positive and negative values of larger magnitude, which could otherwise affect the final value disproportionately if not rescaled.

3.1.3 Revise
The revise phase can be manual or automatic depending on the output values. The automatic review applies to cases considered non-suspicious by the estimation obtained in the reuse phase. For cases detected as suspicious, with output values in an experimentally determined interval ([0.35, 0.6]), a review by a human expert is performed. As the CBR learns, the interval values are automatically adjusted according to the smallest of the false negatives, while the upper limit is kept constant throughout the iterations. The review consists of recovering queries similar to the current one together with their previous classifications. It combines a clustering technique for the selection of similar requests with a neural model for dimensionality reduction, which permits visualisation in 2D or 3D. The selection of similar cases is carried out by a GCS neural network: the different cases are distributed in meshes and the mesh in which the new case falls is selected. To visualize the cases in the selected mesh, the dimensionality of the data is reduced by means of the CMLHL neural model [16], which performs Exploratory Projection Pursuit by unsupervised learning. Considering an N-dimensional input vector $x$ and an M-dimensional output vector $y$, with $W_{ij}$ being the weight linking input $j$ to output $i$, CMLHL can be expressed as:

Feed-forward step:
$$y_i = \sum_{j=1}^{N} W_{ij}\, x_j, \quad \forall i \qquad (3)$$

Lateral activation passing:
$$y_i(t + 1) = \left[ y_i(t) + \tau (b - A y) \right]^{+} \qquad (4)$$

Feedback step:
$$e_j = x_j - \sum_{i=1}^{M} W_{ij}\, y_i, \quad \forall j \qquad (5)$$

Weight change:
$$\Delta W_{ij} = \eta \cdot y_i \cdot \mathrm{sign}(e_j)\, |e_j|^{p-1} \qquad (6)$$

where $\eta$ is the learning rate, $\tau$ the "strength" of the lateral connections, $b$ the bias parameter, $p$ a parameter related to the energy function [14], [15], and $A$ a symmetric matrix used to modify the response to the data [14]. The effect of this matrix is based on the relation between the distances separating the output neurons. Finally, the information is represented and the associated queries are recovered from the retrieved mesh, as can be seen in Fig. 2.

3.1.4 Retain
The learning phase updates the information of the newly classified case and rebuilds the classifiers offline to leave the system available for new classifications. The ANN classifier is rebuilt only when an erroneous classification is produced. In the case of an inspection of suspicious queries, the information and classifiers are updated when the expert updates the information.
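The CMLHL updates of Eqs. (3)-(6) described in the revise stage can be sketched as a single training pass over a batch; all parameter values here are illustrative, not those used in the experiments:

```python
import numpy as np

def cmlhl_step(X, W, A, eta=0.01, tau=0.1, b=0.1, p=1.5, n_lateral=5):
    """One CMLHL training pass over batch X (Eqs. 3-6).
    X: (n_samples, N); W: (M, N); A: (M, M) symmetric response matrix."""
    for x in X:
        y = W @ x                                        # (3) feed-forward
        for _ in range(n_lateral):                       # (4) lateral activation
            y = np.maximum(0.0, y + tau * (b - A @ y))   #     [.]+ rectification
        e = x - W.T @ y                                  # (5) feedback
        # (6) weight change
        W += eta * np.outer(y, np.sign(e) * np.abs(e) ** (p - 1))
    return W
```

After several such passes the rows of W define the low-dimensional projection used to display the suspicious query together with its neighbours.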
4 Experimental Results and Conclusions

A sample web application with access to a MySQL 5.0 database was developed to check the proposed approach. Once the database had been created, legal queries were sent from the designed user interfaces. In the case of malicious queries, the dispatch was automated using the agent SQLMAP0.5 [17]. This tool is able to fingerprint an extensive set of DBMS back-ends, retrieve remote DBMS databases and so on. To analyze the success rates, a query classification test was conducted with the following classifiers: Bayesian Network, Naive Bayes, AdaBoost M1, Bagging, DecisionStump, J48, JRIP, LMT, Logistic, LogitBoost, MultiBoosting AdaBoost, OneR, SMO and Stacking. The different classifiers were applied to 705 previously classified queries (437 legal, 268 attacks). The test was carried out as follows: selecting one of the cases, extracting it from the set, building the model from the remaining cases and classifying the extracted case. This process is repeated for each of the cases and techniques, so that each query is analyzed without having been used to build the model. The final result of the classification can be seen in Table 1.

Table 1. Total number of hits for the different classifiers
Method          Hits    Method          Hits    Method        Hits
BayesNet        638     Naive Bayes     666     AdaBoostM1    665
Bagging         684     DecisionStump   598     J48           689
JRIP            692     LMT             693     Logistic      688
LogitBoost      680     MultiBoostAB    666     OneR          622
SMO             685     Stacking        437     CBRid4SQL     698
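For convenience, the hit counts of Table 1 can be turned into success rates over the 705 test queries; a small sketch:

```python
# Hit counts from Table 1 over the 705 classified queries (437 legal, 268 attacks).
hits = {
    "BayesNet": 638, "Naive Bayes": 666, "AdaBoostM1": 665,
    "Bagging": 684, "DecisionStump": 598, "J48": 689,
    "JRIP": 692, "LMT": 693, "Logistic": 688,
    "LogitBoost": 680, "MultiBoostAB": 666, "OneR": 622,
    "SMO": 685, "Stacking": 437, "CBRid4SQL": 698,
}
TOTAL = 705

accuracy = {method: h / TOTAL for method, h in hits.items()}
best = max(accuracy, key=accuracy.get)   # CBRid4SQL, about 99.0%
```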
As can be seen in Table 1, the highest-performing system is CBRid4SQL, with a success rate of 698/705. The number of queries detected as suspicious was limited to 7; one of them is shown below:

select pedido_cliente.id_pedido, linea, codigo, nombre, precio from pedido_lineas, pedido_cliente, producto where pedido_cliente.id_pedido = pedido_lineas.id_pedido and producto.codigo = pedido_lineas.codigo and pedido_cliente.id_pedido = 1 OR 1 = 1 order by fecha desc
This query represents an attack on the database, since the presence of OR 1 = 1 implies the retrieval of a number of records not associated with the client's request. The value obtained by the ANN for this query was 0.28, whereas the SVM gave an output value of 0.66. The mixture produced an output value of 0.47, which is in the range of suspicious queries. If the ANN had been applied alone, it would have considered this query valid, whereas the SVM alone would have considered it an attack. The mixture deemed it suspicious, so a manual review would be carried out.
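For this example, the mixture can be illustrated numerically; equal weights are an assumption made here for simplicity (the paper weights each technique by its error rate, which it does not quantify):

```python
def mixture_score(ann_out, svm_decision, svm_lo, svm_hi, w_ann=0.5, w_svm=0.5):
    """Normalize the SVM decision value from [svm_lo, svm_hi] to [0, 1]
    and average it with the ANN output. Equal weights are an assumption."""
    svm_norm = (svm_decision - svm_lo) / (svm_hi - svm_lo)
    return w_ann * ann_out + w_svm * svm_norm

# ANN = 0.28, normalized SVM = 0.66 as in the example above.
score = mixture_score(0.28, 0.66, 0.0, 1.0)   # 0.47, inside [0.35, 0.6]
```

With these values the mixture lands in the suspicious interval, triggering the manual review described above.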
Fig. 2. SQL queries recovered in the revise stage
During the manual review, similar queries are recovered and the dimensionality is reduced. Figure 2 shows the results presented to the human expert. The most similar queries are coloured: legal queries are shown in green, attacks in red and the current query in blue. Non-recovered queries are shown in black. A set of queries different from the rest is recovered, both normal and abnormal. An example of each is shown below.

select pedido_cliente.id_pedido, linea, codigo, nombre, precio from pedido_lineas, pedido_cliente, producto where pedido_cliente.id_pedido = pedido_lineas.id_pedido and producto.codigo = pedido_lineas.codigo and pedido_cliente.id_pedido = 1 AND ORD(MID((CONCAT(CHAR(55), CHAR(55))), 1, 1)) > 63 order by fecha desc

select pedido_cliente.id_pedido, linea, codigo, nombre, precio from pedido_lineas, pedido_cliente, producto where pedido_cliente.id_pedido = pedido_lineas.id_pedido and producto.codigo = pedido_lineas.codigo and pedido_cliente.id_pedido = 1 AND 1 = 1 order by fecha desc
The first of the queries is a clear attack, while the second could also raise uncertainty due to the presence of the literal 1=1. Being a query more restrictive than the original, it would retrieve the same records or fewer, which would not be a very
intelligent strategy for an attack. In any case, the system considers it as such, provides an output value of 0.66 and thus also filters it a priori, but this should not be worrying: it is one of the 7 false positives that the system presents among the 705 queries.

The combination of different AI paradigms allows the development of a HAIS with characteristics such as the capacity for learning and reasoning, flexibility and robustness, which make the detection of SQL injection attacks possible. The proposed CBRid4SQL agent is capable of detecting these abnormal situations with low error rates compared with other existing techniques, as demonstrated in Table 1. It also provides a decision mechanism which eases the review of suspicious queries through the selection of similar queries and their visualization using neural models.

Acknowledgments. This development has been partially supported by the Spanish Ministry of Science and Technology project OVAMAH: TIN 2009-13839-C03-03, the Junta of Castilla and León (JCyL) project BU006A08, the Spanish Ministry of Education and Innovation projects CIT-020000-2008-2 and CIT-020000-2009-12, Grupo Antolin Ingenieria, S.A., within the framework of project MAGNO2008 - 1028.- CENIT, also funded by the same Government Ministry, and The Professional Excellence Program 2006-2010 IFARHU-SENACYT-Panama.
References

1. Halfond, W.G.J., Viegas, J., Orso, A.: A Classification of SQL-Injection Attacks and Countermeasures. In: Proceedings of the IEEE International Symposium on Secure Software Engineering, Arlington, VA, USA (2006)
2. Bockermann, C., Apel, M., Meier, M.: Learning SQL for Database Intrusion Detection Using Context-Sensitive Modelling (Extended Abstract). In: Flegel, U., Bruschi, D. (eds.) DIMVA 2009. LNCS, vol. 5587, pp. 196–205. Springer, Heidelberg (2009)
3. Kemalis, K., Tzouramanis, T.: SQL-IDS: A Specification-Based Approach for SQL-Injection Detection. In: Proceedings of the 2008 ACM Symposium on Applied Computing (SAC 2008). ACM, New York (2008)
4. Kiani, M., Clark, A., Mohay, G.: Evaluation of Anomaly Based Character Distribution Models in the Detection of SQL Injection Attacks. In: Third International Conference on Availability, Reliability and Security (ARES 2008). IEEE Computer Society, Washington (2008)
5. Bertino, E., Kamra, A., Early, J.: Profiling Database Applications to Detect SQL Injection Attacks. In: Proceedings of the Performance, Computing, and Communications Conference, IPCCC 2007 (2007)
6. Robertson, W., Vigna, G., Kruegel, C., Kemmerer, R.A.: Using Generalization and Characterization Techniques in the Anomaly-Based Detection of Web Attacks. In: 13th Annual Network and Distributed System Security Symposium, NDSS 2006 (2006)
7. García, V.H., Monroy, R., Quintana, M.: Web Attack Detection Using ID3. In: International Federation for Information Processing (2006)
8. Valeur, F., Mutz, D., Vigna, G.: A Learning-Based Approach to the Detection of SQL Attacks. In: Julisch, K., Krügel, C. (eds.) DIMVA 2005. LNCS, vol. 3548, pp. 123–140. Springer, Heidelberg (2005)
9. Corchado, J.M., Laza, R.: Constructing deliberative agents with case-based reasoning technology. International Journal of Intelligent Systems 18, 1227–1241 (2003)
10. Mukkamala, S., Sung, A.H., Abraham, A.: Intrusion detection using an ensemble of intelligent paradigms. Journal of Network and Computer Applications 28(2), 167–182 (2005)
11. Carrascosa, C., Bajo, J., Julian, V., Corchado, J.M., Botti, V.: Hybrid multi-agent architecture as a real-time problem-solving model. Expert Systems with Applications 34(1), 2–17 (2008)
12. Fritzke, B.: A Growing Neural Gas Network Learns Topologies. In: Advances in Neural Information Processing Systems, vol. 7. MIT Press, Cambridge (1995)
13. LeCun, Y., Bottou, L., Orr, G.B., Müller, K.R.: Efficient BackProp. In: Orr, G.B., Müller, K.-R. (eds.) NIPS-WS 1996. LNCS, vol. 1524, p. 9. Springer, Heidelberg (1998)
14. Corchado, E., Fyfe, C.: Connectionist Techniques for the Identification and Suppression of Interfering Underlying Factors. International Journal of Pattern Recognition and Artificial Intelligence 17(8), 1447–1466 (2003)
15. Corchado, E., MacDonald, D., Fyfe, C.: Maximum and Minimum Likelihood Hebbian Learning for Exploratory Projection Pursuit. Data Mining and Knowledge Discovery 8(3), 203–225 (2004)
16. Herrero, Á., Corchado, E., Sáiz, L., Abraham, A.: DIPKIP: A Connectionist Knowledge Management System to Identify Knowledge Deficits in Practical Cases. Computational Intelligence 26(1), 26–56 (2010)
17. Damele, B.: SQLMAP 0.5 – Automated SQL Injection Tool (2007)
Author Index
Abuín, Javier Sanchez I-524
Aguilar, Ramiro II-53
Alberdi, Amaia I-327
Alonso, J.B. I-302
Alonso, Luis II-53
Alonso, Ricardo S. II-111
Alonso-Ríos, D. II-217
Álvarez, A. I-302
Álvarez, Ignacio I-468, I-476
Álvarez-Sánchez, José Ramón I-245
Amanatiadis, A. II-391
Arana, R. I-270
Aranda-Corral, Gonzalo A. II-383
Araujo, Álvaro II-486
Araújo, Ricardo de A. II-351
Arbelaitz, Olatz II-151
Argente, Estefanía II-159, II-193
Armano, Giuliano I-548
Artaza, F. I-343
Artaza, Fernando I-368
Asla, N. I-286
Assadipour, Ghazal I-359
Ávila-Jiménez, José Luis II-9
Bahig, Hatem M. II-209
Baig, Abdul Rauf I-56
Bajo, Javier I-96, II-444, II-510
Baldassarri, Paola II-296
Banković, Zorana II-486
Barroso, N. I-196
Barroso, O. I-196
Baruque, Bruno II-501
Batista, Vivian F. López I-104
Batouche, Mohammed I-48
Bellas, Francisco I-88
Berlanga, Antonio II-436
Bernardo, Jon Alzola I-319
Bernardos, Ana M. II-468
Blankertz, Benjamin I-413
Blesa, Javier II-486
Borrego-Díaz, Joaquín II-383
Botía, Juan A. I-64
Botia, Juan A. I-80
Boto, Fernando I-500, I-524
Botti, Vicente II-177
Bragaglia, Stefano I-438
Burduk, Robert I-532
Buza, Krisztian I-557
Caamaño, Pilar I-88
Campos, Jordi II-168
Cano, Alberto II-17
Carbonero-Ruz, M. II-280
Carrascal, Alberto I-327
Casar, José R. II-468
Casillas, Jorge II-1
Castaños, David Lecumberri I-492
Castelo, Francisco Javier Perez I-385
Castro, Paula M. II-248
Cavazos, Alberto I-429
Chaves, Rosa I-446, I-452, I-468, I-476, I-516
Chen, Jungan II-201
Chen, Wenxin II-201
Chesani, Federico I-438
Chinga-Carrasco, Gary I-144
Chira, Camelia I-405, II-119
Chmielnicki, Wieslaw I-162
Chyzyk, Darya II-429
Ciampolini, Anna I-438
Cilla, Rodrigo II-436
Cimenbicer, Cem I-178
Claver, José M. II-233
Corchado, Emilio II-101, II-501, II-510
Corchado, Juan M. II-53, II-85, II-93, II-444
Corrales-García, Alberto II-233
Couso, Inés II-45
Crişan, Gloria-Cerasela I-405
Cruz-Ramírez, M. II-280, II-288
Cuadra, J.M. I-245
Cuenca, Pedro II-225
Cuevas, F.J. I-40
Cyganek, Boguslaw I-254
d'Anjou, Alicia II-241
Dapena, Adriana II-248
de Blas, Mariano I-500
de Gauna, Óscar Berasategui Ruiz I-319
de Goyeneche, Juan-Mariano II-486
de Ipiña, K. López I-196, I-286, I-508
de la Cal, E.A. II-143
de la Cal, Enrique I-421
de la Hoz Rastrollo, Ana Belén I-492
delaPaz, F. I-245
de la Prieta, Fernando II-61, II-93
de Lope, Javier II-77
del Río, B. Baldonedo II-217
de Luis, Ana I-96
de Mendivil, Rafael Yuguero González I-319
De Miguel Catoira, Alberto I-395
De Paz, Juan F. II-510
De Paz, Juan F. I-229, II-85, II-111
Díaz, I. I-237
Djaghloul, Haroun I-48
Domínguez, Raúl II-77
Dong, Jun I-136
Dragoni, Aldo Franco II-296
Duro, Richard J. I-88
Ercan, M. Fikret I-24
Esgin, Eren I-178
Esmi, Estevão II-343
Esparcia, Sergio II-159, II-193
Esseghir, M.A. I-351
Esteva, Marc II-168
Ezeiza, A. I-196
Fenner, Trevor I-152
Fernández, E. I-245
Fernández, E.M. II-143
Fernández-Escribano, Gerardo II-233
Fernandez-Gauna, Borja I-73, II-312, II-335
Fernández-Navarro, F. II-280
Ferrández, J.M. I-245
Ferrer, M.A. I-302
Foresti, Gian Luca II-452
Fraga, David II-486
Fraile, Juan A. I-96
Fuangkhon, Piyabute I-128
Fuerte, Mercedes Villa II-135
Fuertes, Juan José I-302
Gajić, Vladeta I-205
Galdós, Andoni I-213
Garay, Naiara Telleria I-492
Garcia, Ander II-151
García-Fornes, Ana II-193
García, Guillermo I-500, I-524
García-Gutiérrez, Jorge II-272
Garcia-Gutierrez, Jorge II-493
García, Jesús II-460
García, María N. Moreno I-104
García-Naya, José A. II-248
García, Óscar II-111
Garcia, Ramon Ferreiro I-385, I-395
García-Sedano, Javier A. I-319
García-Tamargo, Marco I-421
Garcia-Valverde, Teresa I-80
García, Yazmani II-127
Garrido, Antonio II-225
Garrido-Cantos, Rosario II-225
Garro, Beatriz A. I-376
Gascón-Moreno, J. II-304
Ghorbani, Ali A. I-1
Gibaja, Eva II-9
Gil, Ana II-61, II-85
Gil, Óscar II-111
Goenetxea, Jon I-213
Gómez-Garay, Vicente I-368
Gomez, L.E. I-40
Gómez, V. I-343
Goncalves, Gilles I-351
González, Angélica II-111
González, Asier González I-319
González, Michel II-1
Gorawski, Marcin I-187
Górriz, J.M. I-452, I-460, I-476, I-484, I-516
Górriz, Juan-Manuel I-446, I-468
Graczyk, Magdalena I-581
Graña, Manuel I-500, I-524
Gutiérrez, P.A. II-280
Hatami, Nima I-548
Heras, Stella II-177
Hernández, Angeles I-429
Hernández, Carmen II-69
Hernández, M.C. I-508
Hernández, Paula Hernández II-135
Herrero, Álvaro II-101, II-510
Hervás-Martínez, C. II-280, II-288
He, Xingxing II-320
Hillairet, Guillaume I-311
Hoffmann, Matej II-478
Hogan, Emilie II-399
Ibarguren, A. I-270
Iglesia, Daniel II-248
Illán, I.A. I-446, I-452, I-516
Iragorri, Eider Egilegor I-492
Irigoyen, E. I-286, I-343
Irigoyen, Eloy I-368
Jabeen, Hajira I-56
Jackowski, Konrad I-540
Jeon, Sungchae I-278
Jessel, Jean-Pierre I-48
Jiang, Jianmin I-120
Jimenez, J.F. I-40
Jolai, Fariborz I-359
Joslyn, Cliff II-399
Julián, Vicente II-101, II-177, II-193
Jureczek, Pawel I-187
Kaburlasos, Vassilis G. II-391, II-410
Kajdanowicz, Tomasz I-573
Kazienko, Przemyslaw I-573
Keck, I.R. I-460
Kim, Eunyoung I-278
Kim, Minkyung I-278
Kocić-Tanackov, Sunčica I-32
Kodewitz, A. I-460
Koene, Randal A. II-478
Kotb, Yasser II-209
Kramer, Oliver I-221, I-262
Kraszewski, Jan I-573
Lang, Elmar I-468
Lang, Elmar W. I-460
Larrea, M. I-343
Lasota, Tadeusz I-581
Lässig, Jörg I-262
Legarreta, Jon Haitz I-500, I-524
Liang, Feng II-201
Liang, Ximing I-24
Linaza, Maria Teresa II-151
Liu, Jun II-320, II-328
Liu, Xia I-136
Li, Xiang I-24
Li, Yingfang II-320
Llinas, James I-14
López-Guede, José Manuel I-73, I-492, II-241, II-312
López, Miriam I-446, I-452, I-468, I-476, I-516
López, Otoniel II-256
López-Sánchez, Maite II-168
López, Vivian F. II-53, II-61
Lorente, V. I-245
Lucas, Joel Pinho I-104
Luna, J.M. II-27
Lu, Xiaofen I-335
Macía, Iván I-500, I-524
Madrazo, Eva II-468
Maiora, Josu I-500, I-524
Malagón, Pedro II-486
Malumbres, Manuel P. II-256
Maravall, Darío II-77
Martí, Enrique II-460
Martínez, E. I-508
Martínez-Estudillo, F.J. II-288
Martínez, José Luis II-225, II-233
Martinez, Luis II-320, II-328
Martínez-Otzeta, J.M. I-270
Martínez, R. I-286
Martínez-Rach, Miguel II-256
Martin, Marcel I-221
Martín, M. José Polo I-104
Mata-Jiménez, Marco-Tulio I-429
Matei, O. II-119
Mateos-García, Daniel II-272
Mateos-Garcia, Daniel II-493
Maycock, Jonathan I-221
Mazzieri, Mauro II-296
Mello, Paola I-438
Mendez, Gerardo M. I-429
Mirkin, Boris I-152
Molina, Jose M. II-436
Molina, Jose Manuel II-460
Montali, Marco I-438
Montañés, E. I-237
Morell, Carlos II-1
Moreno, Aitor I-213
Moreno, María N. II-53
Moreno, Ramón II-241
Mosqueira-Rey, E. II-217
Mosquera, Antonio II-264
Moya, José M. II-486
Müller, Klaus-Robert I-413
Muñoz, Andrés I-64
Nanopoulos, Alexandros I-557
Nascimento, Susana I-152
Navarro, Martí II-101
Nieto-Taladriz, Octavio II-486
Novo, Jorge II-264
Ochoa, Alberto II-127
Olivares, Alberto I-484
Olivares, Gonzalo I-484
Oliver, José II-256
Onaindia, Eva II-185
Onut, Iosif-Viorel I-1
Ortiz-García, E.G. II-304
Oses, Noelia II-478
Otero, José II-45
Padilla, Pablo I-446, I-452, I-468, I-476, I-516
Paloc, Céline I-500, I-524
Pan, Xiaodong II-328
Papadakis, S.E. II-391
Patricio, Miguel A. II-436
Pauplin, Olivier I-120
Pechenizkiy, Mykola II-35
Peláez-Moreno, Carmen II-375
Peña, Carlos Pertusa I-492
Penedo, Manuel G. II-264
Pereira, Luís Moniz I-152
Pérez-Bellido, A.M. II-304
Pérez, Javier I-229
Pérez-Lancho, Belén II-444
Piñol, Pablo II-256
Pintea, Camelia-M. I-405
Pinzón, Cristian II-510
Pinzón, Cristian I-229
Pop, P.C. II-119
Portilla-Figueras, A. II-304
Prado-Gesto, D. II-217
Prieto, Abraham I-88
Puntonet, C.G. I-476, I-516
Quevedo, J.R. I-237
Quiroga, R. II-143
Ramírez, Javier I-452, I-468, I-476, I-484, I-516
Ramírez, J. I-446
Ramírez Moreno, M.C. II-288
Ramos, Lucía II-264
Ranilla, J. I-237
Raveaux, Romain I-311
Reyes, Laura Cruz II-135
Riquelme-Santos, José C. II-272, II-493
Ritter, Gerhard X. II-359, II-367
Rodríguez-Poch, E. II-217
Rodríguez-Sánchez, Rafael II-233
Rodríguez, Sara II-85, II-93, II-444
Rolle, Jose Luis Calvo I-385
Romero, Elena II-486
Romero, J.R. II-27
Rouco, José II-264
Ruan, Da II-320, II-328
Ruz, Mariano Carbonero II-288
Salamat, Nadeem I-294
Salas-Gonzalez, D. I-446, I-452, I-476, I-516
Salas-González, Diego I-468
Salcedo-Sanz, S. II-304
Sánchez-Anguix, Víctor II-193
Sánchez, José Luis II-233
Sánchez, Luciano II-45
Sánchez-Monedero, Javier II-288
Sannelli, Claudia I-413
Santillán, Claudia Gómez II-135
Sanz, Beatriz Ferreiro I-395
Sapena, Oscar II-185
Satzger, Benjamin I-262
Savio, Alexandre II-429
Schmidt, Florian Paul I-221
Schmidt-Thieme, Lars I-557
Sedano, J. II-143
Sedano, Javier I-421
Segovia, Fermin I-446, I-452, I-468, I-476, I-516
Segura, Álvaro I-213
Senkul, Pinar I-178
Seo, Changwoo I-278
Serrano, Emilio I-80
Simić, Dragan I-32, I-205
Simić, Svetlana I-205
Şimşek, İrfan I-170
Sitar, C. Pop II-119
Slimani, Yahya I-351
Snidaro, Lauro II-452
Sossa, Humberto I-40, I-376, II-418
Sottara, Davide I-438
Souffriau, Wouter II-151
Sremac, Siniša I-32
Stańczyk, Urszula I-565
Stąpor, Katarzyna I-162
Susperregi, U. I-196
Sussner, Peter II-343, II-351
Tanackov, Ilija I-32
Tang, Ke I-335
Tanprasert, Thitipong I-128
Tapia, Dante I. I-96, II-93
Telec, Zbigniew I-581
Tellaeche, A. I-270
Tepić, Jovan I-32
Termenon, Maite II-429
Teymanoglu, Yaddik II-127
Tomé, A.M. I-460
Tong, Jia-fei I-136
Topuz, Vedat I-112, I-170
Torreño, Alejandro II-185
Travieso, Carlos M. I-302
Trawiński, Bogdan I-581
Unzueta, Luis I-213
Urcid, Gonzalo II-359, II-367
Valdiviezo-N., Juan Carlos II-359
Valera, J. I-343
Vallejo, Juan Carlos II-486
Vallesi, Germano II-296
Valverde-Albacete, Francisco J. II-375
Vansteenwegen, Pieter II-151
Vaquero, C. I-508
Vázquez, Roberto A. I-376, II-418
Veganzones, Miguel A. II-69
Vega, Pastora II-85
Ventura, Sebastián II-9, II-17, II-27, II-35
Vidaurre, Carmen I-413
Vilches, Borja Ayerdi I-492
Villanueva, Daniel II-486
Villar, Jose R. I-421
Villar, J.R. II-143
Villaverde, Ivan II-335
Visentini, Ingrid II-452
Wozniak, Michal I-590
Xu, Yang II-320, II-328
Yamakawa, Asuka I-144
Yáñez, Javier II-127
Yao, Xin I-335
Yavuz, Erdem I-112
Zafra, Amelia II-17, II-35
Zahzah, El-hadi I-294
Zato, Carolina I-229, II-444
Zezzatti, Carlos Alberto Ochoa Ortíz II-135
Zmyslony, Marcin I-590
Zulueta Guerrero, Ekaitz I-492
Zulueta, Ekaitz I-73, II-312, II-335