ATLANTIS AMBIENT AND PERVASIVE INTELLIGENCE, VOLUME 4
SERIES EDITOR: ISMAIL KHALIL
Atlantis Ambient and Pervasive Intelligence Series Editor: Ismail Khalil, Linz, Austria (ISSN: 1875-7642)
Aims and scope of the series
The book series ‘Atlantis Ambient and Pervasive Intelligence’ publishes high quality titles in the fields of Pervasive Computing, Mixed Reality, Wearable Computing, Location-Aware Computing, Ambient Interfaces, Tangible Interfaces, Smart Environments, Intelligent Interfaces, Software Agents and other related fields. We welcome submission of book proposals from researchers worldwide who aim at sharing their results in this important research area. For more information on this series and our other book series, please visit our website at: www.atlantis-press.com/publications/books
AMSTERDAM – PARIS © ATLANTIS PRESS
Activity Recognition in Pervasive Intelligent Environments
Liming Chen
School of Computing and Mathematics, University of Ulster, Shore Road, Newtownabbey, County Antrim BT37 0QB, United Kingdom
Chris D. Nugent
School of Computing and Mathematics, University of Ulster, Shore Road, Newtownabbey, County Antrim BT37 0QB, United Kingdom
Jit Biswas
Networking Protocols Department, Institute of Infocomm Research (I2R), Singapore
Jesse Hoey
School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada
AMSTERDAM – PARIS
Atlantis Press 8, square des Bouleaux 75019 Paris, France For information on all Atlantis Press publications, visit our website at: www.atlantis-press.com Copyright This book, or any parts thereof, may not be reproduced for commercial purposes in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system known or to be invented, without prior permission from the Publisher.
Atlantis Ambient and Pervasive Intelligence
Volume 1: Agent-Based Ubiquitous Computing – Eleni Mangina, Javier Carbo, José M. Molina
Volume 2: Web-Based Information Technologies and Distributed Systems – Alban Gabillon, Quan Z. Sheng, Wathiq Mansoor
Volume 3: Multicore Systems On-Chip: Practical Software/Hardware Design – Abderazek Ben Abdallah
ISBNs
Print: 978-90-78677-42-0
E-Book: 978-94-91216-05-3
ISSN: 1875-7642
© 2011 ATLANTIS PRESS
Preface
We have now entered an era where technology is being embedded transparently and seamlessly within our surrounding environments. This is being driven by decreasing hardware costs, increased functionality and battery life, along with improved levels of pervasiveness. With such a technology-rich paradigm we are now witnessing, for the first time, intelligent environments with the ability to provide support within our homes, the community and the workplace. The knock-on effect is an improved living experience within the environment along with increased levels of independence.

At a general level we can decompose the construct of an intelligent environment into three main components. The first is the core sensing technology, which has the ability to record interactions with the environment. These recordings may take the form of, for example, video, contact sensor or motion sensor data. The second is a data processing module, which has the task of inferring decisions based on the information gleaned from the sensing technology. The third and final component provides feedback to those within the environment via a suite of multi-modal interfaces.

The aim of this text is to focus on the data processing module, and specifically on the notion of activity recognition. Within the domain of intelligent environments some may take the view that the process of activity recognition forms the critical path in providing a truly automated environment. It is tasked with extracting and establishing meaningful activities from a myriad of sensor activations. Although work in this area is still deemed to be emerging, the initial results achieved have been more than impressive. This text represents a consolidation of 14 chapters presenting leading research results in the area of activity recognition. The material addressed ranges from collective state-of-the-art reviews, to probabilistic and ontology-based reasoning approaches, to specific examples in the area of assistance with activities of daily living.
The text is intended for those working within the area of intelligent environments who require a detailed understanding of the processes of activity recognition along with their technical variances. The inclusion of specific case studies assists with the further contextualisation of the theoretical concepts which have been introduced.

We would like to take this opportunity to thank all of the Authors for their timely contributions in providing their Chapters in both initial and revised formats, and to acknowledge the dedicated efforts of those who undertook to review the material. We would also like to express our gratitude to Ismail Khalil, who provided the inspiration to undertake this Project along with continual motivation and advice throughout. We would also like to thank Atlantis Press for supporting the text and for their help in producing its final version.

We hope that the text becomes a valuable reference source within the field of activity recognition and assists in further progressing the translation of research efforts into tangible large-scale intelligent environments that we can all benefit from.

Liming Chen, Chris Nugent, Jit Biswas, Jesse Hoey
Contents
Preface
1.
Activity Recognition: Approaches, Practices and Trends
Liming Chen, Ismail Khalil
1.1 Introduction
1.2 Activity recognition approaches and algorithms
  1.2.1 Activity recognition approaches
  1.2.2 Activity recognition algorithms
  1.2.3 Ontology-based activity recognition
1.3 The practice and lifecycle of ontology-based activity recognition
  1.3.1 Domain knowledge acquisition
  1.3.2 Formal ontology modelling
  1.3.3 Semantic sensor metadata creation
  1.3.4 Semantic sensor metadata storage and retrieval
  1.3.5 Activity recognition
  1.3.6 Activity model learning
  1.3.7 Activity assistance
1.4 An exemplar case study
1.5 Emerging research on activity recognition
  1.5.1 Complex activity recognition
  1.5.2 Domain knowledge exploitation
  1.5.3 Infrastructure mediated activity monitoring
  1.5.4 Abnormal activity recognition
1.6 Conclusions
References
2.
Possibilistic Activity Recognition
P.C. Roy, S. Giroux, B. Bouchard, A. Bouzouane, C. Phua, A. Tolstikov, and J. Biswas
2.1 Introduction
2.2 Overall Picture of Alzheimer’s disease
2.3 Related Work
2.4 Possibilistic Activity Recognition Model
  2.4.1 Environment Representation and Context
  2.4.2 Action Recognition
  2.4.3 Behavior Recognition
  2.4.4 Overview of the activity recognition process
2.5 Smart Home Validation
  2.5.1 Results
  2.5.2 Discussion
  2.5.3 Summary of Our Contribution
2.6 Conclusion
References
3.
Multi-user Activity Recognition in a Smart Home
Liang Wang, Tao Gu, Xianping Tao, Hanhua Chen, and Jian Lu
3.1 Introduction
3.2 Related Work
3.3 Multi-modal Wearable Sensor Platform
3.4 Multi-chained Temporal Probabilistic Models
  3.4.1 Problem Statement
  3.4.2 Feature Extraction
  3.4.3 Coupled Hidden Markov Model
  3.4.4 Factorial Conditional Random Field
  3.4.5 Activity Models in CHMM and FCRF
3.5 Experimental Studies
  3.5.1 Trace Collection
  3.5.2 Evaluation Methodology
  3.5.3 Accuracy Performance
3.6 Conclusions and Future Work
References
4.
Smart Environments and Activity Recognition: a Logic-based Approach
F. Mastrogiovanni, A. Scalmato, A. Sgorbissa, and R. Zaccaria
4.1 Introduction
4.2 Related Work
4.3 Representation of Temporal Contexts
4.4 Assessment of Temporal Contexts
4.5 Experimental Results and Discussion
  4.5.1 An Example of System Usage
  4.5.2 A Discussion about context assessment complexity and system performance
4.6 Conclusion
References
5.
ElderCare: An Interactive TV-based Ambient Assisted Living Platform
D. Lopez-de-Ipina, S. Blanco, X. Laiseca, I. Diaz-de-Sarralde
5.1 Introduction
5.2 Related Work
5.3 The ElderCare Platform
  5.3.1 Eldercare Platform Components
5.4 Implementation Overview
  5.4.1 Eldercare’s Local System
  5.4.2 ElderCare’s Central Server
  5.4.3 ElderCare’s Mobile Client
5.5 Conclusion and Further Work
References
6.
An Ontology-based Context-aware Approach for Behaviour Analysis
Shumei Zhang, Paul McCullagh, Chris Nugent, Huiru Zheng
6.1 Introduction
6.2 Related Work
6.3 Data Collection and Ontological Context Extraction
  6.3.1 Activity Context Extraction
  6.3.2 Location Context Detection
  6.3.3 Schedule Design
6.4 Ontological ADL Modelling and Knowledge Base (KB) Building
  6.4.1 Ontological Modelling
  6.4.2 Knowledge Base Building
6.5 Experiments
  6.5.1 iMessenger Ontologies
  6.5.2 Rules Definition
  6.5.3 Case Study: Querying the Ontology Using SQWRL
6.6 Discussion and future work
References
7.
User’s Behavior Classification Model for Smart Houses Occupant Prediction
R. Kadouche, H. Pigot, B. Abdulrazak, S. Giroux
7.1 Introduction
7.2 Background and Related Work
7.3 Our Approach
  7.3.1 Support Vector Machines (SVM)
7.4 Experimentation
  7.4.1 DOMUS
  7.4.2 CASAS Smart Home Project
7.5 Result and Discussion
  7.5.1 SVM Vs others classifiers
  7.5.2 BCM Accuracy Results’
7.6 Conclusion
References
8.
Activity Recognition Benchmark
T.L.M. van Kasteren, G. Englebienne and B.J.A. Kröse
8.1 Introduction
8.2 Models
  8.2.1 Notation
  8.2.2 Naive Bayes
  8.2.3 Hidden Markov model
  8.2.4 Hidden semi-Markov model
  8.2.5 Conditional random fields
  8.2.6 Inference
  8.2.7 Learning
8.3 Datasets
  8.3.1 Sensors Used
  8.3.2 Annotation
  8.3.3 Houses
8.4 Experiments
  8.4.1 Experimental Setup
  8.4.2 Feature Representation
  8.4.3 Experiment 1: Timeslice Length
  8.4.4 Experiment 2: Feature Representations and Models
8.5 Discussion
8.6 Related and Future work
8.7 Conclusion
References
9.
Smart Sweet Home
N. Noury, J. Poujaud, A. Fleury, R. Nocua, T. Haddidi
9.1 Introduction
9.2 Daily Activities at Home
  9.2.1 State of the art in health smart homes
9.3 Detection of activities with basic PIR sensors
  9.3.1 PIR sensors
  9.3.2 The HIS of Grenoble
9.4 Ambulatograms
9.5 Circadian activity rhythms?
9.6 Night and day alternation
9.7 Inactivity of Daily Living?
9.8 Activities of daily living
9.9 On the automatic detection of the ADL
9.10 Discussion
9.11 Conclusion
References
10.
Synthesising Generative Probabilistic Models for High-Level Activity Recognition
C. Burghardt, M. Wurdel, S. Bader, G. Ruscher, and T. Kirste
10.1 Introduction & Motivation
10.2 Related Work
10.3 Preliminaries
  10.3.1 Hidden Markov Models
  10.3.2 Planning Problem
  10.3.3 Task Models
  10.3.4 Probabilistic Context-Free Grammars
10.4 Synthesising Probabilistic Models
  10.4.1 From Task Models to Hidden Markov Models
  10.4.2 From Planning Problems to Hidden Markov Models
  10.4.3 From Probabilistic Context-Free Grammars to Hidden Markov Models
  10.4.4 Joint HMMs
10.5 Discussion
  10.5.1 Planning operators
  10.5.2 Task models
  10.5.3 Probabilistic Context-Free Grammars
  10.5.4 Joint Hidden Markov Models
10.6 Summary and Outlook
References
11.
Ontology-based Learning Framework for Activity Assistance in an Adaptive Smart Home
G. Okeyo, L. Chen, H. Wang, and R. Sterritt
11.1 Introduction
11.2 Related Work
11.3 Activity and Behaviour Learning and Model Evolution Framework
  11.3.1 Rationale
  11.3.2 The Architecture
  11.3.3 The Process
11.4 Activity Learning and Model Evolution Methods
  11.4.1 Preliminaries
  11.4.2 Learning Algorithm for Unlabelled Traces
  11.4.3 Learning Algorithm for Labelled Traces
11.5 Behaviour Learning and Evolution Method
  11.5.1 Algorithm for Behaviour Learning
11.6 Illustration
  11.6.1 Ontological modelling and representation
  11.6.2 Inferring and Logging ADL Activities
  11.6.3 Use scenario for ADL Learning and Evolution
  11.6.4 Use scenario for Behaviour Learning and Evolution
11.7 Conclusion
References
12.
Benefits of Dynamically Reconfigurable Activity Recognition in Distributed Sensing Environments
C. Lombriser, O. Amft, P. Zappi, L. Benini, and G. Tröster
12.1 Introduction
12.2 Related Work
12.3 Distributed activity recognition
  12.3.1 Distributed activity recognition architecture
12.4 Dynamic reconfiguration of activity models
  12.4.1 Reconfiguration concept
  12.4.2 Reconfiguration granularities
12.5 Implementation of the activity recognition chain
  12.5.1 Event recognition at distributed sensor nodes
  12.5.2 Network fusion of distributed detector events
  12.5.3 Architecture and reconfiguration complexity metrics
  12.5.4 Performance evaluation
12.6 Evaluation dataset
  12.6.1 Experimental procedure
  12.6.2 Sensor node complexity
12.7 Results
  12.7.1 Baseline results
  12.7.2 Setting-specific results
  12.7.3 Composite-specific results
  12.7.4 Object-specific results
  12.7.5 Costs of reconfiguration
12.8 Discussion
12.9 Conclusion
References
13.
Embedded Activity Monitoring Methods
N. Shah, M. Kapuria, K. Newman
13.1 Introduction
13.2 Related Work
  13.2.1 Machine Vision
  13.2.2 RFID-Object Tracking
  13.2.3 RFID and Machine Vision
  13.2.4 Motion Sensors
  13.2.5 Pressure Sensors
  13.2.6 Accelerometers
  13.2.7 Accelerometers and Gyroscopes
13.3 Ultrasonic Activity Recognition Method
  13.3.1 Ultrasonic Sensor Selection
  13.3.2 Construction of the System
  13.3.3 System Operation
  13.3.4 Activity and Pose Recognition
  13.3.5 Open Issues and Drawbacks
13.4 Conclusion
References
14.
Activity Recognition and Healthier Food Preparation
T. Plötz, P. Moynihan, C. Pham, P. Olivier
14.1 Introduction
14.2 The Role of Technology for Healthier Eating
  14.2.1 Current dietary guidelines
  14.2.2 Barriers to healthier eating with focus on preparation
  14.2.3 Why technology-based approach to healthier cooking?
  14.2.4 Evaluation and assessment of cooking skills
14.3 Activity Recognition in the Kitchen – The State-of-the-Art
  14.3.1 Sensor-based Activity Recognition
  14.3.2 Instrumented Kitchens
14.4 Automatic Analysis of Food Preparation Processes
  14.4.1 Activity Recognition in the Ambient Kitchen
  14.4.2 System Description
  14.4.3 Experimental Evaluation
14.5 Activity recognition and the promotion of health and wellbeing
References
Chapter 1
Activity Recognition: Approaches, Practices and Trends
Liming Chen (a), Ismail Khalil (b)
(a) School of Computing and Mathematics, University of Ulster, BT37 0QB, Northern Ireland, UK
(b) Institute of Telecooperation, Johannes Kepler University Linz, Altenberger Strasse 69, A-4040 Linz, Austria
[email protected], [email protected]

Abstract
Activity recognition has attracted increasing attention as a number of related research areas such as pervasive computing, intelligent environments and robotics converge on this critical issue. It is also driven by growing real-world application needs in such areas as ambient assisted living and security surveillance. This chapter aims to provide an overview of existing approaches, current practices and future trends in activity recognition. It is intended to provide the necessary material to inform relevant research communities of the latest developments in this field, in addition to providing a reference for researchers and system developers who are working towards the design and development of activity-based context-aware applications. The chapter first reviews the existing approaches and algorithms that have been used for activity recognition in a number of related areas. It then describes the practice and lifecycle of the ontology-based approach to activity recognition that has recently been under vigorous investigation. Finally, the chapter presents emerging research on activity recognition by outlining various issues and directions the field will take.
1.1 Introduction
With the advance and prevalence of low-cost, low-power sensors, computing devices and wireless communication networks, pervasive computing has evolved from a vision [1] to an achievable and deployable computing paradigm. As a result, research is now being conducted in all areas related to pervasive computing, ranging from low-level data collection, to intermediate-level information processing, to high-level applications and service delivery. It is becoming increasingly evident that intelligent environments which can support both living and work places through flexible multimodal interactions, proactive service provision, and context-aware personalized activity assistance will be commonplace in the very near future. For example, Smart Homes (SH), augmented real or simulated home settings equipped with sensors, actuators and information processing systems, have been extensively studied. Work in this area has produced a number of lab-based or real-world SH prototypes [2]. Within a SH the Activities of Daily Living (ADL) of its inhabitants, usually the elderly or disabled, can be monitored and analysed so that personalized context-aware assistive living can be provided.

Activity recognition has emerged as a decisive research issue related to the successful realisation of intelligent pervasive environments. This relates to the fact that activities in a pervasive environment provide important contextual information, and any intelligent behaviour of such an environment must be relevant to the user’s context and ongoing activities.

Activity recognition has been an active and fast growing research area. Whilst early work focused on the monitoring and analysis of visual information, such as images and surveillance videos, as a means to recognise activities, recent research has moved towards the use of multiple miniature dense sensors embedded within environments. These sensors are used to acquire the contextual data required for the process of activity recognition. Accordingly, a multitude of approaches and algorithms have been proposed and studied, with the main differences between them being the manner in which the activities are modeled, represented, reasoned about and used.

This chapter serves three main purposes. Firstly, it aims to present an up-to-date summary of the state-of-the-art in activity recognition. We describe various existing approaches and their underlying algorithms. In particular we discuss and analyse dense sensor based activity recognition.
Special emphasis is placed on approaches which utilize ontologies to enable and facilitate activity recognition. Secondly, the chapter introduces the concept of a domain knowledge driven approach to activity recognition. The approach adopts ontological modeling as the conceptual backbone covering the lifecycle of activity recognition in a sensorised pervasive environment. The compelling feature of the proposed approach is that activity recognition is performed through direct semantic reasoning, making extensive use of semantic descriptions and domain knowledge. The approach supports progressive activity recognition at both coarse-grained and fine-grained levels. Thirdly, the chapter presents and discusses a number of research issues and directions the field will take. This is intended to help all stakeholders in the field identify areas of particular importance and understand a coherent vision of the development of the area, its application domains and likely barriers to adoption.

The remainder of the chapter is organised as follows: Section 1.2 presents the approaches and algorithms of activity recognition. Section 1.3 describes the practice and lifecycle of the ontology-based approach to activity recognition. Section 1.4 outlines an exemplar case study for activity recognition in smart homes. Section 1.5 discusses the future research issues and trends in activity recognition. Section 1.6 concludes the chapter.
1.2 Activity recognition approaches and algorithms
Activity recognition is the process whereby an actor’s behavior and his/her situated environment are monitored and analysed to infer the ongoing activities. It comprises many different tasks, namely activity modeling, behavior and environment monitoring, data processing and pattern recognition. To perform activity recognition, it is therefore necessary to:
(1) create computational activity models in a way that allows software systems/agents to conduct reasoning and manipulation;
(2) monitor and capture a user’s behavior along with the state change of the environment;
(3) process perceived information through aggregation and fusion to generate a high-level abstraction of context or situation;
(4) decide which activity recognition algorithm to use; and finally
(5) carry out pattern recognition to determine the performed activity.
Researchers from different application domains have investigated activity recognition for the past decade by developing a diversity of approaches and techniques for each of these core tasks. Based on the way these tasks are undertaken, we broadly classify activity recognition into the following categories.

1.2.1 Activity recognition approaches
Monitoring an actor’s behavior along with changes in the environment is a critical task in activity recognition. This monitoring process is responsible for capturing relevant contextual information for activity recognition systems to infer an actor’s activity. In terms of the type of monitoring facility and the data it produces, there are currently two main activity recognition approaches: vision-based activity recognition and sensor-based activity recognition. Vision-based activity recognition uses visual sensing facilities, e.g., camera-based
surveillance systems, to monitor an actor’s behavior and its environment changes [3–6]. It exploits computer vision techniques to analyse visual observations for pattern recognition. Vision-based activity recognition has been a research focus for a long period of time due to its important role in areas such as human-computer interaction, user interface design, robot learning and surveillance. Researchers have used a wide variety of modalities, such as single camera, stereo and infra-red, to capture activity contexts. In addition, they have investigated a number of application scenarios, e.g., single actor or group tracking and recognition. The typical computational process of vision-based activity recognition is usually composed of four steps, namely object (or human) detection, behavior tracking, activity recognition and finally a high-level activity evaluation. While considerable work has been undertaken and significant progress has been made, vision-based activity recognition approaches suffer from issues related to scalability and reusability due to the complexity of real world settings, e.g., highly varied activities in natural environments. In addition, as cameras are generally used as recording devices, the invasiveness of this approach as perceived by some also prevents its large-scale uptake in some applications, e.g., home environments.
Sensor-based activity recognition exploits emerging sensor network technologies to monitor an actor’s behaviour along with their environment. The sensor data which are collected are usually analysed using data mining and machine learning techniques to build activity models and perform pattern recognition. In this approach, sensors can be attached to either an actor under observation or objects that constitute the environment. Sensors attached to humans, i.e., wearable sensors, often use inertial measurement units (e.g., accelerometers, gyroscopes, magnetometers), vital sign processing devices (heart rate, temperature) and RFID tags to gather an actor’s behavioural information. Activity recognition based on wearable sensors has been extensively used in the recognition of human physical movements [7–11]. Activities such as walking, running, sitting down/up, climbing or physical exercises are generally characterised by a distinct, often periodic, motion pattern. The wearable sensor based approach is effective and also relatively inexpensive for data acquisition and activity recognition for certain types of human activities, mainly human physical movements. Nevertheless, it suffers from two drawbacks. Firstly, most wearable sensors are not applicable in real world application scenarios due to technical issues such as size, ease of use and battery life, in conjunction with the general issue of acceptability or willingness of the user to wear them. Secondly, many activities in real world situations
involve complex physical motions and complex interactions with the environment. Sensor observations from wearable sensors alone may not be able to differentiate activities involving simple physical movements, e.g., making tea and making coffee. To address these issues, object-based activity recognition has emerged as one mainstream approach [12]. The approach is based on real world observations that activities are characterised by the objects that are manipulated during their operation. Simple sensors can often provide powerful clues about the activity being undertaken. As such it is assumed that activities can be recognised from sensor data that monitor human interactions with objects in the environment. Object-based activity recognition has attracted increasing attention as low-cost low-power intelligent sensors, wireless communication networks and pervasive computing infrastructures become technically mature and financially affordable. It has been, in particular, under vigorous investigation in the creation of intelligent pervasive environments for ambient assisted living (AAL), i.e., the SH paradigm [2, 13, 14]. Sensors in a SH can monitor an inhabitant’s movements and environmental events so that assistive agents can infer the ongoing activities based on the sensor observations, thus providing just-in-time context-aware ADL assistance. For instance, a switch sensor in the bed can strongly suggest sleeping, and pressure mat sensors can be used for tracking the movement and position of people within the environment. It is worth pointing out that the approaches described above may be suitable for different applications. Taking this into account, it is not possible to claim that one approach is superior to the other. Suitability and performance ultimately depend on the nature of the activities being assessed and the characteristics of the concrete applications.
In most cases they are complementary and can be used in combination in order to yield optimal recognition results.
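The object-based idea described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited systems: the sensor identifiers, object names and activity definitions are all hypothetical, and the scoring rule (the fraction of an activity's characteristic objects that were observed) is simply one plausible choice.

```python
# Hypothetical mapping from sensor identifiers to the objects they monitor.
SENSOR_TO_OBJECT = {
    "s1": "kettle",
    "s2": "cup",
    "s3": "teabag",
    "s4": "coffee_jar",
    "s5": "bed",
}

# Hypothetical activity definitions: the objects typically manipulated.
ACTIVITY_OBJECTS = {
    "MakeTea": {"kettle", "cup", "teabag"},
    "MakeCoffee": {"kettle", "cup", "coffee_jar"},
    "Sleeping": {"bed"},
}


def rank_activities(activated_sensors):
    """Rank activities by the fraction of their objects that were observed."""
    used = {SENSOR_TO_OBJECT[s] for s in activated_sensors if s in SENSOR_TO_OBJECT}
    scores = {
        activity: len(objects & used) / len(objects)
        for activity, objects in ACTIVITY_OBJECTS.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_activities(["s1", "s2", "s3"])
print(ranking[0])
```

Note that the kettle and cup sensors alone would leave MakeTea and MakeCoffee tied, which is why additional object sensors are useful for telling similar activities apart.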
1.2.2
Activity recognition algorithms
Activity recognition algorithms can be broadly divided into two major strands. The first one is based on machine learning techniques, including both supervised and unsupervised learning methods, which primarily use probabilistic and statistical reasoning. Supervised learning requires the use of labelled data upon which an algorithm is trained. Following training the algorithm is then able to classify unknown data. The general procedure using a supervised learning algorithm for activity recognition includes several steps, namely, (1) to acquire sensor data representative of activities, including labelled annotations of what
an actor does and when, (2) to determine the input data features and their representation, (3) to aggregate data from multiple data sources and transform them into the application-dependent features, e.g., through data fusion, noise elimination, dimension reduction and data normalization, (4) to divide the data into a training set and a test set, (5) to train the recognition algorithm on the training set, (6) to test the classification performance of the trained algorithm on the test set, and finally (7) to apply the algorithm in the context of activity recognition. It is common to repeat steps (4) to (7) with different partitioning of the training and test sets in order to achieve better generalisation with the recognition models. There is a wide range of algorithms and models for supervised learning and activity recognition. These include Hidden Markov Models (HMMs) [9, 15, 16], dynamic and naive Bayes networks [12, 17, 18], decision trees [19], nearest neighbour [10] and support vector machines (SVMs) [20]. Among them HMMs and Bayes networks are the most commonly used methods in activity recognition. Unsupervised learning, on the other hand, tries to directly construct recognition models from unlabelled data. The basic idea is to manually assign a probability to each possible activity and to predefine a stochastic model that can update these likelihoods according to new observations and to the known state of the system. Such an approach employs either density estimation methods, which estimate the properties of the underlying probability density, or clustering techniques, which discover groups of similar examples, to create learning models. The general procedure for unsupervised learning typically includes (1) to acquire unlabelled sensor data, (2) to aggregate and transform the sensor data into features, and (3) to model the data using either density estimation or clustering methods.
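Steps (4) to (6) of the supervised procedure above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: the two-dimensional feature vectors are synthetic, the activity labels and cluster centres are invented, and a one-nearest-neighbour classifier is used only because it is the shortest of the listed algorithms to write down.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def synthetic_samples(label, centre, n):
    """Generate n noisy (feature vector, label) pairs around a centre."""
    return [([rng.gauss(c, 0.05) for c in centre], label) for _ in range(n)]


# Stand-in for steps 1-3: labelled feature vectors (synthetic, for illustration only).
data = synthetic_samples("sitting", (1.0, 0.05), 20) + synthetic_samples(
    "walking", (1.3, 0.60), 20
)

# Step 4: divide the data into a training set and a test set.
rng.shuffle(data)
split = int(0.7 * len(data))
train, test = data[:split], data[split:]


# Step 5: "training" a nearest-neighbour classifier just stores the examples.
def classify(features, examples):
    """Return the label of the closest training example (Euclidean distance)."""
    nearest = min(examples, key=lambda ex: math.dist(features, ex[0]))
    return nearest[1]


# Step 6: measure classification performance on the held-out test set.
correct = sum(classify(x, train) == y for x, y in test)
accuracy = correct / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

Repeating the shuffle and split with different seeds corresponds to the repeated partitioning mentioned in the text.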
Algorithms for unsupervised learning include the use of graphical models [21] and multiple eigenspaces [20]. A number of unsupervised learning methods are also based on probabilistic reasoning such as various variants of HMMs and Bayes networks. The main difference between unsupervised and supervised probabilistic techniques is that, instead of using a pre-established stochastic model to update the activity likelihood, supervised learning algorithms keep a trace of their previous observed experiences and use them to dynamically learn the parameters of the stochastic activity models. This enables them to create a predictive model based on the observed agent’s activity profiles. A major strength of the activity recognition algorithms that are based on probabilistic learning models is that they are capable of handling noisy, uncertain and incomplete sensor data. Probabilities can be used to model uncertainty and also to capture domain heuristics, e.g., some activities are more likely than others. The limitation of the unsupervised learning
probabilistic methods lies in the assignment of these handcrafted probabilistic parameters for the computation of the activity likelihood. They are usually static and highly activity-dependent. The disadvantage of supervised learning in the case of probabilistic methods is that they require a large amount of labelled training and test data. In addition, learning each activity in a probabilistic model for the large diversity of activities in real world application scenarios can be computationally expensive. The resulting models are often ad hoc, and neither reusable nor scalable, due to the variation of individuals’ behaviour and their environments.
The second strand of activity recognition algorithms is based on logical modelling and reasoning. The rationale of logical approaches is to exploit logical knowledge representation for activity and sensor data modelling, and to use logical reasoning to perform activity recognition. The general procedure of a logical approach includes (1) to use a logical formalism to explicitly define and describe a library of activity models for all possible activities in a domain, (2) to aggregate and transform sensor data into logical terms and formulae, and (3) to perform logical reasoning, e.g., deduction, abduction and subsumption, to extract a minimal set of covering models of interpretation from the activity model library based on a set of observed actions, which could explain the observations. There exist a number of logical modelling methods and reasoning algorithms in terms of logical theories and representation formalisms. For example, Kautz [22] adopted first-order axioms to build a library of hierarchical plans for plan recognition. Wobcke [23] extended Kautz’s work using situation theory to address the different probabilities of inferred plans. Bouchard [24] used action Description Logic (DL) and lattice theory for plan recognition with particular emphasis on the modelling and reasoning of plan intra-dependencies.
Chen [25] exploited the event theory, a logical formalism for explicit specification, manipulation and reasoning of events, to formalise an activity domain for activity recognition and assistance. The major strength of Chen’s work is its capability to handle temporal issues and undecidability. Logical activity modelling and reasoning is semantically clear and elegant in computational reasoning. It is also relatively easy to incorporate domain knowledge and heuristics for activity models and data fusion. The weakness of logical approaches is their inability or inherent infeasibility to represent fuzziness and uncertainty. Most of them offer no mechanism for deciding whether one particular model is more effective than another, as long as both of them are consistent enough to explain the actions observed. There is also a lack of learning ability associated with logic based methods.
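The following Python sketch illustrates both the general procedure and the stated weakness of logical approaches. It is a deliberately simplified stand-in for deduction over an activity library, not the algorithm of any cited work: the activity names and action terms are hypothetical, and "explaining" an observation set is reduced to a subset test.

```python
# Hypothetical activity library: each activity is defined by the actions it involves.
ACTIVITY_LIBRARY = {
    "MakeTea": {"boil_water", "take_cup", "add_teabag"},
    "MakeCoffee": {"boil_water", "take_cup", "add_coffee"},
    "BrushTeeth": {"take_toothbrush", "apply_toothpaste"},
}


def explain(observed_actions):
    """Return every activity model that is consistent with all observed actions."""
    observed = set(observed_actions)
    return sorted(
        name for name, actions in ACTIVITY_LIBRARY.items() if observed <= actions
    )


# Two models explain the same observations; pure logic cannot rank them.
print(explain(["boil_water", "take_cup"]))
# A further observation removes the ambiguity.
print(explain(["boil_water", "take_cup", "add_teabag"]))
```

With only the first two observations, both candidate activities remain equally valid explanations, which mirrors the absence of any mechanism for preferring one consistent model over another.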
1.2.3
Ontology-based activity recognition
Using ontologies for activity recognition is a recent endeavor and has gained growing interest. In the vision-based activity recognition community, researchers have realized that symbolic activity definitions based on manual specification of a set of rules suffer from limitations in their applicability, i.e., the definitions are only deployable to the scenarios for which they have been designed. There is a need for an explicit commonly agreed representation of activity definitions, i.e., ontologies, for activities that are independent of algorithmic choices, thus facilitating portability, interoperability, reuse and sharing of both underlying technologies and systems. As such, researchers have proposed ontologies for specific domains of visual surveillance. For example, Chen [26] proposed an ontology for analyzing social interaction in nursing homes; Hakeem [27] used ontologies for the classification of meeting videos, and Georis [28] for activities in a bank monitoring setting. To consolidate these efforts and to build a common knowledge base of domain ontologies, a collaborative initiative has been made to define ontologies for six domains of video surveillance. This has led to a video event ontology [29] and the corresponding representation language [30]. For instance, Akdemir [31] used the video event ontologies for activity recognition in both bank and car park monitoring scenarios. In principle these studies use ontologies to provide common terms as building primitives for activity definitions. Activity recognition is performed using individually preferred algorithms, such as rule-based systems [27] and finite-state machines [31]. In the object-based activity recognition community, ontologies have been utilised to construct reliable activity models. Such models are able to match an unknown sensor reading with a word in an ontology which is related to the sensor event.
For example, a Mug sensor event could be substituted by a Cup event in the activity model “MakeTea” as it uses a Cup. This is particularly useful to address model incompleteness and multiple representations of terms. For example, Tapia [39] generated a large object ontology based on functional similarity between objects from WordNet, which can complete mined activity models from the Web with similar objects. Yamada [32] used ontologies to represent objects in an activity space. By exploiting semantic relationships between things, the reported approach can automatically detect possible activities even given a variety of object characteristics including multiple representation and variability. Similar to vision-based activity recognition, these studies mainly use ontologies to provide activity descriptors for activity definitions. Activity recognition is performed based on probabilistic and/or statistical reasoning [32, 39]. More recently, ontology based modelling and representation has been applied in pervasive
computing and in particular Ambient Assisted Living. For example, Latfi [33] proposed an ontological architecture of a telehealth based SH aiming at high-level intelligent applications for elderly persons suffering from loss of cognitive autonomy. Michael et al. [34] developed an ontology-centred design approach to create a reliable and scalable ambient middleware. Chen et al. [35] pioneered the notion of semantic smart homes in an attempt to leverage the full potential of semantic technologies in the entire lifecycle of assistive living, i.e., from data modelling, content generation, activity representation, processing techniques and technologies to assist with the provision and deployment. While these endeavours, together with existing work in both vision-based and object-based activity recognition, provide solid technical underpinnings for ontological data, object, sensor modelling and representation, there is a gap between semantic descriptions of events/objects related to activities and semantic reasoning for activity recognition. Ontologies are currently used as a mapping mechanism for multiple terms of an object as in [39], or the categorisation of terms as in [32], or a common conceptual template for data integration, interoperability and reuse as in [33–35]. Specifically, there is a lack of activity ontologies, i.e., explicit conceptualisation of activities and their interrelationships. The ontology-based approach to activity recognition offers several compelling features. Firstly, ontological ADL models can capture and encode rich domain knowledge and heuristics in a machine understandable and processable way. This enables knowledge based intelligent processing at a higher degree of automation. Secondly, DL-based descriptive reasoning along a time line can support incremental progressive activity recognition and assistance as an ADL unfolds.
The two levels of abstraction in activity modelling, i.e., concepts and instances, also allow coarse-grained and fine-grained activity assistance. Thirdly, as the ADL profile of an inhabitant is essentially a set of instances of ADL concepts, it provides an easy and flexible way to capture a user’s activity preferences and styles, thus facilitating personalised ADL assistance. Finally, the unified modelling, representation and reasoning for ADL modelling, recognition and assistance makes it natural and straightforward to support the integration and interoperability between contextual information and ADL recognition. This will support systematic coordinated system development by making use of seamless integration and synergy of a wide range of data and technologies. In the following sections we use SH based ambient assisted living to further illustrate these concepts within the realms of ontological activity recognition.
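The term substitution mentioned earlier in this section, where a Mug sensor event stands in for a Cup in the MakeTea model, can be sketched as a walk up an is-a chain. The fragment below is a hedged illustration in plain Python: the ontology fragment and the object set of the activity model are hypothetical, and a real system would query an ontology rather than a dictionary.

```python
# Hypothetical fragment of an object ontology: child term -> parent term.
IS_A = {
    "Mug": "Cup",
    "Teacup": "Cup",
    "Cup": "Container",
    "Kettle": "Appliance",
}

# Hypothetical activity model expressed with the more general terms.
MAKE_TEA_OBJECTS = {"Cup", "Kettle", "Teabag"}


def generalise(term, targets):
    """Walk up the is-a chain until a term used by the activity model is found."""
    current = term
    while current is not None:
        if current in targets:
            return current
        current = IS_A.get(current)
    return None  # the term is unrelated to the model


print(generalise("Mug", MAKE_TEA_OBJECTS))
```

A sensor event whose term never reaches a model term, such as an object absent from the ontology fragment, yields None and is simply not matched.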
1.3
The practice and lifecycle of ontology-based activity recognition
At the time of writing, the mainstream approaches to activity recognition are based on probabilistic and statistical analysis methods. These methods are well-known and widely used with a plethora of work in the literature. On the other hand, the ontology-based approach to activity recognition has only emerged recently, with little published work. As such this chapter intends to describe the general framework and lifecycle of the approach that can be used by researchers as a general reference. Figure 1.1 shows the system architecture for the realisation of ontology-based activity recognition. Central to the architecture is the ontological modelling and representation of SH domain knowledge (refer to the components in the right-hand column). This provides Context and ADL Ontologies as conceptual knowledge models and User Profiles and Situations as knowledge entities in corresponding repositories. The context ontologies are used to semantically describe contextual entities, e.g., objects, events and environmental elements. The generated semantic contexts, i.e., Situations, are used by the ADL Recognition component for activity recognition. The ADL ontologies are used, on the one hand, to create ADL instances for an inhabitant in terms of their ADL profiles, and on the other hand, to serve as a generic ADL model for activity recognition. In addition, archived data in these repositories can be mined for advanced features such as learning, high-level long-term trend recognition as well as automatic model creation. The components in the left-hand column denote the physical environment, sensors, devices and assistive services in a SH. The sensors monitor an inhabitant’s ADL and use their observations, together with context ontologies, to generate semantic contexts. Assistive Services receive instructions from the ADL Recognition component and further act on the environment and/or the inhabitant through various actuators.
Activity recognition is performed through a description logic based reasoner (the components in the middle column). The reasoner takes as inputs the semantic descriptions of a situation and performs reasoning against the ADL ontologies to provide incremental progressive activity recognition. To support fine-grained activity recognition, concrete sensor observations are bound to context models to create an activity’s description. By reasoning over these descriptions against an inhabitant’s personal ADL profile, specific personalized activities can be recognised. A full discussion related to activity assistance is beyond the scope of this chapter.
Fig. 1.1 The system architecture
As most ADLs in the context of ambient assisted living are daily routines with abundant common sense patterns and heuristics from medical observations and psychological behavioral studies [36, 37], it is reasonable and straightforward to construct an ontological activity model using a description language. This avoids problems suffered by probabilistic algorithms such as the lack of large amounts of observation data, inflexibility, i.e., each activity model needs to be computationally learned, and poor reusability, i.e., one person’s activity model may differ from another’s. Using ontological modeling, the creation of user activity profiles is equivalent to creating activity instances in terms of a user’s preferences and styles of performing ADLs. Hence it is relatively straightforward to undertake and is also scalable to a large number of users and activities in comparison with traditional approaches. Ontology-based activity recognition follows a typical knowledge engineering lifecycle involving knowledge acquisition, modeling, representation, storage, use/reuse and reasoning, which are described below in detail in the context of smart home based ambient assisted living.
1.3.1
Domain knowledge acquisition
A Smart Home (SH) is a home setting where inhabitants perform various ADLs in a location at a time using one or more items. As routine daily activities, ADLs are usually performed in specific circumstances, i.e., in specific environments with specific objects used for specific purposes. For example, brushing teeth usually takes place two times a day, in a bathroom, normally in the morning and before going to bed. This activity usually involves the use of toothpaste and a toothbrush. This is more generally referred to as the context for the corresponding activity. As humans have different life styles, habits or abilities, individuals’ ADLs and the way they perform them may vary one from another. Even for the same type of activity, e.g., making white coffee, different people may use different items, e.g., skimmed milk or semi-skimmed milk, and in different orders, e.g., adding milk
first and then sugar, or vice versa. As such ADLs can be categorized as generic ADLs applicable to all and personalised ADLs that reflect the subtleties of individuals. In addition, ADLs can be conceptualized at different levels of granularity. For example, Grooming can be considered to be comprised of sub-activities Washing, Brushing and Applying Make-up. There are usually “is-a” and “part-of” relationships between a primitive and a composite ADL. All these observations can be viewed as prior domain knowledge and heuristics that can facilitate assistive living. The key is how to formally capture, encode and represent such domain knowledge. There are two approaches to capture SH domain knowledge. The first is to derive relevant information through interviews, questionnaires and by studying existing documents [36], then to extract and construct patterns manually using knowledge engineering tools. The second is to use information extraction (IE) and data mining techniques to mine from text corpora on the Web a set of objects used for each activity and extract object usage information to derive their associated usage probabilities [38, 39]. The second approach is motivated by the observation that an activity can be viewed as the sequence of objects used, i.e., a probabilistic translation between activity names (e.g., “make coffee”) and the names of involved objects (e.g., “mug”, “milk”). As the correlations between activities and the objects they use are common sense (e.g., most of us know how to carry out daily activities), such domain knowledge can be discovered in various sources such as how-tos (e.g., those at ehow.com), recipes (e.g., from epicurious.com), training manuals, experimental protocols, and facility/device user manuals, or in the generic global information space, i.e., the Web. Knowledge acquisition usually generates conceptual knowledge models that can be represented in various informal or formal forms, e.g., HTML, XML, tables, diagrams and graphs.
Such models are normally formalised later in terms of application scenarios.
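The second, mining-based approach to knowledge acquisition can be sketched as follows. This is a toy illustration only: the how-to snippets are invented stand-ins for mined Web text, the object vocabulary is hypothetical, and the estimate of an object's usage probability (the fraction of an activity's documents that mention it) is an assumed simplification of the techniques in [38, 39].

```python
from collections import Counter

# Tiny invented "how-to" snippets standing in for mined Web text.
HOW_TOS = {
    "make coffee": [
        "put coffee in a mug and add hot water from the kettle",
        "pour milk into the mug and stir the coffee",
    ],
    "make tea": [
        "place a teabag in a cup and fill it from the kettle",
    ],
}

# Hypothetical vocabulary of object names to look for.
OBJECTS = {"mug", "cup", "kettle", "milk", "teabag", "coffee"}


def usage_probabilities(activity):
    """Estimate P(object | activity) as the fraction of documents mentioning it."""
    documents = HOW_TOS[activity]
    counts = Counter()
    for text in documents:
        # Count each object at most once per document.
        counts.update(set(text.split()) & OBJECTS)
    return {obj: n / len(documents) for obj, n in counts.items()}


probabilities = usage_probabilities("make coffee")
print(sorted(probabilities.items()))
```

Objects that never appear in an activity's documents are absent from the result, which corresponds to a usage probability of zero under this simple estimate.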
1.3.2
Formal ontology modelling
Ontological modeling is a formal way of knowledge modeling that explicitly specifies key concepts and their properties for a problem domain. These concepts are organized in a hierarchical structure in terms of their shared properties to form super-class and sub-class relations. For example, MakeTea is a subclass of MakeHotDrink. Properties establish the interrelations between concepts. For instance, hasDrinkType is a property of the MakeHotDrink activity that links the DrinkType concept (e.g., tea, coffee, chocolate) to the MakeHotDrink concept. The resulting ontologies, essentially knowledge models, are able to
encode and represent domain knowledge and heuristics. This avoids manual class labeling, pre-processing and training processes in traditional data-centered approaches to activity recognition. In addition, ontologies allow agents to interpret data/information and reason against ontological contexts, thus enhancing the capabilities of automated data interpretation and inference. Once domain knowledge of the smart home environment is captured, it can be formally modeled using ontologies, which include context ontologies and activity ontologies. Context ontologies consist of classes and properties for describing SH entities such as Device, Furniture, Location, Time and Sensor, and their interrelationships with an activity class. Each sensor monitors and reflects one facet of a situation. By aggregating individual sensor observations, a contextual snapshot at a specific time point, i.e., a situation, can be generated, which can be used to perform activity recognition. Activity ontologies are the explicit representation of a hierarchy of activities that consists of activity types and their relationships in a problem domain. Activities in activity ontologies are modeled not only based on objects, environmental elements and events but also the interrelationships between them, such as is-a or part-of relations. This allows an assistive system/agent to take advantage of semantic reasoning directly to infer activities rather than using the traditional probabilistic methods. Ontological activity recognition is closer to the logical approach in nature. It uses a logic based markup language, e.g., OWL or RDF [40], for specifying activities and their descriptors and relationships.
The major strength of ontology-based activity recognition is that the explicit commonly shared specification of terms and relationships for all relevant entities, e.g., objects, environment elements and events, facilitates interoperability, reusability and portability of the models between different systems and application domains.
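To make the class and property structure described above concrete, the following Python sketch models a small activity hierarchy. It is a plain-Python stand-in and not OWL or RDF: MakeTea, MakeHotDrink, hasDrinkType and DrinkType come from the text, while the ActivityClass type, the ADL root concept and the way properties are inherited are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ActivityClass:
    """A concept in a toy activity ontology (hypothetical structure)."""
    name: str
    parent: Optional["ActivityClass"] = None
    properties: Dict[str, str] = field(default_factory=dict)

    def is_subclass_of(self, other: "ActivityClass") -> bool:
        """Follow is-a links upwards to test the sub-class relation."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False

    def all_properties(self) -> Dict[str, str]:
        """Collect properties, letting sub-classes refine inherited ones."""
        inherited = self.parent.all_properties() if self.parent else {}
        return {**inherited, **self.properties}


adl = ActivityClass("ADL")  # assumed root concept
make_hot_drink = ActivityClass("MakeHotDrink", adl, {"hasDrinkType": "DrinkType"})
make_tea = ActivityClass("MakeTea", make_hot_drink, {"hasDrinkType": "tea"})

print(make_tea.is_subclass_of(make_hot_drink))
print(make_tea.all_properties())
```

In an actual deployment these relations would be declared in an ontology language and checked by a reasoner; the sketch only shows the shape of the knowledge being encoded.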
1.3.3
Semantic sensor metadata creation
In a SH sensor data are generated continuously, and activity assistance needs to be provided dynamically, both along a timeline. This requires that semantic enrichment of sensor data should be done in real time so that the activity inference can take place. To this end, domain specific dedicated lightweight annotation mechanisms and tools are required. Given the nature of data in SH a two-phase semi-automatic approach to generating semantic descriptions is required. In the first phase data sources such as sensors and devices are manually semantically described. As the number of data sources in a SH is relatively limited, though large, it is manageable to create all semantic instances manually by generic
ontology editors such as the Protégé OWL Plugin. In the second phase dynamically collected sensor data are first converted to textual descriptors. For example, a contact sensor returns a two-state binary value. It can be pre-processed to literals suitable for denoting two states such as on/off, open/close or used/unused. The concrete interpretation of the state depends on the purpose of the sensor. For example, the two states of a contact sensor in a microwave could be open/close. If the contact sensor is attached to a milk bottle, the literal might be used or unused. The conversion of numerical values to descriptive terms facilitates interpretation and comprehension for both humans and machines. Pre-processed data can then be automatically attached to semantic instances of the corresponding data source to create a semantic knowledge repository. All these operations are performed through daemon-like software tools embedded in the implemented system.
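The two-phase approach above can be sketched in Python as follows. The sketch assumes a dictionary in place of manually authored semantic instances, and the sensor identifiers, field names and timestamp are hypothetical; only the open/close and used/unused literals are taken from the text.

```python
from datetime import datetime, timezone

# Phase 1 (manual): hypothetical semantic descriptions of the data sources.
# The "states" pair gives the literal for a raw value of 0 and of 1.
SENSOR_REGISTRY = {
    "contact-01": {"attachedTo": "microwave", "states": ("close", "open")},
    "contact-02": {"attachedTo": "milk_bottle", "states": ("unused", "used")},
}


def annotate(sensor_id, raw_value, timestamp):
    """Phase 2 (automatic): turn a binary reading into a descriptive record."""
    description = SENSOR_REGISTRY[sensor_id]
    literal = description["states"][1 if raw_value else 0]
    return {
        "sensor": sensor_id,
        "attachedTo": description["attachedTo"],
        "state": literal,
        "time": timestamp.isoformat(),
    }


reading_time = datetime(2011, 1, 1, 10, 0, tzinfo=timezone.utc)  # arbitrary example time
record = annotate("contact-02", 1, reading_time)
print(record)
```

Keeping the interpretation of each state in the sensor's description means the same pre-processing function serves every contact sensor, whatever it is attached to.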
1.3.4
Semantic sensor metadata storage and retrieval
Once semantic data are generated, they can be archived in semantic repositories for later exchange or consumption by services and applications. Repositories may be centralised in one location or distributed in geographically dispersed sites. As all repositories share the same model, and often use the same type of access APIs, there is little difference in the retrieval of semantic data. Nonetheless, distributed repositories are required to deal with issues pertaining to security and communication bandwidth. Within SH based assistive living, data may be exchanged and shared between institutions in different countries at a global scale. It would be desirable for each institution to have a repository and its own authorisation and authentication control for the enforcement of local data usage policies and ethical issues. On the other hand, as the volume of various data in a single SH is expected to be reasonably low, a centralised repository should be cost effective and easy for management. A centralised repository that consists of two interlinked components, as shown in Figure 1.2 can be developed for semantic data management. The first component contains semantic descriptions relating to the various sensors, devices, inhabitants and the services offered within an institution. These entities and their semantic descriptions are relatively stable for a care institution, i.e., static data. This component can functionally serve as a registry so that new SHs once built within the institution, devices once added to any individual SH, inhabitants once they take residence in a SH and new services once developed, can all be registered for later discovery and reuse. The second component is dedicated to the storage of dynamically generated sensor data and derived high level ADL data, which
are time dependent, varying and extensible, i.e., dynamic data. Static data only need to be described and recorded once, while dynamic data have to be recorded whenever they are generated. The separation of their storage saves storage space and also increases recording efficiency. Another advantage of this design is its ability to support dynamic, automatic discovery of devices, device data, services and inhabitants, thus facilitating reuse of data and services. Further details of these concepts will be presented in the following section.
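The separation of static and dynamic data can be illustrated with a small in-memory Python sketch. It is not the repository design of Figure 1.2: the class name, method names and entity descriptions are hypothetical, and a real repository would use a semantic store rather than Python containers.

```python
class SemanticRepository:
    """Toy in-memory repository with a static registry and a dynamic log."""

    def __init__(self):
        self.registry = {}   # static data: described and recorded once
        self.readings = []   # dynamic data: appended whenever generated

    def register(self, entity_id, description):
        """Record the description of a sensor, device, inhabitant or service."""
        self.registry[entity_id] = description

    def record(self, entity_id, timestamp, value):
        """Store a reading that refers to, rather than repeats, the static description."""
        if entity_id not in self.registry:
            raise KeyError(f"unregistered entity: {entity_id}")
        self.readings.append((entity_id, timestamp, value))

    def discover(self, **criteria):
        """Find registered entities whose description matches all criteria."""
        return sorted(
            entity_id
            for entity_id, description in self.registry.items()
            if all(description.get(key) == value for key, value in criteria.items())
        )


repo = SemanticRepository()
repo.register("contact-01", {"type": "ContactSensor", "location": "kitchen"})
repo.register("pressure-01", {"type": "PressureMat", "location": "bedroom"})
repo.record("contact-01", "10:00", "open")
print(repo.discover(location="kitchen"))
```

Because each reading carries only an identifier, the description of a device is stored once however many readings it produces, which is the storage saving the text refers to.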
Fig. 1.2 The semantic repository
1.3.5 Activity recognition
In ontological SH modeling, activities are modeled as activity classes in the ADL ontologies and contextual information such as time, location and the entities involved is modeled as properties for describing activity classes. As such, a situation at a specific time point is actually a concept description created from SH contextual ontologies, denoting an unknown activity. In this case, activity recognition can be mapped to the classification of the unknown activity into the right position of the class hierarchy of the activity ontologies and the identification of the equivalent activity class. This is exactly the subsumption problem in DL, i.e., to decide if a concept description C is subsumed by a concept description
D, denoted as C ⊑ D. The commonly used tableau proof system uses negation to reduce subsumption to unsatisfiability of concept descriptions, which can be described as follows.
• Reduce subsumption to an unsatisfiability check, i.e., deciding whether a concept C is subsumed by a concept D is reduced to checking the satisfiability of the conjunction of C and the negation of D: C ⊑ D ⟶ C ⊓ ¬D
• Check whether an instance b of this resulting concept description can be constructed
• Build a tree-like model for the concept description
• Transform the concept description into Negation Normal Form
• Decompose the description using tableau transformation rules
• Stop when a clash occurs or no more rules are applicable
• If each branch in the tableau contains a clash, the concept is inconsistent, and hence the subsumption holds
Specifically, a situation, i.e., an unknown concept description at a specific time point, can be generated by linking sensor observations to properties of the context ontologies and incrementally fusing a sequence of sensor observations (as illustrated in Figures 1.3 and 1.5). For example, the activation of the contact sensors in a cup and a milk bottle can link the cup and milk to the unknown activity through the hasContainer and hasAddings properties. By aggregating sensor observations along a time line, a specific situation that corresponds to an unknown activity can be constructed, e.g., hasTime(10am), hasLocation(kitchen), hasContainer(cup) and hasAddings(milk). If the closest ADL class in the ADL ontologies can be found, i.e., the class that contains as many of the perceived properties of the situation as possible, e.g., MakeDrink, then it can be deemed to be the type of ADL for the identified situation.
1.3.6 Activity model learning
As activity models play a critical role in mining real-time sensor data for activity recognition, ensuring that activity models are complete and accurate is of paramount importance. While ADL ontologies have the advantage of providing knowledge-rich activity models, it is difficult to manually build comprehensive ADL ontologies. In particular, given the complexity of ADLs, the differences in the ways and capabilities of users carrying out ADLs, and the levels of granularity at which an ADL can be modeled, building complete one-for-all ADL ontologies is not only infeasible but also too rigid to adapt to various evolving use scenarios. To address this problem, we can use the manually developed ADL ontologies as the seed ADL ontologies. The seed ontologies are, on the one hand, used to recognize activities as described in Section 1.3.5. On the other hand, we developed learning algorithms that can learn activity models from sensor activations and the classified activity traces. As such, ADL ontologies can grow naturally as they are used for activity recognition. This is in effect a self-learning process that adapts to user ADL styles and use scenarios. Consider that an activity description denoted by a number of sensor activations
can be subsumed by an abstract ADL, but no matching equivalent ADL concept is found. This means that the situation represents a new activity. The activity is a subclass of the high-level abstract ADL but does not belong to any of its existing subclasses in the seed ontologies. If such a situation appears frequently, it is reasonable to declare the description as a new ADL class and insert the class into the corresponding place within the seed ontology hierarchy. As an example, consider the following scenario: if sensors attached to a kitchen door, a container and a bottle (beer or wine) are activated in a regular pattern for a period of time, it is reasonable to assume that the contextual description given by these sensor observations represents a regular activity. By performing recognition reasoning, an agent can classify the contextual description as a subclass of MakeColdDrink, but it cannot categorize it into any of the subclasses of MakeColdDrink, e.g., MakeIceWater, MakeJuice. In this case a new ADL such as MakeAlcoholDrink can be declared as a new ADL class and inserted into the hierarchy below the MakeColdDrink class. With this refined ADL model, when the same sensor activation pattern happens again, the assistive agent will be able to recognize the MakeAlcoholDrink ADL and be capable of providing assistance if necessary. While new ADL models can be learnt and added into ADL ontologies, a software agent will not be able to assign a proper term to the learnt ADL or validate the model. Therefore, human intervention, in particular with regard to model evaluation, validation and naming, is indispensable. Nevertheless, the learning mechanism can automatically identify trends and recommend how the ADL models could be improved.
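The learning idea in this section can be sketched as a frequency count over situations that fall under an abstract ADL without matching any existing class exactly. This is only an illustration under strong assumptions, not the learning algorithm referred to above: subsumption is approximated by comparing property-value pairs instead of calling a DL reasoner, and the seed definitions, the `hasColdDrinkType` property, the `glass` container and the `min_support` threshold are hypothetical.

```python
from collections import Counter

# Hypothetical seed classes: class name -> defining property values.
SEED = {
    "MakeColdDrink": {"hasLocation": "kitchen", "hasContainer": "glass"},
    "MakeIceWater": {"hasLocation": "kitchen", "hasContainer": "glass",
                     "hasColdDrinkType": "water"},
    "MakeJuice": {"hasLocation": "kitchen", "hasContainer": "glass",
                  "hasColdDrinkType": "juice"},
}


def subsumed_by(situation, class_props):
    # Simplification: a situation falls under a class if it asserts
    # every property value that defines the class.
    return class_props.items() <= situation.items()


def classify(situation, seed):
    """Return (most specific subsuming class, True if it matches exactly)."""
    matches = [name for name, props in seed.items() if subsumed_by(situation, props)]
    if not matches:
        return None, False
    best = max(matches, key=lambda name: len(seed[name]))
    return best, seed[best] == situation


def propose_new_classes(traces, seed, min_support=3):
    """Recurring situations with a subsuming class but no equivalent class."""
    counts = Counter()
    for situation in traces:
        parent, exact = classify(situation, seed)
        if parent is not None and not exact:
            counts[(parent, frozenset(situation.items()))] += 1
    # Naming and validating the proposed class is left to a human expert.
    return [(parent, dict(props)) for (parent, props), n in counts.items()
            if n >= min_support]


wine = {"hasLocation": "kitchen", "hasContainer": "glass", "hasColdDrinkType": "wine"}
juice = {"hasLocation": "kitchen", "hasContainer": "glass", "hasColdDrinkType": "juice"}
proposals = propose_new_classes([wine, juice, wine, wine], SEED)
```

In this run the wine situation recurs three times under MakeColdDrink without an equivalent class, so it is returned as a candidate subclass, while the juice situation matches MakeJuice exactly and is ignored.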
1.3.7 Activity assistance
With activity ontologies as activity models, and activity instances from a specific inhabitant as the inhabitant’s activity profile, the proposed approach can support both coarse-grained and fine-grained activity assistance. The former is directly based on subsumption reasoning at concept (or class) level, i.e., the TBox, while the latter is based on subsumption reasoning at instance level, i.e., the ABox (an inhabitant’s ADL profile). For coarse-grained activity assistance, the process is nearly the same as the activity recognition described in Section 1.3.5. The extra step is to compare the properties of the recognized activity with the properties identified by sensor observations. The missing property(ies) can then be used to suggest next action(s). For example, if the recognized activity is MakeTea with properties hasContainer, hasAddings and hasHotDrinkType, and the sensor activations indicate tea as HotDrinkType and cup as Container, then advice on Addings such as milk or sugar can be provided. For
fine-grained personalized activity assistance, it is necessary to identify how an inhabitant performs the recognized type of activity in terms of their ADL profile. The discovered ADL instance can then be compared with what has already been performed to decide what needs to be done next in order to accomplish the ongoing ADL. For example, an unknown ADL has been identified in the context of hasContainer(cup), hasAddings(milk) and hasHotDrinkType(coffee). Using the aforementioned recognition mechanism, the MakeHotDrink ADL can first be recognised. Suppose the inhabitant has specified her/his MakeHotDrink ADL profile as hasContainer(cup), hasAddings(milk), hasHotDrinkType(coffee), hasAddings(sugar) and hasHotWater(hotWater). Through comparison, an assistive system can infer that sugar and hot water are needed in order to complete the ADL. As the matching happens at the instance level, i.e., based on the way the inhabitant performs the activity and what actually happened, it can provide inhabitants with personalised assistance. Activity assistance can be provided in a progressive manner, i.e., initially at coarse-grained level and then at fine-grained level. For instance, the MakeDrink ADL may be first identified before proceeding to recognise a concrete MakeDrink activity, such as MakeHotDrink or MakeColdDrink. In this case ADL assistance can be provided first at coarse-grained level, i.e., an assistive system suggests the immediate subactivity classes to an inhabitant without specifying concrete actions. For the above example, the system will simply offer a reminder that the following ADL could be MakeHotDrink or MakeColdDrink. Once an inhabitant takes further actions by following the coarse-grained assistance, the unknown activity will unfold gradually towards the leaf activity, and the growing body of sensor data will help to recognise that leaf activity. At this stage fine-grained activity assistance can be provided.
1.4 An exemplar case study
We use the “MakeDrink” ADL as an application scenario to demonstrate ontology-based activity recognition. In the case study we first build SH ontologies using the Protégé ontology editor, covering both context and ADL ontologies, as shown in Figure 1.3. The MakeDrink scenario is designed based on our Smart Lab [2] environment, which includes a real kitchen. In the kitchen various sensors are attached to objects pertaining to ADLs. For example, contact sensors are attached to the entry door, to the containers of teabags, sugar, milk, coffee and chocolate, and to a cup. The activation of a sensor indicates the occurrence of an action involving the object to which the sensor is attached. The event will be interpreted
Fig. 1.3 A fragment of the SH context and ADL ontologies
and mapped to a corresponding concept or property in the SH ontologies, giving specific contextual information. For instance, the activation of a door sensor means that the user is located in the kitchen. The activation of a coffee container sensor means that coffee is used for an ADL as a drink type. Other sensors such as motion detectors and pressure sensors are also used in the lab for various purposes, e.g. the detection of an inhabitant and their exact location. To simplify the description, we only use a small number of properties to illustrate the mechanism. The scenario runs as follows: in the first instance the kitchen door sensor is activated. Then the system detects the activation of the sensors attached to a cup, a teabag container and a milk bottle in a temporal sequence. Suppose no other sensors were activated during the period in which the above sensors became active. The question is therefore: which ADL just took place? To infer the ADL, it is necessary to map sensor activations to the state changes of the objects to which the sensors are attached. The mapping relations are modelled by the Sensor and HomeEntity classes in the SH ontologies. The hasSensor property in the HomeEntity class links each home object such as sugar, fridge, microwave and cup to the attached sensor such as a contact sensor. Then the sensor readings contained in the hasSensorState property in the Sensor class can be mapped to the values of the hasEntityState property in the HomeEntity class. For example, the activation of a contact sensor in a sugar container is mapped to the “used” state for the hasEntityState property of the Sugar class. As the Sugar class is a subclass of the Addings class in the ontology and the Addings class is a filler class for the hasAdding property, this means that an anonymous ADL has the hasAdding property. In this way, the sequence of sensor activations in the above scenario can denote
an anonymous ADL that is described by the following contextual properties, i.e. hasLocation, hasContainer, hasAdding and hasHotDrinkType. Based on the situational context and the context ontologies, ADL inference can be performed in a number of ways, which are described below using Protégé Version 4.
ADL Recognition: The ADL recognition algorithm described in Section 1.3.5 can be performed in the FaCT++ reasoner [41], which is bundled with Protégé 4 as a backend reasoner. Protégé 4 provides a query frontend, i.e. the DL Query tab, through which complex query expressions can be framed. Fig. 1.4 shows the implementation process of the ADL recognition. We first input the situational context described above into the class expression pane using a simplified OWL DL query syntax. When the Execute button is pressed, the specified context is passed on to the backend FaCT++ reasoner to reason against the ontological ADL models. The results returned are displayed in the Super classes, Sub classes, Descendant classes, Instances and Equivalent classes panes, which can be interpreted as follows:
• If a class in the Super classes pane is exactly the same as the one in the Sub classes pane, then the class can be regarded as the ongoing ADL.
• If a class in the Super classes pane is different from the one in the Sub classes pane, then the inferred ADL can be viewed as an intermediate ADL between a superclass and a subclass. In our example, the situation can be interpreted as a type of MakeHotDrink ADL having been performed. Even though MakeTea is a subclass of the inferred ADL, it is not possible to claim that it is a MakeTea ADL. This means some descriptive properties may be missing, e.g. a hasHotWater property may be needed in order to ascertain the inferred ADL as the MakeTea ADL.
• In both cases described above, the Instances pane contains recognised concrete instances of the inferred ADL.
These instances usually model and represent a user’s preferred way of performing the class of inferred ADL. The activities recognised in the first two cases above are coarse-grained, i.e., an assistive agent can only suggest a type (class or subclass) of activity to an inhabitant as the ongoing ADL, e.g. to remind an inhabitant “are you going to make tea?”. To recognise fine-grained personalised ADLs, an assistive agent needs to compare the perceived sensorised contextual information with the specified property values of the instances in the Instances pane. Fig. 1.5 shows the main properties of the UserA_Preferred_Tea ADL. In comparison with the perceived context of cup, hot water, location and teabag, an assistive agent can recognise that the user is making a Chinese tea with skimmed milk and sugar along with a China
Fig. 1.4 A fragment of the ADL recognition process
cup. Based on fine-grained activity recognition, advice on completing the ongoing ADL can be tailored to a user’s ADL preferences.
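The instance-level comparison behind fine-grained recognition and assistance can be sketched as a set comparison between observed property assertions and a stored profile. The sketch below is an assumption-laden illustration, not the agent's implementation: the profile names are hypothetical, the coffee profile reuses the property values of the MakeHotDrink profile example in Section 1.3.7, and the tea profile is invented for contrast.

```python
# Profiles as sets of (property, value) assertions; names are hypothetical.
PROFILES = {
    "coffee_profile": {
        ("hasContainer", "cup"), ("hasAddings", "milk"),
        ("hasHotDrinkType", "coffee"), ("hasAddings", "sugar"),
        ("hasHotWater", "hotWater"),
    },
    "tea_profile": {
        ("hasContainer", "cup"), ("hasAddings", "milk"),
        ("hasHotDrinkType", "tea"), ("hasHotWater", "hotWater"),
    },
}


def consistent_profiles(profiles, observed):
    """Names of profile instances that contain every observed assertion."""
    return sorted(name for name, props in profiles.items() if observed <= props)


def remaining_steps(profile, observed):
    """Assertions in the profile that have not been observed yet."""
    return sorted(profile - observed)


observed = {("hasContainer", "cup"), ("hasAddings", "milk"),
            ("hasHotDrinkType", "coffee")}
candidates = consistent_profiles(PROFILES, observed)
advice = remaining_steps(PROFILES[candidates[0]], observed)
```

Here only the coffee profile is consistent with the observations, and the set difference yields the sugar and hot water assertions as the steps still to be suggested.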
Fig. 1.5 An example of a user’s ADL profile
Progressive ADL Recognition under incomplete sensor observations: The proposed ontology-based, domain-driven approach can perform progressive incremental ADL recognition under incomplete sensor observations. Fig. 1.6 shows screenshots of the recognition operation at three sequential time points. As can be seen, with only location context, i.e. the kitchen door sensor activated, the agent can infer that the ongoing ADL is definitely a KitchenADL and one of the five ADL instances shown in the left column of the screenshot. As the ADL progresses, more contextual information can be captured and used to
reason against the ontological ADL models. This will allow incremental ADL recognition. For example, with the hasContainer, hasAdding and hasHotWater properties, the agent can infer the ongoing ADL as the MakeHotDrink ADL with three instances, as shown in the middle column of Figure 1.6. With the hasHotDrinkType property captured, the agent can further recognise that this is a type of MakeTea ADL with only one instance, as indicated in the right column.
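The progressive narrowing described above can be imitated with a small sketch in which each sensor activation adds one property to the situation and the most specific matching class is reported after every step. This is a simplification under stated assumptions: class definitions are approximated by required properties rather than evaluated by FaCT++, and the sensor identifiers (including the `kettle` sensor), the mapping table and the class definitions are hypothetical.

```python
# Hypothetical mapping from sensor activations to contextual properties.
SENSOR_TO_PROPERTY = {
    "kitchen_door": ("hasLocation", "kitchen"),
    "cup": ("hasContainer", "cup"),
    "milk_bottle": ("hasAdding", "milk"),
    "kettle": ("hasHotWater", "hotWater"),
    "teabag_container": ("hasHotDrinkType", "tea"),
}

# Hypothetical class definitions: property -> required value (None = any value).
HOT_DRINK = {"hasLocation": "kitchen", "hasContainer": None,
             "hasAdding": None, "hasHotWater": None}
ADL_CLASSES = {
    "KitchenADL": {"hasLocation": "kitchen"},
    "MakeHotDrink": HOT_DRINK,
    "MakeTea": {**HOT_DRINK, "hasHotDrinkType": "tea"},
    "MakeCoffee": {**HOT_DRINK, "hasHotDrinkType": "coffee"},
}


def satisfies(situation, definition):
    # Every required property must be present, with the required value if one is set.
    for prop, required in definition.items():
        if prop not in situation:
            return False
        if required is not None and situation[prop] != required:
            return False
    return True


def recognise(situation):
    """Most specific class whose definition the situation satisfies, or None."""
    matches = [name for name, d in ADL_CLASSES.items() if satisfies(situation, d)]
    return max(matches, key=lambda name: len(ADL_CLASSES[name]), default=None)


def progressive_recognition(sensor_events):
    """Recognised class after each sensor activation, in temporal order."""
    situation, trace = {}, []
    for sensor in sensor_events:
        prop, value = SENSOR_TO_PROPERTY[sensor]
        situation[prop] = value
        trace.append(recognise(situation))
    return trace


trace = progressive_recognition(
    ["kitchen_door", "cup", "milk_bottle", "kettle", "teabag_container"])
```

With these definitions the result stays at KitchenADL until the hot water property arrives, moves to MakeHotDrink, and reaches MakeTea once the drink type is observed.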
Fig. 1.6 Screenshots of progressive ADL recognition
It is worth pointing out that the discussions in this section are based on a simplified application scenario, i.e., “MakeDrink” ADL recognition. Its main purpose is to demonstrate the proposed approach, its implementation and operation. Though the experiments involve only part of the SH ontologies, the methods and mechanisms can be extended and applied to complex scenarios. Our experiments have been carried out using the latest Protégé toolsets. As all tools in Protégé expose application programming interfaces, it is straightforward to develop an integrated system as outlined in Section 1.3.
1.5 Emerging research on activity recognition
1.5.1 Complex activity recognition
Most existing work on activity recognition is built upon simplified use scenarios, normally focusing on single-user single-activity recognition. In real world situations, human activities are often performed in complex manners. These include, for example, a single actor performing interleaved and concurrent activities and/or a group of actors interacting with each other to perform joint activities. Clearly, existing research results, i.e., the approaches and algorithms described in previous sections, cannot be applied directly to the recognition of complex activities. Researchers in related communities have realised this knowledge gap and more attention is being focused on this area. This shift of research emphasis is also driven by the increasing demand for scalable solutions that are deployable to real world use cases. Nevertheless, research endeavors in this niche field are still in their infancy. In the modelling and recognition of complex activities of a single user, Wu et al. [42] proposed an algorithm using a factorial conditional random field (FCRF) for recognizing multiple concurrent activities. This model can handle concurrency but cannot model interleaving activities and cannot be easily scaled up. Hu et al. [43] proposed a two-level probabilistic and goal-correlation framework that deals with both concurrent and interleaving goals from observed activity sequences. They exploited skip-chain conditional random fields (SCCRF) at the lower level to estimate the probabilities of whether each goal is being pursued given a newly observed activity. At the upper level they used a learnt graph model of goals to infer goals in a “collective classification” manner. Modayil et al. [44] introduced Interleaved Hidden Markov Models to model both inter-activity and intra-activity dynamics. To reduce the size of the state space, they used an approximation for recognizing multitasked activities. Gu et al. [45] proposed an Emerging Patterns based approach to Sequential, Interleaved and Concurrent Activity Recognition (epSICAR). They exploit Emerging Patterns as powerful discriminators to differentiate activities.
Different from other learning-based models built upon the training dataset for complex activities, they built activity models by mining a set of Emerging Patterns from the sequential activity trace only and applied these models in recognizing sequential, interleaved and concurrent activities. In the modeling and recognition of complex activities of groups or multiple occupants, existing work has mainly focused on vision analysis techniques for activity recognition from video data. Various HMM models have been developed for modeling an individual person’s behavior, interactions and probabilistic data associations. These include the dynamically multi-linked HMM model [46], the hierarchical HMM model [47], the Coupled HMM [48], the mixed-memory Markov model [49] and the Layered Hidden Markov Models (LHMMs) [50]. DBN models are also extensively used to model human interaction activities [51, 52], in both cases using video cameras. Lian et al. [53] used FCRF to conduct inference and learning from patterns of multiple concurrent chatting activities based on audio streams. Work on using dense sensing for complex activity recognition is rare. Lin et
al. [54] proposed a layered model to learn multiple users’ activity preferences based on readings from sensors deployed in a home environment. Nevertheless, their focus is on learning preference models of multiple users rather than on recognizing their activities. Wang et al. [55] used CHMMs to recognize multi-user activities from dense sensor readings in a smart home environment. They developed a multimodal sensing platform and presented a theoretical framework to recognize both single-user and multi-user activities. Singla et al. [56] proposed a single HMM model for two residents. The model can represent not only transitions between activities performed by one person, but also transitions between residents and transitions between different activities performed by different residents. As such, their probabilistic models of activities are able to recognize activities in complex situations where multiple residents are performing activities in parallel in the same environment.
1.5.2 Domain knowledge exploitation
As can be seen from the above discussions, at present, there is a multitude of sensing technologies, multimodal devices and communication platforms being developed and deployed in smart environments for activity monitoring. There is an abundance of approaches and algorithms for activity recognition in various scenarios, including a single user performing a single activity, a single user performing interleaved multiple activities and multiple users performing complex activities. Nevertheless, existing endeavors for activity monitoring and recognition suffer from several main drawbacks. Firstly, sensor data generated from activity monitoring, in particular when multimodal sensors and different types of sensors are used, are primitive and heterogeneous in format and storage, and separated from each other in both structure and semantics. Such data sets are usually ad hoc and lack descriptions, and are thus difficult to exchange, share and reuse. To address this problem researchers have made use of domain knowledge to develop high-level formal data models. Nugent et al. [57] proposed a standard XML schema, HomeML, for smart home data modelling and exchange; Chen et al. [58] proposed context ontologies to provide high-level descriptive sensor data models and related technologies for semantic sensor data management, aiming to facilitate semantic data fusion, sharing and intelligent processing. We believe knowledge-rich data modelling and standardisation supported by relevant communities is a promising direction towards a commonly accepted framework for sensor data modelling, sharing and reuse. Secondly, current approaches and algorithms for activity recognition are often carefully
handcrafted for well-defined specific scenarios. Existing proof-of-concept systems are mainly built by plumbing and hardwiring fragmented, disjointed, and often ad hoc technologies. This makes these solutions dependent on environment layout, sensor types and installation, and specific application scenarios, i.e., they lack interoperability and scalability. The latest experiments performed by Biswas et al. [59] indicated that it is difficult to replicate a solution in different environments even for the same, simplest single-user single-activity application scenario. This highlights the challenge of generalising approaches and algorithms of activity recognition to real world use cases. While it is not realistic to pre-define one-size-fits-all activity models due to the number of activities and the variation in the way activities are performed, it is desirable that rich domain knowledge be exploited to produce initial explicit generic activity models. These models are later used, on the one hand, to generate fine-grained individual-specific activity models, and on the other hand, to evolve towards completion through learning. Chen et al. [60] proposed activity ontologies for this purpose and initial results are promising. Further work is needed along this line. Domain knowledge will certainly play a dominant role when activity recognition is designed as a component of a complete system, e.g., as an input to support inference and decision making. An envisioned application is to use activity recognition to perform behavioural or functional assessment of adults in their everyday environments. This type of automated assessment also provides a mechanism for evaluating the effectiveness of alternative health interventions. For example, Patel et al. [61] used accelerometer sensor data to analyse and assess activities of patients with Parkinson’s disease.
They developed analysis metrics and compared the results with assessment criteria from domain experts to estimate the severity of symptoms and motor complications. This demonstrates that domain knowledge about activity profiling and assessment heuristics is valuable for providing automated health monitoring and assistance in an individual’s everyday environment.
1.5.3 Infrastructure mediated activity monitoring
Although many sensor-based activity recognition systems have been developed in the past decade, most of them are still set in experimental environments. To be deployed in real-life settings, activity recognition systems must be scalable, inexpensive, and easy to install and maintain. Instead of installing an extensive sensing infrastructure or a large number of low-cost sensors, it is important to leverage a home’s existing infrastructure to “reach into the home” with a small set of strategically-placed sensors [63]. Infrastructure mediated
activity monitoring requires selecting appropriate sensors and designing elaborate methods for combining the sensors with the existing infrastructure. Patel et al. [62] detected human movement by differential air pressure sensing in HVAC system ductwork. Fogarty et al. [63] deployed a small number of low-cost sensors at critical locations in a home’s existing water distribution infrastructure and inferred activities in the home based on water usage patterns. These systems are useful starting points for exploring this direction.
1.5.4 Abnormal activity recognition
Existing systems recognize activities at various levels, such as action level, ADL (Activity of Daily Living) level and high level, but all of these are normal activities. Another kind of activity that deserves attention is abnormal activity. Detecting abnormal activities is a particularly important task in security monitoring and healthcare applications. Nevertheless, the problem is challenging. First, there is no single definition of an abnormal activity. For instance, if everyone performs activity A and one person performs activity B, the latter can be called an abnormal activity. Yin et al. [64] defined abnormal activities as events that occur rarely and have not been expected in advance. Second, abnormal activity detection suffers from an unbalanced data problem. A much larger proportion of sensing data relates to normal activities, while data for abnormal ones are extremely scarce, which makes training a classification model quite difficult.
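The difficulty caused by unbalanced data can be made concrete with a small worked example. The figures below are invented for illustration and do not come from any of the cited studies: with 98 normal and 2 abnormal samples, a trivial classifier that always predicts "normal" looks accurate while detecting no abnormal activity at all.

```python
def accuracy_and_recall(y_true, y_pred, positive):
    """Overall accuracy and recall for the class named by `positive`."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    # Predictions made for the samples whose true label is the positive class.
    preds_for_positive = [p for t, p in zip(y_true, y_pred) if t == positive]
    recall = sum(p == positive for p in preds_for_positive) / len(preds_for_positive)
    return correct / len(y_true), recall


# Hypothetical data set: 98 normal and 2 abnormal activity samples.
y_true = ["normal"] * 98 + ["abnormal"] * 2
y_pred = ["normal"] * 100  # a classifier that always predicts the majority class
accuracy, recall = accuracy_and_recall(y_true, y_pred, positive="abnormal")
```

The accuracy is 0.98 while the recall on the abnormal class is 0.0, which is why accuracy alone is a poor training or evaluation signal when abnormal samples are scarce.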
1.6 Conclusions
There is no doubt that intelligent pervasive environments and applications will pervade future working and living spaces, transform our lives and impact our society. Activity recognition is becoming an increasingly important determinant of the success of context-aware personalised pervasive applications. Synergistic research efforts in various scientific disciplines, e.g., computer vision, artificial intelligence, sensor networks and wireless communication to name but a few, have brought us a diversity of approaches and methods to address this issue. In this chapter we first presented a focused review of the state of the art of activity recognition and described the strengths and weaknesses of both approaches and algorithms. It becomes evident that new approaches and methods are required to deal with sensor data of multiple modalities and the large number of activities of different nature and complexity in the context of ever-growing novel pervasive applications. In particular, such approaches and methods should tackle technical challenges in terms of their robustness to real-world conditions and real-time performance, e.g., applicability, scalability and reusability. The ontology-based approach to activity recognition has recently emerged and initial results have shown it to be a promising direction. As such we introduced the practice and lifecycle of the ontology-based approach, covering ontological modelling, representation and inference of sensors, objects and activities in the lifecycle of activity recognition. We have outlined an integrated system architecture to illustrate the realisation of the proposed approach. In the context of ambient assisted living, we have analysed the nature and characteristics of ADLs and developed the concepts of ADL ontologies. We have described the algorithms of activity recognition, making full use of the reasoning power of semantic modelling and representation. We have used a simple yet convincing example scenario to illustrate the use of the approach for a real world problem. Compared with traditional approaches, ontological ADL models are flexible and can be easily created, customised, deployed and scaled up. Description logic reasoning can provide advanced features such as exploitation of domain knowledge, progressive activity recognition and multiple levels of recognition. We also outlined and discussed future research problems and specific issues. In particular, we highlighted the necessity of moving from simple activity scenarios to complex real world situations, e.g., interleaved and concurrent activities and multiple users. To address these challenges we highlighted the importance of using domain knowledge and a number of issues to be addressed. We believe this chapter provides an overview and a reference for researchers in this active research community.
References
[1] Weiser M., (1991), The computer for the twenty-first century, Scientific American, Vol. 265, No. 3, pp. 94–104.
[2] Nugent C.D., (2008), Experiences in the Development of a Smart Lab, The International Journal of Biomedical Engineering and Technology, Vol. 2, No. 4, pp. 319–331.
[3] Ivanov Y. and Bobick A., (2000), Recognition of Visual Activities and Interactions by Stochastic Parsing, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8, pp. 852–872.
[4] Stauffer C. and Grimson W.E., (2000), Learning Patterns of Activity Using Real-Time Tracking, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 8, pp. 747–757.
[5] Bodor R., Jackson B., and Papanikolopoulos N., (2003), Vision based human tracking and activity recognition, In Proceedings of the 11th Mediterranean Conference on Control and Automation.
[6] Fiore L., Fehr D., Bodor R., Drenner A., Somasundaram G., and Papanikolopoulos N., (2008), Multi-Camera Human Activity Monitoring, Journal of Intelligent and Robotic Systems, Vol. 52, No. 1, pp. 5–43.
[7] Bao L. and Intille S., (2004), Activity recognition from user-annotated acceleration data, In Proc. Pervasive, LNCS 3001, pp. 1–17.
[8] Huynh D.T.G., (2008), Human Activity Recognition with Wearable Sensors, PhD thesis, TU Darmstadt.
[9] Patterson D.J., Fox D., Kautz H., and Philipose M., (2005), Fine-grained activity recognition by aggregating abstract object usage, In Proc. of IEEE International Symposium on Wearable Computers, pp. 44–51.
[10] Lee S.W. and Mase K., (2002), Activity and location recognition using wearable sensors, IEEE Pervasive Computing, Vol. 1, No. 3, pp. 24–32.
[11] Parkka J., Ermes M., Korpipaa P., Mantyjarvi J., Peltola J., and Korhonen I., (2006), Activity classification using realistic data from wearable sensors, IEEE Transactions on Information Technology in Biomedicine, Vol. 10, No. 1, pp. 119–128.
[12] Philipose M., Fishkin K.P., Perkowitz M., Patterson D.J., Hahnel D., Fox D., and Kautz H., (2004), Inferring Activities from Interactions with Objects, IEEE Pervasive Computing: Mobile and Ubiquitous Systems, Vol. 3, No. 4, pp. 50–57.
[13] Chan M., Estéve D., Escriba C., and Campo E., (2008), A review of smart homes: Present state and future challenges, Computer Methods and Programs in Biomedicine, Vol. 91, No. 1, pp. 55–81.
[14] Helal S., Mann W., El-Zabadani H., King J., Kaddoura Y., and Jansen E., (2005), The gator tech smart house: a programmable pervasive space, IEEE Computer, Vol. 38, No. 3, pp. 64–74.
[15] Ward J.A., Lukowicz P., Tröster G., and Starner T.G., (2006), Activity recognition of assembly tasks using body-worn microphones and accelerometers, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, No. 10, pp. 1553–1567.
[16] Boger J., Poupart P., Hoey J., Boutilier C., and Mihailidis A., (2005), A Decision-Theoretic Approach to Task Assistance for Persons with Dementia, In Proc. of the International Joint Conference on Artificial Intelligence (IJCAI’05), pp. 1293–1299.
[17] Wang S., Pentney W., Popescu A.M., Choudhury T., and Philipose M., (2007), Common Sense Based Joint Training of Human Activity Recognizers, In Proc. International Joint Conference on Artificial Intelligence.
[18] Albrecht D.W. and Zukerman I., (1998), Bayesian Models for Keyhole Plan Recognition in an Adventure Game, User Modelling and User-Adapted Interaction, Vol. 8, pp. 5–47.
[19] Tapia E.M. and Intille S., (2007), Real-time recognition of physical activities and their intensities using wireless accelerometers and a heart rate monitor, In International Symposium on Wearable Computers (ISWC).
[20] Huynh T. and Schiele B., (2006), Unsupervised Discovery of Structure in Activity Data using Multiple Eigenspaces, In 2nd International Workshop on Location- and Context-Awareness (LoCA), LNCS Vol. 3987.
[21] Liao L., Fox D., and Kautz H., (2007), Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields, The International Journal of Robotics Research, Vol. 26, No. 1, p. 119.
[22] Kautz H., (1991), A Formal Theory of Plan Recognition and its Implementation, Reasoning About Plans, Allen J., Pelavin R. and Tenenberg J. eds., Morgan Kaufmann, San Mateo, C.A., pp. 69–125.
[23] Wobke W., (2002), Two Logical Theories of Plan Recognition, Journal of Logic Computation, Vol. 12, No. 3, pp. 371–412.
[24] Bouchard B. and Giroux S., (2006), A Smart Home Agent for Plan Recognition of Cognitively-impaired Patients, Journal of Computers, Vol. 1, No. 5, pp. 53–62.
[25] Chen L., Nugent C., Mulvenna M., Finlay D., Hong X., and Poland M., (2008), A Logical Framework for Behaviour Reasoning and Assistance in a Smart Home, International Journal of Assistive Robotics and Mechatronics, Vol. 9, No. 4, pp. 20–34.
[26] Chen D., Yang J., and Wactlar H.D., (2004), Towards automatic analysis of social interaction patterns in a nursing home environment from video, In Proc. 6th ACM SIGMM Int. Workshop Multimedia Inf. Retrieval, pp. 283–290.
[27] Hakeem A. and Shah M., (2004), Ontology and Taxonomy Collaborated Framework for Meeting Classification, In Proc. Int. Conf. Pattern Recognition, pp. 219–222.
[28] Georis B., Maziere M., Bremond F., and Thonnat M., (2004), A video interpretation platform applied to bank agency monitoring, In Proc. 2nd Workshop Intell. Distributed Surveillance System, pp. 46–50.
[29] Hobbs J., Nevatia R., and Bolles B., (2004), An Ontology for Video Event Representation, In IEEE Workshop on Event Detection and Recognition.
[30] Francois A.R.J., Nevatia R., Hobbs J., and Bolles R.C., (2005), VERL: An Ontology Framework for Representing and Annotating Video Events, IEEE MultiMedia, Vol. 12, No. 4, pp. 76–86.
[31] Akdemir U., Turaga P., and Chellappa R., (2008), An ontology based approach for activity recognition from video, Proceeding of the 16th ACM international conference on Multimedia, pp. 709–712.
[32] Yamada N., Sakamoto K., Kunito G., Isoda Y., Yamazaki K., and Tanaka S., (2007), Applying Ontology and Probabilistic Model to Human Activity Recognition from Surrounding Things, IPSJ Digital Courier, Vol. 3, pp. 506–517.
[33] Latfi F., Lefebvre B., and Descheneaux C., (2007), Ontology-Based Management of the Telehealth Smart Home, Dedicated to Elderly in Loss of Cognitive Autonomy, CEUR Workshop Proceedings, Vol. 258, (online) available: http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-258/paper42.pdf.
[34] Michael K., Schmidt A., and Lauer R., (2007), Ontology-Centred Design of an Ambient Middleware for Assisted Living: The Case of SOPRANO, In Proc. of the 30th Annual German Conference on Artificial Intelligence.
[35] Chen L., Nugent C.D., Mulvenna M., Finlay D., and Hong X., (2009), Semantic Smart Homes: Towards Knowledge Rich Assisted Living Environments, Studies in Computational Intelligence, Vol. 189, pp. 279–296.
James A.B., (2008), Activities of daily living and instrumental activities of daily living, In: Crepeau EB, Cohn ES, Schell BB, editors. Willard and Spackman’s Occupational Therapy. Philadelphia: Lippincott, Williams and Wilkins, pp. 538–578. WHO, World Health Organization, International classification of functioning, disability and health (ICF), http://www.who.int/classifications/icf/en/ Wyatt D., Philipose, M., and Choudhury T., (2005), Unsupervised activity recognition using automatically mined common sense, in: Proc. of AAAI 2005. Tapia M.E., Choudhury T., and Philipose M., (2006), Building Reliable Activity Models using Hierarchical Shrinkage and Mined Ontology, Proceedings of the 4th International Conference on Pervasive Computing, pp. 17–32. OWL and RDF specifications, http://www.w3.org/ Horrocks I., Sattler U., and Tobies S., (1999), Practical reasoning for expressive description logics, Lecture Notes in Artificial Intelligence, No. 1705, pp. 161–180. Wu T.Y., Lian C.C., and Hsu J.Y., Joint recognition of multiple concurrent activities using factorial conditional random fields, in: Proc. AAAI Workshop Plan, Activity, and Intent Recognition, California, 2007. Hu D.H. and Yang Q., Cigar: concurrent and interleaving goal and activity recognition, in: Proc. AAAI, Chicago, Illinois, USA, 2008. Modayil J., Bai T., Kautz H., Improving the recognition of interleaved activities, Proceedings of the 10th international conference on Ubiquitous computing, UbiComp; Vol. 344, pp. 40–43, 2008. Gu T., Wu Z., Tao X.P., Pung H.K., and Lu J., epSICAR: An Emerging Patterns based Approach
30
[46] [47]
[48]
[49]
[50]
[51] [52] [53]
[54]
[55] [56]
[57]
[58]
[59]
[60] [61]
[62]
Activity Recognition in Pervasive Intelligent Environments
to Sequential, Interleaved and Concurrent Activity Recognition. In Proc. of the 7th Annual IEEE International Conference on Pervasive Computing and Communications (Percom ’09), pp. 1–9, 2009. Gong S. and Xiang T., Recognition of group activities using dynamic probabilistic networks, in: Proc.of ICCV 2003, Nice, France, 2003, pp. 742–749. Nguyen N., Bui H., and Venkatesh S., Recognising behaviour of multpile people with hierarchical probabilistic and statistical data association, in: Proc of the 17th British Machine Vision Conference (BMVC 2006), Edinburgh, Scotland, 2006. Oliver N., Rosario B., and Pentland A., A bayesian computer vision system for modeling human interactions, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, 2000, pp. 831–843. Choudhury T. and Basu S., Modeling conversational dynamics as a mixedmemory markov process, in: Advances in Neural Information Processing Systems 17, MIT Press, Cambridge, MA, 2005, pp. 281–288. Oliver N., Gargb A., and Horvitz E., Layered representations for learning and inferring office activity from multiple sensory channels, in: Computer Vision and Image Understanding, November 2004, pp. 163–180. Du Y., Chen F., Xu W., and Li Y., Recognizing interaction activities using dynamic bayesian network, in: Proc. of ICPR 2006, Hong Kong, China, 2006. Wyatt D., Choudhury T., Bilmes J., and Kautz H., A privacy sensitive approach to modeling multi-person conversations, in: Proc. IJCAI, India, 2007. Lian, C. and Hsu, J., Chatting activity recognition in social occasions using factorial conditional random fields with iterative classification, in: Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008), Chicago, Illinois, 2008. Lin Z. and Fu L., Multi-user preference model and service provision in a smart home environment, in: Proc. of IEEE International Conference on Automation Science and Engineering (CASE 2007), 2007. 
Wang L., Gu T., Tao X., and Lu J., Sensor-Based Human Activity Recognition in a Multi-user Scenario, AmI 2009, LNCS 5859, pp. 78–87, 2009. Singla G., Cook D., and Schmitter-Edgecombe M., Recognizing independent and joint activities among multiple residents in smart environments. Ambient Intelligence and Humanized Computing Journal, 1(1):57–63, 2010. Nugent C.D., Finlay D.D., Davies R., Wang H.Y., Zheng H., Hallberg J., Synnes K., and Mulvenna M.D., homeML - An open standard for the exchange of data within smart environments, Proceedings of 5th International Conference on Smart homes and health Telematics, Lecture Notes in Computer Science (Vol. 4541), pp. 121–129, 2007. Chen L. and Nugent C.D., (2009), Semantic Data Management for Situation-aware Assistance in Ambient Assisted Living, the proceedings of the 11th International Conference on Information Integration and Web-based Applications & Services (iiWAS2009), ACM ISBN 978-160558-660-1, pp. 296–303. Biswas, J., Tolstikov, A., Phyo Wai, A., Baumgarten, M., Nugent, C., Chen, L., Donnelly, M., Replicating Sensor Environments for Ambient Assisted Living: A Step Towards Scaled Deployment, accepted for International Conference on Smart homes and health Telematics (ICOST2010), 2010. Chen L., Nugent C.D., (2009), Ontology-based activity recognition in intelligent pervasive environments, International Journal of Web Information Systems, Vol. 5, No. 4, pp. 410–430. Patel S., Lorincz K., Hughes R., Huggins N., Bonato, Monitoring Motor Fluctuations in Patients With Parkinson’s Disease Using Wearable Sensors, IEEE Transaction on Information Technologies in Biomedicine , Vol. 13, No. 6, Nov. 2009. Patel S.N., Reynolds M.S., and Abowd G.D., Detecting Human Movement by Differential
Activity Recognition: Approaches, Practices and Trends
31
Air Pressure Sensing in HVAC System Ductwork: An Exploration in Infrastructure Mediated Sensing. Proc. of the 6th International Conference on Pervasive Computing (Pervasive 2008), pp. 1–18, Sydney, Australia, 2008. [63] Fogarty J., Au C., and Hudson S.E., Sensing from the Basement: A Feasibility Study of Unobtrusive and Low-Cost Home Activity Recognition, UIST’06, 91–100. [64] Yin J., Yang Q., and Pan J., Sensor-based abnormal human-activity detection. IEEE Trans. Knowl. Data Eng., 20(8):1082–1090, 2008.
Chapter 2
A Possibilistic Approach for Activity Recognition in Smart Homes for Cognitive Assistance to Alzheimer’s Patients
Patrice C. Roy 1, Sylvain Giroux 1, Bruno Bouchard 2, Abdenour Bouzouane 2, Clifton Phua 3, Andrei Tolstikov 4, and Jit Biswas 4
1 DOMUS Lab., Université de Sherbrooke, J1K 2R1, Canada
2 LIARA Lab., Université du Québec à Chicoutimi, G7H 2B1, Canada
3 Data Mining Dep., I2R, 138632, Singapore
4 Networking Protocols Dep., I2R, 138632, Singapore
Corresponding author: [email protected]

Abstract Providing cognitive assistance to Alzheimer’s patients in smart homes is a field of research that has received a lot of attention lately. Recognizing the patient’s behavior as he carries out activities in a smart home is essential in order to give adequate assistance at the opportune moment. To address this challenging issue, we present a formal activity recognition framework based on possibility theory and description logics. We present initial results from an implementation of this recognition approach in a smart home laboratory.
2.1 Introduction
A major development in recent years is the importance given to research on ambient intelligence in the context of recognition of activities of daily living. Ambient intelligence, in contrast to traditional computing where the desktop computer is the archetype, is a new approach based on the mobility of digital systems and their integration into the physical environment, in accordance with ubiquitous computing. This mobility and this fusion are made possible by the miniaturization and reduced power consumption of electronic components, the omnipresence of wireless networks and the fall of production costs. This allows us to glimpse the opportune composition of devices and services of all kinds on an infrastructure characterized by variable granularity and geometry, endowed with faculties of capture, action, processing, communication and interaction [1, 2]. One of these emerging infrastructures is the smart home. To be considered intelligent, the proposed home must inevitably include techniques of activity recognition, which can be considered the key to exploiting ambient intelligence. Combining ambient assisted living with techniques from activity recognition greatly increases its acceptance and makes it more capable of providing a better quality of life in a non-intrusive way. Elderly people, with or without disabilities, could clearly benefit from this new technology [3].

Activity recognition, often referred to as plan recognition, aims to recognize the actions and goals of one or more agents from observations of the environmental conditions. The plan recognition problem has been an active research topic in artificial intelligence [4] for a long time and still remains very challenging. The keyhole, adversarial or intended plan recognition problem [5] is usually based on logical or probabilistic reasoning for the construction of hypotheses about the possible plans, and on a matching process linking the observations with some activity models (plans) related to the application domain. Prior work has used sensors, such as radio frequency identification (RFID) tags attached to household objects [6], to recognize the execution status of particular types of activities, such as hand washing [7], in order to provide assistance such as reminders about the activities of daily living (ADL). However, most of this research has focused on probabilistic models such as Markovian models and Bayesian networks, and probability theory has some limitations. Firstly, the belief degree concerning an event is determined by the belief degree in the contrary event (additivity axiom).
Secondly, classical probability theory (single distribution) cannot model (full or partial) ignorance in a natural way [8]. A uniform probability distribution on an event set expresses randomness, i.e. the equal chance of occurrence of events, better than it expresses ignorance. Ignorance represents the fact that, for an agent, each possible event’s occurrence is equally plausible, since the lack of information means that no evidence is available to support any of them. Hence, one of the solutions to this kind of problem is possibility theory [9], an uncertainty theory devoted to the handling of incomplete information. Contrary to probability theory, the belief degree of an event is only weakly linked to the belief degree of the contrary event. Also, the possibilistic encoding of knowledge can be purely qualitative, whereas the probabilistic encoding is numerical and relies on the additivity assumption. Thus, possibilistic reasoning, which is based on max and min operations, is computationally less difficult than probabilistic reasoning. Unlike in probability theory, the estimation of an agent’s belief about the occurrence of an event is based on two set-functions: the possibility and necessity (or certainty) measures. Finally, instead of using historical sensor data as probabilistic approaches do, we use partial beliefs from human experts concerning the smart home occupant’s behaviors, according to plausible environment contexts. Since the recognition system can be deployed in different smart homes and their environments are dynamic, it is easier to obtain an approximation of the occupant’s behaviors from human experts’ beliefs than to learn the behaviors in different smart home settings. By using general descriptions of environmental contexts, it is possible to keep the same possibility distributions for different smart home settings. Consequently, another advantage of possibility theory is that it is easier to capture partial belief concerning the activities’ realizations from human experts, since this theory was initially meant to provide a graded semantics to natural language statements [10].

In the DOMUS [11] and LIARA research labs, we use possibility theory to address the problem of behavior recognition, where the observed behavior could be associated with cognitive errors. These recognition results are used to identify the various ways a smart home may help an Alzheimer’s occupant at early-intermediate stages to carry out his ADLs. This context increases the recognition complexity in such a way that the presumption of the observed agent’s coherency, usually supposed in the literature, cannot be reasonably maintained.
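To make the two set-functions mentioned above more concrete, the following minimal sketch computes the possibility and necessity of an event from a possibility distribution, using the standard definitions (possibility as the maximum over the event, necessity as one minus the possibility of the complement). The hypothesis names and the degrees are invented for illustration and are not taken from the model presented in this chapter.

```python
# Illustrative sketch only: hypothesis names and degrees are hypothetical.

def possibility(pi, event):
    """Possibility of an event (a set of outcomes): max of pi over the event."""
    return max((pi[outcome] for outcome in event), default=0.0)


def necessity(pi, event):
    """Necessity of an event: 1 minus the possibility of its complement."""
    complement = set(pi) - set(event)
    return 1.0 - possibility(pi, complement)


# Total ignorance: every hypothesis is fully possible, none is certain.
ignorance = {"normal": 1.0, "erroneous": 1.0}
print(possibility(ignorance, {"normal"}), necessity(ignorance, {"normal"}))
# prints: 1.0 0.0

# Partial belief: "normal" is fully possible, "erroneous" only somewhat possible.
partial = {"normal": 1.0, "erroneous": 0.25}
print(possibility(partial, {"normal"}), necessity(partial, {"normal"}))
# prints: 1.0 0.75
```

Under total ignorance both hypotheses receive possibility 1 and necessity 0, whereas a single uniform probability distribution would have to assign 0.5 to each, which reads as randomness rather than as a lack of information.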
We propose a formal framework for activity recognition based on description logic and possibility theory, which transforms the recognition problem into a possibilistic classification of activities. The possibility and necessity measures on behavior hypotheses allow us to capture the fact that, in some activities, erroneous behavior is as possible as normal behavior. Hence, in a complete ignorance setting, both behavior types are possible, although neither is necessarily the one being carried out.

The chapter is organized as follows. Section 2.2 presents an overview of Alzheimer’s disease. Section 2.3 presents an overview of some related approaches for activity recognition in smart homes. Section 2.4 presents the possibilistic activity recognition model that we propose. Section 2.5 presents the results of experiments with our implementation, based on real data from the AIHEC project at Singapore’s Institute for Infocomm Research, and discusses these results in relation to related recognition approaches. Finally, we conclude the chapter, mentioning future perspectives of this work.
2.2 Overall Picture of Alzheimer’s Disease
Alzheimer’s disease, or senile dementia of the Alzheimer’s type (SDAT), is a disease of the brain characterized by progressive deterioration of the intellectual abilities (cognitive skills and memory) of the patient [12]. This disease evolves slowly over a period of seven to ten years. The capacity to carry out normal activities (reading, cooking, . . . ) decreases gradually, as does the capacity to produce judgments and answers that are suitable to everyday life problems. The cognitive decline of this dementia can be classified into seven degeneration stages, following the Global Deterioration Scale (GDS) of the primary cognitive functions of an individual [13]. The intermediate stages of the disease (3–5) constitute the longest portion of the degeneration process. The main symptoms of these stages are related to planning problems caused by sporadic memory losses, the weakening of the executive functions, and concentration troubles that decrease attention to a particular task. A distraction (e.g., a phone call, an unfamiliar sound, . . . ) or a memory lapse can hinder the patient in accomplishing his activity, leading him to carry out the actions attached to his activity in the wrong order, to skip some steps of his activity, or to carry out actions that are not even related to his initial objectives [14]. However, the patient’s capacity to perform a simple action (without many steps) remains relatively unaffected [13]. In these intermediate stages, patients suffering from Alzheimer’s disease do not need to be fully taken care of; they need supervision and specific interventions from an assistant. When continuous support is provided to Alzheimer’s patients in the form of cognitive assistance, the degeneration process of the disease is slowed down, making it possible for a patient to remain at home longer [15].
Hence, the development of an artificial agent able to assist Alzheimer’s patients at an intermediate stage, when the situation requires it, would make it possible to decrease the workload carried by natural and professional caregivers. This type of technology has the potential to delay the fateful moment of institutionalization of patients by alleviating some caregiver duties and by giving partial autonomy to patients. In the longer term, it will also constitute an economically viable solution to the increasing cost of home care services. Consequently, research in this field will have important social and economic impacts in the future.

When an Alzheimer’s patient performs a task requiring cognitive skills, inconsistencies may appear in his behavior [16]. In the plan recognition problem, the presence of these inconsistencies results in erroneous execution of activities (erroneous plans), according to certain particular types of errors. In a real context, people without such impairment can also act in an inconsistent manner, but to a lesser degree and less regularly than an Alzheimer’s patient. The main difference is that a healthy person is usually able to recognize his behavioral errors and correct them in order to achieve his initial goals. In contrast, an Alzheimer’s patient will act incoherently, even while performing familiar tasks, and his behavior will become more inconsistent as the disease evolves. Also, we must consider that hypotheses concerning the behavior of the observed patient can fall into the scope of reactive recognition [17]. More precisely, a patient at an early stage is not necessarily making errors; he can simply stop the execution of an activity plan temporarily, in the middle of its realization, to begin another one. In this way, the patient deviates from the activity originally planned by carrying out multiple interleaved activities, in order to react to the dynamics of his environment [13]. This is coherent behavior.

Starting from this point, if one wishes to establish an effective recognition model capable of predicting any type of correct or incorrect behavior, confronting these two interpretations, erroneous versus interleaved, is a necessity that conforms to reality, but which is also paradoxical. The detection of a new observed action, different from the expected one, cannot be directly interpreted as an interruption of the plan in progress in order to pursue a new goal. In fact, this action can be the result of an error on the part of a mentally impaired patient, or of a healthy but fallible human operator. However, this unexpected action is not inevitably an error, even in the most extreme case where the patient is at an advanced stage of his disease. This context raises a recognition dilemma that is primarily due to the problem of completing the activity plan library, which of course cannot be complete in any domain. This dilemma is important in a context of cognitive deficiency such as Alzheimer’s disease.
A person suffering from Alzheimer’s disease will tend to produce more errors, induced by his cognitive deficit, than a healthy person when he carries out his activities of daily living. Like healthy people, Alzheimer’s patients carry out activities in an interleaved way. Consequently, it is important to consider both the interleaved and the erroneous realizations that could explain the current behavior of an Alzheimer’s patient.

2.3 Related Work
In the last two decades, the AI community has produced much fundamental work addressing the activity recognition issue. This work can be divided into three major trends of research. The first one comprises works based on logical approaches [18], which consist of developing a theory, using first order logic, for formalizing activity recognition as a deduction or an abduction process. The second trend is related to probabilistic models [19, 20], which define activity recognition in terms of probabilistic reasoning, based primarily on Markovian models or on Bayesian networks. The third trend covers emerging hybrid approaches [21–23] that try to combine the two previous avenues of research. It considers activity recognition as the result of a probabilistic quantification of the hypotheses obtained from a symbolic (qualitative) algorithm. A significant amount of the fundamental work in these three trends was concerned with application contexts different from ambient intelligence (AmI), supposing most of the time that the observed entity always acts in a coherent way, and thereby avoiding many important issues such as recognizing erroneous behavior associated with the realization of some activities. Currently, many research groups working on assistive technologies try to exploit and enhance those activity recognition models in order to concretely address the activity recognition problem in an AmI context for cognitive assistance in a smart environment. The remainder of this section presents a non-exhaustive list of applications that exploit activity recognition in this context.

The MavHome (Managing an Adaptive Versatile Home) project [24] is a smart home based on a multi-agent architecture, where each agent perceives its state through sensors and acts, in a rational way, on the smart home environment, in order to maximize the inhabitants’ comfort and productivity. For instance, devices can be controlled automatically according to the behavior prediction and the routine and repetitive task patterns (obtained by data mining) of the inhabitant. The location prediction uses the Active Lezi algorithm, which is based on the LZ78 compression algorithm. It uses principles of information theory to process historical information sequences and thereby learn the inhabitant’s likely future positions.
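As a rough illustration of the LZ78 parsing step on which the location predictor is said to build, the sketch below splits a sequence of symbols into LZ78 phrases. It shows only the dictionary-building parse of LZ78, not the Active Lezi predictor itself; the single-letter zone labels are hypothetical.

```python
def lz78_phrases(sequence):
    """Split a symbol sequence into LZ78 phrases.

    Each new phrase is the shortest prefix of the remaining input that has
    not yet been recorded as a phrase.
    """
    dictionary = set()
    phrases = []
    current = ""
    for symbol in sequence:
        current += symbol
        if current not in dictionary:
            dictionary.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Trailing symbols that only repeat an already-known phrase.
        phrases.append(current)
    return phrases


# Hypothetical zone labels: k = kitchen, b = bedroom, l = living room.
print(lz78_phrases("kbkblkbkk"))
# prints: ['k', 'b', 'kb', 'l', 'kbk', 'k']
```

Phrases grow longer as repeated movement patterns recur, which is what lets a predictor built on this parse gather statistics about increasingly long contexts from the history.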
The Active Lezi algorithm has been tested on synthetic and real data, with accuracies of 64 % and 55 % respectively. By adding temporal rules, based on Allen’s temporal logic, to this algorithm, the prediction accuracies on the synthetic and real data sets increase to 69 % and 56 % respectively [25]. The inhabitant action prediction is based on the SHIP (Smart Home Inhabitant Prediction) algorithm, which matches the most recent sequence of actions (commands to devices issued by the inhabitants) against those in the inhabitant’s history, in order to predict the inhabitant’s interactions (commands to devices) with the smart home. The prediction accuracy of the SHIP algorithm on the real data set is 53.4 %. The ED (Episode Discovery) algorithm, based on data mining techniques, is used to identify significant episodes (sets of related device events, which may be totally ordered, partially ordered or unordered) that are present in the inhabitant’s history. Those episodic patterns make it possible to minimize the length of the input stream by using the minimum description length (MDL) principle, where each instance of a pattern is replaced with a pointer to the pattern definition. By combining the Active Lezi and ED algorithms, the prediction accuracy on the real data set is improved by 14 %. These results indicate the effectiveness of these algorithms in predicting the inhabitant’s activities. Nevertheless, an algorithmic analysis in terms of response time is indispensable before more conclusions can be drawn.

The Gator Tech Smart House, developed by the University of Florida’s Mobile and Pervasive Computing Laboratory [26], has been used to conduct several cutting-edge research projects aiming to create the technology, frameworks and systems that will realize the concept of assistive environments benefiting the growing elderly and disabled population around the world. That research includes personal assistants, sensors, recognition systems, localization and tracking technology, robotic technology, etc. One of their recent projects, closely related to our research on activity recognition, is a health platform that monitors the activity, diet and exercise compliance of diabetes patients [27]. The activity recognition is based on Hidden Markov Models (HMM), one for each task to recognize. By using the sensor data, the activity recognition is able to quantitatively evaluate the completion status of an activity and qualitatively evaluate the steps that are missing. Using real data from the CASAS smart apartment at Washington State University [28], HMMs for five tasks, trained on those data, were able to achieve a recognition accuracy of 98 % on this real data set. Concerning erroneous behavior, the tasks’ HMMs were able to detect errors (missing or wrongly carried out steps) made by 19 participants.

The Smart Environments Research Group at Ulster University [29] works on different aspects of cognitive assistance in smart homes.
In one recent project [30], they proposed a solution to model and reason with uncertain sensor data with the aim of predicting the activities of the monitored person. Their recognition approach is based on the Dempster-Shafer (DS) theory of evidence [31], an alternative theory for uncertainty that bridges fuzzy logic and probabilistic reasoning and uses belief functions. The use of DS allows them to fuse uncertain information detected from sensors for activities of daily living (ADL), which are monitored within a smart home, by using an ordinal conditional function and merging approaches to handle inconsistencies within knowledge from different sources. It allows them to combine evidence from various sources and arrive at a degree of belief (represented by a belief function) that takes into account all the available evidence. They use background knowledge (such as a caregiver’s diary) to resolve any inconsistencies between the predicted (recognized) action and the actual activities. The strength of this recognition method is that it captures the fact that sensors provide only imprecise data and that some information or sources are, a priori, more reliable than others. However, the weakness of approaches based on DS theory is that the complexity of combining evidence is related to the number of focal elements used by the belief functions, which can be high in a real smart home context. Also, we note that even though validation based on simulation is planned by the team, this approach has not yet been tested on real data.

The research team led by Mihailidis [32] has developed COACH (Cognitive Orthosis for Assisting aCtivities in the Home), a prototype that aims to actively monitor an Alzheimer’s patient attempting a specific task, such as hand washing, and to offer assistance in the form of guidance (prompts or reminders) when it is most appropriate. The system uses a camera to obtain as observations a set of state variables, such as the location of the patient’s hands, in order to determine the completion status of the task according to a handcrafted model of the activity. If a problem occurs, such as an error being made or the patient being confused, the system computes the most appropriate solution to finish the task, using a probabilistic approach based on Partially Observable Markov Decision Processes (POMDP) [33], and then guides the person in the completion of his activity. Experiments and clinical trials with the COACH system, including Alzheimer’s patients and therapists, have shown very promising results in monitoring a single pre-established activity (hand washing) and in providing adequate assistance at the right moment [15]. However, a major limitation of the prototype is that it presumes the system already knows which activity is in progress, and thus that there can be only one on-going task at a time.

The Barista system [34] is a fine-grained ADL recognition system that uses object IDs to determine which activities are currently being carried out.
It uses Radio Frequency Identification (RFID) tags on objects and two RFID gloves worn by the user in order to recognize activities in a smart home. From the object interactions detected with the gloves, a probabilistic engine infers activities according to probabilistic models of activities, which were created from, for instance, written recipes. The activities are represented as sequences of activity stages. Each stage is composed of the objects involved, the probability of their involvement, and, optionally, a time to completion modelled as a Gaussian probability distribution. The activities are converted into Dynamic Bayesian Networks (DBN) by the probabilistic engine. By using the current sub-activity as a hidden variable and the set of objects seen and time elapsed as observed variables, the engine is able to probabilistically estimate the activities from sensor data. The engine was also tested with hidden Markov models (HMM) in order to evaluate the accuracy of activity recognition [20].
Possibilistic Activity Recognition
These models were trained with a set of examples in which a user performs a set of interleaved activities. Some HMM models performed poorly, whereas the DBN model was able to identify the specific on-going activity with a recognition accuracy higher than 80 %. This approach is able to identify the ADL currently being carried out in a context where activities can be interleaved. However, it does not take into account the erroneous realization of activities, because the result of the activity recognition is the set of most plausible on-going ADLs.
2.4 Possibilistic Activity Recognition Model
Our activity recognition model is based on possibility theory and on description logics (DL) [35]. DL is a family of knowledge representation formalisms that may be viewed as a subset of first-order logic; its expressive power goes beyond propositional logic, while reasoning remains decidable. In our model, the activity recognition process is divided among three agents: the environment representation agent, the action recognition agent, and the behavior recognition agent. The environment representation agent infers the plausible contexts that could explain the currently observed environment state resulting from an action realization. This observed state, which can be partial, is obtained from information retrieved from the smart home’s sensor events. Our approach assumes that there is an event manager agent that collects sensor events and evaluates whether an action was carried out by the smart home’s occupant. The action recognition agent infers the most plausible low-level action that was carried out in the smart home according to the change observed in the environment’s state. In accordance with a possibilistic action formalization and with the set of plausible environment contexts that could represent the observed environment state resulting from an action realization, the action recognition agent selects from the action ontology the most possible and necessary recognized action that could explain the environment changes. The behavior recognition agent infers hypotheses about the occupant’s plausible high-level behavior related to the accomplishment, in an erroneous or coherent way, of some intended activities. When the smart home’s occupant performs some activities, his behavior, which could be coherent or erroneous, is observed as a sequence of actions.
According to the plausible sequence of observed actions and to a possibilistic activity formalization, the behavior recognition agent infers a behavior hypothesis set and evaluates the most possible and necessary hypotheses in order to send them to an assistive agent. Our approach assumes that there is a smart home’s assistive agent that uses information from different agents, including the behavior recognition agent, in order to plan, if needed, a helping task, through some smart home’s effectors, towards the smart home’s occupant.
2.4.1 Environment Representation and Context
In our model, knowledge concerning the resident’s environment is represented by using a formalism in DL. The open world assumption allows us to represent the fact that knowledge about the smart home environment’s state is incomplete (partially observable). Smart home environment states are described with terminological (concepts and roles) and assertional (concepts’ instances and relations between them) axioms. DL assertions describing the environment’s state resulting from an action realization are retrieved from information sent by an event manager, which is responsible for collecting sensor events and for inferring whether an action was carried out. In order to reduce the number of plausible states to consider for action and activity formalizations, our approach uses low-level environment contexts. A context $c$ is defined as a set of environmental properties that are shared by some states $s$ in the environment’s state space $S$. More formally, a context $c$ can be interpreted as a subset of the environment’s state space ($c^I \subseteq S$)¹, where the states of this subset share some common environmental properties described by the context assertions. For instance, the context where the occupant is in the kitchen, the pantry door is open, and the pasta box is in the pantry can match several possible states of the smart home environment. It is possible to partition the environment’s state space with a set of contexts. Since the observation of the current state is partial, multiple contexts could explain the current observed state. Thus, our approach must infer, by using a reasoning system, the plausible contexts that can be satisfied by the current observed state’s assertions.
2.4.2 Action Recognition
In order to infer hypotheses about the observed behavior of the occupant when he carries out some activities in the smart home environment, we need to recognize the sequence of observed actions that were performed in order to achieve the activities’ goals. In our model, we formalize an action according to a context-transition model where transitions between contexts resulting from an action realization are quantified with a possibility value.
¹ $\cdot^I$ is an interpretation function that assigns to a context $c$ a subset of the interpretation domain $\Delta^I = S$ (the interpretation of $c$ corresponds to a nonempty set of states).
Proposition 2.1. A possibilistic action $a$ is a tuple $(C_{pre_a}, C_{post_a}, \pi_{init_a}, \pi_{trans_a})$, where $C_{pre_a}$ and $C_{post_a}$ are context sets and $\pi_{init_a}$ and $\pi_{trans_a}$ are possibility distributions on those context sets. $C_{pre_a}$ is the set of possible contexts before the action occurs (pre-action contexts), $C_{post_a}$ is the set of possible contexts after the action occurs (post-action contexts), $\pi_{init_a}$ is the possibility distribution on $C_{pre_a}$ that an environment state in a particular context $c_i \in C_{pre_a}$ allows the action to occur, and $\pi_{trans_a}$ is the transition possibility distribution on $C_{pre_a} \times C_{post_a}$ if the action does occur.
The action library, which contains the set of possible actions $\mathcal{A}$ that can be carried out by the occupant, is represented with an action ontology $(\mathcal{A}, \sqsubseteq_A)$, where the actions are partially ordered according to an action subsumption relation $\sqsubseteq_A$, which can be seen as an extension of the concept subsumption relation of DL [35]. This order relation, which is transitive, allows us to indicate that a concept is more general than (subsumes) another concept. In other words, a subsumed concept is a subset of the subsumer concept. According to the action subsumption relation $\sqsubseteq_A$, if an action subsumes another one, its possibility values are at least as high as those of the subsumed action. For instance, since OpenDoor subsumes OpenDoorPantry, the OpenDoor possibility is greater than or equal to the OpenDoorPantry possibility, because OpenDoor is more general than OpenDoorPantry. This action ontology is used when we need to evaluate the most possible action that could explain the changes observed in the smart home environment resulting from an action realization by an observed occupant. At the same time, we evaluate the next most possible action that can be carried out according to the current state of the smart home environment.
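To make the action tuple and the subsumption property concrete, here is a minimal Python sketch. It is an illustration only: the class name `PossibilisticAction`, the `PARENT` map, the context names, and the helper `lift_possibilities` are hypothetical, and the ontology is simplified to a tree (a DL reasoner would compute subsumption from the concept definitions instead).

```python
from dataclasses import dataclass


@dataclass
class PossibilisticAction:
    """Illustrative container for the tuple (C_pre, C_post, pi_init, pi_trans)."""
    name: str
    pre_contexts: set    # C_pre: contexts in which the action may start
    post_contexts: set   # C_post: contexts that may hold afterwards
    pi_init: dict        # context in C_pre -> initiation possibility
    pi_trans: dict       # (pre, post) context pair -> transition possibility


# Hypothetical fragment of an action ontology, stored as child -> direct subsumer.
PARENT = {"OpenDoorPantry": "OpenDoor", "OpenDoor": "All"}


def subsumers(action):
    """Return the action followed by every action that subsumes it."""
    chain = [action]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def subsumes(general, specific):
    """True if `general` is `specific` itself or one of its subsumers."""
    return general in subsumers(specific)


def lift_possibilities(pi):
    """One assumed way to make a subsumer at least as possible as what it subsumes."""
    lifted = dict(pi)
    for action, value in pi.items():
        for ancestor in subsumers(action)[1:]:
            lifted[ancestor] = max(lifted.get(ancestor, 0.0), value)
    return lifted


open_pantry = PossibilisticAction(
    name="OpenDoorPantry",
    pre_contexts={"PantryDoorClosed"},
    post_contexts={"PantryDoorOpen"},
    pi_init={"PantryDoorClosed": 1.0},
    pi_trans={("PantryDoorClosed", "PantryDoorOpen"): 1.0},
)
lifted = lift_possibilities({"OpenDoorPantry": 0.8, "OpenDoor": 0.5})
```

In this sketch, `lifted["OpenDoor"]` is raised to the value of `OpenDoorPantry`, mirroring the statement that the more general action is at least as possible as the more specific one.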
In order to evaluate the recognition and prediction possibilities on the action ontology at a time $t$, we need to use the observation $obs_t$ of the current environment state, represented by a set of DL assertions. This observed state can be partial or complete according to the information that can be retrieved from the environment’s sensors. Furthermore, each observation timestamp $t \in T_s$ is associated with a time value $t_i \in T$ that indicates the elapsed time (in minutes, seconds, . . . ) since the start of the recognition process. From this observation $obs_t$, we need to evaluate the set of contexts $c_i$ that are entailed by this observation ($obs_t \models c_i$). Since the environment can be partially observable, multiple entailed contexts are possible. Those entailed contexts are then used to evaluate the possibility distributions for the prediction and recognition of actions. The action prediction possibility distribution $\pi_{pre_t}$ on $\mathcal{A}$ indicates, for each action, the possibility that the action could be the next one carried out according to the current state observed by $obs_t$. Thus, for each action $a \in \mathcal{A}$, the prediction possibility $\pi_{pre_t}(a)$ is obtained by selecting the maximum value among the initiation possibilities $\pi_{init_a}(c_i)$ for the pre-action contexts $c_i \in C_{pre_a}$ that are entailed by the current observation ($obs_t \models c_i$). The action recognition possibility distribution $\pi_{rec_t}$ on $\mathcal{A}$ indicates, for each action, the possibility that the action was carried out according to the previous and current states observed by $obs_{t-1}$ and $obs_t$. Thus, for each action $a \in \mathcal{A}$, the recognition possibility $\pi_{rec_t}(a)$ is obtained by selecting the maximum value among the transition possibilities $\pi_{trans_a}(c_i, c_j)$ for the pre-action contexts $c_i \in C_{pre_a}$ and post-action contexts $c_j \in C_{post_a}$ that are entailed by the previous and current observations ($obs_{t-1} \models c_i$ and $obs_t \models c_j$). With this action recognition possibility distribution $\pi_{rec_t}$, we can evaluate the possibility and necessity measures that an action was observed at a time $t$. The possibility that an action in $Act \subseteq \mathcal{A}$ was observed by $obs_t$, denoted by $\Pi_{rec_t}(Act)$, is given by selecting the maximum possibility among the actions in $Act$ ($\pi_{rec_t}(a)$, $a \in Act$). The necessity that an action in $Act \subseteq \mathcal{A}$ was observed by $obs_t$, denoted by $N_{rec_t}(Act)$, is given by taking the maximum value in $\pi_{rec_t}$ and subtracting the maximum possibility among the actions not in $Act$ ($\pi_{rec_t}(a)$, $a \notin Act$). According to those possibility and necessity measures ($\Pi_{rec_t}$ and $N_{rec_t}$), the most plausible action $a$ that could explain the changes observed in the environment state described by $obs_t$ is selected in $\mathcal{A}$. If more than one action is plausible, the selected observed action is the most specific action among those that commonly subsume the most specific actions in the plausible set, according to the action subsumption relation $\sqsubseteq_A$.
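The prediction and recognition possibilities and the two measures described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that the entailed contexts have already been computed by the reasoner and are passed in as plain sets of context names; the function names, context names, and numeric values are hypothetical.

```python
def prediction_possibility(pi_init, entailed_now):
    """pi_pre_t(a): max initiation possibility over entailed pre-action contexts."""
    return max((p for c, p in pi_init.items() if c in entailed_now), default=0.0)


def recognition_possibility(pi_trans, entailed_prev, entailed_now):
    """pi_rec_t(a): max transition possibility over entailed (pre, post) pairs."""
    return max(
        (p for (c_i, c_j), p in pi_trans.items()
         if c_i in entailed_prev and c_j in entailed_now),
        default=0.0,
    )


def possibility(pi_rec, act):
    """Pi_rec_t(Act): highest recognition possibility among the actions in Act."""
    return max((pi_rec[a] for a in act), default=0.0)


def necessity(pi_rec, act):
    """N_rec_t(Act): overall maximum minus the maximum among actions outside Act."""
    highest = max(pi_rec.values(), default=0.0)
    outside = max((p for a, p in pi_rec.items() if a not in act), default=0.0)
    return highest - outside


# Hypothetical transition possibilities for an OpenTap action.
open_tap_trans = {("TapClosed", "TapOpen"): 1.0, ("TapClosed", "TapClosed"): 0.25}
rec_open_tap = recognition_possibility(open_tap_trans, {"TapClosed"}, {"TapOpen"})

# Hypothetical recognition distribution over three actions at time t.
pi_rec = {"OpenTap": rec_open_tap, "OpenDoor": 0.5, "CloseTap": 0.25}
pos_open_tap = possibility(pi_rec, {"OpenTap"})
nec_open_tap = necessity(pi_rec, {"OpenTap"})
nec_close_tap = necessity(pi_rec, {"CloseTap"})
```

Here the set {OpenTap} is both fully possible and necessary to degree 0.5, whereas {CloseTap} has necessity 0 because a more possible action lies outside it. These measures are what the selection rule stated above operates on when several actions remain plausible.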
For instance, if the most plausible actions are All, OpenTap, OpenColdTap and OpenHotTap, then OpenTap is selected since it is the most specific common subsumer of OpenColdTap and OpenHotTap, which are the most specific actions in the plausible action set. This new observed action $(a, t)$ is sent to the behavior recognition agent, which uses the sequence of observed actions to infer behavior hypotheses concerning the realization of the occupant’s activities. This sequence of actions represents the observed plan $P_{obs_t}$, and consists of a set of observed actions $(a_i, t_j)$ totally ordered by the sequence relation $\prec_T$. For instance, let $obs_0$ and $obs_1$ be two observations where the time values associated with the timestamps 0 and 1 are 3 minutes and 4 minutes, respectively. Then the observed plan (OpenDoor, 0) $\prec_T$ (EnterKitchen, 1) indicates that OpenDoor was observed, according to $obs_0$, 3 minutes after the start of the recognition process and that EnterKitchen was then observed, according to $obs_1$, 1 minute later. This observed plan will be used by the behavior recognition agent to generate a set of hypotheses concerning the occupant’s observed behavior when he carries out some activities.
2.4.3 Behavior Recognition
In order to have hypotheses about the behavior associated with the performance of some activities, we need to formalize activities as plan structures by using the action ontology $\mathcal{A}$. An activity plan consists of a partially ordered sequence of actions that must be carried out in order to achieve the activity’s goals.
Proposition 2.2. An activity $\alpha$ is a tuple $(A_\alpha, \circ_\alpha, C_{real_\alpha}, \pi_{real_\alpha})$, where $A_\alpha \subseteq \mathcal{A}$ is the activity’s set of actions, which is partially ordered by a temporal relation $\circ_\alpha \subseteq A_\alpha \times A_\alpha \times T \times T$, where $T$ represents a set of time values, $C_{real_\alpha}$ is the set of possible contexts related to the activity realization, and $\pi_{real_\alpha}$ is the possibility distribution that a context is related to the realization of the activity.
The use of time values allows us to describe the minimum and maximum delays between the carrying out of two actions. So, the $\circ_\alpha$ relation, which is transitive, can be seen as an ordering relationship with temporal constraints between two actions in the activity plan. For instance, the activity WatchTv can have an activity plan composed of the actions SitOnCouch, TurnOnTv and TurnOffTv and the sequence relations (SitOnCouch, TurnOnTv, 0, 5) and (TurnOnTv, TurnOffTv, 5, 480), where the time values are in minutes. The activity plan library, which contains the set of possible activity plans $\mathcal{P}$ that can be carried out by the occupant in the smart home environment, is represented with an activity plan ontology $(\mathcal{P}, \sqsubseteq_P)$, where the activity plans are partially ordered according to an activity subsumption relation $\sqsubseteq_P$. In other words, if an activity subsumes another one, its possibility values are at least as high as those of the subsumed activity. For instance, since CookFood subsumes CookPastaDish, the CookFood possibility is greater than or equal to the CookPastaDish possibility, because CookFood is more general than CookPastaDish. For each observation $obs_t$, the activity realization possibility distribution $\pi_{real_t}$ is evaluated.
The activity realization possibility distribution $\pi_{real_t}$ on $\mathcal{P}$ indicates, for each activity, the possibility that the environment’s state observed by $obs_t$ is related to the activity realization. Thus, for each activity $\alpha \in \mathcal{P}$, the realization possibility $\pi_{real_t}(\alpha)$ is obtained by selecting the maximum value among the context possibilities $\pi_{real_\alpha}(c_i)$ for the contexts $c_i \in C_{real_\alpha}$ that are entailed by the current observation ($obs_t \models c_i$).
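As an illustration of the activity formalization, the following Python sketch encodes the WatchTv sequence relations given above and checks the delay between two observed actions, then evaluates a realization possibility. It is a simplified sketch: observed actions carry their time values (in minutes) directly rather than timestamps, the transitivity of the temporal relation is not handled, and the function names, context names, and possibility values are hypothetical.

```python
# Sequence relations of the WatchTv plan: (before, after, min_delay, max_delay), in minutes.
WATCH_TV = [
    ("SitOnCouch", "TurnOnTv", 0, 5),
    ("TurnOnTv", "TurnOffTv", 5, 480),
]


def delay_allowed(constraints, first, second):
    """Check whether two observed (action, minutes) pairs satisfy a plan constraint."""
    (action_1, time_1), (action_2, time_2) = first, second
    for before, after, min_delay, max_delay in constraints:
        if (before, after) == (action_1, action_2):
            return min_delay <= time_2 - time_1 <= max_delay
    return False  # no direct sequence relation between these two actions


def realization_possibility(pi_real, entailed):
    """pi_real_t(alpha): max context possibility over the entailed contexts."""
    return max((p for c, p in pi_real.items() if c in entailed), default=0.0)


# Hypothetical context possibilities for the WatchTv activity.
watch_tv_contexts = {"OccupantInLivingRoom": 0.75, "TvOn": 1.0}

on_time = delay_allowed(WATCH_TV, ("SitOnCouch", 3), ("TurnOnTv", 4))
too_late = delay_allowed(WATCH_TV, ("SitOnCouch", 3), ("TurnOnTv", 10))
real_now = realization_possibility(watch_tv_contexts, {"OccupantInLivingRoom"})
```

Turning the television on 1 minute after sitting down satisfies the 0 to 5 minute window, while a 7 minute delay does not, which is the kind of violation that later causes a partial path to be dropped.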
By using the activity plan ontology and the observed plan, the behavior recognition agent can generate hypotheses concerning the actual behavior of the observed occupant when he carries out some activities. Since multiple activity realizations can explain the observed plan, we need to evaluate partial activity realization paths, which represent partial or complete realizations of activities. A partial activity realization path $path_j \in Path$ consists of a subset of the observed plan $P_{obs_t}$, where the selected observed actions represent a coherent partial or complete realization of a particular activity plan, according to the sequence and temporal constraints defined in the activity plan and the action subsumption relation. For instance, given the observed plan (SitOnCouch, 0) $\prec_T$ (TurnOnElectricalAppliance, 1) and the WatchTv activity plan, we can have as a partial path the associations ((SitOnCouch, 0), SitOnCouch) and ((TurnOnElectricalAppliance, 1), TurnOnTv) (since TurnOnTv is subsumed by TurnOnElectricalAppliance). Since the set of partial paths $Path$ depends on the observed plan $P_{obs_t}$, we must update $Path$ for each new observed action by extending, removing, or adding partial paths according to the temporal constraints defined in the activities’ plans. With this partial activity realization path set $Path$, we need to evaluate the possibility that a particular partial path is associated with a coherent behavior or an erroneous behavior. The coherent partial path distribution $\pi_{PathC,t}$ on $Path$ indicates, for each partial path $path_j \in Path$, the possibility that the partial path is associated with a coherent behavior according to the observed plan $P_{obs_t}$.
Thus, for each partial path $path_j \in Path$, the possibility is obtained by selecting the maximum value between the minimum prediction possibility $\pi_{pre_t}$ among the next possible actions in the activity plan and the minimum value among the recognition possibilities $\pi_{rec_t}$ for the partial path’s observed actions and the activity realization possibilities $\pi_{real_\alpha}$ for the partial path’s activity for each observation time $t$ in the partial path. The erroneous partial path distribution $\pi_{PathE,t}$ on $Path$ denotes, for each partial path $path_j \in Path$, the possibility that the partial path is associated with an erroneous behavior according to the observed plan $P_{obs_t}$. So, for each $path_j \in Path$, the possibility is obtained by selecting the minimum value among the recognition possibilities $\pi_{rec_t}$ for the observed actions not in the partial path and the activity realization possibilities $\pi_{real_\alpha}$ for the partial path’s activity for each observation time $t$ not in the partial path. By considering the set of possible activities $\mathcal{P}_{poss} \subseteq \mathcal{P}$ that are in the partial path set $Path$, we can generate hypotheses concerning the observed behavior of an occupant when he carries out some activities in the smart home environment. A behavior hypothesis $h_i \in H_t$ consists of a subset of $\mathcal{P}_{poss}$, where each activity in the hypothesis is not subsumed by another activity in the hypothesis. Two interpretations can be given to each hypothesis $h_i$. A hypothesis $h_i$ can be interpreted as a coherent behavior where the occupant carries out, in a coherent way, the activities in the hypothesis. Those activities can, at the current observation time, be partially realized. Also, a hypothesis $h_i$ can be interpreted as an erroneous behavior where the occupant carries out some activities in an erroneous way, while the activities in the hypothesis, if any, are carried out in a coherent way. According to these two interpretations, we evaluate the coherent behavior and erroneous behavior possibility distributions, $\pi_{BevC,t}$ and $\pi_{BevE,t}$, on the behavior hypothesis set $H_t$. The coherent possibility distribution $\pi_{BevC,t}$ indicates, for each $h_i \in H_t$, the possibility that a behavior hypothesis denotes coherent behavior according to the observed plan $P_{obs_t}$ and the partial paths associated with the activities in the hypothesis. If there is no activity in the hypothesis, or if some observed actions in $P_{obs_t}$ are not associated with at least one partial path, the possibility is zero. Otherwise, for each behavior hypothesis $h_i \in H_t$, the possibility $\pi_{BevC,t}(h_i)$ is obtained by selecting the maximum value among the activities’ minimal coherent partial path possibilities. The erroneous possibility distribution $\pi_{BevE,t}$ indicates, for each $h_i \in H_t$, the possibility that a behavior hypothesis denotes erroneous behavior according to the observed plan $P_{obs_t}$ and the partial paths associated with the activities in the hypothesis. If there is no activity in the hypothesis, the possibility is obtained by selecting the minimal action recognition possibility among the actions in the observed plan $P_{obs_t}$. Otherwise, for each behavior hypothesis, the possibility $\pi_{BevE,t}(h_i)$ is obtained by selecting the maximum value among the activities’ minimal erroneous partial path possibilities.
With those two possibility distributions, we can evaluate, in the same manner as for action recognition, the possibility and necessity measures that each hypothesis represents coherent or erroneous behavior that could explain the observed plan $P_{obs_t}$. The possibility and necessity measures that a hypothesis in $B \subseteq H_t$ represents coherent behavior that could explain the observed plan $P_{obs_t}$ are given by $\Pi_{BevC,t}(B)$ and $N_{BevC,t}(B)$, which are obtained from the $\pi_{BevC,t}$ possibility distribution. The possibility and necessity measures that a hypothesis in $B \subseteq H_t$ represents erroneous behavior that could explain the observed plan $P_{obs_t}$ are given by $\Pi_{BevE,t}(B)$ and $N_{BevE,t}(B)$, which are obtained from the $\pi_{BevE,t}$ possibility distribution. The most possible and necessary hypotheses are then selected according to the $\Pi_{BevC,t}$, $N_{BevC,t}$, $\Pi_{BevE,t}$ and $N_{BevE,t}$ measures on the hypothesis set $H_t$. The result of the behavior recognition is then sent to an assistive agent, which will use it to plan a helping task if needed.
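Since these measures are said to be evaluated in the same manner as for action recognition, they can be written out explicitly. The following is a sketch under that assumption, transposing the action-level definitions of Section 2.4.2 to a hypothesis subset $B \subseteq H_t$; it is not a formula stated in this form in the text. The erroneous-behavior measures would follow by replacing $BevC$ with $BevE$ throughout.

```latex
\begin{align*}
\Pi_{BevC,t}(B) &= \max_{h_i \in B} \pi_{BevC,t}(h_i), \\
N_{BevC,t}(B)   &= \max_{h_i \in H_t} \pi_{BevC,t}(h_i)
                   \;-\; \max_{h_i \in H_t \setminus B} \pi_{BevC,t}(h_i).
\end{align*}
```

Under this reading, a subset $B$ has high necessity only when every hypothesis outside $B$ is clearly less possible than the best hypothesis overall.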
2.4.4 Overview of the Activity Recognition Process
Let us illustrate the recognition process of our possibilistic model inside a smart home environment with a small example, where the possible activities are DishWashing, DrinkCupWater, and WatchTv. Suppose that the environment’s sensor events indicate a context where a kitchen tap is open. According to the set of entailed contexts and the possibilistic formalization of the actions in the action ontology $\mathcal{A}$, the system evaluates the action prediction and recognition possibility distributions for the current observation. By using the possibility and necessity measures obtained from the recognition possibility distribution, the system finds three plausible actions that could explain the environment changes: OpenHotWaterKitchen, OpenColdWaterKitchen, and OpenTapKitchen. According to the action subsumption relation defined in the action ontology, the recognized action that will be appended to the observed plan is OpenTapKitchen, since it is the most specific common subsumer of OpenHotWaterKitchen and OpenColdWaterKitchen (the most specific actions). Let us suppose that we observe the action TurnOnTv 1 minute later, which is associated with the activity WatchTv. Then, the observed plan used for the behavior recognition will be: (OpenTapKitchen, 0) $\prec_T$ (TurnOnTv, 1). According to the possible activities and their partial paths, some behavior hypotheses $h_i \in H_t$ are generated and the system evaluates the coherent behavior and erroneous behavior possibility distributions on these hypotheses. According to the possibility and necessity measures obtained from these possibility distributions, the system selects the most plausible behavior hypotheses and sends them to an assistive system that will plan a helping task if needed. For instance, a plausible hypothesis can denote a coherent behavior where the observed occupant carries out, in an interleaved way, the WatchTv and DishWashing activities.
If the temporal constraints associated with the next action of DishWashing are not satisfied later in the recognition process, this hypothesis will be rejected.
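The selection of OpenTapKitchen in this example can be reproduced with a short Python sketch of the most-specific-common-subsumer rule. It assumes, for illustration only, that the action ontology is a tree stored as a child-to-parent map; the entries for OpenTap and All and the function names are hypothetical, and a real DL ontology may have several direct subsumers per action.

```python
# Hypothetical ontology fragment: each action maps to its direct subsumer.
PARENT = {
    "OpenHotWaterKitchen": "OpenTapKitchen",
    "OpenColdWaterKitchen": "OpenTapKitchen",
    "OpenTapKitchen": "OpenTap",
    "OpenTap": "All",
}


def ancestors_or_self(action):
    """Return the action followed by its subsumers, from most to least specific."""
    chain = [action]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def most_specific(actions):
    """Keep the actions that do not subsume any other action of the set."""
    return [
        a for a in actions
        if not any(a != b and a in ancestors_or_self(b) for b in actions)
    ]


def select_observed_action(plausible):
    """Most specific common subsumer of the most specific plausible actions."""
    if not plausible:
        return None
    leaves = most_specific(plausible)
    common = set(ancestors_or_self(leaves[0]))
    for leaf in leaves[1:]:
        common &= set(ancestors_or_self(leaf))
    # Walk upward from one leaf: the first shared subsumer is the most specific one.
    for candidate in ancestors_or_self(leaves[0]):
        if candidate in common:
            return candidate
    return None


selected = select_observed_action(
    ["OpenHotWaterKitchen", "OpenColdWaterKitchen", "OpenTapKitchen"]
)
```

With the three plausible actions of the example, the two water-specific actions are the most specific ones, and their first shared subsumer is OpenTapKitchen, which is therefore appended to the observed plan.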
2.5 Smart Home Validation
In this section, we present results from our possibilistic model implementation in the Ambient Intelligence for Home based Elderly Care (AIHEC) project’s infrastructure at Singapore’s Institute for Infocomm Research (I2R) [36]. This infrastructure consists of a simulated smart home environment, which contains stations that represent smart home rooms (pantry, dining, . . . ). The behavior of the observed person is monitored by using pressure sensors (to detect sitting on a chair), RFID antennas (to detect the cup and plate on the table and in the cupboard), PIR sensors (to detect movement in the pantry and dining areas), reed switch sensors (to detect the opening and closing of the cupboard), accelerometer sensors (to detect the occupant’s hand movements), and video sensors (mainly to annotate and audit the observed occupant’s behavior). The event manager, which collects the dining and pantry sensor events, is based on a Dynamic Bayesian Network (DBN) [37]. It should be noted that other approaches could be used instead of a DBN in the event manager. Also, the lab environment uses a wireless sensor network.
Fig. 2.1 Simplified Smart Home Assistance System
Our possibilistic behavior recognition model is implemented according to the simplified smart home system architecture (Figure 2.1), and is subdivided into three agents: environment representation agent, action recognition agent, and behavior recognition agent. The system architecture works as follows. Basic events related to an action realization by the smart home’s occupant are generated by the sensors and are sent to a sensor event manager agent. The sensor event manager agent, which is based on a DBN in this case, infers that an action was carried out and sends the current smart home environment’s state, which could be partially observable, to the environment representation agent. The environment representation agent, which has a virtual representation of the smart home environment encoded in a Pellet description logic system [38], infers which contexts are entailed by the current (partial) environment state. Those entailed contexts are then sent to an action recognition agent, which will use a possibilistic action formalization and the action ontology to select the most plausible action that could explain the observed changes in the environment. This recognized action is sent to a behavior recognition agent, which will use the sequence of observed actions (observed plan) and the activity plan ontology to generate possibilistic hypotheses about the behavior of the observed occupant. These hypotheses, with information from other agents, will be used by an assistive agent, which will plan a helping task, if needed, towards the smart home’s occupant by using the smart home’s actuators.
2.5.1 Results
A previous trial was carried out in this simulated smart home environment, where 6 actors simulated a meal-time scenario several times (coherent and erroneous behavior) on 4 occasions. This meal-time scenario, which is an eating ADL (activity of daily living), contains several sub-activities: getting the utensils (plate), getting the food (biscuit), getting the drink (water bottle) from the cupboard in the pantry to the table, eating and drinking while sitting on the chair, and putting the utensils, food, and drink back in the cupboard. These activities can be interleaved. For instance, the actor can bring the utensils, food and drink to the table at the same time, in which case some actions are shared between these activities (e.g. opening the cupboard). Some erroneous realizations of this scenario were carried out and are mainly associated with realization errors (forgetting an activity step, adding irrelevant actions, breaking the temporal constraint between two actions of an activity), some of which can also be considered as an initiation error (not starting an activity) or a completion error (forgetting to finish the activity). By using the sensor databases for each observed behavior, a set of observed sequences of smart home events was recognized, constituting a set of behavioral realizations. Among those observed behaviors, we selected the 40 most representative scenario realizations (10 coherent and 30 erroneous), since some of them are similar. The selected coherent scenario realizations represent a coherent behavior in which the activities of the meal-time scenario are carried out in an interleaved way. The selected erroneous scenario realizations represent an erroneous behavior in the meal-time scenario, with or without some coherent partial activity realizations. In those erroneous realizations, more than one error type usually occurs (realization, initiation and completion errors).
Each selected scenario realization was simulated in our model implementation by inputting the smart home events related to each realization and using the sensor event manager, in order to recognize the sequence of observed actions and to generate hypotheses concerning the observed behavior, according to the environment, action and activity ontologies. The main goal of our experimentation is to evaluate the high-level recognition accuracy for the observed behavior associated with the realization of the meal-time scenario. Since our recognition approach is based on knowledge about the observed smart home environment’s state, problems related to sensors (e.g. noise) are managed by a sensor event manager, in this case a DBN. This sensor event manager sends the observed environment’s state to our environment representation agent, which infers the plausible contexts that could explain the current observed state. Concerning behavior recognition accuracy, when we check whether a particular scenario realization is among the most possible and necessary behavior hypotheses (subsets with only one hypothesis), our model was able to recognize 56.7 % of the erroneous behaviors and 80 % of the coherent behaviors (overall 62.5 %) (first part of Figure 2.2). When we check whether a particular scenario realization is in a hypothesis set among the most possible and necessary subsets of the behavior hypothesis set (second part of Figure 2.2), our model was able to recognize 63.3 % of the erroneous behaviors (overall 67.5 %). In this case, a most possible and necessary hypothesis subset consists of a set of hypotheses whose possibility values maximize the necessity measure according to a certain threshold (a fixed value or a ratio of the possibility measure). When we consider the erroneous realizations as generic erroneous behavior (erroneous realizations without coherent activities partially carried out), our model was able to recognize 100 % of them (95 % overall) (third part of Figure 2.2). One of the main reasons that some behavior realizations (coherent and erroneous) are not recognized is the rigidity of the action temporal relation, where the only time constraint is a time interval between actions. In this case, some erroneous or coherent realizations are instead considered as generic erroneous realizations, since the coherent partial activity realizations are not recognized. Furthermore, in some cases, the sensor configuration changes slightly (mainly the PIR sensor), which influences the accuracy of the event manager system.
For instance, a change in the PIR sensor localization can produce a lot of perturbations: zones detected by the PIR sensors become overlapped, while the training of the event recognizer was done on non-overlapping zones. It should be noted that all possibility distributions and temporal constraints are obtained from one’s beliefs rather than from historical data. Figure 2.3 plots, for each scenario realization, the system runtime for each observed action that was recognized by our possibilistic model implementation. The runtime includes the interactions with the description logic reasoning system, the action recognition and the behavior recognition. Since the focus of the experimentation is the performance of our behavior recognition implementation, the runtime prior to the smart home event recognition is not considered. For this simulation, the runtime is generally between 100 and 200 milliseconds for each action observed. We observe bell-shaped curves, an effect that results from a reduction in the size of the partial activity realization path set. This reduction results from the fact that some temporal constraints between actions described in the activity plans are no longer satisfied, so that subsets of the partial path set must be removed. It should be noted that the number of actions carried out and the time between them differ from one scenario realization to another, since each scenario realization represents a specific behavior.
Fig. 2.2 Recognition Accuracy of Behavior Recognition: Best Plausible Hypothesis, Best Plausible Subset, and Generic Erroneous Behaviors
2.5.2 Discussion
Several previous related work, such as that of Cook [24] (MavHome project), Mihailidis [7] (Coach project), Helal [26] (Gator Tech Smart House) and Patterson [34] (Barista system), have conducted the same kind of experiments that we did, using synthetic and real data on comparable problems of similar size. Comparing our experimental results with these previous one is not a simple task. One of the main reasons is the rather different nature of the used recognition approaches and algorithms. In our case, we experimented with a hybrid (logical and possibilistic) approach aiming to recognize precisely the occupant’s correct and incorrect behavior. The output of our recognition algorithm, which takes the form of a set of behavior hypotheses with a possibility distribution that partially order these hypotheses according to the possibility and necessity measures obtained from the possibility distribution, is difficult to compare with other probabilistic approaches, since they do not capture the same facets of uncertainty. Thus, probability theory offers a quantitative model
Fig. 2.3 Activity Recognition Runtime per Observed Action for the Scenario Realizations
for randomness and indecisiveness, while possibility theory offers a qualitative model of incomplete knowledge [39]. Moreover, the objectives of our respective experiments are also somewhat different. In our case, the focus of our experiment was to determine whether our method is able to correctly identify the observed occupant’s behavior, which may involve the realization of multiple interleaved activities as well as erroneous deviations by the occupant. In contrast, the experiment of Mihailidis [7], as an example, focused only on the identification of the person’s current activity step, while assuming that the current on-going activity is known. These two objectives and methods are quite different, which leads to some difficulties in comparing them.
Despite the heterogeneous nature of the experiments in previous work, we can draw some useful comparisons and conclusions from the evaluation of their experimental results. First, most of the previous work exploited a form of probabilistic model (Markovian or Bayesian based). These approaches seem to give better results in recognizing an on-going activity and the current activity step with a small plan library. For instance, the results presented by Helal et al. [27] with a Hidden Markov Model give a recognition accuracy of 98% in identifying the correct activity among five candidates. Also, this approach was able to detect, in a qualitative way, the omitted steps of those activities. The approach of Patterson [34], based on Dynamic Bayesian Networks, was able to identify the specific on-going activity with a recognition accuracy higher than 80%. The Markovian model proposed by Mihailidis [7] has also shown very good results in recognition accuracy. However, this last approach only focused on monitoring a single activity.
In the light of these experimental results, we can draw some comparisons. First, despite their good results, these previous probabilistic models seem to be suited to small recognition contexts with only a few activities. It seems much more difficult to use them on a large scale, knowing that each activity must be handcrafted and included in a stochastic model, while preserving the probability distribution. Also, the propagation of probabilities following an observation can be quite laborious when dealing with a large activity library. One of the main probabilistic approaches, the Hidden Markov Model, requires an exponential number of parameters (according to the number of elements that describe the observed state) to specify the transition and observation models, which means that we need a lot of data to learn the model, and the inference is exponential [40]. Moreover, a lot of these approaches assume that the sensors are not changing, and the recognition is then based on the sensor events and on historical sensor data. This is a limitation, since the smart home environment is dynamic (sensors can be removed, added, moved) and the recognition system could be deployed in different smart homes. For high-level recognition, it is easier to work with the environment’s contexts, which are common to different smart homes if more general concepts are used to describe them. Also, most previous models simply do not take into account the possibility of recognizing coherent behavior composed of a few activities with their steps interleaved. They also tend to identify only certain precise types of errors (e.g., missing steps), while ignoring the others.
Finally, we believe that the biggest problem of using a purely probabilistic theory is the inability to handle, in a natural way, both the imprecision and the uncertainty of incomplete information. One way to deal with this difficulty is to use a probabilistic interval, which means that there are two probability distributions (one for the minimum values and one for the maximum values). Our approach, based on possibility theory, seems to have more flexibility and potential, and to be more advantageous regarding these issues. For instance, by using only one possibility distribution, we can obtain possibility and necessity measures (the interval) on the hypotheses. It allows us to capture partial belief concerning the activities’ execution from human experts, since this theory was initially meant to provide a graded semantics to natural language statements. It also allows us to manage a large number of activities, to take into account multiple interleaved plans, and to recognize most types of correct and incorrect behavior. Furthermore, applications based on possibility theory are usually computationally tractable [41].
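To make the point about a single possibility distribution concrete, the following Python sketch computes the possibility and necessity of a set of hypotheses using the standard definitions from possibility theory: the possibility of a set is the maximum degree over its members, and its necessity is one minus the possibility of its complement. It is only an illustration and not our implementation; the hypothesis names and the degrees assigned to them are hypothetical.

```python
def possibility(pi, event):
    """Possibility of a set of hypotheses: the maximum degree among its members."""
    return max((pi[h] for h in event), default=0.0)


def necessity(pi, event):
    """Necessity of a set of hypotheses: 1 minus the possibility of its complement."""
    complement = set(pi) - set(event)
    return 1.0 - possibility(pi, complement)


# Hypothetical possibility distribution over three behavior hypotheses
# (names and degrees are invented for illustration only).
pi = {
    "coherent_interleaved": 1.0,
    "erroneous_omission": 0.6,
    "erroneous_other": 0.2,
}

event = {"coherent_interleaved"}
# The pair (necessity, possibility) plays the role of the interval on the hypothesis.
interval = (necessity(pi, event), possibility(pi, event))
print(interval)
```

The same distribution serves both measures, whereas an interval-valued probabilistic model would need a lower and an upper distribution.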
2.5.3 Summary of Our Contribution
Looking at the previous approaches, we can note that most applications are based on a probabilistic reasoning method for recognizing the observed behavior. They mainly learn the activities’ patterns by using real sensor data collected in the smart home. A potential drawback of such an approach is that, in order to have an adequate probabilistic model, the required amount of historical data on activity realization can be quite large. Also, since the inhabitant profile influences the activities’ realization, the probabilistic model should be tailored to the inhabitant in order to properly recognize the observed behavior. Furthermore, if we want to deploy an assistive system in existing homes, where each home configuration can be quite different from another, the learning process will need to be carried out again. Some approaches propose learning erroneous patterns of some activities, but since there exist many ways to carry out activities with errors in their realization, the erroneous pattern library will always be incomplete. Moreover, the patient’s habits may change from time to time, according to new experiences, the hour of the day, his physical and psychological condition, etc. Therefore, the patient’s routines must be constantly relearned, and an adaptation period is required by the system. Our possibilistic activity recognition model can be seen as a hybrid between logical and probabilistic approaches. By using knowledge concerning the environment state, the action ontology, the activity ontology, and the uncertainty and imprecision associated with each level of the recognition process, our model evaluates a set of behavioral hypotheses, which is ranked according to the erroneous and coherent behavior possibility distributions.
Since learning the patient profile from the sensors is a long-term process, the use of possibility theory in our recognition model allows us to capture partial belief from human experts, as the theory was originally meant to provide a graded semantics to natural language statements [10]. The knowledge of these human experts concerning the patient profile is incomplete and imperfect, and therefore difficult to represent with probability theory. Our approach takes into account the recognition of coherent (interleaved or not) and erroneous behavior, where both behavior types are valid hypotheses to explain the observed realization of some activities, instead of choosing one behavior type by default. For instance, if some events are not in the learned activity pattern, instead of considering only erroneous behavior or interleaved coherent behavior, both behavior types must be considered in the hypotheses that could explain the observed behavior. In other words, erroneous and coherent (interleaved or not) behaviors are both valid explanations for the patient’s observed behavior when he carries out some activities.
2.6 Conclusion
Despite the important progress made in the activity recognition field over the last 30 years, many basic problems remain open in the discipline and its applications. This paper has presented a formal framework of activity recognition based on possibility theory and description logics as the semantic model of the agent’s behavior. It should be emphasized that the initial framework and our preliminary results are not meant to bring exhaustive or definitive answers to the multiple issues raised by activity recognition. However, they can be considered as a first step toward a more expressive ambient agent recognizer, which will facilitate the support of imprecise and uncertain constraints inherent to smart home environments. This approach was implemented and tested on a real data set, showing that it can provide, inside a smart home, a viable solution for the recognition of the observed occupant’s behavior, by helping the system to identify opportunities for assistance. An interesting perspective for the enrichment of this model consists of using a possibilistic description logic to represent the environment’s observed state, thus taking into account the uncertainty, imprecision and fuzziness related to the information obtained from the sensors and to the description (concepts, roles, assertions) of the environment. Finally, we believe that considerable future work and large scale experimentation will be necessary, in a more advanced stage of our work, to help evaluate the effectiveness of this model in the field.
References
[1] J. Coutaz and J. Crowley, Plan “intelligence ambiante” : Défis et opportunités, Technical report, Engineering Human-Computer Interaction (EHCI) research group, Grenoble Informatics Laboratory (LIG), (2008).
[2] C. Ramos, J. Augusto, and D. Shapiro, Ambient intelligence–the next step for artificial intelligence, IEEE Intelligent Systems, 23(2), 15–18, (2008).
[3] R. Casas, R. B. Marín, A. Robinet, A. R. Delgado, A. R. Yarza, J. Mcginn, R. Picking, and V. Grout, User modelling in ambient intelligence for elderly and disabled people, In Proc. of the 11th ICCHP, number 5105 in LNCS, Springer-Verlag, (2008).
[4] J. C. Augusto and C. D. Nugent, Eds., Designing Smart Homes: The Role of Artificial Intelligence, vol. 4008, LNAI, (Springer, 2006).
[5] C. Geib, Plan recognition, In eds. A. Kott and W. M. McEneaney, Adversarial Reasoning: Computational Approaches to Reading the Opponent’s Mind, pp. 77–100, Chapman & Hall/CRC, (2007).
[6] M. Philipose, K. P. Fishkin, M. Perkowitz, D. J. Patterson, D. Fox, H. Kautz, and D. Hähnel, Inferring activities from interactions with objects, IEEE Pervasive Computing: Mobile and Ubiquitous Systems, 3(4), 50–57, (2004).
[7] A. Mihailidis, J. Boger, M. Canido, and J. Hoey, The use of an intelligent prompting system for people with dementia: A case study, ACM Interactions. 14(4), 34–37, (2007).
[8] D. Dubois, A. HadjAli, and H. Prade, A possibility theory-based approach to the handling of uncertain relations between temporal points, International Journal of Intelligent Systems, 22(2), 157–179, (2007).
[9] D. Dubois and H. Prade, Possibility Theory: An Approach to Computerized Processing of Uncertainty, (Plenum Press, 1988).
[10] L. A. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems, 1(1), 3–28, (1978).
[11] S. Giroux, T. Leblanc, A. Bouzouane, B. Bouchard, H. Pigot, and J. Bauchet, The praxis of cognitive assistance in smart homes, In eds. B. Gottfried and H. Aghajan, Behaviour Monitoring and Interpretation – BMI – Smart Environments, vol. 3, Ambient Intelligence and Smart Environments, pp. 183–211, IOS Press, (2009).
[12] J. Diamond, A report on Alzheimer’s disease and current research, Technical report, Alzheimer Society of Canada, (2006).
[13] B. Reisberg, S. Ferris, M. Leon, and T. Crook, The global deterioration scale for assessment of primary degenerative dementia, AM J Psychiatry, 9, 1136–1139, (1982).
[14] D. J. Patterson, O. Etzioni, D. Fox, and H. Kautz, Intelligent ubiquitous computing to support alzheimer’s patients: Enabling the cognitively disabled, In Proceedings of UbiCog ’02: First International Workshop on Ubiquitous Computing for Cognitive Aids, Göteborg, Sweden, pp. 1–2, (2002).
[15] J. Boger, J. Hoey, P. Poupart, C. Boutilie, G. Fernie, and A. Mihailidis, A planning system based on Markov Decision Processes to guide people with dementia through activities of daily living, IEEE Transactions on Information Technology in BioMedicine, 10(2), 323–333, (2006).
[16] H. Pigot, A. Mayers, and S. Giroux, The intelligent habitat and everyday life activity support, In Proceeding of the 5th International Conference in Simulations in Biomedicine, Slovenia, pp. 507–516, (2003).
[17] A. S. Rao, Means-end plan recognition: Towards a theory of reactive recognition, In Proceedings of the 4th International Conference on Principles of Knowledge Representation and Reasoning, pp. 497–508. Morgan Kaufmann publishers, (1994).
[18] H. A. Kautz, A formal theory of plan recognition and its implementation, In eds. J. F. Allen, H. A. Kautz, R. N. Pelavin, and J. D. Tenenberg, Reasoning About Plans, chapter 2, pp. 69–126. Morgan Kaufmann, (1991).
[19] D. W. Albrecht, I. Zukerman, and A. E. Nicholson, Bayesian models for keyhole plan recognition in an adventure game, User Modeling and User-Adapted Interaction. 8, 5–47, (1998).
[20] D. J. Patterson, D. Fox, H. Kautz, and M. Philipose, Fine-grained activity recognition by aggregating abstract object usage, In Proceedings of IEEE 9th International Symposium on Wearable Computers (ISWC), (2005).
[21] D. Avrahami-Zilberbrand and G. A. Kaminka, Incorporating observer biases in keyhole plan recognition (efficiently!), In Proc. of the Twenty-Second AAAI Conference on Artificial Intelligence, pp. 944–949. AAAI Press, Menlo Park, CA, (2007).
[22] C. W. Geib and R. P. Goldman, Partial observability and probabilistic plan/goal recognition, In Proceedings of the IJCAI workshop on Modeling Other from Observations, pp. 1–6, (2005).
[23] P. Roy, B. Bouchard, A. Bouzouane, and S. Giroux, A hybrid plan recognition model for Alzheimer’s patients: Interleaved-erroneous dilemma, Web Intelligence and Agent Systems: An International Journal, 7(4), 375–397, (2009).
[24] D. J. Cook, M. Youngblood, and S. K. Das, A multi–agent approach to controlling a smart environment, In eds. J. C. Augusto and C. D. Nugent, Designing Smart Homes: The Role of Artificial Intelligence, vol. 4008, Lecture Notes in Artificial Intelligence, pp. 165–182. Springer, (2006).
[25] V. R. Jakkula and D. J. Cook, Using temporal relations in smart environment data for activity prediction, In Proceedings of the 24th International Conference on Machine Learning, pp. 1–4, (2007).
[26] S. Helal, W. Mann, H. El-Zabadani, J. King, Y. Kaddoura, and E. Jansen, The Gator Tech Smart House: A programmable pervasive space, Computer, 38(3), 50–60, (2005).
[27] A. Helal, D. J. Cook, and M. Schmalz, Smart home–based health platform for behavioral monitoring and alteration of diabetes patients, Journal of Diabetes Science and Technology. 3(1), 141–148 (January, 2009).
[28] S. Szewcyzk, K. Dwan, B. Minor, B. Swedlove, and D. Cook, Annotating smart environment sensor data for activity learning, Technology and Health Care. 17(3), 161–169, (2009).
[29] L. Chen, C. Nugent, M. Mulvenna, D. Finlay, and X. Hong, Semantic smart homes: Towards knowledge rich assisted living environments, In eds. S. McClean, P. Millard, E. El-Darzi, and C. Nugent, Intelligent Patient Management, vol. 189, Studies in Computational Intelligence, pp. 279–296. Springer, (2009).
[30] X. Hong, C. Nugent, M. Mulvenna, S. McClean, B. Scotney, and S. Devlin, Evidential fusion of sensor data for activity recognition in smart homes, Pervasive and Mobile Computing, 5(3), 236–252, (2009).
[31] G. Shafer, A Mathematical Theory of Evidence, (Princeton University Press Princeton, NJ, 1976).
[32] A. Mihailidis, J. Boger, T. Craig, and J. Hoey, The COACH prompting system to assist older adults with dementia through handwashing: An efficacy study, BMC Geriatrics, 8, 28, (2008).
[33] W. S. Lovejoy, A survey of algorithmic methods for partially observed Markov decision processes, Annals of Operations Research, 28, 47–66, (1991).
[34] D. J. Patterson, H. A. Kautz, D. Fox, and L. Liao, Pervasive computing in the home and community, In eds. J. E. Bardram, A. Mihailidis, and D. Wan, Pervasive Computing in Healthcare, pp. 79–103, CRC Press, (2007).
[35] F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, and P. F. Patel-Schneider, Eds., The Description Logic Handbook: Theory, Implementation, and Applications, (Cambridge University Press, 2007), 2e edition.
[36] C. Phua, V. Foo, J. Biswas, A. Tolstikov, A. Aung, J. Maniyeri, W. Huang, M. That, D. Xu, and A. Chu, 2-layer erroneous-plan recognition for dementia patients in smart homes, In Proc. of HealthCom09, pp. 21–28, (2009).
[37] A. Tolstikov, J. Biswas, C.-K. Tham, and P. Yap, Eating activity primitives detection – a step towards ADL recognition, In Proc. of the 10th IEEE Intl. Conf. on e–Health Networking, Applications and Service (HEALTHCOM’08), pp. 35–41, (2008).
[38] E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, and Y. Katz, Pellet: A practical OWL–DL reasoner, J. Web Sem. 5(2), 51–53, (2007).
[39] D. Dubois, L. Foulloy, G. Mauris, and H. Prade, Probability-possibility transformations, triangular fuzzy sets, and probabilistic inequalities, Reliable computing. 10(4), 273–297, (2004).
[40] K. P. Murphy, Dynamic Bayesian Networks: Representation, Inference and Learning. PhD thesis, University of California, Berkeley, (2002).
[41] D. Dubois and H. Prade, Qualitative possibility theory in information processing. In eds. M. Nikravesh, J. Kacprzyk, and L. A. Zadeh, Forging New Frontiers: Fuzzy Pioneers II, Studies in Fuzziness and Soft Computing, pp. 53–83. Springer, (2007).
Chapter 3
Multi-user Activity Recognition in a Smart Home
Liang Wang a,b, Tao Gu b, Xianping Tao a, Hanhua Chen b,c, and Jian Lu a
a State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210093 Jiangsu, P.R. China
b Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark
c School of Computer Science and Technology, Huazhong University of Science and Technology, 1037 Luoyu Road, Wuhan, 430074 Hubei, P.R. China
Abstract
Advances in wearable sensors and wireless networks offer many opportunities to recognize human activities from sensor readings in pervasive computing. Existing work has so far focused mainly on recognizing the activities of a single user in a home environment. However, there are typically multiple inhabitants in a real home and they often perform activities together. In this paper, we investigate the problem of recognizing multi-user activities using wearable sensors in a home setting. We develop a multi-modal, wearable sensor platform to collect sensor data for multiple users, and study two temporal probabilistic models, the Coupled Hidden Markov Model (CHMM) and the Factorial Conditional Random Field (FCRF), to model interacting processes in a sensor-based, multi-user scenario. We conduct a real-world trace collection with two subjects over two weeks, and evaluate these two models through our experimental studies. Our experimental results show that we achieve an accuracy of 96.41% with CHMM and an accuracy of 87.93% with FCRF for recognizing multi-user activities.
3.1 Introduction
The problem of recognizing human actions and activities using video cameras has been studied in computer vision for over a decade [1–3]. With the availability of low-cost sensors and the advancement in wireless sensor networks, researchers in pervasive computing are
recently interested in deploying various sensors to collect observations, and recognizing activities based on these observations. This in turn supports many potential applications such as monitoring activities of daily living (ADLs) [4] for the elderly or people with cognitive impairments [5]. By capturing useful low-level features, such as human motion, living environment and human-to-environment interactions, sensors show great potential for human activity recognition. However, recognizing human activities using sensors is challenging because sensor data are inherently noisy and human activities are complex in nature. Most existing work focuses on recognizing single-user activities [6–15]. However, humans are social beings, and they often form a group to complete specific tasks. Activities that involve multiple users collaboratively or concurrently are common in our daily lives, especially in a home setting. Recognizing multi-user activities using wearable sensors is more challenging than recognizing single-user activities. The main challenges are how to design appropriate wearable sensors to capture user interactions, and how to model interacting processes and perform inferences. In this work, we develop a wearable sensor platform to capture the observations of each user and the interactions among multiple users. Using this platform, we conduct a real-world activity trace collection done by two subjects over a period of two weeks in a smart home. We then study two temporal probabilistic models—Coupled Hidden Markov Model (CHMM) and Factorial Conditional Random Field (FCRF)—to model interacting processes which involve multiple users and recognize multi-user activities. Both models are multi-chained variants of their basic models; CHMM couples HMM with temporal, asymmetric influences while FCRF couples CRF with probabilistic dependencies between co-temporal label sequences. 
We evaluate and compare both models through real system experiments, and analyze their effectiveness in modeling multi-user activities from sensor data.
In summary, the chapter makes the following contributions:
• To the best of our knowledge, this work is the first formal study of two temporal probabilistic models (CHMM and FCRF) in recognizing multi-user activities based on wearable sensors in a smart home environment.
• We develop a multi-modal, wearable sensor platform to capture the observations of each user and the interactions between users, and conduct a real-world trace collection in a smart home.
• We conduct extensive experiments to evaluate our models and analyze their effectiveness in a wearable sensor based setting.
The rest of the chapter is organized as follows. Section 3.2 discusses the related work. In Section 3.3, we present the design of our wearable sensor platform. Section 3.4 describes our proposed activity models, and Section 3.5 reports our empirical studies. Finally, Section 3.6 concludes the chapter.
3.2 Related Work
Much of the early work in human activity recognition [1–3] was done in computer vision. These approaches leverage video cameras and explore various spatio-temporal analyses to recognize people’s actions from video sequences. Recently, researchers have become interested in recognizing activities based on sensor readings. Recognition models are typically probabilistic, and they can be categorized into static and temporal classification. Typical static classifiers include naïve Bayes used in [7–9], decision tree used in [7, 9], and k-nearest neighbor (k-NN) used in [7, 16]. In temporal classification, state-space models are typically used to enable the inference of hidden states (i.e., activity labels) given the observations. We name a few examples here: Hidden Markov Model (HMM) used in [10–12, 17–20], Dynamic Bayesian Network (DBN) used in [13] and Conditional Random Field (CRF) used in [15, 21]. The variants of CRF have been used to model complex activities of a single user. For example, Wu et al. [22] applied Factorial Conditional Random Field (FCRF) to model concurrent activities. There is some existing work on recognizing group activities and modeling interacting processes. Gong et al. [23] developed a dynamically multi-linked HMM model to interpret group activities. They also compared their methods with Multi-Observation HMM, Parallel HMM, and Coupled HMM. Nguyen et al. [24] employed hierarchical HMM for modeling the behavior of each person and the joint probabilistic data association filters for data association. Park et al. [25] presented a synergistic track- and body-level analysis framework for multi-person interaction and activity analysis in the context of video surveillance. An integrated visual interface for gestures and behavior was designed in [26] as a platform for investigating visually mediated interaction with video camera. However, their system only tackled simple gestures like waving and pointing. Du et al. [27] proposed a new DBN model structure with state duration to model human interacting activities (involving two users) using video cameras, combining the global features with local ones. Choudhury et al. [28] modeled the joint turn-taking behavior as a mixed-memory Markov model that
combines the statistics of the individual subjects’ self-transitions and the partners’ cross-transitions. Lian et al. [29] used FCRF to conduct inference and learning from patterns of multiple concurrent chatting activities based on audio streams. Oliver et al. [30, 31] proposed CHMM to model user interactions in video. Oliver et al. [32] also proposed Layered Hidden Markov Models (LHMMs) to diagnose states of a user’s activity based on data streams from video, audio, and computer (keyboard and mouse) interactions. With regard to modality, most of the work mentioned above employed video data only. Audio and other modalities are less frequently used in combination. One possible reason is that it is rather hard to determine the hidden parameters of HMMs in the case of multi-modal group action or activity recognition, where features from each modality are concatenated to define the observation model [33]. Wyatt et al. [27] presented a privacy-sensitive DBN-based unsupervised approach to separating speakers and their turns in a multi-person conversation. They addressed the problem of recognizing sequences of human interaction patterns in meetings with two-layer HMM using both audio and video data. Unlike other work, their framework explicitly modeled actions at different semantic levels from individual to group level at the same time scale. However, to the best of our knowledge, there is no formal study on recognizing multi-user activities using wearable sensors in a smart home environment. An interesting piece of work is that of Lin et al. [34], who deployed various kinds of sensors in a home environment and proposed a layered model to learn multiple users’ preferences based on sensor readings. However, their focus is on learning preference models of multiple users, i.e., relationships among users as well as dependency between services and sensor observations, not on recognizing their activities.
A series of research work has been done in the CASAS smart home project at WSU [35] to serve the residents in a smart home. Singla et al. [36] addressed the problem of recognizing the independent and joint activities among multiple residents in smart environments using a single HMM model. However, they only explored the use of infrastructure sensors deployed in a smart apartment, which limits the information they could obtain on the activities of the residents. Besides, the association between the sensor data and the people who generated them is done manually. Data association is trivial for wearable sensors but difficult for infrastructure sensors, which indicates the advantage of our sensor platform. In our earlier work, we explored a pattern mining approach [37] to multi-user activity recognition. In this paper, we use a machine learning approach and study two temporal models to model interacting processes involving multiple users.
3.3 Multi-modal Wearable Sensor Platform
Multi-modal wearable sensors have been successfully applied in recognizing the activities of daily living by many researchers. Accelerometers have been used to capture the bodily movements of the user [7, 12]. Recent advances in wearable RFID readers have enabled the recognition of activities by capturing object use [10]. Wearable audio devices have also been used in activity recognition [28].
We built a multi-modal wearable sensor platform, as shown in Figures 3.1 and 3.2. This platform measures user movement, user location, human-object interaction, human-to-human interaction and environmental information. To capture acceleration data, we used a Crossbow iMote2 IPR2400 processor/radio board with an ITS400 sensor board, as shown in Figure 3.1c. The ITS400 sensor board also measures environmental temperature, humidity and light. To capture object use, we built a customized RFID wristband reader which incorporates a Crossbow Mica2Dot wireless mote, a Skyetek M1-mini RFID reader and a Li-Polymer rechargeable battery. The wristband is able to detect the presence of a tagged object within a range of 6 to 8 cm. The RFID wristband reader is also able to capture object interaction, i.e., objects passing from one user to another. To capture vocal interaction among users, we use an audio recorder with a maximum sampling rate of 44.1 kHz to record audio data, as shown in Figure 3.1b. In addition, user location is detected simply by placing a UHF RFID reader in each room to sense the proximity of a user wearing a UHF tag. To determine user identity, the device IDs of each iMote2 set and RFID reader are logged and bound to a specific user. The sampling rate of the RFID readers is set to 2 Hz, that of the 3-axis accelerometer in each iMote2 to 128 Hz, and that of the audio recorder to 16 kHz.
When a user performs activities, the acceleration readings from each iMote2 set are transmitted wirelessly to a local server (shown in Figure 3.2a, left or middle) which runs on a laptop PC with an iMote2 IPR2400 board connected through its USB port. When a user handles a tagged object, the RFID wristband reader scans the tag ID and sends it wirelessly to another server (shown in Fig. 3.2a, right) that can map the ID to an object name. This server runs on a Linux-based laptop PC with a MIB510CA serial interface board and a Mica2Dot module connected through its serial port. In addition, human voice and environmental sound are recorded by the audio recorder. All the sensor data with timestamps are logged separately, and will be merged into a single text file as the activity trace for each user.
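The merging of separately logged, timestamped sensor data into one activity trace can be sketched as follows. This is a minimal Python illustration under stated assumptions rather than the logging software used on our servers: the record layout (timestamp, source, value), the source names, the sample values and the tab-separated output format are all hypothetical.

```python
import heapq


def merge_logs(*logs):
    """Merge per-sensor logs, each already sorted by timestamp, into one ordered trace."""
    return list(heapq.merge(*logs, key=lambda record: record[0]))


def format_trace(records):
    """Render the merged trace as tab-separated text, one record per line."""
    return "\n".join(f"{t:.3f}\t{source}\t{value}" for t, source, value in records)


# Hypothetical records: (timestamp in seconds, source, value).
accel_log = [
    (0.000, "accel_right", "0.01,0.98,0.02"),
    (0.008, "accel_right", "0.02,0.97,0.03"),
]
rfid_log = [(0.004, "rfid_right", "cup")]
audio_log = [(0.006, "audio", "frame_0")]

trace = merge_logs(accel_log, rfid_log, audio_log)
print(format_trace(trace))
```

Because each log is already in time order, a k-way merge avoids re-sorting the full trace, which matters when the accelerometer and audio streams are sampled far more often than the RFID readers.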
Fig. 3.1 (a) Wearable sensor set, (b) Audio recorder, (c) iMote2 with ITS400, (d) RFID wristband reader.
Fig. 3.2 (a) Servers for logging sensor data, (b) iMote2 sink node, (c) Mica2Dot sink node.
3.4 Multi-chained Temporal Probabilistic Models
In this section, we first describe our problem statement, then present two multi-chained temporal probabilistic activity models to model and recognize multi-user activities.
3.4.1 Problem Statement
We formulate our multi-user activity recognition problem as follows. We assume that there are a number of training datasets, one for each user. Each training dataset $O$ consists of $T$ observations $O = \{o_1, o_2, \ldots, o_T\}$ associated with activity labels $\{A_1, A_2, \ldots, A_m\}$, where $m$ is the number of multi-user activities. Our objective is to train an appropriate activity model that, given a new sequence of observations from a user, assigns each observation the correct activity label.
3.4.2 Feature Extraction
After obtaining sensor readings, we first need to extract appropriate sensor features. We convert all the sensor readings to a series of observation vectors by concatenating all of the data observed in a fixed time interval, which is set to one second in our experiments. Different types of sensors require different processing to compute various features.
For acceleration data, we compute five features: DC mean, variance, energy, frequency-domain entropy, and correlation. The DC mean is the mean acceleration value in a time interval. Variance is used to characterize the stability of a signal. Energy captures data periodicity, and is computed as the sum of the squared discrete FFT component magnitudes of a signal. Frequency-domain entropy helps to discriminate activities with similar energy values, and is computed as the normalized information entropy of the discrete FFT component magnitudes of a signal. Correlation is computed for every two axes of each accelerometer and for all pair-wise axis combinations of two different accelerometers. This feature aims to capture the correlation among different axes of any two accelerometers. We compute four features (DC mean, variance, energy, and frequency-domain entropy) for each axis of the two 3-axis accelerometers, giving 24 features. The correlation is computed for each of the fifteen pairs of axes. In total, there are 39 features extracted from acceleration data. To improve the reliability of our feature extraction, a cubic spline approach can be used to fill in the missing values, and a low-pass filter can be used to remove the outliers in the data.
For audio data, we compute both time-domain and frequency-domain features. The time-domain features measure the temporal variation of an audio signal, and consist of three features. The first one is the standard deviation of a reading in a time interval, normalized by the maximum reading in the interval.
The second one is the dynamic range defined as (max − min)/ max, where min and max represent the minimum and maximum readings in the interval. The third is Zero-Crossing Rate (ZCR) which measures the frequency content of a signal and is defined as the number of time-domain zero crossings in a time interval. In the frequency domain, we compute two features—centroid (the midpoint of the spectral power distribution) and bandwidth (the width of the range of frequencies that a signal occupies). In total, there are five features extracted from the audio data. A similar technique used for acceleration data can be applied to audio data to improve the reliability
of feature extraction. For RFID readings and location information, we use the object name or location name directly as a feature. For each RFID wristband reader, we choose the first object in a one-second time interval, since a user is unlikely to touch two or more objects in such a short interval. If no RFID reading is observed, or if the tag ID is corrupted, the value is set to NULL. There are three such features in total: two for object usage (one per hand) and one for the user’s location. The above process generates a 47-dimensional observation vector every second (39 acceleration, 5 audio, and 3 RFID/location features). We then transform these observation vectors into feature vectors. A feature vector consists of many feature items, where a feature item is a feature name-value pair in which the feature can be numeric or nominal. We denote a numeric feature as num_feature_i. Suppose its range is [x, y] and an interval [a, b] (or, in other forms, (a, b], [a, b), or (a, b)) is contained in [x, y]. We call num_feature_i@[a, b] a numeric feature item, meaning that the value of num_feature_i lies between a and b inclusive. We denote a nominal feature as nom_feature_j. Suppose its range is {v1, v2, ..., vn}; we call nom_feature_j@vk a nominal feature item, meaning that the value of nom_feature_j is vk. The key step of the transformation is to discretize the numeric features. We follow the entropy-based discretization method, which partitions a range of continuous values into a number of disjoint intervals such that the entropy of the partition is minimal [38]. The class information entropy of candidate partitions is used to select binary boundaries, and the minimal entropy criterion is then used to find multi-level cuts for each attribute. The discretization method partitions the values of the 44 numeric features into a total of 484 disjoint intervals. We can then directly combine each feature name and its interval into a numeric feature item.
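As an illustration of the discretization step, the sketch below selects a single binary boundary by minimizing the class information entropy and then maps a numeric value to a feature item. It is a minimal Python sketch rather than the implementation used in this chapter: the function names (`class_entropy`, `best_binary_cut`, `to_feature_item`), the use of midpoints as candidate boundaries, and the half-open interval form (a, b] are assumptions made for the example, and the recursive multi-level cutting with its stopping criterion is omitted.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of the class labels in one partition."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_binary_cut(values, labels):
    """Return (cut, entropy) for the boundary with minimal class information entropy.

    Candidate boundaries are midpoints between consecutive distinct values
    (an assumption of this sketch). Returns (None, entropy of the whole set)
    when no boundary lowers the entropy.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_ent = None, class_entropy(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Size-weighted average of the entropies of the two partitions.
        ent = (len(left) * class_entropy(left) + len(right) * class_entropy(right)) / n
        if ent < best_ent:
            best_cut, best_ent = (pairs[i - 1][0] + pairs[i][0]) / 2, ent
    return best_cut, best_ent


def to_feature_item(name, value, cuts):
    """Map a numeric value to a feature item such as 'energy@(4.0, inf]'."""
    bounds = [-math.inf] + sorted(cuts) + [math.inf]
    for lo, hi in zip(bounds, bounds[1:]):
        if lo < value <= hi:
            return f"{name}@({lo}, {hi}]"
    return None  # value is NaN or -inf


# Hypothetical energy values for two activities.
cut, ent = best_binary_cut(
    [1.0, 2.0, 3.0, 5.0, 6.0, 7.0],
    ["sit", "sit", "sit", "walk", "walk", "walk"],
)
print(cut, to_feature_item("energy", 4.5, [cut]))
```

In a multi-level setting, the same selection would be applied recursively to each resulting interval until a stopping criterion is met; that criterion is not specified here and would have to follow [38].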
For a nominal feature, the feature name and its value are combined into a nominal feature item. We merge the LEFT_OBJ and RIGHT_OBJ features into one feature by computing LEFT_OBJ ∪ RIGHT_OBJ, so that no object touched during the user-object interaction is lost regardless of the user’s handedness. In our current sensor setting, we have a total of 574 feature items. They are indexed by a simple encoding scheme and are used as the inputs of the probabilistic models described in the following sections.
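The per-window acceleration features described at the start of this section can be sketched as follows. This is a minimal, standard-library-only Python illustration, not the code used in our experiments: the function names are hypothetical, the DFT is a naive O(n^2) version chosen for self-containment, and excluding the DC component from the energy and entropy computations is an assumption of the sketch.

```python
import cmath
import math
from itertools import combinations


def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]


def axis_features(signal):
    """DC mean, variance, energy and frequency-domain entropy for one axis."""
    n = len(signal)
    mean = sum(signal) / n
    variance = sum((x - mean) ** 2 for x in signal) / n
    mags = dft_magnitudes(signal)[1:]  # assumption: DC component excluded
    energy = sum(m * m for m in mags)  # sum of squared FFT magnitudes
    total = sum(mags)
    entropy = 0.0
    if total > 1e-12 and len(mags) > 1:
        probs = [m / total for m in mags if m > 0]
        # Information entropy of the magnitude distribution, normalized to [0, 1].
        entropy = -sum(p * math.log(p) for p in probs) / math.log(len(mags))
    return {"mean": mean, "variance": variance, "energy": energy, "entropy": entropy}


def correlation(x, y):
    """Pearson correlation between two axes; 0.0 if either axis is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def window_features(axes):
    """Feature list for one time window; `axes` holds one sample list per axis."""
    feats = []
    for signal in axes:
        f = axis_features(signal)
        feats += [f["mean"], f["variance"], f["energy"], f["entropy"]]
    feats += [correlation(a, b) for a, b in combinations(axes, 2)]
    return feats


# Six synthetic axes (two 3-axis accelerometers), eight samples each.
axes = [[math.sin(2 * math.pi * (t + i) / 8) + 0.1 * i * t for t in range(8)] for i in range(6)]
print(len(window_features(axes)))  # 4 features x 6 axes + 15 pairwise correlations
```

With six axes the sketch yields 4 × 6 + 15 = 39 values, matching the feature count given in the text. The spline interpolation and low-pass filtering mentioned above are not included.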
3.4.3
Coupled Hidden Markov Model
After feature extraction, we obtain a sequence of feature vectors for each user, where a feature vector f = { f1 , f2 , . . . , fT } is associated with activity labels {A1 , A2 , . . . , Am }. To model a single-user sequence, HMM is often used, and it consists of a hidden variable and
Fig. 3.3 Structure of CHMM
an observable variable at each time step. In this case, the hidden variable is an activity label, and the observable variable is a feature vector. For multiple sequences of observations corresponding to multiple users, we can factorize the basic HMM into multiple channels and couple the HMMs with temporal influences to model interacting processes. The coupling bridges the hidden variables of different chains through conditional transition probabilities. CHMM was originally introduced in [39] for modeling interacting processes; it models the causal influences between the hidden state variables of different chains, and its semantics are clear and easy to understand. We explore CHMM in this work for modeling multi-user sensor sequences and capturing inter-user influences across time. The advantage of using CHMM is that it can recognize activities of both single and multiple users in a unified framework. To the best of our knowledge, this work is among the first to use CHMM to recognize multi-user activities from sensor readings. To illustrate, as shown in Figure 3.3, there are two sequences of states A and B with observations $O^a$ and $O^b$, respectively, at each time slice t. A two-chain CHMM can be constructed by bridging the hidden states of its two component HMMs at each time slice with the crosswork of conditional probabilities $P_{a_t|b_{t-1}}$ and $P_{b_t|a_{t-1}}$. The posterior of a state sequence through a fully coupled two-chain CHMM is defined as follows.
\[
P(S|O) = \frac{\pi_{a_1} P(o_1^a|a_1)\, \pi_{b_1} P(o_1^b|b_1)}{P(O)} \prod_{t=2}^{T} \left[ P_{a_t|a_{t-1}} P_{b_t|b_{t-1}} P_{a_t|b_{t-1}} P_{b_t|a_{t-1}} P(o_t^a|a_t) P(o_t^b|b_t) \right] \tag{3.1}
\]
where $\pi_{a_1}$ and $\pi_{b_1}$ are the initial probabilities of the states, $P_{a_t|a_{t-1}}$ and $P_{b_t|b_{t-1}}$ are the inner-chain state transition probabilities, $P_{a_t|b_{t-1}}$ and $P_{b_t|a_{t-1}}$ are the inter-chain state transition probabilities modeling the interactions, and $P(o_t^a|a_t)$ and $P(o_t^b|b_t)$ are the output probabilities of the states, for which we employ the Gaussian distribution. The CHMM inference problem is formulated as follows. Given an observation sequence O, we need to find a state sequence S that maximizes P(S|O). The Viterbi inference algorithm for HMM can be applied to CHMM with some modifications. The key point is that, at each step, we need to compute both the inner-chain and the inter-chain state transition probabilities, i.e., $P_{a_t|a_{t-1}} P_{b_t|b_{t-1}}$ and $P_{a_t|b_{t-1}} P_{b_t|a_{t-1}}$. The algorithm outputs the best state sequence S, which comprises two state sequences $S_a$ and $S_b$ corresponding to the recognized activity sequences of the two users. There are many existing algorithms for training HMMs, such as Baum-Welch. Since a two-chain CHMM, C, can be constructed by joining two component HMMs, A and B, and taking the Cartesian product of their states, we define our training method as follows. We first train A and B following the maximum likelihood method, and then couple A and B with inter-chain transition probabilities, which can be learnt from the training datasets. This method is efficient since we do not need to re-train the CHMM.
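To make the modified Viterbi step concrete, the following Python sketch decodes a two-chain CHMM over the Cartesian product of the two state spaces, scoring each joint transition with the four factors of Eq. (3.1). It is an illustrative sketch under several assumptions, not our Jahmm-based implementation: all names are hypothetical, the per-slice output probabilities are passed in as precomputed tables instead of being evaluated from Gaussians, and the inter-chain probabilities are estimated by simple smoothed counting over labeled sequences, which is only one possible way of learning them.

```python
import math
from itertools import product


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def estimate_cross_transitions(prev_chain, next_chain, n_prev, n_next, smoothing=1.0):
    """Estimate P(next_t = j | prev_{t-1} = i) by smoothed counting (assumption)."""
    counts = [[smoothing] * n_next for _ in range(n_prev)]
    for i, j in zip(prev_chain[:-1], next_chain[1:]):
        counts[i][j] += 1
    return [[c / sum(row) for c in row] for row in counts]


def chmm_viterbi(pi_a, pi_b, inner_a, inner_b, cross_a, cross_b, emit_a, emit_b):
    """Best joint state sequence of a two-chain CHMM.

    inner_a[i][j] = P(a_t = j | a_{t-1} = i), inner_b likewise for chain b.
    cross_a[i][j] = P(a_t = j | b_{t-1} = i), cross_b[i][j] = P(b_t = j | a_{t-1} = i).
    emit_a[t][j]  = P(o_t^a | a_t = j), emit_b likewise for chain b.
    """
    states = list(product(range(len(pi_a)), range(len(pi_b))))
    score = {
        (a, b): safe_log(pi_a[a]) + safe_log(emit_a[0][a])
        + safe_log(pi_b[b]) + safe_log(emit_b[0][b])
        for a, b in states
    }
    back = []
    for t in range(1, len(emit_a)):
        new_score, pointers = {}, {}
        for a, b in states:
            def step(prev):
                pa, pb = prev
                # Inner-chain and inter-chain transition terms of Eq. (3.1).
                return (score[prev] + safe_log(inner_a[pa][a]) + safe_log(inner_b[pb][b])
                        + safe_log(cross_a[pb][a]) + safe_log(cross_b[pa][b]))
            best_prev = max(states, key=step)
            new_score[(a, b)] = step(best_prev) + safe_log(emit_a[t][a]) + safe_log(emit_b[t][b])
            pointers[(a, b)] = best_prev
        score = new_score
        back.append(pointers)
    # Backtrack from the best final joint state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return [a for a, _ in path], [b for _, b in path]


inner = [[0.8, 0.2], [0.2, 0.8]]
cross = [[0.7, 0.3], [0.3, 0.7]]
emit = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
print(chmm_viterbi([0.5, 0.5], [0.5, 0.5], inner, inner, cross, cross, emit, emit))
```

Because the joint state space has N^2 states for two chains with N states each, every time step examines N^2 × N^2 transitions in this sketch; this is the cost that makes scalability a concern for larger numbers of users.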
3.4.4
Factorial Conditional Random Field
We also study another temporal probabilistic model, FCRF, which was first introduced in [40]. Unlike generative models such as HMM, CRF is an undirected, discriminative model that relaxes the independence assumption of observations and avoids enumerating all possible observation sequences. FCRF factorizes the basic linear-chain CRF by introducing co-temporal connections, and hence it can be used to model interactions among multi-user activities. Figure 3.4 shows an example of an FCRF with two CRF chains, in which the inter-state connections are conditionally trained. The states of the two chains are joined within the same time step. Given the input instance O at a time slice t, we can unroll the CRF chains, A and B, to get a full undirected model, similar to a DBN. It is reported that discriminative models often outperform generative models in classification tasks [41]. We explore the use of FCRF for multi-user activity recognition and compare the performance
Fig. 3.4 Structure of FCRF
of FCRF and CHMM. We define FCRF to model multi-user activities as follows. Let $s = \{s_1, \ldots, s_T\}$ be a sequence of random vectors $s_i = (s_{i1}, \ldots, s_{im})$, where $s_i$ is the state vector at time i, and $s_{ij}$ is the value of variable j at time i. Let C be a set of clique indices, $F = \{f_k(s_{t,c}, o, t)\}$ be a set of feature functions and $\Lambda = \{\lambda_k\}$ be a set of real-valued weights. The FCRF (C, F, Λ) is then determined as follows.
\[
p(s \mid o) = \frac{1}{Z(o)} \prod_{t} \prod_{c \in C} \exp\Big( \sum_{k} \lambda_k f_k(s_{t,c}, o, t) \Big) \tag{3.2}
\]
where $Z(o) = \sum_{s} \prod_{t} \prod_{c \in C} \exp\big( \sum_{k} \lambda_k f_k(s_{t,c}, o, t) \big)$ is the normalization constant, which ensures that the final result is a probability. Given an observation sequence O, we wish to solve two inference problems: (a) computing the marginals $p(s_{t,c} \mid o)$ over all cliques $s_{t,c}$, and (b) computing the Viterbi decoding $s^* = \arg\max_s p(s \mid o)$. The Viterbi decoding can be used to label a new sequence, and marginal computation is used for parameter estimation. There are many inference algorithms for CRF. Exact inference can be very computationally intensive. Loopy belief propagation is one of the most popular algorithms for inference in CRF [40]. In general, belief propagation involves iteratively updating a vector $m = (m_u(o_v))$ of messages between pairs of vertices $o_u$ and $o_v$. The update from $o_u$ to $o_v$ is given by:
\[
m_u(o_v) \leftarrow \sum_{o_u} \Phi(o_u, o_v) \prod_{o_t \neq o_v} m_t(o_u) \tag{3.3}
\]
where $\Phi(o_u, o_v)$ is the potential on the edge $(o_u, o_v)$. Performing this update for an edge $(o_u, o_v)$ in one direction is also called sending a message from $o_u$ to $o_v$. Given a message vector m, approximate marginals are computed as follows.
\[
p(o_u, o_v) \leftarrow \kappa\, \Phi(o_u, o_v) \prod_{o_t \neq o_v} m_t(o_u) \prod_{o_w \neq o_u} m_w(o_v) \tag{3.4}
\]
where κ is a normalization constant.
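The message update of Eq. (3.3) and the marginal computation of Eq. (3.4) can be illustrated on a very small pairwise model. The Python sketch below is a simplified illustration rather than the Mallet-based implementation: the names `run_bp` and `edge_marginal`, the synchronous update schedule, the fixed iteration count, and the per-message normalization are assumptions of the example, and only edge potentials are modeled.

```python
from itertools import product


def edge_potential(potentials, u, v, xu, xv):
    """Look up Phi(x_u, x_v) for an undirected edge stored in either orientation."""
    if (u, v) in potentials:
        return potentials[(u, v)][xu][xv]
    return potentials[(v, u)][xv][xu]


def build_neighbors(potentials):
    """Adjacency sets derived from the edge list."""
    neighbors = {}
    for u, v in potentials:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


def run_bp(potentials, n_states, n_iters=20):
    """Synchronous sum-product updates; messages[(u, v)][x_v] is m_u(x_v)."""
    neighbors = build_neighbors(potentials)
    messages = {(u, v): [1.0] * n_states for u in neighbors for v in neighbors[u]}
    for _ in range(n_iters):
        updated = {}
        for (u, v) in messages:
            msg = []
            for xv in range(n_states):
                total = 0.0
                for xu in range(n_states):
                    incoming = 1.0
                    for t in neighbors[u] - {v}:  # all neighbors of u except v
                        incoming *= messages[(t, u)][xu]
                    total += edge_potential(potentials, u, v, xu, xv) * incoming
                msg.append(total)
            norm = sum(msg)
            updated[(u, v)] = [m / norm for m in msg]  # normalize for stability
        messages = updated
    return messages


def edge_marginal(potentials, messages, u, v, n_states):
    """Pairwise marginal p(x_u, x_v) from the messages, following Eq. (3.4)."""
    neighbors = build_neighbors(potentials)
    table = [[0.0] * n_states for _ in range(n_states)]
    for xu, xv in product(range(n_states), repeat=2):
        val = edge_potential(potentials, u, v, xu, xv)
        for t in neighbors[u] - {v}:
            val *= messages[(t, u)][xu]
        for w in neighbors[v] - {u}:
            val *= messages[(w, v)][xv]
        table[xu][xv] = val
    kappa = 1.0 / sum(map(sum, table))  # normalization constant
    return [[kappa * x for x in row] for row in table]


# A three-variable chain 0 - 1 - 2 with binary states.
potentials = {(0, 1): [[2.0, 1.0], [1.0, 2.0]], (1, 2): [[3.0, 1.0], [1.0, 1.0]]}
messages = run_bp(potentials, n_states=2)
print(edge_marginal(potentials, messages, 0, 1, n_states=2))
```

On a tree-structured graph such as this chain, the computed marginals are exact; on graphs with cycles, such as an unrolled FCRF, the same updates yield approximate marginals.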
FCRF is trained by approximate parameter estimation. Given training data O, we aim to find a set of parameters Λ by optimizing the conditional log-likelihood L(Λ) defined as follows.
\[
L(\Lambda) = \sum_{i} \log p_{\Lambda}\big(s^{(i)} \mid o^{(i)}\big) \tag{3.5}
\]
where $O = \{(o^{(i)}, s^{(i)})\}_{i=1}^{N}$ and $\Lambda = \{\lambda_k\}$. To reduce overfitting, we define a prior p(Λ) over the parameters, and optimize $\log p(\Lambda \mid O) = L(\Lambda) + \log p(\Lambda)$. By using a spherical Gaussian prior with mean μ = 0 and covariance matrix $\Sigma = \sigma^2 I$, the gradient becomes
\[
\frac{\partial \log p(\Lambda \mid O)}{\partial \lambda_k} = \frac{\partial L}{\partial \lambda_k} - \frac{\lambda_k}{\sigma^2} \tag{3.6}
\]
The objective log p(Λ | O) is concave (its negative is convex) and can be optimized using the L-BFGS [42] technique.
3.4.5
Activity Models in CHMM and FCRF
The dataset we collected consists of a sequence of sensor observations for each user. We first preprocess the observation sequences into feature vectors as described in Section 3.4.2. Each sequence of feature vectors is divided into a training sequence and a testing sequence, and input to a CHMM (or FCRF) model. The activity model is first trained from multiple training sequences corresponding to multiple users, and the trained model is then used to infer activities given multiple testing sequences. We implemented our HMM model based on Jahmm, an open source implementation of HMM in Java [43], and extended its code to implement our CHMM model. We also used Mallet [44], an open source Java-based implementation of CRF, and implemented our FCRF model by extending its existing code. In the training process, we build a CHMM (or FCRF) model for multiple users, where one single-chain HMM (or CRF) is used for each user and each hidden state in the HMM (or CRF) represents one activity of that user. We train the CHMM model with multiple training sequences using the parameter estimation method described in Section 3.4.3. When testing, the testing sequences are fed into the trained CHMM (or FCRF) model, and the inference algorithm then outputs multiple state sequences, one for each single-chain HMM (or CRF), as the labeled sequences.
Fig. 3.5 Snapshots showing our smart home, tagged objects, and various activities being performed in our sensor data collection. (a) Our smart home, (b) brushing teeth, (c) making tea, (d) tagged objects, (e) making coffee, (f) having meal, (g) using computer.
3.5
Experimental Studies
We now move to evaluate these two models. Our first task is to select an appropriate sensor dataset with proper annotations, consisting of a reasonable number of multi-user activity instances. In this section, we first describe our data collection methodology and experimental setup, then present and discuss the evaluation results obtained from a series of experiments.
3.5.1
Trace Collection
Collecting sensor data for multiple users in a real home is a difficult and time-consuming task. Aiming for a realistic data collection, we first conducted a survey among 30 university students. In this survey, each participant was asked to report what daily activities (both single- and multi-user activities) she/he performed at home, and how each activity was performed (i.e., where each activity is performed, the number of persons involved, the number of times each activity is performed each day, the duration of each activity, the major steps and objects used in each activity, etc.). They were asked to report not only their own experiences, but also the experiences of their family members (e.g., parents, siblings, etc.). In return, each participant received a small remuneration. The survey reports yield several findings. First, in a home environment, although many daily activities can be performed by multiple users, single-user activities are still the majority. Second, there are typically fewer than four people involved in a multi-user activity. Third, most multi-user activities are performed in the same room. Fourth, interactions among users occur in a number of ways, including voice conversation, object passing, etc. To have a reasonable and realistic data collection, we randomly selected 21 activities (shown in Table 3.1) from the list in our survey. The ratio of the number of single-user activities to the number of multi-user activities is close to the result in our survey. We limit the number of users to two to reduce annotation effort. Our data collection was done in a smart home (i.e., a living lab environment), as shown in Figure 3.5 (top left). The smart home consists of a living room, a kitchen, two bedrooms, a study room, a bathroom and a store room. Each room, except the bathroom, is equipped with a video camera for recording the ground truth. We tagged over 100 day-to-day objects such as tablespoons, cups and a computer mouse using HF RFID tags in three different sizes: coin, clip and card. Figure 3.5 (top right) shows some tagged objects in the kitchen. We have two male subjects, both student volunteers from a local university. During data collection, each subject wore a set of wearable sensors we developed and performed these activities following the typical steps for each activity. The activities were performed in an order close to daily practice.
Figure 3.5 shows some snapshots of various activities being performed in the bathroom, the living room, the kitchen and the study room during our data collection. A set of servers was set up in the living room to log the trace. All the servers and sensors were synchronized before data collection: one server broadcasts a message containing its current system timestamp, and the other servers and sensors synchronize their local clocks with this timestamp on receiving the message. For each user, a trace was logged and annotated by an annotator, who is also a student volunteer from a local university. The ground truth was also recorded by video cameras. Data collection was done over a period of ten days across two weeks, and we collected a total of 420 annotated instances for both subjects. For each user, there are 150 single-user activity instances and 60 multi-user activity instances. The ratio of multi-user instances to single-user instances is deliberately higher than the result in our survey, in order to have more multi-user activity instances.
Table 3.1 Activities Performed in Our Trace Collection.

Single-user Activities                                Multi-user Activities
 0  brushing teeth      8  vacuuming                  15  making pasta
 1  washing face        9  using phone                16  cleaning a dining table
 2  brushing hair      10  using computer             17  making coffee
 3  making pasta       11  reading book/magazine      18  toileting (with conflict) (a)
 4  making coffee      12  watching TV                19  watching TV
 5  toileting          13  having meal                20  using computer
 6  ironing            14  drinking
 7  making tea

(a) This is a case when one person tried to use the toilet which was being occupied.
Compared with the dataset collected in [36], our dataset involves more activities (21 versus 15) and a much larger amount of sensor data (wearable sensors with a high sampling rate versus infrastructure sensors).
3.5.2
Evaluation Methodology
We use ten-fold cross-validation to generate our training and test datasets. We evaluate the performance using the time-slice accuracy, which is a typical technique in time-series analysis. The time-slice accuracy represents the percentage of correctly labeled time slices. The length of a time slice Δt is set to 1 second. This time-slice duration is short enough to provide highly accurate labeling of activities as well as precise measurements for most activity recognition applications, and is commonly used in previous work [12, 22, 45]. The time-slice accuracy metric is defined as follows.
\[
\text{Accuracy} = \frac{\sum_{n=1}^{N} [\text{predicted}(n) = \text{ground\_truth}(n)]}{N} \tag{3.7}
\]
where $N = T/\Delta t$.
3.5.3
Accuracy Performance
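Before turning to the results, the time-slice accuracy of Eq. (3.7) can be made concrete with a small Python sketch. It is an illustration only: the helper names (`expand_segments`, `time_slice_accuracy`) and the segment representation as (label, duration in seconds) pairs are assumptions of the example, not part of our evaluation code.

```python
def expand_segments(segments, delta_t=1.0):
    """Turn (label, duration_in_seconds) segments into one label per time slice."""
    slices = []
    for label, duration in segments:
        slices.extend([label] * int(round(duration / delta_t)))
    return slices


def time_slice_accuracy(predicted, ground_truth):
    """Fraction of time slices whose predicted label equals the ground truth."""
    if len(predicted) != len(ground_truth):
        raise ValueError("label sequences must cover the same time slices")
    if not ground_truth:
        raise ValueError("at least one time slice is required")
    correct = sum(1 for p, g in zip(predicted, ground_truth) if p == g)
    return correct / len(ground_truth)


# Hypothetical five-second trace with a one-second time slice.
truth = expand_segments([("making tea", 3), ("watching TV", 2)])
predicted = ["making tea", "making tea", "making coffee", "watching TV", "watching TV"]
print(time_slice_accuracy(predicted, truth))
```

In this hypothetical trace, four of the five slices are labeled correctly, so the accuracy is 4/5.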
In the first experiment, we evaluate and compare the accuracies of the two multi-chained probabilistic models. Figure 3.6 shows the accuracies of both CHMM and FCRF on all ten datasets. Table 3.2 breaks down the results by type of activity and by user. Both models achieve acceptable performance with similar accuracies: 85.46%
Fig. 3.6 Accuracy results breakdown in datasets.
Table 3.2 Accuracy Results Breakdown in Type of Activities and Users

           CHMM Accuracies                        FCRF Accuracies
Users      Single-user  Multi-user  Overall       Single-user  Multi-user  Overall
           Activity     Activity                  Activity     Activity
user 1     74.79%       96.91%      82.22%        85.75%       87.02%      86.70%
user 2     85.11%       95.91%      88.71%        82.56%       88.84%      86.37%
overall    79.95%       96.41%      85.46%        84.16%       87.93%      86.54%
for CHMM and 86.54% for FCRF. One observation is that CHMM outperforms FCRF on multi-user activities for both users. To analyze, CHMM couples HMMs with temporal, asymmetric influences, while FCRF couples CRFs with probabilistic dependencies between co-temporal links. When modeling user interactions, the coupling method that bridges time slices seems to offer a better model of inter-process influences. A similar observation can be found in [39]. Another observation from Table 3.2 is that the recognition accuracy for multi-user activities is higher than that for single-user activities for both models. The results show that both models tend to recognize an activity as performed by multiple users. For example, over half of the instances of watching TV (single-user activity) were recognized as watching TV (multi-user activity) by CHMM, and one-third of the instances of toileting (single-user activity) were recognized as toileting (with conflict) by FCRF. We do not yet have a satisfactory explanation for this phenomenon. To understand it clearly, extensive experiments with more activities and more subjects involved are
Table 3.3 Confusion Matrix of CHMM (percentages)
Predicted Activities
Ground Truth Activities 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 0 27.8 2.4 0.1 1 72.2 97.6 5.0 2 100 92.9 25.3 3 7.1 74.7 4 5 100 6 93.7 1.1 7 1.7 93.9 3.5 8 95.6 9 100 10 100 11 68.0 1.7 12 0.4 4.8 43.3 13 0.4 0.4 84.9 14 0.3 0.4 0.1 90.2 15 2.5 0.6 0.1 90.0 4.9 5.5 10.0 95.1 8.8 0.3 0.4 16 91.2 0.6 0.1 0.1 0.3 5.0 17 18 0.2 0.4 0.1 4.4 100 19 0.1 24.4 54.4 0.2 100 20 0.8 0.1 9.7 100
necessary, which we leave for future work. We also present the confusion matrix for each model in Tables 3.3 and 3.4, respectively. The columns in each table show the ground-truth labels and the rows show the predicted labels. The activity serial numbers in both tables are identical to the numbers in Table 3.1. The values in each table are percentages: for each activity, they give the percentage of its entire observation sequence that is predicted correctly and the percentages that are predicted as other labels. For CHMM, three single-user activities (brushing hair, toileting and using computer) and two multi-user activities (watching TV and toileting w/conf) give the highest accuracies, and three single-user activities (brushing teeth, reading book/magazine, and watching TV) perform the worst. For FCRF, two single-user activities (vacuuming and drinking) and one multi-user activity (cleaning a dining table) perform the best, and two single-user activities (toileting and watching TV) and one multi-user activity (watching TV) perform the worst. Most confusion takes place in the following four cases: Case 1: A single-user activity is predicted as another single-user activity. For example, the result of CHMM shows that, for the making coffee activity, while 74.7% of its entire
Table 3.4 Confusion Matrix of FCRF (percentages)
Predicted Activities
Ground Truth Activities 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 0 95.6 26.3 12.4 0.2 4.2 1 1.4 72.4 9.4 0.2 1.9 2 1.3 74.2 0.1 0.2 10.4 1.3 3.0 78.1 3.3 1.8 3 1.9 73.7 0.2 9.6 4 2.3 5 0.1 63.7 0.2 6 0.6 82.2 1.9 7 6.2 0.2 93.9 0.2 0.1 5.2 8 0.3 99.6 0.4 9 0.2 94.7 10 0.2 80.9 0.6 7.3 11 0.1 1.5 0.7 93.6 0.9 29.5 12 7.7 2.9 0.3 68.4 0.1 13 0.3 94.5 0.9 1.3 0.6 0.1 96.7 14 15 0.4 9.9 0.6 0.4 89.3 0.2 1.2 16 5.3 0.3 98.0 0.1 0.3 82.2 0.1 17 0.4 10.0 15.6 1.6 18 2.9 0.3 33.4 0.3 95.5 19 0.3 9.7 5.2 28.4 0.2 70.5 20 0.2 18.4 1.8 92.1
observation sequence is predicted correctly, 25.3% is predicted as another activity, making pasta. The result of FCRF shows that, for the washing face activity, while 72.4% of its entire observation sequence is predicted correctly, 26.3% is predicted as another activity, brushing teeth. Most of these recognition errors occur between activities that take place in the same location and involve similar user movements and object usage. Making coffee and making pasta both take place in the kitchen, while washing face and brushing teeth both take place in the bathroom. It is also possible that the RFID reader worn on the user’s hand senses objects that are not supposed to be used. For example, the RFID reader may read the tag attached to the coffee pot or the toothbrush while the user is making pasta or washing face, respectively. Case 2: A single-user activity is predicted as a multi-user activity. For example, the result of CHMM shows that 24.4% of the observation sequence of reading book/magazine is predicted as watching TV (multi-user) and 54.4% of the observation sequence of watching TV (single-user) is predicted as watching TV (multi-user), and the result of FCRF shows that 33.4% of the observation sequence of toileting is predicted as toileting w/conf. Case 3: A multi-user activity is predicted as another multi-user activity. For example,
for the making pasta activity, the result of CHMM shows that 10.0% of its observation sequence is predicted as another activity, cleaning a dining table. These errors can occur when activities share some objects. Spoons and plates are touched both when the user is making pasta and when cleaning a dining table. Case 4: A multi-user activity is predicted as a single-user activity. For example, the result of FCRF shows that 29.5% of the observation sequence of watching TV (multi-user) is predicted as reading book/magazine. Both activities involve sitting and both take place in the living room. Possible solutions for improving multi-user activity recognition include deploying more sensors that are potentially useful for capturing user interactions. For example, sensors such as a gyro, a 3D compass or a tilt switch sensor could be used to keep track of a user’s head movement. We observe that two users usually face each other when they are talking. Detecting that they face each other can be used as an additional feature, besides voice conversation, to capture their interactions, which we leave for our future work.
3.6
Conclusions and Future Work
In this chapter, we study the fundamental problem of recognizing activities of multiple users using wearable sensors in a home setting. We develop our wearable sensor platform and conduct a real-world trace collection in a smart home. We then investigate the challenging problem of how to model and classify multi-user activities. We study two multi-chained temporal probabilistic models, CHMM and FCRF, and our evaluation results demonstrate the effectiveness of both models. We will deploy the system in real-world environments to study its performance, such as recognition accuracy and delay. Typical end users without prior knowledge about our research will be involved in the experiments to study the usability of the system. We will also study better annotation methods that can accurately record the activities performed without affecting the user’s behavior. There are a number of limitations in this work. Although the datasets we collected in this chapter contain various cases of both single- and multi-user activities, they were still collected in a “mock” scenario. A more natural collection should be conducted in a real home and done by real users. We plan to carry out such a data collection in our future work, and to evaluate these two multi-chained models further based on a real dataset. We expect that such a dataset will typically contain much background noise, hence it will be challenging to handle noise in these activity models. In addition, as we discussed in Section 3.5.3, we will further develop our sensing platform by introducing more sensors such as a gyro, a 3D compass, etc.
These sensors will be very useful in capturing additional sensor features related to user interactions. We plan to deploy our system in a trial to study its performance. We will also investigate better annotation methods that can accurately record the activities performed without affecting the user’s behavior. Another limitation of our work is that, although CHMM and FCRF are effective in recognizing multi-user activities, the scalability of these models has not been investigated. The complexity of inference in CHMM is $O(TN^{2C})$ for an algorithm which takes the Cartesian product of C chains, each with N hidden states, observing T data points [39]. Although an approximate algorithm is proposed in [39] which reduces the complexity to $O(T(CN)^2)$, this solution remains questionable in terms of scalability. For FCRF, the algorithms used for inference in the model are also computationally expensive. One possible way of reducing the complexity is to use location information to reduce the search space of the models, by assuming that activities involving multiple users can only be performed when all the users concerned are at the same location. We plan to investigate the scalability of both models in our future work. We will also investigate a possible future research direction: recognizing activities in a more complex scenario where single- and multi-user activities are mixed with interleaved (i.e., switching between the steps of two or more activities) or concurrent (i.e., performing two or more activities simultaneously) activities. For example, while two users are preparing a meal, one of the users turns on the TV to watch today’s headline news. Similar cases may exist in our daily lives. Recognizing activities in such a complex situation can be very challenging when we consider both single- and multi-user activities at the same time, and hence an in-depth study is required.
Our final goal is to develop an efficient, real-time, sensor-based activity recognition system capable of recognizing various activities for multiple users under different real-life scenarios, and to deploy the system for real-life trials.
Acknowledgement This work was supported by the Danish Council for Independent Research, Natural Science under Grant 09-073281, National 973 program of China under Grant 2009CB320702, Natural Science Foundation of China under Grants 60736015, 60721002 and 61073031 and Jiangsu PanDeng Program under Grant BK2008017.
References
[1] Y. Yacoob and M. J. Black, Parameterized modeling and recognition of activities, In Proc. of Computer Vision and Image Understanding, pp. 73: 232–247, (1999).
[2] D. Moore, I. Essa, and M. Hayes, Exploiting human actions and object context for recognition tasks, In Proc. of Int’l Conf. on Computer Vision (ICCV’99), pp. 1: 80–86, (1999).
[3] Y. A. Ivanov and A. F. Bobick, Recognition of visual activities and interactions by stochastic parsing, In IEEE Trans. Pattern Recognition and Machine Intelligence, pp. 22(8): 852–872, (2000).
[4] S. Katz, A. B. Ford, R. W. Moskowitz, B. A. Jackson, and M. W. Jaffe, Studies of illness in the aged, the index of adl: A standardized measure of biological and psychological function, In J. American Medical Association, pp. 914–919 (September, 1963).
[5] M. Pollack, L. Brown, D. Colbry, C. McCarthy, C. Orosz, B. Peintner, S. Ramakrishnan, and I. Tsamardinos, Autominder: An intelligent cognitive orthotic system for people with memory impairment, In Robotics and Autonomous Systems, pp. 44: 273–282, (2003).
[6] D. J. Patterson, L. Liao, D. Fox, and H. A. Kautz, Inferring high-level behavior from low-level sensors, In Proc. of Int’l Conf. on Ubiquitous Computing (Ubicomp’03), pp. 2864: 73–89, Seattle, Washington (October, 2003).
[7] L. Bao and S. S. Intille, Activity recognition from user-annotated acceleration data, In Proc. of PERVASIVE 2004, pp. 3001: 1–17, Vienna, Austria (April, 2004).
[8] E. M. Tapia, S. S. Intille, and K. Larson, Activity recognition in the home using simple and ubiquitous sensors, In Proc. of PERVASIVE 2004, pp. 3001: 158–175, Vienna, Austria (April, 2004).
[9] B. Logan, J. Healey, M. Philipose, E. Munguia-Tapia, and S. Intille, A long-term evaluation of sensing modalities for activity recognition, In Proc. of Int’l Conf. on Ubiquitous Computing (Ubicomp’07), pp. 4717: 483–500, Austria (September, 2007).
[10] D. J. Patterson, D. Fox, H. Kautz, and M. Philipose, Fine-grained activity recognition by aggregating abstract object usage, In Proc. IEEE Int’l Symp. Wearable Computers, pp. 44–51, Osaka (October, 2005).
[11] J. Lester, T. Choudhury, and G. Borriello, A practical approach to recognizing physical activities, In Proc. of PERVASIVE 2006, pp. 3968: 1–16, Dublin, Ireland (October, 2006).
[12] J. A. Ward, P. Lukowicz, G. Troester, and T. Starner, Activity recognition of assembly tasks using body-worn microphones and accelerometers, In IEEE Trans. Pattern Analysis and Machine Intelligence, pp. 28(10): 1553–1567 (October, 2006).
[13] M. Philipose, K. P. Fishkin, M. Perkowitz, D. J. Patterson, D. Fox, H. Kautz, and D. Hähnel, Inferring activities from interactions with objects, In IEEE Pervasive Computing, pp. 3(4): 50–57 (October, 2004).
[14] R. Hamid, S. Maddi, A. Johnson, A. Bobick, I. Essa, and C. Isbell, A novel sequence representation for unsupervised analysis of human activities, In J. Artificial Intelligence, pp. 173(14): 1221–1244, (2008).
[15] T. L. M. van Kasteren, A. K. Noulas, G. Englebienne, and B. Kröse, Accurate activity recognition in a home setting, In Proc. of Int’l Conf. on Ubiquitous Computing (Ubicomp’08), pp. 1–9, Seoul, Korea (September, 2008).
[16] T. Huynh, U. Blanke, and B. Schiele, Scalable recognition of daily activities from wearable sensors, In Proc. Int’l Symp. Location and Context-Awareness (LoCA), pp. 50–67, Germany (September, 2007).
[17] D. Wyatt, M. Philipose, and T. Choudhury, Unsupervised activity recognition using automatically mined common sense, In Proc. of the 20th national Conf. on Artificial intelligence (AAAI’05), pp. 1: 21–27, Pittsburgh (July, 2005).
[18] T. Gu, L. Wang, Z. Wu, X. Tao, and J. Lu, A pattern mining approach to sensor-based human activity recognition, In IEEE Transactions on Knowledge and Data Engineering (TKDE), (23 Sept., 2010).
[19] P. P. Palmes, H. K. Pung, T. Gu, W. Xue, and S. Chen, Object relevance weight pattern mining for activity recognition and segmentation, In Elsevier Journal of Pervasive and Mobile Computing (PMC), pp. 6(1): 43–57 (February, 2010).
[20] T. Gu, S. Chen, X. Tao, and J. Lu, An unsupervised approach to activity recognition and segmentation based on object-use fingerprints, In Elsevier Journal of Data & Knowledge Engineering (DKE), pp. 69(6): 533–544 (June, 2010).
[21] D. L. Vail, M. M. Veloso, and J. D. Lafferty, Conditional random fields for activity recognition, In Proc. of Int’l Conf. on Autonomous Agents and Multi-agent Systems (AAMAS), pp. 1–8, (2007).
[22] T. Y. Wu, C. C. Lian, and J. Y. Hsu, Joint recognition of multiple concurrent activities using factorial conditional random fields, In Proc. AAAI Workshop Plan, Activity, and Intent Recognition, California (July, 2007).
[23] S. Gong and T. Xiang, Recognition of group activities using dynamic probabilistic networks, In Proc. of the 9th Int’l Conf. on Computer Vision (ICCV’03), pp. 742–749, Nice, France (October, 2003).
[24] N. Nguyen, H. Bui, and S. Venkatesh, Recognising behaviour of multiple people with hierarchical probabilistic and statistical data association, In Proc of the 17th British Machine Vision Conference (BMVC’06), pp. 1239–1248, Edinburgh, Scotland (September, 2006).
[25] S. Park and M. M. Trivedi, Multi-person interaction and activity analysis: A synergistic track- and body-level analysis framework, In Machine Vision and Applications: Special Issue on Novel Concepts and Challenges for the Generation of Video Surveillance Systems, pp. 18(3): 151–166. Springer Verlag Inc., (2007).
[26] Y. Du, F. Chen, W. Xu, and Y. Li, Recognizing interaction activities using dynamic bayesian network, In Proc. of Int’l Conf. on Pattern Recognition (ICPR’06), pp. 1: 618–621, Hong Kong, China (August, 2006).
[27] D. Wyatt, T. Choudhury, J. 
Bilmes, and H. Kautz, A privacy sensitive approach to modeling multi-person conversations, In Proc. of Int’l Joint Conf. On Artificial Intelligence (IJCAI’07), pp. 1769–1775, India (January, 2007). T. Choudhury and S. Basu, Modeling conversational dynamics as a mixed-memory markov process, In Advances in Neural Information Processing Systems (NIPS’04), pp. 281–288, Cambridge, MA, (2005). MIT Press. C. chun Lian and J. Y. jen Hsu, Chatting activity recognition in social occasions using factorial conditional random fields with iterative classification, In Proceedings of the Twenty-Third AAAI Conf. on Artificial Intelligence (2008), pp. 1814–1815, Chicago, Illinois, (2008). N. Oliver, B. Rosario, and A. Pentland, Statistical modeling of human interactions, In CVPR Workshop on Interpretation of Visual Motion, pp. 39–46, (1998). N. Oliver, B. Rosario, and A. Pentland, A bayesian computer vision system for modeling human interactions, In IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 22: 831– 843, (2000). N. Oliver, A. Gargb, and E. Horvitz, Layered representations for learning and inferring office activity from multiple sensory channels, In Computer Vision and Image Understanding, pp. 96: 163–180, (November 2004). D. Zhang, D. Gatica-Perez, S. Bengio, I. McCowan, and G. Lathoud, Modeling individual and group actions in meetings: A two-layer hmm framework, In Proc. of Computer Vision and Pattern Recognition Workshop (CVPRW’04), p. 7: 117, Washington, DC, USA, (2004). Z. Lin and L. Fu, Multi-user preference model and service provision in a smart home environment, In Proc. of IEEE Int’l Conf. on Automation Science and Engineering (CASE’07), pp. 759–764 (September, 2007).
Multi-user Activity Recognition in a Smart Home
81
[35] P. Rashidi, G. Youngblood, D. Cook, and S. Das, Inhabitant guidance of smart environments, In Proc. of the Int’l Conf. on Human-Computer Interaction, pp. 910–919, (2007). [36] G. Singla, D. Cook, and M. Schmitter-Edgecombe, Recognizing independent and joint activities among multiple residents in smart environments, In J. Ambient Intelligence and Humanized Computing, pp. 1: 57–63. Springer, (2009). [37] T. Gu, L. Wang, Z. Wu, X. Tao, and J. Lu, Mining emerging patterns for recognizing activities of multiple users in pervasive computing, In Proc. of the 6th Int’l Conf. on Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous ’09), pp. 1–10, Toronto, Canada (July 13-16, 2009). [38] U. Fayyad and K. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, In Proc. Int’l Joint Conf. on Artificial Intelligence, pp. 1022–1027, San Francisco, (1993). [39] M. Brand, Coupled hidden markov models for modeling interacting processes, In Technical Report, pp. 1721: 30–42 (November, 1997). [40] M. Brand, Dynamic conditional random fields: Factorized probabilistic models for labeling and segmenting sequence data, In J. Machine Learning Research, pp. 8: 693–723, (2007). [41] A. Ng and M. Jordan, On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes, In Advances in Neural Information Processing Systems, p. 10(1.46): 6208, (2002). [42] J. Nocedal and S. J. Wright, Numerical Optimization, (Springer, New York, 1999). [43] http://code.google.com/p/jahmm/. [44] A. K. McCallum, Mallet: A machine learning for language toolkit, http://mallet.cs. umass.edu/, (2002). [45] D. Wilson and C. Atkeson, Simultaneous tracking and activity recognition (star) using many anonymous, binary sensors, In Proc. Int’l Conf. Pervasive, pp. 62–79, Germany, (2005).
Chapter 4
Smart Environments and Activity Recognition: a Logic-based Approach
Fulvio Mastrogiovanni, Antonello Scalmato, Antonio Sgorbissa, and Renato Zaccaria
Department of Computer, Communication and System Sciences, University of Genova, Genova, Italy
{name.surname}@unige.it

Abstract
This paper introduces a framework for enabling context-aware behaviors in smart environment applications, with a special emphasis on smart homes and similar scenarios. In particular, an ontology-based architecture is described that allows system designers to specify non-trivial situations the system must be able to detect on the basis of available sensory data. Relevant situations may include activities and events that could be prolonged over long periods of time. Therefore, the ontology encodes temporal operators that, once applied to sensory information, make it possible to efficiently recognize and correlate different human activities and other events whose temporal relationships are contextually important. Special emphasis is devoted to the actual representation and recognition of temporally distributed situations. The proof of concept is validated through a thoroughly described example of system usage.
4.1 Introduction
Nowadays, the role that context-aware systems are going to play in our society is clear: it is commonly accepted that rich and expressive context models must be designed and realized in order to improve the actual behaviour and to extend the capabilities offered by Ambient Intelligence (AmI in short) systems, such as smart homes, intelligent vehicles or even smart cities. Although the very notion of “context” is particularly elusive, the goal of a context-aware system is well defined. Referring to the work described in (Dourish, 2004) [12], this goal can be summarized in the following question: how can computation be made sensitive and responsive to the setting (both physical and social) in which it is harnessed and used?
Common examples are Personal Digital Assistants or Smart Phones that adaptively display information depending on both the surrounding environment and current user activities, such as a to-buy list of goods whose items are selectable or not depending on their availability in the store the user is currently visiting, or a to-do list of activities whose elements pop up depending on the current schedule and approaching deadlines. In order to offer these kinds of services, human behaviour must be carefully detected and understood.

The search for expressive and effective context models is closely related to cognition modelling and understanding, especially in humans. As has been shown by philosophers, psychologists and cognitive scientists (refer to the work described in Dourish, 2004 [12] and Waldmann, 2007 [37], and the references therein), human activity does not necessarily follow long-term plans: on the contrary, it is very “situation dependent” and opportunistic, whereas long-term plans can usually be described as a dynamical sequence of short-term activities, each one set in a specific context. Early work in this field focused on “the big picture”, which treats contexts as well-delimited and stable pieces of information. On the other hand, it seems intuitive that:
• Contexts are not relevant per se, but especially when related to other “pieces” of information that may originate as a consequence of user activity, specifically when distributed over prolonged periods of time.
• Contexts are not static and monolithic entities: on the contrary, they are based on many constituent parts (depending on activities and roles of the humans involved) and can contribute to other contexts with possibly significant temporal relationships.
To deal with these issues, different context models have been investigated, designed and effectively used in real-world scenarios, with mixed success.
On the basis of these considerations, the work reported in (Krummenacher & Strang, 2007) [20] outlines a number of prerequisites, which are described as follows.

Applicability. The context model must conform to existing environments and different domains of human activity. This requires the definition of common infrastructures and context representation paradigms enforcing scalability and modularity.

Comparability. Proper metrics and classification procedures must be added to the context model in order to compare heterogeneous entities and to analyse sensory information. Furthermore, since context models are aimed at associating a “meaning” to a collection of sensory information, these metrics must span both numerical and symbolic domains.

Traceability. The model must be aware of how incoming data are manipulated before being handled by the model itself. This requires a tight integration between information processing and knowledge representation.

History. Knowledge of past events and activities must be considered in context definitions, along with their relationships with current situations. In particular, it is useful to define very expressive relationships among activities distributed over long periods of time, using formalisms that allow temporal contexts to be recognized efficiently.

Quality. Incoming information must be associated with meaningful metrics about quality and reliability.

Satisfiability. The context model must provide high levels of conformance between what is represented and inferred by the model itself and actual situations in the real world, which is a sort of symbol grounding problem (Harnad, 1990) [15].

Inference. The context model must reason about abstract context representations in order to make sense of incoming sensory information in real-time.

From a theoretical perspective, a high-level symbolic framework for describing such a model represents a feasible solution, specifically when dealing with temporal patterns of data: as a matter of fact, symbolic approaches to context modelling have received much attention in the past decade. From a practical perspective, these frameworks usually adopt formal ontologies, which constitute a representation of what is relevant in a given domain, thereby addressing the shared understanding issue (Strang, Linnhoff-Popien, & Frank, 2003) [33] that characterizes systems where human-environment interaction is important.

This paper proposes a temporal context model that is aimed at integrating the benefits of ontologies with temporal reasoning frameworks to enforce context-awareness in AmI systems. Specifically, the model allows the representation of non-trivial patterns of interleaved events and human activities that must be searched for in sensory data (i.e., “symbolic features”), accounting for temporal patterns distributed over long periods of time.
Sensory data are gathered from an AmI infrastructure that provides the ontology with information originating from distributed sources (Mastrogiovanni, Sgorbissa, & Zaccaria, 2007) [25], which are functionally anchored to symbolic representations.

The paper is organized as follows. Section 4.2 discusses relevant literature. Next, the context model is introduced in its many facets in Section 4.3, specifically detailing how it is integrated with temporal reasoning operators. Section 4.4 discusses the associated context recognition process, whereas Section 4.5 details a specific example. Conclusions follow.
4.2 Related Work
The problem of representing and effectively reasoning upon temporal relationships among detected events in smart environments, and in AmI scenarios in general, has not been exhaustively addressed in the AmI literature, specifically with respect to the principled integration of suitable formalisms to encode, maintain and process temporal information (Gottfried, Guesgen, & Hübner, 2006) [14]. As a matter of fact, different approaches adopt very different frameworks that are not widely accepted by the overall community (Allen, 1983) [1] (Morchen, 2006) [29]. Furthermore, given the current state of the art in temporal knowledge representation (Lutz, Wolter, & Zakharyashev, 2008) [23], it is not currently possible to provide designers of AmI systems with guidelines and best practices to formally define and easily deploy smart environments (Rugnone et al., 2007) [31]. This is due to the fact that it is not yet clear how to represent and manipulate detected temporal events occurring in a smart environment, nor how to integrate this representation with events and human activities prolonged over extended periods of time.

As a consequence, during the past decade, research directions have been rather scattered and not well focused on a holistic view of AmI systems. In particular, emphasis has been placed either on infrastructural issues or on the use of dedicated formal representation tools. Examples of the first class include the specific role played by information fusion in determining which sensing technology to adopt, the identification of suitable data formats to facilitate the representation of sensory information, and algorithms for the recognition of a priori defined sequences of events and activities originating from sensory data.
On the other hand, formal representation tools include Allen's Interval Algebra (Allen, 1983) [1], the Time Point Algebra (Vilain & Kautz, 1989) [36], the so-called Calendar Algebra (Bettini, Mascetti, & Sean Wang, 2007) [5] and Linear Temporal Logic (Emerson & Halpern, 1986) [13].

Among the problems related to infrastructure, many researchers still debate the interplay between the technology used for distributed sensing and the way the resulting information should be represented in order to actually ground reasoning processes. In practice, different approaches are pursued, which largely depend on both the authors' background and the particular “school of thought” they adhere to. On one hand, AmI architectures completely based on camera information have been discussed (Takahashi et al., 2007) [35], which are mainly focused on information processing rather than on the principled integration of different sensory modalities. On the other hand, the work described in (Mastrogiovanni, Sgorbissa, & Zaccaria, 2007) [25] (Monekosso & Remagnino,
2007) [28] stresses the importance of using sensors providing simple (i.e., binary) information, and the associated logic-based information fusion mechanisms. This paves the way for the introduction of more advanced techniques for assessing distributed knowledge. However, temporal knowledge is not explicitly maintained.

The Context Toolkit (Salber, Dey, & Abowd, 1999) [32] (Dey, 2001) [11] is a widely used architecture for quickly deploying smart environments. In particular, it provides a formally defined modelling language that is based on three basic elements, namely context widget, context interpreter and context aggregator. Widgets correspond to real-world artefacts (i.e., distributed sensing devices and associated services) devoted to acquiring contextual knowledge and making it available to applications in a way that is transparent to them. Interpreters correspond to modules meant for reasoning upon contextual knowledge: they receive different contexts as input and produce new contextual information. Finally, aggregators correspond to modules in charge of combining heterogeneous contextual knowledge to depict a broader view of its meaning. In spite of the powerful capabilities in managing distributed sources of information, the explicit use of temporal information is not addressed.

The Context Broker Architecture (CoBrA in short) and the associated ontology (Chen, Finin, & Joshi, 2003) [8] (Chen, Perich, Finin, & Joshi, 2004) [9] originate from well-known requirements of distributed computing systems: (i) contextual knowledge must be shared in spite of the dynamic nature of distributed systems, thus dealing with temporally distributed situations; (ii) intelligent autonomous agents must be provided with a well-defined declarative semantics allowing them to reason upon knowledge; (iii) interoperability among different heterogeneous representations must be enforced.
All these aspects are mediated by a broker, an entity in charge of: (i) maintaining a shared language used to model situations; (ii) acquiring data from heterogeneous information sources; (iii) establishing information delivery policies to share contextual knowledge. Unfortunately, only very simple scenarios have been modelled (e.g., meeting rooms and similar environments), which do not clearly address how temporal situations are maintained within the system.

As can be noticed, the emphasis is more on infrastructure and architecture for the efficient, scalable and robust treatment of sensory data than on integrating architectural aspects with formalisms allowing the processing of temporal relationships among the assessed events and human activities.

The explicit introduction of temporal reasoning in smart environments can be considered a two-step process. The first step is to introduce a knowledge representation layer able to
efficiently assess semantically relevant information originating from the underlying architecture over time, whereas the second step consists in integrating temporal reasoning techniques on the basis of the chosen representation formalism. In the specific case of smart environments, this is realized by encoding a suitable context model (and an associated context recognition process), and then by adding another representation layer that is devoted to extracting temporally related patterns of data from the model itself. Among the possible solutions, both probabilistic and logic-based frameworks have been adopted.

As an example of probabilistic approaches dealing with temporal information, Markov networks have been used in (Liao, Fox, & Kautz, 2005) [22] to represent events and human activities, specifically based on the notion of location: the system keeps track of the sequence of visited locations in order to infer human activities, which are described by chains in the network. Although the approach exhibits good computational performance, the expressivity is rather limited, since numerical information can hardly be provided with semantics (Harnad, 1990) [15]. This drawback (which is common to all the models based on Markov networks) is addressed by purposely associating nodes in the network with specific labels, e.g., the name of a visited location, thereby being able to associate simple descriptions with detected paths. Nonetheless, the major limits of the approach lie in the limited significance that can be associated with labeled data, and in the inability to describe complex situations in a hierarchical fashion.

On the other hand, several approaches have been proposed that manage temporal knowledge using logic-based formalisms, such as ontologies. The importance of temporal relationships in activity recognition is formally and philosophically discussed in (Chalmers, 2004) [7].
Stemming from the work of Martin Heidegger, and adopting human activity as the inspiring conceptual paradigm, the author argues that situations (and in particular the actual contexts situations are built of) must combine both temporal and subjective patterns of interaction with different artefacts as well as a representation of the subjective experience gained during the interaction itself. This forms the basis of the so-called Activity-based Computing paradigm (Muehlenbrock, Brdiczka, Snowdon, & Meunier, 2004) [30] (Mastrogiovanni, Sgorbissa, & Zaccaria, 2008) [26]: different properties of an entity are obtained by filling particular activity roles and activity relationships among different contextual elements. This perspective has major consequences on practical aspects of situation representation and interpretation: designers of context-aware ubiquitous systems do not provide computational representations with either meaning or conceptualization; on the contrary, they can only influence and constrain it. In spite of the appealing characteristics of the model, to date no practical implementation has been reported in the literature.

The smart home architecture presented in (Augusto et al., 2008) [2] (Aztiria et al., 2008) [3] adopts a rule-based system that exploits the ECA (Event-Condition-Action) model to determine a proper action, given a particular event and provided that the proper set of conditions holds. Each rule antecedent consists of an event type and an expression of literals, each one expressing a specific condition that must hold for the rule to fire. As soon as interesting events are detected over time, an algorithm iteratively checks whether the corresponding conditions hold: these may hold either at the same time or at different time instants. In the second case, temporal operators define the truth table of the full condition. However, two pitfalls must be accounted for: the former is the increased computational complexity that is associated with the rule verification algorithm, whereas the latter is related to the adopted “flat” model, which is not able to easily consider hierarchies of situations and structured events.

A formally defined language aimed at integrating definition and recognition of activities has been discussed in (Mastrogiovanni et al., 2008) [24], which adopts an ontology to directly represent distributed sensory data in a hierarchical structure. In particular, a context model that is based on the three base constructs of predicate, context and situation is encoded within the ontology. These base constructs are composed using both relational and temporal operators that define the semantics associated with the whole structure. However, the temporal expressivity is limited to simple forms only, whereas the uncertainty associated with sensory data (and its impact on temporal relationships) is not properly addressed.
The work presented in (Henricksen, Indulska, & Rakotonirainy, 2002) [16] describes a standard Entity-Relationship language where relationships between entities can be modified at run-time according to parameters related to both temporal characteristics and information uncertainty. Therefore, relationships can be either static or dynamic, thereby leading to situations that may exhibit different parallel representations. Unfortunately, it seems very difficult to provide an AmI system with proper dynamics operating on relationships in order to reflect the true state of the environment. For this reason, the language has been applied only to very simple toy scenarios.

The Event Calculus has been integrated within an ontology in (Chen et al., 2008) [10], where a cognitive model based on the notion of predicate is introduced. Each predicate is directly related to a number of facts occurring in the real world, each one characterized by
temporal information, such as the occurrence or the initial and final time instants. Predicates contribute to axioms, which encode general-purpose rules about events and activities. Unfortunately, the complexity of the Event Calculus limits the experimental evaluation to restricted scenarios only.

Adopting the terminology of Allen's Interval Algebra, the work presented in (Jakkula & Cook, 2007) [18] (Jakkula & Cook, 2007) [19] describes an activity recognition system that is based on the pairwise checking of detected events and activities through temporal operators. Furthermore, this framework is extended to encompass machine learning algorithms that are used in order to predict future situations on the basis of current knowledge. Again, the checking process of temporal relationships is quite computationally expensive.

As a remark, the focus of these systems has been on the effective representation of context models in intelligent systems. However, it is worth noting that, on the one hand, knowledge representation frameworks are seldom well integrated with the underlying infrastructure, whereas, on the other hand, the temporal reasoning mechanisms encoded so far are rather limited.
4.3 Representation of Temporal Contexts
The work described in this paper refers to standard knowledge representation systems, and in particular to ontologies described using Description Logics (DL in short), which are a formal conceptualization of a set of elements belonging to a specific domain and the relationships between those elements (Baader et al., 2003) [4]. Conceptualized elements are called concepts. Within an ontology $\Sigma$, concepts are represented through symbols, and described using descriptions. Formally,
\[ \Sigma \doteq \{\sigma_k\}, \qquad k = 1, \ldots, |\sigma_k|. \tag{4.1} \]
Each symbol $\sigma_k$ in Equation (4.1) is unambiguously associated with a description $D_k$ that defines it using (combinations of) other descriptions. In SHOIN ontologies (Kutz, 2004) [21], which are the target of this discussion, it is possible to assume the availability of common logic connectors, such as definition $\equiv$, and $\sqcap$, or $\sqcup$, and not $\neg$, whose meaning is intuitive (a formal specification of both syntax and semantics is beyond the scope of this paper). We collectively define them as host operators, as follows:
\[ \mathcal{H} \doteq \{\equiv, \sqcap, \sqcup, \neg\}. \tag{4.2} \]
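Symbols in SHOIN ontologies are always formalized using composite descriptions that use connectors in $\mathcal{H}$. Adopting an infix notation, descriptions can be thought of as:
\[ D_k \equiv D_{k,1} \odot \cdots \odot D_{k,j} \odot \cdots \odot D_{k,n}, \qquad \odot \in \mathcal{H} \setminus \{\equiv\}. \tag{4.3} \]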
Specifically, each constituent description $D_{k,j}$ is in one of the following forms:
• $\neg D_k$, $D_{k_1} \sqcap D_{k_2}$, or $D_{k_1} \sqcup D_{k_2}$, which make it possible to iteratively compose abstract descriptions on the basis of simpler low-level descriptions.
• $\exists r.D_k$ or $\forall r.D_k$, where $r$ is a role that is filled either by other composite descriptions, or by numerical, ordered or non-ordered symbols whose semantics is grounded with respect to $\Sigma$: in this case, roles are the essential formal means by which to incorporate actual sensory data within the ontology.
• $\leq n\, r.D_k$ or $\geq n\, r.D_k$, which restrict the number of possible filling descriptions for the role $r$ to at most and at least $n$ fillers, respectively.

Furthermore, a simple and efficient inference mechanism, called subsumption (and referred to as $\sqsubseteq$), is commonly assumed (Horrocks & Sattler, 2007) [17] (McGuinnes & Borgida, 1995) [27]. Subsumption can be considered a binary operator acting upon two descriptions $D_1$ and $D_2$. However, it is usually exploited in its query form subs?[$D_1$, $D_2$], which returns true if $D_1$ is more general than $D_2$, and false otherwise. Intuitively, subs?[$D_1$, $D_2$] holds if and only if $D_1$ holds whenever $D_2$ holds. As a practical side effect, if we consider a description $D_k$ as a collection of constituent descriptions $D_{k,j}$, any aggregation of $D_{k,j}$ subsumes $D_k$ in principle.
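The remark that any aggregation of constituent descriptions subsumes the full description can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a description is a plain conjunction of atomic constituents, so roles, negation, disjunction and number restrictions are ignored. The names `Description` and `subsumes`, as well as the constituent labels, are hypothetical and do not correspond to any DL reasoner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    """A description reduced to a conjunction of atomic constituents (assumption)."""
    constituents: frozenset

    @staticmethod
    def of(*atoms: str) -> "Description":
        return Description(frozenset(atoms))


def subsumes(d1: Description, d2: Description) -> bool:
    """Query form subs?[D1, D2]: True if D1 is at least as general as D2.

    For pure conjunctions, D1 holds whenever D2 holds exactly when every
    constituent required by D1 is also required by D2.
    """
    return d1.constituents <= d2.constituents


# Hypothetical constituents of a "cooking" description.
cooking = Description.of("in_kitchen", "stove_on", "fridge_opened")
partial = Description.of("in_kitchen", "stove_on")

print(subsumes(partial, cooking))  # an aggregation of constituents vs. the whole
print(subsumes(cooking, partial))  # the whole vs. an aggregation of constituents
```

In this simplified setting subsumption reduces to a subset test, which is why it is cheap to evaluate. A full SHOIN reasoner has to handle roles and the remaining connectors as well.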
Fig. 4.1 The proposed model for context assessment.
According to this formalization, a satisfied symbol $\Sigma \models \sigma_k(\sigma_l)$ is a grounded symbol (i.e., with no unspecified role) originating from the closure of roles in $\sigma_k$ through a proper variable assignment $\alpha$: each role $r$ in the definition of $\sigma_k$ is properly filled with the correct number and type of constituent descriptions. In a more formal perspective, given a collection of symbols $\sigma_l$, each one grounded with respect to an object $\sigma$ described in the ontology $\Sigma$, and given a variable assignment $\alpha$ under a specific interpretation $\mathcal{I}$ at a given time instant $\tau$, then the proposed model for context assessment can be seen as a collection of satisfiability procedures
\[ (\Sigma, \mathcal{I}, \alpha_\tau) \models \sigma_k(\sigma_l), \qquad l = 1, \ldots, |\sigma_l|. \]
In particular, the proposed model for context assessment is based on a specific definition of the structure of $\Sigma$: part of the overall ontology is devoted to representing the primitives that build up the context model. Although the proposed model can manage contexts with many levels of nested symbols (thereby making it possible to represent specific as well as more general activities, along with their relationships, see Figure 4.1), it uses three primitives that are hierarchically organized as follows:
• Predicate, a concept that is aimed at explicitly representing information about sensory data, and referred to as $\mathcal{P}$. In particular, for each type of information that is represented within $\Sigma$, a specific child concept of $\mathcal{P}$ is modeled: as a consequence, a set of symbols $\sigma_k$ is defined such that $\{\sigma_k \mid \Sigma \models \mathcal{P}(\sigma_k)\}$ (the two forms $\mathcal{P}(\sigma_k)$ and $\sigma_k \sqsubseteq \mathcal{P}$ can be used interchangeably). A symbol $\sigma_k$ such that $\Sigma \models \mathcal{P}(\sigma_k)$ is defined with a description $D_k$ that is based exclusively on numerical, ordered or non-ordered sensory data.
• Context, a concept whose purpose is to aggregate descriptions of both predicates and other contexts expressing information about the same entity (i.e., the same person, or the same object involved in a human-environment interaction), and referred to as $\mathcal{C}$. In particular, for each information aggregate that is to be represented, a child concept of Context is introduced in $\Sigma$. These are members of a set in the form $\{\sigma_k \mid \Sigma \models \mathcal{C}(\sigma_k)\}$, and correspond to descriptions based on both composite descriptions, and numerical, ordered and non-ordered sensory data. In our model, there are many Context layers, where the first is directly related to predicates.
• Situation, a concept that combines different contexts related to different entities to consider them as a whole, and is referred to as $\mathcal{S}$. Again, for each interesting situation, a child concept is introduced such that $\{\sigma_k \mid \Sigma \models \mathcal{S}(\sigma_k)\}$, which takes exclusively composite descriptions into account.
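The three-layer organization of Predicate, Context and Situation can be sketched in a few lines of Python. This is only an illustration under strong assumptions: predicates are boolean tests over a dictionary of sensor readings, and contexts and situations are satisfied when all their parts are satisfied (i.e., only conjunction is modeled). All class names, sensor keys and thresholds are hypothetical and are not taken from the ontology described in this chapter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Readings = Dict[str, float]  # hypothetical sensor name -> latest value


@dataclass
class Predicate:
    """Lowest layer: a description based exclusively on sensory data."""
    name: str
    test: Callable[[Readings], bool]

    def satisfied(self, readings: Readings) -> bool:
        return self.test(readings)


@dataclass
class Context:
    """Aggregates predicates and other contexts about the same entity."""
    name: str
    entity: str
    parts: List[Union[Predicate, "Context"]]

    def satisfied(self, readings: Readings) -> bool:
        # Assumption: a context is a conjunction of its parts.
        return all(part.satisfied(readings) for part in self.parts)


@dataclass
class Situation:
    """Combines contexts related to different entities."""
    name: str
    contexts: List[Context]

    def satisfied(self, readings: Readings) -> bool:
        return all(ctx.satisfied(readings) for ctx in self.contexts)


# Hypothetical example: a user standing in the kitchen while the stove is on.
in_kitchen = Predicate("in_kitchen", lambda r: r.get("kitchen_pir", 0) > 0)
stove_on = Predicate("stove_on", lambda r: r.get("stove_power_w", 0) > 50)
user_ctx = Context("user_in_kitchen", "user", [in_kitchen])
stove_ctx = Context("stove_in_use", "stove", [stove_on])
cooking = Situation("cooking", [user_ctx, stove_ctx])

readings = {"kitchen_pir": 1, "stove_power_w": 1200}
print(cooking.satisfied(readings))
```

The sketch mirrors the constraint that predicates touch sensory data directly, contexts group information about a single entity, and situations only combine contexts. It does not model nested Context layers beyond what the recursive `parts` field allows, nor any temporal information.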
Since each symbol in the ontology either contributes to or ultimately represents real-world events the system can recognize, this structure over $\Sigma$ implicitly defines a computational process able to extract Predicate symbols from sensory data and to assess semantically relevant information from combinations of predicates into more complex descriptions, respectively Context and Situation descriptions. In principle, what we expect from the system is to be able to generate grounded descriptions $D_l$ from sensory data corresponding to those of symbols $\Sigma \models \mathcal{P}(\sigma_k)$, to iteratively aggregate them in order to satisfy symbols in $\Sigma \models \mathcal{C}(\sigma_k)$, and then to combine satisfied descriptions to check subsumption with respect to symbols in $\Sigma \models \mathcal{S}(\sigma_k)$. This process is described in detail in Section 4.4.

In order to take temporal dependencies among Predicate, Context and Situation symbols into account (thereby reflecting temporal relationships holding between the corresponding real-world events), the ontology is provided with a collection of temporal operators, which require time dependence to be introduced explicitly and the context assessment satisfiability problem introduced above to be reformulated for symbols in $\mathcal{P}$, $\mathcal{C}$ and $\mathcal{S}$. In particular, for a given time instant $\tau$, symbols $\sigma_k$, $k = 1, \ldots, |\mathcal{P}| + |\mathcal{C}| + |\mathcal{S}|$, are satisfied by $\sigma_l$ in $\tau$ (and we write $\sigma_{k,\tau}$) if there is an interpretation $\mathcal{I}$ and a variable assignment $\alpha_\tau$ such that
\[ (\Sigma, \mathcal{I}, \alpha_\tau) \models \sigma_{k,\tau}(\sigma_l), \qquad l = 1, \ldots, |\sigma_l|. \]
This new formalization allows us to introduce a family of temporal operators, which can be roughly divided into two classes: the former, $T_t \doteq \{\delta, \lambda\}$, is aimed at monitoring the evolution of the truth value of a single term description, whereas the latter, $T_r \doteq \{\prec_t, \succ_t, \{d\}, \{o\}, \{m\}, \{=\}, \{s\}, \{f\}\}$, comprises the well-known operators formalized in Allen's Interval Algebra, and is therefore aimed at the pairwise comparison of different descriptions. Collectively, the set of temporal operators $T$ is defined such that $T \doteq T_t \cup T_r$. Furthermore, time is modeled as an ordered sequence of time instants. In order to allow subsumption to reason upon temporal relationships, it is necessary to introduce a family of common mathematical operators
\[ \omega \in \{<, \leq, =, \geq, >\}, \]
justified as syntactic sugar, and to use them to encode the previously introduced temporal operators into $\Sigma$. To this end, symbols such that $\Sigma \models P(\sigma_k)$, $\Sigma \models \mathcal{C}(\sigma_k)$ and $\Sigma \models \mathcal{S}(\sigma_k)$ are characterized by descriptions $D_k$ accounting for two specific roles, namely starts_at and ends_at, which are filled with instances of time instants and, considered together, specify the temporal interval when the corresponding grounded descriptions have been last satisfied by a given variable assignment. In particular, for a given symbol $\sigma_l$, starts_at and ends_at are filled with two time instants $\tau_l^-$ and $\tau_l^+$ if, for each $\tau \in [\tau_l^-, \tau_l^+]$, $\alpha_\tau$ is such that $\hat{\sigma}_l$ is true. Operators in $T_t$ are defined in the following paragraphs. Specifically, starts_at and ends_at are usually encoded in a specific symbol $\sigma_{PERIOD}$, which is usually associated with Predicate, Context and Situation concepts.
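The role pair starts_at/ends_at can be pictured with a small Python sketch. It is only an illustration under stated assumptions: time instants are modeled as consecutive integers, and the names `Period` and `update_period` are hypothetical, not part of the described system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Period:
    """Interval in which a grounded description was last satisfied."""
    starts_at: int  # tau_l^-: first instant of the interval
    ends_at: int    # tau_l^+: last instant of the interval


def update_period(period: Optional[Period], tau: int, satisfied: bool) -> Optional[Period]:
    """Update the stored interval with the truth value observed at instant tau."""
    if not satisfied:
        # Nothing changes: the roles keep the last satisfied interval.
        return period
    if period is not None and tau == period.ends_at + 1:
        # The description is still true: extend the current interval.
        return Period(period.starts_at, tau)
    # First observation, or true again after a gap: start a new interval.
    return Period(tau, tau)
```

Keeping only the last satisfied interval mirrors the statement that the two roles describe when the grounded description has been last satisfied.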
Fig. 4.2 Operators Table.
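Change: a unary operator, referred to as $\delta$, which produces a description $D_\delta \equiv \delta(D)$. Given $\mathcal{I}$, two time instants $\tau_1$ and $\tau_2$, where $\tau_1 < \tau_2$, and two variable assignments $\alpha_{\tau_1}$ and $\alpha_{\tau_2}$, $\delta_{\tau_2}(\hat{D}_{\tau_2})$ is satisfied if and only if $\hat{D}_{\tau_2} = \neg\hat{D}_{\tau_1}$.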
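As a rough illustration of the Change operator, the following Python sketch flags the instants at which a truth value flips. It assumes that the truth values are sampled at consecutive time instants and that the first sample has no predecessor; the function name `change` is hypothetical.

```python
from typing import List, Sequence


def change(truth_values: Sequence[bool]) -> List[bool]:
    """Return, for each instant, whether the truth value differs from the previous one."""
    if not truth_values:
        return []
    # The first instant has no predecessor, so no change can be detected there.
    flips = [cur != prev for prev, cur in zip(truth_values, truth_values[1:])]
    return [False] + flips


is_light_on = [True, True, False, False]
print(change(is_light_on))  # [False, False, True, False]
```

In the example, the operator is satisfied only at the third instant, when the light goes from on to off.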
Duration: a family of binary operators, referred to as $\lambda_\omega$, which produces a description $D_\lambda \equiv \lambda_\omega(D_1, D_2)$. Given $\mathcal{I}$, four time instants $\tau_1^-$, $\tau_1^+$, $\tau_2^-$ and $\tau_2^+$, and four variable assignments $\alpha_{\tau_1^-}$, $\alpha_{\tau_1^+}$, $\alpha_{\tau_2^-}$ and $\alpha_{\tau_2^+}$, with $\alpha_{\tau_1^-} < \alpha_{\tau_1^+} < \alpha_{\tau_2^-} < \alpha_{\tau_2^+}$, for each $\tau$ such that $\alpha_{\tau_2^+} < \tau$, $\lambda_{\omega,\tau}(\hat{D}_{1,\tau}, \hat{D}_{2,\tau})$ is satisfied if and only if $(\tau_1^+ - \tau_1^-)\,\omega\,(\tau_2^+ - \tau_2^-)$ is true.
Next, operators in $T_r$ are defined in the following paragraphs (see Figure 4.2).
Before$_t$: a binary operator, referred to as $\prec_t$, which produces a description $D_{\prec_t} \equiv \prec_t(D_1, D_2)$, i.e., an event occurs $t$ time instants before another one. Given $\mathcal{I}$ and a sequence of variable assignments $\{\alpha_\tau\}$, $\hat{D}_1 \prec_t \hat{D}_2$ is satisfied if and only if $D_1$, $D_2$ and $\tau_1^+ + t > \tau_2^-$ hold. When $t$ is not specified there are no constraints over the period.
After$_t$: a binary operator, referred to as $\succ_t$, which produces a description $D_{\succ_t} \equiv \succ_t(D_1, D_2)$, i.e., the opposite of the previous case. Given $\mathcal{I}$ and a sequence of variable assignments $\{\alpha_\tau\}$, $\hat{D}_1 \succ_t \hat{D}_2$ is satisfied if and only if $D_1$, $D_2$ and $\tau_1^- - t < \tau_2^+$ hold. When $t$ is not specified there are no constraints over the period.
During: a binary operator, referred to as $\{d\}$, which produces a description $D_d \equiv \{d\}(D_1, D_2)$, i.e., two events are parallel and one is entirely contained within the other one's span. Given $\mathcal{I}$ and a sequence of variable assignments $\{\alpha_\tau\}$, $\hat{D}_1 \{d\} \hat{D}_2$ is satisfied if and only if $D_1$, $D_2$ and $(\tau_1^- > \tau_2^-) \wedge (\tau_1^+ < \tau_2^+)$ hold.
Overlaps: a binary operator, referred to as $\{o\}$, which produces a description $D_o \equiv \{o\}(D_1, D_2)$, i.e., two events are partially parallel. Given $\mathcal{I}$ and a sequence of variable assignments $\{\alpha_\tau\}$, $\hat{D}_1 \{o\} \hat{D}_2$ is satisfied if and only if $D_1$, $D_2$ and $(\tau_1^- < \tau_2^-) \wedge (\tau_2^- < \tau_1^+) \wedge (\tau_1^+ < \tau_2^+)$ hold.
Meets: a binary operator, referred to as $\{m\}$, which produces a description $D_m \equiv \{m\}(D_1, D_2)$, i.e., one event starts exactly when the other one ends. Given $\mathcal{I}$ and a sequence of variable assignments $\{\alpha_\tau\}$, $\hat{D}_1 \{m\} \hat{D}_2$ is satisfied if and only if $D_1$, $D_2$ and $\tau_1^+ = \tau_2^-$ hold.
Equals: a binary operator, referred to as $\{=\}$, which produces a description $D_= \equiv \{=\}(D_1, D_2)$, i.e., two events last exactly for the same period of time. Given $\mathcal{I}$ and a sequence of variable
assignments $\{\alpha_\tau\}$, $\hat{D}_1 \{=\} \hat{D}_2$ is satisfied if and only if $D_1$, $D_2$ and $(\tau_1^- = \tau_2^-) \wedge (\tau_1^+ = \tau_2^+)$ hold.
Starts: a binary operator, referred to as $\{s\}$, which produces a description $D_s \equiv \{s\}(D_1, D_2)$, i.e., two events start exactly at the same time. Given $\mathcal{I}$ and a sequence of variable assignments $\{\alpha_\tau\}$, $\hat{D}_1 \{s\} \hat{D}_2$ is satisfied if and only if $D_1$, $D_2$ and $(\tau_1^- = \tau_2^-) \wedge (\tau_1^+ < \tau_2^+)$ hold.
Finishes: a binary operator, referred to as $\{f\}$, which produces a description $D_f \equiv \{f\}(D_1, D_2)$, i.e., two events end exactly at the same time. Given $\mathcal{I}$ and a sequence of variable assignments $\{\alpha_\tau\}$, $\hat{D}_1 \{f\} \hat{D}_2$ is satisfied if and only if $D_1$, $D_2$ and $(\tau_1^- < \tau_2^-) \wedge (\tau_1^+ = \tau_2^+)$ hold.
Collectively, operators in $T$ are used during the context assessment phase to recognize temporal relationships between descriptions. In particular, grounded descriptions (originated from sensory data) are evaluated in order to establish temporal relationships by inspecting the values of the roles starts_at and ends_at through subsumption with respect to operators in $T$. The evaluation of temporal operators is necessarily deferred in time, since they require several events to occur in sequence.
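To make the interval comparisons concrete, the following Python sketch evaluates a subset of the operators on explicit intervals. It is a simplified illustration, not the subsumption-based mechanism used within the ontology: the `Interval` class, the function names and the sample values (minutes since midnight) are assumptions, Before additionally requires the first interval to end before the second one starts, and Overlaps follows the standard definition of Allen's Interval Algebra.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Interval:
    start: int  # tau^-, filler of starts_at (minutes since midnight)
    end: int    # tau^+, filler of ends_at


def before(a: Interval, b: Interval, t: Optional[int] = None) -> bool:
    # Assumption: a ends before b starts; if t is given, b must also start
    # less than t instants after a ends (tau_1^+ + t > tau_2^-).
    return a.end < b.start and (t is None or a.end + t > b.start)


def after(a: Interval, b: Interval, t: Optional[int] = None) -> bool:
    # Mirror image of before: a follows b.
    return before(b, a, t)


def during(a: Interval, b: Interval) -> bool:
    return a.start > b.start and a.end < b.end


def overlaps(a: Interval, b: Interval) -> bool:
    return a.start < b.start < a.end < b.end


def meets(a: Interval, b: Interval) -> bool:
    return a.end == b.start


def equals(a: Interval, b: Interval) -> bool:
    return a.start == b.start and a.end == b.end


def duration(a: Interval, b: Interval, omega: Callable[[int, int], bool]) -> bool:
    # lambda_omega: compare the two interval lengths with a comparison operator.
    return omega(a.end - a.start, b.end - b.start)


# Illustrative intervals (hypothetical values).
light_on, light_off = Interval(0, 24), Interval(24, 118)
chair, tv, bed = Interval(33, 99), Interval(37, 98), Interval(118, 480)
print(meets(light_on, light_off))         # True
print(during(tv, chair))                  # True
print(after(bed, tv, t=30))               # True
print(duration(tv, chair, operator.lt))   # True
```

Passing the comparison as a callable keeps the Duration family $\lambda_\omega$ in a single function, one instance per operator $\omega$.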
Fig. 4.3 The context assessment process.
4.4 Assessment of Temporal Contexts
Given an ontology-based context model defined as in the previous Section, temporal context assessment is realized by iteratively aggregating descriptions in order to satisfy elements in $\mathcal{C}$ and $\mathcal{S}$ (see Figure 4.3 on the bottom). At every time instant $\tau$, and as a consequence of a variable assignment $\alpha_\tau$ (i.e., when updated sensory data are available), a classification process is carried out over grounded symbols $\Sigma \models P(\hat{\sigma}_l)$, thereby modifying their truth value. Satisfied symbols $\Sigma \models P(\sigma_k)$ are then described by grounded descriptions $\hat{D}^p_l$. It is worth noting that grounded descriptions related to symbols $\Sigma \models P(\sigma_k)$ that have been satisfied in previous time instants (within a predefined temporal window) are represented as well: the set of all the grounded predicative elements satisfied at the time instant $\tau$ is defined as
\[ \hat{D}^p_\tau \doteq \{\hat{D}^p_l \mid \Sigma \models P(\hat{D}^p_l)\}, \quad l = 1, \ldots, |\hat{D}^p_l|. \]
A procedure $\mathcal{T}$ is defined such that, given a set of grounded descriptions of satisfied symbols $\Sigma \models P(\sigma_k)$ or $\Sigma \models \mathcal{C}(\sigma_k)$ (e.g., $\hat{D}^p_\tau$), it returns the set of grounded descriptions of satisfied temporal operators that hold between them in $\tau$. Specifically, if the set of temporal relationships holding between grounded symbols $\Sigma \models P(\hat{\sigma}_l)$ is defined as
\[ \hat{D}^{t(p)}_\tau \doteq \{\hat{D}^{t(p)}_m \mid \Sigma \models \theta(\hat{D}^{t(p)}_m),\ \theta \in T\}, \]
then $\mathcal{T}$ is defined such that $\hat{D}^{t(p)}_\tau = \mathcal{T}(\hat{D}^p_\tau)$. If we omit the dependency of temporal operator descriptions on those of predicates, we define the history of the system at the Predicate level by joining descriptions belonging to $\hat{D}^p_\tau$ and $\hat{D}^{t(p)}_\tau$, as
\[ \hat{D}^{s,p \cup t(p)}_\tau \doteq \hat{D}^p_1 \sqcap \ldots \sqcap \hat{D}^p_{|\hat{D}^p_l|} \sqcap \hat{D}^t_1 \sqcap \ldots \sqcap \hat{D}^t_{|\hat{D}^t_m|} = \hat{D}_{s,i}, \]
where $s$ stands for satisfied, and $i$ is an index (initialized to 0) whose meaning is related to the number of hierarchical Context levels that are represented within the ontology. At this point, a sequence of steps is recursively performed. The set $C_{s,i}$ of $i$-th level symbols $\Sigma \models \mathcal{C}(\sigma_{k,i})$ that subsume history is determined as
\[ C_{s,i} \doteq \{\Sigma \models \mathcal{C}(\sigma_{k,i}) : subs?[\sigma_{k,i}, \hat{D}_{s,i}]\}. \tag{4.4} \]
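Satisfied $i$-th level symbols $\Sigma \models \mathcal{C}(\sigma_{k,i})$ are then described by their grounded descriptions $\hat{D}^c_l$. The set of all the grounded context elements satisfied at the time instant $\tau$ is defined as
\[ \hat{D}^c_\tau \doteq \{\hat{D}^c_l \mid \Sigma \models \mathcal{C}(\hat{D}^c_l)\}, \quad l = 1, \ldots, |\hat{D}^c_l|. \tag{4.5} \]
The procedure $\mathcal{T}$ is used again to determine the set of grounded descriptions of satisfied temporal operators holding between grounded symbols $\Sigma \models \mathcal{C}(\hat{\sigma}_{k,i})$ at the time instant $\tau$, $\hat{D}^{t(c)}_\tau = \mathcal{T}(\hat{D}^c_\tau)$, and therefore:
\[ \hat{D}^{t(c)}_\tau \doteq \{\hat{D}^{t(c)}_m \mid \Sigma \models \theta(\hat{D}^{t(c)}_m),\ \theta \in T\}. \]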
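The history of the system at the $i$-th Context level is determined by joining descriptions belonging to $\hat{D}^{t(c)}_\tau$ and $\hat{D}^c_\tau$, as
\[ \hat{D}^{s,c \cup t(c)}_\tau \doteq \hat{D}^c_1 \sqcap \cdots \sqcap \hat{D}^c_{|\hat{D}^c_l|} \sqcap \hat{D}^t_1 \sqcap \cdots \sqcap \hat{D}^t_{|\hat{D}^t_m|} = \hat{D}_{s,i}. \tag{4.6} \]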
The recursion over Equations (4.4)–(4.6) ends when there are no more $(i+1)$-th level symbols $\Sigma \models \mathcal{C}(\sigma_{k,i+1})$. As a final step, the set of Situation symbols $\Sigma \models \mathcal{S}(\sigma_k)$ that are satisfied at the time instant $\tau$, namely $S_{s,\tau}$, is given by:
\[ S_{s,\tau} \doteq \{\Sigma \models \mathcal{S}(\sigma_k) : subs?[\sigma_k, \hat{D}_{s,i}]\}. \]
Once the set $S_{s,\tau}$ is assessed, the system is said to be aware of modeled events occurring within the monitored environment at the time instant $\tau$. The architecture implements a 3-layer context-aware mechanism, each layer being devoted to managing different aspects of context recognition:
• Predicate layer. A fast, dynamically changing representation structure that, for efficiency reasons related to the complexity of inference mechanisms in DL-based ontologies (Horrocks & Sattler, 2007) [17], is constrained to operate on a limited temporal window. Its importance lies in the fact that it follows high frequency variations in patterns of sensory data and the associated temporal relationships.
• Context layer. A multiple-resolution representation layer that allows contexts to be formally modeled at different degrees of granularity. For this reason, it can cope with middle and slow frequency fluctuations in observed data, thereby recognizing trends over long periods of time.
• Situation layer. An aggregation level that allows the evaluation of combined events, possibly spanning over time and only loosely coupled.
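The iterative procedure can be summarized with a short Python sketch. It is a strong simplification under stated assumptions: descriptions are reduced to sets of labels, the subsumption test is approximated by a subset check, history is accumulated across levels, and the temporal procedure is passed in as a plain function. All names (`assess`, `Definition`, the `Resting` situation) are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Set

# A concept definition is approximated by the set of labels it requires.
Definition = FrozenSet[str]


def assess(predicates: Set[str],
           context_levels: List[Dict[str, Definition]],
           situations: Dict[str, Definition],
           temporal: Callable[[Set[str]], Set[str]]) -> Set[str]:
    """Return the Situation labels satisfied by the given predicates."""
    # History at the Predicate level: satisfied predicates plus the
    # temporal relationships holding between them.
    history = predicates | temporal(predicates)
    for level in context_levels:
        # Eq. (4.4): i-th level contexts whose definition is covered by history.
        satisfied = {name for name, required in level.items() if required <= history}
        # Eqs. (4.5)-(4.6): join contexts and their temporal relationships.
        history = history | satisfied | temporal(satisfied)
    # Final step: situations whose definition is covered by the last history.
    return {name for name, required in situations.items() if required <= history}


def no_temporal(_: Set[str]) -> Set[str]:
    """Trivial temporal procedure that finds no relationships."""
    return set()


levels = [{"SittingOnChair": frozenset({"Pressed", "PirActive"})}]
situations = {"Resting": frozenset({"SittingOnChair"})}
print(assess({"Pressed", "PirActive"}, levels, situations, no_temporal))  # {'Resting'}
```

The loop over `context_levels` plays the role of the recursion over the index $i$: it stops when no further Context level is defined, after which Situations are checked against the final history.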
4.5 Experimental Results and Discussion
During the past few years, many experiments have been performed in the Know-House environment (Mastrogiovanni, Sgorbissa, & Zaccaria, 2007) [25] to validate both the temporal context model and the context assessment procedure. In this Section, an example is described with the purpose of clarifying the conceptual steps involved in representing temporal information and assessing temporal relationships between events and human activities as they are represented within the ontology. Specifically, in order to ground the discussion, we show how to encode and assess a sophisticated daily activity. The overall system is currently undergoing an experimental evaluation at Villa Basilea, an assisted-living facility located in Genoa (Italy), where a dedicated apartment has been equipped with different kinds of sensors and smart appliances (see Figure 4.5). Specifically:
• Twelve Passive Infra Red (PIR) sensors located to monitor interesting areas of the apartment, such as the “bed area” in the bedroom or the “table area” and “stove area” in the kitchen.
• Five integrated temperature, luminosity and humidity sensors, each one located in a different room.
• Five RFID antennas, which allow the system to track specific users in the apartment, each one located in a different room.
• Ten light switches to control the state of the lights.
• Eleven contact sensors for detecting the state of doors and windows.
• Three smoke detectors located, respectively, in the kitchen, the bedroom and the living room.
• One smart TV, which is able to provide the system with information about its usage.
• Two pressure sensors, the first for the armchair in the living room and the second for detecting bed usage.
• Six cameras with integrated motion detection algorithms that allow the system to monitor wide apartment areas, revealing “activities” therein.
Fig. 4.4 Top: a relevant sketch of the ontology. Bottom: a variable assignment that satisfies the ontology.
Fig. 4.5 The Villa Basilea set-up. Top: the living room; Bottom Left: the kitchen; Bottom Right: the bedroom.
As described in (Mastrogiovanni, Sgorbissa, & Zaccaria, 2007) [25], all collected sensory data are processed on a centralized processing unit where both the ontology and the context assessment algorithms have been implemented.
4.5.1 An Example of System Usage
In this Section we focus on the following sequence of activities that must be detected by the context assessment process:
“after having switched the light off, the user watches the TV while sitting on the armchair for about one hour; after that, within half an hour, he goes to bed”.
Specifically, in the following paragraphs, we focus only on issues related to the modeling phase and the associated context assessment process.
Modeling. First of all, it is important to notice that the previously introduced sequence of activities to be recognized must be modeled using a Situation concept, since it involves representation layers related to many entities: user, TV, and bed, just to name a few. Using the temporal operators in $T$, this situation can be written using the composite formula:
\[ NocturnalActivity \equiv (TurnedOffKitchenLight \prec SittingOnChair) \sqcap (SittingOnChair\,\{d\}\,WatchingTV1hour) \sqcap (InBed \succ_{30min} WatchingTV1hour). \tag{4.7} \]
As can be noticed, this definition is made up of three constituent descriptions, namely $D_1 \equiv TurnedOffKitchenLight \prec SittingOnChair$, $D_2 \equiv SittingOnChair\,\{d\}\,WatchingTV1hour$ and $D_3 \equiv InBed \succ_{30min} WatchingTV1hour$, each one corresponding to a particular context. Then, each context is mapped to a number of predicates (see Figure 4.4 on the top). As previously described, Predicate concepts in $\Sigma$ directly represent sensory information. In the following paragraphs we exemplify how to model a particular contextual element, namely SittingOnChair. To this aim, we first specify the PIR sensor symbol $\sigma_{PIR}$, characterized by the description $D_{PIR}$, as follows:
\[ D_{PIR} = \forall\, id.INTEGER \sqcap \forall\, description.STRING \sqcap \forall\, range.Area, \]
where id is the PIR identification number, description is aimed at providing system users with meaningful information about the sensor, whereas range is a role that is filled with instances of the Area concept in order to associate sensory data with labeled places within the environment. For instance, if we want to model a specific PIR sensor with id equal to “4” and detecting events in the Kitchen area (where the chair is located), we have
\[ \hat{D}_{PIR4} \equiv id(4) \sqcap description(\text{“KitchenPIRSensor”}) \sqcap range(Kitchen). \]
At the same time, we specify the PIRData symbol $\sigma_{PIRData}$, which is characterized by the description $D_{PIRData}$, as follows:
\[ D_{PIRData} = \forall\, Id.INTEGER \sqcap \forall\, value.INTEGER, \]
where Id specifies the data type, whereas value is the actual sensor reading. For instance, if something is detected by PIR4, this will be modeled within the ontology by the following grounded description $\hat{D}_{PIRData4}$:
\[ \hat{D}_{PIRData4} = Id(1) \sqcap value(1). \]
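The two descriptions can be mirrored by plain data structures, as in the following Python sketch. It only illustrates the role/filler structure; it does not reproduce the ontology machinery, and the constant `PIR_DATA_TYPE` and the helper `ground_pir_reading` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PIR:
    """Counterpart of D_PIR: one field per role."""
    id: int            # PIR identification number
    description: str   # human-readable information about the sensor
    range: str         # label of the monitored Area


@dataclass(frozen=True)
class PIRData:
    """Counterpart of D_PIRData."""
    id: int     # identifier of the data type
    value: int  # actual sensor reading (1 when something is detected)


PIR_DATA_TYPE = 1  # hypothetical identifier of the PIR data type


def ground_pir_reading(detected: bool) -> PIRData:
    """Build the grounded description of a single PIR reading."""
    return PIRData(id=PIR_DATA_TYPE, value=int(detected))


pir4 = PIR(id=4, description="KitchenPIRSensor", range="Kitchen")
pir_data4 = ground_pir_reading(True)
print(pir4)
print(pir_data4)
```

Here `pir4` and `pir_data4` play the role of the grounded descriptions of the sensor with id 4 and of its reading.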
As described in Section 4.3, the association between sensors and sensory information is maintained through Predicate concepts. Specifically, IsPIRActive is represented within $\Sigma$ using a symbol $\sigma_{IsPIRActive}$ through a description $D_{IsPIRActive}$, as follows:
\[ D_{IsPIRActive} = \forall\, sensor.PIR \sqcap \forall\, value.PIRData. \]
As can be noticed, its definition exploits both $D_{PIR}$ and $D_{PIRData}$:
\[ \hat{D}_{IsPIRActive4} = sensor(PIR4) \sqcap value(PIRData4). \]
Similar considerations hold for all the other Predicate concepts, and in particular for Pressed. Once both IsPIRActive and IsPressed are represented within the ontology, it is straightforward to introduce SittingOnChair using a symbol $\sigma_{SittingOnChair}$, which is described as follows:
\[ SittingOnChair \equiv Pressed(chair, chairData) \sqcap PirActive(Pir4, PirData4). \]
It is worth noticing that, in our current implementation, the system is able to dynamically create all the relevant Predicate instances on the basis of the associated specifications for the sensors actually deployed. Temporal relationships in the NocturnalActivity formula are expressed using the $\sigma_{PERIOD}$ concept. As described in Section 4.3, PERIOD is defined using three roles, namely starts_at, ends_at and duration. Symbols in the ontology referring to the context model include a period role within their definitions. For instance, the WatchingTV1hour Context can be defined as:
\[ WatchingTV1hour \equiv WatchingTV \sqcap \forall\, period(1hour), \]
where 1hour is a particular instance of Period. Since temporal intervals are directly formalized within $\Sigma$, the system can use standard subsumption algorithms to classify the temporal duration of descriptions over time. In our current implementation, periods can also be defined using fuzzy intervals, and they can be associated with semantically consistent labels, like “long period” or “sleeping period”.
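A minimal Python sketch may help to see how the two Predicates combine into SittingOnChair and how a period constraint is checked. It is an illustration under assumptions: readings are reduced to an integer value, time is expressed in minutes, a crisp threshold replaces the fuzzy intervals mentioned above, and every name (`Reading`, `is_active`, `watching_tv_1hour`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    sensor: str  # label of the physical device
    value: int   # 1 when the sensor is triggered, 0 otherwise


def is_active(reading: Reading) -> bool:
    """Predicate level: the sensor currently reports a detection."""
    return reading.value == 1


def sitting_on_chair(chair_pressure: Reading, kitchen_pir: Reading) -> bool:
    """Context level: conjunction of the pressure and PIR predicates."""
    return is_active(chair_pressure) and is_active(kitchen_pir)


def watching_tv_1hour(watching_tv: bool, starts_at: int, ends_at: int) -> bool:
    """WatchingTV constrained by a period of at least 60 minutes."""
    return watching_tv and (ends_at - starts_at) >= 60


chair = Reading("chair", 1)
pir4 = Reading("PIR4", 1)
print(sitting_on_chair(chair, pir4))      # True
print(watching_tv_1hour(True, 37, 98))    # True
```

The period check only becomes true once enough time has elapsed, which is why the detection of such Contexts is necessarily delayed.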
In spite of actual differences in symbol definitions, similar considerations hold for other contexts as well: first the relevant Predicates are built, then they are used to define Contexts. Contexts, in turn, are used to define more complex relationships, i.e., Situations. It is now possible to define the description that has been previously labeled as $D_2$, by combining SittingOnChair with WatchingTV1hour by means of the during temporal operator. In particular, assuming that $\tau_2^-$ and $\tau_2^+$ are, respectively, the initial and the final ends of the temporal interval when SittingOnChair holds, and that $\tau_1^-$ and $\tau_1^+$ are, respectively, the initial and the final ends of the temporal interval when WatchingTV1hour holds, then $SittingOnChair\,\{d\}\,WatchingTV1hour$ is defined as follows:
\[ D_2 \equiv SittingOnChair \sqcap WatchingTV1hour \sqcap (\tau_1^- > \tau_2^-) \sqcap (\tau_1^+ < \tau_2^+). \]
If sensory data are such that the two contexts SittingOnChair and WatchingTV1hour hold, and if the two temporal intervals satisfy the two relationships $(\tau_1^- > \tau_2^-)$ and $(\tau_1^+ < \tau_2^+)$, then the during temporal operator is said to be satisfied.
Context recognition. An example of a sequence of variable assignments satisfying NocturnalActivity is given in Figure 4.4 on the bottom. According to the proposed model, context recognition is an iterative process that, at every time instant $\tau$, determines the set of all the satisfied Predicate concepts $\hat{D}^p_\tau$. On the basis of $\hat{D}^p_\tau$, the set $\hat{D}^{t(p)}_\tau$ of all the relevant temporal relationships among satisfied instances of Predicate is built (see Equation (4.7)). It is worth noticing that, in our current implementation, we do not exhaustively build all the possible relationships among satisfied Predicate concepts: on the contrary, we consider only those relationships appearing in definitions of either Context or Situation concepts. This mechanism can be easily extended to Context concepts satisfied in $\tau$ (see Equations (4.4)–(4.6)). If we carefully look at the definition of NocturnalActivity, it is possible to notice that TurnedOffKitchenLight can be formalized in two ways, either using the derivative of IsLightOn, namely $\delta\{IsLightOn\}_\tau^{t \to f}$, or exploiting a sequence that represents the variation of the kitchen light from on to off, as $KitchenLightOn\,\{m\}\,KitchenLightOff$. With respect to our example, and starting from the Context level for the sake of brevity, we can see that at the time instant $\tau = 00.23$, only one relevant Context is recognized, namely
\[ \hat{D}^c_{00.23} = \{KitchenLightOn\}. \]
Since only one satisfied Context is represented within history, the corresponding set of temporal relationships is empty, and therefore
\[ \hat{D}^{t(c)}_{00.23} = \{\}. \]
As a consequence, the overall history at the time instant $\tau = 00.23$ is given by
\[ \hat{D}^{c \cup t(c)}_{s,00.23} = \{KitchenLightOn\}. \]
When, at $\tau = 00.24$, the lights are switched off, the Context KitchenLightOff is recognized, and therefore it becomes part of history:
\[ \hat{D}^c_{00.24} = \{KitchenLightOn, KitchenLightOff\}. \]
However, an interesting temporal relationship involving both KitchenLightOn and KitchenLightOff is detected, i.e.:
\[ \hat{D}^{t(c)}_{00.24} = \{TurnedOffKitchenLight\}. \]
The corresponding overall history at $\tau = 00.24$ is updated as follows:
\[ \hat{D}^{c \cup t(c)}_{s,00.24} = \{KitchenLightOn, KitchenLightOff, TurnedOffKitchenLight\}. \]
As shown in Figure 4.4, the next interesting event occurs at the time instant $\tau = 00.40$, when the TV is switched on. Analogously to what has been previously shown, history is updated as follows:
\[ \hat{D}^c_{00.40} = \{KitchenLightOn, KitchenLightOff, SittingOnChair, WatchingTV\}, \]
\[ \hat{D}^{t(c)}_{00.40} = \{TurnedOffKitchenLight, D_1\}, \]
and then
\[ \hat{D}^{c \cup t(c)}_{s,00.40} = \{KitchenLightOn, KitchenLightOff, TurnedOffKitchenLight, SittingOnChair, WatchingTV, D_1\}. \]
The proper detection of WatchingTV1hour is postponed by exactly one hour (given its definition). Finally, at the time instant $\tau = 01.58$, all the events described in NocturnalActivity are detected, since
\[ \hat{D}^c_{01.58} = \{KitchenLightOn, KitchenLightOff, SittingOnChair, WatchingTV1hour, InBed\}, \]
\[ \hat{D}^{t(c)}_{01.58} = \{D_1, D_2, D_3\}, \]
\[ \hat{D}^{c \cup t(c)}_{s,01.58} = \{KitchenLightOn, KitchenLightOff, TurnedOffKitchenLight, SittingOnChair, WatchingTV1hour, InBed, D_1, D_2, D_3\}. \]
Finally,
\[ S_{s,01.58} \doteq \{\Sigma \models \mathcal{S}(\sigma_{NocturnalActivity}) : subs?[\sigma_{NocturnalActivity}, \hat{D}_{s,01.58}]\}. \]
As a practical example, at time instant 1.00, context history comprises three elements: TurnedOffKitchenLight, WatchingTV, and SittingOnChair. The overall sequence is detected at time instant 1.58, when InBed is detected. If we carefully look at the temporal order scheme, the two time instants $KitchenLightOn^+$ and $KitchenLightOff^-$ are filled with the same numerical value (i.e., 00.24), and therefore $KitchenLightOn\,\{m\}\,KitchenLightOff$ is satisfied, thereby satisfying TurnedOffKitchenLight as well. The next event on the “timeline” is SittingOnChair, which starts at time
instant 0.33; the following detected one is WatchingTV1hour, and so on. In order for the overall Situation to be detected, the only time instants to be considered are those directly expressed in temporal operators. As a consequence, a possible instance of temporal relationships is:
\[ \begin{aligned} TurnedOffKitchenLight^+ &< SittingOnChair^- \\ SittingOnChair^- &< WatchingTV1hour^- \\ SittingOnChair^+ &> WatchingTV1hour^+ \\ WatchingTV1hour^+ + t &> InBed^-, \end{aligned} \tag{4.8} \]
which, when actual temporal values are substituted, turns out to be:
\[ 0.24 < 0.33, \quad 0.33 < 0.37, \quad 1.39 > 1.38, \quad 1.38 + 0.30 > 1.58. \]
When this happens, the formula is said to be satisfied.
4.5.2 A Discussion about Context Assessment Complexity and System Performance
One of the major requirements of context assessment systems is to recognize a dangerous situation in real-time and to react promptly whenever the detected situation calls for immediate action. As a consequence, performance issues are very important both in principle and in practice. Furthermore, real-time requirements must be guaranteed at system level, i.e., both at the infrastructural and reasoning levels. Real-time requirements at the infrastructural level have been addressed in previous work (Mastrogiovanni, Sgorbissa, & Zaccaria, 2007) [25], which discusses the distributed, scalable and real-time architecture underlying the current system implementation. However, the real challenge in any symbolic-based architecture is to guarantee real-time performance at the reasoning and knowledge representation levels. The proposed system requires that information originating from distributed devices is represented and managed by means of an ontology, where reasoning processes occur that implement the described context assessment mechanism. As described in (Borgida, 1996) [6], low-order polynomial complexity in ontology-based inference can be obtained by imposing a monotonic increase of the number of represented concepts and individuals. This implies that named concepts and individuals cannot be removed from the ontology: the number of symbols is to be maintained constant. This is in contrast with what is expected to happen in a real-world environment, where novel events, situations and human actions can originate new symbols (in particular, new individuals) in the ontology as time passes. In order to cope with this issue, the implemented system adopts a hybrid approach: the goal is to keep the number of concepts and individuals constant over time, while being able to update relevant information in real-time.
Basically, two kinds of symbols must be dealt with: the former class comprises those symbols that are related to the context model, i.e., concepts and individuals related to Predicates, Contexts and Situations, whereas the latter includes symbols that represent sensors and sensor data. With respect to the first class, detected instances of Predicates, Contexts and Situations are represented using placeholder individuals that maintain information on a given number of their occurrences over time: the same individual is used to keep information about many actual instances. From a formal perspective, this approach can be considered “semantically incoherent”, since it is always assumed that a one-to-one mapping exists between instances and real-world entities; however, practice suggests that this “trick” does not affect formal system properties. The number of instances maintained within the corresponding individual affects the overall history length and, as a consequence, the temporal complexity of the situations that can be detected over time: the bigger this number, the longer the history of past events the system can reason upon. With respect to the second class, the number of needed symbols can be computed in advance from the number of devices actually used: in this analysis, we consider a minimum of three concepts for each sensing device, representing sensory data, the actual physical device and the related Predicates. Sensors measuring physical properties defined in a range (e.g., temperature) are characterized by discrete (fuzzy) intervals at the Predicate level: sensors and sensory data are represented with one corresponding individual. Analogously to what happens for symbols in the context model, the number of symbols involved in grounding sensory data can be determined beforehand, given the number of deployed devices.
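The placeholder idea can be sketched in Python with a bounded buffer. This is only an illustration of how a single individual can keep a fixed number of occurrences, assuming that each occurrence is stored as a (starts_at, ends_at) pair; the class name `PlaceholderIndividual` and its methods are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class PlaceholderIndividual:
    """Single individual that stores the last N occurrences of a concept."""

    def __init__(self, concept: str, max_occurrences: int) -> None:
        self.concept = concept
        # A bounded deque silently discards the oldest occurrence when full,
        # so the number of stored symbols never grows beyond max_occurrences.
        self.occurrences: Deque[Tuple[int, int]] = deque(maxlen=max_occurrences)

    def record(self, starts_at: int, ends_at: int) -> None:
        """Store a new occurrence as a (starts_at, ends_at) pair."""
        self.occurrences.append((starts_at, ends_at))

    def history(self) -> List[Tuple[int, int]]:
        """Return the retained occurrences, oldest first."""
        return list(self.occurrences)


sitting = PlaceholderIndividual("SittingOnChair", max_occurrences=2)
sitting.record(33, 99)
sitting.record(150, 170)
sitting.record(200, 260)
print(sitting.history())  # [(150, 170), (200, 260)]
```

The `max_occurrences` parameter corresponds to the trade-off discussed above: a larger value gives a longer history to reason upon, at the cost of more stored information per individual.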
Figure 4.6 superimposes the increase rates for the number of symbols stored in the ontology currently used in the real-world scenario of Villa Basilea. The blue line describes the rate associated with an implementation that does not exploit the proposed tricks. In this case, the ever increasing number of stored symbols leads to a system unable to meet real-time requirements after an unpredictable amount of time (since it depends on what actually happens in the scenario). The red line shows the rate associated with the current implementation: after some time (21 hours in Figure 4.6) all the possible Situations have been detected; from this moment on, the system can be considered to be in a steady-state phase, and therefore its computational behavior can be reliably predicted.
Fig. 4.6 After system bootstrap, assuming 10 new symbols per hour, the grey line shows a generic increase rate for stored symbols in the ontology, whereas the black line shows the corresponding increase rate for the proposed system.
4.6 Conclusion
In this paper, an architecture for context assessment in smart environments has been proposed, with a particular emphasis on the knowledge representation and reasoning capabilities that are needed for real-time operation. In particular, the presented framework exploits an ontology and the associated subsumption inference process to reason upon sensory information that represents events and human actions the system must detect. The ontology is fed with sensory data originating from distributed devices by means of symbols that directly map data into semantically relevant information. These base symbols are then used as input to a hierarchical context model that is organized into three layers: Predicates are responsible for coupling sensory data with the relevant entities; Contexts are aimed at organizing Predicates in symbolic structures that relate different pieces of information; Situations aggregate many Contexts to define more complex structures, which can include temporal relationships between events and human actions. Specific focus has been devoted to a context assessment model and the associated iterative algorithm. Finally, an example has been reported with the aim of clarifying conceptual aspects related both to modeling and assessment. Unlike much of the literature, special attention is devoted to implementation aspects related to computational requirements and real-time operating conditions. In order to guarantee these requirements over time, formal aspects of the adopted logic-based approach are left aside. However, this guarantees a predictable behavior related to context assessment and situation recognition, which is a fundamental aspect of smart environment architectures for real-world operation.
References
[1] Allen J., (1983), Maintaining knowledge about temporal intervals. Communications of the ACM, 26 (11), pp. 832–843.
[2] Augusto J.C., Liu J., McCullagh P., Wang H., & Yang J.-B., (2008), Management of Uncertainty and Spatio-Temporal Aspects for Monitoring and Diagnosis in a Smart Home. International Journal of Computational Intelligence Systems, 1 (4), pp. 361–378.
[3] Aztiria A., Augusto J.C., Izaguirre A., & Cook D., (2008), Learning Accurate Temporal Relations from User Actions in Intelligent Environments. Proceedings of the Symposium of Ubiquitous Computing and Ambient Intelligence, pp. 274–283. Salamanca, Spain.
[4] Baader F., Calvanese D., McGuinness D., Nardi D., & Patel-Schneider P., (2003), The Description Logic Handbook. New York, NY, USA: Cambridge University Press.
[5] Bettini C., Mascetti S., & Sean Wang X., (2007), Supporting Temporal Reasoning by Mapping Calendar Expressions to Minimal Periodic Sets. Journal of Artificial Intelligence Research, 28 (1), pp. 299–348.
[6] Borgida A., (1996), On the Relative Expressiveness of Description Logics and Predicate Logics. Artificial Intelligence, 82 (1-2), pp. 353–367.
[7] Chalmers M., (2004), A Historical View of Context. Computer Supported Collaborative Work, 13 (3-4), pp. 223–247.
[8] Chen H., Finin T., & Joshi A., (2003), An Ontology for Context Aware Pervasive Computing Environments. Proceedings of the 18th Intl. Joint Conf. on Artificial Intelligence (IJCAI-03), Acapulco, Mexico.
[9] Chen H., Perich F., Finin T., & Joshi A., (2004), SOUPA: Standard Ontology for Ubiquitous and Pervasive Applications. Proceedings of the 1st Annual Intl. Conf. on Mobile and Ubiquitous Systems (MobiQuitous2004). Cambridge, MA, USA.
[10] Chen L., Nugent C., Mulvenna M., Finlay D., Hong X., & Poland M., (2008), Using Event Calculus for Behaviour Reasoning and Assistance in a Smart Home. Proceedings of the International Conference On Smart homes and health Telematics (ICOST2008). Ames, IA, USA.
[11] Dey A., (2001), Understanding and Using Context. Personal and Ubiquitous Computing, 5 (1).
[12] Dourish P., (2004), What we talk about when we talk about context. Personal and Ubiquitous Computing, 8 (1), pp. 19–30.
[13] Emerson A.E., & Halpern J., (1986), “Sometimes” and “not never” revisited: on branching versus linear time temporal logic. J. ACM, 33 (1), pp. 151–178.
[14] Gottfried B., Guesgen H.W., & Hübner S., (2006), Spatiotemporal Reasoning for Smart Homes. In Designing Smart Homes (Vol. 4008/2006), pp. 16–34, Springer Berlin/Heidelberg.
[15] Harnad S., (1990), The Symbol Grounding Problem. Physica D, 42, pp. 335–346.
[16] Henricksen K., Indulska J., & Rakotonirainy A., (2002), Modelling Context Information in Pervasive Computing Systems. Proceedings of the Intl. Conf. on Pervasive Computing (Pervasive 2002). Zurich, Switzerland.
[17] Horrocks I., & Sattler U., (2007), A Tableau Decision Procedure for SHOIQ. Journal of Automated Reasoning, 39 (3), pp. 248–276.
[18] Jakkula V.R., & Cook D., (2007), Using Temporal Relations in Smart Home Data for Activity Prediction. Proceedings of the International Conference on Machine Learning (ICML) Workshop on the Induction of Process Models (IPM/ICML 2007). Corvallis.
[19] Jakkula V., & Cook D., (2007), Anomaly Detection using Temporal Data Mining in a Smart Home Environment. Methods of Information in Medicine, 47 (1), pp. 70–75.
[20] Krummenacher R., & Strang T., (2007), Ontology-Based Context-Modeling. Proceedings of the 3rd Workshop on Context Awareness for Proactive Systems (CAPS’07). Guildford, United Kingdom.
[21] Kutz O.L., (2004), E-connections of Abstract Description Systems. Artificial Intelligence, 156 (1), pp. 1–73.
[22] Liao L., Fox D., & Kautz H., (2005), Location-Based Activity Recognition using Relational Markov Networks. Proceedings of the Advances in Neural Information Processing Systems (NIPS). Edinburgh, Scotland.
[23] Lutz C., Wolter F., & Zakharyaschev M., (2008), Temporal Description Logics: A Survey. Proceedings of the 15th International Symposium on Temporal Representation and Reasoning (TIME), pp. 3–14. Montreal, Canada.
[24] Mastrogiovanni F., Scalmato A., Sgorbissa A., & Zaccaria R., (2008), An Integrated Approach to Context Specification and Recognition in Smart Homes. In Smart Homes and Health Telematics, pp. 26–33, Springer Berlin/Heidelberg.
[25] Mastrogiovanni F., Sgorbissa A., & Zaccaria R., (2007), Classification System for Context Representation and Acquisition. In J. A. (Eds.), Advances in Ambient Intelligence. In the Frontiers of Artificial Intelligence and Application (FAIA) Series. IOS Press.
[26] Mastrogiovanni F., Sgorbissa A., & Zaccaria R., (2008), Representing and Reasoning upon Contexts in Artificial Systems. 3rd Workshop on Artificial Intelligence Techniques for Ambient Intelligence (AITAmI-08), co-located with the 18th European Conf. on Artificial Intelligence (ECAI 08). Patras, Greece.
[27] McGuinness D., & Borgida A., (1995), Explaining Subsumption in Description Logics. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montréal, Canada.
[28] Monekosso D.N., & Remagnino P., (2007), Monitoring Behavior With An Array of Sensors. Computational Intelligence, 23 (4), pp. 420–438.
[29] Morchen F., (2006), A Better Tool than Allen’s Relations for Expressing Temporal Knowledge in Interval Data. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). Philadelphia.
[30] Muehlenbrock M., Brdiczka O., Snowdon D., & Meunier J., (2004), Learning to Detect User Activity and Availability from a Variety of Sensor Data.
Proceedings of the 2004 IEEE Int.l Conf. on Pervasive Computing (PerCom04). Piscataway, NY, USA. [31] Rugnone A., Nugent C., Donnelly M., Craig D., Vicario E., Paggetti C., et al., (2007), HomeTL: A Visual Formalism, Based on Temporal Logic, for the Design of Home Based Care. Proceedings of the 3rd annual IEEE Conference on Automation Science and Engineering (CASE), pp. 747–752, Scottsdale, Arizona, USA. [32] Salber D., Dey A., & Abowd G., (1999), The Context Toolkit: Aiding the Development of Context-Enabled Applications. Proceedings of the Conf. on Human Factors in Computing Systems (CHI’99), pp. 434-441. Pittsburgh, Pennsylvania, USA. [33] Strang T., Linnhoff-Popien C., & Frank K., (2003), Applications of a Context Ontology Language. Proceedings of the Int.l Conf. on Software, Telecommunications and Computer Networks. Dubrovnik, Croatia. [34] Strang, T., Linnhoff-Popien, C., & Frank., K. (2003), CoOL: A Context Ontology Language to enable Contextual Interoperability. Proceedings of the 4th IFIP WG 6.1 Int.l Conf. on Distributed Applications and Interoperable Systems (DAIS2003). Paris, France. [35] Takahashi H., Tokairin Y., Yamanaka K., Suganuma T., Kinoshita T., Sugawara K., et al., (2007), uEyes: A Ubiquitous Care-Support Service Based on Multiple Contexts Coordination. Proceedings of the 2007 International Symposium on Applications and the Internet, IEEE Computer Society. [36] Vilain M. & Kautz H., (1989), Constraint propagation algorithms for temporal reasoning: A revisited report. In Readings in Qualitative Reasoning about Physical Systems (pp. 373–381). Morgan Kaufmann Publishers Inc. [37] Waldmann M., (2007), Combining versus Analyzing Multiple Causes: How Domain Assumptions and Task Context Affect Integration Rules. Cognitive Science, 31, pp. 233–256.
Chapter 5
ElderCare: An Interactive TV-based Ambient Assisted Living Platform
Diego López-de-Ipiña, Sergio Blanco, Xabier Laiseca, and Ignacio Díaz-de-Sarralde
Deusto Institute of Technology - DeustoTech, University of Deusto, Avda. Universidades 24, 48007 Bilbao, Spain
{dipina,sergio.blanco,xabier.laiseca,isarralde}@deusto.es

Abstract
This paper describes the architecture and components of an AAL-enabling platform centred on interactive TV (iTV), which combines OSGi middleware, RFID and NFC in order to ease the daily lives of dependent or semi-dependent elderly people (its main focus), their caretakers and their relatives. The end result is an affordable, unobtrusive, evolvable, usable and easily deployable ICT infrastructure which aims to approach the vision of “AAL for All”. That is, it seeks a more widespread adoption of AAL and a better quality of service (QoS) in caretaking through the combination of common hardware, OSGi dynamic services and mobile-aided care data management.
5.1 Introduction
Ambient Assisted Living (AAL) [1] fosters the provision of equipment and services for the independent or more autonomous living of elderly people via the seamless integration of info-communication technologies (ICT) within homes and residences, thus increasing their quality of life and autonomy and reducing the need to move into a residence, or easing that move when it happens. This paper explains the architecture and components of an AAL solution designed to support not only people at risk of losing autonomy but also their caretakers and the people concerned about them (relatives or friends).

A common issue in AAL systems, inherited from the wider Ambient Intelligence (AmI) paradigm, is that deployment is limited to premises where a significant economic investment and a cumbersome ICT deployment can be justified. The main focus of this work is to define an AAL-enabling platform, namely ElderCare, which addresses these limitations. Hence, this paper mainly
discusses the quest for a low-cost, easily deployable, usable and evolvable ICT infrastructure leading towards “AAL for All”. The fact that it must be usable by elderly people explains why the main interaction mechanism offered by this platform is interactive TV (iTV), i.e. interaction through a device (the remote control) with which everybody is familiar.

An important group not usually targeted by AAL platforms is the people concerned with or interested in the elderly, e.g. relatives or friends who are not directly involved in caretaking. Hence, this work proposes different notification mechanisms (e.g. email, SMS, RSS or Twitter) to keep them up to date on the dependent person’s progress. Furthermore, it allows authorised caretakers and relatives to access the ICT infrastructure aiding the elderly person in order to monitor the system or adapt it to new caring requirements. In essence, the proposed solution is targeted not only at professional caretakers but also at relatives who want to enhance their elderly relatives’ homes with some assistive support.

Another important and difficult issue in AAL is ensuring adequate and timely care-related data management at residences and homes. There, large amounts of data must be gathered and reported in real time in order to follow the progress of, and incidents in, elderly people’s daily routines. Better care for the elderly and more suitable working conditions for their caretakers are only possible if proper care data collection is in place. Thus, this work proposes combining NFC mobiles and RFID tags to address this issue.

The structure of the paper is as follows. Section 5.2 gives an overview of related work. Section 5.3 presents the overall architecture of the ElderCare system. Section 5.4 offers more details on the three distributed components that make up ElderCare’s modular, extensible and intelligent architecture.
Section 5.5 draws some conclusions and outlines future work plans.

5.2 Related Work
Stanford [2] describes one of the first AAL solutions: an instrumented home for the elderly that uses pervasive computing to help the residence’s staff easily identify where they are needed and to support them in working more efficiently. The system acquires information via locator badges, weight sensors in beds and so on, enabling the staff to better study and react to the problems that arise. In-apartment computers with custom applications are also used to enable residents to communicate with the outside world and consume some services. The main limitation of the system is its lack of a capability for
the seamless integration of new hardware. Besides, the system is designed to operate only within a residence.

The GatorTech Smart Home [3] is a remarkable five-year project that led to the construction of a very advanced smart house. It defines a modular OSGi-based service architecture that allows easy service creation, intelligent management of context and knowledge, and the integration of some custom-built hardware such as RFID plugs to easily identify connecting devices. Unfortunately, this solution can only be deployed in residences willing to make a big investment and go through a heavy deployment process. Its infrastructure is not suitable for widespread adoption. Besides, it does not address remote caregiving at elderly people’s homes, i.e. making the information and configuration remotely accessible to caregivers and family.

The iNEM4U FP7 project [4] focuses on enriching the iTV experience by integrating community and interactivity services from various technology domains (social networks, in-home device networks) into an iTV session. It presents an MHP-based interactive TV prototype which uses a computer with VLC and MHP to build a TV quiz application. Unfortunately, it focuses on connecting TV broadcasts with external Internet services, and does not address the shortcomings of MHP technology when connecting to in-house (often non-IP) devices or user-bound mobile clients. The reason is that MHP [5] decoders are limited to running applications that come embedded in the TV signal and that run in a sandbox which only allows graphical interface rendering and IP communication with the Internet. Some authors [6] have already extended MHP with proprietary extensions to interact with OSGi-based residential gateways.
However, given the limited deployment of DVB MHP-compatible decoders in European homes and, even more importantly, the reliance on digital broadcasters to include MHP applications in the audiovisual carousel, we have decided to adopt a proprietary but self-sufficient iTV solution.

The NUADU project [7] is another relevant research work aiming to improve elderly people’s standard of living by means of low-cost and easily deployable healthcare services. This project resembles ElderCare in its use of the TV as an end-user client and in the idea of supporting other user groups apart from elderly people (family, doctors, caregivers, etc.). On the other hand, while NUADU is devised as a home system with some assistance for walking in the street, our project targets a seamless and modular solution that can work in homes and residences alike.

Currently there are several interesting initiatives integrating computers in home living rooms, with projects such as Boxee [8], XBMC [9] and MythTV [10]. However, few try
to venture outside the comfort zone of media playing and management. LinuxMCE [11] is one of the few that provides mechanisms to control home automation devices from any computer running the media centre solution.

The combination of NFC [12] technology and RFID tags has been used in several research projects related to medicine and caretaking [13]. Indeed, the adoption of mobile devices is making caretaking easier. Mei proposes the development of a framework that depicts patients’ vital signs [14], and Tadj provides a basic framework for developing and implementing pervasive computing applications [15]. However, few if any have offered solutions that store the most relevant information about the residents themselves, their symptoms and the caring procedures applied in their identifying wristbands, so that care data management can be improved.
5.3 The ElderCare Platform
This platform is devised to provide universal ICT support for friendlier ageing at homes and residences, i.e. to preserve and prolong people’s autonomy through technology despite their increasing dependency needs. It aims to define a minimum but sufficient set of off-the-shelf hardware and accompanying software infrastructure, named the AAL Kit, which is easily portable, deployable, configurable and usable either in elderly people’s homes or in their rooms in residences. It is intended to be accessible to any elderly person regardless of their socio-economic, cultural or technological background.

In the proposed solution (see Figure 5.1), local deployments of AAL Kits are remotely managed by a centralised back-end server, which allows residences or institutions monitoring elderly people’s homes to control the assisted environment, send notifications to it and receive alerts from it. Notably, the platform provides usable and accessible interaction interfaces suited to each of the three groups targeted by ElderCare: a) first and foremost, elderly people living at home or in residences are offered an interactive TV interface; b) caretaking staff request and register information about their caring procedures through NFC mobiles and touch screens; and c) authorised relatives and friends interested in following the elderly person’s life logs can do so through RSS or micro-blogging services such as Twitter.

In essence, the ElderCare platform aims to provide a holistic ICT infrastructure for AAL in any home or residence that is affordable, unobtrusive, easily deployable, usable, accessible and evolvable. In other words, this solution addresses the following design requirements:
Fig. 5.1 AAL Kit (left hand side) and Interactive TV interface (right hand side)
• Affordable, since it has to be offered at a low cost to ensure anybody can purchase it. Therefore, it should be built using mass-produced off-the-shelf hardware. In our prototype, the following elements have been used: a TV tuner (68 €), a small motherboard (84 €), a 2 GB RAM module (45 €), a 120 GB hard drive (30 €) and a set-top-box-like PC case (35 €). All of this amounts to a total of 265 €, a price that could be reduced significantly if such a product were mass-produced. Additional, optional elements used in our prototype are a Zephyr™ HxM Bluetooth biometric belt (160 €), a Bluetooth USB adapter (20 €), a Nabaztag Internet-connected object (145 €), 1K HF RFID wristbands (0.3 €/unit) and NFC mobiles (150 €/unit).
• Unobtrusive, so that it can be seamlessly integrated within a home or residence room, i.e. it should have the form factor of other common electronic devices, without cluttering the environment where it is deployed, or be easily worn by users. Our solution takes the form of a small PC or set-top box (see the small black box next to the phone in Figure 5.1). Furthermore, residents are encouraged to wear a lightweight silicone wristband and, optionally, a more intrusive biometric belt in order to monitor their vital signs. Only in this latter case could our solution be considered slightly intrusive from the residents’ point of view. Caretakers, on the other hand, are only encouraged to carry a mobile phone, so for them the solution is not obtrusive.
• Easily deployable, so that relatives or even elderly people can plug in the system and configure it. In our case, the proposed set-top-box-like hardware is directly connected to the TV, only needs a button press to be started and operates, in its default mode, as a standard DVB-T decoder.
• Usable and accessible to every user group. Elderly people accustomed to using a TV remote control can easily use the offered Teletext-like interface. Care staff, on the other hand, are provided with efficient and rapid access to the platform services through touch screens or intuitive mobile phone applications. More advanced users access ElderCare from remote PC browsers through a simple RIA (Rich Internet Application) web front-end (see Figure 5.4). Finally, authorised relatives and friends can follow a dependent person’s daily caring logs through notification mechanisms such as email, RSS or Twitter. In summary, ElderCare attempts to offer each group the interaction interfaces best suited to it.
• Evolvable. It should be easily integrated with any existing or emerging home automation devices (e.g. X10, Zigbee Home Automation, KNX), notification mechanisms and assistive services. Furthermore, it has to cope with the fact that the assistive services demanded at a premise vary over time, and so do the sensors, actuators or Internet services required. Bearing this in mind, ElderCare has been designed around OSGi, a service infrastructure devised to cater for the heterogeneity and dynamicity requirements mentioned.

5.3.1 ElderCare Platform Components
The ElderCare platform addresses the aforementioned requirements by proposing a distributed architecture with the following three components:
• AAL Kit - a set of essential hardware and software components which can be deployed in any home or residence room to aid personal autonomy. Figure 5.1 (left hand side) shows the form factor of the AAL Kit, which consists of a set-top-box-like element connected to a TV. It acts as an enhanced DVB-T decoder which offers data services to elderly people and controls the devices deployed in their habitat. Its software, named ElderCare’s Local System (see Figure 5.1), provides a set of default assistive services which are described later.
• A central remote management and service provisioning system, namely ElderCare’s Central Server (see top of Figure 5.2). It remotely manages the Local Systems deployed in the rooms of a residence or in different homes. It collects the data gathered at those installations, stores and analyses them, and generates notifications via different mechanisms in order to inform care staff or relatives of important events concerning elderly people. Importantly, it offers an AAL service repository, something like an “AAL store”, which can be accessed using web browsers to select, download and install new services, for instance a “food shopping from TV” service. All the services currently deployed in a given AAL Kit can be reviewed through the Services menu option (see selected tab in Figure 5.4).
• A mobile client to assist in care logging, namely the Mobile Care Logging System (left part of Figure 5.4 and Figure 5.5). Relatives and care staff use it to record, with NFC mobiles, events and caring procedures on elderly people’s RFID wristbands. The mobiles also regularly report the care logs stored in the RFID wristbands, through Bluetooth, to the Local System. The Local System forwards them to the Central Server, which then notifies relatives or friends of a non-privacy-invasive selection of them.
Fig. 5.2 ElderCare Architecture
5.4 Implementation Overview
This section offers a more detailed and low-level description of the three distributed components that make up ElderCare’s system architecture (see Figure 5.2).
5.4.1 ElderCare’s Local System
Internally, each ElderCare Local System is governed by an OSGi system (deployed on Equinox [16]) which manages the following set of embedded default bundles:
• TV Tuning and Widget Manager. It generates the main interactive TV interface offered by a Local System. It behaves as an enhanced DVB-T decoder which renders widgets on top of the TV images captured from a TV tuner card when requested by OSGi bundles executing in the Local System.
• Home Automation Manager. This service allows communication with widely available building automation standards such as X10, Zigbee or KNX, which are found in any modern or previously instrumented home.
• Alert Manager. Alerts may be programmed locally or remotely and are registered in an internal scheduler. Generally, alerts are rendered on the TV screen (see alert on TV in Figure ), although alternative channels are possible when the TV is off, such as TTS through the set-top-box speakers or a Nabaztag (left side of Figure 5.1).
• Elderly Vital Sign Monitor. Vital sign data collection and analysis from a Zephyr HxM biometric wireless vest [17] has also been integrated. This vest communicates, through Bluetooth, vital variables such as heart rate, ECG or distance walked. The data collected can then be reviewed through the Local System web interface (see Log icon in Figure 5.3). In addition, health risk alerts are identified and reported to the Central Server, which notifies the relevant people.
• Service Manager. This core service provides the extensibility capability of a Local System. It allows services to be installed and uninstalled dynamically without a system reboot. It also gives access to all the logs generated by the assistive services currently running at an AAL Kit instance. For all of this, it leverages OSGi’s service management capabilities through the BundleContext class. Figure 5.3 displays, in the Local System’s web front-end, an extension service which can be selected for installation.
• An interesting feature of ElderCare’s Local Systems is their capability to render graphical user interfaces on top of digital TV. Unfortunately, Java support for multimedia management, through JMF, is limited both in terms of the media formats it supports and its performance. Thus, an external non-Java application has been used to fulfil that role. This tool is MPlayer [18], an open source multimedia player which supports most of the currently established media types and offers a command-line interface (CLI).
• A remarkable aspect of ElderCare’s interactive TV interfaces is that they are designed to be easily controllable through the TV remote control, requiring only numeric input or navigation and selection using the remote’s cursor keys and the OK button (see alert painted on screen on right hand side of Figure 5.3).
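As an illustration of the Alert Manager's choice of output channel described in the list above, the following is a minimal sketch only. The class `AlertRouter`, the `Channel` enum and the method `channelsFor` are hypothetical names introduced here, and the policy (TV widget when the TV is on, otherwise TTS plus an optional Nabaztag) is an assumption drawn from the text rather than ElderCare's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of alert channel selection; not ElderCare's actual API. */
final class AlertRouter {

    /** Output channels mentioned in the text. */
    enum Channel { TV_WIDGET, TEXT_TO_SPEECH, NABAZTAG }

    /**
     * Picks the channels for one alert.
     * Assumption: the TV widget is used whenever the TV is on; otherwise the
     * set-top-box speakers are used, plus the Nabaztag if one is paired.
     */
    static List<Channel> channelsFor(boolean tvOn, boolean nabaztagPaired) {
        List<Channel> channels = new ArrayList<>();
        if (tvOn) {
            channels.add(Channel.TV_WIDGET);
        } else {
            channels.add(Channel.TEXT_TO_SPEECH);
            if (nabaztagPaired) {
                channels.add(Channel.NABAZTAG);
            }
        }
        return channels;
    }
}
```

A scheduler would call `AlertRouter.channelsFor(tvOn, nabaztagPaired)` when an alert fires and hand the alert text to each returned channel.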
Fig. 5.3 Local System’s web front-end
5.4.2 ElderCare’s Central Server
The Central Server offers a single façade from which managers of Local Systems (relatives or staff in a residence) can control ElderCare deployments. Figure 5.4 shows the Central Server’s web front-end, developed with GWT, a Google toolkit which transforms interfaces developed in Java into RIA applications. This front-end allows the configuration of a given deployment, for example the details of all residents and staff within a residence. Moreover, it allows the control of every Local System deployment. The Central Server is also responsible for managing the changing connection details of
individual Local Systems. In standard home settings, these are mostly associated with non-public IP addresses that may change from one day to the next. Therefore, Local Systems have to regularly report their currently assigned IP address to the Central Server. Thus, the Central Server, which is always available at the same domain name, is aware at any time of the current connection details and status (ON/OFF) of each Local System.
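The address reporting just described can be pictured with a small registry. This is a sketch under stated assumptions: the class `LocalSystemRegistry`, its methods and the timeout rule for deciding ON/OFF are hypothetical, since the chapter does not say how the status is derived from the periodic reports.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of how a central server could track Local System addresses. */
final class LocalSystemRegistry {

    /** Last report received from one Local System. */
    private static final class Report {
        final String ipAddress;
        final long receivedAtMillis;

        Report(String ipAddress, long receivedAtMillis) {
            this.ipAddress = ipAddress;
            this.receivedAtMillis = receivedAtMillis;
        }
    }

    private final Map<String, Report> lastReports = new HashMap<>();
    private final long timeoutMillis;

    /** @param timeoutMillis assumed silence interval after which a system counts as OFF */
    LocalSystemRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called whenever a Local System reports its currently assigned address. */
    void report(String systemId, String ipAddress, long nowMillis) {
        lastReports.put(systemId, new Report(ipAddress, nowMillis));
    }

    /** Returns the last reported address, or null if the system never reported. */
    String currentAddress(String systemId) {
        Report r = lastReports.get(systemId);
        return r == null ? null : r.ipAddress;
    }

    /** A system is considered ON if it reported within the timeout window. */
    boolean isOn(String systemId, long nowMillis) {
        Report r = lastReports.get(systemId);
        return r != null && nowMillis - r.receivedAtMillis <= timeoutMillis;
    }
}
```

Passing the clock value in as a parameter keeps the sketch deterministic; a deployed server would more likely read the system clock when a report arrives.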
Fig. 5.4 ElderCare’s Central Server web front-end
An interesting feature of the Central Server is its capability to react autonomously to unexpected or emergency situations in Local Systems. For that, it includes a rule-based system which adds reactive behaviour. Thus, when an anomalous situation is detected by any of the services deployed at a Local System, a rule may be triggered that notifies the residence staff or the elderly person’s relatives so that they can take action. The most interesting feature of this rule-based reactive system is that different rule engines may support different rule types (discrete and fuzzy rules), given that different rule and knowledge base pairs will coexist, where each pair may be associated with a given Local System or even with specific service instances within it.

5.4.3 ElderCare’s Mobile Client
As stated before, the storage of data in RFID tags is far from common. However, we believe passive HF RFID tags do have considerable data storage capabilities, which make them suitable for use as low-cost object-bound databases.

The NFC Forum specifies a data-packaging format called NDEF [19] to exchange information between an NFC device and another NFC device or an NFC tag. The specification defines the NDEF data structure format as well as rules to construct a valid NDEF message as an ordered and unbroken collection of NDEF records. Furthermore, it defines the mechanism for specifying the types of application data encapsulated in NDEF records. An NDEF record carries three parameters for describing its payload: the payload length, the payload type, and an optional payload identifier. Payload type names may be MIME media types, absolute URIs, NFC Forum external type names, or well-known NFC type names. The NFC RTD (Record Type Definition) specification is intended to support NFC-specific application and service frameworks by providing a means for the reservation of well-known record types and third-party extension types. Record type names are used by NDEF-aware applications to identify the semantics and structure of the record content.

Once the difficulties of reliably gathering data had been identified, the next step was to identify the data actually gathered and then used to enhance the services offered to patients. For that purpose, several interviews (including staff from an elderly care centre in Bilbao) were carried out. The results of these interviews reveal three clearly distinguished categories of data reporting:
• Report of daily regular activities: these reports correspond to the activities carried out every day by a resident, usually following a strict time pattern (e.g. wake-up time, food eaten, pills taken, ...).
• Report of non-daily, non-regular activities: visit data, incidents with staff or other patients, ...
• Report of activities outside the centre: transfers, visits to medical centres, holidays, outdoor activities, ...
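A report from any of these categories has to be packaged before it is written to a tag. Under the NDEF record parameters summarised earlier in this section (payload length, payload type and optional payload identifier), a single short record could be laid out as in the sketch below. The class `NdefShortRecord` is a hypothetical name, and using a MIME media type for care reports is an assumption; the chapter does not state which payload type ElderCare uses. The header flags and field order follow the NDEF record layout.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: builds one single-record NDEF message in short-record form. */
final class NdefShortRecord {
    private static final int FLAG_MB = 0x80;        // message begin
    private static final int FLAG_ME = 0x40;        // message end
    private static final int FLAG_SR = 0x10;        // short record: 1-byte payload length
    private static final int FLAG_IL = 0x08;        // ID length field is present
    private static final int TNF_MIME_MEDIA = 0x02; // payload type is a MIME media type

    /**
     * Encodes header, type length, payload length, optional ID length,
     * then type, optional ID and payload, in that order.
     * The id argument may be null or empty, since the identifier is optional.
     */
    static byte[] encode(String mimeType, byte[] id, byte[] payload) {
        byte[] type = mimeType.getBytes(StandardCharsets.US_ASCII);
        boolean hasId = id != null && id.length > 0;
        if (type.length > 255 || payload.length > 255 || (hasId && id.length > 255)) {
            throw new IllegalArgumentException("short record fields are limited to 255 bytes");
        }
        int header = FLAG_MB | FLAG_ME | FLAG_SR | TNF_MIME_MEDIA;
        if (hasId) {
            header |= FLAG_IL;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(header);
        out.write(type.length);
        out.write(payload.length);
        if (hasId) {
            out.write(id.length);
        }
        out.write(type, 0, type.length);
        if (hasId) {
            out.write(id, 0, id.length);
        }
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }
}
```

For example, `NdefShortRecord.encode("application/x-eldercare-log", null, logBytes)` would wrap one care log, where the MIME type shown is invented purely for illustration.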
Apart from what to record, a very important aspect is when to transfer the collected data to the hospital’s back-end. As previously mentioned, when those events are reported manually several hours after they actually happened, a lot of data is lost. Unfortunately, this data loss greatly affects the overall quality of the caring process and of the associated reports to the interested parties (doctors and relatives). Our solution to this problem is a tiered data storing and relaying system. When events happen, care staff can use NFC mobile clients to store event data on the user’s wristband. This data is stored in the mobile client too, so when the mobile comes near an ElderCare Local System, it relays the data via Bluetooth to that Local System, which forwards it to the server. If another caregiver wants to know what has happened to a user in the last few hours, the data can be found both on the user’s wristband and on the server, via the web RIA previously explained.
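The tiered storing and relaying scheme can be summarised in a few lines of code. The sketch below is illustrative only: `CareLogRelay`, `Wristband` and `LocalSystemLink` are hypothetical stand-ins for the mobile client, the RFID tag and the Bluetooth link, none of which are specified at this level in the chapter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the mobile client's store-and-relay behaviour. */
final class CareLogRelay {

    /** Stand-in for the log array held on a resident's RFID wristband. */
    static final class Wristband {
        final List<String> logs = new ArrayList<>();
    }

    /** Stand-in for the Local System end of the Bluetooth link. */
    interface LocalSystemLink {
        void forward(String logEntry);
    }

    /** Copies kept on the mobile until a Local System is in range. */
    private final List<String> pending = new ArrayList<>();

    /** Writes the entry to the wristband and keeps a copy on the mobile. */
    void record(Wristband wristband, String logEntry) {
        wristband.logs.add(logEntry);
        pending.add(logEntry);
    }

    /** Relays all pending copies to a Local System and returns how many were sent. */
    int relay(LocalSystemLink link) {
        int sent = 0;
        for (String entry : pending) {
            link.forward(entry);
            sent++;
        }
        pending.clear();
        return sent;
    }

    /** Number of entries recorded but not yet relayed. */
    int pendingCount() {
        return pending.size();
    }
}
```

Note that relaying clears only the mobile's pending copies; the wristband keeps its entries, which is what lets another caregiver read recent events directly from the tag.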
Fig. 5.5 Java ME Client Application

5.4.3.1 Data Encoding Features Analysis
In order to maximise the use of the storage available on the selected HF RFID tags, a special-purpose compressed object serialization method was developed. The following incremental data encoding formats are used internally to generate such serializations (see also Figure 5.6):
• Raw data, or Stringified: a human-readable string representation of a log entry.
• Encoded data: a transformation of a stringified log entry into a more compact representation.
• Serialized data: the byte serialization of the encoded log entries, which is the input to the compression step.
• Compressed data: the result of applying the Range Encoding compression algorithm to the serialized data.

Fig. 5.6 Data transformation

Each wristband stores some basic patient info and an array of log entries. To determine how many log entries could be stored in each format on each type of tag, some tests were carried out; their results are shown in Figure 5.7. We considered a sample of 234 care logs, taking into account common redundancies in care logging, in order to assess the capabilities of the proposed encoding format. For example, the same caretaker may record several logs during a day, most of the logs will be made at the same location, the same message (e.g. “visit to WC”) may be recorded several times in a day, and so on. Figure 5.7 compares the four incremental encoding formats of ElderCare’s Mobile Client. Considering care logs with an average size of 56 characters, a total of 34 and 164 messages could be stored in the 1K wristband and 4K watch tags, respectively, after discounting the 41 bytes occupied by the reduced electronic patient record (EPR). The sizes of the records used to store data in the 1K and 4K tags were 709 and 3200 bytes, respectively.

Fig. 5.7 Data Compression Tests (plot of size in bytes against number of messages, from 0 to 234, for the Stringified, Encoded, Serialized and Compressed formats, with the Mifare 1K and Mifare 4K capacities)
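The chapter does not detail how a stringified entry becomes an encoded one. One plausible way to exploit the redundancies mentioned above (the same caretaker, location or message recurring during a day) is dictionary encoding, sketched here purely as an assumption. The class `CareLogEncoder`, the semicolon-separated field layout and the one-byte codes are all hypothetical, and the dictionary is assumed to be available to both the writer and the reader.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical dictionary encoder for care logs; not the chapter's actual format. */
final class CareLogEncoder {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    /** Returns the one-byte code of a field value, adding it to the dictionary on first use. */
    private byte codeFor(String value) {
        Integer code = indexOf.get(value);
        if (code == null) {
            if (values.size() >= 256) {
                throw new IllegalStateException("dictionary full");
            }
            code = values.size();
            indexOf.put(value, code);
            values.add(value);
        }
        return (byte) code.intValue();
    }

    /** Encodes a semicolon-separated entry as one dictionary code per field. */
    byte[] encode(String stringified) {
        String[] fields = stringified.split(";", -1);
        byte[] out = new byte[fields.length];
        for (int i = 0; i < fields.length; i++) {
            out[i] = codeFor(fields[i]);
        }
        return out;
    }

    /** Rebuilds the stringified entry from its dictionary codes. */
    String decode(byte[] encoded) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < encoded.length; i++) {
            if (i > 0) {
                sb.append(';');
            }
            sb.append(values.get(encoded[i] & 0xFF));
        }
        return sb.toString();
    }
}
```

With this scheme, two entries by the same caretaker in the same room differ only in the code of their message field, which is the kind of redundancy a later compression step can also exploit.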
In conclusion, the compressed, serialized and encoded method of ElderCare’s Mobile Client reduces, on average, the size of a care log to 30% of its original size in stringified form, allowing a significant number of care logs to be recorded in a residence’s 1K Mifare wristband or 4K Mifare watch, i.e. 34 and 164, respectively. According to caretaking experts, this number of logs is sufficient, even in the case of the 1K wristbands, to store all the logs gathered for a resident in one day.
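The reported capacities can be cross-checked with simple arithmetic. The helper below is a hypothetical illustration, not part of ElderCare. It subtracts the 41-byte EPR from the record sizes given earlier (709 and 3200 bytes) and divides by the reported log counts (34 and 164) to obtain the average space each stored log occupies.

```java
/** Hypothetical helper for checking the reported tag capacities. */
final class TagCapacity {

    /** Bytes left for logs once the reduced EPR has been stored. */
    static int usableBytes(int recordBytes, int eprBytes) {
        return recordBytes - eprBytes;
    }

    /** Average bytes per log implied by a reported log count. */
    static double impliedBytesPerLog(int recordBytes, int eprBytes, int logCount) {
        return (double) usableBytes(recordBytes, eprBytes) / logCount;
    }
}
```

With the figures above, 668 usable bytes hold 34 logs on the 1K tag and 3159 usable bytes hold 164 logs on the 4K tag, i.e. roughly 19 to 20 bytes per log. Assuming one byte per character, that is about a third of the 56-character average stringified log.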
5.5 Conclusion and Further Work
This work has shown a novel AAL platform offering three distinguishing features: a) the ICT infrastructure provided is affordable and easily deployable both at elderly people’s own homes and in their residences, b) the infrastructure not only targets elderly people’s assistance but also helps caretakers in their work and keeps relatives and friends properly up to date on the elderly person’s progress, and c) it alleviates data management in caretaking by combining NFC mobiles and data storage on RFID tags. Importantly, the ICT infrastructure proposed approaches the ideal of “AAL for All” by being not only low-cost and easily deployable, but also attempting to ensure high user acceptability in terms of usability and accessibility, thanks to its iTV, RIA and NFC interfaces. Finally, it leverages OSGi service management capabilities to ensure service evolution, i.e. the capability to dynamically extend its functionality and cope with changes.

Future work will put the ElderCare platform into practice in a real deployment, a recently created residence which will be opened in November 2010. This will allow us to verify the strengths and limitations of the promising ElderCare ICT infrastructure. Technically, future work will also consider including videoconferencing capabilities between Local Systems and the Central Server or even authorised remote Local System managers.
Acknowledgements
We thank the Basque Government Industry Department for having funded the “AGUREZAINA: Platform to Automate and Control Assistive Residential Environments” project through the research program GAITEK 2008 (IG-2008/0000157) and GAITEK 2009 (IG-2009/0000066).
Chapter 6
An Ontology-based Context-aware Approach for Behaviour Analysis
Shumei Zhang, Paul McCullagh, Chris Nugent, Huiru Zheng
Computer Science Research Institute and School of Computing and Mathematics, University of Ulster, BT37 0QB, Northern Ireland, UK
[email protected]
Abstract: An ontology-based context-aware framework for behavior analysis and reminder delivery is described in this chapter. Such a framework may be used to assist elderly persons in maintaining a healthy daily routine and to help them live safely and independently within their own home for longer periods of time. Behavior analysis associated with the delivery of reminders offers strategies to promote a healthier lifestyle. Current studies addressing reminder-based systems have focused largely on the delivery of prompts for a prescribed schedule at fixed times. This is not ideal, given that such an approach does not consider what the user is doing and whether the reminder is relevant to them at that specific point in time. Our proposed solution is based upon high-level domain concept reasoning, to account for more complex scenarios. The solution, referred to as iMessenger, addresses the problem of efficient and appropriate delivery of feedback by combining context such as current activity, posture, location, time and personal schedule to manage any inconsistency between what the user is expected to do and what the user is actually doing. The ontology-based context-aware approach has the potential to integrate knowledge and data from different ontology-based repositories. Therefore, iMessenger can utilize a set of potential ontological, context-extracting frameworks to locate, monitor, address and deliver personalized behaviour-related feedback, aiding people in the self-management of their well-being.
6.1 Introduction
People are becoming more motivated to maintain their health and avoid illness. Full engagement with regular physical activity may reduce symptoms associated with chronic diseases such as heart disease and chronic pain [1]. Analysing behavior and acting upon the results offers strategies to promote a healthier lifestyle [2]. Deployment of ubiquitous computing in a smart environment (e.g., an intelligent home) has the potential to support the monitoring of activities of daily living (ADLs) and hence provide sufficient data for behavior analysis. Activity monitoring combined with the delivery of context-aware reminders has the ability to improve well-being, especially for elderly persons living independently in their home. More specifically, activity-aware reminders can stimulate people to engage in a set of predefined activities and hence improve their behavior. Current reminder systems normally deliver messages according to a predefined routine, based only on fixed times [3]. With such an approach the system does not take into account what the user is doing and whether the reminder will be useful or relevant to them at that particular point in time. For example, the user may deviate from their normal routine and get up in the morning at 6:50 am as opposed to their normal time of 7:00 am. Without taking such context-related information into account, the system may deliver an alarm and a message reminding the user to get up when the person is already up and preparing their breakfast. Users are likely to perceive such a reminder as ineffective and, most probably, annoying. In order to deliver reminders effectively, it is necessary to take into account the multiple contexts of the user, such as their current location, activity and vital signs, and environmental variables such as background noise. Ontologies have the ability to describe the concepts and relationships that are important in a particular domain, providing a vocabulary for that domain and a computerized specification of the meaning of terms used in the vocabulary [4].
An important technology for a context-aware system is the integration of intelligent agents that utilize knowledge and reasoning to understand the contexts and share this information. Ontologies support the sharing, integration, reuse and processing of domain knowledge [5]. Ontology-based solutions have utilized intelligent agents employing knowledge and reasoning to understand the wider context and to share this information in the support of applications and interfaces [6]. In our current work we have developed an ontology-based context-aware approach, referred to as iMessenger (intelligent messenger), for activity monitoring and behavior analysis. Within this model a smart phone and an intelligent environment were used for data collection. Ontology modeling and querying were used for the behavior analysis and for the delivery of context-aware reminders based on temporal relationships between multiple contexts. The remainder of the chapter is structured as follows: related work is presented in Section 6.2 and the ontological context extractions are presented in Section 6.3. Section 6.4 focuses on the ontological modeling and presentation, in addition to the construction of the knowledge base (KB) and querying the ontology. In Section 6.5 the algorithms are evaluated through the simulation of real-life scenarios. Discussion of the approach and future work is presented in Section 6.6.
6.2 Related Work
Context-aware systems have been proposed in several domains such as healthcare [6], activity recognition [7] and reminder-based applications [3]. Combining context-aware systems with mobile devices has the potential to increase the usability of the services to be delivered [8]. Most reminder systems can be viewed as scheduling aids which have the ability to assist users in the management of their ADLs over an extended period of time. Early technology for scheduling aids provided prompts for prescribed activities at fixed times, specified in advance. These systems function in a similar manner to an alarm clock [9]. PEAT [10], a cognitive orthotic system, adopted automated planning technology and provided visible and audible clues about plan execution using a handheld device. Donnelly et al. [11] proposed a reminder system based around personalized video prompts delivered via a mobile phone. Autominder [12] provided adaptive, personalized reminders of (basic, instrumental, and extended) ADLs to help elderly people adapt to cognitive decline. Autominder was deployed on a mobile robot and included three components, Plan Manager, Client Modeler and Reminder Generation, to model an individual's daily plans, observe and reason about the execution of those plans, and make decisions about whether and when reminders were given. The robot's on-board sensors could report which room the client was in. Place-Its [13], comMotion [14], and PlaceMail [15] used location awareness for reminder delivery, given that location is an important contextual factor in human daily life. Nevertheless, deciding reminder delivery by location alone is not effective, since location does not provide sufficient information to estimate what the user is doing, what task they have performed, and whether the reminder would be an undesirable interruption. A well designed model is important in any context-aware system.
Context modeling expresses all the major elements of a system and the relationships between the elements. The model shows at an abstract level how the system functions and explores possible interactions. Strang et al. [16] classified context modeling approaches into six categories: (1) Key-Value, (2) Markup Scheme, (3) Graphical, (4) Object Oriented, (5) Logic Based, and (6) Ontology Based. The context modelling approaches were evaluated for ubiquitous computing environments based on the requirements of distributed composition, partial validation, richness and quality of information, incompleteness and ambiguity, level of formality, and applicability to existing environments. Based on their survey, they proposed that the ontology-based approach was the most expressive approach and best met the above requirements. Ontology-based modelling and representation has been adopted in pervasive computing, in particular for assistance with the completion of ADLs. For example, Kang et al. [6] proposed a context-aware ubiquitous healthcare system, using a watch-type sensor and a chest belt sensor to gather vital signs (e.g. heart rate, respiration, temperature and blood pressure) and send them to a personal digital assistant (PDA). In their PDA system, an ontology-based context model using the web ontology language (OWL) was designed to identify a person, devices and their health status. The Jess reasoning engine was used as a context interpreter to abstract information from the user's vital sign data. The abstracted contexts were used for both self health checking and remote health monitoring. Chen and Nugent [5] used an ontology-based approach for activity recognition. They showed that ADL models were flexible and could be easily created, customized, deployed and scaled. Chen et al. [17] proposed a collection of ontologies called COBRA-ONT for describing places, agents, events and their associated properties in an intelligent meeting room domain. A number of query languages have been developed for relational context retrieval. For example, Pung et al. [18] presented a context-aware middleware for pervasive homecare, in which structured query language (SQL) was used for context queries from multiple physical spaces.
Nevertheless, the SQL-relational combination for querying has limitations, such as the lack of abstract, domain-level query support, the lack of the notion of a hierarchy, and minimal temporal query support [4]. SPARQL [19] is currently the standard RDF (Resource Description Framework) query language. Cyganiak [20] proposed a relational SPARQL model which made a correspondence between SPARQL queries and relational algebra queries over a single relation. It also summarized a translation between SPARQL and SQL. Nevertheless, SPARQL has no native understanding of OWL, and has no knowledge of the language constructs [21]. OWL and SWRL (Semantic Web Rule Language) are core Semantic Web languages. OWL is an ontology language for constructing ontologies that provide a high-level description of Web content [21]. SWRL is a combination of RuleML and OWL [22]. One of SWRL's most powerful features is its ability to support built-ins [23], such as SQWRL (Semantic Query-enhanced Web Rule Language). Additionally, a library of SWRL temporal built-ins can be used in SWRL rules to perform temporal operations. OWL, SWRL and SQWRL have been widely used to integrate low-level representations of relational context data with high-level domain concept reasoning for healthcare applications [24, 25]. iMessenger uses ontology-based ADL domain modelling, SQWRL queries and temporal reasoning for context-aware behaviour analysis with the delivery of reminders. This approach combines multiple contexts such as current status, expected status and location to infer inconsistency between what the user is expected to do and what they are currently doing, with the intention of providing the most helpful level of support possible.
6.3 Data Collection and Ontological Context Extraction
iMessenger aims to monitor a user's ADLs and obtain the multiple contexts of when, where and what the user is doing. At the same time the system aims to infer whether the user is in the correct place at the correct time and undertaking the correct activity. In a preliminary study, three kinds of sensors were used to collect data in relation to the user:
• Accelerometer: embedded in a smart phone and used to measure the user's acceleration of movement during their completion of ADLs. The phone is belt-worn on the left side of the waist.
• GPS (global positioning system): embedded in the smart phone and used for locating the user when outside of the home.
• RFID (Radio-frequency identification): detects the location of the user when inside the home.
In addition, three independent modules (Activity Monitoring, Location Detection and Schedule Design) are used to extract the useful contexts: activity postures, locations and events in personal routines.
6.3.1 Activity Context Extraction
The Activity Monitoring module has been described in our previous work [26–28]. It uses a hierarchical classification approach: rule-based reasoning first discriminates whether the user's activity is motion or motionless, and two multiclass SVM (support vector machine) classifiers then classify the motion and motionless activities, respectively [27]. The motion activities were classified as fall, walking, gentle motion (Gmotion), and posture transition (PT) [26, 27]. Motionless activities were classified as nine postures: four sitting postures, namely sitting normal (Sit-N), sitting back (Sit-B), and sitting leaning left and right (Sit-L and Sit-R); two standing postures, namely standing upright (Sta-U) and standing forward (Sta-F); and three lying postures, namely lying right (Lyi-R), lying back (Lyi-B) and lying face down (Lyi-Fd). Fig. 6.1 presents an overview of the classification framework [28].
Fig. 6.1 Activity context extraction using a hierarchical classification algorithm.
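The two-stage dispatch described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature (spread of the acceleration magnitude), the threshold value and the two stand-in classifier functions are all hypothetical, and the stand-ins merely take the place of the trained multiclass SVMs from [26–28].

```python
from statistics import pstdev
from typing import Callable, Sequence

# Hypothetical threshold on the spread of acceleration magnitude (in g).
MOTION_THRESHOLD = 0.05

Classifier = Callable[[Sequence[float]], str]


def is_motion(window: Sequence[float], threshold: float = MOTION_THRESHOLD) -> bool:
    """Stage 1 (rule-based): call a window 'motion' if its spread is large."""
    return pstdev(window) > threshold


def classify(window: Sequence[float],
             motion_clf: Classifier,
             posture_clf: Classifier) -> str:
    """Stage 2: route the window to the classifier for its branch."""
    if is_motion(window):
        return motion_clf(window)
    return posture_clf(window)


# Stand-ins for the two trained multiclass SVMs (assumed, for illustration only).
def toy_motion_clf(window: Sequence[float]) -> str:
    return "fall" if max(window) > 2.0 else "walking"


def toy_posture_clf(window: Sequence[float]) -> str:
    return "Sta-U" if sum(window) / len(window) > 0.9 else "Lyi-B"
```

In a real system the two stand-in functions would be replaced by the prediction methods of the trained classifiers; the routing logic in classify would stay the same.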
6.3.2 Location Context Detection
The Location Detection module provides details of the location of the user while they are indoors and outdoors through the use of RFID and GPS, respectively. The RFID system has several components: readers, reference tags, mobile tags, and communication between the tags and the readers [29]. Fig. 6.2 shows the RFID system configuration in a simulated smart home. Three RFID readers (R1, R2 and R3) and five reference tags (Rtag1, Rtag2, Rtag3, Rtag4, and Rtag5) are distributed throughout the house whilst the mobile tag (Tag) is carried by the user. Several reference tags can be deployed at different locations within a single room, such as one near the kettle and one near the cooker. Thus the system can obtain details about the user's location based on the similarity in signal strength between the reference tags and the mobile tag. Using the RFID system and the nearest neighbor algorithm, iMessenger can estimate the user's location details as shown in Tab. 6.1. For example, Fig. 6.2 shows that reader 1 (R1) is deployed at the study room desk. The RFID system can measure the signal strength between the readers and the tags (mobile tag and reference tags). If the signal strength measured from the mobile tag shows that the Tag (person) is beside R1 (i.e. Reader 1 measures a very strong signal from the mobile tag), then the system classifies the person's location as StudyRoom.desk.
[Figure: simulated smart home with areas labelled Bedroom, StudyRoom, LivingRoom, Kitchen, Bathroom and Stairs.]
Fig. 6.2 An example of the RFID system deployment for indoor location detection. R=RFID reader, Rtag=reference tag, Tag=mobile tag.
Table 6.1 Location details obtained from the RFID system.
Reader/tag         Relationship     Location
R1                 Beside           StudyRoom.desk
R1                 Nearby           StudyRoom.else
Rtag1              Beside           Bedroom.bed
Rtag1              Nearby           Bedroom.else
Rtag2              Beside/Nearby    Stairs
Rtag3              Beside           LivingRoom.sofa
Rtag3              Nearby           LivingRoom.else
R2 and Rtag4       Beside R2        Kitchen.kettle
R2 and Rtag4       Beside Rtag4     Kitchen.cooker
R2 and Rtag4       Between both     Kitchen.else
Rtag5 and Rtag6    Beside Rtag5     Bathroom.toilet
Rtag5 and Rtag6    Beside Rtag6     Bathroom.shower
Rtag5 and Rtag6    Between both     Bathroom.else
Note: Beside and nearby represent the relationships between the mobile tag and Readers or reference tags.
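The nearest neighbor step mentioned above can be illustrated with a short sketch. It assumes that each tag is described by the signal strengths measured at the three readers and that similarity is Euclidean distance between those vectors; the chapter does not specify either detail, and all signal-strength values below are invented for the example.

```python
import math

# Hypothetical signal-strength readings (dBm) of three reference tags as seen
# by readers R1, R2 and R3. The numbers are invented for illustration.
REFERENCE_READINGS = {
    "Rtag1": (-70.0, -85.0, -90.0),
    "Rtag3": (-82.0, -60.0, -88.0),
    "Rtag4": (-90.0, -75.0, -58.0),
}

# Location assigned when the mobile tag is beside each reference tag (Tab. 6.1).
TAG_LOCATION = {
    "Rtag1": "Bedroom.bed",
    "Rtag3": "LivingRoom.sofa",
    "Rtag4": "Kitchen.cooker",
}


def nearest_reference_tag(mobile, references=REFERENCE_READINGS):
    """Return the reference tag whose readings are closest to the mobile tag's."""
    return min(references, key=lambda tag: math.dist(mobile, references[tag]))


def estimate_location(mobile):
    """Map the mobile tag's readings to the location of its nearest reference tag."""
    return TAG_LOCATION[nearest_reference_tag(mobile)]
```

For instance, a mobile tag reading of (-88.0, -73.0, -60.0) lies closest to the invented Rtag4 vector, so estimate_location returns Kitchen.cooker.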
Detailed location information can assist in the analysis of the relationships between activity postures and events. For example, if iMessenger determines that a user's activity posture is sitting and their location is StudyRoom.desk, it may infer that the event is reading, using the computer, listening on the telephone, writing or a similar activity. If the user's activity posture is sitting and their location is Kitchen.kettle, the system may infer that the user is drinking coffee/tea. Additionally, the relationship between activity posture and location is useful for analysing whether the user's activity is normal or abnormal. For example, if the user's activity posture is sitting and their location is near Kitchen.cooker, iMessenger will infer that the activity posture is abnormal, and an alert message can be raised.
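As a rough illustration of this kind of inference, the sketch below encodes the three examples from the text as lookup tables. The table structure, the function name and the returned dictionary are assumptions made for the example; iMessenger itself performs this reasoning over its ontology rather than with hard-coded tables.

```python
# Hypothetical lookup tables; the entries echo the examples given in the text.
LIKELY_EVENTS = {
    ("sitting", "StudyRoom.desk"): [
        "reading", "using the computer", "listening on the telephone", "writing",
    ],
    ("sitting", "Kitchen.kettle"): ["drinking coffee/tea"],
}

# Posture and location combinations treated as abnormal.
ABNORMAL = {("sitting", "Kitchen.cooker")}


def interpret(posture, location):
    """Return the inferred status, candidate events and whether to raise an alert."""
    key = (posture, location)
    if key in ABNORMAL:
        return {"status": "abnormal", "events": [], "alert": True}
    return {"status": "normal", "events": LIKELY_EVENTS.get(key, []), "alert": False}
```

Combinations that appear in neither table are reported as normal with no candidate events, which is a simplification chosen for the sketch.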
6.3.3 Schedule Design
The Schedule module is used to obtain the event-related information that predicts when, where and what the user will do in his/her routines. A suitable schedule plays an important role in healthcare, and activity monitoring and behaviour analysis can be used to judge how effective a schedule is. In order to infer inconsistency between what the user is expected to do according to the schedule and what the user is actually doing, the schedule not only provides an event list, but also enumerates some of the possible locations and activity postures for performing each event. An event such as breakfast is naturally performed over a set of intervals, in different locations and in various activity postures: for example, the time spent heating water (location=kitchen.kettle, activity=standing), the time spent cooking (kitchen.cooker, walking and standing), and the time spent eating and washing (kitchen.else, sitting, walking and standing). Fig. 6.3 depicts the hierarchical approach to representing knowledge of the temporal relations between the intervals in which the event is performed. Note that the schedule design should allow for uncertainty in temporal information. Often, the exact times at which the user will be in a given location, and the activity postures the user will adopt during the event interval, are uncertain. For example, the time at which the user is standing beside the kettle in the kitchen is flexible during breakfast. Nevertheless, there are constraints on the possible locations and activity postures used to perform this event during the scheduled interval. In Fig. 6.3, the interval I(E) is the time period in which an event occurs. The intervals I1(L1), I2(L2) and I3(L3) are a sequence of subintervals of the event interval, each covering the time spent in one location (L1, L2, . . .). In addition, each location interval can be further decomposed into a series of activity intervals, one for each activity posture (A1, A2, . . .) performed during the location interval. Furthermore, the temporal relationship between each pair of consecutive location intervals or activity intervals is defined as meets, which means that the two intervals are adjacent: one ends exactly when the next begins.
Fig. 6.3 A hierarchical representation of intervals for temporal constraints between each pair of intervals in an event performance cluster. Where I=interval, L=location, A=activity, d=during, m=meets.
Representing the relevant temporal knowledge as hierarchical intervals (Fig. 6.3) is convenient for temporal reasoning. It is particularly useful in the ADL domain, where temporal information is imprecise and relative.
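The during and meets constraints of Fig. 6.3 can be checked with a few lines of code. The sketch below is an assumption-laden illustration: intervals are represented as minutes since midnight, during is relaxed to allow shared endpoints (so that the first subinterval may start with the event), and the function names are hypothetical.

```python
from typing import NamedTuple, Sequence


class Interval(NamedTuple):
    start: int  # minutes since midnight
    end: int


def meets(a: Interval, b: Interval) -> bool:
    """a meets b: a ends exactly where b begins (no gap, no overlap)."""
    return a.end == b.start


def during(inner: Interval, outer: Interval) -> bool:
    """Relaxed 'during': inner lies within outer, shared endpoints allowed."""
    return outer.start <= inner.start and inner.end <= outer.end


def is_valid_decomposition(event: Interval, parts: Sequence[Interval]) -> bool:
    """Check that every part is during the event and consecutive parts meet."""
    if not all(during(part, event) for part in parts):
        return False
    return all(meets(a, b) for a, b in zip(parts, parts[1:]))
```

The same check applies at both levels of the hierarchy: location intervals against the event interval, and activity intervals against their location interval.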
6.4 Ontological ADL Modelling and Knowledge Base (KB) Building
Ontology-driven applications confront the challenge of integrating data held in multiple formats with the information stored in ontologies. The design, reuse, maintenance, and integration of ontologies are all challenging [30]. High quality ontologies should be meaningful, correct and minimally redundant, whilst providing sufficiently detailed descriptions. OWL can formalize a domain by defining hierarchies of classes and relating the classes to each other using properties [31]. It can also define individuals and assert properties about them, and it provides mechanisms for reasoning about these classes and individuals. Additionally, OWL offers a powerful axiom language for precisely defining how to interpret concepts in ontologies. iMessenger uses OWL (Protégé-OWL) to model the ADL domain by defining classes of person, activity, location and scheduled event, whilst using body state and spatio-temporal properties to represent the relationships between them.
6.4.1 Ontological Modelling
Ontologies are defined as the representation of the semantics of terms and their relationships. They consist of classes (concepts), properties, instances (individuals), and relations between classes. The iMessenger ontology is categorized into five main classes with 12 object properties and 5 data-type properties, as shown in Tab. 6.2.
(i) Person, the class of all users. Each person has three properties, hasName, hasAge and hasCondition (such as high blood pressure), to describe the user's basic information.
(ii) Activity, the class of body movement that describes what postures the user has when performing an event. The properties hasName, monitoredActivity, hasStartTime and hasFinishTime are defined to describe who, when and what posture the user had. All postures are categorized into either motion or inactivity subclasses. The activity postures can be classified as nine inactivity postures and four motion postures by the activity monitoring module (Fig. 6.1).
(iii) Location, the class of positions that describes where the user is at a given time. The properties hasName, hasStartTime, hasFinishTime and locatedIn are used to express who, when and where the user has been. The indoor location can be detected as which room, and which part of a room, by the RFID system (Tab. 6.1). The outdoor location can be detected by the GPS system.
(iv) Scheduled Event (SEvent), the class of all events that describes the main tasks for a user's routines. Events are organized using the schedule. The properties hasName, hasID, hasEvent, hasStartTime and hasFinishTime are defined to illustrate who, what event and when the person should undertake it. The properties performedIn and performedActivity are defined to describe where and in what activity postures the event is implemented. For example, the event 'undertake exercise' will be implemented outside by walking.
(v) temporal:Entity, a Protégé-OWL built-in class used to represent temporal durations and temporal constraints among the SEvent, Activity and Location classes. We selected a subset of the temporal:Entity properties, such as hasGranularity, hasStartTime and hasFinishTime, to describe temporal information such as when and for how long. The temporal granularity can take values of days, months, years, hours, minutes, seconds, or milliseconds.
Table 6.2 iMessenger ontology classes and properties
Class (subclasses)               Properties
Person                           hasName, hasAge, hasCondition
Activity (Motion, Inactivity)    hasName, monitoredActivity, hasStartTime, hasFinishTime
Location (Indoor, Outdoor)       hasName, locatedIn, hasStartTime, hasFinishTime
SEvent (scheduled event)         hasName, hasID, hasEvent, hasStartTime, hasFinishTime, performedIn, performedActivity
temporal:Entity                  hasGranularity, hasStartTime, hasFinishTime
Fig. 6.4 iMessenger ontological modeling and the relationships among the multiple classes.
Fig. 6.4 shows the relationships between the multiple classes. In order to answer queries such as “when, where and what is the user doing?” we need to represent temporal knowledge and the relationships between temporal intervals. The time course of events is therefore important for schedule-based behaviour analysis. The temporal entity class links to the three classes (SEvent, Activity and Location) through the corresponding time and duration properties (hasStartTime, hasFinishTime and hasGranularity). Additionally, the class SEvent links to Location through a pair of inverse properties, performedIn (performed location) and locatedIn (detected location), and to Activity through a pair of inverse properties, performedActivity and monitoredActivity. If, at a given time, the values within each of the two pairs of properties are consistent, we deduce that the person is following their schedule correctly. In this way, iMessenger can infer what the person is expected to do and what the person is actually doing from the relationships among the multiple classes.
6.4.2 Knowledge Base Building
Knowledge base (KB) building is the act of integrating and populating the ontologies with instances acquired from databases. The ADL ontology shapes the ADL knowledge representation: it specifies, at a higher level, the classes of concepts that are relevant to the ADL domain and the relations that exist between these classes. The KB consists of a terminological component (Tbox) and an assertion component (Abox). The Tbox comprises a set of definitional vocabularies. The Abox contains a set of facts associated with the terminological vocabularies; Abox statements are associated with individual instances. iMessenger extracts the contexts of activity posture and location from raw data sensed using wearable and smart home sensors, and saves all contextual information related to activity, location, scheduled events and personal information in separate worksheets of a spreadsheet. The spreadsheet can subsequently be used to populate the KB in the Protégé environment. Protégé is an ontology editor and knowledge base framework developed by Stanford University [32]. Protégé-OWL has two plug-ins, DataMaster and MappingMaster, that support direct data import/mapping into an ontology. We used the MappingMaster tab, since it can map any spreadsheet to an OWL ontology using a powerful OWL-based mapping language [33]. It can also take Object Properties into account, and provides better control over how data can be manipulated within the ontology.
Fig. 6.5 The InstanceTree tab showing the instances of the selected class and the instance form ontologies.
The screenshot in Fig. 6.5 shows the SEvent class populated from the schedule worksheet in the Protégé environment. The left-hand pane displays the instances of the selected class, in this case the six instances of the SEvent class. The right-hand pane shows the populated instance form for the selected instance. The SEvent class defines seven properties (shown in Tab. 6.2), hence every instance in this class needs a value for each of the seven properties. For example, the instance form on the right shows that the event CoffeeOrTea was scheduled from 9:00 am to 10:00 am. Its performedActivity property has multiple values, which means that the subject could be Sit-N, Sta-U or Walking during his coffee time; the performedIn values indicate that the location could be Kitchen or LivingRoom.
6.5 Experiments
iMessenger is capable of identifying whether people keep healthy postures and comply with their prescribed schedule during their daily life. In this initial study, the results of behaviour analysis were delivered as offline reminders using a set of rules. In the following sections we describe the functionality of the main components within iMessenger and how it could be used to identify people whose behaviour is either in a healthy state or could be improved.
6.5.1 iMessenger Ontologies
Fig. 6.6 shows the classes, object properties and data-type properties comprising the iMessenger ontologies, created using Protégé-OWL (version 3.4.4). In the study, the ontology focuses primarily on activity, location and event relevant behaviour analysis, but can easily be adapted by adding on other elements for more complicated behaviour analysis.
Fig. 6.6 iMessenger ontological classes, object properties and datatype properties created in Protégé.
The grouping and ontological naming concepts are self-explanatory; however, a few details need to be explained. The class temporal:Entity is part of a valid-time temporal ontology model which exists in the standard Protégé-OWL repositories and can be imported. It also uses a library of SWRL built-ins to perform temporal ontology operations [34]. The temporal ontology provides OWL entities for representing granularity, duration, valid-times, and propositions, as shown in Fig. 6.6. In this study, two of its subclasses were used: temporal:Granularity and temporal:ValidPeriod. The valid period has temporal:hasStartTime and temporal:hasFinishTime data-type properties. Relationships between the multiple classes are represented by their properties, as previously shown in Fig. 6.4.
6.5.2 Rules Definition
In order to discover consistency or inconsistency in the knowledge, several rules were defined for behaviour analysis and reminder delivery. Three main types of rule were defined:
• Falls alert: sent to a nominated healthcare assistant's or relative's mobile phone if the user falls.
• Unhealthy posture reminder: reminds the user to change their posture when they have been in an unhealthy posture for more than a predefined period of time.
• Scheduled event inconsistency reminder: issued if any inconsistency is inferred between what the user is expected to do and what they are actually doing.
Accordingly, three kinds of reminder rules were defined using the properties of the classes (Activity, Location and SEvent) and the temporal relationships between them. The rules are represented using description logic as follows:
(1) fallsAlert ≡ IsMonitoredActivity(∃fall, yes) → Alert to a caregiver
(2) postureReminder ≡ IsMonitoredActivity(∃unhealthyPosture, yes) ∧ IsActivity.Duration(predefinedPeriod, yes) → Reminder to Person being Monitored
(3) eventReminder ≡ IscurrentTime(SEvent(ID).startTime, yes) ∧ IscurrentTime(SEvent(ID).endTime, yes) ∧ (IsLocatedIn(∃SEvent(ID).performedIn, No) ∨ IsMonitoredActivity(∃SEvent(ID).performedActivity, No)) → Reminder to Person being Monitored
(4) healthyPosture ∈ {Sit-N, Sta-U, Lyi-R, Lyi-B}
(5) unhealthyPosture ∈ {Sit-B, Sit-R, Sit-L, Sta-F, Lyi-Fd}
The fallsAlert is triggered by a fall or by abnormal activities such as lying in the bathroom. The postureReminder is determined by two facts: an unhealthy posture and its duration. Here, unhealthyPosture is defined by five postures: Sit-B, Sit-R, Sit-L, Sta-F and Lyi-Fd [28]. The eventReminder is established by the relationships among temporal interval, event, location and activity. If the current time is within a scheduled event interval, but the detected location or the monitored activity of the person is inconsistent with the location or activity specified for the event, then an event-relevant reminder will be generated.
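The three reminder rules can be paraphrased in ordinary code to make their conditions explicit. The following is a minimal sketch in Python, not the SWRL implementation used in the study: the class `ScheduledEvent`, the function names and the 15-minute threshold in the test values are hypothetical, and the falls rule is reduced to a single "fall" label (abnormal activities such as lying in the bathroom are not modelled here).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Posture set taken from rule (5).
UNHEALTHY_POSTURES = {"Sit-B", "Sit-R", "Sit-L", "Sta-F", "Lyi-Fd"}


@dataclass
class ScheduledEvent:
    """Hypothetical stand-in for an SEvent individual in the ontology."""
    start: datetime
    finish: datetime
    performed_in: frozenset        # locations where the event is expected
    performed_activity: frozenset  # postures expected during the event


def falls_alert(monitored_activity):
    # Rule (1): a detected fall triggers an alert to a caregiver.
    return monitored_activity == "fall"


def posture_reminder(monitored_activity, duration, predefined_period):
    # Rule (2): unhealthy posture held longer than the predefined period.
    return monitored_activity in UNHEALTHY_POSTURES and duration > predefined_period


def event_reminder(now, located_in, monitored_activity, event):
    # Rule (3): inside the event interval, but wrong location OR wrong activity.
    within_event = event.start <= now <= event.finish
    wrong_location = located_in not in event.performed_in
    wrong_activity = monitored_activity not in event.performed_activity
    return within_event and (wrong_location or wrong_activity)


# Breakfast event from the scenario; "Walking" is assumed to denote the
# same posture as "Walk" in the schedule.
breakfast = ScheduledEvent(
    start=datetime(2010, 5, 12, 7, 30),
    finish=datetime(2010, 5, 12, 8, 30),
    performed_in=frozenset({"Kitchen"}),
    performed_activity=frozenset({"Walking", "Sta-U", "Sit-N"}),
)
```

For instance, `event_reminder(datetime(2010, 5, 12, 7, 35), "Bathroom", "Sit-N", breakfast)` evaluates to `True`, because the posture is acceptable but the location is not.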
6.5.3
Case Study: Querying the Ontology Using SQWRL
Protégé-OWL supports editing and executing SWRL rules through the SWRL Rules Tab. SWRL was developed to add rules to OWL, and it provides deductive reasoning capabilities that can infer new knowledge from OWL individuals [23]. SWRL rules can be executed using either the SQWRLQueryTab or the SWRLJessTab, both of which are plug-ins to the SWRL Tab contained in the Protégé-OWL 3.4 distribution. This study used the SQWRLQueryTab to execute SQWRL queries. SQWRL is a SWRL-based language for querying OWL ontologies that also provides SQL-like operations to format knowledge retrieved from OWL [35]. The SQWRLQueryTab provides a convenient way to visualize the results of queries on OWL ontologies using the Jess rule engine [36].

Consider a simple example scenario of this algorithm in operation. Peter, an elderly person, lives alone at home. To help Peter live at home independently and achieve a well-balanced lifestyle, he uses iMessenger to monitor him and to remind him to keep a healthy posture and to follow a guided schedule of his routines correctly. For example, Fig. 6.7 shows an excerpt from Peter’s routine. The top of Fig. 6.7 shows that the scheduled time for Peter’s breakfast is from 7:30 am to 8:30 am. This event is performed in the Kitchen, where Peter could be beside the cooker, the kettle or anywhere else in the kitchen, and the activity postures could be Walk, Sta-U or Sit-N. The detected locations, which fall into five subintervals during his breakfast interval, are shown at the bottom of Fig. 6.7. The monitored activity postures during each of the location intervals are shown in the middle of Fig. 6.7. Here we use the scenario of a breakfast event to show how the SWRL rules and SQWRL queries are used to analyze a user’s behaviour and provide useful feedback. The behaviour analysis involves determining a user’s activity and whether they have fallen or have unhealthy postures that require the immediate delivery of a reminder.
In addition, the details of consistency and inconsistency between a user’s monitored status and expected status will be reported. If the monitored locations and activities are consistent with the performed locations and performed activities at the same time, iMessenger will infer that the user is following the schedule correctly. Otherwise, if any inconsistency is found between the performed locations and the detected locations, or between the performed activities and the monitored activities, it can be inferred that the user has deviated from their schedule.

[Figure 6.7. Top: Scheduled Event: Breakfast (7:30~8:30); Performed In: Kitchen (cooker, kettle, else); Performed Activity postures: Walk, Sta-U, Sit-N. Middle: monitored postures per interval (Sit-N, Sit-L, Sit-B, Sta-U, Sta-F, Walking, Downstairs). Bottom: timeline of detected locations: Bathroom.toilet (7:30~7:38), Stairs (7:39~7:41), Kitchen.cooker (7:41~7:54), Kitchen.kettle (7:55~8:20), Kitchen.else (8:20~8:34).]
Fig. 6.7 An excerpt of Peter’s schedule with monitored activity and location during his breakfast. The bottom displays five locations and the respective time intervals; the monitored postures during each interval are shown in the middle; with the details of scheduled breakfast above.

The inference rules were written using SWRL rules and SQWRL queries in Protégé (the rules are described in Section 6.5.2). Fig. 6.8 shows the details of the consistentActivityAndLocation rule. The pseudo code below paraphrases the logical conjunction between the multiple temporal constraints in this query expression.

****************************************************************
Pseudo code for Rule consistentActivityAndLocation
IF "p" Is a Person AND "p" has name "n"
AND IF "a" Is an Activity
    AND "a" has monitoredActivity "ma"
    AND "ma" has StartTime & FinishTime
    AND "ma" got Duration "aDuration=FinishTime-StartTime"(Minutes)
AND IF "lc" Is a Location
    AND "lc" has locatedIn "li"
    AND "li" has StartTime & FinishTime
    AND "li" got Duration "lDuration=FinishTime-StartTime"(Minutes)
AND IF "s" Is a Schedule
    AND "s" has event "e"
    AND "e" has performedIn "pi" & performedActivity "pa"
    AND "e" has StartTime "start" & FinishTime "finish"
AND IF "ma" Is an element of "pa" during the interval("start","finish")
    AND "li" Is an element of "pi" during the interval("start","finish")
THEN Query and show the consistent results.
    The columns include "Person Name", "Event",
****************************************************************

Fig. 6.8 Details of consistentActivityAndLocation rule written using SWRL rule and SQWRL query.

Fig. 6.9 shows the result after executing the query consistentActivityAndLocation, which infers the consistency of monitored activities and locations with performed activities and performed locations during Peter’s breakfast time. The result shows that Peter followed his breakfast schedule from 7:41am; however, he was delayed by 11 minutes from his scheduled time of 7:30am. Feedback could be saved as an instance for the reminder delivery, such as “you are late by about 11 minutes for breakfast on 12th May 2010”.

Fig. 6.9 The result following execution of the highlighted consistentActivityAndLocation query.

It is important to combine multiple contexts for human behaviour analysis. For example, if only the current activity postures were considered, the result would include the postures of Sit-N and Sta-U between 7:30am and 7:38am as shown in Fig. 6.7. However, the user’s location is inconsistent with the performed location during that time; hence these postures are not included in the final consistent result. In a similar manner, the other rules such as inconsistentActivityAndLocation, fallsAlert, and unhealthyPostures can supply corresponding results. Fig. 6.10 shows the results after executing the inconsistentLocations query. According to Peter’s schedule, he should be in the Kitchen to prepare and eat his breakfast from 7:30am; however, the result of inconsistent locations shows that he was in Bathroom.toilet between 7:30am and 7:38am, and on the Stairs between 7:39am and 7:41am, although his monitoredActivity (Sit-N, Sta-U, Walking) was consistent with the performedActivity during that period. In this case, feedback could be saved as “Peter, you were in the wrong location from 7:30am to 7:38am during your breakfast time on 12th May 2010”.
Fig. 6.10 The result after executing the inconsistentLocation query.
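The consistency and inconsistency queries discussed in this case study can be imitated on a small in-memory timeline. The sketch below is plain Python rather than SQWRL and is only an illustration: the location intervals follow the breakfast scenario, whereas the single posture attached to each interval, the helper `at` and the function `split_by_consistency` are assumptions made for the example.

```python
from datetime import datetime


def at(hhmm):
    """Build a timestamp on the day of the scenario (12th May 2010)."""
    return datetime.strptime("2010-05-12 " + hhmm, "%Y-%m-%d %H:%M")


SCHEDULE = {
    "event": "Breakfast",
    "start": at("07:30"),
    "finish": at("08:30"),
    "performed_in": "Kitchen",
    "performed_activity": {"Walking", "Sta-U", "Sit-N"},
}

# (detected location, monitored posture, start, finish).
# The postures are illustrative placeholders, one per interval.
OBSERVATIONS = [
    ("Bathroom.toilet", "Sit-N", at("07:30"), at("07:38")),
    ("Stairs", "Walking", at("07:39"), at("07:41")),
    ("Kitchen.cooker", "Sta-U", at("07:41"), at("07:54")),
    ("Kitchen.kettle", "Sta-U", at("07:55"), at("08:20")),
    ("Kitchen.else", "Sit-N", at("08:20"), at("08:34")),
]


def split_by_consistency(schedule, observations):
    """Separate observations that match the scheduled event from those that do not."""
    consistent, inconsistent = [], []
    for location, posture, start, finish in observations:
        # Only intervals overlapping the scheduled event are considered.
        if not (start < schedule["finish"] and finish > schedule["start"]):
            continue
        zone = location.split(".")[0]  # "Kitchen.cooker" -> "Kitchen"
        matches = (zone == schedule["performed_in"]
                   and posture in schedule["performed_activity"])
        (consistent if matches else inconsistent).append(
            (location, posture, start, finish))
    return consistent, inconsistent


consistent, inconsistent = split_by_consistency(SCHEDULE, OBSERVATIONS)
first_consistent_start = min(obs[2] for obs in consistent)
delay_minutes = int((first_consistent_start - SCHEDULE["start"]).total_seconds() // 60)
print(f"Late by {delay_minutes} minutes; wrong locations: "
      f"{[obs[0] for obs in inconsistent]}")
```

Under these assumptions the first consistent interval starts at 7:41, which gives the 11-minute delay reported in the text, and the two inconsistent intervals are the ones spent in Bathroom.toilet and on the Stairs.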
In addition, iMessenger can analyze the user’s ADLs over a period of time such as one week, one month or six months, and provide feedback showing when, where and for how long the user was correctly or incorrectly following his/her schedule, and reminding them which elements of their behaviour should be improved. Moreover, the user’s therapist can use the results to refine the schedule accordingly. The advantage of this approach is that the schedule-based behaviour analysis draws on multiple contexts, i.e., the current activity postures and locations of users are both taken into account when identifying any inconsistency between what the user is expected to do and what the user is doing. This approach has the potential to improve the reliability of behaviour analysis and reminder delivery.
6.6
Discussion and future work
This paper presented work related to an ontology-driven context-aware framework for behaviour analysis and reminder delivery that takes into consideration the wider contexts of the user when analysing behaviour. iMessenger combines multiple contexts, such as monitored status and expected status, to infer inconsistency between what the user is expected to do and what the user is doing, and provides feedback about whether the user’s behaviour is healthy or should be improved. It is important to ensure that users accept the system, do not feel irritated by it, and adhere to the prompts it delivers. iMessenger adopts ontologies to integrate intelligent agents that employ ontological knowledge and reasoning to understand the wider contexts and share this information in support of intelligent feedback. The algorithms were evaluated using some rudimentary, although convincing, example scenarios. The use of an ontology-based approach makes it easier to create, visualize and navigate the knowledge base.

In the current implementation of iMessenger, five contexts have been considered: personal conditions, temporal information, scheduled events, detected locations and monitored activity postures. This can be extended to consider a wider class of context data, e.g. video monitoring and heart rate data, to name but a few. At present, the two activity-relevant reminders (fall and unhealthy posture) could potentially be delivered in real time on an HTC phone, given that the acceleration data sensing and activity classification can be processed in real time on the mobile phone. The event-relevant behaviour analysis and reminder delivery are inferred offline, since the location information is collected by a computer through a wireless link and is not sent directly to the phone. The ontology-based context fusion is currently performed offline. Future work will involve developments to support transmitting the detected locations directly to the HTC phone by wireless communication, hence supporting the delivery of relevant reminders in real time.

The limitation of the study is that the behaviour analysis is based on multiple contexts which were extracted using different technologies, such as the activity monitoring and location detection modules, and which were subsequently transferred into the Protégé-OWL ontology using an OWL-based mapping language. The accuracy of context extraction directly influences the result of the behaviour analysis. In our future work, we will endeavour to resolve the uncertainties caused by imprecise context extraction from the activity monitoring and location detection modules, such as how to find and correct misclassified contexts automatically before or during the behaviour analysis.
Acknowledgements The authors acknowledge the support of the University of Ulster Vice Chancellor Scholarship Programme, and thank all members of the Smart Environments Research Group for their help with collecting the experimental data.
References
[1] Haskell, W.L., Lee, I.M., Pate, R.R., Powell, K.E., Blair, S.N., Franklin, B.A., Macera, C.A., Heath, G.W., Thompson, P.D. and Bauman, A. 2007, “Physical activity and public health: updated recommendation for adults from the American College of Sports Medicine and the American Heart Association”, Circulation, vol. 116, no. 9, pp. 1081.
[2] Skinner, B.F. and Vaughan, M. 1997, Enjoy old age: A practical guide, WW Norton and Company.
[3] Osmani, V., Zhang, D. and Balasubramaniam, S. 2009, “Human activity recognition supporting context-appropriate reminders for elderly”, Pervasive Health 2009, London, April 1-3 2009.
[4] Mabotuwana, T. and Warren, J. 2009, “An ontology-based approach to enhance querying capabilities of general practice medicine for better management of hypertension”, Artificial Intelligence in Medicine.
[5] Chen, L. and Nugent, C. 2009, “Ontology-based activity recognition in intelligent pervasive environments”, International Journal of Web Information Systems, vol. 5, no. 4, pp. 410-430.
[6] Kang, D.O., Lee, H.J., Ko, E.J., Kang, K. and Lee, J. 2006, “A wearable context aware system for ubiquitous healthcare”, IEEE Engineering in Medicine and Biology Society Conference, vol. 1, pp. 5192-5195.
[7] Riboni, D. and Bettini, C. 2009, “Context-Aware Activity Recognition through a Combination of Ontological and Statistical Reasoning”, Ubiquitous Intelligence and Computing: 6th International Conference, Proceedings, pp. 39.
[8] Baldauf, M., Dustdar, S. and Rosenberg, F. 2007, “A survey on context-aware systems”, International Journal of Ad Hoc and Ubiquitous Computing, vol. 2, no. 4, pp. 263-277.
[9] Jönsson, B. and Svensk, A. 1995, “Isaac-A Personal Digital Assistant for the Differently Abled”, The European context for assistive technology: proceedings of the 2nd TIDE Congress, 26-28 April 1995, Paris, IOS Press, pp. 356.
[10] Levinson, R. 1997, “The planning and execution assistant and trainer (PEAT)”, The Journal of Head Trauma Rehabilitation, vol. 12, no. 2, pp. 85.
[11] Donnelly, M., Nugent, C., McClean, S., Scotney, B., Mason, S., Passmore, P. and Craig, D. 2010, “A Mobile Multimedia Technology to Aid Those with Alzheimer’s Disease”, IEEE MultiMedia, vol. 17, no. 2, pp. 42-51, Apr. 2010.
[12] Pollack, M.E., Brown, L., Colbry, D., McCarthy, C.E., Orosz, C., Peintner, B., Ramakrishnan, S. and Tsamardinos, I. 2003, “Autominder: An intelligent cognitive orthotic system for people with memory impairment”, Robotics and Autonomous Systems, vol. 44, no. 3, pp. 273-282.
[13] Sohn, T., Li, K.A., Lee, G., et al. 2005, “Place-its: A study of location-based reminders on mobile phones”, UbiComp 2005: Ubiquitous Computing, pp. 232-250.
[14] Marmasse, N. and Schmandt, C. 2000, “Location-aware information delivery with commotion”, Handheld and Ubiquitous Computing, Springer, pp. 361.
[15] Ludford, P.J., Frankowski, D., Reily, K., et al. 2006, “Because I carry my cell phone anyway: functional location-based reminder applications”, Proceedings of the SIGCHI conference on Human Factors in computing systems, ACM, pp. 898.
[16] Strang, T. and Linnhoff-Popien, C. 2004, “A context modeling survey”, Workshop on Advanced Context Modelling, Reasoning and Management as part of UbiComp, Citeseer.
[17] Chen, H., Finin, T. and Joshi, A. 2004, “Semantic web in the context broker architecture”, Proceedings of PerCom 2004, pp. 277-286.
[18] Pung, H.K., Gu, T., Xue, W., Palmes, P.P., Zhu, J., Ng, W.L., Tang, C.W. and Chung, N.H. 2009, “Context-aware middleware for pervasive elderly homecare”, IEEE Journal on Selected Areas in Communications, vol. 27, no. 4, pp. 510-524.
[19] Prud’Hommeaux, E. and Seaborne, A. 2006, “SPARQL query language for RDF”, W3C working draft, vol. 20.
[20] Cyganiak, R. 2005, “A relational algebra for SPARQL”, Digital Media Systems Laboratory, HP Laboratories Bristol, pp. 2005-2170.
[21] O’Connor, M. and Das, A. 2009, “SQWRL: a query language for OWL”, OWL: Experiences and Directions (OWLED), Fifth International Workshop.
[22] Lee, J.K. and Sohn, M.M. 2003, “The extensible rule markup language”, Communications of the ACM, vol. 46, no. 5, pp. 64.
[23] Horrocks, I., Patel-Schneider, P.F., Boley, H., Tabet, S., Grosof, B. and Dean, M. 2004, “SWRL: A semantic web rule language combining OWL and RuleML”, W3C Member submission, vol. 21.
[24] O’Connor, M.J., Shankar, R.D., Parrish, D.B. and Das, A.K. 2009, “Knowledge-data integration for temporal reasoning in a clinical trial system”, International Journal of Medical Informatics, vol. 78, pp. S77-S85.
[25] Young, L., Vismer, D., McAuliffe, M.J., Tu, S.W., Tennakoon, L., Das, A.K., Astakhov, V., Gupta, A. and Jeffrey, S. 2009, “Ontology Driven Data Integration for Autism Research”, Proceedings of the 22nd IEEE International Symposium on Computer-Based Medical Systems.
[26] Zhang, S., McCullagh, P., Nugent, C. and Zheng, H. 2009, “A Theoretic Algorithm for Fall and Motionless Detection”, Pervasive Computing Technologies for Healthcare, 2009, Proceedings of the 3rd Annual IEEE International Conference, pp. 1-6.
[27] Zhang, S., McCullagh, P., Nugent, C. and Zheng, H. 2010, “Activity Monitoring Using a Smart Phone’s Accelerometer with Hierarchical Classification”, Intelligent Environment 2010, Proceedings of the 6th IEEE International Conference.
[28] Zhang, S., McCullagh, P., Nugent, C. and Zheng, H. 2010, “Optimal Model Selection for Posture Recognition in Home-based Healthcare”, International Journal of Machine Learning and Cybernetics, vol. 2, Springer.
[29] Hallberg, J., Nugent, C., Davies, R. and Donnelly, M. 2009, “Localisation of Forgotten Items using RFID Technology”, Proceedings of the 9th International Conference on Information Technology and Applications in Biomedicine, Larnaca, Cyprus.
[30] Cuenca Grau, B., Horrocks, I., Kazakov, Y. and Sattler, U. 2009, “Extracting modules from ontologies: A logic-based approach”, Modular Ontologies, pp. 159-186.
[31] Horrocks, I., Patel-Schneider, P.F. and Van Harmelen, F. 2003, “From SHIQ and RDF to OWL: The making of a web ontology language”, Web Semantics: Science, Services and Agents on the World Wide Web, vol. 1, no. 1, pp. 7-26.
[32] Protégé, 2010, http://www.protege.stanford.edu
[33] O’Connor, M. J., C. Halaschek-Wiener, M. A. Musen, 2010, “OWL: Experiences and Directions (OWLED)”, Sixth International Workshop, San Francisco, CA.
[34] SWRLTemporalBuiltIns, 2010, http://protege.cim3.net/cgi-bin/wiki.pl?SWRLTemporalBuiltIns
[35] SQWRL, 2010, http://protege.cim3.net/cgi-bin/wiki.pl?SQWRL
[36] Jess, the rule engine for the Java platform, http://www.jessrules.com/
Chapter 7
User’s Behavior Classification Model for Smart Houses Occupant Prediction
Rachid Kadouche, Hélène Pigot, Bessam Abdulrazak, and Sylvain Giroux DOMUS Lab., Université de Sherbrooke, Sherbrooke, Québec, Canada
Abstract
This paper deals with the problem of predicting the occupant of a smart house from daily life activities. Based on data provided by non-intrusive sensors and devices, our approach uses supervised learning techniques to predict the house occupant. We applied a Support Vector Machines (SVM) classifier to build a Behavior Classification Model (BCM) and to learn the users’ habits when they perform activities, in order to predict and identify the house occupant. To test the model, we analyzed the early morning routine activity of six users at the DOMUS apartment and of two users from the publicly available dataset of the Washington State University smart apartment testbed. The results show a high prediction precision and demonstrate that each user has his or her own manner of performing the morning activity, and can be easily identified just by learning his or her habits.
7.1
Introduction
Smart houses [1–3] are becoming a viable option for people with special needs (PwSN¹) who would prefer to stay in the comfort of their homes rather than move to a healthcare facility. This support includes facilities for environmental control, information access, communication, monitoring, etc., and is built over various existing and emerging technologies. However, PwSN vary widely in their needs and in their levels of cognitive and/or motor handicap. Hence, they call for adapted services, especially when interacting with their environment. The task of environment personalization becomes more complicated when the house has multiple inhabitants. According to [4], over half of the elderly live with their spouse, and one third live alone assisted by residential care. In both cases, the house should be able to correctly distinguish between multiple inhabitants before any other operation. The system should predict and identify, among a group of people, who is currently present in the house in order to provide that person with a personalized service. In the case of PwSN, the identification system should take into account the omission errors of this population, particularly when they are themselves involved in the process (using a badge, entering a password...). In this case, the risk of missing the identification task is high. Thus, the identification should be automatic. It should not intrude on or disturb the users, but should quietly support them. In addition, several constraints should be considered to build such a system:
• Occupants’ privacy has to be preserved. For instance, the use of video cameras and microphones is not recommended.
• The sensors used should be economical, designed to be quickly and ubiquitously installed, and easy to conceal in the house to provide a familiar environment.
• The use of wearable tags should be avoided. This reduces anxiety and the feeling of being constantly monitored.
In this paper, we present a smart house occupant prediction model based on basic activities that people undertake each day, such as having breakfast, grooming, etc. The particularity of our approach lies in the use of non-intrusive sensors and devices and in the application of a supervised learning technique to classify users by their habits when they are performing their daily activities. In this work, we show that different users can have their own way of carrying out an activity. Thus, they can be easily identified using prediction techniques.

¹ PwSN: People with disabilities and the elderly

7.2
Background and Related Work
Blind user prediction is an under-researched area of work. It provides an effective means of predicting users without personal information, and it defines an alternative method of authentication in smart house systems. Floor sensor settings have been used for user prediction in smart environments. The authors of [5] and [6] use the nearest-neighbour and hidden Markov model (HMM) methods, respectively, for footstep prediction based on a small area of ground reaction force sensors. UbiFloor [7], based on a neural network classifier, uses binary ON/OFF sensors to detect the characteristic walking style of the user from both single footsteps and walking over consecutive footsteps on the floor. In these systems, achieving a high-resolution prediction process requires the house to be equipped with a large number of sensors. Thus, the cost of the prediction is high. Other prediction systems are based on gait recognition. Gait-based methods model the walking sequence using cameras [8]. Continuous hidden Markov models [9] and eigenspace transformation combined with canonical space transformation [8] have been applied to predict a user among a group. These techniques suffer from occlusion, differences in background movements, and lighting conditions. Physiological measurements such as biometrics are also used for user prediction in smart environments. They refer to the automatic recognition of people based on their distinctive anatomical characteristics (e.g., face, fingerprint, hand shape, iris, retina, hand geometry) [10, 11]. These systems have not yet achieved automatic human recognition. They are sensitive to environmental factors such as shadow, light intensity and obstacles.

Although the prediction systems presented above offer a flexible and natural way to identify the inhabitant without any wearable devices in smart environments, these solutions have many disadvantages. They are either costly, or the extracted features are sensitive to environmental factors, and the recording devices are not always hidden from the user (e.g. cameras), which can compromise the individual’s privacy. Our approach uses behavioral modelling based on users’ habits inside the house. The key advantage of our approach is that it can use any embedded sensors already mounted in the smart house to detect the inhabitant.
7.3
Our Approach
We use supervised learning techniques to predict a person from a group of inhabitants (classification). This task predicts the house occupants after they have performed particular daily activities (e.g. grooming, eating, having breakfast, ...) many times inside the house. We first define the activity targeted for evaluation, and then learn the habits of each user by training a Support Vector Machines (SVM) classifier on the data received from the sensors and actuators that the users trigger when they perform the activity (see Figure 7.1). The SVM classifier builds a learning model, called the Behavior Classification Model (BCM), which is able to separate one user from the others. The BCM classifies each user by his or her behavior when performing a particular activity and can thus predict the house occupant (see Figure 7.2). To evaluate the BCM, we analyzed the early morning routine activity. Six users participated in the experiment at the DOMUS apartment [12]. In addition, to confirm the efficiency of our approach, we analyzed the dataset of the Washington State University smart apartment testbed, which is part of the CASAS smart home project [13].
Fig. 7.1 Behavior Classification Model (BCM)
Fig. 7.2 User’s prediction
7.3.1
Support Vector Machines (SVM)
SVMs are powerful classification systems based on regularization techniques, with excellent performance in many practical classification problems [14]. SVMs deliver state-of-the-art performance in real-world applications such as text categorization, hand-written character recognition, biosequence analysis, image classification, etc. They are now established as one of the standard tools for machine learning and data mining. The SVM decision function is
defined as follows:

$$f(y) = \sum_{i=1}^{N} \alpha_i K(x_i, y) + b \tag{7.1}$$
Here $y$ is the unclassified test vector, $x_i$ are the support vectors, $\alpha_i$ are their weights and $b$ is a constant bias. $K(x, y)$ is the kernel function introduced into SVM to solve nonlinear problems by performing an implicit mapping into a high-dimensional feature space [15, 16]. We applied SVM to build the BCM in order to learn the users’ habits when they perform their activities inside the house. The BCM input data is a matrix defined by a set of vectors. Each vector, called a pattern, is composed of n components, which we call features. Features are the states of the apartment sensors. Patterns correspond to the user’s actions and movements in the smart house: a pattern defines which sensors were involved while the user was performing the activity. Features have binary values (1 and 0). The value 1 (resp. 0) means that the corresponding sensor was involved (resp. not involved). Note that patterns do not give any temporal order of the actions performed by the user; in this work, the temporal constraint is not taken into account. Each vector is labelled by the feature Class, which identifies the user who is performing the activity.
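To make the decision function (7.1) concrete, the following sketch evaluates it directly on binary patterns. It is an illustration only, not the classifier trained in this work: the support vectors, weights, bias and the `gamma` value are made-up numbers, the weights are assumed to already carry the sign of the class, and only the two-class case is shown (the text does not state how the binary decisions are combined for more than two users).

```python
import math


def linear_kernel(x, y):
    """K(x, y) = <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2); gamma is an arbitrary choice here."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def decision_function(y, support_vectors, alphas, b, kernel):
    """f(y) = sum_i alpha_i * K(x_i, y) + b, as in Eq. (7.1)."""
    return sum(alpha * kernel(x, y)
               for alpha, x in zip(alphas, support_vectors)) + b


# Hypothetical model with two support vectors over four binary features.
SUPPORT_VECTORS = [(1, 0, 1, 1), (0, 1, 0, 1)]
ALPHAS = [1.0, -1.0]   # signed weights (class sign folded in)
BIAS = 0.0

pattern = (1, 0, 1, 1)
score = decision_function(pattern, SUPPORT_VECTORS, ALPHAS, BIAS, linear_kernel)
predicted = "U1" if score > 0 else "U2"   # the sign of f(y) selects the class
print(score, predicted)
```

With the linear kernel the score for this pattern is 1.0 × 3 − 1.0 × 1 + 0 = 2, so the pattern falls on the positive side of the boundary. Swapping in `rbf_kernel` changes the values but not the structure of the computation.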
7.4
Experimentation
In this work we have focused only on early morning habits. In particular, we assume that we are monitoring a group of people living together whose morning wake-up times differ from one another, so that each person performs the early morning routine alone in the apartment. This choice was made because we are unable to detect the activity of one person among a group performing activities together inside the house. We first conducted a series of experiments at the DOMUS apartment and evaluated six users. We then analyzed the publicly available dataset from the Washington State University smart apartment testbed.
7.4.1
DOMUS
The DOMUS laboratory [12] includes a standard apartment (kitchen, living room, dining room, bedroom and bathroom) located within the computer science department of the University of Sherbrooke and equipped with a set of infrared sensors, pressure detectors, smart light switches, electrical contacts on doors, audio and video systems, as well as smart tags (RFID) to obtain, in the apartment, the real-time position of the user and objects.
Table 7.1 List of the DOMUS sensors used per zone

                    Entrance hall   Living room   Dining room   Kitchen   Bathroom   Bedroom   Total
IR                        0              1             1           3         0          0        5
Pressure detector         1              0             0           0         0          0        1
Lamps                     0              1             1           1         1          1        5
Door contacts             0              0             0           0         1          1        2
Switch contacts           0              0             0          19         0          0       19
Flow meters               0              0             0           0         2          2        4

7.4.1.1
The Sensors
In this work we considered 36 embedded and unobtrusive sensors that are already mounted in the DOMUS apartment; they allow us to cover a large space of potential sensors. Each sensor has two states (open and close), which makes a total of 72 features. Six zones are defined to cover the different areas of the apartment (see Figure 7.3). The number of installed sensors varies depending on the zone. Table 7.1 gives their distribution per zone. The following list gives the details of each sensor type.
• Infrared (IR) movement detectors: they provide the user’s location in a zone. They cover a zone or a part of a zone. For example, a single IR detector covers the entire zone in the dining room and in the living room, whereas three are installed in the kitchen, covering the oven, sink and toaster.
• Pressure detector: in the form of a tactile carpet placed in the entrance hall, it detects the user moving between the bedroom and the living room. Users can take two paths between these two zones: through the kitchen or through the entrance hall.
• Lamp light switches: these sensors send an event every time the occupant turns the lights on or off.
• Door contacts: these sensors are placed on the doors. They send an event related to the door state (open or close).
• Switch contacts: like the door contacts, they are placed on the lockers and the fridge. They send an event when their state changes to open or closed.
• Flow meters: they provide the states of the taps and the toilet flush. Two are mounted on the cold and hot water taps of the kitchen sink, one is mounted on the washbasin’s cold water tap and another on the toilet flush. They send an event when a tap is opened or closed and when the toilet is flushed.
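The mapping from sensor events to a binary pattern can be sketched as follows. This is a minimal illustration under stated assumptions: the feature naming scheme (`<sensor>-<state>`, e.g. `Lamp-Bedroom-Open`) follows the column names used in the BCM matrix example, while the function names, the two-sensor subset and the event list are hypothetical.

```python
STATES = ("Open", "Close")


def feature_names(sensors):
    """Two features per sensor, one for each state."""
    return [f"{sensor}-{state}" for sensor in sensors for state in STATES]


def encode_pattern(events, features):
    """Return 1 for every feature observed at least once, else 0.

    The order and the number of occurrences of the events are discarded,
    since patterns carry no temporal information.
    """
    observed = set(events)
    return [1 if name in observed else 0 for name in features]


# Illustrative subset of two sensors (the apartment has 36, i.e. 72 features).
FEATURES = feature_names(["Lamp-Bedroom", "Door-Bedroom"])

# Hypothetical event stream for one morning routine.
events = [
    "Lamp-Bedroom-Open",
    "Door-Bedroom-Open",
    "Door-Bedroom-Close",
    "Door-Bedroom-Open",   # repeated events do not change the pattern
]

pattern = encode_pattern(events, FEATURES)
labelled_sample = (pattern, "U1")   # the Class label identifies the user
print(FEATURES)
print(labelled_sample)
```

Because only the presence of a sensor state is recorded, two routines that use the same sensors in a different order produce the same pattern.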
Fig. 7.3 Domus apartment
7.4.1.2
The Experiment Scenario
Six adults participated in the experiment to evaluate early morning habits (washing up, having breakfast). The experiment was held in two series at the DOMUS apartment. In the first series (serie 1), the user was asked to perform the early morning routine as he or she usually does at home (see Figure 7.4). In the second series (serie 2), the user was asked to repeat the same routine with a constraint introduced during the experiment. This constraint, which was part of another study conducted by colleagues in the DOMUS laboratory, consisted of learning a tea recipe, which takes at most 10 minutes. In serie 1 the user came to the laboratory 10 times, ideally in two consecutive weeks. After a two-week break, the user started serie 2, in which he or she was asked to come on 5 days, ideally within one week. In both series, the user was free to use any equipment available in the apartment. Each experiment lasted about 45 minutes. The experiment started with the same apartment conditions for all users: all doors closed and all lights switched off. The user was asked to stay in the bedroom for one minute (the time required to start data recording). Each user experiment yields one data sample, which represents a pattern; this makes a total of 60 samples for serie 1 and 30 samples for serie 2.

Table 7.2 BCM matrix example of DOMUS sensor data

         Lamp-Bedroom-Open   Lamp-Bedroom-Close   Door-Bedroom-Open   Door-Bedroom-Close   Class
Day 1            1                   0                    1                   1             U1
Day 2            0                   0                    0                   1             U1
Day 3            1                   1                    1                   1             U1
Day 4            0                   0                    0                   1             U1
Day 5            1                   1                    0                   1             U1

Fig. 7.4 User performing the early morning routine at the Domus apartment
7.4.1.3 Data Preprocessing
A preprocessing step is used to select, from the original values, the DOMUS data subset that is used to construct the BCM matrix. We faced several problems during the experiment that produced incorrect samples. These were due to technical problems (blackout during the experiment, server shutdown, ...) and experimental problems (the user beginning the experiment before data recording had started). We found 16 incorrect samples, which were eliminated in this phase. Recovering these data would require missing-data techniques [17], an issue that is not addressed in this work. We used the data of series 1 both as a training set (to learn a classifier) and as a test set (to estimate the performance of the classifier).

7.4.1.4 BCM matrix example
Table 7.2 shows five patterns describing the activities that user U1 performed in the DOMUS apartment over five days. Each sensor has two states. For readability, this example shows only 4 of the 72 sensor states. For instance, the bedroom lamp sensor has the states open and close. The first vector (Day 1) can be interpreted as follows: user U1 switched on the bedroom light (Lamp-Bedroom-Open) and opened and closed the bedroom door (Door-Bedroom-Open, Door-Bedroom-Close).

7.4.2 CASAS Smart Home Project
The CASAS² Smart Home project is a multi-disciplinary research project at Washington State University. It focuses on the creation of an intelligent home environment that aims to minimize the cost of maintaining the home and maximize the comfort of its inhabitants. The testbed smart apartment, which is part of the CASAS smart home project [18], is a three-bedroom apartment with one bathroom, a kitchen, and a living / dining room. It is equipped with different kinds of sensors and actuators to sense the environment and give information back to the inhabitants accordingly.

7.4.2.1 Sensors description
The testbed smart apartment is equipped with the following categories of sensors.
• Motion sensors: positioned on the ceiling approximately 1 meter apart throughout the space. They are labeled from M01 to M51 and have two states: ON, OFF.
• Temperature sensors: installed to provide temperature readings and labeled Txx. They have numeric values.
• Door sensors: used to detect the state of doors and labeled Dxx. They have two states: OPEN, CLOSE.
• Item sensors: used for selected items in the kitchen. They have two states: PRESENT, ABSENT.
• Light controllers: used to control lights in the testbed apartment and labeled Lxx. They have two states: ON, OFF.
• In addition to the above categories, the testbed apartment is equipped with a burner sensor labeled AD1-A, a hot water sensor AD1-B, a cold water sensor AD1-C, and an electricity usage sensor labeled P001. All of these sensors have numeric values.

Each sensor reading has associated features such as the sensor name, its value or state, and temporal information indicating when the sensor was triggered. An example of the sensor data collected during this experiment is given in Table 7.3. Sensors are installed in well-defined zones according to their types and the experiments being performed. The general layout of the testbed sensors is shown in Figure 7.5.

Table 7.3 Example of the CASAS Smart Home Project sensor data

Date         Time       Sensor Name   State / value
2009-02-02   12:18:44   M16           ON
2009-02-02   12:18:46   M17           OFF
2009-02-02   12:28:50   D12           OPEN
2009-02-02   12:29:55   I03           PRESENT
2009-02-05   08:05:52   AD1-B         0.0448835
2009-02-05   12:21:51   D09           CLOSE
2009-02-10   17:03:57   I03           ABSENT

² http://ailab.eecs.wsu.edu/casas/

7.4.2.2 Data preprocessing
The dataset collected from the testbed smart apartment cannot be used directly in our approach. Hence, a preprocessing step is required. All the data collected in the CASAS testbed apartment are recorded in one file. We first sorted the data by user, which gave us one file per user. Then we extracted the data of the evaluated activities from each file. We then converted the CASAS sensor states to features,
Fig. 7.5 Sensor layout of the testbed smart apartment [18]
taking into account the different state values (ON, OFF, CLOSE, OPEN, etc.). A Java application was implemented for this purpose to construct the patterns and build the BCM matrix.

7.4.2.3 Testbed activities description
The activities in this study were performed in the testbed smart apartment over three months during the spring of 2009. The apartment housed two residents at that time, and they performed their normal daily activities (see Figure 7.6), which correspond to the basic everyday tasks of life such as sleeping, eating, bathing, dressing, cleaning and toileting. The activities considered in this study are: Bed to toilet, Grooming and Having breakfast.

7.4.2.4 BCM matrix example
Table 7.4 shows an example of the processed data for the “Grooming” activity with some light controller states. In this example, each day represents a pattern. In the row for Day 1, the user switched on the lights L01, L02 and L03 and switched off the
Table 7.4 BCM matrix example of CASAS Smart Home Project Grooming activity data

         L01ON   L01OFF   L02ON   L02OFF   L03ON   L03OFF   L04ON   L04OFF   Class
Day 1    1       0        1       1        1       0        0       0        U2
Day 2    1       0        1       1        1       1        0       0        U2
Day 3    1       0        1       1        1       1        1       1        U2
Day 4    1       0        1       1        1       0        0       0        U2
Day 5    1       0        1       1        1       1        1       1        U2
light L02. The light L04 was not activated. In this example we can observe that, on all five days, the user switches on the light L01 and does not switch it off. This habit is part of the user's behavior when he performs the grooming activity. It is this kind of information that the BCM extracts to perform user classification for prediction.
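The conversion from raw event records to BCM patterns can be illustrated with a minimal sketch. It is not the authors' Java application: the event values, the `day_to_user` mapping and the function name `build_bcm_matrix` are hypothetical, and the sketch assumes that a feature is set to 1 when the corresponding sensor state was observed at least once during the day. Sensors with numeric values are not handled.

```python
from collections import defaultdict

# Hypothetical event records in the layout of Table 7.3:
# (date, time, sensor name, state). The values are made up for illustration.
events = [
    ("2009-02-02", "12:18:44", "L01", "ON"),
    ("2009-02-02", "12:18:46", "L02", "ON"),
    ("2009-02-02", "12:28:50", "L02", "OFF"),
    ("2009-02-03", "07:02:10", "L01", "ON"),
]
# Assumed mapping from a day to the user (class label) of that pattern.
day_to_user = {"2009-02-02": "U2", "2009-02-03": "U2"}


def build_bcm_matrix(events, day_to_user, features):
    """Return one binary pattern per day: 1 if the sensor-state pair
    was observed at least once on that day, 0 otherwise."""
    seen = defaultdict(set)
    for date, _time, sensor, state in events:
        seen[date].add(sensor + state)
    rows = []
    for date in sorted(seen):
        vector = [1 if f in seen[date] else 0 for f in features]
        rows.append((date, vector, day_to_user[date]))
    return rows


features = ["L01ON", "L01OFF", "L02ON", "L02OFF"]
bcm = build_bcm_matrix(events, day_to_user, features)
```

Each returned row has the same layout as a row of Table 7.4: a day, a binary feature vector and a class label.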
Fig. 7.6 User performing the early morning routine at the Washington State University smart apartment testbed [18]
Fig. 7.7 Comparison of various classifiers accuracies (first five minutes). [Plot: Correctness (%) versus Time Period, 0 to 5, for DT, RF, BN, NN and SVM]

7.5 Results and Discussion

7.5.1 SVM vs. other classifiers
We used a normalized polynomial kernel [16] to run the SVM classifier and fixed the complexity parameter C to 1. Using the WEKA framework [19], we compared the effectiveness of the SVM classifier with that of other classifiers on the DOMUS data. Figure 7.7 shows the results for the first five minutes of the experiment time. The following classifiers are involved:
• Decision Tree (DT)
• Random Forest (RF)
• Bayesian Network (BN)
• Neural Network (NN)
• Support Vector Machines (SVM)
We observe that the SVM classifier achieves the best performance among the classifiers compared. Its identification accuracy is over 85% during the first three minutes, whereas it is under 84% for the other classifiers.
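To make the idea of kernel normalization concrete, the following sketch rescales a polynomial kernel so that every pattern has unit self-similarity. It is an illustration under stated assumptions, not the WEKA implementation: the base kernel form (x . y + 1)^d, the degree d = 2 and the two example patterns are all assumptions made here.

```python
import math


def poly_kernel(x, y, degree=2):
    """Plain polynomial kernel (x . y + 1) ** degree (assumed base form)."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1.0) ** degree


def normalized_poly_kernel(x, y, degree=2):
    """Polynomial kernel rescaled so that K(x, x) == 1 for every x."""
    return poly_kernel(x, y, degree) / math.sqrt(
        poly_kernel(x, x, degree) * poly_kernel(y, y, degree))


# Two hypothetical binary BCM patterns.
day1 = [1, 0, 1, 1]
day2 = [0, 0, 0, 1]
k12 = normalized_poly_kernel(day1, day2)
```

With binary patterns the normalization keeps a day with many triggered sensors from dominating the similarity merely because its vector has more ones.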
Fig. 7.8 Inhabitant identification accuracy using DOMUS data. [Plot: Correctness (%) versus Time Period for the early morning routine]

7.5.2 BCM Accuracy Results
We trained the SVM classifier to build the behavior classification model (BCM). The accuracy of the BCM was tested using 10-fold cross-validation [20]. Figures 7.8 and 7.9 show the BCM accuracy over various experiment time periods for the DOMUS data and for the three activities tested using the CASAS Smart Home Project data, respectively. For the DOMUS data the accuracy is over 80%. The best score (90%) is recorded at the tenth minute. Regarding the CASAS Smart Home Project data, for the “Bed to Toilet” activity the accuracy is 100% from the fourth minute of the experiment time. For the “grooming” activity, the best score (99%) is recorded at the twentieth minute. Finally, for the “having breakfast” activity, the users are correctly predicted with an accuracy of 79% at the tenth minute. Although the morning routine tested in this work is a complicated activity, and several users usually have the same habits when performing it, the behavior classification model (BCM) succeeds in predicting and discriminating the users with high precision. These results indicate that the users tested in this work have different behaviors and that each of them has his own manner of performing the morning routine activity.
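The evaluation protocol can be sketched as follows. This is a minimal illustration of k-fold cross-validation and not the WEKA procedure used above: it performs no shuffling or stratification, it assumes at least k samples, and the toy lookup classifier and the data are hypothetical stand-ins for the SVM and the BCM patterns.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k folds of near-equal size."""
    folds = [[] for _ in range(k)]
    for i in range(n_samples):
        folds[i % k].append(i)
    return folds


def cross_validate(samples, labels, train_fn, predict_fn, k=10):
    """Average accuracy over k train/test splits (requires len(samples) >= k)."""
    folds = k_fold_indices(len(samples), k)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_x = [samples[i] for i in range(len(samples)) if i not in held_out]
        train_y = [labels[i] for i in range(len(samples)) if i not in held_out]
        model = train_fn(train_x, train_y)
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Toy stand-in for a classifier: memorize the label of each training pattern.
def train_lookup(xs, ys):
    return {tuple(x): y for x, y in zip(xs, ys)}


def predict_lookup(model, x):
    return model.get(tuple(x))


samples = [[1, 0]] * 20 + [[0, 1]] * 20
labels = ["U1"] * 20 + ["U2"] * 20
accuracy = cross_validate(samples, labels, train_lookup, predict_lookup)
```

Every sample is used for testing exactly once and is never part of the training set of its own fold, which is what makes the averaged accuracy an estimate of performance on unseen patterns.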
Fig. 7.9 Inhabitant identification accuracy per activities using CASAS Smart Home Project data. [Plot: Correctness (%) versus Time Period for the Bed to toilet, Breakfast and Grooming activities]
7.6 Conclusion
In this work, we presented our ongoing research on house occupant prediction based on daily life habits in smart houses. Our approach is based on supervised learning techniques. We used support vector machines (SVM) to build a behavior classification model (BCM) for learning the users' habits. We analyzed the early morning routine with six users at the DOMUS apartment and two users from the publicly available dataset of the Washington State University smart apartment testbed. The results showed that the user can be recognized with high precision, which means that each user has his own way of performing this activity. As future work, we are studying the user patterns that allow a person to be discriminated and recognized within a group performing activities together in the same environment, without using intrusive technologies.
References
[1] MIT House_n, http://architecture.mit.edu/housen/web.
[2] A. Pentland, Smart rooms, Scientific American. 274(4), 68–76, (1996).
[3] Georgia Tech Aware Home Research Initiative, http://www.cc.gatech.edu/fce/ahri.
[4] E. Vierck and K. Hodges, Aging: Lifestyles, Work and Money. (Greenwood Press, 2005).
[5] R. J. Orr and G. D. Abowd. The smart floor: a mechanism for natural user identification and tracking. In Conference on Human Factors in Computing Systems, pp. 275–276, The Hague, The Netherlands, (2000). ACM.
[6] M. Addlesee, A. H. Jones, F. Livesey, and F. S. Samaria, The ORL active floor, IEEE Personal Communications. 4, 35–41, (1997).
[7] W.-T. W. J.-H. R. J.-S. Yun, S.-H. Lee. The user identification system using walking pattern over the ubifloor. pp. 1046–1050, Gyeongju, Korea, (2003).
[8] J. Little and J. E. Boyd, Recognizing people by their gait: The shape of motion, Videre. 1, 1–32, (1998).
[9] K. R. Cuntoor, A. Kale, A. N. Rajagopalan, N. Cuntoor, and V. Krüger. Gait-based recognition of humans using continuous HMMs. In Fifth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 321–326, (2002).
[10] A. Jain, L. Hong, and S. Pankanti, Biometric identification, Commun. ACM. 43(2), 90–98, (2000). ISSN 0001-0782. doi: http://doi.acm.org/10.1145/328236.328110.
[11] A. K. Jain, A. Ross, and S. Prabhakar, An introduction to biometric recognition, IEEE Trans. on Circuits and Systems for Video Technology. 14(1), 4–20, (2004).
[12] Domus, http://domus.usherbrooke.ca/.
[13] P. Rashidi and D. Cook, Keeping the resident in the loop: Adapting the smart home to the user, Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on. 39(5), 949–959 (Sept., 2009). ISSN 1083-4427. doi: 10.1109/TSMCA.2009.2025137.
[14] V. Vapnik, Statistical Learning Theory. (Wiley-Interscience, September 1998). ISBN 0471030031.
[15] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. (Cambridge University Press, New York, NY, USA, 2004). ISBN 0521813972.
[16] R. Debnath and H. Takahashi, Kernel selection for the support vector machine (biocybernetics, neurocomputing), IEICE Transactions on Information and Systems. 87(12), 2903–2904, (2004).
[17] R. J. A. Little and D. B. Rubin, Statistical analysis with missing data. (John Wiley & Sons, Inc., New York, NY, USA, 2002). ISBN 978-0471183860.
[18] D. Cook and M. Schmitter-Edgecombe, Assessing the quality of activities in a smart environment, Methods of Information in Medicine. 48(5), (2009).
[19] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques. (Morgan Kaufmann, 2005), 2nd edition.
[20] R. Kohavi, A study of cross-validation and bootstrap for accuracy estimation and model selection. 2(12), 1137–1143, (1995).
Chapter 8
Human Activity Recognition from Wireless Sensor Network Data: Benchmark and Software
T.L.M. van Kasteren, G. Englebienne, and B.J.A. Kröse¹
Intelligent Systems Lab Amsterdam, Science Park 107, 1098 XG, Amsterdam, The Netherlands

Abstract
Although activity recognition is an active area of research, no common benchmark for evaluating the performance of activity recognition methods exists. In this chapter we present the state of the art probabilistic models used in activity recognition and show their performance on several real world datasets. Our results can be used as a baseline for comparing the performance of other pattern recognition methods (both probabilistic and non-probabilistic). The datasets used in this chapter are made public, together with the source code of the probabilistic models used.
8.1 Introduction
Automatically recognizing human activities such as cooking, sleeping and bathing in a home setting enables many applications in areas such as intelligent environments [1, 2] and healthcare [3–5]. Recent developments in sensing technology have led to wireless sensor networks which provide a non-intrusive, privacy friendly and easy to install solution to in-home monitoring. The sensors used are contact switches to measure open-close states of doors and cupboards; pressure mats to measure sitting on a couch or lying in bed; mercury contacts for movement of objects such as drawers; passive infrared sensors to detect motion in a specific area; and float sensors to measure the toilet being flushed.

Recognizing activities from this sensor data requires us to solve the following issues. First, the start and end time of a performed activity are unknown. When activities are performed around the house there is no clear indication of when one activity ends and another one starts. Second, there is ambiguity in the observed sensor data with respect to which activity is taking place. For example, cooking and getting a drink both involve opening the fridge. Sensors can measure that the fridge is opened, but not which item is taken from the fridge. Third, activities can be performed in a large number of ways, making it difficult to create a general description of an activity. Fourth, observed sensor data is noisy. Noise can be caused either by a human making a mistake, for example opening the wrong cupboard, or by the sensor system failing to transmit a sensor reading due to a network glitch. These issues make activity recognition a very challenging task.

Many different models have been proposed for activity recognition, but no common benchmark for evaluating the performance of these models exists. In this chapter we present the state of the art probabilistic models used in activity recognition and show their performance on several real world datasets. Our results can be used as a baseline for comparing the performance of other pattern recognition methods (both probabilistic and non-probabilistic). The datasets used in this chapter are made available to the public, together with the source code of the probabilistic models used.

The rest of this chapter is organized as follows. Section 8.2 presents the probabilistic models used for activity recognition. Section 8.3 presents the datasets and the sensor and annotation system used for recording the datasets. Section 8.4 presents our experiments and the results of these experiments. Section 8.5 discusses the outcome of our experiments. Section 8.6 discusses related work and how future work can possibly improve on the models discussed in this chapter.

¹ Corresponding author T.L.M. van Kasteren can be reached at [email protected].
8.2 Models
In this section we describe the probabilistic models that we use to provide a baseline recognition performance. First, we define the notation used for describing the models, then we present the naive Bayes model, hidden Markov model, hidden semi-Markov model and conditional random field. We present the model definitions and their inference and learning algorithms. The code of these models is publicly available for download from http://sites.google.com/site/tim0306/.
8.2.1 Notation
The models presented in this chapter require discretized time series data; that is, the data needs to be presented using timeslices of constant length. We discretize the data obtained from the sensors into $T$ timeslices of length $\Delta t$. Each sensor corresponds to a single feature denoted as $x_t^i$, indicating the value of feature $i$ at timeslice $t$, with $x_t^i \in \{0, 1\}$. Feature values can either represent the raw values obtained directly from the sensor, or can be transformed according to a particular feature representation procedure. In Section 8.4.2 we present various feature representations that can be used. In a house with $N$ sensors installed, we define a binary observation vector $x_t = (x_t^1, x_t^2, \ldots, x_t^N)^T$. The activity at timeslice $t$ is denoted by $y_t \in \{1, \ldots, Q\}$ for $Q$ possible states. Our task is to find a mapping between a sequence of observations $x_{1:T} = \{x_1, x_2, \ldots, x_T\}$ and a sequence of activity labels $y_{1:T} = \{y_1, y_2, \ldots, y_T\}$ for a total of $T$ timeslices.
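The discretization step can be illustrated with a short sketch. It assumes, hypothetically, that sensor activity is available as (sensor index, on time, off time) intervals in seconds and that a feature is 1 whenever the sensor was active at some moment within the timeslice; the function name `discretize` and the example values are not part of the published code.

```python
def discretize(intervals, n_sensors, t_start, t_end, delta_t):
    """Turn sensor activation intervals into binary observation vectors.

    intervals: list of (sensor_index, on_time, off_time) in seconds.
    Returns a list of T vectors, where x[t][i] == 1 if sensor i was
    active at some moment inside timeslice t.
    """
    n_slices = int((t_end - t_start) // delta_t)
    x = [[0] * n_sensors for _ in range(n_slices)]
    for sensor, on_time, off_time in intervals:
        first = max(0, int((on_time - t_start) // delta_t))
        last = min(n_slices - 1, int((off_time - t_start) // delta_t))
        for t in range(first, last + 1):
            x[t][sensor] = 1
    return x


# Two hypothetical sensors observed for five minutes with delta_t = 60 s.
x = discretize([(0, 30, 90), (1, 200, 210)], n_sensors=2,
               t_start=0, t_end=300, delta_t=60)
```

Here `x[t]` plays the role of the observation vector $x_t$ and `len(x)` corresponds to $T$.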
8.2.2 Naive Bayes
The naive Bayes model can be considered one of the simplest probabilistic models. Unlike the other models in this chapter, the naive Bayes model assumes all data points are independently and identically distributed (i.i.d.); that is, it does not take into account any temporal relations between datapoints. The model factorizes the joint probability over the datapoints as follows

$$p(y_{1:T}, x_{1:T}) = \prod_{t=1}^{T} p(x_t \mid y_t)\, p(y_t). \tag{8.1}$$
The term $p(y_t)$ is the prior probability over an activity, that is, how probable an activity is to occur without taking any observation into account. The observation distribution $p(x_t \mid y_t)$ represents the probability that the activity $y_t$ would generate observation vector $x_t$. If we were to model the distribution of the observation vector exactly, we would need to consider all possible combinations of values in the vector's dimensions. This would require $2^N$ parameters per activity, with $N$ being the number of sensors used. This easily results in a large number of parameters, even for small values of $N$. Instead, we apply the naive Bayes assumption, which means we model each sensor reading separately, requiring only $N$ parameters for each activity. The observation distribution therefore factorizes as

$$p(x_t \mid y_t = i) = \prod_{n=1}^{N} p(x_t^n \mid y_t = i) \tag{8.2}$$

where each sensor observation is modeled as an independent Bernoulli distribution, given by $p(x_t^n \mid y_t = i) = \mu_{ni}^{x_t^n} (1 - \mu_{ni})^{1 - x_t^n}$. Using naive Bayes is a very strong model assumption, which most likely does not represent the true distribution of the data (i.e. it is very likely that two sensors are dependent on each other with respect to a particular activity). However, naive Bayes has been shown to give very good results in many domains, despite its overly simplistic assumption [6].
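Equations (8.1) and (8.2) can be made concrete with a small sketch that scores a single timeslice. The parameter values below are hypothetical, the code works in log space to avoid numerical underflow (an implementation choice made here), and each $\mu_{ni}$ is assumed to lie strictly between 0 and 1.

```python
import math


def log_observation(x_t, mu_i):
    """log p(x_t | y_t = i) under independent Bernoulli sensors (Eq. 8.2)."""
    return sum(math.log(mu) if x else math.log(1.0 - mu)
               for x, mu in zip(x_t, mu_i))


def classify_timeslice(x_t, mu, prior):
    """Return the activity i that maximizes p(x_t | y_t = i) p(y_t = i)."""
    scores = [log_observation(x_t, mu[i]) + math.log(prior[i])
              for i in range(len(prior))]
    return scores.index(max(scores))


# Hypothetical parameters for Q = 2 activities and N = 3 sensors:
# mu[i][n] is the probability that sensor n fires during activity i.
mu = [[0.9, 0.2, 0.1],
      [0.1, 0.3, 0.8]]
prior = [0.5, 0.5]
label = classify_timeslice([1, 0, 0], mu, prior)
```

The model stores only N numbers per activity, whereas a full table over all binary observation vectors would need on the order of $2^N$ entries per activity.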
8.2.3 Hidden Markov model
Hidden Markov models (HMMs) apply the Markov assumption to model the temporal correlation between consecutive timeslices. They rely on the following two independence assumptions.
• The hidden variable at time $t$, namely $y_t$, depends only on the previous hidden variable $y_{t-1}$ (first order Markov assumption [7]).
• The observable variable at time $t$, namely $x_t$, depends only on the hidden variable $y_t$ at that time slice.
The joint probability therefore factorizes as follows

$$p(y_{1:T}, x_{1:T}) = \prod_{t=1}^{T} p(x_t \mid y_t)\, p(y_t \mid y_{t-1})$$
where we have used $p(y_1 \mid y_0) = p(y_1)$ for the sake of notational simplicity. For the observation model $p(x_t \mid y_t)$ we use the same assumptions as we did with the naive Bayes model. The transition probability distribution $p(y_t \mid y_{t-1})$ represents the probability of going from one state to the next. This is modeled as $Q$ multinomial distributions, one for each activity. Individual transition probabilities are denoted as $p(y_t = j \mid y_{t-1} = i) \equiv a_{ij}$. For further reading about hidden Markov models we refer the reader to a tutorial by Rabiner [7]. The tutorial covers many different aspects of HMMs and explains their application in the task of speech recognition.

8.2.4 Hidden semi-Markov model
Semi-Markov models relax the Markov assumption by explicitly modeling the duration of an activity. In conventional Markov models, such as the HMM, the duration of an activity is modeled implicitly by means of self-transitions of states, and this entails a number of important limitations [7, 8]. Hidden semi-Markov models (HSMMs) explicitly model the duration of a state, and therefore any distribution can be used to model the duration. To model the duration we introduce an additional variable $d_t$, which represents the remaining duration of state $y_t$. The value of $d_t$ is decreased by one at each timestep, and as long as the value is larger than zero the model stays in state $y_t$. When the value of $d_t$ reaches zero, a transition to a new state is made and the duration of the new state is obtained from the duration distribution. The joint probability of the HSMM is factorized as

$$p(y_{1:T}, x_{1:T}, d_{1:T}) = \prod_{t=1}^{T} p(x_t \mid y_t)\, p(y_t \mid y_{t-1}, d_{t-1})\, p(d_t \mid d_{t-1}, y_t)$$

where we have used $p(y_1 \mid y_0, d_0) = p(y_1)$ and $p(d_1 \mid d_0, y_1) = p(d_1 \mid y_1)$ for the sake of notational simplicity. The observation model $p(x_t \mid y_t)$ is the same as with the naive Bayes model and the HMM. Transitions between states are modeled by the factor $p(y_t \mid y_{t-1}, d_{t-1})$, defined as:

$$p(y_t = j \mid y_{t-1} = i, d_{t-1}) = \begin{cases} \delta(i, j) & \text{if } d_{t-1} > 0 \text{ (remain in same state)} \\ a_{ij} & \text{if } d_{t-1} = 0 \text{ (transition)} \end{cases} \tag{8.3}$$

where $\delta(i, j)$ is the Kronecker delta function, giving 1 if $i = j$ and 0 otherwise. The parameter $a_{ij}$ is part of a multinomial distribution. Note that some definitions of semi-Markov models force the value of $a_{ii}$ to zero, effectively disabling self-transitions. In this work, however, we do allow self-transitions. The use of self-transitions in semi-Markov models allows convolutions of duration distributions and makes it possible for separate instances of activities to follow each other immediately. State durations are modeled by the term $p(d_t \mid d_{t-1}, y_t)$, defined as

$$p(d_t \mid d_{t-1}, y_t = k) = \begin{cases} p_k(d_t + 1) & \text{if } d_{t-1} = 0 \text{ (generate new duration)} \\ \delta(d_t, d_{t-1} - 1) & \text{if } d_{t-1} > 0 \text{ (count down duration)} \end{cases} \tag{8.4}$$

where $p_k(d_t + 1)$ is the distribution used for modeling the state duration. In this work we use a histogram with 5 bins to represent the duration distribution. Note that $d_t$ is defined as the remaining duration of a state; therefore the actual duration, when generating a new duration, is $d_t + 1$ timeslices. For further details on hidden semi-Markov models we refer the reader to a tutorial by Murphy [8]. It provides a unified framework for HSMMs and describes possible variations on the HSMM described above.

8.2.5 Conditional random fields
The naive Bayes model, the HMM and the HSMM are generative models in which parameters are learned by maximizing the joint probability $p(y_{1:T}, x_{1:T})$. Conditional random fields (CRFs) are discriminative models. Rather than modeling the full joint probability, discriminative models model the conditional probability $p(y_{1:T} \mid x_{1:T})$. Conditional random fields represent a general class of discriminative models. A CRF using the first-order Markov assumption is called a linear-chain CRF and most closely resembles the HMM in terms of structure. We define the linear-chain CRF as a discriminative analog of the previously defined HMM, so that the same independence assumptions hold. These assumptions are represented in the model using feature functions, so that the conditional distribution is defined as

$$p(y_{1:T} \mid x_{1:T}) = \frac{1}{Z(x_{1:T})} \prod_{t=1}^{T} \exp\left( \sum_{k=1}^{K} \lambda_k f_k(y_t, y_{t-1}, x_t) \right)$$

where $K$ is the number of feature functions used to parameterize the distribution, $\lambda_k$ is a weight parameter and $f_k(y_t, y_{t-1}, x_t)$ a feature function. The product of the parameters and the feature function, $\lambda_k f_k(y_t, y_{t-1}, x_t)$, is called an energy function, and the exponential representation of that term is called a potential function [9]. Unlike the factors in the joint distribution of HMMs, the potential functions do not have a specific probabilistic interpretation and can take any positive real value. The partition function $Z(x_{1:T})$ is a normalization term that ensures that the distribution sums to one and thus obtains a probabilistic interpretation [10]. It is calculated by summing over all possible state sequences

$$Z(x_{1:T}) = \sum_{y_{1:T}} \prod_{t=1}^{T} \exp\left( \sum_{k=1}^{K} \lambda_k f_k(y_t, y_{t-1}, x_t) \right). \tag{8.5}$$

The feature functions $f_k(y_t, y_{t-1}, x_t)$ for the CRF can be grouped into observation feature functions and transition feature functions. In defining the feature functions we use a multidimensional index to simplify notation, rather than the one-dimensional index used above. This gives the following feature function definitions:
Observation: $f_{vin}(x_t^n, y_t) = \delta(y_t, i) \cdot \delta(x_t^n, v)$.
Transition: $f_{ij}(y_t, y_{t-1}) = \delta(y_t, i) \cdot \delta(y_{t-1}, j)$.
A thorough introduction to CRFs is presented by Sutton et al. [10]. It includes a general comparison between discriminative and generative models, and it explains how HMMs and CRFs differ from each other.

8.2.6 Inference
The inference problem for these models consists of finding the single best state sequence that maximizes $p(y_{1:T} \mid x_{1:T})$. For the naive Bayes model we can simply calculate the probability for all state and observation pairs and select the state with the maximum probability at each timeslice. In the other models we need to consider every possible sequence of states, the number of which grows exponentially with the length of the sequence. We use the Viterbi algorithm, which uses dynamic programming, to find the best state sequence efficiently. By using Viterbi we can discard a number of paths at each time step. For HMMs and CRFs this results in a computational complexity of $O(TQ^2)$ for the entire sequence, where $T$ is the total number of timeslices and $Q$ the number of states [7]. For HSMMs we also need to iterate over all possible durations, and therefore the computational complexity is $O(DTQ^2)$, where $D$ is the maximum possible duration [11].

8.2.7 Learning
For learning the model parameters we assume that we have fully labeled data available. The parameters of the naive Bayes model, HMM and HSMM can be learned in closed form. In the case of the CRF, parameters have to be estimated iteratively using a numerical method.

8.2.7.1 Closed Form Solution
We learn the model parameters using maximum likelihood. Finding the joint maximum likelihood parameters is equivalent to finding the maximum likelihood parameters of each of the factors that make up the joint probability. The observation probability $p(x_t^n \mid y_t = i)$ is a Bernoulli distribution whose maximum likelihood parameter estimate is given by

$$\mu_{ni} = \frac{\sum_{t=1}^{T} x_t^n\, \delta(y_t, i)}{\sum_{t=1}^{T} \delta(y_t, i)} \tag{8.6}$$

where $T$ is the total number of time slices. The transition probability $p(y_t = j \mid y_{t-1} = i)$ is a multinomial distribution whose parameters are calculated by

$$a_{ij} = \frac{\sum_{t=2}^{T} \delta(y_t, j)\, \delta(y_{t-1}, i)}{\sum_{t=2}^{T} \delta(y_{t-1}, i)} \tag{8.7}$$

where $T$ is equal to the number of time slices. The denominator counts how often state $i$ occurs as the previous state, so that the probabilities $a_{ij}$ sum to one over $j$.

8.2.7.2 Numerical Optimization
The parameters $\theta = \{\lambda_1, \ldots, \lambda_K\}$ of CRFs are learned by maximizing the conditional log likelihood $l(\theta) = \log p(y_{1:T} \mid x_{1:T}, \theta)$ given by

$$l(\theta) = \sum_{t=1}^{T} \sum_{k=1}^{K} \lambda_k f_k(y_t, y_{t-1}, x_t) - \log Z(x_{1:T}) - \sum_{k=1}^{K} \frac{\lambda_k^2}{2\sigma^2}$$
where the final term is a regularization term, penalizing large values of $\lambda$ to prevent overfitting. The constant $\sigma$ is set beforehand and determines the strength of the penalization [10]. The function $l(\theta)$ is concave, which follows from the convexity of $\log Z(x_{1:T})$ [10]. A useful consequence for parameter learning is that any local optimum is also a global optimum. Quasi-Newton methods such as BFGS have been shown to be suitable for estimating the parameters of CRFs [12, 13]. These methods approximate the Hessian, the matrix of second derivatives, by analyzing successive gradient vectors. To analyze the gradient vectors, the partial derivative of $l(\theta)$ with respect to $\lambda_i$ is needed; it is given by

$$\frac{\partial l}{\partial \lambda_i} = -\frac{\lambda_i}{\sigma^2} + \sum_{t=1}^{T} f_i(y_t, y_{t-1}, x_t) - \sum_{t=1}^{T} \sum_{y_t, y_{t-1}} p(y_t, y_{t-1} \mid x_{1:T})\, f_i(y_t, y_{t-1}, x_t).$$
Because the size of the Hessian is quadratic in the number of parameters, storing the full Hessian is memory intensive. We therefore use a limited-memory version of BFGS [14, 15].
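Evaluating $l(\theta)$ requires $\log Z(x_{1:T})$, which Equation (8.5) defines as a sum over all $Q^T$ state sequences. The sketch below shows how this quantity could be computed with a forward recursion in $O(TQ^2)$, a standard dynamic-programming technique that is named here as an illustration and is not taken from the chapter's published code. The arrays `obs` and `trans` are hypothetical summed energies, that is, the values of $\sum_k \lambda_k f_k$ split into observation and transition parts.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(obs, trans):
    """log Z via the forward recursion, O(T Q^2).

    obs[t][i]   : summed observation energies for state i at timeslice t
    trans[j][i] : transition energy for moving from state j to state i
    """
    n_states = len(obs[0])
    alpha = list(obs[0])
    for t in range(1, len(obs)):
        alpha = [obs[t][i] + logsumexp([alpha[j] + trans[j][i]
                                        for j in range(n_states)])
                 for i in range(n_states)]
    return logsumexp(alpha)


def log_partition_brute_force(obs, trans):
    """Same quantity by enumerating all Q^T sequences (tiny inputs only)."""
    n_states = len(obs[0])
    totals = []
    for seq in product(range(n_states), repeat=len(obs)):
        energy = obs[0][seq[0]]
        for t in range(1, len(seq)):
            energy += trans[seq[t - 1]][seq[t]] + obs[t][seq[t]]
        totals.append(energy)
    return logsumexp(totals)


# Hypothetical energies for T = 3 timeslices and Q = 2 states.
obs = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
trans = [[0.6, -0.6], [-0.3, 0.2]]
```

On small inputs the two functions agree, which gives a simple check of the recursion before it is used inside an optimizer.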
8.3 Datasets
We have recorded three datasets consisting of several weeks of data recorded in a real world setting. In this section we describe the sensor and annotation system that we used to record and annotate our datasets and we give a description of the houses and the datasets recorded in them.

8.3.1 Sensors Used
In this work we use wireless sensor networks to observe the behavior of inhabitants inside their homes. After considering different commercially available wireless network kits, we selected the RFM DM 1810 (fig. 8.1(a)), because it comes with a very rich and well documented API and the standard firmware includes an energy efficient network protocol. This protocol handles the wireless communication efficiently, resulting in a long battery life. The node can reach a data transmission rate of 4.8 kb/s, which is enough for the binary sensor data that we need to collect. The kit comes with a base station which is attached to a PC through USB. A new sensor node can easily be added to the network by a simple pairing procedure, which involves pressing a button on both the base station and the new node. The RFM wireless network node has an analog and a digital input. It sends an event when the state of the digital input changes or when the analog input crosses some threshold. We equipped nodes with various kinds of sensors: reed switches to measure whether doors and cupboards are open or closed; pressure mats to measure sitting on a couch or lying in bed; mercury contacts to detect the movement of objects (e.g. drawers); passive infrared (PIR) sensors to detect motion in a specific area; and float sensors to measure the toilet being flushed. All sensors give binary output.

Fig. 8.1 Wireless sensor network node used for sensing the environment and bluetooth headset used for annotation. (a) Network node (b) Bluetooth headset
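The event-on-change behavior of the nodes can be pictured with a small sketch. It does not reproduce the RFM firmware or its API: the function name `threshold_events`, the sample values and the choice to report the initial state as a first event are all assumptions made for illustration.

```python
def threshold_events(samples, threshold):
    """Emit (index, state) whenever the binarized signal changes.

    samples: sequence of analog readings. The state is 1 when the
    reading is at or above the threshold and 0 otherwise. The initial
    state is reported as the first event.
    """
    events = []
    previous = None
    for index, value in enumerate(samples):
        state = 1 if value >= threshold else 0
        if state != previous:
            events.append((index, state))
            previous = state
    return events


# Hypothetical analog readings sampled at regular intervals.
events = threshold_events([0.1, 0.2, 0.9, 0.8, 0.3, 0.95], threshold=0.5)
```

Sending only state changes rather than every reading keeps the amount of transmitted data small, which fits the low data rate and long battery life described above.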
8.3.2 Annotation
Annotation was performed using either a handwritten activity diary or a bluetooth headset combined with speech recognition software. The starting and end point of an activity were annotated in both cases.
8.3.2.1 Handwritten diary
In one of our datasets we annotated the activities using a handwritten diary. Several sheets of paper were distributed throughout the house at locations where activities are typically performed. The subject would annotate the start and end time of activities by reading the time from his watch and writing it down on one of the sheets. The advantage of this method is that it is very easy to install and for the subject to use. The disadvantages are that it is time consuming to process the annotated data (i.e. to enter the handwritten sheets into a computer) and that the timestamps from the watch might differ slightly from the timestamps recorded by the computer saving the sensor data.
Table 8.1 Details of recorded datasets.

             House A     House B     House C
Age          26          28          57
Gender       Male        Male        Male
Setting      Apartment   Apartment   House
Rooms        3           2           6
Duration     25 days     14 days     19 days
Sensors      14          23          21
Activities   10          13          16
Annotation   Bluetooth   Diary       Bluetooth

8.3.2.2 Bluetooth method
In the bluetooth annotation method a set of predefined commands is used to record the start and end time of activities. We used the Jabra BT250v bluetooth headset (fig. 8.1(b)) for annotation. It has a range of up to 10 meters and battery power for 300 hours standby or 10 hours active talking. This is more than enough for a full day of annotation; the headset was recharged at night. The headset contains a button which we used to trigger the software to add a new annotation entry. The software for storing the annotation is written in C and combines elements of the bluetooth API with the Microsoft Speech API¹. The bluetooth API was needed to catch the event of the headset button being pressed and should work with any bluetooth dongle and headset that uses the Widcomm² bluetooth stack. The Microsoft Speech API provides an easy way to use both speech recognition and text to speech. When the headset button is pressed the speech recognition engine starts listening for commands it can recognize. We created our own speech grammar, which contains a limited combination of commands the recognition engine could expect. By using very distinctive commands such as ‘begin use toilet’ and ‘begin take shower’, the recognition engine had multiple words by which it could distinguish different commands. This resulted in near perfect recognition results during annotation. The recognized sentence is read back using the text-to-speech engine. Any errors that do occur can be immediately corrected using a ‘correct last’ command. Because annotation is provided by the user on the spot this method is very efficient and accurate. The use of a headset together with speech recognition results in very little interference while performing activities. The resulting annotation data requires very little post processing and is timestamped using the same clock (the internal clock of the computer) that is used to timestamp the sensor data.
¹ For details about the Microsoft Speech API see: http://www.microsoft.com/speech/
² For details about the Widcomm stack see: http://www.broadcom.com/products/bluetooth_sdk.php
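As an illustration of the annotation scheme described above, the following Python sketch shows one way recognized commands could be turned into timestamped activity intervals. It is not the authors' software: the class and method names are hypothetical, the 'end ...' command form is an assumption (the text only quotes 'begin ...' and 'correct last'), and timestamps are passed in explicitly rather than read from the computer clock.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Annotation:
    activity: str
    start: float                  # seconds on the recording computer's clock
    end: Optional[float] = None   # None while the activity is still open


class AnnotationLog:
    """Hypothetical store for recognized voice commands (not the authors' software)."""

    def __init__(self) -> None:
        self.entries: List[Annotation] = []
        # Undo history: ("begin" | "end", index into self.entries).
        self._history: List[Tuple[str, int]] = []

    def handle(self, command: str, timestamp: float) -> None:
        words = command.lower().split()
        if words == ["correct", "last"]:
            self._undo_last()
        elif len(words) > 1 and words[0] == "begin":
            self.entries.append(Annotation(" ".join(words[1:]), timestamp))
            self._history.append(("begin", len(self.entries) - 1))
        elif len(words) > 1 and words[0] == "end":
            # Assumed command form: close the most recent open entry of this activity.
            activity = " ".join(words[1:])
            for index in range(len(self.entries) - 1, -1, -1):
                entry = self.entries[index]
                if entry.activity == activity and entry.end is None:
                    entry.end = timestamp
                    self._history.append(("end", index))
                    return
            raise ValueError(f"no open activity named {activity!r}")
        else:
            raise ValueError(f"unrecognized command: {command!r}")

    def _undo_last(self) -> None:
        if not self._history:
            return
        kind, index = self._history.pop()
        if kind == "begin":
            del self.entries[index]         # always the newest entry (undo is LIFO)
        else:
            self.entries[index].end = None  # reopen the activity


log = AnnotationLog()
log.handle("begin use toilet", 10.0)
log.handle("begin take shower", 12.0)   # misrecognized command
log.handle("correct last", 13.0)        # removes the shower entry
log.handle("end use toilet", 70.0)
```

Because every entry is stamped with the same clock that stamps the sensor events, the resulting intervals can be aligned with the sensor data without any offset correction.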
Fig. 8.2 Floorplan of houses A, B and C: (a) House A, (b) House B, (c) House C, first floor, (d) House C, second floor. The red boxes represent wireless sensor nodes.
8.3.3 Houses
A total of three datasets was recorded in three different houses. Details about the datasets can be found in Table 8.1. Floorplans for each of the houses, indicating the locations of the sensors, can be found in Figure 8.2.
Table 8.2 Confusion matrix showing the true positives (TP), total of true labels (TT) and total of inferred labels (TI) for each class.

                 Inferred
True        1       2       3       Total
1           TP1     ε12     ε13     TT1
2           ε21     TP2     ε23     TT2
3           ε31     ε32     TP3     TT3
Total       TI1     TI2     TI3     Total
These datasets are publicly available for download from http://sites.google.com/site/tim0306/.
8.4 Experiments
Our experiments provide a baseline recognition performance for probabilistic models on our datasets. We run two experiments: in the first we discretize our data using various timeslice lengths, and in the second we compare the performance of the different models and a number of feature representations.
8.4.1 Experimental Setup
We split our data into a test and training set using a ‘leave one day out’ approach. In this approach, one full day of sensor readings is used for testing and the remaining days are used for training. We cycle over all the days and report the average performance measure. We evaluate the performance of our models using precision, recall and F-measure. These measures can be calculated using the confusion matrix shown in Table 8.2. The diagonal of the matrix contains the true positives (TP), while the sum of a row gives us the total of true labels (TT) and the sum of a column gives us the total of inferred labels (TI). We calculate the precision and recall for each class separately and then take the average over all classes. It is important to use these particular measures because we are dealing with unbalanced datasets, in which some classes appear much more frequently than others. Our measure takes the average precision and recall over all classes and therefore considers the correct classification of each class equally important. To further illustrate the importance of these measures we also include the accuracy in our results. The accuracy represents the percentage of correctly classified timeslices; more frequently occurring classes therefore have a larger weight in this measure.
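The evaluation procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' published code: the function names are hypothetical, the confusion matrix is assumed to have true classes as rows and inferred classes as columns (as in Table 8.2), and a class with no true or no inferred labels is assumed to contribute zero to the average, a case the text does not specify. The measures follow Eqs. (8.8) to (8.11).

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_day_out(days: Sequence[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training_days, test_day) pairs, holding out each day exactly once."""
    for i, test_day in enumerate(days):
        yield [d for j, d in enumerate(days) if j != i], test_day


def class_averaged_measures(confusion: Sequence[Sequence[int]]) -> Dict[str, float]:
    """confusion[i][j] = number of timeslices of true class i inferred as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = [confusion[i][i] for i in range(n)]                          # diagonal
    tt = [sum(confusion[i]) for i in range(n)]                        # row sums
    ti = [sum(confusion[r][i] for r in range(n)) for i in range(n)]   # column sums

    # Assumption: an empty row or column contributes 0 instead of dividing by zero.
    precision = sum(tp[i] / ti[i] if ti[i] else 0.0 for i in range(n)) / n
    recall = sum(tp[i] / tt[i] if tt[i] else 0.0 for i in range(n)) / n
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    accuracy = sum(tp) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


# Unbalanced two-class example: class 0 is ten times more frequent than class 1.
measures = class_averaged_measures([[90, 10], [5, 5]])
splits = list(leave_one_day_out(["day1", "day2", "day3"]))
```

In the example the accuracy (95 of 110 timeslices) is much higher than the class-averaged recall, because the frequent class dominates the accuracy while both classes count equally in the averaged measures.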
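\[ \text{Precision} = \frac{1}{N} \sum_{i=1}^{N} \frac{\mathit{TP}_i}{\mathit{TI}_i} \tag{8.8} \]
\[ \text{Recall} = \frac{1}{N} \sum_{i=1}^{N} \frac{\mathit{TP}_i}{\mathit{TT}_i} \tag{8.9} \]
\[ \text{F-Measure} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \tag{8.10} \]
\[ \text{Accuracy} = \frac{\sum_{i=1}^{N} \mathit{TP}_i}{\text{Total}} \tag{8.11} \]

Fig. 8.3 Different feature representations: (a) Raw, (b) Changepoint, (c) Last-fired.

8.4.2 Feature Representation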
The raw data obtained from the sensors can either be used directly, or be transformed into a different representation form. We experiment with three different feature representations:
Raw: The raw sensor representation uses the sensor data directly as it was received from the sensors. It gives a 1 when the sensor is firing and a 0 otherwise (Fig. 8.3(a)).
Changepoint: The changepoint representation indicates when a sensor event takes place, that is, when a sensor changes value. More formally, it gives a 1 when a sensor changes state (i.e. goes from zero to one or vice versa) and a 0 otherwise (Fig. 8.3(b)).
Last-fired: The last-fired sensor representation indicates which sensor fired last. The sensor that changed state last continues to give 1 and changes to 0 when another sensor changes state (Fig. 8.3(c)).
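The three representations can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the authors' implementation: the raw data is taken to be a list of timeslices, each a list of 0/1 sensor values; every sensor is assumed to be in state 0 before the first timeslice; and if several sensors change state in the same timeslice, all of them are marked as last-fired. The function names are hypothetical.

```python
from typing import List

Matrix = List[List[int]]  # rows = timeslices, columns = sensors, values 0 or 1


def changepoint(raw: Matrix) -> Matrix:
    """1 where a sensor's value differs from the previous timeslice, else 0."""
    out: Matrix = []
    previous = [0] * len(raw[0]) if raw else []   # assumed initial state: all 0
    for row in raw:
        out.append([int(value != prev) for value, prev in zip(row, previous)])
        previous = row
    return out


def last_fired(raw: Matrix) -> Matrix:
    """1 for the sensor(s) that changed state most recently, held until another changes."""
    out: Matrix = []
    current = [0] * len(raw[0]) if raw else []
    for change in changepoint(raw):
        if any(change):          # a new sensor event replaces the previous one
            current = change
        out.append(list(current))
    return out


# Two sensors over six timeslices: sensor 0 is on during slices 1-2,
# sensor 1 switches on at slice 4 and stays on.
raw = [[0, 0], [1, 0], [1, 0], [0, 0], [0, 1], [0, 1]]
cp = changepoint(raw)
lf = last_fired(raw)
```

Note how sensor 1 stays at 1 in the raw data after slice 4, gives a single 1 in the changepoint data, and remains marked in the last-fired data until some other sensor changes state.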
8.4.3 Experiment 1: Timeslice Length
Here we present our findings for determining the ideal timeslice length for discretizing the sensor data. Experiments were run using the HMM. We experimented using all the feature representations, to rule out any bias towards any of the representations. The sensor data and the ground truth activity labels are discretized using the same timeslice length. During this discretization process it is possible that two or more activities occur within a single timeslice. For example, an activity might end halfway through the timeslice and another activity can start immediately after. In this case we let the timeslice represent the activity that takes up most of the timeslice. However, a consequence of this is that the discretized ground truth differs from the actual ground truth. To express the magnitude of this difference, we introduce the discretization error. The discretization error represents the percentage of incorrect labels in the discretized ground truth, as a result of the discretization process. The discretized ground truth labels are used to learn the model parameters. However, if we were to use the discretized ground truth for calculating the performance measures of the model as well, our measure would not take into account the discretization error. Therefore, we evaluate the performance of our models using the actual ground truth, which was obtained with a one second accuracy.
Fig. 8.4 F-Measure performance of the HMM for the three houses, (a) House A, (b) House B and (c) House C, using different timeslice lengths to discretize the data. Δt is given in seconds.
The F-measure values for the various timeslice lengths are plotted in Figure 8.4. We see that within a single house no timeslice length achieves consistent maximum performance for all feature representations. Furthermore, we see that across all three houses no timeslice length consistently outperforms the others for a particular feature representation. Overall we see that the timeslice lengths of Δt = 30 seconds and Δt = 60 seconds give a good performance. Table 8.3 shows the discretization error for this experiment. As expected, small timeslice lengths result in a small discretization error. We see that a significant error occurs for timeslice lengths of Δt = 300 seconds and Δt = 600 seconds.
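The majority-vote discretization of the ground truth and the resulting discretization error described in this section can be sketched as follows. This is a minimal illustration, assuming the actual ground truth is available as one label per second (matching the one second accuracy mentioned above); the function names are hypothetical and ties within a timeslice are broken in favor of the label that occurs first, a choice the text does not specify.

```python
from collections import Counter
from typing import List, Sequence


def discretize(labels: Sequence[str], dt: int) -> List[str]:
    """Assign each timeslice of dt seconds the label that covers most of it."""
    if dt < 1:
        raise ValueError("dt must be at least one second")
    slices = []
    for start in range(0, len(labels), dt):
        window = labels[start:start + dt]
        # most_common breaks ties by first occurrence (assumed tie rule).
        slices.append(Counter(window).most_common(1)[0][0])
    return slices


def discretization_error(labels: Sequence[str], dt: int) -> float:
    """Percentage of seconds whose true label differs from their timeslice label."""
    if not labels:
        raise ValueError("labels must not be empty")
    slices = discretize(labels, dt)
    wrong = sum(1 for t, label in enumerate(labels) if slices[t // dt] != label)
    return 100.0 * wrong / len(labels)


# 120 seconds of per-second ground truth: 50 s sleep, 45 s toilet, 25 s idle.
truth = ["sleep"] * 50 + ["toilet"] * 45 + ["idle"] * 25
coarse = discretize(truth, 60)
error = discretization_error(truth, 60)
```

With Δt = 60 the first timeslice is labeled 'sleep' and the second 'toilet', so 10 + 25 of the 120 seconds carry a wrong label; with Δt = 1 the error is zero, consistent with the first row of Table 8.3.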
Table 8.3 Discretization error in percentages.

Length        House A   House B   House C
Δt = 1 s        0.0       0.0       0.0
Δt = 10 s       0.2       0.2       0.2
Δt = 30 s       0.6       0.6       0.9
Δt = 60 s       1.3       1.1       1.7
Δt = 300 s      5.9       4.0       8.1
Δt = 600 s     10.6      17.4      13.7
8.4.4 Experiment 2: Feature Representations and Models
This experiment shows the performance of the models for the different feature representations. Data was discretized using the timeslice length of Δt = 60 seconds. The discretized ground truth was used to calculate the performance measures. The measures and their standard deviation for house A are shown in Table 8.4, for house B in Table 8.5 and for house C in Table 8.6. In comparing the performance of the different feature representations we see that the raw feature representation performs by far the worst. The changepoint and last-fired representations both perform well, although the changepoint representation manages to outperform the last-fired representation significantly in a number of cases. In terms of model performance, we see that the HMM generally outperforms the naive Bayes model and that the HSMM generally outperforms the HMM. CRFs manage to outperform the other models in some cases, but the HSMM using the changepoint representation consistently achieves a very high F-measure (the highest in two of the three datasets). When looking at the precision and recall we see that the generative models (naive Bayes, HMM and HSMM) generally score higher in terms of recall, while the discriminative model (CRF) generally outperforms the other models in terms of precision. Finally, in terms of accuracy CRFs have a very high score and outperform the other models in almost all cases.
Table 8.4 Results of experiment 2, House A. Different feature representations using naive Bayes (NB), HMM, HSMMs and CRFs.

Model  Feature  Precision     Recall        F-Measure     Accuracy
NB     Raw      48.3 ± 17.7   42.6 ± 16.6   45.1 ± 16.9   77.1 ± 20.8
NB     Change   52.7 ± 17.5   43.2 ± 18.0   47.1 ± 17.2   55.9 ± 18.8
NB     Last     67.3 ± 17.2   64.8 ± 14.6   65.8 ± 15.5   95.3 ± 2.8
HMM    Raw      37.9 ± 19.8   45.5 ± 19.5   41.0 ± 19.5   59.1 ± 28.7
HMM    Change   70.3 ± 16.0   74.3 ± 13.3   72.0 ± 14.2   92.3 ± 5.8
HMM    Last     54.6 ± 17.0   69.5 ± 12.7   60.8 ± 14.9   89.5 ± 8.4
HSMM   Raw      39.5 ± 18.9   48.5 ± 19.5   43.2 ± 19.1   59.5 ± 29.0
HSMM   Change   70.5 ± 16.0   75.0 ± 12.1   72.4 ± 13.7   91.8 ± 5.9
HSMM   Last     60.2 ± 15.4   73.8 ± 12.5   66.0 ± 13.7   91.0 ± 7.2
CRF    Raw      59.2 ± 18.3   56.1 ± 17.3   57.2 ± 17.3   89.8 ± 8.5
CRF    Change   73.5 ± 16.6   68.0 ± 16.0   70.4 ± 15.9   91.4 ± 5.6
CRF    Last     66.2 ± 15.8   65.8 ± 14.0   65.9 ± 14.6   96.4 ± 2.4

Table 8.5 Results of experiment 2, House B. Different feature representations using naive Bayes (NB), HMM, HSMMs and CRFs.

Model  Feature  Precision     Recall        F-Measure     Accuracy
NB     Raw      33.6 ± 10.9   32.5 ± 8.4    32.4 ± 8.0    80.4 ± 18.0
NB     Change   40.9 ± 7.2    38.9 ± 5.7    39.5 ± 5.0    67.8 ± 18.6
NB     Last     43.7 ± 8.7    44.6 ± 7.2    43.3 ± 4.8    86.2 ± 13.8
HMM    Raw      38.8 ± 14.7   44.7 ± 13.4   40.7 ± 12.4   63.2 ± 24.7
HMM    Change   48.2 ± 17.2   63.1 ± 14.1   53.6 ± 16.5   81.0 ± 14.2
HMM    Last     38.5 ± 15.8   46.6 ± 19.5   41.8 ± 17.1   48.4 ± 26.0
HSMM   Raw      37.4 ± 16.9   44.6 ± 14.3   39.9 ± 14.3   63.8 ± 24.2
HSMM   Change   49.8 ± 15.8   65.2 ± 13.4   55.7 ± 14.6   82.3 ± 13.5
HSMM   Last     40.8 ± 11.6   53.3 ± 10.9   45.8 ± 11.2   67.1 ± 24.8
CRF    Raw      35.7 ± 15.2   40.6 ± 12.0   37.5 ± 13.7   78.0 ± 25.9
CRF    Change   48.3 ± 8.3    51.5 ± 8.5    49.7 ± 7.9    92.9 ± 6.2
CRF    Last     46.0 ± 12.5   47.8 ± 12.1   46.6 ± 12.0   89.2 ± 13.9

Table 8.6 Results of experiment 2, House C. Different feature representations using naive Bayes (NB), HMM, HSMMs and CRFs.

Model  Feature  Precision     Recall        F-Measure     Accuracy
NB     Raw      19.6 ± 11.4   16.8 ± 7.5    17.8 ± 9.1    46.5 ± 22.6
NB     Change   39.9 ± 6.9    30.8 ± 4.8    34.5 ± 4.6    57.6 ± 15.4
NB     Last     40.5 ± 7.4    46.4 ± 14.8   42.3 ± 6.8    87.0 ± 12.2
HMM    Raw      15.2 ± 9.2    17.2 ± 9.3    15.7 ± 8.8    26.5 ± 22.7
HMM    Change   41.4 ± 8.8    50.0 ± 11.4   44.9 ± 8.8    77.2 ± 14.6
HMM    Last     40.7 ± 9.7    53.7 ± 16.2   45.9 ± 11.2   83.9 ± 13.9
HSMM   Raw      15.6 ± 9.2    20.4 ± 10.9   17.3 ± 9.6    31.2 ± 24.6
HSMM   Change   43.8 ± 10.0   52.3 ± 12.8   47.4 ± 10.5   77.5 ± 15.3
HSMM   Last     42.5 ± 10.8   56.0 ± 15.4   47.9 ± 11.3   84.5 ± 13.2
CRF    Raw      17.8 ± 22.1   21.8 ± 20.9   19.0 ± 21.8   46.3 ± 25.5
CRF    Change   36.7 ± 18.0   39.6 ± 17.4   38.0 ± 17.6   82.2 ± 13.9
CRF    Last     37.7 ± 17.1   40.4 ± 16.0   38.9 ± 16.5   89.7 ± 8.4

8.5 Discussion
The results from experiment 1 show that the choice of timeslice length can strongly influence the recognition performance of a model. Timeslice lengths of Δt = 30 seconds and Δt = 60 seconds give a good performance and result in a small discretization error. This means these timeslice lengths are long enough to be discriminative and short enough to provide high accuracy labeling results.
Experiment 2 shows that the choice of feature representation strongly affects the recognition performance of the model. Feature representations are used to restructure the data with the intention of making the classification problem easier to solve. The raw feature representation does not give good results because the binary state of a sensor does not provide good information about the activity that is currently performed. For example, people tend to leave doors open after they walk through them. The raw representation can indicate that a shower door is open long after the person finished using the shower. Because the shower door can be either open or closed when someone is not taking a shower, it is not a good feature to base classification on. In the changepoint representation this issue is resolved because this feature represents when a sensor changes state and thus indicates when an object is used. The last-fired representation gives an indication of the location of an inhabitant. The idea is that as long as people remain in the same location they do not trigger any sensors. As soon as people start moving around the house they are likely to trigger sensors, which will provide an update of their current location. This representation works best with a large number of sensors installed, so that the chance of triggering a sensor when moving around the house is large.
The probabilistic models we used in the experiments differ in complexity. Naive Bayes is the simplest model, HMMs add temporal relations and HSMMs add the explicit modeling of state durations. CRFs are similar to HMMs, but their parameters are learned using a different optimization criterion. The results show that an increase in model complexity generally results in an increase in performance as well. However, this is not always the case, because increasing the model complexity involves making certain model assumptions. For example, to model the temporal relations in HMMs we use the first order Markov assumption. This assumption might be too strong: it is quite possible that in a complex activity such as cooking dinner there are dependencies between timeslices at the start of the cooking activity and the end of the cooking activity. On the other hand, explicitly modeling these long term dependencies results in a large number of model parameters and therefore requires a large amount of training data to accurately learn those parameters. The best performing model therefore needs to be designed with a careful trade-off between complexity and amount of training data needed.
Precision and recall give us further insight into the quality of the classification. Precision tells us which percentage of inferred labels was correctly classified, while recall tells us which percentage of true labels was correctly classified. CRFs score higher on precision because during learning the set of parameters is chosen that favors the correct classification of frequently occurring activities (activities that take up many timeslices). As a result infrequent classes are only classified as such when there is little or no confusion with the frequent class. This results in a high precision for the infrequent classes. The recall on the other hand is low because many infrequent classes are incorrectly classified as frequent. In the case of the generative models (naive Bayes, HMM and HSMM) model parameters are learned without consideration of the class frequency. This results in more confusion among the infrequent classes, but also results in more true positives for the infrequent classes and therefore a higher recall. Finally, in terms of the accuracy measure CRFs generally score the highest, because of their preference to correctly classify frequent classes. In the accuracy measure frequent classes have a higher weight. However, because in most applications the correct classification of each class is equally important, we suggest the use of our precision, recall and F-measure. If for some application it is clear which weight of importance each activity should receive, those weights can be used in the calculation of the precision and recall accordingly.
8.6 Related and Future work
The probabilistic models discussed in this chapter represent the state of the art models used in activity recognition. Tapia et al. used the naive Bayes model in combination with the raw feature representation on two real world datasets recorded using a wireless sensor network [16]. HMMs were used in work by Patterson et al. and were applied to data obtained from a wearable RFID reader in a house where many objects are equipped with RFID tags [17]. Wireless sensor network data is more ambiguous than RFID data with respect to activities. For example, when using RFID it is possible to sense which object is used, while with wireless sensor networks only the cupboard that contains the object can be sensed. Because there are usually multiple objects in a cupboard the resulting data is ambiguous with respect to the object used. In work by van Kasteren et al. the performance of HMMs and CRFs in activity recognition was compared on a real world dataset recorded using a wireless sensor network [18]. Duong et al. compared the performance of HSMMs and HMMs in activity recognition using a laboratory setup in which four cameras captured the location of a person in a kitchen setup [19]. One type of model that we have not included in our comparison is the hierarchical model. Hierarchical models have been successfully applied to activity recognition from video data [20], in an office environment using cameras, microphones and keyboard input [21] and on data obtained from a wearable sensing system [22].
The models and datasets presented in this chapter provide a good benchmark for comparing future models. Possible extensions for future work include different feature representations. In the representations presented in this chapter each feature corresponds to a single sensor. However, a representation which captures properties between sensors (e.g. sensor A fired after sensor B) might further improve the recognition performance. Another extension can be the use of a different observation model. In this chapter we used the naive Bayes assumption in all models to minimize the number of parameters needed. Structure learning can be used to learn which dependencies between sensors are most important for recognizing activities, thereby automatically finding a good trade-off between the number of parameters and the complexity of the model. Finally, more elaborate dynamic Bayesian networks can be used in which variables relevant to activity recognition are modeled to increase recognition performance.
A clear downside of the use of probabilistic models is the need for carefully labeled training data to learn the model parameters. Because human activities are quite well understood and wireless sensor networks provide easily interpretable data, there is a large body of work in the logic community focusing on activity recognition. The use of a logic system allows the inclusion of hand crafted rules and therefore limits the need for training data. Many logic systems for activity recognition have been proposed using various kinds of formal logic, such as event calculus [23], lattice theory and action description logic [24] and temporal logic [25]. The benchmark datasets in this chapter can be used to compare the performance of such logic systems to the performance of state of the art probabilistic models in activity recognition.
8.7 Conclusion
This chapter presents several real world datasets for activity recognition that were recorded using a wireless sensor network. We introduced an evaluation method using precision, recall and F-measure, which provides insight into the quality of the classification and takes unbalanced datasets into account. The performance of state of the art probabilistic models was evaluated and these results provide a baseline for comparison with other pattern recognition methods. The code and datasets needed for repeating and extending these experiments are publicly available to the community.
References
[1] J. C. Augusto and C. D. Nugent, Eds. Designing Smart Homes, The Role of Artificial Intelligence, vol. 4008, Lecture Notes in Computer Science, (2006). Springer. ISBN 3-540-35994-X.
[2] D. J. Cook and S. K. Das, Smart Environments: Technology, Protocols and Applications. (Wiley-Interscience, 2004).
[3] G. Abowd, A. Bobick, I. Essa, E. Mynatt, and W. Rogers. The aware home: Developing technologies for successful aging. In Proceedings of AAAI Workshop and Automation as a Care Giver, (2002).
[4] R. Suzuki, M. Ogawa, S. Otake, T. Izutsu, Y. Tobimatsu, S.-I. Izumi, and T. Iwaya, Analysis of activities of daily living in elderly people living alone, Telemedicine. 10, 260, (2004).
[5] D. H. Wilson. Assistive Intelligent Environments for Automatic Health Monitoring. PhD thesis, Carnegie Mellon University, (2005).
[6] I. Rish. An empirical study of the naive bayes classifier. In IJCAI 2001 workshop on empirical methods in artificial intelligence, pp. 41–46, (2001).
[7] L. R. Rabiner, A tutorial on hidden markov models and selected applications in speech recognition, Proceedings of the IEEE. 77(2), 257–286, (1989). URL http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=18626.
[8] K. P. Murphy. Hidden semi-markov models (hsmms). Technical report, University of British Columbia, (2002).
[9] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics). (Springer, August 2006). ISBN 0387310738. URL http://www.amazon.ca/exec/obidos/redirect?tag=citeulike04-20&path=ASIN/0387310738.
[10] C. Sutton and A. McCallum, Introduction to Statistical Relational Learning, chapter 1: An introduction to Conditional Random Fields for Relational Learning, p. (Available online). MIT Press, (2006).
[11] T. van Kasteren, G. Englebienne, and B. Kröse, Activity recognition using semi-markov models on real world smart home datasets, Journal of Ambient Intelligence and Smart Environments. 2(3), 311–325, (2010).
[12] H. Wallach. Efficient training of conditional random fields. Master’s thesis, University of Edinburgh, (2002). URL http://citeseer.ist.psu.edu/wallach02efficient.html.
[13] F. Sha and F. Pereira. Shallow parsing with conditional random fields. In NAACL ’03: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pp. 134–141, Morristown, NJ, USA, (2003). Association for Computational Linguistics. doi: http://dx.doi.org/10.3115/1073445.1073473.
[14] R. H. Byrd, J. Nocedal, and R. B. Schnabel, Representations of quasi-newton matrices and their use in limited memory methods, Math. Program. 63(2), 129–156, (1994). ISSN 0025-5610.
[15] D. C. Liu and J. Nocedal, On the limited memory bfgs method for large scale optimization, Math. Program. 45(3), 503–528, (1989). ISSN 0025-5610. doi: http://dx.doi.org/10.1007/BF01589116.
[16] E. M. Tapia, S. S. Intille, and K. Larson. Activity recognition in the home using simple and ubiquitous sensors. In Pervasive Computing, Second International Conference, PERVASIVE 2004, pp. 158–175, Vienna, Austria (April, 2004).
[17] D. J. Patterson, D. Fox, H. A. Kautz, and M. Philipose. Fine-grained activity recognition by aggregating abstract object usage. In ISWC, pp. 44–51. IEEE Computer Society, (2005). ISBN 0-7695-2419-2. URL http://doi.ieeecomputersociety.org/10.1109/ISWC.2005.22.
[18] T. van Kasteren, A. Noulas, G. Englebienne, and B. Kröse. Accurate activity recognition in a home setting. In UbiComp ’08: Proceedings of the 10th international conference on Ubiquitous computing, pp. 1–9, New York, NY, USA, (2008). ACM. ISBN 978-1-60558-136-1. doi: http://doi.acm.org/10.1145/1409635.1409637.
[19] T. Duong, D. Phung, H. Bui, and S. Venkatesh, Efficient duration and hierarchical modeling for human activity recognition, Artif. Intell. 173(7-8), 830–856, (2009). ISSN 0004-3702. doi: http://dx.doi.org/10.1016/j.artint.2008.12.005.
[20] S. Luhr, H. H. Bui, S. Venkatesh, and G. A. West, Recognition of human activity through hierarchical stochastic learning, percom. 00, 416, (2003). doi: http://doi.ieeecomputersociety.org/10.1109/PERCOM.2003.1192766.
[21] N. Oliver, A. Garg, and E. Horvitz, Layered representations for learning and inferring office activity from multiple sensory channels, Comput. Vis. Image Underst. 96(2), 163–180, (2004). ISSN 1077-3142. doi: http://dx.doi.org/10.1016/j.cviu.2004.02.004.
[22] A. Subramanya, A. Raj, J. Bilmes, and D. Fox. Hierarchical models for activity recognition. In IEEE Multimedia Signal Processing (MMSP) Conference, Victoria, CA (October, 2006).
[23] L. Chen, C. Nugent, M. Mulvenna, D. Finlay, X. Hong, and M. Poland. Using event calculus for behaviour reasoning and assistance in a smart home. In ICOST ’08: Proceedings of the 6th international conference on Smart Homes and Health Telematics, pp. 81–89, Berlin, Heidelberg, (2008). Springer-Verlag. ISBN 978-3-540-69914-9. doi: http://dx.doi.org/10.1007/978-3-540-69916-3_10.
[24] B. Bouchard, S. Giroux, and A. Bouzouane. A Logical Approach to ADL Recognition for Alzheimer’s patients. In Smart Homes And Beyond: Icost 2006, 4th International Conference on Smart Homes and Health Telematics, p. 122. IOS Press, (2006).
[25] A. Rugnone, F. Poli, E. Vicario, C. D. Nugent, E. Tamburini, and C. Paggetti. A visual editor to support the use of temporal logic for adl monitoring. In ICOST, pp. 217–225, (2007).
Chapter 9
Smart Sweet Home... A Pervasive Environment for Sensing our Daily Activity?
Norbert Noury¹, Julien Poujaud, Anthony Fleury, Ronald Nocua, Tareq Haddidi², Pierre Rumeau³
INL INSA Lyon, Bat. Leonard de Vinci, Av. Jean-Capelle, 69621 Villeurbanne, France
[email protected]
Abstract
Humans have deeply modified their relationship to their housing over the past centuries. Once a shelter where humans could find protection and rest, the living place successfully evolved to become the midpoint of the family, the expression of one's own culture and nowadays a more self-centered place where individuals develop their personal aspirations and express their social position. With the introduction of communication technologies, humans may become nomads again, with the ability to stay connected with others in any place at any time; paradoxically, we can also observe a wide movement towards “cocooning”. Among all the services a living place can bring to its inhabitants, we may list comfort, security, wellness and also health services. Thus a new living place is to be invented, becoming the “witness” of our breath, perceiving the inhabitants' rhythms of activities, habits, tastes and wishes. Eventually, the “smart home” becomes the “Health Smart Home”, enabling the follow-up of physical and health status and meeting the new concepts of “Aging in place” and “citizen health care”. We listed some of the research projects in Health Smart Homes that were launched worldwide and found that they are mostly based on very basic sensors and simple algorithms. We experimented with our own Health Smart Home to show that temporal analysis of the data output by simple presence sensors is already worthwhile. We first produced “ambulatograms”, a temporal representation of the daily activity gathered from the presence sensors, and then discovered regular patterns of activities which we named “circadian activity rhythms (car)”, the direct relationship between night and day levels of activity and also the information contained in periods of inactivity. We now concentrate on the automatic recognition of the daily activities with multiple sensor fusion methods.
¹ Prof. N. Noury is with INL UMR CNRS 5270, Lyon, and associate researcher at TIMC-IMAG UMR CNRS 5521, Grenoble.
² J. Poujaud, A. Fleury, R. Nocua and T. Hadidi are with TIMC-IMAG UMR CNRS 5521.
³ Dr P. Rumeau is with hospital CHU Toulouse.
9.1 Introduction
Our living habitat has constantly evolved throughout the history of humanity. In early times caves were only a refuge. Still, this basic shelter allowed humans to create their unique concept of Family. Later on, humans shaped their housing following their cultural differences and religious beliefs. Indeed, the features of our living habitat (its spatial organization, style of occupation, role as center of human activities) are fundamental to our culture and to the evolution of mankind. They can help us understand our present living models and our interactions in a more complex societal context based on the dual wish to communicate globally and, at the same time, to reach individual accomplishment. The first technology introduced in our homes was electricity. It brought artificial lighting, allowing humans to maintain their activities after sunset. Electricity also opened our homes to numerous technologies which changed our lives. The fridge contributed to improving the safety of our diet, with major consequences for our longevity. The washing machine contributed to the emancipation of women, who could then participate more actively in the economy. The latest revolution in the home is the telephone, which made it possible to communicate with others. The Internet amplified this phenomenon with the possibility to instantly share communications and information. Thus, there is a deep aspiration in humans to achieve fulfillment in a single place while enlarging their field of influence. This “referential” is the home. Smart Homes technology offers a new chance to meet the deep needs of humans to develop their own personality in a secure environment of wellness and good health, if it is reliable and brings real services to mankind. After a synthesis of several worldwide research projects in the domain of “Health Smart Homes”, we make the hypothesis that useful information can be elaborated from simple presence sensors. We start from the “ambulatograms”, a temporal representation of the daily activity gathered from the presence sensors, to discover regular patterns of activities, which we named “circadian rhythms of activities”, and then we show the direct relationship between night and day levels of activity and the useful information coming from periods of inactivity. We further concentrate on the automatic recognition of the daily activities with multiple sensor fusion methods.
9.2 Daily Activities at Home
Life is movement; human beings can therefore be described as “moving beings”, performing a set of activities throughout the day to meet both their basic needs and higher aspirations. For the elderly subject, the ability to live independently in his own home depends on his autonomy to perform the basic actions involved in daily living: to transfer to/from bed and in/out of a chair, to move around and out of the flat, to wash, to use the toilet, etc. Actually, there is a direct relationship between the number of activities performed daily by the person and his level of autonomy. Furthermore, some age-related diseases (cognitive impairments or mild dementia such as Alzheimer’s disease) have a direct impact on the daily activities of the elderly.

Table 9.1 Some Smart Homes programs in Asia.

Localization                       Name                         Main Findings                                           References
Japan, Chiba University            WTH (Welfare Techno House)   Instrumentation of the toilets                          Tamura [1, 2]
Japan (1990)                       Ubiquitous House             Activities: sensing floors, microphones and ceiling     Yamazaki [3]
                                                                cameras. Human Machine interactions with passive
                                                                and active robots (Phyno)
Japan, Tokyo University            Sensing room                 Activities: collection of data on human activities      Noguchi [4]
                                                                with presence sensors and door contacts
Japan, Ibaraki                     SELF                         Physiological and psychological information.            Nashida [5]
                                                                Model of “reference” health status
South Korea                        uHouse                       Activities: (pyroelectric) presence sensors             Park [6]
South Korea                        UbiHome                      Activities: presence sensors and automatic control      Woo [7]
                                                                of lights
Australia, NSW University Sydney                                Activities: PIR(a) sensors, pressure sensors, light     Celler [8]
                                                                sensors. Health: physiological sensors
(a) Presence Infrared Sensors.

9.2.1 State of the art in health smart homes
A common research interest in ubiquitous computing is the development of inexpensive and easy-to-deploy sensing systems to support activity detection and context-awareness applications in the home. This research finds an application in maintaining at home the fast-growing population of elderly people who wish to carry on living independently. The problem has been addressed with experimental platforms requiring large numbers of sensors, invasive vision systems, or extensive installation procedures. Several researchers have explored modalities using arrays of low-cost sensors, such as motion detectors or simple contact switches. Although these solutions are cost-effective on an individual sensor basis, they have their drawbacks. For example, the presence of many visible sensors may modify the habits of the subject. It may also detract from the aesthetics of the home. Additionally, installing and maintaining many sensors may be a time-consuming and costly task. Research on Health Smart Homes was first motivated by the aging of populations. This is probably the reason why it first started in Japan in the mid-eighties (Table 9.1). West European countries followed at the beginning of the nineties (Table 9.2). North America entered the field in the new century (Table 9.3), mainly motivated by industrial reasons after identifying this new promising market. Research and development projects are now numerous in the field of health smart homes because it has a high societal impact. It is also an attractive field of application for existing research on sensors, image and signal processing, communication engineering and human-machine interaction. Interest in this field has also grown with the support of the European Community (Framework Programs FP6, then FP7).

Table 9.2 Some Smart Homes programs in Europe.
Localization | Name | Main Findings | References
United Kingdom | CarerNet (MIDAS) | | Williams [9]
United Kingdom | ASH (Adaptable Smart Home) | | Richardson [10]
The Netherlands, Philips inc., Eindhoven | ENABLE | Communication technologies to reduce dependence | Van Berlo [11]
Norway, Tonsberg | SmartBo | Assistance: home automation technologies | Elger [12]
France, Grenoble Univ. | HIS (AILISA) | Activities and Fall: PIR sensors, wearable sensors, sound capture (microphones); Model of activities and detection of abnormal situations; Health status: network of physiological sensors (weight, blood pressure) | Noury [13]
France, LAAS Toulouse | ProSAFE (Ergdom) | Activities: presence sensors (pyroelectric), detection of absence during nighttime; Comfort: detect electrical consumption habits (ANN^a) | Chan [14]
Spain, Ingema | I2Home | Assistance: remote controls | b
a Artificial Neural Networks. b www.i2home.org
9.3
Detection of activities with basic PIR sensors
9.3.1
PIR sensors
Table 9.3 Some Smart Homes programs in North America.
Localization | Name | Main Findings | References
Colorado | ACHE | Comfort: sensors of electrical consumption with algorithms to adapt to the habits of inhabitants (ANN) | Mozer [15]
Pennsylvania, Philadelphia, Drexel University | ADL Suite | Monitoring a wide variety of interactions with objects in the environment (microwave, stove, dishwasher, kitchen drawers and cabinets, toilet, bed, and TV set). The suite was also equipped with a video camera system that recorded the real-time activity (food preparation, medication taking, transferring) | Glascock [16]
Arlington, Univ. Texas | MavHome | Comfort: statistical studies of electrical habits | Cook [17]
Florida | Gator Tech | Comfort: video cameras, movement sensors (US^a), RFID | Helal [18]
Georgia Tech. | Aware Home | Activities: smart floors (gait signature of the person), RFID tags (finding objects) | Abowd [19]
Microsoft corp. | Easy Living | Comfort: electrical and water consumption; Video cameras inserted in a mirror | Krumm [20]
Boston, MIT | House_n | Wellness: energy consumption, air quality (CO and CO2), temperature and humidity, sound capture | Intille [21]
Columbia, Univ. Missouri | Aging in place | Activities: RFID tags (localization of objects) | b
Oregon, Portland | Elite Care (OctField) | Activities: wireless sensors to detect movements | c
Canada, Toronto | | Activities: vision cameras (mild dementia and Alzheimer's); Activities, Fall: vision sensors (placed on ceilings) | Mihailidis [22]
Canada, Univ. Sherbrooke | DOMUS | Activities in kitchen (vision cameras, sensing floors) | Pigot [23]
a Ultrasound. b www.aginginplaceinitiative.org c www.elitecare.com

The “Presence Infra Red” detectors, also called PIR sensors, are commonly used in alarm systems to detect intrusions. The sensing element is passive and detects only variations of the energy received at infrared wavelengths, such as those emitted at the surface of the body. To detect movement, a Fresnel lens is added to produce several “light paths” for the waves, which creates a differential signal when the body crosses 2 successive paths (Fig. 9.1). PIR sensors can therefore only detect movements, not presence; they were built to detect intrusions. They are also prone to misdetections due to their sensitivity to any IR source (heating appliances, sunlight) and to their limited detection area (reduced detection volume, limited light paths). Additionally, they cannot tell one person from another, which makes it difficult to distinguish the activities of multiple inhabitants. Nevertheless, the main activities of a single subject can easily be detected with PIR sensors correctly positioned around the home (Fig. 9.2) and a simple algorithm based on a state machine [24], which considers that the subject is located where he was last detected. Furthermore, for security reasons, PIR sensors are likely to be already installed in the environment of the elderly.
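As an illustration, the last-detection rule can be sketched in a few lines of Python. This is a minimal sketch and not the rule-based system of [24]: the event format (a timestamp in seconds and a room name), the function name `localize` and the sampling step are assumptions made for the example.

```python
from typing import Iterable, List, Tuple

def localize(events: Iterable[Tuple[int, str]], start: int, end: int,
             step: int = 60) -> List[Tuple[int, str]]:
    """Sample the subject's location every `step` seconds.

    The only state is the room of the most recent PIR detection:
    the subject is assumed to stay there until another sensor fires.
    """
    pending = sorted(events)      # detections ordered by time
    current = "unknown"           # no detection seen yet
    k = 0
    timeline = []
    for t in range(start, end, step):
        # consume every detection that happened up to time t
        while k < len(pending) and pending[k][0] <= t:
            current = pending[k][1]
            k += 1
        timeline.append((t, current))
    return timeline

# Hypothetical detections: (time in seconds, room)
events = [(10, "bedroom"), (130, "kitchen"), (135, "kitchen"), (400, "living room")]
timeline = localize(events, start=0, end=480, step=120)
```

With a single inhabitant this rule is sufficient; with several inhabitants the state would have to be kept per person, which PIR sensors alone cannot provide.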
Fig. 9.1 The moving body is detected through the differential signal that appears on the single IR sensor when crossing 2 successive light paths of the Fresnel lens.
Fig. 9.2 Since each PIR sensor has a limited detection area, the detection areas of neighbouring sensors must overlap.
9.3.2
The HIS of Grenoble
We first installed our HIS (“Habitat Intelligent pour la Santé”) in 1998, inside the Faculty of Medicine of Grenoble. This real flat (47 square meters) is composed of a bedroom, a living room, a hall, a kitchen, and a bathroom with a shower and a cabinet (Fig. 9.3). A technical supervision room, next to the flat, was installed to house the network hub and the information infrastructure. Each room is equipped with PIR sensors, connected wirelessly to a local CAN network. Additional sensors were connected to the local field bus in order to capture physiological parameters (weight, blood pressure, SpO2) together with ambient parameters (temperature, humidity, luminosity level). The HIS was later reproduced in several settings in geriatric hospitals and institutions for the elderly, within the French national project AILISA [25], and we mainly focused our research on the interpretation of the activity data from the presence sensors.
Fig. 9.3 The HIS of Grenoble is a real flat composed of a bedroom, a living room, a kitchen, and a bathroom with shower and cabinet. A local network gathers data from 3 kinds of sensors: Physiological, Ambient and Activity.
9.4
Ambulatograms
Thanks to the results gathered in real living places within the AILISA project, we created a new representation of daily activity, which we named “ambulatograms” [26] (Fig. 9.4).
Fig. 9.4 Ambulatogram of activity on a daily basis (horizontal scale is time in hours; vertical scale is the number of rooms visited).
Our “ambulatogram” already points out the main periods of activity and inactivity, together with the spatial frequencies (a high frequency of activity corresponds to closely spaced vertical lines). From the raw signal of all the detection events (the ’log file’), we compute a series of events $n = S^{*}(j, i)$, with $j \in [\text{start day}, \text{end day}]$, $i \in [1, 86400]$ seconds and $n \in [0, 5]$ the sensor number (in this case, 1 to 5). To further analyze the distribution of the event density, we create the daily signal $S_J(i) = S(J, i)$ (for each day $J$) and then the cumulated signal $A_J(i)$, which equals 1 if $S_J(i) = 1$ and 0 if $S_J(i) = 0$. After summing $A_J(i)$ from day $J = M$ to $N$, we eventually obtain the signal $A_{MN}(i) = \sum_{J=M}^{N} A_J(i)$. The graphical representation of this signal is the “Agitation profile”, which already shows regular patterns of activity (rhythms) of the inhabitants (Fig. 9.5).
Fig. 9.5 An example of the signal “Agitation profile” (top). After applying a convolution with a Hanning time window (bottom) the circadian rhythms are visible.
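The construction of the agitation profile and its smoothing can be sketched as follows. This is only an illustrative sketch under stated assumptions: the daily signal is taken to be 1 in every time bin holding at least one detection, the day is cut into a small number of bins instead of 86400 seconds, and the convolution is made circular because the profile wraps around midnight. The function names are hypothetical.

```python
import math
from typing import List

def daily_signal(detections: List[int], n_bins: int) -> List[int]:
    """A_J(i): 1 for every time bin of day J holding a detection, else 0."""
    a = [0] * n_bins
    for i in detections:
        a[i] = 1
    return a

def agitation_profile(days: List[List[int]], n_bins: int) -> List[int]:
    """A_MN(i): sum of the daily signals A_J(i) over the observed days."""
    profile = [0] * n_bins
    for detections in days:
        for i, value in enumerate(daily_signal(detections, n_bins)):
            profile[i] += value
    return profile

def hann_smooth(signal: List[float], width: int) -> List[float]:
    """Circular convolution with a Hann (Hanning) window normalized to sum 1."""
    weights = [0.5 - 0.5 * math.cos(2 * math.pi * (k + 1) / (width + 1))
               for k in range(width)]
    total = sum(weights)
    half = width // 2
    n = len(signal)
    return [
        sum(weights[k] * signal[(i + k - half) % n] for k in range(width)) / total
        for i in range(n)
    ]

# Three hypothetical days, each given as the list of bins with a detection
days = [[1, 2, 5], [2, 5], [2]]
profile = agitation_profile(days, n_bins=8)
smoothed = hann_smooth(profile, width=3)
```

Because the window is normalized, smoothing preserves the total amount of activity and only spreads it over neighbouring bins, which is what makes the slow rhythms stand out.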
9.5
Circadian activity rhythms?
The first study held in the HIS led to the discovery of the circadian rhythms of living activities [27] of the inhabitants. Over successive days, we simply sum the number of minutes spent each hour in each room, to obtain a series of values for each room and for each hour (Fig. 9.6). For each series, we calculate its mean value (m) and standard deviation (s). As these values belong to the interval [0; 60 minutes], we normalize them by dividing by 60 minutes. Thus, a value of “1” corresponds to a room always occupied whereas a value of “0” corresponds to a room never occupied. We can therefore treat these values as a probability (of presence). After a long period, this gives a good approximation of the person’s habits. We then find that, for each room, the successive hourly mean values follow a biological rhythm (Fig. 9.7).
Fig. 9.6 Lapses of time recorded in one room (here the kitchen between 8 and 9 o’clock) during a stay of 70 days.
Fig. 9.7 Biological rhythm per each room (after a stay of 70 days).
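The hourly statistics described above can be sketched as follows. This is a minimal illustration, assuming that the occupancy of one room is already available as a list of days, each holding 24 values in minutes; the use of the population standard deviation and the helper names are assumptions of the example.

```python
from statistics import mean, pstdev
from typing import List, Tuple

def presence_profile(minutes_per_day: List[List[float]]) -> List[Tuple[float, float]]:
    """For one room, return (m, s) for each hour of the day.

    Minutes are divided by 60 so that m can be read as a
    probability of presence between 0 (never) and 1 (always).
    """
    profile = []
    for hour in range(24):
        series = [day[hour] / 60.0 for day in minutes_per_day]
        profile.append((mean(series), pstdev(series)))
    return profile

def make_day(minutes_at_8: float) -> List[float]:
    """Hypothetical day where the room is only used between 8 and 9 o'clock."""
    day = [0.0] * 24
    day[8] = minutes_at_8
    return day

kitchen = [make_day(30), make_day(45), make_day(15)]
m, s = presence_profile(kitchen)[8]
```

Plotting the 24 mean values of each room gives the kind of rhythm curves shown in Fig. 9.7.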
We can further trigger an alarm in case of an abnormal deviation of activities with 2 symmetrical thresholds $S$, $S^{*}$: $[S, S^{*}] = [m - \mu \cdot s,\ m + \mu \cdot s]$, with $\mu$ an adjustable parameter. Any behavior outside the $[S, S^{*}]$ interval is considered unusual, or even critical, depending on the value of the parameter $\mu$, which sets the alarm trigger level. For instance, for a Gaussian distribution of the lapses of time,

$$f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x - m)^2}{2 s^2}\right), \quad \text{with } m = 0 \text{ and } s = 1,$$

“$\mu_1 = 1.5$” would give a “benign” confidence interval $[m - 1.5 s,\ m + 1.5 s]$ which includes 87% of the observations. Thus a minor alert is triggered if the benign threshold is exceeded, with a probability $P_{ext} = 13\%$, where

$$P_{ext}(\mu) = \int_{\mu}^{\infty} 2 f(x)\, dx.$$

The “$\mu_2 = 2$” value triggers a major alert with a probability $P_{ext} = 5\%$, as the limit interval $[m - 2s,\ m + 2s]$ corresponds to a probability of 95% of being inside this interval. This was further evaluated in assisted living units in St. Paul, MN, by Virone [28], involving 22 participants, median age 85, for periods ranging from 3 months to 1 year. This validated the method and demonstrated the possibility to launch alarms in case of abnormal deviations in the rhythms of activities.
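The two alert levels can be checked numerically. The sketch below uses the complementary error function, since for a standard Gaussian the two-sided tail probability is $P_{ext}(\mu) = \operatorname{erfc}(\mu / \sqrt{2})$; the function names and the three-level output are assumptions made for the illustration.

```python
import math

def p_ext(mu: float) -> float:
    """Probability of falling outside [m - mu*s, m + mu*s] for a Gaussian."""
    return math.erfc(mu / math.sqrt(2.0))

def alert_level(x: float, m: float, s: float,
                mu_minor: float = 1.5, mu_major: float = 2.0) -> str:
    """Compare an observed value x with the two symmetrical thresholds."""
    deviation = abs(x - m)
    if deviation > mu_major * s:
        return "major"
    if deviation > mu_minor * s:
        return "minor"
    return "normal"

minor_probability = p_ext(1.5)   # about 0.13
major_probability = p_ext(2.0)   # about 0.05

# Hypothetical hourly presence with mean 0.5 and standard deviation 0.2
levels = [alert_level(x, m=0.5, s=0.2) for x in (0.6, 0.85, 0.95)]
```

The computed values (roughly 13.4% and 4.6%) agree with the rounded figures of 13% and 5% quoted in the text.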
9.6
Night and day alternation
In an attempt to find a possible correlation between health status and the expression of activity, we rediscovered the alternation between night and day levels of activity [29], which can be stated very simply: the diurnal level of activity is lower after a bad sleep (restless nighttime). Thus, we formulated the following postulate: a high (or low) level of activity during both the day and the night may be abnormal. This was further confirmed by the geriatricians in 2 hospitals where we followed up 5 elderly persons over several months (see example in Fig. 9.8). In order to better point out the correlation between the diurnal and nocturnal activities, we simply computed the trends (first derivative) of both signals. Thus, when an “abnormal situation” appears, the 2 signals are synchronized and their trends both have the same sign (positive or negative), as in Fig. 9.9. Within the AILISA project, we recorded long-term data in our hospital suites at Hopital Charles Foix, Paris, and Centre de Geriatrie Sud, Grenoble, together with written observations from the nurses. When presenting the results to physicians, we discovered that the periods of synchronization corresponded to periods when the patient felt poorly and complained a lot to the medical team.
Fig. 9.8 The diurnal and nocturnal activities are mostly correlated (monitoring of an elderly person during a 2-month period, May to July 2007).
Fig. 9.9 Periods when both trends have the same sign correspond to periods of complaint of the patient.
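The comparison of the two trends can be sketched as follows. This is a minimal illustration, assuming one activity level per day and per night and a simple finite difference as the first derivative; in practice the signals would probably be smoothed first. The function names and the data are hypothetical.

```python
from typing import List

def trend(signal: List[float]) -> List[float]:
    """First derivative approximated by the difference between successive values."""
    return [b - a for a, b in zip(signal, signal[1:])]

def synchronized(day_activity: List[float], night_activity: List[float]) -> List[bool]:
    """True where the diurnal and nocturnal trends have the same sign."""
    return [d * n > 0
            for d, n in zip(trend(day_activity), trend(night_activity))]

# Hypothetical daily and nightly activity levels over 5 consecutive days
day_activity = [10, 12, 9, 11, 14]
night_activity = [3, 2, 4, 5, 7]
flags = synchronized(day_activity, night_activity)
```

Runs of consecutive `True` values are the periods that would be reported to the medical team for comparison with the nurses' observations.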
9.7
Inactivity of Daily Living?
The inactive periods may also provide useful information about human activity [30]. The duration of sleeping periods is probably the first example, but we can also imagine that an abnormal loss of activity might raise an alarm. This time, the log file (date, time, and event) is first converted into a list of time delays between two successive detections, then cut into separate series for each room of the flat, as the activity differs in each room (e.g. activities in the bedroom are different from activities in the bathroom). Within the AILISA project, we recorded data for 2 months in one of the private flats of the Residence Notre Dame, Grenoble. An example of the inactivity distribution for the living room is given in Fig. 9.10.
Fig. 9.10 Histogram of time of inactivity.
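The conversion of the log file into inactivity series can be sketched as follows. This is an illustrative sketch only: the log format (timestamp in seconds, room name) is an assumption, and each delay is attributed to the room of the earlier detection, on the grounds that the subject is considered to be where he was last detected.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def inactivity_by_room(log: List[Tuple[int, str]]) -> Dict[str, List[int]]:
    """Delays (in seconds) between successive detections, split per room."""
    delays = defaultdict(list)
    for (t0, room), (t1, _next_room) in zip(log, log[1:]):
        delays[room].append(t1 - t0)
    return dict(delays)

def histogram(delays: List[int], bin_width: int) -> Dict[int, int]:
    """Number of delays falling in each bin, keyed by the bin's lower edge."""
    return dict(Counter((d // bin_width) * bin_width for d in delays))

# Hypothetical log, already sorted by time
log = [(0, "living room"), (20, "living room"), (95, "kitchen"),
       (100, "kitchen"), (160, "living room")]
by_room = inactivity_by_room(log)
living_room_histogram = histogram(by_room["living room"], bin_width=30)
```

Splitting the series further by period of the day (morning, afternoon, evening, night) follows the same pattern, with the period added to the dictionary key.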
These data series do not fit a normal distribution, so we used a non-parametric test (i.e. Kruskal-Wallis) instead of a more classical one (i.e. ANOVA) to test the independence of mean and variance. The test confirms that the mean and variance ranks are statistically independent (Fig. 9.11). In order to set an alarm threshold on the inactivity, we can either work directly on the bar graph (histogram), or fit a statistical model to the density of inactivity. The exponential law is commonly used to represent process lifetimes ($\lambda$-distribution). Let $t$ be the time between two detections and $X$ the random variable of the process lifetime. If the mean value is $E(X)$ and if the remaining lifetime after a time $T$ is independent of $T$, then the probability density $f(t)$ of $X$ is

$$f(t) = \begin{cases} 0 & \text{for } t < 0 \\ \dfrac{1}{E(X)} \exp\left(-\dfrac{t}{E(X)}\right) & \text{for } t \geq 0 \end{cases} \qquad (9.1)$$

We then compute this probability density for each data series. As an example, Fig. 9.12 shows the probability density of inactivity in the living room during afternoons. From this density, we can set a reliable threshold on the probability $P(X > t)$ with a confidence interval of 95% in each sample of population. The following results (Table 9.4) show the possibility to launch an alarm in case of under-activity in a room (here the living room), for a given period of the day.
Fig. 9.11 Variance and mean ranks of inactivity.
Fig. 9.12 The density of probability of inactivity in the living room follows an exponential law.
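Both ways of setting the threshold can be sketched in Python. Under the exponential model of Eq. (9.1), $P(X > t) = \exp(-t / E(X))$, so the delay exceeded with probability $1 - c$ is $t = -E(X) \ln(1 - c)$. The empirical variant below reads the same quantile directly from the sorted delays, which is one plausible reading of the "bar graph" method and is stated here as an assumption. Function names and data are hypothetical.

```python
import math
from typing import List

def exponential_threshold(delays: List[float], confidence: float = 0.95) -> float:
    """Threshold t such that P(X > t) = 1 - confidence under the exponential law."""
    mean_delay = sum(delays) / len(delays)      # estimate of E(X)
    return -mean_delay * math.log(1.0 - confidence)

def empirical_threshold(delays: List[float], confidence: float = 0.95) -> float:
    """Smallest observed delay not exceeded by a fraction `confidence` of the data."""
    ordered = sorted(delays)
    rank = math.ceil(confidence * len(ordered))
    return ordered[rank - 1]

# Hypothetical inactivity delays (seconds) for one room and one period
delays = [5, 8, 12, 15, 20, 22, 30, 41, 60, 95]
t_model = exponential_threshold(delays)
t_graph = empirical_threshold(delays)
```

An inactivity longer than the chosen threshold in the monitored room and period would then raise the under-activity alarm.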
Table 9.4 Thresholds on time of inactivity for different periods (Living room).
Method | Morning | Afternoon | Evening | Night
Bar Graph | 83 s | 139 s | 235 s | 93 s
Exponential Model | 70 s | 107 s | 183 s | 83 s

9.8
Activities of daily living
The “Activities of Daily Living” concern “all the actions we normally do in daily living including any daily activity we should perform in full autonomy for our own self-care (such as feeding ourselves, bathing, dressing, and grooming)”. Professional practitioners (e.g. geriatricians) currently measure the level of autonomy of a fragile person by evaluating the ability of the person to perform, on their own, a selection of activities which are essential to independent everyday living. One such scale, the ADL [31], involves 6 tasks (bathing, dressing, toileting, transferring, continence and eating) which are individually assessed by the professional as being “autonomous”, “partially autonomous” or “non autonomous” (Fig. 9.13) for a given patient.
Fig. 9.13 Each of the 6 individual ADLs is evaluated as being performed in an “Autonomous”, “Partially autonomous” or “Non autonomous” way.
If even one of the activities is “non autonomous”, the subject is declared “dependent” and will need either assistance at home or placement in an institution for the elderly. This assessment is obviously operator-dependent. Also, it cannot be performed frequently enough to detect the slow trend involved in the loss of autonomy. Thus, there is a need for a system which can perform this evaluation more objectively and more regularly, in order to detect the loss of autonomy early and, possibly, provide some assistance to prolong independent living at home. Moreover, the ADL scale was designed to evaluate the autonomy of elderly people in geriatric institutions. It is not informative enough in the conditions of independent daily living. Therefore, researchers [32, 33] considered 5 activities (hygiene, continence, feeding, sleeping, walking) within 4 time periods (day, night, morning, evening). This refinement of the ADL leads to a combination of 10 activities with time periods (Fig. 9.14).
Fig. 9.14 A proposal of refinement of ADL with distribution over time (horizontal scale is time of day).
9.9
On the automatic detection of the ADL
We stated earlier the possible benefit of evaluating the ADL more objectively and more regularly, in order to detect the slow drift towards dependence. For example, the observation of a reduced number of meals taken each day (e.g. only 2 instead of 3) is a clinical sign of “abandon”. If detected early, a preventive action can be taken (meal servicing, health inquiries). In our study we focused on the following 8 “scenarios” of activities (Sc): Walking (Wa), Breakfast (Bk), Lunch (Lu), Dinner (Di), Dressing (Dr), Toileting (To), Recreation (Re) and Sleeping/Resting (Sr). We considered 3 “sensing axes”: localization indoors (L), posture of the subject (P) and time (H). The first one was obtained with the PIR sensors placed in the HIS. The posture of the wearer was determined with a kinematic sensor, developed earlier in our lab around a tri-axis accelerometer, which also produced additional information on the level of activity (Static/Dynamic, SD), transfers (T) and walking phases (W). The time was delivered by an electronic clock which was also used to measure the duration of an activity within a period of the day. We then elaborated a “reference matrix (RT)” (Table 9.5) with all the expected values of our 6 pieces of information (I) according to our 8 scenarios of activities (Sc).

Table 9.5 The Reference Matrix (RT) holds the range of values of information (I) for each activity (Sc).
Activity (Sc) | Hour (H) | Location (L) | Transfers (T) | Posture (P) | Static/Dyn (SD) | Walking (W)
Sleeping (Sr) | [19h, 20h, 7h, 8h] | Bedroom | No | Lying | Static | No
Recreation (Re) | [9h, 10h, 17h, 18h] | Living room, Bedroom | No | Standing, bent, Lying | Static | No
Toileting (To) | any | WC | Bts, Sts | Standing, Lying | Static | No
Dressing (Dr) | [7h, 8h, 9h, 10h] | Bedroom, Living room, Shower room, WC | No | Standing, Leaning | Dynamic | No
Breakfast (Bk) | [5h, 6.5h, 7.5h, 9h] | Kitchen | Bts, Sts | Standing, Leaning | Static | No
Lunch (Lu) | [10h, 11.5h, 12.5h, 14h] | Kitchen | Bts, Sts | Standing, Leaning | Static | No
Dinner (Di) | [16h, 18.5h, 19.5h, 21h] | Kitchen | Bts, Sts | Standing, Leaning | Static | No
Walking (Wa) | any | 2 different rooms | No | Standing, Leaning | Dynamic | Yes

The process of selecting the best activity scenario ($S_{Sc}$) among the 8 possible scenarios ($S_i$), knowing the information (I), can be performed with conditional probabilities (Bayes formula):

$$P(S_{Sc} \mid I) = \frac{P(I \mid S_{Sc})\, P(S_{Sc})}{\sum_i P(I \mid S_i)\, P(S_i)}$$

with the scenarios $Sc \in [1; 8]$ and the piece of information $I \in \{H, L, T, P, SD, W\}$. For instance, if we consider the scenario “sleeping” (Sr),

$$P(S_{Sr} \mid I) = \frac{P(S_{Sr} \mid H)\, P(S_{Sr} \mid L)\, P(S_{Sr} \mid T)\, P(S_{Sr} \mid P)\, P(S_{Sr} \mid SD)\, P(S_{Sr} \mid W) / P(S_{Sr})}{\sum_i P(S_i \mid H)\, P(S_i \mid L)\, P(S_i \mid T)\, P(S_i \mid P)\, P(S_i \mid SD)\, P(S_i \mid W) / P(S_i)}$$

where $S_i$ is one of the $Sc \in [1; 8]$. Eventually, the criterion of “maximum a posteriori” elects the scenario which obtains the highest conditional probability. But this requires eight conditional probabilities which are difficult to estimate. Instead, we proposed to consider the “weight” of each piece of information in each activity (matrix DSW, Table 9.6).
We first segment all our signals into time slots of equal length (LFen = 15 seconds, in order to accurately detect the shortest scenarios). In each time window (w), we compute the dominant value of each piece of information (the hour (H) at the middle of the time window, the location (L) where the person mostly stayed, the posture (P) mainly reported, ...). We obtain the vector “Trend of Information” $TI_{6,1}(w)$ with 6 values, which are then compared to the “Reference Matrix” RT (Table 9.5). The distance from the vector TI(w) to each line of the matrix RT produces a new matrix “Distance to the Scenarios” (DS): $DS_{6,8}(Sc, I) \in [0; 1]$, with $Sc \in [1; 8]$ and $I \in \{H, L, T, P, SD, W\}$.
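The segmentation into 15-second windows and the computation of the vector TI(w) can be sketched as follows. This is a minimal illustration under assumptions: one sample per second, each sample stored as a dictionary keyed by H, L, T, P, SD and W, the most frequent value taken as the dominant value of a window, and incomplete trailing windows dropped. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

WINDOW = 15  # seconds, called LFen in the text

def dominant(values: List[str]) -> str:
    """Most frequent value observed in the window."""
    return Counter(values).most_common(1)[0][0]

def trend_of_information(samples: List[Dict], window: int = WINDOW) -> List[Dict]:
    """Return one TI vector (6 values) per complete time window."""
    vectors = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        ti = {key: dominant([s[key] for s in chunk])
              for key in ("L", "T", "P", "SD", "W")}
        ti["H"] = chunk[len(chunk) // 2]["H"]   # hour at the middle of the window
        vectors.append(ti)
    return vectors

# 30 seconds of hypothetical data around noon
samples = [
    {"H": 12.0 + sec / 3600.0,
     "L": "Kitchen" if sec < 20 else "Living room",
     "T": "No",
     "P": "Standing" if sec < 12 else "Leaning",
     "SD": "Static",
     "W": "No"}
    for sec in range(30)
]
vectors = trend_of_information(samples)
```

Each resulting vector is then compared, value by value, with the lines of the reference matrix RT.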
Table 9.6 Matrix (DSW) of the weights of each information (I) for each activity (Sc).
Activity (Sc) | H | L | T | P | SD | W
Sleeping (Sr) | 0.2 | 0.4 | 0.05 | 0.2 | 0.15 | 0
Recreation (Re) | 0.3 | 0.3 | 0.05 | 0.15 | 0.2 | 0
Toileting (To) | 0 | 0.55 | 0.2 | 0.1 | 0.15 | 0
Dressing (Dr) | 0.15 | 0.2 | 0.1 | 0.1 | 0.45 | 0
Breakfast (Bk) | 0.3 | 0.4 | 0.1 | 0.1 | 0.1 | 0
Lunch (Lu) | 0.3 | 0.4 | 0.1 | 0.1 | 0.1 | 0
Dinner (Di) | 0.3 | 0.4 | 0.1 | 0.1 | 0.1 | 0
Walking (Wa) | 0 | 0.2 | 0.05 | 0.05 | 0.2 | 0.5
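To illustrate how these weights are used, the sketch below scores one window against two scenarios: each piece of information contributes its weight when the observed value matches the reference, and the scenario with the highest weighted sum is retained. It is a simplified illustration rather than the authors' implementation: only the Sleeping and Lunch lines are encoded, the hour ranges of Table 9.5 are reduced to plain intervals (19h to 8h, and 10h to 14h), and all names are hypothetical.

```python
from typing import Dict

KEYS = ("H", "L", "T", "P", "SD", "W")

# Two lines of the reference matrix RT; None would stand for "any".
RT = {
    "Sr": {"H": [(19, 24), (0, 8)], "L": {"Bedroom"}, "T": {"No"},
           "P": {"Lying"}, "SD": {"Static"}, "W": {"No"}},
    "Lu": {"H": [(10, 14)], "L": {"Kitchen"}, "T": {"Bts", "Sts"},
           "P": {"Standing", "Leaning"}, "SD": {"Static"}, "W": {"No"}},
}

# Corresponding lines of the weight matrix DSW (Table 9.6).
DSW = {
    "Sr": {"H": 0.2, "L": 0.4, "T": 0.05, "P": 0.2, "SD": 0.15, "W": 0.0},
    "Lu": {"H": 0.3, "L": 0.4, "T": 0.1, "P": 0.1, "SD": 0.1, "W": 0.0},
}

def matches(value, allowed) -> bool:
    """Binary match (1 or 0) between an observed value and its reference."""
    if allowed is None:                       # "any" always matches
        return True
    if isinstance(allowed, list):             # hour intervals
        return any(low <= value < high for low, high in allowed)
    return value in allowed                   # set of admissible labels

def level_of_achievement(ti: Dict) -> Dict[str, float]:
    """Weighted sum of the matches, for each scenario."""
    return {
        scenario: sum(DSW[scenario][k] for k in KEYS
                      if matches(ti[k], reference[k]))
        for scenario, reference in RT.items()
    }

# Hypothetical window: standing up from a chair in the kitchen at 12:30
ti = {"H": 12.5, "L": "Kitchen", "T": "Sts", "P": "Leaning", "SD": "Static", "W": "No"}
la = level_of_achievement(ti)
best = max(la, key=la.get)
```

Since the weights of each line of Table 9.6 sum to 1, a score of 1 means that every piece of information agrees with the scenario.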
A cell of the DS matrix takes the value “1” if the value of the vector TI(w) matches the corresponding reference value in RT; otherwise it is 0. In the last step we compute a vector “Level of Achievement” $LA_{8,1}$, which represents the level of each of the 8 scenarios independently. For each scenario, it is the weighted sum of the corresponding line of the matrix DS,

$$LA(S_i, NFen) = \sum_k DSW(k, NFen, S_i) \cdot DS(S_i, k) \quad \forall\, i \in [1; 8]$$

with $S_i$ one of the 8 scenarios (Sc), NFen the current time window and k one of the 6 software sensors (I). The first cell of the vector LA is the level of achievement for scenario number one (sleeping), the second cell is the level of the second scenario (recreation). Eventually, the more cells of TI match those of RT (most of them are close to 1), the greater the sum will be. In the following example (Fig. 9.15), we can see that the level of achievement (LA) is close to maximum most of the time. An experiment involved 7 university students (age = 27.0 ± 7.4) and 4 senior citizens (age = 80.5 ± 3.2) who were asked to perform the eight activities (sleeping, relaxing, dressing, toileting, breakfast, lunch, dinner and walking) within a reduced time (about 2 hours) in our experimental platform “HIS”. The overall performances were good and even better for the elderly group (Se = 86.9 %, Sp = 59.3 %) than for the young group (Se = 67.0 %, Sp = 52.6 %). Sensitivities were higher than specificities (all scenarios and both populations). The level of classification was good for each scenario, except for the scenario “dressing” which gave poor performances, for both the young (Se = 23.0 %, Sp = 17.9 %) and elderly populations (Se = 33.3 %, Sp = 18.9 %).
9.10
Discussion
Research in smart homes is multidisciplinary and thus raises numerous questions which must be addressed carefully.
Fig. 9.15 Input signals, detected scenario and Level of Achievement (LA). The signals are for one young volunteer simulating series of activities randomly (The length of each window is 15 s).
The first one is: “what selection of sensors and for what kind of information?”. Obviously, with numerous sensors we may obtain more detailed information (video cameras and scene interpretation), but this may become difficult to install and maintain and, eventually, unacceptable for privacy in the home. Conversely, using the technologies already in place is probably less painful. But this means working on “traces (clues)” of human activity and investing in more subtle mathematics (e.g. Hidden Markov Models [34], Support Vector Machines [35]). In other words, this question addresses the “granularity” of information: with detailed information (microscopic granularity) one can launch alarms on situations needing immediate assistance (e.g. “a fall”), whereas with coarse information (macroscopic granularity) we might detect the slow trends of activity (autonomy).
The second set of questions concerns the intrusiveness of those technologies and ethics: “is it acceptable to collect and moreover to build information in the intimacy of the home?”. First, the most acceptable technologies are probably the ones already in place: an interesting path is to try to use the environment itself as a sensor (e.g. to detect the main activities through the electrical activity on the residential power line [36]). Nevertheless, the second question raises a deeper concern. Ethics is not “Morals”. Humans will only accept technologies which serve them. If technology can help individuals to stay at home independently, then it is undoubtedly acceptable. A rather “non ethical” behavior would be to give the elderly no chance because technology might disturb them...
Finally, one may question the meaning of this research: “Is there something measurable?”. The results we obtained show that it is possible to build meaningful information from simple sensors. Still, further work is needed on methods to better understand the activity signals and the correlations between wellness and signs of activity. This work must be done in close collaboration with physicians and physiologists. It will need large-scale field experiments. This is a thorny problem. One approach we are exploring is to build models that simulate daily activity and then use them to select the best sensors with the best algorithms [37, 38].
9.11
Conclusion
Individuals now wish their living place to be more responsive to both their vital needs and higher aspirations. Among the technologies which tried to enter our homes, some never made sense. This was the case for Domotics. Once a promising industry in the early eighties, based on well-established industrial standards (automation and electronics), it failed to reach the marketplace. This was probably due to the “technology push” process, which proposes technologies before analyzing the real wishes, producing unreliable “gadgets” that lack the flexibility to adapt to the evolving needs of consumers and are rapidly abandoned by the “early adopters”. Humans invite into their homes the technologies which make sense to them, serving their basic needs and helping them concentrate on their main achievements in life... Domotics was unsuccessful probably because it never met these fundamental wishes. A promising way is to use the technologies already existing in the home to build new services.
References
[1] T. Tamura, T. Togawa, M. Ogawa, and M. Yoda, Fully automated health monitoring system in the home, Med Eng Phys. 20(8), 573–579 (Nov, 1998).
[2] T. Tamura, A. Kawarada, M. Nambu, A. Tsukada, K. Sasaki, and K. Yamakoshi, E-healthcare at an experimental welfare techno house in Japan, Open Med Inform J. 1, (2007).
[3] T. Yamazaki. Ubiquitous home: real-life testbed for home context-aware service. In Proc. First Int. Conf. Testbeds and Research Infrastructures for the Development of Networks and Communities Tridentcom 2005, pp. 54–59, (2005). doi: 10.1109/TRIDNT.2005.37.
[4] T. Mori, A. Takada, H. Noguchi, T. Harada, and T. Sato. Behavior prediction based on daily-life record database in distributed sensing space. In Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IROS 2005), pp. 1703–1709, (2005). doi: 10.1109/IROS.2005.1545244.
[5] Y. Nishida, T. Hori, T. Suehiro, and S. Hirai. Sensorized environment for self-communication based on observation of daily human behavior. In Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IROS 2000), vol. 2, pp. 1364–1372, (2000). doi: 10.1109/IROS.2000.893211.
[6] J.-S. Lee, K.-S. Park, and M.-S. Hahn. Windowactive: An interactive house window on demand. In Proc. 1st Korea-Japan Joint Workshop on Ubiquitous Computing and Networking Systems UbiCNS, pp. 481–484 (June, 2005).
[7] Y. Oh and W. Woo. A unified application service model for ubihome by exploiting intelligent context-awareness. In Proc. of Second Intern. Symp. on Ubiquitous Computing Systems (UCS2004), Tokyo, pp. 117–122, (2004).
[8] B. G. Celler, W. Earnshaw, E. D. Ilsar, L. Betbeder-Matibet, M. F. Harris, R. Clark, T. Hesketh, and N. H. Lovell, Remote monitoring of health status of the elderly at home. A multidisciplinary project on aging at the University of New South Wales, Int J Biomed Comput. 40(2), 147–155 (Oct, 1995).
[9] G. Williams, K. Doughty, and D. Bradley, A systems approach to achieving carernet-an integrated and intelligent telecare system, Information Technology in Biomedicine, IEEE Transactions on. 2(1), 1–9 (March, 1998). ISSN 1089-7771. doi: 10.1109/4233.678527.
[10] S. Richardson, D. Poulson, and C. Nicolle. Supporting independent living through adaptable smart home (ash) technologies. In Proc. Human welfare and technologies: papers from the human service information technology applications (HUSITA) conference on information technology and the quality of life and services, pp. 87–95, (1993).
[11] A. Van Berlo. A “smart” model house as research and demonstration tool for telematics development. In Proc. 3rd TIDE Congress: Technology for Inclusive Design and Equality, Improving the Quality of Life for the European Citizen, 23-25 June, Helsinki, Finland, (1998).
[12] G. Elger and B. Furugren. “smartbo” - an ict and computer-based demonstration home for disabled people. In Proc. 3rd TIDE Congress: Technology for Inclusive Design and Equality, Improving the Quality of Life for the European Citizen, 23-25 June, Helsinki, Finland, (1998).
[13] N. Noury, T. Herve, V. Rialle, G. Virone, E. Mercier, G. Morey, A. Moro, and T. Porcheron. Monitoring behavior in home using a smart fall sensor and position sensors. In Proc. Conf Microtechnologies in Medicine and Biology, 1st Annual Int. On. 2000, pp. 607–610, (2000). doi: 10.1109/MMB.2000.893857.
[14] M. Chan, C. Hariton, P. Ringeard, and E. Campo. Smart house automation system for the elderly and the disabled. In Proc. IEEE Int Systems, Man and Cybernetics Intelligent Systems for the 21st Century. Conf, vol. 2, pp. 1586–1589, (1995). doi: 10.1109/ICSMC.1995.537998.
[15] M. Mozer. The adaptive house. In Proc. IEE Seminar (Ref Intelligent Building Environments No. 2005/11059), pp. 39–79, (2005).
[16] A. P. Glascock and D. M. Kutzik, Behavioral telemedicine: A new approach to the continuous nonintrusive monitoring of activities of daily living, Telemedicine Journal. 6(1), 33–44, (2000). doi: 10.1089/107830200311833. URL http://www.liebertonline.com/doi/abs/10.1089/107830200311833.
[17] S. Das, D. Cook, A. Battacharya, I. Heierman, E.O., and T.-Y. Lin, The role of prediction algorithms in the mavhome smart home architecture, Wireless Communications, IEEE. 9(6), 77–84 (Dec., 2002). ISSN 1536-1284. doi: 10.1109/MWC.2002.1160085.
[18] S. Helal, W. Mann, H. El-Zabadani, J. King, Y. Kaddoura, and E. Jansen, The gator tech smart house: a programmable pervasive space, Computer. 38(3), 50–60, (2005). doi: 10.1109/MC.2005.107.
[19] C. D. Kidd, R. Orr, G. D. Abowd, C. G. Atkeson, I. A. Essa, B. MacIntyre, E. D. Mynatt, T. Starner, and W. Newstetter. The aware home: A living laboratory for ubiquitous computing research. In CoBuild ’99: Proceedings of the Second International Workshop on Cooperative Buildings, Integrating Information, Organization, and Architecture, pp. 191–198, London, UK, (1999). Springer-Verlag. ISBN 3-540-66596-X.
[20] J. Krumm, S. Harris, B. Meyers, B. Brumitt, M. Hale, and S. Shafer. Multi-camera multiperson tracking for easyliving. In Proc. Third IEEE Int Visual Surveillance Workshop, pp. 3–10, (2000). doi: 10.1109/VS.2000.856852.
[21] S. S. Intille, Designing a home of the future, IEEE Pervasive Computing. 1(2), 76–82, (2002). ISSN 1536-1268. doi: http://dx.doi.org/10.1109/MPRV.2002.1012340.
[22] T. Lee and A. Mihailidis, An intelligent emergency response system: preliminary development and testing of automated fall detection, J Telemed Telecare. 11(4), 194–198, (2005). doi: 10.1258/1357633054068946. URL http://dx.doi.org/10.1258/1357633054068946.
[23] H. Pigot, B. Lefebvre, B. Meunier, J.G.and Kerherve, A. Mayers, and S. Giroux. The role of intelligent habitats in upholding elders in residence. In Proc. 5th international conference on Simulations in Biomedicine, Slovenia, April 2003, (2003).
[24] N. Noury, G. Virone, and T. Creuzet. The health integrated smart home information system (his2): rules based system for the localization of a human. In Microtechnologies in Medicine Biology 2nd Annual International IEEE-EMB Special Topic Conference on, pp. 318–321, (2002). doi: 10.1109/MMB.2002.1002338.
[25] N. Noury. Ailisa: experimental platforms to evaluate remote care and assistive technologies in gerontology. In Proc. 7th Inter. Workshop on enterprise networking and computing in Healthcare industry, Healthcom2005, Busan-Korea, 24-25 Jun 2005, pp. 155–160, (2005).
[26] G. LeBellego, N. Noury, G. Virone, M. Mousseau, and J. Demongeot, A model for the measurement of patient activity in a hospital suite. 10(1), 92–99, (2006). doi: 10.1109/TITB.2005.856855.
[27] G. Virone, N. Noury, and J.
Demongeot, A system for automatic measurement of circadian activity deviations in telemedicine. 49(12), 1463–1469, (2002). doi: 10.1109/TBME.2002. 805452. G. Virone, M. Alwan, S. Dalal, S. W. Kell, B. Turner, J. A. Stankovic, and R. Felder, Behavioral patterns of older adults in assisted living. 12(3), 387–398, (2008). doi: 10.1109/TITB.2007. 904157. N. Noury, T. Hadidi, M. Laila, A. Fleury, C. Villemazet, V. Rialle, and A. Franco. Level of activity, night and day alternation, and well being measured in a smart hospital suite. In Proc. 30th Annual Int. Conf. of the IEEE Engineering in Medicine and Biology Society EMBS 2008, pp. 3328–3331, (2008). doi: 10.1109/IEMBS.2008.4649917. J. Poujaud, N. Noury, and J.-E. Lundy. Identification of inactivity behavior in smart home. In Proc. 30th Annual Int. Conf. of the IEEE Engineering in Medicine and Biology Society EMBS 2008, pp. 2075–2078, (2008). doi: 10.1109/IEMBS.2008.4649601. S. Katz, A. Ford, R. W. Moskowitz, B. A. Jackson, and M. W. Jaffe, Studies of illness in the aged. the index of adl: A standardized measure of biological and psychosocial function., JAMA. 185, 914–919 (Sep, 1963). P. Barralon, N. Noury, and N. Vuillerme. Classification of daily physical activities from a single kinematic sensor. In Proc. 27th Annual Int. Conf. of the Engineering in Medicine and Biology Society IEEE-EMBS 2005, pp. 2447–2450, (2005). doi: 10.1109/IEMBS.2005.1616963. M. Berenguer, M. Giordani, F. Giraud-By, and N. Noury. Automatic detection of activities of daily living from detecting and classifying electrical events on the residential power line. In Proc. 10th Int. Conf. e-health Networking, Applications and Services HealthCom 2008, pp. 29–32, (2008). doi: 10.1109/HEALTH.2008.4600104. X. H. B. Le, M. Di Mascolo, A. Gouin, and N. Noury. Health smart home for elders - a tool for automatic recognition of activities of daily living. In Proc. 30th Annual Int. Conf. of the
208
[35]
[36]
[37]
[38]
Activity Recognition in Pervasive Intelligent Environments
IEEE Engineering in Medicine and Biology Society EMBS 2008, pp. 3316–3319, (2008). doi: 10.1109/IEMBS.2008.4649914. A. Fleury, M. Vacher, and N. Noury, Svm-based multimodal classification of activities of daily living in health smart homes: Sensors, algorithms, and first experimental results. 14(2), 274– 283, (2010). doi: 10.1109/TITB.2009.2037317. N. Noury, K. A. Quach, M. Berenguer, H. Teyssier, M.-J. Bouzid, L. Goldstein, and M. Giordani. Remote follow up of health through the monitoring of electrical activities on the residential power line – preliminary results of an experimentation. In Proc. 11th Int. Conf. e-Health Networking, Applications and Services Healthcom 2009, pp. 9–13, (2009). doi: 10.1109/HEALTH.2009.5406203. G. Virone, B. Lefebvre, N. Noury, and J. Demongeot. Modeling and computer simulation of physiological rhythms and behaviors at home for data fusion programs in a telecare system. In Proc. 5th Int. Workshop Enterprise Networking and Computing in Healthcare Industry Healthcom 2003, pp. 111–117, (2003). T. Hadidi and N. Noury. Model and simulator of the activity of the elderly person in a health smart home. In Proc. of the IEEE-Healthcom2010, Lyon-France, 1-3 Jul. 2010, (2010).
Chapter 10
Synthesising Generative Probabilistic Models for High-Level Activity Recognition
Christoph Burghardt, Maik Wurdel, Sebastian Bader, Gernot Ruscher, and Thomas Kirste Institute for Computer Science, University of Rostock, 18059 Rostock, Germany
[email protected]
Abstract
High-level (hierarchical) behaviour with long-term correlations is difficult to describe with first-order Markovian models like hidden Markov models. We therefore discuss different approaches to synthesise generative probabilistic models for activity recognition based on different symbolic high-level descriptions. Those descriptions of complex activities are compiled into robust generative models. The underlying assumptions for our work are (i) we need probabilistic models in robust activity recognition systems for the real world, (ii) those models should not necessarily rely on an extensive training phase and (iii) we should use available background knowledge to initialise them. We show how to construct such models based on different symbolic representations.
10.1 Introduction & Motivation
Activity recognition is an important part of most ubiquitous computing applications [1]. With the help of activity recognition, the system can interpret human behaviour to assist the user [2–4], control complex environments [5, 6] or detect difficult or hazardous situations [7]. Activity recognition became very successful with the rise of machine-learning techniques [8] that helped to build systems which gain domain knowledge from simple training data [9–11]. However, while it is simple to gather training data for inferring, e.g., the gait of the user [12], this process does not scale up to longer and more complex activities [13]. Long-term behaviour of users, or scenarios with multiple interacting people, leads to a state explosion that systematically hinders the process of collecting training data. Higher-level activities like “making tea”, “printing a paper” or “holding a meeting” typically consist of more than five steps which can, at least partially, vary in execution order or even be omitted. Under these circumstances gathering or even annotating training data is in general very complex and expensive. Furthermore, the training data is less valuable because it is much more difficult to generalise to unseen instances.

Another desirable feature of an activity recognition system is that the system should work in an ad-hoc fashion and exhibit sensible behaviour right from the start. It is not possible to gather training data for every ad-hoc session.

The aim of this paper is to briefly present earlier research and to explore three formalisms of human behaviour modelling with respect to their suitability to express different high-level activities, thus turning our explicit knowledge automatically into a probabilistic framework for inference tasks. Our work is based on the following three hypotheses:

i) Probabilistic generative models are needed: To cope with noisy data, we need some kind of probabilistic system, because crisp systems usually fail in such domains. One goal of activity recognition is to provide pro-active assistance to the user. For this, we need generative models which allow forecasts of future activities, which can then be supported by the system. Therefore, we need probabilistic and in particular also generative models.

ii) Activity recognition systems should not rely on a training phase: We believe that an activity recognition system should work right from the start, i.e., we should not have to train it before being able to use it. Nonetheless, it should profit from available training data. As argued above, in most interesting situations the collection of training data is quite expensive and sometimes even infeasible, because there are just too many different situations the system is supposed to work in.
iii) Available (symbolic) background knowledge can and should be used: To create systems with the desired properties, we need to use available background knowledge to synthesise such probabilistic models.

Because different modelling paradigms have individual strengths and weaknesses in terms of describing sequences and hierarchical structures of activities, we discuss three different approaches in this paper. After introducing the preliminaries, we discuss how to convert process-based hierarchical approaches based on task models and grammars in Sections 10.4.1 and 10.4.3, and one causal approach, based on preconditions and effects, in Section 10.4.2. To leverage the individual strengths of each formal description, we also discuss a first approach to combine those approaches into a single model. In Section 10.5 we discuss the advantages and disadvantages of the approaches and conclude this paper with an outlook on future research avenues.

We use hidden Markov models as the target formalism, because they are simple to illustrate and yet powerful enough to abstract every realistic (finite) problem. Furthermore, powerful and efficient inference mechanisms for HMMs are well known. Even though it is clear that propositional symbolic descriptions can be transformed into a hidden Markov model, a concise treatment of such approaches is missing in the literature. Here we make a first attempt by discussing the transformation of different formal descriptions into HMMs utilising the same notation, and thus provide a starting point for a further unification of such approaches.

10.2 Related Work
Among the most mature approaches to activity recognition are logic-based ones that try to infer the current activity based on logical explanations for the observed actions. Kautz et al. introduced a formal theory of “plan recognition” [14]. However, as also argued by Charniak and Goldman [15], sensor data in the real world is inherently uncertain and therefore a probabilistic framework is needed. This made Bayesian inference methods very popular for activity and context recognition [10, 11, 16–18]. Many approaches use dynamic Bayesian networks (DBNs) and especially hidden Markov models (HMMs) [16, 19, 20], the simplest DBNs, to infer the current activity of the user. Numerous effective inference and learning algorithms exist for HMMs and we can map every realistic finite problem onto them.

Many recent activity recognition systems [21, 22] employ discriminative approaches because of their more efficient use of training data. E.g., in [23, 24] it is argued that discriminative models achieve a lower asymptotic error bound, but that generative models achieve their (typically higher) error bound faster. That is, with little training data, a generative approach (built on prior knowledge) is able to provide better estimates. If a generative model correctly reflects the dynamics of the system, it achieves a better performance with limited or no training data.

Different approaches and techniques (e.g. bootstrapping [25, 26], cross-domain activity recognition [27]) are under research to minimise or omit the need for training data. E.g., the Proact system [16] data-mined structured action descriptions from the Internet. In later publications they extended the data-mining basis to also include knowledge databases [28, 29]. We complement and extend this approach by taking (different) formal descriptions of human behaviour and turning these into probabilistic generative models usable for activity recognition. In our work, however, we seek to construct generative models in order to predict future behaviour.
10.3 Preliminaries
We now discuss some related approaches and introduce some preliminary concepts important for the sequel. In particular, we discuss hidden Markov models (HMMs), the collaborative task modelling language (CTML), the planning domain definition language (PDDL) and probabilistic context-free grammars (PCFGs) to some extent.

10.3.1 Hidden Markov Models
Inferring activities from noisy and ambiguous sensor data is a problem akin to tasks such as estimating the position of vehicles from velocity and direction information, or estimating a verbal utterance from speech sounds. These tasks are typically tackled by employing probabilistic inference methods. In a probabilistic framework, the basic inference problem is to compute the distribution function $p(X_t \mid Y_{1:t} = y_{1:t})$, the probability distribution of the state random variable $X_t$ given a specific realisation of the sequence of observation random variables $Y_{1:t}$. The probability of having a specific state $x_t$ is then given by $p(x_t \mid y_{1:t})$.

In a model-based setting, we assume that our model specifies the state space $X$ of the system and provides information on the system dynamics: it makes a statement about which future state $x_{t+1} \in X$ the system will be in if it is currently in state $x_t \in X$. In a deterministic setting, the system dynamics will be a function $x_{t+1} = f(x_t)$. However, many systems (e.g. humans) act in a non-deterministic fashion; therefore, the model generally provides a probability for reaching a state $x_{t+1}$ given a state $x_t$, denoted by $p(x_{t+1} \mid x_t)$. This is the system model. The system model is first-order Markovian: the current state depends only on the previous state and on no other states. Furthermore, we typically cannot observe the current state of the system directly; instead, we have sensors that make observations $Y$. These observations $y_t$ depend on the current system state $x_t$, again non-deterministically in general, giving the distribution $p(y_t \mid x_t)$ as observation model. Let us consider a small example:

Example 10.1. Figure 10.1 shows a graphical representation of the HMM $\langle S, \pi, T, \tau, O, P \rangle$ with $S = \{\mathrm{One}, \mathrm{Two}\}$, $\pi(\mathrm{One}) = \pi(\mathrm{Two}) = 0.5$, $T = S \times S$ with $\tau((\mathrm{One}, \mathrm{One})) = \tau((\mathrm{Two}, \mathrm{Two})) = 0.9$ and $\tau((\mathrm{One}, \mathrm{Two})) = \tau((\mathrm{Two}, \mathrm{One})) = 0.1$, using $O = \mathbb{R}$ and $P(\mathrm{One}, i) = N_{(-1,2)}(i)$ and $P(\mathrm{Two}, i) = N_{(1,1)}(i)$.¹
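Fig. 10.1 A graphical representation of the transition model of the HMM from Ex. 10.1 (states One and Two, each with initial probability 0.5 and self-transition probability 0.9; the transitions between the two states have probability 0.1)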
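The filtering distribution $p(X_t \mid y_{1:t})$ for the model of Example 10.1 can be computed recursively by alternating a prediction step (system model) and a correction step (observation model). The following Python sketch illustrates this under two assumptions that are not stated in the text: $\pi$ is taken as the distribution of the first state, so the first observation is applied without a preceding prediction step, and the names `forward_filter` and `normal_pdf` are purely illustrative.

```python
import math

# Model parameters taken from Example 10.1.
STATES = ("One", "Two")
PRIOR = {"One": 0.5, "Two": 0.5}
TRANS = {
    ("One", "One"): 0.9, ("One", "Two"): 0.1,
    ("Two", "Two"): 0.9, ("Two", "One"): 0.1,
}
# Observation model: (mean, standard deviation) of a normal density per state.
EMISSION = {"One": (-1.0, 2.0), "Two": (1.0, 1.0)}


def normal_pdf(y, mean, std):
    """Density of the normal distribution N(mean, std) at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def forward_filter(observations):
    """Return the filtered beliefs p(X_t | y_1:t), one dict per time step."""
    belief = dict(PRIOR)  # assumption: PRIOR is the distribution of X_1
    beliefs = []
    for t, y in enumerate(observations):
        if t > 0:
            # Prediction step: push the belief through the system model.
            belief = {
                s2: sum(belief[s1] * TRANS[(s1, s2)] for s1 in STATES)
                for s2 in STATES
            }
        # Correction step: weight by the observation model and normalise.
        weighted = {s: belief[s] * normal_pdf(y, *EMISSION[s]) for s in STATES}
        total = sum(weighted.values())
        belief = {s: w / total for s, w in weighted.items()}
        beliefs.append(belief)
    return beliefs
```

Calling `forward_filter([1.0, 1.2, 0.8])` yields one belief per observation; observations close to 1 shift the probability mass towards state Two, observations well below −1 towards state One.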
More formally, we define HMMs as follows:

Definition 10.1 (HMM). A hidden Markov model is a tuple $(S, \pi, T, \tau, O, P)$ with $S$ being a set of states and $\pi$ assigning an initial probability to these states with $\sum_{s \in S} \pi(s) = 1$, $T \subseteq S \times S$ being the state transition relation and $\tau : T \to \mathbb{R}$ mapping transitions to probabilities with $\sum_{(s,s') \in T} \tau((s, s')) = 1$ for all $s \in S$, $O$ being a set of possible observations and $P : S \times O \to \mathbb{R}$ mapping states and observations to probabilities with $\sum_{o \in O} P(s, o) = 1$ for all $s \in S$.

The states and the corresponding state-transition model describe the sub-steps of an activity and the order in which they have to be done. In the literature there exist many different paradigms to model human behaviour. However, few have been applied to describe human behaviour with respect to activity recognition. In this paper, we discuss different approaches to describe human long-term and high-level activities such that those formal descriptions (task models, planning operators and grammars) can be used to initialise HMMs.

¹ Throughout the paper we use $N_{(\mu,\sigma)}$ to denote the normal distribution with mean $\mu$ and standard deviation $\sigma$.

10.3.2 Planning Problem

Automated planning is a branch of artificial intelligence that concerns the realisation of strategies or action sequences, typically for execution by intelligent agents, autonomous robots and unmanned vehicles. In the STRIPS formalism [30] actions are described using preconditions and effects. A state of the world is described using a set of ground propositions, which are assumed to be true in the current state. A precondition is an arbitrary function-free first-order logic sentence. The effects are imposed on the current world state. Problems are defined with respect to a domain (a set of actions) and specify an initial situation and the goal condition which the planner should achieve. All predicates which are not explicitly said to be true in the initial conditions are assumed to be false (closed-world assumption).

Example 10.2. Consider, e.g., a planning problem with three simple actions $a$, $b$ and $c$ with effects $p$, $q$ and $r$, respectively. We can define this problem as $I = \{p, q, r\}$, $I_s = \{\}$, $G = \{p, q, r\}$, $A = \{a, b, c\}$, $pre(a) = pre(b) = \{\}$, $pre(c) = \{p, q\}$, $eff^+(a) = \{p\}$, $eff^+(b) = \{q\}$, $eff^+(c) = \{r\}$, $eff^-(a) = eff^-(b) = eff^-(c) = \{\}$. Starting from an empty initial world state, we are looking for a state in which every operator has been applied at least once. The corresponding state graph is shown in Figure 10.2.
Fig. 10.2 Transition graph for the planning problem from Example 10.2
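A state graph of this kind can be enumerated by a simple forward search over world states. The following Python sketch does so for the actions of Example 10.2; it is only an illustration, not the authors' implementation. The function name `build_state_graph` is hypothetical, states are represented as sets of propositions, and, as an additional assumption, actions that would not change the current state are skipped so that the resulting graph stays acyclic.

```python
from collections import deque


def build_state_graph(initial, goal, actions):
    """Enumerate reachable world states by breadth-first forward search.

    actions maps an action name to a triple (pre, eff_plus, eff_minus)
    of frozensets of ground propositions.
    """
    start = frozenset(initial)
    goal = frozenset(goal)
    states = {start}
    edges = set()
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if goal <= state:
            continue  # goal states are not expanded any further
        for name, (pre, eff_plus, eff_minus) in actions.items():
            if not pre <= state:
                continue  # precondition not satisfied in this state
            successor = (state - eff_minus) | eff_plus
            if successor == state:
                continue  # assumption: skip actions that change nothing
            edges.add((state, name, successor))
            if successor not in states:
                states.add(successor)
                queue.append(successor)
    return states, edges


# Actions a, b, c of Example 10.2 with effects p, q, r.
EMPTY = frozenset()
ACTIONS = {
    "a": (EMPTY, frozenset({"p"}), EMPTY),
    "b": (EMPTY, frozenset({"q"}), EMPTY),
    "c": (frozenset({"p", "q"}), frozenset({"r"}), EMPTY),
}
STATES, EDGES = build_state_graph(EMPTY, {"p", "q", "r"}, ACTIONS)
```

Under these assumptions the search visits five world states connected by five action-labelled edges; action c only becomes applicable once both p and q hold.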
Definition 10.2 (Planning Problem). We define a planning problem formally as a tuple $\langle I, I_s, G, A, pre, eff^+, eff^- \rangle$ with $I$ being the set of all ground propositions in the domain, $I_s \subseteq I$ being the initial state of the world and $G \subseteq I$ being the set of propositions that must hold in the goal state; $A$ is a set of actions with $pre : A \to \mathcal{P}(I)$ mapping actions to preconditions and $eff^+, eff^- : A \to \mathcal{P}(I)$ mapping actions to positive and negative effects, respectively.

The purpose of a planner is to build up a valid sequence of actions that changes the world from the initial state $I_s$ to a state where the goal condition $G$ is true, as described by the problem description. By describing each action with preconditions and effects, this modelling approach describes the resulting valid processes implicitly, i.e. in a bottom-up fashion. The automatic emergence of all valid process sequences is a very useful property for activity recognition.

10.3.3 Task Models
In Human Computer Interaction (HCI), task analysis and modelling is a commonly accepted method to elicit requirements when developing a software system. This also applies to smart environments, but due to the complexity of such an environment the modelling task is quite challenging. Different task-based specification techniques exist (e.g. HTA, TAG, TKS, CTT; for an overview we refer to [31]), but none incorporates the complexity involved in such a scenario. According to task analysis, humans tend to decompose complex tasks into simpler ones until an atomic unit, the action, has been reached. Moreover, tasks are not only performed sequentially; decision making and interleaved task performance are also common to humans. Therefore the basic concepts incorporated by most modelling languages are hierarchical decomposition and temporal operators restricting the potential execution order of tasks (e.g. sequential, concurrent or disjunct execution). An appropriate modelling notation, accompanied by a tool, supports the user in several ways. First, task-based specifications have the asset of being understandable to stakeholders. Thus modelling results can be discussed and revised based on feedback. Second, task models can be animated, which further fosters understandability and offers the opportunity to generate early prototypes. Last but not least, task-based specifications can also be used in design stages to derive lower-level models such as HMMs [32].

Task models are usually understood as descriptions of user interaction with a software system. They have mostly been applied to model-based UI development, even though the application areas are versatile. For intention recognition, task modelling can be employed to specify knowledge about the behaviour of users in an understandable manner which may be inspected at a very early stage.

CTML (the Collaborative Task Modelling Language) is a task modelling language dedicated to smart environments which incorporates cooperative aspects as well as location modelling [33], device modelling, team modelling and domain modelling [34]. It is a high-level approach starting with role-based task specifications defining the stereotypical behaviour of actors in the environment. The interplay of tasks and the environment is defined by preconditions and effects. A precondition defines a state which is required to execute a task. An effect denotes a state change of the model through the performance of the corresponding task. Therefore CTML specifications allow for specifying complex dependencies between task models and other relevant models (e.g. domain model, location model) via preconditions and/or effects. Here, we focus on a subset of CTML only, called Custom CTML (CCTML).
Example 10.3. To clarify the intuition behind task models, we give a brief example of a CCTML specification, shown in Figure 10.3. It not only highlights the basic concepts of CCTML but also serves as the foundation for a more elaborate example in Section 10.4.1. It specifies, in a very basic manner, how a presenter may give a talk. More precisely, it defines that the sequence of the actions Introduce, Configure Equipment, Start Presentation, Show Slides and Leave Room constitutes a successful presentation. With respect to the formal definitions given below, we can formalise the example as follows (instead of the task names we use the prefixed numbers of the tasks in Figure 10.3): $T = \{1., 2., 3., 4., 5.\}$, $\gamma = (1., 2., 3., 4., 5.)$, $prio(t) = 1$, $O = \{O_1, O_2, O_3\}$, $o(t) = \{(1., O_1), (2., O_1), (3., O_2), (4., O_3), (5., O_3)\}$.

[Figure 10.3: task 1. Give Talk, decomposed into 2. Introduce >> 3. Configure Equipment >> 4. Start Presentation >> 5. Show Slides >> 6. Leave Room]
Fig. 10.3 Simplified CCTML Model for “Giving a Presentation”. Please note that O, prio and o have been excluded from the visual representation.
To specify such a model formally, we first introduce the basic task expressions as follows:

Definition 10.3 (Task Expression (TE)). Let $T$ be a set of atomic tasks. Let $t_1$ and $t_2$ be task expressions and $\lambda \in T$; then the following expressions are also task expressions: $t_1\,[]\,t_2$, $t_1\,|{=}|\,t_2$, $t_1\,|||\,t_2$, $t_1\,[>\,t_2$, $t_1\,|>\,t_2$, $t_1 >> t_2$, $\lambda$, $\lambda^*$, $[\lambda]$, $(\lambda)$.

A CCTML model is defined by a set of actions, denoting the atomic units of execution, and a task expression $\gamma$. We extend the usual notion by introducing the function $prio$, assigning a priority to each action in the CCTML model, and the function $o$, assigning observations to actions.

Definition 10.4 (CCTML Model). A CCTML model is defined as a tuple $\langle T, \gamma, O, prio, o \rangle$ with $T$ being a set of atomic actions, $\gamma \in TE$ being the task, $prio : T \to \mathbb{N}$ assigning a priority to each action and $o : T \to O$ assigning an observation to each atomic action.

Please note that TE defines only binary operators in infix notation. Nested TEs such as $(t_x\,[]\,(t_y\,[]\,t_z))$ can easily be translated into n-ary operators, which are used in the examples. The task expression defines the structure of the CCTML model. It is a recursive
definition. Each task expression $e \in TE$ is therefore defined by nesting (an)other task expression(s) plus an operator until an action has been reached. More precisely, there are binary and unary operators. Additionally, the actions need to be defined in the set of actions ($T$) of the corresponding CCTML model. For CTML an interleaving semantics, similar to operational semantics in process algebras [35], is defined via inference rules. Basically, for every temporal operator (such as choice ([]), enabling (>>), etc.) a set of inference rules is declared which enables the derivation of a labelled transition system or an HMM. As the comprehensive definition includes more than 30 rules, only an example for the choice operator is given. Informally, this operator defines that a set of tasks is enabled concurrently. Any one of these tasks may be activated, but due to its execution the others become disabled. Thus, it implements an exclusive choice of task execution. Formally, the behaviour of the operator can be defined as follows ($t_i \in \{t_1, \dots, t_n\}$, with $\checkmark$ denoting successful termination):

$$\frac{t_i \xrightarrow{act} \checkmark}{[\,](t_1, t_2, \dots, t_n) \xrightarrow{act} \checkmark} \qquad (10.1)$$

$$\frac{t_i \xrightarrow{act} t_i'}{[\,](t_1, t_2, \dots, t_n) \xrightarrow{act} t_i'} \qquad (10.2)$$

Taking the second rule, we explain how such a rule is to be read. Given that we may transit from $t_i$ to $t_i'$ by executing $act$, a choice expression can be translated to $t_i'$ by executing $act$, with $t_i \in \{t_1, \dots, t_n\}$. The first rule declares that if an execution of $act$ in $t_i$ leads to a successful termination, then a choice expression may also terminate successfully by executing $act$, with $t_i \in \{t_1, \dots, t_n\}$. This rule is applied if $t_i$ is an atomic unit, a so-called action. Further reading about the inference rules for task modelling can be found in [36].

The definition of a CCTML model is based on task names and a complex expression which specifies the structure of the model. The transformation of a CCTML model into a state transition system by inference rules always terminates, as expressions are simplified until no applicable rule exists anymore. Picking up the example above, a choice expression is transformed into a simpler task expression (more precisely, a state representing $[\,](t_1, t_2, \dots, t_n)$ is transformed into a state representing $t_i'$), which is in turn further simplified. Eventually the state representing the empty task expression is created. This expression cannot be simplified further.

10.3.4 Probabilistic Context-Free Grammars
Probabilistic context-free grammars (PCFGs) are usually applied in speech recognition problems. They extend the notion of context-free grammars by assigning probabilities to rules. For a general introduction we refer to [37]. Valid words of the language are derived by replacing non-terminals according to the rules, starting with the start symbol. Using the probabilities attached to the rules, we can compute the overall probability that a word has been generated using a given grammar.

Example 10.4. As a running example we use the following PCFG with the set of terminal symbols $T = \{indoor, walk, stop, carStop, carSlow, carFast\}$, the non-terminals $N = \{Day, Trans, Car\}$, $N_1 = Day$, and the rules
$R = \{$
 $(r_1 := 1.0 : Day \to Trans, indoor, Trans)$,
 $(r_2 := 0.3 : Trans \to walk)$,
 $(r_3 := 0.7 : Trans \to carStop, Car, carStop)$,
 $(r_4 := 0.2 : Car \to carSlow, carFast, carSlow)$,
 $(r_5 := 0.8 : Car \to carSlow, carFast, carSlow, Car)$
$\}$
in which every rule ($r_1, \dots, r_5$) is annotated with its probability $\pi$. A valid sequence of actions is walk, indoor, walk. Another valid sequence, in which the first walk has been exchanged, is carStop, carSlow, carFast, carSlow, carStop, indoor, walk.

Definition 10.5 (PCFG). A probabilistic context-free grammar is defined as a quintuple $\langle T, N, N_1, R, \pi \rangle$, with $T$ being a set of terminal symbols, $N$ being a set of non-terminals, $N_1 \in N$ being the start symbol, $R$ (rewrite rules) being a relation between non-terminals and words $\zeta \in (N \cup T)^*$, such that for every non-terminal $n \in N$ there is at least one rule $(n, \zeta) \in R$, and $\pi : R \to \mathbb{R}$ assigning probabilities to rules such that for all $n \in N$ we find $\sum_{(n,\zeta) \in R} \pi((n, \zeta)) = 1$.

10.4 Synthesising Probabilistic Models
The following three sections show different approaches for the construction of hidden Markov models based on task models, planning operators and grammars, respectively. Afterwards, we discuss a first approach to combine the different modelling approaches to produce a single joint HMM.

10.4.1 From Task Models to Hidden Markov Models
Using the definitions presented in Section 10.3.3, we define the corresponding HMM for a given CCTML model as follows:

Definition 10.6. Let $\langle T, \gamma, O, prio, o \rangle$ be a CCTML model. We define the corresponding HMM $\langle S, \pi, T', \tau, O, P \rangle$ as follows:
• $S = \{\gamma\} \cup \{e' \mid e \in S,\ e \xrightarrow{act} e'\}$
• $\pi(s) = 1$ if $s = \gamma$, and $\pi(s) = 0$ otherwise
• $T' = \{(t_1, t_2) \mid t_1, t_2 \in S,\ t_1 \xrightarrow{act} t_2\}$
• $\tau(t_i, t_j) = \dfrac{prio(act)}{\sum_{(t_i, t) \in T' \text{ and } t_i \xrightarrow{act'} t} prio(act')}$ with $t_i \xrightarrow{act} t_j$
• $P(t_i, o) = \dfrac{1}{|\{o(act) \mid (t, t_i) \in T' \text{ and } t \xrightarrow{act} t_i\}|}$
For a given CCTML model we derive the HMM by extracting all potential states from the CCTML model by means of the inference rules. We start with the task expression $\gamma$ and fire all potential action relations. The resulting task expressions are added to the set $S$. This process is continued until the empty task expression is derived. Therefore the set of states of the HMM consists of all potential states of the CCTML model. Task models clearly specify the initial state by their structure. Therefore the initial probability of $\gamma$ is 1 and that of all other states is 0. Transitions are derived in the same vein as the states. We define a transition in the HMM for each pair of states which can be reached by executing an action. The transition probabilities are calculated by means of the function $prio$. As a transition in the HMM coincides with the execution of an action in the task model, transition probabilities are calculated as the ratio of the priority of the task under execution and the sum of the priorities of all potential task executions. The function $o(t)$ assigns an observation to an action. The probability of the occurrence of a certain observation in a state is uniformly distributed over the observations assigned to the incoming actions.

[Figure 10.4: task 1. Give Talk, decomposed into 2. Introduce >> 3. Configure Equipment >> 4. Start Presentation >> 5. Next Slide * [> 6. End Presentation >> 7. Leave Room; 3. Configure Equipment is decomposed into 8. Connect Laptop & Projector |=| 9. Set to Presentation; 5. Next Slide is decomposed into 10. Show Next Slide >> 11. Explain Slide]
Fig. 10.4 CCTML Model for “Giving a Presentation”
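The normalisation of priorities into transition probabilities described in Definition 10.6 can be sketched in a few lines of Python. The sketch assumes that the labelled transition system has already been derived by the inference rules and is given as a dictionary from states to enabled (action, successor) pairs; the state names, action names and the function `transition_probabilities` are hypothetical. Where two actions lead to the same successor, their shares are simply added, which is an assumption the definition leaves open.

```python
def transition_probabilities(moves, prio):
    """Turn action priorities into transition probabilities.

    moves maps a state to a list of (action, successor) pairs, i.e. the
    labelled transitions enabled in that state; prio maps actions to
    positive priorities.
    """
    tau = {}
    for state, enabled in moves.items():
        # Sum of the priorities of all actions executable in this state.
        total = sum(prio[action] for action, _ in enabled)
        for action, successor in enabled:
            key = (state, successor)
            # Assumption: shares of actions leading to the same
            # successor state are added up.
            tau[key] = tau.get(key, 0.0) + prio[action] / total
    return tau


# Hypothetical fragment: two alternative actions with priorities 4 and 1.
MOVES = {"s0": [("connect", "s1"), ("set_mode", "s2")], "s1": [], "s2": []}
PRIO = {"connect": 4, "set_mode": 1}
TAU = transition_probabilities(MOVES, PRIO)
```

In this toy fragment the action with priority 4 receives the share 4/5 and the action with priority 1 the share 1/5, so the outgoing probabilities of the state sum to one.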
Example 10.5. Let us examine the transformation by means of the example depicted in Figure 10.4. Nodes denote tasks, whereas edges represent either hierarchical task decomposition (vertical) or temporal operators (horizontal). The model specifies how a presenter may give a presentation. First the presenter introduces herself, then she configures her equipment by connecting the laptop to the projector and setting her laptop to presentation mode. Note that these two actions may be executed in arbitrary order (denoted by the order-independence operator (|=|)). After doing so the presenter starts her talk. The talk itself is performed by presenting slides iteratively (denoted by the unary iteration operator (*)). After ending the presentation, which aborts the iteration (due to the disabling operator ([>)), the presenter leaves the room. Using the prefixed numbers, the task model can be paraphrased by the formal task expression term >> (2., |=| (8., 9.), 4., [> (>> (10., 11.)*, 6.), 7.). Please note that there are also modelling elements which have no visual counterpart (e.g. the observations O). The same applies to the function prio, which assigns priority 1 to each task except for the following: (8., 4), (10., 8), (6., 2). Here the first number denotes the task and the second its priority. Thus we have all information needed to construct the HMM.
Fig. 10.5 Corresponding HMM to Figure 10.4. [State-transition diagram: nodes are labelled with sets of task numbers and edges with transition probabilities.]
The resulting HMM is depicted in Figure 10.5. Again, not all modelling elements are visualised (the observations (O) and the initial probabilities (π) are hidden). Nodes represent elements of S, whereas edges define state transitions (T). Labels on edges give the transition probabilities (τ) of the corresponding transitions. Labels inside nodes denote the set of tasks executed within the state (again, for reasons of brevity we use the prefixed numbers).

10.4.2 From Planning Problems to Hidden Markov Models
Given a planning problem as defined in the preliminaries, we first create the directed acyclic graph (and therefore the HMM transition model) that contains all operator sequences that reach a state where the goal condition G is true. Every vertex corresponds to a state of the world and the edges are annotated with actions. As humans normally behave in a situation-driven manner, our implementation executes a forward search, consecutively applying each operator to a world state I1. If the preconditions of the operator are satisfied, we derive a new world state I2. We continue until either the goal condition is satisfied or we run out of applicable actions. Afterwards we generate the observations O and the observation model P by applying the mapping P. This process is defined as follows:

Definition 10.7. Let ⟨I, I_s, G, A, pre, eff⁺, eff⁻⟩ be a planning problem, O be a set of possible observations and P : A × O → ℝ be a mapping from actions and observations to probabilities. Then, we define the corresponding HMM ⟨S, π, T, τ, O, P⟩ as follows:
• $S = \{(I_s)\} \cup \{(s') \mid (s) \in S,\ G \subseteq s,\ a \in A,\ \mathrm{pre}(a) \subseteq s,\ s' := s \setminus \mathrm{eff}^-(a) \cup \mathrm{eff}^+(a)\}$
• $T = \{((s), (s')) \mid a \in A \text{ and } s' := s \setminus \mathrm{eff}^-(a) \cup \mathrm{eff}^+(a)\}$
• $\tau(s_1, s_2) = \dfrac{1}{|\{(s_1, s') \mid (s_1, s') \in T\}|}$
• $\pi((s)) = \begin{cases} p & \text{if } s = I_s \\ 0 & \text{otherwise} \end{cases}$ and $p = \dfrac{1}{|\{(s) \mid (s) \in S \text{ and } s \subseteq I_s\}|}$
• $P((s), o) = P(a, o)$
Example 10.6. Reconsider the planning problem from Example 10.2, the set of observables {o_a, o_b, o_c} and P(a, o_a) = 1, P(b, o_b) = 1, P(c, o_c) = 1. The states, prior probabilities and transitions of the resulting HMM are depicted in Figure 10.6, and we find P(({}, a), o_a) = P(({}, b), o_b) = P(({b}, a), o_a) = P(({a}, b), o_b) = 1.
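[Figure 10.6: state diagram with the states ({}, a), ({}, b), ({a}, b), ({b}, a) and ({a, b}, c); the two initial states carry the prior 0.5 and all transitions are labelled 1.0.]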
Fig. 10.6 A graphical representation of the HMM from Example 10.6.
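The forward search and the uniform transition probabilities can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the three operators and the goal are hypothetical (Example 10.2 is not reproduced here), HMM states are taken to be (world state, action) pairs as in Example 10.6, and an operator is skipped when it would not change the world state.

```python
from collections import deque

# Hypothetical operators: name -> (preconditions, add effects, delete effects).
OPERATORS = {
    "a": (frozenset(), frozenset({"a"}), frozenset()),
    "b": (frozenset(), frozenset({"b"}), frozenset()),
    "c": (frozenset({"a", "b"}), frozenset({"c"}), frozenset()),
}
INITIAL = frozenset()
GOAL = frozenset({"c"})  # assumed goal condition


def apply_operator(world, name):
    """World state after applying the operator: (s minus eff-) united with eff+."""
    _, add, delete = OPERATORS[name]
    return (world - delete) | add


def applicable(world):
    """Operators whose preconditions hold and that change the world state."""
    return [name for name in sorted(OPERATORS)
            if OPERATORS[name][0] <= world and apply_operator(world, name) != world]


def synthesise(initial, goal):
    """Forward search from `initial`; returns priors and transition probabilities."""
    first = [(initial, name) for name in applicable(initial)]
    prior = {state: 1.0 / len(first) for state in first}
    transitions = {}
    queue, seen = deque(first), set(first)
    while queue:
        state = queue.popleft()
        world, action = state
        successor = apply_operator(world, action)
        # Stop expanding once the goal condition holds in the successor world.
        nexts = [] if goal <= successor else [
            (successor, name) for name in applicable(successor)]
        # Uniform probability over all outgoing transitions of this state.
        transitions[state] = {nxt: 1.0 / len(nexts) for nxt in nexts}
        for nxt in nexts:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return prior, transitions


prior, transitions = synthesise(INITIAL, GOAL)
for state, p in prior.items():
    print(sorted(state[0]), state[1], p)
```

Under these assumptions the search yields five states, two of which share the prior mass equally, and every state has at most one successor. The observation model would then follow from the action component of each state, P((s, a), o) = P(a, o).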
While the resulting transition model T is already usable, we can enhance it by applying heuristic functions that weight each transition individually and thus generate transition probabilities that better reflect individual human behaviour. Possible heuristic functions come, e.g., from the planning domain [38] (plan length, goal-distance) or from knowledge-driven systems [39]. We can apply these heuristics as follows:
• Salience means explicitly prioritising operators. We can simply give each operator a certain weight that is added to all transitions arising from this operator.
• Refractoriness means that an operator, once applied, is not applied again. This rule can be modelled in PDDL by introducing history propositions that store whether an operator has been applied.
• Recency means firing the most recently activated rule first. It can be implemented by calculating the goal-distance for each operator in order to penalise actions that lead to plans with more plan steps.
• Specificity means choosing an operator with the most conditions or the most specific conditions ahead of a more general rule (e.g. measured by the number of ground propositions in the effects of an action that are also part of the goal condition).
These different strategies generate different transition probabilities τ for a HMM. We are currently investigating which rules are more successful in mimicking human behaviour.

The derivation of the observation probabilities in the example is arguably very simple, given a direct action-observation mapping. However, this mapping can be generated by more sophisticated approaches from the related literature. Philipose et al. [13] mined these observation probabilities for RFID-based activity recognition from the web. In later publications, Pentney further enhanced this approach [28] by data-mining and combining more common-sense databases such as Cyc [40] and OpenMind [41]. This and similar ongoing research complements our approach of automatically synthesising probabilistic models for activity recognition.

10.4.3 From Probabilistic Context-Free Grammars to Hidden Markov Models
As already argued above, humans tend to describe complex actions in a hierarchical manner. One approach to describing complex tasks has been discussed in Section 10.4.1. Here we pursue a second approach by employing probabilistic context-free grammars (PCFGs). Many users find writing grammar rules more intuitive than, e.g., a causal language description. PCFGs have also been used for intention recognition in the literature by Kiefer et al. [42]. Our approach to constructing HMMs from probabilistic context-free grammars is as follows: we first construct a so-called hierarchical hidden Markov model which captures the structure of a given grammar, and then we flatten this hierarchical model to obtain a normal HMM. One could also use the hierarchical model for inference directly but, as already argued, we would like to obtain a plain hidden Markov model. Before diving into the technical details of the transformation, we formally introduce extended PCFGs and hierarchical hidden Markov models.

An extended PCFG (EPCFG) extends a PCFG by adding the set of possible observations and the observation probabilities as follows:

Definition 10.8 (EPCFG). Let ⟨T, N, N1, R, π⟩ be a PCFG as defined above, let I be a set of input symbols and let P : T × I → ℝ be a mapping from terminal states and input symbols to observation probabilities, such that for all t ∈ T we find $\sum_{i \in I} P(t, i) = 1$. Then we call ⟨T, N, N1, R, π, I, P⟩ an extended PCFG.

To illustrate the concept, we use the following simple example in which a working day is modelled. The task was to infer the current activity based on the speed obtained from a GPS sensor.

Example 10.7. Reconsider the grammar shown in Example 10.4. Using the current speed as input, we have I = ℝ and we define the observation probabilities as follows:
P_indoor = N(2, 3)
P_walk = N(4, 3)
P_carSlow = N(25, 15)
P_carFast = N(50, 20)
P_carStop = N(0, 2)
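As a numeric illustration of the densities listed above, the following Python sketch evaluates each observation model for a given speed and reports the terminal with the highest likelihood. Reading the second parameter of N(·, ·) as a standard deviation is an assumption (the text does not say whether it is a standard deviation or a variance), and the helper names are hypothetical.

```python
import math

# (mean, spread) pairs from Example 10.7; the spread is ASSUMED to be a
# standard deviation.
OBSERVATION_MODELS = {
    "indoor": (2.0, 3.0),
    "walk": (4.0, 3.0),
    "carSlow": (25.0, 15.0),
    "carFast": (50.0, 20.0),
    "carStop": (0.0, 2.0),
}


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def likelihoods(speed):
    """Observation likelihood of `speed` under every terminal's model."""
    return {name: normal_pdf(speed, mean, std)
            for name, (mean, std) in OBSERVATION_MODELS.items()}


def most_likely(speed):
    """Terminal whose observation model gives `speed` the highest density."""
    scores = likelihoods(speed)
    return max(scores, key=scores.get)


for speed in (0.0, 4.0, 50.0):
    print(speed, most_likely(speed))
```

Such a per-observation comparison ignores the temporal structure of the day, which is exactly what the grammar and the resulting HMM add on top of the observation models.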
The observation probabilities shown above define the probability of certain activities with respect to the current speed of the user.

Hierarchical hidden Markov models [43] extend the usual notion of a HMM by allowing the definition of sub-models c ∈ C. Those sub-models can be activated by so-called internal states. This call relation is captured in the function J, with J(c, i) = c′ stating that state i in sub-model c activates sub-model c′; for non-internal states we define J(c, i) = ⊥. A given sub-model can be left in any state, with ξ(c, i) being the probability of sub-model c terminating in state i. To simplify the notation below, we use natural numbers to denote the states within a sub-model of a HHMM.

Definition 10.9 (HHMM). A hierarchical hidden Markov model is defined as a tuple ⟨C, C1, |·|, π, τ, ξ, J, O, P⟩ with C being a set of model names and C1 ∈ C being the start model, |·| : C → ℕ defining the size of a given sub-model, π : C × ℕ → ℝ being a prior probability function with π(c, i) being the probability that model c starts in state i and for all c ∈ C we find $\sum_{i=0}^{|c|} \pi(c, i) = 1$, τ : C × ℕ × ℕ → ℝ defining the transition probabilities between two states within a given model with $\sum_{j=0}^{|c|} \tau(c, i, j) = 1$ for all c and 0 ≤ i < |c|, ξ : C × ℕ → ℝ defining the exit probability, J : C × ℕ → C ∪ {⊥} being a function from states to other sub-models indicating sub-model calls, O being a set of input symbols and P : C × ℕ × O → ℝ being a function from sub-model states and inputs to observation probabilities.

Please note that we use the model name as an index rather than a parameter, e.g., we write π_c(i) instead of π(c, i).
[Figure 10.7: diagram of the three sub-models Day (D), Trans (T) and Car (C) with their numbered states, priors, exits and sub-model calls.]
Fig. 10.7 A graphical representation of the HHMM from Example 10.8. Prior probabilities are depicted by an in-going arrow, exit probabilities using an outgoing arrow, and calls to a sub-model using a dashed line. All unlabelled links are weighted with 1.0.
Example 10.8. Consider the HHMM ⟨C, C1, |·|, π, τ, ξ, J, O, P⟩ with C = {D, T, C}, C1 = D, |D| = 3, |T| = 4 and |C| = 7, π_D(0) = 1, π_T(0) = 0.3, π_T(1) = 0.7, π_C(0) = 0.2, π_C(3) = 0.8 and π_c(i) = 0 otherwise, ξ_D(2) = 1, ξ_T(0) = 1, ξ_T(3) = 1, ξ_C(2) = 1, ξ_C(6) = 1 and ξ_c(i) = 0 otherwise, J_D(0) = T, J_D(2) = T, J_T(2) = C and J_C(6) = C. The transition probabilities are shown as arrows in Figure 10.7, which contains a graphical representation of this HHMM. Every arrow represents a transition with non-zero probability. Please note that this example defines the structure of the HHMM only; neither the input symbols nor the observation probabilities are depicted here.

10.4.3.1 From PCFGs to HHMMs
As a first step in constructing a HMM, we construct a HHMM corresponding to a given EPCFG, where “corresponding” informally means “doing the same”. For the moment, we consider acyclic PCFGs only. After presenting the definition and discussing the underlying intuition, we extend our transformation to PCFGs allowing for cycles of length 1, i.e., references from a rule to its own non-terminal.
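Before the formal definition, the construction can be sketched in Python: one sub-model per non-terminal, one state per symbol occurrence, priors at the first symbol of each rule, deterministic steps through a rule body and an exit after its last symbol. The grammar below is hypothetical; it is chosen only so that the resulting sizes, priors, exits and calls match Example 10.8, and the terminal sequences inside Car are placeholders. States are numbered from 0, as in Example 10.8, and the special treatment of cycles of length 1 is not shown.

```python
# Hypothetical grammar: non-terminal -> list of (rule probability, rule body).
GRAMMAR = {
    "Day": [(1.0, ["Trans", "indoor", "Trans"])],
    "Trans": [(0.3, ["walk"]), (0.7, ["walk", "Car", "walk"])],
    "Car": [(0.2, ["carSlow", "carStop", "carSlow"]),
            (0.8, ["carSlow", "carFast", "carSlow", "Car"])],
}


def grammar_to_hhmm(grammar):
    """Build one sub-model per non-terminal with one state per body symbol."""
    hhmm = {}
    for nonterminal, rules in grammar.items():
        prior, trans, exit_prob, calls, emits = {}, {}, {}, {}, {}
        offset = 0  # global offset of the current rule's first symbol
        for probability, body in rules:
            prior[offset] = probability  # enter at the rule's first symbol
            for position, symbol in enumerate(body):
                state = offset + position
                if position + 1 < len(body):
                    trans[(state, state + 1)] = 1.0  # step through the body
                else:
                    exit_prob[state] = 1.0  # leave after the last symbol
                if symbol in grammar:
                    calls[state] = symbol  # internal state: sub-model call
                else:
                    emits[state] = symbol  # terminal: gets an observation model
            offset += len(body)
        hhmm[nonterminal] = {"size": offset, "prior": prior, "trans": trans,
                             "exit": exit_prob, "calls": calls, "emits": emits}
    return hhmm


hhmm = grammar_to_hhmm(GRAMMAR)
for name, model in hhmm.items():
    print(name, model["size"], model["prior"], model["calls"])
```

The observation probabilities of a terminal state would be looked up from the EPCFG via the terminal stored in `emits`.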
For convenience, we assume the set of rules to be ordered and define the function ι : N × R × ℕ → ℕ, which maps a given non-terminal, rule and position of some symbol within the body of the rule to a global offset with respect to the non-terminal. Considering the PCFG from Example 10.7, we find ι(Day, r1, 1) = 1 and ι(Car, r5, 2) = 5.

Definition 10.10. Let ⟨T, N, N1, R, π, I, P⟩ be an extended PCFG. We define the corresponding hierarchical hidden Markov model ⟨C, C1, |·|, π, τ, ξ, J, O, P⟩ as follows:
• $C := N$, $C_1 := N_1$, $|N| := \sum_{(n \to \zeta) \in R} |\zeta|$ and $O := I$
• $\pi_c(i) := \begin{cases} \pi(r) & \text{if } i = \iota(c, r, 1) \\ 0 & \text{otherwise} \end{cases}$
• $\tau_c(i, j) := \begin{cases} 1 & \text{if } i = \iota(c, (c \to \zeta), i'),\ j = i + 1 \text{ and } i' < |\zeta| \\ 0 & \text{otherwise} \end{cases}$
• $\xi_c(i) := \begin{cases} 1 & \text{if } i = \iota(c, (c \to \zeta), i') \text{ and } i' = |\zeta| \\ 0 & \text{otherwise} \end{cases}$
• $J_c(i) := c'$ iff $i = \iota(c, (c \to \zeta), i')$, $\zeta[i'] = c'$ and $c' \in N$
• $P_c(i, s) := P(t, s)$ iff $i = \iota(c, (c \to \zeta), i')$, $\zeta[i'] = t$ and $t \in T$

For a given grammar, we construct the corresponding HHMM by inserting a sub-model for every non-terminal. For every occurrence of a (non-)terminal symbol within a rule for the non-terminal, we insert a state into the corresponding sub-model. E.g., for the non-terminal Day from the example above, we create a sub-model containing 3 states. In our running example, we use a cycle of length 1 to reference sub-model C from state C6. Such cycles can be handled by changing the transition model of the corresponding sub-state. Without discussing the technical details, the result is shown in the following example.

Example 10.9. Reconsidering the EPCFG from Example 10.7, we find the corresponding HHMM to be the one discussed in Example 10.8.

10.4.3.2 Flattening HHMMs
After constructing a hierarchical hidden Markov model from a given PCFG, we have to flatten this model into a “normal” HMM. Considering the procedural semantics of a HHMM, we find that the model’s global state can be described using the current call stack, which is represented as a list of numbers. E.g., in the HHMM from Figure 10.7, state 0 of the start model Day calls sub-model Trans; being in state 2 there is represented as call stack [0, 2]. Please note furthermore that for a given stack [s_1, ..., s_n], the stack context [s_1, ..., s_{n−1}] completely specifies the sub-model of state s_n. Therefore, we can use stack contexts to refer to sub-models in the definitions below. We use the notation [s_{1:n}] to refer to [s_1, ..., s_n]. Before defining a flattened HMM, we compute the set of reachable stacks as follows:

Definition 10.11 (Stacks). Let ⟨C, C1, |·|, π, τ, ξ, J, O, P⟩ be a given HHMM. We define the set of reachable stacks as follows:
$$S = \{[i] \mid 0 \le i < |C_1| \text{ and } \pi_{C_1}(i) > 0\}$$
$$\cup\ \{[s_{1:n-1}, i] \mid [s_{1:n}] \in S,\ 0 \le i < |[s_{1:n-1}]| \text{ and } \tau_{[s_{1:n-1}]}(s_n, i) > 0\}$$
$$\cup\ \{[s_{1:n}, i] \mid [s_{1:n}] \in S,\ J_{[s_{1:n-1}]}(s_n) = c,\ 0 \le i < |c| \text{ and } \pi_c(i) > 0\}$$

This definition collects all reachable stacks by simply traversing the HHMM. To obtain a flattened HMM, we construct a state for every possible stack ending with a non-internal state, and afterwards we compute the transition probabilities between different stacks.

Definition 10.12. Let ⟨C, C1, |·|, π, τ, ξ, J, O, P⟩ be a given HHMM and S be the set of stacks. We define the corresponding HMM ⟨S, π, T, τ, O, P⟩ as follows:
• $S := \{[s_{1:n}] \mid [s_{1:n}] \in S \text{ and } J_{[s_{1:n-1}]}(s_n) = \bot\}$, i.e., all stacks ending with a non-internal state
• $\pi : S \to \mathbb{R} : s \mapsto \begin{cases} \pi_{C_1}(i) & \text{iff } s = [i] \\ 0 & \text{otherwise} \end{cases}$
• $T := S \times S$
• $\tau : T \to \mathbb{R} : ([s_{1:n}], [r_{1:m}]) \mapsto \sum_{i=1}^{\min(n,m)} p([s_{1:n}], [r_{1:m}], i)$
• $P([s_{1:n}], o) = P_{[s_{1:n-1}]}(s_n, o)$
with $p : S \times S \times \mathbb{N} \to \mathbb{R}$ being defined as follows:
$$p([s_{1:n}], [r_{1:m}], i) = \prod_{j=n}^{i} \xi_{[s_{1:j-1}]}(s_j) \cdot \bigl(1 - \xi_{[s_{1:i-1}]}(s_i)\bigr) \cdot \tau_{[s_{1:i-1}]}(s_i, r_i) \cdot \prod_{j=m}^{i} \pi_{[r_{1:j-1}]}(r_j)$$

The intuition behind the definition of p is as follows: to reach a stack configuration [r_{1:m}] from a stack [s_{1:n}] using a transition on level i, we have to leave all sub-models from level n back to level i ($\prod_{j=n}^{i} \xi_{[s_{1:j-1}]}(s_j)$); afterwards, we multiply by the probability of not leaving sub-model i ($1 - \xi_{[s_{1:i-1}]}(s_i)$) and of moving to state r_i ($\tau_{[s_{1:i-1}]}(s_i, r_i)$); finally, we call all sub-models necessary to reach the desired stack configuration ($\prod_{j=m}^{i} \pi_{[r_{1:j-1}]}(r_j)$).
Example 10.10. Figure 10.8 shows the flattened version of the hierarchical HMM from Example 10.8.
[Figure 10.8: state diagram of the flattened HMM; states are labelled with their call stacks, e.g. 0.0, 0.1, 0.2.0 to 0.2.5, 0.3, 1, 2.0, 2.1, 2.2.0 to 2.2.5 and 2.3, and edges with the probabilities 0.2, 0.3, 0.7 and 0.8.]
Fig. 10.8 A graphical representation of the flattened HHMM from Example 10.8. As above, prior probabilities are depicted by an in-going arrow, exit probabilities using an outgoing arrow, and unlabelled links are weighted 1.0.
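The stack-based state labels of Figure 10.8 (e.g. 0.2.1) can be reproduced with a short Python sketch of the stack enumeration in Definition 10.11 and the state selection in Definition 10.12. It is a simplified illustration, not the authors' implementation: the sub-models D and T follow Example 10.8 with chain transitions of probability 1, whereas C is a hypothetical acyclic stand-in with three states so that the set of stacks stays finite. The transition probabilities of the flattened model are not computed here.

```python
from collections import deque

# Sub-models as plain dictionaries; C is an ASSUMED acyclic simplification.
HHMM = {
    "D": {"prior": {0: 1.0}, "trans": {(0, 1): 1.0, (1, 2): 1.0},
          "calls": {0: "T", 2: "T"}},
    "T": {"prior": {0: 0.3, 1: 0.7}, "trans": {(1, 2): 1.0, (2, 3): 1.0},
          "calls": {2: "C"}},
    "C": {"prior": {0: 1.0}, "trans": {(0, 1): 1.0, (1, 2): 1.0},
          "calls": {}},
}


def model_of(stack, start):
    """Name of the sub-model that the last entry of `stack` belongs to."""
    model = start
    for state in stack[:-1]:
        model = HHMM[model]["calls"][state]
    return model


def reachable_stacks(start):
    """Collect all reachable call stacks by traversing the HHMM."""
    initial = [(i,) for i, p in HHMM[start]["prior"].items() if p > 0]
    seen, queue = set(initial), deque(initial)
    while queue:
        stack = queue.popleft()
        model = HHMM[model_of(stack, start)]
        last = stack[-1]
        successors = []
        # Transitions within the current sub-model replace the top entry.
        for (i, j), p in model["trans"].items():
            if i == last and p > 0:
                successors.append(stack[:-1] + (j,))
        # An internal state calls a sub-model and pushes one of its start states.
        callee = model["calls"].get(last)
        if callee is not None:
            for i, p in HHMM[callee]["prior"].items():
                if p > 0:
                    successors.append(stack + (i,))
        for successor in successors:
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


def flat_states(start):
    """Stacks whose last entry is a non-internal state (no sub-model call)."""
    return {stack for stack in reachable_stacks(start)
            if stack[-1] not in HHMM[model_of(stack, start)]["calls"]}


for stack in sorted(flat_states("D")):
    print(".".join(str(s) for s in stack))
```

With the recursive call of Example 10.8 (J_C(6) = C) a plain traversal like this would not terminate, which is why cycles need the separate treatment mentioned in Section 10.4.3.1.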
Using the process described above, we can construct a HMM for a given extended PCFG such that the HMM captures the ideas of the grammar, i.e., all valid sequences of actions are represented within the HMM and the corresponding observation models are attached to the states. As mentioned above, we applied this approach in a first simple scenario, in which we try to infer the user's activity based on her current speed. The experiments show that the general temporal structure of a day can easily be captured in a PCFG and thus be transformed into an equivalent HMM. We did not discuss the duration of single actions here. Instead of using just a single state for a given action, one can use a sequence of states, but this extension is beyond the scope of this paper.

10.4.4 Joint HMMs
In the previous sections three approaches have been presented which allow us to model human activities in a more convenient manner than is achievable by directly using hidden Markov models (HMMs). We have seen that it is feasible to synthesise a HMM from a description of the domain and problem of the intended scenario in terms of precondition/effect operators, from a set of rules of a probabilistic context-free grammar (PCFG), or from a process algebra described in the CTML language. In each case we obtain a well-defined hidden Markov model that allows us to infer human activities for the specific application.

As mentioned in Section 10.3.1, we can assume that the real-world processes which can be described by one of the previously depicted formalisms, and from which a HMM can subsequently be synthesised, consist of time-sequential activities, each of which begins and ends at a specific point in time and which follow one after another. Thus, a state s_i ∈ S of the HMM ⟨S, π, T, τ, O, P⟩ can be seen as a specific activity in the underlying process. The transition (i, j) ∈ T connecting the states s_i and s_j can hence, if τ((i, j)) > 0, be seen as the existence of a potential consecutiveness between the two corresponding activities of this process.

Note that how well one of the specified formalisms fits the process to be modelled varies and depends heavily on the scenario. Consider having obtained two or more scenario-complementary hidden Markov models by using different synthesis algorithms. One would like to join these models, e.g. by inserting novel inter-HMM transitions, which would introduce additional potential consecutiveness between the corresponding activities. The result of this joining operation evidently needs to be a well-defined hidden Markov model itself, preserving the advantages of this probabilistic approach.

Definition 10.13 (JointHMM). Let H be a set of n well-defined hidden Markov models ⟨S_i, π_i, T_i, τ_i, O_i, P_i⟩ with 1 ≤ i ≤ n, S_i ∩ S_j = ∅ for i ≠ j, S = ⋃ S_i and O_i = O_j. Let R be a set of connections with R ⊆ S × S and ρ : R → ℝ be a mapping from inter-model connections to probabilities. We define the joint hidden Markov model ⟨S, π, T, τ, O, P⟩ as follows:
• $\pi : S \to \mathbb{R} : s \mapsto \dfrac{\pi_i(s)}{n}$ with $s \in S_i$
• $T = \bigcup T_i \cup R$
• $\tau : S \times S \to \mathbb{R} : (s_1, s_2) \mapsto \begin{cases} \rho((s_1, s_2)) & \text{if } (s_1, s_2) \in R \\ \tau_i(s_1, s_2) \cdot f & \text{with } s_1, s_2 \in S_i \text{ and } f = 1 - \sum_{(s_1, s') \in R} \rho((s_1, s')) \end{cases}$
• $P : S \times O \to \mathbb{R} : (s, o) \mapsto P_i(s, o)$ with $s \in S_i$

Given two or more hidden Markov models compliant with Definition 10.1, i.e. sextuples ⟨S_i, π_i, T_i, τ_i, O_i, P_i⟩, the question arises of how to join these models. In a first step, we unite the original sets of states S_1, S_2, ..., S_n of the n HMMs to obtain the set of states S of the joint HMM, assuming them to be pairwise disjoint. Furthermore, the values of the joint prior probability function π should sum to 1. As a straightforward approach, we divide all the values π_i by the number n of original HMMs. Alternatively, one could define (global) prior weights over the different models and combine these with the priors of each model. For reasons of simplicity we decided not to do so.
The joint set of transitions T is obtained by uniting the individual sets of transitions T_1, T_2, ..., T_n and R, the set of novel inter-HMM transitions. This step also requires the sets T_1, T_2, ..., T_n, R to be pairwise disjoint. The set R can be thought of as a set of rules that provide an intuitive way to fully specify novel inter-HMM transitions. As an example, one of these rules might denote a transition between the states s and s′ of the HMMs A and B, respectively, together with the probability p: $A(s) \xrightarrow{p} B(s')$. The probability value of an inter-HMM transition is interpreted as the chance of exiting a sub-HMM and entering the targeted sub-HMM. The joint transition probability function τ is therefore determined by the original τ_i, except for transitions that originate from a state s ∈ S_i which is itself the starting point of at least one inter-HMM transition. In such cases, the probabilities of the intra-HMM transitions leaving s have to be normalised with the factor f from Definition 10.13. Finally, regarding the observation model, for reasons of simplicity we assume the sets of observations O_i of the sub-HMMs to be identical. Thus, the observation probability function P is fully determined by the P_i of the sub-HMMs.

Example 10.11. Consider having modelled a meeting scenario held in an intelligent meeting room, e.g. by using PDDL operators, and a technical infrastructure demonstration in this room using a set of PCFG rules. After translating these two model descriptions, we obtain two disparate hidden Markov models, each recognising a different set of activities. If we imagine the meeting to be preceded by such a technical demonstration, the need to join these HMMs arises. Let M(Presentation) denote the state Presentation of a hidden Markov model M, with M modelling the meeting scenario, and let D(DemoEnding) denote the state DemoEnding of a HMM D, with D modelling the technical demonstration. We would now like to introduce a potential transition between these two activities, expressing a potential consecutiveness between the ending of a technical demonstration and a presentation within a project meeting. For this purpose we introduce a novel inter-HMM transition and establish a rule which connects the two states: $D(\mathit{DemoEnding}) \xrightarrow{0.2} M(\mathit{Presentation})$. The probability value 0.2 indicates a chance of 0.2 to leave the sub-HMM D and enter M. Each intra-HMM transition leaving the state D(DemoEnding) needs to be multiplied by 1 − 0.2 = 0.8. At this point, we have everything we need to join the HMMs D and M by Definition 10.13.
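The joining operation of Definition 10.13 can be sketched in a few lines of Python. The two toy models below are hypothetical: only the states DemoEnding and Presentation and the rule probability 0.2 are taken from Example 10.11, while the states DemoStart and Opening and all intra-model probabilities are invented for illustration. Observation models are omitted because the sets of observations are assumed to be identical and each P_i is carried over unchanged.

```python
def join_hmms(models, rho):
    """Join HMMs with pairwise disjoint state names.

    models: list of dicts with "prior" {state: p} and "trans" {(s1, s2): p}
    rho:    {(s1, s2): p} for the novel inter-HMM transitions
    """
    n = len(models)
    prior, trans = {}, {}
    for model in models:
        for state, p in model["prior"].items():
            prior[state] = p / n  # divide every prior by the number of models
        for (s1, s2), p in model["trans"].items():
            # f = 1 - sum of the inter-HMM probabilities leaving s1
            leave = sum(q for (source, _), q in rho.items() if source == s1)
            trans[(s1, s2)] = p * (1.0 - leave)
    trans.update(rho)  # add the inter-HMM transitions themselves
    return prior, trans


# Hypothetical demonstration model D and meeting model M.
D = {"prior": {"D.DemoStart": 1.0, "D.DemoEnding": 0.0},
     "trans": {("D.DemoStart", "D.DemoEnding"): 1.0,
               ("D.DemoEnding", "D.DemoEnding"): 1.0}}
M = {"prior": {"M.Opening": 1.0, "M.Presentation": 0.0},
     "trans": {("M.Opening", "M.Presentation"): 1.0,
               ("M.Presentation", "M.Presentation"): 1.0}}
RHO = {("D.DemoEnding", "M.Presentation"): 0.2}

prior, trans = join_hmms([D, M], RHO)
print(prior)
print(trans[("D.DemoEnding", "D.DemoEnding")], trans[("D.DemoEnding", "M.Presentation")])
```

After joining, the priors still sum to 1 and the outgoing probabilities of every state sum to 1, so the result is again a well-formed transition model.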
10.5 Discussion
The underlying idea of all our approaches is to turn a formal description of the user's activity into a probabilistic generative model that is able to recognise the desired activities. Over time, many different formal descriptions of human behaviour have emerged. In this paper we have explored three alternatives. The first is a top-down description, for which we have chosen CTML as a representative task modelling language. The second is a bottom-up modelling approach in which we use a causal description with preconditions and effects. Our third approach is to use PCFGs. To leverage the advantages of the individual modelling approaches, we combine the simpler HMMs generated by each modelling approach into a joint HMM.

[Figure 10.9: PDDL, CCTML and PCFG descriptions are each synthesised into a HMM (HMM1, HMM2 and HMM3), and these are joined into a joint HMM.]
Fig. 10.9 Overview of our model synthesis research
In this section we discuss the individual strengths and weaknesses of the different approaches when modelling real-world scenarios, and derive some guidelines on when to use a certain modelling paradigm and when not.
10.5.1 Planning operators
“Classical” planning has a long history of searching for solutions in order to efficiently schedule resources or control the behaviour of robots. We transferred this approach to humans in order to describe (simplified) human behaviour. The advantage of the planning approach is the implicit description of all processes that can occur, which matters because human behaviour often interleaves different activities. As the processes are modelled implicitly, no permutation can be forgotten. The possible interactions between multiple humans also emerge automatically, a clear advantage for team scenarios. Furthermore, it is easy to extend the application domain with new operators, as these can simply be added to the action repository and are automatically inserted by the algorithm in the right positions. Thus devices in a smart environment could describe their possible human interactions with preconditions and effects and become part of the activity recognition process [44].

Two main challenges arise from this planning approach. The first is the description of the world state I. It is difficult to define which information belongs in the current world state and, therefore, which information the preconditions and effects of an action need to take into account. The second challenge is the state explosion, which is much more pronounced here than in the other two approaches. However, by employing approximate inference we can construct a particle filter in which the planning process efficiently describes the state transition process, thus avoiding the need to explore the whole state space a priori.
10.5.2 Task models
Task analysis and modelling have been employed for specifying user actions for decades. It is commonly agreed in HCI that task analysis is an important device for interaction development (which we consider a super-domain of intention recognition). As CTML is rooted in the task modelling approaches of HCI, it shares the assets of those approaches: understandability, intuitiveness, tool support and embedment into a methodology. These assets are also relevant for intention recognition, especially while creating the model. Top-down modelling has the advantage of incorporating gradual refinement of models in an intuitive manner. Actions can be further decomposed as necessary. Moreover, as CTML defines a formal semantics, verification algorithms can be employed to assure certain quality criteria (e.g. deadlock freedom). Another advantage of CTML is the opportunity of validating the model by animation. As the semantics are well defined, an interactive animation can be created directly.

However, it is more important whether such a modelling approach is suitable for highly dynamic systems such as smart environments at runtime. One has to admit that task modelling is a rather static approach, as task models are created interactively by software designers at design time. Adding new tasks is rather cumbersome, as the task model needs to be adapted (which is not the case with the precondition/effect description formalism). On the other hand, manually designed models exhibit a higher quality than automatically composed ones. In contrast to context-free grammars, CTML is able to specify concurrent behaviour. This is a major advantage, as humans usually do not perform only one task at a time but interleave several tasks.

Especially for multi-user specifications, CTML models can become quite complex, which naturally leads to very large HMMs. This problem can be addressed with approximate inference by employing a particle filter. The CTML model is intuitively a compact representation of the state transition function. With a particle filter we only keep track of the executed tasks and employ lazy (on-demand) state creation. The treatment of observations is currently rather basic. More sophisticated approaches need to be developed to create more realistic HMMs, which is currently under investigation.

10.5.3 Probabilistic Context-Free Grammars
Probabilistic context-free grammars have been applied quite successfully in the area of speech recognition. Here we used them to initialise a hidden Markov model to recognise human behaviour. In first experiments we used the approach to model the overall structure of a day and to infer the user's activity based on her current speed as computed by a GPS sensor. Those experiments showed that our approach can indeed be used to model high-level activities and their temporal structure quite intuitively. They also showed, however, that interleaving activities, for example, are hard to model. Due to space constraints we did not discuss timing information here. It can be integrated into the model using the standard time-expansion of states. E.g., if a certain state is supposed to last for three time steps, it is expanded into a sequence of three identical states. In a similar fashion, by using self-transitions in all states of the sequence, it is possible to implement a negative binomial distribution over the time a state should last.

10.5.4 Joint Hidden Markov Models
The major advantage of the presented HMM-joining algorithm is that it addresses the need for a formalism to combine a set of hidden Markov models that have (potentially) been synthesised using different formalisms. To this end we have introduced a simple operator $\xrightarrow{p}$ that allows us to deploy a set of rules, each one introducing a novel inter-HMM transition together with a probability value p which obeys an intuitive semantics: the chance of leaving the current sub-HMM through this particular transition. As a result, we obtain a well-defined hidden Markov model that combines the previously shown advantages of probabilistic generative models with the symbolic high-level description of complex human activities. We are currently investigating how we can weaken the precondition O_i = O_j and thereby determine the two parameters O and P of the joint HMM in a more general manner, so that the set of observations which can be recognised by the available sensors, and the state-observation probabilities, respectively, are allowed to differ between sub-HMMs.
10.6 Summary and Outlook
In order to easily describe hierarchical behaviour, we employ high-level description languages to derive probabilistic models for activity recognition. In this paper we have shown three alternatives, based on a process-based language, a causal language and a grammar, to synthesise generative probabilistic models for activity recognition from different high-level modelling paradigms. These approaches comprise top-down and bottom-up modelling and thus complement each other. We have further sketched a method to combine the languages to derive a joint HMM, thus leveraging the individual strengths of the different modelling paradigms. Given the basics in this paper, we seek to build activity recognition systems for complex domains such as team-activity recognition or a day-care nurse, which comprise many hierarchical, long-term, intermingled activities, leading to very large state spaces that are very challenging to describe.
References
[1] D. J. Cook, J. C. Augusto, and V. R. Jakkula, Ambient intelligence: Technologies, applications, and opportunities, Pervasive and Mobile Computing. 5(4), 277–298 (August, 2009). ISSN 15741192. doi: 10.1016/j.pmcj.2009.04.001. URL http://dx.doi.org/10.1016/j.pmcj.2009.04.001.
[2] G. D. Abowd and E. D. Mynatt, Designing for the human experience in smart environments. pp. 151–174 (September, 2004). doi: 10.1002/047168659X.ch7. URL http://dx.doi.org/10.1002/047168659X.ch7.
[3] G. M. Youngblood and D. J. Cook, Data mining for hierarchical model creation, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews). 37(4), 561–572 (July, 2007). ISSN 1094-6977. doi: 10.1109/TSMCC.2007.897341. URL http://dx.doi.org/10.1109/TSMCC.2007.897341.
[4] F. Doctor, H. Hagras, and V. Callaghan, A fuzzy embedded agent-based approach for realizing ambient intelligence in intelligent inhabited environments, IEEE Transactions on Systems, Man, and Cybernetics, Part A. 35(1), 55–65, (2005). doi: 10.1109/TSMCA.2004.838488. URL http://dx.doi.org/10.1109/TSMCA.2004.838488.
[5] E. Chávez, R. Ide, and T. Kirste. Samoa: An experimental platform for situation-aware mobile assistance. In eds. C. H. Cap, W. Erhard, and W. Koch, ARCS, pp. 231–236. VDE Verlag, (1999). ISBN 3-8007-2482-0. URL http://dblp.uni-trier.de/rec/bibtex/conf/arcs/ChavezIK99.
[6] A. Fox, B. Johanson, P. Hanrahan, and T. Winograd, Integrating information appliances into an interactive workspace, IEEE Computer Graphics and Applications. 20(3), 54–65 (August, 2000). ISSN 0272-1716. doi: 10.1109/38.844373. URL http://dx.doi.org/10.1109/38.844373.
[7] S. A. Velastin, B. A. Boghossian, B. Ping, L. Lo, J. Sun, and M. A. Vicencio-silva. Prismatica: Toward ambient intelligence in public transport environments. In Good Practice for the Management and Operation of Town Centre CCTV. European Conf. on Security and Detection, vol. 35, pp. 164–182, (2005). URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.110.2113.
[8] Z. Chen. Bayesian filtering: From kalman filters to particle filters, and beyond. Technical report, McMaster University, (2003). URL http://math1.unice.fr/~delmoral/chen_bayesian.pdf.
[9] L. Liao, D. Fox, and H. Kautz. Location-based activity recognition using relational markov networks. In IJCAI’05: Proceedings of the 19th international joint conference on Artificial intelligence, pp. 773–778, San Francisco, CA, USA, (2005). Morgan Kaufmann Publishers Inc. URL http://portal.acm.org/citation.cfm?id=1642293.1642417.
[10] D. J. Patterson, L. Liao, K. Gajos, M. Collier, N. Livic, K. Olson, S. Wang, D. Fox, and H. A. Kautz. Opportunity knocks: A system to provide cognitive assistance with transportation services. In eds. N. Davies, E. D. Mynatt, and I. Siio, Ubicomp, vol. 3205, Lecture Notes in Computer Science, pp. 433–450. Springer, (2004). ISBN 3-540-22955-8. URL http://dblp.uni-trier.de/rec/bibtex/conf/huc/PattersonLGCLOWFK04.
[11] D. H. Hu and Q. Yang. Cigar: concurrent and interleaving goal and activity recognition. In AAAI’08: Proceedings of the 23rd national conference on Artificial intelligence, pp. 1363–1368. AAAI Press, (2008). ISBN 978-1-57735-368-3. URL http://portal.acm.org/citation.cfm?id=1620286.
[12] A. Hein and T. Kirste, A hybrid approach for recognizing adls and care activities using inertial sensors and rfid. 5615, 178–188, (2009). doi: 10.1007/978-3-642-02710-9_21. URL http://dx.doi.org/10.1007/978-3-642-02710-9_21.
[13] M. Perkowitz, M. Philipose, K. Fishkin, and D. J. Patterson. Mining models of human activities from the web. In WWW ’04: Proceedings of the 13th international conference on World Wide Web, pp. 573–582, New York, NY, USA, (2004). ACM. ISBN 1-58113-844-X. doi: 10.1145/988672.988750. URL http://dx.doi.org/10.1145/988672.988750.
[14] H. A. Kautz. A formal theory of plan recognition and its implementation. In eds. J. F. Allen, H. A. Kautz, R. Pelavin, and J. Tenenberg, Reasoning About Plans, pp. 69–125. Morgan Kaufmann Publishers, San Mateo (CA), USA, (1991). URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.1583.
[15] R. P. Goldman, C. W. Geib, and C. A. Miller, A bayesian model of plan recognition, Artificial Intelligence. 64, 53–79, (1993). URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.21.4744.
[16] M. Philipose, K. P. Fishkin, M. Perkowitz, D. J. Patterson, D. Fox, H. Kautz, and D. Hahnel, Inferring activities from interactions with objects, IEEE Pervasive Computing. 3(4), 50–57 (October, 2004). ISSN 1536-1268. doi: 10.1109/MPRV.2004.7. URL http://dx.doi.org/10.1109/MPRV.2004.7.
[17] D. W. Albrecht, I. Zukerman, and A. E. Nicholson, Bayesian models for keyhole plan recognition in an adventure game, User Modeling and User-Adapted Interaction. 8(1), 5–47 (March, 1998). doi: 10.1023/A:1008238218679. URL http://dx.doi.org/10.1023/A:1008238218679.
[18] H. H. Bui. A general model for online probabilistic plan recognition. In IJCAI’03: Proceedings of the 18th international joint conference on Artificial intelligence, pp. 1309–1315, San Francisco, CA, USA, (2003). Morgan Kaufmann Publishers Inc. URL http://portal.acm.org/citation.cfm?id=1630846.
[19] E. Kim, S. Helal, and D. Cook, Human activity recognition and pattern discovery, IEEE Pervasive Computing. 9(1), 48–53 (January, 2010). ISSN 1536-1268. doi: 10.1109/MPRV.2010.7. URL http://dx.doi.org/10.1109/MPRV.2010.7.
[20] G. Singla, D. J. Cook, and M. Schmitter-Edgecombe, Recognizing independent and joint activities among multiple residents in smart environments, Journal of Ambient Intelligence and Humanized Computing. 1(1), 57–63. ISSN 1868-5137. doi: 10.1007/s12652-009-0007-1. URL http://dx.doi.org/10.1007/s12652-009-0007-1.
[21] T. Gu, Z. Wu, X. Tao, H. K. Pung, and J. Lu. epsicar: An emerging patterns based approach to sequential, interleaved and concurrent activity recognition. In 2009 IEEE International Conference on Pervasive Computing and Communications, vol. 0, pp. 1–9, Los Alamitos, CA, USA (March, 2009). IEEE. ISBN 978-1-4244-3304-9. doi: 10.1109/PERCOM.2009.4912776. URL http://dx.doi.org/10.1109/PERCOM.2009.4912776.
[22] R. Hamid, S. Maddi, A. Johnson, A. Bobick, I. Essa, and C. Isbell, A novel sequence representation for unsupervised analysis of human activities, Artificial Intelligence. 173(14), 1221–1244 (September, 2009). ISSN 00043702. doi: 10.1016/j.artint.2009.05.002. URL http://dx.doi.org/10.1016/j.artint.2009.05.002.
[23] A. Y. Ng and M. I. Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes, (2001). URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.9829.
[24] J.-H. Xue and D. M. Titterington, Comment on “on discriminative vs. generative classifiers: A comparison of logistic regression and naive bayes”, Neural Processing Letters. 28(3), 169–187 (December, 2008). ISSN 1370-4621. doi: 10.1007/s11063-008-9088-7. URL http://dx.doi.org/10.1007/s11063-008-9088-7.
[25] S. Dasgupta, M. L. Littman, and D. McAllester. Pac generalization bounds for co-training, (2001). URL http://ttic.uchicago.edu/~dmcallester/cotrain01.ps.
[26] A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the em algorithm, Journal of the Royal Statistical Society. Series B (Methodological). 39(1), 1–38, (1977). ISSN 00359246. doi: 10.2307/2984875. URL http://dx.doi.org/10.2307/2984875.
[27] V. W. Zheng, D. H. Hu, and Q. Yang. Cross-domain activity recognition. In Ubicomp ’09: Proceedings of the 11th international conference on Ubiquitous computing, pp. 61–70, New York, NY, USA, (2009). ACM. ISBN 978-1-60558-431-7. doi: 10.1145/1620545.1620554. URL http://dx.doi.org/10.1145/1620545.1620554.
[28] W. Pentney. Large scale use of common sense for activity recognition and analysis. URL http://citeseer.ist.psu.edu/rd/32044135%2C761213%2C1%2C0.25%2CDownload/http://citeseer.ist.psu.edu/cache/papers/cs2/285/http:zSzzSzwww.cs.washington.eduzSzhomeszSzbillzSzgenerals.pdf/pentney05large.pdf.
[29] W. Pentney, M. Philipose, and J. A. Bilmes. Structure learning on large scale common sense statistical models of human state. In eds. D. Fox and C. P. Gomes, AAAI, pp. 1389–1395. AAAI Press, (2008). ISBN 978-1-57735-368-3. URL http://dblp.uni-trier.de/rec/bibtex/conf/aaai/PentneyPB08.
[30] R. E. Fikes and N. J. Nilsson, Strips: A new approach to the application of theorem proving to problem solving, Artificial Intelligence. 2(3-4), 189–208, (1971). doi: 10.1016/0004-3702(71)90010-5. URL http://dx.doi.org/10.1016/0004-3702(71)90010-5.
[31] Q. Limbourg and J. Vanderdonckt. Comparing task models for user interface design. In eds. D. Diaper and N. Stanton, The Handbook of Task Analysis for Human-Computer Interaction. Lawrence Erlbaum Associates, (2003).
[32] M. Giersich, P. Forbrig, G. Fuchs, T. Kirste, D. Reichart, and H. Schumann, Towards an integrated approach for task modeling and human behavior recognition, Human-Computer Interaction. 4550, 1109–1118, (2007).
[33] M. Wurdel. Towards an holistic understanding of tasks, objects and location in collaborative environments. In ed. M. Kurosu, HCI (10), vol. 5619, Lecture Notes in Computer Science, pp. 357–366. Springer, (2009). ISBN 978-3-642-02805-2.
[34] M. Wurdel, D. Sinnig, and P. Forbrig, Ctml: Domain and task modeling for collaborative environments, JUCS. 14(Human-Computer Interaction), (2008).
[35] A. W. Roscoe, C. A. R. Hoare, and B. Richard, The Theory and Practice of Concurrency. (Prentice Hall PTR, 1997).
[36] F. Paterno and C. Santoro. The concurtasktrees notation for task modelling. Technical report, (2001).
[37] K. Lari and S. Young, The estimation of stochastic context-free grammars using the inside-outside algorithm, Computer Speech and Language. (4), 35–56, (1990).
[38] J. Hoffmann and B. Nebel. The ff planning system: Fast plan generation through heuristic search, (2001). URL http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.673.
[39] L. Brownston, Programming expert systems in OPS5 : an introduction to rule-based programming. The Addison-Wesley series in artificial intelligence, (Addison-Wesley, 1985). ISBN 978. URL http://www.worldcat.org/isbn/978.
[40] D. B. Lenat, Cyc: a large-scale investment in knowledge infrastructure, Commun. ACM. 38(11), 33–38 (November, 1995). ISSN 0001-0782. doi: 10.1145/219717.219745. URL http://dx.doi.org/10.1145/219717.219745.
[41] P. Singh, T. Lin, E. T. Mueller, G. Lim, T. Perkins, and W. L. Zhu. Open mind common sense: Knowledge acquisition from the general public. In On the Move to Meaningful Internet Systems, 2002 - DOA/CoopIS/ODBASE 2002 Confederated International Conferences DOA, CoopIS and ODBASE 2002, pp. 1223–1237, London, UK, (2002). Springer-Verlag. ISBN 3-540-00106-9. URL http://portal.acm.org/citation.cfm?id=701499.
[42] P. Kiefer and K. Stein. A framework for mobile intention recognition in spatially structured environments. In eds. B. Gottfried and H. K. Aghajan, BMI, vol. 396, CEUR Workshop Proceedings, pp. 28–41. CEUR-WS.org, (2008). URL http://dblp.uni-trier.de/rec/bibtex/conf/ki/KieferS08.
[43] S. Fine and Y. Singer. The hierarchical hidden markov model: Analysis and applications. In MACHINE LEARNING, pp. 41–62, (1998).
[44] C. Reisse, C. Burghardt, F. Marquardt, T. Kirste, and A. Uhrmacher. Smart environments meet the semantic web. In MUM ’08: Proceedings of the 7th International Conference on Mobile and Ubiquitous Multimedia, pp. 88–91, New York, NY, USA, (2008). ACM. ISBN 978-1-60558-192-7. doi: 10.1145/1543137.1543154. URL http://dx.doi.org/10.1145/1543137.1543154.
Chapter 11
Ontology-based Learning Framework for Activity Assistance in an Adaptive Smart Home
George Okeyo, Liming Chen, Hui Wang, and Roy Sterritt Computer Science Research Institute, School of Computing and Mathematics, University of Ulster, BT37 0QB Newtownabbey, United Kingdom
[email protected] {l.chen, h.wang, r.sterritt}@ulster.ac.uk
Abstract
Activity and behaviour modelling are significant for activity recognition and personalized assistance, respectively, in smart home based assisted living. Ontology-based activity and behaviour modelling is able to leverage domain knowledge and heuristics to create Activities of Daily Living (ADL) and behaviour models with rich semantics. However, such models suffer from incompleteness, inflexibility, and lack of adaptation. In this article, we propose a novel approach for learning and evolving activity and behaviour models. The approach uses predefined “seed” ADL ontologies to identify activities from sensor activation streams. Similarly, we provide predefined, but initially unpopulated, behaviour ontologies to aid behaviour recognition. First, we develop algorithms that analyze logs of activity data to discover new activities as well as the conditions for evolving the seed ADL ontologies. Second, we provide an algorithm for learning and evolving behaviours (or life habits) from these logs. We illustrate our approach through scenarios. The first scenario shows how ADL models can be evolved to accommodate new ADL activities and the peculiarities of individual smart home inhabitants. The second scenario describes how, subsequent to ADL learning and evolution, behaviours can be learned and evolved.
11.1 Introduction
The ability to provide activity assistance to inhabitants of a Smart Home (SH) [1] is significant in increasing the utility of SHs. However, before activity assistance can be provided, the SH should be aware of the inhabitant’s ongoing or intended activity. An SH identifies ongoing or intended activities through a process of activity recognition. In a typical SH, sensors are deployed to monitor an actor’s behaviour and the situated environment. By using activity recognition, the SH can analyze the incoming sensor data to identify the ongoing activity or to predict the intended activity. In general, activity recognition is made up of several tasks, namely behaviour and environment monitoring, data processing, activity modelling, and pattern recognition [2].
At the core of activity recognition is activity modelling and representation. Essentially, computational activity models are created and used to analyze incoming sensor data to infer ongoing or intended activities. Activity models can be created by handcrafting explicit models or by learning implicit models from sensor data. The resulting activity models represent activities of daily living (ADL). An emerging approach to activity modelling is to explicitly model and represent ADLs using ontologies. ADL ontologies model activities as a hierarchy of classes and interrelationships, with each class described by a number of properties. Each property specifies a feature of an activity class, e.g. sensors, time, or location.
In a typical situation, there is a wide range of ADLs, and each ADL could be performed in a variety of ways depending on an individual’s preferences and styles. It is therefore quite common that some activities are not originally anticipated and hence are not modelled by the ADL ontologies. Such activities cannot be recognised; during activity recognition they are flagged as unknown activities. Currently, the ontology-based approach has one major limitation: the seed ontological activity models are static and unable to handle unforeseen activities or activity variants without explicit manual revision. In addition, there is a lack of flexibility in handling the different ways in which different users perform activities. This limits the ability of such models to effectively support the provision of personalised activity assistance.
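To make the idea of an ADL class hierarchy concrete, the following is a minimal Python sketch, not the chapter's ontology or its Description Logics reasoner. The class names (`MakeDrink`, `MakeTea`, `MakeCoffee`), the sensor names, and the matching rule (pick the most specific class whose required sensors were all observed) are illustrative assumptions; only the notions of a class hierarchy, properties such as sensors and location, and the 'unknown' flag come from the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass
class ActivityClass:
    """One node in a toy ADL class hierarchy (hypothetical structure)."""
    name: str
    parent: Optional["ActivityClass"] = None
    sensors: FrozenSet[str] = frozenset()  # sensors this class adds to its parent's
    location: Optional[str] = None         # None means "inherit from parent"

    def required_sensors(self) -> FrozenSet[str]:
        # A subclass inherits every sensor requirement of its ancestors.
        inherited = self.parent.required_sensors() if self.parent else frozenset()
        return inherited | self.sensors

    def effective_location(self) -> Optional[str]:
        if self.location is not None:
            return self.location
        return self.parent.effective_location() if self.parent else None


def recognise(observed: Iterable[str], location: str,
              classes: Iterable[ActivityClass]) -> str:
    """Return the most specific matching class name, or 'unknown'."""
    observed = frozenset(observed)
    best = None
    for cls in classes:
        required = cls.required_sensors()
        loc = cls.effective_location()
        if not required or (loc is not None and loc != location):
            continue
        if required <= observed and (
                best is None or len(required) > len(best.required_sensors())):
            best = cls
    return best.name if best else "unknown"


# Hypothetical seed model: a generic class with two specialisations.
make_drink = ActivityClass("MakeDrink", sensors=frozenset({"cup"}), location="kitchen")
make_tea = ActivityClass("MakeTea", parent=make_drink,
                         sensors=frozenset({"kettle", "teabag"}))
make_coffee = ActivityClass("MakeCoffee", parent=make_drink,
                            sensors=frozenset({"kettle", "coffee_jar"}))
ONTOLOGY = [make_drink, make_tea, make_coffee]

print(recognise({"cup", "kettle", "teabag"}, "kitchen", ONTOLOGY))
print(recognise({"toothbrush"}, "bathroom", ONTOLOGY))
```

In this sketch an observation that matches only the generic class is reported at the coarse level, while an observation matching no class is flagged as 'unknown', which is the case the static seed model cannot handle without revision.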
The provision of activity assistance requires the SH to not only identify the ongoing or intended activity, but also the manner in which the activity is performed. However, activity recognition is primarily concerned with the identification of the ongoing activity. To determine how an activity is performed, a Smart Home uses behaviour recognition. Behaviour (or life habit) typically describes how an individual performs an activity, and possibly identifies the entities that are used. In this article we distinguish between a life habit (behaviour) and an activity of daily living (ADL). From [3], a habit is defined as “a recurrent, often unconscious pattern of behaviour that is acquired often through frequent repetition”. Therefore, a life habit refers to a habit discovered from ADLs. In [4], we presented an approach that supports the evolution of ontological activity models. We extend the work in [4] to enable behaviour modelling, learning and evolution. To achieve this,
we create ontological behaviour models and formalize these using the Behaviour ontology. Essentially, a behaviour model represents a life habit, an associated ADL activity, and the sets of primitive actions that constitute it. In this article, we present an integrated approach that supports activity modelling, activity model learning and evolution, and behaviour discovery and evolution. Our novel approach, which is based on ontology evolution [5], is capable of supporting continuous learning and evolution of ADL models and behaviours. For ADL model learning, we create an initial ontology-driven knowledge base representing the ADL model (called the seed ADL ontology) and evolve it over time. For instance, given unknown activities or variations in performance of activities, the seed ontological activity models are modified through learning. Analogously, we create an initial Behaviour ontology that has no behaviour instances and then populate it through behaviour discovery and evolution. In summary, the approach has the following strengths. Firstly, it uses common sense domain knowledge for representing activity models; hence activities can be identified in spite of sparse initial data. Secondly, it is open to new knowledge, e.g. new kinds of activities and activity variants, making it capable of handling unforeseen activities. Thirdly, it leverages the strengths of predefined and learnt activity models to enable adaptability, flexibility and reusability. Through adaptability, it can dynamically handle changes in both the user environment and behaviour. Fourth, through flexibility, it can offer both coarse-grained and fine-grained activity recognition. Finally, by discovering and evolving behaviours, it is capable of offering personalized activity assistance to inhabitants. We demonstrate, through use scenarios, how ontology evolution has been utilized to achieve activity and behaviour model adaptation. The remainder of the article is organized as follows. 
Section 11.2 describes related work in activity and behaviour modelling and adaptation. Section 11.3 presents the proposed framework for adapting ontological activity and behaviour models. Section 11.4 describes the method for activity learning and model evolution. Section 11.5 describes the method for behaviour discovery and evolution. Section 11.6 gives an illustration to demonstrate the framework. Finally, we conclude and comment on future work in Section 11.7.
11.2 Related Work
Activity Modelling. In the area of activity modelling, two main approaches can be identified in the literature: data-driven and knowledge-driven approaches. Data-driven approaches rely on probabilistic and statistical models to represent activities. Examples include Hidden Markov Models (HMMs) [6], dynamic and naive Bayes networks [7], support vector machines (SVMs) [8], graphical models [9] and multiple eigenspaces [10]. Individual activity models are obtained by training on large-scale data sets. Sensor data are then reasoned against the resulting activity models to support activity recognition. Data-driven approaches are generally robust to noisy, uncertain and incomplete data but suffer from several drawbacks. Firstly, they are not reusable, since they need to be learnt for each activity and each individual. Secondly, they are not flexible enough to accommodate variations in an individual’s behaviour and the environment. Thirdly, they can be computationally expensive, as each activity needs to be learnt separately and models have to be learnt for each individual. Finally, they require large amounts of labelled training and test data.
On the other hand, knowledge-driven approaches use artificial intelligence (AI) based knowledge representation formalisms for activity modelling. The core idea is to exploit rich domain knowledge for activity modelling rather than to learn activity models from data, as in data-driven approaches. For example, the use of logical formalisms to model and represent an activity as a sequence of events is presented in [11]. In [12–14] and [2], the use of ontologies to represent sensor data and activity models is reported. For instance, [2] uses ontologies to explicitly model ADLs and Description Logics (DL) [15]-based reasoning for activity recognition. The strengths of the knowledge-driven approach are that it captures the clear semantics inherent in domain knowledge, allows domain knowledge and heuristics for activity models and data fusion to be incorporated, and supports knowledge-based intelligent processing by capturing everyday common sense domain knowledge.
The major weakness of the knowledge-driven approach is that the predefined activity models, derived from the expert’s domain knowledge, are static and unable to adapt to changes in the user’s behaviour. As a result, they cannot handle emergent activities or changes in the user’s behaviour.
Behaviour Modelling, Learning and Recognition. In the area of behaviour learning and recognition, a number of approaches have been explored. In [12], the use of Bayesian Networks and ontologies is presented. Further, [16] reports the use of Event Calculus (EC) [17] to model and recognize behaviours. Behaviours are described as predefined rules, with EC used to represent temporal constraints. These predefined rules can be used by a behaviour recognition system to infer behaviours. In [18], the authors present a method to learn patterns of user behaviour. It uses a Patterns of User Behaviour System (PUBS) that is able to learn patterns from sensor data. PUBS represents patterns as Event-Condition-Action (ECA) rules. The rules specify the actions to carry out when certain events occur under certain conditions. An unsupervised method to identify frequent patterns that occur in an individual’s routines is presented in [19]. Frequent patterns can be likened to behaviours. The method uses a mining approach to discover activity patterns and then a clustering step to group the discovered patterns. It is applied to sensor data, and the ultimate aim is to use these patterns for activity recognition. In [20], a framework to learn patterns of human behaviour is proposed. It combines clustering with the learning of sequences of actions. The method is unsupervised and is applied to primary sensor data. The use of association rule mining for routines and preferences is reported in [21]. It uses spatio-temporal ontology models to represent contextual information and a rule learning mechanism to learn from annotated activity data. The annotated activity data is labelled through a user’s responses to system messages.
Our work belongs to the knowledge-driven approach but extends existing work in [2] and [4] by allowing the creation of behaviour models and the adaptation of both the ADL activity and behaviour models. In [2], the predefined ADL ontology does not evolve with use. In [4], the explicitly predefined ADL activity ontologies are used only as the seed activity models. The seed ontological activity model is then evolved on the basis of newly discovered primitive activities or individual preferences. In this work, we further provide predefined behaviour models which are then dynamically populated and adapted as more activity data becomes available. Given the reviewed literature on behaviour learning, we note that most methods essentially process raw sensor data to discover behaviours. In our work, the sensor data first undergoes classification and labelling through ontological inference.
The proposed learning approach is applied to this labelled activity data. Since changes would be made to activity and behaviour models, which are represented as ontologies, our approach uses techniques from ontology evolution [22] to ensure that there is consistent ontology change.
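The difference between learning from raw sensor data and learning from labelled activity data can be illustrated with a small Python sketch. It is not the behaviour discovery algorithm of this chapter; the function name `frequent_action_sequences`, the `min_support` threshold, and the sample data are hypothetical. The sketch assumes each log entry already carries the activity label produced by ontological inference, and simply reports the action sequences that recur for each label as candidate life habits.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def frequent_action_sequences(
        labelled_traces: Iterable[Tuple[str, Sequence[str]]],
        min_support: int = 2) -> Dict[str, List[Tuple[str, ...]]]:
    """Group traces by activity label and keep sequences seen at least min_support times."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for label, actions in labelled_traces:
        counts[label][tuple(actions)] += 1
    return {
        label: [seq for seq, n in counter.most_common() if n >= min_support]
        for label, counter in counts.items()
    }


# Hypothetical labelled activity data: (inferred label, ordered primitive actions).
traces = [
    ("MakeTea", ["kettle", "cup", "teabag"]),
    ("MakeTea", ["kettle", "cup", "teabag"]),
    ("MakeTea", ["cup", "kettle", "teabag"]),
    ("MakeCoffee", ["kettle", "cup"]),
]
habits = frequent_action_sequences(traces, min_support=2)
print(habits)
```

Because the grouping key is the inferred label, the counting step never has to segment or classify raw sensor streams; that work is assumed to have been done by the recognition stage.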
11.3 Activity and Behaviour Learning and Model Evolution Framework
11.3.1 Rationale
In an SH, inhabitants perform ADLs in a given location, time and using specific objects. Collectively, these refer to context for the corresponding activity. This contextual information can be monitored and captured using sensors and be used to infer ongoing activities. In ontology-based activity modelling, activity models that establish links between
activities and their context can be created and represented as formal ADL ontologies [4]. Therefore, activity recognition amounts to constructing situations from sensor observations and reasoning with them against activity models represented by the ADL ontologies. Due to individual preferences and/or limitations, the performance of ADLs may vary. In some cases, even the same individual may carry out an activity in various ways and sequences. As such, during activity modelling, activities can be represented at multiple levels of abstraction, namely generic, personalized and idiosyncratic representation. Generic models capture available common sense domain knowledge that characterizes most ADLs. Personalized models vary between individuals, while idiosyncratic models are special types of personalized models that capture the variations in how the same individual carries out ADLs. The seed ADL ontology may not capture all possible ADL activities, including subtleties, due to differences among individuals and the idiosyncratic ways in which they carry out activities. The consequence is that the inference engine may fail to identify the ongoing activity. It is important that the activity recognition system learns to recognize peculiarities of modelled activities. Similarly, it should learn to recognize activities that users perform but that are not presently represented in the activity models. This allows it to provide reasonable and trustworthy ADL assistance to inhabitants. To address this, the ADL ontology needs to be responsive and adaptable to individual inhabitants as well as to the different ways that an individual may complete activities. We capture information about ongoing activities as activity traces, which are logged for further analysis. We can distinguish between labelled and un-labelled activities in the captured information. Labelled activities are those that are explicitly encoded with a term in the ADL ontology. 
They can be identified through an ontological reasoning process. Unlabelled activities, on the other hand, are activities that the inhabitant carries out but for which the reasoning engine cannot find a corresponding term in the ADL ontology. As a result, they are logged with the label ‘unknown’. In addition, an assistive system will need knowledge about how an individual performs activities, i.e. the behaviour (or life habit). In the same way that we create activity models for ADL activities, we can use behaviour models to represent life habits. Given that more activity data can be obtained over time, we can allow the behaviour models to be adapted in line with the available activity data. We can use an ontological representation to formalize the behaviour model. In the next sub-section, we describe a framework for adapting the activity and behaviour models in view of activity data.
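The distinction between labelled and unlabelled activity traces can be sketched as follows. This is a minimal Python illustration under assumed names: the `ActivityTrace` fields, the `split_log` and `recurring_unknown_patterns` helpers, and the sample data are hypothetical and are not the Trace Ontology or the learning algorithms of this chapter. Only the convention that unrecognised activities carry the label 'unknown' comes from the text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

UNKNOWN = "unknown"  # label used when no ADL ontology term matches


@dataclass(frozen=True)
class ActivityTrace:
    """One logged activity: sensor activations in order, a start time, and a label."""
    sensors: Tuple[str, ...]
    start: str
    label: str = UNKNOWN


def split_log(log: List[ActivityTrace]) -> Tuple[List[ActivityTrace], List[ActivityTrace]]:
    """Partition a log into labelled and unlabelled ('unknown') traces."""
    labelled = [t for t in log if t.label != UNKNOWN]
    unlabelled = [t for t in log if t.label == UNKNOWN]
    return labelled, unlabelled


def recurring_unknown_patterns(log: List[ActivityTrace]) -> Counter:
    """Count how often each set of sensors occurs among the unknown traces."""
    _, unlabelled = split_log(log)
    return Counter(frozenset(t.sensors) for t in unlabelled)


# Hypothetical log entries.
log = [
    ActivityTrace(("kettle", "cup", "teabag"), "08:00", "MakeTea"),
    ActivityTrace(("kettle", "cup", "chocolate"), "21:00"),
    ActivityTrace(("cup", "kettle", "chocolate"), "21:05"),
]
labelled, unlabelled = split_log(log)
print(len(labelled), len(unlabelled))
print(recurring_unknown_patterns(log).most_common(1))
```

An unknown sensor pattern that recurs in the log, as in this toy data, is the kind of evidence a learner could use to propose a new activity or a variant of an existing one.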
11.3.2 The Architecture
We exploit ontology evolution [5] and integrate it with activity recognition and learning to give the generic framework shown in Fig. 11.1. The ADL Ontology provides an activity model that serves as a semantic model of the Smart Home. It captures activities and contextual information, e.g. time, location and actors, and gives the semantics needed to support activity recognition. It uses classes to represent activities and their interrelationships, and properties to connect activities with their context. The Behaviour Ontology captures repetitive patterns of behaviour (or life habits) and associated semantic information. For instance, it records information about the primitive actions and patterns of primitive actions that constitute an activity. The Behaviour Ontology can be used by an assistive system to determine when the inhabitant needs help, the kind of help to provide and how to provide it. It essentially allows life habits to be inferred. The Activity Recognizer is responsible for activity recognition and the recording of activity traces. It receives sensor data as a sequence of sensor activations and reasons with these through Description Logics (DL)-based reasoning [23] against the ADL ontology to infer ongoing activities. It outputs activity data that includes inferred activities and activity traces; it records the latter in the Activity Log as instances of the Trace Ontology. The Trace Ontology provides the format of activity traces and allows meta-data about them to be captured and stored. It describes the information to be captured about each activity, e.g. sequences of sensor activations, temporal information and activity labels. The Trace Ontology is used to instantiate individual activity traces in the Activity Log. The Activity Log provides persistent storage of activity traces related to labelled and unlabelled activities. The Activity Log is represented in a suitable machine-processable format, e.g. XML.
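As an illustration of how an activity trace might be stored in an XML Activity Log, the following Python sketch uses the standard library's `xml.etree.ElementTree`. The element and attribute names (`activityLog`, `activityTrace`, `activation`, `label`, `sensor`, `time`) are assumptions made for this example; the chapter does not specify the schema, which would in practice follow the Trace Ontology.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def trace_to_xml(label: str, activations: Iterable[Tuple[str, str]]) -> ET.Element:
    """Build one <activityTrace> element from (sensor, timestamp) pairs."""
    trace = ET.Element("activityTrace", {"label": label})
    for sensor, timestamp in activations:
        ET.SubElement(trace, "activation", {"sensor": sensor, "time": timestamp})
    return trace


# Hypothetical log with one recognised and one unrecognised activity.
activity_log = ET.Element("activityLog")
activity_log.append(trace_to_xml("MakeTea", [
    ("kettle", "2010-05-01T08:00:05"),
    ("cup", "2010-05-01T08:00:40"),
]))
activity_log.append(trace_to_xml("unknown", [
    ("chocolate", "2010-05-01T21:00:10"),
]))

# Serialise to text, as it might be written to persistent storage.
xml_text = ET.tostring(activity_log, encoding="unicode")
print(xml_text)
```

Keeping the label as an attribute of each trace means that the labelled and unlabelled traces can later be selected from the same log with a simple query on that attribute.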
To enhance the utility and responsiveness of the knowledge represented, we propose to evolve the ADL and Behaviour ontologies. We use ontology evolution to provide and formalize a systematic procedure for effecting ontology change and adopt the process described in [24]. As a consequence, we have central to our framework the Ontology Evolution component that is responsible for ontology evolution. The main function of this component is to evolve ADL and Behaviour ontologies while it maintains their consistency. It processes the Activity Log to allow the modification of the ADL ontology and to support the discovery of life habits and evolution of the Behaviour ontology. The Ontology Evolution component integrates two components, namely ADL Learner and Behaviour Learner, to facilitate this task. The ADL Learner is responsible for evolving the ADL ontology.
Fig. 11.1 The generic framework for activity and behaviour learning and model evolution.
It determines whether to change the ADL ontology and the nature of the change to make, e.g. whether to model new activities or variants of existing activities. To achieve this, it analyses the Activity Log using the algorithms described in Section 11.4. The Behaviour Learner allows the Behaviour ontology to be evolved. It is responsible for discovering user behaviour (life habits) and changing the Behaviour ontology. Recall that behaviour can be described as a repetitive pattern of carrying out an activity. To achieve its objective, the Behaviour Learner uses the method described in Section 11.5. A description of the Assistive system is outside the scope of this article. In a nutshell, we separate the component used for learning from the component used for activity recognition. In this way, learning only takes place later, after activity data has been collected for some time. The results of learning are enhanced ADL ontologies, and better recognition and decision support for activity assistance. This means that even before any learning can take place, it should still be possible to identify ongoing ADL activities. In the next section, we describe the process captured in the framework.
11.3.3 The Process
From Fig. 11.1, we can distinguish two distinct, but related, phases, namely activity recognition, learning and model evolution (Phase I) and behaviour learning and evolution (Phase II). This division ensures that there is a logical separation of the two processes while highlighting their interdependency. In Phase I, sensor data is processed by the Activity Recognizer component in order to identify ongoing activities. The resulting information
is captured in the Activity Log. The ADL Learner component then processes the Activity Log in order to discover new activities and to evolve the ADL ontology. Using ontology evolution, the ADL Learner makes changes to the ADL ontology that include the addition of new concepts and instances as well as the restructuring of the ontology. Phase I occurs in both the Activity Recognizer and the ADL Learner. In Phase II, the Behaviour Learner uses the Activity Log in order to discover life habits. Initially, it uses a behaviour discovery process to analyze the activity traces to discover life habits. Subsequently, it uses a behaviour adaptation and evolution process to make changes to the Behaviour ontology based on the discovered life habits. As a result, the Behaviour ontology is continuously updated as more activity data becomes available. Phase II occurs entirely within the Behaviour Learner. In this article, the focus of behaviour discovery is on life habits based on single ADLs. The discovery of life habits made up of more than one ADL activity will be investigated in future work. In the next two sections, we describe the methods for implementing these two phases.
11.4 Activity Learning and Model Evolution Methods
In our approach, we propose to perform ADL adaptation by means of a formal Ontology Evolution process. The adopted process has six phases [24], namely change capturing, change representation, semantics of change, change implementation, change propagation, and change validation. The first step in ontology evolution is the change capturing phase, which entails a change discovery process. In this article we focus on this phase. Change discovery aims to identify the changes that need to be made to the ontology of interest. According to [24] there are three broad approaches to change discovery, namely structure-driven, data-driven, and usage-driven discovery. For instance, in usage-driven change discovery, ontology usage data is analysed to discover problems with the ontology and conditions for evolving it. The types of changes related to labelled and unlabelled activities correspond to changes typically discovered through a usage-driven change discovery process. In our framework, the usage data is captured as activity traces stored in the Activity Log.
At present, a number of heuristics-based methods have been used for usage-driven change discovery. For instance, [25] uses the notion of semantic similarity, adopting ontology-matching techniques for the discovery of new concepts. In [24], the authors present statistical measures based on the ontology structure and its usage, e.g. navigation and querying. In [26] a semantic similarity
measure is proposed that uses background knowledge to compute similarity between terms. Evolva, an ontology evolution framework that deals only with the discovery of concepts, is presented in [27]. Most existing methods deal with change discovery that leads to concept-related changes. Our work aims at the discovery of both instance- and concept-related changes. To this end, we propose a set of measures to formalize this discovery and then propagate the discovered changes to the other phases of the ontology evolution process. We use a combination of heuristic-based, statistical and semantic similarity measurements. The semantic similarity measures are capable of making comparisons among concepts, properties and instances and can be used to compare entities in both the Activity Log and the ADL ontology. In the next sub-sections, we describe the measures and the proposed algorithms for change discovery.
11.4.1
Preliminaries
In order to aid analysis and to apply the proposed measures, we adopt the following definition of ontology structure, to which the ADL Ontology and Trace Ontology must conform.
Definition 11.1 (Ontology Structure). An ontology structure is a tuple $O = \{C, P, R, A, I, H^C, H^R, H^A, Lit, domain, range\}$ which consists of a set of concept symbols $C$, a set of relation symbols $R$, a set of attribute symbols $A$, a set of property symbols $P = R \cup A$, a set of instances $I$, and a concept hierarchy $H^C \subseteq C \times C$. The remaining elements are a relation hierarchy $H^R$, an attribute hierarchy $H^A$ and a set of literals $Lit$. The function $domain : R \to C$ gives the domain of $R$ and the function $range : A \to Lit$ gives the range of $A$.
Further, we define labelled and unlabelled activity traces, and corresponding sets for each.
Definition 11.2 (Labelled and Unlabelled Trace). A labelled trace (LT) is a trace whose label has a corresponding term explicitly encoded in the seed ADL ontology (SO). An unlabelled trace (UT) is one associated with the label ‘unknown’.
Definition 11.3 (Unlabelled Activity Traces (UAT), Labelled Activity Traces (LAT)). UAT is the set of all unlabelled activity traces in the Activity Log. LAT is the set of all labelled activity traces.
Suppose that the set of sensors in a Smart Home (SH) is denoted by $S$, and each sensor is identified by a sensor ID, $s_x$; this can be represented as $S : \{s_1, s_2, s_3, \ldots, s_n\}$. Similarly,
given that a sensor activation is denoted by $sa_y$, a sequence of sensor activations over time can be denoted by $SA$ and represented as $SA : \langle sa_1, sa_2, sa_3, \ldots, sa_k \rangle$. Activity recognition involves reasoning with sensor activations against the seed ADL ontology. Typically, when the Activity Recognizer is given $SA$, it derives the relationship between $SA$ and $S$, denoted as a pair $\langle sa_y, s_x \rangle$ that maps each sensor activation $sa_y$ to a sensor $s_x$. It uses this information to infer ongoing activities, as described in [2], using Description Logics (DL)-based subsumption reasoning [28]. It generates activity traces over a given timeline, and these are recorded in the Activity Log.
The problem is to determine which labelled and unlabelled traces should cause the seed ADL ontology to be changed. In general, we can assume a total of $N$ activity traces, made up of $M$ labelled and $L$ unlabelled traces. As the $L$ unlabelled traces have no corresponding terms in the seed ADL ontology, it is necessary to set and implement criteria for discovering those traces that should lead to changes to the seed ADL ontology. We describe the learning and discovery process in Section 11.4.2. Similarly, from among the $M$ labelled traces we need a method that discovers the traces that should cause changes to the structure of the seed ADL ontology. We describe the learning and discovery method in Section 11.4.3.
11.4.2
Learning Algorithm for Unlabelled Traces
In this section, we present an algorithm that analyzes unlabelled activity traces in order to propose changes to the seed ADL ontology. We define various heuristic-based measures and describe how they are used, together with semantic similarity measures, in the analysis. To determine the unlabelled traces that lead to seed ADL ontology change, we proceed by collecting all traces that have matching sensor activations into a set $UAT_y \subseteq UAT$. Each $UAT_y$ contains traces with identical sensor activations.
$$UAT_y = \{\, UT_x \mid Sim_{act}(UT_i, UT_j) = 1 \,\}, \quad UAT_y \subseteq UAT, \; i \neq j, \; 1 \le i, j, x, y \le L. \tag{11.1}$$
We use the semantic similarity measure $Sim_{act}(UT_i, UT_j)$ to determine matching traces. $Sim_{act}(UT_i, UT_j)$ is defined based on the relation similarity measure $RS(UT_i, UT_j)$ derived from [29]. $Sim_{act}$ calculates the similarity between two instances of activity traces on the basis of the relations defined in these instances. In our method, it determines the similarity of activity traces based on the sensor activations involved. It is defined below:
$$Sim_{act}(UT_i, UT_j) = RS(UT_i, UT_j). \tag{11.2}$$
We then collect all these $UAT_y$'s into the set $UAT$, which is the union of all $UAT_y$ as shown below:
$$UAT = UAT_1 \cup UAT_2 \cup \cdots \cup UAT_y. \tag{11.3}$$
To determine how regular or frequent a given kind of trace is, we define the ratio of occurrence measure $RO_{trace}(UAT_y)$ for each $UAT_y$.
$$RO_{trace}(UAT_y) = \frac{\#\text{ of traces in } UAT_y}{L}. \tag{11.4}$$
In addition, we define a threshold value $T_{RO}$ as the average ratio of occurrence over all distinct sets of unlabelled traces, as shown below:
$$T_{RO}(UAT) = \frac{\sum_y RO_{trace}(UAT_y)}{m}, \quad m = \#\text{ of subsets in } UAT. \tag{11.5}$$
We set the condition that only those $UAT_y$ whose ratio of occurrence is greater than or equal to the threshold can result in an ontology change. Using this condition, we determine these $UAT_y$'s and place only the first activity trace $UT_{i1}$ of each into a set of candidate traces $CT$.
$$CT = \{\, UT_{i1} \mid UT_{i1} \in UAT_y, \; RO_{trace}(UAT_y) \ge T_{RO}(UAT) \,\}. \tag{11.6}$$
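The grouping and thresholding steps of Eqs. (11.1) to (11.6) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: a trace is represented as a list of sensor IDs, and $Sim_{act} = 1$ is approximated by plain equality of the activation sequences rather than by the relation similarity RS of [29]. The function name `candidate_traces` and the sensor IDs are hypothetical.

```python
from collections import defaultdict


def candidate_traces(unlabelled_traces):
    """Group unlabelled traces and select candidates, after Eqs. (11.1)-(11.6).

    Assumption: two traces match (Sim_act == 1) when their activation
    sequences are equal; the chapter uses relation similarity RS instead.
    """
    if not unlabelled_traces:
        return {}, {}, 0.0, []

    # Eq. (11.1): collect traces with identical activations into UAT_y.
    groups = defaultdict(list)
    for trace in unlabelled_traces:
        groups[tuple(trace)].append(trace)

    total = len(unlabelled_traces)  # L, the number of unlabelled traces
    # Eq. (11.4): ratio of occurrence of each group.
    ratios = {key: len(members) / total for key, members in groups.items()}
    # Eq. (11.5): threshold is the mean ratio over the m groups.
    threshold = sum(ratios.values()) / len(ratios)
    # Eq. (11.6): keep the first trace of every group at or above threshold.
    candidates = [members[0] for key, members in groups.items()
                  if ratios[key] >= threshold]
    return dict(groups), ratios, threshold, candidates


# Illustrative data: five groups of sizes 7, 2, 2, 5 and 3 (19 traces).
traces = ([["s1", "s2"]] * 7 + [["s1", "s3"]] * 2 + [["s2", "s3"]] * 2
          + [["s1", "s4"]] * 5 + [["s4", "s5"]] * 3)
groups, ratios, threshold, candidates = candidate_traces(traces)
print(round(threshold, 3), candidates)
```

Because the groups partition the unlabelled traces, the ratios sum to 1 and the threshold always equals $1/m$; in the sketch, only the groups of size 7 and 5 reach it.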
Let an instance of the seed ADL ontology be denoted by $i_x^{SO}$. For each activity trace $UT_k \in CT$ $(1 \le k \le |CT|)$, we use the sensor activations to create a temporary concept $c_{temp}$ and a corresponding temporary instance $i_{temp}$ in the ADL ontology ($i_{temp}$ is an instance of $c_{temp}$). We then compare the properties of $i_{temp}$ with the properties of $i_x^{SO}$. We define the semantic similarity measure $Sim_{property}(i_{temp}, i_x^{SO})$ based on the semantic matching measure $SM$ derived from [30]. $Sim_{property}$ uses a measure of lexical similarity to calculate the similarity between two lexical entries, e.g. the similarity between two strings. $SM$ uses string matching to compare lexical entries; in our case it compares the names of the properties of $i_{temp}$ and $i_x^{SO}$. $Sim_{property}$ returns 1 when there is a match, and a value greater than or equal to 0 but less than 1 otherwise. It is defined below:
$$Sim_{property}(i_{temp}, i_x^{SO}) = SM(i_{temp}, i_x^{SO}). \tag{11.7}$$
When the properties do not match, we recommend evolution based on concept related changes. Otherwise, we proceed to compare the property values for semantic similarity using $Sim_{property\text{-}value}(i_{temp}, i_x^{SO})$. We recommend instance related changes if the property values do not match, and do nothing otherwise.
$$Sim_{property\text{-}value}(i_{temp}, i_x^{SO}) = \frac{w_R \cdot RS(i_{temp}, i_x^{SO}) + w_A \cdot AS(i_{temp}, i_x^{SO})}{w_R + w_A}. \tag{11.8}$$
$RS$ and $AS$ are relation and attribute similarity respectively and are defined in [29]. $w_R$ and $w_A$ are weights associated with each similarity measure. Fig. 11.2 shows the algorithm.
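As a minimal sketch of Eq. (11.8) and of the decision rule stated in the text, the following Python fragment takes already computed similarity values as inputs. It does not implement RS, AS or SM themselves, which are defined in [29] and [30]. The function names, the default weights of 1.0 and the returned strings are assumptions made for illustration.

```python
import math


def property_value_similarity(rs, attr_sim, w_r=1.0, w_a=1.0):
    """Weighted mean of relation and attribute similarity, Eq. (11.8).

    rs and attr_sim stand for RS(i_temp, i_x^SO) and AS(i_temp, i_x^SO);
    the default weights are an assumption, not taken from the chapter.
    """
    if w_r + w_a <= 0:
        raise ValueError("the weights must have a positive sum")
    return (w_r * rs + w_a * attr_sim) / (w_r + w_a)


def recommend_change(sim_property, sim_property_value):
    """Decision rule from the text for one (i_temp, i_x^SO) comparison."""
    if not math.isclose(sim_property, 1.0):
        return "concept related change"    # property names do not match
    if not math.isclose(sim_property_value, 1.0):
        return "instance related change"   # names match, values differ
    return "no change"                     # names and values both match


# Property names match, but only half of the attribute values agree.
value_sim = property_value_similarity(rs=1.0, attr_sim=0.5)
print(value_sim, recommend_change(1.0, value_sim))
```

Comparing with `math.isclose` rather than `==` is a design choice of the sketch, so that a similarity of 1 obtained through floating-point arithmetic is still treated as a match.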
Def: UAT_y = {UT_x | Sim_act(UT_i, UT_j) is equal to 1}
UAT := UAT_1 ∪ UAT_2 ∪ ··· ∪ UAT_y
For each UAT_y ⊆ UAT Do
    RO_trace(UAT_y) := (# of traces in UAT_y) / L
End
T_RO(UAT) := (∑_y RO_trace(UAT_y)) / m
Def: CT := {UT_i1 | UT_i1 ∈ UAT_y, RO_trace(UAT_y) ≥ T_RO(UAT)}
I^SO ∈ SO // all instances of SO
For each UT_k ∈ CT Do
    Create c_temp and i_temp
    For each i_x^SO ∈ I^SO Do
        If Sim_property(i_temp, i_x^SO) == 1 Then
            If Sim_property-value(i_temp, i_x^SO) != 1 Then
                Action: Make instance related change
            End
        Else
            Action: Make concept related change
        End
    End
End
Fig. 11.2 Algorithm for unlabelled activity based change discovery
11.4.3
Learning Algorithm for Labelled Traces
In this section, we describe how to analyze labelled activity traces in order to propose changes related to individual inhabitants. We define various heuristic-based measures and show how these are used with semantic similarity measures to initiate ontology evolution. By analysing the $M$ labelled traces, we can extract general information, e.g. 1) the frequency of occurrence of each activity, 2) the start time, duration and end time of each activity, and 3) the activity concepts associated with an activity trace. Take the hypothetical activity ‘Make Tea’ as an example, and suppose that it occurs $n$ times ($n \le M$) in the traces with differing sensors and/or sequences of sensors used. From these $n$ traces, we can extract the following information: 1) the frequency of occurrence of each sequence of activations, 2) the start time, duration and end time of each activation, 3) the pattern(s) of activations, and 4) the predecessors of each activation. Using the results of the analysis, we can define measures that allow us to determine whether changes should be made to the seed ontology. We use a number of heuristics to guide this process, e.g., 1) the number of traces per activity; 2) the number of patterns of sensor
activations per activity and their variability; and 3) whether coarse-grained or fine-grained recognition was possible. Using these heuristics, we define three measures: ratio of occurrence, diversity and coarseness. These are used in an algorithm to help decide when to effect changes to the seed ADL ontology. The learning process is described as follows. Given all labelled traces $LAT$, a subset $LAT_z$ containing all traces with the label $z$ ($z$ corresponds to a term in the seed ADL ontology) is denoted below.
$$LAT_z = \{\, LT_i \mid \text{all traces with the label } z \,\}, \quad LAT_z \subseteq LAT, \; 1 \le i \le M. \tag{11.9}$$
We then collect all these $LAT_z$'s into the set $LAT$, which is the union of all $LAT_z$:
$$LAT = LAT_z \cup LAT_y \cup \cdots \cup LAT_m, \quad z, y \text{ and } m \text{ are labels}. \tag{11.10}$$
To determine whether an identified activity is coarse or fine, we define the measure coarseness. Coarseness determines how specific an activity associated with a given trace is and how definitive the recognition performance is. It is defined as a function of the sub-concepts of the given activity and increases with the number of sub-concepts involved. We then use non-zero values of coarseness to signify the need for a concept related change to the seed ADL ontology. This is aimed at refining the ontology structure to enhance fine-grained activity recognition. Coarseness is calculated by the formula below:
$$coarseness(z) = 1 - \frac{1}{numOfSubConcepts(z) + 1}. \tag{11.11}$$
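To make the behaviour of Eq. (11.11) concrete, the short Python sketch below evaluates coarseness for a few sub-concept counts. The function name and argument are hypothetical; in the framework the count would be obtained from the seed ADL ontology.

```python
def coarseness(num_sub_concepts):
    """Coarseness of an activity class, Eq. (11.11).

    num_sub_concepts is the number of direct and indirect sub-concepts
    of the class; how it is obtained from the ontology is not shown here.
    """
    if num_sub_concepts < 0:
        raise ValueError("the number of sub-concepts cannot be negative")
    return 1 - 1 / (num_sub_concepts + 1)


for n in (0, 1, 6):
    print(n, round(coarseness(n), 2))
```

A leaf class has coarseness 0, so it triggers no structural change, while the value approaches 1 as the number of sub-concepts grows.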
Subsequently, we determine how regularly a given activity $z$ occurs by defining the ratio of occurrence. The ratio of occurrence (RO) determines how regularly a given activity is identified in the traces. It is computed below:
$$RO_z(LAT_z) = \frac{\#\text{ of traces in } LAT_z}{M}. \tag{11.12}$$
In addition, we define a threshold value $T_{RO}$ as the average ratio of occurrence over all $LAT_z \subseteq LAT$. We use the formula:
$$T_{RO}(LAT) = \frac{\sum_z RO_z(LAT_z)}{n}, \quad n = \#\text{ of subsets in } LAT. \tag{11.13}$$
To determine how variable any $LAT_z$ is, we define the measure diversity. Diversity defines how varied an activity $z$ is, depending on the number of unique patterns of sensor activations that can be found in the set $LAT_z$. To determine the number of unique patterns, we compare each activity trace $LT_i \in LAT_z$ with the activity traces $LT_j \in LAT_z$ $(i \neq j)$ and count each unique trace using a variable uniquePatterns. For each comparison, we
use $Sim_{act}(LT_i, LT_j)$, defined based on the relation similarity measure $RS(LT_i, LT_j)$ as specified in [29]. $Sim_{act}$ is defined below:
$$Sim_{act}(LT_i, LT_j) = RS(LT_i, LT_j), \quad 1 < i \le |LAT_z|, \; i \neq j. \tag{11.14}$$
Using the value for the number of unique patterns, we can define diversity as shown below.
$$diversity(LAT_z) = \begin{cases} 1 & uniquePatterns > 1 \\ 0 & \text{otherwise} \end{cases} \tag{11.15}$$
When diversity is 1 and the ratio of occurrence is greater than or equal to the threshold value, we can recommend a concept related change to the seed ADL ontology. This would enable the ontology to enhance personalization by accommodating idiosyncratic ways in which individuals carry out activities. Fig. 11.3 shows the algorithm. Having described the activity learning and model evolution phase, we next describe the method used to implement the behaviour learning and evolution phase.
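The diversity test and the ratio-of-occurrence threshold for labelled traces (Eqs. (11.12) to (11.15)) can be combined as in the Python sketch below. It is a simplified illustration: traces are assumed to be (label, list of sensor IDs) pairs, $Sim_{act} = 1$ is approximated by equality of activation sequences, and the trace counts and sensor IDs are invented.

```python
from collections import defaultdict


def idiosyncratic_change_labels(labelled_traces):
    """Labels with diversity == 1 and RO_z >= T_RO, per Eqs. (11.12)-(11.15).

    Assumption: identical activation sequences stand in for Sim_act == 1.
    """
    by_label = defaultdict(list)
    for label, activations in labelled_traces:
        by_label[label].append(tuple(activations))
    if not by_label:
        return []

    total = len(labelled_traces)                              # M
    ratio = {z: len(t) / total for z, t in by_label.items()}  # Eq. (11.12)
    threshold = sum(ratio.values()) / len(ratio)              # Eq. (11.13)

    selected = []
    for z, traces in by_label.items():
        unique_patterns = len(set(traces))           # distinct patterns
        diversity = 1 if unique_patterns > 1 else 0  # Eq. (11.15)
        if diversity == 1 and ratio[z] >= threshold:
            selected.append(z)
    return selected


# Invented data: MakeTea is frequent and varied, MakeMeal varied but rare.
log = ([("MakeTea", ["cup", "tea", "milk"])] * 3
       + [("MakeTea", ["cup", "tea"])] * 2
       + [("MakeCoffee", ["cup", "coffee"])] * 3
       + [("MakeMeal", ["pan", "rice"]), ("MakeMeal", ["pan", "soup"])])
print(idiosyncratic_change_labels(log))
```

Counting distinct sequences with a set gives the same number as the nested loop of Fig. 11.3, provided that the similarity test behaves as an equivalence relation.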
11.5
Behaviour Learning and Evolution Method
In Section 11.4, we described the procedure for updating the ADL ontology and presented the necessary algorithms. By the end of the activity recognition phase, the raw sensor data has undergone a form of classification, and the ongoing activities have been identified, labelled and time stamped in the Activity Log. We propose to perform further analysis on this already classified data to identify and characterize the long term behaviours exhibited by an inhabitant. The discovered behaviours (or life habits) can be used to populate the Behaviour ontology. The Behaviour ontology is conceptualized as a Behaviour model, as indicated in Fig. 11.7. We derive a life habit by performing an analysis on the ADL data available in the Activity Log. A life habit can be composed of one or more ADLs. We focus on life habits that correspond to a single ADL activity. Essentially, for each ADL activity we generate a life habit. However, since an ADL activity can be performed in a number of different ways, each life habit is encoded using the primitive events or actions that constitute the corresponding ADL activity. For the purpose of this article, we call each of these different ways an ADL pattern. Therefore, a single life habit can have one or more patterns that describe it. In summary, life habits encode information about the different ways an individual performs an ADL activity and whether there are preferred ways of doing this. In this section, we propose to annotate each pattern with a value that can be used to provide
Def: LAT_z := {LT_i | all traces with label z}
LAT := LAT_z ∪ LAT_y ∪ ··· ∪ LAT_m
For each LAT_z ⊆ LAT Do
    RO_z(LAT_z) := (# of traces in LAT_z) / M
    coarseness(z) := 1 − 1 / (numOfSubConcepts(z) + 1)
    If coarseness(z) != 0 Then
        Action: Make concept related structure change
    End
End
T_RO(LAT) := (∑_z RO_z(LAT_z)) / n
For each LAT_z ⊆ LAT Do
    uniquePatterns = 0
    For i = 1 to i = |LAT_z| Do
        found = false
        For j = i + 1 to j = |LAT_z| Do
            If Sim_act(LT_i, LT_j) = 1 Then
                found = true
            End
        End
        If found = false Then
            uniquePatterns = uniquePatterns + 1
        End
    End
    If uniquePatterns > 1 Then
        diversity(LAT_z) := 1
    Else
        diversity(LAT_z) := 0
    End
    If diversity(LAT_z) = 1 and RO_z(LAT_z) ≥ T_RO(LAT) Then
        Action: Make concept related idiosyncratic change
    End
End
Fig. 11.3 Algorithm for Labelled activity based Change Discovery
a criterion for determining whether a particular pattern is preferred or not. An assistive system can use this information to help it provide assistance to an SH inhabitant. In this work, we analyse the sensor sequences (also called patterns) in each of the recognised activities within the Activity Log, as well as the statistical properties of each pattern’s occurrence. Based on this analysis, we identify regular (most probable) patterns. A regular pattern refers to the ADL pattern that occurs most often, overall, in the Activity Log for a given ADL activity. We focus on patterns that occur for a reasonably large number of instances.
Borrowing from the area of data mining [31], we can consider such patterns as useful or interesting patterns. Analogous to a similar measure in data mining, the measure of interestingness used here is called support. The value of support is the proportion of occurrences of a pattern in the Activity Log for a particular ADL activity, relative to the total number of occurrences of all patterns for that activity. Interesting patterns must meet a minimum occurrence criterion: a pattern is considered interesting if its support value is greater than or equal to a given minimum support (minsupport) threshold. Support is a simple statistical value that can be computed directly from the available data. We note that as more activity data becomes available, the value of support associated with any pattern can change. The Behaviour Learning and Evolution component will automatically discover this change and make the necessary adjustments to the Behaviour ontology. In the next section, we present the algorithm used.
11.5.1
Algorithm for Behaviour Learning
In this section, we present the approach for discovering life habits from the Activity Log. First, we provide a formula for computing support. We then use it to show how new life habits can be discovered and added, or how existing life habits can be modified. To compute support, we first obtain all activity traces from the Activity Log and group the traces according to their activity labels. Given that the set of all activity traces is $AT$, a subset $AT_z$ containing all the traces that have the label $z$ is generated as shown below:
$$AT_z = \{\, LT_i \mid \text{all traces with the label } z \,\}, \quad AT_z \subseteq AT, \; 1 \le i \le |AT|. \tag{11.16}$$
Using the set $AT_z$ above, we determine the unique ADL patterns that it contains in the same way as we did with labelled traces in Section 11.4. To determine the number of unique patterns, we compare the activity traces $LT_i, LT_j \in AT_z$, $i \neq j$. We count the distinct patterns among the traces using the variable uniquePatterns. For each comparison, we use $Sim_{pattern}(LT_i, LT_j)$, defined based on the relation similarity measure $RS(LT_i, LT_j)$ as specified in [29]. $RS$ makes it possible to compare the values of properties for an ontology instance. $Sim_{pattern}$ is defined below:
$$Sim_{pattern}(LT_i, LT_j) = RS(LT_i, LT_j), \quad 1 \le i, j \le |AT_z|, \; i \neq j. \tag{11.17}$$
Using this value for unique patterns, we can partition $AT_z$ into uniquePatterns subsets $AT_z^{pattern\text{-}x}$. A subset $AT_z^{pattern\text{-}x}$ contains all traces of label $z$ that have the pattern denoted by pattern-x, where $x$ is a whole number ranging from 1 to uniquePatterns. Each pattern subset is defined below:
$$AT_z^{pattern\text{-}x} = \{\, LT_j \mid \text{all traces with pattern-}x \,\}, \quad AT_z^{pattern\text{-}x} \subseteq AT_z, \; 1 \le x \le uniquePatterns, \; j = 1, 2, \ldots \tag{11.18}$$
In the next step, we compute the support for each pattern by first counting the number of instances of that pattern using $patternCount_{pattern\text{-}x}$ as defined below:
$$patternCount_{pattern\text{-}x} = |AT_z^{pattern\text{-}x}|. \tag{11.19}$$
Using the value of $patternCount_{pattern\text{-}x}$ as indicated above, the support for each pattern, $support_{pattern\text{-}x}$, can be computed using the formula:
$$support_{pattern\text{-}x} = \frac{patternCount_{pattern\text{-}x}}{\sum_{x}^{uniquePatterns} patternCount_{pattern\text{-}x}}. \tag{11.20}$$
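The computation of Eqs. (11.18) to (11.20) for a single activity can be sketched as follows. The sketch assumes that each trace is a list of sensor IDs and that two traces share a pattern when their sequences are equal, which stands in for $Sim_{pattern} = 1$. The function name, the sensor IDs and the example counts are hypothetical.

```python
from collections import Counter


def interesting_patterns(traces_for_activity, min_support=0.2):
    """Support of each pattern of one activity, filtered by min_support.

    Counter plays the role of patternCount in Eq. (11.19); dividing by
    the total number of traces of the activity gives Eq. (11.20).
    """
    counts = Counter(tuple(trace) for trace in traces_for_activity)
    total = sum(counts.values())
    if total == 0:
        return {}
    support = {pattern: n / total for pattern, n in counts.items()}
    return {p: s for p, s in support.items() if s >= min_support}


# Invented example: three ways of making tea, observed 6, 3 and 1 times.
tea_traces = ([["kettle", "cup", "tea"]] * 6
              + [["kettle", "cup", "tea", "milk"]] * 3
              + [["cup", "tea"]])
print(interesting_patterns(tea_traces))
```

Since support depends only on the counts, it can be recomputed whenever new traces are added to the Activity Log, which is how changes in a pattern's support would be picked up over time.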
As a result of this computation, a number of interesting patterns can be identified for each activity. All patterns that meet the minimum support threshold are added as action patterns of a life habit in the Behaviour ontology. In this work, we have chosen a threshold value of 20%. Figure 11.4 shows the algorithm.
11.6
Illustration
We have prototyped the proposed framework. This includes the development of the underlying ontological models and the facilities to record and retrieve activity traces. We describe the prototype and demonstrate its workings using typical use scenarios in the subsections below.
11.6.1
Ontological modelling and representation
In an SH environment, inhabitants may carry out a variety of ADL activities. Typically, these are performed using certain objects and in predefined locations. For example, to make tea, the inhabitant may use a kettle, cup, tea bags, hot water, sugar and milk while located in the kitchen. This may occur in the morning, in the late afternoon and probably before going to bed. In a nutshell, each activity is carried out in some kind of context, where the context includes the inhabitant, related activities, time, location and objects used. In addition, we can conceptualize ADLs at different levels of granularity: fine- and coarse-grained. Those ADLs with sub-activities or more specialized child activities are called
Def: // partition the activity traces based on activity labels
AT_z = {LT_i | all traces with the label z}
minsupport = 20%
For each AT_z ⊆ AT Do
    uniquePatterns = 0
    // determine the number of unique patterns
    For i = 1 to i = |AT_z| Do
        found = false
        For j = i + 1 to j = |AT_z| Do
            If Sim_pattern(LT_i, LT_j) = 1 Then
                found = true
            End
        End
        If found = false Then
            uniquePatterns = uniquePatterns + 1
        End
    End
    // generate the sets of unique patterns
    For x = 1 to uniquePatterns Do
        AT_z^pattern-x = {LT_i | all traces with pattern-x}
        patternCount_pattern-x = |AT_z^pattern-x|
    End
    For each AT_z^pattern-x Do
        support_pattern-x = patternCount_pattern-x / (∑_x^uniquePatterns patternCount_pattern-x)
        If support_pattern-x ≥ minsupport Then
            Add pattern-x to life habit
        End
    End
End
Fig. 11.4 Algorithm for Behaviour learning and evolution
coarse-grained ADLs. Specialized ADLs with no child activities are fine-grained, and these identify very specific ADLs. The above constitutes important common sense domain knowledge and heuristics to be modelled by creating ontology-based knowledge models.
We used the Web Ontology Language (OWL) [23], specifically OWL-DL, for ontological modelling and representation. As OWL-DL is based on the logical formalism Description Logics, ontological activity modelling can exploit DL-based reasoning for activity recognition. The core elements of the DL formalism are concepts, roles and individuals. These elements can be mapped to classes, properties and instances, respectively, in formal ontologies. We used Protege [32] to create the ADL and Trace ontologies. We created the seed ADL ontology to represent the activity model used for activity recognition. The ADL ontology contains classes that each explicitly model an aspect of the Smart Home and, together, provide a semantic model for Smart Homes. Figure 11.5 provides a fragment of the ADL ontology for the Kitchen related ADLs. It shows direct and indirect subclasses of KitchenADL.
[Figure: is-a tree rooted at KitchenADL. Classes shown: MakeMeal, MakeHotMeal, MakeColdMeal, MakeBoiledRice, MakeToast, MakeSoup, MakeSandwich, MakeDrink, MakeHotDrink, MakeColdDrink, MakeChocolate, MakeCoffee, MakeTea, MakeWater, MakeJuice.]
Fig. 11.5 The tree hierarchy of the Kitchen ADL classes
We created the Trace Ontology that is used to instantiate activity traces in the Activity Log, which we analyze in order to evolve the ADL ontology. The Trace Ontology identifies the set of sensor activations (and therefore the sensors) involved, temporal information about the trace and the assigned label. In addition, it provides a list of likely activities given a particular series of sensor activations. Figure 11.6 provides a visualization of the Trace Ontology. We created a behaviour model, shown in Fig. 11.7, and formalized it as the Behaviour ontology. This captures the concepts and features of a life habit, such as the sets of actions involved in a life habit and the associated ADL activity.
[Figure: graph of the Trace Ontology. Node labels: ActivityTrace, SensorActivation, Sensor, StateValue, ADLActivity, TemporalEntity, TemporalUnit, Instant. Edge labels: hasActivation, hasSensor, hasSensorState, hasStartTime, hasDate, likelyActivity, label, id, type, predecessorID, after, is-a.]
Fig. 11.6 The graphical representation of the Trace Ontology
11.6.2
Inferring and Logging ADL Activities
The implemented system captures sensor activations and reasons with them to identify ongoing activities. When a sensor is activated, it is associated with properties in the ADL ontology, and using these properties the Activity Recognizer attempts to identify the ongoing activity. It outputs a list of likely activities being performed or, if possible, the precise activity being performed. As an illustration, consider the following sequence of sensors being activated: KitchenDoor, ChinaCup, ChineseTea, KitchenHotWater, WholeMilk and SandSugar. The Activity Recognizer links these with properties defined in the ontology and, by recognizing the classes involved and their relationships, identifies the ongoing activity. Once the session is completed and the activity recognized, the information about the activated sensors and the activity is logged as instances of the Trace Ontology.
11.6.3
Use scenario for ADL Learning and Evolution
Consider a scenario in which the system has monitored and collected fifty-three (53) activity traces. Among these, nineteen (19) are unlabelled traces and thirty-four (34) are labelled
[Figure: Behaviour model. Node labels: UserProfile, PersonalInformation, Inhabitant, LifeHabit, ActionPattern, ADLActivity, Action, TimeInterval. Relation and attribute labels: describes, belongsTo, contains, hasPattern, involves, support, hasTime, hasAction, occursBefore, occursWith, occursAfter.]
Fig. 11.7 Behaviour model
Table 11.1 Measures for Unlabelled Traces
Subsets of UAT    # of traces    RO_trace
UAT1              7              0.368
UAT2              2              0.105
UAT3              2              0.105
UAT4              5              0.263
UAT5              3              0.158
T_RO(UAT) = 0.2
traces. We can compare the 19 unlabelled traces using the semantic similarity measure $Sim_{act}$. From this comparison, it is determined that there are five (5) distinct patterns of activations: 7 traces for the first pattern, 2 traces each for the second and third patterns, 5 for the fourth pattern and 3 for the fifth pattern. We can compute the measures used by the algorithm described in Section 11.4.2 as shown in Table 11.1. Because the ratios of occurrence for UAT1 and UAT4 are greater than the threshold $T_{RO}$, we can recommend changes to the seed ADL ontology based on their traces. Due to space limitations, we do not show how to determine the specific kind of change to recommend.
Similarly, we can analyze the thirty-four (34) labelled traces. We start by checking the label assigned to each trace and group traces that have the same label into individual subsets. Assume that we determine that the traces are associated with the labels Make Coffee, Make Tea and Make Meal. Further analysis discovers that ten (10) traces are labelled Make Coffee, thirteen (13) are Make Tea traces, and eleven (11) are Make Meal traces. We can compare the traces in each subset, using $Sim_{act}$, to determine whether they have different sets of sensor activations or not. From this comparison, the Make Coffee and Make Meal traces are found to have only one (1) unique pattern each, while Make Tea has three (3) distinct patterns. Therefore, we can compute the measures needed by the algorithm described in Section 11.4.3 as shown in Table 11.2 and Table 11.3. From Table 11.2 we are able to recommend a refinement to the Make Meal class in the seed ADL ontology due to its value of coarseness. In addition, from Table 11.3 we can recommend changes due to Make Tea. This is because it has non-zero diversity and its ratio of occurrence is greater than the threshold. Due to space limitations, we do not show how to determine the specific kind of change to recommend.
Table 11.2 Coarseness for Activity Classes
Labels        # of Sub concepts    Coarseness
MakeTea       0                    0
MakeCoffee    0                    0
MakeMeal      6                    0.86
Table 11.3 Diversity and Ratio of Occurrence for Labeled Traces
Subsets of LAT    # of Traces    Unique Patterns    Diversity    RO_z
LAT_MakeTea       13             3                  1            0.382
LAT_MakeCoffee    9              1                  0            0.265
LAT_MakeMeal      11             1                  0            0.324
T_RO(LAT) = 0.323
11.6.4
Use scenario for Behaviour Learning and Evolution
In the previous section, we described a scenario in which fifty-three (53) traces had been recorded. Using the same 53 traces, after applying the algorithm in Section 11.4.3, we expect that all traces will have been labelled. By applying the algorithm in Section 11.5.1, an initial analysis produces results like those presented in Table 11.4. Each ADL activity is presented together with the variety of patterns that constitute it. For ease of analysis, let us assume that out of the 53 traces, 40 traces correspond to only two ADL activities, i.e. MakeTea and MakeCoffee.
Table 11.4 Action patterns data
ADL Activity    Action Pattern      Start Time    End Time
MakeTea         Tea-Pattern-1       ...           ...
                Tea-Pattern-2
                ...
                Tea-Pattern-3
                Tea-Pattern-4
MakeCoffee      Coffee-Pattern-1
                ...
                Coffee-Pattern-2
                Coffee-Pattern-3
...             ...
Table 11.5 Support data for action patterns
ADL Activity    Action Pattern      Pattern Count    Support
MakeTea         Tea-Pattern-1       2                0.08
                Tea-Pattern-2       15               0.6
                Tea-Pattern-3       5                0.2
                Tea-Pattern-4       3                0.12
MakeCoffee      Coffee-Pattern-1    8                0.53
                Coffee-Pattern-2    2                0.13
                Coffee-Pattern-3    5                0.33
Further, assume that MakeTea has 25 traces and MakeCoffee has 15 traces. From further analysis of the data in Table 11.4, our algorithm determines the unique patterns. Assume that it is determined that MakeTea has 4 distinct patterns (Tea-Pattern-1, Tea-Pattern-2, Tea-Pattern-3 and Tea-Pattern-4) while MakeCoffee has 3 unique patterns (Coffee-Pattern-1, Coffee-Pattern-2, and Coffee-Pattern-3). The proposed algorithm can proceed to summarize the data and compute the number of occurrences for each pattern and subsequently the value of the measure, support, for each pattern. This is provided in Table 11.5. From Table 11.5, the proposed algorithm would create or modify two life habits: the MakeTea habit and the MakeCoffee habit. The MakeTea habit would have two action patterns (Tea-Pattern-2 and Tea-Pattern-3). These patterns would be considered interesting because their support values (0.6 and 0.2) are greater than or equal to 20%, whereas Tea-Pattern-1 (0.08) and Tea-Pattern-4 (0.12) fall below the threshold. Analogously, the MakeCoffee habit will have two patterns (Coffee-Pattern-1 and Coffee-Pattern-3) as interesting patterns. Changes can be made to the Behaviour ontology to capture information about these interesting patterns. We expect that an assistive system can use this pattern and interestingness information to provide personalized assistance to inhabitants. Similarly, those patterns considered not
interesting can be further examined to determine whether they depict conditions about the inhabitant that signify erratic behaviour.
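The support computation and the interestingness filter can be sketched in a few lines of Python. This is only an illustrative sketch: the pattern counts are those of Table 11.5 and the 20% threshold is the one stated above, while the function names and the dictionary layout are assumptions made here and are not part of the proposed framework.

```python
# Pattern counts per ADL activity, taken from Table 11.5.
PATTERN_COUNTS = {
    "MakeTea": {"Tea-Pattern-1": 2, "Tea-Pattern-2": 15,
                "Tea-Pattern-3": 5, "Tea-Pattern-4": 3},
    "MakeCoffee": {"Coffee-Pattern-1": 8, "Coffee-Pattern-2": 2,
                   "Coffee-Pattern-3": 5},
}

MIN_SUPPORT = 0.20  # interestingness threshold stated in the text


def support(counts):
    """Support of each pattern: its count divided by the activity's trace total."""
    total = sum(counts.values())
    return {pattern: count / total for pattern, count in counts.items()}


def interesting(counts, min_support=MIN_SUPPORT):
    """Patterns whose support reaches the threshold (hypothetical helper)."""
    return [p for p, s in support(counts).items() if s >= min_support]


if __name__ == "__main__":
    for activity, counts in PATTERN_COUNTS.items():
        rounded = {p: round(s, 2) for p, s in support(counts).items()}
        print(activity, rounded, "interesting:", interesting(counts))
```

The patterns that fail the threshold are simply the complement of the returned list, which is the set the text proposes to examine further for signs of erratic behaviour.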
11.7 Conclusion
Activity and behaviour models are important to support reusable, adaptive and personalized activity recognition, and further the scalability and applicability of any assistive systems. This article introduced an ontology-enabled framework for activity and behaviour learning and model evolution in Smart Homes. We have proposed a system architecture and described its working mechanisms. We have developed algorithms for activity learning and activity model evolution through the analysis of ontology-based activity traces. Similarly, we presented an algorithm for learning and evolving life habits. We presented an initial implementation of an assistive system based on the proposed framework. We have outlined use scenarios to illustrate the operation of the framework. While full evaluation awaits further implementation and experiments, work so far has shown that this ontology-based approach is promising in the provision of quick-started, pragmatic and applicable real-world assistive systems.

For future work we will explore and propose methods for the discovery and evolution of behaviours that are made of patterns of ADL activities. We shall also investigate how to exploit temporal characteristics in ontological activity and behaviour modelling, learning and evolution. In addition, we plan to fully implement the framework and conduct experiments to evaluate it.
Chapter 12
Benefits of Dynamically Reconfigurable Activity Recognition in Distributed Sensing Environments

Clemens Lombriser (1), Oliver Amft (1,2), Piero Zappi (3,4), Luca Benini (3), and Gerhard Tröster (1)

(1) Wearable Computing Laboratory, ETH Zürich, Switzerland
(2) ACTLab, Signal Processing Systems, TU Eindhoven, The Netherlands
(3) Micrel Lab, Università di Bologna, Italy
(4) Computer Science and Engineering, University of California San Diego, USA

{lombriser,amft,troester}@ife.ee.ethz.ch, {piero.zappi,luca.benini}@unibo.it

Abstract

The automatic detection of complex human activities in daily life using distributed ambient and on-body sensors is still an open research challenge. A key issue is to construct scalable systems that can capture the large diversity and variety of human activities. Dynamic system reconfiguration is a possible solution to adaptively focus on the current scene and thus reduce recognition complexity. In this work, we evaluate potential energy savings and performance gains of dynamic reconfiguration in a case study using 28 sensors recording 78 activities performed within four settings. Our results show that reconfiguration improves recognition performance by up to 11.48 %, while reducing energy consumption when turning off unneeded sensors by 74.8 %. The granularity of reconfiguration trades off recognition performance for energy savings.
12.1 Introduction
Activity recognition is widely considered a basic service in smart environments and on-body assistants, where novel ways of interaction and assistive tools are sought. Examples include personal healthcare [1], safety, and user comfort [2]. Advances in sensing technology and embedded systems support this trend, and a substantial body of research results indicates the applicability of various activity recognition approaches. Nevertheless, obtaining satisfactory system performance is often a major challenge [3]. This can be attributed to the complexity and variability of human behavior and, further, to persisting limitations in sensing and recognizing large activity catalogs. Research efforts have often considered
activity recognition with a focus on individual settings, which allowed recognition to be constrained with regard to the sensors used and the activity catalog needed. While this concept is valid for some applications, it does not scale to broad activity catalogs and the full capabilities needed for today’s smart environment visions. It can be expected that reconfiguration of activity recognition systems will allow the benefits of situation-specific performance to be combined with scalability requirements when several settings are involved.

Knowledge extraction in smart environments essentially benefits from distributed sensing and information fusion, such as when using a mix of body-worn, object-integrated, and ambient-installed sensors with embedded processing facilities. Such sensors are generally battery-driven; techniques to reduce energy consumption are thus vital to ensure acceptable system lifetime. Energy can, for example, be saved by processing data locally and communicating only when needed, as opposed to sending continuous data streams [4]. Additionally, in many smart environments that monitor a multitude of user activities, not every sensor node is required at all times. Consequently, effective energy saving should incorporate knowledge of when a node is actually needed and when it is safe to turn it off.

This work investigates the benefits of dynamic reconfiguration in adapting activity recognition systems to a specific setting. We deployed a distributed activity sensing and processing architecture that performs local activity event spotting at each sensor node and communicates recognized events only. We integrated this system with a state-based reconfiguration scheme to evaluate adaptation benefits regarding the actually relevant activity catalog and required sensor nodes. In particular, this paper addresses the following questions:

(1) How much energy is saved by reconfiguring a distributed recognition system?
Here, we evaluate potential benefits of reconfiguration on energy savings by letting sensors sleep when they are not needed for monitoring activities in the current situation. Since reconfiguring sensor nodes increases communication overhead, we include the communication cost for node adaptations in our analysis.

(2) How does reconfiguration impact the recognition performance? We evaluate benefits of reconfiguration on recognition performance. Reconfiguration of sensor nodes was used here to load specific recognition models, adapted to individual situations and their relevant activities. Starting from a baseline that included all activities, the activity catalog was reduced to contain situation-specific activities only.

(3) Which reconfiguration granularities generate which benefits? We propose three different reconfiguration granularities relating to the current setting, activity composite, and object used. Subsequently, we compare their benefits regarding energy consumption and recognition performance to the baseline without reconfiguration.

The aim of this work was to quantify potential energy savings and recognition performance increases for the most resource-constrained layer in an activity recognition stack: the sensor nodes that recognize basic activities. This work does not target algorithm optimizations for robust recognition. Instead, we focus our analysis on a particular configuration of sensor nodes and algorithm set used to investigate fundamental reconfiguration benefits.

To evaluate our reconfiguration approach, we analyzed a large dataset comprising 78 atomic activities in four settings. Sensor data from 28 nodes (accelerometers and light sensors) deployed on-body, on tools, and in the infrastructure were used to train a distributed, continuous activity recognition system and analyze reconfiguration effects. Three reconfiguration granularities were explored and compared to the baseline (no reconfiguration): (1) setting-specific (adapting to current situation), (2) composite-specific (adapting to current activity composite), (3) object-specific (adapting to currently used object).

Section 12.2 discusses previous approaches related to reconfigurable activity recognition. Section 12.3 introduces our recognition architecture and the terminology used in this work to denote a distributed recognition of human activities. Our architecture is extended in Sec. 12.4 to enable dynamic reconfiguration of the system at different granularities. The implementation of our approach is summarized in Sec. 12.5. Section 12.6 describes our evaluation dataset. Section 12.7 presents the reconfiguration results obtained for all three granularities. Section 12.8 finally discusses our main findings and indicates opportunities for further research.
12.2 Related Work
Various hierarchical abstraction techniques have been considered to capture complex human activities. Nevertheless, those approaches differ in the granularity of abstractions and in their recognition goals. For example, Ryoo and Aggarwal [5] used three layers and context-free grammars to describe image sequences. Kawanaka et al. [6] used a hierarchical architecture of interacting hidden Markov models to represent sequences of activities. Reconfiguration of sensor networks was not specifically addressed in those works. While we utilize a hierarchical abstraction concept in the current work as well, higher-layer probabilistic modeling is not addressed, in favor of investigating fundamental sensor-based energy and recognition benefits of reconfiguration.
Another concept used to address the complexity problem in activity recognition is to incorporate location information as a filter on recognition results. For example, Naya et al. [7] used an infrared location estimation system to mask location-dependent activities. In total, 13 activities of a typical nursing workflow were classified from body-worn accelerometers with recalls ranging from 14.8 % to 97.4 %. Ogris et al. [8] recognized 20 activities for quality inspection in car production. After a first spotting step with a high recall of 79.0 %, a masking step was applied in which location and force sensors at the lower arm were used to refine recognition results. This approach changed the performance to 47.8 % precision at 70.6 % recall in the considered dataset.

Besides spatial information, temporal relations between subsequent activities can also be used to rule out impossible sequences of activities. For example, Murao et al. [9] could improve the recognition of nine leg-based activities by prohibiting impossible transitions between activities, such as a direct change from bicycling to running. Performance changed by 3.99 % to 91.74 %. A similar approach is known in speech recognition as triphones or context-dependent phone modeling [10]. Phone recognition used the immediate left and right neighboring phones as context to estimate the current one, reducing recognition error rate by 60 %. These approaches used additional knowledge to improve recognition performance. Activity models in sensors were typically not completely exchanged in those approaches. Exchanging them, however, could minimise both the processing required and the activity catalog to be recognised.

Energy-efficient activity recognition has been considered in many investigations as a system design parameter. For example, Stäger et al. [11] showed the tradeoff between recognition performance and power consumption at node level by adapting sampling rate and duration, and choosing appropriate features.
At the network level, redundancy between sensor nodes was exploited to turn off unneeded sensor nodes: Ghasemzadeh et al. [12] provided bounds for selecting the smallest number of sensor nodes while maintaining service quality. The approach is based on modeling the activity class discrimination capabilities of sensor nodes. Zappi et al. [13] demonstrated that clustering sets of active nodes in a gesture recognition setting could extend the network lifetime more than 7 times while keeping recognition rates above 80 %, and more than 4 times while keeping them above 90 %. That work did not address energy-saving benefits for different situations while accounting for reconfiguration-related communication effort.

Several frameworks have been developed that allow dynamic reconfiguration and could enable adaptations in context recognition algorithms for resource-limited sensor networks.
RUNES [14] defines a framework that uses high-level application descriptions and deploys them on top of a run-time kernel in networked sensor nodes. Titan [15] describes context recognition applications as service graphs and distributes them among sensor nodes of wireless sensor networks. SPINE [16] provides a set of libraries for rapid prototyping of health care applications in body sensor networks. Osmani et al. [17] introduced a concept, called “Context Zones”, where sensor nodes join a zone if they can contribute with events to the inference engine executed by this zone. These frameworks have been evaluated with respect to their networking performance, but their impact on activity recognition performance has not yet been quantified.

12.3 Distributed activity recognition
Activities are often captured in a hierarchical recognition stack. At its lowest layer, a stack would process raw sensor data to identify atomic activities, which are considered basic, indivisible activity units in the particular recognition stack. Examples include “picking up a book” and “using a screwdriver”. Higher layers are frequently used to agglomerate atomic activities into more complex activity composites, representing workflow expressions. As an example, the atomic activities of picking up a glass of water, tilting it, and putting it down may make up the composite “drinking from a glass of water”. Recognizing composite activities is conceptually different from recognizing atomic activities, since for composites discrete events are processed instead of streaming sensor data [18]. In this work, we focus on evaluating reconfiguration benefits at the critical atomic activity layer, which directly affects processing at distributed sensor nodes.

In our previous work [19], we investigated the performance of a distributed activity recognition architecture. In that work, wireless sensor nodes performed a local spotting and identification of activity events from sensor readings and communicated detected events. This concept was shown to considerably reduce the amount of data that needed to be communicated through a network, and thus sensor node energy consumption. In the present work, we consider this distributed architecture as the baseline for our reconfiguration analysis.

The recognition architecture is shown in Fig. 12.1. Each sensor node runs a detector, which recognizes patterns in locally acquired sensor data streams according to training examples of atomic activity events (activity spotting). Detected events are subsequently classified according to the type of atomic activity. Results are communicated within the network to allow further distributed or centralized processing. In this work, we deployed a network fusion scheme to centrally filter erroneously reported events.
A recognition architecture to deal with distributed activity event recognition is presented below. The implementation of event spotting, classification, and network fusion is detailed in Sec. 12.5.

Fig. 12.1 Architecture of our distributed activity recognition system. Each sensor node performs local data acquisition, activity spotting, and classification. Only local detector events are communicated.

12.3.1 Distributed activity recognition architecture
In this section we introduce a model to represent distributed activity events at each sensor node (detector). This model will be subsequently extended to include dynamic reconfiguration in Sec. 12.4.

Atomic activities $a_i \in A$ represent the global detection catalog in our architecture. However, distributed sensor nodes may observe atomic activities differently. For a particular sensor node, several atomic activities could be represented by identical event patterns, e.g. a wrist detector could potentially not distinguish between the atomic activities for picking up a tool and returning it. Nevertheless, for a sensor node attached to the tool itself, these atomic activities may exhibit entirely different patterns. Thus, a distributed architecture can discriminate the two atomic activities, while individual sensor nodes may not.

To represent these properties, we map atomic activities into detector events. Each contributing sensor node delivers detector events $E_{i,j}$ representing local observations of a performed atomic activity. Each detector event represents a set of atomic activities that are locally observed as being identical. The set of all disjoint detector events forms the detector set $D_i$ of a sensor node $i$:

$$E_{i,j} = \{a_{i,1}, \ldots, a_{i,n}\} \subseteq A \qquad (12.1)$$

$$D_i = \{E_{i,1}, \ldots, E_{i,n} \mid \forall E_{i,j}, E_{i,k} \in D_i,\ j \neq k : E_{i,j} \cap E_{i,k} = \emptyset\} \qquad (12.2)$$
The relationship between atomic activities and detector events is illustrated in Fig. 12.2. In this example, the atomic activities “picking up a tool” ($a_2$), “using it to manipulate an object” ($a_7$), and “placing the tool down” ($a_3$) are visualized. Three sensor nodes with the detector sets $D_1$ to $D_3$ are used, each operating on its locally acquired signals using an individual mapping of atomic activities to detector events. In this example, the sensor node implementing $D_1$ (wrist) does not distinguish between the atomic activities “picking up” ($a_2$) and “placing down” ($a_3$). Thus, both atomic activities are mapped to the same detector event $E_{1,2}$, which could be understood as “reaching down”. In contrast, the sensor node implementing $D_2$ (tool) observes different patterns for $a_2$ and $a_3$. It issues the detector events $E_{2,1}$ and $E_{2,2}$.

Fig. 12.2 Exemplary illustration of the relationship between atomic activities $a_i$, detector events $E_{i,j}$, and detector sets $D_i$ in an object manipulation task. A tool is being picked up ($a_2$), used on an object ($a_7$), and placed down again ($a_3$). Left: acceleration signals of the involved sensor nodes (wrist $D_1$, tool $D_2$, object $D_3$) in which detector events are recognized. Right: mapping of atomic activities to detector events.

The mapping function between atomic activities and detector events is determined during training of the distributed activity recognition architecture. While it would be conceivable to use an automatic procedure for this purpose, we performed this step using expert knowledge of the location and function of each sensor node. This was done in an attempt to minimize potential error sources for our reconfiguration analysis.

Based on atomic activities we can describe an activity composite $C_n$ as a sequence of $m$ atomic activities that are not strictly ordered:

$$C_n = \{a_{n,1}, \ldots, a_{n,m}\} \in A^m$$

For the example illustrated in Fig. 12.2, a composite activity “manipulating task” could be defined as $C_n = \{a_2, a_7, a_3\}$. The recognition of such composites was discussed in our previous work [19]. In this work, we use the composite description to denote reconfiguration granularities, as detailed in Sec. 12.4 below.
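The activity-to-event mapping can be made concrete with a small Python sketch. It is illustrative only: the wrist and tool mappings for a2 and a3 follow the example above, whereas the events assumed to cover a7, the node names, and the intersection-based `candidates` helper are assumptions introduced here and are not the fusion scheme used in this work.

```python
from itertools import combinations

# Detector sets D_i: detector event name -> atomic activities it stands for.
# Only E1,2, E2,1 and E2,2 are given in the text; the rest is hypothetical.
DETECTOR_SETS = {
    "wrist": {"E1,1": {"a7"}, "E1,2": {"a2", "a3"}},
    "tool": {"E2,1": {"a2"}, "E2,2": {"a3"}, "E2,3": {"a7"}},
    "object": {"E3,1": {"a7"}},
}


def is_disjoint(detector_set):
    """Check Eq. (12.2): events of one node must not share atomic activities."""
    return all(a.isdisjoint(b)
               for a, b in combinations(detector_set.values(), 2))


def observe(activity):
    """Detector event each node reports for an activity (None if unobserved)."""
    return {
        node: next((name for name, acts in events.items() if activity in acts),
                   None)
        for node, events in DETECTOR_SETS.items()
    }


def candidates(reported):
    """Atomic activities consistent with all reported detector events."""
    sets = [DETECTOR_SETS[node][name]
            for node, name in reported.items() if name]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    assert all(is_disjoint(d) for d in DETECTOR_SETS.values())
    for activity in ("a2", "a3"):
        events = observe(activity)
        print(activity, events, "->", sorted(candidates(events)))
```

In this sketch the wrist node alone leaves a2 and a3 ambiguous, while combining its event with the one reported by the tool node narrows the candidates to a single atomic activity.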
12.4 Dynamic reconfiguration of activity models
Our reconfiguration approach uses additional domain knowledge to dynamically adapt a distributed recognition system. In particular, we constrained the set of atomic activities and let unused sensor nodes sleep depending on the system state. Trigger events are used to switch between configurations.

12.4.1 Reconfiguration concept
To track system states for reconfiguration we used a state machine to model a set of recognition states $S$ as shown in Fig. 12.3. Each state $s_i \in S$ of this state machine describes its own activity catalog $A_s(s_i) \subseteq A$ that is relevant when the state is active. Transitions $\delta : S \times A \to S$ between states are activated by trigger activities $A_t(s_i) \subset A_s(s_i)$. In the example shown in Fig. 12.3, activity $a_3$ will trigger a transition from state $s_2$ to state $s_3$. When state $s_3$ is entered, the set of relevant atomic activities is changed to $A_s(s_3)$ and all sensor nodes are reconfigured for this state. Detector events $E_{i,j}$ that do not contain atomic activities of $A_s(s_3)$ are removed from detector set $D_i$. Subsequently, sensor nodes with empty detector sets will enter a low-power state to save energy until they are needed again.
Fig. 12.3 Illustration of the reconfiguration approach. In each system state a subset of relevant atomic activities $A_s(s_i)$ from the total set $A$ is defined. Here, state $s_2$ includes $A_s(s_2) = \{a_2, a_3, a_4, a_6\}$. Thus, events $E_{1,1}$ and $E_{3,1}$ will never occur. Detector $D_3$ does not have events to report and can be turned off. Trigger activity $a_3$ will cause a state machine transition to state $s_3$ and subsequent reconfigurations.
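The reconfiguration step can be sketched as a small state machine in Python. The sketch is a simplified illustration under stated assumptions: the catalog of s2 and the transition on a3 are taken from the example of Fig. 12.3, but the contents of the individual detector events and the catalog of s3 are hypothetical, chosen so that exactly E1,1 and E3,1 drop out in state s2. The function and variable names are likewise introduced only for this example.

```python
ALL_ACTIVITIES = {f"a{i}" for i in range(1, 9)}  # global catalog A

# Hypothetical detector sets, chosen to match the caption of Fig. 12.3.
DETECTOR_SETS = {
    "D1": {"E1,1": {"a1"}, "E1,2": {"a2"}},
    "D2": {"E2,1": {"a3"}, "E2,2": {"a4", "a5"}, "E2,3": {"a6"}},
    "D3": {"E3,1": {"a7", "a8"}},
}

# Activity catalog A_s(s_i) of each state.
CATALOG = {
    "s1": ALL_ACTIVITIES,
    "s2": {"a2", "a3", "a4", "a6"},   # from Fig. 12.3
    "s3": {"a4", "a5", "a6"},         # hypothetical
}

# Transition function delta, listed only for trigger activities.
TRANSITIONS = {("s2", "a3"): "s3"}


def next_state(state, activity):
    """Apply delta: stay in the current state unless a trigger activity occurs."""
    return TRANSITIONS.get((state, activity), state)


def configure(state):
    """Return the remaining events per node and the nodes that may sleep."""
    catalog = CATALOG[state]
    active = {
        node: {name for name, acts in events.items() if acts & catalog}
        for node, events in DETECTOR_SETS.items()
    }
    sleeping = sorted(node for node, events in active.items() if not events)
    return active, sleeping


if __name__ == "__main__":
    state = "s2"
    print(state, configure(state))
    state = next_state(state, "a3")  # trigger activity a3 is recognized
    print(state, configure(state))
```

An event is kept as long as it overlaps the active catalog, so an event such as the hypothetical E2,2 remains in use in s2 even though one of its activities is outside the catalog.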
Reconfiguration states can be organized in a hierarchy, where each layer allows further constraints to the relevant activity set, resulting in a finer reconfiguration granularity. In Fig. 12.3, reconfiguration state $s_1$ comprises all atomic activities of a catalog, $A_s(s_1) = A$. State $s_2$ however reduces the activity catalog to $A_s(s_2) = \{a_2, a_3, a_4, a_6\}$. This modeling could, e.g. be used for location-dependent activity sets: when a user enters a kitchen, it becomes unlikely that he will pick a book from the living room shelf. In Fig. 12.3, state $s_3$ provides options to further restrict an activity catalog by transition to $s_4$ or $s_5$. This may be used for object-specific reconfigurations, e.g. when detecting that a user picked up a pan and thus cannot use other kitchen objects at the same time with this hand. In our evaluation, we investigated effects of hierarchy depth for such state machines, hence the effect of different reconfiguration granularities.
12.4.2 Reconfiguration granularities
To evaluate the benefits of this approach in detail, three different reconfiguration granularities were considered in addition to a baseline without reconfiguration. During reconfiguration, only parameters such as activity recognition models and sensor node power mode were changed; to enable direct comparison, no modifications to the architecture or algorithm types were made. The three reconfiguration granularities, in addition to the baseline, describe the hierarchy levels of our reconfiguration state machine. Figure 12.4 shows how those reconfigurations take place during runtime.
• Baseline. The baseline represents a recognition that does not reconfigure and thus includes all atomic activities within its activity catalog. It is the most general state which is assumed when no opportunity for a reduction of atomic activities is available.
• Setting-specific reconfiguration. Here, the activity catalog is restricted to a specific setting, such as a room or place. The complexity per reconfiguration state corresponds here to most activity recognition results found in literature, e.g. [7, 8, 20].
• Composite-specific reconfiguration. Activity composites represent individual steps within a workflow. An example could be the steps involved in following a recipe during cooking. Reconfiguration restricts the activity catalog to a set of atomic activities which are required to fulfill a certain step.
• Object-specific reconfiguration. A further refinement is the reconfiguration following interactions with individual objects. This assumes that a person is only using that object as a tool while it is being held. Trigger activities may be picking up and placing down an object, while the activity catalog contains atomic activities related to the tool.
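The nesting of the four granularities can be illustrated with a short Python sketch. All catalog contents below are hypothetical; the activity names loosely follow the shelf-assembly labels of Fig. 12.4 and do not reproduce the catalogs used in the evaluation. The only property the sketch relies on is that each finer state refines, i.e. is a subset of, its parent state.

```python
# state -> (parent state, activity catalog). All contents are hypothetical.
SHELF = {"pick up shelf board", "pick up screw", "screw by hand",
         "pick up screwdriver", "use screwdriver", "place screwdriver down"}
CROSSBAR = {"pick up crossbar", "pick up nail", "pick up hammer"}

STATES = {
    "baseline": (None, SHELF | CROSSBAR | {"open drawer", "pick up book"}),
    "setting: shelf assembly": ("baseline", SHELF | CROSSBAR),
    "composite: mount shelf board": ("setting: shelf assembly", SHELF),
    "object: screwdriver": ("composite: mount shelf board",
                            {"use screwdriver", "place screwdriver down"}),
}


def catalog_chain(state):
    """Catalog sizes from the baseline down to the given state."""
    chain = []
    while state is not None:
        parent, catalog = STATES[state]
        if parent is not None and not catalog <= STATES[parent][1]:
            raise ValueError(f"{state!r} does not refine {parent!r}")
        chain.append((state, len(catalog)))
        state = parent
    return chain[::-1]


if __name__ == "__main__":
    for name, size in catalog_chain("object: screwdriver"):
        print(f"{name}: {size} atomic activities")
```

Each step down the hierarchy can only shrink the catalog, which is why a finer granularity leaves fewer atomic activities for the pattern models to describe.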
[Figure: timeline of recognized atomic activities (enter workspace; pick up shelf board; pick up screw; screw by hand; pick up screwdriver; use screwdriver; place screwdriver down; pick up screw; screw by hand; pick up screwdriver; use screwdriver; place screwdriver down; pick up crossbar; pick up nail; pick up hammer), with the active reconfiguration state shown above it for each granularity: baseline (recognition of all activities), setting-specific (shelf assembly), composite-specific (mount shelf board, attach crossbar), and object-specific (screwdriver). Trigger activities mark the state changes.]

Fig. 12.4 Illustration of a reconfiguration sequence at the finest granularity considered in this work. The atomic activities (lowest line) are recognized within reconfiguration states of different granularities (shown above). A finer reconfiguration granularity reduces the number of atomic activities that need to be described by pattern models.
12.5 Implementation of the activity recognition chain
Our distributed activity recognition architecture builds on sensor nodes that operate as activity event detectors on locally sampled continuous sensor data streams. In this section we detail the spotting and classification algorithms employed at the sensor nodes, as well as the fusion algorithms.

12.5.1 Event recognition at distributed sensor nodes
Activity event spotting was performed using the Feature Similarity Search (FSS) algorithm [21, 22]. The algorithm uses continuous sensor streams to spot events that are embedded in arbitrary data and can cope with variable-length motions. FSS was applied for each detector event type separately. A subsequent filtering stage was used to derive detector events for each sensor node. We briefly outline key elements of feature extraction, event spotting, and selection below.

12.5.1.1 Feature processing and selection
A general set of time-domain features was computed to model event data patterns of all 3D-acceleration sensors in the network. These features included sums and absolute sums; first and second deviations; minimum, average, and maximum amplitudes; and event duration. Besides acceleration sensors, we used light sensors, e.g. to detect open drawers.
Activity events for these sensors are limited to light-on/off transitions, which required fewer features to model. The feature set was derived from three evenly distributed sections of sensor data and the entire event instance. To select relevant features, a Mann-Whitney-Wilcoxon test was used to compare event instances to embedding data of a training set. This ranking was refined by analyzing correlations among all features. A set of 20 features was selected according to a method yielding highest rank and minimum correlation scores [23].
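As a rough illustration of the feature layout described above, the following sketch computes a few of the named time-domain features (sum, absolute sum, minimum, average, maximum, duration) over the entire event instance and over three evenly sized sections. The function names, the single-axis input, and the exact sectioning rule are assumptions made for illustration; the statistical ranking and correlation-based selection step is not shown.

```python
from statistics import mean


def section_features(samples):
    """Basic time-domain features of one non-empty signal section."""
    return {
        "sum": sum(samples),
        "abs_sum": sum(abs(x) for x in samples),
        "min": min(samples),
        "mean": mean(samples),
        "max": max(samples),
    }


def event_features(samples, sample_rate_hz):
    """Features over the whole event and three evenly sized sections.

    Assumes a single-axis signal with at least three samples
    (hypothetical simplification of a 3D-acceleration stream).
    """
    n = len(samples)
    sections = {
        "whole": samples,
        "sec1": samples[: n // 3],
        "sec2": samples[n // 3 : 2 * n // 3],
        "sec3": samples[2 * n // 3 :],
    }
    features = {}
    for name, part in sections.items():
        for key, value in section_features(part).items():
            features[f"{name}_{key}"] = value
    features["duration_s"] = n / sample_rate_hz
    return features
```

With four sections and five features each, plus the duration, this yields 21 candidate values per event instance, from which a selection step would then pick a subset.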
12.5.1.2 Event spotting using feature similarity search (FSS)
The FSS algorithm consists of a signal pattern modeling (training) and a search stage. A separate training dataset was used to determine FSS model parameters and select a feature set. A validation set was subsequently used to determine performance results as described in Sec. 12.5.4 below. To search continuous data, we used an equidistant data segmentation of 0.25 Hz. This setting provided sufficient resolution for all motions. The FSS algorithm operation can be summarized as follows [21]: For each segmentation point, a variable-sized window of previously received data is analyzed. The features in this window are compared to a trained model and the Euclidean distance dS to this model is computed. A distance threshold θd is used to determine sections that are considered as spotting result (retrieved items). Each retrieved event is associated with a model confidence derived by mapping dS to an a-posteriori confidence with respect to the training model. Settings for threshold and variable-sized window bounds were determined during training.
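The search stage summarized above can be sketched as follows. This is a simplified illustration, not the FSS implementation of [21]: the two-element feature vector, the integer segmentation step, and in particular the linear mapping from distance to confidence are assumptions chosen to keep the example short.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def toy_features(window):
    """Hypothetical feature vector: mean and amplitude range."""
    return (sum(window) / len(window), max(window) - min(window))


def spot_events(stream, model, threshold, step, min_len, max_len,
                features=toy_features):
    """Search a stream for sections whose features are close to the model.

    At every segmentation point (every `step` samples), windows of
    previously received data with lengths min_len..max_len are compared
    to the trained model; the closest window is retrieved if its
    distance does not exceed the (positive) threshold.
    """
    events = []
    for end in range(step, len(stream) + 1, step):
        best = None
        for length in range(min_len, max_len + 1):
            start = end - length
            if start < 0:
                break
            d = euclidean(features(stream[start:end]), model)
            if d <= threshold and (best is None or d < best[2]):
                best = (start, end, d)
        if best is not None:
            start, stop, d = best
            events.append({
                "start": start,
                "end": stop,
                "distance": d,
                # Assumed linear mapping; the chapter derives the
                # confidence from the training model instead.
                "confidence": 1.0 - d / threshold,
            })
    return events
```

In this sketch the window bounds and the threshold are passed in directly; in the chapter they are determined during training.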
12.5.1.3 Event filtering and classification of sensor nodes
For each detector event, an individual FSS stage was employed and returned its results independently. This allows the algorithm to scale starting from one detector event. Nevertheless, for each sensor node, only one event can occur at each point in time. Thus, a filtering is applied to generate non-overlapping events by selecting events with highest confidence [21]. For this purpose, a sliding window is maintained to capture temporal collisions between previously retrieved events and new ones. This local event detection result of each sensor node is forwarded to the network for further processing. In this work, we employ a central fusion of all detector results to analyze reconfiguration.
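The collision filtering described above can be illustrated with a small offline sketch. It is an assumption-laden simplification: instead of maintaining a sliding window over incoming events, it processes a finished list greedily by confidence, and the event dictionary layout is hypothetical.

```python
def overlaps(a, b):
    """True if two half-open intervals [start, end) intersect."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def filter_non_overlapping(events):
    """Keep the highest-confidence event among temporally colliding ones.

    Greedy offline variant: events are visited in order of decreasing
    confidence and kept only if they do not collide with an event that
    was already kept.
    """
    kept = []
    for ev in sorted(events, key=lambda e: e["confidence"], reverse=True):
        if all(not overlaps(ev, k) for k in kept):
            kept.append(ev)
    return sorted(kept, key=lambda e: e["start"])
```

An online variant would apply the same comparison only to the events currently inside the sliding window.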
12.5.2 Network fusion of distributed detector events
In our architecture, all detector events communicated by sensor nodes were delivered to a central network fusion. In this final event processing step, we aimed at identifying individual atomic activities from detector events reported by all sensor nodes and at correcting detection errors. To implement this behavior, we used a sliding window to capture all concurrently reported detector events for a particular atomic activity and applied a sum rule fusion using reported detector event confidences [24]. When the subject performs atomic activity $a_3$, sensor node $i$ may, e.g., report detector event $E_{i,1} = \{a_1, a_3\}$ and sensor node $j$ detector event $E_{j,3} = \{a_3, a_5\}$. Our fusion then created a fusion event $a = E_{i,1} \cap E_{j,3} = \{a_3\}$ with the summed confidence of both detector events. An additional filtering step generates non-overlapping fusion events by selecting fusion events with highest confidence. The resulting fusion events represent the global recognition result and the system's output.
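The worked example above can be reproduced with a minimal sketch. The dictionary layout and the handling of an empty intersection (returning None) are assumptions; the chapter does not state what happens when concurrently reported detector events share no atomic activity.

```python
def fuse(detector_events):
    """Fuse concurrently reported detector events into one fusion event.

    Each detector event carries the set of atomic activities it may
    represent and a confidence. The fusion event is the intersection
    of the activity sets with the summed confidence.
    """
    if not detector_events:
        return None
    candidates = set(detector_events[0]["activities"])
    for ev in detector_events[1:]:
        candidates &= set(ev["activities"])
    if not candidates:
        return None  # assumed behavior for contradictory reports
    return {
        "activities": candidates,
        "confidence": sum(ev["confidence"] for ev in detector_events),
    }
```

For the two detector events of the example, the sketch returns the single activity a3 with the sum of both confidences.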
12.5.3 Architecture and reconfiguration complexity metrics
Several metrics are relevant to describe architecture and reconfiguration complexity for the recognition task. They can be derived from our architecture model. The total number of detector nodes $|D|$ determines the overall recognition system size. The cardinality of individual detector event sets $|D_i|$ relates to processing requirements imposed on each detector node. Measures that describe the frequency and granularity of reconfigurations are the number of recognition states $|S|$ and the size of the atomic activity set in a state $|A_s|$. An additional relevant metric is the average size of detector event sets $|E_{i,j}|$, describing how specifically detector events model atomic activities. A high value of $|E_{i,j}|$ indicates that a detector is only capable of identifying abstract events that describe a group of atomic activities. To identify the actual atomic activity in this case, an activity recognition system requires the presence of other detectors providing complementary detector events. A low value of $|E_{i,j}|$ indicates that the detector is capable of identifying specific atomic activities. Similarly, the average size of a fusion event $|a_i|$ is an indicator of the specificity of a fusion result. It represents the number of atomic activities that cannot be distinguished by the whole sensor network.
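To make the metrics concrete, the sketch below derives the number of detectors, the number of detector events per detector, and the average detector event set size from a nested mapping. The data layout (detector name to detector event to set of atomic activities) and all names are hypothetical.

```python
def complexity_metrics(detectors):
    """Compute |D|, |D_i| per detector, and the average |E_ij|.

    `detectors` maps a detector name to a dict that maps each detector
    event to the set of atomic activities it represents.
    """
    set_sizes = [
        len(activities)
        for events in detectors.values()
        for activities in events.values()
    ]
    return {
        "num_detectors": len(detectors),
        "events_per_detector": {
            name: len(events) for name, events in detectors.items()
        },
        "avg_event_set_size": sum(set_sizes) / len(set_sizes),
    }
```

An average event set size close to 1 means that most detector events identify a single atomic activity on their own.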
12.5.4 Performance evaluation
A ten-fold cross-validation was used to determine training and validation sets for activity spotting and classification of sensor nodes, in order to avoid overfitting. Activity recognition performance was analyzed using the information retrieval metrics precision and recall, and an event alignment check with a jitter of 0.5 [21]. Figure 12.5 shows an example precision-recall performance graph derived by sweeping a threshold on retrieved event confidence values, which illustrates detector result quality. Best performance is found towards high precision and recall. In this example, the performance of a sensor node attached at the left knee is shown for four detector events. As Fig. 12.5 indicates, the node performs well for all except the “picking up” detector event. This low performance can be attributed to the fact that some of the atomic activities mapped to this detector event could not be satisfactorily modeled by the applied recognition procedures. In this particular case, the detector event included a large set of 22 variable atomic activities in which the legs were bent to pick up something from the ground. Nonetheless, we consider the overall performance of all detectors as adequate to explore reconfiguration in the considered complex activity dataset.
[Figure 12.5: precision (vertical axis, 0.1 to 1) versus recall (horizontal axis, 0.1 to 1) for the detector events Picking up, Sitting down, Standing up, and Walking.]
Fig. 12.5 Example recognition performance graph for the sensor node at the left knee. The graph was derived by sweeping a threshold on retrieved event confidence values for four detector event types. Detector event “picking up” is not sufficiently modeled, which represents a typical issue in activity spotting tasks.
In this evaluation, we have chosen the precision-recall point maximizing the F1 score (harmonic mean of precision and recall) as the best operating point for the further analysis of reconfiguration.
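Selecting the operating point can be sketched as follows. The tuple layout (threshold, precision, recall) and the function names are assumptions; the F1 definition is the harmonic mean stated above.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_operating_point(points):
    """Return the (threshold, precision, recall) tuple with maximum F1.

    `points` is the result of sweeping a confidence threshold and
    measuring precision and recall at each setting.
    """
    return max(points, key=lambda p: f1_score(p[1], p[2]))
```

Because F1 penalizes an imbalance between precision and recall, the selected point tends to lie where both are simultaneously high rather than at either end of the sweep.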
Table 12.1 Settings, recording sessions, and composite activities of the evaluation dataset. All sessions included activities of the subject entering and leaving the setting, e.g. a room. The dataset includes 18 composite and 78 atomic activities. $|A_s|$ denotes the size of an atomic activity set.
Setting | Session | Composite activities | $|A_s|$
Kitchen | Prepare dinner | Heat water, add soup, cook soup, slice bread, use computer, prepare table, eat, cleaning up dishes | 44
Study | Relaxing | Selecting a book from the shelf, reading, returning the book to the shelf | 12
Study | Working | Use the computer, drink a glass of water, selecting a book, reading, writing, returning the book | 21
Assembling furniture | Assembling shelf | Getting the tools, mounting middle board, mounting upper board, mounting lower board, returning tools | 14
Assembling furniture | Attaching crossbar | Getting tools, mounting crossbar, hammering in a nail, returning tools | 12
Distractions | | Scratching the head, using the phone, tying shoes, coughing | 0
12.6 Evaluation dataset
To illustrate the benefits of reconfiguration, we analyzed a complex activity dataset with several settings, which could naturally benefit from a set of dedicated activities. These settings are often found in rooms or localized environments of a specific purpose, a work plan that has to be followed, etc. We have chosen a smart environment dataset consisting of four real-world settings and six sessions involving different activities, such as working and relaxing in a living room. In total 18 composite activities and 78 atomic activities were recorded from 28 sensors that were worn on-body, attached to tools, and embedded into the environment. In total 120 recordings were made with 4615 atomic activity instances performed by two subjects [25]. Table 12.1 provides an overview of our dataset, including settings, sessions, and composite activities. The sessions involved cooking a soup in the kitchen, assembling a shelf with three boards and attaching a metal crossbar to it, working at a desk reading, writing, and using the computer (two sets), and performing arbitrary activities. The latter session served as embedding data for activity spotting under realistic conditions [26]. Thus, these activities were not intended to be recognized, but to evaluate robustness of recognition algorithms. In total, 196 arbitrary activity instances were recorded and annotated. Separate sections of
these activities were added to training and validation sets.
Fig. 12.6 Illustration of sensor selection used during the dataset recordings. The body-worn and object-integrated sensors recorded 3D-acceleration, while drawers and cupboards were equipped with light sensors to monitor opening and closing activities.
12.6.1 Experimental procedure
Two subjects were asked to wear 3D-accelerometers at both wrists and at the right thigh. Bend sensors were used to monitor finger extension at the right hand. Additional accelerometers were placed on 12 objects and tools that subjects interacted with. An additional 8 light sensors were placed into drawers and cupboards to monitor when they were used by subjects. Work on the computer was sensed by counting keys pressed and mouse movements using a background application. A pyroelectric infrared (PIR) motion sensor was deployed to detect when subjects entered and left a room. In each recording, subjects entered a room, performed all composite activities in a scripted sequence and left the room again. While activity composites were scripted, subjects had the freedom to perform individual atomic activities in their habitual style. As the aim of this work was to identify reconfiguration benefits for sensor nodes which recognize independent atomic activities, scripting activity composites did not influence performance results. Each session was recorded from two subjects performing 10 iterations each. The activities were annotated during the experiment by an observer and manually refined in a postprocessing step to serve as ground truth. All recordings were videotaped to support later analysis.
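Table 12.2 Complexity metrics $|D_i|$ for all sensor nodes considered in the dataset. Most detector events were addressed by body-worn sensors, especially the dominant right hand. Objects and tools typically provided three activities: “picking up”, “using”, and “placing down”. Drawers and cupboard sensors monitored “open” and “close”.
Category | Position | $|D_i|$
Body worn (Acceleration) | Right hand | 35
Body worn (Acceleration) | Left hand | 5
Body worn (Acceleration) | Left knee | 4
Infrastructure | Pyroelectric infrared | 2
Infrastructure | Computer mouse | 1
Infrastructure | Computer keyboard | 1
Tools (Acceleration) | Hammer | 3
Tools (Acceleration) | Screwdriver | 3
Tools (Acceleration) | Scissors | 3
Tools (Acceleration) | Knife | 3
Tools (Acceleration) | Book 1 | 3
Tools (Acceleration) | Book 2 | 4
Tools (Acceleration) | Phone | 0
Tools (Acceleration) | Stirring spoon | 3
Tools (Acceleration) | Drill | 3
Tools (Acceleration) | Small wrench | 3
Tools (Acceleration) | Big wrench | 3
Tools (Acceleration) | Pen | 3
Furniture (Acceleration) | Shelf board | 3
Furniture (Acceleration) | Shelf leg | 1
Furniture (Acceleration) | Chair | 1
Furniture (Light) | Dish cupboard | 2
Furniture (Light) | Cutlery drawer | 2
Furniture (Light) | Garbage | 2
Furniture (Light) | Pot drawer | 2
Furniture (Light) | Food cupboard | 2
Furniture (Light) | Tool drawer | 2
Furniture (Light) | Desk drawer | 2
Total | | 92
12.6.2 Sensor node complexity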
Table 12.2 shows complexity metrics for all sensor nodes $|D_i|$, i.e. the number of events to be recognized by each detector. Highest $|D_i|$ can be observed for the right and dominant hand of all subjects. Almost all performed activities involve this hand. Objects and tools with attached sensor nodes typically contributed to the recognition task with events directly related to them, e.g. being picked up, used, and placed down. Ambient installed sensors, such as light sensors in drawers and cupboards, only observed whether the furniture was used.
12.7 Results
Table 12.3 summarizes the complexity measures of our reconfiguration approaches. It shows the number of states that are added to the state machine with each level of reconfiguration granularity. Furthermore, the additional number of reconfigurations executed within the complete dataset is indicated. It is important to note that with increasing granularity, the number of relevant sensors decreases as well as the number of atomic activities per detector event. This shows that the recognition becomes more specific to a situation.
Table 12.3 Complexity measures for all four reconfiguration granularity levels. The number of states of the state machine, the average number of atomic activities per detector event $|E_{i,j}|$, the total number of reconfigurations at runtime, and the average number of sensors $|D|$ are shown.
Granularity | States | $|E_{i,j}|$ | Reconfigurations | $|D|$
Baseline | 1 | 1.58 | 1 | 26
Setting specific | 5 | 1.36 | 112 | 12.7
Composite specific | 28 | 1.21 | 595 | 7.7
Object specific | 16 | 1.23 | 245 | 7.6
12.7.1 Baseline results
For the baseline activity recognition, the complete dataset was used to create training and validation sets. Overall performance of all detector nodes amounted to 50.46 % recall and 40.90 % precision. After fusion, these recognition results were improved to 56.45 % recall and 61.67 % precision. These results show that recognition of 92 detector events at baseline is challenging.
12.7.2 Setting-specific results
To investigate reconfiguration, recognition was first made setting-specific: our recognition architecture was trained on data relevant for one setting only. Table 12.1 includes the number of activities that needed to be recognized in each setting in the rightmost column. A total of 78 initial atomic activities at baseline was reduced to 44 (56 %) for kitchen, 12 (15 %) and 21 (27 %) for study, and 14 (18 %) and 12 (15 %) for assembly, respectively. An additional effect of this setup was that the average fusion event size $|a_i|$ reduced from 1.20 atomic activities per fusion event at baseline to 1.07 for setting-specific reconfiguration. Recognition accuracy improved to 61.80 % and 64.19 % for recall and precision, which corresponds to a +6.83 % increase in the F1 score compared to baseline. As trigger for entering a setting-specific state we selected a PIR sensor which recognized a person entering or leaving the setting. Upon entry, our activity recognition system loaded a setting-specific model and switched back on exiting this setting. For our evaluation, we used ground truth events to determine trigger points, so as to evaluate the gain on perfect triggering. Recall and precision values however include the actually recognized PIR results, which are 92.77 % precision and 56.25 % recall.
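The trigger-driven switching between a baseline model and setting-specific models can be sketched as a small state holder. The class, its method names, and the example activity sets are hypothetical; a real system would additionally distribute the new configuration to the sensor nodes.

```python
class SettingReconfigurator:
    """Switches the active activity set on enter/leave triggers."""

    BASELINE = "baseline"

    def __init__(self, baseline_activities, setting_models):
        self.baseline_activities = baseline_activities
        self.setting_models = setting_models  # setting name -> activity set
        self.state = self.BASELINE
        self.reconfigurations = 0

    def active_activities(self):
        """Activities the recognition currently has to distinguish."""
        if self.state == self.BASELINE:
            return self.baseline_activities
        return self.setting_models[self.state]

    def on_trigger(self, event, setting=None):
        """Handle a trigger, e.g. a PIR sensor reporting enter or leave."""
        target = setting if event == "enter" else self.BASELINE
        if target != self.BASELINE and target not in self.setting_models:
            raise KeyError(f"no model for setting {target!r}")
        if target != self.state:
            self.state = target
            self.reconfigurations += 1
```

Counting only actual state changes mirrors the way reconfigurations are tallied in the complexity measures, where repeated triggers for the current state cause no additional cost.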
12.7.3 Composite-specific results
To further reduce atomic activities, reconfiguration was made with regard to composite activities. In this setup, we have not further restricted the dataset for training detectors, as there were not enough instances to reliably train event spotters. Again we used perfect triggers at the beginning and end of each composite activity, identified by the first and last activity belonging to each composite activity. The presented results should therefore be considered as a maximum gain that can be achieved through composite-specific reconfiguration. The activity recognition performance resulted in 68.12 % recall and 63.46 % precision, an improvement of +11.48 % compared to baseline. This corresponds to a +4.35 % higher F1 score than obtained for our setting-specific evaluation. The average number of atomic activities per fusion event $|a_i|$ increased to 1.08. This increase is due to the fact that fewer erroneous events were reported. This result shows that recall could be improved while maintaining precision in a composite-specific setup.
12.7.4 Object-specific results
For the finest reconfiguration granularity, trigger activities were chosen to be atomic activities, such as picking up and placing down individual tools, or opening and closing drawers. This choice further reduced the average duration of a reconfiguration state from 48.15 seconds and 8.64 atomic activities in the composite-specific setup, to 13.46 seconds and 7.27 atomic activities. At 1.12 atomic activities per detector event, detector events were less specific, but the number of active sensor nodes, at an average of 7.4, was the smallest of all reconfiguration setups. As shown in Fig. 12.4, this setup used the full depth of our reconfiguration state machine. The system reconfigures for setting-specific, composite-specific, and object interactions. Thus, in each state fewer activities need to be recognized. Using this fine reconfiguration granularity resulted in a recognition performance of 64.12 % for recall and 62.38 % for precision. This is an improvement of +7.28 % compared to the baseline, but a decrease of −3.76 % compared to a composite-specific reconfiguration. Figure 12.7 shows the recognition performance of all fusion events and all considered reconfiguration granularities. The F1 scores are ordered with regard to increasing composite-specific recognition performance. Table 12.4 summarizes these performance figures for average recall, precision, and F1 score weighted by their respective event occurrence.
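The percentage changes quoted in Sections 12.7.2 to 12.7.4 can be checked from the reported recall and precision values, assuming they denote relative changes of the F1 score. The sketch below uses only the numbers given in the text; since F1 is symmetric in recall and precision, the order of the two values does not affect the result.

```python
def f1(recall, precision):
    """Harmonic mean of recall and precision (both in percent)."""
    return 2 * recall * precision / (recall + precision)


# (recall %, precision %) after fusion, as reported in the text
results = {
    "baseline": (56.45, 61.67),
    "setting": (61.80, 64.19),
    "composite": (68.12, 63.46),
    "object": (64.12, 62.38),
}
scores = {name: f1(r, p) for name, (r, p) in results.items()}


def relative_change(new, old):
    """Relative F1 change in percent when moving from `old` to `new`."""
    return 100.0 * (scores[new] / scores[old] - 1.0)


for new, old in [("setting", "baseline"), ("composite", "baseline"),
                 ("composite", "setting"), ("object", "baseline"),
                 ("object", "composite")]:
    print(f"{new} vs {old}: {relative_change(new, old):+.2f} %")
```

The computed changes agree with the quoted figures to within rounding of the second decimal, which supports reading them as relative rather than absolute differences.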
[Figure 12.7: F1 score (vertical axis, 0.0 to 1.0) per fusion event (horizontal axis) for Baseline, Setting specific, Composite specific, and Tool specific reconfiguration.]
Fig. 12.7 Summary of fusion event performances (F1 scores) for all reconfiguration granularities. Fusion events are ordered according to increasing composite-specific reconfiguration performance.
The average unweighted F1 scores for all fusion events and reconfiguration granularities were [0.581, 0.645, 0.679, 0.659] for baseline, setting-specific, composite-specific, and object-specific reconfiguration, respectively. For the best performing composite-specific reconfiguration, 37 % of all fusion events were recognized with an F1 score over 0.8, and 78 % had an F1 score larger than 0.5. A total of 80 % of these events performed better than or at least as well as baseline. Fig. 12.7 shows clear performance benefits of the reconfiguration granularities compared to baseline performance. The baseline incurred several clearly visible low-performing events. These events could not be adequately modeled when considering the complete dataset. A particular example, accounting for the right-most performance drop, is the atomic activity “filling pot with water”, which was derived from wrist sensor data only. This activity exhibited signal patterns of arm postures similar to those when holding boards during shelf assembly. Holding a shelf board was part of our embedding data, which was not modeled but nevertheless made recognizing “filling pot with water” difficult. Using our reconfiguration, this source of confusion was removed, as the involved atomic activities fall into different reconfiguration states, leading to improved recognition performance.
12.7.5 Costs of reconfiguration
Table 12.4 Comparison of recognition performances for different reconfiguration granularities. Besides precision and recall, the F1 score and the average fusion event size $|a_i|$ are shown. Standby time and energy cost determine the cost benefit of reconfiguration and mark a trade-off between recognition performance and power consumption.
Reconfiguration detail | $|a_i|$ | Recall [%] | Precision [%] | F1 | Standby [%] | Energy [J]
Baseline | 1.20 | 56.45 | 61.67 | .589 | 0.0 | 24’140
Setting specific fusion | 1.13 | 64.19 | 61.80 | .630 | 47.0 | 14’143
Composite specific fusion | 1.08 | 68.12 | 63.46 | .657 | 74.8 | 8’889
Object specific fusion | 1.12 | 64.12 | 62.38 | .632 | 84.5 | 6’168
Each time a reconfiguration is performed, all sensor nodes need to be reconfigured. Data required for this reconfiguration includes information about the sampling rate of sensors, spotting window sizes, settings for features, spotting models, and fusion thresholds. An average configuration for our activity recognition approach includes 25 parameters. Encoded with 16-bit values, this configuration data amounts to 50 bytes. This data is transmitted via a wireless link using a TI CC2420 transceiver, which consumes an average of 135 nJ/bit [27]. For each reconfiguration, sensors get one message, requiring an approximate energy consumption of 1.7 J per reconfiguration.
We can derive the total energy costs for reconfiguration by considering that 1 reconfiguration is required for baseline, 112 reconfigurations for setting-specific, 595 for composite-specific, and 245 for object-specific reconfiguration setups. For an overview, we also computed the average time sensors can be turned off, which is zero for baseline, and 46.5 %, 74.2 %, and 84.4 % of the total time in all other settings.
In our experiments, we used the widely available TelosB sensor nodes [28]. They have an average standby power consumption of 100 μW and a transceiver consumption of 40 mW. Assuming a 10 % duty cycle for listening in standby state results in an idle-waiting power consumption of 4 mW. We can sum up the values to obtain a recognition performance vs. power consumption trade-off. The resulting energy consumption values are 23.21 kJ, 13.68 kJ, 8.53 kJ, and 5.90 kJ for baseline and all reconfiguration setups in our 6.56-hour dataset. The energy required for reconfiguration itself amounted to 157.3 J, 835.4 J, and 344.0 J. When configurations are cached within sensor nodes, the reconfiguration costs can be reduced to 1 broadcast message per reconfiguration. This would reduce reconfiguration costs to 6.0 J, 32.1 J, and 13.24 J, which includes 1 transmission of each state's configuration, and only 1 broadcast reconfiguration message for subsequent appearances of the reconfiguration state. The main benefit of using cached reconfiguration is in the reduced reconfiguration time that a context switch requires.
The context switch time is another cost of reconfiguration. From the time when a trigger activity is detected until the system is reconfigured and a feature window is filled, no
detector events could be obtained. Thus, a short activity could not be recognized if it falls into a sensor network reconfiguration. The exact time depends on the reconfiguration implementation. E.g., for the Titan framework [15] a reconfiguration time of 0.9 s for six sensor nodes was needed. In our dataset, this is short enough to fit between any two consecutive activities, which allowed us to neglect this effect in the further investigation. In other scenarios however, it may be essential that reconfigurations are not triggered during successive activities that have short durations.
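Some of the cost figures in this section follow directly from the stated parameters and can be checked with a few lines of arithmetic. The sketch below covers only the raw payload of one configuration message, the idle-listening power, and the share of reconfiguration energy in the total energy; it does not attempt to reproduce the per-reconfiguration or total energy figures, which depend on details not given here. All constant names are illustrative.

```python
PARAMS_PER_CONFIG = 25        # parameters in an average configuration
BITS_PER_PARAM = 16           # each parameter encoded as a 16-bit value
TX_ENERGY_PER_BIT_J = 135e-9  # CC2420 average, 135 nJ/bit
TRANSCEIVER_POWER_W = 40e-3   # TelosB transceiver consumption
LISTEN_DUTY_CYCLE = 0.10      # assumed listening duty cycle in standby

config_bits = PARAMS_PER_CONFIG * BITS_PER_PARAM
config_bytes = config_bits // 8
payload_energy_j = config_bits * TX_ENERGY_PER_BIT_J  # one message, payload only
idle_power_w = TRANSCEIVER_POWER_W * LISTEN_DUTY_CYCLE

# Total energy [kJ] and reconfiguration energy [J] as quoted in the text
total_kj = {"setting": 13.68, "composite": 8.53, "object": 5.90}
reconfig_j = {"setting": 157.3, "composite": 835.4, "object": 344.0}
reconfig_share_percent = {
    name: 100.0 * reconfig_j[name] / (total_kj[name] * 1000.0)
    for name in total_kj
}

print(config_bytes, "bytes per configuration")
print(f"{payload_energy_j * 1e6:.1f} uJ payload energy per message")
print(f"{idle_power_w * 1e3:.1f} mW idle-waiting power")
for name, share in reconfig_share_percent.items():
    print(f"{name}: {share:.2f} % of total energy spent on reconfiguration")
```

The largest share results for the composite-specific setup, which has the highest number of reconfigurations.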
12.8 Discussion
Our results show that using reconfiguration is beneficial for distributed recognition systems running on wireless sensor nodes. Reconfiguration can be seen as focusing on currently relevant activities, so that a system can make better use of the limited resources on sensor nodes. Our results confirm that reconfiguration can reduce energy costs and improve recognition performance of human activities. Nevertheless, there are trade-offs to be considered. Setting-specific training improved the F1 score (+6.83 %) by reducing the number of activities to be recognized compared to a baseline configuration. Moreover, our results show that reconfiguration can be performed at increasingly finer granularity as long as the remaining activity set captures all possible activities at a particular moment and robust models can be constructed. To this end, our composite-specific results show further improvement in recognition performance (+4.35 %) compared to setting-specific results. In contrast, our object-specific results dropped by −3.76 % compared to composite-specific ones, indicating limits in reducing a relevant activity set. We attribute this decrease to reduced information available during the network fusion step. Consequently, errors are left unfiltered that otherwise would have been removed by competing detector events. One example illustrating this effect is a wrist sensor reporting many false positives for “screwing”. During reconfiguration to the composite activity “mount lower shelf board”, a knee sensor may report whether the knee is “bent down” or not. The fusion step may use this information to cancel out false positives from the wrist. During object-specific reconfiguration however, the seemingly irrelevant knee sensor is turned off, and the fusion cannot use such information to refine results. While keeping too many sensors may introduce confusing information, a too aggressive limitation of involved sensors can omit subtle information content.
It can be hypothesized that alternate recognition methods may produce different trade-off points. Any such method however needs to cope with sparse and multi-activity situations. Activity sparseness is limited by the natural requirement to capture all possible activities at a particular moment. Our results confirm that reconfiguration enables energy saving by turning off sensors when they are not needed. In this way the total energy usage of our distributed system was reduced by up to 84.5 %. In contrast to recognition performance changes, fine-grained reconfiguration further decreases energy costs, as costs for wirelessly sending reconfiguration data amounted to a maximum of 9.79 % of the total energy spent.
[Figure 12.8: precision, recall, and F1 score (left axis, Performance [%], 0 to 100) and energy (right axis, Energy [kJ], 0 to 30) for the Baseline, Settings, Composites, and Tools granularities.]
Fig. 12.8 Illustration of the reconfiguration evaluation results of Table 12.4. Recognition performance and energy consumption are shown for individual reconfiguration granularities. Energy consumption decreases monotonically with increasing granularity. Recognition performance shows a maximum for composite activities.
Whether reconfiguration can be successfully applied is application-dependent. Settings where activity sequences or activity sets are predominantly performed under certain circumstances, locations, etc., provide large potential for reconfiguration. Domain knowledge of the work place, e.g., the kitchen, allows constraining the set of relevant activities. Widely used approaches in activity recognition apply prior probabilities to tune recognition to a situation. These priors are subsequently dynamically modified based on sensor readings to reflect relevance. Our reconfiguration approach is more rigorous in that it completely excludes activities that are not of interest at the moment. In turn, this exclusion enables using highly optimized recognition models at particular moments, comprising those remaining activities only. Reconfiguration benefits come with the drawback that some decision must be made on relevant activities for a particular reconfiguration. Activities left out of this catalog will not be recognized. By using domain knowledge, we
implemented these constraints with the idea that left-out activities are not of interest at the current time. A consumer of the activity recognition service may not even know how to handle such unexpected results. While our work focused on the optimization of recognition models for low-level activities that are identified from patterns within sensor data streams, our approach also generalizes to higher-level event processing algorithms. Besides offering the option to use optimized activity models for certain domains, reconfiguration can help to achieve scalable online activity recognition since processing effort for activities is reduced. We have manually included expert knowledge in defining when and to which activity set to reconfigure, and evaluated our approach with perfectly recognized trigger activities. While this approach was required to identify potential benefits of reconfiguration, additional steps are needed to make reconfiguration suitable for a broad range of applications. To this end, we expect that the following two challenges need to be addressed in further research. The first challenge is to automatically construct the activity sets of a recognition state machine and to derive trigger activities. A statistical analysis of the datasets could reveal recurring activity sequences or domains that offer potential for reconfiguration. Typically, setting-specific reconfigurations can be rapidly derived by using dedicated location sensors (e.g. a PIR sensor at the door) as triggers, and collecting all activities within a room or environment involving localized activities into an activity set. Automatically identifying activity sequences, such as cooking a meal, for composite-specific reconfigurations is however more challenging. Besides identifying relevant activity sequences, a suitable high-level activity recognition system needs to be able to predict a subsequent composite activity to reconfigure to. 
We have presented such high-level recognition algorithms in our previous work [19]. A current overview of high-level activity pattern detection approaches can be found in Kim et al. [18]. The second challenge is to reliably detect reconfiguration triggers, that is, when to switch from one reconfiguration state to another. The recognition of atomic activities from sensor signal patterns as well as that of composite activities is inherently limited in reliability. Hence a reconfigurable system may be cued into erroneous states from which it must be able to escape. By monitoring reported activity patterns in a state, the system could detect that the state might be wrong and that an action must be taken. For example, Surie et al. [29] modeled activity event occurrences within a composite activity using a Hidden Markov Model and were able to correct their decisions after a certain delay. In the scope of reconfiguration and online recognition systems, the drawback of delayed decisions is that recognition may be
delayed or that time periods of incorrect recognition occur. The structure of our state machine allowed a fallback to a coarser granularity, which includes more activities. This behavior could be helpful if the current state was determined to be not appropriate.
12.9 Conclusion
We have proposed and evaluated benefits of reconfiguration in distributed activity recognition systems using wireless sensor nodes that are worn on the body, integrated in tools, and embedded in the environment. Our empirical evaluation showed recognition performance gains and energy cost savings by evaluating reconfiguration in a dataset comprising four settings with 18 composite and 78 atomic activities. Our results demonstrate that fine-grained reconfiguration granularities are beneficial for activity recognition performance. Performance increased from baseline to composite-specific reconfiguration by 11.48 %. When further refining reconfiguration granularity to object-specific reconfiguration however, a 3.76 % lower recognition performance was observed. This result indicates that an optimal reconfiguration granularity exists. With finer reconfiguration granularity, sensor node energy costs reduced monotonically. This result was achieved by switching off sensors. We observed up to 84.5 % in energy saving for an object-specific reconfiguration, and 74.8 % for a composite-specific reconfiguration. Energy costs for reconfiguring the sensor network were negligible compared to energy savings achieved by dynamically turning off sensor nodes. These results confirm that reconfiguration is a promising research direction to improve scalability and energy efficiency in activity and context recognition systems. Further work should address strategies for recognizing and correcting erroneous reconfiguration states and an automatic composition of the reconfiguration strategy.
References
[1] D. Geer, Will gesture recognition technology point the way?, Computer, 37(10), 20–23 (Oct., 2004).
[2] D. J. Cook and S. K. Das, How smart are our environments? An updated look at the state of the art, Pervasive and Mobile Computing, 3(2), 53–73, (2007).
[3] D. J. Cook, J. C. Augusto, and V. R. Jakkula, Ambient intelligence: Technologies, applications, and opportunities, Pervasive and Mobile Computing, 5(4), 277–298, (2009).
[4] G. Pottie and W. Kaiser, Wireless integrated network sensors, Communications of the ACM, 43, 51–58, (2000).
[5] M. Ryoo and J. Aggarwal, Recognition of composite human activities through context-free grammar based representation, In Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 1709–1718, (2006). [6] D. Kawanaka, T. Okatani, and K. Deguchi, HHMM based recognition of human activity, IEICE Trans. Information Systems, E89-D(7), 2180–2185 (July, 2006). [7] F. Naya, R. Ohmura, F. Takayanagi, H. Noma, and K. Kogure, Workers’ routine activity recognition using body movements and location information, In Proc. IEEE Int. Symp. Wearable Computers (ISWC), pp. 105–108, (2006). [8] G. Ogris, T. Stiefmeier, P. Lukowicz, and G. Tröster, Using a complex multi-modal on-body sensor system for activity spotting, In Proc. Int. Symp. Wearable Computers (ISWC), pp. 55–62, (2008). [9] K. Murao, T. Terada, Y. Takegawa, and S. Nishio, A context-aware system that changes sensor combinations considering energy consumption, In Proc. 6th Int. Conf. Pervasive Computing (Pervasive), pp. 197–212, (2008). [10] K.-F. Lee, Context-dependent phonetic hidden markov models for speaker-independent continuous speech recognition, IEEE Trans. Acoustics, Speech and Signal Processing, 38(4), 599– 609, (1990). [11] M. Stäger, P. Lukowicz, and G. Tröster, Power and accuracy trade-offs in sound-based context recognition systems, Pervasive and Mobile Computing, 3(3), 300–327 (June, 2007). [12] H. Ghasemzadeh, E. Guenterberg, and R. Jafari, Energy-efficient information-driven coverage for physical movement monitoring in body sensor networks, IEEE Journal on Selected Areas in Communications. 27(1), (2009). [13] P. Zappi, C. Lombriser, T. Stiefmeier, E. Farella, D. Roggen, L. Benini, and G. Tröster, Activity recognition from on-body sensors: Accuracy-power trade-off by dynamic sensor selection, In Proc. Europ. Conf. Wireless Sensor Networks (EWSN), pp. 17–33, (2008). [14] G. Batori, Z. Theisz, and D. 
Asztalos, Domain specific modeling methodology for reconfigurable networked systems, In Proc. Int. Conf. Model Driven Engineering Languages and Systems (MoDELS), pp. 316–330, (2007). [15] C. Lombriser, M. Rossi, A. Breitenmoser, D. Roggen, and G. Tröster, Recognizing context for pervasive applications with the titan framework, Technical report, Wearable Computing Laboratory, ETH Zurich, (2009). [16] R. Gravina, A. Guerrieri, G. Fortino, F. Bellifemine, R. Giannantonio, and M. Sgroi, Development of body sensor network applications using SPINE, In Proc. IEEE Int. Conf. Systems, Man and Cybernetics (SMC), (2008). [17] V. Osmani, S. Balasubramaniam, and D. Botvich, Human activity recognition in pervasive health-care: Supporting efficient remote collaboration, Network and Computer Applications, 31(4), 628–655, (2008). [18] E. Kim, S. Helal, and D. Cook, Human activity recognition and pattern discovery, Pervasive Computing Magazine, 9(1), 48–53, (2010). [19] O. Amft, C. Lombriser, T. Stiefmeier, and G. Tröster, Recognition of user activity sequences using distributed event detection, In Proc. Europ. Conf. Smart Sensing and Context (EuroSSC), pp. 126–141, (2007). [20] W. Dargie and T. Tersch, Recognition of complex settings by aggregating atomic scenes, IEEE Intelligent Systems, 23(5), 58–65, (2008). [21] O. Amft and G. Tröster, Recognition of dietary activity events using on-body sensors, Artificial Intelligence in Medicine, 42(2), 121–136, (2008). [22] H. Junker, O. Amft, P. Lukowicz, and G. Tröster, Gesture spotting with body-worn inertial sensors to detect user activities, Pattern Recognition, 41(6), 2010–2024 (June, 2008). [23] Q. Xu, M. Kamel, and M. M. A. Salama, Significance test for feature subset selection on image recognition, In Proc. Int. Conf. Image Analysis and Recognition (ICIAR), pp. 244–252, (2004).
[24] L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, (Wiley Interscience, 2005). [25] P. Zappi, C. Lombriser, E. Farella, L. Benini, and G. Tröster, Experiences with experiments in ambient intelligence environments, In Proc. IADIS Int. Conf. Wireless Applications and Computing, pp. 171–174 (June, 2009). [26] O. Amft, Adaptive activity spotting based on event rates, In Proceedings of the IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing (SUTC), pp. 169–176, (2010). doi: 10.1109/SUTC.2010.63. [27] B. Bougard, F. Catthoor, D. C. Daly, A. Chandrakasan, and W. Dehaene, Energy efficiency of the IEEE 802.15.4 standard in dense wireless microsensor networks: Modeling and improvement perspectives, In Proc. Conf. Design, Automation and Test in Europe (DATE), pp. 196–201, (2005). [28] J. Polastre, R. Szewczyk, and D. Culler, Telos: enabling ultra-low power wireless research, In Proc. Int. Symp. Information Processing in Sensor Networks (IPSN), p. 48, (2005). [29] D. Surie, F. Lagriffoul, T. Pederson, and D. Sjölie, Activity recognition based on intra and extra manipulation of everyday objects, In Proc. Int. Symp. Ubiquitous Computing Systems (UCS), pp. 196–210, (2007).
Chapter 13
Embedded Activity Monitoring Methods
Niket Shah, Maulik Kapuria, and Kimberly Newman
University of Colorado at Boulder, Department of Electrical, Computer, and Energy Engineering, 425 UCB, Boulder, CO 80309
Abstract
As the average age of the population increases worldwide, automated tools for remote monitoring of activity are increasingly necessary and valuable. This chapter highlights embedded systems for activity recognition that provide privacy, do not require major infrastructure, and are easy to configure. The strengths and weaknesses of popular sensing modes, including RFID, motion, pressure, acceleration, and machine vision, are discussed. A new activity detection system is also described for high-privacy areas such as the bathroom and bedroom.
13.1
Introduction
The population of elderly individuals is rising in industrialized countries. In the current economic climate, many families find it very difficult to maintain close contact with parents and aging community members. This population becomes dependent on care providers to manage health issues and provide security in daily life activities. Human labor is the standard approach to caring for elderly individuals; however, it is very expensive and sometimes unavailable. The development of automated systems, ranging from high-end robots to simple motion sensing devices, evolved from this mismatch between supply and demand in the delivery of in-home care. Cost-effective and reliable systems are described in the following sections that monitor activity and behavior patterns without overly intruding on or changing the personal life of seniors. The three basic elements of an activity recognition system are:
1) Collection of raw data using sensors and implementation of low-power embedded circuitry for long-term deployment.
2) Signal processing of low-level data to remove noise and implement security screens.
3) Classification of activity based on low-level sensor data and previously gathered information with the help of various machine-learning algorithms and data mining techniques.
The methods discussed and proposed are all sensor-based embedded systems. Each of the described systems is capable of detecting basic activities such as walking, sitting, running, lying down, and falling down. Detection of these activities enables aging in place, so that individuals are safe and secure in their homes and do not need to move into assisted living communities. Additionally, the technology enables remote management of chronic illness and provides a safe environment for recovery after surgery in an outpatient setting.
13.2
Related Work
Pervasive intelligent environments are an active area of research, ranging from the Gator Tech house at the University of Florida [1] and the Maverick smart house at the University of Texas at Arlington [2] to a wide variety of monitored spaces throughout the world. This section highlights leading activity detection methods and discusses newer ways to sense activity with minimal infrastructure requirements.
13.2.1
Machine Vision
Vision tracking is one of the most common approaches for activity recognition. This method uses cameras at various locations and employs either manual or automated image processing. The manual approach involves a human operator who monitors data from various cameras to detect unusual activity. A single person scans multiple camera outputs from different homes in the community for discrepancies in regular activity patterns and for emergency situations. This approach is very labor intensive. It requires constant monitoring and also requires training to detect events of concern. Another approach involves cameras equipped with embedded systems that run image-processing algorithms. The survey paper by Weiming Hu et al. [3] discusses the issues involved in surveillance applications using machine vision. Motion detection is the first step in automated vision processing and requires environment modeling, motion segmentation, and object classification. Once the subject is isolated from the environment, object tracking is the next step. Four leading techniques are region-based tracking, active contour-based tracking, feature-based tracking, and model-based tracking. The next step in image processing is to determine the activity. Natural language is one method to describe
behaviors. Another method uses multiple cameras to create 2-D and 3-D representations for evaluation. A leading method for fall detection and pose estimation is the active vision system described by Diraco et al. [4]. The system uses wall-mounted time-of-flight cameras to detect falls. Self-calibration is performed to determine the distance from the mounting to the floor. Depth information is processed to determine the pose and orientation of the subject. This method removes the need for specialized illumination from the camera setup. Privacy is protected by providing only target depth information. Sparsely sampled sequences of body poses are used to simplify the image processing for activity detection. This technique is described by Jezekiel Ben-Arie et al. [5] from the University of Illinois at Chicago. Image processing is implemented by tracing and joining predetermined points on the human body to form a skeleton of the subject. The skeleton thus formed can be categorized as walking, sleeping, running, etc. Diraco et al. also describe this method as a way to enhance the processing of images from the time-of-flight camera. One disadvantage of vision-based systems is the cost of the specialized cameras, since several of them are needed to monitor the entire house. Another point of concern is the large amount of data required for image processing algorithms to determine the pose and activity pattern.
13.2.2
RFID-Object Tracking
Radio Frequency Identification (RFID) is another popular option for activity recognition, as described in the RFID handbook [6]. Objects in daily use are tagged with passive RFID tags, and the reader is incorporated into a bracelet worn by the subject, as described in the article by Fishkin et al. [7]. Whenever a tagged object is near the bracelet, information is sent to the central embedded database. Position is determined from the object that is read in the room. For example, when the reader detects the RFID tag on the refrigerator, the person is in the kitchen. On a secondary level, the sequence of objects manipulated by the user provides additional insight into activity patterns. A sequence can be anticipated to predict the next set of possible activities of the person. These methods are discussed by Smith et al. [8] and Patterson et al. [9]. For example, suppose the tagged items are a teapot, water tap, stove knob, teabag, sugar jar, and cup. When the person wearing a reader uses the teapot or picks up a teabag, the system predicts that the remaining components required to make tea will be read in an
expected order and time frame. The system monitors when the stove is read to ensure it is switched off. Sequence detection monitors the two readings of the stove to ensure a minimum time period is observed. In this example, the system learns patterns from the daily activity of the user and triggers alarms when deviations from set patterns are detected. One drawback of this system is the long training time needed to make observations. The system also does not provide information regarding the posture or orientation of the user. Use of a Dynamic Bayesian Network (DBN) can ease the process of learning and provide more accuracy in classification. Tomohito Inomata et al. [10] obtained 95.4 percent accuracy in recognizing the various activities in a drip injection task using DBN and Hidden Markov Models (HMM). Another approach to tracking with RFID is the use of the orientation of tag antennas, as discussed by Rao et al. [11]. RFID antennas for the remote tags and the reader are generally planar antennas, so the reader receives the strongest signal when the antennas are in the same plane. If the planes of the reader and tag antennas are at an angle, the power received by the reader decreases. Limitations of this method are that the direction of the angle cannot be detected and that a fixed or known distance between the tag and the reader is needed. Multiple readers are used to solve the angle issue, but localization is not performed. Another drawback to this form of activity detection is the need to wear a bracelet. Wireless Identification and Sensing Platforms (WISPs) are another way to detect object use, as described by Yeager et al. [12]. WISPs are essentially long-range RFID tags with integrated sensors. These tags use incident energy from distant readers to return a unique identifier, power a sensor, and communicate the current value of the sensor to the reader.
For activity inference applications, WISPs about the size of a large adhesive bandage with integrated accelerometers are attached to objects. When a tagged object is used, the accelerometer is triggered, and the ambient reader is notified. A single room containing hundreds of tagged objects may be monitored by a single RFID reader since most of the tags are inactive at any given time. While this technique is attractive due to its low cost and simplicity, it has several drawbacks. Objects and people must be equipped with RFID readers and tags. In the elder care environment, this is a concern since many residents may forget to wear the device. Additionally, metallic objects, food, and very small items cannot feasibly be tagged. Another concern in using this type of sensing method is the noise present in the environment. Noise is created by RF field strength, interference, and conflict between labels during
reading from two adjacent tagged objects, so it may be difficult to isolate an event or locate a specific item.
13.2.3
RFID and Machine Vision
Vision and RFID tracking each have disadvantages when used as the sole method for observing activity. RFID detects the regional location of a person but does not provide fine resolution. Combining the two is discussed in the papers by Jianxin Wu et al. [13] and Sangho Park and Henry Kautz [14]. To improve the accuracy of RFID sensing, a vision system can be incorporated to further identify the location and pose of the individual. For example, a person is detected with RFID as standing in the kitchen near the refrigerator. After data is sent to the central module for processing, a camera with the refrigerator in view is switched on. The camera starts tracking the activity of the person. This approach limits the number of cameras that are operating at a given time, so that much less signal processing is required to determine if the monitored individual is present. Other triggers can also be used to turn on the camera system, such as motion sensors, which are discussed in the following section.
13.2.4
Motion Sensors
Motion sensors provide only a very coarse location for a person. These sensors are directionless and do not identify specific individuals. However, networked motion sensors at strategic locations within a monitored facility provide new insight into activity patterns. The QuietCare system is a leading method for motion sensing in a home environment. It results from a collaboration between the Living Independently Group and GE Healthcare. This technology is mentioned by Cavoukian et al. [15] in a recent paper on remote home healthcare technologies that provide for the privacy of the individual. The QuietCare system utilizes small wireless motion sensors placed in the bedroom, kitchen, bathroom, and meal preparation and/or medication areas. Each sensor transmits information about the senior’s daily living activities to a base station. The base station initially learns the behavioral pattern of the senior over a span of several days. Important incidents such as the wake-up time, a delay in taking medication, a missed washroom visit, inactivity during sleep, or inactivity for a long time are recorded and reported. As shown in Fig. 13.1, three scenarios are recorded by the system on three different days. The horizontal axis shows the time in minutes after the person wakes up. The letters A to J represent the events in a sequence, each named after the object on which a motion sensor is activated: A-Alarm Clock, B-Washroom, C-Toothbrush, D-Bedroom, E-Drawer, F-Box of Medicines, G-Kitchen, H-Water tap to get water for medicines, I-Washroom for taking a bath, J-Closet.
Fig. 13.1 Scenario X, Y and Z are shown based on motion sensors
Scenario X is the scenario learned from several sequences of activity and is classified as the TRUE or safe sequence of events. Scenarios Y and Z are the scenarios that cause alarms. In scenario Y, the person wakes up 9 minutes late and misses the events for taking medication. In scenario Z, the person misses the remainder of the daily activities after going into the washroom. This is probably due to a fall or other serious problem in the washroom. When Y or Z occurs, local alarms and other methods are used to inform caregivers that an emergency is in progress. Motion sensors may not be effective in a multiuser environment. In such cases a combination of technologies is required for the system to identify the activities of specific individuals and create accurate activity patterns. Another drawback is the training time required to classify activity patterns. It is very long and may lead to extended delays in proper classification when the daily routine is irregular.
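The comparison of an observed day against the learned Scenario X can be sketched as follows. This is a minimal illustration, not the QuietCare algorithm: the event times, the tolerance, and the function name `check_routine` are all hypothetical, since the values plotted in Fig. 13.1 are not given in the text.

```python
# Hypothetical learned routine (Scenario X): event label -> minutes after the
# usual wake-up time. The numbers are illustrative, not read from Fig. 13.1.
LEARNED = {"A": 0, "B": 5, "C": 8, "D": 15, "E": 18,
           "F": 20, "G": 25, "H": 27, "I": 35, "J": 50}


def check_routine(observed, learned=LEARNED, tolerance_min=5):
    """Return alarm strings for events that are late or missing.

    `observed` maps an event label to the minute at which its motion
    sensor fired. The tolerance is an assumed parameter.
    """
    alarms = []
    for label, expected in learned.items():
        if label not in observed:
            alarms.append(f"missing:{label}")
        elif observed[label] - expected > tolerance_min:
            alarms.append(f"late:{label}")
    return alarms


# Scenario Y: everything happens 9 minutes late, medication events are skipped.
scenario_y = {k: t + 9 for k, t in LEARNED.items() if k not in ("E", "F", "H")}
# Scenario Z: nothing is observed after the first washroom visit.
scenario_z = {"A": 0, "B": 5}
```

In this sketch a whole day is evaluated at once; a deployed system would need a timeout after each expected event before declaring it missing.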
13.2.5
Pressure Sensors
Pressure sensors offer another simple way to detect the activities of a person. There are two techniques to accomplish this sensing task. The first uses the entire floor of the house as a pressure sensing entity through pressure mats. Pressure sensing floor plates are discussed in the paper by Youssef Kaddourah et al. [16] with the Gator Tech smart house research group. The sensing method is similar to a touch screen
LCD. The screen is the floor and the stylus is the person. Position is determined by knowing which areas in the room are under pressure. The posture of a person is extrapolated from sensor input. For example, if the person suddenly disappears from the floor without going near the door, then two postures are possible: either the person is on the bed or is sitting in a chair. The final determination is made by checking the room layout to see the location of the nearest piece of furniture. A transient increase in pressure on a large floor area suggests that the person has fallen. This system is therefore effective in detecting the position and posture of the person. However, there is an overall lack of sensitivity in posture recognition. The system must know the exact location of the home furniture, which is assumed to be static. Installation is also non-trivial, since it must replace the regular flooring. The system is also limited in determining the actual activity of the person. Another approach using pressure sensors is to wear them on the body. This is the focus of the research of Danilo De Rossi et al. [17], with an emphasis on biomedical monitoring. A general approach is to place pressure sensors at key locations on the body with wireless modules for monitoring. The wireless module sends information to a central hub, which makes a decision about the posture or the physiological status of the person. For example, sensors are placed on the feet, hips, and back of the person. Walking, running, and standing are determined when the sensors on the feet are at full pressure and the hip and back sensors have no pressure. Sitting is determined by less pressure on the feet and full pressure on the hips. The presence or absence of back support while sitting is determined by the amount of pressure on the back. Sleeping is detected by partial pressure on the back and hips and no pressure on the feet.
A sudden fall is detected when the hip and back sensors show a transient full pressure while the feet show no pressure. The posture of the person is easily detected. The system can also detect the speed at which the person walks or runs. However, there are several limitations to this type of deployment. The system is unable to detect the actual activity of the person, and it cannot detect the location of the person in the house.
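The body-worn pressure rules described above can be written as a small decision function. This is a sketch under stated assumptions: pressures are taken to be normalized to the range 0 to 1, the thresholds `FULL` and `NONE` are hypothetical, and the hip sensor corresponds to the seat sensor in the description.

```python
FULL = 0.8  # hypothetical threshold for "full pressure" (normalized 0..1)
NONE = 0.1  # hypothetical threshold for "no pressure"


def classify_posture(feet, hips, back, transient=False):
    """Map normalized foot, hip, and back pressure to a coarse posture.

    `transient` marks a short-lived pressure spike, which the text
    associates with a fall.
    """
    if feet <= NONE:
        # No weight on the feet: the person is either lying down or has fallen.
        if hips >= FULL and back >= FULL and transient:
            return "fall"
        if hips > NONE and back > NONE:
            return "lying"
        return "unknown"
    if hips >= FULL:
        # Seated: back pressure tells whether a backrest is in use.
        if back > NONE:
            return "sitting with back support"
        return "sitting without back support"
    if feet >= FULL and hips <= NONE and back <= NONE:
        return "upright"  # standing, walking, or running
    return "unknown"
```

Distinguishing walking from running within the "upright" class would additionally require the timing of the foot-pressure pattern, which this sketch leaves out.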
13.2.6
Accelerometers
Another approach to basic activity recognition uses 3-axis digital accelerometers. According to publications by Williams et al. [18] and Doughty et al. [19], the basic criterion for detecting a fall is a change in body orientation from upright to lying that occurs immediately
after a large negative acceleration. These two conditions are used in the accelerometer-based fall detection algorithms described by Mathie et al. [20] and Salleh et al. [21]. A small electrical circuit with a 3-axis accelerometer and wireless transceiver is worn to determine various postures such as standing, walking, running, lying, falling, and sitting. Posture data is sent wirelessly to a central unit for continuous monitoring as well as logging for future use. In the event of a fall, the central unit sends an emergency message to the respective authority. Accelerometers can also be used to determine the physical energy expended in any human activity, as discussed by Montoye et al. [22]. This parameter can be monitored and used to determine whether the energy expended by the wearer for a particular activity (e.g., walking, running, or sleeping) is within the accepted limit. This technique gives a brief analysis of the physical health state of the patient. As with the other low-level sensors mentioned above, the main limitation of accelerometers is the inability to perform localization and tracking of a person. Accelerometers detect only basic human posture and activity patterns and cannot provide high-level, context-aware activity recognition. They are nevertheless a reliable basic building block for developing a context awareness system for use in a pervasive environment.
13.2.7
Accelerometers and Gyroscopes
A combination of sensors overcomes the main drawbacks of using accelerometers alone for posture or fall detection. The system described by Li et al. [23] uses gyroscopes and accelerometers to detect basic human activities. Abnormal activities such as a fall are determined by evaluating the angular velocity from the gyroscope and the magnitude of acceleration due to an abrupt transition between two body postures. Pairs of these sensor modules are attached to two different parts of the body to provide temporal accuracy. An accelerometer and gyroscope are placed on the thigh, and another pair is placed at the center of the chest. Distinct movements on each half of the body are then discernible for any basic posture or activity such as walking, running, sitting, etc. Data collected from the sensors are processed for basic parameters such as the magnitude of acceleration and the angle of inclination of the thigh and chest. The magnitude of acceleration is the RMS value of the acceleration in all three directions measured by an accelerometer. The angle of inclination for the chest and thigh is calculated using either gyroscope sensor data or the magnitude of acceleration in a particular direction. Data is tabulated with
typical threshold values for different scenarios to create basic posture ranges for monitoring. Simple algorithms are then used to detect different postures. Fall detection is inferred by the detection of a lying posture followed by a very high value of acceleration and angular velocities. This approach has high resolution for the detection of fall conditions. It also accurately distinguishes between normal user activity and a fall in cases such as sitting down quickly or falling on stairs. However, wearable sensors are required at two places on the body. Also, as with other low-level sensors, only basic posture is detected, without location or context awareness.
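The basic parameters named above can be computed as in the following sketch. Several points are assumptions rather than details from Li et al. [23]: the "RMS value in all three directions" is interpreted as the Euclidean norm of the three axes, the inclination is derived from the gravity component on one axis (valid only when the wearer is nearly static), and all thresholds in `is_fall` are hypothetical placeholders.

```python
import math


def acceleration_magnitude(ax, ay, az):
    """Euclidean norm of one 3-axis acceleration sample (input units)."""
    return math.sqrt(ax * ax + ay * ay + az * az)


def inclination_deg(ax, ay, az, vertical_axis=1):
    """Angle in degrees between the chosen sensor axis and gravity."""
    mag = acceleration_magnitude(ax, ay, az)
    if mag == 0:
        raise ValueError("zero acceleration vector has no direction")
    ratio = (ax, ay, az)[vertical_axis] / mag
    return math.degrees(math.acos(max(-1.0, min(1.0, ratio))))


def is_fall(peak_acc_g, peak_gyro_dps, chest_angle_deg, thigh_angle_deg,
            acc_thresh_g=3.0, gyro_thresh_dps=200.0, lying_angle_deg=60.0):
    """Flag a fall when both body segments are near horizontal and the
    transition showed high acceleration and angular velocity.
    All threshold defaults are hypothetical."""
    lying = (chest_angle_deg > lying_angle_deg
             and thigh_angle_deg > lying_angle_deg)
    violent = peak_acc_g > acc_thresh_g and peak_gyro_dps > gyro_thresh_dps
    return lying and violent
```

Using two segments is what separates fast sitting from a fall in this sketch: after sitting down quickly the thigh is horizontal but the chest remains upright, so `is_fall` returns False.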
13.3
Ultrasonic Activity Recognition Method
A new system combining ultrasonic measurement with motion sensors is proposed to detect the posture and activity of a person. The system is virtually non-intrusive in the lives of elderly people. The basic concept of mounting ultrasonic sensors in the ceiling was introduced by Nishida et al. [24]. Their system detects a person according to height information. Sensors detect the distance to the nearest object below them by measuring the round trip time of a reflection, as shown in Fig. 13.2. Changes in the distance between the sensor and the subject are captured and stored in a remote computer. Activity and pose are monitored, and alarms are triggered if at-risk behavior is detected.
Fig. 13.2 Usage of ultrasonic Rx-Tx modules on ceiling
Three different cases are shown in Fig. 13.2. In case ‘A’ the subject is standing, so the distance detected is the average of the reflections from the top of the head. Calibration is performed on the room to determine where furniture is placed. This is illustrated as case ‘C’, which provides the reference signal for the room provided the furniture is not regularly moved. In case ‘B’ the distance is midway between standing and not present, so the subject is assumed to be sitting, which can be further verified by evaluating the static room configuration. If the subject is walking, the (X, Y) sensor detecting the person will change. The rate at which two adjacent sensors detect activity provides the speed of walking. If the subject falls down, there is a sudden transition from ‘A’ to a small distance off the floor. The rate of transition decides whether the subject has fallen or not. Additionally, multiple sensors can be evaluated, since the person will probably be prone and activating more than one output. Emergency calls can be made in such cases.
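The round-trip measurement and the three cases of Fig. 13.2 can be sketched as follows. The speed of sound (about 343 m/s at room temperature, converted to inches per second) is a physical constant; the tolerance, the near-floor band, the one-second transition limit, and the function names are hypothetical choices for illustration.

```python
SPEED_OF_SOUND_IN_PER_S = 13503.9  # about 343 m/s, expressed in inches/second


def echo_distance_in(round_trip_s, speed=SPEED_OF_SOUND_IN_PER_S):
    """Distance to the reflecting object; the pulse travels there and back."""
    return speed * round_trip_s / 2.0


def classify_reading(distance_in, standing_in, background_in,
                     tol_in=4.0, floor_band_in=15.0):
    """Label one ceiling-sensor reading (all values in inches from ceiling).

    standing_in   : reading when the subject stands below the sensor (case A)
    background_in : calibrated reading of the empty room (case C)
    """
    if abs(distance_in - background_in) <= tol_in:
        return "not present"
    if abs(distance_in - standing_in) <= tol_in:
        return "standing"
    if distance_in >= background_in - floor_band_in:
        return "near floor"
    if standing_in < distance_in < background_in:
        return "sitting"  # case B: between standing and not present
    return "unknown"


def is_sudden_drop(prev_label, label, dt_s, max_transition_s=1.0):
    """A fast transition from standing to near the floor suggests a fall."""
    return (prev_label == "standing" and label == "near floor"
            and dt_s <= max_transition_s)
```

For example, with a 96 inch ceiling and a standing reading of 25 inches, a reading of 60 inches is labeled as sitting, while a jump from 25 to 88 inches within a fraction of a second is flagged by `is_sudden_drop`.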
13.3.1
Ultrasonic Sensor Selection
The system employs a network of ultrasonic sensors that can detect objects from 6 to 255 inches. Two ultrasonic sensors offered by MaxBotix are used for experimental measurement. The LV-MaxSonar-EZ0 has the widest beam and the LV-MaxSonar-EZ4 has the narrowest. These sensors are evaluated to detect human presence and activity. Both sensors are placed on the ceiling of a room measuring approximately 96 inches from ceiling to carpet. The height of the subject passing under the sensor is 71 inches. Each sensor is evaluated under identical conditions and the output is observed. The software provided with the sensors is used for these experiments. Graph plots are shown in Fig. 13.3 to illustrate the output of the sensors when an obstacle is detected.
Fig. 13.3 Graphical Plots for evaluation of the EZ0 and EZ4 Ultrasonic Sensors
In Fig. 13.3, the EZ0 on the left provides a smooth output when a person walks under the
detector. The output remains around 25 inches for the time the person is under the sensor. The output of the EZ4 on the right does not stabilize when triggered by a person walking under the sensor. Hence the EZ0 proves to be more useful than the EZ4 for basic activity detection.
Fig. 13.4 Sensor output for EZ0 and EZ4 when subject is seated
Another experiment is performed to compare the output of the two sensors when a person sits on a chair. As shown on the left of Fig. 13.4, the output of the EZ0 decreases and then remains stable. When the subject performs the same activity under the EZ4, shown on the right, the output saturates at its maximum value because of lobe reflections due to the narrow beam width of the EZ4 sensor. Based on this data, the EZ0 is also the better choice for sensing a seated posture.
Fig. 13.5 Beam width for the EZ0 sensor
In order to deploy these sensors in an array, the beam width of the EZ0 must be characterized. If an object is out of the beam width, the output should give a reading of 96 inches. An obstacle is placed on the floor to measure the beam width by moving it from a safe distance until the deflection is detected for the obstacle height. The obstacle is 5 inches high for this experiment. A conical beam width is detected as expected and illustrated in Fig. 13.5.
13.3.2
Construction of the System
An overview of the system is shown in Fig. 13.6. A motion sensor is used to turn on the ultrasonic sensing in the environment. Once activity is detected, the embedded computer triggers the activation of the room sensors to map the obstacles and activity in the environment. Data is collected in time frames that show the status of the room sensors for monitoring of activity. These frames are time stamped and transmitted through the wireless interface to a remote monitoring platform. Signal processing is performed on the raw data to alert care providers of at-risk activities.
Fig. 13.6 Ultrasonic Activity Detection System Deployment Scenario
The location and height of the individual are determined by measuring the difference between the transmitted and the received signal timings. An ultrasonic sensor generates an
analog signal proportional to the distance at 10 mV per inch, which is converted to a proportional digital signal by the microcontroller. This digitized signal is processed using clutter removal and side lobe removal algorithms. Output is transmitted through the serial port to the embedded microprocessor, which logs each sensor in the room. Data is transferred to a remote computer using the Zigbee module. Signal processing is then performed on the remote computer to determine if the person is at risk of injury and needs additional assistance. An average sized room with dimensions of 144 (length) × 144 (width) × 96 (height) inches is used for testing. Each sensor can cover a circular area of 3.14 × 18² ≈ 1018 sq. inches on the floor. To provide the best coverage of a square room, 25 sensors are used in a 5 row by 5 column configuration.
13.3.3
System Operation
Posture is determined by measuring the height of the person in the room. It is assumed that identification of the person is performed by a different technology. Another assumption is that monitoring is focused on a single individual. The system is therefore most beneficial in areas such as the bedroom and bathroom, where an individual would be alone and would want to maintain privacy. The ultrasonic system detects the obstacles in the room and maps the entire room for the stationary environment. Once this is done, a person is detected in a manner similar to the initial experiments with the EZ0 sensor. If the height of the room is x inches and the height of the person is y inches, then a reading of x − y inches is captured when the person is under a particular sensor node. As soon as the motion sensor detects a person entering the room, the sensors are triggered and a full image of the room is captured. The maximum rate at which a sensor can capture activity information is 20 Hz. Sensor data is assembled into a frame when the central node has received all the information for the room. The system remains active as long as the motion sensor detects activity within 10 consecutive frame captures, which is approximately 0.5 seconds. This can be adjusted as necessary to enhance the sensitivity of the system, but the primary goal is to save energy by shutting down when activity is not detected. An overview of the detection scenario is shown in Fig. 13.7 with a room that has a bed, sofa, and chair. The frame on the left of Fig. 13.7 corresponds to the matrix without a subject in the room. Stationary objects are detected by variations in the readings. Assuming the
304
Activity Recognition in Pervasive Intelligent Environments
maximum distance from the ceiling to the floor is 96 inches, the maximum value output when no obstacles are present is the 96. The bed returns a reading if 50 which is 46 inches from the floor. A sofa returns a reading of 75 which is 21 inches from the floor and a chair returns a reading of 81 which is 15 inches from the floor.
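The conversion from a ceiling-mounted range reading to a height above the floor is simple arithmetic. The following Python sketch reproduces it for the example readings given above; the function name `height_above_floor` and the dictionary layout are illustrative assumptions and are not part of the deployed system.

```python
CEILING_HEIGHT_IN = 96  # ceiling-to-floor distance assumed in the text (inches)

def height_above_floor(reading_in, ceiling_in=CEILING_HEIGHT_IN):
    """Convert a ceiling-to-obstacle range reading into a height above the floor."""
    if not 0 <= reading_in <= ceiling_in:
        raise ValueError("reading outside the ceiling-to-floor range")
    return ceiling_in - reading_in

# Example readings taken from the background frame described in the text.
background_readings = {"floor": 96, "bed": 50, "sofa": 75, "chair": 81}
object_heights = {name: height_above_floor(reading)
                  for name, reading in background_readings.items()}
# object_heights == {"floor": 0, "bed": 46, "sofa": 21, "chair": 15}
```

The same function applies to a person standing under a sensor: a reading of 21 corresponds to a height of 75 inches.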
Fig. 13.7 Frameset Matrices
When a person enters the room, another frameset matrix is collected. This frameset matrix is shown on the right of Fig. 13.7. The sensor node near the door changes output and detects the height of the person. Other sensors are also impacted, based on the overlap of the adjacent beams. As stated before, the calibrated height of the individual is determined by the initial reading at the door. The reading of 21 indicates the individual is 75 inches tall for this example.
Fig. 13.8 Subtraction of two consecutive Frameset Matrices
Detection of the individual can be further refined by subtracting the background environment as shown in Fig. 13.8. Location is readily available using this method and agrees with the height calculation when the person initially enters the room as shown in Fig. 13.7. Information about the room can be used when pose estimation is necessary but is not needed to track the individual moving in the room while they are standing. This approach makes tracking very simple since only one piece of data is needed for evaluation. The rate at which the person moves from one sensor to the next and the value after subtraction of the background are used to determine the exact activity and pose.
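Background subtraction on the frameset matrices can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the 5 × 5 layout, the positions of the bed and chair, the `min_rise_in` threshold, and all function names are hypothetical, and the overlap between adjacent beams mentioned above is ignored.

```python
CEILING_IN = 96  # assumed ceiling height in inches (example value from the text)

def make_background():
    """Hypothetical 5 x 5 background frame: floor everywhere, plus a bed and a chair."""
    frame = [[CEILING_IN] * 5 for _ in range(5)]
    frame[0][3] = frame[0][4] = 50   # bed (46 inches high) spanning two cells
    frame[4][4] = 81                 # chair (15 inches high)
    return frame

def subtract_background(background, frame):
    """Per-cell difference: how far the current surface rises above the background."""
    return [[b - f for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]

def locate_person(diff, min_rise_in=12):
    """Return (row, col, rise) of the largest rise above the background, or None."""
    best = None
    for r, row in enumerate(diff):
        for c, rise in enumerate(row):
            if rise >= min_rise_in and (best is None or rise > best[2]):
                best = (r, c, rise)
    return best

background = make_background()
frame = [row[:] for row in background]
frame[4][0] = 21                     # person standing under the sensor near the door
diff = subtract_background(background, frame)
person = locate_person(diff)         # (4, 0, 75): a 75-inch rise at row 4, column 0
```

Because the background has already been removed, a single non-zero cell gives both the location and the height of the person, which is why tracking needs only one piece of data per frame.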
13.3.4
Activity and Pose Recognition
The exact action of a person can be determined by the rate at which the position of the person varies in the subtraction matrix, as shown in the sequences of Fig. 13.9. If the rate of change of position is high, the person is walking very fast or running; if the change is slow, the person is walking slowly, so the speed of the motion is easy to classify. In order to improve the accuracy of the subtraction matrix, further calibration is performed to reduce the image to a single sensor value. This is accomplished by modifying the beam width of the sensor node and evaluating the placement so that the main lobes of adjacent sensor nodes do not interfere with each other.
Jumping or standing at a particular spot is also discernible by monitoring the returned value on the active node. Slight variations in displacement indicate the transition from standing to jumping, as demonstrated in Fig. 13.10. Standing is shown as the value 75 and the jump ranges from a crouching position of 72 to an extended position of 79.
Fig. 13.9 Consecutive subtraction matrices are shown for two poses
Fig. 13.10 Detection of Jumping
The detection of sitting, falling, and sleeping requires additional information about the environment. Objects are identified using dimensions from the matrix frameset. The position of the person, their height, and their relation to detected fixed objects are used to determine the exact pose. A description of sitting detection is provided in Fig. 13.11. The person walks through the room at the expected height of 75 inches and then disappears in the corner. After review of the background information, it is determined that the chair height increases from 15 inches to 44 inches, which corresponds to the person sitting in the chair.
The detection of a fall is similar to the sitting detection and is illustrated in Fig. 13.12. A person enters the room in the first frame and moves toward the bed in the second frame. In the third frame, the person disappears from view at the calibrated height. The disappearance of the person is not accompanied by an increase in the height of the sofa, bed or chair. The person also does not disappear at a place where an exit to the room is available. A new obstacle is detected near the bed, which indicates a sudden fall. Additionally, the increase in height is observed not just for a single sensor cell but for two (or more) sensor cells, since the person is prone on the floor. The time taken by the person to go from standing to the floor determines whether the action is an accidental fall or lying down with intent.
Detection of sleeping is similar to the detection of falls. If the person walks toward the bed and suddenly disappears with a nominal increase in the height of the bed across two cells, this indicates that the person is lying down on the bed.
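The rules described above for interpreting a person's disappearance at the calibrated height can be summarised as a small decision procedure. The Python sketch below is illustrative only: the function name, its parameters, the returned labels, and in particular the `FALL_TRANSITION_S` threshold are assumptions, since the text does not state the transition time used to separate a fall from lying down with intent.

```python
FALL_TRANSITION_S = 1.0  # hypothetical threshold; the text gives no value

def classify_disappearance(furniture, cells_raised, near_exit, transition_s):
    """Classify what happened when a tracked person vanishes at standing height.

    furniture     -- name of the fixed object whose height increased, or None
    cells_raised  -- number of sensor cells whose measured height increased
    near_exit     -- True if the person was last seen at an exit of the room
    transition_s  -- time taken to go from standing to the new posture (seconds)
    """
    if furniture in ("chair", "sofa") and cells_raised >= 1:
        return "sitting"
    if furniture == "bed" and cells_raised >= 2:
        return "lying on bed"
    if furniture is None and near_exit:
        return "left room"
    if furniture is None and cells_raised >= 2:
        # A new obstacle on the floor spanning two or more cells: the person is prone.
        if transition_s < FALL_TRANSITION_S:
            return "fall"
        return "lying on floor"
    return "unknown"
```

For example, the sitting sequence of Fig. 13.11 (chair height rising from 15 to 44 inches in one cell) would map to `classify_disappearance("chair", 1, False, 2.0)`, while the fall of Fig. 13.12 corresponds to a call with `furniture=None`, two raised cells, and a short transition time.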
Fig. 13.11 Matrix sequence for detection of sitting
Fig. 13.12 Fall detection with ultrasonic system
13.3.5
Open Issues and Drawbacks
The preceding sections show the feasibility of detecting posture and activity using ultrasonic range sensors. Several pragmatic problems remain to be resolved in the next phase of the system deployment. Ceiling levels very close to the person’s height can create problems in sensitivity, an irregular room structure may create problems in fitting the sensors, and the system may generate errors when water is present during activity detection in the bathroom.
Fig. 13.13 Sitting detection with ultrasonic sensors
Additional testing is currently in progress to resolve these issues and fine-tune the performance of the system. The system currently in the lab uses eight EZ0 sensors and displays four levels of output, as shown in Fig. 13.13. A green light is displayed when the person is standing, a yellow light when the person is seated (as in Fig. 13.13), and a red light when the person is on the ground. Otherwise the display is white. The ceiling shown in Fig. 13.13 has two sensors mounted in the tile to capture the readings. Signals are sent from an FPGA interface in the ceiling to a VGA monitor for display of the location and position of individuals.
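The four output levels amount to a lookup from detected posture to display colour. A minimal Python sketch follows; the posture labels and names are hypothetical, and the lab system itself drives the display from an FPGA rather than from software like this.

```python
# Posture labels are illustrative; the colours are those described in the text.
POSTURE_COLOURS = {
    "standing": "green",
    "seated": "yellow",
    "on_ground": "red",
}

def display_colour(posture):
    """Return the display colour for a posture; white when none of the three applies."""
    return POSTURE_COLOURS.get(posture, "white")
```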
Further evaluation of the system performance is planned. A pilot site is currently in development at a local assisted living community. Data will be collected over several weeks in the bedroom and bathroom for two residents to monitor activity and pose. The density of sensors for each room will be around twenty-five and data will be logged for evaluation and classification of activity. Alarms will be sent if activity resembling a fall occurs while the individual is alone. Additional logging of activity patterns will be performed on a remote computer and compared to the daily activity patterns monitored by the nursing staff.
13.4
Conclusion
In this chapter a comparison of sensing methods and systems for detecting the position and posture of elderly persons in their homes was presented. Each method has merits and demerits, so a combined approach is required to fully capture the activity in a pervasive environment. The use of machine vision is a good approach, considering the recent advancements in the field of image processing. However, in some locations of the home, such as the bathroom and bedroom, the presence of a camera may increase concerns for privacy and cause risky modifications in behavior patterns. This method also requires processing a large amount of data if used as the only method of activity detection. It works well when used selectively in the pervasive environment to improve the detection accuracy of other sensing methods.
RFID is another popular technique for activity monitoring. It can be used to determine patterns of activity through the use of tags in the environment and a wearable reader; however, this method fails to provide posture information when used as the sole means of monitoring. WISP is another RFID technique that is gaining popularity due to its low cost and simplicity. However, a large amount of infrastructure is required for deployment, since WISP requires tagging of even the smallest objects, which is not practical.
Another approach involves networked motion sensors. This is a simple approach to activity detection; however, the system takes a long time to learn activity patterns and is incapable of detecting posture. It is also difficult to distinguish multiple users in the environment. Fixed pressure sensors are another method for location detection, but the posture of the person is not discernible unless the sensors are worn. Deployment of the sensing tiles also increases the cost of the system. One of the most recent fields of research is the use of wearable accelerometers and gyroscopes for posture detection.
This is a very reliable and accurate method, but this kind of system cannot work alone. The position of the person in the house is not detected and the activity is difficult, if not impossible, to discern. It does, however, provide very accurate posture recognition.
The proposed ultrasonic sensing method solves several of the issues in the current technology. It is also fixed as part of the infrastructure, so it does not require the user to wear additional devices. Ultrasonic sensors combined with motion sensors create a system that is robust and dynamic. The system can learn from the changing environment and also update the room configuration. The approach is novel and leaves considerable scope for further refinement and development. Enhancements to the deployment of the ultrasonic sensing method are in progress to satisfy the safety and independence needs of the ever-growing elderly community.
References
[1] Helal, S., Mann, W., El-Zabadani et al., The Gator Tech Smart House: A Programmable Pervasive Space, Computer, vol. 38, no. 3, pp. 50–60, (2005).
[2] Sajal K. Das and Diane J. Cook, Designing Smart Environments: A Paradigm Based on Learning and Prediction, Lecture Notes in Computer Science: Pattern Recognition and Machine Intelligence, vol. 3776, pp. 80–90, (2005).
[3] Weiming Hu, Tieniu Tan, Liang Wang, and Steve Maybank, A Survey on Visual Surveillance of Object Motion and Behaviors, IEEE Transactions on Systems, Man, and Cybernetics-Part C: Applications and Reviews, vol. 34, no. 3, pp. 334–352, (2004).
[4] G. Diraco, A. Leone, and P. Siciliano, An Active Vision System for Fall Detection and Posture Recognition in Elderly Healthcare, Design, Automation, and Test in Europe Conference and Exhibition, pp. 1536–1541, (2010).
[5] Jezekiel Ben-Arie, Zhiqian Wang, Purvin Pandit, and Shyamsundar Rajaram, Human Activity Recognition Using Multidimensional Indexing, IEEE Transactions On Pattern Analysis And Machine Intelligence, vol. 24, no. 8, pp. 1091–1104, (2002).
[6] Klaus Finkenzeller, RFID Handbook, Second Edition, John Wiley and Sons, Chapters 3, 4, and 9, (2003).
[7] Fishkin, K.P., Philipose, M., Rea, A., Hands-on RFID: wireless wearables for detecting use of objects, Ninth IEEE International Symposium on Wearable Computers, pp. 38–41, (2005).
[8] Joshua R. Smith, Kenneth P. Fishkin, Bing Jiang, Alexander Mamishev, Matthai Philipose, Adam D. Rea, Sumit Roy, and Kishore Sundara-Rajan, RFID-Based Techniques For Human Activity Detection, Communications of the ACM, vol. 48, no. 9, pp. 39–44, (2005).
[9] Donald J. Patterson, Dieter Fox, Henry Kautz, and Matthai Philipose, Fine-Grained Activity Recognition by Aggregating Abstract Object Usage, Ninth IEEE International Symposium on Wearable Computing, pp. 44–51, (2005).
[10] Tomohito Inomata, Futoshi Naya, Noriaki Kuwahara, Fumio Hattori, Kiyoshi Kogure, Activity Recognition from Interactions with Objects using Dynamic Bayesian Network, Proceedings of the 3rd ACM International Workshop on Context-Awareness for Self-Managing Systems, pp. 39–42, (2009).
[11] S. Srinivasa Rao, E. G. Rajan, and K. Lalkishore, Human Activity Tracking using RFID Tags, International Journal of Computer Science and Network Security, vol. 9, no. 1, pp. 387–394, (2009).
[12] RFID handbook: application, technology, security, and privacy, Chapter 14, WISP: A Passively Powered UHF RFID Tag with Sensing and Computation, editors, Syed Ahson and Mohammad Ilyas, CRC Press, (2008).
[13] Jianxin Wu, Adebola Osuntogun, Tanzeem Choudhury, Matthai Philipose, and James M. Rehg, A Scalable Approach to Activity Recognition based on Object Use, IEEE 11th International Conference on Computer Vision, pp. 1–8, (2007).
[14] Sangho Park and Henry Kautz, Privacy-preserving Recognition of Activities in Daily Living from Multi-view Silhouettes and RFID-based Training, AAAI Fall 2008 Symposium on AI in Eldercare: New Solutions to Old Problems, (2008).
[15] Ann Cavoukian, Angus Fisher, Scott Killen, and David A. Hoffman, Remote home health care technologies: how to ensure privacy? Build it in: Privacy by Design, Identity in the Information Society, Springer, published online 22 May, (2010).
[16] Y. Kaddourah, J. King, and A. Helal, Cost-precision tradeoffs in unencumbered floor-based indoor location tracking, Proc. 3rd Int. Conf. Smart Homes Health Telematics, (2005).
[17] Danilo De Rossi, Federico Carpi, Federico Lorussi, Alberto Mazzoldi, Rita Paradiso, Enzo Pasquale Scilingo, and Alessandro Tognetti, Electroactive Fabrics and Wearable Biomonitoring Devices, AUTEX Research Journal, vol. 3, no. 4, pp. 180–185, (2003).
[18] G. Williams, K. Doughty, K. Cameron and D.A. Bradley, A smart fall and activity monitor for telecare applications, Proc. 20th Annual Int. Conf. of the IEEE Engineering in Medicine and Biology Society, vol. 3, pp. 1151–1154, (1998).
[19] K. Doughty, R. Lewis, and A. McIntosh, The design of a practical and reliable fall detector for community and institutional telecare, J. Telemed. Telecare, vol. 6, pp. 150–154, (2000).
[20] M. Mathie, J. Basilakis, and B.G. Celler, A system for monitoring posture and physical activity using accelerometers, 23rd Annual Int. Conf. IEEE Engineering in Medicine and Biology Society, vol. 4, pp. 3654–3657, (2001).
[21] R. Salleh, D. MacKenzie, M. Mathie, and B.G. Celler, Low power tri-axial ambulatory falls monitor, Proc. 10th Int. Conf. on Biomedical Engineering, (2000).
[22] Henry J. Montoye, Han C. G. Kemper, Wim H. M. Saris, and Richard A. Washburn, Measuring physical activity and energy expenditure, Human Kinetics, pp. 72–96, (1996).
[23] Quang Li, John A. Stankovic, Mark A. Hanson, Adam T. Barth, John Lach, and Gang Zhou, Accurate, Fast Fall Detection using Gyroscopes and Accelerometer-Derived Posture Information, Body Sensor Networks, (2009).
[24] Yoshifumi Nishida, Shin’ichi Murakami, Toshio Hori, and Hiroshi Mizoguchi, Minimally Privacy-Violative Human Location Sensor by Ultrasonic Radar Embedded on Ceiling, Proceedings of IEEE Sensors, vol. 1, pp. 433–436, (2004).
Chapter 14
Activity Recognition and Healthier Food Preparation
Thomas Plötz¹, Paula Moynihan², Cuong Pham¹, and Patrick Olivier¹
Newcastle University, Newcastle upon Tyne, U.K.
Abstract
Obesity is an increasing problem for modern societies, which implies enormous financial burdens for public health-care systems. There is growing evidence that a lack of cooking and food preparation skills is a substantial barrier to healthier eating for a significant proportion of the population. We present the basis for a technological approach to promoting healthier eating by encouraging people to cook more often. We integrated tri-axial acceleration sensors into kitchen utensils (knives, scoops, spoons), which allows us to continuously monitor the activities people perform while acting in the kitchen. A recognition framework is described, which discriminates ten typical kitchen activities. It is based on a sliding-window procedure that extracts statistical features for contiguous portions of the sensor data. These frames are fed into a Gaussian mixture density classifier, which provides recognition hypotheses in real-time. We evaluated the activity recognition system by means of practical experiments of unconstrained food preparation. The system achieves classification accuracy of ca. 90% for a dataset that covers 20 persons’ cooking activities.
14.1
Introduction
A number of studies have identified that high energy density, low-fibre diets are major risk factors in the development of obesity [1]. Recent research indicates that diet alone can explain the observed trend of increasing obesity [2]. In a global setting, a number of educational initiatives such as the “5 A Day for Better Health” in the USA and “Change4Life” in the UK have attempted to promote the benefits of healthier eating. There is evidence that these programs have raised awareness of the importance of maintaining a healthier diet [3, 4]. However, there is little evidence that such changes in awareness have led to any significant improvement in dietary intake [3–5]. Furthermore, there is a growing evidence base to indicate that a lack of cooking and food preparation skills, in addition to a lack of nutritional knowledge, is a barrier to healthier eating for a significant proportion of the population.
¹ School of Computing Science
² Institute for Ageing and Health
The ever-decreasing size and cost of embedded sensing technology have given rise to the possibility of “situated” digital interventions to promote healthier eating. Key to such pervasive computing applications is the automatic monitoring and analysis of the activities people undertake in their kitchens, by which the nutritional quality of the food people prepare and eventually consume can be monitored. Activity recognition in the kitchen can also underpin the provision of prompts and advice to augment and enhance people’s cooking skills, which can also play a role in improving their nutritional intake. Although kitchens may contain food preparation and cooking “appliances”, they have not in general been considered as sites for the application of digital technologies. Consequently, activity recognition in the kitchen must be realized through the sensitive integration of technology into the home environment and existing tools, and poses an interesting challenge that requires robust processing of noisy data captured in real-world settings.
The contribution of this chapter is two-fold. First, we will summarize related work on activity recognition in kitchen scenarios. This also includes an overview of the Ambient Kitchen, a high fidelity prototype for exploring the design of pervasive computing algorithms and applications for everyday environments. The focus of the Ambient Kitchen is on real-world applications since it is intended as a design tool for applications to be deployed in real households. Our second contribution relates to the recognition of food preparation activities using sensor-equipped kitchen utensils.
The key innovation here is the use of ordinary kitchen utensils, such as knives and spoons, that have been augmented with embedded accelerometers to create a multi-sensor network. The integrated real-time analysis of these sensory data provides the basis for real-time detection and recognition of kitchen activities such as peeling, chopping and stirring [6]. Our initial goal is to develop a completely non-intrusive activity recognition system, that is, one that has no unwarranted impact on a person’s food preparation activities. From this, substantial insights about the food preparation process can be gained. The output of the system, the activity-based annotation of time-series data, serves as high-level features for subsequent knowledge inference and prompting systems. Utensil-based activity recognition has been evaluated by means of real-world experiments involving a large number of naive users preparing food in an unconstrained setting.
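The abstract describes the recognition framework as a sliding-window procedure that extracts statistical features from contiguous portions of the accelerometer data. The Python sketch below illustrates only that feature extraction step, using per-axis mean and variance as the statistical features. The window length, step size, and all function names are assumptions made for illustration; the values used in the actual system are not given here, and the Gaussian mixture density classifier that consumes the frames is not shown.

```python
from statistics import mean, pvariance

def sliding_windows(samples, window, step):
    """Yield consecutive fixed-length frames from a list of (x, y, z) samples."""
    for start in range(0, len(samples) - window + 1, step):
        yield samples[start:start + window]

def frame_features(frame):
    """Per-axis mean and variance for one frame: [mx, vx, my, vy, mz, vz]."""
    features = []
    for axis in zip(*frame):          # regroup the samples axis by axis
        features.append(mean(axis))
        features.append(pvariance(axis))
    return features

def extract_features(samples, window=4, step=2):
    """Turn a stream of tri-axial samples into one feature vector per frame."""
    return [frame_features(f) for f in sliding_windows(samples, window, step)]

# Toy tri-axial signal: x oscillates, y is at rest, z carries a constant offset.
samples = [(0, 0, 1), (2, 0, 1), (0, 0, 1), (2, 0, 1), (0, 0, 1), (2, 0, 1)]
features = extract_features(samples, window=4, step=2)
# Two overlapping frames, each yielding [1, 1, 0, 0, 1, 0]
```

With a step smaller than the window length, consecutive frames overlap, which is a common choice when hypotheses are needed at a higher rate than one per window.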
14.2
The Role of Technology for Healthier Eating
There is convincing evidence that a healthier diet generally protects against diet-related conditions such as obesity, diabetes, heart disease and even cognitive decline. Conventional intervention and education programs usually represent the methodology of choice to direct people’s attention towards healthier nutrition, though often with only limited success. For many people, difficulties in preparing healthier food have been identified as one important cause of poor diet [7]. Technology-based approaches that support people in their kitchen activities can be the key to overcoming this fundamental barrier to healthier eating.
14.2.1
Current dietary guidelines
Current global dietary guidelines promote a diet that is low in total fat, saturated fat and free sugars (added sugars plus those found in natural juices, syrups and honey) and high in fruits, vegetables and dietary fibre. This type of diet will help protect against many diet-related chronic conditions including cardiovascular disease (heart attack and stroke), some cancers (e.g. bowel cancer, oral cancer), diabetes, obesity and tooth decay [8]. The World Health Organization recommends that 15 to 30% of total dietary energy (kilocalories) should come from dietary fat and that intake of free sugars should be restricted to less than 10% of total energy intake [8]. An intake of at least 400 g of fruits and vegetables per day is recommended as they are good sources of vitamins, minerals and fibre. Wholegrain varieties of cereals are recommended as these too are rich in micronutrients and dietary fibre. Diets high in fat and/or free sugars, i.e. ‘energy-dense diets’, are a risk factor for the development of obesity. In contrast, a high intake of dietary fibre, regular physical activity and home environments that promote healthier food and activity choices may protect against the development of overweight and obesity. Consumption of saturated fats increases the risk of developing type 2 diabetes, whereas a diet rich in wholegrain foods, fruits and vegetables is associated with a reduced risk of diabetes. Saturated fats, especially those found in dairy products and meat, and trans fats increase the risk of heart disease and stroke, whereas unsaturated fats (e.g. sunflower and olive oils) and fish oils (found in oily fish) are associated with a lower risk of these diseases. A high intake of salt can increase blood pressure and the risk of stroke and coronary heart disease, and therefore restricting total salt intake to less than 5 g per day (about one teaspoon) is recommended. A high-fibre diet, rich in wholegrain foods and fruits and vegetables, can reduce the risk of cardiovascular disease.
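The quantitative recommendations above can be checked against a day's intake with a few lines of arithmetic. The Python sketch below is an illustration only: the function and key names are hypothetical, and the energy conversion factors (roughly 9 kcal per gram of fat and 4 kcal per gram of sugar) are the commonly used approximate values, not figures taken from this chapter.

```python
KCAL_PER_G_FAT = 9     # approximate energy content of fat (assumption, see lead-in)
KCAL_PER_G_SUGAR = 4   # approximate energy content of sugars (assumption)

def check_against_guidelines(total_kcal, fat_g, free_sugar_g, fruit_veg_g, salt_g):
    """Compare one day's intake with the thresholds quoted in the text.

    Returns a dict mapping each guideline to True when the intake complies.
    """
    fat_pct = 100 * fat_g * KCAL_PER_G_FAT / total_kcal
    sugar_pct = 100 * free_sugar_g * KCAL_PER_G_SUGAR / total_kcal
    return {
        "fat": 15 <= fat_pct <= 30,      # 15 to 30% of total energy from fat
        "free_sugars": sugar_pct < 10,   # less than 10% of energy from free sugars
        "fruit_veg": fruit_veg_g >= 400, # at least 400 g of fruits and vegetables
        "salt": salt_g < 5,              # less than 5 g of salt per day
    }

# Example day: 2000 kcal, 60 g fat (27%), 40 g free sugars (8%),
# 300 g fruit and vegetables, 6 g salt.
result = check_against_guidelines(2000, 60, 40, 300, 6)
# result == {"fat": True, "free_sugars": True, "fruit_veg": False, "salt": False}
```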
14.2.2
Barriers to healthier eating with focus on preparation
Many barriers to consuming a healthier diet have been identified, including poor nutritional knowledge, taste preferences for less healthy foods, budgetary restrictions, restricted access to a variety of foods, and chewing and eating problems. In addition to these barriers, research has shown that limited food preparation and cooking skills are also a barrier to eating more healthily [9]. Possession of food preparation skills has the potential to empower individuals to make beneficial food choices. To be able to make a range of healthy meals, a number of basic food preparation and cooking skills are required. There is some evidence to show that the level of food preparation within the home is related to a healthier diet [10] and that avoidance of cooking due to a busy lifestyle may be a factor contributing to higher energy (kilocalorie) intakes in women [11].
14.2.3
Why a technology-based approach to healthier cooking?
Cooking method can have a major impact on the nutrient profile of a meal. For example, grilled meat will have significantly less fat than fried meat. Cooking potatoes in unsaturated vegetable oil in preference to roasting them in animal fat will significantly affect the saturated fat content of the meal. Adding more vegetables to dishes such as casseroles, pizza or pasta sauces, and choosing and preparing wholegrain varieties of foods (e.g. wholemeal breads), will have a positive impact on the nutrient profile of the prepared food. Finding means of encouraging and promoting healthier food preparation practices on a daily basis within the home is therefore a challenge which could be addressed using pervasive technology. Despite possession of cooking skills being one of the factors indicative of a better diet, cooking is being taught less in schools today than it was several decades ago. Community-based cooking initiatives that aim to address the lack of cooking skills in today’s society have met with some success. However, partaking in such initiatives requires motivation, commitment, traveling and time, which may not suit contemporary busy lifestyles. Alternative means of encouraging and developing the population’s confidence and skill to prepare food are therefore necessary. Pervasive computing and the use of situated digital interventions provide an alternative, modern platform for teaching cooking within the home. Such technology may be used to teach cooking skills, to provide prompts to cook using healthier ingredients and methods, and to increase overall cooking confidence. It is also possible to provide individualized feedback on the nutritional profile of the food prepared and on the impact that changing cooking methods and ingredients has on that profile. This warrants exploration, as providing individualized feedback is one effective means of encouraging positive dietary behavior.
14.2.4
Evaluation and assessment of cooking skills
Nutrition education programs often promote increased cooking as a means to improve diet, yet there are few data on the relationship between the actual level of cooking activity and skill and food and nutrient intake. One explanation for this could be the lack of valid and sensitive measures of cooking activity and skill. Most research to date has relied on self-reported perception of cooking confidence or used questionnaires to assess the level of cooking knowledge and/or activity. Employing embedded sensing technology to monitor food preparation behavior and activity in situ may provide a more scientifically robust means of assessing cooking skills, activity and the impact on dietary behavior.
14.3
Activity Recognition in the Kitchen – The State-of-the-Art
Technology-supported food preparation substantially relies on the automatic analysis of activities taking place in people’s kitchens. In the literature a number of approaches have been described where, for this purpose, sensors capable of sensing the environment at sufficient granularity have been added to either traditional or enhanced lab-based kitchens. Given this technological basis, a number of applications using activity recognition and context-aware computing are imaginable. In the following we will provide a summary of related work, which includes a brief overview of sensor-based activity recognition in general and a summary of the most relevant instrumented kitchen applications.
14.3.1
Sensor-based Activity Recognition
A considerable body of research addresses activity recognition using wearable sensors (see the recent survey of Atallah and Young [12]). In many of these studies accelerometers are worn by people engaged in activities of daily living (ADL). These ADL normally involve gross movements of the body such as sitting down, walking, cycling, etc. In a small number of cases sensors have been embedded into tools and utensils to allow the analysis of more fine-grained ADL [6, 13]. Most activity recognition (AR) techniques utilize a sliding window approach, extracting fixed-length portions of sensor data for which features are calculated; Huynh and Schiele provide a comparison of feature extraction methods for AR [14]. The most widely adopted approach is to use either frame-based simple statistical features (such as the mean and variance) or Fourier descriptors. Differences exist in the details of the parameterizations used (window lengths, selection of coefficients, etc.). However, there is no clear consensus as to the most suitable statistical classifier.
14.3.2
Instrumented Kitchens
AR in the home has been explored in various guises, but only a small number of projects have considered the issues of technology-augmented kitchens. One of the first projects was MIT’s CounterIntelligence, an augmented kitchen that provides instructive information to users while they are cooking [15]. CounterIntelligence highlighted issues of situated interaction (rather than activity recognition) and included several different examples of how information could be displayed on kitchen surfaces to direct a user’s attention appropriately. Chi and colleagues [16] developed an application based on RFID technology embedded in a kitchen counter, in which sensing the food ingredients placed on the counter was intended to raise a user’s awareness of how healthy those ingredients are through the presentation of nutritional information on a display and speaker (an extension of the “dietary aware table” [17]). A number of other design proposals also relate to the provision of situated advice on food and cooking [18]. In the following, more general projects and initiatives related to instrumented homes (including kitchens) are described in more detail.
PlaceLab@House_n
As part of MIT’s House_n research initiative on “how new technologies, materials, and strategies for design can make possible dynamic, evolving places that respond to the complexities of life” [19], the PlaceLab represents a “living lab”, i.e., a laboratory environment to study how people interact with new technologies integrated into real-world living environments [20]. Among other facilities, the PlaceLab contains an instrumented kitchen environment. PlaceLab utilizes a multitude of (ubiquitous) sensors that are embedded into the environment (furniture, walls, floors, etc.). For example, by means of simple state-change sensors, occupants’ activities of daily living (ADL) have been monitored in a simple and non-intrusive way [21]. Additionally, activity recognition is performed based on the analysis of body-worn sensors, which enables the analysis of ADL such as walking, running, cycling and so forth [22].
Quality of Life Technology Centre
QoLT is a joint research endeavor of the University of Pittsburgh and Carnegie Mellon University with the goal of developing “technologies that will improve and sustain the quality of life for all people” [23]. In addition to a multitude of general cross-disciplinary research activities related to the development of methods that enable older adults and people with disabilities to live independently, QoLT also addresses the analysis of real-world kitchen scenarios. For example, Spriggs and colleagues developed an activity segmentation and classification system that analyzes food preparation activities (baking and cooking) in the kitchen [24]. Computer vision techniques, as well as the analysis of inertial measurement units, are used as the basis for the recognition system. The analysis of multi-modal sequential data is performed by means of statistical modeling techniques such as hidden Markov models. Although the reliance on body-worn sensors raises some practical questions, impressive recognition results have been reported.
Aware Kitchen
The Aware Kitchen (also known as the Assistive Kitchen [13]) is an instrumented kitchen environment that has been developed at TU Munich. It comprises a mobile robot and networked sensing and actuation devices that are physically embedded into the environment. The main purpose of the Aware Kitchen is to provide “a comprehensive demonstration and challenge scenario for technical cognitive systems” [13], i.e., a testbed for computer vision, robotics, and sensor analysis techniques. For the analysis of food preparation activities a number of sensors are integrated into kitchen utensils. These include force sensors in the handle of a knife, load cells and accelerometers in a chopping board, and body-worn sensors such as RFID-enabled gloves. Sensor data have been analyzed in certain kitchen-related exemplary applications (activity and context recognition).
Ambient Kitchen
The Ambient Kitchen is our own high-fidelity prototype for exploring the design of pervasive computing algorithms and applications for everyday environments, which has been created at Newcastle University’s Culture Lab [25]. The environment integrates data projectors, cameras, RFID tags and readers, object-mounted accelerometers, and under-floor pressure sensing using a combination of wired and wireless networks. The Ambient Kitchen is a lab-based replication of a real kitchen where careful design has hidden the additional technology. It allows both the evaluation of pervasive computing prototypes and the simultaneous capture of multiple synchronized streams of sensor data. Previous work exploring the requirements for situated support for people with cognitive impairments motivated the design of the physical and technical infrastructure. The lab-based prototype has been put to use as a design tool for designers, a design tool for users, an observatory to collect sensor data for activity recognition algorithm development, and an evaluation test bed. The recognition system for the analysis of kitchen activities has been integrated into the Ambient Kitchen project. For real-time analysis it runs continuously as a background service, simultaneously analyzing data from multiple synchronized acceleration sensors.

Philips ExperienceLabs
The ExperienceLabs is part of the research infrastructure at Philips Research (Eindhoven) and recreates a natural setting for the innovation of technologies and applications in real living environments [26]. This also includes kitchens and all sorts of utensils and appliances used therein. Crucially, the ExperienceLabs have been designed to be comfortable enough for people to stay in for some time (longer than in most classical usability studies) and are equipped with a sensor and computing infrastructure to observe and monitor the behavior of the participants in studies.
14.4 Automatic Analysis of Food Preparation Processes

Technology-based support for kitchen activities that is ultimately meant to lead to the consumption of healthier food aims to encourage manual food preparation and to aid in related kitchen tasks. Such an assistive system would automatically keep track of the progress the cook makes while following a recipe. It would explicitly guide the cook through the particular steps that are necessary to prepare a dish. Furthermore, it would give hints regarding potential pitfalls and how to circumvent them. If, for example, a dish requires especially careful chopping of the ingredients, it would monitor the performance of the cook during chopping and give hints if, e.g., finer chopping is necessary. During the cooking process an assistive system could also provide valuable additional information, such as background knowledge on the recipe, on the ingredients, and even on the way the food is treated. It could give recommendations for side dishes and for (healthier) alternatives to certain ingredients or treatments of the food (e.g., steaming vegetables instead of frying). Such situated support could substantially improve the cooking experience, and hence implicitly promote healthier eating.

The foundation for a hands-free assistive system for situated support for kitchen tasks is a means for automatically and unobtrusively monitoring the activities the cook is pursuing while working in the kitchen. Hence, activity recognition represents the technological link between encouraging people to cook more and healthier eating.

14.4.1 Activity Recognition in the Ambient Kitchen
Typical activity recognition approaches in pervasive computing applications are based on the analysis of body-worn sensors such as bracelets with embedded accelerometers. Alternatively, video-based activity recognition using computer vision techniques can also be utilized. Unfortunately, both approaches exhibit substantial drawbacks for real-world applications in the kitchen (either physically or in terms of privacy). Video cameras generally undermine the privacy of a home environment, and body-worn sensors are generally unwelcome encumbrances. The system presented in this chapter provides a means for activity recognition that does not violate people’s privacy while still performing highly reliably in real-world food preparation tasks (see also our previous work [6]). It is part of the Ambient Kitchen (see Section 14.3 and the left-hand image of Figure 14.1). We enhanced ordinary kitchen utensils, including knives and spoons, by embedding accelerometers within their handles. Figure 14.1 (right side) illustrates the current state of the sensor-equipped utensils, where modified Wii-Remote accelerometer boards are integrated into custom-made handles of the kitchen utensils. We are also currently working on much smaller custom-made wireless accelerometers, which will be integrated into standard kitchen utensils (without changing the design of their handles).
Fig. 14.1 Ambient Kitchen (left) – the framework for activity recognition supported food preparation – and the sensor equipped utensils (right). See text for explanation.
The recognition system allows for continuous monitoring of sensory data in an integrative manner. Technically this corresponds to mining low-cost sensor networks for activity recognition. People preparing their food do not have to worry about wearing sensors; they just use the utensils as in their regular daily food preparation and cooking activities. Through the use of different utensils, significant insights about the activities can be gained and larger-scale context analysis becomes possible. In the following we provide a system overview together with the description of an experimental evaluation of activity recognition for food preparation tasks.
14.4.2 System Description
Analyzing food preparation for situated support requires fast and robust recognition of the actions as they are performed. Consequently, the described AR system processes sensor data in a strictly time-synchronous manner. Raw acceleration data in the form of x/y/z coordinates are captured continuously using the sensors integrated into the utensils. Our configuration utilizes Wii-Remotes sampled at 40 Hz, with each sample having a precision of 8 bits. By using a sliding window procedure, frames of 64 samples length (50% overlap) are extracted and statistical features are computed for every frame. Capturing approximately two seconds of sensor data per frame (64 samples at 40 Hz span 1.6 s) represents a reasonable compromise between a sufficient amount of context to analyze and low latency for real-time processing. The features extracted comprise mean, standard deviation, energy, entropy and correlation. They are computed for each frame for x-, y-, z-, pitch- and roll-acceleration, resulting in 23-element vectors $f \in \mathbb{R}^{23}$.

The back-end of the recognition framework is based on a descriptive modeling approach, where statistical models $G_A$ – namely Gaussian mixture density models with parameters $\Theta_A = (\mu_i, C_i, w_i \mid i = 1, \ldots, K_A)$ – are extracted from training data to model the likelihood of feature vectors for every activity of interest $A$:

$$p(f \mid \Theta_A) = G_A = \sum_{i=1}^{K_A} w_i \, \mathcal{N}(f \mid \mu_i, C_i) \tag{14.1}$$

where $\mathcal{N}$ denotes a Normal probability distribution with mean vector $\mu$ and covariance matrix $C$:

$$\mathcal{N}(f \mid \mu, C) = \frac{1}{\sqrt{|2\pi C|}} \, e^{-\frac{1}{2} (f - \mu)^T C^{-1} (f - \mu)} . \tag{14.2}$$

The $w_i$ represent the prior probabilities of the $i$-th mixture components (out of $K_A$ Gaussians in total). In our previous work we extensively investigated different modeling techniques for robust and reliable activity recognition based on the analysis of accelerometer data [6].

For a fully functional activity recognition system that can be successfully applied to the analysis of real-world kitchen tasks, it is essential to process all recorded data automatically. This especially requires proper treatment of unknown activities, i.e., sensor data that have been recorded while none of the particular activities of interest were conducted. Consequently, the AR system of the Ambient Kitchen also contains a rejection model that robustly covers unknown activities.

Training mixture models for known activities can be done in a straightforward manner. By means of standard k-means clustering and maximum likelihood (ML) optimization,
Gaussians, i.e., the parameters $w_i$, $\mu_i$, $C_i$, are estimated on class-specific training data. Note that the number $K_A$ of Gaussians to be used for representing an activity $A$ needs to be determined via cross-validation. As a rule of thumb, classification performance tends to improve with the number of Gaussians that can be robustly estimated.

The estimation of a proper rejection model (also referred to as background model) is slightly different. The meaning of “unknown” depends on the particular task, i.e., on the set of activities that are of interest. It is unrealistic to assume the availability of a specific training set that only contains “unknown” samples, i.e., that is disjoint from the known activities. Thus, a straightforward modeling approach is impractical. The applied pattern recognition literature extensively addresses procedures for mixture model estimation where sample data are difficult to obtain [27]. The usual way of deriving the models is to start with the estimation of the background model on any domain-related sample data that is available, without considering potential annotations. By means of this unsupervised estimation procedure relatively general Gaussians are derived. In order to obtain mixture models for the activities of interest, the background model is specifically adapted using model transformation techniques and activity-specific sample data only. Different adaptation techniques can be applied, for example Maximum Likelihood Linear Regression (MLLR [28]), Maximum a-posteriori adaptation (MAP [29]), or standard expectation maximization (EM [30]) training. In our experiments EM training performed best.

During recognition all mixture models, including the background model, are evaluated in parallel and the one with the highest a-posteriori probability for some feature vector determines the classification result:

$$\hat{A} = \arg\max_A \, p(\Theta_A \mid f) = \arg\max_A \frac{p(f \mid \Theta_A)\, p(A)}{p(f)} \approx \arg\max_A \, p(f \mid \Theta_A)\, p(A). \tag{14.3}$$
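As an illustration of Eqs. (14.1) to (14.3), the following Python sketch scores a feature vector against a small set of mixture models and picks the best one. It is a minimal sketch under stated assumptions and not the implementation used in the Ambient Kitchen: the function names `log_normal`, `gmm_log_likelihood` and `classify`, the two-dimensional toy features, and all parameter values are hypothetical. Scores are computed in the log domain, which leaves the arg max unchanged.

```python
import numpy as np

def log_normal(f, mu, C):
    """log N(f | mu, C) as in Eq. (14.2), for a full covariance matrix C."""
    d = f - mu
    _, logdet = np.linalg.slogdet(2.0 * np.pi * C)   # log |2*pi*C|
    return -0.5 * (d @ np.linalg.solve(C, d) + logdet)

def gmm_log_likelihood(f, weights, means, covs):
    """log p(f | Theta_A) as in Eq. (14.1), via log-sum-exp for stability."""
    terms = np.array([np.log(w) + log_normal(f, mu, C)
                      for w, mu, C in zip(weights, means, covs)])
    top = terms.max()
    return top + np.log(np.exp(terms - top).sum())

def classify(f, models, priors):
    """Eq. (14.3): pick the model maximizing log p(f | Theta_A) + log p(A)."""
    scores = {name: gmm_log_likelihood(f, *theta) + np.log(priors[name])
              for name, theta in models.items()}
    return max(scores, key=scores.get)

# Toy 2-D example with hypothetical parameters (not taken from the chapter):
# two narrow activity models and one broad background ("unknown") model.
eye = np.eye(2)
models = {
    "chopping": ([1.0], [np.array([0.0, 0.0])], [0.1 * eye]),
    "stirring": ([1.0], [np.array([3.0, 3.0])], [0.1 * eye]),
    "unknown":  ([0.5, 0.5],
                 [np.array([0.0, 0.0]), np.array([3.0, 3.0])],
                 [25.0 * eye, 25.0 * eye]),
}
priors = {"chopping": 0.25, "stirring": 0.25, "unknown": 0.5}

near_chopping = classify(np.array([0.1, -0.1]), models, priors)
far_from_both = classify(np.array([10.0, -8.0]), models, priors)
```

In this toy setting a vector close to an activity model is assigned to that activity, while a vector far from every specific model receives its highest score from the broad background model and is therefore rejected.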
Bayes’ rule is applied with prior class probabilities p(A) estimated on the training set; the sample priors p(f) are neglected as they are irrelevant for the maximization. For known activities the score of the respective model will be much higher than that of the rejection model, whereas for unknown data the more generic background model will produce higher scores. The activity recognition system is trained in a subject-independent manner, exploiting training samples that are annotated at the level of the activities of interest. Consequently, the system can be used as is, without the need for user-specific training or adaptation.
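To make the front end described at the beginning of this section more concrete, the sketch below frames a recording into 64-sample windows with 50% overlap and computes a 23-element feature vector per frame. Several details are assumptions on our part, since the text does not spell them out: the breakdown of the 23 features (four statistics for each of the five channels plus three pairwise correlations between the x, y and z axes), the formulas for pitch and roll, and the spectral definitions of energy and entropy. The function names and the synthetic input are hypothetical.

```python
import numpy as np

FRAME_LEN = 64          # samples per frame (from the text; 1.6 s at 40 Hz)
HOP = FRAME_LEN // 2    # 50% overlap (from the text)

def sliding_frames(samples):
    """Split an (n_samples, 3) array of x/y/z readings into overlapping frames."""
    starts = range(0, len(samples) - FRAME_LEN + 1, HOP)
    return [samples[s:s + FRAME_LEN] for s in starts]

def channel_stats(c):
    """Mean, standard deviation, energy and entropy of one channel.
    Energy and entropy use assumed spectral definitions."""
    power = np.abs(np.fft.rfft(c))[1:] ** 2      # power spectrum without DC bin
    total = power.sum()
    energy = total / len(c)
    p = power / total if total > 0 else np.full(len(power), 1.0 / len(power))
    entropy = -np.sum(p * np.log2(p + 1e-12))
    return [c.mean(), c.std(), energy, entropy]

def frame_features(frame):
    """Map one (64, 3) frame to a 23-element feature vector (assumed layout)."""
    x, y, z = frame.T
    pitch = np.arctan2(y, np.sqrt(x ** 2 + z ** 2))   # assumed definition
    roll = np.arctan2(x, np.sqrt(y ** 2 + z ** 2))    # assumed definition
    feats = []
    for channel in (x, y, z, pitch, roll):
        feats.extend(channel_stats(channel))          # 5 channels x 4 statistics
    corr = np.nan_to_num(np.corrcoef(frame.T))        # 3x3; NaN (flat axis) -> 0
    feats.extend([corr[0, 1], corr[0, 2], corr[1, 2]])
    return np.array(feats)

# Synthetic stand-in for 10 s of accelerometer data sampled at 40 Hz.
rng = np.random.default_rng(0)
recording = rng.normal(size=(400, 3))
frames = sliding_frames(recording)
features = np.array([frame_features(f) for f in frames])
```

With a hop of 32 samples a new feature vector becomes available every 0.8 s at 40 Hz, which is the latency side of the compromise mentioned above.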
14.4.3 Experimental Evaluation
The activity recognition system represents the basis for all higher-level applications within the Ambient Kitchen that address the analysis and support of food preparation activities. In order to evaluate the capabilities of the system we conducted practical experiments where volunteers were asked to prepare meals within the instrumented kitchen environment using the sensor-enhanced utensils. Based on an observation of real-world food preparation and the language used in instructional cooking videos, ten activities were modeled by Gaussian mixture models: chopping, peeling, slicing, dicing, scraping, shaving, scooping, stirring, coring, and spreading (cf. Figure 14.2 for an illustration of two typical activities). An additional rejection model has been estimated as described above.
Fig. 14.2 Examples of typical food preparation activities performed using the sensor-enhanced utensils: peeling (left) and chopping (right).
For the experimental evaluation twenty participants prepared sandwiches or a mixed salad using a range of ingredients with which they were provided. No further instructions were given, so the task was conducted in a relatively unconstrained manner, resulting in substantial variance in the time taken to complete the task. Sessions with lengths varying from 7 to 16 minutes were recorded. Videos of the sessions showed that all subjects performed a significant number of chopping, scooping, and peeling actions. Only small subsets of subjects performed eating (i.e., using the knife or spoon to eat ingredients, considered as unknown), scraping (rather than peeling) or dicing (i.e., fine-grained rapid chopping) actions.

The video recordings were independently annotated by three annotators. Every annotator was provided with an informal description of the ten activities of interest, and was asked to independently code the actions for each subject’s recording. After annotation, the intersection of the three coded data sets was created; this served as the ground truth annotation. Only labeled data for which all three annotators agreed on the action being performed were retained. It should be noted that the boundaries between the ten activity labels are often unclear. In total almost four hours of sensor data were recorded, covering approximately two hours of the ten modeled activities.

Classification experiments were performed as 10-fold cross-validation and results are averaged accordingly. The classification results of the experimental evaluation are summarized in Table 14.1. Processing the whole dataset, i.e., including samples that do not belong to one of the known activities (open set), 90.7% accuracy is achieved with a level of statistical significance of ±0.7%. Since almost 50% of the data covered unknown/idle activities, an additional evaluation of only the known activities of interest has been performed. For this closed dataset the accuracy was 86% (±2.4%).
Table 14.1 Classification results for food preparation experiments

dataset                           accuracy (stat. significance)
open set (incl. unknown act.)     90.7% (± 0.7%)
closed set (w/o unknown act.)     86% (± 2.4%)
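The construction of the ground truth from three independent codings, as described in the text, can be sketched as follows. This is an illustration only: we assume that the three annotations have already been aligned on a common frame grid, and the function name `agreed_ground_truth` and the example labels are hypothetical.

```python
def agreed_ground_truth(*annotations):
    """Keep (frame index, label) pairs on which every annotator agrees."""
    return [(i, labels[0])
            for i, labels in enumerate(zip(*annotations))
            if all(label == labels[0] for label in labels)]

# Hypothetical per-frame codings from three annotators.
annotator_a = ["chopping", "chopping", "peeling", "dicing",   "stirring"]
annotator_b = ["chopping", "slicing",  "peeling", "chopping", "stirring"]
annotator_c = ["chopping", "chopping", "peeling", "dicing",   "stirring"]

ground_truth = agreed_ground_truth(annotator_a, annotator_b, annotator_c)
```

Frames on which the annotators disagree (here the second and fourth) are dropped entirely rather than resolved by majority vote, which matches the requirement of complete agreement stated in the text.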
More detailed insights into the recognition performance of the system can be gained by means of a confusion analysis. Tables 14.2 and 14.3 represent the confusion matrices for the experimental evaluation based on the open and the closed datasets (each for 10-fold cross-validation experiments). The rejection model works well, since only very few known activities are erroneously classified as being unknown (422 false positives compared to 44173 unknown frames in total, which is below 1% – cf. the first row and the first column of Table 14.2). The rate of false negative predictions, i.e., of erroneous rejections of known activities, was also reasonably low at approximately 8%. Slicing and dicing activities were quite frequently confused with chopping activities. For example, for the closed set approximately 46% of the dicing activities are classified as being chopping. Analyzing the video footage of the food preparation experiments, it becomes clear that the majority of these failures can be explained by erroneous annotations (across the three annotators), i.e., ground truth errors.
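The figures quoted in this paragraph can be re-derived from the confusion matrices. The sketch below copies the first row and first column of Table 14.2 and the dicing column of Table 14.3; the variable names are ours. The off-diagonal share of the first column (about 7.9%) appears to be the source of the “approximately 8%” figure, although the text does not state this explicitly.

```python
# First row and first column of Table 14.2 (open set), in the order
# unknown, chopping, scraping, peeling, slicing, dicing, coring,
# spreading, stirring, scooping, shaving.
first_row = [40672, 79, 57, 43, 1, 33, 4, 37, 95, 61, 12]
first_col = [40672, 1398, 30, 283, 331, 35, 38, 453, 19, 482, 432]

# "Dicing" column of Table 14.3 (closed set), rows in the order
# chopping, scraping, peeling, slicing, dicing, coring, spreading,
# stirring, scooping, shaving.
dicing_col = [183, 0, 0, 0, 207, 0, 0, 0, 6, 0]

unknown_total = sum(first_col)            # all frames in the first column
row_off_diagonal = sum(first_row[1:])     # first row without the diagonal
col_off_diagonal = sum(first_col[1:])     # first column without the diagonal

row_share = row_off_diagonal / unknown_total
col_share = col_off_diagonal / unknown_total
dicing_as_chopping = dicing_col[0] / sum(dicing_col)

print(unknown_total, row_off_diagonal, col_off_diagonal)
print(f"{row_share:.2%}  {col_share:.2%}  {dicing_as_chopping:.2%}")
```

The first print reproduces the 44173 and 422 quoted in the text; the second shows that 422 out of 44173 is indeed below 1% and that roughly 46% of the dicing frames end up in the chopping row.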
Table 14.2 Confusion matrix (# frames) for open set experiments (10-fold cross eval.)

             unknown  chopping  scraping   peeling   slicing    dicing    coring spreading  stirring  scooping   shaving
unknown        40672        79        57        43         1        33         4        37        95        61        12
chopping        1398      2327         4         2        91       161        30        61         1         6        49
scraping          30         1        71         0         0         0         2        19         0         0         0
peeling          283         1         4       474         0         0         1         2         0         0         2
slicing          331       263         3         6       283         0        41         7         0         0         0
dicing            35        85         0         0         1       200         0         1         1         0         0
coring            38         1         2         0         2         0       100         0        21         3         0
spreading        453        34         2         1         2         0        19       171         0         1         5
stirring          19         0         0         0         0         0        19         0       209        25         0
scooping         482         5         1         3         0         2        10         4        41       851         1
shaving          432        29        23         3         6         0         0        29         0         1       493

Table 14.3 Confusion matrix (# frames) for closed set experiments (10-fold cross eval.)

            chopping  scraping   peeling   slicing    dicing    coring spreading  stirring  scooping   shaving
chopping        2390         4         2        92       183        30        77         1         8        51
scraping           1        71         0         0         0         2        19         0         0         0
peeling            1         4       498         0         0         1         2         0         0         2
slicing          263         3         8       283         0        41         7         0         0         0
dicing            90         0         0         1       207         0         1         1         1         0
coring             2         2         0         2         0       100         0        24         4         0
spreading         34         2         1         2         0        19       171         0         1         5
stirring           0         0         0         0         0        20         3       238        29         1
scooping          15        58        20         0         6        12        17       104       904         3
shaving           29        23         3         6         0         1        34         0         1       500

14.5 Activity recognition and the promotion of health and wellbeing
Our approach to activity recognition in the kitchen appears to have the potential to promote healthier eating. As we described in the introduction to this chapter, a lack of food preparation and cooking skills has been identified as a barrier to healthier food choices, and the design space for imaginative technological interventions is wide open, both to support people’s actual food preparation and to assess and enhance their skills. Our own activity recognition framework allows for real-time analysis of kitchen activities using statistical modeling techniques. Although we have demonstrated that our system can make sense of the actions of naive users, a set of significant challenges remains before such AR capabilities can be integrated into support systems in regular households. In particular, the problem of recognizing the food items themselves remains unsolved. Though RFID has long been proposed as a technology that might afford embedded sensing of ingredients (in their packaging), the benefits for retailers of individually labeling packages (rather than cartons or pallets) are not clear. RFID would also require either significant infrastructure in the kitchen (embedded antennae) or wrist-worn readers, as have been developed in a number of ubiquitous computing initiatives. Alternatively, kitchens have always been sites for specialized appliances for food preparation, such as manual and mechanized tools. The practicalities of kitchenware retailing also mean that if the sensing required can be achieved through an enhancement to an existing appliance, then take-up is more likely. Beetz et al.’s knife illustrates the potential clearly [13]. By embedding force sensors in the knife, as well as accelerometers, the knife could be used to sense the ingredients on which it was used. Our own vision recognizes the potential of such embedded sensors, and we see significant opportunities for further augmentation of the tools and utensils themselves with well-chosen (and unobtrusive) sensors.

However, activity recognition and sensor design and deployment are just a subset of the elements of the solution required. A mature solution will need to aggregate the atomic actions in reasoning about the preparation of a meal, address the problem of parallel actions, and reason about the skill level of people cooking and even the impact of their actions on the nutritional value of the ingredients.
Furthermore, some practical problems need to be solved, which mostly relate to handling issues for the embedded sensors. The next version of sensor-equipped utensils, which we are currently developing, will be based on much smaller, water- and heat-proof hardware. Integrating this tailored hardware into kitchen utensils, which can be used and washed as usual, will pave the way for real-life application in private households.
References
[1] National Institute for Health and Clinical Excellence. Obesity: the prevention, identification, assessment and management of overweight and obesity in adults and children (May, 2006).
[2] B. A. Swinburn. Increased energy intake alone virtually explains all the increase in body weight in the United States from 1970s to the 2000s. In Proc. European Congress on Obesity, (2009).
[3] G. Block, T. Block, P. Wakimoto, and C. H. Block, Demonstration of an e-mailed worksite nutrition intervention program, Preventing Chronic Disease. 1(4) (Jan, 2004).
[4] S. Jebb, T. Steer, and C. Holmes. The ‘healthy living’ social marketing initiative: A review of the evidence (Mar, 2007).
[5] J. Maitland, M. Chalmers, and K. A. Siek. Persuasion not required – improving our understanding of the sociotechnical context of dietary behavioural change. In Proc. Int. Conf. Pervasive Computing Technologies for Health Care (Feb, 2009).
[6] C. Pham and P. Olivier. Slice&dice: Recognizing food preparation activities using embedded accelerometers. In Proc. Europ. Conf. Ambient Intelligence, pp. 34–43, (2009).
[7] C. Byrd-Bredbenner, Food preparation knowledge and attitudes of young adults: Implications for nutrition practice, Topics in Clinical Nutrition. 19, 154–163, (2004).
[8] World Health Organization. Diet, nutrition and the prevention of chronic diseases. WHO Technical Report Series, number 916, (2003).
[9] E. Winkler and G. Turrell, Confidence to cook vegetables and the buying habits of Australian households, Journal of the American Dietetic Association. 109(10), 1759–1768 (Oct., 2009).
[10] N. I. Larson, M. Story, M. E. Eisenberg, and D. Neumark-Sztainer, Food preparation and purchasing roles among adolescents: associations with sociodemographic characteristics and diet quality, Journal of the American Dietetic Association. 106, 211–218, (2006).
[11] N. Sudo, D. Degeneffe, H. Vue, E. Merkle, J. Kinsey, K. Ghosh, and M. Reicks, Relationship between attitudes and indicators of obesity for midlife women, Health Educ. Behav. 36(6), 1082–1094, (2009).
[12] L. Atallah and G. Yang, The use of pervasive sensing for behaviour profiling – a survey, Pervasive and Mobile Computing. pp. 447–464, (2009).
[13] M. Beetz, J. Bandouch, A. Kirsch, A. Maldonado, and R. B. Rusu. The assistive kitchen – a demonstration scenario for cognitive technical systems. In IEEE 17th Int. Symp. Robot and Human Interactive Communication (RO-MAN), pp. 1–8 (Jan, 2008).
[14] T. Huynh and B. Schiele. Analyzing features for activity recognition. In Proc. Joint Conf. on Smart Objects and AmI, pp. 159–163, (2005).
[15] L. Bonanni, C. Lee, and T. Selker. CounterIntelligence: Augmented Reality Kitchen. In Proc. CHI, pp. 2239–2245, (2005).
[16] P. Chi, J.-H. Chen, H.-H. Chu, and B.-Y. Chen. Enabling nutrition-aware cooking in a smart kitchen. In Proc. CHI – Extended Abstracts on Human Factors in Computing Systems, pp. 2333–2338, (2007).
[17] K. Chang, S.-Y. Liu, H.-H. Chu, J. Hsu, C. Chen, T.-Y. Lin, and P. Huang. Dietary-aware dining table: Observing dietary behaviors over tabletop surface. In Proc. Int. Conf. Pervasive Computing, pp. 366–382, (2006).
[18] Q. T. Tran, G. Calcaterra, and E. D. Mynatt. Cooks collage: Déjà vu display for a home kitchen. In Proc. Int. Conf. Home-Oriented Informatics and Telematics (HOIT), pp. 15–32, (2005).
[19] House_n. http://architecture.mit.edu/house_n/ – visited 9th April 2010.
[20] S. S. Intille, K. Larson, E. Mungia-Tapia, J. S. Beaudin, P. Kaushik, J. Nawyn, and R. Rockinson. Using a live-in laboratory for ubiquitous computing research. In Proc. Int. Conf. Pervasive Computing, pp. 349–365 (Dec, 2006).
[21] E. Mungia-Tapia, S. S. Intille, and K. Larson. Activity recognition in the home setting using simple and ubiquitous sensors. In Proc. Int. Conf. Pervasive Computing, (2004).
[22] L. Bao and S. S. Intille. Activity recognition from user-annotated acceleration data. In Proc. Int. Conf. Pervasive Computing, (2004).
[23] Quality of Life Technology Center – QoLT. http://www.cmu.edu/qolt/ – visited 9th April 2010.
[24] E. H. Spriggs, F. D. L. Torre, and M. Hebert. Temporal segmentation and activity classification from first-person sensing. In IEEE Workshop on Egocentric Vision, CVPR 2009 (June, 2009).
[25] P. Olivier, A. Monk, G. Xu, and J. Hoey. Ambient kitchen: Designing situated services using a high fidelity prototyping environment. In Workshop on Affect & Behaviour Related Assistance in the Support of the Elderly, PETRA-09, (2009).
[26] H. Hoonhout. ExperienceLabs: investigating people’s experiences in realistic lab settings. In Proc. Int. Conf. Designing Pleasurable Products and Interfaces (DPPI), (2007).
[27] T. Plötz and G. A. Fink, Pattern recognition methods for advanced stochastic protein sequence analysis using HMMs, Pattern Recognition, Special Issue on Bioinformatics. 39, 2267–2280, (2006).
[28] C. J. Leggetter and P. C. Woodland, Maximum likelihood linear regression for speaker adaptation of continuous density Hidden Markov Models, Computer Speech & Language. pp. 171–185, (1995).
[29] J.-L. Gauvain and C.-H. Lee. MAP estimation of continuous density HMM: Theory and applications. In Proc. DARPA Speech and Natural Language Workshop, (1992).
[30] A. Dempster, N. M. Laird, and D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B (Methodological). 39, 1–38, (1977).