Handbook of Research on Methods and Techniques for Studying Virtual Communities: Paradigms and Phenomena
Ben Kei Daniel University of Saskatchewan and Saskatoon Health Region, Canada
Volume I
INFORMATION SCIENCE REFERENCE Hershey • New York
Director of Editorial Content: Kristin Klinger
Director of Book Publications: Julia Mosemann
Acquisitions Editor: Lindsay Johnston
Development Editor: Julia Mosemann
Publishing Assistant: Deanna Jo Zombro
Typesetter: Natalie Pronio and Deanna Jo Zombro
Production Editor: Jamie Snavely
Cover Design: Lisa Tosheff
Editorial Advisory Board
Demosthenes Akoumianakis, Technological Education Institution of Crete, Greece
Anita Blanchard, University of North Carolina at Charlotte, USA
John M. Carroll, The Pennsylvania State University, USA
Bernie Hogan, Oxford Internet Institute, University of Oxford, UK
Chris Kimble, Euromed Marseille École de Management, France
Niki Lambropoulos, nikilambropoulos.com, UK
Rocci Luppicini, University of Ottawa, Canada
Piet Kommers, University of Twente, The Netherlands
Gordon McCalla, University of Saskatchewan, Canada
Howard Rheingold, rheingold.com
Celine Robardet, INSA Lyon, France
Richard Schiwer, University of Saskatchewan, Canada
Tiffany Tang, Konkuk University, Korea
Diego Zapata-Rivera, Educational Testing Service, USA
List of Contributors
Akoumianakis, Demosthenes / Technological Education Institution of Crete, Greece
Annese, Susan / University of Bari, Italy
Barthès, Jean-Paul / Université de Technologie de Compiègne, France
Berrueta, Diego / Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Bizzocchi, Jim / Simon Fraser University Surrey, Canada
Boaduo, Nana Adu-Pipim / University of the Free State, South Africa
Bojars, Uldis / National University of Ireland-Galway, Ireland
Breslin, John G. / National University of Ireland-Galway, Ireland
Brézillon, Patrick / University Paris 6 (UPMC), France
Brigham, Nancy / Rosenblum Brigham Associates, USA
Buffa, Michel / University of Nice Sophia Antipolis, France
Chai, Ching-Sing / Nanyang Technological University, Singapore
Corby, Olivier / EDELWEISS, INRIA Sophia-Antipolis, France
Crain-Dorough, Mindy / Southeastern Louisiana University, USA
Daniel, Ben Kei / University of Saskatchewan and Saskatoon Health Region, Canada
de Azevedo, Hilton José Silva / Federal University of Technology - Paraná, Brazil
Decker, Stefan / National University of Ireland-Galway, Ireland
Dennen, Vanessa P. / Florida State University, USA
English, Rebecca / Queensland University of Technology, Australia
Erétéo, Guillaume / Orange Labs, France
Fernández, Sergio / Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Fernández, Silvino / Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Fujita, Shinobu / Spiceworks Corporation, Japan
Gandon, Fabien / EDELWEISS, INRIA Sophia-Antipolis, France
Gergle, Darren / Northwestern University, USA
Gibbs, William J. / Duquesne University, USA
Gruzd, Anatoliy / Dalhousie University, Canada
Gurzick, David / University of Maryland, Baltimore County, USA
Hancock, Robert / Southeastern Louisiana University, USA
Hecht, Brent / Northwestern University, USA
Hogg, Tad / Hewlett-Packard Laboratories, USA
Howell, Jennifer / Australian Catholic University Limited, Australia
Isakovič, Jan / Artesia, Slovenia
Kato, Hiroshi / The Open University of Japan, Japan
Kinsella, Sheila / National University of Ireland-Galway, Ireland
Lambropoulos, Niki / London South Bank University, UK
Lansiquot, Reneta D. / New York City College of Technology of the City University of New York, USA
Leitzelman, Mylène / University of Nice Sophia Antipolis, France
Limpens, Freddy / EDELWEISS, INRIA Sophia-Antipolis, France
López Aguirre, José Luis / Universidad Panamericana, México
Matsuo, Yutaka / University of Tokyo, Japan
McCracken, Janet / Simon Fraser University Surrey, Canada
McKendrick, Joseph E. / McKendrick and Associates, USA
Mínguez, Iván / Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Mochizuki, Toshio / Senshu University, Japan
Murillo, Enrique / Instituto Tecnológico Autónomo de México – ITAM, Mexico
Myers, Jennifer B. / Florida State University, USA
Nagamori, Yusuke / Tsukuba University of Technology, Japan
Nishimori, Toshihisa / The University of Tokyo, Japan
Oescher, Jeff / Southeastern Louisiana University, USA
Parchoma, Gale / Centre for Studies in Advanced Learning Technologies (CSALT) & Lancaster University, UK
Parton, Becky / Southeastern Louisiana University, USA
Passant, Alexandre / National University of Ireland-Galway, Ireland
Pata, Kai / Tallinn University, Estonia
Polo, Luis / Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Poon, Nancy / University of Saskatchewan, Canada
Quan-Haase, Anabel / The University of Western Ontario, Canada
Repetto, Manuela / Institute for Educational Technology, National Research Council, Italy
Robardet, Céline / Université de Lyon, France
Robertson, Brent / Sancor, Canada
Rosen, Devan / Ithaca College, USA
Rubiera, Emilio / Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Sander, Peter / University of Nice Sophia Antipolis, France
Shi, Lian / Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Siitonen, Marko / University of Jyväskylä, Finland
Singleton, Alex D. / University of Liverpool, UK
Siu Cheung, Hui / Nanyang Technological University, Singapore
Skågeby, Jörgen / Linköping University, Sweden
So, Hyo-Jeong / Nanyang Technological University, Singapore
Sookhanaphibarn, Kingkarn / Ritsumeikan University, Japan
Suggs, Christie L. / Florida State University, USA
Sulčič, Alja / Artesia, Slovenia
Szabo, Gabor / Hewlett-Packard Laboratories, USA
Tan, Aik-Ling / Nanyang Technological University, Singapore
Tan, Seng-Chee / Nanyang Technological University, Singapore
Tang, Tiffany Y. / Konkuk University, South Korea
Thanh Tho, Quan / Hochiminh City University of Technology, Vietnam
Thawonmas, Ruck / Ritsumeikan University, Japan
Traetta, Marta / University of Bari, Italy
Turner, Jeremy O. / Simon Fraser University Surrey, Canada
Ueno, Maomi / The University of Electro-Communications, Japan
Van Tien, Le / Hochiminh City University of Technology, Vietnam
Winoto, Pinata / Konkuk University, South Korea
Yaegashi, Kazaru / Ritsumeikan University, Japan
Yamamoto, Hikaru / Seikei University, Japan
Young, Alyson / University of Maryland, Baltimore County, USA
Yukio Sato, Gilson / Federal University of Technology - Paraná, Brazil
Table of Contents
Foreword
Preface
Acknowledgment

Volume I

Section 1
Introduction to Virtual Communities

Chapter 1
Introduction to this Volume
Ben Kei Daniel, University of Saskatchewan and Saskatoon Health Region, Canada

Chapter 2
GEEK: Analyzing Online Communities for Expertise Information
Lian Shi, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Diego Berrueta, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Sergio Fernández, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Luis Polo, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Iván Mínguez, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Emilio Rubiera, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Silvino Fernández, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain

Chapter 3
Recurrent Interactions, Acts of Communication and Emergent Social Practice in Virtual Community Settings
Demosthenes Akoumianakis, Technological Education Institution of Crete, Greece
Chapter 4
Toward Diversity in Researching Teaching and Technology Philosophies-in-Practice in e-Learning Communities
Gale Parchoma, Centre for Studies in Advanced Learning Technologies (CSALT) & Lancaster University, UK

Section 2
Social Networks and Data Mining

Chapter 5
Data Mining Techniques for Communities’ Detection in Dynamic Social Networks
Céline Robardet, Université de Lyon, France

Chapter 6
A Methodological Approach for Blended Communities: Social Network Analysis and Positioning Network Analysis
Susan Annese, University of Bari, Italy
Marta Traetta, University of Bari, Italy

Chapter 7
Semantic Social Network Analysis: A Concrete Case
Guillaume Erétéo, Orange Labs, France
Freddy Limpens, EDELWEISS, INRIA Sophia-Antipolis, France
Fabien Gandon, EDELWEISS, INRIA Sophia-Antipolis, France
Olivier Corby, EDELWEISS, INRIA Sophia-Antipolis, France
Michel Buffa, University of Nice Sophia Antipolis, France
Mylène Leitzelman, University of Nice Sophia Antipolis, France
Peter Sander, University of Nice Sophia Antipolis, France

Chapter 8
Using Social Network Analysis to Guide Theoretical Sampling in an Ethnographic Study of a Virtual Community
Enrique Murillo, Instituto Tecnológico Autónomo de México – ITAM, Mexico

Section 3
Tools and Techniques for Analysis and Building of Virtual Communities

Chapter 9
Graphically Mapping Electronic Discussions: Understanding Online Conversational Dynamic
Jennifer Howell, Australian Catholic University Limited, Australia
Chapter 10
A Tool to Study the Evolution of the Domain of a Distributed Community of Practice
Gilson Yukio Sato, Federal University of Technology - Paraná, Brazil
Hilton José Silva de Azevedo, Federal University of Technology - Paraná, Brazil
Jean-Paul Barthès, Université de Technologie de Compiègne, France

Chapter 11
Exploring Virtual Communities with the Internet Community Text Analyzer (ICTA)
Anatoliy Gruzd, Dalhousie University, Canada

Chapter 12
Making the Virtual Real: Using Virtual Learning Communities for Research in Technical Writing
Reneta D. Lansiquot, New York City College of Technology of the City University of New York, USA

Chapter 13
Virtual Communities as Tools to Support Teaching Practicum: Putting Bourdieu on Facebook
Rebecca English, Queensland University of Technology, Australia
Jennifer Howell, Australian Catholic University Limited, Australia

Chapter 14
Conversation Analysis as a Tool to Understand Online Social Encounters
Aik-Ling Tan, Nanyang Technological University, Singapore
Seng-Chee Tan, Nanyang Technological University, Singapore

Section 4
Data and User Modelling

Chapter 15
Modeling the Diversity of User Behavior in Online Communities
Tad Hogg, Hewlett-Packard Laboratories, USA
Gabor Szabo, Hewlett-Packard Laboratories, USA

Chapter 16
Context and Explanation in e-Collaborative Work
Patrick Brézillon, University Paris 6 (UPMC), France

Chapter 17
Intelligent LMS with an Agent that Learns from Log Data in a Virtual Community
Maomi Ueno, The University of Electro-Communications, Japan
Chapter 18
A Computational Model of Social Capital in Virtual Communities
Ben Kei Daniel, University of Saskatchewan and Saskatoon Health Region, Canada

Section 5
Methods, Measurements and Matrices

Chapter 19
A Beginner’s Guide to Geographic Virtual Communities Research
Brent Hecht, Northwestern University, USA
Darren Gergle, Northwestern University, USA

Chapter 20
A Theoretical Method of Measuring Virtual Community Health and the Health of their Operating Environment in a Business Setting
Brent Robertson, Sancor, Canada

Chapter 21
Building Web Communities: An Example Methodology
Jan Isakovič, Artesia, Slovenia
Alja Sulčič, Artesia, Slovenia

Chapter 22
Virtual Geodemographics: Consumer Insight in Online and Offline Spaces
Alex D. Singleton, University of Liverpool, UK

Chapter 23
Development of Cellular Phone Software to Prompt Learners to Monitor and Reorganize Division of Labor in Project-Based Learning
Toshio Mochizuki, Senshu University, Japan
Kazaru Yaegashi, Ritsumeikan University, Japan
Hiroshi Kato, The Open University of Japan, Japan
Toshihisa Nishimori, The University of Tokyo, Japan
Yusuke Nagamori, Tsukuba University of Technology, Japan
Shinobu Fujita, Spiceworks Corporation, Japan

Chapter 24
Mathematical Retrieval Techniques for Online Mathematics Learning
Le Van Tien, Hochiminh City University of Technology, Vietnam
Quan Thanh Tho, Hochiminh City University of Technology, Vietnam
Hui Siu Cheung, Nanyang Technological University, Singapore
Chapter 25
Online Ethnographic Methods: Towards a Qualitative Understanding of Virtual Community Practices
Jörgen Skågeby, Linköping University, Sweden

Volume II

Chapter 26
Understanding Online Communities by Using Semantic Web Technologies
Alexandre Passant, National University of Ireland-Galway, Ireland
Sheila Kinsella, National University of Ireland-Galway, Ireland
Uldis Bojars, National University of Ireland-Galway, Ireland
John G. Breslin, National University of Ireland-Galway, Ireland
Stefan Decker, National University of Ireland-Galway, Ireland

Chapter 27
Understanding and Using Virtual Ethnography in Virtual Environments
Robert Hancock, Southeastern Louisiana University, USA
Mindy Crain-Dorough, Southeastern Louisiana University, USA
Becky Parton, Southeastern Louisiana University, USA
Jeff Oescher, Southeastern Louisiana University, USA

Chapter 28
Participant-Observation as a Method for Analyzing Avatar Design in User-Generated Virtual Worlds
Jeremy O. Turner, Simon Fraser University Surrey, Canada
Janet McCracken, Simon Fraser University Surrey, Canada
Jim Bizzocchi, Simon Fraser University Surrey, Canada

Chapter 29
Participatory Design Experiment: Storytelling Swarm in Hybrid Narrative Ecosystem
Kai Pata, Tallinn University, Estonia

Chapter 30
Researching Community in Distributed Environments: Approaches for Studying Cross-Blog Interactions
Vanessa P. Dennen, Florida State University, USA
Jennifer B. Myers, Florida State University, USA
Christie L. Suggs, Florida State University, USA

Chapter 31
Methods for the Measurement and Visualization of Social Networks in Multi-User Virtual Worlds
Devan Rosen, Ithaca College, USA
Chapter 32
Online Multi-Contextual Analysis: (Re)connecting Social Network Site Users with Their Profile
Alyson Young, University of Maryland, Baltimore County, USA
David Gurzick, University of Maryland, Baltimore County, USA
Anabel Quan-Haase, The University of Western Ontario, Canada

Chapter 33
Participant Observation in Online Multiplayer Communities
Marko Siitonen, University of Jyväskylä, Finland

Chapter 34
Proposed Techniques for Data Collection and Analysis in the Study of News-Oriented Virtual Communities
William J. Gibbs, Duquesne University, USA
Joseph E. McKendrick, McKendrick and Associates, USA

Chapter 35
Challenges of Analyzing Informal Virtual Communities
Nancy Poon, University of Saskatchewan, Canada
Ben Kei Daniel, University of Saskatchewan and Saskatoon Health Region, Canada

Chapter 36
Research Methods for Studying Virtual Communities
Nana Adu-Pipim Boaduo, University of the Free State, South Africa

Chapter 37
Methodological Considerations for Quantitative Content Analysis of Online Interactions
Seng-Chee Tan, Nanyang Technological University, Singapore
Hyo-Jeong So, Nanyang Technological University, Singapore
Ching-Sing Chai, Nanyang Technological University, Singapore

Chapter 38
Measuring Brand Community Strength
Hikaru Yamamoto, Seikei University, Japan
Yutaka Matsuo, University of Tokyo, Japan

Chapter 39
An Approach for Analysing Interactions within Virtual Learning Communities
Manuela Repetto, Institute for Educational Technology, National Research Council, Italy
Section 6
Online Phenomena and Case Studies

Chapter 40
The Sense of e-Learning Community Index (SeLCI) for Computer Supported Collaborative e-Learning (CSCeL)
Niki Lambropoulos, London South Bank University, UK

Chapter 41
Tracer Studies: A Concrete Approach to a Virtual Challenge
Nancy Brigham, Rosenblum Brigham Associates, USA

Chapter 42
Digital Museums in 3D Virtual Environment
Kingkarn Sookhanaphibarn, Ritsumeikan University, Japan
Ruck Thawonmas, Ritsumeikan University, Japan

Chapter 43
Weaving the Social Fabrics: Recognizing Social Signals to Support Awareness and Group Interaction in Online Games
Tiffany Y. Tang, Konkuk University, South Korea
Pinata Winoto, Konkuk University, South Korea

Chapter 44
Studying Social Capital in the New Communitarian Horizon: A Multi-Method Research Strategy
José Luis López Aguirre, Universidad Panamericana, México

Compilation of References
Detailed Table of Contents
Foreword
Preface
Acknowledgment

Volume I

Section 1
Introduction to Virtual Communities

Chapter 1
Introduction to this Volume
Ben Kei Daniel, University of Saskatchewan and Saskatoon Health Region, Canada

The growth of virtual communities and their continuous impact on the social, economic and technological structures of societies has attracted a great deal of interest among researchers, practitioners, system designers and policy makers, all of whom are interested in analysing and understanding how these communities form, develop, nurture social interaction, influence technological design and implementation, enhance information and knowledge sharing, support business and act as catalytic environments to support human learning. This chapter provides a general overview of virtual communities and introduces the reader to the various themes covered in this volume as well as the geographical distribution and institutional affiliations of contributors to the volume.

Chapter 2
GEEK: Analyzing Online Communities for Expertise Information
Lian Shi, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Diego Berrueta, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Sergio Fernández, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Luis Polo, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Iván Mínguez, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Emilio Rubiera, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Silvino Fernández, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Finding experts over the Web using semantics has recently received increased attention, especially its application to enterprise management. This scenario introduces many novel challenges to the Web of Data. Gathering Enterprise Expertise Knowledge (GEEK) is a research project which fosters the adoption of Semantic Web technologies within the enterprise environment. GEEK has produced a prototype that demonstrates how to extract and infer expertise by taking into account people’s participation in various online communities (forums and projects). The reuse and interlinking of existing, well-established vocabularies in the areas of person description (FOAF), Internet communities (SIOC), project description (DOAP) and vocabulary sharing (SKOS) are explored in our framework, as well as a proposal for applying customized rules and other enabling technologies to the expert finding task.

Chapter 3
Recurrent Interactions, Acts of Communication and Emergent Social Practice in Virtual Community Settings
Demosthenes Akoumianakis, Technological Education Institution of Crete, Greece

The chapter builds on recent efforts aiming to develop a conceptual frame of reference for gaining insight into and analyzing ‘practice’ in virtual communities. Following a thorough analysis of related works in new media, community-oriented thinking and practice-based approaches as well as reflections upon recent case studies, the chapter discusses what it is that differentiates offline from online practice, how these two are intertwined in virtual settings and what may be an appropriate methodological frame of reference for analyzing them.
In this vein, instead of reproducing arguments for community management (i.e., discovering, forming and sustaining communities) and the underlying methodological challenges commonly encountered in Information Systems research, our effort is focused on understanding emergent social practices through a practice lens framed in technology constituting structures and cultural artifacts. Through a cross case design we formulate the argument that community results from the history of co-engagement of actors in a joint field, while in virtual settings, it is recurrent interactions that lead to an act of communication or the enactment of practice. Our main conclusions are (a) online social practices are shaped through cycles of ‘constructing – negotiating – reconstructing’ cultural artifacts in virtual settings, and (b) practice-oriented toolkits designed to support cycles of ‘constructing – negotiating – reconstructing’ cultural artifacts offer new grounds for understanding innovative engagement by virtual communities. Chapter 4 Toward Diversity in Researching Teaching and Technology Philosophies-in-Practice in e-Learning Communities....................................................................................................................... 61 Gale Parchoma, Centre for Studies in Advanced Learning Technologies (CSALT) & Lancaster University, UK e-Learning is pervasively perceived as a singular enterprise, subject to broad claims and overarching critiques. From this viewpoint, the strengths and weakness of large-scale e-learning implementations in supporting all forms of teaching and learning in higher education can be examined through bestpractices lenses. This chapter contests the e-learning singularity paradigm through examining a sample of diverse e-learning communities, each of which may be associated with distinct teaching and technology philosophies-of-practice, as well as divergent research and development histories. A gestalt view
of interacting and interlocking teaching and technology philosophies underpins a call for local actions aimed at achieving the democratization of e-learning environment design and fostering both difference and connectivity across e-learning communities of research and practice.

Section 2
Social Networks and Data Mining

Chapter 5
Data Mining Techniques for Communities’ Detection in Dynamic Social Networks
Céline Robardet, Université de Lyon, France

Social network analysis studies relationships between individuals and aims at identifying interesting substructures such as communities. This type of network structure is intuitively defined as a subset of nodes more densely linked, when compared with the rest of the network. Such dense subgraphs gather individuals sharing similar properties depending on the type of relation encoded in the graph. In this chapter we tackle the problem of identifying communities in dynamic networks where relationships among entities evolve over time. Meaningful patterns in such structured data must capture the strong interactions between individuals but also their temporal relationships. We propose a pattern discovery method to identify evolving patterns defined by constraints. In this paradigm, constraints are parameterized by the user to drive the discovery process towards potentially interesting patterns, with the positive side effect of achieving a more efficient computation. In the proposed approach, dense and isolated subgraphs, defined by two user-parameterized constraints, are first computed in the dynamic network restricted at a given time stamp. Second, the temporal evolution of such patterns is captured by associating a temporal event type to each subgraph. We consider five basic temporal events: the formation, dissolution, growth, diminution and stability of subgraphs from one time stamp to the next one.
We propose an algorithm that finds such subgraphs in a time series of graphs processed incrementally. The extraction is feasible thanks to efficient pattern-pruning strategies. Experimental results on real-world data confirm the practical feasibility of our approach. We evaluate the added value of the method, both in terms of the relevancy of the extracted evolving patterns and in terms of scalability, on two dynamic sensor networks and on a dynamic mobility network.

Chapter 6
A Methodological Approach for Blended Communities: Social Network Analysis and Positioning Network Analysis
Susan Annese, University of Bari, Italy
Marta Traetta, University of Bari, Italy

The current diffusion of blended communities, characterized by the integration of online and offline interactions, has made necessary a methodological reflection about the suitable approaches to explore psychosocial dynamics in virtual and real communities. In this chapter we propose a mixed approach that ‘blends’ qualitative and quantitative methods: by combining qualitative content analysis with Social Network Analysis we investigate participation dynamics and by employing this methodological
combination in an original way we create an innovative method, called Positioning Network Analysis, to examine identity dynamics. We will describe the characteristics of this methodological device, providing some examples in order to show the manifold use of these original tools.

Chapter 7
Semantic Social Network Analysis: A Concrete Case
Guillaume Erétéo, Orange Labs, France
Freddy Limpens, EDELWEISS, INRIA Sophia-Antipolis, France
Fabien Gandon, EDELWEISS, INRIA Sophia-Antipolis, France
Olivier Corby, EDELWEISS, INRIA Sophia-Antipolis, France
Michel Buffa, University of Nice Sophia Antipolis, France
Mylène Leitzelman, University of Nice Sophia Antipolis, France
Peter Sander, University of Nice Sophia Antipolis, France

The World Wide Web has been evolving into a read-write medium permitting a high degree of interaction between participants, and social network analysis (SNA) seeks to understand this on-line social interaction, for example by identifying communities and sub-communities of users, important users, intermediaries between communities, etc. Semantic web techniques can explicitly model these interactions, but classical SNA methods have only been applied to these semantic representations without fully exploiting their rich expressiveness. The representation of social links can be further extended thanks to the semantic relationships found in the vocabularies (tags, folksonomies) shared by the members of these networks. These enriched representations of social networks, combined with a similar enrichment of the semantics of the meta-data attached to the shared resources, will allow the elaboration of shared knowledge graphs. In this chapter we present our approach to analyzing such semantic social networks and capturing collective intelligence from collaborative interactions to challenge requirements of Enterprise 2.0.
Our tools and models have been tested on an anonymized dataset from Ipernity.com, one of the biggest French social web sites centered on multimedia sharing. This dataset contains over 60,000 users, around half a million declared relationships of three types, and millions of interactions (messages, comments on resources, etc.). We show that the enriched semantic web framework is particularly well-suited for representing online social networks, for identifying their key features and for predicting their evolution. Organizing huge quantities of socially produced information is necessary for a future acceptance of social applications in corporate contexts.

Chapter 8
Using Social Network Analysis to Guide Theoretical Sampling in an Ethnographic Study of a Virtual Community
Enrique Murillo, Instituto Tecnológico Autónomo de México – ITAM, Mexico

Social Network Analysis (SNA) provides a range of models particularly well suited for mapping bonds between participants in online communities and thus revealing prominent members or subgroups. This can yield valuable insights for selecting a theoretical sample of participants or participant interactions in qualitative studies of communities. This chapter describes a procedure for collecting data from Usenet newsgroups, deriving the social network created by participant interaction, and importing this relational data into SNA software, where various cohesion models can be applied. The technique is exemplified
by performing a longitudinal core-periphery analysis of a specific newsgroup, which identified core members and provided clear evidence of a stable online community. Discussions dominated by core members are identified next, to guide theoretical sampling of text-based interactions in an ongoing ethnography of the community.

Section 3
Tools and Techniques for Analysis and Building of Virtual Communities

Chapter 9
Graphically Mapping Electronic Discussions: Understanding Online Conversational Dynamic
Jennifer Howell, Australian Catholic University Limited, Australia

Transcripts of electronic discussions have traditionally been examined via the use of conversational analysis techniques. Coding such transcripts provides rich data regarding the content and nature of the discussions that take place. However, understanding the content of the messages is not limited to the actual message itself. An electronic message is sent either in response to or to start a discussion thread. Examining the entry point of a new message can help to clarify the dynamics of the community discussion. Electronic discussions do not appear to follow traditional conversational norms. New messages may be immediate responses or they can be responses to messages posted over a longer period of time in the past. However, by graphically mapping electronic discussions, a clearer understanding of the dynamics of electronic discussions can be achieved. This chapter will present the findings of a study that was conducted on three online communities for teachers. The transcripts of electronic discussions were collected and examined via conversational analysis. These messages were then analysed via graphical mapping, and the findings indicated that there are three distinct patterns that electronic discussions may follow. It was further discovered that each of these patterns was indicative of a distinct type of electronic discussion.
The findings from this study offer further insight into the nature of online discussions and help to understand online conversational dynamics.

Chapter 10
A Tool to Study the Evolution of the Domain of a Distributed Community of Practice
Gilson Yukio Sato, Federal University of Technology - Paraná, Brazil
Hilton José Silva de Azevedo, Federal University of Technology - Paraná, Brazil
Jean-Paul Barthès, Université de Technologie de Compiègne, France

Virtual communities and distributed communities of practice leave traces of their activities that are a valuable source of research material. At the same time, studying this kind of community requires new methods, techniques and tools. In this chapter, we present the Community Agent: a tool to follow the evolution of the domain of a distributed Community of Practice. Such a tool aims at obtaining and presenting graphically some indicators to study the evolution of the domain of a Community of Practice and the participation of its members. We present the implementation of the Community Agent, the results obtained in the preliminary tests and an example of how the agent could be used to study distributed communities.
Chapter 14
Conversation Analysis as a Tool to Understand Online Social Encounters
Aik-Ling Tan, Nanyang Technological University, Singapore
Seng-Chee Tan, Nanyang Technological University, Singapore

This chapter focuses on the application of Conversation Analysis (CA) as a tool to understand online social encounters. Complementing current analytic methods like content analysis and social network analysis, analytic tools like Discussion Analysis Tool (DAT) (Jeong, 2003) and Transcript Analysis Tool (TAT) (Fahy, Crawford, & Ally, 2001) have been developed to study both the content of online discussions as well as the interactions that take place among the participants. While these new tools have devoted certain attention to the development of social interactions, insights into how online participants form alliances among themselves and mechanisms for repairing a conversation when it breaks down remain lacking. Knowledge of online social order (or the lack thereof), both its genesis as well as maintenance, is essential as it affects the processes and intended learning outcomes in an online community. We argue that using CA, while not popularly applied for the analysis of online discussions, gives the much-needed focus on the minute details of online interactions that are important to understanding social orderliness of conversations in a virtual community. In this chapter, we illustrate how CA can be applied in analysis of online discussion by applying Freebody’s (2003) six analytic passes and suggest that CA may be used as an alternative analytic tool in a virtual environment where conversations are generally asynchronous. These six analytic passes are: (1) turn taking, (2) building exchanges, (3) parties, alliances and talk, (4) trouble and repair, (5) preferences and accountability, and (6) institutional categories and the question of identity.
Section 4
Data and User Modelling

Chapter 15
Modeling the Diversity of User Behavior in Online Communities
Tad Hogg, Hewlett-Packard Laboratories, USA
Gabor Szabo, Hewlett-Packard Laboratories, USA

This chapter describes models of the diversity of behavior seen in online communities, in particular how users contribute and attend to content, and how they form social links with their peers. We illustrate the models and parameter estimation procedure with a political discussion community. The models identify key characteristics of users and the web site design leading to the diverse behaviors, and suggest future experiments to identify causal mechanisms producing these characteristics.

Chapter 16
Context and Explanation in e-Collaborative Work
Patrick Brézillon, University Paris 6 (UPMC), France

In a face-to-face collaboration, participants use a large part of contextual information to translate, interpret and understand others’ utterances by using contextual cues like mimics, voice modulation, movement of a hand, etc. Such a shared context constitutes the collaboration space of the virtual community.
Explanation generation, on the one hand, reinforces the shared context and, on the other hand, relies on the existing shared context. The situation is more critical in e-collaboration than in face-to-face collaboration because new contextual cues are to be used. This chapter presents the benefits of making context and explanation generation explicit in e-collaboration, and the types of new paradigms that then exist.

Chapter 17
Intelligent LMS with an Agent that Learns from Log Data in a Virtual Community
Maomi Ueno, The University of Electro-Communications, Japan

This study describes an agent that acquires domain knowledge related to the content from a learning history log database in a learning community and automatically generates motivational messages for the learner. The unique features of this system are as follows: The agent builds a learner model automatically by applying the decision tree model. The agent predicts a learner’s final status (Failed; Abandon; Successful; or Excellent) using the learner model and his/her current learning history log data. The constructed learner model becomes more exact as the amount of data accumulated in the database increases. Furthermore, the agent compares a learner’s learning processes with “Excellent” status learners’ learning processes stored in the database, diagnoses the learner’s learning processes, and generates adaptive instructional messages for the learner. A comparison between a class of students that used the system and one that did not demonstrates the effectiveness of the system.

Chapter 18
A Computational Model of Social Capital in Virtual Communities
Ben Kei Daniel, University of Saskatchewan and Saskatoon Health Region, Canada

This chapter presents a Bayesian Belief computational model of social capital (SC) developed within the context of virtual communities.
The development of the model was based on insights drawn from more than five years of research into social capital in virtual communities. The chapter discusses the key variables constituting social capital in virtual communities and shows how the model was updated using practical scenarios. The scenarios describe authentic cases drawn from several virtual communities. The key issues predicted by the model as well as challenges encountered in building, verifying and updating the model are discussed.
Section 5
Methods, Measurements and Matrices

Chapter 19
A Beginner’s Guide to Geographic Virtual Communities Research
Brent Hecht, Northwestern University, USA
Darren Gergle, Northwestern University, USA

This chapter is effectively an introductory lesson in Geographic Information Systems (GIS) and Geographic Information Science (GIScience), customized for the virtual communities researcher.
Chapter 20
A Theoretical Method of Measuring Virtual Community Health and the Health of their Operating Environment in a Business Setting
Brent Robertson, Sancor, Canada

This chapter discusses how virtual communities are associated with business and describes how the communities support the overall business effort. The chapter then examines the ways that the execution of certain business processes – such as the ‘lessons learned process’ – can have a strong supporting role in maintaining the health of virtual communities. Quantitatively measuring key aspects of these business processes provides a strong indication of the health of virtual communities that are linked to the process. The chapter introduces a measurement by objectives system, describes how it can be used to assess the health of virtual communities and how this can be extrapolated to assess the supportive nature of the overall business environment the communities are operating in.

Chapter 21
Building Web Communities: An Example Methodology
Jan Isakovič, Artesia, Slovenia
Alja Sulčič, Artesia, Slovenia

The aim of the chapter is to provide an example of community definition and community building methodology using a step-by-step approach. The presented community specification and building methodology allows refining a broad community purpose into specific measurable goals, selects the social media tools that are best matched with the company needs and results in a platform specification that can be relatively simply transformed into software specifications or platform requirements.

Chapter 22
Virtual Geodemographics: Consumer Insight in Online and Offline Spaces
Alex D. Singleton, University of Liverpool, UK

Computer-mediated communication and the Internet have fundamentally changed how consumers and producers connect and interact across real space, and have also opened up new opportunities in virtual spaces. This book chapter describes how technologies capable of locating and sorting networked communities of geographically disparate individuals within virtual communities present a sea change in the conception, representation and analysis of socioeconomic distributions through geodemographic analysis. It is argued that through virtual communities, social networks between individuals may subsume the role of neighborhood areas as the most appropriate unit of analysis, and as such, geodemographics needs to be repositioned in order to accommodate social similarities in virtual, as well as geographical, space. The chapter ends by proposing a new model for geodemographics which spans both real and virtual geographies.
Chapter 23
Development of Cellular Phone Software to Prompt Learners to Monitor and Reorganize Division of Labor in Project-Based Learning
Toshio Mochizuki, Senshu University, Japan
Kazaru Yaegashi, Ritsumeikan University, Japan
Hiroshi Kato, The Open University of Japan, Japan
Toshihisa Nishimori, The University of Tokyo, Japan
Yusuke Nagamori, Tsukuba University of Technology, Japan
Shinobu Fujita, Spiceworks Corporation, Japan

The authors have developed a cellular phone application called ProBoPortable that displays information regarding the progress and achievement of tasks and division of labor in project-based learning (PBL) for higher education. ProBoPortable works as wallpaper on the screen of the learner’s cellular phone, and it cooperates with Web-based groupware. When a learner activates his/her phone, ProBoPortable immediately retrieves the current status of the appropriate project from the groupware database and displays it on the screen. A classroom evaluation was performed in an undergraduate course; the evaluation confirmed that ProBoPortable enhanced mutual awareness of the division of labor among learners, who modified their own tasks by monitoring the overall status of the PBL. Using ProBoPortable increasingly fostered the sense of a learning community among the subjects. Moreover, social facilitation encouraged the learners to proceed with their own task due to the presence of others who are mutually aware of each member’s status.

Chapter 24
Mathematical Retrieval Techniques for Online Mathematics Learning
Le Van Tien, Hochiminh City University of Technology, Vietnam
Quan Thanh Tho, Hochiminh City University of Technology, Vietnam
Hui Siu Cheung, Nanyang Technological University, Singapore

In recent years, the amount of computer-aided educational software in the field of mathematics has been increasing.
Currently, there are some research prototypes and systems that assist in finding mathematical problems. However, when finding appropriate mathematical expressions, most of these systems only support mechanisms to search for expressions in a strictly exact manner, or to search for similar problems based on wildcards, not on the similarity of expression structures and semantic meanings. Such mechanisms restrict users significantly from achieving meaningful and accurate search results of mathematical expressions. In this chapter, we introduce a mathematical retrieval system that helps mathematics learners self-study effectively. The most important module in our system is the math-retrieving system module, which receives the analyzed problems submitted from users, retrieves solutions from similar stored problems and ranks the retrieved problems to users. To fulfill these requirements, we have researched and proposed some advanced mathematical retrieval and mathematical ranking techniques. Experiments have shown that our proposed techniques are highly suitable for mathematical retrieval as they outperformed the techniques used in typical document retrieval systems.
Chapter 25
Online Ethnographic Methods: Towards a Qualitative Understanding of Virtual Community Practices
Jörgen Skågeby, Linköping University, Sweden

This chapter describes the use of online ethnographical methods as a potent way to reach qualitative understanding of virtual communities. The term online ethnography envelops document collection, online observation and online interviews. The chapter will explain the steps of conducting online ethnography – from defining setting and spelling out your research perspective, to collecting online data, analyzing gathered data, feeding back insights to the studied community and presenting results with ethical awareness. In this process the chapter will compare online ethnography to traditional ethnography and provide illustrative empirical examples and experiences from three recent online ethnographical studies on social information and media sharing (Skågeby, 2007, 2008, 2009a). While multimedial forms of data and data collection are becoming more common (i.e. video and sound recordings), the focus of the chapter lies mainly with text-based data. The chapter concludes by discussing methodological benefits and drawbacks of an online ethnographical process.

Volume II

Chapter 26
Understanding Online Communities by Using Semantic Web Technologies
Alexandre Passant, National University of Ireland-Galway, Ireland
Sheila Kinsella, National University of Ireland-Galway, Ireland
Uldis Bojars, National University of Ireland-Galway, Ireland
John G. Breslin, National University of Ireland-Galway, Ireland
Stefan Decker, National University of Ireland-Galway, Ireland

During the last few years, the Web that we used to know as a read-only medium shifted to a read-write Web, often known as Web 2.0 or the Social Web, in which people interact, share and build content collaboratively within online communities. In order to clearly understand how these online communities are formed, evolve, share and produce content, a first requirement is to gather related data. In this chapter, we give an overview of how Semantic Web technologies can be used to provide a unified layer of representation for Social Web data in an open and machine-readable manner thanks to common models and shared semantics, facilitating data gathering and analysis. Through a comprehensive state of the art review, we describe the various models that can be applied to online communities and give an overview of some of the new possibilities offered by such a layer in terms of data querying and community analysis.

Chapter 27
Understanding and Using Virtual Ethnography in Virtual Environments
Robert Hancock, Southeastern Louisiana University, USA
Mindy Crain-Dorough, Southeastern Louisiana University, USA
Becky Parton, Southeastern Louisiana University, USA
Jeff Oescher, Southeastern Louisiana University, USA
This chapter proposes to outline a process of virtual ethnography that combines emic and etic methods of data gathering adapted to the virtual context to provide a ‘true’ (Richardson, 2000) accounting of the social constructs inherent in the virtual world. The first section of this chapter discusses the unique characteristics of virtual ethnography when used to explore virtual environments such as Second Life or MMORPGs such as World of Warcraft. The second section presents some of the methodological issues related to conducting such research. Finally, the third section offers for consideration some unique challenges related to the application of such methods. Two concerns are discussed: 1) identifying and understanding the phenomenological structures unique to a particular virtual environment and 2) the implications of such knowledge with regard to the design of new virtual educational environments.

Chapter 28
Participant-Observation as a Method for Analyzing Avatar Design in User-Generated Virtual Worlds
Jeremy O. Turner, Simon Fraser University Surrey, Canada
Janet McCracken, Simon Fraser University Surrey, Canada
Jim Bizzocchi, Simon Fraser University Surrey, Canada

This chapter explores the epistemological and ethical boundaries of the application of a participant-observer methodology for analyzing avatar design in user-generated virtual worlds. We describe why Second Life was selected as the preferred platform for studying the fundamental design properties of avatars in a situated manner.
We will situate the specific case study within the broader context of ethnographic qualitative research methodologies, particularly focusing on what it means to live – and role-play – within the context that one is studying, or to facilitate prolonged engagement in order to have the research results accepted as trustworthy or credible (Lincoln & Guba, 1985). This chapter describes a case study from which researchers can extract methods and techniques for studying “in-world” workshops and focus groups. Our speculations and research questions drawn from a close analysis of this case study will illuminate the possible limitations of applying similar hybrid iterations of participation-observation tactics and translations of disciplinary frameworks into the study of user-generated content for future virtual world communities. Finally, we will review the broader epistemological and ethical issues related to the role of the participant-observation researcher in the study of virtual worlds.

Chapter 29
Participatory Design Experiment: Storytelling Swarm in Hybrid Narrative Ecosystem
Kai Pata, Tallinn University, Estonia

This chapter describes a participatory design experiment that is influenced by the swarming activity. The chapter introduces a new approach to writing narratives in virtual learning communities of the social Web 2.0 and contrasts it with traditional storytelling approaches. In the participatory design experiment we developed a hybrid virtual storytelling playground that augments the real world – a hybrid ecosystem of narratives. It consists of social software tools freely available in the Web, such as microblogs, social repositories of images, and blogs, the real locations in the city, and the storytellers who leave their digital contents. The results of writing narratives as a swarm in a hybrid ecosystem are presented.
In our experiment, instead of bending old novel formats into the hybrid ecosystem, evidence of new, evolving narrative formats in this hybrid space was explored.
Chapter 30
Researching Community in Distributed Environments: Approaches for Studying Cross-Blog Interactions
Vanessa P. Dennen, Florida State University, USA
Jennifer B. Myers, Florida State University, USA
Christie L. Suggs, Florida State University, USA

In this chapter we examine how a variety of research approaches can be applied to the study of cross-blog interactions. Cross-blog interactions can be challenging to study because they often require the researcher to reconsider traditional notions of temporality, discourse space, and conversation. Further, in many instances they are neither static nor well defined; defining the beginning and end of a discussion as well as locating all components of the discussion can be difficult. For this reason, we advocate a blend of six approaches (social network analysis, content analysis, discourse analysis, conversation analysis, narrative analysis, and ethnography). For each, we discuss strengths and limitations and provide examples of how the approach may be used to help fully capture the complexity of these interactions. Additionally we discuss web-based tools that are helpful when engaged in this type of research.

Chapter 31
Methods for the Measurement and Visualization of Social Networks in Multi-User Virtual Worlds
Devan Rosen, Ithaca College, USA

Virtual communities that allow many users to interact in a virtual world, often called multi-user virtual worlds (MUVWs), allow users to explore and navigate the virtual world as well as interact with other users. The communicative interaction within these virtual worlds is often text-based using Internet relay chat (IRC) and related systems.
IRC has posed a difficulty for researchers looking to evaluate the interaction by analyzing and interpreting the communication since data is stored in the form of chatlogs. The current chapter explicates methodological procedures for the measurement and visualization of chat-based communicative interaction in MUVWs as social networks. A case study on an educational MUVW, the SciCentr programs sponsored by Cornell University, is used to elaborate methods and related findings.

Chapter 32
Online Multi-Contextual Analysis: (Re)connecting Social Network Site Users with Their Profile
Alyson Young, University of Maryland, Baltimore County, USA
David Gurzick, University of Maryland, Baltimore County, USA
Anabel Quan-Haase, The University of Western Ontario, Canada

This chapter proposes online multi-contextual analysis (OMCA) as a new multi-method approach for investigating and analyzing the behaviors, perceptions, and opinions of social network site (SNS) users. This approach is designed to extend methods currently available for the investigation of the use and social consequences of these sites with techniques that converge upon and triangulate users’ perceptions of their online behavior. Using quantitative measures of SNS usage, OMCA provides a much neglected
level of analysis. We discuss current methodological practice in SNS research and introduce OMCA as an alternative approach. We then describe two studies that have employed OMCA to illustrate the method’s diversity and potential for providing new insights. Finally, we discuss the strengths and weaknesses of OMCA in comparison to single approaches and draw conclusions for theories of SNSs. Chapter 33 Participant Observation in Online Multiplayer Communities............................................................. 555 Marko Siitonen, University of Jyväskylä, Finland This chapter discusses participant observation as a method of data collection for studying social interaction in online multiplayer games and the communities within them. Participant observation has its roots in the social sciences, and especially in the field of anthropology. True to a natural inquiry approach, studies utilizing participant observation try to understand the actual habitat or “lifeworld” of those participating in the study. This chapter looks at various practical issues connected to conducting participant observation in online multiplayer communities. Examples of data collection are presented, including saving log files, capturing images and video, and writing field notes. Participant observation seems well suited for studying online communities since it can respond well to the challenges of the ever-changing technology and social situations, the need to take into account multiple channels of communication, and the complex and sometimes hidden nature of computer-mediated social interaction. Chapter 34 Proposed Techniques for Data Collection and Analysis in the Study of News-Oriented Virtual Communities............................................................................................................................ 568 William J. Gibbs, Duquesne University, USA Joseph E. 
McKendrick, McKendrick and Associates, USA
News providers today offer interactive sources that engage people and enable them to build community and to participate in the news. At the same time, the digital interfaces through which people access the news are continually evolving, diverse, and oftentimes visually complex. How these factors shape human information seeking in news-oriented virtual communities is a relatively new area of study, and a greater understanding of their influence on human behavior is therefore of much practical value. In this chapter, the authors explore trends and developments in news-oriented virtual communities. They review several data collection and analysis techniques, such as content analysis, usability testing and eye-tracking, and propose that these techniques and associated tools can aid the study of news communities. They examine the implications these techniques have for better understanding human behavior in virtual communities as well as for improving the design of these environments.

Chapter 35
Challenges of Analyzing Informal Virtual Communities
Nancy Poon, University of Saskatchewan, Canada
Ben Kei Daniel, University of Saskatchewan and Saskatoon Health Region, Canada
Drawing from previous research, this chapter presents major challenges associated with the analysis of interaction patterns in informal virtual communities. Using social network as well as content analysis to
understand the structure and nature of interaction in such virtual communities, the goal was to understand the physical structure of the community as well as the nature of the themes discussed by community members in an attempt to build a theoretical model of interactions.

Chapter 36
Research Methods for Studying Virtual Communities
Nana Adu-Pipim Boaduo, University of the Free State, South Africa
Very often, virtual community student researchers find it difficult to decide on methodological paradigms, the choice of methods, and their application in a given research study. They may live thousands of kilometres from their study supervisors. Some of them might not have had the opportunity to acquire basic research knowledge and skills, while others may have trained in advanced research methods. This chapter caters for both of these groups of virtual community readers. In many instances the only possible means of contact may be by phone or by the Internet. The problems of distance and non-physical contact with their supervisors may deter virtual community researchers from engaging in regular research activities. Compounding the problem, authors of research books rarely discuss: the philosophical underpinnings of both qualitative and quantitative methods; how qualitative and quantitative methods can be applied in a research study; where they can be applied in the study; when to apply them in the study; and what to do to enable the virtual researcher to make an informed professional decision about the choice of methodology. Coupled with these dilemmas are the virtual community researchers’ choices of framework for data collection, treatment, analysis and interpretation to make the study report a professional masterpiece.
This chapter discusses basic research methodologies to place virtual community researchers in a comfortable position and clarifies the dilemma inherent in the virtual community research fraternity. Later, the chapter offers an advanced discussion of systematic methodological application, in which data collected for a research study can be conveniently treated, analysed and interpreted so that the resulting research report is a professional contribution to the knowledge base.

Chapter 37
Methodological Considerations for Quantitative Content Analysis of Online Interactions
Seng-Chee Tan, Nanyang Technological University, Singapore
Hyo-Jeong So, Nanyang Technological University, Singapore
Ching-Sing Chai, Nanyang Technological University, Singapore
This chapter focuses on quantitative content analysis of online interactions, in particular, asynchronous online discussion. It clarifies the definitions of quantitative content analysis and provides a summary of 23 existing coding schemes, broadly categorized by the theoretical constructs under investigation: (1) (meta)cognition, (2) knowledge construction, and (3) presence. To help interested researchers harvest the rich source of data in online communities, guidelines for using quantitative content analysis of online interactions are provided. In addition, important methodological considerations and issues are discussed, including the issues of validity, reliability, choice of unit of analysis, and latent versus manifest content.
Chapter 38
Measuring Brand Community Strength
Hikaru Yamamoto, Seikei University, Japan
Yutaka Matsuo, University of Tokyo, Japan
The emphasis of this chapter is brand community. A brand community is a virtual community where consumers who share a set of social relations based upon usage of or interest in a product gather into a group and mutually interact. Consumers’ purchase decision-making is often influenced by word-of-mouth communications with other consumers; whom to trust among them is often determined by their similarity of product purchase behavior. This bidirectional effect between trust and product preference explains the emergence and the strength of brand community. This chapter presents a theoretical model of this phenomenon along with analyses of an actual virtual community. We designate the bidirectional effect as community gravity because it represents the power to induce users to join the community. This analysis provides insights for understanding consumer behavior in an online environment.

Chapter 39
An Approach for Analysing Interactions within Virtual Learning Communities
Manuela Repetto, Institute for Educational Technology, National Research Council, Italy
The aim of the contribution is to present a novel systematic model of interaction analysis which was designed and successfully tested with a wide sample of adult learners in order to understand and enhance the cognitive, socio-organizational and emotional-affective processes of virtual learning communities (VLCs). Starting from the strengths and weaknesses of the present models and methodologies of interaction analysis, the mixed methodological approach adopted to develop this novel interaction analysis model is illustrated.
The model is organised in five categories and about thirty indicators, and it can be applied through the development of a coding scheme, a self-assessment questionnaire for learners, and an assessment grid for tutors. Triangulation of data obtained from these tools and their integration with ethnographic analysis make this approach for analysing interactions a reliable means to allow assessment and self-regulation of learners, while exploring the nature of learning within virtual learning environments (VLEs).

Section 6
Online Phenomena and Case Studies

Chapter 40
The Sense of e-Learning Community Index (SeLCI) for Computer Supported Collaborative e-Learning (CSCeL)
Niki Lambropoulos, London South Bank University, UK
The aim of this research is to shed light on collaborative e-learning communities in order to observe, analyse and support the e-learning participants. The research context is the Greek teachers’ e-learning community, started in 2003 as part of a project for online teachers’ training and aimed at enabling teachers to acquire new competencies. However, these aims were not met because of passive participation;
therefore this study aimed to enhance the Greek teachers’ social engagement to achieve the new skills acquisition. The initial sense of community identification was based on empathy; however, because it was inadequate to fully describe the context, a Sense of E-Learning Community Index (SeLCI) was developed. The new SeLCI attributes were: community evolution; sense of belonging; empathy; trust; intensity, characterised by e-learners’ levels of participation and persistence in posting; collaborative e-learning quality, measured by the quality of Computer Supported Collaborative eLearning (CSCeL) dialogical sequences and participants’ reflections on their own learning; and social network analysis based on global cohesion (anchored in density, reciprocity, cliques and structural equivalence), global centrality (derived from in- and out-degree centrality and closeness), and local nodes and centrality in real time. Forty Greek teachers participated in the study for 30 days using Moodle, enhanced with tools to measure participation, local Social Network Analysis and critical thinking levels in CSCeL. Quantitative and qualitative methods, Social Network Analysis and measurements produced by the tools were used for data analysis. The findings indicated that each of the SeLCI attributes is essential to enhance participation, collaboration, and internalisation and externalisation of knowledge to ensure e-learning quality and new skills acquisition. Affective factors in CSCeL (sense of belonging, empathy and trust) were also essential to increase reciprocity and promote active participation. Community management, e-learning activities and, lastly, the technology appear to affect CSCeL.

Chapter 41
Tracer Studies: A Concrete Approach to a Virtual Challenge
Nancy Brigham, Rosenblum Brigham Associates, USA
This chapter introduces Tracer Study methodology, a cost-effective, capacity-building tool for evaluating the operations and effectiveness of Virtual Communities of Practice (VCoPs). We make the argument that a VCoP is a dynamic, continually evolving entity, whose characteristics distinguish it in important ways from naturally occurring or purposively planned communities of practice operating in the face-to-face world. As a result, VCoPs lend themselves to evaluation by means of Tracer Studies, a methodology that originated in the field of knowledge utilization and has been adapted to assess how a VCoP operates and the extent to which it is successful in promoting knowledge use and dissemination. The chapter provides historical background on VCoPs, defines Tracer Studies and demonstrates the types of information that may be derived from a Tracer Study evaluation. We also discuss the application of Tracer Study methodology to the evaluation of VCoPs sponsored by a private education organization.

Chapter 42
Digital Museums in 3D Virtual Environment
Kingkarn Sookhanaphibarn, Ritsumeikan University, Japan
Ruck Thawonmas, Ritsumeikan University, Japan
This chapter presents an overview of the field of digital museums and describes the current framework of content management systems that can feasibly be integrated into museums in 3D virtual environments to help visitors deal with information overload and to provide them with personalized recommendations, content, and services. Digital museums in 3D virtual environments are an intriguing alternative way for visitors to experience them, compared to thousands of existing digital museums that are similar to
digital archiving places published on the Internet. Exemplary characteristics of digital museums in Web 1.0, Web 2.0, and Second Life are also reviewed and discussed. Moreover, a prior classification of visiting styles, essential to personalizing the museum context and content, is described in this chapter.

Chapter 43
Weaving the Social Fabrics: Recognizing Social Signals to Support Awareness and Group Interaction in Online Games
Tiffany Y. Tang, Konkuk University, South Korea
Pinata Winoto, Konkuk University, South Korea
Users in rich social media environments such as Massively Multiplayer Online Games (MMOGs) accomplish various kinds of tasks through maintaining a constant high degree of awareness and social awareness. Generally, being aware of each other’s presence provides a clue for one’s own action in a situated environment. It guides one’s own actions accordingly, and serves as a virtual trace for coordinating and collaborating with partners. The ability to appropriately incorporate social spaces in the design of MMOGs’ socially oriented game elements is critical. In other words, do MMOGs and their designs facilitate social interactions from the players’ perspective? In order to shed light on this issue, we conducted a series of usability studies through typical ethnographic evaluation of The Sims Online (TSO) and two other MMOGs. Our findings are mixed: while players acknowledged that tools and group-oriented tasks exist in the game, their usability is inadequate; that is, they are not well utilized by the players, and in some cases there are too many of them, which makes it difficult for players to decide which one(s) to notice. In addition, some of these tools are not readily accessible to players to unfold some critical information before or during their interactions with others.
Similar findings were obtained from our study on a number of other MMOGs. This chapter describes our evaluation, which sheds light on the impact of appropriate technology and its design elements in promoting and supporting social awareness and seamless group interactions.

Chapter 44
Studying Social Capital in the New Communitarian Horizon: A Multi-Method Research Strategy
José Luis López Aguirre, Universidad Panamericana, México
Characterized by the virtualization vs. materialization of social interaction spaces, current communitarian scenarios raise a series of doubts about how new technologies are transforming the ability of humans to associate with others over space and time. This uncertain atmosphere shifts our methodological approaches from studying virtual communities to studying the communitarian environment through the analysis of an essential attribute that determines the existence of a community: social capital. This chapter presents a multi-method research strategy that allows the study of social capital in these hybrid communities, in which the only stable element on which to perform the analysis is the person, understood as the central node where different social groups converge in physical and virtual interaction nets and where ultimately communitarian feelings are cherished.

Compilation of References
Foreword
If you go back to the earliest days of virtual communities, methods were simple: journalists and researchers participated, and then wrote about it. Lindsy Van Gelder met “Joan” (who was really “Alex”) on CompuServe in 1983, and wrote “The Strange Case of the Electronic Lover” for Ms. Magazine in 1985 (Van Gelder, 1985). Howard Rheingold got advice from a friendly pediatrician in the middle of the night, chatting with friends on the WELL in the early 1990s (Rheingold, 1993). When Judith Donath was engaged to be married, she hung out on the brides group on USENET and then used concepts from animal behavior to understand what she observed (Donath, 1998). Just explaining the medium to the public and to scholars was half the battle. Applying established theory to understand your personal experiences was cutting edge.
At the time of this writing in 2010, virtual communities/social computing are now part of mainstream popular culture. The medium is pervasive in industrialized nations, and mobile computing is growing explosively in developing nations. As virtual communities have accelerated in popularity, our need to understand them has grown commensurately. Our teenagers are gaming and texting, our elderly parents are renewing friendships online that are 50 years old, and our businesses are locating and vetting new suppliers on other continents. What does it all mean? How do we begin to tease apart the evolving socio-technical system that is the Internet today?
With the rise of the importance of social computing comes a need for a wide range of methods to study these phenomena carefully. In this volume, Ben Kei Daniel has pulled together a global, savvy group of authors to survey a broad spectrum of methods and approaches. These methods borrow from a variety of disciplines. Ethnographic methods have their roots in anthropology, and social network analysis has its roots in quantitative sociology.
Semantic network approaches have their roots in computer science and artificial intelligence. Conversation analysis comes from linguistics. And that doesn’t mention work in this volume coming from researchers in management, geography, mathematics and education. Most of the tools and projects described in this volume draw on not just one of these disciplines, but use multiple approaches in a complementary fashion. Together, these chapters provide a window on our growing methodological sophistication in how to understand virtual communities.
Amy Bruckman
Atlanta, Georgia
June 2010
Amy Bruckman is an associate professor in the School of Interactive Computing at the Georgia Institute of Technology. She and her students in the Electronic Learning Communities (ELC) research group do research on social computing, particularly for educational applications. She is interested in the ways that we can design online communities to encourage individuals
to create and share content online, and learn through that process. Dr. Bruckman received her Ph.D. from the MIT Media Lab's Epistemology and Learning group in 1997, her M.S.V.S. from the Media Lab's Interactive Cinema Group in 1991, and a B.A. in physics from Harvard University in 1987. In 1999, she was named one of the 100 top young innovators in science and technology in the world (TR100) by Technology Review magazine. In 2002, she was awarded the Jan Hawkins Award for Early Career Contributions to Humanistic Research and Scholarship in Learning Technologies.
REFERENCES
Donath, J. (1998). Identity and Deception in the Virtual Community. In P. Kollock & M. Smith (Eds.), Communities in Cyberspace. Routledge.
Rheingold, H. (1993). The Virtual Community: Homesteading on the Electronic Frontier. Reading, MA: Addison-Wesley Publishing Company.
Van Gelder, L. (1985). The Strange Case of the Electronic Lover. Ms.
Preface
INTRODUCTION
The 21st century has witnessed a phenomenal increase in the number of virtual communities. This growth signifies our growing desire to connect, work, share, exchange, play and socialize with others irrespective of time, space, speed and distance. Today, more and more people are using social software such as Facebook, Twitter, MySpace, blogs, wikis, LinkedIn, and many others, to help them carry out their daily activities. As new technologies become an increasingly interwoven aspect of our everyday lives, it has become apparent that traditional methods for studying the social systems that characterize some of these technologies often lack the detailed understanding of aspects of human, social and cultural life that is required. Since virtual communities and the phenomena inherent in them are emergent, there is still a lack of robust methods and approaches to study and understand virtual communities in breadth and depth. Clearly, such understanding is critical if we are to provide complete and useful information systems, build better tools, and develop lean and efficient processes that can make interactions in these communities more productive, trustworthy, safe, secure and fun.
Currently, the massive utilization of virtual communities generates huge volumes of data which, if systematically captured and appropriately analyzed, would be invaluable for increasing understanding of the social, educational and technological phenomena happening in these communities. Further, the availability of tracking and analytic tools, as well as the development of robust just-in-time data visualization software, has created unprecedented opportunities to help researchers answer questions they have entertained only theoretically for decades, largely due to the difficulty of directly observing the social relations inherent in these communities.
This Handbook of Research on Methods and Techniques for Studying Virtual Communities: Paradigms and Phenomena calls for a reorientation of research directions, methods and techniques for studying virtual communities. The book satisfies the need for diverse and yet coherent methodological considerations and tools for data collection, analysis and presentation on virtual communities. Drawing from a wide variety of disciplines and sectors, methods covered in the book include qualitative, quantitative, and mixed methods, social network analysis, content analysis, program evaluation, discourse analysis, data mining, and data and user modelling. Metrics for measuring virtual communities are also discussed. Moreover, case studies on important emergent phenomena in virtual communities are presented.
PURPOSE OF THE BOOK Virtual communities have become a subject of considerable interest in both research and practice. These communities encompass a broad spectrum of activities, ranging from social networking, knowledge
networking, health and health care, education and the economy. Attempts to evaluate the performance of virtual communities depend on how “success” is defined and measured, on the perspective of the researcher, on the sector they are associated with, and on the type of community being investigated. In the past, researchers have used various methods and metrics to investigate and measure different phenomena in virtual communities. Some researchers employed rigorous research methods such as social network analysis; others used traditional qualitative or quantitative methods and data mining techniques, while others have relied on ad hoc methods. This is the first book that brings together a number of methods for examining virtual communities. The book describes various research methods relevant for virtual communities and provides readers with ways in which to apply these methods. The methods and techniques presented in the book are mainly based on empirical research. Since there is currently no comprehensive book on research methods for studying virtual communities, this book is likely to have a substantial impact on scholarly and practical knowledge about doing research on virtual communities. In addition, the book makes a strong theoretical and practical contribution to the field.
TARGET AUDIENCE
This is a reference book, primarily intended for advanced undergraduate and graduate students and researchers interested in studying and building tools to support virtual communities. The book will be useful to programs in Computer Science, Educational Technology, Information Studies, Business and many other disciplines in the Humanities and the Social Sciences.
BENEFITS AND SCHOLARLY VALUE
This book is a practical and immediately useful reference for researchers, technologists, instructors and graduate as well as senior undergraduate students who want to better understand how to use scientific research methods to study virtual communities. Contributors also write about the nature of relevant tools for data collection and analysis. The major contributions of the book, however, are its international and diverse chapters, the breadth and depth of the issues covered, and the detailed discussions and presentations of various methods for studying virtual communities, illustrated with practical examples drawn from current research.
ORGANIZATION OF THE BOOK
The book has 44 chapters, which are spread across 5 sections. Section 1 of the book consists of chapters focused on an overview of virtual communities and the philosophical foundations of learning, teaching and engagement in virtual communities. Section 2 presents social and semantic network analysis of various aspects of virtual communities as well as dynamic models of virtual communities. In Section 3, chapters deal with methods and methodology for studying virtual communities. Section 4 introduces chapters describing various measures and approaches for studying virtual communities. Section 5 presents case studies on various technical and social aspects of the phenomenon of virtual communities. An overview of each section of the book and the chapters in it is provided at the beginning of each section.
Acknowledgment
This excellent volume is a collaborative project. I am grateful to many people for making this project a great success. I would like to thank all the contributors to this volume for their dedication and time. Thanks are also due to the members of the Editorial Advisory Board for their guidance and advice. Further, I am grateful to all the anonymous reviewers for taking their valuable time to review all the chapters and provide constructive feedback. I would also like to take this opportunity to thank my mentors Dr. Beth Horsburgh and Dr. Veronika Makarova, who provided me with numerous opportunities to develop and extend my research insights to practical domains and within clinical settings. Thanks are also extended to Dr. Richard Julien for his departmental leadership and support. Special thanks go to the IGI Global Editorial and Publishing Team, whose contributions throughout the whole publication process were invaluable. In particular, I am deeply indebted to Julia Mosemann, who continuously provided support via e-mail and kept the project on schedule, and Jan Travers for coordinating the editorial process. I am also grateful to my fiancée Michelle Lavergne for her unconditional love, patience and support. It was her encouragement that led me to start this book project. And finally, thanks to everyone who contributed in one way or another towards the completion of this project.
Sincerely,
Ben Kei Daniel
University of Saskatchewan, Canada
Section 1
Introduction to Virtual Communities
Virtual communities are increasingly becoming part of how we work, play, and learn. But what are these communities? What are they really good for, and what are the key research issues prevalent in these communities? Section 1 of the book consists of 4 introductory chapters addressing theoretical and practical themes underlying virtual communities. Chapter 1 opens with a description of the key themes covered in the book as well as the authors’ geographical and institutional distributions. The goal of the chapter is to provide the reader with the context and basis from which the book draws. Chapter 2 addresses the practical aspects as well as the challenges associated with understanding the functional mechanisms of virtual communities. In particular, the chapter presents a prototype that demonstrates how to extract and infer expertise by taking into account people's participation in various virtual communities (forums and projects). The chapter also presents a proposal for applying customized rules and other enabling technologies to the expert finding task. Chapter 3 proceeds with a description of the development of a conceptual frame of reference for gaining insight into the analysis of ‘practice’ in virtual communities. The chapter includes a thorough analysis of related work in new media, community-oriented thinking and practice-based approaches. The chapter also presents reflections on recent case studies in the area. Chapter 4 takes a philosophical turn on the notion of the e-learning paradigm through examination of a sample of diverse e-learning communities, considering each in association with distinct teaching and technology philosophies-of-practice, as well as divergent research and development histories.
Chapter 1
Introduction to this Volume Ben Kei Daniel University of Saskatchewan and Saskatoon Health Region, Canada
ABSTRACT
The growth of virtual communities and their continuous impact on the social, economic and technological structures of societies has attracted a great deal of interest among researchers, practitioners, system designers and policy makers, all interested in analysing and understanding how these communities form, develop, nurture social interaction, influence various technological designs and implementations, enhance information and knowledge sharing, support business and act as catalytic environments to support human learning. This chapter provides a general overview of virtual communities and introduces the reader to the various themes covered in this volume as well as the geographical distribution and institutional affiliations of contributors to the volume.
DOI: 10.4018/978-1-60960-040-2.ch001
HISTORICAL OVERVIEW OF VIRTUAL COMMUNITIES
Understanding the historical development of virtual communities requires a closer look at the history of the Internet. The Internet originated in 1969, when the United States Department of Defence Advanced Research Projects Agency (ARPA, later DARPA) established a computer network designed to keep information available beyond a single vulnerable central location as a means
of defence against the possibility of nuclear war. Through this network, the Advanced Research Projects Agency Network (ARPANET), came the development of a system which would act as a channel for “democratic information and distribution”. This system advanced during the 1970s, as hosts were connected to the ARPANET and state-funded computer networks subsequently appeared, which later became known as the Internet. The pioneering technologies that supported virtual communities started with electronic mailing systems or, simply, e-mail, followed by
listservs and notice boards, and then discussion forums. In 2000, various forms of websites supported by a wide range of Web technologies (Illera, 2007) became the mainstream environments for virtual communities. Though virtual communities might seem like a new phenomenon, there is a historical trend to their development. About four decades ago, Licklider (1968) predicted the emergence of technology-enhanced social systems; he referred to these systems as “online communities”. Virtual communities, in his view, consisted of geographically separated individuals who would naturally group themselves into small clusters to work together or work individually on issues of interest. Online/virtual communities, he suggested, would be communities not of common location, but of common interest. This prediction proved accurate, as there are many virtual communities that are based around common interests and goals. According to some of the literature, virtual communities developed before the advent of the Internet and started to mature with the development of Web technologies. The early examples of virtual communities included Usenet, with millions of users all around the world. Usenet
was established in 1980 as a distributed Internet discussion system. Membership in Usenet consisted mainly of voluntary contributors and moderators. There were also other early virtual communities: Minitel in France and the Whole Earth ‘Lectronic Link (WELL) in the United States of America. The WELL was established in 1985, and many researchers have investigated its cultural manifestations and reported on them in several books (e.g. Rheingold, 1993). Many of the WELL’s members voluntarily contribute to community building and maintenance (e.g., as conference hosts). The WELL, as described on its site, “provides a literate watering hole for some articulate and unpretentious thinkers”. Other writers claimed that Minitel preceded the World Wide Web, that it had existed since 1982, and that it was accessible to its members through telephone lines. Further, it was stated that from its early days, members of Minitel could make online purchases, make train reservations, check stock prices, search the telephone directory, and chat much as people do over the Internet today. In modern times, Slashdot is perhaps one of the most popular virtual communities. Slashdot hosts technology-related forums, with articles and
Figure 1. Historical development of virtual communities
Introduction to this Volume
readers’ comments. The Slashdot subculture has become well known in Internet circles; its members accumulate a “karma score”, and volunteer moderators are selected from those with high scores. Other virtual communities include a distributed community of practice intended to foster knowledge and data sharing among professionals working in the areas of governance and international development (Daniel, Sarkar & O’Brien, 2003) and virtual learning communities for graduate students of educational technology in higher education (Schwier, 2007).
What is a Virtual Community?
The concept of a virtual community means different things to different people. In the scientific literature, several definitions have been proposed, ranging from those that treat virtual communities as technological environments to those that describe the social configurations of the individuals participating in these environments. Preece (2000) suggested that a virtual community comprises members who share an interest, interact repeatedly, generate shared resources, develop governing policies, demonstrate reciprocity, and share cultural norms. Further, according to Rheingold (1993), “virtual communities are social aggregations that emerge from the Net when enough people carry on those public discussions long enough, with sufficient human feeling, to form webs of personal relationships in cyberspace” (p.5). Rheingold’s definition is one of the most frequently quoted in discussions about virtual communities. Regardless of how a virtual community is defined, the concept is intensely discussed, sometimes receiving a frosty reception, mainly from some sociologists who doubt the validity of the term. Weinreich (1997), for example, argued that “the idea of virtual communities must be wrong” because a community is a collection of kinship networks which share a common geographic territory, a common history, and a shared value system, usually rooted in a common religion. Further, Utopian Promises - Net Realities asserted that “anyone with even a basic knowledge of Sociology understands that information exchange in no way constitutes a community.”
The following summarises some of the fundamental features of virtual communities as opposed to geographical communities:
• Membership: Membership can be drawn globally, from any culture, national identity, etc. Membership is non-binding (it is fairly easy to withdraw from communication).
• Anonymity: Some virtual communities allow anonymity while others encourage openness.
• Domain Focused: Virtual communities are constructed around shared interests and goals and are domain specific.
• Communication: There is continued interaction, i.e., a certain temporal continuity of online communication.
• Social Protocols: There are formal or informal conventions of online behaviour, style, language and modes of engagement. These are either explicitly stipulated or implied.
• Space-Time: Communication is spatially disembodied and temporally synchronous or asynchronous.
• Shared Meaning: Meaning is communicated and shared among members. New meaning is negotiated.
• Voluntary: Interaction in virtual communities is voluntary, and people are free to lurk, contribute and withdraw as they deem fit.
• Speed: Relationships can become intense more quickly online.
• Delusive Behaviour: People feel more courageous online than offline because they can more easily end a conversation,
they feel that there are potentially fewer consequences for action in a symbolic than in a physical space, and they have more time for thinking before answering and arguing.
• Visual Cues: The lack of physical presence and visual context cues, and the invisibility of the communication partners, might lower inhibitions.
• Deep Reflection: In a virtual community, people can take their time to contribute; they can retrieve what they or others contributed and synthesize things.
• Virtualisation and Actualisation of Relationships: In virtual communities, members first connect in cyberspace to initiate a relationship (virtualisation) and later continue the relationship in physical space (actualisation). Similarly, they can move actual relationships from physical space to an online setting.
• Structure: The virtual community, like any other, has a distinct structure with well-defined responsibilities and roles.
Figure 2. Core components of virtual community
• Identity: In some virtual communities, such as a distributed community of practice with limited anonymity, there is a community directory which lists all the members of the community and their expertise or what they can contribute to the community. This directory makes it possible to identify and access resources that will enhance the knowledge sharing process.
TYPES OF VIRTUAL COMMUNITIES
The discussion about virtual communities in this chapter is based on a distinction between two types of virtual communities: virtual learning communities and distributed communities of practice. These two types were the focus of previous research (Daniel, Schwier & McCalla, 2003), and they represent mature, highly focused kinds of virtual communities with interesting technological, educational and social implications for learning and knowledge sharing. A simple model of these communities is described in Figure 2.
Virtual Learning Communities
The term virtual learning community is an aggregate of three concepts: “virtual”, “learning” and “community.” Defining these three concepts independently has been a difficult challenge for researchers in technology, information systems, education and sociology. Drawing from the previous discussion on what can be considered a virtual community, one would add a learning dimension to complete the meaning. The emphasis on learning signifies members’ focus on learning, which often has a clear starting and exit point. Of course, as with any new concept, some writers use the term virtual learning community to describe other social activities conducted in virtual communities which might relate to a formalized set of learning activities. Figure 3 provides examples of a virtual learning community.
Distributed Communities of Practice
A distributed community of practice is a type of virtual community, serving as a vehicle for data and information exchange among dispersed, multisectoral and highly distributed professionals, practitioners and scientists who are interested in various issues within a certain field of practice. Though the notion of a distributed community of practice (DCoP) draws from the theory of a community of practice (Lave & Wenger, 1991), a DCoP differs from a community of practice in many significant ways. A distributed community of practice describes a group of geographically dispersed professionals who share common practices and interests in a particular area of concern, and whose activities can be enriched and mediated by information and communication technology (Daniel, Schwier & McCalla, 2003). The concept of distributed communities of practice aims to move beyond connectivity to achieve new levels of community interactivity, bringing together diverse groups and encouraging knowledge and
Figure 3. Two types of virtual communities
Figure 4. A proof of concept of a distributed community of practice: governance example
information sharing among people, organizations, and communities. An example of a distributed community of practice (see Figure 4) is a group of researchers in Canadian universities and colleges, various government departments, non-governmental organizations and private consultancies who, although diverse in organisational backgrounds and distributed all over Canada, are all interested in different issues relating to the domain of governance and international development (Daniel, Sarkar & O’Brien, 2008). What holds members together in a distributed community of practice is a common sense of purpose and a need to know what each other knows, to share knowledge and to exchange information. The ultimate goals of a distributed community of practice are informal or non-formal learning and knowledge sharing. For a distributed community of practice to evolve, individuals who are often geographically distributed and organizationally and culturally diverse must be connected through various forms of computer-mediated communication tools. The key features of distributed communities of practice are listed below:
• Shared Interests: Membership is organized around topics or domain issues that are important to the members.
• Common Identity: Members develop a shared understanding and a common identity.
• Shared Information and Knowledge: Members share information and knowledge, or they are willing to develop a culture of sharing, voluntarily responding to requests for help.
• Voluntary Participation: Members normally participate voluntarily in the activities of the community.
• Autonomy in Setting Goals: A distributed community of practice sets its own agenda based on the needs of its members, and these needs change over time as the community evolves and its membership and environment change.
• Awareness of Social Protocols and Goals: Members in a distributed community of practice are normally aware of the acceptable social protocols and goals of the community.
• Awareness of Membership: Members in a distributed community of practice are normally aware of each other in the community; that is, individuals have a reasonable knowledge of who is who and what they do in the community.
• Effective Means of Communications: Effective communication, among other things, remains a key distinguishing factor among communities. Robust communication may include face-to-face meetings and technology-mediated communication such as email, videoconferencing, discussion forums, web pages and intelligent agents.
Unlike a virtual learning community, where membership is not necessarily professionally defined, a distributed community of practice draws its members from professionals who are likely to share common interests in connecting to others and in learning informally from each other through the use of information and communication technologies.
In addition, the life cycle of a distributed community of practice is determined by the value it creates for its members, and it is sustained by the continued relevance of its goals to the members.
STANDARDIZED METHODS FOR STUDYING VIRTUAL COMMUNITIES
Despite the growing interest in the investigation of virtual communities, the overall quality and depth of research varies considerably. One possible reason is that virtual communities and the research issues surrounding them cut across disciplines, and there are few interdisciplinary methodologies for addressing these issues thoroughly. In addition, because the area is relatively new, there has been little opportunity to address many emerging research issues. In any new area of research, asking the right kinds of questions, formulating interesting hypotheses and adapting research methods to an emerging field pose many challenges. Further,
Table 1. Virtual learning communities and distributed communities of practice
Virtual learning communities | Distributed communities of practice
Membership is explicit and identities are generally known | Membership may or may not be explicit
Presence of an instructor | Facilitator, coordinator or a system administrator
Participation is often required | Participation is mainly voluntary
Explicit set of social protocols for interaction | Implicit and emergent set of social protocols for interactions
Formal learning goals | Informal learning goals
Possibly diverse backgrounds of individual | Common subject-matter
Low shared understanding of domain | High shared understanding of domain
Loose sense of professionalism | Strong sense of professional identity
Strict distribution of responsibilities | No formal distribution of responsibilities
Easily disbanded once established | Less easily disbanded once established
Low level of trust | Reasonable level of trust
Life span determined by extent in which goals are achieved | Life span determined by the instrumental/expressive value the community provides to its members
Pre-planned activities and fixed goals | A joint enterprise as understood and continually renegotiated by its members
Figure 5. Distribution of Chapters by Country (N=45)
in interdisciplinary research, it normally takes time for members of the scientific community to relate to each other in a way that makes collaborative progress possible. In addition, the research and methodological approaches adapted to these kinds of studies are diverse, ranging from empirical to theoretical.
CONTRIBUTIONS TO THIS VOLUME
This is one of the first comprehensive handbooks of research methods on virtual communities, presenting an array of methods, techniques, measures and matrices, and various ways of studying and modelling different phenomena in virtual communities. The book brings together, in a single coherent volume, the experiences of experts from throughout the world who contribute research on methods for studying virtual communities. There are over 40 chapters, contributed by 92 well-respected researchers and scientists from 20 countries, with global representation from North America, South America, Europe, Australia, Africa and Asia.
The diversity in representation, as well as in the issues presented in the book, suggests the global nature of the debate on the value of virtual communities. Further, the diversity in the chapters also shows that research on virtual communities is not only multidisciplinary but multisectoral, encompassing academia, government and the corporate sector. Figure 6 presents percentages of chapters associated with each sector.
Figure 6. Chapters Distribution by Sector (N=45)
Main Themes
The main themes presented in the book range from technical analysis, socio-educational analysis, computational modelling and emergent methodological presentation. The key focus, however, is on the following:
• Philosophical underpinnings of virtual communities
• Social and Semantic Network Analysis
• Computational Modelling
• Methods and Methodologies
• Measures and Matrices
• Emergent phenomenon
REFERENCES
Daniel, B. K., McCalla, G., & Schwier, R. (2003). Social Capital in Virtual Learning Communities and Distributed Communities of Practice. The Canadian Journal of Learning Technology, 29(3), 113–139.
Daniel, B. K., O’Brien, D., & Sarkar, A. (2003). A design approach for Canadian distributed community of practice on Governance and International Development: A Preliminary Report. In Verburg, R. M., & De Ridder, J. A. (Eds.), Knowledge sharing under distributed circumstances (pp. 19–24). Enschede: Ipskamps.
Daniel, B. K., Sarkar, A., & O’Brien, D. (2008). Theory and practice of designing distributed communities of practice: Experience from the governance knowledge network. In Beaudet, C., Grant, P., & Starke-Meyerring, D. (Eds.), Research communication in the social and human sciences: From dissemination to public engagement (pp. 190–209). Newcastle: Cambridge Scholars.
Illera, J. L. R. (2007). How virtual communities of practice and learning communities can change our vision of education. Retrieved from http://sisifo.fpce.ul.pt/pdfs/sisifo03ENGconfer.pdf
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. New York: Cambridge University Press.
Licklider, J. C. R. (1968). The Computer as a Communication Device. Science and Technology. Reprinted in In Memoriam: J.C.R. Licklider. Systems Research Center.
Preece, J. (2000). Online Communities: Designing Usability, Supporting Sociability. Chichester, UK: John Wiley & Sons.
Rheingold, H. (1993). The Virtual Community: Homesteading on the Electronic Frontier. Reading, MA: Addison-Wesley.
Schwier, R. A. (2007). A typology of catalysts, emphases and elements of virtual learning communities. In Luppicini, R. (Ed.), Trends in distance education: A focus on communities of learning (pp. 17–40). Greenwich, CT: Information Age Publishing.
Weinreich, F. (1997). Establishing a point of view towards virtual communities. Computer-Mediated Communication, 3(2). Retrieved May 29, 2010 from http://www.december.com/cmc/mag/1997/feb/wein.html
Chapter 2
GEEK: Analyzing Online Communities for Expertise Information
Lian Shi, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Diego Berrueta, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Sergio Fernández, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Luis Polo, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Iván Mínguez, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Emilio Rubiera, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
Silvino Fernández, Parque Científico Tecnológico de Gijón Edificio Centros Tecnológicos, Spain
GEEK: ANALYSING ONLINE COMMUNITIES FOR EXPERTISE INFORMATION
In recent decades, we have observed a growing need for different types of “multidimensional” expertise. Efficient expertise management is a critical success factor for organizations. Traditional approaches to this challenge suffer from many difficulties. Firstly, within multinational organizations, enterprises and companies, which are usually built by the continuous merging of smaller companies, information systems are heterogeneous, dispersed and often redundant. Secondly, a huge number of employees are scattered all over the world, with their expertise-related information spread out. Human resources departments have to deal with hundreds of thousands of employees and countless profiles and areas of expertise. Even if integrated company solutions (such as ERPs, Human Resources Management Systems, etc.) can efficiently manage administrative and personnel information, data about employees’ profiles are decoupled from their daily activity and have only poor links with their actual expertise. Thirdly, there is a very large variety of competencies with different topics, knowledge and techniques, so another challenge is to locate a given expertise quickly and precisely. The last issue is that, with traditional IT, integrating and maintaining a new system becomes very complex.
Motivated by these significant problems, the GEEK project aims to find a better solution to expertise management. More precisely, GEEK enables us to extract and infer up-to-date expertise by taking into account people’s participation in virtual communities. To achieve this, we draw on the power of semantic technologies, which also provide many opportunities for innovation in implementing the GEEK prototype. There is a range of benefits that can feed into advanced personnel management.
We list the most intuitive ones here: (i) identifying experts that match a given profile; (ii) detecting skill gaps in order to plan training activities; (iii) building teams for the purposes of internal mobility and agile response to emergency situations.
Semantic Web technologies have reached a maturity that allows managing large amounts of linked data, including information from virtual communities. Consequently, there seems to be potential for addressing enterprise expertise management by providing precise, machine-readable semantic descriptions of the expertise and profiles of employees. The first step is to identify potential information sources on the Web and to create the mechanisms to gather significant amounts of data. The raw information is usually available in different formats; therefore, suitable extraction components are proposed to adapt the data. An ontology is used to unify all the information that comes from different data sources. The next step is to deal with partial descriptions caused by the use of different identifiers for the same resource. This phenomenon is particularly evident when a single individual participates in multiple online communities under different identities (e.g., different email addresses). Such duplicates lower the quality of the collected data, and users would be annoyed by redundant answers to a given query. Therefore, we apply smushing techniques that aim to identify the co-occurrence of the same person in different communities.
A key part of the project is the expertise inference process. We assume that people’s participation in communities is evidence of their expertise. A set of customized rules is then executed on top of all the collected data to derive the expertise. Some mathematical functions aggregate partial expertise evidence into a coherent result. A friendly interface allows users to enter queries and browse the results and the collected data, including experts’ profiles.
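To make the aggregation step more concrete, the following Python sketch shows one way partial evidence could be combined into a ranking. It is only an illustration: the text does not specify the mathematical functions used at this point, so the noisy-OR combination, the `Evidence` structure, the weights and the person and topic names are all assumptions introduced for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Evidence(NamedTuple):
    """One piece of expertise evidence (hypothetical structure)."""
    person: str
    topic: str
    weight: float  # assumed strength in [0, 1], e.g. produced by a rule


def aggregate(evidences):
    """Combine evidence per (person, topic) with a noisy-OR.

    Noisy-OR is an assumption made for this sketch: the score grows with
    every additional piece of evidence but never exceeds 1.
    """
    remaining_doubt = defaultdict(lambda: 1.0)
    for ev in evidences:
        remaining_doubt[(ev.person, ev.topic)] *= 1.0 - ev.weight
    return {key: 1.0 - doubt for key, doubt in remaining_doubt.items()}


def rank_experts(scores, topic):
    """Return (person, score) pairs for one topic, best first."""
    ranked = [(person, s) for (person, t), s in scores.items() if t == topic]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical evidence derived from participation in two communities.
evidences = [
    Evidence("alice", "rdf", 0.5),
    Evidence("alice", "rdf", 0.5),
    Evidence("bob", "rdf", 0.5),
    Evidence("bob", "sparql", 0.25),
]
scores = aggregate(evidences)
ranking = rank_experts(scores, "rdf")
```

With this choice, two independent pieces of evidence count for more than one, which matches the intuition that sustained participation is stronger evidence than a single contribution. Any other monotone aggregation could be substituted without changing the surrounding pipeline.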
GEEK
RELATED WORK
Expert finders are systems that provide answers to expertise questions, in particular about individuals with a certain competence. These systems have been explored in a series of studies, including Streeter & Lochbaum (1988), Krulwich & Bruckey (1996), and Ackerman & McDonald (1996), as well as the studies in Ackerman et al. (2002). Most current systems use modern information retrieval (IR) techniques to discover expertise from implicit or secondary electronic resources (Zhang et al., 2007). A person’s expertise is represented as terms, which are keyword-matched using standard IR techniques. The result is usually an unordered list of related people, or a list ordered according to term frequencies. The limitation of these systems is that it is difficult to measure people’s relative expertise levels in particular areas. Mika (2005) proposed Flink, a system which extracts social network information, including Web documents, FOAF data, email messages and bibliographic references, from multiple sources. The collected information is then represented in extended FOAF profiles and can be used to perform inference. An RDF crawler collects profiles from FOAF documents on the web. These profiles are then matched with members of the target community. Rules are defined to decide whether people know each other and to determine similar instances across multiple information sources. GEEK has several points in common with Flink, although there are some differences too. The main objective of GEEK is to extract the expertise of people who participate in multiple communities by running inference rules on top of RDF data captured from these communities, whereas Flink focuses on data extraction from social networks and on the analysis of personal relationships (e.g., co-authorship) using centrality measures (Markov centrality/PageRank, closeness and betweenness). ExpertFinder (http://expertfinder.info/) is an international initiative that aims (i) to devise
vocabularies, rule extensions (e.g., for FOAF, SIOC, DOAP and SKOS) and best practices for the annotation of personal homepages, web pages of institutions, conferences and publication indexes, and (ii) to provide adequate metadata to enable computer agents to find experts on particular topics. The main goal of ExpertFinder is to enable a Web-scale infrastructure for the creation, publication and use of experts’ semantic descriptions to support various scenarios, including group management, disaster response, recruitment, team building, problem solving, and on-the-fly consultation. There are several products and projects related to the area of ExpertFinder. For instance, FindXpRT (Li et al., 2006) builds on the RDF-based FOAF project and implements rules for users looking for an expert to collaborate with. This implementation focuses on combining FOAF facts and RuleML (Boley, 2006) rules to allow users to derive FOAF data by deploying person-centric rules, either before FOAF publication or, on demand, from published (RuleML FOAF) pages. Both a Description Logic (DL) reasoner and a rule engine are required to carry out inference tasks. The SemDis project proposes a Semantic Web application that detects Conflict of Interest (COI) relationships among potential reviewers and authors of scientific papers (Aleman-Meza et al., 2006). The expertise of a person on different topics or areas is described in a large populated ontology about computer science publications based on the DBLP bibliography database. The SEEMP project aims to “design and implement in a prototypal way an interoperability architecture for public e-Employment services”. SEEMP employs Semantic Web Services technology and introduces annotations in both the services for interacting with public employment agencies and the job offers/CVs. More recently, the Active project addresses the need for greater productivity of knowledge workers.
One of its research lines involves using machine learning techniques to describe the user’s context semantically and thereby tailor the information presented to the user to fit the current tasks. The main difference between GEEK on the one hand, and SEEMP and Active on the other, is that we focus on “finding” experts and identifying expertise rather than “matching” profiles. Launched in the last months of 2008, Innoraise is a product that combines social software with semantics, information retrieval and data mining technologies to provide a solution for expert finding in computer science. Its novelty primarily comes from the unique combination of techniques employed and some unprecedented algorithmic achievements. The product comes in two versions: a community edition focused on the needs of distributed organizations and Web communities, and an enterprise edition that additionally provides connectors to some enterprise applications. In addition, portals such as ExpertWitness, Expertise Search and Teclantic have become popular recently. Users can look for experts who have relevant expertise through match-making. However, it is quite common for novice users to have difficulty characterizing their request for specific expertise, and current systems are not user-friendly enough to help them find an expert and start collaborating, if desired. The expert search task has been part of the Enterprise track of the Text Retrieval Conference (TREC) since the track was first launched in 2005 (Craswell et al., 2005). The TREC community provides a benchmark consisting of organizational document collections, lists of candidate experts and sets of search topics, each one with a list of actual experts. This benchmark can be used to evaluate expert finding systems that are based on text retrieval techniques, an approach radically different from the one we use in GEEK. For instance, text retrieval techniques must be adapted for different natural languages, while the approach followed in GEEK is language-independent because it is not sensitive to the content of the expertise evidence.
The main novelties of our approach are that we: (i) use social networks within an organization or enterprise to help find the appropriate people as well as their notable knowledge; (ii) leverage Semantic Web technologies for data collection and data integration from heterogeneous and legacy information sources, both internal and external; (iii) execute a set of customized rules on top of all the data to draw out evidence about experts and their outstanding expertise, from which a ranking is produced; (iv) empower end users without any background in Semantic Web technologies to express queries conveniently. Taken together, these innovations enable employees in a global enterprise to find an appropriate expert whom they can consult and whose expertise can be used to solve particular situations.
TECHNOLOGY OVERVIEW
The GEEK ExpertFinder is an integrated project built on the aggregation of several independent components, powered by Semantic Web technologies. It hosts a compact dataset and encompasses high-level enabling applications that exploit consolidated data for expertise management. The components provide storage, management, refinement and querying facilities for the hosted information. In this section, we present the relevant methods and technologies used in the realization of the GEEK prototype described above.
Vocabularies
Driven by the primary application of expert finding, we reuse and extend existing, established vocabularies from the Semantic Web. In particular, FOAF, SIOC, DOAP and SKOS are the starting points of our work, as they cover a wide range of the features needed to adequately describe the expert finder domain. This combination was first proposed by Aleman-Meza et al. (2007).
FOAF (Brickley & Miller, 2007) is a paradigmatic vocabulary widely used on the Semantic Web. It was developed to create machine-readable descriptions of people, groups, organizations, and their relationships. It contains some purely descriptive properties, such as foaf:mbox_sha1sum, which can be used to relate people to their email addresses without revealing the address to spammers. Interestingly, FOAF also contains relational properties relevant for the expert finder domain, such as foaf:currentProject and foaf:pastProject, which provide information on ‘some collaborative or individual undertaking’ that a person is (or has been) involved in. In our work, FOAF is used to describe the personal data of the individuals, i.e., the potential experts. The SIOC ontology (Breslin et al., 2006) can capture the information contained explicitly and implicitly in online communities and discussion forums. Several software applications, usually deployed as plug-ins, are already available to export SIOC data from some popular blogging platforms, web forums, mailing lists and content management systems. In our work, SIOC is used for the description of online discussion forums, which are classified as instances of sioc:Forum. Contributions to these communities are instances of sioc:Post. DOAP is a small vocabulary used to describe open source projects (Dumbill, 2004). Large open source communities have internally adopted and extended DOAP to organize their projects. The pivotal concept is the class doap:Project, but project instances are also related to their developers, releases, repositories, associated mailing lists, etc. SKOS stands for Simple Knowledge Organization System, a framework designed to represent and share controlled vocabularies, such as classifications, glossaries, and thesauri (Miles and Bechhofer, 2009). Basically, concept schemes are largely made up of instances of skos:Concept with associated labels, related by semantic relationships such as skos:broader.
The GEEK prototype uses SKOS concept schemes to describe expertise topics in a relevant domain.
All these small vocabularies interact with the others, as shown in Figure 1. Note that other vocabularies specifically designed to describe the CV of an individual, such as DOAC and Resume, have not been considered, because GEEK focuses on the expertise that can be extracted from the activity traces in the communities. In addition to the use of these vocabularies, it is necessary to develop a new ontology to describe the finer details of the individuals’ participation in forums, and the derived expertise evidences. The design of this ontology is described in detail later in this chapter.
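As a rough illustration of how these vocabularies fit together, the sketch below represents a handful of RDF triples as plain Python tuples and follows the links from a person to the topics of their posts. The namespace URIs are the published ones for FOAF, SIOC and SKOS; everything under `http://example.org/` (the person, account, forum, post and topics) is hypothetical, and the tuple representation is an assumption made to keep the example free of third-party libraries. It does not show the additional GEEK ontology, which is described later in the chapter.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
SIOC = "http://rdfs.org/sioc/ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for the sample data

# Each triple is (subject, predicate, object).
triples = [
    # FOAF describes the person and links them to an online account.
    (EX + "alice", RDF_TYPE, FOAF + "Person"),
    (EX + "alice", FOAF + "name", "Alice Example"),
    (EX + "alice", FOAF + "account", EX + "alice-account"),
    # SIOC describes the forum, the post and its creator account.
    (EX + "alice-account", RDF_TYPE, SIOC + "UserAccount"),
    (EX + "rdf-forum", RDF_TYPE, SIOC + "Forum"),
    (EX + "post1", RDF_TYPE, SIOC + "Post"),
    (EX + "post1", SIOC + "has_container", EX + "rdf-forum"),
    (EX + "post1", SIOC + "has_creator", EX + "alice-account"),
    (EX + "post1", SIOC + "topic", EX + "topic/rdf"),
    # SKOS organizes the expertise topics into a concept scheme.
    (EX + "topic/rdf", RDF_TYPE, SKOS + "Concept"),
    (EX + "topic/rdf", SKOS + "broader", EX + "topic/semantic-web"),
]


def topics_posted_by(person, graph):
    """Follow person -> account -> posts -> topics across the vocabularies."""
    accounts = {o for s, p, o in graph if s == person and p == FOAF + "account"}
    posts = {s for s, p, o in graph
             if p == SIOC + "has_creator" and o in accounts}
    return sorted(o for s, p, o in graph
                  if s in posts and p == SIOC + "topic")


alice_topics = topics_posted_by(EX + "alice", triples)
```

In a real system the same traversal would more naturally be written as a query over an RDF store; the point here is only that a single chain of properties crosses three vocabularies.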
Resource Smushing
The increasing amount of machine-processable data in the Semantic Web facilitates processes such as social network analysis and data mining. Innovative applications, like expert finding on the (Semantic) Web, are enabled by the ability to execute these processes at World Wide Web scale. Although the RDF data model is well suited for seamlessly merging data (triples) from arbitrary sources, a data integration problem still remains. Unconnected descriptions of the same thing can be obtained from different sources. For instance, a single individual can participate in several web communities with different virtual identities. When the data are merged, the descriptions of his or her virtual identities (such as e-mail accounts) will be different RDF resources weakly connected to each other. If these identities were taken to be different persons, data analysis would be crippled, as it would lead to imprecise conclusions and a flood of phantom virtual identities. We use the term smushing for the process of normalizing an RDF data set in order to unify a priori different RDF resources which actually represent the same thing. The use of the expression data smushing in this context can be traced back to Dan Brickley (http://lists.w3.org/Archives/Public/www-rdfinterest/2000Dec/0191.html).
GEEK
Figure 1. Relations among the FOAF, SIOC, DOAP and SKOS vocabularies
The application which executes a data smushing process is called a smusher. The process comprises two stages: first, redundant resources are identified; then, the data set is updated to reflect the newly acquired knowledge. The latter is usually achieved by adding new triples to the model to relate the pairs of redundant resources (often using owl:sameAs). Although smushing can be applied to any kind of resource, it is particularly important for descriptions of people. That is, we aim to identify the co-occurrence of the same person in different communities. In this sense our research relates to matching frameworks; see http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/EquivalenceMining. In Shi et al. (2008), we evaluated two smushing techniques using data mined from open source communities. The first approach exploits the semantics of inverse functional properties, which solely and definitively determine whether two entities are the same by considering their property values. The second approach is based not on logic but on heuristics, more precisely on the comparison of entity labels (people’s names). Both techniques are applied to a data set that contains thousands of instances of foaf:Person.
Inverse Functional Properties OWL (Motik et al., 2009) introduces a kind of property called Inverse Functional Properties (IFPs for short). An IFP is a property which behaves as an injective association; hence its values uniquely identify the subject instance:
$$\forall p \,/\, p \in \mathrm{IFP} \Rightarrow \left(\forall s_1, s_2 \,/\, p(s_1) = p(s_2) \Rightarrow s_1 = s_2\right)$$
This inference rule is built-in in the OWL-DL reasoners; therefore, this kind of smushing can be easily achieved just by performing reasoning on the model. However, sometimes it is advisable to avoid the reasoner and to implement the IFP semantics by means of an ad hoc rule.
• Executing a simple, light-weight rule is often more efficient than the reasoner, which usually performs many other tasks. Moreover, it can be used regardless of the expressivity level of the dataset, while reasoners have unpredictable behavior for OWL-Full datasets.
• A custom rule can generalize IFPs to any kind of properties, including datatype properties. There are some scenarios in which such generalization is useful. For instance, while the object property foaf:mbox is declared as an IFP, its value is often unavailable due to privacy concerns. On the other hand, values of the property foaf:mbox_sha1sum are widely available (or can be easily calculated from the former). Unfortunately, the latter is a datatype property and cannot be declared as an IFP in the current version of OWL, thus the need for a generalization.
This rule can be written as a SPARQL CONSTRUCT sentence, according to the idiom described by Polleres (2007), see Figure 2. Note that this rule only takes into account the foaf:mbox_sha1sum property, but its generalization to any property declared as owl:InverseFunctionalProperty is straightforward. Some FOAF properties can be used as IFPs to smush resources that are people. The FOAF specification defines mbox, jabberID, mbox_sha1sum, homepage, weblog, and openid as IFP, among others. However, a quick analysis of a set of FOAF files collected from the web shows that some of these properties are barely used, while others are often (mis-)used in a way that makes them useless as IFPs. Notably, some users point their homepage to their company/university homepage, and weblog to a collective blog. Consequently, only the mbox_sha1sum property is actually useful as IFP.
Figure 2. IFP smushing rule implemented as a SPARQL CONSTRUCT sentence. The usual namespace prefixes are assumed
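To make the generalized IFP rule concrete, the following is a minimal Python sketch of the same idea applied to foaf:mbox_sha1sum. It is not the SPARQL rule used in GEEK: triples are modeled as plain tuples, and the function names and the ex: resource identifiers are assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

FOAF_MBOX_SHA1SUM = "foaf:mbox_sha1sum"
OWL_SAME_AS = "owl:sameAs"


def mbox_sha1sum(mailto_uri):
    """SHA1 hex digest of a mailbox URI (including the 'mailto:' prefix)."""
    return hashlib.sha1(mailto_uri.encode("utf-8")).hexdigest()


def smush_by_property(triples, prop=FOAF_MBOX_SHA1SUM):
    """Treat `prop` as if it were an IFP: subjects sharing a value are linked.

    `triples` is an iterable of (subject, predicate, object) tuples; the
    result is a set of new owl:sameAs triples (hypothetical helper).
    """
    subjects_by_value = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            subjects_by_value[o].add(s)

    inferred = set()
    for subjects in subjects_by_value.values():
        # Every pair of distinct subjects with the same value is redundant.
        for a, b in combinations(sorted(subjects), 2):
            inferred.add((a, OWL_SAME_AS, b))
    return inferred


if __name__ == "__main__":
    digest = mbox_sha1sum("mailto:alice@example.org")
    data = [
        ("ex:gnome-alice", FOAF_MBOX_SHA1SUM, digest),
        ("ex:debian-alice", FOAF_MBOX_SHA1SUM, digest),
    ]
    print(smush_by_property(data))
```

Passing a different `prop` argument applies the same rule to any other property, which is the generalization discussed above.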
Label Similarity The concept of similarity is extensively studied in the Computer Science, Psychology, Artificial Intelligence, and Linguistics literature. String similarity plays a major role in Information Retrieval. When smushing people’s descriptions, the labels typically used are personal names (foaf:name). Smushing based on label similarity deals with imprecise knowledge, i.e., even perfect label equality does not guarantee that two resources are the same. Using a softer comparison function will produce even more uncertain knowledge. A label-based smusher can be implemented as a rule. Unfortunately, SPARQL does not have rich built-in string comparison functions. There is a proposed extension called iSPARQL (Kiefer, 2007) that can be used for this purpose. Our experience reveals that the iSPARQL implementation is far from being efficient enough to deal with large datasets. This fact suggests that other approaches to implementing label-based smushing should be considered.
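As an illustration of label-based smushing outside SPARQL, the sketch below compares personal names with the standard-library difflib module. The similarity function, the normalization step and the 0.9 threshold are assumptions chosen for the example, not the comparison function evaluated in Shi et al. (2008).

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name):
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def similar_name_pairs(names_by_resource, threshold=0.9):
    """Return (resource1, resource2, score) for names at least `threshold` alike.

    The pairwise comparison is quadratic in the number of resources, so a
    large data set would need blocking or indexing first.
    """
    pairs = []
    for (r1, n1), (r2, n2) in combinations(sorted(names_by_resource.items()), 2):
        score = name_similarity(n1, n2)
        if score >= threshold:
            pairs.append((r1, r2, score))
    return pairs


if __name__ == "__main__":
    names = {
        "ex:p1": "Alice Smith",
        "ex:p2": "alice  smith",
        "ex:p3": "Bob Jones",
    }
    for r1, r2, score in similar_name_pairs(names):
        print(r1, r2, round(score, 2))
```

Because the resulting knowledge is uncertain, the pairs are returned together with their score rather than being merged directly.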
Smushing, Correctness and Consistence
The pairs of redundant resources identified using the techniques described above can be used to enrich the data set. OWL provides a special property to ‘merge’ identical resources, owl:sameAs. When two resources are related by owl:sameAs, they effectively behave as a single resource for all the OWL-aware applications. Note, however, that plain SPARQL queries operate at the RDF level, and therefore they are unaware of the owl:sameAs semantics. In any case, the semantics of owl:sameAs may be too strong for some cases. On the one hand, some applications may still want to access the resources individually. On the other hand, several factors can influence the reliability of the findings made by the smusher. Notably, data smushing based on label comparison is obviously imperfect, and can lead to incorrect results. For instance, different people can have the same name, or they can fake their identities. Even the logically sound smushing based on IFPs is prone to error, due to the low quality of the input data (fake e-mail addresses, identity theft). Although improbable, it is also possible that different e-mail addresses clash when they are hashed using SHA1 (Eastlake & Jones, 2001). To tackle these issues, a custom property can be used instead, such as ex:similarNameTo. Applications interested in the strong semantics of owl:sameAs can still use a rule to re-create the links. Another kind of OWL property, Functional Properties (FP), is also useful for smushing. They can help to check the consistency of the smusher’s conclusions. A resource cannot have multiple different values for a FP. Therefore, if two resources that are to be smushed are found to have irreconcilable values for a FP such as foaf:birthday, an issue with the smushing rules (or the quality of the input data) must be flagged.
Combination of Ontologies and Rules
Assuming the hypothesis that the participation of people in communities gives evidence of their expertise, the inference process provides means to reason about the expertise of a given person by executing a set of light-weight rules built on top of a domain-specific ontology and data set. The significance of rules and rule-based representation languages for the dissemination of the emerging Semantic Web has often been emphasized by the community. The data model in GEEK is built on OWL semantics, which is expressive enough to describe the expertise domain but limited in its inference capabilities. A suitable rule language therefore plays a very important complementary role in GEEK. The integration of an ontology language and a rule language has become a hot topic. However, the main issues that arise in the integration are (Rosati, 2006): (i) from the semantic viewpoint, the OWL language is based on open-world semantics, while rules are typically interpreted under closed-world semantics; (ii) from the reasoning viewpoint, reasoning in the formal system obtained by integrating an ontology and a rule component may not be a decidable problem, which is very hard to handle; (iii) from the logical viewpoint, OWL is based on Description Logic (specifically, two species of OWL, OWL Lite and OWL DL, correspond to the Description Logics SHIF(D) and SHOIN(D) respectively), whereas the existing proposals for a rule layer on top of the ontology layer of the Semantic Web refer to rule formalisms originating from Logic Programming, like Prolog. These issues are discussed in a mature body of literature and many proposals have been made. Unfortunately, none of them stands as a standard for a rule language. SWRL (Semantic Web Rule Language) (Horrocks et al., 2004) is a rule language that combines OWL and RuleML with a well-defined declarative semantics. However, it is based on FOL and, consequently, on an undecidable logic. The DLP approach (Grosof et al., 2003) defines an intersection of the Description Logic underlying OWL and Horn clauses, making it possible to reuse existing reasoners at the cost of expressive power. The idea of the hybrid approach to integrating ontologies and rules is to keep them separate in order to take advantage of interfacing existing rule engines (e.g., Jess) with existing ontology reasoners (e.g., FaCT (Horrocks, 1998), Racer (Haarslev, 2001), etc.). In parallel, W3C has set up a working group, namely RIF, to produce a standardized rule interchange format (Boley et al., 2009). So far, RIF has achieved some significant progress. For our project, we take an open-world assumption. Given the requirements of our experiment, SWRL lacks the expressiveness needed to define the expertise inference rules, in terms of mathematical capabilities and the ability to introduce variables in the head of the rule. The latter is crucial to create descriptions of new resources. Moreover, the simplicity of the rule set (in terms of rule chaining) does not justify the cost of translating from RDF to Jess and back. Taking these considerations into account, a simpler and more compact way of defining rules is enough for this project. Therefore, we rely solely on SPARQL (Prud’hommeaux & Seaborne, 2008) instead of involving a powerful rule engine. Polleres suggested that SPARQL can be used as a rule language on top of RDF (Polleres, 2007). CONSTRUCT statements have an obvious similarity with view definitions in SQL, and thus may be seen as rules themselves.
AN ONTOLOGY FOR MODELLING PARTICIPATION AND EXPERTISE The GEEK ontology has been designed with the purpose of modeling people’s participation in social communities and projects, and modeling people’s expertise on certain topics. GEEK is a light-weight ontology built with OWL-DL.
It reuses many concepts from well-known web vocabularies such as SIOC, FOAF and DOAP. These vocabularies provide the main concepts to represent and characterize the entities collected from the Web. However, they fall short of describing how a person is regarded inside a community. From our point of view, the notion of participation is crucial for measuring people’s expertise. The participation of someone in a community provides indirect evidence about his or her knowledge of certain topics. Communities and projects are contextualized, owing to the fact that they are always devoted to some particular subjects and matters. Although being a member of a community or a project does not imply any concrete level of expertise, the analysis of the participation can provide clues to infer a person’s skillfulness in a given field: how many messages she posts, how many years she was involved in a project, how many communities with the same topic she participates in, etc. Undoubtedly, there exists a high degree of uncertainty due to the fact that the inference is based upon indirect references. The participation of a person in a community can be modeled as the predicate participation(person, community). In GEEK, this predicate has been reified as a first-order class in DL: geek:Participation-profile (Figure 3). In this way, the participation itself can be described in terms of role, level and karma, as explained in the next paragraphs. Instances of geek:Participation-profile are related to a single instance of foaf:Person (or sioc:User) and a doap:Project (or sioc:Forum). For each community a person is involved in, there exists one and only one participation profile, even if he or she participates in the community using different accounts. A participation profile is characterized by the following items:
• geek:Participation-role: People can play different roles within a forum or a project. On the one hand, in the case of projects, we use the classification from the DOAP vocabulary, although only three roles (helper, maintainer, developer) are considered for the purpose of the GEEK ontology. On the other hand, a role classification for forums was created, distinguishing between the roles of requester and replier. Note that SIOC contains a class called sioc:Role which is related to the access privileges of users with respect to the forum; conversely, geek:Participation-role is not related to access control, but to the activity performed by the user in the community.
• geek:Participation-level: The level of participation of a sioc:User in a specific forum. It is a descriptive feature of her participation in a community and it is represented by a value of the class geek:Participation-level. This class measures the frequency of people’s participation as per the following enumeration: {geek:high, geek:medium, geek:low}. In our prototype, we do not consider the level of participation in projects, because no information has been extracted from the data sources (Debian and Ohloh) that could be used as evidence for the level of participation of people in projects. A more in-depth analysis of the sources could provide hints on the participation level of individuals in projects. For instance, the number of commits made to a revision control repository such as Subversion may be an indicator of the participation level.
• geek:Karma-level: The karma is the status of a person in a particular community as perceived by other members. The karma is the valuation of the usual behavior of the participant, and it has been represented as a set of values: {geek:high-karma, geek:medium-karma, geek:low-karma}, not to be confused with the values of geek:Participation-level.
Figure 3. Graphical representation of geek:Participation-profile
Another central concept in the GEEK ontology is geek:Expertise-profile, whose instances represent the contextualized expertise of a person on a specific topic (Figure 4). This class arises from another reification that is necessary to capture an N-ary relationship among individuals, expertise topics and levels of expertise. The local expertise value is the result of the analysis of the participation profile of people involved in a social community. This information is obtained from the evidence that arises from the participation in a particular forum, hence the inferred expertise of a person is not global. For instance, if forum A is about the topic ‘Python language’, and the system has deduced that somebody is an expert on ‘Python language’ from the evidence derived from her participation in forum A, it cannot be ensured that this expertise applies outside the scope of this forum. Moreover, new evidence can be found in other communities which contradicts the previous assertion: the individual may be participating in other forums that are also about the ‘Python language’, from which the system can infer a low expertise.
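The reification of participation can be pictured with a small in-memory model. The sketch below is only an analogy to the OWL classes: the Python class names, the registry and the ex: identifiers are assumptions, while the enumeration values mirror the ones named in the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Level(Enum):
    HIGH = "geek:high"
    MEDIUM = "geek:medium"
    LOW = "geek:low"


class Karma(Enum):
    HIGH = "geek:high-karma"
    MEDIUM = "geek:medium-karma"
    LOW = "geek:low-karma"


@dataclass
class ParticipationProfile:
    """Reified participation(person, community) with its descriptive facets."""
    person: str
    community: str
    role: Optional[str] = None      # e.g. "geek:requester" or "geek:replier"
    level: Optional[Level] = None
    karma: Optional[Karma] = None


class ProfileRegistry:
    """Keeps exactly one profile per (person, community) pair."""

    def __init__(self):
        self._profiles = {}

    def profile_for(self, person, community):
        key = (person, community)
        if key not in self._profiles:
            self._profiles[key] = ParticipationProfile(person, community)
        return self._profiles[key]

    def __len__(self):
        return len(self._profiles)


if __name__ == "__main__":
    registry = ProfileRegistry()
    profile = registry.profile_for("ex:alice", "ex:forumA")
    profile.role = "geek:replier"
    profile.level = Level.HIGH
    print(registry.profile_for("ex:alice", "ex:forumA"))
```

Keying the registry by the pair (person, community) reflects the constraint that a person has a single profile per community, regardless of how many accounts he or she uses there.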
Figure 4. Graphical representation of geek:Expertise-profile
It is difficult to obtain an analytical numerical value that measures the degree of expertise on a topic taking into account only the participation evidence. Therefore, in the GEEK ontology, we consider the level of expertise as a discrete set of ordered values: {geek:beginner, geek:intermediate, geek:expert}. We are aware of the roughness of this classification, but it is expressive enough to measure the local expertise in our application. Finally, the GEEK ontology also provides a formal mechanism to represent the global expertise of a person on a certain topic (geek:Expertise, Figure 5). This expertise is built upon the set of local expertise values derived from the participation profiles. The global expertise is measured on a continuous scale in the interval [0,1], where 1 means complete knowledge of a topic and 0 a complete lack of knowledge. The three levels of local expertise (beginner, intermediate, expert) are distributed along this range. Calculating this numerical expertise value is a matter that is covered in a later section. We highlight that other scales for measuring the global expertise are also possible. For instance, the discrete scale used to measure the local geek:Expertise-profile could be used here too. However, we found it was not expressive enough for the global expertise given the requirements of the system, which call for a criterion to order people in the same range of expertise. For instance, if Alice and Bob are both geek:expert on ‘Python language’, the system cannot produce a total ordering of the results and therefore is unable to decide which one should be suggested first. Conversely, a numerical scale allows a much finer-grained classification. So far, we have not addressed the issue of how to represent the domain-dependent areas of expertise. The ontology that has just been described is independent of the domain of expertise. Therefore, a complementary concept scheme is required to cover this aspect. SKOS is a perfect fit for the definition of the domain topics in an RDF model.
Therefore, a SKOS concept scheme (a thesaurus) must be defined for each domain in which GEEK is to be applied, if there is not any pre-existing one that can be re-used. Using SKOS, each expertise topic is modeled as an instance of skos:Concept, and the relationships with its broader, narrower or related topics can be captured as semantic relationships in SKOS. Note that, given the increasing popularity of folksonomies in the so called Web 2.0, they may be useful resources to acquire domain knowledge; however, extracting formal knowledge from a folksonomy is still a challenging research topic. In the following, the availability of a formal concept scheme will be assumed.
Figure 5. Graphical representation of geek: Expertise
GEEK, AN EXPERTFINDER PROTOTYPE The GEEK prototype has been designed and developed as a showcase to demonstrate the application of Semantic Web technologies to the expert finding problem. A data flow (see Figure 6) is created among several components in the data warehouse architecture. For confidentiality reasons, we cannot disclose information about expertise and experts in a real enterprise scenario. In order to show an illustrative application, the prototype described in this article focuses on open source communities.
RDF Store The RDF store is a very important component in our architecture, since it is the central node that provides RDF support for the other parts of the system. We are aware that this architecture can give rise to scalability problems under high load in some scenarios. Taking this into account, the choice of RDF store becomes quite important for the final performance of the prototype. Among the RDF store products available in the market (Bizer & Schultz, 2009), we chose OpenLink Virtuoso (Erling & Mikhailov, 2008) for our prototype. Virtuoso is a fast store that is easy to maintain and provides access control mechanisms to prevent the uncontrolled disclosure of important data. We also profit from some other Virtuoso features that extend the current standards: SPARUL support (Seaborne & Manjunath, 2008), aggregates in SPARQL, inference on demand (Erling & Mikhailov, 2009) and Named Graphs (Carroll et al., 2005). They are used to manage the data, to efficiently implement rules and reasoning, and to keep track of the data provenance, respectively. We are aware that the use of these non-official features to work around some limitations of the current SPARQL specification could limit the portability of our prototype in the future. However, we believe that the added value of these features sufficiently justifies this decision.
Figure 6. Data flow through the components of the system
Data Recollection The data recollection components are in charge of providing large quantities of relevant and up-to-date data extracted from various web sources. Because the raw data is heterogeneous and distributed, these components can take different forms, such as adaptors, wrappers, scrapers and crawlers. Continuing the work started in Berrueta et al. (2008), a corpus of RDF data was assembled from these five online communities:
• GNOME Desktop Mailing Lists: all the authors of messages in four mailing lists (evolution-hackers, gnome-accessibility-devel, gtk-devel and xml) within the date range July 1998 to June 2008 were exported to RDF using SWAML (Fernández et al., 2009).
• Debian Mailing Lists: all the authors of messages in four mailing lists (debian-devel, debian-gtk-gnome, debian-java and debian-user) during the years 2005 and 2006 were scraped from the HTML versions of the archives with a set of XSLT style sheets to produce RDF triples.
• Advogato: this community exports its data as FOAF files. We used an RDF crawler starting at Miguel de Icaza’s profile. Although Advogato claims to have more than 13,000 registered users, only some 4,000 were found by the crawler.
• Ohloh: the RDFohloh project (Fernández, 2008) exposes the information from this directory of open source projects and developers as Linked Data. Due to API usage restrictions, we could only get data about the oldest user accounts (more than 12,000).
• Debian Packages: descriptions of Debian package maintainers were extracted from the APT database of Debian packages in the main section of the unstable distribution, and converted to RDF triples.
The size of the corpus exceeds two million triples, with more than 25,000 instances of foaf:Person and sioc:User (Table 1). These communities share the focus on open source software development, hence they can be expected to have a significant number of people in common. The data consolidation component, described in the following section, will reveal the degree of overlap between them. However, before moving on to the next stage, the collected data is dumped into the RDF store and some preparatory processes are executed:
• Some of these data sources or tools do not directly produce instances of foaf:Person, but just instances of sioc:User. An assumption is made that there is a foaf:Person instance for each sioc:User, with the same e-mail address and name as the user. These instances are automatically created when missing. This assumption obviously leads to redundant instances of foaf:Person which will be later detected by the smusher.
• A (domain-dependent) thesaurus in SKOS is added to the dataset.
• By defining subclasses of sioc:Forum (a detailed explanation of this division was addressed above), instances of this class are manually classified.
• Instances of sioc:Forum and doap:Project are also semi-automatically annotated with terms of the thesaurus, reusing the classifications used in the original sources of information (for instance, DebTags in the case of Debian packages).
Table 1. Size (number of people) of the communities before smushing
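The first preparatory step, creating a foaf:Person for every sioc:User that lacks one, can be sketched as follows. Triples are plain tuples; the choice of sioc:account_of, sioc:name and sioc:email as the linking and source properties, and the way the person identifier is minted, are assumptions for illustration, since the text does not state how the prototype implements this step.

```python
RDF_TYPE = "rdf:type"


def persons_for_users(triples):
    """Return the new triples needed so every sioc:User has a foaf:Person.

    A user is considered covered if it already has a sioc:account_of link
    (assumed linking property). The person identifier is minted by
    appending '#person' to the user identifier (hypothetical convention).
    """
    triples = set(triples)
    users = {s for s, p, o in triples if p == RDF_TYPE and o == "sioc:User"}
    linked = {s for s, p, o in triples if p == "sioc:account_of"}

    new_triples = set()
    for user in sorted(users - linked):
        person = user + "#person"
        new_triples.add((person, RDF_TYPE, "foaf:Person"))
        new_triples.add((user, "sioc:account_of", person))
        # Copy the name and e-mail address of the account to the person.
        for s, p, o in triples:
            if s == user and p == "sioc:name":
                new_triples.add((person, "foaf:name", o))
            elif s == user and p == "sioc:email":
                new_triples.add((person, "foaf:mbox", o))
    return new_triples


if __name__ == "__main__":
    data = [
        ("ex:u1", RDF_TYPE, "sioc:User"),
        ("ex:u1", "sioc:name", "Alice"),
        ("ex:u1", "sioc:email", "mailto:alice@example.org"),
    ]
    for triple in sorted(persons_for_users(data)):
        print(triple)
```

Because each account yields its own person, the same individual may end up with several foaf:Person instances, which is exactly the redundancy the smusher is expected to resolve afterwards.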
Moreover, although in the following we will base our work on the aforementioned corpus, it is obvious that the data extraction processes must be repeated periodically in order to ensure that the data is up-to-date.
Data Consolidation The data consolidation component is critical for the success of the prototype. The data gathered from heterogeneous sources will be used later to draw conclusions on the expertise of individuals. However, such conclusions would be invalid unless the quality of the data is assured. In particular, fragmentation due to the use of multiple virtual identities leads to partial, unconnected descriptions. The consolidation stage tackles the challenge of identifying these identities and merging the partial descriptions of the same person into a coherent and complete representation. The smushing techniques are applied to avoid a widespread flooding of virtual identifiers caused by the recollection of data from independent information silos. In Shi et al. (2008), we analyzed the results of smushing the data in the corpus. As we expected, a significant overlap among the five open source communities was found, see Table 2.
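Once the smusher has produced pairs of equivalent resources, the pairs have to be grouped into one cluster per real person before an overlap such as the one in Table 2 can be counted. A minimal sketch using a union-find structure follows; the function names and the sample identifiers are assumptions, and the prototype itself may rely on the store or a reasoner for this step.

```python
from collections import Counter


def cluster_identities(resources, same_as_pairs):
    """Group resources into clusters connected by the given pairs."""
    parent = {r: r for r in resources}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for r in list(parent):
        clusters.setdefault(find(r), set()).add(r)
    return list(clusters.values())


def people_by_community_count(clusters, community_of):
    """Map 'number of communities' to 'number of people present in that many'."""
    counts = Counter()
    for cluster in clusters:
        counts[len({community_of[r] for r in cluster})] += 1
    return dict(counts)


if __name__ == "__main__":
    community_of = {
        "gnome:alice": "GNOME",
        "debian:alice": "Debian",
        "ohloh:alice": "Ohloh",
        "debian:bob": "Debian",
    }
    pairs = [("gnome:alice", "debian:alice"), ("debian:alice", "ohloh:alice")]
    clusters = cluster_identities(community_of, pairs)
    print(people_by_community_count(clusters, community_of))
```

Transitivity is handled by the union-find structure, so two identities end up in the same cluster even when they are only indirectly linked through a third one.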
Tagging of Forums and Projects Forums and projects collected from the web sources must be described with the concepts of the domain thesaurus in order to execute the inference process. Ideally, the topics of the communities should be automatically extracted from the sources. However, this is possible only for certain sources. For instance, an alignment is possible between the DebTags attached to the Debian packages and the concepts of the thesaurus. Nevertheless, for the general case, the communities must be manually tagged. For instance, the mailing list desktop-devel at GNOME was tagged with the concepts {‘Desktop applications’, ‘Development’}.
Table 2. People accounting by the number of communities they are present in, after smushing
In the case of forums, a simple subclassification of sioc:Forum is available in the GEEK ontology. Three kinds of forum have been identified: technical support or development, general discussion, and announcement forums. Every forum is manually classified into one of these three categories before the rules are executed. For instance, the gnome-hackers mailing list is classified as a general discussion forum because its primary function is to serve as a meeting place for the GNOME community.
Expertise Inference Process In the following sections, we give insights into the definition of the expertise inference process. In order to reach GEEK’s objective, we define a set of customized rules on top of the GEEK ontology. This set is composed of three families of rules according to the kind of inference they perform:
1. Behavior rules. This subset infers the participation profile (role, level and karma) of each user by analyzing the traces of her activity in each social community.
2. Local expertise rules. This subset obtains the expertise profile from the user’s participation evidence.
3. Global expertise rule. This final rule calculates the user’s global expertise on each topic as a function of her local expertise.
As mentioned before, the implementation of the rules in GEEK follows Polleres’ pattern to express forward-chaining rules using SPARQL CONSTRUCT sentences. However, the use of this query mechanism does not automatically insert the produced triples back into the original dataset. Typically, the new triples must be added to the store after executing the query. For performance reasons, our implementation takes advantage of the SPARUL support in Virtuoso by simply rewriting CONSTRUCT queries as INSERT sentences. The insertion of rules and data is performed in just one step, mixing the inferred knowledge with the RDF data in the store. The expertise inference process launches the three sets of rules in sequence. We overcome the limitation of the rule chaining process in SPARQL by using an ad hoc script to fire the rules in a pre-programmed sequence.
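The effect of firing rules in a fixed sequence can be illustrated with a toy in-memory store. In this sketch each rule is a Python function that plays the role of a CONSTRUCT query, and merging its output into the store plays the role of INSERT; the two example rules and the ex: property names are deliberately simplified assumptions, not the rules of the prototype.

```python
def fire_rules(store, rules):
    """Apply each rule once, in the given order, adding its output to the store."""
    store = set(store)
    for rule in rules:
        # The rule only reads a snapshot; its results are merged afterwards,
        # so later rules can build on what earlier rules inferred.
        store |= rule(frozenset(store))
    return store


def karma_rule(triples):
    """Toy rule: a high participation level yields high karma."""
    return {
        (s, "ex:karma", "geek:high-karma")
        for s, p, o in triples
        if p == "ex:level" and o == "geek:high"
    }


def local_expertise_rule(triples):
    """Toy rule: high karma yields an expert local expertise."""
    return {
        (s, "ex:localExpertise", "geek:expert")
        for s, p, o in triples
        if p == "ex:karma" and o == "geek:high-karma"
    }


if __name__ == "__main__":
    facts = {("ex:alice", "ex:level", "geek:high")}
    inferred = fire_rules(facts, [karma_rule, local_expertise_rule])
    for triple in sorted(inferred):
        print(triple)
```

Since every rule runs only once, the order matters: if the local expertise rule ran before the karma rule, it would find no karma values to work with.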
Behavior Rules After the data recollection and consolidation, the inference of implicit information can start. There are three kinds of behavior rules, matching the three aspects of the participation profile:
• We consider that a person has a geek:requester role within a forum when she usually starts new discussions (threads). On the other hand, a geek:replier mainly limits her participation to contributions to open discussions. Figure 7 shows a query that tags as geek:replier those users with more replies than requests in a forum. Note that the nested SELECT query in the figure uses extended SPARQL features (aggregates).
• The participation level of a user in a forum cannot be defined in absolute terms due to the different activity of each forum. Therefore, we assign the levels (high, medium and low) relative to the participation of other users in the same forum. To this end, all the participants in each forum are ordered by the number of messages they have sent. As central tendency measurements do not provide equal-sized subsets, a percentile distribution (e.g., 20%, 60%, 20%) is applied over this list to classify the users into levels. This rule requires calculating complex aggregates, which are not available in the SPARQL extension provided by Virtuoso. Consequently, we use a Python script to apply this rule.
• Some online communities have a built-in explicit karma value assigned to their participants. In such cases, these already available karma values can be re-used by GEEK. However, in the general case where there is no available value, the geek:Karma-level can be computed, but only after the two previous and independent facets of the participation profile have been inferred. The karma rules depend on three variables: the participation role and level, and the type of each forum. Therefore, the karma is inferred only for forums, not projects. For instance, if a user participates in an announcement forum, has a high level of participation and plays a requester role, then we can say that this user has a high karma (Figure 8). These rules are of a heuristic nature, inspired by experience and common sense. In the case of the previous example, it is presumed that someone who often publishes announcements is a relevant member of the community.
Figure 7. Rule used to infer the geek:Replier role of a user in a forum
Figure 8. One of the rules used to infer karma in announcement forums
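The three behavior rules can be sketched in Python as follows. This is an illustrative reading of the descriptions above rather than the prototype's code: the treatment of ties, the rounding of the percentile cut-offs and the forum type label "announcement" are assumptions, and only the single karma rule spelled out in the text is encoded.

```python
def infer_role(requests, replies):
    """Role in a forum from the number of threads started and replies sent."""
    if replies > requests:
        return "geek:replier"
    if requests > replies:
        return "geek:requester"
    return None  # assumption: ties are left unclassified


def infer_levels(message_counts, high_share=0.2, low_share=0.2):
    """Assign geek:high / geek:medium / geek:low relative to other users.

    Users are ranked by number of messages (ties broken by identifier, an
    assumption); the top and bottom shares default to the 20%/60%/20% split.
    """
    ranked = sorted(message_counts, key=lambda u: (-message_counts[u], u))
    n = len(ranked)
    n_high = round(n * high_share)
    n_low = round(n * low_share)
    levels = {}
    for position, user in enumerate(ranked):
        if position < n_high:
            levels[user] = "geek:high"
        elif position >= n - n_low:
            levels[user] = "geek:low"
        else:
            levels[user] = "geek:medium"
    return levels


def infer_karma(forum_type, level, role):
    """Encode the one karma rule given as an example in the text."""
    if (forum_type, level, role) == ("announcement", "geek:high", "geek:requester"):
        return "geek:high-karma"
    return None  # the remaining combinations are not spelled out in the text


if __name__ == "__main__":
    counts = {"u1": 50, "u2": 20, "u3": 10, "u4": 5, "u5": 1}
    levels = infer_levels(counts)
    print(levels)
    print(infer_karma("announcement", levels["u1"], infer_role(12, 3)))
```

The level rule is the only one that needs a view of the whole forum at once, which is consistent with it being implemented outside SPARQL.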
Local Expertise Rules After the analysis of the users’ behavior in each community, some information that was hidden behind the data is now shown explicitly. Therefore, it can be used as evidence for driving inferences about the expertise of people. These rules are different for forums and projects. For forums, the local expertise directly depends on the karma previously inferred. For instance, Figure 9 presents a rule that infers high local expertise profiles for people with high karma. In the case of projects, different variables may be used to infer the local expertise profiles, for example, how long the person has been involved in a project. This kind of data can be obtained, for instance, from Ohloh’s API. Right now this step (i.e., inferring local expertise) does not introduce a lot of new knowledge, but it is useful to abstract the sources. In the future, we envision it may be used as a hook point to add more complex business logic in other scenarios, like the ones that are described later in the chapter.
Figure 9. One of the rules used to get the local expertise of a person in a particular topic from her karma in a forum
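A minimal sketch of the forum case is shown below. Only the mapping from high karma to expert-level local expertise is described in the text; the mappings for medium and low karma, the tuple layout and the function name are assumptions added to make the example complete.

```python
KARMA_TO_EXPERTISE = {
    "geek:high-karma": "geek:expert",          # described in the text
    "geek:medium-karma": "geek:intermediate",  # assumption
    "geek:low-karma": "geek:beginner",         # assumption
}


def local_expertise(person, forum, karma, forum_topics):
    """Return (person, topic, forum, level) tuples for each topic of the forum.

    `forum_topics` maps a forum to the thesaurus concepts it was tagged with.
    """
    level = KARMA_TO_EXPERTISE.get(karma)
    if level is None:
        return []
    return [
        (person, topic, forum, level)
        for topic in sorted(forum_topics.get(forum, ()))
    ]


if __name__ == "__main__":
    topics = {"ex:desktop-devel": {"Desktop applications", "Development"}}
    for profile in local_expertise(
        "ex:alice", "ex:desktop-devel", "geek:high-karma", topics
    ):
        print(profile)
```

Keeping the forum in each tuple preserves the local scope of the inference: the same person can hold a different level on the same topic in another forum.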
Global Expertise Rule This final rule is in charge of calculating an analytic value that represents a person’s expertise on a certain topic. This value is calculated by consolidating the person’s local expertise values v, derived from the application of the previous set of rules. This is a crucial step for any ExpertFinder system which aims to provide reliable answers. The global expertise is measured on a continuous scale in the range [0,1]. An arbitrary position on this scale is assigned to each of the three levels of local expertise (geek:Expertise-level) by means of the mapping function μ:
μ(geek:beginner) = 0.3
μ(geek:intermediate) = 0.7
μ(geek:expert) = 1.0
However, not all data sources are equally trustworthy. The original collected data and the subsequent inferences have to be weighed carefully, taking into account their provenance. For instance, the participants of the mailing list ‘Debian development’ are very well-known and skillful people in the topics related to the development of Debian. Hence we fully trust the inferences obtained from the analysis of the participation in this community. Conversely, the mailing list ‘Debian users’ is a common forum where novice Debian users usually ask newbie questions. From our point of view, the local expertise measurements derived from these two forums cannot be considered equivalent. Being a local expert on ‘Debian development’ is a much higher social ranking than being a local expert on ‘Debian users’ due to the higher technical level of the former. In order to assign a relative value to each community, we introduce a weight vector w that defines their salience. At this stage, for each person and topic, the system calculates a global expertise from the set of local expertise values of the communities relevant to this topic in which she participates. So given a person, a particular topic, a set of n forums and a set of m projects where she participates, we define the global expertise as the weighted mean of the local expertise vector:
$$E = \frac{\sum_{i} w_i v_i}{\sum_{i} w_i}$$
Projects are usually more reliable sources than forums for mining expertise information because they have a higher signal-to-noise ratio. However, forums are usually richer in terms of numbers of participants and topics. Therefore, it may be useful to balance the relevance of all forums over the projects. This effect can be achieved by tuning the weight vector, that is, by applying a bias to their weights. Alternatively, it is possible to split the vector v in two segments, $v^f$ and $v^p$, and separately compute the weighted mean of forums and projects. The coefficients $c_f$ and $c_p$ balance between forums and projects:

$$E = \frac{1}{c_f + c_p} \left( c_f \, \frac{\sum_{i} w^f_i v^f_i}{\sum_{i} w^f_i} + c_p \, \frac{\sum_{j} w^p_j v^p_j}{\sum_{j} w^p_j} \right)$$
The computed results provide an analytical value, E , that measures the global expertise of a person on a topic. These values can be used to create a ranking of people by their expertise on a topic. Computing these values is outside of the expressivity of SPARQL rules. In the prototype, a Python script calculates the values for E for each (person, topic) pair.
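The two weighted-mean formulas for E can be sketched in a few lines of Python, the language the chapter names for this step. This is a minimal illustration and not the authors' script: the function names, the example weights and the coefficients c_f and c_p are assumptions chosen only to make the arithmetic visible. Only the μ mapping (0.3, 0.7, 1.0) is taken from the text.

```python
from typing import Sequence

# Mapping of local expertise levels onto [0, 1], as given in the text.
MU = {"beginner": 0.3, "intermediate": 0.7, "expert": 1.0}


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Return sum(w_i * v_i) / sum(w_i) for local expertise values v and weights w."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * v for w, v in zip(weights, values)) / total_weight


def split_global_expertise(v_forums, w_forums, v_projects, w_projects,
                           c_f: float = 1.0, c_p: float = 1.0) -> float:
    """Variant that balances forums against projects with coefficients c_f and c_p."""
    if c_f + c_p <= 0:
        raise ValueError("c_f + c_p must be positive")
    forums = weighted_mean(v_forums, w_forums)
    projects = weighted_mean(v_projects, w_projects)
    return (c_f * forums + c_p * projects) / (c_f + c_p)


# Hypothetical person: expert on a highly trusted forum, intermediate on a
# less trusted one, beginner in a single project. All weights are illustrative.
v_f = [MU["expert"], MU["intermediate"]]
w_f = [1.0, 0.5]
v_p = [MU["beginner"]]
w_p = [1.0]

e_single = weighted_mean(v_f + v_p, w_f + w_p)   # one weight vector for everything
e_split = split_global_expertise(v_f, w_f, v_p, w_p, c_f=1.0, c_p=3.0)
```

With these made-up inputs the single weighted mean is 1.65 / 2.5 = 0.66, while the split variant with c_p three times c_f gives (0.9 + 3 × 0.3) / 4 = 0.45, which shows how the coefficients shift the balance towards project evidence. The split formula is undefined when a person has no forums or no projects; the sketch raises an error in that case, which is a design choice of this example rather than something the chapter specifies.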
API and User Interface

A high-level API has been designed and developed in order to hide the complexity of the data access for applications. Most of the operations in this API are realized by a set of SPARQL SELECT queries against the inferred model in the RDF store (after the execution of the rules). Consequently, the inference is performed off-line and the response time is low. The user interface is concisely designed with a very distinct layout in order to allow users (typically, enterprise managers and people looking for peers to solve a particular problem) to find answers to their queries with a point-and-click interface (Figure 10). There are four interrelated categories (people, topics, forums and projects) explicitly listed in a side column. These links lead to extensive listings of all the entities. For each entity, a detailed view is available. Interestingly, this view consolidates information that comes from independent sources (e.g., photographs and links to personal homepages). The user interface
Figure 10. Screenshot of the prototype showing details of an outstanding member of the communities
also shows ranked listings of entities matching a query, for instance, a ranked list of experts on a given topic. The user interface was built as a JavaEE application and provides a rich interaction with users. This user interface is just a demonstrator that serves to assess the final quality of our prototype. Nevertheless, we are also working on other ontology-based techniques to visualize large amounts of data (Katifori et al., 2007), even though we still do not have relevant results regarding this issue.
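As an illustration of how one API operation might be backed by a SPARQL SELECT query, the sketch below builds the query text for a ranked list of experts on a topic. The chapter does not publish the GEEK vocabulary, so the geek: namespace URI and the property names geek:person, geek:topic and geek:globalExpertise are hypothetical placeholders, as are the function name and the default limit. Only the FOAF namespace is the real one.

```python
from string import Template

# Hypothetical namespace: the actual GEEK ontology URI is not given in the chapter.
GEEK_NS = "http://example.org/geek#"

# Query skeleton. The geek:* property names are assumptions for illustration.
_RANKED_EXPERTS = Template("""\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX geek: <$geek_ns>

SELECT ?person ?name ?score
WHERE {
  ?evidence geek:person ?person ;
            geek:topic <$topic_uri> ;
            geek:globalExpertise ?score .
  OPTIONAL { ?person foaf:name ?name }
}
ORDER BY DESC(?score)
LIMIT $limit
""")

# Characters that may not appear inside a SPARQL IRI reference.
_FORBIDDEN_IN_IRI = set('<>"{}|^`\\')


def ranked_experts_query(topic_uri: str, limit: int = 10) -> str:
    """Return SPARQL text listing the top `limit` people for a topic, best first."""
    if not topic_uri or any(ch in _FORBIDDEN_IN_IRI or ch <= " " for ch in topic_uri):
        raise ValueError("topic_uri is not a valid IRI reference")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return _RANKED_EXPERTS.substitute(
        geek_ns=GEEK_NS, topic_uri=topic_uri, limit=int(limit)
    )


query = ranked_experts_query("http://example.org/topics/debian", limit=5)
```

Because the expertise values are computed off-line and stored in the inferred model, a query of this shape only needs to sort and truncate precomputed scores, which is consistent with the low response time described above. The IRI check is a small safeguard against malformed input and is not a complete validator.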
APPLICABILITY TO THE ENTERPRISE SCENARIO

The technologies applied to build the GEEK prototype are part of the standard semantic web stack, and thus they are domain-independent and portable. An attempt has been made to strike a balance between being too generic and being too specialized. The basic ideas underlying the prototype can be applied to finding experts in any domain and scenario. The architecture presented in this chapter is also generic and should remain valid for domains other than the open source communities. We identify the following changes as requirements for deploying the GEEK solution in a new scenario, for instance, within a large company:

• Data Recollection Components: ideally, harvesting the RDF data from distributed sources should be just a matter of crawling the sources. Unfortunately, most sources do not yet publish RDF triples. Consequently, the data recollection process includes a specific task to produce the RDF triples. This task needs to be customized for each source, although in this chapter we have demonstrated different data recollection methods that can be adapted for new sources.
• Data Model: currently, the scope of the data model is restricted to expertise evidence related to the participation of individuals in communities (including forums and projects). We envision other mechanisms to obtain expertise evidence, for instance, from the authorship of scientific documents, presentations and technical reports. These new mechanisms should be reflected in the data model.
• Expertise Inference Rules: the current rule set can be slightly changed to increase rule expressiveness, or enriched with new rules that use new facts to infer expertise evidence.
• Domain-Specific Thesaurus: this is the only piece that needs to be completely changed in line with the domain requirements. The prototype demonstrates that even a simple thesaurus with just basic semantic relationships performs adequately.
• Access Control: within an enterprise scenario, privacy issues arise. Therefore, provisions must be made to control access to the system, as well as to the data stores that may contain confidential information about the employees and the company know-how.
For instance, in the particular scenario of a multi-national steel-producing company, we need to identify internal sources which can be potentially useful, and put in place adequate adaptors to acquire large datasets from these sources. The data model and expertise inference rules would be enriched for the use case, and a new thesaurus defined for the specific business of the company.
CONCLUSION AND FUTURE WORK

In this chapter we propose an ExpertFinder prototype composed of four components, each of which has been described in detail. The prototype demonstrates that it is possible to exploit the web of data for world-wide expert and expertise identification by leveraging the semantic web technology stack, including RDF, reasoning, SPARQL, thesauri and rules over ontologies. We note that this information (the ontology, the dataset and the inferred expertise) has not been made public so far, because of the many privacy issues that must be addressed first. The proposed prototype describes raw data from different open communities using standard vocabularies such as FOAF, SIOC, DOAP and SKOS. In this way, data from independent sources can be integrated in a single model. Data quality is a critical issue, which is addressed by two smushing strategies. Their aim is to detect and merge instances that denote the same entity under different identities. In Shi et al. (2008) we evaluated the smushing strategies and showed promising results that make us confident about the quality of the data used for the expertise inference process. An experiment was carried out to measure the accuracy of the results of the smushing process. Feedback was requested from the people who were identified as members of at least four of the communities at hand. The number of responses was insufficient to draw significant conclusions, but the individual responses were generally positive and supportive. In the future, a feature may be added to the system user interface to allow users to flag results of the smushing process that they suspect are incorrect. Such warning flags would later be studied and traced to discover the origin of the problem. We put effort into efficiently integrating inference techniques based on a set of custom rules, which are executed on top of all the data to reveal evidence of people’s expertise for each topic in a given domain. Finally, some mathematical functions are used to aggregate expertise from different pieces of evidence in order to create a consistent and reliable profile. A preliminary quality evaluation was based on the open source communities data, and showed promising and consistent results with respect to the efficiency and usefulness of the expert finding.
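To make the idea of smushing concrete, the sketch below merges person records that share an e-mail hash. It is an illustration only and not necessarily either of the two strategies evaluated in Shi et al. (2008): it treats the FOAF mbox_sha1sum value (the SHA1 hash of the mailto: URI) as an identifying key, and it lowercases and trims addresses before hashing, which is a normalization assumption of this example. The record identifiers and addresses are invented.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def mbox_sha1sum(email: str) -> str:
    """SHA1 of the mailto: URI, after an assumed lowercase/trim normalization."""
    uri = "mailto:" + email.strip().lower()
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


def smush(records: List[Tuple[str, List[str]]]) -> List[List[str]]:
    """Group record identifiers that share at least one e-mail hash.

    `records` is a list of (identifier, [e-mail addresses]) pairs. Merging is
    transitive, so it is implemented with a small union-find structure.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_owner: Dict[str, str] = {}  # e-mail hash -> first record seen with it
    for identifier, emails in records:
        parent.setdefault(identifier, identifier)
        for email in emails:
            key = mbox_sha1sum(email)
            if key in first_owner:
                union(first_owner[key], identifier)
            else:
                first_owner[key] = identifier

    clusters = defaultdict(list)
    for identifier in parent:
        clusters[find(identifier)].append(identifier)
    return sorted(sorted(members) for members in clusters.values())


# Invented records from three hypothetical sources.
records = [
    ("list:alice", ["Alice@Example.org"]),
    ("ohloh:alice", ["alice@example.org", "a@work.example"]),
    ("list:bob", ["bob@example.org"]),
    ("svn:asmith", ["a@work.example"]),
]
identities = smush(records)
```

Here the three records for the same developer end up in one cluster because the merge is transitive: the second record links the first and the fourth through two different addresses. Exact matching on a hash cannot catch near-duplicates such as misspelled names, which is where the string comparisons mentioned in this chapter would come in.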
A more in-depth study based on actual enterprise data will be performed once the system is deployed in the steel-producing company scenario. In order to develop the prototype described in this chapter, we cherry-picked from the current
state of the art of the Semantic Web technologies: vocabularies, query languages, data extraction and rules over RDF datasets. Therefore, the authors do not claim a huge step forward in this research context. Instead, we made targeted improvements that made it possible to innovate and to build a practical solution to a business challenge by implementing many of the Semantic Web technologies. We are aware of the limitations of the current rule set and formulae for expertise mining. Regardless of the quantity and quality of the collected information, the expert finding inferences actually use only a subset of this information. On the one hand, some information, such as the e-mail address, is irrelevant for the expertise calculation (i.e., it does not provide any clue about the knowledge and experience of its owner), but is really useful for identity smushing. On the other hand, some information is subjective by nature, such as the karma level, and thus it is difficult to interpret in a consistent and convincing manner. Even the local expertise level is measured on an arbitrary and coarse-grained scale. From a reasoning perspective, GEEK does not make use of any kind of traditional DL reasoner, for several reasons:

• Low-level data integration from independent sources comes for free from the RDF semantics (graph merging).
• The proposed smushing techniques require an expressivity beyond OWL-DL. In particular, IFP must be enforced for datatype properties, and non-trivial string comparisons are required. We note that this situation is about to change with the upcoming OWL 2 (Motik et al., 2009) and its key axioms (HasKey).
• The ability to turn specific OWL reasoning features on and off is crucial in data integration scenarios, particularly regarding the semantics of owl:sameAs reasoning.
• We found that our application requirements seldom rely on subsumption or other typical DL reasoning.
• The expertise inference rules require an expressivity beyond DL and can be seen as production rules that make use of mathematical functions. These rules produce new knowledge that is inserted back directly into the data repository.
• Using SPARQL to build a light-weight production rules system is feasible by extending the semantics of RDF/OWL vocabularies. However, SPARQL still lacks aggregation and arithmetic operators, so they must be replaced with external logic implemented by a scripting language (Python in our case). The same applies to rule chaining. We hope that a more expressive descendant of SPARQL and the upcoming results from the RIF Working Group will help to overcome these limitations.
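The external logic mentioned in the last point, aggregation plus rule chaining driven from a script, can be sketched as a small forward-chaining loop. This is an assumption-laden illustration rather than the prototype's code: facts are plain Python tuples instead of triples in an RDF store, and both rules, their predicate names and the threshold are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set, Tuple

Fact = Tuple[str, str, object]
Rule = Callable[[Set[Fact]], Iterable[Fact]]

ACTIVE_THRESHOLD = 3  # hypothetical number of posts that counts as "active"


def rule_active(facts: Set[Fact]) -> Set[Fact]:
    """Aggregation step: count posts per person, done here in Python."""
    counts = Counter(s for (s, p, _o) in facts if p == "posted")
    return {(person, "isActive", True)
            for person, n in counts.items() if n >= ACTIVE_THRESHOLD}


def rule_intermediate(facts: Set[Fact]) -> Set[Fact]:
    """Chained step: depends on 'isActive', which only rule_active produces."""
    active = {s for (s, p, o) in facts if p == "isActive" and o is True}
    members = {s for (s, p, _o) in facts if p == "memberOf"}
    return {(person, "localExpertise", "intermediate")
            for person in active & members}


def forward_chain(facts: Iterable[Fact], rules: List[Rule]) -> Set[Fact]:
    """Apply all rules repeatedly until no rule yields a new fact (fixpoint)."""
    known = set(facts)
    while True:
        new_facts: Set[Fact] = set()
        for rule in rules:
            new_facts |= set(rule(known)) - known
        if not new_facts:
            return known
        known |= new_facts  # inferred facts are fed back, as into the repository


# Invented input data.
facts = {
    ("alice", "posted", "m1"), ("alice", "posted", "m2"), ("alice", "posted", "m3"),
    ("alice", "memberOf", "debian-devel"),
    ("bob", "posted", "m4"), ("bob", "memberOf", "debian-devel"),
}
closure = forward_chain(facts, [rule_intermediate, rule_active])
```

The loop needs two passes to derive the expertise fact for the first person, because the second rule can only fire once the first has added its conclusion. It terminates because these rules can only produce a finite set of facts; rules that mint new values on every pass would need an explicit bound.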
There are several challenges along the lines we presented. Firstly, it would be desirable to add more functionality to the system. In particular, provenance and explanations about the conclusions can empower the user to make precise judgments based on these conclusions. To improve that precision, it would also be necessary to explore new ontology-based techniques capable of visualizing such a huge volume of data. A finer classification of the profiles requires more expressive rules, probably associated with new kinds of reasoning. In this sense, it is worth exploring the combination of suitable logics and inference results to achieve a better reflection of the actual expertise of the individuals. We have highlighted the connection between the quality of the data and the soundness of the conclusions. We believe that the precision and recall of the smushing functions can be further improved by using more properties, and by gathering as much data as possible. However, we identify the centralized RDF store as the main weakness in
our architecture because it threatens scalability. A more decentralized architecture based on data distribution would address this problem, but it is much more challenging. Regarding the expertise mining, we think more can be done on at least two fronts. Firstly, larger amounts of structured data can be gathered and processed. For instance, detailed traces of the participation in projects can be found in the source control management system. Document authorship provides hints on expertise too. Valuable information is also implicit in the wide social networks that cut across several communities. Secondly, the content of posts, documents, homepages and other unstructured sources of information may provide new evidence about the level of expertise. The latter kind of mining would require NLP techniques such as Information Extraction.
ACKNOWLEDGMENT

We would like to thank ArcelorMittal for providing a practical and real-world use case. We acknowledge the support and efforts of ArcelorMittal and CTIC workmates in developing and implementing GEEK. We especially wish to thank our colleague Miguel Garcia for his contributions to this work. Moreover, we thank the open source communities for providing the sources from which all the data have been extracted; special thanks go to those members of the open communities who provided us with feedback about the accuracy of our smushing process.
REFERENCES

Ackerman, M., & McDonald, D. W. (1996). Answer Garden 2: merging organization memory with collaborative help. In Proceedings of the 1996 ACM conference on Computer Supported Cooperative Work (pp. 97-105). ACM Press.
Ackerman, M., Pipek, V., & Wulf, V. (2002). Sharing Expertise: Beyond Knowledge Management. MIT Press.
Aleman-Meza, B., Bojars, U., Boley, H., Breslin, J. G., Mochol, M., Nixon, L. J. B., et al. (2007, June). Combining RDF vocabularies for Expert Finding. In Proceedings of the 4th European Semantic Web Conference (ESWC2007), Innsbruck, Austria.
Berrueta, D., Fernández, S., & Shi, L. (2008, April). Bootstrapping the Semantic Web of Social Online Communities. In Proceedings of the Workshop on Social Web Search and Mining (SWSM2008), co-located with WWW2008, Beijing, China.
Bizer, C., & Schultz, A. (2009, to appear). The Berlin SPARQL Benchmark. International Journal on Semantic Web and Information Systems - Special Issue on Scalability and Performance of Semantic Web Systems.
Boley, H. (2006, June). The RuleML family of Web Rule Languages. In Proceedings of the 4th Workshop on Principles and Practice of Semantic Web Reasoning, Budva, Montenegro (LNCS 4187).
Boley, H., Hallmark, G., Kifer, M., Paschke, A., Polleres, A., & Reynolds, D. (2009). RIF Core Dialect. W3C Working Draft, July 2009. Retrieved July 24, 2009 from http://www.w3.org/TR/rif-core/
Breslin, J., Decker, S., Harth, A., & Bojars, U. (2006, July). SIOC: an approach to connect web-based communities. International Journal of Web Based Communities, 2(2), 133–142.
Brickley, D., & Miller, L. (2007). FOAF: Friend-of-a-friend. Retrieved July 24, 2009 from http://xmlns.com/foaf/spec/
Carroll, J. J., Bizer, C., Hayes, P., & Stickler, P. (2005). Named Graphs, provenance and trust. In Proceedings of the 14th International Conference on World Wide Web (WWW2005), ACM, New York (pp. 613-622).
Craswell, N., de Vries, A., & Soboroff, I. (2005). Overview of the TREC-2005 Enterprise Track. In Proceedings of TREC-2005, Gaithersburg, United States.
Dumbill, E. (2004). DOAP, description of a project. Retrieved July 24, 2009 from http://usefulinc.com/doap/
Eastlake, D., & Jones, P. (2001). RFC 3174: US Secure Hash Algorithm 1 (SHA1). Technical Report IETF. Retrieved July 24, 2009 from http://dret.net/rfc-index/reference/RFC3174
Erling, O., & Mikhailov, I. (2008, October). Towards web scale RDF. In Proceedings of the 4th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS2008), co-located with the 7th International Semantic Web Conference (ISWC2008), Karlsruhe, Germany.
Erling, O., & Mikhailov, I. (2009). SPARQL and Scalable Inference on Demand. In Proceedings of the 6th European Semantic Web Conference (ESWC2009), Heraklion, Greece.
Fernández, S. (2008). RDFohloh, a RDF wrapper of Ohloh. In Proceedings of the 1st Workshop on Social data on the Web (SDoW2008), co-located with the 7th International Semantic Web Conference (ISWC2008), Karlsruhe, Germany.
Fernández, S., Berrueta, D., Shi, L., Labra, J. E., & Ordóñez de Pablos, P. (2009). Mailing lists and social semantic web. In Social Web Evolution: integrating semantic applications and Web 2.0 technologies (pp. 42–56). Hershey, PA: Information Science Reference.
Grosof, B. N., Horrocks, I., Volz, R., & Decker, S. (2003). Description Logic Programs: Combining Logic Programs with Description Logic. In Proceedings of the 12th International World Wide Web Conference, ACM (pp. 48-57).
Haarslev, V., & Möller, R. (2001). RACER System Description. In Proceedings of the 1st International Joint Conference on Automated Reasoning (Vol. 2038).
Horrocks, I. (1998). Using an expressive Description Logic: FaCT or Fiction? In Proceedings of the 6th International Conference of Principles of Knowledge Representation and Reasoning (pp. 636-649).
Horrocks, I., Patel-Schneider, P. F., Boley, H., Tabet, S., Grosof, B., & Dean, M. (2004, May). SWRL: A semantic web rule language combining OWL and RuleML. W3C Member Submission. Retrieved July 24, 2009 from http://www.w3.org/Submission/SWRL/
Katifori, A., Halatsis, C., Lepouras, G., Vassilakis, C., & Giannopoulou, E. (2007). Ontology visualization methods - a survey. ACM Computing Surveys, 39(4). doi:10.1145/1287620.1287621
Kiefer, C. (2007). Imprecise SPARQL: towards a unified framework for similarity-based semantic web tasks. In Proceedings of the 2nd Knowledge Web PhD Symposium (KWEPSY), co-located with the 4th European Semantic Web Conference (ESWC2007), Innsbruck, Austria.
Krulwich, B., & Burkey, C. (1996). ContactFinder agent: answering bulletin board questions with referrals. In Proceedings of the 13th National Conference on Artificial Intelligence, Portland, United States (Vol. 1, pp. 10-15).
Li, J., Boley, H., Bhavsar, V. C., & Mei, J. (2006). Expert Finding for eCollaboration using FOAF with RuleML rules. In Proceedings of the 2006 Conference on eTechnologies, Montreal, Canada (pp. 53-65).
Mika, P. (2005). Flink: Semantic Web Technology for the Extraction and Analysis of Social Networks. Journal of Web Semantics, 3(2), 211–223. doi:10.1016/j.websem.2005.05.006
Miles, A., & Bechhofer, S. (2009, June). SKOS Simple Knowledge Organization System Reference. W3C Proposed Recommendation. Retrieved July 24, 2009 from http://www.w3.org/TR/skos-reference
Motik, B., Patel-Schneider, P. F., & Parsia, B. (2009, June). OWL2 Web Ontology Language: structural specification and functional-style syntax. W3C Candidate Recommendation. Retrieved July 24, 2009 from http://www.w3.org/TR/owl2-syntax/
Polleres, A. (2007). From SPARQL to rules (and back). In Proceedings of the 16th World Wide Web Conference (WWW2007), Banff, Canada (pp. 787-796).
Prud’hommeaux, E., & Seaborne, A. (2008, January). SPARQL Query Language for RDF. W3C Recommendation. Retrieved July 24, 2009 from http://www.w3.org/TR/rdf-sparql-query
Rosati, R. (2006). Integrating ontologies and rules: semantic and computational issues. In Reasoning Web 2006 (LNCS 4126, pp. 128-151).
Seaborne, A., & Manjunath, G. (2008, April). SPARQL Update, a language for updating RDF graphs (Technical Report Hewlett-Packard HPL-2007-102). Retrieved July 24, 2009 from http://jena.hpl.hp.com/~afs/SPARQL-Update.html
Shi, L., Berrueta, D., Fernández, S., Polo, L., & Fernández, S. (2008, October). Smushing RDF instances: are Alice and Bob the same open source developer? In Proceedings of 3rd ExpertFinder Workshop on Personal Identification and Collaborations: Knowledge Mediation and Extraction (PICKME), co-located with the 7th International Semantic Web Conference (ISWC2008), Karlsruhe, Germany.
Streeter, L., & Lochbaum, K. (1988). Who knows: a system based on automatic representation of semantic structure. In Proceedings of RIAO’88, Cambridge, MA (pp. 380-388).
Zhang, J., Ackerman, M., & Adamic, L. (2007, May). Expertise networks in online communities: structure and algorithms. In Proceedings of the 16th International Conference on World Wide Web (WWW2007), Banff, Canada.
Chapter 3
Recurrent Interactions, Acts of Communication and Emergent Social Practice in Virtual Community Settings

Demosthenes Akoumianakis
Technological Education Institution of Crete, Greece
INTRODUCTION

Virtuality refers to a state where the subject is separated from the body (Sterne, 2006). Throughout the history of mankind, virtualities of various sorts have served as a means for human beings to cope with complex phenomena that are difficult to grasp and explain. Interestingly, such virtualities are constructed by a variety of instruments, such as religious standpoints and political beliefs, but also more tangible things such as drugs or coding schemes (e.g., Morse code). Nevertheless, each type is characteristically different and distinct. Consider, for example, virtualities resulting from adopting a religion. They are primarily spiritual spaces, always revealed through recurrent engagement in a code of practice shaped by a system of values. Similarly, drugs result in mental spaces brought about by a recurrent mindset induced by the material properties of the drug. On the other hand, coding schemes such as Morse code result in virtualities that lead to communication acts arising from the history of co-engagement of the partners in alternative linguistic domains. In all cases, there is a strong link between virtuality and materiality, frequently established through ‘cause and effect’ relationships. Irrespective of how they are constructed, virtualities seem to share common ground in the sense that they facilitate recurrent engagement in the designated practice. It is such recurrent experience that leads to an act of communication meaningful to those subscribing to the virtuality. Furthermore, it is this repetitive engagement in a virtual space that may influence the enactment of socio-material practice in physical space, leading to an intertwining between virtual and physical activities. Although elements of the associated practices may be enacted and administered locally by individuals, there is always a ‘boundary’ core, which is ‘common’ enough and ‘plastic’ enough to facilitate local variations.
Consequently, in all cases, the resultant virtualities satisfy three key properties: (a) they assume some sort of medium
(i.e., spiritual / mental or physical), (b) they are revealed through recurrent engagement in some sort of common practice of the members subscribing to the set principles and (c) the practice of the subscribing members is framed as much in social interaction as in processes, tools and artifacts. Virtual communities, the subject matter of this chapter, constitute a virtuality which is formed over networked digital media. Since the first introduction of the term in the work of Rheingold (1993), virtual communities have gained popularity in the research agendas of management scholars, social scientists and computer science researchers. Management scholars examine how business-sponsored virtual communities provide a new means for marketing (Stockdale, 2007) and sustaining innovation (Fuller et al., 2006). Anthropologists explore the ways in which Computer Mediated Communication (CMC) is giving rise to new forms of virtual communities and the socio-cultural implications of new communication technologies (e.g., Hine, 2000; Fabian, 2002; Wilson and Peterson, 2002; Eisenlohr, 2004). Recently there has been rising interest in the archaeology of virtual communities (Jones, 2003), especially those formed using virtual reality as a medium to establish ‘places where the imaginary meets the real’ (Bartle, 2003). Jones (1997; 2003) coined the term cyber-archaeology, implying an approach to online communities as ‘virtual settlements’. Subsequent refinements of the concept attempt to expand the domain of study of cyber-archaeology, claiming that ‘the role of cyber-archaeology is not only to study the ‘actual’ technologies employed by virtual communities, but also the virtual objects they create within cyber-space’ (Harrison, 2009, p. 76).
In this chapter, we pick up these intersecting threads of concern to analyze virtual communities as emergent structures resulting from the members’ recurrent co-engagement in a designated practice (i.e., communication, gaming, professional development, science, etc). Our intention is to partly challenge the methodological ground
available for analyzing virtual communities, which is implicitly or explicitly focused on community management (i.e., how communities are discovered, established, sustained and maintained), dismissing or undermining the practice they are meant to serve. Indeed, the scholarly literature offers few critical insights into what constitutes practice in these communities, how networked media facilitate existing or new practices and how online practice intertwines with offline practice. Consequently, our key concern is to sketch the boundary that distinguishes community management from practice management in virtual settings and to attempt to qualify practice on the basis of certain criteria. In doing so, our objective is not to theorize about the concepts of ‘community’ and ‘practice’, or the mechanics through which they are shaped and revealed. Instead, we seek to unfold the methodological challenges confronting practice-based researchers interested in understanding how new media influence (i.e., reconstruct or extend) practice and the way in which it is revealed. The chapter is structured as follows. The next section reviews prominent conceptions of virtual communities and establishes bridges with practice-based studies. Then we elaborate on the ingredients of an approach for understanding communities through their practice, which builds on the concepts of ‘cultural artifacts’ and ‘technology constituting structures’. This forms the critical lens for a practice-based analysis of two case studies of virtual communities engaging in music notation lessons and vacation package assembly. Analysis of the two cases reveals interesting features of each community type and leads to several remarks about practice and how it is framed and revealed in virtual settings. The chapter concludes with a summary and recommendations for future research on methodologies for studying virtual communities.
BACKGROUND

There are two strands of theoretical thinking that are broadly relevant to our current work. The first relates to the concept of a ‘community’ as advanced by social scientists over the years and as considered in recent social studies of technology. The second is concerned with ‘practice’ as an epistemology of knowledge in action. Our effort in this chapter is to understand virtual communities through the practice in which their members engage. To this end, we present a non-exhaustive review of the argumentation presented in scholarly works in an effort to highlight controversies and research challenges, thus underlining the rationale of the present work.
Community Conceptions

Social psychology research has revealed that humans have an innate need to belong and be affiliated with others, which is a primary motivation in joining both online and offline communities (Ridings & Gefen, 2004, p. 4). Moreover, since the late 19th century, community as a social phenomenon has been, and continues to be, the subject of considerable debate among sociologists. In this debate two main traditions of theoretical thinking have emerged. The first considers community from a process-oriented perspective accounting for social solidarity, material processes of production and consumption, law making and symbolic processes of collective experience and cultural meaning. The second tradition considers community in terms of territorial boundaries, place-based social interactions, collective value and shared symbol systems that create a normative structure typified by organic traditions, collective rituals, fellowship and consensus building (Fernback, 2007). The latter perspective seems to have been more influential in designing technology for managing online communities. Specifically, there are several published works reporting on ‘place’-based online communities such as LambdaMOO
(Mnookin, 1996), MicroMuse (Duval Smith, 1999), Phish.net (Watson, 1997), FurryMuck and JennyMUSH (Reid, 1999), to name just a few. Despite its popularity, the ‘place’-based perspective is abandoned (or does not seem to be followed) when analyzing phenomena within these online places. Rather, it seems that the primary focus is on methods for analyzing processes for developing structures that resemble ‘civil society’ such as neighborhoods, public forums and marketplaces (Mnookin, 1996; Poster, 1997), or a kind of anthropological study of evolving social structures within these ‘places’ (Cherney, 1999). Furthermore, there is evidence to suggest that ‘place’-based online communities are seldom conceived as such by their members (Gochenour, 2006). Instead, members seem to appropriate them as infrastructures for communication with a geographically distributed network of friends and/or family. A shift from the typical ‘place’-based conception of online communities towards a network type of organization is also evidenced by the rise of social networking applications. These applications use an overt network structure, in which each individual functions as a node, to allow users to stay in touch with known friends, find connections to new ones, and organize events. Unlike online ‘places’, these applications make no claim to function as ‘civil societies’; rather, they provide linking mechanisms for individuals to form networks, which can then be leveraged for social, cultural, and economic purposes. The above trends are widely acknowledged in recent scholarly works where the term ‘community’ is interpreted in the context of computer-supported social networks (Wellman et al., 1996) as a means to foster new virtualities such as distributed communities (Gochenour, 2006), networks of practice (Brown & Duguid, 2000), knowledge communities (Lindkvist, 2005) and value-creating networks (Buchel & Raub, 2002). Although these efforts pursue seemingly similar targets, their underlying baseline as well as the resulting or proposed models have unique characteristics and distinct implications. Thus, there is an increasing tendency to expand the research agenda in various dimensions. For instance, rather than talking exclusively about ‘online communities’ and seeking to understand behavior and phenomena ‘within’ an online space, perhaps we should also begin to investigate ‘distributed communities’, emphasizing the mechanisms (i.e., the internet and the practice toolkits) that allow widely dispersed individuals to interact with one another. And while discussion of online communities has often focused on the nature of the subject within the community (Donath, 1999; Ito, 1997; Turkle, 1995), discussion of distributed communities may enable us to see how individuals function in a polyvalent way outside of specific spaces. Arguments such as the above indicate that the shift away from well-defined places one ‘goes to’ on the net, towards tools one uses to maintain a network, calls for other analytical tools for understanding whether or not these ‘networks’ can also be considered as ‘communities’, and what the nature of the ‘node’ is within them.
The Practice Lens

The practice lens is the term coined for a line of research focusing on technology use and the emergent structures revealed through such use (Orlikowski, 2000; 2002). It falls within a wider movement towards an analysis of practice as an epistemology useful for the study of working practices and the kind of practical and ‘hidden’ knowledge that supports them (Gherardi, 2009). Although such a movement, as it unfolds through the writings of social scientists (Giddens, 1984; Suchman, 1987; Brown & Duguid, 2001; Suchman, 2007) and organization and management scholars (Gherardi, 2001; Schatzki, 2001; Orlikowski, 2002; Nicolini et al., 2003), is rather polysemic and non-homogeneous, it has nevertheless accumulated a critical body of knowledge which
forms the common bond for many practice-based studies (Schatzki 1996; 2001). Amongst these studies, the practice lens brings to the forefront the analytic distinction between technology as artifact (i.e., a technology’s artifactual character) and technology use in practice (Orlikowski, 2002). According to the practice lens, technologies can be seen as prerequisites for particular outcomes, but the existence of prerequisites does not determine the outcome. Thus, through technology used in practice, new structures may arise which were not foreseen during the development of the technology. This perspective, brought to the context of online communities, suggests a clear separation between prerequisites and outcomes. Specifically, it leads to the argument that virtual communities should be conceived of and analyzed as emergent structures, whose software prerequisite is ‘connectivity’, as it facilitates recurrent interaction of people with whatever properties the technology at hand has. Phrased differently, as emergent structures, communities cannot be treated as embodied in technology. Instead, what is embodied in a technology is a particular set of symbols and material properties allowing for social connectivity. Virtual communities then emerge when such social connectivity is instantiated in practice. It is worth noting that today social connectivity can hardly be conceived of in terms of functional qualities or requirements. Instead, it appears to be a non-functional quality attribute to be satisficed rather than fulfilled. A second useful feature of the practice lens, in the context of our present work, is that it links with cyber-archaeology (Harrison, 2009; Jones, 2003) in so far as it fosters commitment to analyzing virtual communities through their cultural artifacts. These artifacts can provide an integrative framework for community life, whether virtual or real (Jones, 1997; Jones & Rafaeli, 2000).
Archaeologists consider as cultural artifacts any kind of material remains of culture, aiming ‘not so much to reconstruct what once was, but to make sense
of the past from a viewpoint of today’ (Fahlander & Oestigaard, 2004: 44). Such cultural remains are seen as by-products of culture, but not culture itself. Analysis of artifacts in situ and in relation to other artifacts evokes particular understandings of the culture within which they exist. In virtual settings, such material remains are bits of program code and data, which can only be made sense of using dedicated software. Thus, the cultural artifacts of a virtual community are inextricably linked with the software toolkits through which such artifacts are instantiated in practice and become negotiated, constructed and reconstructed by community members.
UNDERSTANDING VIRTUAL COMMUNITIES THROUGH A PRACTICE LENS
In order to assess what conditions enable virtual communities to be established and to flourish, we propose to adopt a perspective rooted in the practice lens, focusing on two units of analysis, namely constituting structures and cultural artifacts. Structure is what is embodied in a community in the form of social connectivity enabling the co-engagement of members in designated domains. Cultural artifacts, on the other hand, reveal the existence of the community and unfold its purpose and practice. At first glance, such a conceptualization of community is neutral to the practice a community is about. In other words, our definition would be equally valid for online communities of interest (e.g., music, games, entertainment), virtual communities of professional practice (e.g., software engineering, commerce, government), knowledge communities and value-added networks (e.g., intra- or inter-organization new product development). After all, they are all about recurrent interactions of the members resulting from a shared history of co-existence. Nevertheless, intrinsic properties of each type of community are revealed only when
the relevant practice comes into play, necessitating an analysis of the principles and the tools governing co-engagement in a designated domain and of how such co-engagement evolves into a system of learnt interactive behaviors. For instance, prior to widespread computer-mediated communication, people generally found their social circles in close geographic proximity to where they lived (i.e., ‘door-to-door’ social networks). As travel became more common, people began widening the geographic spread of their social networks, giving rise to what Wellman terms ‘place-to-place’ networks (Wellman, 2001). Today, advanced communication technologies create momentum for yet another transition towards ‘space-to-space’ networks. It is worth noting that each shift in the paradigm of community management was realized by improving the associated practices. Thus, with travel, communications technologies and services, door-to-door interpersonal interaction was supplemented by the ability to write letters and, later, to telephone one another to keep strong social ties over larger distances. As computer networks matured and became commonplace, ‘digital’ literacy practices became necessary. Mobile phones, network-attachable devices and social software have yet again extended the practice vocabulary so as to enable ‘space-to-space’ interactions and networking in which the individuals’ location is of no concern. The latest wave of innovation, namely tangible interfaces, pervasive computing and ambient environments, implies yet another transition towards ‘network-to-network’ interactions, as these technologies foster a stronger intertwining between online (or virtual) and physical (or material) spaces. It is evident, therefore, that with each shift in the paradigm old practices are not simply transferred to the new social context. Rather, they become enriched, augmented by and inextricably linked to the new socio-technical environment.
In light of the above, it stands to argue that virtual communities are phenomena of cyberspace, which are not solely a new communications
strategy, but rather a complex of cultural practices that occupy multiple locations, both virtual and physical. Focusing only on what happens online or offline (as separate contexts) cannot fully detail the practice involved. Consequently, the detachment of online activities from offline praxis, which is commonly observed in the relevant literature (Gochenour, 2006), should be relaxed so as to alleviate methodological challenges, which arguably constrain the researcher and the interpretive capacity of the instruments used to assess findings. Phrased differently, examining cultural artifacts in a location-bound manner (i.e., a single site) raises the possibility that the research will inevitably concern sub-cultural practices of a much broader, less definite context or culture. On the other hand, attempting to streamline online and offline practice is challenging both in terms of method used and unit of analysis.
Cultural Artifacts
Cultural studies of technology have traditionally been more concerned with broader accounts of social context, focusing either on how such contexts shape technology or on how technologies are implicated in such contexts (Wise, 1997; Sterne, 2006). Nevertheless, they all point to the artifactual nature of technology, which in turn leads to frequently overlapping technological trajectories. The term ‘technological trajectory’ is used here to denote transitions characterized both by the technical qualities of a technology and by the assumed paradigm of use. Examples of such transitions are evidenced throughout the history of technology. For instance, the ‘read-write’ web signified a period anchored by device-dependent markup languages such as HTML and by web sites where the user was conceived of as ‘audience’ or passive consumer of information-based products. The key quality feature during this period was portability. Since 2004, the emergence of Web 2.0 has signified transitions in technology (from device-dependent to device-independent markup, from web sites to blogging,
from authoring to collaborative editing and from consuming information to social interaction and networking). In terms of paradigm of use, this new digital culture progressively establishes new primary beneficiaries. Specifically, we are now experiencing the ‘active’ audience, who creates content, and this new content serves as added value. Research studies have claimed that throughout this evolutionary process new practices do not follow inexorably from the material features of established technologies; instead, they are improvised on the basis of old practices that work differently in new contexts (Harrison & Barthel, 2009: 156). Figure 1 illustrates the case of the current digital culture of Web 2.0 and how it evolved (in terms of cultural artifacts and practices) from the earlier ‘read-write’ web. Thus, the early device-dependent markup authoring practices prevailing in the ‘read-write’ web evolved into posting in blogs, collaborative editing using wikis and tagging practices in social software. Correspondingly, online communities were transformed from ‘place’-based virtual gatherings into ‘space’-oriented knowledge communities and virtual social networks. These transitions exhibit a degree of intertwining between social media, practices and the prevailing conceptions of communities, allowing the incremental refinement of a set of activities in light of the emergent digital context. It is such intertwining that creates the grounds for a cyber-archaeology of practice with a dual focus: on the one hand, allowing us to understand past digital cultures (e.g., the ‘read-write’ web) through the analysis of the material remains of their prevalent technologies (e.g., device-dependent markup) and, on the other hand, to reconstruct cultural artifacts of practice (e.g., HTML web pages) in new contexts (e.g., the social semantic web with tagging options).
Constituting Structures: Affordances and Quality Attributes
To gain further insight into how cultural artifacts reveal practice across technological regimes, it is important to assess exactly which technological quality is embodied as constituting structure in each generation of technology and how it affects the material aspects of technology. In this vein, a useful concept is the notion of affordances. Gibson (1979) developed the idea of affordances to explain how people and other animals orient to the objects in their world in terms of the possibilities the objects afford for action. An affordance perspective recognizes how the materiality of an object favors, shapes,
Figure 1. Digital cultures, technological trajectories and cultural artifacts
or invites, and at the same time constrains, a set of specific uses (Zammuto, et al., 2007). Thus, it can be argued that the capacity to ‘understand’ and ‘reconstruct’ practice through cultural artifacts is enabled or constrained by the presence or absence of novel affordances that determine information-processing properties of the artifacts, such as social connectivity, abstraction, social translucence, specification-based languages and interoperability. The way in which these affordances are inscribed in technology shapes not only the material aspects of the technology but also what is enacted through technology use. Consequently, they constitute some form of structure, which, however, can no longer be related only to the functional components inscribed in technology, that is, to the structures in data, processes and protocols used to implement designated capabilities. Instead, affordances are conceived of as non-functional qualities or intangible constraints defining boundary conditions of the process through which the technology is built and whose presence or absence determines the technology’s use in practice. Consider, for example, the Google search engine. Its constituting structures include handling of directories, databases and indexes, page ranking algorithms, etc. Nevertheless, these functional aspects alone do not convey the Google engine’s material capacity unless conceived in the context of non-functional qualities such as connectivity to make use of multiple servers and architectural abstraction, which in turn reflect choices made about its design, construction, and operation. Another example is Lotus Notes. Orlikowski (2000, p.
414) underlines that although Lotus Notes’ constituting structures include public key cryptography, distributed database management, communication via email, etc., again it is non-functional qualities such as client-server architecture, graphic user interface, flexibility / tailorability, security, etc., that determine the enactment of a variety of technologies-in-practice. It can be concluded therefore, that although functionality embodied in technological artifacts is clearly important,
it is not likely to be the constituting structure facilitating what Orlikowski terms a ‘variety of technologies-in-practice’ when referring to the emergent social practices of a technology. Instead, it is the non-functional inscriptions in technology which allow or constrain these emergent social practices. It is possible to seek justification for this argument by briefly reviewing the qualities which constituted the driving forces for technological change in recent years. Figure 2 summarizes major advances in media and technologies over the past 15 years, with the intention of unfolding the primary non-functional quality driving their evolution. As shown, the ongoing paradigm shift from Web 2.0 towards the vision of the Semantic Web is shaped by qualities such as abstraction, interoperability, social connectivity and ultimately plasticity. For our current purpose, it is not so important to assess how these qualities are to be embodied in a particular technological artifact. What is important is the extent to which they are inscribed in the technology so as to allow the emergence of new practices and technologies-in-use, as well as the extent to which they are traceable or revealed through the cultural artifacts. Consequently, by focusing on such non-functional inscriptions, we may gain insight into the technology constituting structures, leading to projections about the emergent social practices that may prevail in actual use. In the context of our present work, the above motivates an approach to analyzing virtual communities as emergent structures which become viable through the members’ recurrent co-engagement in a designated practice. Practice, in turn, is revealed through cultural artifacts and activities on objects whose affordances (i.e., inscribed non-functional qualities) designate their use. In this view, possibilities of action are not given, but depend on the presence or absence of affordances as well as the intent of the actors enacting them.
Figure 2. Critical technology trajectories and constituting quality structures
RESEARCH AIMS, METHODOLOGICAL IMPLICATIONS AND APPROACH
In our recent work, we have attempted to explore the link between virtual community and practice, bringing to the surface the challenging issue of the containment structure between community and practice. Gherardi (2009, p. 121) makes this challenge explicit, stating ‘… is it community that constitutes the container of knowledge … since communities pre-exist their activities or is it the activities themselves that generate a community …. as they form the ‘glue’ which holds together a configuration of people, artifacts and social relations’. In our current effort, instead of emphasizing arguments for and against ‘place’- versus ‘space’-based or ‘network’ communities, or whether practice is ‘routine work’ versus an epistemology of work, tacit or tangible, codifiable, transmittable and reproducible, we have taken the risk of attempting to understand one concept through the other. Specifically, our aim is two-fold. On the one hand, we seek to understand how practice shapes virtual communities, that is, what
are the technology constituting structures that enable communities to be established in virtual settings. On the other hand, we are equally keen to understand how practice becomes implicated in sociotechnical community contexts, thereby becoming expanded and enriched. To shed light on the above, we will elaborate on our recent work, making use of a two-case research design. Our motivating rationale is not to unlock and reveal local practices, i.e., how colleagues sharing the same material work environment perform their activities. Instead, our specific interests are to (a) trace recurrent events taking place in virtual space that trigger the enactment of local practice and (b) identify how the results of such work are fed back to the shared online setting.
Cases Description
Two case studies are presented – the collaborative engagement in music notation exercises and the assembly of information-based products such as vacation packages for tourists – drawing upon radically different practice domains and work contexts. Both were conducted in the context
of collaborative research and development efforts targeted at gaining deeper insights into the operation of virtual communities of practice. Moreover, each case was developed in a separate and independent collaborative research project, the two running in parallel (2006 – 2009) and exploiting common methodological ground. The music notation exercises case was selected to provide insights into the online reconstruction of widely accepted practices based on established music notation constructs. The vacation package assembly case, on the other hand, was chosen as it provides a means for cross-organizational collaboration (comprising materially different work practices) for new product development. In terms of technical developments, both cases make use of dedicated practice toolkits to facilitate the members’ collaborative co-engagement online. These are software components designed specifically to enable members of the respective communities to engage in the designated practices. Their design presents various challenges, but these are explored and elaborated elsewhere (Akoumianakis, 2009a). What motivated a cross-case analysis was the difference in the community settings, the shared practices in each case and the local settings of the agents involved across the two cases. These are briefly summarized in Table 1. On the other hand, both cases share common ground in terms of the community types, policies and services supported. Specifically, each case study reveals two different types of community. The first type reflects a kind of members’ neighboring on the grounds of shared interest. For the music lesson, neighboring is established by choice of musical instrument (e.g., piano master class). In vacation package assembly, neighboring is based on partners’ service offerings (e.g., accommodation, travel, food and beverage). Neighboring communities are formed by registration, using dedicated electronic registration systems, and irrespective of practice domain, neighbors enjoy a variety of common services, including asynchronous communications (SMS, GroupSMS and threaded discussions), information sharing and interoperability between the community and the respective practice environments. The second type of community amounts to neighbors engaging in the shared practice through the practice toolkits. This type of community of practice is formed subject to different conditions. For music lessons, the condition is an invitation by the moderator. For vacation package assembly, the condition is the member’s commitment of resources to a collective cause (i.e., a vacation package). In both cases, engagement in practice entails asynchronous interactions between the members as well as synchronous collaboration, with the object of collaboration being domain-specific and replicated across sites. Such functions are supported by the practice toolkits, which are downloadable software platforms available upon successful registration.
Table 1. Cross-case conditions and criteria
| | Community setting | Shared practice | Local settings |
| --- | --- | --- | --- |
| Music notation lessons | Moderated squads comprising one tutor and a music learning community of peers | Interpreting music scores (boundary artifact) and performing accordingly | Identical and differentiated only by the performers’ choice of musical instrument |
| Cross-organization product development | Virtual alliance of business partners offering competing or complementary services | Negotiating details of aggregate product lines (vacation packages) by assembling services | Differentiated by organizational boundaries and locally instituted work practices |
Music Notation Lessons
Music in general is widely recognized as a medium for community making, as it fosters a sense of belonging and identity formation (Mavra & McNeil, 2007). This is also evidenced in music learning situations (Ferm, 2008). Our experimental scenario involves a multi-site engagement in a piano lesson with one moderator and several participants. To conduct a music notation lesson, moderators (or music theory tutors) prepare shared music materials (e.g., music notation, recordings, videos), schedule and organize the music lesson and invite participants. Participants, on the other hand, access shared contents at their own pace, while during the lesson they negotiate the act of interpreting shared music materials against their personal technical virtuosity. This is manifested as an individual music performance of a designated music score. Additionally, the virtual group may engage in collaborative rehearsals of a music piece, with participants co-engaging in collaborative music performance. Synchronization of participants and latency issues are technical challenges in their own right, which have been explored elsewhere (Alexandraki & Valsamakis, 2009; Alexandraki & Akoumianakis, 2010).
Figure 3. Music room management by moderators
There are two prerequisites for taking part in a music notation lesson. The first is the user’s acceptance of the moderator’s invitation, which is followed up by registration to a ‘room’ containing the shared material of the lesson. Such ‘rooms’ are implemented as dedicated portlets (see Figure 3) using the Liferay content management system, which serves as the community support medium. Registration is a two-stage process in which participants first become members of the community (by building their music profile) and then register to ‘rooms’. The second prerequisite entails downloading the dedicated practice-specific software suite, which allows members to engage synchronously in the micro-negotiations of a specific music lesson. An example of this client toolkit is depicted
Figure 4. The distributed music notation lesson toolkit
in Figure 4. As shown, the toolkit implements a dedicated room for each participant with online material related to the music notation lesson. Specifically, each room contains the shared and replicated representation of music (i.e., the score), relevant material from the corresponding Liferay room (bottom right hand side dialog), music performance controls for transmitting / receiving signals, controlling a remote peer’s music performance and synchronizing with peers during a rehearsal (bottom panel of buttons). Other facilities such as the web camera and online chat allow users to maintain visual contact with remote sites and to exchange informal messages. Since the music score offers a special type of language for music, its interactive version provides for editing, annotating and modifying parts of the music score by the holder of the floor. Example manipulations and practices are presented in Figure 5. The functionality of the synchronous session manager and the floor control is to manage access to the shared object and broadcast changes to all registered participants.
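The floor-control behavior described above can be illustrated with a minimal sketch. It is not the toolkit's actual implementation, which the chapter does not detail; the class name `SharedScoreSession`, its methods and the callback signature are all hypothetical, and the sketch assumes a single in-process session in which only the current floor holder may change the shared score and every registered participant is notified of each accepted change.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical callback type: receives (editor name, description of the change).
Listener = Callable[[str, str], None]


@dataclass
class SharedScoreSession:
    """Illustrative session: one floor holder edits, all participants are notified."""

    listeners: Dict[str, Listener] = field(default_factory=dict)
    floor_holder: Optional[str] = None
    history: List[str] = field(default_factory=list)

    def register(self, name: str, listener: Listener) -> None:
        # Registered participants receive every accepted change.
        self.listeners[name] = listener

    def request_floor(self, name: str) -> bool:
        # Only registered participants may hold the floor, one at a time.
        if name not in self.listeners:
            return False
        if self.floor_holder is None:
            self.floor_holder = name
        return self.floor_holder == name

    def release_floor(self, name: str) -> None:
        # Only the current holder can give the floor back.
        if self.floor_holder == name:
            self.floor_holder = None

    def apply_change(self, name: str, change: str) -> bool:
        # Reject edits from anyone who does not hold the floor.
        if name != self.floor_holder:
            return False
        self.history.append(change)
        # Broadcast the change to all registered participants, including the editor.
        for listener in self.listeners.values():
            listener(name, change)
        return True
```

In such a sketch, a tutor who has requested the floor could annotate a bar of the score and every registered peer would receive the annotation, while an edit attempted by a participant without the floor would simply be rejected. A distributed toolkit would additionally have to deal with network transport and replication across sites, which are left out here.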
Vacation Package Assembly
This case was conceived and designed to assess the conduct of distributed collective assembly of vacation packages in a virtual community context. This time the community takes the form of a cross-organization virtual alliance whose members join their efforts to satisfy the requirements of vacation packages as formulated by prospective user communities. Before presenting the details of the site studied, it is perhaps appropriate to briefly discuss what exactly is referred to as vacation package assembly. The package tour has been ‘the ultimate, mass-marketed’ tourism product since the 1950s (Osti, Turner & King, 2009). However, the increased capacity of travelers to gather and process information about destinations, transportation and accommodation and the increasing popularity of tourist blogs (Schmallegger & Carson, 2008) have challenged traditional package tours, which have for several years now experienced a decrease in sales. Professionals investing in such products are increasingly responding to the challenge by adopting
Figure 5. Examples of music score manipulation practices by the floor holder
alternative vacation package marketing models (Walsh & Gwinner, 2009). One such prominent trend, which is turning the tourist industry from a leading candidate for B2B electronic commerce to an information industry, is devising flexible ‘package family engines’, which allow travelers to compile online alternative package instances to suit their own requirements (Ritzer & Liska, 1997; Akoumianakis, 2010). The site studied in this case study is an operational pilot of a virtual space on regional tourism, which facilitates cross-organization alliances to engage in collaborative assembly of in-vacation packages. These are information-based products which act either as supplements to vacation services acquired through conventional means or as catalysts for selecting a destination site. An alliance is formed as a moderated virtual community of practice with members offering dedicated services to fulfill the demands of a vacation package. For the virtual alliance, the vacation package remains an aggregate product offering until the moment that prospective customers make choices of specific services, thus selecting specific business partners. Vacation packages once tailored to user
needs provide a source of input to the alliance, as customers may engage in a variety of a posteriori reflections upon the actual experience. Figure 6 and Figure 7 provide illustrative views of vacation packages in the customers’ and the corresponding alliance’s social worlds, respectively. In the customers’ social world, vacation package options are ultimately selected by customers to suit their own requirements and cultural preferences. This can be done by assessing various parameters such as price, relative physical location of a service provider, etc., to derive the most preferable portfolio of services. On the other hand, while working on a vacation package, the virtual alliance exploits different visual tools to engage in asynchronous and synchronous micro-negotiations related to all aspects of a vacation package, including how it is presented, details of services, candidate providers, global policies (e.g., discounts), etc. The vacation package assembly line undertakes to process these micro-negotiations and to transform them into a vacation offering meaningful to users. Another key quality of the assembly line is to support the designated ‘plasticity’ of vacation packages. This entails automatic transformations
Figure 6. Vacation packages assembled and published
of work-in-progress arrangements (see Figure 7) to objects meaningful to the end users (see Figure 6). In other words, the assembly line serves as a
medium for transcending internal and external boundaries. Internal (to the virtual alliance) boundaries are between electronic neighborhoods
Figure 7. Vacation package in negotiation
of service types such as accommodation, transportation, food & beverage, etc. External boundaries indicate the customer communities that may be formed around a particular vacation package.
Data Collection Methods
Each case was analyzed independently using several virtual ethnographic studies of operating music squads and virtual tourism alliances followed up by interviews and workshops. During virtual ethnographies, the researcher was involved as an active participant / observer of these virtual groups, undertaking the moderating role in each case. Following each ethnographic study, participants were interviewed and took part in a workshop. The interviewing strategy, which is summarized in Table 2, was devised to reveal participant conceptions on both community and practice management.
Table 3 provides insights into the questions raised in each interview for each of the two principal constituents. In deriving such questions, we first elaborated the most prominent research issues for community and practice management respectively and then reduced the transcripts by grouping them into more general categories based on their similarities. Thus, for community management we wanted to assess the extent to which communities, once formed to address a specific practice agenda (i.e., a music notation lesson or a vacation package), sustain their momentum and persist across different practice agendas. Phrased differently, we sought to consider sense of community subject to the degree to which members continue their co-engagement across different music lessons and vacation packages. Such recurrent behavior, if observed across virtual ethnographic studies, would not only indicate strength in social ties, but could also explain sense of
Table 2. Interviewing strategy for data collection
| | Music lessons | Vacation package assembly |
| --- | --- | --- |
| Soliciting participants | Virtual ethnographic study of piano lesson | Virtual ethnographic study of ‘Pelloponissos Round Trip’ vacation package |
| Participant affiliations | Music tutors, music performers, conductors | Tour operators, travel agencies and service providers |
Appropriated versus enacted structures & implications
Social awareness of what peers are doing
Motive for recurrent participation (in other communities)
Online practice reification and materiality
Community (online) practice vs. members’ offline activities
Alignment between online and offline practice
community through elements of the practice in which the members become engaged. For practice management our interest was to devise a suitable unit for ‘framing’ elements of practice in virtual settings. To this end, we hypothesized that practice is revealed either through social interaction between the members or the knowledge encoded in processes, tools and artifacts. These conditions are translated to the questions in Table 3.
ETHNOGRAPHIC ANALYSIS OF ELECTRONIC SQUADS & LESSONS LEARNT
In this section we summarize key findings from our ethnographic studies of electronic squads and the follow-up workshops with the participants. Our intention is not to present details of our evaluations, as these have been discussed elsewhere (Akoumianakis et al., 2008; Akoumianakis, 2010). Instead, our aim is to reflect upon our experiences following a practice lens, focusing on what is revealed by analyzing the cultural artifacts and the technology constituting structures of the respective virtual communities. In this endeavor, we seek to unfold useful insights into whether it is community that constitutes the container of practice or practice that generates a community.
Cultural Artifacts
Both case studies confirmed that communities emerge and sustain their function through the members’ recurrent interactions with cultural artifacts, which reflect the practice in which community members become engaged. Nevertheless, the type and role of these cultural artifacts were somewhat different in each case.
Cultural Artifacts as Linguistic Vocabularies for Recurrent Engagement
One key finding resulting from the responses to the questions ‘Means for understanding the shared practice’ and ‘Making sense of what is expected’ in Table 3 was that users in both cases conceived the community as a common information space, which is made sense of through designated cultural artifacts that predominate in the members’ collaborative practices. Participation and engagement in practice are only possible through (and synonymous with) recurrent interactions with these cultural artifacts. Phrased differently, cultural artifacts form the linguistic vocabularies of practice, and it is through the members’ co-engagement in such linguistic vocabularies that knowledge emerges either as ‘knowing in practice’ or as communication acts.
In the music lesson scenario, participants identified two cultural artifacts arising during negotiations, namely the representation of music and the outcome of the distributed collective practice of the group. The music score, as a means for representing music, qualifies as a cultural artifact for several reasons. Firstly, it is a human-made object, which gives information about the culture of its creator. Secondly, it forms a type of language for music (Copland, 2002; Patel, 2007), with widely shared meaning, allowing not only description but also re-creation of a musical piece. Thirdly, such meaning resides within the score and takes the form of melody, harmony, the interrelationship of melody and harmony, the overall form or structure of the piece and, in vocal music, some sort of relationship of the lyrics to these other elements (Brackett, 1999, p. 126). Another type of cultural artifact is the outcome of the group’s distributed music performance or the recorded rehearsal. An mp3 rehearsal is a crystallized set of social and material relations.
It embodies both technical and material elements of practice in a single artifact that ‘works for’ and is ‘worked on’ by a host of people, ideologies, technologies and other social and material elements. Sterne (2006), in his analysis of the mp3 music format, develops this line of argumentation rather convincingly, revealing how mp3 embodies social, physical, psychological and ideological phenomena of which we might not otherwise have been fully aware.
In a similar vein, vacation packages stand out readily as cultural artifacts. In their recent analysis of travelers’ information search, Osti and colleagues (2009) identify several cultural characteristics as determinants of the information search process. These include ‘… type of vacation undertaken, origin and cultural background of the tourist, involvement with travelling, perceived risks of taking the holiday or undertaking a particular activity, knowledge about the destination, time available for selecting and planning the holiday or a determined activity, cost of the holiday or activity, difference between the possible holidays or activity to choose from, and involvement with the destination or a certain activity, as for example a particular sport or cultural activity’ (p. 65). Our analysis of the requests posed by customers for clarifying vacation package alternatives, as well as the interviews with participants, confirms previous research and reveals some of the factors that qualify vacation packages as cultural artifacts. Firstly, choice of vacation destination appears to be strongly influenced by the way prospective tourists perceive their hosts and their culture. It seems that previously established ideas and stereotypes, together with prevalent views on economic stability, safety and accessibility, sometimes act as blinkers that determine tourists’ choice of destination (Costa & Ferrone, 1995). Secondly, choice of vacations frequently reveals cultural identity in so far as it entails conscious decisions on specific types of packages (such as urban, green tourism, cruising vacation, short-break vacation, etc.), which may be further qualified by individualized travel plans.
Thirdly, culture was also revealed as governing the type of service requested and consumed irrespective of destination decisions. It is therefore important for information-based vacation packages, such as those considered in our study, to exhibit the flexibility and plasticity required to allow customers to request personalized arrangements and individualized options. Being able to support such micro-negotiations between customers and the virtual alliance, as well as between members of the alliance, has been the primary motive for both repeated customer requests for different vacation packages and recurrent participation of members in virtual alliances across different vacation package agendas.
It therefore stands to reason that communities as emergent structures are best revealed by analyzing the practice in which they become engaged. Such analysis should focus on two prominent aspects. Firstly, it should aim to reveal practice through the cultural artifacts articulated by members in the course of their co-engagement online, as well as through the collective outcomes of this co-engagement. Secondly, analysis should seek to unfold how cultural artifacts become shared language or linguistic vocabularies leading to acts of communication. Such an analytical strand necessitates a detailed insight into the practice toolkit in order to extract the boundary artifacts through which shared meaning is constructed, negotiated and reconstructed. It is these boundary artifacts that need to be designed so as to form the practice vocabulary of the virtual community. Our case studies addressed two possible types of boundary artifacts, namely artifacts with an established history and widely accepted meaning in offline practices (e.g., music notation scores) and artifacts whose metaphoric manifestation in virtual space affords collaborative sense making. Both types have been designed so as to exhibit ‘plasticity’ in order to enable their transformations across different social worlds.
Cultural Artifacts for Making Sense of Boundary-Spanning Domains
Responding to the question on ‘Social awareness of what peers are doing’ (see Table 3), participants noticed the affordances designed into the designated cultural artifacts (i.e., the shared music score and the vacation package activity panel) and how these affordances foster and facilitate shared meaning and a collective sense of the common information space. In both cases, participants indicated that they could immediately identify themselves with these artifacts while, at the same time, obtaining and maintaining the impression that they are not alone. It was also noted that such meaning does not result from the communication facilities available, but resides solely with the cultural artifacts, which serve the purpose of structuring unknown contexts and/or actions and assigning meaning to them. Of course, this should not undermine the role of organizational conditions, which are known to influence sense making (Weick, 2001). Nevertheless, this is where the boundary role of cultural artifacts is most beneficial, as it allows them, through their cultural binding, not only to foster sense making of the shared information space, but also to translate what happens online to ‘local’ offline practice across different organizational conditions. Our case study data confirmed this conclusion, as it revealed that in both cases meaning resides in the cultural artifacts, either as codified knowledge or as a socially constructed collective achievement. In the music notation lesson, meaning as codified knowledge resides with the music score, which uses a standard vocabulary for this purpose. The socially constructed collective achievement is the rehearsed music performance of all participants. On the other hand, vacation package templates use a non-standardized vocabulary to reflect work-in-progress and process-oriented information.
Work-in-progress takes the form of knowledge about what services are required, how they are interrelated and who is contributing. Process-oriented information relates to the transformation of the templates into a concrete information resource in the users’ social context, which can be further negotiated to suit individual preferences. The social aspects of the collective achievement are made explicit not only during the development of the vacation package (see Figure 7), but also once the package is crystallized and published through its dedicated forum in the portal.
Another useful finding that stands out clearly from our interview data relates to how members of virtual communities cope with strangers. Previous work (Raybourn et al., 2003) suggests that cultural artifacts may be used to facilitate collaborative engagement with strangers. We found this to be true across the two cases. Specifically, in the music notation lesson no participant expressed negative feelings about collaborating with strangers to rehearse a musical piece. Moreover, in vacation package assembly, co-engagement in vacation packages did not appear to be hindered by the fact that partners are different organizations with non-uniform values and material practice environments. Nevertheless, in both cases this was implicitly related to the boundary-spanning nature of the task at hand. In other words, most participants responded that they ‘… don’t mind collaborating with strangers as long as their practice does not interfere with mine’. This leads to the conclusion that in boundary-spanning settings virtual groups are concerned less with the identification of specific attitudes or the individual members’ work practices than with the process of organizing (and making sense of) the distributed work itself. This abstractness should be reflected in the design of the boundary artifacts used to codify the shared practice.
In our reference case studies, the music score (in the music notation lesson) as well as the elastic buttons and tailorable activity panels (in vacation package assembly) were designed precisely to facilitate such abstractness. Specifically, the library used to develop the music score was augmented with interaction techniques depicting the universal
code of music score practicing. Similarly, elastic buttons were designed so as to inherit the command metaphor of conventional GUI buttons, but also to afford additional semantics through features such as color codification and size manipulation. Notably, in both cases, such functionalities are embedded in the user interface toolkits used to develop the respective applications. Thus, they were encapsulated through abstractions intended to facilitate a degree of plasticity of the material practice of the technology. In turn, it is this plasticity which allows the respective artifacts and their surrounding practices to maintain a boundary role (being easily recognizable across communities, while raising different implications for members of each community).
Cultural Artifacts as Reifications of Practice
Addressing the question of ‘How online practice is reified and obtains material properties’, the respondents underlined the catalytic role of cultural artifacts in (a) anchoring practice and making intermediate and final results of this practice visible to all actors involved and (b) exposing the rationale and the arguments behind the distributed collective practice, thus revealing what was at stake, what worked, and what turned out to be insufficient or inadequate. In the music notation lesson, the prominent anchor of practice turns out to be the rehearsed music performance which, when combined with earlier rehearsed versions and the shared music score, binds together elements of a cultural past (i.e., the composer of the musical piece) with elements of a cultural present (i.e., dynamics of the music learning community). In the case of vacation package assembly, elastic buttons and tailorable activity panels served to anchor the virtual practice of vacation package assembly and to make its intermediate and final results tangible by enabling implicit mapping of abstract reality to concrete localities of individual partners and prospective
customers. In so doing, tailorable activity panels were found useful not only for summarizing vacation packages but also for clarifying boundaries (i.e., neighborhoods involved) and reinforcing the identity of members of the supporting communities of practice. Moreover, decoding the tailoring requests received by a vacation package reveals patterns in the end user communities which may provide useful insight to end user preferences, purchasing behavior and consumption patterns. Consequently, vacation packages as reifications of the virtual assembly practice are significant for learning because they communicate to participants what is important, what appears to be popular amongst user communities, what seems to work well and what turns out to be problematic. A final comment relates to how online practice obtains material properties. This is quite distinct from the intertwining of online and offline practice (which is briefly discussed below) as it relates to the materiality of the outcome of the distributed collective practice and the cultural remains of the respective virtual communities. In both cases, the participants concluded that the remains of the electronic squads comprise on the one hand ‘packaged’ information-based artifacts codified in popular formats (mp3s for music rehearsals or XML code for vacation packages), and on the other hand, the dynamics of collaboration giving rise to these artifacts. Notably, the former type of virtual ‘tells’ in the language of Jones and Rafaeli (2000) can be easily reconstructed using tools of the current technological paradigm (i.e., mp3 players and browsers), while the latter type remains bytes of code until such time that it is reconstructed using only the practice toolkit through which it was initially constructed.
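The remark above that decoding tailoring requests reveals patterns in end user preferences can be illustrated with a simple tally. What follows is only a minimal sketch: the request records, the customer identifiers, the service names and the helper `tally_by_service` are all hypothetical and are not taken from the study or from the vacation package toolkit.

```python
from collections import Counter

# Hypothetical tailoring requests: (customer id, service the customer asked to change).
# These records are invented for illustration; they are not data from the study.
requests = [
    ("c1", "accommodation"),
    ("c2", "transport"),
    ("c3", "accommodation"),
    ("c1", "excursion"),
    ("c4", "accommodation"),
]


def tally_by_service(records):
    """Count how many tailoring requests target each service."""
    return Counter(service for _, service in records)


counts = tally_by_service(requests)
most_requested, n = counts.most_common(1)[0]
print(most_requested, n)  # accommodation 3
```

A tally of this kind only indicates which services attract the most tailoring; interpreting why remains a matter for the interviews and ethnographic observation.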
Technology Constituting Structures
We now turn to the second issue of interest to the present work, namely the constituting structures inscribed in technology and the extent to which these structures enable reproduction of existing and well-established practices or emergence of new practices. This turned out to be a more difficult and challenging topic, as many participants could not associate technological artifacts with their quality attributes. Moreover, many quality attributes were perceived as ‘jargon’ requiring substantial explanation and elaboration.
Technology Inscribed Structures
Both cases analyzed in the present study revealed community formation at two distinct and separate levels. One level is that of building and maintaining electronic neighborhoods (i.e., community as neighboring), while the other is that of facilitating practice-oriented co-engagement (i.e., community as practicing). Participants also noticed that each type of community is fostered through different technological artifacts. Community as neighboring is facilitated through the Liferay portal and content management system. Community as practicing is enabled by the designated practice-oriented toolkits. In terms of technology constituting structures, community as neighboring reuses open source Liferay functionality to develop very similar structures across the two cases. These were inscribed in the augmented Liferay versions as electronic registration components, new content development policies, access rights to informational assets (including the practice toolkits) and asynchronous communication services. Community as practicing is facilitated by interoperable software components constituting the dedicated practice toolkits. The specific structures inscribed in these toolkits, although different in function across the two cases, followed established software engineering quality attributes (Akoumianakis, 2009b). For the music notation lesson toolkit, an open source third party library, namely JMusic, was augmented to facilitate interactive exploration of the shared music score. Additionally, a custom API was
developed to allow interoperability between the Swing user interface (see Figure 4) and the C++ libraries of the mixing server to enable control of the parameters of local and remote signals. On the other hand, vacation package assembly utilized augmentation and expansion of the Java Swing toolkit to create custom controls (such as the ‘elastic buttons’ representing services) and components (such as the ‘tailorable activity panels’ representing aggregate product offerings). The synchronous collaboration-oriented structures (i.e., session management, floor control, replication and object synchronization) inscribed to facilitate social connectivity and awareness, were very similar across the two cases, and were built using an abstraction mechanism designed for this purpose. Collectively, these structures provided the means for building interactive embodiments of the linguistic domains of music notation lessons and vacation package assembly, respectively.
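Of the synchronous collaboration structures listed above, floor control is perhaps the easiest to illustrate. The following is a minimal sketch, assuming a simple first-come, first-served policy in which one participant at a time holds the shared ‘floor’. The class `FloorControl`, its method names and the participant labels are hypothetical; the chapter does not describe how floor control was actually implemented in either toolkit.

```python
from collections import deque


class FloorControl:
    """Grants a single shared 'floor' to one session participant at a time."""

    def __init__(self):
        self.holder = None      # participant currently holding the floor
        self.waiting = deque()  # participants queued in request order

    def request(self, participant):
        """Give the floor if it is free; otherwise queue the request."""
        if self.holder is None:
            self.holder = participant
            return True
        if participant != self.holder and participant not in self.waiting:
            self.waiting.append(participant)
        return False

    def release(self, participant):
        """Release the floor and pass it to the next queued participant."""
        if participant != self.holder:
            raise ValueError(f"{participant} does not hold the floor")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


floor = FloorControl()
floor.request("tutor")       # the tutor takes the free floor
floor.request("performer")   # the performer is queued behind the tutor
next_holder = floor.release("tutor")
print(next_holder)  # performer
```

Other policies (moderator-assigned floor, time-limited turns) would fit the same interface; the queue is used here only because it is the simplest to reason about.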
Appropriation and Enactment of Structures Inscribed In Technology
Orlikowski’s work (2000) identifies appropriation and enactment as two distinct mechanisms, both integral to the practice lens. She draws the distinction by recalling that ‘… existing structurational models of technology examine what people do with technologies in use, positing such use as an appropriation of the ‘structures’ inscribed in the technologies’ (p. 407). Enactment is then introduced to emphasize a focus on emergent rather than embodied structures. Our content-based analysis of the respondents’ remarks on questions such as ‘Appropriated versus enacted structures and implications’ and ‘Motive for recurrent participation (in other communities)’ provides evidence to support a slightly different interpretation of enactment. Specifically, we observed that all community supporting functions embodied in the Liferay portal through design were conceived by users as appropriating structures inscribed in technology to serve solely ‘community as neighboring’ practices such as formation, information sharing and communicating. Accordingly, the electronic neighborhood registration system, the custom information / content templates and the dedicated communication portlets were understood as components enabling these practices. Although the choice of embodied structures being appropriated at times differed across users (i.e., some preferred email, others SMS and only moderators employed GroupSMS for asynchronous communication), there was consensus both on the scope of these structures and on the exact steps for using them. Moreover, there was no confusion among members resulting either from multiple identities, which were not inscribed, or from the designated roles in the communities formed.
Enactment was largely conceived as a mechanism associated with the practice toolkits. Specifically, two types of enactment were noted. The first relates to becoming co-partners in the designated practice of music notation lessons or vacation package assembly. The second is a form of enactment grounded in what emerges as a result of the members’ co-engagement in practice. Enactment as co-engagement was revealed by the fact that the practice toolkits are the only means enabling emergent structures of the type ‘communities as practicing’. Enactment as collective praxis is revealed by the ‘packaged’ outcome of cultivated virtual communities of practice. The emergent property of enactment of inscribed structures is evident in the cultural artifacts produced by the respective communities, which cannot be foreseen in advance. For instance, the rehearsal of a music lesson and its codification as an mp3 file is an emergent rather than pre-determined outcome of the micro-negotiations of the virtual team. Similarly, a vacation package emerges as a collective offering, but there is no guarantee that certain contributors will actually be chosen by prospective customers.
In fact, it may turn out that accommodation providers of a certain type or in a certain location, although registered and eligible participants, never get selected for various reasons. Our two case studies were also informative of some peripheral implications of enacting structures inscribed in technology. For instance, participants in one case reported sharing their rehearsed music lesson through a different social networking medium and using this medium to establish and sustain a virtual community. Notably, it was also reported that this had a positive implication, in the form of high interest expressed in the DIAMOUSES technology and tools. The vacation package case also confirmed similar emergent structures being formed and sustained outside the virtual space of eΚοΝΕΣ. This time, the driving force giving rise to these structures is the choice of specific vacation services in a package, which reveals ‘hidden’ communities in the customer base and the formation of ‘cliques’ between business partners. It turns out that both these emergent structures lead to revisions in subsequent versions of a package.
Intertwining Of Online and Offline (Material) Practices
A final useful conclusion provided by our comparative study relates to how (i.e., through which artifacts) online practice is implicated in offline activities and vice versa. The findings re-established the critical role of cultural artifacts as boundary objects and confirmed that offline practice is strongly intertwined with, but not fully determined by, what happens online. This conclusion shares common ground with recent works on online and offline practice by organization and management scientists (Vaast 2007). In the music lesson case study, online practice and offline (local) performances are intertwined through the shared music score and computer-mediated social interaction such as verbal cues expressed either orally or textually (i.e., go back and forth, focus on specific phrases etc.). Nevertheless, the shared music score is not sensitive to contextual information characterizing local offline performance at remote sites. Consequently, enactment of designated actions locally may lead to variable performance, and this variation may be important. For instance, we observed that correct musical performance is still possible using suboptimal local (material) practices (i.e., erroneous or ineffective placement of fingers on the musical instrument). One possible improvement in this direction, revealed through evaluation, is that score following would provide a desirable contextual enhancement for making sense of local practices.
In a similar vein, assembling vacation packages online influences but does not determine the material practices of the members. This was evidenced by observing the behavior of members of the virtual alliance and the behavior of prospective customers. Interestingly, there were deviations between what partners promised online and how such promises were decoded and translated to situated (local) practice. For instance, we identified cases where the promised service quality could not actually be delivered, without this deviation being made known to the respective electronic squad well in advance.
DISCUSSION, SUMMARY AND CONCLUDING REMARKS
This chapter set out to address two key questions, namely ‘what technology constituting structures enable communities to be established in virtual settings’ and ‘how practice becomes implicated in sociotechnical community contexts, thereby becoming expanded and enriched’. We have reached the conclusion that a virtual community is constituted by structures inscribed in technology, while the community’s existence is manifested through cultural artifacts which reveal both community traces and elements of the practice in which members become engaged. In terms of structure inscribed in technology, we have claimed that functional perspectives do not suffice, as they go only as far as describing what is to be designed.
Quality attributes offer a means for gaining insight into emergent features not explicitly designed into the technology. Regarding the community’s manifestation, the focus has been on cultural artifacts for two primary reasons. First, they reveal elements of the practice the community is about. Second, they provide traces of the community’s life, thus allowing us to understand a past reality as well as to reconstruct this reality. In light of the above, this chapter sketched an approach which seeks to understand ‘community’ through the ‘practice’ the community is about. This approach is rooted in a cross-case analysis of two recent efforts. By separating practice from community management, we developed a practice lens to understand how technologically mediated cultural artifacts shape and become implicated in practice, as well as how their ‘virtual traces’ can be used to tell stories about the respective communities. From the results available through virtual ethnographic studies of operating electronic squads (Akoumianakis, 2010), we have reached several conclusions, briefly elaborated below.
Our first conclusion is that in virtual settings practice is revealed through two ‘faces’ or constituents, namely ‘practice unfolded through social interaction’ and ‘practice encapsulated in processes, tools and artifacts’. Most of the recent studies on virtual communities do not recognize this duality that characterizes practice, as they concentrate on the face of practice revealed through social interaction and analysis of content transcripts. Framing practice in processes, tools and artifacts offers valuable insight into the collaboration that takes place online and the means enabling this collaboration. Our case studies reveal that such collaboration creates cultural glue and shared context, which are important for virtual groups. Additionally, it unfolds recurrent praxis, which to a large extent is what sustains community and facilitates its evolution.
A byproduct of this orientation, which brings to the surface the structures inscribed in technology and studies processes, tools and artifacts, is the focus on practice toolkits.
Our second conclusion relates to the catalytic role of the practice toolkit. Through our case studies, it was revealed that a critical condition for success is not the community support system but the practice toolkit. Furthermore, we identified two contrasting conditions. Specifically, for practice domains with well-established vocabularies and notational systems, such as music scores, the toolkit should be designed so as to allow the reproduction of these practices online. This is a direct derivative of the music lesson case study where the toolkit, in effect, reproduces online the basic vocabulary of the linguistic domain of music theory (i.e., notational symbols such as notes, score, melodies and the valid patterns of use). This reproduction however, does not change the referent linguistic domain and its symbols in terms of expressive power, cognitive capability and basic means for communicating. Such a change would require the creation of new objects (in the vocabulary of the linguistic domain) and associated activities or patterns of use. In contrast, when the practice domain lacks universal codes of established practice, then the practice toolkit should establish the shared language for engaging in a micro-practice defined and experienced only in virtual settings. Moreover, such micro-practice should be defined through activities on boundary artifacts decoupled from the institutionalized ‘local’ practices of the members, which need not be compatible.
REFERENCES
Akoumianakis, D. (2009a). Practice-oriented toolkits for virtual communities of practice. Journal of Enterprise Information Management, 22(3), 317–345. doi:10.1108/17410390910949742
Akoumianakis, D. (2009b). Managing accessibility requirements in software-intensive projects. Software Process Improvement and Practice, 14(1), 3–29. doi:10.1002/spip.383
Akoumianakis, D. (2010). Electronic Community Factories: The model and its application in the tourism sector. Electronic Commerce Research, 10(1), 43–81. doi:10.1007/s10660-010-9045-1
Akoumianakis, D., Vellis, G., Milolidakis, I., Kotsalis, D., & Alexandraki, C. (2008). Distributed collective practices in collaborative music performance. In Proceedings of 3rd ACM International Conference on Digital Interactive Media in Entertainment and Arts (pp. 368-375), New York: ACM Press.
Alexandraki, C., & Akoumianakis, D. (2010). Exploring New Perspectives in Network Music Performance: The DIAMOUSES Framework. Computer Music Journal, 34(3), 66–83. doi:10.1162/comj.2010.34.2.66
Alexandraki, C., & Valsamakis, N. (2009). Enabling Virtual Music Performance Communities. In Akoumianakis, D. (Ed.), Virtual Community Practices and Social Interactive Media – Technology Lifecycle and Workflow (pp. 376–397). Hershey, PA: IGI Global. doi:10.4018/978-160566-340-1.ch019
Bartle, R. A. (2003). Designing Virtual Worlds. Indianapolis: New Riders Publishing.
Brackett, D. (1999). Music. In Horner, B., & Swiss, T. (Eds.), Key Terms in Popular Music and Culture (pp. 124–140). Oxford: Blackwell.
Brown, J. S., & Duguid, P. (2000). The Social Life of Information. Boston, MA: Harvard Business School Press.
Brown, J. S., & Duguid, P. (2001). Knowledge and Organization: A Social-Practice Perspective. Organization Science, 12(2), 198–213. doi:10.1287/orsc.12.2.198.10116
Buchel, B., & Raub, S. (2002). Building Knowledge-creating Value Networks. European Management Journal, 20(6), 587–596. doi:10.1016/S0263-2373(02)00110-X
Cherney, L. (1999). Conversation and Community: Chat in a Virtual World. Stanford: CSLI Publications.
Copland, A. (2002). What to Listen For in Music. Signet Classics.
Costa, J., & Ferrone, L. (1995). Sociocultural perspectives on tourism planning and development. International Journal of Contemporary Hospitality Management, 7(7), 27–35. doi:10.1108/09596119510101903
Donath, J. S. (1999). Identity and Deception in the Virtual Community. In Smith, M., & Kollack, P. (Eds.), Communities in Cyberspace (pp. 29–59). New York: Routledge.
Duval Smith, A. (1999). Problems of Conflict Management in Virtual Communities. In Smith, M., & Kollack, P. (Eds.), Communities in Cyberspace (pp. 29–59). New York: Routledge.
Eisenlohr, P. (2004). Language Revitalisation and New Technologies: Cultures of Electronic Mediation and the Refiguring of Communities. Annual Review of Anthropology, 33, 21–45. doi:10.1146/annurev.anthro.33.070203.143900
Fabian, J. (2002). Virtual Archives and Ethnographic Writing: ‘Commentary’ as a New Genre? Current Anthropology, 43(5), 775–786. doi:10.1086/342640
Fahlander, F., & Oestigaard, T. (2004). Material Culture and Other Things, Post-disciplinary Studies in the 21st Century, Gotarc, Series C, No 61. Retrieved from http://folk.uib.no/gsuto/ArtiklerWeb/Material_Culture/Oestigaard.pdf
Ferm, C. (2008). Playing to teach music - embodiment and identity-making in musikdidaktik. Music Education Research, 10(3), 361–372. doi:10.1080/14613800802280100
Fernback, J. (2007). Beyond the diluted community concept: a symbolic interactionist perspective on online social relations. New Media & Society, 9(1), 49–69. doi:10.1177/1461444807072417 Fuller, J., Bartl, M., Ernst, H., & Muhlbacher, H. (2006). Community based innovation: How to integrate members of virtual communities into new product development. Electronic Commerce Research, 6, 57–73. doi:10.1007/s10660-006-5988-7 Gherardi, S. (2001). Practice-based Theorizing on Learning and Knowing in Organizations: An Introduction. Organization, 7(2), 211–223. doi:10.1177/135050840072001 Gherardi, S. (2009). Introduction: The Critical Power of the `Practice Lens’. Management Learning, 40(2), 115–128. doi:10.1177/1350507608101225 Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Boston: Houghton Mifflin. Giddens, A. (1984). The constitution of society, Outline of the theory of structuration. Cambridge: Polity Press. Gochenour, H., P. (2006). Distributed communities and nodal subjects. New Media & Society, 8(1), 33–51. doi:10.1177/1461444806059867 Harrison, M. T., & Barthel, B. (2009). Wielding new media in Web 2.0: exploring the history of engagement with the collaborative construction of media products. New Media & Society, 11(1&2), 155–178. doi:10.1177/1461444808099580 Harrison, R. (2009). Excavating Second Life: Cyber-Archaeologies, Heritage and Virtual Communities. Journal of Material Culture, 14(1), 75–106. doi:10.1177/1359183508100009 Hine, C. (2000). Virtual Ethnography. London: Sage.
Ito, M. (1997). Virtually Embodied: The Reality of Fantasy in a Multi-User Dungeon. In Porter, D. (Ed.), Internet Culture (pp. 87–110). New York: Routledge. Jones, Q. (1997). Virtual-Communities, Virtual Settlements & Cyber-Archaeology: A Theoretical Outline. Journal of Computer-Mediated Communication, 3(3). Retrieved from http://jcmc.indiana. edu/vol3/issue3/jones.html. Jones, Q. (2003). Applying Cyber-Archaeology. In K. Kuutti et al. (Eds.), ECSCW 2003: Proceedings of the 8th European Conference of Computer Supported Cooperative Work, 14-18 September, Helsinki Finland (pp. 41-60). Amsterdam: Kluwer Academic Publishers. Jones, Q., & Rafaeli, S. (2000). What do virtual ‘Tells’ tell? Placing cybersociety research into a hierarchy of social explanation. In Proceedings of the 33rd Hawaii International Conference on System Sciences (Hawaii 2000), IEEE Press. Lindkvist, L. (2005). Knowledge Communities and Knowledge Collectivities: A Typology of Knowledge Work in Groups. Journal of Management Studies, 42(6), 1189–1210. doi:10.1111/ j.1467-6486.2005.00538.x Mavra, M., & McNeil, L. (2007). Identity Formation and Music. Human architecture: Journal of the Sociology of self-knowledge, 2, 1-20. Mnookin, J. L. (1996). Virtual(ly) law: The emergence of law in LambdaMOO. Journal of Computer-Mediated Communication, 2(1). Retrieved February 8, 1999 from http://jcmc.huji. ac.il/vol2/issue1/lambda.html Nicolini, D., Gherardi, S., & Yanow, D. (2003). Introduction. In D. Nicolini, S., Gherardi & D. Yanow (Eds.), Organizational Knowledge as Practice (pp. 3–31). Armonk, NY: ME Sharpe.
Orlikowski, W. J. (2000). Using Technology and Constructing Structures: A Practice Lens for Studying Technology in Organizations. Organization Science, 11(4), 404–428. doi:10.1287/ orsc.11.4.404.14600 Orlikowski, W. J. (2002). Knowing in Practice: Enacting a Collective Capability in Distributed Organizing. Organization Science, 13, 249–273. doi:10.1287/orsc.13.3.249.2776 Osti, L., & Turner, W., L., & King, B. (2009). Cultural differences in travel guidebooks information search. Journal of Vacation Marketing, 15(1), 63–78. doi:10.1177/1356766708098172 Patel, A. D. (2007). Music, Language, and the Brain. Oxford: Oxford University Press. Poster, M. (1997). Cyberdemocracy: Internet and the Public Sphere. In Porter, D. (Ed.), Internet Culture (pp. 201–218). New York: Routledge. Raybourn, M. E., Kings, N., & Davies, J. (2003). Adding cultural signposts in adaptive communitybased virtual environments. Interacting with Computers, 15(1), 91–107. doi:10.1016/S09535438(02)00056-5 Reid, E. (1999). Hierarchy and Power: Social Control in Cyberspace. In Smith, M., & Kollack, P. (Eds.), Communities in Cyberspace (pp. 134–166). New York: Routledge. Rheingold, H. (1993). The virtual community: Homesteading on the electronic frontier. Reading, MA: Addison-Wesley. Ridings, C., & Gefen, D. (2004). Virtual community attraction: Why people hang out online. Journal of Computer-Mediated Communication, 10(1), Article 4. Retrieved from http://jcmc.indiana.edu/vol10/issue1/ridings_gefen.html
Ritzer, K., & Liska, A. (1997). ‘McDisneyization’ and ‘Post-Tourism’: Complementary Perspectives on Contemporary Tourism. In Rojek, C., & Urry, J. (Eds.), Touring Cultures: Transformations of Travel and Theory (pp. 96–109). New York: Routledge. Schatzki, T. R. (1996). Social Practices: A Wittgensteinian Approach to Human Activity and the Social. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511527470 Schatzki, T. R. (2001). Introduction. Practice Theory. In Schatzki, T. R., Knorr-Cetina, K., & von Savigny, E. (Eds.), The Practice Turn in Contemporary Theory (pp. 1–14). New York: Routledge. Schmallegger, D., & Carson, D. (2008). Blogs in tourism: Changing approaches to information exchange. Journal of Vacation Marketing, 14(2), 99–110. doi:10.1177/1356766707087519 Sterne, J. (2006). The mp3 as cultural artifact. New Media & Society, 8(5), 825–842. doi:10.1177/1461444806067737 Stockdale, R. (2007). Managing customer relationships in the self-service environment of e-tourism. Journal of Vacation Marketing, 13(3), 205–219. doi:10.1177/1356766707077688 Suchman, L. (1987). Plans and Situated Action: The Problem of Human–Machine Communication. Cambridge: Cambridge University Press. Suchman, L. (2007). Human–Machine Reconfigurations: Plans and Situated Actions. Cambridge: Cambridge University Press. Turkle, S. (1995). Life on the Screen: Identity in the Age of the Internet. New York: Touchstone.
Walsh, G., & Gwinner, P., K. (2009). Purchasing vacation packages through shop-at-home television programs: An analysis of consumers’ consumption motives. Journal of Vacation Marketing, 15(2), 111–128. doi:10.1177/1356766708100819 Watson, N. (1997). Why We Argue About Virtual Community: A Case Study of the Phish.net Fan Community. In Jones, S. G. (Ed.), Virtual Culture: Identity and Community in Cyberspace (pp. 102–132). London: Sage. Weick, K. E. (2001). Making Sense of the Organization. Malden, MA: Blackwell Publishing. Wellman, B. (2001). Physical place and cyberplace: Changing portals and the rise of networked individualism. International Journal of Urban and Regional Research, 25(2), 227–252. doi:10.1111/1468-2427.00309 Wellman, B., Salaff, J., Dimitrova, D., Garton, L., Gulia, M., & Haythornthwaite, C. (1996). Computer Networks as Social Networks: Collaborative Work, Telework, and Virtual Community. Annual Review of Sociology, 22(1), 213–238. doi:10.1146/ annurev.soc.22.1.213 Wilson, S. M., & Peterson, L. C. (2002). The Anthropology of Online Communities. Annual Review of Anthropology, 31(October), 449–467. doi:10.1146/annurev.anthro.31.040402.085436 Wise, J. M. (1997). Exploring Technology and Social Space. Thousand Oaks, CA: Sage. Zammuto, F. R., Griffith, L. F., Majchrzak, A., & Dougherty, J., D., & Faraj, S. (2007). Information Technology and the Changing Fabric of Organization. Organization Science, 18(5), 749–762. doi:10.1287/orsc.1070.0307
Vaast, E. (2007). What Goes Online Comes Offline: Knowledge Management System Use in a Soft Bureaucracy. Organization Studies, 28(3), 283–306. doi:10.1177/0170840607075997
KEY TERMS AND DEFINITIONS
Constituting Structures: Structures embodied or designed into technology to facilitate designated functional and non-functional requirements.
Cultural Artifacts: Computer-mediated artifacts which reveal the existence of a community and unfold its purpose and practices.
Practice Lens: A research perspective focusing on technology use and the emergent structures revealed through such use.
Chapter 4
Toward Diversity in Researching Teaching and Technology Philosophies-in-Practice in e-Learning Communities
Gale Parchoma
Centre for Studies in Advanced Learning Technologies (CSALT) & Lancaster University, UK
ABSTRACT
e-Learning is pervasively perceived as a singular enterprise, subject to broad claims and overarching critiques. From this viewpoint, the strengths and weaknesses of large-scale e-learning implementations in supporting all forms of teaching and learning in higher education can be examined through best-practices lenses. This chapter contests the e-learning singularity paradigm through examining a sample of diverse e-learning communities, each of which may be associated with distinct teaching and technology philosophies-of-practice, as well as divergent research and development histories. A gestalt view of interacting and interlocking teaching and technology philosophies underpins a call for local actions aimed at achieving the democratization of e-learning environment design and fostering both difference and connectivity across e-learning communities of research and practice.
DOI: 10.4018/978-1-60960-040-2.ch004
INTRODUCTION
Much attention has been and continues to be focused on examinations and theories of e-learning adoption in higher education (Anderson, 2008; Archer, Garrison, & Anderson, 1999; Bates, 2000, 2005; Garrison & Kanuka, 2004; Greener & Perriton, 2005; Laurillard, 2008; Nickols, 2008; Njenga, 2008; Parchoma, 2008; Stahl & Hesse, 2009). To date, this discourse has been marked by generalist approaches, which tend to condense all forms of technology-mediated teaching and learning practices in higher education (HE) into an ill-defined field of e-learning research; and advocacy approaches, which promote or redress specific frameworks and models for adoption. Both approaches tend to be spiced with either pro- or anti-commentaries on “technopositivist ideology, a compulsory enthusiasm” (Njenga, 2008, p. 2) for the potential of technology to transform teaching and learning in HE. Similarly, both approaches tend to ignore or reject the interrelationships between disciplinary ways of knowing, underpinning philosophies of teaching and technology, and the resultant degrees of alignment or disconnect with institutionally mandated e-learning systems.
In this chapter, I explore an alternative route through contested e-learning territories, a route initially opened for exploration through Kanuka’s (2008) work on understanding e-learning technologies-in-practice through philosophies-in-practice. References are made to higher education [HE], adult education [AE], technology, and educational technology literatures in order to bring a relevant range of perspectives on teaching and technology to bear on the issues at hand. My efforts focus on achieving the following objectives:
1. Undertaking a critical examination of Kanuka’s (2008) framework and recommendations.
2. Extending the range of both teaching and technology philosophies-in-practice under consideration.
3. Theorizing a gestalt perspective on interrelationships between teaching and technology praxes.
4. Examining four recognizable e-learning research and practice communities for associations with teaching and technology philosophies-in-practice.
5. Making a case for continued diversity in e-learning research and practice communities as an avenue to reconciliation of these virtual communities with their social, place-based environments.
6. Positing the interplay between teaching and technology philosophies-in-practice as a site for researching diverse views.
Background
More than a decade ago, Gandolfo (1998) posited that the effective use of technology has “the potential to improve and enhance learning”; however:
Just as assuredly there is the danger that the wrong headed adoption of various technologies apart from a sound grounding in educational research and practice will result, and indeed in some instances has already resulted, in costly additions to an already expensive enterprise without any value added. That is, technology applications must be consonant with what is known about the nature of [teaching and] learning and must be assessed to ensure that they are indeed enhancing learners’ experiences (p. 24).
The increasingly ubiquitous presence of e-learning initiatives, strategies, and program offerings across institutions of higher education (HE) underpins debates around whether educational research and practice are keeping pace. Assurances that e-learning can enhance student learning via flexible access (Bates, 2005) to effective (Naylor, 2005), economical (UNESCO, 2002), up-to-date (Barone, 2003), problem-based (Jonassen, 2004), relevant (Alclay, 2003), community-oriented (Schwier, 2001, 2007; Schwier & Daniel, 2006), low-risk (Garrison & Kanuka, 2004), quality teaching and learning innovations (Garrison, Kanuka, & Hawes, 2002) that promote graduate employability (EKOS, 2005) and internationalization (DiPaolo, 2003; Jones & Steeples, 2002) have all been linked to rationales for swift, broad-scale adoption. Despite the trend toward fast-paced, broad-ranging, and innovative e-learning adoption, over the past decade in the UK “the HE sector continued to make small gains in localized projects, but not to achieve mastery of the technology in service of its learning and teaching ambitions” (Laurillard, 2008, p. 522). Internationally, progress on e-learning acceptance and diffusion has encountered similar challenges (Bates, 2000, 2005; Nickols, 2008; Parchoma, 2008).
Over-sold e-learning claims and promises, combined with under-achieved ambitions, have led to resistance on a range of fronts. Critical educators have asked us to consider whether a “technopositivist ideology” is “being created, propagated and channeled repeatedly by the people who stand to gain either economically, socially, politically or otherwise in due disregard of the trade-offs associated with the technology to the target audience” (Njenga, 2008, p. 2). Peters (2006) argues that technology has become “the new star ship in the [educational] policy fleet for governments around the world,” resulting in “conceptually inchoate and ill-defined technology” moving to the core, both as a disciplinary subject and a programmatic approach, in higher education (p. 95). Njenga (2008) cautions us that unbridled “enthusiasm for the use of technology for endless possibilities” will result in a misconception “that providing information automatically leads to meaningful knowledge creation” (pp. 969-970), thus confusing information provision with an approach to teaching. Through lenses such as these, broad-scale e-learning initiatives have been critiqued as disruptive applications of technology, which are eroding (Bok, 2003), impoverishing (Greener & Perriton, 2005), and may even be capable of eclipsing institutionally-based HE (Archer, Garrison, & Anderson, 1999). Institutionally sanctioned e-learning models and strategies have been held to account for falling dismally short of respecting long-held disciplinary teaching traditions (Nichols, 2008).
HE policy and strategy documents increasingly privilege the goal of embedding standardized technologies into homogeneous pedagogical practices in order to ensure equality and consistency in the student experience of e-learning (see, for example, EKOS, 2005; Lancaster University, 2006; Nichols, 2008). At a minimum, privileging the goal of consistency in student e-learning experiences evades discourse on difference among philosophies of teaching and learning. Excluding philosophically informed pedagogical debates on “theories and models, modes of delivery, instructional roles, instructional designs, and learning processes and outcomes” (Harasim, 2006, p. 59) homogenizes e-learning into a singularity paradigm, which assumes the existence of community (Greener & Perriton, 2005) without due consideration of, or provision for, diverse, emerging, and evolving e-learning community needs.
Given the breadth of the range of hopes, promises, practices, concerns, and skepticisms that can be associated with e-learning as a singular entity, HE educators need the opportunity to step back and reflect (Laurillard, 2008; Njenga, 2008) on the nature of a range of e-learning forms, practices, and purposes. Where Njenga (2008) suggests shifting our attention from “actual educational technology as it advances” to technology’s “educational functions” and “the effects it has on the functions of teaching and learning,” Kanuka (2008) posits a reversed process: reflecting on the “schism between opinions” via considering the philosophical “nature of the disagreement” (p. 92). She recommends beginning from a consideration of the interrelationship between philosophies of teaching “and the choices we make about e-learning technologies” (Kanuka, 2008, p. 93). Combined, Njenga’s and Kanuka’s focal points provide a useful gestalt perspective for a postmodern deconstruction of the e-learning singularity paradigm, along with opportunities to extend inquiries into the interplay between activities within virtual learning communities and the technologies that support them.
Unpacking the e-Learning ‘Singularity Paradigm’
The e-learning singularity paradigm evades philosophically informed, discipline-aware, research-based innovation and practice. This singularity paradigm, with its focus on effectiveness, consistency, efficiency, fiscal sustainability, and arguably change for its own sake, has been associated with a series of e-learning adoptions gone wrong in HE, including: failed attempts at commercialization (Bok, 2003; Greener & Perriton, 2005; Laurillard, 2008); increased enrolment and reputation competitions among traditional universities (Hanna, 2000); corporate competition with traditional universities (DiPaolo, 2003; Greener & Perriton, 2005); inequitable access, associated with the digital divide and institutional elitism (Archer, Garrison, & Anderson, 1999); Western capitalistic-academic colonialism (Zurawski, 1996); the erosion of academic rigor (Olcott & Schmidt, 2000); and the standardization of curricula and impoverishment of the concept of community in HE experiences (Greener & Perriton, 2005). Taken further, privileging the goal of consistency in student e-learning experiences implies that all forms of ICT-mediated teaching, learning, and community should, over time, converge into a limited set of identifiable, manageable best practices.
Within a singularity paradigm, wide-ranging e-learning applications, which assume the concept of community, have been invoked as both promise and platform for supporting logistical, pragmatic, and pedagogical challenges and aspirations typically encountered in teaching and learning in HE. As a single entity, e-learning can signify anything from online access to registration, grades, or course outlines (Littlejohn & Pegler, 2007); to the transmission of a lecture from a server to a mobile phone (Wagner, 2005); to standardized learning object repositories, designed to distribute outcomes-oriented tutorials across modules or even academic disciplines (Daniel & Mohan, 2004); to interactive games (Oblinger, 2006), simulations (Whitehouse, 2005), or virtual reality-based laboratory exercises (Abutarbush et al., 2006); to the use of electronic discussion boards to extend, reduce, or replace face-to-face dialogue (Parchoma, 2008); to virtual spaces for social introductions into new academic settings (Selwyn, 2007); or to provision of access to electronic databases of research resources (Tenopir, 2003).
e-Learning community spaces vary in purpose and form from the creation of secure, shared electronic spaces for displaying and evaluating portfolio artifacts (Wong, 2000), to team-building and problem-solving activities (Jonassen, 2004), to collaborative knowledge construction (Jones, Ferreday, & Hodgson, 2006) in networked learning communities (McConnell, 2006), to combined individualistic and collaborative contributions to virtual learning communities (Schwier & Daniel, 2006). Faced with this array of options and associated terms, many tertiary educators are left wondering “what e-learning is” and whether it might indicate a systemic move away from the classroom, toward distance education (Njenga, 2008, p. 3).
While the scope and aims of e-learning activities and virtual communities are both diverse and continuously evolving, managerial policy discourses on effectiveness, consistency, and efficiency seek to restrain diversity in the interest of institutional continuity and sustainability (see, for example, EKOS, 2005; Lancaster University, 2006). Effectiveness, consistency, and efficiency discourses tend to base broad knowledge claims on references to context-dependent research. Diversities across research designs, theoretical frameworks, findings, and implications from inquiries into virtual learning environments, technology enhanced learning, social networking, computer-supported collaborative learning, and networked e-learning communities tend to be reduced into convergent, simplified sets of best practices, prescribed for use across geographical, cultural, demographic, and pedagogic settings. Questions on whether ongoing practice-based research is necessary for the development of theoretically informed, philosophically grounded, evolving practices within continuously changing technological and institutional landscapes are frequently deflected on the basis of fiscal restraints. Scarce resources for experimentation and innovation are increasingly migrating into the remit of information service units, whose foci of concern are oriented toward institutionally mandated technological solutions. As a result, the singularity paradigm is entrenched via policy.
INSIDE THE GESTALT OF E-LEARNING
Kanuka (2008) argues that simply selecting e-learning technologies, “based on the latest institutional strategy or trend,” can lead to “incongruence and inconsistency in action between and among instructors, administrators, and students, and the ensuing disagreements that revolve around the means rather than the ends of education” (p. 111). She appeals to educators in institutions of HE to “know your teaching and technology philosophies in practice” as avenues to “avoiding mindless activism” (p. 111) and to developing informed responses to institutionally sanctioned e-learning technologies and systems.
Technological systems, through their range of functionalities, affordances, and limitations, have profound influences on teaching and learning processes that reach far beyond the way information is organized or the range of activities they support. As teachers and learners in each “discipline area find new ways of using the technology to understand or illuminate … knowledge,” these “new forms of representation offer new forms of engagement with, and ownership of knowledge and the individual’s developing understanding” (Laurillard, 2008, p. 526). From an alternative stance, disciplinary and interdisciplinary ways of knowing, teaching, and learning could inform designs for truly innovative technological systems. Far from having reached a point where sets of generalist best practices are ready to be institutionally recommended or mandated, HE e-learning researchers, practitioners, and learners are just beginning to explore possibilities inside the social-technical gestalt of e-learning. Points of departure for this exploration necessarily include both disciplinary and interdisciplinary technology and teaching philosophies-in-practice.
Drawing primarily on Elias and Merriam’s (1980) work on philosophies of adult education, Kanuka (2008) identifies alignments and frictions between ways of understanding educational technologies and philosophies of teaching (p. 101). Through this lens, Kanuka (2008) posits a reproach to the singularity paradigm of e-learning. She examines intersecting areas between three determinist roles for technology in teaching and learning in HE and six schools of philosophical thought on adult education. In her mapping exercise, she surfaces a series of interrelated teaching and technology diversities.
Three Varieties of Determinism
Kanuka (2008) examines three modernist, determinist positions on technology philosophy-in-practice. The first modernist position, uses determinism, is defined as an orientation to perceiving technologies as a set of “neutral tools” or “devices that extend our capacities,” over which “we have control and autonomy” (Kanuka, 2008, p. 96). Kanuka warns readers that an exclusive uses determinism perspective carries the potential to overlook or “disregard the broader social structures and/or technological artifacts’ effects on learning outcomes, leading to explanations that overemphasize the power and autonomy of actors” (p. 97). As a result of an overly focused uses determinism perspective, “misguided or naïve assumptions” (p. 97) about the absence of social influences in technological systems can lead to ill-informed choices and unintended consequences.
The second position, social determinism, emphasizes ways that educational “uses of technologies are affected by the social structures and the social construction of technological artefacts” (Kanuka, 2008, p. 97). Critique of social determinism provides a gestalt image of the constraints of uses determinism, in that social determinism “can lead to flawed understandings of educational technology” through its lack of acknowledgment of “user agency or material limits” (p. 98). Through an exclusive social determinism lens, insufficient attention is paid to the dynamics of mutual shaping among social, technological, and actor-agency elements of e-learning environments.
The third position, technological determinism, is defined as a Marxist approach to understanding “technologies as causal agents, determining our uses and having a pivotal role” in either sustaining advantaged populations and supporting political-economic “interests such as commodification, commercialization, and corporatization of education” or transforming learning through expanding access and increasing the quality of learning experiences (Kanuka, 2008, pp. 98-99).
Kanuka (2008) advocates avoidance of any “one-dimensional” position and posits ongoing attention to the “effects of educational, social, and historical forces that have shaped both educational systems and educational technologies” (p. 101) before focusing her discussion on philosophies of teaching. This abrupt conclusion and shift of focus are unsatisfying on a couple of levels. First, there appears to be insufficient room for reconciliation of the shortcomings of any of the three determinism positions via an aggregation approach. The constraints of any of the positions cannot simply be overcome by simultaneous awareness of all three positions. There are problematic questions inherent in adopting a simultaneous, tri-position awareness that uncritically combines contradictory views. Critical awareness of contradictions falls short of theorizing an alternative, tenable perspective.
Mapping Six Schools of Modernist Thought on Educational Philosophy to the Three Modernist Determinisms
In an effort to unpick HE e-learning adoption issues, Kanuka (2008) identifies points of coherence and friction across uses determinism, social determinism, and technological determinism and six schools of philosophical thought on adult higher education. Where it is unlikely that an individual HE educator would feel comfortable having their philosophy of teaching clearly tucked into a single category, Kanuka’s empirical study provides evidence that HE educators tend to align themselves more strongly with one school than with others.
The liberal/perennial school in HE has two primary aims: “(1) the search for truth, and (2) to develop good and moral people,” achieved via academic transmission of liberal educational content and subsequent student acquisition of “rational, intellectual, and evolving wisdom; moral values; a spiritual or religious dimension; and an aesthetic sense” (Kanuka, 2008, pp. 101-102). Kanuka argues that e-learning is generally perceived as interfering with the philosophical aims of the liberal/perennial school because flexible and convenient access to standardized curricula typically associated with online courses and economies of scale are viewed as subverting academic rigour and “robbing the student of a [rich] intellectual experience” (p. 103). She concludes that liberally oriented academics are most closely aligned to a technological deterministic view that critiques e-learning primarily as a “technology for disseminating an onslaught of incoherent and fragmented trivialities … at the expense of engagement, reflectivity, and depth” (p. 99).
The progressive school of educational philosophy, with its focus on a practical and pragmatic orientation to “personal growth, maintenance, and promotion of a better society,” allows for a broader view of the usefulness of educational technologies, especially learner-centred, personalized, and problem-solving applications of e-learning technologies in well-managed e-learning environments (Kanuka, 2008, pp. 103-104). The facilitative role in progressively oriented teaching practice strengthens the alignment between the progressive school and uses determinism (p. 104) because the choice of a facilitative role is an example of agency.
The behaviouralist school, “with its focus on effective, observable, and measurable academic achievements and desired changes [in] personal behaviour,” clearly aligns with a positivist view of technological determinism (Kanuka, 2008, p. 105).
Behaviouralist methodologies, including competencies-based learning, programmed learning, “personalized systems of instruction, individually guided instruction, and individually prescribed instruction,” lend themselves well to mechanized e-learning systems. “Automated courses (quizzes and exams) with modularized units, tutorials and/or simulations” can create efficiencies and support economies of scale (Kanuka, p. 102); therefore, “the majority of behaviouralists believe that the use of e-learning technologies, in all forms, results in effective and efficient learning” (p. 106).
The focus of the humanist school is “to support individual growth and self-actualization” through establishing learning environments marked by “freedom and autonomy, trust, active participation, and self-directed learning” (Kanuka, 2008, p. 108). For humanists, “the act of learning is a personal activity,” which involves “intrinsic motivation, self-concept, perception, and self-evaluation” (Kanuka, p. 107). As facilitators of “flexible, convenient” access to collaborative learning environments, designed for personal growth, humanists most closely align themselves to uses determinism (p. 107).
The “overarching aim” of the radical-critical school is to “evoke change in the political, economic and social order in society via the intersection of education and political action” (Kanuka, 2008, p. 108). The purpose of HE is closely tied with pedagogical approaches that include “problem identification,” “collective dialogue, ideal speech, and critical questioning in a risk-free environment” (Kanuka, p. 108). As education is inherently value-laden and never neutral, power relations between teaching and learning are central and all forms of evaluation are problematic (p. 109). Radical educators “align themselves most closely with social determinism,” and avoid the use of surveillance-equipped, corporate learning management systems (p. 109).
The focal point of an analytical school orientation to educational philosophy is the “fearless transmission of neutral knowledge,” discipline-based truths that are “morally, socially, and politically neutral” (Kanuka, 2008, p. 110). Analytical educators emphasize “the need for clarifying [and verifying] concepts, arguments, and policy statements” (Kanuka, p. 110). Students are expected to “temporarily give up their freedom and subject themselves to being guided, criticized, and tested according to the standards of a discipline” (p. 110). Analytical educators align themselves “most closely with determinism” and view e-learning processes as desirable for the transmission of lectures and the facilitation of “effectively moderated teacher-directed” dialogues (p. 111).
Kanuka (2008) advocates avoidance of any “one-dimensional” position and posits ongoing attention to the “effects of educational, social, and historical forces that have shaped both educational systems and educational technologies” (p. 101). She cautions educators “who choose and use e-learning technologies [to] be knowledgeable about [their] philosophies of teaching, as well as the multidimensionality of technological determination, and be reflexive about the limits of their activities in both areas” (p. 95). These recommendations provide openings for new ways of thinking about researching adoption of educational technologies in HE and the roles of virtual communities across philosophical perspectives of e-learning. See Figure 1 for a continuum of teaching and technology philosophies-in-practice.
Figure 1. A modernist continuum of teaching philosophies-in-practice and determinist philosophies of technology
Postmodernism
Elias and Merriam (2005) posit the addition of postmodernist perspectives to the established six philosophies of adult education (AE). They contend that the introduction of postmodernism “challenges us to examine everything about our philosophy and practice” (p. x). Postmodernist orientations to teaching and learning signal “dissatisfaction with the present or modern” (p. 217), and signify the importance of bringing an “awareness of contingency” (p. 221) into thinking about AE and HE. Further, postmodernists reject wholesale, global meta-narratives and Enlightenment dogmas, including:
The belief that knowledge in the empirical sciences can be automatically applied to increase human progress and happiness; that the human sciences can transform society into a totally rational culture; the belief in rational universal values and human progress toward a utopian goal; that knowledge will liberate people from oppression. (Elias & Merriam, p. 222)
Postmodernism accommodates “the modern critical spirit, personal autonomy, [and] individual rights” (Elias & Merriam, 2005, p. 224), and is “more in accord with emerging paradigms, which see the world as rich, open, subtle, complex, complementary and interrelated” (Inbody, 1995, p. 536). Postmodernists critique all six of the earlier philosophies of AE and HE. As radical educators have already done, postmodernists reject the Liberal/Perennial School’s commitment “to the grand narratives of the past and of modernity” and then extend this criticism to the work of the analytical school. They charge analytical educational philosophers with supporting and protecting modernist hegemonies of “technical-instrumental reason” and “the stance of objectivity and value neutrality in the making of knowledge claims” (Usher et al., 1997, p. 7). Postmodernists also reject the progressive school for its roots in “empirical observation and rational processes of induction and deduction” (Usher et al., 1997, p. 7), and linkages with uncritical integration of individuals into formal education, employment systems, and technologies of control (Elias & Merriam, pp. 231-232).
In deep agreement with humanists and radicals, postmodernists reject the Behaviouralist School for its “regulation of individuals by variables defined either as instruction or as rewards for correct behavior” (Elias & Merriam, p. 233) and for its pedagogies, which ignore diversities, including “gender, ethnicity, and class” (p. 234).
Postmodernists recognize frameworks for understanding constituted in the language of humanistic educational philosophy, especially its reference points of person-centeredness and empowerment (Usher, Bryant, & Johnston, 1997, p. 75). However, they question the humanist contention that people can achieve autonomy and emancipation merely through talk because these techniques can be used by educators and counselors as “subtle forms of manipulation” directed toward desirable links between “self-fulfillment and political, economic, and social goals” (Elias & Merriam, p. 235). Further, postmodernists critique the Humanist School for failing to recognize the degree to which “the self is determined by the culture in which it was formed,” as well as the humanist focus on excessive individualism (Elias & Merriam, pp. 236-237).
They reject the Radical School’s “dream of a future utopia” in favor of localized political actions, based in “the here and now” (Elias & Merriam, p. 237). While postmodernists express appreciation for radical-critical theory for its support for participatory pedagogy, they reject statements of “how persons should act in the world” and “the notion of a rational culture in which communicative discourse is possible and necessary” (Elias & Merriam, pp. 238-239).
A problem with posing the potential for an emergent, postmodernist philosophy of teaching and learning or technology is that most postmodernist theorists would immediately reject the concept of a coherent or stable collection of postmodernist thoughts on either AE or HE. Postmodernists do, however, share more common perspectives than just rejections of the rational, the traditional, and the conventional; they collectively tend to privilege difference, spontaneity, emotion, creativity, and a de-centered sense of localized action and minimalist responsibility, as well as respect for diverse cultural contexts, and recognition of diverse forms of knowledge, “including the ethical, the technical, the existential, and the aesthetic” (Elias & Merriam, p. 240).
A common goal of postmodernist approaches to AE and HE is to “develop educational practices that respectfully legitimize adult learners’ lives, perspectives, discourses, and voices” (Hemphill, 2000, p. 21). Postmodernist approaches include revisionism from cultural perspectives, including indigenous, minority, “feminist, black, and physically handicapped” perspectives, reinterpretations, and deconstructions of educational events (Elias & Merriam, 2005, p. 241). Phenomenalism is valued for its “unpredictability, changeability, uncertainty, and ambiguity” (Elias & Merriam, p. 242). Within postmodernist perspectives, “responsibility for adult education becomes diffused through society” and “does not differentiate itself from related human activity, such as research, social work, political action, and recreation, but is immersed in all of these activities” (p. 242).
Postmodernist thinking admits privatization into AE and HE on the grounds of allowances for participant choice and withdrawal (p. 242).
Postmodernist approaches to AE and HE employ diverse “evaluation, justification, and appraisal” criteria, “especially [favoring] those that stress the aesthetic, spiritual, affective, and experimental” (p. 240). Given the diversity, instability, and unpredictability of evolving postmodernist approaches to AE and HE, these approaches cannot be appropriated into any stable notion of a philosophy of technology. However, it is possible to contend that postmodernist approaches to learning can be considered in closer alliance to a subset of teaching philosophies-in-practice in AE and HE than other modernist schools of thought. Postmodernist approaches to teaching and learning have much in common with the humanistic school’s values of freedom and autonomy, trust, and self-determined learning and the radical-critical school’s foci on collective dialogue and questioning in a risk-free environment.
Epistemic Fluency
Goodyear’s (2002) assertion of the multiplicity of accounts of learning available in psychological and educational theory, combined with his critique of modernist philosophies of teaching and learning in HE, aligns his work with postmodernism. First, he rejects all behaviouralist notions of controlled, experimental discovery of “rules or algorithms for optimizing learning” on the basis of recognition that there is always a “legitimate gap between the tasks we set for learners” [especially adult learners] and “the activities in which they actually engage”; therefore, “interpretive work” is required to “draw out implications” of teaching and learning practices (Goodyear, 2002, p. 50). He critiques the remaining modernist conceptions of teaching and learning in HE, which he categorizes as: (1) the “academic” strand, (2) the “general competence” stance, and (3) the “critical being and reflexivity” position (Goodyear, 2002, pp. 53-54).
Goodyear’s (2002) academic strand is marked by its emphasis on learner competence in “academic discourse, with its heavy reliance on declarative conceptual knowledge, contemplative forms of analysis and use of textual (including mathematical) representations,” and its primary goals of ensuring student achievement in the areas of recalling “conceptual knowledge” and deploying that knowledge “in the construction of arguments, and/or in the solution of problems” (p. 53). The privileging of declarative conceptual knowledge for the purposes of engaging in academic discourse and argument construction closely aligns Goodyear’s (2002) critique of the academic conception of learning in HE with Elias and Merriam’s (1980, 2005) liberal/perennial and analytical philosophies of adult education.
The general competence strand in Goodyear’s (2002) description emphasizes learner competencies for employability: core and transferable skills in “literacy, numeracy, communication, foreign language, leadership, teamworking, and IT” (p. 54). The general competence strand’s focus on supporting learners in developing “a willingness to learn; ability to deal with change and question assumptions; analytical, critical, and problem-solving skills, as well as knowledge and ideas for the purpose of achieving rewarding employment” (p. 54), aligns it well with Elias and Merriam’s (1980, 2005) progressive school.
Goodyear (2002) identifies the critical being and reflexivity conception of learning in HE via its “rejection of ‘academicist’ and ‘operational competence’ conceptions,” and its basis in Barnett’s (1997) “conviction that we can have no certain knowledge of the world and consequently knowledge and skills become redundant and marginal” (p. 55).
Therefore, from a critical being and reflexivity perspective, the goals of HE need to be focused on supporting students in (1) “their acquisition of discursive competence”; “insight into what it is like to handle with confidence the concepts, theories, and ideas of a field of thought, to handle complex ideas in communication with others”; (2) developing “self-reflexiveness by framing a student’s initiation into a field of thought such that they see its essential openness and how they may be actors in it”; and (3) encouraging “informed but critical action: understanding the power limitations of the field as a resource for action” (p. 55). The aims and approaches of the critical being and reflexivity stance correspond with Elias and Merriam’s (1980, 2005) humanist and radical/critical schools.
In his synthesis (and, I would argue, evaluation) of modernist conceptions of teaching and learning in HE, Goodyear critiques the “academicist strand” for its privileging of the goal of acquisition of “‘second order’ knowledge,” the general competence strand for its privileging of “key skills and pre-dispositions,” and the critical being and reflexivity strand for its “‘end of knowledge’ position” (Goodyear, 2002, p. 55). Goodyear identifies debate on the nature of knowledge as an underpinning theme across modernist positions. He articulates the bases for his stance on teaching and learning in HE: (1) “the idea that higher education is a site for the development and use of ‘working knowledge’; and (2) the idea that the speed of change in modern knowledge-based economies, coupled with a need to be open to diverse views, requires students to develop a reflexivity in their use of knowledge, something Alan Collins … calls ‘epistemic fluency’” (p. 55). Examining the affordances of e-learning environments and their design elements for supporting virtual learning communities, in which exploration of diverse views, along with the development and use of working knowledge and reflexivity, are central aims, will require innovative research designs and, perhaps, new methodologies.
Moving Beyond Determinist Philosophies of Technology
In parallel to emerging challenges posed by postmodernist critiques of modernist approaches to teaching and learning, limitations of all three modernist technological positions challenge researchers to reexamine determinist philosophies of technology. First, deterministic beliefs include: 1) that “technical necessity dictates the path of development” and 2) “that path is discovered through the pursuit of efficiency” (Feenburg, 1999, p. 77). Deterministic positions project an “abstract technical logic of the finished project back into its origins as a cause of development” (Feenburg, p. 81), seemingly providing evidence for “an autonomous functional logic that can be explained without reference to society” (p. 77). Therefore, technology can be social “only [emphasis added] through the purposes it serves” and those “purposes are in the mind of the beholder” (p. 77). Thus, social institutions can only “adapt to the [ambivalent] ‘imperatives’” of technology (p. 77). This logic led to the focus of traditional Marxism on the necessity of state ownership of technology, as the means of production, as the route to social emancipation. Recognition of a neutral instrumental rationality could “guide management of society as a system” (p. 75), leaving no room for either individual or local agency. While Kanuka (2008) acknowledges the constraints of determinism and counsels her readers to develop a critical awareness of the limitations of deterministic positions, developing such awareness falls short of theorizing alternative, viable philosophical stances.
Feenburg’s Varieties of Theory
Limiting the discussion of technological philosophies-in-practice to technological determinism positions is unnecessary. Feenburg’s (1999) varieties-of-theory approach to framing philosophies of technology extends the field of available perspectives. Feenburg’s (1999) grid analysis, as illustrated in Table 1, posits fixed positions on technological autonomy and human agency that intersect with fixed technological perspectives on neutrality and value-ladenness. Feenburg’s (1999) varieties of theory grid provides alternatives to the limits of disconnected means-ends determinisms via considering technology as “a site of social struggle” (p. 82) where human “attitudes and desires crystallize around technical objects and influence their development” (p. 80). Introducing individual and social agency into consideration of technical philosophies-in-practice brings us to an instrumentalism perspective.
Instrumentalism
An instrumentalist position presumes a liberal faith in technology as a route to progress. The design process for instrumentalization involves four sequential sub-processes: 1) decontextualization, 2) reductionism, 3) autonomization, and 4) positioning. To “reconstitute natural objects,” or as is the case in educational technologies, to reconstitute teaching and learning activities in technical systems, they first must be “deworlded,” that is, “artificially separated from the context in which they are originally found” in order to be examined and understood in terms of their technical properties or “schemas” (Feenburg, p. 203). These de-worlded activities can then be simplified: “stripped of technically useless qualities, and reduced to those aspects through which they can be enrolled in a technical network” (p. 203) where simplified forms can be automated to produce formalized, quantifiable, desirable affordances. The design process is complete when the automated object or activity (for example, a learning management system or social networking software) is positioned in “the right place at the right time” for broad-scale adoption.
At this point, secondary instrumentalization is required. Individual automated events in virtual environments need to be systematically “reembedded into a natural environment” (p. 205). In order to increase the odds for systemic acceptance and adoption, technical artifacts are adorned with “ethical and aesthetic mediations” (p. 206). Ethical ornamentations tend to be condensed “considerations of efficiency”: fiscal opportunity
Table 1. Summary of modernist mapping exercise
Liberal/perennial school
  General aims: Seek truth, develop good & moral people
  Pedagogical foci: Transmission of liberal content » student acquisition of rational, intellectual & evolving wisdom; moral values; spiritual/religious dimension; aesthetic sense
  Phil. of Tech. alignment: Technological determinism
  Rationale for position on e-learning: e-Learning rejected on the basis of being too flexible and convenient, providing access to fragmented, standardized curricula & therefore subverting academic rigor
Progressive school
  General aims: Pragmatic orientation to personal growth, maintenance & promotion of better society
  Phil. of Tech. alignment: Uses determinism
  Rationale for position on e-learning: e-Learning can support learner-centered, personalized, problem-based pedagogies & allows for a facilitative role for teachers / tutors
Behaviouralist school
  General aims: Focus on effective, efficient, observable & measurable outcomes
  Phil. of Tech. alignment: Technological determinism
  Rationale for position on e-learning: e-Learning is useful for modular, automated content + quizzes, exams; supports economies of scale
Humanist school
  General aims: Support individual growth and self-actualization
  Phil. of Tech. alignment: Uses determinism
  Rationale for position on e-learning: e-Learning can support intrinsic motivations, development of self-concept / identity, reflection, & self-assessment
  Technologies: LMS discussion boards, self-assessment tools, chat, collaboration and social networking
Radical-critical school
  General aims: Evoke political, social & economic change through teaching and learning
  Phil. of Tech. alignment: Social determinism
  Rationale for position on e-learning: Carefully designed e-learning environments can allow power distribution to students; all forms of assessment are problematic
  Technologies: Avoid surveillance-equipped LMS; use discussion boards, chat, collaboration and social networking
Analytical school
  Phil. of Tech. alignment: Uses determinism
  Rationale for position on e-learning: LMS are valuable for disseminating lectures, moderating teacher/tutor-led dialogues
  Technologies: LMS, discussion boards, standardized assessment
and responsibility or proof-of-concept. Aesthetic style and packaging for distribution can be equated with good design. Subsequently, the new technology needs to become associated with advancing a vocation; in the case of educational technology, advancing teaching (pp. 206-207). Finally, through tactical initiatives, such as institutional strategies, communications, and support programs, consensus is gradually formed around social bonds, which lend the new technology the normativity of progress (pp. 207; 103). Thus, it is around the processes of instrumentalism that claims for the efficacies of e-learning technologies, such as those of behaviouralist comparative studies, tend to be made. As well, skeptical political, socio-economic, and pedagogical critiques of e-learning in general tend to coalesce around the instrumentalization process.
Substantivism
Where instrumentalism focuses on the production and dissemination of technologies, substantivism is occupied with the pervasiveness of technologies in contemporary societies, such as mobile phones and laptops, and the ways those technologies shape our lifeworlds. At its most gloomy, a substantive position, such as the one posited by Heidegger, suggests that technology “forms a culture of universal control” (Feenburg, 1999, p. 3). More measured approaches, such as the Frankfurt School’s concept of “technology as an ideology,” provide room for exploring “‘subjugated knowledges’ that arise in opposition to dominating rationality” (Feenburg, p. 8). These “subjugated knowledges” reveal “aspects of reality hidden from the hegemonic standpoint of science and technique” (p. 8). From the Frankfurt perspective, peripheral knowledges provide opportunities to redress the inequalities and social injustices of unchecked instrumentalization through “bounding it” (p. 151). Through resisting adoption of learning technologies, adherents to the liberal/perennial school of philosophy of education attempt to bound technological influences on their teaching praxis. However, it is only when “technology is differentiated from other social domains” that we perceive our interactions with it as something external (p. 209), which needs to be contained.
From a more optimistic substantive view, instrumentalization can be checked by developing our understanding of the genesis and history of technologies, enabling us to imagine differing, more democratic and participatory, approaches to current and future technological developments. The power to influence social and pedagogical change can be actualized inside the design and dissemination processes of instrumentalization. In short, from an optimistic substantivism perspective, researchers and teachers, “‘end users,’ need to be fully involved in the design and implementation” of e-learning technologies (Carmichael, 2003, p. 109) in order to align means with worthwhile educational ends. Engaging in shaping technologies for teaching and learning aligns well with the progressive school of educational philosophy.
Critical Theory and the Possibility of Alternatives
Like substantivism, critical theory acknowledges the value-ladenness of technology. However, critical theory admits a non-instrumentalist theory of agency (Feenburg, 1999, p. 105). Rather than perceiving “technological democratization [as] an administrative problem” (Feenburg, p. 105), critical theory recognizes the role of micro-politics in technical change. Micro-politics take the form of localized knowledge and action influencing technological change to ensure that new technologies fit well with local traditions in their natural settings. From a critical perspective, technical objects, systems, and environments are:
Not ‘things’ in the usual sense, but nodes in a network that contains both people and devices in interlocking roles. … Social alliances in which technology is constructed are bound together by the very artifacts they create. Thus social groups do not precede and constitute technology, but emerge with it. (Feenburg, 1999, p. 114).
From a critical perspective, understanding technology begins with understanding what makes it useful. Usefulness is defined as adaptability to multiple purposes through the process of concretization: “the discovery of synergisms between the functions technologies serve” and their “social and natural environments” (Feenburg, pp. 217-218). This view of technology affords “technical ‘pluriculture’” where the ‘same’ technology used in another culture becomes quite a ‘different’ technology. As academic disciplines can be defined as cultural settings marked by shared “practices, meanings, and discourse” (Mützel, 2009, p. 872), a technology used in one discipline becomes a different technology when it is deployed in an alternative discipline. Disciplinary adaptability, cultural flexibility, and political awareness provide space for adherents to humanist, analytical, and radical/critical philosophies of education to engage with educational technologies. From a critical stance “future technology is not a fate one must choose for or against, but a challenge to political and social creativity” (Feenburg, 1999, p. 225).
Through its recognition of context and culture, critical technological philosophy-in-practice provides HE with an option for addressing contentions between disciplinary ways of knowing and standardized e-learning systems, including embedded theories of teaching and learning. Rather than assuming that HE teachers and researchers who are reluctant to adopt institutionally sanctioned e-learning systems are by definition resistant techno-skeptics, a critical philosophy of technology allows room to consider which philosophies of teaching and disciplinary cultures are underserved by common technological options, thus opening new research questions on how to address these shortcomings across teaching and learning cultures.
In an effort to construct new bridges between technology and teaching philosophies-in-practice, I contend that it is plausible to extend Feenburg’s (1999) grid into a pair of intersecting continua, rather than a series of distinct positions (see Figure 2). Reframing opens the field of possible philosophical positions to include a postmodernist perspective on teaching-technology interactions via accommodating movement across and blending of perspectives to align with contingencies of context- and discipline-specific virtual learning communities. This fluidity creates a framework for analyzing influences of individual and local agency in adapting technologies to diverse and networked contexts.
Figure 2. Intersecting continua of Feenburg’s varieties of theory
A Brief Overview of Social Network Analysis and Actor Network Theory
The nature of networks is a pervasive interest in the study of teaching, learning, and technology. From a relational sociological perspective, investigating the nature of networks often begins with defining culture and structure. Mützel (2009) defines culture as referring to “practices, meanings and discourses” and structure as referring to “results of connected entities” (p. 872). Examining similarities and differences in the ways culture and structure are conceptualized provides Mützel’s bases for a comparison of social network analysis (SNA) and actor-network theory (ANT). Both SNA and ANT “consider the production of meaning as an activity of connecting,” “posit that actions are prior to actors,” take “human and non-human actors into account,” and focus on qualitative methods to explore relations and interpret practices (Mützel, 2009, p. 872). While SNA acknowledges the presence of non-human actors, e.g., concepts, categories, and events, as part of social networks, these non-human actors are considered unable to contribute to meaning making or knowledge construction. In contrast, ANT includes a proposition that human and non-human actors are “equally able to act” (Mützel, p. 872). Methodologically, ANT researchers privilege open approaches to data collection and analysis, allowing “actors studied to make connections;” conversely, SNA researchers assign “who and what counts for analysis” (p. 872).
Social Network Analysis (SNA)
SNA is marked by a transactional approach to examining networks, where relations between human and non-human actors are viewed as “dynamic in nature, as unfolding, ongoing processes rather than as static ties;” therefore, attempts to “posit discrete, pre-given units of analysis such as the individual or society” are rejected (Mützel, 2009, p. 874). Consideration of how meaning emerges from relational contexts, as well as “how relations create meaning” provides avenues for analyzing “how identities, relations, and their social formations” emerge as identities connect and disconnect across network domains or netdoms (Mützel, pp. 874-875). As actors strive to create social presences in netdoms, they seek allies through engaging in discourses. “These discursive interactions or stories” are open for interpretation and response from diverse listeners (p. 875). Over time, storytelling and story interpretations “make network relations explicit” (p. 875). This view that stories reveal networks provides rich ground for expanding the use of narrative approaches to researching virtual learning communities.
By switching from netdom to netdom, actors can engage in “reflective comparison and consequently generate perception and meaning;” therefore, switching becomes “the central mechanism providing for the emergence of meaning” (Mützel, 2009, p. 875). An implication of this centrality of switching, that is, navigating among netdoms, is a need for rethinking the current emphasis on within-community research on identity formation and knowledge construction in virtual learning communities. Research designs and data collection techniques for examining switching patterns, e.g., between formal and informal learning communities, would provide a broader context for such studies.
Actor Network Theory (ANT)
In ANT, networks are defined as diverse “chains of associations made up of multidimensional and evolving entanglements of human, non-human or collective actors,” all of which are referred to as actants (Mützel, 2009, p. 876). Researchers examine the linkages via which actants interact with other actants through activities:
Analytical focus is first on the multifaceted interconnections of a local, egocentric network of an actor, before moving onto the next connected bundle of entanglements. Eventually these shifts and redefinitions between one micro-network of associations to the next over space and time and add up to a larger narrative on transformations of ideas and practices. (Mützel, p. 876).
ANT studies do not seek to “add social networks to social theory, but to build social theory out of networks” (Latour, 1996, p. 369). As a result, ANT researchers reject a priori conceptions of groups or communities as these only become observable ex-post. An implication of adopting ANT for the study of formal virtual learning communities is a need to question the criteria of enrollment as evidence of community membership. Questions concerning weak and strong ties become more salient. Examining the clusters of micro-network linkages to personal, disciplinary, and professional networks to which members of formal virtual learning communities belong provides rich ground for developing new narratives on the development of epistemic fluency, academic and professional identities, as well as teaching and technology philosophies-in-practice.
Diversities across e-Learning Communities of Research and Practice
As e-learning is far too expansive a term on which to base anything but the broadest and most superficial discussion of issues and concerns around its influences in AE and HE, it is useful to consider diversities among e-learning sub-communities of research and practice. Technology enhanced learning (TEL), computer-supported collaborative learning (CSCL), blended learning (BL), and networked learning (NL) constitute four recognizable, ‘branded’ communities or discourses, which are collectively a sufficient sample from which to trace representative diversities across the field.
Technology Enhanced Learning (TEL) TEL stands apart from other e-learning subcommunities in that its origins in North America, the United Kingdom, and the European Community reside in government funded initiatives across educational sectors, rather than from within academic communities of research and practice. TEL has been defined broadly by funding agencies, and arguably, almost as problematically as e-learning in general. In terms of the technologies it can employ and in the modes of delivery via which technologies can be applied, TEL has been defined as: A variety of information and communications technologies to provide flexible, high quality learning opportunities for both on and off-campus students. Technologies include, for example, the Internet and Web-based applications, video and
Toward Diversity in Researching Teaching and Technology Philosophies-in-Practice
audio conferencing, CD-ROMS, videotapes and interactive television. Technology enhanced learning can be used to offer wholly ‘virtual’ online opportunities, can be multi-mode, employing a combination of technologies, or can be integrated with traditional classroom instruction or independent study courses. (EKOS, 2005, p. 2).
In its openness to technological diversity and interchangeability, as well as pedagogical neutrality, TEL can accommodate the full range of educational philosophies. While this philosophical fluidity might allow room for TEL to be linked to postmodern perspectives as a result of its “unpredictability, changeability, uncertainty, and ambiguity” (Elias & Merriam, 2005, p. 242), the absence of attention paid to “spontaneity, emotion, creativity, and a de-centered sense of localized action” (Elias & Merriam, 2005, p. 240) and its lack of consistent provision for community and connectivity deeply erode this linkage. The array of TEL technologies, applications, and purposes includes “wholly ‘virtual’ online” and “multi-mode” learning environments, as well as activities that are “integrated with traditional classroom instruction” or used to create “independent study courses” (EKOS, 2005, p. 2). TEL initiatives, in their variety of forms and purposes, tend to align with the determinist perspective that “technical means [are] neutral in so far as they merely fulfill natural needs” (Feenberg, 1999, p. 9). The focus of TEL on economic and political ends creates challenges for teachers and researchers who seek to engage in critical examinations of its roles in supporting diverse means across educational ends.
Computer-Supported Collaborative Learning (CSCL)
The term computer-supported collaborative learning (CSCL) “was first publicly coined at an international workshop in Maratea, Italy, in 1989. Since 1995, a biannual series of international CSCL conferences has been held in North America, Western Europe and most recently Asia” (Stahl & Hesse, 2006, p. 5). Throughout this time period, the interdisciplinary community of CSCL researchers and practitioners has been working to develop “theory, technology, research methods and educational practices that are specific to CSCL, and not simply inherited” from the diverse academic fields from which CSCL emerged, “such as artificial intelligence, educational and cognitive psychology, software development, instructional design” (Stahl & Hesse, 2006, pp. 3-4). To date, the CSCL community continues to “value a diversity of theories, methods, goals, disciplines, and approaches” (Stahl & Hesse, 2009, p. 233), while continuing to develop a shared understanding of “the essential processes of collaborative learning” (Stahl & Hesse, 2009, p. 235). As the community of CSCL researchers and practitioners includes diverse “theories, methods, goals, disciplines, and approaches” within the CSCL discipline, CSCL, like TEL, initially seems capable of accommodating fluidity across teaching philosophies. Again like TEL, CSCL may initially appear open to postmodern perspectives on education across sectors. However, the efforts of CSCL researchers to capture “the essential processes of collaborative learning” (Stahl & Hesse, 2009, p. 235) run contrary to the postmodern perspective that all forms of essentialism are flawed and “difference is not only desirable, but ineffaceable” (Feenberg, 1999, p. 13). The growth and innovation motivations underpinning the CSCL community’s quest to understand and capture “the essential processes of collaborative learning” may loosely associate it with a progressive orientation to teaching and learning. Its collaborative nature provides a firmer tie to the “collective dialogue, ideal speech, and critical questioning in a risk-free environment” (Kanuka, 2008, p. 108) aspects of the radical-critical school.
Unlike TEL, CSCL focuses on the collaborative, and therefore social, nature of technologically mediated learning. Through bringing together researchers and practitioners from such
diverse areas as “artificial intelligence, educational and cognitive psychology, software development, instructional design,” the CSCL community includes the social alliances that underpin both technical and educational choices. As there are “social interests at stake” in the design and development of collaborative educational technologies, the interests and epistemologies of individuals involved in these social alliances “are expressed in the technologies” they research, develop, and deploy (Feenberg, 1999, p. 11). “The process of ‘closure’ ultimately adapts” educational technologies to “socially recognized” criteria for success and utility; however, when the cycle of development is closed, the “social origins [may be] quickly forgotten” and “the artifact [may] appear purely technical, even inevitable” (Feenberg, 1999, p. 11). In its quest to continuously innovate and refine educational technologies, the CSCL community engages in the design processes of instrumentalism. CSCL researchers may adopt optimistic substantivism through recognizing ends as being “so implicated in the technological means employed to realize them that it makes no sense to distinguish means from ends” (Feenberg, 1999, p. 9).
Blended Learning (BL)
Littlejohn and Pegler (2007) define blended learning as encompassing a range of HE educational technologies with sufficient breadth to include: (1) access to a wide array of electronic resources, from which individuals can assemble a personal “blend”; (2) “studying online with tutors as facilitators” and “downloading content to mobile devices”; (3) uploading notes to a blog “when a lecture is in progress”; and (4) “seamless integration of physical and virtual learning spaces that integrate and accommodate technology, but focus on student learning” (p. 10). In contrast, Garrison and Kanuka’s (2004) “simple” definition of BL as the “integration of classroom face-to-face learning experiences with online learning experiences” (p. 96) is arguably a definition that marks most BL
practice in HE. The BL approach carries “the challenge of virtually limitless design possibilities and applicability to so many contexts” (Garrison & Kanuka, 2004, p. 96), which makes BL seemingly as susceptible to pedagogical and philosophical neutrality as TEL. At its best, BL “represents a fundamental re-conceptualization and reorganization of the teaching and learning dynamic, starting with various specific contextual needs and contingencies (e.g., discipline, developmental level, and resources)” (Garrison & Kanuka, 2004, p. 97). This more complex view of BL requires an in-depth examination of interrelationships between technologies and teaching to support the construction of learning environments that afford “interaction and [a] sense of engagement in a community of inquiry and learning” (p. 97). The dual foci on dialogue and community disassociate BL from the transmissive teaching missions of liberal and analytical schools, as well as the controlled, independent learning project of the behaviouralist school. BL’s recognition of and provision for specific contextual needs associate its design process with the progressive school principles of “learner-centered, personalized, and problem-solving applications of e-learning technologies” (Kanuka, 2008, pp. 103-104). Historical, theoretical links between communities of inquiry (CoI), the development of critical thinking, and BL notably align blended learning with radical-critical school values; e.g., “collective dialogue, ideal speech, and critical questioning in a risk-free environment” (Kanuka, 2008, p. 108). BL practices include sufficient latitude to accommodate “unpredictability, changeability, uncertainty, and ambiguity” (Elias & Merriam, 2005, p. 242) and to embrace postmodern values of creativity and connectivity. Blended learning bridges the paradox of “a simultaneous independent and collaborative learning experience” in that learners do not need to share space or time in order to learn within communities.
While these associations indicate a fairly strong tie between BL and postmodern approaches
to teaching and learning, the theoretical grounding of BL in the CoI framework and its research traditions of quality and effectiveness measures (Garrison & Arbaugh, 2007) for meeting “specific learning requirements” (Garrison & Kanuka, 2004, p. 97) loosen that tie considerably. Recognition of the interdependence of the social affordances of technologies and the design of teaching and learning environments firmly places BL on the human control side of the philosophy of technology continuum. In its pursuit of “formalized, quantifiable, desirable affordances,” BL incorporates instrumentalist qualities of optimistic substantivism. Concerns around provision for contextual needs in order to achieve pedagogical purposes link means to ends and loosely associate BL with technological critical theory.
Networked Learning (NL)
Networked learning (NL) is distinct from TEL and CSCL because the field “focuses on the connections between learners, learners and tutors and between learners and the resources they make use of in their learning” (Jones, Ferreday, & Hodgson, 2006, p. 1). This focus on connections includes a “relational view,” in which learning takes place in relation to others and also in relation to an array of learning resources (Jones, 2004; Jones & Esnault, 2004). NL practice does not “privilege any particular types of relationships, either between people or between people and resources” (Jones, Ferreday, & Hodgson, 2006, p. 1). Therefore, the practice of NL is distinct from TEL “approaches applied to the use of computers and digital networks in education, [and] communities of practice,” as well as from “computer supported collaborative learning” (Jones, Ferreday, & Hodgson, 2006, p. 1). From its original principles of developing and using reflexive, working knowledge, the theory and practice of NL have evolved into a praxis firmly based in “a social constructionist view that assumes that learning emerges from relational dialogue with and/or through others in learning communities” (Hodgson & Watland, 2004, p. 126). The NL community’s relational view of the process of learning, in which the production of meaning is a collaborative activity involving connecting people and resources, aligns well with SNA and ANT. Articulated respect for diverse individuals and cultural contexts, and recognition of diverse forms of knowledge, combined with engagement in teaching and learning practices that respectfully acknowledge learners’ lived experiences, perspectives, and voices, associate NL theorists and practitioners with broader postmodernist approaches to HE.
IMPLICATIONS FOR EDUCATIONAL PRACTITIONERS, RESEARCHERS, AND INSTITUTIONS OF HIGHER EDUCATION
Figure 3 theorizes a relative positioning of research, teaching and learning philosophies-in-practice for four representative e-learning communities across intersecting technological philosophies-in-practice continua. No doubt, further differences would emerge across a broader sample. Given the prevalence of the singularity paradigm in e-learning discourse, which currently marks policy and strategy directions, it is less than remarkable that “to date, most institutions of higher education can be described as lurching about” (Garrison & Kanuka, 2004, p. 104) in their efforts to integrate technology in meaningful and accepted ways into teaching practices. Pervasive, contentious debates around defining and implementing strategic directions and best practices precede the necessary and sufficient engagement of a diversity of e-learning communities of research and practice to create, explore, and experiment within and across continuously evolving communities of inquiry. Researchers and practitioners require time and resources to collaborate in developing context-specific, disciplinary-sensitive and interdisciplinary-aware, epistemologically and philosophically grounded, methodological approaches to investigating interrelated educational and technological praxes, which focus on rich and diverse community-defined ends, rather than institutionally prescribed means.
Figure 3. A theorized sample of teaching and technology philosophies-in-practice positions for e-learning research and practice communities
Software developers need to enter discourse and shared practice with HE teachers and researchers in order to pursue the “discovery of synergisms between the functions technologies serve” in HE and to design new systems that interlock “social and natural environments” and flexible educational ends (Feenberg, 1999, pp. 217-218). No doubt this recommendation seems a tall order in terms of time commitments for software developers, teachers, and researchers. However, in the absence of collaborative development, teachers spend copious amounts of time accommodating and circumventing system restraints, researchers are unable to access the full sets of data needed to extend theory, and software developers tend to focus on incremental improvements to existing environments rather than exploring transformative changes based on research- and practice-identified needs. Leaders in institutions of higher education need to adopt sufficiently broad strategic visions to abandon the unachievable and undesirable goal
of prescribed consistency in favour of diversity and research-based innovation and change. In each of these arenas, authentic discourse must remain sufficiently open to admit the plausibility that some educational and social ends can be better served in the absence of educational technologies.
FUTURE RESEARCH DIRECTIONS
Established technocracies rely “on the consensus that emerges spontaneously out of the technical roles and tasks in modern organizations” (Feenberg, 1999, p. 103). Any attempt at dismantling the e-learning singularity paradigm and its ethos of centralized, standardized control, including any concerted effort to stop and reflect upon the reflexive natures of the technologies e-learning communities of research and practice inhabit, poses threats to established technocratic interests. Extending the range of possible educational technologies through adopting diverse, polycultural, technical philosophies-in-practice will, no doubt, raise strong objections from some technological experts who “fear the loss of hard-won freedom from lay interference” (Feenberg, 1999, p. 76). It is probable
that HE leaders will defend the continuance of existing institutionally implemented e-learning systems and practices with arguments grounded in rationality, scalability, maintainability, and fiscal constraint. However, neither technical principles nor administratively orchestrated effectiveness and efficiency consensuses are sufficient to determine the design of the futures of polycultural e-learning communities in higher education. This philosophical ground can only be contested through local, ongoing action research programs, which open dialogues across schools of thought and which reject end-of-knowledge positions. I posit the theorized sample of relative philosophies-in-practice positions of e-learning research and practice communities in Figure 3 as a launching point for cross-community dialogues (switching among research and practice netdoms) and as a starting point for understanding the diversity of e-learning research endeavors.
CONCLUSION
In this chapter, I have proposed reframing the discourse on adoption of e-learning in higher education to include interrelationships between disciplinary ways of knowing, underpinning philosophies of teaching and technology, and the resultant degrees of alignment or disconnect with institutionally mandated e-learning systems. I have argued that technological systems need to become internal subjects in the discourse of community-based networked research, rather than continuing to be perceived as external objects of contention. Toward achieving this goal, I have: (1) critically examined Kanuka’s (2008) framework and recommendations; (2) extended the range of both teaching and technology philosophies-in-practice under consideration; (3) theorized a gestalt perspective on interrelationships between teaching and technology praxes; (4) examined four recognizable e-learning research and practice communities for associations with teaching and technology philosophies-in-practice; and (5) presented rationales and examples for further investigation of the diversities of e-learning environments and virtual learning communities. In this process, I have distinguished between modernist and postmodernist perspectives on teaching and learning. It is my hope that continued exploration will lead to a future where the persistence of e-learning communities in higher education is not a fate one must choose for or against, but a site for political, social, technological, pedagogical, and philosophical creativity directed toward ongoing understanding of dynamic, networked teaching and learning experiences.
REFERENCES
Abutarbush, S. M., Naylor, J. M., Parchoma, G., D’Eon, M., Petrie, L., & Carruthers, T. (2006). Evaluation of traditional instruction versus a self-learning computer module in teaching veterinary students how to pass a nasogastric tube in the horse. Journal of Veterinary Medical Education, 33(3), 447–454. doi:10.3138/jvme.33.3.447
Alcaly, R. E. (2003). The new economy: What it is, how it happened, and why it is likely to last. New York: Farrar, Straus & Giroux.
Anderson, T. (2008). Towards a theory of online learning. In Anderson, T. (Ed.), The theory and practice of online learning (pp. 91–119). Athabasca, Canada: Athabasca University Press.
Archer, W., Garrison, D. R., & Anderson, T. D. (1999). Adopting disruptive technologies in traditional universities: Continuing education as an incubator for innovation. Canadian Journal of University Continuing Education, 25(1), 13–44.
Barone, C. A. (2003). The changing landscape and the new academy. EDUCAUSE Review, 38(5), 40–47.
Bates, A. W. (2000). Managing technological change: Strategies for college and university leaders. San Francisco, CA: Jossey-Bass.
Bok, D. (2003). Universities in the marketplace: The commercialization of higher education. Princeton, NJ: Princeton University Press.
Carmichael, P. (2003). Teachers as researchers and teachers as software developers: How use-case analysis helps build better educational software. Curriculum Journal, 14(1), 105–123. doi:10.1080/0958517032000055983
Daniel, B., & Mohan, P. (2004, July). Reengineering the public university with reusable learning objects approach. Paper presented at the International Conference on Education and Information Systems: Technologies and Applications, Orlando, FL.
DiPaolo, A. (2003). Choices and challenges: Lessons learned in the evolution of online education. Presentation to the Association of Pacific Rim Universities’ 2003 Distance Learning and the Internet conference (Singapore, Dec. 2003). Retrieved September 15, 2007, from the National University of Singapore’s Website: http://www.cit.nus.edu.sg/dli2003/Presentation/Andy_DiPaolo.pdf
EKOS Research Associates. (2005, March). Review of the technology enhanced learning (TEL) action plan: Final report [Electronic version]. Edmonton, AB: Author. Retrieved January 5, 2008, from http://www.aee.gov.sk.ca/tel/pdf/final_report_review_tel_action_plan_mar_05.pdf
Elias, J. L., & Merriam, S. B. (1980). Philosophical foundations of adult education. Malabar, FL: Krieger.
Elias, J. L., & Merriam, S. B. (2005). Philosophical foundations of adult education (3rd ed.). Malabar, FL: Krieger.
Feenberg, A. (1999). Questioning technology. London: Routledge.
Gandolfo, A. (1998). Brave new world? The challenge of technology to time-honored pedagogies and traditional structures. New Directions for Teaching and Learning, 76, 23–38. doi:10.1002/tl.7602
Garrison, D. R., & Arbaugh, J. B. (2007). Researching the community of inquiry framework: Review, issues, and future directions. The Internet and Higher Education, 10, 157–172. doi:10.1016/j.iheduc.2007.04.001
Garrison, R., Kanuka, H., & Hawes, D. (2002). Blended learning in a research university. Learning Commons publication: University of Calgary.
Goodyear, P. (2002). Psychological foundations for networked learning. In Steeples, C., & Jones, C. (Eds.), Networked learning: Perspectives and issues (pp. 49–75). London: Springer-Verlag.
Greener, I., & Perriton, L. (2005). The political economy of networked learning communities in higher education. Studies in Higher Education, 30(1), 67–79. doi:10.1080/0307507052000307803
Hanna, D. E. (2000). Emerging organizational models: The extended traditional university. In Hanna, D. (Ed.), Higher education in an era of digital competition: Choices and challenges (pp. 93–116). Madison, WI: Atwood.
Harasim, L. (2006). A history of e-learning: Shift happened. In Weiss, J., Nolan, J., Hunsinger, J., & Trifonas, P. (Eds.), The international handbook of virtual learning environments (Vol. 1, pp. 59–94). Dordrecht, The Netherlands: Springer. doi:10.1007/978-1-4020-3803-7_2
Hemphill, D. F. (2001). Incorporating postmodernist perspectives into adult education. In Sheared, V., & Sissel, P. (Eds.), Making space: Merging theory and practice in adult education (pp. 15–28). Westport, CT: Bergin and Garvey.
Inbody, T. (1995). Postmodernism: Intellectual velcro dragged across culture? Theology Today (Princeton, N.J.), 51(4), 523–538.
Jonassen, D. H. (2004). Learning to solve problems: An instructional design guide. San Francisco, CA: Pfeiffer.
Jones, C. (2004). Networks and learning: Communities, practices and the metaphor of networks. ALT-J: Association for Learning Technology Journal, 12(1), 82–93. doi:10.1080/0968776042000211548
Jones, C., & Esnault, L. (2004). The metaphor of networks in learning: Communities, collaboration and practice. In Banks, S., Goodyear, P., Hodgson, V., Jones, C., Lally, V., McConnell, D., & Steeples, C. (Eds.), Networked Learning 2004: Proceedings of the Fourth International Conference on Networked Learning 2004 (pp. 317–323). Lancaster: Lancaster University and University of Sheffield.
Jones, C., Ferreday, D., & Hodgson, V. (2006). Networked learning, a relational approach – Weak and strong ties [Electronic version]. Retrieved January 5, 2008, from the Lancaster University Website: http://www.networkedlearningconference.org.uk/past/nlc2006/abstracts/pdfs/01Jones.pdf
Jones, C., & Steeples, C. (2002). Perspectives and issues in networked learning. In Steeples, C., & Jones, C. (Eds.), Networked learning: Perspectives and issues (pp. 1–14). London: Springer-Verlag.
Kanuka, H. (2008). Understanding e-learning technologies-in-practice through philosophies-in-practice. In Anderson, T. (Ed.), The theory and practice of online learning (pp. 91–119). Athabasca, Canada: Athabasca University Press.
Lancaster University. (2006). Lancaster e-learning strategy. Retrieved May 22, 2009, from http://www.lancs.ac.uk/celt/celtweb/files/eL%20Strategy%20Final%202006.pdf
Latour, B. (1996). The powers of association. In Law, J. (Ed.), Power, action and belief (pp. 264–280). London: Routledge & Kegan Paul.
Laurillard, D. (2008). Technology enhanced learning as a tool for pedagogical innovation. Journal of Philosophy of Education, 42(3-4), 521. doi:10.1111/j.1467-9752.2008.00658.x
Littlejohn, A., & Pegler, C. (2007). Preparing for blended e-learning. London: Routledge.
McConnell, D. (2006). E-learning groups and communities. Berkshire, UK: Open University Press.
Mützel, S. (2009). Networks as culturally constituted processes: A comparison of relational sociology and actor-network theory. Current Sociology. doi:10.1177/0011392109342223
Naylor, J. M. (2005, May). Learning in the information age: Electronic resources for veterinarians. In Parchoma, G. (Ed.), Large Animal Veterinary Rounds, 5, 5.
Nichols, M. (2008). Institutional perspectives: The challenges of e-learning diffusion. British Journal of Educational Technology, 39(4), 598–609. doi:10.1111/j.1467-8535.2007.00761.x
Njenga, J. K., & Fourie, L. C. H. (2008). The myths about e-learning in higher education [Early View, Electronic version]. British Journal of Educational Technology. Retrieved from Wiley Interscience. doi:10.1111/j.1467-8535.2008.00910.x
Oblinger, D. G. (2006). Games and learning: Digital games have the potential to bring back the learning experience. EDUCAUSE Review, 29(3), 5–7.
Olcott, D., & Schmidt, K. (2000). Redefining faculty policies and practices for the knowledge age. In Hanna, D., & Associates (Eds.), Higher education in an era of digital competition (pp. 258–286). Madison, WI: Atwood.
Parchoma, G. (2008). Adoption of technology enhanced learning in higher education: Influences of institutional policies and practices. Saarbrücken, Germany: VDM Verlag.
Peters, M. A. (2006). Towards philosophy of technology in education: Mapping the field. In Weiss, J., Nolan, J., Hunsinger, J., & Trifonas, P. (Eds.), The international handbook of virtual learning environments (Vol. 1, pp. 95–116). Dordrecht, The Netherlands: Springer. doi:10.1007/978-1-4020-3803-7_3
Schwier, R. A. (2001). Catalysts, emphases and elements of virtual learning communities: Implications for research and practice. The Quarterly Review of Distance Education, 2(1), 5–18.
Schwier, R. A. (2007). A typology of catalysts, emphases and elements of virtual learning communities. In Luppicini, R. (Ed.), Trends in distance education: A focus on communities of learning (pp. 17–40). Greenwich, CT: Information Age Publishing.
Schwier, R. A., & Daniel, B. (2006). Did we become a community? Multiple methods for identifying community and its constituent elements in formal online learning environments. In Lambropoulos, N., & Zaphiris, P. (Eds.), User-centered design of online learning communities (pp. 29–53). Hershey, PA: IGI Global.
Selwyn, N. (2007). Screw Blackboard... Do it on Facebook! An investigation of students’ educational use of Facebook. Retrieved January 26, 2009, from www.scribd.com/doc/513958/Facebook-seminar-paper-Selwyn
Stahl, G., & Hesse, F. (2006). ijCSCL: A journal for research in CSCL. International Journal of Computer-Supported Collaborative Learning, 1(1), 3–7. doi:10.1007/s11412-006-7867-6
Stahl, G., & Hesse, F. (2009). Classical dialogs in CSCL. International Journal of Computer-Supported Collaborative Learning, 4(3), 233–237. doi:10.1007/s11412-009-9071-y
Tenopir, C. (2003). Use and users of electronic library resources: An overview and analysis of recent research studies. Retrieved August 13, 2009, from http://www.clir.org/pubs/reports/pub120/pub120.pdf
UNESCO. (2002). Financing education – Investments and returns. Analysis of the world education indicators: 2002 ed. Executive summary. Paris, France.
Usher, R., Bryant, I., & Johnston, R. (1997). Adult education and the postmodern challenge: Learning beyond the limits. New York: Routledge.
Wagner, E. D. (2005). Enabling mobile learning. EDUCAUSE Review, 40(3), 40–53.
Whitehouse, K. (2005). Web-enabled simulations: Exploring the learning process. Educause Quarterly, 3(20)-29.
Wong, A. (2000). University-level prior learning assessment and recognition. Saskatoon, Canada: University of Saskatchewan.
Zurawski, N. (1996, June 24-28). Ethnicity and the Internet in a global society. Paper presented at the 1996 Internet Society of Canada conference. Retrieved January 26, 2009, from http://www.isoc.org/inet96/proceedings/e8/e8_1.htm
ADDITIONAL READING
Bates, T. (2008). Transforming distance education through new technologies. In Evans, T., Haughey, M., & Murphy, D. (Eds.), The international handbook of distance education (pp. 217–235). Bingley, UK: Emerald Press.
Bianco, M. B., & Carr-Chellman, A. A. (2007). Exploring qualitative methodologies in online learning environments. In Luppicini, R. (Ed.), Online learning communities (pp. 299–317). Charlotte, NC: Information Age Publishing.
Daniel, B., Schwier, R. A., & McCalla, G. (2003). Social capital in virtual learning communities and distributed communities of practice. Canadian Journal of Learning and Technology, 29(3), 113–139.
Dillenbourg, P. (1999). Introduction: What do you mean by “collaborative” learning? In Dillenbourg, P. (Ed.), Collaborative learning: Cognitive and computational approaches (pp. 1–19). Kidlington, Oxford, UK: Elsevier Science.
Friesen, N. (2009). Re-thinking e-learning research: Foundations, methods, and practices. New York: Peter Lang.
Goodyear, P., & Ellis, R. A. (2008). University students’ approaches to learning: Rethinking the place of technology. Distance Education, 29(2), 141–152. doi:10.1080/01587910802154947
Goodyear, P., & Zenios, M. (2007). Discussion, collaborative knowledge work and epistemic fluency. British Journal of Educational Studies, 55(4), 351–368. doi:10.1111/j.1467-8527.2007.00383.x
Hemmi, A., Bayne, S., & Land, R. (2009). The appropriation and repurposing of social technologies in higher education. Journal of Computer Assisted Learning, 25, 19–30. doi:10.1111/j.1365-2729.2008.00306.x
Hodgson, V., & Watland, P. (2004). The social constructionist case for researching networked management learning: A postscript and reply to Arbaugh and Benbunan-Fich. Management Learning, 35(2), 125–132. doi:10.1177/1350507604044186
Mason, R., & Rennie, F. (2008). Social networking as an educational tool. In E-learning and social networking handbook: Resources for higher education (pp. 1–24). London: Routledge.
Maxwell, J. W. (2006). Re-situating constructionism. In Weiss, J., Nolan, J., Hunsinger, J., & Trifonas, P. (Eds.), The international handbook of virtual learning environments (Vol. 1, pp. 279–298). Dordrecht, The Netherlands: Springer.
McConnell, D. (2006). E-learning groups and communities. Maidenhead, UK: Open University Press.
Palloff, R. M., & Pratt, K. (2007). Online learning communities in perspective. In Luppicini, R. (Ed.), Online learning communities (pp. 3–15). Charlotte, NC: Information Age.
Parchoma, G. (2005). Roles and relationships in virtual environments: A model for adult distance educators extrapolated from leadership in experiences in virtual organizations. International Journal on E-Learning, 4(4), 463–487.
Parchoma, G. (2009). Leadership strategies for coordinating distance education instructional development teams. In Kennepohl, D., & Shaw, L. (Eds.), Accessible elements: Teaching science at a distance (pp. 37–60). Athabasca, Canada: Athabasca University Press.
Polin, L. G. (2008). Graduate professional education from a community of practice perspective: The role of social and technical networking. In Kimble, C., Hildreth, P., & Bourdon, I. (Eds.), Communities of practice: Creating learning environments for educators (Vol. 2, pp. 267–285). Charlotte, NC: Information Age.
St. Clair, R. (2008). Educational research as a community of practice. In Kimble, C., Hildreth, P., & Bourdon, I. (Eds.), Communities of practice: Creating learning environments for educators (Vol. 1, pp. 21–38). Charlotte, NC: Information Age.
Wellman, B., Koku, E., & Hunsinger, J. (2006). Networked scholarship. In Weiss, J., Nolan, J., Hunsinger, J., & Trifonas, P. (Eds.), The international handbook of virtual learning environments (Vol. 2, pp. 1429–1448). Dordrecht, The Netherlands: Springer. doi:10.1007/978-1-4020-3803-7_57
KEY TERMS AND DEFINITIONS
Blended Learning: An approach that blends classroom face-to-face learning experiences with technologically mediated learning experiences in both integrated and distributed models, supported by a community of researchers and practitioners, who tend to promote the development of critical thinking skills through a community of inquiry framework.
Computer-Supported Collaborative Learning: An approach to developing technology-mediated, collaborative learning experiences, supported by an interdisciplinary community of researchers and practitioners whose goals include developing theory, technology, research methods, and educational practices to enhance collaborative learning.
e-Learning Singularity Paradigm: A representation of philosophically, technologically, and pedagogically diverse, technology-mediated approaches to teaching and learning as a homogenous set of researchable practices, from which best practices can be distilled and generalized across social, cultural and geographical contexts.
Gestalt of e-Learning: Recognition of reflexive, interrelated philosophical, technological, social, environmental, and pedagogical agencies within e-learning design and dissemination processes.
Networked Learning: A relational approach that focuses on the connections between learners, between learners and teacher, and between learners and resources, which does not privilege any particular relationship, either between people or between people and resources, and is supported by a community of researcher-practitioners, who are aligned with evolving, postmodern approaches to teaching praxis.
Philosophies of Teaching: Schools of thought on the underpinning purposes and goals of teaching and learning.
Philosophies of Technology: Schools of thought on the underpinning purposes and goals of technologies.
Philosophies-in-Practice: The actualization of philosophical stances in practice.
Technology Enhanced Learning: An approach to the provision of distance, blended, and classroom-based learning experiences through the use of a full range of information and communications technologies undertaken by communities of educational researchers, designers, information and communications technologists, and media specialists.
Section 2
Social Networks and Data Mining
Social Network Analysis (SNA) provides a range of models particularly well suited for mapping bonds between participants in virtual communities and thus revealing prominent members or subgroups. Section 2 of this volume presents four chapters focused on various aspects of social networks, ranging from more traditional social network analysis to more emergent dynamic explorations of phenomena using social networks.
Chapter 5 proposes a social network approach that uses a pattern discovery method to identify evolving patterns defined by constraints. In this work, constraints are parameterized by the user to drive the discovery process towards potentially interesting patterns, with the positive side effect of achieving a more efficient computation.
Chapter 6 proposes a mixed approach that ‘blends’ qualitative and quantitative methods in the use of social networks to investigate participation dynamics. The chapter describes the characteristics of this methodology and provides examples.
Chapter 7 presents an approach for analyzing semantic social networks intended to capture collective intelligence from collaborative interactions in discussions on requirements for Enterprise 2.0. The chapter provides evidence of testing tools and models using an anonymized dataset from Ipernity.com, one of the biggest French social web sites centered on multimedia sharing.
Chapter 8 describes a procedure for collecting data from Usenet newsgroups, deriving a social network created by participant interaction, and importing this relational data into social network software, where various cohesion models can be applied.
Chapter 5
Data Mining Techniques for Communities’ Detection in Dynamic Social Networks
Céline Robardet
Université de Lyon, France
INTRODUCTION
Social network analysis conceives social relationships in terms of graphs of interactions whose nodes represent individual actors within the networks and whose links represent social interactions such as ideas, friendship, collaboration, trade, etc. Virtual communities, and particularly online communities, are peculiar social networks whose analysis is facilitated by the fact that the network is in some sense monitored continuously. Social networks have attracted a large amount of attention from epidemiologists, sociologists, biologists and computer scientists, who have shown the ubiquitous role played by social networks in determining the way problems are solved or organizations are run. The study of such networks has proceeded along two main tracks: the analysis of graph properties, such as degree distribution, diameter or simple graph patterns such as cliques (Scherrer et al., 2008, Leskovec et al., 2005), and the identification of communities, which are loosely defined as collections of individuals who interact unusually frequently (Newman, 2004, Palla et al., 2005). Communities reveal properties shared by related individuals.
However, most of the interesting real-world social networks that have attracted the attention of researchers in the last few years are intrinsically time dependent and tend to change dynamically. As new nodes and edges appear while others disappear over time, it is essential to analyze in depth the evolution of such dynamic graphs. Furthermore, there is a crucial need for incremental methods that make it possible to find groups of associated nodes and to detect how these structures change over time.
Communities are loosely defined as highly connected subgraphs that are also isolated from the rest of the graph. Such properties can be captured by measures such as modularity (Newman, 2004), used to find disjoint communities forming a partition. The modularity of a given partition of nodes is the number of edges inside clusters (as opposed to crossing between clusters), minus the expected number of such edges if the graph was random conditioned on its degree distribution. Community structures often maximize the modularity measure. However, this measure has an intrinsic resolution scale, and can therefore fail to detect communities smaller than that scale and in general favors communities of similar size (Fortunato et al., 2007). Moreover, it has been shown (Brandes et al., 2008) that finding the community structure of maximum modularity for a given graph is NP-complete, and thus heuristics have been proposed that approximate this optimization problem.
Instead of directly looking for a global structure of the graph, such as a partition of the vertices, it can be more efficient to proceed in two steps. One might first compute subgraphs that capture locally strong associations between vertices and then use these local patterns to construct a global model of the graph’s dynamics. Such a framework provides more interesting patterns when the analyst can specify his or her preferences by means of constraints. Many techniques for pattern mining under local constraints (e.g., looking for frequent patterns, data dependencies) have been studied extensively over the last decade (Morik et al., 2005). One crucial characteristic of local pattern mining approaches is that the interestingness of a pattern can be computed independently of the other patterns. Such a framework enables the analyst to specify the a priori relevancy of patterns by means of constraints. Constraints have been identified as a key issue to achieve the tractability of many data mining tasks: useful constraints can be pushed deeply into the extraction process so that it is possible to get algorithms that are complete (every pattern which satisfies the user-defined constraint is computed) and yet efficient. Specific subgraphs defined by constraints have already been examined.
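The modularity measure described above can be made concrete with a small computation. The following is a minimal sketch, not the formulation of any cited paper: it assumes the commonly used normalized form of modularity (the fraction of edges inside clusters minus the fraction expected under random rewiring that preserves degrees), and the function name `modularity` and the toy graph are illustrative assumptions.

```python
def modularity(edges, communities):
    """Return the (normalized) modularity Q of a partition of an undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    communities: list of disjoint vertex sets covering every vertex.
    """
    m = len(edges)
    label = {v: i for i, block in enumerate(communities) for v in block}
    inside = [0] * len(communities)      # edges with both ends in the block
    degree_sum = [0] * len(communities)  # sum of vertex degrees in the block
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    # Observed fraction of internal edges minus the fraction expected when
    # edges are placed at random while keeping every vertex degree.
    return sum(inside[i] / m - (degree_sum[i] / (2 * m)) ** 2
               for i in range(len(communities)))


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q_split = modularity(edges, [{0, 1, 2}, {3, 4, 5}])    # one block per triangle
q_single = modularity(edges, [{0, 1, 2, 3, 4, 5}])     # everything in one block
```

On this toy graph the partition into the two triangles scores higher than the single-block partition, which is the behavior a community detection heuristic based on modularity tries to exploit.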
Fully connected subgraphs, also called cliques, are a local pattern type that has been considered as communities. Palla et al. (2005) consider that communities rely on several complete (fully connected) subgraphs of size k that share k-1 of their nodes. Such structures can be explored systematically with a deterministic algorithm. Although the clique is a popular pattern type that captures dense subgraphs, it fails to properly handle experimental data that are intrinsically noisy. Indeed, in such data, some links may be missing even in dense substructures. To cope with this problem, a relaxed definition of cliques has been proposed. Pseudo cliques are a natural extension of cliques: they are subgraphs obtained by removing a small number of edges from cliques, expressed as a proportion of the number of links the subgraph would contain if it were a clique. Thus, pseudo cliques are subgraphs with a density higher than a given threshold, and recent research results have shown that the constraints defining pseudo cliques can be efficiently used in a mining algorithm (Uno, 2007).
We extend this result to derive a new algorithm that extracts isolated pseudo-cliques and their evolution in time. We consider five basic temporal event types that are associated with the computed subgraphs: the formation, dissolution, growth, diminution and stability of such patterns. Such evolving patterns make it possible to describe the processes by which communities come together, attract new members, and develop over time. We propose an algorithm that mines such evolving patterns. The use of complete solvers allows us to answer constrained user queries without uncertainty. Algorithmic technical details can be found in (Robardet, 2009). In this chapter, we provide many more details and examples on how the proposed method identifies communities.
This chapter is organized as follows. The next section is dedicated to related work on the subject. It is followed by the presentation of the constraints that define the pattern types extracted in a static graph. Then, the evolving pattern types are introduced and an algorithm that mines them is presented. Some experimental results are then reported. Finally, some conclusions and future work close this chapter.
RELATED WORK
There is an increasing interest in mining dynamic graphs. Earlier work studied the properties of the time evolution of real graphs, such as densification laws and shrinking diameters (Leskovec et al., 2005), and the evolution of known communities over time (Backstrom et al., 2006). Other papers have focused on community extraction by means of constrained optimization (Chi et al., 2007), low-rank matrix approximation approaches (Tong et al., 2008), information theoretic principles (Sun et al., 2007) or combinatorial optimization problems (Tantipathananandh et al., 2007).
Another body of work considers constraint-based mining approaches to extract knowledge from static graphs. Efficient algorithms that compute maximal cliques have been proposed (Makino et al., 2004). Many papers propose to relax the clique property by allowing the absence of some links. Strongly self-referring subgraphs are defined in (Hamalainen et al., 2004) as sets of nodes S in which every node is connected to at least a given proportion of the nodes of S. Zhu et al. (2007) give a comprehensive study of the pruning properties of constraints on graphs. They study the pruning properties of structural constraints in graph mining, which achieve pruning on both the pattern search space and the data space. A general mining framework is proposed that incorporates these pruning properties. Pseudo clique mining, defined as the search for subgraphs having a density greater than a user-defined threshold, was first studied in (Pei et al., 2004), but the complete exploitation of the loose anti-monotonicity property of the pseudo clique constraint was only achieved in (Uno et al., 2007), where a polynomial delay algorithm that extracts all pseudo cliques is proposed.
Considering the extraction of patterns in dynamic graphs, Borgwardt et al. (2006) propose to apply frequent subgraph mining algorithms to time series of graphs to extract subgraphs that are frequent within the set of graphs. The extraction of periodic or near periodic subgraphs is considered in (Lahiri et al., 2008), where the problem is shown to be polynomial. Finally, the so-called change mining framework is proposed in (Böttcher et al., 2008) as an abstract knowledge discovery process based on models and patterns learned from a non-stationary population. Its objective is to detect and analyze when and how changes occur, including the quantification, interpretation and prediction of changes.
IDENTIFYING DENSE AND ISOLATED SUBGRAPHS IN A STATIC GRAPH
Let us first present the static pattern type we are interested in. Let $G=(V,E)$ be a simple undirected graph with a vertex set $V$ and an edge set $E$. The subgraph induced by a subset of vertices $S$ is the graph $G_S=(S,E_S)$ where $E_S=\{\{u,v\}\in E : u,v\in S\}$. The degree $\deg_S(u)$ of a vertex $u$ in the subgraph induced by $S$ is the number of vertices of $S$ adjacent to $u$, i.e., $\deg_S(u)=|\{v\in S : \{u,v\}\in E\}|$.
Subgraphs of interest are usually those made of vertices that have a high density of edges. If every pair of vertices in a subgraph is connected by an edge, the subgraph is called a clique. Such subgraphs have a density of 1, where density is the number of edges in the subgraph divided by the maximal number of possible edges. To relax this strong property, we can consider subgraphs with density higher than a user-defined threshold. Such subgraphs are usually called pseudo cliques or quasi cliques. Given a user-defined threshold $\sigma\in[0,1]$ and a set of nodes $S$ of size $n$, the subgraph $G_S=(S,E_S)$ induced by $S$ is a pseudo clique if and only if it is connected and $2|E_S|/(n(n-1))>\sigma$.
Constraint-based mining algorithms take advantage of the constraints to prune huge parts of the search space that cannot contain valid patterns. Pruning based on monotonic or anti-monotonic constraints has proved efficient on hard problems since, when a candidate does not satisfy the constraint, none of its generalizations or specializations can satisfy it either. Let us first remark that the pseudo clique constraint is not anti-monotonic with respect to the enumeration of induced subgraphs based on the inclusion of their vertex sets: expanding a set of nodes $S$ could make $2|E_S|/(n(n-1))$ increase or decrease. However, this constraint is loose anti-monotonic, that is to say, a pseudo clique can always be grown from a smaller pseudo clique with one vertex less (Zhu et al., 2007, Bonchi et al., 2007). Zhu et al. have shown that if S is a valid pseudo clique, then the set obtained by removing from S a vertex having the smallest degree on S is also a pseudo clique. Figure 1 illustrates this property: S={1,2,3,4,5} is a pseudo-clique with σ=2/3. If we remove node 2, which has the smallest degree on S, the resulting subgraph {1,3,4,5} is also a valid pseudo-clique.
Figure 1. Illustration of the loose anti-monotonicity of pseudo-cliques
Figure 2. Pseudo cliques with σ=0.7 (left), and pseudo cliques (σ=0.7) that are isolated (α=1) and maximal (right)
To be efficient, the pseudo clique enumeration process must exploit the pruning power of the loose anti-monotonicity of the pseudo clique constraint. It is clear that adding to a current pseudo clique $S$ the node $v$ that satisfies
\[ \deg_{S\cup\{v\}}(v) = \min_{u\in S\cup\{v\}} \deg_{S\cup\{v\}}(u) \tag{1} \]
leads to a pseudo clique, unless none of the supersets of S is a pseudo clique. Thus, an efficient algorithm enumerates nodes recursively by finding at each iteration the node v that satisfies Equation 1, and stops the enumeration if the obtained subgraph is not a valid pseudo clique. Note that if several nodes satisfy Equation 1, the one of smallest index is taken. This leads to a polynomial delay algorithm, that is to say, the time needed to generate each single pseudo clique is bounded by a polynomial in the size of the input graph. Uno (2007) proposes an algorithm that checks whether a subgraph is dense in constant time and finds the next vertex to be enumerated in $O(\max_{v\in V}\deg_S(v))$.
Pseudo cliques are local patterns that capture strong though not perfect associations in a graph. However, not all the pseudo cliques of a graph are of importance: some of them have many links to outside vertices, others are redundant. Figure 2 (left) illustrates this phenomenon: nine pseudo cliques have been extracted (σ=0.7) in the graph, and they are highly redundant. To select the most useful pseudo cliques, we consider two other constraints that coerce the patterns to be isolated and maximal. To pick out pseudo cliques S with few links to nodes outside S, we constrain the average number of outside links per vertex. This constraint is similar to the isolation constraints defined for formal concepts and their generalization in (Cerf et al., 2008). Given a user-defined threshold α∈ℝ, a subgraph S is isolated iff
\[ \frac{\sum_{u\in S}\left(\deg_V(u)-\deg_S(u)\right)}{|S|} \le \alpha \]
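The density and isolation constraints defined so far, together with the loose anti-monotonicity property, can be checked directly on a small graph. The sketch below is only illustrative: the graph is a hypothetical one built to mimic the situation described for Figure 1 (it is not the graph of the figure), all function names are assumptions, and the density test uses ≥ rather than the strict inequality of the definition so that σ=1 selects exactly the cliques.

```python
from itertools import combinations


def density(adj, s):
    """Edges inside s divided by the maximal number of possible edges."""
    n = len(s)
    inner = sum(1 for u, v in combinations(s, 2) if v in adj[u])
    return 2 * inner / (n * (n - 1))


def is_connected(adj, s):
    """Check that the subgraph induced by s is connected (depth-first search)."""
    start = next(iter(s))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()] & s:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == s


def is_pseudo_clique(adj, s, sigma):
    # Assumption: >= is used here instead of the strict > of the definition.
    return len(s) >= 2 and is_connected(adj, s) and density(adj, s) >= sigma


def is_isolated(adj, s, alpha):
    """Average number of links towards vertices outside s is at most alpha."""
    outside = sum(len(adj[u] - s) for u in s)
    return outside / len(s) <= alpha


def drop_min_degree(adj, s):
    """Remove from s a vertex of smallest degree in the induced subgraph."""
    weakest = min(s, key=lambda u: (len(adj[u] & s), u))
    return s - {weakest}


# Hypothetical graph: a clique on {1, 3, 4, 5}, vertex 2 attached to 1,
# and an outside vertex 6 linked to 2 and 5.
edge_list = [(1, 3), (1, 4), (1, 5), (3, 4), (3, 5), (4, 5),
             (1, 2), (2, 6), (5, 6)]
adj = {}
for u, v in edge_list:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

s = {1, 2, 3, 4, 5}
smaller = drop_min_degree(adj, s)  # removes vertex 2, the least connected one
```

With σ=2/3 the set `s` passes the density test (7 of 10 possible edges), and so does `smaller`, in line with the loose anti-monotonicity property. The two outside links of `s` give an average of 0.4 outside links per vertex, so `s` is isolated for α=1.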
Figure 2 (right) shows that, with α=1, a single isolated and maximal pseudo clique is extracted in the graph example. Such a pseudo clique has an average number of outside links per vertex lower than or equal to 1. Even though this constraint is also loose anti-monotonic, its combination with the high density constraint is not. The two constraints cannot be ensured at the same time by an algorithm that exploits both loose anti-monotonic constraints. Hence, we propose to ensure the new constraint in a post-processing of the previously computed pseudo cliques.
Extracting maximal patterns is even more difficult, since this constraint is global and requires enumerating supersets of a candidate to check whether it is maximal. A practical approach consists in extracting locally maximal isolated pseudo cliques. A subgraph S of size n is a locally maximal isolated pseudo clique if it satisfies the two constraints and no superset of S of size n+1 satisfies these two properties. With this constraint, the very large majority of non-maximal isolated pseudo cliques are removed, whereas the time complexity of the extraction remains the same.
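The local maximality test can be sketched as a post-processing filter that only looks at supersets with one extra vertex. This is a minimal illustration under stated assumptions: the validity test is passed in as a predicate (in the chapter it would be the conjunction of the density and isolation constraints), and the predicate `dense_enough`, the toy graph and the function names are all hypothetical.

```python
def is_locally_maximal(s, vertices, is_valid):
    """True if s is valid and no superset with one more vertex is valid."""
    s = frozenset(s)
    return is_valid(s) and not any(is_valid(s | {v}) for v in vertices - s)


def keep_locally_maximal(candidates, vertices, is_valid):
    """Post-processing filter over previously computed candidate patterns."""
    return [set(s) for s in candidates
            if is_locally_maximal(s, vertices, is_valid)]


# Hypothetical graph: a clique on {1, 2, 3, 4} plus vertex 5 attached to 4.
edges = {frozenset(e) for e in
         [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]}
vertices = frozenset({1, 2, 3, 4, 5})


def dense_enough(s, sigma=0.8):
    """Toy validity predicate: density only (no connectivity or isolation)."""
    n = len(s)
    inner = sum(1 for e in edges if e <= s)
    return n >= 2 and 2 * inner / (n * (n - 1)) >= sigma


result = keep_locally_maximal([{1, 2, 3}, {1, 2, 3, 4}, {4, 5}],
                              vertices, dense_enough)
```

Here `{1, 2, 3}` is discarded because adding vertex 4 still gives a valid pattern, whereas `{1, 2, 3, 4}` and `{4, 5}` are kept. Each test costs at most one validity check per vertex outside the candidate, which is why the filter stays cheap compared with a global maximality check.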
MINING EVOLVING SUBGRAPHS
Local pattern mining algorithms frequently provide a large and unstructured set of patterns that cannot be readily interpreted or exploited by the users (De Raedt et al., 2007). We propose to complement the first phase, where potentially interesting subgraphs are mined in static graphs, with a second phase, in which sets of patterns are post-processed to answer temporal queries on dynamic graphs.
We consider a dynamic graph $\hat{G}=(G^1,\ldots,G^T)$, which is a time series of T graphs, where $G^t=(V^t,E^t)$ is the graph with edges $E^t$ observed at time t among the vertices of $V^t$. The typical questions we want to consider are:
• Do the strong interactions observed at time t grow, diminish or remain the same over time?
• When do these subgraphs appear and disappear?
The objective here is to identify the temporal relationships that may occur between valid (i.e., locally maximal isolated) pseudo cliques. We denote by $C^t$ the set of subgraphs of $G^t$ that satisfy the constraints. We consider five basic temporal relationships between pairs of subgraphs from consecutive time stamps:
• Stability: S is said to stay the same at time t if it is a valid pseudo clique at time t and t-1: $S\in C^t \wedge S\in C^{t-1}$
• Growth: a subgraph S enlarges at time t if S is a valid pseudo clique at time t and a subpart of it forms a valid pseudo clique at time t-1: $S\in C^t \wedge \exists R,\ R\subset S$ such that $R\in C^{t-1}$
• Diminution: a subgraph S shrinks at time t if S is a valid pseudo clique at time t and is a subpart of a larger valid pseudo clique of time t-1: $S\in C^t \wedge \exists R,\ S\subset R$ such that $R\in C^{t-1}$
• Extinction: a subgraph S disappears at time t if it is a valid pseudo clique at time t-1 and if it is not involved in any previously defined pattern at time t: $S\in C^{t-1} \wedge \forall R$ such that $R\subseteq S,\ R\notin C^t \wedge \forall R$ such that $S\subseteq R,\ R\notin C^t$
• Emergence: a subgraph S emerges at time t if it is a valid pseudo clique in $G^t$ and if none of its subsets or supersets are valid pseudo cliques in $G^{t-1}$: $S\in C^t \wedge \forall R$ such that $R\subseteq S,\ R\notin C^{t-1} \wedge \forall R$ such that $S\subseteq R,\ R\notin C^{t-1}$
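The five temporal relationships above translate almost literally into set comparisons. The sketch below is a naive quadratic version written only to make the definitions concrete; it is not the incremental algorithm of the chapter. The function name `classify` is an assumption, and so is the precedence used when a pattern satisfies both the Growth and the Diminution conditions (Growth is tested first).

```python
def classify(previous, current):
    """Label the patterns of time t against the patterns of time t-1.

    previous, current: iterables of vertex sets (the valid pseudo cliques
    of time t-1 and t). Returns a dict mapping frozenset -> label.
    """
    prev = {frozenset(s) for s in previous}
    curr = {frozenset(s) for s in current}
    labels = {}
    for s in curr:
        if s in prev:
            labels[s] = "Stability"
        elif any(r < s for r in prev):   # a strict subset was valid at t-1
            labels[s] = "Growth"
        elif any(s < r for r in prev):   # a strict superset was valid at t-1
            labels[s] = "Diminution"
        else:
            labels[s] = "Emergence"
    for s in prev:
        # No subset and no superset of s is valid at time t.
        if not any(r <= s or s <= r for r in curr):
            labels[s] = "Extinction"
    return labels


# Hypothetical vertex sets for two consecutive time stamps.
previous = [{1, 9, 13}, {2, 6, 7, 10}, {3, 4, 7, 11}, {5, 8, 12}, {20, 21, 22}]
current = [{1, 9, 13}, {2, 6, 10}, {3, 4, 5, 7, 8, 11, 12}, {30, 31, 32}]
labels = classify(previous, current)
```

Every pattern of time t is compared with every pattern of time t-1, so the cost grows with the product of the two set sizes; storing the patterns in a structure that answers subset queries quickly avoids most of these comparisons.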
Those temporal relationships correspond to global constraints used to identify the dynamics of strong associations in graphs. We now present an incremental algorithm that processes each static graph sequentially. Inspired by the trie-based Apriori implementation (Bodon, 2005), we propose to use a trie data structure (prefix tree) to store valid pseudo-cliques. Indeed, finding evolving patterns requires the evaluation of subset queries over valid pseudo-cliques. Such queries are computationally expensive and require special attention.
Suppose that the pseudo-cliques of $C^{t-1}$ are stored in a trie T. Each node of T consists of the set S of all the vertices of the pseudo-clique, a list of temporal states, a list of pointers to other trie nodes and a list of time stamps. When a new valid pseudo-clique of $G^t$ is computed, its vertex set S is inserted in T recursively. Figure 3 illustrates this process. Figure 3-A corresponds to the trie that contains the 4 pseudo cliques of time stamp t-1: {1,9,13}, {2,6,7,10}, {3,4,7,11} and {5,8,12}. Figure 3-B corresponds to the insertion of the valid pseudo-clique {1,9,13} of time stamp t: starting from the root node, we first go to the child corresponding to the first vertex of S ({1}) and process the remainder of S ({9,13}) recursively for that child. The recursion stops on a node whose vertex set is either S, or a prefix of S:
1. In the first case, the temporal label “Stability” is appended to the temporal label list of the node and its time stamp is set to t.
2. In the latter case, the node gets a new child with vertex set S, time stamp t and temporal label “Emergent” (see Figure 3-C where the pseudo-clique {2,6,10} is inserted and Figure 3-D where {3,4,5,7,8,11,12} is inserted). Then we check whether S is involved in a growing evolving pattern. To do so, we have to retrieve all the subsets of S from T by means of the following doubly recursive procedure: we first go to the child corresponding to the first vertex of S and process the remainder of S recursively for that child, and second we discard the first vertex of S and process the remainder recursively for the node itself. If there exist subsets of S that belong to T with time stamp t-1, then the temporal state associated with S is changed into “Growth” (see Figure 3-E) and pointers to the corresponding subsets are stored in the list associated with the node. Those nodes are also tagged so that they are not considered in the following step.
Figure 3. Evolving subgraphs construction
Now, we need to check whether the pseudo-cliques of time stamp t-1 have shrunk (“Diminution”) or completely disappeared (“Extinction”). As tries are more effective at finding subsets than at finding supersets, a second traversal of the trie is performed when all the pseudo cliques of $C^t$ have been processed. For all the nodes with time stamp t-1 that are not involved in a “Stability” or “Growth” pattern, the function that searches subsets is triggered. If there exists a subset that belongs to $C^t$, the state of the first node is set to “Diminution” and pointers to the corresponding subsets are stored in the node list; otherwise the state is set to “Extinction”, the pattern is output and the node is removed from the trie. For example, Figure 3-C illustrates the insertion of the pseudo-clique {2,6,10} whose temporal label is “Emergent”. When all the pseudo-cliques of time stamp t are inserted, the second traversal of the trie is performed and the label of this node is set to “Diminution” (see Figure 3-F).
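The trie insertion and the doubly recursive subset search described above can be sketched as follows. This is a simplified illustration, not the implementation of Evolving-Subgraphs: each node keeps only its children and a single time stamp (no pointer lists or label histories), extinct patterns are removed logically rather than pruned, the “Diminution” label is attached to the shrunken pattern of time t as in the Figure 3-F example, and the names `TrieNode`, `insert`, `stored_subsets` and `step` are assumptions.

```python
class TrieNode:
    """Trie node; the path from the root spells a sorted vertex set."""

    def __init__(self):
        self.children = {}  # vertex id -> child TrieNode
        self.stamp = None   # last time stamp at which the set was valid


def insert(root, key):
    """Walk (or create) the path of the sorted tuple `key`; return its node."""
    node = root
    for v in key:
        node = node.children.setdefault(v, TrieNode())
    return node


def stored_subsets(node, items, prefix=()):
    """Yield (vertex_tuple, node) for each stored subset of `items`."""
    if not items:
        if node.stamp is not None:
            yield prefix, node
        return
    first, rest = items[0], items[1:]
    child = node.children.get(first)
    if child is not None:  # first recursion: keep the first vertex
        yield from stored_subsets(child, rest, prefix + (first,))
    yield from stored_subsets(node, rest, prefix)  # second: discard it


def step(root, previous, current, t):
    """Insert the patterns of time t and label them against time t-1."""
    labels, explained = {}, set()
    for s in current:
        key = tuple(sorted(s))
        node = insert(root, key)
        if node.stamp == t - 1:
            labels[key] = "Stability"
            explained.add(key)
        else:
            older = [p for p, n in stored_subsets(root, key)
                     if n.stamp == t - 1]
            labels[key] = "Growth" if older else "Emergent"
            explained.update(older)
        node.stamp = t
    # Second traversal: unexplained patterns of t-1 either shrank or vanished.
    for s in previous:
        key = tuple(sorted(s))
        if key in explained:
            continue
        newer = [p for p, n in stored_subsets(root, key) if n.stamp == t]
        for p in newer:
            if labels.get(p) == "Emergent":
                labels[p] = "Diminution"
        if not newer:
            labels[key] = "Extinction"
            insert(root, key).stamp = None  # logical removal from the trie
    return labels


# The vertex sets used in the description of Figure 3.
root = TrieNode()
previous = [{1, 9, 13}, {2, 6, 7, 10}, {3, 4, 7, 11}, {5, 8, 12}]
for s in previous:
    insert(root, tuple(sorted(s))).stamp = 0
current = [{1, 9, 13}, {2, 6, 10}, {3, 4, 5, 7, 8, 11, 12}]
labels = step(root, previous, current, 1)
```

On these sets, {1,9,13} is labeled “Stability”, {3,4,5,7,8,11,12} is labeled “Growth” because {3,4,7,11} and {5,8,12} are found among its stored subsets, and {2,6,10} is first labeled “Emergent” and then relabeled “Diminution” during the second traversal.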
EXPERIMENTATION
We evaluate the added value of Evolving-Subgraphs and the general characteristics of evolving patterns on three real-world dynamic networks: two dynamic sensor networks, imote and mit, and a dynamic mobility network, velov, the shared bicycle system of Lyon. The main characteristics of these datasets are presented in Table 1. All experiments were done on a Pentium 3 with 2 GB of memory running Linux.
Dynamic Sensor Networks
The two dynamic networks studied here are built from sensor measurements. The imote data set (Chaintreau et al., 2005) was collected during the Infocom 2005 conference. Bluetooth sensors were distributed to a set of participants who were asked to keep the sensors with them continuously. These sensors were able to detect and record the presence of other Bluetooth devices inside their radio-range neighborhood. The available data concern 41 sensors over a period of nearly 3 days (254151 seconds). The mit or Reality Mining experimental data set (Eagle et al., 2006) consists of records of Bluetooth contacts for a group of cell phones distributed to 100 MIT students over 9 months. Each cellular phone conducts a Bluetooth device discovery scan and records the identities of all devices present in its neighborhood at a sampling period of 300 seconds. For both data sets, the Bluetooth devices may discover any kind of Bluetooth object in their neighborhood. We have restricted our analysis to internal contacts only.
Note also that the sensors had no localization capability. Therefore we do not have information on the actual movements of individuals carrying the sensors or on the proximity of two given sensors. We study the imote dataset over a typical day and the mit data over a typical week. The number of edges and the average degree of those graphs are reported in Figure 4. We have carefully checked that the results obtained on these durations were similar for other periods.
Both imote and mit graphs are sparse (the number of edges is low), and the number of edges and the average degree exhibit large variations during daytime. To densify the graphs and cope with the flickering edge problem that may occur with experimental data, we aggregate the graphs over a period of 15 minutes for imote and 1 hour for mit: in both dynamic graphs, an edge exists if it appears at least once during the considered period. The resulting dynamic graphs have a maximum degree of 25 for imote and 22 for mit.
Table 1. Dataset characteristics
Dataset   Nb Edges   Nb timesteps   Avg. Density
Imote     11785      282            0.025
Mit       107770     11763          0.001
Velov     279208     930            0.003
Figure 4. Statistics of graph properties, displayed as a function of time (imote on the left and mit on the right)
We extract evolving subgraphs with several values of the density threshold σ, α being equal to 4.5 (average number of out-subgraph links per vertex) and the minimal size of the extracted locally maximal isolated pseudo cliques being set to 4 for imote and to 3 for mit. The total runtimes and numbers of computed patterns are presented in Figure 5. These figures show that Evolving-Subgraphs is tractable in terms of execution time, since it succeeds in extracting the patterns in less than 20 minutes for σ values varying between 1 and 0.6. The computational time is proportional to the number of output patterns, which was expected according to the theoretical study of the time complexity of the pseudo clique mining algorithm. The time required to compute evolving patterns generally decreases with σ, as does the number of extracted patterns.
The numbers of evolving patterns of each type are shown in Figure 6. As the number of emergent patterns scales differently from the other pattern types, their quantity is shown on the right ordinate axis, whereas the numbers of growth, stability and diminution patterns are plotted using the left ordinate axis. Even though the number of patterns decreases with the density threshold, we can observe that the number of patterns of each type varies irregularly. Figure 7 shows the number of patterns of each type at each time step. We can observe that the evolutions of these quantities are strongly correlated with the graph dynamics as depicted in Figure 4. The number of growth patterns is particularly correlated with the number of edges of the graph, whereas the number of emergent patterns is more regular across time.
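The aggregation used above to densify the sensor graphs (an edge exists in a window if the contact appears at least once during that window) can be sketched in a few lines. The input format, a list of `(timestamp_in_seconds, u, v)` contact records, and the function name `aggregate` are assumptions made for illustration; the actual file formats of the imote and mit datasets are not described in this chapter.

```python
def aggregate(contacts, window):
    """Group timestamped contacts into one undirected edge set per window.

    contacts: iterable of (timestamp_in_seconds, u, v) records.
    window: window length in seconds (e.g. 900 for 15 minutes).
    Returns a list of edge sets, one per consecutive window.
    """
    snapshots = {}
    for ts, u, v in contacts:
        # An edge is kept once per window, however often the contact repeats.
        snapshots.setdefault(ts // window, set()).add(frozenset((u, v)))
    if not snapshots:
        return []
    return [snapshots.get(i, set()) for i in range(max(snapshots) + 1)]


# Hypothetical contact records aggregated over 15-minute windows.
contacts = [(10, 1, 2), (20, 2, 1), (899, 2, 3), (900, 1, 3), (2700, 4, 5)]
graphs = aggregate(contacts, 900)
```

Windows in which no contact was recorded yield an empty edge set, so the index in the returned list is the time stamp of the aggregated graph.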
Figure 5. Runtime and number of extracted patterns (logarithmic scales) for imote (left) and mit (right) dynamic graphs for different density threshold σ
Figure 6. Number of patterns of each type for imote (left) and mit (right) dynamic graphs for different density threshold σ
Figure 8 shows the output of our method: nodes represent valid pseudo-cliques and the numbers they contain are vertex identifiers, solid arrows show evolving patterns and dashed arrows are drawn between consecutive subgraphs that intersect. We can identify three main groups of people. The first one is composed of individuals 9, 15, 31, 34 and 37. This group appears at time stamp 71, splits around time stamp 73 into two groups that then merge and integrate an additional vertex 5. The second group is made up of individuals 0, 4, 29 and 35. Individuals 1 and 33 are nearby. This group is stable since it remains unchanged during two consecutive time stamps. The third group contains individuals 2, 14, 19 and 25 and is also stable.
Figure 7. Number of patterns of each type at each time step for mit dynamic graph (σ=0.65)
Figure 8. Display of the evolving patterns that occur in the morning for imote, with σ=0.8, α=3 and a minimum subgraph size of 4
Shared Bicycle System VELOV
We analyze Lyon’s shared bicycle system VELOV on the basis of the data provided by JCDecaux, promoter and operator of the program. The dataset contains all the bicycle trips that occurred between the 25th of May 2005 and the 12th of December 2007. Each record is anonymized and consists of the date and time of the beginning of the trip and of its end, and the IDs of the departure and arrival stations (their geographical location being known). During this period, there were more than 13 million hired bicycle trips.
To analyze the velov dataset, we first aggregate the number of rentals for every day of the week and every hour over the two and a half year period of observation. We thus obtain 168 time stamps. Then, to retain the most important links, we remove the edges that had fewer than 50 rentals over this period. Figure 9 (left) shows the total number of extracted evolving patterns and the Evolving-Subgraphs runtime for several σ values, α being set to 5 and the minimum subgraph size to 3. Here again, we can observe that the number of extracted patterns increases with σ. Figure 9 (right) shows the distribution of the patterns among the different types of evolving patterns. The majority of the extracted patterns are emergent. The number of identical patterns can increase or decrease with σ: when a stable pattern disappears, usually a growth or diminution pattern appears.
Figure 10 displays the main patterns output by Evolving-Subgraphs when applied to the velov dataset for time stamps between Monday 6 PM and Tuesday 7 AM. The analysis of the output evolving patterns brings interesting pieces of information: for example, around Monday midnight, the identified patterns gather stations that are close to each other. The subgraph made of stations 58, 78 and 115, which are located on the largest campus of Lyon, shows that there are many trips between these stations. This pattern grows at 1 AM, with the addition of a neighboring station. Stations 187, 71 and 90 are around the main park of Lyon, also located in this area. Another important group of stations is the one made of stations 55, 84, 92 and 99, which are all located in the 7th district of the city, where many student rooms are.
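The preprocessing of the trip records into 168 hour-of-week graphs can be sketched as follows. Several choices here are assumptions rather than facts from the chapter: trips are assigned to the slot of their start time, the direction of a trip is ignored so that the graphs are undirected, trips that start and end at the same station are dropped, and the rental threshold is applied to the count accumulated in each slot (one possible reading of the filtering step). The record layout and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def hour_of_week(moment):
    """Map a datetime to one of the 168 (weekday, hour) slots; Monday 0h is 0."""
    return moment.weekday() * 24 + moment.hour


def weekly_graphs(trips, min_rentals=50):
    """Build one undirected edge set per hour-of-week slot.

    trips: iterable of (start_datetime, departure_station, arrival_station).
    """
    counts = Counter()
    for start, origin, destination in trips:
        if origin != destination:  # assumption: round trips carry no link
            counts[hour_of_week(start), frozenset((origin, destination))] += 1
    graphs = [set() for _ in range(168)]
    for (slot, edge), rentals in counts.items():
        if rentals >= min_rentals:  # keep only sufficiently frequent links
            graphs[slot].add(edge)
    return graphs


# Hypothetical trip records; a low threshold is used to keep the example small.
trips = [
    (datetime(2007, 1, 1, 18, 5), 58, 78),    # Monday, 6 PM
    (datetime(2007, 1, 8, 18, 40), 78, 58),   # the following Monday, 6 PM
    (datetime(2007, 1, 1, 18, 30), 78, 115),  # Monday, 6 PM, single rental
    (datetime(2007, 1, 2, 7, 30), 58, 115),   # Tuesday, 7 AM, single rental
]
graphs = weekly_graphs(trips, min_rentals=2)
```

In this toy run only the link between stations 58 and 78 survives, in the Monday 6 PM slot (index 18); each of the 168 resulting edge sets can then be mined as one static graph of the dynamic network.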
Data Mining Techniques for Communities’ Detection in Dynamic Social Networks
Figure 9. Runtime and number of extracted patterns for the velov dynamic network for different values of the density threshold σ (left); number of patterns of each type (right)
CONCLUSION
This chapter bridges the gap between constraint-based mining techniques and dynamic graph analysis. We have considered the evolving-pattern mining problem in dynamic graphs. We introduced five new pattern types, which rely on the extraction of dense subgraphs and the identification of their
evolution. We formalized this task in a local-to-global framework: local patterns are first mined in a static graph; then they are combined with the ones extracted in the previous graph to form evolving patterns. These patterns are defined by means of constraints that are used to mine the evolving patterns efficiently. Our experiments on real-life datasets show that our approach produces
Figure 10. Example of interesting subgraphs for velov network
high-quality patterns that are useful for understanding the graph dynamics. This technique can be of great interest for mining patterns of interactions in online communities, i.e. identifying groups of people that have strong social interactions and share some interest. Two main characteristics of our method make it a valuable tool for analyzing online communities. First, whereas most existing methods identify groups of interacting persons from a static point of view, here we propose to disclose how such groups emerge, attract new persons, or split, and disappear over time. This makes it possible to analyze the temporal evolution of the online communities’ structure and to keep track of the changes in the interests of the communities’ members. Second, the proposed method is incremental: for example, the graph of community member interactions can be updated every day; valid pseudo-cliques are then extracted from it and combined with the evolving patterns computed on the previous graphs. The global picture of the online communities is therefore kept up to date without considering all the previous time steps (which would quickly become intractable), but only the graph of the previous time step. This is an important feature of the method, which makes it usable over very long time periods.
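The incremental scheme summarized above can be illustrated with a deliberately simplified sketch. It assumes that the dense subgraphs of each time step have already been extracted and are given as sets of vertices, and it replaces the chapter's constraint-based definitions with plain set comparisons. The labels, the function names and the `extinct` category are illustrative assumptions. What the sketch does show is that each step needs only the patterns of the previous step, not the whole history.

```python
def classify_step(previous, current):
    """Label patterns between two consecutive time steps.

    `previous` and `current` are sets of frozensets of vertex ids.
    Simplified stand-in (assumption): patterns are related by plain
    set inclusion, not by the chapter's constraint-based definitions.
    """
    labels = {"stable": set(), "growth": set(), "diminution": set(),
              "emergent": set(), "extinct": set()}
    for pattern in current:
        if pattern in previous:
            labels["stable"].add(pattern)
        elif any(old < pattern for old in previous):
            labels["growth"].add(pattern)       # an older pattern gained vertices
        elif any(pattern < old for old in previous):
            labels["diminution"].add(pattern)   # an older pattern lost vertices
        else:
            labels["emergent"].add(pattern)
    for old in previous:
        # An old pattern with no subset/superset at the current step is gone.
        if not any(old <= new or new <= old for new in current):
            labels["extinct"].add(old)
    return labels


def track(snapshots):
    """Process snapshots one at a time, keeping only the previous step."""
    previous = set()
    history = []
    for current in snapshots:
        history.append(classify_step(previous, current))
        previous = current
    return history
```

Because `track` holds a single earlier snapshot in memory, the cost of one update does not depend on how many time steps have already been processed.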
REFERENCES Backstrom, L., Huttenlocher, D., Kleinberg, J., & Lan, X. (2006). Group formation in large social networks: membership, growth, and evolution. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 44-54). Philadelphia, PA, USA, August 20-23. New York: ACM Press.
100
Bodon, F. (2005). A trie-based Apriori implementation for mining frequent item sequences. In OSDM ‘05: Proceedings of the 1st International Workshop on Open Source Data Mining (pp. 5665). New York: ACM. Bonchi, F., & Lucchese, C. (2007). Extending the state-of-the-art of constraint-based pattern discovery. Data & Knowledge Engineering, 60(2), 377–399. doi:10.1016/j.datak.2006.02.006 Borgelt, C. (2003). Efficient implementations of Apriori and Eclat. In 1st Workshop of Frequent Item Set Mining Implementations. Borgwardt, K. M., Kriegel, H.-P., & Wackersreuther, P. (2006). Pattern mining in frequent dynamic subgraphs. In Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), Hong Kong, China (pp. 818-822), Washington, DC, USA. IEEE Computer Society. Böttcher, M., Höppner, F., & Spiliopoulou, M. (2008). On exploiting the power of time in data mining. SIGKDD Explor. Newsl., 10(2), 3–11. doi:10.1145/1540276.1540278 Brandes, U., Delling, D., Gaertler, M., Görke, R., Hoefer, M., Nikoloski, Z., & Wagner, D. (2008). On modularity clustering. IEEE Transactions on Knowledge and Data Engineering, 20(2), 172–188. doi:10.1109/TKDE.2007.190689 Cerf, L., Besson, J., Robardet, C., & Boulicaut, J.-F. (2008). Data-Peeler: Constraint-based Closed Pattern Mining in n-ary Relations. In Proceedings SIAM International Conference on Data Mining (SIAM DM) (pp. 37-48). Chaintreau, A., Crowcroft, J., Diot, C., Gass, R., Hui, P., & Scott, J. (2005). Pocket switched networks and the consequences of human mobility in conference environments. In WDTN ‘05: Proceedings of the 2005 ACM SIGCOMM workshop on Delay-tolerant networking (pp. 244-251). New York: ACM.
Data Mining Techniques for Communities’ Detection in Dynamic Social Networks
Chi, Y., Zhu, S., Song, X., Tatemura, J., & Tseng, B. L. (2007). Structural and temporal analysis of the blogosphere through community factorization. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, August 12-15, 2007 (pp. 163-172). New York: ACM Press. De Raedt, L., & Zimmermann, A. (2007). Constraint-based pattern set mining. In Proceedings SIAM SDM’07, Minneapolis, USA. Eagle, N., & Pentland, A. (2006). Reality mining: Sensing complex social systems. Journal of Personal and Ubiquitous Computing, 10(4), 255–268. doi:10.1007/s00779-005-0046-3 Fortunato, S., & Barthelemy, M. (2007). Resolution limit in community detection. Proceedings of the National Academy of Sciences of the United States of America, 104(1), 36–41. doi:10.1073/ pnas.0605965104 Hämäläinen, W., Toivonen, H., & Poroshin, V. (2004). Mining relaxed graph properties in internet. In P. T. Isaias, N. Karmakar, L. Rodrigues, & P. Barbosa (Eds.), Proceedings of the IADIS International Conference WWW/Internet 2004, Madrid, Spain (pp. 152-159). Lahiri, M., & Berger-Wolf, T. Y. (2008). Mining periodic behavior in dynamic social networks. In Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), December 15-19, 2008, Pisa, Italy (pp. 373-382). IEEE Computer Society, 2008. Leskovec, J., Kleinberg, J., & Faloutsos, C. (2005). Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, August 21-24, 2005 (pp. 177-187). New York: ACM Press.
Makino, K., & Uno, T. (2004). New algorithms for enumerating all maximal cliques. In Algorithm Theory - SWAT 2004, 9th Scandinavian Workshop on Algorithm Theory, Humlebaek, Denmark, July 8-10, 2004, Proceedings (LNCS 3111, pp. 260-272). Morik, K., Boulicaut, J.-F., & Siebes, A. (Eds.). (2005). Local Pattern Detection. In International Seminar, Dagstuhl Castle, Germany, April 12-16, 2004, Revised Selected Papers (LNCS 3539). Newman, M. E. J. (2004). Fast algorithm for detecting community structure in networks. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 69(6), 66–133. doi:10.1103/ PhysRevE.69.066133 Palla, G., Derenyi, I., Farkas, I., & Vicsek, T. (2005). Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043), 814–818. doi:10.1038/ nature03607 Pei, J., Jiang, D., & Zhang, A. (2005). On mining cross-graph quasi-cliques. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, Illinois, USA, August 21-24, 2005 (pp. 228-238). New York: ACM. Pensa, R., Robardet, C., & Boulicaut, J.-F. (2008). Constraint-driven Co-Clustering of 0/1 Data (pp. 123–144). CRC Press. Robardet, C. (2009). Constraint-based Pattern Mining in Dynamic Graphs. In S. Ranka & P.S. Yu (Eds.), Proceedings of the IEEE International Conference on Data Mining (pp. 950-955). Scherrer, A., Borgnat, P., Fleury, E., Guillaume, J.L., & Robardet, C. (2008). Description and simulation of dynamic mobility networks. Computer Networks, 52(15), 2842–2858. doi:10.1016/j. comnet.2008.06.007
101
Data Mining Techniques for Communities’ Detection in Dynamic Social Networks
Sun, J., Papadimitriou, S., Yu, P. S., & Faloutsos, C. (2007). Graphscope: Parameter-free mining of large time-evolving graphs. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, August 12-15, 2007 (pp. 687-696), San Jose, CA, USA.
Tong, H., Papadimitriou, S., Sun, J., Yu, P. S., & Faloutsos, C. (2008). Colibri: fast mining of large static and dynamic graphs. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA, August 24-27, 2008 (pp. 686-694). New York: ACM.
Tantipathananandh, C., Berger-Wolf, T. Y., & Kempe, D. (2007). A framework for community identification in dynamic social networks. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, August 12-15, 2007 (pp. 717-726). ACM.
Uno, T. (2007). An efficient algorithm for enumerating pseudo cliques. In Algorithms and Computation. In Proceedings of 18th International Symposium, ISAAC 2007, Sendai, Japan, December 17-19, 2007 (LNCS 4835, pp. 402-414).
102
Zhu, F., Yan, X., Han, J., & Yu, P. S. (2007). GPrune: A constraint pushing framework for graph pattern mining. In Z.-H. Zhou, H. Li, & Q. Yang (Eds.), Advances in Knowledge Discovery and Data Mining, 11th Pacific-Asia Conference, PAKDD, Nanjing, China, May 22-25, Proceedings (LNCS 4426, pp. 388-400).
103
Chapter 6
A Methodological Approach for Blended Communities: Social Network Analysis and Positioning Network Analysis
Susan Annese, University of Bari, Italy
Marta Traetta, University of Bari, Italy
ABSTRACT
The current diffusion of blended communities, characterized by the integration of online and offline interactions, calls for a methodological reflection on suitable approaches for exploring psychosocial dynamics in virtual and real communities. In this chapter we propose a mixed approach that ‘blends’ qualitative and quantitative methods: by combining qualitative content analysis with Social Network Analysis we investigate participation dynamics, and by employing this methodological combination in an original way we create an innovative method, called Positioning Network Analysis, to examine identity dynamics. We describe the characteristics of this methodological device and provide some examples to show the manifold uses of these original tools.
INTRODUCTION
Over the last few years, new models of communities, innovated by the use of technologies, have developed rapidly. Recently the integration of Computer Mediated Communication technologies into face-to-face communities has produced blended models of communities (Ligorio & Annese, in press; Ligorio & Sansone, 2009). They originate in educational contexts with Blended Learning (Bonk & Graham, 2006), but are now spreading to other contexts, particularly professional ones. The mixture of face-to-face and mediated interactions triggers psychosocial dynamics, such as the sense of belonging to the community and the subsequent identity construction process (Ligorio, Annese, Spadaro, & Traetta, 2008), that have considerable implications for the learning process.
Our concern for the psychosocial dimensions and psychoeducational implications of blended communities is framed in a socio-cultural perspective that defines learning as a social event deriving from participation in community life and affecting the construction of identity (Lave, 1991; Wenger, 1998). Our interest in the psychological processes emerging in blended communities engages us in a new research trend (Annese, Traetta & Spadaro, 2010; Ligorio et al., 2008) looking for appropriate methodological procedures.
The aim of this chapter is to propose a unique methodological approach to observing blended communities. It is distinctive because it does not simply mix qualitative and quantitative methodologies: it employs a quantitative tool in a qualitative way, that is, it tries to interpret a quantitative method qualitatively. In this sense it could be of interest to qualitative scholars who are looking for new methodological modes able to represent their epistemological attitude towards knowledge building.
BACKGROUND
Blended Learning between Community and Identity
The socio-cultural definition of learning as participation and acculturation (Bruner, 1966; Wenger, 1998), which stresses the importance of the social context in which meanings are negotiated, calls for a conceptual framework consisting of the construct of Communities of Practice (CoP) (Wenger, 1998) and the theory of the Dialogical Self (Hermans, 1996; 2001). The definition of learning as an intersubjective process, given by the involvement of participants in a meaningful interaction (Matusov, 2001), fits well with the perspective proposed by Wenger (1998): learning is a social process deriving from participation in community practices, a process that triggers an increasing sense of belonging to the community.
The concept of CoP helps to study learning by highlighting the community’s participation trajectories through the model of “Legitimate Peripheral Participation” (Lave & Wenger, 1991). A crucial node in this model is the degree of appropriation of the community’s culture, which affects the kind of participation of community members in social practices. Newcomers can move along a double participation trajectory: from the periphery to the centre and from the centre to the periphery. A newcomer can progressively increase his/her degree of appropriation of the community’s culture, passing from peripheral to central participation; he/she can also decrease it, passing from central to peripheral participation. For example, one of our studies, on the socialization process of a newcomer in a professional community (Traetta & Annese, in press), shows discursive markers of his central participation, such as repair mechanisms addressed to others’ turns (Schegloff, Jefferson & Sacks, 1977) or the use of the pronoun “we”, implying a strong sense of belonging to the community. At the same time some signals, such as conversational markers that soften strong statements, prove his persisting peripheral participation. Spadaro and Ligorio (2005) find that the objective of the interaction can influence the participation trajectories of newcomers. The different kinds of trajectory can depict a participation that is not always progressive and linear. In short, participation is a learning process mutually accomplished by individuals and community and supplying both of them with identity resources. They shape each other through the sense of belonging that springs from “common enterprises” (Wenger, 1998).
Their active involvement in social interactions leads community members to consider themselves as part of a unit; their strong “sense of community” (McMillan & Chavis, 1986) allows them to negotiate individual and collective identities. By participating, community members dialogically position and think themselves in
new ways; they interiorize new self positionings according to the specific situation and context (Davies & Harré, 1990); consequently the shifting of positionings makes the self’s organization dynamic. Harré and Van Langenhove (1991) illustrate how the situation’s aims strategically direct the choice of positionings. As a consequence this choice is not definitive and consistent (Smith, 1988). The shifting of positionings represents a resource to negotiate participation conditions in social interactions (Antaki & Widdicombe, 1998). New experiences increase the possibility of innovating the repertoire of positionings (Hermans, 1996; Hermans & Kempen, 1993), and educational contexts are a very rich source of new experiences, where participation in social activities activates the learning process, a sense of belonging to the community and the redefinition of identity positionings (Garrison, 2006). In other words, the exposure to different learning affordances and the acquisition of new skills produce a continuous and plastic adjustment of the identity trajectory (Cucchiara, Spadaro & Ligorio, 2008). The Dialogical Self Theory (Hermans, 1996; 2001) provides a clear explanation of the dynamic nature of identity, by proposing the idea of a self whose various aspects, settled in specific positions, are engaged in continuous dialogues. The dialogues give specific configurations to the self, depending on the particular situation and moment the individual is living. In this dialogical construction of the self, the community too builds a new collective identity: it is an intersubjective configuration in which each member is considered as a positioning of the collectivity (Ligorio & Spadaro, 2005). According to this conceptual framework a community can regularly interact in virtual settings, too. The essential feature of a community is not the sharing of a physical space, but the engagement in meaningful social activities.
In educational communities the sharing of a collaborative setting and the experience of active involvement in online learning activities “allows for mutual exploration
of ideas, a safe place to reflect and develop those ideas” (Palloff & Pratt, 2007), a community space triggering a strong sense of belonging to it. As a result this conceptual framework sheds light on group dynamics in online or offline communities, and even more so in blended communities. The latter particularly benefit from this theoretical background, as the mixture of virtual and real contexts can activate peculiar psychosocial dynamics in terms of participation and identity (Annese et al., 2010). These conceptual premises led us to identify a methodological device able to investigate these two features of blended communities.
A Methodological Review for Blended Communities
The need to study a blended community points us to a blended methodology (Creswell, 2003) that employs a quantitative method together with a qualitative one (Tashakkori & Teddlie, 1998). The integration of these two methods produces a quantified qualitative analysis, as it converts qualitative findings into statistical measures. We used this blended methodology for studying both participation strategies and identity dynamics. To study participation dynamics, the whole corpus of data was qualitatively content analyzed, and then Social Network Analysis was performed on the content analysis outcomes. To study identity dynamics, the integration of content analysis and Social Network Analysis was adapted to the conceptual framework of the Dialogical Self by creating a variant of the blended method, called Positioning Network Analysis. A brief review of the relevant literature about the two methods employed will clarify the mixed methodology we propose.
CONTENT ANALYSIS: FROM A QUANTITATIVE TO A QUALITATIVE APPROACH
The first method we used for studying both participation and identity dynamics is content analysis, a tool usually preferred by social scientists to examine communication processes (Holsti, 1968). It consists in breaking down messages into simple elements in order to code them according to thematic categories and to register their frequency (Ghiglione, 1980). Nevertheless the definition of content analysis is not univocal, as it depends on the theoretical evolution of the communication model. It develops from the analysis of the evident content of messages, in line with the informational model of communication, to the analysis of the text’s context, in line with the textual model of communication. The first definition, given by Berelson (1952), focuses on a specific aspect of the communication process: the message content. In fact he defines content analysis as a research technique for “the objective, systematic and quantitative description of manifest communication content” (Berelson, 1952, p. 18). The following definitions extend their object according to the complexity of communication acts. They address not only the message content but, above all, other aspects of the communication process such as the message senders and receivers. In line with this conceptual evolution Krippendorff (1980) provides a comprehensive definition of content analysis, interpreted as a research technique for drawing valid and repeatable inferences from the data to their context (Krippendorff, 1980, p. 21). Moreover the diffusion of information systems for managing considerable amounts of data has complicated the attempt to define content analysis clearly, driving some authors to define it as a set of research techniques that are often completely different from one another (Holsti, 1969). These varied techniques are distinguishable in terms of analysis
units, which generate three different kinds of analysis process (Rositi, 1970):
• In the first kind of analysis, the unit corresponds to simple elements of linguistic structure;
• In the second kind, units are identified through the content rather than linguistically;
• In the third kind, the analysis unit overlaps with the context unit, thus producing a treatment of data even in their extra-linguistic content.
This last kind of analysis, similar to an inquiry, considers the overall meaning of the text, interpreted by making research criteria explicit. The interpretative nature of the analysis distinguishes the qualitative approach to content analysis (Mayring, 2000) from the traditional quantitative one. Both are used to infer meaning from data content, but the interpretative procedure is different. The quantitative approach detects linguistic occurrences of content with a descriptive-inferential aim, whereas the qualitative one classifies the content according to the sense interpreted through the extra-linguistic context, with a theory-building aim. The qualitative approach can be defined as a “thematic analysis” (Boyatzis, 1998) in which qualitative data are encoded through specific thematic codes generated from the theory and integrated during the process of analysis. Current applications show an extensive use of the qualitative approach, as the interpretative process allows the researcher to make the latent contents of messages evident. This rationale makes qualitative content analysis an appropriate method for investigating psychosocial dynamics in a CoP. Specifically we used it for identifying participation links among blended communities’ members and positioning links in community members’ identity.
SOCIAL NETWORK ANALYSIS: FROM REAL TO VIRTUAL COMMUNITIES
Social Network Analysis (SNA) is a quantitative tool for mapping the relational framework of communities (Scott, 1997; Wasserman & Faust, 1994). The analysis of social networks is based on the concepts of network, link and structure, and has various fields of application: relationships among individuals and relationships among communities (families, organizations, countries). At the beginning it was implemented in real social contexts for studying both wide-ranging systems, such as economic and political structures (Snyder & Kick, 1979), and local systems, such as the relationships among organizations of the same metropolitan area (Galaskiewicz & Wasserman, 1989) or the relationships among individuals in the same company (Krackhardt, 1987). Later SNA was used for observing other phenomena such as occupational mobility, group decision-making, the introduction and diffusion of innovations, and the framework of social power and social influence. A famous study was carried out by Padgett & Ansell (1993) to explore the increasing expansion and social power of the Medici family in the 15th century through the identification of their relationships with other powerful Florentine families. At present the development of virtual communities has made the use of SNA salient for investigating social relationships in online contexts. The first scholars who employed it in virtual settings were Freeman & Freeman (1979), who observed networks of interactions among researchers through their e-mail exchanges. Then Garton, Haythornthwaite & Wellman (1997) imported the tool into collaborative virtual contexts, and Cho, Stefanone & Gay (2002) analyzed e-mail exchanges among students of the same university course. Nowadays SNA is proving to be a useful tool to study virtual learning communities and assess the learning process during the course and at the end
of it (Reffay & Chanier, 2002). It can identify dysfunctional behaviours of community members and allow the reorganization of group structure and activities, in order to improve the group’s effectiveness. The first studies only explored the number of exchanges in communities’ networks, but the use of SNA as a diagnostic tool to detect communities’ ineffectiveness requires the examination of the messages’ content, too. For this purpose, SNA has recently been employed in combination with content analysis in order to observe virtual groups engaged in collaborative learning settings (Martinez, Dimitriadis, Rubia, Gomez & de la Fuente, 2003). Our methodological proposal tries to employ qualitative content analysis to interpret SNA in a distinctive way. In fact our attention to psychosocial dynamics leads us to employ this methodological combination not only for the analysis of participation strategies, as traditional studies do, but also for the examination of identity positionings, through the creation of a specific procedure that we called Positioning Network Analysis (Traetta & Spadaro, 2008).
A Blended Methodological Device
In this chapter we propose the above-mentioned blended methodology, implemented to study the participation and identity dynamics in two blended communities. The research data are composed of the interactions of two students’ communities (group 1 and group 2) attending a blended course on E-learning Psychology at the University of Bari (IT) in two different academic years. The first group was made up of 11 participants, while the second group was composed of 15 students and split into two subgroups (A and B). During the course students were asked to attend offline classroom lessons and to participate in online activities hosted by the platform Synergeia (http://bscl.gmd.de/), designed to support collaborative learning processes. The course activities consisted of:
• Weekly offline lessons, during which the professor assigned a topic and explained it through key concepts, finally setting a relevant research question for student discussions in the online forum;
• A set of online activities, including group discussions in online forums. During the online activities, students played systematic roles such as e-tutor, the weekly discussion summariser, and critical friend, the evaluator of the group work during the week. The systematic distribution of these roles helps students to gain study and group work skills.
At the end of the course students participated in a face-to-face focus group discussion during which they reflected upon their experience of the blended course and discussed their learning process. For each group we analyzed online discussions (forums) and an offline discussion (focus group):
a) Group 1: an online discussion in a web forum (forum 1) and an offline focus group discussion (focus 1).
b) Group 2, divided into two subgroups (A and B): three online discussions (forum A and forum B, where the students of each subgroup interact within their subgroup, and forum 2, where the students of both subgroups interact together) and an offline focus group discussion (focus 2).
Research findings about their relational and identity networks are illustrated in full in another paper (Annese et al., 2010). Here we focus exclusively on the methodological device produced to examine the mediated and face-to-face interactions of the blended learning communities. We pursued the following objectives:
1. Describe individual participation strategies and the ensuing relational structure of the community;
2. Analyse identity construction in terms of positionings;
3. Observe the differential inquiry of online and offline contexts in participation strategies and identity dynamics.
In both offline and online environments, participation strategies were analysed by a qualitative use of SNA, whereas the same qualitative use was innovated by making the concept of “positioning” operational and giving rise to an original methodological tool, Positioning Network Analysis, to investigate identity trajectories.
SNA for Participation Strategies
In order to build the networks of social relations in the two different interactive environments, we observed the participation strategies in online and offline contexts in two stages. First of all we performed a qualitative content analysis to detect the communication links in participation, that is, to identify the speakers and recipients of each message. The preliminary step of this first stage is to choose the analysis unit accurately: the message, meant as a single communicative act. This introductory step allows the easy application of a purposely created grid in which categories correspond to participants. The identification of speakers is easier than that of recipients, particularly in a group context, where recipients can be more than one participant, and especially in online asynchronous discussions, where messages are seldom directed to an explicit addressee as they are posted in a public setting. To solve this problem we tailored the qualitative content analysis procedure through specific identification criteria:
1) The presence in the text of an explicit or implicit reference to a specific recipient;
2) The identification of multiple recipients through three indicators:
a. Explicit or implicit reference to multiple recipients;
b. Absence of reference to a specific recipient;
c. Explicit reference to the whole community.
Both criteria can show an explicit reference to the message recipient. Here is an example of the first identification criterion:
“Hi Claudia! […] I would agree with you about the research question of this week” (Debora)
But they can also show implicit references to the recipient, requiring a careful interpretation work of the message context. The following example concerns the first identification criterion:
“With regard to reasons that move a troll I suggest you a link. What do you think about it?” (Ilario)
“I didn’t know the definition of troll […] however I think that the behaviour of a troll can sometimes support the discussion in a forum” (Armando)
In Armando’s note we inferred that the recipient was Ilario through two textual markers: the reference to the same content of Ilario’s note (“the troll”) and Armando’s intention to provide an answer (“I think that”) to Ilario’s question.
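The identification criteria above amount to coding every message as a speaker plus a set of recipients. The following Python sketch shows one way such coded messages could be tallied into directed link counts and laid out as a speaker-by-recipient matrix. It is an illustration under assumptions, not the authors' grid: the marker `WHOLE_COMMUNITY`, the function names, and the decision to treat a message without a specific addressee as addressed to every other member are hypothetical, and the coding given to Ilario's note in the usage lines is invented for the illustration.

```python
from collections import Counter

WHOLE_COMMUNITY = "ALL"  # hypothetical marker a coder could use for indicator 2c


def resolve_recipients(speaker, coded, members):
    """Turn one coded recipient field into a list of recipient names."""
    if coded is None or coded == WHOLE_COMMUNITY:
        # Indicators 2b and 2c: no specific addressee, or the whole community.
        # Assumption: both are read as "every member except the speaker".
        return [m for m in members if m != speaker]
    # Criterion 1 or indicator 2a: one or several named recipients.
    return [m for m in coded if m != speaker]


def tally_links(messages, members):
    """Count directed links speaker -> recipient over all coded messages."""
    links = Counter()
    for speaker, coded in messages:
        for recipient in resolve_recipients(speaker, coded, members):
            links[(speaker, recipient)] += 1
    return links


def to_matrix(links, members):
    """Arrange link counts as rows (speakers) by columns (recipients)."""
    return [[links[(row, col)] for col in members] for row in members]


# Usage with the quoted notes (codings are illustrative, not the authors' data).
members = ["Debora", "Claudia", "Ilario", "Armando"]
coded_messages = [
    ("Debora", ["Claudia"]),   # explicit reference ("Hi Claudia!")
    ("Ilario", None),          # illustrative coding: no specific addressee
    ("Armando", ["Ilario"]),   # implicit reference inferred from context
]
links = tally_links(coded_messages, members)
matrix = to_matrix(links, members)
```

Keeping the recipient rule in a single function makes the interpretative choice explicit, so two coders who disagree about indicator 2b can change one line and compare the resulting counts.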
In both of the quoted examples there is a single recipient, but in many cases messages have multiple recipients, for whom it is also possible to find implicit or explicit references in the text. The construction of identification criteria allowed two different researchers to perform independent analyses on the whole data corpus, achieving a very high inter-rater reliability rate of 89.7% (Krippendorff, 1980). The results of the content analysis were arranged in adjacency matrices, in which each cell contains the number of communication links obtained by intersecting two participants (see Table 1).
The second stage of the participation inquiry was to perform Social Network Analysis: we imported the matrices, one for each discussion analyzed, into the software NetMiner 3. This software makes various options available to arrange the analysis according to specific research objects. We chose to employ three kinds of analysis that particularly fit our concerns about participation strategies:
a) The neighbour analysis (Wasserman & Faust, 1994), able to investigate the level of density among nodes/participants in the community;
b) The cohesion analysis (Wasserman & Faust, 1994), to identify “sub-structures” in the network, called “cliques”, as individuals can participate in the whole social structure through groups and subgroups;
c) The centrality analysis (Wasserman & Faust, 1994), to examine each node/actor’s centrality and his social power; participants
Table 1. Extract of a participation adjacency matrix (participants shown: Debora, Dario, Krizia, Mario; cell values as extracted, in order: 5, 1, 1, 7, 7, 5, 3, 2)
A Methodological Approach for Blended Communities
Figure 1. Graphical representation of cliques in group 1 – offline focus
having more links to others are in a central position. Each kind of analysis is based on specific indices. The neighbour analysis is based on the density index, the ratio between the number of observed links and the maximum number of possible links in the community (Wasserman & Faust, 1994). This index ranges from 0 (no links at all among participants) to 1 (every participant is linked to every other). The density index is calculated according to two criteria: the inclusiveness index and the nodal degree index. The first represents the percentage of community members involved in the interactions; the second gives the number of members with whom each participant is linked, and therefore identifies peripheral participants, such as isolated nodes or pendant nodes (members linked to a single participant), and central participants, who are linked to many other members. We used the density index to compare the degree of cohesion of the same group in the two interactive contexts of blended communities. In the two observed learning communities we identified different relational structures for the two settings: the online discussions show higher density index values (1.00 in the first community and 0.80 in the second one) than the offline discussions (0.80 in
the first community and 0.59 in the second one). These results show stronger and more evenly distributed links in online settings, indicating a more consistent and solid network in the online context of blended communities. The second kind of analysis, the cohesion analysis, is based on the number of cliques present in the community networks (Wasserman & Faust, 1994). Cliques are sub-structures composed of at least three completely interconnected nodes; the software reports the composition of each clique by participants’ names and in a graphical representation (see Figure 1). This kind of analysis supports the results obtained through the neighbour analysis. For example, peripheral participants such as Romina can be reintegrated into the larger community by belonging to a clique (clique 2) in which they can interact with central participants (Silvana, Ilario, Clara, Angelo), who play a mediating role as members of both cliques (see Figure 1). Finally, the centrality analysis is based on the concepts of centrality and prestige, which describe the position of each social actor in the relational network (Wasserman & Faust, 1994). The centrality index shows active participation in the community discourse by counting the messages a participant produces; the prestige index shows the social validation of participants by counting the messages they receive. By comparing
these two indices it is possible to derive the participation style of each participant. For example, some members play the role of counter-leaders, having a high centrality index in message production only. They give voice to dissent in the community, but they do not receive other participants’ validation, as the social confirmation of the community is directed towards recognized leaders. Leaders, in fact, are characterized by a balanced participation style that combines a high centrality index in message production with a high centrality index in message receipt. In this sense the community network is based on a positional logic that defines individuals in terms of social power and popularity. To define the relevance of each participant, centrality analysis can explore his/her involvement in communication exchanges through the degree centrality index. It is calculated from each participant’s nodal degree and ranges from 0 to 1. This score expresses his/her power as the ratio between the number of actual links and the number of possible links with other participants.
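The indices described so far can be illustrated with a minimal Python sketch. It is not the NetMiner 3 procedure used in the study: the coded messages are invented for illustration, the function names are hypothetical, and the sketch assumes directed, binarised links (a link exists if at least one message was sent), which the chapter does not specify.

```python
from collections import defaultdict

# Hypothetical coded messages: (sender, [recipients]). Not the study's data.
coded_messages = [
    ("Debora", ["Claudia"]),
    ("Ilario", ["Armando"]),
    ("Armando", ["Ilario"]),
    ("Claudia", ["Debora", "Ilario"]),
]


def build_adjacency(messages):
    """Count directed communication links: matrix[sender][recipient]."""
    matrix = defaultdict(lambda: defaultdict(int))
    actors = set()
    for sender, recipients in messages:
        actors.add(sender)
        for recipient in recipients:
            actors.add(recipient)
            if recipient != sender:  # ignore self-addressed messages
                matrix[sender][recipient] += 1
    return sorted(actors), matrix


def density(actors, matrix):
    """Observed directed links divided by the n * (n - 1) possible links."""
    n = len(actors)
    if n < 2:
        return 0.0
    observed = sum(
        1 for a in actors for b in actors if a != b and matrix[a][b] > 0
    )
    return observed / (n * (n - 1))


def degree_indices(actors, matrix):
    """Per actor: (normalised out-degree, normalised in-degree).

    Out-degree stands in for centrality in message production,
    in-degree for prestige (message receipt). Assumes at least two actors.
    """
    n = len(actors)
    indices = {}
    for a in actors:
        out_deg = sum(1 for b in actors if a != b and matrix[a][b] > 0)
        in_deg = sum(1 for b in actors if a != b and matrix[b][a] > 0)
        indices[a] = (out_deg / (n - 1), in_deg / (n - 1))
    return indices


actors, matrix = build_adjacency(coded_messages)
print("density:", round(density(actors, matrix), 3))
for name, (production, receipt) in degree_indices(actors, matrix).items():
    print(name, round(production, 3), round(receipt, 3))
```

Under these assumptions, a participant with a high production score and a low receipt score would correspond to the counter-leader style described above, while high scores on both would correspond to a leader.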
As centrality analysis outlines a participation trajectory for each actor, it is possible to compare trajectories in the two different contexts. The observed blended communities show different trajectories, which do not always amount to linear participation. Some members follow stable participation trajectories by activating the same strategies in both interactive contexts. For example, some students maintain the same popularity in both online and offline discussions; they are always crucial reference points for the whole community. These include Angelo, whose trajectory is stable in both forum (degree centrality index 1.00) and focus discussions (degree centrality index 1.00), identifying him as a central actor. Other students change their participation style according to the interactive environment, generating specific trajectories. For example, students like Romina, who are peripheral (0.571) in face-to-face discussion (see Figure 3), play a central role, like other members, in online discussion (1.00), being perfectly integrated in a balanced community participation network (see Figure 2).
Figure 2. Degree Centrality in participation of group 1 - online forum
Figure 3. Degree Centrality index in participation of group 1- offline focus
The different individual trajectories and the community structure influence each other. In fact, a cohesive and balanced community structure has neither central nor peripheral participants; on the contrary, an unbalanced and irregular community structure generates more central or more peripheral members. In addition, centrality analysis can observe each community through a centralisation index (Wasserman & Faust, 1994) to verify whether the relational network as a whole is built around central nodes/participants, as identified by the centrality index. High centralisation percentages occur when a few participants have a high centrality score, whereas low centralisation percentages occur when no participant has a high centrality score. In conclusion, the three kinds of analysis performed can fully represent the relational framework of communities: neighbour and cohesion analyses compose the participation network of the whole community, whereas centrality analysis portrays the contribution of each individual participation trajectory to the collective structure.
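The behaviour of the centralisation index can be sketched in Python. The sketch assumes Freeman's degree centralisation for an undirected, binary network, which is one common formulation and may differ from the variant NetMiner 3 reports; the two small networks and the function name are hypothetical and only reuse participant names for readability.

```python
def degree_centralization(actors, ties):
    """Freeman degree centralisation for an undirected binary network.

    Sums the gaps between the highest degree and every actor's degree,
    then divides by (n - 1) * (n - 2), the value reached by a star network.
    """
    neighbours = {a: set() for a in actors}
    for a, b in ties:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(actors)
    if n < 3:
        return 0.0  # the index is undefined for fewer than three actors
    degrees = [len(neighbours[a]) for a in actors]
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


# Illustrative networks, not the study's data.
actors = ["Angelo", "Romina", "Silvana", "Ilario"]

# Star: every link passes through one central participant.
star = [("Angelo", "Romina"), ("Angelo", "Silvana"), ("Angelo", "Ilario")]

# Complete network: everyone is linked to everyone else.
complete = [(a, b) for i, a in enumerate(actors) for b in actors[i + 1:]]

print(degree_centralization(actors, star))      # one dominant actor
print(degree_centralization(actors, complete))  # no dominant actor
```

The star network yields the maximum value and the complete network the minimum, which matches the reading given above: centralisation is high when a few participants hold the links and low when no participant stands out.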
PNA for Identity Trajectories
We also proposed the mixed use of qualitative content analysis and SNA for analyzing identity dynamics in the real and virtual contexts of the two observed blended communities. Our research aim was to compare participants’ identity trajectories in online and offline environments. For this purpose the mixed methodological device was adapted to the theoretical framework of the Dialogical Self by making the concept of “positioning” operational. We represented each positioning as a node in Social Network Analysis, thus generating a variant of SNA called Positioning Network Analysis (PNA) (Ligorio et al., 2008). In order to build identity networks for participants and community we performed three complementary stages of analysis: 1) Qualitative content analysis, 2) PNA,
3) Analysis of identity levels. In the first stage we needed to code data through the notion of “positioning” (Hermans, 1996; 2001), so we constructed a category grid of 15 theory-driven and data-driven positionings, clustered in 5 core categories (see Table 2). The different kinds of positioning represent our “thematic codes” (Boyatzis, 1998). Each category is a positioning, understood as the way in which the speaker positions himself/herself towards the community and his/her degree of involvement in it. To construct the category grid we first employed two theoretical distinctions: internal/external positionings (Hermans, 1996) and individual/collective positionings (Spadaro, 2008). Later, by observing the data corpus, we added new categories arising from the specific organization of the communities studied, such as the positionings connected with playing a formal role, or the different levels of collectivity produced by the division of the communities into subgroups. This rich taxonomy of positionings tries to represent the complexity of identity trajectories, which results from intersecting references to personal and social identity. Individual positionings mark aspects of personal identity; collective ones intersect aspects of social identity with community identity; together they represent a methodological tool for a psychosocial reading of Dialogical Self theory. For example, group positionings represent the belonging to a ‘we’ that is necessary to define the ‘I’. Similarly, interpersonal positionings mark a belonging to a ‘we’ given by the social relationship with another community member. Boundary positionings, by contrast, mark the need for a ‘we’ through a temporary distancing from the community; choosing a momentarily peripheral position constructs the sense of belonging to the community by contrast. Finally, multimembership positionings inside the same community indicate a ‘we’ given by the interaction of several senses of belonging.
Table 2. Grid of positioning categories
Core category | Category | Definition | Example
Individual positionings | internal | Emotions, ideas, interior aspects related to personal identity | “I think that…”
Individual positionings | external | Reference to experiences, people, places relevant for personal identity of the speaker | “I come from Valenzano…”
Individual positionings | open | Utterances in which doubtful positions of the self are expressed | “I do not know if I am a good tutor…”
Collective positionings | internal | Self descriptions as belonging to a “we” representing the whole community | “We meet in our Skype”
Collective positionings | external | Reference to experiences, people, places shared by the whole community, therefore relevant for the collective identity | “The Sereni’s lessons…”
Collective positionings | open | Utterances in which speakers express doubtful positions of the collective identity | “We had not understood…”
Collective positionings | internal related to subgroup | Self descriptions as belonging to a “we” restricted to a formal subgroup of the larger community | “we belong to group A…”
Collective positionings | internal related to formal role | Self descriptions as belonging to a “we” restricted to a subgroup of the larger community, composed of participants playing the same formal role | “we tutors”
Collective positionings | open related to formal role | Utterances in which speakers express doubtful positions about the collective identity related to the role playing | “we tutors could do it…”
Interpersonal positionings | direct | Explicit reference to one or more participants through the use of “you” | “As you said…”
Interpersonal positionings | indirect | Implicit reference to one or more participants through an indirect quotation | “As Dario said…”
Interpersonal positionings | direct related to formal role | Explicit reference to one or more participants playing the same formal role | “As you tutors said…”
Intergroup positionings | direct | Direct references to other subgroups | “you members belonging to group A”
Intergroup positionings | indirect | Indirect references to other subgroups | “As the group A said…”
Boundary positioning | | Linguistic expressions marking the member’s temporary estrangement from the community | “I think that…; what do you think about it?”
After the construction of the category grid it was possible to identify the links between eliciting and elicited positionings through a qualitative content analysis. Initially each message, in online and offline discussions, was coded according to a positioning category. After that, the content of each message was read carefully in order to identify, by examining the content of previous messages, the source message and the positioning in it that elicited the positioning of the current message. The results of the qualitative content analysis were arranged in adjacency matrices, in which each cell contains the number of links between two categories of positioning. Two independent analyses were performed on the whole data corpus, showing a high inter-coder agreement rate both in the coding phase (89.3%) and in the link identification phase (82%). In the second stage we adapted SNA to the conceptual framework of the Dialogical Self, creating the methodological tool of PNA. For this purpose the content analysis outcomes were imported into the software NetMiner 3 and treated through two different kinds of analysis. For them we chose two of the indices employed in participation
Figure 4. Identity repertoire in group 2- online forum
analysis, but their use in PNA gives them a completely different meaning in terms of identity dynamics: a) In PNA, the neighbour analysis (which in participation analysis investigates the level of cohesion among community participants) illustrates the complete repertoire of positionings used by each participant and by the whole community; b) In PNA, the degree centrality analysis (which in participation analysis explores each actor’s centrality and social power) identifies the positionings that are crucial for the Self because they are tied to most of the other positionings. The neighbour analysis enables a comparison of the repertoire of positionings in the real and virtual environments of blended communities, making the role of the context in identity dynamics salient. The identity repertoire is broader in the virtual context, but above all it is more compact, as there are no isolated positionings (nodes having no links with other nodes) and all positionings are connected to each other (see Figure 4). In this sense virtual settings can be considered “laboratories of identity experimentations” (Turkle, 1996), where all voices of identity can be expressed and the self can experience an effective multiplicity. On the contrary, in the real context the network of positionings is less uniform, particularly because there are three cut-off nodes (see Figure 5). It seems that the same community, in the real setting, does not allow the intertwining of multiple relations and selves. The centrality analysis examines the role of each positioning in more depth by identifying those that are most crucial for the organization of the repertoire. In both online and offline contexts the network of positionings appears to be organized around the
Figure 5. Identity repertoire in group 2- offline forum
Figure 6. Degree centrality index in positionings of group2- online forum
twofold pivot of personal and social identity. In fact, individual internal positionings and collective internal positionings are central in both networks: within each network they have the same score and the same importance, and the two interactive environments give the same relevance to the same positionings, albeit with different values.
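The two PNA readings described above (repertoire and central positionings) can be sketched in Python. This is an illustrative sketch, not the study's NetMiner 3 workflow: the list of coded links is invented, only a subset of the Table 2 categories is used, and the function names are hypothetical. It assumes links are treated as undirected when looking for isolated nodes and computing degree centrality.

```python
from collections import Counter

# A subset of the positioning categories from Table 2 (the nodes of the network).
POSITIONINGS = [
    "individual internal",
    "individual external",
    "collective internal",
    "collective open",
    "interpersonal direct",
    "boundary",
]

# Hypothetical coded links: (eliciting positioning, elicited positioning).
coded_links = [
    ("individual internal", "collective internal"),
    ("collective open", "collective internal"),
    ("individual internal", "interpersonal direct"),
    ("collective internal", "individual internal"),
    ("interpersonal direct", "individual internal"),
]


def positioning_network(categories, links):
    """Return the weighted adjacency counts and undirected neighbour sets."""
    weights = Counter(links)  # cell value = number of links between two categories
    neighbours = {c: set() for c in categories}
    for source, target in weights:
        if source != target:  # a category eliciting itself adds no new neighbour
            neighbours[source].add(target)
            neighbours[target].add(source)
    return weights, neighbours


def isolated_positionings(neighbours):
    """Positionings with no link to any other positioning."""
    return sorted(c for c, linked in neighbours.items() if not linked)


def degree_centrality(neighbours):
    """Share of the other positionings each positioning is tied to (0 to 1)."""
    n = len(neighbours)
    return {c: len(linked) / (n - 1) for c, linked in neighbours.items()}


weights, neighbours = positioning_network(POSITIONINGS, coded_links)
print("isolated:", isolated_positionings(neighbours))
for category, score in degree_centrality(neighbours).items():
    print(category, round(score, 2))
```

In this reading, a repertoire with no isolated positionings corresponds to the compact online network described above, and the positionings with the highest scores are the pivots around which the repertoire is organized.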
Figure 7. Degree centrality index in positioning of group2- offline forum
In the communities we observed, collectivity is as essential as individuality for the construction of the self. The sense of community is an integral part of individual identity, as shown by the strong presence of ‘social’ positionings in identity dynamics. In fact, when describing the trajectory of their self, community participants often refer to collective experiences that mark a strong sense of belonging: “I became an e-learning expert through the course that we followed”. In this statement the participant acknowledges the significance of belonging to the learning community, because through it he/she could develop e-learning skills and consequently reorganize his/her own identity around the new position of “e-learning expert”. Individual identity is intertwined with community identity, whose structure is particularly complex when the community is made up of various subgroups, several community levels and diverse senses of belonging. The dialogical interplay of individual and community identity is described by the third stage of analysis, in which the positioning trajectories were analyzed according to three identity levels:
individual, interpersonal and community. The individual level examines the dialogue between positionings inside a single individual, as in the following example: “Hi Claudia! I created a folder in order to facilitate our interactions…” (Debora) In this case the link between two positionings lies within the same note by a single speaker. The first positioning (“I created”), marking her status of student, an individual aspect of Debora’s identity, elicits the second one (“our interactions”), marking her belonging to the tutors’ role, a social aspect of Debora’s identity that Claudia shares, too. The interpersonal level, by contrast, reveals the dialogue between positionings of different social actors: “Nobody knows if we accomplished what professor required!” (Clara) “We tried to do it!” (Federica) In this example the link between positionings is interpersonal because it develops during the exchange between Clara and Federica. Clara’s open collective positioning (“if we accomplished”), expressing uncertainty about the group performance, elicits Federica’s collective positioning (“we tried”), which reassures her about the group’s engagement in the course. Finally, the community level connects all the individual and interpersonal positionings of the
community members, being the sum of them. The overall identity stems from the junction of personal and social aspects of the self. Again, the innovation of analysing identity levels is consistent with the conceptual framework of Hermans’ Dialogical Self, where dialogue represents the exclusive kind of relationship both inside the individual, whose different inner voices are connected in dialogic relationships, and outside the individual, where the voices of other social actors interact through dialogic relationships. This further innovation helps us to verify the interplay between individual and collective identity as it is built across the two different interactive contexts of blended communities. Results show that the interactive environment plays an important role in the construction of identity: in the online environment links between positionings of the same individual are more frequent than in the offline environment, where links between positionings of different people are more numerous. Therefore, in the online setting the building process revolves around the individual level, supported by the contributions of others; in the offline setting, the building process is organized around the interpersonal level, sustained by the intervention of individuality. In the virtual context ‘other’ voices contribute to the weft of the ‘self’, and individual identity is dialogically constructed because of its social nature; in the real context an exchange of experiences at the interpersonal level composes the self, and individual identity is dialogically built through a shared framework, because of its cultural dimension. In conclusion individual identity has a dialogical
Table 3. Frequency of positionings’ links in identity levels
Group | Setting | Individual level | Interpersonal level | Community level
group 1 | online | 55.30% | 44.70% | 100%
group 1 | offline | 40% | 60% | 100%
group 2 | online | 58% | 42% | 100%
group 2 | offline | 34.40% | 65.60% | 100%
form of construction that inevitably involves community identity. Finally, the study of identity dynamics in blended communities lets us maintain that the integration of online and offline interactions improves learning, which, as a social process, promotes changes not only in the appropriation of abilities but also, and above all, in the acquisition of identity resources (Ligorio et al., 2008).
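The split between identity levels reported in Table 3 can be illustrated with a short Python sketch. It assumes that each coded link records the speaker of the eliciting positioning and the speaker of the elicited positioning, that a link counts as individual when both speakers coincide and as interpersonal otherwise, and that the community level is the sum of the two. The sample links and the function name are hypothetical and do not reproduce the study's data.

```python
def identity_level_shares(links):
    """Percentage of links at the individual and interpersonal levels.

    Each link is (speaker of eliciting positioning, speaker of elicited one).
    The community level is the sum of both levels, hence always 100%.
    """
    total = len(links)
    if total == 0:
        return {"individual": 0.0, "interpersonal": 0.0}
    individual = sum(1 for source, target in links if source == target)
    return {
        "individual": 100 * individual / total,
        "interpersonal": 100 * (total - individual) / total,
    }


# Hypothetical coded links for one discussion.
sample_links = [
    ("Debora", "Debora"),    # link inside a single note: individual level
    ("Clara", "Federica"),   # exchange between two speakers: interpersonal level
    ("Debora", "Debora"),
    ("Federica", "Clara"),
    ("Clara", "Clara"),
]

shares = identity_level_shares(sample_links)
print(shares)
```

Computing these shares separately for the online and offline discussions of each group would give a table with the same layout as Table 3.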
STUDY LIMITATIONS AND FUTURE PERSPECTIVES
The blended methodological device proposed in this research combines SNA and PNA to take an accurate picture of a specific moment in the observed event. This quality of our methodological proposal is also its weakness, as the accurate picture it produces is static. Our methodological reflection is looking for a way to turn this weak point of the proposed device into a strength. We are considering the possibility of completing our device with a discursive contribution: future research could integrate the blended methodology with a discursive approach. An investigation of the discursive device could improve the understanding of relational mechanisms (Lamerichs, 2003) by conveying the dynamic nature of psychological processes. The discursive tool could overcome this limit by depicting social activities in progress. We are currently working to identify the most appropriate discursive markers to support this research direction. This is the principal weakness of our study, but we are aware that there are other limitations. For example, another weak aspect is the number of community participants: it is not easy to investigate the participation and identity networks of large communities through our methodological device. In particular, it is hard to perform an accurate qualitative content analysis on a large number of community members, whose communication exchanges multiply accordingly. To observe them our methodological approach requires a greater amount of effort and resources. This is not impossible, however: qualitative content analysis, too, can be supported by information systems that make the identification and coding of thematic categories in a huge dataset consistent and reliable. Awareness of the weaknesses of our methodological proposal is the way to make it stronger, as each limitation generally opens the way to new research potential.
CONCLUSION
The emergence of blended communities draws our attention, on the one hand, to the unexpected psychological processes they trigger and, on the other, to the unique methodological procedures they require. This new social space needs careful elaboration not only of the psychosocial implications of the educational process, but also of the appropriate methodology for observing it. Through an approach combining social and educational concerns, this study has tried to initiate a new line of research in which an innovative use of Social Network Analysis, mixed with qualitative content analysis, has produced an original tool called Positioning Network Analysis. The blended device proposed here, with its limitations and potential, could represent a useful attempt to extend the range of methods and techniques for studying different kinds of community. It can be used to investigate exclusively virtual or exclusively real communities; in the methodological debate, it focuses on the possibility of integrating qualitative and quantitative tools; and finally, it makes some conceptual constructs operational by bridging theories and methods, research and practice.
REFERENCES
Annese, S., Traetta, M., & Spadaro, P. F. (2010). Blended learning communities: relational and identity networks. In Park, J., & Abels, E. G. (Eds.), Interpersonal Relations and Social Patterns in Communication Technologies: Discourse Norms, Language Structures and Cultural Variable (pp. 256–276). Hershey: IGI Global. doi:10.4018/978-1-61520-827-2.ch014
Antaki, C., & Widdicombe, S. (1998). Identities in talk. London: Sage.
Berelson, B. (1952). Content analysis in communication research. Glencoe: Free Press.
Bonk, C. J., & Graham, C. R. (Eds.). (2006). Handbook of blended learning: Global perspectives, local designs. San Francisco, CA: Pfeiffer Publishing.
Boyatzis, R. E. (1998). Transforming qualitative information: thematic analysis and code development. Thousand Oaks, CA: Sage.
Bruner, J. (1966). The culture of education. Cambridge, MA: Harvard University Press.
Cho, H., Stefanone, M., & Gay, G. (2002). Social Network of Analysis of Information Sharing Networks in CSCL Community. In G. Stahl (Ed.), Computer-Support for Collaborative Learning: Foundations of a CSCL community. Proceedings of CSCL 2002 (pp. 43–50). Hillsdale, NJ: Lawrence Erlbaum Associates.
Creswell, J. W. (2003). Research Design: Qualitative, Quantitative, and Mixed Methods Approaches (2nd ed.). Thousand Oaks, CA: Sage.
Cucchiara, S., Spadaro, P. F., & Ligorio, M. B. (2008). Identità e comunità in contesti collaborativi: un’esperienza blended universitaria. Qwerty, 1, 23–48.
Davies, B., & Harré, R. (1990). Positioning: the discursive production of selves. Journal for the Theory of Social Behaviour, 20, 43–63. doi:10.1111/j.1468-5914.1990.tb00174.x
Freeman, S. C., & Freeman, L. C. (1979). The networkers network: A study of the impact of a new communications medium on sociometric structure. Social Science Research Reports, No 46. Irvine, CA: University of California.
Galaskiewicz, J., & Wasserman, S. (1989). Mimetic and normative processes within an interorganizational field: An empirical test. Administrative Science Quarterly, 34, 454–480. doi:10.2307/2393153
Garrison, J. (2006). Learning identity: the joint emergence of social identification and academic learning, by Stanton Wortham (Book review). Educational Studies: A Journal of the American Educ. Studies Assoc., 40(3), 327–331.
Garton, L., Haythornthwaite, C., & Wellman, B. (1997). Studying Online Social Networks. JCMC, 3(1). Retrieved June 25, 2007, from http://www.ascus.org/jcmc/vol3/issue1/garton.html
Ghiglione, R. (1980). Manuel d’analyse du contenu. Paris: Colin.
Harré, R., & Van Langenhove, L. (1991). Varieties of positioning. Journal for the Theory of Social Behaviour, 21, 393–407. doi:10.1111/j.1468-5914.1991.tb00203.x
Hermans, H. J. M. (1996). Voicing the self: From information processing to dialogical interchange. Psychological Bulletin, 119(1), 31–50. doi:10.1037/0033-2909.119.1.31
Hermans, H. J. M. (2001). The dialogical self: Toward a theory of personal and cultural positioning. Culture and Psychology, 7, 243–281. doi:10.1177/1354067X0173001
Hermans, H. J. M., & Kempen, H. J. G. (1993). The dialogical self: meaning as movement. San Diego: Academic Press.
Holsti, O. R. (1968). Content Analysis. In Lindzey, G., & Aronson, E. (Eds.), The Handbook of Social Psychology (pp. 587–692). Cambridge: Addison-Wesley.
Holsti, O. R. (1969). Content Analysis for the social sciences and humanities. Reading, MA: Addison-Wesley.
Krackhardt, D. (1987). Cognitive social structures. Social Networks, 9, 109–134. doi:10.1016/0378-8733(87)90009-8
Krippendorff, K. (1980). Content Analysis: An Introduction to Its Methodology. Newbury Park, CA: Sage.
Lamerichs, J. (2003). Discourse of Support: Exploring Online Discussions on Depression. Thesis, Wageningen University.
Lave, J. (1991). Situated Learning in Communities of Practice. In Resnick, L., Levine, J., & Teasley, S. (Eds.), Perspectives on Socially Shared Cognition (pp. 63–82). Washington, DC: American Psychological Association. doi:10.1037/10096-003
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate Peripheral Participation. Cambridge, GB: Cambridge University Press.
Ligorio, M. B., & Annese, S. (in press). Blended activity design approach. A method for innovating e-learning communities in higher education. In Rowiński, T. (Ed.), Blachnio, A., Przepiorka, & A. M.A. Internet in Psychological Research.
Ligorio, M. B., Annese, S., Spadaro, P. F., & Traetta, M. (2008). Building intersubjectivity and identity in on-line communities. In Varisco, B. M. (Ed.), Psychological, pedagogical and sociological models for learning and assessment in virtual communities of practice (pp. 57–91). Milan: Polimetrica.
Ligorio, M. B., & Sansone, N. (2009). Structure of a Blended University Course: Applying Constructivist principles to blended teaching. In Payne, C. R. (Ed.), Information Technology and Constructivism in Higher Education: Progressive Learning Framework (pp. 216–230). Hershey, PA: IGI Global.
Ligorio, M. B., & Spadaro, P. F. (2005). Digital positioning and online communities. In Oles, P. K., & Hermans, H. J. M. (Eds.), The Dialogical Self: Theory And Research (pp. 217–229). Lublin: Wydawnictwo KUL.
Martinez, A., Dimitriadis, Y., Rubia, B., Gomez, E., & de la Fuente, P. (2003). Combining qualitative evaluation and social network analysis for the study of classroom social interactions. Computers & Education, 41(4), 353–368. doi:10.1016/j.compedu.2003.06.001
Matusov, E. (2001). Intersubjectivity as a way of informing teaching design for a community of learners class. Teaching and Teacher Education, 17, 383–402. doi:10.1016/S0742-051X(01)00002-6
Mayring, P. (2000). Qualitative Content Analysis. Forum: Qualitative Social Research, 1(2). Retrieved April 30, 2009, from http://www.qualitative-research.net/fqs/
McMillan, D. W., & Chavis, D. M. (1986). Sense of Community: A Definition and Theory. Journal of Community Psychology, 14, 6–23. doi:10.1002/1520-6629(198601)14:1<6::AID-JCOP2290140103>3.0.CO;2-I
Padgett, J. F., & Ansell, C. K. (1993). Robust action and the rise of the Medici, 1400–1434. American Journal of Sociology, 98, 1259–1319. doi:10.1086/230190
Palloff, R. M., & Pratt, K. (2007). Building online learning communities: effective strategies for the virtual classroom. San Francisco, CA: Wiley Imprint.
Reffay, C., & Chanier, T. (2002). Social Network Analysis Used for Modelling Collaboration in Distance Learning Groups. In S. A. Cerri, G. Gouardères & F. Paraguaçu (Eds.), Intelligent Tutoring System Conference (pp. 31–40). Biarritz, France.
Rositi, F. (1970). L’analisi del contenuto come interpretazione. Torino: ERI.
Schegloff, E. A., Jefferson, G., & Sacks, H. (1977). The preference for self-correction in the organization of repair in conversation. Language, 53(2), 361–382. doi:10.2307/413107
Scott, J. (1997). Social Network Analysis. Newbury Park, CA: Sage.
Smith, P. (1988). Discerning the subject. Minneapolis: University of Minnesota Press.
Snyder, D., & Kick, E. (1979). Structural position in the world system and economic growth 1955-70: A multiple network analysis of transnational interactions. American Journal of Sociology, 84, 1096–1126. doi:10.1086/226902
Spadaro, P. F. (2008). Grid for activity analysis (GAct). In Varisco, B. M. (Ed.), Psychological, pedagogical and sociological models for learning and assessment in virtual communities of practice (pp. 89–90). Milan: Polimetrica.
Spadaro, P. F., & Ligorio, M. B. (2005). Intrecci tra contesto e identità in un forum di discussione. In Ligorio, M. B., & Hermans, H. (Eds.), Identità dialogiche nell’era digitale (pp. 89–112). Trento: Erickson.
Tashakkori, A., & Teddlie, C. (1998). Mixed Methodology: Combining Qualitative and Quantitative Approaches. Thousand Oaks, CA: Sage.
Traetta, M., & Annese, S. (in press). A newcomer’s career between community and identity. In Cortini, M., Tanucci, G., & Morin, E. (Eds.), Boundaryless careers and occupational wellbeing. An interdisciplinary approach. London: Palgrave Macmillan.
Traetta, M., & Spadaro, P. F. (2008). SNA and Positioning Network Analysis (PNA). In Varisco, B. M. (Ed.), Psychological, pedagogical and sociological models for learning and assessment in virtual communities (p. 91). Milan: Polimetrica.
Turkle, S. (1996). Life on the screen: identity in the age of the internet. New York: Simon & Schuster.
Wasserman, S., & Faust, K. (1994). Social Network Analysis. Methods and Applications. Cambridge, GB: Cambridge University Press.
Wenger, E. (1998). Communities of Practice: Learning, Meaning, and Identity. Cambridge, GB: Cambridge University Press.
ADDITIONAL READING
Aviv, R., Zippy, E., Ravid, G., & Geva, A. (2003). Network Analysis of Knowledge Construction in Asynchronous Learning Networks. JALN, 7(3), 1–23.
Cohen, A. P. (1985). The Symbolic Construction of Community. London: Tavistock. doi:10.4324/9780203323373
Freeman, L. C. (1986). The impact of computer based communication on the social structure of an emerging scientific speciality. Social Networks, 6, 201–221. doi:10.1016/0378-8733(84)90011-X
Freeman, L. C., & Freeman, S. C. (1980). A semivisible college: Structural effects of seven months of EIES participation by social networks community. In M. M. Henderson & M. J. McNaughton (Eds.), Electronic Communication: Technology and Impacts (pp. 77–85). Washington, DC: AAAS Symposium.
Hermans, H. J. M., Kempen, H. J. G., & Van Loon, R. J. P. (1992). The Dialogical Self: Beyond Individualism and Rationalism. The American Psychologist, 47(1), 23–33. doi:10.1037/0003-066X.47.1.23
Hsieh, H.-F., & Shannon, S. E. (2005). Three Approaches to Qualitative Content Analysis. Qualitative Health Research, 15(9), 1277–1288. doi:10.1177/1049732305276687
Martinez, A., Dimitriadis, Y., Gomez, E., Jorrin, I., Rubia, B., & Marcos, J. A. (2002). Studying participation networks in collaboration using mixed methods. International Journal of Computer-Supported Collaborative Learning, 1(3), 383–408. doi:10.1007/s11412-006-8705-6
Nemeth, R. J., & Smith, D. A. (1985). International trade and world-system structure. A multiple network analysis. RE:view, 8, 517–1319.
Rosen, D., Woelfel, J., Krikorian, D., & Barnett, G. A. (2003). Procedures for Analysis of Online Communities. JCMC, 8(4).
Wells, G. (1993). Intersubjectivity and the Construction of Knowledge. In Pontecorvo, C. (Ed.), La Condivisione della Conoscenza (pp. 353–380). Rome: La Nuova Italia.
Chapter 7
Semantic Social Network Analysis: A Concrete Case
Guillaume Erétéo, Orange Labs, France
Freddy Limpens, EDELWEISS, INRIA Sophia-Antipolis, France
Fabien Gandon, EDELWEISS, INRIA Sophia-Antipolis, France
Olivier Corby, EDELWEISS, INRIA Sophia-Antipolis, France
Michel Buffa, University of Nice Sophia Antipolis, France
Mylène Leitzelman, University of Nice Sophia Antipolis, France
Peter Sander, University of Nice Sophia Antipolis, France
shared knowledge graphs. In this chapter we present our approach to analyzing such semantic social networks and capturing collective intelligence from collaborative interactions, in order to meet the requirements of Enterprise 2.0. Our tools and models have been tested on an anonymized dataset from Ipernity.com, one of the biggest French social web sites centered on multimedia sharing. This dataset contains over 60,000 users, around half a million declared relationships of three types, and millions of interactions (messages, comments on resources, etc.). We show that the enriched semantic web framework is particularly well suited to representing online social networks, identifying their key features and predicting their evolution. Organizing the huge quantity of socially produced information is necessary for the future acceptance of social applications in corporate contexts.
INTRODUCTION
The web is now a major medium of communication in our society and, as it becomes more and more social, a huge amount of content is collectively produced and widely shared online. Even early on, social interactions on the web revealed a social network structure (Wellman 1996), a phenomenon dramatically amplified by web 2.0, which inexorably follows Metcalfe’s Law¹ (Hendler and Golbeck 2008). Individuals and their activities are at the core of the web, along with all the easily available social software and services, e.g., Delicious, Flickr, LinkedIn, Facebook. After the explosion of the “web of content” at the end of the 1990s, we are witnessing the outburst of the “web of people”. Since “we use people to find content whereas we use content to find people” (Morville 2004), we need new means to investigate the relationship between people and content.
New Challenges in Understanding Online Social Interactions: The Case of Business Intelligence Process
Today every organization is forced to anticipate opportunities and threats by detecting “weak signals”, to look for value-added information and knowledge, and to integrate networks of experts into its domains of activity. In this context, structured and unstructured information from the web has become a key factor of economic development and innovation. The competitiveness of firms is related to the adequacy of their decisions, which depends heavily on the quality of available information and on their ability to capitalize on, enrich and distribute this relevant information to the people who will make the right decisions at the right moment. The Business Intelligence market is clearly bound to be seriously shaken up by the social and viral 2.0 revolution. As shown in Figure 1, it is already possible to organize (through mashups, open plugins and APIs) various free modules over the whole information cycle, i.e., identification of sources / research / analysis and treatment / creation / distribution, with an efficiency that competes with proprietary solutions (such as Autonomy’s IDOL, IBM’s Lotus Connections, and the SAP BI software suite).

Figure 1. Tools that transform business intelligence process

Classical Knowledge Management and Competitive Intelligence processes inside firms are currently based on top-down, business-process-driven approaches involving data flow analysis, subject matter expert location and Communities of Practice management. Online social data and network software and services (depicted in Figure 1) are reversing this whole process and empowering the knowledge worker. We are witnessing the consequences for enterprises worldwide, and the different generations (boomers, gen X and millennials) will have to overcome their digital divides in intra-organizational contexts (Martin 2005). Individuals inside their organization, and organizations as a whole, need tools to exploit this new wealth of knowledge to create innovation and to improve performance. Consequently, more and more social solutions (Social Text², Blue Kiwi³, etc.) are being deployed in corporate intranets to reproduce information sharing success stories from the web in an organizational context. This new trend is also called “Enterprise 2.0”, a term coined by Andrew McAfee to mean “the use of emergent social software platforms within companies, or between companies and their partners or customers” (McAfee 2006). These collaborative platforms make it possible to conduct an innovative strategic watch by introducing social interactions into every step of the watch cycle: search, monitoring, collecting, handling, dissemination. Information produced at different sources becomes accessible at a single entry point, is quickly shared and is permanently enriched with comments and new sources. However, these platforms also increase the amount of information their users are exposed to.
The benefit of information sharing is often hindered when the social network becomes so large that relevant information is lost in an overwhelming flow of activity notifications. Losing information can lead to a loss of reactivity and competitiveness in a professional context. Organizing this huge quantity of information is necessary to gain acceptance in corporate contexts and to achieve the full potential of Enterprise 2.0. Social activities and user-generated content have to be properly organized and filtered before any notification is pushed to users if we want to preserve the benefits of online collaboration. These social data are produced through different interactions between users who maintain many types of relationships. We present here our approach to (1) capture and (2) exploit the knowledge contained in the social interactions that emerge from the use of web 2.0 applications. The first step (capture) needs models and languages for representing the diverse knowledge that emerges from online collaboration in a machine-readable and exchangeable format. The second step (exploit) requires means (languages, tools) to query such evolving and diffuse social knowledge. We address these issues with semantic web frameworks, and will show that they handle both steps efficiently. Social network analysis (SNA) is a domain that provides relevant metrics and algorithms to understand the structure of the social network that can be built from social interactions. We also show that semantic web technologies are well adapted to performing SNA on online social networks, adding flexibility and simplicity to many steps of the computation of common SNA indices. In the first part of this chapter, we recall existing work by researchers from the semantic web domain: the ontologies used to represent online activities, which can be combined to connect and represent online social networks. Then, we present approaches to structure and organize the shared vocabularies (folksonomies) built by users when they tag shared content on web 2.0 web sites.
We will show that tagging activities contribute to reinforcing social bonds, thanks to greater involvement and freedom in publishing, organizing and sharing content, and constitute a novel opportunity for analyzing social networks. In the last section, we propose a stack of tools for achieving semantic social network analysis. While existing tools discard the richness of semantic social networks, we propose a framework that handles not only their structure but also the semantics of the ontological primitives used to capture their knowledge. We present the results obtained by analyzing a real social network with over 60,000 users, connected through half a million declared relationships of three types and millions of interactions: messages, comments, visits, etc. Finally, we present some perspectives on the exploitation of folksonomy data with semantic tools and methods. We will show how the combination of Web 2.0 and semantic web approaches can dramatically enhance the effectiveness of bottom-up approaches to sharing and organizing resources, and help discover hidden social bonds within the knowledge shared among online communities.
REPRESENTING SOCIAL DATA WITH SEMANTIC WEB FRAMEWORKS
Historical Background: Different Graph Models
The emerging interactions between people on the internet, and especially later on the World Wide Web, quickly revealed social network structures (Wellman 1996) with properties that were close to those observed in the physical world. Researchers have extracted social networks from synchronous and asynchronous discussions (e.g., emails, mailing-list archives, IRC), the hyperlink structure of homepage citations, co-occurrence of names in web pages, and from the digital traces created by web 2.0 application usages (Erétéo et al 2008). Considering this last point, turning the read web into a read/write web has led to dramatic growth in the possibilities for interaction, producing a huge amount of heterogeneous social data. Information and content on the web are now collectively produced, socially discovered and quickly shared through mashable applications. We are witnessing the deployment of a social media landscape where “expressing tools allow users to express themselves, discuss and aggregate their social life”, “sharing tools allow users to publish and share content”, “networking tools allow users to search, connect and interact with each other” and “playing services integrate strong social features” (Cavazza 2009). Social platforms such as Facebook, Orkut and Hi5 are at the center of this landscape, as they enable us to host and aggregate these different social applications. As an example, you can publish and share your Delicious bookmarks, your RSS streams or your microblog posts in the Facebook news feed, thanks to dedicated Facebook applications. This integration of various means of publishing and socializing enables us to quickly share, recommend and propagate information to our social network, trigger reactions, interact with it, and finally enrich it. Moreover, web 2.0 has made social tagging popular, permitting an additional level of organization for tagged web resources (pictures, videos, blog posts, etc.). The set of tags built from the usage of such applications forms a folksonomy, which can be seen as a shared vocabulary that is both originated by, and familiar to, its primary users (Mika 2005). This collaborative classification of web resources can be further analyzed in order to decipher implicit links between users who use similar vocabularies or tag the same content, highlighting the existence of common interests. As more people use these social applications, they expose more and more of their lives and social networks.
Sociologists now have access to a valuable source of social data that captures characteristics of our societies through permanently evolving web usages and web technologies. The need for an appropriate representation to exploit these data has consequently emerged. Traditionally, researchers have used graph theory, which proposes different graph models to represent this data (Scott 2000). People and resources are represented by nodes, and relationships are represented by edges. Social networks with symmetric relationships, as in Facebook, can be represented by undirected (non-oriented) graphs. Conversely, directed (oriented) graphs are well suited to modeling social networks with non-symmetric relations, like the “follows” relationship of Twitter. In weighted graphs, weights are associated with edges to specify the intensity of the relationships, which is useful for representing the frequency of interactions between people through messages or comments. Social networks like Ipernity.com (a French web 2.0 site for sharing pictures and videos) or Facebook propose adding labels (e.g. family, friends, favourite) on edges to represent the type of relationship that links actors. Finally, sharing sites (e.g., Flickr, YouTube, Delicious) allow interaction on shared content (e.g. photos, videos, bookmarks), connecting people through virtual artifacts. Such social networks are represented with bipartite graphs, which have two types of nodes and edges that only link nodes of different types. A hyperedge extends the notion of an edge by allowing more than two nodes to be connected, and is often used to represent complex relationships involving at least three resources (e.g. a user, a document and a tag). However, while human interactions in web 2.0 sites produce a huge amount of social data, capturing more and more aspects of physical social networks, this decentralized process suffers from little interoperability and little linking between diffuse data. In fact, such rich and spread-out data cannot be represented using only the graph models outlined above without some loss of information. These representations are poorly typed, with labels on edges but no semantic links to structure them. Moreover, they are not necessarily suited to exchanging data and semantics across applications.
We’ll now see how semantic web frameworks tackle these requirements and how they can be used to represent online social networks.
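The graph models listed above can be sketched with plain Python data structures. This is only an illustrative sketch: the user names, resources and tags are invented, and the helper functions `are_friends` and `co_interacting_users` are hypothetical names introduced here, not part of any tool described in this chapter.

```python
from collections import defaultdict

# Undirected graph (symmetric relation such as mutual friendship):
# each edge is stored once as an unordered pair.
friendships = {frozenset({"alice", "bob"}), frozenset({"bob", "carol"})}

# Directed graph (non-symmetric relation such as "follows"):
# each edge is an ordered pair (source, target).
follows = {("alice", "bob"), ("carol", "alice")}

# Weighted directed graph: the weight counts messages sent.
messages = defaultdict(int)
for sender, receiver in [("alice", "bob"), ("alice", "bob"), ("bob", "alice")]:
    messages[(sender, receiver)] += 1

# Labeled graph: (source, label, target), the label types the relationship.
labeled = {("alice", "family", "bob"), ("alice", "friend", "carol")}

# Bipartite graph: edges only link a user to a resource.
user_resource = {("alice", "photo1"), ("bob", "photo1"), ("carol", "video7")}

# Hyperedges: one tagging connects a user, a resource and a tag at once.
taggings = {("alice", "photo1", "sunset"), ("bob", "photo1", "beach")}


def are_friends(a, b):
    """Undirected lookup: the order of the two users does not matter."""
    return frozenset({a, b}) in friendships


def co_interacting_users(pairs):
    """Project the bipartite user-resource graph onto users:
    two users are linked when they interacted with the same resource."""
    users_by_resource = defaultdict(set)
    for user, resource in pairs:
        users_by_resource[resource].add(user)
    links = set()
    for users in users_by_resource.values():
        for u in users:
            for v in users:
                if u < v:  # keep each unordered pair once
                    links.add((u, v))
    return links
```

Note that none of these structures says what a label such as "family" means or how it relates to "friend"; that missing layer is what the typed models discussed next are meant to provide.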
Enriching Social Data With Semantics
Semantic web frameworks answer the problem of representing and exchanging data on such social networks with a rich typed graph model (RDF⁴), a query language (SPARQL⁴) and schema definition frameworks (RDFS⁴ and OWL⁴). RDF enables us to make assertions and to describe resources with triples (subject, predicate, object) that can be viewed as “the subject, verb and object of an elementary sentence”, “a natural way to describe the vast majority of the data processed by machines” (Berners-Lee 2001). Resources and properties in a triple are identified by URIs (Uniform Resource Identifiers), which enable every application to refer to them and to add its own description. These triples provide RDF with a directed labeled graph structure that is well suited to representing the social data of users who connect and interact through heterogeneous content on different web sites. First, data spread across the internet and intranets, involving actors, content and relationships, are represented with a uniform graph structure in RDF even if they are located on different sites. The URIs that are used to identify resources and properties link distributed identities and activities. A given URI always identifies the same resource, and two different URIs that describe the same resource can be unified with a single statement saying so. Second, both nodes and relationships can be richly typed with classes and properties of ontologies described in RDFS and OWL, adding a semantic dimension to the social graph. An ontology is “a set of representational primitives with which to model a domain of knowledge or discourse. The representational primitives are typically classes (or sets), attributes (or properties), and relationships (or relations among class members). The definitions of the representational primitives include information about their meaning and constraints on their logically consistent application” (Gruber 2009). As an example, the inheritance relation is frequently used between classes and between properties to define taxonomies (e.g., web page is a sub class of document and parent of is a sub property of family), but any relation between terms can be specified (e.g. parent of is narrower than family). Finally, SPARQL is the standard query language for querying RDF data and for performing all desired transformations on these semantic social networks (San Martin et al 2009). We will now look at ontologies for describing social activities and actors on the web. Social data can be seen as a twofold structure: data describing the social network structure, and data describing the content produced by network members. Several ontologies exist for representing online social networks (see the chapter “Understanding Online Communities Using Semantic Web Technologies”). Currently, the most popular is FOAF⁵, used for describing people, their relationships and their activity. A large set of properties defines a user profile: “family name”, “nick”,
“interest”, etc. The “knows” property is used to connect people and to build a social network. Other properties are available for describing web usages: online accounts, weblogs, memberships, etc. The properties defined in the Relationship⁶ ontology specialize the “knows” property of FOAF to type relationships in a social network more precisely (familial, friendship or professional relationships). For instance, the relation “livesWith” specializes the relation “knows”. Figure 2 shows a typed graph that uses a rich model for representing the relations between nodes. The primitives of the SIOC⁷ ontology specialize “OnlineAccount” and “HasOnlineAccount” from FOAF in order to model the interactions and resources manipulated by users of social web applications (Breslin et al 2005); SIOC defines concepts such as posts in forums, blogs, etc. Researchers (Bojars et al, 2008) have shown that SIOC and the other ontologies can be used and extended for linking to and reusing scenarios and data from web 2.0 community sites. In addition, the SKOS⁸ ontology offers a way to organize concepts with lightweight semantic properties (e.g., narrower, broader, related) and to link them to SIOC descriptions with the property “isSubjectOf” (see Figure 3).

Figure 2. A typical social network represented with typed relations and nodes

Figure 3. Alignments between SIOC, FOAF and SKOS⁹

Social tagging consists in allowing users to associate freely chosen keywords, called tags, with the resources they exchange, such as blog posts, photos or bookmarks (see Figure 4). The collection of such associations, called “taggings”, is a folksonomy. Social tagging and folksonomies can be improved by adding semantics that structure and link tags together. Gruber (2005) was among the first to suggest designing ontologies to capture and exploit the activities of social tagging (Newman et al. 2005) (Kim et al. 2007). These descriptions can deal with the author of the tag, or the tag itself as a character string, but also with additional properties such as the service where this tag is shared, or even a vote on the relevance of this tag. Other research work has attempted to go further by linking tags with explanations of their meaning (MOAT, Meaning Of A Tag, Passant and Laublet, 2008) or, more generally, by bridging folksonomies and ontologies to leverage the semantics of tags (see an overview of this topic in Limpens et al. 2008).
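To make the typed-graph idea concrete, the following minimal sketch stores a few triples as Python tuples and answers a pattern query that takes a sub-property declaration into account, so that asking for “knows” also returns “livesWith” links. It is an illustration only, not how an RDF store or a SPARQL engine is implemented: the prefixed names are abbreviated stand-ins for full URIs, the `ex:` resources are invented, and `sub_properties` and `match` are hypothetical helper names.

```python
# Prefixed names stand in for full URIs; all "ex:" resources are invented.
SUB_PROPERTY_OF = "rdfs:subPropertyOf"

triples = {
    ("ex:john", "rel:livesWith", "ex:mary"),
    ("ex:mary", "foaf:knows", "ex:freddy"),
    # Schema-level statement: livesWith specializes knows.
    ("rel:livesWith", SUB_PROPERTY_OF, "foaf:knows"),
}


def sub_properties(graph, prop):
    """Return prop plus every property declared, directly or
    transitively, as one of its sub-properties."""
    found = {prop}
    frontier = [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if p == SUB_PROPERTY_OF and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def match(graph, subject=None, predicate=None, obj=None, infer=True):
    """Return the triples matching a pattern, where None is a wildcard.
    With infer=True, a predicate also matches its sub-properties."""
    predicates = None
    if predicate is not None:
        predicates = sub_properties(graph, predicate) if infer else {predicate}
    return sorted(
        (s, p, o)
        for s, p, o in graph
        if (subject is None or s == subject)
        and (predicates is None or p in predicates)
        and (obj is None or o == obj)
    )


# With inference, the livesWith link counts as a "knows" link.
acquaintances = match(triples, predicate="foaf:knows")
```

A plain labeled graph would treat “livesWith” and “knows” as unrelated strings; the single schema triple is what lets the query treat one as a special case of the other.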
RDF-based descriptions of social data form a rich typed graph, exchangeable across web applications, and offer a much more powerful and significant way to represent online social networks than traditional models of SNA. However, other formalisms exist to easily attach lightweight semantics to web resources and are now widely used. Microformats expose social data in web pages using XHTML markup. They are considered “a pragmatic path to the semantic web” (Khare et al 2006), and solutions exist to bridge them with RDF (Adida 2008). “Microformats are a way of attaching extra meaning to the information published on a web page. This is mostly done through adding special pre-defined names to the class attribute of existing XHTML markup”¹⁰. Microformats are proposed to describe people, organizations and places (hCard), human relationships (XFN, XHTML Friends Network), events (hCalendar), opinions, ratings and reviews (VoteLinks, hReview) and tags (rel-tag). The following examples show some conventions for using XHTML attributes to add lightweight semantics with microformats. XFN adds rel attributes to XHTML tags, with all appropriate values separated by spaces, to define the type of relationship(s) between the author of the page and a person represented by the URI defined in the href attribute.

Figure 4. (at the top) Tripartite model of folksonomy (Halpin et al., 2007), and (at the bottom) illustration of the tagging of 3 web sites by 2 users using 3 tags (Markines et al., 2009) where Round = users, Rectangle = resources, Rounded rectangle = tags
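hCard specifies values of class attributes to type the content of XHTML tags describing people, organizations or places:

<div class="vcard">
  <div class="adr">
    <span class="type">Work</span>:
    <div class="street-address">169 University Avenue</div>
    <span class="locality">Palo Alto</span>,
    <abbr class="region" title="California">CA</abbr>
    <span class="postal-code">94301</span>
    <div class="country-name">USA</div>
  </div>
  <div class="tel">
    <span class="type">Work</span> +33651743832
  </div>
  <div>Email: <span class="email">[email protected]</span></div>
</div>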
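Because the semantics of a microformat sit in ordinary class attributes, a program can recover the typed values with a standard HTML parser. The sketch below uses Python's built-in `html.parser` module on a small address fragment similar to the hCard example above. It is a simplified illustration under stated assumptions: the `HCardParser` class and the `HCARD_PROPERTIES` set are names introduced here, the markup is assumed to be well formed with no unclosed elements, and the actual hCard parsing rules (for instance, how an `abbr` title is used as the value) are richer than what is shown.

```python
from html.parser import HTMLParser

# A small, non-exhaustive subset of hCard property names (assumption).
HCARD_PROPERTIES = {
    "street-address", "locality", "region",
    "postal-code", "country-name", "email",
}


class HCardParser(HTMLParser):
    """Collect the text found inside elements whose class attribute
    names one of the hCard properties listed above."""

    def __init__(self):
        super().__init__()
        self.stack = []   # one entry per open element: its property or None
        self.values = {}  # property name -> collected text

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in HCARD_PROPERTIES), None)
        self.stack.append(prop)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attach the text to the innermost enclosing property, if any.
        for prop in reversed(self.stack):
            if prop is not None:
                previous = self.values.get(prop, "")
                self.values[prop] = (previous + " " + text).strip()
                break


sample = """
<div class="vcard">
  <div class="adr">
    <div class="street-address">169 University Avenue</div>
    <span class="locality">Palo Alto</span>,
    <abbr class="region" title="California">CA</abbr>
    <span class="postal-code">94301</span>
    <div class="country-name">USA</div>
  </div>
</div>
"""

parser = HCardParser()
parser.feed(sample)
parser.close()
address = parser.values
```

Text that is not wrapped in a known property, such as the comma after the locality, is simply ignored, which reflects the lightweight nature of the approach: only what the author chose to mark up becomes machine-readable.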
Adding Structure and Semantics to Social Tagging and Folksonomies Can Help In Building Social Graphs
Since tags are neither explicitly structured nor semantically related to each other, folksonomies have a limited capacity to fully elicit the knowledge contained in documents tagged by users. Tags in folksonomies remain at the stage of ad hoc categories which serve user-centred purposes (Veres 2006). While tags can be interpreted by humans, we still lack effective tools to integrate them with richer semantic representations shared by other members of their web communities, or by other web communities. Researchers have attempted to bridge folksonomies and ontologies to leverage the semantics of tags (Limpens et al 2008). Once semantically typed and structured, the relationships between tags, and between tags and users, also provide a new source of social networks. In fact, social structures can be analyzed to type data produced by social actors and, vice versa, data produced by social actors can be analyzed to type social networks. Consequently, tags can be used to link people with the help of semantics (by identifying, for instance, communities that share the same interests). Providing pivot languages to capture and exchange social data takes on special importance in corporate applications such as business intelligence or technology watch: these schemas and the underlying semantic web frameworks are the foundation for data integration spanning both online sources and internal corporate applications. The network of experts, the information resources they watch, the reports they produce, etc. can be integrated and articulated inside this unified graph-based set of frameworks to support transversal analyses, such as identifying central experts, their interests and the sources they use, regardless of where the different pieces of knowledge come from. In the next section we will focus on the different approaches that can be used to add semantics to folksonomies.
BRIDGING FOLKSONOMIES AND ONTOLOGIES
Social tagging systems have recently become very popular as a means of classifying large sets of resources shared amongst on-line communities over the social Web. The simplicity of tagging, combined with the web 2.0 culture of exchange, allows users to share their annotations on the mass of resources. While the act of tagging is primarily for content categorization purposes, it can also be used for building social networks. For instance, we can link people who:
• Used the same tag, and/or
• Tagged the same resource.
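The two linking rules above can be sketched in a few lines of Python. The taggings are invented for illustration, each one recorded as a (user, resource, tag) triple, and `link_users` is a hypothetical helper name rather than a function from an existing tool.

```python
from collections import defaultdict
from itertools import combinations

# Invented taggings, each recorded as (user, resource, tag).
taggings = [
    ("john", "site1", "jazz"),
    ("freddy", "site2", "jazz"),
    ("freddy", "site3", "guitar"),
    ("mary", "site3", "rock"),
]


def link_users(taggings, key_index):
    """Link users who share the element found at key_index of a
    tagging (1 = same resource, 2 = same tag). Returns a mapping
    from a sorted user pair to the set of elements they share."""
    users_by_key = defaultdict(set)
    for tagging in taggings:
        users_by_key[tagging[key_index]].add(tagging[0])
    links = defaultdict(set)
    for key, users in users_by_key.items():
        for a, b in combinations(sorted(users), 2):
            links[(a, b)].add(key)
    return dict(links)


same_tag = link_users(taggings, 2)       # john and freddy both used "jazz"
same_resource = link_users(taggings, 1)  # freddy and mary both tagged site3
```

Keeping the shared elements on each link, rather than a bare yes or no, makes it possible to weight a link by the number of tags or resources two users have in common.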
The simple examples of Figure 4 show how we can link people who share the same interest, whether it is expressed as an interest in the same resource or in the same tag. However, this approach can be greatly improved by adding semantics to the folksonomies: (1) by grouping similar or related tags; or (2) by inferring a hierarchy of tags. For instance, these semantic links can consist in stating that the tag “music” is broader than the tag “guitar”, or that “saxophone” is narrower than “music”, etc. For example, if John tags a document with “saxophone”, Freddy tags another document with “guitar”, and “guitar” and “saxophone” are both narrower than the tag “music”, we can say that Freddy and John share the same interest in “music”, even if they share no common resources tagged with “music”. It is then possible to state that Freddy and John are members of the community of people interested in music, and that they form an interest-based social network. In this section, we will first analyze folksonomy usages and limitations, and position folksonomies among the other classical ways of categorizing. Then we will present the state of the art on the semantic enrichment of folksonomies and the different ways of bridging them with ontologies in order to discover semantic links between tags. Finally, we present our recent work, which consists in integrating folksonomies into a collaborative construction of knowledge representations, aiming at providing additional functionalities to folksonomy-based systems and at semantically enriching folksonomies.
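The John and Freddy example can be worked through in code. The sketch below assumes the broader-than links are already available as a simple mapping; how such links can be obtained is the subject of the following subsections. The function names `ancestors`, `interests` and `shared_interests` are hypothetical.

```python
# Broader-than links between tags, as in the example above:
# each tag maps to the set of tags that are directly broader.
broader = {
    "guitar": {"music"},
    "saxophone": {"music"},
}

# Tags actually used by each person.
user_tags = {
    "John": {"saxophone"},
    "Freddy": {"guitar"},
}


def ancestors(tag, broader):
    """Return the tag itself plus every tag reachable through
    broader-than links (transitive closure)."""
    seen = {tag}
    stack = [tag]
    while stack:
        for parent in broader.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def interests(user, user_tags, broader):
    """Tags a user employed, extended with all their broader tags."""
    result = set()
    for tag in user_tags[user]:
        result |= ancestors(tag, broader)
    return result


def shared_interests(a, b, user_tags, broader):
    """Interests two users have in common once the hierarchy is applied."""
    return interests(a, user_tags, broader) & interests(b, user_tags, broader)


common = shared_interests("John", "Freddy", user_tags, broader)
```

Without the hierarchy the two tag sets are disjoint, so John and Freddy would not be linked at all; with it, the intersection contains "music".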
Folksonomy Usages and Limitations
Several qualitative studies have been conducted on folksonomies. (Golder & Huberman 2005) analyzed the use of folksonomies and proposed classifying tags into different categories in the context of a typical social bookmarking application: tags naming the topic of the tagged item, adjectives characterizing the opinion of the author (“funny”), or tags oriented towards a specific task (“toread”). (Vanderwal 2004) distinguished broad folksonomies (when tags tend to be understandable by numerous users) from narrow folksonomies (when tags are more user-centered). (Veres 2006) tried to define the linguistic nature of tags and showed that some tags correspond to taxonomic categories, while other tags correspond to ad hoc categories serving users’ purposes. Thus, folksonomies mirror the diversity of points of view and usages of the users who share their tags. However, the exploitation of folksonomies raises several issues, as pointed out by (Mathes 2004) and (Passant 2009): (1) the ambiguity of tags: one tag may refer to several concepts; (2) the variability of spelling: several tags may refer to the same concept; (3) the lack of explicit representations of the knowledge contained in folksonomies (folksonomies are “flat”, just sets of isolated keywords); (4) difficulties in dealing with tags from different languages. To overcome these limitations, the classical alternative to social tagging is the use of structured knowledge representations to classify or to index resources. Formal ontologies consist in a specification of the conceptualization of a domain of knowledge with the help of formal concepts and properties linking these concepts (Gruber 1993). Thesauruses and taxonomies consist of notions or concepts which are rigorously defined and hierarchically structured, but do not use formal semantics. Semiformal and shared knowledge representations, such as Topic Maps (Park & Hunting 2002), have also been proposed as an intermediate step towards formal ontologies, where concepts, called topics, are defined in relation to others with hierarchical relations. In comparison with these knowledge representations, folksonomies can be seen as semiotic representations of the knowledge of a community, but they do not include any semantic structure. In order to overcome the limitations of folksonomies mentioned above, it is possible to bridge ontologies and folksonomies. The idea is to semantically enrich folksonomies in order to discover links between the tags and, in the end, between the users behind these tags (linking users through tags is a very interesting way of building social graphs that enrich the social network models described in the section dedicated to semantic network analysis). This bridging can be done in several ways, which we detail in the next subsections.
Extracting Semantics from Folksonomies

Folksonomies have a triadic structure in which people associate tags with resources (“who tags what with what”), so several dimensions can be taken into account when analyzing them. This is what (Mika 2005) does, for instance, in order to extract broader and narrower relationships between tags and to build what he calls “lightweight ontologies”. One advantage of this type of approach is that it deciphers the semantics of folksonomies and makes it possible to build communities of interest more accurately, for instance by considering all the persons using the tag “music” together with all the tags subsumed by “music” (such as “guitar” or “saxophone” in a previous example). The first step in this task is to measure the semantic relatedness between tags. Since users usually give no explicit semantic relationships when they tag, this relatedness first has to be computed by analyzing the tripartite structure of folksonomies. In Table 1 we compare approaches of this type.

Table 1. Comparison table of approaches extracting semantic relations between tags by analyzing the structure of folksonomies

| Approach | Type of similarity | Subsumption relations | Clustering |
|---|---|---|---|
| Mika (2005) | network based | yes | no |
| Hotho et al. (2006) | FolkRank | no | no |
| Schmitz (2006) | conditional probability | yes | no |
| Begelman et al. (2006) | co-occurrence | no | yes |
| Heymann & Garcia-Molina (2006) | distributional (resource context) | yes | no |
| Specia & Motta (2007) | distributional (tag context) | no | yes |
| Schwarzkopf et al. (2007) | composite | yes | no |
| Cattuto et al. (2008) | distributional (3 contexts) | yes | no |
| Markines et al. (2009) | mutual information | yes | no |

(Cattuto et al 2008) proposed semantically grounding the measures of tag relatedness and characterizing different types of similarity measures according to the type of semantic relationships to which they correspond. Thus, their method can be used to find related tags which share a subsumption relationship with a given tag t, although without determining whether these related tags subsume t or are subsumed by it. (Mika 2005) applied social network analysis on different projections of the tripartite structure of folksonomies. Then he grouped similar communities of interest, i.e., groups of people sharing common tags, in order to derive subsumption properties between the tags from the inclusion relations between these communities of interest. (Hotho et al 2006) adapted the PageRank algorithm to the case of folksonomies in order to find not only relationships between tags, but also between users and resources. (Schmitz 2006) used conditional probability methods to induce a hierarchy from Flickr tags. (Begelman et al. 2006) looked closely at the distribution of the co-occurring tags for a given tag, and computed the threshold above which co-occurring tags are strongly related to each other.

Several other approaches use distributional measures with different contexts of aggregation of the folksonomy data. The idea is to project the tripartite model of the folksonomy into bipartite representations by aggregating the data according to a given context. For instance, the tag-tag context consists in looking at the association between a tag and its co-occurring tags. (Heymann & Garcia-Molina 2006) used the tag-resource context, while (Specia & Motta 2007) used the tag-tag context, and (Schwarzkopf et al. 2007) used a composite measure mixing the tag-tag context and the tag-user context. Finally (Cattuto et al. 2008) proposed an analysis of the different contexts of distributional aggregation, while (Markines et al. 2009) proposed a new type of measure based on mutual information calculus, and a framework for analyzing the different types of similarity measures between resources and tags.
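As a rough illustration of the distributional measures discussed above, the following sketch projects a toy folksonomy onto the tag-tag context and compares two tags by the cosine of their co-occurrence vectors. The data, the function names, and the choice of cosine similarity are assumptions made for this example; none of them is taken from the cited papers, which each define their own measures.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Toy folksonomy as (user, tag, resource) triples. Purely illustrative data.
TAGGINGS = [
    ("ann", "music", "r1"), ("ann", "guitar", "r1"),
    ("bob", "music", "r2"), ("bob", "saxophone", "r2"),
    ("cat", "guitar", "r3"), ("cat", "rock", "r3"),
    ("dan", "saxophone", "r4"), ("dan", "jazz", "r4"),
]


def tag_tag_context(taggings):
    """Aggregate the tripartite data into tag co-occurrence counts.

    Two tags co-occur when the same user assigns both to the same resource.
    """
    posts = defaultdict(set)
    for user, tag, resource in taggings:
        posts[(user, resource)].add(tag)
    cooc = defaultdict(lambda: defaultdict(int))
    for tags in posts.values():
        for a, b in combinations(sorted(tags), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def cosine(cooc, t1, t2):
    """Cosine similarity between the co-occurrence vectors of two tags."""
    v1, v2 = cooc[t1], cooc[t2]
    dot = sum(v1[k] * v2[k] for k in v1 if k in v2)
    n1 = sqrt(sum(x * x for x in v1.values()))
    n2 = sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


if __name__ == "__main__":
    cooc = tag_tag_context(TAGGINGS)
    # "guitar" and "saxophone" never co-occur, yet both co-occur with "music".
    print(cosine(cooc, "guitar", "saxophone"))
```

In this toy data “guitar” and “saxophone” are never used together, but they receive a non-zero similarity because they share the context “music”. This is the kind of indirect link that a distributional measure is meant to reveal.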
Semantically Enriching Folksonomies, Structure The Tags!

Even if ontologies and folksonomies remain different entities, several approaches have been proposed to semantically enrich folksonomies by adding a semantic layer, by structuring them with the help of other already available ontologies, or by using the tags to bootstrap an ontology. By adding structure to the tags, we add structure to the set of users who used these tags. Remember that by linking tags, we link people. If we use tags to bootstrap an ontology (for example by integrating the most popular tags into the ontology), or if we link tags to a domain ontology, we help structure the tags. More generally, the usefulness of these approaches for semantic social network analysis is to connect the tags to other semantic resources, such as users, shared content, or members of other social data repositories, in order to build a graph of people who share the same interests. In addition, once the semantics of folksonomies are better known, we can use formalisms or the tools of the semantic web to support folksonomy-based social platforms.

This type of approach consists in either (1) using ontologies to represent folksonomies and properties of tags (Gruber 2005), or (2) assisting users to semantically augment tags (Tanasescu & Streibel 2007), or (3) using ontologies to automate the semantic enrichment of folksonomies (Specia & Motta 2007), or (4) involving users in the semantic organization of tags. Semantic web formalisms can then help leverage the interoperability of the exchange of this additional knowledge. In Table 2 we compare these approaches. The main idea consists in constructing an ontology of folksonomies to support more advanced uses of tagging (Gruber 2005). Thus, tags can have properties and relationships, and can be grouped in tag clouds, etc. This idea has been implemented by (Newman et al. 2005), and further improved by (Kim et al. 2007) who integrated their SCOT11 ontology with SIOC12 (Breslin et al. 2005), another ontology that models users’ interaction on social Web
platforms. Later, (Passant & Laublet 2008) extended these interconnected schemas with MOAT13, an ontology linking tags with online resources to define precisely the meaning of tags and to tie them with the “Web of Linked Data”14, a vision of the Web where resources are linked with each other thanks to the concepts which can be attached to them. Other research building on this idea focused on user intervention in the process of semantically enriching folksonomies. Huynh-Kim Bang et al. (2008) proposed the concept of structurable tags where users can add specific tags corresponding to semantic relationships between tags (such as “france” < “europe” which means “france” is narrower than “europe”). (Tanasescu & Streibel 2007) suggested letting the users tag the links existing between tags. The two latter approaches do not make direct use of semantic web formalisms, as they focus more on the flexibility of the system than on the logical consistency of the knowledge structure obtained. (Passant 2007) developed a semantically augmented corporate blog where users can attach their tags to the concepts of a centrally maintained ontology, while (Good et al. 2007) suggest terms from professional vocabularies fetched online at tagging time. Thanks to the two latter types of approaches, ambiguous tags can be associated with clearly defined concepts by the users while tagging, solving one of the limitations of folksonomies.

Table 2. Comparison table of the approaches to enriching folksonomies which (1) exploit user intervention, and/or (2) make use of external semantic resources, and/or (3) seek the automation of the process (automatic), and/or (4) are based on semantic web formalisms

| Approach | User intervention | Ext. resources | Automatic | Sem. Web |
|---|---|---|---|---|
| Gruber (2005) | - | no | no | yes |
| Newman et al. (2005) | - | no | no | yes |
| Tanasescu & Streibel (2007) | yes | no | no | no |
| Huynh-Kim Bang et al. (2008) | yes | no | no | no |
| Breslin et al. (2005), Kim et al. (2007) | - | no | no | yes |
| Passant & Laublet (2008), Good et al. (2007) | yes | yes | no | yes |
| Specia & Motta (2007), Angeletou et al. (2008) | no | yes | yes | yes |
| Tesconi et al. (2008), Ronzano et al. (2008) | no | yes | yes | yes |
| Van Damme et al. (2007) | yes | yes | yes | yes |
| Braun et al. | yes | no | no | yes |

Other research works proposed automating, at least partially, the semantic enrichment of folksonomies, for example by applying several types of semantic processing such as finding equivalent tags or grouping similar tags based on similarity measures. (Specia & Motta 2007) have developed such a system; they query ontologies on the semantic web and try to match the tags from the resulting clusters with concepts from ontologies in order to link the tags with semantic relationships. The main limitation of such an approach is the limited coverage of currently available ontologies. Similarly, (Tesconi et al. 2008) and (Ronzano et al. 2008) first built sets of term meanings by mining Wikipedia, and then linked each tag of a sample of delicious.com users to a unique meaning. The main difference between these two types of methods is that (Specia & Motta 2007) apply the mapping of tags to semantic resources on clusters of related tags, whereas (Tesconi et al. 2008) consider sets of tags belonging to the same user. The semantic enrichment of tags proposed by (Specia & Motta 2007) can be used by all the contributors of a folksonomy, and may be useful to a whole community. The tag disambiguation of (Tesconi et al. 2008) can be applied to different purposes, such as profiling the tagging of a user, which provides richer information when consulting the bookmark database of this user. However, if we apply the algorithm proposed by (Tesconi et al. 2008) to all the users of a community, we can measure or detect the divergences existing among the users and, for instance, propose discussing their points of view in the case of the collaborative construction of an ontology. (Van Damme et al.
2007), along the same lines, suggest integrating as many semantic online resources as possible, and, at the same time, integrating user
intervention to build, at a reasonable cost, genuine “folks-ontologies”. The collaborative aspects of the semantic enrichment of folksonomies have been addressed by other approaches focused on ontology maturing processes. The idea is to involve users in the semantic organization of tags so that the tags in the folksonomy suit user needs better than they would with purely automatic approaches. Web 2.0 tools are used to achieve this task, such as wikis (Buffa et al. 2008), blogs (Passant 2007), e-learning platforms (Torniai et al. 2008), personal knowledge organizers (Abbattista et al. 2007), or social bookmarking sites (Braun et al. 2007). Following the distinction drawn by (Weller & Peters 2008) between the individual and the collective level at which folksonomies can be modified, we can distinguish approaches where the users merely propose new concepts for an existing ontology (Passant, 2007) from approaches where users can directly edit the whole shared ontology (Braun et al., 2007). These approaches also raise the problem of the user-friendliness of the interfaces used to edit tags and their semantic relations to other tags, as this task requires time and skills. Another great benefit of combining ontologies and folksonomies lies in the interoperability brought by the formalisms of the semantic web. The Linking Open Data project15 consists in extending the Web with semantically interconnected data sources that publish varied open data sets in RDF format, following a set of ontologies describing the different types of resources. Ontologies from the Linking Open Data initiative include SIOC3, used to describe online communities’ exchanges, and SKOS4, used to describe thesauri (see the chapter “Understanding Online Communities Using Semantic Web Technologies” for more details on this use of semantic web formalisms to empower social data repositories).
Concrete Example: A Tagging System for Collaboratively Building a Thesaurus and For Identifying a Network of Experts

In this section we present our approach to the semantic enrichment of folksonomies, which we have applied to the evolution of a thesaurus within a French organization. It involves a social bookmarking application similar to delicious.com but adds some simple features for helping to classify the tags. We will show that a very simple application that requires little effort from users can help structure the folksonomy and build a thesaurus. A very interesting consequence is that it also helps in building a network of experts.
Motivating Scenario

Our scenario takes place within the French Agency for the Environment (ADEME16). In this organization, there is a distributed network of experts who publish, share and exploit resources. The goal of our collaboration with this organization is to help them improve the indexing of these resources thanks to a combination of bottom-up approaches (like folksonomies) and semantic tools. In order to involve all the users in the indexing process, we designed a method based on the semantic enrichment of folksonomies. This method consists in associating the power of automatic handling of folksonomies and the expertise of users by integrating simple semantic functionality within the interface of the system. The result of this approach is a set of tags linked with semantic relationships (such as broader, narrower, or related) that can be connected to some nodes of the existing thesaurus thanks to ontology matching techniques (Euzenat & Shvaiko 2007). The tags which are not matched but which are semantically connected with tags that have been matched can then be proposed to the maintainer of the thesaurus as new concepts (new candidates for integration into the ontology). In addition,
our model supports confrontational views so that any user can propose semantic relationships (on the basis of automatic suggestions); divergences may arise and can be an interesting opportunity to discover different sub-communities of interests.
Semantic Enrichment of Folksonomies

Our approach consists in combining automatic processing of the folksonomy with semantic functionalities integrated within a navigation interface, in order to assist the users in contributing to the semantic enrichment of the folksonomy. One of the widely known limitations of folksonomies is the handling of spelling variations between supposedly equivalent tags such as “neighbour” and “neighbor”. A simple solution to this problem consists in measuring the edit distance between these tags, such as the Levenshtein distance (Levenshtein 1966), and treating as equivalent the tags whose distance is below a given threshold value. Another type of analysis consists in measuring the “similarity distance” between all the tags thanks to an analysis of the links between the tags, the users, and the tagged resources in a folksonomy. This type of handling corresponds to the solutions proposed by (Markines et al. 2009), among others. We have implemented in our system the distributional measure based on the tag-tag context. This automatic handling is then used by functionalities such as the detection of spelling variants of tags and the suggestion of related tags. These functionalities are offered by the interface to induce users to validate, reject or correct the results of the automatic processing.
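The spelling-variant detection described above can be sketched as follows. This is a minimal illustration and not the code of the system presented in this chapter: the function names, the lower-casing step, and the default threshold of 2 are assumptions chosen for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def spelling_variants(tags, threshold=2):
    """Return pairs of distinct tags whose edit distance is below the threshold.

    The threshold value is an assumption; a real system would tune it.
    """
    unique = sorted(set(tags))
    return [
        (t1, t2)
        for i, t1 in enumerate(unique)
        for t2 in unique[i + 1:]
        if levenshtein(t1.lower(), t2.lower()) < threshold
    ]


if __name__ == "__main__":
    print(spelling_variants(["neighbour", "neighbor", "pollution", "car"]))
```

A fixed threshold is crude for short tags, since two unrelated three-letter tags can be one edit apart. Normalizing the distance by tag length is a common refinement, and the pairs found are best treated as suggestions for users to validate.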
Implementation

The system in which we have implemented our ideas is a bookmark navigator which includes extra functionalities such as the extension of tag queries with spelling variants and the suggestion of related tags, plus the possibility of editing these semantic relations (for a detailed presentation, see Limpens et al., 2009). Our system is composed of (1) automatic agents applying semantic processing to folksonomies, and (2) a user interface to browse the bookmark database and, at the same time, to validate or correct the automatically suggested tags and semantic relationships. In our model every assertion is attached to a user, recorded, and added to the database, even when it contradicts other assertions (for example, the assertion that “pollution” is related to “car” has been approved by John and rejected by Paul). This feature has the advantage of collecting all users’ contributions and letting diverging points of view coexist, each user benefiting from their own structuring of the folksonomy plus the contributions of others when they do not conflict with it. Since our model is described with semantic web formalisms (as an RDF schema), the discovery of conflicting relationships is straightforward and can make use of inference capabilities through SPARQL queries. Thus it is possible, for instance, for a given user to find out which other users agreed with the semantic relations that this user asserted on his or her tags.

The administrators of the system can further exploit these results in different ways. The different points of view arising within the community can be highlighted thanks to the mechanism described above. For instance, the point of view of the “car’s opponents” and the point of view of the “car’s defenders” can be highlighted if there is a conflict or an agreement on the semantic relationship that links “car” with “pollution”. The hypothesis we make here is that when someone puts some effort into semantically structuring a tag, this implies a stronger commitment than mere tagging and can be a good indicator of a strong interest or an expertise in the domain described by this tag.
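To make the idea of user-attached assertions more concrete, here is a small sketch that stores approvals and rejections per user and then looks for conflicts and for users who agree with each other. The chapter's model is an RDF schema queried with SPARQL; this example swaps in plain Python data structures for illustration. The class, the function names, the relation labels, and the user “Mary” are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Assertion(NamedTuple):
    user: str
    tag1: str
    relation: str   # illustrative labels such as "related" or "broader"
    tag2: str
    approved: bool  # True = approved, False = rejected


ASSERTIONS = [
    Assertion("John", "pollution", "related", "car", True),
    Assertion("Paul", "pollution", "related", "car", False),
    Assertion("Mary", "pollution", "related", "car", True),
    Assertion("Mary", "pollution", "related", "CO2", True),
]


def opinions(assertions):
    """Group users by the statement they judged and by their verdict."""
    by_statement = defaultdict(lambda: {True: set(), False: set()})
    for a in assertions:
        by_statement[(a.tag1, a.relation, a.tag2)][a.approved].add(a.user)
    return by_statement


def conflicts(assertions):
    """Statements approved by at least one user and rejected by another."""
    return {
        statement: verdicts
        for statement, verdicts in opinions(assertions).items()
        if verdicts[True] and verdicts[False]
    }


def allies(assertions, user):
    """Other users who approved at least one statement that `user` approved."""
    result = set()
    for verdicts in opinions(assertions).values():
        if user in verdicts[True]:
            result |= verdicts[True] - {user}
    return result


if __name__ == "__main__":
    print(sorted(conflicts(ASSERTIONS)))
    print(sorted(allies(ASSERTIONS, "John")))
```

Because every assertion is kept with its author, nothing is overwritten when users disagree. The disagreement itself becomes data that can be used to identify sub-communities.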
Towards Novel Exploitation of the Semantics of Social Data

We have seen in this section how semantically enriching folksonomies can improve semantic social network analysis by providing additional links between tags, and thus between people using these tags. We have presented the state of the art on bridging folksonomies and ontologies. Since folksonomies consist of the collection of the taggings by users, that is, the association of freely chosen keywords with resources, they can connect users together through the use of the same tags or the tagging of the same resources. Semantically enriching folksonomies can further enhance the ability to connect people via tags by discovering links between different tags which are not necessarily used for the same resources (such as “pollution” and “CO2” in the previous examples). We have also proposed a novel method to assist with automatically handling the semantic organization of folksonomies. This method consists in automatically proposing semantic relations between tags (such as “spelling variant” or “related”), and letting users validate or correct them, or even propose new semantic relations, thanks to functionalities embedded in the browsing interface (see Figure 5). The results can then be exploited to highlight sub-communities of interest via the divergence or convergence between the semantic relations validated or rejected by the users. For instance, if a group of users agreed on semantically connecting the tag “car” with the tag “pollution”, we can infer that they share the same view on the role of cars in pollution problems.

Figure 5. Screenshot of our early interface for navigating a bookmarks database and validating or proposing semantic relations between tags

Adding semantics to social data such as tagging data and folksonomies can greatly enhance business intelligence processes by helping the discovery of weak signals and the deciphering of links. Indeed, mere folksonomies and the classical tag cloud visualization tend to hide rarely used notions, since the highlighted terms are the most popular ones. In our concrete example, if a single user proposes a semantic relation between rarely used tags and more broadly used tags, this small piece of information can benefit the whole community and make emergent notions visible more quickly. Coming back to our initial scenario of business intelligence, a clear stake of leveraging social applications to capture and organize folksonomies is the potential of turning every user into a watcher, a contributor to business intelligence, a sensor and a categorizer, and all this, ideally, as a side effect of her day-to-day tasks such as bookmarking a resource or searching for a bookmarked resource. Now that we can capture and organize information resources and the experts who find them or who monitor them, we need to capture and analyze the networks of these experts, be they explicit or implicit.
SEMANTIC SOCIAL NETWORK ANALYSIS

We saw in previous sections that we can represent user interaction on social web sites using several ontologies, both for representing the explicit part of the social network (network of friends, etc.) and for building graphs of users based on other implicit markers. In particular, we focused on the semantic enrichment of the folksonomies that can be used to identify communities of interest. Once we have such graphs, we can analyse them via social network analysis (SNA). SNA tries to understand and exploit the key features of social networks in order to manage their life cycle and predict their evolution. Much research has been conducted on SNA using graph theory (Scott 2000) (Wasserman et al 1994). Among important results is the identification of sociometric features that characterize a network. SNA metrics can be decomposed into two categories: (1) some provide information on the position of actors and how they communicate, and (2) others give information on the global structure of the social network.

Centrality highlights the most important actors and the strategic positions of the network; three definitions have been proposed (Freeman 1979). Degree centrality considers nodes with high degrees (number of adjacent edges) as most central. It highlights local popularity in the network, that is, actors that influence their neighbourhood. In directed graphs the in-degree and out-degree (number of in-going and out-going adjacent edges) are alternative definitions that take into account the direction of edges, representing respectively the support and the influence of the actor. The n-degree is an alternative definition that widens the neighbourhood considered to a distance of n or less (the distance between two actors is the minimum number of relationships that link them). Closeness centrality is based on the average length of the paths (number of edges) linking a node to others and reveals the capacity of a node to be reached and to join other actors. The direction of edges also modifies the interpretation of the closeness centrality by differentiating the capacity to join from the capacity to be reached. Betweenness centrality focuses on the capacity of an actor to be an intermediary between any two others. A network is highly dependent on actors with high betweenness centrality, and these actors have a strategic advantage due to their position as intermediaries and brokers (Burt 1992) (Holme 2002) (Burt 2004). Its exact computation is time consuming; several algorithms tackle this problem (Freeman et al 1991) (Newman 2001) (Latora et al 2007) (Brandes 2001), with a minimum time complexity of O(nm), where n is the number of nodes and m the number of edges. To deal with large networks, approximating algorithms (Radicchi et al 2004) (Brandes et al 2007) (Bader et al) (Geisber et al 2008) and parallel algorithms (Bader et al 2006) (Santos et al 2006) have been proposed.

Other metrics help in understanding the global structure of the network. The density indicates the cohesion of the network, i.e., the number of relationships expressed as a proportion of the maximum possible number of relationships (n*(n-1) for directed relationships, with n the number of actors). The diameter is the length of the longest geodesic of the network (a geodesic is a shortest sequence of linked actors between two actors). Community detection helps in understanding the distribution of actors and activities in the network (Scott 2000), by detecting groups of densely connected actors. The community structure influences the way information is shared and the way actors behave (Burt 1992) (Burt 2001) (Burt 2004) (Coleman 1988). (Scott 2000) gives three graph patterns that correspond to cohesive subgroups of actors playing an important role in community detection: components (isolated connected sub graphs), cliques (complete sub graphs), and cycles (paths returning to their point of departure).
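The position and structure metrics described above can be sketched on a small undirected network as follows. The graph and function names are made up for this example. Closeness is taken here as the inverse of the average geodesic distance, betweenness follows the accumulation scheme of (Brandes 2001) for unweighted graphs, and density is divided by n*(n-1)/2 because the example edges are undirected. The diameter function assumes a connected graph.

```python
from collections import deque

# Small undirected network as an adjacency dict (made-up data).
GRAPH = {
    "a": {"b"},
    "b": {"a", "c", "d"},
    "c": {"b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}


def bfs_distances(adj, source):
    """Geodesic distance (number of edges) from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree_centrality(adj):
    """Number of adjacent edges of each node."""
    return {v: len(neighbours) for v, neighbours in adj.items()}


def closeness_centrality(adj):
    """Inverse of the average distance from a node to the nodes it can reach."""
    result = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(adj):
    """Number of geodesics passing through each node (fractional when tied)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: value / 2 for v, value in score.items()}


def density(adj):
    """Share of the possible undirected edges that are present."""
    n = len(adj)
    m = sum(len(neighbours) for neighbours in adj.values()) / 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def diameter(adj):
    """Length of the longest geodesic (assumes a connected graph)."""
    return max(max(bfs_distances(adj, v).values()) for v in adj)
```

In this toy graph, nodes “b” and “d” sit on every geodesic that leaves “a” or “e”, so they obtain the highest betweenness even though “c” is also well connected locally.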
Alternative definitions extend these initial concepts, which are too restrictive for social networks. The members of an n-clique have a maximum distance of n to any other member of the group, and a member of a k-plex must be connected to all the members of the group except a maximum number of k actors. However, these extensions are still not adapted to social network structure, and other criteria of cohesiveness are proposed by community detection algorithms. Community detection algorithms fall into two categories, either hierarchical or based on heuristics (Newman 2004) (Givan et al 2004) (Danon et al 2005). Two strategies are used in hierarchical algorithms: divisive algorithms consider the whole network and divide it iteratively into sub communities (Girvan et al 2002) (Wilkinson et al 2003) (Fortunato et al 2004) (Radicchi et al 2004), and agglomerative algorithms group nodes into larger and larger communities (Donetti et al 2004) (Zhou et al 2004) (Newman 2004). Other algorithms are based on heuristics such as random walks or analogies to electrical networks (Wu et al 2004) (Pons et al 2005).

Social network graphs hold specific patterns that can be used to characterize them (Newman 2003) and accelerate algorithms. The degree distribution follows a power law: few actors have a high degree and many have a low one. According to the small world effect (Milgram 1967), the diameter in a social network with n actors is of the order of log(n). Social networks have a strong clustering tendency, forming a community structure due to a high transitivity in relationships (if Jack knows Paul and Paul knows Peter there is a good chance that Jack knows Peter or will meet him) (Newman 2003). This clustering tendency correlates with assortativity, which refers to the preference for actors of a social network to be linked to others who have similar characteristics. The size of the largest component is an indicator of the communication efficiency of the network: the more actors it contains, the better the communication. In most web 2.0 sites, the size of the largest component is of the order of the size of their social network, as they are focused on user communication and centred on a viral diffusion of their content.
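Two of the structural patterns mentioned above, the share of actors in the largest component and the transitivity of relationships, can be measured with a few lines of code. The sketch below uses a made-up network built around the Jack, Paul and Peter example; the other names and the function names are hypothetical. Transitivity is computed as the fraction of connected triples (two neighbours of the same actor) that are closed into a triangle.

```python
from itertools import combinations

# Made-up undirected acquaintance network with two components.
GRAPH = {
    "jack": {"paul", "peter"},
    "paul": {"jack", "peter"},
    "peter": {"jack", "paul", "mary"},
    "mary": {"peter"},
    "zoe": {"tom"},
    "tom": {"zoe"},
}


def components(adj):
    """Connected components, found by depth-first traversal."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(adj[v] - component)
        seen |= component
        result.append(component)
    return result


def largest_component_share(adj):
    """Fraction of all actors that belong to the largest component."""
    return max(len(c) for c in components(adj)) / len(adj)


def transitivity(adj):
    """Closed connected triples divided by all connected triples."""
    closed = triples = 0
    for neighbours in adj.values():
        for a, b in combinations(neighbours, 2):
            triples += 1
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0


if __name__ == "__main__":
    print(len(components(GRAPH)), largest_component_share(GRAPH))
    print(transitivity(GRAPH))
```

Here Jack, Paul and Peter form a closed triangle, while Mary is only linked to Peter, so two of Peter's three pairs of acquaintances remain open.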
These algorithms are only concerned with graph structure: they all lack semantics, and in particular they make poor use of the types of relations. There is a need for interoperable tools and languages that could help take semantics and typing into account. Ontologies based on semantic web standards have emerged in recent years to help deal with such problems. Millions of FOAF profiles (Golbeck et al 2008) are now published on the web, due to the adoption of this ontology by web 2.0 platforms with large audiences (www.livejournal.net, www.tribe.net). SIOC exporters are also available in widely deployed social applications such as blogs (e.g., the SIOC plugin for Wordpress). The adoption of standardized ontologies for online social networks will lead to increasing interoperability between them and to the need for uniform tools to analyze and manage them. Consequently, some researchers have applied classical SNA methods to the graphs of acquaintance and interest networks, respectively formed by the properties “foaf:knows” and “foaf:interest”, to identify communities of interest from the network of LiveJournal.com (Paolillo et al 2006). (Golbeck et al 2003) studied trust propagation in social networks using semantic web frameworks. (Golbeck et al 2008) worked on merging FOAF profiles and identities used on different sites. In order to perform these analyses, they chose to build their own untyped graphs (each corresponding to one relationship, “knows” or “interest”) from the richer RDF descriptions of FOAF profiles. Too much knowledge is lost in this transformation, and this knowledge could be used to parameterize social network indicators, improve their relevance and accuracy, filter their sources and customize their results.

Other researchers (San Martin et al 2009) have shown that SPARQL is well suited for performing modifications on a social network but that it cannot handle the global queries currently used in social network analysis (e.g., diameter, density, centrality), which require complex path computations. Consequently, researchers have extended the standard SPARQL query language in order to find paths between semantically linked resources in RDF-based graphs (Alkhateeb et al 2007) (Anyanwu 2007) (Kochut & Janik 2007) (Corby 2008) (Pérez et al 2008). In the next section, this work is used as a basis to work on graph-based and ontology-based social network representation and analysis.
Analyzing Online Social Networks with Semantic Web Frameworks

We have designed a framework to analyse online social networks based on semantic web frameworks. Figure 6 illustrates the abstraction stack we follow. We use RDF graphs to represent social networks, and we type them using existing ontologies together with specific domain ontologies if needed. Some social data are already readily available in a semantic format (RDF, RDFa, microformats, etc.). However, today, most of the data are still only accessible through APIs, see examples in (Rowe and Ciravegna 2008), or by crawling web pages, and need to be converted.

Figure 6. Abstraction stack for social data analysis

To enhance these social network representations with SNA indices, we have designed SemSNA (Figure 7), an ontology that describes the SNA notions, e.g., centrality. With this ontology, we can (1) abstract social network constructs from domain ontologies to apply our tools on existing schemas by having them extend our primitives; and we can (2) enrich the social data with new annotations (see Figure 8) such as the SNA indices that will be computed. These annotations enable us to manage more efficiently the life cycle of an analysis, by pre-calculating relevant SNA indices and updating them incrementally when the network changes over time. We propose SPARQL formal definitions of SNA operators improving the semantics of the representations. The current test uses the semantic search engine Corese (Corby et al 2004) that supports powerful SPARQL extensions particularly well suited for SNA features such as path computations (Corby et al 2008).

Figure 7. Schema of SemSNA: the ontology of social network analysis

Figure 8. Social network enhanced with SemSNA indices (Degree, Betweenness)
SemSNA: an Ontology of Social Network Analysis

SemSNA17 (Figure 7) is an ontology that describes concepts of social network analysis with respect to the semantics of the analyzed relationships. First, we present the basic concepts that can be extended to integrate any SNA features and then we present different primitives that extend this basis to annotate social networks with popular SNA metrics. The main class SNAConcept is used as super class for all SNA concepts. The property isDefinedForProperty indicates for which relationship (i.e. sub-network) an instance of SNA concept is defined. An SNA concept is attached to a social resource with the property hasSNAConcept. The class SNAIndice describes valued concepts such as centrality, and the associated value is set with the property hasValue. As an example, with this basis a general declaration of a valued concept will be:

    hasSNAConcept _:a
    _:a hasValue 12
    _:a isDefinedForProperty “foaf:knows”
A set of primitives can be used to annotate positions in the network based on centrality. The class Centrality is used as a super class for all centralities defined by the classes Degree, InDegree, OutDegree, Betweenness, BetweennessCentrality and ClosenessCentrality. The property hasCentralityDistance defines the distance of the neighbourhood taken into account for a centrality measure.

Next, a set of primitives is proposed for metrics on the global structure of the social network. Primitives are defined to annotate groups of resources linked by particular properties. The class Group is a super class for all classes representing any definition of groups of resources. The class Component represents a set of connected resources. The class StrongComponent defines a component of a directed graph where the paths connecting its resources do not contain any change of direction. The Diameter subclass of Indice defines the length of the longest of the shortest paths of a component. The property maximumDistance enables us to restrict the membership to components with a maximum path length between members. A clique is a complete sub graph for a given property according to our model. An n-clique extends this definition with a maximum path length (n) between members of the clique; the class Clique integrates this definition, and the maximum path length is set by the property maximumDistance. Resources in a clique can be linked by shortest paths going through non clique members. An NClan is a restriction of a clique that excludes this particular case. A KPlex relaxes the clique definition to allow connecting to k members with a path longer than the clique distance; k is determined by the property nbExcept. Finally the concept Community supports different community definitions: InterestCommunity, LearningCommunity, GoalOrientedCommunity, PraticeCommunity and EpistemicCommunity (Conein 2004) (Henri et al 2003). These community classes are linked to more detailed ontologies, such as the one used by (Vidou et al 2006) to represent communities of practice.

With this ontology we can enrich the RDF description of social data with SNA metrics that are semantically parametrized (Figure 8). These annotations are useful to manage more efficiently the life cycle of an analysis, by calculating the SNA indices only once and updating them incrementally when the network changes over time. Moreover, using a schema to add the results of our queries (rules) to the network also allows us to decompose complex processing into two or more stages and to factorize some computation among different operators, e.g., we can augment the network with in-degree calculation and betweenness calculation and then run a query on both criteria to identify some nodes (e.g., what are the nodes with in-degree > y and betweenness > x?).
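The two-stage processing described above can be illustrated with a small sketch: a first pass computes indices once and stores them as annotations, and a second pass queries the cached annotations on several criteria. The annotation records loosely mirror the SemSNA properties hasValue and isDefinedForProperty, but the dictionary layout, the function names and the data are assumptions made for this example. Out-degree stands in for betweenness to keep the code short.

```python
# Directed "knows" edges as (source, target) pairs; made-up data.
KNOWS = [("ann", "bob"), ("cat", "bob"), ("dan", "bob"),
         ("bob", "cat"), ("ann", "cat")]


def annotate_degrees(edges, prop="foaf:knows"):
    """Stage 1: compute in- and out-degree once and cache them as annotations."""
    in_deg, out_deg = {}, {}
    for source, target in edges:
        out_deg[source] = out_deg.get(source, 0) + 1
        in_deg[target] = in_deg.get(target, 0) + 1
    annotations = []
    for node in sorted(set(in_deg) | set(out_deg)):
        for concept, values in (("InDegree", in_deg), ("OutDegree", out_deg)):
            annotations.append({
                "resource": node,
                "concept": concept,
                "hasValue": values.get(node, 0),
                "isDefinedForProperty": prop,
            })
    return annotations


def select_nodes(annotations, in_above, out_above, prop="foaf:knows"):
    """Stage 2: query the cached annotations on both criteria at once."""
    values = {}
    for a in annotations:
        if a["isDefinedForProperty"] == prop:
            values.setdefault(a["resource"], {})[a["concept"]] = a["hasValue"]
    return sorted(
        node for node, v in values.items()
        if v.get("InDegree", 0) > in_above and v.get("OutDegree", 0) > out_above
    )


if __name__ == "__main__":
    cached = annotate_degrees(KNOWS)
    print(select_nodes(cached, in_above=1, out_above=0))
```

The second stage never touches the edge list, which is the point of caching: further queries, or more complex indices built from these values, can reuse the stored annotations until the network changes.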
Querying and Transforming the Social Network with SPARQL
Based on our model, we propose formal SPARQL definitions to compute semantically parametrized SNA features and to annotate the graph nodes with the results, which caches them. The current test uses the semantic search engine CORESE (Corby et al 2004), which is based on graph representations and processing and supports powerful SPARQL extensions particularly well suited to computing the SNA features that require path computations (Corby 2008). In (San Martin et al 2009), researchers have shown that SPARQL “is expressive enough to make all sensible transformations of networks”. However, this work also shows that SPARQL is not expressive enough to meet SNA requirements for querying global metrics of social networks, e.g., density. Such global queries mostly rely on result aggregation and path computation, which are missing from the standard SPARQL definition. The Corese search engine provides these features through result grouping, aggregating functions like sum() or avg(), and path retrieval (Corby et al 2008) (Erétéo et al 2009). Moreover, inheritance relations are natively taken into account when querying the RDF graph in SPARQL with CORESE. Thus parametrized operators formally defined in SPARQL allow adjusting the granularity of the analysis of interactions and relations, whereas classical SNA ignores the semantics of richly typed graphs like RDF. Figure 9 illustrates the calculation of a parametrized degree where only family relations are considered, by exploiting the hierarchy of relationships.
Figure 9. A parametrized degree that considers a hierarchy of relations
Different SPARQL queries exploiting Corese features are presented in (Erétéo et al 2009) to perform social network analysis combining structural and semantic characteristics of the network. This approach is easily extensible, as other queries can be defined at any time to compute new operators. As a simple example, the parametrized degree described in Figure 8 is computed with the following query in Corese:
select ?y count(?x) as ?degree where {
  { ?x rel:family ?y } UNION { ?y rel:family ?x }
} group by ?y
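To make the role of the relation hierarchy concrete, here is a small Python sketch of a parametrized degree. It is not Corese code: the `SUB_PROPERTY_OF` table, the sub-relations `rel:siblingOf`, `rel:parentOf` and `rel:friendOf`, and the `ex:` actors are hypothetical, and the inheritance that Corese resolves natively is emulated by walking up that table. As in the query, each matching relation counts once for both of its endpoints.

```python
from collections import defaultdict

# Hypothetical hierarchy: each property maps to its direct super-property.
SUB_PROPERTY_OF = {
    "rel:siblingOf": "rel:family",
    "rel:parentOf": "rel:family",
    "rel:family": "foaf:knows",
    "rel:friendOf": "foaf:knows",
}

def is_sub_property(prop, ancestor):
    """True if `prop` is `ancestor` or inherits from it, directly or not."""
    while prop is not None:
        if prop == ancestor:
            return True
        prop = SUB_PROPERTY_OF.get(prop)
    return False

def parametrized_degree(triples, relation):
    """Degree of each actor, counting only relations subsumed by `relation`."""
    degree = defaultdict(int)
    for subject, prop, obj in triples:
        if is_sub_property(prop, relation):
            degree[subject] += 1  # the actor declares the relation
            degree[obj] += 1      # the actor is the target of the relation
    return dict(degree)

triples = [
    ("ex:paul", "rel:siblingOf", "ex:jack"),
    ("ex:jack", "rel:parentOf", "ex:peter"),
    ("ex:paul", "rel:friendOf", "ex:mary"),
]
family_degree = parametrized_degree(triples, "rel:family")
knows_degree = parametrized_degree(triples, "foaf:knows")
print(family_degree)
print(knows_degree)
```

Changing the `relation` argument from `rel:family` to `foaf:knows` widens the analysis to every relation type without touching the data, which is the granularity adjustment described above.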
In order to be exploited in web services to leverage the social experience, these queries must be applied in batch on a large number of stored RDF triples. Consequently the social data are enhanced with the results of these parametrized SNA metrics using the SemSNA ontology, in order to provide services based on this analysis (e.g., filtering social activity notifications), to use them in the computation of more complex indices, or to support iterative or parallel approaches in the computation. Corese is freeware that can handle millions of nodes, but other engines with the same extensions could be used just as well. The W3C SPARQL Working Group18 is currently investigating some of the extensions that are presented in (Erétéo et al 2009), such as project expression, aggregation, group by and property paths. ARQ19, PSPARQL20 and SPARQLeR (Kochut and Janik 2007) also implement property paths. However, some necessary extensions are unique to Corese, like the group by any statement, which groups results that share a value through any variable and thus computes connected results.
Inside companies, these operators can analyze in real time or in batch the expert networks of the organization and its projects, providing a directory of the relevant persons to contact for every field of interest it is involved in. Leveraging both graphs (structured folksonomies and social networks) and the semantics of the schema, parametrized operators can produce reports and snapshots of the current assets and trends of the activity of the company, its markets and its competitors. But all this formalized knowledge can also be used in production rules to automatically produce new knowledge with potentially high added value as we will see in the next section.
TRANSFORM, ENRICH AND WRAP SOCIAL DATA
Semantic web frameworks offer different ways to enrich RDF data with reasoning mechanisms. We first investigate how to infer new knowledge from an ontology by defining rules and schema properties. Then we will see how SPARQL enables us to generate RDF by performing queries with a CONSTRUCT clause, and how its extension in Corese leverages such features. The OWL schema “specifies property characteristics, which provides a powerful mechanism for enhanced reasoning about a property”21. New properties can be inferred automatically and inconsistencies among data can be easily detected. For example, a property family can be defined as symmetric and transitive, and inferring on social data containing Paul family Jack and Jack family Peter will produce the knowledge Jack family Paul, Peter family Jack, Paul family Peter and Peter family Paul. Figure 10 summarizes the characteristics that can be defined on properties with OWL. Other pre-processing can also enrich the semantics, such as rules crawling the network to add types or relations whenever they detect a pattern, e.g., every actor frequently commenting
Figure 10. OWL in one picture22
resources or posts by another actor is linked to him by a relation “monitors”. Corese can automate some transformations with inference rules (Corby et al 2002). As an example we can infer a property SemSNI:hasInteraction (SemSNI23 is an ontology of Social Network Interaction, see (Erétéo et al 2009)) between two actors when one has commented on the other’s resource using the following rule
The preceding syntax is specific to Corese, but the Rule Interchange Format24 (RIF) proposes XML dialects for exchanging rules on the semantic web and providing interoperability between the different inference engines. These dialects include in particular the Basic Logic Dialect (BLD) and the Production Rule Dialect (PRD). Another tool to leverage the social network representation is to process it with a SPARQL query using a construct block to generate RDF and enrich the social data with it (San Martin et al 2009) (Erétéo et al 2009). The following query produces the same result as the preceding example with a Corese rule:
Such queries produce RDF triples in accordance with the construct block, which can then be stored. Corese enables us to re-inject the knowledge produced directly into the knowledge base with an add clause. The following example highlights the enrichment of a social network, using SemSNA, with degrees computed in the select clause:
ADD {
  ?y semsna:hasSNAConcept _:b .
  _:b rdf:type semsna:Degree .
  _:b semsna:isDefinedForProperty rel:family .
  _:b semsna:hasValue ?degree
}
SELECT ?y count(?x) as ?degree where {
  { ?x rel:family ?y } UNION { ?y rel:family ?x }
} group by ?y
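The effect of such an enrichment can be sketched in plain Python, with triples held as tuples. This is an illustration of the idea rather than of Corese's add clause: the helper name `enrich_with_degree`, the `ex:` actors and the `_:b0`, `_:b1`, ... node labels are assumptions made for the example, while the SemSNA property names are those used in the query above. One fresh annotation node is created per actor.

```python
from collections import Counter
from itertools import count

def enrich_with_degree(triples, relation="rel:family"):
    """Return new triples annotating each actor with its degree for `relation`."""
    degree = Counter()
    for subject, prop, obj in triples:
        if prop == relation:
            degree[subject] += 1
            degree[obj] += 1
    fresh = count()  # yields 0, 1, 2, ... to label one new node per actor
    added = []
    for actor, value in sorted(degree.items()):
        node = f"_:b{next(fresh)}"
        added += [
            (actor, "semsna:hasSNAConcept", node),
            (node, "rdf:type", "semsna:Degree"),
            (node, "semsna:isDefinedForProperty", relation),
            (node, "semsna:hasValue", value),
        ]
    return added

graph = [
    ("ex:paul", "rel:family", "ex:jack"),
    ("ex:jack", "rel:family", "ex:peter"),
]
# Re-inject the computed annotations into the same graph.
graph += enrich_with_degree(graph)
for triple in graph:
    print(triple)
```

Once the values are stored in the graph, later queries can filter on `semsna:hasValue` instead of recounting relations.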
Wrap XML and SQL Social Data with RDF
We used Corese to query social data stored in a relational database or in XML (most web 2.0 social data are exposed in XML through RESTful APIs) and to turn them into RDF/XML. While some researchers, like (Akhtar et al. 2008), proposed a solution with the XSPARQL language for turning XML data into RDF without the need for costly XSLT transformations, Corese proposes a different approach: an extension that enables us to nest an SQL query or an XQuery within SPARQL (Corby et al 2009). This is done by means of the sql() (respectively XPath) function, which returns a sequence of results for each variable in the SQL select clause (respectively each result of the node-set). Corese proposes an extension to the standard SPARQL select clause that enables binding these results to a list of variables. In the following example, we show how we retrieve the friend relationships from the relational database using this sql() function:
construct { ?id1 foaf:knows ?id2 }
select sql(<server>, , <user>, , ‘SELECT user1_id, user2_id from relations’) as (?id1, ?id2)
where { }
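For readers without Corese at hand, the same wrapping step can be approximated with Python's standard sqlite3 module. This is a sketch under assumptions, not the chapter's implementation: the in-memory database, the `http://example.org/user/` base IRI and the helper `rows_to_triples` are invented for the example, and the table and column names are borrowed from the query above. Each row of the relational result becomes one foaf:knows statement, printed in N-Triples form.

```python
import sqlite3

FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
BASE = "http://example.org/user/"  # hypothetical namespace for user IRIs

def rows_to_triples(connection, query, predicate=FOAF_KNOWS, base=BASE):
    """Run a two-column SQL query and wrap each row as an RDF-like triple."""
    return [(f"{base}{a}", predicate, f"{base}{b}")
            for a, b in connection.execute(query)]

# Small in-memory stand-in for the relational store of the social site.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relations (user1_id INTEGER, user2_id INTEGER)")
conn.executemany("INSERT INTO relations VALUES (?, ?)", [(1, 2), (2, 3)])

triples = rows_to_triples(
    conn, "SELECT user1_id, user2_id FROM relations ORDER BY user1_id")
conn.close()

# Serialize as N-Triples, one statement per line.
for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")
```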
Experiment on a Real Online Social Network
We conducted an experiment on an anonymized dataset of Ipernity.com25, one of the largest French social networks centered on multimedia sharing. This dataset contains 61,937 actors, 494,510 declared relationships of three types and millions of interactions (messages, comments on resources, etc.). Ipernity.com offers its users several options for building their social network and sharing multimedia content. Every user can share pictures, videos and music files, create a blog and a personal profile page, and comment on others’ shared resources. Every resource can be tagged and shared. For building the social network, users can specify the type of relationship they have with others: friend, family, or simple contact (like a favourite you follow). Relationships are not symmetric: Fabien can declare a relationship with Michel, but Michel can declare a different type of relationship with Fabien or not have him in his contact list at all; thus we have a directed labelled graph. Users have a homepage containing their profile information and pointers to the resources they share. Users can post on their profile and on their contacts’ profiles, depending on the access rights. All these resources can be tagged, including the homepage. A publisher can configure the access to a resource to make it public, private or accessible only to a subset of its contacts, depending on the type of relationship (family, friend or simple contact), and can monitor who visited it. Groups can also be created for topics of discussion, with three kinds of visibility: public (all users can see it and join), protected (visible to all users, invitation required to join) or private (invitation required to join and consult). We analyzed the three types of relations separately (favourite, friend and family) and also used polymorphic queries to analyze them as a whole using their super property: foaf:knows.
We also analyzed the interactions produced by exchanges of private messages between users, as well as
those produced by someone commenting on someone else’s documents. We first applied quantitative metrics to get relevant information on the structure of the network and its activities: the number of links and actors, the components and the diameters. 61,937 actors are involved in a total of 494,510 relationships. These relationships break down into 18,771 family relationships between 8,047 actors, 136,311 friend relationships involving 17,441 actors and 339,428 favourite relationships for 61,425 actors. These first metrics show that the semantics of relations are globally respected, as family relations are used less than friend and favourite relations. 7,627 actors have interacted through 2,874,170 comments and 22,500 have communicated through 795,949 messages. All these networks are composed of a largest component containing most of the actors (fig 5) and a few very small components (fewer than 100 actors), which shows “the effectiveness of the social network at doing its job” (Newman 2003), i.e., at connecting people. The interaction sub-networks have a very small diameter (3 for comments and 2 for messages) due to their high density. The family network has a high diameter (19), consistent with its low density. However, the friend and favourite networks have a low density and a low diameter, revealing the presence of highly intermediary actors. The betweenness and degree centralities confirm this last remark. The favourite network is highly centralized: five actors have a betweenness centrality higher than 0, with a dramatically higher value for one of them, who has a betweenness centrality of 1,999,858 while the other four have a value between 2.5 and 35. This highest value is attributed to the official animator of the social network, who has a favourite relationship26 with most actors of the network, giving him the highest degree: 59,301.
In the friend network 1,126 actors have a betweenness centrality going from 0 to 96,104 forming a long tail, with only 12 with a value higher than 10,000.
These actors don’t include the animator, showing that the friend network has been well adopted by users. The family network has 862 actors with a betweenness centrality from 0 to 162,881, with 5 values higher than 10,000. Only one actor is highly intermediary in both the friend and family networks. The centralization of these three networks presents significant differences, showing that the semantics of relations have an impact on the structure of the social network. The betweenness centralities of all the relations, computed using the polymorphism in SPARQL queries with the “knows” property, highlight both the importance of the animator, who again has by far the highest centrality, and the appropriation of the network by users, with 186 actors playing the role of intermediary. The employees of Ipernity.com have validated these interpretations of the metrics that we computed, showing the effectiveness of a social network analysis that exploits the semantic structure of relationships. The Corese engine works in main memory, and such an amount of data is memory consuming. The 494,510 relations declared between 61,937 actors use 4.9 GB. The annotations of all messages use 14.7 GB and the representation of documents with their comments uses 27.2 GB. On the other hand, working in main memory allows us to process the network very rapidly. Path computation is also time and space consuming, and some queries had to be limited to a maximum number of graph projections when too many paths could be retrieved. However, in that case, approximations are sufficient to obtain relevant metrics on a social network, e.g., for centralities (Brandes & Pich 2007). Moreover, we can limit the length of the paths we are looking for by using other metrics. For example, we limit the depth of paths to be smaller than or equal to the diameter of the components when computing shortest paths.
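The structural metrics discussed above, and the use of the diameter as a depth bound, can be sketched in a few lines of Python. The example makes simplifying assumptions that differ from the experiment: the graph is tiny, undirected and held in a dictionary, the function names are chosen for the illustration, and the diameter is obtained exactly with one breadth-first search per node, which would not scale to the full dataset.

```python
from collections import deque

def undirected(edges):
    """Build an adjacency map from undirected (a, b) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def bfs_distances(adjacency, source, max_depth=None):
    """Shortest-path lengths from `source`, optionally cut off at `max_depth`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if max_depth is not None and dist[node] == max_depth:
            continue  # do not expand paths beyond the depth bound
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def density(adjacency):
    """Share of the possible undirected links that are actually present."""
    n = len(adjacency)
    links = sum(len(neighbours) for neighbours in adjacency.values()) / 2
    return links / (n * (n - 1) / 2) if n > 1 else 0.0

def diameter(adjacency):
    """Longest shortest path found inside any component."""
    return max(max(bfs_distances(adjacency, n).values()) for n in adjacency)

# Two components: a path a-b-c-d and an isolated pair x-y.
graph = undirected([("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")])
bound = diameter(graph)
print("density:", density(graph))
print("diameter:", bound)
# No shortest path can be longer than the diameter, so the bound loses nothing.
print(bfs_distances(graph, "a", max_depth=bound))
# A tighter bound trades completeness for less exploration.
print(bfs_distances(graph, "a", max_depth=2))
```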
Toward an Efficient Navigation of the Social Capital
The framework we presented enables analyzing the richly typed representations of semantic social networks and managing the diversity of interactions and relationships with parametrized SNA metrics. The exploitation of these semantic-based SNA metrics permits structuring overwhelming flows of corporate social activities. The amount of metadata used to organize content will continue to increase, as the success of social-tagging based systems shows. Current methods are still limited in structuring this data and exploiting it for the analysis of social networks. As we have shown, combining semantic tools and methods with a collaborative approach is a promising track which needs to be further explored. Several challenges have to be tackled to provide efficient exploitation of the social capital (Lin 2008) (Krebs 2008) built through online collaboration, and to foster social interactions. First, computation is time consuming, and even though Corese runs in main memory, the experiments reported in this chapter show that handling a network with millions of actors is out of our reach today. Different approaches can be investigated to address that problem: (1) identifying computation techniques that are iterative, parallelizable, etc.; (2) identifying approximations that can be used and the conditions under which they provide good quality results; (3) identifying graph characteristics (small worlds, diameters, etc.) that can help us cut the calculation space and time for the different operators. Social web applications permit publishing, sharing and connecting so easily that a huge amount of social data is permanently produced, with a potential impact on the structure of the social network and the importance of its actors. Even if Corese enables loading data into the graph of a running engine, the computing cost and the volume of the data suggest measuring only the relevant impacted metrics that change significantly.
Consequently methods need to be developed to handle and quantify the impact of new social data on a semantic social graph. Furthermore, community detection is one of the main focuses of social network analysis. Existing algorithms are based on heuristics to detect densely connected and cohesive groups of actors. But these algorithms are once again only based on the structure of the social network and they discard the semantic primitives used to type both relationships and actors. This lost knowledge could be used to determine semantically the cohesiveness of a community, to propose algorithms based on sociological definitions and to focus on relevant elements of the social graph for more efficient computation.
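As one possible illustration of the approximation track mentioned among the approaches above, the following Python sketch estimates a closeness-type index by sampling a few pivot nodes instead of traversing the graph from every node, in the spirit of the pivot-based estimation studied by Brandes & Pich (2007). It is a toy under stated assumptions: a connected, undirected graph given as an adjacency dictionary, uniformly sampled pivots, a fixed random seed, and function names invented for the example.

```python
import random
from collections import deque

def distances_from(adjacency, source):
    """Breadth-first shortest-path lengths from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def estimated_mean_distance(adjacency, pivots):
    """Average distance from each node to the given pivots.

    Assumes a connected, undirected graph, so the distance from a pivot
    to a node equals the distance from the node to the pivot.
    """
    totals = {node: 0 for node in adjacency}
    for pivot in pivots:
        for node, d in distances_from(adjacency, pivot).items():
            totals[node] += d
    return {node: total / len(pivots) for node, total in totals.items()}

# A path a-b-c-d-e: node c is the most central, a and e the least.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
        "d": ["c", "e"], "e": ["d"]}

rng = random.Random(0)                # fixed seed so runs are repeatable
sample = rng.sample(sorted(path), 2)  # 2 traversals instead of 5
approx = estimated_mean_distance(path, sample)
exact = estimated_mean_distance(path, sorted(path))  # every node as pivot
print("pivots:", sample)
print("approx:", approx)
print("exact: ", exact)
```

A lower mean distance indicates a more central actor. How many pivots are needed for a given quality depends on the graph, which is precisely the kind of condition the second approach calls for identifying.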
PERSPECTIVES: BUILDING “SHARED KNOWLEDGE GRAPHS”
As discussed above, we need to enrich with semantics the simple representations of social networks and of the content their users share, in order to fully exploit the wealth of data and interactions on the web. Doing so could consist in building “shared knowledge graphs” which help users find relevant resources or persons. In the field of knowledge management, this was the idea behind Topic Maps and the ontologies of the semantic web: they were thought of as knowledge representations capable of grasping the multi-dimensionality of the information we exchange (see Baget et al. (2008) for an overview of the different knowledge representations based on graphs). These shared knowledge graphs can be seen as a generalization of these two types of knowledge representation, with a focus on the shareable features and the ability for both machines and humans to exploit them at different levels of functionality. Folksonomies are a recent example of the “shared knowledge structures” which have emerged from web 2.0 applications as an affordable way to massively categorize resources.
In order to map the knowledge exchanged by Web communities, several challenges have to be addressed. First, for interoperability purposes we need to find a good balance in the standardization of the many ways of describing content on the Web. The “Web of Linked Data27” initiative proposes weaving a web of scattered sources of knowledge thanks to a combination of “good practices” and conceptual schemes describing them. Examples of such conceptual schemes can be seen in the formal ontologies presented in section “representing social data with semantic web frameworks”, which describe content exchanged by and within on-line communities. These types of approaches are a good start as they already assist users in identifying, for instance, all the content posted by a user across multiple sites, but we are still missing tools and methods to connect communities at a semantically richer level (Newell, 1982). The next step lies in enriching the semantics by which we intend to map contents from multiple platforms. A possible means to achieve this consists in “shared knowledge hubs”. The DBpedia project (Auer et al., 2007) is an example of such a hub, as it proposes expressing the knowledge structure of Wikipedia pages in machine processable data. By doing so, they provide a sort of common reference (the hierarchically organized Wikipedia Categories for instance) to which we can start connecting more elaborate “knowledge graphs”. Of course, these common references are not sufficient to describe each community’s field of knowledge, but they provide common terminologies, which need not be exclusive, and to which it is possible to hook more specific terms. The “Web of linked Data” is made of multiple webs of tacit bits of knowledge that are still today rarely explicitly expressed in both machine and human understandable representations. 
Web 2.0 applications and folksonomies have led to novel user experiences and yielded rich materials which are still missing appropriate representations to be efficiently browsed. This goal
can be achieved by developing tools to assist community members to connect their own knowledge categories to common references. For instance, current terminology extractors can be exploited in the context of folksonomies in order to detect common taxonomy categories among the tags, and to propose to the contributors of these folksonomies to map their tags with these categories, or to create new ones when needed (Passant & Laublet, 2008). The semantic structure of the folksonomies could also combine automatic inferences with the expertise of the users by integrating the validation of these inferences within the “natural” use of the systems. This aspect opens up new perspectives to create novel interfaces to knowledge repositories that exploit the best of semantic technologies and the dynamism of the social web.
CONCLUSION
Through the example of the Business Intelligence Process, we highlighted that the systematic exploitation of information to foster economic performance and facilitate decision making is one of the keys to success for all organizations worldwide. The progressive integration of successful web 2.0 applications into intranets to foster collaboration and knowledge sharing offers new perspectives for the competitiveness of innovative enterprises. Every user of the intranet becomes an actor of a collective watch by organizing, sharing, producing and enriching information as a side effect of using social applications. Semantic web frameworks provide models to connect and exchange the social data and the knowledge embedded in the social network, spread in a collaborative intranet. The semantic enrichment of social data such as folksonomies in intranets involves all the collaborators in an efficient elaboration of a shared and structured corporate vocabulary. The semantic SNA stack provides a way to fully exploit the RDF representations of online interactions and to enhance the social data with contextualized SNA features. These semantic intranets of people, combined with semantic descriptions of the knowledge they exchange, will allow for the construction of shared knowledge graphs. This will help to efficiently manipulate the overwhelming flow of data of a semantic intranet of people. An effective approach to building these shared knowledge graphs and to turning on-line social experiences into collective intelligence will permit efficiently capturing and managing the social capital embedded in the network structure of “knowledge workers” collaboration.
REFERENCES
Abbattista, F., Gendarmi, D., & Lanubile, F. (2007). Fostering knowledge evolution through community-based participation. In Workshop on Social and Collaborative Construction of Structured Knowledge (CKC 2007) at WWW 2007, Banff, Canada.
Adida, B. (2008). hGRDDL: Bridging microformats and RDFa. Journal of Web Semantics, 6(1), 61–69.
Akhtar, W., Kopecky, J., Krennwallner, T., & Polleres, A. (2008). XSPARQL: Traveling between the XML and RDF worlds - and avoiding the XSLT pilgrimage. In Proceedings of the 5th European Semantic Web Conference (ESWC2008) (pp. 432-447). Tenerife, Spain. Springer.
Alkhateeb, F., Baget, J.-F., & Euzenat, J. (2007). RDF with Regular Expressions. INRIA RR-6191. Retrieved from http://hal.inria.fr/inria-00144922/en
Angeletou, S., Sabou, M., & Motta, E. (2008). Semantically enriching folksonomies with FLOR. In CISWeb Workshop at Europ. Semantic Web Conf.
Anyanwu, M., Maduko, A., & Sheth, A. (2007). SPARQ2L: Towards Support for Subgraphs Extraction Queries in RDF Databases. In Proc. WWW2007.
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., & Ives, Z. G. (2007). DBpedia: A nucleus for a web of open data. In ISWC/ASWC (LNCS 4825, pp. 722–735).
Bader, D. A., & Madduri, K. (2006). Parallel algorithms for evaluating centrality in real-world networks. ICPP2006.
Baget, J., Corby, O., Dieng-Kuntz, R., Faron-Zucker, C., Gandon, F., Giboin, A., Gutierrez, A., Leclère, M., Mugnier, M., & Thomopoulos, R. (2008). Griwes: Generic Model and Preliminary Specifications for a Graph-Based Knowledge Representation Toolkit. In Proc. of the 16th International Conference on Conceptual Structures (ICCS’2008).
Begelman, G., Keller, P., & Smadja, F. (2006). Automated tag clustering: Improving search and exploration in the tag space.
Benjamins, V. R., & Musen, M. A. (Eds.). (2005). The Semantic Web. In Proceedings of the 4th International Semantic Web Conference, ISWC2005 (LNCS 3729, pp. 522–536).
Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web. Scientific American.
Bojars, U., Breslin, J. G., Finn, A., & Decker, S. (2008). Using the Semantic Web for linking and reusing data across Web 2.0 communities. Journal of Web Semantics, 6, 21–28. doi:10.1016/j.websem.2007.11.010
Brandes, U. (2001). A faster algorithm for betweenness centrality. The Journal of Mathematical Sociology, 25(2), 163–177. doi:10.1080/0022250X.2001.9990249
Brandes, U., & Pich, C. (2007). Centrality estimation in large networks. J. Bifurcation and Chaos in Applied Sciences and Engineering, 17(7), 2303–2318. doi:10.1142/S0218127407018403
Braun, S., Schmidt, A., Walter, A., Nagypál, G., & Zacharias, V. (2007). Ontology maturing: a collaborative web 2.0 approach to ontology engineering. In CKC, volume 273 of CEUR Workshop Proceedings. CEUR-WS.org.
Breslin, J., Harth, A., Bojars, U., & Decker, S. (2005). Towards Semantically-Interlinked Online Communities. ESWC.
Buffa, M., Erétéo, G., Faron-Zucker, C., Gandon, F., & Sander, P. (2008). SweetWiki: A Semantic Wiki. Journal of Web Semantics, 6(1). doi:10.1016/j.websem.2007.11.003
Burt, R. S. (1992). Structural Holes. New York: Cambridge University Press.
Burt, R. S. (2001). Structural Holes versus Network Closure as Social Capital. In Lin, N., Cook, K., & Burt, R. S. (Eds.), Social Capital: Theory and Research (pp. 31–56). Aldine de Gruyter.
Burt, R. S. (2004). Structural Holes and Good Ideas. American Journal of Sociology, 100(2), 339–399.
Cattuto, C., Benz, D., Hotho, A., & Stumme, G. (2008). Semantic grounding of tag relatedness in social bookmarking systems. 7th International Semantic Web Conference, Karlsruhe, Germany.
Coleman, J. S. (1988). Social Capital in the creation of the human capital. [Supplement: Organisations and institutions: Sociological and Economic Approaches to the Analysis of Social Structure.]. American Journal of Sociology, 94.
Conein, B. (2004). Communautés épistémiques et réseaux cognitifs: coopération et cognition distribuée. Revue d’Economie Politique, 113, 141–159.
Corby, O. (2008). Web, Graphs & Semantics. In Proc. of the 16th International Conference on Conceptual Structures (ICCS’2008), Toulouse, France.
Corby, O., Dieng-Kuntz, R., & Faron-Zucker, C. (2004). Querying the semantic web with the Corese search engine. ECAI/PAIS2004.
Corby, O., & Faron-Zucker, C. (2002). Corese: A Corporate Semantic Web Engine. Workshop on Real World RDF and Semantic Web Applications.
Corby, O., Kefi-Khelif, L., Cherfi, H., Gandon, F., & Khelif, K. (2009). Querying the Semantic Web of Data using SPARQL, RDF and XML (INRIA Research Report n°6847).
Donetti, L., & Munoz, M. A. (2004). Detecting communities: a new systematic and efficient algorithm. Journal of Statistical Mechanics, P10012. doi:10.1088/1742-5468/2004/10/P10012
Erétéo, G., Buffa, M., Gandon, F., & Corby, O. (2009). Analysis of a Real Online Social Network Using Semantic Web Frameworks. International Semantic Web Conference 2009.
Erétéo, G., Buffa, M., Gandon, F., Grohan, P., Leitzelman, M., & Sander, P. (2008). A State of the Art on Social Network Analysis and its Applications on a Semantic Web. SDoW2008 (Social Data on the Web), Workshop at the 7th International Semantic Web Conference.
Euzenat, J., & Shvaiko, P. (2007). Ontology Matching. Berlin, Heidelberg: Springer.
Fortunato, S., Latora, V., & Marchiori, M. (2004). Method to find community structures based on information centrality. Phy. Rev. E, 70(5), 056104. doi:10.1103/PhysRevE.70.056104
Freeman, L. C. (1979). Centrality in social Networks: Conceptual Clarification. Social Networks, 1, 215–239. doi:10.1016/0378-8733(78)90021-7
Freeman, L. C., & Borgatti, S. P. (1991). Centrality in valued graphs: A measure of betweenness based on network flow. Social Networks, 13, 141–154. doi:10.1016/0378-8733(91)90017-N
Geisberger, R., Sanders, P., & Schultes, D. (2008). Better approximation of betweenness centrality. ALENEX08.
Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 99(12). doi:10.1073/pnas.122653799
Girvan, M., & Newman, M. E. J. (2004). Finding and evaluating community structure in networks. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 69, 026113. doi:10.1103/PhysRevE.69.026113
Golbeck, J., Parsia, B., & Hendler, J. (2003). Trust network on the semantic web. CIA03.
Golbeck, J., & Rothstein, M. (2008). Linking social networks on the web with FOAF. AAA08.
Golder, S., & Huberman, B. A. (2005). The structure of collaborative tagging systems.
Good, B., Kawas, E., & Wilkinson, M. (2007). Bridging the gap between Social Tagging and Semantic Annotation: E. D. the Entity Describer. Available from Nature Proceedings.
Gruber, T. (1993). A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 5(2), 199–220. doi:10.1006/knac.1993.1008
Gruber, T. (2005). Ontology of folksonomy: A mash-up of apples and oranges. MTSR2005.
Gruber, T. (2008). Collective knowledge systems: Where the Social Web meets the Semantic Web. Journal of Web Semantics, 6, 4–13.
Gruber, T. (2009). Ontology. In Liu, L., & Tamer Özsu, M. (Eds.), Encyclopedia of Database Systems. Springer-Verlag.
Hendler, J., & Golbeck, J. (2008). Metcalfe’s law, web 2.0 and the Semantic Web. Journal of Web Semantics, 6(1), 14–20. doi:10.1016/j.websem.2007.11.008
Henri, F., & Pudelko, B. (2003). Understanding and analyzing activity and learning in virtual communities. Journal of Computer Assisted Learning, 19, 474–487. doi:10.1046/j.0266-4909.2003.00051.x
Heymann, P., & Garcia-Molina, H. (2006). Collaborative Creation of Communal Hierarchical Taxonomies in Social Tagging Systems. Internal report. Stanford InfoLab.
Holme, P., Kim, B. J., Yoon, C. N., & Han, S. K. (2002). Attack vulnerability of complex networks. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 65, 056109. doi:10.1103/PhysRevE.65.056109
Hotho, A., Jäschke, R., Schmitz, C., & Stumme, G. (2006). Information Retrieval in Folksonomies: Search and Ranking.
Huynh-Kim Bang, B., Dané, E., & Grandbastien, M. (2008). Merging semantic and participative approaches for organising teachers’ documents. In Proceedings of ED-MEDIA 08 - World Conference on Educational Multimedia, Hypermedia & Telecommunications (pp. 4959–4966). Vienne France.
Khare, R., & Celik, T. (2006). Microformats: a pragmatic path to the Semantic Web. WWW2006.
Kim, H., Yang, S., Song, S., Breslin, J. G., & Kim, H. (2007). Tag Mediated Society with SCOT Ontology. ISWC2007.
Kochut, K. L., & Janik, M. (2007). SPARQLeR: Extended SPARQL for Semantic Association Discovery. In Proc. ESWC2007.
Krebs, V. (2008). Social Capital: the Key to Success for the 21st Century Organization. IHIRM Journal, 12(5), 38–42.
Latora, V., & Marchiori, M. (2007). A measure of centrality based on the network efficiency. New Journal of Physics, 9(6), 188. doi:10.1088/1367-2630/9/6/188
Levenshtein, V. I. (1966). Binary codes capable of correcting deletions, insertions and reversals. Soviet Physics, Doklady, 10(8), 707–710.
Limpens, F., Gandon, F., & Buffa, M. (2008). Bridging Ontologies and Folksonomies to Leverage Knowledge Sharing on the Social Web: a Brief Survey. In Proc. 1st International Workshop on Social Software Engineering and Applications (SoSEA), L’Aquila, Italy.
Lin, N. (2008). A network theory of social capital. In Castiglione, D., van Deth, J. W., & Wolleb, G. (Eds.), Handbook on Social Capital. Oxford University Press.
Markines, B., Cattuto, C., Menczer, F., Benz, D., Hotho, A., & Stumme, G. (2009). Evaluating similarity measures for emergent semantics of social tagging. In 18th International World Wide Web Conference (pp. 641–641).
Martin, A. C. (2005). From high maintenance to high productivity: What managers need to know about Generation Y. Industrial and Commercial Training, 37(1), 39–44. doi:10.1108/00197850510699965
Mathes, A. (2004). Folksonomies - Cooperative Classification and Communication Through Shared Metadata. Internal report. GSLIS, Univ. Illinois Urbana-Champaign.
McAfee, A.-P. (2006). Enterprise 2.0: The Dawn of Emergent Collaboration. MIT Sloan Management Review, Management of Technology and Innovation.
Mika, P. (2005). Ontologies are us: A unified model of social networks and semantics. Journal of Web Semantics, 5(1), 5–15. doi:10.1016/j.websem.2006.11.002
Milgram, S. (1967). The Small World Problem. Psychology Today, 1(1), 61–67.
Morville, P. (2004). Ambient findability. Digital Web Magazine. Retrieved from http://www.digital-web.com/articles/ambient_findability/
Newell, A. (1982). The Knowledge Level. Artificial Intelligence, 18, 87–127. doi:10.1016/0004-3702(82)90012-1
Newman, M. E. J. (2001). Scientific collaboration networks: Shortest paths, weighted networks, and centrality. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 64, 016132. doi:10.1103/PhysRevE.64.016132
Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review, 45, 167–256. doi:10.1137/S003614450342480
Newman, M. E. J. (2004). Detecting community structure in networks. Eur. Phys. J. B, 38, 321–330.
Newman, R., Ayers, D., & Russell, S. (2005). Tag Ontology Design. Retrieved from http://www.holygoat.co.uk/owl/redwood/0.1/tags/
Paolillo, J. C., & Wright, E. (2006). Social Network Analysis on the Semantic Web: Techniques and Challenges for Visualizing FOAF. Visualizing the semantic Web Xml-based Internet And Information.
Park, J., & Hunting, S. (2002). XML Topic Maps: Creating and Using Topic Maps for the Web. Addison-Wesley Professional.
Passant, A. (2007). Using Ontologies to Strengthen Folksonomies and Enrich Information Retrieval in Weblogs. In International Conference on Weblogs and Social Media.
Passant, A. (2009). Technologies du Web Sémantique pour l’Entreprise 2.0. PhD thesis, Université Paris IV - Sorbonne.
Passant, A., & Laublet, P. (2008). Meaning of a Tag: A Collaborative Approach to Bridge the Gap Between Tagging and Linked Data. LDOW2008.
Pérez, J., Arenas, M., & Gutierrez, C. (2008). nSPARQL: A Navigational Language for SPARQL. In Proceedings of ISWC 2008.
Pons, P., & Latapy, M. (2005). Computing communities in large networks using random walks. ISCIS2005.
Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., & Parisi, D. (2004). Defining and identifying communities in networks. Proceedings of the National Academy of Sciences of the United States of America, 101, 2658–2663. doi:10.1073/pnas.0400054101
Ronzano, F., Marchetti, A., & Tesconi, M. (2008). Tagpedia: a semantic reference to describe and search for web resources. In WWW 2008 Workshop on Social Web and Knowledge Management, Beijing, China.
Rowe, M., & Ciravegna, F. (2008). Getting to Me – Exporting Semantic Social Network Information from Facebook. In Proceedings of Social Data on the Web Workshop. ISWC 2008. Karlsruhe, Germany.
San Martin, M., & Gutierrez, C. (2009). Representing, Querying and Transforming Social Networks with RDF / SPARQL. ESWC09.
Santos, E. E., Pan, L., Arendt, D., & Pittkin, M. (2006). An Effective Anytime Anywhere Parallel Approach for Centrality Measurements in Social Network Analysis. IEEE2006.
Schmitz, P. (2006). Inducing ontology from flickr tags. In Proc. of the Collaborative Web Tagging Workshop (WWW06).
Schwarzkopf, E., Heckmann, D., Dengler, D., & Kroner, A. (2007). Mining the structure of tag spaces for user modeling. In Proceedings of the Workshop on Data Mining for User Modeling at the 11th International Conference on User Modeling (pp. 63–75).
Scott, J. (2000). Social Network Analysis, A handbook (2nd ed.). Thousand Oaks, CA: Sage.
Specia, L., & Motta, E. (2007). Integrating folksonomies with the semantic web. 4th European Semantic Web Conference.
Tanasescu, V., & Streibel, O. (2007). Emergent Semantics through the Tagging of Tags. In ESOE at ISWC. ExtremeTagging.
Tesconi, M., Ronzano, F., Marchetti, A., & Minutoli, S. (2008). Semantify del.icio.us: Automatically turn your tags into senses. In Proceedings of the First Social Data on the Web Workshop (SDoW2008).
Torniai, C., Jovanovic, J., Bateman, S., Gašević, D., & Hatala, M. (2008). Leveraging folksonomies for ontology evolution in e-learning environments. In ICSC ’08: Proceedings of the 2008 IEEE International Conference on Semantic Computing (pp. 206–213). Washington, DC, USA: IEEE Computer Society.
Van Damme, C., Hepp, M., & Siorpaes, K. (2007). An integrated approach for turning folksonomies into ontologies. In Bridging the Gap between Semantic Web and Web 2.0 (SemNet 2007) (pp. 57–70). Folksontology.
Vanderwal, T. (2004). Folksonomy Coinage and Definition. Retrieved from http://www.vanderwal.net/folksonomy.html
Veres, C. (2006). The language of folksonomies: What tags reveal about user classification. In Natural Language Processing and Information Systems (LNCS 3999, pp. 58-69).
Vidou, G., Dieng-Kuntz, R., El Ghali, A., Evangelou, C., Giboin, A., Tifous, A., & Jacquemart, S. (2006). Towards an Ontology for Knowledge Management in Communities of Practice.
Wasserman, S., Faust, K., Iacobucci, D., & Granovetter, M. (1994). Social Network Analysis: Methods and Applications. Cambridge University Press.
Weller, K., & Peters, I. (2008). Seeding, weeding, fertilizing. different tag gardening activities for folksonomy maintenance and enrichment. In S. Auer, S. Schaffert & T. Pellegrini (Eds.), Proceedings of I-Semantics08, International Conference on Semantic Systems (pp. 10-117). Graz, Austria, September 3-5.
Wellman, B. (1996). For a social network analysis of computer networks: a sociological perspective on collaborative work and virtual community. In Proceedings of the ACM SIGCPR/SIGMIS conference on Computer personnel research (pp. 1-11). Denver, Colorado, USA.
Wilkinson, D. M., & Huberman, B. A. (2003). A method for finding communities of related genes. Proc. Natl. Acad. Sci.
Wu, F., & Huberman, B. A. (2004). Finding communities in linear time: a physics approach. HP Labs.
Zhou, H., & Lipowsky, R. (2004). Network Brownian motion: A new method to measure vertex-vertex proximity and to identify communities and subcommunities.
ADDITIONAL READING
Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web. Scientific American.
Bojars, U., Breslin, J. G., Finn, A., & Decker, S. (2008). Using the Semantic Web for linking and reusing data across Web 2.0 communities. Journal of Web Semantics, 6, 21–28. doi:10.1016/j.websem.2007.11.010
Corby, O., Dieng-Kuntz, R., & Faron-Zucker, C. (2004). Querying the semantic web with the corese search engine. ECAI/PAIS2004.
Erétéo, G., Buffa, M., Gandon, F., & Corby, O. (2009). Analysis of a Real Online Social Network Using Semantic Web Frameworks. International Semantic Web Conference 2009.
Golbeck, J., Parsia, B., & Hendler, J. (2003). Trust network on the semantic web. CIA03.
Golbeck, J., & Rothstein, M. (2008). Linking social Networks on the web with FOAF. AAA08.
Gruber, T. (2008). Collective knowledge systems: Where the Social Web meets the Semantic Web. Journal of Web Semantics, 6, 4–13.
Hendler, J., Shadbolt, N., Hall, W., Berners-Lee, T., & Weitzner, D. (2008). Web science: an interdisciplinary approach to understanding the web. Communications of the ACM, 51(7), 60–69. doi:10.1145/1364782.1364798
Krebs, V. (2008). Social Capital: the Key to Success for the 21st Century Organization. IHIRM Journal, 12(5), 38–42.
Limpens, F., Gandon, F., & Buffa, M. (2008). Bridging Ontologies and Folksonomies to Leverage Knowledge Sharing on the Social Web: a Brief Survey. In Proc. 1st International Workshop on Social Software Engineering and Applications (SoSEA), L’Aquila, Italy.
Markines, B., Cattuto, C., Menczer, F., Benz, D., Hotho, A., & Stumme, G. (2009). Evaluating similarity measures for emergent semantics of social tagging. In 18th International World Wide Web Conference (p. 641).
Mathes, A. (2004). Folksonomies - Cooperative Classification and Communication Through Shared Metadata. Rapport interne. Illinois Urbana-Champaign: GSLIS, Univ.
McAfee, A.-P. (2006). Enterprise 2.0: The Dawn of Emergent Collaboration. MIT Sloan Management Review, Management of Technology and Innovation, April 1.
Mika, P. (2007). Social Networks and the Semantic Web. Semantic web and beyond, 7.
Paolillo, J. C., & Wright, E. (2006). Social Network Analysis on the Semantic Web: Techniques and Challenges for Visualizing FOAF. Visualizing the semantic Web Xml-based Internet And Information.
San Martin, M., & Gutierrez, C. (2009). Representing, Querying and Transforming Social Networks with RDF / SPARQL. ESWC09.
Scott, J. (2000). Social Network Analysis, A Handbook (2nd ed.). Thousand Oaks, CA: Sage.
Wellman, B. (1996). For a social network analysis of computer networks: a sociological perspective on collaborative work and virtual community. In Proceedings of the ACM SIGCPR/SIGMIS conference on Computer personnel research (pp. 1-11). Denver, Colorado, United States.
Zhou, H., & Lipowsky, R. (2004). Network Brownian motion: A new method to measure vertex-vertex proximity and to identify communities and subcommunities. ICCS 2004.
ENDNOTES
1 Metcalfe’s law states that the useful power of a network multiplies rapidly as the number of users of the network increases, “The community value of a network grows as the square of the number of its users”
2 http://www.socialtext.com/
3 http://www.bluekiwi-software.com/
4 Semantic Web, W3C, http://www.w3.org/2001/sw/
5 FOAF, Friend Of A Friend, http://www.foafproject.org/
6 RELATIONSHIP, http://vocab.org/relationship/
7 SIOC, Semantically Interlinked Online Communities, http://sioc-project.org/
8 SKOS, Simple Knowledge Organization System, http://www.w3.org/2004/02/skos/
9 SIOC, Semantically Interlinked Online Communities, http://sioc-project.org/node/158
10 http://microformats.org
11 Semantic Cloud Of Tags, http://scot-project.org/
12 http://sioc-project.org/
13 Meaning Of A Tag
14 http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/
15 http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/
16 ADEME (www.ademe.fr) has a distributed network of experts who compile data related to renewable energies (in particular for home use). They also answer questions (by email, phone) and exploit a knowledge base with simple keyword-based queries. Data is indexed using a thesaurus whose evolution is problematic.
17 http://ns.inria.fr/semsna/2009/06/21/voc
18 http://www.w3.org/2009/sparql/wiki/Main_Page
19 http://jena.sourceforge.net/ARQ/property_paths.html
20 http://exmo.inrialpes.fr/software/psparql/
21 http://www.w3.org/TR/2004/REC-owl-guide-20040210/#SimpleProperties
22 Slide, Owl in one, by F. Gandon, http://twitpic.com/60pdy
23 SemSNA is an ontology that describes concepts of Social Network Analysis, while SemSNI is used for representing interactions in a social network. For example, SemSNA can be used to compute centrality of nodes in a social network, nodes linked together using relations inferred from interactions from SemSNI (i.e. network of people who commented a same resource).
24 RIF Working Group, http://www.w3.org/2005/rules/wiki/RIF_Working_Group
25 http://www.ipernity.com
26 This animator is an employee of the company that animates the social network, he declares as favourite every user who just created an account and sends him welcome messages.
27 www.linkeddata.org.
Chapter 8
Using Social Network Analysis to Guide Theoretical Sampling in an Ethnographic Study of a Virtual Community
Enrique Murillo
Instituto Tecnológico Autónomo de México – ITAM, Mexico
ABSTRACT
Social Network Analysis (SNA) provides a range of models particularly well suited for mapping bonds between participants in online communities and thus revealing prominent members or subgroups. This can yield valuable insights for selecting a theoretical sample of participants or participant interactions in qualitative studies of communities. This chapter describes a procedure for collecting data from Usenet newsgroups, deriving the social network created by participant interaction, and importing this relational data into SNA software, where various cohesion models can be applied. The technique is exemplified by performing a longitudinal core-periphery analysis of a specific newsgroup, which identified core members and provided clear evidence of a stable online community. Discussions dominated by core members are identified next, to guide theoretical sampling of text-based interactions in an ongoing ethnography of the community.
DOI: 10.4018/978-1-60960-040-2.ch008
INTRODUCTION
The Usenet discussion network is a popular area of the Net, attracting participants from every corner of the world. It works as a public bulletin board, organized into topical discussion groups called newsgroups, whose number Turner, Smith, Fisher and Welser (2005) put at 150,000. It thus provides a convenient way for people with very diverse interests to find each other and discuss their passion. Usenet’s large base of participants, neatly organized into distinct knowledge realms, has resulted in the emergence of many topically focused virtual communities. Most are virtual communities of interest (Blanchard & Markus, 2004), focused on a particular passion or hobby, such as the Harry Potter books (alt.fan.harrypotter) or stamp collecting (rec.collecting.stamps.discuss). Others are more specialized communities serving specific professional groups; most deal with computer topics, but many others address fields far removed from computers, such as taxation issues (misc.taxes.moderated), medical transcription (sci.med.transcription) or farming (uk.business.agriculture).
This spontaneous forming of online communities explains why Usenet was an early platform for conducting unobtrusive naturalistic research of computer-mediated communications and the social environments that emerge from them (Lee, 2000). Since newsgroup messages are posted online for all to see, they constitute a publicly accessible record of discussions, offering a wealth of research data. A substantial portion of the extant literature about virtual communities is focused on Usenet newsgroups and mailing lists or listservs, a similar technology.
As for the newer Web 2.0 platforms, research about virtual communities is fairly recent, with blogs taking the lead. Various approaches have been proposed for identifying communities that form around linked blogs (e.g. Kumar, Novak, Raghavan, & Tomkins, 2005; Chin & Chignell, 2007; Chau & Xu, 2007), and there have been some qualitative studies of specific communities (e.g. Kaiser, Müller-Seitz, Pereira Lopes, & Pina e Cunha, 2007; Silva, Goel, & Mousavidin, 2008). There are fewer studies of wiki-based communities (e.g. Bryant, Forte, & Bruckman, 2005), and fewer still of communities based on social networking sites like Facebook or Twitter (e.g. Ellison, Steinfield, & Lampe, 2007), although the number of publications will undoubtedly grow in the coming years.
Since the aim of most studies of virtual communities is to describe a culture, they tend to rely on qualitative methods, which involve reading and analyzing samples of messages exchanged by community members. It is therefore important to provide a theory-grounded rationale for sample selection in order to avoid the trap of “anecdotalism” (Silverman, 2000). Howard (2002) proposes using Social Network Analysis (SNA) to identify significant members of the online community and to focus ethnographic analysis on this purposive sample.
This chapter describes the steps taken for theoretical sample selection in an ongoing ethnography of the newsgroup misc.taxes.moderated (henceforth MTM), which hosts a long-established community of practice (CoP) of tax professionals. The ethnography aims to provide a rich description of day-to-day interactions in the community. Theoretical sampling of relevant interactions was guided by a specific SNA technique, the continuous core-periphery model (Borgatti & Everett, 1999). The chapter describes the procedure used for data collection, social network analysis, identification of core members and theoretical sampling of discussion threads dominated by core members.
The chapter is organized as follows. Section 1 provides background research on virtual communities and applications of SNA to Usenet. Section 2 explains how participants form social networks in Usenet and how these can be derived. Section 3 demonstrates the power of this technique by performing a longitudinal core-periphery analysis of MTM spanning six years. The last section discusses the rationale and implicit assumptions of this approach and how it generalizes to other Internet platforms.
VIRTUAL COMMUNITIES AND QUALITATIVE RESEARCH
Studies of virtual or Internet-based communities began in the early nineties. Early examples are Rheingold’s (1993) book and North’s (1994) thesis about Usenet culture. Scholarly research grew rapidly in the following years (e.g. Jones, 1995), painting a broad picture of the characteristics of these communities, and the kinds of social interaction the plain-text medium of newsgroup or listserv messages can support.
Researchers were surprised to find that virtual communities can exhibit rich cultures (Baym, 1995; Tepper, 1997), develop a sense of community (Blanchard & Markus, 2004), give themselves rules and institutions to govern the common good (McLaughlin, Osborne, & Smith, 1995; Kollock & Smith, 1996; Denzin, 1999), provide social and emotional support (Winzelberg, 1997; Preece, 1999; Denzin, 1999; Pfeil & Zaphiris, 2007), sustain true personal “online” relationships (Parks & Floyd, 1996; Roberts, 1998; McKenna, Green, & Gleason, 2002), enable members to construct identities (Baym, 2000; Blanchard & Markus, 2004; Hara & Hew, 2007), and achieve a remarkable degree of altruistic cooperation (Kollock, 1999; Wasko & Faraj, 2000). Furthermore, practitioner or professional-oriented virtual communities can support sophisticated modes of technical and professional collaboration in topics as varied as public relations (Thomsen, 1996), nursing (Murray, 1996; Hara & Hew, 2007), cellular biology (Hengen, 1997), computer programming (Wasko & Faraj, 2000; Lee & Cole, 2003), taxes (Fuller, 1999; Murillo, 2008), law (Samborn, 1999; Wasko & Faraj, 2005), and handcrafts (Lovelace, 1998). As Wellman (1999, p. 15) aptly sums up: Usenet “supports emotional, nuanced and complex interactions, belying early fears that it would be useful only for simple, instrumental exchanges”.
Several studies reported that in most online communities a clear differentiation of active participants can be observed. Some are stable and prolific posters; others post only occasionally or even just once (McLaughlin et al., 1995; Tepper, 1997; Smith, 1999; Lee & Cole, 2003). These studies used the term “core” to refer to the subset of stable and frequent posters, who often dominate discussions, and heavily influence the identity, culture and day-to-day operation of the community. A particularly appropriate method for studying such differentiated patterns of interaction is SNA (Wasserman & Faust, 1994; Garton, Haythornthwaite, & Wellman, 1997; Scott, 2000).
The method adopts as its unit of analysis not individual social entities but the ties or relations between them, trying to explain social behavior as a result of the patterns of strong and weak ties between individual units and the constraints which follow from them (Wellman, 1988). The fact that SNA only requires relational data makes it well-suited to analyze a social space where personal attributes are often not readily disclosed, and only posted messages can be observed.
Still, the literature reveals relatively few studies that use SNA to empirically detect the social networks formed by distributed Usenet participants, especially inside individual newsgroups. Previous applications of SNA to Usenet have mostly focused on mapping the links cross-posted messages establish between different newsgroups (Smith, 1999; Donath, Karahalios, & Viégas, 1999; Sack, 2001; Choi & Danowski, 2002; Borgs, Chayes, Mahdian, & Saberi, 2004; McGlohon & Hurst, 2009). Studies that apply the SNA lens to individual newsgroups are fewer, and they have not focused on identifying a stable, persistent and active cluster of participants that can thus be considered a virtual community. Muncer, Loader, Burrows, Pleace and Nettleton (2000) used SNA to search for cliques in two newsgroups offering social support, one for people suffering from depression, the other for people with diabetes. The study found a three-member clique in the depression group, with a frequency of shared postings greater than or equal to six, but findings are limited by a small sample size of 61 complete threads. Welser, Gleave, Fisher and Smith (2007) used ego-centric social networks to identify two roles participants play within newsgroups: “answer persons”, who typically reply to discussion threads initiated by others, and “discussion persons”, who both initiate threads and participate in threads started by others. Zaphiris and Sarwar (2006) examined a sample of 200 messages each from alt.teens and soc.senior.issues.
They detected 14 cliques in the teenagers group, and 18 in the seniors group, but again, the small sample size raises concerns about the stability of these detected SNA structures. The present study uses a much larger sample to reveal stable online groupings and long-term trends.
Sowe, Stamelos and Angelis (2006) provide an interesting application of SNA to listservs that mirrors the logic of this study by targeting key participants. They examined three high-volume mailing lists devoted to the Debian operating system, derived the affiliation network of posters-to-lists, and thereby identified 15 “knowledge brokers” who linked and collaborated with members of all lists, thus acting as community facilitators or hubs. Through an e-mail survey, these brokers revealed they were long-time list participants who spent substantial time every week reading and posting, and viewed themselves not only as expert knowledge providers but also as knowledge seekers.
As illustrated by this study, an important problem SNA can solve is to identify key participants in online communities, and thus improve theoretical sampling in qualitative studies (Howard, 2002). Many previous studies of Usenet communities have relied on qualitative methods, such as interviews (Blanchard & Markus, 2004), participant observation (Tepper, 1997), discourse analysis (McLaughlin et al., 1995; Denzin, 1999), qualitative content analysis (Pfeil & Zaphiris, 2007), and ethnography (Baym, 2000). In all of these methods, sample selection can be improved by using SNA to derive the online social network participants weave through their interactions, and to identify “significant” members of the network using any of a number of available SNA models (Wasserman & Faust, 1994; Scott, 2000). The next section provides a step-by-step technique to do this.
Deriving the Social Network from a Newsgroup
Since Usenet newsgroups can receive hundreds of messages every day, some organizing scheme is required to make sense of all the information posted to the group. The basic organization unit of a newsgroup is the thread or conversation. A thread is a set of messages which address the question or discussion topic announced in the first message, which is called the thread head. Participants use a software program called a newsreader to browse messages and post their own1. Those who wish to provide an answer or an opinion post to the thread, that is, they send a message making an explicit reference to the thread head or to one of the subsequent replies. At any given time, a busy newsgroup contains dozens of ongoing (“live”) threads or discussions, each with its own coherent set of messages. To illustrate this, Figure 1 displays the structure of a short thread posted to newsgroup soc.genealogy.britain (all names have been disguised).
Figure 1. Structure of a thread from soc.genealogy.britain
The figure is typical of the way newsreaders display discussion threads. Each line represents one message; the line ends with the date and time the message was posted. The thread head is in bold typeface and states as its subject the initial question or issue, followed by the author of the message, in this case, Peter Kent. This self-proclaimed newbie (i.e. novice) wishes to put a question to the collective wisdom of the British genealogies newsgroup. The remaining messages are all follow-ups to the thread head, i.e. replies or comments from other participants. A key feature of this display is that it shows by different indentations whether each message is a follow-up to the original question or to one of the replies to it. For example, both Mary and Richard Platt posted replies to the original question. However, Barney Fisher posted a comment to Richard Platt, which is shown by his message being indented under Richard Platt’s. A few days later, when the discussion was all but ended, John Thompson posted his own comment to Barney Fisher’s comment.
This manner of address comes about naturally because most participants use newsreaders to read the messages in the newsgroup. When they read one they want to reply to, they simply choose the “Reply to this post” option of the newsreader, type a reply and send it off to the newsgroup. The newsreader automatically includes the necessary information, so that when the reply appears as a post in the newsgroup, it will be shown indented under the specific message the reader responded to. Thus, simple visual inspection of the thread reveals to whom each participant posted. The only exception is Peter Kent’s original question, which is addressed to the entire newsgroup, not to any particular participant. Every follow-up is thus a directed message, and will be treated as a (directed) social tie between the author and the recipient of the message. Participants in the newsgroup who either post or receive a directed message will be called actors, using SNA terminology.
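The reading of a thread as a set of directed ties can be sketched in a few lines of Python. This is only an illustration, not part of the chapter's procedure: the list-of-pairs representation and the function names `directed_ties` and `actors` are assumptions, while the authors and reply targets are those described for the Figure 1 thread.

```python
# Hypothetical in-memory version of the Figure 1 thread:
# each entry is (author, author of the message replied to).
# None marks the thread head, which addresses the whole newsgroup.
thread = [
    ("Peter Kent", None),
    ("Mary", "Peter Kent"),
    ("Richard Platt", "Peter Kent"),
    ("Barney Fisher", "Richard Platt"),
    ("John Thompson", "Barney Fisher"),
]


def directed_ties(messages):
    """Return one (sender, recipient) pair per follow-up message."""
    return [(author, target) for author, target in messages if target is not None]


def actors(ties):
    """Actors are participants who sent or received a directed message."""
    return sorted({name for tie in ties for name in tie})


ties = directed_ties(thread)
for sender, recipient in ties:
    print(f"{sender} -> {recipient}")
```

The thread head contributes no tie of its own, but its author still counts as an actor because two follow-ups are directed at him.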
Because these are deliberate communications between specific members of the newsgroup, they are naturally-occurring exemplars of relational data, and building blocks of online social networks (Wellman & Gulia, 1999; Garton et al., 1997). Repeated person-to-person communication is the basic building block of virtual communities, and deriving the social network of the newsgroup can render it visible. The valued ties between actors in the social network will steadily increase as time passes and the messages exchanged between actors also increase. Hence a newsgroup sample spanning an extended period, such as one year, will make repeated interpersonal communications increasingly visible.
Visual inspection of threads is only practical for very small samples, but message headers provide an efficient way of handling larger samples. Every Usenet message starts with a standard set of data fields called headers. They contain system information that, among other things, allows newsreaders to display newsgroup messages as a distinct set of threads, that is, of coherent discussions identified by the thread head. Thus, headers provide the means to clearly establish the sender and recipient of each directed message, in turn making possible the derivation of the social network participants create through their interactions. This can be seen in Figure 2, which displays two real messages: the original question by Peter Kent to soc.genealogy.britain, and the (abridged) response to that question, by Mary. Both messages begin with several headers, followed by a blank line and the body of the message. The figure displays only those headers relevant to this study; specifically:
From: The From header contains the electronic mailing address and, optionally, the full name of the person who sent the message. A disguised name or e-mail address can be used, usually to avoid receiving unsolicited advertising (spam).
Newsgroups: Specifies the newsgroup or newsgroups to which the message was posted.
Subject: This header is filled by the author of the message, describing what it is about. If the message is submitted in response to another message, the default subject (automatically set by the newsreader) will be the Subject of the previous message, preceded by “Re: ”.
Message-ID: This contains a computer-generated combination of characters that constitutes a unique Usenet identifier for the message. For example, the Message-ID of Peter Kent’s original message is <9KuZmWAMwAS7EwCt@postbox.karoo.co.uk>
References: This header is absent in some messages, such as Peter Kent’s original question, indicating to the newsreader the message is a thread head. When the header is present, it indicates the message is a follow-up, and the header contains the Message-ID of the message it was posted to. This can be seen in Mary’s reply; the References field in Mary’s message contains the Message-ID of Peter Kent’s message. Newsreaders rely on the information in this header to coherently organize discussion threads.
Figure 2. Thread head and reply from a thread in soc.genealogy.britain
Data collection would be easy if the researcher could simply download a large sample of messages, and directly manipulate the newsreader’s database, but this is usually a proprietary format that only technically skilled users can decipher. The alternative is to use the functionality of most newsreaders to save a set of selected messages as a formatted text document2. From this, it is relatively easy to write a small routine that reads this formatted document, and stores the values from the various headers as separate fields in comma-separated files, which can then be imported into a database program such as Microsoft Access. A BASIC routine is provided in the Appendix to perform this task.
The first step once the data is captured in the database is to anonymize poster names and e-mail addresses in the From header, both to protect their identity and to make for easier handling. A simple scheme is to use two-letter combinations: AA, AB, AC, etc. A useful trick is to make the new labels roughly correspond to posters sorted by number of posts, so AA is the most prolific poster, AB the second most prolific, etc. More peripheral posters who contribute very few messages can be more simply relabeled with a consecutive number3.
At this point, relational data can be generated. An Access query can be used to match the References header with the unique Message-ID header, and thus derive a full listing of directed messages and the message they were directed to. A second query can then pick the author of each directed message and the author of the referenced message to generate a complete listing of ties between posters. Finally, a third query can be used to count ties between each pair of participants.
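As a rough illustration of the steps above, the following Python sketch parses the relevant headers, relabels posters, and counts directed ties. It is not the BASIC routine from the Appendix nor the Access queries themselves, only one possible equivalent. Several things are assumptions made for the example: raw messages are available as strings, the function names and the `min_posts` threshold are hypothetical, and when References lists several Message-IDs the last one is taken as the message replied to (the usual Usenet convention).

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr
from itertools import product
from string import ascii_uppercase


def parse_message(raw):
    """Pull the headers needed for tie derivation out of one raw message."""
    msg = message_from_string(raw)
    references = (msg.get("References") or "").split()
    return {
        "id": (msg.get("Message-ID") or "").strip(),
        "author": parseaddr(msg.get("From") or "")[1].lower(),
        # Assumption: the last Message-ID listed is the message replied to.
        "parent": references[-1] if references else None,
    }


def anonymize(records, min_posts=2):
    """Label posters AA, AB, ... by descending post count; number the rest."""
    counts = Counter(r["author"] for r in records)
    letters = ("".join(pair) for pair in product(ascii_uppercase, repeat=2))
    labels, serial = {}, 0
    for author, posts in counts.most_common():
        if posts >= min_posts:
            labels[author] = next(letters)  # at most 676 lettered posters
        else:
            serial += 1
            labels[author] = f"{serial:03d}"
    return labels


def count_ties(records):
    """Count directed ties (sender, recipient) over all follow-up messages."""
    labels = anonymize(records)
    author_of = {r["id"]: r["author"] for r in records}
    ties = Counter()
    for r in records:
        recipient = author_of.get(r["parent"])
        if recipient is not None:  # thread heads and unmatched parents give no tie
            ties[(labels[r["author"]], labels[recipient])] += 1
    return ties


if __name__ == "__main__":
    # Two made-up messages: a thread head and one reply to it.
    raw_messages = [
        "From: Peter <peter@example.org>\nMessage-ID: <1@example.org>\n\nQuestion",
        "From: Mary <mary@example.org>\nMessage-ID: <2@example.org>\n"
        "References: <1@example.org>\n\nAnswer",
    ]
    records = [parse_message(raw) for raw in raw_messages]
    for (sender, recipient), n in sorted(count_ties(records).items()):
        print(sender, recipient, n)
```

Each printed line has the form sender, recipient, count, which is the kind of edge list SNA programs typically accept. Replies whose parent message falls outside the sample are silently dropped here; a real study would need to decide how to treat them.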
The listing produced by the third query reveals the complete social network participants created through their online interactions during the sampled period. Typically, it would have the following format:
AA AB 9
AA AC 12
AA AD 13
AG AA 1
AA AH 15
(...)
When exported from Access as a text file, this listing provides the necessary input for SNA programs such as UCINET (Borgatti, Everett, & Freeman, 2002). Once the social network is captured in these programs, a complete battery of SNA models can be used to detect actor subgroups and measure their cohesion. In particular, given the core-periphery pattern reported in previous studies of virtual communities (Smith, 1999; Tepper, 1997; McLaughlin et al., 1995), the continuous core-periphery model is a logical choice (Borgatti & Everett, 1999). It was developed for social network data consisting of positive integers that represent strengths of relationships. This is just the case of message counts between pairs of actors in a newsgroup. The algorithm calculates a set of coreness scores which maximize the correlation of the observed social network with an ideal core-periphery pattern. To demonstrate the use of the model, the next section describes how it was used to identify the core group of MTM, and to examine the evolution of this core over a period of six years.
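To make the idea of coreness more concrete, the Python sketch below computes approximate coreness scores for the five example ties listed above. It is not the UCINET routine: as a stand-in, it assumes that the leading eigenvector of the symmetrized tie matrix is an acceptable approximation of coreness, and it then reports the correlation between observed tie strengths and the ideal pattern in which the tie between actors i and j is proportional to c_i * c_j. The function names and the iteration count are hypothetical choices.

```python
from math import sqrt


def symmetrize(ties):
    """Combine directed message counts into undirected tie strengths."""
    weight = {}
    for (i, j), n in ties.items():
        if i != j:  # ignore self-replies
            weight[(i, j)] = weight.get((i, j), 0) + n
            weight[(j, i)] = weight.get((j, i), 0) + n
    return weight


def coreness(ties, iterations=200):
    """Approximate coreness by power iteration on the symmetrized ties."""
    weight = symmetrize(ties)
    actors = sorted({a for pair in weight for a in pair})
    scores = {a: 1.0 for a in actors}
    for _ in range(iterations):
        # Starting from the old scores adds an identity shift, which
        # prevents oscillation on star-like or bipartite networks.
        new = dict(scores)
        for (i, j), w in weight.items():
            new[i] += w * scores[j]
        norm = sqrt(sum(v * v for v in new.values()))
        scores = {a: v / norm for a, v in new.items()}
    return scores


def fit(ties, scores):
    """Pearson correlation between observed ties and the pattern c_i * c_j."""
    weight = symmetrize(ties)
    pairs = [(i, j) for i in scores for j in scores if i != j]
    observed = [weight.get(p, 0) for p in pairs]
    ideal = [scores[i] * scores[j] for i, j in pairs]
    mean_o = sum(observed) / len(pairs)
    mean_i = sum(ideal) / len(pairs)
    cov = sum((o - mean_o) * (x - mean_i) for o, x in zip(observed, ideal))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_i = sum((x - mean_i) ** 2 for x in ideal)
    return cov / sqrt(var_o * var_i)


# The five ties shown in the example listing (sender, recipient): count.
example = {
    ("AA", "AB"): 9,
    ("AA", "AC"): 12,
    ("AA", "AD"): 13,
    ("AG", "AA"): 1,
    ("AA", "AH"): 15,
}
scores = coreness(example)
for actor in sorted(scores, key=scores.get, reverse=True):
    print(actor, round(scores[actor], 3))
print("fit:", round(fit(example, scores), 3))
```

With only these five ties the network is a star around AA, so AA receives the highest score and the remaining actors are ordered by how many messages they exchanged with AA. Results from a full one-year sample, or from UCINET's own optimizer, should not be expected to match this approximation exactly.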
An Application to Newsgroup Misc.Taxes.Moderated
This newsgroup was recently classified as a virtual community of practice displaying the Wenger dimensions of Mutual engagement, Shared repertoire and Joint enterprise (Murillo, 2008). Still, an in-depth examination of the day-to-day interactions and problem-solving between members would be a worthwhile addition to the CoP literature, since rich and detailed descriptions of these interactions have thus far only been attempted on conventional face-to-face CoPs (Orr, 1990; Wenger, 1998; Gherardi & Nicolini, 2000; Hara, 2009). An ethnography of this newsgroup is currently in progress, and has adopted discussion threads as the unit of analysis. The advantage of threads over individual messages lies in the former representing a complete and coherent discussion. Thus a full thread is easier to interpret and analyze ethnographically because, across its messages, community members provide context for, and often a critical review of, each other’s contributions.
Since there are thousands of discussion threads to choose from in available newsgroup archives, a core-periphery analysis will be used to detect threads where identified core members have the greatest participation, measured by number of messages. The theoretical sample will then be selected from this considerably reduced subset using two additional theoretical criteria. The first is to give preference to medium-sized and long threads because, unlike short threads, they are more capable of containing elaborate episodes of collective problem-solving. The second is to use the title of the thread to target problem-solving episodes.
Part of this ongoing study also involved examining core membership over time. To accomplish this, three equally spaced one-year samples were downloaded from MTM and imported into Access: 2001-2002, 2004-2005 and 2007-2008. From each sample, the one-year social network was derived, imported into UCINET, and a core-periphery model fitted. The results are displayed in Table 1; in each sample, participants with coreness of 0.10 or higher are classified as core members and are highlighted. For each sample, the core-periphery analysis identifies core members, coreness score, and number of messages posted. This last number reflects newsgroup involvement, but not necessarily coreness. Since coreness is a generalized measure of network centrality (Everett & Borgatti, 2005), for an actor to have high coreness he or she must be the recipient of a large number of directed posts.
The gap between posting volume and coreness is visible in the 2001-2002 sample: Charlie posted 765 messages and Frank 674, yet the former had a coreness of 0.094 and the latter 0.46.

Table 1. MTM participants with coreness scores greater than 0.10 (in bold)

To better illustrate the concept of a core-periphery pattern, it is useful to build a plot of total messages exchanged between members of the newsgroup sorted by descending order of coreness. Such a graph, shown in Figure 3, plots on the vertical z-axis the number of directed messages sent by the 70 newsgroup members with the highest coreness scores during the period 2001-2002. The graph displays a dense “core” at the origin, indicating core members have very frequent interaction with other core members. The long tails along the x and y axes reveal core members also maintain reciprocated exchanges with the many peripheral participants in the newsgroup. The graph also highlights that messages exchanged by core members constitute a substantial portion of total newsgroup activity.

Figure 3. Messages exchanged between high-coreness members of MTM

The results in Table 1 track core membership over a six-year period, revealing significant core turnover. For instance, in 2001-2002, Alice had a coreness score of 0.22, which dropped to 0.049 in 2004-2005, and by 2007 she was no longer active in the newsgroup. By contrast, Ron, Sergey, Theo and Ulrich are all participants who joined the group only recently and achieved high coreness scores. Only two participants sustained coreness scores greater than 0.10 over the three samples, Abe and Don, although there are other participants like Ian, Jack and Karl who have sustained substantial participation over the entire period. The table reveals that the size of the core has grown in recent years, but also that total newsgroup activity has steadily declined. This has resulted in a widening of core membership, and less pronounced differences in posting activity between core members. Both are issues which might fruitfully be explored in the ongoing ethnography.

Having identified community core members, the next step is to identify the discussion threads they dominate. The rationale is that among all available threads (1261 in the 2007-2008 sample), those where most messages are posted by core members will better represent the professional issues and problems said members find interesting. Since core members are known at this point, identification of core-dominated threads can be performed with a couple of queries using the message database. Recall that a thread is a set of messages having the same Subject header. A query can generate a table of all threads and their length by counting messages having the same Subject header. Another query can generate a second table, also counting messages with the same Subject header, but restricting the query to messages whose From header names a core member. The two tables can be copied into Excel, the Subject headers matched, and a percentage of core member participation calculated for each thread. Table 2 shows partial results, for threads of 30 or more messages, for the sample year 2007-2008, with threads sorted by length.

At this point, the procedure has identified medium-sized and long threads with strong participation by core members. Thread topic can also be used to target plausible problem-solving discussions. In Table 2, the three discussions that would merit initial ethnographic examination are numbers 3, 5 and 11, which all have more than 90% core member participation. The titles of the threads suggest they all deal with taxation in special cases, namely, the treatment of raffle winnings, deductions for charitable donations, and deductions for (business related) entertainment and meals.
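The two counting queries and the spreadsheet matching step can be sketched in one short function. This is an illustration rather than the study's actual Access queries: the function name, the `(subject, author)` record layout and the `min_length` parameter are assumptions. Subjects are compared exactly as stored, as the text describes; normalising a leading "Re:" would be a variation the source does not mention.

```python
from collections import Counter


def core_participation(messages, core_members, min_length=1):
    """Return (subject, thread length, core messages, percent) rows.

    `messages` is an iterable of (subject, author) pairs; a thread is the set
    of messages sharing the same Subject header.
    """
    messages = list(messages)  # the records are scanned twice below
    # Query 1: thread length = number of messages per Subject header.
    length = Counter(subject for subject, _ in messages)
    # Query 2: the same count, restricted to messages from core members.
    by_core = Counter(
        subject for subject, author in messages if author in core_members
    )
    rows = [
        (subject, n, by_core[subject], 100.0 * by_core[subject] / n)
        for subject, n in length.items()
        if n >= min_length
    ]
    # Sort by thread length, longest first, as in Table 2.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

Calling `core_participation(records, core, min_length=30)` would reproduce the restriction to threads of 30 or more messages used for Table 2.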
Table 2. MTM thread length and core member participation

No. | THREAD NAME | THREAD LENGTH | CORE MEMBER MESSAGES | PERCENT
1 | Re: Why is catching a baseball taxable income? | 116 | 86 | 74.1%
2 | Re: Calif gay marriage | 70 | 57 | 81.4%
3 | Re: Raffle winnings | 51 | 50 | 98.0%
4 | Re: Is Landlord Double Dipping? | 49 | 29 | 59.2%
5 | Re: Charitable deductions | 41 | 38 | 92.7%
6 | Re: estate taxes | 41 | 23 | 56.1%
7 | Re: Self Employment Income of Gambling Winnings | 35 | 18 | 51.4%
8 | Re: Is this legal? (Avoiding gift tax.) | 34 | 19 | 55.9%
9 | Re: Big deduction for state tax: implications on estimated tax | 34 | 6 | 17.6%
10 | Re: H&R Block: How Does It Get Around Pub 1345 Rules? | 33 | 14 | 42.4%
11 | Re: Can entertainment and meals be deductible in full? | 31 | 29 | 93.5%
12 | Re: question about mileage deduction - special circumstances | 30 | 25 | 83.3%
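As a quick check on the selection rule, the thread lengths and core member message counts from Table 2 can be filtered for participation above 90%. The function name and the `min_percent` parameter are illustrative assumptions; the figures are the table's own, in table order.

```python
# (thread length, core member messages) for rows 1-12 of Table 2, in table order
TABLE_2 = [(116, 86), (70, 57), (51, 50), (49, 29), (41, 38), (41, 23),
           (35, 18), (34, 19), (34, 6), (33, 14), (31, 29), (30, 25)]


def shortlist(rows, min_percent=90.0):
    """Return 1-based row numbers whose core participation exceeds min_percent."""
    return [number for number, (length, core) in enumerate(rows, start=1)
            if 100.0 * core / length > min_percent]


print(shortlist(TABLE_2))  # [3, 5, 11]
```

The result matches the three threads singled out in the text: raffle winnings, charitable deductions, and entertainment and meals.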
Ultimately, sample selection should be guided by ethnographic sensibility, but the procedure described here provides three important advantages. First, it arms the ethnographer with a clear sense of the key personalities in the community and the issues they care about. Second, it guides the researcher to a reduced subset of discussions, where preliminary ethnographic analysis can make informed choices about the final theoretical sample of the study. Third, the risk of researcher subjectivity and sample bias is substantially reduced (Howard, 2002).
DISCUSSION

Implicit in the use of the core-periphery model to target theoretically relevant participants is an assumption that higher coreness is equivalent to more “representative” or “prominent” community membership. Several arguments support this assumption.

As previously mentioned, researchers of virtual communities have used the term “core” (although not in a formal SNA sense) to refer to the subset of stable and frequent posters, and have pointed out that these participants heavily influence the identity, culture and day-to-day operation of the newsgroup (McLaughlin et al., 1995; Tepper, 1997; Smith, 1999; Lee & Cole, 2003). This is confirmed in the present study. Core members numbered just 8, 11 and 16 participants in the three samples, but they accounted for 32%, 49% and 62% of all directed messages, respectively. Their large influence in the newsgroup thus justifies making them the focus of the ethnographic analysis.

A second argument has to do with the previous finding that MTM qualifies as a Usenet-based CoP (Murillo, 2008). Membership in a CoP is defined by mutual engagement, but members typically display different levels of engagement, which results in most CoPs displaying a core-periphery structure (Wenger, 2000). Such a structure can be modelled using the Borgatti-Everett algorithm. Since coreness scores are highly correlated with the intensity of online social interaction (not just with the absolute number of posts), they are a good proxy for the degree of CoP membership, understood as the strength of mutual engagement. This again recommends focusing the ethnography on participants with the highest coreness.

A third argument has to do with the definition of “core member”. This study set an arbitrary threshold of 0.10 for core membership, which results in core sizes of 8-16 participants, with strong posting activity as shown in Table 1. Since each sample spanned a full year of newsgroup activity, it is clear that core members so defined have a stable online presence and maintain intensive interaction among themselves (as well as with more volatile peripheral members). This results in the formation of strong ties between core members. People cannot interact intensively for long periods, on topics that interest them, and remain strangers, even on the Internet. Thus the core-periphery analysis provides strong quantitative evidence that MTM is indeed a virtual community, with a stable membership, bonds between members, and sustained levels of interaction (see endnote 4). Therefore it makes sense to focus the ethnography on the set of high-coreness members where strong mutual engagement is most assured. A lower threshold for core membership could be used, but it would bring proportionately less assurance of direct engagement between all members of the resulting core, as well as a less focused theoretical sample of threads.

Two more points can close this chapter. First, recall that the continuous core-periphery model is but one of several SNA models that analyze network cohesion on the basis of density of ties. Other possibilities are cliques, k-cores and m-cores (Scott, 2000), LS-sets (Borgatti, Everett, & Shirey, 1990), or Bonacich’s eigenvector centrality (Bonacich, 1987) (see endnote 5). Depending on the type of community examined and the research questions guiding the study, all of these models can be used to identify Usenet participants displaying the strongest sustained interaction.

Second, note that the techniques described in this chapter can be generalized to all Internet discussion forums organized around threaded discussions. The key technical challenge is to develop an ad hoc routine that can efficiently download messages from such forums, make accurate attributions of message directionality, and record the information in a database-accessible format. Data collection in Usenet and listservs is particularly straightforward, given their use of standard message headers. Web 2.0 platforms like blogs, wikis and social networking sites present researchers with more difficult data harvesting issues (for a useful introduction, see Glance et al., 2005), but they provide exciting new opportunities for examining the emergence of community in Internet worlds.
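Attributing message directionality can be sketched as follows, under the assumption that a reply is directed at the author of the message it references, and that the parent is the last Message-ID in the References header (the appendix routine keeps only the last reference). The record layout, the function name and the alias table are hypothetical; the alias table stands in for the manual merging of e-mail addresses described in the endnotes.

```python
from collections import Counter


def directed_ties(messages, aliases=None):
    """Count directed posts as {(sender, recipient): number of replies}.

    `messages` is an iterable of dicts with keys 'id', 'author' and 'parent'
    ('parent' is None for a message that starts a thread).
    `aliases` maps alternative e-mail addresses to one canonical person.
    """
    aliases = aliases or {}
    messages = list(messages)
    # Resolve every message to a canonical author first.
    author_of = {m["id"]: aliases.get(m["author"], m["author"]) for m in messages}
    ties = Counter()
    for m in messages:
        sender = author_of[m["id"]]
        # Parents missing from the archive yield None and are skipped.
        recipient = author_of.get(m["parent"])
        if recipient is not None and recipient != sender:
            ties[(sender, recipient)] += 1
    return ties
```

The resulting counts form a directed, valued network that can be exported as an edge list for further analysis. Replies to one's own messages are dropped here, which is a design choice of this sketch rather than something the chapter specifies.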
ACKNOWLEDGMENT

The author gratefully acknowledges the support of Asociación Mexicana de Cultura, A.C.
REFERENCES

Baym, N. (1995). The emergence of community in computer-mediated communication. In Jones, S. (Ed.), Cybersociety: computer-mediated communication and community (pp. 138–163). Thousand Oaks, CA: Sage.
Baym, N. (2000). Tune in, log on: soaps, fandom and online community. Thousand Oaks, CA: Sage.
Blanchard, A. L., & Markus, M. L. (2004). The experienced ‘sense’ of a virtual community: characteristics and processes. The Data Base for Advances in Information Systems, 35(10), 65–79.
Bonacich, P. (1987). Power and centrality: a family of measures. American Journal of Sociology, 92, 1170–1182. doi:10.1086/228631
Borgatti, S., & Everett, M. (1999). Models of core/periphery structures. Social Networks, 21, 375–395. doi:10.1016/S0378-8733(99)00019-2
Borgatti, S., Everett, M., & Freeman, L. (2002). Ucinet 6 for windows: Software for Social Network Analysis. Natick, MA: Analytic Technologies.
Borgatti, S., Everett, M., & Shirey, P. (1990). LS sets, Lambda sets and other cohesive subsets. Social Networks, 12(4), 337–357. doi:10.1016/0378-8733(90)90014-Z
Borgs, C., Chayes, J. T., Mahdian, M., & Saberi, A. (2004). Exploring the community structure of newsgroups. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) (pp. 783-787). Retrieved Mar 7, 2010 from http://portal.acm.org/citation.cfm?id=1014052.1016914
Boyd, D. (2006). Friends, Friendsters, and MySpace Top 8: writing community into being on social network sites. First Monday, 11(2). Retrieved June 3, 2008 from http://www.firstmonday.org/issues/issue11_12/boyd/index.html
Bryant, S. L., Forte, A., & Bruckman, A. (2005). Becoming Wikipedian: transformation of participation in a collaborative online encyclopedia. In K. Schmidt, M. Pendergast, M. Ackerman, & G. Mark (Eds.), Proceedings of GROUP International Conference on Supporting Group Work (pp. 11-20). New York: ACM Press.
Chau, M., & Xu, J. (2007). Mining communities and their relationships in blogs: a study of online hate groups. International Journal of Human-Computer Studies, 65(1), 57–70. doi:10.1016/j.ijhcs.2006.08.009
Chin, A., & Chignell, M. (2007). Identifying communities in blogs: roles for social network analysis and survey instruments. International Journal of Web Based Communities, 3(3), 343–365. doi:10.1504/IJWBC.2007.014243
Choi, J. H., & Danowski, J. (2002). Making a global community on the Net – global village or global metropolis? A network analysis of Usenet newsgroups. Journal of Computer-Mediated Communication, 7(3). Retrieved Jan 23, 2003 from http://www.ascusc.org/jcmc/vol7/issue3/choi.html
Denzin, N. (1999). Cybertalk and the method of instances. In Jones, S. (Ed.), Doing Internet Research (pp. 107–125). Thousand Oaks, CA: Sage.
Donath, J., Karahalios, K., & Viégas, F. (1999). Visualizing conversation. Journal of Computer-Mediated Communication, 4(4). Retrieved Jan 23, 2003 from http://www.ascusc.org/jcmc/vol4/issue4/donath.html
Ellison, N., Steinfield, C., & Lampe, C. (2007). The benefits of Facebook “friends”: exploring the relationship between college students’ use of online social networks and social capital. Journal of Computer-Mediated Communication, 12(3), 1143–1168. doi:10.1111/j.1083-6101.2007.00367.x
Everett, M. G., & Borgatti, S. P. (2005). Extending centrality. In Carrington, P., Scott, J., & Wasserman, S. (Eds.), Models and Methods in Social Network Analysis (pp. 57–76). Cambridge: Cambridge University Press.
Freeman, L., Borgatti, S., & White, D. (1991). Centrality in valued graphs: a measure of betweenness based on network flow. Social Networks, 13(2), 141–154. doi:10.1016/0378-8733(91)90017-N
Fuller, J. (1999). Cybertax practitioners find friends, clients online. Accounting Today, April 26-May 9, 3.
Garton, L., Haythornthwaite, C., & Wellman, B. (1997). Studying on-line social networks. Journal of Computer-Mediated Communication, 3(1). Retrieved Jan 23, 2003 from http://www.ascusc.org/jcmc/vol3/issue1/garton.html
Gherardi, S., & Nicolini, D. (2000). The organizational learning of safety in communities of practice. Journal of Management Inquiry, 9(1), 7–18. doi:10.1177/105649260091002
Glance, N., Hurst, M., Nigam, K., Siegler, M., Stockton, R., & Tomokiyo, T. (2005). Deriving marketing intelligence from online discussion. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 419-428). Retrieved Mar 7, 2010 from http://portal.acm.org/citation.cfm?id=1081870.1081919
Hara, N. (2009). Communities of practice: fostering peer-to-peer learning and informal knowledge sharing in the work place. Berlin: Springer-Verlag.
Hara, N., & Hew, K. F. (2007). Knowledge-sharing in an online community of health-care professionals. Information Technology & People, 20(3), 235–261. doi:10.1108/09593840710822859
Hengen, P. (1997). Internet newsgroups. Trends in Cell Biology, 7(1), 34–35. doi:10.1016/S0962-8924(97)60045-8
Herring, S. C., Scheidt, L. A., Wright, E., & Bonus, S. (2005). Weblogs as a bridging genre. Information Technology & People, 18(2), 142–171. doi:10.1108/09593840510601513
Howard, P. N. (2002). Network ethnography and the hypermedia organization: new media, new organizations, new methods. New Media & Society, 4(4), 550–564. doi:10.1177/146144402321466813
Jones, S. (1995). Cybersociety: Computer-mediated communication and community. Thousand Oaks, CA: Sage.
Kaiser, S., Müller-Seitz, G., Pereira Lopes, M., & Pina e Cunha, M. (2007). Weblog-technology as a trigger to elicit passion for knowledge. Organization, 14(3), 391–412. doi:10.1177/1350508407076151
Kollock, P. (1999). The economies of online cooperation: gifts and public goods in cyberspace. In Smith, M. A., & Kollock, P. (Eds.), Communities in Cyberspace (pp. 219–237). New York: Routledge.
Kollock, P., & Smith, M. (1996). Managing the virtual commons: cooperation and conflict in computer communities. In Herring, S. (Ed.), Computer-mediated communication: Linguistic, social and cross-cultural perspectives (pp. 109–128). Amsterdam: John Benjamins.
Kumar, R., Novak, P., Raghavan, S., & Tomkins, A. (2005). On the bursty evolution of blogspace. World Wide Web: Internet and Web Information Systems, 8(2), 159–178.
Lee, G. K., & Cole, R. E. (2003). From a firm-based to a community-based model of knowledge creation: the case of the Linux kernel development. Organization Science, 14(6), 633–649. doi:10.1287/orsc.14.6.633.24866
Lee, R. M. (2000). Unobtrusive methods in social research. Buckingham: Open University Press.
Lovelace, J. (1998). Craft in cyberspace. American Craft, 58(2), 4–6.
McGlohon, M., & Hurst, M. (2009). Community structure and information flow in Usenet: improving analysis with a thread ownership model. In Proceedings of the Third International AAAI Conference on Weblogs and Social Media (ICWSM09). Retrieved Mar 5, 2010 from http://www.aaai.org/ocs/index.php/ICWSM/09/paper/view/210
McKenna, K. Y. A., Green, A. S., & Gleason, M. J. (2002). Relationship formation on the Internet: what’s the big attraction? The Journal of Social Issues, 58(1), 9–31. doi:10.1111/1540-4560.00246
McLaughlin, M., Osborne, K., & Smith, C. (1995). Standards of conduct on Usenet. In Jones, S. (Ed.), Cybersociety: Computer-mediated communication and community (pp. 90–111). Thousand Oaks, CA: Sage.
Muncer, S., Loader, B., Burrows, R., Pleace, N., & Nettleton, S. (2000). Form and structure of newsgroups giving social support: a network approach. Cyberpsychology & Behavior, 3(6), 1017–1029. doi:10.1089/109493100452282
Murillo, E. (2008). Searching Usenet for virtual Communities of Practice: using mixed methods to identify the constructs of Wenger’s theory. Information Research, 13(4), paper 386. Retrieved Jan 23, 2009 from http://InformationR.net/ir/13-4/paper386.html
Murray, P. (1996). Nurses’ computer-mediated communications on NURSENET: a case study. Computers in Nursing, 14(4), 227–234. doi:10.1097/00024665-199607000-00011
North, T. (1994). The Internet and Usenet global computer networks. Unpublished Master thesis. Retrieved Mar 2, 2001 from http://www.vianet.net.au/~timn/thesis/
Orr, J. E. (1990). Sharing knowledge, celebrating identity: community memory in a service culture. In Middleton, R., & Edwards, D. (Eds.), Collective remembering: Memory in society (pp. 169–189). London: Sage.
Parks, M., & Floyd, K. (1996). Making friends in cyberspace. The Journal of Communication, 46(1). doi:10.1111/j.1460-2466.1996.tb01462.x
Pfeil, U., & Zaphiris, P. (2007). Patterns of empathy in online communication. In Proceedings of CHI 2007 – the ACM conference on human factors in computing systems (pp. 919–928). 28 April–3 May, 2007, San Jose, CA. Retrieved Mar 7, 2010 from http://portal.acm.org/citation.cfm?id=1240763
Preece, J. (1999). Empathic communities: balancing emotional and factual communication. Interacting with Computers, 12(1), 63–77. doi:10.1016/S0953-5438(98)00056-3
Rheingold, H. (1993). The virtual community: homesteading on the electronic frontier. Reading, Mass: Addison-Wesley.
Roberts, T. L. (1998). Are newsgroups virtual communities? In Proceedings of the Annual ACM SIGCHI Conference on Human Factors in Computing Systems (CHI 1998) (pp. 360–367). New York: ACM Press.
Sack, W. (2001). Conversation map: an interface for very large-scale conversations. Journal of Management Information Systems, 17(3), 73–92.
Samborn, H. V. (1999). Colleagues in space. ABA Journal, 85(12), 80–81.
Scott, J. (2000). Social network analysis: a handbook (2nd ed.). London: Sage.
Silva, L., Goel, L., & Mousavidin, E. (2008). Exploring the dynamics of blog communities: the case of MetaFilter. Information Systems Journal, 19(1), 55–81. doi:10.1111/j.1365-2575.2008.00304.x
Silverman, D. (2000). Doing qualitative research: a practical handbook. London: Sage.
Smith, M. A. (1999). Invisible crowds in cyberspace: mapping the social structure of the Usenet. In Smith, M. A., & Kollock, P. (Eds.), Communities in Cyberspace (pp. 195–219). New York: Routledge.
Sowe, S., Stamelos, I., & Angelis, L. (2006). Identifying knowledge brokers that yield software engineering knowledge in OSS projects. Information and Software Technology, 48(11), 1025–1033. doi:10.1016/j.infsof.2005.12.019
Tepper, M. (1997). Usenet communities and the cultural politics of information. In Porter, D. (Ed.), Internet culture (pp. 39–54). New York: Routledge.
Thomsen, S. R. (1996). At work in cyberspace: exploring practitioner use of the PRForum. Public Relations Review, 22(2), 115–131.
Turner, T. C., Smith, M. A., Fisher, D., & Welser, H. T. (2005). Picturing Usenet: Mapping computer-mediated collective action. Journal of Computer-Mediated Communication, 10(4), article 7. Retrieved Apr 17, 2006 from http://jcmc.indiana.edu/vol10/issue4/turner.html
Wasko, M., & Faraj, S. (2000). It is what one does: why people participate and help others in electronic communities of practice. The Journal of Strategic Information Systems, 9(2-3), 155–173. doi:10.1016/S0963-8687(00)00045-7
Wasko, M. M., & Faraj, S. (2005). Why should I share? Examining social capital and knowledge contribution in electronic networks of practice. Management Information Systems Quarterly, 29(1), 35–57.
Wasserman, S., & Faust, K. (1994). Social network analysis: methods and applications. Cambridge: Cambridge University Press.
Wellman, B. (1988). Structural analysis: from method and metaphor to theory and substance. In Wellman, B., & Berkowitz, S. (Eds.), Social structures: a network approach (pp. 19–61). Cambridge: Cambridge University Press.
Wellman, B. (1999). Living networked in a wired world. IEEE Intelligent Systems, 14(1), 15–17.
Wellman, B., & Gulia, M. (1999). Virtual communities as communities: Net surfers don’t ride alone. In Smith, M. A., & Kollock, P. (Eds.), Communities in Cyberspace (pp. 167–194). New York: Routledge.
Welser, H. T., Gleave, E., Fisher, D., & Smith, M. (2007). Visualizing the signatures of social roles in online discussion groups. Journal of Social Structure, 8(2). Retrieved Mar 7, 2010 from http://www.cmu.edu/joss/content/articles/volume8/Welser/
Wenger, E. (1998). Communities of practice: learning, meaning, and identity. Cambridge: Cambridge University Press.
Wenger, E. (2000). Communities of practice: the structure of knowledge stewarding. In Despres, C., & Chauvel, D. (Eds.), Knowledge Horizons: the present and promise of knowledge management (pp. 205–223). Woburn, MA: Butterworth-Heinemann.
Winzelberg, A. (1997). The analysis of an electronic support group for individuals with eating disorders. Computers in Human Behavior, 13(3), 393–407. doi:10.1016/S0747-5632(97)00016-2
Zaphiris, P., & Sarwar, R. (2006). Trends, similarities and differences in the usage of teen and senior public online newsgroups. ACM Transactions on Computer-Human Interaction, 13(3), 403–422. doi:10.1145/1183456.1183461

ENDNOTES

1 A popular dedicated newsreader is Forté Agent, in free and paid versions. Modern e-mail clients, such as Microsoft Outlook Express or Mozilla Thunderbird, can also function as newsreaders.

2 This study used Agent, which can save selected messages as a text document. The user can choose which headers to save and in what order. For this study: From, Newsgroups, Subject, Date, Lines, Message-ID, References, Message-Body. Furthermore, each saved message was preceded by an easily recognizable separator, specifically: “== ** == ** == ** == ** == **”

3 The From field requires some additional manipulation before relational data can be generated. It is necessary to identify the different e-mail addresses under which a single person posted; many people, for instance, posted from both home and work with a different e-mail address in each case. It is essential not to treat these two addresses as two different persons. The task can be performed, albeit laboriously, by visual inspection of the list of distinct posters, aided by the search and replace function of the database. In misc.taxes.moderated it was only necessary to do this for participants who posted five or more messages in one year, and who accounted for less than 10% of all posters.

4 By using one-year samples, the stability of detected online SNA structures can be asserted with much greater confidence than the online cliques reported by Muncer et al. (2000) and Zaphiris & Sarwar (2006).

5 Theoretically questionable are models based on the network concept of distance, such as n-cliques and n-clans (Scott, 2000) or flow-betweenness centrality (Freeman, Borgatti & White, 1991). In a newsgroup, any actor can directly reach any other actor by posting to him or her. Therefore, by the very nature of the Usenet medium, there is no relaying of messages and no intermediaries, which form the basis of the network concept of distance.
APPENDIX

FILTER.BAS Routine

DIM ONELINE AS STRING, AUTHOR AS STRING, NEWSGR AS STRING, SUBJECT AS STRING
DIM POSTDATE AS STRING, TIMEZONE AS STRING, LINES AS STRING
DIM MESSAGEID AS STRING, REFERENCES AS STRING, MESSAGEBODY AS STRING
OPEN "TAXES.TXT" FOR INPUT AS #1
OPEN "AUTHOR.TXT" FOR OUTPUT AS #2
OPEN "MESSID.TXT" FOR OUTPUT AS #3
OPEN "REFS.TXT" FOR OUTPUT AS #4
OPEN "BODIES.TXT" FOR OUTPUT AS #5
INDEX& = 0: 'CHANGE THIS INDEX TO IMPORT ADDITIONAL RECORDS
DO
    INDEX& = INDEX& + 1
    LINE INPUT #1, ONELINE: AUTHOR = MID$(ONELINE, 7)
    LINE INPUT #1, ONELINE: NEWSGR = MID$(ONELINE, 13)
    LINE INPUT #1, ONELINE
    IF LEFT$(ONELINE, 8) <> "Subject:" THEN LINE INPUT #1, ONELINE
    SUBJECT = MID$(ONELINE, 10)
    LINE INPUT #1, ONELINE
    IF LEFT$(ONELINE, 5) <> "Date:" THEN LINE INPUT #1, ONELINE
    POSTDATE = MID$(ONELINE, 7, 20): TIMEZONE = RIGHT$(ONELINE, 5)
    LINE INPUT #1, ONELINE: LINES = MID$(ONELINE, 8): LIN% = VAL(LINES)
    LINE INPUT #1, ONELINE: MESSAGEID = MID$(ONELINE, 13)
    LINE INPUT #1, ONELINE: REFERENCES = MID$(ONELINE, 13)
    IF REFERENCES = "" THEN REFERENCES = "NONE"
    'ONLY KEEP THE LAST REFERENCE
    DO
        LINE INPUT #1, ONELINE
        IF ONELINE = "Message-Body: " THEN EXIT DO
        REFERENCES = MID$(ONELINE, 2)
    LOOP
    MESSAGEBODY = ""
    'ONLY COPY THE FIRST 10 LINES OF MESSAGE BODY
    FOR I = 1 TO 12000
        LINE INPUT #1, ONELINE
        IF ONELINE = "== ** == ** == ** == ** == **" THEN EXIT FOR
        ' MESSAGES SAVED FROM AGENT ARE EACH SEPARATED BY THIS STRING
        IF ONELINE = "" THEN ONELINE = "__"
        IF LEFT$(ONELINE, 1) = ">" THEN ONELINE = ">"
        IF I < 11 THEN MESSAGEBODY = MESSAGEBODY + ONELINE + " "
    NEXT I
    WRITE #2, INDEX&, AUTHOR, NEWSGR
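For readers without a BASIC interpreter, the same kind of parsing can be sketched in Python. This is not a translation of FILTER.BAS: it assumes only the export layout described in the endnotes (the eight headers in the stated order, and the separator string between messages), keeps the whole message body rather than the first ten lines, and uses hypothetical names such as `parse_export` and the `Parent` key.

```python
SEPARATOR = "== ** == ** == ** == ** == **"
HEADERS = ("From", "Newsgroups", "Subject", "Date", "Lines",
           "Message-ID", "References")


def parse_export(text):
    """Split a saved newsreader export into one dict per message."""
    records = []
    for chunk in text.split(SEPARATOR):
        lines = chunk.strip("\n").split("\n")
        if not any(line.strip() for line in lines):
            continue  # nothing between two separators
        record = {name: "" for name in HEADERS}
        record["Body"] = []
        current = None
        in_body = False
        for line in lines:
            if in_body:
                record["Body"].append(line)
            elif line.strip() == "Message-Body:":
                in_body = True
            elif line[:1] in (" ", "\t") and current:
                # Indented line: continuation of the previous header,
                # as happens with long References headers.
                record[current] += " " + line.strip()
            else:
                name, _, value = line.partition(":")
                if name in HEADERS:
                    current = name
                    record[name] = value.strip()
        # Keep only the last reference: the message being replied to.
        refs = record["References"].split()
        record["Parent"] = refs[-1] if refs else None
        records.append(record)
    return records
```

Each returned dictionary holds the header values, the body lines, and the Message-ID of the parent message (or `None` for a message that starts a thread), which is the information needed to build relational data.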
Tools and Techniques for Analysis and Building of Virtual Communities

As noted in the opening chapters in this volume, there is a growing interest in understanding the depth and breadth of virtual communities. However, most of the data available are noisy and there are limited tools that can help researchers make sense of these data. There is therefore a demand for automated software tools and techniques that give researchers a simple, quick, and inexpensive way to collect information on the nature of virtual communities. Section 3 presents 6 chapters focused on tools and techniques available to collect, process and present data on interactions in virtual communities. Chapter 9 presents findings of a study conducted on three virtual communities for teachers based on electronic discourse. The transcripts of discussions were collected and examined using a set of conversational analysis techniques. Chapter 10 presents “the Community Agent”, a tool intended to trace the evolution of the domain of a distributed Community of Practice by obtaining and graphically presenting indicators that point to the domain of a Community of Practice and the participation of its members. Chapter 11 presents a new web-based system called ICTA (http://textanalytics.net) for automated analysis and visualization of online conversations in virtual communities. Chapter 12 discusses the emerging critical global collaboration paradigm and the use of virtual learning communities. The goal is to illustrate how the social nature of virtual worlds can be used to teach technical writing and the academic research process using Second Life, the online 3D virtual world created entirely by its residents. Chapter 13 examines the impact of Web 2.0 and social networking tools on education. It explores the challenge for teachers to embrace these new social networking tools and apply them to new educational contexts.

Chapter 14 focuses on the application of Conversation Analysis (CA) as a tool to understand online social encounters. Complementing current analytic methods like content analysis and social network analysis, analytic tools like the Discussion Analysis Tool (DAT) (Jeong, 2003) and the Transcript Analysis Tool (TAT) (Fahy, Crawford, & Ally, 2001) have been developed to study both the content of online discussions and the interactions that take place among the participants.
Chapter 9
Graphically Mapping Electronic Discussions: Understanding Online Conversational Dynamics

Jennifer Howell
Australian Catholic University Limited, Australia
ABSTRACT

Transcripts of electronic discussions have traditionally been examined via the use of conversational analysis techniques. Coding such transcripts provides rich data regarding the content and nature of the discussions that take place. However, understanding the content of the messages is not limited to the actual message itself. An electronic message is sent either in response to or to start a discussion thread. Examining the entry point of a new message can help to clarify the dynamics of the community discussion. Electronic discussions do not appear to follow traditional conversational norms. New messages may be immediate responses or they can be responses to messages posted much earlier. However, by graphically mapping electronic discussions, a clearer understanding of the dynamics of electronic discussions can be achieved. This chapter will present the findings of a study that was conducted on three online communities for teachers. The transcripts of electronic discussions were collected and examined via conversational analysis. These messages were then analysed via graphical mapping, and the findings indicated that electronic discussions may follow three distinct patterns. It was further discovered that each of these patterns was indicative of a distinct type of electronic discussion. The findings from this study offer further insight into the nature of online discussions and help to understand online conversational dynamics.
DOI: 10.4018/978-1-60960-040-2.ch009

INTRODUCTION

The use of computer-mediated text messages in research has been well documented and a number of frameworks exist for this purpose (Connelly & Clandinin, 1990; Grabowski, Pusch & Pusch, 1990; Hara, Bonk & Angeli, 2000; Henri, 1992; Levy, 2003). Text-based messages commonly used in computer-mediated communication (CMC) have unique characteristics. Whilst they are written texts, they do not share the same features as traditional written communication (Henri, 1992) and contain some characteristics of spoken communication. Electronic discussions are divided into threads, with responses to different threads not following logically after one another. This does not inhibit the communicative experience, but is merely a distinguishing characteristic of the medium. McCreary (1990) stated that the written word demands an exactness and coherence of thought, indicating that text-based communication results in more well planned and structured interactions. The message itself can be regarded as a complete communicative unit (Henri, 1992) which has its own meaning and structure.

However, understanding the content of the messages is not limited to the actual message itself. An electronic message is sent either in response to or to start a discussion thread. Examining the entry point of a new message can help to clarify the dynamics of the electronic discussion, which, upon cursory examination, does not appear to follow traditional conversational norms. New messages may be immediate responses or they may be responses to messages posted considerably earlier. This is a feature of electronic text, or hypertext: individual blocks of text, or lexias, and the electronic links that join them (Landow, 1994). Hypertext is a nonlinear form of text that has multiple entry points, and perceiving electronic discussions as such is the starting point for graphically mapping them. As transcripts are essentially banks of online discussions, they are a form of hypertext and need to be considered as such. This nonlinear characteristic has influenced the way communication interactions are conducted online.
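The notion of an entry point can be made concrete with a small sketch. Given, for each message, the message it responds to, two simple quantities describe where it enters the discussion: how many replies deep it sits, and how many direct replies each message attracts. The function names and the dictionary layout below are assumptions made for illustration, and the sketch does not attempt to reproduce the mapping procedure used in the study.

```python
from collections import Counter


def reply_depths(parents):
    """Depth of each message in its thread (0 for a message that starts one).

    `parents` maps a message id to the id it replies to, or None.
    A parent that is missing from the transcript is treated as an entry point.
    """
    depths = {}
    for mid in parents:
        chain = []
        cur = mid
        # Walk up until a thread start, a missing parent, or a known depth.
        while (cur is not None and cur in parents
               and cur not in depths and cur not in chain):
            chain.append(cur)
            cur = parents[cur]
        base = depths.get(cur, -1)  # -1 so that a thread starter gets depth 0
        for node in reversed(chain):
            base += 1
            depths[node] = base
    return depths


def fan_out(parents):
    """Number of direct replies each message received."""
    return Counter(p for p in parents.values() if p is not None)
```

A discussion in which every message answers the one before it yields steadily increasing depths and a fan-out of one, whereas a discussion in which many messages answer the same post yields shallow depths and a large fan-out for that post.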
Understanding the meaning of the messages has been well researched and there are many conversational analysis frameworks available for use (Garrison, Anderson & Archer, 2001; Gunawardena, Lowe & Anderson, 1997; Hara, Bonk & Angeli, 2000; Harasim, 1990; Henri, 1992; Hiltz, 1990; Levin, Kim & Riel, 1990). However, by graphically mapping electronic discussions, a clearer understanding of their dynamics can be achieved. These graphical maps can help researchers understand the nature of a discussion and thus help to clarify why electronic discussions may be structured in a particular way and why some discussions are longer and more complex than others. This chapter is presented in six sections. The first is concerned with understanding and analyzing electronic messages via conversational analysis, and includes an exploration of different frameworks of analysis. The second provides an overview of the participants in the study, and the third examines the methodological approach and presents the findings from the coding of the data. The fourth section presents the graphical mapping of the discussion threads and explores the three patterns that emerged: (1) flowchart, (2) regular cluster and (3) bonded cluster. The fifth section discusses whether graphical mapping of discussion threads helps in understanding the differences between electronic discussions and the nature of those differences. The final section presents how this study offers further insight into the nature of online discussion and furthers the understanding of online conversational dynamics.
GRAPHICALLY MAPPING ONLINE CONVERSATIONS
The rise in use of electronic communication has resulted in many different approaches to understanding those exchanges. With regard to graphically mapping those conversations, the approaches can be categorized as being either automated or physical. Automated graphical maps are produced by software programs that run diagnostic algorithms across the electronic messages and result in graphical images, such as box plot graphs of characteristics such as frequency,
Graphically Mapping Electronic Discussions
time or sender. Physical approaches are those that involve the physical plotting of messages or data by a researcher. These are very broad, simplistic definitions that will be clarified below. The two approaches result in quite different outcomes. Automated approaches have the potential to produce vast quantities of demographic and diagnostic data, whereas physical approaches tend to produce qualitative outcomes that can range from conversational patterns to semantic emphases. As previously stated, automated approaches are reliant upon software programs, for example AutoBrief (Kerpedjiev & Roth, 2001), Loom (Donath, Karahalios & Viegas, 1999), Conversation Map (Sack, 2000), ConverSpace (Popolov, Callaghan & Luker, 2000) and Netscan (Smith & Fiore, 2001). These automated systems for exploring communicative data are primarily concerned with reasoning about the users' tasks in order to select graphical techniques (Kerpedjiev & Roth, 2001). Visual representations of user traffic are useful tools in understanding the activities of online electronic communication (Viegas & Smith, 2004). The types of graphics they result in can range from simple graphs and conversation trees to complex digital representations. It is this early work on developing conversation trees that the new model proposed in this chapter is based upon. Early approaches that resulted in conversation trees stopped at that point: they developed a diagrammatical tree of the electronic conversation being analysed, most often email exchanges or Usenet groups, but took it no further. The model being proposed here will show that conversation trees, when analysed over a number of electronic conversations, exhibit patterns. The physical approach to graphical mapping essentially involves a researcher physically mapping the messages and creating the visual output.
It is often conducted in combination with another approach, such as conversational analysis, and provides a richer understanding of the electronic conversation being analyzed. Patterns in conversation inform our
UNDERSTANDING ELECTRONIC MESSAGES VIA CONVERSATIONAL ANALYSIS
Understanding and analyzing the meaning of electronic discussion has largely been conducted by conversational analysis. Conversational analysis has been defined as a systematic examination of documents (Babbie, 1990) and as a technique aimed at understanding the learning process (Henri, 1992). As Kuehn (1994) concluded, conversational analysis provides an objective and systematic examination of the manifest content of communication. The use of conversational analysis for research conducted on computer-mediated communication has been well documented (Hara et al., 2000; Henri, 1992; Kuehn, 1994). Kuehn (1994) noted that in computer-mediated communication conversational analysis can be used in two ways: to describe a communication phenomenon or to test a hypothesis. Conversational analysis research on computer-mediated communication has resulted in the development of a number of frameworks for this purpose. Levin, Kim, and Riel (1990) examined interactions found in email messages sent to a group list. The analysis of the topic content of those discussion threads led to the development of Intermessage Reference Analysis, which comprised graphically representing messages in cluster diagrams and analysing the message act for content. Hiltz (1990) examined the relationship between educational technology and educational effectiveness by sorting data into four categories: technological determinist, social psychological, human relations and interactionist. Harasim (1990) attempted to establish the existence of knowledge building and examined messages for discernible stages of knowledge building.
One of the most commented-on forms of conversational analysis used in CMC was proposed by Henri (1992), who, from a cognitive perspective, developed five categories aimed at revealing the learning process behind the message. The categories were: participative, social, interactive, cognitive and metacognitive. This framework for analysis was used and modified by many researchers. Newman, Webb and Cochrane (1995) used Henri’s (1992) five categories, but created more detailed sets of paired indicators in an attempt to show evidence of critical thinking. Howell-Richardson and Mellar (1996) combined Henri’s (1992) categories with speech act theory to examine the facets of illocutionary acts. An interesting extension of the work by Henri (1992) was the Interaction Analysis Model (Gunawardena, Lowe, & Anderson, 1997), aimed at contextualising cognitive phases with social interaction. The framework aimed at identifying the strategies used in the co-creation of knowledge. Messages were categorised into five phases of interaction, which reflected the movement from lower to higher cognitive phases. This framework was the first to attempt to analyse not just the content of the messages and the learning that was occurring, but also the social construction of the new knowledge being created. It could be proposed that this was the first conversational analysis method that attempted to incorporate the social aspect of online communities into the analysis of their discussions. From this arose an attempt to understand the structure of the discourses occurring online (Hara, Bonk & Angeli, 2000). Messages were classified into five categories: elementary classification, in-depth classification, inferencing, judgment and application of strategies. Interactions were mapped electronically to determine the existence of patterns. A further approach by Levy (2003) extended Henri (1992) by using a constructivist action-research cycle (planning, taking action, evaluating and theorising) to classify messages in an attempt to understand knowledge construction. The conversational analysis framework used in this study was The Practical Inquiry Model (Garrison, Anderson & Archer, 2001).
This model recognises and incorporates the shared world and the private world of an individual as important components in the construction of knowledge. The model’s strength lies in its applicability to online communication due to this shared/private world perspective. Individuals participating in online discussions are motivated and influenced by experiences in their private world. The Practical Inquiry Model perceives learning as the social construction of knowledge and therefore places the individual within that learning landscape. An online community is clearly a social group constructing meaning together; whether this is for educational or other purposes depends on the community itself. The model can be seen in Figure 1.

Figure 1. Practical Inquiry Model (Garrison et al., 2001)

Participants
This study forms part of a larger study that was concerned with determining the potential that membership of an online community has as a source of professional development for teachers. The participants in this study were all members of online communities for teachers. They presented as a mixed cohort, with a variety of teaching backgrounds, amounts of teaching experience and geographical locations. Three online communities were selected: one local Australian state-based community, one national Australian community and one international community (see Table 1). It was felt that this combination would provide rich data and a wider perspective on issues. In any investigation of an online community, what is said is of critical importance. The simple act of counting messages is only a partial measure of the success and reach of the community. More complex measures of community impact can only be drawn from the analysis of the messages themselves. Hence it was decided that, in conjunction with analyzing the content of the messages, an attempt would be made to graphically map the online discussions to determine if further understanding could be achieved. It was decided that the community transcripts would be selected from the same time period for each of the three communities (see Table 1) and January 2006 was randomly chosen. It was hoped that, as this represented the start of a new school year in Australia, and the end of Term 1 in the United Kingdom, there would be rich data to analyse. The community transcripts were accessed through public archives and required a member username and password to access. As they were professional online communities, conversation tended to be around issues associated with teaching, such as problems, lesson ideas and curriculum issues. As this component formed part of a larger study that included the use of an electronic survey and focus group forum, permission was sought and obtained from all participants prior to accessing the community transcripts.

Table 1. Teacher online communities activity (January 2006)
| Community Name | Acronym | Location | Membership (as at January 2006) (N=1288) |
| --- | --- | --- | --- |
| BECTA-Top Teachers | BECTA | United Kingdom | 568 |
| Oz-TeacherNet | OTN | Australia | 608 |
| SSABSA – English Teachers | SSABSA | South Australia | 112 |
Method
Messages were coded and analysed using The Practical Inquiry Framework (Garrison et al., 2001). The messages were classified according to the four phases of the model, which are (1) triggering event, (2) exploration, (3) integration, and (4) resolution. These four phases reflect the critical thinking process and indicate a cognitive presence (Garrison et al., 2001). Each phase carried a broad descriptor: (a) evocative, (b) inquisitive, (c) tentative and (d) committed. An overview of the coding of the three community transcripts for January 2006 is presented in Table 2. The total number of messages coded was 546 (N=546) and the number of messages per phase was: Evocative 101 messages (18.50%), Inquisitive 121 messages (22.16%), Tentative 270 messages (49.45%), and Committed 54 messages (9.89%). Tentative messages formed the largest group, accounting for just under half of all messages. From Table 2, it can be seen that the messages posted fit all four phases and indicators from the Practical Inquiry Model (Garrison et al., 2001). These are detailed further below, and some of the codes are illustrated with examples.
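The per-phase shares can be recomputed directly from the message counts. The short sketch below does only that; the dictionary layout and the `share` helper are illustrative choices, not part of the coding procedure used in the study.

```python
# Message counts per phase, as reported for the January 2006 transcripts.
phase_counts = {
    "Evocative": 101,
    "Inquisitive": 121,
    "Tentative": 270,
    "Committed": 54,
}

total = sum(phase_counts.values())


def share(count, whole):
    """Percentage of `whole` represented by `count`, rounded to two decimals."""
    return round(100 * count / whole, 2)


shares = {phase: share(count, total) for phase, count in phase_counts.items()}

for phase, pct in shares.items():
    print(f"{phase}: {phase_counts[phase]} messages ({pct:.2f}%)")
```

The same helper can be applied within a phase, for example dividing a single code's count by its phase total, to check the percentages given in the per-phase tables.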
Table 2. Overview of transcript coding (January 2006)
| Phase and code | Total no. of messages per descriptor (N=546) | No. of messages per code (N=546) |
| --- | --- | --- |
| Evocative codes (Triggering event) | 101 | |
| E1 Recognising the problem | | 37 |
| E2 Sense of puzzlement | | 64 |
| Inquisitive codes (Exploration) | 121 | |
| I1 Divergence – within the online community | | 26 |
| I2 Divergence – within a single message | | 9 |
| I3 Information exchange | | 17 |
| I4 Suggestions for consideration | | 32 |
| I5 Brainstorming | | 15 |
| I6 Leaps to conclusions | | 22 |
| Tentative codes (Integration) | 270 | |
| T1 Convergence – among group members | | 74 |
| T2 Convergence – within a single message | | 32 |
| T3 Connecting ideas, synthesis | | 62 |
| T4 Creating solutions | | 102 |
| Committed codes (Resolution) | 54 | |
| C1 Vicarious application to the real world | | 30 |
| C2 Testing solutions | | 12 |
| C3 Defending solutions | | 12 |

Evocative (Triggering Event) Messages
As noted, the total number of messages in this phase was 101, which represents 18.5% of the total (N=546). This ratio would be expected, as these types of messages acted as the trigger for the community discussion and their purpose was to inspire or provoke further debate and discussion. “Evocative” messages can be divided between E1: Recognising the problem and E2: Sense of puzzlement codes. The breakdown between E1 and E2 messages is shown in Table 3.

Table 3. Evocative messages: BECTA, OzTeachers and SSABSA Transcripts (January 2006)
| Message phases and descriptors (codes) | # of messages per phase (N=546) | # and % of messages per descriptor |
| --- | --- | --- |
| EVOCATIVE CODES (Triggering event) | 101 | |
| E1 Recognising the problem | | 37 (36.63%) |
| E2 Sense of puzzlement | | 64 (63.37%) |

Typically, E1: Recognising the problem messages contained requests supported by contextual information to help members understand what was required. For example:

I’ve just been having a look at Google Earth – looks fantastic! I was wondering if anyone has come up with any good classroom uses for it - it looks like it would be great for exploring Roman Roads for example.

BECTA Transcript Lines: 6162-6165

E2: Sense of puzzlement messages tended to be about professional issues or topics not directly related to a specific classroom problem, for example:

As a secondary maths [sic] teacher, from time to time I encounter the question “When am I ever going to use this?” What’s the point of learning algebra or trigonometry? The implication of the question is that the study of these branches of mathematics is pointless because few people will ever need to put them to use in their chosen career. How do we answer such questions?

OzTeacherNet Transcript Lines: 4504-4532
Inquisitive (Exploration) Messages
The exploration phase is illustrated by six “inquisitive” descriptors (I1-I6). These are used to identify (a) divergence within the online community (Code I1), (b) divergence within a single message (Code I2), (c) information exchange (Code I3), (d) suggestions for consideration (Code I4), (e) brainstorming (Code I5), and (f) leaps to conclusions (Code I6). Inquisitive messages were posted in response to evocative (triggering event) messages (E1 and E2) and were observed to be more prolific in response to E1 messages. Overall, 121 inquisitive messages were coded, representing 22.16% of all messages analysed (N=546). The breakdown of the coded inquisitive messages into the six descriptors can be seen in Table 4.

Table 4. Inquisitive messages: BECTA, OzTeachers and SSABSA Transcripts (January 2006)
| Message phases and descriptors (codes) | # of messages per phase (N=546) | # and % of messages per descriptor |
| --- | --- | --- |
| INQUISITIVE CODES (Exploration phase) | 121 | |
| I1 Divergence – within the online community | | 26 (21.49%) |
| I2 Divergence – within a single message | | 9 (7.44%) |
| I3 Information exchange | | 17 (14.05%) |
| I4 Suggestions for consideration | | 32 (26.45%) |
| I5 Brainstorming | | 15 (12.4%) |
| I6 Leaps to conclusions | | 22 (18.18%) |

The most frequently occurring message type was I4: Suggestions for consideration (n=32, representing 26.45% of all inquisitive messages). These were often messages that offered a solution to an E1 or E2 message but also included ideas or suggestions that warranted further exploration. These were commonly in the form of questions, for example:

I would start with parents’ views: from their experience of surfing, do they think it is possible to protect children from downloading unsuitable sites? And then, is it unsafe for children to make their own judgments about these sites?

BECTA Transcript Lines: 490-490

The authors of inquisitive messages attempted to present a solution to a problem or question, but did not want to appear authoritative or dominating. They couched their suggestions in rhetorical questions that could be responded to by the community. The second most frequently occurring inquisitive messages were I1: Divergence – within the online community (n=26, representing 21.49% of all inquisitive messages). This did not indicate that arguments were rife among the communities, only that differing opinions or ideas were being presented for consideration. An example of this is the following message concerning the use of interactive whiteboards:

I’m wondering if anyone else is as doubtful as I am about the value of these? (The boards - I’m 100% in favour of projectors). I remain unconvinced that they offer anything that cannot be achieved by other means, other perhaps than for the youngest children where the touchy-feely thing is important (and where is the cut-off for this: Yr 2? 4?)
BECTA Transcript Lines: 2757-2762

Tentative (Integration) Messages
Tentative messages formed the largest group of messages coded (n=270, which represents 49.45% of the total number of messages posted). The breakdown for the four tentative codes is seen in Table 5.

Table 5. Tentative messages: BECTA, OzTeachers and SSABSA Transcripts (January 2006)
| Message phases and descriptors (codes) | # of messages per phase (N=546) | # and % of messages per descriptor |
| --- | --- | --- |
| TENTATIVE CODES (Integration phase) | 270 | |
| T1 Convergence – among group members | | 74 (27.41%) |
| T2 Convergence – within a single message | | 32 (11.85%) |
| T3 Connecting ideas, synthesis | | 62 (22.96%) |
| T4 Creating solutions | | 102 (37.78%) |

The progression of a discussion, as per the model being used, could be roughly characterised as the presentation of a problem (evocative), the clarification and exploration of that problem (inquisitive), a possible solution being reached (tentative) and finally, the solution being implemented (committed). Therefore, it could be expected that a large number of messages within the tentative phase would be concerned with creating solutions and connecting ideas. These types of messages offered specific solutions to a problem, usually after the community had agreed on a course of action to follow. The most frequently occurring message descriptor was T4: Creating solutions (n=102). This was the most recurrent message type in its phase (representing 37.78% of all tentative messages) as well as being the most frequent of all messages posted (representing 18.68% of N=546). T4 messages appeared to be “culmination” solutions reached after much discussion. For example:

Getting students to do a comic sports commentary can be fun, or [be] a sports commentator interviewing a literary character (say Macbeth, do you think you kicked an own goal when you tried to get rid of Banquo? Ans: Yes I didn’t think they’d bring on his ghost as a substitute) Kids often like satires, e.g. visiting crikey.com, or some such. Getting students to choose their own scene to dramatise from a film or novel can produce great results.

SSABSA Transcript Lines: 66-72

The process of reaching this agreed course of action required many messages, which were classified as T3: Connecting ideas and synthesis (n=62, representing 22.96% of all tentative messages). Generally, one member of the community attempted to tie together all of the other ideas or proposals. Often this member was the initiator of the evocative message (E1 or E2) that had started the discussion.
Committed (Resolution) Messages
This phase was attributed the smallest number of messages (n=54, representing 9.89% of all messages). As it was the final phase of the discussion and often acted as a closure to the discussion, this finding was not surprising. As there had been 101 triggering messages, it might have been expected that there would be a similar number of committed messages. That there were only 54 messages in this descriptor may indicate that some discussions were not resolved or that solutions were not flagged to the community as having been chosen. Some problems or questions may have been resolved in conversations outside of the community list, or some may have been so simple as to not warrant a formal closure. The breakdown for the committed messages is seen in Table 6.

Table 6. Committed messages: BECTA, OzTeachers and SSABSA Transcripts (January 2006)
| Message phases and descriptors (codes) | # of messages per phase (N=546) | # and % of messages per descriptor |
| --- | --- | --- |
| COMMITTED CODES (Resolution phase) | 54 | |
| C1 Vicarious application to real world | | 30 (55.56%) |
| C2 Testing solutions | | 12 (22.22%) |
| C3 Defending solutions | | 12 (22.22%) |

The most common type of committed message was C1: Vicarious application to the real world (n=30, representing 55.56% of all committed messages). C1 messages attempted to show how the solutions or ideas the group had agreed on applied in real or authentic situations. For example:

Building on [Name] Q Bear, something I’ve seen done is to have two or three stuffed toys going home. (More children get a turn this way) Keep them in cloth bags (library bags) and as well as a diary to fill out include fiction/nonfiction books, e.g. if you have a koala include two or three books about koalas that parents can read to the children or children can read to younger siblings. Something else that is lots of fun and can take care of your whole literature/language programme for a term is Walking Talking Text, developed in the NT for use with aboriginal children. Thanks to others who have made suggestions. I can hardly wait for school to start:)

OzTeacherNet Transcript Lines: 5696-5711

An overview of the spread of messages per descriptor across the three communities can be seen below in Table 7. The total number of messages coded was 546 (N=546) and the number of messages per phase was: Evocative 101 messages (18.50%), Inquisitive 121 messages (22.16%), Tentative 270 messages (49.45%), and Committed 54 messages (9.89%). Tentative messages formed the largest group, accounting for just under half of all messages.

Table 7. Summary of the distribution of messages per descriptor (after Garrison et al., 2001)
| Descriptor (phase) | 1 | 2 | 3 | Total |
| --- | --- | --- | --- | --- |
| Evocative (Triggering event) | 26 | 71 | 4 | 101 |
| Inquisitive (Exploration) | 39 | 72 | 10 | 121 |
| Tentative (Integration) | 98 | 153 | 19 | 270 |
| Committed (Resolution) | 13 | 37 | 4 | 54 |
| Total messages coded | 176 | 333 | 37 | 546 |
1. BECTA Top Teachers 2. Oz-TeacherNet 3. SS English Teachers
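The per-community counts in Table 7 can be checked against the phase totals, and the within-community distribution of phases can be compared, with a few lines of code. This is a sketch for verification only: the short keys `BECTA`, `OTN` and `SSABSA` are the acronyms from Table 1, used here as assumed labels for columns 1 to 3, and the helper name is hypothetical.

```python
# Message counts per phase and community, copied from Table 7.
table7 = {
    "Evocative":   {"BECTA": 26, "OTN": 71,  "SSABSA": 4},
    "Inquisitive": {"BECTA": 39, "OTN": 72,  "SSABSA": 10},
    "Tentative":   {"BECTA": 98, "OTN": 153, "SSABSA": 19},
    "Committed":   {"BECTA": 13, "OTN": 37,  "SSABSA": 4},
}
communities = ["BECTA", "OTN", "SSABSA"]

# Row totals (per phase) and column totals (per community).
row_totals = {phase: sum(cells.values()) for phase, cells in table7.items()}
column_totals = {c: sum(cells[c] for cells in table7.values()) for c in communities}
grand_total = sum(row_totals.values())


def within_community_shares(community):
    """Percentage of a community's coded messages that fall in each phase."""
    total = column_totals[community]
    return {
        phase: round(100 * cells[community] / total, 1)
        for phase, cells in table7.items()
    }


# The phase with the most messages in each community.
largest_phase = {
    c: max(table7, key=lambda phase: table7[phase][c]) for c in communities
}

print(row_totals, column_totals, grand_total)
for c in communities:
    print(c, within_community_shares(c))
```

Run on the Table 7 counts, the row and column totals agree with the reported figures, and Tentative is the largest phase in each of the three communities.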
GRAPHICAL MAPPING OF DISCUSSION THREADS
In the second stage of analysis, the discussion threads were graphically mapped using the qualitative software program MAXMaps®. This phase of analysis was outside The Practical Inquiry Model (Garrison et al., 2001) and was an innovative attempt to determine if mapping would contribute to the analysis of the content. These graphical representations helped to develop a clearer understanding of the structure of the electronic discussions. It was decided to limit each map to one specific discussion thread. Each community transcript was then surveyed to determine the main threads of the discussion for the month. It is of interest to note that the communities, despite differences in size, location and purpose, all evidenced the same patterns of message types. From the three community transcripts, nine discussion threads were followed. These threads were selected on the criterion that they displayed the first three descriptors of the conversational analysis framework, The Practical Inquiry Model (Garrison et al., 2001), that is, evocative, inquisitive and tentative codes. Of the nine threads analysed, three distinct patterns emerged when the graphical representations were developed. These patterns provided a wealth of information regarding the nature of online discussions. The patterns can be seen below in Figure 2. They are Pattern 1 (flowchart design), Pattern 2 (regular cluster design) and Pattern 3 (bonded cluster design).
RESULTS
(1) Pattern 1: Flowchart Design
These discussion threads were simple and logical to follow and were conducted over an average period of 1-2 weeks. They were the most commonly occurring design; four of the nine threads mapped were flowchart designs. They were initiated by an evocative message (E1 or E2). Most commonly this type of message was a specific pedagogical problem with a clear context. The immediate response from members of the community was to offer tentative messages, typically T4: Creating solutions. The initiator of the evocative message concluded the thread discussion by contributing a committed message indicating which suggestion they would adapt or use. This thread pattern has three clear evolutionary stages, which are:
• Stage 1: Evocative message (Triggering event) – the discussion is initiated by the posting of an E1 or E2 message.
• Stage 2: Tentative (Integration) – attempts by the community to solve the problem developed over a period of 1-2 weeks with numerous contributors.
• Stage 3: Committed (Resolution) – committed message from initiator indicating a solution has been found.
An example of this can be drawn from a discussion on the Oz-TeacherNet community concerning homework for primary school students. This pattern has been mapped in Figure 3.

Figure 3. Example of flowchart design thread pattern
(2) Pattern 2: Regular Cluster Design
Of the nine threads analysed, three were classified as regular cluster design patterns. These discussion threads were simple and conducted over a short period of time, on average less than 1 week. These discussions were characterised by the following:
• They were initiated by an evocative message (E1 or E2).
• They generally concerned a problem that had arisen, which was often explained within a context.
• The community then responded by offering tentative messages. In effect, they were brainstorming possible solutions (most typically I5 messages).
• The cluster did not have any resolution and there was no feedback from the initiator that they would adopt one of the suggestions.
This discussion pattern had two evolutionary stages:
• Stage 1: Evocative message (Triggering event) – the discussion is initiated by the posting of an E1 or E2 message.
• Stage 2: Tentative (Integration) – Brainstorming solutions / suggestions offered over a short period (approximately 1 week)
An example of this can be drawn from a discussion from the SSABSA – English Teachers community concerning group oral presentations for Year 11 students. This pattern has been mapped in Figure 4.

Figure 4. Example of regular cluster design thread pattern

(3) Pattern 3: Bonded Cluster Design
Of the nine threads analysed, two were classified as bonded cluster design patterns. These were the most complex discussion threads and were conducted over a longer period of time, that is, more than 2 weeks. They were characterised by the following:
• They were initiated by an evocative message (E1 or E2) but more typically an issue open for discussion (E2) rather than a specific problem that needs to be solved (E1).
• Often it was a topic or problem that requested personal opinions or thoughts to be shared with the community.
• The response to the evocative message may be divided into two stages, namely inquisitive (exploration) and tentative (integration).
• The discussion progresses and switches between inquisitive and tentative stages multiple times.
• The discussion may be led in a new direction by a secondary evocative message.
• Throughout the stages of the discussion, some members may attempt to reach a consensus and offer committed messages to the community.
• The discussion does not end with a definite resolution and finishes in an inquisitive stage.
This discussion pattern has multiple evolutionary stages. Stages 3, 4 and 5 may be only partly present or may be repeated any number of times, depending on the discussion:
• Stage 1: Evocative message (Triggering event) – the discussion is initiated by the posting of an E1 or E2 message.
• Stage 2: Inquisitive &/or Tentative: Initial responses
• Stage 3: Multiple manifestations of Inquisitive, Tentative and Committed messages
• Stage 4: New Trigger: New evocative message associated to initial evocative message
• Stage 5: Multiple manifestations of Inquisitive, Tentative and Committed messages.
• Stage 6: Inquisitive: Conclusion, no resolution reached
An example of this can be drawn from a discussion from the BECTA Top Teachers community concerning the use of interactive whiteboards. This pattern has been mapped in Figure 5.
DISCUSSION The purpose of this study was to graphically map discussion threads in an attempt to ascertain if electronic discussions follow specific patterns and if those patterns could help in the understanding of differences between electronic discussions and the nature of those differences. The coding of the three community transcripts provided rich data regarding the content and nature of the discussions that were taking place. However as stated previously, understanding the content of the messages is not limited to the actual message itself. An electronic message is sent either in response to or to start a discussion thread and by examining the entry point of a new message can help clarify the dynamics of the community discussion. Nine discussion threads were selected from the three online communities that displayed the first three descriptors of the conversational analysis framework, The Practical Inquiry Model (Garrison et al., 2001), evocative, inquisitive and tentative codes. Of the nine thread patterns analysed, three distinct patterns emerged. The flowchart patterns were straightforward and logical discussion threads. They were conducted over an average period of time of 1-2 weeks and were discussions that arose in response to a message with a specific problem. They were characterised
Figure 5. Example of bonded cluster design thread pattern
188
Graphically Mapping Electronic Discussions
as having three stages; trigger event, integration and resolution. These were the most commonly observed pattern, suggesting that the majority of online discussions follow a cyclical structure of problem – suggestion - conclusion. This may also indicate what the majority of members seek from their online communities, reasoned solutions to problems. The regular cluster patterns were the simplest discussion threads, conducted over the shortest time-frame, usually less than one week. They followed an uncomplicated problem – solution pattern and once solutions had been offered, the discussion was concluded by the community. They were characterised as having two stages; trigger event and integration. Whilst they were not the most commonly observed pattern, they were also concerned with problem solving. This strengthens the suggestion that the majority of members, in this case teachers, are seeking a forum for problem solving from their online communities. The bonded cluster patterns were the most complex discussion threads, conducted over a longer period of time, usually more than two weeks. The thread was initiated by an evocative message, but the discussion may then be led in new directions or return back to original threads and did not have a definite resolution. They were characterised as having six stages; trigger event and then multiple manifestations of messages and new triggers. These discussions were concerned with pedagogical issues rather than problems, which may explain the differences in time, responses and patterning. Perhaps the difference in triggers, such as discussions concerning issues, which were comprised largely of messages offering personal opinions or suggestions, could be an explanation for the complexity of these patterns. The three thread patterns that emerged provided a clearer understanding of the nature of discussion threads. 
It would appear that some discussion threads had a short period of sustainability (regular cluster threads) as communities dealt with problems or issues quickly. Some discussion threads followed a comprehensible path of problem, solution and resolution as communities offered possible solutions to be considered and then one was clearly chosen to be applied (flowchart threads). The most complex and most revealing thread pattern was the bonded cluster. This discussion thread plainly demonstrated the dynamic nature of electronic discussions as members responded to new triggers at different stages during the discussion. It also clearly showed the ability of asynchronous discussion to go back and revisit previous messages, and this non-linear capability would appear to be a unique feature of electronic discussions.
CONCLUSION The findings from this study offer further insight into the nature of online discussion and help to further the understanding of online conversational dynamics. As mentioned previously, transcripts of electronic discussions have traditionally been examined via automated or physical approaches. Coding transcripts can provide researchers with rich data regarding the content and nature of the discussions that take place. However, the simple act of counting messages is only a partial measure for understanding the electronic discussion being examined. An electronic message is sent either in response to a discussion thread or to start one, and examining the entry point of a new message can help to clarify the dynamics of the community discussion. The unique characteristics of electronic text, or hypertext, such as its non-linear capabilities, result in communicative acts that have the potential to be more dynamic than more traditional forms of communication. Cursory examinations of discussion threads have shown that some are more detailed or draw a larger group of participants, whilst others are quite short or limited. Understanding how these threads differ helps to create a clearer picture of the nature and structure of electronic discussions.
The graphical mapping tool presented here is unique in its freedom from an underlying theoretical framework. This enables it to be used with a wide range of analytical approaches. The approach presented here used a conversational analysis framework. One could ask why mapping is worth the effort, and whether conversational analysis alone is enough. This tool attempts to combine the visual with the textual, whereas analysis is currently conducted either visually or textually. It serves as a useful accessory to other modes of analysis: it may answer questions that other modes cannot, it may provide extra information or understanding, or, on a simple level, it may provide an image that can be used to represent a communicative act. Electronic discussions will remain a popular source of research data due to their accessibility and prevalence. Methodological approaches have focused on understanding the content of the messages or the cognitive processes behind the messages. Understanding the content should be considered the first step in examining electronic discussions. This study sought to determine whether mapping would contribute to the analysis of the content. The graphical representations helped to develop a clearer understanding of the structure and nature of the discussions being analysed. The results of this study provide researchers with a further methodological tool that can be used to complement conversational analysis of electronic discussions.
REFERENCES Babbie, E. (1990). Survey Research Methods. Belmont, CA: Wadsworth Publishing Company.
Combs Turner, T., Smith, M. A., Fisher, D., & Welser, H. T. (2005). Picturing Usenet: Mapping computer-mediated collective action. Journal of Computer-Mediated Communication, 10(4), article 7. Retrieved from http://jcmc.indiana.edu/vol10/issue4/turner.html
Connelly, F. M., & Clandinin, D. J. (1990). Stories of expertise and narrative inquiry. Educational Researcher, 19(5), 2–14.
Cossette, P., & Audet, M. (1992). Mapping of an idiosyncratic schema. Journal of Management Studies, 29(3), 325–347. doi:10.1111/j.1467-6486.1992.tb00668.x
Donath, J., Karahalios, K., & Viegas, F. (1999). Visualising conversations. In Proceedings HICSS-32.
Garrison, D. R., Anderson, T., & Archer, W. (2001). Critical thinking, cognitive presence, and computer conferencing in distance education. American Journal of Distance Education, 15(1), 7–23. doi:10.1080/08923640109527071
Grabowski, B., Pusch, S., & Pusch, W. (1990). Social and intellectual value of computer-mediated communications in a graduate community. ETTI, 27(3), 276–283.
Gunawardena, C. N., Lowe, C. A., & Anderson, T. (1997). Analysis of a global online debate and the development of an interaction analysis model for examining social construction of knowledge in computer conferencing. Journal of Educational Computing Research, 17(4), 397–431. doi:10.2190/7MQV-X9UJ-C7Q3-NRAG
Hara, N., Bonk, C. J., & Angeli, C. (2000). Conversational analysis of online discussion in an applied educational psychology. Instructional Science, 28(2), 115–152. doi:10.1023/A:1003764722829
Henri, F. (1992). Computer conferencing and conversational analysis. In Kaye, A. R. (Ed.), Collaborative learning through computer conferencing: The Najaden Papers (pp. 117–136). Berlin: Springer-Verlag.
Hiltz, S. R. (1990). Evaluating the virtual classroom. In Harasim, L. M. (Ed.), Online Education: Perspectives on a new environment (pp. 133–184). New York: Praeger.
Kerpedjiev, S., & Roth, S. F. (2001). Mapping communicative goals into conceptual tasks to generate graphics in discourse. Knowledge-Based Systems, 14, 93–102. doi:10.1016/S0950-7051(00)00100-3
Kuehn, S. A. (1994). Computer-mediated communication in instructional settings: a research agenda. Communication Education, 43, 171–183. doi:10.1080/03634529409378974
Landow, G. P. (1994). Hypertext Theory. Baltimore, MD: The Johns Hopkins University Press.
Levin, J. A., Kim, H., & Riel, M. (1990). Analyzing instructional interactions on electronic message networks. In Harasim, L. M. (Ed.), Online Education: Perspectives on a new environment (pp. 185–214). New York: Praeger.
Levy, P. (2003). A methodological framework for practice-based research in networked learning. Instructional Science, 31, 87–109. doi:10.1023/A:1022594030184
McCreary, E. K. (1990). Three behavioral models for computer-mediated communication. In Harasim, L. M. (Ed.), Online Education: Perspectives on a new environment (pp. 117–130). New York: Praeger.
Mellar, H., & Kambouri, M. (2004). Learning and teaching adult basic skills with digital technology. In Brown, A., & Davis, N. (Eds.), Digital technology, Communities and Education (pp. 131–144). London: Routledge Falmer. doi:10.4324/9780203416174_chapter_8
Newman, D. R., Webb, B., & Cochrane, C. (1995). A conversational analysis method to measure critical thinking in face-to-face and computer supported group learning. Interpersonal Computing and Technology, 3(2), 56–77.
Popolov, D., Callaghan, M., & Luker, P. (2000). Conversation Space: Visualising multi-threaded conversation. In Proceedings AVI2000 (pp. 246-249).
Sack, W. (2001). Conversation Map: An interface for very large-scale conversations. Journal of Management Information Systems, 17(3), 73–92.
Smith, M., & Fiore, A. (2001). Visualization components for persistent conversations. In Proceedings CHI 2001 (pp. 136-143).
Venolia, G. D., & Neustaedter, C. (2003). Understanding sequence and reply relationships within email conversations: a mixed-model visualization. In Proceedings CHI 2003 (pp. 361-368). ACM Press.
Viegas, F. B., & Smith, M. (2004). Newsgroup Crowds and AuthorLines: Visualizing the activity of individuals in conversational cyberspaces. In Proceedings 37th International Conference on System Sciences.
Wortham, S. E. F. (1996). Mapping participant deictics: a technique for discovering speakers’ footing. Journal of Pragmatics, 25, 331–348. doi:10.1016/0378-2166(94)00100-6
Chapter 10
A Tool to Study the Evolution of the Domain of a Distributed Community of Practice
Gilson Yukio Sato, Federal University of Technology - Paraná, Brazil
Hilton José Silva de Azevedo, Federal University of Technology - Paraná, Brazil
Jean-Paul Barthès, Université de Technologie de Compiègne, France
ABSTRACT Virtual communities and distributed communities of practice leave traces of their activities that are a valuable source of research material. At the same time, studying this kind of community requires new methods, techniques and tools. In this chapter, we present the Community Agent: a tool to follow the evolution of the domain of a distributed Community of Practice. Such a tool aims at obtaining and presenting graphically some indicators to study the evolution of the domain of a Community of Practice and the participation of its members. We present the implementation of the Community Agent, the results obtained in the preliminary tests and an example of how the agent could be used to study distributed communities.
DOI: 10.4018/978-1-60960-040-2.ch010
INTRODUCTION Virtual Communities and Distributed Communities of Practice (CoPs) are an exciting research subject. As they cannot rely exclusively on face-to-face interactions, they usually interact through Internet-based tools, ranging from email to virtual environments. Under such circumstances,
methods and techniques used to study collocated groups are inadequate to study their distributed counterparts. Interacting through the Internet, communities leave traces of their activities that constitute a vast body of research material. This material offers innumerable research opportunities, but the number of documents to analyze is challenging. Moreover, some of these documents (e.g. email messages, chat transcripts) are unstructured and use informal
language. To face this challenge, a set of adequate methods, techniques and tools is necessary. This chapter aims at presenting the Community Agent (CA), a tool that is potentially useful to study virtual communities and Distributed CoPs. The tool is based on the idea of a control panel (or a dashboard) that shows indicators of how a device is operating. The CA presents indicators of the evolution of the domain and the participation of the members of a distributed community. We open the chapter discussing some issues about CoPs and indicators for distributed groups. Then, we describe a tool to study distributed CoPs: the CA, and its implementation. To complete such a description, we illustrate how the CA could be used to analyze the community domain and its members’ participation. To finish the chapter, we present some conclusions and future research directions.
DISTRIBUTED COMMUNITIES OF PRACTICE The notion of Communities of Practice (CoPs) was created by Lave and Wenger (1991) in their seminal work ‘Situated Learning: Legitimate Peripheral Participation’. Since then, it has been used in domains such as Education and Knowledge Management (examples in: Barton & Tusting, 2005; Hildreth & Kimble, 2004; Hughes, Jewson, & Unwin, 2007; Wenger, McDermott, & Snyder, 2002). Cox (2005) and Kimble (2006) agree that the notion evolved through three phases and that it underwent important changes in each of them. Two key works of the first phase are the already mentioned work by Lave and Wenger (1991) and the paper by Brown and Duguid (2000), ‘Organizational Learning and Communities of Practice: Toward a Unified View of Working, Learning, and Innovation’, originally published in 1991. The work that defined the second phase is the book ‘Communities of practice:
learning, meaning and identity’ by Wenger (1998). The third phase can be represented by the book ‘Cultivating communities of practice: a guide to managing knowledge’ by Wenger et al. (2002).
In the first phase, Lave and Wenger (1991) concentrate on the concepts of Situated Learning and Legitimate Peripheral Participation, leaving the notion of CoPs in the background. In contrast, Brown and Duguid (2000) consider CoPs as a management tool to support learning and innovation in companies. In the second phase, Wenger (1998), following the path indicated by Brown and Duguid (2000), puts the notion of CoPs at center stage, developing it and its relations with other concepts such as identity, meaning and engagement. The third phase is more prescriptive: Wenger et al. (2002) develop recommendations for applying CoPs in Knowledge Management initiatives. The approach used in the third phase is less deep and complex, but some of its concepts can be useful to analyze CoPs.
In the third phase, a CoP is defined as ‘a group of people who share a concern, a set of problems, or a passion about a topic, and who deepen their knowledge and expertise in the corresponding area by interacting on an ongoing basis’ (Wenger et al., 2002). A structural model of CoPs is also developed. It combines three elements: (i) a domain of knowledge; (ii) a community of people; and (iii) a shared practice. The domain defines a set of issues and legitimizes the community by affirming its purpose and value to its members. The domain motivates members’ participation and contribution and helps them to define what activities should be performed. The community creates the social fabric of learning and fosters interactions and relationships based on mutual respect and trust. This kind of relationship creates an environment that encourages people to share ideas, to expose their ignorance, to ask questions and to listen carefully.
The practice is a ‘set of frameworks, ideas, tools, information, styles, languages, stories and documents that community members share’. It
represents the knowledge that the community creates, shares and maintains (Wenger et al., 2002). Although suggested in the second phase, when the notion of locality is discussed, distributed CoPs are defined in the third phase. In a distributed CoP, members are distributed geographically and thus cannot count on face-to-face meetings to interact. Instead, they use technological means such as videoconferencing, email and virtual environments (Wenger et al., 2002). In this section, we presented some definitions that we used in our work. As the theoretical framework surrounding the notion of CoP has passed through three phases, we consider it important to make explicit that we mostly use the concepts of the third phase.
Indicators for Distributed COPs The tool described in this chapter aims at obtaining and presenting graphically some indicators to study the evolution of the domain of a CoP and the participation of its members. Indicators have been used to motivate group members and to support group coordination, but we consider that the same kind of indicators could help researchers to study distributed CoPs. Ackerman and Starr (1995) used various social activity indicators to improve the utilization rate of a groupware system. The idea was to demonstrate the intensity of the group’s activity in order to motivate members to participate. They developed a system to analyze messages exchanged in a synchronous chat application and to extract indicators such as the presence of a new message, the level of activity in subgroups, and a social network diagram. The system was created to motivate members of a group to use groupware, but we consider that the kind of social indicators it captures can be useful to study groups such as CoPs. Gouvea et al. (2006) use indicators to motivate members to participate in a CoP. This CoP uses a virtual environment that attributes scores to each action taken by a member (e.g. participation in the
discussion board, use of the repository) and the best-scoring members are given prizes. The system uses simple indicators that help to observe the activity in the community, but we think that they are not enough to study this community. Moreover, we consider that the approach is inadequate for fostering a CoP. Lock Lee and Neff (2004) describe a system specifically developed to support CoPs at BHP Billiton. The system has functionalities that allow on-line group discussions, document sharing, news broadcasting, requests for help and a directory of members. Moreover, the system collects data about its own use in order to support the community’s coordinating team. The data is presented as maps of the existing social networks. By analyzing the evolution of the maps over time, it was possible to observe the evolution of the CoP inside the company’s social network. Although this system was conceived to support the coordination of CoPs, it also presents a way to study distributed CoPs. De Laat and Broer (2004) analyzed a CoP of police officers working in drug prevention. The CoP was using an ICT system called Police Discussion Net (PDN) to share information and discuss relevant questions. As the interactions among members were stored in the system, the researchers were able to study the interaction patterns by analyzing the content of the discourse. Such patterns were visualized in a multi-dimensional scaling plot. The research helped to identify the community’s core members and to verify the absence of sub-groups. De Laat and Broer (2004) used some indicators to study a CoP, as we intend to do using the tool presented in our work. We consider that an adequate set of indicators can successfully help to study and understand distributed CoPs. Most works presented in this section intend to support virtual groups, but they suggest various indicators that can be part of a set of indicators to study distributed CoPs.
A TOOL TO STUDY DISTRIBUTED COPs The Community Agent (CA) was originally conceived to support the coordination of distributed CoPs. The idea was to show to the community coordinators when they should act to keep the community in good health (Sato, 2008). But analyzing the results provided by the tool, we concluded that they could be useful to study distributed CoPs.
The Community Agent (CA) The CA analyses messages that members post in a community discussion list in order to create graphs representing the content of the messages and members’ participation. Two structural elements of CoPs can be analyzed: the community domain and the participation of community members. The CA has a mailbox, like a regular community member, to receive messages from the discussion list. It processes the messages and creates seven graphical representations. Two sets of graphs can be identified: one concerning the community domain and another concerning members’ participation. The set concerning the community members’ participation is formed by three graphs:
1. Number of Messages graph;
2. Number of Participants graph;
3. Levels of Participation graph.
The Number of Messages and Number of Participants graphs are bar graphs that show, respectively, the number of messages submitted to the discussion list in the last seven days and the number of community members who participated in these discussions. The Levels of Participation graph is more complex. It shows the members who posted in the discussion list in the last 30 days, classified in three levels of participation: low, medium and high.
The Number of Messages and Number of Participants graphs could be useful to verify the fluctuation in the intensity of the community activity and to identify members who tend to monopolize discussions. Moreover, they could be useful to observe which subjects boost the participation of the community members or to observe the effects of the introduction of a new member or a new artifact. The Levels of Participation graph could suggest who the core, the active, and the peripheral members of the community are. It cannot be used as a definitive reference, as it uses a limited source of information, but it could help to study the members’ participation in the community. This graph could also give clues about the trajectory of the community members. As a member changes her level of participation, she establishes a trajectory inside the community. The set concerning the community domain is formed by four graphs:
1. Number of Messages in each Category graph;
2. Distance among Messages graph;
3. Frequency of the Macro-concepts graph;
4. Distance among Macro-concepts graph.
The CA classifies messages in pre-defined categories. The Number of Messages in each Category graph is a bar graph in which bars indicate the number of messages in each category. The messages of the past seven days are considered. The Distance among Messages graph is a multi-dimensional scaling plot. The distance among messages is defined by the similarity of the contents of the messages. The Frequency of the Macro-concepts graph is a bar graph in which each bar represents the number of occurrences of a given macro-concept in the last seven days. We define a macro-concept as a term that groups the occurrences of a set of related terms. The Distance among Macro-concepts graph is also a multi-dimensional scaling plot, but the distances represent the extent to which macro-concepts occur in similar contexts.
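As an illustration only, the grouping of term occurrences under macro-concepts could be sketched in Python as follows. This is not the CA's implementation: the plain dictionary stands in for the ontology, the function name `macro_concept_frequencies` is hypothetical, and the tokenization rule is an assumption made for the sketch.

```python
import re
from collections import Counter

# Hypothetical term-to-macro-concept mapping. The chapter derives this
# grouping from an ontology; a plain dictionary is assumed here instead.
MACRO_CONCEPTS = {
    "internet": "Internet",
    "www": "Internet",
    "web": "Internet",
    "evaluation": "Evaluation",
    "added-value": "Evaluation",
    "results": "Evaluation",
}


def macro_concept_frequencies(messages, mapping=MACRO_CONCEPTS):
    """Count occurrences of each macro-concept over a list of message texts."""
    counts = Counter()
    for text in messages:
        # Lower-case tokens; hyphens are kept so 'added-value' stays one term.
        for token in re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower()):
            macro = mapping.get(token)
            if macro is not None:
                counts[macro] += 1
    return counts
```

Applied to the messages of a seven-day window, the resulting counts would give the bar heights of a graph such as the Frequency of the Macro-concepts graph.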
The Number of Messages in each Category graph could indicate fluctuations in the community domain. The number of messages in a given category could indicate more or less concern about the subject of the category. Moreover, by analyzing the graphs over time, some aspects of the evolution of the community domain could be observed. The Distance among Messages graph could indicate different interests of the community. It could also help to identify sub-communities inside a CoP. The Frequency of the Macro-concepts and Distance among Macro-concepts graphs could be used to follow changes and trends in the community domain. Moreover, they could help to analyze the relationships among macro-concepts in a specific community. For example, if the use of technology is an important subject for the community, the macro-concept ‘Technology’ might be frequent in the Frequency of the Macro-concepts graph. In the Distance among Macro-concepts graph, if the macro-concept ‘Technology’ is near the other macro-concepts, it probably indicates that the use of technology is an issue for the whole community. But if the macro-concept ‘Technology’ is distant from the other important concepts, it could indicate the emergence of a sub-community interested only in the technological aspect.
Figure 1. Levels of participation graph
In this sub-section, we described the CA and the indicators it is able to present. We also suggest some ways to use these indicators. In the next section, we present the techniques used to implement the CA.
Implementation The Community Agent (CA) was implemented using the multi-agent platform OMAS (Open Multi-Agent System) (Barthès, 2003). We chose the multi-agent technology due to its distributed character and its flexibility. The CA has three major skills: to read email messages, to process them and to present the information extracted from the messages. In this section, we describe how the CA processes email messages (Sato, 2008). The implementation of the Number of Messages and the Number of Participants graphs is simple. Both graphs use information extracted from the email message headers. The implementation of the Levels of Participation graph (Figure 1) is more complex. The CA computes the number of participations of each member and uses the Fisher Algorithm (Hartigan, 1975) to classify the participation in three levels: low, medium and high participation. The Fisher Algorithm determines
the optimal partition of a population (community members), characterized by only one variable (number of messages in the last 30 days), into a pre-established number of classes (low, medium and high participation).
To classify the email messages for the Number of Messages in each Category graph, we chose a technique elaborated by Enembreck to classify documents (Enembreck, 2003). This technique combines the idea of a centroid with the discriminative power of each term. Each category is represented by a centroid that is calculated using a set of example-messages. Each example-message is represented by a TF-IDF (Term Frequency – Inverse Document Frequency) vector and the centroid is calculated from these vectors. When classifying a new message, the CA calculates a vector for the new message and compares it with the centroid of each category. In this comparison the discriminative power of each term is also considered.
The Distance among Messages graph (Figure 2) is a multi-dimensional scaling plot that represents each message as a circle. The size of a circle is proportional to the number of terms (relevant to the community) in the message. The distance among circles (messages) reflects the dissimilarity of the contents of the messages: if two circles are placed near each other, the contents of the messages that they represent are similar. To facilitate visualization, it is possible to change the colors of the circles. For this, the user should choose a message as a reference (anchor) and then use the scroll bar to choose a reference distance. As the user changes the reference distance, messages whose distance to the anchor is smaller than the reference distance change their color. To calculate the distance among messages, the CA represents each message as a normalized vector with the frequency of terms. Using these vectors, it calculates a matrix (message × message) with the Euclidean distances between every pair of messages. Applying the Multi-Dimensional Scaling (MDS) method (Cox and Cox, 1994; Young, 1985) to this matrix, the CA obtains the coordinates of each message (circle).
In order to build the Frequency of the Macro-concepts graph (Figure 3), it is necessary to group the occurrences of a set of terms under a unique term that we called macro-concept. For example, the occurrences of ‘Internet’, ‘WWW’, and ‘web’ are grouped in the ‘Internet’ macro-concept. Similarly, the terms ‘evaluation’, ‘added-value’ and ‘results’ are considered under the macro-concept ‘evaluation’.
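The distance computation described above for the Distance among Messages graph could be sketched as follows. This is a minimal illustration under stated assumptions, not the CA's code: it uses NumPy, whitespace tokenization, unit-length normalization, and classical (Torgerson) MDS, whereas the chapter does not say which MDS variant or normalization the CA applies. All function names are hypothetical.

```python
import numpy as np


def term_frequency_vectors(messages, vocabulary):
    """Return one unit-length term-frequency vector per message."""
    index = {term: i for i, term in enumerate(vocabulary)}
    vectors = np.zeros((len(messages), len(vocabulary)))
    for row, text in enumerate(messages):
        for token in text.lower().split():
            if token in index:
                vectors[row, index[token]] += 1.0
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # messages without known terms stay all-zero
    return vectors / norms


def euclidean_distance_matrix(vectors):
    """Pairwise Euclidean distances between the rows of `vectors`."""
    diff = vectors[:, None, :] - vectors[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(distances, dimensions=2):
    """Classical MDS: coordinates whose distances approximate `distances`."""
    n = distances.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centred squared distances give the Gram matrix of the points.
    gram = -0.5 * centering @ (distances ** 2) @ centering
    eigenvalues, eigenvectors = np.linalg.eigh(gram)
    order = np.argsort(eigenvalues)[::-1][:dimensions]
    scale = np.sqrt(np.clip(eigenvalues[order], 0.0, None))
    return eigenvectors[:, order] * scale
```

Each row of the array returned by `classical_mds` would then be the plotting position of one message (circle).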
Figure 2. Distance among messages graph
Figure 3. Frequency of the macro-concepts graph
To group synonyms and related terms under a macro-concept, we used an ontology. The Distance among Macro-concepts graph (Figure 4) is also a multi-dimensional scaling plot. Circles in the graph represent macro-concepts and the size of the circles is proportional to the number of occurrences of the macro-concept in the messages. To establish the distance among macro-concepts, the CA calculates a vector for each macro-concept. Each coordinate in these vectors corresponds to the number of occurrences of the macro-concept in one of the messages. The CA applies the Latent Semantic Analysis method (Landauer, 1998) to the matrix formed with the macro-concept vectors in order to make more evident the macro-concepts used in similar contexts. With the resulting vectors, the CA calculates a matrix with the Euclidean distances between every pair of macro-concepts. As for the Distance among Messages graph, the CA applies the MDS method to obtain the coordinates of each macro-concept.
Figure 4. Distance among macro-concepts graph
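The Latent Semantic Analysis step could be sketched as a truncated singular value decomposition of the macro-concept by message count matrix. This is a hedged illustration rather than the CA's implementation: the use of NumPy, the retained rank and the function names are assumptions, and any weighting the CA may apply before the decomposition is omitted.

```python
import numpy as np


def lsa_reduce(counts, rank=2):
    """Project macro-concept rows onto the `rank` strongest latent dimensions.

    `counts` has one row per macro-concept and one column per message.
    """
    u, singular_values, _ = np.linalg.svd(counts, full_matrices=False)
    rank = min(rank, len(singular_values))
    return u[:, :rank] * singular_values[:rank]


def pairwise_euclidean(vectors):
    """Pairwise Euclidean distances between the rows of `vectors`."""
    diff = vectors[:, None, :] - vectors[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

With hypothetical counts in which two macro-concepts occur in the same messages and a third occurs elsewhere, the reduced vectors place the first two close together and the third far from both, which is the effect the distance matrix is meant to capture before MDS is applied.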
In this sub-section, in order to better explain the graphs presented by the CA, we described the techniques that were used to calculate and present such graphs.
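To make the Levels of Participation classification more concrete, the following is a minimal Python sketch of an optimal one-dimensional partition computed by dynamic programming. It assumes that the quantity minimized is the within-class sum of squared deviations, which the chapter does not state; the function name and the sample counts are hypothetical, and the code is not taken from the CA.

```python
def optimal_partition(values, n_classes=3):
    """Split sorted values into n_classes contiguous groups that minimise
    the total within-group sum of squared deviations."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= number of values")

    # Prefix sums give the within-group cost of data[i:j] in constant time.
    prefix = [0.0]
    prefix_sq = [0.0]
    for x in data:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def cost(i, j):
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting data[:j] into k groups.
    best = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    best[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    cut[k][j] = i

    # Walk the stored cut points backwards to recover the groups.
    groups = []
    j = n
    for k in range(n_classes, 0, -1):
        i = cut[k][j]
        groups.append(data[i:j])
        j = i
    return groups[::-1]
```

Called with the number of messages each member posted in the last 30 days, the three returned groups would correspond to low, medium and high participation.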
Using the Community Agent To exemplify the use of the CA, we analyzed the messages posted in the ‘com-prac’ discussion list. It is a list about CoPs that runs on Yahoo! Groups. ‘com-prac’ cannot be qualified as a distributed CoP, but the idea was to test the CA with the content of a real discussion list. We collected data representing a period of 105 days and obtained seven sets of 15 graphs (Sato, 2008). Associating the Number of Messages, the Number of Participants and the Frequency of the Macro-concepts graphs, we could observe that the level of activity in the list tended to fluctuate according to the subjects discussed. Subjects like the coordination of CoPs or technology for CoPs would draw the interest of more members and produce a peak in the group’s level of activity. We could also observe that key members of the group tended to participate in almost all discussions. On the other hand, members who were active, but not key to the group, tended to participate in discussions about a specific subject. When observing the Number of Messages and the Frequency of the Macro-concepts graphs together, it was also possible to notice that, in the peaks of activity, different subjects had been discussed. So, discussions about coordination of CoPs and technologies for CoPs could occur almost concurrently. It seems that activity in the list induced more activity in the list. Analyzing the Levels of Participation graph, we reinforced the notion that key members of the group tend to participate in almost all discussions. When classified by levels of participation, key members had high levels of participation. Moreover, during the analyzed period they tended to keep such a level. The participation of some members fluctuated and one of the causes of this variation seems to be the discussed subjects. As
the period of analysis was too short, members’ trajectories into and inside the group were not observable. One of the aspects of the group we sought to observe with the Distance among Messages graph was the formation of sub-groups. The idea was to analyze messages in the same area of the graph and verify whether the authors of such messages would form a sub-group. Analyzing the graph, we could observe that similar messages were assembled in the same region of the graph. However, analyzing the content of such messages carefully, we noticed that there were different assemblages of messages for the same subject. We also observed that assemblages of messages discussing the same subject could be more distant from each other than from an assemblage discussing a different subject. Analyzing the CA, we concluded that we could improve the ontology for certain subjects. We consider that using the Distance among Macro-concepts graph it is possible to analyze the community domain. Using the macro-concepts ‘Community of Practice’, ‘Community’ and ‘Practice’ together as a reference assemblage, we could observe the subjects that the group considered the most relevant: the coordination of CoPs and the technology for CoPs. Analyzing the sequence of fifteen Distance among Macro-concepts graphs, we observed that the macro-concepts ‘Coordination’ and ‘Technology’ appeared frequently and that they were usually located near the reference assemblage. To illustrate the use of the Distance among Macro-concepts graph, we analyzed two graphs. In both we use the macro-concepts ‘Community of Practice’, ‘Community’ and ‘Practice’ as a reference assemblage. In the first graph (Figure 5), we could observe that the macro-concept ‘Coordination’ is nearer the reference assemblage than the macro-concepts ‘Technology’ and ‘Participation’. The proximity between ‘Coordination’ and the reference is due to a discussion about the substitution of coordinators in communities.
The macro-concept ‘Participation’ is not so far from the reference because such discussion also refers to the
A Tool to Study the Evolution of the Domain of a Distributed Community of Practice
Figure 5. First graph
other members, their reactions and the community development. Another discussion represented in this graph contains messages concerning ‘backchanneling’. The ‘Technology’ macro-concept is further from the reference because this subject was much less discussed than the substitution of the coordinator. In the second graph (Figure 6), the distances between ‘Technology’ and the reference and between ‘Coordination’ and the reference are similar because the subjects concerning two macro-concepts were discussed with a similar intensity. The first subject concerns religion/ church as a CoP (from the perspective of the Figure 6. Second graph
200
engagement, the leadership, etc), the second concerns the Wikipedia entry about communities of practice and the Wikipedia itself. The first subject explains the proximity of ‘Coordination’ and the second the proximity of ‘Technology’. In this section, we presented how the Community Agent could be used to study a distributed CoP. Using the CA it is possible to observe the fluctuation in the level of participation, the members’ behavior, the most relevant subjects and the evolution of the domain. We consider that this kind of indicator could help to study a distributed CoP.
FUTURE RESEARCH DIRECTIONS
The CA is still a research prototype and it should be tested and refined in order to be effectively used as a tool to study CoPs. It is possible to use it to study some aspects of CoPs as illustrated in this chapter, but its interface and functionalities should be improved. Its interface should be more dynamic because, in the prototype, the user cannot choose the period of time used for the analysis (we used seven days or 30 days). Furthermore, as the visualization is not flexible, users cannot choose the time period covered by the visualization and need to print the results to analyze them. For both limitations, there are potential technological solutions available that should be evaluated.
As each community demands a specific ontology and the CA is sensitive to the quality of the ontology, it would be worthwhile to use a methodology to build such an ontology. A methodology should help to improve its quality by improving the repeatability and the traceability of the ontology building process. A tool to help build an ontology semi-automatically could also be useful.
To improve the functionalities of the CA, techniques for document classification, data mining, text summarization and others should be analyzed. As the CA is developed on a multi-agent platform, agents implementing such techniques should be developed and associated with the CA.
As the CA was first conceived to support the coordination of CoPs, it is not associated with a method to study CoPs. We consider this a significant limitation of our work, but there are some options to address it. One of them is to adapt the CA to an existing method by developing new functionalities and using different techniques. Another option is to develop a method bearing in mind the possibilities of the CA. Both options will probably require important modifications to the CA.
CONCLUSION
Although developed to support the coordination of distributed CoPs, the Community Agent could be used as a tool to help the study of distributed CoPs. It provides graphs with information about the community domain and the participation of its members. Using the CA, we were able to observe the fluctuation in the level of participation, the members’ behavior, the most relevant subjects and the evolution of the domain of a group participating in a discussion list.
We consider that the elaboration of methods, techniques and tools to study distributed CoPs is a stimulating research subject. Although distributed CoPs share some characteristics with their collocated counterparts, geographically distributed members face different problems, such as the distance among members, the lack of awareness of other members, the higher number of members and the different cultural mindsets (Wenger et al., 2002). We consider that such problems also affect the way distributed CoPs should be studied; thus, methods and techniques used to study collocated CoPs should be adapted to the circumstances of distributed ones, and new methods and techniques should be developed. As distributed CoPs rely on technological means to operate, we consider that computational tools are also necessary to study them. However, as technology can only enable a distributed CoP, a tool to study such communities should be conceived in association with a method.
REFERENCES
Ackerman, M. S., & Starr, B. (1995). Social activity indicators: interface components for CSCW systems. Paper presented at the Proceedings of the 8th annual ACM symposium on User interface and software technology, Pittsburgh, Pennsylvania, United States.
Barthès, J.-P. (2003). OMAS version 4 - A Primer, Memo UTC/GI/DI/N 167 (pp. 27). Compiègne: Université de Technologie de Compiègne.
Barton, D., & Tusting, K. (Eds.). (2005). Beyond Communities of Practice (1st ed.). Cambridge: Cambridge University Press. doi:10.1017/CBO9780511610554
Brown, J. S., & Duguid, P. (2000). Organizational Learning and Communities of Practice: Toward a Unified View of Working, Learning, and Innovation. In Lesser, E. L., Fontaine, M. A., & Slusher, J. A. (Eds.), Knowledge and Communities (pp. 99–118). Boston: Butterworth Heinemann. doi:10.1016/B978-0-7506-7293-1.50010-X
Cox, A. (2005). What are communities of practice? A comparative review of four seminal works. Journal of Information Science, 31(6), 527–540. doi:10.1177/0165551505057016
Cox, T. F., & Cox, M. A. A. (1994). Multidimensional Scaling (Vol. 59). London: Chapman & Hall.
de Laat, M., & Broer, W. (2004). CoPs for cops: managing and creating knowledge through networked expertise. In Hildreth, P., & Kimble, C. (Eds.), Knowledge Networks: innovation through communities of practice (pp. 58–69). Hershey, PA: Idea Group Publishing.
Enembreck, F. (2003). Contribution à la conception d’agents assistants personnels adaptatifs. PhD Thesis, Université de Technologie de Compiègne, Compiègne.
Gouvea, M. T. A., Motta, C. L. R., & Santoro, F. M. (2006). Scoring mechanisms to encourage participation in Communities of Practice. In W. Shen, J. P. Barthès, J. Deng, J. Yong, Z. Lin, J. Luo, X. Li & Q. Hao (Eds.), Proceedings of the 10th International Conference on Computer Supported Cooperative Work in Design (Vol. 1, pp. 516-521). Nanjing, China: IEEE Press.
Hartigan, J. A. (1975). Clustering Algorithms. New York: John Wiley & Sons.
Hildreth, P., & Kimble, C. (2004). Knowledge Networks: Innovation through Communities of Practice. Hershey, PA: Idea Group Publishing.
Hughes, J., Jewson, N., & Unwin, L. (Eds.). (2007). Communities of Practice: Critical Perspectives. London: Routledge.
Kimble, C. (2006). Communities of Practice: Never Knowingly Undersold. Paper presented at the Innovative Approaches for Learning and Knowledge Sharing: EC-TEL 2006 Workshops Proceedings, Crete, Greece.
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). Introduction to Latent Semantic Analysis. Discourse Processes, 25, 259–284. doi:10.1080/01638539809545028
Lave, J., & Wenger, E. (1991). Situated Learning: Legitimate peripheral participation (1st ed., Vol. 1). Cambridge: Cambridge University Press.
Lock Lee, L., & Neff, M. (2004). How information technologies can help build and sustain an organisation’s CoP: spanning the socio-technical divide. In Hildreth, P., & Kimble, C. (Eds.), Knowledge Networks: innovation through communities of practice (pp. 165–183). Hershey, PA: Idea Group Publishing.
Sato, G. Y. (2008). Contribution à l’Amélioration de la Coordination de Communautés de Pratique Distribuées. PhD Thesis, Université de Technologie de Compiègne, Compiègne.
Wenger, E. (1998). Communities of practice: learning, meaning and identity (1st ed., Vol. 1). Cambridge: Cambridge University Press.
Wenger, E., McDermott, R., & Snyder, W. M. (2002). Cultivating communities of practice: a guide to managing knowledge (1st ed., Vol. 1). Boston: Harvard Business School Press.
Young, F. W. (1985). Multidimensional Scaling. In Kotz, S., & Johnson, N. L. (Eds.), Encyclopedia of Statistical Sciences (Vol. 5, pp. 649–659). New York: John Wiley & Sons.
ADDITIONAL READING
Bradshaw, P., Powell, S., & Terell, I. (2004). Building a Community of Practice: Technological and Social Implications for a Distributed Team. In Hildreth, P., & Kimble, C. (Eds.), Knowledge Networks: innovation through communities of practice (Vol. 1, pp. 184–201). Hershey, PA: Idea Group Publishing.
Brown, J. S. (1998). Internet technology in support of the concept of “communities-of-practice”: the case of Xerox. Accounting, Management and Information Technologies, 8(4), 227–236. doi:10.1016/S0959-8022(98)00011-3
Brown, J. S., & Duguid, P. (1998). Organizing knowledge. California Management Review, 40(3), 90–111.
Brown, J. S., & Duguid, P. (2000). The social life of information (1st ed., Vol. 1). Boston: Harvard Business School Press.
Brown, J. S., & Duguid, P. (2000). Balancing act: How to capture knowledge without killing it. Harvard Business Review, 78(3), 73–80.
Bryant, S. L., Forte, A., & Bruckman, A. (2005). Becoming Wikipedian: transformation of participation in a collaborative online encyclopedia. In Proceedings of the 2005 international ACM SIGGROUP conference on Supporting group work (pp. 1-10). Sanibel Island, Florida, USA: ACM Press.
Contu, A., & Willmott, H. (2003). Re-Embedding Situatedness: The Importance of Power Relations in Learning Theory. Organization Science, 14(3), 283–296. doi:10.1287/orsc.14.3.283.15167
Eraut, M. (2002). Conceptual analysis and research questions: do the concepts of “Learning Community” and “Community of Practice” provide added value? In Proceedings of the Annual Meeting of the American Educational Research Association. New Orleans.
Fox, S. (2000). Communities of Practice, Foucault And Actor-Network Theory. Journal of Management Studies, 37(6), 853–868. doi:10.1111/1467-6486.00207
Fuller, A. (2007). Critiquing theories of learning and communities of practice. In Hughes, J., Jewson, N., & Unwin, L. (Eds.), Communities of Practice: critical perspectives (pp. 17–29). London: Routledge. doi:10.4324/NOE0415364737.ch2
Gherardi, S. (2000). Practice-based theorizing on learning and knowing in organization. Organization, 7(2), 225–246. doi:10.1177/135050840072001
Gherardi, S. (2005). Organizational Knowledge: The Texture of Workplace Learning (1st ed.). Blackwell.
Hildreth, P., & Kimble, C. (2004, May 26–28). Communities of Practice: Going One Step Too Far? Paper presented at the 9e colloque de l’AIM: Systèmes d’information: perspectives critiques, Evry, France.
Hildreth, P., Kimble, C., & Wright, P. (2000). Communities of practice in the distributed international environment. Journal of Knowledge Management, 4(1), 27–38. doi:10.1108/13673270010315920
Hughes, J. (2007). Lost in translation: communities of practice - the journey from academic model to practitioner tool. In Hughes, J., Jewson, N., & Unwin, L. (Eds.), Communities of practice: critical perspectives (pp. 30–40). London: Routledge.
Mutch, A. (2003). Communities of Practice and Habitus: A Critique. Organization Studies, 24(3), 383–401. doi:10.1177/0170840603024003909
Orr, J. E. (1996). Talking about machines: an ethnography of a modern job. Ithaca: Cornell University Press.
Roberts, J. (2006). Limits to Communities of Practice. Journal of Management Studies, 43(3), 623–639. doi:10.1111/j.1467-6486.2006.00618.x
Soulier, E. (2004). Les Communautés de Pratique au Coeur de l’Organisation Réelle des Entreprises. Systèmes d’Information et Management, 9(1), 3–23.
Tusting, K. (2005). Language and power in communities of practice. In Barton, D., & Tusting, K. (Eds.), Beyond communities of practice (pp. 36–54). Cambridge: Cambridge University Press. doi:10.1017/CBO9780511610554.004
Wenger, E. (2000). Communities of practice: the key to knowledge strategy. In Lesser, E. L., Fontaine, M. A., & Slusher, J. A. (Eds.), Knowledge and communities (1st ed., Vol. 1, pp. 3–20). Boston: Butterworth-Heinemann. doi:10.1016/B978-0-7506-7293-1.50004-4
Wenger, E. (2000). Communities of practice and social learning systems. Organization, 7(2), 225–246. doi:10.1177/135050840072002
Wenger, E., & Snyder, W. M. (2000). Communities of practice: the organizational frontier. Harvard Business Review, 78(1), 139–145.
Wenger, E., White, N., Smith, J. D., & Rowe, K. (2005). Outiller sa communauté de pratique. In Langelier, L. (Ed.), Travailler, apprendre et collaborer en réseau. Guide de mise en place et d’animation de communautés de pratique intentionelles (pp. 47–66). (Langelier, L., Trans.). Quebec: CEFRIO.
Chapter 11
Exploring Virtual Communities with the Internet Community Text Analyzer (ICTA)
Anatoliy Gruzd
Dalhousie University, Canada
ABSTRACT
The chapter presents a new web-based system called ICTA (http://netlytic.org) for automated analysis and visualization of online conversations in virtual communities. ICTA is designed to help researchers and other interested parties derive wisdom from large datasets. The system does this by offering a set of text mining techniques coupled with useful visualizations. The first part of the chapter describes ICTA’s infrastructure and user interface. The second part discusses two social network discovery procedures used by ICTA with a particular focus on a novel content-based method called name networks. The main advantage of this method is that it can be used to transform even unstructured Internet data into social network data. With the social network data available it is much easier to analyze, and make judgments about, social connections in a virtual community.
DOI: 10.4018/978-1-60960-040-2.ch011
INTRODUCTION
In the age of cheap digital data storage, more and more online interactions among people are being captured and stored for posterity. This treasure trove of data represents a unique opportunity for social scientists and Internet researchers to study and better understand the inner workings of virtual communities. Researchers can now easily scrutinize these recorded interactions and answer questions like: how and why one virtual community emerges and another dies, how people agree on common practices and rules in a virtual community, and how they share knowledge and information among group members. Answers to these and other related questions will allow us to understand basic processes such as how people meet, communicate, and establish social relationships. They will also help practitioners to develop new technologies to better serve the information needs of different communities. For instance, social networking websites like Facebook and MySpace are good examples of how advancements in information technology can help people form and support a much larger number of online relationships than has ever been possible before.
However, this avalanche of online data can be overwhelming for both researchers and the public at large. Thus, it is not surprising that there is an increasing interest in the ability to automatically retrieve and analyze online data generated by virtual communities.
This chapter presents a web-based system for automated analysis and visualization of online conversations called the Internet Community Text Analyzer (ICTA). The system is available at http://netlytic.org. The main goal of ICTA is to automate the process of analyzing and visualizing text-based communal interactions and provide researchers and other interested parties with effective automated methods to study virtual communities. The system was primarily tested with online learning communities, but it can also be used to analyze a wide variety of other types of text-based virtual communities.
This chapter describes the development of ICTA, its infrastructure and user interface. In particular, it focuses on a new method called ‘name network’ that allows users to automatically extract social networks from text-based computer-mediated communication. Once discovered, social networks can provide researchers with an effective mechanism for studying collaborative processes in virtual communities such as shared knowledge construction, information sharing, influence, and support. In addition to being useful for researchers, social networks can also help web developers to improve online recommendation systems by analyzing the preferences of other users with similar interests (e.g., Amazon.com, Netflix.com) or provide new browsing capabilities for online information (e.g., Silobreaker.com). For example, with a social network representation of news it is now possible to trace explicit or implicit connections between events and individuals involved in the news (Pouliquen et al., 2007; Tanev, 2007). Information about online social networks can also provide a more secure and easier way to share private content with trusted individuals within the so-called Web of Trust (e.g., Golbeck, 2008; Matsuo et al., 2004). Finally, companies can use online social networks to recruit talented individuals (e.g., Leung, 2003), find experts (Ehrlich et al., 2007; Li et al., 2007), organize more effective viral marketing campaigns (e.g., Domingos, 2005), or build brand loyalty using customer networks (Thompson & Sinha, 2008).
RELATED WORK
Research on automated analysis and visualization of online conversations can be grouped according to the various types of computer-mediated communication (CMC) technology presently in use. Below is a brief overview of some of the research on the four most popular CMC and online media types: emails, online forums, blogs, and Twitter. This review is not meant to be exhaustive; its primary purpose is to provide a general overview of common methods used for analyzing and visualizing online conversations and to give the reader a starting point for further reading. Social networking (SN) websites such as Facebook and MySpace are not reviewed separately since these sites often utilize one or more of the popular CMC types that will be discussed below.
Emails
Social network analysis (SNA) is one of the most common methods for studying email-based interactions. Email data contains characteristics that naturally fit the network model; for example, senders and receivers form uniquely identifiable nodes within a network, and email traffic can be used to establish links between the nodes in the network. Among scholars who have used SNA to study email data are Diesner and Carley (2005) and Lim et al. (2007). Both of these research teams used SNA to discover and analyze communication networks from email exchanges among employees of Enron, a US energy company that collapsed in late 2001 amid massive accounting fraud. Their research demonstrated that the SNA method and the network representation of online conversations were very useful for tracking and detecting anomalies in the communication patterns of Enron’s employees during the collapse of the company.
In addition to using “who sends emails to whom” data, Venolia and Neustaedter (2003) also used temporal information to group email messages into conversations. By grouping long conversation threads together, they were able to create a new way to visualize email data and, in the process, made it much easier to comprehend and handle large email archives.
Another common method to study email-based interactions involves the use of clustering algorithms. For instance, Sudarsky and Hjelsvold (2002) applied a clustering algorithm to domain names found in email addresses to build a hierarchical view of email data combined with a temporal visualization. This additional way to visualize email data allowed researchers “to reduce the search space, and eliminate the difficult task of filing messages into folders” (p. 3). For more references on email research, see the detailed review by Ducheneaut and Watts (2005).
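The network model described at the start of this subsection (senders and receivers as nodes, email traffic as links) can be illustrated with a minimal Python sketch. It is not the procedure used in any of the cited studies; the sample messages, the `(sender, receivers)` record layout, and the function name `build_email_network` are assumptions made purely for illustration.

```python
from collections import Counter


def build_email_network(messages):
    """Count how many messages each sender addressed to each receiver.

    `messages` is an iterable of (sender, [receivers]) pairs; this record
    layout is an assumption made for the example.
    """
    links = Counter()
    for sender, receivers in messages:
        for receiver in receivers:
            if receiver != sender:  # ignore self-addressed copies
                links[(sender, receiver)] += 1
    return links


# Hypothetical sample data (not taken from any real archive).
messages = [
    ("ann@example.org", ["bob@example.org", "cat@example.org"]),
    ("bob@example.org", ["ann@example.org"]),
    ("ann@example.org", ["bob@example.org"]),
]

network = build_email_network(messages)

# Nodes are the unique addresses; links are weighted by message count.
nodes = sorted({person for link in network for person in link})
for (sender, receiver), weight in sorted(network.items()):
    print(f"{sender} -> {receiver}: {weight}")
```

The resulting weighted, directed edge list is the kind of input that general-purpose network analysis and visualization software typically accepts.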
Online Forums
Another popular CMC technology used by virtual communities is the online forum. Marc Smith and his colleagues have done extensive research in automated analysis and visualization of messages posted by members of different Usenet newsgroups, a type of online forum; a few of their publications in this area include Smith and Fiore (2001), Viégas and Smith (2004), and Welser et al. (2007). As part of their research, they proposed different ways to represent communication patterns among Usenet members as well as ways to represent the content of the members’ communication. For example, they visualized communication patterns in the form of thread trees and communication networks (based on “who replies to whom” data) and displayed popular Usenet groups using a hierarchical display of topical clusters.
Like Smith, Chin-Lung et al. (2002) also focused on the analysis of Usenet newsgroups. These researchers built social networks based on the “who replies to whom” data, and then used the discovered networks to study interactions among group members and to look for leading authors.
Lam and Donath (2005) are another group of researchers who have worked with Usenet data. Their research took a different path that involved the use of animation to represent social interactions in Usenet newsgroups. For example, in one of their novel interfaces for visualizing Usenet conversations, each thread is displayed as a moving square in the path of a sine wave. The speed represents how active the thread is, and the frequency and amplitude represent how recent the thread is. The main benefit of using animation is that it makes it possible to represent additional characteristics of a virtual community on one screen. The disadvantage is that animated visualization tends to be less self-explanatory and may require additional training for its users.
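To make the “who replies to whom” idea concrete, the following Python sketch derives a reply network from threaded posts and ranks authors by the number of replies they received. The `(post_id, author, parent_id)` layout and the sample posts are hypothetical, and counting replies received is only one assumed proxy for a "leading author"; the cited studies may rely on other measures.

```python
from collections import Counter


def reply_network(posts):
    """Return weighted (replier, original_author) links from threaded posts.

    `posts` is a list of (post_id, author, parent_id) tuples, where
    parent_id is None for a post that starts a thread (assumed layout).
    """
    author_of = {post_id: author for post_id, author, _ in posts}
    links = Counter()
    for _, author, parent_id in posts:
        parent_author = author_of.get(parent_id)
        # Skip thread starters, unknown parents, and self-replies.
        if parent_author is not None and parent_author != author:
            links[(author, parent_author)] += 1
    return links


def replies_received(links):
    """Total number of replies each author received."""
    received = Counter()
    for (_, target), weight in links.items():
        received[target] += weight
    return received


# Hypothetical thread: dana starts it, eli and fay reply, and so on.
posts = [
    (1, "dana", None),
    (2, "eli", 1),
    (3, "fay", 1),
    (4, "dana", 2),
    (5, "fay", 4),
]

links = reply_network(posts)
ranking = replies_received(links).most_common()
print(ranking)
```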
Blogs
Although blogs are not considered to be a traditional CMC technology, people do often use blogs to form and maintain explicit or implicit connections with others on the Internet (Blanchard, 2004; Dennen and Pashnyak, 2008). As with emails and forums, one of the most common ways of representing social interconnectivity in the blogosphere is network visualization. Network visualization has been used to represent links between different blogs (e.g., Chin and Chignell, 2007; Herring et al., 2005; Lin et al., 2007; Pikas, 2008), topics that are being discussed in the blogosphere (e.g., Tirapat et al., 2006) and, most recently, links between blog readers on a single blog (e.g., Gruzd, 2009b).
Some researchers did not rely on network visualization to analyze the blogosphere. Tseng et al. (2005) developed an interesting visualization called a mountain view for representing different topical communities among different blogs. Their visualization is a two-dimensional graph consisting of a series of peaks and valleys, where peaks represent authoritative blogs and valleys represent blog-connectors that tend to link to authoritative blogs. Indratmo et al. (2008) developed an interactive visualization called iBlogVis for browsing blog posts and comments on a single blog. In iBlogVis, a blog entry is displayed as a diamond shape and a line; “[t]he diamond shape provides an interface to view the content of an entry, while the length of a line represents the number of characters in an entry” (p. 41).
Finally, there has also been some work on text summarization of blog posts and comments. Recent research in this direction includes that of Asbagh et al. (2009) and Hu et al. (2007). Their approaches attempt to find the main topics of a blog based on sentences extracted from blog entries and/or comments posted by blog readers.
Twitter
Most recently, research on virtual communities has turned to Twitter, a popular new micro-blogging platform for people to share short messages or ‘tweets’ (no more than 140 characters long) about what they are currently doing. Twitter users can also read or ‘follow’ other people’s tweets. Like blogs, Twitter was originally designed as a one-way communication medium. Recent research on Twitter suggests that users of this platform are also using it to carry out in-depth conversations and to maintain online relationships (boyd et al., 2009; Gruzd et al., 2009; Wellman et al., 2009).
Recently, two separate studies about Twitter were conducted by Huberman et al. (2009) and Honeycutt and Herring (2009). Huberman et al. (2009) used automated analysis and network visualization techniques to build and compare two different types of social networks found on Twitter: friends (defined as “who replies to whom”) and followers (defined as “who follows whom”). They found that “Twitter users have a very small number of friends compared to the number of followers” (n.p.). In Honeycutt and Herring (2009), so-called VisualDTA diagrams were used to plot the evolution of topics in a conversation over time. This was done to determine whether people use Twitter to carry on a conversation with others. The researchers found that although Twitter was not originally designed for collaborative work and conversations, these types of interaction can and do happen there.
In addition to the academic studies highlighted above, many web developers have also devised novel visualization techniques to explore the mountains of textual data being created by Twitter users. One type of visualization that has caught on with developers in the Twitter information space is geo visualization. These geo visualizations usually involve plotting tweets on a geographic map (usually using Google Maps) in real time. Some examples include Trendsmap (http://trendsmap.com), GeoMe (http://www.geome.me), and GeoChirp (http://www.geochirp.com). These systems analyze users’ posts from Twitter to discover popular topics in the form of keywords and then display those keywords on a map based on where those keywords were originally posted. Another popular class of visualizations attempts to summarize the activities of a specific user on Twitter. For example, Twitter-Friends (http://twitter-friends.com) visualizes some general statistics about a particular user, such as the overlap of outgoing and incoming connections, the number of replies received per day, and so on. The site can also build and visualize a network of who follows whom on Twitter. Other similar websites like Twitalyzer (http://www.twitalyzer.com) and Tweetypants (http://tweetypants.com) can calculate users’ influence on Twitter by using automated text analysis of their recent tweets.
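The friends/followers comparison reported above for Huberman et al. (2009) can be sketched in a few lines of Python. The sketch follows the definitions as stated in this chapter (friends from “who replies to whom”, followers from “who follows whom”); the edge lists, user names, and the function name `friends_and_followers` are hypothetical, and the original study's exact operationalization may differ.

```python
from collections import defaultdict


def friends_and_followers(replies, follows):
    """Return {user: (number_of_friends, number_of_followers)}.

    `replies` holds (author, addressee) pairs and `follows` holds
    (follower, followed) pairs; both layouts are assumptions.
    """
    friends = defaultdict(set)    # user -> people the user replied to
    followers = defaultdict(set)  # user -> people who follow the user
    for author, addressee in replies:
        friends[author].add(addressee)
    for follower, followed in follows:
        followers[followed].add(follower)
    users = set(friends) | set(followers)
    return {u: (len(friends[u]), len(followers[u])) for u in sorted(users)}


# Hypothetical data: repeated replies to the same person count once.
replies = [("ana", "ben"), ("ana", "ben"), ("ben", "ana"), ("cy", "ana")]
follows = [("ben", "ana"), ("cy", "ana"), ("dee", "ana"), ("ana", "ben")]

summary = friends_and_followers(replies, follows)
for user, (n_friends, n_followers) in summary.items():
    print(f"{user}: {n_friends} friends, {n_followers} followers")
```

Comparing the two counts per user is one simple way to see the gap between the sparse network of actual interaction and the denser network of declared followers.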
Thus far, research in this area has yielded a number of helpful visualization techniques for studying virtual communities. However, the techniques discussed above tend to be CMC-specific: they do not allow researchers to ‘plug in’ their own conversational data for analysis and visualization. The author of this chapter is attempting to fill this need by developing the ICTA web tool, which incorporates techniques from both content analysis (CA) and social network analysis (SNA) for automated analysis and visualization of online text-based communication.
A few existing projects on the Internet should be mentioned here. These projects broadly share some of the functionalities of ICTA; however, for the most part they are designed for other fields and have different implementations and goals in mind. Examples include visualization tools like Swivel¹ and IBM’s Many Eyes², which allow anybody to upload data and then visualize it by selecting one of the available visualizations such as graphs, charts, and histograms. There are some major differences between these pure visualization tools and ICTA. First, these tools are not tuned to work with CMC-type data. They mostly work with data that is already organized in a table format, such as a table of the top 50 US companies that made the most money in 2008 and their corresponding revenues. Second, these online tools provide only top-level visualizations without interactive features that would allow researchers to explore and delve into their datasets at different levels of granularity. Finally, the visualization tools mentioned here lack some basic security features. Most researchers work with private datasets; at the very least, they want some control over who can access their dataset and view the results. In sum, Swivel and IBM’s Many Eyes are easy to use and good for the public since they allow for basic visualizations, sharing, and discussion over the Internet. But for the reasons mentioned above, these tools are not satisfactory for researchers, whose needs are very different.
The rest of this chapter explains how to study recorded online conversations of virtual communities using ICTA, followed by a detailed description of the social network discovery techniques available in ICTA.
INTERNET COMMUNITY TEXT ANALYZER (ICTA) Background This project started in 2006 as a way to make sense of a large archive of bulletin board postings from eight online classes collected over a period of four years. Each class in this archive generated on average of about 1500 postings. However, aside from manually reading each of the postings, there were few other options for analyzing such a large amount of data. The first version of ICTA v1.0 was developed and presented at the Communities and Technologies conference in 2007. ICTA v1.0 facilitated searching the text from these eight classes. The main screen of this tool provided the user with a means to select the class and bulletin board(s) to be analyzed. During the analysis the system generated a tag cloud to show popular topics (nouns and noun phrases) in the selected bulletin boards (see Figure 1). The size of a topic within the cloud correlated with its frequency count; the higher the frequency, the larger the word would appear. By clicking on any topic, ICTA v1.0 returned a list of all instances where that particular topic was located within the dataset (see Figure 2). And by clicking on any of the instances the user could see the full posting. Alternatively, the user could also search by simply typing a desired term into a text box, ICTA then returned a list of all instances where that particular term was found (if it was present in the dataset). The 2006-2007 LEEP language study conducted together with Professor Caroline Haythornthwaite (Haythornthwaite & Gruzd, 2007) showed that this system was useful in the preliminary exploration of large datasets and in the
209
Exploring Virtual Communities with the Internet Community Text Analyzer (ICTA)
Figure 1. An interactive tag cloud showing the top 30 (user adjustable) topics automatically extracted from all messages posted in July 2009 to the Internet Researchers’ listserv
identification of important topics being discussed by group members and their changes over time. This first study with ICTA also led to two important improvements. First, ICTA v1.0 did not have capabilities to upload a new dataset. As a result the first improvement to the next version of ICTA was to add an interface where anybody can create their own account and upload their own dataset for further analysis with ICTA. Second, ICTA v1.0 primarily focused on the text analysis of online interactions. Although useful, text analysis alone does not provide a complete picture of an online community. It does not take into account relationships between group members that may also provide important insights into the internal
operation of an online community. For example, using a simple automated text analysis we can easily tell that there are many disagreement-type postings in a particular dataset; however, this information alone does not tell us whether the postings are coming from just a few members who tend to disagree with each other or are a general characteristic of this particular community as a whole. To increase the range of research questions that a researcher could address with ICTA, a social network discovery and visualization component was added to the system. This new component in ICTA can use both traffic-based (who talks to whom) and content-based data to automatically extract social network information and offer visual representations of the analysis.
Figure 2. A web interface within ICTA that shows all instances where the word “Wikipedia” was located within the sample dataset
Figure 3. A visualization feature (stacked graph) within ICTA showing the use of important topics over time
User Interface
Below is a brief description of ICTA’s interface and functionalities. A user starts by importing a dataset, which is done by uploading a file or by specifying the location of an external repository. Currently ICTA can parse text-based interactions stored in one of three formats: XML (e.g., RSS feeds), a MySQL database, or a Comma Separated Values (CSV)³ text file. After the data is imported, the second step is to remove any text that may be considered noise. This optional step is primarily designed to remove redundant or duplicate text that has been carried forward from prior messages. To accomplish this, ICTA removes all lines that start with a symbol commonly indicating quotation, such as “>” or “:”. A user is not restricted to these two symbols, however. In the ‘expert’ mode, it is possible to remove almost any text pattern, such as URLs or email addresses, from messages using regular expressions⁴. After the data importing and cleansing steps are completed, the data is ready to be analyzed. In this stage, ICTA uses the capabilities of ICTA v1.0 described above to build concise summaries of the communal textual discourse. This is done by extracting the most descriptive terms (usually nouns and noun phrases) and presenting them in the form of interactive concept clouds or stacked graphs that show the use of important topics over time (see Figure 3). Another feature available in the ‘text analysis’ step is to define different groups or categories of words/phrases/patterns (so-called linguistic markers), count how many instances of each category are in a dataset, and then display the counts in the form of a treemap view (see Figure 4). Using this functionality a researcher can, for example, define and use categories consisting of linguistic markers that have been shown to be useful in identifying instances of social, cognitive, and/or meta-cognitive processes such as decision-making, problem-solving, question-answering, etc. (see for example, Alpers et al., 2005; Corich et al., 2006; Pennebaker & Graybeal, 2001). For demonstration purposes, ICTA comes with several commonly used categories such as ‘agreement’, ‘disagreement’, ‘uncertainty’, ‘social presence’, etc. Users can also modify the existing categories to better reflect their research questions or create their own categories.
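The cleansing and category-counting steps described above can be illustrated with a minimal Python sketch. It is not ICTA's actual code: the function names, the URL and email patterns, and the marker lists are assumptions made for illustration, and only the default quotation symbols (“>” and “:”) come from the text.

```python
import re
from collections import Counter

# Quotation prefixes named in the chapter as the defaults.
QUOTE_PREFIXES = (">", ":")

# Hypothetical 'expert mode' patterns for URLs and email addresses.
URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def clean_message(text, extra_patterns=(URL_RE, EMAIL_RE)):
    """Drop quoted lines, then delete any text matching the extra patterns."""
    kept = [line for line in text.splitlines()
            if not line.lstrip().startswith(QUOTE_PREFIXES)]
    cleaned = "\n".join(kept)
    for pattern in extra_patterns:
        cleaned = pattern.sub("", cleaned)
    return cleaned


# Illustrative marker lists only; these are not ICTA's shipped categories.
CATEGORIES = {
    "agreement": ["i agree", "good point", "exactly"],
    "disagreement": ["i disagree", "not sure that"],
    "uncertainty": ["maybe", "perhaps", "i wonder"],
}


def count_categories(messages, categories=CATEGORIES):
    """Count whole-word occurrences of each category's markers."""
    counts = Counter()
    for message in messages:
        lowered = message.lower()
        for name, markers in categories.items():
            for marker in markers:
                pattern = r"\b" + re.escape(marker) + r"\b"
                counts[name] += len(re.findall(pattern, lowered))
    return counts
```

The resulting per-category counts are the kind of data a treemap view would display; a researcher would replace the marker lists with categories suited to the research question.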
Figure 4. A visualization feature (treemap) within ICTA of various predefined social, cognitive, and meta-cognitive categories found in the sample dataset
The final stage of analysis and the focal point of this chapter is the ‘network analysis’ step, which includes the procedures for building so-called chain networks and name networks (described in more detail in the next section). When building these networks from a CMC-type dataset, there are many parameters and thresholds to choose from. ICTA’s interface allows users to fine-tune many of them. After networks are built, they can be visualized and explored using a built-in network visualization tool. Users also have the option of exporting the resulting networks to other popular social network analysis programs such as ORA⁵, Pajek⁶, UCINET⁷, or NodeXL⁸. In addition to basic visualization features such as scaling and selecting cutoff points to hide ‘weak’ nodes or ties⁹, ICTA can also display excerpts from messages exchanged between two individuals to show the context of their relations. The ability to call up and display excerpts from messages makes it much easier to ‘read’ a network and understand why a particular tie exists. This feature is activated by moving the mouse over an edge connecting two nodes (see Figure 5).
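As a rough illustration of the cutoff and export features, the sketch below hides ties under a chosen weight and writes the remainder in Pajek's plain-text .net layout (a *Vertices section followed by *Arcs). The function names and the dictionary representation of a network are assumptions for this example; the chapter does not describe how ICTA implements its export.

```python
def filter_weak_ties(ties, min_weight):
    """Keep only ties whose weight meets the cutoff."""
    return {pair: w for pair, w in ties.items() if w >= min_weight}


def to_pajek(ties):
    """Serialize {(source, target): weight} as Pajek .net text (directed arcs)."""
    nodes = sorted({name for pair in ties for name in pair})
    # Pajek numbers vertices from 1.
    index = {name: i for i, name in enumerate(nodes, start=1)}
    lines = [f"*Vertices {len(nodes)}"]
    lines += [f'{index[name]} "{name}"' for name in nodes]
    lines.append("*Arcs")
    lines += [f"{index[s]} {index[t]} {w}" for (s, t), w in sorted(ties.items())]
    return "\n".join(lines)
```

For example, `to_pajek(filter_weak_ties(ties, 2))` keeps only ties of weight 2 or more, and nodes left without any tie drop out of the vertex list as well.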
AUTOMATED DISCOVERY OF SOCIAL NETWORKS IN ICTA
This section presents two automated procedures used in ICTA for the discovery of social networks from text-based online interactions in a virtual community, focusing on the ‘name network’ method,
Figure 5. ICTA’s social network visualization showing connections among top participants on the Internet Researchers’ listserv in July 2009
a key feature in ICTA. The section describes the procedures in the context of threaded discussions because of their wide acceptance and use by virtual communities. However, ICTA can also analyze other types of online data such as non-threaded discussion lists, chats, and blogs.
Chain Network Method
Chain networks are built automatically using information from the posting headers, specifically reference chains. A reference chain is a running list of group members who previously posted to a particular discussion thread. ICTA provides four distinct options for building chain networks:
• Option 1: Connecting a poster to the last person in the reference chain only
• Option 2: Connecting a poster to the last and first (thread starter) person in the chain, and assigning equal weight values of 1 to both ties
• Option 3: Same as option 2, but the tie between a poster and the first person is assigned only half the weight (0.5)
• Option 4: Connecting a poster to all people in the reference chain with decreasing weights
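The four options can be sketched in Python as follows. This is an illustrative reading rather than ICTA's implementation. The function name is hypothetical, the reference chain is assumed to be ordered with the thread starter first and the most recent poster last, and the decay used for option 4 (1, 1/2, 1/3, ...) is an assumption because the text says only that the weights decrease. When the chain holds a single person, options 2 and 3 are assumed to produce one tie of weight 1.

```python
from collections import defaultdict


def chain_ties(poster, reference_chain, option=1):
    """Return {(poster, addressee): weight} for a single posting.

    reference_chain is assumed to list earlier posters with the thread
    starter first and the most recent poster last.
    """
    ties = defaultdict(float)
    if not reference_chain:
        return {}  # first post in a thread: nothing to connect to
    first, last = reference_chain[0], reference_chain[-1]
    if option == 1:
        ties[(poster, last)] += 1.0
    elif option in (2, 3):
        ties[(poster, last)] += 1.0
        if first != last:
            # Option 2 weights the thread starter fully, option 3 by half.
            ties[(poster, first)] += 1.0 if option == 2 else 0.5
    elif option == 4:
        # Assumed decay: 1, 1/2, 1/3, ... moving back from the latest poster.
        for distance, person in enumerate(reversed(reference_chain), start=1):
            ties[(poster, person)] += 1.0 / distance
    else:
        raise ValueError("option must be 1, 2, 3, or 4")
    ties.pop((poster, poster), None)  # ignore self-ties
    return dict(ties)
```

Summing the returned dictionaries over all postings in a dataset gives a weighted chain network under the chosen option.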
The ‘chain network’ method operates under the assumption that the reference chain may reveal the addressee(s). More specifically, it is usually assumed that a poster is replying to the immediately previous poster in the reference chain (a variation on this method is often used with email-type data; see, for example, Hogan, 2008). Unfortunately, this assumption is not always true in highly active, argumentative, and/or collaborative communities such as online classes, or where many discussion topics may be in play at one time. Furthermore, an individual may seem to respond to one post but in the text refer to several others, synthesizing and bringing together comments of many posters. So, although the ‘chain network’ method provides some approximation of ‘who talks to whom’ data, such approximation is not very accurate. In the examples below, if we were to rely on just the chain network to discover ties, we would miss some important connections. In Example 1, the chain network finds only one connection, between Sam and Gabriel. But there are actually four possible connections with Sam. This is because, except for Gabriel, the addressees in the sample message (Nick, Ann, and Gina) were not among the people who had previously posted to the thread. In Example 2, Fred is the first person who posted to the thread, thus the reference chain is empty. As a result, the ‘chain network’ method finds no connections in this posting. However, upon closer examination there is actually one potential connection between the poster Fred and a person named Dan, who has not posted to the current thread.
Example 1:
FROM: Sam
REFERENCE CHAIN: Gabriel
Nick, Ann, Gina, Gabriel: I apologize for not backing this up with a good source, but I know from reading about this topic that libraries […]
Example 2:
FROM: Fred
REFERENCE CHAIN: <empty>
I wonder if that could be why other libraries around the world have resisted changing – it’s too much work, and as Dan pointed out, too expensive.
As an alternative and/or complement to the ‘chain network’ method, ICTA provides access to another method called ‘name network’ (developed by the author of this chapter). The ‘name network’ method discussed in the following section overcomes some of the inherent weaknesses of the ‘chain network’ method.
Name Network Method
This section outlines the implementation of the ‘name network’ method and highlights some of the manual override features available in ICTA that researchers can use to further improve the accuracy of the resulting social network analysis. The ‘name network’ method builds on best practices from the literature on computational linguistics. A detailed literature review of these practices is available in Gruzd & Haythornthwaite (in press). In general, the ‘name network’ method consists of two main steps: node discovery and tie discovery.
Node Discovery
During this step all references to people in the messages, such as names, pronouns, and email addresses, are identified. Currently there are many software packages that can perform this task¹⁰. However, many of them are weak in terms of execution speed and/or accuracy. Furthermore, these packages are often trained on documents from newspaper or medical domains, which tend to have more formal and standardized spellings, capitalizations, and grammar rules. As a result, these packages are unsuitable for working with CMC-type data, which are filled with idiosyncratic spellings, capitalizations, and grammar. To address these limitations, a hybrid approach to personal name discovery was used. The approach attempts to satisfy two criteria: (1) to process messages in real time and (2) to understand informal online texts. The algorithm works as follows. First, it removes ‘stop-words’, such as and, the, to, of, etc. Second, the algorithm normalizes all remaining words by stripping all special symbols from the beginning and end of any word, including possessives (e.g., @Dan or Dan’s becomes Dan).
For all remaining words, to determine whether a word is a personal name, the algorithm relies on a dictionary of names and a set of manually derived general linguistic rules. To find first names, the procedure uses a dictionary containing over 5,000 of the most frequently used first names in the United States as reported by the US Census¹¹. If a capitalized word is found right after a first name, this word is classified as a middle or last name. ICTA also relies on an additional source of personal names: the ‘From’ field in the message header (e.g., Culotta et al., 2004). In addition to a poster’s email address, the ‘From’ field sometimes includes his/her name enclosed in parentheses. To recognize names in the ‘From’ field, the algorithm uses a simple string matching pattern that looks only for words found within the round brackets (if any). For example, “agruzd@gmail.com (Anatoliy)” will produce Anatoliy. To recognize names that are not likely to be found in the dictionary, such as nicknames, abbreviated names, and unconventional names (for example, AG or AnatoliyG), the algorithm relies on context words that usually indicate personal names, such as titles (e.g., Professor, Major, Ms.) and greetings (e.g., Hi or Dear). To exclude personal names that are part of a building or organization name, such as the Ronald Reagan Presidential Library, the algorithm first ignores all sequences of more than three capitalized words, and second removes phrases in which the last word is found in a pre-compiled list of “prohibited” words such as Street or Ave. While the algorithm described above is very thorough, it is not yet capable of achieving 100% accuracy. At this point in the process, incorrectly spelled names may be missed and some false-positive words may still be on the list.
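The rules above can be approximated in a few lines of Python. The sketch below is a simplified reading, not the ICTA algorithm itself. The word lists are tiny stand-ins for the stop-word list, the Census-derived dictionary, and the “prohibited” list; the function names are hypothetical; and ending a run of capitalized words at punctuation is an added assumption.

```python
import re

# Tiny illustrative stand-ins for the real word lists.
STOP_WORDS = {"and", "the", "to", "of", "a", "in", "i", "for", "is"}
FIRST_NAMES = {"dan", "wilma", "sam", "nick", "ann", "gina", "ronald"}
CONTEXT_WORDS = {"professor", "major", "ms", "hi", "dear"}
PROHIBITED_LAST = {"street", "ave"}


def normalize(token):
    """Strip surrounding symbols and a trailing possessive."""
    token = re.sub(r"^\W+|\W+$", "", token)
    return re.sub(r"['’]s$", "", token)


def name_from_header(from_field):
    """Return the name given in round brackets in a 'From' field, if any."""
    match = re.search(r"\(([^)]+)\)", from_field)
    return match.group(1).strip() if match else None


def capitalized_runs(text):
    """Yield runs of consecutive capitalized words; punctuation ends a run."""
    run = []
    for raw in text.split():
        word = normalize(raw)
        if word and word[0].isupper() and word.lower() not in STOP_WORDS:
            run.append(word)
            if raw[-1] in ",.;:!?":
                yield run
                run = []
        else:
            if run:
                yield run
            run = []
    if run:
        yield run


def extract_names(text):
    """Apply the dictionary and context-word rules to each capitalized run."""
    names = []
    for run in capitalized_runs(text):
        if len(run) > 3 or run[-1].lower() in PROHIBITED_LAST:
            continue  # likely a building or organization name
        i = 0
        while i < len(run):
            word = run[i].lower()
            if word in CONTEXT_WORDS:
                # A title or greeting: treat the following word as a name.
                if i + 1 < len(run):
                    names.append(run[i + 1])
                i += 2
            elif word in FIRST_NAMES:
                nxt = run[i + 1] if i + 1 < len(run) else None
                if nxt and nxt.lower() not in FIRST_NAMES | CONTEXT_WORDS:
                    # Capitalized word right after a first name:
                    # taken as a middle or last name.
                    names.append(f"{run[i]} {nxt}")
                    i += 2
                else:
                    names.append(run[i])
                    i += 1
            else:
                i += 1
    return names
```

In this sketch a greeting such as “Hi AG” yields AG even though AG is not in the dictionary, while “Ronald Reagan Presidential Library” is skipped because the run contains more than three capitalized words.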
However, since accurate name extraction is a vital foundational building block in automated inference of social networks, the accuracy level should be as close to 100% as possible. To bridge the gap between automated name discovery and
Figure 6. A web interface within ICTA for editing extracted names: Top 30 names automatically extracted from the Internet Researchers’ listserv for messages posted during July 2009
the desired 100% level of accuracy in the final list of names, ICTA allows researchers to manually review and edit the list of extracted names (see Figure 6). This allows those with knowledge of the group members (e.g., knowing nicknames and full names) and/or of the subject matter (knowing that references to Reagan may be to a library or airport in the current context) to fine-tune the name list. The end result of this semi-automated name extraction procedure is a highly accurate list consisting of all occurrences of personal names in the postings.
Tie Discovery
After all network nodes (the previously extracted personal names) are identified, the next step is to uncover if and how these nodes are interconnected. The algorithm relies on the content of messages to infer ties between people. It works under the assumption that the chance of two people sharing a social tie is proportional to the number of times each of them mentions the other in his/her postings, either as an addressee or as a subject person. To quantify this assumption, the algorithm adds a nominal weight of 1 to a tie between a poster and each name found in a posting. The sample posting below demonstrates how the algorithm works:
From: wilma@bedrock.us
Reference Chain: [email protected], [email protected]
Hi Dustin, Sam and all, I appreciate your posts from this and last week […]. I keep thinking of poor Charlie who only wanted information on “dogs”. […] Cheers, Wilma.
As indicated in the header, this posting is from wilma@bedrock.us, and it is a reply to the post by [email protected]. And [email protected] is the person who actually started the thread. There are four names in the posting: Dustin, Sam, Charlie, and Wilma. According to the algorithm, there will be connections between the poster wilma@bedrock.us and each name in the posting:
wilma@bedrock.us – Dustin
wilma@bedrock.us – Sam
wilma@bedrock.us – Charlie
wilma@bedrock.us – Wilma
There are a few problems with this initial approach. First, Wilma is the poster, so there is no need for the wilma@bedrock.us – Wilma connection. Second, what happens if more than one person has the same name? For example, if there is more than one Sam in the group, how would we know which Sam is mentioned in this posting? Conversely, there could be situations where many different names belong to one person. Furthermore, in the example above, Charlie is not even a group member; he is just an imaginary user. Ideally, the poster should not be connected to Charlie. To address these problems, ICTA uses
a name alias resolution algorithm. To disambiguate name aliases, the algorithm adopts a simple but effective approach that relies on associating names in the postings with email addresses in the corresponding posting headers (hereafter referred to as name-email associations). By learning name-email associations, the algorithm knows that there are, for example, two Nicks because of the existence of two associations for Nick with two different email addresses. Finally, to achieve the highest level of accuracy on this task, a semi-automated approach was adopted. A web interface was developed to allow manual correction of the extracted associations (see Figure 7). For each email address that has at least one name associated with it, ICTA displays a list of possible aliases sorted by their confidence levels. Using this interface, a researcher can easily remove an association or add a new name-email association by selecting a name from a drop-down menu listing all names found in the dataset. After learning all possible name-email associations and their overall confidence levels, the algorithm goes through all postings once again and replaces each name mentioned in the body of a posting that has been associated with at least one email. If a name has more than one email candidate, then the algorithm uses the email with the highest level of confidence. However, in some cases selecting the email with the highest level of confidence may produce an incorrect result. For example, in one of the sample datasets, there were two Wilmas: wilma1@email.com with the confidence level set to 27.45, and [email protected] with the confidence level set to 18.83. If we were to select the email with the highest confidence level, then all mentions of Wilma in all postings would be attributed only to wilma1@email.com. But this would be wrong, since in some instances it might be [email protected]. To ensure that the algorithm identifies the right Wilma, the following fail-safe measure was implemented. If there is more than one email candidate, the algorithm relies on an additional source of evidence: the reference chain. First, it identifies the overlap between the email candidates for a name and the emails in the reference chain. If the overlap is empty, then the algorithm proceeds as usual and uses the email with the highest confidence level (hereafter referred to as the strongest candidate). When the overlap is not empty, it means that one or more email candidates have previously posted to the thread. Based on the manual analysis of the sample dataset, the name mentioned in the posting is more likely to belong
Figure 7. A web interface within ICTA for manual alias resolution
to an email candidate that is also in the reference chain than to an email candidate that is not. Taking this observation into consideration, if there are two possible email candidates, as in the case of Wilma, and the strongest candidate (wilma1@email.com) is not present in the reference chain, but the other candidate ([email protected]) is, then the algorithm uses the one that is also in the reference chain. In cases where both email candidates have previously posted to the thread, the algorithm takes the candidate who posted to the thread most recently. This concludes the description of the ‘name network’ algorithm, which consists of two main steps: node discovery and tie discovery.
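The alias resolution rules and the weight-of-1 tie counting described in this section can be put together in a short Python sketch. It is an interpretation under stated assumptions, not ICTA's code. The function names and data layout are hypothetical, the reference chain is assumed to list earlier posters with the most recent last, and names with no name-email association (such as Charlie) are assumed to be dropped.

```python
from collections import defaultdict


def resolve_alias(candidates, reference_chain):
    """Pick the email that a mentioned name most likely refers to.

    candidates maps email -> confidence level for one name;
    reference_chain lists emails of earlier posters, most recent last.
    """
    if not candidates:
        return None
    in_chain = [email for email in reference_chain if email in candidates]
    if in_chain:
        # Prefer a candidate who already posted to the thread;
        # if several did, take the one who posted most recently.
        return in_chain[-1]
    # Otherwise fall back to the strongest candidate.
    return max(candidates, key=candidates.get)


def name_network(postings, associations):
    """Add weight 1 per (poster, mentioned person) pair for each posting."""
    ties = defaultdict(int)
    for post in postings:
        for name in post["names"]:
            email = resolve_alias(associations.get(name, {}), post["chain"])
            if email is None or email == post["from"]:
                continue  # unknown person or self-mention: no tie
            ties[(post["from"], email)] += 1
    return dict(ties)
```

With the two Wilma candidates from the text (confidence 27.45 and 18.83), `resolve_alias` returns the 27.45 candidate when neither has posted to the thread and switches to the weaker candidate when only that one appears in the reference chain.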
CONCLUSION AND FUTURE RESEARCH
The chapter presented a new web-based system called ICTA (http://netlytic.org) for automated analysis of text-based interactions in virtual communities. ICTA is designed to help researchers and other interested parties derive wisdom from large datasets. The system does this by offering a set of text mining techniques coupled with useful visualizations. ICTA’s content and network analysis procedures help users automatically discover who talks to whom in a virtual community and what they are talking about. ICTA’s visualization component also enables users to explore the resulting data using visualization techniques such as stacked charts, treemaps, and networks. After describing ICTA’s infrastructure and user interface, the second part of the chapter discussed two social network discovery procedures used by ICTA, with a particular focus on a novel content-based method called name networks. The main advantage of this method is that it can be used to transform even unstructured Internet data into social network data. With the social network data available, it is much easier to analyze and make judgments about social connections between community members. ICTA in general and the ‘name network’ method specifically have already proven to be useful in studying virtual learning communities (Gruzd, 2009a; Haythornthwaite & Gruzd, 2007) and are currently being tested in the context of virtual communities among blog readers (Gruzd, 2009b). Future research will include the evaluation of ICTA and the ‘name network’ method on larger datasets from other domains such as online health support groups, communities of political bloggers, communities of YouTube contributors, and 3-dimensional virtual communities on Second Life. To effectively analyze datasets from such diverse domains, the name alias algorithm at the heart of the ‘name network’ method will require additional modifications. For example, currently the ‘name network’ method resolves name aliases by assuming that each group member uses only one unique email address to post messages to a group. However, this is not always the case in open virtual communities. To make the ‘name network’ method more useful, it needs to do more than just automatically identify that two people are connected. It should also find how they are connected, what types of social relations they share, and what roles they have in a group. This means that future work will need to incorporate techniques for automated role and relationship identification. Some initial work in this direction has been done by Matsuo et al. (2007) and Mori et al. (2005) on web pages, and by Diehl et al. (2007), Carvalho et al. (2007), and McCallum et al. (2005) on email datasets. Finally, many virtual communities now use multiple types of electronic communication such as forums, chats, and wikis to carry on their discussions. It is important to know how we can capture and combine network information from these various data streams to build a more comprehensive view of a virtual community.
Thus, future work will include devising and evaluating methods for collecting and combining evidence
from multiple data sources to build the name network. Some of the challenges here include matching the names that people use across different communication media. For example, an algorithm needs to know that ‘AnneT’ on the bulletin boards is the same person as ‘Anne2’ in the chat room and ‘Anne Tolkin’ on a wiki page.
ACKNOWLEDGMENT
This work was supported by a Social Sciences and Humanities Research Council (SSHRC) grant.
REFERENCES
Alpers, G. W., Winzelberg, A. J., Classen, C., Roberts, H., Dev, P., Koopman, C., & Barr Taylor, C. (2005). Evaluation of Computerized Text Analysis in an Internet Breast Cancer Support Group. Computers in Human Behavior, 21(2), 361–376. doi:10.1016/j.chb.2004.02.008
Asbagh, M., Sayyadi, M., & Abolhassani, H. (2009). Blog Summarization for Blog Mining. In Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (pp. 157-167). Berlin, Heidelberg: Springer.
Blanchard, A. (2004). Blogs as Virtual Communities: Identifying a Sense of Community in the Julie/Julia Project. In Gurak, S. A. L., Johnson, L., Ratliff, C., & Reyman, J. (Eds.), Into the Blogosphere: Rhetoric, Community, and Culture of Weblogs. University of Minnesota.
Boyd, D., Golder, S., & Lotan, G. (2010). Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter. In Proceedings of the 43rd Hawaii International Conference on System Sciences, Hawaii, USA.
Carvalho, V. R., Wu, W., & Cohen, W. W. (2007). Discovering Leadership Roles in Email Workgroups. In Proceedings of 4th Conference on Email and Anti-Spam (Mountain View, CA).
Chin, A., & Chignell, M. (2007). Identifying Communities in Blogs: Roles for Social Network Analysis and Survey Instruments. International Journal of Web Based Communities, 3(3), 343–365. doi:10.1504/IJWBC.2007.014243
Chin-Lung, C., Ding-Yi, C., & Tyng-Ruey, C. (2002). Browsing Newsgroups with a Social Network Analyzer. In Proceedings of the 6th International Conference on Information Visualisation (pp. 750-755), London, England.
Corich, S., Kinshuk, & Hunt, L. M. (2006). Measuring Critical Thinking within Discussion Forums Using a Computerised Content Analysis Tool. In Proceedings of the 5th International Conference on Networked Learning 2006, Lancaster, UK.
Culotta, A., Bekkerman, R., & McCallum, A. (2004). Extracting Social Networks and Contact Information from Email and the Web. In Proceedings of the 1st Conference on Email and Anti-Spam, Mountain View, CA, USA.
Dennen, V. P., & Pashnyak, T. G. (2008). Finding Community in the Comments: The Role of Reader and Blogger Responses in a Weblog Community of Practice. International Journal of Web Based Communities, 4(3), 272–283. doi:10.1504/IJWBC.2008.019189
Diehl, C., Namata, G., & Getoor, L. (2007). Relationship Identification for Social Network Discovery. In Proceedings of the 22nd National Conference on Artificial Intelligence (pp. 546-552). Menlo Park, CA: AAAI Press.
Diesner, J., & Carley, K. M. (2005). Exploration of Communication Networks from the Enron Email Corpus. In Proceedings of the 2005 SIAM Workshop on Link Analysis, Counterterrorism and Security (pp. 3-14), Newport Beach, CA, USA.
Domingos, P. (2005). Mining Social Networks for Viral Marketing. IEEE Intelligent Systems, 20(1), 80–82.
Ducheneaut, N., & Watts, L. A. (2005). In Search of Coherence: A Review of E-Mail Research. Human-Computer Interaction, 20(1/2), 11–48. doi:10.1207/s15327051hci2001&2_2
Ehrlich, K., Lin, C.-Y., & Griffiths-Fisher, V. (2007). Searching for Experts in the Enterprise: Combining Text and Social Network Analysis. In Proceedings of the 2007 International ACM Conference on Supporting Group Work, Sanibel Island, Florida, USA.
Golbeck, J. (2008). Trust and Nuanced Profile Similarity in Online Social Networks. ACM Transactions on the Web, 3(4), Article No. 12.
Gruzd, A. (2009a). Studying Collaborative Learning using Name Networks. Journal of Education for Library and Information Science (JELIS), 50(4).
Gruzd, A. (2009b). Automated Discovery of Emerging Online Communities Among Blog Readers: A Case Study of a Canadian Real Estate Blog. In Proceedings of the 10th Annual Conference of the Association of Internet Researchers (AoIR), Milwaukee, WI, USA.
Gruzd, A., & Haythornthwaite, C. (in press). Networking Online: Cyber Communities. In Scott, J., & Carrington, P. (Eds.), Handbook of Social Network Analysis. London: Sage.
Gruzd, A., Takhteyev, Y., & Wellman, B. (2009). A Tweetise on Twitter: Networked Individualism Online. Thematic Session on Imagined Communities in the 21st Century. San Francisco, CA, USA: American Sociological Association.
Haythornthwaite, C., & Gruzd, A. A. (2007). A Noun Phrase Analysis Tool for Mining Online Community. In C. Steinfield, B. Pentland, M. Ackerman, & N. Contractor (Eds.), Communities and Technologies 2007 (pp. 67–86). Springer. doi:10.1007/978-1-84628-905-7_4
Herring, S. C., Kouper, I., Paolillo, J. C., Scheidt, L. A., Tyworth, M., Welsch, P., et al. (2005). Conversations in the Blogosphere: An Analysis From the Bottom Up. In Proceedings of the 38th Hawaii International Conference on System Sciences, Los Alamitos, USA.
Hogan, B. (2008). Analyzing social networks via the Internet. In Fielding, N., Lee, R. M., & Blank, G. (Eds.), Sage Handbook of Online Research Methods (pp. 141–160). Thousand Oaks, CA: Sage.
Honeycutt, C., & Herring, S. C. (2009). Beyond Microblogging: Conversation and Collaboration Via Twitter. In Proceedings of the 42nd Hawaii International Conference on System Sciences, Los Alamitos, USA.
Hu, M., Sun, A., & Lim, E.-P. (2007). Comments-Oriented Blog Summarization by Sentence Extraction. In Proceedings of the 16th ACM Conference on Information and Knowledge Management, Lisbon, Portugal.
Huberman, B., Romero, D. M., & Wu, F. (2009). Social Networks That Matter: Twitter under the Microscope. First Monday, 14, 1–5.
Indratmo, V. J., & Gutwin, C. (2008). Exploring Blog Archives with Interactive Visualization. In Proceedings of the Working Conference on Advanced Visual Interfaces, Napoli, Italy.
Lam, F., & Donath, J. (2005). Seascape and Volcano: Visualizing Online Discussions Using Timeless Motion. In CHI ’05 Extended Abstracts on Human Factors in Computing Systems, Portland, OR, USA.
Leung, A. (2003). Different Ties for Different Needs: Recruitment Practices of Entrepreneurial Firms at Different Developmental Phases. Human Resource Management, 42(4), 303–320. doi:10.1002/hrm.10092
Li, J., Tang, J., Zhang, J., Luo, Q., Liu, Y., & Hong, M. (2007). EOS: Expertise Oriented Search Using Social Networks. In Proceedings of the 16th International Conference on World Wide Web (pp. 1271-1272), Banff, Alberta, Canada.
Lim, M. J. H., Negnevitsky, M., & Hartnett, J. (2007). Detecting Abnormal Changes in E-Mail Traffic Using Hierarchical Fuzzy Systems. In Proceedings of Fuzzy Systems Conference. IEEE International.
Lin, Y.-R., Sundaram, H., Chi, Y., Tatemura, J., & Tseng, B. L. (2007). Blog Community Discovery and Evolution Based on Mutual Awareness Expansion. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence. IEEE Computer Society.
Matsuo, Y., Mori, J., Hamasaki, M., Nishimura, T., Takeda, H., Hasida, K., & Ishizuka, M. (2007). Polyphonet: An Advanced Social Network Extraction System from the Web. Web Semantics: Science, Services and Agents on the World Wide Web, 5(4), 262–278. doi:10.1016/j.websem.2007.09.002
Matsuo, Y., Tomobe, H., Hasida, K., & Ishizuka, M. (2004). Finding Social Network for Trust Calculation. In Proceedings of the 16th European Conference on Artificial Intelligence (ECAI2004) (pp. 510-514).
McCallum, A., Corrada-Emanuel, A., & Wang, X. (2005). Topic and Role Discovery in Social Networks. In Proceedings of International Joint Conference on Artificial Intelligence.
Mori, J., Sugiyama, T., & Matsuo, Y. (2005). Real-World Oriented Information Sharing Using Social Networks. In Proceedings of the 2005 International ACM SIGGROUP Conference on Supporting Group Work (pp. 81-84). New York, NY: ACM Press.
Pennebaker, J. W., & Graybeal, A. (2001). Patterns of Natural Language Use: Disclosure, Personality, and Social Integration. Current Directions in Psychological Science, 10(3), 90–93. doi:10.1111/1467-8721.00123
Pikas, C. K. (2008). Detecting Communities in Science Blogs. In Proceedings of the 4th IEEE International Conference on eScience - Volume 00, IEEE Computer Society.
Pouliquen, B., Steinberger, R., & Belyaeva, J. (2007). Multilingual Multi-Document Continuously-Updated Social Networks. In Proceedings of the Workshop Multi-source Multilingual Information Extraction and Summarization (MMIES’2007) held at RANLP’2007, Borovets, Bulgaria.
Smith, M. A., & Fiore, A. T. (2001). Visualization Components for Persistent Conversations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Seattle, Washington, USA.
Sudarsky, S., & Hjelsvold, R. (2002). Visualizing Electronic Mail. In Proceedings of 6th International Conference on Information Visualisation (pp. 3-9).
Tanev, H. (2007). Unsupervised Learning of Social Networks from a Multiple-Source News Corpus. In Proceedings of the Workshop Multi-source Multilingual Information Extraction and Summarization (MMIES’2007) held at RANLP’2007, Borovets, Bulgaria.
Thompson, S. A., & Sinha, R. K. (2008). Brand Communities and New Product Adoption: The Influence and Limits of Oppositional Loyalty. Journal of Marketing, 72(6), 65–80. doi:10.1509/jmkg.72.6.65
Tirapat, T., Espiritu, C., & Stroulia, E. (2006). Taking the Community’s Pulse: One Blog at a Time. In Proceedings of the 6th International Conference on Web Engineering (pp. 169-176), Palo Alto, California, USA.
Tseng, B. L., Tatemura, J., & Wu, Y. (2005). Tomographic Clustering to Visualize Blog Communities as Mountain Views. In Proceedings of WWW 2005 Workshop on the Weblogging Ecosystem, Chiba, Japan.
Venolia, G. D., & Neustaedter, C. (2003). Understanding Sequence and Reply Relationships within Email Conversations: A Mixed-Model Visualization. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Ft. Lauderdale, Florida, USA.
Viégas, F. B., & Smith, M. (2004). Newsgroup Crowds and Authorlines: Visualizing the Activity of Individuals in Conversational Cyberspaces. In Proceedings of the 37th Annual Hawaii International Conference on System Sciences - Track 4 - Volume 4, IEEE Computer Society.
Wellman, B., Gruzd, A., & Takhteyev, Y. (2009). Networking on Twitter: A Case Study of a Networked Social Operating System. Paper presented at the Workshop on Information in Networks (WIN), New York City, NY, USA.
Welser, H. T., Gleave, E., Fisher, D., & Smith, M. (2007). Visualizing the Signatures of Social Roles in Online Discussion Groups. Journal of Social Structure, 8.
ADDITIONAL READING
Adamic, L. A., & Adar, E. (2003). Friends and neighbors on the web. Social Networks, 25(3), 211–230. doi:10.1016/S0378-8733(03)00009-1
Bird, C., Gourley, A., Devanbu, P., Gertz, M., & Swaminathan, A. (2006). Mining Email Social Networks. In Proceedings of the 2006 International Workshop on Mining Software Repositories (pp. 137-143). New York: ACM Press.
Bollegala, D., Matsuo, Y., & Ishizuka, M. (2006). Extracting Key Phrases to Disambiguate Personal Names on the Web. In A. Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing (pp. 223–234). Berlin: Springer. doi:10.1007/11671299_24
Chen, Z., Wenyin, L., & Zhang, F. (2002). A New Statistical Approach to Personal Name Extraction. In Proceedings of the 19th International Conference on Machine Learning (pp. 67-74). San Francisco, CA: Morgan Kaufmann Publishers.
Cho, H., Gay, G., Davidson, B., & Ingraffea, A. (2007). Social Networks, Communication Styles, and Learning Performance in a CSCL Community. Computers & Education, 49(2), 309–329. doi:10.1016/j.compedu.2005.07.003
Christen, P. (2006). Comparison of Personal Name Matching: Techniques and Practical Issues. In Proceedings of the Workshop on Mining Complex Data (MCD) held at the IEEE International Conference on Data Mining (pp. 290-294). Washington, DC: IEEE Computer Society.
Haythornthwaite, C. (1996). Social Network Analysis: An Approach and Technique for the Study of Information Exchange. Library & Information Science Research, 18(4), 323–342. doi:10.1016/S0740-8188(96)90003-1
221
Exploring Virtual Communities with the Internet Community Text Analyzer (ICTA)
Haythornthwaite, C. (2008). Learning Relations and Networks in Web-Based Communities. International Journal of Web Based Communities, 4(2), 140–159. .doi:10.1504/IJWBC.2008.017669 Hsiung, P., Moore, A., Neill, D., & Schneider, J. (2005). Alias Detection in Link Data Sets. In Proceedings of the 3rd International Workshop on Link Discovery at the International Conference on Intelligence Analysis (pp. 52-57). New York, NY: ACM Press. Malin, B., Airoldi, E., & Carley, K. M. (2005). A Network Analysis Model for Disambiguation of Names in Lists. Computational & Mathematical Organization Theory, 11(2), 119–139. .doi:10.1007/s10588-005-3940-3 Mann, G., & Yarowsky, D. (2003). Unsupervised Personal Name Disambiguation. Proceedings of Conference on Computational Natural Language Learning (pp. 33-40). Morristown, NJ: Association for Computational Linguistics. McArthur, R., & Bruza, P. (2003). Discovery of Social Networks and Knowledge in Social Networks by Analysis of Email Utterances. In Proceedings of ECSCW 03 Workshop on Moving From Analysis to Design: Social Networks in the CSCW Context (Helsinki, Finland). Medynskiy, Y. E., Ducheneaut, N., & Farahat, A. (2006). Using Hybrid Networks for the Analysis of Online Software Development Communities. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 513516). New York, NY: ACM Press. Minkov, E., Wang, R. C., & Cohen, W. W. (2005). Extracting Personal Names from Email: Applying Named Entity Recognition to Informal Text. In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (pp. 443-450). Morristown, NJ: Association for Computational Linguistics.
222
Pang, B. & Lee, L., (2008). Opinion Mining and Sentiment Analysis. Foundations and Trends® in Information Retrieval, 2(1-2), 1-135. Patman, F., & Thompson, P. (2003). Names: A New Frontier in Text Mining. In H. Chen, R. Miranda, D. D. Zeng, C. Demchak, J. Schroeder, T. Madhusudan (Eds.), Proceedings of the Intelligence and Security Informatics: First NSF/NIJ Symposium (pp. 27-38). Berlin / Heidelberg: Springer. Pedersen, T., Kulkarni, A., Angheluta, R., Kozareva, Z., & Solorio, T. (2006). An Unsupervised Language Independent Method of Name Discrimination Using Second Order Co-Occurrence Features. In A. Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing (pp. 208–222). Berlin, Heidelberg: Springer. doi:10.1007/11671299_23 Phan, X.-H., Nguyen, L.-M., & Horiguchi, S. (2006). Personal Name Resolution Crossover Documents by a Semantics-Based Approach. IEICE Transactions on Information and Systems. E (Norwalk, Conn.), 89-D(2), 825–836. Reuther, P., & Walter, B. (2006). Survey on Test Collections and Techniques for Personal Name Matching. International Journal of Metadata. Semantics and Ontologies, 1(2), 89–99. .doi:10.1504/IJMSO.2006.011006 Reyes, P., & Tchounikine, P. (2005). Mining Learning Groups’ Activities in Forum-Type Tools. In Proceedings of the 2005 Conference on Computer Support for Collaborative Learning: Learning 2005: the Next 10 Years! (pp. 509-513). International Society of the Learning Sciences. Savignon, S. J., & Roithmeier, W. (2004). Computer-Mediated Communication: Texts and Strategies. Computer Assisted Language Instruction Consortium Journal, 21(2), 265–290.
Exploring Virtual Communities with the Internet Community Text Analyzer (ICTA)
Stefanone, M. A., & Gay, G. (2008). Structural Reproduction of Social Networks in ComputerMediated Communication Forums. Behaviour & Information Technology, 27(2), 97–106. .doi:10.1080/01449290600802429 Sweeney, L. (2004). Finding Lists of People on the Web. Computer Science Technical Report CMU-CS-03-168. Carnegie Mellon University, Pittsburgh, PA. Wellman, B. (1996). For a Social Network Analysis of Computer Networks: A Sociological Perspective on Collaborative Work and Virtual Community. In Proceedings of the 1996 ACM SIGCPR/SIGMIS Conference on Computer Personnel Research (pp. 1-11). New York, NY: ACM Press. Williams, R. S., & Humphrey, R. (2007). Understanding and Fostering Interaction in Threaded Discussion. Journal of Asynchronous Learning Networks, 11(2).
ENDNOTES
1 Swivel - http://www.swivel.com.
2 IBM’s Many Eyes - http://manyeyes.alphaworks.ibm.com.
3 To learn more about CSV files, visit http://en.wikipedia.org/wiki/Comma-separated_values. To learn how to create a CSV file using Microsoft Excel, visit http://office.microsoft.com/en-us/excel/HP100997251033.aspx.
4 Tutorial on Regular Expressions - http://www.regular-expressions.info.
5 ORA - http://www.casos.cs.cmu.edu/projects/ora.
6 Pajek - http://vlado.fmf.uni-lj.si/pub/networks/pajek.
7 UCINET - http://www.analytictech.com/ucinet.
8 NodeXL - www.codeplex.com/NodeXL.
9 Tie is a connection between two individuals.
10 A comprehensive list of Named Entity Recognition packages - http://alias-i.com/lingpipe/web/competition.html.
11 US Census - http://www.census.gov/genealogy/names.
Chapter 12
Making the Virtual Real:
Using Virtual Learning Communities for Research in Technical Writing
Reneta D. Lansiquot
New York City College of Technology of the City University of New York, USA
ABSTRACT
The emerging critical global collaboration paradigm and the use of virtual learning communities can form structured domains that require complementary methods for educational research. The purpose of this chapter is to illustrate how the social nature of virtual worlds can be used to teach technical writing and the academic research process. A yearlong mixed-methods research study is used to demonstrate the effect of this blended learning pedagogical approach on writing apprehension in advanced technical writing courses. Students wrote manuals collaboratively for an audience of their peers. Second Life, the online 3D virtual world created entirely by its residents, was both their subject of study and a mode of meaningful communication.
DOI: 10.4018/978-1-60960-040-2.ch012
INTRODUCTION
Globalization has triggered and is accelerating the disappearance of the competition paradigm so that the key issue is no longer whether students can compete with their global counterparts, but whether they can work with them (Suárez-Orozco & Qin-Hilliard, 2004). For instance, technical communication courses that are inherently interdisciplinary (i.e., merging writing with science, technology, engineering, and mathematics) promote the use of computer-supported collaborative learning and now extend beyond such Web 2.0 technologies as blogs (see Lansiquot, Rosalia, & Howell, 2009) to include 3D virtual worlds, notably Second Life. Furthermore, newly emerging mapping applications will soon expand this virtual space. (This model has been termed Second Earth [Roush, 2007].) Such virtual communities provide alternate spaces for real discussions and overcome geographic limitations. In virtual worlds, interaction is more explicit and can use gesture rather than relying on purely text-based exchanges, because students can actually see virtual avatars of
each other. They do not need to rely on conceptual ideas of presence, such as aliases in a knowledge forum (Padmanabhan, 2008). Meeting in virtual worlds can help create a sense of community that keeps students engaged in learning (Atkinson, 2008). What is evident, however, is that it is not enough to use this technology in the classroom for its novelty effect or even for engagement; the technology should encourage the use of academic skills so that students can transfer what they learn in a virtual world to traditional academics that use a highly structured or scientific method of collaborative investigation. This chapter will illustrate the value of a broader and more interactive research design for understanding virtual learning communities by bringing into play the complementary methods commonly used in educational research with those common to technical usability testing. To facilitate an understanding of how learning communities function beyond traditional face-to-face communication, a brief overview of theory and practical application is given below.
BACKGROUND
Cognition and Cognitive Flexibility
Cognition is situated in social interactions. As Lave and Wenger (1991) observed, student learning styles are illustrated during the interaction and collaboration afforded by situated learning. The different learning styles employed by students depend on what students are working on and with whom they are working. Students tend to form a tentative community in which assuming distinct roles is helpful in order to gain knowledge and to use information in disciplined ways. In terms of what students are learning, Bloom’s taxonomy calls for students to gain knowledge, comprehension, application, analysis, synthesis, and evaluation. In Anderson and Krathwohl’s (2000) revision of this taxonomy, the knowledge dimension focuses on factual, conceptual, procedural, and meta-cognitive knowledge. The cognitive process dimension emphasizes remembering (recalling information), understanding (explaining ideas or concepts), applying (using the information in a new way), analyzing (distinguishing among the different parts), evaluating (justifying a stand or decision), and creating (producing a new product or point of view). Proponents of constructionism suggest that, in addition to considering the many dimensions of learning, educators should consider what personally meaningful artifacts their learners will have the chance to create and share (Papert, 1991). In keeping with these educational goals, computer-supported collaborative learning can foster active, constructive, intentional, contextualized, and reflective learning (Jonassen, 1995). Learning is enhanced if it (a) is situated in real-world or simulated contexts, (b) fits new information with what is already known, (c) is collaborative, and (d) integrates assessment into the overall learning process. For advanced knowledge acquisition in ill-structured domains, cognitive flexibility theory (Spiro, Feltovich, Jacobson, & Coulson, 1991) explains the importance of fostering the ability to restructure knowledge in adaptive response to situational demands. Its relevance is embedded in “how multiple concept representations support comprehension and usability” (Passerini, 2007, p. 186). Understanding how cognition is distributed in effective groups can provide important implications for facilitating meaningful communication and scaffolding collaborative learning.
Distributed Cognition and Project-Based Learning
Cognition and knowledge are not confined to an individual; rather, they are distributed across objects, artifacts, and tools. As Rogers (2006) explained, distributed cognition is not a methodology that one can easily apply to a problem, but it is, nonetheless, an analytic framework for examining the interactions between learners (Angeli, 2008). Similar to distributed cognition proponents, learning-by-doing advocates emphasize multidisciplinary projects that are learner-centered, incorporate independent and group research, and focus on real-world problems (Schank, Berman, & Macpherson, 1999). A learning-by-doing approach is often instantiated in project-based learning because project-based learning encourages students to use inquiry to understand their world and construct meaning from their own experiences. Once a project is completed, “the learners also reflect on their reasoning, their strategies for resource gathering, their group skills, and so forth” (Driscoll, 2005, p. 405). When educators couple technology-mediated communication and collaboration in virtual learning communities, intersubjectivity has the effect of “blurring the borders of the educational community, and making the learning context richer” (Ligorio, Cesareni, & Schwartz, 2008). Vygotsky (1978/2006) indicated that intersubjectivity was a process occurring between learners in which social interaction generates new understanding beyond the mere combination of the various learners’ points of view. Specifically, in addressing the use of virtual learning communities, Sullivan (2009) pointed out that, although anecdotal accounts have been written about Second Life and its use or potential use in the classroom, there is a dearth of empirical studies published in peer-reviewed journals on the subject. She argued for the use of self-study as one research methodology that could be used to begin filling this void. Her self-study on the use of Second Life in the classroom is an important exemplar for learning and improving practice in the use of emerging technologies in the classroom.
The challenge facing those seeking to use virtual communities in education is the difficulty in identifying which elements have been productively brought together—for what purpose(s), in what ways, and on what scale to explore which phenomena (Green, Camilli, & Elmore, 2006). When facilitators are able to identify clearly
these elements, then the application of virtual technologies to traditional instruction such as writing instruction becomes more efficient.
ACADEMIC APPLICATIONS
Mixed Methods and Techniques
An effective research study begins with a researchable question and the selection of appropriate research methods for answering it. Smith (2006) encouraged the use of triangulation, suggesting that researchers should be aware that any single method has flaws and cannot, on its own, sufficiently explain real-world complexities. He pointed out that “qualitative and quantitative research are not distinct, but part of the whole thinking process, and thus mixing methodologies is simply formalizing what researchers already do” (p. 460). One particular problem related to virtual learning communities and increased technology connectivity is the increasing volume of data generated by these communities. In addition, the technologies these communities are based on generate their own type of chaos, complexity, and contextuality. Completely mixed investigations, or mixed methods, simultaneously use both types of data collection (quantitative and qualitative) and both types of data analysis (statistical and qualitative) (Tashakkori & Teddlie, 1998). The yearlong research study reported herein, “A Student’s Guide to Virtual Worlds,” examines the interactions of 60 junior and senior undergraduate students from five sections of an Advanced Technical Writing course. Most of these students were completing baccalaureates in Computer System Technology. In this writing course, students wrote manuals following an interactive model of design with two essential properties: description of the components and the ways in which these components were related (Maxwell & Loomis, 2003). This study used complementary methods to explore how virtual learning communities facilitate meaningful communication and examined the effect of creating virtual communities on writing apprehension. Participating students (three sections during the Fall 2008 semester and two sections in the Spring 2009 semester) were given the following project instructions:
1. Form groups of about three based on common interests and areas of expertise. With your group mates and on your own, explore Second Life and teleport to locations related to an agreed upon topic (e.g., where students hang out or should hang out) that is best discovered in this virtual world. Talk to others within Second Life to gather information and gain different perspectives on your topic. Then, as a group, develop a questionnaire and survey residents to narrow your topic. What kind of students are you catering to: part-timers or perhaps those with specific hobbies or majors? Write an instruction manual on your topic for students, your target audience. The manual should contain information and instructions (e.g., how-to, tips) that are useful to your student audience. Include appropriate graphics and snapshots taken in this virtual world. In a brief note preceding the instructions, specify your student audience and purpose. Be sure to include the avatar names of all group members.
2. To provide guidance in a world that is constantly changing, develop an online multimedia version of your manual that includes at least one original instructional video. Be sure to include helpful supplementary resources, Web sites, images, and videos, keeping in mind your audience, instructional design principles, and information architecture.
3. Conduct usability testing and revise the manual.
Students received little instruction on how to navigate, build, and interact with residents in Second Life, other than a review of what they learned during their tutorial in Orientation Island (the first stop after initial login where a new resident begins his or her second life). Students were meant to experience this new world as a user of their manual would, and the role of the teacher became that of a facilitator. Previous assignments in this course served as scaffolds for this final group project. Class time was allotted to create and explore virtual communities and to interact with residents. At the end of the course, to learn how this project affected technical communications, students evaluated their collaborations with other students and the interactions they had had with residents. Tracing the evolution of student thinking and writing skills was central to this mixed-method study, which used an adapted writing apprehension test (Daly & Miller, 1975; Reed, Burton, & Kelly, 1988) to measure student writing apprehension both pre- and post-participation. Analyses of project data indicated that students tended to avoid writing less after their participation; the apprehension many students felt towards writing was significantly lowered (Lansiquot, 2009; Lansiquot & Perez, 2009; see Table 1). Apprehension of writing may have been lessened by the explicit structure taught to students and the way in which this structure mirrored the academic research process they had been vaguely aware of before the course. Students were already used to the interactive model of design and its five components. The instructor highlighted how this structure was also like the one used in formal research. Similar to formal research, students used the following steps in preparing their manuals: purpose, conceptual framework, research questions, methods, and validity. 
Table 1. Paired sample tests
Pair 1: Writing apprehension, pretreatment – posttreatment
Paired differences: M = 4.91667; SD = 8.84383; SE M = 1.14173; 95% confidence interval of the difference: lower = 2.63206, upper = 7.20127
t = 4.306; df = 59; Sig. (2-tailed) = .000

A clear structure and coherent organization, which would include introduction, procedures, findings, and discussion, provide the means by which researchers can present their study. This overall organizational structure was layered within the processes that students followed while writing their manuals, in addition to being present in the manual itself and in the assessment of the collaborative experiences of students. Therefore, student experiences in virtual communities allowed them to mirror academic argumentation through the following:
• Selecting their own topics, each group defining the scope of its topic so that it was manageable within the time frame considering the group members’ expertise. Student topics ranged from one exemplary student project entitled “Second Life Poetry Slam: A Student’s Manual for Organizing, Setting Up, and Hosting Your First Virtual Event” to the creation of an online library, which is being filled with open titles connected to the Web (see Schmid, 2008). Other student groups wrote manuals on such topics as setting up an art gallery and promoting events in Second Life.
• Immersing themselves in the virtual learning communities under investigation and recording field notes of the experience.
• Interviewing experts, residents of the virtual world. These interviews, students pointed out, helped them refine the topics of their manuals and begin to see objects in this virtual world as crafted art, not just technical artifacts created along the way to community building.
• Looking for areas of possible replication or gaps in knowledge. In this case, figuring out what had already been effectively accomplished in Second Life, what expertise they could build on, and what was yet to be done to add value to users’ lives.
• Creating a prototype and conducting usability testing, which is an iterative process that involves feedback. That is, students experiment with their product, testing and refining it, until they complete their manual. Manuals had to include clear procedures that could be replicated. As the students conducted usability tests, they realized what information and tasks would indeed help their users. Frustration and initial setbacks in performing a task in Second Life were recycled into explanations for new users, which included tips and scaffolds, not unlike the teaching implications educational researchers provide in their formal research studies.
• Presenting their findings to classmates on a group presentation day.
• Concluding with a demonstration, discussion, and defense.
Students conduct traditional research while performing research in virtual learning communities to create multimedia manuals.
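The paired-samples statistics reported in Table 1 can be cross-checked from the summary values alone. The following is a minimal Python sketch, not the study's own analysis: it assumes 60 paired pre/post scores (implied by df = 59 and the 60 participants), and the function and variable names are illustrative only.

```python
import math


def paired_t_from_summary(mean_diff, sd_diff, n):
    """Return (standard error, t statistic, degrees of freedom)
    for a paired-samples t-test, given summary statistics of the
    pre-minus-post differences."""
    se = sd_diff / math.sqrt(n)   # standard error of the mean difference
    t = mean_diff / se            # test statistic against a null difference of 0
    return se, t, n - 1


# Summary values as reported in Table 1
MEAN_DIFF = 4.91667
SD_DIFF = 8.84383
CI_LOWER, CI_UPPER = 2.63206, 7.20127
N = 60  # assumption: df = 59 implies 60 paired scores

se, t, df = paired_t_from_summary(MEAN_DIFF, SD_DIFF, N)

# The interval should be centred on the mean difference, and its
# half-width divided by the standard error gives the critical t used.
ci_midpoint = (CI_LOWER + CI_UPPER) / 2
implied_critical_t = (CI_UPPER - CI_LOWER) / 2 / se

print(f"SE = {se:.5f}, t = {t:.3f}, df = {df}")
print(f"CI midpoint = {ci_midpoint:.5f}, implied critical t = {implied_critical_t:.3f}")
```

Under these assumptions the recomputed standard error and t statistic agree with Table 1 to the reported precision, and the confidence interval is centred on the mean difference, which suggests the table is internally consistent.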
Lessons Learned
Students were provided a space to create their own world, and to improve their collaborative academic writing and research. Project requirements incorporated students’ prior knowledge, but anticipated that students had not already constructed a second life prior to the start of the course. Thus, students had to learn how to create virtual artifacts with their groups and use their newfound knowledge to teach others through a detailed instructional manual. In addition, it was not assumed that students would automatically be engaged because of any prior use of virtual spaces. To wit, simply having students explore the virtual spaces was not enough: Students needed concrete projects. Initially, several students believed Second Life was a game, but they soon discovered that, although some areas adhere to a videogame structure, with rules and clearly defined goals and strategies to attain them (Gee, 2003), this virtual world in general offers instead a chance to explore a second life. It does not have a rules-based structure or a goal. Because of the nature of virtual worlds, educators must have clear learning goals and objectives when using them with students. Students should be challenged and provided with just-in-time scaffolds to facilitate learning. As Sullivan (2009) discovered, it is not enough to place students in Second Life with general instructions to explore, without a specific goal to achieve each time they log in. In fact, the technology is not enough. Students have reported being bored in Second Life when they do not have a purpose. Forming relationships and, ideally, communities is key both to engaging students and to their learning. As Jones, Warren, and Robertson (2009) found, the addition of a 3D online learning environment to blended learning courses can more rapidly create rapport among students. This rapport, in turn, translates into accelerated discourse that occurs earlier in the semester and sustains itself longer throughout it. In spite of this, instructors still must facilitate these communities. Finally, faculty and administrative buy-in and financial support are needed to sustain the creation of virtual learning communities and of innovative spaces for students.
Recommendations
Research principles can be applied to a wide range of learning spaces that focus on how to integrate face-to-face and online instruction in a pedagogically sound manner in order to form supportive virtual learning communities. As Lave and Wenger (1991) argued, a community of practice is one assembled around a common goal. This goal ought to be challenging so that layers of participation and expertise are respected. The project of creating a manual gave writers a real-world goal of generating meaningful knowledge together. Certainly, students would have been comfortable making a digital story or narrative of their social experiences in this virtual world, but an instruction manual pushed them to produce more academic writing. In addition, they were introduced to formal investigation and collaboration. Students were the researchers in the blended learning spaces they created. For example, students took field notes of their reflective learning and described their virtual communities. The position of being the creators of their virtual communities and experts for a particular project improved students’ engagement by making them responsible for teaching others.
FUTURE RESEARCH DIRECTIONS
Ultimately, the goal of successful educational research is that it is read, disseminated, and improved upon. Although students in this course presented their manuals to each other, the feedback received on their final versions generally came only from the instructor. Therefore, future research should involve students from a new class using manuals created during this study as a model. Online publication of manuals ought to happen earlier in a course and should target a community outside of the class, such as the Second Life community itself. In addition, in order to refine communication with a larger outside community, the target community will include people in science, technology, engineering, and mathematics (STEM) fields because they will have professional interests in common with these students. Updated requirements of this student project specify creating documentation of a complex scientific or technological system to teach a concept to a chosen audience (e.g., see Figure 1 for an example of a presentation of an astronomical concept). As a result, assessment of individual student contributions during collaboration will be needed. Because the competition paradigm is no longer a viable one, an ethnographic exploration of the effects of culture on motivation in virtual worlds could help improve student collaboration and engagement in such worlds.
Figure 1. A model of our solar system that rotates on appropriate axes and adheres to scientific principles
CONCLUSION
This chapter explored the use of virtual worlds to engage students in real research. Students used field notes, teamwork, interviews, and the principles of academic research to collect usability data and produce multimedia manuals. Student experiences were situated in a community of practice (Lave & Wenger, 1991; Wenger, White, & Smith, 2009) with members at different levels of expertise and showing different intensities of engagement at different times, but the goal was always clear: to work together to write a manual for their peers. This manual mirrored what one does in conducting a research study. Student learning was situated in the virtual world, but students had to think in the real world in order to figure out how to accomplish the goals of their group project. It seems the virtual makes some things more real (Turkle, 2009).
REFERENCES Anderson, L. W., & Krathwohl, D. R. (Eds.). (2000). A taxonomy for learning, teaching and assessing: A revision of Bloom’s Taxonomy of educational objectives. New York: Pearson.
Making the Virtual Real
Angeli, C. (2008). Distributed cognition: A framework for understanding the role of computers in the classroom. Journal of Research on Technology in Education, 40(3), 271–279. Atkinson, T. (2008). Second Life for educators: Myths and realities. TechTrends, 52(5), 26–29. Daly, J. A., & Miller, M. D. (1975). The empirical development of an instrument to measure writing apprehension. Research in the Teaching of English, 9, 242–249.
Lansiquot, R., Rosalia, C., & Howell, A. (2009). The use and abuse of blogging as a course activity: Three perspectives, three approaches. In I. Gibson et al. (Eds.), Proceedings of Society for Information Technology and Teacher Education International Conference 2009 (pp. 2853-2857). Chesapeake, VA: AACE.
Driscoll, M. P. (2005). Psychology of learning for instruction (3rd ed.). Boston: Pearson.
Lansiquot, R. D. (2009). Advanced technical writing: Blending virtual communities [Special issue on blended learning]. The Journal of the Research Center for Educational Technology, 5(1), 57–63. Retrieved from http://www.rcetj.org/index.php/ rcetj/article/view/18.
Gee, J. P. (2003). What videogames have to teach us about learning and literacy. New York: Palgrave Macmillan.
Lave, J., & Wenger, E. (1991). Situated learning: Legitimate peripheral participation. Cambridge: Cambridge University Press.
Green, J. L., Camilli, G., & Elmore, P. B. (Eds.). (2006). Handbook of complementary methods in education research. Mahwah, NJ: Erlbaum.
Ligorio, M. B., Cesareni, D., & Schwartz, N. (2008). Collaborative virtual environments as means to increase the level of intersubjectivity in a distributed cognition system. Journal of Research on Technology in Education, 40(3), 339–357.
Jonassen, D. (1995). Supporting communities of learners with technology: A vision for integrating technology with learning in schools. Educational Technology, 35(4), 60–63. Jones, J., Warren, S., & Robertson, M. (2009). Increasing student discourse to support rapport building in Web and blended courses using 3D online learning environment. Journal of Interactive Learning Research, 20(3), 269–294. Lansiquot, R., & Perez, M. (2009, June). A student’s guide to virtual worlds. Poster session presented at the annual meeting of the World Conference on Educational Multimedia, Hypermedia, and Telecommunications, Honolulu, HI. Retrieved from http://www.editlib.org/p/31568
Maxwell, J. A., & Loomis, D. M. (2003). Mixed methods design: An alternative approach. In Tashakkori, A., & Teddlie, C. (Eds.), Handbook of mixed methods in social and behavioral research (pp. 241–271). Thousand Oaks, CA: Sage. Padmanabhan, P. (2008). Exploring human factors in virtual worlds. Technical Communication, 55(3), 270–276. Papert, S. (1991). Situating constructionism. In Papert, S., & Harel, I. (Eds.), Constructionism (pp. 1–11). Norwood, NJ: Ablex Publishing. Passerini, K. (2007). Performance and behavioral outcomes in technology-supported learning: The role of interactive media. Journal of Educational Multimedia and Hypermedia, 16(2), 183–211.
Reed, W. M., Burton, J. K., & Kelly, P. P. (1985). The effects of writing ability and mode of discourse on cognitive capacity engagement. Research in the Teaching of English, 19(3), 283–297. Rogers, Y. (2006). Distributed cognition and communication. In Brown, K. (Ed.), The encyclopedia of language and linguistics (2nd ed., pp. 181–202). Oxford, UK: Elsevier. Roush, W. (2007). Second earth. Technology Review, 110(4), 38–48. Schank, R. C., Berman, T., & Macpherson, K. A. (1999). Learning by doing. In Reigeluth, C. M. (Ed.), Instructional-design theories and models. Mahwah, NJ: Erlbaum. Schmid, R. (2008). Real text in virtual worlds. Technical Communication, 55(3), 277–284. Smith, M. L. (2006). Multiple methodology in education research. In Green, J. L., Camilli, G., & Elmore, P. B. (Eds.), Handbook of complementary methods in education research (pp. 457–476). Mahwah, NJ: Erlbaum.
Spiro, R. J., Feltovich, P. J., Jacobson, M. J., & Coulson, R. L. (1991). Cognitive flexibility, constructivism, and hypertext: Random access instruction for advanced knowledge acquisition in ill-structured domains. Educational Technology, 31(5), 24–33. Suárez-Orozco, M. M., & Qin-Hilliard, D. B. (Eds.). (2004). Globalization: Culture and education in the new millennium. Berkeley: University of California Press. Sullivan, F. (2009). Risk and responsibility: A self-study of teaching with Second Life. Journal of Interactive Learning Research, 20(3), 337–357. Tashakkori, A., & Teddlie, C. (1998). Mixed methodology: Combining qualitative and quantitative approaches: Vol. 46. Applied social research methods series. Thousand Oaks, CA: Sage. Turkle, S. (2009). Simulation and its discontents. Cambridge, MA: MIT Press. Vygotsky, L. S. (1978/2006). Mind in society: The development of higher psychological processes. Cambridge: Harvard University Press. Wenger, E., White, N., & Smith, J. D. (2009). Digital habitats: Stewarding technology for communities. Portland, OR: CPsquare.
Chapter 13
Virtual Communities as Tools to Support Teaching Practicum: Putting Bourdieu on Facebook
Rebecca English
Queensland University of Technology, Australia
Jennifer Howell
Australian Catholic University Limited, Australia
how to increase engagement (Barnett, Keating, Harwood and Saam, 2002; Rye and Katayama, 2003). It seems that students participate only when participation is attached to formal assessment in units; when there is no assessment, students generally do not participate, or participate only intermittently. This may be because they are expected to participate in multiple LMS groups in multiple subjects. While there may be an initial flurry of activity as students ask for advice, help, contact details, resources and general
immigrants – those who have not embraced the use of ICT in the instructional process. Understanding the differences in digital behaviours and expectations between students and academics, between those who are natural users of technology and those who may not be, requires a theoretical mechanism that can explain the different uptake of digital tools and the expectations of different groups in universities. The theoretical concept of capital offered by Pierre Bourdieu can be extended and used for this purpose. Writers have used Bourdieu’s theories of capital to analyse the (differentiated) uptake, use and integration of ICTs, with a focus on the digital divide. Authors such as van Dijk (2006), Kvasny and Keil (2006), Cummings, Heeks and Huysman (2006) and Benítez (2006) utilised Bourdieu’s theories of cultural and social capital to explain disparities in digital uptake. These studies argued that the digital divide was reflective of wider divides in economic, social and cultural resources. Generally focused on social reproduction theories and issues (due, in part, to their use of Bourdieu’s work), these studies theorised the digital divide as another form of social and cultural division. For example, van Dijk (2006) theorised that a lack of ‘informational capital’ was a marked determinant of the digital divide, closely related to a lack of social and cultural capital. Similarly, Kvasny and Keil (2006) saw the digital divide as another form of social division, arguing for more than an educational intervention to teach ICT skills. Cummings, Heeks and Huysman (2006) used social capital to explore the online network. They based their analysis of the digital divide on the three dimensions of social capital they saw as significant (Huysman, 2004): “the structural opportunity to share; the cognitive opportunity to share; and the relation-based motivation to share” (p. 581).
In a study applying the ideas of Bourdieu in the field of cyber education, Menchik (2004) used cultural capital to theorise cyber education as a distinct field in education studies. He argued that cultural capital was not enough to explain
the engagement of various groups with the field and argued for an approach that examined the emergence of the internet in a central role in the curriculum, again finding that the digital divide, however clichéd, was a significant representation of the technological haves and have-nots. However, Selwyn (2004) cautioned that there was a need to define what is actually meant by ICTs when examining the digital divide, so as to incorporate technology as diverse as mobile phones, digital cameras and computers, covering both the soft (creation of content) and the hard (creation of hardware). He also cautioned against conflating access with creation, arguing that the digital divide exists more as a hierarchy of access, ranging from those who have no access to those who are actively engaged in building the hardware and software associated with digital technology. He argued that Bourdieu’s theoretical tools of economic, social and cultural capital should be used to analyse digital uptake and use by different groups. Finding that the economic capital used to purchase the ICTs, the cultural capital that included the investment of time and the inculcation into the ICTs, and the social capital that included the networks of online contacts were all significant in the process, he offered the term ‘technological capital’ to explain these differences (Selwyn, 2004). Studies that have examined the interactions of online or virtual communities have used social capital as a means of analysing interaction. This is because social capital can be used to theorise how trust and knowledge are developed in online interactions (Daniel, McCalla, and Schwier, 2002). Social capital represents an abstract, hidden resource that is tapped into when members of online communities interact to share ideas. The development of social capital has several aspects.
Firstly, it requires a level of positive interaction which exists over time and allows people to build trust, share norms and commit to the group. Social relationships developed over time are considered to be essential to knowledge development (Daniel, McCalla, and Schwier, 2008). Secondly, it requires
reciprocation (Daniel, Zapata-Rivera, and McCalla, 2003). While it does not have a dollar value, it can be exchanged for recognition in a field. This is because, while there may be limited financial gains to be made from interacting in an online community, there is a social desire to help and a social appreciation of that help (Daniel, 2002). What seemed to be evident in these studies is that there is a new generation of learners, distinct from those of the past, who possess a different type of cultural and social capital. The cultural and social capital that the different generations have possessed has given rise to labels used to identify their distinct characteristics. The Baby Boomers were the first such-labelled generational cohort and are typically the parents of Generation X. Generation X were largely born during the 1960s and 1970s and are regarded as entrepreneurial and technology friendly (see Figure 1). This generational cohort have seen and driven the majority of technological innovations and developments. For example, they have seen the development of PCs, the WWW, email, mobile phones and computer games. Generation Y, born during the 1980s and early 1990s, have also been labelled The Internet Generation (see Figure 1). This cohort have grown up in an increasingly digital and Internet-driven world. They have seen the Internet develop in all spheres of life, both personal and business. They have digital technologies such as high-speed broadband and digital cameras. This implies that these groups possess different cultural and social, as well as digital, capital.
Figure 1. Generation X, Y and C
DO GENERATION C POSSESS A NEW TYPE OF CAPITAL?
In the studies that utilised Bourdieu’s conception of capital, the focus was on the social and cultural capital required to engage with the new digital spaces and the differences in capital that created the digital divide. These studies conceived of capital as “accumulated history” and “accumulated labour (in its materialized form or its “incorporated,” embodied form) which, when appropriated on a private, i.e., exclusive, basis by agents or groups of agents, enables them to appropriate social energy in the form of reified or living labour” (Bourdieu, 1986, p. 241). These studies have treated capital as the ability to employ the appropriate resources to engage productively in a field. If the agent failed to do so, it may indicate that the agent did
not possess the appropriate resources. This may be because they do not know how to engage with the technology, they might not have access to the technology, or their social networks may not use the technology. These studies used cultural capital to explain the comportment of the self towards a technological end, social capital to explain the network of relations that enable the use of the technology, and economic capital to explain access to the technology. Inequalities in capital distribution were implicated in the different uptake of digital resources. From an educational perspective, the uptake of digital communities in the Generation Y or Generation C cohort indicates that more than just cultural and social (and obviously economic) capital are at play. The attributes of this generational grouping were hypothesised by Lawrence Lessig, who suggested that “Technology could enable a whole generation to create – remixed films, new forms of music, digital art, a new kind of storytelling, writing, a new technology for poetry, criticism, political activism – and then, through the infrastructure of the Internet, share that creativity with others” (Lessig, 2002, p. 9). Generation C was a term first offered by the Internet site Trendwatching.com in 2004, and it builds on the work of Lessig (2002). Here, Generation C are defined as those who typically produce and share digital content (Trendwatching, 2004), such as blogs, digital images, digital audio or video files and SMS messages. They are digitally fluent and fearlessly use new forms of technology as they are released. They fluently use computers, mobile telephones, the Internet and other associated technologies (see Figure 1). Generally, this generational cohort was born in the late 1980s and early 1990s, but it is not limited to a narrow age range. As Dye (2007) states, “they aren’t categorised by age, they’re categorised by behaviour.
And it’s very much about content-centric communication, how they share, store, and manage content” (p. 38). They are a “generation that spans across the age divide to encompass the growing population that creates, shares, and is connected
by its own user-generated content” (Dye, 2007, p. 38). The habitual and fluent use of Web 2.0 technology and the creation of content implied that they possessed more than just cultural and social capital. It seemed that the group also possessed a digital capital. This generational cohort of digital content creators use Web 2.0 habitually and fluently to create user-generated digital content. But what types of innovations or digital technologies are they using? This behavioural group, courtesy of Web 2.0 technologies, are fluent in social and mobile digital technologies, WiFi, digital editing, MP3s, podcasts, RSS streaming and vodcasts, making them the first digitally native generation. They characteristically build networks, relationships and their very identity around and through content (Dye, 2007, p. 38). Lenhart, Madden, Rankin Macgill and Smith (2007) reported that “44% of US adult internet users (53 million people aged 18 and over) have created content for the online world through building or contributing to web sites, creating blogs, and sharing files” (¶ 5). It was these characteristics and activities that defined the capital possessed by the group in this study. Perhaps one of the most distinguishing characteristics of this generational grouping, and one that defines them as significantly different from their predecessors, is the shift from straightforward consumption of digital technologies to customisation and coproduction (Trendwatching, 2004, ¶ 14). This is an active and creative cohort, hence the amount of user-generated content is expected to increase significantly (Dye, 2007). Generation C have grown up in a world dominated by technology; they possess the capital required to generate, customise and coproduce digital content. In Australia, in 2006-2007, 5.67 million households had home Internet access (ABS, 2007). The proportion of people aged 15 years and over who had accessed the Internet at home during 2006-2007 was 61% (ABS, 2007).
What is interesting in this statistic is the purpose of this access: 75% of all home Internet consumption is for personal/private use (see Table 1).
Table 1. Purpose of Internet use at home (ABS, 2007)
Main purpose of Internet use at home    Percentage
Personal/private                        75%
Work/business                           15%
Education/study                         8%
Voluntary/community                     1%
Other                                   1%
As Generation C is digitally fluent, the impact of technology is apparent in all spheres of their life; no longer is it neatly divided into personal, business or education. Beyond digital fluency, this group possessed the capital to make content and to interact digitally. What is apparent is that their social network is both physical and digital, often a hybrid blend of both. It is neither separated nor distinguishable. Relationships are maintained face-to-face and via SMS, email, online chat and MMS, regardless of where the other person may be (Goldberger, 2003). The impact of technology on social networks has resulted in a significant shift: according to Dye (2007), the network is simultaneously larger and narrower, as “the entire globe is their new local, and niche communities are the new mass audience” (p. 40). The online tools and programs available to users mean that geographical and time constraints are increasingly irrelevant. They possess an ability to use the technology as it exists, rather than to program new technology. The Generation C cohort are populist users of the technology as it has been created for them, incorporating it into their lives, using it to connect, to share, to create and to communicate. Unlike previous generations who may have possessed the technological capital to program new software, the Generation C cohort are not specialist users of the technology, and it is this mass appeal which differentiates them from other generational users of technology. It represents the digital capital that they possess rather
than a blend of social, cultural and economic capital or technological capital.
Digital Capital
Digital capital is the blend of the social, cultural, economic and technological skills, know-how and attributes that allow access to and interaction with the digital environment. Digital capital is more than the technological capital described by Selwyn (2004) or informational capital described by Gigler (2004). Selwyn (2004) described technological capital as the “specific technological forms of cultural capital that are useful to the information age, such as technological skills, ‘know-how’ and socialization into the technoculture via family and the household” (p. 353). However, Generation C possess more than just technological skills and know-how. This generation appears to engage more deeply than merely “having access or ownership of a technology, and engaging with and making meaningful use of that technology” (Selwyn, 2004, p. 353); rather, they are imbued with the technology, and their whole social networks and engagements are mediated by the technology. This is also more than being simply “influenced by their social capital” (Selwyn, 2004, p. 353). Rather, this generation uses the digital tools and the ICTs to enact their social and cultural capital. Selwyn’s work was more concerned with the Information Technology element, the technology itself; the significance for Generation C is the Information Communication element of the technology. Selwyn (2004) argued, “ICT use is increasingly about being able to draw upon ‘expert’ sources of advice to help us use ever-powerful computer systems that the vast majority of users will never fully use, let alone understand” (p. 354). The students in this study used their ICT skills, their digital capital, to mediate between their digital and ‘real’ lives. The relationships they had formed in the ‘real’ world at university, in their sporting clubs, at work and in their social lives were furthered, extended and reinforced by their digital capital.
Method
‘The wall’ posts were analysed for evidence of capital: social, cultural and digital. Messages that were coded as ‘Social Capital’ included content that was concerned with developing, or was evidence of, the social connections between group members. This sense of social connectedness manifested itself in ways such as supportive comments, references to shared experiences, colloquialisms such as nicknames and awareness or knowledge of social pursuits or personal interests. Cultural capital was demonstrated by the discipline knowledge or discussions they engaged in. This included messages that shared their teaching practicum experiences, asked for suggestions or
help with a particular teaching problem, shared teaching ideas or demonstrated some link to the shared teaching culture they were all engaged in. The final code, digital capital, was applied to messages that illustrated interaction within the digital environment. These varied in their level of digital fluency and included, for example, requests for help in doing something within the community space. All messages posted to the community site were counted as evidence of digital capital as they were a specific form of technological capital, as discussed in detail above. Messages could be assigned multiple codes, as they were often analysed as being representative of multiple forms of capital. Apart from the general posting of messages, three ‘discussion topics’ were set up, which attracted 15 posts. Two of these discussion topics were requested by a member of the group via email to the administrator/lecturer, as group members were unable to create discussion forums themselves because they were not administrators of the group. The lack of activity in this section of the group page is probably a result of the steps required to access it. ‘The wall’ is available on page one and the box to input a thread is accessed from the front page of the group’s site, whereas the discussion forum requires three steps to access: the first involves clicking on the discussion topic, the second requires selecting ‘reply to [poster]’, and only then are students able to post to the discussion.
RESULTS AND DISCUSSION
The messages were coded into the three types of capital (social, cultural and digital), a process which involved several steps. Once all the data had been collected, the group wall was read to identify the presence of the codes in the data. The three codes, in order of frequency, were Digital Capital (n=100), Cultural Capital (n=74) and Social Capital (n=70). For purposes of clarity and organisation, the results are presented through these three codes.
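Because a single message could carry more than one code, the three code counts need not sum to the number of messages posted. A minimal sketch of such a tally in Python may make this concrete. The post identifiers and code assignments below are hypothetical and invented purely for illustration; they are not the study's data, and the function name `tally_codes` is likewise an assumption.

```python
from collections import Counter

# Hypothetical coded wall posts (illustrative only, not the study's data).
# Following the coding rule described in the Method section, every post is
# coded "digital"; "cultural" and "social" are added where they apply.
coded_posts = [
    {"id": "post-1", "codes": {"digital", "cultural"}},
    {"id": "post-2", "codes": {"digital", "social"}},
    {"id": "post-3", "codes": {"digital", "cultural", "social"}},
    {"id": "post-4", "codes": {"digital"}},
]


def tally_codes(posts):
    """Count how many posts carry each code; a post may count under several."""
    counts = Counter()
    for post in posts:
        counts.update(post["codes"])
    return counts


counts = tally_codes(coded_posts)
for code in ("digital", "cultural", "social"):
    print(f"{code}: n={counts[code]}")
```

In this toy data, the "digital" count equals the number of posts, while the other counts are smaller and overlap with it, which mirrors the pattern of a universal code alongside two selective ones.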
(a) Digital Capital
As stated above, all of the postings made to the community site were counted as evidence of digital capital. The simple act of joining the Facebook community, navigating around the site, understanding how to post a message to the wall and being able to respond to a message were all indicative of digital capital. The Facebook site is not just an online community; it is imbued with levels of Web 2.0 connectivity that would potentially confuse the inexperienced. Whilst these students would have been experienced in the use of online communities, online discussion and other such learning tools through the University’s LMS, Blackboard, the Facebook application is a different experience again. Their ability to engage successfully with this application was evidenced by the number of messages posted (n=100).
(b) Cultural Capital
This category contained a reasonable number of posts (n=74) and largely included posts that were associated with discipline knowledge. They referred to teaching activities or responsibilities, lesson plans, student behaviour or other topics associated with teaching and their school. These posts were generally initiated by the students themselves, and answers or comments (largely sharing similar experiences) were often made in response. For example: “HEY [sic], I just got home from my first full teaching day. It was awesome! Had so much fun. I’m exhausted though, no sleep for the wikid [sic] though... many lessons to prepare for tomorrow plus grade 12 assignments to mark. Just wanted to say it’s great reading all of your stories from your schools”. [KB12/05/08] These posts were typically positive and encouraging by nature, indicating that the group sense of community was strong among participants. There
was also a growing sense of disciplinarity evident; considering the stage of their undergraduate studies, the sense of them as emerging professionals was beginning to appear. Collective attempts to solve teaching problems, share stress and express concern were evidence of this developing ‘teacherness’. There appeared to exist an implicit imperative for members to respond and post messages within a short timeframe, which would indicate that they were regularly online and participating. Hence motivation and engagement would appear to be quite high. The posts that were associated with problems were often concerned with nerves, lack of resources, concern over teaching a particular subject for the first time, problems with students, problems with resources and problems with prac supervisors. An example of these types of posts: “.....I wud [sic] like to ask anyone who knows the answer a quick question. My supervising teacher left for England yesterday & her replacement is a first year & by 1st year I mean the class she has on Monday morning will be her first class EVER [sic]. Is she allowed to be my supervising teacher? I wasn’t sure if there were rules etc to the requirements of a supervising teacher & what’s worse is that she is IT and English & she is supervising my BCT classes. Anyone know? We took a class together the other day and it will be interesting cuse [sic] the kids just walk straight over her. The double last week had last 20 mins as free time. Let me know please”. [AH18/05/08] The types of problems were quite complex, but could be broadly grouped as those pertaining to specific classroom-based problems (such as student behaviour or resources) and those pertaining to problems with supervisors (see example above). It would be interesting to correlate the problems that were raised in this forum with those raised with the visiting liaison lecturers.
Did members of this group solve their problems online rather than via more traditional methods such as face-to-face
meetings with staff? Were more problems raised and discussed via this method because they were peer-based, or because they did not have such formalised processes attached to them? When a problem had been posted to the community wall, it was interesting to observe the community swing into action and offer solutions. This type of posting was evidence of both cultural and social capital. The example below is a response to the problem post described above. “[student’s name], both of my teachers were absent one day each of last week and the substitute teacher that was there is Early Childhood trained and treats the seniors as if they were 6! She tried to take over the class that I had total control over and mentioned to her quietly that just because im [sic] a student teacher doesn’t mean I don’t know how to make a class work. Kids out here talk all the time and as long as their pens are moving at the same time it’s all okay. I would ask for another teacher to supervise your lessons. Someone with real experience with dealing with kids. Don’t let some fresh hot shot run you over... Maybe I’m just talking to myself??? And has anybody picked up their [different subject] stuff yet? Maybe [university name] can post my assignments to me”. [JM18/05/08] It was interesting to note that the numbers of problem (n=20) and solution (n=18) posts were closely matched. This group tool would appear to be an effective problem-solving resource. It should be further noted that the solutions were offered by the peer group, not by their lecturer.
(c) Social Capital
This category contained the smallest number of posts (n=70) and was unique in that it had two clear periods of contribution: at the start of the field practicum and towards the end. The postings made at the start of the field practicum were positive, excited and anticipatory regarding the upcoming field practicum placement. Those posted at the end of the period were more supportive, encouraging everyone to keep going as they concluded their placement. For example, a posting from the start of the field practicum: “hey well my first day at [XY] high went great! there are a few other prac teachers there but yeah don’t think I will have much to do with them as they are in totally different teaching areas....I have a lovely prac supervising teacher and the best staffroom and they are all lovely and really welcoming! got my own desk and have been given heaps of resources already!!!! its great!! [sic] get to teach year 11 and 12 bct as well as year 9 sose and year 10 geography! should be great! Although [sic] my geog [sic] supervising teacher is away this week so not sure if I will get to teach anything in that field this week! good luck to everyone:)” [AN06/05/08] The posts in this category were typically quite long (more than 50 words), which indicates that, whilst they were largely concerned with expressing excitement at commencing or finishing practicum, they displayed a level of detail. This commitment to explain and engage in community discussions was an interesting phenomenon to observe, as it implies the building of a community among the students. Social capital codes were also concerned with evidence of the social activities or connections between group members. This is consistent with Daniel (2002) and Daniel et al. (2002; 2003; 2008), in that the connection between participants (the capital that resided with the group through their shared experiences in the course) had developed a trust that allowed the group to participate. This is because the social capital of the group reveals the relationship. The quality of the responses in this category is a significant insight into the social capital that the group possessed (Daniel, Zapata-Rivera, and McCalla, 2003). They were indicative of the sense of comradeship and connection between the students.
There were postings that were self-deprecating for not contributing more, comments on mistakes made on prac, greetings and jokes, and they generally created a sense of togetherness amongst the group. An example of these types of posts: “Okay note to self: Students are feral a) after the state of origin, b) on rainy days.... I’m sure I’m going to just keep adding to this list”. [FC29/05/08]
REFERENCES
Abbitt, J. (2007). Exploring the Educational Possibilities for a User-Driven Social Content System in an Undergraduate Course. MERLOT Journal of Online Learning and Teaching, 3(4), 437-447. Retrieved 18 August, 2008 from http://jolt.merlot.org/Vol3_No4.htm
ABS. (2007). Household use of Information Technology, Australia 2006-07. [Online] Accessed 9 May 2008 http://www.abs.gov.au/Ausstats/abs@.nsf/0e5fa1cc95cd093c4a2568110007852b/acc2d18cc958bc7bca2568a9001393ae!OpenDocument Barnett, M., Keating, T., Harwood, W., & Saam, J. (2002). Using Emerging Technologies to Help Bridge the Gap between University Theory and Classroom Practice: Challenges and Successes. School Science and Mathematics, 102. BECTA. (2008). Technology Strategy for further education, skills and regeneration. London: BECTA. Bishop, J. (2006). Increasing participation in online communities: A framework for human-computer interaction. Computers in Human Behavior, 23, 1881–1893. doi:10.1016/j.chb.2005.11.004 Bourdieu, P., & Wacquant, L. (1992). An invitation to reflexive sociology. Chicago: The University of Chicago Press. Daniel, B. (2002). A process model for building social capital in virtual learning communities. Unpublished manuscript, University of Saskatchewan. [Online] Accessed 15 November 2009 http://www.usask.ca/education/coursework/802papers/daniel Daniel, B., McCalla, G., & Schwier, R. (2002). A Process Model for Building Social Capital in Virtual Learning Communities. ICCE: 2002 International Conference on Computers in Education (ICCE’02), 2002 (p. 574). Daniel, B., McCalla, G., & Schwier, R. (2008). Social Network Analysis techniques: implications for information and knowledge sharing in virtual learning communities. International Journal of Advanced Media and Communication, 2(1), 20–34. doi:10.1504/IJAMC.2008.016212
246
Daniel, B., Zapta-Rivera, J.-D., & McCalla, G. (2003). A Bayesian computational model of social capital in virtual communities. In Communities and Technologies: Proceedings of the first international conference on Communities and Technologies 2003 (pp. 287-305). Maryland, USA: Springer Dye, J. (2007). Meet generation C: creatively connecting through content. [Online] Accessed 9 May 2008 from http://www.econtentmag.com Facebook About. (2008). Retrieved August 10, 2008, from http://www.facebook.com/about.php Gibson, C. (Ed.). (2006). Student engagement and information literacy (p. viii). Chicago: American Library Association. Goldberger, P. (2003). Disconnected Urbanism. [Online] Accessed on 9 May 2008 from http:// www.metropolismag.com/html/content_1103/ obj/index.html Keegan, D. (2004). The incorporation of mobile learning into mainstream education and training. Paper presented at The 18th Asian Association of Open Universities Annual Conference. Lenhart, A., Madden, M., Ranking Macgill, A., & Smith, A. (2007). Teens and Social Media. Pew Internet and American Life Project. [Online] Accessed on 9 May 2008 from http://www.pewinternet.org/pdfs/PIP_Teens_Social_Media_Final.pdf Lessig, L. (2002). The Future of Ideas. New York: Vintage Books. Lorenzo, G., Oblinger, D., & Dziuban, C. (2006). How Choice, Co-Creation, and Culture Are Changing What It Means to Be Net Savvy. [Online] Accessed on 9 May 2008 from http://connect. educause.edu/Library/EDUCAUSE+Quarterly/ HowChoiceCoCreationandCul/40008
Virtual Communities as Tools to Support Teaching Practicum
McElrath, E., & McDowell, K. (2008). Pedagogical Strategies for Building Community in Distance Education Courses. MERLOT Journal of Online Teaching and Learning, 4(1) 117-127. Retrieved 18 August, 2008 from http://jolt.merlot. org/vol4no1/mcelrath0308.pdf McNeely, B. (2005). Using Technology as a Learning Tool, Not Just the Cool New Thing. Educating the Net Generation. EDUCAUSE Ebook. [Online] Accessed on 9 May 2008 from http://www.educause.edu/UsingTechnologyasa LearningTool,NotJusttheCoolNewThing/6060 Oblinger, D. G., & Oblinger, J. L. (2005). Educating the net generation. [On-line]. Accessed on 9 May 2008 from http://www.educause.edu/ educatingthenetgen Papert, S. (1993). The Children’s Machine: Rethinking school in the age of the computer. New York: BasicBooks. Prensky, M. (2001, October). Digital natives, Digital immigrants. On the Horizon, 9(5). NCB University Press. [Online] Accessed on 9 May 2008 from http://www.marcprensky.com/writing/ Reil, M. (2000). New designs for connected teaching and learning. A White Paper available on-line at http://www.gse.uci.edu/mriel/whitepaper
Rye, J., & Katayama, A. (2003). Integrating Electronic Forums and Concept Mapping With a Science Methods Course for Preservice Elementary Teachers. Electronic Journal of Science Education, 7(4). Retrieved 18 August, 2008 from http://ejse.southwestern.edu/original%20site/ manuscripts/v7n4/issue.html Taylor, M. (2006). Generation NeXt Comes to College: Today’s postmodern student. [Online] Retrieved May 9, 2008 http://globalcscc.edu/ tirc/blog/files/Gen%20NeXt%20handout%20 06%20oln.pdf Trendwatching. (2004). Generation C. [Online]. Retrieved 9 May 2008 from http://www.trendwatching.com/trends/generation_C.htm Wellman, B., Boase, J., & Chen, W. (2002). The Networked Nature of Community: Online and Offline. IT & Society, 1(1), 151–165. Windham, C. (2007). Father Google and Mother IM: Confessions of a Net Gen Learner. Presented at ELI Annual Meeting, January 23, 2007. [Online]. Retrieved 9 May 2008 from http://connect. educause.edu/library/abstract/FatherGoogleandMothe/39228.
Roper, C. (2008). Teaching People to Bargain Online: The Impossible Task Becomes the Preferred Method. MERLOT Journal of Online Learning and Teaching, 4(2), 254-260. Retrieved 18 August, 2008 from http://jolt.merlot.org/vol4no2/ roper0608.pdf
247
248
Chapter 14
Conversation Analysis as a Tool to Understand Online Social Encounters

Aik-Ling Tan
Nanyang Technological University, Singapore

Seng-Chee Tan
Nanyang Technological University, Singapore
INTRODUCTION

Conversation Analysis (CA) studies talk in naturally occurring interactions. It originated from the work of Harvey Sacks (1974), who wanted to develop an observational science as an alternative means of examining the details of actual social events. CA studies how social orders are produced and how societies reproduce these social orders through details grounded in “talk-in-interaction”. CA seeks to place a new emphasis on participants’ orientation to indigenous social and cultural constructs. It seeks to describe the underlying social organization – conceived as an institutional substratum of interactional rules, procedures, and conventions – through which orderly and intelligible social interaction is made possible (Goodwin & Heritage, 1990, p. 283). In using CA to study interactions in a social context, we seek to understand the transaction of events in the social world. We emphasize the routine everyday events and norms through which participants within specific social and cultural contexts involve themselves in forming, shaping, affirming or denying each other to define the social orders (Tan & Tan, 2006). While conversations traditionally involve two individuals, CA has been applied in broader institutional contexts such as schools since the 1970s. Researchers such as Mehan (1983), Cazden (1986), and Sinclair and Coulthard (1992) have investigated how talk is used as a resource by teachers and students to accomplish learning. The application of CA to face-to-face classroom interactions has enabled insights into the transactions that result in learning, but its application to online learning environments appears to be limited.
It is widely recognized that interactions in an online learning environment differ from face-to-face interactions (Walther, 1996). It is therefore necessary to understand these differences, and how they come about, in order to gain better insights into the norms of online discussions. How participants in an online learning environment maintain orderliness as they transact and share their knowledge to accomplish learning is currently, at best, an intelligent guess by researchers. In this chapter, we suggest using CA as a tool to uncover and illuminate the micro structures of “virtual talk-in-interaction” so as to better understand the social structures that are embedded in the orderliness of online learning environments. The examples presented in this chapter are chosen to illustrate how CA and Freebody’s (2003) six analytic passes can be used to analyze asynchronous discussion; the focus is not on the results and implications of each example analyzed.
BACKGROUND

With the proliferation of educational technology and its penetration into classrooms, educational technologists have begun to realize the urgency of scrutinizing people’s on-line conversations as evidence of educational processes and outcomes (Mazur, 2004). Analytical methods like content analysis and social network analysis have been used by researchers to make sense of online interaction. Heckman and Annabi (2005) used content analysis to compare face-to-face and online learning processes and found that students assume a more instructional role and engage in higher-order thinking processes in an asynchronous online environment. Similarly, Hara, Bonk, and Angeli (2000) used transcript content analysis on students in a psychology course and found that the course participants engaged in lengthy and cognitively more complex discussions. The methodological concerns of applying content analysis were addressed by Rourke, Anderson, Garrison and Archer (2000) as early as the start of the new century. In their paper, they highlighted the need to examine objectivity, reliability, replicability and systematic coherence when using quantitative
content analysis. To complement the quantitative information that is revealed by content analysis, CA and other qualitative methodologies can be used. While discussions of the potential of CA techniques for analyzing face-to-face discussions are not new, there is a dearth of studies that apply CA to empirical corpora of online conversations. This chapter attempts to bridge the gulf between theoretical notions of CA and its application in the analysis of empirical online data. Both face-to-face and online encounters have elements of cultural and social dimensions of knowledge; this knowledge manifests as shared meanings, judgments, and understandings of expressed thoughts and beliefs within a specific context. As suggested by Mazur (2004), context can be defined as the sum of the dimensions and properties of the social situation that relate to the evolution, production or reception of discourse. As such, face-to-face and online discussions present interaction in two contexts that are different yet share some similarities. To gain insights into the culture of face-to-face or online interaction, examination of talk becomes the point of discourse analysis. When people talk, they do not do so in isolation. Talk among people is collective and involves, for example, speakers and listeners, or writers and readers. These different parties interact with each other to construct the social events they experience. The central purpose of CA is hence to investigate the norms and conventions that speakers use in interaction to establish communicative understandings. For CA to be useful in understanding the nuances of online learning environments, much work still needs to be done to enhance “our understanding of context, content, participant response and reaction, and the social relationships inherent in all this on-line talk-in-interaction” (Mazur, 2004, p. 1095).
In face-to-face conversations, there is a continuity of time and space that is not a necessary condition in virtual online discussions. For example, through asynchronous online discussion,
participants can contribute to a discussion from different locations and at different times. The affordances of each mode of communication have prompted researchers to examine the differences between them as well as to reconcile the strengths of each mode of interaction. Researchers have attempted to develop models to study online interaction (for example, Gunawardena, Lowe, & Anderson, 1997). Gunawardena and her colleagues studied an online database with 554 list subscribers and proposed a social knowledge construction model with five phases, namely (1) sharing or comparison of information, (2) discovery and exploration of dissonance or inconsistency among ideas, concepts, and statements, (3) negotiation of meaning and co-construction of knowledge, (4) testing and modification of proposed synthesis or co-construction and (5) agreement statements or application of newly constructed meaning. Building on the work of Gunawardena and her colleagues, and on ideas from social network analysis (Wasserman & Faust, 1997), de Laat (2001) analyzed the interaction patterns of an online community of practice within a Dutch police organization. The objective of his analysis was to ascertain the activity of the participants and to establish the central participant in the community and the density of the discussions. Similar attempts have also been made to develop analytic tools to aid the analysis of interactions in computer-mediated communication. Fahy, Crawford and Ally (2001) and Fahy (2002) developed the Transcript Analysis Tool (TAT) to analyse the interaction pattern in a computer conference and found that while all participants took part in the discussion, the intensity and persistence of discussion differed among participants.
Like the earlier researchers, Jeong (2003) also developed an analytic tool (Discussion Analysis Tool, DAT), based largely on coding principles, to examine group interaction among participants in online threaded discussions and found that participants are unlikely to post disagreements in response to position statements or arguments. While these
researchers have developed models and means of interaction analysis for online environments, these models usually do not address the possibility of examining social orderliness, such as power relationships, in online talk. We suggest here that CA could fill this gap by providing more intricate details of interaction in an online environment. For example, Panyametheekul and Herring (2003) used CA to examine gender and turn allocation in a Thai chat room and found that gender interacts with culture online in complex ways, such that Thai females appear to be relatively empowered in Thai chat rooms. Insights revealed through CA provide useful information about the mechanisms through which social interaction actually takes place.
SIX ANALYTIC PASSES OF CA ON ONLINE INTERACTIONS

As discussed above, various analytic methods have been developed and deployed to understand different aspects of online learning environments. Virtual asynchronous online interactions, while different from the usual face-to-face interactions, are nevertheless a result of human activity. Analytic methods employed to study these interactions will hence need to provide insights into how the interactions are conceived and maintained by each participant such that learning and knowledge are synthesized. As discussed earlier, most existing analytic methods aim to reveal what is going on in the online interaction and what is being transacted (Table 1).
Little attention is given to the underlying reasons for how these virtual online interactions are accomplished through the social order within the community. We argue that knowledge of both what is going on and how events are accomplished is important. As such, we use six examples to present how CA could be used to illuminate how online learning interactions are accomplished. In this section, we use Freebody’s (2003) six analytic passes to illustrate how CA may be applied to understand the social order in online learning environments. These six analytic passes are: (1) turn taking, (2) building exchanges, (3) parties, alliances and talk, (4) trouble and repair, (5) preferences and accountability, and (6) institutional categories and the question of identity. Analysis in each pass is illustrated and elaborated with examples. The focus here is on how analysis using CA can be carried out rather than on the results of the analysis; hence, six different examples that best illustrate how analysis can be carried out for each of the six passes are chosen from different data sets rather than from a single transcript.
Turn Taking Structures

In studying turn taking structures in a face-to-face interaction, we are interested in uncovering how turns are taken and allocated. In many instances, turn-taking is not a random event and the system of turn-taking, while invisible to many of us, works in a systematic way to maintain the basic orderliness of any interaction. In this section, we illustrate how analyzing turn taking structures
Table 1. Existing analytic methods for online interactions

| Analytic method | What is revealed |
| --- | --- |
| Content analysis | Participants’ moves, knowledge and thoughts. Development of ideas through the interaction. |
| Social network analysis | The relevant manifested ties between the participants. |
| Discourse analysis | Different variants of discourse analysis reveal different aspects of learning. For example, it can reveal the growth in vocabulary of participants as they interact with each other through time. |
reveals the organization of conversations in an online environment, given its unique affordances when compared with face-to-face interaction. In a face-to-face conversation, the parties involved in the talk will speak when they come across signals such as a pause from the speaker, a gaze directed from the speaker to the person nominated to take over the turn of talk, or an explicit invitation from the speaker to speak, or they may simply use speech markers like “may I interrupt” to indicate their desire to take over the conversation. The orderliness of having only one party speaking at any one time is important as it allows the rest of the participants to concentrate and listen to what is being said. This orderliness is possible and maintained in most face-to-face situations because the participants are aware of and adhere to social norms and rules which, while unspoken, have been established and understood by all involved. The familiarity of daily face-to-face encounters has rendered turn taking and turn construction “invisible” to us. The question of interest here is whether interactions in an online learning environment follow the same rules of turn taking as face-to-face interactions. In the absence of the visible facial expressions, gaze and “directedness” of face-to-face encounters, how do participants in an online discussion or conversation know when it is their turn to talk? Surely, turn taking organization in an asynchronous online environment is not a random process with participants “speaking” or contributing whenever they like. In this section, we use the interactions among a group of eight individuals in an introductory graduate course using an online discussion forum to illustrate how turn taking structures are established such that the discussion proceeded with little difficulty. We begin the analysis by searching for clues or cues that indicate how participants determine their speaking rights.
Given the lack of facial or verbal cues to signal the transition of a turn of talk in an online environment, what evidence is available for participants to interpret a turn of talk or a turn construction move?
An online discussion demands that each member of the discussion group judge when a contribution needs to be made so that a coherent and sound discussion can proceed. In Table 2, by tracking the date and time of each contribution made by the participants in the group, we found a nonlinear turn-taking structure. The content of the conversation in Table 2 is purposely hidden to highlight the turn-taking structure. Individuals in the group can initiate a discussion (or a conversation) simultaneously across different threads, and an individual can participate in two conversations simultaneously.
Table 2. Discussion between eight individuals in a graduate course. Discussion between May, Wayne, Albert, Sam, Herman, Dorothy, Charles, and Simon

| Thread | Participant | Date | Time |
| --- | --- | --- | --- |
| 1 | Charles | 18 Nov | 16:09 |
| 1 | May | 27 Nov | 9:13:15 |
| 1 | Wayne | 27 Nov | 9:32:38 |
| 1 | Albert | 28 Nov | 18:27 |
| 2 | Sam | 27 Nov | 9:14:12 |
| 2 | May | 27 Nov | 9:26:58 |
| 2 | Albert | 28 Nov | 11:03:46 |
| 3 | Herman | 27 Nov | 9:19:51 |
| 3 | Albert | 28 Nov | 11:04:58 |
| 4 | Dorothy | 27 Nov | 9:21:56 |
| 4 | Charles | 27 Nov | 9:24:02 |
| 4 | Simon | 27 Nov | 9:26:31 |
| 5 | Simon | 27 Nov | 9:20:32 |
| 5 | Dorothy | 27 Nov | 9:26:01 |
For example, May was involved in a “conversation” with Charles in thread 1 (taken to be one conversation group) and, at the same time, in a “conversation” with Sam in thread 2. Similarly, Dorothy was involved in a conversation with Simon and Charles (in thread 4) but was also engaged in a separate conversation with Simon in thread 5. Albert, too, was involved in different conversations at the same time: with Charles and May (thread 1), with Sam and May (thread 2) and with Herman (thread 3). This multiple turn taking pattern, not seen in face-to-face interactions, appears to be common in an online discussion environment. In a face-to-face interaction, a participant can be physically present only at a defined time and space to speak to one group of people; in contrast, an online environment provides the affordances for an individual to participate in two conversations simultaneously. More detailed analysis of the discussion threads also revealed that within each thread, it is possible to have more than one reply to one message at different times. In thread 1, for example, May, Wayne and Albert individually responded to Charles’ message at different times. Similarly, in thread 2, May and Albert responded to Sam directly one day apart. In thread 4, Charles responded to Dorothy while Simon, instead of responding to Dorothy, responded to Charles. This form of taking up a turn of “talk”, immediate or delayed, is only possible because an online environment is asynchronous. Compared with a face-to-face environment, delayed and multiple turn taking within a single thread (conversation) is a unique feature of an online discussion. Participants can decide when, and in which thread of discussion, to participate. In online interaction, how do participants know if a turn of talk has ended?
While participants in a face-to-face interaction rely on verbal and visual cues like gesture to indicate the end of a turn of talk, participants in an online interaction rely on the visual cue of a posted note. Once a note is posted by a particular participant and appears on the screen of other participants, the others perceive that the turn of talk has ended and that they are free to respond. In the example cited, it is important to note that there is little evidence of multiple responses of similar content by different participants to a posted note. We postulate that the asynchronous nature of an online environment allows for thinking time, and hence occasions of simultaneous, multiple responses to a single post are not frequent. Participants can read other notes, wait and think through what they would like to “say” before taking up a “turn of talk”. Finally, we consider the evidence or cues that allow participants to realize that the discussion has concluded. From Table 2, discussion appears to cease when no new ideas are posted by members of the group and no member volunteers to start a new topic of discussion.
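The turn-taking observations drawn from Table 2 can also be checked mechanically. The following is a minimal sketch, not the procedure the authors used: it encodes the thread membership and timestamps given in Table 2 and reports which participants contribute to more than one thread and how long each thread ran. Table 2 gives no year, so `YEAR` is an arbitrary placeholder, and the names `ts`, `THREADS`, `threads_per_participant` and `thread_span` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

YEAR = 2000  # placeholder assumption: Table 2 does not state a year


def ts(day, clock):
    """Parse a Table 2 time such as (27, '9:13:15') or (18, '16:09') in November."""
    parts = [int(p) for p in clock.split(":")]
    while len(parts) < 3:  # some entries in Table 2 omit the seconds
        parts.append(0)
    return datetime(YEAR, 11, day, *parts)


# Thread number -> list of (participant, timestamp), copied from Table 2.
THREADS = {
    1: [("Charles", ts(18, "16:09")), ("May", ts(27, "9:13:15")),
        ("Wayne", ts(27, "9:32:38")), ("Albert", ts(28, "18:27"))],
    2: [("Sam", ts(27, "9:14:12")), ("May", ts(27, "9:26:58")),
        ("Albert", ts(28, "11:03:46"))],
    3: [("Herman", ts(27, "9:19:51")), ("Albert", ts(28, "11:04:58"))],
    4: [("Dorothy", ts(27, "9:21:56")), ("Charles", ts(27, "9:24:02")),
        ("Simon", ts(27, "9:26:31"))],
    5: [("Simon", ts(27, "9:20:32")), ("Dorothy", ts(27, "9:26:01"))],
}


def threads_per_participant(threads):
    """Map each participant to the set of threads he or she posted in."""
    membership = defaultdict(set)
    for thread_id, posts in threads.items():
        for name, _ in posts:
            membership[name].add(thread_id)
    return dict(membership)


def thread_span(posts):
    """Time between the first and the last post in one thread."""
    times = [t for _, t in posts]
    return max(times) - min(times)


membership = threads_per_participant(THREADS)
# Participants who take turns in more than one thread.
multi = {name: sorted(ids) for name, ids in membership.items() if len(ids) > 1}
for name, ids in sorted(multi.items()):
    print(name, "posts in threads", ids)
for thread_id, posts in THREADS.items():
    print("thread", thread_id, "ran for", thread_span(posts))
```

A tally like this only locates multiple and delayed turn taking. Interpreting how participants treat a posted note as a completed turn still requires reading the notes themselves.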
Building Exchanges

Moving from analyzing single turns of talk, we now illustrate how these turns of talk can be analyzed to ascertain how exchanges are built. Building exchanges is about establishing the mutually understood character of the exchanges that will take place, and who will take responsibility for sustaining what courses of action as they play their various parts. We look for the ways in which turns at talk are co-ordinated into larger collections, and directed at particular topics or tasks within an interaction. As analysts, we do not have access (and usually have no means of gaining access) to the internal thought processes and intentions of each participant. What is accessible and visible to us are the ideas that a participant deliberately makes available and his or her reactions to how others have responded to his or her apparent intentions. Examining how exchanges are built hence offers insights into the participants’ (speakers’) internal intentions
as compared to merely looking at a single turn of talk. Using this analytic pass, we can gain a better understanding of the interactions that occur during the learning process. Emphasis on the process of interaction helps increase understanding of what actually goes on in an online discussion to bring about learning. It complements current methods of examining online learning, such as content analysis, that focus largely on the products. In Table 3, a group of four boys (grade five) has just visited the school pond and made some observations about the living things found there. After their visit to the pond, they were tasked to discuss the ecology of the school pond on Knowledge Forum, an online discussion forum. The discussion is facilitated by the teacher, Angel, who posted a note in turn 5. In Table 3, phrases in angle brackets (e.g., <My Theory>) are customizable scaffolds in the form of sentence openers available in Knowledge Forum. We begin analyzing for exchanges by examining each turn of talk and relating it to the earlier and later turns of talk. Examining how each turn of talk relates to other turns of talk offers insights into how the specific turn of talk is “heard”, “read” or “understood” by the participants of the discussion. In the analysis of the group discussion between William, Sherman, Sydney, Robert and Angel, it is evident that every participant knew their role as a contributor to the discussion (turns 2-4). The discussion was sustained because each member of the group understood their individual responsibility to contribute positively by relating to the pond visit they had just experienced. When the discussion appeared to have ceased after turn 4, with no new ideas suggested by the students, Angel (the teacher) read this as a signal for her to take over the “turn of talk”, and she did so by posing a question to stimulate further discussion.
Sydney responded appropriately to Angel’s post, continuing the discussion by posting his answer to her question. The discussion was hence sustained because the participants were aware of their responsibilities and the goal of the discussion they were involved in; they were also able to understand and interpret each contribution by fellow participants accurately. From the use of relevant scientific terms like “physical environment”, “carbon dioxide” and “organisms”, it was evident that the participants, who were novices to pond ecology, understood this as a learning event and were trying to make sense of what they had observed in the pond using scientific language. The participants mutually built on each other’s exchanges to craft this learning event. The analysis of building exchanges further asks how the discussion is sustained, or can be sustained, so that the learning event actually takes place. From Table 3, we notice that the learning event is sustained by four key features: (1) Sherman building on William’s idea of turtles, small fishes and survival of organisms in the pond to complete the idea suggested by William, (2) Sydney agreeing with the ideas presented by William and Sherman, (3) Robert disagreeing with what Sydney has presented and presenting his own point of view and (4) the social norm that every member makes a contribution to the discussion, which prompted every participant to contribute. Hence, in this case, the online discussion was sustained because (1) there is a diversity of ideas among the participants, leading to some agreement and disagreement among some participants, (2) there are contributions of ideas which appear incomplete so that members of the discussion can build on each other’s ideas and (3) there is a clear understanding by the participants of what the learning event is and their roles in the discussion.
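The point at which the discussion "appeared to have ceased" can be located from the timestamps before the turns are read closely. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it assumes the six timestamps in Table 3 run from turn 1 (earliest) to turn 6 (latest), uses an arbitrary placeholder year because none is given, and treats any silence longer than 24 hours as a lull. The threshold and the names `TURNS`, `gaps` and `lulls` are hypothetical.

```python
from datetime import datetime

YEAR = 2000  # placeholder assumption: Table 3 does not state a year

# (turn number, participant, timestamp), assuming the Table 3 times are
# assigned to turns 1-6 in chronological order.
TURNS = [
    (1, "William", datetime(YEAR, 5, 14, 16, 14, 7)),
    (2, "Sherman", datetime(YEAR, 5, 14, 16, 14, 8)),
    (3, "Sydney", datetime(YEAR, 5, 14, 16, 16, 6)),
    (4, "Robert", datetime(YEAR, 5, 14, 16, 18, 3)),
    (5, "Angel", datetime(YEAR, 5, 16, 11, 17, 8)),
    (6, "Sydney", datetime(YEAR, 5, 21, 16, 8, 40)),
]


def gaps(turns):
    """Pair every turn after the first with the time elapsed since the previous turn."""
    return [(cur[0], cur[1], cur[2] - prev[2]) for prev, cur in zip(turns, turns[1:])]


def lulls(turns, threshold_hours=24):
    """Return (turn number, participant) for turns that follow a long silence."""
    limit = threshold_hours * 3600
    return [(n, who) for n, who, gap in gaps(turns) if gap.total_seconds() > limit]


for number, who, gap in gaps(TURNS):
    print(f"turn {number} ({who}) came {gap} after the previous turn")
print("turns that follow a lull:", lulls(TURNS))
```

A long gap only marks where to look. Whether a participant read the silence as a signal to take over the turn of talk, as the analysis suggests for Angel, is established from the content of the turn that follows.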
Parties, Alliances and Talk

When a conversation takes place in an institution and involves more than one individual, it is likely that the conversation does not proceed in a linear fashion (Mazur, 2004). In this analytic pass, we are interested in the underlying reason leading to
Table 3. Discussion between four boys and their teacher. Discussion between William, Sherman, Sydney, Robert and Angel (Teacher)

| Turn No. | Participants | Sequence of time | What is discussed | Remarks |
| --- | --- | --- | --- | --- |
| 1 | William | 14 May, 16:14:07 | Why does the turtles and small fishes stay on one side while the bigger fishes stay on the other side? <My Theory> I think it is because of the low bridge or the smaller fishes are afraid of getting eaten. | Initiates the discussion by posing a question and giving his theory about the phenomena he observed. |
| 2 | Sherman | 14 May, 16:14:08 | <My Theory> The turtle eats some small animals or plants in the pond in the order to survive in the pond. | Continues with the discussion by contributing his ideas. He is playing the role of a fellow discussant. |
| 3 | Sydney | 14 May, 16:16:06 | I think that the function of the organism is to feed on other plants and on other things in the pond. | Acknowledges the earlier contributions and builds upon it. Also plays role as a fellow discussant. |
| 4 | Robert | 14 May, 16:18:03 | I think that the organisms’ job is to be fed by other animals which live in the pond. | Acknowledges the earlier contributions and builds upon it. Also plays role as a fellow discussant. |
| 5 | Angel | 16 May, 11:17:08 | whether the organisms found in and around the pond depend on each other for survival? | Noticed that discussion has ceased with the lack of entries by the group. Took over the “turn of talk” and asked a question to stimulate discussion. |
| 6 | Sydney | 21 May, 16:08:40 | <My Theory> The living things in the pond such as the water plants give out oxygen on day time for the physical environment and the living things in the pond. The living things in the pond also produce carbon dioxide for the plants to breathe during day time. Fishes and turtle also like to live on the place with low turbidity so that they can see clearly through the water but they also like to live in the high turbidity so that they can hide from their enemies. | Took the responsibility to sustain the discussion by responding to Angel’s question. |
the non-linear interaction, specifically, how the different parties organize themselves, based on their knowledge, interests, and institutional roles, into groups or meaningful categories. Similarly, in online environments, understanding how parties and alliances are formed between different groups of individuals would help us understand how learning takes place (or fails to take place). In Table 4, we analyze a conversation between six individuals (Yong, Xen, Tim, Yoyo, Tom and Lerk) on the topic of water pollution in their homes. We present nine turns of talk as an illustration of how alliances can be formed in an online discussion. As stated in the earlier section, this analysis begins with the examination of each turn of talk, and a decision is made on the purposes of each turn of talk (see the Analysis column in the table). Because, as analysts, we are unable to “get into the heads” of the speakers, it is important to take the earlier and the next turns of talk into consideration during the analysis, since what and how each utterance is heard usually serves as the warrant for the interpretation of the purposes of each turn of talk. Once the purposes of each turn of talk are determined, we check for patterns in order to cluster the turns of talk by what each cluster accomplished in the interaction event. Detailed analysis of each turn of talk is found in the last column of Table 4. Overall, the analysis of each turn of talk can then be composed into a coherent description of the event in forming alliances. The nine exchanges in this example accomplished the following:

• Turn 1: Posing the problem in the form of three questions.
• Turn 2: Possible answers to the questions suggested.
• Turns 3-5: Parties coming together to form alliances based on their personal experiences.
• Turns 6-9: Other alliances are being formed as there are alternative views.
From this example, we see three possible principles of forming alliances between different parties in an online discussion: (1) explicitly stating, identifying or sharing one’s own experiences with those of another person (as in the case of Tim identifying with Yong in turns 3 and 4); (2) responding to questions that are posted online and using the questions to present a different point of view (see Lerk in turn 8); and (3) restating or repeating a position presented earlier with a rebuttal (as shown by Yoyo in turn 9).
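The clustering step described for this pass, in which each turn is first assigned a purpose and adjacent turns with the same purpose are then grouped, can be sketched in a few lines. This is only an illustration: the purpose labels paraphrase the four clusters reported for the nine turns, and both the labels and the names `PURPOSES` and `cluster` are hypothetical rather than the authors' coding scheme. The interpretive work of deciding each turn's purpose is done by the analyst beforehand and is not automated here.

```python
from itertools import groupby

# Turn number -> purpose assigned by the analyst (hypothetical labels that
# paraphrase the four clusters reported for the nine turns).
PURPOSES = {
    1: "pose problem",
    2: "suggest answers",
    3: "ally on shared experience",
    4: "ally on shared experience",
    5: "ally on shared experience",
    6: "ally on alternative view",
    7: "ally on alternative view",
    8: "ally on alternative view",
    9: "ally on alternative view",
}


def cluster(purposes):
    """Group consecutive turns that share a purpose, keeping turn order."""
    ordered = sorted(purposes.items())  # sort by turn number
    return [
        (purpose, [turn for turn, _ in group])
        for purpose, group in groupby(ordered, key=lambda item: item[1])
    ]


for purpose, turns in cluster(PURPOSES):
    print(f"turns {turns}: {purpose}")
```

Grouping only consecutive turns, rather than all turns with the same label, keeps the sequence of the interaction visible, which matters because each turn is interpreted against the turns before and after it.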
Trouble and Repair
Trouble-and-repair is likely to be one of the most important and interesting analytic passes. When speakers realize that their contribution to the conversation does not yield the response they expected, and the conversation breaks down, corrective mechanisms kick in to repair and restore social order. Here we examine how trouble starts, how it is recognized and how it is repaired. We will also examine the consequences of failing to recognize interactional trouble. In Table 5, we examine a case where individuals who do not understand the norms, or are unfamiliar with the rules of contribution in an online discussion, attempt to carry out a meaningful and productive discussion. In this exchange between a group of five Grade 5 girls (Angel, Yvonne, Helen, Mandy and Mei), we illustrate how interactional trouble can manifest itself and how repair can be carried out to allow successful and meaningful learning to take place. In this example, the girls had just been to the school pond to observe the ecology of the organisms found there. On their return, they were tasked with discussing the inter-dependence of the organisms found in the school pond. The analysis of a transcript for trouble and repair is similar to the passes described previously: each turn of talk is analysed for the purposes accomplished in relation to earlier and later turns of talk (see the Remarks column).
Table 4. How can water from the tap become yellow? Discussion between Yong, Xen, Tim, Yoyo, Tom and Lerk

Turn 1. Yong, 22/2 20:48:41
What is posted: What problem will this pose? What are the causes of this problem? What are we talking about here?
Analysis: Posing questions concerning dirty water to the group.

Turn 2. Xen, 23/2 14:02:58
What is posted: The water will be dirty & we can’t use it. I think the water was not treated properly.
Analysis: Replying to questions posed by Yong. Attributed the poor quality of the water to the poor water treatment process.

Turn 3. Yong, 23/2 15:52:37
What is posted: Has that ever occurred to you at home? Besides water not being treated properly, what could have caused the water to turn yellow?
Analysis: Yong asked another question in response to Xen’s answer. This is a question pertaining to personal experiences with dirty water. Yong further asked if there were any other reasons for water turning yellow.

Turn 4. Tim, 23/2 20:49:21
What is posted: no but if it does i would freak out
Analysis: In response to Yong’s question on occurrence at home, Tim indicated that he would be horrified if it ever occurred to him.

Turn 5. Yoyo, 23/2 21:33:15
What is posted: it did not occur at my house. disgusting yellow water will only occur when the owner does not take care of the tap properly.
Analysis: Yoyo readily agreed with Tim’s stand and supported Tim’s discomfort with yellow water by saying that it had never occurred to him and using the word “disgusting” to indicate a distaste for dirty water similar to that expressed by Tim. Here, Yoyo further added that dirty water could possibly be caused by the owner’s neglect of the tap. Here we see Tim and Yoyo forming an alliance based on their similar opinion of dirty water. This is done by building on each other’s ideas positively.

Turn 6. Tom, 23/2 21:43:18
What is posted: but i think even we care the tap still we have the yellow water as it is the water that is polluted not the tap
Analysis: Tom took a different stand from Yoyo and Tim by starting his posting with the word “But”. He disagrees that the owner and the tap could possibly be the cause of polluted water. He countered Yoyo by stating that water can still be polluted even if taps were cared for.

Turn 7. Yong, 24/2 18:25:18
What is posted: I need to understand What do you mean when owners don’t take care of the taps properly? Is that the only reason why the water turned yellow?
Analysis: Yong built on Tom and Yoyo’s idea about owners taking care of taps and questioned the relationship between care of taps and water turning yellow. Yong is not yet forming alliances with either the “tap-is-not-cared-for” group or the “water-is-originally-polluted” group.

Turn 8. Lerk, 24/2 18:47:27
What is posted: No. It is not that the taps are not taken care of. It’s just that the water has been too polluted and it shall affect the water from the tap.
Analysis: Lerk, another member in the discussion, showed his alliance with the “water-is-originally-polluted” group by providing an answer to Yong’s question.

Turn 9. Yoyo, 24/2 19:07:05
What is posted: no, i only want to understand why they wanted to waste water
Analysis: Yoyo maintains his stand of “tap is not cared for”, and hence water became polluted, by offering a query as to why polluted water is allowed into taps, leading to a waste of water.
Conversation Analysis as a Tool to Understand Online Social Encounters
Table 5. Inter-dependence of organisms in the pond. Discussion between Angel, Yvonne, Helen, Mandy and Mei

Turn 1. Angel, Yvonne, Helen, Mandy, Mei (posted using Angel user ID); 14 May, 15:03
What it discussed: [a]The living things in the pond are, koi fish, tadpole, frogs, water lily, duckweed, terrapin, lotus and turtles.[b]They depend on one another for food.
Remarks: Group entry => multiple parties opinion. Participants are uncertain how to start the conversation/discussion online.

Turn 2. Angel; 14 May, 15:57
What it discussed: We found, about 4 terrapins, lots of water lettuces, some duckweed, small fishes in the pond.
Remarks: 50 mins later, Angel, appeared to be the group leader, took the lead in the discussion, speaking as a “collective” group with the use of the pronoun “We”.

Turn 3. Mandy; 14 May, 16:15:01
What it discussed: <My Theory> Terrapins need water to survive so they live in places with water like reservoirs, oceans, lakes, fountains and ponds. However they lay their eggs on land.
Remarks: Started the repair mechanism to contribute her personal view, so that personal voice would not be subsumed in the group leader’s note.

Turn 4. Angel; 14 May, 16:15:33
What it discussed: <My Theory> The water lettuce takes in carbon dioxide and releases oxygen through the process of photosynthesis.
Remarks: Angel realized the interactional trouble and started to “repair” her contribution by stating her own theory rather than speaking for the group.

Turn 5. Yvonne; 14 May, 16:15:42
What it discussed: <My Theory> The water plants is important to the pond is because it provides food for other animals. but if the water plants are gone the animals such as terrapins or turtles will be hungry and might even die. Can you imagine a pond without water plants? It will be ugly but only animals at the bottom.
Remarks: Yvonne “caught” on the rules or norms of how the discussion ought to proceed and offered her theory. Indicative of successful repair in the interaction.

Turn 6. Helen; 14 May, 16:16:43
What it discussed: <My Theory> My theory is that the terrapin will feed on the water lettuce to get food.
Remarks: Helen, another member, also realized her role to contribute to the discussion and following the norms and rules of the discussion is able to do so successfully.

Turn 7. Mei; 14 May, 16:22:08
What it discussed: <My Theory> my theory is water lettuce acts as a shelter for the small fishes.
Remarks: Another evidence that the new norm and rule is accepted.
The purposes and warrants from the relevant turns of talk are then interpreted and linked together to form a narrative of the trouble and repair in the interaction. In this example, the first entry of this online discussion was a group entry posted using Angel’s user ID (refer to turn 1). This group opinion is unusual, as the forum is based on the principle that members of the discussion group contribute their individual ideas so that these can be improved upon collectively. It appeared here that the participants were somewhat uncertain how to start the conversation/discussion online. Some 50 minutes after the group note, Angel, supposedly the group leader, decided to take the lead in the discussion, since neither the members of the group, the teacher, nor members of other discussion groups had posted a response or comment on the note. In her note in turn 2, she presented her ideas as a “collective” using the pronoun “we”. She appeared to be speaking on behalf of all the members in the discussion group. We treat this as interactional “trouble” since Angel was not adhering to the norms and rules of this discussion forum. If she “speaks” and presents ideas for the group, then the discussion cannot proceed, as the voices of the other members of the group will not be “heard”. Further, members of her discussion group may not agree with Angel’s ideas. Mandy, upon reading Angel’s contribution, self-nominated to present her personal views about the biology of terrapins. While she built upon Angel’s mention of terrapins, she did not pursue the reporting of numbers directly. Rather, she presented her idea about the biology of terrapins, probably in an effort to shift the discussion towards the idea of interdependence of organisms (there is no direct evidence in the discussion to conclusively attribute this intention).
Mandy’s entry is interpreted as a repair move to (1) shift the discussion from Angel’s collective representation to one that allows other members to present their views, and (2) ensure a topic shift to one aligned with the requirement of the discussion. This repair mechanism was understood by Angel, as is evident in her entry in turn 4: on reading Mandy’s note, she realized where the interactional trouble lay. She “repairs” her contribution by stating her own theory rather than speaking for the group. Observing how Mandy and Angel had repaired and proceeded with the discussion, Yvonne “caught on” to the rules or norms of how the discussion ought to proceed and hence offered her theory. This is indicative of successful repair in the interaction. Helen, as a member of the group, also realized her role to contribute to the discussion and, following the norms and rules of the discussion, was able to do so successfully. Mei’s contribution to the discussion provided yet more evidence that the original “trouble” of unfamiliarity with online discussion norms, as exhibited by Angel, had been repaired by Mandy.
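The shift from a collective to an individual voice that marks this repair can be made visible with a small Python sketch. It is only a surface heuristic under stated assumptions: a post that opens with the forum's "<My Theory>" marker is treated as an individual voice, and a post that opens with "We" as a collective voice. The function name and category labels are hypothetical, and the heuristic is no substitute for reading each turn against its neighbours as described above.

```python
def voice_of(post: str) -> str:
    """Label a post 'individual', 'collective' or 'unmarked' (surface heuristic)."""
    text = post.strip().lower()
    if text.startswith("<my theory>"):
        return "individual"
    if text.startswith("we "):
        return "collective"
    return "unmarked"


# Turns 2-4 of Table 5: (turn, speaker, first sentence of the post).
posts = [
    (2, "Angel", "We found, about 4 terrapins, lots of water lettuces, "
                 "some duckweed, small fishes in the pond."),
    (3, "Mandy", "<My Theory> Terrapins need water to survive so they live in "
                 "places with water like reservoirs, oceans, lakes, fountains and ponds."),
    (4, "Angel", "<My Theory> The water lettuce takes in carbon dioxide and "
                 "releases oxygen through the process of photosynthesis."),
]

voices = [(turn, speaker, voice_of(text)) for turn, speaker, text in posts]
for turn, speaker, voice in voices:
    print(f"Turn {turn} ({speaker}): {voice}")
```

A change of label between adjacent turns only flags a place worth examining; whether it counts as trouble or repair remains the analyst's interpretation.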
Preferences and Accountability
Any conversational exchange is formed by the contributions of each speaker in logical and coherent pairs: question-answer, charge-rebuttal and so on. Speakers are accountable for providing the appropriate corresponding part when they are presented with the first part of a recognizable pair. For example, for invitations, acceptances are preferred over rejections, and for ideas or suggestions contributed, agreements and praises are preferred over counter-arguments. Analysis of preferences and accountability is commonly linked to formation of alliances, and to trouble and repair. When a dis-preferred response surfaces, participants in the interaction will respond in such a way as to show that the response given is not acceptable. Hence, studying preferences and accountability in an online environment will reveal the acceptable social norms within each online community as well as the relationship and rapport between different individuals. It will also reveal how participants hear and interpret preferences and subsequently use their turn of talk to present their own preferences using different accounting mechanisms.
Table 6. Wait a minute. Dialogue between Timothy and Yvonne

Turn 1. Timothy, 22/3 21:45:59
What was discussed: Don’t think that is the only reason. it might be true that the water turned yellow because the owners do not take care of their taps, but that is not likely. There might be a lot of different answers for this question. For example, it might be that the water is polluted, or that the material of the pipe rusts and wears off and drop into the water, causing it to turn yellow. What reasons other than the above mentioned three do you know?
Analysis: Timothy presents his point of view by adding another reason to a previous turn of talk and also partially agreeing with what was presented earlier.

Turn 2. Yvonne, 12/3 20:25:32
What was discussed: I don’t think the owner would take care of their taps. I think they are polluting water.
Analysis: Yvonne offered her opinion in response to Timothy.

Turn 3. Timothy, 12/3 20:48:55
What was discussed: Wait a minute, or a second. I thought that if the owners do not take care of their taps and pipes, buyers will complain and that will harm the owner’s reputation, right? But even so, the water turning yellow does not have any connection with the quality and material of the pipes, right? The most important reason is that people pollute the water, turning them yellow, right? (Sorry if my blog has so many ‘’right?’’ Correct me if I am wrong, ok?)
Analysis: “Wait a minute, or a second” shows that Yvonne’s reply is a dis-preferred answer; Timothy had expected Yvonne to agree with him. Timothy realized that his contribution style may be dis-preferred by members; softened his tone and asked to be corrected.
Table 6 is an excerpt of a “dialogue” taken from a discussion on cleanliness of water between two persons who are members of a larger discussion group of six persons. Analysis for preferences and accountability is tightly linked to trouble and repair and shares similar procedures. The two passes differ in the intent of the analysis and in the narrative that is generated. In turn 1 of Table 6, Timothy presented his point of view by adding another reason to a previous turn of talk that had suggested a reason for tap water turning yellow. While presenting his view, Timothy also partially agreed with what was presented earlier by using the phrase “it might be true that...”. Following Timothy’s suggestion that owners are unlikely to neglect the care of their taps, Yvonne offered her opinion in turn 2, which indicated that she did not agree with Timothy. Here she used “I don’t think...” to express her difference in opinion. Timothy saw Yvonne’s reply as a dis-preferred answer, as can be seen from the expression “Wait a minute, or a second” in turn 3. The use of this expression suggests that Timothy had expected Yvonne to agree with him that the owners are responsible for the water turning yellow, but Yvonne gave an opinion that did not align with his, hence the need to pause and clarify his stance. Timothy was also conscious that his contribution style might be dis-preferred by members of the group, and hence he apologized for his style and asked to be corrected. This apology appears within parentheses and not as part of the main text, since it does not contribute to the development of ideas in the main discussion of polluted tap water. In analyzing for preferences and accountability, the subtleties of what is expected and accepted socially online can be revealed.
Insights into how participants responsible for providing the second parts of pairs fail to do so, or do so in a manner that is not acceptable, will reveal how discussions break down. By studying the systems of preferences in online discussions and what happens when dis-preferred responses are made, facilitators and participants of online discussions will be able to engage in more productive discussion. Further, when participants account for their contributions in a dis-preferred manner, analysis can lead to the diagnosis of trouble, so that repair mechanisms can be activated. In practice, analysis of preferences and accountability, when coupled with analysis of trouble and repair, helps to illuminate online social norms and orderliness.
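The pairing of first parts with preferred and dis-preferred second parts can be written down as a small lookup, sketched here in Python. The two pair types and their responses come from the examples given in this section (invitations and contributed ideas); the dictionary keys, the function name and the analyst's coding of Yvonne's turn 2 in Table 6 as a counter-argument are hypothetical illustrations, not an established coding scheme.

```python
# First-pair parts mapped to preferred and dis-preferred second-pair parts,
# following the examples in the text. Key and label names are illustrative.
PREFERENCES = {
    "invitation": {
        "preferred": {"acceptance"},
        "dispreferred": {"rejection"},
    },
    "idea": {
        "preferred": {"agreement", "praise"},
        "dispreferred": {"counter-argument"},
    },
}


def classify_response(first_part: str, second_part: str) -> str:
    """Return 'preferred', 'dispreferred', or 'unknown' for a coded pair."""
    entry = PREFERENCES.get(first_part)
    if entry is None:
        return "unknown"
    for label, responses in entry.items():
        if second_part in responses:
            return label
    return "unknown"


# Table 6: Timothy's turn 1 coded as an idea, Yvonne's turn 2 as a counter-argument.
print(classify_response("idea", "counter-argument"))
```

Which responses count as preferred is itself community-specific, so in practice such a table would be an outcome of the analysis rather than its starting point.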
Institutional Categories and the Question of Identity
The institutional authority and the identity adopted by each speaker in any interaction will depend on what is said and how it is said. As such, analysis for institutional categories and identity would reveal how the various participants orient themselves in the interactions. Participants, particularly in institutions, orient themselves with different identities, and as a result what is said by different individuals carries different meanings. For example, a rejection “No” uttered by different individuals could be a preferred completion of a pair (as in a question by a teacher to her students about an additional lesson) or it could be a dis-preferred completion (as in a request by a teacher to her students to hand in their homework). In this analytic phase, we examine turns of talk to surface each speaker’s identity and the power relations that exist between the speakers. The example cited below (taken from Tan and Tan, 2006) shows the interaction between a teacher and five students as they work together to construct knowledge of convection currents. The students had observed that when they placed a candle below a piece of paper cut out in the shape of a spiral, the spiral moved. Examine Table 7 and consider the nature of the interaction taking place. What are the roles and identities held by each of the participants such that this learning event is possible? Some participants contribute their ideas online in the form of questions (turns 1, 3 and 5), while others contribute in the form of comments on and discussion of those questions in subsequent moves. Examining the nine turns of talk, it is appropriate to ask who is playing the role of the teacher and who the students are. In a traditional teacher-student relationship and interaction, the institutional responsibility, authority and power to decide and dictate learning activities reside with the teacher, while students typically “play” the role of students by responding to the teacher’s questions. In Table 7, it is possible that Edmund, Ken or Paul could be the teacher, since their turns of talk question ideas presented in previous turns and these questions directed the course of the discussion. It is important to note that the affordances of the online discussion allow for renegotiation of the traditional roles in teacher-student interaction. This online discussion revealed that the traditional institutional power and responsibilities bestowed upon the teacher are somewhat distributed to the students, in this case the participants of the discussion. The students, Cindy, Ken, Paul, John and Mark, are free to express their views. Edmund, the teacher, did not exert control over the speaking rights of the students. The students are given the liberty to decide when they want to contribute their ideas or comments. They regulated each other’s contributions, with Edmund facilitating the discussion by posing a question at the start. In fact, Ken (in turn 3) and Paul (in turns 5 and 7) exhibited “teacher-like” traits in their turns of talk by asking questions to challenge ideas presented by other participants, likely to promote more in-depth thinking about the ideas presented. Analysis of institutional power and identity in an online discussion makes concrete evidence of control over learning events visible. This helps to illuminate the interactions taking place and to inform decisions made about learning.
Table 7. Why the spiral moved? Discussion between Edmund, Cindy, Ken, Paul, John and Mark
1. Edmund: Explain how the burning candle caused the spiral to move.
2. Cindy: <My theory> is that the smoke from the flames causes the spiral to move. The flame’s smoke is thick, so when the smoke rises, and the wind blows the smoke, the wind will blow the smoke and push it causing the spiral to move. if the wind affects the spiral’s movement.
3. Ken: how does the smoke cause the spiral to move? Do you mean the heat and light energy from the flame instead?
4. Cindy: <My theory> the smoke makes the air expands, then the wind pushes the air upwards causing the spiral to coil and turn up.
5. Paul: where did the wind come from??? <My theory> is that there is ‘no wind’ and the heat given out by the spiral causes the spiral to coil and turn up.
6. Cindy: <My theory> is that the air comes from our surroundings.
7. Paul: is that if wind is included in the experiment, it would not be a fair test.
8. John: <My theory> is that the wind did not come from the surrounding but from beneath. It did not come from the surrounding as when we did the experiment, we blocked the wind from the surrounding but not the wind beneath. how the wind move from underneath the spiral.
9. Mark: Then it was not a fair test. The experiment was conducted to find out how the heat from the candle interacting with the spiral to make it move not the surrounding air.
FUTURE RESEARCH DIRECTIONS
Publications thus far have focused on CA as a method for studying face-to-face social interactions. With growing interest in understanding how the orderliness of online interactions is formed and maintained, it is not surprising that researchers of online learning are turning to methods traditionally used by social scientists to understand emergent trends in online interactions (for example, Panyametheekul & Herring, 2003). Future research using CA as a method could include studies that examine unique characteristics of online interactions. For example, the non-linear turn-taking structure of an online environment presents opportunities for participants to take part in conversations with different groups within a short span of time. Given the different affordances of face-to-face and online environments, another area of study could be the comparison of face-to-face interactions with online interactions. For example, given the lack of visual cues in most online interactions, are there differences in trouble-and-repair mechanisms between face-to-face and online environments? It is also possible to conduct multiple-pass analysis of the same transcript to provide a more holistic picture of what goes on in an online environment. For example, content analysis can be applied to examine the levels of knowledge construction achieved by the participants, while CA can be used to reveal the possible processes leading to those outcomes. Analytic passes like parties and alliances can be used to understand whether and how power in real life influences interactions in an online discussion. With the growing popularity of environments like Second Life, the translation of such power interplay to an online environment needs to be understood. Similarly, in classrooms where the use of online learning is popular, researchers can examine whether the interactions in classrooms have spill-over effects on interaction in the virtual environment and how these effects can be encouraged or reduced, depending on the outcomes of these interactions. CA is an analytic method that is interpretative in nature. Methodological issues such as objectivity, reliability, replicability and systemic coherence (Rourke, Anderson, Garrison and Archer, 2000) have been discussed extensively for methods such as content analysis. Discussion and studies of these methodological issues could help to enhance the credibility of CA and widen its acceptance among researchers.
CONCLUSION
The exploratory use of CA as an analytic tool for understanding online learning environments is exciting, as it has the potential to reveal what existing analytic methods cannot. Given the rich insights that CA has generated in face-to-face social interaction, its application to online learning environments holds great potential waiting to be discovered. In this chapter, we illustrated some of this potential by applying Freebody’s (2003) six analytic passes to transcripts taken from online asynchronous discussions. Our analysis showed several interesting phenomena in online interactions. For example, it is possible for participants to engage in several conversations with different groups of people within the same period of time. Another example is the greater power assumed by the learners in an online environment, which could be partly due to the absence of a visible authority figure (the teacher) who would otherwise be expected to direct and guide the conversations. Our examples provide only glimpses of the possibilities that CA could offer. We believe that using CA as a method for analyzing online interactions is fertile ground for research, with potential yet to be realized.
REFERENCES
Cazden, C. (1986). Classroom discourse. In Wittrock, M. C. (Ed.), Handbook of research in teaching. New York: Collier Macmillan Publishers.
de Laat, M. F. (2002). Network and content analysis in an online community discourse. In G. Stahl (Ed.), Computer support for collaborative learning: Foundation for a CSCL community; proceedings of CSCL 2002, Boulder, Colorado, USA, January 7-11, 2002 (pp. 160-168). Hillsdale, NJ: Erlbaum.
Fahy, P., Crawford, G., & Ally, M. (2001). Patterns of interaction in a computer conference transcript. International Review of Research in Open and Distance Learning, 2(1), 1–24.
Fahy, P. J. (2002). Epistolary and expository interaction patterns in a computer conference transcript. Journal of Distance Education, 17(1), 20–35.
Freebody, P. (2003). Qualitative research in education: Interaction and practice. London: Sage Publications.
Goodwin, C., & Heritage, J. (1990). Conversation analysis. Annual Review of Anthropology, 19, 283–307. doi:10.1146/annurev.an.19.100190.001435
Gunawardena, C., Lowe, C., & Anderson, T. (1997). Analysis of a global online debate and the development of an interaction analysis model for examining social construction of knowledge in computer conferencing. Journal of Educational Computing Research, 17, 397–431. doi:10.2190/7MQV-X9UJ-C7Q3-NRAG
Hara, N., Bonk, C. J., & Angeli, C. (2000). Content analysis of online discussion in an applied educational psychology course. Instructional Science, 28, 115–152. doi:10.1023/A:1003764722829
Heckman, R., & Annabi, H. (2005). A content analytic comparison of learning processes in online and face-to-face case study discussions. Journal of Computer-Mediated Communication, 10(2). Retrieved Sept 10, 2009 from http://jcmc.indiana.edu/vol10/issue2/heckman.html
Jeong, A. C. (2003). The sequential analysis of group interaction and critical thinking in online threaded discussions. American Journal of Distance Education, 17(1), 25–43. doi:10.1207/S15389286AJDE1701_3
Mazur, J. M. (2004). Conversation analysis for educational technologists: Theoretical and methodological issues for researching the structures, processes, and meaning of on-line talk. In Jonassen, D. H. (Ed.), Handbook of Research on Educational Communications and Technology (2nd ed., pp. 1073–1098). New Jersey: Lawrence Erlbaum Associates.
Mehan, H. (1983). The role of language and the language of role in institutional decision making. Language in Society, 12, 187–211. doi:10.1017/S0047404500009805
Panyametheekul, S., & Herring, S. C. (2003). Gender and turn allocation in a Thai chat room. Journal of Computer-Mediated Communication, 9(1). Retrieved Sept 10, 2009 from http://jcmc.indiana.edu/vol12/issue2/waldvogel.html
Rourke, L., Anderson, T., Garrison, D. R., & Archer, W. (2000). Methodological issues in the content analysis of computer conference transcripts. International Journal of Artificial Intelligence in Education, 12, 8–22.
Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 50, 696–735. doi:10.2307/412243
Sinclair, J., & Coulthard, M. (1992). Towards an analysis of discourse. In Coulthard, M. (Ed.), Advances in spoken discourse analysis. London: Routledge.
Tan, S.-C., & Tan, A.-L. (2006). Conversational analysis as an analytical tool for face-to-face and online conversations. Educational Media International, 43, 347–361. doi:10.1080/09523980600926374
ten Have, P. (1999). Doing conversation analysis: A practical guide. London: Sage Publications.
Walther, J. B. (1996). Computer-mediated communication: Impersonal, interpersonal, and hyperpersonal interaction. Communication Research, 23(1), 3–43. doi:10.1177/009365096023001001
Wasserman, S., & Faust, K. (1997). Social network analysis: Methods and applications. Cambridge: Cambridge University Press.
ADDITIONAL READING Black, S. D., Levin, J. A., Mehan, H., & Quinn, N. C. (1983). Real and non-real time interaction: Unraveling multiple threads of discourse. Discourse Processes, 6, 59–75. doi:10.1080/01638538309544554 Carlsen, W. (1991). Questioning in classroom: A sociolinguistic perspective. Review of Educational Research, 61(2), 157–178. Cazden, C. (1986). Classroom discourse. In Wittrock, M. (Ed.), The handbook of research in teaching (pp. 432–463). New York: Macmillan. Cherny, L. (1999). Conversation and community: Chat in a virtual world. Stanford, CA: CSLI. Collison, G., Elbaum, B., Haavind, S., & Tinker, R. (2000). Facilitating on-line learning. Madison, WI: Atwood Press. Condon, S. L., & Cech, C. G. (1996). Functional comparison of face-to-face and computer-mediated decision making interactions. In Herring, S. (Ed.), Computer-mediated communication: Linguistics, social and cross-cultural perspectives (pp. 65–80). Amsterdam: John Benjamins.
Conversation Analysis as a Tool to Understand Online Social Encounters
Drew, P. (1984). Speakers’ reportings in invitation sequences. In Atkinson, J. M., & Heritage, J. (Eds.), Structures of social actions: Studies in conversation analysis (pp. 57–101). Cambridge: Cambridge University Press. Edwards, A. D., & Westgate, D. P. G. (1987). Investigating classroom talk. Philadelphia, PA: Falmer Press. Garcia, A. C., & Jacobs, J. B. (1999). The eyes of the beholder: Understanding the turn-taking system in quasi-synchronous computer-mediated communication. Research on Language and Social Interaction, 32, 337–367. doi:10.1207/ S15327973rls3204_2 Garfinkel, H. (1967). Studies in ethnomethodology. New York: Prentice Hall. Gilbert, N., Wooffitt, R., & Fraser, N. (1990). Organizing computer talk. In Luff, P., Gilbert, N., & Frohlih, D. (Eds.), Computers and conversation (pp. 235–258). London: Academic Press. Have, P. (1999). Doing conversation analysis. Thousand Oaks, CA: Sage Publications. Heath, C. (1997). The analysis of activities in face to face interaction using video. In Silverman, D. (Ed.), Qualitative research: Theory, method and practice (pp. 183–200). London: Sage Publications. Heritage, J. (1997). Conversational analysis and institutional talk: Analysing data. In Silverman, D. (Ed.), Qualitative research: Theory, method and practice (pp. 161–182). London: Sage Publications. Hiltz, S. R. (1986). The ‘virtual classroom’: Using computer-mediated communication for university teaching. The Journal of Communication, 36, 94–105. doi:10.1111/j.1460-2466.1986.tb01427.x Hutchby, I. (2001). Conversation and technology: From the telephone to the internet. Malden, MA: Polity Press/Blackwell.
Hutchby, I., & Wooffitt, R. (1998). Conversation analysis. Cambridge: Polity Press. Jacko, J., & Sears, A. (2002). The human-computer interaction handbook: Fundamentals, evolving technologies and emerging applications. Upper Saddle River, NJ: Lawrence Erlbaum Associates. Levin, J., Kim, H., & Riel, M. (1990). Analyzing instructional interactions on electronic message nextworks. In Harasim, L. (Ed.), Online education. New York: Praeger. Luff, P., Gilbert, N., & Frohlich, D. (Eds.). (1990). Computers and conversation. London: Academic Press. Malone, M. (1997). Worlds of talk: The presentation of self in everyday conversation. Malden, MA: Polity Press/Blackwell. Mason, R. (1991). Methodologies for evaluating applications of computer conferencing. In Kaye, A. R. (Ed.), Collaborative learning through computer conferencing. Heidelberg: Springer-Verlag. Pomeranz, A., & Fehr, B. J. (1997). Conversation analysis: An approach to the study of social action as sense making practices. In van Dijik, T. A. (Ed.), Discourse studies: A multidisciplinary introduction (pp. 64–91). London: Sage Publications. Psathias, G. (1995). Conversation analysis: The study of talk-in-interaction. Thousand Oaks, CA: Sage Publications. Rice, R. E., & Love, G. (1987). Electronic emotion: Socio-emotional content in a computer-mediated communication network. Communication Research, 14, 85–105. doi:10.1177/009365087014001005 Romiszowski, A., & Mason, R. (1986). Computermediated communication. In Jonassen, D. (Ed.), Handbook of research for educational communication and technology (pp. 438–456). New York: Simon and Schuster Macmillian.
KEY TERMS AND DEFINITIONS
Accountability and Preferences: Analysis of how participants supply the second part in a sequence of pairs.
Building Exchanges: Analysis of ways in which turns at talk are co-ordinated into larger collections, directed at particular topics or tasks within an interaction.
Conversation Analysis: The study of talk-in-interactions that show the social orders, structure, patterns and influence of powers.
Institutional Category and Identity: Analysis of how participants orient their responsibilities to provide preferred second-pair parts to exchanges.
Parties, Alliances and Talk: Analysis of how different parties organize themselves through interactions into groups or meaningful categories.
Trouble and Repair: Analysis of how a conversation breaks down and the ensuing corrective mechanisms to repair and restore social order.
Turn Taking Structure: Analysis of the fundamental organization of a conversation by which the conversation is achieved through turns.
Section 4
Data and User Modelling
The massive amounts of data available in virtual communities can be used to build both simple and sophisticated models of human interaction, which in turn can be used to build systems and processes that improve our understanding of how to support better interaction in these communities. Modelling approaches in virtual communities are needed to understand complex and abstract phenomena and data structures in more concrete ways. Modelling provides the basis for measurement and for identifying key metrics such as community structure, conversion rates from readers to contributors, the type and degree of cooperation, and the social dynamics of interactions. Important properties of data models are the variables and characteristics of the models. Underlying these characteristics is the ability to operationalise and measure the variables. Variables should be used within consistent models that can in turn inform us about why certain communities are successful while others are not.
Section 4 presents four chapters dealing with user and data modelling of social behaviour, user traces, and the context of interactions in virtual communities. Chapter 15 presents work on modeling the diversity of user behavior in online communities. More specifically, the authors look at how users contribute and attend to content, and how they form social links with their peers. The chapter also illustrates the models being examined and the parameter estimation procedure. Chapter 16 argues for the need to understand the role of context in knowledge-based systems. The chapter shows the relationships between explanation and context and presents different types of explanations in the contextual-graphs formalism. The chapter also discusses a case study of collaborative answer building. Chapter 17 describes and discusses an agent-based modelling system. The chapter describes an agent that acquires domain knowledge content from a learning history log database in a learning community and automatically generates motivational messages for the learner. Chapter 18 presents Bayesian network techniques for modelling complex social systems. It illustrates the use of this methodology through the discussion of social capital and practical scenarios.
Chapter 15
Modeling the Diversity of User Behavior in Online Communities
Tad Hogg
Hewlett-Packard Laboratories, USA
Gabor Szabo
Hewlett-Packard Laboratories, USA
ABSTRACT
This chapter describes models of the diversity of behavior seen in online communities, in particular how users contribute and attend to content, and how they form social links with their peers. We illustrate the models and parameter estimation procedure with a political discussion community. The models identify key characteristics of users and the web site design leading to the diverse behaviors, and suggest future experiments to identify causal mechanisms producing these characteristics.
DOI: 10.4018/978-1-60960-040-2.ch015
INTRODUCTION
Online communities are becoming ubiquitous and allow exploiting the “wisdom of crowds” (Surowiecki, 2004) to create and rate content. Examples include identifying interesting current news stories (Digg), creating encyclopedia articles (Wikipedia), sharing photos and videos (Flickr and YouTube) and fixing bugs in open source software (Bugzilla). Online communities often allow users to form explicit links with other users whose contributions they find interesting and many times they highlight the activity of a user’s designated friends
(Lerman and Galstyan, 2008) to help users find relevant content. The key aspects of the communities are the users, their links to other users, and the content they create and act on. User behaviors in these communities are often extremely diverse, with long-tailed distributions among participants. These behaviors include a concentration of activity among a few top users, a focus of community attention on a small fraction of the submitted content, and a few active users forming most of the links in the community networks. Thus diversity plays a dominant role in community behavior. Identifying the nature of this diversity can aid in designing online communities, particularly in identifying aspects of the web site design that
promote effective participation by various types of users (Ren et al., 2007). This chapter describes models of this diversity based on choices users make with information readily available to them on the community web site. The models suggest how diversity in behavior arises from underlying preferences of the users and the design of the site. Identifying these characteristics can help improve the web site by attracting or retaining contributing users, and by suggesting what information the web site should highlight about contributed content or other users. Models can also describe the aggregate average or typical behaviors in the community (Lerman, 2007). We describe and illustrate these models in the context of a political group formation community site, Essembly. The models not only explain the observed diversity but also allow estimating users’ activity rates from their behavior on the site shortly after they join, and estimating the community’s interest in new content from the initial reactions of a few users. That is, model parameter estimation shows that user activity rate and community interest in new content become evident shortly after users join or content is posted (Szabo and Huberman, 2010).
In the remainder of this chapter, we first describe the Essembly online community that we focus on for our models. We then discuss models for user activity, content relevance and link structure. We conclude with questions for future study and discuss how these models apply to other online communities.
ESSEMBLY
Essembly is an online community for political discussion. These discussions center around user-created resolves reflecting controversial political issues such as “overall, free trade is good for workers”. Similar to other online communities,
Essembly allows users to contribute, rate, and discuss content, in this case political policy questions. Essembly encourages users to find others with similar interests and form links with them. To facilitate this discovery, Essembly provides each user a ranked list of the other users with similar ideological profiles based on their votes on the resolves. Unlike most online communities, Essembly allows users to explicitly distinguish links to others with similar preferences (e.g., discovered via their community activities) from links to people they know socially. Specifically, Essembly provides three distinct networks for users: a social network, an ideological preference network, and an anti-preference network, called friends (those who know each other in person), allies (those who share similar ideologies), and nemeses (those who have opposing ideologies), respectively. Users specify the link type when they create links. Network links are formed by invitation only and each link must be approved by the invitee. The resulting networks have a structure similar to that seen in other social networking web sites, and the links generally conform to their nominal semantics (Brzozowski et al., 2008; Hogg et al., 2008). In summary, user activities consist of creating resolves, voting (expressing their opinions on resolves on a 4-point scale ranging from strong disagreement to strong agreement), commenting on resolves (e.g., to explain their vote or how they interpret the resolve), and forming links to other users. The Essembly user interface presents several options for users to discover new resolves, e.g., based on votes by network neighbors, recency, overall popularity, and degree of controversy. Essembly is a modest-sized community for which it is feasible to evaluate the behavior of all users and contributed content over an extended period of time. This comprehensive view is useful for studying diversity of user behavior. 
In contrast, larger online communities, such as Digg or YouTube, generally require focusing on a sample of users or content. Nevertheless, online communities generally show similar broad distributions
of behavior (Wilkinson, 2008) so the modeling approach we illustrate for Essembly also applies to these other communities. Our data set consists of anonymized voting records for Essembly between its inception in August 2005 and December 2006, and the users and links in the three networks at the end of this period. Our data set has 15,424 users. Essembly presents 10 resolves during the user registration process to establish an initial ideological profile. To focus on user-created content, we consider the remaining 24,953 resolves, with a total of 1.3 million votes.
MODELING THE DIVERSITY OF ACTIVITY
Important aggregate measures of community behavior are the numbers of actions users perform, how users spread their attention over the submitted content, how users form links to each other, and how the resulting networks influence subsequent user actions. While motivations of individual users depend on their specific interests and experience with the community, aggregate behaviors of online communities show many simple regularities (Wilkinson, 2008). As a key aspect of community behavior, we focus on the diversity of the distribution of the activity associated with users and content. We thus require models relating observed counts of various actions in the online community to underlying properties of the users and content.
A simple approach to modeling observed counts, such as number of votes by an Essembly user, is as a Poisson random process. Such a model assumes each action is made independently of the others at some rate x > 0 for a duration t > 0. The product s = xt is then the expected number of actions and the actual number of actions observed, an integer k ≥ 0, is modeled as arising from a Poisson distribution with mean s. Specifically, the probability to observe the value k is $\mathrm{Po}(s; k) = e^{-s} s^{k} / k!$.
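The Poisson probability above can be evaluated directly. The following is a minimal Python sketch, not code from the chapter; the function name and the example rate and duration are illustrative assumptions. Working in log space avoids overflow in s^k and k! for large counts.

```python
import math

def poisson_pmf(s: float, k: int) -> float:
    """Po(s; k) = exp(-s) * s**k / k!, computed via logarithms."""
    if s <= 0 or k < 0:
        raise ValueError("need s > 0 and integer k >= 0")
    return math.exp(-s + k * math.log(s) - math.lgamma(k + 1))

# Hypothetical example: a rate of x = 2 actions per day for t = 10 days.
s = 2.0 * 10.0

# Mean and variance computed from the probabilities themselves
# (the sum is truncated at k = 199, far beyond the bulk of the mass).
mean = sum(k * poisson_pmf(s, k) for k in range(200))
variance = sum((k - mean) ** 2 * poisson_pmf(s, k) for k in range(200))
```

Both the mean and the variance of a Poisson distribution equal s, so the relative spread shrinks as 1/√s. This is the sense in which a single expected value for the whole population yields a narrow distribution.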
For example, in modeling the number of votes a user makes, x could be the average number of votes per day for users in the community and t the number of days a typical user is active in the community. The random variation arising from the Poisson process then models the variations around the expected value xt for different users. However, a model using a Poisson process with a single expected value s for the entire population results in narrow distributions, in marked contrast to the observed behavior in online communities.
Instead, we capture the observed diversity by having the rate x vary across the users and content in the community. That is, our model for user and resolve diversity posits an underlying quantity x, characterizing the user or resolve, which determines the mean of a Poisson process producing the observed number of actions (for users) or votes and comments (for resolves). The values of x are not directly observable, but can be estimated for individual users or resolves. These estimates are fit well by lognormal distributions (Hogg and Szabo, 2009a). A continuous value x > 0 has a lognormal distribution (Aitchison and Brown, 1957) with parameters μ and σ > 0 when ln x is distributed according to a normal distribution with mean and standard deviation equal to μ and σ, respectively. We denote this distribution as LN(μ, σ; x). Multiplying x by a constant t > 0 also gives a lognormally distributed value s = xt with distribution LN(μ + ln t, σ; s).
In this chapter we focus on the aggregate behavior of the population. So instead of estimating values of x individually for each user or resolve, we estimate the parameters characterizing the lognormal distribution of x values. This approach gives a description of the population as a whole rather than individual users or resolves.
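The scaling property stated above, that s = xt is again lognormal with a shifted location parameter, follows in one step from the definition. The short derivation below is added for illustration and writes the normal distribution in terms of its variance σ²:

```latex
\ln s = \ln(xt) = \ln x + \ln t,
\qquad
\ln x \sim \mathcal{N}(\mu, \sigma^{2})
\;\Longrightarrow\;
\ln s \sim \mathcal{N}(\mu + \ln t,\; \sigma^{2}).
```

Adding the constant ln t shifts the mean of ln x and leaves its standard deviation unchanged, which is why only the first parameter of LN changes.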
Combining the lognormal distribution of characteristic values x and the Poisson process for producing the observed number of actions with mean s = xt, the overall probability to observe k is the mixture of these two distributions, i.e., P(μ + ln t,σ; k) where
$$P(\mu, \sigma; k) = \int_{0}^{\infty} \mathrm{LN}(\mu, \sigma; s)\, \mathrm{Po}(s; k)\, ds \qquad (1)$$
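Equation (1), and the log-likelihood that the estimation procedure sums over users, can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the trapezoid rule, the grid size `n`, the cutoff `zmax` and all function names are assumptions. The substitution s = exp(μ + σz) turns the integral into an average of Poisson probabilities over a standard normal variable z.

```python
import math

def poisson_pmf(s, k):
    """Po(s; k), computed via logarithms to avoid overflow."""
    return math.exp(-s + k * math.log(s) - math.lgamma(k + 1))

def mixture_pmf(mu, sigma, k, n=2000, zmax=8.0):
    """Approximate P(mu, sigma; k) of Equation (1) with the trapezoid rule.

    With s = exp(mu + sigma * z), the integral becomes
    integral of phi(z) * Po(exp(mu + sigma * z); k) dz,
    where phi is the standard normal density. The range is cut at +/- zmax.
    """
    h = 2.0 * zmax / n
    total = 0.0
    for i in range(n + 1):
        z = -zmax + i * h
        weight = 0.5 if i in (0, n) else 1.0
        phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * phi * poisson_pmf(math.exp(mu + sigma * z), k)
    return total * h

def log_likelihood(mu, sigma, counts, durations):
    """Sum over users of ln P(mu + ln t_i, sigma; k_i)."""
    return sum(
        math.log(mixture_pmf(mu + math.log(t), sigma, k))
        for k, t in zip(counts, durations)
    )
```

Any general-purpose optimizer can then search for the μ and σ that maximize `log_likelihood`. For counts far out in the tail the mixture probability can underflow to zero, in which case the logarithm fails; a production version would need to guard against that.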
While this integral does not have a simple closed form, it is readily evaluated numerically. We use maximum likelihood estimation based on this distribution to estimate the model parameters μ and σ from the observed values (Collett, 2003). Specifically, with the observed actions $(k_1, k_2, \ldots)$ for users active for times $(t_1, t_2, \ldots)$, the log-likelihood associated with parameters μ and σ is $\sum_i \ln P(\mu + \ln t_i, \sigma; k_i)$, where the sum is over all the users. The maximum likelihood estimates for the parameters μ and σ are those values maximizing this expression, found through a numerical maximization procedure.
For some behaviors, such as the total number of votes or comments a resolve receives, the duration t is mainly determined by how the web site shows content to users (e.g., highlighting recently introduced resolves). In this case, the values of t appearing in Equation (1) are all the same, or nearly so, as illustrated in Sec. 5. For other behaviors, such as the number of votes a user makes, the durations vary considerably, giving another source of diversity. In particular, users differ significantly in how long they remain active in the community. A simple model is that users become inactive at a rate independent of how long they have already participated, leading to an exponential distribution of durations. As described in Sec. 4, such a model is a poor description of the observed activity times of users. Instead, these times are described by a generalization of the exponential distribution, called a Weibull or stretched-exponential distribution,
$$W(\alpha, \beta; t) = \alpha \beta^{-\alpha} t^{\alpha - 1} e^{-(t/\beta)^{\alpha}} \qquad (2)$$
When α = 1, this is an exponential distribution. Our model gives the overall distribution as a combination of the distribution of times and rates, and can also account for correlations between these two values among the users, as discussed in Sec. 4.
USERS
Users join Essembly by registering on its web site. Typically, users participate in the community for a while and eventually lose interest. As a measure of activity time, we use the time between a user’s first and last votes (including votes on the 10 resolves presented during registration). Most users are active for only a short time (less than a day) and do not contribute much content. For modeling the distribution of user actions, we restrict attention to major users: users who were active at least one day and who had no activity within the last 30 days of our sample. Our data sample includes most or all of the activity of these users, since in our sample, few users ever return after 30 days of no activity. There are 3891 such users, of which 18 had no activity (beyond their original registration with the site, which they took more than a day to complete).
Figure 1. Distribution of number of major users vs. the number of actions (votes, comments, resolve creations and links) a user made. The gray curve is the distribution from the model described in the text.
Figure 1 shows the distribution of activity among the major users. The plot does not include the 18 users with no activity. The logarithmic scale of the plot highlights the diversity of user activity: most users make few contributions to the community, while a few make hundreds or thousands of contributions. This broad distribution arises from two factors: how long users participate on the site, and how often they act on the site while active.
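The first of these two factors, how long users stay, is what the Weibull form of Equation (2) describes. The Python sketch below is illustrative only: the function names are assumptions, and the parameter values are the Table 1 estimates (α = 0.70, β = 59 days). It computes the density, the survival probability, and their ratio, the hazard, which is the rate at which users who are still active become inactive.

```python
import math

def weibull_pdf(alpha, beta, t):
    """W(alpha, beta; t) of Equation (2), for t > 0."""
    return (alpha * beta ** (-alpha) * t ** (alpha - 1.0)
            * math.exp(-((t / beta) ** alpha)))

def weibull_survival(alpha, beta, t):
    """Probability that the activity time exceeds t."""
    return math.exp(-((t / beta) ** alpha))

def weibull_hazard(alpha, beta, t):
    """Rate of becoming inactive at time t, given activity up to t."""
    return weibull_pdf(alpha, beta, t) / weibull_survival(alpha, beta, t)

ALPHA, BETA = 0.70, 59.0  # Table 1 estimates; BETA is in days

# Hazard after 1, 10 and 100 days of activity.
hazards = [weibull_hazard(ALPHA, BETA, t) for t in (1.0, 10.0, 100.0)]
```

For α < 1 the hazard falls with t, so the longer a user has been active, the less likely that user is to leave in the next day. For α = 1 the hazard is the constant 1/β of an exponential distribution.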
Model
In our data set, users typically return to Essembly every few days if they return at all. As a useful, but somewhat arbitrary, criterion, we say a user becomes inactive after 30 days of no activity. Figure 2 summarizes user behaviors, from the time they become active by registering until they become inactive.
Figure 2. Model of user behavior. People join the site as active users, who create resolves, vote and comment on them and link to other active users. Users can eventually stop participating and become inactive.
Users have a diverse range of interest in the community, as expressed in the time they remain active and their average activity rate while active, i.e., the ratio of a user’s number of actions to the time that user remains active. These values have a slightly negative correlation, corresponding to the observed behavior of users starting with relatively frequent actions after they register and decreasing activity rate over time. A simple approach to modeling user activity is to ignore this small correlation and treat activity time and rate as independent. This approach shows the average user activity rates are well-described by a lognormal distribution (Hogg and Szabo, 2009a).
We can improve on this independence assumption by accounting for the generally decreasing activity rate of users as they spend more time with Essembly. In particular, we consider the observed number of actions of a user who is active for time t to arise from a Poisson process whose expected value is $\rho t^{a}$, where ρ characterizes the rate at which the user participates in the site while active and a characterizes how the average activity rate depends on the time a user remains active. A value a = 1 corresponds to activity proportional to the time a user is active, so average activity rate is independent of activity time, as assumed in the previous model (Hogg and Szabo, 2009a). Note that a constant activity rate was used in Sec. 3 to illustrate the basic ingredients of the model.
In the context of this model, the diversity of user actions arises from the combination of ρ and activity time values for each user. Based on the observed lognormal distribution of average user activity rates (Hogg and Szabo, 2009a), we suppose the ρ values are lognormally distributed with parameters ν and τ, for the mean and standard deviation of ln ρ, respectively. Figure 3 shows the distribution for the activity times of the major users, and a fit to a Weibull distribution starting at one day of activity (since, by definition, major users are active at least one day). The figure shows that the longer a user remains active, the lower the probability they will become inactive in the next day.
Figure 3. Distribution of activity times for major users on a log plot. Each bar shows the fraction of users who become inactive, per day, within a 20-day period. The curve shows a Weibull distribution fit to the values, with parameters given in Table 1.
Thus in this model, a user active for time t has probability to perform k actions equal to P(ν + a ln t, τ; k) as given by Equation (1). For the population as a whole, the model gives the probability for k actions for users active at least one day as
$$P_{\mathrm{user}}(k) = \int_{0}^{\infty} W(\alpha, \beta; t)\, P(\nu + a \ln(t + 1), \tau; k)\, dt \qquad (3)$$
with P and W given by Equation (1) and Equation (2), respectively. The use of t + 1 in the second factor arises because the set of users considered here are active for at least one day, so the Weibull distribution of activity times is for the time beyond this one day minimum. Table 1 gives the parameters, including maximum likelihood estimates for the values ν, τ, characterizing the distribution of activity rates, and a, characterizing how average activity rate changes with how long the user is active. Since a < 1, activity grows less rapidly than linearly with time. The value in Table 1 gives a negative correlation between average activity rate and activity time.
The observed correlation between activity rate and time for major users is -0.17. As a test whether the model is consistent with this value, we used the model to generate multiple samples of hypothetical users whose actions are described by the model. Each of these samples gives a correlation between activity rates and times, thereby producing a distribution of correlations that we would expect if the model were correct. We find about 15% of the samples have a correlation at least as extreme as the observed value. So with respect to this correlation, the model is consistent with the data.
The curve in Figure 1 shows the probability values $P_{\mathrm{user}}$ from Equation (3), multiplied by the number of major users to give expected values for the numbers of users with each number of actions. While the model is close over most of the distribution, it does not capture the small number of cases with few actions. This is due to the larger negative correlation between activity rate and time for users with just a few actions, which is not captured by the simple dependence in the model.
Table 1. Number of users active at least one day with no activity within 30 days of the end of our sample, and actions by these users. The values ρ characterize user activity rate and the activity time follows a Weibull distribution (Equation (2)). The ranges for the model parameters are estimates of the 95% confidence intervals.
Number of major users:                          3891
Number of actions by these users:               616335
Model parameters for ρ:                         ν = 1.8 ± 0.1, τ = 1.49 ± 0.04
Model power for growth of actions with time:    a = 0.51 ± 0.03
Activity time distribution:                     α = 0.70 ± 0.02, β = 59 ± 3 days
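The consistency test described in this section, which generates hypothetical users from the model and computes the correlation between activity rate and activity time, can be sketched as follows. This is one plausible reading of the procedure and not the authors' code. The function names are hypothetical, and the normal approximation used for Poisson draws with large means is a shortcut assumed here for brevity.

```python
import math
import random

def sample_poisson(mean, rng):
    """Poisson draw: Knuth's method for small means; for large means a
    rounded normal approximation (an assumption made for this sketch)."""
    if mean < 30.0:
        limit, k, p = math.exp(-mean), 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        return k
    return max(0, round(rng.gauss(mean, math.sqrt(mean))))

def simulate_users(n, nu, tau, a, shape, scale, seed=0):
    """Return (active_days, actions) pairs for n hypothetical major users."""
    rng = random.Random(seed)
    users = []
    for _ in range(n):
        # Time beyond the one-day minimum follows the Weibull distribution.
        # random.weibullvariate takes (scale, shape), in that order.
        days = 1.0 + rng.weibullvariate(scale, shape)
        rho = rng.lognormvariate(nu, tau)
        actions = sample_poisson(rho * days ** a, rng)
        users.append((days, actions))
    return users

def correlation_rate_time(users):
    """Pearson correlation between average rate (actions/day) and days."""
    days = [d for d, _ in users]
    rates = [k / d for d, k in users]
    n = len(users)
    md, mr = sum(days) / n, sum(rates) / n
    cov = sum((d - md) * (r - mr) for d, r in zip(days, rates))
    var_d = sum((d - md) ** 2 for d in days)
    var_r = sum((r - mr) ** 2 for r in rates)
    return cov / math.sqrt(var_d * var_r)

# One synthetic sample using the Table 1 estimates.
sample = simulate_users(3891, nu=1.8, tau=1.49, a=0.51, shape=0.70, scale=59.0)
r = correlation_rate_time(sample)
```

Repeating the simulation with different seeds gives a distribution of correlations against which the observed value of -0.17 can be compared.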
Discussion
This model of user activity accounts for diversity as arising from two factors: the time a user chooses to spend in the community and the user’s average activity rate during that time. The lognormal distribution for ρ suggests a multiplicative process (Redner, 1990) underlying user preferences for how active they are while participating in the community. It would be interesting to relate this range to properties of the users, such as their involvement in political action groups and other demographics. The stretched exponential distribution of activity times indicates multiple time scales for users to lose interest in the site, indicating a mixture of processes (Frisch and Sornette, 1997) leading users to abandon the site, as also occurs in other online communities (Wilkinson, 2008).
A more general question on the origin of user diversity is the extent to which it arises from preexisting characteristics of the users or from the differing experiences users have on the site. Our model, by supposing each user has a characteristic value ρ determining their activity rate, suggests prior differences among users are the dominant effect for Essembly. Similarly, a study of Wikipedia contributors suggests heavy contributors differ from the rest of the population even initially (Panciera et al., 2009). Other studies suggest a significant influence for experience on the site, particularly whether other users attend to and encourage new users’ contributions (Butler et al., 2007; Joyce and Kraut, 2006; Wilkinson, 2008).
These differing causes of user diversity have distinct implications for building the online community. If user participation is mainly due to the nature of the users, then a good approach is improving the exposure of the web site to potential new users who would be interested in the site. Conversely, if users are initially fairly homogeneous and diversity arises through experience, then improving recognition of current users within
the community and providing mentors is the best approach (Rashid et al., 2006).
CONTENT
A key question for user-created content is how user activities distribute among the available content. For Essembly, as with other online communities, there is a broad distribution in attention given to content. In Essembly, each resolve receives its first vote when it is created, i.e., the vote of the user introducing the resolve. Thus the observed votes on a resolve are a combination of two user activities: creating a new resolve (giving the resolve its first vote) and subsequently other users choosing to vote on the resolve if they see it while visiting the site. We consider a user’s selection of an existing resolve to vote on as mainly due to a combination of two factors: visibility and interestingness of a resolve to a user. Visibility is the probability a user finds the resolve during a visit to the site. Interestingness is the conditional probability a user votes on the resolve given it is visible to that user. These two factors apply to a variety of web sites, e.g., providing a description of average behavior on Digg (Lerman, 2007).
The web site’s user interface design determines content visibility. Typically sites, including Essembly, emphasize recently created content and popular content (i.e., receiving many votes over a period of time). Essembly also emphasizes controversial resolves. As with other networking sites, the user interface highlights resolves with these properties both globally and among the user’s network neighbors. Users can also find resolves through a search interface. For Essembly, the networks have only a modest influence on voting (Hogg et al., 2008). Recency appears to be the most significant factor affecting visibility (Hogg and Szabo, 2009a).
One approach to modeling attention on content considers how attention varies with time and introduction of subsequent content, combined with users’ willingness to visit successive pages or scroll down a long list, described by the “law of surfing” (Huberman et al., 1998). For Essembly, the number of subsequent resolves (“resolve age”) is the key contribution to loss of visibility. The combination of different ages in the data sample is a significant factor in producing the observed distribution of votes (Huberman and Adamic, 1999). In particular, for Essembly, this process gives votes following a lognormal distribution with power-law tails (Hogg and Szabo, 2009a).
To model diversity of attention to resolves, we focus on “old” resolves, i.e., those introduced more than 30 days before the end of our sample. These resolves already have most of their votes and comments. This results in a simpler model, without the need to account for aging effects, i.e., some newly introduced resolves may be very interesting but haven’t yet had time to accumulate many votes. On average, resolves receive 90% of their votes within 30 days of their introduction. By restricting our discussion to these resolves, we focus primarily on intrinsic properties of the resolves that lead to their diversity of attention rather than also having to consider how resolves receive votes and comments over time. Figure 4 shows the distribution of activities for the resolves. Not included in the plots are the 13 and 76 resolves with no votes or comments, respectively.
Model
Our model assumes a resolve collects votes and comments as a Poisson process proportional to an “interestingness” factor r, whose distribution fits well with a lognormal distribution based on votes (Hogg and Szabo, 2009a). Extending this model to comments gives a Poisson process where the expected numbers of votes and comments for a resolve with completed activity are Vr and Cr, where V and C are the total number of votes and comments, respectively. Each resolve receives one vote (from the user posting the resolve) when the resolve is introduced. Thus, the model applies to votes other than the one vote introducing each resolve, and V is the difference between the total number of votes on the resolves and the number of resolves. In this model, we take the interestingness of a resolve to be the same with respect to votes and comments. With r distributed according to a lognormal distribution with parameters μ and σ, the probability to observe k votes according to this model is P(μ + ln V, σ; k) as given by Equation (1). The distribution of comments has the same form, with V replaced by C.
Table 2 gives the parameters, including maximum likelihood estimates for the values μ, σ characterizing the distribution of interestingness of the resolves. The maximum likelihood estimation procedure is the same as described above for the model of user activity. The curves in Figure 4 show the expected number of resolves with each number of votes and comments, respectively. Specifically, the expected number of resolves with k votes is R P(μ + ln V, σ; k), and similarly for comments with V replaced by C, with R the number of resolves, given in Table 2.
Figure 4. Distribution of (a) votes and (b) comments on resolves. The gray curves indicate fits from the model described in the text, with parameters given in Table 2.
Table 2. Number of resolves introduced at least 30 days before the end of our sample, and votes and comments on those resolves. The values r are the “interestingness” for the resolves. The ranges for the model parameters are estimates of the 95% confidence intervals.
Number of resolves:         R = 22848
Number of votes:            V = 1208035
Number of comments:         C = 432315
Model parameters for r:     μ = -10.30 ± 0.01, σ = 0.69 ± 0.01
In our model, the same interestingness value r applies to both votes and comments. Given the value of r for a particular resolve, the votes and comments are independent choices. The variation in r values among resolves introduces a correlation between votes and comments. That is, a highly interesting resolve is likely to get relatively large numbers of both votes and comments. Thus a test of the model is how the distribution of correlations it predicts between votes and comments compares with the observed correlation. The correlation between numbers of votes and comments on these resolves is 0.87, which is somewhat lower than the value 0.94 predicted by the model. A randomization test with the model indicates the model is unlikely to produce a correlation as small as 0.87. Thus while the model identifies a major factor underlying the correlation, there is likely some variation in users’ interests in voting and commenting on resolves to give somewhat less correlation than the model predicts.
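The correlation induced by a shared interestingness value can also be worked out in closed form, which gives a quick check on the simulation-based figure. The sketch below is an illustration added here and is not necessarily how the chapter's value was obtained. It applies the laws of total variance and covariance to votes ~ Poisson(Vr) and comments ~ Poisson(Cr) with lognormal r, and it takes V and C exactly as tabulated in Table 2, which is an assumption given that the text defines V as excluding each resolve's initial vote.

```python
import math

def predicted_correlation(mu, sigma, V, C):
    """Correlation between votes and comments when both are Poisson
    with means V*r and C*r, and r is lognormal with parameters mu, sigma."""
    mean_r = math.exp(mu + 0.5 * sigma ** 2)
    var_r = (math.exp(sigma ** 2) - 1.0) * mean_r ** 2
    # Total variance = average Poisson variance + variance of the mean.
    var_votes = V * mean_r + V ** 2 * var_r
    var_comments = C * mean_r + C ** 2 * var_r
    # Given r, votes and comments are independent, so only the
    # shared variation in r contributes to the covariance.
    cov = V * C * var_r
    return cov / math.sqrt(var_votes * var_comments)

# Table 2 values (V taken as tabulated; see the assumption above).
corr = predicted_correlation(mu=-10.30, sigma=0.69, V=1208035, C=432315)
```

With these inputs the expression evaluates to about 0.94, in line with the model prediction quoted in the text. The Poisson terms V·E[r] and C·E[r] are what keep the predicted correlation below 1.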
Discussion
The model shows the wide range of attention arises from a broad lognormal distribution in “interestingness” of individual resolves, i.e., how well they appeal to the user community. Such lognormal distributions of content interestingness are also seen in other web communities, such as Digg (Hogg and Lerman, 2009). An open question is identifying the origin of this broad distribution. In the case of Essembly, the interest reflects the appeal of various political discussion topics embodied in the resolves. The broad distribution of interest seen in Essembly could arise from users’ pre-existing interests, which often show broad distributions and may be explained by information cascades and confirmation biases (Bikhchandani et al., 1992; Shermer, 2006) rather than being specific to the structure of the online community.
Another multiplicative mechanism that can produce a lognormal distribution of interestingness of content is when the appeal of an item, not just its visibility, is influenced by the observed popularity with other users in general, or among friends. This is in addition to popularity giving visibility: online communities typically highlight recent popular content. This influence is especially likely in situations where the quality of content is difficult to evaluate personally, or the value depends on the number of others using it, as with fashion or the latest “cool” product.
Having identified lognormal distributions underlying both user activity and the attention resolves receive, a natural question is whether these distributions are related. For example, do users with higher activity levels or who are active for longer times tend to introduce resolves with high interestingness? Addressing this question requires estimating ρ and r values for individual users and resolves (Hogg and Szabo, 2009a) rather than the population-based models considered in this chapter.
Such estimates show that while more active users introduce more resolves than less active users, there is little correlation between
user activity rate and the average interestingness of the resolves they introduce to the community as a whole.
LINKS
Users’ decisions of who to link to and how they attend to the behavior of their neighbors can significantly affect the performance of online communities. Users can have several motivations for link formation, such as prior social relationships (“friends”) or discovery of others in the online community with similar interests. Essembly encourages users to explicitly distinguish these link types. A common property of such networks is the wide range in numbers of links made by users, given by the degree distribution of the network. The structure of the three networks in Essembly is typical of those seen in online social networking sites, with a broad distribution among the users.
Model
We can understand this diversity of degree distribution as a consequence of the distribution of user activity discussed in Sec. 4. Forming links is one of the activities shown in Figure 2. Thus the model prediction for the link distribution is directly determined by the distribution for all actions, shown in Figure 1. Specifically, multiplying the activity rate ρ for a user by the fraction of actions that are link creations in each network gives a lognormal distribution for the rate at which a user forms links. On average, the link creation rate is λA = 14526/616335 = 2.4% for the ally links, and λN = 2123/616335 = 0.35% for the nemesis links, respectively, as a fraction of the rate for all actions. The numerators in these expressions are the respective numbers of links in the Essembly network, and the denominator is the total number of all actions as shown in Table 1. The observed number of links according to this model then arises from the same Poisson process shown in Figure 1 but with the mean value rescaled by λA and λN, respectively. Since links are a relatively small fraction of the total activity, this rescaling produces fairly small mean values and hence a larger fraction of users with just a few links compared to the distribution of total activity among users. Using this rescaling, Figure 5 shows the distribution of degrees in the networks and how they compare with the distribution from our model, using only the parameters in Table 1 and the fraction of link creation actions, λA and λN.
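As a rough illustration of this rescaling, the following Python sketch computes the link fractions from the counts quoted above and evaluates a Poisson link-count distribution for a single user. The helper names and the notion of a user's "expected number of actions" are assumptions made for illustration; the full model would additionally average over the lognormal distribution of activity shown in Figure 1.

```python
import math

# Counts quoted in the text: links in each network and total actions.
ALLY_LINKS = 14526
NEMESIS_LINKS = 2123
TOTAL_ACTIONS = 616335

def link_fraction(n_links, n_actions=TOTAL_ACTIONS):
    """Fraction of all actions that are link creations in one network."""
    return n_links / n_actions

def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson variable with this mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def link_count_pmf(k, expected_actions, fraction):
    """P(k links) for one user: the Poisson mean for all actions, rescaled
    by the fraction of actions that create links (hypothetical helper)."""
    return poisson_pmf(k, fraction * expected_actions)

lambda_ally = link_fraction(ALLY_LINKS)
lambda_nemesis = link_fraction(NEMESIS_LINKS)
```

For a hypothetical user expected to take 100 actions, `link_count_pmf(0, 100, lambda_nemesis)` is much larger than `link_count_pmf(0, 100, lambda_ally)`, which reflects the point made above: the smaller the rescaled mean, the larger the share of users with few or no links.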
Figure 5. The number of major users in the Essembly social network who have a given number of links of the indicated type: (a) allies, (b) nemeses. The solid curves show the expected link distributions based on the model of activity for these users shown in Figure 1.
Discussion
The number of links a user forms is a combination of the user’s activity rate and how long the user remains at the site. The wide variation in activity times and rates among users gives rise to
a wide distribution of the number of links (Hogg and Szabo, 2009a). In addition to the number of links a user makes, an important question for the behavior of online communities is which pairs of users form those links. User choices of whom to link to exhibit properties such as transitivity that indicate users do not form links at random, even given the distribution of number of links for each user (Hogg et al., 2008). The long-tailed degree distributions are often viewed as due to a preferential attachment process, which, combined with a limitation on the number of links a user has, gives truncated power-law degree distributions (Amaral et al., 2000; Vazquez, 2003). However, users in Essembly have no direct access to the number of links of other users. Of more direct relevance to understanding and improving online communities, therefore, is identifying a mechanism users could use to find others to form links with, based on information available to them. Essembly users likely find each other through shared interest in politics, and establish links based on similar voting activities rather than how many connections they already have (as would be the case in structure-based models). Essembly prominently highlights ideologically similar and dissimilar users on one’s personal page, in a list that is inferred from votes on the same resolves by two users. This makes it particularly easy for users to make connections with people who share many common votes. Thus, for Essembly, a reasonable model for which pairs of users form links in the ideological networks is based on the number of votes they have in common (Hogg and Szabo, 2009b). A reasonable assumption is thus that the probability for link formation is proportional to the number of shared resolves between two users. Assuming random selection of resolves, this probability is in turn proportional to the product of the numbers of resolves that the two users have voted on (for a detailed explanation, see the previous reference).
Ultimately, therefore, the probability that a user will form a link is proportional to the number of votes that he or she has cast, which provides another explanation of why we can approximate the link formation probability as a constant for every user, without explicitly having to assume users devote a given fraction of their activities to forming links. Our model treats the various user activities (e.g., voting or forming a link) as independent choices. However, these activities are somewhat related over the population as a whole. For instance, web sites, including Essembly, highlight recent activity of a user’s network neighbors. In this way, linked users are more likely to view the same content than unlinked pairs. This increased visibility can lead to increased votes on the content. The importance of the online social network during the life cycle of user-contributed content depends on the nature of the web site. For example, in the Digg community-moderated news recommendation service, the social network formed by the users has a large influence on attention while content is only exposed to a small fraction of the users. As soon as content becomes easily accessible to a large user base (i.e., when a story is promoted to the “front page”), the influence of social ties diminishes and mass exposure becomes the dominant factor in making content visible. After this change in content visibility, the characteristics of users attending to the content change as well; namely, they have much more limited participation in the social networks than users who peruse the content while it is on a less exposed part of the web site (Lerman and Galstyan, 2008).
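The step from "shared resolves" to "product of vote counts" can be checked numerically. The Python sketch below assumes, as in the argument above, that each user votes on a uniformly random subset of the resolves; under that assumption the expected overlap between two users is the product of their vote counts divided by the number of resolves. The function names and the numbers used in any call are hypothetical.

```python
import random

def expected_shared(n_i, n_j, n_resolves):
    """Expected number of resolves both users voted on, if each user's
    resolves are a uniformly random subset of all resolves."""
    return n_i * n_j / n_resolves

def simulated_shared(n_i, n_j, n_resolves, trials, seed=0):
    """Monte Carlo estimate of the same quantity, for comparison."""
    rng = random.Random(seed)
    pool = range(n_resolves)
    total = 0
    for _ in range(trials):
        voted_by_i = set(rng.sample(pool, n_i))
        voted_by_j = rng.sample(pool, n_j)
        total += sum(1 for resolve in voted_by_j if resolve in voted_by_i)
    return total / trials
```

Doubling either user's vote count doubles the expected overlap, which is why a link probability proportional to shared resolves becomes, for each user, proportional to the number of votes cast.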
FUTURE RESEARCH DIRECTIONS
Our discussion of diversity of online communities illustrates how new technology enables the study of social processes. These technologies include both the availability of activity data from online communities and the development of sensors to record interactions based on proximity and directed conversations (Pentland, 2007).
Our model raises a key question for future work: how the lognormal distributions in user activity and resolve interestingness arise. Lognormal distributions suggest underlying multiplicative processes are involved, but the specific mechanisms for these processes and how they depend on the web site design and type of content are not yet known. Identifying such causal mechanisms could benefit from controlled experiments, e.g., with randomly selected subgroups of users to avoid self-selection biases. The long-tail distributions observed in online communities pose a challenge for statistical modeling because samples may not be indicative of future behavior due to large variations among users. Thus these studies can benefit from robust statistical tools (Brown and Sethna, 2003) and require caution in situations where just a few highly active users can dominate the community. The study of online communities could benefit from more specific data showing not only what users did (e.g., vote on or download content) but also what they viewed and decided not to act on. Such data would allow models to better distinguish effects of visibility (largely determined by web site design) from how interesting users find the content once they see it. We modeled the resolve vote distribution as due to a range of resolve interestingness combined with decreasing visibility with age. An interesting question is whether these factors have some dependency. For instance, age could affect interestingness as well as visibility (e.g., a resolve on a current news event vs. one on a general ideological value statement). The effect of age on interestingness (as opposed to visibility in the user interface) could depend on the type of web site. For example, current-events stories on Digg lose interest as the story becomes “old news” (Wu and Huberman, 2007), whereas entries on Wikipedia for topics of general interest retain their relevance to the users over an extended period of time.
An interesting extension of the model is to identify niche resolves, i.e., resolves of high interest to small subgroups of users but not to the population as a whole. Automatically identifying such subgroups could help people find others with similar interests by supplementing comparisons based on ideological profiles. Our model of the networks describes the degree distribution but does not address other significant properties of the networks, such as community structure and assortativity. For a discussion of some of these points see Hogg et al. (2008). Nor does our model address detailed effects on user behavior due to their network neighbors. Another aspect of diversity, not included in our model, is the bursty time intervals between a user’s successive activities on the site (Barabasi, 2005; Vazquez et al., 2006). One approach to understanding such behavior uses psychologically motivated diffusion models of decision-making (Bogacz et al., 2006), which can describe this distribution and other aspects of users writing and commenting on blogs (Gotz et al., 2009). Finally, beyond models to describe the diversity in online communities is the question of how this diversity contributes to the performance of the online community, both from the perspective of individual users and for the community as a whole. The latter is particularly relevant when the community members are engaged in an aggregate activity such as collective problem solving or information aggregation (Arrow et al., 2008; Hahn and Tetlock, 2006). In appropriate contexts, diversity can aid these activities (Page, 2007), so a significant question for the design of online communities is the extent to which the observed diversity in activity relates to diversity of approaches to problem solving for the group as a whole.
IMPLICATIONS OF THE MODELS
In the previous sections we gave descriptions of how users make use of an online community web service, and in particular what the natural patterns
are for their accessing the service. We have also analyzed how they form community links to other users whom they have (or, more likely, have not) met in person, and what the consequences of this are for the content they discuss. The models can be applied for predictions on at least two levels: to forecast workload demand and to provide a more personalized experience for the users. Similarly to how the Erlang formula was found to describe the call frequencies in call centers and was long used as a predictive tool to design telephone exchange boards (Erlang 1909), the amount of requests and the nature of activities in web communities may well be extrapolated into the future by using and fine-tuning the models discussed in this chapter. If the online service is growing quickly and hardware resource usage is a concern, different usage levels may be provisioned based on the number of future users as an input. This assumes that an individual user’s access pattern is independent of the total number of users in the system, which is most likely true: except when there are only a handful of users in total, no individual is ever aware of the full scope of the user community. Users can observe others’ activities only through content filters (similar resolves, similar users, recent votes, etc.) and shared activities, which limits their overview of the whole user base. A second application of the models is more visible, and has possibly deeper consequences for user interface design. This is when the predictions of the models are used to customize and personalize the information that the users are shown as a result of a query or search. For example, one could replace the usual resolve ranking system, in which the “popularity” of resolves is measured by the total number of votes that they have received during their lifetimes, with one that takes into account their ages and how actively they have been voted on by users, even if they are new (Szabo and Huberman 2008).
This would have the consequence that the “rich get richer” mechanism for content popularity growth will no
longer accurately describe the number of votes on resolves, since more visibility will be added for resolves that are deemed interesting even from the beginning. However, interesting but new resolves will more likely receive the early attention they would not get otherwise. While we focused on different aspects of user modeling, examining individual users and how their access rates change in time could allow web site designers to estimate attrition probabilities for users that exhibit certain patterns of diminishing usage. This is becoming more and more of a concern for very popular web sites that depend on a large number of visitors for their existence.
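The age-aware ranking mentioned above could take many forms. The Python sketch below contrasts a ranking by lifetime vote totals with one by votes per day. The score used here (votes divided by age, with a floor on age to avoid dividing by very small values) is a hypothetical stand-in, not a formula given in this chapter or in the cited work, and the `Resolve` record and its field names are likewise assumptions.

```python
from dataclasses import dataclass

@dataclass
class Resolve:
    title: str
    votes: int
    age_days: float

def rank_by_total(resolves):
    """Usual ranking: popularity is the lifetime number of votes."""
    return sorted(resolves, key=lambda r: r.votes, reverse=True)

def rank_by_rate(resolves, min_age_days=1.0):
    """Age-aware ranking: votes per day, so recent but actively voted
    resolves can rank above older ones with larger totals."""
    def rate(resolve):
        return resolve.votes / max(resolve.age_days, min_age_days)
    return sorted(resolves, key=rate, reverse=True)
```

With such a score, a two-day-old resolve with 40 votes would rank above a 100-day-old resolve with 300 votes, which is the kind of early attention for new but interesting content that the text describes.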
CONCLUSION
We described several extended distributions resulting from user behavior on Essembly, a community where users create and rate content as well as form networks. Essembly has extremely heterogeneous populations of users and resolves. We introduced a model of these distributions as due to the key features of continual arrival of new users, existing users becoming inactive, and a wide range of activity levels among the user population and interest in the content. These features can apply in many online community contexts, depending on the nature of the shared content and how users find it. The distribution of how users rate content depends on the origin of perceived value to the users. At one extreme, which seems to apply to Essembly, the items themselves have a wide range of appeal to the user population, leading some content to consistently attract user attention at much higher rates than other content introduced at about the same time. At the other extreme, perceived value could be largely driven by popularity among the users, or subgroups of users. In rapidly changing situations, e.g., current news events, recency is important not only in providing visibility through the system’s user interface, but also in determining
the level of interest. In other situations, the level of interest in the items changes slowly, if at all, as appears to be the case for Essembly’s content concerning broad political questions such as the benefits of free trade. All these situations can lead to long-tail distributions through a combination of a “rich get richer” multiplicative process and decay with age. But these situations have different underlying causal mechanisms and hence different implications for how the site design affects user behavior. Thus, the design and evaluation of participatory web sites can benefit from models relating user behavior to information readily available on the site. These models can identify important aspects of the community design leading to observed aggregate behavior, and hence suggest design improvements to the online community web sites. In particular, online communities can have two primary purposes for their users: to share and aggregate information (e.g., for news story recommendations on Digg), or help people form and maintain social relationships (e.g., Facebook). These different purposes correspond to distinct performance measures for the community and underlying social mechanisms to encourage continued participation (Ren et al., 2007). The models described in this chapter apply to both motivations: identifying diversity of both user interest in the community and the relevance of the content to those users. Consequences of our model include suggestions for identifying user activity and interesting resolves early in their history. This possibility arises from persistence in voting rates over time, even before content accumulates enough votes to be rated as popular, as is also seen in larger user communities (Szabo and Huberman, 2008). Such identification could help promote interesting content on the web site more rapidly, particularly in the case of niche interests. 
For more specific predictions, the models can be extended to include the dynamics of the web site, particularly how users find content (Hogg and Lerman, 2009).
Beyond helping users find interesting content, designs informed by models could help with derivative applications, such as collaborative filtering or developing trust and reputations, by quickly focusing on the most significant users or content. Such applications raise significant questions of the relevant time scales. That is, observed behavior is noisy, so there is a tradeoff between using a long time to accumulate enough statistics to calibrate the model vs. using a short time to allow responsiveness faster than other proxies for user interest such as popularity. A caveat on our results, as with other observational studies of online communities, is that the evidence for mechanisms is based on correlations in observations. While our model provides plausible causal explanations, since it relies on information and actions available to users rather than aggregated descriptive variables not known by individual users, intervention experiments would give more confidence in distinguishing correlation from causal relationships. Our model provides testable hypotheses for such experiments. For example, if intrinsic interest in resolves is a major factor in users’ selection of resolves, then deliberate interventions to change the number of votes may change visibility but will not affect interestingness. In that case, we would expect subsequent votes to return to the original trend. Thus one area for experimentation is to determine how users value content on various web sites. For example, if items are valued mainly because others value them (e.g., fashion items and a variety of other economic contexts (Ariely, 2008)) then observed votes would cause rather than just reflect high value. In such cases, random initial variations in ratings would be amplified, and show very different results if repeated or tried on separate subgroups of the population.
If items all have similar values and differences are mainly due to visibility, e.g., recency or popularity, then we would expect votes to arise from rank order of votes (e.g., whether item is most popular) rather than absolute number of votes. If items have
broad intrinsic value, then voting would show persistence over time and similar outcomes for independent subgroups. It would also be useful to identify aspects of the model that could be tested in small groups, thereby allowing detailed and well-controlled laboratory experiments comparing multiple interventions. Larger scale experiments (Bainbridge, 2007; Salganik et al., 2006) would also help identify causal mechanisms.
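The contrast between these hypotheses can be illustrated with a toy simulation of two independent subgroups voting on the same items. In the Python sketch below, the "intrinsic" world gives each item a fixed appeal that sets its chance of receiving the next vote, while the "popularity-driven" world uses a simple urn-style rule in which that chance grows with the votes already received. Both rules, the function names, and the parameter values in any call are assumptions chosen for illustration; they are not mechanisms estimated from the Essembly data.

```python
import math
import random

def run_subgroup(appeals, n_votes, popularity_driven, rng):
    """Vote counts per item after n_votes votes in one isolated subgroup."""
    counts = [0] * len(appeals)
    items = list(range(len(appeals)))
    for _ in range(n_votes):
        if popularity_driven:
            # Value comes from others' votes: weight grows with current count.
            weights = [1 + c for c in counts]
        else:
            # Value is intrinsic: weight is the item's fixed appeal.
            weights = appeals
        chosen = rng.choices(items, weights=weights, k=1)[0]
        counts[chosen] += 1
    return counts

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def subgroup_agreement(appeals, n_votes, popularity_driven, seed=0):
    """Correlation of final vote counts between two independent subgroups."""
    rng = random.Random(seed)
    first = run_subgroup(appeals, n_votes, popularity_driven, rng)
    second = run_subgroup(appeals, n_votes, popularity_driven, rng)
    return pearson(first, second)
```

Under these assumptions, high agreement between subgroups is what broad intrinsic value would predict, whereas weak agreement is the signature of amplified initial fluctuations.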
ACKNOWLEDGMENT
We thank Chris Chan and Jimmy Kittiyachavalit of Essembly for their help in accessing the Essembly data. We have benefited from discussions with Michael Brzozowski and Dennis Wilkinson.
REFERENCES
Aitchison, J., & Brown, J. A. C. (1957). The Log-normal Distribution. Cambridge: Cambridge University Press.
Amaral, L. A. N., Scala, A., Barthelemy, M., & Stanley, H. E. (2000). Classes of small-world networks. Proceedings of the National Academy of Sciences of the United States of America, 97, 11149–11152. doi:10.1073/pnas.200327197
Ariely, D. (2008). Predictably Irrational: The Hidden Forces That Shape Our Decisions. New York: Harper Collins.
Arrow, K. J. (2008). The promise of prediction markets. Science, 320, 877–878. doi:10.1126/science.1157679
Bainbridge, W. S. (2007). The scientific research potential of virtual worlds. Science, 317, 472–476. doi:10.1126/science.1146930
Barabasi, A.-L. (2005). The origin of bursts and heavy tails in human dynamics. Nature, 435, 207–211. doi:10.1038/nature03459
Bikhchandani, S., Hirshleifer, D., & Welch, I. (1992). A theory of fads, fashion, custom, and cultural change as informational cascades. The Journal of Political Economy, 100, 992–1026. doi:10.1086/261849
Bogacz, R. (2006). The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review, 113, 700–765. doi:10.1037/0033-295X.113.4.700
Brown, K. S., & Sethna, J. P. (2003). Statistical mechanical approaches to models with many poorly known parameters. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 68, 012904. doi:10.1103/PhysRevE.68.021904
Brzozowski, M. J., Hogg, T., & Szabo, G. (2008). Friends and foes: Ideological social networking. In Proc. of the SIGCHI Conference on Human Factors in Computing (CHI2008) (pp. 817-820). New York: ACM Press.
Butler, B., Sproull, L., Kiesler, S., & Kraut, R. (2007). Community effort in online groups: Who does the work and why? In Weisband, S. (Ed.), Leadership at a Distance (pp. 171–194). Hillsdale, NJ: Lawrence Erlbaum.
Collett, D. (2003). Modelling Binary Data (2nd ed.). Boca Raton, FL: CRC Press.
Erlang, A. (1909). The theory of probabilities and telephone conversations. Nyt Tidsskrift for Matematik B, 20, 33–39.
Frisch, U., & Sornette, D. (1997). Extreme deviations and applications. J. Physics I France, 7, 1155-1171.
Gotz, M., Leskovec, J., McGlohon, M., & Faloutsos, C. (2009). Modeling blog dynamics. In Proc. of the Third International Conference on Weblogs and Social Media (ICWSM2009) (pp. 26-33). AAAI.
Hahn, R. W., & Tetlock, P. C. (Eds.). (2006). Information Markets: A New Way of Making Decisions. Washington, DC: AEI Press.
Hogg, T., & Lerman, K. (2009). Stochastic models of user-contributory web sites. In Proc. of the Third International Conference on Weblogs and Social Media (ICWSM2009) (pp. 50-57). AAAI.
Hogg, T., & Szabo, G. (2009a). Diversity of user activity and content quality in online communities. In Proc. of the Third International Conference on Weblogs and Social Media (ICWSM2009) (pp. 58-65). AAAI.
Hogg, T., & Szabo, G. (2009b). Dynamics and diversity of online community activities. Europhysics Letters, 86, 38003. doi:10.1209/0295-5075/86/38003
Hogg, T., Wilkinson, D. M., Szabo, G., & Brzozowski, M. (2008). Multiple relationship types in online communities and social networks. In Lerman, K. et al. (Eds.), Proc. of the AAAI Symposium on Social Information Processing (pp. 30-35).
Huberman, B. A., & Adamic, L. A. (1999). Growth dynamics of the World Wide Web. Nature, 401, 131.
Huberman, B. A., Pirolli, P. L. T., Pitkow, J. E., & Lukose, R. M. (1998). Strong regularities in World Wide Web surfing. Science, 280, 95–97. doi:10.1126/science.280.5360.95
Joyce, E., & Kraut, R. E. (2006). Predicting continued participation in newsgroups. Journal of Computer-Mediated Communication, 11, 723–747. doi:10.1111/j.1083-6101.2006.00033.x
Lerman, K. (2007). Social information processing in social news aggregation. IEEE Internet Computing: special issue on Social Search, 11(6), 16-28.
Lerman, K., & Galstyan, A. (2008). Analysis of social voting patterns on Digg. In Proceedings of the 1st ACM SIGCOMM Workshop on Online Social Networks (pp. 7-12). New York: ACM.
Page, S. E. (2007). The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies. Princeton Univ. Press.
Panciera, K., Halfaker, A., & Terveen, L. (2009). Wikipedians are born, not made: a study of power editors on Wikipedia. In Proc. of the Intl. Conf. on Supporting Group Work (GROUP09) (pp. 51-60). New York: ACM Press.
Pentland, A. S. (2007). Automatic mapping and modeling of human networks. Physica A, 378, 59–67. doi:10.1016/j.physa.2006.11.046
Rashid, A. M., Ling, K., Tassone, R. D., Resnick, P., Kraut, R., & Riedl, J. (2006). Motivating participation by displaying the value of contribution. In Proc. of the ACM Conference on Human-Factors in Computing Systems (CHI 2006) (pp. 955-958). New York: ACM Press.
Redner, S. (1990). Random multiplicative processes: An elementary tutorial. American Journal of Physics, 58(3), 267–273. doi:10.1119/1.16497
Ren, Y., Kraut, R., & Kiesler, S. (2007). Applying common identity and bond theory to the design of online communities. Organization Studies, 28, 379–410.
Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental study of inequality and unpredictability in an artificial cultural market. Science, 311, 854–856. doi:10.1126/science.1121066
Shermer, M. (2006). The political brain. Scientific American, 295(1), 36. doi:10.1038/scientificamerican0706-36
Surowiecki, J. (2004). The Wisdom of Crowds: Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations. Doubleday.
Szabo, G., & Huberman, B. A. (2010). Predicting the popularity of online content. Communications of the ACM, 53, 80–88. doi:10.1145/1787234.1787254
Vazquez, A. (2003). Growing network with local rules: Preferential attachment, clustering hierarchy, and degree correlations. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 67, 056104. doi:10.1103/PhysRevE.67.056104
Vazquez, A., Oliveira, J. G., Dezso, Z., Goh, K.I., Kondor, I., & Barabasi, A.-L. (2006). Modeling bursts and heavy tails in human dynamics. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, 73, 036127. doi:10.1103/PhysRevE.73.036127
Wilkinson, D. M. (2008). Strong regularities in online peer production. In Proc. of the 2008 ACM Conference on E-Commerce (pp. 302-309).
Wu, F., & Huberman, B. A. (2007). Novelty and collective attention. Proceedings of the National Academy of Sciences of the United States of America, 104, 17599–17601. doi:10.1073/pnas.0704916104
Chapter 16
Context and Explanation in e-Collaborative Work Patrick Brézillon University Paris 6 (UPMC), France
ABSTRACT
In a face-to-face collaboration, participants use a large amount of contextual information to translate, interpret and understand others’ utterances, relying on contextual cues like facial expressions, voice modulation, movement of a hand, etc. Such a shared context constitutes the collaboration space of the virtual community. Explanation generation, on the one hand, reinforces the shared context and, on the other hand, relies on the existing shared context. The situation is more critical in e-collaboration than in face-to-face collaboration because new contextual cues must be used. This chapter presents the benefits of making context and explanation generation explicit in e-collaboration, and the new paradigms that then become available.
DOI: 10.4018/978-1-60960-040-2.ch016
INTRODUCTION
An important challenge for virtual communities is the development of new means for interaction, especially in collaborative work. Any collaboration supposes that each participant understands how others make a decision and the steps of their reasoning to reach the decision. In a face-to-face collaboration, participants use a large amount of contextual information to translate, interpret and understand others’ utterances, relying on contextual cues like facial expressions, voice modulation, movement of a hand, etc. All these contextual elements are essential in the determination of a shared context among virtual-community members, a shared context that constitutes the collaboration space of the virtual community. Explanation generation, which relies heavily on contextual cues (Karsenty and Brézillon, 1995), would play a more important role in e-collaboration than in face-to-face collaboration. Twenty years ago, Artificial Intelligence was considered the science of explanation (Kodratoff, 1987). However, few concrete results can be
reused from that time (e.g. see PRC-GDR, 1990). There are several reasons for that. The first concerns expert systems (and, later, knowledge-based systems) themselves and their past failures (Brézillon and Pomerol, 1997). The human expert who provided the knowledge feeding the expert system was excluded. The “interface” was the knowledge engineer asking the expert “If you face this problem, which solution do you propose?” The expert generally answered something like “Well, in the context A, I will consider this solution,” but the knowledge engineer retained only the pair {problem, solution} and forgot the initial triple {problem, context, solution} provided by the expert. The aim was to generalize in order to cover a large class of similar problems, whereas the expert was giving a local solution in a specific context. Now, we know that a system needs to acquire knowledge together with its context of use. On the other side, the user was excluded from the noble part of the problem solving because all the expert knowledge was supposed to be in the machine: the machine was considered the oracle and the user a novice (Karsenty and Brézillon, 1995). Thus, explanations aimed to convince the user of the rationale used by the machine, without regard to what the user knew or wanted to know. Now, we know that we need a user-centered approach (Brézillon, 2003). All the needed knowledge was supposed to be captured from the expert and put in the machine prior to the use of the system. However, the exception is rather the norm in expert diagnosis. Thus, the system was able to solve the 80% of most common problems, for which users did not need explanations, and could say nothing about the 20% that users did not understand. Now, we know that systems must be able to acquire knowledge incrementally, with its context of use, in order to address more specific problems of users.
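The difference between retaining a {problem, solution} pair and a {problem, context, solution} triple can be made concrete with a small data structure. The Python sketch below is purely illustrative: the class names, the representation of a context as a set of contextual elements, and the rule of preferring the most specific matching context are all assumptions made here, not part of the formalism presented in this chapter.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Practice:
    """One {problem, context, solution} triple as given by an expert."""
    problem: str
    context: frozenset
    solution: str

@dataclass
class KnowledgeBase:
    practices: list = field(default_factory=list)

    def acquire(self, problem, context, solution):
        """Incrementally add a solution together with its context of use."""
        self.practices.append(Practice(problem, frozenset(context), solution))

    def solve(self, problem, current_context):
        """Return the practice whose context is contained in the current
        situation, preferring the most specific one; None if none applies."""
        current = frozenset(current_context)
        candidates = [p for p in self.practices
                      if p.problem == problem and p.context <= current]
        if not candidates:
            return None
        return max(candidates, key=lambda p: len(p.context))
```

Because the context is stored with each solution, the returned practice also records why it was selected, and a situation that matches no stored context is reported as unknown rather than answered with an overgeneralized solution.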
Systems were unable to generate relevant explanations because they did not consider what the user’s question really was, and in which context the question was asked. The request for an explanation was analyzed on the basis of the information available to the system. Now, we know that the system must understand the user’s question and then build the answer jointly with the user. Thus, the three key lessons learned are: (1) KM (normally, knowledge management) stands for management of the knowledge in its context; (2) any collaboration needs a user-centered approach; and (3) an intelligent system must incrementally acquire new knowledge and learn the corresponding new practices. We present in (Brézillon, 2007) and (Brézillon and Brézillon, 2007) a context-based formalism for explaining concretely the differences, often cited but never clearly identified, between prescribed and effective tasks (Leplat and Hoc, 1983), procedures and practices (Brézillon, 2005), and logic of functioning and logic of use (Richard, 1983). Focusing on explanation generation, it appears that a context-based formalism for representing knowledge and reasoning allows the introduction of the end-user into the loop of the system development and makes it possible to generate new types of explanations. Moreover, such a formalism allows a uniform representation of elements of knowledge, of reasoning and of contexts. The rest of the chapter is organized as follows. First, we present the background of our proposal. This background comprises two parts: first, the consideration of explanations in knowledge-based systems; second, the relationships between explanation and context, and what context is (the general framework, the shared context, granularity of context). The following section introduces the contextual-graphs formalism and then presents different types of explanations in it. The section after that discusses a case study of collaborative answer building.
Context and Explanation in e-Collaborative Work
BACKGROUND

This section briefly introduces how the treatment of explanations evolved in expert systems and, later, in knowledge-based systems. In a second part, we show that, although a relationship between explanation generation and context was already clear, the lack of concrete work on context at that time (the end of the 1980s) seriously limited the interest of explanations in knowledge systems.
Explanations in Expert Systems and Knowledge-Based Systems

The first research on explanations started with rule-based expert systems. By imitation of human reasoning, presenting the trace of the expert system’s reasoning (i.e. the sequence of fired rules) was supposed to explain how the expert system reached a conclusion. This was true, but the explanations were generated at the implementation level. The next step was the use of canned texts, where “Firing of Rule_23 allows checking of rule_7” was replaced by something like “The available facts allow the failure to be identified on equipment piece B3, and this leads to checking whether it is a mechanical problem”. Explanations thus moved from the implementation level to a representation level. However, the logic behind the chaining of the rules (for example, why rule_7 is chosen first) remained hidden. An important reason, discovered later, is that part of the control knowledge was put into the inference engine implicitly by the knowledge engineer (for example, by imposing the order in which rules are checked). Thus, it was not possible to go one step higher (i.e. to a modeling level above the implementation and representation levels). It rapidly became clear that heuristics provided by human experts could not be explained without additional knowledge. It was then proposed to introduce a domain model. This was the second generation of expert systems, called the
knowledge-based systems. This approach also reached its limits, because it was difficult to know all the needed knowledge in advance and because models of the domain were not always available. The user’s role was limited to that of a data gatherer for the system. A second observation was that the goal of explanations is not to make the user’s reasoning and the system’s reasoning identical, but only to make them compatible: the user must understand the system’s reasoning in terms of his or her own mental representation. For example, a driver and a garage mechanic can reason differently and still reach the same diagnosis of the state of the car. The situation is similar in collaboration, where specialists from different domains and different geographical areas must interact in order to design a complex object. A third observation is that the relevance of explanation generation depends essentially on the context of use of the topic to explain (Karsenty and Brézillon, 1995; Abu-Hakima and Brézillon, 1995). Even if expert systems are now abandoned, some important results can still be reused, such as the base for new explanations proposed by Spieker (1991) and the qualities of relevant explanations established by Swartout and Moore (1993). Thus, beyond the need to make context explicit, first in the reasoning to explain and second in the explanation generation, the most challenging finding is that lines of reasoning and lines of explanation must be distinguished. Figure 1 illustrates the evolution of the research on explanation generation (Abu-Hakima and Brézillon, 1995). Figure 1(a) gives the initial view of explanation generation as a strict superposition of the lines of reasoning and explanation (the firing of Rule_23 allows rule_7 to be checked). Figure 1(b) represents the first evolution, corresponding to the introduction of domain knowledge, that is, knowledge that is not necessary for reasoning but is necessary for explanation. This was the first separation of the line of reasoning from the line of explanation. Figure 1(c) shows that the lines of reasoning and of explanation interact, and that providing an explanation may modify the line of reasoning. The line of explanation is developed during the line of reasoning rather than produced after the system has finished reasoning. This corresponds to the collective building of a shared context jointly with problem solving. Thus, the key problem in providing relevant explanations is to find a uniform representation of elements of knowledge, of reasoning and of context.

Figure 1. Line of reasoning versus line of explanation (Abu-Hakima and Brézillon, 1995)
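The first two kinds of explanation described in this section (the raw trace of fired rules, then canned texts) can be illustrated with a toy forward-chaining loop. The sketch below is an assumption-laden illustration, not a reconstruction of any historical system: the rule names echo the chapter's Rule_23 and rule_7 example, while the fact names and the `forward_chain` function are hypothetical.

```python
# Hypothetical two-rule base; the fact names are invented for illustration.
RULES = {
    "rule_23": {
        "if": {"failure_on_B3"},
        "then": "check_mechanical",
        "text": "The available facts identify a failure on equipment piece B3.",
    },
    "rule_7": {
        "if": {"check_mechanical"},
        "then": "mechanical_problem",
        "text": "This leads to checking whether it is a mechanical problem.",
    },
}


def forward_chain(initial_facts):
    """Fire rules until nothing new is derived; return the facts and the trace."""
    facts = set(initial_facts)
    trace = []
    changed = True
    while changed:
        changed = False
        # The checking order is simply the dictionary order: control knowledge
        # hidden in the engine, which neither kind of explanation can show.
        for name, rule in RULES.items():
            if rule["if"] <= facts and rule["then"] not in facts:
                facts.add(rule["then"])
                trace.append(name)
                changed = True
    return facts, trace


facts, trace = forward_chain({"failure_on_B3"})

# Implementation level: the sequence of fired rules.
implementation_level = " -> ".join(trace)
# Representation level: canned texts attached to the same rules.
representation_level = " ".join(RULES[name]["text"] for name in trace)

print(implementation_level)
print(representation_level)
```

Both explanations follow the line of reasoning step by step, which corresponds to the strict superposition of Figure 1(a); nothing in the trace says why the rules were checked in that order.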
Explanations and Contexts

A frequent confusion between the representation and the modeling of knowledge and reasoning obscures the fact that explanations are provided in a given representation formalism, and that their relevance depends on what can be expressed in this formalism. For example, the formalism of ordinary linear differential equations will never allow the self-oscillating behavior of a nonlinear system to be expressed, and thus explained. The choice of representation formalism is therefore a key factor in generating relevant explanations for the user, and is of paramount importance in collaboration involving different users and several tasks. A second condition is to account for, make explicit, and model the context in which knowledge can be used and reasoning held. This concerns the necessary distinction between data, information and knowledge. For example, a temperature of 24°C (the datum) is considered hot in Paris in winter, when the temperature is normally around 0°C (the “French information”), and cold in Rio de Janeiro, where the temperature is rather around 35°C at the same time of year (the “Brazilian information”). Thus, knowledge must be considered within its context of use in order to provide relevant explanations, for instance to explain to a person living in Paris why a temperature of 24°C could be considered cold in some other countries. Temperature = 24°C is a datum. A process of interpretation leads to a piece of information (hot or cold). Information is data with meaning, built on the basis of the knowledge that the person possesses. This knowledge is specific to a person and constitutes the context in which that person evaluates (and eventually integrates) pieces of information in his or her mental representation. More precisely, it is the part of the knowledge that the person finds more or less related to the information. It corresponds to a mental representation that the person has built from experience in order to give meaning to the information and eventually integrate it into the body of contextual knowledge already available. When information cannot be related fully to the mental representation, an explanation is required to make explicit the links between the information and the person’s contextual knowledge. We come back to this point below. There is now a consensus around the following definition: “context is what constrains reasoning without intervening in it explicitly” (Brézillon and Pomerol, 1999). It also applies to e-collaboration (although with more complex constraints), where reasoning is developed collectively. Explanation generation is a means of developing a shared context among the actors, so that they better understand the others’ reasoning (and their own), need less communication and interact faster. Our previous work on context has led to several conclusions.
First, a context is always relative to something that we call the (current) focus of attention of the actors. Second, with respect to this focus, context is composed of
external knowledge and contextual knowledge. The former has nothing to do with the current focus (but could be mobilized later, once the focus moves), whereas the latter is more or less directly related to the focus (at least for some actors). Third, actors address the current focus by extracting a subset of contextual elements, then assembling and structuring them in a proceduralized context, which is a kind of “chunk of contextual knowledge” (in the spirit of the “chunk of knowledge” of Schank, 1982). Fourth, as the focus evolves, the status of the knowledge (external, contextual, or part of the proceduralized context) evolves too. Thus, there is a dynamics of context that plays an important role in the quality of explanations. Because context exists together with the knowledge, context-based generation of explanations does not require additional effort: the explanatory knowledge is integrated into the knowledge representation when the knowledge is acquired and the reasoning is represented (see Brézillon, 2005, on this aspect). However, this presupposes a context-based formalism that represents elements of knowledge, of reasoning and of contexts in a uniform way.
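The temperature example used earlier in this section (one datum, two different pieces of information) can be written as a small interpretation function. This is a minimal sketch: the function name `interpret` and the 5°C tolerance are assumptions introduced for illustration, while the datum and the usual temperatures are those quoted in the text.

```python
def interpret(temperature_c: float, usual_c: float, tolerance_c: float = 5.0) -> str:
    """Turn a datum into information relative to what the observer is used to.

    The tolerance is an arbitrary illustrative threshold.
    """
    if temperature_c > usual_c + tolerance_c:
        return "hot"
    if temperature_c < usual_c - tolerance_c:
        return "cold"
    return "normal"


datum = 24.0  # the datum: temperature = 24°C

# Usual temperatures quoted in the text, standing in for each observer's
# contextual knowledge.
usual = {"Paris": 0.0, "Rio de Janeiro": 35.0}

information = {city: interpret(datum, usual_c) for city, usual_c in usual.items()}
print(information)  # {'Paris': 'hot', 'Rio de Janeiro': 'cold'}
```

The datum never changes; only the contextual knowledge used to interpret it does, which is why an explanation has to expose that knowledge to an observer who does not share it.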
WHAT IS CONTEXT?

A Conceptual Framework for Modeling Context

One of our aims is to take context into account. There are many definitions of context, but we refer to that of Brézillon and Pomerol (1999), who consider context as the sum of two types of knowledge: the part of the context that is relevant at the current step of the answer building, and the part that is not. The former is called contextual knowledge, and obviously depends on the decision maker and on the decision at hand. The latter is called external knowledge and comes from different sources,
such as knowledge that the participant has but leaves implicit with respect to the current focus, knowledge unknown to the participant (outside his or her competence), the contextual knowledge of other actors in a team, etc. Here, the focus acts as a discriminating factor between external and contextual knowledge. However, the frontier between external and contextual knowledge is porous and moves as the focus progresses. In our view, context is what surrounds a focus (e.g. the decision making process or the task at hand) and gives meaning to the items related to the focus. On the one hand, context guides the focus of attention, i.e. the subset of common ground that is pertinent to the current task. Indeed, context acts more on the relationships between the items in the focus than on the items themselves, modifying their extension and surface. On the other hand, the focus makes it possible to identify the relevant elements to consider in the context. It specifies what must be contextual knowledge and what must be external knowledge at a given step. For example, a focus on the driving task mobilizes contextual knowledge such as knowing the meaning of the traffic signs, having learned how to drive, etc., i.e. knowledge that could eventually be used when the focus evolves. Some knowledge from the driver’s personal context could also be considered, such as previous experience of the driving task. This corresponds, for example, to the choice of a specific method at a given step of a task. To handle a driving situation, a driver has several solutions, e.g. several behaviors for crossing an intersection. Some contextual elements are considered explicitly, say for the selection of the behavior, and can thus be considered part of the way in which the problem is solved at that step. A subset of the contextual knowledge is proceduralized to address the current focus specifically. We call it the proceduralized context.
The proceduralized context is a sub-set of contextual knowledge that is invoked, assembled, organized, structured and situated according to the given focus
and is common to the various people involved in decision making. A proceduralized context is quite similar, in spirit, to the chunk of knowledge discussed in SOAR (Schank, 1982) and, in the way it is built, to Clancey’s view (1992) of diagnosis as the building of a situation-specific model. A proceduralized context is like a local model that accounts for a precise goal in a specific situation (at a given step). In a distinction reminiscent of cognitive ergonomics (Leplat and Hoc, 1983), we could say that contextual knowledge is useful for identifying the task at hand, whereas the proceduralized context is relevant for characterizing the task realization, i.e. the activity. An important issue is the passage of elements from contextual knowledge to a proceduralized context. This proceduralization process, which depends on the focus on a task, is task-oriented, just like know-how, and is often triggered by an event or primed by the recognition of a pattern. It provides a consistent explanatory framework for anticipating the results of a decision or an action. This consistency is obtained by reasoning about causes and consequences, and particularly about their relationships in a given situation. Thus, we can separate the reasoning into diagnosing the real context and anticipating the follow-up (Pomerol, 2001). The second step requires conscious reasoning about causes and consequences. Brézillon and Brézillon (2007) discuss a second type of proceduralization, namely the instantiation of contextual elements. This means that the contextual knowledge, or background knowledge, needs further specification to fit the decision making at hand perfectly. The precision and specification brought to the contextual knowledge are also part of the proceduralization process that leads from contextual knowledge to the proceduralized context. For each instantiation of a contextual element, a particular action is executed.
There are as many actions as there are different instantiations. However, once the corresponding action is executed, the instantiation no longer matters, and the contextual element leaves the proceduralized context and goes back into the contextual knowledge. For example, on arriving at a crossroads, a driver looks at the traffic light. If the light is green, the driver decides to cross. The instantiation of the contextual element “traffic light” (green signal) has guided the decision making process, and the decision is then made. The color of the traffic light does not matter once the decision is made. Figure 2 illustrates our view of context for one person. Contextual knowledge is more or less what people generally have in mind when they use the term ‘context’. Contextual knowledge is personal to an agent and has no clear limit (the infinite dimension of context for McCarthy, 1993). It is evoked by situations and events, and is loosely tied to a task or a goal. When the task becomes more precise, a large part of this contextual knowledge can be proceduralized according to the current focus of the answer building. Although contextual knowledge exists in theory, it is actually implicit and latent, and is not usable unless a goal (or an intention) emerges. When an event occurs, the attention of the actor is focused and a part of the contextual knowledge is proceduralized. Contextual knowledge stays backstage, whereas the proceduralized context is front-stage, in the spotlight.
Figure 2. The three types of context
Moreover, context should rather be considered as a status of knowledge (external knowledge, contextual knowledge or proceduralized context) linked to the focus of attention. Context has a dynamic dimension that corresponds to a movement between contextual knowledge and a proceduralized context as the focus of attention evolves (i.e. as the decision making process progresses). From one step to the next, either a piece of contextual or external knowledge enters the proceduralized context, or the proceduralized context, once used in the current focus, moves into the contextual knowledge of the actor as the focus evolves. Participants face the problem of organizing and structuring contextual knowledge to transform it into a relevant proceduralized context for their answer-building process. This movement between contextual knowledge and the proceduralized context takes place inside the individual context of each participant. Here, any explanations are for the explainer himself or herself.
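The idea of context as a status of knowledge that changes with the focus can be sketched as a small state tracker, using the traffic-light example from this section. Everything named here (`Status`, `Actor`, the method names, the "red" branch and the unrelated "newspaper deadline" item) is a hypothetical illustration, not part of the authors' software.

```python
from enum import Enum


class Status(Enum):
    EXTERNAL = "external knowledge"
    CONTEXTUAL = "contextual knowledge"
    PROCEDURALIZED = "proceduralized context"


class Actor:
    """Tracks the status of each piece of knowledge relative to the focus."""

    def __init__(self, knowledge):
        # Without a focus, nothing is contextual yet.
        self.status = {piece: Status.EXTERNAL for piece in knowledge}
        self.values = {}

    def set_focus(self, related):
        """A new focus makes related pieces contextual and the rest external."""
        for piece in self.status:
            self.status[piece] = (
                Status.CONTEXTUAL if piece in related else Status.EXTERNAL
            )
        self.values.clear()

    def instantiate(self, piece, value):
        """Instantiation moves a contextual element into the proceduralized context."""
        if self.status[piece] is not Status.CONTEXTUAL:
            raise ValueError(f"{piece!r} is not contextual for the current focus")
        self.status[piece] = Status.PROCEDURALIZED
        self.values[piece] = value

    def act(self, piece, actions):
        """Select the action for the instantiation, then release the element."""
        action = actions[self.values.pop(piece)]
        # Once the action is chosen, the value no longer matters.
        self.status[piece] = Status.CONTEXTUAL
        return action


driver = Actor(["traffic light", "traffic signs", "newspaper deadline"])
driver.set_focus({"traffic light", "traffic signs"})  # focus: crossing the crossroads
driver.instantiate("traffic light", "green")
# The "red" branch is an assumption added to make the choice visible.
decision = driver.act("traffic light", {"green": "cross", "red": "stop"})
print(decision)  # cross
```

After the decision, the element "traffic light" is contextual again and its value has been dropped, mirroring the statement that the color of the light does not matter once the decision is made.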
Shared Context and Proceduralized Context

The construction of the proceduralized context from contextual knowledge is often a process of communication in a work group. Figure 3 represents how the proceduralized context is built from contextual knowledge during the interaction between two participants. The shared context contains proceduralized pieces of knowledge in the focus of attention of the two participants. These pieces of knowledge are extracted from the contextual knowledge of each participant, are jointly structured by the two participants, and result in shared knowledge. For example, the first utterance of a participant gives a rule such as “Stop at the next station if the alarm signal is triggered”. Then, at the request of the second participant, the first one may add some pieces of knowledge related to this first utterance. If this knowledge chunk belongs to the common part of the participants’ contextual knowledge, the pieces are integrated into a mutually acceptable knowledge structure and are then moved to the proceduralized context. Here, the co-building of the proceduralized context implies, first, that the participants’ interpretations are made compatible and, second, that the proceduralized context will later enrich their shared contextual knowledge, thanks to explanations. Thus, during the interaction process, ties between the participants of a decision group are reinforced, and this will affect the constitution of new decision groups in the future. The proceduralized context contains all the pieces of knowledge that have been discussed and accepted (or at least made compatible) by all the participants. The proceduralized context becomes part of the shared contextual knowledge of each participant again once it leaves the focus of the interaction context. Later, this previously proceduralized chunk of knowledge may be recalled, like any piece of contextual knowledge, to be integrated into a new proceduralized context. Thus, the more experienced a participant is, the more structured knowledge he or she has available. This is very similar to the externalization process in the sense given by Nonaka and Takeuchi (1995).

Figure 3. A representation of the interaction to build the proceduralized context
Let us also note that the proceduralized context can be shaped in procedures whether implicit or explicit. In other words, parts of contextual knowledge are compiled into short cuts or
implicit procedures as a result of learning. The building of the proceduralized context thus appears as a kind of contextualization for the current focus. The focus and its context must therefore be considered jointly in order to optimize interaction among participants. The previous example of joint proceduralization shows that, although the proceduralization process is primarily subjective, it can also be shared and can result in a common context in communities that share the same background and expertise. The shared contextual knowledge is built by interaction among participants. It constitutes a reference for the actors, like the “Référentiel Opératif commun” discussed in Leplat and de Terssac (1990). The more developed the shared context is, the more efficient the decision group will be.
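As a rough illustration of the co-building described in this section, the contextual knowledge of two participants can be modeled as sets of knowledge pieces. This is a deliberate simplification under stated assumptions: real knowledge is structured rather than a flat set, and the function names and the pieces other than the quoted rule are hypothetical.

```python
def co_build(contextual_a, contextual_b, proposed):
    """Split proposed pieces into those both participants already hold
    and those that still need an explanation."""
    common = contextual_a & contextual_b
    accepted = proposed & common
    needs_explanation = proposed - common
    return accepted, needs_explanation


def explain(contextual, pieces):
    """Simplification: an explanation adds the pieces to the listener's
    contextual knowledge."""
    return contextual | pieces


rule = "stop at the next station if the alarm signal is triggered"
reason = "a triggered alarm may mean a passenger is at risk"  # illustrative piece

first = {rule, reason, "stations are staffed"}
second = {rule, "stations are staffed"}
proposed = {rule, reason}

accepted, pending = co_build(first, second, proposed)
second = explain(second, pending)              # the first participant explains
proceduralized, still_pending = co_build(first, second, proposed)

print(sorted(pending))         # the piece that required an explanation
print(sorted(proceduralized))  # both pieces are now shared and usable
```

The second call shows the effect claimed in the text: after the explanation, the shared contextual knowledge is larger, so the same proposal is accepted without further discussion.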
Granularity of Context

In the previous section, the individual contexts and the shared context do not have the same granularity. As said previously, it is more convenient to consider context as a status of knowledge (external or contextual knowledge, and temporarily part of a proceduralized context), and information as what is transferred between contexts. For the case study described below, we distinguish the group context at a general level, the individual contexts of the participants at an intermediate level, and, at the finest level, the context of the project on which the participants in the group collaborate. Figure 4 illustrates the situation. According to our view of context, the contextual knowledge at one level of granularity is transformed into a proceduralized context at the next finer level. For example, a piece of contextual information in the group context could be “find a compromise between relevant information for the readers of the newspaper and the notoriety of the sponsors of the newspaper.” This contextual knowledge in the group context will be interpreted, in the individual context of the people writing the article, as a proceduralized context such as “provide the information without direct links with the sponsors”. In Figure 4, the individual contexts concern individuals and the group context concerns, say, a firm. Now, the firm evolves in an arena (e.g. a market) in which it must fight and survive among other firms. At this level, the individual contexts concern the firms and the interaction context corresponds to a market. This means that the two views of a firm (the internal view, with individuals, and the external view, with the other firms) are strongly related: the more coherent the internal view, the more powerful the firm will be externally.

Figure 4. Granularity and dynamic of context
AN EXPLANATION TYPOLOGY IN CONTEXTUAL GRAPHS

The Formalism of Representation

The development of our conceptual framework led to the implementation of Contextual Graphs, a formalism that allows a uniform representation of elements of knowledge, of reasoning and of contexts. Using this representation formalism, we then come back to the types of explanation that can be
generated in contextual graphs because “explanatory knowledge” is a natural part of the knowledge in knowledge systems. A key point here is that Contextual Graphs is a representation formalism, like workflows, Petri nets, Bayesian nets, etc. The main difference is that Contextual Graphs is a user-centered formalism (Brézillon, 2003): any user (e.g. a psychologist) needs less than one minute to learn and use the software (freely available at http://www.cxg.fr). A contextual graph represents the different ways to solve a problem. It is a directed acyclic graph with one input, one output and a general spindle structure (Brézillon, 2005). Figure 6 gives an example of a contextual graph. A path in a contextual graph corresponds to a specific way (i.e. a practice) of solving the problem represented by the contextual graph. It is composed of elements of reasoning and of contexts, the latter being instantiated on the path followed (i.e. the values of the contextual elements are required to select a branch, that is, one element of reasoning among several). Figure 5 provides the definition of the elements in a contextual graph. A more complete presentation of this formalism and its implementation can be found in (Brézillon, 2005). The elements of a contextual graph are actions, contextual elements, sub-graphs, activities and temporal branchings. An action is the building block of contextual graphs. We call it an action, but it would be better to consider it as an elementary task. An action can appear on several paths.
This leads us to speak of instances of a given action, because an action that appears on several paths in a contextual graph is considered each time in a specific context. A contextual element is a pair of nodes: a contextual node and a recombination node. A contextual node has one input and N branches [1, N], corresponding to the N instantiations of the contextual element already encountered. The recombination node is [N, 1]. It expresses the fact that, once the part of the practice on the branch between the contextual and recombination nodes has been executed for a given instantiation of the contextual element, this instantiation no longer matters, because we no longer need to differentiate states of affairs with respect to this value. The contextual element then leaves the proceduralized context and is (globally) considered to go back to the contextual knowledge. A sub-graph is itself a contextual graph. It is a way to decompose a part of the task in different ways according to the context and the different methods available. In contextual graphs, sub-graphs are mainly used to obtain different displays of the contextual graph on the graphical interface, through mechanisms of aggregation and expansion like those of conceptual graphs (Sowa, 2000). An activity is a particular sub-graph (and thus also a contextual graph in itself) that is identified by participants because it appears on different paths
Figure 5. Elements of a contextual graph
Figure 6. Contextual Graph of the different collaborative building processes
and/or in several contextual graphs. This recurring sub-structure is generally considered as a complex action. An activity is a kind of contextualized task. It is similar to a scheme as considered in cognitive ergonomics (Leplat and Hoc, 1983). Each scheme organizes the activity around an object and can call other schemes to complete specific sub-goals. A temporal branching expresses the fact that several groups of actions must be accomplished, that the order in which the groups are considered is not important (they could even be carried out in parallel), but that all the actions must be accomplished before continuing. It thereby reduces the complexity of the representation. The temporal branching is for context what activities are for actions (i.e. complex actions). This item expresses a problem of representation at a lower granularity. For example, the activity “Make train empty of travelers” in the SART application (Pomerol et al., 2002) concerns both the damaged train and the helping train. It does not matter whether travelers are evacuated first from the damaged train, first from the helping train, or from both in parallel. This operation is at too low a level with respect to the general task “Return back rapidly to a normal service”, and would otherwise have to be detailed as three parallel paths leading to the same subsequent sequence of actions. Mechanisms of aggregation and expansion provide different local views of a contextual graph at different levels of detail, by aggregating a sub-graph into an item (a temporary activity) or expanding it. This representation is used to record the practices developed by users, who are thus responsible for some paths in the contextual graph, or at least for some parts of them.
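The two basic elements described in this section, actions and contextual elements (a contextual node with one branch per instantiation, closed by a recombination node), are enough to sketch how the paths of a contextual graph correspond to practices. The code below is a minimal, hypothetical data structure written for this illustration; it is not the implementation distributed by the authors, and it leaves out sub-graphs, activities and temporal branchings. The crossroads graph reuses the traffic-light example, with a "red" branch added as an assumption.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Action:
    """Elementary task; the same action may appear on several paths."""
    name: str


@dataclass
class ContextualElement:
    """Contextual node plus recombination node: one branch per instantiation."""
    name: str
    branches: dict  # instantiation value -> list of items on that branch


def practices(items):
    """Enumerate every path (practice) through a sequence of graph items.

    An instantiated contextual element is recorded as 'name=value', so each
    path carries the context in which its actions are executed.
    """
    per_item = []
    for item in items:
        if isinstance(item, Action):
            per_item.append([[item.name]])
        else:
            options = []
            for value, branch in item.branches.items():
                for sub_path in practices(branch):
                    options.append([f"{item.name}={value}"] + sub_path)
            per_item.append(options)
    # One choice per item, concatenated in order: the branches recombine
    # after each contextual element.
    return [sum(combo, []) for combo in product(*per_item)]


graph = [
    Action("approach the crossroads"),
    ContextualElement("traffic light", {
        "green": [Action("cross")],
        "red": [Action("stop"), Action("wait for green"), Action("cross")],
    }),
    Action("continue the journey"),
]

paths = practices(graph)
for path in paths:
    print(" > ".join(path))
```

The action "cross" appears on both paths but in two different contexts, which is what the text calls two instances of the same action; after the recombination node, both paths continue with the same action and the value of the traffic light is no longer needed.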
A CASE STUDY

How can collaboration improve document comprehension? Starting from the C/I comprehension
model developed by Kintsch (1998), Brézillon et al. (2006) set up a series of experiments to test whether the ideas evoked during a prior collaborative situation can affect comprehension processes, and at which representation levels this may occur. The hypothesis was that collaboration directly affects the construction of the situation model. To test this hypothesis, Brézillon et al. (2006) built an experimental design in two phases: (1) a collaboration phase, and (2) a comprehension phase (reading and questionnaire). In the comprehension phase, the authors ran several experiments (with an eye-tracking technique) in which participants had to read a set of texts that varied both semantically and in layout. The general purpose was to correlate the verbal interactions occurring during the collaboration with the behavioral data (eye movements and correct answers to questions) recorded during reading. Here, we only discuss the modeling, in the Contextual Graphs formalism, of the collaborative verbal exchanges between two participants. The goal was to build an efficient task model that would be closer to the effective task(s) than to the prescribed task. Such a “contextualized prescribed task” is possible thanks to a formalism that allows a uniform representation of elements of decision and of contexts. This study has two side effects: first, the need to make explicit the shared context for building the answer and, second, the relative positions of cooperation and collaboration. The shared context is the common background from which the two participants in the experiments collaboratively build the answer to questions such as “How does the oyster make pearls?” (The expected answer is “A pearl arises from the introduction of a little artificial stone inserted into the oyster’s sexual gland. The oyster neutralizes the intruder, the stone, by surrounding it with the pearl sac. Once closed, this pearl sac secretes the pearl material: mother-of-pearl”.) The quality of the answer depends essentially on the richness of the shared context.
The building of this shared context is a step of the process that we study. Even if one of the participants knows the answer, s/he tries to build this shared context, and the answer building is thus enriched with the generation of an explanation for the other participant. Our goal was to provide a representation of the different ways to build an answer according to the context of the question. In this view, the context of the question is the shared context into which each participant introduces contextual elements from his/her individual context. In a collaborative decision making process, such a shared context must be built. The shared context contains contextual elements on which participants agree, possibly after a discussion and an illustration. A subset of this shared context is then organized, assembled and structured to build the answer. The result of this answer building is a proceduralized context (Brézillon, 2005). In this chapter, we put these results in the larger framework of collaborative decision making, which distinguishes a procedure from the different practices, the prescribed task from the effective task, the logic of functioning from the logic of use, etc. A practice is regarded as a contextualization of a procedure. Thus, our goal was to analyze how an answer is built, its basic contextual elements, and the different ways to assemble these elements. The answer building is modeled with contextual graphs. The main results that we obtained were the following. Two models were built: the dialog model and the collaborative answer-building model. The dialog model contained 4 phases:

• E1. Reformulate the question
• E2. Find an example
• E3. Gather domain knowledge (collection)
• E4. Build the answer from either characteristics or explanatory elements (integration)
For each pair of participants and for each question, we looked for the ordering of the 4 phases
Table 1. Different mean values for phases E1 to E4: frequencies in the collaboration (Col. 1), range of occurrences (Col. 2), and frequencies of occurrences (Col. 3)

Phase   Collaboration   Range   Frequencies
E1      1               1.27    70
E2      10              2.05    58
E3      120             1.98    133
E4      71              1.77    129
and which phase is a collaboration phase. Results are presented in Table 1. For example, column 1 indicates that collaboration occurred mostly in phase E3 (i.e. gathering domain knowledge to constitute the shared context) and rarely in phase E1 (reformulation of the question). Column 2 shows that phase E1 appeared mostly at the beginning of the exchange and phase E2 (find an example) at the end. Column 3 reveals that phases E3 and E4 (construction) are the phases most frequently carried out in the exchange. Furthermore, collaboration appeared most often at the beginning of exchanges. See (Brézillon et al., 2006) for more details. We obtain in this way a typology of explanations in the collaborative building of answers. The typology classifies whether the answer has been given and the granularity of this answer. We thus distinguish:

(a) Answer required at the right granularity
(b) Answer required but at a superficial level
(c) Answer required but too detailed
(d) Partial answer
(e) Answer partially false
(f) False answer
(g) No answer

The granularity of the answer depends on the degree of development of the shared context.
The Collaborative Building Model of the Case Study

The contextual graph of the collaboration model is represented in Figure 6, and the activity (the pink oval) is detailed in Figure 7. The collaboration model is composed of 4 paths:

Path 1: Neither partner knows the answer.
Path 2: Neither partner knows the answer, but each has elements of explanation.
Path 3: Co-building of the answer.
Path 4: One of the partners knows the answer exactly and provides it.

Interestingly, results show that when participants collaborated by co-building the answer (Path 3), they mostly gave the correct answer, either at a superficial level (b) or as a partial answer (d). When either Path 2 (elements of the answer) or Path 4 (one-way knowledge) was used, no difference in the type of answers emerged.

Figure 7. Details of the activity “Exemplify” represented by ovals in Figure 6
Path 1: No Knowledge about the Answer

Neither participant knows the answer. They have no elements of the answer at all. However, they try to utter some rough ideas (an example, a parallel with a known topic) in order to trigger a constructive reaction from the other.

Path 2: Elements of the Answer

Neither participant knows the answer, but both think they have elements for generating an explanation. Generally, one participant leads the interaction by proposing elements or asking the other questions. Explanation generation is a kind of justification or validation, for themselves, of their general understanding of the question, without trying to build an answer.

Path 3: Two-Way Knowledge

Both participants have a partial view of the answer, know some of the elements of the answer, and try to assemble them with the elements provided by the other. They have the same position in the answer building, and there is no need for explanations between them or for an external observer. This is a situation of maximal cooperation. However, without external validation, the quality of the answer is rather variable.

Path 4: One-Way Knowledge

One of the participants knows the answer exactly, provides it immediately and spontaneously, and then spends his/her time explaining it to the other participant. Here the cooperation is unidirectional, like the information flow. Indeed, we can expect a relatively continuous spectrum between the path where one participant knows the answer exactly (Path 4) and the situation where none of the participants knows it (Path 1).
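The four paths can be read as a simple decision over what each participant brings to the question. The following Python sketch illustrates that reading under stated assumptions: the `Knowledge` levels, the function name `collaboration_path`, and the treatment of mixed cases (assigned here to Path 2) are illustrative choices, not part of the contextual graph of the study.

```python
from enum import IntEnum


class Knowledge(IntEnum):
    """Hypothetical coding of what one participant brings to a question."""
    NOTHING = 0               # no element of the answer at all
    EXPLANATION_ELEMENTS = 1  # elements for an explanation, but no answer
    PARTIAL_ANSWER = 2        # some elements of the answer itself
    FULL_ANSWER = 3           # knows the answer exactly


def collaboration_path(a: Knowledge, b: Knowledge) -> int:
    """Return the path number (1 to 4) suggested by two knowledge levels."""
    if Knowledge.FULL_ANSWER in (a, b):
        return 4  # one-way knowledge: one participant provides the answer
    if a == b == Knowledge.PARTIAL_ANSWER:
        return 3  # two-way knowledge: co-building of the answer
    if a == b == Knowledge.NOTHING:
        return 1  # no knowledge about the answer
    # Assumption: every remaining (mixed) case is treated as Path 2.
    return 2


print(collaboration_path(Knowledge.PARTIAL_ANSWER, Knowledge.PARTIAL_ANSWER))
```

A discrete coding of this kind hides the continuous spectrum between Path 1 and Path 4 mentioned above, so it should be taken only as a reading aid.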
An Explanation Typology Based on Contextual Graphs

We established a typology of explanations, based on previous work and exploiting the capabilities of contextual graphs. When a new practice is added, several pieces of contextual information are recorded automatically (date of creation, creator, parent practice), and others are provided by the participant himself, such as a definition of and comments on the item that is introduced. Such contextual information is exploited during explanation generation. Thus, the richness of the contextual-graph formalism leads to expressiveness, first, of the knowledge and reasoning represented and, second, of the explanations addressing different participants’ requirements. The main categories of explanations identified in contextual graphs are:

Visual explanations. They correspond to a graphical presentation of a set of complex information generally associated with the evolution of an item, e.g. the contextual graph itself, the decomposition of a given practice, the series of changes introduced by a given participant, regularities in contextual graphs, etc. Note that we are dealing with contextual graphs with an “experience-based” knowledge base.

Dynamic explanations. They correspond to the progress of the answer building during a simulation addressing questions such as the “What if” question. With the mechanisms of aggregation and expansion, a participant can ask for an explanation in two different contexts and thus receive two explanations with different presentations (e.g. with the details of what an activity is doing in one of the two explanations). The dynamic nature of the explanation is also related to the fact that items are not introduced chronologically in a contextual graph. For example, in Figure 6, the contextual element 15 (“Need to justify?”) was added after (1) the action 16 (“Cite elements of the answer”) and (2) finding a situation where both explainer and explainee know all the elements, so that there is no need to justify. Finally, the proceduralized context along a practice is an ordered series of instantiated contextual elements, and changing the instantiation of one of them means changing the practice and thus changing the explanation.

User-based explanations. Since the participant is responsible for some practice changes in the contextual graph, the system uses this information to tailor its explanation by detailing the parts unknown to the participant and summing up the parts developed by the participant. Such an explanation allows the author of a practice to identify the contextual elements that he had not taken into account initially and that have been introduced by other participants.

Context-based explanations. The definition of the proceduralized context (an ordered sequence of instantiated contextual elements) shows that a given item (say the activity “Exemplify” represented by an oval in Figure 6) on different branches of the contextual graph appears in different contexts. This means that the explanation of the activity on any branch will be different from the explanations on the two other branches. We exploit this finding in our driver-modeling application for representing “good” and “bad” behaviors of car drivers on a unique contextual graph (Brézillon and Brézillon, 2008). Thus a relevant explanation relies heavily on the building of the proceduralized context (different for each item, such as different instances of the same activity), and because the contextual graph can be incrementally enriched, explanations can become richer as well.

Micro- and macro-explanations. Again, with the mechanisms of aggregation and expansion, it is possible to generate an explanation at different levels of detail. For a complex item such as an activity (or any other sub-graph), it is possible to provide a micro-explanation by using an internal viewpoint based on the activity components. A macro-explanation from an external viewpoint is built with respect to the location of the activity in the contextual graph, as for any item, similarly to a context-based explanation as discussed above. This makes it possible to provide (at least) two different types of explanation of the activity “Exemplify”: at the macro level for the explainer and at the micro level for the explainee. Note that the explainer may also ask for a micro-explanation in case of doubt about the explainee’s understanding. This twofold explanation is linked to the notion of activity, but can be used by any participant through aggregation and expansion of local sub-graphs of parts of the whole contextual graph.

Real-time explanations. There are three types of such explanations. First, the explanation is asked for during an answer building when the system fails to match the participant’s practice with its recorded practices (e.g. a new explainer may decide to provide a personal experience as an example not considered in Figure 7). Then, the system needs to acquire new knowledge incrementally and to learn the corresponding practice developed by the participant (generally due to specific values of contextual elements not taken into account before). This is an explanation from the participant to the machine. Second, the participant wishes to follow the reasoning of a colleague who solved the problem with a new practice (and then we are back to simulation). Third, the system tries to anticipate the participant’s reasoning from its contextual graph and provides the user with suggestions and explanations while the user is operating. These suggestions and explanations rely on the contextual elements that are explicitly considered in the contextual graph. Note that it is because the system fails to represent a user’s practice that the user explains the new practice to the system by introducing new knowledge, knowledge that the system can reuse afterwards.

Moreover, these different types of explanation (and others that we are discovering progressively) can be combined in different ways, such as visual and dynamic explanations.
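The notion of a proceduralized context as an ordered series of instantiated contextual elements can be made concrete with a small sketch. The Python code below is a minimal illustration under stated assumptions, not the Contextual Graphs software: the classes `Action` and `ContextualNode`, the function `follow_practice`, the toy graph, and the action "Give the answer" are hypothetical; only the labels "Need to justify?" and "Cite elements of the answer" come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union


@dataclass
class Action:
    name: str


@dataclass
class ContextualNode:
    element: str  # the contextual element, phrased as a question
    # One branch (a sequence of nodes) per possible instantiation.
    branches: Dict[str, List["Node"]] = field(default_factory=dict)


Node = Union[Action, ContextualNode]


def follow_practice(
    graph: List[Node], situation: Dict[str, str]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Walk the graph for one situation.

    Returns the practice (ordered actions) and the proceduralized context
    (ordered pairs of contextual element and instantiation).
    """
    actions: List[str] = []
    context: List[Tuple[str, str]] = []
    for node in graph:
        if isinstance(node, Action):
            actions.append(node.name)
        else:
            value = situation[node.element]  # instantiate the element
            context.append((node.element, value))
            sub_actions, sub_context = follow_practice(node.branches[value], situation)
            actions.extend(sub_actions)
            context.extend(sub_context)
    return actions, context


# Toy graph (hypothetical): justify only when it is needed.
toy_graph: List[Node] = [
    ContextualNode("Need to justify?", {
        "yes": [Action("Cite elements of the answer")],
        "no": [],
    }),
    Action("Give the answer"),
]

print(follow_practice(toy_graph, {"Need to justify?": "yes"}))
print(follow_practice(toy_graph, {"Need to justify?": "no"}))
```

Changing the instantiation of the single contextual element changes both the practice and its proceduralized context, which is the property that context-based explanations rely on. In this sketch an instantiation with no recorded branch raises a `KeyError`, which loosely corresponds to the case where the system fails to match a practice and has to acquire a new one.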
Lessons Learned on the Case Study

Cooperation and collaboration are two ambiguous notions that have different meanings across domains, and sometimes from one author to another. The difference between cooperation and collaboration seems related to the sharing of the participants’ goal in the interaction. In cooperation (co-operation), each participant aims at the same goal and the task is divided into sub-tasks, each sub-task being under the responsibility of one participant. Thus, each participant contributes to the shared goal through a part of the task. In collaboration, participants have different goals but interact in order to satisfy at least the goal of one of them, or one of his sub-goals. An example is the Head of a service and his secretary, often called a collaborator. The secretary takes charge of a part of the Head’s task, but only as a support for the complex tasks of the Head (i.e. by collecting all the information needed by the Head, who will make the decision). However, we think that the difficulty in agreeing on the distinction between cooperation and collaboration relationships comes from the lack of consideration for the dynamic dimension of the relationships. Two participants may cooperate at one moment and collaborate at another. The shift comes from their background (their individual contexts) with respect to the current focus and their previous interaction (the shared context). If one participant can fix the current focus, then the other only agrees, and there is a minimal cooperation, i.e. collaboration for validating the answer. If none of the participants knows how to address the current focus, they try together, first, to bring (contextual) elements of an answer, and, second, to build the answer as a chunk of knowledge (Schank, 1982) or a proceduralized context, i.e. a kind of chunk of contextual knowledge (Brézillon, 2007). This is a full cooperation. Several lessons could be learned from this study:

• Repetition of the question occurs when the participants of the experiments wish to be sure that they understand the question correctly, i.e. that they are able to find some relationships between elements of the question and contextual elements of their mental representation of the domain (or maybe to have time to build their mental representation of this question).
• An answer can be given at different levels of granularity. Thus, we observe correct answers at the right level as well as at a too low level of granularity (too many details) or a too high level (rough description of the answer), for example “gas” instead of “CO2” for sparkling water. Participants of the experiments have a problem finding the right granularity for their answer.
• One can know the answer but not the elements or even the rationale (e.g. everybody knows that a refrigerator keeps food cold, but few know that this relies on the second principle of thermodynamics). As a consequence, participants may express an external and superficial position.
• Collaboration is a minimal expression of cooperation: one participant leads the interaction and the other only feeds in information (or only agrees) and reinforces the statement of the other.
• When participants of the experiments gather contextual information, the goal is not to build the answer immediately, because they first want to determine the granularity that their answer must have. Once the level of granularity is identified, the selection of the pieces of contextual knowledge to use in the proceduralized context is direct. When they cannot identify the right level of granularity, they enter the process of explanation generation.
• An explanation is given (1) to justify a known answer, (2) to progress in the co-construction of the answer by sharing elements and their interconnection, or (3) when participants are not sure of the granularity of the answer (e.g. participants speak of “gas” instead of “CO2” for sparkling water).
• The explanation (given for an answer) is frequently less precise than an answer (generally at a macro-level), and is often for use between the participants.
• Several groups were confused and explained instead of giving the answer (thus adding unnecessary details). The answer appears to be a kind of minimal explanation.
CONCLUSION

In a virtual community, people have a feature in common (e.g. French-speaking people in New York), but this is not sufficient for collaboration. Collaboration supposes the sharing of several contextual cues (a language, social cues, an environment, etc.) that will impact the collaboration. Relevant explanations are a crucial factor in any collaboration between human actors, especially when they interact by computer-mediated means. First, such collaboration loses some advantages of face-to-face collaboration, in which a number of contextual elements are exchanged in parallel with the direct communication. Second, collaboration can benefit from new ways to replace these “hidden exchanges” of contextual cues between actors by the use of the computer-mediated means themselves. Explanation generation is very promising for collaboration because explanations use and help to maintain a shared context among actors. We are now in a situation in which computer-mediated interaction concerns human and software actors. Software must be able to react in the best way for human actors. For example, for presenting a complex set of data, a piece of software could choose a visual explanation taking into account the type of information that human actors are looking for. We show that making context explicit allows the generation of relevant explanations. Conversely, explanations are a way to make contextual knowledge explicit and to point out the relationships between the context and the task at hand, and thus to develop a real shared context.
In this chapter we argue that a key factor for the success of relevant explanations is to use a context-based formalism, like Contextual Graphs, that represents in a uniform way all the richness of the knowledge and reasoning in the focus. A good option is to consider the context of use simultaneously with the knowledge. As a consequence, this allows the development of new types of explanation such as visual explanations, dynamic explanations, real-time explanations, etc. Indeed, we have developed a new typology of explanations that includes past work on explanations but goes well beyond it. Moreover, these different types of explanations can be combined to provide richer explanations. However, this is only the first step. A promising path is to explore intelligent assistant systems. Indeed, computer-mediated means can keep and reuse a trace of the interaction between human actors. In real-time situations, the human actor cannot spend time answering the questions of a machine because the actor is generally under time pressure (e.g. solving an incident in a control room), but the machine can act in parallel with the actors in a kind of personal simulation, replaying similar past situations and making suggestions when appropriate. In that sense, the machine may become an excellent secretary, fixing alone all the simple problems of human actors and preparing a complete folder on complex situations, letting the actors make their decision. Here, the machine generates explanations for humans. Conversely, when the machine fails to address a problem correctly, the machine may benefit from its interaction with the human actors to acquire the missing knowledge incrementally and learn new practices. As a consequence, the machine will later be able to explain its choices and decisions. There is now a piece of software called Contextual Graphs that is able to manage incremental acquisition and learning, and that begins to provide some elementary explanations.
As a general lesson learned, the expressiveness of the knowledge and reasoning models depends essentially on the representation formalism chosen for expressing such models. This appears to be a key element of collaboration, with multiple sources of knowledge and different lines of reasoning intertwined in group work. This is a partial answer to our initial observation that collaboration would be better understood if we considered jointly its two dimensions, the human dimension and the technology dimension. Then, explanation generation would be revised in order to develop “collective explanations” for all the (human) participants in the collaboration, that is, in each mental representation.
REFERENCES

Abu-Hakima, S., & Brézillon, P. (1994). Knowledge acquisition and explanation for diagnosis in context, Research Report 94/11. Paris, France: LAFORIA, University Paris VI.

Brézillon, J., & Brézillon, P. (2007). Context modeling: Context as a dressing of a focus. In Proceedings of the 5th International and Interdisciplinary Conference on Modeling and Using Context, CONTEXT-07 (LNAI 4635, pp. 136-149).

Brézillon, P. (2003). Focusing on context in human-centered computing. IEEE Intelligent Systems, 18(3), 62–66. doi:10.1109/MIS.2003.1200731

Brézillon, P. (2005). Task-realization models in Contextual Graphs. In 5th International and Interdisciplinary Conference on Modeling and Using Context (LNCS 3554, pp. 55-68).

Brézillon, P. (2007). Context modeling: Task model and model of practices. In A. Dey, B. Kokinov, D. Leake, & R. Turner (Eds.), Proceedings of the 5th International and Interdisciplinary Conference on Modeling and Using Context, CONTEXT-07 (LNCS 3554, pp. 55-68).

Brézillon, P., Drai-Zerbib, V., Baccino, T., & Therouanne, T. (2006). Modeling collaborative construction of an answer by contextual graphs. In Proceedings of IPMU, Paris, France, May 11-13.

Brézillon, P., & Pomerol, J.-Ch. (1997). Lessons learned on successes and failures of KBSs. Special Issue on Successes and Pitfalls of Knowledge-Based Systems. In Real-World Applications. Failures and Lessons Learned in Information Technology Management, 1(2), 89-98.

Brézillon, P., & Pomerol, J.-Ch. (1999). Contextual Knowledge sharing and cooperation in intelligent assistant systems, Le Travail Humain. PUF, Paris, 62(3), 223–246.

Clancey, W. J. (1992). Model construction operators. Artificial Intelligence, 53, 1–115. doi:10.1016/0004-3702(92)90037-X

Karsenty, L., & Brézillon, P. (1995). Cooperative problem solving and explanation. International Journal of Expert Systems with Applications, 8(4), 445–462. doi:10.1016/0957-4174(94)E0035-S

Kintsch, W. (1998). Comprehension: a paradigm for cognition. Cambridge: Cambridge University Press.

Kock, N., Davison, R., Ocker, R., & Wazlawick, R. (2001). E-Collaboration: A look at past research and future challenges. Journal of Systems and Information Technology, 5(1), 1–9. doi:10.1108/13287260180001059

Kodratoff, Y. (1987). Is artificial intelligence a subfield of computer science or is artificial intelligence the science of explanation? Bled. Progress in Machine Learning (pp. 91–106). Sigma Press.

Leplat, J., & de Terssac, G. (1990). Les facteurs humains de la fiabilité dans les systèmes complexes. Toulouse: Octarès Éditions.

Leplat, J., & Hoc, J. M. (1983). Tâche et activité dans l’analyse psychologique des situations. Cahiers de Psychologie Cognitive, 3, 49–63.

McCarthy, J. (1993). Notes on formalizing context. In Proceedings of the 13th IJCAI 1 (pp. 555-560).

Nonaka, I., & Takeuchi, H. (1995). The knowledge-creating company. New York: Oxford University Press.

Pomerol, J.-Ch. (2001). Scenario development and practical decision making under uncertainty. Decision Support Systems, 31, 197–204. doi:10.1016/S0167-9236(00)00131-7

Pomerol, J.-Ch., Brézillon, P., & Pasquier, L. (2002). Operational knowledge representation for practical decision making. Journal of Management Information Systems, 18(4), 101–116.

PRC-GDR. (1990). Actes des 3e journées nationales PRC-GDR IA organisées par le CNRS, textes réunis par Bernadette Bouchon-Meunier, Chapitre 7. Editions Hermes.

Richard, J.F. (1983). Logique du fonctionnement et logique de l’utilisation. Rapport de Recherche INRIA no 202.

Schank, R. C. (1982). Dynamic memory, a theory of learning in computers and people. Cambridge University Press.

Sowa, J. F. (2000). Knowledge Representation: Logical, Philosophical, and Computational Foundations. Pacific Grove, CA: Brooks Cole Publishing Co.

Spieker, P. (1991). Natürlichsprachliche Erklärungen in technischen Expertensystemen. Ph.D. Dissertation, University of Kaiserslautern.

Swartout, W. R., & Moore, J. D. (1993). Explanation in second generation expert systems. In David, J., Krivine, J., & Simmons, R. (Eds.), Second Generation Expert Systems (pp. 543–585). Berlin: Springer Verlag.
Chapter 17
Intelligent LMS with an Agent that Learns from Log Data in a Virtual Community

Maomi Ueno
The University of Electro-Communications, Japan
ABSTRACT

This study describes an agent that acquires domain knowledge related to the content from a learning history log database in a learning community and automatically generates motivational messages for the learner. The unique features of this system are as follows: The agent builds a learner model automatically by applying the decision tree model. The agent predicts a learner’s final status (Failed; Abandon; Successful; or Excellent) using the learner model and his/her current learning history log data. The constructed learner model becomes more exact as the amount of data accumulated in the database increases. Furthermore, the agent compares a learner’s learning processes with “Excellent” status learners’ learning processes stored in the database, diagnoses the learner’s learning processes, and generates adaptive instructional messages for the learner. A comparison between a class of students that used the system and one that did not demonstrates the effectiveness of the system.
DOI: 10.4018/978-1-60960-040-2.ch017

INTRODUCTION

The constructivist approach has pervaded the area of educational technology in recent decades. It has been argued in this approach that the responsibility for learning should increasingly lie with the learner (Von Glasersfeld, 1995). Therefore, the role of the instructor has changed from that of teacher to that of facilitator (Bauersfeld, 1995). A teacher gives a didactic lecture that covers the subject matter, but a facilitator assists the autonomous learning process. The learner plays a passive role in the former scenario, and in the latter the learner plays an active role in the learning process. The emphasis thus shifts from the instructor- and content-centred approach toward the learner-centred approach (Gamoran, Secada & Marrett, 2000). A central feature of this facilitation is individualizing learners and helping them to achieve self-growth through self-evaluation and cooperation with others (Merriam & Brockett, 2007). For example, according to the well-known theory by Knowles, facilitation is designing a pattern of learning experiences, conducting these learning experiences with suitable techniques and materials, and evaluating the learning outcomes and rediagnosing learning needs (Knowles, 1983; Knowles, Holton & Swanson, 1998). e-Learning, which emerged as a method of attaining the learner-centred approach, provides a new autonomous-learning environment that combines (1) multimedia content, (2) collaboration among learners, and (3) computer-supported learning (Ueno, 2007). e-Learning should work even if there is no human facilitator and a huge number of learners participate in it. It would essentially be impossible for facilitators to individualize such a huge number of learners and facilitate their learning.

The main idea in this chapter is that a computational agent in a Learning Management System (LMS) plays the role of facilitator instead of human teachers. The proposed agent uses the learners’ history data in a learning community, which is stored in a database, to compare the learning process of the learner with that of past excellent learners. A computational agent that learns from data using the decision tree model, one of the machine learning or data-mining technologies, is called a “learning agent”. The decision tree model (Quinlan, 1986), which is a well-known method that is equivalent to the Bayesian belief network, enables users to obtain valid results even if the number of variables in the tree increases significantly, although interpreting the meaning of a structure is more difficult than in the Bayesian belief network. Building a meaningful model requires a number of variables for representing a learner’s status. For these reasons, in this study we used an intelligent agent based on the decision tree model and installed it into an LMS. The unique features of this system are summarized as follows.
1. The agent builds a learner model automatically by applying the decision tree model.
2. The agent predicts a learner’s final status (1. Failed; 2. Abandon; 3. Successful; or 4. Excellent) using the learner model and his/her current learning history log data. The constructed learner model becomes more exact as the amount of data accumulated in the database increases.
3. The agent compares a learner’s learning processes with excellent learners’ learning processes in the database, diagnoses the learner’s learning processes, and generates adaptive instructional messages for the learner.

It should be noted that the learner model strongly reflects the learning culture of the community because the model was built using log data of past learners. In addition, some previous research on learning motivation found that a mentor’s motivational messages adapted to a learner’s status are effective in e-Learning. Visser and Keller (1990) reported that motivational messages could reduce dropout rates and later attempted to improve motivation in e-Learning situations using such messages (Visser, Plomp, and Kuiper, 1999). Gabrielle (2000) applied technology-mediated instructional strategies to Gagne’s events of instruction and demonstrated how these strategies affected motivation. Thus, agent messages are also expected to be effective in facilitating learner motivation.

The developed LMS with the agent system was compared with one without it in actual e-learning classes for one semester. The results showed that far fewer students withdrew from classes when the LMS with the agent system was used. In addition, the average score on the final test was significantly higher in the case of the LMS with the agent system. Answers to questions and interviews with learners showed that the agent system enhances learners’ motivation and contributes to learners’ maintaining a constant learning pace. Finally, we note that this study does not focus on group work in collaborative learning, but the individual work in this study can be easily extended to group work.
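Decision tree learners of the ID3 family introduced by Quinlan (1986) grow a tree by repeatedly choosing the attribute whose split most reduces the entropy of the class labels. The Python sketch below illustrates that criterion only; the attribute names `access` and `posts`, the four made-up rows, and the function names are assumptions for illustration and are not the system's own implementation or data.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows: List[Dict[str, str]], attribute: str, target: str) -> float:
    """Entropy reduction obtained by splitting the rows on one attribute."""
    base = entropy([row[target] for row in rows])
    groups: Dict[str, List[str]] = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Made-up learners: here "access" separates the statuses, "posts" does not.
rows = [
    {"access": "high", "posts": "high", "status": "Excellent"},
    {"access": "high", "posts": "low", "status": "Excellent"},
    {"access": "low", "posts": "high", "status": "Abandon"},
    {"access": "low", "posts": "low", "status": "Abandon"},
]

for attribute in ("access", "posts"):
    print(attribute, information_gain(rows, attribute, "status"))
```

With these rows a learner of this kind would split on `access` first, since that attribute alone removes all uncertainty about the final status.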
RELATED STUDIES

Various studies have applied data mining techniques to learning history data in e-learning. Becker and Vanzin (2003) tried to detect meaningful patterns of learning activities in e-learning using association rules. Minaei-Bidgoli et al. (2003) proposed a method to predict a learner’s final test score using a combination of multiple classifiers constructed from learning history data in e-learning, and they reported that a modified method using a genetic algorithm could improve the prediction performance. Talavera and Gaudioso (2004) and Hamalainen et al. (2006) separately proposed methods to predict final test scores using the naive Bayes model from learning history data in e-learning. Huang et al. (2007) predicted learning efficiency, defined as test score/learning time, using a support vector machine (SVM) from learning history data in e-learning. However, these studies only tried to predict learners’ performance in e-learning from learning history data; they did not discuss how to use the predicted data mining results effectively to improve learners’ results. Furthermore, the data mining engines employed in these studies were not installed into an LMS to automatically analyze the learning log database. Here, we propose not simply a system to predict a learner’s final status using a data mining technique, but an agent that acquires the domain knowledge related to the content from a learning history log database and that automatically generates adaptive instructional messages for the learners.
LMS “SAMURAI”

We have developed an LMS called “Samurai” (Ueno, 2002) that is used with many e-learning courses; 128 e-learning courses are now offered by The University of Electro-Communications through this LMS. Samurai consists of a contents presentation system (CPS), a contents database (CD), a discussion board (DB), a learning history database (LHD), and a data mining system (DMS). The CPS integrates various kinds of content and presents the integrated information on a web page. Figure 1 shows a typical e-learning content presentation by Samurai. The contents are presented by clicking on the menu button. A sound track of the teacher’s narration is also presented, based on the research of Mayer and Anderson (1991), and the red pointer moves automatically as the narration proceeds. This lesson corresponds to a 90-minute university lecture and includes 42 topics. Although the content in Figure 1 is textual, the system also provides illustrations, animations or computer graphics, and video clips. In this lesson, there are 11 topics presented as textual content, 11 as illustrations, 10 as animations, and 10 as video clips. The CPS also presents some test items to assess the learners’ degree of comprehension as soon as the lessons have been completed (Figure 2).

Figure 1. The LMS “Samurai”

Figure 2. Example test frame

The CD consists of various kinds of media: text, JPEG and MPEG files, and so on. The teacher prepares a lecture and saves the contents in the CD. Then the CPS automatically integrates the contents and presents them to the learners. The learners can ask questions about the contents in the DB (Figure 3). They can also submit the products of their learning for the given task (for example, a report or program source) using the DB. The LMS monitors learners’ learning processes and stores them as log data in the LHD. The stored data consist of a Contents ID, a Learner ID, the number of topics that the learner has completed, a Test Item ID, a record of data input into the DB, an Operation Order ID (which indicates what operation was done), a Date and Time ID (which indicates the date and time that an operation started), and a Time ID (which indicates the time it took to complete the operation). These data enable the LMS to recount the learner’s behavior in e-learning.
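The fields stored in the LHD can be pictured as one record per logged operation. The Python sketch below is an illustration under stated assumptions: the class name `LogRecord`, the field names and types, and the helper `total_learning_time` are hypothetical and do not describe Samurai's actual database schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class LogRecord:
    """One logged operation (field names and types are assumed)."""
    contents_id: str
    learner_id: str
    topics_completed: int    # number of topics the learner has completed
    test_item_id: str
    board_input: str         # data input into the discussion board, if any
    operation_order_id: int  # which operation was done
    started_at: datetime     # date and time the operation started
    duration: timedelta      # time taken to complete the operation


def total_learning_time(records: List[LogRecord], learner_id: str) -> timedelta:
    """Sum the durations of all operations logged for one learner."""
    return sum(
        (r.duration for r in records if r.learner_id == learner_id),
        timedelta(),
    )


# Two made-up operations by the same learner.
records = [
    LogRecord("c01", "s001", 1, "t01", "", 1,
              datetime(2009, 4, 6, 10, 0), timedelta(minutes=10)),
    LogRecord("c01", "s001", 2, "t02", "", 2,
              datetime(2009, 4, 6, 10, 10), timedelta(minutes=5)),
]

print(total_learning_time(records, "s001"))
```

Aggregates of this kind, computed per learner, are the sort of quantity a learner model can then use as input variables.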
AN AGENT USING THE DECISION TREE MODEL FOR E-LEARNING HISTORY DATA Prediction of Learner’s Final Status The main idea here is to apply a data-mining method to the huge amount of stored data and construct a learner model to predict each learner’s final status: (1) Failed (Final examination score
Figure 3. Example discussion board
below 60); (2) Abandon (The learner withdraws before the final examination); (3) Successful (Final examination score is more than 60 but less than 80); or (4) Excellent (Final examination score is more than 80). For this purpose, the well-known decision tree model from data mining (Quinlan, 1986) is employed, using the following variables, which reflect each learner’s status each week.
1. The number of topics the learner has learned.
2. The number of times the learner accessed the e-learning system.
3. The average number of times the learner has completed each topic. (This implies the number of times the learner repeated each topic.)
4. The average learning time for each lecture, which consists of several types of contents and runs 90 minutes.
5. The average of the degree of understanding of each topic. (This is measured by the response to the question corresponding to each topic.)
6. The average learning time for each course consisting of fifteen lectures.
7. The average number of times the learner has changed the answer to questions in the e-learning.
8. The number of times the learner has posted opinions or comments on the discussion board.
9. The average learning time for each topic.
As all courses run for 15 weeks, 15 decision trees are prepared, one for the learners’ learning history data of each week. The continuous variables are discretized into the number of categories that minimizes the entropy. We used the ID3 algorithm (Ueno, 2002) as the learning algorithm for the decision trees because its computation cost is low and the estimators are robust. The program was written in Java and installed in Samurai. The decision trees are always relearned from the updated learning histories; therefore, the structures of the decision trees for predicting the learner’s final status are always changing. In this algorithm, all variables are always used. A decision tree learned from 1,344 learners’ data is shown in Figure 4. This tree was prepared using 14 weeks of learning history data. The two values in parentheses indicate the number of cases in which the inference is correct and incorrect. For
Figure 4. Example of a constructed decision tree
example, (408/18) indicates that the probability of the correct inference is 408/426. In this system, decision trees corresponding to the weekly learner’s status are constantly being constructed.
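The entropy-based attribute selection at the heart of ID3, and the reading of a leaf annotation such as (408/18), can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Java code installed in Samurai; the function names and the dictionary-per-learner row format are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """ID3 picks the attribute with the largest information gain at each node."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


def leaf_accuracy(correct, incorrect):
    """Proportion of correct inferences at a leaf annotated (correct/incorrect)."""
    return correct / (correct + incorrect)
```

For the leaf annotated (408/18), `leaf_accuracy(408, 18)` evaluates 408/426, which is about 0.958.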
Outline of Intelligent Agent System
The main purpose of the intelligent agent system is to provide optimum instructional messages to a learner using the automatically constructed learner model described above. The agent appears in the LMS as shown in Figure 5. First, this section introduces an outline of the system. The agent provides adaptive messages to the learner using the learner model. The agent also performs various actions based on the learner’s current status, as shown in Figure 6. The instructional messages to a learner are generated as follows.
1. The system predicts the target learner’s future status and probability using the constructed decision tree. 2. If the predicted status is “Excellent”, the agent provides messages like “Looking
Figure 5. An intelligent agent (Note that the presented message is not misspelled. The message is moving continuously across the frame.)
Figure 6. Various actions of the agent
great!”, “Keep doing your best”, and “Probability of success is xx%”. If the predicted status is not “Excellent”, the system searches for the closest “Excellent” node from the current predicted status node. For example, using Figure 7, we consider a part of the decision tree in Figure 4. If the predicted status is “Failed”, the nearest “Excellent” node is the gray node in the figure. The system finds the nearest “Excellent” node and determines the operations that will change the learner’s predicted status to “Excellent”. In this case, “the average learning time for each topic” is detected. The system provides messages with the predicted future status, the probability of success estimated by the decision tree, and the instructional messages according to Table 1.
Data Structure
The system constructs a decision tree from learning history data and stores it in the database. The data structure of the constructed decision tree is defined using XML, as shown in Figure 8. One element gives the course subject name and the variable names. Each variable has one of two types: “Explain”, which marks an explaining variable, and “Object”, which marks the object variable of the decision tree model. Further elements list the values that an explaining variable takes and the values that the object variable takes. Another element corresponds to the node structure: for a target variable, the parent variable in the tree is given explicitly. Finally, a table element holds the number of positive instances and the number of negative instances.
Figure 7. Part of the decision tree in Figure 4
Table 1. Instructional messages corresponding to the detected variables

| Variables | Instructional messages |
| --- | --- |
| 1. The number of topics the learner has learned. | • Progress in your lesson is behind. Please do more lectures. • Progress in the lesson is liable to be slow. Let’s do more lectures. |
| 2. The number of times the learner has accessed the e-learning system. | • You have not engaged in the lesson well. Let’s access the system more often. |
| 3. The average number of times the learner has completed each topic. | |
| 4. The average learning time for each lecture, which consists of several types of contents and runs 90 minutes. | • It seems that you are working through the lecture too quickly. Please spend more time on each lecture. |
| 5. The average of the degree of understanding of each topic. (This is measured by the response to a question that corresponds to each topic.) | • Were the contents of the lesson difficult? Let’s do the lecture from the beginning once again. • When there is something you don’t understand, let’s post a question on the discussion board. |
| 6. The average learning time for each course consisting of fifteen lectures. | • You have not engaged in the lesson well. Let’s access the system and study more slowly and carefully. |
| 7. The average number of times the learner has changed the answer to the e-learning questions. | • Your knowledge does not appear to be very robust. Let’s do the lecture from the beginning once again. |
| 8. The number of times the learner has posted opinions or comments on the discussion board. | • Learning is more effective when done between learners. Let’s participate in and contribute to the discussion board. |
| 9. The average learning time for each topic. | • Did you do the lecture correctly? Ordinarily, a lesson should take more time. |
Message Generation Algorithm
Figure 9 shows the algorithm to generate instructional messages in the proposed intelligent agent system. According to the algorithm, the system first predicts the learner’s future status from his/her current learning history data using the constructed decision tree, and if the predicted status is “Excellent” the system sends complimentary messages to the learner. Otherwise, the system searches for the nearest ancestor node that has an “Excellent” node among its descendants. If the system finds such an ancestor node, the system searches for
Figure 8. Example of the data structure of the constructed decision tree
the “Excellent” node which has the shortest path from the ancestor node. If several “Excellent” nodes share the shortest path length from the ancestor node, the system selects the first one found to generate instructional messages. Next, the system selects the set of nodes that form the path from the ancestor node to the “Excellent” node and generates instructional messages corresponding to the variables of those nodes according to Table 1. If there are several instructional messages corresponding to one node variable, the system selects one at random. This algorithm is implemented in Java and installed in “Samurai”. The system can create 2,048 patterns of adaptive instructional messages, such as the one shown in Figure 5; thus, it is expected to respond adaptively to various learner statuses.
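The search described above can be sketched as follows. This is a minimal Python illustration of the stated steps, not the Java code installed in Samurai; the `Node` class, the function names, and the use of breadth-first search to find the shortest path (and the first such node encountered) are assumptions made for the sketch.

```python
import random
from collections import deque


class Node:
    """Hypothetical tree node: internal nodes carry a split variable,
    leaves carry a predicted status."""

    def __init__(self, variable=None, status=None):
        self.variable = variable
        self.status = status
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def nearest_excellent_path(start):
    """Breadth-first search below `start`. Returns the list of nodes from
    `start` to the first "Excellent" leaf found at the smallest depth,
    or None if there is no such leaf."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node.status == "Excellent":
            return path
        for child in node.children:
            queue.append(path + [child])
    return None


def instructional_messages(leaf, messages, rng=random):
    """`leaf` is the predicted-status node; `messages` maps a variable name
    to its list of candidate messages (as in Table 1)."""
    if leaf.status == "Excellent":
        return ["Looking great!"]
    ancestor = leaf.parent
    while ancestor is not None:          # climb until an ancestor qualifies
        path = nearest_excellent_path(ancestor)
        if path is not None:
            variables = [n.variable for n in path if n.variable is not None]
            # One randomly chosen message per variable on the path.
            return [rng.choice(messages[v]) for v in variables if v in messages]
        ancestor = ancestor.parent
    return []
```

With a small tree in which a “Failed” leaf and an “Excellent” leaf share a parent that splits on the average learning time for each topic, the sketch returns one message for that variable, which matches the example given for Figure 7.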
COMPARATIVE PREDICTION EXPERIMENTS Some previous studies have been done on predicting a learner’s final test score using several machine learning methods from learning history
data in e-learning. Minaei-Bidgoli et al. (2003) compared the prediction performance of machine learning methods (decision tree model, Naive Bayes, and SVM) in predicting a learner’s final test score from learning history data in e-learning. The decision tree showed the best performance in their results. On the other hand, Talavera and Gaudioso (2004) and Hamalainen et al. (2006) conducted similar experiments and reported that Naive Bayes was the best model. Finally, Huang et al. (2007) found that SVM was the most effective model. These previous studies thus reported different results, which suggests that predictive performance depends on the characteristics of the data (the kinds of variables, data size, domain, learners’ age, and so on). Therefore, we also needed to evaluate various models on data obtained from the LMS “Samurai”, just as the previous studies did. We compared the decision tree model learned with the ID3 algorithm, the Naive Bayes model, and SVM. Here, we employed the most popular Naive Bayes model, the “multivariate Bernoulli model” (Domingos & Pazzani, 1997), and a well-known SVM with a “polynomial kernel” (Vapnik, 1995). First, the latest data from 800 learners were randomly sampled from the learning history
database for 128 courses in the LMS “Samurai”. Learning history data from 400 of the 800 learners were then randomly sampled as training data, and the remaining 400 learners’ history data were used as validation data (test data) for a cross-validation experiment. The cross-validation experiment was performed to predict learners’ final status from their learning history data. Decision tree and Naive Bayes models use only categorical variables as input data, but the learning history data consist of continuous variables. Consequently, the continuous variables in the learning history data were categorized to be uniformly distributed in each category. Although SVM can take continuous variables as input, this experiment applied the categorized data to SVM so that it ran under the same conditions as the other models. Here SVM employed the polynomial kernel as its kernel function. To categorize the input data, the range of each variable (from the minimum value of the data to the maximum value) was divided into $m$ category ranges, where $m$ is the number of categories. As a result, the continuous data were transformed into categorical data $x_{icj}$, where $x_{icj} = 1$ if the range of category $c$ of the $i$-th variable includes the $j$-th learner’s data and $x_{icj} = 0$ otherwise ($i = 1, \ldots, 9$; $c = 1, \ldots, m$; $j = 1, \ldots, N$). The number of categories for all variables was varied from two to five in the experiment. The results are listed in Table 2. Each value indicates the correct prediction rate of the cross-validation for the given number of categories in the corresponding model. When there is a small
number of categories, SVM shows the best performance. However, SVM appears to overfit the data when there are four or more categories: its fitting rate on the training data rises while its correct prediction rate falls. Conversely, although the decision tree model shows lower performance than SVM when there are few categories, it shows the best performance with four or more categories. Naïve Bayes shows lower correct prediction rates because the explaining variables are strongly correlated with one another, whereas the model assumes that the variables are conditionally independent. From these results, the decision tree is the most suitable model for the data stored in the LMS “Samurai”, because the proposed agent needs to use four categories for its variables.
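The range-based categorization used in this experiment (dividing each variable's range from minimum to maximum into m equal parts and coding membership as 0/1 indicators) can be sketched as below. This is a minimal Python illustration; the function name is hypothetical, and the handling of the maximum value and of a constant variable are assumptions, since the chapter does not specify them.

```python
def equal_width_one_hot(values, m):
    """Turn one continuous variable into indicator rows.

    Returns one row per learner; row[c] == 1 if the learner's value falls
    in category range c (0-based here), otherwise 0.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / m
    rows = []
    for v in values:
        if width == 0:
            c = 0                                   # assumption: constant variable
        else:
            c = min(int((v - lo) / width), m - 1)   # assumption: max goes to last bin
        rows.append([1 if k == c else 0 for k in range(m)])
    return rows


# Example: five learners' values for one variable, four categories.
indicators = equal_width_one_hot([0, 10, 20, 30, 40], 4)
```

Applying the function to each of the nine variables in turn yields the full set of indicators for all learners.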
FEEDBACK FOR TEACHERS The proposed LMS can also provide feedback on all learners to a teacher, as shown in Figure 10. In this system, the feedback comprises the degree of the learning progress, the learning time, and the rate of understanding for each learner. In addition, this system also presents the current instructional messages to the teacher that the agent has sent to each learner.
Table 2. Correct prediction rates (%) obtained in the cross-validation experiment

| NC | DT | SVM | Naïve Bayes |
| --- | --- | --- | --- |
| 2 | 75.00 (88.70) | 80.75 (89.25) | 75.50 (76.25) |
| 3 | 80.00 (84.75) | 81.00 (88.7) | 76.00 (77.25) |
| 4 | 82.00 (88.75) | 74.00 (91.5) | 77.00 (77.75) |
| 5 | 80.25 (84.75) | 78.76 (91.5) | 76.75 (77.75) |

Note. NC: number of categories; DT: decision tree model using the ID3 algorithm. The parenthetical values indicate the fitting rate of the training data.
Figure 10. Feedback for a teacher
EVALUATIONS The system was evaluated by comparing a class of students that used the agent system with one
that did not for one semester. The decision tree for the agent system was learned using 1,344 learners’ histories. The details of the two e-learning classes are summarized in Table 3. The results
Table 3. Comparison between classes with and without the system

| | With agent system | Without agent system |
| --- | --- | --- |
| Subject name | Information & Communication Technology | Information & Communication Technology |
| Students | Undergraduate students (third and fourth year) | Undergraduate students (third and fourth year) |
| Learning place | Each student’s home | Each student’s home |
| Credits | 2 | 2 |
| Number of students | 74 | 92 |
| Term | 2003, April 10 - July 31 | 2004, April 10 - July 31 |
| Number of students who withdrew from the course | 14 (18.9%) | 49 (53.2%) |
| Final test scores | Average: 93.26 Variance: 43.2 (n=60) | Average: 78.74 Variance: 215.24 (n=43) |
| P-value | 1.33E-07 | |
| Total learning time (minutes) | Average: 1045.13 Variance: 71721.8 (n=60) | |
| P-value | 1.25E-05 | |
| Average degree of progress of lesson | Average: 0.93 Variance: 0.64 (n=60) | |
| P-value of the statistical difference test of two averages | | |
Figure 11. Plotted results of Question A given to (a) the class with the system and (b) the one without it
show that far fewer students withdrew from the class that used the LMS with the agent system. In addition, the final test scores, learning time data, and progress data also indicate that the proposed agent system enhanced learning significantly. Presenting the learner’s predicted future status together with adaptive instructional messages helps learners maintain the required learning pace. As a result, the learners can progress until they reach their predicted future status. Furthermore, all learners were asked Question A: “How would you rate the system’s ability to enhance your e-learning? 1. Very poor; 2. Poor; 3. Fair; 4. Good; 5. Very good.” The group with the agent system was asked an additional question, Question B: “How would you rate the adequacy of the instructional messages from the agent system? 1. Very bad; 2. Bad; 3. Fair; 4. Good; 5. Very good.” The results for Question A are shown in Figure 11. The response frequencies of answers 2 and 3, “Poor” and “Fair”, were lower for the class with the system than for the one without it. This indicates that the system is effective in enhancing learning and that the instructional messages have a positive effect on e-learning. However, it should be noted that the response frequency of “Very poor” increased for the class with the system. If
we assume that the differences between the results for the two classes are due only to the use of the agent system, the results mean that learners’ opinions about the agent system tended to be more polarized than the opinions of the class without it. Figure 12 summarizes the learners’ response frequencies for Question B. The results show that many learners rated the agent system’s messages as “Good” or “Very good”, which means that the instructional messages from the agent system are acceptable to many learners. However, it should be noted that five learners rated them as “Bad”. The learners who rated the system as “Bad” gave the following reasons:
Figure 12. The results of Question B
• “The messages from the agent were distracting. I didn’t concentrate on my learning due to the agent’s constant actions.”
• “The messages from the agent were meddling because I already knew almost all of the messages’ content myself, even if the agent didn’t send them.”
This means that the messages from the system are sometimes intrusive for autonomous learners who can learn by themselves. Therefore, we think that the system needs a function whereby learners can hide the agent whenever they want.
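As a rough check on the final test score comparison in Table 3, a two-sample test statistic can be computed directly from the reported means, variances, and sample sizes. The sketch below uses Welch's unequal-variance t statistic; the chapter does not say which test produced its P-values, so the choice of Welch's test, and the function name, are assumptions.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from summary statistics (variances are sample variances)."""
    se1, se2 = var1 / n1, var2 / n2
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Final test scores from Table 3: with agent system vs. without agent system.
t, df = welch_t(93.26, 43.2, 60, 78.74, 215.24, 43)
print(f"t = {t:.2f}, df = {df:.1f}")
```

Under these assumptions the statistic is roughly 6.07 on about 54 degrees of freedom, a large difference relative to the sampling variability of the two class means.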
CONCLUSION
This chapter proposed an LMS in which an intelligent agent provides effective adaptive messages to learners using learning history data in a learning community and data mining techniques. The unique features of this system are as follows.
• The agent builds a learner model automatically by applying the decision tree model. The agent predicts a learner’s final status (Failed; Abandon; Successful; Excellent) using the learner model and his/her current learning history data. The constructed learner model becomes more precise as the amount of data accumulated in the database increases.
• The agent compares a learner’s learning processes with the learning processes of “Excellent” status learners in the database, diagnoses the learner’s learning processes, and generates adaptive messages to the learner.
This study compared the developed LMS with an LMS without the agent system through actual university e-learning classes for one semester. The results showed that the number of students who
withdrew from the class was significantly lower than in the case of the LMS without the agent system. In addition, the results showed that the average score on the final test was significantly higher when the agent system was used. The questionnaire items and interviews with the learners showed that the agent system enhanced learning motivation and was instrumental in helping learners maintain the required learning pace. Thus, the results indicate that the agent system is very effective in maintaining learners’ motivation in e-learning. In addition, it is important to note that in practical use we should not use the automatically constructed tree structure without reviewing it. This is because some teachers are not earnest in facilitating e-learning, in which case the automatically constructed tree structure is not valid for e-learning. For example, some teachers give a final result of “Excellent” to all learners without deliberation, and the constructed tree then predicts that any learner will be “Excellent”. If we consider the constructed tree structure to be invalid, we use the typical structure in Figure 4 instead. This procedure is also adopted when there is no data and no structure because the course has never been offered before. Finally, we note that this study does not focus on group work in collaborative learning, but the treatment of individual work in this study can easily be extended to group work. This is a task for future work.
REFERENCES
Bauersfeld, H. (1995). The Structuring of the Structures: Development and Function of Mathematizing as a Social Practice. In Steffe, L. P., & Gale, J. (Eds.), Constructivism in Education. Hillsdale, New Jersey: Lawrence Erlbaum Associates Publishers.
Becker, K., & Vanzin, M. (2003). Discovering increasing usage patterns in web based learning environments. In Proc. Int. Conf. on Utility, Usability, and Complexity of e-information systems (pp. 57-72).
Domingos, P., & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29, 103–137. doi:10.1023/A:1007413511361
Gabrielle, D. M. (2000). The effects of technology-mediated instructional strategies on motivation, performance, and self directed learning. In Proc. of ED-Media (pp. 2569-2575).
Gamoran, A., Secada, W. G., & Marrett, C. A. (2000). The organizational context of teaching and learning: changing theoretical perspectives. In Hallinan, M. T. (Ed.), Handbook of Sociology of Education (pp. 37–64). New York: Kluwer Academic/Plenum Publishers.
Hamalainen, W., Laine, T. H., & Sutinen, E. (2005). Data mining in personalizing distance education courses. In Romero, C., & Ventura, S. (Eds.), Data Mining in e-Learning (pp. 157–171). UK: WIT Press.
Huang, C. J., Chu, S. S., & Guan, C. T. (2007). Implementation and performance evaluation of parameter improvement mechanisms for intelligent e-learning systems. Computers & Education, 49(3), 597–614. doi:10.1016/j.compedu.2005.11.008
Keller, J. M. (1999). Motivation in cyber learning environments. International Journal of Instructional of Educational Technology, 1(1), 7–30.
Knowles, M. (1983). How the media can make it or bust it in education. Media and Adult Learning, 5(2), 3–4.
Knowles, M., Holton, E. F., & Swanson, R. A. (1998). The Adult Learner: The Definitive Classic in Adult Education and Human Resource Development (5th ed.). Houston, TX: Gulf Publishing.
Lee, C. Y. (2000). Student motivation in the online learning environment. Journal of Educational Media & Library Sciences, 37(4), 365–375.
Mayer, R. E., & Anderson, R. B. (1991). Animations need narrations. Journal of Educational Psychology, 83(4), 484–490. doi:10.1037/0022-0663.83.4.484
Merriam, S. B., & Brockett, R. G. (2007). The Profession and Practice of Adult Education: An Introduction. New York: Jossey-Bass.
Minaei-Bidgoli, B., Kashy, D. A., Kortemeyer, G., & Punch, W. F. (2003). Predicting student performance: an application of data mining methods with an educational web-based system. In Proc. IEEE conf. on Frontier in Education, T2A (pp. 13-18).
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81–106. doi:10.1007/BF00116251
Talavera, L., & Gaudioso, E. (2004). Mining student data to characterize similar behavior groups in unstructured collaboration spaces. In Proc. of the workshop on Artificial Intelligence in CSCL, 16th European Conference on Artificial Intelligence, ECAI2004 (pp. 17-23).
Ueno, M. (2007). e-Learning in knowledge society (in Japanese), Baihuukan, Tokyo.
VanLehn, K., & Martin, J. (1998). Evaluation on an assessment system based on Bayesian student modeling. International Journal of Artificial Intelligence in Education, 8(2), 179–221.
VanLehn, K., & Niu, Z. (2001). Bayesian student modeling, User Interfaces and Feedback: A sensitivity analysis. International Journal of Artificial Intelligence in Education, 12, 421–442.
VanLehn, K., Niu, Z., Siler, S., & Gertner, A. (1998). Student modeling from conventional test data: A Bayesian approach without priors. In Proc. the 4th Intelligent Tutoring Systems ITS’98 Conference (pp. 434-443). Berlin/Heidelberg: Springer-Verlag.
Vapnik, V. (1995). The Nature of Statistical Learning Theory. New York: Springer.
Visser, L., & Keller, J. M. (1990). The clinical use of motivational messages: An inquiry into the validity of the ARCS model to motivational design. Instructional Science, 19, 467–500. doi:10.1007/BF00119391
Visser, L., Plomp, T., & Kuiper, L. (1999). Development research applied to improve motivation in distance education (pp. 17–28). Houston, TX: Association for Educational Communications and Technology.
von Glasersfeld, E. (1995). A constructivist approach to teaching. In Steffe, L., & Gale, J. (Eds.), Constructivism in Education (pp. 3–16). New Jersey: Lawrence Erlbaum Associates, Inc. doi:10.4324/9780203454220
Chapter 18
A Computational Model of Social Capital in Virtual Communities Ben Kei Daniel University of Saskatchewan and Saskatoon Health Region, Canada
ABSTRACT
This chapter presents a Bayesian Belief computational model of social capital (SC) developed within the context of virtual communities. The development of the model was based on insights drawn from more than five years of research into social capital in virtual communities. The chapter discusses the key variables constituting social capital in virtual communities and shows how the model was updated using practical scenarios. The scenarios describe authentic cases drawn from several virtual communities. The key issues predicted by the model, as well as the challenges encountered in building, verifying and updating the model, are discussed.
INTRODUCTION
This chapter presents the Bayesian Belief computational model of social capital (SC) developed within the context of virtual communities. The development of the model was based on insights drawn from more than five years of research into social capital in virtual communities. The chapter discusses the key variables constituting social capital in virtual communities and shows how the model was updated using practical scenarios. The scenarios describe authentic cases drawn from several virtual communities. The key issues predicted by the model, as well as the challenges encountered in building, verifying and updating the model, are discussed.
DOI: 10.4018/978-1-60960-040-2.ch018
A MODEL OF SOCIAL CAPITAL
Many variables constitute social capital in virtual communities. For example, social capital is frequently defined as a function of positive engagement, an element shared by many definitions. More specifically, when people engage positively
on issues of mutual interest, they are more likely to get to know each other, which is critical to the development of social capital. The value derived from positive engagement can include sharing personal experiences with others, endorsing positive behaviour or discouraging negative behaviour, sharing information, recommending options, and providing companionship and hospitality, all of which are vital elements not only of community living but also of social capital. Productive relationships crucial to social capital occur when people have a common set of expectations, mediated by a set of shared social protocols, and are willing to identify with each other as members of the same community. Another important aspect of building social capital in virtual communities is the establishment of a certain level of shared understanding among members. The process of establishing shared understanding often draws upon a set of shared beliefs, shared goals and values, experiences and knowledge. Shared understanding is enhanced by various forms of awareness. For many years, researchers in human-computer interaction have held that awareness is critical to effective interactions and productive social relationships in virtual settings (Gutwin et al., 1998). Maintaining different forms of awareness in a virtual community therefore lubricates and increases the value of engagement and possibly increases shared understanding. To enhance social capital in virtual communities, people need to be aware of the people they are interacting with. They want to know where others are located (demographic awareness) and what they are up to. In more professional settings or in distributed communities of practice, people are often curious about what others do or are interested in (professional awareness), what others know (knowledge awareness) or what they are able to do (capability awareness). Trust is another influential variable of social capital.
Several research studies have used trust as a proxy for measuring social capital. Trust is a critical ingredient of, and a lubricant for, almost all forms of social interaction. Trust enables people to work together, collaborate, and smoothly exchange information and share knowledge without time wasted on negotiation and conflict (Cohan & Prusak, 2000). Trust can also be treated as an outcome of positive attitudes among individuals in a community. Further, in virtual communities, trust can only be created and sustained when individuals are provided with an environment that can support different forms of awareness. In other words, people with shared vision and goals and shared language and terminology are more likely to develop trusting relationships with each other than those who are interested in different things and do not understand each other. These variables are the detailed specification of the elements of social capital in virtual communities. The second step in building a model of social capital is to map the variables (see Figure 1) into a graphical structure based on qualitative reasoning. In particular, knowledge of the structure of the model was grounded in current research into social capital and physical communities as well as recent work on social capital in virtual communities (Daniel, McCalla & Schwier, 2005). The qualitative reasoning about the causal relationships among variables was based on a synthesis of current research on social capital. This research suggests that people’s attitudes in virtual learning communities can strongly influence the level of their engagement with each other and consequently their ability to learn various things about one another, which in turn can influence their level of trust in each other. In other words, when people have positive attitudes towards each other, they are more likely to engage in fruitful discussion, which in turn raises their awareness of what is being discussed and also increases their understanding of the strengths and weaknesses of others in the community.
The causal relationships among the variables in the graph are shown by the direction of the arrows (i.e., attitudes influencing different forms of awareness), with the strength of the influence suggesting a strongly positive relationship among the variables. Further, since awareness can contribute to both trust and distrust, the strength of the relationships can be medium positive, medium weak, etc., depending on the kind of awareness. For instance, demographic awareness has a positive and medium effect on trust (see Figure 1), meaning that it is more likely that people will trust others regardless of their demographic backgrounds, and in fact this is the case with distributed communities of practice. Extending this type of qualitative reasoning resulted in the Bayesian Belief Network structure shown in Figure 1. In the graph, nodes that contribute to higher nodes align themselves in “child” to “parent” relationships, where parent nodes are super-ordinate to child nodes. For example, trust is the child of shared understanding, different forms of awareness, and social protocols, which are in turn children of interaction and attitudes.
Figure 1. Graphical model of social capital
The graph presented above relates to the two kinds of virtual communities (virtual learning community and distributed community of practice). The graph topology enables different forms of experiments to be conducted, which can apply to both types of communities. Once a Bayesian Belief Network graph is developed, the third stage is to obtain initial probability values to populate the network. Initial probabilities can be obtained from different sources, but obtaining accurate initial numbers that yield valid and meaningful posteriors can sometimes be difficult. The approach presented in the book is intended to simplify the construction process.
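For reference, the probabilities that populate a Bayesian Belief Network are one conditional distribution per node given its parents, and the joint distribution is their product. The factorization below is the standard one for Bayesian networks rather than a formula taken from this chapter; $\mathrm{pa}(X_i)$ denotes the parents of node $X_i$, and the second line simply instantiates it for the trust node using the parents named in the text (the variable names are shorthand introduced here).

```latex
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr)

P\bigl(\mathit{Trust} \mid \mathrm{pa}(\mathit{Trust})\bigr)
  = P\bigl(\mathit{Trust} \mid \mathit{SharedUnderstanding},\ \mathit{Awareness},\ \mathit{SocialProtocols}\bigr)
```

Each factor is a conditional probability table, so a node with two states and $k$ two-state parents needs $2^k$ rows of initial values.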
Computing Conditional Probabilities
In a Bayesian network, every stage of situation assessment requires assigning initial probabilities to the hypotheses. These initial probabilities are normally obtained from knowledge of a particular situation prevailing at a particular time. However, converting a state of knowledge into probability values is a challenge facing Bayesian modellers. In the book, the initial conditional probabilities for the social capital model were obtained by examining qualitative descriptions of the influence between two or more variables. Each probability value describes the strength of a relationship, and the letters S (strong), M (medium), and W (weak) represent different degrees of influence among the variables in the model (Daniel, Zapata-Rivera & McCalla, 2003). The signs + and - represent positive and negative relationships among the variables.
Table 1. Social capital variables and their definitions
Variable Name | Variable Definition | Variable States
Attitudes | Individuals’ general perception about each other and others’ actions | Positive/Negative
Shared Understanding | A mutual agreement/consensus between two or more agents about the meaning of an object or idea | High/Low
Awareness | Knowledge of people, tasks, or environment, or all of the above | Present/Absent
Demographic Awareness | Knowledge of an individual: country of origin, language and location | Present/Absent
Professional Awareness | Knowledge of people’s background training, affiliation etc. | Present/Absent
Engagement | An extended period of interaction between two or more people that goes beyond an exchange of words to important and meaningful social connections | Positive/Negative
Social Protocols | The mutually agreed upon, acceptable and unacceptable ways of behaviour in a community | Present/Absent
Trust | A particular level of certainty or confidence with which an agent assesses the actions of another agent | High/Low
Once the initial probabilities were determined, a fully specified joint probability distribution was computed. In the process, for each node X (where X is any variable in the graph), the probability distribution for X conditional upon X’s parents was specified. For example, the distribution of shared understanding conditional upon its parents (engagement, attitudes and community type) was specified. As discussed earlier in the chapter, different forms of awareness are critical to engagement that stimulates positive social capital. The conditional probability values were obtained by adding weights to the values of the variables, depending on the number of parents and the strength of the relationship between particular parents and their children. For example, Attitudes and Engagement have positive and strong (S+) relationships with Knowledge Awareness; evidence of positive engagement and positive attitudes will therefore produce a conditional probability value for Knowledge Awareness of 0.98 (the threshold value for strong = 0.98). The weights were obtained by subtracting a base value (1 / number of states, 0.5 in this case) from the threshold value associated with the degree of influence and dividing the result by the number of parents (i.e. (0.98 - 0.5) / 2 = 0.48 / 2 = 0.24); this follows from the fact that in the graph Knowledge Awareness is a child of both interactions and attitudes. Table 2 shows the threshold values and weights used in this example. Since a certain degree of uncertainty is likely to exist, the value α = 0.02 leaves some room for uncertainty when considering evidence coming from positive and strong relationships. These threshold values can be adjusted based on expert opinion. This approach has appeared in earlier work (Zapata-Rivera, 2002; Daniel, Zapata-Rivera & McCalla, 2003).
Table 2. Threshold values and weights with two parents
Degree of influence | Thresholds | Weights
Strong | 1 - α = 1 - 0.02 = 0.98 | (0.98 - 0.5) / 2 = 0.48 / 2 = 0.24
Medium | 0.8 | (0.8 - 0.5) / 2 = 0.3 / 2 = 0.15
Weak | 0.6 | (0.6 - 0.5) / 2 = 0.1 / 2 = 0.05
Using this approach, it is possible to generate conditional probability tables (CPTs) for each node (variable) regardless of the number of parents. Of course, the accuracy of this approach also depends on the consistency, coherence and validity of the initial knowledge gathered from experts and on the decisions made by the knowledge engineer in transforming that knowledge into initial probabilities. For instance, when subject matter experts are consulted to obtain initial probabilities, their knowledge is translated into the weighted threshold values described in Table 2, depending on the degree of influence among the variables (i.e. evidence coming from one of the parent’s states). A knowledge engineer can be guided by a domain expert when working on the threshold values.
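The weighting scheme described above can be sketched in a few lines of Python. The thresholds, the base value of 0.5 and the division by the number of parents come from the text. How a parent in its unfavourable state contributes is not spelled out in this section, so the sketch assumes that such a parent subtracts its weight from the base value. The function names `weight` and `build_cpt` are hypothetical, and the sketch is restricted to two-state variables.

```python
from itertools import product

ALPHA = 0.02                       # room left for uncertainty (from the text)
BASE = 0.5                         # 1 / number of states, for two-state variables
THRESHOLDS = {"S": 1 - ALPHA, "M": 0.8, "W": 0.6}


def weight(strength, n_parents):
    """Weight of one parent: (threshold - base) / number of parents."""
    return (THRESHOLDS[strength] - BASE) / n_parents


def build_cpt(parent_strengths):
    """Build P(child = favourable | parent states) for a two-state child.

    parent_strengths maps each parent name to 'S', 'M' or 'W'.
    The result maps a tuple of booleans (True = parent in its favourable
    state, in the order of parent_strengths) to a probability.

    ASSUMPTION: a parent in its favourable state adds its weight to the base
    value and a parent in its unfavourable state subtracts it.
    """
    names = list(parent_strengths)
    k = len(names)
    cpt = {}
    for states in product([True, False], repeat=k):
        p = BASE
        for name, favourable in zip(names, states):
            w = weight(parent_strengths[name], k)
            p += w if favourable else -w
        cpt[states] = round(p, 4)
    return cpt


if __name__ == "__main__":
    # Knowledge Awareness with two strong positive parents, as in the text.
    cpt = build_cpt({"Attitudes": "S", "Engagement": "S"})
    for states, p in cpt.items():
        print(states, p)
    # The weights of Table 2 for two parents.
    print([round(weight(s, 2), 2) for s in ("S", "M", "W")])
```

With both parents favourable the entry reproduces the strong threshold of 0.98, and the three weights match the last column of Table 2. Under the stated assumption, mixed evidence from two equally strong parents cancels out and leaves the base value of 0.5.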
Querying the Model
In general, the mechanism for drawing conclusions in Bayesian models is based on probability propagation of evidence. Propagation refers to updating the model based upon a known set of evidence entered into it. A Bayesian model contains many variables, each of which can be relevant for some kind of reasoning, but rarely are all variables relevant for all kinds of reasoning at once. Therefore, researchers need to identify the subset of the model that is relevant to their needs. In other words, it is sometimes the case that the modeller enters evidence for only a few variables in order to observe changes in certain other variables. Querying a Bayesian model refers to the process of updating the conditional probability table and making inferences based on new evidence. One way of updating a Bayesian model is to develop a number of detailed scenarios that can be used to query the model. Constructing and updating a model of social capital in virtual learning communities is a complex task, since there are numerous underlying variables that are not necessarily obvious in virtual communities. In order to facilitate model construction and updating, scenarios describing various events are developed, based on either evidence or an expert’s knowledge, to test and tune the model over time.
A scenario can generally be described as a set of written stories or a synopsis of acts in stories built around carefully constructed events. In a scientific and technical sense, a scenario describes a vision of the future state of a system. Such a description can be based on a current assessment of the system, of the variables and assumptions, and of the likely interaction between system variables in the progression from current conditions to a future state (Collion, 1989). In virtual communities, scenarios provide simple, intuitive, example-based descriptions of the patterns of interaction between two or more variables of interest. Scenario-based modeling is essentially a set of procedures for describing specific sequences of behaviours within a model that illustrate actual interactions within a learning community. The goal is to understand and explain the interactions of variables or sets of events within a model and how these might influence the direction of interaction patterns, and subsequently the level of social capital within that community. This means that a single scenario might describe a possible state of interactions in a community, and upon its implementation possible alternative explanations are provided to describe the current and future behaviours of a model. When several scenarios are used together to describe possible outcomes of events within a model, they can exceed the power of predictions based on a single hypothesis or a set of propositions drawn from a single data set. While a hypothesis normally refers to a set of unproven ideas, beliefs, and arguments, a scenario can describe proven states of events, which can be used to understand future changes within a model. Further, the outcomes of the events might be used to generate a set of hypotheses. These hypotheses can then be used to understand a specific situation within a model. Moreover, the results of a scenario and hypothesis can be combined to further refine the consistency and accuracy of a model. However, for a scenario-based approach to be useful, the scenarios created from any particular evidence or data set must be plausible and internally consistent.
Scenarios in Bayesian modeling of social capital provide alternative explanations for particular changes in variables and their effects on a particular community. The use of a scenario-based approach to query a model also offers a common vocabulary and an effective basis for communicating complex and sometimes paradoxical conditions. In the context of my research, this provides an opportunity to incorporate strategies from qualitative perspectives and to avoid the potential for sharp discontinuities that most quantitative approaches exclude from qualitative approaches. In updating the model of social capital, new evidence in the form of scenarios was used. A scenario is a written synopsis of inferences drawn from observed phenomena or empirical data. Druzdzel and Henrion (1993) described a scenario as an assignment of values to those variables in a Bayesian network which are relevant for a certain conclusion, ordered in such a way that they form a coherent story: a causal story which is compatible with the evidence of the story. The use of scenarios as an approach to updating Bayesian network models is based on psychological research (Pennington & Hastie, 1988). This research shows that humans tend to interpret and explain any social situation by weighing the most credible stories to test and understand social phenomena.
Furthermore, updating a Bayesian model using scenarios is an attempt to understand various relationships among variables within a network.
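To make the idea of entering evidence and propagating it concrete, the following sketch queries a deliberately tiny three-node chain (Attitudes, Awareness, Trust) by brute-force enumeration of the joint distribution. It is not the chapter's model: the chain structure, the variable states and every number in `CPT` are illustrative assumptions, and real tools use more efficient propagation algorithms than enumeration. The names `joint` and `query` are hypothetical.

```python
from itertools import product

# ASSUMPTION: a toy chain Attitudes -> Awareness -> Trust with made-up CPTs.
STATES = {
    "Attitudes": ["positive", "negative"],
    "Awareness": ["present", "absent"],
    "Trust": ["high", "low"],
}
PARENTS = {"Attitudes": [], "Awareness": ["Attitudes"], "Trust": ["Awareness"]}

# CPT[variable][tuple of parent states][state] = probability
CPT = {
    "Attitudes": {(): {"positive": 0.5, "negative": 0.5}},
    "Awareness": {
        ("positive",): {"present": 0.98, "absent": 0.02},
        ("negative",): {"present": 0.02, "absent": 0.98},
    },
    "Trust": {
        ("present",): {"high": 0.8, "low": 0.2},
        ("absent",): {"high": 0.2, "low": 0.8},
    },
}


def joint(assignment):
    """Probability of one complete assignment: product of the CPT entries."""
    p = 1.0
    for var in STATES:
        key = tuple(assignment[parent] for parent in PARENTS[var])
        p *= CPT[var][key][assignment[var]]
    return p


def query(target, evidence):
    """Return P(target | evidence) by summing the joint over all assignments
    that agree with the evidence, then normalising."""
    variables = list(STATES)
    totals = {state: 0.0 for state in STATES[target]}
    for values in product(*(STATES[v] for v in variables)):
        assignment = dict(zip(variables, values))
        if all(assignment[v] == s for v, s in evidence.items()):
            totals[assignment[target]] += joint(assignment)
    norm = sum(totals.values())
    return {state: p / norm for state, p in totals.items()}


if __name__ == "__main__":
    print(query("Trust", {}))                           # prior belief
    print(query("Trust", {"Attitudes": "positive"}))    # after entering evidence
    print(query("Attitudes", {"Trust": "high"}))        # reasoning backwards
```

A scenario in the sense used here corresponds to the `evidence` dictionary: an assignment of states to the variables that a story makes observable, after which the remaining marginals are recomputed.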
CASE SCENARIOS
During the process of updating the Bayesian model, various pieces of evidence were collected and compiled as scenarios to simplify the process and to enhance the clarity of the stories. These scenarios are intended to emulate experimental data and to illustrate the process of updating an initial Bayesian model using similar evidence. It is likely that the model predictions would change in the face of more empirical data.
Case 1: A Virtual Learning Community of Graduate Students
Community A was a formal virtual learning community of graduate students learning fundamental concepts and philosophies of E-Learning. The members of this community were drawn from diverse cultural backgrounds and had different professional training. In particular, participants were practising teachers teaching in different domains at secondary and primary school levels. Some individuals in the community had extensive experience with educational technologies, while others were novices in that area but had extensive experience in classroom pedagogy. These individuals had not been exposed to each other before and thus were not aware of each other’s talents and experiences. Since the community was a formal one, there was a formalized discourse structure, and the social protocols for interaction were explained to participants in advance. These protocols required different forms of interaction, including posting messages, critiquing others, providing feedback on others’ postings, asking for clarification, etc. As the interactions progressed, intense disagreements were observed in the community. Individuals increasingly disagreed on the issues under discussion, and there was little shared understanding among the participants in most of the discourse.
Case 2: A Distributed Community of Practice of Software Engineers
Community B was a distributed community of practice for software engineers who gathered to discuss issues around software development. The main goals of the community were to facilitate the exchange of information and knowledge and to provide peer support to its members. Members of this community shared common concerns. In terms of skills, participants comprised highly experienced software developers and novices. Participants were drawn from all over the world and were affiliated with different organisations, including university researchers and software support groups. After a considerable period of interaction, individuals had been exposed to each other long enough to start exchanging personal information. It was also observed that individuals offered a lot of help to each other throughout their interactions. Though no formal social protocols were explained to the participants, members interacted as if there were social protocols guiding their interactions. Further, there were no visible community leader roles.
Case 3: A Distributed Community of Practice of Programmers
Community C consisted of a group of individuals learning the fundamentals of programming in Java. It was an open community whose members were geographically distributed and had diverse demographic backgrounds and professional cultures. They did not know each other personally, and they used different aliases from time to time while interacting in the community. Diverse programming experience, skills and knowledge were also observed among the participants. It was interesting to observe that though these individuals did not know each other in advance, they were willing to offer help and to support each other in learning Java. Though there were no formal social protocols of interaction, individuals interacted as if there were a clear set of social protocols to be followed in the community.
Case 4: A Distributed Community of Practice of Biomedical Scientists and Clinicians
The case for community D is extracted from a recent phenomenon observed within health system research. The increasing complexity of clinical problems and the difficulty of engaging all health professionals in research, coupled with the failure to rapidly move research into new clinical approaches, procedures and technologies, have created a need for new approaches to the interface between clinical research, practice and policy. The hallmark of these new approaches to health research is embedded in the conceptual framework of distributed communities of practice of biomedical scientists and clinicians. Such a community normally operates as an interdisciplinary unit, drawing membership from nurses, clinicians, policy analysts and academic researchers to move research findings into patient care. Meeting the continuous demand for understanding of complex human diseases, solutions to chronic diseases and preventative measures will most likely require many disciplines, with biomedical scientists, clinicians and nurses all coming together to participate in a distributed community of practice. Members of this community are highly distributed in terms of both their epistemological stances towards addressing health problems and the organizations in which they work. For them to work together effectively, it is imperative that the knowledge required for solving problems draws on theories, concepts or models that are integral to two or more disciplines. Methods of problem solving also need to be developed from multiple perspectives throughout the collaborative process, and shared understanding and awareness of what people can bring to the table are among the issues this group deals with. Though the diversity seen in this community brings rich and diverse views, methods, approaches and procedures that enrich problem solving, these are difficult to enforce due to a lack of shared understanding. Some of the challenges associated with collaboration in interdisciplinary groups include language barriers, cultural diversity, establishing a reasonable level of shared understanding to leverage differences, and building the trust needed for a group to succeed. In order for such a community to be effective, members need to be aware of the value of each disciplinary contribution to the joint accord. They need to develop a mechanism, a translation interface, to facilitate shared understanding as they work together.
UPDATING THE MODEL
The scenarios described above all represent typical situations in which the model can be applied in real-world settings. In order to test and update the initial Bayesian model of social capital, each case scenario was analysed for evidence regarding the impact of individual variables in the model. Once a piece of evidence is added to the model, typically by setting the state of a variable (i.e. observing a particular state of a variable), a process commonly known as variable initialisation, the model is updated and the results are propagated to the rest of the variables in the Bayesian model. This process generates a set of new marginal probabilities for the variables in the model. In the four case scenarios, the goal was to observe changes in the probability values for trust and social capital. The model prediction outcomes were based on the nature of the cases described in the chapter. It is important to note that the cases themselves represent general characteristics of virtual communities and are not directly based on empirical evidence. However, this is a step towards developing more cases to train the model and running empirical experiments to validate it. This phase of model development further helps experts to examine the model and refine it based on their knowledge of the domain. The Bayesian model therefore serves as an interactive tool that enables experts to create a probabilistic model, simulate scenarios and reflect on the results of the predictions.
Community A
Community A is a virtual learning community (Community Type = VLC). Based on the case description, shared understanding is set to low and professional knowledge awareness is set to doesnotexist. Individuals in this community are familiar with their geographical diversity, and so demographic awareness is set to exists. There is a well-established formal set of social protocols set previously by the instructor (social protocols = known). Figure 2 shows the Bayesian model after the evidence from community A has been added (shaded nodes), together with the resulting posterior probabilities. The predictions show the probability of a high level of trust (P(Trust=high) = 41.0%) and the corresponding probability of a high level of SC (P(SC=high) = 36.2%). These values are relatively low. Several explanations can be provided for the drop in the levels of SC and trust. First, there was negative interaction in the community and a lack of shared understanding. The lack of shared understanding had possibly affected the level of trust and subsequently social capital. It is also possible that negative interactions and attitudes affected the levels of task knowledge awareness and individual capability awareness. It could also be inferred that the experiences of more knowledgeable individuals in the community were likely to have been ignored, making individuals less co-operative.
Figure 2. Probability distribution for community A
Community B
Variables observed in this case include community type, which was set to distributed community of practice (DCoP), and professional awareness, which was set to exists, since it was observed that after interaction individuals in that community became aware of each other’s talents and skills. Knowledge awareness was set to exists as well. Individuals in this community shared common concerns and a frame of reference, and so shared understanding was set to high. Figure 3 shows the Bayesian model after the evidence from community B has been added (shaded nodes) and propagated through the model. After propagating this set of evidence, high levels of trust and SC (P(Trust=high) = 93.1% and P(SC=high) = 73%) were observed. Given the evidence, it was also observed that interactions and attitudes in the model were positive, which positively influenced demographic cultural awareness and social protocols. Further, the presence of shared understanding and the high degrees of different kinds of awareness and knowledge of social protocols in this community resulted in high levels of trust and SC. In spite of the evidence, demographic cultural awareness has little influence on the level of trust in this kind of community, and consequently it has not significantly affected SC. This can be explained by the fact that professionals in most cases are likely to cherish their professional identity more than their demographic backgrounds. This is in line with a previous study, which suggested that most people in distributed communities of practice mainly build and maintain social relations based on common concerns rather than geographical distribution (Daniel, O’Brien & Sarkar, 2003).
Figure 3. Probability distribution for community B
Community C
Variables extracted from this case scenario include community type (VLC); shared understanding, professional awareness, demographic awareness and knowledge awareness were all set to exists. Figure 4 shows the Bayesian model after the evidence from community C has been added (shaded nodes) and propagated through the model. In community C, high levels of trust and SC (P(Trust=high) = 92.7% and P(SC=high) = 78.4%) were observed after the propagation of the evidence. These high levels of trust and SC can be attributed to the fact that the community was based on an explicit and focused domain. Though members might conceal their identities, they were willing to interact positively and participate in order to learn the domain. The further increase in the levels of trust and social capital can also be attributed to the presence of shared understanding. In other words, people in that community got along well and understood each other well enough. They used the same frame of reference and had the common goal of learning a domain (the Java programming language).
Figure 4. Probability distribution for community C
Community D
Community D is truly a distributed community of practice, manifesting all its features. The community has a strong, identified need for collaboration across domains, shared interests and concerns, and an identified need to build a collaborative joint enterprise, which is dependent on social capital. However, most of the variables critical to the development of social capital are lacking. As a result, shared understanding was set to low, social protocols to not observed, and professional and knowledge awareness were both set to low. The results of the model’s prediction revealed a considerably low level of trust (P(Trust=high) = 59.6%) and a correspondingly low level of social capital (P(SC=high) = 35.1%). This is expected, since core variables of social capital critical to distributed communities of practice, such as shared understanding and professional awareness, were absent.
Figure 5. Probability distribution for community D
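The posteriors reported for the four communities can be placed side by side for comparison. The short sketch below only restates the percentages given in the text for communities A to D and ranks them; it does not re-run the model, and the names `REPORTED` and `rank_by` are hypothetical.

```python
# Posterior probabilities (in percent) as reported in the text.
REPORTED = {
    "A": {"trust_high": 41.0, "sc_high": 36.2},
    "B": {"trust_high": 93.1, "sc_high": 73.0},
    "C": {"trust_high": 92.7, "sc_high": 78.4},
    "D": {"trust_high": 59.6, "sc_high": 35.1},
}


def rank_by(results, key):
    """Return community names ordered from highest to lowest value of `key`."""
    return sorted(results, key=lambda name: results[name][key], reverse=True)


if __name__ == "__main__":
    print("By trust:", rank_by(REPORTED, "trust_high"))
    print("By social capital:", rank_by(REPORTED, "sc_high"))
```

Read this way, the two communities in which shared understanding was present (B and C) sit well above the two in which it was set to low (A and D) on both measures, although the ordering within each pair differs between trust and social capital.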
CHALLENGES
In theory, computational models are expected to be fully verified and valid, but in practice no computational model will ever be fully verified to the point of guaranteeing 100% error-free accuracy. Still, a high degree of statistical certainty is required for a model to deliver useful insights and knowledge. One of the greatest challenges of building a computational model is making it valid, relevant and useful, which for the most part means establishing model credibility. This requires gathering empirical data and subjecting the model to several rigorous verification and validation stages. It also requires establishing an argument that the model has produced sound insights and sound data based on a wide range of tests, which are comparable to data in real-world settings. The social capital model presented in this book has not passed rigorous model validation leading to statistically significant results. As in many social systems, modelling social issues is not so much about obtaining 100% error-free models as about gaining the insights required to understand and solve problems. In addition, most of the approaches used for building models of social systems make use of qualitative inferences rather than quantitative predictions about the future state of systems. The input variables used for building the model of social capital were extracted from the literature, and so might not necessarily be empirically based or situated within virtual communities, though variables such as awareness were based on an understanding of social capital in virtual communities. Furthermore, research into social capital in virtual communities is still at an early stage, and more needs to be done to understand the nature of the independent variables constituting these communities and how they causally relate to each other. There are two ways to construct Bayesian models: one is to learn a graphical structure from data, and the other is to propose an initial graphical structure based on logical reasoning and train the graph to learn probability values using new evidence. The latter is the approach used to build the social capital graph. This approach is not necessarily consistent all the time. There is a need to run a number of experiments to further validate and refine the structure, which in this case was minimally done. Although the sensitivity analysis conducted on the structure has proved the logic used for building the model and some of the findings from the synthesis of what constitutes social capital, it would be more interesting to reconfigure the structure, re-run several sensitivity analyses, and observe the variability of the degree of influence of all the variables on social capital and of the components of social capital on each other. These procedures, though useful, were not essential, since the overall goal of the model is to demonstrate a procedure for modelling social capital rather than to build a final model of social capital, which, though possible, requires further work, perhaps in the second edition of this book.
CONCLUSION
The ultimate goal of this chapter was to demonstrate a working example of a computational model of social capital in virtual communities. It is also intended to provide guidance to researchers and practitioners interested in exploring social issues in virtual communities. Moreover, the issues predicted by the model are intended to open up debate about social capital in virtual communities and its technical and social implications for knowledge sharing and social networking. A Bayesian network is highly suitable for modelling social capital, since its variables are still highly uncertain. This uncertainty is due to current gaps in knowledge and to the complexity and imprecision of social capital; it is also due to ignorance or the volatility of its definition. By representing social capital in graphical form, researchers can effectively communicate results and can observe which variables of social capital are more and less important. The model predictions revealed an increased level of trust and a corresponding increase in social capital. The predictions also suggest that professional awareness and shared understanding have an effect on the level of social capital in a community. They further suggest that when people are aware of each other and are able to develop a certain level of shared understanding, they may develop better and more productive relationships, that is, social capital. Where there was a high level of shared understanding and individuals were aware of each other’s capabilities (knowledge awareness), the model predicted an increase in trusting relationships as well as high levels of social capital. Similarly, when all forms of awareness shown in the model are present in a situation, and when there is an increased level of shared understanding, the levels of trust and social capital increase significantly compared with the presence of only one form of awareness (demographic awareness). In conclusion, different forms of awareness, shared understanding and trust contribute to our understanding of social capital in any virtual community.
Although the scenarios presented in this chapter are inadequate for drawing final conclusions about causal links between these variables and the overall level of social capital in a community, the predictions provide a starting point for understanding social capital in virtual communities and will possibly trigger more thinking about how to enhance social capital in virtual communities.
REFERENCES
Cohen, D., & Prusak, L. (2001). In good company: How social capital makes organizations work. Massachusetts: Harvard Business School Press.
Collion, M. H. (1989). Strategic planning for national agricultural research systems: An overview. Working Paper 26. Retrieved July 3 from http://www.ifpri.org/divs/isnar.htm
Daniel, B. K., Zapata-Rivera, J. D., & McCalla, G. I. (2003). A Bayesian computational model of social capital. In Huysman, M., Wenger, E., & Wulf, V. (Eds.), Communities and technologies (pp. 287-305). London: Kluwer Publishers.
Daniel, B. K., O’Brien, D., & Sarkar, A. (2003). A design approach for Canadian distributed community of practice on governance and international development: A preliminary report. In Verburg, R. M., & De Ridder, J. A. (Eds.), Knowledge sharing under distributed circumstances (pp. 19–24). Enschede: Ipskamps.
Daniel, B. K., Schwier, R. A., & McCalla, G. I. (2003). Social capital in virtual learning communities and distributed communities of practice. Canadian Journal of Learning and Technology, 29(3), 113–139.
Daniel, B. K., Zapata-Rivera, J. D., & McCalla, G. I. (2005, November). Computational framework for constructing Bayesian belief network models from incomplete, inconsistent and imprecise data in E-Learning (Poster). The Second LORNET International Annual Conference, I2LOR-2005, November 16 to 18. Vancouver, Canada.
Daniel, B. K., Zapata-Rivera, J. D., & McCalla, G. I. (2007). A Bayesian Belief Network approach for modelling complex domains. In Mittal, A., Kassim, A., & Tan, T. (Eds.), Bayesian Network Technologies: Applications and Graphical Models (pp. 13–41). Hershey, PA: IGI Publishing.
Druzdzel, M. J., & Henrion, M. (1993). Efficient reasoning in qualitative probabilistic networks. In Proceedings of the 11th National Conference on Artificial Intelligence (pp. 548-553).
Gutwin, C., & Greenberg, S. (1998). Design for individuals, design for groups: Tradeoffs between power and workspace awareness. In Proceedings of the ACM Conference on Computer Supported Cooperative Work (pp. 207-216). ACM Press.
Pennington, N., & Hastie, R. (1988). Explanation-based decision making: Effects of memory structure on judgment. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14(3), 521–533. doi:10.1037/0278-7393.14.3.521
Zapata-Rivera, J. D. (2002). cbCPT: Knowledge engineering support for CPTs in Bayesian networks. Proceedings of Canadian Conference on AI (pp. 368-370).
Section 5
Methods, Measurements and Matrices
In the previous section the importance of modelling was emphasized. In addition to building models, there is a need to develop measures and metrics for analysing interactions in virtual communities. Section 5 covers diverse measurement metrics, methods and approaches for studying virtual communities. Chapter 19 is targeted at virtual community researchers who are interested in quantitatively examining or employing the geography of a community but have no training in the methodologies necessary to do so. The chapter takes the reader from the data collection stage through the application of several simple techniques. Chapter 20 discusses how virtual communities are associated with business and describes how these communities can support the overall business effort. The chapter then examines the ways that the execution of certain business processes, such as the ‘lessons learned process’, can have a strong supporting role in maintaining the health of virtual communities. Chapter 21 provides an example of a community building methodology using a step-by-step approach. The chapter first gives an overview of arriving at a community specification and building methodology, which translate into specific measurable goals and the selection and matching of social media and tools, and shows how these can ultimately be translated into software. In Chapter 22, technologies capable of locating and sorting networked communities of geographically disparate individuals within virtual communities are discussed. From a methodological point of view, the chapter suggests that virtual communities and social networks between individuals subsume the role of neighborhood areas as the most appropriate unit of analysis for deriving consumer insight, and as such, the methodologies of geodemographics need to be repositioned to accommodate social similarities in virtual, as well as geographical, space.
Chapter 23 presents work on the development of a cellular phone application, ProBoPortable, that displays information regarding the progress and achievement of tasks and division of labor in project-based learning (PBL) for higher education. ProBoPortable works as wallpaper on the screen of the learner’s cellular phone, and it cooperates with Web-based groupware. Chapter 24 introduces a mathematical retrieval system that helps mathematics learners self-study effectively. The chapter presents the math-retrieving system module, which takes analyzed problems submitted by users, retrieves solutions from similar stored problems and ranks the retrieved problems for users. Chapter 25 focuses on quantitative content analysis of online interactions, in particular, asynchronous online discussion. It clarifies the definitions of quantitative content analysis and provides a summary of 23 existing coding schemes, broadly categorized by the theoretical constructs under investigation. Chapter 26 offers an overview of how Semantic Web technologies can be used to provide a unified layer of representation for Social Web data in an open and machine-readable manner, to generate shared models and shared semantics, facilitating data gathering and analysis. Chapter 27 outlines a process of virtual ethnography that combines emic and etic methods of data gathering adapted to the virtual context to provide a ‘true’ accounting of the social constructs inherent in the virtual world. Chapter 28 explores the epistemological and ethical boundaries of the application of a participant-observer methodology for analyzing avatar design in user-generated virtual worlds. The chapter describes why Second Life was selected as the preferred platform for studying the fundamental design properties of avatars in a situated manner.
Chapter 29 presents a participatory design experiment influenced by swarming activity. The chapter also introduces a new approach of writing narratives in virtual learning communities driven by social Web 2.0 and contrasts it with traditional storytelling approaches. Chapter 30 examines how a variety of research approaches can be applied to the study of cross-blog interactions. It outlines the challenges of studying cross-blog interactions, discusses strengths and limitations of traditional approaches and provides examples of how a new approach may be used to help fully capture the complexity of the interactions. Chapter 31 explicates methodological procedures for the measurement and visualization of chat-based communicative interaction in MUVWs as social networks. It presents a case study on an educational MUVW, the SciCentr programs sponsored by Cornell University, and shows how this was used to elaborate methods and related findings. Chapter 32 proposes an online multi-contextual analysis (OMCA) as a new multi-method approach for investigating and analyzing the behaviors, perceptions, and opinions of social network site (SNS) users. The approach was designed to extend methods currently available for the investigation of the use and social consequences of these sites with techniques that converge upon and triangulate users’ perceptions of their online behavior. Chapter 33 discusses participant observation as a method of data collection for studying social interaction in online multiplayer games and the communities within them and looks at various practical issues connected to conducting participant observation in online multiplayer communities. Chapter 34 explores trends and developments in news-oriented virtual communities. The chapter reviews several data collection and analysis techniques, such as content analysis, usability testing and eye-tracking, and proposes that these techniques and associated tools can aid the study of news communities.
The chapter examines the implications these techniques have for better understanding human behavior in virtual communities as well as for improving the design of these environments. Chapter 35 presents major challenges associated with the analysis of interaction patterns in informal virtual communities. Drawing from previous research whose intent was to develop a theoretical model of interactions, social network analysis as well as content analysis were employed to understand the structure and nature of interaction in such virtual communities. Chapter 36 discusses basic research methodologies to help virtual community researchers clarify the dilemma inherent in the virtual community research fraternity. The chapter also presents advanced discussion of systematic methodological application where data collected for a research study can be conveniently treated, analysed and interpreted to be able to write a professional masterpiece of a research report as a contribution to the knowledge data base. Chapter 37 focuses on quantitative content analysis of online interactions, in particular, asynchronous online discussion. It clarifies the definitions of quantitative content analysis and provides a summary of 23 existing coding schemes, broadly categorized by the theoretical constructs under investigation. Chapter 38 presents a theoretical business brand model along with analyses of an actual virtual community. Chapter 39 presents a novel systematic model of interaction analysis which was designed and successfully experimented with a wide sample of adult learners in order to enhance and understand cognitive, socio-organizational and emotional-affective processes of virtual learning communities (VLCs).
Chapter 19
A Beginner’s Guide to Geographic Virtual Communities Research Brent Hecht Northwestern University, USA Darren Gergle Northwestern University, USA
DOI: 10.4018/978-1-60960-040-2.ch019
INTRODUCTION
Virtual communities have important geographic components. Community participants live, work, and travel to specific places on the Earth’s surface, and communities often reflect the characteristics of these places. In addition, community artifacts are often imbued with geographic information. Researchers can use these often under-appreciated geographic elements to understand important patterns in virtual communities’ interaction with the real world. For instance, one could build and study a shared repository for a biking community’s geographic knowledge (Priedhorsky & Terveen, 2008), investigate whether community artifact density is biased towards certain areas of the globe (Hecht & Gergle, 2009), or model the
particular characteristics of a community’s spatiosocial network (Larsen, Axhausen, & Urry, 2006; Larsen, Urry, & Axhausen, 2006). Geographic analyses can also allow an investigator to answer questions that are not overtly geographic in nature. In such cases, these analyses can provide an efficient alternative or supplement to more traditional methods such as large-scale surveys, interviews, or observational techniques. In many ways, it is this capability of geographical analyses that is more powerful for the virtual communities researcher. The number of possible research topics is effectively unlimited, but they could include modeling the relationship between social networking site usage and socioeconomic status, understanding human photo-taking behavior (Hecht & Gergle, 2010; Yanai, Yaegashi, & Qiu, 2009), modeling and sharing dynamic travel behavior based on interaction within social networks (Pultar &
Raubal, 2009), and identifying self-focus bias in wikis (see the case study at the end of the chapter). This chapter is targeted at the virtual community researcher who wants to quantitatively examine or employ the geography of a community, but has no training in the methodologies necessary to do so. We take the reader from the data collection stage through the application of several simple techniques, suggesting more advanced literature when space limitations prevent us from delving into details. We also take special care to flag important pitfalls that cause hard-to-notice but critical errors. Finally, we close with a brief but illustrative research project case study. This chapter is effectively an introductory lesson in Geographic Information Systems (GIS) and Geographic Information Science (GIScience), customized for the virtual communities researcher. A GIS is a “set of tools for performing operations on geographic data that are too tedious or expensive or inaccurate if performed by hand”. In doing so, it helps “reveal what is otherwise invisible in geographic information” (Longley, Goodchild, Maguire, & Rhind, 2005b). Another definition many GIS educators find useful describes GIS as a “powerful set of tools for collecting, storing, retrieving at will, transforming, and displaying spatial data from the real world for a particular set of purposes” (Burrough & McDonnell, 1998). GIScience is the science and engineering behind this “set of tools”. It can be loosely considered analogous to information science but for the well-defined class of geographic information (Longley, et al., 2005b). While GIS/GIScience and computer science are closely related, this chapter should be accessible to readers with no programming experience at all. However, programming ability (or access to someone with knowledge of programming) will help the reader more readily leverage the tools we mention for their own research.
In particular, experience with web-based application programming interfaces (APIs), Java, and/or statistical programming will be useful.
MINING GEOGRAPHIC INFORMATION FROM VIRTUAL COMMUNITIES
Before engaging in any study involving the geographic component of virtual communities, it is necessary to obtain geographic information or to transform pre-existing geographic information into a “usable” form. Usable forms include latitude/longitude coordinates, bounding boxes around geographic features, and advanced polygonal and polylinear representations (e.g. the shape of the United States and the path of a road), along with the attribute information attached to these data, such as a username, population, etc. Formally, geographic information is defined as “atomic pairs of the form <x,z> where x is a location in space1 and z is a set of properties [attributes] of that location; or information that is reducible to such pairs.” (M. Goodchild, 2001; M. Goodchild, Yuan, & Cova, 2007). For example, the x in a pair could be a latitude/longitude of a city that is mentioned in a forum posting, and the z could include the average income of the city, the username of the poster, his/her centrality in a social network, and/or the size of the post (Figure 1). This section discusses important methodologies for obtaining geographic information and making it usable for virtual communities research. We also point the reader to easy-to-use tools for applying these methodologies.
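As a minimal sketch of the <x,z> definition above, a pair can be modelled as a location plus a dictionary of attributes. The class name `GeoRecord`, the field names, and all sample values below are hypothetical and chosen only for illustration; they do not come from any particular community dataset.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class GeoRecord:
    """One <x, z> pair: a location x and its attribute set z (hypothetical structure)."""
    x: Tuple[float, float]                          # (latitude, longitude), decimal degrees
    z: Dict[str, Any] = field(default_factory=dict)  # attributes attached to the location

    def __post_init__(self):
        lat, lon = self.x
        # Reject coordinates that cannot be valid latitude/longitude values.
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError(f"coordinates out of range: {self.x}")


# Illustrative record for a forum post that mentions a city (made-up values).
post = GeoRecord(
    x=(41.88, -87.63),
    z={"username": "example_user", "post_length": 512},
)
```

Keeping x and z separate in this way mirrors the definition: the same location can carry any number of attributes, and richer footprints (bounding boxes, polygons) could replace the simple coordinate tuple.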
Latitude and Longitude Pairs
A growing number of virtual communities generate community artifacts that contain latitude and longitude coordinates. Assuming this structured information is accurate, it is often immediately “usable” in geographic analyses. Classic examples include the latitude and longitude (“lat/lon[g]”) tags that have been manually associated with hundreds of thousands of Wikipedia articles or online photo collections that have been manually or automatically tagged with lat/lon information.
Figure 1. Examples of geographic information datasets. Each row represents an <x,z> pair. Note the variety of representations that can make up x (in this case there are both latitude and longitude coordinates and complex polygonal representations), as well as the diversity of possibilities for z attributes
Later in the chapter we discuss challenges that can result from the poor spatial representations inherent in latitude and longitude points (such as inaccurate area and distance calculations). However, if the virtual community being studied explicitly contains lat/lon tags, a researcher can generally consider herself lucky. Geographic information in other forms (covered later in this section) is generally harder and more error prone to extract.
Street Addresses
Street addresses require a quick and relatively accurate process known as address geocoding before they can be used by most geographical analyses. This process, which generally turns a street address into latitude and longitude coordinates, is usually quite exact. However, the returned coordinates can sometimes be off by about the size of a city block, or a location may be placed on the wrong side of a street (although this situation is improving). Google2, Microsoft Bing3, Yahoo!4 and MapQuest5 all provide web-based address geocoding APIs.
Geographic Information in IP Addresses
One form of geographic information that is frequently available to virtual communities researchers is that contained within IP addresses. Through the process of IP geolocation, a user’s location can be determined with a certain degree of precision and accuracy. Usually, the more one pays for the geolocation software, the better the precision and accuracy. One cannot expect to achieve sub-city level precision at any reasonable level of accuracy. Country-scale research, on the other hand, is generally well suited to IP geolocation. MaxMind’s6 free GeoLite Country, for instance, advertises 99.5 percent accuracy at a country scale (99.5 percent of country identifications are correct), while its GeoLite City package offers 79 percent accuracy for the US within a 25-mile radius (different countries may be more or less
accurate). IP geolocation companies frequently offer free online “sample” versions of their software that can be used to geolocate a small number of IP addresses. Readers should be somewhat cautious when using and interpreting IP geolocation data as some of the causes for IP geolocation inaccuracies can add significant systematic error to certain studies. For example, if you were examining a community of distributed software developers and that group of users primarily connected via a VPN (virtual private network) to their companies then you might have a bias in the results you would get back from IP geolocation.
Geographic Information in Natural Language
Very frequently, community discussions and other artifacts contain vast amounts of geographic information in the form of toponyms, or place names, in natural language. Yahoo! describes this information as “geographically relevant, but not [easily] geographically discoverable” (Yahoo! Developer Network, 2009). Geotagging is the process of identifying toponyms in text and matching them with structured geographic information. It is composed of two parts, geoparsing and geocoding (a generalized form of address geocoding) (Pasley, Clough, Purves, & Twaroch, 2008), each of which is a difficult process and can introduce error. The goal of the geoparsing process is to disambiguate toponyms from non-geographic named entities (solving “geo/non-geo” ambiguity). Consider the case of “Washington”, for example. Without context, geoparsing is impossible to do, as “Washington” can be a place (e.g. “Washington Park”), a former U.S. president (“George Washington”), part of a newspaper title (e.g. “Washington Post”), etc. Natural language processing (NLP) techniques are generally used to partially solve this problem. Once toponyms have been identified with a certain degree of accuracy, the geocoding process
can begin. Geocoding associates a toponym with a spatial footprint of structured geographic information using a digital gazetteer (M. Goodchild & Hill, 2008; Hill, 2000). A spatial footprint can be a latitude and longitude point, a bounding box around a city’s borders, or even a detailed polygonal representation. In other words, whereas geoparsing resolves geo/non-geo ambiguity, geocoding resolves geo/geo ambiguity. Again, “Washington” presents an interesting example. Even if we are sure that we are operating in the geographic domain, “Washington” can refer to a U.S. state, the capital of the United States, or even a street in Albany, California. Without additional assistance, it is not clear which footprint should be matched with the term “Washington”. The case of “London” presents similar problems. Contextual clues can help the disambiguation process. Chances are that if a community member writes about how much she enjoys visiting the Tate Modern and Buckingham Palace on the weekends, the “London” she refers to will be that of London, England. Once this is recognized, a spatial footprint (i.e. latitude/longitude pair) for London, England can be used in a geographic analysis. However, if she writes that she is a student at the University of Western Ontario, then London, Ontario is likely correct, and London, Ontario’s (very different) spatial footprint is used. Virtual communities researchers will often perform the entire geotagging process, but in some cases only the geocoding step is necessary. The latter is true for getting geographic information from data in necessarily geographic database fields such as the “hometown” field in Facebook. The strict typing of the field means that its value is nearly guaranteed to be a geographic entity, thus there is no geographic ambiguity and the geoparsing stage can be skipped. Both Yahoo! and MetaCarta offer web-based APIs for geotagging. 
MetaCarta’s GeoTagger API7 has the advantage of advanced natural language processing, meaning it is capable of correctly interpreting the expression “10 miles North of
Phoenix” as more than just “Phoenix”. Yahoo!’s Placemaker8 geotagging API, however, may be more familiar to a developer already working with Yahoo!’s APIs, and is better suited to handle high volumes of text. Generally speaking, if geocoding alone is required, either the address geocoding or geotagging web APIs can be used to extract spatial footprints. Knowing that only geocoding is needed allows the researcher to use the Google, MapQuest, and/or Bing APIs, instead of being restricted to the particular functionalities and foibles of MetaCarta and Yahoo! (such as traffic limits). Once geographic data has been collected, it is important to understand its limitations. The following section identifies the largest of these limitations for virtual communities researchers and suggests tips for getting around it.
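The geo/geo disambiguation step described earlier in this section (choosing between London, England and London, Ontario from contextual clues) can be sketched with a toy gazetteer. This is only an illustration of the idea, not how any of the commercial APIs mentioned above work: the `GAZETTEER` entries, the rounded coordinates, the cue words, and the function name `geocode` are all assumptions made for the example.

```python
# Toy gazetteer: each toponym maps to candidate footprints plus context cues.
# Entries, rounded coordinates, and cue words are illustrative assumptions.
GAZETTEER = {
    "london": [
        {"name": "London, England", "lat": 51.51, "lon": -0.13,
         "cues": {"tate", "buckingham", "thames", "england"}},
        {"name": "London, Ontario", "lat": 42.98, "lon": -81.25,
         "cues": {"ontario", "western", "canada"}},
    ],
}


def geocode(toponym, context):
    """Pick the candidate whose cue words overlap most with the context text.

    Returns None when the toponym is unknown or when no candidate has any
    supporting cue, rather than guessing.
    """
    candidates = GAZETTEER.get(toponym.lower(), [])
    words = set(context.lower().replace(",", " ").replace(".", " ").split())
    best, best_score = None, 0
    for cand in candidates:
        score = len(cand["cues"] & words)   # number of cue words present
        if score > best_score:
            best, best_score = cand, score
    return best


match = geocode("London", "I enjoy visiting the Tate Modern and Buckingham Palace")
```

A real gazetteer would hold far more places and richer footprints, and a production geotagger would use much stronger evidence than word overlap, but the sketch shows why context is needed before a spatial footprint can be attached to an ambiguous name.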
THE GEOWEB SCALE PROBLEM: ALASKA ON THE HEAD OF A PIN
Scale is a fundamental concept in the study of geographic information. Patterns observed at one scale, for instance, are not necessarily observed at other scales. In addition to the many other scale-related concerns in geographic research (such as the ecological fallacy and the modifiable areal unit problem), online geographic research usually faces a distinctive scale problem: the Geoweb Scale Problem (GSP) (Hecht & Moxley, 2009). Stated in the virtual communities context, the GSP occurs when the spatial footprints available are at too coarse a scale for a given research problem. This can occur when the community itself embeds structured geographic information or when this information is derived using techniques such as IP geolocation or geotagging. How does this manifest in virtual communities research? Consider a researcher aiming to uncover the relationship between the socioeconomic status of neighborhoods in Chicago and the number of Facebook users in those neighborhoods. In this
case the researcher will likely run up against the GSP because Facebook users typically specify their current city (e.g. “Chicago”), and not their neighborhood (e.g. “Hyde Park”). In our work reported in (Hecht & Gergle, 2010), we were unable to specify the proximity of Flickr users to their photos with a precision better than 50km for the same reason. An even nastier instance of the GSP occurs when some spatial footprints are encoded at an appropriate scale for a study, but others are not. The English Wikipedia, for instance, encodes all footprints as single points, including, for example, the state of Alaska’s. Distance-based studies using this point will be fallacious, especially within the region. For instance, Anchorage and the state of Alaska are around 400km apart according to the English Wikipedia’s spatial footprints! Similarly, any study that requires knowledge of containment relations would be impossible using this dataset. To get around this problem, (Hecht & Moxley, 2009) automatically removed the more egregiously coarse spatial footprints in Wikipedia using a list of the geographic features with the largest “real” footprints: countries and first-order administrative districts (i.e. provinces, states, etc.). Taking a similar approach, (Lieberman & Lin, 2009) assumed that coordinates not specified to a certain number of significant digits implied that the geographic features being represented were very large, and filtered them from their analysis. Another approach is to decrease the resolution of the experiment to the lowest common denominator resolution, which is the method described in the case study below. If you do geographic virtual community research long enough, chances are you will run into the GSP. Unfortunately, there is no easy solution. The two approaches used in the literature are either to (1) redefine your study around the spatial representation limitations of your data or (2) filter your data to remove the most egregious cases. 
At the very least you need to be aware of this potential problem and think critically about
how your study or usage of geographic information can be affected.
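The two filtering strategies mentioned above (dropping known large features, and treating low stated coordinate precision as a sign of a large feature) can be sketched as follows. This is a hedged illustration only: the function names, the two-decimal-place threshold, and the sample records are assumptions, and the code is not the procedure actually used by the cited authors.

```python
def decimal_places(coord_text):
    """Number of digits after the decimal point in a coordinate string."""
    _, _, frac = coord_text.strip().partition(".")
    return len(frac)


def filter_coarse(records, min_places=2, coarse_names=frozenset()):
    """Keep records whose spatial footprints look fine-grained enough.

    records: iterable of (name, lat_text, lon_text); coordinates are kept as
    strings so that the stated precision is not lost by float conversion.
    """
    kept = []
    for name, lat_text, lon_text in records:
        if name in coarse_names:
            continue  # explicitly listed large feature (country, state, ...)
        if min(decimal_places(lat_text), decimal_places(lon_text)) < min_places:
            continue  # low stated precision suggests a very large feature
        kept.append((name, float(lat_text), float(lon_text)))
    return kept


# Illustrative input; the coordinates are approximate and for demonstration only.
sample = [
    ("Alaska", "64", "-153"),
    ("Anchorage", "61.2181", "-149.9003"),
    ("Illinois", "40.0000", "-89.0000"),
]
result = filter_coarse(sample, min_places=2, coarse_names={"Illinois"})
```

Note that the precision heuristic alone would keep the "Illinois" record, because its coordinates are written with four decimal places; this is why the sketch combines it with an explicit list of large features.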
PROJECTIONS: YOU KNOW THE EARTH ISN’T FLAT, BUT DO YOUR TECHNIQUES AND METHODS?
It is our hope that most people reading this chapter are aware that the Earth is not flat. However, it is surprising how often this piece of common knowledge gets overlooked in the analysis of geographic data by researchers unfamiliar with traditional geography, cartography, and related fields. In order to use latitude and longitude points (or other types of spatial footprints encoded in latitude and longitude), it is essential to fully understand the implications of the shape of the Earth on geographic analyses, especially those done at a global/continental scale and/or those that require great precision. In order to represent the Earth’s surface on a flat plane (such as on a map or a regular grid), distortions must necessarily be introduced. For centuries, geographers, cartographers, mathematicians and others have examined ways to manage these distortions in order to optimize the functionality of planar Earth representations for specific tasks.
Projections are a vital component of these optimized representations. However, with the invention of GPS, geotagging, and Google Earth, centuries of expertise and knowledge have been unwittingly ignored as researchers and practitioners from many fields naively attempt geographical analysis, entranced by these new technologies. Careful attention to projections (and coordinate system issues in general) is a necessary step and unfortunately one that is often skipped. In the place of projection expertise has arisen a “knee-jerk” reaction: considering the Earth’s surface to be an accurate Cartesian coordinate system with longitudes as the x-coordinates and latitudes as the y-coordinates. A flat earth assumption is inherent to this approach. Geographers have long called this flat-Earth latitude and longitude “projection” the “unprojected projection”9 and have strongly cautioned against its use in analyses. Any introductory GIS textbook worth its salt will warn of the “serious problems that can occur” (Longley, Goodchild, Maguire, & Rhind, 2005a) when applying raw latitude and longitude coordinates in analyses. The important thing to remember for virtual communities research about the unprojected projection is that it does not preserve true area, scale, distance,
Figure 2. In the unprojected projection on the left, the latitude and longitude grid seems to set up “pixels” of identical area across the globe. However, it can be easily seen in an equal area projection like the Mollweide Projection (right), that this is not actually the case. Units of square lat/lon degree are much smaller near the poles than at the Equator because lines of longitude get closer and closer together as they approach the poles
or shape, particularly anywhere far from the equator (e.g. England, Germany, Canada, South Africa, etc.). As a result, most calculations one makes (such as average distance, density measures, etc.) using this projection are significantly distorted. The most obvious corollary is that researchers who report lengths, densities or areas in units per degree or units per square degree are failing to report findings in a consistent fashion. A degree or square degree has different meanings at different latitudes. As shown in Figure 2, this is due to the fact that the real-world length of a degree of longitude varies with latitude. At the equator, one degree of longitude is ~111km, but it is ~70km at 50° latitude and ~38km at 70° latitude. For reference, Berlin, Germany is at ~52°N latitude and Quito, Ecuador is approximately at the Equator (0°). As such, a square degree around Berlin is ~7,600 km² but ~12,300 km² around Quito. Similarly, research that reports distance results in latitude and longitude degrees is equally erroneous. Of course, all of these problems are at their most severe for global-scale research, but regional and local analyses will be affected as well if reasonable precision is required. Solving the projection problem for distance calculations is easier than for area-based calculations. Google’s Map API and others can calculate driving distance, which for some research problems is the preferred distance metric over straight-line Euclidean distance. For global research problems where local precision is not required, great circle distance is a computationally simple proxy for the minimum “as the crow flies” distance. Great circle distances, which differ extensively from Euclidean distances calculated from latitude and longitude coordinates in nearly all cases, are derived from the same “curved” paths flown by airplanes.
These paths (arcs of great circles) only look “curved” because of the projection on which they are often drawn; in fact, they are the shortest paths between two points on a sphere. An Internet search will reveal dozens of great circle straight-line distance calculators in many
different programming languages and forms10. Unfortunately, if local precision is required, the Earth-as-sphere assumption behind the great circle calculation becomes a problem, because the Earth is not quite spherical (see next subsection). Area calculations require transformation of the underlying latitude and longitude coordinates into true linear coordinates (meters, km, etc.)11 using an equal area projection. Equal area projections guarantee that “areas on the map are always in the same proportion to areas measured on the Earth’s surface” (Longley, et al., 2005a). This is in stark contrast to the unprojected projection, where an area A that appears larger on the map than an area B may actually be smaller in “real life”. All full desktop GIS software packages provide extensive projection technology. Those familiar with C can use the famous PROJ.412 software package, and Java programmers can take advantage of the excellent open-source GeoTools13 code library. GeoTools contains many of the operations of a professional GIS package, albeit only in Java code form. Finally, many statistical packages such as R and MatLab have spatial extensions that are capable of performing projections. As an important aside, the famous Mercator projection is also an example of a projection that is very much not equal area. The Mercator projection displays Greenland, for example, as being massively larger than Mexico, but in actuality, the two are approximately equal in area. This may shock anyone who uses Google Maps regularly, as it is encoded in the Mercator projection. Google apparently failed to consult cartographers, who long ago noted that the “use of the Mercator projection for world maps should be [repudiated] by authors and publishers for all purposes” (Boggs, 1947). 
Of course, performing area-based analyses on data in a Mercator projection (perhaps from data that used a screenshot of Google Maps as a base map) is as problematic as using data in unprojected (latitude and longitude) form. A more appropriate projection for the globe or local areas should be used.
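Two of the points made earlier in this section, that a degree of longitude shrinks with latitude and that great circle distance is a simple alternative to flat-Earth degree arithmetic, can be sketched in a few lines. The sketch assumes a perfectly spherical Earth with a mean radius of 6371 km (an assumption, and exactly the simplification the text warns against when local precision matters); the function names are hypothetical, and the haversine formula is used here as one common way to compute great circle distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


def km_per_degree_longitude(lat_deg):
    """East-west length of one degree of longitude at the given latitude."""
    return math.radians(1.0) * EARTH_RADIUS_KM * math.cos(math.radians(lat_deg))


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great circle distance between two lat/lon points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def naive_degree_distance(lat1, lon1, lat2, lon2):
    """Flat-Earth 'distance' in degrees, shown only to illustrate the pitfall."""
    return math.hypot(lat2 - lat1, lon2 - lon1)


# Two point pairs that are both 10 degrees of longitude apart.
at_equator = great_circle_km(0, 0, 0, 10)
at_60_north = great_circle_km(60, 0, 60, 10)
same_in_degrees = (naive_degree_distance(0, 0, 0, 10)
                   == naive_degree_distance(60, 0, 60, 10))
```

In degree units the two pairs look identical, yet the pair at 60°N is roughly half as far apart on the ground as the pair on the Equator. For area calculations, or where the spherical assumption is too crude, a proper projection library remains the appropriate tool.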
Figure 3. In this screenshot from Google Maps, Greenland appears as large as Canada due to the area distortions inherent to the Mercator projection. Had Google chosen an equal-area projection, Greenland’s area would have been accurately depicted as being approximately that of Mexico
Readers interested in gaining more expertise in projection-related issues (and the datum-related issues discussed below) have many options. The Geographer’s Craft14 is a well-reputed (albeit a bit long in the tooth) online resource. Introductory GIS textbooks should all have at least one chapter dedicated to coordinate systems. Finally, those who crave the mathematical nitty gritty can turn to John Snyder’s classic text on projections (Snyder, 1987).
Latitude and Longitude, According to Whom?
There is yet another major concern regarding the shape of the Earth that can have large effects on research projects that need local accuracy and precision. As noted above in the discussion about great circle distances, the Earth is not a true sphere.
In fact, it is not even a spheroid or ellipsoid, but has an irregular, constantly changing surface. However, for reasons of computational simplicity, the Earth’s shape is usually approximated in most GIS analyses with an ellipsoidal model called a datum. Latitude and longitude points are always derived on a datum, and each datum is optimized in certain parts of the world. A latitude and longitude coordinate means nothing without knowing the underlying ellipsoidal model on which it is based. In other words, a single latitude and longitude coordinate refers to different real-world locations in different datums. The reason readers should not panic after reading the preceding sentence is that most researchers working with online geographic data will encounter geographic information encoded in one of two datums. WGS84 (World Geodetic System 1984) is the default datum in most GPS
devices and web-based APIs, and therefore is the most common datum behind latitude and longitude coordinates. However, with the advent of Google Earth, a new datum has risen in popularity: the Google Earth datum. The Google Earth datum deviates from WGS84 due to a problem called (satellite) image misregistration. Goodchild (M. F. Goodchild, 2007) found that in Santa Barbara, California, this error causes positioning to be off by about 40 meters. Google Earth image misregistration also affects any geographic data layer made using Google Earth as a reference. Depending on what type of project the reader has in mind, the above two paragraphs should result in one of two reactions:
1. 40 meter error? Why do I care about 40 stinkin’ meters?
2. 40 meter error! That ruins my whole project!
The key difference between these two reactions is the required precision and accuracy of the research project, as well as the ratio of the number of data points likely to be affected to those likely not to be affected. A person seeking to count how many Flickr photos’ tagged latitude and longitude points lie within each country in the world will likely have the first reaction. Researchers who want to crowdsource gravestone database generation or landmine identification should be in the second camp. These researchers will also have to be extra careful about other coordinate system/projection issues (and other types of precision/accuracy concerns).
SPATIAL AUTOCORRELATION: IF YOU SMELL, IT’S LIKELY YOUR ROOMMATE WILL SMELL TOO

Statistics wonks in the readership may be familiar with temporal autocorrelation, or the tendency of observations made nearby in time to be correlated. Spatial data has an analogous property, albeit in more than one dimension. Spatial autocorrelation is so important to the study of geographic information that it is described in the so-called First Law of Geography [15]: “everything is related to everything else, but near things are more related than distant things” (Tobler, 1970). While it is well beyond the purview of this chapter to explain this phenomenon in detail (spatial statistics is the field that focuses on spatial autocorrelation issues), it is important that “geonovices” be aware of spatial autocorrelation. In particular, the virtual communities researcher should know that spatial autocorrelation can violate the standard assumption that regression error terms are independent and identically distributed (iid). According to de Smith and colleagues, “many (most) spatial datasets exhibit patterns of data and/or residuals in which neighboring areas have similar values (positive spatial autocorrelation) and hence violate the core assumptions of standard regression models” (de Smith, Goodchild, & Longley, 2009).

One approach to addressing spatial autocorrelation is to use Geographically Weighted Regression (GWR), which allows parameters in regression models to vary across space. Another is to implement a mixed regressive spatial autoregressive model, which explicitly incorporates an autoregressive component, or to apply a spatial error model. De Smith and colleagues (2009) provide an excellent overview of these methods and others, along with suggestions of tools that can be used to implement them. Their book is available online for free [16].
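Before reaching for any of these models, it can help to check whether a dataset is spatially autocorrelated at all. A widely used summary statistic for this is Moran's I, which is introduced here purely as an illustration; it is not one of the regression approaches listed above. The sketch below is a plain-Python version under simple assumptions: the four "regions" lie in a row, neighbors are defined by a hypothetical binary adjacency matrix, and the values are invented.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weights matrix.

    weights[i][j] > 0 means regions i and j are treated as neighbors.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, counted only for neighboring pairs.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Hypothetical adjacency for four regions in a row: 0-1, 1-2, 2-3 are neighbors.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [1, 2, 3, 4]    # similar values sit next to each other
alternating = [1, 4, 1, 4]  # neighbors are as different as possible

print(morans_i(clustered, w))    # positive: near things are more alike
print(morans_i(alternating, w))  # negative: near things are less alike
```

Values of I above zero point to positive spatial autocorrelation of the kind de Smith and colleagues describe, and values below zero point to the opposite pattern. In real work the weights matrix would come from the actual geography, and a dedicated spatial statistics package would also report significance.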
CASE STUDY: DETECTING SELF-FOCUS IN WIKIPEDIA

In order to ground our geographic information crash course in real virtual communities research, the remainder of this chapter is dedicated to a short case study based on the paper “Measuring Self-Focus Bias in Community-Maintained Knowledge
Repositories” (Hecht & Gergle, 2009). We will of course center our attention on the geographic analyses, especially with regard to how we handled many of the issues raised above.

The goal of our study was to examine diversity in knowledge representations across many different language editions of Wikipedia. In other words, is there a global consensus emerging as to the structure and content of world knowledge, or does each Wikipedia contain large amounts of unique information? And if the latter is the case, is this unique information random, or is it self-focused (i.e., centered on the particular interests and realities of speakers of each language)? These research questions were motivated by the implicit “global consensus of world knowledge” assumption in many areas of computer science-based virtual communities research (see, for example, Adar, Skinner, & Weld, 2009). Even Wikipedia’s co-founder Jimmy Wales seems to assume that there is one single “sum” of world knowledge in his famous quote about the Wikipedia project’s end goal: “Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That’s what we’re doing.”

We had many options in exploring this difficult research question. A typical virtual communities approach would have been to interview or survey Wikipedia authors from several different languages about the type of world knowledge they encode. However, this would be challenging given the need to deal with multi-lingual survey development, encoding and interpretation of the data, and numerous other challenges associated with global surveying. Moreover, Wikipedians are particularly averse to participating in surveys. Another approach would have involved choosing a small sample of articles in several different languages and examining their particular characteristics. Indeed, after the publication of our article, this was done between English and
Polish (Callahan & Herring, 2009), but such an approach is necessarily limited to the sample that is drawn, and large-scale or global patterns are more difficult to reveal. However, by using the geographic information embedded in many Wikipedia articles, we realized that we could reduce the amount of error-prone human labor, as well as drastically increase the number of languages and articles studied. We ended up examining data from around 8.9 million articles in 15 different Wikipedia language editions. Because hundreds of thousands of these articles are tagged with latitude and longitude coordinates, we could identify the location on the Earth at which these articles exist. We were able to use this information to answer questions such as “Do Russian-speakers tend to write more (relatively) about Russia than anyone else?” and “Do Finnish-speakers blab on and on about Finland relative to Spanish-speakers?” We formalized these inquiries in the geographical analyses that follow.

Before describing these analyses in detail, however, we must highlight an important subtext to the above discussion. One of the most underappreciated aspects of geographic information is that it can help researchers investigate non-geographic topics. This is particularly true in virtual communities research, where geographic information can provide a unique analytical lens to examine otherwise difficult or impossible questions. Our research question about the diversity of world knowledge representations was in no way explicitly geographic. However, through the use and analysis of geographic information, we were able to provide stronger evidence and expend fewer resources than with a non-geographic approach.
Geographic Data

As noted above, the location component of our geographic information (the x) was the latitude and longitude coordinates embedded by Wikipedia contributors into hundreds of thousands of
Wikipedia articles. Of course, the only articles in which a lat/lon tag makes sense are those that have a permanent and specific footprint on the surface of the Earth, which we call “explicitly geographic Wikipedia articles”. For instance, explicitly geographic articles include “University of Saskatchewan”, “Toronto”, and “Golden Gate Bridge”. Articles without lat/lon tags are those like “Stephen Colbert”, “Diet Coke”, and “iTunes”.

As noted above, Wikipedia’s latitude and longitude tags are a canonical example of the Geoweb Scale Problem (GSP). Latitude and longitude tags are inherently zero-dimensional, while some of the entities described in Wikipedia are quite extensive one- or two-dimensional (on a map) features. It is quite difficult to accurately describe Alaska with a single lat/lon point, but that does not stop Wikipedians from doing it. As such, we carefully chose our minimum scale of analysis to circumvent the GSP, a process that is described below and is repeatable in similar virtual community work.
Geographic Analyses

We used a combination of our open-source, Java-based WikAPIdia Wikipedia analysis software, which is optimized for geographic analysis, and ESRI’s ArcGIS software [17]. ArcGIS is the industry-standard GIS package, but it is a costly piece of software. Our study could also have been performed, albeit with greater effort, using other software, such as MATLAB or R (with their spatial extensions). GRASS GIS [18], the most popular open-source GIS software, would also have been an option, but GRASS is notoriously difficult to use. Finally, GeoTools (Java) was another option.

First, using WikAPIdia, we exported all latitude and longitude tags into the Shapefile file format, which is a GIS industry standard [19]. We created a separate shapefile for each of the 15 languages. Like all geographic information data formats, shapefiles allow the storage of both location (x) and attribute information (z). In our case, the x was the latitude and longitude pairs, and the z
was a measure of how much the article located at each pair was “being written about.” We found that one simple way to quantify the somewhat abstract idea of “being written about” is to use the indegree, or number of inlinks, of each article: when the author of a Wikipedia article a links to an explicitly geographic article b, the author must necessarily be writing something about the topic of b in article a. In the end, each of our 15 shapefiles contained a listing of lat/lon coordinates (x) for every explicitly geographic article in a language edition l, paired with the indegree in l (z) of each of those articles. We also included additional attributes (z), such as article title, in order to help us visually inspect the data.

It was then necessary to aggregate all this information into summary statistics for some set of spatial features that are comparable across all languages. Articles themselves are not comparable because the vast majority of explicitly geographic articles do not exist in all 15 languages. The first concern in our aggregation was to choose a unit that was appropriate given the GSP. This meant that we had to choose first-order administrative districts (states, provinces, etc.) or larger, due to the Alaska problem mentioned above. Had we chosen a smaller unit, counties for example, the article for the state of Alaska would be considered to be within the county [20] that the lat/lon tag for Alaska happens to fall within. In the end, we performed our analyses at two scales: first-order administrative district-scale and country-scale. Similarly, but less obviously, had we decided to use a grid of geographic pixels [21] (a common choice for researchers new to geographic information), pixels smaller than the state of Alaska would fail to solve the GSP. In general, where possible, it is best to use real spatial units that have inherent semantic meaning to the research question (e.g., states, counties, countries) rather than pixels.
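The aggregation step described above can be sketched in a few lines of Python. This is a toy illustration and not the WikAPIdia or ArcGIS workflow used in the study: the ray-casting test treats latitude and longitude as flat x/y coordinates, ignores polygon holes and multi-part regions, and the two square "regions" and four "articles" are invented. All function and variable names are hypothetical.

```python
from collections import defaultdict


def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the polygon `ring`.

    `ring` is a list of (lon, lat) vertices; the last vertex is implicitly
    connected back to the first.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def indegree_sums(articles, regions):
    """Sum article indegrees per region; articles outside every region are skipped."""
    totals = defaultdict(int)
    for _title, lon, lat, indegree in articles:
        for name, ring in regions.items():
            if point_in_polygon(lon, lat, ring):
                totals[name] += indegree
                break  # assumes regions do not overlap
    return dict(totals)


# Hypothetical toy data: two square regions and four geotagged articles,
# each given as (title, lon, lat, indegree).
regions = {
    "Region A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "Region B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
articles = [
    ("Article 1", 2.0, 3.0, 5),
    ("Article 2", 8.0, 8.0, 7),
    ("Article 3", 15.0, 5.0, 4),
    ("Article 4", 30.0, 30.0, 9),  # falls outside both regions
]

sums = indegree_sums(articles, regions)
print(sums)
```

Each article contributes its indegree to exactly one region, which is why the choice of region size matters: a single point standing in for a large feature such as Alaska is only assigned sensibly when the regions are at least as large as the feature.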
The aggregation itself can be done using the Point-In-Polygon (PIP) or spatial join algorithms in any of the GIS or GIS-capable software packages mentioned above and geospatial data that is usually available in ESRI’s
Figure 4. A choropleth map showing self-focus in the Russian Wikipedia through “indegree sums”, which indicate how many articles in a Wikipedia link to articles about places in a geographic region. We were careful to use a proper data classification strategy in this cartographic product
Shapefile or Google’s KML file format (from stakeholder websites [22] or via a web search).

Once we executed the aggregation, we were able to perform both statistical and visual analyses of the results. We will leave the rather detailed statistical analyses to readers who download the paper, but the visual reporting both illustrates the power of geographical analyses and presents an opportunity to briefly touch upon appropriate cartographic techniques for reporting these types of results. Figure 4 shows the rather extreme nature of our results: Russia is the destination of the most links in the Russian Wikipedia (by far). This pattern was repeated across nearly all 15 languages.

In order to truthfully convey the results of our study in map form (Figure 4 appeared in our paper), we made absolutely sure that our data classification strategy accurately represented our findings. A cartographic novice or an expert manipulator could easily exploit the map’s legend to naively or unscrupulously alter the reader’s impression of the data, especially given the lesser-known units of “inlinks”. It is also possible through naïveté to produce maps that are simply very difficult for the
reader to interpret. Before producing a choropleth (i.e., colored-polygon) map, it is important that the researcher be familiar with the standard methods of data classification (e.g., quantile, natural breaks). Many websites [23] provide good tutorials on this topic. However, consulting a GIS or cartography textbook (e.g., Slocum, McMaster, Kessler, & Howard, 2009) or reading the entertaining “How to Lie with Maps” (Monmonier, 1996) is of course a more complete solution.

Hopefully, through this case study the reader has gained a greater understanding of how geography can enable exciting virtual communities research. Readers should also be able to repeat many of the steps above in their own work.
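To make the point about data classification concrete, the following Python sketch compares two simple schemes, equal-interval and quantile breaks, on the same skewed data. It is a simplified illustration: the nine "indegree sums" are invented, the helper names are hypothetical, and the quantile rule shown is just one of several conventions (it assumes there are at least as many values as classes). Natural breaks and other methods are not covered.

```python
from bisect import bisect_left


def equal_interval_breaks(values, k):
    """Upper class limits that split the data range into k equal-width classes."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def quantile_breaks(values, k):
    """Upper class limits that put roughly the same number of values in each class."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k - 1] for i in range(1, k)]


def class_counts(values, breaks):
    """Count how many values fall into each class defined by the upper limits."""
    counts = [0] * (len(breaks) + 1)
    for v in values:
        # A value equal to a break belongs to the class that break closes.
        counts[bisect_left(breaks, v)] += 1
    return counts


# Hypothetical, heavily skewed "indegree sums" for nine regions.
sums = [1, 2, 3, 4, 5, 6, 7, 8, 1000]

ei_counts = class_counts(sums, equal_interval_breaks(sums, 3))
q_counts = class_counts(sums, quantile_breaks(sums, 3))
print("equal interval:", ei_counts)
print("quantile:", q_counts)
```

With this toy data, equal intervals put eight of the nine regions in the lowest class, while quantiles spread them evenly across the three classes. The same numbers would therefore produce two very different-looking choropleth maps, which is why the classification method deserves a deliberate choice and a clear legend.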
NEXT STEPS: WHERE TO GO FROM HERE

In this chapter, we have covered what we believe to be the minimal information required to begin examining virtual communities with a geographic lens. However, this chapter is by no means a replacement for a solid GIS course series. The
majority of major universities (and many community colleges) will have at least one GIS course available. There are also online courses offered by universities such as Pennsylvania State University [24], which is well known in GIScience circles, and by GIS software companies [25]. Finally, a growing number of universities, including Harvard, UC Berkeley, and UC Santa Barbara, offer geographic analysis consultation centers in the vein of academic statistics consulting.
ACKNOWLEDGMENT

We wish to offer special thanks to Dr. Martin Raubal (UC Santa Barbara, Geography), Dr. Emilee Rader (Northwestern University, Technology and Social Behavior), and Dr. Holly Barcus (Macalester College, Geography) for their invaluable comments and suggestions. We also thank the anonymous reviewers of this chapter for their feedback.
REFERENCES

Adar, E., Skinner, M., & Weld, D. S. (2009). Information Arbitrage Across Multi-lingual Wikipedia. Paper presented at the WSDM ’09: Second ACM International Conference on Web Search and Data Mining.

Boggs, S. W. (1947). Cartohypnosis. The Scientific Monthly, 64(6), 469–476.

Burrough, P. A., & McDonnell, R. A. (1998). Principles of Geographical Information Systems. New York: Oxford University Press.

Callahan, E., & Herring, S. C. (2009). Cultural Bias in Wikipedia Content on Famous Persons. Paper presented at the AoIR 10.0: Internet Research 10.0.

de Smith, M. J., Goodchild, M. F., & Longley, P. A. (2009). Data Exploration and Spatial Statistics. In Geospatial Analysis (3rd ed.). Leicester, UK: Matador.

Goodchild, M. (2001). A Geographer Looks at Spatial Information Theory. Paper presented at the COSIT ’01: Conference on Spatial Information Theory 2001.

Goodchild, M., & Hill, L. L. (2008). Introduction to Digital Gazetteer Research. International Journal of Geographical Information Science, 22(10), 1039–1044. doi:10.1080/13658810701850497

Goodchild, M., Yuan, M., & Cova, T. J. (2007). Towards a general theory of geographic representation in GIS. International Journal of Geographical Information Science, 21(3), 239–260. doi:10.1080/13658810600965271

Goodchild, M. F. (2007). Citizens as Sensors: The World of Volunteered Geography. GeoJournal, 69(4), 211–221. doi:10.1007/s10708-007-9111-y

Hecht, B., & Gergle, D. (2009). Measuring Self-Focus Bias in Community-Maintained Knowledge Repositories. Paper presented at the Communities and Technologies 2009: Fourth International Conference on Communities and Technologies, University Park, PA, USA.

Hecht, B., & Gergle, D. (2010). On The “Localness” Of User-Generated Content. Paper presented at the CSCW 2010: 2010 ACM Conference on Computer Supported Cooperative Work.

Hecht, B., & Moxley, E. (2009). Terabytes of Tobler: Evaluating the First Law in a Massive, Domain-Neutral Representation of World Knowledge. Paper presented at the COSIT ’09: 9th International Conference on Spatial Information Theory.

Hill, L. L. (2000). Core Elements of Digital Gazetteers: Placenames, Categories, and Footprints. In Research and Advanced Technology for Digital Libraries. Berlin / Heidelberg, Germany: Springer.

Larsen, J., Axhausen, K., & Urry, J. (2006). Geographies of Social Networks: Meetings, Travel, and Communications. Mobilities, 1(2), 261–283. doi:10.1080/17450100600726654

Larsen, J., Urry, J., & Axhausen, K. (2006). Mobilities, Networks, Geographies. Aldershot, England: Ashgate.

Lieberman, M. D., & Lin, J. (2009). You are where you edit: Locating Wikipedia users through edit histories. Paper presented at the ICWSM ’09: 3rd International Conference on Weblogs and Social Media.

Longley, P., Goodchild, M., Maguire, D., & Rhind, D. (2005a). Georeferencing. In Geographic Information Systems and Science (2nd ed.).

Longley, P., Goodchild, M., Maguire, D., & Rhind, D. (2005b). Introduction. In Geographic Information Systems and Science (2nd ed., pp. 4–33). West Sussex, England: John Wiley & Sons, Ltd.

Monmonier, M. (1996). How to Lie with Maps (2nd ed.). Chicago, IL: University of Chicago Press.

Pasley, R., Clough, P., Purves, R. S., & Twaroch, F. A. (2008). Mapping geographic coverage of the web. Paper presented at the ACMGIS ’08: 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems.

Priedhorsky, R., & Terveen, L. G. (2008). The computational geowiki: what, why, and how. Paper presented at the CSCW ’08: The 2008 ACM Conference on Computer Supported Cooperative Work.

Pultar, E., & Raubal, M. (2009). Progressive Tourism: Integrating Social, Transportation, and Data Networks. In Sharda, N. (Ed.), Tourism Informatics: Visual Travel Recommender Systems, Social Communities, and User Interface Design. Hershey, PA: IGI Global.

Slocum, T. A., McMaster, R. B., Kessler, F. C., & Howard, H. H. (2009). Thematic Cartography and Geovisualization (3rd ed.). Upper Saddle River, NJ: Prentice Hall.

Snyder, J. P. (1987). Map Projections - A Working Manual. Washington, D.C.: U.S. Geological Survey.

Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46, 234–240. doi:10.2307/143141

Yahoo! Developer Network (2009). Yahoo! Placemaker Beta. Retrieved July 20, 2009, from http://developer.yahoo.com/geo/placemaker/

Yanai, K., Yaegashi, K., & Qiu, B. (2009). Detecting Cultural Differences using Consumer-Generated Geotagged Photos. Paper presented at the LocWeb ’09: Second International Workshop on Location and the Web in Conjunction with CHI 2009.
ENDNOTES

1. An important topic in cutting-edge GIScience research is the inclusion of the temporal dimension, so x now usually refers to a location in space-time, not just space.
2. http://code.google.com/apis/maps/documentation/services.html#Geocoding
3. http://msdn.microsoft.com/en-us/library/cc981067.aspx
4. http://developer.yahoo.com/maps/rest/V1/geocode.html
5. http://www.mapquest.com/features/developer_tools_oapi_quickstart
6. http://www.maxmind.com
7. http://ondemand.metacarta.com/?method=GeoTagger
8. http://developer.yahoo.com/geo/placemaker/
9. Another common name is a “Geographic Coordinate System”, as opposed to a “Projected Coordinate System”.
10. http://www.nhc.noaa.gov/gccalc.shtml and http://www.chemical-ecology.net/java/lat-long.htm both offer easy-to-use manual circle distance calculators.
11. The job of all projections (not just equal area) is converting the “angular” coordinates of latitude and longitude into “linear” coordinates with units like meter, nautical mile, kilometer, etc.
12. http://trac.osgeo.org/proj/
13. http://geotools.codehaus.org/
14. http://www.colorado.edu/geography/gcraft/notes/coordsys/coordsys_f.html
15. While it’s called a Law, geography and GIScience researchers agree that it is more of a guideline or rule-of-thumb.
16. http://www.spatialanalysisonline.com/
17. http://www.esri.com/
18. http://grass.itc.it/
19. Here we used GeoTools’ Input/Output packages.
20. Geography trivia sticklers in the readership will note that counties are called “boroughs” in Alaska.
21. The geographic pixels methodology refers to dividing up the geographic study area into arbitrarily-sized square area units (e.g., 10km-by-10km).
22. The U.S. Census (http://factfinder.census.gov) and/or Statistics Canada (http://www.statcan.gc.ca/) are good places to start looking.
23. Statistics Canada provides an excellent overview at: http://atlas.gc.ca/sitefrancais/english/learningresources/carto_corner/map_content_carto_symbology.html
24. http://www.worldcampus.psu.edu/GISCertificate.shtml
25. http://www.gis.com/education/online.html. These educational opportunities are provided by ESRI, which sells the famous, powerful, and rather expensive ArcGIS software.
Chapter 20
A Theoretical Method of Measuring Virtual Community Health and the Health of their Operating Environment in a Business Setting

Brent Robertson
Sancor, Canada
ABSTRACT

This chapter discusses how virtual communities are associated with business and describes how the communities support the overall business effort. The chapter then examines the ways that the execution of certain business processes – such as the ‘lessons learned process’ – can have a strong supporting role in maintaining the health of virtual communities. Quantitatively measuring key aspects of these business processes provides a strong indication of the health of virtual communities that are linked to the process. The chapter introduces a measurement by objectives system, describes how it can be used to assess the health of virtual communities and how this can be extrapolated to assess the supportive nature of the overall business environment the communities are operating in.
DOI: 10.4018/978-1-60960-040-2.ch020

INTRODUCTION

Virtual communities exist within the business community and within businesses themselves. They are a key part of the knowledge transfer system within a business. Companies invest money in supporting the virtual communities, not just the internal ones but external ones as well. Companies
tend to invest only in things which bring a net return, so their investment in these virtual communities suggests that many companies believe there is value in them and that they will have a net positive return on their investment. Because there is perceived value in business-based virtual communities, there needs to be a way to measure their health and provide those making the investment with information related to the health of their communities.
The business environment in which a virtual community operates will impact its overall health. If the environment is supportive of virtual communities, they will have a better chance to succeed than if the environment is toxic. Since the investment is made, it is reasonable to measure the health of a virtual community’s business environment to ensure it is supportive.

There are several business processes that can benefit from developing linkages with virtual communities. Benefits accrue to both the process and the community. Processes become more robust and the communities have a continual stream of relevant discussion topics. As opposed to attempting to directly measure the health of a virtual community, indirect measures based on the performance of key elements of linked business processes may be established. A modified version of the Productivity Measurement by Objectives methodology (Felix and Riggs, 1983) provides a sound method of measuring the interface activity between any business process and the virtual communities that support it. The quantitative output values support clear presentation of virtual community health to those with an interest. Aggregating the output values across various groupings of communities provides a relative measure of the health of the environment they operate in.

The goal of the chapter is to present a measurement method that gives those with an interest an understanding of the health of business-based virtual communities and the overall health of the environment they operate in. The measurement method is presented with a template that can be adapted for use by practitioners.
BACKGROUND

Rheingold (1993) defined virtual communities as “social aggregations that emerge from the Net when enough people carry on those public discussions long enough, with sufficient human feeling, to form webs of personal relationships in cyberspace” (Chapter 1). Wenger, McDermott, and Snyder (2002) define a Community of Practice as “groups of people who share a concern, a set of problems, or a passion about a topic, and who deepen their knowledge and expertise in this area by interacting on an ongoing basis” (p. 4). Distributed Communities of Practice (DCOPs) (Daniel, Sarkar, and O’Brien, 2006) are communities of practice that are wholly supported by virtual means. A question to be addressed is whether DCOPs in the workplace are a subset of virtual communities. Simply because web-based technologies are used by businesses does not mean that businesses have created virtual communities. Rheingold’s definition weaves an emotional component with a technology component, forcing the question ‘does utilizing web-based technology in a business setting constitute a virtual community?’

From a technological perspective, we are all aware that businesses have begun to utilize web-based technology as a means of enhancing business performance. Technological components of DCOPs that businesses are using include email, online forums, discussion areas, websites, and libraries, most of which are routinely used by employees in small and large business enterprises for worldwide communication. In any business-related internet and intranet setting, anonymity is difficult, and the use of actual names and, often, additional contact information is expected. Because names are used and there are often professional settings where ‘names’ meet, communication does form a sense of community.

In small to medium-sized businesses, the most common means of electronic communication is email. Other forms of communication are typically web-based and involve targeted web and discussion sites which provide support for employees in the business enterprise.
Professional and other associations have discussion sites where business-related issues are posted and addressed, web sites where current news is posted, and directories
so that specific individuals can be located for discussions.

In larger businesses, intranet systems exist. These intranets provide a secure internal network where key elements of the internet may be mimicked and accessed by employees only. The focus of an intranet is supporting the business through communication and file storage. Intranets support email, discussion sites, internal websites, classified advertisements, and, more recently, instant messaging software. Typically, business intranets also have a portal to the external World Wide Web (the internet) to support employees who use the internet to capitalize on public domain information and applications. From a technical perspective, DCOPs do meet the criteria established to be considered a virtual community.

Looking at the emotional component of the virtual community classification, employees are emotionally involved in their employing business’s overall success. If the business is not successful, the employees will experience a threat to their income. If you threaten a person’s income, you are posing a threat to their basic physiological needs (Hersey, Blanchard, and Johnson, 2001). There is no question that this results in strong emotion; just look at the picket line of any labour strike for verification. As well, many business activities support other employee needs such as social, esteem, and self-actualization needs (Maslow, 1987). These factors and their influence on all aspects of human needs generate very strong emotions in employees; therefore, the emotional component of Rheingold’s definition is satisfied.

Looking at Rheingold’s publication, it becomes apparent that the social aggregations are all focused on specific topics and that those who participate may, over time, feel a sense of community and emotional bond. This is aligned with business communications.
Those who participate in business groups are typically focused on their area of contribution to the overall success of the business – a topic area with special significance to
them. The majority of people are passionate about their work, something demonstrated clearly in Deming’s ‘Red Bead Experiment’, where even though individuals know on an intellectual level that they cannot succeed, they passionately wish to (Deming, 1994). Participants in work communications also develop a strong emotional bond with other employees on two levels: first because they are colleagues with a common goal of business success, and second because work is such an intense part of their lives. As a result of the emotion involved, the lack of anonymity, and the public nature of many of the discussions, employees who participate in internet and intranet DCOP communications as part of their work are participating in virtual communities. Although there are many forms of electronic communications, this chapter will focus on discussion groups.
MEASURING THE HEALTH OF BUSINESS-BASED VIRTUAL COMMUNITIES

Virtual Communities within a Business

Business-based virtual communities have a group of regular participants who interact on various aspects of the community focus and limit themselves to that focus while participating in the community. In many cases, they are DCOPs with a specialized focus on their particular area of interest (e.g., metallurgy, accounting, safety). In some cases, virtual communities are broader and less defined in interest and focus than a DCOP. Some virtual communities look at how to integrate and leverage the knowledge generated by the DCOPs and their members to benefit the overall organization.

Virtual communities can exist either internally within a business’s intranet or externally on the internet. Virtual communities that are internal maintain the specialized focus but only have employees of one business enterprise as members. External business-oriented virtual communities have membership from multiple businesses and are typically sponsored by professional associations or vendors. Both internal and external virtual communities are extremely valuable to a business.

Internal business-based virtual communities operate within an environment which impacts the health of the community. The environment is the business culture (Deal and Kennedy, 1982). Logically, there is a strong correlation between the environment and the health of virtual communities. If time is not provided for virtual community use or if those who use them are looked upon unfavorably within a business or business department, the virtual community will struggle. If participation in an internal virtual community is supported and managerially recognized, the community will have fewer barriers to ongoing health. When looking at business-based virtual communities, measuring the health of the virtual community environment is as important as measuring the health of the community itself.
Virtual Community Issues

Key issues within virtual communities are enforcing the regulations about discussion items and maintaining community participation levels. It is common to have a moderator established to monitor discussions, ensure off-topic postings are curtailed, and ensure a level of respect and decorum is maintained. That said, generating interest in the group and enlarging its membership is typically beyond the moderator’s control. Even in a strong business environment, virtual communities that have little to discuss will cease to exist. Business processes that support discussion and drive increased membership would support the virtual community’s longevity and vigor. Recognizing that these supporting business processes exist and taking steps to integrate them into the world of virtual communities would be beneficial to both.
Business Processes and Silos
All businesses have processes; they are simply the steps that are taken to execute the work of the enterprise. For example, the simplified process of a retail transaction is:
1. Customer selects merchandise
2. Cashier Associate determines price and enters it into the register
3. Payment is made (steps vary with type of payment: cash, credit, etc.)
4. Cashier wraps or bags item
5. Cashier passes receipt and bagged item to customer
All processes have a trigger and an endpoint, the trigger in this case being a customer selecting merchandise for purchase and the endpoint being the customer holding the receipt and the bagged merchandise. Processes exist for most key elements in a business. There are marketing processes for placing an advertisement in a publication, accounting and banking processes, and work execution processes. Every process is a series of steps that is repeated each time the trigger occurs. The goal of a process is to achieve the same outcome every time the process is executed (Deming, 1994). The terms Quality and Continual Improvement (sometimes known as Operational Excellence) are based on writing down the steps in a business process and following them. Quality comes from following the steps. Continual improvement comes when you identify an improvement to one or all of the steps and update the written process description and activities to reflect it.
Silos, or functional units in a matrix organization, are the organizational specialties within a business that have multiple personnel (Project Management Institute, 2008). Marketing would be a silo, as would accounting, engineering, and risk management. Silos are staffed with specialized individuals, each having a similar focus and
typically with similar training and varying levels of experience (e.g., accounting). Everyone in a silo has the common goal of executing the work in their scope in a way that supports the overall business. It is common for processes within a large business to cross artificial work boundaries. These cross-silo processes originate in one silo and generate work in other silos, sometimes in sequence and sometimes in parallel. An example of this is when engineers design a new piece of equipment. Someone identifies the need for the equipment, the design is developed and drawings and specifications are created, a fabrication shop is selected to build the equipment, the shop receives authorization to proceed, inspectors monitor construction of the equipment, payment installments are issued, construction is completed, the equipment is shipped and received at the work site, and it is installed and commissioned. This process involves multiple silos and even an external fabrication shop.
Virtual Communities and Cross-Silo Processes
In large businesses, certain business processes can be used to enhance virtual community activity. Knowledge-focused cross-silo business processes are not specific to a single virtual community topic area. They are focused on the management of corporate knowledge. An example of a business process that supports virtual communities is the lessons learned process (Weber, Aha, and Becerra-Fernandez, 2001; Cristal and Reis, 2006). Although there are numerous other processes that support virtual communities (examples include the non-conformance process, the continual improvement process, and the risk management process), this chapter will focus on examples involving the lessons learned process. The lessons learned process is common among large businesses and it cuts across all organizational silos. There is no portion of a business where lessons are not learned. The process goal is the capture, storage, and dissemination of corporate lessons that were
learned during the execution of their business activities. The lessons can be about any aspect of the business. Lessons learned typically describe an issue, provide some background, describe how it was resolved, and (sometimes) discuss the effectiveness of the resolution. Another type of lesson describes steps that led to something working exceptionally well. In both cases, the goal is to reduce company waste by providing the knowledge for others in the business to act on in similar situations. Lessons are collected using various methodologies including random submissions, focused meetings, and, in some cases, project gate reviews (Project Management Institute, 2008). Once a lesson is submitted, the lesson must be reviewed and verified. The verification portion of the process is key – without verification, there could be erroneous or outright wrong information placed in the database which would create problems (including discrediting the other data stored in the database). Verification involves getting independent review of the submitted lesson to ensure it is correct and well presented. Verification is improved when there are numerous reviewers, some that were directly involved and some that were not involved but are checking to ensure the lesson contents make sense. All reviewers must have a reasonable background in the subject matter.
Integration of Processes and Virtual Communities
Like any community, virtual communities need a reason to exist. There is always more work to do than there is time to do it, so people need a valid reason to spend time in a virtual community. Participants would like to experience reward through the ongoing intellectual and emotional stimulation the subject area provides as well as having a sense that their thoughts and opinions are supporting the business. By integrating the verification portion of the lessons learned process with internal virtual
communities, the communities benefit from a continual stream of discussion topics. The lessons learned process benefits by having a strong cadre of appropriately qualified reviewers. Because the outcome of the review will have a lasting impact on the company and individual reviewers’ names may be attached, interest levels will be high and emotions will be involved. There may even be aspects of a lesson which are reviewed by external virtual communities, providing additional spin-off benefits. Note that the methodology used for integrating the lessons learned process with a virtual community would be the same when integrating other business processes with a virtual community.
When lessons that fall within a community’s area of focus are submitted for discussion, some of the individuals that are part of the virtual community become engaged. As part of their role in the business enterprise, they are interested in capturing the knowledge presented and providing the best possible insight for each lesson. Community members can reasonably be expected to review the lesson, discuss it, revise it to ensure clarity, and provide a reasonable critique of any suggested actions. It is anticipated that the initial lesson submitter would participate in the discussions and provide additional information as needed. When the virtual community has reviewed the lesson, the initial submission can be updated with the new information (one of the functions of the person managing the lessons learned process, the Lessons Learned Coordinator).
In order to ensure integration between the lessons learned process and virtual communities, a formal step of communicating to the virtual community must be placed in the process’s diagram or supporting procedure. Communicating to the virtual community requires that the Lessons Learned Coordinator select the appropriate virtual community or communities, post the draft lesson in the community, and ensure that community members are notified. The Lessons Learned Coordinator would then revisit the community a few weeks after the lesson was posted, review the comments, and integrate the comments into the final lesson learned text. The more members who provide feedback related to an item, the more confidence the Lessons Learned Coordinator can have in the result. For example, if a lesson is forwarded to a virtual community for validation and only one member comments, the confidence the Lessons Learned Coordinator has in the submission would not be high. However, if ten people comment and agree with the submission, confidence that the lesson is valid increases.
Measuring to Determine Virtual Community Health
Business processes can be measured to determine how well they are performing. When a process step is formalized, expectations related to time and quality are documented. Having documented expectations provides a benchmark against which elements of the process can be measured. Once integration of the virtual community into the business process is formalized, the community’s performance against expectations can be measured. Although this may not be valid for non-business virtual communities, it is certainly valid in a business virtual community.
Recognizing that there are several techniques for measuring business process performance, it is suggested that using a “Measurement By Objectives” (MBO) system would provide meaningful measures that reflect the health of virtual communities. The author has successfully used MBO to measure administrative processes in a business setting and is confident that its application in this setting is appropriate. MBO measurement systems output quantitative data. Quantitative data is superior to qualitative data in that it removes or downplays the subjectivity associated with qualitative measures and permits relative comparison between communities. It also reduces the un-calibrated judgment that is frequently encountered when
individual leaders are asked to assign values for their organizational health. MBO utilizes a matrix structure that is flexible, can be modified to suit specific needs, and can be calibrated once the needs are identified. The matrix itself is easy to use once it is understood and – because virtual communities are all computer based – the matrix can be fully automated so that measures are provided with the click of a mouse. The key to successfully using the MBO format is establishing the right elements to measure as well as the benchmark and measurement scale. A significant measure of virtual community health is the number of responses per lesson learned submitted. If a community is healthy, its members will be actively participating in community discussions. When a potential lesson learned is posted to a virtual community, a healthy community will discuss, evaluate, and either validate, modify, or reject the lesson. In a formal process, that becomes their defined role. A community that is not healthy will have very few if any comments. Recognizing that some items will be very controversial and will generate much discussion while other items will either be accepted or rejected outright, there is still an opportunity to measure the overall trend within the community. In addition to measuring the number of responses per lesson, the time elapsed between submission and response and the quality of each response should be measured. A weighting factor is part of the matrix, and it is used to adjust the importance associated with each measure.
MBO Matrix Operation
By definition, virtual communities are electronic in nature. This supports having the MBO matrix fully automated, eliminating the need for manual calculation of community scores. Even in situations where manual calculations are required, experience has proven it takes less than a couple of hours per month in most instances. Having
automated scoring also supports the assessment of the virtual community environment. A suggested MBO matrix format for determining the health of virtual communities is presented in Table 1. When reviewing the matrix, please keep in mind that its application for measuring knowledge processes is still theoretical and calibration is required. Also note that it would have to be further calibrated for each business-oriented virtual community it was applied to. To develop a better understanding of how the matrix works, we will work through a lesson learned validation. A potential lesson is received by a Lessons Learned Coordinator. In order to validate the lesson, it is posted in a virtual community for comment. Comments are collected by the Lessons Learned Coordinator and the potential lesson is validated, modified, or rejected.
The Matrix Score Would be Calculated for Each Lesson Posted to the Virtual Community
The first component of the overall submission score is developed by dividing the number of members that responded (a quick count) by the number of members in the group. The calculated percentage is written in the ‘Measured Value’ cell below the ‘% Of Members Responding’ column heading. The calculated percentage is then compared to the scoring ranges directly below the measured value cell. When the scoring range that the measured value falls within is located, the score value in the left hand column is awarded (enter it into the ‘Actual Score’ cell beneath the ‘% Of Members Responding’ column). The actual score is then multiplied by the weighting factor below it and the result recorded in the ‘Weighted Score’ cell.
The average time between the submission being placed in the virtual community and the responses is then calculated. Using a methodology similar to what was done for ‘% of Members Responding’, the actual score and weighted score are calculated and entered.
As part of their role, the Lessons Learned Coordinator is responsible for reviewing each of the responses to the initial submission. The Lessons Learned Coordinator is required to make a subjective assessment of the responses for clarity and choose one of the potential values under the ‘Subjective Evaluation of Quality of Responses’ column. The corresponding score value in the left hand column is then entered into the ‘Actual Score’ row and the weighted score is calculated and entered. The three weighted scores are then added together and the overall submission score is entered in the ‘Overall Submission Score’ box.
The scoring process is repeated for each submission issued to the virtual community. The results are averaged and the average, or ‘Aggregate Score’, is reported to community and business leaders within the established measurement time period (monthly, quarterly, biannually, etc.).
Table 1. Suggested MBO matrix for determining the health of virtual communities
Assessed Element | % Of Members Responding | Average Time Elapsed Between Submission and Response | Subjective Evaluation of Quality of Responses
Score Value | (Measured Value) | (Measured Value) | (Assessed Value)
High Performance Level (Award 10 Points) | >30% | 1 day | Exceptional
Score of 9 | 27% - 30% | 2 days |
Score of 8 | 23% - 26% | 3 days |
Score of 7 | 20% - 22% | 5 days | Very Clear
Score of 6 | 17% - 19% | 7 days |
Score of 5 | 14% - 16% | 9 days |
Score of 4 | 11% - 13% | 11 days |
Score of 3 | 8% - 10% | 15 days |
Score of 2 | 5% - 7% | 20 days |
Very Low Performance Level (Score of 1) | <5% | >20 days |
Actual Score | (Actual Score) | (Actual Score) | (Actual Score)
Weighting Factor | 40% | 40% | 20%
Weighted Score | (Weighted Score) | (Weighted Score) | (Weighted Score)
Overall Submission Score |
Note: the quality column also includes the levels Acceptable and Significant Revision Required.
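
The scoring steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's tooling: the function names are hypothetical, the ranges and weights are read from Table 1, and two assumptions are made that the chapter does not state. Percentages falling between the printed ranges (for example 26.5%) are scored by treating each range's lower bound as a threshold, and the day values are treated as upper limits. The quality score is passed in directly as a number from 1 to 10 chosen by the Lessons Learned Coordinator.

```python
# Illustrative sketch of the Table 1 scoring; names and boundary handling
# are assumptions of this example, not part of the chapter.

# (lower bound in percent, score) for scores 9..2; above 30% scores 10.
PERCENT_LOWER_BOUNDS = [(27, 9), (23, 8), (20, 7), (17, 6),
                        (14, 5), (11, 4), (8, 3), (5, 2)]
# (upper bound in days, score) for scores 10..2; beyond 20 days scores 1.
DAYS_UPPER_BOUNDS = [(1, 10), (2, 9), (3, 8), (5, 7), (7, 6),
                     (9, 5), (11, 4), (15, 3), (20, 2)]
# Weighting factors from Table 1 (40%, 40%, 20%).
WEIGHTS = {"response": 0.4, "time": 0.4, "quality": 0.2}


def response_score(responders, members):
    """Score the percentage of members who responded to a lesson."""
    if members <= 0:
        raise ValueError("community must have at least one member")
    percent = 100.0 * responders / members
    if percent > 30:
        return 10
    for lower, score in PERCENT_LOWER_BOUNDS:
        if percent >= lower:
            return score
    return 1


def time_score(average_days):
    """Score the average time between submission and responses."""
    for upper, score in DAYS_UPPER_BOUNDS:
        if average_days <= upper:
            return score
    return 1


def submission_score(responders, members, average_days, quality_score):
    """Add the three weighted scores into the overall submission score."""
    return (WEIGHTS["response"] * response_score(responders, members)
            + WEIGHTS["time"] * time_score(average_days)
            + WEIGHTS["quality"] * quality_score)


def aggregate_score(submission_scores):
    """Average the submission scores for one measurement period."""
    return sum(submission_scores) / len(submission_scores)


# Example: 12 of 50 members respond (24%), in 4 days on average,
# and the coordinator assesses the responses as Very Clear (7).
print(round(submission_score(12, 50, 4, 7), 2))
```

Keeping the ranges and weights in plain lists and a dictionary mirrors the calibration step: a community can adjust them without touching the scoring logic.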
Matrix Calibration
It is likely that each community will have its own matrix calibrated to support its unique characteristics. Calibration includes subjectively adjusting the weighting factors, the scoring ranges, and the elements assessed. The matrix should be calibrated so that realistic community measures of strong, average, and struggling are output. Reasons for needing specific calibrations for each community include community size, member workload, and characteristics of the members. Using the MBO matrix shown in Table 1:
• A strong community would be one having an aggregate score of over 7,
• An average community would be one having an aggregate score between 5 and 7,
• A struggling community would be one having an aggregate score less than 5.
Standardizing calibration to reflect these categories would provide useful information to those leading the virtual communities and those responsible for funding them. With a robust and calibrated measurement system, virtual communities that are struggling can be quickly identified. Once identified, business leaders can work to determine the issues and take appropriate measures to restore the health of the community.
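
As a minimal sketch of these categories, assuming the Table 1 calibration (the function name is hypothetical, and scores of exactly 5 or exactly 7 are taken to be average):

```python
def classify_community(aggregate_score):
    """Map an aggregate MBO score to a health category.

    Thresholds follow the chapter's Table 1 calibration: over 7 is strong,
    5 to 7 is average, below 5 is struggling. Treating the boundary values
    5 and 7 as 'average' is an assumption of this sketch.
    """
    if aggregate_score > 7:
        return "strong"
    if aggregate_score >= 5:
        return "average"
    return "struggling"


for score in (8.2, 6.0, 4.1):
    print(score, classify_community(score))
```

A community with its own calibration would change only the two threshold values.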
Virtual Community Environment
In a large organization, there may be several different business groupings or areas. For example, at least one large oil company has a group that pumps oil from wells and delivers it to refineries, another group responsible for refining it, a group responsible for delivering and selling the refined gasoline to consumers, and a group responsible for building any project having a total price over $10 million. Each of these groups would have numerous active virtual communities. The group itself (e.g., refining) would be the environment that the virtual communities operate in. In a smaller organization, there may be only a few virtual communities and the single environment of the business.
Measures of the health of each virtual community can be aggregated to provide a measure of the overall virtual community environment. MBO provides an indication of the health of each of the communities within a group. If they are all reasonably healthy, then it is likely that the environment is healthy. If there is strong variation, with communities at both ends of the spectrum, there are likely problems within portions of the environment. If they are generally scoring poorly across the board, the overall environment is likely not healthy.
Determining the health of the environment can be done using a dashboard arrangement, which provides a single visual display of all of the virtual communities in an environment together with their measured health. Dashboards are designed to show each component community’s aggregate score by colour (typically red for struggling, yellow for average, or green for strong) with the numerical value also shown. The dashboard display of the MBO scores can take the form of a row, column, table, chart, or any other relevant format that ensures the viewer can quickly identify any problem areas. This is particularly valuable in situations where there are multiple environments active within a large company and a very senior person wishes to gauge their health. A quick scan of the dashboard identifies any struggling virtual communities, and a string of reds quickly identifies any problem environments.
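
The dashboard idea can be sketched as follows. This is a hedged illustration only: the function names and the sample community names are hypothetical, the colour thresholds reuse the Table 1 calibration, and the rule that an environment is "likely not healthy" when more than half of its communities are red is an assumption made for this example rather than a rule given in the chapter.

```python
def colour_for(aggregate_score):
    """Dashboard colour for one community (thresholds from Table 1)."""
    if aggregate_score > 7:
        return "green"   # strong
    if aggregate_score >= 5:
        return "yellow"  # average
    return "red"         # struggling


def dashboard_rows(environment):
    """Return (community, score, colour) rows, lowest score first,
    so that struggling communities appear at the top."""
    ordered = sorted(environment.items(), key=lambda item: item[1])
    return [(name, score, colour_for(score)) for name, score in ordered]


def assess_environment(environment):
    """Summarise one environment from its communities' colours.

    The 'more than half red' cut-off is an assumption of this sketch.
    """
    colours = [colour_for(score) for score in environment.values()]
    reds = colours.count("red")
    if reds == 0:
        return "likely healthy"
    if reds > len(colours) / 2:
        return "likely not healthy"
    return "problems likely in portions of the environment"


# Hypothetical aggregate scores for one environment.
refining = {"metallurgy": 8.2, "accounting": 6.1, "safety": 3.9}
for name, score, colour in dashboard_rows(refining):
    print(f"{name:12s} {score:4.1f} {colour}")
print(assess_environment(refining))
```

Sorting the rows by score is a design choice that puts problem communities first; any other layout that makes the reds easy to spot would serve the same purpose.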
Benefits
There are numerous benefits to establishing a systematic way to quantify the health of virtual communities within an organization. Measures support early action by leaders when issues are identified, support the processes that are integrated into the community, and provide members with feedback showing how well they, as a team, are supporting their overall organizational goals.
Measures provide individual community leaders with feedback about their community’s health. If their community is struggling, they are able to take steps to address the situation. It also gives them objective evidence that they can use within the business leadership structure to make a case for changes. Strong, vibrant virtual communities are associated with knowledge transfer, innovation, and committed employees. Areas with perpetually struggling virtual communities are likely experiencing other issues as well, and it is valuable to be able to identify them and begin investigating what the issues are.
Knowledge transfer processes support the success of the business organization. Lessons learned, risk, quality, and similar processes are targeted specifically at reducing waste within an organization. These processes rely on information validation from individuals who are working in specialized areas within the organization. Submitting items to the community for review also disseminates information and discussion of the information ingrains it within the community members. Measures that indicate to senior managers that their knowledge transfer processes are working well provide a level of comfort that waste due to repetition of past mistakes will be minimized. Providing the measures to the teams ensures that team members and leaders are aware of how well they are meeting company expectations. Virtual communities can take pride in their performance or their leaders can take corrective action if the community is struggling. Having the ability to identify the strong performing virtual communities as well as the struggling communities enables discussion about what is working and what should be changed within the struggling communities.
FUTURE RESEARCH DIRECTIONS
The methodology presented for measuring the health of virtual communities has not been tested or studied in a controlled manner. Future research in the area should be focused on testing the proposed measurement system in a large operating business which has existing virtual communities. This chapter is designed to open the door to further discussion and research in the area.
CONCLUSION
Individuals are emotionally involved in their work through interest, need of wage, and other social considerations. This interest allows us to classify business-based knowledge sharing sites as virtual communities. Virtual communities having very little to discuss cease to exist as the membership loses interest. By integrating knowledge-based business processes into a business’s virtual communities, there is a constant stream of discussion items and there is benefit to the integrated business processes. This is especially true for cross-silo business processes such as lessons learned, risk management, and quality.
Along with the improvement in overall health of virtual communities, integration of business processes with the communities allows measurement, which is an indicator of community health. The measurements are based on the MBO matrix methodology. Measurements are calibrated and produce quantitative results. Measurement outputs can be presented to leaders to provide them with an indication of the health of their virtual communities. The results are specific enough to ensure focused steps can be taken.
Virtual communities operate within a business environment. Some business environments support virtual communities and some are rather toxic to them. By using a dashboard method of displaying the health of the virtual communities, senior business leaders can quickly assess the health of the virtual community environment present within their organization. Issues can be addressed as needed. This provides them with an indication about the overall state of knowledge transfer within the organization, a key element of business success.
REFERENCES
Cristal, M., & Reis, J. (2006). Leveraging Lessons Learned for Distributed Projects Through Communities Of Practice. IEEE International Conference on Global Software Engineering Proceedings (pp. 239-240).
Daniel, B. K., Sarkar, A., & O’Brien, D. (2006). User-Centred Design for Online Learning Communities: A Sociotechnical Approach for the Design of a Distributed Community of Practice. In Lambropoulos, N., & Zaphiris, P. (Eds.), UserEvaluation and Online Communities (pp. 54–70). Hershey, PA: Information Science Publishing.
Deal, T. E., & Kennedy, A. A. (1982). Corporate Cultures: The Rites and Rituals of Corporate Life. Cambridge, MA: Perseus Books Group.
Deming, W. E. (1994). The New Economics for Industry, Government, Education. Cambridge, MA: MIT Center For Advanced Educational Services.
Felix, G. H., & Riggs, J. L. (1983). Productivity Measurement by Objectives. National Productivity Review, 2(4), 386–393. doi:10.1002/npr.4040020407
Hersey, P., Blanchard, K. H., & Johnson, D. E. (2001). Management of Organizational Behavior: Leading Human Resources. Upper Saddle River, NJ: Prentice-Hall Inc.
Maslow, A. H. (1987). Motivation and Personality (3rd ed.). Reading, MA: Addison-Wesley Publishing Company.
Project Management Institute. (2008). A Guide to the Project Management Body of Knowledge (PMBOK Guide) (4th ed.). Newtown Square, PA: Project Management Institute.
Rheingold, H. (1993). The Virtual Community: Homesteading on the Electronic Frontier. Reading, MA: Addison-Wesley Publishing Company.
Weber, R., Aha, D. W., & Becerra-Fernandez, I. (2001). Intelligent Lessons Learned Systems. International Journal of Expert Systems – Research and Applications, 20(1), 17-34.
Wenger, E., McDermott, R., & Snyder, W. M. (2002). Cultivating Communities Of Practice. Boston, MA: Harvard Business School Press.
Chapter 21
Building Web Communities: An Example Methodology
Jan Isakovič, Artesia, Slovenia
Alja Sulčič, Artesia, Slovenia
ABSTRACT
The aim of the chapter is to provide an example of a community definition and community building methodology using a step-by-step approach. The presented community specification and building methodology refines a broad community purpose into specific measurable goals, selects the social media tools that are best matched with the company's needs, and results in a platform specification that can be transformed relatively simply into software specifications or platform requirements.
DOI: 10.4018/978-1-60960-040-2.ch021
INTRODUCTION
The Internet is a tool which enables individuals from all over the world to connect, work and play together in ways that were not possible before. Individuals now have the power to become content producers and global thought leaders without the help of big corporations or mainstream media. These changes are also challenging the traditional perception of enterprises. Consumers can now talk to each other on a global scale and they demand more from companies. On the other hand, global and distributed companies are also using online tools for collaboration and innovation. It is becoming increasingly important for companies to learn about the changing media landscape and embrace the new wave of online social technologies to (re)connect with their customers or make internal collaboration more efficient.
Among other things, companies should pay attention to the phenomenon of web communities and how web communities can influence their business. Many companies want to have their own web communities based around their product or service, but a lot of them fail to define the right goals and strategy for a web community and consequently fail to build a lively web community with a clear business value, even if they have the right product or service to build a community around. In this chapter we will take a look at the most important elements of web communities and how we can combine these elements in a value-added methodology for defining and building web communities. We will also present our step-by-step community building approach using an example community definition process.
DEFINING WEB COMMUNITIES
Communities were traditionally formed through interaction with other individuals living in the same location. Grouping has enabled people to deal with their environment in a more effective way. Modern communication networks and broadly available transportation have made it easier for people to stay or get in touch over longer distances (Wellman et al., 2002). Therefore, communities can nowadays be defined as “networks of interpersonal ties that provide sociability, support, information, a sense of belonging and social identity” (ibidem). This can be achieved both in local neighborhoods and on the Internet using various web technologies. When communities are formed on the Internet, we can call them web communities, online communities or even virtual communities. Web communities can also have an “offline” component. Howard Rheingold, one of the pioneers in the field of web communities, defined a web community as “a group of people who may or may not meet one another face-to-face, and who exchange words and ideas through the mediation of computer bulletin boards and networks” (Preece, 2001a).
People instinctively form web communities every day. These communities can be centered on such things as a common purpose, interest, practice or circumstance (Clarke, 2009). People want to exchange information about how to file their tax report (community of purpose), talk about their favorite music (community of interest), connect with other individuals doing the same job (community of practice) or talk about the problems they have with their teenage children (community of circumstance). Other shared attributes of an online
community can include emotional connections, shared activities, resources and conventions, and interpersonal support (Lazar and Preece, 1998).
Community building is a complex interdisciplinary process that does not have many established models. Too often the models focus on the software used by web communities and forget that software is just a tool that enables a community to communicate and interact over space and time. Software or any other technology by itself doesn’t form communities or guarantee a successful online community (De Souza and Preece, 2004). Early examples of web communities like The WELL (http://www.well.com/), which started as a simple bulletin-board system (BBS), show that web communities often don’t require advanced software to spring into life. Instead, successful web communities are determined by social factors (such as people, purpose and policies), although well-designed software functionality and usability can improve the success of a community (ibidem). Much more important is getting together the right people, who have something in common and are willing to invest their time and effort into their community (which is of course also true for offline communities). Therefore the focus of this chapter is on the people side of the community building process.
Communities are often created with a bottom-up approach; we can also call these communities member-initiated (Porter, 2004). People naturally come together with a common interest, and the community is managed by members (ibidem). In the business world, we often have the desire to “build” a community top-down. We can call these communities organization-sponsored, and the members of these communities are somehow connected to the business and goals of the organization (ibidem). A company might want to expand a real-world community of customers on the web or give customers or employees the opportunity to network in an online environment, in the hope that they might form a web community and strengthen their relationships with one another and
the company. In these cases, communities often aren’t naturally born, but require some planning. It is worth noting that not every company will be able to create a successful community with a top-down approach and that is why there isn’t a single magic formula for community building, but rather a series of guidelines that can help a company figure out if and how a web community can help its business. This chapter focuses on tools and strategies that can help build organization-sponsored communities.
According to Preece (2001), there are three key components that contribute to good sociability of a community and are therefore the main factors of a community’s success: Purpose (a shared focus), People (personal characteristics of community members), and Policies (written and unwritten rules and norms, trustworthiness). We also believe that it is important to define a clear added value for both the community members and the company which is either sponsoring or building a web community. Too often companies fail to provide a clear added value for the members and focus only on the added value for themselves. Or, at the other extreme, companies can forget about the need for an added value that is aligned with their business goals.
Once we figure out the basic characteristics of our community and its members, we should also start thinking in more concrete terms about how our community will be defined and managed. One of the best models for this was provided by Amy Jo Kim through her nine design principles for building web communities (Couros, 2003):
1. Defining a clear purpose,
2. Building gathering places for community members,
3. Creating meaningful personal profiles,
4. Planning for different roles within the community,
5. Defining community leadership practices,
6. Encouraging the right etiquette and ground rules for communication and conflict resolution,
7. Promoting regular events that bring members together,
8. Integrating community rituals around important events (e.g., welcoming new members),
9. Allowing members to run sub-groups (especially in large communities).
Similarly, Typaldos (2000) defined the following principles that define a web community: shared purpose, member identity, sharing and exchanging information, ideas, support, goods, etc., building trust, forming reputations, working in small groups, shared space, boundaries, governance, expressing group identity, and tracking history and the evolution of the community.
We can translate Kim’s or Typaldos’ principles into community parameters, which help us define platform software specifications, terms of service, and the standard of communication (business or more personal), and which can be used as a monitoring tool for community growth and health in its later stages. However, we do believe that knowing the Purpose and the People who form a community are the key principles that should be thought out before going into the specifics of a community.
Figure 1 shows what we believe is the best way to define and build a community. The foundations are laid by understanding the type of community we can build (and what its main purpose will be) and the characteristics of the users of our desired community. This will enable us to find a clear added value for both the users and the company or organization trying to build the community. Once these basics are defined, we can use them to define concrete community parameters and strategy. The rest of the chapter will define each of the building blocks in more detail and provide an example of how the model can be used to define a web community.
Figure 1. Community model
DEFINING COMMUNITY PURPOSE
Of the design principles mentioned above, defining a clear community purpose should be the starting point of any community building effort. It is important to understand why people join a certain community: what is their motivation, and what problems does the community solve for the individual? We should also define a clear purpose of the community for the company or organization that is trying to build it. The purpose specifies which user and company needs the community fulfills. The aim of this phase is to find matching needs or common points of interest on both sides. These are some basic questions you should try to answer in this first phase:
1. What is the value of the community to its members and to the company or organization? (basic purpose)
2. Why should somebody register with your community? (why won’t a Google search do?)
3. Why should they come back regularly? (how to provide persistent value)
4. Why will a possible competitor be unable to simply copy your community project in 12 months? (how to harness network effects)
Once we are able to identify a common purpose for the community, we should try to define
the characteristics of the users of our desired community.
DEFINING THE USERS
Knowing the users is a crucial part of any community building process. We start by defining the users’ needs, motivation and purpose for joining a certain web community, and continue with an analysis of how they usually behave in web communities. There are different theories and models of why people join communities, which Äkkinen (2004) divides into three main groups: economic theories, social theories and the interest perspective. According to economic theories, people participate in online communities if the perceived benefits exceed the sacrificed resources or costs, such as time, energy and knowledge (ibidem). Social theories emphasize the social benefits of participating in a community, such as reciprocity among community members, defining and maintaining members’ social identity, finding like-minded people, entertainment or self-realization values (ibidem). The interest perspective deals with the interests of community members, and divides them into self-interests and community interests or altruism (ibidem). Members are motivated to participate both to satisfy their own personal interests and to benefit other members, in return for which they get recognition, personal satisfaction or, to put it
Figure 2. User participation hierarchy
in terms of economic theories, they get benefits that repay the resources they invested in their participation. When thinking about the potential users, it is therefore important to think about how a web community can benefit its members. Additionally, we can use this knowledge during the software design process to make it easier for members to participate and get recognition for their participation. For this purpose, several incentive mechanisms, such as rating systems, leaderboards or personal recommendations, can be used (Cheng & Vassileva, 2006). The next step is understanding how people usually behave in web communities. The model created by Forrester Research defines six basic groups of online users (Li & Bernoff, 2008):
1. Inactives (do not participate in online communities)
2. Spectators (passive participants, i.e. content consumers)
3. Joiners (participate in social networks)
4. Collectors (organize, tag, and share information)
5. Critics (write reviews of content, rate content)
6. Creators (create content)
Users can belong to different groups and can change their “membership” over time or depending on the circumstances. Forrester Research arranges these groups in a participation ladder, from the lowest (Inactives) to the highest (Creators) (Li & Bernoff, 2008). We arrange them a bit differently to emphasize the relative equality of the active participation types. A creator is not automatically more important than a critic or a collector - it is active participation that matters, and not the form that it takes. It is important for a company to be able to estimate the number of users in each of the participation types for its own community. In other words, we must know how many users would be willing to consume content in our community (passive participants) and how many would be willing to create content for passive participants to consume (active participants). The usual rule of thumb is 90-9-1: 90% passive content consumers, 9% occasional contributors and 1% regular content contributors (Nielsen, 2006). However, this can be very different for some user groups and for smaller communities with stronger ties. In a large, open community, it is more common to see a large percentage of passive participants, who are often called lurkers (Preece et al., 2004). Passive participants satisfy their needs by following what active participants are saying, and it is not
always necessary to try to convert passive participants into active ones (ibidem). According to critical mass theory, communities need a critical mass of participants to make participation worthwhile, but it is unclear what the exact number might be for different kinds of communities (Preece, 2001b). It is also important to find out which web communities the target user group already belongs to and which social tools its members are familiar with or like using. Using the most appropriate social tools helps user acquisition and retention, as users do not have to learn to use new tools. The best way to accurately gauge the social profiles of the users is to survey a sample of the target user groups, asking them about their internet and social media use.
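As a rough illustration of the 90-9-1 rule of thumb discussed above, the following Python sketch splits an expected audience into passive participants, occasional contributors and regular contributors, and also inverts the rule to ask how large an audience would be needed to expect a given number of regular contributors. The function names and the example figures are hypothetical, and the default shares are only the rule of thumb itself; as noted above, real communities can deviate from it considerably.

```python
def estimate_participation(total_users, shares_pct=(90, 9, 1)):
    """Split an audience into passive, occasional and regular head counts.

    shares_pct holds whole-number percentages. The default is the 90-9-1
    rule of thumb, which is an assumption and not a measured profile.
    """
    if sum(shares_pct) != 100:
        raise ValueError("shares_pct must add up to 100")
    _, occasional_pct, regular_pct = shares_pct
    regular = round(total_users * regular_pct / 100)
    occasional = round(total_users * occasional_pct / 100)
    passive = total_users - occasional - regular  # the remainder are lurkers
    return {"passive": passive, "occasional": occasional, "regular": regular}


def audience_needed(target_regular, regular_pct=1):
    """Smallest audience expected to yield target_regular regular contributors."""
    return -(-target_regular * 100 // regular_pct)  # integer ceiling division


if __name__ == "__main__":
    # Hypothetical example: a customer base of 5,000 people.
    print(estimate_participation(5000))
    # Hypothetical example: audience needed for about 20 regular contributors.
    print(audience_needed(20))
```

A survey-based profile of the target users, as recommended above, would replace the default percentages with measured ones.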
Figure 3. Community definition process
BUILDING A COMMUNITY
In this section, we present our community building methodology, which builds on the previous two sections, in more detail through an example. It is based on a common community purpose and value for both the members and the company or organization (see Figure 3). The core of our methodology is the Waterfall model for software development, which includes these basic phases: Requirement Specifications, Software Design, Implementation, and Testing & Maintenance (Parekh, 2005). In our opinion, the Waterfall model is a useful framework for defining web communities, as all web communities rely on complex software for their online interactions. The pre-defined stages and clear deliverables at each stage help to focus the community definition and building process and keep it on track. However, it is worth noting that the first and third phases of the Waterfall model are usually the most important phases of web community definition and building, as we nowadays have a great selection of open-source software ready to be used to support different types of communities.

Our main contribution to the basic Waterfall model can be seen in the Requirement Specifications phase, in which we included various elements of the community models presented in previous sections (Purpose, People, Added Value, Community parameters). We believe the advantage of combining various community models and theories with an established software development process is that companies (and project managers) are already familiar with this process and can easily follow the progress across its stages. The main deliverable of the Requirement Specifications phase is the definition of concrete Community parameters, which can be used to prepare clear software specifications (the Software Design phase in the Waterfall model). This chapter mainly focuses on the process during the Requirement Specifications phase, although the other phases are also very important for a successful community process.

In the rest of the chapter, we present our methodology in more detail through the case study of PetPro, a fictional company specializing in cat food and other products for cat lovers. PetPro sells high-quality cat food and now wants to expand into pet health products. It wishes to use the crowdsourcing model to discover possible product niches.
First Step: Purpose
A series of meetings with the client company is the best way to get to know the company. In our case, we find out that PetPro already has a lot of loyal clients and that its brand is respected. PetPro’s need is to increase its sales through new product niches. The next step is to take a look at the users in relation to the company. Who are the people buying PetPro food? What do they want? How do they feel about the company, and what do they generally like to do online? The best way to find out is to meet with a sample of them or ask them to fill out a survey. In our example, the meeting or survey shows that the users want healthy pet food, but that they also need more information about cat health, e.g. which illnesses are serious and how to prevent them. We now have both the company and user needs and can define the basic purpose of the community for both parties. In the case of PetPro, the purpose of the community for the company would be to discover product niches and build relationships with its customers. The purpose for users to join the community would be to get information about cat health.
Second Step: Added Value
A clearly defined purpose allows us to define the added value for both the client company (increased revenue) and its customers (improved cat health), which translates into concrete benefits for both the company and the users. As pet health is an ongoing concern, there is an incentive for users to come back, provided we keep providing them with value.
Third Step: Users’ Social Profiles
In this step, we take another look at the target user group from a different viewpoint. We want to know what kind of users they are when it comes to participation in online communities. Are they active or passive participants? What tools are they using? With the help of surveys and other research methods, we can then create an active participation graph that charts the profile of active users. We have plotted the results on a two-dimensional XY chart to better showcase the strength of each user type (e.g. light creator / moderate creator / strong creator), but they can also be shown as bars representing the different user types. As Figure 4 shows, the pet community members are eager creators. During our research, we find many cat blogs on the web, where proud
Figure 4. Active participation graph of a sample PetPro community
owners share pictures of their cats and document their daily adventures. The large number of sites where users can rate pictures of cats also tells us that there is a strong critic component to the community. Because there are no large cat-owner forums or blog aggregators, we conclude that the users are weak joiners and collectors. These findings are further validated using surveys.
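The active participation profile described above could be derived from survey answers along the following lines. This is only a sketch under stated assumptions: the survey format (each respondent lists the participation types that apply to them), the sample answers and the thresholds separating weak, moderate and strong user types are all hypothetical and are not taken from the Forrester model.

```python
from collections import Counter

# Active and passive participation types used in the survey (Inactives omitted).
TYPES = ("creator", "critic", "collector", "joiner", "spectator")


def participation_profile(responses):
    """Return the share of respondents who report each participation type."""
    counts = Counter()
    for reported in responses:
        # Ignore anything a respondent wrote that is not a known type.
        counts.update(set(reported) & set(TYPES))
    n = len(responses)
    return {t: (counts[t] / n if n else 0.0) for t in TYPES}


def strength(share, moderate=0.25, strong=0.50):
    """Label a share as weak, moderate or strong (thresholds are assumptions)."""
    if share >= strong:
        return "strong"
    if share >= moderate:
        return "moderate"
    return "weak"


# Hypothetical answers from five surveyed cat owners.
SAMPLE_RESPONSES = [
    {"creator", "critic"},
    {"creator", "critic", "spectator"},
    {"creator"},
    {"critic", "spectator"},
    {"spectator", "joiner"},
]

if __name__ == "__main__":
    profile = participation_profile(SAMPLE_RESPONSES)
    for user_type, share in profile.items():
        print(f"{user_type:10s} {share:.0%} {strength(share)}")
```

With these made-up answers, creators and critics come out as strong types and joiners and collectors as weak ones, which mirrors the pattern reported for the sample PetPro community.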
Fourth Step: Proposed Solution
Given the defined added value and the users’ social profile and participation data, PetPro decides to set up a website with a blog written by experts (veterinarians) that will cover the most common cat health issues without focusing on PetPro’s existing products. Visitors to the site can suggest health issues to be dealt with in the next post and vote on suggestions made by other visitors. In this case, the added value for community members is expert advice on cat health, and the company can gauge the scale of cat health issues from the number of votes a specific health issue gets and from the number of page views for each of the expert posts. This will enable them to
create new products that address the most frequent pet health problems. Because of the strong Critics component, we also allow users to rate issues by importance. Users can, of course, also rate blog posts and comment on both suggested issues and blog posts. Because there are already many other communities for cat lovers and because the users aren’t eager joiners, we don’t add an excessive number of social networking features and don’t force users to register. We do, however, make sure they can share our content and access it from their other communities using, for example, widgets.
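The suggestion and voting mechanism described above can be sketched as a small data structure plus a ranking function. The class name, the fields, the 1-5 importance scale and the tie-breaking rule (votes first, then mean importance rating) are illustrative assumptions; the chapter does not prescribe a particular scoring scheme.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """A cat health issue suggested by a visitor (hypothetical structure)."""
    title: str
    votes: int = 0
    ratings: list = field(default_factory=list)  # importance ratings, assumed 1-5

    def mean_rating(self):
        """Average importance rating, or 0.0 if nobody has rated the issue."""
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0


def rank_issues(issues):
    """Order issues by vote count, breaking ties by mean importance rating."""
    return sorted(issues, key=lambda i: (i.votes, i.mean_rating()), reverse=True)


if __name__ == "__main__":
    # Made-up suggestions, votes and ratings for illustration only.
    suggestions = [
        Issue("hairballs", votes=12, ratings=[3, 4]),
        Issue("dental care", votes=30, ratings=[4, 4]),
        Issue("obesity", votes=30, ratings=[5, 4]),
    ]
    for issue in rank_issues(suggestions):
        print(issue.title, issue.votes, issue.mean_rating())
```

The top of such a ranking would suggest the topic of the next expert post and, over time, the health problems most worth addressing with new products.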
Fifth Step: Community Parameters
The community parameters are then used to specify the community details in more concrete terms using the community model described above. As the cat owners are likely to have blogs and share cat pictures, user profiles are set up which allow users to link to their blog and showcase pictures of their pets. To promote user activity and establish leadership, “karma” is introduced, which increases when users perform specific actions (suggest questions, vote on questions, get their question answered, post
pictures). Users with the highest karma receive discount coupons for PetPro products. As the cat owners are not prone to joining social networks, no further social functions will be implemented at the start; the company does, however, include a votable suggestion box to detect possible future wishes for platform expansion. We also have to define clear goals and metrics that we can use later on to evaluate the success of the web community. In the example, such goals could be to get 3 new product ideas per year and establish a dialogue with our customers. The metrics could include the number of submitted issues, ratings, page views and incoming links.
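The karma mechanism described above can be sketched as a simple points table applied to a log of user actions. The point values, the action names and the number of coupon winners are hypothetical; the chapter names the karma-earning actions but does not assign weights to them.

```python
from collections import defaultdict

# Hypothetical point values for the karma-earning actions named in the text.
KARMA_POINTS = {
    "suggest_question": 2,
    "vote": 1,
    "question_answered": 5,
    "post_picture": 3,
}


def karma_totals(events):
    """Sum karma per user from (user, action) pairs.

    Unknown actions raise KeyError, so a typo in the log is noticed early.
    """
    totals = defaultdict(int)
    for user, action in events:
        totals[user] += KARMA_POINTS[action]
    return dict(totals)


def coupon_winners(totals, n=3):
    """Return the n users with the highest karma (ties broken by name)."""
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [user for user, _ in ranked[:n]]


if __name__ == "__main__":
    # Made-up activity log for illustration only.
    log = [
        ("ana", "suggest_question"),
        ("ana", "vote"),
        ("ben", "post_picture"),
        ("ben", "question_answered"),
        ("cy", "vote"),
    ]
    totals = karma_totals(log)
    print(totals)
    print(coupon_winners(totals, n=2))
```

The same activity log could also feed the success metrics mentioned above, such as the number of submitted issues and ratings per month.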
SUMMARY
The basic process for community building starts with a perceived need, either on the company or the user side (in our example, the need of PetPro to find new product niches). The need is solidified into a purpose for the community (to make cats healthier), which, combined with the user profiles (active participation profile, willingness to use different technologies), defines the added value (expert cat health advice, direct connection to cat health problems) that provides benefits to the company and users and fulfills the initial need. Once the added value has been defined, we define the community parameters, which serve as a good foundation for the community platform software specification. That allows us to select the best software platform for the community or identify the custom modules that need to be developed to support community specifics. We also define the goals and metrics we use to gauge community success. Following the launch of our new community, we can use the results of the community model from the conceptual stage to monitor the progress of our community and to adjust our community strategy when needed. In the later stages, the community model should be improved with empirical data and feedback from the community
members, which can also result in new software specifications. An example would be the rising success of social networking platforms like Facebook, which could lead us to add social networking features or integration with other platforms. A community is a living organism that changes over time, and for this reason we often return to the beginning of the community definition process even after we already have a live community.
COMMUNITY BUILDING PITFALLS
Our model includes a few basic assumptions. First, that the company has clearly defined business goals, knows its strengths and weaknesses and knows its users well. One of the disadvantages of using the Waterfall model in our methodology is that insufficient (or even false) requirements can result in a poor strategy, which fails to meet the goals of the project. Second, that there actually is a basis for the community to form. A lot of products, services and companies do not lend themselves to forming online communities (think of car tires or dairy products), and it might be hard to find an overlap between the company purpose and a user purpose. And third, that the company is willing to invest its time and resources in growing and fostering the community. It is not enough to spend a few months planning and implementing a community; a community is a long-term commitment, which requires the right kind of people. A combination of all these is rare; be prepared to realize this during the community definition process and to decide whether to proceed with the community building or to halt the project and perhaps use the requirements for a more lightweight strategy.
CONCLUSION
The business value of web communities is becoming more important each day. However, a
lot of companies fail to define the right goals and strategy for a web community and consequently fail to build a lively web community with a clear business value. Our community building process starts with clear definitions of company needs and user needs. We then define a common purpose for a community that will fulfill both these needs by providing case-specific added value. After that, we use the community parameters to define as much about the community as we can in advance. This allows us to prepare documents like Terms of service, train community managers and clearly define the needed software platform. We set clear business goals for the community and translate them into community metrics, which will allow us to track the community progress, alert us to possible problems and enable us to modify community growth strategy if needed. Despite the seemingly simple community building process, it is worth keeping in mind that building communities is, at heart, a human problem - not a technological one. Human participants usually do not lend themselves to orderly processes. That is why our methodology is, at heart, just a foundation and not a panacea. Every real community building project will be different and it is important to learn and adapt to a specific business and user environment.
REFERENCES
Äkkinen, M. (2005). Conceptual foundations of online communities. Retrieved August 25, 2007, from http://hsepubl.lib.hse.fi/pdf/wp/w387.pdf
Cheng, R., & Vassileva, J. (2006). Design and evaluation of an adaptive incentive mechanism for sustained educational online communities. User Modeling and User-Adapted Interaction, 16(3), 321–348. doi:10.1007/s11257-006-9013-6
Clarke, L. (2009). What is an online community? Retrieved October 17, 2009, from http://www.siftgroups.com/blog/what-online-community
Couros, A. (2003). Communities of Practice: A Literature Review. Retrieved May 10, 2008, from http://www.tcd.ie/CAPSL/academic_practice/pdfdocs/Couros_2003.pdf
De Souza, C. S., & Preece, J. (2004). A framework for analyzing and understanding online communities. Interacting with Computers, The Interdisciplinary Journal of Human-Computer Interaction.
Lazar, J., & Preece, J. (1998). Classification Schema for Online Communities. In Proceedings of the 1998 Association for Information Systems, Americas Conference (pp. 84-86).
Li, C., & Bernoff, J. (2008). Groundswell: Winning in a World Transformed by Social Technologies. Boston: Harvard Business Press.
Nielsen, J. (2006). Participation Inequality: Encouraging More Users to Contribute. Retrieved July 10, 2009, from http://www.useit.com/alertbox/participation_inequality.html
Parekh, N. (2005). The Waterfall Model Explained. Retrieved October 18, 2009, from http://www.buzzle.com/editorials/1-5-2005-63768.asp
Porter, C. E. (2004). A Typology of Virtual Communities: A Multi-Disciplinary Foundation for Future Research. Journal of Computer-Mediated Communication, 10(1). Retrieved September 2, 2007, from http://jcmc.indiana.edu/vol10/issue1/porter.html
Preece, J. (2001a). Sociability and usability: Twenty years of chatting online. Behavior and Information Technology Journal, 20(5), 347–356. doi:10.1080/01449290110084683
Preece, J. (2001b). Online communities: Usability, Sociability, Theory and Methods. In Earnshaw, R., Guedj, R., van Dam, A., & Vince, T. (Eds.), Frontiers of Human-Centred Computing, Online Communities and Virtual Environments (pp. 263–277). Amsterdam: Springer Verlag.
Preece, J., Nonnecke, B., & Andrews, D. (2004). The top 5 reasons for lurking: Improving community experiences for everyone. Computers in Human Behavior, 2(1).
Typaldos, C. (2000). Community Standards. Fast Company, 38, 369.
Wellman, B., Boase, J., & Chen, W. (2002). The Networked Nature of Community: Online and Offline. IT&Society, 1(1), 151–165.
ADDITIONAL READING
Forrester Research. (2008). Social Technographics Profile Tool. Retrieved July 10, 2009, from http://www.forrester.com/Groundswell/profile_tool.html
Kim, A. J. (2000). Community Building on the Web: Secret Strategies for Successful Online Communities. Peachpit Press.
Uhrmacher, A. (2008). 35+ Examples of Corporate Social Media in Action. Retrieved July 20, 2009, from http://mashable.com/2008/07/23/corporatesocial-media/.
Chapter 22
Virtual Geodemographics: Consumer Insight in Online and Offline Spaces
Alex D. Singleton
University of Liverpool, UK
ABSTRACT
Computer mediated communication and the Internet have fundamentally changed how consumers and producers connect and interact across real space, and have also opened up new opportunities in virtual spaces. This book chapter describes how technologies capable of locating and sorting networked communities of geographically disparate individuals within virtual communities present a sea change in the conception, representation and analysis of socioeconomic distributions through geodemographic analysis. It is argued that through virtual communities, social networks between individuals may subsume the role of neighborhood areas as the most appropriate unit of analysis, and as such, geodemographics needs to be repositioned in order to accommodate social similarities in virtual, as well as geographical, space. The chapter ends by proposing a new model for geodemographics which spans both real and virtual geographies.
DOI: 10.4018/978-1-60960-040-2.ch022

THE DENUDATION OF REAL WORLD GEODEMOGRAPHICS
Geodemographic classifications work by categorizing real world geographic areas into a series of Types which purport to represent homogeneous and multidimensional characteristics of individuals living within neighborhoods. Fundamental to this view is that the geographical location in which
you live shapes who you are and, in the case of commercial applications, what you are likely to buy in the future. This kind of classification has apparently sustained considerable success in the commercial sector by leveraging greater returns through target marketing (Birkin, Clarke, & Clarke, 2002; Harris, Sleight, & Webber, 2005), and classifications are increasingly used by the public sector for social marketing and customized service delivery (Longley, 2005). The assignment of an individual within a classification Type is
achieved by address matching against a small area geography equivalent in size to census areas, US Zip Codes, UK/Canadian Postcodes and so forth, an assignment process that is potentially vulnerable to the ecological fallacy (Birkin et al., 2002) and to the suppression of diversity within areas (Voas & Williamson, 2001). Furthermore, although geodemographic classifications are constructed using data which relate to geographic areas, their mode of construction is avowedly aspatial, in that the clustering procedures used to create the classification are optimized by searching for patterns of social similarity, independent of locational proximity. As such, the “geo” prefix to geodemographics perhaps implies greater spatial intelligence than exists in reality.

Against this backcloth, the growing role of the Internet in mediating relationships between producers and consumers is fundamentally challenging the supremacy of geographic classification as a method of targeting based on homogeneity of behaviors between consumers within a neighborhood area (Longley & Singleton, 2009a, 2009b; Longley, Webber, & Chao, 2008). The core principle underlying current geodemographic classifications is that ‘birds of a feather flock together’ (Sleight, 2001), that is, the locations of consumers with similar traits, tastes and preferences exhibit spatial autocorrelation. For traditional marketing activities such as the provision of targeted mail shots or the location of advertising billboards, response rates can be estimated simply as a function of the typical characteristics of the local population likely to view these offerings. However, more and more consumer interaction takes place on the Internet, where similarities in consumer behavior are less obviously viewed through the lens of geographic co-location. Instead, consumers or potential customers can be drawn together from across large geographic areas.
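The aspatial character of geodemographic clustering noted above can be illustrated with a toy example. The sketch below groups four hypothetical small areas using a plain k-means procedure on two made-up attributes (share of residents aged 65 or over, share of renters); the area codes, coordinates and values are invented for illustration, and k-means is used here only as a stand-in for whatever clustering procedure a real classification employs. The point is that the coordinates are never passed to the clustering step.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Hypothetical areas: "xy" is location, "attrs" is (share aged 65+, share renting).
AREAS = {
    "A1": {"xy": (0.0, 0.0), "attrs": (0.10, 0.80)},
    "A2": {"xy": (0.0, 1.0), "attrs": (0.60, 0.15)},
    "B1": {"xy": (50.0, 50.0), "attrs": (0.12, 0.75)},
    "B2": {"xy": (50.0, 51.0), "attrs": (0.55, 0.20)},
}


def classify(areas, k=2):
    """Assign each area code to a Type using attributes only, never location."""
    codes = list(areas)
    labels = kmeans([areas[code]["attrs"] for code in codes], k)
    return dict(zip(codes, labels))


if __name__ == "__main__":
    print(classify(AREAS))
```

In this toy data the distant areas A1 and B1 fall into the same Type while the adjacent areas A1 and A2 do not. Looking up an individual's Type from an area code then assigns the area's profile to every resident, which is where the ecological fallacy mentioned above can arise.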
To date, critiques of geodemographics have been limited to offline behaviors occurring across geographic space, and as such little attention has been directed at the challenges that computer
mediated communication poses to areal classification. To what extent do social similarities manifested both between and within online virtual spaces supplement or even replace conventional geodemographic classification?
TOWARDS A GEODEMOGRAPHY OF CYBERSPACE?
Before reconsidering the role of geodemographics as a tool for generalized representation, it is important to define how online spaces are constructed, as this influences how they can be understood and measured. There is long established interest in how new forms of interaction and place forming processes are enabled by information and communication technology (Adams, 1998; Batty, 1997; Valentine & Holloway, 2002). A useful typology of online and offline spaces is provided by Batty (1997:340):
1. Place/space: the original domain of geography abstracting place into space using traditional methods;
2. Cspace: abstractions of space into c(omputer)space, inside computers and their networks;
3. Cyberspace: new spaces that emerge from cspace through using computers to communicate;
4. Cyberplace: the impact of the infrastructure of cyberspace on the infrastructure of traditional place.
For a full review of early developments in computer mediated communication and their implications for the development of cyberspace, see Rheingold (1994) and Batty and Barr (1994). As discussed in the previous section, geodemographics has demonstrated use across a variety of application areas in place/space and more recently cyberplace (Longley et al., 2008). Although early commentary argued that communication enabled
Table 1. Facebook demographic profile – May 2007 vs. May 2006 (Lipsman, 2007)

Age Segment        May-06 (000s)    May-07 (000s)    Percent Change
Persons: 12-17     1,628            4,060            149%
Persons: 18-24     5,674            7,843            38%
Persons: 25-34     1,114            3,134            181%
Persons: 35+       5,247            10,412           98%
by the Internet would erode the importance of place/space (Benedikt, 1991; Cairncross, 1997), these effects, as argued by Kitchin (1998), were overstated. Today, businesses still cluster in real geographic spaces to build on economies of proximity, and the majority of the workforce do not telecommute from their homes into virtual offices. Connection to the Internet has not replaced our interactions and organization across real space, and as such, place/space areal targeting applications using geodemographics as traditionally conceived still maintain relevance. However, the Internet has changed since these early commentaries. Goodchild (2007:27) observes that ‘the early Web was primarily one-directional, allowing a large number of users to view the contents of a comparatively small number of sites, [whereas] the new Web 2.0 is a bi-directional collaboration in which users are able to interact with and provide information to central sites, and to see that information collated and made available to others’. This paradigm shift has enabled numerous and rapidly expanding cyberspaces to develop around multiple different types of digital interaction (Dodge & Kitchin, 2001). In this new information age, and as predicted by Castells (2000), networks have become an increasingly important organizational framework on which new organizations have been built. The conception of networks as the building blocks for cyberspaces is increasingly evident in those new services popularized online that link individuals together through their personal associations or their sharing of common interests. Although the development and success of these social network internet websites
is a relatively new but growing phenomenon (Boyd & Ellison, 2008) (see Table 1), the study of offline social networks has a longer history extending back to the 1970s (Boorman & White, 1976; Freeman, 2004; Galaskiewicz, 1979; Scott, 2007; Wasserman, 1994; White, Boorman, & Breiger, 1976), with applications across multiple sciences including health (Christakis & Fowler, 2007), education (Hawe & Ghali, 2007), crime (Calvó-Armengol & Zenou, 2004) and politics (Crossley, 2007). Although online networks may also demonstrate real world spatial autocorrelation, this offline spatial clustering is likely to be more diffuse, and particularly so for those networks built around niche activities. The likely success of targeting individuals within cyberspaces based on space/place geo-location is therefore eroded, undermining the value of spatial classifications such as geodemographics. Indeed, in a study of LiveJournal (www.livejournal.com/) friendships, Liben-Nowell et al. (2005) showed that around a third of social-network friendships were independent of geography. In response to this problem, marketers have had to develop a range of new strategies to reach networks of individuals communicating online. One example technique which substitutes for areal targeting is viral marketing, defined as a method which “takes advantage of networks of influence among customers to inexpensively achieve large changes in behaviour” (Richardson, 2002:61). In this type of targeting, marketing messages are sent to a range of individuals within a targeted community who pass these on through their network of social connections. In
Figure 1. Second Life: Total active users (Data: Linden Labs, maps created by author)
this type of marketing, the individual and their relationships become the focus for targeting rather than their geo-location and ascribed geodemographic classification. Thus far, discussion has concerned the implications of online activity served through traditional HTML based websites, albeit with elements of interaction enabled by database connectivity and scripting languages. A developing area of cyberspace is that situated in virtual worlds (Bainbridge, 2007:472; Butler, 2006), defined as “an electronic environment that visually mimics complex physical spaces, where people can interact with each other and with virtual objects, and where people are represented by animated characters”. Virtual worlds in their current form extend from the technologies of internet relay chat, through Multi User Dungeon/Domains (MUD) and early graphical representations of MUD such as Active Worlds (Dodge & Kitchin, 2001). There are many different virtual worlds which range in purpose, scale and sophistication. One of the most popular is Second Life from Linden Labs (http://secondlife.com/), which as of 26th June 2008 had 14,123,766 residents¹, around double the total population of London. The frequency of active users, the time spent online and the ratio between the two measures are shown in Figures
Figure 2. Second Life: Time spent online (Data: Linden Labs, maps created by author)
Figure 3. Second Life: Ratio between the user frequency and the time spent online (Data: Linden Labs, maps created by author)
1-3 for each country in the world, illustrating how use of this technology has penetrated numerous disparate but real geographic locations. The Second Life operating environment was created with tools which enable an economy, allowing users to both produce and consume products and services sold for virtual money (Linden™ dollars). Users can purchase or sell this currency through LindeX™, the Second Life virtual financial exchange, thus making it possible to make real world money from virtual business activities. Virtual worlds present further challenges for marketers and social scientists looking to understand and segment consumer behaviour. In addition to interactions enabled by social networks between individual users of these cyberspaces, virtual worlds also partition activities across a Euclidean space; that is, each building, home, shop and avatar is located at a specific set of spatial co-ordinates, thus re-engaging the possibility of spatial targeting. However, despite early calls (Batty, 1997), there has been little research to date on how the relationships between activities, space use and organisation compare with our real world understandings of geographic processes. This has not, however, deterred many large corporations with a real world presence, including Ford, Coca Cola, MTV, IBM and American Apparel, from entering Second Life as an opportunity to expand their market and brand at limited cost. Second Life consumer intelligence will have an increasing value for all types of companies wishing to target their selling, and this is recognised by a number of real life market research companies who have produced a range of panel surveys conducted with Second Life residents (Tarran, 2007). Cyberspaces enable real-time and scale free interaction between their inhabitants, be this through passing association in virtual worlds, or via connections within social networks. In a sense, cyberspaces decouple the association between behavioural patterns and place, thus undermining the prerequisite of traditional geodemographics, which assigns people to typologies based on the consumption patterns of those living in the same area. Area becomes far more difficult to define, specifically given that interactions and activities in cyberspace do not necessarily have to occur at a fixed place, or even in place at all as traditionally conceived. For example, what is the place of a sent email? Given these challenges, it can be argued that in cyberspaces the appropriate geographic scale of analysis is the individual. However, analogous to those challenges of linking individual records
from within multiple and large administrative data in space/place, identifying an individual’s digital footprints across multiple cyberspaces is equally challenging. For example, how can you link the behaviours of an individual on the social networking website Facebook to their activities in Second Life given that there is no unique ID? There is an acute need for more research in this area, specifically as online and offline interactions will increasingly overlap.
VIRTUAL CHOICES AND INFORMATION ASYMMETRY

A related challenge for offline targeting solutions such as geodemographics is that they are optimised to predict homogeneous consumption patterns of limited and well defined behaviours. For example, the earliest examples of commercial geodemographic classifications examined the readership of newspapers (Batey & Brown, 1995), a product category which tends to correlate highly with political allegiance (Newton & Brynin, 2001), voting patterns (Johnston & Pattie, 2006) and socioeconomic status (Chan & Goldthorpe, 2007). In these examples there is a close correspondence between the specification, or the indicators used, and the outcomes as measured by the classification. In an era of online mass customerization (Wind & Rangaswamy, 2001), online retailers offer the ability to customise product offerings to meet the specific needs of individuals. An example of such a service is provided by the computer retailer Dell, which offers the ability to customise products down to the level of individual components. Although traditional classification may be useful to predict those neighbourhoods likely to purchase new computers, it is unlikely to succeed at discriminating within this group based on the niche tastes and preferences of individuals. This issue of niche tastes is explored by Anderson (2006), who develops a thesis for the long tail of retail, where those companies
who provide ‘endless choices’ online are in turn matched by ‘unlimited demand’ from consumers. This business model is enabled by the removal of the physical limitations of retail such as geographic location and shelf space, and as such negates the opportunity cost of stocking more items. In a physical store, each product has to occupy shelf space, and there is a cost associated with each of these items in terms of ground rents, staffing, heating and lighting. As such, the physical store will generally cater for those items which are popular and can be sold in large quantities. Anderson (2006) describes these as the “hits”, and it is posited here that, like newspaper readership, it is these large and well defined hits which traditional geodemographic classifications are predominantly suited to target. The challenge for future geodemographic classifications will be how they can adapt to better account for the plethora of niches which make up the long tail of future online retail markets.

The Internet democratizes the dissemination of information and provides consumers with a plethora of tools which enable them to compare products or services, read reviews and search for the best prices. Some of this information is prepared by teams of professional or semi-professional reviewers (e.g. www.gizmodo.com), and some is based on the opinions of the public (e.g. www.tripadvisor.com/). Around 52% of consumers on the Internet use it to compare product information (Nie & Erbring, 2000), and in previous studies the provision of third party consumer information has been shown to have a significant and cumulative effect on consumer online shopping behavior (Ward & Lee, 2000).
Although Levitt and Dubner (2005) argue that these websites have the effect of reducing ‘information asymmetry’ (Akerlof, 1970), the uneven access between those who do and do not engage with new information and communication technologies (Longley & Singleton, 2009a) will likely create a more complex spatial arrangement of those benefiting from this information. As access grows
to online resources which support more informed consumer choices, this will likely affect the aggregate retail behaviors of the “e-engaged” (Longley et al., 2008). This therefore has implications for those neighborhood level segmentations that do not account for such patterning of internet use. For example, a targeted mail shot advertising a new low price for a product may not be as effective if the potential consumer has access to price comparison information indicating that the same product could be purchased elsewhere for the same or a lower price.
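The contrast drawn in this section between the “hits” stocked by a physical store and the long tail available online can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration rather than a figure from Anderson (2006): demand is assumed to follow a Zipf-like curve (demand for the item of rank r proportional to 1/r), and the function name `tail_share`, the catalogue sizes and the shelf capacity are hypothetical.

```python
def tail_share(n_items: int, shelf_capacity: int, exponent: float = 1.0) -> float:
    """Share of total demand that falls outside the top `shelf_capacity` items.

    Assumption (not from the chapter): demand for the item of popularity
    rank r is proportional to 1 / r**exponent, a Zipf-like curve.
    """
    demand = [1.0 / (rank ** exponent) for rank in range(1, n_items + 1)]
    total = sum(demand)
    # Items beyond the shelf capacity are the niches a physical store cannot stock.
    tail = sum(demand[shelf_capacity:])
    return tail / total


if __name__ == "__main__":
    # Hypothetical catalogues: a store limited to 100 shelf slots compared
    # against online catalogues of increasing size.
    for catalogue in (100, 1_000, 10_000):
        print(catalogue, round(tail_share(catalogue, shelf_capacity=100), 3))
```

Under these assumptions the share of demand lying in the tail grows as the online catalogue grows, which is the part of the market that a classification tuned to a small number of well defined hits would struggle to describe.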
IMPLICATIONS AND CHALLENGES

This brief review of those online technologies affecting the usefulness of geodemographics demonstrates a need to revisit the underpinning philosophy and methodology used to justify and construct these spatial representations. There are a number of implications which need to be investigated in this new research agenda. If the individual replaces the neighborhood as the most appropriate scale of analysis, and transactional information linked through either a website logon or a virtual identity becomes a significant resource for targeting effective promotions, then new insight is required into issues of privacy and surveillance, particularly the way in which information gathered about individuals online can be collected, collated and reused. Privacy concerns about geodemographic classification are not a new phenomenon (Goss, 1995); however, if these classifications are to be extended to measure virtual as well as real geographies, further research is now necessary to address a growing body of concern about the way in which online information may impinge on privacy, security and civil liberties (see Alessandro & Ralph, 2006; Miyazaki, 2008; Whysall, 2000). Users of the internet are becoming increasingly
aware of these risks (Madden, Fox, Smith, & Vitak, 2007), and indeed a number of companies now provide consumers with various ways of assessing their digital footprint, both in terms of data transferred² and occurrences of their details across various websites³. These issues are complex, and also have parallels with other real world methods of data collection, for example, in the activities of retailers operating store card schemes. When users collect points on their store card based on the value and items in their shopping basket, they are also providing retailers with a plethora of information about their shopping behaviour. This information is used by retailers to provide targeted promotions and inform store intelligence (Hunby, Hunt, & Philip, 2007), and in the case of some schemes, this information is available beyond the stores in which the data was collected. A further implication for geodemographic classification builders is a requirement for better understanding of how information gathered online relates to offline behaviors, and indeed analysis of whether these are complementary, and as such reinforcing, or contradictory, thus providing new insights. Some research has been completed in terms of social capital accumulation (Wellman, Haase, Witte, & Hampton, 2001) and specifically how these constructs may influence offline behaviors (Blanchard & Horan, 1998; Matei & Ball-Rokeach, 2001). Other researchers have looked at the relationship between engagement with new information communication technologies and the arrangement of these behaviors across real geographic space (Longley & Singleton, 2009a; Longley et al., 2008). The links between online behaviors and offline applications are beginning to be explored; for example, Sulake, a Finnish provider of a virtual world, has started utilizing the platform to produce market research data by surveying 42,000 consumers across 22 countries (Jana, 2007).
Additionally, with the advent of geocoded online content, such
Table 2. A new framework for geodemographic analysis

Purchasing complexity | Space/Place | Cyberplace | Cyberspace
Initiation | Direct Mail targeted by home address. | Direct Mail targeted by online and offline purchasing linked to a store card. | Targeted website adverts based on search criteria – e.g. Google Adverts.
Regular Purchasing | Walk in store recording customer information – e.g. Evans Cycles | Using websites to purchase items which are collected in store. | Recording of online purchases.
Customisation | N/A – Too expensive for the majority of stores to uniquely customise products. | Online recommendations tailored by offline shopping behaviours linked to a store card. | Online recommendations tailored by previous online shopping behaviours.
as the real geographic location of Twitter⁴ feeds, this offers new online information which could potentially be mined for offline spatial intelligence at an individual level, with resulting implications for privacy. An example of this is demonstrated through Twittervision⁵, which plots the spatial location of Twitter feeds onto a Google Map.

In order to advance further research on the challenges and implications posed by the new technologies discussed, this chapter proposes a new framework through which geodemographics can be repositioned. This is a matrix made up of offline (space/place), hybrid (cyberplace) and online (cyberspace) geographic spaces, cross tabulated against three levels of increasing purchasing complexity: initiation, regular purchasing and customization (see Table 2). Of the nine cells in Table 2, each of which contains examples of a range of interactions between suppliers and consumers, traditional geodemographics arguably only function as traditionally conceived for space/place initiations, which are based on the area in which a person lives. Although it could be argued that area classification may add insight into the types of neighborhoods in which an individual consumer lives (where address is known), and as such could be applicable across multiple areas of the matrix, this information is likely to be far less insightful than information mined at an individual level. In this chapter it is argued that the benefit of examining behavior at a neighborhood level is eroded by increased consumer activity within multiple cyberspaces, and that because of these interactions, more sophisticated methods are required to identify and map homogeneous clusters of behavior at the scale of the individual. Although one can adopt a dichotomous view of consumer transactions, where online and offline behaviors are neatly partitioned, online behaviors have relevance to real world consumer habits, and the mining and linking of this type of information could potentially lead to new insights. Given significant evidence that the role of traditional advertising media channels is being eroded as an effective tool for engaging potential consumers (Anderson, 2006; Webster, 1992), it is posited that geodemographics in its current form will experience gradual erosion of its effectiveness unless the concerns presented in this chapter are addressed. Our established understanding of the behaviors which govern consumption is clearly challenged by the new e-infrastructures described in this chapter, and as consumer responses to areal targeting initiatives change, investigation is now required as to how better response rates can be garnered through new methods of segmentation and engagement. These changes represent a shift in our understanding of how consumer behaviors can be modeled, from a top down hierarchical approach where classification builders produce automated spatiality (Thrift & French, 2002) based on users' postcodes, to the type of generative bottom up social science discussed by Batty (2008) and Epstein (2007).
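The framework in Table 2 can be read as a lookup from a pair (geographic space, purchasing level) to an example interaction between supplier and consumer. The following is a minimal sketch of that reading; the cell text is taken from Table 2, while the names `FRAMEWORK`, `traditional_geodemographics_applies` and `cells_beyond_areal_targeting` are hypothetical and introduced only for illustration.

```python
# Cell descriptions are taken from Table 2; the dictionary layout and all
# identifiers are illustrative assumptions, not part of the chapter.
FRAMEWORK = {
    ("space/place", "initiation"): "Direct mail targeted by home address",
    ("cyberplace", "initiation"): "Direct mail targeted by online and offline purchasing linked to a store card",
    ("cyberspace", "initiation"): "Targeted website adverts based on search criteria",
    ("space/place", "regular purchasing"): "Walk in store recording customer information",
    ("cyberplace", "regular purchasing"): "Using websites to purchase items which are collected in store",
    ("cyberspace", "regular purchasing"): "Recording of online purchases",
    ("space/place", "customisation"): "N/A, too expensive for the majority of stores",
    ("cyberplace", "customisation"): "Online recommendations tailored by offline shopping linked to a store card",
    ("cyberspace", "customisation"): "Online recommendations tailored by previous online shopping",
}


def traditional_geodemographics_applies(space: str, level: str) -> bool:
    """The chapter argues that area-based classification functions as
    traditionally conceived only for space/place initiations."""
    return (space, level) == ("space/place", "initiation")


def cells_beyond_areal_targeting() -> list:
    """Cells of the matrix where, on the chapter's argument, individual-level
    information is likely to be more insightful than area classification."""
    return sorted(
        cell for cell in FRAMEWORK
        if not traditional_geodemographics_applies(*cell)
    )


if __name__ == "__main__":
    for space, level in cells_beyond_areal_targeting():
        print(f"{space:12} {level:20} {FRAMEWORK[(space, level)]}")
```

Representing the matrix as a dictionary keyed by both dimensions keeps the nine cells explicit and makes it easy to see that only one of them corresponds to targeting by home address alone.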
REFERENCES

Adams, P. (1998). Network Topologies and Virtual Place. Annals of the Association of American Geographers, 88(1), 88–106. doi:10.1111/1467-8306.00086
Akerlof, G. (1970). The Market for Lemons: Quality Uncertainty and the Market Mechanism. The Quarterly Journal of Economics, 84(3), 488–500. doi:10.2307/1879431
Alessandro, A., & Ralph, G. (2006). Imagined Communities: Awareness, Information Sharing, and Privacy on the Facebook. Lecture Notes in Computer Science, 4258/2006, 36–58.
Anderson, C. (2006). The Long Tail: How Endless Choice Is Creating Unlimited Demand. London: Random House Business Books.
Bainbridge, W. S. (2007). The Scientific Research Potential of Virtual Worlds. Science, 317(5837), 472–476. doi:10.1126/science.1146930
Batey, P. W. J., & Brown, P. J. B. (1995). From Human Ecology to Customer Targeting: The Evolution of Geodemographics. In Longley, P., & Clarke, G. (Eds.), GIS for Business and Service Planning (pp. 77–103). Cambridge: GeoInformation International.
Batty, M. (1997). Virtual Geography. Futures, 29(4-5), 337–352. doi:10.1016/S0016-3287(97)00018-9
Batty, M. (2008). Generative Social Science: A Challenge. Environment and Planning B, 35, 191–194. doi:10.1068/b3502ed
Batty, M., & Barr, B. (1994). The Electronic Frontier: Exploring and Mapping Cyberspace. Futures, 26, 699–712. doi:10.1016/0016-3287(94)90039-6
Benedikt, M. (1991). Cyberspace: First Steps. Massachusetts: MIT Press.
Birkin, M., Clarke, G., & Clarke, M. (2002). Retail Geography and Intelligent Network Planning. Chichester: Wiley.
Blanchard, A., & Horan, T. (1998). Virtual Communities and Social Capital. Social Science Computer Review, 16(3), 293–307. doi:10.1177/089443939801600306
Boornam, S. A., & White, H. C. (1976). Social Structure from Multiple Networks. II. Role Structures. American Journal of Sociology, 81(6), 1384–1446. doi:10.1086/226228
Boyd, D. M., & Ellison, N. B. (2008). Social Network Sites: Definition, History, and Scholarship. Journal of Computer-Mediated Communication, 13, 210–230. doi:10.1111/j.1083-6101.2007.00393.x
Butler, D. (2006). Virtual Globes: The Web-Wide World. Nature, 439(7078), 776–778. doi:10.1038/439776a
Caincross, F. (1997). The Death of Distance: How the Communications Revolution Will Change Our Lives. Boston: Harvard Business School Press.
Calvó-Armengol, A., & Zenou, Y. (2004). Social Networks and Crime Decisions: The Role of Social Structure in Facilitating Delinquent Behavior. International Economic Review, 45(3), 939–958. doi:10.1111/j.0020-6598.2004.00292.x
Castells, M. (2000). Rise of the Network Society. Oxford, MA.
Chan, T. W., & Goldthorpe, J. H. (2007). Social Status and Newspaper Readership. American Journal of Sociology, 112(4), 1095–1134. doi:10.1086/508792
Christakis, N. A., & Fowler, J. H. (2007). The Spread of Obesity in a Large Social Network over 32 Years. The New England Journal of Medicine, 357(4), 370–379. doi:10.1056/NEJMsa066082
Crossley, N. (2007). Social Networks and Extraparliamentary Politics. Social Compass, 1(1), 222–236. doi:10.1111/j.1751-9020.2007.00003.x
Dodge, M., & Kitchin, R. (2001). Mapping Cyberspace. London: Routledge.
Epstein, J. M. (2007). Generative Social Science: Studies in Agent-Based Computational Modeling. Princeton: Princeton University Press.
Freeman, L. C. (2004). The Development of Social Network Analysis: A Study in the Sociology of Science. Vancouver, Canada: Booksurge Publishing.
Galaskiewicz, J. (1979). Exchange Networks and Community Politics. Thousand Oaks, CA: Sage.
Goodchild, M. F. (2007). Citizens as Voluntary Sensors: Spatial Data Infrastructure in the World of Web 2.0. International Journal of Spatial Data Infrastructures Research, 2, 24–32.
Goss, J. (1995). Marketing the New Marketing. The Strategic Discourse of Geodemographic Information Systems. In Pickles, J. (Ed.), Ground Truth (pp. 130–170). New York: Guildford Press.
Harris, R., Sleight, P., & Webber, R. (2005). Geodemographics, GIS and Neighbourhood Targeting. London: Wiley.
Hawe, P., & Ghali, L. (2007). Use of Social Network Analysis to Map the Social Relationships of Staff and Teachers at School. Health Education Research, cyl162.
Hunby, C., Hunt, T., & Philip, T. (2007). Scoring Points: How Tesco Continues to Win Customer Loyalty (2nd ed.). London: Kogan Page.
Jana, R. (2007, August 13). Mining Virtual Worlds for Market Research. Business Week.
Johnston, R., & Pattie, C. (2006). Candidate Quality and the Impact of Campaign Expenditure: A British Example. Journal of Elections, Public Opinion & Parties, 16(3), 283–294. doi:10.1080/13689880600950550
Kitchin, R. (1998). Cyberspace: The World in the Wires. Chichester: John Wiley and Sons.
Levitt, S. D., & Dubner, S. J. (2005). Freakonomics. London: Penguin.
Liben-Nowell, D., Novak, J., Kumar, R., Raghavan, P., & Tomkins, A. (2005). Geographic Routing in Social Networks. Proceedings of the National Academy of Sciences of the United States of America, 102(33), 11623–11628. doi:10.1073/pnas.0503018102
Lipsman, A. (2007). Facebook Sees Flood of New Traffic from Teenagers and Adults. Retrieved from http://www.comscore.com/press/release.asp?press=1519
Longley, P. A. (2005). Geographical Information Systems: A Renaissance of Geodemographics for Public Service Delivery. Progress in Human Geography, 29(1), 57–63. doi:10.1191/0309132505ph528pr
Longley, P. A., & Singleton, A. D. (2009a). Classification through Consultation: Public Views of the Geography of the E-Society. International Journal of Geographical Information Science.
Longley, P. A., & Singleton, A. D. (2009b). Linking Social Deprivation and Digital Exclusion in England. Urban Studies.
Longley, P. A., Webber, R., & Chao, L. (2008). The UK Geography of the E-Society: A National Classification. Environment and Planning A.
Madden, M., Fox, S., Smith, A., & Vitak, J. (2007). Digital Footprints: Online Identity Management and Search in the Age of Transparency. Pew/ Interneto. Document Number.
Matei, S., & Ball-Rokeach, S. J. (2001). Real and Virtual Social Ties: Connections in the Everyday Lives of Seven Ethnic Neighborhoods. The American Behavioral Scientist, 45(3), 550–564.
Miyazaki, A. D. (2008). Online Privacy and the Disclosure of Cookie Use: Effects on Consumer Trust and Anticipated Patronage. Journal of Public Policy & Marketing, 27(1), 19–33. doi:10.1509/jppm.27.1.19
Newton, K., & Brynin, M. (2001). The National Press and Party Voting in the UK. Political Studies, 49, 265–285. doi:10.1111/1467-9248.00313
Nie, N. H., & Erbring, L. (2000). Internet and Society: Preliminary Report. Retrieved from http://www.stanford.edu/group/siqss/Press_Release/Preliminary_Report.pdf
Rheingold, H. (1994). The Virtual Community in a Computerized World. London: Martin Secker and Warburg Limited.
Richardson, M., & Domingos, P. (2002). Mining Knowledge-Sharing Sites for Viral Marketing. Paper presented at the Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining.
Scott, J. (2007). Social Network Analysis (2nd ed.). London: Sage.
Tarran, B. (2007, January 26). Mr Firms Publish First Reports from Second Life. Research.
Thrift, N., & French, S. (2002). The Automatic Production of Space. Transactions of the Institute of British Geographers, 27, 309–335. doi:10.1111/1475-5661.00057
Valentine, G., & Holloway, S. L. (2002). Cyberkids? Exploring Children’s Identities and Social Networks in on-Line and Off-Line Worlds. Annals of the Association of American Geographers, 92(2), 302–319. doi:10.1111/1467-8306.00292
Voas, D., & Williamson, P. (2001). The Diversity of Diversity: A Critique of Geodemographic Classification. Area, 33(1), 63–76. doi:10.1111/1475-4762.00009
Ward, M. R., & Lee, M. J. (2000). Internet Shopping, Consumer Search and Product Branding. Journal of Product and Brand Management, 9(1), 2–20. doi:10.1108/10610420010316302
Wasserman, S., & Faust, K. (1994). Social Network Analysis: Methods and Applications. New York: Cambridge University Press.
Webster, F. E. Jr. (1992). The Changing Role of Marketing in the Corporation. Journal of Marketing, 56(4), 1–17. doi:10.2307/1251983
Wellman, B., Haase, A. Q., Witte, J., & Hampton, K. (2001). Does the Internet Increase, Decrease, or Supplement Social Capital? Social Networks, Participation, and Community Commitment. The American Behavioral Scientist, 45(3), 436–455. doi:10.1177/00027640121957286
White, H. C., Boornam, S. A., & Breiger, R. L. (1976). Social Structure from Multiple Networks. I. Blockmodels of Roles and Positions. American Journal of Sociology, 81(4), 730–780. doi:10.1086/226141
Whysall, P. (2000). Retailing and the Internet: A Review of Ethical Issues. International Journal of Retail & Distribution Management, 28(11), 481–489. doi:10.1108/09590550010356840
Wind, J., & Rangaswamy, A. (2001). Customerization: The Next Revolution in Mass Customization. Journal of Interactive Marketing, 15(1), 13–32. doi:10.1002/1520-6653(200124)15:1<13::AID-DIR1001>3.0.CO;2-#
Development of Cellular Phone Software to Prompt Learners to Monitor and Reorganize Division of Labor in Project-Based Learning

Toshio Mochizuki, Senshu University, Japan
Kazaru Yaegashi, Ritsumeikan University, Japan
Hiroshi Kato, The Open University of Japan, Japan
Toshihisa Nishimori, The University of Tokyo, Japan
Yusuke Nagamori, Tsukuba University of Technology, Japan
Shinobu Fujita, Spiceworks Corporation, Japan
Development of Cellular Phone Software to Prompt Learners to Monitor and Reorganize
displays it on the screen. A classroom evaluation was performed in an undergraduate course; the evaluation confirmed that ProBoPortable enhanced mutual awareness of the division of labor among learners, who modified their own tasks by monitoring the overall status of the PBL. Using ProBoPortable increasingly fostered the sense of a learning community among the subjects. Moreover, social facilitation encouraged the learners to proceed with their own task due to the presence of others who are mutually aware of each member’s status.
INTRODUCTION

In recent years, project-based learning (PBL) has been used extensively as a major educational method in higher education (Gijbels et al., 2005). PBL is a type of learning activity in which a learner studies along with other learners while working toward a common goal and collaborating on tasks as a group. This trend results from the fact that undergraduate students are expected to enhance their creativity and social skills before they graduate and begin participating in society. Throughout PBL, learners rarely perform the same task simultaneously. They prefer to divide a task into smaller tasks and allocate each one to individual group members. Even in cases in which the rules for the division of labor are institutionalized by a teacher or an organization, group members sometimes cross the borders of the division and coordinate their tasks with other members across these borders as the occasion demands. For instance, if the task monitor provides a task performer with some instructions and then notices the task performer’s errors, this implies that the monitor becomes involved in performing the task. Thus, the division of labor is reorganized in a more or less ad lib and ad hoc manner such that the task progresses uninterrupted and error free (Hutchins, 1990). Kato et al. (2004) termed such a crossover of division of labor the “emergent division of labor (EDL)” and argued that EDL should provide extensive opportunities for learning in situations where scaffolding (Wood et al., 1976) takes place naturally. According to
Kato and his colleagues, EDL has the following three characteristics.

• Emergence of division of labor: One can interactively negotiate the border of the division of labor by taking into account what others are doing in their regions. This is called “awareness” in the context of computer support for collaborative work (CSCW).
• Maintenance of division of labor: One can continue coordinating the division of labor with others through a continuous monitoring of their task status. Since stability is achieved through constant negotiation in ever-changing situations, maintenance is a dynamic and aggressive idea.
• Reorganization of division of labor: Based on the monitoring of the status of others’ tasks, one can flexibly reorganize the division of labor as required (ibid, p. 2654).
However, undergraduate students in PBL have very little time to interact with each other on campus for working together or for sharing and discussing their progress or situation. For example, they can meet only in the classroom, while eating lunch, etc. Therefore, PBL in higher education sometimes faces social problems such as social loafing (Latané et al., 1979) and process loss (Steiner, 1972), or it may result in learners dropping out of the project. Such problems are attributed to a lack of collective cognitive responsibility (Scardamalia, 2002), social presence (Short et
Figure 1. ProBo (Left: ProjectHome; Right: ToDo list)
al., 1976), or workspace awareness (Gutwin et al., 1995). In order to help learners assess the current division of labor in a group and then plan and reorganize the next step by themselves, Nishimori et al. (2005) have developed a Web-based groupware for PBL called “ProBo” (formerly “Project Board”) (Figure 1). ProBo has been designed to visualize and allocate tasks among learners, and to share files among those learners during PBL. ProBo has the following four features: (1) ProjectHome, which indicates the manner in which learners should organize their division of labor and the progress of their respective tasks; (2) the ToDo list, which presents tasks structured in the form of a tree diagram; (3) the Scheduler, which allows learners to confirm the schedule for each task—the deadline that needs to be scheduled; and (4) FileBox, which stores the files pertaining to tasks. Learners can communicate with each other on every task, file, and the entire project using the memo function. However, a study in an undergraduate course revealed that even by using ProBo, most students did not possess an overall grasp of the progress of the other members in their group, and they did not reorganize their division of labor. In order to address this issue, the present study aims to design and develop a mobile learning environment that prompts learners to assess the status of their group in real time and maintain and reorganize their division of labor. The authors
expect this mobile platform will facilitate cooperation and stimulate the emergent division of labor among learners, because increased workspace awareness online makes students responsible for working together to complete their collaborative task; students must be aware of what is going on around them as part of the collaboration (Gutwin et al., 1995).
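The ToDo list in ProBo is described above as presenting tasks in the form of a tree diagram, with ProjectHome indicating the progress of each member's tasks. The following is a minimal sketch of such a task tree, not ProBo's actual data model: the class `TodoNode`, its fields, and the rule that a parent's progress is the mean of its children's progress are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TodoNode:
    """One task in a hypothetical ToDo tree (names are illustrative only)."""
    title: str
    assignee: Optional[str] = None   # group member responsible, if any
    progress: float = 0.0            # 0.0 (not started) to 1.0 (completed)
    children: List["TodoNode"] = field(default_factory=list)

    def rollup(self) -> float:
        """Progress of this task; for a parent task, the mean of its subtasks
        (an assumed aggregation rule, not taken from the source)."""
        if not self.children:
            return self.progress
        return sum(child.rollup() for child in self.children) / len(self.children)


def tasks_for(node: TodoNode, member: str) -> List[str]:
    """Titles of all tasks in the tree that are allocated to `member`."""
    found = [node.title] if node.assignee == member else []
    for child in node.children:
        found.extend(tasks_for(child, member))
    return found


if __name__ == "__main__":
    project = TodoNode("Final report", children=[
        TodoNode("Survey", assignee="Member A", progress=1.0),
        TodoNode("Slides", assignee="Member B", progress=0.5),
    ])
    print(project.rollup())
    print(tasks_for(project, "Member B"))
```

A structure of this kind makes both views discussed in the text available from the same data: the overall status of the project and the tasks currently allocated to each member.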
Design and Development of ProBoPortable

ProBoPortable is cellular phone software that tracks the progress and achievement of project tasks in real time; it also displays the division of labor in PBL. ProBoPortable is used on the NTT DoCoMo 901 to 905 series cellular phones and runs in the Java SDK 1.4 development environment. Fogg (2003) indicated that mobile terminals such as cellular phones could be persuasive technologies that increase people’s awareness of their situations and promote the reorganization of their actions. In Japan, cellular phone penetration among undergraduate/graduate students stood at 96.3% as of October 2003 (Hakuhodo, 2004). Under such circumstances, Nakahara et al. (2005) revealed that the cellular phone display was effective in promoting student participation in online discussions. ProBoPortable was designed to work as wallpaper on the screen of the learner’s cellular phone (Figure 2) in order to keep them updated regarding
Figure 2. ProBoPortable Interface (displayed on the cell phone screen)
the progress of their project and to stimulate the division of labor as soon as the necessity arises. The software cooperates with the Web-based groupware ProBo. When ProBoPortable is configured as wallpaper, it automatically retrieves real-time status updates for a project from the ProBo database and displays the status immediately; this occurs whenever a learner activates or uses his/her phone, checks e-mails, etc. Due to the limited screen size of cellular phones, ProBoPortable displays only selected information from the ProBo database. Table 1 shows the information
that the authors selected to display on the screen, based on the requirements of EDL. ProBoPortable displays all the learners working on the project as warehouse keepers. It also displays the current status of each member’s progress, the tasks allocated to each member, the tasks each member has completed, any backlogged tasks, and the overall progress of the project—this is indicated by a dollar amount on the display. The project “income” increases with the completion of each task. As a learner progresses on a task, the corresponding box shifts slightly to the left. When the task is completed the box moves to a different stack. The boxes are color-coded to enable the learner to instantly identify the areas of the project that require urgent attention. When a task has reached its deadline, the color of the corresponding box turns red. Additionally, the background color of each “warehouse” reflects whether learners are accessing or updating their sections on the main project website. Thus, learners are expected to keep track of other members’ status on a daily basis, perform their tasks, and reorganize their division of labor as and when required.
Table 1. Relationship between visualized information on ProBoPortable and ProBo

| Information | Index | Target | Expression |
|---|---|---|---|
| Member(s) | Warehouse keepers and their facial colors | Each member | Each member has a color code |
| Number of tasks | Number of boxes | Box(es) | If a new task is added on ProBo, a new box will be added from above |
| Progress of each task | Shift length | Box corresponding to the task | If a learner makes progress with a task, the corresponding box will shift slightly |
| Approaching the deadline for the task | Color (normal or red) | Corresponding box | If the deadline of a task is approaching, the color of the corresponding box will change to red |
| Progress of the project | Background color (normal or red) | Backgrounds of all the members of the project | If the progress of the project shows a lower value than the benchmark, the color will change to red |
| Progress of the project | Amount of money | Money | If the task is completed, the amount will increase |
| Whether or not each learner has confirmed the status | Background color (of relevant learner(s)) | Relevant learner(s) | If the learner has not confirmed the status of EDL via ProBo/ProBoPortable, his/her background color will change to black |
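The mapping in Table 1 can be read as a small set of rules that turn task data into display attributes. The following Python sketch illustrates that idea only; it is not the authors' implementation (ProBoPortable itself is Java software), and the names `Task`, `box_state`, `project_income`, the ten-step shift scale, the `warn_days` parameter, and the reward of 100 per task are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Task:
    """One task as the phone display might see it (field names are hypothetical)."""
    name: str
    progress: float  # 0.0 (not started) to 1.0 (completed)
    deadline: date


def box_state(task, today, warn_days=0):
    """Translate one task into the visual attributes of its box."""
    completed = task.progress >= 1.0
    days_left = (task.deadline - today).days
    clamped = min(max(task.progress, 0.0), 1.0)
    return {
        # Completed boxes move to a different stack.
        "stack": "completed" if completed else "open",
        # The box shifts further as progress grows; ten steps is an arbitrary scale.
        "shift_steps": round(clamped * 10),
        # Red marks an unfinished task at, or within warn_days of, its deadline.
        "color": "red" if (not completed and days_left <= warn_days) else "normal",
    }


def project_income(tasks, reward_per_task=100):
    """Income grows by a fixed, assumed amount for every completed task."""
    return reward_per_task * sum(1 for t in tasks if t.progress >= 1.0)
```

Setting `warn_days` to 0 follows the running text (red once the deadline is reached), while a positive value follows the wording of Table 1 (red when the deadline is approaching).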
EVALUATION OF PROBOPORTABLE IN AN UNDERGRADUATE COURSE
Course Outline
The research was conducted in an undergraduate course taught by one of the authors at a university in Japan between June 5 and July 10, 2006; during this period, the class met six times. A total of 94 students participated in the course. The students were divided into 20 groups with 4–6 individuals per group. The common objective of each group was to conduct a presentation on the current situation and the prospects of one of the various topics associated with information communication technology. Each group had to conduct a survey on an assigned topic and make suggestions on that topic to improve future society. The web-based groupware, ProBo, served as the standard groupware for every group in this course. During the final session (i.e., the sixth session), each group was allotted 5 minutes to present their research. The topics covered in each session are listed in Table 2.
Research Design
The research was formulated using the split-class design (Carver, 2006) in order to evaluate the software in the classroom, within the context of the course being taught. In accordance with group structure, the students’ preferred topics, and their phone models, 11 students participated in using ProBoPortable. They installed ProBoPortable, and it served as the wallpaper on their phones throughout the four-week period starting June 12. The students also used ProBo. In order to evaluate the enhancement of students’ awareness of their status and the encouragement to assess and reorganize the division of labor, the authors analyzed the operation logs of ProBo and ProBoPortable from June 12 to the final class. In addition, the post questionnaire (85 effective responses, 90.4%), which contained questions regarding the awareness of division of labor, was administered after the final class. In order to investigate the effects of ProBoPortable on the maturation of the groups as learning communities, the post questionnaire contained items from the sense of community scale (Rovai, 2002) that the authors modified to suit PBL. These items were also administered in the pre-test survey (83 effective responses, 88.3%) on June 12 and were used as comparative data. Furthermore, the authors performed a focus group interview after the post questionnaire on July 10 in order to review the conditions under which
Table 2. Class schedule

| Date | Session | Topic |
|---|---|---|
| Jun. 5, 2006 | 1 | Orientation, PowerPoint and ProBo Operation; Research in assigned area (Web, book, papers, etc.); Discuss the research topic |
| Jun. 12 | 2 | Focus research topic; Further research in the topic area; Pre-test Survey on Sense of Community Scale; ProBoPortable Orientation (for ProBoPortable users only, after the class hour) |
| Jun. 19 | 3 | Organize the data studied by previous week; Develop presentation story |
| Jun. 26 | 4 | Making presentation (PowerPoint file) |
| Jul. 3 | 5 | Making presentation (PowerPoint file) |
| Jul. 10 | 6 | Class Presentations / Post-test Survey (a portion of the survey was only for ProBoPortable users); Group Interview (for ProBoPortable users only) |
ProBoPortable was used; these were not recorded in the operation history. Such an additional focus group interview following a questionnaire is often used in usability testing in order to clarify the background of, or reasons for, answers in the questionnaire (Kuniavsky, 2003). In general, a focus group is helpful for obtaining data and insights that would be less accessible without interaction in a group setting: listening to others’ experiences or ideas stimulates the production of memories, ideas, and experiences (Vaughn et al., 1996). The authors considered the focus group useful for identifying the actual user experiences with ProBoPortable because these experiences took place in the participants’ private sphere. One of the authors (not the one who served as the instructor) performed a structured group interview with five students who used ProBoPortable: Andrew, Betty, Caroline, Diana, and Eliza (fictitious names).
Results and Discussion
From June 12 to July 10 (29 days), the number of times each student used ProBoPortable averaged 688.2 (S.D. = 470.6, maximum = 1757, minimum = 253). On average, each student used ProBoPortable 23.7 times a day. The number of times a student accessed his/her group status with ProBoPortable averaged 615.7 (S.D. = 634.3, maximum = 1573, minimum = 187), and the number of times a student accessed the status of a group that was not his/her own group averaged 72.5 (S.D. = 75.1, maximum = 184, minimum = 27).
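The per-day figure and the split between own-group and other-group accesses follow directly from the reported means. The short Python check below reproduces that arithmetic; the variable names are chosen here for illustration, and the share of own-group accesses is a derived figure that the chapter itself does not report.

```python
# Means reported in the text for the 11 ProBoPortable users.
DAYS = 29                 # June 12 to July 10, inclusive
mean_total = 688.2        # mean accesses per student over the period
mean_own_group = 615.7    # mean accesses to the student's own group status
mean_other_groups = 72.5  # mean accesses to other groups' status

per_day = mean_total / DAYS
own_share = mean_own_group / mean_total

print(f"accesses per student per day: {per_day:.1f}")
print(f"share of accesses spent on own group: {own_share:.1%}")
```

The two partial means add up to the overall mean, which is consistent with every access being counted as either an own-group or an other-group access.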
Students’ Evaluation of ProBoPortable
Table 3 shows the overall evaluation of ProBoPortable. All 11 users answered questions on a 4-point Likert scale. The learners rated the items related to the collaboration among group members highly. Almost all learners provided a positive answer for the question regarding whether the system was helpful in determining how to proceed with the project. In addition, the results showed that learners felt encouraged when they saw the status of tasks, which were represented by boxes. On the other hand, the results showed that direct interaction among learners was not achieved very often. However, this does not imply that ProBoPortable prevented mutual task coordination among learners, as described below.
Table 3. Summative evaluation of ProBoPortable (4-point Likert scale)

| Item | Average |
|---|---|
| (1) I have checked the name of group members using the keypad. | 3.00 |
| (2) I felt an urgent need to complete a task when I saw a box dropping down. | 3.18 |
| (3) I felt that I had achieved something when any box was cleared. | 3.27 |
| (4) I felt that I had achieved something when the amount of money increased. | 2.82 |
| (5) I have contacted a group member after I saw the display of ProBoPortable. | 1.45 |
| (6) I felt a sense of community with other group members. | 3.18 |
| (7) I have monitored the progress of group members in other projects with horizontal keypad as needed. | 3.27 |
| (8) ProBoPortable was helpful to review how to proceed with the project, and any reorganization necessary. | 3.64 |
| (9) It took time to become familiar with ProBoPortable. | 2.00 |
| (10) ProBoPortable was helpful with project collaboration. | 3.36 |
| (11) I would like to use a tool like ProBoPortable in a group project in the future, if it is available. | 3.64 |
Awareness of Collaboration Status and its Effects on Students’ Activities
In order to accurately confirm the effect of ProBoPortable, the Mann-Whitney U-test was administered to examine the differences between students who used ProBoPortable (n = 11, hereafter referred to as the “Experimental Group”) and those who did not (n = 83, hereafter referred to as the “Control Group”) with regard to the 20 questions (see Table 4) on the students’ self-evaluation of their PBL with a 5-point Likert scale. According to the test results, significant differences were observed in items such as “I was aware of the progress of each task undertaken by the other members” (U = 138.0, p < .001), and “I monitored the pace at which other group members were working and adjusted my pace accordingly” (U = 233.0, p < .01). These results indicate that ProBoPortable helped students to understand the status of others’ tasks as well as confirm whether others were aware of their own progress in the task. ProBoPortable also helped members to flexibly adjust their own task, as necessary, by continuously monitoring others’ status as well as their own.

Analysis of the operation log (the number of people accessing each function per day) of the Web-based groupware ProBo showed that significant differences existed between the Experimental Group and the Control Group with regard to the number of times participants accessed the ToDo list and the Scheduler. The ToDo list organizes and displays the PBL tasks (the Experimental Group averaged 0.175, the Control Group averaged 0.101, U = 273.5, p < .05) and the Scheduler confirms the prospects of PBL (the Experimental Group averaged 0.357, the Control Group averaged 0.142, U = 311.0, p < .05). The group using ProBoPortable accessed the task profiles more often (the Experimental Group averaged 1.11, the Control Group averaged 0.86, U = 322.5, n.s.), and they also modified tasks more often (the Experimental Group averaged 0.12, the Control Group averaged 0.08, U = 324.5, n.s.); however, no significant statistical difference was observed. Therefore, the results suggest that ProBoPortable encouraged students to confirm their task status and schedule, and assess the division of labor of their learning activity.

Furthermore, significant differences and trends were observed with regard to items such as “From time to time, I wanted to talk to other member(s) outside the classroom to negotiate a protocol for proceeding with the project” (U = 250.0, p < .05) and “I frequently contacted other group member(s) outside the classroom in connection with the group activities” (U = 274.0, p < .10). This suggests that ProBoPortable presents opportunities to generate learning activities and mutual adjustment outside the classroom.

Results of the group interview indicated that social facilitation by coaction effects encouraged the students to perform the task allocated to them rather than reorganize their division of labor.

Interviewer: Well, when you found that the color of the box changed to red, how did you feel or what did you do?
Caroline: I would just feel pressured.
Diana: I felt I should perform my tasks urgently.
Caroline: I would feel uncomfortable if the professor could see my status and recognize that I had some overdue tasks. If the professor accessed ProBoPortable, wouldn’t you (feel the same)?
Diana: I was more worried about the other members being aware of my status, rather than the professor.
Betty: Well, I thought that the members may get angry at my progress, or something like that.
Caroline: They would think “Oh, she stopped her job again” or something like that.
Betty: They say “it’s OK,” when they meet us in face-to-face, but…
Diana: Yeah, that is true, although no one speaks their mind. So I felt I have to do my tasks as early as possible.
The statement “I was more worried about the other members being aware of my status, rather than the professor” indicated that the students felt evaluation apprehension from the other members in their group; that is, each student using ProBoPortable knew that the other members could watch his/her status, just as he/she could watch theirs. The questionnaire survey also revealed that the item “I think that the other group members were also aware of the progress of my tasks” (U = 172.5, p < .01) was significantly higher in the Experimental Group. This result supports the view that there was mutual monitoring and evaluation among students in the group. Such evaluation apprehension promoted coaction effects, which promoted students’ work on their own tasks (Cottrell et al., 1968). However, this did not imply that ProBoPortable prevented mutual task coordination among students. The following excerpt from the group interview is evidence that students were sometimes encouraged to communicate with each other in order to coordinate their division of labor if they felt some anxiety from seeing their status on ProBoPortable.

Caroline: Eliza phoned me.
Eliza: Yeah, yesterday.
Caroline: She said “It seems that I’m the only one working on ProBo. Are you aware of it? Are we OK?”
Eliza: Because all of you weren’t working on ProBo.
Betty: Sorry, sorry.
Caroline: No, it’s OK. I don’t blame you.
Interviewer: So, is there anyone who called your members or sent e-mails after noticing their status on your mobile?
Caroline: Us? (pointing to Diana and Eliza)
Eliza: But we usually call each other anyway, because we’re close friends.
Interviewer: I see. The two of you have been friends. How about the other students? Well, for example, when you noticed there were some members whose background color changed to black, what did you do?
Caroline: I would definitely call the members whose colors become black.
Betty: Me, too. I would send a message on the mailing list to encourage them.
Caroline: Yeah. I would send them “Are you all right?” Because I wasn’t expecting that any of the members’ background colors would change to black.
Betty: We would be worried if one of our backgrounds had changed to black.
Diana: Yeah, even if the color changes to red, I’d get worried.
Caroline: Definitely. I would send a message to ask their condition. Well, we would probably have communicated more if we had belonged to the same college and were acquainted with each other. However, we usually don’t have many opportunities to meet each other.

This dialogue indicated that the movement of the warehouse keepers or a change in the color of other students’ backgrounds made the students aware of each other’s activeness, progress, and status, which then prompted the question “Are you all right?” However, we could not observe such situations often because social loafing or dropping out had not seriously occurred, and therefore, direct interaction seldom occurred among students during the experiment. Consequently, it can be concluded that to a certain extent, ProBoPortable provided students with opportunities for the mutual reorganization of division of labor. Moreover, we believe that our system could have encouraged mutual interaction if a situation had occurred where members needed to support each other. In this experiment, however, such situations did not arise or were infrequent; therefore, the effect of social facilitation was efficacious in the sense that they had to perform the allocated tasks on their own rather than cross the border of the division of labor.
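The group comparisons in this section rest on the Mann-Whitney U statistic, which is computed from the ranks of the pooled responses. The Python sketch below shows the calculation with average ranks for ties; it is an illustration under stated assumptions, not the authors' analysis script. The function name `mann_whitney_u` and the sample responses are hypothetical, and the sketch returns only the two U values (the p-values reported above additionally require the null distribution of U, which is not computed here).

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b), using average ranks for tied values."""
    pooled = sorted(
        [(v, "a") for v in sample_a] + [(v, "b") for v in sample_b],
        key=lambda pair: pair[0],
    )
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold 1-based ranks i+1..j, whose mean is (i + 1 + j) / 2.
        avg_rank = (i + 1 + j) / 2
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == "a")
        i = j
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical 5-point Likert responses, not data from the study.
experimental = [5, 5, 4]
control = [3, 3, 2, 4]
u_exp, u_ctl = mann_whitney_u(experimental, control)
print(u_exp, u_ctl)
```

The two values always sum to the product of the group sizes, and the smaller of the two is the one conventionally reported as U.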
Figure 3. Effects on the sense of learning community and its subscores
Effects on the Sense of Learning Community
With regard to the sense of community scale, a two-way multivariate analysis of variance (MANOVA) was used to examine the effects of “week 2–week 6” (the initial and the final stages of the group) and of “with/without ProBoPortable” (groups with ProBoPortable or those without it). This was done in order to determine the influence of the interaction between these two factors on the maturation of groups as learning communities. There were statistically significant main effects of “week 2–week 6” and “with/without ProBoPortable” on the scale score of the sense of learning community (F (1,1) = 4.386, p < .05 and F (1,1) = 4.549, p < .05, respectively), though there was no significant interaction (Figure 3). For a detailed examination of the significance of such effects, additional two-way MANOVAs were applied to the two subscores of the scale, “Connectedness” and “Learning.” There were statistically significant main effects of “week 2–week 6” and “with/without ProBoPortable” on the subscore “Learning” (F (1,1) = 5.392, p < .05 and F (1,1) = 5.540, p < .05, respectively), though there was no significant interaction. Furthermore, marginally significant effects of “week 2–week 6” and “with/without ProBoPortable” were observed on the subscore “Connectedness” (F (1,1) = 2.935, p < .10 and F (1,1) = 2.842, p < .10, respectively), though there was no significant interaction. Palloff & Pratt (1999) indicated that enhancing the sense of learning community is important to motivate and prompt students to learn in a distributed environment. The results revealed that the subscore “Learning” significantly increased; thus, these findings indicate that ProBoPortable enhanced not only mutual awareness of the division of labor among learners, but also the sense of learning community, which promotes their learning in PBL.
REDESIGNING TO ACHIEVE CLASSWIDE MUTUAL ASSESSMENT AND TO MAKE THE ENTIRE CLASS A LEARNING COMMUNITY
As mentioned earlier, ProBoPortable facilitated cooperation, stimulated the emergent division of labor among learners, and enhanced the sense of learning community among learners in a project group. In order for PBL to be more successful in a course, it is important to prevent social psychological disincentives such as group polarization or groupthink within project groups (Janis, 1982). Ozawa and his colleagues (2004) mentioned that intergroup interaction provides opportunities for learners to review the activities of their own group. On the basis of the aforementioned result, such intergroup interaction is also expected to foster a sense of learning community in the entire class. Therefore, ProBo and ProBoPortable were redesigned in order to foster an overall reflection of the project group activity by not only monitoring the status of the project members, but also mutually assessing the class-wide projects. ProBo
Table 5. Additional information that the revised versions of ProBoPortable and ProBo display

| Design principle | Name | Information (in detail) | Objective |
|---|---|---|---|
| Learners can learn by obtaining an unobstructed view of all other learner activities at any point of time | Attention ranking | Ranking based on how often every group has attracted attention from other groups recently | To compare the attention attracted by a learner’s own group with that of other groups in order to review his/her own group’s activity, to prompt learners to access the status of groups that are at the center of attention for reference, and to generate evaluation apprehension among groups |
| (same principle) | Login ranking | Ranking based on how often each group member has logged in to ProBo recently | To compare the activity level of a learner’s own group with that of other groups, and to prompt learners to access the status of active groups for reference |
| (same principle) | Recent actions | Most recent operations performed by learners in other groups | To make learners aware of the activity within a learner’s own group and that of other groups, and to prompt learners to access the status of active groups for reference |
| Learners can learn from fragments of conversation overheard among other learners | Recent memos | Most recent communication using memos on tasks, files, or ProjectHome of ProBo | To overhear communications among other groups’ students, and to obtain useful hints to make progress with their project |
and ProBoPortable were enhanced using the following two principles, based on recent research on learning spaces (Oblinger, 2006) and on educational environments in design education, where general courses were provided in PBL format.

• Learners can learn by obtaining an unobstructed view of all other learner activities at any point of time: Classrooms or learning spaces where learners actively engage in their learning activities differ from traditional classrooms, which have seats and tables arranged in rows and the instructor at the front of the room, and where learners felt they had less responsibility for participation. New age learning spaces provide multiple focal points in classrooms, promote visibility, and provide learners with an unobstructed view of other learners’ work or activities within other groups whenever they are within the classroom. Learners can observe how other learners proceed and be inspired by other groups’ activities.
• Learners can learn by hearing conversations among other learners: The abovementioned learning spaces also provide accessibility in order to facilitate casual conversations among learners or groups. Moreover, sometimes, learners can overhear fragments of conversation from outside their group and take a hint to extend their learning activity.

Such characteristics of learning environments are considered to provide learners with opportunities to reflect on their own group’s progress by comparing one’s group with other groups, and sometimes opportunities of appropriation, i.e., “the process of taking something that belongs to others and making it one’s own” (Wertsch, 1998, p. 53). ProBo and ProBoPortable were redesigned and extended to display two types of information in order to accomplish the abovementioned principles (Table 5).

• Information on recent activities in all other groups: ProBo and ProBoPortable both provide information on “Attention ranking,” “Login ranking,” and “Recent actions.” Attention ranking indicates how often each group has been monitored by other groups. It is expected to prompt learners to access the status of groups that are at the center of attention for reference, and to enhance the evaluation apprehension among groups in order to stimulate their activities. Both “Login ranking” and “Recent actions” are also intended to make learners aware of the activities of other groups.
• Information on recent communication on ProBo and ProBoPortable in all other groups: Learners can communicate with each other regarding tasks to be completed, files to be shared for the project tasks, and the overall status of their own project by using the memo function. ProBo and ProBoPortable display headings of the other groups’ recent messages. Learners can be aware of where interactions are occurring or how other groups are progressing in their project(s).
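One way to read “Attention ranking” is as a count, over a recent time window, of how often each group's status was viewed by members of other groups. The Python sketch below illustrates that reading only. The chapter does not specify how “recently” is defined or how the log is stored, so the seven-day window, the `(timestamp, viewer_group, viewed_group)` log format, and the function name `attention_ranking` are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def attention_ranking(view_log, now, window_days=7):
    """Rank groups by how often members of other groups viewed them recently.

    view_log: iterable of (timestamp, viewer_group, viewed_group) tuples.
    Returns a list of (group, view_count) pairs, most viewed first.
    """
    cutoff = now - timedelta(days=window_days)
    counts = Counter(
        viewed
        for ts, viewer, viewed in view_log
        # Keep only recent views, and ignore a group looking at itself.
        if ts >= cutoff and viewer != viewed
    )
    # Most-viewed first; ties are broken by group name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical log entries for three groups.
log = [
    (datetime(2007, 6, 9, 9, 0), "A", "B"),
    (datetime(2007, 6, 9, 10, 0), "C", "B"),
    (datetime(2007, 6, 8, 15, 0), "B", "A"),
    (datetime(2007, 6, 9, 11, 0), "A", "A"),   # self-view, not counted
    (datetime(2007, 5, 20, 9, 0), "C", "A"),   # outside the window, not counted
]
print(attention_ranking(log, now=datetime(2007, 6, 10, 12, 0)))
```

A “Login ranking” could be sketched the same way by counting login events per group instead of cross-group views.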
The revised version of ProBo displays “Attention ranking” and “Login ranking” on the left-hand side of ProjectHome, and “Recent actions” and “Recent memos” on the left-hand side of the other functions except ProjectHome (see Figure 4 for an example). The current status of different groups is immediately retrieved from the groupware database whenever a learner logs onto ProBo, accesses a function, or refreshes the page. Learners can access the corresponding information on other projects by clicking on hyperlinks pertaining to each aspect of the information. On
Figure 5. Ticker field headings displayed on ProBoPortable
the other hand, ProBoPortable displays all the above-mentioned information in the ticker field whenever a learner opens and activates his/her phone; current information is updated whenever ProBoPortable is activated or refreshed. A pilot study using this revised version of ProBo and ProBoPortable, conducted in another undergraduate course, revealed that providing information regarding other groups fostered evaluation apprehension among learners’ groups and stimulated their activities via social facilitation. It also indicated that students were encouraged to reflect on their division of labor, rethink the protocol of their project, and communicate with each other. This pilot study was conducted with a rather limited number of students. Therefore, we intend to examine the effect of the revised version of ProBoPortable empirically in future work.
Figure 4. Information on recent actions displayed on ProBo
CONCLUSION
The authors developed ProBoPortable—a cellular phone application for enhancing certain functions in Web-based groupware. In order to facilitate cooperation and stimulate the emergent division of labor among learners, ProBoPortable displays the status of division of labor on the screen of a cellular phone. A classroom evaluation in an undergraduate course confirmed that ProBoPortable enhanced mutual awareness of
division of labor among learners, who modified their own tasks by monitoring the overall status of the PBL project. The sense of a learning community was significantly fostered through the use of ProBoPortable. Moreover, the learners were encouraged to proceed with their own tasks due to the social facilitation caused by the presence of other members who were mutually aware of each other’s status. This evaluation indicated that each student engaged in a relative evaluation of his/her own task by checking the status of others’ tasks. As a result, the tasks were performed smoothly as mutually assigned, with subsequent reorganization of the division of labor. With regard to research on online learning communities, this study indicates the importance of blending or bridging virtual learning communities and real learning activities by using a mobile learning platform in order to stimulate students’ learning. A mobile learning platform such as ProBoPortable can be a useful tool for not only PBL but also ICT-enabled learning, including e-learning. Mobile and ubiquitous technologies can help one to feel the social presence of learning communities at any time and place; they help foster the sense of a learning community and encourage learners to stimulate and review their learning via social facilitation, even in an informal learning environment. With regard to future studies, it will be necessary to redesign the software in a manner that allows for a prompt and flexible evaluation and the seamless reorganization of division of labor in both formal and informal learning.
ACKNOWLEDGMENT
A part of this research was supported by Grants-in-Aid for Young Scientists (B) (Subject No. 17700607 & 19700630, representative: Toshio Mochizuki; Subject No. 20700652, representative: Kazaru Yaegashi), and a Grant-in-Aid for Scientific Research (B) (Subject No. 19300290, representative: Hiroshi Kato) from the Japanese Ministry of Education, Culture, Sports, Science and Technology.

REFERENCES
Carver, S. M. (2006). Assessing for Deep Understanding. In Sawyer, R. K. (Ed.), The Cambridge Handbook of the Learning Sciences (pp. 205–223). New York: Cambridge University Press.
Cottrell, N., Wack, D., Sekerak, G., & Rittle, R. (1968). Social facilitation of dominant responses by the presence of an audience and the mere presence of others. Journal of Personality and Social Psychology, 9(3), 245–250. doi:10.1037/h0025902
Fogg, B. J. (2003). Persuasive Technology: Using Computers to Change What We Think and Do. New York: Morgan Kaufmann Publishers.
Gijbels, D., Dochy, F., Van den Bossche, P., & Segers, M. (2005). Effects of Problem-based Learning: A Meta Analysis from the Angle of Assessment. Review of Educational Research, 75(1), 27–61. doi:10.3102/00346543075001027
Gutwin, C., Stark, G., & Greenberg, S. (1995). Support for workspace awareness in educational groupware. In Proceedings of CSCL’95 Conference (pp. 147–156). Bloomington, IN, October 1995.
Hakuhodo, Inc. (2004). The survey on the penetration rate and attitude in the use of cell phones and the Personal Handy-phone System among young people (as of October 2003). Tokyo: Hakuhodo, Inc. (in Japanese)
Hutchins, E. (1990). The Technology of Team Navigation. In Galegher, J., Kraut, R. E., & Egido, C. (Eds.), Intellectual Teamwork: Social and Technological Foundations of Cooperative Work (pp. 191–220). Hillsdale, NJ: Lawrence Erlbaum Associates.
Janis, I. L. (1982). Groupthink: Psychological studies of policy decision and fiascoes (2nd ed.). Boston, MA: Houghton Mifflin.
Kato, H., Mochizuki, T., Funaoi, H., & Suzuki, H. (2004). A Principle for CSCL Design: Emergent Division of Labor. In L. Cantoni & C. McLoughlin (Eds.), Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications 2004 (pp. 2652–2569). Chesapeake, VA: AACE.
Kuniavsky, M. (2003). Observing the User Experience: A Practitioner’s Guide to User Research. San Francisco: Morgan Kaufmann.
Latané, B., Williams, K., & Harkins, S. (1979). Many hands make light the work: The causes and consequences of social loafing. Journal of Personality and Social Psychology, 37, 343–356.
Nakahara, J., Hisamatsu, S., Yaegashi, K., & Yamauchi, Y. (2005). iTree: Does the mobile phone encourage learners to be more involved in collaborative learning? In T. Koschmann, D. Suthers & T.W. Chan (Eds.), Computer Supported Collaborative Learning 2005: The Next 10 Years! (pp. 470–478). Mahwah, NJ: Lawrence Erlbaum Associates.
Nishimori, T., Kato, H., Mochizuki, T., Yaegashi, K., Hisamatsu, S., & Ozawa, S. (2005). Development and Trial of Project-Based Learning Support System. In G. Richards (Ed.), Proceedings of World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2005 (pp. 966–971). Chesapeake, VA: AACE.
Oblinger, D. G. (Ed.). (2006). Learning Spaces. Washington, D.C.: EDUCAUSE.
Ozawa, S., Mochizuki, T., Egi, H., & Kunifuji, S. (2004). Facilitating Reflection in Collaborative Learning Using Formative Peer Evaluation among Groups. [in Japanese]. Japan Journal of Educational Technology, 28, 281–294.
Palloff, R. M., & Pratt, K. (1999). Building Learning Communities in Cyberspace: Effective Strategies for the Online Classroom. San Francisco, CA: Jossey-Bass Publishing.
Rovai, A. P. (2002). Development of an instrument to measure classroom community. The Internet and Higher Education, 5, 197–211. doi:10.1016/S1096-7516(02)00102-1
Scardamalia, M. (2002). Collective cognitive responsibility for the advancement of knowledge. In Jones, B. (Ed.), Liberal Education in the Knowledge Age (pp. 67–98). Chicago, IL: Open Court.
Short, J., Williams, E., & Christie, B. (1976). The Social Psychology of Telecommunications. London: John Wiley & Sons.
Steiner, I. D. (1972). Group process and productivity. New York: Academic Press.
Vaughn, S., Schumm, J. S., & Sinagub, J. (1996). Focus Group Interview in Education and Psychology. Thousand Oaks, CA: Sage.
Wertsch, J. V. (1998). Mind as action. New York: Oxford University Press.
Wood, D., Bruner, J. S., & Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, and Allied Disciplines, 17, 89–100. doi:10.1111/j.1469-7610.1976.tb00381.x
Chapter 24
Mathematical Retrieval Techniques for Online Mathematics Learning
Le Van Tien, Hochiminh City University of Technology, Vietnam
Quan Thanh Tho, Hochiminh City University of Technology, Vietnam
Hui Siu Cheung, Nanyang Technological University, Singapore
ABSTRACT
In recent years, the number of computer-aided educational software packages in the field of mathematics has been increasing. Currently, there are some research prototypes and systems that assist in finding mathematical problems. However, when finding appropriate mathematical expressions, most of these systems only support searching for expressions in a strictly exact manner, or searching for similar problems based on wildcards, not on the similarity of expression structures and semantic meanings. Such mechanisms significantly restrict users from achieving meaningful and accurate search results for mathematical expressions. In this chapter, we introduce a mathematical retrieval system that helps mathematics learners self-study effectively. The most important module in our system is the math-retrieving module, which receives the analyzed problems submitted by users, retrieves solutions from similar stored problems and presents the retrieved problems to users in ranked order. To fulfill these requirements, we have researched and proposed some advanced mathematical retrieval and mathematical ranking techniques. Experiments have shown that our proposed techniques are highly suitable for mathematical retrieval, as they outperformed the techniques used in typical document retrieval systems.
DOI: 10.4018/978-1-60960-040-2.ch024
INTRODUCTION
In recent years, Information Retrieval has emerged as a significant field in terms of both research and development for many prototypes and applications intended to process information on the Internet. Names such as Google (Google), Yahoo! Search (Yahoo), Baidu (Dawn, 2007) and Bing (Schofield, 2009) have become familiar to Internet users nowadays. These search engines have
attracted increasing numbers of Internet users because of their powerful ability to find information quickly among the huge resources available on the World Wide Web. However, the current generation of search engines has shown some prominent shortcomings in searching for semantic information. For example, with the query “Where is the Sun Flower?”, it is not easy to infer the real semantics associated with the term “Sun Flower”, which can be a kind of flower or a company name. Clearly, this question cannot be answered precisely if we rely merely on the lexical form of the words. Thus, a search engine that can search semantic content effectively is highly desirable, which has motivated recently emerging semantic search engines such as Wolfram Alpha (Johnson, 2009). This research investigates mining and retrieving semantic information from a type of data other than text. In our system, we have attempted to build a module that can retrieve mathematical content precisely. One important advantage of mathematical data is that it conveys a higher semantic level than textual data. For example, when encountering the term log in a mathematical expression, we can be certain about the semantic meaning associated with this term, which is the logarithm function. Currently, there are some research prototypes and systems that assist in finding mathematical problems, such as MathWebSearch (Kohlhase & Sucan, 2006) and MathDex (Miner & Munavalli, 2007). However, when finding appropriate mathematical expressions, most of these systems only support searching for expressions in a strictly exact manner, or searching for similar problems based on wildcards, not on the similarity of expression structures and semantic meanings. Such mechanisms significantly restrict users from achieving meaningful and accurate search results for mathematical expressions.
Therefore, a mathematics search engine based on the similarity of mathematical expressions is of considerable value in helping mathematics learners reach desirable solutions quickly. In this chapter we also discuss applying a mathematics search engine in the education domain, rather than merely its technical aspects. Thus, we propose a mathematical retrieval system that helps mathematics learners self-study effectively. The proposed system consists of the following major modules. First, a math-browser module has been developed to help learners browse classes of mathematics problems in a friendly and organized manner. Next, a testing module has been built to enable learners to take a trial test and get the results online. Lastly, a math-retrieving module has been constructed to assist users in solving exercises based on information retrieval techniques. This module can help users find similar problems when trying to solve a specific problem. In addition, the system accepts mathematical expressions from users as handwritten input. During the development of such a system, we have researched and employed the following advanced mathematics retrieval techniques:
• Mathematical retrieval: We have proposed a technique to process and retrieve mathematical data, adapted from the typical vector space model and tf•idf weights that are widely used for document retrieval (Baeza-Yates & Ribeiro-Neto, 1999).
• Mathematical ranking: While the adapted tf•idf technique is useful for retrieving mathematical problems, it is not effective for ranking the retrieved problems, due to the specific meaning implied by mathematical symbols and formulas. Thus, we develop a graph-based matching approach for the ranking problem. Our approach combines the Hungarian algorithm (Kuhn, 1956) with a self-developed tree-based matching algorithm to deal with a variety of mathematical problems at different levels of complexity.
The rest of the chapter is organized as follows. In the next section, we provide readers with some background knowledge, which includes information retrieval, mathematical retrieval and the matching problem. Then, we present the architecture of our proposed system, followed by a case study on how to use it. Next, we discuss in detail our core techniques for mathematical retrieval and ranking. Performance evaluation is given in the following section. Finally, the chapter concludes with a discussion of conclusions and future work.
BACKGROUND
In our proposed system, we let users input mathematics problems and then retrieve similar problems from our database, whose solutions are returned to the users as hints and references. Thus, we furnish this section with some background on information retrieval and mathematical retrieval. Moreover, since retrieval of mathematical data is more specialized and more complicated than that of textual data, we also discuss some matching techniques employed specifically for this case.
Document Information Retrieval
Information Retrieval System
Information retrieval (IR) is the science of searching for information in documents, searching for documents themselves, searching for metadata which describe documents, or searching within databases, whether relational stand-alone databases or hypertextually networked databases such as the World Wide Web. IR is interdisciplinary, based on computer science, mathematics, library science, information science, cognitive psychology, linguistics, statistics and physics. An information retrieval system (IR system) comprises three components: input, processor, and output.
The input of an IR system includes queries and documents. Queries are formal statements of information needs, for example search strings in web search engines. In an information retrieval system, a query does not uniquely identify a single document in the collection. Instead, several documents may match the query, perhaps with different degrees of relevancy. The processor is the part of the retrieval system concerned with the retrieval process. The process may involve structuring the information in some appropriate way, such as classifying or clustering it. It also involves performing the actual retrieval function, that is, executing the search strategy in response to a query. The output is usually a set of citations or document numbers. In an experimental system, an evaluation step is often required after the output is obtained.
Vector Space Model and Similarity Measures
In most modern document retrieval systems, documents are processed and represented as numerical vectors, so that the set of documents forms a so-called vector space model. The similarities between documents are then calculated via similarity measures on the vectors, which are typically based on tf•idf weights. These concepts are briefly discussed as follows. Vector space model. The vector space model (or term vector model) is an algebraic model for representing text documents (and objects in general) as vectors of identifiers such as index terms. It is used in information filtering, information retrieval, indexing and relevancy ranking. In the vector space model, a document is represented as a vector. Each dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is nonzero. Several different ways of computing these values, also known as (term) weights, have been developed. One of the best-known schemes is tf•idf, as mentioned above. Typically, terms are single words, keywords, or longer phrases. If the words
are chosen to be the terms, the dimensionality of the vector is the number of words in the vocabulary (the number of distinct words occurring in the corpus). Term Frequency–Inverse Document Frequency (tf•idf). The tf•idf is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus. The term frequency in the given document is simply the number of times a given term appears in that document. This count is usually normalized to prevent a bias towards longer documents (which may have a higher term frequency regardless of the actual importance of that term in the document) to give a measure of the importance of the term $t_i$ within the particular document $d_j$:
$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}}$$
where $n_{i,j}$ is the number of occurrences of the considered term in document $d_j$, and the denominator is the number of occurrences of all terms in document $d_j$. The inverse document frequency is a measure of the general importance of the term (obtained by dividing the number of all documents by the number of documents containing the term, and then taking the logarithm of that quotient):
$$\mathrm{idf}_{i} = \log \frac{|D|}{|\{d_j : t_i \in d_j\}|}$$
where $|D|$ is the total number of documents in the corpus, and the denominator is the number of documents in which the term $t_i$ appears (that is, $n_{i,j} \neq 0$). Then
$$\mathrm{tfidf}_{i,j} = \mathrm{tf}_{i,j} \cdot \mathrm{idf}_{i} = \frac{n_{i,j}}{\sum_{k} n_{k,j}} \cdot \log \frac{|D|}{|\{d_j : t_i \in d_j\}|}$$
Similarity Measures. Various algorithms are available to measure similarities between document vectors. Two of the most commonly used are Euclidean distance and cosine measure.
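The weighting scheme and the two similarity measures can be illustrated with a minimal Python sketch. It is not the implementation used in our system: the function names, the toy corpus, and the choice of a base-10 logarithm are assumptions made purely for illustration.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return (vocabulary, one tf-idf vector per document).

    Each document is a non-empty list of terms.
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)  # number of occurrences of all terms in the document
        vectors.append(
            [(counts[t] / total) * math.log10(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def euclidean(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical three-document corpus, already split into terms.
corpus = [
    ["java", "compiler", "java"],
    ["python", "interpreter"],
    ["java", "interpreter"],
]
vocab, vectors = tf_idf_vectors(corpus)
similarity_0_2 = cosine(vectors[0], vectors[2])
```

A term that occurs in every document receives an idf of log(1) = 0, so it contributes nothing to either measure, which is the intended "offset by the frequency of the word in the corpus" effect.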
Mathematical Retrieval
Currently, several research prototypes have been proposed for mathematical retrieval, each with its own approach and achievements. In this section, we first review some related approaches and projects that aim to retrieve mathematical content. Then, our proposed method is presented. MathWebSearch. MathWebSearch is a mathematical search engine developed by Michael Kohlhase and Ioan Sucan (Kohlhase & Sucan, 2006). The MathWebSearch system harvests the web for content representations (currently MathML and OpenMath) of formulae and indexes them with substitution tree indexing, a technique originally developed for accessing intermediate results in automated theorem proving. For querying, the development team presents a generic language extension approach that allows queries to be constructed by minimally annotating existing representations. In summary, the result returned by this project is the set of all problems that share the same expression or sub-expressions with the query; no ranking technique is proposed. The prototype of this project can be found at http://search.mathweb.org/. Other projects, such as Mbase (Kohlhase & Franke, 2001), have approaches and results similar to this project.
MathDex. The MathDex project (Miner & Munavalli, 2007) proposes an approach to searching for mathematical notation. The approach aims at a search system that can produce good results with a large portion of the mathematical content freely available on the World Wide Web today. The basic concept is to linearize mathematical notation as a sequence of text tokens, which are then indexed by a traditional text search engine. For adequate precision and recall in the mathematical context, more complex combinations of atomic queries are required. The approach is to query for a weighted collection of significant sub-expressions, where weights depend on expression complexity, nesting depth, expression length, and special boosting of well-known expressions. In summary, the result returned by this project is the set of all problems that share the same expression or sub-expressions with the query, sorted by the number of shared sub-expressions and the number of mathematical terms they contain. The prototype of this project can be found at http://www.mathdex.com/. In general, the current mathematical search engines aim at retrieving mathematical content on the Internet in a short time. Almost all of these systems retrieve mathematical content based on exact matching of sub-expressions. A search engine that actually retrieves mathematical content based on mathematical similarity is rarely seen. In our project, we suggest an approach to retrieving mathematical content that is based on the level of similarity between two mathematical expressions. In order to reduce the execution cost of the retrieval method, we propose to classify mathematics problems into different clusters before indexing them. The retrieval process is then performed on the generated clusters, rather than on single expressions. In order to calculate the similarity between two mathematical expressions, we rely on graph matching techniques, which are reviewed next.
Graph Matching Problem
When ranking retrieved problems based on their similarity to the query, we encounter situations in which we need to evaluate the similarity between two sets of mathematical terms. After some conversions, this can be treated as a matching problem and solved by applying a matching algorithm. In this section, we therefore give some background on the matching problem. Because matching problems are often concerned with bipartite graphs, we also briefly introduce bipartite graphs and the algorithms for matching in them.
Graph Matching
Given a graph G = (V, E), a matching M in G is a set of pairwise non-adjacent edges; that is, no two edges share a common vertex. We say that a vertex is matched if it is incident to an edge in the matching. Otherwise the vertex is unmatched. A maximal matching is a matching M of a graph G with the property that if any edge not in M is added to M, it is no longer a matching; that is, M is maximal if it is not a proper subset of any other matching in graph G. In other words, a matching M of a graph G is maximal if every edge in G has a non-empty intersection with at least one edge in M. A maximum matching is a matching that contains the largest possible number of edges. There may be many maximum matchings. The matching number of a graph is the size of a maximum matching. Note that every maximum matching must be maximal, but not every maximal matching must be maximum. A perfect matching is a matching which covers all vertices of the graph. That is, every vertex of the graph is incident to exactly one edge of the matching. Every perfect matching is maximum and hence maximal. In some literature, the term complete matching is used instead. Given a matching M,
• An alternating path is a path whose edges belong alternately to the matching and not to the matching.
• An augmenting path is an alternating path that starts from and ends on free (unmatched) vertices.
We can prove that a matching is maximum if and only if it does not have any augmenting path.
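The difference between a maximal and a maximum matching can be checked on a small example. The following Python sketch is illustrative only; the helper names and the four-vertex path graph a-b-c-d are assumptions introduced here, not part of our system.

```python
def is_matching(edges):
    """True if no two edges in the list share a vertex."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True

def is_maximal(graph_edges, matching):
    """True if the matching is valid and no graph edge can still be added."""
    covered = {vertex for edge in matching for vertex in edge}
    return is_matching(matching) and all(
        u in covered or v in covered for u, v in graph_edges
    )

# Path graph a - b - c - d.
path = [("a", "b"), ("b", "c"), ("c", "d")]
middle = [("b", "c")]              # maximal, but only one edge
ends = [("a", "b"), ("c", "d")]    # maximum (and perfect): two edges

maximal_but_smaller = is_maximal(path, middle) and len(middle) < len(ends)
```

Here the path a-b-c-d is itself an augmenting path for the matching {(b, c)}: it starts and ends at the unmatched vertices a and d, and swapping matched and unmatched edges along it yields the larger matching {(a, b), (c, d)}.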
Matching in Bipartite Graph
In the mathematical field of graph theory, a bipartite graph is a graph whose vertices can be divided into two disjoint sets X and Y such that every edge connects a vertex in X and one in Y; that is, there is no edge between two vertices in the same set. Matching problems are often concerned with bipartite graphs. Finding a maximum bipartite matching (often called a maximum cardinality bipartite matching) in an unweighted bipartite graph G = (V = X ∪ Y, E) is perhaps the simplest problem. The augmenting path algorithm solves it by searching for an augmenting path from each x ∈ X to Y and adding it to the matching if it exists.
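A compact way to realize the augmenting path idea is a depth-first search from each vertex of X. The sketch below is a minimal illustration under assumed conventions (vertices numbered from 0, adjacency lists for the X side, hypothetical function names); it is not taken from our implementation.

```python
def max_bipartite_matching(adj, n_right):
    """Maximum cardinality matching in an unweighted bipartite graph.

    adj[x] lists the Y-side vertices adjacent to X-side vertex x.
    Returns (size, match_of_y), where match_of_y[y] is the X vertex
    matched to y, or -1 if y is unmatched.
    """
    match_of_y = [-1] * n_right

    def try_augment(x, visited):
        # Depth-first search for an augmenting path starting at x.
        for y in adj[x]:
            if y in visited:
                continue
            visited.add(y)
            # Either y is free, or its current partner can be re-matched.
            if match_of_y[y] == -1 or try_augment(match_of_y[y], visited):
                match_of_y[y] = x
                return True
        return False

    size = 0
    for x in range(len(adj)):
        if try_augment(x, set()):
            size += 1
    return size, match_of_y

# X = {0, 1, 2}, Y = {0, 1, 2}; edges are given from the X side.
size, match_of_y = max_bipartite_matching([[0, 1], [0], [1, 2]], 3)
```

Each successful search flips the matched and unmatched edges along the path it found, so the matching grows by exactly one edge per augmentation.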
Matching Algorithms
The Hungarian algorithm (Kuhn, 1956) is a combinatorial optimization algorithm that solves the matching (assignment) problem in a bipartite graph in polynomial time, O(n^3). The first version, known as the Hungarian method, was invented and published by Harold Kuhn in 1955. It was revised by James Munkres in 1957, and has been known since as the Hungarian algorithm, the Munkres assignment algorithm, or the Kuhn-Munkres algorithm. The main idea of the Kuhn-Munkres algorithm for solving the minimum-weight bipartite matching problem is to find two arrays Fx[1..k] and Fy[1..k] that satisfy:
• c[i, j] – Fx[i] – Fy[j] >= 0 for all i and j.
• The set of edges (x[i], y[j]) satisfying c[i, j] – Fx[i] – Fy[j] = 0 contains a complete matching with k edges. This is our expected matching.
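The two conditions on Fx and Fy can be made concrete with a short Python sketch of a standard O(n^3) potential-based formulation of the Hungarian algorithm. This is a widely used textbook variant and is offered as an assumption about how such a solver may look, not as the code of our system; the function name, the square-matrix restriction and the example costs are illustrative.

```python
def hungarian(cost):
    """Minimum-cost perfect matching on a square cost matrix, O(n^3).

    Returns (assignment, total, fx, fy): assignment[i] is the column
    matched to row i, and fx, fy are the row and column potentials.
    """
    n = len(cost)
    inf = float("inf")
    fx = [0] * (n + 1)      # row potentials (index 0 unused)
    fy = [0] * (n + 1)      # column potentials (index 0 is a dummy column)
    row_of = [0] * (n + 1)  # row_of[j] = row matched to column j (0 = none)
    prev = [0] * (n + 1)    # previous column on the alternating path
    for i in range(1, n + 1):
        row_of[0] = i
        j0 = 0
        slack = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = row_of[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    reduced = cost[i0 - 1][j - 1] - fx[i0] - fy[j]
                    if reduced < slack[j]:
                        slack[j], prev[j] = reduced, j0
                    if slack[j] < delta:
                        delta, j1 = slack[j], j
            # Shift potentials so that at least one new edge becomes tight.
            for j in range(n + 1):
                if used[j]:
                    fx[row_of[j]] += delta
                    fy[j] -= delta
                else:
                    slack[j] -= delta
            j0 = j1
            if row_of[j0] == 0:  # reached a free column: path found
                break
        while j0:  # augment along the stored path
            j1 = prev[j0]
            row_of[j0] = row_of[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[row_of[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return assignment, total, fx[1:], fy[1:]

cost = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]
assignment, total, fx, fy = hungarian(cost)
```

On termination every reduced cost c[i, j] – Fx[i] – Fy[j] is non-negative and the matched edges are exactly tight, which is the pair of conditions listed above. To maximize a similarity instead of minimizing a cost, the weights can be converted first, for example by negating them.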
MATHEMATICS RETRIEVAL SYSTEM
In this section, we give an overview of the architecture of our proposed system for mathematics retrieval. To illustrate how the system works, an operational case study of a working session is then presented.
System Architecture
The architecture of our proposed mathematics retrieval system is given in Figure 1. As we can see, our system is a client-server, web-based system. The communication between client and server is handled by AJAX (Asynchronous JavaScript and XML) technology, which helps to reduce the response time and improve system performance. When a user visits our system with a web browser, he or she communicates with the client layer, which contains three main components: Math Browser, Math Editor, and Admin Control Panel. These components first receive requests from users, then create the corresponding AJAX requests and send them to the server. At the server, there are six main modules, namely the Recognition, Browsing, Retrieval, Testing, Clustering, and Administration modules. The names of these modules should be self-explanatory. These modules are built on the .NET framework. The system provides the following utilities:
• Math Browser: It allows users to browse mathematical knowledge in a friendly and organized manner.
• Mathematical Retrieval: It supports retrieval of similar mathematical problems for a given problem. The retrieved problems are ranked accordingly.
• Trial Testing: It assists users in self-study via a trial test.
Figure 1. System conceptual architecture
These features are illustrated in the operational case study that follows. The most interesting feature, Mathematical Retrieval, is discussed in detail in the next section.
Operational Case Study
In this section, we present some screens illustrating typical usage scenarios of our system. As mentioned above, these scenarios include Math Browser, Mathematical Retrieval and Trial Testing.
Math Browser
The main interface of Math Browser is given in Figure 2. As can be observed, the mathematics knowledge is organized as a hierarchy that allows users to browse in a systematic manner. In this browser, mathematics topics are represented by the concept of clusters. Each cluster can be considered as a topic. As we can see, all mathematical topics are displayed in the Clusters window at the left of the screen. These topics are displayed in a tree view, which helps users browse the topics easily. The remainder of the screen, on the right, displays information about the currently chosen cluster, which includes all paths from the root node to the current cluster and all problems in this cluster. The user can access any cluster on the path to the current cluster by clicking on the corresponding button. The user can also view the solution for a specific problem in a cluster by clicking on the “View solution” link. Finally, to return to the main menu in order to choose another system function, the user clicks on the “Main Menu” button at the bottom of this screen.
Math Retrieval
Math Retrieval supports retrieval of mathematics problems similar to a query problem input by the user. As depicted in Figure 3, there is a writing table on which the user writes the expression of the problem for which similar problems are sought. The user can also see all candidate characters that the system has recognized from the handwritten input. If a recognized character is not correct, the user can easily choose the appropriate one among the characters in the “Recognized Characters” window. After recognizing the query problem, the system performs the retrieval and ranking processes. The result is then displayed back to the user, as presented in Figure 3.
Figure 2. Math browser interface
The result contains the top ranked problems together with their solutions. Top ranked problems indicate the stored problems that are most relevant to current expression. The returned problems are sorted according to the similarities to the inputted expression. Based on the corresponding solutions of the retrieved problems, users may find hints or infer the solutions for the requested problem.
Testing
If the user chooses the Testing function, the screen depicted in Figure 4 is shown. After reading the question in the “Question” window, the user can write an answer in the “Writing panel”. The recognition result is displayed in the “Expression” window. The user can easily navigate among all questions by using the control buttons in the “Control Table” window. After entering the answer, the user can click on the “EVALUATE” button to get the evaluated mark for the answer. If the user’s answer and the stored solution are mathematically the same, a screen is shown reporting the evaluated mark for the answer. We use the Maple library to evaluate the correctness of users’ submissions and notify the users of the results accordingly.
Figure 3. Mathematical retrieval
Figure 4. Solution submitted for the test
Figure 5. Operational mechanism of retrieval module
MATHEMATICAL RETRIEVAL AND RANKING
As shown in the case study, the most interesting feature of our system is the ability to retrieve the stored problems relevant to a query and rank the retrieved problems accordingly. In our architecture, this feature is handled mainly in the Retrieval module, as depicted in Figure 5. After the Recognition module successfully recognizes the query given by users from the Math Editor,
the Retrieval module retrieves relevant stored problems from the database. In the database, all stored problems are clustered into groups, or clusters, based on their similarities. Therefore, the Retrieval module retrieves the whole clusters that are most relevant, rather than individual problems. Then, the Retrieval module ranks the problems contained in the retrieved clusters. In order to cluster and retrieve the mathematical problems, we represent the query and the stored problems as vectors of tf•idf weights, as discussed in the Background section. Then, we use tree-based matching techniques for ranking the retrieved problems. These are discussed in detail as follows.
Representing Mathematical Expressions as tf•idf Vectors
In a modern document retrieval model, the following steps are usually performed to convert a document into a tf•idf vector: keyword identification, stemming and tf•idf weight calculation. For example, let us consider the query “I want to know about JAVA compiler”. When identifying keywords, we first remove unimportant words (i.e., non-content words such as pronouns, prepositions, etc.) to get the significant keywords, which are “JAVA” and “compiler”. In the stemming step, we convert each keyword to its base form. For example, “compiler” will be reduced to “compil”. After the base forms are produced, the tf•idf weights are calculated as discussed in the Background section. When dealing with mathematical data, because of its highly semantic representation, the step of finding keywords becomes simple. In a mathematical expression, a function name or operator is considered a keyword, and no stemming of these keywords is necessary. A specific variable or number, however, is treated as a general variable or number, which plays the role of the stemming step in textual retrieval. For example, we consider the variables “x” and “y” to be of the same kind in terms of semantic meaning. Thus, the process of converting a mathematical expression into a tf•idf vector consists of the following steps:
• Partitioning functions: The recognized expression must be converted into vector form so that the vector retrieval model can be applied later. To do so, we consider each mathematical function (such as log, cos, sin, etc.) as a dimension of a vector. For example, in a vector space comprising five dimensions, which are integral, sin, cos, log, and identifier, the expression ∫ sin log sin x is represented by vect(1,2,0,1,1). This representation means: “The expression includes one integral function, two sin functions, no cos function, one log function and one identifier, which is x”. This step receives the mathematical expression from the Recognition module and converts it into vector form.
• Stemming number/variable: Because our approach is based on similarity, we consider any two specific numbers, or any two specific variables, to be the same. This step replaces each specific number/variable by a general number/variable notation.
• Calculating tf•idf weights: In the last step, we apply techniques similar to those of the textual retrieval process to produce the corresponding tf•idf weights.
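The three steps can be sketched in a few lines of Python. The sketch assumes that the Recognition module has already produced a flat list of tokens; the token names, the helper functions and the base-10 logarithm are assumptions for illustration. The document frequencies are those of a 100-document example collection in which 80 documents contain the integral function, 60 the sin function, 40 the cos function, 20 the log function and all 100 an identifier.

```python
import math
from collections import Counter

# The five dimensions of the example vector space.
DIMENSIONS = ["integral", "sin", "cos", "log", "identifier"]
FUNCTIONS = {"integral", "sin", "cos", "log"}

def stem(token):
    """Map any specific variable or number to the general 'identifier' term."""
    return token if token in FUNCTIONS else "identifier"

def occurrence_vector(tokens):
    """Count how often each dimension occurs in the tokenized expression."""
    counts = Counter(stem(t) for t in tokens)
    return [counts[d] for d in DIMENSIONS]

def tf_idf(vector, doc_freq, n_docs):
    """Weight each count by its term frequency times inverse document frequency."""
    total = sum(vector)
    return [
        (count / total) * math.log10(n_docs / doc_freq[dim])
        for count, dim in zip(vector, DIMENSIONS)
    ]

# Tokenized form of the expression: integral of sin log sin x.
query = ["integral", "sin", "log", "sin", "x"]
vec = occurrence_vector(query)  # [1, 2, 0, 1, 1]

doc_freq = {"integral": 80, "sin": 60, "cos": 40, "log": 20, "identifier": 100}
weights = [round(w, 4) for w in tf_idf(vec, doc_freq, 100)]
# weights == [0.0194, 0.0887, 0.0, 0.1398, 0.0]
```

Because every variable is mapped to the same identifier term, replacing x by y in the query yields exactly the same vector, which is the effect intended by the stemming step.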
For example, let us consider the query expression ∫ sin log sin x with a database of 100 documents, of which 80 documents contain the integral function, 60 contain the sin function, 20 contain the log function, 40 contain the cos function, and 100 contain the identifier. After the function-partitioning step, the query is represented by vect(1,2,0,1,1). By calculating the tf•idf weight for each dimension of this vector, we obtain the tf•idf vector of the expression. The tf•idf weight of each term is calculated as follows:
$$w_{integral,query} = \mathrm{tf}_{integral,query} \cdot \mathrm{idf}_{integral} = \frac{1}{5} \cdot \log_{10} \frac{100}{80} \approx 0.0194$$
$$w_{sin,query} = \mathrm{tf}_{sin,query} \cdot \mathrm{idf}_{sin} = \frac{2}{5} \cdot \log_{10} \frac{100}{60} \approx 0.0887$$
$$w_{cos,query} = \mathrm{tf}_{cos,query} \cdot \mathrm{idf}_{cos} = \frac{0}{5} \cdot \log_{10} \frac{100}{40} = 0$$
$$w_{log,query} = \mathrm{tf}_{log,query} \cdot \mathrm{idf}_{log} = \frac{1}{5} \cdot \log_{10} \frac{100}{20} \approx 0.1398$$
The weight of the identifier is 0 because the identifier occurs in all 100 documents. Thus, the tf•idf vector corresponding to the occurrence vector vect(1,2,0,1,1) is vectTF-IDF(0.0194, 0.0887, 0, 0.1398, 0).

Ranking Mathematical Expressions
In order to help the user find the most suitable problems quickly, the returned problems need to be sorted. In this process, the resulting problems are ranked based on their similarity to the input query. Basically, we have two choices of ranking method. For a problem description that is almost entirely textual, we rank based on the similarity between the corresponding tf•idf vectors. For a problem that includes mathematical expressions, we propose a ranking method based on the Tree-Matching algorithm.

Ranking Based on tf•idf
In order to rank problems that are almost entirely text, we apply the same technique as when retrieving clusters. With this approach, we first convert the sample problems into vectors, then perform number/variable stemming on these vectors, calculate their tf•idf vectors, and finally measure their similarity using the cosine formula. As we can see, this ranking approach is similar to that of the textual retrieval technique, so it performs well on problems whose content is almost entirely textual (such as “Calculate the velocity of a person who spends 1 hour going from his office to the supermarket. The distance between his office and the supermarket is 12 km.”).

Ranking Based on Tree-Matching
For problems whose contents include mathematical expressions, we can also apply a ranking technique based on tf•idf, but the resulting score is not reasonable in some cases. For example, take the query “sinlogcosx” and consider three expressions: “sinlogcosx”, “sincoslogx” and “coslogsinx”. Clearly, we expect to retrieve sinlogcosx first because it is identical to the query. But because all three expressions contain exactly the same terms, they receive the same tf•idf similarity to the query. This is not the result we expect. To deal with this case, we propose a matching approach, the Tree-Matching algorithm, to evaluate the similarity between two mathematical expressions. In the Tree-Matching approach, we treat an expression as a tree and evaluate the similarity between the query expression and the sample expression as the similarity between their two trees. The idea is simple: the comparison is performed level by level and node by node, as illustrated in Figure 6. Applying this idea, we evaluate the similarity of two expression trees by recursively comparing mathematical terms at each level, from top to bottom. At a specific node of a tree, we stop the evaluation below that node if its mathematical term is not similar to the corresponding term in the other tree.
Mathematical Retrieval Techniques for Online Mathematics Learning
Figure 6. Example of comparing two expression trees
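The contrast between the two rankings can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nested-tuple tree encoding, the use of plain term counts in place of tf•idf weights, and the scoring rule (matched nodes divided by the size of the larger tree) are all assumptions made for illustration.

```python
import math
from collections import Counter

# Expression trees as nested tuples: (label, child, child, ...).
# sin(log(cos(x))) is written ("sin", ("log", ("cos", ("x",)))).
QUERY = ("sin", ("log", ("cos", ("x",))))
SAMPLES = {
    "sinlogcosx": ("sin", ("log", ("cos", ("x",)))),
    "sincoslogx": ("sin", ("cos", ("log", ("x",)))),
    "coslogsinx": ("cos", ("log", ("sin", ("x",)))),
}


def labels(tree):
    """Flatten a tree into the list of its node labels."""
    out = [tree[0]]
    for child in tree[1:]:
        out.extend(labels(child))
    return out


def bag_cosine(a, b):
    """Cosine similarity of term-count vectors; term order is ignored."""
    ca, cb = Counter(labels(a)), Counter(labels(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b)


def matched_nodes(q, s):
    """Count nodes matched top-down; stop below the first mismatch."""
    if q[0] != s[0]:
        return 0
    # Assumption: children are compared in order (fixed-order operators).
    return 1 + sum(matched_nodes(cq, cs) for cq, cs in zip(q[1:], s[1:]))


def tree_similarity(q, s):
    """Hypothetical score: matched nodes over the size of the larger tree."""
    return matched_nodes(q, s) / max(len(labels(q)), len(labels(s)))


for name, tree in SAMPLES.items():
    print(name, bag_cosine(QUERY, tree), tree_similarity(QUERY, tree))
```

Under these assumptions the bag-of-terms score cannot separate the three candidates, while the top-down comparison ranks the identical expression first and gives partial credit only to the candidate that shares the outermost operator.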
Matching Methods for Tree-Matching Algorithm When comparing two trees, we observe that, for almost all operators, the number of operands and their order in the expression are significant. In these simple cases, the comparison can be performed from the top level to the deepest level and from left to right. However, some special operators, such as addition (+) and multiplication or product (*), do not require a fixed number of operands and allow the operands to appear in any order. In this case, we must exhaustively try all possible pairings of the nodes of the two trees in order to find the best matching. For example, for the two trees in Figure 7, the best matching at level 2 pairs the “sin” term in the left tree with the “sin” term in the right tree, and likewise the two “x” terms. To find the best matching in these cases, we suggest applying one of two algorithms: Permutations (Bona, 2004) or Maximum Bipartite Matching. The permutation matching algorithm is a brute-force (exhaustive) search: we generate all possible orders of matching, calculate the similarity weight of each, and choose the best one. This method always returns the best possible matching. In addition, the algorithm is intuitive and easy to implement. However, its complexity, O(n!), is very high, so we decided to apply this approach only when the number of operands is not greater than six. In our implementation, we attempt to speed up this algorithm by using a non-recursive permutation algorithm.
Figure 7. Example of mathematical expression matching trees
When the number of operands of an operator becomes too large, we suggest applying the maximum bipartite matching algorithm, because its complexity is O(n^3), so it runs faster than the permutation algorithm in these cases. The trade-off is that this algorithm needs more memory. We therefore apply it when the number of nodes in the tree is greater than 5, simply because 5^3 ≈ 5! (125 versus 120). To apply the maximum bipartite matching algorithm to our problem, we let X be the set of nodes at the ith level of one tree and Y the set of nodes at the ith level of the other tree, and (x[i], y[j]) be an edge whose weight c[i, j] is the similarity obtained when comparing x[i] and y[j]. For example, the trees in Figure 7 yield the bipartite graph in Figure 8, where the white nodes on the left side stand for mathematical terms in the left tree and the grey nodes on the right side stand for mathematical terms in the right tree.
Figure 8. Example of expression bipartite graph
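The permutation-based matching and the weight matrix c[i, j] described above can be sketched in Python as follows. This is an illustrative sketch only: the similarity function (1.0 for equal labels, 0.0 otherwise), the operand lists and all function names are assumptions, and the operands are merely in the spirit of Figure 7.

```python
import math
from itertools import permutations


def label_similarity(a, b):
    """Hypothetical node similarity: 1.0 for equal labels, else 0.0."""
    return 1.0 if a == b else 0.0


def weight_matrix(xs, ys):
    """c[i][j] is the similarity of node xs[i] and node ys[j]."""
    return [[label_similarity(x, y) for y in ys] for x in xs]


def best_permutation_matching(c):
    """Try every assignment of rows to columns; O(n!) for an n x n matrix."""
    n = len(c)
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(c[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_perm, best_total


# Operands of a commutative operator in two trees (illustrative labels).
left = ["x", "sin"]
right = ["sin", "x"]
perm, total = best_permutation_matching(weight_matrix(left, right))

# Growth of n^3 versus n! around the threshold discussed in the text.
growth = [(n, n ** 3, math.factorial(n)) for n in range(4, 8)]
```

Here `perm[i]` is the index in `right` matched to `left[i]`. The `growth` list shows that n^3 and n! are close at n = 5 and that n! grows far faster beyond that point, which is the reasoning behind switching algorithms for larger operand counts.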
With this conversion, our problem becomes that of finding a complete matching with maximum weight. We can therefore solve it by applying the maximum bipartite matching algorithm covered in the Background section. In our implementation, we apply the Kuhn-Munkres algorithm, which is regarded as one of the best algorithms for the matching problem in bipartite graphs.
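For readers who want to see the shape of such a solver, the following is a compact textbook formulation of the Kuhn-Munkres (Hungarian) method in Python. It is a sketch under stated assumptions, not the chapter's implementation: it assumes a square similarity matrix, finds the maximum-weight matching by minimising the negated weights, and the function name and example matrix are hypothetical.

```python
def max_weight_matching(sim):
    """Kuhn-Munkres (Hungarian) method on a square similarity matrix.

    Returns (assignment, total), where assignment[i] is the column matched
    to row i. Maximisation is done by minimising the negated weights.
    Runs in O(n^3) time.
    """
    n = len(sim)
    cost = [[-w for w in row] for row in sim]
    inf = float("inf")
    u = [0.0] * (n + 1)    # row potentials (1-based)
    v = [0.0] * (n + 1)    # column potentials (1-based)
    match = [0] * (n + 1)  # match[j] = row currently assigned to column j
    way = [0] * (n + 1)    # back-pointers along the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        # Flip the assignments along the augmenting path back to column 0.
        while j0:
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[match[j] - 1] = j - 1
    total = sum(sim[i][assignment[i]] for i in range(n))
    return assignment, total


# Hypothetical similarity weights c[i][j] between nodes of two trees.
sim = [
    [0.9, 0.8, 0.1],
    [0.8, 0.1, 0.2],
    [0.1, 0.2, 0.7],
]
assignment, total = max_weight_matching(sim)
```

In this example the best matching pairs row 0 with column 1 and row 1 with column 0, for a total weight of about 2.3. Picking the single largest weight (0.9) first would block that pairing, which is why a complete-matching algorithm is used rather than a greedy choice.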
PERFORMANCE EVALUATION In this section, we present an initial experiment to evaluate the effectiveness of the proposed mathematical retrieval system. In the experiment, we use a database of 80 mathematics problems extracted from a textbook. Table 1 lists 20 illustrative problems. We evaluated system performance using 10-fold cross-validation: when one fold is used as testing data, the remaining folds act as training data. After testing on all 10 folds, we obtained 79 correct results out of 80, an accuracy of 98.75%.
Table 1. Examples of some stored problems
∫ lnx ∫ lnx * ln sin x ∫ lnx * ln(2x) ∫ x * lnx ∫ x * lnx ∫ lnx * ln(2x) * ln(3x) ∫ lnx * sin(ln x) ∫ lnx * cos(ln x) ∫ lnx * sin(ln x) * ln(2x) ∫ sin(ln x) * (lnx) 2
2
ln ln x x ln ln x 2 ∫ x2 2 ∫ x * ln ln x
∫
∫ ∫ ∫
lnx * sinlnx x lnx * coslnx x ln (ax + b)
(ln x)
4
∫ ∫
x lnx * ln(3x) x2
∫ x * sin x * ln x * ln cos x
We have also compared the performance of our tree-matching ranking with other typical ranking techniques in terms of standard IR measures (recall, precision and F-measure), using the same query sets as depicted in Table 1. Two ranking techniques are used for comparison. In the first, the typical tf•idf vector space model (VSM) is applied. The second also uses VSM for retrieval, but the data are clustered before retrieval. Table 2 presents the experimental results of the techniques implemented. As can be seen in Table 2, when combined with the clustering technique, VSM-based retrieval improves significantly in terms of both recall and precision. This is because, when data are clustered, information is better represented as clusters, which improves retrieval performance. Compared with the VSM+clustering technique, the precision of the tree-matching technique is slightly lower, but its recall is considerably better. As a result, the tree-matching ranking technique achieves the best performance in terms of F-measure.
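The measures used in this evaluation can be checked with a few lines of Python. The sketch assumes that the F-measure is the balanced harmonic mean of precision and recall (the chapter does not state the weighting) and takes the percentage pairs reported in Table 2. Because the harmonic mean is symmetric, the result does not depend on which value of a pair is precision and which is recall.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (recall, precision) pairs as reported in Table 2.
reported = {
    "VSM": (0.74, 0.85),
    "VSM + Clustering": (0.92, 0.89),
    "Tree-matching": (0.91, 0.99),
}

# F-measure per technique, rounded to whole percentages.
f_scores = {
    name: round(100 * f_measure(p, r)) for name, (r, p) in reported.items()
}

# Cross-validation accuracy: 79 correct results out of 80 problems.
accuracy = 79 / 80
```

Under this assumption the rounded values agree with the F-measure column of Table 2, and the accuracy evaluates to 0.9875.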
Table 2. Comparison of IR techniques
Techniques          Recall   Precision   F-measure
VSM                 74%      85%         79%
VSM + Clustering    92%      89%         90%
Tree-matching       91%      99%         95%
CONCLUSION In this chapter, we have discussed the problem of mathematical retrieval and proposed a mathematical retrieval system. Rather than only searching for information, as typical document retrieval systems do, the proposed system is useful for mathematics learners because it can suggest hints drawn from the solutions of relevant retrieved problems. Learners can thus study by themselves in an online environment. Taking into account the characteristics of mathematical content, we have also proposed appropriate mathematical retrieval techniques, adapted from typical document retrieval techniques and combined with graph matching methods. To represent mathematical expressions for efficient retrieval, we suggest modifying the concept of tf•idf weight to better reflect mathematics-specific content, so that mathematical expressions can be retrieved effectively in our system. To make the ranking more reasonable and accurate, we adopt a tree-matching approach, which handles mathematical structures almost precisely. Moreover, our system is equipped with a friendly interface that allows users to submit their solutions directly on a drawing board, which is the most natural and convenient manner. On the one hand, our system takes full advantage of the online environment to free learners from constraints of physical distance and time: learners can access the system from anywhere and at any time. On the other hand, it helps typical learners avoid technical difficulties such as familiarizing themselves with specific symbol-embedded editors or mastering keyboard typing in order to practice. Users are provided with a convenient learning environment in which to communicate information in a traditional manner, i.e. they can simply write down what they have in mind as if they were working with typical paper-and-pencil tools. In short, our system combines the advantages of both online and traditional study methods. With its intelligent retrieval of mathematical content, our system can also play the role of a virtual tutor that provides learners with hints for their currently unsolved problems. Beyond that, it can serve as an online testing program for mathematics learners. All of these features can be implemented with significantly reduced manpower and administration costs while still preserving a friendly and easy-to-use interactive mechanism for users, which is one of the vital requirements for the success of a study program.
REFERENCES
Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern Information Retrieval. Addison-Wesley.
Bona, M. (2004). Combinatorics of Permutations. Chapman Hall-CRC.
Dawn, C. C. (2007). Search site moves at the speed of China. Los Angeles Times.
Google search engine, http://www.google.com.
Johnson (2009). British search engine could rival Google. The Guardian.
Kohlhase, M., & Franke, A. (2001). MBase: Representing Knowledge and Context for the Integration of Mathematical Software Systems. Journal of Symbolic Computation, 32(4), 365–402. doi:10.1006/jsco.2000.0468
Kohlhase, M., & Sucan, I. (2006). A Search Engine for Mathematical Formulae. In Proceedings of 8th International Conference (pp. 241-253), AISC 2006 Beijing, China, September 20-22.
Kuhn, H. W. (1956). Variants of the Hungarian method for assignment problems. Naval Research Logistics Quarterly, 3, 253–258. doi:10.1002/nav.3800030404
Miner, R., & Munavalli, R. (2007). An approach to mathematical search through query formulation and data normalization. Towards Mechanized Mathematical Assistants (pp. 342–355). MKM.
Schofield, J. (2009). Microsoft launches Bing.com as its new search engine. The Guardian.
Yahoo. (n.d.). Retrieved from http://www.yahoo.com
KEY TERMS AND DEFINITIONS
Information Retrieval: To retrieve information from a dataset based on a submitted query.
Matching Problem: To match two graphs in order to infer the similarity between them.
Mathematical Retrieval: To retrieve mathematical contents based on a submitted query.
Retrieval Ranking: To rank the information retrieved based on its relevance/similarity to the submitted query.
Tf•idf Weight: The weight indicating the importance/significance of a term in a document.
Tree-matching Ranking: To rank the retrieved mathematical expressions based on the similarities between their corresponding tree-like representations and that of the original query.
Vector Space Model: To model documents in a dataset as numerical vectors for the sake of retrieval.
Chapter 25
Online Ethnographic Methods: Towards a Qualitative Understanding of Virtual Community Practices
Jörgen Skågeby
Linköping University, Sweden
ABSTRACT This chapter describes the use of online ethnographical methods as a potent way to reach a qualitative understanding of virtual communities. The term online ethnography encompasses document collection, online observation and online interviews. The chapter explains the steps of conducting online ethnography: defining the setting and spelling out the research perspective, collecting online data, analyzing the gathered data, feeding insights back to the studied community, and presenting results with ethical awareness. Along the way, the chapter compares online ethnography to traditional ethnography and provides illustrative empirical examples and experiences from three recent online ethnographical studies of social information and media sharing (Skågeby, 2007, 2008, 2009a). While multimedia forms of data and data collection (i.e. video and sound recordings) are becoming more common, the focus of the chapter lies mainly on text-based data. The chapter concludes by discussing methodological benefits and drawbacks of an online ethnographical process.
DOI: 10.4018/978-1-60960-040-2.ch025
INTRODUCTION User engagement and communication sharing are two central activities of virtual communities. In themselves, these activities also lay the ground for online methods as practical means of data collection. The enduring qualities of some conversations, and the technological means to record other, more transient, conversations, make it possible to collect social data from virtual communities. Indeed, the growing body of research in the field of Internet studies supports the viability of online methods for examining virtual community practices, behaviours and sentiments (Granello & Wheaton, 2004; Hine, 2005; Kinnevy & Enosh, 2002; Maczewski, et al., 2004). This development not only gives researchers an improved scholarly foundation to “honour the field in which the participants are working – the online environment” (Crichton & Kinash, 2003), but also suggests that “turning to the Internet for data collection […] prompts one to think outside of the traditional box and leads to creative methods and measurements” (Skitka & Sargis, 2006, p. 543).
Online Ethnography as a Method to Understand User Engagement and Communication Sharing Online ethnography is a qualitative approach to data collection in virtual communities. As such, its aim is usually to look beyond amounts and distributions and to unearth the deeper reasons for behaviours or sentiments (i.e. “why?”). In seeking answers to the question “why?”, online ethnography must acknowledge that usage is often situated in specific communities and tied to specific communication technologies. As Jones (2005) puts it: “Internet studies can […] describe and intervene in the life and values of the people who use the internet, and these can be best understood, no matter our temporal distance, through close observation and analysis of specific people and technologies, in specific places and times”. In many ways, online ethnography is not very different from traditional ethnography (Hine, 2000; Jakobsson, 2006). In fact, online ethnography has been described as “[traditional] ethnography adapted to the study of online communities” (Guimarães, 2003; Kozinets, 2002). Thus, this chapter argues that to understand online ethnography it is important to understand the “costs and benefits” of traditional ethnography. A short but concise explanation would describe ethnography as a description of individuals, groups or cultures in their own environment over a (long) period of time. As such, ethnography is not explicitly wed to a specific set of methods, but the methods used are commonly qualitative in nature (e.g. observations and unstructured interviews). This chapter will discuss how document collection, direct observation and participation, as well as mediated interviews with key informants, can help researchers shape a valid description of a studied virtual community.
THE ONLINE ETHNOGRAPHICAL PROCEDURE The online ethnographical procedure consists of a number of different steps, namely cultural entrance (or entrée), collection and analysis of data while also making sure that trustworthy interpretations are made, conducting ethically sound research and making sure that members of the studied milieu can provide feedback to the research(er). This chapter will describe these steps, but use a slightly different, and elaborated taxonomy, i.e. defining setting and research perspective; making an entrance; qualitative online data collection; analysis; and presentation of results. Additionally, the author will also consider the omnipresent ethical dilemmas that colour online ethnographical research.
Defining Setting and Research Perspective This chapter assumes that an important research question has already been defined and that online ethnography has been identified as a workable method for addressing it. It may be that online ethnography is judged to be the only available method for this specific question. In some cases it might even be seen as counterproductive to include other (offline) data collection methods, particularly when researching the specifics of mediated social interaction (Markham, 2004b). Still, it is important to understand that online ethnography is one method within a larger repertoire of viable qualitative methods. This means that even when a study is aimed at a virtual community, online and offline methods can sometimes be used in conjunction when researching certain practices or other social phenomena (Garcia, et al., 2009; Silver, 2000). Thus, defining the setting, and how the studied phenomena do or do not permeate the online/offline border, can be crucial for subsequent methodological choices. Consequently, the setting should be defined in both social and technical terms, illustrating for example typical features, system development processes, user base, historical background, characteristic social phenomena (if any), significant content in FAQs, any official standards of conduct, etc. The declaration of setting provides readers with the basic knowledge needed to contextualize the findings and insights presented later, and to judge how the studied phenomena relate to the chosen setting on a larger scale. The reader can then make connections between the application genre, its features and the emergence of rules or practices. An example of a high-level description of the setting, taken from a study of music file-sharing (Skågeby, 2007), is given below:
Example: High-Level Setting Declaration The network Soulseek (slsk) is one of the more popular P2P music networks, although it has kept an ‘underground spirit’ to it. Some members are annoyed that Soulseek is appearing in the media, something that has become more common. Soulseek is not a large-scale business endeavour, which means that the official descriptions and ‘biographical accounts’ of it stem from Wikipedia (“Soulseek,” 2006) or published interviews with the main programmer (Mennecke, 2003), rather than from press releases or white papers.
Soulseek is a file-sharing application and tightly knit network used mostly to exchange music, although able to share a variety of files. It was created by Nir Arbel, a former Napster programmer. Like Napster, it relies on a central server. Soulseek is free of spyware and other malicious code. Soulseek is different from other file sharing programs as it allows users the option of downloading full folders instead of just single files. The original Soulseek userbase was composed mostly of members of the IDM [Intelligent Dance Music] Mailing List, and most of the music first found on Soulseek was underground electronic music or music created by the users themselves. After Audiogalaxy [another popular music-sharing service] was shut down, however, many former Audiogalaxy users migrated to Soulseek and brought copyrighted music owned by record labels belonging to the RIAA [Recording Industry Association of America]. Nevertheless, Soulseek remains a favourite of fans of underground and independent music, and a large portion of filesharing on Soulseek is legal sharing of music that is distributed under a free license. The userbase has grown rapidly since its beginnings, and there are now some 120,000 users at any given time, with more than one million total registered users in early 2004. (“Soulseek,” 2006)
In terms of interaction, the picture below (Figure 1) shows the overall interaction categories (these would be expanded on in a more elaborated description). Within each category there are additional, more specific features.
Figure 1. A structural overview of the interaction repertoire in Nicotine (one of the applications used to access and interact with the Soulseek network)
This chapter strongly recommends that the critical next step of an ethnographic research process is to declare the perspective from which the researcher(s) are initiating their endeavour. This step is, more often than not, overlooked in (online) ethnographic research papers. This may be due to the limited space often afforded to papers or chapters; nevertheless, because ethnographic research is so dependent on the relationship between the researcher and the researched context, it is a necessary component of any ethnographic study. To declare a research perspective means to initiate a self-reflecting process in which the researcher(s) try to unfold their prior interpretations and personal experiences of the studied domain and the present research questions. It also means acknowledging that purely objective and value-free descriptions are very difficult, if not impossible. The Internet is also a setting for fieldwork and as such is not free of bias from agendas, personal histories or social norms (Murthy, 2008). Thus, much importance lies in the in-depth awareness and declaration of such potential preconceptions, and in the realization of how such preconceptions can enrich the research. In many ways, an ethnographic process is about taking turns between the stories of the researcher (e.g. the debriefing descriptions) and the stories of the participants, looking for instances where those stories converge (Guimarães, 2003). This chapter, like much of the literature on interpretative methods, stresses the methodological importance of declaring the research perspective (Jones, 2005; Walstrom, 2004). In new media contexts it becomes particularly important, as it has the potential to further strengthen the entire field of Internet studies: “One action to be undertaken is questioning by us how we come to the knowledge we have.
That is to say that, if an interpretative turn consists at least in part of self-reflection, of knowing how we know others, then we must as part of the development of our research and scholarship unpack the complicities and complications of our own positions as Internet users” (Jones, 2005). The process of unpacking our own positions as Internet users requires that we, as online ethnographers, ask ourselves, and people in our close surroundings, some important questions, namely:
• What phenomena do I/we initially see in the problem area and why?
• Are there other ways to delimit/categorize phenomena in the problem area?
• What are my experiences of the problem area?
• What are my preconceptions about relationships between phenomena in the problem area?
• What are my values relating to phenomena in the problem area?
Answering these questions, honestly and in detail, is likely to enhance the dependability of the research. As described, once an ethnographical study is launched it is often seen as an obligation for researchers to describe the context of the study as accurately as possible (e.g. for reasons of transferability). This chapter argues that, before a study is launched, a declaration of self-reflection also needs to be made (i.e. by reflecting on the questions above).
Example: Researcher Perspective – Insider vs. Outsider In prior studies conducted by the author of this chapter (Skågeby, 2007, 2008, 2009a), the researcher took the stance of an outsider with certain inside experience. The benefits of such a position are that there is still room for reflective observation, while knowing the basics of interaction and technical features improves analytical skill (e.g. by avoiding technical slip-ups or elementary social faux pas). Other ethnographers have also deemed this to be a favourable research stance (Forsythe, 1999). A clear benefit of having inside experience is that researchers who are viewed as insiders, or at least as knowledgeable of the “local customs”, will face an easier task when recruiting key informants (i.e. for follow-up interviews). Associating with central community members, and engaging them as participants in the research (rather than just respondents), provides the researcher(s) with situated interpretative proficiency. Taking on key informants can not only add interpretative fine-tuning but potentially also extend the reach of the method, allowing hidden or hard-to-reach populations to be included in the study (Matthews & Cramer, 2008).
Making an Entrance How to “enter” the studied community depends much on whether hidden or open research is intended (i.e. whether the researcher intends not only to gather data from archives, but also to engage or participate in the ongoing activities of the community and its members). This, in turn, can depend on the nature of the studied community and on how the results are fed back and ultimately disseminated. At the end of the day there is the question of how sensitive the material is judged to be and what potential harms and benefits can result from the publication of the research (Murthy, 2008). Thus, the entrance into a community can vary from simply identifying a community forum message archive appropriate to study, on the basis of its level of activity (i.e. number of active members, amount and richness of postings) (Kozinets, 2002), to overtly presenting oneself, the study and continuous results, obtaining collective informed consent and taking measures to protect participant privacy (Sharf, 1999). The practical issues and consequences of presenting oneself as an online researcher, and how these may depend on the specific social context, are effectively discussed by Garcia and colleagues (2009). They say, for example, that gender can still be an issue, sometimes posing a threat and sometimes acting as a factor enhancing trust. Referring to official web pages hosted by the university in charge, or to other proofs of authorization, seems to balance out any informal tones that can otherwise build caution among respondents. Naturally, any attempts to recruit or approach participants should adhere to the current informal rules, or netiquette, in order to avoid potential social gaffes.
Online Ethnographic Data Collection Once entrance is made, data collection may commence. The three most common data collection methods used in online ethnography are (1) document collection, (2) online observation and (3) online interviews. By document collection I refer to the gathering of some form of archived interaction (e.g. mailing list archives, forum discussion archives). Online observation refers to the researcher’s concurrent use of, and data collection through, the services or applications utilized in the studied online practice, for example the use of chat services (Svenningson, 2001) or virtual world systems (Jakobsson, 2006) as both means and ends to observe practices. Compared to document collection, online observation is real-time and synchronous, which has implications for both the observation type and the researcher role (as elaborated later). Online interviews use synchronous, micro-synchronous or asynchronous communication technology as the mediator of an interview. The benefits and drawbacks of each method are summarized in the respective sections.
Document Collection Document collection refers to the gathering of data that is, in some sense, archived. In general, the data comes from asynchronous genres of communication, such as discussion forums, blogs or mailing lists. As such, the data collected via document collection usually consists of textual material. However, with the increasingly multimediated social interaction in, for example, social networking sites, audio and video are increasingly included as data sources. The richness of social networking sites is acknowledged by Murthy (2008), who states: “when conducted alongside other data (e.g. interviews), the sites can provide unique in-depth autobiographical accounts of scenes and respondents” (p. 846). Certain structural and contextual information can also be collected via, for example, screenshots. The selection and amount of data collected is usually at the discretion of the ethnographer and should, naturally, be guided by the research question. Since many communities are highly social arenas, with the benefits and drawbacks that entails, data can be cluttered with ‘off-topic’ material. In a general sense, ethnography is interested in all community activities, but depending on the research question a preliminary sorting can save resources. A risk is that it can be hard to predict what to save and pursue and what to discard and ignore, even with a narrow research topic. On the one hand, this depends on relevance to the research question and is up to the skills of the researcher. On the other hand, the technical cost of saving additional material from online ethnographic studies is usually low, so while a preliminary sorting can save time and effort, there is always a possibility to re-include material previously disregarded. Document collection is typically one of two types: (1) targeted or (2) distributed. Targeted document collection means that one specific forum is selected because of its specific relevance to answering the research question (e.g. topical or demographical relevance). In this case, the actual collection boils down to identifying relevant discussion threads or posts and saving these. If the research question is limited in scope in relation to the overall topic of the forum, this chapter suggests making use of the search options of the studied forum, while leaving the final relevance judgement to human discretion.
In the case of distributed document collection, several fora (or blogs) are searched for relevant discussion. These may be general discussion fora containing a wide variety of discussions, not only those pertaining to the specific research question. This approach is largely made possible by the continuously refined search options and techniques offered by both general (e.g. Google) and specialized (e.g. Boardreader, Omgili, Blogpulse) search engines. As such, there is a larger reliance on language and technology, but also a potential to widen both the reach and variety of the data.
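The preliminary sorting described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Post` record, the forum names, the sample texts and the search terms are all hypothetical, and a keyword match merely pre-selects candidates, so the final relevance judgement still rests with the researcher. Posts that do not match are set aside rather than deleted, so that previously disregarded material can be re-included later.

```python
from dataclasses import dataclass


@dataclass
class Post:
    forum: str
    author: str
    text: str


def preliminary_sort(posts, search_terms):
    """Split posts into candidates (any term matches) and set-aside posts."""
    terms = [t.lower() for t in search_terms]
    candidates, set_aside = [], []
    for post in posts:
        body = post.text.lower()
        if any(term in body for term in terms):
            candidates.append(post)   # to be read and judged by the researcher
        else:
            set_aside.append(post)    # kept, so it can be re-included later
    return candidates, set_aside


# Hypothetical archived posts from two fora (distributed collection).
posts = [
    Post("forum-a", "user1", "Why I share my playlists with strangers"),
    Post("forum-a", "user2", "Off-topic: weekend plans"),
    Post("forum-b", "user3", "Sharing etiquette for new members"),
]
candidates, set_aside = preliminary_sort(posts, ["share", "sharing"])
```

For targeted collection the same function would simply be applied to the archive of a single forum.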
Online Observation Internet use is often distributed over different techniques (such as discussion groups, instant messaging conversations, shared files, member profiles etc.) and capable of leaving many manifest traces. A combination of different sources of data can be very rewarding for scientists with an interest in social activities on the net. Thus, to actually engage in the common activities of a network or community is beneficial in reducing the potential gap between what people say and what people do. In much virtual community research the co-evolvement of social activities, groupings and technical tools and development becomes a central theme. What is interesting is to understand how the virtual community works, often from the perspective of the users. However, the user does not act in a sociotechnical vacuum: the motivations, technology, netiquette, and conflicts of interest etc. emerge during social interaction. As such, using the same application(s) that members are utilizing can considerably help the online ethnographer understand new aspects of use. Note that this could also include using any other applications that the studied community favour, which may further assist to develop a sense of what is relevant, important and significant to the end-users. In summary, the researcher should make an effort to experience daily life as it is composed for the regular members of the studied community (Garcia, et al., 2009). In the words of Walstrom (2004): “Moreover, this approach obliges researchers to not only participate in the [online] groups
€€€€Partly open observation €€€€Hidden observation
that they study but also to have experienced the dilemma central to the participants’ discussions”. This participation, however, can be hidden, open or somewhere in between. In the table below we see a summary of different research perspectives based on hidden/open and participant/ observing dimensions: The choice of how to conduct online observation can depend on several factors. Depending on the type of community it is not always technically possible to be open towards all other members (e.g. in P2P file sharing). Another aspect is whether online observation is used as primary or supplementary method. If used as supplementary method and not dealing with individuals or sensitive data, but as a way to confirm insights about ways of conduct in a general sense (i.e. practices) a hidden approach could be defended. However, if used as a primary method and dealing with potentially sensitive data, the researcher-participant trust can be heavily damaged if conducted in a hidden manner (Skitka & Sargis, 2006). The question of how to record data from online observation has several possible answers. For example, Jakobsson (2006) made video recordings of a virtual environment (while concurrently also entrusting his avatar with a video camera). Screenshots is another, more static, option. As always, such material must be kept confidential. If considered for publication, informed consent and/or anonymization should be collected and performed. Interestingly, the role of field notes has been thoroughly discussed in literature on traditional ethnography (Emerson, et al., 1995), but as regards online ethnography it has not been well covered. However, the importance of these,
and the process of collecting them, is not to be underrated. Several authors have reported on the high value of field notes in online ethnography (Baym, 2000; Jakobsson, 2006). In studies of an online music sharing community, the researcher’s usage of a sharing application generated an ample amount of central field notes (Skågeby, 2007). The main point of online observation and field notes is that the researcher must be sensitive to phenomena that cannot be deduced from text only. For example, in the aforementioned music study, fieldnotes about the conduct of users in terms of what they downloaded and shared, in combination with what they talked about, provided deeper insights and theoretical sensitivity than conversations alone. Typical categories of data that can be recorded via fieldnotes are implicit practices, member hierarchies, relationship structures and tacit knowledge.
Online Interviews Online interviews can be performed synchronously or asynchronously. The main ‘genres’ (Barnes, 2003) of computer-mediated communication which are used for performing online interviews are either instant messaging (Davis, et al., 2004; Lawson, 2004; Voida, et al., 2004) or e-mail (Bampton & Cowton, 2002; Meho, in press). The big initial difference between these two genres is that instant messaging (IM) is synchronous and e-mail is asynchronous. This has certain implications on the type of interaction that can take place. While both these genres rely on certain technological skills and resources, IM is more time-dependent/time-intensive and thus
Online Ethnographic Methods
sensitive to technical problems, possibly creating lags and delays in responses. IM conversations also have the potential of being concurrent; for example, an interviewee can sometimes manage as many as 20 conversations at the same time (Crystal, 2001). Interestingly, online interviews direct attention towards the benefits and drawbacks of conventional or face-to-face interviewing. Scholars who steer away from online interviews often acknowledge the “technical” benefits that come with them, such as diminished costs, speed and geographic reach. However, the social aspects are often treated with scepticism. The lack of nonverbal behaviour, the possibility of manufacturing online identities (for both interviewers and respondents) and less in-depth replies (Fontana & Frey, 2000) are only some of the common misgivings. Nevertheless, as more scholars conduct online interviews, the overall picture tends to become more nuanced. Many researchers have dedicated time and effort to describing the drawbacks and benefits of mediated interviewing (Crichton & Kinash, 2003; Kivits, 2005; Olivero & Lunt, 2004; Selwyn & Robson, 1998). Their accounts reveal a dependable data collection method capable of producing rich and in-depth data. As mentioned above, it is worthwhile to highlight the strengths and limitations of online interviews, since they indirectly reveal what is good and bad about traditional interviews (Gruber, et al., 2008; Joinson, 2001; Murray & Sixsmith, 1998). To summarize, the strengths and limitations are:
Strengths:
• Online conversations allow participants to reconsider, research, recognize and reflect on words and expressions prior to posting them, allowing the conversation to be mutually negotiated
• Interviewees with textual skills are able to create more refined accounts of their experience
• The aim with the interview becomes more clear due to the absence of visual and bodily cues
• No non-verbal cues that discourage or distract participants
• Goes beyond geographical and economical limitations in term of reaching participants and interviewees otherwise not accessible
• Can be conducted with the convenience of time and familiarity of the home or work environment
• Transcription is less demanding (in its simplest sense, copy and paste)
• Disclosure of more honest and deep information
Limitations:
• Limited non-verbal cues for encouragement
• Empathic and emotional communication is not obviously manifested
• Covert or constructed identities or characteristics, as well as temporary nature of participation, can make follow ups difficult
• Potentially skewed population (e.g. predominantly young European or American male)
• Asynchronous: can be stretched out over time due to flexibility in response time – respondents can answer in their own time
• Requires careful development of research relationships and knowledge of studied venue, and thus, time
• Potential strategic self-presentation
It is imperative to remember that, when considering the benefits and drawbacks, depending on which perspective is taken, the pluses and minuses can be interpreted reversely. For example, the interviewee has a possibility to, at any time, increase the latency of, or even ‘withdraw without a trace’ from the interview. While this is on the one hand not a desired outcome for the interviewer, it is on the other a ‘safety vent’, increasing a sense of
security for the interviewee. Likewise, the careful development of a trusting research relationship is not merely time-consuming, but also essential and rewarding in terms of generating high quality data: “Email communication is then constructed as a continuous alternation between an informal and formal style in answering the question, between interviewing and conversing” (Kivits, 2005). Olivero and Lunt (2004), as well as Kivits (2005), stress the importance of upholding a trustful, sensitive and linguistically adaptive relationship. Since e-mail interviews rely on textual communication, the linguistic and paralinguistic methods of strengthening the relationship are central. As such, the researcher’s sensitivity to issues of fostering trust, reciprocal conversation and questioning, equal partaking, authentic disclosure, cooperation and reflexivity becomes important. When it comes to synchronous interviewing (e.g. instant messaging or “chat”), a particularly interesting possibility is to conduct group interviews, similar to offline focus groups. For a detailed account of the group interview process we refer to the work of Klein et al. (2007).
Ethical Considerations
Research ethics come particularly into focus when conducting online ethnographies (Sharf, 1999). Apart from the ethical issues considered under each method respectively, there are also some general issues. Online material can be quite dynamic and ephemeral. At the same time, one benefit of collecting textual material online is that it is automatically transcribed without particular effort. The transcripts can easily be copied or saved for future use and reuse. At times they are also publicly available online for quite some time. This can of course also be a problem, which is why the issue of informed consent is important (Spinello, 1995). Scholars who intend to engage in discussion and post questions to the forum usually introduce themselves and their goals prior to the study (Walstrom, 2004). With researchers who
lurk in concurrent discussion or access archival data, the necessity of obtaining consent or not is debated (Bruckman, 2002; Chen, et al., 2004). For large public forums there is an issue regarding from whom to seek permission as well as who has the mandate to deny access since they are public (Clegg Smith, 2004). While archived interaction is by many Institutional Review Boards regarded as public information (Skitka & Sargis, 2006), researchers must themselves make contextually informed judgments regarding how to deal with these issues in specific cases. One way to ground insights with the studied community is to feed back tentative results to the studied community (see the upcoming section on design patterns).
Example: Distributed Document Collection and Ethical Concerns
As an example of something in between the outer edges of hidden and open research, in a study of general forum discussions about Facebook the author took steps to protect the privacy of users by not using participants’ screen names and by assembling prototypical quotes from several users. (With the increasing possibilities to search forums and blogs, researchers must be careful not to cite word for word, since this makes quotes findable and thus traceable and not anonymous.) However, informed consent was not deemed possible since users were dispersed over several discussion forums and not always reachable for questioning (Skågeby, 2009a).
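The first of these protective steps, not using screen names, can be partly supported by tooling when collected posts are stored for analysis. The following is a minimal sketch and not the procedure used in the cited study: the function names, the keyed-hash scheme and the `user_` prefix are illustrative assumptions. It replaces known screen names with stable pseudonyms so that the same member can still be followed across posts.

```python
import hashlib
import hmac
import re


def pseudonym(screen_name: str, secret: bytes) -> str:
    """Map a screen name to a stable pseudonym using a keyed hash (HMAC-SHA256).

    The secret key is an assumption of this sketch: it should be kept apart
    from the data so that pseudonyms cannot be recomputed by outsiders.
    """
    digest = hmac.new(secret, screen_name.lower().encode("utf-8"), hashlib.sha256)
    return "user_" + digest.hexdigest()[:8]


def pseudonymize_post(text: str, screen_names: list[str], secret: bytes) -> str:
    """Replace every known screen name in a post, ignoring letter case."""
    if not screen_names:
        return text
    # Longest names first, so that a name containing another name wins the match.
    ordered = sorted(screen_names, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in ordered), re.IGNORECASE)
    # A single pass with a callback avoids re-replacing text inside pseudonyms.
    return pattern.sub(lambda match: pseudonym(match.group(0), secret), text)
```

Note that this only addresses names. Verbatim quotes remain findable through search engines, which is why the study described above also assembled prototypical quotes instead of citing word for word.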
Analyzing Online Data Analysis of textual data can follow many different frameworks (e.g. conversation analysis, discourse analysis, feminist analysis). This chapter does not have the space to include run-throughs of all potential analytical frameworks. Rather, we shall propose the more general approach of Romano et al (2003), combined with thematic analysis (Braun & Clarke, 2006; Freeday & Muir-Cochrane, 2006).
For the purposes of this chapter, the comprehensive framework of Romano et al illustrates the overall methodological procedure, while the thematic analysis gives detail to the selection and coding steps of this procedure (see Figure 2). Put simply, thematic analysis refers to a careful reading and re-reading of the data in order to find recurrent themes across the data. Using more general frameworks like these is purposeful in a handbook chapter since readers are likely to come from a variety of disciplines. Because thematic analysis is a general and flexible method that has been used in many different methodological and theoretical traditions, however under different monikers, it can provide a “common starting ground”. As such thematic analysis also connects nicely to a prominent quality of the procedure presented by Romano and colleagues, namely that it is open to and acknowledges the influence of application theories (i.e. initial theories and research efforts that colour the preliminary categorization of the studied area).
Figure 2. Overall analysis procedure (adapted and amended from Romano et al. (2003))
In step 1, data is collected as per the previously described methods. In conventional terms, step 2 in the model above contains what is commonly referred to as “analysis”. Most, if not all, research questions owe a legacy to previous research, either theoretically or empirically. Accordingly, an application theory is often used to create an initial code categorisation. Of course, it may turn out that initial categories are developed into refined categories, or even discarded because no data supports them. Step 2.1 sees the initiation of the thematic analysis. The input to this step is formatted raw data. Formatting is usually done so that either each row of text is separated by a line break or initial analyst judgment separates tentative “findings”. Through an iterative process, in which the analyst carefully reads and rereads the data, s/he then identifies emergent and recurrent themes and key terms. These evolve as more data is included in the coding and clustering steps (2.2 and 2.3). The analyst must be sensitive to themes that emerge from the data: “There needs to be a balance between experimenter expectancy through overreliance on a priori theory and total dependence on data-derived meaning that may not provide any generalizability of the results.” (Romano, et al., 2003) These new themes are likely to influence the collection of data, allowing for a deeper analysis of that particular theme. Consequently, the sample size is in most cases not decided in advance. Rather, the reduction step guides elicitation. Once no new classes of phenomena are found, elicitation can be stopped (or redesigned). So, from a pragmatic view the final sample is likely to reflect the variance of the population.
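The stopping rule described here (elicitation can stop once no new classes of phenomena are found) can be made explicit with a small sketch. The code below is an illustration under stated assumptions, not part of the Romano et al. procedure: it assumes that data is coded in successive batches, that each batch yields a collection of theme labels, and that a hypothetical `patience` parameter decides how many consecutive batches without new themes count as “no new classes”.

```python
def batches_until_saturation(coded_batches, patience=2):
    """Return the 1-based index of the batch after which elicitation could stop.

    coded_batches: iterable of collections of theme labels, one per coded batch.
    patience: number of consecutive batches without any new theme that is taken
              to indicate saturation (an assumption of this sketch).
    Returns None if saturation is not reached within the given batches.
    """
    seen = set()
    quiet = 0  # consecutive batches that contributed no new theme
    for index, batch in enumerate(coded_batches, start=1):
        new_themes = set(batch) - seen
        seen |= new_themes
        quiet = 0 if new_themes else quiet + 1
        if quiet >= patience:
            return index
    return None
```

For instance, with batches coded as `["reciprocity", "trust"]`, `["trust", "status"]`, `["status"]` and `["trust", "reciprocity"]`, the third and fourth batches add nothing new, so a patience of 2 suggests stopping after the fourth batch. The labels are invented for illustration; in practice the judgment remains with the analyst, and such a count can only support it.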
Presenting Results Ethnographic studies normally result in “thick descriptions” (i.e. lengthy and exhaustive descriptions of the studied phenomena). While these are certainly valid and appropriate for online ethnographic studies as well, there are at times conditions underlying online ethnographic research to “inform design”. In these cases a thick description may be too dense and generate a need for more ‘appropriate’ forms of presentation (Diggins & Tolmie, 2003). Earlier work in online ethnography has suggested three alternative ways to present results particularly suited for studies of information systems intended to provide design suggestions (Skågeby, 2009b). This chapter will only briefly outline these presentation techniques. For further detail please refer to the works cited under each respective technique.
Use Qualities
Löwgren (2006) describes use qualities as “properties of a digital design that is experienced in its use”. The concept is based on value perspectives that include instrumental, aesthetical, social/communicative, constructional and ethical aspects. These perspectives are significant to a varying degree depending on the use or practice that is being researched, largely because (a) any artefact can be used in a multiplicity of ways, including ways it was not ‘intended’, and (b) there are many artefacts that can be used to accomplish one specific task (Ihde, 1993). An underlying postulation of use qualities is that different artefacts (or genres of artefacts) will present/generate diverse use qualities; use qualities that a designer will need to bear in mind during the craft of interaction design. While many general user experience attributes have been introduced over the years, the use qualities approach suggests that not all of them are equally relevant to all systems. One way to make use qualities more specific in terms of informing design is to include opposing use qualities as conflicting forces in design patterns (Arvola, 2006). However, before taking the full step to design patterns, this chapter argues that it is important to consider the potential space in between conflicting use qualities.
Analytical Dimensions
Analytical dimensions aim to encompass the full diversity of use qualities that commonly exist in complex information systems (Skågeby, 2007). An analytical dimension offers a frame for relating and contrasting the complexity of the studied conflict. As such, it is good for presenting a comprehensive view that supports comparison over several specific cases. The basic anatomy of the analytical dimension is a polarized conflict, where the most obvious/prevalent counterparts make up the poles of the dimension. Once these poles are identified, they give the researcher the opportunity to postulate, identify and research activities, concerns and intentions in between the extremes. Analytical dimensions also support analysis of users who have decided to change from one specific way of online conduct to another. A prototypical example would be users who initially provided photos completely publicly, but then decided to be more selective about the receiving audiences of their pictures. In summary, the analytical dimension is a versatile communicative tool, allowing for precise descriptions while also recognizing the full range of potential use qualities that make up the dimension.
Design Patterns
As hinted previously, both use qualities and analytical dimensions can inform design patterns. In short, a design pattern is a structured exposition of a generic solution to a problem in a context. Another way to say it is that the design pattern includes a feature that resolves forces in some context(s) (Martin, et al., 2001; Martin & Sommerville, 2001). The concept of design patterns was originally developed by architect Christopher Alexander and colleagues (Alexander, 1979). Alexander had noticed that there were qualities of architecture that were “hard to define”, but that were still very common and desirable. To create shareable descriptions of these qualities Alexander developed design patterns, and while it is hard to summarize all that has been written about them, there are certain features that seem more or less agreed upon:
• Design patterns address a re-occurring problem in a specific context
• They build on an understanding of what needs, interests and motivations (forces) drive people in specific contexts
• They include a feature(s) that coordinates or resolves these forces
• They are neither too vague nor too specific and are thereby able to help designers understand what forces are at play and how these can be resolved, but still flexible enough to allow infinite specific solutions
• They focus on what is good, rather than critique what is bad
• They are testable. That is to say, by applying the pattern to other cases, by sharing the patterns and debating, discussing, agreeing and disagreeing on them, and by examining how a specific pattern-derived solution functions and feels, patterns can be put to empirical tests
• They are shareable in that they create a concrete common resource debatable to all involved stakeholders. They also create a way for people who are not designers, but still hold relevant and valuable knowledge, to inform the design (Erickson, 2000)
• They can help to bridge the gap between the qualitative descriptions of (users’) problems and solutions applicable in a design or implementation phase (or analysis phase, for that matter)
There is now a significant range of pattern forms spanning from minimally functionalistic to more narrative ones (Fincher, 2000). However, there are also a number of elements that are common to most design patterns: a name; a description of the problem; a description of the context; the forces at play; and a generic solution. A benefit that cannot be over-emphasized is that design patterns are fairly simple and to the point, and can therefore straightforwardly be brought back to the studied communities for discussion. A user-grounded assessment of developed patterns strengthens both the validity and reliability of the results. So, when a design pattern, use quality or analytical dimension is considered ready for sharing, feeding it back to the community, or to any users reporting a special interest in the study and its results, is imperative. Indeed, a benefit of all of the above forms of presentation is the ease with which tentative results can, and should, be fed back to the studied community. Drafts of insights and study results posted to the community can open up both grounding and improvement of the research as a whole (Erickson, 2000; Matthews & Cramer, 2008). This can be accomplished through regular forum posts. Another alternative is to use a “research blog” where results, interpretations and draft texts can be published for public (or semipublic) scrutiny and be subject to suggestions for editorial development (Murthy, 2008). It needs to be said, though, that the benefits and drawbacks of such efforts are, so far, largely unevaluated.
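The elements common to most design patterns (a name, the problem, the context, the forces at play and a generic solution) lend themselves to a simple record structure, which can then be rendered as plain text for a forum post or research blog. The sketch below is only one possible representation; the class name, field names and the `to_forum_post` method are assumptions made for illustration and do not correspond to any established pattern format.

```python
from dataclasses import dataclass, field


@dataclass
class DesignPattern:
    """One design pattern, holding the elements common to most pattern forms."""
    name: str
    problem: str
    context: str
    forces: list[str] = field(default_factory=list)
    solution: str = ""

    def to_forum_post(self) -> str:
        """Render the pattern as plain text that can be posted for discussion."""
        lines = [
            f"Pattern: {self.name}",
            f"Problem: {self.problem}",
            f"Context: {self.context}",
            "Forces:",
            *[f"  - {force}" for force in self.forces],
            f"Solution: {self.solution}",
        ]
        return "\n".join(lines)
```

A pattern built this way, for example around the photo-sharing conflict mentioned under analytical dimensions (public visibility versus selective audiences), would need its wording to come from the empirical material; any content filled in here would be hypothetical. Keeping the structure this plain is deliberate, since the point made above is that patterns should be simple enough to bring back to the studied community.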
Example: Feeding Back Results
A recent study (Skågeby, 2007) combined two aims: grounding interpretations and estimating the sufficiency of data collection. To this end, some tentative analysis results were fed back to the network forum to generate discussion. The goal was to instigate discussion and use this as a recurring source of data (which could challenge the analysis or be incorporated as further proof of trustworthiness). The feedback did generate some debate and discussion that enriched the analysis and helped in picking up keywords and in-vivo expressions (Internet slang is common in forums). Apart from that, the feedback also has the function of meeting informants on their own ground. This can help to make a more situated and grounded judgment about the appropriateness of including quotes and applying certain analytical reasoning. However, there is a risk connected to this procedure, which is why, after two general forum feedback sessions, it was decided to attempt more personal feedback through interviews with users. The reason results were not fed back to the macro-level forums more extensively or continuously was the risk of cluttering or biasing the naturally occurring discussions. Fortunately, discussions were very rich and inspired and, consequently, the grounding of interpretations was mainly done via individual key informants.
DISCUSSION: BENEFITS, DRAWBACKS AND EPISTEMOLOGICAL ISSUES
This chapter has shown that there are both similarities and certain characteristic differences between a traditional ethnographic procedure and its counterparts when studying virtual communities and cultures. It is important, though, to remember that many of the fundamental epistemological issues and controversies remain the same (Travers, 2009). These include, for example, the concept of validity in ethnographic studies and interpretative versus critical analytical frameworks. However, in terms of data quality, there is tentative support for the claim that online methods may attract more highly motivated and more unprompted participants than traditional approaches (Skitka & Sargis, 2006). From a methodological point of view it is worthwhile to discuss what issues are brought to light by doing research in, and through, a technology-mediated context (e.g. a virtual community). At large, the Internet holds certain characteristics, which are supported more or less by the various applications utilizing it. The Internet is, in a certain sense, global; it is, in a certain sense, anonymous; it is, in a certain sense, interactive; and it supports digital manipulation (material can be digitized, transferred, stored, cross-referenced and reproduced with certain flexibility and efficiency) (Weckert, 2000). These characteristics bring some fundamental differences with regard to researching virtual communities compared to corporeal communities:
• The population online may be heterogeneous and almost ubiquitous to researchers, which raises questions about the validity and reliability of the data. There may be, for example, potential misrepresentative attractions, that is, the people responding to calls for participation or people who post to forums etc. are only a minor extrovert part of the actual population
• Not only are the users dissimilar, but the specific technologies used to communicate can be quite diverse and consequently influence the entire sociotechnical setting
• Identity, anonymity and pseudonymity give rise to methodological concerns
• All statements above raise issues pertaining to research ethics
Credibility and Transferability of Online Ethnographic Studies
Credibility refers to the degree to which the results are recognizable and valid from the perspective of the participants, and transferability refers to the degree to which the results can be transferred or generalized to other settings and contexts (Guba & Lincoln, 1989). While ethnographic studies are often idiosyncratic in nature, there is still reason to discuss and even try to defend such scientific concepts as credibility and transferability. A complete coverage of the full population is often impractical, not to say impossible. Consequently, because of practical and material constraints, there is a need to sample the population. In online ethnography there is a practical issue: it can only address individuals who speak up or ‘write up’ or somehow participate. Further, the study often has an explorative purpose, so the number of individuals of interest to the study is hard to predict. Thus, it is hard to generate a statistically representative sample based on a distribution criterion: “The significance of this argument is even greater when we think of other more dynamic units: as we do not know how characteristics concerning emotions, attitudes, opinions and behaviour are distributed in the population, aiming at statistical representativeness of samples is technically groundless.” (Gobo, 2004, p. 440) Instead, online ethnographers must focus on finding communities that hold high significance and relevance for the research question. A more specific way to put it is to say that when studying certain practices, we need to look at online settings where those practices are likely to occur. Online ethnographic studies can thus be generalized because they contribute to a growing body of scientific research, which forms a system of cases. They also constitute tests of hypotheses and methods, which may supplement the overall understanding of a phenomenon (Flyvbjerg, 2004).
In addition, by using an application theory, the study can be anchored in insights that have been repeatedly present in other communities and societies (i.e. a ‘theoretical transferability’). Another way to put it is to say that the theoretical recognition of the studied phenomena makes the use of application concepts theoretically meaningful. Moreover, the use of an application theory can aid in holding back the overwhelming idea of trying to draw a complete picture of all social interaction occurring in a virtual community (LeBesco, 2004). The heterogeneity of the user group is a circumstance that also adds to the difficulty of generalizing results. The difficulty of accurately sampling a representative population makes it even more important to represent the social and technological structures surrounding, emerging and co-evolving with it. By sensitively describing these circumstances and mechanisms, the potential transferability of the results is highlighted, making them of more use to fellow researchers and practitioners. While themes and insights are usually developed from one specific community, they often represent a class of concerns and intentions which, at times, have also been identified or alluded to in other contexts. In other words, the results are not transferable as to what opinions users in other virtual communities actually have, but rather as to what opinions users in these networks can have. Thus, online ethnography describes the social significance of dimensions and the relations between them, rather than populations in a statistical sense or how many individuals possess a certain characteristic. Further, there is cause to consider the ‘authenticity’ of online ethnographical data. Markham (2004a) reports that online textual communication can be very representative, and that assuming it is less so than, say, face-to-face interviewing could be rash. Presenting tentative results to the studied community is one way to increase the authenticity.
Another way is to consecutively, and for extensive periods of time, use the current applications and gain first-hand experience of the various aspects of everyday life in the virtual community.
Dependability and Confirmability of Online Ethnographic Studies
Dependability refers to the obligation of the researcher to provide accurate descriptions of the context so that readers feel they can depend on the results. This is particularly important, as a fundamental assumption of qualitative research is that we cannot measure the exact same thing twice. This has two practical implications: (1) researchers must aim to describe the sociotechnical context to the best of their ability and (2) it is a benefit if presentations of results are also in forms that can be easily compared. As regards the first proposition, this chapter has already provided information on describing the setting. In reference to the second, researchers must be aware that there are obvious risks with condensing insights, such as over-reduction or constraining future interpretations of data (Diggins & Tolmie, 2003). Consequently, researchers should stress that any developed models or theories are most often not complete or generic theories. Rather, models are interfaces through which the data can be understood. Confirmability refers to the degree to which the results can be confirmed by others. This becomes interesting vis-à-vis confidentiality and anonymity. While confirmability is certainly desirable, there can also be reason to protect users or even the entire virtual community from identification. Again, this highlights the need for rigour in describing setting and perspectives, along with the importance of establishing refined reflexivity (Delamont, 2004) and trustworthy interpretations of data (Golafshani, 2003). Another way is to actively engage participants, in order to get ‘confirmation’ from them. Established techniques, such as inter-rater analysis and coding, are also adequate (Conway, et al., 1995), although there are also scholars who regard single-hand analysis as a methodological strength (Emerson, et al., 1995).
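Where two analysts code the same material, their agreement can be summarized numerically. The chapter does not prescribe a particular statistic; the sketch below uses Cohen's kappa, a commonly used chance-corrected agreement measure, purely as an illustration. It is defined as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed share of items coded identically and p_e is the agreement expected by chance from each coder's label frequencies. The function name and the handling of the degenerate case are assumptions of this sketch.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one label per item."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal label proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used a single identical label throughout; kappa is
        # undefined here, and returning 1.0 is a convention of this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)
```

As a worked case with invented labels: if one coder assigns trust, trust, status, status and the other assigns trust, trust, status, trust, then p_o = 3/4 and p_e = (2·3 + 2·1)/16 = 1/2, giving kappa = 0.5. A low value is a prompt to discuss and refine the code categories, not a verdict on the analysis.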
CONCLUSION
The conclusion drawn from working with empirical data from document collection, online interviews and application use is that they are highly workable sources for insights into end-user problems and solutions; verbalized concerns and intentions; experiences and stories; and likes and dislikes. With regard to online ethnographic data collection it is also important to consider the balance between collecting data to the level where no additional or new information emerges and the risk of collecting too much data. Due to the availability and ease of data collection, huge amounts of data can be accumulated, increasing the risk of having an insurmountable number of records and spending too little time actually classifying and analyzing them. Again, this shows the importance of ‘living the life’ and becoming aware of communal memberships, rituals, language and behaviours in order to bring depth and quality to the analysis. In summary, online ethnography is capable of revealing:
• Hidden aspects of an activity: what was previously known, and perhaps even documented within the community, might not be a good description of ‘what is actually performed’. Seemingly insignificant details may very well make an activity meaningful and worthwhile to its actors.
• Public knowledge (e.g. media reports and folk models) is frequently in contrast to the unofficial aspects of an activity. Thus, there is a need to complement the public picture with aspects based on studied practice.
• Categories and expressions used naturally within an area of research, emphasizing local and situated expertise.
• Concrete needs, objectives and methods that are described in ways that are recognizable to those who have and perform them.
• Relationships and connections between individuals and groups that explicate context and how the division of labour is enacted.
• Conflicts between individuals, activities, groups, technologies etc.
• The specifics of conflicts: what/who is causing them, how they are dealt with, whether there are different magnitudes of problems, etc.
It seems obvious that research closely connected to virtual community practices gains extra relevance from utilizing online ethnographical methods. However, there is also great future potential in conducting wider user experience research, for example in the fields of human-computer interaction and consumer research, via online ethnographic methods (Skågeby, 2009b).
REFERENCES
Alexander, C. (1979). The Timeless Way of Building. Oxford University Press.
Arvola, M. (2006). Interaction design patterns for computers in sociable use. International Journal of Computer Applications in Technology, 25(2/3), 128–139. doi:10.1504/IJCAT.2006.009063
Bampton, R., & Cowton, C. J. (2002). The E-Interview. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research [On-line Journal], 3(2).
Barnes, S. S. (2003). Computer-Mediated Communication: Human-to-Human Communication Across the Internet. London: Pearson Education.
Baym, N. K. (2000). Tune in, Log on: Soaps, fandom and online community. London: SAGE.
Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3, 77–101. doi:10.1191/1478088706qp063oa
Bruckman, A. (2002, Apr 4). Ethical Guidelines for Research Online. Retrieved Nov 20, 2007, from http://www.cc.gatech.edu/~asb/ethics/
Chen, S.-L. S., Hall, G. J., & Johns, M. D. (2004). Research Paparazzi in Cyberspace: The Voices of the Researched. In Johns, M. D., Chen, S.-L. S., & Hall, G. J. (Eds.), Online Social Research: Methods, Issues, & Ethics (pp. 157–178). New York: Peter Lang.
Clegg Smith, K. M. (2004). “Electronic Eavesdropping”: The Ethical Issues Involved in Conducting a Virtual Ethnography. In Johns, M. D., Chen, S.-L. S., & Hall, G. J. (Eds.), Online Social Research: Methods, Issues, & Ethics (pp. 223–238). New York: Peter Lang.
Conway, J. M., Jako, R. A., & Goodman, D. F. (1995). A meta-analysis of interrater and internal consistency reliability of selection interviews. The Journal of Applied Psychology, 80(5), 565–579. doi:10.1037/0021-9010.80.5.565
Crichton, S., & Kinash, S. (2003). Virtual Ethnography: Interactive Interviewing Online as Method. Canadian Journal of Learning and Technology, 29(2).
Crystal, D. (2001). Language and the Internet. Cambridge: Cambridge University Press.
Davis, M., Bolding, G., Hart, G., Sherr, L., & Elford, J. (2004). Reflecting on the experience of interviewing online: perspectives from the Internet and HIV study in London. AIDS Care, 16(8), 944–952. doi:10.1080/09540120412331292499
Delamont, S. (2004). Ethnography and participant observation. In Seale, C., Gubrium, J., Gobo, G., & Silverman, D. (Eds.), Qualitative Research Practice (pp. 217–229). London: Sage.
425
Online Ethnographic Methods
Diggins, T., & Tolmie, P. (2003). The ‘adequate’ design of ethnographic outputs for practice: some explorations of the characteristics of design resources. Personal and Ubiquitous Computing, 7(3-4), 147–158. doi:10.1007/s00779-003-0226-y Emerson, R. M., Fretz, R. I., & Shaw, L. L. (1995). Writing Ethnographic Fieldnotes. Chicago: University of Chicago Press. Erickson, T. (2000). Lingua Francas for Design: Sacred Places and Pattern Languages. In the Proceedings of DIS2000, New York, USA. Fincher, S. (2000, Aug 7, 2007). The Pattern Gallery Retrieved June 10, 2008, from http://www. cs.kent.ac.uk/people/staff/saf/patterns/gallery. html Flyvbjerg, B. (2004). Five misunderstandings about case-study research. In Seale, C., Gubrium, J., Gobo, G., & Silverman, D. (Eds.), Qualitative Research Practice (pp. 420–434). London: Sage. Fontana, A., & Frey, J. H. (2000). The Interview: From Structured Questions to Negotiated Text. In Denzin, N., & Lincoln, Y. (Eds.), The handbook of Qualitative Research (2nd ed., pp. 645–669). Thousand Oaks, CA: Sage. Forsythe, D. E. (1999). “It’s Just a Matter of Common Sense”: Ethnography as Invisible Work. Computer Supported Cooperative Work, 8(1-2), 127–145. doi:10.1023/A:1008692231284 Freeday, J., & Muir-Cochrane, E. (2006). Demonstrating Rigor Using Thematic Analysis: A Hybrid Approach of Inductive and Deductive Coding and Theme Development. International Journal of Qualitative Methods, 5(1). Garcia, A. C., Standlee, A. I., Bechkoff, J., & Cui, Y. (2009). Ethnographic Approaches to the Internet and Computer-mediated Communication. Journal of Contemporary Ethnography, 38(1), 52–84. doi:10.1177/0891241607310839
426
Gobo, G. (2004). Sampling, representativeness and generalizability. In Seale, C., Gubrium, J., Gobo, G., & Silverman, D. (Eds.), Qualitative Research Practice (pp. 435–456). London: Sage. Golafshani, N. (2003). Understanding Reliability and Validity in Qualitative Research. Qualitative Report, 8(4), 597–607. Granello, D. H., & Wheaton, J. E. (2004). Online Data Collection: Strategies for Research. Journal of Counseling and Development, 82(4), 387–393. Gruber, T., Szmigin, I., Reppel, A. E., & Voss, R. (2008). Designing and Conducting online interviews to investigate interesting consumer phenomena. Qualitative Market Research: An International Journal, 11(3), 256–274. doi:10.1108/13522750810879002 Guba, E. G., & Lincoln, Y. S. (1989). Fourth Generation Evaluation. Thousand Oaks, CA: Sage. Guimarães, M. (2003). Doing Online Ethnography. Unpublished Seminar Report. Department of Sociology, University of Surrey. Hine, C. (2000). Virtual Ethnography. London: Sage. Hine, C. (Ed.). (2005). Virtual Methods: Issues in Social Research on the Internet. New York: Berg. Ihde, D. (1993). Postphenomenology - Essays in the Postmodern Context. Evanston, Illinois: Northwestern University Publishers. Jakobsson, M. (2006). Virtual Worlds & Social Interaction Design. Umeå: Umeå University. Joinson, A. N. (2001). Self-dicsclosure in computer-mediated communication: The role of self-awareness and visual anonymity. European Journal of Social Psychology, 31, 177–192. doi:10.1002/ejsp.36
Online Ethnographic Methods
Jones, S. (2005). Fizz in the field: Toward a Basis for an Emergent Internet Studies. The Information Society, 21(4), 233–237. doi:10.1080/01972240591007544 Kinnevy, S. C., & Enosh, G. (2002). Problems and Promises in the Study of Virtual Communities. Journal of Technology in Human Services, 19(2/3), 119–134. doi:10.1300/J017v19n02_09 Kivits, J. ë. (2005). Online Interviewing and the Research Relationship. In Hine, C. (Ed.), Virtual Methods: Issues in Social Research on the Internet. New York: Berg. Klein, E. E., Tellefsen, T., & Herskovitz, P. J. (2007). The use of group support systems in focus groups: Information technology meets qualitative research. Computers in Human Behavior, 23, 2113–2132. doi:10.1016/j.chb.2006.02.007 Kozinets, R. V. (2002). The Field Behind the Screen: Using Netnography for Marketing Research in Online Communities. JMR, Journal of Marketing Research, 39(1), 61–72. doi:10.1509/ jmkr.39.1.61.18935 Lawson, D. (2004). Blurring the Boundaries: Ethical Considerations for Online Research Using Synchronous CMC Forums. In Buchanan, E. A. (Ed.), Readings in Virtual Research Ethics: Issues and Controversies (pp. 80–100). London: Information Science Publishing. LeBesco, K. (2004). Managing Visibility, Intimacy, and Focus in Online Critical Ethnography. In Johns, M. D., & Chen, S.-L. S. (Eds.), Online Social Research: Methods, Issues, & Ethics (pp. 63–80). New York: Peter Lang. Löwgren, J. (2006). Articulating the Use Qualities of Digital Designs. In Fishwick, P. (Ed.), Aesthetic Design (pp. 383–403). Cambridge, MA: MIT Press.
Maczewski, M., Storey, M.-A., & Hoskins, M. (2004). Conducting Congruent, Ethical, Qualitative Research in Internet-Mediated Research Environments. In Buchanan, E. A. (Ed.), Reading in Virtual Research Ethics. London: Information Science Publishers. Markham, A. N. (2004a). Representations in Online Ethnographies: A Matter of Context Sensitivity. In Johns, M. D., Chen, S.-L. S., & Hall, G. J. (Eds.), Online Social Research: Methods, Issues, & Ethics (pp. 141–156). New York: Peter Lang. Markham, A. N. (2004b). The Internet as research Context. In Seale, C., Gubrium, J., Gobo, G., & Silverman, D. (Eds.), Qualitative Research Practice. London: Sage. Martin, D., Rodden, T., Rouncefield, M., Sommerville, I., & Viller, S. (2001). Finding Patterns in the Fieldwork. In the Proceedings of ECSCW2001. Bonn, Germany: Kluwer. Martin, D., & Sommerville, I. (2001). Patterns of Cooperative Design: Linking Ethnomethodology and Design. ACM Transactions on ComputerHuman Interaction, 11(1), 58–89. Matthews, J., & Cramer, E. P. (2008). Using Technology to Enhance Qualitative Research with Hidden Populations. Qualitative Report, 13(2), 301–315. Meho, L. I. (in press). E-mail interviews in qualitative research: A methodological discussion. [x]. Journal of the American Society for Information Science and Technology. Mennecke, T. (2003, Dec 26). Soulseek Interview Retrieved March 13, 2006, from http://www.slyck. com/news.php?story=356 Murray, C., & Sixsmith, M. (1998). E-mail: A qualitative research medium for interviewing? International Journal of Social Research Methodology: Theory & Practice, 1(2), 103–121.
427
Online Ethnographic Methods
Murthy, D. (2008). Digital Ethnography: An examination of the Use of New Technologies for Social Research. Sociology, 42(5), 837–855. doi:10.1177/0038038508094565 Olivero, N., & Lunt, P. (2004). When the Ethics is Functional to the Method: The Case of E-Mail Qualitative Interviews. In Buchanan, E. A. (Ed.), Readings in Virtual Research Ethics: Issues and Controversies. London: Information Science Publishers. Romano, N. C., Donovan, C., Chen, H., & Nunamaker, J. (2003). A Methodology for Analyzing Web-Based Qualitative Data. Journal of Management Information Systems, 19(4). Selwyn, N., & Robson, K. (1998). Using e-mail as a research tool. Social Research Update(21). Sharf, B. F. (1999). Beyond Netiquette: The Ethics of Doing Naturalistic Discourse Research on the Internet. In Jones, S. G. (Ed.), Doing Internet Research: Critical Issues and Methods for Examining the Net. London: Sage. Silver, D. (2000). Looking Backwards, Looking Forwards: Cyberculture Studies 1990-2000. In Gauntlett, D. (Ed.), web.studies: Rewiring media studies for the digital age. London: Arnold. Skågeby, J. (2007). Analytical Dimensions of Online Gift-giving: ‘other-oriented’ contributions in virtual communities. International Journal of Web-based Communities, 3(1), 55–68. doi:10.1504/IJWBC.2007.013774 Skågeby, J. (2008). Semi-Public End-user Content Contributions: a case study of concerns and intentions in online photo-sharing. International Journal of Human-Computer Studies, 66(4), 287–300. doi:10.1016/j.ijhcs.2007.10.010 Skågeby, J. (2009a). Exploring Qualitative Sharing Practices of Social Metadata: Expanding the Attention Economy. The Information Society, 25(1), 60–72. doi:10.1080/01972240802587588
428
Skågeby, J. (2009b). Online friction: studying sociotechnical conflicts to elicit user experience. International Journal of Sociotechnology and Knowledge Development - Special Issue on New Sociotechnical Insights in Interaction Design, 1(2), 62-74. Skitka, L. J., & Sargis, E. J. (2006). The Internet as Psychological Laboratory. Annual Review of Psychology, 57, 529–555. doi:10.1146/annurev. psych.57.102904.190048 Soulseek (2006, Feb 28). Retrieved Mar 09, 2006, from http://en.wikipedia.org/w/index. php?title=Soulseek Spinello, R. A. (1995). Ethical Aspects of Information Technology. London: Prentice-Hall. Svenningson, M. (2001). Creating a Sense of Community. Experiences from a Swedish Web Chat. Linköping University, Linköping. Travers, M. (2009). New methods, old problems: A sceptical view of innovation in qualitative research. Qualitative Research, 9(2), 161–179. doi:10.1177/1468794108095079 Walstrom, M. K. (2004). “Seeing and Sensing” Online Interaction: An Interpretative Interactionist Approach to USENET Support Group Research. In Johns, M. D., Chen, S.-L. S., & Hall, G. J. (Eds.), Online Social Research: Methods, Issues, & Ethics (pp. 81–100). New York: Peter Lang. Weckert, J. (2000). What is New or Unique about Internet Activities? In D. Langford (Ed.), Internet Ethics. London: MacMillan. Voida, A., Mynatt, E. D., Erickson, T., & Kellog, W. A. (2004, 24-29 April). Interviewing Over Instant Messaging. In the Proceedings of CHI 2004, Vienna, Austria.
429
Chapter 26
Understanding Online Communities by Using Semantic Web Technologies

Alexandre Passant, National University of Ireland-Galway, Ireland
Sheila Kinsella, National University of Ireland-Galway, Ireland
Uldis Bojars, National University of Ireland-Galway, Ireland
John G. Breslin, National University of Ireland-Galway, Ireland
Stefan Decker, National University of Ireland-Galway, Ireland
ABSTRACT

During the last few years, the Web that we used to know as a read-only medium shifted to a read-write Web, often known as Web 2.0 or the Social Web, in which people interact, share and build content collaboratively within online communities. In order to clearly understand how these online communities are formed, evolve, share and produce content, a first requirement is to gather related data. In this chapter, we give an overview of how Semantic Web technologies can be used to provide a unified layer of representation for Social Web data in an open and machine-readable manner thanks to common models and shared semantics, facilitating data gathering and analysis. Through a comprehensive state-of-the-art review, we describe the various models that can be applied to online communities and give an overview of some of the new possibilities offered by such a layer in terms of data querying and community analysis.
DOI: 10.4018/978-1-60960-040-2.ch026

INTRODUCTION

Social media is now a part of the everyday lives of people who use Web technologies. People read and comment on blogs, participate in editing wiki pages, use social networking to interact with their friends (or to make new ones), and share pictures, memories and more via services such as Flickr or YouTube, a paradigm often known as Web 2.0 (O’Reilly, 2005). This phenomenon goes further, for example by impacting research communities with services such as the Nature Network (http://network.nature.com/) and enterprise information systems in a shift known as Enterprise 2.0 (McAfee, 2006). The more that data and people interact and connect via Web 2.0, the more scientists (from both Social Science and Computer Science) try to understand how online communities are formed, how they evolve, what they share, and what valuable information can be extracted from these analyses. Yet the diversity of tools, communities and services makes gathering the data, and consequently understanding these communities, a complex task. For each ecosystem, new algorithms must be built, new links must be mined, new applications must be designed, etc.

Nevertheless, another trend from the research community during the last ten years, the Semantic Web (Berners-Lee et al., 2001), aims to provide models for interoperable data between applications and can be of great interest for communities from the Social Web. By relying on standard models to represent data as well as shared semantics between applications, it offers a means to better integrate and query data from various systems, as well as to create links between them. Using Semantic Web technologies can help us to better understand these online communities by providing common means to represent, link and mine information from various distributed systems and heterogeneous data sets, as illustrated in Figure 1.
Thus, the goal of this chapter is to provide readers, especially advanced undergraduate and graduate students in Computer Science, Social Science and more generally Web Science (a term that we will describe later), with a comprehensive state-of-the-art study of how Semantic Web technologies can be used to model, export and analyse virtual communities in distributed environments such as the Web or Web-based information systems (e.g. within corporate organisations).
Figure 1. Using the Semantic Web to facilitate community analysis
The chapter is structured as follows. In the first part, we focus on current practices for understanding and modelling virtual communities and their related content, and describe the shortcomings of these approaches, such as the reliance on vendor-specific APIs. This provides the motivation for the core of this chapter, i.e. the need for Semantic Web technologies for modelling virtual communities, and the advantages they offer regarding data and content analysis as well as interoperability between social applications. In the second section, we introduce Semantic Web principles and provide a comprehensive state-of-the-art review of existing Semantic Web models dedicated to Social Web data. In the third part, we discuss use cases showing how these technologies can be used to better understand communities, giving the reader an overview of the possibilities offered by such methods: querying communities, mining profiles from distributed social networks, browsing social data, etc. In the fourth section, we discuss some upcoming challenges, focusing both on how Semantic Web technologies can be used to solve some of them and on which challenges the Semantic Web community still faces in the context of understanding virtual communities, especially at Web scale. Finally, we conclude the chapter with an overall discussion of how the Semantic Web could help in understanding not only virtual communities, but also the Web in general. We discuss the recent Web Science Research Initiative and how, in our opinion, the Semantic Web and the Web Science agenda relate to each other, and how Semantic Web technologies could help people to better understand the evolution and complexity of the Web and of Web-based information systems.
UNDERSTANDING VIRTUAL COMMUNITIES

From a Web to a Social Web

Since it was established, the Web has been used to enable communication not only between computers but also between people. Usenet newsgroups, mailing lists and web-based forums allowed people to connect with each other and thereby enabled communities to form, often around specific topics of interest. The social networks formed via these technologies were not explicitly stated, but were implicitly defined by the interactions of the people involved (e.g. by replying to each other). Later, technologies such as IRC (Internet Relay Chat), instant messaging and blogging continued the trend of using the Internet to build communities of interest.

One of the most visible trends on the Web is the emergence of Web 2.0-style services. The term Web 2.0 (O’Reilly, 2005) refers to a perceived second generation of Web-based communities and hosted services. Although the term suggests a new version of the Web, it does not refer to an update of the World Wide Web technical specifications and architecture (TAG, 2004), but rather to new structures and abstractions that have emerged on top of the ordinary Web. Although it is difficult to define the boundaries of which structures or abstractions belong to Web 2.0, there seems to be agreement that services and technologies such as blogs, wikis, folksonomies, podcasts, many-to-many publishing, social networking sites (SNSs), Web APIs, web standards and online Web services are part of it. Web 2.0 has been not only a technological trend but also a business one. According to Tim O’Reilly: “Web 2.0 is the business revolution in the computer industry caused by the move to the Internet as platform, and an attempt to understand the rules for success on that new platform” (O’Reilly, 2006).

In addition to participation features (through blogging, wiki participation, etc.), an important feature of the Web 2.0 meme is the online social networking aspect. Social networking sites such as Friendster (an early SNS previously popular in the US, now widely used in Asia), orkut (Google’s SNS), LinkedIn (an SNS for professional relationships) and MySpace (a music and youth-oriented service), where explicitly stated networks of friendship form a core part of the website, have become part of the daily lives of millions of users, and have attracted huge amounts of investment since they began to appear around 2002. Since then, the popularity of these sites has grown hugely and continues to do so. Boyd and Ellison (2007) recently described the history of social networking sites, and suggested that in the early days of SNSs, when only the SixDegrees service existed, there simply were not enough users: “While people were already flocking to the Internet, most did not have extended networks of friends who were online”. According to Internet World Stats, between 2000 (when SixDegrees shut down) and 2003 (when Friendster became the first successful SNS), the number of Internet users doubled.
Web 2.0 content-sharing sites with social networking functionality such as YouTube (a video-sharing site), Flickr (for sharing images) and Last.fm (a radio and music community site) have enjoyed similar popularity. The common features of a social networking site include personal profiles, friends listings, commenting, private messaging, discussion forums, blogging, and media uploading and sharing. Many content-sharing sites, such as Flickr and YouTube, also include some social networking functionality. In addition to SNSs, other forms of social websites include wikis, forums and blogs. Some of these publish content in structured formats, enabling it to be aggregated.

A common property of Web 2.0 technologies is that they facilitate collaboration and sharing between users with low technical barriers, although usually on single sites or with a limited range of information. In this book we will refer to this collaborative and sharing aspect as the “Social Web”, a term that can be used to describe a subset of Web interactions that are highly social, conversational and participatory, whereby social media content is being created and augmented on a variety of social media platforms. The term Social Web may also be used instead of Web 2.0, since it makes clearer which feature of the Web is being referred to; we will use both terms in this chapter.

Finally, it is worth noting that this social vision of the Web is actually closely aligned with the original vision of the Web, as Tim Berners-Lee noted in an interview with the BBC: “The idea was that anybody who used the web would have a space where they could write and so the first browser was an editor, it was a writer as well as a reader. […] What happened with blogs and with wikis, these editable web spaces, was that they became much more simple. When you write a blog, you don’t write complicated hypertext, you just write text, so I’m very, very happy to see that now it’s gone in the direction of becoming more of a creative medium.” This participation aspect was enabled from the beginning of the Web. For example, the first Web browser, WorldWideWeb, was not just a read-only browser but also allowed one to edit pages from within the browser (similar to methods now popularised by wiki interfaces), an approach later continued by the W3C’s Amaya browser and editor.
Current Approaches for Data Mining and Analysis

The field of social network analysis (SNA) gives us a methodology for gaining insight into the structure of communities. Social network analysis uses methods from graph theory to study networks of individuals and the relationships between them. The individuals are often referred to as nodes or actors, and they may represent people, groups, countries, organisations or any other type of social unit. The relations between them can be called edges or ties, and can indicate any type of link, for example acquaintance, friendship, co-authorship or information exchange. Ties may be undirected, in which case the relationship is symmetric, or directed, in which case the relationship has a specific direction and may not be reciprocated. Social network analysis enables us to discover information such as the key people in a network, the distinct communities in a network, and the different types of roles which occur in a network.

Apart from comprehensive textbooks in this area (Wasserman and Faust, 1994), there are many academic tools for visually examining social networks and performing common SNA routines. For example, the tool Pajek (http://vlado.fmf.uni-lj.si/pub/networks/pajek/) can be used to drill down into various social networks. A common method is to reduce the amount of relevant social network data by clustering. One can choose to cluster people by common friends, by shared interests, by geographic location, by tags, etc. Alternatively, a library like JUNG (http://jung.sourceforge.net/), which provides analysis and visualisation methods, can be used to develop custom analytic or visual tools. In any case, before loading the data into one of these analysis tools, the relevant data must first be converted to an appropriate representation, which is dependent on the tool used. For more on social network analysis, and on a Semantic Web framework for carrying out SNA, see the chapter titled “Semantic Social Networks Analysis, a Concrete Case”.

Another approach for analysing online communities is to use Natural Language Processing (NLP) algorithms to extract entities, topics and relationships from textual content generated by users. However, when dealing with social media sites, performing NLP can be particularly difficult due to the typically informal nature of user posts, which tend to contain a lot of slang and context-dependent terms, with little attention given to spelling and grammar (Gruhl et al., 2009). Thus, while NLP algorithms are potentially very useful tools for investigating SNSs, there are challenges particular to user-generated content which must be handled.

When dealing with typical SNSs, even just acquiring the relevant data can require a lot of effort. A typical approach is to start from the profiles of a seed user or set of users, and follow the links to their friends’ profiles, and friends-of-friends’ profiles, and so on. Often it is necessary to download each user’s profile as an HTML page, and then scrape the desired information. This process is time-consuming and sometimes difficult. The code requires updating every time the structure of a page is changed, and needs to be completely rewritten for every new website which one wishes to investigate, since information is represented differently depending on the website.

As an alternative to scraping, many Web 2.0 sites provide APIs, for example LiveJournal, Twitter, Flickr and YouTube. The main motivation for providing APIs is to facilitate the integration of services into new applications or mashups. One can send requests to an API about a particular user, content item, or other resource, and the results are returned in a structured, easy-to-parse format.
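Whether profiles are scraped or fetched through an API, the seed-and-follow traversal described above has the same shape. The following is a minimal Python sketch under stated assumptions: fetch_friends and the SAMPLE_NETWORK data are hypothetical stand-ins for a real scraper or API client, and the in-degree count is only one simple example of the SNA measures mentioned earlier in this section.

```python
from collections import deque

# Hypothetical, hand-made friend lists standing in for a real site.
SAMPLE_NETWORK = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
}


def fetch_friends(user):
    """Placeholder for a scraper or API call (assumption: not a real API)."""
    return SAMPLE_NETWORK.get(user, [])


def crawl(seeds, max_depth):
    """Breadth-first crawl from the seed users.

    Returns the directed ties (user, friend) found while expanding every
    user that lies fewer than max_depth steps away from a seed.
    """
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    ties = []
    while queue:
        user, depth = queue.popleft()
        if depth >= max_depth:
            continue  # users at the depth limit are recorded but not expanded
        for friend in fetch_friends(user):
            ties.append((user, friend))
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    return ties


def in_degree(ties):
    """Count incoming directed ties per node, a very simple SNA measure."""
    counts = {}
    for _, target in ties:
        counts[target] = counts.get(target, 0) + 1
    return counts


ties = crawl(["alice"], max_depth=2)
popularity = in_degree(ties)
```

With the sample data, a crawl from alice to depth 2 never reaches erin, which illustrates how the choice of seeds and depth bounds the community that is actually observed.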
Using such APIs makes the process of data acquisition much easier, at least when the data of interest is limited to one site. However, if analysis requires data to be collected from multiple sites, integration can be problematic. For example, consider two major applications for the Social Web: Twitter, the microblogging service, and Flickr, the photo-sharing system. Both provide a public API that third-party developers can use to build their own applications, or simply to gather data for analysing communities (e.g. identifying social networks, the groups that users belong to, etc.). Yet these APIs differ both in how requests are sent and in how results are parsed. While both are based on HTTP calls and offer common formats for the API response (such as XML and JSON, i.e. JavaScript Object Notation, a popular format for exchanging structured data within Web 2.0 applications), they use different parameters and return values. For example, to identify all the people who are connected to a particular user on Twitter (and to get the results as XML), one must call a URL pattern such as http://twitter.com/friends/ids/terraces.xml, which retrieves the results shown in Table 1. On Flickr, a similar query is performed by calling a pattern such as http://api.flickr.com/services/rest/?method=flickr.contacts.getPublicList&api_key=f4c67b996f01077cf2e1d1469a7e790f&user_id=33669349%40N00&api_sig=c8c0fd49fe47272e1410e9574c98096c, and the results are shown in Table 2.

As one can see, there is no obvious relationship between these two models: while they share common data artefacts, such as the user id, this is represented as an XML element of its own, id, on Twitter, but as an attribute, nsid, on Flickr, which makes integration complex. Moreover, the parameters that have to be passed to each API are also different. In the next section, we will see how Semantic Web technologies, which provide a uniform description of resources using RDF and ontologies, can be used to provide a common layer of representation over such data, moving from heterogeneous APIs to standardised representation models (both in terms of how to get the data and how to understand it).

Table 1. Example of a social network retrieved using the Twitter API
<?xml version="1.0" encoding="UTF-8"?> 30076775598708571992702739321347 […]

Table 2. Example of a social network retrieved using the Flickr API
[…]
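The integration problem described above can be made concrete with a short Python sketch. It is illustrative only: the two payloads are made up, their wrapper elements (ids, rsp, contacts) are assumptions, and only the contrast between an id element and an nsid attribute comes from the text. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up payloads. Only the <id> element vs. nsid attribute distinction
# follows the text; wrapper elements and values are assumptions.
TWITTER_LIKE = "<ids><id>1001</id><id>1002</id></ids>"
FLICKR_LIKE = (
    '<rsp stat="ok"><contacts>'
    '<contact nsid="11111111@N00" username="example1"/>'
    '<contact nsid="22222222@N00" username="example2"/>'
    "</contacts></rsp>"
)


def contacts_from_twitter_like(xml_text):
    """User ids are the text content of <id> elements."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("id")]


def contacts_from_flickr_like(xml_text):
    """User ids are the nsid attribute of <contact> elements."""
    root = ET.fromstring(xml_text)
    return [element.get("nsid") for element in root.iter("contact")]


def as_ties(site, user, contact_ids):
    """Normalise one site's contact list into (source, target) pairs."""
    return [(f"{site}:{user}", f"{site}:{cid}") for cid in contact_ids]


# One parser per site is needed before the data can share a single shape.
ties = (
    as_ties("twitter", "terraces", contacts_from_twitter_like(TWITTER_LIKE))
    + as_ties("flickr", "33669349@N00", contacts_from_flickr_like(FLICKR_LIKE))
)
```

Each additional site would need its own parsing function in this style, which is the per-site effort that a shared representation model is intended to remove.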
SOCIAL SEMANTICS TO THE RESCUE

The Semantic Web: An Introduction

When looking at the initial proposal that led to the World Wide Web (Berners-Lee, 1989), shown in Figure 2, we can see that it links typed objects (people, projects, software, etc.) using various types of properties (describes, refers to, etc.). However, in spite of this initial proposal, the Web we have used so far is mainly a Web of documents (either text files or multimedia ones), linked together by untyped and unidirectional hyperlinks. While this is probably enough for human readers (who can interpret the content of these documents and the meaning of the links between them), the situation is far more complex for software agents. For example, a person reading the Wikipedia page about Paris can understand that this is a city and that a link to a page about France indicates that Paris is located in France, but a software agent has no direct means to understand anything about the nature of the objects described in these pages, despite the evolution of NLP algorithms that can be used to extract named entities and relationships from such pages.
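The contrast between untyped hyperlinks and typed links can be sketched in a few lines of Python. This is a toy illustration rather than any standard model: the page names, the is_a and located_in labels, and both helper functions are hypothetical.

```python
# Untyped hyperlinks: only "page A links to page B" is recorded.
hyperlinks = [
    ("wiki/Paris", "wiki/France"),
    ("wiki/Paris", "wiki/Eiffel_Tower"),
]

# Typed statements: every link also says what kind of relationship it is.
# The labels used here are made up for illustration.
statements = [
    ("Paris", "is_a", "City"),
    ("Paris", "located_in", "France"),
    ("Eiffel_Tower", "located_in", "Paris"),
]


def link_targets(page):
    """All a program can ask of plain hyperlinks: where does this page link?"""
    return [target for source, target in hyperlinks if source == page]


def objects(subject, predicate):
    """With typed statements a program can ask a specific question."""
    return [o for s, p, o in statements if s == subject and p == predicate]


linked = link_targets("wiki/Paris")       # two targets, meaning unknown
country = objects("Paris", "located_in")  # answers "where is Paris?"
```

From the hyperlinks alone, a program cannot tell that one target is the country containing Paris and the other a landmark within it; the typed statements carry that distinction explicitly.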
Figure 2. The architecture proposal by Tim Berners-Lee that gave birth to the Web
The Semantic Web vision aims to solve these issues by providing a Web of machine-readable information, with well-defined structure and semantics. The W3C (World Wide Web Consortium) recently termed this “a Web of Data”, in contrast to the Web of Documents, i.e. a Web in which data (not just documents) can be represented, exchanged and understood in a meaningful way. The Semantic Web is not a new Web, disconnected from the current one, but an “extension of the current Web” (Berners-Lee et al., 2001). While most of the current standardisation efforts around the Semantic Web have occurred via the W3C within their Semantic Web activity (http://www.w3.org/2001/sw/), some older projects such as SHOE (Heflin and Hendler, 2000) have focused on similar ideas leading towards a machine-readable Web. In order to achieve the Semantic Web goal, different technologies are needed, which together form the complete Semantic Web layer cake depicted in Figure 3. While we do not aim to provide a complete description of this stack, some particular elements must be understood before going further in this chapter.

Figure 3. The Semantic Web “layer cake” from the W3C

A first component that enables the Semantic Web is the use of URIs, or Uniform Resource Identifiers (Berners-Lee et al., 2005), as identifiers for everything that is described on the Web: people, cities, communities, etc. These URIs act as Web-scale identifiers for naming resources. For example, can be used to identify the city of Galway, (while would identify a page about it). There can be multiple identifiers for the same resource, and we will detail this in the context of online identity in a later section of this chapter.

Another requirement is to define facts or assertions about these URIs, for example, to say that Galway is a city. RDF (Resource Description Framework) provides a way to do so by defining a triple-based model in the form of <subject> <predicate>